Outline DRAFT Updated

Modelling Human Hand-Eye Coordination Using Recurrent Neural Networks[TM1]

I Introduction

What the presentation is going to discuss/ introduce the project/keywords[TM2]

II Rationale

o The goal of this project is to create a VR game in which participants catch a ball while their head, eye, and hand position/orientation are recorded. Using the participants’ data, a reinforcement learning algorithm will be developed to create a self-learning AI[TM3]. The AI’s performance will be compared with the participants’ performance via simulation in a virtual space.

o The goal of this project is to create a VR game in which participants catch a ball and their head, eye, and hand position/orientation are used to develop a self-learning AI, whose performance will then be compared with a human’s in the task of intercepting a ball.

III Background

A. Artificial Neural Network (ANN)

§ An interconnected group of nodes termed “artificial neurons”, in which the output of one neuron is passed to the input of another.

B. Recurrent neural network (RNN)

§ An RNN is a class of ANN in which connections between units/nodes form a directed cycle, allowing dynamic temporal behavior.

C. Dynamic temporal behavior

§ The trajectory of states, in a state space, followed by a system during a certain time interval[TM4].
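
To make definitions B and C concrete, here is a minimal sketch of a recurrent update in plain Python. The weights, the two hidden units, and the function names are all hypothetical and are not taken from the project's model; the only point is that the hidden state at each step depends on the previous hidden state, so the list of states is the "trajectory" described above.

```python
import math


def rnn_step(state, inputs, w_rec, w_in):
    """One recurrent update: new_state = tanh(W_rec * state + W_in * inputs)."""
    new_state = []
    for i in range(len(state)):
        total = sum(w_rec[i][j] * state[j] for j in range(len(state)))
        total += sum(w_in[i][k] * inputs[k] for k in range(len(inputs)))
        new_state.append(math.tanh(total))
    return new_state


def run_rnn(input_sequence, w_rec, w_in):
    """Feed a sequence through the network and return every hidden state visited."""
    state = [0.0] * len(w_rec)
    trajectory = [state]
    for inputs in input_sequence:
        state = rnn_step(state, inputs, w_rec, w_in)
        trajectory.append(state)
    return trajectory


# Hypothetical weights: 2 hidden units, 1 input value per time step.
W_REC = [[0.5, -0.3], [0.2, 0.4]]
W_IN = [[1.0], [0.5]]

# The input is identical at every step; the state still changes over time
# because each update also reads the previous state (the directed cycle).
trajectory = run_rnn([[1.0], [1.0], [1.0]], W_REC, W_IN)
```

In terms of this study, the input at each step could be one frame of recorded data, but that mapping is an assumption and would need to be confirmed against the actual model.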

D. Deep learning

§ The application of ANNs that contain more than one hidden layer to learning tasks.

E. Machine learning

§ An application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience (by accessing data) without being explicitly programmed.

F. Reinforcement learning

§ A subdivision of machine learning that allows software agents (models) to automatically determine the ideal behavior within a specific context in order to maximize their performance. This is done by creating a reward function[TM5] that scores each attempt, allowing the agents to learn over many attempts (often millions) by adjusting their behavior to increase the reward they receive; in some tasks, agents trained this way have surpassed human ability.
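
As one possible way to answer the "define?" question on the reward function, the sketch below scores a single attempt at the interception task. It is an illustration only: the function name, the 0.15 m catch radius, and the choice of negative distance as the penalty are assumptions, not the reward used in this project.

```python
import math

CATCH_RADIUS_M = 0.15  # hypothetical: how close the paddle must be to count as a catch


def interception_reward(paddle_pos, ball_pos, catch_radius=CATCH_RADIUS_M):
    """Score one attempt: 1.0 for a catch, otherwise a penalty that grows with distance."""
    distance = math.dist(paddle_pos, ball_pos)  # straight-line distance in metres
    if distance <= catch_radius:
        return 1.0        # successful interception
    return -distance      # the farther away the paddle, the lower the reward
```

For example, `interception_reward((0, 1, 0), (0, 1, 0.1))` counts as a catch, while a paddle 5 m from the ball receives a reward of -5.0. An agent that maximizes this number is pushed toward placing the paddle where the ball is.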

G. Complexity of eye movements & uses for navigation and coordination

§ Basic eye movements can be divided into two categories: saccades and fixations. Saccades are rapid movements of the eye between fixation points, and a fixation is a point/location that the eyes concentrate on for an extended amount of time.

§ By using these two motions, humans are able to navigate and perform complex movements autonomously[TM6].

§ An example of a non-navigation use: when attempting to locate a moving target that has left the eye’s visual field, predictive saccades are made in an attempt to find the location where the object will reappear[TM7].
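
The two categories above can be separated with a simple velocity threshold, sketched below. This is a generic velocity-threshold rule, not necessarily the labelling method used in this project. The 30 deg/s threshold and the (azimuth, elevation) input format are assumptions; the 75 Hz sampling rate is the eye tracker rate listed in the Method section.

```python
import math

SAMPLE_RATE_HZ = 75           # eye tracker sampling rate from the Method section
SACCADE_DEG_PER_S = 30.0      # hypothetical threshold separating the two categories


def label_gaze(samples, rate_hz=SAMPLE_RATE_HZ, threshold=SACCADE_DEG_PER_S):
    """Label each interval between consecutive gaze samples.

    samples: list of (azimuth, elevation) gaze angles in degrees.
    Returns one label per pair of neighbouring samples.
    """
    labels = []
    for (az0, el0), (az1, el1) in zip(samples, samples[1:]):
        # Small-angle approximation of angular speed in degrees per second.
        speed = math.hypot(az1 - az0, el1 - el0) * rate_hz
        labels.append("saccade" if speed >= threshold else "fixation")
    return labels


gaze = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
labels = label_gaze(gaze)
```

Here the jump from 0.1 to 5.0 degrees within one sample is labelled a saccade and the small drifts on either side are labelled fixations. Telling smooth pursuits and blinks apart would need additional rules that this sketch does not attempt.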

H. Virtual Reality (VR) simulation

§ The VR simulation consists of a 4-sided rectangular room 20m in length, a purple cone, a yellow cylinder, 7 orange cylinders, a small green sphere, a purple vector, and on the far side of the room a red ball.

§ The purple cone represents the participant’s head, with the pointed end of the cone being the front of the head. The yellow cylinder marks where the participant placed the racket to intercept the ball. The 7 orange cylinders are separate model outputs, spaced 5 frames apart. The small green sphere is the gaze point, i.e. where the participant was looking. The purple vector represents where exactly the participant was looking with respect to depth. The red ball is the target that the participant attempted to intercept.

§ The ball was thrown at each participant 135 times. The ball disappeared at a randomly selected time of 600, 800, or 1000 ms, and the “blanking” lasted 500 ms, leaving 300, 400, or 500 ms of post-blanking time before the ball could be intercepted. The blanking tested whether the participant could quickly intercept an object whose position had been unknown for a short period of time. The blanking times were randomized to prevent participants from learning a pattern and to force them to predict where the ball would reappear.
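
The timing above can be written out as a randomized trial schedule. The sketch below assumes a fully crossed design (3 pre-blank times x 3 post-blank times x 15 repetitions = 135 throws); the outline gives the 135 throws and the three values of each duration but does not state how they were combined, so the crossing, the repetition count, and the function names are assumptions.

```python
import itertools
import random
from collections import Counter

PRE_BLANK_MS = (600, 800, 1000)   # time at which the ball disappears
BLANK_MS = 500                    # duration of the blanking
POST_BLANK_MS = (300, 400, 500)   # time between reappearance and interception
REPEATS = 15                      # assumption: 3 x 3 x 15 = 135 throws


def build_schedule(seed=0):
    """Return a shuffled list of (pre_blank_ms, post_blank_ms) pairs, one per throw."""
    conditions = list(itertools.product(PRE_BLANK_MS, POST_BLANK_MS))
    trials = conditions * REPEATS
    random.Random(seed).shuffle(trials)  # seeded so the order can be reproduced
    return trials


def flight_time_ms(pre_blank, post_blank):
    """Total time from launch to interception for one throw."""
    return pre_blank + BLANK_MS + post_blank


schedule = build_schedule()
counts = Counter(schedule)
```

Under these assumptions, the shortest throw lasts 600 + 500 + 300 = 1400 ms and the longest lasts 1000 + 500 + 500 = 2000 ms.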

§ The experiment was performed in a VR environment for multiple reasons: ease of data collection, retention of the visual structure of the natural context, controlled manipulation of the ball’s trajectory, artificial blanking of the ball[TM8], and the avoidance of uncontrollable variables such as wind resistance and inaccurate throws.

IV Method

1) Participant performs the VR simulation by attempting to intercept the ball with the paddle. (go into more detail[TM9])

2) Collect hand, head, and gaze position/orientation from 10 subjects, aged 19-30, using equipment[TM10] with motion capture markers attached. The equipment included: an Oculus DK2 head-mounted display, a 14-camera Phasespace X2 motion capture system (75 Hz), a built-in SensoMotoric Instruments binocular eye tracker (75 Hz), and a Wilson badminton racket.

3) Create multiple model (agent) outputs that contain various states (objective dimensions), state a desired action (interception of the ball using the paddle), and then create a reward function to optimize the policy.

4) Create media features for simulation presentation.

1. Play, pause, single frame forward, single frame backwards.

2. Changing camera viewpoint from fixed to head, attached to ball, and free camera.

3. Creating a method of raising or lowering the viewpoint 1 m in the air.
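
The play, pause, and single-frame controls in item 1 can be sketched independently of any VR toolkit as a small class that tracks which recorded frame to show. The class and method names are hypothetical, and no Vizard calls are used; hooking the methods up to key presses and to the renderer is left out.

```python
class PlaybackController:
    """Keeps track of which recorded frame should be displayed."""

    def __init__(self, frame_count):
        if frame_count < 1:
            raise ValueError("need at least one recorded frame")
        self.frame_count = frame_count
        self.frame = 0
        self.playing = False

    def play(self):
        self.playing = True

    def pause(self):
        self.playing = False

    def step(self, delta):
        """Move by delta frames (+1 forward, -1 backward), pausing playback
        and clamping the result to the range of the recording."""
        self.playing = False
        self.frame = max(0, min(self.frame_count - 1, self.frame + delta))

    def tick(self):
        """Call once per rendered frame; advances only while playing."""
        if self.playing:
            if self.frame < self.frame_count - 1:
                self.frame += 1
            else:
                self.playing = False  # stop at the last frame
```

Clamping in `step` means that stepping backward at the first frame, or forward at the last, leaves the display where it is instead of raising an error.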

V Results[TM11]

VI Discussion

o Discuss results and relate them to scope (papers I’ve been reading and how this can help future robotics e.g. navigation, task management, grasp, etc.)

VII Conclusion and future works[TM12]

o Make solid conclusions of research (sum up presentation)

o This research could assist in the improvement of robotics AI development towards visual recognition of a moving object, and the movements needed to successfully intercept the object.

VIII Things learned[M13]

o The categorization of different eye movements

1. Saccades: rapid movement of the eye between fixation points

2. Fixations: concentrating the eyes directly on a point/location

3. Smooth pursuits: locking onto and following an object’s movements fluidly

o How human eyes are used to navigate an environment and perform tasks.

1. The use of anchor point fixations, which allow the head to turn.

2. How the eye navigates towards the center of vision when turning.

3. The eye’s prioritization of movement

4. The eye has a 150-200 ms delay between perceiving a change and reacting to it, e.g. when reacting to a moving target.

o How to interpret eye tracking data and label it as saccades, fixations, blinks, and smooth pursuits.

o A basic understanding of coding using Python

o How to use Vizard as a VR simulator

[TM1]Due to my short-term memory problems, some of the comments may be reminders for me to look at later, questions directed to the reviewer, or both. If a comment is unclear, please do not hesitate to contact me and ask questions; I will attempt to respond ASAP.

Thank you for reviewing!

PS: Sorry if I ask something I’ve already asked before; I may have forgotten or may simply be looking for a formal written explanation.

Presentation room: main auditorium

Timeframe: 10 min

Are there any terms that are not well defined in the draft?

[TM2]I will detail this more when I have the rest of the structure complete to ensure a coherent introduction to my presentation

[TM3]Missing anything? Which do you prefer? How could I improve the flow of either statement?

[TM4]How can I explain these in layman’s terms? How are these explained in relation to parts of the study? E.g. what is our state space/system?

How should I order the background info? I.e. what info should come first in relation to the others?

[TM5]Define?

[TM6]Place example of turning point action

[TM7]Then use a visual comparison of how the ball is tracked in real life (RL) using SMI (SensoMotoric Instruments) vs in VR

[TM8]Emphasize the inefficiency of real-life ball catching by comparison to reinforce the claim.

[TM9]Do I state that the simulation was made with “Vizard VR toolkit”?

[TM10]Does the equipment list remain here or should I create an “apparatus” section? If not:

Do I need to state which physics engine was used?

Do I need to state the field of view the subjects had while wearing the Oculus?

Do I need to state the calibration specs?

[TM11]Will we have any solid numbers/graphs or a visual accuracy in the models before my presentation? Or should I just show the simulation and detail each part?

[TM12]Use the papers I’ve been analyzing to create a list of scopes this research could benefit in robotics development

[M13]Find a method of implementing this into the conclusion to avoid loss of flow in the presentation; only have to explain a little bit then say etc.


RIT 2017 Internship Experience
