1 Introduction

Today, many new technologies are being studied to aid the communication and mobility of thousands of disabled persons, in order to improve their quality of life and to allow them a more autonomous and independent lifestyle with greater chances of social integration[1, 2]. The intelligent human-computer interaction (HCI) system based on visual information is one of these innovative technologies, and it is becoming more and more popular.

The eye-based HCI system adopts a series of continual eye movements as input to perform simple control activities. The common eye movements include saccades, fixations and blinks. According to the method of recording eye motions, eye-based HCI systems can mainly be classified into two categories: invasive and active versus non-invasive and passive[3]. The active electrooculography (EOG) system needs 3–5 electrodes attached to the skin around the eye region (as shown in Fig. 1) in order to measure the resting potential of the retina, which reflects the eye movements; this is intrusive and makes the user uncomfortable[4–6]. Moreover, the contact resistance between the skin and the electrodes decreases the accuracy of the recorded EOG signal.

Fig. 1

EOG-based system using electrodes

The passive video-oculography (VOG) system[7–9] is based on digital images of the eyes captured with a video camera, coupled with image processing and machine vision technologies, as shown in Fig. 2. The VOG system attaches no devices to the user's body, so the user may even be unaware of the monitoring system. Because of this more natural interaction with the user, the VOG system attracts more researchers' attention.

Fig. 2

VOG-based system using video cameras

The VOG system encompasses a number of component technologies, including spatial eye position tracking, eye-gaze tracking, eye closure state tracking, eye movement tracking and pupil size monitoring[3]. Among these technologies, eye gaze tracking and eye movement tracking are easily confused by researchers. In fact, they have different working principles and application fields.

Gaze tracking is the most widely known of the above component technologies. It refers to the process of finding the exact point on a monitor screen at which the user is gazing[10–12]. Generally, it computes the gaze point on the screen with the pupil center corneal reflection (PCCR) method[13], which needs at least one infrared light emitting diode (LED) as an auxiliary light source. The infrared LED causes a reflection spot (glint) on the eyeball. As the eyeball is round, the reflection spot stays at the same position no matter in which direction the eye is looking. A video camera detects both the reflection spot and the center of the pupil, and the gaze direction can be calculated from the vector between the two points by a simple linear mapping, as shown in Fig. 3. The eye gaze technology has been applied in the smart "eye-controlled" phones from Samsung, which can automatically scroll a webpage or window when the user gazes at the bottom of the phone screen. Normally, the screen is another essential component of an eye gaze tracking system, besides the video camera and the LEDs.
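As an illustration of the PCCR idea, the following sketch (not taken from [13]) maps the pupil-glint difference vector to a screen point with a simple linear mapping; the calibration coefficients and the detected pixel coordinates are hypothetical values that a real system would obtain from a per-user calibration.

```python
# Minimal sketch of the PCCR idea, assuming the glint (corneal reflection)
# and the pupil center have already been detected in image coordinates.
# The linear coefficients below are hypothetical calibration results.

def pccr_gaze_point(pupil_center, glint_center, coeffs):
    """Map the pupil-glint difference vector to a screen point with a
    simple linear mapping."""
    dx = pupil_center[0] - glint_center[0]
    dy = pupil_center[1] - glint_center[1]
    a_x, b_x, a_y, b_y = coeffs
    return a_x * dx + b_x, a_y * dy + b_y

# Hypothetical calibration coefficients and detections (pixels).
coeffs = (55.0, 960.0, 60.0, 540.0)
pog = pccr_gaze_point((312.4, 208.9), (305.1, 214.6), coeffs)
print("estimated point of gaze on screen:", pog)
```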

Fig. 3

PCCR method used to calculate the point-of-gaze (POG). The parameters are: radius of the cornea r, distance from the center of the cornea to the center of the pupil r_d, and the index of refraction of the aqueous humor n. The center of the cornea is located at point c and the center of the pupil at point p_c. The optical axis (OA) L is the vector from c to p_c, and the POG p is the intersection of the OA with the monitor plane

Eye movement tracking means tracking and interpreting different eye activities, including fixations, saccades and blinks, which can be captured by an eye tracker. Eye movement tracking needs neither infrared LEDs as auxiliary light sources nor a screen. It continually captures eye images with a video camera and then recognizes the different eye activities with image processing techniques. Fig. 4 shows the diagram of an eye movement tracking system. Furthermore, the identified eye activities can be encoded as input commands to HCI interfaces, which can control some simple devices[14, 15]. For example, when the eyes move to the left, a wheelchair may be guided to the left, and when the eyes move to the right, the wheelchair may be guided to the right. Such a system can therefore be called an eye-controlled system.

Fig. 4

Diagram of eye movement tracking system

The eye-controlled system can not only help people interact with a computer or other screen-based devices, but can also help them control simple devices at home, such as an on-off switch or a TV, just by rotating their eyes. Therefore, it is especially valuable for disabled people.

In the eye-controlled system, successful identification of the different eye movements is very important. Some intelligent eye-tracker products can identify eye movements, but almost all of them need infrared LEDs as auxiliary light sources and must be worn on the head, which makes the user uncomfortable. Moreover, they are too expensive for ordinary people. Because of these disadvantages, eye trackers are not popular among the disabled population.

Image processing technology has developed greatly, but so far there are few references on using image processing methods to recognize eye movements. In this paper, we adopt software (image processing methods) rather than hardware (eye trackers) to recognize the eye movements for the eye-controlled system.

The paper is organized as follows. Section 1 explains the difference between the eye gaze tracking technology and the eye movement tracking technology. Section 2 discusses the proper placement region of the video camera under natural light by constructing a mathematical model on the Matlab platform. Section 3 details the methods for identifying eye movements from non-frontal face images, including image acquisition, face detection, eye window extraction and eye movement recognition. Experiments are presented in Section 4.

2 Modeling of single camera location under the natural light

More and more VOG systems based on eye gaze tracking use two or more video cameras and several infrared LEDs as auxiliary light sources to achieve a more accurate point of gaze under free head motion. However, all the video cameras must be calibrated, which increases the complexity of the system[16, 17]. In addition, these systems are limited to indoor use because of the LED light sources. The eye-controlled system discussed in this paper is based on eye movement tracking. It is composed of only one video camera and one computer, without any auxiliary light source to produce glints or any screen on which to compute a gaze point. The camera captures the user's eye movement images in real time, while the computer processes the images and analyzes the data. This simplifies or avoids the calibration process and reduces the cost. Also, the system can be used both indoors and outdoors.

To achieve satisfactory accuracy, we should analyze the influence of head motion on the eye image acquisition when only one video camera is used.

2.1 Model of eyes rotation with head motion

Assuming the user's eyes lie in the same horizontal plane as the optical axis of the video camera, we design a mathematical model (as shown in Fig. 5) on the Matlab platform. In Fig. 5, C is the video camera, line CO is the optical axis of the video camera, L is the lateral canthus of the left eye, R is the lateral canthus of the right eye, and the apex angle is the horizontal visual angle of the video camera. The opposite side LR with length w is the line between the left and right lateral canthi, O is the midpoint of line LR, and the length of one eye is d. Line LR represents the original eye position when the user faces the video camera directly. We adjust the focus of the camera so that line LR lies within the rectangular imaging plane of the video camera.

Fig. 5

Model of eyes rotation with head motion

If the user's head rotates by α degrees to the right, then line LR connecting the two lateral canthi rotates to position L′R′. Note that the video camera remains still during the head motion. By projecting the two corner points N and R′ of the right eye onto the horizontal axis of the rectangular imaging plane of the video camera, points N′ and M′ are obtained, respectively. Because the length of the eye is constant, i.e., the length of line R′N equals d, the lengths of R′N and M′N′ are given as

$$R'N = d$$
(1)
$$M'N' = d\cos \alpha .$$
(2)

We now discuss the allowable maximum head rotation angle. If line M′N′ is too short, the complete two-eye image cannot be captured on the imaging plane of the video camera. We therefore require the length of line M′N′ to be at least half the length of the eye, i.e.

$$d\cos \alpha \geqslant \frac{d}{2}$$
(3)
$$0^\circ \leqslant \alpha \leqslant 60^\circ .$$
(4)

Since cos α ⩾ 1/2 holds exactly for 0° ⩽ α ⩽ 60°, the allowable maximum angle of horizontal head rotation to the right or left side is 60°.
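A small numerical check of (1)–(4), assuming an illustrative eye length d; it only restates the cosine projection argument above.

```python
import numpy as np

# Numerical restatement of (1)-(4): an eye of length d projects onto the
# imaging plane with length d*cos(alpha) when the head rotates by alpha,
# and the image is treated as usable while the projection is at least d/2.
d = 30.0  # assumed canthus-to-canthus eye length in mm, for illustration only
for alpha_deg in range(0, 91, 15):
    projected = d * np.cos(np.radians(alpha_deg))
    print(f"alpha = {alpha_deg:2d} deg, projected length = {projected:5.1f} mm, "
          f"usable: {projected >= d / 2}")
# The usable flag switches from True to False just above 60 degrees,
# matching the +/-60 degree limit derived above.
```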

2.2 Model of single camera location

We now know that the allowable range of head rotation is within ±60° ("±" denotes the left or right direction). In an actual application, the video camera should be placed at one side of the user to avoid blocking the user's line of sight. Where, then, is the appropriate location for the video camera? Few references address this question. In our previous paper[18], a mathematical model was built to discuss the reasonable location of the video camera, as shown in Fig. 6. To simplify the model, we assume the video camera is at the same height as the user's eyes, i.e., the lens of the camera lies in the same plane as the eyes, and the optical axis of the camera points at the eyes. The line between the user's two eyes is taken as the x-axis of a rectangular coordinate system, with the midpoint of the line as the origin. The optical axis therefore points along the y-axis when the video camera is placed straight ahead of the user.

Fig. 6

Modeling of the camera’s location at the right side

In Fig. 6, apex C is the video camera, apex angle θ is the horizontal visual angle of the camera, and the opposite side LR with length w is the line between the two lateral canthi. Parameters θ and w are known. We rotate apex C (the camera) around the midpoint O of the opposite side while keeping θ and w constant. When the camera moves from point C to C′ (or C″), there is an angle between its current optical axis direction C′O (or C″O) and the original optical axis direction, which is called the horizontal deflection angle β of the camera. In fact, the trace of apex C marks the nearest admissible positions of the camera to the user's eyes in every β direction.

In Fig. 6 (a), line C′O′ is perpendicular to line LR, and ∣C′O′∣ = r′. From the triangle relations, the following equations are obtained:

$$\theta = \theta_1 + \theta_2, \quad \beta = \theta_2 - \varphi$$
(5)
$$\gamma = \frac{\pi}{2} - \beta - \theta_1 = \frac{\pi}{2} - \beta - (\theta - \theta_2) = \frac{\pi}{2} - \theta + \varphi$$
(6)
$$\frac{w/2}{\sin \theta_1} = \frac{r'}{\sin \gamma} = \frac{r'}{\cos (\theta - \varphi)}$$
(7)
$$\frac{w/2}{\sin \theta_2} = \frac{r'}{\sin \delta} = \frac{r'}{\cos \varphi}.$$
(8)

From these equations, the solution is

$$r'^2 = \frac{w^2}{4}\left[(4\cot^2 \theta + 1)\cos^2 \varphi + 4\cot \theta \sin \varphi \cos \varphi + \sin^2 \varphi \right]$$
(9)
$$\tan \varphi = \frac{w/2}{r'\cos \beta} \;\Rightarrow\; \beta = \sin^{-1}\left[\frac{w}{2} \cdot \frac{\cos \varphi}{r'}\right] - \varphi .$$
(10)

Similarly, in Fig. 6 (b), line C″O″ is perpendicular to line LR, and ∣C″O″∣ = r″. The solution for Fig. 6 (b) is

$$r''^2 = \frac{w^2}{4}\left[(4\cot^2 \theta + 1)\cos^2 \varphi - 4\cot \theta \sin \varphi \cos \varphi + \sin^2 \varphi \right]$$
(11)
$$\beta = \sin^{-1}\left[\frac{w}{2} \cdot \frac{\cos \varphi}{r''}\right] + \varphi .$$
(12)

The comprehensive result obtained by combining (9)–(12) is

$$r = \sqrt{\frac{w^2}{4}\left[(4\cot^2 \theta + 1)\cos^2 \varphi \pm 4\cot \theta \sin \varphi \cos \varphi + \sin^2 \varphi \right]}$$
(13)
$$\beta = \sin^{-1}\left[\frac{w}{2} \cdot \frac{\cos \varphi}{r}\right] \mp \varphi$$
(14)

where θ is the horizontal visual angle of the video camera, and w is the distance between the two lateral canthi.

Parameter r in (13) is the shortest distance between the video camera and the user's eyes: when the camera is located at one of these nearest positions, the horizontal visual angle of the video camera can still cover both eyes completely in the captured image. The coordinates (r sin β, r cos β) give the location of the camera, so the trace of apex C (the camera) is described by the corresponding pairs of r and β.

A detailed derivation can be found in [18], and the parameters are computed on the Matlab platform. A Cognex In-Sight Micro1020 camera with a Computar M2514-MP lens is used to capture and store the eye images of the model head, as shown in Fig. 7. The horizontal visual angle is θ = 20° and the distance between the two lateral canthi of the model head is w = 94 mm. At first, the camera is straight ahead of the user, so the parameter φ = θ/2 = 10°.
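The following sketch evaluates (13) and (14) with the values quoted above (θ = 20°, w = 94 mm). It is a re-implementation for illustration, not the authors' Matlab code; its output should agree with the representative values cited below (r ≈ 266.6 mm at β = 0°, β ≈ 10.3° when φ = 0°, and r < 100 mm for φ ⩾ 60°).

```python
import numpy as np

# Evaluation of (13) and (14) with theta = 20 deg and w = 94 mm.  The upper
# signs correspond to the camera lying between the optical axis and the right
# lateral canthus (beta computed with "- phi"); the lower signs correspond to
# the camera lying beyond the canthus ("+ phi"), as in Fig. 6 (a) and (b).
theta = np.radians(20.0)
w = 94.0

def shortest_distance(phi_deg, beyond_canthus=False):
    phi = np.radians(phi_deg)
    sign = -1.0 if beyond_canthus else 1.0
    cot = 1.0 / np.tan(theta)
    r2 = (w ** 2 / 4.0) * ((4.0 * cot ** 2 + 1.0) * np.cos(phi) ** 2
                           + sign * 4.0 * cot * np.sin(phi) * np.cos(phi)
                           + np.sin(phi) ** 2)
    r = np.sqrt(r2)
    beta = np.degrees(np.arcsin(w / 2.0 * np.cos(phi) / r)) - sign * phi_deg
    return r, beta

for phi_deg in (10, 5, 0):          # camera between the optical axis and the canthus
    r, beta = shortest_distance(phi_deg)
    print(f"phi = {phi_deg:2d} deg  r = {r:6.1f} mm  beta = {beta:5.1f} deg")
for phi_deg in (0, 30, 60):         # camera beyond the canthus
    r, beta = shortest_distance(phi_deg, beyond_canthus=True)
    print(f"phi = {phi_deg:2d} deg  r = {r:6.1f} mm  beta = {beta:5.1f} deg")
```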

Fig. 7

The video camera and the model head

We shift the video camera to the right side slowly, and angle φ decreases gradually. When the video camera is directly in front of the right eye's lateral canthus, angle φ decreases to 0°. When the camera moves beyond the lateral canthus of the right eye, angle φ starts to increase again. Substituting φ = 0°, 1°, 2°, ⋯, 90° into (13) and (14), the values of r and β are computed in pairs, as shown in Table 1.

Table 1 Computing results of the shortest distance between the camera and the user

From the simulation and computation results in Table 1, we can draw the following conclusions:

  1) The shortest distance r between the video camera and the midpoint of the two lateral canthi decreases as φ drops from 10° to 0°. Meanwhile, the deflection angle β of the optical axis of the camera increases approximately linearly.

  2) When the camera is directly in front of the user's eyes, i.e., β = 0°, parameter r reaches its maximum value, r = 266.6 mm. As angle β grows, the distance r drops gradually. When β = 10.3°, the camera is located directly in front of the right eye's lateral canthus.

  3) If φ ⩾ 60°, then r < 100 mm, meaning the camera is very close to the eyes. Such positions are not reasonable and should be given up, so parameter φ should be smaller than 60°.

Based on these simulation results, we capture images of the model head in each β direction (Fig. 7) by shifting the video camera. We find that when the deflection angle β is less than 40°, complete two-eye images can be captured. When β is more than 40°, only one-eye images rather than two-eye images can be acquired, as shown in Fig. 8.

Fig. 8

Eye images in different β direction

Test results show that the single video camera should be located at one side of the user within 40° so as to capture complete two-eye images, and the actual distance between the video camera and the user's head should be farther than the theoretically calculated shortest value r in each β direction.

3 Identification of eye movements from non-frontal face images

Identification of eye movements is very important to the HCI system based on visual information. Many researchers have devoted themselves to identifying eye movements through various EOG signal processing methods. In this paper, we try to identify eye movements through image processing methods. Fig. 9 shows the processing diagram for recognizing eye movements from the sampled images. The image processing procedure includes 4 steps: 1) face detection from the non-frontal head-shoulder images, 2) rough extraction of the eye window from the detected face image, 3) accurate localization of the pupil center in the eye window, and 4) identification of different eye movements according to the trajectory of the pupil center.

Fig. 9

The working process of the eye-controlled system

3.1 Image acquisition

We use the In-Sight Micro1020 video camera from the Cognex Company to sample head-shoulder images under natural light in our lab. During the acquisition, the video camera is placed at a fixed position and the user is asked to turn to the left or right side. The "Compass" application installed on an Apple iPhone 4 smartphone is used to accurately adjust the position of the user, who sits about 1.5 meters away from the video camera. The detailed image acquisition process is as follows.

First, the user is asked to sit directly in front of the video camera, and the cellphone is put on the chair. We start the Compass software and set the azimuth to 0° as the original position. Then, we start the In-Sight Explorer Easy Builder View application software from the Cognex Company to sample head-shoulder images at a 2 Hz sampling rate. During the 15-second acquisition period, the user is asked to continually move his eyes to the left, right, upwards and downwards in sequential order without head motion.

Second, the user is asked to turn left or right to the 10° azimuth indicated by the Compass software, without moving the video camera. We capture 60 head-shoulder images at the same frequency while the user moves his eyes in sequence.

Third, we repeat this acquisition process at each azimuth of ±20°, ±30°, ⋯, ±60° ("+" corresponding to the right side and "−" to the left side). Thus, we capture a total of 390 images at different azimuths, each with the same size of 640 × 480 pixels.

3.2 Nonfrontal face detection

Among face detection methods, the Adaboost-based cascade classifier algorithm[19, 20] has the shortest response time and is classical and popular. However, this algorithm can only detect frontal faces. An improved Adaboost algorithm based on the extended rectangle features proposed by Lienhart[21] can detect non-frontal faces. Following the principle and work steps of this improved Adaboost algorithm, the non-frontal face detection program is developed based on the OpenCV library[22]. Table 2 lists the non-frontal face detection results at 0°–60° azimuths. Two evaluation indices are important for face detection: one is the hit rate, and the other is false detection. The hit rate is defined as the ratio of the number of successfully detected faces to the total number of images, while false detection indicates the number of falsely detected faces. An ideal face detection algorithm should have a 100% hit rate and no false detections. The results in Table 2 show that if the video camera is located at the side of the user within ±30°, the non-frontal face hit rate is 100% and the number of false detections is 0, but the hit rate drops or the number of false detections increases when the azimuth is beyond ±30°. Fig. 10 shows some successful non-frontal face detection examples. Each row represents one azimuth adjusted by the Compass software on the iPhone; for example, "10° L" means the user sits on the left side of the video camera at a 10° azimuth. Each column represents a different eye gesture; for example, "straight" means the user looks straight ahead, and "left" means the user's eyes are moving to the left. The rectangle labels the detected face region in each head-shoulder image. We extract the face region from each head-shoulder image at a uniform size of 150 × 150 pixels for further processing.
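For illustration, a hedged sketch of how such a detector might be assembled from the cascade classifiers that ship with OpenCV (trained with Lienhart's extended Haar feature set); the cascade file names, detector parameters and image path are assumptions of this sketch, not the authors' exact configuration.

```python
import cv2

# Cascade files shipped with OpenCV; the profile-face cascade stands in for
# the non-frontal detector described above.
frontal = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_alt2.xml")
profile = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_face(image_bgr):
    """Return the first detected face rectangle (x, y, w, h), trying the
    frontal cascade first and falling back to the profile cascade."""
    gray = cv2.equalizeHist(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY))
    for cascade in (frontal, profile):
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                         minNeighbors=5, minSize=(80, 80))
        if len(faces) > 0:
            return faces[0]
    return None

img = cv2.imread("head_shoulder.png")      # hypothetical 640 x 480 sample image
if img is not None:
    face = detect_face(img)
    if face is not None:
        x, y, w, h = face
        face_150 = cv2.resize(img[y:y + h, x:x + w], (150, 150))  # uniform 150 x 150 region
```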

Fig. 10

Successful non-frontal face detection results in 0°–30°

Table 2 Non-frontal face detection results in 0°–60°

3.3 Eye windows extraction

In general, image projection functions can be used to detect the boundaries of different image regions. Besides the eyes, a face image contains eyebrows, nose and lips, but the eye area has two distinct characteristics: it is darker than its neighboring areas, and the intensity at the edge of the eyeball changes rapidly. For this reason, some researchers use the integral projection function (IPF) method to locate the two eyes in frontal face images. Our experiments show that the accuracy of locating the eye center point with IPF in non-frontal faces is not satisfactory. However, we can still use the IPF results to roughly extract eye windows from the detected face images in preparation for pupil center localization. Eye windows are the image regions containing the eyes[23].

Suppose I(x, y) is the intensity of the pixel at location (x, y). The horizontal integral projection IPF_h(y) and the vertical integral projection IPF_v(x) of I(x, y) over the intervals [x_1, x_2] and [y_1, y_2] are defined as[24]

$$IPF_h(y) = \int_{x_1}^{x_2} I(x,y)\,{\rm d}x$$
(15)
$$IPF_v(x) = \int_{y_1}^{y_2} I(x,y)\,{\rm d}y.$$
(16)

The detailed procedure for using IPF to roughly extract eye windows from the non-frontal face images of the uniform size 150 × 150 pixels is as follows.

  • Step 1. Pre-process the detected face image. This step includes 3 sub-steps: 1) converting the color image to a gray scale image, 2) equalizing the histogram of the gray scale image to compensate for illumination, and 3) automatically segmenting the gray scale image with the Otsu method[25] to obtain a binary image.

  • Step 2. Make the horizontal integral projection. The left eye and the right eye may not lie on the same horizontal line when the user does not face the video camera directly. So we apply the horizontal IPF in (15) to the left half face and the right half face separately, and obtain the two corresponding y-coordinates (y_L, y_R) of the two eyes.

  • Step 3. Make the vertical integral projection. According to (16), we calculate the x-coordinates (x_L, x_R) of the two eyes.

    Then, the roughly estimated left eye center point is P_L(x_L, y_L) and the roughly estimated right eye center is P_R(x_R, y_R). In Fig. 11, the symbol "*" denotes the rough eye center points calculated by the IPF method.

  • Step 4. Extract the eye windows based on the roughly estimated eye center points. Suppose the distance between P_L and P_R is d. Then the eye windows are rectangles of size 0.6d × 0.3d. The coordinates of the top left corner of the right eye window can be set as (x_border, y_R − 0.15d), and the coordinates of the top left corner of the left eye window can be set as (x_border + d, y_L − 0.15d). The value of parameter x_border can be set within 0–10 pixels. For the right image in Fig. 11, e.g., with x_border = 5 pixels and d = 120 − 36 = 84 pixels, the top left corner of the right eye window is at (5, 43) and the top left corner of the left eye window is at (89, 38). The two eye windows have the same size of 50 × 25 pixels. A code sketch of this procedure is given below.
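A minimal sketch of Steps 1–4, assuming a 150 × 150 detected face image; restricting the projection search to the upper half of the face and working on the Otsu binary image are assumptions of this sketch.

```python
import cv2
import numpy as np

def rough_eye_windows(face_bgr, x_border=5):
    """Roughly locate the two eye centers by integral projection and return
    the eye windows described in Step 4 (coordinates in pixels)."""
    gray = cv2.equalizeHist(cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY))      # Step 1
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)       # Otsu, dark pixels -> 255
    h, w = binary.shape
    centers = {}
    # The user's right eye appears in the left half of the image and vice versa.
    for name, x0, x1 in (("right", 0, w // 2), ("left", w // 2, w)):
        half = binary[:h // 2, x0:x1]                 # assumption: eyes lie in the upper face half
        y_c = int(np.argmax(half.sum(axis=1)))        # horizontal projection, cf. (15), Step 2
        x_c = x0 + int(np.argmax(half.sum(axis=0)))   # vertical projection, cf. (16), Step 3
        centers[name] = (x_c, y_c)
    d = abs(centers["left"][0] - centers["right"][0])                        # Step 4
    win_w, win_h = int(0.6 * d), int(0.3 * d)
    right_win = (x_border, int(centers["right"][1] - 0.15 * d), win_w, win_h)
    left_win = (x_border + d, int(centers["left"][1] - 0.15 * d), win_w, win_h)
    return centers, right_win, left_win
```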

Fig. 11

Eye windows extraction

Sometimes the eye center points estimated by the IPF method fall outside the true eye sockets, and the eye windows calculated from them may not include the true eyes. To avoid missing the correct eye windows, we can use the first frame of the continually sampled face images to manually pre-estimate the range of the eye sockets before starting to extract the eye windows. If the eye center points calculated by IPF fall outside the pre-estimated eye range, then the pre-estimated eye sockets are used as the eye windows instead of the rectangles calculated from the IPF results.

In actual application, images are sampled continually, so the head pose changes little between any two neighboring frames. In this way, we successfully track the eye sockets in all 390 detected face images at the different azimuths.

3.4 Pupil center locating

The eye is a slightly asymmetrical globe. The iris is the pigmented part of the eye, and the pupil is the black circular opening in the iris that lets light in.

When the eyes move, the position of the pupils changes accordingly. According to the trajectory of the pupil center, different eye movements (left, right, up and down) can be recognized. Strictly speaking, the pupil does not appear as a circle when the eye moves to the corner, as shown in the left image of Fig. 11, i.e., the geometric pupil center may not be visible at that moment. So, the pupil center mentioned in this paper means the central point of the black region of the eye.

The popular method for obtaining the coordinates of the pupil center is to calculate the centroid of the connected area obtained from edge detection on the eyes[26]. In this method, the centroid is regarded as the pupil center.

Among the edge detection algorithms, the Canny operator based method[27] is commonly adopted. Because the two eyes always move in the same direction at any time, we only need to detect the edge of one eye with the Canny operator. However, sometimes the desired centroid cannot be obtained, and the failure may be caused by the Canny edge detection algorithm itself. The Canny method needs two threshold values to detect and connect the eye edge. When the user does not face the camera directly, i.e., the user faces to the left or right of the camera with a deflection angle, the threshold values used in the Canny method need to be adapted to the deflection direction in real time. If fixed threshold values are used, the eye edge may be detected incompletely. Fig. 12 shows two failure cases of eye edge detection: after Canny edge detection on the right eye there is either no properly connected area or more than one connected area, so the correct centroid corresponding to the pupil center cannot be obtained.
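For comparison, a sketch of this conventional centroid-after-Canny approach with fixed thresholds (the very setting that can fail for turned faces); the threshold values and the morphological clean-up step are assumptions of this sketch.

```python
import cv2
import numpy as np

def pupil_center_by_canny(eye_gray, low=50, high=150):
    """Centroid of the largest connected edge region; returns None when no
    usable region is found, as in the failure cases of Fig. 12."""
    edges = cv2.Canny(eye_gray, low, high)
    edges = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    return int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])  # centroid taken as pupil center
```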

Fig. 12

Failure cases of the eye edge detection

The pupil is always darker than the other parts of the eye. Based on this feature, we propose a new method to obtain accurate coordinates of the pupil center. The detailed steps of locating the pupil center are as follows.

  • Step 1. Filter the right eye window with the Gaussian filter template g = [0 1 0; 1 1 1; 0 1 0].

  • Step 2. Segment the right eye window into 3 equal parts in the horizontal and vertical directions respectively, i.e., into 3 × 3 sub-areas of equal size, as shown in Fig. 13.

  • Step 3. Count the number N of black pixels in each sub-area. Normally, the pupil sub-area has the maximal value of N.

  • Step 4. Search for the point with the minimal intensity value in the pupil sub-area, which is taken as the pupil center point (the red "*" symbol in Fig. 13). A code sketch of these four steps follows.
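A minimal sketch of the four steps, assuming the eye window is available as a gray-scale array; the normalisation of the template and the dark-pixel threshold in Step 3 are assumptions of this sketch, since the text only states that the pupil sub-area contains the most black pixels.

```python
import cv2
import numpy as np

# Template g from Step 1, normalised here so that filtering averages the
# five covered pixels (the normalisation is an assumption of this sketch).
G_TEMPLATE = np.array([[0, 1, 0],
                       [1, 1, 1],
                       [0, 1, 0]], dtype=np.float32) / 5.0

def pupil_center(eye_gray, dark_thresh=60):
    """Locate the pupil center in a gray-scale eye window following Steps 1-4."""
    smoothed = cv2.filter2D(eye_gray, -1, G_TEMPLATE)              # Step 1: smoothing
    h, w = smoothed.shape
    best_count, best_cell = -1, (0, 0)
    for i in range(3):                                             # Step 2: 3 x 3 sub-areas
        for j in range(3):
            cell = smoothed[i * h // 3:(i + 1) * h // 3, j * w // 3:(j + 1) * w // 3]
            count = int((cell < dark_thresh).sum())                # Step 3: black-pixel count
            if count > best_count:
                best_count, best_cell = count, (i, j)
    i, j = best_cell
    cell = smoothed[i * h // 3:(i + 1) * h // 3, j * w // 3:(j + 1) * w // 3]
    dy, dx = np.unravel_index(int(np.argmin(cell)), cell.shape)    # Step 4: darkest point
    return j * w // 3 + int(dx), i * h // 3 + int(dy)              # (x, y) in eye-window coordinates
```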

Fig. 13

Segment the eye window into 3 × 3 sub-areas

Following these four steps, we calculate the pupil center coordinates of the right eyes at different azimuths, as listed in Table 3. The first column, "Azimuth", gives the deflection angle between the user and the camera. The calculated pupil center point is represented by its x-y coordinates in pixels on the detected face image.

Table 3 x-y coordinates of the right pupil center in 0°–30° azimuths (unit: pixel)

We use the criterion provided by Jesorsky et al.[28] to evaluate the quality of pupil center localization. The accuracy of pupil center localization over the 28 face images corresponding to Table 3 is 96.43%. This shows that the proposed method can successfully locate the pupil center in non-frontal faces within ±30° side directions.
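For reference, a sketch of the Jesorsky-style relative error: the larger of the two eye-localisation errors is normalised by the inter-ocular distance, and a result is commonly counted as correct when the ratio is at most 0.25. The coordinates below are hypothetical.

```python
import numpy as np

def jesorsky_error(est_left, est_right, true_left, true_right):
    """Largest single-eye localisation error normalised by the inter-ocular distance."""
    d_left = np.linalg.norm(np.subtract(est_left, true_left))
    d_right = np.linalg.norm(np.subtract(est_right, true_right))
    return max(d_left, d_right) / np.linalg.norm(np.subtract(true_left, true_right))

# Hypothetical coordinates (pixels) for one face image.
err = jesorsky_error((89, 52), (38, 50), (91, 53), (40, 51))
print(f"relative error = {err:.3f} ->", "correct" if err <= 0.25 else "wrong")
```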

3.5 Eye movements identification

The pupil center changes with the eye movements. According to the continually changing coordinates of the pupil center, we can recognize different eye movements (left, right, up and down). Fig. 14 shows the relationship between the eye movements and the pupil centers of any two neighboring images. When the eyes move to the left, the x-coordinate of the pupil center increases; on the contrary, the x-coordinate decreases when the eyes move to the right. Similarly, the y-coordinate decreases when the eyes move up and increases when the eyes move down. Thus, we can identify the eye movements left, right, up and down. The prerequisite is that the images representing the eye movements must be sampled continually.
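A minimal sketch of how consecutive pupil-center coordinates could be mapped to the four movement labels under the sign conventions of Fig. 14; the per-frame displacement threshold is an assumption.

```python
def classify_eye_movement(prev_center, curr_center, min_shift=3):
    """Label the movement between two consecutive pupil-center coordinates."""
    dx = curr_center[0] - prev_center[0]   # x grows when the eyes move left (Fig. 14)
    dy = curr_center[1] - prev_center[1]   # y grows when the eyes move down (Fig. 14)
    if abs(dx) < min_shift and abs(dy) < min_shift:
        return "still"
    if abs(dx) >= abs(dy):
        return "left" if dx > 0 else "right"
    return "down" if dy > 0 else "up"

# Example with pupil centers (pixels) from three consecutive frames.
trajectory = [(62, 48), (70, 49), (71, 42)]
print([classify_eye_movement(a, b) for a, b in zip(trajectory, trajectory[1:])])
# -> ['left', 'up']
```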

Fig. 14

Relationship between eye movements and pupil center

4 Experimental example

An experiment is done to validate the proposed methods. In the experiment, the distance between the user and the camera is about 1 m. At first, the user faces the video camera directly, i.e., the deflection angle between the user and the video camera is 0°. Then, the user turns to the left or right side of the video camera so that the deflection angle becomes ±10°, ±20°, ±30° ("+" corresponding to the right side and "−" to the left side). For each azimuth, 30 images are captured at a rate of 2 fps for the horizontal direction and for the vertical direction, respectively.

While sampling the images of eye movements, the user is asked to move his eyes in a certain sequence, as in Fig. 15. Through the procedures of face detection, eye window extraction and pupil center localization, a series of x-y coordinates of the right-eye pupil center points is obtained.

Fig. 15

Sequence of eye movements in the experiment

We plot these coordinates in chronological order on the Matlab platform. Fig. 16 plots the coordinate values of the right pupil center, with Figs. 16 (a)–(g) corresponding to the 0° to ±30° azimuths, respectively. The x axis represents the time sequence while the y axis represents the coordinate values. The blue "●" line and the red "■" line show the changing x-coordinate and y-coordinate of the right-eye pupil center when the user looks left or right. The green "▲" line and the purple "*" line show the changing x-coordinate and y-coordinate of the right-eye pupil center when the user looks upwards or downwards. Note that these x- and y-coordinates take the eye socket as the reference.

Fig. 16

Plots of coordinate values of pupil center in different directions

By observing Fig. 16 carefully, we can see that the horizontal eye movements and the vertical eye movements are recognized separately.

  1) From the blue "●" lines, we can tell the eye movements in the horizontal direction. When a blue line goes up, i.e., the x-coordinate increases, the eyes move left; on the contrary, a blue line going down means the eyes move right. For example, in Fig. 16 (a), one period of horizontal eye movement includes four segments: the first segment (points 1–5, C–L) represents the eyeball moving from the center of the eye socket to the left corner; the second segment (points 5–9, L–C) represents the eyeball returning from the left corner to the center; the third segment (points 11–14, C–R) represents the eyeball moving from the center to the right corner; and the fourth segment (points 16–19, R–C) represents the eyeball returning from the right corner to the center. The x-coordinate values of points 9–11 are the same, which means the eyeball is resting at the center of the eye socket. From point 20, another cycle of horizontal eye movement begins.

  2) From the red "■" lines, we find that the changes of the y-coordinate are not noticeable when the eyes move in the horizontal direction.

  3) Similarly, we can tell the eye movements in the vertical direction from the purple "*" lines. If a purple line goes down, i.e., the y-coordinate decreases, the eyes move up; on the contrary, a purple line going up means the eyes move down. For example, in Fig. 16 (a), one period of vertical eye movement includes four segments: the first segment (points 1–7, C–U) represents the eyeball moving from the center of the eye socket to the upper edge; the second segment (points 7–11, U–C) represents the eyeball returning from the upper edge to the center; the third segment (points 11–15, C–D) represents the eyeball moving from the center to the lower edge; and the fourth segment (points 15–18, D–C) represents the eyeball returning from the lower edge to the center. From point 19, another period of vertical eye movement begins.

  4) From the green "▲" lines, we find that the changes of the x-coordinate are not noticeable when the eyes move in the vertical direction.

  5) According to the trajectory of the pupil center, the eye movements are identified. Comparing the recognized results with the actually sampled eye movement images, we obtain the eye movement identification rate at each azimuth (as shown in Table 4), which equals the ratio of the number of correct identifications to the total number of frames (30 frames at each azimuth). In Fig. 16 (a), when the deflection angle between the video camera and the user is zero, the relationship between the eye movements and the trajectory of the pupil center is clear. As the deflection angle increases, it becomes more difficult to identify the eye movements and the identification accuracy decreases, as shown in Table 4. We also see that the identification accuracy for horizontal eye movements is higher than that for vertical eye movements.

Table 4 Eye movement identification rate at 0° to ±30° azimuths

5 Conclusions

The experiments show that our method can identify horizontal and vertical eye movements in 420 non-frontal face images within ±30° with an average identification rate of 86.67%. Furthermore, we sampled face images from different users wearing glasses. The results show that we can still detect these faces, locate the pupil centers within ±30° side directions of the camera, and successfully recognize the eye movements.

The success of the simulation experiments on the Matlab platform gives us confidence and encourages us to realize the physical system with electrical devices, which is our next work.