James L. Crowley
Robotics and Autonomous Systems, No. 19, 1997
Summary
The paper introduces different methods for face recognition in order to enhance
communication between humans and robots. Seeing the face is considered an
important element in human-to-human communication, and the author therefore
believes it is just as important when interacting with a machine.
Computer vision, which is easily achievable today thanks to an exponential
decrease in cost, permits tracking, identification and observation. The author
introduces a machine vision technique that differs from the more traditional
ones based on contrast contours, which are precise only in the presence of
polygonal shapes (and are therefore not well suited to finger or head
tracking).
Human gestures are mainly divided into 3 types: semiotic (communicating useful
information), ergotic (associated with the notion of work) and epistemic
(allowing humans to learn from the environment through tactile experience).
Human-Computer Interaction has long investigated devices for ergotic functions
(such as keyboards, mice, etc.). The author believes that computer vision will
open a new world in this field, being a non-intrusive, fast and robust sensor.
Tracking is then defined by the author as follows: “Given an observation of an object at time t, determine the most likely
position of the same object at time t+ΔT”.
The proposed tracking technique is correlation, demonstrated on a “Digital
Desk”. It uses the sum of squared differences (SSD) between a reference
template and the image neighborhood at each pixel (i, j). Expanding this
function yields terms representing the energy in the image neighborhood and the
energy in the reference window. The size of the reference template must be
chosen so that it is neither corrupted by the background nor too uniform. The
size of the search region, in turn, is computed according to the tracking speed
and is derived experimentally. When no tracking is in progress, the system
monitors an N x N tracking trigger, and the energy of the resulting difference
image is computed to determine when the tracking device is adequately
positioned (when the energy drops below the threshold). The proposed method is
then applied to face recognition and compared with other common methods, such
as finding face color and blink detection.
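The SSD search described above can be sketched roughly as follows. This is a minimal illustration, not the implementation from the paper: the function names `ssd` and `track`, the `radius` parameter for the search region, and the tiny grayscale frame are all assumptions made for the example. Images are plain nested lists of intensities.

```python
def ssd(image, template, top, left):
    """Sum of squared differences between the template and the image
    window whose upper-left corner is at (top, left)."""
    total = 0
    for m, row in enumerate(template):
        for n, ref in enumerate(row):
            diff = image[top + m][left + n] - ref
            total += diff * diff
    return total


def track(image, template, prev_top, prev_left, radius):
    """Search a square region around the previous position and return
    the position with the lowest SSD, together with that score.
    `radius` is a hypothetical stand-in for the search-region size."""
    t_h, t_w = len(template), len(template[0])
    i_h, i_w = len(image), len(image[0])
    best_pos, best_score = None, None
    # Clamp the search region so the template always fits in the image.
    for top in range(max(0, prev_top - radius),
                     min(i_h - t_h, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius),
                          min(i_w - t_w, prev_left + radius) + 1):
            score = ssd(image, template, top, left)
            if best_score is None or score < best_score:
                best_pos, best_score = (top, left), score
    return best_pos, best_score


# Illustrative data: the 2 x 2 template appears in the frame at row 2, column 3.
frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 7, 0],
    [0, 0, 0, 8, 9, 0],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 7], [8, 9]]
position, score = track(frame, template, prev_top=1, prev_left=2, radius=2)
```

A larger `radius` tolerates faster motion but costs more comparisons per frame, which matches the trade-off between search-region size and tracking speed mentioned in the summary.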
SSD proves useful because it is more precise than color recognition and does
not require capturing an image pair during a blink. The method is then
developed further through eigenspace decomposition, which treats the pixels of
an image as one very long vector and derives prototypical faces that can serve
as a basis for describing other images. This kind of approach may become
expensive in the long term, as a large number of images accumulates in the
database.
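To make the eigenspace idea concrete, here is a small sketch under strong simplifying assumptions: each image is already flattened into a list of numbers, and only the single dominant basis vector is estimated, using power iteration. The function names and the toy four-pixel "faces" are hypothetical, and a real system would keep several basis vectors rather than one.

```python
import math


def mean_vector(vectors):
    """Average image: the mean of each pixel across all vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def first_basis_vector(vectors, iterations=50):
    """Estimate the dominant principal direction of the image set by
    power iteration, without forming the full covariance matrix."""
    mean = mean_vector(vectors)
    centered = [[x - m for x, m in zip(v, mean)] for v in vectors]
    # Start from the centered image with the largest norm.
    direction = max(centered, key=lambda row: sum(x * x for x in row))
    for _ in range(iterations):
        weights = [sum(a * b for a, b in zip(row, direction))
                   for row in centered]
        direction = [sum(w * row[k] for w, row in zip(weights, centered))
                     for k in range(len(mean))]
        norm = math.sqrt(sum(x * x for x in direction))
        if norm == 0:
            raise ValueError("all images are identical; no direction to find")
        direction = [x / norm for x in direction]
    return mean, direction


def project(image, mean, direction):
    """Coefficient of the image along the basis vector."""
    return sum((x - m) * d for x, m, d in zip(image, mean, direction))


def reconstruct(coefficient, mean, direction):
    """Approximate image rebuilt from the mean and one coefficient."""
    return [m + coefficient * d for m, d in zip(mean, direction)]


# Toy "faces" of four pixels each, all lying along one direction.
faces = [[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]
mean, basis = first_basis_vector(faces)
new_face = [2.5, 5.0, 7.5, 10.0]
coefficient = project(new_face, mean, basis)
approximation = reconstruct(coefficient, mean, basis)
```

Each stored image is reduced to a handful of coefficients, but the basis has to be recomputed as the collection grows, which is one way to read the cost concern raised above.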
In principle, the method is also suited to the eye tracking problem, with the
future aim of turning it into a pointing device for the computer that replaces
the mouse.
The other techniques mentioned are color recognition, which is essentially
based on Bayes' rule and finds the probability that a pixel is skin given its
color, and eye blink detection, which compares two different frames to
determine whether the target is a face or not.
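The Bayes' rule step for skin color can be illustrated with two color histograms. With p(color | skin) = h_skin(color) / N_skin, p(skin) = N_skin / N_total and p(color) = h_total(color) / N_total, the posterior p(skin | color) reduces to the ratio h_skin(color) / h_total(color). The sketch below assumes hand-labelled training pixels and a coarse quantization into normalized (r, g) bins; both the quantization and the function names are choices made for this example, not details taken from the summary.

```python
from collections import Counter


def quantize(pixel, bins=8):
    """Map an (R, G, B) pixel to a coarse bin of normalized (r, g)
    chromaticity. The bin count is an arbitrary choice for the sketch."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        return (0, 0)
    return (min(bins - 1, r * bins // total),
            min(bins - 1, g * bins // total))


def build_histograms(pixels, skin_flags):
    """Count how often each color bin occurs overall and among skin pixels."""
    skin_hist, all_hist = Counter(), Counter()
    for pixel, is_skin in zip(pixels, skin_flags):
        color = quantize(pixel)
        all_hist[color] += 1
        if is_skin:
            skin_hist[color] += 1
    return skin_hist, all_hist


def skin_probability(pixel, skin_hist, all_hist):
    """p(skin | color) as the ratio of the two histogram counts."""
    color = quantize(pixel)
    if all_hist[color] == 0:
        return 0.0  # color never observed in training
    return skin_hist[color] / all_hist[color]


# Hypothetical labelled training pixels: two skin, two non-skin.
training_pixels = [(200, 120, 100), (210, 125, 105),
                   (50, 60, 200), (205, 122, 102)]
training_labels = [True, True, False, False]
skin_hist, all_hist = build_histograms(training_pixels, training_labels)
p_skin_like = skin_probability((200, 120, 100), skin_hist, all_hist)
p_blue = skin_probability((50, 60, 200), skin_hist, all_hist)
```

Thresholding this probability per pixel gives a candidate face region, which is fast but, as noted above, less precise than the SSD approach.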
Key Concepts
Face Recognition, Human-Machine Interaction
Key Results
The author obtained successful results with all the proposed methods and aims
to improve the implementation in the future. The importance of detecting faces
is underlined for video communication. It also matters for speech recognition
through detection of lip movement, for perceiving whether the operator is
paying attention to a monitor or a certain activity, and even just for being
aware that a human is present nearby.
To obtain full results in this field, active computer control of the camera's
direction, zoom, focus and aperture is required but, as the author already
stated back in 1997, the technology is at hand and at a relatively low price.