
Tuesday, October 2, 2012

Human-Robot Interaction Through Gesture-Free Spoken Language


Vladimir Kulyukin
Autonomous Robots, Vol. 16, 2004
Summary
Human-robot interaction has become increasingly important in challenging activities, from mining to space walks; tight integration of perception, language and vision is therefore essential for cooperating in such environments, and a GUI is no longer an acceptable interface. It is believed that everyday language is somehow grounded in visually guided activities, and deep integration must be achieved at this level. The proposed system is a speech-based method for commanding a robot to perform certain grasping tasks (such as fetching a "pepsi can"). Two modes of interaction are possible: local, when the robot and the operator are in each other's sight, and remote, which requires a camera so that the human can see the robot's environment.

The robot has a standard 3T architecture with three tiers of functionality: deliberation, execution and control. The execution tier, which receives from the deliberation tier the goals to be satisfied, is implemented with the Reactive Action Package (RAP) system: a RAP is a set of methods for achieving a specific goal under different circumstances, and the steps needed to execute a method form a task net. A RAP may operate whether or not it knows where the object is, and it becomes a task when its index clause matches a task description. The control tier contains the robot's skills, which are enabled when the control tier is asked to execute them; the control tier is not aware of success or failure, since these are monitored by the execution tier.
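The RAP mechanism described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the paper's implementation; all names (`Rap`, `Method`, `index_clause`, the skill names in the task nets) are assumptions made for the example.

```python
# Hypothetical sketch of RAP-style task selection (not the paper's code).
# A RAP bundles several methods for one goal; the first method whose
# context test succeeds in the current memory supplies the task net.

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Method:
    context: Callable[[Dict], bool]   # applicability test against robot memory
    task_net: List[str]               # ordered steps (skills or subtasks)

@dataclass
class Rap:
    index_clause: Tuple[str, ...]     # pattern such as ("grasp", "?object")
    methods: List[Method] = field(default_factory=list)

    def matches(self, task: Tuple[str, ...]) -> bool:
        # The index clause matches a task description of the same length
        # when every constant element agrees; "?x" elements bind anything.
        return len(self.index_clause) == len(task) and all(
            p.startswith("?") or p == t
            for p, t in zip(self.index_clause, task))

    def select(self, memory: Dict) -> List[str]:
        # Pick the first method whose context holds in the current memory.
        for m in self.methods:
            if m.context(memory):
                return m.task_net
        return []

# Example: a grasp RAP with one method for a known object location
# and a fallback method that searches for the object first.
grasp = Rap(("grasp", "?object"), [
    Method(lambda mem: "object-location" in mem,
           ["move-to-object", "close-gripper", "lift"]),
    Method(lambda mem: True,
           ["search-for-object", "move-to-object", "close-gripper", "lift"]),
])

print(grasp.matches(("grasp", "pepsi-can")))      # True
print(grasp.select({"object-location": (2, 3)}))  # ['move-to-object', 'close-gripper', 'lift']
```

The two methods mirror the summary's point that a RAP may operate knowing or not knowing where the object is: the same goal is served by different task nets depending on what the robot's memory asserts.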
Actions are categorized as internal (manipulating the robot's memory) and external (manipulating external objects or moving); the output of a skill is a set of symbolic assertions that the skill places in the robot's memory. Goals, objects and actions are part of a semantic network in which each node represents a memory organization package (MOP), and nodes are connected by abstraction links or packaging links. Nodes are activated by an activation algorithm based on direct memory access parsing (DMAP), driven by token sequences: for each token T, all expectations that expect T are advanced; for each expectation thereby completed, the target MOP is activated; and for each activated MOP that has callbacks (Kulyukin and Settle, 2001), the callbacks are run. Object recognition is performed through the Generalized Hamming Distance (GHD), a modification of the classical Hamming distance capable of approximate matching, together with color histogram models (CHM).

The system needs several pieces of a priori knowledge: the semantic network of goals and objects, the library of object models, the context-free command-and-control grammars for speech recognition, the library of RAPs, and the library of robotic skills. Interaction is based on passive clarification of knowledge through goal disambiguation; mutual understanding is necessary, so human and robot must share cognitive machinery. Voice recognition is performed through the Microsoft Speech API (SAPI), a middle layer between an application and a speech recognition engine; it includes the Microsoft English SR Engine Version 5, which has a 60,000-word English vocabulary and can be supplied with other languages. SAPI converts voice input into a token sequence, which drives the activation algorithm discussed above. Goal ambiguity can be sensory, mnemonic or linguistic, and the robot is capable of asking for clarification in ambiguous situations.
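The DMAP activation loop described above (advance expectations on each token, activate the target MOP when a pattern completes, then run its callbacks) can be sketched as follows. This is a simplified hypothetical illustration, not the paper's parser; the class and method names are assumptions, and partial matches are kept alive across mismatched tokens for brevity.

```python
# Hypothetical sketch of DMAP-style token-sequence activation
# (simplified; not the paper's actual code). Each MOP carries a phrase
# pattern; an expectation records how much of that pattern the token
# stream has matched so far.

from collections import defaultdict

class Dmap:
    def __init__(self):
        self.patterns = {}                    # MOP name -> token tuple
        self.callbacks = defaultdict(list)    # MOP name -> callback list
        self.activated = []

    def add_mop(self, name, pattern, callback=None):
        self.patterns[name] = tuple(pattern)
        if callback:
            self.callbacks[name].append(callback)

    def parse(self, tokens):
        # An expectation is (MOP name, index of the next expected token).
        expectations = [(m, 0) for m in self.patterns]
        for tok in tokens:
            advanced = []
            for mop, i in expectations:
                if self.patterns[mop][i] == tok:
                    i += 1                              # advance the expectation
                    if i == len(self.patterns[mop]):
                        self.activated.append(mop)      # target MOP activated
                        for cb in self.callbacks[mop]:
                            cb(mop)                     # run attached callbacks
                        continue
                advanced.append((mop, i))
            expectations = advanced
        return self.activated

dmap = Dmap()
dmap.add_mop("grasp-can", ["pick", "up", "the", "can"],
             callback=lambda m: print("activated:", m))
dmap.add_mop("stop", ["stop"])
print(dmap.parse(["pick", "up", "the", "can"]))   # ['grasp-can']
```

In the full system the token sequence would come from SAPI rather than a hard-coded list, and callbacks would post goals to the execution tier instead of printing.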
A great advantage of the system is that it allows introspection: the operator can ask the robot what it is capable of doing. This makes the machine easy to use for non-expert operators and improves the learning performance of both robot and human.
Key Concepts
Human-Robot Interaction
Key Results
Experiments have been performed proving the importance of introspection and showing that it is mainly used during the first interactions and abandoned as the operator learns. Limitations remain to be overcome: the system cannot handle deictic references, quantification or negation, and it can interact with only one person at a time.
