

Tuesday, October 2, 2012

Human-Robot Interaction Through Gesture-Free Spoken Language


Vladimir Kulyukin
Autonomous Robots, Vol. 16, 2004
Summary
                  Human-robot interaction has become increasingly important in demanding activities, from mining to space walks, so a tight integration of perception, language and vision is essential for cooperating in such environments; a GUI interface is no longer acceptable. It is believed that everyday language is somehow grounded in visually-guided activities, and a deep integration must be achieved at this level. The proposed system lets an operator command a robot by speech to perform certain grasping tasks (such as fetching a "Pepsi can"). Two modes of interaction are possible: local (when the robot and the operator are in each other's sight) and remote (which requires a camera so that the human can understand the robot's environment). The robot uses a standard 3T architecture, with three tiers of functionality: deliberation, execution and control. The execution tier (which receives from the deliberation tier the goals to be satisfied) is implemented with the Reactive Action Package (RAP) system; a RAP is a set of methods for achieving a specific goal under different circumstances. The steps needed to execute a method are organized in task nets, a RAP may operate with or without knowing where the object is, and a RAP becomes a task when its index clause matches a task description. The control tier contains the robot's skills; these are enabled when the control tier is asked to execute them and are not aware of success or failure, since monitoring is left to the execution tier.
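To make the RAP dispatching concrete, here is a minimal, hypothetical Python sketch of how an index clause could select a RAP for an incoming task description; the class and field names are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RAP:
    """A Reactive Action Package: an index clause plus several methods
    (task nets) for achieving the same goal under different circumstances."""
    index_clause: dict   # pattern a task description must match
    methods: list        # candidate task nets, tried by applicability

    def matches(self, task: dict) -> bool:
        # A RAP becomes a task when its index clause matches the description.
        return all(task.get(k) == v for k, v in self.index_clause.items())

def dispatch(task: dict, library: list) -> Optional[RAP]:
    """Return the first RAP in the library whose index clause matches the task."""
    for rap in library:
        if rap.matches(task):
            return rap
    return None

# Hypothetical usage: a grasping task description routed to a grasping RAP.
grasp_rap = RAP(index_clause={"action": "grasp"},
                methods=["grasp-with-known-location", "search-then-grasp"])
print(dispatch({"action": "grasp", "object": "pepsi-can"}, [grasp_rap]))
```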
Actions are then categorized as internal (for manipulating the robot's memory) and external (for manipulating external objects or for moving operations); the output of a skill is a set of symbolic assertions that the skill puts into the robot's memory. Goals, objects and actions are part of a semantic network, where each node represents a memory organization package (MOP), and nodes are connected to each other with abstraction links or packaging links. Nodes are activated by the activation algorithm of direct memory access parsing (DMAP), driven by token sequences: for a token T, all expectations that expect T are advanced; the expectations completed in this way activate their target MOPs; and for each activated MOP that has callbacks attached (Kulyukin and Settle, 2001), the callbacks are run. Object recognition is performed through the Generalized Hamming Distance (GHD), a modification of the classical Hamming distance capable of measuring approximate similarity, and through color histogram models (CHM). The system needs several pieces of a priori knowledge: the semantic network of goals and objects, the library of object models, the context-free command-and-control grammars for speech recognition, the library of RAPs and the library of robotic skills. Interaction is based on gradual clarification of knowledge through goal disambiguation; mutual understanding is necessary, making this a shared cognitive machinery situation. Voice recognition is performed through the Microsoft Speech API (SAPI), a middle layer between an application and a speech recognition engine; it includes the Microsoft English SR Engine Version 5, with a 60,000-word English vocabulary, and can be provided with other languages. SAPI converts voice input into a token sequence, which drives the activation algorithm discussed above. Goal disambiguation can be sensory, mnemonic or linguistic, and the robot is capable of asking for clarification in ambiguous situations. A great advantage of the system is that it allows introspection: the operator can ask the robot what it is capable of doing, which makes the machine easy to use for non-expert operators and improves the learning performance of both robot and human.
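Below is a minimal sketch of the DMAP-style activation loop described above, in Python; the data structures and names are hypothetical, meant only to illustrate how token sequences advance expectations and trigger MOP callbacks.

```python
# Hypothetical DMAP-style activation: expectations advance on tokens,
# completed expectations activate their target MOPs, and MOP callbacks run.
class Expectation:
    def __init__(self, target_mop, token_pattern):
        self.target_mop = target_mop          # MOP activated when pattern is consumed
        self.remaining = list(token_pattern)  # tokens still expected, in order

    def advance(self, token):
        if not self.remaining:
            return False                      # already completed earlier
        if self.remaining[0] == token:
            self.remaining.pop(0)
        return not self.remaining             # True when fully satisfied

def activate(tokens, expectations, callbacks):
    """Feed a token sequence through the expectations; fire the callbacks of
    every MOP whose expectation completes."""
    activated = []
    for token in tokens:
        for exp in expectations:
            if exp.advance(token):
                activated.append(exp.target_mop)
                for cb in callbacks.get(exp.target_mop, []):
                    cb()
    return activated

# Hypothetical usage: "grab the pepsi can" activates the GRASP-PEPSI-CAN MOP.
exps = [Expectation("GRASP-PEPSI-CAN", ["grab", "pepsi", "can"])]
cbs = {"GRASP-PEPSI-CAN": [lambda: print("posting grasp goal to the RAP system")]}
print(activate(["grab", "the", "pepsi", "can"], exps, cbs))
```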
Key Concepts
Human-Robot Interaction
Key Results
Experiments prove the importance of introspection, showing that it is used mainly during the first interactions and is gradually abandoned as the operator learns the system. Limitations remain to be overcome: the system cannot handle deictic references, quantification or negation, and it can interact with only one person at a time.

Monday, October 1, 2012

A New Formulation of Visual Servoing Based on Cylindrical Coordinate System


Masami Iwatsuki, Norimitsu Okiyama
IEEE Transactions on Robotics, Vol. 21, No. 2, April 2005
Summary
                  Visual servoing, a technique for controlling a robot manipulator through feedback from visual sensors, is known to be flexible and robust, and it can derive robot motion directly from 2D visual data. A great advantage is that the positioning accuracy of the robot is less sensitive to calibration errors (of both the robot and the camera) and to image measurement errors. Unfortunately, monocular servoing that uses points as primitives suffers from the so-called camera-retreat problem, where a rotation around the optical axis makes the camera move backwards. To overcome this problem, the literature proposes using straight lines as primitives or hybrid approaches that decouple the translational and rotational components; the authors instead propose a faster way of solving the problem, introducing cylindrical coordinates into the formulation of visual servoing and arbitrarily shifting the position of the origin (since making the origin coincide with the optical axis makes the coordinate system workable only in the pure-rotation case). The idea is to transform the Cartesian coordinate system into a cylindrical coordinate system, with x and y parallel to the image plane and z parallel to the optical axis. In the Cartesian approach, a point Pi is projected onto the image plane as pi = [x y]T, and the image-plane velocity of the feature point pi is p'i(x,y) = Ji r, where Ji is the image Jacobian and r is the velocity screw r = [vx vy vz ωx ωy ωz]T. A control law can then be defined by taking the error vector e(x,y) as the difference between the current and the desired feature points and minimizing it through r̂ = -λ J+ e(x,y), where λ is a constant gain and J+ is the pseudoinverse of J.
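As a concrete illustration of the Cartesian formulation, here is a minimal Python/NumPy sketch of the control law r̂ = -λ J+ e for point features; the explicit interaction-matrix form used here (which requires the feature depth Z and normalized image coordinates) is the standard point-feature Jacobian and is an assumption of this sketch, not text taken from the paper.

```python
import numpy as np

def point_jacobian(x, y, Z):
    """Standard interaction matrix (image Jacobian) of a normalized image
    point (x, y) at depth Z, relating image velocity to the camera velocity
    screw [vx vy vz wx wy wz]."""
    return np.array([
        [-1.0 / Z, 0.0,      x / Z,  x * y,      -(1 + x * x),  y],
        [0.0,     -1.0 / Z,  y / Z,  1 + y * y,  -x * y,       -x],
    ])

def cartesian_control_law(features, desired, depths, lam=0.5):
    """r_hat = -lambda * J^+ * e, stacking one 2x6 Jacobian per feature point."""
    J = np.vstack([point_jacobian(x, y, Z) for (x, y), Z in zip(features, depths)])
    e = (np.asarray(features) - np.asarray(desired)).reshape(-1)
    return -lam * np.linalg.pinv(J) @ e

# Hypothetical usage with two feature points at 1 m depth.
r = cartesian_control_law(features=[(0.1, 0.0), (-0.1, 0.05)],
                          desired=[(0.0, 0.0), (0.0, 0.0)],
                          depths=[1.0, 1.0])
print(r)  # commanded velocity screw [vx vy vz wx wy wz]
```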
·       Method
In the cylindrical coordinate system, (ξ,η) are the origin-shift parameters and x' and y' are the coordinates of the transformed image feature pi described previously. For the new cylindrical system, p'i(ρ,φ) is computed by applying a rotation matrix to the image-plane velocity. As in the previous computation, the Jacobian matrix and the velocity screw are introduced in the same fashion, this time taking into account the rotation matrix, which was not used before. Regarding the control law, the error vector ei can be computed as before (this time using the radius and the argument as coordinates), and in a similar fashion we obtain r̂ = -λ U J+ e(ρ,φ), where U is the orthogonal (rotation) matrix. A sketch of these coordinate conversions is given below.
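This is a minimal sketch, assuming the formulation above, of converting a shifted image point to cylindrical coordinates (ρ, φ) and of the rotation matrix that maps the Cartesian image-plane velocity into the cylindrical velocity components; variable names and the example values are illustrative.

```python
import numpy as np

def to_cylindrical(x, y, xi=0.0, eta=0.0):
    """Shift the origin by (xi, eta) and convert the image point to (rho, phi)."""
    xp, yp = x - xi, y - eta
    return np.hypot(xp, yp), np.arctan2(yp, xp)

def rotation_u(phi):
    """Orthogonal matrix mapping the Cartesian image-plane velocity (x'_dot, y'_dot)
    into the cylindrical velocity components (rho_dot, rho * phi_dot)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s], [-s, c]])

# Illustrative check: a point moving tangentially produces mostly phi motion.
rho, phi = to_cylindrical(0.2, 0.1, xi=0.05, eta=0.0)
v_cart = np.array([0.0, 0.01])               # image-plane velocity (x'_dot, y'_dot)
print(rho, phi, rotation_u(phi) @ v_cart)    # (rho_dot, rho * phi_dot)
```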
It is proven that the Cartesian approach is a particular case of the cylindrical representation, with ξ tending to minus infinity and η tending to 0. The paper also introduces an approach for choosing the origin-shift parameters. If m denotes the normalized homogeneous coordinates of the image-plane position of a feature point, we can define the least-squares error E(R) = Σ_{i=1..n} ||mgi - R msi||², where mgi is the initial image-plane position of feature i and msi is the desired image-plane position of feature i. R is the rotation matrix obtained as the product of V and U, the orthogonal matrices given by the singular value decomposition of the correlation matrix M. From the rotation matrix we can obtain the orientation of the axis of rotation and therefore p0 = [lx/lz  ly/lz]T (where li is the component of the rotation axis along axis i). The coordinates of p0 are the ones of interest for locating the origin-shift parameters.
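A minimal NumPy sketch of the origin-shift estimation described above, assuming the standard SVD-based solution of the least-squares rotation problem (orthogonal Procrustes); the exact ordering of the SVD factors, the reflection handling and the function names are assumptions of this sketch, not taken verbatim from the paper.

```python
import numpy as np

def best_rotation(m_goal, m_start):
    """Rotation R minimizing sum_i ||m_goal_i - R m_start_i||^2, via the SVD of
    the correlation matrix M (rows of the inputs are unit image rays)."""
    M = m_goal.T @ m_start                           # 3x3 correlation matrix
    U, _, Vt = np.linalg.svd(M)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])   # avoid a reflection
    return U @ D @ Vt

def origin_shift_point(R):
    """Rotation axis l of R (eigenvector with eigenvalue 1), projected as
    p0 = [lx/lz, ly/lz]."""
    w, v = np.linalg.eig(R)
    l = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return np.array([l[0] / l[2], l[1] / l[2]])

# Hypothetical usage: recover a known rotation axis from unit rays.
rng = np.random.default_rng(0)
axis = np.array([0.1, 0.2, 1.0]); axis /= np.linalg.norm(axis)
K = np.array([[0, -axis[2], axis[1]], [axis[2], 0, -axis[0]], [-axis[1], axis[0], 0]])
R_true = np.eye(3) + np.sin(0.3) * K + (1 - np.cos(0.3)) * (K @ K)  # Rodrigues
m_start = rng.normal(size=(5, 3)); m_start /= np.linalg.norm(m_start, axis=1, keepdims=True)
m_goal = m_start @ R_true.T
print(origin_shift_point(best_rotation(m_goal, m_start)))  # approx [0.1, 0.2]
```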
Key Concepts
Machine Vision, Visual Servoing
Key Results
The cylindrical system with shiftable origin has been tested and compared with the Cartesian system, producing the more efficient camera motion; it works for translation, rotation and their combination in 2D and 3D, as well as for general 3D motion with non-coplanar feature points.

Sunday, September 30, 2012

Adaptive Neural Network Control of Robot Manipulators in Task Space


Shuzhi S. Ge, C.C. Huang, L.C. Woon
IEEE Transactions on Industrial Electronics, Vol. 44, No. 6, December 1997
Summary
                  Vision sensors are extremely important for tracking and mapping solutions in robotics, and ego-motion estimation (recovering the camera's own motion) with omnidirectional cameras is an interesting way of fulfilling this field's requirements. Wide-angle imaging has already been integrated in autonomous navigation systems; the paper discusses the use of omnidirectional cameras for the recovery of observer motion (ego-motion). The ego-motion problem has been solved principally with a two-step algorithm: motion-field estimation (computation of optic flow) and motion-field analysis (extraction of camera translation and rotation from the optic flow); unfortunately, the second step is sensitive to noisy estimates of the optic flow. It is proved that a large field of view facilitates the computation of observer motion: a spherical field of view allows both the focus of expansion and the focus of contraction to exist in the image, so the authors use a spherical perspective projection model, which is convenient for fields of view greater than 180 degrees. An imaging system with a single center of projection is desirable, since it ensures the generation of pure perspective images and allows the image velocity vectors to be mapped onto a sphere. There are different methods for wide-angle imaging: rotating imaging systems (good for static scenes), fish-eye lenses (which have problems obtaining a single center of projection) and catadioptric systems (which incorporate reflecting mirrors and have proved successful in several cases). Hyperbolic and parabolic catadioptric systems are both able to capture at least a hemisphere of viewing directions about a single viewpoint; knowing the geometry of these systems and applying the spherical representation with the corresponding Jacobian transformations, it is possible to map the image velocity vectors onto the sphere.
·       Model
The motion field is defined as the projection of 3D velocity vectors onto a 2D surface. The rigid motion of a scene point P relative to a moving camera is P' = -T - Ω × P, where T is the translational and Ω the rotational velocity, and P̂ is the projection of the scene point P onto the sphere; the next step is to differentiate the projection function with respect to time and substitute it into the equation above, obtaining U(P̂), the velocity vector on the sphere. The ego-motion problem concerns the estimation of Ω, T and Ui, and three algorithms from the literature are considered: Bruss and Horn, Zhuang et al., and Jepson and Heeger (see pages 1001 and 1002 for the algorithms). In order to map motion in the image onto the sphere, we need to transform the image points onto the sphere and use the Jacobian to map the image velocities. The coordinate θ is the polar angle between the z-axis and the incoming ray and φ is the azimuth angle, while x and y are the rectangular image coordinates with the origin at the center of the image. Since the image plane is parallel to the x-y plane, φ is the same for all sensors (φ = arctan(y/x)), while the polar angle changes depending on whether the parabolic omnidirectional system, the hyperbolic omnidirectional camera or the fish-eye lens is in use (the latter does not have a single center of projection and introduces small errors). Finally, U = S J [dx/dt  dy/dt]T, where S is the transformation onto the sphere and J is the Jacobian matrix; each sensor has its own Jacobian matrix.
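Here is a minimal sketch of the mapping just described: image points (x, y) are lifted to the sphere via (θ, φ), and image velocities are mapped with a numerically evaluated Jacobian of the lifting map. The per-sensor polar-angle function theta_of_radius is left as a placeholder, since each sensor (parabolic, hyperbolic, fish-eye) has its own mapping in the paper; everything else in this sketch is an illustrative assumption.

```python
import numpy as np

def lift_to_sphere(x, y, theta_of_radius):
    """Map an image point (x, y) to a unit vector on the viewing sphere.
    theta_of_radius(r) is the sensor-specific polar angle of an image point
    at radius r from the image center (placeholder for the paper's formulas)."""
    phi = np.arctan2(y, x)                    # azimuth: same for all sensors
    theta = theta_of_radius(np.hypot(x, y))   # polar angle: sensor dependent
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def flow_to_sphere(x, y, xdot, ydot, theta_of_radius, eps=1e-6):
    """Map the image velocity (xdot, ydot) to the velocity of the lifted
    point on the sphere, using a numerical Jacobian of the lifting map."""
    p0 = lift_to_sphere(x, y, theta_of_radius)
    dpx = (lift_to_sphere(x + eps, y, theta_of_radius) - p0) / eps
    dpy = (lift_to_sphere(x, y + eps, theta_of_radius) - p0) / eps
    J = np.column_stack([dpx, dpy])           # 3x2 Jacobian of the lifting map
    return J @ np.array([xdot, ydot])

# Hypothetical usage with an arbitrary placeholder mapping theta = r.
print(flow_to_sphere(0.3, 0.1, 0.01, -0.02, theta_of_radius=lambda r: r))
```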
Key Concepts
Robot vision, Computer Vision, Omnidirectional Cameras, Ego-Motion
Key Results
The camera has been tested with the three models from the literature, showing that the non-linear algorithm (Bruss and Horn) is more accurate and more stable than the linear algorithms. With this paper, the authors proved that these algorithms, although originally designed for planar perspective cameras, can be adapted to omnidirectional cameras by mapping the optic flow field to a sphere via an appropriate Jacobian matrix.