
Monday, October 8, 2012

Human-Assisted Virtual Environment Modeling


J.G. Wang, Y.F. Li
Autonomous Robots, No. 6, 1999

Summary
The paper proposes a man-machine interactive stereo-vision system in which the operator's knowledge of the scene is used as guidance for modelling a 3D environment.
Virtual Environment (VE) modelling appears to be a key point in many robotic systems, especially with regard to tele-robotics. There has been much research on how to build a VE from vision sensors while exploring unknown environments, and on semi-automatic modelling with minimal human interaction. A good example is an integrated robotic manipulator system that uses virtual-reality visualization to create advanced, flexible and intelligent user interfaces (Chen and Trivedi, 1993; Trivedi and Chen, 1993). An interactive modelling system was also proposed to model remote physical environments through two CCD cameras, where edge information is used for stereo matching and triangulation to extract shape information, but that system was constrained to camera motion along the Z axis only.
The proposed system works so that the operator need provide only minimal cues about the features and information the manipulator or mobile robot may encounter. The procedure first builds local models from different viewpoints; these local models are then composed into a global model of the environment. Once the environment has been constructed virtually, the operator can fully concentrate on tele-operation.
Considering the use of two cameras, left and right, two transformation matrices can be obtained, $[H_R]$ and $[H_L]$; these relate the known image coordinates of feature points to the corresponding 3D feature points in the world frame $W$. Collecting the 3D vectors in $W$ as $[V_{3D}]$ and the corresponding 2D image vectors as $[V_{2D}]$, the calibration is $[H] = [V_{2D}][V_{3D}]^T([V_{3D}][V_{3D}]^T)^{-1}$, where $H$ decomposes into the left and right matrices. Conversely, once $[H_R]$ and $[H_L]$ are available, the position $[X] = [x, y, z]$ of a feature in $W$ can be calculated from its corresponding image coordinates $[x_a, y_a]$ and $[x_b, y_b]$ as $[X] = ([A]^T[A])^{-1}[A]^T[B]$, where $[A]$ and $[B]$ are built from the image coordinates.
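As an illustration of this triangulation step, here is a minimal sketch (not the authors' code; the names `H_L`, `H_R` and `triangulate` are assumptions) of recovering a 3D point by least squares from two 3x4 projection matrices:

```python
import numpy as np

def triangulate(H_L, H_R, uv_left, uv_right):
    """Least-squares 3D point from two calibrated views.

    H_L, H_R : 3x4 projection matrices of the left/right cameras
    uv_left  : (x, y) image coordinates of the feature, left image
    uv_right : (x, y) image coordinates, right image
    Returns X = [x, y, z] minimizing ||A X - B||.
    """
    rows, rhs = [], []
    for H, (u, v) in ((H_L, uv_left), (H_R, uv_right)):
        # Each view gives two linear equations in (x, y, z):
        # (u*H[2,:3] - H[0,:3]) X = H[0,3] - u*H[2,3], same for v.
        rows.append(u * H[2, :3] - H[0, :3]); rhs.append(H[0, 3] - u * H[2, 3])
        rows.append(v * H[2, :3] - H[1, :3]); rhs.append(H[1, 3] - v * H[2, 3])
    A, B = np.array(rows), np.array(rhs)
    # X = (A^T A)^{-1} A^T B, matching the paper's closed form
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    return X
```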
A major difficulty in stereo vision, however, is the correspondence problem between feature points in the two images, which suffers from poor robustness. A human operator can identify objects in most scenes, prompting the vision system to locate and detect certain object attributes or specific corresponding features, so that the image coordinates can be deduced and the 3D position in $W$ calculated.
A binocular stereo vision system, once guided by an operator to find some prompted corresponding features, can be used to construct local models of objects directly. The system works by recognizing primitive solids, from which composite models can later be computed. The authors introduce the cuboid (for which four points are detected) and the sphere (determined in 3D space from its radius and centre), both obtainable through geometrical calculations and transformations. Vertices of objects are found through the intersection of corresponding lines; for other, more complicated objects the operator's guidance can be used. In general a single viewpoint cannot successfully represent a 3D object, so more than one is required and Multi-Viewpoint Modelling is used. From two positions (for instance A and B) a transformation $M^{-1}$ is applied; in determining $M$, rotation and translation are solved separately. If $C$ and $C'$ represent the coordinate relationships at viewpoints A and B, then $C' = M^{-1}C$ and $W = M^{-1}W'$; after some computation $M = [R\ T]$, with $R$ the rotational component and $T$ the translational component.
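The paper solves rotation and translation separately; a common way to do this from corresponding 3D points is the SVD-based (Kabsch) alignment sketched below. This is an assumption for illustration, not necessarily the authors' exact computation:

```python
import numpy as np

def estimate_rigid_transform(P_a, P_b):
    """Estimate R, T such that P_b ~ R @ p + T for each point p.

    P_a, P_b : (N, 3) arrays of corresponding 3D points seen from
               viewpoints A and B. Rotation is solved first via SVD,
               then translation from the centroids.
    """
    ca, cb = P_a.mean(axis=0), P_b.mean(axis=0)   # centroids
    H = (P_a - ca).T @ (P_b - cb)                 # 3x3 covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                            # proper rotation (det = +1)
    T = cb - R @ ca                               # translation component
    return R, T
```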
Key Concepts
Machine Vision, Human-Robot Interaction
Key Results
Performance can be studied either through the difference between points and their images or through the difference between measured and real object sizes. The system also works for insertion tasks with an error of 0.6 mm; if a more precise system were needed, force sensing would be required. Operators can use this methodology to observe the real environment from any viewpoint in the virtual reality system.

Information Sharing via Projection Function for Coexistence of Robot and Human


Yujin Wakita, Shigeoki Hirai, Takashi Suehiro, Toshio Hori
Autonomous Robots, No. 10, 2001

Summary
The authors introduce a concept of safety based on intelligent augmentation of robotic systems. In previous studies the authors introduced tele-robotic systems (1992, 1995, 1996), where a robot is operated from a distance with no physical contact and monitored through a television, and intelligent monitoring (1992), a system that conveys only the required information through selection of the data. An extension of the latter is the snapshot function (1995), where a laser pointer helps during teaching mode to estimate the positional deviation, while the operator can move the robot, teaching the estimated relative deviation. A further development is the projection function proposed here (2001), where a robot and a human operate jointly through a Digital Desk, a special environment equipped with a speaker and a projector mounted perpendicular to the working table. The aim of this research is to achieve intelligent augmentation in order to prevent and avoid undesirable contact; information sharing is a fundamental aspect of cooperative tasks between a person and a robot (Wakita, 1998). The experiment tests a human and a robot operating through mainly five states (initial, approach, grasp, release and final), sketched as a state machine below. The main problems to be solved are that the person does not know the delivery coordinates, that the person must keep holding the object until it is released, and that the person might be frightened by the robot's movement.
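A minimal sketch of the five-state delivery interaction, assuming a hypothetical `robot` interface (hand detection via the CCD camera, finger force sensing, trajectory projection); the predicates are illustrative, not from the paper:

```python
from enum import Enum, auto

class State(Enum):
    INITIAL = auto()
    APPROACH = auto()
    GRASP = auto()      # robot holds the object, human reaches for it
    RELEASE = auto()
    FINAL = auto()

def step(state, robot):
    """Advance the delivery task by one transition."""
    if state is State.INITIAL and robot.hand_detected():
        robot.project_trajectory()   # share the delivery path on the desk
        return State.APPROACH
    if state is State.APPROACH and robot.at_delivery_position():
        return State.GRASP
    if state is State.GRASP and robot.human_pulls_object():
        robot.open_gripper()         # release only once the human holds it
        return State.RELEASE
    if state is State.RELEASE:
        return State.FINAL
    return state
```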
The projection function consists of projecting onto the table simulated images of the moving robot, so that the human operator knows the robot's trajectory in real time and understands the delivery path. Force sensors in the robot's fingers allow the robot to understand when the object has been grasped by the operator. A new teaching method is also introduced: the operator activates teaching mode by touching the robot's hand; then, instead of physically moving the manipulator, the projected image of the robot follows the operator's hand to the destination. The advantage is that only the model is required and no robot movement; the robot confirms through the speaker that the taught trajectory has been saved.
The force sensors are an efficient communication method only during grasping; visual monitoring appears to be necessary for the entire delivery task.
It can be observed that humans in cooperation require visual feedback in order to understand that their motion and activity have been understood; each person expects to be observed during their actions. Visual information therefore appears to be extremely important for perception, and it enhances safety in the system.
The Digital Desk helps once again in monitoring and informing both robot and human: while operating, a symbol (in the experiment a white rectangle) is projected onto the operator's hand when the robot has detected an action, so that the human is aware that the robot knows of his or her presence.
For the experiment, a CCD camera was used to detect the human's hand and the robot position, together with a video projector (SANYO LP-SG60) mounted on the ceiling parallel to the camera.
As programmed, the system projects a white rectangle onto the human's hand once the CCD camera and the computer have performed the detection, while a stationary hand is recognized as the delivery position.
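One plausible way to recognize a stationary hand from tracked positions is sketched below; the paper does not give its criterion, so the window size and pixel threshold are assumptions:

```python
import numpy as np

def is_stationary(hand_positions, window=30, tol_px=5.0):
    """Treat the hand as stationary (i.e. the delivery position)
    when its recent tracked positions stay within a small radius.

    hand_positions : list of (x, y) image coordinates, newest last
    window         : number of recent frames to inspect (assumed)
    tol_px         : maximum allowed spread in pixels (assumed)
    """
    if len(hand_positions) < window:
        return False
    recent = np.array(hand_positions[-window:])
    spread = np.linalg.norm(recent - recent.mean(axis=0), axis=1).max()
    return spread < tol_px
```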
Key Concepts
Human-Robot Interaction, Human-Robot Cooperation, Team Working
Key Results
The experiment usefully highlights the importance of communication between robots and humans working together, a communication that also needs visual feedback in order to ensure safety. A large part of communication is in fact performed not only directly but also through indirect feedback showing that the message has been properly received. Future research may require adding further information to the system.

Saturday, October 6, 2012

Toward a Framework for Human-Robot Interaction


Sebastian Thrun
Human-Computer Interaction, No. 19, 2004

Summary
The field of robotics has undergone considerable change since it first appeared as a complete science; robots now perform many assembly and transportation tasks, often equipped with minimal sensing and computing and slaved to a repetitive task. The future increasingly sees the introduction of service robots, mainly thanks to the reduced cost of many of the required technologies and to increased autonomy capabilities.
Robotics is a broad discipline, so definitions of this science are not unique; a general definition was given by the author in a previous paper (Thrun, 2002): a system of robotic sensors, actuators and algorithms. The United Nations has categorized robotics into three fields: industrial robotics, professional service robotics and personal service robotics.
Industrial robotics is the earliest commercial success; an industrial robot operates by manipulating its physical environment, is computer controlled and works in industrial settings (for example on conveyor belts).
Industrial robotics started in the 1960s with the first commercial manipulator, the Unimate; later, in the 1970s, Nissan Corporation automated an entire assembly line with robots, starting a real “robotic revolution”. Today the ratio of human workers to robots can be considered approximately 10:1 (the automotive industry is by far the largest application area for robotics). Industrial robots, however, are not intended to operate directly with humans.
Professional service robots are the younger kind of robot and are designed to assist people, perhaps in inaccessible environments or in tasks whose speed and precision requirements cannot be met by human operators (as is becoming more common in surgery).
Personal service robots possess the highest expected growth rate today; they are projected to assist people in domestic tasks and recreational activities, and often these robots are humanoids.
In all three of these fields the two main drivers are cost and safety; these appear to be the central challenges of robotics.
Autonomy refers to the robot's ability to accommodate variation in the environment, and it is a very important factor in human-robot interaction. Industrial robots are not considered highly autonomous: they are typically used for repetitive tasks and can therefore be programmed. The scenario is different for service robots, where the complexity of the environment requires them to be designed to be very autonomous, since they have to predict environmental uncertainties, detect and accommodate people, and so on.
There is of course also a cost issue, which requires personal robots to be low-cost; they are therefore the most complicated case, since they need high levels of autonomy at low cost. In human-robot interaction the interface mechanism becomes extremely important. Industrial robot interfaces are often limited: the robots are hard-programmed, and programming languages and simulation software act as intermediaries between robot and human. Service robots require richer interfaces, and indirect and direct interaction methods are distinguished: indirect interaction consists of a person operating a robot through commands, while direct interaction consists of a robot taking decisions on its own, working in parallel with a human.
Different technologies exist to achieve different methods of communication; an interesting example is Robonaut (Ambrose et al., 2001), a master-slave design demonstrating how a robot can cooperate with astronauts on a space station. Speech synthesizers and screens also appear to be interesting direct interaction methods.
Humanoid appearance and the social aspects of service robots are also important topics that researchers are investigating today for the future of robotics.
Key Concepts
Human-Robot Interaction, Human-Robot Cooperation

Friday, October 5, 2012

Trajectory Prediction for Moving Objects Using Artificial Neural Networks


Pierre Payeur, Hoang Le-Huy, Clément M. Gosselin
IEEE Transactions on Industrial Electronics, Vol. 42, No. 2, 1995
Summary
Prediction of an object's trajectory and position is very important for industrial robots and servo systems, where this kind of information may be used for control, capture and other observation tasks.
Parts' trajectories are often unknown, and the object-catching problem arises, for example, in a typical loading/unloading task where a robot grasps an object (for instance a box) from a slide or a conveyor.
In such conditions accuracy is of course strictly important, both for achieving the goal and for safety purposes.
An optimal off-line method is path planning based on cubic polynomials with dynamics added (Sahar and Hollerbach, 1986), but this method involves a lot of computation and, being done off-line, is not flexible.
The authors propose the use of Artificial Neural Networks (ANNs), which can be computed faster and, most importantly, can learn the system, making the approach adaptable and flexible. The problem is defined as predicting the trajectory of a moving object in real time, with minimum error and no collision, on an arbitrary path.
Model
Time $t_0$ is defined as the time at which the manipulator starts moving, and the prediction is made over a period $T$. The trajectory model is a cubic function, $\chi(t) = \frac{1}{6}\alpha_0 t^3 + \frac{1}{2}\beta_0 t^2 + \gamma_0 t + \chi_0$, where $\chi$ represents position, orientation, velocity or acceleration according to need.
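Since the model is a simple cubic, the velocity and acceleration that the post-conditioning module later recovers follow directly by differentiation (a worked consequence of the stated model, not an added result):

```latex
\begin{aligned}
\chi(t)        &= \tfrac{1}{6}\,\alpha_0 t^3 + \tfrac{1}{2}\,\beta_0 t^2 + \gamma_0 t + \chi_0 \\
\dot{\chi}(t)  &= \tfrac{1}{2}\,\alpha_0 t^2 + \beta_0 t + \gamma_0 \\
\ddot{\chi}(t) &= \alpha_0 t + \beta_0
\end{aligned}
```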
Neural networks are trained to determine the values of the coefficients $\alpha_0$, $\beta_0$, $\gamma_0$, which must be recalculated at each change of trajectory; the presence of pre- and post-conditioning avoids the need for a large network, so that in the end there are 3 inputs, two hidden layers of 20 cells each, and one output cell.
The global predictive structure multiplies this basic topology by the number of degrees of freedom. The network is modelled so that it obtains information about the change in position rather than the exact position; this reduces the amount of calculation, but pre- and post-conditioning are then required to start and sustain the calculations. Sampling too many times within the period $T$ should also be avoided. The pre-conditioning module computes differences between the known positions and feeds the network this pseudo-displacement value; the network processes it and generates the anticipated pseudo-variation of the position or orientation, and the post-conditioning module then finds velocity and acceleration through kinematics. The data presented to the neural nets are normalized so that the maximum value is lower than the activation function maxima (here a bipolar sigmoidal activation function with a range from -1.0 to 1.0). Between two successive samplings there is a maximum coordinate variation $\Delta p_{max}$, while the resolution limit is the minimum detectable variation $\Delta p_{resolution}$, so the number of possible steps to consider is $ST_{single} = 2\Delta p_{max}/\Delta p_{resolution} + 1$, which globally must be raised to the power of three, one factor for each of the three inputs. The size of the data set can be reduced since, with a constant sampling rate, two successive position or orientation variations are always of similar amplitude, so a maximum difference can be assumed; tables (Appendix C) assist the calculations shown at page 151. The input triplets are processed with the cubic model introduced above (to ensure that the output data presented to the network are exact) and normalized to within ±0.8 to avoid the extremes; the back-propagation algorithm and a pseudo-random presentation order are then used for training (triplets are not presented in a fixed order, but the entire set must be used within one epoch). A hedged sketch of this pipeline follows below.
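As an illustration of the pre-conditioning and prediction pipeline, here is a minimal sketch assuming numpy and a tiny hand-rolled network; the 3-20-20-1 layer sizes follow the paper, while the function names, the scale factor and the trained weights are assumptions:

```python
import numpy as np

def precondition(positions, scale):
    """Pre-conditioning: turn the last 4 sampled positions of one
    coordinate into 3 normalized pseudo-displacements.
    `scale` is chosen so inputs stay within +/-0.8 (assumed)."""
    return np.diff(np.asarray(positions, dtype=float)) * scale

def bipolar_sigmoid(x):
    # Activation ranging from -1.0 to 1.0, as stated in the paper
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def predict(x, weights):
    """Forward pass of the 3-20-20-1 topology (one net per d.o.f.).
    `weights` is a list of (W, b) pairs, assumed already trained
    with back-propagation as the paper describes."""
    a = x
    for W, b in weights[:-1]:
        a = bipolar_sigmoid(W @ a + b)
    W, b = weights[-1]
    return W @ a + b            # anticipated pseudo-variation

# Sizing example: number of distinct single-coordinate input steps
dp_max, dp_res = 2.0, 0.1
ST_single = 2 * dp_max / dp_res + 1   # = 41; cubed for the 3 inputs
```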
Key Concepts
Artificial Neural Network, Path Planning, Trajectory Prediction
Conclusion
The method appears to perform as well as the polynomial trajectory model while being more flexible. Anticipated values are also more reliable, and variances are more similar between axes, so errors can be better bounded and controlled.

Tuesday, October 2, 2012

Human-Robot Interaction Through Gesture-Free Spoken Language


Vladimir Kulyukin
Autonomous Robots, No. 16, 2004
Summary
Human-robot interaction has become more and more important in activities with challenging tasks, from mining to space walks; tight integration of perception, language and vision is therefore an important requirement for cooperation in such environments, and a GUI interface is no longer acceptable. It is believed that everyday language is somehow grounded in visually-guided activities, and deep integration must be achieved at this level. The proposed system is a speech method for commanding a robot to perform certain grasping tasks (such as fetching a “pepsi can”). Two modes of interaction are possible: local (when the robot and the operator are in each other's sight) and remote (which needs a camera to let the human understand the robot's environment). The proposed robot has a standard 3T architecture with three tiers of functionality: deliberation, execution and control. The execution tier (which receives from the deliberation tier the goals to be satisfied) is implemented with the Reactive Action Package (RAP) system, a set of methods for achieving a specific goal under different circumstances. The steps needed to execute a method are task nets, and a RAP may operate whether or not it knows where the object is; a RAP becomes a task when its index clause matches a task description (a matching sketch follows below). The control tier contains the robot's skills; these are enabled when the control tier is asked to execute them, and the tier is not aware of successes or failures, which are handled by the execution tier.
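A minimal sketch of RAP-style index-clause matching; the structures and `?x`-style variables are assumptions for illustration, not the RAP system's actual syntax:

```python
def matches(index_clause, task):
    """A RAP becomes a task when its index clause unifies with the
    task description; symbols starting with '?' act as variables."""
    if len(index_clause) != len(task):
        return None
    bindings = {}
    for pat, val in zip(index_clause, task):
        if pat.startswith("?"):
            if bindings.setdefault(pat, val) != val:
                return None     # same variable bound to two values
        elif pat != val:
            return None         # literal mismatch
    return bindings

# e.g. (grasp ?obj) matches (grasp pepsi-can) with ?obj = pepsi-can
print(matches(["grasp", "?obj"], ["grasp", "pepsi-can"]))
```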
Actions are categorized as internal (manipulating the robot's memory) and external (manipulating external objects or moving); the output of a skill is a set of symbolic assertions that the skill puts into the robot's memory. Goals, objects and actions are part of a semantic network in which each node represents a memory organization package (MOP), and nodes are connected by abstraction links or packaging links. Nodes are activated by an activation algorithm performing direct memory access parsing (DMAP) on token sequences: for a token T, all expectations that expect T must be advanced; for the expectations thus completed, the target MOPs must be activated; and for each activated MOP with callbacks attached (Kulyukin and Settle, 2001), the callbacks must be run (sketched below). Object recognition is performed through the Generalized Hamming Distance (GHD), a modification of the classical Hamming distance capable of approximate similarity matching, together with color histogram models (CHM). The system needs several pieces of a priori knowledge: the semantic network of goals and objects, the library of object models, the context-free command-and-control grammars for speech recognition, the library of RAPs and the library of robotic skills. Interaction is based on passive clarification of knowledge through goal disambiguation; mutual understanding is necessary, making this a situated cognitive-machinery problem. Voice recognition is performed through the Microsoft Speech API (SAPI), a middle layer between an application and a speech recognition engine; it includes the Microsoft English SR Engine Version 5, with 60,000 English words, and can be provided with other languages. SAPI converts voice input into a token sequence, activating the algorithm discussed above. Goal disambiguation can be sensory, mnemonic or linguistic, and the robot is capable of asking for clarification in ambiguous situations. A great advantage of the system is that it allows introspection: the operator can ask the robot what it is capable of doing, which makes the machine easy for non-expert operators and helps the learning performance of both robot and human.
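A minimal sketch of the DMAP-style activation loop described above; the data structures and names are assumptions for illustration, not the author's code:

```python
from collections import defaultdict

class Expectation:
    """Tracks progress of one token sequence toward a target MOP."""
    def __init__(self, tokens, target_mop):
        self.tokens, self.target, self.pos = tokens, target_mop, 0

def process_token(token, expectations, callbacks):
    """Advance every expectation that expects `token`; when a
    sequence completes, activate its MOP and run its callbacks."""
    activated = []
    for e in expectations:
        if e.tokens[e.pos] == token:
            e.pos += 1
        else:
            e.pos = 1 if e.tokens[0] == token else 0   # restart on mismatch
        if e.pos == len(e.tokens):                     # sequence fully seen
            activated.append(e.target)
            e.pos = 0
    for mop in activated:
        for cb in callbacks.get(mop, []):              # run attached callbacks
            cb(mop)
    return activated

# Usage: "pick up the pepsi can" activates the (hypothetical) grasp MOP
callbacks = defaultdict(list)
callbacks["m-grasp-pepsi-can"].append(lambda m: print("activate", m))
exps = [Expectation(["pick", "up", "the", "pepsi", "can"], "m-grasp-pepsi-can")]
for tok in "pick up the pepsi can".split():
    process_token(tok, exps, callbacks)
```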
Key Concepts
Human-Robot Interaction
Key Results
Experiments have been performed proving the importance of introspection and showing that it is mainly used during the first interactions and abandoned as the learning factor decreases. Limitations remain to be overcome: the system cannot handle deictic references, quantification or negation, and it can interact with only one person at a time.