

Optical Recognition

Teaching cars to see.

Courtesy of DaimlerChrysler


"Before I retire, I want to be able to buy a Mercedes that can automatically take me to my destination - while I prepare a presentation on my laptop. In other words, I want to have the choice of driving myself or letting something else keep an eye on the traffic situation."

Uwe Franke has a very ambitious, almost utopian goal in mind: video-based assistance systems should one day be capable of guiding a car safely through traffic from point A to point B, without any involvement at all on the part of the driver. Moreover, they should be able to perform this task as quickly and safely as possible.

The project manager of the Image Interpretation group at DaimlerChrysler Research was not always this optimistic. "The problems were initially so complex that for a long time we doubted we would ever overcome them," he recalls. "But in the last few years we've been making advances so rapidly that the goal may soon be within reach."


The test vehicle UTA II - the abbreviation stands for the second version of the "Urban Traffic Assistant" - is the source of Franke's new optimism. This Mercedes E-Class equipped with stereo video cameras, a colour camera and image-processing electronics offers a hint of the increased safety that video-based assistance systems will one day bring. In addition to recognizing the open stretch of road ahead, UTA II can pinpoint other vehicles, be they in front, crossing the road or parked at the curb.

It also detects pedestrians (either stationary or crossing the road), and registers traffic signs and traffic lights at the side of the road. It is even capable of recognizing curbs and road markings such as directional arrows and crosswalks.

Down to the Essentials

Machine vision and human sight are fundamentally different - with regard to both their methods of operation and their performance capabilities. That's why, despite all his optimism, Franke is cautious when it comes to predicting how quickly machine vision will improve. In particular, he doubts that it will be possible in the near future to develop and employ technical systems that outperform the human eye as an optical sensor and the brain as an "image processor."


His team is therefore not attempting to make an exact copy of the biological model, and that becomes clear from a glance at the video sequences. The technical system's perception of traffic is obviously very different from the human one.

Approaching cars dart amoeba-like through the image as red clouds of dots. Alien-like pedestrians cross the road as a mere outline with a coloured border. Trees, houses and other objects in the background or far away at the edge of the picture blend together to form visually indistinguishable parts of an environment outline. Some are not depicted at all.

From a human point of view, the image is an unfamiliar, abstract reproduction of the driver's surroundings. But just as expressionistic painters reduce a "realistic" scene to clearly-defined forms and fields of colour and thereby emphasize the essential, the machine vision systems record only those details that are crucial for a safe trip. Houses off to the side of the road are left out. Whether an oncoming vehicle is a yellow Opel Corsa or a silver-grey Mercedes A-Class is irrelevant as far as the machine eye is concerned. So too is the question of whether the pedestrian in the road is a man or a woman. Much more important for safety is how quickly an oncoming car is approaching, how far away it is from UTA II and whether there will be enough space for the vehicles to pass one another.

It is exactly here, in such image processing tasks, that the advantages of technical systems, with their sensors and electronic components, can be best exploited. Whereas the human eye can only roughly estimate distances, object sizes and relative speeds, these systems can measure such parameters very precisely.


"The stereo base of our system - in other words, the distance between the two video cameras - is 30 centimetres and therefore considerably larger than the distance between our eyes," Franke explains. "The result is that a person can estimate distances relatively well only up to 12 metres away. Our system, on the other hand, measures the distance and relative speed of objects up to 60 metres away from the car very precisely."
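The link between stereo base and measuring range can be illustrated with the standard pinhole stereo relation, Z = f * B / d, where B is the stereo base, f the focal length in pixels and d the disparity between the two images. The sketch below is not DaimlerChrysler's implementation. It uses the 30-centimetre base and the 40-degree field of view quoted in the article, and assumes (the article does not say) that the field of view is horizontal and that the image is 640 pixels wide.

```python
import math

BASELINE_M = 0.30      # stereo base quoted in the article
FOV_DEG = 40.0         # field of view quoted in the article (assumed horizontal)
IMAGE_WIDTH_PX = 640   # assumption: the article gives no sensor resolution


def focal_length_px(image_width_px, fov_deg):
    """Focal length in pixels of a pinhole camera with the given field of view."""
    return (image_width_px / 2) / math.tan(math.radians(fov_deg) / 2)


def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Distance Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_change_per_pixel(depth_m, baseline_m, focal_px):
    """Approximate depth change for one pixel of disparity error: Z^2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)


focal = focal_length_px(IMAGE_WIDTH_PX, FOV_DEG)
for depth in (12, 30, 60):
    disparity = focal * BASELINE_M / depth
    step = depth_change_per_pixel(depth, BASELINE_M, focal)
    print(f"{depth:>3} m: disparity {disparity:5.1f} px, "
          f"about {step:4.2f} m per pixel of disparity")
```

Because the depth change per pixel shrinks as the base B grows, a wider base gives finer distance resolution at the same range, which is the effect Franke describes.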

And electronic vision is characterized by another important feature: "It may be true that the human eye recognizes the environment around a car better than any technical system now available," says Franke. "But this is the case only as long as the driver is fully alert and concentrated on the task of driving."

Inattention changes things dramatically. A waving neighbour at the side of the road, a glance at a billboard up on a building or simply a brief lapse in concentration - any one of these is enough to cause the driver to overlook a situation of imminent danger. The eyes of UTA II, on the other hand, stay fixed on the road, taking note of every object that stands out from the road surface as a pattern of pixels. The image-processing electronics are constantly searching for the pixel patterns they have been trained to recognize by Franke's team.

"We human beings can name practically any object we see," Franke says. "A technical system, on the other hand, can only properly identify something if it has been taught how to classify various objects - such as traffic lights or pedestrians - beforehand. And that means showing it lots and lots of examples."

Teaching the System


To teach the system what a pedestrian looks like, the researchers fed it several thousand photos of pedestrians. If a pedestrian appears in the camera's 40-degree field of vision, the image-processing electronics compare this pixel pattern with those from the "pedestrian database." If they determine that there is a sufficient degree of correspondence between the two patterns, they decide that they are looking at a "pedestrian."

The process actually involves several steps. This is necessary to gradually reduce the complexity of the surroundings and increase the accuracy of an object's classification. The image-processing software is designed so that it first separates what is important from what is irrelevant. This step, known as "detection," is based first and foremost on the creation of a stereoscopic reproduction of the road in front of the vehicle plus the two boundary lines. Using this image, it is possible to identify all the "obstacles" that appear along the road or that limit the space available for driving. The position of the objects relative to the vehicle can also be determined in this manner. In addition, the detection step takes account of well-defined shapes (such as a triangle that could indicate a traffic sign), colours (for example, a patch of red light that might come from a traffic signal), and movements in the driving environment (such as a crossing vehicle).

In the detection stage, the entire pixel area depicting the video sensors' field of vision is divided into important and unimportant areas. Unimportant pixel areas are ignored; the system now directs its entire attention (or, in technical terms, its computational power) to the areas identified as important.
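The idea of splitting the image into important and unimportant areas can be shown with a toy example. The sketch below is only an illustration, not the UTA II software: it assumes a hypothetical grid in which each cell already holds a height above the road surface in metres, and it uses an invented threshold of 0.2 metres to decide what "stands out" from the road.

```python
def important_cells(height_above_road_m, min_height_m=0.2):
    """Return (row, col) of every grid cell that rises above the road surface.

    The grid layout and the 0.2 m threshold are assumptions made for this sketch.
    """
    return [
        (r, c)
        for r, row in enumerate(height_above_road_m)
        for c, height in enumerate(row)
        if height >= min_height_m
    ]


# Hypothetical 3 x 4 height grid: two tall cells and one small bump.
grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.5, 1.6, 0.0],
    [0.0, 0.05, 0.0, 0.0],
]

cells = important_cells(grid)
share = len(cells) / (len(grid) * len(grid[0]))
print(cells)  # [(1, 1), (1, 2)]
print(f"{share:.0%} of the cells need further processing")
```

Only the cells returned here would be passed on to the later steps, which is how ignoring the rest frees up computing time.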

The second step is tracking: an object is monitored over a period of time. This allows the system to determine whether an object or obstacle is stationary or moving and, in the latter case, in what direction it is moving and how fast.
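How repeated position measurements yield speed and direction can be sketched in a few lines. This is a simple finite-difference estimate, not the tracking method used in UTA II, which the article does not describe. The observation format (time in seconds, lateral offset x and forward distance z in metres, all relative to the car) and the 0.5 m/s threshold are assumptions made for the example.

```python
import math


def relative_velocity(track):
    """Velocity (vx, vz) in m/s from the first to the last observation.

    track: list of (t_seconds, x_m, z_m) tuples, oldest first.
    """
    (t0, x0, z0), (t1, x1, z1) = track[0], track[-1]
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("observations must span a positive time interval")
    return (x1 - x0) / dt, (z1 - z0) / dt


def describe(track, still_below_mps=0.5):
    """Summarise a track as relative speed, heading and a moving/not-moving flag."""
    vx, vz = relative_velocity(track)
    speed = math.hypot(vx, vz)
    return {
        "moving": speed >= still_below_mps,
        "speed_mps": speed,
        # 0 degrees = straight away from the car, 180 = straight towards it
        "heading_deg": math.degrees(math.atan2(vx, vz)),
    }


# Hypothetical object 2 m to the side, closing in from 30 m over three frames.
track = [(0.00, 2.0, 30.0), (0.04, 2.0, 29.6), (0.08, 2.0, 29.2)]
print(describe(track))
```

Note that everything here is measured relative to the moving car, so a parked vehicle appears to approach at the car's own speed; separating the two requires knowing how fast the car itself is travelling.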

After the tracking comes the last image processing step: classification. This is where a monitored pixel pattern is categorized as a particular object, be it a pedestrian crossing the road, a stop sign or a car ahead. Only now can the technical system "name" the object it has been "viewing."

Response Time

It goes without saying that these steps must be completed in a flash. After all, the driving environment is constantly changing. The multi-stage image processing must therefore be completed in real time.

The image-processing system currently classifies 80 percent of all pedestrians correctly within one millisecond; 95 percent of all pedestrians are identified within five milliseconds. A spatial image of the whole driving environment is generated within 100 milliseconds. This is then used to identify obstacles. Such high speed is required because the image-processing computer has to interpret several still pictures per second and identify changes over the course of time while tracking.


The algorithm for identifying traffic lights, for example, operates with eight interpreted images per second; the shape analysis for identifying traffic signs runs 25 times per second.
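The rates quoted above translate directly into a time budget per image, and a processing delay translates into distance travelled. The short calculation below works this out. The 50 km/h figure is an assumption chosen as a typical urban speed and does not come from the article.

```python
def frame_budget_ms(frames_per_second):
    """Time available per image, in milliseconds, at a given processing rate."""
    return 1000.0 / frames_per_second


def distance_travelled_m(speed_kmh, latency_ms):
    """Distance the car covers while waiting latency_ms for a result."""
    return speed_kmh / 3.6 * latency_ms / 1000.0


print(frame_budget_ms(8))    # traffic lights: 125.0 ms per image
print(frame_budget_ms(25))   # traffic signs: 40.0 ms per image

# Assumed speed of 50 km/h; 100 ms is the spatial-image time from the article.
print(round(distance_travelled_m(50, 100), 2))  # 1.39 m
```

At that assumed speed the car moves nearly one and a half metres while a single spatial image is being built, which shows why every stage has to finish within a fraction of a second.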

Despite this enormous speed, there is no high-performance computer in UTA II's boot. Instead, all the image processing is performed by three 700 MHz Pentium III processors - by no means the fastest such components currently available on the market. The processing performance is thus less the result of sophisticated hardware than of a smart software strategy and architecture developed by DaimlerChrysler Research.

For example, Matthias Oberländer, one of Franke's colleagues, achieved a real breakthrough in pattern recognition several years ago with the concept of the so-called hyper-permutation network (HPN). In a process similar to what occurs in a neural network, the brightness values of the pixels supplied by the camera are gradually linked in several steps until the pattern can be classified with sufficiently high probability as a particular object, such as a pedestrian. The crucial aspect of an HPN is that it does not use any arithmetical operations at all. The result of a link is determined using a table, rather than an arithmetic formula. The content of the table is adapted to the task in advance through automatic training processes.
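The article gives no further detail of how an HPN is built, so the following is not Oberländer's method. It is only a toy illustration of the one property described above: values are combined step by step using table lookups, with no arithmetic at evaluation time. The four brightness levels, the pairwise layout and the table contents are all assumptions made for the sketch, and the tables are filled by simple rules that stand in for the automatic training step.

```python
LEVELS = 4  # assumption: brightness already quantised to the values 0..3


def make_table(rule):
    """Precompute a LEVELS x LEVELS table; a stand-in for the training step."""
    return [[rule(a, b) for b in range(LEVELS)] for a in range(LEVELS)]


def evaluate(values, layers):
    """Combine neighbouring values pairwise, one table per layer, by lookup only."""
    if len(values) != 2 ** len(layers):
        raise ValueError("need exactly 2 ** len(layers) input values")
    for table in layers:
        values = [table[a][b] for a, b in zip(values[0::2], values[1::2])]
    return values[0]


# Hypothetical tables; a trained system would hold learned entries instead.
layers = [make_table(max), make_table(min), make_table(max)]

pattern = [0, 3, 1, 1, 2, 0, 3, 3]   # eight quantised pixel values
score = evaluate(pattern, layers)
print(score)  # 2
```

Once the tables exist, each link costs a single indexed read, which hints at why such a scheme can run quickly on modest processors.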


"Optical assistance systems can help make the vision of intelligently supported urban traffic a reality," says Franke. "One of the most important benefits would be significantly fewer accidents on the road." To back up his argument, Franke refers to the results of several surveys. These show, for example, that more than half of all accidents at intersections with traffic lights could be avoided if the driver were warned before accidentally running a red light.

As video-based systems become more reliable, they will become increasingly useful in hazardous situations. They could, for instance, intervene in the steering, braking system or cruise control in a situation where the driver can no longer prevent an accident through his or her own actions.

"Thanks to the progress we've made during the last few years, we have come a great deal closer to achieving our goals," says Franke, still dreaming of that special Mercedes-Benz car he plans to buy one day, before beginning his well-earned retirement.


Copyright © 1996-2009 Web Publications Pty Limited. All Rights Reserved