Self-driving cars face challenges at construction sites, as Google’s Waymo struggles and faces scrutiny

For robotaxis, the new watchword is “beware of fire, beware of theft, beware of construction sites.” Waymo is in trouble.

U.S. regulators have just launched another investigation into Waymo’s self-driving cars over a series of related accident reports.
Some hit parked vehicles on the roadside, some struck stationary obstacles, some blocked traffic… and one scene recurs with unusual frequency: the construction site.

What happened?

In the span of about three months, Waymo reported 22 accidents, drawing the attention of the National Highway Traffic Safety Administration (NHTSA).

According to NHTSA documents, these accidents include collisions between Waymo autonomous vehicles and stationary or semi-stationary objects such as gates, collisions with parked vehicles, and violations of traffic safety control devices.

Among them, “violation of traffic safety control devices” has been singled out by regulators as a key direction of the investigation. A typical scenario: the autonomous driving system’s ability to detect and identify traffic cones.
This is notable because many of the 22 reported accidents involved the same scenario: the construction site.

For example, last month, six Waymo robotaxis traveling in formation were returning to their depot at the end of a shift when they ran into temporary traffic control at a construction site and got stuck in a passage cordoned off by traffic cones, causing a traffic jam for about half an hour.

Netizens familiar with the area immediately recognized the spot as the Potrero Avenue on-ramp to US-101 in San Francisco; the Waymo vehicles were stuck right at the highway entrance.

In the end, drivers stuck behind them got out and moved the traffic cones by hand, and the queue of cars went around the several “paralyzed” driverless vehicles.

Waymo issued a brief statement to the effect that it sent staff to the scene to move the vehicles within 30 minutes, that there were no casualties or property damage, and that it would cooperate with the subsequent investigation.

But a construction site in Phoenix was not so lucky.

There, a Waymo self-driving car likewise ignored a construction area cordoned off with traffic cones and drove straight into the site.

Fortunately, the speed was low and no one was hit, but both the vehicle and the construction site suffered varying degrees of damage.

There have been many accidents like this, and each time, short videos of driverless vehicles “barging into construction sites” go viral online.

Netizens summed it up vividly: traffic cones are the kryptonite of robotaxis. No matter how capable an autonomous driving system is, it is doomed the moment it meets traffic cones on a closed road.

Huh? That seems rather different from the official videos Waymo has shown.

Why are construction sites so difficult?

Navigating construction zones was once showcased as a technical highlight of Waymo’s fifth-generation autonomous driving system.

In the official demo, the scenes the driverless vehicle faces are even more complex: traffic cones, irregularly shaped areas, and workers walking back and forth.

Of course, Waymo’s self-driving car effortlessly completed a series of avoidance and detour maneuvers and passed through the construction area:

What’s amazing here is that Waymo’s self-driving cars seem to understand the body language of humans directing traffic: stopping when told to stop and going when waved through, rather than relying on road conditions alone.

How did they do it? Maya Kabkab, the Waymo engineer in charge of prediction algorithms, gave a brief explanation: in the fifth-generation system, Waymo strengthened both its ability to understand different objects and agents and its ability to identify drivable areas, and together these let the system plan better routes.

The core change is replacing CNNs with a new model, VectorNet, to extract information from sensors and high-definition maps.

Simply put, high-definition map and sensor inputs are represented as points, polylines, or curves, and VectorNet encodes the trajectories of road features and other objects as corresponding vectors. From this simplified view, VectorNet can extract the information in each vector and learn the relationships between different vectors.

The advantage is that VectorNet consumes fewer computing resources than a CNN, produces results faster, and can in theory extract key scene information more cleanly.
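The vectorized representation described above can be sketched in a few lines. This is a toy illustration, not Waymo’s actual code: the function names, the four-number vector format, and the single attention layer are all simplifying assumptions.

```python
# Toy sketch of a VectorNet-style scene encoding (illustrative only).
import numpy as np

def polyline_to_vectors(points):
    """Split a polyline of shape (N, 2) into consecutive displacement
    vectors, each stored as [start_x, start_y, end_x, end_y] -- the kind
    of format vectorized models use for lanes and trajectories."""
    starts, ends = points[:-1], points[1:]
    return np.hstack([starts, ends])          # shape (N-1, 4)

def self_attention(vectors):
    """One dot-product attention layer: each vector attends to all
    others, which is how relations between scene features are learned."""
    scores = vectors @ vectors.T              # pairwise similarity
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ vectors                  # context-mixed features

# A gently curving lane as a 4-point polyline.
lane = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.1], [3.0, 0.3]])
vecs = polyline_to_vectors(lane)
ctx = self_attention(vecs)
print(vecs.shape, ctx.shape)   # (3, 4) (3, 4)
```

The point of the vector form is that a lane, a crosswalk, and a pedestrian track all become the same kind of object, so one model can reason over all of them jointly instead of rasterizing the scene into an image for a CNN.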

But VectorNet still doesn’t solve the core of the “construction site” problem:

a construction site is, by its nature, an exception to the HD map. The map cannot be updated in sync with the real world, so the system can only rely on sensors to perceive the site in real time.

However, sensor data is handed off sequentially between different sub-models, and information loss along the way is difficult to avoid entirely.

Robotaxis frequently get stuck at construction sites. The direct cause is misdetection of traffic cones and other irregular objects.

The underlying reason is that the traditional autonomous driving paradigm has a ceiling on its capabilities: it is difficult to cover every corner case on the road.

Therefore, whether the vehicle successfully navigates a construction site becomes a matter of probability: a carefully crafted and repeatedly rehearsed official demo will go fine; on real roads, it comes down to luck.

Can it be solved end-to-end?

“When in doubt, blame quantum mechanics” is a running joke.

But in autonomous driving, when in doubt, you really can turn to “end-to-end”.

“End-to-end” is defined in contrast to the traditional paradigm, in which the perception, decision-making, and planning/control stages of autonomous driving are independent of each other: data collected by the sensors must pass through this chain of separate algorithm modules before it is finally turned into driving commands.

Information between the independent modules is handed off step by step. In this process, information loss and errors are inevitable; each module’s error propagates to the next, the errors accumulate across modules, and the overall performance of the autonomous driving stack suffers.

Whether perception is vision-only or sensor-fusion, the root cause of misdetections and missed detections lies here.
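The step-by-step hand-off and its compounding error can be illustrated with a toy numeric model. The stages and error magnitudes here are hypothetical stand-ins, not real perception or planning modules:

```python
# Toy illustration of error accumulation in a modular pipeline.
import random

random.seed(0)  # make the sketch reproducible

def noisy_stage(value, error=0.05):
    """Each hand-off between modules perturbs the signal a little,
    standing in for abstraction loss at a module boundary."""
    return value * (1 + random.uniform(-error, error))

def modular_pipeline(ground_truth, stages=3):
    """Perception -> prediction -> planning: per-stage errors compound,
    so the final output drifts further from the ground truth than any
    single stage's error bound."""
    x = ground_truth
    for _ in range(stages):
        x = noisy_stage(x)
    return x

truth = 10.0
estimate = modular_pipeline(truth)
print(f"ground truth {truth}, pipeline output {estimate:.3f}")
```

With three 5% stages, the worst-case drift is already about 15.8% (1.05³), which is the compounding the paragraph above describes.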

Of course, there is a corresponding fix: hand-written rules, patched on to improve the reliability of perception. For example, if the system can recognize cars and people but not “a person standing in front of a car”, that’s easy: just build a dedicated dataset for that type of target and use it to train the model.

This is the so-called perception “whitelist” mechanism.

But the problem is that it is impossible to exhaust every kind of traffic participant and scenario. Solve “person in front of a car” today; what if the car becomes a large truck tomorrow? Or the person becomes an adult leading a child?
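A perception “whitelist” of the kind described can be sketched as a simple lookup. The class names are illustrative; the point is only that anything outside the enumerated list degrades to “unknown”, which is exactly why novel combinations fail:

```python
# Minimal sketch of a perception whitelist (illustrative class names).
# Only labels the system was explicitly configured/trained for are
# recognized; everything else falls through to "unknown".
WHITELIST = {"car", "pedestrian", "cyclist", "traffic_cone"}

def classify(detected_label):
    """Return the label if whitelisted, else flag it as unknown."""
    return detected_label if detected_label in WHITELIST else "unknown"

print(classify("traffic_cone"))           # traffic_cone
print(classify("pedestrian_with_child"))  # unknown: never whitelisted
```

Every new failure mode means appending another entry and collecting another dataset, which is why the approach cannot keep up with an open-ended world.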

The same goes for robotaxis’ construction-site problem. Construction sites appear temporarily, pop up at random, are not confined to any area or time, and every site’s barriers and layout are different…

Therefore, to transmit information losslessly from perception onward and let the system truly understand its environment, a new algorithmic paradigm is needed: the end-to-end model.

The two “ends” are the data input end and the command output end; in between, the system is no longer divided into separate independent modules.

Being fully data-driven, an end-to-end model can transfer and generalize the skills it has learned to new scenarios, autonomously and efficiently solving the long-tail problems that keep emerging in real driving, iterating faster, and effectively reducing the cost of expanding to new cities.

In layman’s terms, it lets the AI learn mature human driving behavior: see a scene, respond accordingly. In this sense, “end-to-end” touches the threshold of AGI.
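Conceptually, end-to-end collapses the whole pipeline into one learned mapping from sensor features to control commands. The tiny network below is a sketch under that assumption, with random (untrained) weights and invented dimensions; a real system would learn the weights from human driving data:

```python
# Toy end-to-end policy: raw sensor features in, control command out,
# with no hand-off between separate modules (illustrative sketch).
import numpy as np

rng = np.random.default_rng(42)

class EndToEndPolicy:
    def __init__(self, n_inputs=8, n_hidden=16):
        # Random stand-in weights; training on demonstrations would set these.
        self.w1 = rng.normal(0, 0.1, (n_inputs, n_hidden))
        self.w2 = rng.normal(0, 0.1, (n_hidden, 2))  # [steer, throttle]

    def __call__(self, sensor_features):
        h = np.tanh(sensor_features @ self.w1)  # one learned mapping...
        return np.tanh(h @ self.w2)             # ...to bounded controls

policy = EndToEndPolicy()
sensors = rng.normal(size=8)  # stand-in for camera/lidar features
steer, throttle = policy(sensors)
print(f"steer={steer:.3f}, throttle={throttle:.3f}")
```

The contrast with the modular sketch earlier is that gradients (and information) flow through one network, so there is no module boundary at which an abstraction can drop the traffic cone.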

The end-to-end approach was first proposed by NVIDIA in 2016, but real mass-production practice only began in the past two years. The current examples are Tesla’s FSD and UniAD, the CVPR 2023 best paper from Chinese AI players.

Smart Car Reference also asked two leading Chinese autonomous driving players for their views on the construction-site problems Waymo has encountered.

From the perspective of engineering practice, Horizon believes:

The construction-site challenge in autonomous driving is not inherently tied to the end-to-end paradigm. In theory, the problem can be solved as long as perception is strong enough and the perception whitelist is rich enough.

But clearly, end-to-end’s autonomous learning ability and human-like reasoning can solve the problem at greater scale and more efficiently.

SenseTime’s view leans more on “first principles”. A technical expert from its Jueying intelligent driving team said:

We won’t comment on specific cases. However, in traditional rule-based intelligent driving stacks, perception still works on artificially defined elements, and perceptual information is abstracted as it is extracted, which causes loss and omission during transmission and makes it hard for the downstream decision module to decide correctly. End-to-end puts everything inside one neural network, which takes in and transmits information about the external environment losslessly, understands the traffic environment more accurately and completely, and then plans and decides.

A rule-based stack can solve one scenario by adding rules and patches. But there isn’t just one such scenario; there are infinitely many. After learning from enough data, an end-to-end stack can think and drive like a human and solve similar corner cases on its own.

To sum up, Horizon and SenseTime phrase it differently, but the core is the same, and they agree: end-to-end is the most effective way to solve robotaxis’ construction-site problem, and also the most efficient route to solving autonomous driving’s various long-tail problems.

By the way, UniAD, the CVPR 2023 best paper, was written with participation from scholars at both Horizon and SenseTime.

The end-to-end reshaping of the traditional technology paradigm gives all players new opportunities: a better intelligent driving experience, lower maintenance and generalization costs, and more cost-competitive autonomous driving solutions.

But the price is that the previous modular, rule-driven technical system must be overthrown and reconstructed.

Waymo, formerly the undisputed leader in autonomous driving, is now stuck in a “construction site” dilemma, which only goes to show that in the autonomous driving race, as the old saying has it, “water has no constant form, and armies no constant momentum”:

Established stars may see their advantages reset to zero, and latecomers may seize the lead.