Light’s Clarity Platform – Multi-view Depth Perception for the Next Generation of Vehicles

Light delivers some of the most sophisticated depth perception technology in the world, thanks to a previously unseen combination of range, detail, accuracy, and consistency of depth estimation. Tooploox engineers, researchers, and designers have contributed to the company’s work.

Autonomous vehicles are no longer just a buzzword. According to a MarketsandMarkets report, the global autonomous car market reached $24.1 billion in 2019 and is expected to grow at a CAGR of 18.06% between 2020 and 2025. This growth is fueled by policymakers and companies alike. PwC estimates that up to 40% of mileage driven in 2030 will be covered by autonomous vehicles, and up to 55% of small business owners believe that within two decades they will have a fully autonomous fleet.

When it comes to autonomous cars, the main media focus is on the artificial intelligence responsible for controlling the car’s behavior on the road (and elsewhere). Yet behind the system which decides whether to accelerate or not, there is a network of sensors which delivers the required information – literally the eyes, ears, and sense of motion behind each decision.

And there can be no correct decision without proper data.

The Client

Light is the company behind the sophisticated multi-lens and multi-sensor cameras initially used in mobile devices. The company’s products have delivered optical zoom, a higher overall image quality, and stereoscopic depth estimation. 

Recently, the company has been focused on using its expertise to build perception technology for the next generation of vehicles. Harnessing the experience gathered while delivering smartphone cameras, Light is leveraging its previously-developed multi-view technology to better address the needs of driver-assisted and autonomous vehicles. 

What is multi-view and what makes it so special?

On the most basic level, multi-view works by combining two or more images taken from different angles in order to recover depth information and produce a 3D view of a scene. The process is widely seen in nature, where the brain combines images from two eyes to produce a multidimensional view. Much the same can be achieved with the cameras of autonomous cars to estimate depth.

The process itself is pretty straightforward. When one knows the distance between the lenses (of either cameras or eyes) and the difference in the position of a particular object in both images, one can easily calculate its depth using the principles of triangulation.
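The triangulation above can be sketched in a few lines of Python. The focal length, baseline, and disparity values below are illustrative only – they are not parameters of Light’s actual system:

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Distance to a point seen by two rectified cameras.

    Z = f * B / d: the farther away the point, the smaller the shift
    (disparity) between its positions in the two images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# A feature shifted by 3 px between two cameras mounted 0.5 m apart,
# imaged with a 1200 px focal length, lies 200 m away:
print(depth_from_disparity(1200, 0.5, 3.0))  # 200.0
```

The same relation also explains why long-range depth estimation is hard: at large distances the disparity shrinks toward zero, so tiny pixel errors translate into large depth errors.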

When compared to other depth estimation methods used in autonomous cars, like radar or lidar, multi-view has multiple advantages:

  • It is passive – this technique emits no signal (unlike laser-based lidar or radio-based radar), so energy consumption stays low and safety remains high.

  • It delivers data used in multiple systems – the video material gathered by the cameras can be used in image analysis or filtered through existing image recognition solutions without further data collection required.

  • It works on image data – existing road infrastructure is designed for drivers’ sight, not for radar or lidar. Thus, stereoscopic cameras are the best fit for current road norms.

But what is straightforward on paper doesn’t always carry over into reality. Not even close.

Challenges in Multi-view

The most crucial and painstaking element of multi-view vision systems is making cameras work efficiently in tandem or in larger groups. There are several challenges to take into account when thinking about this technology:

  • Camera calibration – every single camera needs to be calibrated and adjusted to the requirements of the Light Clarity Platform, especially considering the great variability among available models and configurations. On the other hand, this approach allows the company to cut costs significantly by utilizing already-available hardware.

  • Camera movement – it is nearly impossible to eliminate the subtle movement of the camera lens. Even a slight change in position alters the angle between two lenses and can significantly skew estimations.

  • Approximate pixel positions – the key aspect of stereo vision is finding the same entity in two images (be it a tree, a car, a lamp, or a traffic light) and measuring the difference between its positions in images from two or more cameras. If the solution is to work in an autonomous car, it needs to be extremely fast and efficient, work on incomplete data (the sensor delivers strings of data rather than a full image), and remain reliable – a task not to be underestimated.

  • Slightly different pixel-level representations of the world – the computation behind stereo vision rests on the difference between the positions of two known pixels, or groups of pixels, that compose what the human eye interprets as a car or a tree. Yet it is not certain that every sensor will represent the world in exactly the same way – the fringe of an object can be blurred or shifted due to the nature of the sensor that turns light into an image. So even assuming perfect calibration of all cameras, some misalignment of the rectified images can remain.

  • Different photometric calibration – another challenge lies in keeping the cameras’ photometric calibration consistent, yet sometimes this is simply impossible. After light is transformed into a stream of bits and processed by machines, matching the resulting images can get even more complicated.

  • Perspective distortion – last but not least, when it comes to the cameras used in autonomous cars, the well-known phenomenon of perspective distortion comes into play. The wider the gap between the cameras, the greater the perspective distortion, and the algorithm needs to take this into account.
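A common textbook technique for two of these challenges at once – finding the same entity in both images despite photometric differences between cameras – is to score candidate matches with zero-normalized cross-correlation (ZNCC). This is a generic illustration, not a description of the Clarity Platform’s internals; the sketch assumes rectified images, so the search runs along a single image row:

```python
import math

def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length patches.

    Subtracting each patch's mean and dividing by its norm makes the
    score invariant to per-camera brightness (offset) and contrast
    (gain) differences, one way around imperfect photometric calibration.
    """
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [x - mb for x in b]
    na = math.sqrt(sum(x * x for x in da))
    nb = math.sqrt(sum(x * x for x in db))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / (na * nb)

def best_disparity(left_row, right_row, x, patch=3, max_disp=8):
    """Disparity that maximizes ZNCC for the patch at left_row[x].

    Assumes rectified images, so the match lies on the same row.
    """
    ref = left_row[x:x + patch]
    best_d, best_score = 0, -2.0
    for d in range(min(max_disp, x) + 1):
        score = zncc(ref, right_row[x - d:x - d + patch])
        if score > best_score:
            best_d, best_score = d, score
    return best_d

# Toy rows: the right camera sees the same bright feature shifted
# left by 2 px and at half the brightness (a gain mismatch).
left = [10, 10, 80, 90, 70, 10, 10, 10, 10, 10]
right = [x * 0.5 for x in left[2:] + [10, 10]]
print(best_disparity(left, right, x=2))  # 2
```

Because ZNCC normalizes each patch independently, the gain mismatch between the two rows does not prevent the correct disparity from winning – something a plain sum-of-absolute-differences score would struggle with.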

Our solutions

Considering the challenges mentioned above, there is no multi-view depth estimation without a heavy dose of engineering and the solving of multiple problems, both known and unexpected. Thus, the Tooploox engineers had to deliver top-notch solutions in fields such as:

  • Signal analysis

  • Combination of data from various sources

  • Machine learning

The main element of a multi-view solution is signal analysis – the algorithmic processing of sensor output to produce real-time depth estimation. Light’s goal is a product that performs significantly better than existing state-of-the-art solutions.


To achieve that, several development stages were necessary.

The effect

With support from Tooploox engineers and designers, Light managed to build a unique system that combines multiple advantages previously unseen on the market:

  • Top-class accuracy in depth estimation – using only camera input, the system delivers top-class accuracy when estimating depth at distances of up to 1,000 m – an unmatched range.

  • Energy efficiency – the system uses only passive input, making it significantly more energy-efficient than active perception solutions such as lidar and radar. Relying on passive input also makes the solution more suitable for challenging environments, where signal emission can be disturbed by the presence of sophisticated devices.

  • Cost efficiency – the platform is delivered using cameras that are already available on the market, significantly cutting the costs of the solution.

  • Improved performance of the entire sensor stack – the system not only uses a state-of-the-art depth estimation algorithm, but also validates its output against data from other sources, such as GPS, HD maps, and neural networks that check the cameras’ output.

  • A solid basis for further use cases – the primary use case for Light’s Clarity Platform is to make the next generation of cars safer, with or without a driver. But that is just the beginning. With this level of depth estimation, there are many more use cases yet to be delivered.

Lessons learned

  • A testing environment is a must when delivering sophisticated, hardware-embedded software.

  • With artificial data of reasonable quality, the team can easily test an initial set of hypotheses and filter out the worst-performing iterations to advance their work only with the best, significantly boosting efficiency and reducing delivery time.

  • On the other hand, there are multiple factors in real data that can deliver unexpected results. Additional research is always necessary when transferring from synthetic to real data usage.

Let our specialists solve the problems and tackle the challenges that hold you back from conquering the world.

Let’s talk