When Surveillance meets AI

Date: 2017-05-11
What will the world be like when surveillance meets AI?

Security camera shipments continue to grow at a 14.1% CAGR and are forecast to reach 190 million units in 2020 (according to an IHS report from September 2016), even though revenue growth has slowed to an 8.1% CAGR. There are simply too many cameras and too much video footage for human operators to digest. Most security video footage is erased or overwritten without ever being watched. Video Analytics Technology was once seen as a way to automate the use of this abundant footage. By identifying and tagging the appearance of certain patterns in a video, the system could search the footage and run statistics on it. Such output could be further accumulated and analyzed to find trends and correlations. However, the potential has not translated into business momentum. The complexity of analytic algorithms made it difficult to develop new software to detect a desired pattern, and the tremendous demand for CPU processing power made it difficult to get timely analytics output. Artificial Intelligence may be the key to unlocking this potential.

Video Analytics Technology has been evolving over the past 10 years. It has been making headlines more often lately thanks to the use of Artificial Intelligence. Machine learning greatly simplifies the software development process, and the processing power of GPUs makes near-real-time video analysis possible. For example, at the 2016 G20 summit, China deployed a security solution developed by Dahua Technology that uses deep learning to automatically screen pedestrians in airports and train stations for criminal suspects.

Schema for Dahua facial recognition technology

Deep Learning has been Accelerating the Pace of Intelligent Surveillance

Deep learning refers to artificial neural networks composed of many layers. It aims to emulate the human ability to analyze and learn, imitating the mechanisms of the brain in order to interpret data such as images, voice and text. Deep learning has been successfully applied to image and voice recognition and is set to be a direction of future development. In 2013, MIT Technology Review listed deep learning as one of its ten breakthrough technologies.

In the security industry, the application of deep learning is important for two reasons. On the one hand, it improves the accuracy of some existing algorithms. On the other hand, it enables functions that cannot be achieved without it. For instance, facial recognition comprises three key parts: face detection, facial feature alignment, and feature extraction and comparison. When deep learning is adopted, the performance of each part improves dramatically. With deep learning, facial expression, gender, age, hair color, accessories, emotion and so on can all be recognized more reliably. Moreover, GPUs can be used to accelerate the computation of deep learning algorithms. Traditional intelligent analysis is unable to cover a large-scale scene with more than 300 people, let alone perform group analysis of moving scenes. A system based on deep learning and GPUs can easily handle 300 targets simultaneously and can further estimate crowd density and identify crowd movement, providing more useful information to security staff.
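
As a rough illustration of the last of the three parts only (comparison), the sketch below scores two face feature vectors with cosine similarity. The vectors, the helper names cosine_similarity and is_same_person, and the 0.6 threshold are all hypothetical. This is not Dahua's algorithm; in a real system the vectors would come from a trained deep network after detection and alignment.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def is_same_person(a, b, threshold=0.6):
    """Declare a match when similarity reaches the threshold.

    The threshold value is an illustrative assumption, not a tuned figure.
    """
    return cosine_similarity(a, b) >= threshold


# Made-up 4-dimensional vectors; real face features have far more dimensions.
probe = [0.12, 0.80, 0.35, 0.44]
enrolled = [0.10, 0.78, 0.40, 0.41]
stranger = [0.90, -0.05, -0.30, 0.02]

print(is_same_person(probe, enrolled))   # vectors point the same way
print(is_same_person(probe, stranger))   # vectors point in different directions
```

The threshold trades false matches against missed matches, so in practice it would be chosen from labeled test data rather than fixed by hand.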

Clearly, deep learning is accelerating the development of intelligent surveillance. On 7 March 2017, Dahua, working with NVIDIA, a world-leading Artificial Intelligence (AI) computing company, launched the “Deep Sense” server for smart video structure analysis. Dahua has also cooperated with many renowned universities in and outside China to advance research on deep learning. As a result, Dahua’s face recognition algorithm ranked number one on LFW (Labeled Faces in the Wild), an authoritative public testing platform, beating Tencent, Google and other top academic and commercial groups around the world.

Dahua made an early start on AI Technology

Dahua Technology made an early start in AI applications among players in the global security industry. In 2009, Dahua established a department to research intelligent algorithms and explore potential applications in security solutions. The department was later merged with other research groups to form the Institute of Advanced Technology, which focuses on advanced technologies such as AI, optics, codecs and ISP. Dahua’s ANPR (automatic number plate recognition) has greatly improved traffic and parking management, promoting sustainable urban development. Deep learning is also being applied to the recognition of vehicles and people. People can be classified by clothing, hair color, eyeglasses, backpack, gender, age range and even facial expression. Vehicles can be classified by color, make, model and type in addition to the license plate.

Vehicle Identification and Statistical Analysis

The ability to use AI to identify and analyze vehicles is going to be very valuable. A witness may remember the color and make of a vehicle but not its plate. Applying deep learning has brought an obvious improvement to AI-powered security applications. On the one hand, the plate number recognition rate has increased significantly. On the other hand, the system can now identify vehicle features such as type, make, model and color in a more systematic way. By combining several of these elements in one search, it becomes possible to identify a target vehicle even if the license plate was not captured.
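
The idea of combining several elements in one search can be sketched as a simple attribute filter. The VehicleSighting record, the search_vehicles helper, the camera names and the makes "MakeA" and "MakeB" are hypothetical stand-ins for whatever the recognition stage actually outputs; this is a minimal sketch, not a description of Dahua's product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VehicleSighting:
    camera: str
    color: str
    make: str
    vehicle_type: str
    plate: Optional[str]  # None when the plate was not captured


def search_vehicles(sightings, **criteria):
    """Return the sightings whose attributes equal every given criterion."""
    return [
        s for s in sightings
        if all(getattr(s, name) == value for name, value in criteria.items())
    ]


# Made-up records for illustration only.
sightings = [
    VehicleSighting("gate-1", "white", "MakeA", "sedan", "ABC123"),
    VehicleSighting("gate-2", "red", "MakeB", "suv", None),
    VehicleSighting("lot-3", "red", "MakeB", "sedan", None),
    VehicleSighting("lot-3", "red", "MakeA", "suv", "XYZ789"),
]

# A witness remembers only the color and make, not the plate.
matches = search_vehicles(sightings, color="red", make="MakeB")
print([s.camera for s in matches])

# Each additional attribute narrows the candidate list further.
narrowed = search_vehicles(sightings, color="red", make="MakeB", vehicle_type="suv")
print([s.camera for s in narrowed])
```

Note that both sightings returned by the first query have no plate on record, which is exactly the case the paragraph describes.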

Dahua Vehicle Identification

Human Recognition and Statistical Analysis

Traditional intelligent video analysis technology was previously unable to recognize body shape, gender, age, hair color or hair length, but Dahua’s deep learning technology has made all of this possible. The deep learning video analytics server handles recognition of up to 80 people within 40 ms. Human recognition is also well suited to crowded places with continuous flows of people, such as escalators, crossroads, business centers and the gates of exhibition centers, where its accuracy rate reaches up to 95%. As long as enough training has been done, the recognition rate is constrained only by how much of the target is exposed to the camera and how fast the target is moving, much as if a human operator were watching the video full time.
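
To put the "80 people within 40 ms" figure in perspective, the arithmetic below converts it into per-second terms. It assumes, purely for illustration, that the server can sustain that rate with back-to-back batches; the source does not state this. For reference, 40 ms is also the frame interval of 25 fps video.

```python
PEOPLE_PER_BATCH = 80   # figure quoted in the text
BATCH_TIME_MS = 40      # figure quoted in the text

# Assumption: batches run back to back with no idle time in between.
batches_per_second = 1000 / BATCH_TIME_MS
people_per_second = PEOPLE_PER_BATCH * batches_per_second
ms_per_person = BATCH_TIME_MS / PEOPLE_PER_BATCH

print(batches_per_second)  # 25.0
print(people_per_second)   # 2000.0
print(ms_per_person)       # 0.5
```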

Application of AI to Campus Safety

In recent years, the American TV series Person of Interest has been very popular. The series depicts AI being used to predict crimes: a software genius named Finch invents a program that recognizes potential violent criminals in advance by observing patterns. It sounds like science fiction, but it is close to becoming reality with AI deep learning.

American TV series--Person of Interest

A GPU-powered “Deep Sense” server can cover 192 channels of HD video. Unlike previous Intelligent Video Analytics (IVA), which could only monitor key entrances because of cost and capacity limitations, it makes it technically and economically viable to fully monitor the surveillance system of a typical building campus. With a rich set of search criteria, a match is much more likely even without a clear face shot of the target. The system can trace the trail of a target to screen for “behavior of interest”. This helps police solve crimes faster and deters criminals, thereby improving security. For example, if the police want to find a suspect who is a middle-aged man with a red umbrella, they can search the system with keywords such as “red umbrella”, “male” and “30 to 50 years old”. The AI system can perform a quick search, saving a lot of manual work.
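
The red umbrella search and the idea of tracing a trail can be sketched as a filter followed by a sort on time. The PersonSighting record, its fields, the camera names and the trace_trail helper are hypothetical; the sketch assumes an upstream recognition stage has already tagged each sighting with gender, an estimated age range and accessories.

```python
from dataclasses import dataclass


@dataclass
class PersonSighting:
    timestamp: int        # seconds since the start of the recording
    camera: str
    gender: str
    age_range: tuple      # (youngest, oldest) estimated age
    tags: frozenset       # accessories and other labels


def matches(s, gender, age_min, age_max, required_tags):
    """True when gender matches, age ranges overlap and all tags are present."""
    ages_overlap = s.age_range[0] <= age_max and s.age_range[1] >= age_min
    return s.gender == gender and ages_overlap and required_tags <= s.tags


def trace_trail(sightings, gender, age_min, age_max, required_tags):
    """Return matching sightings ordered by time, i.e. a candidate trail."""
    hits = [s for s in sightings
            if matches(s, gender, age_min, age_max, required_tags)]
    return sorted(hits, key=lambda s: s.timestamp)


# Made-up records for illustration only.
sightings = [
    PersonSighting(300, "lobby", "male", (35, 45), frozenset({"red umbrella"})),
    PersonSighting(120, "gate", "male", (30, 40), frozenset({"red umbrella", "backpack"})),
    PersonSighting(200, "gate", "female", (30, 40), frozenset({"red umbrella"})),
    PersonSighting(250, "car park", "male", (60, 70), frozenset({"red umbrella"})),
    PersonSighting(400, "lift", "male", (40, 50), frozenset({"backpack"})),
]

trail = trace_trail(sightings, "male", 30, 50, {"red umbrella"})
print([(s.timestamp, s.camera) for s in trail])
```

Matching attributes does not prove that every hit is the same person, so a trail built this way is only a shortlist for an operator to review.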

Developing Trends for AI Technology and Application in Security

The development of AI applications will likely face many obstacles and difficulties, but the trends are promising. Advances in human and vehicle recognition have had a significant impact on security applications. Voice recognition is likely to be the next driver. Acoustic patterns can be combined with human behavior patterns or vehicle characteristics to narrow down a search faster and reduce false alarms. Voice can also be a form of data entry or interaction. Hand gestures and body gestures, or a combination of these, could help the “machine” understand the context of what is happening.


So, what will the world be like if surveillance meets AI?