While the third article in our summer series highlighted the use of natural language processing at Egis, this week we look at another major capability of artificial intelligence: machine vision, the extremely effective ability of AI to identify an object, a person or a shape in a still or moving image.
The possibility of a computer being able to identify an object or a person in a photo or video is a quest that has stimulated the scientific community for over 60 years.
Very early on, manufacturers got wise to the potential of artificial intelligence to automate product inspections as part of their quality processes. Similarly, doctors were quick to grasp the benefits of having a "super assistant" to help them detect tumours at an early stage. Machine vision has a huge number of applications, because it gives computers the ability to perceive and 'understand' their environment. It can be used to identify plants, count crowds and flocks of birds (and even recognise individual birds!) or monitor events. It enables machines to model the environment, which is essential for autonomous vehicle navigation, or to generate 3D shapes efficiently.
In the 1960s, the first experiments in computer vision were carried out at American universities. The most emblematic was probably the "Summer Vision" project carried out in 1966 at MIT (Massachusetts Institute of Technology, USA). The aim at the time was to develop a system enabling computers to identify and categorise objects in images.
Since then, many researchers have contributed to the development of computer vision. In the 1970s and 1980s, these included David Marr, with his work on the representation and processing of visual information, followed by Yann LeCun (now head of AI research at Meta), Geoffrey Hinton (formerly of Google Brain) and Yoshua Bengio (professor at the University of Montreal), the pioneers of applying deep learning (neural networks) to computer vision. Deep learning gave a breathtaking demonstration of its effectiveness at the 2012 ImageNet recognition competition, cutting the identification error rate from around 25% to 16%. A few years later, it had fallen to just a few percent.
For their work, they received the Turing Award in 2018.
Before going any further, let us take a moment to look at the concepts of "deep learning" and "neural networks". These have played a major role in the development of machine vision.
Convolutional Neural Networks (CNNs) are widely used in computer vision to process images. Each layer of a CNN applies different filters to recognise small patterns, then assembles these patterns in higher layers to recognise larger, more complex shapes. Thanks to deep learning, computers have learned to identify and locate objects in images and videos. This was followed by 'transfer learning', which makes it possible to take models already pre-trained on large databases and adapt them to specific tasks with only a small amount of data, generating significant savings in time and computing resources. It is also deep learning that has made it possible to detect facial expressions in humans and animals, an important capability because it opens a window onto emotions.
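The filter-then-assemble idea behind a CNN layer can be illustrated with a single hand-written convolution. This is a minimal sketch in Python with NumPy; in a real CNN the filter values are learned during training rather than hand-picked, and many filters are stacked across many layers:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter (kernel) over the image and record how
    strongly each neighbourhood matches the pattern in the filter."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny 5x5 image with a vertical dark-to-bright edge down the middle.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A classic vertical-edge filter: responds where dark meets bright.
vertical_edge = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

response = convolve2d(image, vertical_edge)
# Positions covering the dark-to-bright transition respond strongly;
# uniform regions produce zero.
print(response)
```

The response map is what the next layer of a CNN sees: higher layers combine many such maps to recognise corners, textures and eventually whole objects.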
For the sake of completeness, we should mention other learning techniques such as generative models and, in particular, generative adversarial networks (GANs). These models create new images that resemble real examples, which can themselves then be used as training data.
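The adversarial idea behind a GAN can be sketched through its two opposing loss functions. This is illustrative only, with function names of our own choosing; a real GAN pits two deep networks against each other and trains both by gradient descent in a deep-learning framework:

```python
import math

def discriminator_loss(d_real, d_fake):
    """The discriminator wants to output 1 on real images (d_real)
    and 0 on generated fakes (d_fake)."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The generator wants the opposite: for the discriminator
    to be fooled into outputting 1 on its fakes."""
    return -math.log(d_fake)

# A discriminator that correctly scores real images high (0.9) and
# fakes low (0.1) suffers little loss...
print(discriminator_loss(d_real=0.9, d_fake=0.1))
# ...while a generator whose fakes are easily spotted suffers a lot,
# pushing it to produce ever more realistic images.
print(generator_loss(d_fake=0.1))
```

Training alternates between the two objectives until the generated images become hard to distinguish from real ones.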
The vision capacity of artificial intelligence has therefore transformed the way machines perceive and interact with the world around them. Just as with natural language processing, the Egis group already deploys and makes extensive use of solutions based on machine vision.
By way of example, our Consulting and Operations teams have already produced several AI proofs of concept and deployed various solutions in their activities. To illustrate this approach, here are two examples of projects carried out by the group in recent months.
The first project, AI For Infra Monitoring, which began at the end of the first half of 2022, deals with the predictive maintenance of infrastructure, roads in particular. Conducted with partner IRIS GO and piloted between France and Australia (the Egis sponsor of the project is Mark Woolstencroft), it set out to answer the following question: can we automatically identify defects in road and ROW assets (road markings, guardrails, signage)? The answer is yes!
All it took to achieve this innovation was to train artificial intelligence to recognise defects (using supervised machine learning). The machine identifies and locates the problem so that the team can take charge of it and solve it.
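Supervised learning of this kind boils down to fitting a model on labelled examples. Here is a deliberately tiny sketch using a nearest-centroid classifier on made-up features; the real project trains a deep network on labelled road imagery, and the feature names below are purely hypothetical:

```python
import numpy as np

# Made-up labelled training examples: [crack_length_mm, contrast].
features = np.array([
    [0.5, 0.10],   # intact surface
    [0.8, 0.20],   # intact surface
    [12.0, 0.90],  # visible defect
    [9.5, 0.80],   # visible defect
])
labels = np.array(["ok", "ok", "defect", "defect"])

# "Training": compute one centroid (mean feature vector) per class.
centroids = {c: features[labels == c].mean(axis=0) for c in set(labels)}

def classify(sample):
    """Assign the class whose centroid lies closest to the sample."""
    return min(centroids, key=lambda c: np.linalg.norm(sample - centroids[c]))

print(classify(np.array([10.0, 0.85])))  # near the defect centroid
print(classify(np.array([0.6, 0.15])))   # near the intact centroid
```

However simple, this captures the principle: once the model has seen enough labelled defects, it can flag and locate new ones automatically for the maintenance team.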
The solution now exists in the form of a Proof of Concept (POC) for in-depth evaluation. It makes it possible to continuously monitor the infrastructure and improve inspection quality. Staff safety, efficiency and responsiveness are enhanced, as is the quality of reports, because this technology interfaces easily with existing solutions (GIS, ERP and asset management).
The second project, Satellite Monitoring, deployed in Portugal in 2020 and in Turkey during 2023, involves the satellite surveillance of roads and their environment. The aim is to monitor more efficiently and to optimise maintenance work when it becomes necessary. Thanks to AI, it is possible to anticipate changes in the roadway and its immediate environment (terrain and vegetation).
In the first phase of the project, the objective was restricted to preventing landslides and monitoring vegetation (detecting tree diseases, for example). In the second phase, monitoring was extended to the road surface, structures, road markings and railings. The solution also includes an assessment of the site's carbon storage.
This innovation, based on the processing of satellite images by AI, makes it possible to look at the past, manage the present and anticipate the future.
Ultimately, computer vision, based on identifying the content of photos and videos, gives the machine a certain 'perception' of its external environment. There are an ever-increasing number of applications for this, ranging from facial recognition to unlock smartphones, to early detection of cancerous tumours, to autonomous vehicles that "perceive" their environment. Machine vision is everywhere!