Computer Vision: Overview of a Cutting Edge AI Technology
Explore computer vision more deeply and look at what analysts are saying about it.
Today's technology landscape looks promising. Artificial intelligence has begun to move from the margins to the mainstream of the global economy and has attracted strong interest from businesses and the general public alike.
Among the various disciplines of AI, computer vision is gaining considerable momentum. Let’s see what it is all about.
Industry 4.0
Progress in artificial intelligence and robotics is narrowing the gap between human and machine capabilities, although there is still a long way to go before reaching the ultimate goal of a human-like machine. Industry 4.0, with its growing use of autonomous vehicles and drones, is driving the rise of advanced devices such as cameras and image sensors.
Advanced technologies make it possible to perform increasingly complex tasks. Robots and automated processes can take over tedious tasks from humans, giving people the space and time to pursue more valuable work.
Data Is the Key
Viewed through the prism of technology, data is the cornerstone of the digital transformation projects that successful organizations are conducting today. Data can be seen as the best link between humans and machines. Whether it consists of numbers, text, or more complex content such as audio, video, or pictures, digitized information allows humans and machines to communicate with each other and lets machines “understand” the world around them.
What Is Computer Vision?
As the term suggests, computer vision describes a set of technologies that enable computers, software, robots, or any other device to acquire, analyze, and process images. The possible sources of images are many: photographs, videos, 3D devices, data from medical or industrial scanners, and more. The aim is to enable these devices (drones, transportation machines, or even just a simple computer) to “see” and to react to the information they get. In its intricacy and its end-use examples, computer vision is often compared to voice recognition.
You might not be familiar with the technologies behind computer vision. However, one of them, OCR (optical character recognition), is quite popular: it has been used for years to recognize text within photographs or scanned documents. Handwriting recognition has been used for decades by banking systems to read checks, and object recognition has long been used in many industries to automate quality control or to sort products in factories, to cite only a few examples.
Computer vision is tied to AI in the sense that the device must not only see but also, immediately after this recognition phase, analyze and interpret what it “saw” so that it can take appropriate action and interact with its environment.
Like today's AI and IoT technologies, computer vision grew out of research that started in the 1960s. The report of the Summer Vision Project, which aimed at “the construction of a system complex enough to be a real landmark in the development of ‘pattern recognition’,” was issued in 1966.
Computer Vision vs. Image Processing
Computer vision should not be confused with image processing. Image processing is about analyzing and transforming digital images with algorithms, for tasks such as classification, feature extraction, editing, or filtering. It covers the technologies and methodologies used to improve a picture or to draw more information out of it, whereas computer vision aims to interpret images in order to drive practical actions.
While computer vision obviously relies on image processing, it can also be used to carry out higher-level operations such as object recognition or event detection.
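To make the distinction concrete, here is a minimal Python sketch using only the standard library. The 5x5 image, the function names, and the thresholds are all hypothetical and chosen for illustration. The first function is image processing in the sense above (an image goes in, another image comes out), while the second is a toy stand-in for computer vision (an image goes in, a decision comes out).

```python
# Hypothetical 5x5 grayscale image (0 = black, 255 = white): a bright square on a dark background.
image = [
    [10, 10, 10, 10, 10],
    [10, 200, 200, 200, 10],
    [10, 200, 255, 200, 10],
    [10, 200, 200, 200, 10],
    [10, 10, 10, 10, 10],
]


def box_blur(img):
    """Image processing: replace each pixel with the mean of its 3x3 neighborhood."""
    height, width = len(img), len(img[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbors that fall inside the image borders.
            neighbors = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbors) // len(neighbors))
        out.append(row)
    return out


def contains_bright_object(img, threshold=128, min_pixels=4):
    """Toy 'vision' step: decide whether enough bright pixels are present."""
    bright = sum(1 for row in img for pixel in row if pixel >= threshold)
    return bright >= min_pixels


smoothed = box_blur(image)                 # the result is another image
decision = contains_bright_object(image)   # the result is something to act on
print(smoothed[2])
print("Bright object present:", decision)
```

A real system would replace the pixel count with a trained detector, but the shape of the two outputs (picture versus decision) is the point of the comparison.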
What Analysts Are Saying
- Forrester: “Thanks to massive training data sets, deep neural networks, and graphics processing units (GPUs), computers can now accurately identify objects and features in images and video. CIOs and business technology leaders should understand how they can leverage computer vision for security, social media monitoring, marketing asset management, manufacturing, and myriad other use cases involving the classification of unstructured image data.”
- Deloitte: “Many technology sector companies have yet to turn their attention to how cognitive technologies are changing their sector or how they — or their competitors — may be able to implement these technologies in their strategy or operations .../...computer vision is the ability of computers to identify objects, scenes, and activities in unconstrained (that is, naturalistic) visual environments.”
- Arcognizance.com: “Artificial intelligence is associated to human intelligence with related characteristics such as language understanding, analysis, learning, problem-solving and others and it is situated at the core of the next generation software technologies in the market. Leading technology companies have dynamically executed AI as an essential part of their technologies. Computer vision segment is expected to grow at the highest CAGR due to the increasing implementation of computer vision in autonomous and semiautonomous applications in different industries such as manufacturing and automotive.”
- IDC: "Computer vision software technologies are transforming how traditional industries, such as automotive, retail, insurance, and healthcare, are operating. By adding computer vision components into a product or service, vendors within this space are able to increase efficacy while reducing costs."
- McKinsey: “Artificial intelligence is poised to unleash the next wave of digital disruption, and companies should prepare for it now. We already see real-life benefits for a few early adopting firms, making it more urgent than ever for others to accelerate their digital transformations. [Among the most disruptive] five AI technology systems: robotics and autonomous vehicles, computer vision, language, virtual agents, and machine learning, which includes deep learning and underpins many recent advances in the other AI technologies.”
Illustrative Practical Examples
Robots and autonomous machines like self-driving cars are traditionally the favorite fields for computer vision. However, computer vision technologies are becoming prevalent in a growing number of domains, such as:
The Medical Field
Huge progress is constantly being made in pattern recognition and general image processing. At the same time, the medical community and healthcare experts increasingly recognize medical imaging as an essential part of their toolkit, both for better diagnosis and for more effective treatment.
Analysis of medical images is a big help for predictive analytics and therapy. For example, computer vision applied to colonoscopy images can increase the amount of valid and reliable data available, with the aim of reducing colorectal cancer-related mortality.
Computer vision technologies also provide technical assistance in surgery. 3D image modeling of the skull, as part of brain tumor treatment, offers tremendous potential for advanced neurosurgical preparation. Also, as deep learning is increasingly used in AI technologies, its application to the classification of lung nodules has brought tremendous progress in the early diagnosis of lung cancer.
Retail
Computer vision is being used more and more in stores, particularly to improve the customer experience. Pinterest Lens is a search tool that uses computer vision to detect objects in the same way Shazam detects music. Using the smartphone app in a store, you can point the camera at a product and it will return related products.
Facial recognition is a well-known application of computer vision that can be used in a mall or in a shop. Lolli & Pops, a candy store based in the US, is using facial recognition to reward customer loyalty. "Imagine this: You walk into your favorite store and the sales associate welcomes you by name and on-demand, shares with you which of their latest products you would most likely be interested in." That is the promise of a technology that can make personalized recommendations for each customer.
Applications seem unlimited. They could also include analyzing how customers move back and forth between shelves or floors in a store, and possibly even analyzing customers' moods. Emotion detection is based on algorithms that locate a face in a video, analyze its micro-expressions, and finally interpret the person's general feelings.
The end of checkout lines might be the ultimate goal of in-store technology. Computer vision combined with AI might finally put an end to the nightmare of queuing at the checkout.
Amazon developed a concept, Amazon Go, that leverages technologies including computer vision, IoT, and AI to detect, track, and analyze customers' behavior and actions in the shop in order to automatically process checkout and send them an electronic receipt.
Banking
When it comes to associating AI technologies with banking, we mostly think of fraud detection. While that is a major area of focus for advanced technology in this domain, computer vision has much more to offer in terms of innovation. For example, image recognition applications that use machine learning to classify and extract data, or to check the authenticity of documents such as IDs or driving licenses, can improve the remote customer experience and increase security at the same time.
Drone-Based Fire Detection
The widespread and varied use of computer vision also extends to niche markets within the security area. Drones, or UAVs, can leverage computer vision systems to enhance humans' ability to detect forest fires, using infrared (IR) images as part of forest fire surveillance protocols. Advanced algorithms analyze characteristics of the video images, such as motion or brightness, to detect fire. The system extracts targeted features to make patterns easier to spot and to tell actual fires apart from movements that could be mistaken for fires. These drones can also improve firefighters' safety and operational efficiency by carrying out surveillance or searches in dangerous zones on their behalf. They can run advanced algorithm-based analyses of smoke and flames to evaluate risks and predict fire propagation.
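The idea of combining brightness and motion can be sketched in a few lines of standard-library Python. This is only an illustrative heuristic under stated assumptions, not the method used by any particular drone system: the 3x3 frames, the thresholds, and the function names are hypothetical. A pixel is flagged only if it is both bright and changing between two consecutive frames, which filters out a bright but static spot as well as movement that is not bright.

```python
def bright_mask(frame, threshold=200):
    """Mark pixels that are at least as bright as the threshold."""
    return [[pixel >= threshold for pixel in row] for row in frame]


def motion_mask(prev_frame, curr_frame, min_change=30):
    """Mark pixels whose intensity changed noticeably between two frames."""
    return [
        [abs(curr - prev) >= min_change for prev, curr in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def fire_candidates(prev_frame, curr_frame):
    """Return (row, column) positions that are both bright and changing."""
    bright = bright_mask(curr_frame)
    moving = motion_mask(prev_frame, curr_frame)
    return [
        (y, x)
        for y, row in enumerate(bright)
        for x, is_bright in enumerate(row)
        if is_bright and moving[y][x]
    ]


# Hypothetical consecutive infrared frames (0 = cold, 255 = hot).
prev_frame = [
    [20, 20, 20],
    [20, 240, 20],   # center: hot but static (for example, a sun-heated rock)
    [20, 20, 20],
]
curr_frame = [
    [20, 20, 250],   # top right: newly hot and changing, so a fire candidate
    [20, 240, 20],
    [20, 90, 20],    # bottom center: changing but not hot (for example, an animal)
]

print(fire_candidates(prev_frame, curr_frame))  # [(0, 2)]
```

Only the top-right pixel passes both tests. Production systems would add many more cues (flicker frequency, smoke texture, temporal persistence), but the same principle of cross-checking several image characteristics applies.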
Advanced Technologies Ecosystem
According to ResearchAndMarkets.com's research, "The AI in computer vision market is expected to be valued at USD 3.62 billion in 2018 and is expected to reach USD 25.32 billion by 2023."
The range of technologies involved in computer vision is wide. It includes, for example, image recognition, which is used to recognize objects and people but also actions; machine learning; cloud computing, to take advantage of CPU and storage resources; and edge computing, since many use cases such as autonomous drones need to process data at the very place where it is created. Among those advanced technologies, machine learning and deep learning in particular are driving the development and progress of computer vision.
Machine Learning
Machine learning is a class of algorithms aimed at giving applications a higher level of accuracy. The interesting point is that those algorithms do not necessarily need explicit, hand-written rules to achieve this. Based on the flow of input data, recurring statistics, and advanced analytics, they can continually improve the quality of their outcomes.
Machine learning relies on the high potential of datasets. Simply put, a dataset is a collection of related data that are combined to bring more value and to be easier to access.
The computer vision ecosystem provides the technical community with a large number of free image datasets. For example, the Columbia Object Image Library (COIL-100) from Columbia University features 100 different objects imaged at 5-degree intervals across a full 360-degree rotation.
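As a rough illustration of learning from a dataset rather than from hand-written rules, the sketch below implements a nearest-centroid classifier in plain Python. The technique is a deliberately simple stand-in for the far more capable models used in practice. The tiny 2x2 "images" (flattened to four pixel values) and their labels are invented for the example and do not come from COIL-100 or any real dataset.

```python
def centroid(vectors):
    """Average a list of equal-length pixel vectors, component by component."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def train(dataset):
    """Learn one average image (centroid) per label from labeled examples."""
    by_label = {}
    for pixels, label in dataset:
        by_label.setdefault(label, []).append(pixels)
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, pixels):
    """Return the label whose centroid is closest to the given image."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(model, key=lambda label: squared_distance(model[label], pixels))


# Hypothetical labeled dataset: 2x2 images stored as [top-left, top-right, bottom-left, bottom-right].
dataset = [
    ([255, 255, 0, 0], "horizontal"),   # bright top row
    ([240, 250, 10, 5], "horizontal"),
    ([250, 245, 0, 20], "horizontal"),
    ([255, 0, 255, 0], "vertical"),     # bright left column
    ([250, 10, 240, 5], "vertical"),
    ([245, 20, 250, 0], "vertical"),
]

model = train(dataset)
print(predict(model, [230, 220, 30, 10]))   # horizontal
print(predict(model, [240, 15, 235, 10]))   # vertical
```

Nowhere does the code state what a horizontal or vertical stripe looks like; the distinction comes entirely from the examples, and adding more labeled images would refine the centroids.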
Deep Learning
Deep learning is a subfield of artificial intelligence loosely inspired by the way humans learn in order to reach a better level of knowledge. It therefore offers ways to improve processes, including the accuracy of computer vision outcomes.
Deep learning algorithms rely on neural networks to represent a problem as a hierarchy of concepts, in which complex concepts are broken down into a sequence of much simpler ones.
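A classic way to see this hierarchy at a very small scale is the XOR function, which a single artificial neuron cannot compute but two layers can. In the sketch below, the first layer detects two simple concepts (OR and NAND) and the second layer combines them into the more complex one (XOR). The weights are set by hand purely for illustration; in real deep learning they would be learned from data, and the networks would be far larger.

```python
def neuron(inputs, weights, bias):
    """A single artificial neuron with a step activation: fires (1) if the weighted sum is positive."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def xor_network(a, b):
    """Two-layer network: simple concepts in the first layer, a complex one in the second."""
    # Layer 1: two simple concepts computed directly from the inputs.
    h_or = neuron([a, b], [1, 1], -0.5)       # fires if at least one input is 1
    h_nand = neuron([a, b], [-1, -1], 1.5)    # fires unless both inputs are 1
    # Layer 2: the complex concept is a combination (AND) of the simple ones.
    return neuron([h_or, h_nand], [1, 1], -1.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

Image-recognition networks follow the same pattern at scale: early layers respond to simple features such as edges, and later layers combine them into shapes and, eventually, whole objects.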
Facial Recognition
The aim of facial recognition is to map and store a digital identity using deep learning algorithms. This type of biometric identification can be compared to the better-known voice, iris, or fingerprint identification technologies.
Anecdotally, it started in 2011 when Google proved it was possible to build a face detector using only unlabeled images. They designed a system that could learn by itself to detect cat images without anyone “explaining” to it what a cat looks like.
At that time, the neural network ran on 1,000 computers with a total of 16,000 cores. It was fed with 10 million images taken from randomly selected YouTube videos. Dr. J. Dean, who worked on this project, explained in a New York Times interview that they never told the system during training that “this is a cat,” so it basically “invented the concept of a cat.”
Computer Vision in Daily Life
Today, smartphones can use high-quality cameras to identify their users. For example, Apple's iPhone X uses Face ID technology so that users can unlock their phones. Face ID data is encrypted and stored on the device itself rather than in the cloud, and it can also be used for authentication when paying for something.
In China, experts conducting research on computer vision technologies are bringing them into everyday life at a rapid pace. Not only are Chinese consumers using their smartphones and facial recognition as a preferred means of payment, but the technology also helps detect and apprehend criminals.
What Does This Mean for Humans?
Computer vision is being used in the security sector to search for criminals, in urban security to predict emergency crowd movements, and more.
By developing ever more complex and effective computer vision algorithms, we are also improving a related field, human speech recognition, as both rely on comparable principles. All of this helps strengthen the situational awareness of AI and robots.
Increasing deep learning capabilities and the growing power of machine learning algorithms are a continuing cause for concern, or at least a subject that requires special attention, because they raise issues of privacy and ethics, among other things.
However, that doesn't mean that research should stop. On the contrary, like any other technological development, it has to be supervised by a global collective intelligence rather than by a single industrial or military power or hegemonic nation.