Computer Vision is a branch of Artificial Intelligence in which algorithms are trained on visual data (pictures or video); once trained, they can perform classification and prediction tasks on new data (new pictures or video).
A typical application is detecting a disease in a medical image (such as an X-ray or an MRI), to determine whether a patient has that disease or not (1 for YES, 0 for NO). Classification can be more complex: in a video sequence, for example, the goal might be to identify objects in a room such as a chair, a bed, a desk, a TV, or a phone.
This can be used, for example, to count and monitor unique customers at a retail store, calculating customer conversion rates and also identifying the purchasing patterns of those who buy and those who do not, so that the commercial strategy and in-store layout can be optimized.
How can we use Computer Vision Models?
- Object recognition.
- Face recognition.
- Pattern recognition.
- Manufacturing optical inspection.
- Manufacturing quality auditing.
- Retail consumer tracking.
- Agriculture crop monitoring.
- Traffic management.
- Autonomous vehicles.
- Document analysis and extraction.
- Security surveillance / intrusion detection.
- Medical diagnostics.
There is a great deal of literature on this, but the bottom line is that there are two types of Computer Vision algorithms, based on their application.
- Image Processing. Classification is done by training a convolutional neural network with images; once trained, the algorithm can classify any new image. Classification can be simple (binary), for example answering "Is this an image of Person “A”?" (Yes or No) or "Is this the X-ray of someone who has Edema?" (Yes or No), or it can be more complex, recognizing multiple objects, gestures, and/or features within the same image.
- Video Processing. Classification works the same way as for an image: a convolutional neural network is trained with images and, once trained, can classify any new image. A video is a sequence of images, so the algorithm can run classification continuously as each frame is displayed. It can classify each snapshot or group of snapshots and can “track” the resulting classifications or object identifications along the video sequence.
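The frame-by-frame idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `classify_frame` is a hypothetical stand-in for a trained network (it simply thresholds mean brightness), and the majority vote over a sliding window is one simple way, chosen here for brevity, to follow a classification along the sequence.

```python
from collections import Counter, deque
from typing import Callable, Deque, Iterable, Iterator, List

Frame = List[List[int]]  # a grayscale frame as rows of pixel intensities (0-255)


def classify_frame(frame: Frame) -> str:
    """Hypothetical stand-in for a trained CNN: labels a frame by mean brightness."""
    pixels = [p for row in frame for p in row]
    return "object" if sum(pixels) / len(pixels) > 127 else "empty"


def classify_video(frames: Iterable[Frame],
                   classifier: Callable[[Frame], str] = classify_frame,
                   window: int = 3) -> Iterator[str]:
    """Classify each frame, then smooth with a majority vote over recent frames."""
    recent: Deque[str] = deque(maxlen=window)
    for frame in frames:
        recent.append(classifier(frame))
        # The most common label in the window becomes the label for this frame.
        yield Counter(recent).most_common(1)[0][0]


bright = [[200, 210], [220, 230]]
dark = [[10, 20], [30, 40]]
video = [dark, dark, bright, dark, dark]  # one isolated bright frame

raw_labels = [classify_frame(f) for f in video]
smoothed_labels = list(classify_video(video))
print(raw_labels)
print(smoothed_labels)
```

The per-frame labels flag the single bright frame, while the smoothed sequence treats it as noise. In a real system the window size would be tuned to the frame rate and to how quickly the scene is expected to change.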
There are more Computer Vision models. For example, some models can identify not just a particular “state” (for example, a disease in an X-ray) but also the “stage” that the image (or video) shows (for example, whether it is 2nd stage Cancer), and predict, with a certain level of accuracy, how long it will take to reach 3rd stage Cancer.
Applications of Computer Vision can also be very creative, and multiple video sources can be used simultaneously, for example to identify theft patterns at a retail store.
ADDITIONAL SERVICES
Computer Vision Models oftentimes require a lot of work on images and video files. This work can sometimes be packaged to create additional products, such as training software and simulators.
Furthermore, while objects are being identified, new objects (anything from 2D to 3D) can be overlaid onto the video, creating an augmented reality or even virtual reality output. So, it is possible to create augmented/virtual reality solutions.
We can also connect the augmented reality or virtual reality solutions to other simulators (like process simulators) to generate a fully immersive experience where users not only see and interact with each other but also interact with technologies in the real world in real time.
Contact our sales team
- Goal. What are we trying to achieve and why?
- Data availability & labeling. What data do I have available? Is that data enough to describe the goal? Is the data labeled, or can I label it so I can proceed with training the model? Note that in this case the data are pictures or video. Labeling the data means associating each picture with the classification condition to be predicted (e.g., is this person X?).
- Data hygiene & EDA. Is the data quality fit for purpose? What are the early conclusions from the exploratory data analysis? Should we include all features? Do we need to pre-process the data (scale, transform, encode, etc.)?
- Model creation. Code the model using a convolutional neural network or a deep convolutional neural network. These are neural networks whose initial convolution layers provide filtering and pooling functions to isolate features in the images, while the rest of the network can be a typical artificial neural network. Given the huge amounts of data, these networks usually have a very large number of neurons and layers. Optimization tasks are complex and time consuming. Training can take days, even weeks!
- Model production. Once we have a solid trained model, create an application that uses this model for the required prediction. This is usually done by creating a simple website or application that asks the user for an image or video, runs the model on it, and displays the prediction so a decision can be made (for example, whether the patient in the image is sick).
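The filtering and pooling performed by the initial convolution layers can be illustrated with a minimal sketch in plain Python. The image, the kernel values, and the function names `convolve2d` and `max_pool` are illustrative assumptions, not part of any production model; real projects would rely on a deep learning library and learn the kernel values during training.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image (no padding, stride 1) and sum the products.

    As in most deep learning libraries, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the largest value in each non-overlapping size x size block."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# A 5x5 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# A hand-picked 2x2 kernel that responds to dark-to-bright transitions.
edge_kernel = [[-1, 1],
               [-1, 1]]

features = convolve2d(image, edge_kernel)  # 4x4 feature map
pooled = max_pool(features)                # 2x2 summary of the feature map
print(features)
print(pooled)
```

The feature map is non-zero only where the edge sits, and pooling keeps that response while shrinking the map. This is the sense in which these layers "isolate features" before the rest of the network classifies them.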
- Purpose relates to the objectives that the company has for the business problem it wants to resolve using the Computer Vision model. For example, the type of information it wants to predict (e.g., does the image or video represent a YES or NO condition?).
- Data relates to the available data, the labeling of the data, the quality of the data, and the preprocessing of the data so it can be used for modeling.
- Output relates to what the prediction will be (for example, 0 for an image or video condition that is “false” and 1 for one that is “true”), as well as the performance indicators used as success criteria (for example, Recall, Precision, F1, Accuracy, etc.).
- Platform relates to the requirements of the application that will put the Computer Vision model into production (once trained).
- Deployment relates to how to deploy and maintain this Computer Vision model solution within the company and circle back to the Purpose.
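The performance indicators named under Output follow directly from the counts of correct and incorrect 0/1 predictions. The sketch below uses the standard definitions; the function name `binary_metrics` and the sample labels are made up for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels: 1 = condition present, 0 = condition absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Here accuracy is 0.7 even though half of the true positives are missed (recall 0.5), which is why the success criteria should name the metric that matters for the Purpose rather than accuracy alone.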
A: Every computer vision model is unique. That said, many models have already been created and are available, and these can sometimes be leveraged, which helps cut down the cost of training. In some cases, the existing models are not focused on a specific application, but they have already done quality work identifying the features in a picture or video. That work can be reused, while other parts of the neural network still have to be trained. Because of this, every project needs to be quoted individually.
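The idea of reusing existing feature work while training only the remaining part of the network can be sketched in plain Python. Everything here is a toy assumption: `pretrained_features` is a hypothetical stand-in for the reused (frozen) layers of an existing model, and the trainable part is reduced to a small logistic-regression head. A real project would do the same thing with a deep learning library and a published pre-trained network.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image with pixel values in [0, 1]


def pretrained_features(image: Image) -> List[float]:
    """Hypothetical frozen feature extractor (stands in for reused layers).

    Returns mean brightness and mean horizontal contrast; it is never updated.
    """
    pixels = [p for row in image for p in row]
    brightness = sum(pixels) / len(pixels)
    diffs = [abs(row[c + 1] - row[c]) for row in image for c in range(len(row) - 1)]
    contrast = sum(diffs) / len(diffs)
    return [brightness, contrast]


def train_head(images: Sequence[Image], labels: Sequence[int],
               epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Train only a logistic-regression head on top of the frozen features."""
    features = [pretrained_features(img) for img in images]  # computed once
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1 / (1 + math.exp(-z)) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(image: Image, weights: List[float], bias: float) -> int:
    z = sum(w * xi for w, xi in zip(weights, pretrained_features(image))) + bias
    return 1 if z > 0 else 0


# Toy task: 1 = striped image, 0 = flat image.
flat_dark = [[0.1, 0.1], [0.1, 0.1]]
flat_bright = [[0.9, 0.9], [0.9, 0.9]]
striped_a = [[0.0, 1.0], [0.0, 1.0]]
striped_b = [[1.0, 0.0], [1.0, 0.0]]

weights, bias = train_head([flat_dark, flat_bright, striped_a, striped_b],
                           [0, 0, 1, 1])
print(predict([[0.5, 0.5], [0.5, 0.5]], weights, bias))            # unseen flat image
print(predict([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]], weights, bias))  # unseen striped image
```

Only three numbers (two weights and a bias) are trained here, which is the point of the approach: the expensive feature extraction is inherited, and the project pays only for the part that is specific to its application.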
A: As time goes by, Computer Vision models become better and faster. Newer models tend to perform the best; however, some older models excel at specific applications. Ultimately the project goals and purpose define what algorithm will work best and it should balance the performance of the trained model vs. the specifications needed, vs. the cost of development and training. As said, training of these models tends to be expensive.
A: Computer Vision models are not cheap. The bulk of the cost falls into two parts: data preparation, which is ~50% of the project, and CNN training, which is ~35% of the project. The other 15% of the project is the actual model coding & optimization. Having said this, if you have a very solid dataset (image/video set) that is well labeled, balanced, and of high resolution & image quality, the cost can go down significantly. If the accuracy of the classification or prediction does not need to be very high, the cost can also go down.
Take an example from the medical industry: a diagnostic tool that needs high accuracy, where over 250,000 images of varying quality are available but are not balanced, and a great deal of work is needed to organize them and prepare them for the model. For this type of project, assuming it is a classification project, the cost would be in the range of $40,000 to $80,000 USD for a single classification and/or prediction.
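As a rough worked example, the approximate cost shares quoted above can be applied to the $40,000 to $80,000 range. Applying the general percentages to this specific example is an assumption made here for illustration; an actual quote may split differently.

```python
# Approximate cost shares in whole percent, taken from the text above.
COST_SHARES_PCT = {
    "data preparation": 50,
    "CNN training": 35,
    "model coding & optimization": 15,
}


def split_budget(total_usd: int) -> dict:
    """Split a total budget across phases using the approximate shares."""
    return {phase: total_usd * pct // 100 for phase, pct in COST_SHARES_PCT.items()}


low = split_budget(40_000)   # lower end of the quoted range
high = split_budget(80_000)  # upper end of the quoted range

for phase in COST_SHARES_PCT:
    print(f"{phase}: ${low[phase]:,} to ${high[phase]:,}")
```

Under this assumption, data preparation alone would account for $20,000 to $40,000, which is why a clean, well-labeled, balanced dataset has such a large effect on the final price.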
A: This is a difficult question to answer. Working on images is not fast, and if there are several hundred thousand of them it will take some time to label them properly, clean them up, and prepare them for modeling (e.g., standardize them, scale them, etc.). Also, if the model requires very high accuracy, the training can take a very long time even on the fastest computers. So, the total time can be high. That said, there can be smaller projects with a fast turnaround as well, since not all Computer Vision projects are huge.
For example, a typical object recognition system can be made using an existing pre-trained model which can significantly drive the cost down and provide a very good result to the customer, in a few days!
A: SentientInfo has tremendous experience with Computer Vision model projects across many kinds of applications. We have specialized in the medical industry, where we have created several Computer Aided Diagnostic tools for over 25 illnesses, including Pneumonia, Edema, Lung Cancer, Macular disorders, and Breast Cancer detection. We have also specialized in industrial applications, detecting component or assembly quality on the conveyor belt.
We also have experience in retail, with systems that count customers and calculate consumer conversion, systems that monitor consumer activity within the store, and even security systems that identify theft or other threats within the store.
In all cases we code everything ourselves. Another benefit is that we have been users of such models in large corporations and understand the complexities of working with large companies, managing multiple stakeholders, and dealing with corporate politics to achieve superior results. So, in summary: we know our stuff, we are lean, and we know how to do it technically and how to work as a team with corporate teams, including adapting to local culture, to achieve superior results.