In my previous blog post, I wrote about the implementation of a general machine learning model using Apple’s Core ML framework. In this blog post, I will continue my discussion of Core ML and focus on a specific aspect of the framework—computer vision.
What is computer vision?
Computer vision, as its name implies, is a field of study that deals with the techniques that enable computers to analyze, understand, and describe images or videos. Using computer vision, applications can perform image analysis tasks such as face recognition, image classification, and feature detection. The goal of computer vision is to “use computers to emulate human learning and being able to make inferences and take actions based on visual inputs” (Bagale). Computer vision should not be confused with image processing, which emphasizes the enhancement, not the interpretation, of images. Computer vision has many real-world applications, including Tesla’s electric vehicles, whose Autopilot feature is powered by computer vision (as shown in the cover image).
The Vision API
The Vision API was introduced along with Core ML during WWDC 2017. It offers a suite of computer vision technologies including:
- Face detection
- Face landmark detection
- Image registration (for combining and aligning multiple images)
- Rectangle detection
- Barcode detection
- Text detection
- Object tracking (for faces, rectangles, and general templates)
While a few of the above technologies were already available in previous SDKs, they are now based on deep learning, which has recently made groundbreaking changes in the field of computer vision. The result of using the Vision API is higher precision, fewer false positives, and higher performance. Aside from using the built-in requests already included in the Vision API, developers can also run their own trained Core ML models through the Vision API.
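To give a feel for the built-in requests, here is a minimal sketch of face detection on a still image. The function name detectFaces is my own and purely illustrative; VNDetectFaceRectanglesRequest, VNImageRequestHandler, and VNFaceObservation are the Vision types involved.

```swift
import Vision
import CoreGraphics

/// Hypothetical helper: returns one bounding box per detected face.
/// Boxes are normalized to 0...1 with the origin in the bottom-left corner.
func detectFaces(in image: CGImage) throws -> [CGRect] {
    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: image, options: [:])

    // perform(_:) runs synchronously, so the results are ready when it returns.
    try handler.perform([request])

    let faces = request.results as? [VNFaceObservation] ?? []
    return faces.map { $0.boundingBox }
}
```

The other built-in requests in the list follow the same pattern: create a request, hand it to a request handler together with an image, then read the observations from the request's results.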
In this example, we will build a computer-vision-enabled application that performs real-time object recognition using the device’s camera. We will use the Inception v3 model, which is trained to classify images into 1,000 categories of common objects such as trees and animals.
Implementing Core ML and Vision
The first step is to download the machine learning model. The Inception v3 model has already been converted into the Core ML format and can be downloaded from here. Create a new project in Xcode and, as you’d expect, drag and drop the Inceptionv3.mlmodel file into the project navigator on the left. When you select the model, you should be able to see an overview of the model.
As you can see, Inception v3 is a neural network classifier (which we will cover in the near future) that accepts a 299 x 299 color image as input. To perform real-time object recognition, we need to feed every frame of the video to the model. Therefore, the first step is to set up the camera and an AVCaptureSession. Since this step is relatively simple and can be found in Apple’s official documentation, we will not cover it in detail. However, it is important to note that due to Apple’s privacy restrictions, you will need to add a camera usage description (the NSCameraUsageDescription key, shown in Xcode as “Privacy - Camera Usage Description”) to the Info.plist file.
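For reference, a camera setup along these lines might look like the sketch below. It is one possible arrangement rather than the only one: the class name CameraViewController, the queue label, and the choice to start the session directly from viewDidLoad are my own assumptions.

```swift
import AVFoundation
import UIKit

// Hypothetical view controller that owns the capture session.
final class CameraViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    // Frames are delivered on this background queue (the label is arbitrary).
    private let videoQueue = DispatchQueue(label: "video-frames")

    override func viewDidLoad() {
        super.viewDidLoad()
        configureSession()
    }

    private func configureSession() {
        session.sessionPreset = .high

        // Use the default video camera as the input.
        guard let camera = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: camera),
              session.canAddInput(input) else { return }
        session.addInput(input)

        // Deliver raw frames to this object's sample buffer delegate callback.
        let output = AVCaptureVideoDataOutput()
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: videoQueue)
        guard session.canAddOutput(output) else { return }
        session.addOutput(output)

        // Show the live feed behind any other views.
        let previewLayer = AVCaptureVideoPreviewLayer(session: session)
        previewLayer.frame = view.bounds
        previewLayer.videoGravity = .resizeAspectFill
        view.layer.addSublayer(previewLayer)

        // startRunning() blocks until the session starts; a production app
        // would usually call it off the main thread.
        session.startRunning()
    }
}
```

Without the camera usage description in Info.plist, the app will crash the first time it tries to access the camera.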
Once the camera and the AVCaptureSession are set up, we can proceed to the implementation of the Vision API. We can create the “createOutput” method, which takes the input video frame and forms Vision requests. To do this, we can use the following code:
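The listing below is a minimal sketch of this step, not necessarily the exact code from the tutorial this post follows. It assumes the method is implemented as the standard AVCaptureVideoDataOutputSampleBufferDelegate callback, captureOutput(_:didOutput:from:), since that is where AVFoundation hands over each frame. The class name FrameClassifier and the visionRequests property are hypothetical.

```swift
import AVFoundation
import Vision

// Hypothetical delegate object; in the app this could be the view controller itself.
final class FrameClassifier: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// Vision requests to run on every frame (assumed to be filled in elsewhere,
    /// for example in viewDidLoad).
    var visionRequests: [VNRequest] = []

    // Called by AVFoundation once per captured video frame.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Extract the pixel data from the sample buffer.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Wrap the frame in a request handler and run all registered requests on it.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        do {
            try handler.perform(visionRequests)
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
```

Vision takes care of scaling each frame to the 299 x 299 input the model expects, so the frame can be passed along as captured.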
We also need to create a “handleClassifications” method which reports the results predicted by the Inception v3 model:
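A possible version of this handler is sketched below. The formatPredictions helper, the limit of three labels, and the percentage formatting are my own choices for illustration; only the handleClassifications name comes from the text above.

```swift
import Vision

/// Hypothetical helper: keeps the `limit` most confident labels and
/// formats them one per line, for example "dog 75%".
func formatPredictions(_ predictions: [(label: String, confidence: Float)],
                       limit: Int = 3) -> String {
    return predictions
        .sorted { $0.confidence > $1.confidence }
        .prefix(limit)
        .map { "\($0.label) \(Int(($0.confidence * 100).rounded()))%" }
        .joined(separator: "\n")
}

/// Completion handler for a Core ML classification request.
func handleClassifications(request: VNRequest, error: Error?) {
    // A classifier model produces VNClassificationObservation results.
    guard let observations = request.results as? [VNClassificationObservation] else {
        print("No classification results: \(error?.localizedDescription ?? "unknown error")")
        return
    }

    let text = formatPredictions(
        observations.map { (label: $0.identifier, confidence: Float($0.confidence)) }
    )

    // The handler runs on the video queue, so hop to the main queue before touching the UI.
    DispatchQueue.main.async {
        print(text) // in the app, assign this to the text view instead
    }
}
```

Keeping the formatting in a separate function makes it easy to check without a camera or a model.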
Finally, we will add the following code in viewDidLoad to establish the connection between our Vision requests and Core ML model:
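As a rough sketch of that setup: the code below assumes Inceptionv3.mlmodel has been added to the project, so that Xcode generates an Inceptionv3 class. The visionRequests property and the stubbed handleClassifications body are placeholders, and the center-crop option is my own choice.

```swift
import UIKit
import Vision
import CoreML

final class ViewController: UIViewController {
    /// Requests that the frame callback runs on each video frame.
    var visionRequests: [VNRequest] = []

    override func viewDidLoad() {
        super.viewDidLoad()

        // `Inceptionv3` is the class Xcode generates from Inceptionv3.mlmodel.
        // Newer Xcode versions prefer the throwing init(configuration:) initializer.
        guard let visionModel = try? VNCoreMLModel(for: Inceptionv3().model) else {
            fatalError("Could not load the Inception v3 model")
        }

        // Tie the Core ML model to a Vision request and its completion handler.
        let request = VNCoreMLRequest(model: visionModel) { [weak self] request, error in
            self?.handleClassifications(request: request, error: error)
        }
        // Crop to the center square before scaling to the model's input size.
        request.imageCropAndScaleOption = .centerCrop

        visionRequests = [request]
    }

    // Placeholder: the real handler would display the predictions.
    func handleClassifications(request: VNRequest, error: Error?) {
        print(request.results ?? [])
    }
}
```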
If you add a videoLayer in Interface Builder, you can show a real-time video feed in your app. You can also add a UITextView to display the predictions made by the Inception v3 model.
Congratulations, you have created your first computer-vision-enabled iOS application! In next week’s blog post, I will begin my introduction to neural networks and their applications.
Apple Inc. “Vision.” Apple Developer, developer.apple.com/documentation/vision. Accessed 1 Nov. 2017.
Awseeley. “Swift Core ML Machine Learning Tutorial.” GitHub, developer.apple.com/videos/play/wwdc2017/506/. Accessed 1 Nov. 2017.
Bagale, Ravindra. “Differences between computer vision and image processing.” StackExchange – Computer Science, 5 Dec. 2012, cs.stackexchange.com/questions/7050/differences-between-computer-vision-and-image-processing. Accessed 1 Nov. 2017.
Vision Framework: Building on Core ML. Performance by Brett Keating and Frank Doepke. Apple Developer, Apple, developer.apple.com/videos/play/wwdc2017/506/. Accessed 1 Nov. 2017.