Implementing a Machine Learning Model with Core ML Part II – Vision API – Kevin

In my previous blog post, I wrote about the implementation of a general machine learning model using Apple’s Core ML framework. In this blog post, I will continue my discussion of Core ML and focus on a specific aspect of the framework—computer vision.

What is computer vision?

Computer vision, as its name implies, is a field of study that deals with the techniques that enable computers to analyze, understand, and describe images or videos. Using computer vision, applications can perform image analysis tasks such as face recognition, image classification, and feature detection. The goal of computer vision is to “use computers to emulate human learning and being able to make inferences and take actions based on visual inputs” (Bagale). Computer vision should not be confused with image processing, which emphasizes the enhancement, not the interpretation, of images. Computer vision has many real-world applications, including Tesla’s electric vehicles, which are now equipped with an Autopilot feature powered by computer vision (as shown in the cover image).

The Vision API

The Vision API was introduced along with Core ML during WWDC 2017. It offers a suite of computer vision technologies including:

  • Face detection
  • Face landmark detection
  • Image registration (for aligning and combining multiple images)
  • Rectangle detection
  • Barcode detection
  • Text detection
  • Object tracking (for faces, rectangles, and general templates)

While a few of the above technologies were already available in previous SDKs, they are now based on deep learning, which has recently brought groundbreaking changes to the field of computer vision. As a result, the Vision API offers higher precision, fewer false positives, and higher performance than the earlier implementations. Besides the built-in options listed above, developers can also run their own trained Core ML models through the Vision API.

In this example, we will build a computer-vision-enabled application that performs real-time object recognition using the device’s camera. We will use the Inception v3 model, which is trained to classify images into 1,000 categories of common objects such as trees and animals.

Implementing Core ML and Vision

The first step is to download the machine learning model. The Inception v3 model has already been converted into the Core ML format and can be downloaded from here. Create a new project in Xcode and drag and drop the Inceptionv3.mlmodel file into the project navigator on the left. When you select the model, you should see an overview of it.

[Screenshot: overview of the Inception v3 model in Xcode]

As the overview shows, Inception v3 is a neural network classifier (which we will cover in the near future) that accepts a 299 x 299 color image as input. To perform real-time object recognition, we need to feed every frame of the video to the model. Therefore, the first step is to set up the camera and an AVCaptureSession. Since this step is relatively simple and is described in Apple’s official documentation, we will not cover it in detail. However, because of Apple’s privacy restrictions, you must add a camera usage description (the NSCameraUsageDescription key, shown in Xcode as “Privacy - Camera Usage Description”) to the Info.plist file; without it, the app will crash when it first tries to access the camera.
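For readers who want a starting point, here is a minimal sketch of the camera setup in Swift. The CameraFeed class and its method names are hypothetical helpers chosen for illustration; only the AVFoundation calls themselves come from Apple's framework, and Apple's documentation remains the reference for a production setup.

```swift
#if canImport(AVFoundation)
import AVFoundation

/// Hypothetical helper (the name is an assumption) that owns the capture
/// session and forwards every video frame to a sample-buffer delegate.
final class CameraFeed {
    let session = AVCaptureSession()
    private let videoOutput = AVCaptureVideoDataOutput()
    // Frames are delivered on this background queue, not on the main thread.
    private let frameQueue = DispatchQueue(label: "camera.frames")

    /// Connects the default camera to a video data output.
    /// Returns false if no camera is available or the session rejects
    /// the input or the output.
    func configure(delegate: AVCaptureVideoDataOutputSampleBufferDelegate) -> Bool {
        guard let camera = AVCaptureDevice.default(for: .video),
              let input = try? AVCaptureDeviceInput(device: camera),
              session.canAddInput(input) else { return false }

        session.beginConfiguration()
        defer { session.commitConfiguration() }

        session.addInput(input)
        guard session.canAddOutput(videoOutput) else { return false }

        // Drop frames that arrive while the previous one is still being processed.
        videoOutput.alwaysDiscardsLateVideoFrames = true
        videoOutput.setSampleBufferDelegate(delegate, queue: frameQueue)
        session.addOutput(videoOutput)
        return true
    }

    // startRunning() blocks until the session is running,
    // so it is best called off the main thread.
    func start() { session.startRunning() }
    func stop() { session.stopRunning() }
}
#endif
```

The delegate passed to configure(delegate:) is the object that will receive each frame; this is where the Vision requests are formed.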

Once the camera and the AVCaptureSession are set up, we can proceed to the implementation of the Vision API. We can create the “createOutput” method, which takes the input video frame and forms Vision requests. To do this, we can use the following code:

[Screenshot: code for the createOutput method]
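As a rough illustration of this step, here is a minimal sketch in Swift. It assumes the frames arrive through the AVCaptureVideoDataOutputSampleBufferDelegate callback, which AVFoundation invokes once per frame. The class name FrameClassifier, the requests property, and the .right orientation (back camera, device held in portrait) are assumptions made for illustration and not necessarily the exact code used in this project.

```swift
#if canImport(Vision) && canImport(AVFoundation)
import AVFoundation
import Vision

/// Hypothetical delegate (names are assumptions) that hands each camera
/// frame to Vision.
final class FrameClassifier: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    /// Requests to run on every frame; filled in once the Core ML model is loaded.
    var requests: [VNRequest] = []

    // AVFoundation calls this once per captured frame, on the output's queue.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // A sample buffer without pixel data cannot be classified; skip it.
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Orientation .right is an assumption: back camera, portrait device.
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                            orientation: .right,
                                            options: [:])
        do {
            // Runs synchronously; each request's completion handler fires here.
            try handler.perform(requests)
        } catch {
            print("Vision request failed: \(error)")
        }
    }
}
#endif
```

Because perform(_:) runs synchronously on the frame queue, a slow model simply causes later frames to be dropped rather than queued up, provided the video output is configured to discard late frames.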

We also need to create a “handleClassifications” method, which reports the results predicted by the Inception v3 model:

[Screenshot: code for the handleClassifications method]
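A handler of this kind might look like the sketch below. It assumes the request is a VNCoreMLRequest backed by a classifier, so the results arrive as VNClassificationObservation values. The formatPredictions helper and the choice to show the top three labels are assumptions added for illustration; the helper is kept free of Vision types so that it can be tried without a camera or a model.

```swift
/// Hypothetical helper: formats the `top` most confident labels,
/// one per line, as "label NN%".
func formatPredictions(_ predictions: [(label: String, confidence: Float)],
                       top: Int = 3) -> String {
    return predictions
        .sorted { $0.confidence > $1.confidence }   // most confident first
        .prefix(max(0, top))
        .map { "\($0.label) \(Int(($0.confidence * 100).rounded()))%" }
        .joined(separator: "\n")
}

#if canImport(Vision)
import Vision

/// Completion handler for a Vision request that wraps a classifier model.
func handleClassifications(request: VNRequest, error: Error?) {
    if let error = error {
        print("Classification failed: \(error)")
        return
    }
    // Classifier models report their results as VNClassificationObservation.
    guard let observations = request.results as? [VNClassificationObservation] else { return }

    let text = formatPredictions(
        observations.map { (label: $0.identifier, confidence: $0.confidence) })

    // The handler runs on the frame queue; UI work belongs on the main thread.
    DispatchQueue.main.async {
        // In the app, assign `text` to the UITextView here instead of printing.
        print(text)
    }
}
#endif
```

For example, calling formatPredictions with the pairs ("dog", 0.1), ("tabby", 0.87), and ("lynx", 0.03) and top: 2 yields two lines, "tabby 87%" followed by "dog 10%".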

Finally, we will add the following code in viewDidLoad to establish the connection between our Vision requests and the Core ML model:

[Screenshot: code added to viewDidLoad]
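The connection between Vision and Core ML can be sketched as follows. The function name makeClassificationRequest and the .centerCrop scaling option are assumptions for illustration. In the app, the model argument would be Inceptionv3().model, using the class Xcode generates from the Inceptionv3.mlmodel file.

```swift
#if canImport(Vision) && canImport(CoreML)
import CoreML
import Vision

/// Hypothetical factory: wraps a Core ML model in a Vision request whose
/// results are delivered to `completion`.
func makeClassificationRequest(
    for model: MLModel,
    completion: @escaping VNRequestCompletionHandler
) throws -> VNCoreMLRequest {
    // VNCoreMLModel lets Vision scale and convert each frame to the
    // input size the model expects (299 x 299 for Inception v3).
    let visionModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: visionModel, completionHandler: completion)
    // Assumption: crop the center square rather than squash the whole frame.
    request.imageCropAndScaleOption = .centerCrop
    return request
}
#endif
```

In viewDidLoad, the returned request would be stored in the array of Vision requests that is performed on every frame, with the classification handler passed as the completion argument.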

If you elect to add a videoLayer in Interface Builder, you can show a real-time video feed in your app. You can also add a UITextView to display the predictions made by the Inception v3 model.
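If you prefer to create the video layer in code rather than in Interface Builder, a minimal sketch looks like this. The attachPreview helper is a hypothetical name, and the .resizeAspectFill gravity is an assumption about how the feed should fill the screen.

```swift
#if canImport(AVFoundation)
import AVFoundation
import QuartzCore

/// Hypothetical helper: adds a live camera preview for `session`
/// as a sublayer that fills `hostLayer`.
@discardableResult
func attachPreview(of session: AVCaptureSession,
                   to hostLayer: CALayer) -> AVCaptureVideoPreviewLayer {
    let previewLayer = AVCaptureVideoPreviewLayer(session: session)
    // Fill the host layer, cropping the frame edges instead of letterboxing.
    previewLayer.videoGravity = .resizeAspectFill
    previewLayer.frame = hostLayer.bounds
    hostLayer.addSublayer(previewLayer)
    return previewLayer
}
#endif
```

In a view controller, this could be called from viewDidLoad with the capture session and view.layer; the preview layer's frame should then be updated in viewDidLayoutSubviews so that it follows rotation and resizing.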

Congratulations, you have created your first computer-vision-enabled iOS application! In next week’s blog post, I will begin my introduction to neural networks and their applications.

Works Cited 

Apple Inc. “Vision.” Apple Developer. Accessed 1 Nov. 2017.

Awseeley. “Swift Core ML Machine Learning Tutorial.” GitHub. Accessed 1 Nov. 2017.

Bagale, Ravindra. “Differences between computer vision and image processing.” StackExchange – Computer Science, 5 Dec. 2012. Accessed 1 Nov. 2017.

Vision Framework: Building on Core ML. Performance by Brett Keating and Frank Doepke. Apple Developer, Apple. Accessed 1 Nov. 2017.
