Training an Object Detector with TensorFlow (Part II) - Kevin

In last week’s blog post, I wrote a short tutorial on training a custom object detection model with the TensorFlow Object Detection API. Because of space and time constraints, that tutorial was left unfinished. In this week’s blog, I will pick up where I left off and cover the remaining steps, including how to test your model’s accuracy.

Step 2: Labeling Images & Creating Datasets

Once you have gathered enough images for training, you can start labeling them (i.e., identifying the robots in each screenshot). We will label these images in the PASCAL VOC format. The software I used is LabelImg, which provides a graphical user interface and exports labeling data as an XML file in the PASCAL VOC format. To install LabelImg, visit LabelImg’s GitHub repository, clone the repository, and run the Python code. To save you the trouble, LabelImg also provides pre-built binaries for Windows and Linux (if you Google carefully, there are also unofficial pre-built binaries for macOS).

[Screenshot: LabelImg]

Once you have installed LabelImg, copy all of your unlabeled images into a folder. In my case, I simply named it “robots,” but I am sure you can come up with better names. Open LabelImg and click on “Open Dir.” Select the folder that contains the training images, and you will see them imported into LabelImg. Next, we need to draw rectangular boxes around the robots to tell our model where the robots are in each screenshot. Simply click on the “Create RectBox” button and start drawing boxes around robots. When you release the mouse, you will be asked to type in a label. I simply typed in “robot” and hit enter. In my experience, the accuracy of the model improves significantly if you draw the boxes around the bumpers (the bottom part of the robot with the team number and alliance color) rather than the entire robot. Be sure to hit “Save” before moving on to the next image.

[Screenshot: the training image directory]

Once you are done labeling robots in the screenshots, your training image directory should look like the screenshot above. Each image should be accompanied by an XML file that describes the labeling information. Be sure to number the images and XML files sequentially (i.e., “robot-1”, “robot-2”, etc.).
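For reference, the XML file LabelImg writes for each image follows the PASCAL VOC layout; a trimmed example for a single labeled robot might look like this (folder, filename, and coordinates are illustrative):

```xml
<annotation>
  <folder>robots</folder>
  <filename>robot-1.png</filename>
  <size>
    <width>1280</width>
    <height>720</height>
    <depth>3</depth>
  </size>
  <object>
    <name>robot</name>
    <bndbox>
      <xmin>412</xmin>
      <ymin>305</ymin>
      <xmax>506</xmax>
      <ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
```

An image with several robots simply contains one `<object>` block per box.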

Finally, we need to generate a TFRecord file. Before we perform the conversion, we need to create a CSV table that links each training image to its XML file. This step can be automated with GitHub user datitran’s Python script. Save the script to the PARENT directory of your images directory. Then, under the same directory, create a folder named “annotations” and copy all the XML files into it. Run the script with the “python” command in a terminal or Windows CMD. This will generate a CSV file with eight columns (filename, width, height, class, and the four box coordinates).

You can use Microsoft Excel or Apple Numbers to inspect the file. Now we are going to split it into two files: “train.csv” and “test.csv”. We do not want to use the same dataset for both training and testing; there are various reasons, but the most prominent one is to prevent overfitting. I recommend putting 80% of the entries into the training CSV file and reserving the rest for testing.
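Instead of splitting by hand in a spreadsheet, the 80/20 split can be scripted. A minimal sketch (the function name and the demo rows are my own; the one real constraint is that every bounding-box row from a given image should land in the same split, since one screenshot can contain several robots):

```python
import random

def split_labels(rows, train_frac=0.8, seed=42):
    """Split label rows by image filename so every bounding box from
    a given screenshot lands in the same split (train or test)."""
    filenames = sorted({r["filename"] for r in rows})
    random.Random(seed).shuffle(filenames)
    cutoff = int(len(filenames) * train_frac)
    train_files = set(filenames[:cutoff])
    train = [r for r in rows if r["filename"] in train_files]
    test = [r for r in rows if r["filename"] not in train_files]
    return train, test

# Tiny demo with made-up rows; real rows come from the generated CSV,
# read with csv.DictReader and written back out with csv.DictWriter.
rows = [{"filename": "robot-%d.png" % i, "class": "robot"} for i in range(1, 11)]
train, test = split_labels(rows)
print(len(train), len(test))  # 8 images for training, 2 for testing
```

Shuffling before the cut matters: screenshots taken close together in time look alike, so a straight front/back split would make the test set unrepresentative.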

We also need to clone the TensorFlow Object Detection API library to our local machine before we start the conversion. Download the entire library into your project folder. Once you have created the CSV files and downloaded the library, we can begin the conversion. datitran also provides a script for this. There are two flags associated with the script: the input CSV path and the output path. Make sure you have read the script and placed the required files in the right directories before running it. I suggest creating a “data” folder under the root directory of your project; this way, you can keep your XML, CSV, and record files together.
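Under these suggestions, the project folder might end up looking something like this (a sketch; the two script names follow datitran’s repository, and only the paths used in the commands actually matter):

```
project/
  robots/                 the labeled screenshots (image + paired XML)
  annotations/            copies of the XML files for the conversion script
  data/                   train.csv, test.csv, and later the .record files
  xml_to_csv.py           datitran's XML-to-CSV script
  generate_tfrecord.py    datitran's CSV-to-TFRecord script
  object_detection/       the Object Detection API library
```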

To create the TFRecord file for training, use the following command:

python generate_tfrecord.py --csv_input=data/train.csv --output_path=train.record

Similarly, to create the TFRecord file for testing, use the following command:

python generate_tfrecord.py --csv_input=data/test.csv --output_path=test.record

If the script executed without error, you should see two new files: “train.record” and “test.record”. We can then move on to the next step: configuring the training process.

Step 3: Choosing a Pre-Trained Model

To simplify the training process, you can use a pre-trained object detection model. To download a pre-trained model, you can visit the TensorFlow Object Detection Model Zoo.

[Screenshot: the TensorFlow Object Detection Model Zoo]

As you can see from the screenshot above, you have a lot of options. The keys to your selection are the model’s speed and its mean average precision (mAP). On mobile devices such as smartphones or, in our case, the robot’s onboard co-processor, the preferred model is MobileNet trained with the SSD algorithm. It has a much smaller footprint than the other models and is much more efficient. The downside is that its mAP is not as good as that of the models trained with R-CNN or Faster R-CNN. Additionally, we will not use the last four models, because they output masks as opposed to boxes. To download the pre-trained model, simply click on the model name and unzip (or untar) the downloaded file. Create a folder called “data” under the root directory of your project and copy the following files from the unzipped folder into the “data” folder:

  • checkpoint
  • frozen_inference_graph.pb
  • model.ckpt.data-00000-of-00001
  • model.ckpt.index
  • model.ckpt.meta

The frozen_inference_graph.pb holds frozen weights for inference; the model.ckpt files hold the pre-trained weights that will be fine-tuned on the robot images you feed in. Again, to save you the trouble of configuring an object detection pipeline from scratch, you can download a pre-generated pipeline configuration file. Be sure to select the config file whose name matches the name of the pre-trained model. You will also notice that the config file names often end with either “pets” or “coco.” We want the “coco” version, because that ending marks a config file used to train a model on the COCO (Common Objects in Context) dataset. Obviously, we do not want our model to be trained the same way a pet detector is trained. Additionally, you can now move your “train.record” and “test.record” into the data folder.

The placement of the pipeline configuration file is not prescribed, so you can place it under the root directory. Open the file with a text editor or an IDE to see its structure. You will need to carefully follow these steps to configure the file correctly:

  1. Using the text editor of your choice, create a file called “object-detection.pbtxt”. Type in  
    item {
      id: 1
      name: 'robot'
    }
     and save the file to the root directory.
  2. Navigate to the section (it should be near the bottom) named “train_input_reader” in the config file. Here, you will need to point the “input_path” to your “train.record” and the “label_map_path” to the “object-detection.pbtxt” file you just created. It is important to note that even on Windows machines, you will need to use forward slashes as opposed to backslashes in these paths.
  3. Similar to No. 2, find the “eval_input_reader” and point “input_path” to “test.record” under the “data” folder. Point “label_map_path” to the same .pbtxt file.
  4. Depending on the number of samples you created, you need to adjust “num_examples” under the “eval_config” section. Set it to the number of images represented in your test.csv file. Setting the number arbitrarily might result in some of your test images not being evaluated.
  5. At the very top of the document, set “num_classes” to 1, because we only have one category of object to detect: robot.
  6. You might also want to lower “batch_size” to less than 5, depending on the amount of available video memory on your GPU.
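Taken together, the edited portions of the config file should end up looking roughly like this (the paths assume the layout used earlier, “...” marks sections left unchanged, and 40 is a placeholder for your own test-image count):

```
model {
  ssd {
    num_classes: 1
    ...
  }
}
train_config {
  batch_size: 4
  fine_tune_checkpoint: "data/model.ckpt"
  ...
}
train_input_reader {
  tf_record_input_reader {
    input_path: "data/train.record"
  }
  label_map_path: "object-detection.pbtxt"
}
eval_config {
  num_examples: 40
  ...
}
eval_input_reader {
  tf_record_input_reader {
    input_path: "data/test.record"
  }
  label_map_path: "object-detection.pbtxt"
}
```

Note that “fine_tune_checkpoint” should point at the model.ckpt files you copied into the “data” folder, so training starts from the pre-trained weights.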

Press Ctrl + S or Command + S to save the file to the root directory, and we are done with Step 3.
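For the “num_examples” value in item No. 4 of the list above, counting by hand is error-prone because each image can appear on several rows of test.csv (one per labeled robot), while “num_examples” counts images. A small helper (the function name is my own; it assumes the “filename” column produced by the conversion script):

```python
import csv

def count_eval_examples(csv_path):
    """Return the number of distinct images in a label CSV.
    An image can appear on several rows (one per bounding box),
    but eval_config.num_examples counts images, not boxes."""
    with open(csv_path, newline="") as f:
        return len({row["filename"] for row in csv.DictReader(f)})
```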

Step 4: Training the Model + Monitoring with TensorBoard

Training the model is, in fact, one of the easiest steps in this tutorial. Create two folders under the root project directory and name them “traindir” and “evaldir”. Open up a terminal on Mac or Linux (or CMD on Windows) and navigate to the folder you created for this project using the “cd” command. Then run the following command:

python object_detection/train.py \
--logtostderr \
--pipeline_config_path={the name of your pipeline config file} \
--train_dir=traindir

TensorFlow will need a minute or two to initiate the training process and, after that, your custom object detector will begin training. If you are using the GPU-accelerated version of TensorFlow, you should see a huge performance gain over the non-GPU-accelerated version. You should also be able to see the name of your graphics card in the logs after running the command above.

You can also check the total loss using TensorBoard, which helps you visualize and debug the learning process. TensorBoard comes preinstalled with TensorFlow. To use it, simply open an additional terminal or CMD window and type in the following command:

tensorboard --logdir=traindir

Copy the address printed in the command-line window and paste it into your browser. If you click on “Scalars,” you should see multiple graphs. We want to focus on “TotalLoss”; its general trend should decrease steadily as more steps are completed.


Step 5: Testing the Model

To test the accuracy of your model, you might want to pause the training beforehand. Don’t worry: your progress will be preserved, as TensorFlow constantly saves checkpoints. If you have a powerful GPU (such as the Titan X or GTX 1080 Ti), you can run the evaluation job while your training job is running. To begin the evaluation job, open a new terminal or CMD window and run the following command (you will find it surprisingly similar to the training command):

python object_detection/eval.py \
--logtostderr \
--pipeline_config_path={the name of your pipeline config file} \
--checkpoint_dir=traindir \
--eval_dir=evaldir

Once you have begun the evaluation process, you can also monitor the evaluation results in TensorBoard by running the following command (again, you will find it surprisingly similar to the training TensorBoard command):

tensorboard --logdir=evaldir

You should be able to see the mAP graph, as well as the results of several test images after being fed through the model, by clicking on the “Images” tab at the top.

This blog post effectively concludes my independent research blog. I hope that you have been inspired since the inception of this blog, and I want to thank you for being such a loyal reader. Please continue to follow Westtown School’s independent seminar blog page so that you can continue to be inspired by all the other talented writers of Westtown.

Kevin Wang
May 6, 2018

Works Cited

Choi, Jongwook. “A Practical Guide for Debugging TensorFlow Codes.” GitHub, 18 Feb. 2017, Accessed 6 May 2018.

GitHub. Accessed 6 May 2018.

Nealwu. Kites Detections Output. GitHub, 21 Sept. 2017, Accessed 6 Apr. 2018.

Tran, Dat. “How to train your own Object Detector with TensorFlow’s Object Detector API.” Towards Data Science, 28 July 2017, Accessed 6 Apr. 2018.

