What could your app do if it understood what is happening in the world around it? Computer vision is perfect for automating tasks, IoT devices, monitoring, augmented reality, and detecting all kinds of visual signals (facial expressions, gestures, and so on).
Equipped with computer vision, you can program a robot to navigate around a room, search through your photo collection for specific features, or set up a webcam on your laptop that stops you from straining your eyes or falling asleep.
The easiest way to add image recognition to your app is to use an online service. Today we are going to use Microsoft's cloud platform, Azure. Azure offers an image classifier that has already been trained, and you can access it by subscribing to the Computer Vision API.
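To make the API call concrete, here is a minimal sketch of how a request to the Computer Vision "analyze" endpoint is put together. The endpoint URL, subscription key, API version path, and image URL below are all placeholder assumptions (you get the real values from your own Azure resource); the sketch only builds the request pieces, which you would then send with an HTTP client such as `requests`.

```python
import json

# Hypothetical placeholders -- substitute the values from your own
# Azure Computer Vision resource.
SUBSCRIPTION_KEY = "your-subscription-key"
ENDPOINT = "https://your-resource.cognitiveservices.azure.com"

def build_analyze_request(image_url, features=("Description", "Tags")):
    """Build the URL, headers, query params, and JSON body for an
    'analyze' call on an image hosted at a public URL."""
    url = ENDPOINT + "/vision/v3.2/analyze"  # API version path may differ
    headers = {
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        "Content-Type": "application/json",
    }
    params = {"visualFeatures": ",".join(features)}
    body = json.dumps({"url": image_url})
    return url, headers, params, body

url, headers, params, body = build_analyze_request(
    "https://example.com/cat.jpg")
```

With the pieces built, the actual call would be something like `requests.post(url, headers=headers, params=params, data=body)`, and the JSON response contains the tags and a generated caption for the image.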
For those of you who want a bit more flexibility in training the classifier, Azure also provides the Custom Vision Service: you simply upload and tag images, and it automatically learns how to tag new ones.
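Once a Custom Vision project is trained and published, querying it looks much like the regular Computer Vision call, except the request goes to your project's own prediction endpoint. A minimal sketch, where the endpoint, prediction key, project ID, and iteration name are all hypothetical placeholders taken from your own project settings:

```python
import json

# Hypothetical placeholders -- every value below comes from your own
# Custom Vision project, not from this tutorial.
PREDICTION_KEY = "your-prediction-key"
ENDPOINT = "https://your-resource.cognitiveservices.azure.com"
PROJECT_ID = "your-project-id"
ITERATION = "Iteration1"  # the name of your published iteration

def build_prediction_request(image_url):
    """Build the pieces of a Custom Vision classification request for
    an image hosted at a public URL."""
    url = (f"{ENDPOINT}/customvision/v3.0/Prediction/{PROJECT_ID}"
           f"/classify/iterations/{ITERATION}/url")
    headers = {
        "Prediction-Key": PREDICTION_KEY,
        "Content-Type": "application/json",
    }
    body = json.dumps({"Url": image_url})
    return url, headers, body

url, headers, body = build_prediction_request("https://example.com/cat.jpg")
```

The response lists each of your custom tags with a probability, so your app can act on whichever tag scores highest.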
For ambitious programmers who already know how to use APIs, we will also briefly cover how to create and train your own model using CNTK (the Microsoft Cognitive Toolkit), a well-known machine learning framework comparable to TensorFlow that is also useful for building other kinds of machine learning models.
Today we will be using two programming languages: Python and JavaScript. Python is a handy language for processing information and data. JavaScript is a web-friendly language often used for interactive websites, and with the development of Node.js it has evolved into a general-purpose language that runs on your machine as well as in the browser.
Unlike lower-level (more computer-friendly) languages such as C++ and Java, Python and JavaScript are easier for humans to read. You can call these Computer Vision APIs from any language, but for the sake of simplicity we will focus on Python and JavaScript today.
Please view the PDF file for a full handout.