image recognition

Amazon Rekognition enhanced capabilities

The Amazon Machine Learning Blog announces some significant improvements to Amazon Rekognition, the company's service for image and video analysis. These changes specifically improve its face detection and recognition technology.

“Face detection” tries to answer the question: Is there a face in this picture? In real-world images, various aspects can have an impact on a system’s ability to detect faces with high accuracy. These aspects might include pose variations caused by head movement and/or camera movements, occlusion due to foreground or background objects (such as faces covered by hats, hair, or hands of another person in the foreground), illumination variations (such as low contrast and shadows), bright lighting that leads to washed out faces, low quality and resolution that leads to noisy and blurry faces, and distortion from cameras and lenses themselves. These issues manifest as missed detections (a face not detected) or false detections (an image region detected as a face even when there is no face). For example, on social media different poses, camera filters, lighting, and occlusions (such as a photo bomb) are common. For financial services customers, verification of customer identity as a part of multi-factor authentication and fraud prevention workflows involves matching a high resolution selfie (a face image) with a lower resolution, small, and often blurry image of a face on a photo identity document (such as a passport or driving license). Also, many customers have to detect and recognize faces of low contrast from images where the camera is pointing at a bright light.

With the latest updates, Amazon Rekognition can now detect 40 percent more faces – that would have been previously missed – in images that have some of the most challenging conditions described earlier. At the same time, the rate of false detections is reduced by 50 percent. This means that customers such as social media apps can get consistent and reliable detections (fewer misses, fewer false detections) with higher confidence, allowing them to deliver better customer experiences in use cases like automated profile photo review. In addition, face recognition now returns 30 percent more correct ‘best’ matches (the most similar face) compared to our previous model when searching against a large collection of faces. This enables customers to obtain better search results in applications like fraud prevention. Face matches now also have more consistent similarity scores across varying lighting, pose, and appearance, allowing customers to use higher confidence thresholds, avoid false matches, and reduce human review in applications such as identity verification. As always, for use cases involving civil liberties or customer sentiments, where the veracity of the match is critical, we recommend that customers use best practices, higher confidence level (at least 99%), and always include human review.
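The recommendation above (a confidence level of at least 99% plus human review for critical use cases) can be sketched as a small triage step. This is a minimal sketch, assuming a response shaped like the one boto3's Rekognition `compare_faces` call returns, where each entry in `FaceMatches` carries a `Similarity` score from 0 to 100. The function name `triage_face_matches` and the sample data are hypothetical, not real service output.

```python
def triage_face_matches(response, threshold=99.0):
    """Split CompareFaces-style matches by a similarity threshold.

    Hypothetical helper: matches at or above `threshold` are flagged as
    strong; everything else is flagged for closer human review. For
    critical decisions, even strong matches should still be reviewed.
    """
    strong, weak = [], []
    for match in response.get("FaceMatches", []):
        if match["Similarity"] >= threshold:
            strong.append(match)
        else:
            weak.append(match)
    return strong, weak


# Hand-written sample in the shape of a CompareFaces response (not real output).
sample = {"FaceMatches": [{"Similarity": 99.6}, {"Similarity": 91.2}]}
strong, weak = triage_face_matches(sample)
print(len(strong), len(weak))  # prints "1 1"
```

`compare_faces` itself also accepts a `SimilarityThreshold` parameter that filters matches on the service side; applying the stricter cut in your own code instead makes it easy to route borderline matches to a reviewer rather than silently dropping them.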

Have a look at their blog post for some examples of what the machine can recognize as a face. Some of these are difficult enough to trick many humans, I think.

Caire – content aware image resize library

Caire is a content-aware image resize library written in Go, based on the seam carving technique.

How does it work

An energy map (edge detection) is generated from the provided image.

The algorithm tries to find the least important parts of the image taking into account the lowest energy values.

Using a dynamic programming approach, the algorithm generates individual seams across the image from top to bottom, or from left to right (depending on horizontal or vertical resizing), and assigns each seam a custom value: the least important pixels have the lowest energy cost and the most important ones have the highest cost.

Traverse the image from the second row to the last row and compute the cumulative minimum energy for all possible connected seams for each entry.

The minimum energy level is calculated by adding the current pixel's energy to the lowest cumulative value among the neighboring pixels in the previous row.

Traverse the image from top to bottom and compute the minimum energy level. For each pixel in a row, compute the energy of the current pixel plus the lowest cumulative energy of the three possible pixels above it.

Find the lowest-cost seam in the cumulative energy matrix, starting from the last row and backtracking upward, and remove it.

Repeat the process until the image reaches the target size.
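The steps above can be sketched end to end in a few small functions. This is a minimal illustration, not Caire's Go implementation: it works on a grayscale image stored as a list of rows, uses a simple central-difference gradient as the energy map (a stand-in for whatever edge detector Caire actually applies), only removes vertical seams (reducing width), and all function names are hypothetical.

```python
def energy_map(img):
    """Gradient-magnitude energy with edge clamping (simple edge detector)."""
    h, w = len(img), len(img[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp neighbour coordinates at the image borders.
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def cumulative_energy(energy):
    """Dynamic programming pass: M[y][x] = e[y][x] + min of up to 3 pixels above."""
    w = len(energy[0])
    m = [list(energy[0])]  # first row has nothing above it
    for y in range(1, len(energy)):
        prev = m[y - 1]
        m.append([energy[y][x] + min(prev[max(x - 1, 0):min(x + 2, w)])
                  for x in range(w)])
    return m


def find_vertical_seam(cum):
    """Backtrack from the cheapest pixel in the last row; seam[y] is a column."""
    h, w = len(cum), len(cum[0])
    x = min(range(w), key=lambda i: cum[h - 1][i])
    seam = [x]
    for y in range(h - 2, -1, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cum[y][i])
        seam.append(x)
    seam.reverse()  # collected bottom-up, return top-down
    return seam


def remove_vertical_seam(img, seam):
    """Drop one pixel per row, at the seam's column."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def carve_width(img, target_width):
    """Repeat: recompute energy, find the cheapest seam, remove it."""
    while len(img[0]) > target_width:
        cum = cumulative_energy(energy_map(img))
        img = remove_vertical_seam(img, find_vertical_seam(cum))
    return img


# A dark-to-bright step: carving keeps the edge and drops flat columns.
image = [[10, 10, 200, 200] for _ in range(3)]
print(carve_width(image, 2))  # prints "[[10, 200], [10, 200], [10, 200]]"
```

`carve_width` recomputes the energy map after every removal, which is what "repeat the process" amounts to; it is simple but slow, since every seam costs a full pass over the image. Shrinking the height works the same way on the transposed image.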

How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos

The New York Times is running a fascinating article on the progress of artificial intelligence and machine learning in both identifying and generating fake photos: How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos. The above image shows the progress of the AI working against itself and learning from its own results, a setup known as a generative adversarial network (GAN): one part tries to identify whether the photo is fake, and the other tries to generate a fake photo that will pass the test. When the test fails, the system learns, improves, and tries again. Look at the last row of photos, which are super realistic and took the system between 10 and 18 days to learn how to generate.

But that’s not all. It gets better, and I quote:

A second team of Nvidia researchers recently built a system that can automatically alter a street photo taken on a summer’s day so that it looks like a snowy winter scene. Researchers at the University of California, Berkeley, have designed another that learns to convert horses into zebras and Monets into Van Goghs. DeepMind, a London-based A.I. lab owned by Google, is exploring technology that can generate its own videos. And Adobe is fashioning similar machine learning techniques with an eye toward pushing them into products like Photoshop, its popular image design tool.

Here are a few more photos that were generated:

This is remarkable. But if you keep reading the article, you’ll quickly discover that there is even more to it. What’s next in line after pictures? You guessed it: videos. You’d better sit down before you watch this video showing a lip-synced Obama:

So, can’t trust the TV. Can’t trust the Internet. Who do you trust?

How real is real? Or is it all fake?

Jason Kottke’s blog links to an interesting article about a guy who submitted a fake (as in computer-generated) image and got a real French ID card.

The photo I submitted for this request is actually a 3D model created on a computer, by means of several different software and techniques used for special effects in movies and in the video game industry. It is a digital image, where the body is absent, the result of an artificial process.

The image corresponds to the official demands for an ID: it is a likeness, it is recent, and it meets all the criteria of framing, light, background and contrast to be observed.

The document validating my French identity in the most official way thus presents today an image of me which is practically virtual, a video game version, a fiction.

The article also links to a different image study, supposedly captured by a Google Street View camera and then possibly manipulated in Photoshop.

I’ve been involved with some regulated industries (like Forex) that require proof of identity and residence, all submitted digitally (via email or web form file upload). There’s always a fair amount of obviously fake images sent in. But the two stories above raise the question of where one draws the line. With recent technological advances and an increasing reliance on digital channels, how can anybody reliably tell whether an image is fake or real?

I don’t have an answer, but I think the only way here is to fight fire with fire, that is, to use technology for image analysis rather than rely on an untrained human eye. Are there well-known or well-established tools that can do the job? I’m not sure about well known or established, but at least some tools do exist.

Real-time face detection and emotion/gender classification

I came across this interesting Python tool for real-time face detection and emotion and gender classification. Here is a brief description from the project page:

Real-time face detection and emotion/gender classification using fer2013/IMDB datasets with a Keras CNN model and OpenCV.

IMDB gender classification test accuracy: 96%.

fer2013 emotion classification test accuracy: 66%.
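A pipeline like this (OpenCV finds faces, a Keras CNN classifies them) needs some glue between the two stages. Here is a minimal sketch of that glue, assuming a grayscale frame stored as rows of 0-255 values and a face box in the (x, y, w, h) form that OpenCV's `detectMultiScale` returns. The crop is resized to 48x48, the size of fer2013 images, and a seven-way probability vector is mapped to the standard fer2013 label order. The helper names and the [0, 1] pixel scaling are assumptions; the project may preprocess differently, and the Keras model itself is left out.

```python
# Standard fer2013 class order (label indices 0-6).
FER2013_LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]


def crop_and_resize(gray, box, size=48):
    """Crop a face box from a grayscale frame and resize it to size x size.

    `gray` is a list of rows of 0-255 ints; `box` is (x, y, w, h), as
    returned by OpenCV's detectMultiScale. Uses nearest-neighbour
    sampling and scales pixels to [0, 1] (an assumed preprocessing).
    """
    x, y, w, h = box
    face = []
    for r in range(size):
        src_y = y + r * h // size  # nearest source row for output row r
        face.append([gray[src_y][x + c * w // size] / 255.0 for c in range(size)])
    return face


def decode_emotion(probs):
    """Map a 7-way probability vector to (label, probability)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return FER2013_LABELS[best], probs[best]


# Synthetic 100x100 white frame and a made-up detection box.
frame = [[255] * 100 for _ in range(100)]
face = crop_and_resize(frame, (10, 10, 20, 20))
print(len(face), len(face[0]))  # prints "48 48"
print(decode_emotion([0.1, 0.0, 0.0, 0.7, 0.1, 0.05, 0.05]))  # prints "('happy', 0.7)"
```

In a real pipeline, the 48x48 crop would be reshaped to the trained model's input shape before calling its `predict` method; that step depends on the model and is omitted here.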