Caire – content aware image resize library

Caire is a content-aware image resize library.

How does it work

  • An energy map (edge detection) is generated from the provided image.
  • The algorithm tries to find the least important parts of the image, that is, the areas with the lowest energy values.
  • Using a dynamic programming approach, the algorithm generates individual seams crossing the image from top to bottom or from left to right (depending on whether the resize is horizontal or vertical) and assigns each seam a cost: the least important pixels have the lowest energy cost and the most important ones the highest.
  • Traverse the image from the second row to the last row and compute the cumulative minimum energy of all possible connected seams for each entry.
  • The minimum energy level is calculated by adding the energy of the current pixel to the lowest value among the neighboring pixels in the previous row.
  • Traverse the image from top to bottom and compute the minimum energy level. For each pixel in a row, compute the energy of the current pixel plus the energy of one of the three possible pixels above it.
  • Find the lowest-cost seam in the energy matrix, starting from the last row, and remove it.
  • Repeat the process.
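
To make the steps above a bit more concrete, here is a minimal Python sketch of the same idea.  It is not Caire’s actual code: the function names are made up for illustration, the image is assumed to be a grayscale list of rows, and a plain neighbor-difference gradient stands in for the edge detection step.  It only handles vertical seams (shrinking the width).

```python
def energy_map(img):
    """Energy of each pixel as a simple horizontal + vertical gradient.

    Assumption: img is a grayscale image stored as a list of rows of numbers.
    """
    h, w = len(img), len(img[0])
    energy = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            energy[y][x] = abs(right - left) + abs(down - up)
    return energy


def find_vertical_seam(energy):
    """Return one column index per row for the lowest-cost top-to-bottom seam."""
    h, w = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # From the second row down: current energy plus the cheapest of the
    # (up to) three connected pixels in the previous row.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Start from the cheapest entry in the last row and walk back up.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(img, seam):
    """Drop one pixel per row, at the column given by the seam."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def shrink_width(img, n):
    """Repeat the process: recompute energy, find a seam, remove it, n times."""
    for _ in range(n):
        img = remove_vertical_seam(img, find_vertical_seam(energy_map(img)))
    return img
```

Calling shrink_width(img, 2) removes two vertical seams and leaves the height untouched.  Horizontal seams could be handled the same way by transposing the image first.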

How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos

The New York Times is running a fascinating article on the progress of artificial intelligence and machine learning in both identifying and generating fake photos – How an A.I. ‘Cat-and-Mouse Game’ Generates Believable Fake Photos.  The image above shows the AI working against itself and learning from its own results – one part tries to identify whether a photo is fake, and the other part tries to generate a fake photo that will pass the test.  When the test fails, the system learns, improves, and tries again.  Look at the last row of photos, which are super realistic and took the system between 10 and 18 days to learn how to generate.

But that’s not all.  It gets better, and I quote:

A second team of Nvidia researchers recently built a system that can automatically alter a street photo taken on a summer’s day so that it looks like a snowy winter scene. Researchers at the University of California, Berkeley, have designed another that learns to convert horses into zebras and Monets into Van Goghs. DeepMind, a London-based A.I. lab owned by Google, is exploring technology that can generate its own videos. And Adobe is fashioning similar machine learning techniques with an eye toward pushing them into products like Photoshop, its popular image design tool.

Here are a few more photos that were generated:

This is remarkable.  But if you keep reading the article, you’ll quickly discover that there is even more to it.  What’s next in line after pictures?  You are correct: videos.  You’d better sit down before you watch this video, showing Obama’s lip sync:

So, can’t trust the TV.  Can’t trust the Internet.  Who do you trust?

How real is real? Or is it all fake?

Jason Kottke’s blog links to an interesting article about a guy who submitted a fake (as in computer-generated) image and got a real French ID card.

The photo I submitted for this request is actually a 3D model created on a computer, by means of several different software and techniques used for special effects in movies and in the video game industry. It is a digital image, where the body is absent, the result of an artificial process.

The image corresponds to the official demands for an ID: it is resembling, is recent, and answers all the criteria of framing, light, bottom and contrasts to be observed.

The document validating my french identity in the most official way thus presents today an image of me which is practically virtual, a version of video game, fiction.

The article also links to a different image study, supposedly shot by a Google Street View camera and then possibly manipulated in Photoshop.

I’ve been involved with some regulated industries (like Forex) which require proof of identity and residence, all submitted digitally (via email or web form file upload). There’s always a fair number of obviously fake images sent in.  But the above two stories raise the question of where one draws the line.  With the recent technological advances and an increasing reliance on digital ways, how can anybody reliably validate an image as fake or real?

I don’t have an answer, but I think the only way here is to fight fire with fire.  As in, use technology to do the image analysis, rather than relying on an untrained human eye.  Are there well known or well established tools that can do the job?  Not sure about well known or established, but at least some tools do exist.

Real-time face detection and emotion/gender classification

I came across this interesting Python tool that helps with real-time face detection and emotion and gender classification.  Here is a brief description from the project page:

Real-time face detection and emotion/gender classification using fer2013/IMDB datasets with a keras CNN model and openCV.

  • IMDB gender classification test accuracy: 96%.
  • fer2013 emotion classification test accuracy: 66%.

Amazon Rekognition – Image Detection and Recognition Powered by Deep Learning

I know, I know, this blog is turning into an Amazon marketing bullhorn, but what can I do? The Amazon re:Invent 2016 conference turned into an exciting stream of news for the regular Joe, like yours truly.

This time, the announcement is Amazon Rekognition, an image detection and recognition service powered by deep learning.  This is yet another area that has traditionally been difficult for computers.

Like with the other Amazon AWS services, I was eager to try it out.  So I grabbed a few images from my Instagram stream, and uploaded them into the Rekognition Console.  I don’t think Rekognition actually uses Instagram to learn about the tags and such (but it is possible).  Just to make it a bit more difficult for them, I used generic image names like q1.jpg, q2.jpg, etc.

Here are the results.  Firstly, the burger.

[image: rekognition-burger]

This was spot on, with burger, food, and seasoning identified as labels.  The confidence for burger and food was almost 99%, which is correct.

Then, the beer can with a laptop in the background.

[image: rekognition-beer]

Can and tin labels are at 98% confidence. Beverage, drink, computer and electronics are at 69%, which is not bad at all.

Then I decided to try something with people.  Here goes my son Maxim, in a very grainy, low-light picture.

[image: rekognition-maxim]

People, person, human at 99%, which is correct.  Portrait and selfie at 58%, which is accurate enough.  And then female at 53%, which is not exactly the case.  But with him being still a kid, that’s not too terrible.

Let’s see what it thinks of me then.

[image: rekognition-leonid]

Human, people, person at 99% – yup. 98% for beard and hair is not bad.  But it completely missed out on the duck! :)  I guess it returns a limited number of labels, and while the duck is pretty obvious, the size of it, compared to how much of the picture is occupied by my ugly mug, is insignificant.

Overall, these are quite good results.  This blog post covers a few other cases, like figuring out the breed of a dog and the emotional state of people in the picture, which is even cooler than my tests.

Pricing-wise, I think the service is quite affordable as well:

[image: rekognition-pricing]

$1 USD per 1,000 images is very reasonable.  The traditional Free Tier allows for 5,000 images per month.  And API calls that support more than 1 image per call are still counted as a single image.
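
As a rough back-of-the-envelope check, the numbers above can be turned into a tiny estimator.  This is only a sketch: the function name and parameters are made up for illustration, it assumes the flat $1 per 1,000 images rate and the 5,000-image monthly free allowance quoted above, and it ignores any volume tiers or other conditions the real pricing may have.

```python
def estimate_monthly_cost_usd(images, free_images=5000, price_per_1000=1.0):
    """Rough monthly cost estimate (hypothetical helper, flat rate assumed).

    images: number of images processed in the month.
    free_images: assumed free allowance per month.
    price_per_1000: assumed price in USD per 1,000 billable images.
    """
    billable = max(images - free_images, 0)
    return billable * price_per_1000 / 1000


if __name__ == "__main__":
    for count in (3000, 25000):
        print(count, "images:", estimate_monthly_cost_usd(count), "USD")
```

Under these assumptions, 3,000 images a month stay within the free allowance, and 25,000 images come to about $20.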

All I need now is a project where I can apply this awesomeness…

Google Street View vs. captcha

The Google Online Security Blog shares news of an innovation in the image recognition technology used in Google Street View:

Translating a street address to an exact location on a map is harder than it seems. To take on this challenge and make Google Maps even more useful, we’ve been working on a new system to help locate addresses even more accurately, using some of the technology from the Street View and reCAPTCHA teams.

This technology finds and reads street numbers in Street View, and correlates those numbers with existing addresses to pinpoint their exact location on Google Maps. We’ve described these findings in a scientific paper at the International Conference on Learning Representations (ICLR). In this paper, we show that this system is able to accurately detect and read difficult numbers in Street View with 90% accuracy.

Here are some examples of correctly identified street numbers – quite impressive!

[image: street numbers]

What’s even more interesting is that pushing this technology for good uses also empowers the evil side of things:

Turns out that this new algorithm can also be used to read CAPTCHA puzzles—we found that it can decipher the hardest distorted text puzzles from reCAPTCHA with over 99% accuracy.

Oops!

Burberry Kisses, good or evil?

Here is something I have mixed feelings about:

Thanks to modern technology you can connect with your loved ones by sending a quick note, a photo of your cat, even a smile :) around the world in seconds. But one of humanity’s most iconic forms of communication—the kiss—has been left out in the cold. Now, though, you can send a kiss to anyone, anywhere in the world, through Burberry Kisses, a new campaign from Burberry and Google. And not just any kiss, but your kiss.

On one hand, this is sweet and romantic.  Yet, on the other, Google is so well known for its crowd-sourcing experiments that it makes me wonder – what’s behind this one?  After all, when Google wanted to fix all those bad scans in the Google Books project, they used the reCAPTCHA service, which put everyone on the web to work.  When Google wanted to teach its voice recognition all the accents (at least in the States), they opened up a directory service.  And they’ve done more of the same for images, artificial intelligence, and even maps.

So, what is a possible usage for a huge collection of lip images?

The darkest version I have is somewhere around fingerprinting.  Lip prints are probably as unique as fingerprints.  And when you mix it up with, say, face recognition that they already have, who knows where that can lead.  Oh, by the way, now that I think of face recognition, Android’s face recognition lock sounds suspicious as well.  Oh, crap.  I think I’m going paranoid!

Humans in image recognition

It looks like humans aren’t all that useless when it comes to technology.  There are still a few areas where we do better than machines.  Image recognition is one of them.  TechCrunch runs a story about one company that seems to be using humans in its image recognition process.  Comments on that story also mention Google doing the same.

To me it feels like a problem with timing.  There is a need to tag and search a whole lot of images.  But there is no good automated solution available.  So we are falling back on humans.  It’s easy to come up with a few other areas, in which there is a need today for solutions which won’t even be here tomorrow.  Technology needs help, I guess.