In this episode we introduce OpenCV, a popular open source computer vision library created by Dr. Gary Bradski and developed in large part by a team of Russian computer vision and optimization experts. We interviewed Gary at his office at Willow Garage, where they are building an open research platform for personal robotics, including the PR2 robot.

Handy links below.


General

OpenCV, the official page.
Learning OpenCV, a very thorough introduction from O’Reilly publishing, written by Dr. Gary Bradski and Dr. Adrian Kaehler.
Sarov, the secret Soviet city (still apparently closed to foreigners).
How does Sudoku Grab work? An explanation of the iPhone Sudoku solver Nat uses to cheat in multiplayer games online.
CubeCheater, the iPhone Rubik’s Cube solver which is now offline due to legal threats (!!). But you can still see a video.
TinEye, the reverse image search engine. Find out where an image comes from.
Google Goggles.
The Poseidon Drowning Detection System.

DARPA Grand Challenge

Wikipedia’s overview of the DARPA Grand Challenge
Stanley, the car from Stanford which won in 2005.
A nice video recap of the 2005 DARPA Grand Challenge
Demo of Stanley’s vision system

OpenCV Hacks

ASL Finger Recognition using C# OpenCV wrapper
Tracking a tennis ball
Control your mouse in Linux with a red glove
Face features detection system
Toss and catch a virtual ball
Use your head as a mouse with MouseTrap
Sample code for OpenCV face detection
Sample code for grabbing and displaying frames from a webcam
Emgu, a .NET wrapper for OpenCV.

Transcript

00:00:00

00:00:13
Gary: It makes the little line matching canal find all the triangulations to fill in this scene in a dense way. If you have dense, nice 3D models then there’s a lot you can do in perception. But it also works – it’s getting – you can’t see this but this is rapidly flashing on and off and so it gets a clear image and then it gets the dense image and then the other stereo comes in between all that and gets a color image. And then there’s this camera, too, is a high resolution five megapixel, but that only goes …
00:00:44

00:00:52
Nat: That was Gary Bradski, a computer vision expert and senior scientist at Willow Garage, which develops hardware and open source software for personal robotics.
Nat: In 2005, Gary led the computer vision team that won the DARPA Grand Challenge. Gary gave me a tour of the robot they’re building at Willow Garage, called the PR2. The robot’s vision system is based on an open source computer vision library called OpenCV that Gary created.
Alex: And OpenCV is the main focus of our program today.
Nat: I’m Nat Friedman in sunny San Francisco.
Alex: And I’m Alex Graveley, reporting from Boston.
Nat: And this is Hacker Medley, the podcast for curious hackers.
00:01:30

00:01:42
Nat: So Alex, for years and years computer vision has mostly operated in the realm of research or sometimes industrial applications, like in factories to monitor equipment to make sure it doesn’t, you know, spin out of control, or in mining, but all of a sudden it’s starting to show up everywhere in our own lives.
Nat: For example, I’ve got a bunch of apps on my phone that can do things like let me take a picture of a Sudoku board and then recognize the board and all the numbers and show me the solution to the whole board or one that does the same kind of thing for Rubik’s cubes.
Alex: Yeah, and in fact, this stuff is sort of creeping in all over the place. Most cameras these days have face-detection autofocus, and for instance, Google’s releasing a new product called Google Goggles which does an image-based search.
Nat: Yeah, so you take a picture of like a book or a product and it’ll show you information about, you know, reviews of that book, how much it costs online and this is sort of called image search or reverse image search and there’s a website called TinEye.com that’s been doing the same thing for quite a long time.
00:02:49
Alex: So Nat, what is computer vision?
Nat: Okay, so good idea. Let’s define it before we get into it too much. Gary Bradski, who we interviewed for this podcast, he wrote a book about his library, OpenCV, and at the beginning of the book he gives a pretty good definition of computer vision so I’m just going to quote it here. He says:
Nat: “Computer vision is the transformation of data from a still or video camera into either a decision or a new representation.” And then he goes on to say, “a decision might be something like there’s a person in this scene or there are 17 tumor cells on this slide”.
Alex: And a new representation might be something like Google’s Street View where they stitch a bunch of photos together taken from cars driving all over the city that you’re searching and project them onto a spherical surface so you can pan around and see where the store that you’re trying to find is.
00:03:46
Nat: So, of course, you know, why is this happening, why is computer vision becoming more widespread? Well, of course, one of the big reasons is the widespread availability of CCDs, right? So we’ve got these cameras in our cell phones or other portable devices, most computers come with webcams these days and the algorithms have developed, too. You know, even in the last 10, 15 years computer vision algorithms have improved quite a lot, but one of the other things that’s driven the adoption of computer vision technology is the availability of open source building blocks and OpenCV is a great example of this.
Nat: Gary started the OpenCV library in 1999 when he was working at Intel Research and he spent some time after joining Intel touring different universities around the US and visiting their different computer vision labs.
00:04:35
Gary: And I saw that MIT Media Lab was – had the advantage that they were building on the infrastructure that had built up from other students, so when a student came into the media lab they immediately had available all the image processing, all the scan computer vision routines, and so the students were able to do much more ambitious research.
Nat: So when Gary got to Intel it made sense for him to help the computer vision community by building a common platform that could be used to do computer vision research and to advance the state of the art in computer vision without having to build everything from scratch, but also it made sense for Intel because they would be able to encourage people to use a lot more processing power.
Nat: So Gary had a bunch of image routines that he’d written himself, but he knew he needed more help and as it turned out Intel had just the team for the job.
00:05:27
Gary: So Intel, at the time, was contracting with people in Sarov, Russia, which is their secret city that the Soviet Union never acknowledged existed. It was literally erased from the map. But Intel had hired contractors there who were, some of them, former nuclear weapons designers that they wanted to keep busy doing something else, such as debugging software. And these were really well trained people; they became the core of this library development.
Alex: And Gary chose the incredibly permissive BSD license for his library so that it could be used in as many places as possible. And so it’s used all over the place, but you can’t really track it, because no one has to report whether they’re using it or not.
00:06:16
Nat: So that was the goal of OpenCV: provide the base-level computer vision functions, as deep and as broad as possible, and make them really widely available. And over the last 10 years the library has grown and grown and now there’s tons of functionality. If you get Gary’s recent OpenCV book from O’Reilly, it’s about 500 pages of great explanations of all the different functions that are in OpenCV. Maybe we should try to go through those. It’s a lot of stuff though.
Alex: OpenCV can be split into a bunch of categories of functions built on top of each other and interrelated. The first category is basic image processing routines.
Nat: Gaussian smoothing, bilateral smoothing, morphological dilation, morphological erosion, morphological top hat, black hat, floodfill, pyramid segmentation, Canny edge detection, sparse perspective and affine transformations.
Alex: The second category is higher level vision and video processing.
Nat: Contour finding, polygon approximation, convexity defect detection, background subtraction, scene modeling, stereo 3D vision, the watershed algorithm, corner finding, the Lucas-Kanade method for optical flow, the Horn-Schunck method, the Kalman filter.
Alex: And the third category is machine learning algorithms.
Nat: Mahalanobis distance, K-nearest neighbor, Bayesian classifiers, binary decision trees, boosting, random trees, Haar classifiers, multilayer perceptrons, support vector machines, expectation maximization.
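To make that first category concrete, here’s a minimal sketch of a few of the basic routines, using the modern Python bindings (cv2, OpenCV 4), which postdate this episode; the input file name is a placeholder.

```python
import cv2

# Load a test image; "scene.jpg" is a placeholder path.
img = cv2.imread("scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Gaussian smoothing: blur with a 5x5 kernel to suppress noise.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection on the smoothed image.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# Morphological dilation and erosion with an elliptical structuring element.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
dilated = cv2.dilate(edges, kernel, iterations=1)
eroded = cv2.erode(edges, kernel, iterations=1)

# Show the results until a key is pressed.
cv2.imshow("edges", edges)
cv2.imshow("dilated", dilated)
cv2.waitKey(0)
cv2.destroyAllWindows()
```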
00:07:28
Alex: So there’s quite a lot of stuff in this library. You can do anything from really basic image transformations to – I mean there’s one entry point that identifies faces in an image. And there’s also some useful platform routines for drawing windows or grabbing frames off of the video camera that you have built into your computer.
Nat: Yeah, that’s pretty cool actually. If you’re using OpenCV on a Linux or Windows machine it’s basically like two lines of code to open the camera and just grab frames off of it, in C or Python or C++.
Alex: So why doesn’t it work that way on OS X?
Nat: I don’t know. Good question.
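For a sense of how little code that takes, here’s a rough sketch of the frame grabbing, plus the face detection entry point Alex mentioned, again using today’s Python bindings rather than the C API of the era; the bundled Haar cascade path assumes the opencv-python package.

```python
import cv2

# The "two lines": open the default camera, then read frames from it.
cap = cv2.VideoCapture(0)

# The stock frontal-face Haar cascade that ships with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Detect faces and draw a green box around each one.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("webcam", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```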
Alex: So Nat, what other users of OpenCV are out there?
00:08:08
Nat: Well when I asked Gary, who is using OpenCV the first thing he said is that because it’s a BSD licensed library they can’t know, right? People don’t have to release their code, they don’t have to report that they’re using it, but they suspect, for example, that the face detection algorithm that’s in OpenCV is the foundation of the face detection algorithm that Omron licenses to pretty much all the camera manufacturers worldwide. And then, you know, there’s a whole bunch of other big names. It’s used in space, it’s used by all sorts of companies developing anything related to computer vision.
Nat: One of the applications that Gary told me about that I thought was really interesting is this European product, it’s a drowning detection system that you install at a public pool, and it’s got a couple of cameras above the pool and inside the pool, on the walls of the pool, and it detects when someone’s drowning. And I asked Gary how it works.
00:09:00
Gary: So of course they have to be careful: you don’t want a lot of false alarms, what you call false positives, and yet you don’t want to have a misdetection. You don’t want someone to drown and say, oh, we missed it. Right? So they have a series of checks, and they’re looking for blobs that are lying on the bottom of the swimming pool. I don’t know the details of their algorithms, but I believe what they’re doing is a kind of background model: they learn what the swimming pool looks like with no people in it, and then they look for things that lie motionless on the bottom for a certain amount of time. And they also have more than one camera, so they’re looking for confirmations at different levels.
00:09:54
Gary: And so when they detect something’s on the bottom, it’s not moving, and some cameras confirm this, then they set off an alarm. I can’t remember the name of this company, and it’s used in the US too; probably Google for “drowning detection system”. It’s a European company. But on their site they have some film strips that actually show the system rescuing people. Some guy had a heart attack, he sank under the water, the lifeguard pulled him up and they saved him, not only from the heart attack but from drowning. And a couple of other known saves of people. So that’s all good.
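Gary is guessing at the vendor’s algorithm, and the sketch below is in turn only a guess at that general flavor: learn a background model of the scene, then flag large foreground blobs that persist too long. The thresholds, timings, and video source are all made up, and a real system would also verify that the blob isn’t moving and confirm across cameras.

```python
import cv2

cap = cv2.VideoCapture("pool.mp4")  # placeholder video source

# Learn a model of the scene; anything that deviates becomes foreground.
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
still_frames = 0       # consecutive frames containing a large blob
ALARM_AFTER = 30 * 10  # roughly 10 seconds at 30 fps (made-up threshold)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fgmask = bg.apply(frame)
    # Clean up the mask, then look for person-sized foreground blobs.
    fgmask = cv2.morphologyEx(fgmask, cv2.MORPH_OPEN, kernel)
    contours, _ = cv2.findContours(fgmask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    big = [c for c in contours if cv2.contourArea(c) > 2000]  # made-up area
    # Crude proxy for "lying motionless on the bottom": count how long a
    # large blob has been present at all.
    still_frames = still_frames + 1 if big else 0
    if still_frames > ALARM_AFTER:
        print("ALARM: possible drowning detected")
        still_frames = 0

cap.release()
```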
00:10:34

00:10:45
Alex: So OpenCV was even used in winning the DARPA Grand Challenge.
Nat: Yeah, the year that Gary participated in the DARPA Grand Challenge, in 2005, when they won actually, OpenCV was the foundation of the vision system they built. But actually, why don’t we tell people what the DARPA Grand Challenge is, in case they don’t know.
Alex: Yeah, so DARPA, which is the research arm of the US Defense Department, you know, the guys that sponsored the development of the internet, sponsored a contest to autonomously drive a car 140 miles through the desert, and whatever team could build an autonomous car that could do that in the shortest amount of time would win. And I don’t know how much the prize was, maybe…
Nat: It was $2 million.
Alex: Oh, it was $2 million.
Nat: Yeah. And Gary stressed for me that he did not get any of that money, so it apparently went into Stanford’s endowment. But in 2005 he did lead the team that built the vision system for Stanford’s entry, which was called “Stanley”. The goal, of course, was to drive over this sort of twisty path, these roads that went through the desert in Nevada; there were mountain roads that were quite twisty and there were flat roads that were quite straight. So Gary’s challenge was to identify where the road is, and I asked him to explain how you go about finding it.
00:12:13
Gary: Okay. So there’s a lot of ways, and we’ve looked at many ways. One of which is: dirt roads tend to have parallel lines that converge to the horizon, so we did a bunch of convolutions with detectors, and that worked quite well. We detected roads to like 96% or something, which is totally useless for robots, because if you have 4% misdetection you’re in a ditch or off a cliff.
Gary: So what happened is we had to scale back to very simple techniques. We had short-range lasers, five of them, that would be able to tell what the road looked like close by, or would identify: that’s flat terrain close by, I like that. And by calibrating these together, the laser could tell the vision system: that patch is what I like.
00:13:07
Gary: We used simple color segmentation techniques, which was fitting Gaussian models. These all exist now in the background subtraction routines in OpenCV. By fitting a Gaussian color model we simply had a distribution of what road looked like in color space, and we segmented the rest of the scene by seeing which pixels hit that distribution. We had a bunch of other checks, like that this had to be a contiguous region, it had to be connected.
Gary: We actually, at one point, had trapezoidal checks that we didn’t need in the end. So there were a couple of checks and it worked very fast and reliably.
00:13:52
Gary: And so, you know, basically you have to have a 100% recognition rate to be able to run. The vision system’s goal was to detect when the car could go faster. When we were in the mountains on curvy roads we couldn’t go faster than 25 mph, and so vision was on but it wasn’t used, because we would never see enough road to tell the robot: you can run now. But on the straightaways where you could go fast, vision was crucial for telling the car: look, this is faster than you can stop in your laser range, but I’m sure I see a clear path, floor it.
Gary: There’s a lot of other ways of doing road detection now. We have this watershed transform, so you can use that for segmenting roads; I show an example of that as an exercise in the book. We now have a new algorithm, written after the book, called GrabCut: if you give me some points in the road and some points out, I can easily segment the road. Very nice, much better segmentations than we were getting. So there’s a lot of routines now.
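Here’s a back-of-the-envelope sketch of the Gaussian color-model idea Gary describes: fit a mean and covariance to pixels from a patch the lasers have blessed as road, then keep pixels whose Mahalanobis distance to that model is small. The patch coordinates and threshold are placeholders, and Stanley’s extra checks are reduced here to keeping the largest connected blob.

```python
import cv2
import numpy as np

img = cv2.imread("road.jpg")  # placeholder image

# "Training" pixels: a patch just ahead of the car that the lasers
# have confirmed is flat road (coordinates are made up here).
patch = img[400:480, 200:440].reshape(-1, 3).astype(np.float64)

# Fit a single Gaussian to the road's color distribution.
mean = patch.mean(axis=0)
cov = np.cov(patch, rowvar=False)
inv_cov = np.linalg.inv(cov + 1e-6 * np.eye(3))  # regularize

# Squared Mahalanobis distance of every pixel to the road model.
pixels = img.reshape(-1, 3).astype(np.float64) - mean
d2 = np.einsum("ij,jk,ik->i", pixels, inv_cov, pixels)

# Pixels close to the model are labeled road (threshold is a guess).
road_mask = (d2 < 9.0).reshape(img.shape[:2]).astype(np.uint8) * 255

# A stand-in for the team's contiguity checks: keep the largest blob.
contours, _ = cv2.findContours(road_mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
if contours:
    biggest = max(contours, key=cv2.contourArea)
    road_mask = np.zeros_like(road_mask)
    cv2.drawContours(road_mask, [biggest], -1, 255, thickness=cv2.FILLED)

cv2.imshow("road", road_mask)
cv2.waitKey(0)
```

The newer GrabCut algorithm Gary mentions is also exposed in the library, as cv2.grabCut, if you want to compare the two approaches.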
00:15:00
Nat: So I mean, what’s really impressive to me about this, Alex, is that in the last five years even, you know, there’s been huge advancements in the ability to do stuff like segment the road and identify where the road is. It seems like we’re kind of in a golden era for computer vision where the state of the art is really making huge strides over a pretty short period of time.
Nat: And I think one of the things that’s most interesting that Alex and I discovered while we were reading about OpenCV is that if you go to YouTube and search for OpenCV, we just did this the other day, you’ll find dozens of videos, and these are basically hackers who have done these little demos or written these little tools based on OpenCV. It’s like this incredibly vibrant hacker scene, like a demo scene of people doing hacks in OpenCV.
Nat: So, for example, one of them we found is this guy from Indonesia who did this hack with his webcam, based on OpenCV, to recognize sign language. So you do signs in front of your webcam and it tells you what letters you’re signing, all based on OpenCV. It’s pretty amazing.
00:16:00
Alex: Yeah, and that’s really basic because he’s just looking at – he’s just identifying letters and not doing gestures yet, but you know, it’s a start and it shows sort of the power of having a library that lets you do really complex operations. It’s something to build from.
Nat: Yeah, actually this is really cool. He didn’t publish his code, and not all these guys are publishing their code, but apparently he wrote this ASL finger spelling recognizer in C Sharp using a C Sharp wrapper, and he wrote a little recipe for how it works. He said: step one, I did Haar object detection to detect whether the hand is open or closed and to determine the position of the box; step two, movement detection: if things are shaking, then reset the region-of-interest box; step three, in order to extract the hand shape I used skin detection based on HSV or RGB, depending on the lighting; and step four, to classify the image I used K-nearest neighbor, which is in the machine learning section of OpenCV, with 100 training images per sign. And he says he’s got, out of the 26-letter alphabet, 19 signs identifying perfectly, and he’s still working on the seven remaining letters.
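Steps three and four of that recipe might look roughly like the sketch below. The HSV skin range, feature size, and k are all guesses (the project’s real code was never published), and the training data here is fabricated just so the sketch runs end to end.

```python
import cv2
import numpy as np

def hand_mask(frame):
    # Step 3: crude HSV skin detection; the range is a guess and,
    # as the author notes, very lighting-dependent.
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv,
                       np.array([0, 40, 60], dtype=np.uint8),
                       np.array([25, 255, 255], dtype=np.uint8))

def to_feature(mask):
    # Flatten a small binary silhouette into one float32 feature row.
    return cv2.resize(mask, (32, 32)).reshape(1, -1).astype(np.float32)

# Fabricated stand-ins for the 100 webcam frames per sign.
rng = np.random.default_rng(0)
train_images = [rng.integers(0, 256, (240, 320, 3), dtype=np.uint8)
                for _ in range(52)]
train_labels = [i % 26 for i in range(52)]  # letter index per image
test_frame = train_images[0]

# Step 4: K-nearest neighbor from OpenCV's machine learning module.
knn = cv2.ml.KNearest_create()
samples = np.vstack([to_feature(hand_mask(img)) for img in train_images])
labels = np.array(train_labels, dtype=np.float32).reshape(-1, 1)
knn.train(samples, cv2.ml.ROW_SAMPLE, labels)

# Classify a new frame.
_, result, _, _ = knn.findNearest(to_feature(hand_mask(test_frame)), k=5)
print("predicted letter index:", int(result[0, 0]))
```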
00:17:04
Alex: That’s cool, because you’re talking about the use of all those different categories of algorithms that are present in OpenCV to solve what from the outside seems like a pretty straightforward problem: you’re just looking for hand shapes that look like what you know a sign language A or C looks like.
Nat: You’ve been messing around with OpenCV, haven’t you Alex?
Alex: In my case I was playing with the face detection and eye detection utilities that exist, and I noticed that once you try and do something that is not what a given algorithm was designed for, you end up getting into a lot of detail on a lot of techniques in order to handle what you want.
Alex: For instance, in eye detection it’s difficult to detect eyes if the person is not looking straight at the camera, if their head is rotated, if their hair is in the way, all these kinds of things that, once you actually try and do it, make life really difficult.
00:18:17
Nat: Yeah, I think like part of the challenge is that our intuition for what should be easy and straightforward in computer vision doesn’t match what actually is easy, you know. Like find the eyes or whatever seems pretty straightforward but it turns out to be a little bit challenging and, you know, one of the things I like about OpenCV though is that in a bunch of different languages, you know, in C Sharp, Python, C, C++, there’s a bunch of sample code and these samples are perfect. I mean they’re like 50 lines to 200 lines of code that do just one thing usually. They’re simple. You can feed them your own images and you can just grab the code and kind of start changing things, you know, find the right starting point example and use that to build whatever it is you’re trying to do.
00:18:59
Nat: And like we said, you know, if you go online you’ll find people who do, I don’t know, someone wrote like a tennis ball tracker so he’s bouncing a tennis ball over the room and OpenCV is recognizing it and tracking it really quickly, and people build security camera motion detectors and there’s a whole lot of examples online of great hacks you can do with this thing.
Alex: Yeah, I like the one that I saw where you can hold a number of fingers up to the camera, and then the camera will count the fingers that you’re showing it, add them all up, and show you the result.
Nat: Oh yeah, that’s a good one. One of the major future areas that they’re investing in for OpenCV is stereo vision, and that’s because Willow Garage is sponsoring a lot of the development of OpenCV and, you know, their focus is to build a personal robot that uses some different stereo vision techniques and, you know, laser scanners and other techniques like that to try and model a 3D world and interact with it. And so doing real-time stereo object detection and recognition is a big part of the future direction of OpenCV. But you can also take it in any direction you want. It’s pretty powerful.
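And in the spirit of those hacks, a ball tracker like the one in the links above can be surprisingly short: mask a color range, find the biggest blob, and circle it. The HSV range below is only a guess at tennis-ball yellow and would need tuning for real lighting.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)

# Rough HSV range for tennis-ball yellow-green (tune for your lighting).
LOWER = np.array([25, 80, 80], dtype=np.uint8)
UPPER = np.array([45, 255, 255], dtype=np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER, UPPER)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Track the largest matching blob and circle it.
        c = max(contours, key=cv2.contourArea)
        (x, y), r = cv2.minEnclosingCircle(c)
        if r > 10:  # ignore tiny specks
            cv2.circle(frame, (int(x), int(y)), int(r), (0, 255, 255), 2)
    cv2.imshow("ball", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```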
00:20:03
Alex: And so that’s our show. Thanks for listening.
Nat: Yeah, we definitely encourage you to check out our website. We often hear that one of the best parts about our show is the set of links that we provide along with each episode, which I don’t know how to feel about that to be honest, Alex. I guess it is a pretty good set of links usually.
Alex: Yeah, we give a good link.
Nat: So check that out and we’d love to get your feedback and if you like our show feel free to subscribe so we can get our feedburner number up and feel better about ourselves and our place in the world.
00:20:32