Let me call up the three next speakers. Let me go in sequence; we will show you what they are doing and explain why that’s relevant. Nikolaj Nyholm first. Nikolaj, where are you? He is the CEO of Polar Rose in Sweden. They have developed a system that includes pretty sophisticated facial recognition. I think I am not going first. And he will go first. Alexander Straub from Germany, he is also hiding somewhere, runs a company called Pixsta. He is one of the cofounders and he also has a visual search engine, and I think he will show you some applications in commerce, for example. And then Professor Arnold Smeulders from the Netherlands, who has a variety of search engines that are quite amazing, and he will show us, I think, four of them. So Nikolaj, are you set up?
I am set up.
So why don’t you go?
Right.
Nikolaj Nyholm.
So this is a short introduction and demo of what we are doing at Polar Rose. We sat around a couple of years ago and started looking at the amazing amount of, especially, photo and video which was starting to pop up on the net. Early days of Flickr, early days of SmugMug, early days of 23 in Denmark, these photo sharing sites which were growing, YouTube obviously coming in about a year and a half ago: massive amounts of data, massive amounts of multimedia starting to show up on the net, but few ways to really sort, browse and work with this data if there was no text related to it. Flickr doubles at a rate of about every eight months, I think YouTube at about five months, and the general web in text doubles only every 12 months. So there is definitely a challenge in trying to figure out how we can deal with this massive amount of data. The company Polar Rose was founded in late 2004, based on research at Lund and Malmö, and specifically we were taken aback by the challenge of consumer, average end user photos.
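To put the doubling times quoted in the talk side by side, they can be converted into yearly growth factors. The following is a minimal Python sketch; the function name `annual_growth_factor` is an illustrative choice, and the doubling times are the speaker's approximate figures, not measured data.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Factor by which a collection grows in 12 months, given its doubling time."""
    return 2 ** (12 / doubling_months)


# Doubling times in months, as quoted (approximately) by the speaker.
doubling_times = {"Flickr": 8, "YouTube": 5, "general web text": 12}

for name, months in doubling_times.items():
    factor = annual_growth_factor(months)
    print(f"{name}: grows by a factor of {factor:.2f} per year")
```

Under these figures, photo and video collections grow roughly three to five times per year while text only doubles, which is the gap the speaker is pointing at.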
When you walk into the US, when you go through immigration, it is left index finger, right index finger, look straight at the camera. What you cannot see is that there are cameras from the left and cameras from the right taking stereoscopic pictures of you. If you don’t look straight at the camera, the software will actually detect it and tell the officer to make sure you look straight at it. The reason for doing this is that for face recognition to work traditionally you need a very controlled environment. You need controlled lighting, lighting from the same place every time, and you need the subject looking straight at the camera, because if my pose changes, if the lighting changes, the pixels change and thus the pattern recognition of the computer vision software changes. So our challenge was taking a 2D picture and extracting 3D shape from that. By being able to do that from a single 2D image shot by your camera phone, shot by your cheap Canon camera, by extracting the 3D data out of just that single 2D image, we were able to correct for pose and also compute the lighting sources, and thus circumvent at least some of the challenges that face recognition has with end user photos. So, starting to analyze how this all works: we measure about 140 different vectors up here, and, it might be a bit difficult to see on this overhead, but the better we do our matching, the closer the graphs match, the better the job our software has done. But one of the interesting things, when we start looking at some of the characteristics of these 140 different vectors, is the fact that with 98.2 percent certainty we can actually predict whether this is a male or a female.
So a lot of interesting data is in people’s faces, it’s in the images, but it’s data which cannot today be represented by an average search engine. So what we have done is we built a plug-in, initially for Firefox as our reference model. We are still in beta; we only have 1200 beta users on it, of whom probably half are using it very actively. But these users have installed a plug-in for Firefox. When they browse a page, the browser processes the images in the page, sees if there are any people there and adds a rose accordingly. And so in this picture on Flickr it finds two roses. The user clicks on one. We don’t have a name for this person, but we have some other possible matches that we think are the same girl as this. This data in turn gets fed back to polarrose.com for other users to use, for when Bruno over here goes ahead and sees a third web page and Naomi here is also on that, or Jeff Parker is also on that. The individual small bits of data which are being picked up by our users, and the data which is being contributed by our users, get merged across the entire user base, despite the processing being on the individual client. I will just quickly try to switch to the browser. Okay, it doesn’t look entirely good on this screen, but this is the example. I am on Times Online, one of the big UK papers. Roses start appearing here on, in this case, George Clooney. We just processed this image right now, and during the beta phase we are not doing any kind of real-time matching. It’s sort of a chicken-and-egg problem: we need a lot of data before we can start giving matches back. I type in that this is George Clooney, I go to my own page here, and these are the results where I have contributed data back to the Polar Rose system.
I don’t think I can browse horizontally here, so there is something I cannot show you which would have been interesting. So, switching back to the presentation here, what we are starting to do is obviously apply this technology, the product, in a bunch of different places. One is browsing with your browser, as I just showed. Another is going to a photo sharing site, in this case a Swedish one called Picsbox: sort all your photos, or view the photos of this album, based on who is in them or who is not in them. Show me the pictures of my wife but not my mother-in-law. Go to cnn.com: another example is being able to match articles across different sites based on the people who are actually in them. The more far-out case is probably match.com. We have had a lot of contacts from the party sites, the dating sites and so forth, who want to be able to give people a different way to browse than the 28 drop-downs which typically cover sex and interest and age and stuff like that. So “show me somebody like her”, “show me this girl on match.com, or at least somebody who looks like her”, is an often found request. And then finally, the thing which we are only now venturing into is video: indexing the vast amount of video simply based on the people who are in it. The reason that we are so focused on people is that about 40 percent of all the images which are uploaded today contain people, and about 60 percent of the searches on Google’s image search or any of the other image search engines are for people, for names. So we think that the primary entry point really is people. Show me Helena Christensen, and then show me her handbag. This is the social object that people are most likely to be keen on in a photo. Thank you.
Thanks. Thank you.
Just one question. When we mailed in the past weeks, you told me, “Oh, we are, like, developing bionic software.”
Yeah.
I took it for a joke, but maybe it’s not.
No, no. Well, I mean, if I snap a photo of you and match it against three billion other photos, I am not going to find Bruno. We are not that good yet. Maybe ten years from now, maybe 20 years. Stereoscopic consumer cameras are actually starting to appear on the market, so things may change. I am not sure we want them to change that way. But the reason we call it bionic is that the computer helps the user, but the user also helps the computer back. There are a lot of different terms for this, but we like the term bionic, or human computing, in that people contribute back to something like this. It’s not user generated content; it’s simply helping the computer, training the engines and making the matching better. If we had 20 pictures of you, we could probably find you.
Thank you.
Thank you.
Thanks, Nikolaj Nyholm. Ready to swap computers? Yeah, so let’s get the next machine up and see, I think, a different approach. Because this was also about tagging, people tagging photos, and our next speaker, Straub, told me: we actually don’t really look at tagging, we don’t really look at metadata beyond what’s really in the image, beyond the attributes of the object itself. So let’s see how it compares. Ready to go?
Ready to go.
Alexander Straub, please start.
Hello, thanks for giving me the audience. What we started off with is basically a technology which we incorporated into a company a year ago, called Pixelstar, short form Pixsta. What we found is that there was an interesting way to look at images without looking at all the textual information around them, but looking at the pixels themselves, then trying to calculate on those images the appropriate basic characteristics, we could also call them features, and then relating all of those images to each other depending on the features which we have calculated. So you could imagine: you look at an image, we run the processing over that image along a hundred or a thousand or a hundred thousand different feature calculations to spit out mathematical values. Those mathematical values are a vector, a matrix, and then all of those vectors get related to each other, and so we know the nearest, closest match to an image depending on its features. And then we developed web applications to put them out into the domain of the web and the browser. We started off with quite broad images, like the Corel database of images, and felt there was something very commercial about it, because we all in a way do a lot of shopping on the web these days, and there is something characteristically different in how we go about it. We all came from a very male-dominated perspective, where people knew what they wanted, so they could type in “Sony HD camera” backslash and so on and then run a comparison shopping engine. You can do this very well today with about 20 million products, and you can update those on an hourly basis. But how do you do the same thing which happens to all of us when we walk into a store? We get inspired, and the inspiration drives us from one rack to the next, from a pair of shoes to a handbag and from a handbag to a pair of jeans, and eventually a lot of people on the high street leave with a shopping basket, and the shopping basket is filled with goods. That’s not necessarily the case on the web. Our conversion rates on the web are relatively small. Online shopping is growing, but reaching into the database of all the products is very, very hard, as is making them appear visually and also browsable. I think the world has talked for a long time about search, but there is a lot to be said about browsing, because on the web we all know that we can go within a second to another link. We are pretty inclined to follow links and then jump back, because if you make a little mistake it doesn’t really matter that much. So let me show what we have built, and then I can visualize a bit more what I was talking about. We are starting off with a random set of images out of hundreds of thousands of consumer products, and there are all kinds of objects. We don’t know anything about the metadata, we don’t know anything about the brand or about what is described in that image; we just know what has been calculated mathematically around it. So let’s choose an image and see what happens. Let’s say we like this piece of jewelry and we click on it, and we now re-center that piece of jewelry and arrange around it other pieces of jewelry which come back. If we take, for example, a bag, you can do the same thing for the bag, and what you now see is a representation not of objects which are almost identical to it but of objects which in a way have a gradient, so it allows exactly what I talked about, the aspect of browsing.
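The idea described here, ranking images purely by the distance between their computed feature values, can be sketched as a nearest-neighbour lookup. This is a minimal Python illustration under stated assumptions, not Pixsta's actual method: the three-number feature vectors, the item names, the choice of Euclidean distance and the helper `nearest_neighbours` are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(query_id, features, k=3):
    """Return the ids of the k items whose features lie closest to the query item."""
    query = features[query_id]
    ranked = sorted(
        (euclidean(query, vec), item_id)
        for item_id, vec in features.items()
        if item_id != query_id
    )
    return [item_id for _, item_id in ranked[:k]]


# Hypothetical features (say, redness, shininess, blueness); no text metadata is used.
features = {
    "red_bag": [0.9, 0.1, 0.2],
    "dark_red_bag": [0.8, 0.1, 0.25],
    "blue_bag": [0.1, 0.2, 0.9],
    "gold_ring": [0.7, 0.6, 0.1],
}

# Clicking an item re-centers the view on it and arranges its neighbours around it.
print(nearest_neighbours("red_bag", features, k=2))
```

Because the neighbours are ordered by distance rather than filtered for near-identity, each click yields a gradient of more and less similar items, which is what makes the result browsable.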
It’s a new way of searching, and we have these applications running on the web in several locations, and what we are finding is that people highly enjoy that. It’s not like you go to a search box, you type something in, something gets spat out and you look at the first three to four results on the first page, which are, interestingly enough, becoming quite boring. I must say, the Google search results are starting to get quite uninteresting over time, after using it for the last ten years. And then you can make a decision. So if you don’t really like what you have clicked on, say it was a red bag and you really want to venture into blue bags, we give you that choice to go into blue bags. If you start to think, well, blue bags are not really the thing I want any more, but I have a skirt which matches this skirt, I can go to this one and see what is similar in the database around it. The interesting thing is that Diane von Furstenberg has a pattern which we picked up as being of a somehow similar nature. And if you want to go back to jewelry, we can jump into jewelry but always have an escape mechanism to jump back into bags, from bags to belts and, let’s say, into shoes. What this leads to is that in that journey we have processed with our own eyes, relatively quickly, about four or five hundred products, and made a conscious decision to click on some of them and not on others. Along that journey you leave a trail and a path, which then leads you to do the following: you click out and you land at an e-tailer which is procuring that good, and then that e-tailer again runs a similar browse engine over their own collection.
So again, how can you access the inventory of that e-tailer if you don’t like that pair of shoes where we just landed, and you don’t really want to go through all of these brand names and just want to have a look at those products? You can do that then on the e-tailer’s site. So what we did is we put a lot of math not only into a machine which calculates relationships, but we also put it into a layer so people can access the massive amount of processing power needed to do these calculations. To give you an idea, for a million images you would need probably a trillion pairwise distance calculations, depending on how many features you use, so you are very quickly going to the limits of what is possible with today’s processing power. And I think the key to all of this is also the speed of the interaction with those interfaces. So how did you really end up at that e-tailer, and what happened along that journey? I invite all of you to play with that if you want. It’s running in several locations around the web. We don’t have a site ourselves yet, but the sites which are running these are the vogues, Elle, marieclaire, cosmopolitan hand bags of this world, as well as big e-tailers, for example in the jewellery space, actually Clark, which then has an explorer engine over their inventory on their own site. So thank you very much.
Thanks. Alex, don’t go away, don’t go away while he prepares his computers. You are telling us that behind this dynamic matching of styles and colors there is no metadata, no tagging, no definitions, no descriptions, no colors, no categories? Everything is done based on the attributes of the image itself, period?
Yeah, period. The image itself.
Actually, it’s quite a handsome way to do it, because even though people talk today about crawling the web and picking up metadata on the web, it’s still very, very hard, because the semantic web hasn’t really become what it should be today. So it’s all in the image, so to say.
Pretty good. Thank you. Third demo. For the Dutch among you, I am sure that you know this guy. For the others, just note he is the scientific director of one of the most advanced media labs in Holland and also of a big project called MultimediaN, which is a private-public project of sizable dimension. I think he will show us a little bit about it. Arnold Smeulders: lots of different search engines for recognition of many things, including emotions. Let’s see how he does it. Arnold?
Thank you. So MultimediaN is a public-private partnership in the Netherlands, actually a very good idea, to advance directly from science into high tech. I represent science today, and my problem with this presentation is that I am not in the sexy part of it, I am in the content, so you have to bear with me until the end. Let’s consider four angles on searching. In the Netherlands, one out of 20 calls to the internet is searching, and it is rising the most rapidly of all uses of the internet, so it is truly important. The first one is the semantic web browser. We made a presentation of it. Different from standard websites, this one is completely dynamic, and not only dynamic as in generated from a knowledge base; it also aligns several different types of knowledge bases. I will say something about that at the end. Let me show the presentation.
The cultural search engine by MultimediaN connects these digital databases and provides the context in our cultural heritage. Unlike the millions of static websites, this one supports an easy and never-ending tour through the world of cultural heritage.
This site demonstrates the true semantic web, a search experience based on encyclopedic knowledge, tailored to your interest, timely and just in time to enjoy. By using semantic web technology, the partners at this cultural work table find solutions for interconnecting information sources in a smart way and build a new infrastructure.
It’s only a presentation, but it would be interesting to make the interface exactly like this, actually. The essence here is not that it is dynamic. The essence is that it reconciles several knowledge bases in the world describing arts and art pictures, which each have their own terminology, by introducing semantic web technology, that is, ontologies. You align all those different expert systems into one massive whole, and then wherever you go, whenever you browse, the actual presentation to you is generated dynamically. So basically it is never-ending. Let me go to the next slide from here. Okay, my second search engine is completely different. It crawls from livejournal.com, which is a huge blog site. In that blog site you can indicate your mood. I have generated a few displays from the web and copied them into the presentation. If you type in the word “excited”, you get this pattern, and as you see, it follows a rhythm through the day, and other than expected, the top of excitement is not at eleven in the evening but somewhere, let’s say, in the middle of the afternoon. Now, this is a regular pattern, which repeats itself every day, of when bloggers indicate that they are excited. But sometimes there are statistical deviations, as you can see from the peak, and this happens when, for instance, at one point in time, July 16, the Harry Potter book was introduced, the Half-Blood Prince. Precisely at that moment, blogging went up very much. And whenever there is a peak, we crawl the web for what they are talking about.
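The separation described here, a regular daily rhythm plus occasional statistical deviations, can be sketched as a simple outlier test per hour of the day. The following Python sketch is an assumption-laden illustration, not the MultimediaN implementation: the synthetic counts, the threshold of three standard deviations and the function name `find_peaks` are all hypothetical.

```python
from statistics import mean, stdev


def find_peaks(days, threshold=3.0):
    """Return (day, hour) pairs whose count lies far above that hour's usual level.

    `days` is a list of days, each a list of 24 hourly counts of posts tagged
    with one mood. Each hour is compared with the same hour on the other days,
    so the ordinary daily rhythm is not itself reported as a peak.
    """
    peaks = []
    for d, day in enumerate(days):
        for h, count in enumerate(day):
            baseline = [other[h] for i, other in enumerate(days) if i != d]
            mu, sigma = mean(baseline), stdev(baseline)
            if sigma > 0 and (count - mu) / sigma > threshold:
                peaks.append((d, h))
    return peaks


# Synthetic data: a regular afternoon bump, small day-to-day variation,
# and one unusual burst at midnight on day 3 (for example, a book release).
base = [10 + (5 if 13 <= h <= 16 else 0) for h in range(24)]
days = [[c + (d % 3) for c in base] for d in range(5)]
days[3][0] += 40

peaks = find_peaks(days)
print(peaks)
```

Once a peak is located in time, the posts from that window can be examined for the terms that set them apart, which is the second step the speaker describes.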
This is for the indicated mood of “excited”, but if you do this for, say, Katrina, you will see that the top emotion is shown in a different color. The color itself doesn’t mean anything, but each top emotion has a different color. And then you will see that there are regularly all sorts of emotions, until at the memorial one year after the hurricane all people get into the contemplative mood. There are a hundred terms with which you can indicate what your mood is, and they all select this one, so to say. And then of course the attention goes up. Now, I have a question for you. If you type in “buy”, what would you expect for the dominant mood? So we searched through all the blogs where there is “buy” and the related terms, and then we looked for what mood was indicated. Now what do you think is the prime mood if you type in “buy”? Is it random, as in the first part here? Is it all the same one? Do you expect them to be greedy? Here is your answer. This is November. Please note that “buy” increases more and more, so slowly the tension builds up towards the December month, but they are also all very consistent. When they think of “buy”, when they type “buy”, their mood is tired. So this is very interesting. Okay, so this is the second search engine. Let’s move to the third one, and this is not so much about how you search or what you search or how you present it, but about the interface, which is really important. So, I am really sorry that that is not here, but anyway, this is about computation, and StreetTivo is not about peer-to-peer exchange of video but about peer-to-peer indexing. Let’s present this with this video.
Media Center is Windows optimized for the TV screen. It gives access to all your digital photos, music, video and hard disk recorded television using a TV remote control. So StreetTivo is about bringing multimedia analysis into your living room, but it does so using peer-to-peer.
The idea is that all those media centers in all those living rooms connected to the internet can cooperate. Things that are hard to do alone are easy when done together. Together, all those media centers in all those living rooms provide enormous computing power. Now, I started by saying that many of the video analysis techniques are hard for computers. This means they are computation intensive. It takes good computers more than 10 hours to analyze a single hour of television program. So the idea of StreetTivo is to let many people interested in the same television program cooperate. Together they can perform these advanced multimedia analysis tasks quickly - researches of the technical university to enter as they felt – Etc
And the last one is really about content browsing. Now, there are quite a few browsers, but they are mostly based on tagging that exists, either tagging that comes with the video content or tagging that comes from social tagging, as Google does here, which is fine, which is needed, which is helpful. But in the end there will be lots of archives of video which are not tagged at all, for instance your home archive, or the archive of a web channel of the famous concert hall of the Netherlands, who archive all their recordings and publish them on the web. So we did this, and there is this scientific competition called TRECVID, where we make video browsers of news. We do very well amidst other people, but let me show you my demo there. Can we switch the screen to that computer? Thank you.
Okay, so I didn’t bring the news video demo, but this one is a very large archive of pop concerts, and we have taught the machine to discriminate drummers’ actions, scenes where there are drummers. If you have selected that you want to see a drummer in action, you could go to your browser and then see. So I think there is something wrong, sorry, let me see. I should have done this one. Okay, so now you get an overview, starting from this selected drum scene, which the machine found for you on the basis of you selecting the drummer and us having learned to recognize drum scenes. On this line you get the actual remaining concert, and if you go to another drum scene, like this one or like that one, you can go to that concert, and this basically goes on and on forever. It is not completely flawless, because there is no drummer here, unless it is the one in part of the instrument. But basically our added value is not just the way of presenting and giving access to a large archive of pop concerts. What we have done is analyze the content of the video and discriminate drummers from the remaining part of the concerts by machine learning techniques, and that is basically what I would like to say. Thank you.
Thank you. Thanks.


