I met with the Jewish Film Festival today. They've got one of the best problems that an organization can have: a massive archive of assets that are inaccessible to the public. Why is this a great problem?
- Because they're rich with interestingness.
- Because it's easy to fix.
- And because once it's fixed, they'll have tremendous value to offer the public.
What kind of assets? Films, articles, photos, interviews, audio recordings, ratings, datasets. You name it, they've got it stored away in a number of fairly archaic systems. Plus, they've got a yearly festival when they bring people together face to face and a dynamic community that cares a lot about the subject matter.
Of course, as with any nonprofit, cash is in short supply, so they're looking for an inexpensive way to make the archive available...
Naturally, I told them about The Extraordinaries, and I'm hoping to include them as an "Adventure" (i.e., a volunteer opportunity) for our launch in June. Assuming they can get their assets into a database (maybe even something like Google Base, although their policies are a little restrictive), we could then create a simple tagging app on The Extras that would allow volunteers to go through the archive and make some sense of it. Of course, they'd need a great visualization of the data to make the thing sing on the front end, but the otherwise monotonous work of cataloging could be taken care of by the crowd (and made un-monotonous for each crowd member by fitting it into few-minute chunks).
Well, that got me thinking. The tagging system described above is good for making assets accessible to (~)exact search and 'browse for similar.' What it doesn't give us is the relationships between bits of information. So, we might have a description of a film such as Schindler's List that reads:
"The relocation of Polish Jews from surrounding areas to Krakow in late 1939, shortly after the beginning of World War II." (source: IMDB)
The information contained in this sentence is only valuable to a human reader - or when searched for the exact text contained in the phrase. So, if you were searching for "Krakow and World War II," great, you've got a hit. But, if you were searching for "Skala" (23km north of Krakow) and "World War II," you'd miss out.
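Here's a rough sketch of that limitation, using a plain substring search over the description quoted above. The `DESCRIPTIONS` catalog and the `keyword_search` helper are made up for illustration; they aren't part of any real system.

```python
# Hypothetical one-film catalog; the text is the IMDB sentence quoted above.
DESCRIPTIONS = {
    "Schindler's List": (
        "The relocation of Polish Jews from surrounding areas to Krakow "
        "in late 1939, shortly after the beginning of World War II."
    ),
}


def keyword_search(terms, descriptions):
    """Return titles whose description contains every term (case-insensitive)."""
    hits = []
    for title, text in descriptions.items():
        lowered = text.lower()
        if all(term.lower() in lowered for term in terms):
            hits.append(title)
    return hits


print(keyword_search(["Krakow", "World War II"], DESCRIPTIONS))  # one hit
print(keyword_search(["Skala", "World War II"], DESCRIPTIONS))   # no hits
```

The second search comes back empty because nothing in the text tells the computer that Skala is near Krakow.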
Enter the Semantic Web, the intention of which is to make information (such as the phrase above) make sense to computers by defining a set of relationships and then connecting objects according to those relationships.
The phrase could be semantically encoded as such:
Polish is the same as Poland
Poland is a country
Krakow is a city in Poland
World War II is a war
Schindler's List is a movie
Schindler's List takes place in Poland
With this data (and more like it), the festival can start to ask some *really* interesting questions of their database. Such as:
Show me films made by men about the Nazi War that take place in Poland
Show me articles related to Jews in Krakow written between 1985 and 1987
Show me photos shot between 1930 and 1932 within a 100 mile radius of Krakow
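To make this a bit more concrete, here's a minimal sketch of the encoded statements as (subject, relationship, object) triples, plus a tiny pattern matcher that can answer one simple question. The relationship names (`same_as`, `is_a`, `located_in`, `takes_place_in`) and both helper functions are my own assumptions, not a standard vocabulary, and "Krakow is a city in Poland" is split into two triples.

```python
# Each fact is a (subject, relationship, object) triple.
# Relationship names are invented for this sketch.
TRIPLES = {
    ("Polish", "same_as", "Poland"),
    ("Poland", "is_a", "country"),
    ("Krakow", "is_a", "city"),
    ("Krakow", "located_in", "Poland"),
    ("World War II", "is_a", "war"),
    ("Schindler's List", "is_a", "movie"),
    ("Schindler's List", "takes_place_in", "Poland"),
}


def match(subject=None, relation=None, obj=None):
    """Return all triples matching the given fields; None acts as a wildcard."""
    return {
        (s, r, o)
        for (s, r, o) in TRIPLES
        if subject in (None, s) and relation in (None, r) and obj in (None, o)
    }


def movies_set_in(place):
    """Answer 'show me movies that take place in <place>'."""
    movies = {s for (s, _, _) in match(relation="is_a", obj="movie")}
    located = {s for (s, _, _) in match(relation="takes_place_in", obj=place)}
    return sorted(movies & located)


print(movies_set_in("Poland"))
```

The richer questions above (date ranges, radius searches, who made the film) would need more triples and more query logic than this, but the basic move is the same: intersect the sets of things that satisfy each relationship.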
So, what would you do with this newfound ability? For one, the festival would be able to answer all kinds of historical questions that were previously disconnected. Doing a research project about Polish Jews from Krakow? You could run a series of queries to give you a whole mess of data to inform your project. Check out DBPedia and the amazing queries of this sort that you can run using Wikipedia as your data source, such as "German musicians who were born in Berlin."
Not a researcher or a history hound? The festival could build a "Semantic Explorer" the likes of which no one has ever seen. I mean that literally, because I have yet to see a compelling user interface that allows people to explore semantic relationships. Here's the UI from one of the DBPedia offshoots:

Not very exciting... or sensible to a human. Computers can dig it, but it's a no-go for the average rest of us. I've looked at a lot of these and haven't really come across anything that's blown me away. I've certainly seen visualizations that are blow-away beautiful, or useful for one-off purposes, but not for generic navigation of any topic.
This one, by MusicMap, is one of the most useful I've come across. As in, you can actually navigate it and get to some useful information. And its structure would seem to work for a variety of topics, not just music.
I have no idea if they're using a Semantic Web architecture... they could just be using track lists and album names... but, the point is that the UI could work for semantic exploration.
This Flash app, Asterisq's Constellation Framework, also looks interesting, but not very pretty.
This paper spells out some of the issues surrounding semantic visualization.
This all sounds pretty great. But the big problem here is: who the heck is going to do all of this semantic encoding? It's the job of a lifetime... or maybe not.
Maybe semantic encoding can be done by the crowd. And maybe it can be fun! Maybe it can be a game-like challenge on The Extraordinaries. Each phrase could go through a few different passes (done by the same user or different ones). Take the same phrase as above:
"The relocation of Polish Jews from surrounding areas to Krakow in late 1939, shortly after the beginning of World War II."
Task 1: identify the nouns in this phrase by tapping them. Tap twice to de-select.
"The relocation of Polish Jews from surrounding areas to Krakow in late 1939, shortly after the beginning of World War II."
(I know "Polish" is an adjective here, but it could also be a noun. Maybe scrambling the order of the words would obviate this problem.)
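Task 1 could be sketched roughly like this: split the phrase into words, then toggle a word's selection each time it's tapped. The `NounPicker` class and `tokenize` helper are hypothetical names for illustration only.

```python
PHRASE = ("The relocation of Polish Jews from surrounding areas to Krakow "
          "in late 1939, shortly after the beginning of World War II.")


def tokenize(phrase):
    """Split on whitespace and strip surrounding punctuation from each word."""
    return [word.strip(".,;:!?\"'") for word in phrase.split()]


class NounPicker:
    """Tracks which word positions a volunteer has tapped."""

    def __init__(self, phrase):
        self.words = tokenize(phrase)
        self.selected = set()

    def tap(self, index):
        # A first tap selects the word; a second tap on it de-selects.
        if index in self.selected:
            self.selected.remove(index)
        else:
            self.selected.add(index)

    def nouns(self):
        """Return the currently selected words in phrase order."""
        return [self.words[i] for i in sorted(self.selected)]


picker = NounPicker(PHRASE)
for position in (1, 3, 4, 9):   # relocation, Polish, Jews, Krakow
    picker.tap(position)
picker.tap(3)                   # second tap on "Polish" de-selects it
print(picker.nouns())
```

Note that this treats "World War II" as three separate words; grouping multi-word names into one tappable unit is a design problem this sketch doesn't solve.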
Task 2: Complete this sentence: "Krakow is a ______." [When you start to fill in the blank, you get suggestions... so you start to type "Ci..." and it fills in "City" for you.] Tap the verb in this sentence to get a different verb. So if you tap "is a" you get other options like "is in," "belongs to," "takes place in," ... There's some UI to figure out here (and a smart ontology to write), but I think it's doable.
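The two interactions in Task 2 (fill-in suggestions and tapping the verb for alternatives) might look something like this. The `CATEGORIES` and `RELATIONS` lists are placeholder vocabularies I've assumed here; the real ontology would be much bigger and smarter.

```python
# Hypothetical starter vocabularies; a real ontology would be far larger.
CATEGORIES = ["City", "Country", "Movie", "Person", "War"]
RELATIONS = ["is a", "is in", "belongs to", "takes place in"]


def suggest(prefix, vocabulary=CATEGORIES):
    """Return vocabulary entries that start with the typed prefix."""
    prefix = prefix.lower()
    return [term for term in vocabulary if term.lower().startswith(prefix)]


def next_relation(current, relations=RELATIONS):
    """Cycle to the next relationship each time the verb is tapped."""
    position = relations.index(current)
    return relations[(position + 1) % len(relations)]


print(suggest("Ci"))           # typing "Ci..." narrows to City
print(next_relation("is a"))   # tapping "is a" moves to "is in"
```

Cycling through verbs one tap at a time is just one option; a pop-up list of relationships would work with the same data.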
Maybe, like GWAP, you can play against someone else who needs to agree with what you've created. Or the results of tasks 1 & 2 can go through a rating process.
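One simple way to sketch the agreement idea: only accept a relationship once at least two different players have entered it independently. The `agreed_triples` function and the player data below are assumptions for illustration, not how GWAP actually works.

```python
from collections import Counter


def agreed_triples(submissions, min_votes=2):
    """Keep only the triples that at least `min_votes` players submitted.

    `submissions` maps a player name to the collection of triples
    that player entered.
    """
    votes = Counter()
    for triples in submissions.values():
        votes.update(set(triples))  # each player counts once per triple
    return {triple for triple, count in votes.items() if count >= min_votes}


# Made-up submissions from two volunteers.
submissions = {
    "player_1": [("Krakow", "is a", "City"), ("Krakow", "is in", "Poland")],
    "player_2": [("Krakow", "is a", "City"), ("Krakow", "is a", "Country")],
}
print(agreed_triples(submissions))
```

Here only "Krakow is a City" survives, since it's the only statement both players entered.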
This system could get even more interesting by incorporating the captioning and subtitle tracks from films. I'm really jazzed on dotSub right now - they've got a crowdsourced system for captioning and then translating films into a gazillion languages. Every phrase from every film has timecode associated with it. So you can click on a phrase from a film and skip right to it in that film. Way cool.
Imagine combining this system with the one I describe above... so you can jump from film to film by way of relationships. If you're watching Schindler's List and someone mentions Moravia, you can pause the film and explore photos, films, articles, etc, about Moravia. You could get a list of films in which they've discussed car factories during World War II in Moravia. And then you could jump to the exact spot in the film where this topic is discussed. And you could do it in any language! You could be watching an English movie subtitled in Basque and jump to a related scene in a French movie, also subtitled in Basque.
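As a rough sketch of the "jump to the exact spot" part: if every caption carries a film title and a start time, finding where a topic comes up is a lookup over the captions. The `Caption` structure, the film names, the caption text, and the timecodes below are all invented for illustration; I don't know how dotSub actually stores its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    film: str
    start_seconds: float
    text: str


# Made-up caption lines and timecodes, purely for illustration.
CAPTIONS = [
    Caption("Film A", 95.0, "The factory in Moravia needs workers."),
    Caption("Film A", 310.5, "We leave for Krakow tonight."),
    Caption("Film B", 42.0, "My family came from Moravia."),
]


def find_mentions(term, captions=CAPTIONS):
    """Return (film, start_seconds) for every caption that mentions the term."""
    needle = term.lower()
    return [(c.film, c.start_seconds) for c in captions if needle in c.text.lower()]


print(find_mentions("Moravia"))
```

Each result is a place a player could seek to. Matching on the semantic relationships instead of the raw caption text is what would let the jump work across languages.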
This vision is pretty compelling. It doesn't seem incredibly complicated. Complicated, yes. But not incredibly. What am I missing? Could 90% of this be done just by tags? Have I overcomplicated something that can be simpler? I'd love some feedback.
I think I'm missing some piece of the Semantic Web logical puzzle. The system outlined above seems to work, but some of the encoding only has to be done once. I.e., the nouns, once identified, could be subsequently encoded by a parser, rather than by hand. So what am I missing?
Is it that, in addition to encoding text, the objects themselves would be encoded? So, a human would build the relationships from a video file (Schindler's List) to other objects. E.g.:
Steven Spielberg (a person) isDirectorOf Schindler's List (a movie)?
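On the "encode once" idea: once volunteers have classified a noun, a very simple parser could tag it wherever it shows up again. This is only a sketch under that assumption; `KNOWN_ENTITIES` and `auto_tag` are hypothetical, and plain substring matching would mis-tag names that appear inside other words.

```python
# Entities that volunteers have already classified by hand (hypothetical data).
KNOWN_ENTITIES = {
    "Krakow": "city",
    "Poland": "country",
    "World War II": "war",
}


def auto_tag(phrase, known=KNOWN_ENTITIES):
    """Return (entity, type) pairs for every known entity found in the phrase."""
    lowered = phrase.lower()
    return [(name, kind) for name, kind in known.items() if name.lower() in lowered]


print(auto_tag("Articles about Krakow during World War II"))
```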
Posted by: Ben Rigby | February 16, 2009 at 02:16 PM