O'Reilly Emerging Technology Conference.

April 22-25, 2003, Santa Clara - Explore. Invent. Connect.


Peer-to-Peer Semantic Search Engines: Building a Memex
Maciej Ceglowski, National Institute for Technology and Liberal Education
John Cuadrado

Track: Rich Internet Applications
Date: Wednesday, April 23
Time: 2:00pm - 2:45pm
Location: Lafayette/San Tomas/Lawrence

Imagine searching through your own research notes, Google, and a set of your favorite weblogs all at the same time, the results coming back to you ranked in a meaningful order. Imagine getting relevant results to a search even when there is no keyword match, or being able to refine your search by selecting a set of good results and asking for more.

The presenters, who first met and discussed the concept at last year's ETCon, have been working on just such a project: an open-source latent semantic search engine that lives on the desktop and lets users navigate and search their own writing - notes, articles, or weblog entries. Because it examines patterns of word use across many documents, the tool offers significantly improved search results, and can accept long natural-language search queries, including entire documents. By allowing documents to organize themselves into topic clusters, the tool also offers a macro view of the user's data, in useful digest form.
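As a rough illustration of how latent semantic search can return a relevant document with no keyword match, here is a minimal sketch in plain Python. It is not the presenters' implementation: the toy corpus, the function names, and the choice of power iteration to approximate the leading singular vectors are all assumptions made for this example.

```python
import math

# Hypothetical toy corpus: the second document never mentions "car".
DOCS = [
    "car engine wheel",
    "automobile engine wheel",
    "car automobile engine",
    "flower garden petal",
    "rose garden petal",
]

def tokenize(text):
    return text.lower().split()

def term_document_matrix(docs):
    """Return (term -> row index, matrix) with raw term counts per document."""
    vocab = sorted({word for doc in docs for word in tokenize(doc)})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for word in tokenize(doc):
            matrix[index[word]][j] += 1.0
    return index, matrix

def top_left_singular_vectors(matrix, k, iterations=200):
    """Approximate the k leading left singular vectors of the matrix by
    power iteration with deflation on A * A^T (assumes k <= rank)."""
    n = len(matrix)
    cov = [[sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
           for i in range(n)]
    vectors = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # fixed, non-uniform start
        for _ in range(iterations):
            w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j]
                         for i in range(n) for j in range(n))
        # Deflate: remove the component just found so the next pass
        # converges to the next strongest direction.
        for i in range(n):
            for j in range(n):
                cov[i][j] -= eigenvalue * v[i] * v[j]
        vectors.append(v)
    return vectors

def project(basis, term_vector):
    """Coordinates of a term-space vector in the latent space."""
    return [sum(u * x for u, x in zip(vec, term_vector)) for vec in basis]

def cosine(a, b):
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def search(query, docs, k=2):
    """Rank docs by cosine similarity to the query in a k-dimensional latent space."""
    index, matrix = term_document_matrix(docs)
    basis = top_left_singular_vectors(matrix, k)
    query_vector = [0.0] * len(matrix)
    for word in tokenize(query):
        if word in index:  # words outside the corpus vocabulary are ignored
            query_vector[index[word]] += 1.0
    q = project(basis, query_vector)
    ranked = []
    for j, doc in enumerate(docs):
        column = [row[j] for row in matrix]
        ranked.append((cosine(q, project(basis, column)), doc))
    return sorted(ranked, reverse=True)

if __name__ == "__main__":
    for score, doc in search("car", DOCS):
        print(f"{score:.3f}  {doc}")
```

In this toy corpus, the query "car" ranks "automobile engine wheel" alongside the documents that actually contain "car", because all three share the same pattern of surrounding words, while the gardening documents score near zero. Since the query is just a bag of words, an entire document could be passed to the hypothetical `search` function in the same way.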

Apart from its utility as a standalone desktop program, the prototype is designed to work as a web service, creating the potential for a distributed peer-to-peer network of individual search engines. This kind of network, which is the project's ultimate goal, would allow users to send queries out over the Internet, decide where those queries should look, and receive collated results from a variety of different, complementary sources. Unlike in other search aggregators, the ability to interleave results in a meaningful way is an organic part of the search algorithm's design. Whether searching weblogs, research notes, article archives, or personal notes, users would have full control over what they searched, and an unprecedented ability to make their own work accessible to others.
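One way to picture the collation step is a merge of ranked lists returned by several peers. The sketch below is only an assumption about how that might look: the abstract does not describe the project's protocol or scoring, so the peer names, the `collate` function, and the premise that scores from different peers are directly comparable are all hypothetical.

```python
import heapq
import itertools

def collate(peer_results, limit=10):
    """Interleave per-peer hit lists into one ranking, best score first.

    peer_results maps a peer name to a list of (score, title) pairs.
    Assumes scores are comparable across peers (an illustrative
    assumption, not something the source specifies).
    """
    streams = [
        [(score, peer, title) for score, title in sorted(hits, reverse=True)]
        for peer, hits in peer_results.items()
    ]
    # heapq.merge lazily merges inputs that are already sorted; with
    # reverse=True each input must run from largest to smallest.
    merged = heapq.merge(*streams, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, limit))

if __name__ == "__main__":
    # Hypothetical responses from two sources the user chose to query.
    responses = {
        "research-notes": [(0.91, "LSI reading list"), (0.40, "Meeting notes")],
        "weblog": [(0.72, "Post on search ranking"), (0.15, "Travel diary")],
    }
    for score, peer, title in collate(responses, limit=3):
        print(f"{score:.2f}  [{peer}]  {title}")
```

Because the merge is lazy, a client could show the first few hits without waiting to sort every result from every peer, although a real network would also have to handle slow or unreachable peers.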

The presenters will be demonstrating their prototype software, available under the GPL, in hopes of attracting interest and sparking discussion.

