Sunday, January 15, 2012

Pursuit of semantic desktop

Welcome back to our mini-series of articles exploring the use of text analysis (text mining) techniques. In this part, we are going to make a few changes. The first concerns the source of information used for demonstration: we will move from Internet resources (like RSS/Atom feeds) to the personal desktop and regular documents - just to show that the same techniques can be applied here as well. The second change is that the supporting visualizations are provided as YouTube videos rather than static pictures.

The definition of the Semantic Desktop is straightforward (given in Sauermann et al. 2005):

A Semantic Desktop is a device in which an individual stores all her digital information like documents, multimedia and messages. These are interpreted as Semantic Web resources, each is identified by a Uniform Resource Identifier (URI) and all data is accessible and queryable as RDF graph. Resources from the web can be stored and authored content can be shared with others. Ontologies allow the user to express personal mental models and form the semantic glue interconnecting information and systems. Applications respect this and store, read and communicate via ontologies and Semantic Web protocols. The Semantic Desktop is an enlarged supplement to the user’s memory.

To put it simply - the idea is that we can do a lot better with the files on our machines than just looking at their name, suffix, icon and size. We can analyze the actual content of the files, recognize their topical and non-topical structures (via techniques like clustering, named entity recognition, tagging etc.) and the associations between them, and construct ontologies as mentioned above - to express personal mental models, rules etc. By adding a little semantic-web formalism, we can ask our desktop for those documents where Ford is mentioned as a famous actor and not as a successful entrepreneur. We can even query the desktops of our colleagues, with their mental models, seeking new knowledge and gaining insight not only into the content of documents, but also into the meta-information created by other (perhaps even better) brains.
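The "Ford as an actor, not an entrepreneur" query can be illustrated with a tiny in-memory triple store. This is only a sketch in plain Python, not a real RDF stack: the file URIs, the predicate names `mentions` and `hasRole`, and the helper functions are all hypothetical, chosen just to show how typed entities disambiguate a plain keyword match on "Ford".

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical facts a semantic desktop might have extracted from local files.
TRIPLES: list[Triple] = [
    ("file:///docs/star_wars_review.txt", "mentions", "person:HarrisonFord"),
    ("file:///docs/model_t_history.txt", "mentions", "person:HenryFord"),
    ("file:///docs/blade_runner_notes.txt", "mentions", "person:HarrisonFord"),
    ("person:HarrisonFord", "hasRole", "role:Actor"),
    ("person:HenryFord", "hasRole", "role:Entrepreneur"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern (None = wildcard)."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def documents_mentioning_role(role: str) -> list[str]:
    """Return documents that mention any entity holding the given role."""
    people = {subject for subject, _, _ in match(p="hasRole", o=role)}
    return sorted(doc for doc, _, entity in match(p="mentions")
                  if entity in people)


if __name__ == "__main__":
    # Both "Ford" documents about the actor, none about the entrepreneur.
    print(documents_mentioning_role("role:Actor"))
```

In a real setting the same question would more likely be expressed as a SPARQL query over an RDF graph; the two-step join above (role to person, person to document) is the part that carries over.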

The whole process can be pretty simple and straightforward - let's suppose that we have a bunch of documents (unsorted and without any knowledge beyond the mentioned attributes of name, suffix and size). We can index them and mine out some important (in this case topical) structures,


The first video shows indexing and obfuscation of 1200+ files, followed by cold-start analysis and creation of the very first element. The video ends with a preview of the created element via a simple search.

followed by the important associations between them.
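The indexing and topic-mining step can be sketched in a few lines of Python. The tool used in the videos is not described in detail here, so this sketch substitutes a plain inverted index and TF-IDF scoring as one common way to surface topical terms; the sample documents, the stopword list and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "with", "on"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: term -> set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for term in tokenize(text):
            index[term].add(name)
    return index


def top_terms(docs: dict[str, str], index: dict[str, set[str]],
              name: str, k: int = 3) -> list[str]:
    """Rank the terms of one document by TF-IDF (ties broken alphabetically)."""
    counts = Counter(tokenize(docs[name]))
    n_docs = len(docs)
    scores = {t: c * math.log(n_docs / len(index[t])) for t, c in counts.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical stand-ins for files found on a desktop.
DOCS = {
    "notes.txt": "Indexing documents for search. The index maps terms to documents.",
    "paper.txt": "Clustering groups similar documents. Clustering helps search.",
    "todo.txt": "Buy milk and bread.",
}

if __name__ == "__main__":
    index = build_index(DOCS)
    print(sorted(index["search"]))
    print(top_terms(DOCS, index, "paper.txt", k=1))
```

Terms that appear often in one document but rarely elsewhere score highest, which is why "clustering" outranks the more widespread "search" and "documents" for `paper.txt`.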



In the second video we will see how easily links can be created between the knowledge elements built earlier (see the previous video) - for example, the missing link between "search/searching" and "index/indexing".
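In the video the link is created by hand. As a complementary sketch, candidate links can also be suggested automatically from how strongly two topics overlap in the documents they cover. The Jaccard measure, the threshold of 0.5 and the document identifiers below are assumptions made for this example, not something taken from the tool shown.

```python
from itertools import combinations

# Hypothetical topic -> documents assignment produced by an earlier mining step.
TOPIC_DOCS: dict[str, set[str]] = {
    "search": {"d1", "d2", "d3", "d4"},
    "index": {"d2", "d3", "d4", "d5"},
    "clustering": {"d6", "d7"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of documents two topics have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_links(topic_docs: dict[str, set[str]],
                  threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Propose a link for every topic pair whose overlap reaches the threshold."""
    links = []
    for left, right in combinations(sorted(topic_docs), 2):
        score = jaccard(topic_docs[left], topic_docs[right])
        if score >= threshold:
            links.append((left, right, score))
    return links


if __name__ == "__main__":
    for left, right, score in suggest_links(TOPIC_DOCS):
        print(f"{left} <-> {right} (overlap {score:.2f})")
```

With this sample data only "index" and "search" share enough documents to be proposed as a pair; the user would still confirm or reject the suggestion.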


Now let's skip the formalism and try to use our work in a sample search:



A simple demonstration of how previously mined topics (via text analysis) and their relationships can be used to provide users with a "navigated search" experience. First, a simple query "search" is fired, resulting in the presentation of 500+ search results and suggested topics. The top two topics ("scenario" and "tool"), however, are not the direction we want to go, so we choose to exclude these suggestions. After that we see the topic "clustering", which (in the domain of information retrieval) is the way we want to go - so we choose to drill down here. This brings up more related topics and of course narrows the resulting set of documents to nearly half. After that we do more drill-downs and excludes - each one shaping our topic map (= the mental model of one particular user) and the supporting result set, until we reach the final state from which we can start exploring the documents.


Possibilities are endless - we can use structures and models to identify patterns and trends; after working like this for a while, we can add a time factor and visualizations such as a variety of timelines. All this will lead us to the definition of new rules and structures, fueled by the shared knowledge of a group of similar professionals (social semantic desktop) built with tools and effort similar to ours.


