Characteristics of a Semantic Web Application

What are the characteristics of a credible Semantic Web site? That was the question posed in a Semantic Web group on LinkedIn, and I attempt an answer here.

Is there anything called a Semantic Web app?

My immediate thought was: does anyone know at all? Is there a minimum set of features that would make an application SemWeb compliant? Of course, there is Tim Berners-Lee's vision of the Semantic Web, and the Wikipedia article on it does a good job of laying out the overall idea.

But there is no consensus, that I am aware of at least, on the minimum characteristics required for anything to be called a SemWeb app.

Without accepted criteria, anything goes

Without an established threshold, it becomes easy for the hype machine to mislead and set the wrong expectations. Those of you who followed the startup Twine will know what I am talking about. Not everything with the SemWeb label is remotely what Tim Berners-Lee's vision implied.

Here is an answer that I proposed.

Criterion 1 – Data Portability

Use commonly agreed-upon standards to mark up information, so that it can be mashed up in contexts the original data provider did not anticipate. This is not a trivial exercise. Data is often locked in proprietary formats and behind antiquated APIs, and an entire industry of data integration tools exists to address this problem.
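As a hedged illustration of marking up data with shared standards, here is a minimal sketch that serializes a record as N-Triples (the W3C line-based RDF syntax) using Dublin Core and FOAF property URIs. The record, its subject URI and the `to_ntriples` helper are hypothetical; in practice a library such as rdflib would handle serialization.

```python
# Minimal sketch: publish a record as N-Triples using shared vocabularies
# (Dublin Core, FOAF) so other parties can merge it with their own data.
# The record and subject URI below are hypothetical.

DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"


def escape_literal(value: str) -> str:
    """Escape the characters N-Triples requires escaping inside a literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(subject: str, properties: dict[str, str]) -> str:
    """Emit one N-Triples line per (predicate, literal) pair, sorted by predicate."""
    lines = []
    for predicate, value in sorted(properties.items()):
        lines.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return "\n".join(lines)


# A hypothetical in-house record that would otherwise sit in a proprietary format.
record = {
    DC + "title": "Quarterly outlook",
    DC + "creator": "Jane Analyst",
    FOAF + "topic": "Semiconductors",
}

print(to_ntriples("http://example.org/reports/42", record))
```

Because the predicates are public vocabulary URIs rather than in-house column names, a consumer who has never seen our schema can still interpret "creator" and "title".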

The metadata surrounding the data is one aspect. The other is how much of this metadata is actually available at the point of consumption for consumers to leverage. This is more onerous than it sounds, and it would be a topic for another blog post!

Check the DataPortability project for initiatives in this area.

Criterion 2 – Ubiquity

Make the marked-up data described above available through the widest possible range of channels. Though this is a content delivery criterion, I feel it is critical to realizing the benefits of the SemWeb. There is no point hiding semantically rich data behind proprietary APIs and endpoints. HTTP, following the REST route, should be the protocol; XMPP is another delivery channel if your data is time-sensitive.
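One way the REST route keeps a single resource URI useful to both people and machines is HTTP content negotiation: the server picks a representation from the client's `Accept` header. Below is a minimal sketch under stated assumptions; the `SUPPORTED` list, the HTML default and the `negotiate` helper are hypothetical, and real servers also apply specificity rules this version ignores.

```python
from typing import Optional

# Representations this hypothetical endpoint can produce for each resource.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep the client's order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header: str, default: str = "text/html") -> Optional[str]:
    """Pick the best supported media type, or None (i.e. 406 Not Acceptable)."""
    if not header.strip():
        return default  # no preference expressed by the client
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type == "*/*":
            return default
        if media_type.endswith("/*"):
            prefix = media_type[:-1]  # "text/*" -> "text/"
            for candidate in SUPPORTED:
                if candidate.startswith(prefix):
                    return candidate
        elif media_type in SUPPORTED:
            return media_type
    return None


print(negotiate("text/turtle;q=0.9, application/rdf+xml"))
print(negotiate("image/png"))
```

A browser sending `text/html` gets a readable page, while a crawler asking for `application/rdf+xml` or `text/turtle` gets the same data as triples from the same URI.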

Criterion 3 – Expose the data graph

I use this term for lack of a better one. Data does not live in a silo. Defining a grammar for your data via an ontology is just one aspect. There is always a reference to some other element that will enhance or clarify its meaning.

Mapping and translating between ontologies is a possibility too. Still, the idea of a data graph needs to be present. Make it explicit by providing links from key entities and facts within your data, for example by linking to DBpedia if the concepts involved are public. If the information is private to your organization, then make it possible for consumers to hop from one application's data to another's within your organization.
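To make the "hops" concrete, here is a minimal sketch in which a local entity is linked to its public DBpedia URI with `owl:sameAs`, and a consumer walks the merged graph breadth-first. The `example.org` URIs and the "remote" triple are hypothetical stand-ins; a real consumer would fetch the remote data by dereferencing the DBpedia URI over HTTP.

```python
from collections import deque

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
EX = "http://example.org/"  # hypothetical namespace for our own data

# Local facts: a report covers a company, which we link to its public DBpedia URI.
local = {
    (EX + "report/42", EX + "covers", EX + "company/intel"),
    (EX + "company/intel", OWL_SAME_AS, "http://dbpedia.org/resource/Intel"),
}

# Hypothetical stand-in for triples a consumer would fetch by dereferencing
# the DBpedia URI; the predicate and object are illustrative only.
remote = {
    ("http://dbpedia.org/resource/Intel", EX + "sector", EX + "sector/semiconductors"),
}


def reachable(triples, start, max_hops):
    """Breadth-first walk from `start` along subject->object edges.

    Returns a dict mapping each reachable node to its hop distance.
    """
    edges = {}
    for subject, _, obj in triples:
        edges.setdefault(subject, []).append(obj)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in edges.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Without the linked data the walk stops at our boundary; with it,
# the consumer hops from our report into the wider graph.
print(reachable(local, EX + "report/42", 3))
print(reachable(local | remote, EX + "report/42", 3))
```

The only thing the publisher had to add was the single `owl:sameAs` link; the rest of the path comes from someone else's data.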

Criterion 4 – Allow inferences

Too many SemWeb apps stop at search and aggregation. I feel some basic level of inference should be supported. Laying bare non-obvious connections should be the outcome of a deeply linked data graph, making patterns hidden within the data apparent.

I remember seeing the term Serendipity Quotient: a measure of how many non-apparent connections or insights can be revealed. This may sound similar to data mining, but I think the similarity is superficial. Insights from SemWeb apps would also draw on unstructured data, unlike data mining, which is more attuned to structured data.
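As a small, hedged example of "basic inference", the sketch below forward-chains two RDFS entailment rules (rdfs9: type propagation through `rdfs:subClassOf`; rdfs11: transitivity of `rdfs:subClassOf`) until no new triples appear. The class and individual names are hypothetical, and compact strings like `rdf:type` stand in for full URIs; a production system would use an RDF store or reasoner instead.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the closure of `triples` under RDFS rules rdfs9 and rdfs11."""
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        types = [(s, o) for s, p, o in closure if p == RDF_TYPE]
        new = set()
        # rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: (x type A), (A subClassOf B) => (x type B)
        for x, a in types:
            for a2, b in subclass:
                if a == a2:
                    new.add((x, RDF_TYPE, b))
        if new <= closure:  # fixpoint: nothing new was derived
            return closure
        closure |= new


# Hypothetical facts: only the direct class of ex:Intel is stated.
facts = {
    ("ex:Intel", RDF_TYPE, "ex:ChipMaker"),
    ("ex:ChipMaker", SUBCLASS, "ex:TechCompany"),
    ("ex:TechCompany", SUBCLASS, "ex:Company"),
}

for triple in sorted(infer(facts) - facts):
    print(triple)
```

The derived triples (for instance, that `ex:Intel` is an `ex:Company`) were never stated anywhere; they fall out of the links between facts.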

Note that we are not trying to be dogmatic about which data formats or inference mechanisms are used.

Infancy of the SemWeb

Going by these criteria, I think we have yet to see a proper SemWeb app. These are early days, and today's apps are our first attempts at building something as ambitious as globally linked data that allows machines to be infused with intelligence.

We also have to account for the fact that many of these criteria may already be implemented behind the scenes to pull off the kind of smart behavior we have come to expect from the SemWeb.

Your chance to add meaning!

With that I would like to pose some questions. Do you agree with the criteria above? What would you add/remove/embellish to this list? Are there apps that do all of the above?

Authoring tools and Semantics – Possibilities that outrun imagination!

Nitin K asks “Where are the meaning-enabled authoring tools?” on ReadWriteWeb. Though the article asks the right questions, I disagree with the conclusions it draws: that authoring applications have not yet learnt to capture ‘semantic knowledge’ and that their XML creation capabilities are severely limited.

Now, I don’t know how much research Nitin has done on this, or if it’s a case of selective dissonance, but Microsoft Word has had really decent support for XML since Word 2003, and that support reached maturity with the new XML-based file formats in Word 2007.

If we interpret Nitin’s definition of “meaning-enabled” applications as those that can mark out any specific element of content with an XML tag, with all of the tags adhering to an XML Schema, then Word already does it.
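To show what "marked up with an XML tag" means in practice, here is a minimal sketch that reads a simplified Word 2003 XML (WordML) document in which elements from a custom schema wrap ordinary Word paragraphs, and pulls out the text of each tagged region. The `w:` namespace is WordML's; the `urn:example:research` schema and its element names are hypothetical, and a real document would carry much more (styles, section properties).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # WordML namespace
CUSTOM = "urn:example:research"  # hypothetical custom schema namespace

# Simplified WordML: custom-schema elements wrap ordinary w:p paragraphs.
DOC = f"""<w:wordDocument xmlns:w="{W}" xmlns:rs="{CUSTOM}">
  <w:body>
    <rs:Report>
      <rs:Title><w:p><w:r><w:t>Quarterly outlook</w:t></w:r></w:p></rs:Title>
      <rs:Rating><w:p><w:r><w:t>Buy</w:t></w:r></w:p></rs:Rating>
    </rs:Report>
  </w:body>
</w:wordDocument>"""


def tagged_text(xml_text: str) -> dict[str, str]:
    """Map each custom element's local name to the text of the w:t runs inside it."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        if elem.tag.startswith("{" + CUSTOM + "}"):
            local_name = elem.tag.split("}", 1)[1]
            runs = elem.iter(f"{{{W}}}t")  # all text runs nested in this element
            result[local_name] = "".join(t.text or "" for t in runs)
    return result


print(tagged_text(DOC))
```

The author sees an ordinary document; any downstream program sees named, schema-bound fields.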

After conversations with a variety of folks, Nitin concludes that there is no interest in semantic authoring tools. Adding semantics for its own sake does not add any value for the user, which leads to comments that question the usefulness of such an exercise.

I believe any application that seeks to be successful and useful to consumers should give minimal indication that the user is working for the collective good. This is one of the reasons why tagging works so well on the web, at least in my opinion: self-interest trumps the collective good any day. A user applying a bunch of tags does so in order to be able to recall the tagged item later, by labelling it with associated ideas and words. The fact that such tags are then viewed through a multi-dimensional lens to mine insights always escapes the individual's attention.

In fact, I manage a product, a Word-based add-in, that does just this. Users work as they would in a plain-vanilla Word document, and all tagging is done by us behind the scenes. Users gain all the benefits of richly marked-up content at no additional cost. The key to our approach is the seamless user experience.

But this seamlessness comes at a cost. Any user-defined modification to the tags is possible only if the developer has explicitly catered for it. We let users work around this by defining additional/custom metadata before handing the document off to the next stage in the workflow.

The Semantic Web movement is gaining momentum with all the attention it’s been getting lately. But we need to remember that apps like Word, with their support for XML, have enabled content and metadata to co-exist for a long time, and that live production apps have been successfully built on top of them.

That said, I recognize the benefits of RDF and RDFa, or even Microformats. The ability to run inference rules on top of a forest of interconnected triples is rife with possibilities that would outrun my wildest imagination in a mere wink! But we also need to observe and learn the lessons of the past.

In the beginning was Word 2003

To elaborate on the work I do: I manage a small offshore development team of 11 for an equity research authoring tool.

We leverage the Smart Documents feature of Word 2003 extensively. It allows us to mark specific content fragments within the document with XML tags, which in turn lets us perform validations, display content-specific UI in the Word Task Pane, display Smart Tags that suggest context-specific actions, and so on.
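As a rough illustration of the kind of validation that tagged fragments make possible, here is a minimal sketch that checks required fields and allowed values in an extracted fragment. This is not the Smart Documents API; the `urn:example:research` namespace, the `RULES` table and the `validate` helper are hypothetical, and a real solution would typically validate against the attached XML Schema.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:research"  # hypothetical schema namespace

# Hypothetical rules: required elements and, where relevant, their allowed values.
RULES = {
    "Title": None,  # required, any non-empty text
    "Rating": {"BUY", "HOLD", "SELL"},
}


def validate(xml_text):
    """Return a list of human-readable problems; an empty list means valid."""
    root = ET.fromstring(xml_text)
    errors = []
    for name, allowed in RULES.items():
        elem = root.find(f"{{{NS}}}{name}")
        if elem is None or not (elem.text or "").strip():
            errors.append(f"{name}: missing or empty")
        elif allowed is not None and elem.text.strip().upper() not in allowed:
            errors.append(f"{name}: '{elem.text.strip()}' not in {sorted(allowed)}")
    return errors


print(validate(f'<Report xmlns="{NS}"><Rating>maybe</Rating></Report>'))
```

Because the content is tagged, the checks target named fields rather than guessing at positions in free text, which is what makes content-specific feedback in the task pane feasible.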

My first task when I joined the team was a technology role: primarily to design and implement a feature we call compilations. The easiest way to think of this feature is as ETL (Extract, Transform, Load), as used in data warehousing. We extract content from source documents authored with our tool, perform some transformations, and load the results into a new target document.
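The ETL analogy can be sketched in a few lines. The example below is a hypothetical simplification, not our production code: source documents are reduced to their tagged content, and the `urn:example:research` namespace, field names and normalisation rules are assumptions chosen only to show the three stages.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:research"  # hypothetical schema namespace

# Two hypothetical source documents, reduced to their tagged content.
SOURCES = [
    f'<Report xmlns="{NS}"><Title> Intel </Title><Rating>buy</Rating></Report>',
    f'<Report xmlns="{NS}"><Title>AMD</Title><Rating>hold</Rating></Report>',
]


def extract(xml_text):
    """Extract: read the tagged fields from one source document."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext(f"{{{NS}}}Title", default=""),
        "rating": root.findtext(f"{{{NS}}}Rating", default=""),
    }


def transform(record):
    """Transform: tidy whitespace and normalise ratings to upper case."""
    return {"title": record["title"].strip(), "rating": record["rating"].strip().upper()}


def load(records):
    """Load: append each record as an entry in a new target document."""
    target = ET.Element(f"{{{NS}}}Compilation")
    for record in records:
        entry = ET.SubElement(target, f"{{{NS}}}Entry", {"rating": record["rating"]})
        entry.text = record["title"]
    return target


compiled = load(transform(extract(src)) for src in SOURCES)
print(ET.tostring(compiled, encoding="unicode"))
```

Keeping the stages as separate functions mirrors the warehousing pattern: extraction depends only on the source schema, and loading only on the target layout.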

In developing this feature, I have learnt a lot about the technical and non-technical aspects of designing, implementing and maintaining a product feature. Over the next few blog entries, I hope to start with the concrete technical topics and then move on to the more abstract, management-related areas.

My technical posts will focus on WordML and how it can be leveraged to create documents that hold structured, XML-tagged content.
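As a preview of the topic, here is a minimal sketch that generates a bare-bones WordML document in which each custom-tagged element wraps a normal Word paragraph. The `w:` namespace and the `mso-application` processing instruction come from Word 2003 XML; the `urn:example:research` schema, its element names and the `build_document` helper are hypothetical, and a real document would usually also carry styles and section properties, with the custom schema attached for validation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"  # WordML namespace
CUSTOM = "urn:example:research"  # hypothetical custom schema namespace

ET.register_namespace("w", W)
ET.register_namespace("rs", CUSTOM)


def paragraph(parent, text):
    """Append a w:p > w:r > w:t paragraph holding `text`."""
    p = ET.SubElement(parent, f"{{{W}}}p")
    r = ET.SubElement(p, f"{{{W}}}r")
    t = ET.SubElement(r, f"{{{W}}}t")
    t.text = text
    return p


def build_document(fields):
    """Build WordML where each (name, text) pair becomes a tagged paragraph."""
    root = ET.Element(f"{{{W}}}wordDocument")
    body = ET.SubElement(root, f"{{{W}}}body")
    for name, text in fields:
        tagged = ET.SubElement(body, f"{{{CUSTOM}}}{name}")  # custom tag wraps the paragraph
        paragraph(tagged, text)
    xml = ET.tostring(root, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
            '<?mso-application progid="Word.Document"?>\n' + xml)


print(build_document([("Title", "Quarterly outlook"), ("Rating", "Buy")]))
```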