Musings from Mars Banner Image
For Software Addicts: Yes!MaybeNah!
Mars Report:

Taking a Snapshot of the Semantic Web:
Mighty Big, But Still Kinda Blurry

Published February 21st, 2009
title text

It's still somewhat difficult to get a handle on exactly what is meant by the "Semantic Web," and whether today's technologies are truly able to realize the vision of Tim Berners-Lee, who first articulated it back in 1999. From what I've read, I think there's general agreement that we aren't even close to being "there" yet, but that many of the ongoing Semantic Web activities, technologies, development platforms, and new applications are a big leap beyond the unstructured web that still dominates today.

There is a huge, seemingly endless amount of work being done by thousands of groups all trying to contribute to making the Semantic Web a reality. In my few weeks of research, I still feel as though I've just stepped my toe into that vast lake of semantic experimentation. Partly as a result of the many disparate projects, however, it does become rather difficult to see the entire forest for all the tiny trees. That said, these thousands of groups do appear to be working more or less together on the basis of consensus-based open standards, and they have set up mechanisms to keep everyone abreast of new ideas, solutions, and projects, under the general leadership of the World Wide Web Consortium (W3C)'s Semantic Web Activity. Semantic Web Stack As Envisioned by Berners-Lee As a starting point for exploration into this topic, the Wikipedia article that describes the Semantic Web Stack is quite good. Among its good overview and many useful links, the article includes the original conception of the Stack as designed by Berners-Lee. Besides cataloguing the sheer number of different projects all tackling different aspects of building a Semantic Web, it's important to distinguish ongoing projects from those that expired years ago—a distinction that's not always readily apparent to those peering in from the outside. Even excluding these, there are far too many projects to read up on in a few weeks, so this snapshot is necessarily incomplete. But after having the content reviewed by some Semantic Web experts, I'm confident it includes all the most significant threads of this new web, which, as Berners-Lee envisioned it:
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.
In my tour of the Semantic Web as it exists today, it's interesting to note that most of the projects are geared not toward machine-to-machine interaction, but rather to the traditional human-to-machine. Humans being by nature anthropocentric, the first steps being taken toward Berners-Lee's vision are to build systems that are semantically neutral with respect to human-to-human communication. Once we can reliably discuss topics without drifting off into semantic misunderstandings, then perhaps we can start teaching machines "what we mean by" ... This paper is an attempt to assess the current state of today's steps, while compiling a list of resources that would prove useful to someone thinking about building a Semantic Web application in 2009. Challenges to Building Semantic Web Applications The process of applying concepts from the Semantic Web to build richer, more knowledge-oriented applications presents developers with several, somewhat challenging prerequisites:
  • Taxonomies for the content being published,
  • Ontologies for the content, based on the developed taxonomies,
  • Content tagged using the developed ontologies,
  • Database tools for storing and serving RDF and/or OWL ontologies,
  • Database tools for connecting ontologies with the content they describe,
  • Application server specializing in querying and formatting semantic content,
  • User interface tools to present semantic content in optimum, not necessarily traditional, ways.
Ontology standards One of the base specifications for ontologies, RDF (Resource Description Framework), is a well established standard based on XML and URIs. that is the basis for all the news feeds and podcasts one can subscribe to today. DAML (DARPA Advanced Markup Language) is one of the early ontology standards built as an extension to XML and RDF. Still widely used, DAML is also the precursor to OWL. OWL (Web Ontology Language) is a sophisticated framework built on top of RDF and is perhaps the most well known and most adopted of such ontology languages. OWL is the standard adopted by the W3C (the official standards body for web specifications). At the moment, there are several different flavors of OWL, which makes adopting OWL more challenging than using RDF.
Each of these requirements present a fairly steep learning curve to developers who have not previously worked with the technologies to build Semantic Web applications. Solutions for aiding with some of the requirements exist, but it's not clear how effective they are at this stage. For example, I have listed some tools that assist in extracting semantics from unstructured documents, and others that do something similar with content stored in relational databases. On the other hand, the process of tagging unstructured content appears to have no good automated solutions. The process of building an ontology can be quite time-consuming, unless there happens to be an existing ontology you can reuse. There are several extensive online libraries of ontologies that can help. One fly in this bibliographical ointment, however, is the difficulty one may face in choosing among different, perhaps conflicting, ontologies on the same topic. It's important to emphasize that one of the first steps in building an ontology is to build a taxonomy. Although ontologies are not taxonomies, they use taxonomies as their jumping-off point. Therefore, one has to be ready and able to build a taxonomy for a subject before one can build an ontology. Many of the tools and projects included here are designed to assist with building and browsing a web of Linked Data, rather than true semantic data. Some of the demo browsers for linked data don't strike me as being particularly relevant to most end-user requirements for knowledge management. However, linked data is quite useful in integrating content from across the web, and projects built around it typically make heavy use of RDF, SPARQL, and other related specifications. Websites that make use of linked data represent the vanguard of the application of Semantic Web concepts, and their number appears to be exploding at the moment. Well-known examples of the use of linked data are websites built as "mashups," such as Google Maps. Use of Microformats and RDF Triples is also a typical component of websites that expose their content as linked data. More powerful tools exist in the form of integrated application server suites, such as the OpenLink Virtuoso server, Cyc Knowledge Server, and Intelligent Topic Manager. The first two have open source versions that can be used by developers to "dip their feet" into the task of building a Semantic Web applications using sample data. Of the two, I was most impressed with the breadth of tools in the OpenLink project, as well as with the range and vibrancy of the Virtuoso developer community. Virtuoso also comes with a rich set of user interface "widgets" that can be of great assistance in presenting semantic information appropriately. A Possible Approach To sum up, the landscape of the Semantic Web is still quite fuzzy and volatile, with many mountains of activity building up rapidly and eroding with nearly equal speed. Which landforms will remain once the evolution is complete is impossible to say here in 2008. However, the landscape is exciting to watch and flush with tantalizing experiments that will undoubtedly inspire more experimentation in the years ahead. Obviously, given all of the preceding caveats, the decision to engage in a Semantic Web experiment cannot be made lightly. One must have a clear idea of the knowledge management/presentation problem that such an experiment is designed to solve, and an understanding of the resources that will need to be devoted to the project. Although the maturity of tools, standards, and processes for such a project is quite young, it would definitely be in the interests of an organization with suitable candidate data and sufficient resources (including time) to begin an experiment of its own as a learning exercise.
What is an ontology? An ontology is a systematic description of concepts, in detail and thoroughness such that a machine encountering the concept could "understand" it. In this aspect, ontology development is closely related to research into artificial intelligence. In the past, humans have relied on complex taxonomies to describe the way abstract ideas and concrete individuals relate to one another. Ontologies differ from taxonomies in the complexity and thoroughness of describing the relationships between the elements of a taxonomy. A typical taxonomy is a tree structure that arrays terms as categories and subcategories. However, no subcategory has any notion of its relationship to its siblings, nor to any other categories elsewhere in the tree. An ontology can describe these relationships, thereby enriching one's understanding of what a given category means. Further, each category (or "concept", or "class") in the tree can have its own distinct properties. Properties describe the relationships between and among individuals in the ontology. Individuals are the specific instances of each class that the ontology needs to be able to describe. Properties are characteristics of a class that help distinguish one group of individuals from another. For example, if we have a class "job" in our ontology, with a subclass "administrative" and a further subclass "computer specialist," we could distinguish all the individuals who are computer specialists by defining the job's characteristics (properties). A computer specialist "writes software programs," "performs desktop support," "manages databases," "builds web applications," and so on. With an ontology, we could very richly define a group of individuals using such properties. This is a simplistic overview of properties… OWL provides a vast array of ways to describe properties and of the types of properties one can describe.
I would advise against a major expenditure for such an experiment, however. As noted, given the state of the technology, it strikes me as being unwise to invest a large sum in any commercial product to use as an application platform. Most of the tools that exist for building Semantic Web applications have open source licenses, so it makes sense to restrict experimentation to such tools for now. The data store chosen for such an experiment should ideally be one that currently suffers from being both fragmented and unstructured, existing in incompatible file formats and stored in different locations within the organization's Intranet—all factors that make it difficult for users to locate specific information. Given the uncertainties surrounding such an experiment, the data store chosen should also be one that is not so volatile that time pressures can cause discontinuities in content over the course of the project. Whoever undertakes such a Semantic Web experiment needs to be prepared to conclude that the effort required to bring their experiment to fruition is too great to justify the added value. Even if this were to prove true in 2009, I'm confident that the impressive swirl of activity taking place now will coalesce into truly usable techniques and tools within a few years. The standards on which the Semantic Web will be built are still evolving, but they are much more mature than the methods developers have built to turn those standards into working applications. Therefore, having gotten one's feet wet in the state of things this year will undoubtedly provide a solid foundation for building Semantic Web applications in coming years. The bulk of this report consists of a compilation of resources on various aspects of the Semantic Web and developing Semantic Web applications. The resources are divided into the following categories:
Ontology Development Tools
  • Comes in two "flavors": Version 3.4 handles both OWL and RDF ontologies, while 4.0 is geared toward the latest OWL standards only.
  • Impressive software for creating OWL ontologies.
  • User interface is well organized, given the complexity of the objects and properties you're dealing with. The interface also must handle multiple views of the information, and it does so quite well.
  • Numerous plugins for Protege make specific task work easier. There are many more plugins for Protege 3.4 than for 4.0 at this time.
  • One plugin enables database connections, with which you can import entire databases or tables, including their contents. Tables typically become OWL objects, and columns become object properties. Impressively, this tool also creates a complete form with which you can enter new instance information. Each form field can also be customized after creation.
  • Protege can also export ontologies to "OWL Document" format, which is a browsable HTML representation of the ontology.
  • Stanford is developing a web-based version of Protege. The beta URL is at Web Protege.
Protege Plug-Ins
  • OntoLT. The OntoLT approach aims at a more direct connection between ontology engineering and linguistic analysis. Used with Protege, OntoLT can automatically extract concepts (Protégé classes) and relations (Protégé slots) from linguistically annotated text collections. It provides mapping rules, defined by use of a precondition language that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé. (This plug-in is only available for Protege 3.2.)
  • There are a wide array of plug-ins for Protege 3.2, and a much smaller set for 4.0. This page from the "old" Protege wiki has good links to the full library of Protege plug-ins.
  • Ontowiki is a tool providing support for agile, distributed knowledge engineering scenarios. It facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWIG for text documents. Ontowiki is built on the Powl platform. I have downloaded and installed an instance of Ontowiki on my home computer; the installation and configuration was quite simple.
Application Development Tools
The list in this section is just a small subset of the tools now available for building Semantic Web applications. There are several complete, continuously updated lists on the web, including those at SemWebCentral and the Semantic Web Company. Developer Resources
  • SemWebCentral is an Open Source development web site for the Semantic Web. It was established in January, 2004 to support the Semantic Web community by providing a free, centralized place for Open Source developers to manage Semantic Web software and content development. Another purpose is to provide resources for developers or other interested parties to learn about the Semantic Web and how to begin developing Semantic Web content and software. SemWebCentral has the following major portals:
  • Web Tools by category, a list of 148 projects organized by topic and a wide variety of other attributes.
  • Code snippets, an archive of code snippets, scripts, and functions developers have shared with the open source software community.
  • Learn About the Semantic Web, a collection of overviews, tutorials, and papers covering Semantic Web topics.
  • Programming With RDF is part of the RDF Schemas website. It has links to repositories of programmer resources by programming language, showing the kind of documentation, code, and tutorials covered by the repository.
  • Semantic Web Tools is a comprehensive list of over 700 developer tools now available for semantic-web-related projects. There are several such lists on the web, but this one is particularly good since it breaks the list down by category and language, making it much easier to narrow down the list you're interested in. This site is hosted by the Semantic Web Company.
  • Developers Guide to Semantic Web Toolkits collects links to Semantic Web toolkits for different programming languages and gives an overview about the features of each toolkit, the strength of the development effort and the toolkit's user community.
Frameworks Sesame
    • Extensions and Plugins
    • Rio, a set of parsers and writers for RDF that has been designed with speed and standards-compliance as the main concerns. Currently it supports reading and writing of RDF/XML and N-Triples, and writing of N3. Rio is part of Sesame, but can also be downloaded and used separately.
    • Elmo is a toolkit for developing Semantic Web applications using Sesame. Elmo wraps Sesame, providing a dedicated API for a number of well known web ontologies including Dublin Core, RSS and FOAF. The dedicated API makes it easier to work with RDF data for the supported ontologies. Elmo also offers a set of tools related to the supported ontologies, including an RDF crawler, a FOAF smusher and a FOAF validator.
  • Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. Sesame is extremely flexible in how it's used and can work with a variety of data stores, including relational databases and native RDF files. It can be deployed as a server, or as a library incorporated into another application framework. For example, Sesame can be used simply to read a big RDF file, find the relevant information for an application, and use that information. Sesame provides the necessary tools to parse, interpret, query and store all this information, embedded in another application or, if appropriate, in a seperate database or even on a remote server. More generally, Sesame provides application developers a toolbox that contains all the necessary tools for building applications with RDF. Commercial support for Sesame is available from Aduna Software.

    Sesame also has a large ecosystem of addons and related toolsets. The following are the main links to these.

    Jess is a rule engine and scripting environment written entirely in Sun's Java language by Ernest Friedman-Hill at Sandia National Laboratories in Livermore, CA. Using Jess, you can build Java software that has the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is small, light, and one of the fastest rule engines available. Its powerful scripting language gives you access to all of Java's APIs. Jess includes a full-featured development environment based on the award-winning Eclipse platform.

    A Jess Plugin for Protege is available, integrating Jess development with your ontology.

    • ARQ, which is a query engine for Jena. ARQ supports multiple query languages (SPARQL, RDQL, and ARQ, the engine's own language), and besides Jena it can be used with general purpose engines and remote access engines. ARQ can also rewrite queries to SQL.
    • Joseki, an HTTP server-based system that support SPARQL queries. Joseki features a WebAPI for the remote query and update of RDF models, including both a client component and an RDF server. The Joseki server can run embedded in an application, as a standalone program, or as a web application inside a suitable application server (such as Tomcat). It provides the operations of query and update on models it hosts.
  • Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS and OWL, SPARQL and includes a rule-based inference engine. Jena is open source and grown out of work with the HP Labs Semantic Web Programme. Important tools related to the Jena framework include:
The Owl API
    • RDF/XML parser and writer
    • OWL/XML parser and writer
    • OWL Functional Syntax parser and writer
    • Turtle parser and writer
    • KRSS parser
    • OBO Flat file format parser
    • Support for integration with reasoners such as Pellet and FaCT++
  • The OWL API is an open-source Java interface and implementation for OWL, focused towards OWL 2 which encompasses OWL-Lite, OWL-DL and some elements of OWL-Full. The OWL API was used to build Protege 4.0 and was developed by Co-Ode, the company that works with Stanford University on the Protege project. It encompasses tool for the following tasks:
    Powl is a web-based platform for building applications designed to support collaborative building and managing of ontologies. It supports many of the features of mature tools like Protege, but for web applications that can be used for team development of ontologies. Powl is an open source project that uses PHP and various RDBMS systems on the back-end. Ontowiki is an example of a collaborative application built using Powl.
Visualization and Query Tools Jambalaya OntoVista
    The University of Georgia, as described in the next section of Semantic Applications, has built a large number of interesting semantic software. OntoVista is a particularly useful ontology visualization, navigation, and query tool based on Jambalaya. OntoVista is adaptable to the needs of different domains, especially in the life sciences. The tool provides a semantically enhanced graph display that gives users a more intuitive way of interpreting nodes and their relationships. Additionally, OntoVista provides comfortable interfaces for searching, semantic edge filtering and quick-browsing of ontologies.
SWRL (Semantic Web Rule Language)
    SWRL is intended to be the rule language of the Semantic Web and is based on OWL. It allows users to write rules to reason about OWL instances and to infer new knowledge about those instances.
    Pellet is an open source, OWL DL reasoner in Java that is developed, and commercially supported, by Clark & Parsia LLC. Pellet provides standard and cutting-edge reasoning services. It also incorporates various optimization techniques described in the DL literature and contains several novel optimizations for nominals, conjunctive query answering, and incremental reasoning.

    Pronto is an extension of Pellet that enables probabilistic knowledge representation and reasoning in OWL ontologies. Pronto is distributed as a Java library equipped with a command line tool for demonstrating its basic capabilities. It is currently in development stage—more robust and mature than a mere prototype, but less mature than a production-level system like Pellet.

    Pronto offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”. The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally; for example, risk factors associated with medical conditions like breast cancer.

OWL Ontology Validator
  • This online tool, developed as part of the WonderWeb Project, attempts to validate an ontology against the different "species" of OWL. Any constructs found which relate to a particular species will be reported. In addition, if requested, the validator will return a description of the classes, properties and individuals in the ontology in terms of the OWL Abstract Syntax.
Seamark Navigator
    Seamark Navigator is part of the commercial Information Access Platform from Siderean. Navigator is the relational navigation server component,which discovers and indexes content, pre-calculates relationships and suggests paths for data exploration. Its primary architectural components include a metadata aggregator, a scalable RDF store, and a relational navigation engine, all within an industry-standard Web services interface.
Unstructured Content Mining Tools Calais
  • The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. Calais goes beyond classic entity identification and returns facts and events hidden within your text as well.
Cortex Competitiva Platform
  • Cortex Competitiva employs collectively both state-of-the-art text mining technologies and consolidated techniques in data mining. The main modules of the platform are Information Collection, Information Organization and Collaboration, and Information Use Analysis.
IdentiFinder Text Suite
    IdentiFinder Text Suite, a product of BBN Technologies, lets users quickly sift through documents, web pages, and email to discover relevant information. It helps solve the classic problems of text mining: First, how to identify significant documents and then, how to locate the most important information within them.
    DL-Learner is a tool from AKSW for learning concepts in Description Logics (DLs) from user-provided examples. Equivalently, it can be used to learn classes in OWL ontologies from selected objects. The goal of DL-Learner is to construc knowledge about existing data sets. With DL-Learner, users provide positive and negative examples from a knowledge base for a not yet defined concept. The goal of DL-Learner is to derive a concept definition so that when the definition is added to the background knowledge all positive examples follow and none of the negative examples follow. See also the Wikipedia entry for ILP (Inductive Logic Programming). What DL-Learner considers is the the ILP problem applied to Descriptions Logics / OWL.
Transformation Tools GRDDL
    GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages. GRDDL provides an inexpensive set of mechanisms for bootstrapping RDF content from XML and XHTML. GRDDL does this by shifting the burden of formulating RDF away from the author to transformation algorithms written specifically for XML dialects such as XHTML. A repository of transformations is available.
    The Simile project has developed a large number of "RDFizers," which convert various file formats into RDF. This page also contains links to the many RDFizers developed by other organizations to handle even more document types.
Database Tools
Query Languages and Tools SPARQL Query Language for RDF
    SPARQL is a w3c specification for querying RDF repositories. It can be used to express queries for native RDF files or for RDF generated from stored ontologies via middleware. he results of SPARQL queries can be results sets or RDF graphs.
    Owlgres is an open source, scalable reasoner for OWL2. Owlgres combines Description Logic reasoning with the data management and performance properties of an RDBMS. Owlgres is intended to be deployed with the open source PostgreSQL database server. Owlgres’s primary service is conjunctive query answering, using SPARQL-DL.
    D2RQ is a declarative language to describe mappings between relational database schema and OWL/RDF ontologies. The D2RQ platform uses these mapping to enables applications to access RDF views on a non-RDF database through the Jena and Sesame APIs, as well as over the Web via the SPARQL Protocol and as Linked Data.
Conversion/Transformation Tools OntoSynt
    OntoSynt provides automatic support for extracting from a relational database schema its conceptual view. That is, it extracts semantics "hidden" in the relational sources by wrapping them by means of an ontology. The approach is specifically tailored for semantic information access, enabling queries over an ontology to be answered by using the data residing in its relational sources. Its web interface accepts an XML representation of an RDBMS schema, which can be generated using a tool like SQL Fairy.
    Relational.OWL is an open source application that automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OWL ontologies that can be processed by Semantic Web applications.
Triplify SQL Fairy
    SQL Fairy is a group of Perl modules that manipulate structured data definitions (mostly database schemas) in interesting ways, such as converting among different dialects of CREATE syntax (e.g., MySQL-to-Oracle), visualizations of schemas, automatic code generation, converting non-RDBMS files to SQL schemas (xSV text files, Excel spreadsheets), serializing parsed schemas (e.g., via XML), creating documentation (e.g., HTML), and more.
Application Servers
OpenLink Virtuoso Universal Server
  • Virtuoso, developed by OpenLink Software, is a complex product that appears to be a total solution for hosting Semantic Web applications, among other uses. In the company's words, from a recent release: "Virtuoso enables end users, systems architects, systems integrators, and developers to interact with data at the conceptual as opposed to the traditional logical level. Data about customers, suppliers, invoices, and orders, stored in existing ODBC- or JDBC-accessible database systems such as Oracle, Informix, Ingres, SQL Server, Sybase, Progress, and MySQL, can be presented in RDF form for use in Semantic Web applications."
  • Virtuoso is also available in an Open Source Edition, a very active project that includes a large number of modules for use with various content management systems. The main difference between the open source and commercial editions of Virtuoso is the Virtual Database Engine, which essentially enables an application to incorporate multiple data servers in its queries.

    Also available as open source from OpenLink is its OpenLink Ajax Toolkit (OAT), which comes with a wide range of user interface and data widgets, as well as complete applications for building data queries, designing databases, and designing web forms. The OpenLink Data Explorer is one of these standalone OAT applications. Widgets that are part of OAT include:

    The standalone applications running on the Open-Source Edition all incorporate widgets from the OAT to create quite robust, desktop-application-like tools (the username/password for all of these is demo/demo):
  • OpenLink also provides OpenLink Data Spaces (ODS), which run on the Virtuoso server, either the commercial or open-source editions. ODS enables developers to create a presence in the Semantic Web via Data Spaces derived from Weblogs, Wikis, Feed Aggregators, Photo Galleries, Shared Bookmarks, Discussion Forums and more. Data Spaces thus provide a foundation for the creation, processing and dissemination of knowledge for the emerging Semantic Web. ODS is pre-installed as part of the demonstration database bundled with the Virtuoso Open-Source Edition. Existing ODS modules include:
Cyc Knowledge Server Intelligent Topic Manager
  • Intelligent Topic Manager (ITM) is a commercial semantic software platform that enables a wide range of applications in enterprise information systems. ITM is designed to help organizations leverage, organize and model content and knowledge, to manage business reference models and taxonomies, to categorize and classify content, and to empower search. The platform consists of the following components and functionalities:
Oracle Semantic Technologies
  • Oracle Spacial 11g is an open, scalable RDF management platform. Based on a graph data model, RDF triples are persisted, indexed and queried, similar to other object-relational data types. Application developers can use the Oracle server to design and develop a wide range of semantic-enhanced business applications.
Asio Tool Suite Available from BBN, the Asio Tool Suite is focused primarily on building Semantic Web applications by integrating an enterprise's existing databases and systems without the need for complete reengineering. Designed to address the volume, variety, and exponential increase in enterprise data, the Asio Tool Suite supports information discovery via Semantic Web standards and provides for data accessibility via queries posed in a user’s own ontology. The suite further enables integration of systems by building bridges in semantic meaning from one system to another. The suite consists of the following components: Parliament
    Asio Parliament, released as open source, implements a high-performance storage engine that is compatible with the RDF and OWL standards. However, it is not a complete data management system. Parliament is typically paired with a query processor, such as Sesame or Jena, to implement a complete data management solution that incorporates SPARQL standards. In addition, Parliament includes a high-performance SWRL-compliant rule engine, which serves as an efficient inference engine. An inference engine examines a directed graph of data and adds data to it based on a set of inference rules. This enables Parliament to fill in gaps in the data automatically and transparently, inferring additional facts and relationships in the data to enrich query results.
    Asio Cartographer is a graphical ontology mapper based on SWRL. It utilizes the core functionality of BBN's Snoggle open-source mapping tool to assist in aligning OWL ontologies. It lets users visualize ontologies and then draw mappings between them on an intuitive graphical canvas. Cartographer then transforms those maps into SWRL/RDF or SWRL/XML for use in a knowledge base.
    Asio Scout provides semantic bridges to relational databases and web services that let an organization keep their existing systems in place for as long as necessary to, for example, support ongoing operations. Scout's semantic bridges act like any passive data consumer, but unlike other counterparts, their functionality— in concert with Asio Semantic Query Distribution's high-level perspective—enables consolidated knowledge discovery that wasn't previously conceivable. Scout can be used for web portals, standalone desktop applications, or web-enabled applications.
Semantic Application Demos
Browsers and Search Portals
  • Disco - Hyperdata Browser is a simple browser for navigating the Semantic Web as an unbound set of data sources. The browser renders in HTML all information that it can find on the Semantic Web about a specific resource. This resource description contains hyperlinks that allow you to navigate between resources. While you move from one resource to another, the browser dynamically retrieves information by dereferencing HTTP URIs and by following rdfs:seeAlso links.
  • Umbel Subject Concepts Explorer is a lightweight ontology structure for relating Web content and data to a standard set of subject concepts. Its purpose is to provide a fixed set of reference points in a global knowledge space. These subject concepts have defined relationships between them, and can act as binding or attachment points for any Web content or data.
  • Openlink Data Explorer is one product developed from the open-source version of the Virtuoso Universal Server product. This is the platform used by the DBPedia project, including the demos on the DBPedia page. The demo below shows the XHTML view option of a Data Viewer ontology query.
  • Zitgist DataViewer lets users browse linked data on the web, starting from an RDF or OWL ontology URL.
  • The Sindice Semantic Web Index monitors, harvests existing web data published as RDF and Microformats and makes them available under a coherent umbrella of functionalities and services. Its index of data is presented as a search portal much like Google. Sindice is created at DERI, the world’s largest institute for Semantic Web research. It is based on DERI’s unique cluster technology which indexes and operates over terascale semantic data sets (trillions of statements) while also providing very high query throughputs per cluster size. Leveraging unique cluster technologies, Sindice performs sophisticated reasoning which dramatically enhances data reusability, search precision, and recall. It obtains data by focused crawling methods which detects and focuses on metadata rich internet sources.
  • The RKB Explorer is an application built using awards data from the National Science Foundation (NSF). It has used this data to build ontologies around NSF grants, and users can search and browse the data through the Explorer. All URIs on this domain are resolvable, and search results deliver HTML or RDF, depending on the content. The browse interface provides viewing and navigating using RDF triples, and the query interface provides access using SPARQL. I discovered this useful application through a search on "NSF funding" using Sindice.
  • Marbles Linked Data Browser is a server-side application that formats Semantic Web content for XHTML clients using Fresnel lenses and formats. Colored dots are used to correlate the origin of displayed data with a list of data sources, hence the name. Marbles provides display and database capabilities for DBpedia Mobile.
  • The Cyc Foundation Concept Browser lets users search and browse the content of the OpenCyc knowledge base.
  • Brownsauce is a Semantic Web browser that lets users browse RDF files on the web. It runs as a local Java client and has a built-in Jetty web server. Brownsauce uses the Jena Semantic Web framework.
Ontology Viewers and Query Tools
  • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data. DBpedia is one of the projects developed/sponsored by AKSW. A wide variety of articles and publications about DBpedia have been published (see the Resources section of this report).
  • jSpace is a WebStart java application that demonstrates how one might search and query a given ontological database. There are several example database available to download for use with jSpace. jSpace's development was apparently inspired by mSpace. (mSpace was an innovative, but now defunct, project that attempted to merge the power of Google with the powerful interface of iTunes. Although the mSpace demo of a classical music explorer is not accessible now, it's well worth checking out the video demos of it.)
  • Owlsight is an innovative web application that uses the Google Web Toolkit and the Est JavaScript library to let users navigate OWL ontologies, browsing the relationships between classes, properties, and instances. Owlsight uses the Pellet ontoloty reasoner.
  • OpenCyc for the Semantic Web is both a project and an OWL ontology browser. Using this tool, users can access the entire OpenCyc content as downloadable OWL ontologies as well as via Semantic Web endpoints (i.e., permanent URIs). These URIs return RDF representations of each Cyc concept as well as a human-readable version when accessed via a Web Browser.
Knowledge/Content Management
  • The KiWi wiki project proposes a new approach to knowledge management that combines the wiki philosophy with the intelligence and methods of the Semantic Web. (KiWi stands for "Knowledge in a Wiki.")
  • DeepaMehta is a software platform for knowledge management. Knowledge is represented in a semantic network and is handled collaboratively. The DeepaMehta user interface is completely based on Mind Maps / Concept Maps. Instead of handling information through applications, windows and files, with DeepaMehta the user handles all kind of information directly and individually.
  • Semantic MediaWiki and SMW+are extensions to the MediaWiki platform, described elsewhere in this report.
Application Repositories
  • MIT's Simile project has been extremely creative and productive in applying concepts of linked data, RDF, and the Semantic Web generally to demonstration applications, all available as open source. (Simile is an acronym for "Semantic Interoperability of Metadata and Information in unLike Environments".) Some of its projects are included elsewhere in this report, but here is a list of some others relevant to the Semantic Web:
    • Longwell, a server application that applies concepts of faceted browsing with visualizing RDF stores.
    • PiggyBank is a Firefox add-on that enables users to develop "mashups" of web data by using "screen scrapers." The software also allows users to tag information found and embed RDF into their content.
    • RDFizers, described elsewhere in this report.
    • Referee, a server application that creates browsable RDF files from web server logs.
    • Welkin, an RDF visualizer built as a client-side java application. (Note: I couldn't get it to run on my Mac, even though MIT makes a Mac OS X disk image available.)
    • Fresnel, a vocabulary for displaying RDF.
    • Banach, a collection of operators that work on RDF graphs to infer, extend, emerge or otherwise transform a graph into another.
    • Data Collecton, a project that aims to develop a collection of RDF data sets that are generally useful for the metadata research and tools community.
  • DERI (Digital Enterprise Research Institute) International is the collection of bi-lateral agreements between like minded institutes working on the Semantic Web and Web Science. Its mission is to exploit semantics for people, organizations, and systems to collaborate and interoperate on a global scale. DERI conducts and funds research in Semantic Web technologies, conducts projects that have led to numerous prototype applications, and develops ontologies. The following are a few interesting links from DERI's Irish branch in Galway:
    • Research Clusters covering such topics as eLearning, Semantic Reality, Semantic Web Services, Industrial and Scientific Applications of Semantic Web Services, and Social Software. Each cluster has its own website and projects.
    • Research Projects, a lengthy list of ongoing projects.
    • Tools, a lengthy list of software tools available for download, typically from SourceForge.
  • University of Georgia's Large Scale Distributed Information Systems has a wide array of semantic applications available. The online repository has descriptions, downloads, and online demos. The applications cover such functions as visualization, ontology queries, ontology browsing, web services, and more.
  • 10 Semantic Apps To Watch From the ReadWriteWeb site, this is an intriguing list of new semantic-web-related applications that are now available out there. The article gives first explains what they mean by a "Semantic Application," and then briefly describes each application's innovative use of this new technology. The ten applications listed are:
  • It's also interesting to read the comments at the end of this article, many of which are from readers pointing out other semantic applications they have discovered.
Semantic Website Enhancements
Semantic Web Crawling: A Sitemap Extension
    This specification allows website managers to provide an RDF sitemap which would be visible to users browsing the Semantic Web.
    Triplify is an open-source, light-weight add-on to web applications that can read the content of the application's relational database(s) and expose their inherent semantics. According to the Triplify website, for a typical Web application a configuration for Triplify can be created in less than an hour. Triplify is based on the definition of relational database queries for a specific Web application in order to retrieve valuable information and to convert the results of these queries into RDF, JSON and Linked Data. A "triplified" web application can then provide its data to other applications on the web, enabling use of its information in "mashups."

    The Triplify project already has configurations for a variety of widely used content management systems, such as OpenConf, WordPress, Drupal, Joomla!, osCommerce, and phpBB. (The page that has links to these configurations also has a great list of other Semantic Web resources.) Triplify is one of the applications developed by AKSW. (I plan to download Triplify and integrate it in an instance of WordPress on my home computer.)

    Microformats are orthogonally related to the Semantic Web through their use of RDF-like attributes in CSS Class elements. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. They are highly correlated with semantic XHTML, sometimes referred to a "real world semantics", or "lossless XHTML." Microformats are designed to enable more/better structured blogging and web publishing. The Microformats site provides an array of code and tools for use in producing markup in microformats.
    RDFa in HTML is a proposed W3C specification that enables markup of RDF-like syntax into XHTML content. RDFa in XHTML provides a set of XHTML attributes to augment human-readable contenta with machine-readable hints. It enables the expression of simple and more complex datasets using RDFa, and in particular turns the existing human-visible text and links into machine-readable data without repeating content. The goals and approach of this specification are similar to that of Microformats, but it extends XHTML by use of and RDF-like syntax rather than using CSS classes.
    Exhibit is a three-tier web application framework written in Javascript, which you can use with various kinds of data files, including JSON and RDF, to produce knowledge-enhancing "mashups" like Google Maps. Exhibit creates interactive user interfaces displaying record data sets on maps, timelines, scatter plots, interactive tables, etc. Exhibit is one of the projects in knowledge management developed by MIT, partly with NSF funding.
Semantic MediaWiki
    Semantic MediaWiki (SMW) is a free extension of MediaWiki – the wiki system powering Wikipedia – that helps to search, organise, tag, browse, evaluate, and share the wiki's content. While traditional wikis contain only texts that computers can neither understand nor evaluate, SMW adds semantic annotations that bring the power of the Semantic Web to the wiki.
    SMW+ is Ontoprise's production version of the open source Semantic MediaWiki + Halo Extension software, which was originally developed as part of the 2003-04 Halo project for scientific information discovery. SMW+ makes the process of annotating wiki content much easier by adding a variety of useful interface tools, and it also helps writers research information by using the wiki's built-in ontology browser. SMW+ is designed to enable and enhance knowledge collaboration in organizations. It's available as a free download from Sourceforge, or as a reasonably priced bundled version for Windows. Ontoprise also offers service contracts for the product. The impressive detailed list of features on the Ontoprise website gives a good overview of SMW+ capabilities. These include:
    • Semantic Toolbar: Lets users create, inspect and alter semantic annotations in the wiki text without knowing the annotation syntax.
    • Advanced Annotation Mode: In this mode, wiki pages are displayed in the same way as they are displayed in the standard view mode. However, users can easily add annotations by simply highlighting the word or passage they want to annotate.
    • Ontology Browser: Allows easy navigation through the wiki's ontology without the need to access individual articles. It helps the user to understand the ontology and to keep an overview about it.
    • Question Formulation Interface: Normally, making queries against the semantic wiki involve knowing and using a complex syntax. The Question Formulation Interface provides a graphical interface that lets inexperienced users easily compose their own queries.
    • Auto completion: This tool greatly simplifies users' ability to generate annotations. With auto completion activated, users don't have to care about correct spelling of an article’s or property's name, because the tool extracts possible completions from the semantic context. For example, it checks what attribute values are possible for a particular attribute and show only these to the user. This tool is used in the wiki text editor, the semantic tool bar, the query interface and the combined search.
      ARC is an API for LAMP-based (Linux-Apache-MySQL-PHP) websites. Its goal is to reach out to the larger Web developer community, to enable the combination of efforts like microformats with the utility of selected RDF solutions such as agile data storage, run-time model changes, standardized query interfaces, and mashup chaining. ARC tries to keep things simple and flexible. All features are backed by practical use cases. One of the underlying premises of ARC is that RDF is a productivity booster that can make website implementation much faster if it's used pragmatically.

      ARC includes the following capabilities:

    • Parsers for RDF/XML, Turtle, SPARQL + SPOG, Legacy XML, HTML "tag soup," RSS 2.0, and others.
    • Serializers for N-Triples, RDF/JSON, RDF/XML, Turtle, SPOG dumps.
    • RDF Storage using MySQL with support for SPARQL queries
    • SemHTML RDF extractors for Duplin Core, eRDF, microformats, OpenID, RDFa
    • Use of remote stores, allowing the website to query remote SPARQL endpoints as if they were local stores (results are returned as native PHP arrays)
    • SPARQLScript, a SPARQL-based scripting language combined with output templating
    • Light-weight inferencing
    ARC applications and websites. Of as much interest as ARC itself are the numerous applications and extensions that have already been built with it, many of which are useful for semantically enhancing websites on their own. The following are a few examples:
      • Trice - A Semantic Web framework (still in development).
      Calais Marmoset
        Marmoset, one of several Semantic Web tools from the OpenCalais project, is a simple yet powerful tool that makes it easy for publishers to generate and embed metadata in their content in preparation for Yahoo! Search's new open developer platform, SearchMonkey, as well as other metacrawlers and semantic applications. Marmoset uses the OpenCalais web service, which can provide search engine crawlers with rich semantic data to consider when they index a site's pages. Yahoo!'s search engine can analyze this semantic data, provided in Microformats, and other search engines are likely to follow. As a result, users accessing a Marmoset-enhanced website through search engines will get better targeted results.
      Other Resources
      Ontology Libraries One of the best features of ontologies is their design for reuse. It's not clear to me what happens when you encounter a dozen ontologies for "person" or "job", etc., in the ontology libraries on the web, but it's certainly useful that you can search for existing ontologies and bring the objects you want to model into your own ontology. There are a few ontologies for commonly used objects that are nearly defacto standards now: The following is a list of other resources available for finding ontologies on specific topics:
      • Simile Ontologies This library includes those developed by MIT as part of the Simile project as well as a list of others that have been used by the project.
      • Swoogle Swoogle is a research project being carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland
      • Google Google can restrict its search to files of type "owl", as this sample search shows.
      • OntoSelect Ontology Library This library has an ontology search system with several unique and innovative features, including use of Wikipedia topics as the basis for one type of search.
      • BioPortal BioPortal is a sophisticated web application for accessing and sharing biomedical ontologies. It features several advanced search and visualization tools, as well as tools for mapping concepts between different ontologies.
      • SchemaWeb This is a comprehensive directory of RDF schemas which, in addition to typical browse-and-search interfaces, also provides an extensive set of web services to be used by software agents for processing RDF data.
      • Watson This link points to Watson's terrific web interface, which is one of the best for searching out ontologies that match your topics of interest. Watson also has a Protege plugin, but I haven't been able to make it work. The plugin, when working, would let a developer search and add classes to their ontology directly from within Protege.
      • TONES Ontology Repository This repository is primarily designed to be a central location for ontologies that might be of use to ontology tools developers for testing purposes.
      • Ping the Semantic Web Developed as a free web service by Zitgist, a company "incubated" by OpenLink, PingtheSemanticWeb (PTSW) is an archive of recently created/updated RDF documents on the web. If one of those documents is created or updated, its author can notify PTSW that the document has been created or updated by pinging the service with the URL of the document. PTSW is used by crawlers or other types of software agents to know when and where the latest updated RDF documents can be found. This dynamically updated library displays the 25 most recently updated ontologies, in real time. Using PTSW's data store, you can retrieve data on all RDF files by namespace or by class, with the option to download the files.
      Papers, Projects and Documentation
      • W3C Semantic Web Activity This portal can be thought of as the Semantic Web's "Home Page." It brings together a vast amount of primary source documentation of the Semantic Web's languages and other standard specifications, including OWL, RDF, RDFa in XHTML, and SPARQL. In addition, this portal gathers all the major ongoing projects involving the Semantic Web and the groups conducting them. The page also lists a large number of publications and presentations on Semantic Web topics.
      • Rich Tags This paper describes a proposal/project for developing a system that uses semantic tags for enhancing the searchability of web pages. (The proposal sounds similar to the W3C specification for RDFa in XHTML.)
      • Building A Semantic Website This article is a little old (2001), but has a good overview of the steps and components of building a web application using RDF ontologies.
      • TONES TONES is a European Union research project into the design and use of Thinking ONtologiES. Begun in 2005, it is scheduled to complete its work in 2008. The TONES website has links to all of the outputs of the project, including software tools and research papers. This PDF contains a 2006 presentation overview of the TONES project.
      • RapidOWL This methodology for developing OWL ontologies is based on the idea of iterative refinement, annotation and structuring of a knowledge base. A central paradigm for the RapidOWL methodology is the concentration on smallest possible information chunks. The collaborative aspect comes into play, when those information chunks can be selectively added, removed, annotated with comments or ratings. Design rationales for the RapidOWL methodology are to be light-weight, easy-to-implement, and support of spatially distributed and highly collaborative scenarios. This methodology is implemented in the OntoWiki software project.
      • Linked Data Comes of Age This very useful article clearly explains what is meant by linked data based on RDF and how it fits into the overarching vision of the Semantic Web.
      • Zitgist's Papers and Reports This is a useful list of resources on subjects relevant to Semantic Web research. The Zitgist Lab site also has a good page of documents on Best Practices for RDF.
      • RDF Schemas This site has a clear explanation of the various "vocabularies" used to develop ontologies: RDF, RDFS, OWL, and Dublic Core. The site also has a terrific list of resources for programmers.
      • Nodalities Magazine Sponsored by Talis, this free, bimonthly online magazine (released in PDF format) tries to bridge the divide between those building the Semantic Web and those interested in applying it to their business requirements. The magazine is supported by the Nodalities blog, podcasts, and Semantic Web development work.
      • DERI Papers and Reports This site contains a large collection of research papers and technical reports produced by DERI International.
      Business Resources This list includes companies I've encountered that appear to have substantial expertise in applying Semantic Web technologies to practical business requirements. BBN Technologies
        BBN is a technology company with a broad range of expertise, services, and products—including support for Semantic Web application development. As an indication of the impressive expertise of this company, BBN was the prime contractor for DARPA (Defense Advanced Research Projects Agency) in development of DAML (DARPA Agent Markup Language), which then led to their development of OWL. BBN also provides the Asio Tool Suite for third-party development and the open source Snoogle and Parliament tools.
        Cycorp is a leading provider of semantic technologies that bring intelligence and common sense reasoning to a wide variety of software applications. The Cyc software combines ontologies and knowledge bases with a powerful reasoning engine and natural language interfaces to enable the development of novel knowledge-intensive applications.
      Clark & Parsia
        Clark & Parsia is a small R&D firm—specializing in Semantic Web and advanced systems—based in Washington, DC. They have expertise in a range of semantic-web technologies, including OWL, RDF, reasoning at scale, and ontology development. They offer commercial support for Pellet, a best-of-breed Open Source OWL DL reasoner in Java, and related systems.
      Semantic Arts
        This company helps companies (medium/large with 1,000 to 10,000 employees) migrate to semantically-based SOAs (Service Oriented Architectures).
        Zitgist has a number of interesting products for viewing and querying the Semantic Web, as well as offering services for ontology development, content conversion, and web services. They also provide several open-source products for both consumer and corporate use in furthering use of the Semantic Web.
      Semantic Web Company
        The Semantic Web Company (SWC), based in Vienna, Austria, provides companies, institutions and organizations with professional services related to the Semantic Web, semantic technologies and Social Software. They provide services in consulting, education, and project management, among others.
        Talis has developed its own application development platform—the Talis Platform—and also builds Semantic Web applications for other organizations. To date, Talis' applications have been geared to meeting the needs of libraries and academic institutions.
        Semsol offers a wide range of Semantic Web-related services, from consulting and data modeling to interface design and production. Semsol is a pioneer in bringing Semantic Web technologies to widely deployed server and database environments. Semsol is the company behind development of the open-source tool ARC, as well as for several of the applications built on top of ARC, including Trice, SPARQLBot, and paggr (referenced earlier).
        Cortex's software platform and consulting business is based on their Competitiva system. Cortex’s technology proposes to mine unstructured data on the Web, using Competitiva's intelligent system to automatically convert pages and documents to a semantic format (i.e. RDF). Cortex has an R&D team working to bridge the Semantic Web gap by automatically enriching text with semantic content for themselves and their customers.
      • Google
      • Slashdot
      • Technorati
      • blogmarks
      • Tumblr
      • Digg
      • Facebook
      • Mixx

      Show Comments
      Just Say No To Flash