February 21st, 2009

Taking a Snapshot of the Semantic Web:
Mighty Big, But Still Kinda Blurry

It's still somewhat difficult to get a handle on exactly what is meant by the "Semantic Web," and whether today's technologies are truly able to realize the vision of Tim Berners-Lee, who first articulated it back in 1999. From what I've read, I think there's general agreement that we aren't even close to being "there" yet, but that many of the ongoing Semantic Web activities, technologies, development platforms, and new applications are a big leap beyond the unstructured web that still dominates today.

There is a huge, seemingly endless amount of work being done by thousands of groups all trying to contribute to making the Semantic Web a reality. After a few weeks of research, I still feel as though I've just dipped my toe into that vast lake of semantic experimentation. With so many disparate projects, it becomes rather difficult to see the entire forest for all the tiny trees. That said, these thousands of groups do appear to be working more or less together on the basis of consensus-based open standards, and they have set up mechanisms to keep everyone abreast of new ideas, solutions, and projects, under the general leadership of the World Wide Web Consortium (W3C)'s Semantic Web Activity.

[Figure: Semantic Web Stack As Envisioned by Berners-Lee]

As a starting point for exploration into this topic, the Wikipedia article that describes the Semantic Web Stack is quite good. Along with a solid overview and many useful links, the article includes the original conception of the Stack as designed by Berners-Lee.

Besides cataloguing the sheer number of different projects all tackling different aspects of building a Semantic Web, it's important to distinguish ongoing projects from those that expired years ago, a distinction that's not always readily apparent to those peering in from the outside. Even excluding the expired ones, there are far too many projects to read up on in a few weeks, so this snapshot is necessarily incomplete. But after having the content reviewed by some Semantic Web experts, I'm confident it includes all the most significant threads of this new web, which Berners-Lee envisioned as follows:
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.
In my tour of the Semantic Web as it exists today, it's interesting to note that most of the projects are geared not toward machine-to-machine interaction, but rather to the traditional human-to-machine. Humans being by nature anthropocentric, the first steps being taken toward Berners-Lee's vision are to build systems that are semantically neutral with respect to human-to-human communication. Once we can reliably discuss topics without drifting off into semantic misunderstandings, then perhaps we can start teaching machines "what we mean by" ...

This paper is an attempt to assess the current state of today's steps, while compiling a list of resources that would prove useful to someone thinking about building a Semantic Web application in 2009.

Challenges to Building Semantic Web Applications

The process of applying concepts from the Semantic Web to build richer, more knowledge-oriented applications presents developers with several somewhat challenging prerequisites:
  • Taxonomies for the content being published,
  • Ontologies for the content, based on the developed taxonomies,
  • Content tagged using the developed ontologies,
  • Database tools for storing and serving RDF and/or OWL ontologies,
  • Database tools for connecting ontologies with the content they describe,
  • Application server specializing in querying and formatting semantic content,
  • User interface tools to present semantic content in optimum, not necessarily traditional, ways.
Ontology standards

One of the base specifications for ontologies, RDF (Resource Description Framework), is a well-established standard based on XML and URIs. It is also the basis for RSS 1.0, one of the formats behind the news feeds one can subscribe to today.

DAML (DARPA Agent Markup Language) is one of the early ontology standards built as an extension to XML and RDF. Still widely used, DAML is also the precursor to OWL.

OWL (Web Ontology Language) is a sophisticated framework built on top of RDF and is perhaps the best known and most widely adopted of such ontology languages. OWL is the standard adopted by the W3C (the official standards body for web specifications). At the moment, there are several different flavors of OWL, which makes adopting OWL more challenging than using RDF.
Each of these requirements presents a fairly steep learning curve to developers who have not previously worked with the technologies to build Semantic Web applications. Solutions for aiding with some of the requirements exist, but it's not clear how effective they are at this stage. For example, I have listed some tools that assist in extracting semantics from unstructured documents, and others that do something similar with content stored in relational databases. On the other hand, the process of tagging unstructured content appears to have no good automated solutions.

The process of building an ontology can be quite time-consuming, unless there happens to be an existing ontology you can reuse. There are several extensive online libraries of ontologies that can help. One fly in this bibliographical ointment, however, is the difficulty one may face in choosing among different, perhaps conflicting, ontologies on the same topic.

It's important to emphasize that one of the first steps in building an ontology is to build a taxonomy. Although ontologies are not taxonomies, they use taxonomies as their jumping-off point. Therefore, one has to be ready and able to build a taxonomy for a subject before one can build an ontology.

Many of the tools and projects included here are designed to assist with building and browsing a web of Linked Data, rather than true semantic data. Some of the demo browsers for linked data don't strike me as being particularly relevant to most end-user requirements for knowledge management. However, linked data is quite useful in integrating content from across the web, and projects built around it typically make heavy use of RDF, SPARQL, and other related specifications. Websites that make use of linked data represent the vanguard of the application of Semantic Web concepts, and their number appears to be exploding at the moment. Well-known examples of the use of linked data are websites built as "mashups," such as those based on Google Maps. Use of Microformats and RDF Triples is also a typical component of websites that expose their content as linked data.

More powerful tools exist in the form of integrated application server suites, such as the OpenLink Virtuoso server, Cyc Knowledge Server, and Intelligent Topic Manager. The first two have open source versions that can be used by developers to "dip their feet" into the task of building Semantic Web applications using sample data. Of the two, I was more impressed with the breadth of tools in the OpenLink project, as well as with the range and vibrancy of the Virtuoso developer community. Virtuoso also comes with a rich set of user interface "widgets" that can be of great assistance in presenting semantic information appropriately.

A Possible Approach

To sum up, the landscape of the Semantic Web is still quite fuzzy and volatile, with many mountains of activity building up rapidly and eroding with nearly equal speed. Which landforms will remain once the evolution is complete is impossible to say here in 2008. However, the landscape is exciting to watch and flush with tantalizing experiments that will undoubtedly inspire more experimentation in the years ahead.

Obviously, given all of the preceding caveats, the decision to engage in a Semantic Web experiment cannot be made lightly. One must have a clear idea of the knowledge management/presentation problem that such an experiment is designed to solve, and an understanding of the resources that will need to be devoted to the project. Although the tools, standards, and processes for such a project are still quite immature, it would definitely be in the interests of an organization with suitable candidate data and sufficient resources (including time) to begin an experiment of its own as a learning exercise.
What is an ontology?

An ontology is a systematic description of concepts, in detail and thoroughness such that a machine encountering the concept could "understand" it. In this aspect, ontology development is closely related to research into artificial intelligence.

In the past, humans have relied on complex taxonomies to describe the way abstract ideas and concrete individuals relate to one another. Ontologies differ from taxonomies in the complexity and thoroughness of describing the relationships between the elements of a taxonomy. A typical taxonomy is a tree structure that arrays terms as categories and subcategories. However, no subcategory has any notion of its relationship to its siblings, nor to any other categories elsewhere in the tree. An ontology can describe these relationships, thereby enriching one's understanding of what a given category means.

Further, each category (or "concept", or "class") in the tree can have its own distinct properties. Properties describe the relationships between and among individuals in the ontology. Individuals are the specific instances of each class that the ontology needs to be able to describe. Properties are characteristics of a class that help distinguish one group of individuals from another.

For example, if we have a class "job" in our ontology, with a subclass "administrative" and a further subclass "computer specialist," we could distinguish all the individuals who are computer specialists by defining the job's characteristics (properties). A computer specialist "writes software programs," "performs desktop support," "manages databases," "builds web applications," and so on. With an ontology, we could very richly define a group of individuals using such properties. This is a simplistic overview of properties… OWL provides a vast array of ways to describe properties and of the types of properties one can describe.
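
To make the taxonomy-versus-ontology distinction a little more concrete, here is a minimal sketch in plain Python. It is not OWL and uses no ontology library; the names OntologyClass and is_member, and the individuals "alice" and "bob", are hypothetical and exist only for illustration. The classes and properties are the "job" example from the text.

```python
from dataclasses import dataclass, field
from typing import Optional

# A taxonomy is a bare tree: each term records only its parent category.
taxonomy = {
    "job": None,
    "administrative": "job",
    "computer specialist": "administrative",
}


@dataclass
class OntologyClass:
    """A class that, unlike a taxonomy node, also carries defining properties."""
    name: str
    parent: Optional["OntologyClass"] = None
    properties: frozenset = field(default_factory=frozenset)

    def ancestors(self):
        """Walk up the hierarchy and return the names of all superclasses."""
        node, chain = self.parent, []
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


job = OntologyClass("job")
administrative = OntologyClass("administrative", parent=job)
computer_specialist = OntologyClass(
    "computer specialist",
    parent=administrative,
    properties=frozenset({
        "writes software programs",
        "performs desktop support",
        "manages databases",
        "builds web applications",
    }),
)


def is_member(activities, cls):
    """An individual belongs to a class if it has all of the class's properties."""
    return bool(cls.properties) and cls.properties <= set(activities)


# Hypothetical individuals, described only by what they do.
alice = {
    "writes software programs",
    "performs desktop support",
    "manages databases",
    "builds web applications",
}
bob = {"manages databases"}

print(computer_specialist.ancestors())
print(is_member(alice, computer_specialist), is_member(bob, computer_specialist))
```

The taxonomy dictionary can only say where a term sits in the tree. The class with properties can also decide which individuals belong to it, which is a very rough approximation of what an OWL class definition makes possible.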
I would advise against a major expenditure for such an experiment, however. As noted, given the state of the technology, it strikes me as being unwise to invest a large sum in any commercial product to use as an application platform. Most of the tools that exist for building Semantic Web applications have open source licenses, so it makes sense to restrict experimentation to such tools for now.

The data store chosen for such an experiment should ideally be one that currently suffers from being both fragmented and unstructured, existing in incompatible file formats and stored in different locations within the organization's intranet, all factors that make it difficult for users to locate specific information. Given the uncertainties surrounding such an experiment, the data store chosen should also be one that is not so volatile that time pressures can cause discontinuities in content over the course of the project.

Whoever undertakes such a Semantic Web experiment needs to be prepared to conclude that the effort required to bring their experiment to fruition is too great to justify the added value. Even if this were to prove true in 2009, I'm confident that the impressive swirl of activity taking place now will coalesce into truly usable techniques and tools within a few years. The standards on which the Semantic Web will be built are still evolving, but they are much more mature than the methods developers have built to turn those standards into working applications. Therefore, having gotten one's feet wet in the state of things this year will undoubtedly provide a solid foundation for building Semantic Web applications in coming years.

The bulk of this report consists of a compilation of resources on various aspects of the Semantic Web and developing Semantic Web applications. The resources are divided into the following categories:
Ontology Development Tools
Protege
  • Comes in two "flavors": Version 3.4 handles both OWL and RDF ontologies, while 4.0 is geared toward the latest OWL standards only.
  • Impressive software for creating OWL ontologies.
  • User interface is well organized, given the complexity of the objects and properties you're dealing with. The interface also must handle multiple views of the information, and it does so quite well.
  • Numerous plugins for Protege make specific task work easier. There are many more plugins for Protege 3.4 than for 4.0 at this time.
  • One plugin enables database connections, with which you can import entire databases or tables, including their contents. Tables typically become OWL objects, and columns become object properties. Impressively, this tool also creates a complete form with which you can enter new instance information. Each form field can also be customized after creation.
  • Protege can also export ontologies to "OWL Document" format, which is a browsable HTML representation of the ontology.
  • Stanford is developing a web-based version of Protege. The beta URL is at Web Protege.
Protege Plug-Ins
  • OntoLT. The OntoLT approach aims at a more direct connection between ontology engineering and linguistic analysis. Used with Protege, OntoLT can automatically extract concepts (Protégé classes) and relations (Protégé slots) from linguistically annotated text collections. It provides mapping rules, defined by use of a precondition language, that allow for a mapping between linguistic entities in text and class/slot candidates in Protégé. (This plug-in is only available for Protege 3.2.)
  • There is a wide array of plug-ins for Protege 3.2, and a much smaller set for 4.0. This page from the "old" Protege wiki has good links to the full library of Protege plug-ins.
Ontowiki
  • Ontowiki is a tool providing support for agile, distributed knowledge engineering scenarios. It facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. Ontowiki is built on the Powl platform. I have downloaded and installed an instance of Ontowiki on my home computer; the installation and configuration were quite simple.
Application Development Tools
The list in this section is just a small subset of the tools now available for building Semantic Web applications. There are several complete, continuously updated lists on the web, including those at SemWebCentral and the Semantic Web Company.
Developer Resources
  • SemWebCentral is an Open Source development web site for the Semantic Web. It was established in January 2004 to support the Semantic Web community by providing a free, centralized place for Open Source developers to manage Semantic Web software and content development. Another purpose is to provide resources for developers or other interested parties to learn about the Semantic Web and how to begin developing Semantic Web content and software. SemWebCentral has the following major portals:
    • Web Tools by category, a list of 148 projects organized by topic and a wide variety of other attributes.
    • Code snippets, an archive of code snippets, scripts, and functions developers have shared with the open source software community.
    • Learn About the Semantic Web, a collection of overviews, tutorials, and papers covering Semantic Web topics.
  • Programming With RDF is part of the RDF Schemas website. It has links to repositories of programmer resources by programming language, showing the kind of documentation, code, and tutorials covered by the repository.
  • Semantic Web Tools is a comprehensive list of over 700 developer tools now available for semantic-web-related projects. There are several such lists on the web, but this one is particularly good since it breaks the list down by category and language, making it much easier to narrow down the list you're interested in. This site is hosted by the Semantic Web Company.
  • Developers Guide to Semantic Web Toolkits collects links to Semantic Web toolkits for different programming languages and gives an overview about the features of each toolkit, the strength of the development effort and the toolkit's user community.
Frameworks
Sesame
  • Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema. It can be used as a database for RDF and RDF Schema, or as a Java library for applications that need to work with RDF internally. Sesame is extremely flexible in how it's used and can work with a variety of data stores, including relational databases and native RDF files. It can be deployed as a server, or as a library incorporated into another application framework. For example, Sesame can be used simply to read a big RDF file, find the relevant information for an application, and use that information. Sesame provides the necessary tools to parse, interpret, query and store all this information, embedded in another application or, if appropriate, in a separate database or even on a remote server. More generally, Sesame provides application developers a toolbox that contains all the necessary tools for building applications with RDF. Commercial support for Sesame is available from Aduna Software.

    Sesame also has a large ecosystem of add-ons and related toolsets. The following are the main links to these.

    • Extensions and Plugins
    • Rio, a set of parsers and writers for RDF that has been designed with speed and standards-compliance as the main concerns. Currently it supports reading and writing of RDF/XML and N-Triples, and writing of N3. Rio is part of Sesame, but can also be downloaded and used separately.
    • Elmo is a toolkit for developing Semantic Web applications using Sesame. Elmo wraps Sesame, providing a dedicated API for a number of well known web ontologies including Dublin Core, RSS and FOAF. The dedicated API makes it easier to work with RDF data for the supported ontologies. Elmo also offers a set of tools related to the supported ontologies, including an RDF crawler, a FOAF smusher and a FOAF validator.
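
The "read an RDF file, find the relevant information" scenario mentioned for Sesame can be sketched without any framework at all. The following is plain Python, not Sesame or Rio (which are Java libraries). The regular expression handles only a simplified subset of the N-Triples syntax, and the function names and example.org data are hypothetical.

```python
import re

# Simplified N-Triples line: subject, predicate, object, then a final ".".
# Assumption: this covers IRIs, blank nodes, and plain, language-tagged, or
# typed literals, but it is not a complete implementation of the grammar.
NT_LINE = re.compile(
    r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[\w-]+|\^\^<[^>]*>)?)\s*\.$'
)

FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
FOAF_KNOWS = "<http://xmlns.com/foaf/0.1/knows>"

# Hypothetical sample data.
SAMPLE = """
# two people who know each other
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
<http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""


def parse_ntriples(text):
    """Turn N-Triples text into a list of (subject, predicate, object) tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = NT_LINE.match(line)
        if match is None:
            raise ValueError(f"not a recognizable triple: {line!r}")
        triples.append(match.groups())
    return triples


def find(triples, s=None, p=None, o=None):
    """Return the triples matching the given terms; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = parse_ntriples(SAMPLE)
names = [obj for _, _, obj in find(triples, p=FOAF_NAME)]
print(names)
```

A real framework adds persistent storage, indexing, and inference on top of this basic parse-then-match idea.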

Jess
    Jess is a rule engine and scripting environment written entirely in Sun's Java language by Ernest Friedman-Hill at Sandia National Laboratories in Livermore, CA. Using Jess, you can build Java software that has the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is small, light, and one of the fastest rule engines available. Its powerful scripting language gives you access to all of Java's APIs. Jess includes a full-featured development environment based on the award-winning Eclipse platform.

    A Jess Plugin for Protege is available, integrating Jess development with your ontology.

Jena
  • Jena is a Java framework for building Semantic Web applications. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine. Jena is open source and grew out of work with the HP Labs Semantic Web Programme. Important tools related to the Jena framework include:
    • ARQ, which is a query engine for Jena. ARQ supports multiple query languages (SPARQL, RDQL, and ARQ, the engine's own language), and besides Jena it can be used with general purpose engines and remote access engines. ARQ can also rewrite queries to SQL.
    • Joseki, an HTTP server-based system that supports SPARQL queries. Joseki features a WebAPI for the remote query and update of RDF models, including both a client component and an RDF server. The Joseki server can run embedded in an application, as a standalone program, or as a web application inside a suitable application server (such as Tomcat). It provides the operations of query and update on models it hosts.
The OWL API
  • The OWL API is an open-source Java interface and implementation for OWL, focused on OWL 2, which encompasses OWL-Lite, OWL-DL and some elements of OWL-Full. The OWL API was used to build Protege 4.0 and was developed by CO-ODE, the University of Manchester project that works with Stanford University on Protege. It encompasses tools for the following tasks:
    • RDF/XML parser and writer
    • OWL/XML parser and writer
    • OWL Functional Syntax parser and writer
    • Turtle parser and writer
    • KRSS parser
    • OBO Flat file format parser
    • Support for integration with reasoners such as Pellet and FaCT++
Powl
    Powl is a web-based platform for building applications designed to support collaborative building and managing of ontologies. It supports many of the features of mature tools like Protege, but for web applications that can be used for team development of ontologies. Powl is an open source project that uses PHP and various RDBMS systems on the back-end. Ontowiki is an example of a collaborative application built using Powl.
Visualization and Query Tools
Jambalaya
OntoVista
    The University of Georgia, as described in the next section on Semantic Applications, has built a large amount of interesting semantic software. OntoVista is a particularly useful ontology visualization, navigation, and query tool based on Jambalaya. OntoVista is adaptable to the needs of different domains, especially in the life sciences. The tool provides a semantically enhanced graph display that gives users a more intuitive way of interpreting nodes and their relationships. Additionally, OntoVista provides comfortable interfaces for searching, semantic edge filtering and quick-browsing of ontologies.
SWRL (Semantic Web Rule Language)
    SWRL is intended to be the rule language of the Semantic Web and is based on OWL. It allows users to write rules to reason about OWL instances and to infer new knowledge about those instances.
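
To give a feel for what "inferring new knowledge about instances" means, here is a minimal sketch in plain Python of a single rule that is often used to introduce SWRL: if x has parent y, and y has brother z, then x has uncle z. This is not SWRL syntax and uses no rule engine; the function name apply_uncle_rule and the individuals are hypothetical.

```python
def apply_uncle_rule(facts):
    """Apply: hasParent(x, y) and hasBrother(y, z) implies hasUncle(x, z).

    facts is a set of (predicate, subject, object) tuples. The function
    returns a new set containing the original facts plus anything inferred.
    """
    derived = set(facts)
    for pred1, x, y in facts:
        if pred1 != "hasParent":
            continue
        for pred2, y2, z in facts:
            # The shared variable y must bind to the same individual
            # in both conditions of the rule.
            if pred2 == "hasBrother" and y2 == y:
                derived.add(("hasUncle", x, z))
    return derived


# Hypothetical individuals.
facts = {
    ("hasParent", "alice", "bob"),
    ("hasBrother", "bob", "charlie"),
    ("hasParent", "dana", "erin"),  # erin has no brother, so nothing follows
}

result = apply_uncle_rule(facts)
print(sorted(result - facts))
```

A rule language lets such rules be stated declaratively alongside the ontology, so that a reasoner rather than hand-written code does the matching.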
Pellet
    Pellet is an open source, OWL DL reasoner in Java that is developed, and commercially supported, by Clark & Parsia LLC. Pellet provides standard and cutting-edge reasoning services. It also incorporates various optimization techniques described in the DL literature and contains several novel optimizations for nominals, conjunctive query answering, and incremental reasoning.

    Pronto is an extension of Pellet that enables probabilistic knowledge representation and reasoning in OWL ontologies. Pronto is distributed as a Java library equipped with a command line tool for demonstrating its basic capabilities. It is currently in the development stage: more robust and mature than a mere prototype, but less mature than a production-level system like Pellet.

    Pronto offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”. The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally; for example, risk factors associated with medical conditions like breast cancer.

OWL Ontology Validator
  • This online tool, developed as part of the WonderWeb Project, attempts to validate an ontology against the different "species" of OWL. Any constructs found which relate to a particular species will be reported. In addition, if requested, the validator will return a description of the classes, properties and individuals in the ontology in terms of the OWL Abstract Syntax.
Seamark Navigator
    Seamark Navigator is part of the commercial Information Access Platform from Siderean. Navigator is the relational navigation server component, which discovers and indexes content, pre-calculates relationships and suggests paths for data exploration. Its primary architectural components include a metadata aggregator, a scalable RDF store, and a relational navigation engine, all within an industry-standard Web services interface.
Unstructured Content Mining Tools
Calais
  • The Calais Web Service automatically creates rich semantic metadata for the content you submit – in well under a second. Using natural language processing, machine learning and other methods, Calais analyzes your document and finds the entities within it. Calais goes beyond classic entity identification and returns facts and events hidden within your text as well.
Cortex Competitiva Platform
  • Cortex Competitiva combines state-of-the-art text mining technologies with established data mining techniques. The main modules of the platform are Information Collection, Information Organization and Collaboration, and Information Use Analysis.
IdentiFinder Text Suite
    IdentiFinder Text Suite, a product of BBN Technologies, lets users quickly sift through documents, web pages, and email to discover relevant information. It helps solve the classic problems of text mining: First, how to identify significant documents and then, how to locate the most important information within them.
DL-Learner
    DL-Learner is a tool from AKSW for learning concepts in Description Logics (DLs) from user-provided examples. Equivalently, it can be used to learn classes in OWL ontologies from selected objects. The goal of DL-Learner is to construct knowledge about existing data sets. With DL-Learner, users provide positive and negative examples from a knowledge base for a not yet defined concept. DL-Learner then tries to derive a concept definition such that, when the definition is added to the background knowledge, all positive examples follow and none of the negative examples follow. See also the Wikipedia entry for ILP (Inductive Logic Programming). What DL-Learner considers is the ILP problem applied to Description Logics / OWL.
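
The learning problem described above (find a definition that covers every positive example and no negative one) can be illustrated with a toy sketch in plain Python. This is not DL-Learner's algorithm or API: it assumes that a "concept" is just a conjunction of named classes and it searches candidates by brute force. The individuals and class names are hypothetical.

```python
from itertools import combinations


def covers(concept, individual, background):
    """A conjunction covers an individual that belongs to every class in it."""
    return concept <= background[individual]


def learn_concept(background, positives, negatives, max_size=3):
    """Return the smallest conjunction of classes that covers all positive
    examples and none of the negative ones, or None if there is no such
    conjunction within max_size classes."""
    atoms = sorted(set().union(*background.values()))
    for size in range(1, max_size + 1):
        for candidate in combinations(atoms, size):
            concept = set(candidate)
            if all(covers(concept, p, background) for p in positives) and \
                    not any(covers(concept, n, background) for n in negatives):
                return concept
    return None


# Hypothetical background knowledge: the classes each individual belongs to.
background = {
    "tweety": {"Animal", "Bird", "Flies"},
    "polly": {"Animal", "Bird", "Flies"},
    "pingu": {"Animal", "Bird"},
    "batty": {"Animal", "Flies"},
    "rex": {"Animal", "Dog"},
}

learned = learn_concept(
    background,
    positives=["tweety", "polly"],
    negatives=["pingu", "batty", "rex"],
)
print(learned)
```

No single class separates the examples here, so the search has to move on to two-class conjunctions. A real concept learner works over a far richer space of class expressions and needs a much smarter search strategy than enumeration.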
Transformation Tools
GRDDL
    GRDDL is a mechanism for Gleaning Resource Descriptions from Dialects of Languages. It is a technique for obtaining RDF data from XML documents and in particular XHTML pages. GRDDL provides an inexpensive set of mechanisms for bootstrapping RDF content from XML and XHTML. GRDDL does this by shifting the burden of formulating RDF away from the author to transformation algorithms written specifically for XML dialects such as XHTML. A repository of transformations is available.
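
The basic idea of gleaning triples from markup that already exists can be sketched as follows. GRDDL itself relies on transformations (typically XSLT) associated with the document; in this sketch a small Python function stands in for such a transformation, which is my substitution and not part of the specification. The page content, the URIs, and the choice to read rel/href attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes tag names with the document's XML namespace.
XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical XHTML page whose links carry "rel" annotations.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Hypothetical profile page</title></head>
  <body>
    <a rel="friend" href="http://example.org/bob">Bob</a>
    <a rel="license" href="http://example.org/license">License</a>
    <a href="http://example.org/unrelated">No rel attribute</a>
  </body>
</html>"""


def glean_triples(xhtml_text, page_uri):
    """Extract (subject, predicate, object) triples from annotated links.

    The page is the subject, the rel value is the predicate, and the link
    target is the object. Links without a rel attribute are ignored.
    """
    root = ET.fromstring(xhtml_text)
    triples = []
    for link in root.iter(XHTML + "a"):
        rel, href = link.get("rel"), link.get("href")
        if rel and href:
            triples.append((page_uri, rel, href))
    return triples


gleaned = glean_triples(PAGE, "http://example.org/alice")
for triple in gleaned:
    print(triple)
```

The author of the page writes ordinary XHTML; the burden of producing RDF falls on the transformation, which is the point the GRDDL description makes.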
RDFizers
    The Simile project has developed a large number of "RDFizers," which convert various file formats into RDF. This page also contains links to the many RDFizers developed by other organizations to handle even more document types.
Database Tools
Query Languages and Tools
SPARQL Query Language for RDF
    SPARQL is a W3C specification for querying RDF repositories. It can be used to express queries for native RDF files or for RDF generated from stored ontologies via middleware. The results of SPARQL queries can be result sets or RDF graphs.
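
At its core, a SPARQL query matches triple patterns containing variables against a graph and returns the variable bindings that satisfy all of them. The following plain-Python sketch shows that matching step only. It is not a SPARQL engine and parses no SPARQL syntax; the function names and the tiny graph are hypothetical.

```python
def match_pattern(pattern, triple, bindings):
    """Unify one triple pattern with one triple.

    Terms starting with "?" are variables. Returns the extended bindings,
    or None if the triple does not fit the pattern.
    """
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant term does not match
    return extended


def query(graph, patterns):
    """Return every set of bindings that satisfies all patterns (a join)."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Hypothetical graph, with short names standing in for full URIs.
graph = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
]

# Roughly: SELECT ?friend ?name
#          WHERE { :alice :knows ?friend . ?friend :name ?name }
results = query(graph, [("alice", "knows", "?friend"), ("?friend", "name", "?name")])
print(results)
```

The second pattern reuses ?friend, so only triples about whoever alice knows can satisfy it. That shared variable is what turns two independent lookups into a join.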
Owlgres
    Owlgres is an open source, scalable reasoner for OWL2. Owlgres combines Description Logic reasoning with the data management and performance properties of an RDBMS. Owlgres is intended to be deployed with the open source PostgreSQL database server. Owlgres’s primary service is conjunctive query answering, using SPARQL-DL.
D2RQ
    D2RQ is a declarative language to describe mappings between relational database schema and OWL/RDF ontologies. The D2RQ platform uses these mappings to enable applications to access RDF views on a non-RDF database through the Jena and Sesame APIs, as well as over the Web via the SPARQL Protocol and as Linked Data.
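
The idea of an "RDF view" over a relational database can be sketched with Python's built-in sqlite3 module. This is not D2RQ and does not use its mapping language: the mapping here (table becomes a class, row becomes a resource, column becomes a property) is hard-coded, and the base URI, the employee table, and its contents are hypothetical.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI for generated resources


def rows_to_triples(conn, table, key_column):
    """Present each row of a table as RDF-style triples.

    Each row becomes a resource typed by its table, and each non-key column
    becomes a property. The table name is interpolated into the SQL text,
    so it must come from trusted code, never from user input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        triples.append((subject, "rdf:type", f"{BASE}vocab/{table}"))
        for column, value in record.items():
            if column != key_column and value is not None:
                triples.append((subject, f"{BASE}vocab/{table}_{column}", value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, title TEXT)")
conn.execute("INSERT INTO employee VALUES (1, 'Alice', 'computer specialist')")

db_triples = rows_to_triples(conn, "employee", "id")
for triple in db_triples:
    print(triple)
```

A declarative mapping language makes these choices configurable per table and column instead of fixing them in code, and lets the triples be generated on demand in response to queries.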
Conversion/Transformation Tools
OntoSynt
    OntoSynt provides automatic support for extracting from a relational database schema its conceptual view. That is, it extracts semantics "hidden" in the relational sources by wrapping them by means of an ontology. The approach is specifically tailored for semantic information access, enabling queries over an ontology to be answered by using the data residing in its relational sources. Its web interface accepts an XML representation of an RDBMS schema, which can be generated using a tool like SQL Fairy.
Relational.OWL
    Relational.OWL is an open source application that automatically extracts the semantics of virtually any relational database and transforms this information into RDF/OWL ontologies that can be processed by Semantic Web applications.
Triplify
SQL Fairy
    SQL Fairy is a group of Perl modules that manipulate structured data definitions (mostly database schemas) in interesting ways, such as converting among different dialects of CREATE syntax (e.g., MySQL-to-Oracle), visualizations of schemas, automatic code generation, converting non-RDBMS files to SQL schemas (xSV text files, Excel spreadsheets), serializing parsed schemas (e.g., via XML), creating documentation (e.g., HTML), and more.
Application Servers
OpenLink Virtuoso Universal Server
  • Virtuoso, developed by OpenLink Software, is a complex product that appears to be a total solution for hosting Semantic Web applications, among other uses. In the company's words, from a recent release: "Virtuoso enables end users, systems architects, systems integrators, and developers to interact with data at the conceptual as opposed to the traditional logical level. Data about customers, suppliers, invoices, and orders, stored in existing ODBC- or JDBC-accessible database systems such as Oracle, Informix, Ingres, SQL Server, Sybase, Progress, and MySQL, can be presented in RDF form for use in Semantic Web applications."
  • Virtuoso is also available in an Open Source Edition, a very active project that includes a large number of modules for use with various content management systems. The main difference between the open source and commercial editions of Virtuoso is the Virtual Database Engine, which essentially enables an application to incorporate multiple data servers in its queries.

    Also available as open source from OpenLink is its OpenLink Ajax Toolkit (OAT), which comes with a wide range of user interface and data widgets, as well as complete applications for building data queries, designing databases, and designing web forms. The OpenLink Data Explorer is one of these standalone OAT applications. Widgets that are part of OAT include:

    The standalone applications running on the Open-Source Edition all incorporate widgets from the OAT to create quite robust, desktop-application-like tools (the username/password for all of these is demo/demo):
  • OpenLink also provides OpenLink Data Spaces (ODS), which run on the Virtuoso server, either the commercial or open-source editions. ODS enables developers to create a presence in the Semantic Web via Data Spaces derived from Weblogs, Wikis, Feed Aggregators, Photo Galleries, Shared Bookmarks, Discussion Forums and more. Data Spaces thus provide a foundation for the creation, processing and dissemination of knowledge for the emerging Semantic Web. ODS is pre-installed as part of the demonstration database bundled with the Virtuoso Open-Source Edition. Existing ODS modules include:
Cyc Knowledge Server
Intelligent Topic Manager
  • Intelligent Topic Manager (ITM) is a commercial semantic software platform that enables a wide range of applications in enterprise information systems. ITM is designed to help organizations leverage, organize and model content and knowledge, to manage business reference models and taxonomies, to categorize and classify content, and to empower search. The platform consists of the following components and functionalities:
Oracle Semantic Technologies
  • Oracle Spatial 11g is an open, scalable RDF management platform. It is based on a graph data model in which RDF triples are persisted, indexed, and queried much like other object-relational data types. Application developers can use the Oracle server to design and develop a wide range of semantic-enhanced business applications.
Asio Tool Suite
    Available from BBN, the Asio Tool Suite is focused primarily on building Semantic Web applications by integrating an enterprise's existing databases and systems without the need for complete reengineering. Designed to address the volume, variety, and exponential increase in enterprise data, the Asio Tool Suite supports information discovery via Semantic Web standards and provides for data accessibility via queries posed in a user’s own ontology. The suite further enables integration of systems by building bridges in semantic meaning from one system to another. The suite consists of the following components:
Parliament
    Asio Parliament, released as open source, implements a high-performance storage engine that is compatible with the RDF and OWL standards. However, it is not a complete data management system. Parliament is typically paired with a query processor, such as Sesame or Jena, to implement a complete data management solution that incorporates SPARQL standards. In addition, Parliament includes a high-performance SWRL-compliant rule engine, which serves as an efficient inference engine. An inference engine examines a directed graph of data and adds data to it based on a set of inference rules. This enables Parliament to fill in gaps in the data automatically and transparently, inferring additional facts and relationships in the data to enrich query results.
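    To make the idea of an inference engine more concrete, here is a minimal sketch in Python of forward chaining over a small set of triples. It is not Parliament's API and not a SWRL implementation; the two rules (transitive "subClassOf" and type propagation), the sample data, and every name in the code are assumptions chosen purely for illustration.

```python
# Illustrative forward-chaining inference over RDF-style triples.
# Not Parliament's API: the data, rules, and function names are hypothetical.

def infer(triples):
    """Apply two simple rules repeatedly until no new triples appear."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if o != s2:
                    continue
                # Rule 1: subClassOf is transitive.
                if p == "subClassOf" and p2 == "subClassOf":
                    new.add((s, "subClassOf", o2))
                # Rule 2: an instance of a class is also an instance of its superclasses.
                if p == "type" and p2 == "subClassOf":
                    new.add((s, "type", o2))
        if new <= facts:
            return facts
        facts |= new


data = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("rex", "type", "Dog"),
}
closure = infer(data)
```

    A query for everything of type "Animal" against the enriched set would now find "rex", even though that fact was never stated directly. That is the sense in which an inference engine fills in gaps in the data.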
Cartographer
    Asio Cartographer is a graphical ontology mapper based on SWRL. It utilizes the core functionality of BBN's Snoggle open-source mapping tool to assist in aligning OWL ontologies. It lets users visualize ontologies and then draw mappings between them on an intuitive graphical canvas. Cartographer then transforms those maps into SWRL/RDF or SWRL/XML for use in a knowledge base.
Scout
    Asio Scout provides semantic bridges to relational databases and web services, letting an organization keep its existing systems in place for as long as necessary (for example, to support ongoing operations). Scout's semantic bridges act like any passive data consumer, but unlike their counterparts, they work in concert with Asio Semantic Query Distribution's high-level perspective to enable consolidated knowledge discovery that wasn't previously conceivable. Scout can be used for web portals, standalone desktop applications, or web-enabled applications.
Semantic Application Demos
Browsers and Search Portals
  • Disco - Hyperdata Browser is a simple browser for navigating the Semantic Web as an unbound set of data sources. The browser renders in HTML all information that it can find on the Semantic Web about a specific resource. This resource description contains hyperlinks that allow you to navigate between resources. While you move from one resource to another, the browser dynamically retrieves information by dereferencing HTTP URIs and by following rdfs:seeAlso links.
  • Umbel Subject Concepts Explorer is a lightweight ontology structure for relating Web content and data to a standard set of subject concepts. Its purpose is to provide a fixed set of reference points in a global knowledge space. These subject concepts have defined relationships between them, and can act as binding or attachment points for any Web content or data.
  • OpenLink Data Explorer is one product developed from the open-source version of the Virtuoso Universal Server product. Virtuoso is the platform used by the DBpedia project, including the demos on the DBpedia page. The demo below shows the XHTML view option of a Data Viewer ontology query.
  • Zitgist DataViewer lets users browse linked data on the web, starting from an RDF or OWL ontology URL.
  • The Sindice Semantic Web Index monitors and harvests existing web data published as RDF and Microformats and makes it available under a coherent umbrella of functionalities and services. Its index of data is presented as a search portal much like Google. Sindice was created at DERI, the world’s largest institute for Semantic Web research. It is based on DERI’s unique cluster technology, which indexes and operates over terascale semantic data sets (trillions of statements) while also providing very high query throughput per cluster size. Using this technology, Sindice performs sophisticated reasoning that dramatically enhances data reusability, search precision, and recall. It obtains data through focused crawling methods that detect and concentrate on metadata-rich internet sources.
  • The RKB Explorer is an application built using awards data from the National Science Foundation (NSF). It has used this data to build ontologies around NSF grants, and users can search and browse the data through the Explorer. All URIs on this domain are resolvable, and search results deliver HTML or RDF, depending on the content. The browse interface provides viewing and navigating using RDF triples, and the query interface provides access using SPARQL. I discovered this useful application through a search on "NSF funding" using Sindice.
  • Marbles Linked Data Browser is a server-side application that formats Semantic Web content for XHTML clients using Fresnel lenses and formats. Colored dots are used to correlate the origin of displayed data with a list of data sources, hence the name. Marbles provides display and database capabilities for DBpedia Mobile.
  • The Cyc Foundation Concept Browser lets users search and browse the content of the OpenCyc knowledge base.
  • Brownsauce is a Semantic Web browser that lets users browse RDF files on the web. It runs as a local Java client and has a built-in Jetty web server. Brownsauce uses the Jena Semantic Web framework.
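The link-following behavior described for the Disco browser above (dereference a URI, collect what it says about a resource, then follow rdfs:seeAlso links) can be sketched roughly as follows. This is not Disco's code. To keep the example self-contained, a dictionary stands in for the Web, and all URIs, data, and function names are made up.

```python
from collections import deque

SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Stand-in for the Web: each URI "dereferences" to a list of triples.
# All URIs and data here are hypothetical.
FAKE_WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
        ("http://example.org/alice", SEE_ALSO, "http://example.org/alice/more"),
    ],
    "http://example.org/alice/more": [
        ("http://example.org/alice", "http://xmlns.com/foaf/0.1/homepage", "http://example.org/"),
        # A link back to the first document, to show that cycles are handled.
        ("http://example.org/alice", SEE_ALSO, "http://example.org/alice"),
    ],
}


def describe(resource, dereference, max_docs=10):
    """Collect statements about `resource`, following rdfs:seeAlso links breadth-first."""
    seen, queue, found = set(), deque([resource]), []
    while queue and len(seen) < max_docs:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in dereference(uri):
            if s != resource:
                continue
            if p == SEE_ALSO:
                queue.append(o)  # another document to fetch later
            else:
                found.append((s, p, o))
    return found


description = describe("http://example.org/alice", lambda uri: FAKE_WEB.get(uri, []))
```

In a real browser the `dereference` argument would be a function that performs an HTTP request and parses the returned RDF; swapping it for a dictionary lookup keeps the traversal logic visible.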
Ontology Viewers and Query Tools
  • DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia and to link other datasets on the Web to Wikipedia data. DBpedia is one of the projects developed/sponsored by AKSW. A wide variety of articles and publications about DBpedia have been published (see the Resources section of this report).
  • jSpace is a WebStart Java application that demonstrates how one might search and query a given ontological database. There are several example databases available to download for use with jSpace. jSpace's development was apparently inspired by mSpace. (mSpace was an innovative, but now defunct, project that attempted to merge the power of Google with the powerful interface of iTunes. Although the mSpace demo of a classical music explorer is not accessible now, it's well worth checking out the video demos of it.)
  • Owlsight is an innovative web application that uses the Google Web Toolkit and the Ext JavaScript library to let users navigate OWL ontologies, browsing the relationships between classes, properties, and instances. Owlsight uses the Pellet ontology reasoner.
  • OpenCyc for the Semantic Web is both a project and an OWL ontology browser. Using this tool, users can access the entire OpenCyc content as downloadable OWL ontologies as well as via Semantic Web endpoints (i.e., permanent URIs). These URIs return RDF representations of each Cyc concept as well as a human-readable version when accessed via a Web Browser.
Knowledge/Content Management
  • The KiWi wiki project proposes a new approach to knowledge management that combines the wiki philosophy with the intelligence and methods of the Semantic Web. (KiWi stands for "Knowledge in a Wiki.")
  • DeepaMehta is a software platform for knowledge management. Knowledge is represented in a semantic network and is handled collaboratively. The DeepaMehta user interface is completely based on Mind Maps / Concept Maps. Instead of handling information through applications, windows and files, with DeepaMehta the user handles all kinds of information directly and individually.
  • Semantic MediaWiki and SMW+ are extensions to the MediaWiki platform, described elsewhere in this report.
Application Repositories
  • MIT's Simile project has been extremely creative and productive in applying concepts of linked data, RDF, and the Semantic Web generally to demonstration applications, all available as open source. (Simile is an acronym for "Semantic Interoperability of Metadata and Information in unLike Environments".) Some of its projects are included elsewhere in this report, but here is a list of some others relevant to the Semantic Web:
    • Longwell, a server application that applies faceted browsing to the visualization of RDF stores.
    • PiggyBank is a Firefox add-on that enables users to develop "mashups" of web data by using "screen scrapers." The software also allows users to tag information found and embed RDF into their content.
    • RDFizers, described elsewhere in this report.
    • Referee, a server application that creates browsable RDF files from web server logs.
    • Welkin, an RDF visualizer built as a client-side Java application. (Note: I couldn't get it to run on my Mac, even though MIT makes a Mac OS X disk image available.)
    • Fresnel, a vocabulary for displaying RDF.
    • Banach, a collection of operators that work on RDF graphs to infer, extend, emerge or otherwise transform a graph into another.
    • Data Collection, a project that aims to develop a collection of RDF data sets that are generally useful for the metadata research and tools community.
  • DERI (Digital Enterprise Research Institute) International is the collection of bi-lateral agreements between like-minded institutes working on the Semantic Web and Web Science. Its mission is to exploit semantics for people, organizations, and systems to collaborate and interoperate on a global scale. DERI conducts and funds research in Semantic Web technologies, conducts projects that have led to numerous prototype applications, and develops ontologies. The following are a few interesting links from DERI's Irish branch in Galway:
    • Research Clusters covering such topics as eLearning, Semantic Reality, Semantic Web Services, Industrial and Scientific Applications of Semantic Web Services, and Social Software. Each cluster has its own website and projects.
    • Research Projects, a lengthy list of ongoing projects.
    • Tools, a lengthy list of software tools available for download, typically from SourceForge.
  • University of Georgia's Large Scale Distributed Information Systems has a wide array of semantic applications available. The online repository has descriptions, downloads, and online demos. The applications cover such functions as visualization, ontology queries, ontology browsing, web services, and more.
  • 10 Semantic Apps To Watch: From the ReadWriteWeb site, this is an intriguing list of new semantic-web-related applications that are now available. The article first explains what it means by a "Semantic Application," and then briefly describes each application's innovative use of this new technology. The ten applications listed are:
  • It's also interesting to read the comments at the end of this article, many of which are from readers pointing out other semantic applications they have discovered.
Semantic Website Enhancements
Semantic Web Crawling: A Sitemap Extension
    This specification allows website managers to provide an RDF sitemap which would be visible to users browsing the Semantic Web.
Triplify
    Triplify is an open-source, light-weight add-on to web applications that can read the content of the application's relational database(s) and expose their inherent semantics. According to the Triplify website, for a typical Web application a configuration for Triplify can be created in less than an hour. Triplify is based on the definition of relational database queries for a specific Web application in order to retrieve valuable information and to convert the results of these queries into RDF, JSON and Linked Data. A "triplified" web application can then provide its data to other applications on the web, enabling use of its information in "mashups."

    The Triplify project already has configurations for a variety of widely used content management systems, such as OpenConf, WordPress, Drupal, Joomla!, osCommerce, and phpBB. (The page that has links to these configurations also has a great list of other Semantic Web resources.) Triplify is one of the applications developed by AKSW. (I plan to download Triplify and integrate it in an instance of WordPress on my home computer.)
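    The general idea of turning the results of relational queries into RDF can be sketched in a few lines. The following is not Triplify itself (which is a PHP add-on with its own configuration format); it is a minimal Python illustration using an in-memory SQLite table. The base URI, the table, the vocabulary URIs, and the convention that the first column is the record's ID are all assumptions of this sketch.

```python
import sqlite3

BASE = "http://example.org/blog/"  # hypothetical base URI for this sketch


def rows_to_ntriples(conn, query, resource_type):
    """Run a SQL query and emit one N-Triples line per non-null column value.

    Assumes the first column identifies the record; remaining columns become properties.
    """
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    lines = []
    for row in cur:
        subject = f"<{BASE}{resource_type}/{row[0]}>"
        for name, value in zip(columns[1:], row[1:]):
            if value is None:
                continue
            # Minimal literal escaping (backslashes and double quotes only).
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}vocab/{name}> "{literal}" .')
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'Hello \"world\"', 'alice')")
triples = rows_to_ntriples(conn, "SELECT id, title, author FROM posts", "post")
```

    Each row of the query result becomes a resource with its own URI, and each column becomes a property of that resource, which is what lets other applications consume the data without knowing anything about the underlying database schema.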

Microformats
    Microformats are loosely related to the Semantic Web through their use of RDF-like property names in HTML class attributes. Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. They are highly correlated with semantic XHTML, sometimes referred to as "real world semantics" or "lossless XHTML." Microformats are designed to enable more and better structured blogging and web publishing. The Microformats site provides an array of code and tools for use in producing markup in microformats.
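    As a rough illustration of how class names carry the data, the sketch below pulls a few hCard properties out of a snippet of markup using only Python's standard library. The sample markup and the parser class are made up for this example, and the parser handles only flat, non-nested elements; a real microformats parser does considerably more.

```python
from html.parser import HTMLParser

HCARD_PROPS = {"fn", "org", "tel"}  # a small subset of hCard property names


class HCardParser(HTMLParser):
    """Collect the text of elements whose class attribute names an hCard property."""

    def __init__(self):
        super().__init__()
        self.current = None  # property name of the element being read, if any
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [c for c in classes if c in HCARD_PROPS]
        self.current = matches[0] if matches else None

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None


markup = """
<div class="vcard">
  <span class="fn">Ada Example</span>,
  <span class="org">Example Labs</span>,
  <span class="tel">555-0100</span>
</div>
"""
parser = HCardParser()
parser.feed(markup)
card = parser.card
```

    The same markup still renders as ordinary text for a human reader, which is the "humans first, machines second" idea in practice.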
RDFa in HTML
    RDFa in HTML is a proposed W3C specification that enables markup of RDF-like syntax in XHTML content. RDFa in XHTML provides a set of XHTML attributes to augment human-readable content with machine-readable hints. It enables the expression of simple and more complex datasets using RDFa, and in particular turns the existing human-visible text and links into machine-readable data without repeating content. The goals and approach of this specification are similar to those of Microformats, but it extends XHTML with an RDF-like syntax rather than using CSS classes.
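    To show what "machine-readable hints" in attributes might look like, here is a small Python sketch that reads two RDFa attributes, `about` and `property`, from a snippet of XHTML and produces subject/property/value triples. It covers only a tiny slice of RDFa (no `rel`, `content`, `typeof`, or prefix resolution), and the sample markup and class name are hypothetical.

```python
from html.parser import HTMLParser


class RDFaSketch(HTMLParser):
    """Extract (subject, property, text) triples from `about` and `property` attributes."""

    def __init__(self):
        super().__init__()
        self.subject_stack = [None]  # innermost enclosing `about` value
        self.pending = None          # property name awaiting its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # An element without `about` inherits the subject of its parent.
        self.subject_stack.append(a.get("about", self.subject_stack[-1]))
        self.pending = a.get("property")

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((self.subject_stack[-1], self.pending, data.strip()))

    def handle_endtag(self, tag):
        self.subject_stack.pop()
        self.pending = None


markup = """
<div about="http://example.org/post/1">
  <h1 property="dc:title">Semantic Snapshot</h1>
  <span property="dc:creator">Ada Example</span>
</div>
"""
parser = RDFaSketch()
parser.feed(markup)
triples = parser.triples
```

    Note that the title and author appear only once in the markup: the visible text doubles as the data value, which is the "without repeating content" point made above.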
Exhibit
    Exhibit is a three-tier web application framework written in JavaScript, which you can use with various kinds of data files, including JSON and RDF, to produce knowledge-enhancing "mashups" like those built on Google Maps. Exhibit creates interactive user interfaces displaying record data sets on maps, timelines, scatter plots, interactive tables, etc. Exhibit is one of the projects in knowledge management developed by MIT, partly with NSF funding.
Semantic MediaWiki
    Semantic MediaWiki (SMW) is a free extension of MediaWiki – the wiki system powering Wikipedia – that helps to search, organise, tag, browse, evaluate, and share the wiki's content. While traditional wikis contain only texts that computers can neither understand nor evaluate, SMW adds semantic annotations that bring the power of the Semantic Web to the wiki.
SMW+
    SMW+ is Ontoprise's production version of the open source Semantic MediaWiki + Halo Extension software, which was originally developed as part of the 2003-04 Halo project for scientific information discovery. SMW+ makes the process of annotating wiki content much easier by adding a variety of useful interface tools, and it also helps writers research information by using the wiki's built-in ontology browser. SMW+ is designed to enable and enhance knowledge collaboration in organizations. It's available as a free download from Sourceforge, or as a reasonably priced bundled version for Windows. Ontoprise also offers service contracts for the product. The impressive detailed list of features on the Ontoprise website gives a good overview of SMW+ capabilities. These include:
    • Semantic Toolbar: Lets users create, inspect and alter semantic annotations in the wiki text without knowing the annotation syntax.
    • Advanced Annotation Mode: In this mode, wiki pages are displayed in the same way as they are displayed in the standard view mode. However, users can easily add annotations by simply highlighting the word or passage they want to annotate.
    • Ontology Browser: Allows easy navigation through the wiki's ontology without the need to access individual articles. It helps the user to understand the ontology and to keep an overview about it.
    • Question Formulation Interface: Normally, making queries against the semantic wiki involves knowing and using a complex syntax. The Question Formulation Interface provides a graphical interface that lets inexperienced users easily compose their own queries.
    • Auto completion: This tool makes it much easier for users to generate annotations. With auto completion activated, users don't have to worry about the correct spelling of an article’s or property's name, because the tool extracts possible completions from the semantic context. For example, it checks what attribute values are possible for a particular attribute and shows only these to the user. This tool is used in the wiki text editor, the semantic toolbar, the query interface and the combined search.
    ARC
      ARC is an API for LAMP-based (Linux-Apache-MySQL-PHP) websites. Its goal is to reach out to the larger Web developer community, to enable the combination of efforts like microformats with the utility of selected RDF solutions such as agile data storage, run-time model changes, standardized query interfaces, and mashup chaining. ARC tries to keep things simple and flexible. All features are backed by practical use cases. One of the underlying premises of ARC is that RDF is a productivity booster that can make website implementation much faster if it's used pragmatically.

      ARC includes the following capabilities:

    • Parsers for RDF/XML, Turtle, SPARQL + SPOG, Legacy XML, HTML "tag soup," RSS 2.0, and others.
    • Serializers for N-Triples, RDF/JSON, RDF/XML, Turtle, SPOG dumps.
    • RDF Storage using MySQL with support for SPARQL queries
    • SemHTML RDF extractors for Dublin Core, eRDF, microformats, OpenID, RDFa
    • Use of remote stores, allowing the website to query remote SPARQL endpoints as if they were local stores (results are returned as native PHP arrays)
    • SPARQLScript, a SPARQL-based scripting language combined with output templating
    • Light-weight inferencing
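      Querying a remote SPARQL endpoint, as mentioned in the capability list above, comes down to sending the query text over HTTP. The sketch below is not ARC's PHP API; it is a Python illustration of how a SELECT query can be packaged as a GET request with the query in a `query` parameter, as the SPARQL Protocol describes. The endpoint URL is hypothetical, and no network request is actually made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
""".strip()


def build_select_url(endpoint, query):
    """Return a GET URL carrying the URL-encoded query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_select_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL should give back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

      Fetching `url` with an HTTP client and an Accept header such as `application/sparql-results+json` would typically return the result bindings, which a wrapper library can then hand back as native data structures.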
    ARC applications and websites
      Of as much interest as ARC itself are the numerous applications and extensions that have already been built with it, many of which are useful for semantically enhancing websites on their own. The following are a few examples:
      • Trice - A Semantic Web framework (still in development).
      Calais Marmoset
        Marmoset, one of several Semantic Web tools from the OpenCalais project, is a simple yet powerful tool that makes it easy for publishers to generate and embed metadata in their content in preparation for Yahoo! Search's new open developer platform, SearchMonkey, as well as other metacrawlers and semantic applications. Marmoset uses the OpenCalais web service, which can provide search engine crawlers with rich semantic data to consider when they index a site's pages. Yahoo!'s search engine can analyze this semantic data, provided in Microformats, and other search engines are likely to follow. As a result, users accessing a Marmoset-enhanced website through search engines will get better targeted results.
      Other Resources
      Ontology Libraries
        One of the best features of ontologies is their design for reuse. It's not clear to me what happens when you encounter a dozen ontologies for "person" or "job", etc., in the ontology libraries on the web, but it's certainly useful that you can search for existing ontologies and bring the objects you want to model into your own ontology. There are a few ontologies for commonly used objects that are nearly de facto standards now:
        The following is a list of other resources available for finding ontologies on specific topics:
      • Simile Ontologies This library includes those developed by MIT as part of the Simile project as well as a list of others that have been used by the project.
      • Swoogle Swoogle is a research project being carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland, Baltimore County.
      • Google Google can restrict its search to files of type "owl", as this sample search shows.
      • OntoSelect Ontology Library This library has an ontology search system with several unique and innovative features, including use of Wikipedia topics as the basis for one type of search.
      • BioPortal BioPortal is a sophisticated web application for accessing and sharing biomedical ontologies. It features several advanced search and visualization tools, as well as tools for mapping concepts between different ontologies.
      • SchemaWeb This is a comprehensive directory of RDF schemas which, in addition to typical browse-and-search interfaces, also provides an extensive set of web services to be used by software agents for processing RDF data.
      • Watson This link points to Watson's terrific web interface, which is one of the best for searching out ontologies that match your topics of interest. Watson also has a Protege plugin, but I haven't been able to make it work. The plugin, when working, would let a developer search and add classes to their ontology directly from within Protege.
      • TONES Ontology Repository This repository is primarily designed to be a central location for ontologies that might be of use to ontology tools developers for testing purposes.
      • Ping the Semantic Web Developed as a free web service by Zitgist, a company "incubated" by OpenLink, PingtheSemanticWeb (PTSW) is an archive of recently created or updated RDF documents on the web. When such a document is created or updated, its author can notify PTSW by pinging the service with the document's URL. Crawlers and other software agents use PTSW to learn when and where the latest updated RDF documents can be found. This dynamically updated library displays the 25 most recently updated ontologies, in real time. Using PTSW's data store, you can retrieve data on all RDF files by namespace or by class, with the option to download the files.
      Papers, Projects and Documentation
      • W3C Semantic Web Activity This portal can be thought of as the Semantic Web's "Home Page." It brings together a vast amount of primary source documentation of the Semantic Web's languages and other standard specifications, including OWL, RDF, RDFa in XHTML, and SPARQL. In addition, this portal gathers all the major ongoing projects involving the Semantic Web and the groups conducting them. The page also lists a large number of publications and presentations on Semantic Web topics.
      • Rich Tags This paper describes a proposal/project for developing a system that uses semantic tags for enhancing the searchability of web pages. (The proposal sounds similar to the W3C specification for RDFa in XHTML.)
      • Building A Semantic Website This article is a little old (2001), but has a good overview of the steps and components of building a web application using RDF ontologies.
      • TONES TONES is a European Union research project into the design and use of Thinking ONtologiES. Begun in 2005, it is scheduled to complete its work in 2008. The TONES website has links to all of the outputs of the project, including software tools and research papers. This PDF contains a 2006 presentation overview of the TONES project.
      • RapidOWL This methodology for developing OWL ontologies is based on the idea of iterative refinement, annotation and structuring of a knowledge base. A central paradigm of the RapidOWL methodology is its concentration on the smallest possible information chunks. The collaborative aspect comes into play when those information chunks are selectively added, removed, or annotated with comments or ratings. The design rationales for the RapidOWL methodology are to be light-weight, to be easy to implement, and to support spatially distributed and highly collaborative scenarios. This methodology is implemented in the OntoWiki software project.
      • Linked Data Comes of Age This very useful article clearly explains what is meant by linked data based on RDF and how it fits into the overarching vision of the Semantic Web.
      • Zitgist's Papers and Reports This is a useful list of resources on subjects relevant to Semantic Web research. The Zitgist Lab site also has a good page of documents on Best Practices for RDF.
      • RDF Schemas This site has a clear explanation of the various "vocabularies" used to develop ontologies: RDF, RDFS, OWL, and Dublin Core. The site also has a terrific list of resources for programmers.
      • Nodalities Magazine Sponsored by Talis, this free, bimonthly online magazine (released in PDF format) tries to bridge the divide between those building the Semantic Web and those interested in applying it to their business requirements. The magazine is supported by the Nodalities blog, podcasts, and Semantic Web development work.
      • DERI Papers and Reports This site contains a large collection of research papers and technical reports produced by DERI International.
      Business Resources
        This list includes companies I've encountered that appear to have substantial expertise in applying Semantic Web technologies to practical business requirements.
      BBN Technologies
        BBN is a technology company with a broad range of expertise, services, and products, including support for Semantic Web application development. As an indication of the impressive expertise of this company, BBN was the prime contractor for DARPA (Defense Advanced Research Projects Agency) in development of DAML (DARPA Agent Markup Language), which then led to their development of OWL. BBN also provides the Asio Tool Suite for third-party development and the open-source Snoggle and Parliament tools.
      Cycorp
        Cycorp is a leading provider of semantic technologies that bring intelligence and common sense reasoning to a wide variety of software applications. The Cyc software combines ontologies and knowledge bases with a powerful reasoning engine and natural language interfaces to enable the development of novel knowledge-intensive applications.
      Clark & Parsia
        Clark & Parsia is a small R&D firm—specializing in Semantic Web and advanced systems—based in Washington, DC. They have expertise in a range of semantic-web technologies, including OWL, RDF, reasoning at scale, and ontology development. They offer commercial support for Pellet, a best-of-breed Open Source OWL DL reasoner in Java, and related systems.
      Semantic Arts
        This company helps companies (medium/large with 1,000 to 10,000 employees) migrate to semantically-based SOAs (Service Oriented Architectures).
      Zitgist
        Zitgist has a number of interesting products for viewing and querying the Semantic Web, as well as offering services for ontology development, content conversion, and web services. They also provide several open-source products for both consumer and corporate use in furthering use of the Semantic Web.
      Semantic Web Company
        The Semantic Web Company (SWC), based in Vienna, Austria, provides companies, institutions and organizations with professional services related to the Semantic Web, semantic technologies and Social Software. They provide services in consulting, education, and project management, among others.
      Talis
        Talis has developed its own application development platform—the Talis Platform—and also builds Semantic Web applications for other organizations. To date, Talis' applications have been geared to meeting the needs of libraries and academic institutions.
      Semsol
        Semsol offers a wide range of Semantic Web-related services, from consulting and data modeling to interface design and production. Semsol is a pioneer in bringing Semantic Web technologies to widely deployed server and database environments. Semsol is the company behind the open-source tool ARC, as well as several of the applications built on top of ARC, including Trice, SPARQLBot, and paggr (referenced earlier).
      Cortex
        Cortex's software platform and consulting business is based on their Competitiva system. Cortex’s technology proposes to mine unstructured data on the Web, using Competitiva's intelligent system to automatically convert pages and documents to a semantic format (i.e. RDF). Cortex has an R&D team working to bridge the Semantic Web gap by automatically enriching text with semantic content for themselves and their customers.
          
      April 12th, 2006

      Web-Based Collaborative Editing: Twiki, Tiddly, or TikiWiki?

      I spent a few weeks in December 2005 investigating the universe of wiki software, and confirmed what I already suspected: It’s a very big universe with many wikis! It would be impossible to explore them all, so I first tried to come up with a short list of wiki engines to focus on. Fortunately, there are a number of excellent sites that attempt to provide matrices of wiki software functions and abilities. Here are a few I used and recommend:

      After studying these various resources, I was able to narrow the list of wikis down to the following:

      MediaWiki was the default choice, since I assumed it was probably the best of the lot, given its starring role in powering Wikipedia and just about every other high-profile wiki you encounter on the web. After a painless default installation of MediaWiki, I had the usual MediaWiki shell and did a few quick walk-throughs of the structure just to make sure all the plumbing was in place. It seemed to be, so I proceeded to install a few of the others from my short list.

      In fairly quick succession, I installed Dokuwiki, PMwiki, and Tikiwiki, reviewed their documentation and capabilities, and did some basic configurations. They all seemed to be reasonably good, but none was noticeably superior, at first glance, to my initial configuration of MediaWiki. It seemed to make sense to stick with MediaWiki, given its large market share and equally large mind-share.

      So, over a period of about 2 days, I began trying to configure MediaWiki to do some things beyond its default behavior–things I knew would be needed to provide a useful wiki for my target, non-technical clientele.

      What a mess! I had spent 2 solid days without accomplishing much of anything toward setting up the desired wiki, which by the way was intended for use by a Federal organization that was interested in testing the use of wikis for developing and maintaining standard operating procedures for its divisions and branches.

      Here is a summary of the problems I encountered with MediaWiki:

      1. Basic help on structured wiki markup was not available from within the software. In fact, no help files were loaded by default. Users are expected to create their own help pages.
      2. The software’s documentation is terrible. The main problem is that there are so many sources of information that you get conflicting instructions. Many of the conflicts have to do with the various versions of MediaWiki (1.3, 1.4, 1.5, etc.).
      3. Creating simple navigation is quite difficult. One approach to navigation is to use “sub-pages,” but then forming links is tricky, and the page names include their parents by default. In other words, the relationships are discovered strictly by naming. Using piping, it’s possible to make the link text look OK, but the titles on the pages are another issue.
      4. MediaWiki includes no basic, web-based administration tools at all. In fact, there’s no detection of sysadmin capability at all in the interface. To change the links in the Navigation box, for example, it turns out (after hours of hunting) that you are supposed to change the text in a page called Special:Allmessages. Not exactly intuitive, and it’s set up by default so as to be editable by anyone.
      5. Another useful navigation feature, breadcrumbs, doesn’t exist and can’t be created without custom coding. (There’s an extension for this, but it only works in an older version of MediaWiki.)
      6. Skinning is also very difficult compared with the other wiki software I had looked at.
      7. A basic requirement for this project, one that I understood was not natively wiki-like, was the need for some basic authentication and the ability to write-protect certain parts of the wiki tree for different groups. MediaWiki has a plugin for authentication, but it turns out that anyone who has administrator privileges can edit any part of the tree, and that wasn’t going to be sufficient in my security-conscious Federal agency.

      After this experience, I decided to return to the drawing board and take a second look at the packages on my short list. I also added a new one: Twiki. It’s written in Perl and uses flat files, but appears to be much more “mature” than some of the others.

      In general, my impression after working with these various software packages is that wiki software is not nearly as “mature” as blog software. I was looking for an open-source wiki that would be as powerful as WordPress is in the blog world, while also being as easy to design, configure and administer as WordPress.

      Twiki wasn’t much better, and neither was MoinMoin, which I also ended up checking out (even though MoinMoin is written in Python, and I had no Python programmers to call on). Despite much positive press, MoinMoin has the same deficiencies as other wiki software. And what are those?

      Basically, wikis were developed for use by programmers as a way of sharing information on software projects. They developed around a culture of highly sophisticated hacker-types who didn’t need a lot of hand-holding when it came to navigation. The main concern was to allow rapid development of pages on a new topic, with automatic links to pages that hadn’t yet been written (but which needed to be written). Wikis were designed to grow organically, as one writer filled in the blanks in another’s page by adding information to it through hyperlinks, or as multiple writers contributed to fleshing out the details on a particular topic. In both cases, the result was to produce a decentralized information resource that relied primarily on search for finding things.

      On Wikipedia today, it’s become clear to those “in charge” that strong editorial oversight is needed to keep a wiki useful. For one thing, wikis don’t automatically understand synonymous terms. One person may write a page that has a link to a new page called “WikiSystems”, and another may already have filled in a page called “WikiSoftware.” Unless someone were watching “from above,” you could end up with two pages that covered pretty much the same ground.

      Also, notice the terms “WikiSystems” and “WikiSoftware.” In wikis, the default way of linking is to name new pages in what is known as “camel case”: two words “munged” together, each having an initial cap. Wiki software is designed to recognize camel-cased terms and to automatically hyperlink them. Again, this is useful in its original conception, but it’s not particularly intuitive for a nontechnical user base such as you would find in most business or government organizations.
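
      To make the mechanics concrete, here is a minimal Python sketch of camel-case auto-linking. The regular expression, the URL scheme, and the trailing “?” link for unwritten pages are my own illustrative assumptions; each wiki engine defines its own rules.

```python
import re

# A "WikiWord" here is two or more capitalized runs joined together, e.g. "WikiSoftware".
# This pattern is an illustrative assumption, not the rule used by any particular engine.
WIKI_WORD = re.compile(r"\b(?:[A-Z][a-z0-9]+){2,}\b")


def link_wiki_words(text, existing_pages):
    """Turn camel-cased terms into links and flag pages that don't exist yet."""

    def replace(match):
        word = match.group(0)
        if word in existing_pages:
            return f'<a href="/wiki/{word}">{word}</a>'
        # Unwritten pages get a trailing "?" link inviting someone to create them.
        return f'{word}<a href="/wiki/{word}?action=edit">?</a>'

    return WIKI_WORD.sub(replace, text)


pages = {"WikiSoftware"}
print(link_wiki_words("See WikiSoftware and WikiSystems for details.", pages))
```

      Note that nothing in this logic knows “WikiSystems” and “WikiSoftware” cover the same ground. The software matches names only, which is why the editorial oversight mentioned above is still needed.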

      Another area that many wikis don’t handle well is authentication and access control. Most wikis are designed to allow content editing by anyone. Most also allow administrators to restrict editing to registered users only. However, the ability to restrict access to certain pages to only certain people is not a native ability in most wiki systems.

      Before I get around to describing the software I ultimately selected, I want to include my impressions of a few commercial software packages that have developed in the last year in an attempt to feed the growing market for wikis in corporate Intranets. One of the most well-known is Jotspot, an outsourced wiki system that can be purchased for a monthly fee. Jotspot is probably the most advanced wiki of this type, although since December there have been a fairly large number of newer entrants to the field, and it’s possible that Jotspot has some good competitors by now. Jotspot is actually more of a full-blown Intranet than a wiki. Indeed, it shares this characteristic with Twiki, which branches out way beyond the central wiki functionality. Besides being a wiki, Jotspot (and Twiki) comes with a large number of plug-in applications that can be used for various Intranet functions (e.g., Project Management, Bug Reporting, Company Directory, Knowledge Base, Call Log Management, Blogging, Group Calendaring, Meeting Management, Polls and surveys, Personal to-do lists, etc.). The hosted version has a reasonable price tag, maxing out at $199 a month for unlimited users.

      Jotspot also has an enterprise version for companies that want to host the software themselves. I set up a test wiki at Jotspot, and although it definitely has a lot to offer, it also isn’t nearly as configurable as one of the open-source packages. In addition, I felt certain I could find a perfectly good wiki package for my target organization without investing a lot of money.

      Another impressive, hosted wiki-like system is Backpack, and I also set up a test there. However, Backpack is designed to work best as a personal wiki, rather than for collaboration. The same company also makes a web application called Basecamp that looks like an ideal solution for project management uses, but is not designed for documentation or knowledge management–the two main uses that this pilot wiki would be put to.

      And if anyone was interested in a personal wiki, I don’t think you could do much better than Tiddlywiki, an amazing, rich-web interface “wiki on a stick” that literally packs all of its information into a single portable file. It works an amazing amount of magic that could possibly be useful collaboratively, but that is designed to work best for individuals.

      Finally, I looked at Projectforum, a commercial package that the customer was interested in. It turns out that Projectforum is not a wiki system, actually. Rather, it’s a discussion forum package (there are hundreds–possibly thousands–of such packages) that is trying to leverage the buzz around the term “wiki” and RSS.

      The critical difference is that a wiki is primarily a content management system, not a system for user discussions. MediaWiki uses the term “collaborative editing,” because wikis typically have built-in discussion forums for each piece of content that gets added to the wiki. For example, if I post a Standard Operating Procedure on designing a website, readers would have the ability to create a discussion about that SOP. Also very important is the ability for users to interlink content into a growing content tree, producing in the end a very useful knowledge-base of information on a given topic.

      Projectforum doesn’t have those features, and is missing other standard wiki features as well. As its name implies, Projectforum is actually designed for project management rather than content/document management, and it excels at the collaborative discussion part of project management. In that sense, it is similar to Basecamp.

      So after this market review, I had almost concluded that no wiki was really yet up to the challenge I was hoping to put it to, when I decided to try a relatively new, little known package called Wiclear. After reading through the website documentation, I tried to quell my growing excitement, because on paper at least, Wiclear was designed to overcome all of the shortcomings that were so obvious in all the wikis I’d tried.

      Developed by a French programmer and modeled after a French blog system called Dotclear, Wiclear shares with nearly all other wikis the virtue of being open-source. Meaning, I can freely download the source code and install it. Wiclear is written using PHP, an increasingly popular web programming language, and the open source database MySQL. Since I happen to have some expertise in both, I felt comfortable with the prospect of possibly having to tweak the system to my requirements.

      Indeed, after only 3 hours of work, I was able to configure Wiclear with all the basic requirements:

      • Apply a customized style sheet
      • Customize the section navigation
      • Customize the page elements
      • Customize the heading
      • Set up test users
      • Enter test content
      • Set up appropriate help documentation for a wiki newbie

      Compared with my experience with the other wiki software–in particular, with MediaWiki–Wiclear was very easy to work with. Furthermore, Wiclear had the following required features, some (but not all) of which were available in one or more of the other wiki systems.

      • Browser-driven installation
      • Web administration interface
      • Easy templating
      • Hierarchical page structure enforcing parent-child relationships between pages
      • Individual page access controls through use of industry-standard ACLs (access control lists); the system provides an easy web-based interface for setting per-page permissions
      • An automatically generated “site plan”–site map–for navigation
      • Automatically generated “breadcrumbs”
      • Automatically generated “sub-page navigation” (showing all child pages to the current one)
      • Registered users can add comments about any page, whether they are the author or not. (This feature is configurable and is in fact a standard feature of most wiki systems.)
      • Users can attach external files to individual pages (a relatively rare wiki feature, but one that I was sure would be “oohed and aahed” at by my customer base).
      • Enables user self-registration, and provides flexible user/group management tools.
      • Provides a “Post New Content” feature that’s unique among wikis, but extremely useful for adding new content to the tree.
      • Usual features that made wikis so popular for collaborative editing in the first place:
        • Page history
        • Comparisons with and rollback to earlier pages
        • Subscriptions by email
        • RSS feeds
        • List of recently changed pages
        • Search
        • “What links here” feature
        • Simple editing system for easy content entry (with optional HTML entry), as well as an optional preview capability
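
      Several of the navigation features in this list (breadcrumbs, sub-page navigation, and the site plan) fall out almost for free once pages are stored with explicit parent-child relationships. The Python sketch below illustrates the idea; the class and function names are hypothetical, and this is not Wiclear’s actual code or schema (Wiclear itself is written in PHP).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Page:
    """One wiki page in a tree. Field names are hypothetical, not Wiclear's schema."""
    title: str
    parent: Optional["Page"] = None
    children: List["Page"] = field(default_factory=list)

    def add_child(self, title: str) -> "Page":
        child = Page(title, parent=self)
        self.children.append(child)
        return child


def breadcrumbs(page: Page) -> str:
    """Walk up the parent links to build a trail from the root to this page."""
    trail = []
    node = page
    while node is not None:
        trail.append(node.title)
        node = node.parent
    return " > ".join(reversed(trail))


def sub_page_navigation(page: Page) -> List[str]:
    """List the titles of the current page's direct children."""
    return [child.title for child in page.children]


def site_plan(page: Page, depth: int = 0) -> List[str]:
    """Flatten the whole tree into an indented site map."""
    lines = ["  " * depth + page.title]
    for child in page.children:
        lines.extend(site_plan(child, depth + 1))
    return lines


home = Page("Home")
sops = home.add_child("SOPs")
web = sops.add_child("Web Design")
sops.add_child("Records Management")

print(breadcrumbs(web))           # Home > SOPs > Web Design
print(sub_page_navigation(sops))  # ['Web Design', 'Records Management']
print("\n".join(site_plan(home)))
```

      Contrast this with the sub-page approach in MediaWiki described earlier, where the hierarchy exists only in the page names.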

      Further, if my customers were ever to require the ability to support multiple languages, they could turn on one of Wiclear’s most impressive features: built-in multilingual support.

      Wiclear has a clear, well documented code base, and with my knowledge of PHP and MySQL–plus HTML, CSS, and JavaScript–I was quickly able to add a few custom features that I thought my customers would appreciate. The first was a simple WYSIWYG HTML editor that would give our writers the comfort of having Word-like editing tools in place. For this, I chose Dojo’s excellent DHTML, rich-text editor, which is one of the few that supports Safari on the Mac as well as all the other usual suspects (Mozilla/Firefox and IE). The Dojo editor is a breeze to set up, and works beautifully. It doesn’t “do tables,” but my pitch to users is to keep the text structure simple, so hopefully nothing more complicated than headings and nested lists will be needed.

      The second tweak that might be of interest to readers was a default setting to automatically subscribe an author to the page he/she has written. This ensures that anyone who authors a page gets notified whenever it has been changed. (You cannot opt out of this feature, but you can always unsubscribe.) I hope this will take care of the worry over unauthorized edits, since it will be hard to not know when “your” page has changed, and quite easy to go in and fix any errors.
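
      The logic of that tweak is simple enough to sketch in a few lines of Python. This is only an illustration of the behavior; my actual change was made in Wiclear’s PHP code, and the function names and in-memory storage here are made up for the example.

```python
from collections import defaultdict

# Page title -> set of subscriber email addresses.
# An in-memory stand-in for what would really be a database table.
subscriptions = defaultdict(set)


def create_page(title, author_email):
    """Create a page and subscribe its author by default."""
    subscriptions[title].add(author_email)


def unsubscribe(title, email):
    """Authors are subscribed automatically but can still remove themselves."""
    subscriptions[title].discard(email)


def notify_on_edit(title, editor_email):
    """Return the addresses to notify: every subscriber except the person who made the edit."""
    return sorted(subscriptions[title] - {editor_email})


create_page("Web Design SOP", "author@example.gov")
print(notify_on_edit("Web Design SOP", "editor@example.gov"))  # ['author@example.gov']
```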

      The author of Wiclear has steadily continued to improve the product. There have been 3 new releases since I installed Wiclear in late November 2005. In fact, the author has incorporated at least one of the features I requested after my initial configuration–namely, the ability to define a “root” page that could be ACL-protected against accidental damage. This was kind of important to give my customers the necessary comfort level to know that their part of the tree wouldn’t be uprooted someday, either advertently or inadvertently. :-) I actually hand-coded the hack into Wiclear at the time, but the software’s author had finished integrating that function by January.
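
      For readers unfamiliar with ACLs, here is a rough Python sketch of how per-page permissions can protect a root page and individual branches of a tree. I’m not reproducing Wiclear’s internals here: the paths, the group names, and the rule that a page without its own ACL falls back to its nearest ancestor are all assumptions made for illustration.

```python
# Each entry maps a page path to {group: set of allowed actions}.
# Paths, group names, and the fall-back-to-parent rule are illustrative assumptions.
ACLS = {
    "/": {"admins": {"read", "edit"}, "everyone": {"read"}},
    "/sops/division-a": {"division-a": {"read", "edit"}, "everyone": {"read"}},
}


def effective_acl(path):
    """Find the ACL on the page itself or, failing that, on its nearest ancestor."""
    while True:
        if path in ACLS:
            return ACLS[path]
        if path == "/":
            return {}
        # Drop the last path segment: "/sops/division-a/web" -> "/sops/division-a"
        path = path.rsplit("/", 1)[0] or "/"


def can(user_groups, action, path):
    """True if any of the user's groups (or 'everyone') is granted the action."""
    acl = effective_acl(path)
    groups = set(user_groups) | {"everyone"}
    return any(action in acl.get(group, set()) for group in groups)


print(can({"division-a"}, "edit", "/sops/division-a/web"))  # True
print(can({"division-b"}, "edit", "/sops/division-a/web"))  # False
print(can({"division-a"}, "edit", "/"))                     # False: the root is protected
```

      In this sketch, a group can edit freely within its own branch while the root page and other branches stay read-only for it.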

      So far, I’m very pleased with my choice, and still relieved that I didn’t have to back out of the idea of testing the wiki waters for collaborative editing. Next comes the more difficult part–convincing users that this is a tool that can work for them rather than simply another complication to their working lives. Fortunately, there are several forward-thinking groups in the agency that are anxious to try the wiki out. I was delighted to set up the first group with their own branch of the wiki tree, and look forward to getting their feedback.

      In a dumbed-down form appropriate for non-geeks, wikis have great potential to be a key knowledge-management solution for a lot of content management problems in an organization. I think with Wiclear I’ve set up a foundation that won’t scare people away before they even give it a try, and that, in my organization, would be called a victory!

          