1 Introduction
- Almost all software systems are centered around a domain
model. Domain models describe the relevant concepts and data structures from
an application domain, such as products, purchase orders, and customers (Figure
1). Furthermore, domain models play an important role during the requirements
analysis and design process.

Figure 1: A simple domain model in UML syntax.
- The typical Model-View-Controller architecture of modern
software systems suggests separating domain models from user interface and
control logic, which are often implemented with technologies like JavaServer
Pages, Swing, or Visual Basic.
- The separation of non-visual parts from visual components makes it potentially
easier to reuse domain models for other applications and target platforms.
Furthermore, domain models can often be outlined as boxes-and-arrows diagrams
which can be designed and discussed with domain experts and potential end-users.
- In fact, most conventional software development methodologies suggest
starting with analysis and design steps leading to UML diagrams which can then
be transformed into classes in object-oriented programming languages.
- These object-oriented programming languages have been designed as
general-purpose languages, i.e. they can be used to implement domain model code as
well as control logic and user interfaces.
- However, object-oriented languages are not really optimized for the representation
of domain models. They are a rather poor means of expressing structural knowledge.
For example, assume you want to express that purchase orders from overseas
will lead to additional packaging and handling charges. Such domain knowledge
may exist in the heads of the developers and may be scribbled on
printed UML diagrams, but in the executable code,
much of this knowledge will be implemented by means of if-clauses inside
the methods of your classes.
- As a result, much of the domain knowledge is hidden inside imperative code
and hard to reuse for other purposes.
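The following minimal Java sketch illustrates how such a rule typically ends up in code. The class name, the continent fields and the flat surcharge amount are assumptions made purely for illustration. Note that nothing in the public shape of the class reveals that a rule about overseas orders exists at all.

```java
/** Hypothetical order class; every name and amount here is an illustrative assumption. */
class ImperativePurchaseOrder {

    // Assumed flat packaging and handling charge for overseas orders, in cents.
    static final long OVERSEAS_SURCHARGE_CENTS = 1500;

    private final String supplierContinent;
    private final String deliveryContinent;
    private final long goodsTotalCents;

    ImperativePurchaseOrder(String supplierContinent, String deliveryContinent,
                            long goodsTotalCents) {
        this.supplierContinent = supplierContinent;
        this.deliveryContinent = deliveryContinent;
        this.goodsTotalCents = goodsTotalCents;
    }

    long totalCents() {
        long total = goodsTotalCents;
        // The domain rule "overseas orders incur extra charges" lives only in
        // this if-clause. No other tool or application can discover or reuse it.
        if (!supplierContinent.equals(deliveryContinent)) {
            total += OVERSEAS_SURCHARGE_CENTS;
        }
        return total;
    }

    public static void main(String[] args) {
        ImperativePurchaseOrder order =
                new ImperativePurchaseOrder("Europe", "Australia", 10000);
        System.out.println(order.totalCents()); // prints 11500
    }
}
```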
- This is particularly critical because programming
code often drifts away from the original design diagrams as the system
evolves. Many real-world systems are developed using agile methods like
eXtreme Programming,
where the only encoding of domain models is in terms of Java or C# classes
and hand-coded unit tests. Such approaches typically don't lead to reusable
code, don't scale for large systems and run into serious problems when the
programmers leave.
- In order to overcome these problems, generations of software engineers
and tool developers have thought about ways of raising the level of abstraction
so that the domain knowledge from requirements and design phases is not lost
when the system is implemented.
- The most recent of these approaches, Model Driven Architecture, suggests
using custom-tailored dialects of UML to represent domain models on a
higher level of abstraction, and then employing code generators for the low-level
plumbing. Unfortunately, this requires powerful tools
that are hard to build and use, and the mappings from high-level diagrams
to executable code are often difficult to formalize.
- Largely unnoticed by the main software engineering camps, the World Wide
Web Consortium (W3C) has designed some very interesting technology in
the context of its Semantic Web vision. This technology was originally
designed with the goal of making internet pages easier to understand for intelligent
agents and web services, but it turns out that Semantic Web languages and
tools could also play a major role in software development in general.
- In a nutshell, this approach suggests designing domain models in Web-based
object-oriented languages such as OWL and RDF. OWL has been optimized
to represent structural knowledge on a high level of abstraction. Domain
models encoded in OWL can be uploaded
to the Web and shared between multiple applications. The OWL models themselves
encode much of their meaning (semantics), so that applications can discover
and access appropriate models dynamically. The richness of the Semantic Web
representation language makes it easier to build reusable, quality domain
models, because additional reasoning services such as consistency checking
and classification can be exploited. At the same time, OWL and RDF operate
on structures similar to those of object-oriented languages, and can therefore
be integrated quite seamlessly with traditional software components.
- The purpose of this document is to explain how object-oriented applications
can be designed and implemented with the help of Semantic Web technology.
Section 2 gives an outline of how the application development
life cycle can benefit from Semantic Web approaches. Section 3 introduces
the Semantic Web languages RDF and OWL, and compares them to object-oriented
modeling languages. Section 4 shows how RDF and OWL models can be embedded
into object-oriented programs (using Java). Section 5 provides references
to further reading, tools and libraries.
2 Application Development with Semantic Web Technology
- What is the Semantic Web? Most of the current internet content is geared
towards human users. Presentation languages such as HTML contain instructions
for Web browsers on how to present multimedia content to humans. However, if
we wanted to employ a computer program to search for Web-based information
for us, then this program would find it very difficult to make any sense of these
Web pages, unless it had advanced human language skills. Furthermore, contemporary
Web languages like JSP or ASP allow model and view parts to be mixed arbitrarily
in a single file, leading to very unstructured content.
- The vision behind the Semantic Web is to make internet content
machine-readable so that it can be analyzed more easily by software agents and
shared between Web Services. For that purpose, the World Wide Web Consortium
(W3C) is recommending
a number of Web-based languages that can be used to formalize internet content.
- RDF and OWL can be used to describe classes, attributes and relationships,
much as in object-oriented languages. For example, RDF can be used to define
that the class "Product" has a property "price" which
takes values of type float. You can also define a class "Purchase" with
a property "products" which relates it to multiple Products. OWL extends RDF
with additional constructs to define more complex relations. For example, OWL
can be used to define a class "OverseasPurchase" as the subclass of Purchases
that have a delivery address in a country on a different
land mass than the supplier.
- The W3C also works on other languages for describing if-then rules and
complex SQL-like queries, but our focus here lies on RDF and OWL.
- Domain models in any of these languages
can be linked into the Semantic Web just like you would publish an HTML page.
Once an RDF or OWL file is online, other Web resources or applications can
link to it. For example, an HTML page showing a certain product could
encode metadata to link back to the corresponding entity in an RDF model.
Or, providers of certain products can instantiate the RDF classes to announce
their portfolio to shopping agents.
- A typical scenario for such a Semantic Web application is shown in Figure
2.

Figure 2: An application using Semantic Web technology can
exploit domain models and services from the Web.
- While some of this could also be achieved using XML, Semantic Web languages
are far more flexible and extensible. Since their basic structure is in
a sense object-oriented, it is possible to define subclasses and generalizations
of concepts. Since every Semantic Web resource has a unique
URI, it is possible to establish links between existing models. This means
that whenever a model of a certain domain has been published on the Web,
then others are able to build upon it, and thus to establish a network
of domain knowledge.
- The extensibility of Semantic Web languages supports reusability on a global
scale. Instead of defining the 1000th variation of a product-purchase domain
model, application developers could locate a suitable model from the Web
and simply reuse or extend it. By reusing an existing model, different
applications with similar tasks can share results and data much more easily.
Furthermore, it
is far more likely that an application-independent reusable component (such
as a shopping basket application or a credit card handling Web Service) can
be integrated.
- This reusability is partly based on the fact that Semantic Web languages
are Web-based: Each class, property or object in an RDF or OWL file has a
unique identifier (URI), so that it can be referenced from anywhere else.
The other major strength that makes Semantic Web models easier to reuse is
that OWL is founded on formal logic. This means that OWL models are not
limited to defining classes and their attributes, but can also encode the
intended "meaning" of these classes, so that the classes can be
unambiguously shared between groups of humans or machines. Domain
models that are based on such well-defined logics are often called ontologies.
In fact, the abbreviation OWL stands for "Web Ontology Language".
From an object-oriented point of view, ontologies are non-visual domain classes
which contain logical statements that make their meaning explicit.
- Ontologies are often defined by groups of humans (such as an online shop consortium
or a national geological survey) in order to build a shared domain vocabulary
for information integration. The logical statements inside an ontology make
it possible to exploit automatic reasoning tools, which can be compared to
compilers. These tools can, for example, detect if a class that has been added
to a model is inconsistent. Reasoners can also be used to
find the correct superclasses of a new class. For example...
- In many cases, shared ontologies / domain models will not be optimized
for a specific application purpose and therefore need to be adapted or built
from scratch. In these cases, domain modeling tools (such as Protege, as shown
in Figure 3) can be used. These
tools are suitable for domain experts who have little or no training
in programming languages. Essentially, these tools provide visual editors
for classes and relationships, and allow users to create instances of these
classes. [Note:
I definitely don't insist on putting a Protege screenshot here - I don't
want to exploit this note for cheap propaganda. However, I think it would
be invaluable for readers of this note to see that real tools exist and how
they compare to UML-based modeling tools. Any other suggestions for tools or
screenshots are welcome].

Figure 3: Domain modeling tools such as Protege can be used
to define classes, properties and individuals.
- The domain modeling activities in such a development process can be compared
to requirements analysis and design steps in traditional software development.
The domain experts or customers join forces with software designers to come
up with suitable abstractions of a domain.
The resulting domain models are then combined by programmers with the remaining
application components such as user interface and control logic.
In order to integrate the domain models with these other components, there
are various techniques to access RDF and OWL
classes from the object-oriented target language.
This will be shown in section 4. The formal logic behind ontologies can even
be exploited at
other stages of the software development cycle, for example for test cases.
Even at run-time it is very useful to have domain models with explicit semantics,
because it is possible to use reasoning services to classify individuals.
We will look into this in more detail after we have introduced the basics
of RDF and OWL.
3 Introduction to RDF and OWL
- In order to implement the Semantic Web vision, the W3C has produced a number
of language specifications.
- RDF is the base infrastructure to represent classes, properties and instances
in a Web compliant format.
- OWL extends RDF with richer expressivity.
- Both languages are now supported by tools, parsers and programming APIs.
- This section will introduce RDF and OWL and compare them to object-oriented
languages.
3.1 RDF and RDF Schema
- RDF stands for Resource Description Framework. RDF Schema defines an
object-oriented model for RDF, and RDF is very limited without RDF Schema.
- Basic infrastructure: Resources and literals
- Resources (classes, properties or individuals). Every resource has
a URI (e.g., http://onlineshop.com/model.rdf#Product).
- URIs are often split into namespace and local name, and the namespace can be
abbreviated
with a prefix notation (e.g., shop:Product if the prefix shop has been declared
to be "http://onlineshop.com/model.rdf#"). A namespace can be
compared to a UML or Java package. Often there is one namespace
per file, but it is possible to use arbitrary namespaces in any file.
- References to external resources
can be established using fully qualified URIs. For example, you can define
a class http://myshop.com/products.rdf#MyProduct as a subclass of
the external class http://onlineshop.com/model.rdf#Product.
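The prefix notation can be illustrated with a small Java sketch that expands a prefixed name into a full URI. The class PrefixExpander and its methods are hypothetical helpers written for this note, not part of any RDF library; real APIs offer their own prefix handling.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that expands prefixed names such as shop:Product. */
class PrefixExpander {

    private final Map<String, String> namespaces = new HashMap<>();

    /** Declares that the given prefix abbreviates the given namespace URI. */
    void declare(String prefix, String namespaceUri) {
        namespaces.put(prefix, namespaceUri);
    }

    /** Returns namespace URI + local name for a name of the form prefix:local. */
    String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String prefix = prefixedName.substring(0, colon);
        String namespaceUri = namespaces.get(prefix);
        if (namespaceUri == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefix);
        }
        return namespaceUri + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        PrefixExpander expander = new PrefixExpander();
        expander.declare("shop", "http://onlineshop.com/model.rdf#");
        System.out.println(expander.expand("shop:Product"));
        // prints http://onlineshop.com/model.rdf#Product
    }
}
```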
- Properties can be compared to attributes, fields or relationships in object-oriented
languages.
- However, RDF properties are stand-alone entities, i.e. they can
be defined independently from classes and used in multiple classes. For
example, you can define a property http://onlineshop.com/model.rdf#price
and then attach it to all classes where a price makes sense.
This also makes it possible to reuse the same property across multiple files.
For example, if you create
a model for online auctioning software, you could use the price property
from the online shopping model to represent prices for the traded goods.
Sharing the same property across multiple models means that values can
be more easily integrated, for example to compare the current auctioning
price with the price for a new product in other online shops.
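As a rough illustration of why a shared, class-independent property helps integration, the Java sketch below stores property values keyed by property URI. The Resource class, the cheapest method and the two instance URIs are assumptions invented for this example; only the price property URI is taken from the text above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedPropertySketch {

    // The property has its own URI and is not owned by any class.
    static final String PRICE = "http://onlineshop.com/model.rdf#price";

    /** A resource: a URI plus property values keyed by property URI. */
    static final class Resource {
        final String uri;
        final Map<String, Object> values = new HashMap<>();

        Resource(String uri) {
            this.uri = uri;
        }

        Resource with(String propertyUri, Object value) {
            values.put(propertyUri, value);
            return this;
        }
    }

    /** Returns the resource with the lowest numeric price, or null if none has one. */
    static Resource cheapest(List<Resource> candidates) {
        Resource best = null;
        double bestPrice = Double.MAX_VALUE;
        for (Resource candidate : candidates) {
            Object price = candidate.values.get(PRICE);
            if (price instanceof Number) {
                double p = ((Number) price).doubleValue();
                if (best == null || p < bestPrice) {
                    best = candidate;
                    bestPrice = p;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical instances from a shop model and an auction model.
        Resource shopItem =
                new Resource("http://onlineshop.com/data.rdf#camera1").with(PRICE, 249.0);
        Resource auctionLot =
                new Resource("http://auctions.example.org/lots.rdf#lot42").with(PRICE, 180.0);
        System.out.println(cheapest(Arrays.asList(shopItem, auctionLot)).uri);
    }
}
```

Because both models use the same property URI, the comparison needs no knowledge of the classes the two resources belong to.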
- RDF and OWL provide pre-defined system properties.
- rdfs:subClassOf
can be used to build an inheritance relationship between two classes.
- rdfs:range is used to limit the range of values of a property. The
range can be either a class (to establish relationships between classes)
or a datatype. RDF uses the default datatypes from the XML Schema specification,
such as xsd:int and xsd:string.
- rdfs:domain can be used to associate properties with certain classes.
For example, if you state that the rdfs:domain of the price property
is the Product class, then anything that has a price is understood to be a
Product.
- Literals: primitive values of a certain datatype, linked to a resource
using a property.
- Individuals are instances of the classes, with specific values for the
properties. Individuals can be compared to objects in a programming language.
It is possible to create instances of RDF classes at run time, but individuals
can also be part of an RDF file, for example to describe specific Products.
- All this is quite similar to UML, except that properties are top-level
entities.
- RDF files are typically stored in an XML-based format. This format is rather
hard to read for humans, and is therefore mostly edited with visual tools. Here is an example RDF file representing the UML model from Figure 1.
- A major difference between RDF and traditional object-oriented languages
is that objects can be instances of multiple types. Furthermore, objects
can change their type while a system is executing. Since properties
are potentially independent from classes, it is therefore possible to define
an object of an arbitrary type, then assign property values to it, and finally
determine the types that this object shall have. For example, a PurchaseOrder
could start as a plain PurchaseOrder object and later, as the customer places
more and more items in the shopping basket, it could change its type to become
an instance of...
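To make the idea of multiple and changing types concrete, here is a minimal Java sketch in which the types of an object are ordinary data rather than a fixed compile-time class. The DynamicIndividual class is a hypothetical helper, and the names shop:itemCount and shop:LargeOrder are invented for the example; the text above does not name the class an order would change into.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical individual whose types can be added or removed at run time. */
class DynamicIndividual {

    private final String uri;
    private final Set<String> types = new LinkedHashSet<>();
    private final Map<String, Object> values = new LinkedHashMap<>();

    DynamicIndividual(String uri) {
        this.uri = uri;
    }

    String uri() {
        return uri;
    }

    void addType(String classUri) {
        types.add(classUri);
    }

    void removeType(String classUri) {
        types.remove(classUri);
    }

    boolean hasType(String classUri) {
        return types.contains(classUri);
    }

    Set<String> types() {
        return Collections.unmodifiableSet(types);
    }

    // Property values do not depend on the current types of the individual.
    void set(String propertyUri, Object value) {
        values.put(propertyUri, value);
    }

    Object get(String propertyUri) {
        return values.get(propertyUri);
    }

    public static void main(String[] args) {
        DynamicIndividual order = new DynamicIndividual("shop:order17");
        order.addType("shop:PurchaseOrder");
        order.set("shop:itemCount", 12);   // assumed property name
        // Later the same object gains a second type (assumed class name).
        order.addType("shop:LargeOrder");
        System.out.println(order.hasType("shop:PurchaseOrder")
                && order.hasType("shop:LargeOrder")); // prints true
    }
}
```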
3.2 OWL
- RDF has limited expressivity - just the skeleton for classes and properties
but no further constraints
- OWL Restrictions can be used to describe additional constraints on properties
at certain classes. There are various types of restrictions in OWL:
- Cardinality restrictions limit the number of values that a property
can have at a certain class. For example, you can state that each PurchaseOrder
must have exactly one customer. This corresponds to multiplicities at association
ends in UML diagrams or array bounds in programming languages.
- allValuesFrom restrictions state that all values of a certain property
must have a certain type. This corresponds to the type of a relationship
or attribute in UML, but can be specified for each class individually.
In particular, it is possible to overload property types in subclasses...
- someValuesFrom restrictions state that at least one value of a certain
property must have a certain type. There is no direct counterpart
for this construct in object-oriented languages, but it is a very commonly
used construct to specify the relationships between classes. These
relationships can then be used by reasoning tools (as shown next).
- hasValue restrictions state that a property must have a certain value,
where the value can be either an individual or a literal.
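The four restriction types can be read, in a simplified way, as checks over the values of one property at one individual. The Java sketch below takes this closed-world reading, which is an assumption made for illustration: an OWL reasoner treats restrictions as logical axioms rather than as validation rules, and it also takes subclass relationships into account, which this sketch ignores. All class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class RestrictionChecks {

    /** A property value together with the single class it is known to belong to. */
    static final class TypedValue {
        final Object value;
        final String type;

        TypedValue(Object value, String type) {
            this.value = value;
            this.type = type;
        }
    }

    /** Cardinality: the property has exactly n values. */
    static boolean hasExactCardinality(List<TypedValue> values, int n) {
        return values.size() == n;
    }

    /** allValuesFrom: every value has the given type (true for no values). */
    static boolean allValuesFrom(List<TypedValue> values, String type) {
        for (TypedValue v : values) {
            if (!type.equals(v.type)) {
                return false;
            }
        }
        return true;
    }

    /** someValuesFrom: at least one value has the given type. */
    static boolean someValuesFrom(List<TypedValue> values, String type) {
        for (TypedValue v : values) {
            if (type.equals(v.type)) {
                return true;
            }
        }
        return false;
    }

    /** hasValue: the given individual or literal is among the values. */
    static boolean hasValue(List<TypedValue> values, Object expected) {
        for (TypedValue v : values) {
            if (expected.equals(v.value)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Values of the customer property at one hypothetical PurchaseOrder.
        List<TypedValue> customers =
                Arrays.asList(new TypedValue("shop:alice", "shop:Customer"));
        System.out.println(hasExactCardinality(customers, 1)); // prints true
        System.out.println(allValuesFrom(customers, "shop:Customer")); // prints true
    }
}
```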
- Reasoning: OWL's richer expressivity not only supports constraints,
but also drives so-called reasoning engines. A reasoner is a tool that
analyzes the knowledge that is explicitly encoded inside an ontology and
derives new knowledge from it. In particular, reasoners can be used
- to reveal subclass/superclass relationships between classes
- to determine the most appropriate types of individuals
- to detect inconsistent class definitions
- Here is an example: Assume you want to define a class...
- Major difference from OO: domains and ranges are axioms, not constraints.
For example, if we know
that some object has a price and the domain of price is the class Product,
then a reasoner can find out that the object must be a Product (it may have
other types as well).
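The domain inference just described can be sketched in a few lines of Java. This is a toy that applies only this single rule over an in-memory list of statements; it is not how a real reasoner is implemented, and the Triple class and inferTypes method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DomainInference {

    /** One statement: subject, property (predicate) and value (object). */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    /**
     * For each statement whose property has a declared domain, infers that
     * the subject is an instance of that domain class.
     */
    static Map<String, Set<String>> inferTypes(List<Triple> triples,
                                               Map<String, String> domains) {
        Map<String, Set<String>> inferred = new HashMap<>();
        for (Triple t : triples) {
            String domainClass = domains.get(t.predicate);
            if (domainClass != null) {
                inferred.computeIfAbsent(t.subject, k -> new HashSet<>())
                        .add(domainClass);
            }
        }
        return inferred;
    }

    public static void main(String[] args) {
        Map<String, String> domains = new HashMap<>();
        domains.put("shop:price", "shop:Product"); // rdfs:domain of price is Product

        List<Triple> triples = new ArrayList<>();
        triples.add(new Triple("shop:item1", "shop:price", "9.99"));

        // item1 was never declared to be a Product, but it has a price.
        System.out.println(inferTypes(triples, domains).get("shop:item1"));
        // prints [shop:Product]
    }
}
```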
- Defined vs. primitive classes. Quite a bit to write here...
- Disjoint classes, open world, unique name assumption
- Semantics as (JUnit) test cases
- Importing other models from the Web
3.3 Comparison of OWL and Object-Oriented Languages
- Table summarizing language constructs and their OO equivalent
- Major differences
- Semantic Web languages have been designed from the ground up for the
Web
- Underpinnings in formal logic (so that reasoners can be used)
- This approach could be compared to a system that uses UML+OCL in a Web-based
format. OCL would be used at run-time. The resulting models would be shared
online, and XMI used as exchange format.
- Major advantage of OWL: Stronger expressivity for domain classes, making
it possible to use classes as data objects at run time.
- Where to use RDF and OWL
- Where to not use Semantic Web technology
4 Programming with RDF and OWL
- Working with OWL models in (Java) applications, how to set up
infrastructure
- Dynamic/Active object models using APIs such as Jena
- Benefit: Using generic services such as reasoners, persistence, dynamic
creation of classes at run-time
- Disadvantage: Different level of abstraction, difficult to integrate with
remaining components
- Option: Code generation (RDF Reactor, Kazuki, Jastor, Protege-OWL API)
to add methods to classes etc
- Software architecture of Semantic Web applications
- Try to push as much as possible into the high-level models (OWL+)
- Use this high-level information at build-time and run-time
- Use generated APIs to connect the core models with the rest of your
system
- Non-reusable parts (view&control) coded as usual
- Development process (domain modeling - coding - testing)
- Sharing ontologies online
5 Where to go from here
- Links to APIs
- Jena
- WonderWeb API
- Protege-OWL API
- Code generators
- http://rdfreactor.ontoware.org/
- Kazuki
- Jastor
- Protege-OWL
- Links to tools and support infrastructure
- Links to further online documents
- Links to example ontologies
- Links to example SW applications (are there any?)