Introduction to ModeShape
- Published: 19 February 2013
ModeShape is an open source implementation of the JSR-283 specification and standard JCR API. This tutorial will provide a basic introduction to the ModeShape framework and the JCR specification.
Content Repository API for Java (JCR) is a specification for a Java platform application programming interface (API) to access content repositories in a uniform manner. A JCR is a type of object database which can be used for storing, searching, and retrieving hierarchical data.
At first sight you might wonder what advantage JCR has over a database (after all, a traditional RDBMS can also store documents as binary data). A JCR repository is quite different from an RDBMS because it exhibits the following features:
- Is hierarchical, allowing you to organize your content in a structure that closely matches your needs, where related information is often stored close together and is thus easily navigated
- Is flexible, allowing the content to adapt and evolve, using a node type system that can range from completely "schemaless" to fully restrictive (e.g., like a relational database)
- Uses a standard Java API (e.g., javax.jcr)
- Abstracts where the information is really stored: many JCR implementations can store content in a variety of relational databases and other stores, some can expose non-JCR stores through the JCR API, and some can federate multiple stores into a single, virtual repository.
- Supports queries and full-text search out of the box
- Supports events, locking, versioning, and other features
What kind of applications can benefit from these features? The JCR API initially grew out of the needs of Content Management Systems, which require storing documents and other binary objects with associated metadata; however, the API is now applicable to many other types of applications that require, for example, versioning of data, transactions, observation of changes in data, and import or export of data to XML in a standard way.
ModeShape is an open source implementation of the JCR 2.0 API and thus behaves like a regular JCR repository. Applications can search, query, navigate, change, version, listen for changes, etc. ModeShape can store that content in a variety of back-end stores (including relational databases, Infinispan data grids, JBoss Cache, etc.), or it can access and update existing content from *other* kinds of systems (including file systems, SVN repositories, JDBC database metadata, and other JCR repositories).
How is data organized in ModeShape? Data is organized into a tree structure that reflects the way it is accessed or used. As you can see from the following picture, in many scenarios there is a natural hierarchy between records, which can be navigated using queries.
Each JCR node contains the following elements:
- Name, path, and identifier
- Properties (name and values)
- Child nodes
- One or more node types
The node type, in particular, defines the properties allowed in the node (name, value, searchable, mandatory) and the allowed child nodes.
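To make the node structure described above concrete, here is a minimal sketch in plain Java. It is a toy model, not the JCR API: the class `ToyNode` and everything in it are hypothetical and exist only for illustration. The method names loosely mirror real `javax.jcr.Node` methods (`addNode`, `setProperty`, `getPath`, `getIdentifier`), but the sketch performs no node type validation and has no sessions or persistence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Toy stand-in for a JCR node. This is NOT the javax.jcr.Node interface. */
final class ToyNode {
    private final String name;
    private final String primaryType;   // a single type name; not validated in this sketch
    private final ToyNode parent;       // null for the root node
    private final String identifier = UUID.randomUUID().toString();
    private final Map<String, Object> properties = new LinkedHashMap<>();
    private final List<ToyNode> children = new ArrayList<>();

    private ToyNode(String name, String primaryType, ToyNode parent) {
        this.name = name;
        this.primaryType = primaryType;
        this.parent = parent;
    }

    /** Creates the root of a new tree; the type name is an arbitrary choice for the sketch. */
    static ToyNode root() {
        return new ToyNode("", "nt:unstructured", null);
    }

    /** Adds a child node and returns it, so calls can be chained down the tree. */
    ToyNode addNode(String childName, String childType) {
        ToyNode child = new ToyNode(childName, childType, this);
        children.add(child);
        return child;
    }

    /** Sets a property on this node and returns this node. */
    ToyNode setProperty(String key, Object value) {
        properties.put(key, value);
        return this;
    }

    Object getProperty(String key) { return properties.get(key); }
    String getIdentifier() { return identifier; }
    String getPrimaryType() { return primaryType; }
    List<ToyNode> getChildren() { return Collections.unmodifiableList(children); }

    /** Builds the absolute path by walking up to the root. */
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }

    /** Resolves a relative path such as "departments/billing" below this node. */
    ToyNode getNode(String relPath) {
        ToyNode current = this;
        for (String segment : relPath.split("/")) {
            ToyNode next = null;
            for (ToyNode child : current.children) {
                if (child.name.equals(segment)) {
                    next = child;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalArgumentException("No such node: " + relPath);
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        ToyNode root = ToyNode.root();
        ToyNode billing = root.addNode("departments", "nt:unstructured")
                              .addNode("billing", "nt:unstructured");
        billing.addNode("alice", "employee:Employee")
               .setProperty("employee:department", "BILLING");
        System.out.println(root.getNode("departments/billing/alice").getPath());
    }
}
```

In a real repository the same steps would go through a `javax.jcr.Session`, and the node type would restrict which properties and children are accepted.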
Where is data stored in ModeShape?
Data is stored both in memory and in traditional storage (a database or the file system). The in-memory approach is the fastest and cheapest; however, because a single machine has physical limits and in-memory data is not durable, ModeShape distributes multiple copies of its objects across multiple machines and, when needed, combines this with a traditional store such as an RDBMS.
Once stored, the data can be retrieved using SQL-like syntax. The JCR API makes it possible for implementations to support multiple query languages, and the specification requires support for two languages: JCR-SQL2 and JCR-QOM.
The JCR-SQL2 query language expresses queries as strings that are similar to SQL: each node type corresponds to a table, and the nodes of that type appear as rows in it.
This query language is an improvement over the JCR-SQL language, providing among other things far richer specifications of joins and criteria.
SELECT * from [employee:Employee] WHERE [employee:department] LIKE '%BILLING%'
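The idea that nodes behave like rows of a table named after their node type can be sketched without a repository. The following plain-Java example is a rough illustration of what the query above asks for, not how ModeShape evaluates it: `ToyLikeQuery`, `select`, and `likeToRegex` are hypothetical names, node types are matched by exact name (a real repository also matches subtypes), and LIKE escape characters are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy evaluation of "SELECT * FROM [type] WHERE [property] LIKE 'pattern'". */
final class ToyLikeQuery {

    /** Translates a LIKE pattern into a regex: % matches any run, _ matches one character. */
    static Pattern likeToRegex(String like) {
        StringBuilder regex = new StringBuilder();
        for (char c : like.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Keeps the rows (nodes) of the given type whose property value matches the pattern. */
    static List<Map<String, String>> select(List<Map<String, String>> rows,
                                            String nodeType, String property, String like) {
        Pattern pattern = likeToRegex(like);
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String value = row.get(property);
            boolean typeMatches = nodeType.equals(row.get("jcr:primaryType"));
            if (typeMatches && value != null && pattern.matcher(value).matches()) {
                result.add(row);
            }
        }
        return result;
    }

    /** Builds one node represented as a flat map of property names to values. */
    private static Map<String, String> row(String type, String department) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("jcr:primaryType", type);
        row.put("employee:department", department);
        return row;
    }

    public static void main(String[] args) {
        List<Map<String, String>> nodes = Arrays.asList(
                row("employee:Employee", "EU BILLING TEAM"),
                row("employee:Employee", "SALES"));
        System.out.println(select(nodes, "employee:Employee", "employee:department", "%BILLING%"));
    }
}
```

Against a real repository, the query string would instead be passed to `QueryManager.createQuery(statement, Query.JCR_SQL2)` and run with `Query.execute()`.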
ModeShape fully supports the complete JCR-SQL2 query language and adds extensions (especially for JOINs) to make it more powerful:
SELECT file.*, ref.* FROM [nt:file] AS file JOIN [mix:referenceable] AS ref ON ISSAMENODE(file,ref)
Besides JCR-SQL2, ModeShape also supports a simpler full-text search language that uses a Google-style search grammar.
This query language is actually defined by the JCR 2.0 specification as the full-text search expression grammar used in the second parameter of the CONTAINS(...) function of the JCR-SQL2 language:
SELECT * FROM [nt:base] WHERE CONTAINS([nt:base],'full-text-query')
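To give a feel for what a Google-style search expression means, here is a small plain-Java sketch that evaluates a subset of such a grammar against a piece of text. It is only an approximation under stated assumptions: whitespace-separated terms must all be present, a leading `-` excludes a term, and an uppercase `OR` separates alternatives. Quoted phrases, stemming, and ranking are not handled, and `ToyFullTextSearch` is a hypothetical name, not part of ModeShape or the JCR API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy matcher for a subset of a Google-style full-text search expression. */
final class ToyFullTextSearch {

    /** Returns true if the text satisfies at least one OR-separated alternative. */
    static boolean matches(String text, String expression) {
        // Split the text into lower-case words on anything that is not a letter or digit.
        Set<String> words = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        for (String alternative : expression.trim().split("\\s+OR\\s+")) {
            if (matchesAll(words, alternative)) {
                return true;
            }
        }
        return false;
    }

    /** Every plain term must be present and every "-term" must be absent. */
    private static boolean matchesAll(Set<String> words, String alternative) {
        for (String term : alternative.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            boolean negated = term.startsWith("-");
            String word = (negated ? term.substring(1) : term).toLowerCase(Locale.ROOT);
            // A present word fails a negated term; an absent word fails a plain term.
            if (words.contains(word) == negated) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = "Invoice for the billing department";
        System.out.println(matches(text, "billing invoice"));
        System.out.println(matches(text, "billing -department"));
        System.out.println(matches(text, "payroll OR invoice"));
    }
}
```

A real repository answers such searches from an index rather than by scanning text, so this sketch only illustrates the meaning of an expression, not its performance.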
ModeShape core frameworks
Let's introduce some of the core frameworks which are part of the ModeShape project:
One of ModeShape's interesting features is the concept of sequencers. ModeShape uses sequencers to help you extract more meaning from the artifacts you are already managing, which makes it much easier for applications to find and use all that valuable information.
ModeShape has quite a few sequencers out of the box (and many others are going to be added) such as:
- Image Sequencer
- MP3 Sequencer
- XML Document Sequencer
- ZIP File Sequencer
- Microsoft Office Document Sequencer
- Java Source File Sequencer
- Java Class File Sequencer
- DDL File Sequencer
- Text Sequencers
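The sequencing idea, taking an uploaded artifact and deriving structured content from it, can be sketched in a few lines of plain Java. The example below mimics what a delimited-text sequencer might produce: one derived "row" per line, with one "column" property per field. The class `ToyDelimitedTextSequencer` and the `rowN`/`columnN` naming are assumptions made for this illustration and do not reproduce ModeShape's sequencer API or its output structure.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Toy sequencer: derives structured rows from the text content of a file. */
final class ToyDelimitedTextSequencer {

    /**
     * Splits the content into lines and each line into fields.
     * Returns a map from a derived node name ("row1", "row2", ...) to that
     * node's properties ("column1", "column2", ...).
     */
    static Map<String, Map<String, String>> sequence(String content, String delimiter) {
        Map<String, Map<String, String>> derived = new LinkedHashMap<>();
        String splitter = Pattern.quote(delimiter); // treat the delimiter literally
        int rowNumber = 0;
        for (String line : content.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            rowNumber++;
            Map<String, String> columns = new LinkedHashMap<>();
            String[] fields = line.split(splitter, -1); // -1 keeps trailing empty fields
            for (int i = 0; i < fields.length; i++) {
                columns.put("column" + (i + 1), fields[i].trim());
            }
            derived.put("row" + rowNumber, columns);
        }
        return derived;
    }

    public static void main(String[] args) {
        String csv = "alice,BILLING\nbob,SALES\n";
        System.out.println(sequence(csv, ","));
    }
}
```

Once the derived rows are stored as nodes next to the original file, applications can query the extracted values instead of parsing the file themselves.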
ModeShape 3.0 introduces the ability to federate data from external systems into ModeShape repositories. What is really exciting about this technology is that ModeShape does not copy the data from the external system into the repository. Instead, ModeShape (with the help of connectors) dynamically creates nodes on demand to represent the external data that is mapped to the federated resource. Once this is done, federation is transparent to clients: the repository’s regular content and federated content all look like regular content to client applications.
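The "no copy, nodes created on demand" behavior can be illustrated with a short plain-Java sketch. All names here (`ToyFederatedRepository`, `ToyConnector`, `project`, `getNode`) are hypothetical and do not reproduce ModeShape's connector interfaces. The sketch assumes that a connector is mounted under a path prefix and is asked for the external data each time a path below that prefix is read.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Toy repository that serves both its own content and federated external content. */
final class ToyFederatedRepository {

    /** Toy connector: looks up external data by an identifier, on demand. */
    interface ToyConnector {
        Optional<Map<String, String>> getDocument(String externalId);
    }

    private final Map<String, Map<String, String>> internal = new HashMap<>();
    private final Map<String, ToyConnector> projections = new HashMap<>();

    /** Stores regular (non-federated) content inside the repository. */
    void store(String path, Map<String, String> properties) {
        internal.put(path, properties);
    }

    /** Mounts an external system under the given path; no data is copied. */
    void project(String mountPath, ToyConnector connector) {
        projections.put(mountPath, connector);
    }

    /** Clients use the same call for regular and federated content. */
    Optional<Map<String, String>> getNode(String path) {
        for (Map.Entry<String, ToyConnector> projection : projections.entrySet()) {
            String mount = projection.getKey();
            if (path.startsWith(mount + "/")) {
                // Delegate to the connector at read time, using the path below the mount point.
                return projection.getValue().getDocument(path.substring(mount.length()));
            }
        }
        return Optional.ofNullable(internal.get(path));
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> externalSystem = new HashMap<>();
        Map<String, String> readme = new HashMap<>();
        readme.put("size", "120");
        externalSystem.put("/readme.txt", readme);

        ToyFederatedRepository repository = new ToyFederatedRepository();
        repository.project("/external", id -> Optional.ofNullable(externalSystem.get(id)));
        System.out.println(repository.getNode("/external/readme.txt"));
    }
}
```

Because the connector is consulted at read time, a change made in the external system is visible through the repository immediately, which is the practical consequence of not copying the data.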