Saturday, August 28, 2010

The Limitations of SPARQL

Recently, I have been looking at the RDF model and trying to compare it with the property graph model that I mentioned in a previous post. I have also looked at the SPARQL query model. While I think it is a very powerful query language based on variable bindings, I have also observed several kinds of queries that it doesn't handle well.

Note that I have only used SPARQL in very simple examples and don't claim to be an expert in this area. I am hoping this post will invite SPARQL experts to share their experience.

Here are the limitations that I have seen.

Support of Negation

Because of the “Open World” assumption, SPARQL doesn’t support negation well, so a query that asks for the absence of a fact is not easy to express.

  • Find all persons who are Bob’s friends but don’t know Java
  • Find all persons who know Bob but don't know Alice
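To make the first example concrete, here is a minimal sketch in plain Python rather than SPARQL. The triples and the predicate names friendOf and knows are made up for illustration. It answers the question by set difference, which quietly assumes a closed world: a missing “knows Java” triple is read as “does not know Java”, whereas under the open-world assumption it only means the fact has not been stated.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
triples = {
    ("Alice", "friendOf", "Bob"),
    ("Carol", "friendOf", "Bob"),
    ("Dave", "friendOf", "Bob"),
    ("Alice", "knows", "Java"),
    ("Dave", "knows", "Python"),
}


def subjects(predicate, obj):
    """Return every subject s for which the triple (s, predicate, obj) is stored."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}


# Closed-world negation: absence of a "knows Java" triple is treated as "false".
bobs_friends = subjects("friendOf", "Bob")
java_people = subjects("knows", "Java")
friends_without_java = sorted(bobs_friends - java_people)
print(friends_without_java)
```

With this toy data the result is Carol and Dave: neither has a “knows Java” triple, even though Dave knows Python.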

Support of Path Expression

In SPARQL, expressing a variable length path is not easy.

  • Find all posts written by Bob’s direct and indirect friends (everyone reachable from Bob)
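A path of fixed length can be written as a fixed number of triple patterns, but “everyone reachable from Bob” needs a walk that repeats until no new node turns up. The sketch below shows that walk in plain Python as a breadth-first search; the data and the predicate names friend and wrote are assumptions for illustration only.

```python
from collections import deque

# Hypothetical toy data; "friend" and "wrote" are made-up predicate names.
triples = [
    ("Bob", "friend", "Carol"),
    ("Carol", "friend", "Dave"),
    ("Dave", "friend", "Bob"),    # a cycle, to show that the walk terminates
    ("Eve", "friend", "Frank"),   # not reachable from Bob
    ("Carol", "wrote", "post1"),
    ("Dave", "wrote", "post2"),
    ("Frank", "wrote", "post3"),
]


def reachable(start, predicate):
    """Breadth-first walk over `predicate` edges; returns nodes reachable in one or more hops."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


# Direct and indirect friends of Bob, excluding Bob himself (the cycle leads back to him).
friends = reachable("Bob", "friend") - {"Bob"}
posts = sorted(o for s, p, o in triples if p == "wrote" and s in friends)
print(posts)
```

Here the walk finds Carol and Dave, so the posts are post1 and post2; Frank's post is left out because Frank is not reachable from Bob.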

Predicates cannot have Properties

This may be an RDF limitation that SPARQL inherits. Since RDF represents everything in triples, it is easy to implement properties of a node using extra triples, but it is very difficult to implement properties on edges.

In SPARQL, there is no way to attach a property to a “predicate”.

  • Bob knows Peter for 5 years
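One workaround I am aware of is RDF reification, where the edge itself is described by an extra node of type rdf:Statement. The plain-Python sketch below shows how much machinery that takes for a single edge. The rdf: terms are the standard reification vocabulary written as strings; ex:stmt1 and ex:knowsForYears are hypothetical names I made up for this example.

```python
# One edge plus its reified description, as (subject, predicate, object) triples.
# "ex:stmt1" and "ex:knowsForYears" are hypothetical names.
triples = [
    ("Bob", "foaf:knows", "Peter"),
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "Bob"),
    ("ex:stmt1", "rdf:predicate", "foaf:knows"),
    ("ex:stmt1", "rdf:object", "Peter"),
    ("ex:stmt1", "ex:knowsForYears", 5),
]


def value(subject, predicate):
    """Return the first object stored for (subject, predicate), or None."""
    for s, p, o in triples:
        if s == subject and p == predicate:
            return o
    return None


def edge_property(subject, predicate, obj, prop):
    """Find the statement node describing the edge, then read `prop` from it."""
    statements = [s for s, p, o in triples if p == "rdf:type" and o == "rdf:Statement"]
    for stmt in statements:
        described = (
            value(stmt, "rdf:subject"),
            value(stmt, "rdf:predicate"),
            value(stmt, "rdf:object"),
        )
        if described == (subject, predicate, obj):
            return value(stmt, prop)
    return None


years = edge_property("Bob", "foaf:knows", "Peter", "ex:knowsForYears")
print(years)
```

Five extra triples are needed to say one thing about one edge, which is why I call this difficult rather than impossible.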

RDF inference Rule

Inference rules are built around RDFS and OWL, which focus mainly on type and set relationships, and are implemented as rules of the form (conditions => derived triple). But it is not easy to express a derived triple whose object value is computed by an expression over existing triples.

  • Family income is the sum of all individual members’ incomes
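The kind of rule I have in mind looks like the plain-Python sketch below: the derived triple's object is not copied from an existing triple but computed by summing over several of them. The data and the predicate names memberOf, income and familyIncome are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical toy data.
triples = [
    ("Bob", "memberOf", "Smiths"),
    ("Ann", "memberOf", "Smiths"),
    ("Joe", "memberOf", "Lees"),
    ("Bob", "income", 50000),
    ("Ann", "income", 70000),
    ("Joe", "income", 40000),
]


def derive_family_income(triples):
    """Derive one (family, "familyIncome", total) triple per family."""
    income = {s: o for s, p, o in triples if p == "income"}
    totals = defaultdict(int)
    for s, p, o in triples:
        if p == "memberOf":
            # A member with no stated income contributes 0 to the total.
            totals[o] += income.get(s, 0)
    return [(family, "familyIncome", total) for family, total in sorted(totals.items())]


derived = derive_family_income(triples)
print(derived)
```

The derived triples here are (Lees, familyIncome, 40000) and (Smiths, familyIncome, 120000).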

Support of Fuzzy Matches with Ranked results

SPARQL is based on a boolean query model, which is designed for exact matches. Expressing a fuzzy match with ranked results is very difficult.

  • Find the top 20 posts that are “similar” to this post, ranked by degree of similarity (let's say similarity is measured by the number of common tags that the 2 posts share)
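Outside of SPARQL, the ranking itself is simple to state. Here is a minimal plain-Python sketch using the similarity measure from the example above (number of shared tags); the posts, the tags and the predicate name tag are made up.

```python
# Hypothetical toy data.
triples = [
    ("post1", "tag", "rdf"), ("post1", "tag", "sparql"), ("post1", "tag", "graph"),
    ("post2", "tag", "rdf"), ("post2", "tag", "sparql"),
    ("post3", "tag", "graph"),
    ("post4", "tag", "hadoop"),
]


def tags_of(post):
    """Return the set of tags attached to a post."""
    return {o for s, p, o in triples if s == post and p == "tag"}


def most_similar(post, limit=20):
    """Rank other posts by number of shared tags, highest first, ties broken by name."""
    target = tags_of(post)
    others = {s for s, p, o in triples if p == "tag" and s != post}
    scored = [(other, len(target & tags_of(other))) for other in others]
    # Posts sharing no tag at all are not considered similar.
    scored = [(other, score) for other, score in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]


ranking = most_similar("post1")
print(ranking)
```

For post1 this returns post2 (2 shared tags) followed by post3 (1 shared tag); post4 shares nothing and is dropped.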

I am also very interested to see whether there are any large-scale deployments of RDF graphs in real-life scenarios. I am not aware of any popular social network site that uses RDF to store its social graph or social activities. I guess this may be due to the scalability of today's RDF implementations. I may be wrong though.

4 comments:

Anonymous said...

Thank you for this great blog :)

- PGore

Anonymous said...

Virtuoso's SPARQL-BI supports negation, transitive (variable length) path; combination of subqueries and custom aggregates is sufficient for ranking; reification support is no more weird than in RDF itself.

The only question is when these things will become standardized, not Virtuoso-specific, but note "when" in the question, not "whether" ;)

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

danbri said...

Hi! Good stuff. Just a couple side comments re SPARQL.

"Support of Path Expression
In SPARQL, expressing a variable length path is not easy. Find all posts written by Bob’s direct and indirect friends (everyone reachable from Bob)"

True of SPARQL 1.0, however v1.1 has a new [draft!] property paths feature, see http://www.w3.org/TR/sparql11-property-paths/
.. I'm sure the WG would appreciate review comments.


"Predicates cannot have Properties
This may be a RDF limitation that SPARQL inherits. Since RDF represents everything in Triples. It is easy to implement properties of a Node using extra Triples, but it is very difficult to implement properties in Edges."

Actually SPARQL is the only piece of the RDF universe that has a workaround for this. Since SPARQL is a language for RDF data access, where the data can be a collection of graphs, you can attach information to groups of triples and query against that.

"In SPARQL, there is no way to attaching a property to a “predicate”. Bob knows Peter for 5 years"

To query against this, you'd need to couch it in terms of some property of the graph. Sometimes that works ok, sometimes it's a hack. In this case I think it'd be a hack :)

SELECT ?who WHERE { GRAPH ?g { foaf:knows> . } ?g eg:trueForYears "5" }

... yeah, not beautiful but it's a corner of SPARQL worth looking at, since it emphasises the need to super-impose graphs from different sources in an ad-hoc, pragmatic fashion.

Ankur Goel said...

Hi Ricky,

Support of negation:
I agree it doesn't have a negation keyword, but it has a negation operator: using FILTER and OPTIONAL we can implement negation.

Support of Path Expression:
This is available now.

Predicates cannot have Properties:
True but it can be achieved using proper ontology. I echo danbri.

RDF inference Rule:
In my view, SPARQL/RDF are best suited for creating and traversing the graph. It's not good if you are doing computation. Using inference, it can generate more triples that can be accessed using the defined ontology.
For example, in your example we can add one more predicate, "family income", and put the computed income there using either direct logic or inference.