Converting between rule formats

A frustrating issue that arises in rule-based reasoning is the plethora of rule formats. From a syntactic point of view, these formats range from fairly similar (e.g., SPIN, Apache Jena, other SPARQL-related formats) to wholly different (e.g., Datalog). This means that interesting rule engines, which you may want to compare in terms of performance, may support different rule formats.

I ran into this issue when looking for a rule engine to implement Clinical Decision Support (CDS) for a mobile patient diary [1]. The heavyweight reasoning would still be deployed on the server side (where Drools was utilized), whereas urgent, time-sensitive reasoning (e.g., in response to new vital signs) would be deployed on the mobile client.

Mobile patient diary

Since the mobile patient diary was going to be developed using Apache Cordova, I looked at both Java and JavaScript rule engines. I manually ported some reasoners to the Android platform, including some OWL reasoners (Roberto Yus did the same for a similar project); if you’re interested, all the benchmarked Android reasoners can be found here. Regarding JavaScript reasoners, I compared the RDFQuery, RDFStore-JS (outfitted with naive reasoning) and Nools reasoners; benchmark results were reported in the literature [2-4]. In the context of this work, I also worked on optimizing OWL2 RL [4], a partial rule-based axiomatization of OWL2 for ontology reasoning, which I posted about before.

Ideally, I wanted a solution where I only needed to maintain a single ruleset for a given purpose (e.g., for CDS or OWL2 RL reasoning), which could then be converted to any other rule format when needed. I chose SPARQL Construct as the input format since SPARQL is well understood by most Semantic Web / Linked Data developers. Further, the SPARQL Inferencing Notation (SPIN) easily allows storing SPARQL queries together with the domain model.

To support this solution, I developed a simple web service for converting SPARQL Construct rules to Datalog, Apache Jena format and Nools format. Note that the Nools format is very similar to the Drools format – so it may be used to convert to Drools rules as well (given some adjustments). Further, the service can convert RDF data to the Datalog and Nools formats as well.
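To make the conversion idea concrete, below is a toy sketch (not the actual web service's code; the helper functions and the rule modelling are hypothetical) of how a single rule, modelled as a head pattern plus body patterns, can be serialized both as a SPARQL Construct query and in Apache Jena's forward-rule syntax:

```python
# Toy illustration of maintaining one rule and emitting it in multiple
# formats. A rule is a head (inferred patterns) plus a body (conditions);
# each triple pattern is a (subject, predicate, object) tuple.

def to_sparql_construct(head, body):
    """Serialize a rule as a SPARQL Construct query."""
    fmt = lambda tps: " . ".join("%s %s %s" % tp for tp in tps)
    return "CONSTRUCT { %s } WHERE { %s }" % (fmt(head), fmt(body))

def to_jena_rule(name, head, body):
    """Serialize the same rule in Apache Jena's forward-rule syntax."""
    fmt = lambda tps: ", ".join("(%s %s %s)" % tp for tp in tps)
    return "[%s: %s -> %s]" % (name, fmt(body), fmt(head))

# OWL2 RL rule cax-sco: class membership propagates to superclasses.
head = [("?x", "rdf:type", "?c2")]
body = [("?c1", "rdfs:subClassOf", "?c2"), ("?x", "rdf:type", "?c1")]

print(to_sparql_construct(head, body))
print(to_jena_rule("cax-sco", head, body))
```

The real conversion is of course harder (filters, built-ins, Nools/Datalog targets), but the principle is the same: one maintained ruleset, multiple serializations.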

Note that this conversion may be far from perfect, since I focused solely on the language features I needed at the time. Similarly, the set of supported formats is limited to those my work required. So, feel free to contribute to the project!

References

[1] W. Van Woensel, P.C. Roy, S.R. Abidi, S.S.R. Abidi, A Mobile and Intelligent Patient Diary for Chronic Disease Self-Management, in: Stud Heal. Technol Inf., 2015: pp. 118–122. doi:10.3233/978-1-61499-564-7-118.

[2] W. Van Woensel, N. Al Haider, P.C. Roy, A.M. Ahmad, S.S.R. Abidi, A comparison of mobile rule engines for reasoning on semantic web based health data, in: Proc. – 2014 IEEE/WIC/ACM Int. Jt. Conf. Web Intell. Intell. Agent Technol. – Work. WI-IAT 2014, 2014. doi:10.1109/WI-IAT.2014.25.

[3] W. Van Woensel, N. Al Haider, A. Ahmad, S.S.R. Abidi, A Cross-Platform Benchmark Framework for Mobile Semantic Web Reasoning Engines, in: Semant. Web — ISWC 2014, 2014: pp. 389–408. doi:10.1007/978-3-319-11964-9_25.

[4] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

Creating custom OWL2 RL rulesets

Yes, you may have noticed an abundance (cornucopia?) of OWL2 RL-related posts on this blog. I’ve done some work in this field, and thought it best to split up the work into more manageable chunks.

I’ve found that when working with OWL2 RL,

  • A number of important options for optimization exist;
  • Certain rules involving n-ary lists can be supported in multiple ways.

Given these considerations, wouldn’t it be cool if anyone could easily create their own OWL2 RL ruleset, geared towards their particular application scenario (availability of a “stable” ontology; need for full OWL2 RL conformance; need for n-ary rules)?

I first discuss these considerations below. Next, I talk about a web service that can generate a custom OWL2 RL ruleset accordingly.

Optimization

Regarding optimization, I’ve proposed three selections for creating OWL2 RL subsets. You can find a summary below.

In [1], I elaborate on these selections (and their impact on performance) in much more detail.

Equivalent OWL2 RL rule subset

Leaves out logically equivalent rules; replaces sets of specific rules with a single general rule; and may drop rules that are redundant at the instance level.

Note that this subset focuses in particular on reducing the OWL2 RL ruleset, not necessarily on optimizing it for reasoning (e.g., using more general rules was shown to reduce performance [1]).

Purpose and reference-based subsets

Divides rules into subsets based on their purpose and referenced data, allowing smaller rulesets to be applied in certain runtime scenarios.

OWL2 RL rules perform either inference or consistency-checking (purpose), and refer to instances and schema or only schema elements (reference). Further, rules that will not yield inferences over a particular ontology can be left out as well, by applying a separate pre-processing step (domain-specific).

Importantly, the applicability of this selection depends on the concrete use case, and whether the ontology can be considered relatively stable (i.e., not prone to change).

First, one can create two separate OWL2 RL rulesets, respectively containing rules that (a) refer to instances & schema assertions, and (b) only refer to schema assertions. While ruleset (a) is relevant whenever new instances are added, ruleset (b) is only relevant at initialization time and whenever the ontology changes. A (rather trivial) proof in [1] confirms that this process yields the same inferences as the full OWL2 RL ruleset.
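As a rough illustration of this split, here is a toy classifier (the schema-predicate list and rule modelling are hypothetical simplifications, not the actual selection procedure from [1]): a rule belongs to the schema-only ruleset (b) when all its body patterns use schema-level predicates.

```python
# Toy split of OWL2 RL rules into (a) instance & schema vs. (b)
# schema-only rulesets. Rule bodies are lists of (s, p, o) patterns;
# the predicate list below is a small hypothetical subset.
SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:subPropertyOf",
                     "rdfs:domain", "rdfs:range", "owl:equivalentClass"}

def is_schema_only(rule_body):
    # (b): every body pattern matches schema assertions only
    return all(p in SCHEMA_PREDICATES for (_, p, _) in rule_body)

# scm-sco (subclass transitivity) only touches the schema;
# cax-sco (type propagation) also matches instance assertions.
scm_sco = [("?c1", "rdfs:subClassOf", "?c2"), ("?c2", "rdfs:subClassOf", "?c3")]
cax_sco = [("?c1", "rdfs:subClassOf", "?c2"), ("?x", "rdf:type", "?c1")]

assert is_schema_only(scm_sco)      # (b): re-run only on ontology changes
assert not is_schema_only(cax_sco)  # (a): re-run whenever instances arrive
```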

Second, one can create a domain-specific ruleset, leaving out rules that will not yield inferences over a given ontology. Although this selection may yield huge performance improvements, note that it is especially brittle in dynamic settings: aside from ontology updates, even new data patterns may render a rule relevant (e.g., reciprocal subclass-of statements). In case of ontology updates or new data patterns, the domain-specific ruleset will need to be regenerated.

Removal of inefficient rules

This selection leaves out rules with a large performance impact.

Currently, it only leaves out the #eq-ref rule, which infers that each resource is owl:sameAs itself. This rule generates three new triples for each triple with unique resources, resulting in a worst-case 4x increase in dataset size (!).
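The worst-case blowup is easy to verify on a toy graph (a minimal sketch; the naive set-based representation is mine, not any engine's internals):

```python
# #eq-ref: for each triple (s, p, o), infer s owl:sameAs s,
# p owl:sameAs p and o owl:sameAs o.
def eq_ref(triples):
    inferred = set()
    for s, p, o in triples:
        for resource in (s, p, o):
            inferred.add((resource, "owl:sameAs", resource))
    return inferred

# every triple below mentions 3 unique resources (the worst case)
data = {("ex:a", "ex:p1", "ex:b"), ("ex:c", "ex:p2", "ex:d")}
closure = data | eq_ref(data)

# 3 reflexive sameAs triples per original triple: a 4x increase
assert len(closure) == 4 * len(data)
```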

N-ary rules

So-called n-ary rules refer to a finite list of elements. A first subset (L1) of these rules enumerates (i.e., lists one by one) restrictions on single list elements. For instance, rule #eq-diff2 flags an ontology inconsistency if two equivalent elements of an owl:AllDifferent construct are found. Rules from the second subset (L2) include restrictions referring to all list elements, and a third subset (L3) yields inferences for all list elements. E.g., for (L2), rule #cls-int1 infers that y is an instance of an intersection in case it is typed by each intersection member class; regarding (L3), for any union, rule #scm-uni infers that each member class is a subclass of that union.

To support subsets (L1) and (L3), two list-membership rules can be added that recursively link each element to preceding list cells, eventually linking the first cell to all list elements. Three possible solutions can be applied for (L2), each with their own advantages and drawbacks. These solutions are summarized here and elaborated in more detail in [1].
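The two list-membership rules can be sketched as follows (a toy fixpoint over rdf:first/rdf:rest cells; the `list_members` helper is hypothetical, not code from the ruleset):

```python
# Rule (1): a list cell is linked to its own rdf:first element.
# Rule (2): a cell is linked to all elements of its rdf:rest cell,
# so membership propagates backwards until the head cell sees all
# elements.
def list_members(triples):
    first = {s: o for (s, p, o) in triples if p == "rdf:first"}
    rest = {s: o for (s, p, o) in triples if p == "rdf:rest"}
    member = {(cell, elem) for cell, elem in first.items()}  # rule (1)
    changed = True
    while changed:                                           # rule (2)
        changed = False
        for cell, next_cell in rest.items():
            for (c, elem) in list(member):
                if c == next_cell and (cell, elem) not in member:
                    member.add((cell, elem))
                    changed = True
    return member

# the RDF collection (:a :b), encoded as cells _:l1 -> _:l2 -> rdf:nil
triples = [("_:l1", "rdf:first", ":a"), ("_:l1", "rdf:rest", "_:l2"),
           ("_:l2", "rdf:first", ":b"), ("_:l2", "rdf:rest", "rdf:nil")]
members = {e for (c, e) in list_members(triples) if c == "_:l1"}
assert members == {":a", ":b"}  # the head cell reaches every element
```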

Web service

I developed a web service that allows anyone to create their own custom OWL2 RL ruleset, by applying one or more optimizations and/or solutions for n-ary lists. It includes an initial OWL2 RL ruleset as a set of SPARQL Constructs (find it separately here). You can find the project on GitHub!

I previously posted about simple Java and Android projects that utilize an OWL2 RL ruleset for ontology reasoning. To convert the OWL2 RL ruleset into different formats (such as the Apache Jena one), there’s a project for that too.

Just plug in your own custom OWL2 RL ruleset, and feel free to share your results/thoughts in the comments!

References

[1] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

OWL2 RL on Android

AndroJena is a port of Apache Jena to the Android platform. Hence, it is an ideal starting point for working with semantic web data on mobile systems.

To help people to get started with AndroJena, I’ve created separate Android Studio modules for AndroJena, ARQoid and Lucenoid (i.e., ported Android versions of their Jena counterparts). I also created a simple AndroidRules project that includes these modules to perform ontology reasoning, utilizing an OWL2 RL ruleset [1]. Clearly, you can use any other ruleset as well. You can find everything on GitHub!

The project was created to be usable out-of-the-box — just point Android Studio towards the project folder (File > Open…) and select the AndroidRules project. If you run into issues, check out the README file.

The AndroidRules app loads a small subset of the pizza ontology and infers types for DominosMargheritaPizza.

In another post I describe a project that implements OWL2 RL reasoning using the regular Apache Jena.

References

[1] W. Van Woensel, S.S.R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, Semant. Web J. (2018).

OWL2 RL + Apache Jena

In the context of realizing ontology reasoning on resource-constrained platforms [1], I created an OWL2 RL ruleset in SPARQL Construct format. Note that, to support all n-ary rules, the ruleset includes a set of auxiliary rules that, as a side effect, generate intermediary (meaningless) inferences. If support for these n-ary rules is not required, these rules can be removed (the last 9 rules).

You can find this ruleset here, and the accompanying axioms here.

To illustrate how to use Apache Jena and the OWL2 RL ruleset for ontology-based reasoning, I created a simple Java Maven project on GitHub. This could be an efficient alternative for those having performance issues with Jena’s built-in ontology reasoning support, seeing how OWL2 RL is an OWL2 profile that trades expressivity for performance.
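To give a feel for what such rule-based ontology reasoning does, here is a minimal forward-chaining sketch (a toy fixpoint over two OWL2 RL rules; this is my own illustration, not the project's or Jena's actual implementation):

```python
# Naive forward chaining of two OWL2 RL rules to a fixpoint:
# cax-sco: (c1 subClassOf c2), (x type c1) -> (x type c2)
# scm-sco: (c1 subClassOf c2), (c2 subClassOf c3) -> (c1 subClassOf c3)
def infer(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (c1, p1, c2) in triples:
            if p1 == "rdfs:subClassOf":
                for (x, p2, c) in triples:
                    if p2 == "rdf:type" and c == c1:
                        new.add((x, "rdf:type", c2))            # cax-sco
                    if p2 == "rdfs:subClassOf" and x == c2:
                        new.add((c1, "rdfs:subClassOf", c))     # scm-sco
        if not new <= triples:
            triples |= new
            changed = True
    return triples

onto = {(":MargheritaPizza", "rdfs:subClassOf", ":Pizza"),
        (":Pizza", "rdfs:subClassOf", ":Food"),
        (":myPizza", "rdf:type", ":MargheritaPizza")}
closure = infer(onto)
assert (":myPizza", "rdf:type", ":Food") in closure
```

A production engine (Jena's GenericRuleReasoner, RETE-based engines) does this incrementally and far more efficiently, but the derived closure is the same idea.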

(Note that I used my conversion web service to convert the OWL2 RL ruleset into the Apache Jena rule format.)

Check it out and let me know what you think!

Note that another post discusses how OWL2 RL can be utilized for ontology reasoning on the Android platform.

References

[1] W. Van Woensel, S.S.R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, Semant. Web J. (2018).

My thoughts on the ESWC 2018 conference

Friggin European outlets!

I attended the ESWC 2018 conference in Crete in the beginning of June 2018. It took a grueling flight from Canada (a good 24 hours) but it was certainly worth it – I met plenty of interesting people doing interesting things. Being a member of a health informatics research group, I looked at things through the lens of both health informatics and my own personal interests, which often overlap.

On Tuesday morning, Ioanna Manolescu gave a keynote on RDF graph summarization – finding an RDF graph that summarizes an input RDF dataset as succinctly and accurately as possible while still retaining the semantics. You can find more information on the RDFSummary project here. Summarization of RDF graphs has quite some applications, ranging from UI design over data exploration to query optimization.

<div itemscope itemtype="http://schema.org/Blog">
  <h1 itemprop="name">Mobilizing Linked Data</h1>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    Author: <span itemprop="name">William</span>
  </div>
</div>
Indeed, in my opinion, a major hurdle for third parties to effectively leverage embedded structured data in websites (e.g., RDFa, microdata) is lack of knowledge on where “useful” structured data can be found. Illustrating this point, to realize a useful third-party e-commerce scenario based on the WDC dataset [1-2], Petrovski et al. [3] had to resort to an NLP pipeline based on product labels and descriptions – which defeats the point of embedded structured data, in my opinion. As a companion effort to the WDC, summary RDF graphs could be generated per website based on their embedded structured data, to (a) identify websites for interesting third-party use cases, and (b) in general, determine the progress of this “lightweight” semantic web. A second possible application involves query distribution – given (a) summary RDF graph(s) per dataset, a highly informed decision can be made about where to send subqueries and the join opportunities between them. I’ve done some preliminary work in that field [4].

“We need to rethink our strategy of hoping the Internet will just go away.”

Regrettably, the jetlag caught up with me on Wednesday (also, I had misread the conference program) so I missed the second keynote but I heard good things. The third keynote by Milan Stankovic was about creating and selling a semantic web startup. It was undoubtedly very interesting for people with similar goals, but even for people without entrepreneurial ambitions it was still an interesting talk, ranging from matinal pigs to fear of technology. Moreover, it’s always nice to see successful applications of semantic web technology.


Can’t be bad!

Below, you can find my thoughts on some of the talks I attended. I was surprised how many works applied machine learning on top of semantic web technology, which I think is an interesting development. Also, there seemed to be quite some work on NLP and Question Answering systems, which was a bit surprising.

At the end, I list some works that, in my opinion, are useful contributions to the operationalization of linked data.

Machine Learning

Applications of machine learning included coping with incomplete knowledge graphs and semantically annotating free text. For instance, graph convolutional networks were presented as a solution for link prediction and entity classification. To produce additional, missing image-caption pairs for training image classifiers, Mogadala et al. combined deep learning with knowledge graphs.

Avvenuti et al. utilized a knowledge graph to extract geographic metadata related to parsed tokens from free text, using an SVM to deal with language polysemy and rule out incorrect matches (negative samples for training this SVM were based on issues they had found during evaluation).

Anna Lisa Gentile gave a very interesting talk about aligning user conceptualizations to a target ontology, using a novel machine learning technique based on ontology hierarchies (hierarchical classification). It was evaluated by aligning user blog postings to the medDRA medical taxonomy.

I’m very intrigued by how knowledge graphs could be utilized to improve machine learning. Indeed, not only is the latter work (Gentile) useful from an ontology engineering point of view, it constitutes a contribution to the machine learning field as well!

Heterogeneous knowledge sources

In ontology alignment, Thieblin et al. utilized DL to model complex alignments between multiple ontologies, which involves full concept equivalency or concept inclusion in either direction. They presented a reference dataset with complex alignments for benchmarking ontology alignment approaches. Saeedi et al. presented an approach to perform entity resolution by clustering similar entities from different datasets, based on entity similarity combined with link strength and degree.

Georgala et al. proposed a dynamic link discovery system called CONDOR, which re-evaluates execution plans for link specifications at runtime.

This dynamic re-evaluation aspect could be considered similar to query optimization in databases and RDF stores, at least in cases where limited statistical data is available, which is the assumption here. It nicely illustrates how cross-fertilization between different fields can yield useful results.

QA systems

On QA systems, the Frankenstein system supports the dynamic composition of query answering pipelines and presents a set of re-usable components (named entity recognition, relation/concept learning, etc.).

Although this talk, and others on QA systems, were interesting and the work useful, I feel that semantic web technology still has much more to offer to QA systems: e.g., utilizing concept and relation subsumption to easily adapt information requests to be more generic or specific, based on returned results and/or user needs.

Operationalizing the Semantic Web

Here, I list works that, in my opinion, operationalize the semantic web for large-scale utilization and (mobile) deployment. (Also, they don’t seem to fit in other general categories.)

The HDT (Header-Dictionary-Triples) format is a highly compressed format for RDF triples (along the lines of 120 GB -> 13 GB), which allows for the efficient execution of certain queries while still in compressed format. The talk on HDTQ did a good job of introducing this format, and proposed the HDTQ extension, which allows compressing quads as well. Similarly, Charpenay et al. introduced a binary object notation for RDF, by encoding its JSON-LD format using existing binary JSON formats. You can find the GitHub project here.

I feel that this kind of “low-level” work is elementary in realizing efficient and realistic semantic web solutions. Regrettably, in my experience, it is quite hard to publish (and get funding) for this sort of work – it’s very easy to shoot at as a reviewer (“you didn’t compare to (my) X, Y or Z system!”), at least, compared to “novel” high-level approaches. We saw quite a lot of the latter at the conference, but not too many of the former. This kind of work builds a solid foundation for semantic web applications and, in my opinion, typically does not get enough attention.

Margara et al. discussed temporal reasoning over RDF streams with a particular focus on temporal relations. It’s very interesting work, and it may have interesting applications in health informatics – indeed, there is work on preventing alert fatigue in physicians that focuses on firing alerts only when adverse temporal patterns are found (e.g., quickly increasing fever) [5].

Mobile patient diary

I talked about Clinical Practice Guidelines (CPG) and their computerization in the form of Clinical Decision Support Systems (CDSS). Combined with the increasing need to allow patients to self-manage their illness, as well as improvements in mobile hardware, an opportunity arises for mobile patient diaries outfitted with local CDSS. In this context, I talked about benchmarks we performed for rule engines on mobile platforms [6-7], as well as efforts to optimize OWL2 RL for realizing ontology-based reasoning on mobile platforms [8-9]. Our paper concerned extending RETE, an algorithm underlying many rule engines, with special consideration for the many generic rules in OWL2 RL and typical semantic web scenarios.

Note that this kind of work has many applications outside of health informatics – by utilizing local decision support, less remote (cloud) resources are needed; bandwidth usage can be greatly reduced (especially for raw sensor data); and there is no influence of network conditions on local, time-sensitive tasks.

Soto et al. proposed an extended SERVICE clause that allows referencing values from JSON service output, identified via a JSONPath expression. I found this work quite interesting – one could extend the SERVICE clause with support for any other kind of format (XML, CSV, HTML, ..) this way. Perhaps a more SPARQL-friendly way of referencing and querying this data could be possible as well.
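The core idea is easy to sketch (a toy resolver for a simplified dotted-path syntax; the `resolve` helper and the example service output are hypothetical, and real JSONPath is considerably richer):

```python
import json

# Resolve a simplified JSONPath-like expression, e.g. "$.results[0].temp",
# against parsed JSON service output. Supports only dotted keys and
# single numeric indices - a toy subset of JSONPath.
def resolve(doc, path):
    node = doc
    for part in path.lstrip("$.").replace("]", "").split("."):
        if "[" in part:
            key, idx = part.split("[")
            node = node[key][int(idx)]
        else:
            node = node[part]
    return node

output = json.loads('{"results": [{"city": "Heraklion", "temp": 28}]}')
assert resolve(output, "$.results[0].temp") == 28
```

In the proposed SERVICE extension, values extracted this way would be bound to SPARQL variables for use in the rest of the query.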

Calvanese et al. discussed the need for canonical URIs when integrating multiple relational databases in the context of OBDA, and presented an extension of SPARQL entailment to that end. This work was done in the context of a large-scale OBDA project (the Statoil use case) that resulted in the Ontop artefact. I feel that this work is a quintessential example of effective operationalization: it illustrated a clear need for a SPARQL extension, based on real-world problems, and did a good job of explaining its formalization.

All good things come to an end!
And now for some beach time!

References

[1] C. Bizer, K. Eckert, R. Meusel, H. Mühleisen, M. Schuhmacher, J. Völker, Deployment of RDFa, Microdata, and Microformats on the Web; A Quantitative Analysis, in: Proc. 12th Int. Semant. Web Conf. – Part II, Springer-Verlag New York, Inc., New York, NY, USA, 2013: pp. 17–32. doi:10.1007/978-3-642-41338-4_2.

[2] R. Meusel, C. Bizer, H. Paulheim, A Web-scale Study of the Adoption and Evolution of the Schema.Org Vocabulary over Time, in: Proc. 5th Int. Conf. Web Intell. Min. Semant., ACM, New York, NY, USA, 2015: p. 15:1–15:11. doi:10.1145/2797115.2797124.

[3] P. Petrovski, V. Bryl, C. Bizer, Integrating product data from websites offering microdata markup, in: C.-W. Chung, A.Z. Broder, K. Shim, T. Suel (Eds.), 23rd Int. World Wide Web Conf. {WWW} ’14, Seoul, Repub. Korea, April 7-11, 2014, Companion Vol., ACM, 2014: pp. 1299–1304. doi:10.1145/2567948.2579704.

[4] W. Van Woensel, Mobile semantic query distribution with graph-based outsourcing of subqueries, in: 1st International Workshop on Mobile Deployment of Semantic Technologies (MoDeST 2015) co-located with 14th International Semantic Web Conference (ISWC 2015), 2015.

[5] D. Klimov, Y. Shahar, iALARM: An Intelligent Alert Language for Activation, Response, and Monitoring of Medical Alerts, in: Process Support Knowl. Represent. Heal. Care SE – 10, Springer International Publishing, 2013: pp. 128–142. doi:10.1007/978-3-319-03916-9_10.

[6] W. Van Woensel, N. Al Haider, P.C. Roy, A.M. Ahmad, S.S.R. Abidi, A comparison of mobile rule engines for reasoning on semantic web based health data, in: Proc. – 2014 IEEE/WIC/ACM Int. Jt. Conf. Web Intell. Intell. Agent Technol. – Work. WI-IAT 2014, 2014. doi:10.1109/WI-IAT.2014.25.

[7] W. Van Woensel, N. Al Haider, A. Ahmad, S.S.R. Abidi, A Cross-Platform Benchmark Framework for Mobile Semantic Web Reasoning Engines, in: Semant. Web — ISWC 2014, 2014: pp. 389–408. doi:10.1007/978-3-319-11964-9_25.

[8] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

[9] W. Van Woensel, S. S. R. Abidi, Optimizing Semantic Reasoning on Memory-Constrained Platforms using the RETE Algorithm, in: 15th Extended Semantic Web Conference (ESWC 2018), Springer LNCS, Heraklion, Greece.