Converting between rule formats

A frustrating issue that arises in rule-based reasoning is the plethora of rule formats. From a syntactic point of view, these formats range from fairly similar (e.g., SPIN, Apache Jena, other SPARQL-related formats) to wholly different (e.g., Datalog). This means that interesting rule engines, which you may want to compare in terms of performance, may support different rule formats.

I ran into this issue when looking for a rule engine to implement Clinical Decision Support (CDS) for a mobile patient diary [1]. The heavyweight reasoning would still be deployed on the server side (where Drools was utilized), whereas urgent, time-sensitive reasoning (e.g., in response to new vital signs) would be deployed on the mobile client.

Mobile patient diary

Since the mobile patient diary was going to be developed using Apache Cordova, I looked at both Java and JavaScript rule engines. I manually ported some reasoners to the Android platform, including some OWL reasoners (Roberto Yus did the same for a similar project); if you’re interested, all the benchmarked Android reasoners can be found here. Regarding JavaScript reasoners, I compared the RDFQuery, RDFStore-JS (outfitted with naive reasoning) and Nools reasoners; benchmark results were reported in the literature [2-4]. In the context of this work, I also worked on optimizing OWL2 RL [4], a partial rule-based axiomatization of OWL2 for ontology reasoning, which I posted about before.

Ideally, I wanted a solution where I only needed to maintain a single ruleset for a given purpose (e.g., for CDS or OWL2 RL reasoning), which could then be converted to any other rule format when needed. I chose SPARQL Construct as the input format since SPARQL is well understood by most Semantic Web / Linked Data developers. Further, the SPARQL Inferencing Notation (SPIN) easily allows storing SPARQL queries together with the domain model.

To support this solution, I developed a simple web service for converting SPARQL Construct rules to Datalog, Apache Jena format and Nools format. Note that the Nools format is very similar to the Drools format – so it may be used to convert to Drools rules as well (given some adjustments). Further, the service can convert RDF data to the Datalog and Nools formats as well.
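To make the conversion idea concrete, below is a toy sketch (not the actual web service's code; the helper functions and the rule modelling are hypothetical) of how a single rule, modelled as a head pattern plus body patterns, can be serialized both as a SPARQL Construct query and in Apache Jena's forward-rule syntax:

```python
# Toy illustration of maintaining one rule and emitting it in multiple
# formats. A rule is a head (inferred patterns) plus a body (conditions);
# each triple pattern is a (subject, predicate, object) tuple.

def to_sparql_construct(head, body):
    """Serialize a rule as a SPARQL Construct query."""
    fmt = lambda tps: " . ".join("%s %s %s" % tp for tp in tps)
    return "CONSTRUCT { %s } WHERE { %s }" % (fmt(head), fmt(body))

def to_jena_rule(name, head, body):
    """Serialize the same rule in Apache Jena's forward-rule syntax."""
    fmt = lambda tps: ", ".join("(%s %s %s)" % tp for tp in tps)
    return "[%s: %s -> %s]" % (name, fmt(body), fmt(head))

# OWL2 RL rule cax-sco: class membership propagates to superclasses.
head = [("?x", "rdf:type", "?c2")]
body = [("?c1", "rdfs:subClassOf", "?c2"), ("?x", "rdf:type", "?c1")]

print(to_sparql_construct(head, body))
print(to_jena_rule("cax-sco", head, body))
```

The real conversion is of course harder (filters, built-ins, Nools/Datalog targets), but the principle is the same: one maintained ruleset, multiple serializations.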

Note that this conversion may be far from perfect, since I focused solely on the language features I needed at the time. Similarly, the set of supported formats is limited to those my work required. So, feel free to contribute to the project!

References

[1] W. Van Woensel, P.C. Roy, S.R. Abidi, S.S.R. Abidi, A Mobile and Intelligent Patient Diary for Chronic Disease Self-Management, in: Stud Heal. Technol Inf., 2015: pp. 118–122. doi:10.3233/978-1-61499-564-7-118.

[2] W. Van Woensel, N. Al Haider, P.C. Roy, A.M. Ahmad, S.S.R. Abidi, A comparison of mobile rule engines for reasoning on semantic web based health data, in: Proc. – 2014 IEEE/WIC/ACM Int. Jt. Conf. Web Intell. Intell. Agent Technol. – Work. WI-IAT 2014, 2014. doi:10.1109/WI-IAT.2014.25.

[3] W. Van Woensel, N. Al Haider, A. Ahmad, S.S.R. Abidi, A Cross-Platform Benchmark Framework for Mobile Semantic Web Reasoning Engines, in: Semant. Web — ISWC 2014, 2014: pp. 389–408. doi:10.1007/978-3-319-11964-9_25.

[4] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

Creating custom OWL2 RL rulesets

Yes, you may have noticed an abundance (cornucopia?) of OWL2 RL-related posts on this blog. I’ve done some work in this field, and thought it best to split up the work into more manageable chunks.

I’ve found that when working with OWL2 RL,

  • A number of important options for optimization exist;
  • Certain rules involving n-ary lists can be supported in multiple ways.

Given these considerations, wouldn’t it be cool if anyone could easily create their own OWL2 RL ruleset, geared towards their particular application scenario (availability of a “stable” ontology; need for full OWL2 RL conformance; need for n-ary rules)?

I first discuss these considerations below. Next, I talk about a web service that can generate a custom OWL2 RL ruleset accordingly.

Optimization

Regarding optimization, I’ve proposed three selections for creating OWL2 RL subsets. You can find a summary below.

In [1], I elaborate on these selections (and their impact on performance) in much more detail.

Equivalent OWL2 RL rule subset

Leaves out logically equivalent rules; replaces sets of specific rules with a single general rule; and may drop rules that are redundant at the instance level.

Note that this subset focuses in particular on reducing the OWL2 RL ruleset, not necessarily on optimizing it for reasoning (e.g., using more general rules was shown to reduce performance [1]).

Purpose and reference-based subsets

Divides rules into subsets based on their purpose and referenced data, allowing smaller rulesets to be applied in certain runtime scenarios.

OWL2 RL rules perform either inference or consistency-checking (purpose), and refer to instances and schema or only schema elements (reference). Further, rules that will not yield inferences over a particular ontology can be left out as well, by applying a separate pre-processing step (domain-specific).

Importantly, the applicability of this selection depends on the concrete use case, and whether the ontology can be considered relatively stable (i.e., not prone to change).

First, one can create two separate OWL2 RL rulesets, respectively containing rules that (a) refer to instances & schema assertions, and (b) only refer to schema assertions. While ruleset (a) is relevant whenever new instances are added, ruleset (b) is only relevant at initialization time and whenever the ontology changes. A (rather trivial) proof in [1] confirms that this process yields the same inferences as the full OWL2 RL ruleset.
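As a rough illustration of this split, here is a toy classifier (the schema-predicate list and rule modelling are hypothetical simplifications, not the actual selection procedure from [1]): a rule belongs to the schema-only ruleset (b) when all its body patterns use schema-level predicates.

```python
# Toy split of OWL2 RL rules into (a) instance & schema vs. (b)
# schema-only rulesets. Rule bodies are lists of (s, p, o) patterns;
# the predicate list below is a small hypothetical subset.
SCHEMA_PREDICATES = {"rdfs:subClassOf", "rdfs:subPropertyOf",
                     "rdfs:domain", "rdfs:range", "owl:equivalentClass"}

def is_schema_only(rule_body):
    # (b): every body pattern matches schema assertions only
    return all(p in SCHEMA_PREDICATES for (_, p, _) in rule_body)

# scm-sco (subclass transitivity) only touches the schema;
# cax-sco (type propagation) also matches instance assertions.
scm_sco = [("?c1", "rdfs:subClassOf", "?c2"), ("?c2", "rdfs:subClassOf", "?c3")]
cax_sco = [("?c1", "rdfs:subClassOf", "?c2"), ("?x", "rdf:type", "?c1")]

assert is_schema_only(scm_sco)      # (b): re-run only on ontology changes
assert not is_schema_only(cax_sco)  # (a): re-run whenever instances arrive
```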

Second, one can create a domain-specific ruleset, leaving out rules that will not yield inferences over a given ontology. Although this selection may yield huge performance improvements, note that it is especially brittle in dynamic settings: aside from ontology updates, even new data patterns may render a rule relevant (e.g., reciprocal subclass-of statements). In case of ontology updates or new data patterns, the domain-specific ruleset will need to be regenerated.

Removal of inefficient rules

This selection leaves out rules with a large performance impact.

Currently, it only leaves out the #eq-ref rule, which infers that each resource is owl:sameAs itself. This rule generates three new triples for each triple with unique resources, resulting in a worst-case 4x increase in dataset size (!).
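The worst-case blowup is easy to verify on a toy graph (a minimal sketch; the naive set-based representation is mine, not any engine's internals):

```python
# #eq-ref: for each triple (s, p, o), infer s owl:sameAs s,
# p owl:sameAs p and o owl:sameAs o.
def eq_ref(triples):
    inferred = set()
    for s, p, o in triples:
        for resource in (s, p, o):
            inferred.add((resource, "owl:sameAs", resource))
    return inferred

# every triple below mentions 3 unique resources (the worst case)
data = {("ex:a", "ex:p1", "ex:b"), ("ex:c", "ex:p2", "ex:d")}
closure = data | eq_ref(data)

# 3 reflexive sameAs triples per original triple: a 4x increase
assert len(closure) == 4 * len(data)
```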

N-ary rules

So-called n-ary rules refer to a finite list of elements. A first subset (L1) of these rules enumerates (i.e., lists one by one) restrictions on single list elements. For instance, rule #eq-diff2 flags an ontology inconsistency if two equivalent elements of an owl:AllDifferent construct are found. Rules from the second subset (L2) include restrictions referring to all list elements, and a third subset (L3) yields inferences for all list elements. E.g., for (L2), rule #cls-int1 infers that y is an instance of an intersection in case it is typed by each intersection member class; regarding (L3), for any union, rule #scm-uni infers that each member class is a subclass of that union.

To support subsets (L1) and (L3), two list-membership rules can be added that recursively link each element to preceding list cells, eventually linking the first cell to all list elements. Three possible solutions can be applied for (L2), each with their own advantages and drawbacks. These solutions are summarized here and elaborated in more detail in [1].
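The two list-membership rules can be sketched as follows (a toy fixpoint over rdf:first/rdf:rest cells; the `list_members` helper is hypothetical, not code from the ruleset):

```python
# Rule (1): a list cell is linked to its own rdf:first element.
# Rule (2): a cell is linked to all elements of its rdf:rest cell,
# so membership propagates backwards until the head cell sees all
# elements.
def list_members(triples):
    first = {s: o for (s, p, o) in triples if p == "rdf:first"}
    rest = {s: o for (s, p, o) in triples if p == "rdf:rest"}
    member = {(cell, elem) for cell, elem in first.items()}  # rule (1)
    changed = True
    while changed:                                           # rule (2)
        changed = False
        for cell, next_cell in rest.items():
            for (c, elem) in list(member):
                if c == next_cell and (cell, elem) not in member:
                    member.add((cell, elem))
                    changed = True
    return member

# the RDF collection (:a :b), encoded as cells _:l1 -> _:l2 -> rdf:nil
triples = [("_:l1", "rdf:first", ":a"), ("_:l1", "rdf:rest", "_:l2"),
           ("_:l2", "rdf:first", ":b"), ("_:l2", "rdf:rest", "rdf:nil")]
members = {e for (c, e) in list_members(triples) if c == "_:l1"}
assert members == {":a", ":b"}  # the head cell reaches every element
```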

Web service

I developed a web service that allows anyone to create their own custom OWL2 RL ruleset, by applying one or more optimizations and/or solutions for n-ary lists. It includes an initial OWL2 RL ruleset as a set of SPARQL Constructs (find it separately here). You can find the project on GitHub!

I previously posted about simple Java and Android projects that utilize an OWL2 RL ruleset for ontology reasoning. To convert the OWL2 RL ruleset into different formats (such as the Apache Jena one), there’s a project for that too.

Just plug in your own custom OWL2 RL ruleset, and feel free to share your results/thoughts in the comments!

References

[1] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

OWL2 RL on Android

AndroJena is a port of Apache Jena to the Android platform. Hence, it is an ideal starting point for working with semantic web data on mobile systems.

To help people to get started with AndroJena, I’ve created separate Android Studio modules for AndroJena, ARQoid and Lucenoid (i.e., ported Android versions of their Jena counterparts). I also created a simple AndroidRules project that includes these modules to perform ontology reasoning, utilizing an OWL2 RL ruleset [1]. Clearly, you can use any other ruleset as well. You can find everything on GitHub!

The project was created to be usable out-of-the-box — just point Android Studio towards the project folder (File > Open…) and select the AndroidRules project. If you run into issues, check out the README file.

The AndroidRules app loads a small subset of the pizza ontology and infers types for DominosMargheritaPizza.

In another post I describe a project that implements OWL2 RL reasoning using the regular Apache Jena.

References

[1] W. Van Woensel, S.S.R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, Semant. Web J. (2018).

OWL2 RL + Apache Jena

In the context of realizing ontology reasoning on resource-constrained platforms [1], I created an OWL2 RL ruleset in SPARQL Construct format. Note that, to support all n-ary rules, the ruleset includes a set of auxiliary rules that, as a side effect, generate intermediary (meaningless) inferences. If support for these n-ary rules is not required, these rules can be removed (the last 9 rules).

You can find this ruleset here, and the accompanying axioms here.

To illustrate how to use Apache Jena and the OWL2 RL ruleset for ontology-based reasoning, I created a simple Java Maven project on GitHub. This could be an efficient alternative for those having performance issues with Jena’s built-in ontology reasoning support, seeing how OWL2 RL is an OWL2 profile that trades expressivity for performance.
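To give a feel for what such rule-based ontology reasoning does, here is a minimal forward-chaining sketch (a toy fixpoint over two OWL2 RL rules; this is my own illustration, not the project's or Jena's actual implementation):

```python
# Naive forward chaining of two OWL2 RL rules to a fixpoint:
# cax-sco: (c1 subClassOf c2), (x type c1) -> (x type c2)
# scm-sco: (c1 subClassOf c2), (c2 subClassOf c3) -> (c1 subClassOf c3)
def infer(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (c1, p1, c2) in triples:
            if p1 == "rdfs:subClassOf":
                for (x, p2, c) in triples:
                    if p2 == "rdf:type" and c == c1:
                        new.add((x, "rdf:type", c2))            # cax-sco
                    if p2 == "rdfs:subClassOf" and x == c2:
                        new.add((c1, "rdfs:subClassOf", c))     # scm-sco
        if not new <= triples:
            triples |= new
            changed = True
    return triples

onto = {(":MargheritaPizza", "rdfs:subClassOf", ":Pizza"),
        (":Pizza", "rdfs:subClassOf", ":Food"),
        (":myPizza", "rdf:type", ":MargheritaPizza")}
closure = infer(onto)
assert (":myPizza", "rdf:type", ":Food") in closure
```

A production engine (Jena's GenericRuleReasoner, RETE-based engines) does this incrementally and far more efficiently, but the derived closure is the same idea.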

(Note that I used my conversion web service to convert the OWL2 RL ruleset into the Apache Jena rule format.)

Check it out and let me know what you think!

Note that another post discusses how OWL2 RL can be utilized for ontology reasoning on the Android platform.

References

[1] W. Van Woensel, S.S.R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, Semant. Web J. (2018).

My thoughts on the ESWC 2018 conference

Friggin European outlets!

I attended the ESWC 2018 conference in Crete in the beginning of June 2018. It took a grueling flight from Canada (a good 24 hours) but it was certainly worth it – I met plenty of interesting people doing interesting things. Being a member of a health informatics research group, I looked at things through the lens of both health informatics and my own personal interests, which often overlap.

On Tuesday morning, Ioanna Manolescu gave a keynote on RDF graph summarization – finding an RDF graph that summarizes an input RDF dataset as succinctly and accurately as possible while still retaining the semantics. You can find more information on the RDFSummary project here. Summarization of RDF graphs has quite some applications, ranging from UI design over data exploration to query optimization.

<div itemscope itemtype="http://schema.org/Blog">
  <h1 itemprop="name">Mobilizing Linked Data</h1>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    Author: <span itemprop="name">William</span>
  </div>
</div>
Indeed, in my opinion, a major hurdle for third parties to effectively leverage embedded structured data in websites (e.g., RDFa, microdata) is lack of knowledge on where “useful” structured data can be found. Illustrating this point, to realize a useful third-party e-commerce scenario based on the WDC dataset [1-2], Petrovski et al. [3] had to resort to an NLP pipeline based on product labels and descriptions – which defeats the point of embedded structured data, in my opinion. As a companion effort to the WDC, summary RDF graphs could be generated per website based on their embedded structured data, to (a) identify websites for interesting third-party use cases, and (b) in general, determine the progress of this “lightweight” semantic web. A second possible application involves query distribution – given (a) summary RDF graph(s) per dataset, a highly informed decision can be made about where to send subqueries and the join opportunities between them. I’ve done some preliminary work in that field [4].

“We need to rethink our strategy of hoping the Internet will just go away.”

Regrettably, the jetlag caught up with me on Wednesday (also, I had misread the conference program) so I missed the second keynote but I heard good things. The third keynote by Milan Stankovic was about creating and selling a semantic web startup. It was undoubtedly very interesting for people with similar goals, but even for people without entrepreneurial ambitions it was still an interesting talk, ranging from matinal pigs to fear of technology. Moreover, it’s always nice to see successful applications of semantic web technology.


Can’t be bad!

Below, you can find my thoughts on some of the talks I attended. I was surprised how many works applied machine learning on top of semantic web technology, which I think is an interesting development. Also, there seemed to be quite some work on NLP and Question Answering systems, which was a bit surprising.

At the end, I list some works that, in my opinion, are useful contributions to the operationalization of linked data.

Machine Learning

Applications of machine learning included coping with incomplete knowledge graphs and semantically annotating free text. For instance, graph convolutional networks were presented as a solution for link prediction and entity classification. To produce additional, missing image-caption pairs for training image classifiers, Mogadala et al. combined deep learning with knowledge graphs.

Avvenuti et al. utilized a knowledge graph to extract geographic metadata related to parsed tokens from free text, using an SVM to deal with language polysemy and rule out incorrect matches (negative samples for training this SVM were based on issues they had found during evaluation).

Anna Lisa Gentile gave a very interesting talk about aligning user conceptualizations to a target ontology, using a novel machine learning technique based on ontology hierarchies (hierarchical classification). It was evaluated by aligning user blog postings to the medDRA medical taxonomy.

I’m very intrigued by how knowledge graphs could be utilized to improve machine learning. Indeed, not only is the latter work (Gentile) useful from an ontology engineering point of view, it constitutes a contribution to the machine learning field as well!

Heterogeneous knowledge sources

In ontology alignment, Thieblin et al. utilized DL to model complex alignments between multiple ontologies, which involves full concept equivalency or concept inclusion in either direction. They presented a reference dataset with complex alignments for benchmarking ontology alignment approaches. Saeedi et al. presented an approach to perform entity resolution by clustering similar entities from different datasets, based on entity similarity combined with link strength and degree.

Georgala et al. proposed a dynamic link discovery system called CONDOR, which re-evaluates execution plans for link specifications at runtime.

This dynamic re-evaluation aspect could be considered similar to query optimization in databases and RDF stores, at least in cases where limited statistical data is available, which is the assumption here. It nicely illustrates how cross-fertilization between different fields can yield useful results.

QA systems

On QA systems, the Frankenstein system supports the dynamic composition of query answering pipelines and presents a set of re-usable components (named entity recognition, relation/concept learning, etc.).

Although this talk, and others on QA systems, were interesting and the work useful, I feel that semantic web technology still has much more to offer to QA systems: e.g., utilizing concept and relation subsumption to easily adapt information requests to be more generic or specific, based on returned results and/or user needs.

Operationalizing the Semantic Web

Here, I list works that, in my opinion, operationalize the semantic web for large-scale utilization and (mobile) deployment. (Also, they don’t seem to fit in other general categories.)

The HDT (Header-Dictionary-Triples) format is a highly compressed format for RDF triples (along the lines of 120 GB -> 13 GB), which allows for the efficient execution of certain queries while still in compressed format. The talk on HDTQ did a good job of introducing this format, and proposed the HDTQ extension, which allows compressing quads as well. Similarly, Charpenay et al. introduced a binary object notation for RDF, by encoding its JSON-LD format using existing binary JSON formats. You can find the GitHub project here.

I feel that this kind of “low-level” work is elementary in realizing efficient and realistic semantic web solutions. Regrettably, in my experience, it is quite hard to publish (and get funding) for this sort of work – it’s very easy to shoot at as a reviewer (“you didn’t compare to (my) X, Y or Z system!”), at least, compared to “novel” high-level approaches. We saw quite a lot of the latter at the conference, but not too many of the former. This kind of work builds a solid foundation for semantic web applications and, in my opinion, typically does not get enough attention.

Margara et al. discussed temporal reasoning over RDF streams with a particular focus on temporal relations. It’s very interesting work, and it may have interesting applications in health informatics – indeed, there is work on preventing alert fatigue in physicians that focuses on firing alerts only when adverse temporal patterns are found (e.g., quickly increasing fever) [5].

Mobile patient diary

I talked about Clinical Practice Guidelines (CPG) and their computerization in the form of Clinical Decision Support Systems (CDSS). Combined with the increasing need to allow patients to self-manage their illness, as well as improvements in mobile hardware, an opportunity arises for mobile patient diaries outfitted with local CDSS. In this context, I talked about benchmarks we performed for rule engines on mobile platforms [6-7], as well as efforts to optimize OWL2 RL for realizing ontology-based reasoning on mobile platforms [8-9]. Our paper concerned extending RETE, an algorithm underlying many rule engines, with special consideration for the many generic rules in OWL2 RL and typical semantic web scenarios.

Note that this kind of work has many applications outside of health informatics – by utilizing local decision support, less remote (cloud) resources are needed; bandwidth usage can be greatly reduced (especially for raw sensor data); and there is no influence of network conditions on local, time-sensitive tasks.

Soto et al. proposed an extended SERVICE clause that allows referencing values from JSON service output, identified via a JSONPath expression. I found this work quite interesting – one could extend the SERVICE clause with support for any other kind of format (XML, CSV, HTML, ..) this way. Perhaps a more SPARQL-friendly way of referencing and querying this data could be possible as well.
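The core idea is easy to sketch (a toy resolver for a simplified dotted-path syntax; the `resolve` helper and the example service output are hypothetical, and real JSONPath is considerably richer):

```python
import json

# Resolve a simplified JSONPath-like expression, e.g. "$.results[0].temp",
# against parsed JSON service output. Supports only dotted keys and
# single numeric indices - a toy subset of JSONPath.
def resolve(doc, path):
    node = doc
    for part in path.lstrip("$.").replace("]", "").split("."):
        if "[" in part:
            key, idx = part.split("[")
            node = node[key][int(idx)]
        else:
            node = node[part]
    return node

output = json.loads('{"results": [{"city": "Heraklion", "temp": 28}]}')
assert resolve(output, "$.results[0].temp") == 28
```

In the proposed SERVICE extension, values extracted this way would be bound to SPARQL variables for use in the rest of the query.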

Calvanese et al. discussed the need for canonical URIs when integrating multiple relational databases in the context of OBDA, and presented an extension of SPARQL entailment to that end. This work was done in the context of a large-scale OBDA project (the Statoil use case) that resulted in the Ontop artefact. I feel that this work is a quintessential example of effective operationalization: it illustrated a clear need for a SPARQL extension, based on real-world problems, and did a good job of explaining its formalization.

All good things come to an end!
And now for some beach time!

References

[1] C. Bizer, K. Eckert, R. Meusel, H. Mühleisen, M. Schuhmacher, J. Völker, Deployment of RDFa, Microdata, and Microformats on the Web; A Quantitative Analysis, in: Proc. 12th Int. Semant. Web Conf. – Part II, Springer-Verlag New York, Inc., New York, NY, USA, 2013: pp. 17–32. doi:10.1007/978-3-642-41338-4_2.

[2] R. Meusel, C. Bizer, H. Paulheim, A Web-scale Study of the Adoption and Evolution of the Schema.Org Vocabulary over Time, in: Proc. 5th Int. Conf. Web Intell. Min. Semant., ACM, New York, NY, USA, 2015: p. 15:1–15:11. doi:10.1145/2797115.2797124.

[3] P. Petrovski, V. Bryl, C. Bizer, Integrating product data from websites offering microdata markup, in: C.-W. Chung, A.Z. Broder, K. Shim, T. Suel (Eds.), 23rd Int. World Wide Web Conf. {WWW} ’14, Seoul, Repub. Korea, April 7-11, 2014, Companion Vol., ACM, 2014: pp. 1299–1304. doi:10.1145/2567948.2579704.

[4] W. Van Woensel, Mobile semantic query distribution with graph-based outsourcing of subqueries, in: 1st International Workshop on Mobile Deployment of Semantic Technologies (MoDeST 2015) co-located with 14th International Semantic Web Conference (ISWC 2015), 2015.

[5] D. Klimov, Y. Shahar, iALARM: An Intelligent Alert Language for Activation, Response, and Monitoring of Medical Alerts, in: Process Support Knowl. Represent. Heal. Care SE – 10, Springer International Publishing, 2013: pp. 128–142. doi:10.1007/978-3-319-03916-9_10.

[6] W. Van Woensel, N. Al Haider, P.C. Roy, A.M. Ahmad, S.S.R. Abidi, A comparison of mobile rule engines for reasoning on semantic web based health data, in: Proc. – 2014 IEEE/WIC/ACM Int. Jt. Conf. Web Intell. Intell. Agent Technol. – Work. WI-IAT 2014, 2014. doi:10.1109/WI-IAT.2014.25.

[7] W. Van Woensel, N. Al Haider, A. Ahmad, S.S.R. Abidi, A Cross-Platform Benchmark Framework for Mobile Semantic Web Reasoning Engines, in: Semant. Web — ISWC 2014, 2014: pp. 389–408. doi:10.1007/978-3-319-11964-9_25.

[8] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

[9] W. Van Woensel, S. S. R. Abidi, Optimizing Semantic Reasoning on Memory-Constrained Platforms using the RETE Algorithm, in: 15th Extended Semantic Web Conference (ESWC 2018), Springer LNCS, Heraklion, Greece.