My thoughts on the ESWC 2018 conference

I attended the ESWC 2018 conference in Crete at the beginning of June 2018. It took a grueling flight from Canada (a good 24 hours) but it was certainly worth it – I met plenty of interesting people doing interesting things. Being a member of a health informatics research group, I looked at things through the lens of both health informatics and my own personal interests, which often overlap.

On Tuesday morning, Ioanna Manolescu gave a keynote on RDF graph summarization – finding an RDF graph that summarizes an input RDF dataset as succinctly and accurately as possible while still retaining the semantics. You can find more information on the RDFSummary project here. Summarization of RDF graphs has quite a few applications, ranging from UI design and data exploration to query optimization.
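To make the idea of a summary graph concrete, here is a minimal sketch of one very simple kind of summary: every node is collapsed into the set of classes it is typed with, and only the edges between those groups are kept. This is an assumption-laden illustration, not the RDFSummary approach from the keynote; the prefixed names (`ex:alice`, `ex:Blog`, ...) and the `untyped` placeholder are made up for the example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def summarize(triples):
    """Collapse each node into its set of classes and return the summary edges."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    def group(node):
        # Nodes without a type (e.g., literals) share one placeholder group.
        return frozenset(types[node]) if node in types else frozenset({"untyped"})

    summary = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            summary.add((group(s), p, group(o)))
    return summary


# Hypothetical input data: two blog posts, each with one author.
triples = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:post1", RDF_TYPE, "ex:Blog"),
    ("ex:post2", RDF_TYPE, "ex:Blog"),
    ("ex:post1", "ex:author", "ex:alice"),
    ("ex:post2", "ex:author", "ex:bob"),
]
summary = summarize(triples)
```

Here, the two `ex:author` edges collapse into a single summary edge from the `ex:Blog` group to the `ex:Person` group, which is the kind of compact overview a UI or query optimizer could work with.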

<div itemscope itemtype="http://schema.org/Blog">
<h1 itemprop="name">Mobilizing Linked Data</h1>
<div itemprop="author" itemscope
itemtype="http://schema.org/Person">
Author: <span itemprop="name">William</span>
</div>
</div>

Indeed, in my opinion, a major hurdle for third parties to effectively leverage embedded structured data in websites (e.g., RDFa, microdata) is a lack of knowledge about where “useful” structured data can be found. Illustrating this point, to realize a useful third-party e-commerce scenario based on the WDC dataset [1-2], Petrovski et al. [3] had to resort to an NLP pipeline based on product labels and descriptions – which defeats the point of embedded structured data, in my opinion. As a companion effort to the WDC, summary RDF graphs could be generated per website based on their embedded structured data, to (a) identify websites for interesting third-party use cases, and (b) in general, determine the progress of this “lightweight” semantic web. A second possible application involves query distribution – given one or more summary RDF graphs per dataset, a highly informed decision can be made about where to send subqueries and which join opportunities exist between them. I’ve done some preliminary work in that field [4].
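As a rough illustration of the query distribution idea, the sketch below routes each triple pattern of a query to the datasets whose summary mentions the pattern's predicate. It assumes the crudest possible summary (a set of predicates per dataset) and is not the approach from [4]; the dataset names `shopA` and `shopB` are hypothetical.

```python
def route(patterns, summaries):
    """Map each triple pattern to the datasets whose summary lists its predicate."""
    plan = {}
    for s, p, o in patterns:
        plan[(s, p, o)] = sorted(
            name for name, predicates in summaries.items() if p in predicates
        )
    return plan


# Hypothetical per-dataset summaries: just the predicates each dataset uses.
summaries = {
    "shopA": {"schema:name", "schema:price"},
    "shopB": {"schema:name", "schema:review"},
}
patterns = [("?p", "schema:name", "?n"), ("?p", "schema:price", "?c")]
plan = route(patterns, summaries)
```

Both patterns can be answered by `shopA`, so a planner could send them there together and let that source perform the join on `?p`; a richer summary (e.g., with class and edge information) would allow finer-grained decisions.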

“We need to rethink our strategy of hoping the Internet will just go away.”

Regrettably, the jet lag caught up with me on Wednesday (also, I had misread the conference program), so I missed the second keynote, but I heard good things. The third keynote by Milan Stankovic was about creating and selling a semantic web startup. It was undoubtedly very interesting for people with similar goals, but even for people without entrepreneurial ambitions it was still an interesting talk, ranging from matinal pigs to fear of technology. Moreover, it’s always nice to see successful applications of semantic web technology.

Below, you can find my thoughts on some of the talks I attended. I was surprised by how many works applied machine learning on top of semantic web technology, which I think is an interesting development. Also, there seemed to be quite a bit of work on NLP and Question Answering systems, which was a bit surprising.

At the end, I list some works that, in my opinion, are useful contributions to the operationalization of linked data.

Machine Learning

Applications of machine learning included coping with incomplete knowledge graphs and semantically annotating free text. For instance, graph convolutional networks were presented as a solution for link prediction and entity classification. To produce additional, missing image-caption pairs for training image classifiers, Mogadala et al. combined deep learning with knowledge graphs.

Avvenuti et al. utilized a knowledge graph to extract geographic metadata related to parsed tokens from free text, using an SVM to deal with language polysemy and rule out incorrect matches (the negative samples for training this SVM were based on issues they had found during evaluation).

Anna Lisa Gentile gave a very interesting talk about aligning user conceptualizations to a target ontology, using a novel machine learning technique based on ontology hierarchies (hierarchical classification). It was evaluated by aligning user blog postings to the MedDRA medical taxonomy.

I’m very intrigued by how knowledge graphs could be utilized to improve machine learning. Indeed, not only is the latter work (Gentile) useful from an ontology engineering point of view, it makes a contribution to the machine learning field as well!

Heterogeneous knowledge sources

In ontology alignment, Thieblin et al. utilized DL to model complex alignments between multiple ontologies, which involves full concept equivalency or concept inclusion in either direction. They presented a reference dataset with complex alignments for benchmarking ontology alignment approaches. Saeedi et al. presented an approach to perform entity resolution by clustering similar entities from different datasets, based on entity similarity combined with link strength and degree.

Georgala et al. proposed a dynamic link discovery system called CONDOR, which re-evaluates execution plans for link specifications at runtime.

This dynamic re-evaluation aspect could be considered similar to query optimization in databases and RDF stores, at least in cases where limited statistical data is available, which is the assumption here. It nicely illustrates how cross-fertilization between different fields can yield useful results.

QA systems

On QA systems, the Frankenstein system supports the dynamic composition of question answering pipelines and presents a set of re-usable components (named entity recognition, relation/concept learning, etc.).

Although this talk, and others on QA systems, were interesting and the work useful, I feel that semantic web technology still has much more to offer to QA systems: e.g., utilizing concept and relation subsumption to easily adapt information requests to be more generic or specific, based on returned results and/or user needs.
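A minimal sketch of what I mean by using subsumption, under made-up data: if a request for a specific concept returns nothing, the request is widened to the parent concept, and a request for a concept always includes its subconcepts. The hierarchy, the instance index, and all `ex:` names are hypothetical and not taken from any of the presented systems.

```python
# Hypothetical concept hierarchy (child -> parent) and instance index.
SUBCLASS_OF = {"ex:Migraine": "ex:Headache", "ex:Headache": "ex:Symptom"}
INSTANCES = {"ex:Headache": ["ex:doc7"], "ex:Symptom": ["ex:doc2"]}


def subclasses(concept):
    """Return the concept plus everything below it in the hierarchy."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def answer(concept):
    """Look up instances of a concept, widening to its parent while nothing matches."""
    while concept is not None:
        hits = sorted(i for c in subclasses(concept) for i in INSTANCES.get(c, []))
        if hits:
            return concept, hits
        concept = SUBCLASS_OF.get(concept)  # generalize one step
    return None, []


used_concept, results = answer("ex:Migraine")
```

In this toy data nothing is annotated with `ex:Migraine`, so the request is relaxed to `ex:Headache`, which does return a result; the same hierarchy could be walked downwards to make an overly broad request more specific.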

Operationalizing the Semantic Web

Here, I list works that, in my opinion, operationalize the semantic web for large-scale utilization and (mobile) deployment. (Also, they don’t seem to fit in other general categories.)

The HDT (Header-Dictionary-Triples) format is a highly compressed format for RDF triples (along the lines of 120 GB -> 13 GB), which allows for the efficient execution of certain queries while the data is still in compressed form. The talk on HDTQ did a good job of introducing this format, and proposed the HDTQ extension that allows compressing quads as well. Similarly, Charpenay et al. introduced a binary object notation for RDF, by encoding its JSON-LD format using existing binary JSON formats. You can find the GitHub project here.
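The sketch below illustrates only the general dictionary-plus-triples idea: each term is stored once and replaced by an integer ID, and the sorted ID triples can be searched directly without decoding them. It is a simplified illustration under my own assumptions and not the actual HDT layout, which is considerably more compact; the function names and example data are hypothetical.

```python
from bisect import bisect_left


def encode(triples):
    """Replace every term by an integer ID; return the dictionary and sorted ID triples."""
    ids = {}
    encoded = []
    for triple in triples:
        # setdefault hands out the next free ID the first time a term is seen.
        encoded.append(tuple(ids.setdefault(term, len(ids)) for term in triple))
    return ids, sorted(encoded)


def by_subject(encoded, subject_id):
    """Binary-search the sorted ID triples for one subject, without decoding anything."""
    start = bisect_left(encoded, (subject_id,))
    matches = []
    for triple in encoded[start:]:
        if triple[0] != subject_id:
            break
        matches.append(triple)
    return matches


# Hypothetical example data.
triples = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", "Alice"),
]
ids, encoded = encode(triples)
terms = {i: term for term, i in ids.items()}  # reverse dictionary for decoding
about_a = [tuple(terms[i] for i in t) for t in by_subject(encoded, ids["ex:a"])]
```

Only the matching triples are decoded at the end; the lookup itself runs on integers, which hints at why queries by subject can be answered efficiently on the compressed data.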

I feel that this kind of “low-level” work is essential in realizing efficient and realistic semantic web solutions. Regrettably, in my experience, it is quite hard to publish (and get funding for) this sort of work – it’s very easy to shoot down as a reviewer (“you didn’t compare to (my) X, Y or Z system!”), at least, compared to “novel” high-level approaches. We saw quite a lot of the latter at the conference, but not too many of the former. This kind of work builds a solid foundation for semantic web applications and, in my opinion, typically does not get enough attention.

Margara et al. discussed temporal reasoning over RDF streams with a particular focus on temporal relations. It’s very interesting work, and it may have interesting applications in health informatics – indeed, there is work on preventing alert fatigue in physicians that focuses on firing alerts only when adverse temporal patterns are found (e.g., quickly increasing fever) [5].
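To illustrate the "quickly increasing fever" example, here is a minimal sliding-window sketch that fires an alert only when a reading is at least a given amount above the lowest reading within the preceding time window. It is neither the approach of Margara et al. nor that of [5]; the window size, the threshold and the readings are made-up values.

```python
from collections import deque


def rising_alerts(readings, window_minutes=60, min_rise=1.0):
    """Return the timestamps at which the value rose by min_rise within the window."""
    window = deque()
    alerts = []
    for minute, temperature in readings:
        window.append((minute, temperature))
        # Drop readings that fell out of the time window.
        while window[0][0] < minute - window_minutes:
            window.popleft()
        lowest = min(value for _, value in window)
        if temperature - lowest >= min_rise:
            alerts.append(minute)
    return alerts


# Hypothetical stream of (minute, body temperature in degrees Celsius) readings.
readings = [(0, 37.0), (30, 37.4), (50, 38.2), (200, 38.3)]
alerts = rising_alerts(readings)
```

The reading at minute 200 is the highest one but does not fire an alert, since the temperature did not rise within its window; only the rapid increase around minute 50 does.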

I talked about Clinical Practice Guidelines (CPG) and their computerization in the form of Clinical Decision Support Systems (CDSS). Combined with the increasing need to allow patients to self-manage their illness, as well as improvements in mobile hardware, an opportunity arises for mobile patient diaries outfitted with local CDSS. In this context, I talked about benchmarks we performed for rule engines on mobile platforms [6-7], as well as efforts to optimize OWL2 RL for realizing ontology-based reasoning on mobile platforms [8-9]. Our paper concerned extending RETE, an algorithm underlying many rule engines, with special consideration for the many generic rules in OWL2 RL and typical semantic web scenarios.

Note that this kind of work has many applications outside of health informatics – by utilizing local decision support, fewer remote (cloud) resources are needed; bandwidth usage can be greatly reduced (especially for raw sensor data); and there is no influence of network conditions on local, time-sensitive tasks.
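For readers unfamiliar with rule-based OWL2 RL reasoning, the sketch below applies two of its generic rules (cax-sco and scm-sco, which propagate types and subclass relations along rdfs:subClassOf) until no new facts appear. Note that this is a deliberately naive fixpoint loop that re-joins all facts in every round; it is not RETE and not our optimized version, and the `ex:` data is hypothetical. Avoiding exactly this kind of repeated work is what RETE-style algorithms are for.

```python
def materialize(triples):
    """Naively apply cax-sco and scm-sco until a fixpoint is reached."""
    facts = set(triples)
    while True:
        sub = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        typ = [(s, o) for s, p, o in facts if p == "rdf:type"]
        new = set()
        for c1, c2 in sub:
            # cax-sco: x type c1, c1 subClassOf c2  ->  x type c2
            for x, c in typ:
                if c == c1:
                    new.add((x, "rdf:type", c2))
            # scm-sco: c1 subClassOf c2, c2 subClassOf c3  ->  c1 subClassOf c3
            for other, c3 in sub:
                if other == c2:
                    new.add((c1, "rdfs:subClassOf", c3))
        new -= facts
        if not new:
            return facts
        facts |= new


# Hypothetical patient data and class hierarchy.
facts = materialize([
    ("ex:Fever", "rdfs:subClassOf", "ex:Symptom"),
    ("ex:Symptom", "rdfs:subClassOf", "ex:Finding"),
    ("ex:obs1", "rdf:type", "ex:Fever"),
])
```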

Soto et al. proposed an extended SERVICE clause that allows referencing values from JSON service output, identified via a JSONPath expression. I found this work quite interesting – one could extend the SERVICE clause with support for any other kind of format (XML, CSV, HTML, ...) this way. Perhaps a more SPARQL-friendly way of referencing and querying this data could be possible as well.
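The following sketch shows the underlying mechanism in isolation: values are picked out of a JSON response with a path expression and turned into variable bindings, as a query engine would need to do. It assumes a hand-written evaluator for a tiny JSONPath subset (`$`, `.key` and `[*]` only) and does not reproduce the syntax proposed by Soto et al.; the service response and the `?name` variable are made up.

```python
import json


def json_path(doc, path):
    """Evaluate a tiny JSONPath subset: '$', '.key' steps, and '[*]' over lists."""
    nodes = [doc]
    for step in path.lstrip("$").strip(".").split("."):
        wildcard = step.endswith("[*]")
        key = step[:-3] if wildcard else step
        next_nodes = []
        for node in nodes:
            if key:
                if not isinstance(node, dict) or key not in node:
                    continue  # this branch has no such key: no match
                node = node[key]
            if wildcard:
                if isinstance(node, list):
                    next_nodes.extend(node)
            else:
                next_nodes.append(node)
        nodes = next_nodes
    return nodes


# Hypothetical JSON output of a remote service.
response = json.loads(
    '{"results": [{"name": "Heraklion", "temp": 27}, {"name": "Chania", "temp": 25}]}'
)
# Each extracted value becomes one solution binding for the (hypothetical) variable ?name.
bindings = [{"?name": value} for value in json_path(response, "$.results[*].name")]
```

Swapping `json_path` for an XPath or CSV column selector would give the same kind of bindings for other formats, which is why the idea generalizes so easily.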

Calvanese et al. discussed the need for canonical URIs when integrating multiple relational databases in the context of OBDA, and presented an extension of SPARQL entailment to that end. This work was done in the context of a large-scale OBDA project (the Statoil use case) that resulted in the Ontop artefact. I feel that this work is a quintessential example of effective operationalization: it illustrated a clear need for a SPARQL extension, based on real-world problems, and did a good job of explaining its formalization.
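To show why canonical URIs matter, the sketch below groups URIs that are known to denote the same entity and rewrites every triple to one representative per group, so that data from different databases joins on the same identifier. This is a plain union-find illustration under my own assumptions (the lexicographically smallest URI is chosen as the canonical one), not the SPARQL entailment extension from the paper; all URIs are hypothetical.

```python
def canonical_map(same_as_pairs):
    """Map every URI to the lexicographically smallest URI of its equivalence class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            low, high = sorted((root_a, root_b))
            parent[high] = low  # the smaller URI stays the representative
    return {uri: find(uri) for uri in parent}


# Hypothetical equivalences between identifiers from three databases.
canon = canonical_map([
    ("db1:well/7", "db2:wellbore/A7"),
    ("db2:wellbore/A7", "db3:w/007"),
])
triples = [("db2:wellbore/A7", "ex:depth", "3100"), ("db3:w/007", "ex:operator", "ex:acme")]
rewritten = [(canon.get(s, s), p, canon.get(o, o)) for s, p, o in triples]
```

After rewriting, both facts are attached to `db1:well/7`, so a single query pattern on that subject retrieves the depth and the operator together.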

References

[1] C. Bizer, K. Eckert, R. Meusel, H. Mühleisen, M. Schuhmacher, J. Völker, Deployment of RDFa, Microdata, and Microformats on the Web; A Quantitative Analysis, in: Proc. 12th Int. Semant. Web Conf. – Part II, Springer-Verlag New York, Inc., New York, NY, USA, 2013: pp. 17–32. doi:10.1007/978-3-642-41338-4_2.

[2] R. Meusel, C. Bizer, H. Paulheim, A Web-scale Study of the Adoption and Evolution of the Schema.Org Vocabulary over Time, in: Proc. 5th Int. Conf. Web Intell. Min. Semant., ACM, New York, NY, USA, 2015: p. 15:1–15:11. doi:10.1145/2797115.2797124.

[3] P. Petrovski, V. Bryl, C. Bizer, Integrating product data from websites offering microdata markup, in: C.-W. Chung, A.Z. Broder, K. Shim, T. Suel (Eds.), 23rd Int. World Wide Web Conf. {WWW} ’14, Seoul, Repub. Korea, April 7-11, 2014, Companion Vol., ACM, 2014: pp. 1299–1304. doi:10.1145/2567948.2579704.

[4] W. Van Woensel, Mobile semantic query distribution with graph-based outsourcing of subqueries, in: 1st International Workshop on Mobile Deployment of Semantic Technologies (MoDeST 2015) co-located with 14th International Semantic Web Conference (ISWC 2015), 2015.

[5] Klimov, Y. Shahar, iALARM: An Intelligent Alert Language for Activation, Response, and Monitoring of Medical Alerts, in: Process Support Knowl. Represent. Heal. Care SE – 10, Springer International Publishing, 2013: pp. 128–142. doi:10.1007/978-3-319-03916-9_10.

[6] W. Van Woensel, N. Al Haider, P.C. Roy, A.M. Ahmad, S.S.R. Abidi, A comparison of mobile rule engines for reasoning on semantic web based health data, in: Proc. – 2014 IEEE/WIC/ACM Int. Jt. Conf. Web Intell. Intell. Agent Technol. – Work. WI-IAT 2014, 2014. doi:10.1109/WI-IAT.2014.25.

[7] W. Van Woensel, N. Al Haider, A. Ahmad, S.S.R. Abidi, A Cross-Platform Benchmark Framework for Mobile Semantic Web Reasoning Engines, in: Semant. Web — ISWC 2014, 2014: pp. 389–408. doi:10.1007/978-3-319-11964-9_25.

[8] W. Van Woensel, S. S. R. Abidi, Benchmarking Semantic Reasoning on Mobile Platforms: Towards Optimization Using OWL2 RL, in: Semantic Web Journal, Forthcoming (2018).

[9] W. Van Woensel, S. S. R. Abidi, Optimizing Semantic Reasoning on Memory-Constrained Platforms using the RETE Algorithm, in: 15th Extended Semantic Web Conference (ESWC 2018), Springer LNCS, Heraklion, Greece.