Semantic Digital Libraries

Fields: Knowledge Extraction, Digital Libraries, Ontology Design.

Research Focus: This research line applies the theory and practice of human-centered information systems developed in the fundamental research line. It handles (unstructured) text documents and their related (structured) metadata in the context of digital libraries. In this scenario, the metadata required to perform human-centric queries is typically unavailable. That missing information therefore needs to be extracted both from external sources, such as user judgements, and from the text of the documents themselves. Thus, the focus of this research line is on the domain-specific knowledge extraction and linking techniques required to realize the vision of human-centered information systems for digital libraries.

Domain-Specific Pitch: Academic publications are a central repository of human knowledge and are at the core of scientific advancement, both in academia itself and in industry. However, tapping into this vast repository is a daunting task, as the number of available publications is growing at tremendous speed. Without proper support, it is often hard or even impossible to find publications relevant to a given problem in a timely fashion. Providing this support has always been the domain of libraries. However, the near-exponential growth of content in recent years, together with the shift to digital resources, has invalidated many well-proven workflows and demands new solutions suited to the current age. Efforts to make highly specialized academic knowledge accessible need to go beyond simple bibliographic metadata, the current state of the art. Instead, most information search by human users is inherently entity-centric: entities are the aspect of a publication that people perceive as most relevant. Some domains, like medicine and chemistry, realized this trend early on and invested heavily in annotating scientific publications with their most relevant entities, such as genes and proteins, chemical structures and molecules, or drug names, to support more meaningful search and exploration. However, these efforts are very costly, as they still rely heavily on manual curation and semi-manual workflows, and are thus prohibitive for many domains which lack the resources for such measures. Thus, in many domains the query capabilities and metadata availability are insufficient to cope with users’ information demands.
Therefore, in this line of work, I propose to design, develop, and evaluate novel techniques for extracting entity-centric metadata from research publications in a mostly automatic fashion to support human-centered queries, and to showcase the effectiveness of these approaches in domains which currently lack rich semantic academic metadata. Beyond the obvious contributions, such as providing entity-centric search, offering faceted browsing capabilities, and realizing semantically meaningful recommendation and exploration of content, I can also use the extracted metadata for contributions to the digital library domain itself by tracking trends or changes of topics in a visually appealing and comprehensive fashion.

The outlined contributions are as follows:

  • Extending the current state of the art of systems research in the digital library domain by covering challenges like analyzing and annotating educational content, sequencing educational content into micro-learning objects, and developing both recommendation and query capabilities
  • Developing a demonstrator prototype system which can augment current digital libraries with additional human-centric metadata
  • Developing human-centered query capabilities that use this metadata for innovative query paradigms, such as visual exploration or faceted navigation



Example of Domain-Specific Content-Related Metadata
Example of possible analysis techniques: Visualization of Corpus, Trend-tracking
Christoph Lofi