Research: Perceptual Properties for Relational Databases

Fields: Database Query Processing, Subspace Clustering, Probabilistic Databases, Sentiment Analysis, Recommender Systems.

Some of the most valuable features of relational databases are clearly defined schemas with crisp semantics, which allow for rich and complex declarative queries. However, this comes at a cost: the underlying schema must be carefully designed upfront to support the queries expected to fulfill the information needs of future users, and the schema should model the actual nature and semantics of the represented real-world entities in such a way that it naturally aligns with the internalized semantics of the users issuing the queries. In some application scenarios, this focus on strict schemas can become problematic. As an example, consider an e-commerce scenario focusing on selling experience products like movies, books, music, or games. Here, the perceived properties describing the user experience those products will entail (which, for most people, is the deciding factor for buying the product) are difficult to capture using relational schemas, which often leads to a focus on more objective and crisp properties like production year, actor names, or rough genre labels. As a result, many queries users would naturally perform are not supported by the system, for example queries for movies which “feel” like a given example movie, or movies which feature a “thought-provoking plot”, are “educational”, or are “suitable for children”. We call those queries human-centered queries, as they are the queries most humans would use in a natural conversation with another human, but they are often not supported by information systems. One of the challenges around perceived properties of experience products is that it is very hard to foresee at schema design time which properties will be relevant for users and how users will perceive them (i.e., the challenge of obtaining values for the properties). In particular, many of these properties may be subjective, so the perceptions of different users might differ or even conflict (e.g., there might be conflicting views on how “funny” a given movie is), and no single clear value exists.

I claim that most of the perceptual information required to support such human-centered queries can be obtained from user-generated judgements such as ratings, comments, or reviews. This form of feedback, which can be seen as self-motivated crowdsourcing, is a promising source of information, as such judgements usually cover the perceptual properties and aspects deemed important by the creator of the judgement. However, integrating this rich source of information into the query process is hard due to the aforementioned challenges, and many applications do not attempt an integration at all: in most applications (for example web shops), user reviews are simply displayed for manual consumption, or user ratings are used within a recommender system. Usually, it is not possible to access the richness of information contained in human judgements in a declarative and explicit relational fashion.

In this line of work, we explore, from a database query processing perspective, the challenge of supporting such human-centered queries over perceptual properties.

The outlined contributions are as follows:

  • Developing a general vision of a database system using perceptual properties, and discussing a high-level model of how to integrate perceptual properties into a suitable data model.
  • Placing a special focus on consensual perceptual properties to deal with subjectivity in user perception, i.e., properties of entities for which the values emerge from a consensus in the perception of a larger user base. Also, we introduce multi-consensual properties, for which there is not a single consensual value but several.
  • Researching both explicit and latent properties. Here, explicit properties have a real-world interpretation which is explainable to users, while latent properties are opaque but can still be used for several query types, such as similarity queries.
  • Investigating how perceptual properties are represented within a database system. A promising candidate is adapting probabilistic databases, coupled with subspace clustering and exploration to deal with both subjectivity and uncertainty of extraction.
  • Developing multiple prototype implementations of systems which can extract, store, and process perceptual properties. Each of these implementations focuses on a specific subset of the challenge, e.g., extracting explicit properties, or dealing with multi-consensual values. The long-term goal is to aggregate and combine these individual systems into a larger demonstrator which can be used to showcase the research results.
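
To make the distinction between consensual and multi-consensual properties more concrete, the following is a minimal illustrative sketch, not the method used in this line of work. It assumes that user judgements for one perceptual property of one entity (e.g., how "funny" a movie is) have already been mapped to numbers in [0, 1]. The function name `consensual_values`, the parameter `spread_threshold`, and its default of 0.15 are hypothetical, and the "largest gap" split is only a simple stand-in for the clustering techniques mentioned above.

```python
from statistics import mean, pstdev


def consensual_values(judgements, spread_threshold=0.15):
    """Return the consensual value(s) for one property of one entity.

    judgements: numeric user judgements in [0, 1] (an assumption of this sketch).
    spread_threshold: hypothetical cut-off; a population standard deviation
        at or below it is treated as a single consensus.
    """
    if not judgements:
        return []  # no judgements, so no value can be derived

    # Low spread: users largely agree, so one consensual value suffices.
    if pstdev(judgements) <= spread_threshold:
        return [mean(judgements)]

    # High spread: split the sorted judgements at the largest gap and
    # report one value per group (a crude two-group stand-in for clustering).
    ordered = sorted(judgements)
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [mean(ordered[:cut]), mean(ordered[cut:])]


if __name__ == "__main__":
    # Users mostly agree: a single consensual value is returned.
    print(consensual_values([0.8, 0.9, 0.8, 0.9]))
    # Users fall into two camps: two consensual values are returned.
    print(consensual_values([0.1, 0.2, 0.1, 0.9, 0.8, 0.9]))
```

In a database setting, the returned values could be stored as alternative attribute values, for instance with associated probabilities in a probabilistic database. How best to do so is one of the open questions listed above.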
Christoph Lofi