On this page you will find more information about the courses and thesis projects from the Web Information Systems group at TU Delft.
Courses
For 2009-2010 the following new master-level courses are given:
- IN4324: Web & Semantic Web Engineering, 5ec, teacher: Geert-Jan Houben, period: 1+2
- IN4325: Information Retrieval, 5ec, teacher: Philipp Cimiano, period: 3
- IN4326: Seminar Web Information Systems, 5ec, teacher: Jan Hidders, period: 1
- IN3509: Web-Technologie, 5ec, teacher: Bernard Sodoyer, period: 1+2, in Dutch; this course is the successor of both IN3510 Internet Applications and IN3510 Advanced Design of Information Systems.
Thesis projects
The WIS group is open to students who want to do their thesis on subjects in the wider area of web information systems and information architecture.
At the WIS group we encourage students to contact prof. Geert-Jan Houben (make an appointment via the secretary) to discuss possible topics for a thesis (and literature study). In such discussions students are free to suggest their own topic, and together a concrete thesis (or literature study) subject will then be defined.
As a rough indication of possible topics for thesis projects, below we give some topics of projects that have been running, that are running or that are open for new students:
- Manipulating RDF models following the Object Oriented Paradigm
ActiveRDF is a Ruby API for accessing RDF data as objects. It is a very powerful framework but it also has many limitations. This assignment proposes to build a new API for RDF manipulation, taking the best features of ActiveRDF and extending them with new ones. The development of the API will be based on observations collected during years of development of semantic web applications. The main features to be added include:
- Federated Query Mechanism
- Object Caching Mechanism
- Full support for SPARQL and its main extensions (max, count, etc.)
- Full support for SPARUL
- Connectivity and interchangeability with all RDF repositories.
- http://www.activerdf.org/
- http://docs.openlinksw.com/virtuoso/rdfnativestorageproviders.html
- http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04956617
- Using social network sites to prevent the user model cold-start
Personalization in applications is usually based on user or context models that represent relevant aspects of the user or context, e.g. the interests in films, film genres or actors in the case of a movie recommender application. Obtaining the data for these user models is often not an easy task, especially when the user starts using the application, the so-called cold start. Importing and using data from other sites can help here. This project aims to investigate how information from social network sites can be used to fill user models for a given application, e.g. in the domain of movies or TV programs, by studying the mapping between user models and implementing a configurable software tool that extracts data from social networking sites and transforms it into a user model for the given application.
- Tag-to-concept mapping
Tagging is an easy way, applied by many new web (2.0) applications, of allowing users to express their opinion about given resources, e.g. pictures in Flickr or videos in YouTube. Most information systems in practice, certainly most commercial websites, use a concept-based approach where some structure is identified in the data via a schema or an ontology. This structure is needed to efficiently organize and store the data in databases. Obviously the freedom of the tag-based approach and the structure of the concept-based approach require some "glue" if you want to use them together in an application, for example when an existing information system is extended with a tagging interface. The goal of this project is to study ways to effectively connect tags and concepts and to create a software tool that generates such a connection between tags and concepts.
- Ontology population from Wikipedia
The goal of this thesis is to develop, implement and evaluate a program which is able to populate an ontology on the basis of Wikipedia. Ontologies are the basis of the Semantic Web and are only useful if they are populated with data, i.e. instances of concepts and relations. Wikipedia contains a large amount of structured information in the form of infoboxes, which could be used to populate a given ontology. However, the way that the information is structured in the ontology and in Wikipedia can differ substantially, so that a mapping between the two is required. At a conceptual level, the goal of this project is to envision an appropriate language for expressing these mappings as well as to implement a software tool which can execute these mappings and produce a populated ontology (on the basis of DBpedia) as output.
- TV Program Recommendation with Explicit Semantic Analysis
Matching content and profiles is a widely acknowledged problem, which manifests itself particularly in the task of recommending TV programs to people on the basis of their profile and preferences. The goal of this thesis is to explore semantic techniques for this purpose. In particular, the goal will be to build on an approach known as Explicit Semantic Analysis (ESA) to recommend TV programs to users on the basis of their Facebook or Hyves profiles. Besides establishing matches for the purpose of recommendation, a crucial goal is also to explain to users why certain content is recommended. This master thesis can be carried out in industry.
Some literature: Evgeniy Gabrilovich, Shaul Markovitch: Computing Semantic Relatedness Using Wikipedia-based Explicit Semantic Analysis. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1606-1611, 2007.
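As a rough illustration of the idea behind ESA-based matching, the sketch below maps a text to a weighted vector of concepts and compares a user profile with program descriptions by cosine similarity; the concepts that both vectors share can serve as a simple explanation of a recommendation. Everything in it is an assumption made for illustration: the tiny `CONCEPT_INDEX`, the function names and the example programs are hypothetical, and in ESA proper the concepts are Wikipedia articles with TF-IDF weights rather than hand-picked values.

```python
import math
from collections import defaultdict

# Hypothetical toy index: word -> {concept: weight}. In ESA proper the concepts
# are Wikipedia articles and the weights are TF-IDF scores of the word in them.
CONCEPT_INDEX = {
    "football": {"Sport": 0.9, "Television": 0.1},
    "match": {"Sport": 0.7},
    "goal": {"Sport": 0.6},
    "cooking": {"Cuisine": 0.9},
    "recipe": {"Cuisine": 0.8},
    "chef": {"Cuisine": 0.7, "Television": 0.2},
    "documentary": {"Television": 0.6, "Nature": 0.3},
    "wildlife": {"Nature": 0.9},
}


def esa_vector(text):
    """Sum the concept weights of all known words (whitespace tokenization only)."""
    vector = defaultdict(float)
    for word in text.lower().split():
        for concept, weight in CONCEPT_INDEX.get(word, {}).items():
            vector[concept] += weight
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(profile_text, programs):
    """Rank programs by similarity to the profile; also return shared concepts."""
    profile = esa_vector(profile_text)
    scored = []
    for title, description in programs.items():
        program = esa_vector(description)
        # Concepts present in both vectors, strongest contribution first.
        shared = sorted(
            set(profile) & set(program),
            key=lambda c: profile[c] * program[c],
            reverse=True,
        )
        scored.append((cosine(profile, program), title, shared))
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    example_programs = {
        "Match of the Day": "football match goal",
        "Kitchen Stars": "chef cooking recipe",
        "Wild Planet": "wildlife documentary",
    }
    for score, title, shared in recommend("football goal", example_programs):
        print(f"{title}: {score:.2f} (shared concepts: {shared})")
```

A real system would build the concept index from a Wikipedia dump and would need proper tokenization; the sketch only shows the vector comparison step.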
- Representing Trust with RDF
Nowadays the Web helps people to produce, integrate and share data easily. Services such as Facebook, Flickr and Orkut allow millions of individuals and organizations to create online data and share information with networks of friends. Meanwhile, a large amount of data described in RDF is now published on the Web, and projects like Linked Open Data build a mechanism to connect and share such data and knowledge. However, there is also an increasing amount of privacy abuse via social network applications, such as unwanted exposure, distortion, identity theft and reputation damage. This leads to issues of trust, especially when it comes to personal data. The goal of this thesis project is to investigate how to represent a trust model that describes the relationships between different entities. The trust model will be defined using RDF or an RDF-based vocabulary, e.g. FOAF, an RDF vocabulary that represents personal data and interpersonal relationships for the Semantic Web. Based on such a trust model, further actions like trust computation and propagation can be investigated.
Some literature:
- Brondsema, D., Schamp, A.: Konfidi: Trust networks using PGP and RDF. In: Proc. of the Workshop on Models of Trust for the Web at WWW 2006, May 2006
- Olaf Hartig: Querying Trust in RDF Data with tSPARQL. In: Proceedings of the 6th European Semantic Web Conference (ESWC), Heraklion, Greece, Jun. 2009
- Ontology Localization
Ontologies are structures which encode the knowledge relevant for a certain domain. While in principle they constitute resources which are independent of a certain natural language, for several applications it is necessary to enrich them with information related to the way that we talk about the concepts and relations modeled in the ontology. As we can refer to the concepts and relations in different languages, a multilingual perspective is needed here. The goal of this master's thesis is to develop, implement and evaluate algorithms to localize a given ontology to different languages. For this purpose, Wikipedia can be exploited as a very useful and promising resource.
Some literature:
- Mauricio Espinoza, Asunción Gómez-Pérez, Eduardo Mena: LabelTranslator - A Tool to Automatically Localize an Ontology. Demonstration Paper, Proceedings of ESWC 2008
- Mauricio Espinoza, Asunción Gómez-Pérez, Eduardo Mena: Enriching an Ontology with Multilingual Information. Proceedings of the European Semantic Web Conference (ESWC), pp. 333-347, 2008
- Lightweight ontology integration
Just as there are many XML schemas on the web for different but related applications which have some kind of overlap, there are many different ontologies on the semantic web that are related and could be combined. In fact, for many web applications it could be interesting to combine their own ontologies with external ontologies such as user ontologies from social web sites or product ontologies from producers. However, current ontology repositories often do not scale very well when they are used to combine ontologies, since the rules that specify the relationship between the ontologies make it harder to automatically reason over the ontologies. The goal of this thesis is to (1) investigate the scalability of existing ontology repositories, (2) determine which typical features of integrated ontologies decide whether they scale well for large instances, and (3) determine which subset of ontology features and integration rules would already be sufficient for many practical applications, and implement those based on a relational database.
See: http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-82/SI_position_10.pdf
- Database-based ontology reasoning
Many ontology repositories that support expressive ontology languages like OWL and query languages like SPARQL have scalability problems for large ontologies. The goal of this thesis is to determine a subset of OWL and SPARQL for which a relatively simple and efficient repository can be built based on a relational database with conventional techniques. In a next step, the thesis investigates how a combination of a relational database and a conventional ontology repository can divide the computation of a SPARQL query over the two systems such that the combination is more scalable than a single ontology repository.
- Trust Propagation
A trust network is a graph-based network composed of trust relationships. A key property of a trust network is that it can expand itself by deriving new trust relationships from existing ones, which is called "propagation". The goal of this thesis is to investigate a propagation algorithm for a trust network. Open trust network datasets, e.g. Advogato, will be used for testing and verifying the algorithm.
Some literature:
- Guha, R., Kumar, R., Raghavan, P., Tomkins, A.: Propagation of trust and distrust. In: WWW ’04: Proceedings of the 13th international conference on World Wide Web, New York, NY, USA, ACM (2004) 403–412
- Ziegler, C.N., Lausen, G.: Propagation models for trust and distrust in social networks. Information Systems Frontiers 7 (2005) 337–358
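To make the notion of propagation concrete, here is a minimal sketch of one very simple scheme: trust is passed along chains of direct trust statements, multiplied at every hop so that it decays with distance. This is an illustrative assumption, not the algorithm from the literature above or the one to be developed in the thesis; the function name, the `max_hops` and `threshold` parameters and the example names are all hypothetical.

```python
def propagate_trust(edges, max_hops=3, threshold=0.1):
    """Derive indirect trust values from direct trust statements.

    edges maps (truster, trustee) pairs to a trust value in [0, 1].
    Derived trust along a path is the product of the values on the path
    (an assumed, simplistic decay model); the best path wins.
    """
    # Adjacency view of the direct statements: truster -> {trustee: value}.
    direct = {}
    for (truster, trustee), value in edges.items():
        direct.setdefault(truster, {})[trustee] = value

    trust = dict(edges)
    # Each round extends the known paths by one direct hop.
    for _ in range(max_hops - 1):
        updates = {}
        for (a, b), t_ab in trust.items():
            for c, t_bc in direct.get(b, {}).items():
                if c == a or (a, c) in edges:
                    continue  # skip self-trust; never override direct statements
                derived = t_ab * t_bc
                best_so_far = max(trust.get((a, c), 0.0), updates.get((a, c), 0.0))
                if derived >= threshold and derived > best_so_far:
                    updates[(a, c)] = derived
        if not updates:
            break  # nothing new can be derived
        trust.update(updates)
    return trust


if __name__ == "__main__":
    statements = {
        ("alice", "bob"): 0.9,
        ("bob", "carol"): 0.8,
        ("carol", "dave"): 0.5,
    }
    for pair, value in sorted(propagate_trust(statements).items()):
        print(pair, round(value, 2))
```

Published propagation models are considerably richer, for example by handling distrust explicitly; the sketch only shows the basic step of deriving a new relationship from two existing ones.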
- Web Mining for Complex Value Networks
An important challenge in the area of business intelligence is to automatically discover and maintain complex value chains that can be visualized. There has been much progress in areas such as web mining, text mining and information extraction from the Web. The goal of this master thesis is to develop web mining techniques that process dedicated web sites to extract relationships between different companies. The task includes at least the following subtasks: i) implementing a web crawler that crawls relevant web sites, ii) developing an approach for identifying companies in the corresponding web sites as well as mapping the information there to the internal product catalogue, and iii) spotting additional links/relationships between companies. This master thesis will be carried out under the joint supervision of TU Delft and LINKS Analytics.
- Investigating the potential of Business Process Management in Financial Institutions
ING requires more insight into the 'Financial Statement & Closure Process' (FSCP) of ING Group N.V. and is interested in identifying risks, bottlenecks and potential improvements in this process. BPM claims to be able to answer these types of questions, but it is unclear whether current BPM tools and techniques meet the business requirements of financial institutions such as ING. Answering these questions for the FSCP will serve as a case study throughout the thesis.
- Coupling ARIS and Cordys for Business Process Driven SOA development
In this thesis, executed at IDS Scheer, research is conducted on how ARIS and Cordys can be coupled for Business Process Driven (BPD) SOA development and execution. Two major issues arise when attempting to couple them: (1) enabling model transformations between high-level business models in ARIS and executable IT models in Cordys, which requires model translation and integration between EPC, UML and BPMN models; (2) assuring iterative round-trip BPD SOA development with minimal reconfiguration in ARIS and Cordys, which involves model synchronization and exchange formats like XPDL.
- Quantification of criteria for architectural decision making
When designing the architecture of an information system, many design decisions are made in an informal way. This work attempts to make some of these decisions more formal and explicit by developing a method for quantifying certain criteria for a given design, such as the level of security or the level of efficiency. Several existing methods from the literature will be combined, and the result will be tested by interviewing experts at Capgemini and verifying whether their assessment of these criteria agrees with the developed method.
- Architecting High Performance Multi-tier Enterprise Information Systems
Modern, large-scale enterprises tend to employ high performance information systems designed by IT architects, who orchestrate business and organizational change through the reasoned application and integration of information technology. Although there is a variety of information systems architecture patterns, today's enterprise information systems widely adopt a multi-tier architecture with each tier providing a particular functionality. However, each tier has an impact on the overall system's performance, making performance prediction difficult. The thesis, at IBM Nederland, is aimed at presenting an empirical architecting process which addresses system performance in an enterprise environment.
- Application of Complex Event Processing in Electricity Distribution Systems
The research conducted at Logica is focused on a new technology called Complex Event Processing (CEP), which has great potential for real-time data analysis. CEP was developed for monitoring distributed computer network systems and is now being applied to business problems. By using a standard format for representing events, it becomes feasible to detect complex events in the combination of multiple event streams. Intelligent analysis of events can lead to more detailed information about the location of a fault, so that it can be resolved more quickly.
- The added value of Enterprise Architecture
The master thesis describes the results of a case study to quantify the effects of applying Enterprise Architecture within a financial institution. The thesis attempts to capture several factors at project level with respect to the application of Enterprise Architecture and its subsequent financial benefits. The study analyzed 40 projects with regard to time and budget overrun. In order to collect these data, a total of 35 business, enterprise and domain architects were interviewed on their experience with these projects. Among the factors taken into account were architecture, project compliance with architecture and experience of the architect. These factors are recorded in hypotheses that relate to the budget and time figures of the project. The hypotheses are incorporated in the "Architecture Effectiveness Model" and statistically tested with the acquired data. This led to more than 12,000 calculations to show the subsequent benefits of Enterprise Architecture.
- Information Architecture Analysis and Design for a Command and Control Room
In Command and Control (C&C) rooms for the Dutch police a specific information management system called "Het Geïntegreerd Meldkamer Systeem (GMS)" is used. It was developed by the ministry of internal affairs and is used to provide information to the officers in the control room. Examples of information that it contains are relevant objects for firefighters, police and ambulances (ca. 11,000), street sections and crossings (ca. 48,000) and streets (ca. 11,000). In addition, C&C rooms use the ARBI, a specialized telephone system that helps the officers to contact the required emergency workers and their organizations. It contains information about ca. 1000 telephone numbers. Clearly there is a need here for timely and accurate information, which means that the supporting ICT department has to ensure that the information in these systems is indeed up to date and correct.
The goal of this thesis is to make an overview of the information need and design an information architecture that ensures this need is met at reasonable cost. The analysis of the information need will include making an overview of which data is needed and what the required quality of the information is. The information architecture will, for example, have to detail how and where the required information will be obtained, at what cost, and how and where it is stored and managed such that it has the required quality and accessibility. The work will include close cooperation with a C&C room in The Netherlands and therefore the student should be proficient in Dutch.
- Building a message filtering and generation system for crisis management based on semantic web technology
The MOSAIC project (Multi-Officer System of Agents for Informed Crisis Control) is aimed at creating so-called super situation awareness by retrieving relevant information from a wide range of heterogeneous data sources while at the same time avoiding information overload. It is a joint project by the DECIS Lab, in which TU Delft is involved, and the vts Politie Nederland (Voorziening tot Samenwerking), which supports the overall Dutch law enforcement and security chain with ICT and actively explores new techniques.
The thesis will concern the implementation of a system that supports the commanders in a Command and Control room by filtering and generating messages based on incoming messages and background information about the crisis situation. The system will be based on semantic web technology such as RDF and SPARQL to realize an effective and powerful type of filtering. The system will be designed in close cooperation with a PhD student who is currently working on this project, and there is the possibility for the student to be hired by the DECIS Lab for the duration of the thesis.
For more ideas or inspiration, you can of course also have a look at the research interests of the Web Information Systems group members.
Next to detailed topic descriptions closely related to the group's research, the group is also in contact with many companies that could offer interesting topics and assignments, both at master and bachelor level. Bernard Sodoyer is the contact person for these company connections.
Note that, as said above, it is recommended to start the project selection process by contacting prof. Geert-Jan Houben (make an appointment via the secretary) to discuss any plans and preferences in order to arrive at a project definition that satisfies all requirements. In particular, students who want to involve industry partners in their project should first contact Geert-Jan before involving any other stakeholders.
Here are just a few links selected from those connections:
More information
More information can be obtained from prof.dr.ir. Geert-Jan Houben via email.



