About

This site provides supplemental material and information about the paper Interweaving Trend and User Modeling for Personalized News Recommendations.
Paper: pdf, bib

Abstract. Twitter is today's largest microblogging service on the Web and allows people to publish and discuss topics they are interested in. Previous research showed how trending topics propagate through the Twitter network. In this paper, we study user modeling on Twitter and investigate the interplay between personal interests and public trends. To generate semantically meaningful profiles, we present a framework that allows us to enrich the semantics of individual Twitter messages and features user modeling as well as trend modeling strategies. These profiles can be re-used in other applications for (trend-aware) personalization. Given a large Twitter dataset, we analyze the characteristics of user and trend profiles. We evaluate the quality of the profiles in the context of a personalized news recommendation system and show that personal interests are more important for the recommendation process than public trends and that by combining both types of profiles we can further improve recommendation quality.


1. Datasets

Tweets: Over a period of more than two months (from the end of October to the beginning of January), we crawled the Twitter information streams of more than 20,000 users. Together, these users published more than 10 million tweets.

News: To link tweets with news articles, we also monitored more than 60 RSS feeds of prominent news media such as BBC, CNN, and the New York Times, and aggregated the content of 77,544 news articles.

Semantics: Given the content of Twitter messages and news articles, we extract entities to better understand the semantics of Twitter activities. For this purpose, we use OpenCalais.

| name | number of records | description |
|---|---|---|
| tweets.sql.gz (643MB) | 2316204 | sample of tweets processed with OpenCalais |
| news.sql.gz (73MB) | 77544 | news articles monitored from 62 news media websites |
| sementicsTweetsEntity.sql.gz (71MB) | 1896328 | entity assignments extracted from tweets (1,051,524); 709,245 distinct entities (39 different types of entities) |
| sementicsNewsEntity.sql.gz (40MB) | 1216570 | entity assignments extracted from news (63,140); 170,577 distinct entities (39 different types of entities) |
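
The dumps are gzip-compressed SQL files, so they can be inspected without unpacking them to disk first. The following is a minimal sketch, assuming the files are plain-text dumps with one SQL statement per line in the style of mysqldump; the class name DumpInspector and the method countInsertStatements are hypothetical and not part of the published material. Note that a single INSERT statement may carry many rows, so the count is not expected to match the "number of records" column.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for a first look at one of the *.sql.gz dumps.
final class DumpInspector {

    // Streams a gzip-compressed SQL dump and counts the lines that start
    // with "INSERT INTO". Assumption: the dump is UTF-8 text.
    public static long countInsertStatements(InputStream gzipped) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("INSERT INTO")) {
                    count++;
                }
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DumpInspector <dump.sql.gz>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(countInsertStatements(in) + " INSERT statements");
        }
    }
}
```

Streaming through GZIPInputStream keeps memory use flat even for the 643MB tweet dump, because only one line is held at a time.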

2. Trend and User Modeling Framework

(1) Generation of User Profiles

Using our framework, trend and user profiles can be generated with essentially one line of code once a configuration of a trend and/or user modeling strategy is defined. For example, to create a topic-based profile for a certain time period, one could use the following code snippet:

import java.sql.Timestamp;
// UMConfiguration, UM_Type, UM_Source, UM_TimeSlot, UserModelingStrategy and
// UserModelingFactory are provided by the framework.

// Create a topic-based profile for a certain time period:
Timestamp profileFrom = Timestamp.valueOf("2010-11-15 00:00:00");
Timestamp profileTo = Timestamp.valueOf("2010-12-29 00:00:00");

// a. create a configuration for your strategy
UMConfiguration umConf = new UMConfiguration(
    "my first UM strategy",
    UM_Type.Topic_based,
    UM_Source.Twitter_and_News_based,
    profileFrom, profileTo,
    UM_TimeSlot.All, 1, null);

// b. instantiate the user modeling strategy
UserModelingStrategy um = UserModelingFactory.getUMStrategy(umConf);

// c. get the profile vector for a user via her Twitter ID (here: 1234)
um.getProfileVector(1234);
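
The abstract notes that combining user and trend profiles can improve recommendation quality. The following sketch illustrates one way such a combination could be used; it is not the framework's API. It assumes that a profile can be represented as a map from concept to weight, mixes the two profiles with a hypothetical weight alpha, and scores a news article by cosine similarity. The class ProfileMath, its methods, and the choice of cosine similarity are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; profiles are modeled as sparse concept -> weight maps.
final class ProfileMath {

    // Weighted mix per concept: alpha * user weight + (1 - alpha) * trend weight.
    public static Map<String, Double> combine(Map<String, Double> user,
                                              Map<String, Double> trend,
                                              double alpha) {
        Map<String, Double> mixed = new HashMap<>();
        user.forEach((concept, w) -> mixed.merge(concept, alpha * w, Double::sum));
        trend.forEach((concept, w) -> mixed.merge(concept, (1 - alpha) * w, Double::sum));
        return mixed;
    }

    // Cosine similarity of two sparse vectors; returns 0 if either has zero norm.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy data, invented for this example.
        Map<String, Double> user = Map.of("football", 1.0, "politics", 1.0);
        Map<String, Double> trend = Map.of("politics", 1.0, "weather", 1.0);
        Map<String, Double> article = Map.of("politics", 1.0);

        Map<String, Double> mixed = combine(user, trend, 0.5);
        System.out.println("score = " + cosine(mixed, article));
    }
}
```

With alpha = 1 the score depends only on the user profile, and with alpha = 0 only on the trend profile, so alpha controls how strongly public trends influence the ranking.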

Service: We are currently also making the trend and user modeling functionality available via a Web service. More details: http://wis.ewi.tudelft.nl/tums/