About

This site provides supplemental material and information about the paper Learning Temporal Semantic Relations in Tweets.

Abstract. In this paper, we investigate whether semantic relationships between entities can be learnt from analyzing microblog posts published on Twitter. We identify semantic links between persons, products, events and other entities. We develop a relation discovery framework that allows for the detection of typed relations that moreover may have temporal dynamics. Based on a large Twitter dataset, we evaluate different strategies and show that co-occurrence based strategies allow for high precision and perform particularly well for relations between persons and events achieving precisions of more than 80%. We further analyze the performance in learning relationships that are valid only for a certain time period and reveal that for those types of relationships Twitter is a suitable source as it allows for discovering trending topics with higher accuracy and with lower delay in time than traditional news media.


Slides presented at ICWE:

Learning Semantic Relationships between Entities in Twitter

1. Datasets

Tweets: Over a period of more than two months (from the end of October to the beginning of January), we crawled the Twitter information streams of more than 20,000 users. Together, these users published more than 10 million tweets.

News: To link tweets with news articles, we also monitored more than 60 RSS feeds of prominent news media such as BBC, CNN and the New York Times, and aggregated the content of 77,544 news articles.

Semantics: Given the content of Twitter messages and news articles, we extract entities to better understand the semantics of Twitter activities. For this purpose, we use OpenCalais.

name                                 number of records  description
-----------------------------------  -----------------  -----------
tweets.sql.gz (643MB)                2316204            sample of tweets processed with OpenCalais
news.sql.gz (73MB)                   77544              news articles monitored from 62 news media websites
sementicsTweetsEntity.sql.gz (71MB)  1896328            entity assignments extracted from tweets (1,051,524); 709,245 distinct entities (categorized in 39 types)
sementicsNewsEntity.sql.gz (40MB)    1216570            entity assignments extracted from news (63,140); 170,577 distinct entities (39 different types of entities)

2. Relations

In the paper, we analyzed relations between different types of entities. In particular, we analyzed the following 71 types of relations:

  1. Person and City
  2. Person and Country
  3. Person and Organization
  4. Person and Company
  5. Organization and City
  6. Organization and Country
  7. Company and City
  8. Company and Country
  9. Company and Product
  10. Currency and Country
  11. Currency and Continent
  12. Holiday and Country
  13. Technology and Person
  14. Technology and Company
  15. NaturalFeature and City
  16. NaturalFeature and Country
  17. NaturalFeature and Continent
  18. NaturalFeature and Region
  19. MedicalCondition and Person
  20. MedicalCondition and MedicalTreatment
  21. MedicalCondition and City
  22. MedicalCondition and Country
  23. MedicalCondition and Continent
  24. MedicalCondition and Region
  25. Region and City
  26. Region and Country
  27. Region and Continent
  28. Organization and PhoneNumber
  29. Organization and EmailAddress
  30. Company and PhoneNumber
  31. Company and FaxNumber
  32. Company and EmailAddress
  33. Organization and FaxNumber
  34. Person and EmailAddress
  35. Person and FaxNumber
  36. Person and PhoneNumber
  37. City and EntertainmentAwardEvent
  38. Country and EntertainmentAwardEvent
  39. Continent and EntertainmentAwardEvent
  40. Person and EntertainmentAwardEvent
  41. MusicAlbum and EntertainmentAwardEvent
  42. MusicGroup and EntertainmentAwardEvent
  43. Movie and EntertainmentAwardEvent
  44. ProgrammingLanguage and OperatingSystem
  45. Product and OperatingSystem
  46. MusicAlbum and MusicGroup
  47. Person and TVShow
  48. TVShow and TVStation
  49. RadioStation and RadioProgram
  50. Person and PoliticalEvent
  51. City and PoliticalEvent
  52. Country and PoliticalEvent
  53. Region and PoliticalEvent
  54. City and SportsEvent
  55. Country and SportsEvent
  56. Region and SportsEvent
  57. Person and SportsEvent
  58. SportsGame and SportsEvent
  59. SportsGame and SportsLeague
  60. SportsLeague and TVStation
  61. SportsEvent and TVStation
  62. SportsEvent and RadioStation
  63. SportsLeague and RadioStation
  64. SportsLeague and Country
  65. Person and Movie
  66. Person and MusicGroup
  67. Person and MusicAlbum
  68. URL and Company
  69. URL and Organization
  70. Person and Position
  71. Person and Person
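
To make the idea of a co-occurrence based strategy for typed relations more concrete, the following is a minimal sketch, not the framework used in the paper. It assumes that entity assignments are available as (post_id, entity_name, entity_type) tuples, which is a hypothetical layout and not the schema of the published dumps. The function name, the small subset of relation types and the sample rows are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical subset of the relation types listed above, stored as
# unordered pairs of entity types.
RELATION_TYPES = {
    frozenset({"Person", "SportsEvent"}),
    frozenset({"Company", "Product"}),
    frozenset({"Person"}),  # Person and Person
}


def count_cooccurrences(assignments):
    """Count how often two entities of a listed type pair share a post.

    `assignments` is an iterable of (post_id, entity_name, entity_type)
    tuples. This layout is an assumption made for the sketch.
    """
    # Group the distinct entities mentioned in each post.
    by_post = {}
    for post_id, name, etype in assignments:
        by_post.setdefault(post_id, set()).add((name, etype))

    counts = Counter()
    for entities in by_post.values():
        # Sorting gives every unordered pair one canonical key.
        for a, b in combinations(sorted(entities), 2):
            if frozenset({a[1], b[1]}) in RELATION_TYPES:
                counts[(a, b)] += 1
    return counts


# Made-up sample rows, for illustration only.
sample = [
    (1, "Roger Federer", "Person"), (1, "Wimbledon", "SportsEvent"),
    (2, "Roger Federer", "Person"), (2, "Wimbledon", "SportsEvent"),
    (2, "London", "City"),
    (3, "Apple", "Company"), (3, "iPad", "Product"),
]
counts = count_cooccurrences(sample)
for (a, b), n in counts.most_common():
    print(a[0], "<->", b[0], "co-occurrences:", n)
```

Entity pairs with higher counts would be stronger relation candidates. Restricting the posts to a time window before counting would be one way to look at relations that are only valid for a certain period.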

3. Comment on Findings

In our paper, we saw that relationships between events (e.g. SportsEvent or PoliticalEvent) and other entities can be discovered with high precision, while for relationships among persons (Person) the precision is rather low. We confirmed these findings on the ground truth obtained from DBpedia. The figure below shows additional results that complement the results shown in Figure 6(b) of the paper. It is interesting to see that the relative ranking of product and person relationships is different here. When looking at the concrete relationships among products (e.g. Product and OperatingSystem) and persons (e.g. Person and Person), we see that many products (e.g. specific mobile phones) have no DBpedia URI. Therefore, a relationship between a product such as the N900 and the operating system Android would be classified as not related even though the N900 features Android 2.3.

[Figure: performance for specific relation types]

Hence, the performance reported above (as well as in the paper) would increase further if these issues were taken into account.
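
The effect of missing DBpedia URIs on the measured precision can be illustrated with a small sketch. This is not the evaluation code of the paper. The function `precision`, the optional `has_uri` set and the placeholder entity names are assumptions introduced for illustration only.

```python
def precision(predicted, ground_truth, has_uri=None):
    """Fraction of predicted entity pairs confirmed by the ground truth.

    If `has_uri` is given, pairs that involve an entity outside that set
    are skipped instead of being counted as wrong.
    """
    judged = [
        pair for pair in predicted
        if has_uri is None or all(entity in has_uri for entity in pair)
    ]
    if not judged:
        return 0.0
    correct = sum(1 for pair in judged if frozenset(pair) in ground_truth)
    return correct / len(judged)


# Placeholder entity names, for illustration only.
predicted = [("A", "B"), ("C", "D"), ("E", "F"), ("G", "H")]
ground_truth = {frozenset({"A", "B"}), frozenset({"C", "D"})}
has_uri = {"A", "B", "C", "D", "E", "F"}  # "G" and "H" have no URI

strict = precision(predicted, ground_truth)
lenient = precision(predicted, ground_truth, has_uri)
print(strict, lenient)
```

In this toy setting, the pair ("G", "H") cannot be verified because neither entity has a URI. Counting it as wrong lowers the precision, while skipping it raises the value, which mirrors the issue described above.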