Users who are active on the social Web produce digital traces continuously by posting messages, sharing videos, commenting on news items, or simply by walking into a store (and checking in on Foursquare). The number of traces a user generates per day is steadily increasing, thanks to the myriad of online social networks and the widespread use of smartphone apps, which often publish messages on behalf of the user in a semi-automatic manner.


In this line of work, we consider the question of to what extent we can exploit these traces to automatically link accounts on different social Web streams to the same user. While some users may intentionally want to be recognized across social networks (by using the same user name and/or posting links to their various social Web profiles), others may want to keep their connections private and may not even be aware of how much identifying information they leak.

Previous work in this direction has mostly focused on information available directly in the user profiles, an approach that tends to perform well for cooperating users (those who want to be found easily and therefore use the same or a similar social Web ID on different platforms). Here, we assume a set of uncooperative users, i.e. users that cannot be linked according to their self-reported profile information, and investigate the question: to what extent is it still possible to determine likely matches?


In our preliminary study, we consider only two social Web streams, Flickr and Twitter, and investigate the accuracy of stream-content based matching (i.e. matching based on the messages/images, not the profile information). We find this task to be highly difficult: for the majority of users in our data set, we are not able to determine the correct match.
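
To make the idea of stream-content based matching concrete, the following is a minimal sketch and not the method used in the study. It assumes each account is represented only by the text it has published (e.g. tweets on one side, photo titles or tags on the other) and ranks candidate accounts by bag-of-words cosine similarity. The function names (`term_vector`, `cosine`, `rank_candidates`) and the example account IDs are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(texts):
    """Build a bag-of-words term-frequency vector from a list of strings."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphanumeric tokens only (a simplifying assumption).
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(source_texts, candidates):
    """Rank candidate accounts by content similarity to the source account.

    candidates maps a (hypothetical) account ID to the list of texts
    published by that account. Returns (account ID, score) pairs,
    highest score first.
    """
    source = term_vector(source_texts)
    scored = [
        (account, cosine(source, term_vector(texts)))
        for account, texts in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative, made-up data: one Twitter stream and two Flickr candidates.
tweets = ["Sunset over the Golden Gate bridge", "hiking in Yosemite"]
flickr_accounts = {
    "flickr_a": ["golden gate bridge sunset", "yosemite hiking trail"],
    "flickr_b": ["cat sleeping on sofa"],
}
ranked = rank_candidates(tweets, flickr_accounts)
```

In this toy example the candidate sharing vocabulary with the tweets ranks first. On real data the signal is far weaker, which is consistent with the difficulty reported above.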




Our initial investigation revealed a number of challenges, which will be tackled in the long term:

  • Social networks often place a limit on the amount of data that is publicly accessible, so a long-term experimental setup is required to gather a large amount of data.
  • A user may use different social networks at different time periods; matching a user who is currently active on Twitter to a Flickr account that was last used two or three years ago is difficult.
  • A considerable number of the matched accounts we encountered were not operated by private individuals, but belonged to organizations or business endeavours.
  • Automatic or semi-automatic methods to generate pairs of matched accounts are not always reliable. In particular, matching users through self-reported links in online identity management services has a non-negligible error rate.
  • Implicitly, the users we selected were cooperative as we were able to manually match them according to their profile information, avatar image or content. How to obtain a set of uncooperative users is an open question.


Example scenarios

If the automatic matching of user accounts across different social Web platforms were possible with high accuracy, such a service could be used across all ImREAL use cases. In general, the more digital traces a user modeling and profiling service has available, the more accurate the resulting model will be (as could be seen in the location estimation experiments, where an additional data source reduced the error by more than 60%). A more accurate user model is also more likely to yield a satisfactory learning experience than a simulator that is less well personalized.



Claudia Hauff and Gerald Friedland, Brave New Task: User Account Matching, MediaEval: Benchmarking Initiative for Multimedia Evaluation, 2012
