A key challenge for simulated learning environments is aligning the experience in the simulated world with real-world experiences. For a new learner, the environment has only a limited-scope model that lacks sufficient information about the learner's characteristics and needs. This hinders the environment's ability to adapt to those needs. The main objective of the YouTube services is to address this challenge by supporting the adaptation of simulated learning environments so that they better meet the learning needs of new learners.

The YouTube services are a suite of services that mine user-created content on the video-sharing site YouTube, mainly the comments that users write on uploaded videos addressing a particular domain of interest, to derive profiles of user groups. The users associated with each derived group are similar either in the learning-domain concepts they are interested in or in basic demographic characteristics, such as gender, age group, and location. The derived group profiles facilitate the design of simulated learning environments in two main ways:

  • The derived profiles can provide a means for the trainers or the content providers to identify key learning needs for the simulator learners.

  • The domain-relevant concepts derived from the comments of the users in each group can be used to augment the limited-scope model of learners who share similar demographics (gender, age group, location) with the users in that group.
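
As a rough illustration of the second aspect, the sketch below extends a new learner's limited-scope model with concepts from group profiles whose demographics match the learner. The `GroupProfile` and `LearnerModel` structures, their field names, and the matching rule (every demographic attribute a group specifies must equal the learner's value) are assumptions made for illustration; they are not taken from the services themselves.

```python
from dataclasses import dataclass, field


@dataclass
class GroupProfile:
    # Hypothetical structure: demographics shared by the group's authors
    # and the domain concepts found in their comments.
    demographics: dict
    concepts: list


@dataclass
class LearnerModel:
    # Hypothetical limited-scope model: known demographics plus any
    # concepts already associated with the learner.
    demographics: dict
    concepts: set = field(default_factory=set)


def matches(group, learner):
    """Assumed rule: a non-empty group matches when every attribute it
    specifies equals the learner's value for that attribute."""
    return bool(group.demographics) and all(
        learner.demographics.get(key) == value
        for key, value in group.demographics.items()
    )


def augment(learner, groups):
    """Return a new learner model extended with concepts from matching groups.

    The input model is left unchanged.
    """
    concepts = set(learner.concepts)
    for group in groups:
        if matches(group, learner):
            concepts.update(group.concepts)
    return LearnerModel(dict(learner.demographics), concepts)
```

Returning a new model rather than mutating the input keeps the original limited-scope model available for comparison.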

The YouTube Suite of Services consists of the following:

  1. Noise Filtration Service. This service uses the YouTube Data API [1] to retrieve user comments on YouTube videos that address a particular learning domain, such as interpersonal skills for job interviews. It then applies a scoring mechanism built in Java and Apache Lucene [2], together with a semantically enriched machine learning model built in RapidMiner [3], to identify the relevant comments that contain real-world user experiences in the learning domain. Noise in the retrieved comments is filtered out, including spam, abuse, and comments that are too short and do not reflect any user awareness of the domain concepts. The output of this service is a filtered subset of relevant YouTube video comments, which the subsequent group profiling services use to derive the group profiles.
  2. Cluster-based Group Profiling Service. This service uses cluster analysis techniques in RapidMiner [3] to derive group profiles of users from YouTube user-created content, taking as input the relevant comments filtered by the Noise Filtration Service. Groups of YouTube comment authors are derived from the similarity of the learning concepts found in their comments, so that the authors in each group comment on relatively similar learning concepts. The output of this service is a set of characteristics for the comment authors in each derived group. These characteristics include a list of domain-related concepts that the comment authors in each group are aware of (e.g. in the case of learning interpersonal skills for job interviews, the concepts can be body language signals, emotions expressed by the interviewer or interviewee, interview preparation, and good job interview practices). The characteristics also include descriptive statistics of the authors
YouTube Services


Example scenarios



  • Noise Filtration: given a set of video IDs of YouTube videos about job interviews, the Noise Filtration service retrieves all the user comments and classifies each comment as either relevant to job interviews or noise.


  • Cluster-based Group Profiling: given the set of comments classified as relevant to job interviews by the Noise Filtration service, the Cluster-based Group Profiling service derives the profiles of 10 user groups, where the YouTube comment authors in each group use similar job interview-related terms. Each group profile shows the most frequently used terms that are relevant to job interviews, in addition to the gender distribution, age groups, and countries of the group's authors.

  • Demographics-based Group Profiling: given the demographic characteristics of a new learner who wants to use a simulated environment to learn interpersonal skills for job interviews, the Demographics-based Group Profiling service derives a group profile for that learner. The derived profile consists of the most frequent job interview-relevant terms used in the comments of the YouTube users who have the same demographic characteristics as the learner. In one example, the service derived four group profiles used to augment the models of four different learners: a 20-year-old American male, an 18-year-old American female, a British male, and a learner from Asia, respectively.
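
The filtering and demographics-based profiling scenarios above can be approximated with a small sketch. It is not the services' actual method: the Lucene-based scoring and the RapidMiner model are replaced here by a simple keyword-overlap heuristic, and the profile is a plain term-frequency count. The `DOMAIN_TERMS` vocabulary, the `MIN_TOKENS` threshold, the comment dictionary layout, and all function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical domain vocabulary standing in for the services' relevance model.
DOMAIN_TERMS = {
    "interview", "interviewer", "body", "language",
    "eye", "contact", "nervous", "prepare", "handshake",
}
MIN_TOKENS = 4  # assumed cut-off below which a comment counts as "too short"


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def is_relevant(text):
    """Keyword-overlap heuristic: long enough and mentions a domain term."""
    tokens = tokenize(text)
    return len(tokens) >= MIN_TOKENS and any(t in DOMAIN_TERMS for t in tokens)


def demographic_profile(comments, learner, top_n=3):
    """Most frequent domain terms in relevant comments whose author
    matches every demographic attribute given for the learner.

    Each comment is assumed to look like
    {"text": "...", "author": {"gender": "...", "country": "..."}}.
    """
    counts = Counter()
    for comment in comments:
        author = comment["author"]
        same_group = all(author.get(k) == v for k, v in learner.items())
        if same_group and is_relevant(comment["text"]):
            counts.update(
                t for t in tokenize(comment["text"]) if t in DOMAIN_TERMS
            )
    return counts.most_common(top_n)
```

Calling `demographic_profile(comments, {"gender": "male", "country": "US"})` on a list of such comment dictionaries would return the top domain terms, with their counts, for authors matching that learner.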



  1. Ahmad Ammari, Vania Dimitrova, Dimoklis Despotakis. Semantically Enriched Machine Learning Approach to Filter YouTube Comments for Socially Augmented User Models. In Proceedings of the International Workshop on Augmenting User Models with Real World Experiences to Enhance Personalization and Adaptation, co-located with the International Conference on User Modeling, Adaptation and Personalization (UMAP2011), Girona, Spain, pp. 6
