University of Maryland


Emergence and Growth of the MOOC Innovation Community

Xu Meng, Chen Huang, Brian Butler, and Ping Wang

As with past information innovations, Massive Open Online Courses (MOOCs) present information professionals with the challenge of understanding a rapidly changing innovation community. Using data from both general and specialized media sources, we present a descriptive analysis of the composition and structure of the emerging MOOC innovation community. Preliminary findings suggest that the core-periphery structure expected in an innovation community is developing. In addition, variation in the structures and trends found in general and specialized media suggests that community participants and outsiders perceive the innovation community differently, which may have implications both for the spread of the innovation and for the growth of the associated community.

Methods: General news sources in the US were sampled by searching the ProQuest National Newspapers database, a full-text archive of articles from prominent national and regional newspapers such as The New York Times and The Wall Street Journal. We also collected articles from The Chronicle of Higher Education (the Chronicle), a publication specializing in higher education. In both sources, we searched for the phrase “Massive Open Online Course” or “MOOC” in the title, abstract, or full text of articles. This yielded 56 articles from ProQuest and 284 from the Chronicle published between Q3 2010 and Q1 2013.
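The search criteria above can be sketched as a local filter over already-downloaded records. This is a minimal illustration under stated assumptions, not the ProQuest or Chronicle search interface: the `Article` record and its `title`, `abstract`, `full_text`, and `published` fields are hypothetical, matching is case-insensitive, and the study window is expressed as calendar quarters.

```python
import re
from dataclasses import dataclass
from datetime import date

# Case-insensitive match on either search phrase. "\bmooc" also catches the
# plural "MOOCs"; the long phrase also matches the start of "courses".
QUERY = re.compile(r"massive open online course|\bmooc", re.IGNORECASE)

# Study window from the text: Q3 2010 through Q1 2013, inclusive.
START, END = (2010, 3), (2013, 1)


@dataclass
class Article:
    """Hypothetical article record; field names are assumptions."""
    title: str
    abstract: str
    full_text: str
    published: date


def quarter(d: date) -> tuple[int, int]:
    """Return (year, quarter) for a publication date."""
    return d.year, (d.month - 1) // 3 + 1


def matches(article: Article) -> bool:
    """True if either phrase appears in the title, abstract, or full text."""
    fields = (article.title, article.abstract, article.full_text)
    return any(QUERY.search(f) for f in fields)


def select(articles: list[Article]) -> list[Article]:
    """Keep matching articles published inside the study window."""
    return [a for a in articles
            if matches(a) and START <= quarter(a.published) <= END]


if __name__ == "__main__":
    sample = [  # toy records, not articles from the study
        Article("The MOOC moment", "", "", date(2012, 5, 1)),
        Article("Campus budgets", "", "Online courses keep growing.", date(2012, 6, 1)),
        Article("Massive open online courses abroad", "", "", date(2013, 7, 1)),
    ]
    print([a.title for a in select(sample)])  # ['The MOOC moment']
```

Comparing `(year, quarter)` tuples keeps the window check to a single chained comparison; the third toy record matches the phrase but falls in 2013Q3, outside the window.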


Presence and co-occurrence measures were constructed to characterize the composition and structure of the MOOC community. An organization was counted as present in the MOOC community if it appeared in at least one article in either the ProQuest or the Chronicle article set. The Stanford Named Entity Recognizer (NER), a Java implementation of a named entity recognizer, was used to identify organization names in the retrieved articles. A Python script then extracted these names, and the resulting list was manually reviewed to eliminate duplicates, synonyms, and abbreviations. This process yielded 180 unique organizations from ProQuest articles and 580 from Chronicle articles. For example, the figure below shows that in ProQuest, six organizations appear with steadily increasing frequency: Coursera, MIT, Harvard University, Stanford University, edX, and Udacity. Not only are these organizations among the first to be mentioned (Udacity, for instance, appears in 2012Q1), but by 2013Q1 they form a cluster that receives more attention, both individually and collectively, than the other organizations in the MOOC community.
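The extraction and presence steps can be sketched in Python, the language of the authors' script, though this is not their code. It assumes the Stanford NER output was written in slash-tag form (each token as `word/LABEL`), that a run of consecutive `ORGANIZATION` tokens forms one name, and that the manual review can be recorded in a hypothetical `ALIASES` table. Presence per quarter is counted here as the number of articles mentioning an organization; the text does not say whether articles or individual mentions were counted, so this is an assumption.

```python
from collections import Counter, defaultdict

# Hypothetical alias table standing in for the manual review step: each
# variant spelling or abbreviation maps to one canonical organization name.
ALIASES = {
    "Massachusetts Institute of Technology": "MIT",
    "Harvard": "Harvard University",
    "Stanford": "Stanford University",
}


def extract_orgs(slash_tagged: str) -> list[str]:
    """Join runs of consecutive word/ORGANIZATION tokens into names.

    Assumes slash-tag output such as "Stanford/ORGANIZATION University/ORGANIZATION".
    Caveat: two different organizations written back to back are merged.
    """
    names, current = [], []
    for token in slash_tagged.split():
        # rpartition keeps any slashes that belong to the word itself
        word, _, label = token.rpartition("/")
        if label == "ORGANIZATION" and word:
            current.append(word)
        elif current:
            names.append(" ".join(current))
            current = []
    if current:
        names.append(" ".join(current))
    return names


def normalize(name: str) -> str:
    """Map a raw name to its canonical form (unchanged if no alias exists)."""
    return ALIASES.get(name, name)


def quarterly_presence(articles):
    """Count, per organization and quarter, the articles that mention it.

    `articles` is an iterable of (quarter_label, slash_tagged_text) pairs.
    """
    presence = defaultdict(Counter)
    for quarter, text in articles:
        # a set, so repeated mentions within one article count once
        for org in {normalize(n) for n in extract_orgs(text)}:
            presence[org][quarter] += 1
    return presence


if __name__ == "__main__":
    docs = [  # toy input, not data from the study
        ("2012Q1", "Udacity/ORGANIZATION and/O Stanford/ORGANIZATION offer/O courses/O"),
        ("2012Q2", "Coursera/ORGANIZATION works/O with/O Stanford/ORGANIZATION University/ORGANIZATION"),
        ("2012Q2", "Massachusetts/ORGANIZATION Institute/ORGANIZATION of/ORGANIZATION Technology/ORGANIZATION"),
    ]
    presence = quarterly_presence(docs)
    print(sorted(presence))  # ['Coursera', 'MIT', 'Stanford University', 'Udacity']
    print(dict(presence["Stanford University"]))  # {'2012Q1': 1, '2012Q2': 1}
```

Under this sketch, the set of present organizations is simply the keys of `presence`, and plotting each organization's per-quarter counts gives the kind of frequency trend described above.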


Regarding the relationships between organizations (i.e., the structure of the community), two organizations were treated as co-occurring when they were mentioned within the same paragraph of an article. The strength of the relationship between two organizations was determined by the total number of different articles in which they were co-mentioned. The figure below compares the organizational networks represented in ProQuest (left) and the Chronicle (right) in 2012Q3. The MOOC community structure is complex, with more than 50 nodes in each network. In both co-occurrence networks, cliques (i.e., fully connected clusters) formed, as expected in cases of community growth. Consistent with the presence data, there is a core of organizations that are strongly associated with one another: Coursera with Stanford University; Harvard University with MIT; edX, MIT, and Coursera; and Princeton University with Coursera.
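The co-occurrence measure and the clique check can be sketched as follows. This is an illustrative reimplementation under stated assumptions: each article is a list of paragraphs, each paragraph a list of already-normalized organization names; an edge's weight is the number of distinct articles in which the pair shares a paragraph; and maximal cliques are found with the Bron-Kerbosch algorithm, which the text does not name. The `min_weight` threshold is a hypothetical parameter for hiding weak ties.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(articles):
    """Weight each organization pair by the number of distinct articles in
    which the two names appear in the same paragraph.

    `articles` is a list of articles; each article is a list of paragraphs;
    each paragraph is a list of normalized organization names.
    """
    weights = Counter()
    for paragraphs in articles:
        pairs = set()
        for orgs in paragraphs:
            # sorted() gives each unordered pair one canonical key
            pairs.update(combinations(sorted(set(orgs)), 2))
        weights.update(pairs)  # a pair adds at most 1 per article
    return weights


def adjacency(weights, min_weight=1):
    """Build an undirected adjacency map, keeping edges with weight >= min_weight."""
    adj = {}
    for (a, b), w in weights.items():
        if w >= min_weight:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (fully connected cluster) as a set,
    using Bron-Kerbosch with pivoting."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    if adj:
        yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    toy = [  # toy input, not data from the study
        [["Coursera", "Stanford University"], ["Harvard University", "MIT"]],
        [["edX", "MIT", "Harvard University"], ["Coursera", "Princeton University"]],
        [["Coursera", "Stanford University"]],
    ]
    w = cooccurrence_weights(toy)
    print(w[("Coursera", "Stanford University")])  # 2
    for clique in sorted(sorted(c) for c in maximal_cliques(adjacency(w))):
        print(clique)
    # ['Coursera', 'Princeton University']
    # ['Coursera', 'Stanford University']
    # ['Harvard University', 'MIT', 'edX']
```

Collecting pairs into a per-article set before updating the counter is what makes the weight count articles rather than paragraphs: a pair that co-occurs in several paragraphs of one article still adds only 1.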


These findings provide a starting point for describing critical community dynamics such as organizational entry and exit, relationship formation, and the emergence of higher-level structures. Theoretically, they raise questions about how the composition and structure of an innovation community affect different stakeholders' perceptions of the innovation and the community. Future observations and visualizations of the community network will help us better understand innovation community dynamics for both MOOCs and other information innovations.