Social media platforms provide potential treasure troves of relational data on millions of users. Some platforms have made retrieving information more accessible to researchers than others. Even for researchers who are not working inside or with these organizations, the potential exists for automated scraping of elements of these databases. APIs make this process readily available and reproducible for scholars, requiring only a few lines of code.
Figure 3.3 provides example code that would allow you to pull all of the members of a Twitter “list” (Philip Cohen’s list of sociologists) and extract all of their followers.63 While there are a number of steps represented in the code, I would like to highlight the following: (1) any API generally requires certain permissions, which are encoded in the project keys noted at the beginning of the script; (2) the chosen list and its members’ followers define the boundary of the network that will be produced from this code; (3) there are a number of software platforms, and packages within them, that facilitate these steps, and each uses its own specific syntax (the example in the figure uses the rtweet package for R; Kearney 2018); (4) this sort of scraping is typically rate-limited (i.e., you are restricted from gathering too much data within a specified time frame), so these sorts of scripts typically require automated “rest periods” to avoid losing one’s access permissions. While these sorts of web-scraped or API-retrieved data are relatively convenient to gather at scale, they can include substantial sampling biases (Hargittai 2018; González-Bailón et al. 2014).
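A minimal sketch of the four steps, using the rtweet package, might look as follows. This is not the code from Figure 3.3: the app name, list slug, owner handle, and key placeholders are illustrative assumptions, and running it requires your own Twitter developer credentials.

```r
library(rtweet)

## (1) Permissions: keys from a Twitter developer account,
##     declared at the top of the script
token <- create_token(
  app             = "my_network_app",        # hypothetical app name
  consumer_key    = "YOUR_CONSUMER_KEY",
  consumer_secret = "YOUR_CONSUMER_SECRET",
  access_token    = "YOUR_ACCESS_TOKEN",
  access_secret   = "YOUR_ACCESS_SECRET"
)

## (2) Network boundary: the list's members, plus their followers
##     (slug and owner handle here are illustrative)
members <- lists_members(slug = "sociologists",
                         owner_user = "familyunequal")

## (3)-(4) rtweet's syntax for pulling followers; retryonratelimit = TRUE
##     builds in the automated "rest periods" when the rate limit is hit
follower_edges <- lapply(members$user_id, function(id) {
  fols <- get_followers(id, n = 5000, retryonratelimit = TRUE)
  data.frame(from = fols$user_id, to = id)
})

## Stack the per-member results into a single directed edge list
edge_list <- do.call(rbind, follower_edges)
```

The resulting edge list (follower → list member) can then be passed to a network package such as igraph for analysis. Because the script pauses whenever a rate limit is reached, pulling followers for an entire list can take hours, which is the practical cost of the “rest periods” noted above.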