4.2.1 Deductive Disclosure and De-Anonymization

The “Tastes, Ties, and Time” (T3) dataset compiled Facebook profile data from a cohort of college students in the mid-2000s, which was used to analyze the relationship between social networks and personal cultural preferences (Lewis et al. 2008). The data were also intended to be made publicly available for other researchers’ use. Upon publication of the data’s codebook, it rapidly became apparent that the school from which the data were drawn was readily identifiable, even without accessing the data themselves (Zimmer 2010). Moreover, in datasets like these, unique combinations of a relatively small number of individual characteristics can make individuals readily identifiable to anyone with access to purportedly “de-identified” data and publicly available resources (Arfer and Jones 2018). Scholars have generally criticized the deductive disclosure of individual identities as unethical (Poor and Davidson 2018). Computer scientists have devoted considerable attention to optimal strategies for protecting against data de-anonymization, including for social network data with particular structural patterns (Onaran, Garg, and Erkip 2016).
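The risk posed by unique combinations of characteristics can be illustrated with a minimal sketch. The records, attribute names, and values below are entirely hypothetical (not drawn from T3 or any cited study); the function computes the smallest “equivalence class” over a chosen set of quasi-identifiers — the k of k-anonymity — where a result of 1 means at least one person is uniquely identifiable from those attributes alone, even though no name appears in the data.

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, but each row still
# carries a handful of quasi-identifiers (illustrative values only).
records = [
    {"major": "Economics",  "home_state": "VT", "dorm": "North"},
    {"major": "Economics",  "home_state": "MA", "dorm": "North"},
    {"major": "Philosophy", "home_state": "VT", "dorm": "South"},
    {"major": "Economics",  "home_state": "MA", "dorm": "North"},
]

def k_anonymity(rows, quasi_identifiers):
    """Smallest group size sharing the same quasi-identifier combination.

    A value of 1 means at least one individual is unique on these
    attributes and thus vulnerable to deductive disclosure.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())

print(k_anonymity(records, ["major", "home_state", "dorm"]))  # -> 1
```

Here the Philosophy major from Vermont is unique on just three attributes; an outsider who knows those facts about one student can locate that student’s row, and with it every other variable in the release.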

In other words, simply removing data readily recognized as PII (names, addresses, etc.) is not sufficient to protect the identities of the study population or of the individuals within it. As such, while the principles of maintaining confidentiality and protecting PII are generally agreed upon, the unique characteristics of social network data generate considerable disagreement about what actually constitutes PII in the network context (Narayanan and Shmatikov 2009). One strategy for addressing this concern is to establish terms-of-use agreements for anyone with whom data are to be shared (Parry 2011), ensuring that users will not attempt to personally identify anyone in the study.83 However, without a system for enforcing these agreements, it remains difficult to ensure that released data will not be misused (Zimmer 2010). Researchers must additionally consider how increasing “sunshine” requests for information via mechanisms such as the Freedom of Information Act place further demands on maintaining protections against the potential disclosure of PII.

Furthermore, network data provide increased analytic capabilities for potentially identifying research subjects. For individual-level data, guidelines exist for the ethical protection against disclosing PII. These standards restrict the presentation of detailed analytic combinations that would reduce the number of specified cases below a certain threshold (e.g., not presenting any table cells with fewer than five cases for “sensitive data,” per the Office for National Statistics (2006)). Along similar lines, reporting structural position within network data can make individual identities apparent (e.g., those who are especially central, peripheral, or occupying otherwise unique positions). Given that social networks research so commonly relies on the visual presentation of data (Freeman 2004), researchers must evaluate whether such presentation would violate confidentiality agreements with their research subjects (Borgatti and Molina 2003). The answer to this question is not always apparent, and the potential problems are broader than in most individually oriented research: in the SNA case, identifying one node may unravel the identities of many others to which it is directly or indirectly connected.
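The structural-uniqueness concern can be sketched in miniature. The network below is hypothetical and not drawn from any cited study: even after node labels are replaced with arbitrary IDs, a node occupying a unique position — here, a unique degree — can be singled out by anyone who knows an innocuous fact such as “who has the most ties,” and its tie list then begins to expose its neighbors.

```python
from collections import Counter, defaultdict

# Hypothetical anonymized friendship network (illustrative only):
# node labels are arbitrary IDs, but the structure is untouched.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]

# Degree of each node (number of ties).
degree = defaultdict(int)
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# A node whose degree is shared by no other node is structurally unique:
# outside knowledge like "the best-connected member of the group" suffices
# to re-identify it, and its ties then implicate its neighbors.
degree_counts = Counter(degree.values())
unique_nodes = [n for n, d in degree.items() if degree_counts[d] == 1]
print(unique_nodes)  # only node 0, whose degree (4) is shared by no one
```

The same logic extends to centrality scores, peripheral positions, and visually distinctive placements in a sociogram, which is why published network figures deserve the same disclosure review as published tables.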