Network Data Formats

Data on individuals are readily stored in a flat file format, where rows represent observations and columns represent variables. This format is efficient to store, is supported by most software platforms, and can handle large numbers of observations as well as variables of varying types (e.g., numeric or text). This format, illustrated schematically in Table 6.1, is likely familiar to most readers.

Table 6.1: Individual Data in Flat File Format.
ID#   Age   Gender   HIV Status   CSW   IDU   # Sex Partners
001   21    f        0            1     0     12
002   32    m        1            0     0     3
003   19    m        1            0     1     0
004   57    m        1            0     1     1
005   24    f        1            0     1     1

NOTE: ‘1’ denotes presence of an attribute, ‘0’ denotes its absence.

Network data are a different matter. Suppose we want to represent a tie between persons 001 and 002 in the data format presented in Table 6.1. It is not immediately clear from the structure of a flat data file to whom such a tie would “belong,” and therefore in which row the information should be recorded. Given that limitation, other data formats are more useful for storing network data. Three primary formats are typically used to store social network data: adjacency matrices, edge (or arc) lists, and adjacency lists. Each can be used to convey precisely the same information. If you hand a social network analyst any one of them, they can convert it to either of the others, or produce a visualization of the data it represents.117
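The equivalence of the three formats can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the ties among the five IDs from Table 6.1 are invented for the example, and the function names are hypothetical. It assumes directed, unweighted ties, with an edge list stored as (sender, receiver) pairs, an adjacency list as a mapping from each person to the people they send ties to, and an adjacency matrix as rows (senders) by columns (receivers).

```python
# Hypothetical directed ties among the five IDs in Table 6.1
# (invented for illustration; the table itself records no ties).
nodes = ["001", "002", "003", "004", "005"]
edge_list = [("001", "002"), ("002", "003"), ("003", "001"), ("004", "005")]


def edges_to_adjacency_list(nodes, edges):
    """Map each sender to the list of receivers it sends a tie to."""
    adjacency = {node: [] for node in nodes}
    for sender, receiver in edges:
        adjacency[sender].append(receiver)
    return adjacency


def adjacency_list_to_matrix(nodes, adjacency):
    """Build a 0/1 matrix with senders in rows and receivers in columns."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for sender, receivers in adjacency.items():
        for receiver in receivers:
            matrix[index[sender]][index[receiver]] = 1
    return matrix


def matrix_to_edges(nodes, matrix):
    """Recover the (sender, receiver) pairs from the nonzero cells."""
    return [
        (nodes[i], nodes[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value
    ]


adjacency_list = edges_to_adjacency_list(nodes, edge_list)
adjacency_matrix = adjacency_list_to_matrix(nodes, adjacency_list)
round_trip = matrix_to_edges(nodes, adjacency_matrix)
```

Passing the edge list through the other two formats and back recovers the same set of ties, which is the sense in which the three formats carry the same information.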

Early in the development of the field, the most common data format was the adjacency matrix. An adjacency matrix lists all members of the population in both the rows and the columns, and the entries inside the matrix represent the presence (1) or absence (0) of a tie. Adjacency matrices are read with the row labels representing the “sender” of a tie and the column labels representing the “receiver” of a tie. Sender and receiver are meaningful only for directed ties. The adjacency matrix of an undirected network is therefore symmetric (the top half mirrors the bottom half), and only half of the matrix need be reported to reduce data redundancy.
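The difference between directed and undirected matrices can be made concrete with a short Python sketch. The ties below are again invented for illustration, and the helper names (build_matrix, is_symmetric, upper_triangle) are hypothetical. The sketch assumes binary ties, and the half-matrix helper also assumes that the diagonal (self-ties) is not of interest.

```python
# Hypothetical ties among the five IDs in Table 6.1 (invented for illustration).
nodes = ["001", "002", "003", "004", "005"]
ties = [("001", "002"), ("002", "003"), ("004", "005")]


def build_matrix(nodes, ties, directed=True):
    """Rows are senders, columns are receivers; 1 marks a tie, 0 its absence."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for sender, receiver in ties:
        matrix[index[sender]][index[receiver]] = 1
        if not directed:
            # An undirected tie has no sender or receiver, so mirror the entry.
            matrix[index[receiver]][index[sender]] = 1
    return matrix


def is_symmetric(matrix):
    """True when every cell above the diagonal equals its mirror image below."""
    n = len(matrix)
    return all(
        matrix[i][j] == matrix[j][i] for i in range(n) for j in range(i + 1, n)
    )


def upper_triangle(matrix):
    """Keep only the cells above the diagonal, one shortened row per node."""
    return [row[i + 1:] for i, row in enumerate(matrix)]


directed_matrix = build_matrix(nodes, ties, directed=True)
undirected_matrix = build_matrix(nodes, ties, directed=False)
half_matrix = upper_triangle(undirected_matrix)
```

For the undirected version, the upper triangle alone is enough to reconstruct the full matrix, which is why half of the matrix can be dropped without losing information.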