Colin Allen, distinguished professor in the Department of History and Philosophy of Science at the University of Pittsburgh, is both an invited speaker and an ongoing participant in our Seminar; on February 28th, Dr. Allen talked with his fellow participants about his work on what he (and others) calls “data pipelines.” Broadly speaking, a data pipeline means that data are collected and recorded in one of many particular ways but are eventually used for purposes other than those for which they were originally collected. And this means, Dr. Allen pointed out, that data are highly fluid, flexible, and even self-perpetuating.
An especially potent example of this in Allen’s own work is his current role as Associate Editor of the Stanford Encyclopedia of Philosophy. While this project has one discrete start date back in 1995, it has been anything but static since then; as of March 2018, the site has approximately 1,600 entries, each of which is routinely reviewed and updated. Each new post adds to what is now a highly dynamic reference work containing data culled from all over the web: a pipeline, indeed.
Dr. Allen thoughtfully pointed out that as our relationship to data changes over our collective futures, it is important to remember that data do not enter our world on their own; rather, they are collected and curated. Allen co-authored an article, “Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks,” with Jaimie Murdock and Simon DeDeo in 2017. Charles Darwin left careful records of the books he read from 1837 to 1860, making this piece of his biographical information an especially rich site for data analysis. Allen and his co-authors used topic modeling to represent each of Darwin’s listed texts as “a mixture of topics.” While this method can (and did!) certainly teach us a lot about Darwin (which ideas he was reading about most frequently at certain points of his life, for example), we must remember that the data set (Darwin’s reading list) was curated long ago for purposes having nothing to do with its eventual use (Allen and his colleagues’ project).
Additionally, our changing relationship to data requires that we acknowledge our contemporary perspective. Allen noted that data sets that might seem downright tiny to us today were thought of as massive by their initial collectors, and what is now “Big” Data might someday seem tiny in ways we cannot yet know by current standards. As data sets grow, it becomes more important than ever to recognize them as collected, curated sets of information, not reflections or statements of simple truths.
Toward the end of his talk, Dr. Allen referred to the usage of digital platforms as “useful fiction”; by this, he means that a digital platform is unlikely ever to convey a truly faithful representation of all facets of any topic. Here, I’m reminded of our first Sawyer Seminar speaker, Matthew Edney, who asked us to think about maps: what they show, and what they necessarily leave out. When we hop onto Google Maps and search for a coffee place, the app is unlikely to pull up an exhaustive list of everywhere we might buy coffee; instead, we are handed a curated list of what Google presumes to be the most helpful answers to a user’s inquiry. Similarly, a user of the Stanford Encyclopedia of Philosophy is not greeted by a truly exhaustive collection of everything that has ever had to do with the topic at hand. Such a collection would be as impossible as it would be frustrating. As Dr. Allen suggested, the more data one has, the more difficult it is to keep it active, current, and useful.
So while data do indeed create representative fictions (digital worlds where we are provided but a small sliver of what is really “out there”), these fictions can be deployed to highly useful ends. Darwin’s self-reported reading list does not, of course, stand as a total representation of all his thought patterns, choices, and influences between 1837 and 1860, but it does suggest important things about how he navigated the world, and it can help confirm (or refute) important details of his biography.
While we can’t predict how data collected today might be used tomorrow, we can, as Dr. Allen so expertly points out, take ongoing care to track their movements, however fluid they might be. In uncertain times, it remains a hopeful fact that we may well be creating or curating data right now that will someday change the world in ways we cannot yet know.