On February 7th, one of the Seminar’s very own participants led our lunchtime discussion: Dr. Matthew Lincoln, a research software engineer at Carnegie Mellon University Libraries, talked with us about museum informatics, archive management, and computational approaches to humanities projects. Although his transition to software engineering is relatively recent, his experience with data modelling and analysis is not. Before his move to Carnegie Mellon, Dr. Lincoln earned a Ph.D. in art history from the University of Maryland, where he used computational methods to study Dutch printmakers of the 16th to 18th centuries. This training, along with his data engineering work on the Getty Research Institute’s Getty Provenance Index Databases, makes him uniquely attuned to multiple aspects of building and archiving data sets. As Dr. Lincoln explained during his talk, working with large data sets as a Ph.D. candidate (what he called the “available technology”) alerted him to particular absences of data within library and museum holdings; in other words, researchers can only carry out the large-scale digital projects for which data actually exist.
If you’ve ever searched for an eBook only to find that a digital version of that text does not (yet) exist, you know this feeling. It is, on a smaller scale, what a researcher might feel if they wanted, for example, to compare one library system’s entire collection to another’s, only to find that there is no usable data with which to do it. The project idea is there; the necessary data is not. This is where and why Dr. Lincoln’s job becomes so essential. His work has helped people browse museum archives, a kind of open-ended exploration that is especially useful if you, like so many people, don’t yet know exactly what you’re looking to do or discover within an archive. But if you do know, research software engineers like Dr. Lincoln will invest their time and data expertise in carrying out a project with a concrete deliverable; projects he has worked on include CMU’s Encyclopedia of the History of Science and koningsbergr, an R package that finds a path across all of the bridges in Pittsburgh.
Dr. Lincoln pointed out that our current version of the internet is neither inevitable nor permanent. Before we got here, plenty of people had plenty of ideas about how the internet might look, feel, and function. Many of those hopes included plans for massive, centralized, and institutionalized databases and archives, even more centralized and powerful than, say, the National Archives or Facebook’s enormous log of its users’ clicks, conversations, likes, locations, and restaurant reviews. While Dr. Lincoln was careful to note that such collections carry real and serious dangers (Facebook is a particularly useful case study), there is certainly something exciting about the possibility of creating and maintaining archives that are truly accessible to everyday users.
So let’s say someone visits Philadelphia’s Barnes Foundation on their day off. They almost certainly won’t have time to take in every piece of art in the collection, and they may not make regular trips back to see later exhibitions, so they really experience only a slim selection of the Foundation’s full holdings. It is exciting, then, that you can browse and search the entire collection online, and you don’t have to know exactly what you want, either: the website lets users search for pieces that share similar colors, lines, light, and space. And if you’re even less picky, you can just click “shuffle” on the whole thing. This is not a replacement for actually moving your body through a museum. Rather, the website promises that there is far more out there to find, discover, and analyze than a single afternoon’s browse could possibly contain.
Archives can help us learn what we don’t already know about our world and the worlds before ours, just as informatics helps us understand the big picture of the data we use and create every day, often without even realizing it. The structures we use to access information or objects, whether a library book, a sculpture, a biographical note about your favorite poet, or the citation that would perfectly complement your research paper’s argument, do not materialize on their own. Research software engineers like Dr. Lincoln track down information, make it usable, and ensure that it stays usable for those who come after us. It is worth remembering that whoever first conceived of CMU’s Encyclopedia of the History of Science is unlikely to be the last, or the only, person to find it extraordinarily useful. And, as Dr. Lincoln himself noted, it is impossible to predict how existing models and structures might spark future knowledge, let alone models we don’t yet have.
So while the project of building cross-disciplinary consensus around data and their use is still very much in progress, and a truly accessible (or perhaps Open) data world can therefore seem like a pipe dream, Matthew Lincoln’s Sawyer Seminar talk was a highly useful reminder that there are already excellent resources out there to help us build and maintain the data we wish existed.