Information Ecosystems

Information, Power, and Consequences

Primary Navigation Menu
  • InfoEco Podcast
  • InfoEco Blog
  • InfoEco Cookbook
    • About
    • Curricular Pathways
    • Cookbook Modules

February 2020

You are browsing the site archives for February 2020.

Data Pipelines, Data Fluidity: Colin Allen on the “Useful Fiction” of Curated Data

By: Jane Rohrer
On: February 28, 2020
In: Colin Allen
Tagged: Big Data, Darwin, data pipelines, Topic modeling

Colin Allen, distinguished professor in the Department of History and Philosophy of Science at the University of Pittsburgh, is both an invited speaker and an ongoing participant in our Seminar; on February 28th, Dr. Allen talked with his fellow participants about his work in what he (and others) call “data pipelines.” Broadly speaking, working with data pipelines means that data are collected and recorded in one of many particular ways but eventually used for purposes other than those for which they were originally collected. This means, Dr. Allen pointed out, that data are highly fluid, flexible, and even self-perpetuating. An especially potent example of this in Allen’s own work is his current role as Associate Editor of the Stanford Encyclopedia of Philosophy. While this project has one discrete start date, back in 1995, it has been anything but static since then; as of March 2018, the site had approximately 1,600 entries, each of which is routinely reviewed and updated. Each new entry adds to what is now a highly dynamic reference work containing data culled from all over the web: a pipeline, indeed. Dr. Allen thoughtfully pointed out that as our relationship to data changes over our collective futures, it is important to remember that data do not enter our world on their own; rather, they are collected and curated. Allen co-authored an article, “Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks,” with Jaimie Murdock and Simon DeDeo in 2017. Charles Darwin left careful records of the books he read from 1837 to 1860, making this …

Self-perpetuating data and “guided serendipity”: Colin Allen’s reflection on Charles Darwin, topic modeling, and Margaret Floy Washburn

By: Briana Wipf
On: February 27, 2020
In: Colin Allen
Tagged: Darwin, Topic modeling, Washburn

In his computational work, Colin Allen, distinguished professor in the Department of History and Philosophy of Science at the University of Pittsburgh, embraces the fact that the textual data he uses often depends not on his choices, but on someone else’s. Data does not emerge, fully formed, for him and his colleagues to study. He discussed this characteristic of data when he addressed the Information Ecosystems Mellon Sawyer Seminar at the University of Pittsburgh on Friday, Feb. 28. Data, as Johanna Drucker has memorably argued, isn’t data as much as it’s capta. If we remember that the Latin data means “things given” while capta means “things taken,” Drucker’s argument makes sense. The stuff we generate in our experiments or gather in the world doesn’t exist naturally. Rather, it’s taken or made (in which case I suppose we’d call it facta). Drucker’s formulation reminds us that data isn’t neutral but often exists according to the individual choices of this or that researcher, or this or that curator. Allen points out that his data for a given project, such as the textual corpus of Darwin’s reading list, yields its own data when he runs a topic model on the corpus. The topics produced by the model are data he can then interpret in his own work. In this way, Allen explained to me when I interviewed him for an upcoming episode of the Information Ecosystems podcast, data has a habit of begetting more data. “I think it’s important to realize that …

Embedded and Interdisciplinary: Generosity in the “Trade Zone”

By: Sarah Reiff Conell
On: February 21, 2020
In: Edouard Machery
Tagged: collaboration, Data, digital humanities, Education, Information Ecosystems, Philosophy of Science

In a recent meeting of the Sawyer Seminar, Dr. Edouard Machery came to discuss the role of data in his work. He is a Distinguished Professor in the History and Philosophy of Science (HPS) Department at the University of Pittsburgh, and Director of the Center for Philosophy of Science. The HPS department seems inherently interdisciplinary, bringing together methods that appear diametrically opposed, like statistics and philosophy. As its website states, “Integrating Two Areas of Study: HPS supports the study of science, its nature and fundamentals, its origins, and its place in modern politics, culture, and society.” Though such a field already requires many seemingly disparate skills, there was still interest in building a new domain: experimental philosophy. Dr. Machery works in this area in his current research, as he states, “with a special focus on null hypothesis significance testing, external validity, and issues in statistics.” Engaging in such varied methods, and being interdisciplinary at a personal level, is difficult (to say the least). If Malcolm Gladwell is right that mastery in a subject takes roughly 10,000 hours of practice, there are only so many fields of expertise one can cultivate in a lifetime. Working in a domain in which one has gained expertise also takes time. Is expertise like a language? Are there polyglot parallels? After acquiring four languages, does one get faster at learning another, and does expertise accrue the same way? Many specialists were drawn to their field because of a passion for the subject, and proficiency materialized as advanced degrees, formalized proof of …

The replication crisis gets to the heart of what counts as knowledge

By: Briana Wipf
On: February 20, 2020
In: Edouard Machery
Tagged: Gettier Intuition, replication crisis

What is truth? How do people reach conclusions and evaluate facts? What counts as knowledge, and how do we know? Hold up before you give up on this post, which I realize might seem to be getting into the type of heady esotericism humanists are sometimes criticized for. For Edouard Machery, director of the Center for Philosophy of Science at the University of Pittsburgh, these questions about how people understand what it means to know something, or how people make knowledge, come down to very real-world issues. These include the replication crisis, which for the past several years has caused hand-wringing among scientists; they acknowledge that the causes of the so-called crisis have to do with entrenched publishing incentives, but they disagree about how to correct it. Machery spoke to the University of Pittsburgh’s Information Ecosystems Sawyer Seminar on Friday, Feb. 21, having presented a public talk entitled “Why are Good Data so Hard to Get? Lessons from the Replication Crisis” the previous day. For his part, Machery was one of dozens of researchers who co-authored a Comment piece in Nature Human Behaviour in January of 2018 calling for a change to the threshold for “statistical significance,” the cutoff for a study’s p-value, which is the probability of seeing results at least as extreme as those observed if chance alone were at work. Currently, results are conventionally deemed statistically significant when P < 0.05, but the article, “Redefine Statistical Significance,” argued the threshold should be lowered to P < 0.005. This change, they argue, “would immediately improve the reproducibility of scientific research in many fields.” The replication crisis has real-world implications: this is not a case of cloistered academics splitting hairs …
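To make the proposed change concrete, here is a minimal sketch, assuming a two-sided z-test with a standard normal null distribution (an illustrative assumption; the Comment piece is not tied to any one test). The function names and z values below are hypothetical, chosen only to show how the same result can pass one threshold and fail the other.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic, assuming a standard normal null."""
    # For Z ~ N(0, 1), P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p: float, alpha: float) -> bool:
    """A result is declared 'significant' when its p-value falls below alpha."""
    return p < alpha


# Hypothetical z statistics, for illustration only.
for z in (1.5, 2.5, 3.0):
    p = two_sided_p_value(z)
    print(
        f"z={z}: p={p:.4f}, "
        f"significant at 0.05: {is_significant(p, 0.05)}, "
        f"at 0.005: {is_significant(p, 0.005)}"
    )
```

Under these assumptions, z = 2.5 gives p of roughly 0.012: significant under the conventional 0.05 threshold but not under the proposed 0.005 one, which is exactly the band of results the proposal would reclassify.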

Research Software & Building Useful Data from Absence

By: Jane Rohrer
On: February 7, 2020
In: Matthew Lincoln
Tagged: Curation, Data, data visualization, Information Ecosystems, Museums

On February 7th, one of the Seminar’s very own participants led our lunchtime discussion; Dr. Matthew Lincoln, a research software engineer at Carnegie Mellon University Libraries, talked with us about museum informatics, archive management, and computational approaches to humanities projects. Although his transition to software engineering is relatively recent, his experience with data modelling and analysis is not: before his move to Carnegie Mellon, Dr. Lincoln earned a Ph.D. in art history from the University of Maryland, where he used computational methods to study 16th- to 18th-century Dutch printmakers. This, along with his data engineering work on the Getty Research Institute’s Getty Provenance Index Databases, makes him uniquely attuned to multiple aspects of building and archiving data sets. As Dr. Lincoln himself articulated during his talk, using large data sets as a Ph.D. candidate (what he called the “available technology”) alerted him to particular data absences within library and museum holdings; in other words, researchers can only carry out the large-scale digital projects for which data actually exist. If you’ve ever searched for an eBook only to find that a digital version of the text does not (yet) exist, you know this feeling; it is, on a smaller scale, the same feeling a researcher might have if they wanted, for example, to compare one library system’s entire collection to another’s, only to find there is no usable data with which to do so. The project idea is there; the necessary data is not. This is where and why Dr. Lincoln’s job becomes so essential; his work has helped …

What you can see in museums is just the tip of the iceberg

By: Erin O'Rourke
On: February 6, 2020
In: Matthew Lincoln
Tagged: Curation, Data, Information Ecosystems, Linked Open Data, Matt Lincoln, Museums

While all of the Sawyer Seminar speakers so far have been scholars or users of information ecosystems, Matt Lincoln is perhaps unique in that he also writes the code behind them. His Ph.D. in Art History, his time as a data research specialist at the Getty Research Institute, and, most recently, his work as a research software engineer at Carnegie Mellon University have given him substantial knowledge of museums’ information systems and of the broader questions the seminar addresses. For Lincoln, “data” consists of collections of art and their associated facts and metadata. In his public talk, entitled “Ways of Forgetting: The Librarian, The Historian, and the Machine,” Dr. Lincoln focused on a case study from his time at the Getty, where he worked on a project restructuring how art provenance data were organized in databases. Lincoln argued that the way the data are structured can vary depending on who the creator or end user of the information will be (librarian, historian, or computer). A historian would likely prefer open-ended text fields in which to establish a rich context with details specific to the piece, whereas a librarian would opt to record the same details about every piece, and a computer would prefer the data to be stored in a highly structured format, with lists of predefined terms that can populate each field. On top of balancing these disparate goals, Lincoln cited a particularly poignant Jira ticket, which asked: “Are we doing transcription of existing documents or trying to represent reality?” This question might well be answered with “both” since the …

Invited Speakers

  • Annette Vee
  • Bill Rankin
  • Chris Gilliard
  • Christopher Phillips
  • Colin Allen
  • Edouard Machery
  • Jo Guldi
  • Lara Putnam
  • Lyneise Williams
  • Mario Khreiche
  • Matthew Edney
  • Matthew Jones
  • Matthew Lincoln
  • Melissa Finucane
  • Richard Marciano
  • Sabina Leonelli
  • Safiya Noble
  • Sandra González-Bailón
  • Ted Underwood
  • Uncategorized

Recent Posts

  • EdTech Automation and Learning Management
  • The Changing Face of Literacy in the 21st Century: Dr. Annette Vee Visits the Podcast
  • Dr. Lara Putnam Visits the Podcast: Web-Based Research, Political Organizing, and Getting to Know Our Neighbors
  • Chris Gilliard Visits the Podcast: Digital Redlining, Tech Policy, and What it Really Means to Have Privacy Online
  • Numbers Have History

    Archives

    • June 2021
    • April 2021
    • March 2021
    • February 2021
    • January 2021
    • December 2020
    • October 2020
    • September 2020
    • May 2020
    • March 2020
    • February 2020
    • January 2020
    • December 2019
    • November 2019
    • October 2019
    • September 2019

    Tags

    Algorithms Amazon archives artificial intelligence augmented reality automation Big Data Bill Rankin black history month burnout cartography Curation Darwin Data data pipelines data visualization digital humanities digitization diversity Education election maps history history of science Information Information Ecosystems Information Science Libraries LMS maps mechanization medical bias medicine Museums newspaper Open Data Philosophy of Science privacy racism risk social science solutions journalism Ted Underwood Topic modeling Uber virtual reality

    The Information Ecosystems Team 2023

    This site is part of Humanities Commons. Explore other sites on this network or register to build your own.
    Terms of Service | Privacy Policy | Guidelines for Participation