Data Critique


The Dataset

Canyon

Robert Rauschenberg, 1959

Self Playing Violin

Laurie Anderson, 1974

Indian Dancer: From an Ethnographic Museum

Hannah Höch, 1930


Overview

As background, the Museum of Modern Art (MoMA) was established in 1929 and has since collected over 200,000 works of art. To manage this substantial collection, MoMA maintains a database that stores each artwork's title, artist, date made, medium, dimensions, and date acquired by the museum. The dataset we are using for this research project is exported directly from this database.


This data originates from MoMA's collection records. It draws on the internal database and also on curatorial history. A second table describes each artist, including name, nationality, gender, birth year, death year, Wiki QID (a Wikidata identifier), and Getty ULAN ID (an identifier in the Getty Union List of Artist Names). The presence of these identifiers suggests that much of the artist information is linked to Wikidata and Getty ULAN, most likely so that records can be cross-referenced for accuracy.


Importantly, this dataset is created, funded, and maintained by MoMA itself as part of its internal system for tracking and managing its collection. Every record in the Artists and Artworks tables corresponds to work that MoMA has officially chosen to collect, catalog, and preserve. As a result, this is not a neutral or crowdsourced dataset: it reflects the choices of a major cultural institution with its own history, values, and priorities, and those priorities are a potential source of bias in the database. Fields like accession numbers, acquisition dates, credit lines, and curator-approval flags reveal that the data is closely tied to MoMA's legal ownership of works, its relationships with donors, and its internal cataloging practices. In this sense, the dataset does not just document art; it also reflects MoMA's power to decide which artists and artworks become part of the official record of modern art.


Limitations

Although the dataset contains a great deal of structured information about artists and artworks, it leaves out many important forms of context. There is no information about artists' socioeconomic background, race, class, education, or access to resources, even though these factors heavily influence who is able to produce and succeed in the art world. The dataset also does not explain why certain works were acquired: there is no record of debates, rejected pieces, or shifting institutional tastes. In addition, it lacks any history of exhibitions, critical interpretations, or narrative about the artworks themselves. The database consists mostly of administrative facts. This means we can analyze patterns in MoMA's collection, but not the cultural, political, or social struggles behind how that collection came to exist. Those struggles must be researched alongside the data analysis so that patterns in the data can be matched with real-world events.


Biases and Ontologies

The Museum of Modern Art's (MoMA) dataset is organized around a specific institutional ontology that categorizes accessioned artworks and the artists associated with them. This structure prioritizes ownership, authorship, and clear classification, which reflects the museum's role as an "authority": it can leave out forms of art that are temporary or difficult to acquire, such as street art and community-based projects. It also reinforces the museum's power to establish what qualifies as modern or contemporary art. The dataset privileges artworks that are convenient to catalog, preserve, and classify, and consequently marginalizes work that resists commodification and permanence. If this data were our only source, it would obscure marginalized communities and art histories, leaving out critical information such as why certain artists and movements are absent from the collection and the social or political contexts behind the artworks. Alternative, marginalized forms of artistic expression that exist outside institutional, ownership-based systems would be invisible, underscoring the museum's authority to define artistic value and its dominant narrative over modern and contemporary art.