Datahub SSH

Visual pattern recognition and (meta) data extraction

The project created a workbench, composed of several Python modules, for computationally analyzing large datasets of visual material. Although created in the context of the Remembering Activism project, it is designed to cater to the needs of researchers in (Art) History, Media Studies, Comparative Literary Studies, and interdisciplinary Memory Studies who work with digitized visual material.

The focus of this tool is on three aspects of visual data:

  1. Features of visual data (content): the tool identifies objects in images, as well as image features that allow for the detection of similar images within and across large datasets and on the Internet (using the Google Cloud Vision API).
  2. Metadata (context): the tool aggregates metadata (photographers, locations, topics, etc.) in order to contextualize the visual sources. Metadata is key to connecting the ‘content’ with actual research: only when algorithmically detected material can be linked to spatio-temporal, political, and institutional contexts can questions regarding cultural memory be answered.
  3. Relationships between metadata and features (relations): the tool combines image features and objects with aggregated metadata in order to draw connections between sources, places, and people. This makes it possible to explore patterns in the reproduction of visual information in a data-driven way.
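
To make the interplay of content, context, and relations concrete, the following is a minimal sketch in plain Python. It is not the workbench's own code: the record layout, the `IMAGES` data, the function names, and the choice of cosine similarity with a threshold of 0.9 are all assumptions made for illustration. Each hypothetical image carries a feature vector (content) and metadata (context); pairs of similar images are then reported together with their metadata (relations).

```python
from math import sqrt

# Hypothetical records, invented for illustration only. Each image has a
# feature vector (content) and metadata (context).
IMAGES = {
    "img_a": {"features": [0.9, 0.1, 0.0], "photographer": "Photographer A", "year": 1968},
    "img_b": {"features": [0.8, 0.2, 0.1], "photographer": "Photographer B", "year": 1975},
    "img_c": {"features": [0.0, 0.1, 0.9], "photographer": "Photographer A", "year": 1980},
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def similar_pairs(images, threshold=0.9):
    """Pair up images whose features are similar and attach their metadata."""
    ids = sorted(images)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = cosine_similarity(
                images[first]["features"], images[second]["features"]
            )
            if score >= threshold:
                pairs.append({
                    "images": (first, second),
                    "similarity": score,
                    "photographers": (
                        images[first]["photographer"],
                        images[second]["photographer"],
                    ),
                    "years": (images[first]["year"], images[second]["year"]),
                })
    return pairs


if __name__ == "__main__":
    for pair in similar_pairs(IMAGES):
        print(pair)
```

In this invented example, only `img_a` and `img_b` exceed the threshold, so the output links two different photographers and two different years through one visually similar motif. That is the kind of relation the third aspect describes.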

The tool uses a combination of GitHub and Jupyter Notebooks to create a ‘workbench’ for humanities scholars interested in using computational methods to explore large visual datasets.[1] The workbench consists of three sections: (1) collecting large visual datasets from archives and the Internet; (2) extracting information from these collections; and (3) analyzing the extracted information, including visualizations.

This makes it possible, for example, to scroll through a series of similar but slightly different pictures.
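
One way to order pictures so that similar but slightly different versions appear next to each other is sketched below. The source does not say which similarity measure the workbench uses; this sketch swaps in a simple average hash compared by Hamming distance, and every name and pixel value in it is hypothetical. The tiny 2x2 grayscale grids stand in for images that would normally be downscaled first.

```python
def average_hash(pixels):
    """Turn a 2D grid of grayscale values into a tuple of bits.

    A bit is 1 where the pixel is brighter than the grid's mean value.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    query_hash = average_hash(query)
    scored = [
        (hamming_distance(query_hash, average_hash(pixels)), name)
        for name, pixels in candidates.items()
    ]
    return [name for _, name in sorted(scored)]


# Hypothetical pixel data, invented for illustration only.
ORIGINAL = [[10, 200], [220, 30]]
CANDIDATES = {
    "brighter": [[30, 220], [240, 50]],    # same picture, uniformly brighter
    "cropped": [[10, 200], [20, 30]],      # one region changed
    "unrelated": [[200, 10], [30, 220]],   # inverted light/dark layout
}

if __name__ == "__main__":
    print(rank_by_similarity(ORIGINAL, CANDIDATES))
```

Because the hash depends only on whether each pixel is above or below the mean, a uniformly brightened copy produces the same bits as the original, while a locally altered copy differs in a few positions. Sorting by distance therefore yields a sequence that a reader could scroll through from near-identical to unrelated.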


[1] “Some Background,” accessed November 4, 2019,