OCR: Text recognition generates readable text.
Natural Language Processing: Light natural language processing toolkits generate text data.
Image Feature Extraction: Quantitative data is generated from image values.
TANDEM is an online environment that generates quantitative image and text data from files submitted by the user. This output is intended to be used as source material for data visualization, quantitative analysis, and distant reading of multimodal print objects.
TANDEM compiles existing open source technologies, including a version of OCR, image feature extraction, and light natural language processing packages, to generate its output. The output will be concatenated into a single document that can be saved as a .csv file.
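To make the idea of a single concatenated document concrete, here is a minimal sketch in Python of how per-page text data and image data could be merged into one CSV. It is an illustration under stated assumptions, not TANDEM's actual implementation: the function names (`text_features`, `image_features`, `build_rows`, `to_csv`), the chosen columns, and the sample page are all hypothetical, the text is assumed to have already come out of an OCR step, and the image is assumed to be a flat list of grayscale values between 0 and 255.

```python
import csv
import io
from statistics import mean, pstdev


def text_features(text):
    """Light text statistics for one page of already-recognized text.

    Hypothetical stand-in for an NLP step; tokens are split on whitespace only.
    """
    words = text.split()
    return {
        "word_count": len(words),
        "unique_words": len({w.lower() for w in words}),
        "avg_word_length": round(mean(len(w) for w in words), 2) if words else 0.0,
    }


def image_features(pixels):
    """Simple brightness statistics from a flat list of grayscale values (0-255).

    Hypothetical stand-in for an image feature extraction step.
    """
    return {
        "mean_brightness": round(mean(pixels), 2),
        "brightness_stdev": round(pstdev(pixels), 2),
    }


def build_rows(pages):
    """Concatenate text and image features into one row per page."""
    rows = []
    for page in pages:
        row = {"page": page["name"]}
        row.update(text_features(page["text"]))
        row.update(image_features(page["pixels"]))
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample page, for illustration only.
sample_pages = [
    {"name": "page_001", "text": "The cat sat. The end.", "pixels": [0, 255, 255, 0]},
]
csv_text = to_csv(build_rows(sample_pages))
print(csv_text)
```

Keeping one row per page, with text columns and image columns side by side, is one way to let a visualization or statistics tool compare both kinds of data for the same page without a separate join step.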
To explore the functionality of TANDEM, we will employ a test corpus of public domain picture books. The test corpus will illustrate how TANDEM streamlines the generation of the kinds of data needed to make informed distant readings of multimodal print artifacts. TANDEM is intended for scholars with a range of computational expertise who need quantitative insight into picture books, comics, illuminated manuscripts, and other images with overlaid text.