TANDEM: A Web-Based Text and Image Data Generator
Kelly Blanchat, Jojo Karlin, Stephen Real, Christopher Vitale
DH Praxis
Spring 2015
ABSTRACT
TANDEM is a Python-based Django web application that generates text and image data from files submitted by the user. It is intended for scholars seeking quantitative insight into a corpus of picture books, comics, advertisements, and other images with overlaid text. The TANDEM application combines three existing open-source technologies: Tesseract OCR, Open Source Computer Vision (OpenCV), and a natural language processing library called Natural Language Toolkit ...