About

What is TANDEM?

TANDEM was created for anyone who wants to explore close or distant reading combined with visual analysis.

How was TANDEM built?

Blood, sweat, and a few tears. Actually, the TANDEM team used Python to wrap Tesseract OCR, NLTK, and OpenCV into a streamlined Django data generation application. Previously, such an analysis required running each program individually from the command line.

Why use TANDEM?

If you’re unfamiliar with any of the above, that’s OK! TANDEM was built to make the process easier. Still, we recommend taking a look at what each element of TANDEM does so that you can be a fully informed user.

Here are several places we recommend exploring:

What is Tesseract OCR?

Tesseract OCR is an optical character recognition engine, which has been sponsored by Google since 2006. Tesseract can read a wide variety of image formats and convert them to text in over 60 languages.
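
To give a sense of what running Tesseract by hand looks like, here is a minimal sketch that calls the tesseract command-line program from Python. It is not TANDEM's own code: the function names build_tesseract_command and ocr_image are hypothetical, and the sketch assumes the tesseract executable is installed and on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_image(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Tesseract writes its result to <output_base>.txt, which is read back here.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

Calling ocr_image("page1.png", "page1") would run Tesseract on page1.png and return the contents of page1.txt. The file names here are placeholders.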

What is NLTK?

NLTK stands for the Natural Language Toolkit, and is a free, open source, community-driven project. It contains a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. These include tools for lexical analysis (word and text tokenizers), n-grams and collocations, part-of-speech tagging, and named-entity recognition.
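
If tokenizing and n-grams are new to you, the following plain-Python sketch shows the basic idea using only the standard library. It is an illustration, not NLTK's implementation and not TANDEM's code: the helper names tokenize and ngrams and the sample sentence are made up for this example. NLTK provides its own, more capable versions (for example nltk.word_tokenize and nltk.ngrams).

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text, chosen only for illustration.
text = "The cat sat on the mat. The cat slept."
tokens = tokenize(text)

# Count two-word sequences (bigrams) and find the most frequent one.
bigram_counts = Counter(ngrams(tokens, 2))
most_common_bigram = bigram_counts.most_common(1)[0]
```

Counting which word pairs recur is a crude first step toward what collocation tools do. Real collocation measures also weigh how often each word appears on its own.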

What is OpenCV?

OpenCV stands for Open Source Computer Vision and focuses on real-time image processing. The OpenCV library has more than 2500 optimized algorithms, which can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, find similar images from an image database, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, and more. Within the current 0.5 version of TANDEM, the OpenCV component is able to…

