Recently I began playing around with OCR using Tesseract at work. Getting it to work proved to be a pain, since I had no administrator rights to install the software.

Here is a brief outline of the approach.

pypdfocr was the package I used to convert PDFs (non-PDF files can be used as input too). Based on the documentation, the external requirements were:
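
For the conversion step itself, here is a minimal sketch of driving pypdfocr from Python. It assumes the `pypdfocr` command is reachable on `PATH` (for example from a user-level install that needs no admin rights) and that the output lands next to the input as `<name>_ocr.pdf`, which is how I understand the documented behaviour. The helper names `expected_output`, `build_command` and `run_ocr` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def expected_output(pdf_path):
    """Path where pypdfocr is assumed to write its result: <name>_ocr.pdf."""
    p = Path(pdf_path)
    return p.with_name(p.stem + "_ocr.pdf")


def build_command(pdf_path, executable="pypdfocr"):
    """Build the single-file conversion command as an argument list."""
    return [executable, str(pdf_path)]


def run_ocr(pdf_path):
    """Run pypdfocr on one file and return the assumed output path."""
    # Fail early with a readable message if the tool is not on PATH,
    # which is the usual problem on a locked-down machine.
    if shutil.which("pypdfocr") is None:
        raise RuntimeError("pypdfocr was not found on PATH")
    subprocess.run(build_command(pdf_path), check=True)
    return expected_output(pdf_path)
```

Calling `run_ocr("scan.pdf")` should then leave a searchable `scan_ocr.pdf` in the same folder, provided the external programs pypdfocr depends on can also be found.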

As a side note, I really should learn how to build these unofficial binaries myself so that I can redistribute them. That would include putting together my own suite of portable apps for personal use. It will probably be a project I take on in the future.