7-Segment Display OCR

I too had great difficulty getting Tesseract to recognize digits in images of LCD displays.

I had some marginal success by preprocessing the images with ImageMagick to overlay a copy of the image on itself with a slight vertical shift to fill in the gaps between segments:

$ composite -compose Multiply -geometry +0+3  foo.tif foo.tif foo2.png
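To make the effect of that command easier to see, here is a minimal pure-Python sketch of the same idea on a tiny grayscale image stored as a list of rows. It assumes dark digits on a light background with pixel values from 0 to 255. The function name `multiply_with_vertical_shift` is made up for illustration, and this is an approximation of what a Multiply composite with a +0+3 offset does, not ImageMagick's actual implementation.

```python
def multiply_with_vertical_shift(img, shift):
    """Multiply a grayscale image with a copy of itself moved down by `shift` rows.

    img is a list of rows, each a list of ints in 0..255 (hypothetical format).
    Rows that the shifted copy does not cover are left unchanged.
    """
    out = []
    for y in range(len(img)):
        if y < shift:
            # The shifted copy does not reach these rows.
            out.append(list(img[y]))
        else:
            # Multiply blend: a dark pixel in either layer darkens the result.
            out.append([(a * b) // 255 for a, b in zip(img[y], img[y - shift])])
    return out


# One pixel column: dark segment, one-row gap, dark segment, then background.
column = [[0], [255], [0], [255], [255]]
filled = multiply_with_vertical_shift(column, 1)
print(filled)
```

Because multiplying by a dark pixel darkens the result, every dark segment is smeared downward by the shift amount, which closes gaps that are no taller than the shift.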

In the end, though, my saving grace was the “Seven Segment Optical Character Recognition” binary: http://www.unix-ag.uni-kl.de/~auerswal/ssocr/

Many thanks to the author, Erik Auerswald, for this code!

I haven’t tried OCRing a 7-segment display, but I suspect the problem is that each character consists of several separate segments rather than a single connected component. In my experience, Tesseract does not handle such disconnected fonts well.

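Simple erosion (as an image preprocessing step) might help by connecting the segments, assuming dark digits on a light background; for light digits on a dark background the equivalent operation is dilation. You would have to test it and tune the kernel size so that the gaps close without distorting the digits too much.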
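As a rough illustration of the erosion idea, here is a minimal pure-Python sketch of grayscale erosion (a minimum filter) with a rectangular kernel. It assumes dark digits on a light background and an image stored as a list of rows of 0..255 values; the function name `erode` and its parameters are my own for this example. In practice you would more likely use the morphology routines of an image library.

```python
def erode(img, kernel_height=3, kernel_width=1):
    """Grayscale erosion: each pixel becomes the minimum of its neighbourhood.

    With dark digits on a light background this grows the dark regions,
    which can bridge small gaps between segments.
    """
    height, width = len(img), len(img[0])
    half_h, half_w = kernel_height // 2, kernel_width // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            neighbours = [
                img[ny][nx]
                for ny in range(max(0, y - half_h), min(height, y + half_h + 1))
                for nx in range(max(0, x - half_w), min(width, x + half_w + 1))
            ]
            row.append(min(neighbours))
        out.append(row)
    return out


# Two dark pixels in the middle column, separated by a one-pixel gap.
digit = [
    [255, 0, 255],
    [255, 255, 255],
    [255, 0, 255],
]
tall_kernel = erode(digit, kernel_height=3, kernel_width=1)
square_kernel = erode(digit, kernel_height=3, kernel_width=3)
print(tall_kernel)
print(square_kernel)
```

The tall, narrow kernel closes the vertical gap and leaves the neighbouring columns alone, while the 3x3 kernel turns this whole tiny image dark. That is the kind of distortion to watch for when choosing the kernel size.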