
Extracting text from images with Tesseract OCR, OpenCV, and Python

In conclusion, Tesseract is well suited to scanning clean documents, and you can easily convert the OCR output from an image or PDF to Word or any other required format. It offers fairly high accuracy and copes with a wide variety of fonts. This is very useful for institutions that handle a lot of documentation, such as government offices, hospitals, and educational institutes. As of release 4.0, Tesseract includes a deep-learning-based OCR engine that is significantly more accurate. You can access the code file and input image here to create your own OCR task. Try replicating this task to achieve the desired results. Happy coding!

Coding

Here, I will use the following sample receipt image:

The first part is image thresholding. The following code can be used for thresholding:

pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'  # your path may be different

For Windows Only

1 - You need to have Tesser