OCR Limitations

Understanding OCR in Squish

Squish is primarily an object-based automated GUI testing tool. Sometimes, however, object identification is not possible, for example in applications with non-standard GUI elements. To handle such cases, image-based testing capabilities have been added to Squish. For OCR, Squish takes a screenshot of a given area, and a third-party OCR engine identifies and extracts the text in that image, so that test scripts can interact with or verify text elements.

The accuracy of OCR-based lookups and verification points in Squish is directly linked to the performance of the selected OCR engine. Understanding the engine's limitations and how to optimize its output is therefore crucial for getting the most out of Squish's OCR capabilities.

Searching for text with an OCR engine is more brittle than object-based automation, and there is no guarantee that OCR recognizes all text. Various factors affect an OCR engine's performance and reliability.

Tesseract

Tesseract is a third-party, open-source OCR engine that can be easily integrated within Squish. Squish users can download prebuilt binaries for use with Squish from the Qt Customer Portal. However, please note that we do not provide technical support for Tesseract itself. Our technical support only covers using Tesseract for plain text recognition (with the options offered in the Squish IDE's OCR Selection dialog). Additionally, our technical support does not cover training or customizing Tesseract.

Awareness of Tesseract limitations is essential for tempering expectations and implementing effective mitigation strategies whenever Squish tests use OCR-based features.

Feel free to visit the Tesseract OCR Project Page or the Tesseract User Manual for more information about this OCR engine's features and limitations.

Other engines

Squish natively supports two additional OCR engines: OCR.Space and Amazon Rekognition. Although they may provide better results in your case, they require Squish to connect to external services during test execution. The limitations described below also apply to OCR.Space and Amazon Rekognition to some degree.

OCR Limitations

Accuracy with Complex Backgrounds

Tesseract may struggle with images that have complex or noisy backgrounds. Text overlaid on intricate designs or high levels of visual clutter may not be recognized accurately.

Font and Text Variation Sensitivity

Tesseract is sensitive to variations in font types and sizes. Highly stylized fonts, cursive text, or significant size variations within the same image can lead to misrecognized or missed text elements.

Contrast Sensitivity

Low contrast between text and background can lead to suboptimal OCR results. Shadows can confuse Tesseract, impacting the accuracy of text extraction.
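To put a number on "low contrast", one option is the contrast ratio defined in WCAG 2.x. The sketch below is only an illustration of that formula. It is not something Tesseract or Squish is known to compute, and the function names are made up for this example.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """Relative luminance of an (R, G, B) tuple with 8-bit channels."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(text_rgb, background_rgb) -> float:
    """WCAG contrast ratio: 1.0 for identical colors, about 21 for black on white."""
    l1 = relative_luminance(text_rgb)
    l2 = relative_luminance(background_rgb)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Light gray text on a white background yields a ratio close to 1,
# i.e. the kind of low-contrast text that OCR is likely to struggle with.
gray_on_white = contrast_ratio((200, 200, 200), (255, 255, 255))
black_on_white = contrast_ratio((0, 0, 0), (255, 255, 255))
```

The ratio only makes two color pairs comparable. Which value is "too low" for a given OCR engine has to be found by experiment.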

Artifacts and Noise

Artifacts or noise in the search area can significantly hinder Tesseract's ability to correctly identify and extract text.

Mitigation

Adjusting the search region

Narrowing the search region in Squish can significantly improve the quality of OCR results by reducing the amount of extraneous visual information Tesseract needs to process. By focusing on specific areas where text is expected, you minimize noise and distractions, allowing for more accurate and reliable text recognition.
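As a rough sketch of the idea, the following plain Python computes a small rectangle around the area where text is expected, with a few pixels of padding, clipped to the screen. The Region type, the narrowed_region function and the default padding are hypothetical names and values chosen for illustration. They are not part of the Squish API.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Axis-aligned rectangle in screen pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int


def narrowed_region(target: Region, screen: Region, padding: int = 8) -> Region:
    """Grow `target` by `padding` pixels on each side, clipped to `screen`."""
    left = max(screen.x, target.x - padding)
    top = max(screen.y, target.y - padding)
    right = min(screen.x + screen.width, target.x + target.width + padding)
    bottom = min(screen.y + screen.height, target.y + target.height + padding)
    if right <= left or bottom <= top:
        raise ValueError("target lies outside the screen")
    return Region(left, top, right - left, bottom - top)


screen = Region(0, 0, 1920, 1080)
label_area = Region(100, 200, 300, 40)  # assumed location of the expected text
region = narrowed_region(label_area, screen)
```

In a test script, a rectangle like this would then be handed to the OCR lookup as its search region. The exact region type and call signature should be taken from the Squish API documentation.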

Adjusting OCR Settings

Experimenting with different OCR settings in Squish, such as adjusting the Image processing, Page Segmentation Mode (PSM), and language parameters, can help fine-tune Tesseract’s performance for your specific application.

By customizing these settings to better align with the unique characteristics of your application's text and layout, you can achieve significantly improved accuracy and reliability in text recognition.
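One way to experiment systematically is to try several combinations of settings against a text that is known to be on screen and keep the first combination that works. The sketch below assumes a recognizer function that accepts a settings dictionary and returns the recognized text. That signature, the dictionary keys "language" and "psm", and the fake recognizer are all assumptions made for illustration. A real recognizer would wrap the OCR call available in your test script.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional

# Hypothetical signature: takes a settings dict, returns the recognized text.
Recognizer = Callable[[Dict[str, object]], str]


def first_working_settings(
    recognize: Recognizer,
    expected: str,
    languages: Iterable[str],
    segmentation_modes: Iterable[int],
) -> Optional[Dict[str, object]]:
    """Return the first settings combination whose output contains `expected`."""
    for language, psm in product(languages, segmentation_modes):
        settings = {"language": language, "psm": psm}
        if expected in recognize(settings):
            return settings
    return None  # no combination recognized the expected text


# Stand-in recognizer for demonstration only; it does not run any OCR.
def fake_recognize(settings: Dict[str, object]) -> str:
    if settings == {"language": "eng", "psm": 7}:
        return "Save As"
    return "5ave A5"  # typical misrecognition of similar-looking glyphs


# Tesseract PSM 3 = automatic page segmentation, 6 = single uniform block,
# 7 = single text line.
found = first_working_settings(fake_recognize, "Save As", ["deu", "eng"], [3, 6, 7])
```

Settings that work for one piece of text may not work for another, so it is worth repeating the check for each distinct area or font in the application.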

Conclusion

Factors like low resolution, complex backgrounds, unusual fonts, or poor contrast can significantly impair OCR performance. In such cases, the OCR engine may produce suboptimal results. This reflects a limitation of the third-party tool rather than of Squish itself. However, by narrowing the search region and tuning the OCR settings, testers should be able to maximize OCR accuracy.