{"id":23265,"date":"2025-05-21T03:42:44","date_gmt":"2025-05-21T07:42:44","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=23265"},"modified":"2025-05-21T03:42:44","modified_gmt":"2025-05-21T07:42:44","slug":"writing-xpaths-to-validate-ui-with-appium-ocr-plugin","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/writing-xpaths-to-validate-ui-with-appium-ocr-plugin\/","title":{"rendered":"Writing XPaths to validate UI with Appium OCR Plugin"},"content":{"rendered":"<p>In our <a href=\"https:\/\/qxf2.com\/blog\/accessibility-id-as-a-locator-strategy-on-appium-for-ios-and-android-apps\/\" target=\"_blank\" rel=\"noopener\">previous post<\/a>, we suggested using <code>Accessibility ID<\/code> as a reliable locator strategy for identifying mobile elements across different platforms. While working on that task, we also explored the <a href=\"https:\/\/github.com\/jlipps\/appium-ocr-plugin\" target=\"_blank\" rel=\"noopener\">Appium OCR Plugin<\/a>, which helped us write robust XPaths to validate UI on both iOS and Android. The catch: although the plugin worked in our local setup, remote test execution platforms do not support it yet. We&#8217;re writing this post to share our findings with the testing community, especially those who could benefit from using this plugin.<\/p>\n<hr \/>\n<h3>Why OCR?<\/h3>\n<p><a href=\"https:\/\/qxf2.com?utm_source=ocr_xpath_appium&amp;utm_medium=click&amp;utm_campaign=From%20blog\" target=\"_blank\" rel=\"noopener\">We<\/a> had an existing automation test that validated an error message that appeared when the user entered invalid payment details. Initially, we used the Tesseract OCR Python library to verify the presence of this message. However, including Tesseract in our requirements.txt just for this single scenario felt like overkill; it unnecessarily bloated our test environment. 
While exploring alternatives, we came across the Appium OCR plugin, which allowed us to validate the error message effectively. As we dug deeper into how it works, we realised it could also serve as a viable option for identifying UI elements across mobile platforms.<\/p>\n<hr \/>\n<h3>How does the OCR plugin work?<\/h3>\n<p>When we start the Appium server with the <code>--use-plugins=ocr<\/code> CLI option, the plugin adds an extra context, <code>OCR<\/code>, to the mobile app session. In that context, the page source is built from the text objects the plugin finds on the screen.<br \/>\nThe image below shows a side-by-side comparison: the <code>NATIVE_APP<\/code> context on the left and the <code>OCR<\/code> context on the right.<\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr.png\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-23280\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr.png\" alt=\"App source on Native app and ocr contexts\" width=\"3452\" height=\"868\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr.png 3452w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr-300x75.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr-1024x257.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr-768x193.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr-1536x386.png 1536w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/app_source_native_ocr-2048x515.png 2048w\" sizes=\"auto, (max-width: 3452px) 100vw, 3452px\" \/><\/a><\/p>\n<p>In the <code>NATIVE_APP<\/code> context, the page source shows the UI hierarchy, a tree structure representing how UI elements are arranged and nested. 
In contrast, the <code>OCR<\/code> context provides an XML structure generated by the OCR plugin, where elements are represented purely by their visible text.<br \/>\nWhen we switch to the <code>OCR<\/code> context, the plugin dynamically generates and updates the app source with OCR-based XML like this:<\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML.png\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-23277\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML.png\" alt=\"XML page source on OCR context\" width=\"2432\" height=\"606\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML.png 2432w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML-300x75.png 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML-1024x255.png 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML-768x191.png 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML-1536x383.png 1536w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2025\/05\/page_source_XML-2048x510.png 2048w\" sizes=\"auto, (max-width: 2432px) 100vw, 2432px\" \/><\/a><br \/>\nThis text-based representation made us wonder: could we use it to uniquely identify elements on the screen? We experimented by writing XPaths using the <code>text()<\/code> function to locate elements in the OCR-derived source. It worked. We successfully replaced our earlier Tesseract-based validation with this OCR plugin. 
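<\/p>
<p>Before writing any XPaths, it can help to confirm that the session really exposes the <code>OCR<\/code> context and to look at the XML the plugin produces. The snippet below is a minimal sketch using the Appium Python client. It assumes an existing session object named <code>driver<\/code> on a server started with <code>--use-plugins=ocr<\/code>; the helper name <code>show_ocr_source<\/code> is hypothetical.<\/p>
<pre lang=\"python\">
def show_ocr_source(driver):
    # Hypothetical helper: driver is an existing Appium Python client session
    print('Available contexts:', driver.contexts)
    # The plugin registers an extra context named OCR next to NATIVE_APP
    if 'OCR' not in driver.contexts:
        raise RuntimeError('No OCR context; start the server with --use-plugins=ocr')
    original_context = driver.current_context
    driver.switch_to.context('OCR')
    try:
        # In the OCR context, page_source is the XML built from on-screen text
        return driver.page_source
    finally:
        # Switch back so native locators keep working afterwards
        driver.switch_to.context(original_context)
<\/pre>
<p>Restoring the previous context in a <code>finally<\/code> block keeps later steps that rely on native locators working even if reading the OCR source fails.<\/p>
<p>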
The process was straightforward: we crafted an XPath for the error message text and verified the element&#8217;s presence within the OCR context.<\/p>\n<hr \/>\n<h3>Setup<\/h3>\n<p>Here\u2019s how you can install the Appium OCR plugin:<\/p>\n<pre lang=\"bash\">appium plugin install --source=npm appium-ocr-plugin\r\n<\/pre>\n<p>To use the plugin, start the Appium server with:<\/p>\n<pre lang=\"bash\">appium server --use-plugins=ocr\r\n<\/pre>\n<p>When you launch the server with the plugin enabled, it adds a new <code>OCR<\/code> context to your app session. You can then switch to this context to write robust, text-based XPaths.<\/p>\n<hr \/>\n<h3>Writing an XPath to identify an element<\/h3>\n<p>Before switching to the plugin, we used <code>pytesseract<\/code> to extract text from screenshots. Here&#8217;s how we did it:<\/p>\n<pre lang=\"python\">\r\nimport os\r\n\r\nimport pytesseract\r\nfrom PIL import Image\r\n\r\n# Method on our test helper class (the method name here is illustrative)\r\ndef get_text_from_image(self, image_name):\r\n    image_dir = self.screenshot_dir\r\n    full_image_path = os.path.join(image_dir, f\"{image_name}.png\")\r\n    # Check if the file exists\r\n    if os.path.exists(full_image_path):\r\n        # Load the image\r\n        image = Image.open(full_image_path)\r\n        # Enhance the image before OCR (preprocess_image is our own helper)\r\n        enhanced_image = self.preprocess_image(image)\r\n        # Perform OCR on the enhanced image\r\n        text = pytesseract.image_to_string(enhanced_image)\r\n    else:\r\n        text = \"\"\r\n    return text\r\n<\/pre>\n<p>Although this approach worked, adding a Python OCR dependency (which also needs the Tesseract engine installed on each machine) just for one check added unnecessary size and complexity to everyone\u2019s test setup. The Appium OCR plugin offered a cleaner alternative: it runs inside the Appium server and is active only when the server is started with it enabled. 
This kept our overall test setup lightweight and efficient.<br \/>\nWith OCR enabled on the Appium server, checking for text such as <code>Invalid email address<\/code> became straightforward using a simple XPath:<\/p>\n<pre lang=\"python\">from appium.webdriver.common.appiumby import AppiumBy\r\n\r\n# Switch to the context added by the OCR plugin\r\ndriver.switch_to.context(\"OCR\")\r\n# find_elements returns an empty list instead of raising NoSuchElementException\r\ninvalid_email_messages = driver.find_elements(AppiumBy.XPATH, \"\/\/lines\/item[text()='Invalid email address']\")\r\nif invalid_email_messages:\r\n    print(\"Successfully validated that the error message is displayed\")\r\nelse:\r\n    print(\"The error message is not present\")\r\n# Switch back so later steps can use native locators again\r\ndriver.switch_to.context(\"NATIVE_APP\")\r\n<\/pre>\n<p>The OCR context simplifies error message validation by giving direct access to the visible text on screen. Using the <code>text()<\/code> function in XPath queries, we can locate and verify on-screen text with the same locator on both iOS and Android.<\/p>\n<p><strong>Note:<\/strong> The code examples in this post are simplified and contrived versions of what we use in our actual test suites. They are intended for illustration purposes only and do not reflect our production coding standards or best practices.<\/p>\n<hr \/>\n<h3>OCR Simplifies Testing, But the Ecosystem Isn\u2019t Ready<\/h3>\n<p>While this new plugin initially got us excited, our enthusiasm quickly faded. When we ran the same setup on BrowserStack, we hit a roadblock: BrowserStack and other remote test execution platforms don\u2019t currently support the OCR plugin. As a result, we can use this capability only in our local test environment.<\/p>\n<hr \/>\n<p>So there you have it: a powerful yet underrated plugin that helps you write robust, text-based XPaths for cross-platform mobile apps. By tapping into OCR-generated page sources, the plugin simplifies content validation and element identification across both iOS and Android. The only catch? For now, it works only in local test runs. 
Still, it\u2019s a great addition to your local automation toolkit and one worth watching as broader platform support develops.<\/p>\n<hr \/>\n<h3>Essential Service offering from Qxf2<\/h3>\n<p>Qxf2&#8217;s <a href=\"https:\/\/qxf2.com\/essential-service-offering?utm_source=ocr_xpath_appium&amp;utm_medium=click&amp;utm_campaign=From%20blog\" target=\"_blank\" rel=\"noopener\">Essential Service<\/a> offers front-loaded QA support tailored for startups, combining exploratory testing, automated test suites, and CI integration to streamline release testing. This cost-effective solution provides expert QA without the need for a full-time hire, ensuring efficient and reliable product releases. Reach out to mak@qxf2.com to learn more.<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>In our previous post, we suggested using Accessibility ID as a reliable locator strategy for identifying mobile elements across different platforms. While working on that task, we also explored the Appium OCR Plugin, which helped us write robust XPaths to validate UI on both iOS and Android. 
But the problem we had was, although we were able [&hellip;]<\/p>\n","protected":false},"author":9,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[48,50,104,71],"tags":[],"class_list":["post-23265","post","type-post","status-publish","format-standard","hentry","category-android","category-appium","category-ios","category-mobile-automation"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/23265","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/9"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=23265"}],"version-history":[{"count":38,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/23265\/revisions"}],"predecessor-version":[{"id":23330,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/23265\/revisions\/23330"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=23265"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=23265"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=23265"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}