Identify tech words from a text
As part of the pairing project activity at Qxf2 (pick a project that can produce meaningful output in 5 hours and work on it with your teammates), I chose to identify the tech keywords in a text using the NLTK module. This post covers the steps I followed to find the tech-related words in […]
Scraping websites using Octoparse (no programming!)
Did you know you can scrape data from webpages without writing a single line of code? In this post, we will talk about a tool called Octoparse. We used Octoparse to scrape data from a list of URLs, without any coding at all. Data is valuable and it’s not always easy to get the correct data from the web sources […]
Quilt – a Data Package Manager
We have been testing data-rich applications for a long time. And like any experienced tester, we realize how difficult it is to create, maintain and update data every time the data model changes. So we were excited to come across Quilt, a data package manager, via Hacker News. We were thrilled that it integrated well with our favorite programming language […]
Extracting data from PDFs using Python
When testing highly data-dependent products, I find it very useful to use data published by governments. When government organizations publish data online, barring a few notable exceptions, they usually release it as a series of PDFs. The PDF file format was not designed to hold structured data, which makes extracting data from PDFs difficult. In this post, I will […]