Extracting data Archives

Identify tech words from a text

As part of the pairing project activity at Qxf2 (pick a project that can produce a meaningful output in 5 hours and work on it with your teammates), I chose to identify the tech keywords in a text using the NLTK module. This post covers the steps I followed to find the tech-related words in […]

August 21, 2019

Scraping websites using Octoparse (no programming!)

Did you know you can scrape data from webpages without writing a single line of code? In this post, we will talk about a tool called Octoparse. We used Octoparse to scrape data from a list of URLs, without any coding at all. Data is valuable and it’s not always easy to get the correct data from the web sources […]

November 9, 2017 (updated April 2, 2018)

Quilt – a Data Package Manager

We have been testing data-rich applications for a long time. And like any experienced tester, we realize how difficult it is to create, maintain and update data every time the data model changes. So we were excited to come across Quilt, a data package manager, via Hacker News. We were thrilled that it integrated well with our favorite programming language […]

October 9, 2017 (updated April 2, 2018)

Extracting data from PDFs using Python

When testing highly data-dependent products, I find it very useful to use data published by governments. When government organizations publish data online, barring a few notable exceptions, they usually release it as a series of PDFs. The PDF file format was not designed to hold structured data, which makes extracting data from PDFs difficult. In this post, I will […]

August 16, 2017 (updated April 2, 2018)
