How to collect data from any website in just a few minutes?

Daniel Chmurny
Python Developer
Product
February 8, 2023

In today's fast-paced business world, data is more important than ever. Companies are constantly looking for ways to collect data and analyze it to make better decisions, improve their products and services, and gain a competitive edge. However, collecting data can often be time-consuming and tedious. But what if I told you there is a way to collect data from any website in minutes?

In this article, we will guide you step by step through collecting data from a Wikipedia page using Simplescraper.

Let’s prepare for a task...

As you might guess from the name of the tool, this tutorial is about web scraping. If you are unfamiliar with the term, we highly recommend reading “A web scraping quick guide with a hands-on tutorial” first.

The main goal of this tutorial is to show a straightforward and quick way to collect data from a website. It is also important to mention that the final result will not be a production-ready solution. For that, you often need stronger security and privacy standards, rotating proxies, schedulers, and more. We’ll cover this in a different article.

We decided to scrape Wikipedia because, as its robots.txt file puts it, “Friendly, low-speed bots are welcome viewing article pages.”

Wikipedia's robots.txt (https://en.wikipedia.org/robots.txt), source: Wikipedia
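If you prefer to check such rules programmatically rather than by reading the file, Python's standard library can parse robots.txt syntax. The snippet below is only a sketch: the rules in `SAMPLE_ROBOTS_TXT` and the bot name `my-tutorial-bot` are made up for illustration and are not a copy of Wikipedia's real file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules written for this example only.
# They are NOT a copy of Wikipedia's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /wiki/
Disallow: /w/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "my-tutorial-bot" is a hypothetical user agent name.
article_ok = is_allowed(
    SAMPLE_ROBOTS_TXT, "my-tutorial-bot", "https://en.wikipedia.org/wiki/Hot_dog"
)
dynamic_ok = is_allowed(
    SAMPLE_ROBOTS_TXT,
    "my-tutorial-bot",
    "https://en.wikipedia.org/w/index.php?title=Hot_dog&action=edit",
)
print(article_ok, dynamic_ok)
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the file for you.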

To collect the data, we will use Simplescraper, which was recently Product of the Day on Product Hunt. It is an easy-to-use tool that lets users efficiently extract data from any website and convert it into organized information. Its user-friendly Chrome extension makes it easy to select and extract content from any website. The extracted content is instantly accessible as an API endpoint, ready to be downloaded in CSV or JSON format or even sent directly to your preferred web applications. The Simplescraper dashboard allows you to manage all your scraping recipes with ease.

Time to collect data.

In this tutorial, we will extract the title, text, and image from the Wikipedia page about the hot dog. Let’s dive in!

Step 1: Install the Chrome extension

First, you need to install the Chrome extension, which can be downloaded here. The extension allows you to visually select the parts of the website that you would like to extract. Simplescraper also allows you to create a scraper via its dashboard. However, using the Chrome extension is a much faster and more intuitive approach.

Chrome Web Store, source: Google

Step 2: Go to Wikipedia and choose elements to extract

Once the Chrome extension is installed, go to the en.wikipedia.org/wiki/Hot_dog page. At the top, you should see Simplescraper’s navbar, which appears every time you run the extension.

Now we will collect data. To extract the first value, the page title, click the + button in the top left corner, name the value (e.g. “Title”), and click the title on the webpage. Simplescraper should automatically detect the area and save it. See the short GIF below.

Collecting the title from the article using simplescraper.io, source: Forloop

Repeat the same action to collect the text and the image. The ready-made scraper should look as follows:

A ready-made scraper.
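For readers curious what this point-and-click selection roughly corresponds to in code, here is a minimal Python sketch using only the standard library. It is not how Simplescraper works internally. The `PageFields` class, the field names, and the hand-written `SAMPLE_HTML` are all assumptions made for illustration, and the real Wikipedia markup is considerably more complex.

```python
from html.parser import HTMLParser

# Hand-written stand-in for a page, used so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <h1>Hot dog</h1>
  <img src="//example.org/hot_dog.jpg" alt="A hot dog">
  <p>A hot dog is a <b>sausage</b> served in a bun.</p>
  <p>Second paragraph, which is ignored.</p>
</body></html>
"""


class PageFields(HTMLParser):
    """Collect the first <h1>, the first <p>, and the first <img src>."""

    def __init__(self):
        super().__init__()
        self.fields = {"Title": "", "Text": "", "Image": ""}
        self._current = None  # name of the field currently being filled

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.fields["Title"]:
            self._current = "Title"
        elif tag == "p" and not self.fields["Text"]:
            self._current = "Text"
        elif tag == "img" and not self.fields["Image"]:
            self.fields["Image"] = dict(attrs).get("src") or ""

    def handle_endtag(self, tag):
        # Stop collecting text once the heading or paragraph closes.
        if tag in ("h1", "p"):
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data


parser = PageFields()
parser.feed(SAMPLE_HTML)
fields = {name: value.strip() for name, value in parser.fields.items()}
print(fields)
```

A visual tool saves you from writing and maintaining this kind of selector logic, which is why it is so much quicker for a one-off task.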

Step 3: View results

When the scraper is ready, click View Results in the top right corner, and it’s done!

Simplescraper will collect the data, and you will be redirected to a dashboard where you will see your values. You can download them as data.csv or data.json.

Scraping results are available as `data.csv` and `data.json`, source: Forloop
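Once downloaded, the files can be read with a few lines of Python. The sketch below uses in-memory sample strings instead of real files so that it runs anywhere. The column names (`Title`, `Text`, `Image`) and the list-of-objects JSON layout are assumptions based on the values named in this tutorial, so your actual export may look different.

```python
import csv
import io
import json

# Hypothetical export contents; real columns depend on how you named your values.
SAMPLE_CSV = """\
Title,Text,Image
Hot dog,"A sample sentence, with a comma.",https://example.org/hot_dog.jpg
"""
SAMPLE_JSON = """\
[{"Title": "Hot dog",
  "Text": "A sample sentence, with a comma.",
  "Image": "https://example.org/hot_dog.jpg"}]
"""


def load_csv(text: str) -> list:
    """Parse CSV text into a list of row dictionaries keyed by column name."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def load_json(text: str) -> list:
    """Parse JSON text that is assumed to hold a list of row objects."""
    return json.loads(text)


csv_rows = load_csv(SAMPLE_CSV)
json_rows = load_json(SAMPLE_JSON)
print(csv_rows[0]["Title"], json_rows[0]["Image"])
```

With real files, you would pass the result of `open("data.csv", newline="", encoding="utf-8").read()` (or the equivalent for `data.json`) to these functions.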

Summary

In this tutorial, we showed a quick and simple way to collect data from a Wikipedia page in just a few minutes using Simplescraper. The main goal was to give a sense of what web scraping is and how it can be performed quickly to obtain the values that interest you most. Of course, it is possible to improve that scraper significantly. We could add more data to extract or automate the process for several pages rather than only one. If you are interested in web scraping, we highly encourage you to join our weekly webinars and Slack channel to learn more.
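As a rough idea of what automating the process for several pages could look like in code, here is a small Python sketch that visits a list of article titles with a pause between requests, in the spirit of a friendly, low-speed bot. The function names, the one-second default delay, and the `fake_fetch` placeholder are our own assumptions. No real network request is made, so you would need to swap in an actual downloader.

```python
import time
from urllib.parse import quote

BASE_URL = "https://en.wikipedia.org/wiki/"


def article_url(title: str) -> str:
    """Build an article URL from a human-readable title."""
    return BASE_URL + quote(title.replace(" ", "_"))


def crawl(titles, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each title in order, pausing between consecutive requests."""
    results = {}
    for index, title in enumerate(titles):
        if index > 0:
            sleep(delay_seconds)  # stay low-speed between requests
        results[title] = fetch(article_url(title))
    return results


def fake_fetch(url: str) -> str:
    """Placeholder downloader used so the example runs offline."""
    return f"<html>placeholder for {url}</html>"


pages = crawl(["Hot dog", "Hamburger"], fake_fetch, delay_seconds=0)
for title, html in pages.items():
    print(title, len(html))
```

Passing `fetch` and `sleep` in as parameters keeps the loop easy to test and lets you change the download method without touching the throttling logic.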