Simple Web Scraping with Cheerio and node-fetch

Before explaining how you can use Cheerio.js to scrape the web, it is useful to understand what Cheerio is. It describes itself as a “Fast, flexible, and lean implementation of core jQuery designed specifically for the server.” In plain terms, it lets you load HTML and query it with jQuery-style selectors, which for scraping means it is easy to find the content you want.

What makes Cheerio special is how easy and lightweight it is to use on your server. Unlike Puppeteer, which drives a headless Chrome browser, Cheerio only parses the HTML you give it. It does not render pages or execute JavaScript, so it is not suited to anything that requires user interaction, such as logging in.

To get started with Cheerio you need to know how to run a script with Node.js, which is as simple as opening a terminal window in VS Code and learning a few commands. You also need node-fetch; both packages can be installed with npm (npm install cheerio node-fetch@2).

The first thing you will want to do is get some HTML from the website you want to scrape, like this:

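// node-fetch v3 is published as an ES module, so this require call assumes v2
const fetch = require('node-fetch');

fetch('https://blog.thewebscraping.com/')
    .then(res => res.text())            // read the response body as an HTML string
    .then(body => console.log(body))    // print the raw HTML
    .catch(err => console.error(err));  // report network errors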
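
One thing the snippet above does not cover is that fetch resolves even when the server answers with an error status such as 404 or 500, so a scraper can end up parsing an error page. The following is a minimal sketch of a wrapper that checks the status first. The names `fetchHtml` and `fetchImpl` are hypothetical and exist only for this example; the default assumes node-fetch v2 is installed.

```javascript
// Sketch: fetch a page and return its HTML, rejecting on HTTP error statuses.
// `fetchHtml` and `fetchImpl` are illustrative names, not part of any library.
// The default argument is only evaluated when no fetch function is passed in.
async function fetchHtml(url, fetchImpl = require('node-fetch')) {
  const res = await fetchImpl(url);

  // fetch only rejects on network failures, so check the status explicitly.
  if (!res.ok) {
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }

  // Resolve with the response body as a string of HTML.
  return res.text();
}
```

You could then call it as fetchHtml('https://blog.thewebscraping.com/').then(html => console.log(html.length)). Passing the fetch function as a parameter is a design choice that makes the helper easy to exercise with a stub instead of a live request.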

Once you have the HTML, load it into Cheerio like this:

const cheerio = require('cheerio');
const $ = cheerio.load('<ul id="fruits">...</ul>');

Now the entire HTML document is loaded into $. Because fruits is an id in this markup, not a class, you select it with # rather than a dot, and you can read the text inside it simply by calling:

$('#fruits').text();