When you need to quickly scrape some data from a website, here’s a simple solution using request (HTTP client), cheerio (jQuery-style HTML parsing), and async (flow control).

I’ll skip error checking for simplicity. Don’t ever do that in real code, especially with Node.js: it’ll kick your ass. As always, everything is in CoffeeScript.
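For the record, here’s a rough sketch of the kind of check being skipped, assuming a Node-style `(err, res, body)` callback like the one used later in this post. `responseError` is a hypothetical helper made up for illustration, not part of any library, and treating every non-2xx status as a failure is just one possible policy.

```coffeescript
# Hypothetical helper: returns an Error describing what went wrong,
# or null when the response looks usable.
responseError = (err, res) ->
    # Network-level failure (DNS, timeout, connection refused, ...)
    return err if err?
    # No response object at all
    return new Error 'No response received' unless res?
    # Assumption: anything outside 2xx counts as a failure
    unless 200 <= res.statusCode < 300
        return new Error "Unexpected HTTP status #{res.statusCode}"
    null

# Intended use at the top of a callback:
#   (err, res, body) ->
#       problem = responseError err, res
#       return console.error problem.message if problem?
```

Keeping the check in one small function means every callback can start with the same two lines instead of repeating the status logic.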

## Crawling a site

We need some initialization:

request = require 'request'
cheerio = require 'cheerio'
async = require 'async'

Initialize a cookie jar in case we’re dealing with a site that sets cookies across HTTP redirects:

jar = request.jar()

Now start the actual crawling:

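request
    followRedirect: true    # follow 3xx redirects
    uri: 'http://cnn.com/'
    timeout: 10000          # milliseconds
    jar: jar                # reuse the cookie jar created above
, (err, res, body) ->
    # Error checking deliberately skipped; see the note at the top
    $ = cheerio.load body
    # Print "anchor text -> href" for every link on the page
    $('a').each ->
        console.log "#{$(@).text()} -> #{$(@).attr 'href'}"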

We use cheerio to parse the response body and print the anchor text and href of each link on the page. Results:

CNN Shop -> http://www.turnerstoreonline.com/
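
The hrefs printed this way come straight from the markup, so some of them may be relative paths rather than full URLs. If the next step is to follow those links, something like the sketch below could normalize them first. `absoluteLink` is a hypothetical helper, and it relies on Node’s built-in `URL` class rather than anything in cheerio or request.

```coffeescript
{URL} = require 'url'

# Hypothetical helper: resolve an href against the page it was found on.
# Returns null for a missing href or one that cannot be parsed.
absoluteLink = (href, base) ->
    return null unless href
    try
        new URL(href, base).href
    catch e
        null

console.log absoluteLink '/world', 'http://cnn.com/'
# prints http://cnn.com/world
```

Inside the `each` loop this would be called as `absoluteLink $(@).attr('href'), 'http://cnn.com/'`, skipping any link that comes back as null.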
