The 4chan parser - #2

The project proved to be easier than expected.

I started by looking at the HTML “make-up” of 4chan’s /b/ board and noticed that each thread on the front page had a CSS class called “thread”. So I looped over those with BeautifulSoup’s .find_all() function.

That found each of the threads.
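
Roughly, that step looks like the sketch below. It assumes the bs4 package and runs on a made-up HTML snippet instead of the live board, so the real markup may differ. The names find_threads and SAMPLE_BOARD, and the ids in the sample, are mine and only there for illustration.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the board's front page; the real markup may differ.
SAMPLE_BOARD = """
<div class="board">
  <div class="thread" id="t123"><div class="postContainer">first</div></div>
  <div class="thread" id="t456"><div class="postContainer">second</div></div>
  <div class="sideThread">not a thread</div>
</div>
"""


def find_threads(html):
    """Return every element that carries the CSS class "thread"."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches whole class names, so "sideThread" is skipped.
    return soup.find_all(class_="thread")


threads = find_threads(SAMPLE_BOARD)
print(len(threads), "threads found")
```

Passing class_ (with the trailing underscore) is how BeautifulSoup filters on a CSS class, since class is a reserved word in Python.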

After that, I extracted the ID from each thread and a link to the thread.
Then I opened that link using urlopen().
I iterated over all divs with the class postContainer.
Inside each of those, I found all anchors with the class fileThumb, got the “href” from each of those links, downloaded the files and bam. Done!
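
Here is a hedged sketch of that last part, not the exact script I ran. It assumes the bs4 package and Python 3's urllib.request.urlopen, and it parses a made-up thread snippet so it works offline. The example.org URLs, SAMPLE_THREAD, image_urls and download are all hypothetical names for illustration, and the real page markup may differ.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Made-up stand-in for a thread page; the real markup may differ.
SAMPLE_THREAD = """
<div class="thread" id="t123">
  <div class="postContainer">
    <a class="fileThumb" href="//images.example.org/b/1.jpg"><img src="1s.jpg"></a>
  </div>
  <div class="postContainer">no image in this post</div>
  <div class="postContainer">
    <a class="fileThumb" href="//images.example.org/b/2.png"><img src="2s.jpg"></a>
    <a class="quotelink" href="#p1">&gt;&gt;1</a>
  </div>
</div>
"""


def image_urls(thread_html, page_url):
    """Collect the href of every fileThumb anchor inside a postContainer div."""
    soup = BeautifulSoup(thread_html, "html.parser")
    urls = []
    for post in soup.find_all("div", class_="postContainer"):
        for link in post.find_all("a", class_="fileThumb"):
            href = link.get("href")
            if href:
                # urljoin also resolves protocol-relative hrefs like "//host/file".
                urls.append(urljoin(page_url, href))
    return urls


def download(url, folder="."):
    """Save one file under its original name. This one hits the network."""
    name = os.path.basename(urlparse(url).path)
    path = os.path.join(folder, name)
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


# Hypothetical page URL, only used to resolve the hrefs in the sample.
urls = image_urls(SAMPLE_THREAD, "https://boards.example.org/b/thread/123")
print(urls)
```

To actually fetch the files, fetch the thread page with urlopen(), pass its HTML to image_urls(), and call download(url) for each result.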

Shitty technical writing isn’t my thing, apparently.

 