The 4chan parser - #2

The project proved to be easier than expected.

I started by looking at the HTML “make-up” of 4chan’s /b/ board and noticed that each thread on the front page had the CSS class “thread”. So I looped over those with BeautifulSoup’s .find_all() function.

That found each of the threads.
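
For anyone who wants to see that step in code, here is a rough sketch. It is not the original script: it swaps BeautifulSoup's .find_all() for Python's built-in html.parser so it runs without installing anything, and it works on made-up sample markup instead of the live board. The names SAMPLE_BOARD, ThreadFinder and find_thread_ids, and the "t1001" style IDs, are all hypothetical.

```python
from html.parser import HTMLParser

# Made-up markup imitating a board front page; the real 4chan HTML may differ.
SAMPLE_BOARD = """
<div class="board">
  <div class="thread" id="t1001"><div class="postContainer">first</div></div>
  <div class="thread" id="t1002"><div class="postContainer">second</div></div>
</div>
"""


class ThreadFinder(HTMLParser):
    """Collects the id of every div whose class list contains 'thread'."""

    def __init__(self):
        super().__init__()
        self.thread_ids = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Split the class attribute so "thread" matches only as a whole class name.
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "thread" in classes:
            self.thread_ids.append(attrs.get("id"))


def find_thread_ids(html):
    """Return the ids of all thread divs in the given HTML, in document order."""
    finder = ThreadFinder()
    finder.feed(html)
    return finder.thread_ids


thread_ids = find_thread_ids(SAMPLE_BOARD)
print(thread_ids)
```

With BeautifulSoup installed, the same lookup is roughly soup.find_all("div", class_="thread").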

After that, I extracted the ID from each thread, along with a link to the thread.
Then I opened that link using urlopen().
I iterated over all divs with the class postContainer.
Inside each of those, I found all anchors with the class fileThumb, got the “href” from each one, downloaded the files and bam. Done!
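
The image-link part of those steps could look something like the sketch below. Again this is an approximation, not the original code: it uses the standard library's html.parser in place of BeautifulSoup, parses an invented sample thread instead of fetching one with urlopen(), and stops short of downloading anything. The example.invalid hosts, the scheme-relative "//..." hrefs, and the names SAMPLE_THREAD, ImageLinkFinder and find_image_urls are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Invented thread markup; hosts and href format are assumptions, not real 4chan output.
SAMPLE_THREAD = """
<div class="thread" id="t1001">
  <div class="postContainer">
    <a class="fileThumb" href="//i.example.invalid/b/111.jpg"><img src="thumb1.jpg"></a>
  </div>
  <div class="postContainer">
    <a class="quotelink" href="#p1">reply</a>
  </div>
  <div class="postContainer">
    <a class="fileThumb" href="//i.example.invalid/b/222.png"><img src="thumb2.jpg"></a>
  </div>
</div>
<a class="fileThumb" href="//i.example.invalid/b/outside.gif">not in a post</a>
"""


class ImageLinkFinder(HTMLParser):
    """Collects hrefs of fileThumb anchors that sit inside a postContainer div."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0           # number of divs currently open
        self.container_depth = None  # div depth at which the open postContainer started
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            self.div_depth += 1
            if self.container_depth is None and "postContainer" in classes:
                self.container_depth = self.div_depth
        elif tag == "a" and self.container_depth is not None:
            if "fileThumb" in classes and attrs.get("href"):
                self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div":
            # Leaving the div that opened the current postContainer.
            if self.div_depth == self.container_depth:
                self.container_depth = None
            self.div_depth -= 1


def find_image_urls(html, base_url):
    """Return absolute URLs for every fileThumb link found inside a post."""
    finder = ImageLinkFinder()
    finder.feed(html)
    # urljoin fills in the scheme for "//host/path" style links.
    return [urljoin(base_url, href) for href in finder.hrefs]


image_urls = find_image_urls(SAMPLE_THREAD, "https://boards.example.invalid/b/")
print(image_urls)
```

In a real run, each URL in that list would then be fetched (for example with urlopen()) and written to disk.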

Shitty technical writings aren’t my thing apparently

