The 4chan parser - #2

The project proved to be easier than expected.

I started by looking at the HTML “make-up” of 4chan’s /b/ board. I noticed that each thread on the front page had a CSS class called “thread”, so I looped over those with BeautifulSoup’s .find_all() function.

That found each of the threads.
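
Roughly, that step could look like the sketch below. It is not my exact code: the sample markup and the find_threads() name are made up for illustration, and it assumes the threads are div elements. Only the “thread” class and .find_all() come from what I described above.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the board's front page; the real markup has far more in it.
SAMPLE_FRONTPAGE = """
<div class="board">
  <div class="thread" id="t111">first thread</div>
  <div class="thread" id="t222">second thread</div>
</div>
"""

def find_threads(html):
    """Return every div whose class list contains "thread"."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with the underscore) is how BeautifulSoup filters on the CSS class,
    # because "class" is a reserved word in Python.
    return soup.find_all("div", class_="thread")

if __name__ == "__main__":
    for thread in find_threads(SAMPLE_FRONTPAGE):
        print(thread["id"])
```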

After that, I extracted the ID from each thread, along with a link to the thread.
Then I opened that link using urlopen().
On the thread page, I iterated over all divs with the class postContainer.
Inside each of those, I found all anchors with the class fileThumb, got the “href” from each one, downloaded the files, and bam. Done!
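
Here is a minimal sketch of those steps, again not my exact code. The function names are hypothetical, and a few things are assumptions: that the thread ID sits in the element’s id attribute, that the first anchor inside a thread links to it, and that urljoin() is needed to turn relative hrefs into full URLs. The class names postContainer and fileThumb, and urlopen(), are the ones mentioned above.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup

def thread_id_and_url(thread, board_url):
    """Return (id, absolute URL) for one thread element from the front page."""
    # Assumption: the ID is the element's id attribute and the first anchor
    # inside the thread points at the thread page.
    link = thread.find("a", href=True)
    return thread["id"], urljoin(board_url, link["href"])

def image_urls(thread_html, page_url):
    """Collect the href of every fileThumb anchor inside a postContainer div."""
    soup = BeautifulSoup(thread_html, "html.parser")
    urls = []
    for post in soup.find_all("div", class_="postContainer"):
        for anchor in post.find_all("a", class_="fileThumb", href=True):
            # urljoin also copes with relative and protocol-relative hrefs.
            urls.append(urljoin(page_url, anchor["href"]))
    return urls

def download(url, filename):
    """Fetch url and write the raw bytes to filename."""
    with urlopen(url) as response, open(filename, "wb") as out:
        out.write(response.read())
```

Feeding the HTML of one thread page into image_urls() gives a list of links, and calling download() on each of them saves the files.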

Shitty technical writings aren’t my thing, apparently.

