The 4chan parser - #2

The project proved to be easier than expected.

I started by looking at the HTML markup of 4chan’s /b/ board and noticed that every thread on the front page had the CSS class “thread”. So I looped over those with BeautifulSoup’s .find_all() function.

That found each of the threads.
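A minimal sketch of that first step. It assumes the front page HTML has already been fetched as a string and parses it with Python's built-in "html.parser" backend. The function name find_threads is hypothetical and not from the original scraper.

```python
from bs4 import BeautifulSoup


def find_threads(html):
    """Return every element on the page whose CSS classes include "thread"."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ matches when "thread" is any one of the element's CSS classes,
    # so <div class="thread"> and <div class="thread sticky"> are both found
    return soup.find_all(class_="thread")
```

Passing the HTML in as a string keeps the parsing separate from the download, so it can be tried offline against a saved copy of the page.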

After that, I extracted the ID from each thread and a link to it.
Then I opened that link using urlopen().
I iterated over all divs with the class postContainer.
Inside those, I found all anchors with the class fileThumb, took the “href” from each one, downloaded the files and bam. Done!
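The remaining steps could be sketched roughly like this. The sketch rests on several assumptions that are not in the original post: thread elements have an id of the form "t<number>", thread pages live at BOARD_URL + "thread/<id>", fileThumb links may be protocol-relative ("//..."), and a browser-like User-Agent header is sent because some sites may reject urllib's default one. All function names are hypothetical, and live markup should be checked before relying on any of this.

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumption: /b/ thread pages live under this URL as "thread/<id>"
BOARD_URL = "https://boards.4chan.org/b/"
HEADERS = {"User-Agent": "Mozilla/5.0"}  # assumed browser-like header


def thread_id_and_link(thread):
    """Return (id, url) for a thread element found on the front page."""
    # Assumption: the element id is "t" followed by the numeric thread ID
    thread_id = thread["id"].lstrip("t")
    return thread_id, urljoin(BOARD_URL, f"thread/{thread_id}")


def fetch_html(url):
    """Open a page with urlopen() and decode its body as UTF-8."""
    with urlopen(Request(url, headers=HEADERS), timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def image_links(thread_html, page_url):
    """Collect file URLs from the fileThumb anchors in every postContainer div."""
    soup = BeautifulSoup(thread_html, "html.parser")
    links = []
    for post in soup.find_all("div", class_="postContainer"):
        # href=True skips anchors without an href attribute
        for a in post.find_all("a", class_="fileThumb", href=True):
            # Resolve relative or protocol-relative hrefs against the page URL
            links.append(urljoin(page_url, a["href"]))
    return links


def download(url, dest_dir="."):
    """Save one file into dest_dir under the last segment of its URL path."""
    path = os.path.join(dest_dir, os.path.basename(urlparse(url).path))
    with urlopen(Request(url, headers=HEADERS), timeout=30) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path


def scrape(threads, dest_dir="."):
    """Chain the steps for each thread element from the front page."""
    for thread in threads:
        _, link = thread_id_and_link(thread)
        for url in image_links(fetch_html(link), link):
            download(url, dest_dir)
```

Keeping the parsing (image_links) apart from the network calls (fetch_html, download) means the HTML handling can be checked against a saved thread page without hitting the site.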

Apparently, shitty technical writing isn’t my thing.

