The 4chan parser - #2

The project proved to be easier than expected.

I started by looking at the HTML “make-up” of 4chan’s /b/ board and noticed that each thread on the front page has a CSS class called “thread”. So I looped over those with BeautifulSoup’s .find_all() function.

That found each of the threads.
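
Roughly, that step looks like the sketch below. It is not my exact code: SAMPLE_HTML is a made-up, simplified stand-in for the board's front page, find_threads() is a name I picked for illustration, and I'm assuming the threads are div elements with an id attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the board's front page markup.
SAMPLE_HTML = """
<div class="board">
  <div class="thread" id="t123"><div class="postContainer">first</div></div>
  <div class="thread" id="t456"><div class="postContainer">second</div></div>
  <div class="sideArticle">not a thread</div>
</div>
"""


def find_threads(html):
    """Return every div on the page that carries the CSS class "thread"."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with the underscore) is how BeautifulSoup filters by CSS class.
    return soup.find_all("div", class_="thread")


threads = find_threads(SAMPLE_HTML)
# Assumption: each thread element exposes its ID through the id attribute.
thread_ids = [thread.get("id") for thread in threads]
```

On the real board you would pass in the HTML of the front page instead of SAMPLE_HTML.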

After that, I extracted the ID from each thread and a link to the thread.
Then I opened that link using urlopen().
I iterated over all divs with the class postContainer.
Inside each of those, I found all anchors that had fileThumb as their class, got the “href” from each link, downloaded the files they pointed to, and bam. Done!
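
The per-thread part can be sketched like this. Again, this is an illustration under assumptions rather than my exact script: THREAD_URL, THREAD_HTML and the example.org hosts are invented, extract_image_urls() and download() are hypothetical helper names, and I'm assuming the hrefs may be protocol-relative, which is why they go through urljoin().

```python
import os
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Hypothetical thread URL and markup, for illustration only.
THREAD_URL = "https://example.org/b/thread/123"
THREAD_HTML = """
<div class="thread" id="t123">
  <div class="postContainer"><a class="fileThumb" href="//img.example.org/b/1.jpg"></a></div>
  <div class="postContainer"><p>text only, no file</p></div>
  <div class="postContainer"><a class="fileThumb" href="//img.example.org/b/2.png"></a></div>
</div>
"""


def extract_image_urls(html, base_url):
    """Collect the href of every fileThumb anchor inside each postContainer."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for post in soup.find_all("div", class_="postContainer"):
        for anchor in post.find_all("a", class_="fileThumb"):
            href = anchor.get("href")
            if href:
                # urljoin turns relative or protocol-relative links into full URLs.
                urls.append(urljoin(base_url, href))
    return urls


def download(url, target_dir="."):
    """Save the file behind url into target_dir and return the local path."""
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(target_dir, filename)
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


image_urls = extract_image_urls(THREAD_HTML, THREAD_URL)
```

For a live thread, the HTML would come from urlopen(thread_url).read() instead of THREAD_HTML, and each entry in image_urls would then be passed to download().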

Shitty technical writing isn’t my thing, apparently.

 
