The 4chan parser - #2

The project proved to be easier than expected.

I started by looking at the HTML “make-up” of 4chan’s /b/ board. I noticed that each thread on the front page had a CSS class called “thread”, so I looped over those with BeautifulSoup’s .find_all() function.

That found each of the threads.
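
Roughly, that step could look like the sketch below. It assumes the bs4 package is installed and runs against a tiny made-up HTML snippet instead of the live board, so the ids, the surrounding markup and the helper name find_threads are all hypothetical. The real page is a lot messier.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the board's front page (assumption, not real 4chan markup).
SAMPLE_BOARD_HTML = """
<div class="board">
  <div class="thread" id="t123"><div class="postContainer">first</div></div>
  <div class="thread" id="t456"><div class="postContainer">second</div></div>
  <div class="sidebar">not a thread</div>
</div>
"""


def find_threads(html):
    """Return every element that carries the CSS class "thread"."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(class_="thread")


threads = find_threads(SAMPLE_BOARD_HTML)
# The id attribute format ("t123") is invented for this example.
thread_ids = [thread.get("id") for thread in threads]
print(thread_ids)
```

Passing class_="thread" (with the trailing underscore) is how BeautifulSoup filters on a CSS class, since class is a reserved word in Python.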

After that, I extracted the ID from the thread and a link to the thread.
Then I opened that link using urlopen().
I iterated over all divs with the class postContainer.
Inside those, I found all anchors with the class fileThumb, got the “href” from each of them, downloaded the files, and bam. Done!
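
The steps above can be sketched like this. It is not my exact script: it assumes Python 3 (so urlopen comes from urllib.request) and the bs4 package, and the helper names image_links and download, the example URLs and the sample HTML are all made up for illustration. Only download() touches the network, and it is not called here.

```python
import os
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def image_links(thread_html, base_url):
    """Collect the href of every fileThumb anchor inside a postContainer div."""
    soup = BeautifulSoup(thread_html, "html.parser")
    links = []
    for post in soup.find_all("div", class_="postContainer"):
        for anchor in post.find_all("a", class_="fileThumb"):
            href = anchor.get("href")
            if href:
                # urljoin also resolves protocol-relative links ("//host/x.jpg").
                links.append(urljoin(base_url, href))
    return links


def download(url, directory="."):
    """Fetch one file with urlopen() and write it to disk."""
    filename = os.path.join(directory, os.path.basename(url))
    with urlopen(url) as response, open(filename, "wb") as out:
        out.write(response.read())
    return filename


# Made-up thread page (assumption): two posts with images, one without,
# and one fileThumb anchor that sits outside any postContainer.
SAMPLE_THREAD_HTML = """
<div class="postContainer"><a class="fileThumb" href="//img.example/a.jpg">x</a></div>
<div class="postContainer"><p>no image here</p></div>
<div class="postContainer"><a class="fileThumb" href="/b.png">y</a></div>
<a class="fileThumb" href="/outside.gif">not inside a post</a>
"""

links = image_links(SAMPLE_THREAD_HTML, "https://boards.example/b/thread/123")
print(links)
```

To actually save the files, loop over links and call download(link) on each one.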

Shitty technical writing isn’t my thing, apparently.

 