The 4chan parser - #1

So, I’m interested in parsing 4chan.org’s /b/ board and scanning it for keywords.
I will document my progress on this page.
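The keyword-scanning step could look something like the minimal sketch below. It assumes the post bodies have already been extracted as plain strings. The function name `find_keywords` is hypothetical. Matching is case-insensitive and limited to whole words, which suits plain-word keywords but not keywords that begin or end with punctuation.

```python
import re
from typing import Dict, Iterable, List


def find_keywords(posts: Iterable[str], keywords: Iterable[str]) -> Dict[str, List[int]]:
    """Map each keyword to the indices of the posts that mention it.

    Hypothetical helper: matching is case-insensitive and respects word
    boundaries, so "cat" matches "Cat!" but not "Concatenate".
    """
    # One pre-compiled pattern per keyword; re.escape makes characters
    # such as "." match literally instead of acting as regex syntax.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits: Dict[str, List[int]] = {kw: [] for kw in patterns}
    for index, text in enumerate(posts):
        for kw, pattern in patterns.items():
            if pattern.search(text):
                hits[kw].append(index)
    return hits


posts = ["Anyone here play chess?", "Concatenate these strings", "CHESS is life"]
print(find_keywords(posts, ["chess", "cat"]))  # {'chess': [0, 2], 'cat': []}
```

Returning post indices rather than copies of the text keeps the result small and lets the caller look up whatever else it stored about each post.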

I have decided to use Python, as I would like to expand my knowledge of different programming languages.

As my HTML parser, I have chosen Beautiful Soup.

So far, I have the HTML capture working and can parse the “top” of each post. Onwards!
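A minimal sketch of these two steps, fetching the board and pulling out the top of each post with Beautiful Soup, might look like this. The board URL, the helper names `fetch_html` and `parse_post_tops`, and the CSS class names (`postInfo`, `name`, `dateTime`, `postNum`) are all assumptions about 4chan’s markup, not a documented API, so check them against the live HTML first. Beautiful Soup is installed with `pip install beautifulsoup4`.

```python
from typing import Dict, List
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Assumed front-page URL for /b/; it may change over time.
BOARD_URL = "https://boards.4chan.org/b/"


def fetch_html(url: str = BOARD_URL) -> str:
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_post_tops(html: str) -> List[Dict[str, str]]:
    """Extract the "top" (header) of each post.

    The class names below are assumptions about 4chan's markup.
    Missing fields come back as empty strings instead of raising.
    """
    soup = BeautifulSoup(html, "html.parser")
    tops = []
    for info in soup.select("div.postInfo"):
        fields = {}
        for key, css in (
            ("name", "span.name"),
            ("date", "span.dateTime"),
            ("number", "span.postNum"),
        ):
            tag = info.select_one(css)
            fields[key] = tag.get_text(strip=True) if tag else ""
        tops.append(fields)
    return tops


if __name__ == "__main__":
    for top in parse_post_tops(fetch_html()):
        print(top)
```

Keeping the download and the parsing in separate functions means the parser can be tried on a saved HTML file without hitting the site on every run.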
