— Paul Gregg (@pgregg) September 8, 2012
Someone sent me a message on Twitter pointing me to the URL menshn.com/data/chat.php (which shall remain unclickable for reasons that will become apparent). This web page dumps the last 20-30k “menshns” in a semi-structured HTML format. In total (at the time of writing) it dumps 31MB of data, so you can see why I’m not making it a link: I’ve no desire to overload their systems.
Looking at “View source” on the menshn.com homepage, it seems that they use this script as the back end for the automatically updating feed on that page.
If you watch the traffic generated by your browser, you can see it making a request every 4 seconds for https://menshn.com/data/chat.php?roomid=*&lastid=73405
So now we know where my source got the link from: it seems that if you don’t supply any arguments, the script just dumps everything it has. With such a dataset we can produce some metrics.
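To make the difference between the two kinds of request concrete, here is a minimal Python sketch that only builds the URLs and never contacts the server. The parameter names roomid and lastid come from the request seen in the browser; the function name feed_url is made up for illustration, and the idea that lastid means "only send menshns newer than this ID" is my assumption from watching the traffic, not something the site documents.

```python
from urllib.parse import urlencode

# Endpoint observed in the browser traffic.
BASE_URL = "https://menshn.com/data/chat.php"


def feed_url(room_id="*", last_id=None):
    """Build the polling URL the homepage appears to use.

    Assumption: lastid asks only for menshns newer than that ID.
    """
    params = {"roomid": room_id}
    if last_id is not None:
        params["lastid"] = last_id
    # safe="*" keeps the wildcard room readable instead of %2A.
    return BASE_URL + "?" + urlencode(params, safe="*")


# The incremental request the browser makes every few seconds:
print(feed_url(last_id=73405))
# The request with no arguments at all is simply BASE_URL,
# which is the one that returns the whole dump.
print(BASE_URL)
```

Nothing here fetches anything, which is deliberate given the size of the full dump.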
First up, I parsed all the data out to produce a simple ID,Room,Name,Message text file – just to prove to myself that I had understood the data set and was parsing it correctly.
Next, I built metric gathering into the parser: count the unique users, the number of posts/menshns, the number of rooms/topics, and so on.
From this I have the top line information:
Number of active users: 218
Number of active rooms: 224
Breaking this down further into “Top 20” lists, I get:

20 Most prolific users:
5752 janemcqueen
3240 CosensV
2019 Chriss
2011 BlackAdder
1569 PoliticsBlogorguk
1520 Xlibris
1106 DavidX
783 JOSHBHJ
782 Louise
717 EdenFisher
704 JayMcNeil
666 Grist
588 TinderWall
401 RV
384 Bozier
373 jeanprytyskacz
348 MikeARPowell
285 Silaz
251 Rabbs
239 Europe

20 Busiest rooms:
6361 //ukpolitics
3216 //gaymarriage
1252 //religion
1014 //assangecase
877 //olympics2012
717 //judaism
673 //uselection
663 //atheism
642 //mormonism
585 //davidcameron
527 //civilliberty
479 //reshuffle
474 //mittromney
415 //corbyelectio
394 //capitalism
315 //twitter
295 //falklands
224 //louisemensch
208 //philosophy
204 //catholicism
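For anyone who wants to reproduce this kind of tally from the parsed ID,Room,Name,Message text file, the counting can be sketched roughly as below. This is not the parser I actually ran, just a minimal Python illustration; the function name summarise and the three sample lines are made up, and it assumes that only the message field may contain commas.

```python
from collections import Counter


def summarise(lines, top_n=20):
    """Tally posts per user and per room from ID,Room,Name,Message lines."""
    users = Counter()
    rooms = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Split on the first three commas only, so commas inside the
        # message text do not break the record apart.
        parts = line.split(",", 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        _id, room, name, _message = parts
        users[name] += 1
        rooms[room] += 1
    return {
        "posts": sum(users.values()),
        "active_users": len(users),
        "active_rooms": len(rooms),
        "top_users": users.most_common(top_n),
        "top_rooms": rooms.most_common(top_n),
    }


# Invented sample records, not taken from the real dump.
sample = [
    "1,//exampleroom,alice,hello, world",
    "2,//exampleroom,bob,hi",
    "3,//otherroom,alice,second post",
]

report = summarise(sample)
print("Number of active users:", report["active_users"])
print("Number of active rooms:", report["active_rooms"])
for name, count in report["top_users"]:
    print(count, name)
```

Passing an open file instead of the sample list works the same way, since the function only iterates over lines.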
Growth metrics are easily obtained by performing the same test at different times. In my case, the two runs were 3.5 days apart, leading to the conclusion posted on Twitter:
— Paul Gregg (@pgregg) September 11, 2012
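The growth arithmetic itself is just the difference between two snapshots divided by the time between them. A small Python sketch, where the function name growth_per_day is hypothetical and the counts in the example are placeholders rather than figures from the real dump:

```python
def growth_per_day(count_before, count_after, days_apart):
    """Average daily change in a metric between two snapshots."""
    if days_apart <= 0:
        raise ValueError("days_apart must be positive")
    return (count_after - count_before) / days_apart


# Placeholder counts only; 3.5 days is the gap between my two runs.
print(growth_per_day(100, 135, 3.5))  # 10.0
```

The same function works for users, rooms or menshns, as long as both snapshots were counted the same way.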
If you really want to see all the menshns, then rather than overloading the menshn server you can obtain my parsed analysis of the dump at http://pgregg.com/test/menshn/menshnchat.txt
I’d welcome comments on this. For the record – none of this information was obtained via a “hack” and no illegal acts were committed in the gathering of this information.