Paul Gregg

Jack of all Tech.

Menshn stats and where they came from.

Written By: pgregg - Sep 11, 2012

If you have been following my Twitter feed, you may have noticed that I have been posting some Menshn statistics recently. You may also be wondering how I came by these numbers.

 

Someone sent me a message on Twitter pointing me to the URL menshn.com/data/chat.php (which shall remain unclickable for reasons that will become apparent). This page dumps the last 20-30k “menshns” in a semi-structured HTML format. In total (at time of writing) it returns 31MB of data, so you can see why I’m not making it a link: I’ve no desire to overload their systems.

Looking at “View source” on the menshn.com homepage, it seems they use this script as the back end for the automatically updating feed on their homepage.

If you watch the traffic generated by your browser, you can see it requesting https://menshn.com/data/chat.php?roomid=*&lastid=73405 every 4 seconds.

So now we know where my source got the link from. It seems that if you don’t supply any arguments, the script just dumps everything it has. With such a dataset we can compute some metrics.

First up, I parsed all the data out to produce a simple ID,Room,Name,Message text file – just to prove to myself that I had understood the data set and was parsing it correctly.
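For illustration only, here is a minimal Python sketch of that output step. It assumes the menshns have already been parsed into (id, room, name, message) tuples; the `write_records` helper and the sample records are hypothetical and not taken from the real dump, and I make no claim about the actual HTML layout. Using the `csv` module means a message containing a comma gets quoted rather than breaking the four-column layout.

```python
import csv
import io


def write_records(records, out):
    """Write (id, room, name, message) tuples to `out`, one CSV line each."""
    writer = csv.writer(out, lineterminator="\n")
    for record in records:
        writer.writerow(record)


# Hypothetical sample records, invented purely for illustration.
sample = [
    (1, "//exampleroom", "alice", "Hello, world"),
    (2, "//exampleroom", "bob", "Plain message"),
]

buffer = io.StringIO()
write_records(sample, buffer)
print(buffer.getvalue(), end="")
```

In a real run, `out` would be an open text file rather than an in-memory buffer.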

Next, I built metric gathering into the parser: count the unique users, the number of posts/menshns, the number of rooms/topics, and so on.
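The counting itself can be sketched in a few lines of Python. This is an assumed reconstruction rather than the parser I actually ran: `build_metrics` is a hypothetical helper, the records are assumed to be (id, room, name, message) tuples, and the sample data is made up.

```python
from collections import Counter


def build_metrics(records, top_n=20):
    """Tally posts per user and per room from (id, room, name, message) tuples."""
    users = Counter()
    rooms = Counter()
    for _id, room, name, _message in records:
        users[name] += 1
        rooms[room] += 1
    return {
        "active_users": len(users),          # distinct user names seen
        "active_rooms": len(rooms),          # distinct rooms seen
        "total_posts": sum(users.values()),  # every record is one menshn
        "top_users": users.most_common(top_n),
        "top_rooms": rooms.most_common(top_n),
    }


# Hypothetical sample records, invented purely for illustration.
sample = [
    (1, "//roomone", "alice", "first"),
    (2, "//roomone", "bob", "second"),
    (3, "//roomtwo", "alice", "third"),
]

metrics = build_metrics(sample, top_n=2)
print(metrics["active_users"], metrics["active_rooms"])
for count, name in ((c, n) for n, c in metrics["top_users"]):
    print(f" {count} {name}")
```

A "user" here is simply anyone who appears at least once in the dump, which is one reasonable reading of "active".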

From this I have the top line information: 

Number of active users: 218
Number of active rooms: 224

Breaking this down further to “Top 20” lists, I get:

20 Most prolific users:
 5752 janemcqueen
 3240 CosensV
 2019 Chriss
 2011 BlackAdder
 1569 PoliticsBlogorguk
 1520 Xlibris
 1106 DavidX
 783 JOSHBHJ
 782 Louise
 717 EdenFisher
 704 JayMcNeil
 666 Grist
 588 TinderWall
 401 RV
 384 Bozier
 373 jeanprytyskacz
 348 MikeARPowell
 285 Silaz
 251 Rabbs
 239 Europe

And

20 Busiest rooms:
 6361 //ukpolitics
 3216 //gaymarriage
 1252 //religion
 1014 //assangecase
 877 //olympics2012
 717 //judaism
 673 //uselection
 663 //atheism
 642 //mormonism
 585 //davidcameron
 527 //civilliberty
 479 //reshuffle
 474 //mittromney
 415 //corbyelectio
 394 //capitalism
 315 //twitter
 295 //falklands
 224 //louisemensch
 208 //philosophy
 204 //catholicism

Growth metrics are easily obtained by running the same analysis at different times. In my case, the two runs were 3.5 days apart, which led to the conclusion I posted on Twitter.
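As a rough sketch of that comparison in Python: take the same count from two runs and divide the difference by the time between them. The `growth` helper is hypothetical and the numbers in the example are invented for illustration; they are not my measured figures.

```python
def growth(before, after, days_apart):
    """Compare one metric (e.g. active users) across two runs of the analysis."""
    change = after - before
    return {
        "change": change,
        "per_day": change / days_apart,
        # Percentage growth is undefined when the earlier count is zero.
        "percent": 100.0 * change / before if before else None,
    }


# Invented example values: 200 users in the first run, 220 in the second,
# with the two runs taken 4 days apart.
result = growth(200, 220, 4.0)
print(result)
```

Note that because the dump only holds the most recent menshns, counts from two runs are only comparable if the older run's data has not scrolled out of the window.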

 

If you really want to see all the menshns, please don't overload the menshn server; you can instead download my parsed output of the dump at http://pgregg.com/test/menshn/menshnchat.txt

I’d welcome comments on this. For the record – none of this information was obtained via a “hack” and no illegal acts were committed in the gathering of this information.

 

 
