Paul Gregg

Jack of all Tech.

Code release: preg_find() – A recursive file listing tool for PHP

Written By: pgregg - Apr• 18•2007

Version 2.1

I originally wrote this a few years ago and never really promoted it beyond the realms of the #php IRC channel on EfNet.  However, it has managed to find its way into applications such as WordPress and many other PHP apps.  It is gratifying to know that others are finding it useful.

So what is preg_find() anyway? A short summary for those who have never encountered it: imagine a recursive-capable glob() that can filter its results with a regex (PCRE) and accepts various arguments that modify the results or bring back additional data.
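To make the idea concrete, here is a minimal sketch of "recursive listing plus regex filter" built on PHP's SPL iterators. It is not the preg_find() source: the function name demo_regex_find() and its boolean $recursive parameter are made up for this illustration, and it only matches against the file name.

```php
<?php
// Hypothetical helper, NOT the real preg_find(): walk a directory and keep
// the files whose base name matches a PCRE pattern.
function demo_regex_find(string $pattern, string $dir, bool $recursive = false): array
{
    if ($recursive) {
        // Descend into sub-directories; only leaf entries are yielded.
        $iter = new RecursiveIteratorIterator(
            new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
        );
    } else {
        $iter = new FilesystemIterator($dir, FilesystemIterator::SKIP_DOTS);
    }

    $matches = [];
    foreach ($iter as $entry) {
        // $entry is an SplFileInfo; test the file name only, not the path.
        if ($entry->isFile() && preg_match($pattern, $entry->getFilename())) {
            $matches[] = $entry->getPathname();
        }
    }
    sort($matches); // iteration order depends on the filesystem, so normalise it
    return $matches;
}
```

Calling demo_regex_find('/\.php$/D', '../code', true) would return every .php file below ../code, which is roughly the behaviour the examples below get from preg_find() itself.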

Well, today I thought I would add one commonly requested feature: sorting.  Using the power of PHP’s anonymous (lambda-style) functions, preg_find() now creates a custom sort routine based on the arguments passed in. Results can be sorted by filename, dir+filename, last modified time, file size or disk usage (yes, those last two are different), in either ascending or descending order.
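As a rough illustration of building a sort routine from the arguments passed in, the sketch below returns a comparison closure chosen by flag bits. The DEMO_* constants, their values, and demo_make_sorter() are assumptions made up for this example; the released code may well build its comparator differently.

```php
<?php
// Hypothetical flag values for this sketch only; the real constants are
// defined in preg_find.php and are not reproduced here.
const DEMO_SORT_NAME = 1;
const DEMO_SORT_SIZE = 2;
const DEMO_SORT_DESC = 4;

// Build a comparator for usort() from a combination of the flags above.
function demo_make_sorter(int $flags): callable
{
    $direction = ($flags & DEMO_SORT_DESC) ? -1 : 1;        // flip for descending
    $field     = ($flags & DEMO_SORT_SIZE) ? 'size' : 'name';

    return function (array $a, array $b) use ($field, $direction): int {
        return $direction * ($a[$field] <=> $b[$field]);
    };
}

$files = [
    ['name' => 'b.php', 'size' => 300],
    ['name' => 'a.php', 'size' => 100],
    ['name' => 'c.php', 'size' => 200],
];
usort($files, demo_make_sorter(DEMO_SORT_SIZE | DEMO_SORT_DESC));
// $files is now ordered b.php (300), c.php (200), a.php (100)
```

The point of the design is that the caller never writes a comparison function: the flags select the field and the direction, and the closure captures both.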

Download preg_find.phps
Download preg_find.php in plain text format

A simple example to get started – we’ll work on my PHP miscellaneous code directory:

Example 1: List the files (no directories):

Code:

include 'preg_find.php';
$files = preg_find('/./', '../code');
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Now let us look at a recursive search – this is easy, just pass in the PREG_FIND_RECURSIVE argument.
Example 2: List the files, recursively:


Code:

$files = preg_find('/./', '../code', PREG_FIND_RECURSIVE);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Let’s go further: this time we don’t want to see any files, only the directory structure.
Example 3: List the directory tree:


Code:

$files = preg_find('/./', '../code', PREG_FIND_DIRONLY|PREG_FIND_RECURSIVE);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

It should be obvious by now that we are using constants as our modifier arguments. What might not be immediately obvious is that these constants are “bit” values (e.g. 1, 2, 4, 8, 16, …, 1024, etc.), so PHP’s bitwise OR operator “|” lets us combine several modifiers into a single argument.
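For anyone new to bit flags, this small sketch shows the mechanics: each constant occupies its own bit, "|" merges them into one integer, and "&" tests whether a given bit is set. The DEMO_* names and values are invented for the illustration and are not the values used by preg_find.php.

```php
<?php
// Hypothetical flag values, chosen only for illustration.
const DEMO_RECURSIVE = 1; // binary 001
const DEMO_DIRONLY   = 2; // binary 010
const DEMO_SORTDESC  = 4; // binary 100

// Report which of the demo flags are present in a combined value.
function demo_describe_flags(int $flags): array
{
    $names = [];
    if ($flags & DEMO_RECURSIVE) { $names[] = 'recursive'; }
    if ($flags & DEMO_DIRONLY)   { $names[] = 'dironly'; }
    if ($flags & DEMO_SORTDESC)  { $names[] = 'sortdesc'; }
    return $names;
}

$combined = DEMO_DIRONLY | DEMO_RECURSIVE; // 010 | 001 = 011, i.e. 3
print_r(demo_describe_flags($combined));   // lists 'recursive' and 'dironly'
```

Because every flag is a distinct power of two, the order in which they are OR-ed together makes no difference.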

How about a regex? Files starting with str_ and ending in .php
Example 4: Using a regex on the same code as example 1:


Code:

$files = preg_find('/^str_.*?\.php$/D', '../code');
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

What about that funky PREG_FIND_RETURNASSOC modifier?
This will change the output dramatically from a simple file/directory array to an associative array where the key is the filename, and the value is lots of information about that file.
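The shape of such an associative result can be sketched with PHP's stat() function. The helper below is an assumption for illustration only: demo_file_info() and the 'allocated' key are made up, and the real function returns more than this. Only the 'stat' key with its 'size' and 'mtime' entries mirrors what the later examples read.

```php
<?php
// Hypothetical helper for illustration; not the real preg_find() internals.
function demo_file_info(array $paths): array
{
    $info = [];
    foreach ($paths as $path) {
        $stat = @stat($path);   // false if the path cannot be read
        if ($stat === false) {
            continue;
        }
        $info[$path] = [
            'stat'      => $stat, // includes 'size' and 'mtime'
            // Bytes allocated on disk: 512-byte blocks, or -1 where the
            // platform does not report them (this key is made up here).
            'allocated' => $stat['blocks'] >= 0 ? $stat['blocks'] * 512 : null,
        ];
    }
    return $info;
}
```

The 'allocated' value also hints at why file size and disk usage sort differently: the first is the byte length of the content, the second is the space the filesystem has set aside for it.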

Example 5: Use of PREG_FIND_RETURNASSOC


Code:

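$files = preg_find('/^str_.*?\.php$/D', '../code', PREG_FIND_RETURNASSOC);
// Each value is now an array of details, so iterate as filename => details.
foreach($files as $file => $stats) printf("<br>%s - %d bytes\n", $file, $stats['stat']['size']);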


You can see the result here

As I mentioned earlier, I added sorting capability to the results, so let us look at some examples of that.

Example 6. Sorting the results (of example 1)


Code:

$files = preg_find('/./', '../code', PREG_FIND_SORTKEYS);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Example 7. And reverse sort.


Code:

$files = preg_find('/./', '../code', PREG_FIND_SORTKEYS|PREG_FIND_SORTDESC);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

OK, that’s all well and good, but what about something more interesting?

Example 8. Finding the largest 5 files in the tree, sorted by filesize, descending.


Code:

$files = preg_find('/./', '../code',
  PREG_FIND_RECURSIVE|PREG_FIND_RETURNASSOC|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDESC);
$i=1;
foreach($files as $file => $stats) {
  printf('<br>%d) %d %s', $i, $stats['stat']['size'], $file);
  $i++;
  if ($i > 5) break;
}


You can see the result here.

Or what about the 10 most recently modified files?

Example 9.


Code:

$files = preg_find('/./', '../code',
  PREG_FIND_RECURSIVE|PREG_FIND_RETURNASSOC|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTDESC);
$i=1;
foreach($files as $file => $stats) {
  printf('<br>%d) %s - %d bytes - %s', $i,
    date('Y-m-d H:i:s', $stats['stat']['mtime']), $stats['stat']['size'], $file);
  $i++;
  if ($i > 10) break;
}


You can see the result here.

I am keen to receive feedback on what you think of this function.   If you have used it in some other application – great, I would love to know.  Suggestions, improvements, criticisms are also always welcome.

Release: vmclone.pl for VMware ESX Server

Written By: pgregg - Mar• 28•2007

I have released a script, vmclone.pl, to assist in the cloning of full Virtual Machines within an ESX Server box.  This came about because of a gap in functionality between replicating individual hard disks and the clone option in the VI client that was mostly missing from VMs.

The tool will replicate and rename all the files in a VM with a single command line execution and optionally allows you to tweak (using regex) some of the options such as changing the memory size of a VM.

The tool is available here: http://www.pgregg.com/projects/vmclone/

I would appreciate any feedback or suggestions on it.

Thanks.

Airline Security and Personal Hygiene

Written By: pgregg - Sep• 27•2006

I have just returned from a week in California and the security on flights is pretty strict – no fluids, gases, liquids of any kind.   So I had 18 hours of travel time from Belfast->London->Los Angeles->San Jose and a further 18 hours coming back.

All very well, until you realise that if you are on such long flights with connections you can get pretty sweaty, and you can’t take any deodorant with you.    I would like to apologise to the poor girl that sat next to me for 11 hours on the LA->London leg.

Do you live in NI and cannot yet get broadband?

Written By: pgregg - Jul• 07•2006

If so, I want to hear from you.

I believe that the DETI NI has fudged the contract with BT and let them away with making up the figures for the rest of broadband by allowing Satellite technology.  I believe this is against both the spirit and the letter of the contract.

We need to band together in order to raise a loud enough voice and force our Government to listen and, with luck, ensure that true broadband to the letter of the contract is delivered to enable every home and business in Northern Ireland to get broadband if they so wish.

Please reply to this post with your story, or email me directly via pgregg @ pgregg.com. I am particularly interested in postcodes of people who have been denied broadband.  Also, if you have an actual letter from BT – please scan it and send it to me.

Thanks,

Paul

thetopsites.net stealing PageRank

Written By: pgregg - Jul• 05•2006

In my earlier post today I mentioned the site thetopsites.net.  They are offering a snippet of code to display your (Google) PageRank on your webpage.

All very well until you look at the code provided:


Code:

<a href="http://pagerank.thetopsites.net/" title="Free PageRank
Meter for www.mysite.com" target="_blank"><img
src="http://pagerank.thetopsites.net/r.php?url=www.mysite.com" 
border="0" alt="Free PageRank Meter for www.mysite.com" /></a>


and the warning "You should not change in any way the above code(except the url of your site) or you will be disquallified from this free service".

Then you notice that they don’t use the now-standard rel="nofollow" attribute in the href or img src tags.  Put two and two together and you realise that their Free PageRank monitor is actually donating some of your precious PageRank to them (because that is how PageRank works).

Clever? Yes.  Underhanded? Certainly.

BWDOW.COM referrer link spammer

Written By: pgregg - Jul• 05•2006

I’ve noticed a small number of referrers claiming a link came from http://www.bwdow.com/newsites.php?category=newsites; however, if you go there you won’t find any link to your page.  I put it down to yet another referral link spammer.  Usually I just add the IP (or IP range) to my firewall and am done with it – but these guys had many different IPs, which suggested it wasn’t some automated spamming engine.

It was obvious they were not valid click-throughs because they were HEAD requests, e.g.:
www.pgregg.com 81.213.243.127 - - [05/Jul/2006:05:47:34 +0100] "HEAD / HTTP/1.0" 200 - "http://www.bwdow.com/newsites.php?category=newsites" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" "-" "-"
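If you want to check your own logs for the same pattern, a small PHP sketch along these lines can flag HEAD requests that carry a particular referrer. The function name demo_is_head_referrer_hit() is made up, and the regex assumes a log layout like the sample line above (quoted request, status, size, then the quoted referrer), so adjust it to your own format.

```php
<?php
// Hypothetical helper: true when a log line is a HEAD request whose
// referrer field mentions the given host. Assumes the layout of the
// sample line above; other log formats need a different pattern.
function demo_is_head_referrer_hit(string $line, string $referrerHost): bool
{
    // "METHOD path HTTP/x.y" status size "referrer"
    $pattern = '/"([A-Z]+) \S+ HTTP\/[0-9.]+" \d{3} \S+ "([^"]*)"/';
    if (!preg_match($pattern, $line, $m)) {
        return false;
    }
    return $m[1] === 'HEAD' && strpos($m[2], $referrerHost) !== false;
}

$line = 'www.pgregg.com 81.213.243.127 - - [05/Jul/2006:05:47:34 +0100] '
      . '"HEAD / HTTP/1.0" 200 - '
      . '"http://www.bwdow.com/newsites.php?category=newsites" '
      . '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)" "-" "-"';

var_dump(demo_is_head_referrer_hit($line, 'www.bwdow.com')); // bool(true)
```

A genuine visitor clicking a link would normally show up as a GET, so the combination of HEAD and an external referrer is the suspicious part.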

So, today it all came to a HEAD (pardon the pun) and I went looking for them and it seems that they openly admit to using referral spamming (under the thin disguise of claiming their reviewers must have clicked across to your great site).

See this Google cache of their presently broken forum.

During my searches I also came across another reference to them in a thetopsites.net referrer spammer blacklist and noticed a new form of referral spam thievery which I shall look at in my next post.

Feel free to add the following IPs to your firewall to blacklist these BWDOW jokers.


Code:

plop:pgregg/p3-~apache/logs-431%->fgrep www.bwdow.com access_log \
| cut -d' ' -f2 | sort -n | uniq -c | sort -rn | ip2hostname.php
  33 208.98.1.192    (No-RDNS-Record)
   3 81.213.244.165  (dsl.dynamic81213244165.ttnet.net.tr)
   3 66.90.92.192    (usr1-114.sharktech.net)
   2 85.96.246.224   (dsl.dynamic8596246224.ttnet.net.tr)
   2 85.96.132.72    (dsl.dynamic859613272.ttnet.net.tr)
   2 85.96.132.32    (dsl.dynamic859613232.ttnet.net.tr)
   2 85.96.132.165   (dsl.dynamic8596132165.ttnet.net.tr)
   2 85.106.221.178  (dsl85-106-56754.ttnet.net.tr)
   2 85.106.219.204  (dsl85-106-56268.ttnet.net.tr)
   2 85.101.68.231   (85.101.68.231)
   2 85.100.0.155    (dsl.dynamic851000155.ttnet.net.tr)
   2 81.213.246.190  (dsl.dynamic81213246190.ttnet.net.tr)
   2 81.213.246.16   (dsl.dynamic8121324616.ttnet.net.tr)
   2 81.213.243.127  (dsl.dynamic81213243127.ttnet.net.tr)
   2 81.213.242.70   (dsl.dynamic8121324270.ttnet.net.tr)
   2 81.213.242.57   (dsl.dynamic8121324257.ttnet.net.tr)
   2 81.213.242.38   (dsl.dynamic8121324238.ttnet.net.tr)
   2 81.213.242.169  (dsl.dynamic81213242169.ttnet.net.tr)
   1 88.226.161.218  (dsl88-226-41434.ttnet.net.tr)
   1 85.99.91.55     (dsl.dynamic85999155.ttnet.net.tr)
   1 85.99.91.29     (dsl.dynamic85999129.ttnet.net.tr)
   1 85.99.91.1      (dsl.dynamic8599911.ttnet.net.tr)
   1 85.99.150.70    (dsl.dynamic859915070.ttnet.net.tr)
   1 85.99.150.22    (dsl.dynamic859915022.ttnet.net.tr)
   1 85.97.179.67    (dsl.dynamic859717967.ttnet.net.tr)
   1 85.97.144.139   (dsl.dynamic8597144139.ttnet.net.tr)
   1 85.97.144.10    (dsl.dynamic859714410.ttnet.net.tr)
   1 85.96.76.232    (dsl.dynamic859676232.ttnet.net.tr)
   1 85.96.133.148   (dsl.dynamic8596133148.ttnet.net.tr)
   1 85.96.133.108   (dsl.dynamic8596133108.ttnet.net.tr)
   1 85.96.132.248   (dsl.dynamic8596132248.ttnet.net.tr)
   1 85.96.103.27    (dsl.dynamic859610327.ttnet.net.tr)
   1 85.107.131.9    (dsl85-107-33545.ttnet.net.tr)
   1 85.107.129.212  (dsl85-107-33236.ttnet.net.tr)
   1 85.107.129.131  (dsl85-107-33155.ttnet.net.tr)
   1 85.106.223.192  (dsl85-106-57280.ttnet.net.tr)
   1 85.106.219.5    (dsl85-106-56069.ttnet.net.tr)
   1 85.106.219.153  (dsl85-106-56217.ttnet.net.tr)
   1 85.106.218.237  (dsl85-106-56045.ttnet.net.tr)
   1 85.104.231.205  (dsl85-104-59341.ttnet.net.tr)
   1 85.104.226.241  (dsl85-104-58097.ttnet.net.tr)
   1 85.104.226.18   (dsl85-104-57874.ttnet.net.tr)
   1 85.103.43.3     (85.103.43.3)
   1 85.103.41.84    (85.103.41.84)
   1 85.103.41.133   (85.103.41.133)
   1 85.103.41.106   (85.103.41.106)
   1 85.102.119.30   (dsl85-102-30494.ttnet.net.tr)
   1 85.102.118.150  (dsl85-102-30358.ttnet.net.tr)
   1 85.101.70.220   (85.101.70.220)
   1 85.101.70.17    (85.101.70.17)
   1 85.101.68.243   (85.101.68.243)
   1 85.101.66.97    (85.101.66.97)
   1 85.101.65.113   (85.101.65.113)
   1 85.100.3.186    (dsl.dynamic851003186.ttnet.net.tr)
   1 85.100.202.138  (dsl.dynamic85100202138.ttnet.net.tr)
   1 85.100.200.79   (dsl.dynamic8510020079.ttnet.net.tr)
   1 85.100.2.61     (dsl.dynamic85100261.ttnet.net.tr)
   1 85.100.1.90     (dsl.dynamic85100190.ttnet.net.tr)
   1 81.213.247.227  (dsl.dynamic81213247227.ttnet.net.tr)
   1 81.213.247.2    (dsl.dynamic812132472.ttnet.net.tr)
   1 81.213.246.57   (dsl.dynamic8121324657.ttnet.net.tr)
   1 81.213.246.21   (dsl.dynamic8121324621.ttnet.net.tr)
   1 81.213.246.199  (dsl.dynamic81213246199.ttnet.net.tr)
   1 81.213.246.1    (dsl.dynamic812132461.ttnet.net.tr)
   1 81.213.245.140  (dsl.dynamic81213245140.ttnet.net.tr)
   1 81.213.244.39   (dsl.dynamic8121324439.ttnet.net.tr)
   1 81.213.243.67   (dsl.dynamic8121324367.ttnet.net.tr)
   1 81.213.243.200  (dsl.dynamic81213243200.ttnet.net.tr)
   1 81.213.243.176  (dsl.dynamic81213243176.ttnet.net.tr)
   1 81.213.243.143  (dsl.dynamic81213243143.ttnet.net.tr)
   1 81.213.242.5    (dsl.dynamic812132425.ttnet.net.tr)
   1 81.213.241.171  (dsl.dynamic81213241171.ttnet.net.tr)
   1 81.213.240.90   (dsl.dynamic8121324090.ttnet.net.tr)
   1 81.213.240.68   (dsl.dynamic8121324068.ttnet.net.tr)
   1 81.213.240.25   (dsl.dynamic8121324025.ttnet.net.tr)
   1 81.213.240.155  (dsl.dynamic81213240155.ttnet.net.tr)
   1 81.213.240.153  (dsl.dynamic81213240153.ttnet.net.tr)
   1 81.213.240.150  (dsl.dynamic81213240150.ttnet.net.tr)
   1 81.213.240.149  (dsl.dynamic81213240149.ttnet.net.tr)
   1 81.213.240.117  (dsl.dynamic81213240117.ttnet.net.tr)

I can’t get broadband… :(

Written By: pgregg - Jun• 23•2006

Despite DETINI having paid BT £10 million to ensure that Northern Ireland has 100% broadband coverage and an announcement to say that it has been achieved, I am back on dial-up Internet access.

Last week, my neighbour who applied for access before me finally got a letter from BT’s Frank McManus saying that BT actually only had 99% coverage and he would be unable to get broadband, but he should consider Satellite instead.  Thus I’m not holding out much hope of me getting it either.

The BT contract notes that only ADSL or 5.8GHz Wireless Radio broadband would be considered acceptable, so why is BT allowed to tell him (and possibly me) we cannot have broadband?   There’s a real stink to this and I’d love to hear from others if they too, in Northern Ireland, cannot get broadband because £10m DETINI money says that should not be the case.

My favourite Windows error message

Written By: pgregg - May• 23•2006

(and yes, I do know why it happens)

Bruce Perens and Richard Stallman to speak in Belfast

Written By: pgregg - Feb• 16•2006

FOSS Means Business.

As DW writes over on his page, Bruce Perens and Richard Stallman are coming to Belfast in March 2006 for a FOSS event.  I will simply repeat DW’s post as we need to get more publicity for this event.


DW wrote:

Ciaran O’Riordan of the Free Software Foundation Europe has announced that Bruce Perens and Richard Stallman will be coming to Belfast to speak on March 16th at the Spires conference center in the city. More information can be found at the FOSS Means Business web site.

This promises to be a really interesting event to attend, if for no other reason than to listen to Stallman’s GPL v3 advocacy talk. It is a very rare opportunity for you to come along and learn more about what is perhaps one of the most defining moments for free software this decade.

Entrance, thus far, appears to be free, or at least very cheap.

I’ll try to go and capture photographs of the event.

Qmail vs qmail – anality in spelling

Written By: pgregg - Feb• 08•2006

I noticed a referrer to my "Qmail is dying" article from http://thedjbway.org/qmail/qmail_at_eight.html where the author, Wayne Marshall,  refers to my post as "Qmail (sic) is dying".  For the record Wayne, I’m neither American, nor in America.

The obvious inference is that Qmail is not the correct spelling of qmail.  Lowercase vs Uppercase.

All I can say – if that is what you are reduced to in order to try to discredit an author then you are on very shaky ground.

Ignoring the obvious "Proper noun" grammatical issues, if you happen to search the  qmail Mailing List Archives (retained spelling from the site) the earliest reference I could find for someone saying it is "qmail, not Qmail" is 2001.   Many of us were using Qmail 8 years ago and nobody "corrected" it until several years later.  Too late.

If you also check the reference http://www.qmail.org site, you’ll find many references to Qmail – including Dave Sill’s "Life With Qmail" – I’d love to see if the first version of this referred to Qmail or qmail. 
Now the self-serving qmail elitists push the "qmail" capitalisation. Who cares?

Ironically, the author also lists three 3rd party patches to fix "bugs" and declares "qmail seems pretty healthy to us".  Scary.  And people bitch at Microsoft for taking months to fix a bug.  We’re 8 years on with qmail and if Dan can’t bring himself to patch 4 lines of the code to fix 3 acknowledged bugs in qmail then I would counter that qmail is not in a healthy state at all.