Paul Gregg

Jack of all Tech.

Code release: preg_find() – A recursive file listing tool for PHP

Written By: pgregg - Apr 18, 2007

Version 2.1

I originally wrote this a few years ago and never really promoted it beyond the realms of the #php IRC channel on EfNet.  However, it has managed to find its way into applications such as WordPress and many other PHP apps.  It is gratifying to know that others are finding it useful.

So what is preg_find() anyway? A short summary for those who have never encountered it: imagine a recursion-capable glob() that can filter its results with a regex (PCRE) and that accepts various arguments to modify the results and bring back additional data.
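To make the idea concrete, here is a minimal sketch of a recursive, regex-filtered file listing built on PHP's SPL directory iterators. It is not the code of preg_find() itself: the function name regex_glob_sketch() is hypothetical, and it only covers the "list matching files recursively" case, with none of the modifier flags.

```php
<?php
// Hypothetical helper for illustration only (not part of preg_find.php):
// walk $startDir recursively and keep files whose base name matches $pattern.
function regex_glob_sketch(string $pattern, string $startDir): array
{
    $matches = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($startDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $fileInfo) {
        // Keep regular files only, and test the regex against the file name.
        if ($fileInfo->isFile() && preg_match($pattern, $fileInfo->getFilename())) {
            $matches[] = $fileInfo->getPathname();
        }
    }
    sort($matches);   // plain string sort on the full path
    return $matches;
}
```

Calling regex_glob_sketch('/\.php$/D', '../code') would return the paths of the .php files below ../code, assuming that directory exists.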

Well, today I thought I would add one commonly requested feature: sorting.  Using the power of PHP’s anonymous (lambda-style) functions, preg_find() now creates a custom sort routine based on the arguments passed in. You can sort by filename, dir+filename, last modified, file size, or disk usage (yes, those last two are different), in either ascending or descending order.
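As a rough illustration of building a sort routine from the arguments, the sketch below returns a closure for uasort(). Everything in it is an assumption made for the example: the SKETCH_* constants, the make_sorter_sketch() name and the sample data are hypothetical, and it uses a closure where the original code used create_function() (deprecated in PHP 7.2 and removed in PHP 8.0).

```php
<?php
// Hypothetical flag values for this sketch only; the real constants are
// defined in preg_find.php.
const SKETCH_SORT_SIZE  = 1;
const SKETCH_SORT_MTIME = 2;
const SKETCH_SORT_DESC  = 4;

// Build a comparator for uasort() from the flags passed in.
function make_sorter_sketch(int $flags): callable
{
    $field     = ($flags & SKETCH_SORT_MTIME) ? 'mtime' : 'size';
    $direction = ($flags & SKETCH_SORT_DESC) ? -1 : 1;

    return function (array $a, array $b) use ($field, $direction): int {
        return $direction * ($a['stat'][$field] <=> $b['stat'][$field]);
    };
}

// Made-up sample data keyed by filename.
$files = [
    'a.php' => ['stat' => ['size' => 300, 'mtime' => 10]],
    'b.php' => ['stat' => ['size' => 100, 'mtime' => 30]],
    'c.php' => ['stat' => ['size' => 200, 'mtime' => 20]],
];

uasort($files, make_sorter_sketch(SKETCH_SORT_SIZE | SKETCH_SORT_DESC));
// The keys are now ordered a.php, c.php, b.php (largest first).
```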

Download preg_find.phps
Download preg_find.php in plain text format

A simple example to get started – we’ll work on my PHP miscellaneous code directory:

Example 1: List the files (no directories):

Code:

include 'preg_find.php';
$files = preg_find('/./', '../code');
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Now let us look at a recursive search – this is easy, just pass in the PREG_FIND_RECURSIVE argument.
Example 2: List the files, recursively:


Code:

$files = preg_find('/./', '../code', PREG_FIND_RECURSIVE);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Let’s go further: this time we don’t want to see any files, only the directory structure.
Example 3: List the directory tree:


Code:

$files = preg_find('/./', '../code', PREG_FIND_DIRONLY|PREG_FIND_RECURSIVE);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

It should be obvious by now that we are using constants as our modifier arguments. What might not be immediately obvious is that these constants are “bit” values (e.g. 1, 2, 4, 8, 16, …, 1024, etc.), and that PHP’s bitwise OR operator “|” lets us combine several modifiers into the single argument passed to the function.
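A small sketch of how such bit flags combine and how a function can test for them. The DEMO_* names are hypothetical and exist only for this example; the real constants are defined in preg_find.php.

```php
<?php
// Hypothetical names for illustration; each value is a distinct power of two.
const DEMO_RECURSIVE   = 1;
const DEMO_DIRONLY     = 16;
const DEMO_RETURNASSOC = 32;

// Bitwise OR sets both bits in one integer: 1 | 16 = 17.
$args = DEMO_RECURSIVE | DEMO_DIRONLY;

// A single "&" is bitwise AND: the result is non-zero only if that bit is set.
$isRecursive = (bool) ($args & DEMO_RECURSIVE);
$isDirOnly   = (bool) ($args & DEMO_DIRONLY);
$isAssoc     = (bool) ($args & DEMO_RETURNASSOC);

printf("recursive=%d dironly=%d assoc=%d\n", $isRecursive, $isDirOnly, $isAssoc);
```

Because every flag occupies its own bit, any combination of modifiers fits in one integer and each can be tested independently.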

How about a regex? Files starting with str_ and ending in .php
Example 4: Using a regex on the same code as example 1:


Code:

$files = preg_find('/^str_.*?\.php$/D', '../code');
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

What about that funky PREG_FIND_RETURNASSOC modifier?
This will change the output dramatically from a simple file/directory array to an associative array where the key is the filename, and the value is lots of information about that file.

Example 5: Use of PREG_FIND_RETURNASSOC


Code:

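$files = preg_find('/^str_.*?\.php$/D', '../code', PREG_FIND_RETURNASSOC);
// Each key is a filename; each value is an array of details about that file.
foreach($files as $file => $stats) printf("<br>%s: %s\n", $file, print_r($stats, true));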


You can see the result here
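For readers who want a feel for the associative form, here is a sketch of what one entry might look like. The shape is an assumption inferred from the keys that appear in this article and its comments ('stat', 'dirname', 'basename'); describe_file_sketch() is a hypothetical helper, and the real function may return more fields.

```php
<?php
// Assumed shape of one entry, inferred from keys used elsewhere on this page.
// Hypothetical helper, not the code of preg_find() itself.
function describe_file_sketch(string $path): array
{
    return [
        'stat'     => stat($path),      // size, mtime, uid, ... as returned by PHP's stat()
        'dirname'  => dirname($path),
        'basename' => basename($path),
    ];
}

// Use a throwaway temp file so the example does not depend on ../code existing.
$path = tempnam(sys_get_temp_dir(), 'pf_');
file_put_contents($path, 'hello');

$files = [$path => describe_file_sketch($path)];
foreach ($files as $file => $info) {
    printf("%s is %d bytes, modified %s\n",
        $info['basename'],
        $info['stat']['size'],
        date('Y-m-d H:i:s', $info['stat']['mtime']));
}
```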

As I mentioned earlier, I added sorting capability to the results, so let us look at some examples of that.

Example 6. Sorting the results (of example 1)


Code:

$files = preg_find('/./', '../code', PREG_FIND_SORTKEYS);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Example 7. And reverse sort.


Code:

$files = preg_find('/./', '../code', PREG_FIND_SORTKEYS|PREG_FIND_SORTDESC);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

OK, that’s all well and good, but what about something more interesting?

Example 8. Finding the largest 5 files in the tree, sorted by filesize, descending.


Code:

$files = preg_find('/./', '../code',
  PREG_FIND_RECURSIVE|PREG_FIND_RETURNASSOC|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDESC);
$i=1;
foreach($files as $file => $stats) {
  printf('<br>%d) %d %s', $i, $stats['stat']['size'], $file);
  $i++;
  if ($i > 5) break;
}


You can see the result here.

Or what about the 10 most recently modified files?

Example 9.


Code:

$files = preg_find('/./', '../code',
  PREG_FIND_RECURSIVE|PREG_FIND_RETURNASSOC|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTDESC);
$i=1;
foreach($files as $file => $stats) {
  printf('<br>%d) %s - %d bytes - %s', $i,
    date('Y-m-d H:i:s', $stats['stat']['mtime']), $stats['stat']['size'], $file);
  $i++;
  if ($i > 10) break;
}


You can see the result here.

I am keen to receive feedback on what you think of this function.   If you have used it in some other application – great, I would love to know.  Suggestions, improvements, criticisms are also always welcome.


13 Comments

  1. matt says:

    Thanks for this, really useful.

  2. Reid says:

    preg_find’s recursive search leaks a large amount of memory when sorting  due to the numerous create_function() calls to build the sorting function, and as a side effect this also incurs extra sorting complexity – as the tree depth of an item increases so does the number of times it is sorted. When searching over a large tree, you can quickly exhaust php’s available memory. I overcame this by renaming the preg_find algorithm to _preg_find (and changing the recursive call to _preg_find),  separating the sorting code into a wrapper preg_find function that takes the same argument list. This function first calls _preg_find, then applies the sorting to the result set. This way, the recursion does not affect the sorting, and resource consumption is much more manageable.

    patch:

    Code:

    Index: preg_find.php
    ===================================================================
    --- preg_find.php       (revision 11)
    +++ preg_find.php       (working copy)
    @@ -52,12 +52,34 @@
     // to use more than one simply seperate them with a | character
    
    
    +//wrapper function, ensure that we only sort once and only incur the memory hit of create_function once
    +function preg_find($pattern, $start_dir='.', $args=NULL) {
    +  $files_matched = _preg_find($pattern, $start_dir, $args);
    +
    +  // Before returning check if we need to sort the results.
    +  if ($args & (PREG_FIND_SORTKEYS|PREG_FIND_SORTBASENAME|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDISKUSAGE)) {
    +    $order = ($args & PREG_FIND_SORTDESC) ? 1 : -1;
    +    $sortby = '';
    +    if ($args & PREG_FIND_RETURNASSOC) {
    +      if ($args & PREG_FIND_SORTMODIFIED)  $sortby = "['stat']['mtime']";
    +      if ($args & PREG_FIND_SORTBASENAME)  $sortby = "['basename']";
    +      if ($args & PREG_FIND_SORTFILESIZE)  $sortby = "['stat']['size']";
    +      if ($args & PREG_FIND_SORTDISKUSAGE) $sortby = "['du']";
    +    }
    +
    +    $filesort = create_function('$a,$b', "\$a1=\$a$sortby;\$b1=\$b$sortby; if (\$a1==\$b1) return 0; else return (\$a1<\$b1) ? $order : 0- $order;");
    +    uasort($files_matched, $filesort);
    +  }
    +
    +  return $files_matched;
    
    +}
    +
     // Search for files matching $pattern in $start_dir.
     // if args contains PREG_FIND_RECURSIVE then do a recursive search
     // return value is an associative array, the key of which is the path/file
     // and the value is the stat of the file.
    -Function preg_find($pattern, $start_dir='.', $args=NULL) {
    +function _preg_find($pattern, $start_dir='.', $args=NULL) {
    
       $files_matched = array();
    
    @@ -94,25 +116,12 @@
         }
         if ( is_dir($filepath) && ($args & PREG_FIND_RECURSIVE) ) {
           $files_matched = array_merge($files_matched,
    -                                   preg_find($pattern, $filepath, $args));
    +                                   _preg_find($pattern, $filepath, $args));
         }
       }
    
       closedir($fh);
    
    -  // Before returning check if we need to sort the results.
    -  if ($args & (PREG_FIND_SORTKEYS|PREG_FIND_SORTBASENAME|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDISKUSAGE)) {
    -    $order = ($args & PREG_FIND_SORTDESC) ? 1 : -1;
    -    $sortby = '';
    -    if ($args & PREG_FIND_RETURNASSOC) {
    -      if ($args & PREG_FIND_SORTMODIFIED)  $sortby = "['stat']['mtime']";
    -      if ($args & PREG_FIND_SORTBASENAME)  $sortby = "['basename']";
    -      if ($args & PREG_FIND_SORTFILESIZE)  $sortby = "['stat']['size']";
    -      if ($args & PREG_FIND_SORTDISKUSAGE) $sortby = "['du']";
    -    }
    -    $filesort = create_function('$a,$b', "\$a1=\$a$sortby;\$b1=\$b$sortby; if (\$a1==\$b1) return 0; else return (\$a1<\$b1) ? $order : 0- $order;");
    -    uasort($files_matched, $filesort);
    -  }
       return $files_matched;
    
     }


  3. pgregg says:

    Hi Reid,

    Great spot there – I had realised that with recursion came additional sorting, but I did not realise that the memory hit would be so large.   I’ve patched the code to only sort at the final function exit, however rather than break out a further function call, I used a static variable to record the current recursive depth.

    Patch to 2.1 (non-contextual because it is smaller) is below, or 2.2 is now in place.

    Thanks for pointing that out :)

    Code:

    9c9,10
    <  * Version: 2.1
    ---
    >  * Updated 9 June 2007 to prevent multiple calls to sort during recursion
    >  * Version: 2.2
    61a63,65
    >   static $depth = -1;
    >   ++$depth;
    >
    104c108
    <   if ($args & (PREG_FIND_SORTKEYS|PREG_FIND_SORTBASENAME|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDISKUSAGE)) {
    ---
    >   if (($depth==0) && ($args & (PREG_FIND_SORTKEYS|PREG_FIND_SORTBASENAME|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDISKUSAGE)) ) {
    115a120
    >   --$depth;
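The static-variable approach in the patch above can be sketched in isolation. This is a hypothetical stand-in, not preg_find() itself: collect_sketch() walks a nested array instead of the filesystem, and the $sortCalls counter exists only to show that the sort runs once, when the outermost call returns.

```php
<?php
// Hypothetical stand-in for the recursive search: it walks a nested array
// rather than directories, and sorts only in the outermost call.
function collect_sketch(array $tree, int &$sortCalls): array
{
    static $depth = -1;   // shared across all recursive calls
    ++$depth;

    $found = [];
    foreach ($tree as $child) {
        if (is_array($child)) {
            // Recurse; the nested call sees $depth > 0 and skips the sort.
            $found = array_merge($found, collect_sketch($child, $sortCalls));
        } else {
            $found[] = $child;
        }
    }

    if ($depth === 0) {   // only the top-level call sorts
        sort($found);
        ++$sortCalls;
    }

    --$depth;             // restore the counter for the next top-level call
    return $found;
}

$sortCalls = 0;
$result = collect_sketch(['z', ['y', ['x', 'w']], 'a'], $sortCalls);
// $result is ['a', 'w', 'x', 'y', 'z'] and $sortCalls is 1.
```

The trade-off against a wrapper function is that the static counter keeps everything in one routine, but it must be decremented on every exit path or later calls will never sort.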


  4. robwilkerson says:

    Hey guys –

    This function looks immensely useful and I’m looking at implementing it in an object oriented context.  I’m in the process of trying to make the necessary modifications (properly credited, be happy to donate the changes back, blah, blah, blah 🙂 and, not being a seasoned PHP developer, I’m trying to understand the syntactical meaning of the single ampersand.

    Code:

    if ($args & self::PREG_FIND_NEGATE)


    I’m not familiar with that syntax in php and every search I do seems to return more "&&" references than I’m willing to sort through.  🙂

    I’ll keep digging around, but any insight would be very much appreciated.

    Rob

  5. localhost77 says:

    I just want to have a list of filenames without the directory path. Is there any option to do this?

  6. Seanbo says:

    I incorporated the changes from Reid above, added a sort by file extension, and added the ability not to recurse into links (hitting a bad link that points to the parent directory could cause an infinite recursion):

    --- preg_find.php 2009-06-11 23:27:38.000000000 -0400
    +++ preg_find.sean 2009-06-11 23:27:16.000000000 -0400
    @@ -23,6 +23,7 @@
     define('PREG_FIND_FULLPATH', 4);
     define('PREG_FIND_NEGATE', 8);
     define('PREG_FIND_DIRONLY', 16);
    +define('PREG_FIND_IGNORELINKS', 24);
     define('PREG_FIND_RETURNASSOC', 32);
     define('PREG_FIND_SORTDESC', 64);
     define('PREG_FIND_SORTKEYS', 128);
    @@ -30,10 +31,12 @@
     define('PREG_FIND_SORTMODIFIED', 512); # requires PREG_FIND_RETURNASSOC
     define('PREG_FIND_SORTFILESIZE', 1024); # requires PREG_FIND_RETURNASSOC
     define('PREG_FIND_SORTDISKUSAGE', 2048); # requires PREG_FIND_RETURNASSOC
    +define('PREG_FIND_SORTEXTENSION', 4096); # requires PREG_FIND_RETURNASSOC
    
     // PREG_FIND_RECURSIVE - go into subdirectorys looking for more files
     // PREG_FIND_DIRMATCH - return directorys that match the pattern also
     // PREG_FIND_DIRONLY - return only directorys that match the pattern (no files)
    +// PREG_FIND_IGNORELINKS - Do not follow links
     // PREG_FIND_FULLPATH - search for the pattern in the full path (dir+file)
     // PREG_FIND_NEGATE - return files that don't match the pattern
     // PREG_FIND_RETURNASSOC - Instead of just returning a plain array of matches,
    @@ -58,7 +61,29 @@
     // if args contains PREG_FIND_RECURSIVE then do a recursive search
     // return value is an associative array, the key of which is the path/file
     // and the value is the stat of the file.
    -Function preg_find($pattern, $start_dir='.', $args=NULL) {
    +function preg_find($pattern, $start_dir='.', $args=NULL) {
    +  $start_dir = chop($start_dir,'/');
    +  $files_matched = _preg_find($pattern, $start_dir, $args);
    +
    +  //Before returning check if we need to sort the results
    +  if($args & (PREG_FIND_SORTKEYS|PREG_FIND_SORTBASENAME|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDISKUSAGE)) {
    +    $order = ($args & PREG_FIND_SORTDESC) ? 1 : -1;
    +    $sortby = '';
    +    if ($args & PREG_FIND_RETURNASSOC) {
    +      if ($args & PREG_FIND_SORTMODIFIED) $sortby = "['stat']['mtime']";
    +      if ($args & PREG_FIND_SORTBASENAME) $sortby = "['basename']";
    +      if ($args & PREG_FIND_SORTFILESIZE) $sortby = "['stat']['size']";
    +      if ($args & PREG_FIND_SORTDISKUSAGE) $sortby = "['du']";
    +      if ($args & PREG_FIND_SORTEXTENSION) $sortby = "['extension']";
    +    }
    +
    +    $filesort = create_function('$a,$b', "\$a1=\$a$sortby;\$b1=\$b$sortby; if (\$a1==\$b1) return 0; else return (\$a1<\$b1) ? $order : 0- $order;");
    +    uasort($files_matched, $filesort);
    +  }
    +  return $files_matched;
    +}
    +
    +function _preg_find($pattern, $start_dir='.', $args=NULL) {
    
       static $depth = -1;
       ++$depth;
    @@ -91,32 +116,22 @@
               if (function_exists('dirname')) $fileres['dirname'] = dirname($filepath);
               if (function_exists('basename')) $fileres['basename'] = basename($filepath);
               if (isset($fileres['uid']) && function_exists('posix_getpwuid')) $fileres['owner'] = posix_getpwuid ($fileres['uid']);
    +          if (function_exists('end')) $fileres['extension'] = pathinfo($filepath, PATHINFO_EXTENSION);
               $files_matched[$filepath] = $fileres;
             } else
               array_push($files_matched, $filepath);
           }
         }
         if ( is_dir($filepath) && ($args & PREG_FIND_RECURSIVE) ) {
    -      $files_matched = array_merge($files_matched,
    -                                   preg_find($pattern, $filepath, $args));
    +      if (!is_link($filepath) && !($args & PREG_FIND_IGNORELINKS) ) {
    +        $files_matched = array_merge($files_matched,
    +                                     _preg_find($pattern, $filepath, $args));
    +      }
         }
       }
    
       closedir($fh);
    
    -  // Before returning check if we need to sort the results.
    -  if (($depth==0) && ($args & (PREG_FIND_SORTKEYS|PREG_FIND_SORTBASENAME|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDISKUSAGE)) ) {
    -    $order = ($args & PREG_FIND_SORTDESC) ? 1 : -1;
    -    $sortby = '';
    -    if ($args & PREG_FIND_RETURNASSOC) {
    -      if ($args & PREG_FIND_SORTMODIFIED) $sortby = "['stat']['mtime']";
    -      if ($args & PREG_FIND_SORTBASENAME) $sortby = "['basename']";
    -      if ($args & PREG_FIND_SORTFILESIZE) $sortby = "['stat']['size']";
    -      if ($args & PREG_FIND_SORTDISKUSAGE) $sortby = "['du']";
    -    }
    -    $filesort = create_function('$a,$b', "\$a1=\$a$sortby;\$b1=\$b$sortby; if (\$a1==\$b1) return 0; else return (\$a1<\$b1) ? $order : 0- $order;");
    -    uasort($files_matched, $filesort);
    -  }
       --$depth;
       return $files_matched;

  7. Paul Gregg says:

    Hi Sean,

    This is a good idea to add a sort-by-extension, however your implementation is flawed as the value used must represent a single bit of a 32bit integer. The value “24” won’t work as that just represents both PREG_FIND_DIRONLY and PREG_FIND_NEGATE turned on at the same time.

    I’ll add “ext” functionality to the code shortly and post it up tonight.

    Regards,
    PG
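The single-bit requirement described in this reply can be checked with a few lines. The values 8, 16 and 24 come from the patch quoted above; the helper is_single_bit_sketch() is a hypothetical name introduced only for this illustration.

```php
<?php
// Values as they appear in the patch under discussion.
$negate   = 8;    // PREG_FIND_NEGATE
$dirOnly  = 16;   // PREG_FIND_DIRONLY
$proposed = 24;   // value chosen for the new flag

// 24 is exactly 8 | 16, so passing it switches on both existing flags.
$collides = ($proposed === ($negate | $dirOnly));

// A usable flag has exactly one bit set, i.e. it is a power of two.
// Hypothetical helper, not part of preg_find.php.
function is_single_bit_sketch(int $value): bool
{
    return $value > 0 && ($value & ($value - 1)) === 0;
}

var_dump($collides);                      // bool(true)
var_dump(is_single_bit_sketch(24));       // bool(false)
var_dump(is_single_bit_sketch(8192));     // bool(true)
```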

  8. Paul Gregg says:

    New diff Sean – please note that the previous code had implemented Reid’s suggestion of preventing the multiple sorts – however I did it via a static variable which I would argue is better than splitting the routine into two functions. Thus you should have started with my base code instead of Reid’s.

    10c10,12
    < * Version: 2.2
    ---
    > * Updated 12 June 2009 to allow for sorting by extension and prevent following
    > * symlinks by default
    > * Version: 2.3
    32a35,36
    > define('PREG_FIND_SORTEXTENSION', 4096); # requires PREG_FIND_RETURNASSOC
    > define('PREG_FIND_FOLLOWSYMLINKS', 8192);
    40a45,48
    > // PREG_FIND_FOLLOWSYMLINKS - Recursive searches (from v2.3) will no longer
    > // traverse symlinks to directories, unless you
    > // specify this flag. This is to prevent nasty
    > // endless loops.
    52a61
    > // PREG_FIND_SORTEXTENSION - Sort based on the filename extension
    92a102
    > if (($i=strrpos($fileres['basename'], '.'))!==false) $fileres['ext'] = substr($fileres['basename'], $i+1); else $fileres['ext'] = '';
    100c110,111
    < $files_matched = array_merge($files_matched,
    ---
    > if (!is_link($filepath) || ($args & PREG_FIND_FOLLOWSYMLINKS))
    > $files_matched = array_merge($files_matched,
    115a127
    > if ($args & PREG_FIND_SORTEXTENSION) $sortby = "['ext']";

    The published preg_find has been bumped to v2.3 and includes this change – you can get it via the link at the top of this article.

  9. This has been a great help for me to find the hackers file in my client’s website. Is this also able to search files with certain words in the content?

  10. Poil says:

    Hi,

    Thanks for this great function.

    Is there any way to add “SORT_NUMERIC” to this function?

    I would like to order cpu0 cpu1 cpu10 cpu11 cpu2 correctly (cpu0 cpu1 cpu2 … cpu10 cpu11)

    Best regards

  11. Jonathan says:

    Hi Paul

    This is proving to be a very useful script. Thank you very much.

    I’m customising the script to show filenames only (using basename()) and I’m also making the files displayed links.

    All good so far.

    I would however like to go one step further. That is I would like to separate the results into sections according to the folder the files are found in.

    Example folder/file structure:
    folder1
      file1
      file2
      file3
    folder2
      file1
      file2

    and so on. I would like to display the results like this:

    folder1

    file1
    file2
    file3

    folder2

    file1
    file2

    This would be very useful to me (possibly to others?) if you could point me in the direction I need to go.

    Thank you very much!

    Jonathan

  12. Sam says:

    How do you remove the “./” on the results?

  13. Paolo says:

    Hi, I had a problem with PREG_FIND_NEGATE not working as (I’d) expected (listing non-matching filenames).
    I solved it by changing a few lines of code around line 90

    – – – – – – –
    From:

    if (preg_match($pattern,
    ($args & PREG_FIND_FULLPATH) ? $filepath : $file)) { …
    – – – – –
    Into:

    $matched = preg_match($pattern, ($args & PREG_FIND_FULLPATH) ? $filepath : $file);
    if($args & PREG_FIND_NEGATE) { $matched = !$matched; }

    if($matched)
    {…..
    – – – – – –
    Hope this is useful to someone 🙂
    P.
