Stage 2: http:BL with Apache2 mod_perl

After my earlier post Referrer and Comment spammers are a PITA I came up with two mod_perl plugins to Apache and an “apache level” firewall.

The reason for the apache-level firewall is two-fold.  First, there is no direct way for the Apache user to manipulate an iptables chain (as it doesn’t run as root). Second, I was not happy with suid root access or other forms of message passing to a daemon which would manipulate the firewall for me.

Architecture is thus, in httpd.conf place the following two lines:

PerlPreConnectionHandler PGREGG::httpBLBlock
PerlLogHandler PGREGG::httpBLLog

The first tells apache to run the handler in my httpBLBlock.pm module when a connection is received (before the request has been sent by the client).  In this handler, I am simply looking for a filename matching that IP in a directory that is writable by the apache user.  The contents of the file are SCORE:httpBL_answer:[LIST].  The module checks whether the mtime of the file falls within the last SCORE days. If it does, the firewall is in effect and we simply tell apache to drop the connection.  If the file has expired, we delete the file.
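
The handler itself is Perl and has not been released yet, so what follows is only a PHP sketch of the lookup logic described above. The function name is_blocked(), the one-file-per-IP layout and the return convention are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of the block-file check; not the mod_perl handler itself.
// Assumes one file per IP inside $dir, containing "SCORE:httpBL_answer:[LIST]".
function is_blocked(string $dir, string $ip, ?int $now = null): bool
{
    $now  = $now ?? time();
    $path = $dir . '/' . $ip;
    if (!is_file($path)) {
        return false;               // no entry for this IP
    }
    $contents = (string) file_get_contents($path);
    $fields   = explode(':', trim($contents));
    $score    = (int) $fields[0];   // SCORE doubles as the block lifetime in days

    $ageSeconds = $now - filemtime($path);
    if ($ageSeconds <= $score * 86400) {
        return true;                // still inside the block window
    }
    unlink($path);                  // expired: remove the entry
    return false;
}
```

The optional $now argument is only there so the expiry path can be exercised without waiting SCORE days.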

The second line is more interesting, and what creates the firewall filenames. In order to not impede the general speed of request handling, processing is performed in the Logging section of the Apache process. Our module is called by apache after the response has been sent, but before the access_log entry has been written.  In our module we perform the http:BL API call and compute the above SCORE based upon the Threat* level and Age* of the API response. (* both Threat and Age are octets in the DNS lookup).  We merely discount the Threat down to zero based on the Age (0-255) where an entry 255 days old reduces the SCORE to zero.
If the SCORE is larger than our trigger level (3) then we create the firewall filename, log the entry in our own httpbl.log and return Apache2::Const::FORBIDDEN.  This causes Apache to not log the entry in the normal access_log.  Otherwise, if all is ok, we return Apache2::Const::OK and Apache logs the hit as normal.
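
To make the scoring concrete, here is a small PHP sketch. It assumes the discount is linear (the post only says that an Age of 255 reduces the SCORE to zero), and httpbl_score() is a made-up name; the real module is Perl. The http:BL answer is a dotted quad whose octets are 127, days since last activity, threat score and visitor type.

```php
<?php
// Hypothetical sketch: linear discount of the threat octet by the age octet.
// $answer is the dotted-quad http:BL DNS answer, e.g. "127.3.40.4"
// (octets: 127, days since last activity, threat score, visitor type).
function httpbl_score(string $answer): ?float
{
    $octets = array_map('intval', explode('.', $answer));
    if (count($octets) !== 4 || $octets[0] !== 127) {
        return null;                       // not a valid http:BL answer
    }
    [, $age, $threat] = $octets;
    return $threat * (255 - $age) / 255;   // age 0 keeps full threat, 255 gives 0
}

$score = httpbl_score('127.3.40.4');
$block = $score !== null && $score > 3;    // 3 is the trigger level from the post
```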

I still have a bit of code to tidy up, the config/firewall directory to restructure, and some common code to pull out into a shared module before I can release to the world.

An interesting side effect to publishing the last story out through Planet PHP and other news sources along with the Project Honey Pot image is that when browsers viewed those sources, they all asked for the image off my server. In several cases, these were known spammers, comment spammers, and other abusers. My server then created the firewall entry blocking them before they were able to follow the links back to my server.
 
I have been reading up more on Apache Bucket Brigades in an attempt to allow the firewall filter to be placed immediately after the request has been received and allow a custom response to the browser. This may help an otherwise unsuspecting user if their machine had been trojaned. I don’t mind admitting I’m thoroughly confused right now :)

tail -# file, in PHP

I read Kevin’s Read Line from File article today and thought I would add some sample code I wrote a while back to address a similar task in PHP – How to perform the equivalent of “tail -# filename” in PHP.

A typical task in some applications such as a shoutbox or a simple log file viewer is how to extract the last ‘n’ lines from a (very) large file – efficiently.  There are several approaches one can take to solving the problem (as always in programming there are many ways to skin the proverbial cat) – taking the easy way out and shelling out to run the tail command; or reading the file line by line keeping the last ‘n’ lines in memory with a queue.
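
For comparison, the queue approach might look like the sketch below. tail_by_queue() is a hypothetical helper name. It is simple and correct, but it has to read the whole file, so the cost grows with the file size rather than with ‘n’.

```php
<?php
// Sketch of the simple approach: read every line, keep only the last $n.
// tail_by_queue() is a hypothetical helper name, not part of the original code.
function tail_by_queue(string $file, int $n): array
{
    $fp = fopen($file, 'r');
    if ($fp === false) {
        return [];
    }
    $queue = [];
    while (($line = fgets($fp)) !== false) {
        $queue[] = $line;
        if (count($queue) > $n) {
            array_shift($queue);   // drop the oldest line
        }
    }
    fclose($fp);
    return $queue;
}
```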

However, here I will demonstrate a fairly tuned method of seeking to the end of the file and stepping back to read sufficient lines to the end of the file. If insufficient lines are returned, it incrementally looks back further in the file until it either can look no further, or sufficient lines are returned.

Assumptions need to be made in order to tune the algorithm for the competing challenges of:

  • Reading enough lines in
  • as few reads as possible

My approach to this is to provide an approximation to the average size of a line: $linelength
Multiply by ‘n’: $linecount
and we have the byte $offset from the end of the file that we will seek to in order to begin reading.

One check we need right here is that the offset does not reach back past the beginning of the file; if it does, we override $offset to match the file size.

Also, as I’m being over-cautious, I’m going to tell it to offset by $linecount + 1 lines. The main reason for this is that by seeking to a specific byte location in the file, we would have to be very lucky to land on the first character of a new line – therefore we must perform a fgets() and throw away that result.

Typically, I want it to be able to read ‘n’ lines forward from the offset given, however if that proves insufficient, I’m going to grow the offset by 10% and try again. I also want to make it so that the algorithm is better able to tune itself if we grossly underestimate what $linelength should be.  In order to do this, we’re going to track the string length of each line we do get back and adjust the offset accordingly.

In our example, let’s try reading the last 10 lines from Apache’s access_log.

So let’s look at the code so far, nothing interesting, we’re just prepping for the interesting stuff:

$linecount  = 10;  // Number of lines we want to read
$linelength = 160; // Apache's logs are typically ~200+ chars
// I've set this to < 200 to show the dynamic nature of the algorithm
// offset correction.
$file = '/usr/local/apache2/logs/access_log.demo';
$fsize = filesize($file);


// check if file is smaller than possible max lines
$offset = ($linecount+1) * $linelength;
if ($offset > $fsize) $offset = $fsize;

Next up we open the file and, using our method of seeking to the end of the file less our offset, read the lines. Here is the meat of our routine:

$fp = fopen($file, 'r');
if ($fp === false) exit;

$lines = array(); // array to store the lines we read.

$readloop = true;
while ($readloop) {
  // we will finish reading when we have read $linecount lines, or the file
  // just doesn't have $linecount lines

  // seek to $offset bytes from the end of the file
  fseek($fp, 0 - $offset, SEEK_END);

  // discard the first line as it won't be a complete line
  // unless we're right at the start of the file
  if ($offset != $fsize) fgets($fp);

  // tally of the number of bytes in each line we read
  $linesize = 0;

  // read from here till the end of the file and remember each line
  // (compare against false so a final line containing just "0" is not lost)
  while (($line = fgets($fp)) !== false) {
    array_push($lines, $line);
    $linesize += strlen($line); // total up the char count

    // if we've been able to get more lines than we need
    // lose the first entry in the queue
    // Logically we should decrement $linesize too, but if we
    // hit the magic number of lines, we are never going to use it
    if (count($lines) > $linecount) array_shift($lines);
  }

  // We have now read all the lines from $offset until the end of the file
  if (count($lines) == $linecount) {
    // perfect - have enough lines, can exit the loop
    $readloop = false;
  } elseif ($offset >= $fsize) {
    // file is too small - nothing more we can do, we must exit the loop
    $readloop = false;
  } elseif (count($lines) < $linecount) {
    // try again with a bigger offset
    $offset = intval($offset * 1.1);  // increase offset 10%
    // but also work out what the offset could be if based on the lines we saw
    // (skipped when no complete line was read, to avoid dividing by zero)
    $offset2 = count($lines) ? intval($linesize/count($lines) * ($linecount+1)) : 0;
    // and if it is larger, then use that one instead (self-tuning)
    if ($offset2 > $offset) $offset = $offset2;
    // Also remember we can't seek back past the start of the file
    if ($offset > $fsize) $offset = $fsize;
    echo 'Trying with a bigger offset: ', $offset, "\n";
    // and reset
    $lines = array();
  }
}
fclose($fp);

// Let's have a look at the lines we read.
print_r($lines);

At first glance it might seem like overkill for the task; however, stepping through the code you can see the expected while loop with fgets() to read each line. The only thing we are doing at this stage is shifting the first line off the $lines array if we happen to read too many lines, and also tallying up how many characters we managed to read for each line.

If we exit the while/fgets loop with the correct number of lines, then all is well, we can exit the main retry loop and we have the result in $lines.

Where the code gets interesting is what we do if we don’t achieve the required number of lines.  The simple fix is to step back by a further 10% by increasing the offset and trying again, but remember we also counted up the number of bytes we read for each line we did get, so we can very simply obtain an average real-file line size by dividing this by the number of lines in our $lines array. This enables us to override the previous offset value with something larger, if indeed we were wildly off in our estimates.

By adjusting our offset value and letting the loop repeat, the routine will try again and repeat until it succeeds or fails gracefully by the file not having sufficient lines, in which case it’ll return what it could get.

For the complete, working, source code please visit:

http://pgregg.com/projects/php/code/tail-10.phps

Sample execution:

http://pgregg.com/projects/php/code/tail-10.php

TinyURL PHP “flaw” ?

The Register is running a story today TinyURL, your configs are showing which points out that TinyURL has a /php.php page displaying the contents of phpinfo().

The article then goes on to make some scary sounding claims from security consultant Rafal Los: “Why would you want to run a web service as ‘Administrator’ because if I figure out a way to jack that service, I completely, 100% own that machine.” and “More importantly… why is this server running as ROOT:WHEEL?!”

Sorry Rafal – but you appear to have no idea how web servers work, or all that much about (web) security.

All unix based webservers start as root if they want to bind to the restricted (and default) port 80, after which they switch to the configured UID for request handling.  So, right there, goes all Rafal’s claims about pwning the machine.

Check your own server: the _SERVER and _ENV values will reflect the starting shell/environment, which just happens to be root.  In other words, there is nothing wrong with the settings. Having said that, they do have register_globals turned on, which isn’t ideal – but it isn’t a gaping hole if the underlying php code is safely coded.
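
If you want to check this on your own server, a sketch along the following lines compares the user named in the inherited environment with the user the process is really running as. process_identity() is a hypothetical helper name, and the second value needs PHP’s POSIX extension. On a server started from a root shell, I would expect the first to say root while the second names the unprivileged account.

```php
<?php
// Sketch: compare the user named in the inherited environment with the
// user the process is actually running as. Requires the POSIX extension
// for the second value; process_identity() is a hypothetical helper name.
function process_identity(): array
{
    $envUser = getenv('USER');                 // inherited from the starting shell
    $effective = null;
    if (function_exists('posix_geteuid')) {
        $info = posix_getpwuid(posix_geteuid());
        $effective = $info === false ? null : $info['name'];
    }
    return [
        'environment_user' => $envUser === false ? null : $envUser,
        'effective_user'   => $effective,
    ];
}

print_r(process_identity());
```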

Also to TinyURL’s credit, they are running the Suhosin patch to harden their server.  They’re also running the latest production PHP (which is more than I can say).  Granted, they probably don’t want to be exposing phpinfo() – but this is all just an overblown storm in a teacup.

PHP on LinkedIn.com

Since LinkedIn opened up its Groups system, there has been a huge growth in the number of groups related to PHP.  Some with charters, some without; some with a specific community background and others with a specific regional focus.  I am posting this to bring attention to some of them.

In order of popularity (member count) some general groups (non-regional) are:

Some of these are useful if you are looking for a job (the recruiters tend to play nice and stay on-topic), others ban job posts and stick to discussions.
There are literally hundreds of groups related to PHP in some shape or fashion – pure PHP, LAMP, PHP&Mysql, Frameworks, and many regional *PUG type groups.

Migrated to MovableType

Well, after a few days of poking and prodding and working my way around an Ubuntu Hardy Heron bug compiling Image::Magick (tip: it is a bug in the supplied gcc-4.2.3 – you can get gcc 4.3 in the gcc-snapshot apt package), I finally have a working MT install.

Next up was writing a PunBB article and comment exporter to create a MTimport format file that I could load into MT to pre-populate the blog. Couple of trial runs later and here we are.

Let’s see if I can manage to post a little more frequently.

For those syndicating the old blog, rewrite rules should mean you have nothing to change but please let me know if anything is awry. 
General feed is /feed/all
PHP category feed is /feed/php

To silent fanfare, Microsoft released SQL Server 2005 Driver for PHP

On July 24, Microsoft released version 1.0 of their native SQL Server 2005 Driver for PHP.

http://www.microsoft.com/downloads/details.aspx?FamilyId=61 … 597C21A2E2A&displaylang=en

Some months back I downloaded a beta version of this after having problems working with international characters (UTF-8) with PDO and MSSQL and impressively the SQL Server 2005 Driver for PHP worked very well.

Congratulations to Microsoft for continuing with this and their recent contribution to ADODB.  I’m looking for better PDO support now :)

FastCGI for IIS (PHP related)

I haven’t seen this announced on the PHP blogs (planets) yet and since it may be of interest to those running PHP (with IIS as a CGI) I’ll repost the details here.

Joe Stagner’s Blog (Microsoft employee) has posted that the IIS team have released the FastCGI extension for IIS 5.1/6.0 as a free download from http://www.iis.net/.

Rather than reword it all, here are the salient bits in Joe’s words:

HUGE KUDOS to the IIS team for their hard work and innovation (technical and political) for making FastCGI a reality.)

If your a developer that needs to use a CGI based platform (Like PHP) and work on Windows – then this is a godsend.

They guys went into over-drive to get this ready before the upcoming Zend-Con.

Here are the official particulars.

Since early 2006, Microsoft and Zend have been working together on a technical collaboration with the PHP community to significantly enhance the reliability and performance of PHP on Windows Server 2003 and Windows Server 2008.  As part of this collaboration, the IIS product group has been working on a new component for IIS6 and IIS7 called FastCGI Extension which will enable IIS to much more effectively host PHP applications.   

Code release: preg_find() – A recursive file listing tool for PHP

Version 2.1

I originally wrote this a few years ago and never really promoted it beyond the realms of the #php IRC channel on EfNet.  However, it has managed to find its way into applications such as WordPress and many other PHP apps.  It is gratifying to know that others are finding it useful.

So what is preg_find() anyway? A short summary for those who have never encountered it: Imagine a recursive capable glob() with the ability to filter the results with a regex (PCRE) and various arguments to modify the results to bring back additional data.

Well today I thought I would add one commonly requested feature: sorting.  Using the power of PHP’s anonymous (lambda-style) functions, preg_find() now creates a custom sort routine based on the arguments passed in: filename, dir+filename, last modified, file size, or disk usage (yes, those last 2 are different), in either ascending or descending order.
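
To illustrate the idea of building a sort routine from flags, here is a minimal sketch using a closure and usort(). The DEMO_SORT_* constants, the array layout and make_sorter() are illustrative names only; they are not the constants or internals of preg_find().

```php
<?php
// Hypothetical sketch of building a comparison function from flags.
// The DEMO_SORT_* constants and make_sorter() are illustrative names only;
// they are not the constants or internals of preg_find().
const DEMO_SORT_NAME = 1;
const DEMO_SORT_SIZE = 2;
const DEMO_SORT_DESC = 4;

function make_sorter(int $flags): callable
{
    $direction = ($flags & DEMO_SORT_DESC) ? -1 : 1;
    $field     = ($flags & DEMO_SORT_SIZE) ? 'size' : 'name';
    return function (array $a, array $b) use ($direction, $field): int {
        return $direction * ($a[$field] <=> $b[$field]);
    };
}

$files = [
    ['name' => 'b.php', 'size' => 10],
    ['name' => 'a.php', 'size' => 30],
    ['name' => 'c.php', 'size' => 20],
];
usort($files, make_sorter(DEMO_SORT_SIZE | DEMO_SORT_DESC));
// $files is now ordered by size, largest first: a.php, c.php, b.php
```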

Download preg_find.phps
Download preg_find.php in plain text format

A simple example to get started – we’ll work on my PHP miscellaneous code directory:

Example 1: List the files (no directories):

Code:

include 'preg_find.php';
$files = preg_find('/./', '../code');
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Now let us look at a recursive search – this is easy, just pass in the PREG_FIND_RECURSIVE argument.
Example 2: List the files, recursively:

Code:

$files = preg_find('/./', '../code', PREG_FIND_RECURSIVE);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Let’s go further, this time we don’t want to see any files – only a directory structure.
Example 3: List the directory tree:

Code:

$files = preg_find('/./', '../code', PREG_FIND_DIRONLY|PREG_FIND_RECURSIVE);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

It should be obvious by now that we are using constants as our modifier arguments. What might not be immediately obvious is that these constants are “bit” values (e.g. 1, 2, 4, 8, 16, …, 1024, etc.) and using PHP’s Bitwise Or operator “|” we can combine modifiers to pass multiple modifiers into the function.
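
As a small illustration of how such flags combine and are tested, consider the sketch below. The DEMO_* constants and their values are made up for the example; the real PREG_FIND_* values are defined in preg_find.php and may differ.

```php
<?php
// Illustrative flag values only; the real PREG_FIND_* values are defined
// in preg_find.php and may differ.
const DEMO_RECURSIVE = 1;   // binary 001
const DEMO_DIRONLY   = 2;   // binary 010
const DEMO_SORTKEYS  = 4;   // binary 100

$args = DEMO_RECURSIVE | DEMO_DIRONLY;          // 011 = 3

$isRecursive = ($args & DEMO_RECURSIVE) !== 0;  // true
$isDirOnly   = ($args & DEMO_DIRONLY) !== 0;    // true
$isSorted    = ($args & DEMO_SORTKEYS) !== 0;   // false
```

Because each flag owns a single bit, any combination fits in one integer argument and each flag can be tested independently with “&”.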

How about a regex? Files starting with str_ and ending in .php
Example 4: Using a regex on the same code as example 1:

Code:

$files = preg_find('/^str_.*?\.php$/D', '../code');
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

What about that funky PREG_FIND_RETURNASSOC modifier?
This will change the output dramatically from a simple file/directory array to an associative array where the key is the filename, and the value is lots of information about that file.

Example 5: Use of PREG_FIND_RETURNASSOC

Code:

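$files = preg_find('/^str_.*?\.php$/D', '../code', PREG_FIND_RETURNASSOC);
// each value is now an array of details, so print the key and one field from it
foreach($files as $file => $stats) printf("<br>%s - %d bytes\n", $file, $stats['stat']['size']);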


You can see the result here

As I mentioned earlier, I added sorting capability to the results, so let us look at some examples of that.

Example 6. Sorting the results (of example 1)

Code:

$files = preg_find('/./', '../code', PREG_FIND_SORTKEYS);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Example 7. And reverse sort.

Code:

$files = preg_find('/./', '../code', PREG_FIND_SORTKEYS|PREG_FIND_SORTDESC);
foreach($files as $file) printf("<br>%s\n", $file);


You can see the result here

Ok, that’s all well and good, what about something more interesting?

Example 8. Finding the largest 5 files in the tree, sorted by filesize, descending.

Code:

$files = preg_find('/./', '../code',
  PREG_FIND_RECURSIVE|PREG_FIND_RETURNASSOC|PREG_FIND_SORTFILESIZE|PREG_FIND_SORTDESC);
$i=1;
foreach($files as $file => $stats) {
  printf('<br>%d) %d %s', $i, $stats['stat']['size'], $file);
  $i++;
  if ($i > 5) break;
}


You can see the result here.

Or what about the 10 most recently modified files?

Example 9.

Code:

$files = preg_find('/./', '../code',
  PREG_FIND_RECURSIVE|PREG_FIND_RETURNASSOC|PREG_FIND_SORTMODIFIED|PREG_FIND_SORTDESC);
$i=1;
foreach($files as $file => $stats) {
  printf('<br>%d) %s - %d bytes - %s', $i,
    date('Y-m-d H:i:s', $stats['stat']['mtime']), $stats['stat']['size'], $file);
  $i++;
  if ($i > 10) break;
}


You can see the result here.

I am keen to receive feedback on what you think of this function.   If you have used it in some other application – great, I would love to know.  Suggestions, improvements, criticisms are also always welcome.

String Case Conversion in PHP

Occasionally I read through some comments on the PHP Manual, sometimes to get ideas on different methods of doing things, other times just to try to keep current with some of the vast array of functions available.

Sometimes, I see things that really scare me – code that is written and published with the best will in the world from the author – but yet displays a lack of a deeper understanding of how to solve a problem.  One such case was the invert_case() and rand_case() functions which basically looped through each character in a string doing whatever it had to do to each character as it went.  Highly inefficient.

Remember, the only difference in ASCII between an uppercase letter and a lowercase letter is a single bit that is 0 for uppercase and 1 for lowercase.

This brief tutorial is based on code available at:
http://www.pgregg.com/projects/php/code/str_case.phps
and you can see example output at:
http://www.pgregg.com/projects/php/code/str_case.php

Surely it would be possible to write some code that would simply flip this bit in each character to the value you want:
– AND with 0 to force uppercase
– OR with 1 to force lowercase
– XOR with 1 to invert the case
– randomly set it to 1 or 0 to set random case.
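
A quick way to convince yourself of this is to try the operators on single characters. This sketch relies on the fact that PHP applies &, | and ^ byte by byte when both operands are strings; the variable names are just for illustration.

```php
<?php
// In ASCII, 'A' is 0x41 and 'a' is 0x61; they differ only in the 0x20 bit.
// PHP applies &, | and ^ byte by byte when both operands are strings.
$upper  = 'a' & "\xDF";   // clear the 0x20 bit  -> 'A'
$lower  = 'A' | ' ';      // set the 0x20 bit    -> 'a'
$invert = 'a' ^ ' ';      // flip the 0x20 bit   -> 'A'

printf("%s %s %s\n", $upper, $lower, $invert);   // prints "A a A"
```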

There are two methods to achieving this, the first makes a simple character mask and performs a bitwise operation on the string as a whole to change it as required.  This method is designed to help teach how this works.  The second method uses the power of the PCRE engine by using a regex to calculate the changes and apply them in one simple step.

Both solutions are, I believe, elegant and are presented here for you.

Solution #1:

Code:

// Code that will invert the case of every character in $input
// The solution is to flip the value of the 3rd bit in each character
// if the character is a letter. This is done with XOR against a space (hex 20, decimal 32)
$stringmask = preg_replace("/[^a-z]/i", chr(0), $input); // replace non-letters with NULL
$stringmask = preg_replace("/[a-z]/i", ' ', $stringmask); // replace letters with space
return $input ^ $stringmask;


The method here is to generate a string mask, in two stages, that will act as a bitmask to XOR the 3rd bit of every letter in the string.  Stage 1 is to replace all non-letters with a NULL byte (all zeros) and Stage 2 is to replace all letters with a space (ASCII 32) which just happens to be a byte with just the 3rd bit (counting from the most significant end) set to 1 i.e. 00100000
All we have to do then is XOR our input with the string mask and magically the case of every letter in the entire string is flipped.

Solution #2:

 

Code:

return preg_replace('/[a-z]+/ie', '\'$0\' ^ str_pad(\'\', strlen(\'$0\'), \' \')', $input);


Much more compact, this works by using a regex looking for letters with the i (case insensitive) modifier and, most importantly, the e (evaluate) modifier so we can replace by executing php code.  In this case, we look for batches of letters and replace each batch with itself XORed with a string of spaces (of the same length).  Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7, where preg_replace_callback() takes its place.
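
For readers on PHP 7 or later, the same replacement can be sketched with preg_replace_callback(). invert_case_cb() is a hypothetical name and not the function published in str_case.phps.

```php
<?php
// Sketch of the same idea without the /e modifier, which was removed in PHP 7.
// invert_case_cb() is a hypothetical name, not the function from str_case.phps.
function invert_case_cb(string $input): string
{
    return preg_replace_callback(
        '/[a-z]+/i',
        function (array $m): string {
            // XOR each run of letters with an equal-length run of spaces
            return $m[0] ^ str_repeat(' ', strlen($m[0]));
        },
        $input
    );
}

echo invert_case_cb('Hello, World 123'), "\n";   // prints "hELLO, wORLD 123"
```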

Similar principles apply to the random case example, but we complicate this slightly by adding an invert mask (to the solution 1 method). This invert mask is created by taking a random number of spaces (between 0 and the size of the input string). We then pad this out to the size of the original string with NULL bytes and finally randomise the order with str_shuffle().  We then bitwise AND the stringmask and the invertmask so we create a new mask where letters in the mask randomly have spaces or NULLs.  We then XOR this with the original string as before and before you know it you have a randomly capitalised string.
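
Put together, the steps in that paragraph can be sketched as follows. This is my reading of the description, written as a standalone function with the hypothetical name rand_case_mask(); the published code may differ in detail.

```php
<?php
// Sketch of the mask-based random case described above.
// rand_case_mask() is a hypothetical name; the published code may differ.
function rand_case_mask(string $input): string
{
    $len = strlen($input);
    // Letter mask: space where $input has a letter, NUL elsewhere
    $stringmask = preg_replace('/[^a-z]/i', chr(0), $input);
    $stringmask = preg_replace('/[a-z]/i', ' ', $stringmask);
    // Invert mask: a random number of spaces, padded with NULs, then shuffled
    $invertmask = str_pad(str_repeat(' ', rand(0, $len)), $len, chr(0));
    $invertmask = str_shuffle($invertmask);
    // Only letters picked by the invert mask get their case bit flipped
    return $input ^ ($stringmask & $invertmask);
}
```
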
The Solution 2 version requires you to remove the + so that we only match a single letter at a time (or else our randomly chosen case would apply to words at a time), and we use a ternary to randomly decide on using a space or a NULL:

 

Code:

return preg_replace('/[a-z]/ie', '(rand(0,1) ? \'$0\' ^ \' \' : \'$0\')', $input);


I hope this has been a worthwhile read and I would certainly welcome feedback on this article.