tail -# file, in PHP

I read Kevin’s Read Line from File article today and thought I would add some sample code I wrote a while back to address a similar task: how to perform the equivalent of “tail -# filename” in PHP.

A typical task in some applications, such as a shoutbox or a simple log file viewer, is to extract the last ‘n’ lines from a (very) large file efficiently. There are several approaches one can take (as always in programming, there are many ways to skin the proverbial cat): take the easy way out and shell out to run the tail command, or read the file line by line, keeping the last ‘n’ lines in memory in a queue.
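For comparison, the queue approach might look something like the minimal sketch below. The function name tail_by_queue() is hypothetical and not part of the original code. Note that it reads every line of the file, which is exactly the cost the seek-based method avoids on very large files.

```php
<?php
// Hypothetical baseline: read the whole file line by line and keep only
// the last $n lines in a bounded queue (a plain array used as a FIFO).
function tail_by_queue(string $file, int $n): array
{
    $fp = fopen($file, 'r');
    if ($fp === false) {
        return [];
    }
    $queue = [];
    // compare with false so a final line of "0" is not mistaken for EOF
    while (($line = fgets($fp)) !== false) {
        $queue[] = $line;
        if (count($queue) > $n) {
            array_shift($queue); // drop the oldest line
        }
    }
    fclose($fp);
    return $queue;
}
```

This is simple and correct, but its running time grows with the size of the whole file rather than with the number of lines wanted.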

However, here I will demonstrate a fairly tuned method: seek to a point near the end of the file and read forward from there to the end, collecting lines. If too few lines come back, it incrementally looks further back in the file until either it can look no further or enough lines are returned.

Assumptions need to be made in order to tune the algorithm for two competing goals:

  • reading in enough lines
  • doing as few reads as possible

My approach is to provide an approximation of the average size of a line, $linelength, and multiply it by ‘n’, $linecount. That gives the byte $offset from the end of the file that we will seek to before we begin reading.

One check we need to do right away is that this offset doesn’t point to before the beginning of the file; if it does, we override $offset to match the file size.

Also, as I’m being over-cautious, I compute the offset for $linecount + 1 lines. The main reason is that when seeking to a specific byte location in the file, we would have to be very lucky to land on the first character of a line, so we must perform an fgets() and throw that partial line away.
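A minimal sketch of those two steps, using hypothetical helper names initial_offset() and seek_to_tail() that do not appear in the original source: compute the clamped starting offset, then seek back from the end and discard the partial first line unless we are already at the start of the file.

```php
<?php
// Starting offset: ($linecount + 1) lines of $linelength bytes each,
// clamped so we never seek before the beginning of the file.
function initial_offset(int $linecount, int $linelength, int $fsize): int
{
    // the extra line covers the partial line we discard after seeking
    return min(($linecount + 1) * $linelength, $fsize);
}

// Seek $offset bytes back from the end and skip the (probably partial)
// first line, unless the offset takes us right to the start of the file.
function seek_to_tail($fp, int $offset, int $fsize): void
{
    fseek($fp, -$offset, SEEK_END);
    if ($offset !== $fsize) {
        fgets($fp); // throw away the partial line
    }
}

// With the demo values, 10 lines at an estimated 160 bytes each:
echo initial_offset(10, 160, 1000000), "\n"; // 1760
echo initial_offset(10, 160, 500), "\n";     // 500 (clamped to the file size)
```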

Ideally, it will be able to read ‘n’ lines forward from the offset given; if that proves insufficient, it grows the offset by 10% and tries again. I also want the algorithm to tune itself if we grossly underestimate what $linelength should be. To do this, we track the string length of each line we do get back and adjust the offset accordingly.

In our example, let’s try reading the last 10 lines from Apache’s access_log.

So let’s look at the code so far. Nothing interesting yet; we’re just prepping for the interesting stuff:

$linecount  = 10;  // Number of lines we want to read
$linelength = 160; // Apache's logs are typically ~200+ chars
// I've set this to < 200 to show the dynamic nature of the algorithm
// offset correction.
$file = '/usr/local/apache2/logs/access_log.demo';
$fsize = filesize($file);

// check if file is smaller than possible max lines
$offset = ($linecount+1) * $linelength;
if ($offset > $fsize) $offset = $fsize;

Next up, we open the file and, using our method of seeking to the end of the file less our offset, read the lines back. Here is the meat of our routine:

$fp = fopen($file, 'r');
if ($fp === false) exit;

$lines = array(); // array to store the lines we read.

$readloop = true;
while ($readloop) {
  // we will finish reading when we have read $linecount lines, or the file
  // just doesn't have $linecount lines

  // seek to $offset bytes from the end of the file
  fseek($fp, -$offset, SEEK_END);

  // discard the first line as it won't be a complete line
  // unless we're right at the start of the file
  if ($offset != $fsize) fgets($fp);

  // tally of the number of bytes in each line we read
  $linesize = 0;

  // read from here till the end of the file and remember each line
  // (compare with false so a final line of "0" isn't mistaken for EOF)
  while (($line = fgets($fp)) !== false) {
    array_push($lines, $line);
    $linesize += strlen($line); // total up the char count

    // if we've been able to get more lines than we need
    // lose the first entry in the queue
    // Logically we should decrement $linesize too, but if we
    // hit the magic number of lines, we are never going to use it
    if (count($lines) > $linecount) array_shift($lines);
  }

  // We have now read all the lines from $offset until the end of the file
  if (count($lines) == $linecount) {
    // perfect - have enough lines, can exit the loop
    $readloop = false;
  } elseif ($offset >= $fsize) {
    // file is too small - nothing more we can do, we must exit the loop
    $readloop = false;
  } elseif (count($lines) < $linecount) {
    // try again with a bigger offset
    $offset = intval($offset * 1.1);  // increase offset 10%
    // but also work out what the offset could be if based on the lines we saw
    // (skipped if no complete line was read, to avoid dividing by zero)
    if (count($lines) > 0) {
      $offset2 = intval($linesize / count($lines) * ($linecount+1));
      // and if it is larger, then use that one instead (self-tuning)
      if ($offset2 > $offset) $offset = $offset2;
    }
    // Also remember we can't seek back past the start of the file
    if ($offset > $fsize) $offset = $fsize;
    echo 'Trying with a bigger offset: ', $offset, "\n";
    // and reset
    $lines = array();
  }
}
fclose($fp);

// Let's have a look at the lines we read.
print_r($lines);

At first glance it might seem like overkill for the task; however, stepping through the code you can see the expected while loop with fgets() reading each line. The only other things we do at this stage are shifting the first line off the $lines array if we happen to read too many lines, and tallying up how many characters we managed to read across those lines.

If we exit the while/fgets loop with the correct number of lines, all is well: we can exit the main retry loop, and the result is in $lines.

Where the code gets interesting is what we do if we don’t get the required number of lines. The simple fix is to step back a further 10% by increasing the offset and trying again. But remember that we also counted the number of bytes in each line we did get, so we can easily obtain the real average line size for this file by dividing that total by the number of lines in our $lines array. This lets us override the previous offset with something larger if we were indeed wildly off in our estimate.
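The retry rule can be isolated into a small helper so it is easy to test on its own. This is a sketch, not the author’s code: next_offset() is a hypothetical name, and unlike the original loop it always grows by at least one byte, so very small offsets (where a 10% increase rounds down to nothing) still make progress.

```php
<?php
// Hypothetical helper: grow the offset by 10%, or jump to an estimate
// based on the real average line length if that is larger, then clamp
// to the file size.
function next_offset(int $offset, int $bytesRead, int $linesRead,
                     int $linecount, int $fsize): int
{
    // +10%, but always at least one byte more so tiny offsets still grow
    $next = max($offset + 1, intval($offset * 1.1));
    if ($linesRead > 0) {
        // average real line length times the lines we need, plus one spare
        $estimate = intval($bytesRead / $linesRead * ($linecount + 1));
        $next = max($next, $estimate);
    }
    // we can't seek back past the start of the file
    return min($next, $fsize);
}

// 5 lines totalling 1100 bytes means 220 bytes per line, so the
// estimate (220 * 11 = 2420) beats the 10% growth (1936).
echo next_offset(1760, 1100, 5, 10, 1000000), "\n"; // 2420
```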

By adjusting the offset and letting the loop repeat, the routine keeps trying until it either succeeds or fails gracefully because the file doesn’t have enough lines, in which case it returns what it could get.
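Putting the pieces together, the whole routine could be wrapped in a reusable function. This is a sketch under assumptions: tail_file() and its default $linelength of 160 are hypothetical choices, it returns the lines instead of printing them, and it reads the size with fstat() on the open handle rather than filesize(), so a growing log file’s size is not taken from PHP’s stat cache.

```php
<?php
// Hypothetical wrapper around the seek-back-and-retry method described
// in this article. Returns up to $linecount lines from the end of $file.
function tail_file(string $file, int $linecount, int $linelength = 160): array
{
    $fp = is_readable($file) ? fopen($file, 'r') : false;
    if ($fp === false) {
        return [];
    }
    $fsize = fstat($fp)['size']; // not subject to PHP's stat cache

    // initial guess: $linecount + 1 lines of $linelength bytes, clamped
    $offset = min(($linecount + 1) * $linelength, $fsize);

    while (true) {
        fseek($fp, -$offset, SEEK_END);
        if ($offset != $fsize) {
            fgets($fp); // discard the partial first line
        }

        $lines = [];
        $linesize = 0;
        while (($line = fgets($fp)) !== false) {
            $lines[] = $line;
            $linesize += strlen($line);
            if (count($lines) > $linecount) {
                array_shift($lines); // keep only the newest $linecount
            }
        }

        // done: enough lines, or nothing further back to read
        if (count($lines) >= $linecount || $offset >= $fsize) {
            break;
        }

        // grow by 10% (at least one byte), or to the size suggested by
        // the average length of the lines we did read, if that is larger
        $next = max($offset + 1, intval($offset * 1.1));
        if (count($lines) > 0) {
            $next = max($next, intval($linesize / count($lines) * ($linecount + 1)));
        }
        $offset = min($next, $fsize);
    }
    fclose($fp);
    return $lines;
}
```

Called as tail_file('/usr/local/apache2/logs/access_log.demo', 10), it should return the same lines as the standalone code, without the diagnostic echo.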

For the complete, working source code, please visit:

http://pgregg.com/projects/php/code/tail-10.phps

Sample execution:

http://pgregg.com/projects/php/code/tail-10.php

Comment: Why Firefox is failing in the corporate environment.

I’ve sat on this article for a number of years, hoping against hope that the Firefox development team would get off their elite self-indulgent asses and realise that, guess what? – the world doesn’t work the way they think it should.

Don’t get me wrong, I love Firefox. I use it daily for nearly all of my web browsing needs, but there is just one little problem – a massive little problem – and that is why I am writing this article.

Most articles on this subject tend to focus on the lack of IT department deployment and management tools for rolling out Firefox, but that isn’t the issue. Really?
So what is it then?

The answer is very, very simple: Firefox does not work on a real-world company Intranet.  There, I said it. 

Really, it doesn’t. The Firefox development team have decided, in their infinite security wisdom, that links from one protocol (e.g. http://intranet) to a local one (e.g. file://server/expense_claim.xls) are so bad that they won’t even put out a warning.

I feel it is bad enough that it doesn’t work, but failing silently, with no alert box, no option saying “Yes, I know I’m risking my life, but really, do let me click this link”, and no way to put file://intranet into the trusted domain, is the root cause of why Firefox will never be accepted as a corporate browser.

IT departments just do not want to deal with the question “Why doesn’t the link to the document work?”. The simplest answer for the IT department is “We only support Internet Explorer”.

Any amount of Firefox protestations saying “Oh! but you shouldn’t be running your Intranet like that.” is not going to change the real-world Intranets, and ultimately it keeps pushing Firefox back from acceptance into the Corporate world.

Until Firefox can be used the way real users want to use it, IT departments will continue to push that reliable old line that they only support IE.

Welcome to the real world.

https://bugzilla.mozilla.org/show_bug.cgi?id=84128
https://bugzilla.mozilla.org/show_bug.cgi?id=122022

TinyURL PHP “flaw” ?

The Register is running a story today, “TinyURL, your configs are showing”, which points out that TinyURL has a /php.php page displaying the output of phpinfo().

The article then goes on to make some scary-sounding claims from security consultant Rafal Los: “Why would you want to run a web service as ‘Administrator’ because if I figure out a way to jack that service, I completely, 100% own that machine.” and “More importantly… why is this server running as ROOT:WHEEL?!”

Sorry Rafal – but you appear to have no idea how web servers work, or all that much about (web) security.

All Unix-based web servers start as root if they want to bind to the privileged (and default) port 80, after which they switch to the configured UID for request handling. So, right there, go all of Rafal’s claims about pwning the machine.

Check your own server: the _SERVER and _ENV values reflect the starting shell/environment, which just happens to be root. In other words, there is nothing wrong with the settings. Having said that, they do have register_globals turned on, which isn’t ideal, but it isn’t a gaping hole if the underlying PHP code is written safely.

Also, to TinyURL’s credit, they are running the Suhosin patch to harden their server. They’re also running the latest production PHP (which is more than I can say). Granted, they probably don’t want to be exposing phpinfo(), but this is all just an overblown storm in a teacup.