Five Things You Didn't Know You Could Do with Perl
Martin Brown
Even after all these years, Perl still strikes many people as being a language primarily for processing and parsing data and textual information, whether that's offline or online through a Web site.
However, despite its history and its more common and widespread uses, Perl is a very capable general-purpose language with which we can perform a wide range of tasks. In this article, I show you some of the less obvious uses of Perl, some of which may surprise you.
Cataloging PDFs
I use Acrobat documents a lot. As a writer, I exchange vast quantities of information with companies and organizations in PDF format, and I also create my own PDFs of other documents and Web sites. I do this because PDFs are easy to use across a range of different machines and, more importantly, easier to search using the Acrobat Catalog extension. Acrobat documents also have properties that describe the content, author, subject matter, and a number of keywords. You can use this information to help catalog your documents, and it is also a handy way of identifying files when using the built-in search system.
For additional convenience, I also put my Acrobat documents on a Web server, so that I can access them from any machine without having to worry about mounting the volume. Rather than relying on the file names, which are not always helpful, I use a small Perl CGI script that extracts the Acrobat property information and uses it to list the files so that they are easy to locate.
You can see a simpler version of the script below. It uses the PDF::API2 module to load each PDF document and extract its property information. We then combine this information into a single hash, keyed by file name, and print it out as an HTML page containing a table.
use strict;
use warnings;
use CGI qw/:standard/;
use PDF::API2;

# Collect the document properties (Title, Author, Subject, ...) of every
# PDF in the current directory, keyed by file name
my $info = {};
foreach my $file (glob("*.pdf")) {
    my $pdf = PDF::API2->open($file);
    my %infohash = $pdf->info();
    $info->{$file} = \%infohash;
}

# Accept only an exact column name; anything else sorts by Title
my $sortby = param('sortby');
$sortby = 'Title'
    unless (defined($sortby) && $sortby =~ m/^(Title|Author|Subject)$/);

print header('text/html');
print <<EOF;
<html>
<head>
<title>PDF Library</title>
</head>
<body bgcolor="white" text="black">
<h1>PDF Library</h1>
<table cellpadding=5 cellspacing=5 border=0>
<tr>
<td><b><a href="/dumppdf.cgi?sortby=Title">Title</a></b></td>
<td><b><a href="/dumppdf.cgi?sortby=Author">Author</a></b></td>
<td><b><a href="/dumppdf.cgi?sortby=Subject">Subject</a></b></td>
</tr>
EOF

# Missing properties sort as empty strings, avoiding undef warnings
foreach my $file (sort { ($info->{$a}->{$sortby} // '') cmp ($info->{$b}->{$sortby} // '') }
                  keys %{$info}) {
    my %p = %{$info->{$file}};
    # escapeHTML stops titles containing & or < from breaking the page
    printf('<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td></tr>' . "\n",
           escapeHTML($file),
           escapeHTML($p{Title}   // 'No Title'),
           escapeHTML($p{Author}  // 'No Author'),
           escapeHTML($p{Subject} // 'No Subject'));
}

print <<EOF;
</table>
</body>
</html>
EOF
You'll see a sample of this in action in the figure. For convenience, the script also provides a direct link to the PDF so that if necessary I can view the PDF within my browser just by clicking on it. The headings of the table are also clickable; click on the header to order the files by that property.
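The table depends on two pieces of logic: choosing a sort field from the request, and ordering the hash of property hashes by that field. Both can be pulled out into small subroutines. The sketch below is an illustration rather than the author's script. The names `choose_sort_field` and `sort_by_field` and the sample data are hypothetical, and plain hashes stand in for PDF::API2 so the code runs without a PDF library. An unanchored pattern such as `m/(Title|Author|Subject)/` would also accept a value like `SubjectX`, so the sketch uses an exact-match hash lookup instead.

```perl
use strict;
use warnings;

# Hypothetical helpers; %$info maps file name => { Title => ..., Author => ... }
my %SORTABLE = map { $_ => 1 } qw(Title Author Subject);

# Accept the requested field only on an exact match; otherwise use Title.
sub choose_sort_field {
    my ($requested) = @_;
    return (defined $requested && $SORTABLE{$requested}) ? $requested : 'Title';
}

# Return the file names ordered by one property. A missing property sorts
# as an empty string, and ties are broken by file name.
sub sort_by_field {
    my ($info, $field) = @_;
    return sort {
        ($info->{$a}{$field} // '') cmp ($info->{$b}{$field} // '')
            or $a cmp $b
    } keys %$info;
}

# Example data standing in for what PDF::API2's info() would return
my %info = (
    'b.pdf' => { Title => 'Perl Basics', Author => 'Smith' },
    'a.pdf' => { Title => 'Awk Tricks',  Author => 'Jones' },
    'c.pdf' => { Author => 'Brown' },
);
print join(' ', sort_by_field(\%info, choose_sort_field('Title'))), "\n";
```

With this data the file without a title comes first, so the script prints `c.pdf a.pdf b.pdf`.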
Archiving Files Intelligently
Frequently, I find myself building up a tar file based on some very specific conditions. For example, I want to create an archive containing only the files that have changed since a specific date and time. Specifying that time directly can be awkward, so it's easier to use the modification time of another file as the reference. I also use CVS to manage my projects, but I don't always want to include the CVS information in my archive.
Therefore, I created the following script. It combines the Archive::Tar module, which creates tar archives, with the output of the find2perl script, which generates a Perl version of a Unix find command. I then modified the result slightly to suit my needs.
The flexibility of Perl means I could just as easily have created a Zip file, or modified the find function to make a different selection from the files it finds.
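As an example of a different selection, the sketch below keeps the "newer than a reference file" test. It skips CVS directories by setting File::Find's `$File::Find::prune` variable, so `find` never descends into them, instead of matching `CVS` against every path. The subroutine name `files_newer_than` is hypothetical. Treating a missing reference file as "archive everything" is an assumption made here.

```perl
use strict;
use warnings;
use File::Find;

# Return the plain files under @dirs modified more recently than $reference.
# If $reference does not exist, its time is taken as 0 so every file matches.
sub files_newer_than {
    my ($reference, @dirs) = @_;
    my $since = (stat $reference)[9] // 0;
    my @found;
    find(sub {
        # Don't descend into CVS metadata directories at all
        if (-d $_ && $_ eq 'CVS') {
            $File::Find::prune = 1;
            return;
        }
        return unless -f $_;
        # Skip dot files and editor backups (foo~ and #foo#)
        return if /^\./ or /~$/ or /^#.*#$/;
        # (stat _) reuses the stat buffer filled by the -f test above
        push @found, $File::Find::name if (stat _)[9] > $since;
    }, @dirs);
    return @found;
}
```

For instance, `files_newer_than('.lastarchive', '.')` returns the same kind of list the script collects in `@filelist`, ready to hand to Archive::Tar.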
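use strict;
use warnings;
use Archive::Tar;
use File::Find;

# Current date and time, used to build the archive name
my ($min,$hour,$day,$mon,$year) = (localtime())[1..5];
$mon++;
$year += 1900;

unless (@ARGV >= 3) {
    print <<EOF;
Usage: $0 basename compare directories...
where basename is prefixed to the current date and time
compare is the name of the file you want to use as the modification time reference
EOF
    exit(1);
}

my $base = shift;
my $compare = shift;

# Modification time of the reference file (0 if it doesn't exist yet,
# so everything is archived on the first run)
my $compare_time = (stat($compare))[9] // 0;

# Rewrite the reference file so its modification time becomes "now",
# ready for the next run
open(my $fh, '>', $compare) or die "Can't update comparison file, $compare: $!\n";
print $fh $compare_time;
close($fh);

my @filelist;
find(\&wanted, @ARGV);

if (@filelist) {
    my $arcname = sprintf("%s.%04d%02d%02d.%02d%02d.tar.gz",
                          $base,$year,$mon,$day,$hour,$min);
    print("Writing ",scalar @filelist," files to $arcname\n");
    my $archive = Archive::Tar->new();
    $archive->add_files(@filelist);
    # A true second argument writes a gzip-compressed archive
    $archive->write($arcname,1) or die "Can't write $arcname: ", $archive->error, "\n";
}
exit;

# Called by File::Find for each entry; $_ is the name within the current directory
sub wanted {
    my $full = $File::Find::name;
    return unless (-f $_);
    # Skip dot files, editor backups (foo~, #foo#) and anything under a CVS directory
    return if (/^\./ or /~$/ or /^\#.*\#$/ or $full =~ m{(^|/)CVS/});
    # Don't include archives written by earlier runs
    return if /^\Q$base\E\.\d{8}\.\d{4}\.tar\.gz$/;
    my $mtime = (stat($_))[9];
    return unless ($mtime > $compare_time);
    print "Adding $full to archive\n";
    push @filelist,$full;
}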
To create a new archive, type:
$ archive.pl update .lastarchive .
The archive name automatically includes the date and time, in the form basename.YYYYMMDD.HHMM.tar.gz, making it easy to identify when the archive was generated.
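The naming and writing steps can also be expressed with POSIX's `strftime`, which formats the whole timestamp in one call, and Archive::Tar's `add_files` and `write` methods. This is a sketch, not the script's own code; `archive_name` and `write_tgz` are hypothetical names. Because every field is zero-padded to a fixed width, sorting the archive names alphabetically also sorts them chronologically.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Archive::Tar;

# Build BASE.YYYYMMDD.HHMM.tar.gz for the given epoch time (default: now),
# the same layout the script produces with sprintf and localtime.
sub archive_name {
    my ($base, $time) = @_;
    $time //= time;
    return $base . strftime('.%Y%m%d.%H%M.tar.gz', localtime($time));
}

# Write @files (paths relative to the current directory) to a
# gzip-compressed tar archive; returns the number of files added.
sub write_tgz {
    my ($arcname, @files) = @_;
    my $tar = Archive::Tar->new;
    my @added = $tar->add_files(@files);
    # A true second argument asks write() for gzip compression
    $tar->write($arcname, 1)
        or die "Can't write $arcname: ", $tar->error, "\n";
    return scalar @added;
}
```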
Organizing your MP3s
If you've moved on from CDs and now have all your music in digital format, then you're probably using the MP3 format to store it. MP3s are a great way of storing music, but organizing them can be a complex process, especially if you want to use them outside of a managed environment like iTunes, WinAMP or MusicMatch Jukebox. Or you might just be like me and like to keep your files organized.
Doing this by hand is obviously a nightmare, but we can make use of the MP3 tags: data built into the MP3 file that MP3 players use to display the track information. In Perl, the MP3::Info module extracts this data, which we can then use to identify the artist and album and file each track into a matching folder structure. You can see a script for doing this below.
use strict;
use warnings;
use MP3::Info;
use File::Copy;
use File::Basename;
use File::Spec::Functions;
use File::Path;

foreach my $file (@ARGV) {
    # Read the tag (a hash reference with ARTIST, ALBUM and so on)
    my $tag = get_mp3tag($file);
    unless (defined($tag)) {
        warn "No tag in $file\n";
        next;
    }
    if (defined($tag->{ALBUM}) && defined($tag->{ARTIST})) {
        # Target folder: Artist/Album
        my $directory = catdir($tag->{ARTIST}, $tag->{ALBUM});
        mkpath($directory, 0, 0777) unless (-d $directory);
        # Rename to the first number in the file name, e.g. "03 - Song.mp3" => "03.mp3"
        my $newfile = basename($file);
        $newfile =~ s/^.*?(\d+).*(\.mp3)$/$1$2/i;
        my $newloc = catfile($directory, $newfile);
        if (move($file, $newloc)) {
            printf("Moved %40s to %40s\n", $file, $newloc);
        } else {
            warn "Error moving $file: $!\n";
        }
    }
}
For portability, I used standard modules to create folders and paths, and to actually move the file. You should be able to use this script on any platform supported by Perl.
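If MP3::Info isn't available, the older ID3v1 tag format is simple enough to read directly. It is a fixed 128-byte block at the end of the file. The block starts with the letters `TAG` and is followed by 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a one-byte genre. ID3v1.1 stores a track number in the last comment byte, preceded by a zero byte.

The sketch below reads that block with core Perl only. A hypothetical `target_path` helper then works out where a track would be moved. Some limits and assumptions:

- It ignores ID3v2 tags, which MP3::Info also handles.
- Replacing characters such as `/` and `:` in artist or album names is an assumption added here. It stops names like AC/DC from creating extra folder levels.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec::Functions qw(catdir catfile);

# Read an ID3v1 tag from the last 128 bytes of $file. Returns a hash
# reference (TITLE, ARTIST, ALBUM, YEAR and, for ID3v1.1, TRACKNUM) or
# undef if the file has no ID3v1 tag.
sub read_id3v1 {
    my ($file) = @_;
    open(my $fh, '<', $file) or return;
    binmode($fh);
    seek($fh, -128, 2) or return;    # 2 = relative to end of file
    my $bytes = read($fh, my $buf, 128);
    close($fh);
    return unless defined $bytes && $bytes == 128;
    # 'A' fields have trailing spaces and NULs stripped on unpack
    my ($magic, $title, $artist, $album, $year, undef, $zero, $track)
        = unpack('A3 A30 A30 A30 A4 A28 C C', $buf);
    return unless $magic eq 'TAG';
    my %tag = (TITLE => $title, ARTIST => $artist, ALBUM => $album, YEAR => $year);
    # ID3v1.1: a zero byte followed by a non-zero byte holds the track number
    $tag{TRACKNUM} = $track if $zero == 0 && $track != 0;
    return \%tag;
}

# Make a tag value safe to use as a single folder name.
sub safe_name {
    my ($name) = @_;
    $name = '' unless defined $name;
    $name =~ s{[/\\:*?"<>|]}{_}g;    # characters not allowed on some platforms
    $name =~ s/^\s+|\s+$//g;
    return length($name) ? $name : 'Unknown';
}

# Hypothetical helper: Artist/Album/NN.mp3, where NN is the first number
# in the original file name.
sub target_path {
    my ($tag, $file) = @_;
    my $name = basename($file);
    $name =~ s/^.*?(\d+).*(\.mp3)$/$1$2/i;
    return catfile(catdir(safe_name($tag->{ARTIST}), safe_name($tag->{ALBUM})), $name);
}
```

Because `catdir` and `catfile` supply the platform's own separator, the same code builds `AC_DC/Back in Black/03.mp3` on Unix and the equivalent backslash path on Windows.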
Generating Graphs
Perl is often used to manipulate information, and one of the best ways to present numerical data is through a graph. But did you realize you could generate those graphs from within Perl?
Using GD::Graph, we can take a simple array of values and turn it into a graph. In the sample below I've imported the data from a file, but you could just as easily generate the information from a database table or any other data source. I also generate a file, simply called graph.png, but you could modify the script to send the graph straight to the browser as part of a Web page.
The GD::Graph module is both very flexible and self-managing. You can specify as much or as little as you like when defining the format of the graph, including setting minimum and maximum values and configuring how many X-axis labels to show (or skip). Alternatively, you can specify nothing and let GD::Graph choose suitable values for you.
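use strict;
use warnings;
use GD;
use GD::Graph::lines;

# Read "date:sales" lines from the file named on the command line
my (@dates,@sales);
open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
while (my $line = <$in>) {
    chomp $line;
    next unless length $line;    # ignore blank lines
    my ($d,$s) = split /:/,$line;
    push @dates,$d;
    push @sales,$s;
}
close($in);

my $my_graph = GD::Graph::lines->new(600,480);
# gdGiantFont is one of GD's built-in fonts
$my_graph->set_title_font(gdGiantFont,24);
$my_graph->set_x_label_font(gdGiantFont,14);
$my_graph->set_y_label_font(gdGiantFont,14);
$my_graph->set('x_label' => 'Date',
               'y_label' => 'Sales',
               'title'   => 'Sales by date',
               ) or warn $my_graph->error;

# plot() returns a GD::Image, or undef on failure
my $gd = $my_graph->plot([\@dates,\@sales]) or die $my_graph->error;
open(my $out, '>', 'graph.png') or die "Can't write graph.png: $!\n";
binmode($out);    # PNG data is binary
print $out $gd->png;
close($out);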
You can see a sample of the graph generated by the above script, and a simple data file (using colon-separated dates and sales figures). You can see how simple and straightforward the process is with GD::Graph: most of the code is actually devoted to loading the data and setting the fonts for the labels.
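Since most of the work is loading the data, that step and the axis settings described earlier can be factored out. The sketch below uses two hypothetical subroutines:

- `read_sales` reads the colon-separated file into the two array references that GD::Graph's `plot()` expects.
- `graph_options` computes values for the GD::Graph options `x_label_skip`, `y_min_value` and `y_max_value`.

The limit of 10 labels, starting the Y axis at zero and the 10% headroom are arbitrary choices. Only core modules are used, so the sketch runs without GD installed; the returned pairs would be passed to `$my_graph->set(...)`.

```perl
use strict;
use warnings;
use List::Util qw(max);
use POSIX qw(ceil);

# Read "date:sales" lines from a filehandle into two parallel array
# references, skipping blank lines.
sub read_sales {
    my ($fh) = @_;
    my (@dates, @sales);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my ($date, $amount) = split /:/, $line, 2;
        push @dates, $date;
        push @sales, $amount;
    }
    return (\@dates, \@sales);
}

# Suggest GD::Graph settings: show at most $max_labels X-axis labels and
# scale the Y axis from 0 to the largest value plus 10% headroom.
# Assumes at least one non-negative sales figure.
sub graph_options {
    my ($dates, $sales, $max_labels) = @_;
    my $skip = ceil(@$dates / $max_labels) || 1;
    return (
        x_label_skip => $skip,
        y_min_value  => 0,
        y_max_value  => ceil(max(@$sales) * 11 / 10),
    );
}

# Usage (with GD::Graph loaded):
#   open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
#   my ($dates, $sales) = read_sales($in);
#   $my_graph->set(graph_options($dates, $sales, 10));
#   my $gd = $my_graph->plot([$dates, $sales]);
```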
Controlling Your Home
Up to now we've looked at solutions that use the power of the modules available through CPAN to help us write specific scripts. It is the power of the Perl module system that gives Perl, and our scripts, a lot of their appeal.
Now let's look at a complete application. MisterHouse integrates with the X10 system to help control, manage and monitor your home. X10 is a system that uses the electrical wiring of your house or office to communicate information. Using the power cables together with special plugs and sockets, you can both communicate information and control components in your house.
For example, you can use a special socket to control the power to any item with a plug, like a lamp or radio. Other devices include remote controls and infra-red sensors (as used in alarms) to monitor the status of different rooms. You can combine these, for example to switch on the light when a sensor detects your presence, or connect a computer to the system, letting your computer monitor the sensors and control the remote switches.
This is where MisterHouse comes in. MisterHouse is an application that interfaces with the X10 system. It provides a simple status application, which you can see in the figure below, that also lets you interact with, directly control and monitor the different components in your X10 system. You can use either a Tk-based application or a Web interface, which you can see in the screenshot below, taken from the Web site.
The real flexibility and power of MisterHouse comes from being written in Perl. It means we can write our own controls and reactions to events just by writing some Perl code. For example, you could create a simple instruction to start the dishwasher automatically at 10pm each night using:
$dishwasher = new X10_Item('B1');
set $dishwasher ON if time_now '10:00 PM';
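X10 devices are addressed by a house code letter (A to P) and a unit number (1 to 16). The `'B1'` passed to `X10_Item` above therefore means house code B, unit 1. The small stand-alone validator below sketches that addressing scheme. `parse_x10_address` is a hypothetical helper, not part of the MisterHouse API.

```perl
use strict;
use warnings;

# Hypothetical helper (not MisterHouse code): split an X10 address such as
# 'B1' into its house code (A-P) and unit number (1-16).
# Returns an empty list if the address is not valid.
sub parse_x10_address {
    my ($addr) = @_;
    return unless defined $addr && $addr =~ /^([A-P])(\d{1,2})$/i;
    my ($house, $unit) = (uc $1, $2 + 0);
    return unless $unit >= 1 && $unit <= 16;
    return ($house, $unit);
}

for my $addr (qw(B1 p16 Q1 B17)) {
    my @parts = parse_x10_address($addr);
    my $msg = @parts ? "$addr: house $parts[0], unit $parts[1]\n"
                     : "$addr: not a valid X10 address\n";
    print $msg;
}
```

Running it reports house B, unit 1 for `B1` and house P, unit 16 for `p16`. It rejects `Q1` (no such house code) and `B17` (unit out of range).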
Copyright © 2005 Ziff Davis Media Inc. All Rights Reserved. Originally appearing in Dev Source.