Monday 30 July 2007

ImageMagick memory problems

I noticed my Windows development machine had slowed to a crawl recently, and I assumed it was something to do with the new plugins to convert images that are in development. Turns out it was, but not in an obvious way. Because the plugin requires ImageMagick, any EPrints process will load ImageMagick; this includes the scheduled task to run the indexer which I had installed and promptly forgotten about.

It looks as though, because I had installed ImageMagick after I created this task, the task's environment didn't include the paths the Perl module needs to load correctly. It could find the Perl component, but not the XS part. Because of the way the module autoloads unknown functions, this caused an infinite recursion every time the indexer ran, which Perl eventually caught, but not before it had used 2GB of my swap space and all my physical memory.
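I haven't traced the exact call chain inside the Perl module, so the following is only a rough analogy in Python, not the actual code. It assumes a fallback hook for unknown names that relies on a helper which is itself missing (as it would be if the compiled half never loaded). The class and method names are invented for illustration.

```python
class ImageWrapper:
    """Hypothetical stand-in for a module whose compiled half failed to load."""

    def __getattr__(self, name):
        # Python calls __getattr__ only for names that are not defined normally.
        # The fallback asks a helper to resolve the name, but the helper would
        # have come from the compiled part, so it is unknown as well and the
        # lookup lands straight back in __getattr__.
        return self._resolve(name)


def triggers_runaway_recursion() -> bool:
    """Return True if looking up an unknown method recurses until the limit."""
    try:
        ImageWrapper().resize
    except RecursionError:
        # The interpreter stops the recursion at its depth limit.
        return True
    return False
```

Python gives up at a fixed recursion depth, so the damage here is small. The point is only that a fallback for unknown functions must never depend on something that might itself be unknown.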

Recreating the task (in fact, just changing a property and changing it back) seems to have got rid of the problem. Maybe tasks store the path from the environment when they are created. This just shows why scheduling the indexer has to be done carefully; the Unix version has a complicated script to check that everything's running successfully, which I'm hoping Task Scheduler can duplicate under Windows.
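One cheap safeguard would be for the scheduled job to check its own environment before doing any work. This is a minimal sketch in Python, assuming the only thing that matters is whether certain directories appear on PATH; the function name and the directories in the usage note are hypothetical, and this is not what the Unix script does.

```python
import os


def missing_path_entries(required_dirs, path_value=None):
    """Return the required directories that do not appear on the search path.

    path_value defaults to the PATH seen by the current process, which is
    the environment the scheduled task actually runs with.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p):
        # Normalise case and separators so equivalent spellings compare equal.
        return os.path.normcase(os.path.normpath(p))

    present = {norm(p) for p in path_value.split(os.pathsep) if p}
    return [d for d in required_dirs if norm(d) not in present]
```

A task could call missing_path_entries with the directory where ImageMagick is installed and exit with an error straight away if the list is not empty, rather than failing somewhere inside the module loader.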

On the positive side, the new plugins to convert images using the Perl API look to be working fine. Next stop, GhostScript. The Windows version is getting closer to feature complete relative to Unix.

Thursday 26 July 2007

ImageMagick spoils my day

An ImageMagick design flaw means that it's unsafe to use in a batch environment where you can't trust the filename you are given. This is a problem solved a long time ago by all other Unix command line utilities, but to work around it I have to mess around with the Perl API rather than the command line.

What if you need to delete a file called -rf, or cat a file called --help? That's what the -- option is for: it tells the command that there are no more options, only filenames. ImageMagick interprets special characters in filenames as syntax, and there's no way to tell it not to. As far as I can tell, you can't load a file called image.jpg[23x42], even though it's a valid (though odd) filename. This kind of problem can easily lead to security holes in server applications.
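To make both halves of that concrete, here is a small Python sketch. The first function shows the -- convention when building an argument list. The second flags file names that ImageMagick might treat as syntax. The pattern list is my own conservative guess, not an official list, and both function names are hypothetical. It is meant for a bare file name, not a full path (a Windows drive letter would trip the prefix check).

```python
import re

# Conservative guesses at name shapes that may be read as syntax.
# This list is an assumption for illustration, not taken from any documentation.
_SUSPECT_PATTERNS = [
    re.compile(r"^-"),               # looks like an option, or "-" for stdin
    re.compile(r"^@"),               # may be read as a file containing a list of names
    re.compile(r"^[A-Za-z0-9]+:"),   # may be read as a format prefix such as "png:"
    re.compile(r"\[.*\]$"),          # trailing [...] may be read as a frame or geometry
    re.compile(r"[*?%|]"),           # glob, scene-number or pipe characters
]


def is_suspect_filename(name: str) -> bool:
    """Return True if a bare file name matches any of the suspect shapes."""
    return any(p.search(name) for p in _SUSPECT_PATTERNS)


def end_of_options(command, filenames):
    """Build an argument list with "--" separating options from file names."""
    return list(command) + ["--"] + list(filenames)
```

For a utility that honours the convention, end_of_options(["rm"], ["-rf"]) gives an argument list that removes a file named -rf instead of treating it as flags. For anything flagged by is_suspect_filename, the safest course I can think of is to copy the file to a temporary name you chose yourself before handing it over.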

It doesn't help either that the (sparse) documentation for the API says that you can read an image from a filehandle, when in fact that crashes Perl. Several hoops later I seem to have a reliable way of converting images on Windows; the next step is to hack together some EPrints plugins and see if it works for real.

Thursday 12 July 2007

New release, indexer progress

A new Windows package has now been released. This release fixes a few bugs and has a new graphical installer. Download it from http://files.eprints.org/279/.

Good progress has been made with the indexer. Metadata can now be indexed periodically, and so can full text of plain ASCII documents thanks to a bug fix. I need to write some file format plugins to convert other formats, which might involve a change in the way plugins are configured in the core. The next release (soon I hope) will have the indexer included; until then I'll post a script which can be configured manually.