January 2010

Solving Performance Emergencies with FileMaker Server

by Brian Dunning

Anyone who uses FileMaker Server in a mission-critical capacity has either had to deal with an emergency, or they're going to.

There are two basic types of server emergency: The total crash, and the performance bottleneck. The total crash, while seemingly (and potentially) more disastrous, is usually easier to solve. The performance bottleneck, on the other hand, is usually due to something obscure in your scripts, which makes it (a) ferociously hard to diagnose, and (b) your problem, since FileMaker's technical support can do little or nothing to help, beyond provide some general guidelines that you're probably already following.

In this article, we're going to discuss this latter, more insidious, problem. Your database has slowed to an unacceptable crawl. Business is being put on hold, perhaps disastrously. How to fix it? We'll go through some of the more common causes.

Barcodes for FileMaker Pro - No Special Fonts, No Plugins
UPC-A POSTNET Code 128
Code39 | Code128 | UPC-A | EAN-13 | POSTNET

You should read and know all of these, even if they're not relevant to your situation. They may be and you just don't know it.

Live Development

Everyone always advises you never to do development on a live system that's currently being hosted and used. Wouldn't it be nice if we lived in Fantasyland and that was always practical? The reality is that we are often forced to do development, and especially emergency development like this, on a live system. I promise not to condescendingly tell you to "restore from a backup" or any crap like that. The day the emergency hits is probably a critical time, and we often have to take critical steps. Even if you weren't in crisis mode, many databases are simply too big to go through the process of taking everything down and exporting and importing data. Most of the advice given here requires you to do development on a live server.

The one thing I will advise you to do — and I remind you, others will tell you not to do this, and on a non-emergency day I'd agree with them — is to make any changes to your schema using a copy of FileMaker Pro on the same machine as FileMaker Server. Yes, you can open a file as a client on the same machine where it's being hosted. If you have to add fields, delete fields, change calculation fields in a large table, or anything like that, do it on the same machine as Server. This guarantees that Pro will maintain its connection to Server during the crucial moments; otherwise you risk corruption.

Obviously, it's best (and fastest) if you can take the system down to make such changes. Always do that whenever possible. I'm going to assume that your situation is critical enough that you can't do that. Anyway, we're talking about performance issues here, not corruption; so the situation is different. If your issue is corruption, take the file down from FileMaker Server now and work on it only in single-user mode, I don't care how critical your situation is. But that can be discussed at another time. Back to performance problems:

Slow Server, or Slow Client?

You need to know whether the culprit is on the server's end (FileMaker Server) or on the clients' end (FileMaker Pro). You can tell this easily. What icon is on the locked-up user's screen? If it's the coffee cup (Windows) or Satan's Happy Spinning Rainbow Wheel (Mac), that means the server is busy. Something happening on FileMaker Server is hanging up your clients. You can also get this when you're working remotely or on another otherwise slow connection and file data is being transferred to your machine. If it's the hourglass (Windows) or butterfly (Mac), then FileMaker Server is probably OK, and the slow event is on your local FileMaker Pro client.

If one user is being hung up by FileMaker Server, then probably all are. However this is not necessarily always the case. Some users may be accessing a different file or a different table than the one causing the problem. If so, check: Have the unaffected user attempt to access the same part of the solution that the hung-up users are. If they get hung up as well, that's a good thing. You've narrowed it down.

FileMaker Server 11 gives you the ability to analyze statistics on a per-client basis, and this really helps a lot when you're trying to figure out if one particular client is doing something that slows everyone down. More on that below.

Cache and RAM

This is probably not the problem, but it can have an effect on overall system performance. In this day and age there's no reason not to have machines whose RAM greatly exceeds FileMaker Pro's modest needs. Go into the Memory tab of FileMaker's Preferences and make sure the cache is set to 256MB. There's no benefit to having it at anything less. This should be done to all your machines.

Sometimes I see cheap, horrible old computers with 512MB of RAM total straining to run Windows XP, barely staying alive just maintaining the swap file on the disk. Don't have any crappy computers like that on your network. For less than a grand you can replace it with something basic but modern that will eliminate all system-related concerns from your problem list. Make sure all your computers exceed FileMaker's system requirements. Don't let a computer with less than 2GB RAM near your FileMaker network.

Server Statistics

You probably already have your eye on the Statistics screen in Server Admin. But you might be wondering what those numbers mean, since the documentation doesn't give much practical advice on how to use them. Here is a practical field guide to these numbers (get explanations of what each one is in your "FileMaker Server Help" PDF):

Cache Hit % This should be 100. If it's not, all you can do is turn up Configuration > Database Server > Database Cache to 800 MB (the max). There's no benefit in ever having it any lower.
Cache Unsaved % This should be 0. No worries if it's less than 5. Cache setting above is all you can do.
Disk KB/sec Doesn't matter as much as the I/O time below.
Elapsed Time/call Less than 100 is really good, less than 1000 means you probably don't have a problem. Really big numbers, like in the millions, mean that clients are having to wait a long time for server processes that aren't their fault. If this is the case, you probably have to fix some slow scripts, or maybe you have corruption.
I/O Time/call If this averages more than about 100, you should have faster hard drives on your FileMaker Server. I like to see a striped RAID array of 15,000 RPM disks. Hard disk speed is the ONLY really important specification for a FileMaker Server machine, more important than RAM or processor.
Network KB/sec If this averages more than 1000, you should probably see if you can reduce the amount of data going between FileMaker Server and the clients. Think of container fields and stuff like that. Whenever possible use layouts that don't include them. If you can't avoid it, go to gigabit ethernet (and be sure you're on CAT6 cables, not 10-year-old CAT5 cables), but remember it's of no benefit if your switches and network cards don't support it.
Remote Calls/sec I've never found this to be very important. It's a way to tell who's really busy, but they might be making only efficient queries. This number does not correlate with performance hogging.
Wait Time/call Same comments as for Elapsed Time.

In FileMaker 11, you can switch to the Clients tab in Statistics view, and see all these same stats broken down on a per-client basis. Often, when one client runs a troublesome script, you can identify him on this screen. This greatly simplifies the determination of which script is causing the trouble.

Note!When one client does something (like an unindexed find) that slows everybody down, you can often identify him by his I/O Time. This value will usually shoot to a much higher number than everyone else. This is often the fastest and easiest way to tell who's the culprit. You can then troubleshoot whatever script he executed.

Server Scripts

Scheduled scripts are a nice convenience feature, but I avoid using them. You have insufficient feedback in the event of a problem. Often I've found that systems brought to their knees are the victims of something simple in a server-side script, like an infinite loop or a long sort or an unindexed find. It's surprising how something as simple as a loop can bring down an entire system, giving 50 users the coffee cup.

First go into Server Admin and check the Clients. If one of them is "script", then you know the scheduled script is hung up and sucking the server's resources dry. Forcibly disconnect that client, and deactivate that scheduled event. Don't let it run again until you've found and fixed the problem.

Since the lack of feedback makes it hard to find the problem, go into Server Admin and see what the error number was (if any). Go to Administration > Log Viewer and view all the errors. If any FileMaker errors were encountered by your scheduled script, you'll find them here.

Another reason I avoid using these scheduled scripts is that, on more than one occasion, I've found that the error number reported here is wrong or obviously irrelevant, thus providing no help debugging the problem script.

Obscene File Size or Record Count

Although it may seem like a huge file size (more than a few GB) or record count (more than a few million) are performance problems that can be improved by deleting/archiving old records, I've found this is rarely the case.

Yes, almost all operations on a file will be faster with 10 records than with 10 million. But design is far more important than record count. I once was given a file with six (!) records that was so slow it was completely unusable. It had hundreds of fields, nearly all of which were calculation fields, based on other calculation fields. Relationships twisted everywhere. Everything was, necessarily, unstored. It was an intractable mess.

Your file is probably better than that. But whether you have ten thousand records or a million is not as important as what resources are required to access a record, either by a script or by switching to a layout. Those million records from 2002 are probably not involved in most (or all) of what your users are doing. As such, they're merely occupying space on the server's drive and not really hurting anything.

If you're relying on summaries or complex unstored finds that use these records, then it's another story. They may well be part of the problem. But first, you should fix/eliminate those summaries or unstored finds. This will almost always yield more improvement than simply deleting records but keeping the inefficiency.

PHP/Etc.

Don't forget all the other things that might be hitting your database besides FileMaker Pro users. We often forget PHP or IWP interfaces to our server. In Server Admin, if you see that one of the users is "fmphp", then you know that one of your Custom Web Publishing pages is executing a FileMaker query that's taking way too long.

Debugging this should not be too hard, if you're fluent enough in PHP. It's hard to give specific advice here since the possibilities in PHP architecture are so diverse, but from the FileMaker perspective, you at least want to isolate each of your queries.

Make a test page on which to test each of your queries. If there are multiple queries on the page, be sure to execute each using the actual data that the preceding query might be providing to it. If it's not practical to isolate queries, then put timers into your script. Here is a multistage timer I often use:

<?php

// Initiate the timer
$timer = explode(' ', microtime());
$time_0 = $timer[0]+$timer[1];

// Create a new one whenever you want throughout your script
$timer = explode(' ', microtime());
$time_1 = $timer[0]+$timer[1];

// More PHP stuff goes here ..........
$timer = explode(' ', microtime());
$time_2 = $timer[0]+$timer[1];

// More PHP stuff goes here ..........
$timer = explode(' ', microtime());
$time_3 = $timer[0]+$timer[1];

// Display its output when you're finished with the script
echo "Start: $time_0<br>";
echo "Interval 1: $time_1<br>";
echo "Interval 2: $time_2<br>";
echo "Interval 3: $time_3<br>";

?>

The Dreaded Slow Query

Is there a script that runs often? Depending on the number of users you have, there may well be. There may be 50 scripts that run often. You'll have to go through them, one by one. In my experience, this turns out to almost always be the problem. Some query (that might be a find, a relationship, a summary, or other things) is being slow, often much slower than you think. Even if a particular query takes only 5 seconds to run, it may well be swamping the server when it does; and if 2 or 3 or 6 or 10 users try to do it at the same time, it can cause a work stoppage for minutes.

Barcodes for FileMaker Pro - No Special Fonts, No Plugins
UPC-A POSTNET Code 128
Code39 | Code128 | UPC-A | EAN-13 | POSTNET

Slow queries really task your server, and they can kill everyone on your network. Slow queries must be eliminated.

If you're familiar with other databases, you might like to ask where FileMaker's slow query log is. It doesn't have one. You have to find them on your own.

To solve this, put timers in your scripts. Write the results to a table or email them to yourself. If the system is busy with users, you'll very quickly track down the slow query. When you see that any one given script is not taking inordinately long, comment out that timer.

Here is an example of one way to put a timer in your script. There are many ways to do it, and it doesn't much matter how you do it; what matters are the results:

# Initiate the timer
Set Variable [$timer; Value: ""]

# Update your timer at various points in your script
Set Variable [$timer; Value:$timer & "Pos 1: " & Get ( CurrentHostTimeStamp ) & "¶"]
A bunch of script steps
Set Variable [$timer; Value:$timer & "Pos 2: " & Get ( CurrentHostTimeStamp ) & "¶"]
A bunch of script steps
Set Variable [$timer; Value:$timer & "Pos 3: " & Get ( CurrentHostTimeStamp ) & "¶"]
A bunch of script steps

# At the end of your script, email yourself the results
Send Mail [Send via SMTP Server; No dialog; "me@myself.com"; Subject: "Script Name"; Message: $timer]

Put as many of those incremental steps in there as you can. Don't trust that anything you think is fast, is fast. Expect to be surprised to find out what part of the script is taking too long. Do this to every frequently used script, especially including those that you're sure aren't the problem. It takes two minutes to add a timer like this. Spend those two minutes on every script that is frequently used, and collect some data. I guarantee you'll uncover some performance hog that you can take out behind the shed and shoot.

Once you find the culprit, you have to solve it. How? Well, since there are about a billion possible culprits, that's hard to write about. Look for things like unindexed finds, finds using fields in other tables, summaries. You may have to experiment and make some changes.

One thing to be aware of when doing such emergency fixes on the fly is that changes you make to scripts will not necessarily be propagated out to all users right away. Each machine has to have sufficient idle time before it will look for an updated script: It may continue using the old one for some time. Close each user's file or quit FileMaker to be absolutely certain that they get the updated version of the script.


Brian Dunning is the technical editor of FileMaker Advisor magazine and is a long, LONG time independent consultant. Got a problem? Give a call.