Auditing System Impact on Performance – Windows

A lot of questions are raised regarding the effect of auditing on the performance of systems.

 


Here is an extract of an article from the MSDN blog site:

 

At low auditing loads, auditing generally has no discernable impact on perf. If you were hardcode with a profiler and iterated an auditable activity a million times I am sure you’d be able to measure it, but for reasonable audit policies you won’t notice a significant difference.

 

At high auditing loads, auditing has a significant performance impact. This is more true of pre-Vista multiprocessor systems than of systems with the new eventlog system.

 

For example, a multi-processor domain controller (say a 32-processor box) running Windows Server 2003, might run into problems under extreme load. Why is that? Because ultimately the limiting factor on event rate is how fast you can write the events to disk. Pre-Vista eventlog has a single thread writing events to disk. So even though you might have 32 threads servicing authentication requests (an auditable activity), each of them is queueing to a single audit queue which is ultimately despooling to eventlog via RPC on a single thread, and eventlog is only writing to the security log with a single thread. What we observe in practice in this case is that a single processor on the system goes to 85-100% utilization, and the other processors drop to a very low utilization as the authentication threads are blocked waiting for the audit function call to return. This call won’t return until the queue is not full, and the queue is waiting on RPC which is waiting on eventlog… so eventlog governs the rate.

 

In Windows Server 2003, we added a particular optimization only for the security event log, which batches events in the RPC call to eventlog. This means that you can get more event throughput in the security log than in other logs on the system. It didn’t eliminate the bottleneck, but it pushed back the limit, so WS03 on typical hardware should be able to log several thousand events per second to the security event log. Previous versions were only able to log about 1000 events per second.

 

Note that the change in performance characteristics occurs all at once. So the impact tends to be trivial until the queues fill, at which point the impact is severe. It does not scale linearly, there’s a discrete behavior change. What this means realistically is that if you ever encounter a performance problem with auditing, then you probably just need to turn it down a little and you won’t have a problem any more.

 

In Vista and subsequent releases, audit queues events via ETW. ETW was designed for high-performance kernel tracing, and in the auditing team we tested it to over 10,000 (10.000 for you folks in Germany 🙂 events per second before we decided that we had hit our scale targets. We never tested exactly how high it would go, but we were satisfied that the eventlog service was no longer a bottleneck in realistic scenarios.

 

There are some edge cases where you might run into performance problems by trying to audit too much in a critical path. For instance, it is a really really bad idea to put SACLs on your entire registry. If you monitor registry activity with a tool like Process Monitor, you will notice that when a system is not idle, there are often hundreds or thousands of registry accesses per second. If you impose an auditing tax on each of those activities you will notice a degradation in performance. Not to mention that the resultant mountain of events is probably not very valuable. Of course you can tune SACLs as I have mentioned before, but I doubt that it’s useful to take the time to tune SACLs for the entire registry.

 

One last point is that the eventlog is writing the events somewhere. Wherever it is writing events, it is consuming disk I/Os and competing with anything else writing to the same volume. If you have a disk performance problem on that disk, it can result in an auditing performance problem, as everything else will back up if the eventlog can’t write events to disk fast enough. So one thing you can do is ensure that the disk where your log is placed has enough I/Os.

 

In summary audit has very minimal impact unless you do a whole lot of it, in which case it can have severe impact on your system. The change happens suddenly, not gradually, so you can do a lot of auditing with no problem. If you run into a problem, turn it down just a little (or little by little) and at some point the behavior will change such that you won’t have any significant perf impact anymore.

 

Original article can be found here.