AOP, Performance Counters and The Big Red Button

A friend of mine who was once involved in a test role at play.com recently mentioned to me a very interesting story. In case you did not know them, play.com sell a lot of stuff over the internet. At the time my friend worked there, play.com had in their dev centre a big button displayed on an even bigger screen indicating the number of orders they were getting every second. As play.com never sleeps, they knew that when the number of orders/sec dropped below a certain threshold and the button went red, alarms had to be raised, SOS emails had to be sent out and the fat controller had to be woken up as something was clearly going wrong. (OK, I made up the fat controller but it’s a different story).

I have no idea how the whole system was implemented but it clearly illustrates a very good use of software instrumentation. If techies at play.com they were as clever as I think they were, the simplest method to get the “button” to work is to use bog standard performance counters as they may serve at least several purposes:

  • As illustrated by creative folk at play.com, performance counters may act as an early warning system. Excessive values may raise automatic alerts through monitoring tools and this is usually highly appreciated by the "ops" people responsible for day to day operation of the software.
  • Performance counters allow us and our clients to trace execution times, throughputs etc which may be the only way to verify whether a third party provides appropriate level of service (SLA). This comes especially handy when your app relies heavily on services provided by others.
  • As all things related to instrumentation performance counters are often immensely helpful in diagnosing software problems when something goes, or is about to go wrong
  • Performance counters give us true picture of behaviour of our software: how many logins, saves, loads does the application perform? How does the workload differ from the requirements and/or expectations?

Knowing that most of us find instrumentation as exciting as having the root canal done, I came up with a plan to use PostSharp AOP to ease some, if not all of this pain. If you are interested in publishing performance data, read on as you may find that the entire process can be made very simple and easy with a couple of AOP tricks.

Anatomy of performance counters

Before we start publishing any performance data we need to "register" our performance counter(s) in the operating system: this can be done in a number of ways but personally I prefer to do it programmatically (if possible). The following snippet illustrates how to do it:

Code Snippet
  1. var counterCreationDataCollection = new CounterCreationDataCollection();
  2. var valueCounter = new CounterCreationData("Counter name", "Help text if any?",
  3.                                            PerformanceCounterType.NumberOfItems32);
  4. counterCreationDataCollection.Add(valueCounter);
  5. PerformanceCounterCategory.Create("Name of category",
  6.                                   "Sample performance counter category.",
  7.                                   PerformanceCounterCategoryType.SingleInstance,
  8.                                   counterCreationDataCollection);

The problem with programmatic approach is security: in order to execute the code, we not only need to have admin privileges but on some operating systems (Vista and Windows 7) the program has to be “Run as admin” which may be a bit tricky. To get around the pesky UAC issues you may need to modify application’s manifest file which is described very nicely here. Other than using programmatic approach, performance counters can be installed by a setup program with a bit of help from custom installer class derived from PerformanceCounterInstaller. Unfortunately once created, PerformanceCounterCategory hosting our counters cannot be modified so if new counters were to be added, the entire category has to be recreated. For this very reason during development I usually have a piece of code which reinstalls all required performance counters on application start-up as over the lifecycle of the project both numbers and types of performance counters are very likely to change. When the application reaches it's final stages of development the task of creating performance counters can be delegated to a setup program.

Usage

Once we have our counters in place we have a couple of ways to interact with them but the choice is relatively simple: we can either use thread safe Increment/Decrement/IncrementBy methods or directly manipulate counter's RawValue as the following sample illustrates. According to MSDN using thread safe approach is about 5 times slower than directly accessing RawValue, but you have to be sensible about it: if you are instrumenting a particularly “hot�� piece of code, direct access may indeed be an option but I would not go this route unless profiler indicates that it may provide some tangible benefits (measure first!).

Code Snippet
  1. var counter = new PerformanceCounter("Category", "Counter name", false);
  2. // We can increment the value in a thread safe maner...
  3. counter.Increment();
  4. // ... or set the value directly
  5. counter.RawValue = 123;

Once you start digging into System.Diagnostics documentation you will surely discover that there are 28 different types of performance counters. The list available on MSDN may seem very daunting at first but from the publishing perspective counters really fall into two categories:

  • Performance counters representing "number of items". Examples may include "total number of logins", "sales transactions processed" etc. When publishing this sort of data we usually have to simply capture the number of invocations of a particular operation and use PerformanceCounter.Increment() method. If the number may go down as well as up we can use either Decrement() method or directly set the RawValue property.
  • Performance counters related to time of execution. The time has to be expressed in terms of "timer ticks" and we have to measure how long the operation took and then simply call PerformanceCounter.IncrementBy(executionTime)

Now you may ask how on Earth is it possible to get 28 types of performance counters given that we only set either the number or the time of execution? The answer is simple: majority of the counters use some sort of a magic formula involving one or two counters to come up with a final value. The best example is performance counter of type RateOfItems32. Every time performance monitor samples it's value, it checks how much the value has changed since the last sample was collected and then divides the difference by the time elapsed since the last sample. This gives nice Rate/sec and all we have to do to publish it is to simply increment a counter every time an operation takes place.
Slightly more complex scenario utilises values of two different performance counters; imagine that we want to expose "cache hit ratio" from a caching component. In order to  achieve this we have to create two counters of types RawFraction and RawBase respectively. Whenever we have a "cache hit" we increment RawFraction and every operation (whether it is a hit or a miss) increments RawBase. This way performance monitor will calculate hit ratio as RawFraction/RawBase and we're done.
More interesting things happen when we start dealing with times; we can for example expose a counter which represents number of threads executing given method at any time by simply exposing timer of type CounterTimer. Here is how it is done: imagine that an operation takes 200ms and performance monitors checks the value of the counter every second. If between two samples we have executed the operation 20 times each time incrementing the value of the counter by 200ms, at the end of our sampling interval the total accumulated value will increase by 20 executions times 200 ms = 4000ms. Given this increase, performance monitor will divide the delta by sampling interval which tells us that, on average, during our sampling interval we were executing 4000ms/1000ms = 4 operations simultaneously. Genius! Full list of supported counter types is obviously available on MSDN so I won’t be repeating it here, the only thing you have to pay special attention to is whether you are required to increment counter value by number of executions (or other items) or the time of the execution.

Performance of performance counters

If you read my previous posts you may remember the paradox of Schrödinger's cat. The paradox states that by measuring a value we may affect it. This is very true with performance counters as there is indeed overhead associated with using them. Having said that results captured by my favourite profiler indicate that unless we are doing hundreds of threads executing a method thousands of times per second there is nothing to worry about. This of course does not mean that that every single method of every single class has to be instrumented! As with any tool it makes sense only when it is used wisely: monitoring critical operations, 3rd party services etc.

Putting it all together

Knowing that we can cover 90% of performance counter scenarios by capturing either execution time or number of executions, we can wrap it all up in two simple PostSharp aspects. If you are new to PostSharp I would strongly suggest that you have a look at the website, but the idea is to simply decorate a method with a couple of attributes and let the PostSharp interception mechanism do the magic as the following code illustrates:

Code Snippet
  1. [CountingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.NUMBER_OF_CALLS)]
  2. [CountingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.CALL_RATE)]
  3. [CountingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.EXECUTION_TIME_BASE)]
  4. [TimingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.ACTIVE_THREADS)]
  5. [TimingPerformanceCounter(PerformanceCounters.CATEGORY, PerformanceCounters.EXECUTION_TIME)]
  6. private void InstrumentedMethod()
  7. {
  8.     Thread.Sleep(_pause);
  9. }

As you can imagine there is no rocket science behind the aspects themselves. Here is the code of the TimingPerformanceCounter which is slightly more involved then the other one:

Code Snippet
  1. using System;
  2. using System.Diagnostics;
  3. using PostSharp.Laos;
  4. namespace EMC.PostSharp.PerformanceCounters
  5. {
  6.     /// <summary>
  7.     /// Implements aspect publishing method's execution time through performance counters
  8.     /// </summary>
  9.     [Serializable]
  10.     public class TimingPerformanceCounterAttribute : PerformanceCounterAttributeBase
  11.     {
  12.         /// <summary>
  13.         /// Initializes a new instance of the <see cref="CountingPerformanceCounterAttribute"/> class.
  14.         /// </summary>
  15.         /// <param name="categoryName">Name of the category.</param>
  16.         /// <param name="performanceCounterName">Name of the performance counter.</param>
  17.         public TimingPerformanceCounterAttribute(string categoryName, string performanceCounterName)
  18.             : base(categoryName, performanceCounterName)
  19.         {
  20.         }
  21.         /// <summary>
  22.         /// Method executed <b>before</b> the body of methods to which this aspect is applied.
  23.         /// </summary>
  24.         /// <param name="eventArgs">Event arguments specifying which method
  25.         /// is being executed, which are its arguments, and how should the execution continue
  26.         /// after the execution of <see cref="M:PostSharp.Laos.IOnMethodBoundaryAspect.OnEntry(PostSharp.Laos.MethodExecutionEventArgs)"/>.</param>
  27.         /// <remarks>
  28.         /// If the aspect is applied to a constructor, the current method is invoked
  29.         /// after the <b>this</b> pointer has been initialized, that is, after
  30.         /// the base constructor has been called.
  31.         /// </remarks>
  32.         public override void OnEntry(MethodExecutionEventArgs eventArgs)
  33.         {
  34.             // Record the start time here
  35.             eventArgs.MethodExecutionTag = Stopwatch.GetTimestamp();
  36.             base.OnEntry(eventArgs);
  37.         }
  38.         /// <summary>
  39.         /// Method executed <b>after</b> the body of methods to which this aspect is applied,
  40.         /// even when the method exists with an exception (this method is invoked from
  41.         /// the <b>finally</b> block).
  42.         /// </summary>
  43.         /// <param name="eventArgs">Event arguments specifying which method
  44.         /// is being executed and which are its arguments.</param>
  45.         public override void OnExit(MethodExecutionEventArgs eventArgs)
  46.         {
  47.             base.OnExit(eventArgs);
  48.             long start = (long) eventArgs.MethodExecutionTag;
  49.             // Increment the counter by the value of time elapsed
  50.             PerformanceCounter.IncrementBy(Stopwatch.GetTimestamp() - start);
  51.         }
  52.     }
  53. }

In case you want to take the whole lot for a spin, the file attached to this post contains all the necessary ingredients including some unit tests. To build the app you will need the latest version of PostSharp installed on your machine. The sample app uses multiple threads and throttling to expose some typical scenario of performance counter utilisation (Calls per/sec, Active threads, Total calls and Average execution time).

image

October 12 2009
Comments are closed