Saturday, January 15, 2011

How to deal with external systems?

The problem

When developing solutions today, we almost always have to connect to external systems. These connections come in all shapes and sizes, from accessing a file system or connecting to a database, to transferring files to or from an FTP server, to calling a web service or an enterprise service bus.

When connecting to external systems, we expose our own system to risks. I am not talking about security risks like viruses, malware, evil hackers and the like, but about the risks of performance degradation, scalability issues, dependencies on the status of the external systems, poor user experience and more. If we are not very careful when we design our solutions, all bets are off when the external systems slow down, become unstable or even shut down. If we fail to take such situations into consideration, the state of the external systems dictates the state of our system: if just one external system is down or unstable, our system will be down or unstable too. So when building large systems with many external dependencies, making sure our system keeps running can become a real nightmare.

Ways to deal with external dependencies

In his excellent book “Release It!”, Michael T. Nygard offers a lot of good advice on how to deal with external dependencies. He describes several patterns and techniques that are useful for lowering the risk of external systems damaging the performance and stability of our system. Other great minds have further elaborated on the topic in books, blogs and presentations, and we now have an arsenal of patterns and techniques at our disposal. I will briefly sum up four of these here:

Caching is a technique where you hold on to a result from a call to an external system for a while, and reuse this result instead of making subsequent calls to the external system.

Circuit Breaker is a pattern where you keep tabs on whether the calls to an external system fail or succeed. If the calls begin to fail, you register the external system as being down and stop making calls to the external system. After a while you try passing a few calls on to the external system to see if it is up and running again.
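To make the idea concrete, here is a minimal sketch of a circuit breaker. It is not Sapit's implementation: the class name SimpleCircuitBreaker, its constructor parameters and the injected clock are all assumptions made for illustration, and the sketch is not thread safe.

```csharp
using System;

// Hypothetical illustration of the Circuit Breaker pattern; not taken from Sapit.
public class SimpleCircuitBreaker
{
   private readonly int _failureThreshold;
   private readonly TimeSpan _retryAfter;
   private readonly Func<DateTime> _now;   // injected clock, so the sketch can be tested
   private int _consecutiveFailures;
   private DateTime _openedAt;

   public SimpleCircuitBreaker(int failureThreshold, TimeSpan retryAfter, Func<DateTime> now = null)
   {
      _failureThreshold = failureThreshold;
      _retryAfter = retryAfter;
      _now = now ?? (() => DateTime.UtcNow);
   }

   // Open means: too many consecutive failures, and the waiting period has not passed yet.
   public bool IsOpen
   {
      get { return _consecutiveFailures >= _failureThreshold && _now() - _openedAt < _retryAfter; }
   }

   public T Execute<T>(Func<T> call)
   {
      if (IsOpen)
      {
         throw new InvalidOperationException("Circuit is open; the external system is considered down.");
      }
      try
      {
         T result = call();
         _consecutiveFailures = 0;   // a successful call closes the circuit again
         return result;
      }
      catch (Exception)
      {
         _consecutiveFailures++;
         if (_consecutiveFailures >= _failureThreshold)
         {
            _openedAt = _now();      // (re)start the waiting period
         }
         throw;
      }
   }
}
```

Once the waiting period has passed, the next call is let through as a trial: if it succeeds the failure count is reset, and if it fails the circuit opens again for another waiting period.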

Response Time Throttling is a technique where you monitor the time each call to an external system takes, and if the response time rises above a certain threshold, you filter off some of the calls to the external system. Thereby your system does not put additional strain on the external system, and hopefully the external system will eventually come up to speed again.
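One simple way to express this in code is sketched below. It is an illustration only, not how Sapit does it: the class name SimpleResponseTimeThrottle and the "let every Nth call through while slow" policy are assumptions, and the sketch is not thread safe.

```csharp
using System;
using System.Diagnostics;

// Hypothetical illustration of Response Time Throttling; not taken from Sapit.
public class SimpleResponseTimeThrottle
{
   private readonly TimeSpan _threshold;
   private readonly int _letThroughEvery;   // while slow, only every Nth call is passed on (must be >= 1)
   private TimeSpan _lastResponseTime = TimeSpan.Zero;
   private int _callsWhileSlow;

   public SimpleResponseTimeThrottle(TimeSpan threshold, int letThroughEvery)
   {
      _threshold = threshold;
      _letThroughEvery = letThroughEvery;
   }

   public T Execute<T>(Func<T> call)
   {
      if (_lastResponseTime > _threshold)
      {
         // The previous call was slow: filter off all but every Nth call.
         _callsWhileSlow++;
         if (_callsWhileSlow % _letThroughEvery != 0)
         {
            throw new InvalidOperationException("Call filtered off: the external system is responding slowly.");
         }
      }
      else
      {
         _callsWhileSlow = 0;
      }

      Stopwatch stopwatch = Stopwatch.StartNew();
      try
      {
         return call();
      }
      finally
      {
         _lastResponseTime = stopwatch.Elapsed;   // remember how long this call took
      }
   }
}
```

The calls that are still let through double as probes: as soon as one of them comes back within the threshold, filtering stops.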

Timeout is a technique where you stop waiting for a response from an external system when a certain response time threshold is reached. You basically abandon your request and return an error message to the caller saying that the timeout has expired.
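A minimal sketch of a timeout, using the Task Parallel Library from .NET 4, could look like this. The helper name TimeoutHelper.CallWithTimeout is an assumption made for illustration and is not part of Sapit.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical illustration of the Timeout technique; not taken from Sapit.
public static class TimeoutHelper
{
   // Runs the call on a worker thread and stops waiting when the timeout expires.
   // Note: the abandoned call is not cancelled; it keeps running in the background.
   // If the call itself fails in time, Wait rethrows the failure as an AggregateException.
   public static T CallWithTimeout<T>(Func<T> call, TimeSpan timeout)
   {
      Task<T> task = Task.Factory.StartNew(call);
      if (task.Wait(timeout))
      {
         return task.Result;
      }
      throw new TimeoutException("The external system did not respond within " + timeout.TotalSeconds + " seconds.");
   }
}
```

The caller gets control back after the timeout at the latest, but the external system may still be processing the request, so this protects our own system rather than the external one.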

The standard solution

Usually some or all of the above patterns and techniques are built into our systems, but very often people reinvent the wheel over and over again. There are so many different ways to include these techniques in the interaction with external systems, and the result is more often than not a bit messy. As an example, take a look at the following code:

public class ItemServiceGateway
{
   private static Dictionary<int, Tuple<List<Item>, DateTime>> _Cache = new Dictionary<int, Tuple<List<Item>, DateTime>>();
   private static int _SecondsToCacheResult = 30;

   public List<Item> GetItemsInCategory(int categoryId)
   {
      List<Item> results = GetResultsFromCache(categoryId);
      if (results != null)
      {
         return results;
      }
      else
      {
         results = GetResultsFromExternalSystem(categoryId);
         AddResultsToCache(categoryId, results);
         return results;
      }
   }

   private List<Item> GetResultsFromCache(int categoryId)
   {
      if (_Cache.ContainsKey(categoryId))
      {
         if (_Cache[categoryId].Item2.AddSeconds(_SecondsToCacheResult) > DateTime.Now)
         {
            return _Cache[categoryId].Item1;
         }
         else
         {
            _Cache.Remove(categoryId);
            return null;
         }
      }
      else
      {
         return null;
      }
   }

   private List<Item> GetResultsFromExternalSystem(int categoryId)
   {
      List<Item> results = null;
      // Make call to external system to get the results.
      return results;
   }

   private void AddResultsToCache(int categoryId, List<Item> results)
   {
      if (_Cache.ContainsKey(categoryId))
      {
         _Cache.Remove(categoryId);
      }
      _Cache.Add(categoryId, new Tuple<List<Item>, DateTime>(results, DateTime.Now));
   }
}

The above code is a simplified but very realistic way of implementing a cache around a single call to an external system. Now multiply the above code by the number of external system calls your system makes … And yes, I am aware that this implementation is not thread safe, but this just amplifies my point: it takes a lot of time and effort to include these techniques and to make them correct and robust.
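To give an impression of what thread safety alone adds, here is a sketch of the same cache with a lock around the dictionary. The class name ExpiringCache and its GetOrAdd method are assumptions made for illustration; this is one possible approach, not a definitive implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical thread-safe variant of the cache above; names are illustrative only.
public class ExpiringCache<TKey, TValue>
{
   private readonly object _lock = new object();
   private readonly Dictionary<TKey, Tuple<TValue, DateTime>> _entries =
      new Dictionary<TKey, Tuple<TValue, DateTime>>();
   private readonly TimeSpan _timeToLive;

   public ExpiringCache(TimeSpan timeToLive)
   {
      _timeToLive = timeToLive;
   }

   public TValue GetOrAdd(TKey key, Func<TKey, TValue> fetch)
   {
      lock (_lock)
      {
         Tuple<TValue, DateTime> entry;
         if (_entries.TryGetValue(key, out entry) && entry.Item2 + _timeToLive > DateTime.UtcNow)
         {
            return entry.Item1;   // still valid, no external call needed
         }
      }

      // Fetch outside the lock so a slow external call does not block lookups of other keys.
      // Two threads may both fetch the same key; the last one to finish wins.
      TValue value = fetch(key);

      lock (_lock)
      {
         _entries[key] = Tuple.Create(value, DateTime.UtcNow);
      }
      return value;
   }
}
```

Even this small sketch forces a design decision (hold the lock during the external call, or accept duplicate fetches), which is exactly the kind of detail that gets repeated, and gotten wrong, when every gateway rolls its own cache.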

The problems with the standard solution

From the above code it is obvious that implementing your own cache for every external system call is not a very efficient way to go. Your system will end up being 90% plumbing code and 10% actual value-adding code. The development costs will be very high, the maintenance costs will be very high, and the potential for errors will be very high.
One of the biggest problems, in my opinion anyway, is that the above code obscures the intent of the code. You can’t really see the forest for the trees. In fact, the call to the external system is buried in one of the private helper methods, so it will take longer to grasp what really goes on when a call to GetItemsInCategory is made.

Wouldn’t it be nice if the code looked something like this instead?

   public class ItemServiceGateway
   {
      [Cache(30)]
      public List<Item> GetItemsInCategory(int categoryId)
      {
         List<Item> results = null;
         // Make call to external system to get the results.
         return results;
      }
   }

Well, that is my goal with Sapit!

Sapit to the rescue

I am working in my precious spare time to make the above code work like you would expect it to: keep a cache of results from the external system, each with a validity period of 30 seconds, and avoid calling the external system if we already have the results in the cache.

I have built Sapit (Small And Powerful Integration Toolkit) and put it on CodePlex. Included in the source code is a small WinForms project that demonstrates the use of each of the currently supported features.

Sapit uses Aspect Oriented Programming (AOP) to intercept calls to methods, and it includes all four of the above-mentioned patterns and techniques: Cache, Circuit Breaker, Response Time Throttle and Timeout. This means that all you have to do to apply, say, the Circuit Breaker pattern to a method is to apply the following attribute to the method:

[CircuitBreaker(3, 20, Behavior.ThrowException, "Service is down")]

The meanings of the parameters are:

  • the number of consecutive calls that must throw exceptions before we shut off the external system (3 calls),
  • the minimum time to wait before we try to call the external system again (20 seconds),
  • the behavior when the external system is shut off (throw an exception / return a hard-coded value),
  • the value to return or exception message to use when the external system is shut off (“Service is down”)

Sapit is quite self-explanatory, and I have not yet written any documentation for it, but I encourage you to have a look at the source code for the client application included, as it demonstrates how to apply the different attributes.
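For readers curious how an attribute can change what a method call does, here is a small sketch of attribute-driven interception. It does not show Sapit's internals: it uses System.Reflection.DispatchProxy (available in .NET Core and later, and limited to interfaces) instead of the frameworks Sapit builds on, and CacheAttribute, CachingInterceptor, IItemService and CountingItemService are hypothetical names made up for this example. It is not thread safe.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical attribute; Sapit's real [Cache] attribute may be declared differently.
[AttributeUsage(AttributeTargets.Method)]
public class CacheAttribute : Attribute
{
   public int Seconds { get; private set; }
   public CacheAttribute(int seconds) { Seconds = seconds; }
}

// Hypothetical service used to demonstrate the interceptor.
public interface IItemService
{
   [Cache(30)]
   string GetItemName(int itemId);
}

public class CountingItemService : IItemService
{
   public int Calls { get; private set; }
   public string GetItemName(int itemId) { Calls++; return "item " + itemId; }
}

// Intercepts every call made through the interface and caches the results
// of the methods that carry [Cache].
public class CachingInterceptor : DispatchProxy
{
   private object _target;
   private readonly Dictionary<string, Tuple<object, DateTime>> _cache =
      new Dictionary<string, Tuple<object, DateTime>>();

   public static T Wrap<T>(T target) where T : class
   {
      T proxy = DispatchProxy.Create<T, CachingInterceptor>();
      ((CachingInterceptor)(object)proxy)._target = target;
      return proxy;
   }

   protected override object Invoke(MethodInfo targetMethod, object[] args)
   {
      CacheAttribute attribute = targetMethod.GetCustomAttribute<CacheAttribute>();
      if (attribute == null)
      {
         return targetMethod.Invoke(_target, args);   // undecorated method: pass straight through
      }

      // Cache key: method name plus the argument values.
      string key = targetMethod.Name + "(" + string.Join(",", args ?? new object[0]) + ")";
      Tuple<object, DateTime> entry;
      if (_cache.TryGetValue(key, out entry) && entry.Item2.AddSeconds(attribute.Seconds) > DateTime.UtcNow)
      {
         return entry.Item1;
      }

      object result = targetMethod.Invoke(_target, args);
      _cache[key] = Tuple.Create(result, DateTime.UtcNow);
      return result;
   }
}
```

With this in place, IItemService cached = CachingInterceptor.Wrap<IItemService>(new CountingItemService()); gives you an object whose GetItemName calls are cached for 30 seconds per item id, while the service class itself contains no caching code at all.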

At the time of writing, Sapit comes with a few limitations:

  • .NET only: even though I have a very distant background in the Java realm, I have been a .NET developer for a decade now, so Sapit is written in C#.
  • Thread safety: Sapit does not have this yet. I am currently working on it.
  • IoC containers: you have to use either Castle Windsor, PostSharp or the Unity Application Block to instantiate the classes you decorate with Sapit attributes, and the steps to follow differ a bit depending on which one you use. It is fairly easy to add support for other IoC containers, so please let me know if you use a different one.
  • Multiple attributes on a single method: I have not tested this at all. Some combinations might work, some might wreak havoc. I am currently working on making Sapit robust in this area.

Next steps

So where do I go from here? Actually, I am hoping for some feedback from you guys. Should I include some more patterns and techniques (which ones?), should I add support for other IoC containers (which ones?), or is Sapit applicable in its current form?

I am counting on you guys here!
Please have a look at Sapit, try applying some of the attributes to your own code, and enjoy the cleanliness of code that (hopefully) emerges.
And please, let me know what you think about Sapit, thanks :-)
