Boosting .NET application performance: The base class library
Brief
The BCL (Base Class Library) contains the fundamental types of the CLR (Common Language Runtime). Among its namespaces are System, System.Collections, System.CodeDom, System.Diagnostics, System.IO, System.Text and others. These types are located in mscorlib.dll and System.dll.
In this article I'm going to refer to some of those basic types and will highlight some misconceptions and ambiguous practices.
Code samples with benchmarking for all the issues which are being discussed in this article are available for download at the end.
Most of the cases I'll discuss here were published by the CLR team in the January 2006 issue of MSDN Magazine, in the CLR Inside Out article by Kit George.
The basic misconception
Because of its name (the BASE class library), most developers tend to assume that its types are the most efficient ones for their domain. Microsoft's professionals publish articles and code samples that present those types under the "Best Practices" label, sometimes without mentioning counter-practices or the cases where those "Best Practices" should not be used.
Cases where BaseType.Parse() should be used instead of BaseType.TryParse()
In .NET 2.0 we were introduced to TryParse, a new method for parsing strings into base types.
Normally, that's the most efficient way to parse strings. When the string is parsed successfully, this method does not differ much from the regular Parse method, but when the parsing fails, it performs better by far.
To understand why, we need to understand the exception model: throwing an exception is one of the most resource-hungry operations in the runtime, so avoiding exceptions through dedicated logic is almost always better. That is exactly what TryParse does. It attempts the same parsing as Parse, but when the input is invalid it returns the Boolean value false instead of throwing; when the input is valid it returns true and hands back the parsed value through its out parameter.
Yet there are some cases where we do want the exception. The most obvious is when invalid input should fail the whole process; the second is when we want a better clue about why the parsing failed, which we get by catching the different exception types.
The Parse method may throw the following types of exceptions:
- OverflowException - thrown when the numeric input is not between BaseType.MinValue and BaseType.MaxValue. For instance, trying to parse 32768 into an Int16.
- FormatException - thrown when the string is not in a valid format for the type we are trying to parse. For instance, trying to parse String.Empty, or a string containing non-numeric characters, into an integer.
- ArgumentNullException - always thrown when null is passed to the method.
- ArgumentException - thrown only for certain types, when the value is invalid for that type. For instance, when parsing a string into an enumeration and the string does not name one of its members.
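The two approaches can be sketched as follows. This is only an illustration, not code from the downloadable samples; the class and method names (ParsingSketch, ParseOrDefault, DescribeFailure) are made up for the example.

```csharp
using System;

public static class ParsingSketch
{
    // Returns the parsed value, or the fallback when the text is not a valid Int16.
    // No exception is thrown for invalid input.
    public static short ParseOrDefault(string text, short fallback)
    {
        short value;
        return short.TryParse(text, out value) ? value : fallback;
    }

    // Uses Parse when the caller needs to know why the input was rejected.
    public static string DescribeFailure(string text)
    {
        try
        {
            short.Parse(text);
            return "ok";
        }
        catch (ArgumentNullException) { return "null input"; }
        catch (FormatException) { return "bad format"; }
        catch (OverflowException) { return "out of range"; }
    }
}
```

ParseOrDefault suits input that is expected to be wrong now and then (user-typed text, for example), while DescribeFailure shows the kind of diagnosis that only the exception types make possible.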
Cases where you might prefer a simple ArrayList over a generic List<>
We all know that generic collections are more efficient than collections of Object. Using generics provides both better performance, by avoiding unnecessary boxing and unboxing, and easier debugging, by producing strongly typed compilation errors instead of runtime cast exceptions when the wrong type is used.
Those two reasons all but wiped out the use of ArrayList in .NET 2.0 and newer applications, while in .NET 1.0 and 1.1 it is still one of the most widely used collections. Yet there are still some cases where we would prefer this older collection.
For value types, generics are always the better choice. So if you are storing a list of integers, you should not even consider using ArrayList. But for reference types, ArrayList performed better for data extraction in the benchmarks attached to this article.
If your application is rich with Sort and Contains calls, you might want to consider using an ArrayList. As you can see from the samples attached to this article, ArrayList.Sort() on a large number of strings was about ten times faster than List<String>.Sort(). The same holds for ArrayList.Contains(String) versus List<String>.Contains(String). Measure on your own runtime version and data before deciding.
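A comparison like this is easy to reproduce. The following is a minimal timing sketch, not the benchmark from the downloadable samples; SortTimingSketch and its methods are hypothetical names, and a single Stopwatch run like this is only a rough indication.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

public static class SortTimingSketch
{
    // Builds 'count' pseudo-random strings; the fixed seed keeps runs comparable.
    public static string[] MakeData(int count)
    {
        Random random = new Random(12345);
        string[] data = new string[count];
        for (int i = 0; i < count; i++)
        {
            data[i] = random.Next().ToString();
        }
        return data;
    }

    // Times a default sort on a non-generic ArrayList copy of the data.
    public static TimeSpan TimeArrayListSort(string[] data)
    {
        ArrayList list = new ArrayList(data);
        Stopwatch watch = Stopwatch.StartNew();
        list.Sort();
        watch.Stop();
        return watch.Elapsed;
    }

    // Times a default sort on a generic List<string> copy of the same data.
    public static TimeSpan TimeGenericListSort(string[] data)
    {
        List<string> list = new List<string>(data);
        Stopwatch watch = Stopwatch.StartNew();
        list.Sort();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Call both methods several times with the same array from MakeData and compare the averages; the first call of each includes JIT compilation and should be discarded.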
Difference between SortedList<> and SortedDictionary<>
The System.Collections.Generic namespace contains various collections for different purposes. Among those we can find the List<>, LinkedList<>, Queue<>, Stack<>, Dictionary<> and the SortedList<> and SortedDictionary<>. Those are implementations of well known collection types, and they differ by the internal management of the data, and the way the data is being indexed.
LinkedList<> and SortedDictionary<>, for instance, are based upon nodes (Dictionary<> itself is a hash table stored in arrays). A node is a reference type, so every element is a separate heap allocation and adds garbage collection overhead. Also, because the nodes are linked to each other and, unlike array elements, do not sit in one contiguous memory range, some nodes may be in the CPU cache while others are not. Walking them can cause time-consuming cache misses and sometimes even page faults.
The SortedDictionary<> keeps its keys sorted in a binary search tree, so both inserting and retrieving an item are O(log n) operations. That makes it an efficient way to maintain a sorted collection that changes often. However, if O(log n) insertion is less important to you, and you mainly need good retrieval performance and a smaller memory footprint, you might prefer the SortedList<> collection, where the data is stored in sorted arrays. Retrieval is still an O(log n) binary search, but inserting into a SortedList<> is O(n), because the elements after the insertion point have to be shifted.
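Both collections implement IDictionary<TKey, TValue>, so switching between them is mostly a one-line change. A small sketch, with made-up class and method names, assuming string keys and int values:

```csharp
using System;
using System.Collections.Generic;

public static class SortedCollectionsSketch
{
    // Fills any IDictionary with the same keys, deliberately out of order.
    public static void Fill(IDictionary<string, int> target)
    {
        target["pear"] = 3;
        target["apple"] = 1;
        target["melon"] = 2;
    }

    // Joins the keys in enumeration order, which is the sorted order for both types.
    public static string JoinKeys(IDictionary<string, int> source)
    {
        List<string> keys = new List<string>(source.Keys);
        return string.Join(",", keys.ToArray());
    }

    // Only SortedList<> exposes its keys by position, because it is array-backed.
    public static string FirstKey(SortedList<string, int> list)
    {
        return list.Keys[0];
    }
}
```

Passing either a SortedList<string, int> or a SortedDictionary<string, int> to Fill and JoinKeys yields the keys in the same sorted order; only the cost of each insertion and the memory layout differ.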
Don't use localized DateTime instances when it's not needed.
Most of us discover the features of an object through IntelliSense. This habit sometimes fails us when a better alternative is available, and DateTime.Now is a classic example. When getting the system time inside a business flow, you seldom need the local time. Local time is needed only when displaying time information to the user. Behind the scenes, you should always prefer to work in UTC (Coordinated Universal Time).
Working with UTC not only provides a better way to synchronize timestamps from several sources (and no DST (Daylight Saving Time) calculations need to be taken into consideration), it also provides much better performance. In the attached samples, calling the DateTime.UtcNow getter is about ten times faster than calling the DateTime.Now getter. That's because DateTime.Now has to look up time zone information and convert the UTC time to local time on every call, while DateTime.UtcNow does not. When displaying a DateTime to the user, localize it once, just before displaying it. Keep the calculations at the backend in a unified format.
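The "store UTC, localize on display" pattern might look like the sketch below. The class and method names are invented for the example, and the "g" format string is just one possible display format.

```csharp
using System;

public static class UtcTimestampSketch
{
    // Backend code records the moment in UTC; no time zone lookup is involved.
    public static DateTime Stamp()
    {
        return DateTime.UtcNow;
    }

    // Convert to local time once, at the point where the value is shown to the user.
    public static string FormatForDisplay(DateTime utcStamp)
    {
        return utcStamp.ToLocalTime().ToString("g");
    }

    // Durations are computed on the UTC values, so DST shifts cannot distort them.
    public static TimeSpan Elapsed(DateTime utcStart, DateTime utcEnd)
    {
        return utcEnd - utcStart;
    }
}
```

Only FormatForDisplay ever touches local time; everything that is stored, compared or subtracted stays in UTC.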
StringBuilder.Append() is not always better than String.Concat()
When you want to concatenate strings, you should examine the number of strings involved in the process and their length. It is certainly true that StringBuilder.Append() performs better for tens or more iterations of string concatenation, yet when you intend to concatenate fewer than ten strings, the penalty you pay for constructing the StringBuilder is just not worth it. In those cases, you should still prefer the good old String.Concat, better known through the + and += operators, which the compiler translates into String.Concat calls.
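A short sketch of the two situations follows; the names are hypothetical and the threshold between them is something to measure rather than a fixed rule.

```csharp
using System;
using System.Text;

public static class ConcatenationSketch
{
    // A handful of strings: the + operator is compiled into a String.Concat call.
    public static string BuildGreeting(string first, string last)
    {
        return "Hello, " + first + " " + last + "!";
    }

    // Many pieces appended in a loop: StringBuilder avoids creating a new string per step.
    public static string BuildCsv(int count)
    {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0)
            {
                builder.Append(',');
            }
            builder.Append(i);
        }
        return builder.ToString();
    }
}
```

Writing BuildCsv with += inside the loop would produce the same result, but would copy the growing string on every iteration.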
String.CompareOrdinal is good for most cases.
As I've said in the localized DateTime section, IntelliSense sometimes spoils our programming skills. The String.Compare (or String.CompareTo) routines, which are what IntelliSense suggests first, are much more time consuming than String.CompareOrdinal. While String.CompareOrdinal does the obvious thing and compares the numeric values of the characters in the strings (just like comparing chars in C++), String.Compare also takes culture information into consideration.
The only reasons to use String.Compare are that the comparison has to follow culture-specific rules (for example, when sorting text for display), or that you need one of the Compare overloads which has no CompareOrdinal counterpart (such as case-insensitive comparison).
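The difference can be sketched as follows. The class and method names are made up; the StringComparison overloads used here have been part of the BCL since .NET 2.0.

```csharp
using System;

public static class ComparisonSketch
{
    // Compares raw character values; no culture data is consulted.
    public static bool OrdinalLess(string left, string right)
    {
        return string.CompareOrdinal(left, right) < 0;
    }

    // Case-insensitive match that still avoids culture rules.
    public static bool SameIgnoringCase(string left, string right)
    {
        return string.Equals(left, right, StringComparison.OrdinalIgnoreCase);
    }

    // Culture-aware ordering, for text that is sorted for display.
    public static int CultureCompare(string left, string right)
    {
        return string.Compare(left, right, StringComparison.CurrentCulture);
    }
}
```

Note that ordinal order is not dictionary order: OrdinalLess("B", "a") is true, because the character value of 'B' is smaller than that of 'a'. For identifiers, keys and file names that is usually fine; for text shown to a user it usually is not.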
Pros and cons of the File.ReadAll... routines
Another great feature of the BCL since version 2.0 is the ability to read the entire content of a file with one line of code.
The three methods (ReadAllLines, ReadAllText and ReadAllBytes) provided by the BCL team not only read the file for us, but also take care of closing the file handle. Therefore, for everyday activities you might consider using those methods.
Yet if your file isn't just an accumulation of bytes and chars, and you want to parse a record from each line, you might still want to consider the older StreamReader.ReadLine() method.
While the array returned by ReadAllLines provides a convenient way of doing so, it loads the entire file into memory first; StreamReader.ReadLine() is still more efficient for reading just one line at a time. There is no absolute recommendation here, but benchmarking both is important and illuminating.
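Both styles are sketched below for a simple "count the non-empty records" task. The class and method names are illustrative only, and real record parsing would replace the counting.

```csharp
using System;
using System.IO;

public static class FileReadingSketch
{
    // Loads the whole file into memory; convenient for small files.
    public static int CountRecordsAllAtOnce(string path)
    {
        string[] lines = File.ReadAllLines(path);
        int count = 0;
        foreach (string line in lines)
        {
            if (line.Length > 0) count++;
        }
        return count;
    }

    // Streams one line at a time; only the current line is held in memory.
    public static int CountRecordsStreaming(string path)
    {
        int count = 0;
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length > 0) count++;
            }
        }
        return count;
    }
}
```

The using block gives the streaming version the same guarantee that ReadAllLines provides implicitly: the file handle is released even if parsing a line throws.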
In conclusion
I've touched here on only a tiny fraction of the complexity that can be taken into consideration when developing applications in the .NET environment. The bottom line is that each solution should be profiled and checked carefully for the project it is intended for. "Best Practices" provide basic dos and don'ts which speed up the development process and, in most cases, give better performance; yet remember that they are not set in stone, and you should not blindly follow the yellow brick road drawn by most professionals worldwide.
Click here to download the code for this article.