Garbage Collection in .NET

Posted by Dharmendra December 09, 2012

First of all, go through the image below:

Once you understand how .NET's garbage collector works, the reasons for some of the more mysterious problems that can hit a .NET application become much clearer. .NET may have promised an end to explicit memory management, but it is still necessary to profile memory usage when you're developing .NET applications if you wish to avoid memory-related errors and some performance issues.

Introduction

The process of garbage collection starts with the GC assuming that all memory on the managed heap is rubbish. It then lists your application's roots: global and static memory pointers, local variables and parameters, and CPU registers that contain references to objects on the heap. Starting from these roots, it builds a graph of all objects on the heap that are either directly or indirectly referenced by your application. Any object on the managed heap that can somehow be accessed by your application will be marked. There are optimisations in place to prevent circular references from causing infinite loops, and to ensure that chains of references are only processed once.

Once the graph is complete, the GC has a complete picture of what is, and isn't, garbage. It then compacts the heap by moving non-garbage items together and resets its 'Next available memory slot' pointer to the end of this new, compacted heap. In doing so the GC is also responsible for updating the values of all pointers into the heap, so that references to objects that have been moved are still valid.

Large objects (85,000 bytes or more) are treated a little differently from smaller objects. Objects of this size are allocated on a separate large object heap, and when garbage collection occurs these objects are not moved around, since moving blocks of memory this large can really start to slow things down.

If, after garbage collection has occurred, there is still insufficient memory for the memory allocation request, an OutOfMemoryException is thrown.
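The marking step described above can be illustrated with a small sketch. This is a toy model in Python, not the CLR's implementation: the heap is a plain dictionary, and the function name mark_reachable and the object names are made up for the example. It shows how a set of already-visited objects keeps circular references from looping forever and ensures each object is processed once.

```python
from collections import deque

def mark_reachable(roots, references):
    """Return the set of object ids reachable from the roots.

    roots: iterable of object ids referenced directly by GC roots.
    references: dict mapping an object id to the ids it references.
    """
    marked = set()
    pending = deque(roots)
    while pending:
        obj = pending.popleft()
        if obj in marked:
            continue  # already visited: cycles and shared references are processed once
        marked.add(obj)
        pending.extend(references.get(obj, ()))
    return marked

# Hypothetical heap: A and B reference each other (a cycle). D and E also
# reference each other, but nothing reachable from a root refers to them.
heap = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": [],
    "D": ["E"],
    "E": ["D"],
}
live = mark_reachable(["A"], heap)
garbage = set(heap) - live
print(sorted(live))     # ['A', 'B', 'C']
print(sorted(garbage))  # ['D', 'E']
```

Note that D and E are garbage even though they reference each other: only reachability from a root counts, which is why a tracing collector has no trouble with cycles.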

How the Garbage Collector works

How, then, does the garbage collector achieve its magic? The basic idea is pretty simple: it examines how objects are laid out in memory and identifies all those objects that can be ‘reached’ by the running program by following some series of references.

When a garbage collection starts, it looks at a set of references called the ‘GC roots’. These are memory locations that are designated to be always reachable for some reason, and which contain references to objects created by the program. It marks these objects as ‘live’ and then looks at any objects that they reference; it marks these as being ‘live’ too. It continues in this manner, iterating through all of the objects it knows are ‘live’ and marking anything that they reference as also being in use, until it can find no further objects. The garbage collector treats an object as referencing another object if it, or one of its superclasses, has a field that refers to the other object.

Once all of these live objects are known, any remaining objects can be discarded and the space re-used for new objects. .NET compacts memory so that there are no gaps (effectively squashing the discarded objects out of existence) - this means that free memory is always located at the end of a heap, which makes allocating new objects very fast.

GC roots are not objects in themselves but are instead references to objects. Any object referenced by a GC root will automatically survive the next garbage collection. There are four main kinds of root in .NET:

Local variables. A local variable in a method that is currently running is considered to be a GC root. The objects referenced by these variables can always be accessed immediately by the method they are declared in, and so they must be kept around. The lifetime of these roots can depend on the way the program was built. In debug builds, a local variable lasts for as long as the method is on the stack. In release builds, the JIT is able to look at the program structure to work out the last point within the execution at which a variable can be used by the method, and will discard it when it is no longer required. This strategy isn’t always used and can be turned off, for example, by running the program in a debugger.

Static variables. Static variables are also always considered GC roots. The objects they reference can be accessed at any time by the class that declared them (or the rest of the program if they are public), so .NET will always keep them around. Variables declared as ‘thread static’ will only last for as long as that thread is running.

Objects passed to unmanaged code. If a managed object is passed to an unmanaged COM+ library through interop, then it will also become a GC root with a reference count. This is because COM+ doesn’t do garbage collection. It uses a reference counting system instead: once the COM+ library finishes with the object by setting the reference count to 0, it ceases to be a GC root and can be collected again.

Objects with finalizers. If an object has a finalizer, it is not immediately removed when the garbage collector decides it is no longer ‘live’. Instead, it becomes a special kind of root until .NET has called the finalizer method. This means that these objects usually require more than one garbage collection to be removed from memory, as they will survive the first time they are found to be unused.
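The point about finalizable objects surviving their first collection can be made concrete with a toy simulation. The sketch below is written in Python and only models the bookkeeping; the class TinyHeap, its methods, and the object names are all hypothetical, and references between objects are left out to keep it short.

```python
class TinyHeap:
    """Toy heap that models how a finalizer delays reclamation."""

    def __init__(self):
        self.needs_finalizer = {}  # object name -> True while its finalizer has not run
        self.roots = set()         # objects the program still references
        self.pending = []          # unreachable objects waiting for their finalizer

    def allocate(self, name, has_finalizer=False):
        self.needs_finalizer[name] = has_finalizer
        self.roots.add(name)

    def release(self, name):
        self.roots.discard(name)

    def collect(self):
        """Run one collection and return the names whose memory was reclaimed."""
        reclaimed = []
        for name in list(self.needs_finalizer):
            if name in self.roots or name in self.pending:
                continue  # still rooted, by the program or by the finalizer queue
            if self.needs_finalizer[name]:
                self.pending.append(name)  # survives this collection
            else:
                del self.needs_finalizer[name]
                reclaimed.append(name)
        return reclaimed

    def run_finalizers(self):
        """Model the finalizer thread: after this, the objects are ordinary garbage."""
        for name in self.pending:
            self.needs_finalizer[name] = False
        self.pending.clear()

heap = TinyHeap()
heap.allocate("buffer")
heap.allocate("file_handle", has_finalizer=True)
heap.release("buffer")
heap.release("file_handle")

first = heap.collect()    # only the object without a finalizer goes
heap.run_finalizers()
second = heap.collect()   # the finalized object is reclaimed now
print(first, second)      # ['buffer'] ['file_handle']
```

In this model the finalizable object needs two collections, with a finalizer run in between, before its memory comes back.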

What Happens During a Garbage Collection

A garbage collection has the following phases:

A marking phase that finds and creates a list of all live objects.
A relocating phase that updates the references to the objects that will be compacted.
A compacting phase that reclaims the space occupied by the dead objects and compacts the surviving objects. The compacting phase moves objects that have survived a garbage collection toward the older end of the segment.
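A minimal sketch of the relocating and compacting phases, assuming the marking phase has already produced the set of live objects. It is a simplified Python model, not the runtime's algorithm: addresses are plain integers, the function name compact is made up, and survivors are slid toward address 0, which stands in for the older end of the segment.

```python
def compact(heap, live, roots):
    """Slide live objects toward address 0 and fix up every reference.

    heap: dict mapping address -> (size, [addresses of referenced objects]).
    live: set of addresses that survived the marking phase.
    roots: list of addresses held by GC roots.
    Returns (new_heap, new_roots, next_free).
    """
    # Plan: assign each survivor its new address, preserving address order.
    forwarding = {}
    next_free = 0
    for address in sorted(live):
        forwarding[address] = next_free
        next_free += heap[address][0]

    # Relocating phase: rewrite references so they use the new addresses.
    relocated = {
        address: (heap[address][0], [forwarding[r] for r in heap[address][1]])
        for address in forwarding
    }
    new_roots = [forwarding[r] for r in roots]

    # Compacting phase: move each survivor; dead objects are simply dropped.
    new_heap = {forwarding[address]: obj for address, obj in relocated.items()}
    return new_heap, new_roots, next_free

# Hypothetical heap: the objects at 16 and 72 are dead.
heap = {0: (16, [48]), 16: (32, []), 48: (24, [0]), 72: (8, [])}
new_heap, new_roots, next_free = compact(heap, live={0, 48}, roots=[48])
print(new_heap)   # {0: (16, [16]), 16: (24, [0])}
print(new_roots)  # [16]
print(next_free)  # 40
```

After compaction the free space starts at next_free, so the next allocation only has to advance that pointer.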

Because generation 2 collections can occupy multiple segments, objects that are promoted into generation 2 can be moved into an older segment. Both generation 1 and generation 2 survivors can be moved to a different segment, because they are promoted to generation 2. The large object heap is ordinarily not compacted, because copying large objects imposes a performance penalty.

The garbage collector uses the following information to determine whether objects are live:

Stack roots. Stack variables provided by the just-in-time (JIT) compiler and stack walker.
Garbage collection handles. Handles that point to managed objects and that can be allocated by user code or by the common language runtime.
Static data. Static objects in application domains that could be referencing other objects. Each application domain keeps track of its static objects.

Before a garbage collection starts, all managed threads are suspended except for the thread that triggered the garbage collection. The following illustration shows a thread that triggers a garbage collection and causes the other threads to be suspended.

[Illustration: Thread that triggers a garbage collection]

Performance of the Garbage Collector

In terms of performance, the most important characteristic of a garbage-collected system is that the garbage collector can start executing at any time. This makes such systems unsuited to situations where timing is critical, as the timing of any operation can be thrown off by the operation of the collector.

The .NET collector has two main modes of operation: concurrent and synchronous. These are sometimes loosely referred to as workstation and server, although strictly speaking the workstation/server choice and the concurrent setting are separate configuration options. By default, concurrent garbage collection is used in desktop applications and synchronous collection is used in server applications such as ASP.NET.

In concurrent mode, .NET will try to avoid stopping the running program while a collection is in progress. This means that the total amount of work the application can get done in a given period of time is lower, but the application is much less likely to pause noticeably. It’s good for interactive applications where it’s important to give the user the impression that the application is responding immediately.

In synchronous mode, .NET will suspend the running application while the garbage collector is running. This is actually more efficient overall than concurrent mode - garbage collection takes the same amount of time, but it doesn’t have to contend with the program continuing to run - but it means that there can be noticeable pauses when a full collection has to be done.

The type of garbage collector can be set in the configuration file for the application if the default isn’t suitable. Picking the synchronous collector can be useful when it’s more important that an application has high throughput than that it appears responsive.

In large applications, the number of objects that the garbage collector needs to deal with can become very large, which means it can take a very long time to visit and rearrange all of them. To deal with this, .NET uses a ‘generational’ garbage collector, which tries to give priority to a smaller set of objects. The idea is that objects created recently are more likely to be released quickly, so a generational garbage collector prioritises them when trying to free up memory. .NET first looks at the objects that have been allocated since the last garbage collection and only starts to consider older objects if it can’t free up enough space this way.

This system works best if .NET can choose the time of collection itself, and will be disrupted if GC.Collect() is called, as this will often cause new objects to become old prematurely, which increases the likelihood of another expensive full collection in the near future.

Classes with finalizers can also disrupt the smooth operation of the garbage collector. Objects of these classes can’t be removed immediately: they instead go to the finalizer queue and are removed from memory once the finalizer has been run. This means that any object they reference (and any object referenced by those, and so on) has to be kept in memory at least until this time as well, and will require two garbage collections before the memory becomes available again. If the graph contains many objects with finalizers, this can mean that the garbage collector requires many passes to completely release all of the unreferenced objects.

There is a simple way to avoid this problem: implement IDisposable on the finalizable classes, move the actions necessary to finalize the object into the Dispose() method and call GC.SuppressFinalize() at the end. The finalizer can then be modified to call the Dispose() method instead. GC.SuppressFinalize() tells the garbage collector that the object no longer needs to be finalized and can be garbage collected immediately, which can result in memory being reclaimed much more quickly.
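The generational idea in this section, and the way a forced collection can age objects prematurely, can be sketched with a toy model. The Python below is an illustration under simplifying assumptions, not how the runtime decides anything: the class GenerationalHeap and its methods are hypothetical, liveness is tracked with a simple set, and every survivor of a collection is promoted by one generation up to generation 2.

```python
class GenerationalHeap:
    """Toy model of generations 0, 1 and 2 with promotion on survival."""

    MAX_GENERATION = 2

    def __init__(self):
        self.generation = {}  # object name -> current generation
        self.live = set()     # objects the program still references

    def allocate(self, name):
        self.generation[name] = 0  # new objects always start in generation 0
        self.live.add(name)

    def release(self, name):
        self.live.discard(name)

    def collect(self, max_generation=0):
        """Collect generations 0..max_generation and return the reclaimed names."""
        reclaimed = []
        for name, gen in list(self.generation.items()):
            if gen > max_generation:
                continue  # older generations are not examined at all
            if name in self.live:
                self.generation[name] = min(gen + 1, self.MAX_GENERATION)
            else:
                del self.generation[name]
                reclaimed.append(name)
        return reclaimed

heap = GenerationalHeap()
heap.allocate("request")

# A collection forced while the object is still in use promotes it.
forced = heap.collect()
promoted_to = heap.generation["request"]

# The object dies shortly afterwards, but a cheap generation 0
# collection no longer looks at it.
heap.release("request")
young = heap.collect()

# Only a more expensive collection that includes generation 1 reclaims it.
older = heap.collect(max_generation=1)
print(forced, promoted_to, young, older)  # [] 1 [] ['request']
```

Had the first collection not been forced, the short-lived object would still have been in generation 0 when it died and the next cheap collection would have reclaimed it.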

Conclusion

It becomes easier to understand memory and performance problems in an application if you take time to understand how the garbage collector works. It reveals that, while .NET makes the burden of memory management lighter, it does not completely eliminate the need to track and manage resources. It is, however, easier to use a memory profiler to diagnose and fix problems in .NET. Taking account of the way .NET manages memory early in development can help reduce problems, but even then such problems can still arise because of the complexity of the framework or third-party libraries.

References:
1. MSDN [http://msdn.microsoft.com/en-us/library/ee787088.aspx]
2. CodeProject [http://www.codeproject.com/Articles/1060/Garbage-Collection-in-NET]
