Monday, November 30, 2009

Managed threads in “whole stack committed” shocker

I was experimenting with thread creation in a managed process, to test the limits, and I noticed something very odd.  Creating a large number of threads was using a surprisingly large amount of memory.  I guessed it must be the thread stacks, but the numbers didn’t fit with my understanding of how thread stacks are allocated.  

In an unmanaged process, by default each thread has 1 Mbyte of address space reserved for its stack, but initially only a small amount of this address space is committed.   You can see this with the excellent VMMAP utility – here’s what it shows for an unmanaged x64 application:

unmanaged_threads_x64

The ‘size’ is the reserved address space – 1024k as expected.  The committed memory is 16k, which is the amount of the address space that actually has physical storage associated with it.   This can grow as required, up to the size of the reserved address space, but usually it won’t get close to that. 

The size of the reserved address space is important in the sense that we need to make sure we don’t run out – 2000 threads would be enough to max out a 32-bit process, for example.   But generally speaking it’s the committed size that has more impact - that number has to be backed by real storage, in RAM or at least the page file.

So, back to the managed process.  This is what VMMAP shows:

managed_threads_x64

Reserved address space is the same as before at 1024k, but the whole thing is committed!  So each thread is using the full 1 Mbyte of physical storage, regardless of how much memory the stack actually needs.   That can’t be right, can it?   Well it seems it is, as Joe Duffy explains.

So, the bottom line is that you should think carefully about creating managed threads that are going to sit idle in your process, for example in a thread pool.   In an unmanaged process those idle threads wouldn’t be using many resources, but in a managed process they’re using a significant chunk of valuable memory.   This is even more important if you have any form of multi-process architecture, perhaps with thread pools in each process. 

If you really need a large number of long-lived threads in your managed process, consider reducing the size of the stack using the appropriate overload of the System.Threading.Thread constructor.

Labels: , , ,


Saturday, October 24, 2009

Managed DCOM server

For my current project I have some old unmanaged C++ code that needs to talk to a new .NET server, remotely.   The channel needs to be fast and secure.   There are a lot of options, but I’ve narrowed it down to two:

  1. WWSAPI for the C++ client, WCF for the .NET server, using NetTcpBinding.  (See here for details.)
  2. DCOM

Option 1 was the strong favourite until Microsoft decided to play silly buggers with the redistribution rights for WWSAPI, so I’ve been forced to investigate option 2.

COM interop through loading .NET components into an unmanaged process is well documented and generally works fine, but accessing a .NET server remotely via (D)COM is not so well documented, and indeed it’s not even clear if it’s supported by Microsoft.   But it does seem to work, and it’s surprisingly simple once you discover the RegisterTypeForComClients method on the RegistrationServices class.   This is basically a wrapper for COM’s CoRegisterClassObject.

Here’s the C# server code:

[ComVisible(true)]
public interface ICalculator
{
    int Add(int x, int y);
}

[ComVisible(true)]
[ClassInterface(ClassInterfaceType.None)]
public class Calculator : ICalculator
{
    public int Add(int x, int y) { return x + y; }
}

class Program
{
    [MTAThread]
    static void Main(string[] args)
    {
        var regServices = new RegistrationServices();

        int cookie = regServices.RegisterTypeForComClients(
            typeof(Calculator),
            RegistrationClassContext.LocalServer | RegistrationClassContext.RemoteServer,
            RegistrationConnectionType.MultipleUse);

        Console.WriteLine("Ready"); Console.ReadKey();

        regServices.UnregisterTypeForComClients(cookie);
    }
}

and the C++ client code:

#import "dcomserver.tlb" no_namespace raw_interfaces_only

int _tmain(int argc, _TCHAR* argv[])
{
    CoInitializeEx(0, COINIT_MULTITHREADED);

    {
        CComPtr<ICalculator> spCalc;
        spCalc.CoCreateInstance(__uuidof(Calculator), 0, CLSCTX_LOCAL_SERVER);
        
        long result = 0;
        spCalc->Add(10, 20, &result);
        
        cout << result << endl;
    }

    CoUninitialize();
    return 0;
}

(The simplest way to generate the tlb file to #import is to run regasm /tlb DcomServer.exe)

There may well be some gotchas with this approach – I haven’t tested it thoroughly yet – but it seems a promising option if the WWSAPI licensing issues can’t be sorted out.

Labels: , , , ,