System.AccessViolationException with SMILE.net

svenr
Posts: 9
Joined: Mon Aug 24, 2009 8:28 am

Post by svenr »

Hello Forum!

I am using the SMILE.net wrapper to access SMILE from an ASP.NET web application.
To avoid the expensive loading of a Network from disk with the ReadFile method on every request, I keep a "master copy" in memory and use the Clone() method to hand out "fresh" networks that handle the requests.
Occasionally, I get AccessViolationExceptions from within the Clone() call and, more rarely, during the Finalize() method of the Network object (apparently when garbage collection runs).

What's wrong here?
shooltz[BayesFusion]
Site Admin
Posts: 1457
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

svenr wrote: Occasionally, I am getting AccessViolationExceptions from within the .Clone()-call and, more rarely, during the Finalize()-method of the Network-object (apparently, when Garbage Collection runs).
Hard to tell without looking at the specifics of your application. Can you estimate how much memory is used by CPTs in your network? Do you have multiple threads accessing single Smile.Network objects?

Post by svenr »

shooltz wrote: Hard to tell without looking at the specifics of your application.
Thanks for answering! Let's see whether I can get specific enough without having to post the entire application ;-)
shooltz wrote: Can you estimate how much memory is used by CPTs in your network?
The memory difference between an "empty" Network and a "loaded" one is about 100 MB in Windows Task Manager.
shooltz wrote: Do you have multiple threads accessing single Smile.Network objects?
I don't think so. Since this is a web application, we obviously have concurrent requests, but the network cloning is synchronized, and afterwards every thread should be using its own separate instance of Smile.Network.

Attached is a minimized C# code file that does the cloning. It is implemented as a singleton so that it is available during the entire lifetime of the web server's worker process.

Callers would do something like:

Code:

Network myNetwork = BayesContainer.Instance.CloneNetwork();
and then work with their "myNetwork".
Attachments
BayesContainer.cs.txt
Class to hold the SMILE network
(853 Bytes)
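
The master-copy pattern described in this post can be sketched roughly as follows. This is not the attached BayesContainer.cs: the class body is a guess at its shape, and StubNetwork is a hypothetical stand-in for Smile.Network so that the sketch compiles without smilenet.dll. Only the names BayesContainer, Instance and CloneNetwork come from the post.

```csharp
using System;

// Hypothetical stand-in for Smile.Network (assumption, not the real class).
sealed class StubNetwork
{
    private readonly double[] table;

    public StubNetwork(double[] table) { this.table = table; }

    public int Size { get { return table.Length; } }

    // Deep copy, playing the role of Network.Clone().
    public StubNetwork Clone()
    {
        return new StubNetwork((double[])table.Clone());
    }
}

// Singleton that holds one master copy and hands out clones.
sealed class BayesContainer
{
    private static readonly BayesContainer instance = new BayesContainer();

    private readonly object cloneLock = new object();
    private readonly StubNetwork master;

    private BayesContainer()
    {
        // Real code would load the network from disk once here (ReadFile);
        // the stub just allocates a table.
        master = new StubNetwork(new double[1000]);
    }

    public static BayesContainer Instance { get { return instance; } }

    public StubNetwork CloneNetwork()
    {
        lock (cloneLock)   // only one thread touches the master copy at a time
        {
            return master.Clone();
        }
    }
}

static class CloneDemo
{
    static void Main()
    {
        StubNetwork a = BayesContainer.Instance.CloneNetwork();
        StubNetwork b = BayesContainer.Instance.CloneNetwork();
        Console.WriteLine("separate instances: " + !ReferenceEquals(a, b));
    }
}
```

The lock only protects the master copy while it is being cloned; each returned clone is meant to be used by a single thread afterwards.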

Post by shooltz[BayesFusion] »

svenr wrote:Thanks for answering! Let's see whether I can get specific enough without having to post the entire application ;-)
If there's a chance of posting the entire application, it would be very helpful :) You can also send me a private message.

svenr wrote: The memory difference between an "empty" Network and a "loaded" one is about 100MB in Windows' task manager.
Please note that the .NET garbage collector is not aware of this difference unless you explicitly call GC.AddMemoryPressure: the 100 MB is almost exclusively allocated by unmanaged code (the C++ SMILE library). There's a non-zero chance that garbage collection is delayed because so little is allocated on the .NET managed heap, while at the same time the native allocations are hitting the address space limit. You can ensure that the native code deallocates its memory by calling Network.Dispose directly or by wrapping the code that uses a given Smile.Network object in a 'using' statement.

svenr wrote: See attached a minimized C#-codefile that does the Cloning. It is implemented as a singleton to be available during the entire lifetime of the web server's worker process.
Looks OK. The network returned by Clone() is subsequently used on a single thread only, right?
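
To illustrate the point about unmanaged memory, here is a minimal sketch of a managed object that owns native memory, reports it with GC.AddMemoryPressure, and releases it deterministically through Dispose. NativeBackedNetwork is a hypothetical stand-in, not the actual Smile.Network implementation; how SMILE.net manages its native memory internally is not shown in this thread.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for a wrapper such as Smile.Network: a tiny managed
// object that owns a large block of unmanaged memory.
sealed class NativeBackedNetwork : IDisposable
{
    private readonly long size;
    private IntPtr buffer;

    public NativeBackedNetwork(long sizeInBytes)
    {
        size = sizeInBytes;
        buffer = Marshal.AllocHGlobal(new IntPtr(sizeInBytes));
        // Report the native allocation so the GC can factor it into scheduling.
        GC.AddMemoryPressure(size);
    }

    public bool IsDisposed
    {
        get { return buffer == IntPtr.Zero; }
    }

    // Deterministic cleanup: called directly or by a 'using' statement.
    public void Dispose()
    {
        Release();
        GC.SuppressFinalize(this);
    }

    // Safety net if the caller forgets to dispose.
    ~NativeBackedNetwork()
    {
        Release();
    }

    private void Release()
    {
        if (buffer == IntPtr.Zero) return;   // already released
        Marshal.FreeHGlobal(buffer);
        buffer = IntPtr.Zero;
        GC.RemoveMemoryPressure(size);
    }
}

static class DisposeDemo
{
    static void Main()
    {
        NativeBackedNetwork net = new NativeBackedNetwork(1024 * 1024);
        using (net)
        {
            Console.WriteLine("disposed inside using: " + net.IsDisposed);
        }
        // The 'using' block has called Dispose(), even if an exception had been thrown.
        Console.WriteLine("disposed after using: " + net.IsDisposed);
    }
}
```

With the 'using' statement the native block is freed at the end of the scope instead of whenever the finalizer happens to run.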

Post by svenr »

Thanks for the hints with AddMemoryPressure and Network.Dispose(). :D
I was unable to reproduce the exception with the code that uses Dispose() so it appears that your suggestions did the trick (although the bug was not reliably reproducible even before - will report back if it re-surfaces).

For completeness, I have attached a more comprehensive example to this post. It contains an updated BayesContainer.cs that uses both suggestions.
The attachment also contains an example of how I use SMILE in a WCF webservice. The project should compile in VS2008 once you put a smilenet.dll into its directory.
It should show that accesses to a SMILE.Network instance are single threaded, with the exception of the cloning itself (which is synchronized for this reason). The cloned SMILE.Network is only used as a local member within one method.
Attachments
BayesService.zip
Exemplary Webservice project
(6.96 KiB)

Post by svenr »

Stupid me, why am I posting such bold things... ;-)

Here is the relevant part of two stacktraces that may be of help:

Code:

ERROR 0 - AccessViolationException - Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
  StackTrace:
  at DSL_network.__ctor(DSL_network* )
  at Smile.Network..ctor()

Code:

System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
  at Smile.Network.Finalize()

Code:

System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
  at DSL_network.=(DSL_network* , DSL_network* )
  at Smile.Network.Clone()
Sorry, no line numbers in the debug output... It never happens on my local machine.

Post by shooltz[BayesFusion] »

svenr wrote:Thanks for the hints with AddMemoryPressure and Network.Dispose(). :D
I don't think there's a need to use both approaches. Since your cloned Networks are only used within a single method, the 'using' statement is the best choice: it will call Network.Dispose() at the end of the scope, even if an exception is thrown.

With a large network, it can actually be cheaper to run synchronized inference on a single Network object instead of synchronized copying followed by multithreaded updates. This of course depends on the structure of the network, the actual hardware and the workload.

I have prepared a smilenet.dll compiled without optimizations and with extra diagnostic checks. This version may give you better stack frames, possibly with line numbers. I'm sending the download link as a private message.

Post by svenr »

I got the debug build and will deploy it to the server ASAP.
Will report back with any new insights...

Back again...

Post by svenr »

Hi! It's me again - with bad (or, let's say, ambivalent) news...

The problem hasn't surfaced again for quite some time now, mostly because of excessive synchronisation, which effectively meant that the whole application ran single-threaded...

In the meantime, thanks to your suggestion, I tried to avoid the expensive per-request cloning by implementing a pool from which threads may acquire SMILE.Networks.
The pool is built by reading one .xdsl file from disk several times (i.e. there should not be any connections between the Network-objects in the pool). Threads return Networks to the pool after they have used them.
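
A pool of this kind could look roughly like the sketch below. It is not the code from the attached project: BlockingPool, Acquire and Release are hypothetical names, and the element type is generic so that the example runs without smilenet.dll (in the scenario above T would be Smile.Network and the factory would read the .xdsl file).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Fixed-size pool; callers block in Acquire() until an instance is free.
sealed class BlockingPool<T>
{
    private readonly Queue<T> idle = new Queue<T>();
    private readonly object gate = new object();

    public BlockingPool(int size, Func<T> factory)
    {
        // Each instance is built independently, so pooled objects share nothing.
        for (int i = 0; i < size; i++)
            idle.Enqueue(factory());
    }

    public T Acquire()
    {
        lock (gate)
        {
            while (idle.Count == 0)
                Monitor.Wait(gate);      // releases the lock while waiting
            return idle.Dequeue();
        }
    }

    public void Release(T item)
    {
        lock (gate)
        {
            idle.Enqueue(item);
            Monitor.Pulse(gate);         // wake one waiting thread
        }
    }
}

static class PoolDemo
{
    static void Main()
    {
        BlockingPool<object> pool = new BlockingPool<object>(2, () => new object());
        object net = pool.Acquire();
        try
        {
            // Set evidence, run inference and read results here.
            Console.WriteLine("acquired: " + (net != null));
        }
        finally
        {
            pool.Release(net);           // always hand the instance back
        }
    }
}
```

A pool size of 1 degenerates into fully serialized access, which matches the configuration reported as working below.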

Using the pool with a size of 1 (i.e., serializing all threads onto one single SMILE.Network) is fine. Setting the pool size to 2 or more causes an AccessViolationException under the following circumstances: when UpdateBeliefs() has been executed on two or more Networks in parallel, the next execution of UpdateBeliefs() on one of these Networks (done by the next thread to get it from the pool) throws the exception. In fact, calling UpdateBeliefs() twice within the same thread also throws the exception if another thread completed a call to UpdateBeliefs() between the two calls.

However, the exception is thrown only for some BayesianAlgorithmTypes:
Henrion, HeuristicImportance and LSampling fail, while AisSampling, BackSampling, EpisSampling, Lauritzen and SelfImportance are ok.
The exception even occurs when the pool contains wholly different networks (i.e. read from separate files).
It appears to me that the failing algorithms use some variable that is shared across the library and not private to the Network object (something 'static', perhaps?), which leads to interference.

I have attached a sample project which implements the described behaviour and reproduces the Exception reliably (increase NUMBER_OF_THREADS in case it does not ;-)).
It is for VS2008 but I guess the .cs files can easily be compiled in other versions of Visual Studio. In any case, make sure to have a smilenet.dll referenced by the project to build it.

If you need any further information, let me know.
Attachments
AccessViolationExample.zip
Small VS2008 project to reproduce the AccessViolationException
(7.68 KiB)

Post by shooltz[BayesFusion] »

svenr wrote:However, the exception is thrown only for some BayesianAlgorithmTypes:
Henrion, HeuristicImportance and LSampling fail, while AisSampling, BackSampling, EpisSampling, Lauritzen and SelfImportance are ok.
Bug confirmed & fixed. The code for the three failing sampling algorithms is vintage 1996 :) There were some global variables, one of them a pointer to an object allocated on the heap for the duration of the inference call. You can imagine what happens with multiple threads.

The fix will be included in the upcoming release.
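
The kind of problem described above, and one common way around it, can be sketched in C# as follows. This is an illustration only: the actual fix in the C++ SMILE library is not shown in this thread, and Sampler, Run and the commented-out sharedScratch field are hypothetical names.

```csharp
using System;
using System.Threading;

static class Sampler
{
    // Problematic pattern (deliberately not used): one scratch buffer shared by
    // every call. A second thread entering Run would replace or clear it while
    // the first thread is still working with it.
    // static double[] sharedScratch;

    // Safer pattern: the scratch buffer exists only inside the call that uses it.
    public static double Run(int samples, int seed)
    {
        double[] scratch = new double[samples];   // private to this call
        Random rng = new Random(seed);
        for (int i = 0; i < samples; i++)
            scratch[i] = rng.NextDouble();

        double sum = 0;
        foreach (double x in scratch)
            sum += x;
        return sum / samples;
    }
}

static class SamplerDemo
{
    static void Main()
    {
        double expected = Sampler.Run(10000, 42);
        double[] results = new double[8];
        Thread[] threads = new Thread[results.Length];

        for (int i = 0; i < threads.Length; i++)
        {
            int slot = i;   // each thread writes only its own slot
            threads[i] = new Thread(() => { results[slot] = Sampler.Run(10000, 42); });
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();

        bool consistent = true;
        foreach (double r in results)
            if (r != expected) consistent = false;
        Console.WriteLine("consistent: " + consistent);
    }
}
```

Because no state outlives a call or is shared between calls, concurrent runs with the same seed produce the same result as a single-threaded run.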

Post by svenr »

shooltz wrote:Bug confirmed & fixed.
Thanks very much, that's good news!
I'm looking forward to the new release. :D

Post by svenr »

shooltz wrote:Bug confirmed & fixed.
...
The fix will be included in the upcoming release.
I have run the smilenet.dll released on November 4 through my test suite and all algorithms passed without throwing the exception.

Thanks again! 8)