Performance and Scalability of SMILE

anknai
Posts: 3
Joined: Mon Jan 07, 2008 5:31 am

Performance and Scalability of SMILE

Post by anknai »

Hi,
We are planning to use SMILE in an enterprise environment where we can draw Bayesian networks according to the probabilities specified by the end user. There can be thousands of users hitting the system at a time from their web browsers. I have the following questions:
i. Is the application scalable enough to handle thousands of requests per second?
ii. Is the application multithreaded, so that multiple threads can access it without interfering with other threads' data?
iii. Can I use it on a Linux/Unix platform, given that it uses a .dll file, which is supported on Windows only?
iv. GeNIe is currently a desktop application; can we enhance it for use in a web application?

Thanks in Advance
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Performance and Scalability of SMILE

Post by shooltz[BayesFusion] »

anknai wrote: i. Is the application scalable enough to handle thousands of requests per second?
That's a very general question. If you just want to do inference, the performance will depend on the structure of your network(s) and the evidence set.

ii. Is the application multithreaded, so that multiple threads can access it without interfering with other threads' data?
I assume you're asking about using SMILE (which is a library, not an application) in a multi-threaded environment. It's possible, as long as each DSL_network is used from only one thread at a time.

iii. Can I use it on a Linux/Unix platform, given that it uses a .dll file, which is supported on Windows only?
We have SMILE binaries for various Linux distributions. You can also choose jSMILE, which is a Java wrapper for SMILE.

iv. GeNIe is currently a desktop application; can we enhance it for use in a web application?
No, you'll have to write your web application from scratch.
anknai
Posts: 3
Joined: Mon Jan 07, 2008 5:31 am

Re: Performance and Scalability of SMILE

Post by anknai »

Thanks Shooltz for the quick reply,
I am using jSMILE as the Java wrapper.
I assume you're asking about using SMILE (which is a library, not an application) in a multi-threaded environment. It's possible, as long as each DSL_network is used from only one thread at a time.
I don't understand the DSL_network part. Could you please explain it in more detail with respect to jSMILE?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Performance and Scalability of SMILE

Post by shooltz[BayesFusion] »

I don't understand the DSL_network part. Could you please explain it in more detail with respect to jSMILE?
DSL_network is in C++ SMILE. The jSMILE equivalent is the Network class.
anknai
Posts: 3
Joined: Mon Jan 07, 2008 5:31 am

Re: Performance and Scalability of SMILE

Post by anknai »

Thanks Shooltz, that clears up my doubt.
One more question:
How can we obtain a commercial version of SMILE if we need to use it in a commercial application, and how much will it cost us?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Performance and Scalability of SMILE

Post by shooltz[BayesFusion] »

There's no commercial version of SMILE. You can read our license here:

http://genie.sis.pitt.edu/license.html

We can also provide consulting services on request.
romeo
Posts: 7
Joined: Mon Dec 10, 2007 5:27 pm

Post by romeo »

I have follow-ups to this

1. Scalability - you have described it as a function of the network topology, kind of evidence, etc. But when working with, say, thousands of variables, are there any best practices/guidelines to follow? How would one go about optimizing the model for scalability?

2. Parallelism - Can we parallelize the model (by, say, subnetting) to improve scalability?

3. Performance - Are the inference algorithms linear, n log n or exponential in performance?

Thanks
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Post by marek [BayesFusion] »

romeo wrote: 1. Scalability - you have described it as a function of the network topology, kind of evidence, etc. But when working with, say, thousands of variables, are there any best practices/guidelines to follow? How would one go about optimizing the model for scalability?

2. Parallelism - Can we parallelize the model (by, say, subnetting) to improve scalability?

3. Performance - Are the inference algorithms linear, n log n or exponential in performance?
Ad 1: One good heuristic is to keep the number of parents of any node in the network small. The number of parents (and the number of states that each of them has) determines the size of the CPTs and, if you use the exact algorithm, the size of the clusters. A reasonable number of parents is a few; try not to allow more than 10 if possible. There is a simple, old technique for reducing the number of parents called "parent divorcing". If you search for it in the context of Bayesian networks, you will find a simple description.
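
As a rough illustration of why the parent count matters, the sketch below computes the number of probabilities in a CPT as the child's state count times the product of the parents' state counts. It is plain arithmetic and does not call SMILE or jSMILE; the class name `CptSize` and the "divorced" layout (two binary intermediate nodes with five parents each) are assumptions chosen only for the example.

```java
import java.util.Arrays;

// Illustrative arithmetic only; this does not call SMILE or jSMILE.
final class CptSize {
    // Number of probabilities in a CPT:
    // child states times the product of all parent state counts.
    static long entries(int childStates, int... parentStates) {
        long size = childStates;
        for (int s : parentStates) {
            size = Math.multiplyExact(size, (long) s); // fails loudly on overflow
        }
        return size;
    }

    public static void main(String[] args) {
        int[] tenBinaryParents = new int[10];
        Arrays.fill(tenBinaryParents, 2);
        long direct = entries(2, tenBinaryParents);

        int[] fiveBinaryParents = new int[5];
        Arrays.fill(fiveBinaryParents, 2);
        // Hypothetical divorced layout: two binary intermediate nodes with five
        // parents each; the original child keeps only those two nodes as parents.
        long divorced = 2 * entries(2, fiveBinaryParents) + entries(2, 2, 2);

        System.out.println("direct CPT entries:   " + direct);
        System.out.println("divorced CPT entries: " + divorced);
    }
}
```

For a binary child with ten binary parents this gives 2048 entries, against 136 in total for the divorced layout. The saving is only available when the intermediate nodes can sensibly summarize the influence of the parents they group, which is a modeling decision rather than something the arithmetic guarantees.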

Ad 2. SMILE does not have any parallel algorithms at the moment. Some algorithms are very suitable for parallelism (e.g. sampling), others less. If you have a multi-CPU machine and want to use all its resources, you can have app-level parallelism, like calculating multiple cases at the same time.
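
App-level parallelism of this kind might be sketched as follows: each worker thread keeps its own network instance and evaluates independent cases. `StubNetwork` is a hypothetical stand-in with made-up behavior, used so the example runs without the native library; in a real application it would wrap a jSMILE `Network`, and the method names here are not jSMILE's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a loaded network; NOT the jSMILE API.
final class StubNetwork {
    private int evidence; // mutable state, so an instance must not be shared across threads

    void setEvidence(int value) { this.evidence = value; }

    // Placeholder for "update beliefs and read a posterior"; returns a made-up number.
    double infer() { return evidence / 100.0; }
}

final class ParallelCases {
    // Each worker thread lazily creates and then reuses its own network instance.
    private static final ThreadLocal<StubNetwork> PER_THREAD =
            ThreadLocal.withInitial(StubNetwork::new);

    // Evaluates every case on a fixed-size thread pool; results keep the input order.
    static List<Double> evaluate(List<Integer> cases, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int c : cases) {
                Callable<Double> task = () -> {
                    StubNetwork net = PER_THREAD.get();
                    net.setEvidence(c);
                    return net.infer();
                };
                futures.add(pool.submit(task));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("case evaluation failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(List.of(10, 20, 30, 40), 2));
    }
}
```

The design choice is thread confinement: no network object is ever touched by two threads, so no locking is needed around inference. The cost is one copy of the network per worker thread.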

Ad 3: Belief updating is NP-hard. So are many other algorithms. This does not really prevent us from successfully computing in practical models consisting of thousands of variables. There exist models that are smaller and hard, though :-).

I hope this helps.
Cheers,

Marek
romeo
Posts: 7
Joined: Mon Dec 10, 2007 5:27 pm

Post by romeo »

Dear Marek,

Thanks very much for your detailed response. I have looked up "parent divorcing" and have a better idea of how to reduce the complexity of multi-parent nodes.

But I have a general question about Bayesian complexity/scalability - are there any papers, books on this topic that you would suggest reading? One of the challenges we have been facing is working with high-dimensional datasets and any pointers to readings would be very helpful.

Thanks for your continued help and support.
R.
djr45
Posts: 4
Joined: Tue Aug 26, 2008 5:44 pm

Re: Performance and Scalability of SMILE

Post by djr45 »

ii. Is the application multithreaded, so that multiple threads can access it without interfering with other threads' data?
I assume you're asking about using SMILE (which is a library, not an application) in a multi-threaded environment. It's possible, as long as each DSL_network is used from only one thread at a time.
Do you mean that if I wanted to give different evidence to the same network (because 2 different users are using the same network to assess different scenarios) I will not be able to do it concurrently because it will update the same object? Also, if I wanted to do it, would I have to clone my Network object per user connecting to the program? This approach will only work for VERY small networks and a VERY limited number of users (the computer will crash, otherwise, with all this data!)
iii. Can I use it on a Linux/Unix platform, given that it uses a .dll file, which is supported on Windows only?
We have SMILE binaries for various Linux distributions. You can also choose jSMILE, which is a Java wrapper for SMILE.
I was not able to find the relevant information and so I'm asking here: do you STRICTLY provide binaries, or do you provide code of the engine as well that can be modified (under a license, of course)? If I wanted to customize it to my environment (run it on a cluster, for example, that requires special calls to the DB), could I do that?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: Performance and Scalability of SMILE

Post by shooltz[BayesFusion] »

Do you mean that if I wanted to give different evidence to the same network (because 2 different users are using the same network to assess different scenarios) I will not be able to do it concurrently because it will update the same object?
Yes, this is correct.
Also, if I wanted to do it, would I have to clone my Network object per user connecting to the program?
True.
This approach will only work for VERY small networks and a VERY limited number of users (the computer will crash, otherwise, with all this data!)
What is the limiting factor in your scenario? The size of network (the representation of CPTs, basically) or the memory required to actually perform the inference? Note that the latter is allocated only during the UpdateBeliefs call.

Anyway, if you feel that memory will be exhausted, you'll probably need to queue the inference requests and perform them on a predefined number of networks.
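
One way to queue requests onto a predefined number of networks is a fixed-size pool backed by a blocking queue. The sketch below is a generic version of that idea, not something SMILE provides: `NetworkPool`, `withNetwork`, and `PooledStub` are hypothetical names, and the stub stands in for a jSMILE `Network` so the example runs on its own.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Generic fixed-size pool: callers wait until an instance is free, so at most
// `size` instances (and their inference memory) are in use at once.
final class NetworkPool<N> {
    private final BlockingQueue<N> idle;

    NetworkPool(int size, Supplier<N> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Borrows an instance, runs the work on it, and always hands it back.
    <R> R withNetwork(Function<N, R> work) {
        N net;
        try {
            net = idle.take(); // blocks while every instance is busy
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a network", e);
        }
        try {
            return work.apply(net);
        } finally {
            idle.add(net);
        }
    }

    int idleCount() { return idle.size(); }
}

// Hypothetical stand-in for a network object; NOT part of jSMILE.
final class PooledStub {
    int evidence;
    double infer() { return evidence / 100.0; }
}

final class PoolDemo {
    public static void main(String[] args) {
        NetworkPool<PooledStub> pool = new NetworkPool<>(2, PooledStub::new);
        Double p = pool.withNetwork(net -> {
            net.evidence = 25; // a real pool would also reset old evidence first
            return net.infer();
        });
        System.out.println("result: " + p + ", idle networks: " + pool.idleCount());
    }
}
```

With this arrangement the number of network copies is set once at startup, independent of the number of connected users; extra requests simply wait in `take()` until a copy is returned.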

I was not able to find the relevant information and so I'm asking here: do you STRICTLY provide binaries, or do you provide code of the engine as well that can be modified (under a license, of course)? If I wanted to customize it to my environment (run it on a cluster, for example, that requires special calls to the DB), could I do that?
For the core C++ library we provide binaries only, for multiple platforms. For jSMILE we provide full source code for the wrapper part (both Java and JNI C++), but jSMILE depends on the core C++ binaries, which are required to build it.
bill
Posts: 4
Joined: Tue Oct 21, 2008 4:49 pm

About scalability

Post by bill »

I have created a Java package that uses the jSMILE wrapper to create a randomised BBN in which every node has no more than 3 parents.

I did that in order to test SMILE's performance when there is a large number of nodes.

The problem I face is kind of 'strange'; I think I am missing something (I am actually missing a lot since I don't know the source code of SMILE and I don't want to know it at the moment!).

After the 15000th node is created, the whole procedure slows down and more or less freezes. As a result, I cannot add more nodes to the Network... Is there a simple solution to that? My guess is that the Network object has some limitations... the thing is that the program does not use more memory as the number of nodes increases...

My explanation is not complete; I can give more information later (if a response depends on this!).
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: About scalability

Post by shooltz[BayesFusion] »

bill wrote: After the 15000th node is created, the whole procedure slows down and more or less freezes. As a result, I cannot add more nodes to the Network... Is there a simple solution to that? My guess is that the Network object has some limitations... the thing is that the program does not use more memory as the number of nodes increases...
There's no artificial limit on the node count. Is there a chance that one of the child nodes has parents with a very large number of outcomes?
bill
Posts: 4
Joined: Tue Oct 21, 2008 4:49 pm

Re: About scalability

Post by bill »

shooltz wrote: There's no artificial limit on the node count. Is there a chance that one of the child nodes has parents with a very large number of outcomes?
No. Each node represents a discrete variable with two possible outcomes (e.g. true or false), and the MAX number of children or parents for a node is 3.

The program works fine if the size of the network is below approximately 15,000 nodes.

The strange thing is that the program slows down BEFORE applying the arcs of the network. That means that the following code will start slowing down for i > 15000.

for (int i = 1; i <= nodeNum; i++) {
    net.addNode(Network.NodeType.Cpt, "Node" + i);
}
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Re: About scalability

Post by shooltz[BayesFusion] »

bill wrote: The strange thing is that the program slows down BEFORE applying the arcs of the network. That means that the following code will start slowing down for i > 15000.

for (int i = 1; i <= nodeNum; i++) {
    net.addNode(Network.NodeType.Cpt, "Node" + i);
}
I was able to get past 15000 without significant slowdowns. However, the Network.addNode method (or rather the wrapped C++ method) checks the uniqueness of the node identifier by linearly comparing the new id with all existing ids, so you can expect addNode to take more time as you add nodes.
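
To get a feel for what a linear uniqueness check costs, the toy model below counts the string comparisons needed to register n identifiers when each new id is compared against all existing ones, and contrasts that with a hash-set lookup. It is a model of the cost described above, not SMILE's actual code; the class and method names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of identifier registration; it does not reproduce SMILE's internals.
final class IdCheckCost {
    // Registers "Node1".."NodeN", scanning all existing ids before each insert.
    // Returns the total number of string comparisons performed.
    static long linearScanComparisons(int n) {
        List<String> ids = new ArrayList<>();
        long comparisons = 0;
        for (int i = 1; i <= n; i++) {
            String id = "Node" + i;
            boolean duplicate = false;
            for (String existing : ids) {
                comparisons++;
                if (existing.equals(id)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                ids.add(id);
            }
        }
        return comparisons;
    }

    // Same registration with a hash set: one expected-constant-time lookup per id.
    static int hashSetInserts(int n) {
        Set<String> ids = new HashSet<>();
        for (int i = 1; i <= n; i++) {
            ids.add("Node" + i);
        }
        return ids.size();
    }

    public static void main(String[] args) {
        int n = 15000;
        System.out.println("linear-scan comparisons: " + linearScanComparisons(n));
        System.out.println("hash-set lookups:        " + hashSetInserts(n));
    }
}
```

With unique ids the linear scan performs n(n-1)/2 comparisons, which is 112,492,500 for n = 15000, compared with 15,000 hash lookups. That growth is gradual rather than a sudden wall, so on its own it would explain addNode getting steadily slower but not necessarily a freeze at one particular node count.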