Very long training time with a smaller network

shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

Christian wrote: AFAIK, if you set the missing value to 80%, then 80% of the data is missing at random. But that is not the true data source: row 1-8 has 100% data and row 9 to 13 about 20%.

If you set the missing value to 80%, then data is missing everywhere, not only in node 9 to 13. At least this is how I interpret the GeNIe missing value function.
That's correct - the missing values are distributed uniformly through all rows and columns.

In your reply above you first refer to 'row 9 to 13', then 'node 9 to 13'. I'm not sure if you'd like to have missing values only in specified rows or columns - nodes are mapped to columns, not rows.
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

Sorry, yes - I mistakenly confused rows and columns. I am talking about the ones going from top to right :lol:
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

Christian wrote: I am talking about the ones going from top to right :lol:
You mean the diagonals? :)
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

lol, top to right. I meant top to bottom. I should go to bed ;)

Yes, I meant the diagonals. Okay, now it is right. :P
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

I received the private message and downloaded the networks. So I should generate 100% data for nodes 1-8 and 20% for nodes 9-13? It's too late now, but tomorrow I'll try to write a small SMILE program and play with learning the networks.
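
The data preparation itself is simple enough. Here is a minimal sketch of the thinning step (the file names, the tab separator, and the '*' missing-value marker are placeholders for whatever the real data file uses):

    #include <cstddef>
    #include <cstdlib>
    #include <ctime>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    int main()
    {
        std::ifstream in("complete.txt");   // complete records for Node1..Node13
        std::ofstream out("thinned.txt");   // output with missing values
        std::srand((unsigned)std::time(0));

        std::string line;
        std::getline(in, line);             // copy the header row unchanged
        out << line << '\n';

        while (std::getline(in, line))
        {
            // split the record into its columns
            std::istringstream ss(line);
            std::vector<std::string> cols;
            std::string col;
            while (std::getline(ss, col, '\t'))
                cols.push_back(col);

            // with 80% probability, blank out Node9..Node13
            // (columns 9-13, zero-based indices 8-12)
            if (std::rand() / (double)RAND_MAX < 0.8)
                for (std::size_t i = 8; i < cols.size(); ++i)
                    cols[i] = "*";

            // write the record back out
            for (std::size_t i = 0; i < cols.size(); ++i)
                out << (i ? "\t" : "") << cols[i];
            out << '\n';
        }
        return 0;
    }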
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

Yes, exactly.

Or, if you want to use exactly the same data counts as I used:

1,157,000 records with data for Node1 to Node8 and Node9 to Node13 empty
237,000 records with data for all of Node1 to Node13

so:

83% of the records have Node1 to Node8 filled, with Node9 to Node13 empty
17% of the records have all data from Node1 to Node13

That is, Node1 to Node8 is filled to 100% and Node9 to Node13 to 17% (237,000 / 1,394,000 ≈ 0.17).

But I think 80/20 is a fair approximation :D
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

I'm running some experiments now, but as you indicated, it's kinda slow. I hope to report on my progress tomorrow.
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

After profiling, it looks like the code could be sped up by a factor of one to two, possibly more. The exact number is difficult to estimate at the moment. I'll try to apply some optimizations over the weekend.

Another useful possibility may be the introduction of another parameter, namely precision. Right now the EM algorithm is considered converged when the likelihood change between iterations is smaller than a certain threshold. This threshold could be defined by the user, e.g., low - medium - high precision. Low precision stops the learning process earlier and will therefore learn the fastest. What do you think about that?

Or how about letting the user stop the learning at any time, with the parameters learned up to that moment returned?
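
To make both ideas concrete, here is a rough sketch of the loop I have in mind (the e-step/m-step below are toy stand-ins, not the actual SMILE internals, and the stop flag would be set from the UI):

    #include <cmath>
    #include <cstdio>

    // Toy stand-ins for the real EM machinery: this fake log-likelihood
    // climbs toward 0 so the example terminates quickly.
    static double ll = -100.0;
    double ExpectationStep()  { return ll; }   // returns current log-likelihood
    void   MaximizationStep() { ll *= 0.5; }   // improves the parameters

    volatile bool userRequestedStop = false;   // would be set from the UI

    // Run EM until the log-likelihood improvement drops below 'threshold'
    // (the proposed precision setting) or the user asks to stop; either
    // way, the parameters learned so far are kept.
    void LearnEM(double threshold)
    {
        double oldLL = -1e300;
        for (int iter = 1; ; ++iter)
        {
            double newLL = ExpectationStep();
            MaximizationStep();
            std::printf("iteration %d: logL = %g\n", iter, newLL);
            if (std::fabs(newLL - oldLL) < threshold)
                break;                         // converged
            if (userRequestedStop)
                break;                         // early stop requested
            oldLL = newLL;
        }
    }

    int main()
    {
        LearnEM(1e-6);   // 'high precision': small threshold, more iterations
        return 0;
    }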

On the other hand, all these options may be confusing for users.
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

Hello Mark,

A speedup factor of one to two sounds great.

About entering precision: I am sure I would always enter the highest precision, because otherwise I could never be sure whether wrong network data was caused by stopping the EM algorithm too early.

So I would not use this option.

But if you want to implement something new, then maybe this: save the current EM algorithm state and reload it later to continue. But if your training improvements are that good, maybe this is not needed anymore.

I had a training run that took over 5 days. Then I needed to reboot my computer, so I had to abort everything and start again.

Maybe another useful thing could be setting the thread priority while training. I can do this easily via the Task Manager, but maybe others don't know that feature. When they train a net and have only one core, I think they cannot use their computer.
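
For example, the training code could do something like this at startup (this is the standard Win32 call I have in mind; as far as I know, SMILE does not offer it itself):

    #include <windows.h>

    int main()
    {
        // Lower the priority of the current process so that a long EM
        // training run does not starve interactive applications on a
        // single-core machine.
        SetPriorityClass(GetCurrentProcess(), BELOW_NORMAL_PRIORITY_CLASS);

        // ... start the training here ...
        return 0;
    }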

Sadly, it is not possible to use more than one thread/core while training.

By the way, I have a 64-bit OS (Vista) and more than 4 GB of RAM. Can I now build networks larger than 4 GB using GeNIe / Smile.net?

About speeding up the current training process,
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

Christian wrote: By the way, I have a 64-bit OS (Vista) and more than 4 GB of RAM. Can I now build networks larger than 4 GB using GeNIe / Smile.net?
You can't. The only 64-bit SMILE version we have at the moment is for Linux.
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

Oh, that's sad. But if I use Smile.net on Windows with the x64 version of .NET, then Smile.net should also run in 64-bit mode, and so I should be able to use the larger memory?
shooltz[BayesFusion]
Site Admin
Posts: 1417
Joined: Mon Nov 26, 2007 5:51 pm

Post by shooltz[BayesFusion] »

Christian wrote: Oh, that's sad. But if I use Smile.net on Windows with the x64 version of .NET, then Smile.net should also run in 64-bit mode, and so I should be able to use the larger memory?
Smile.net is just a thin wrapper over the Win32 version of SMILE. The only benefit of x64 in this context would be the ability to run multiple instances of SMILE-based apps at the same time.
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

Ah, okay. I am already doing that. I have three networks: I first calculate network one and use the result in network two. To do so, I am using the simulated soft evidence I described here:
Christian wrote: By the way, I have found a good way to simulate soft evidence:

For example, if I want to set 30% state1 and 70% state2, I just set 100% state1 and multiply the results (in the target node) by 0.3. After that, I set 100% state2 and multiply those results by 0.7.

Then I add both values and I have soft evidence. Sounds very simple :)
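
In code, the trick amounts to a weighted mixture of posteriors. A sketch against the SMILE C++ API as I understand it (the file name "model.xdsl" and the node ids "E" and "T" are made up, and I assume node E has two states):

    #include "smile.h"
    #include <cstdio>
    #include <vector>

    int main()
    {
        DSL_network net;
        net.ReadFile("model.xdsl");        // hypothetical network file

        int evid   = net.FindNode("E");    // node receiving the soft evidence
        int target = net.FindNode("T");    // node whose posterior we want

        const double w[2] = { 0.3, 0.7 };  // desired soft evidence on E
        int n = net.GetNode(target)->Definition()->GetNumberOfOutcomes();
        std::vector<double> mixed(n, 0.0);

        for (int s = 0; s < 2; s++)
        {
            net.GetNode(evid)->Value()->SetEvidence(s);  // clamp E to state s
            net.UpdateBeliefs();
            const DSL_Dmatrix *post = net.GetNode(target)->Value()->GetMatrix();
            for (int i = 0; i < n; i++)
                mixed[i] += w[s] * (*post)[i];           // weight and accumulate
        }
        net.GetNode(evid)->Value()->ClearEvidence();

        for (int i = 0; i < n; i++)
            std::printf("P(T = state %d) = %g\n", i, mixed[i]);
        return 0;
    }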

Maybe I am making a big mistake in my reasoning, but if not, you could adapt that in SMILE very easily.
This kind of soft evidence is okay, am I right?
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Two things.

1. I compared your approach to soft evidence with the example I gave a while back in another thread (http://genie.sis.pitt.edu/forum/viewtopic.php?t=5). It seems that both approaches are identical for this simple case. It could be that this holds in general, but I don't have a formal proof.

2. I cleaned up the EM algorithm, made a few small optimizations, and also fixed a bug along the way. I'm not sure what the impact of my optimizations is, so would you be able to give it a shot? I will ask the person in charge to make a new GeNIe (and SMILE) release tomorrow. Hopefully, more optimizations will follow.
Christian
Posts: 44
Joined: Wed Nov 28, 2007 12:32 pm

Post by Christian »

Thank you for both.

As far as I have tested it, my approach to soft evidence seems to work, so I will use it in a production environment. It is easier to implement than the soft evidence you demonstrated, as I can use it with every node without adaptations.

As soon as the new GeNIe/SMILE version is released, I will test the networks and give you the improvement results.

You wrote that you also fixed a bug. Is there a need to retrain already-trained networks, or was it a small bug with no big influence?