novice question...

nikkne
Posts: 19
Joined: Tue Mar 18, 2008 8:03 pm

Post by nikkne »

Hi,
I'm new to SMILE and influence diagrams, so I have a naive question. I have the structure of an ID, and I would now like to populate the CPTs. In every sampling interval, I get the values of all the observables described in the ID. How should I populate the tables? What is the general approach?

Cheers,
Nikola
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Can't you start out with a Bayesian network, learn the parameters, and then make it an influence diagram?
nikkne
Posts: 19
Joined: Tue Mar 18, 2008 8:03 pm

Post by nikkne »

Oh, I think I used the wrong words in my previous post, and I'm afraid I don't understand your answer...

Let's say that I have the structure of the ID/BN and populated CPTs for each node. Now I get new observations, and I want to incorporate them into the CPTs. How do I do that? Should I start from the root chance nodes and go down the network, updating the counts for each node? Or is there a more elegant way?

Cheers,
Nikola
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Updating counts and normalizing them will work fine as long as the data is complete, that is, every node is observed in every record.
nikkne
Posts: 19
Joined: Tue Mar 18, 2008 8:03 pm

Post by nikkne »

I have two questions:
1. How exactly does Normalize() work?
2. Is there a better way of updating the CPT for a node than this algorithm:

count <- number of observations for a Node
foreach dimension in CPT
   p <- probability in the CPT
   if this is the dimension we need to update
      p <- (p*count+1)/(count+1)
   else
      p <- p*count/(count+1)
   end if
   probability in the CPT <- p
end foreach

Since there is a lot of multiplying and dividing of doubles, we can lose precision. Also, count needs to be kept for each node in a separate global table. Is there a better method?
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Let me clarify what I meant by counts. Let's say we have three variables, A, B, and C, where A and B are the parents of C. To estimate the entry in the CPT for C where all three nodes are in state 1, you compute Count(a1,b1,c1) / Count(a1,b1), where Count is the number of joint occurrences in the data set. The division is the normalization, and you never actually have to multiply by the probabilities in the CPTs.
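
To make the ratio concrete, here is a minimal sketch in plain Python. It does not use the SMILE API; the function name estimate_cpt_entry and the toy records are made up for illustration, and each record is assumed to be a complete (A, B, C) observation.

```python
from collections import Counter


def estimate_cpt_entry(records, a, b, c):
    """Return Count(a, b, c) / Count(a, b), or None if (a, b) never occurs.

    `records` is an iterable of complete (A, B, C) observations.
    The name and signature are hypothetical, not part of SMILE.
    """
    records = list(records)
    joint = Counter(records)                               # Count(A, B, C)
    parents = Counter((ra, rb) for ra, rb, _ in records)   # Count(A, B)
    if parents[(a, b)] == 0:
        return None  # no data for this parent configuration
    return joint[(a, b, c)] / parents[(a, b)]


# Toy data set: three records with parents (a1, b1), two of them with c1.
records = [
    ("a1", "b1", "c1"),
    ("a1", "b1", "c1"),
    ("a1", "b1", "c2"),
    ("a2", "b1", "c1"),
]
p = estimate_cpt_entry(records, "a1", "b1", "c1")  # 2 / 3
```

Returning None for an unseen parent configuration is just one choice; a uniform distribution or a prior count would be equally reasonable there.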
nikkne
Posts: 19
Joined: Tue Mar 18, 2008 8:03 pm

Post by nikkne »

Thanks Mark. I understand this.
I was referring to the case where I cannot go through the complete dataset (online incremental learning).
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

In that case you can simply maintain tables of counts and then update and normalize them every time new data comes in.
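
A rough sketch of what such a count table could look like for a single node, again in plain Python rather than the SMILE API. The class CountTable, its methods, and the uniform fallback for unseen parent configurations are assumptions made for this example; copying the resulting probabilities into the network's CPT is left out.

```python
from collections import defaultdict


class CountTable:
    """Per-node table of integer counts, one column per parent configuration.

    Hypothetical helper for illustration; not a SMILE class.
    """

    def __init__(self, states, prior_count=0):
        self.states = list(states)
        # Each new parent configuration starts with `prior_count` per state.
        self.counts = defaultdict(
            lambda: {s: prior_count for s in self.states}
        )

    def update(self, parent_config, state):
        """Record one complete observation of the node and its parents."""
        self.counts[parent_config][state] += 1

    def distribution(self, parent_config):
        """Normalize one column of counts into P(node | parent_config)."""
        column = self.counts[parent_config]
        total = sum(column.values())
        if total == 0:
            # Assumption: fall back to uniform when nothing was observed.
            return {s: 1.0 / len(self.states) for s in self.states}
        return {s: n / total for s, n in column.items()}


# Feed observations one at a time, as they would arrive online.
table = CountTable(["c1", "c2"])
stream = [(("a1", "b1"), "c1"), (("a1", "b1"), "c1"), (("a1", "b1"), "c2")]
for parent_config, state in stream:
    table.update(parent_config, state)
dist = table.distribution(("a1", "b1"))  # c1: 2/3, c2: 1/3
```

Because only integer counts are stored and the division happens once per normalization, rounding error does not accumulate from one update to the next, and the counts are kept per parent configuration rather than as a single number per node.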