Any idea to create a truncated distribution?
Hi everyone,
I want to create a truncated distribution, e.g. a normal distribution N(5,1) truncated at a threshold, say 3, so that no probability mass lies below the threshold. To make it a bit more complex, the threshold itself is stochastic and follows its own distribution (rather than being a fixed number such as 3).
Could anyone advise how to do this?
Attached is an example net to demonstrate the problem.
I initially thought it would be easy: the equation node for the truncated distribution is formulated so that if the value of the original distribution node >= the value of the threshold node, it equals the value of the original distribution node; otherwise it equals zero. I then reset the lower bound of the equation domain to a nonzero value that is smaller than the smallest possible threshold (1 in the example net) in order to discard the accumulated frequency of the zero value.
However, this doesn't work. The frequency of the zero value is still present (compare the left and right sets of nodes in the example). The lower bound of the domain doesn't seem to have the intended effect (I also found that once the network is discretized the zero value does get chopped off, but I don't really want it to be discretized). The higher the threshold, the higher the frequency of the zero value, which totally distorts the truncated distribution.
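The if/else equation described above can be mimicked outside GeNIe to see where the spike at zero comes from. This is only a minimal Python sketch: the uniform threshold on [3, 6] and the function name clamp_to_zero are assumptions for illustration, not what the attached net uses.

```python
import random

def clamp_to_zero(n, seed=0):
    """Mimic the equation node: keep X when X >= T, otherwise output 0."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.gauss(5.0, 1.0)    # original distribution N(5, 1)
        t = rng.uniform(3.0, 6.0)  # hypothetical stochastic threshold (assumption)
        out.append(x if x >= t else 0.0)
    return out

samples = clamp_to_zero(20_000)
zero_fraction = sum(1 for s in samples if s == 0.0) / len(samples)
kept = [s for s in samples if s != 0.0]

print(f"fraction stuck at zero: {zero_fraction:.3f}")
print(f"smallest kept value:    {min(kept):.3f}")
```

Every draw that fails the test is mapped to exactly 0, so the output is a mixture of a point mass at zero and the part of N(5,1) above the threshold. Raising the threshold moves more draws into the point mass, which matches the behaviour described above.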
In the example I built two nearly identical sets of nets (left and right) for comparison. In each set, the top net has no evidence on the threshold node, and the bottom net has evidence set to 4 as a further test.
The only difference between the left and right sets is that the right set has the lower bound of the equation domain set to zero (instead of 1 in the left set).
The only way I have found to get a truncated distribution shape without the frequency of the zero value is to use a chance node (as shown in the right set of the example) to redistribute the frequency of zero equally among the nonzero states. However, by doing this I lose the numerical nature of the distribution and cannot use it in further calculations.
Thanks
Charlie
 Attachments

 Truncated distribution.xdsl
 (15.53 KiB) Downloaded 117 times

 Site Admin
Re: Any idea to create a truncated distribution?
Try using the TruncNormal(5,1,3) distribution; see the attached image for sampling results (the cutoff to the left of 3.0 is real, it's not just a histogram displaying a subset of samples). Any of the three fixed parameters in this example can be replaced by an expression, so you're not limited to a fixed threshold of 3.
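For reference, and assuming TruncNormal(mean, stddev, cutoff) denotes the usual lower-truncated normal (the thread does not spell out the definition), the density for a fixed cutoff a would be:

```latex
f(x \mid \mu, \sigma, a) =
\begin{cases}
\dfrac{\frac{1}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)}
      {1-\Phi\!\left(\frac{a-\mu}{\sigma}\right)}, & x \ge a,\\[1ex]
0, & x < a,
\end{cases}
```

where \varphi and \Phi are the standard normal pdf and cdf. With \mu = 5, \sigma = 1, a = 3 the normalizing constant is 1 - \Phi(-2) \approx 0.977, so only about 2.3% of the original mass is cut off and rescaled.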
Re: Any idea to create a truncated distribution?
Thank you. TruncNormal does appear to be the solution. But please see the attached net: a TruncNormal node fed by a stochastic threshold node for its min parameter. The threshold node's distribution starts at 3. The "test" chance node is there to verify whether the CPT correctly reflects the truncation: each of its ten states corresponds one-to-one to the ten discretized intervals of the TruncNormal node.
As you can see, when there is evidence for the threshold node (right side), the test node correctly represents the truncation. However, when there is no evidence (left side), the test node strangely allocates a 52% chance to the first state, which corresponds to the 0 to 1 interval of the TruncNormal node. This should have been 0%, as the threshold node's distribution starts at 3. Why is that?
Charlie
 Attachments

 Truncated distribution1.xdsl
 (5.71 KiB) Downloaded 124 times

 Site Admin
Re: Any idea to create a truncated distribution?
The TruncNormal distribution generates samples by drawing a normal sample specified by its first two arguments (mean, stddev), then comparing it against the third argument (the cutoff). If the value is greater than the cutoff, the sample is accepted; otherwise the procedure is repeated. To avoid an infinite (or very long-running) loop, TruncNormal first checks the cutoff, and if it's equal to or greater than the mean, the "invalid sample value" is generated. You can actually see it in the Value tab of "Truncated Distribution": the grid with sample values contains numerous orange cells with nan (not a number) values. This is caused by the "Threshold" node distribution, which generates its values from the [3..9] range, while "Truncated Distribution" has a mean of 5. Any sample generated for "Threshold" which is greater than or equal to 5 causes "Truncated Distribution" to become invalid.
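The procedure described above can be sketched in a few lines of Python. This is an illustration of the description only, not GeNIe's actual code; the function name trunc_normal_sample and the uniform threshold on [3, 9] are assumptions (the real "Threshold" node may use a different distribution over that range).

```python
import math
import random

def trunc_normal_sample(mean, stddev, cutoff, rng):
    """One draw following the described procedure (a sketch, not GeNIe's code)."""
    if cutoff >= mean:
        return math.nan                  # the "invalid sample value"
    while True:
        x = rng.gauss(mean, stddev)      # plain normal draw
        if x > cutoff:                   # accept only values above the cutoff
            return x

rng = random.Random(1)

# Fixed cutoff of 3: every sample is valid and lies above 3.
fixed = [trunc_normal_sample(5.0, 1.0, 3.0, rng) for _ in range(10_000)]

# Hypothetical stochastic cutoff on [3, 9] (assumption for illustration).
mixed = [trunc_normal_sample(5.0, 1.0, rng.uniform(3.0, 9.0), rng)
         for _ in range(10_000)]
invalid_fraction = sum(math.isnan(v) for v in mixed) / len(mixed)

print(f"min of fixed-cutoff samples: {min(fixed):.3f}")
print(f"invalid fraction with stochastic cutoff: {invalid_fraction:.3f}")
```

Because the cutoff is checked before the loop, the loop only runs when the cutoff is below the mean, where more than half of the normal draws are accepted, so it terminates quickly.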
While continuous nodes store the invalid samples in their sample vectors, the discrete nodes can't do this. This causes the "test" node to have a bias for its s0_1 state. We're reviewing the issue now to check if there's any reasonable way to handle invalid samples in a model configuration like the one you've attached.
An easy way to fix this is to enable the "Reject out-of-bounds and invalid samples" option on the Inference tab in the network properties. If you enable it for your model, you'll be able to see that "Truncated Distribution" has roughly 48% valid samples (see the samples grid on the Value tab). The discrete "test" node will then behave correctly.
Another option is to change the distribution of the "Threshold" node to ensure it does not produce samples of 5 or greater.
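To illustrate why invalid samples can bias a discrete child and why rejecting them helps, here is a small Python sketch. The helper bin_counts and its "dump into the first state" fallback are assumptions that merely imitate the observed symptom; how GeNIe bins samples internally is not stated here.

```python
import math

def bin_counts(samples, lo, hi, n_bins, reject_invalid):
    """Count samples in n_bins equal-width intervals on [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        if math.isnan(s) or not (lo <= s < hi):
            if not reject_invalid:
                counts[0] += 1       # assumed fallback: lands in the first state
            continue                 # with rejection the sample is simply dropped
        counts[int((s - lo) / width)] += 1
    return counts

samples = [3.2, 4.7, math.nan, 5.1, math.nan, 6.4, 4.1, math.nan]

naive = bin_counts(samples, 0.0, 10.0, 10, reject_invalid=False)
clean = bin_counts(samples, 0.0, 10.0, 10, reject_invalid=True)

print("without rejection:", naive)
print("with rejection:   ", clean)
```

With rejection enabled the first interval stays empty and the remaining counts are unchanged; the price is that fewer samples contribute to the estimate.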
Re: Any idea to create a truncated distribution?
Thank you so much for the detailed explanation. The suggested solution of rejecting out-of-bounds and invalid samples also works well. It is a great feature.
Charlie
Re: Any idea to create a truncated distribution?
With your advice, I now realize that the problem
 - with the net I sent first is that GeNIe samples outside the lower bound, which was 1 (upper bound being 10) in my case, and doesn't cast those samples at zero.
 - with the net I sent next is that TruncNormal doesn't allow the cutoff to reach or exceed the mean. Any such sample becomes "nan" and gets dumped in the first state of the discrete node.
I have now gone back to the first net, set the lower bound to a very small nonzero number (0.01 in my case) and checked the "exclude out-of-bound and invalid samples" box. This seems to have resolved my problem perfectly. However, as you pointed out, the Value tab of the "truncated distribution" node shows that I now only get roughly 64% valid samples. This means 44% of samples were made at zero and wasted. So here are my questions:
 - why does GeNIe sample outside the bounds in the first place? Isn't this a waste of resources, let alone a cause of problems?
 - why is such a large proportion (44%) of samples located at zero for an equation domain ranging from 0 to 10?
Charlie

 Site Admin
Re: Any idea to create a truncated distribution?
"why GeNIe samples outside the bound anyway in the first place? Is this a waste of resources let alone causing problems?"
This is a theoretical problem that is hard to solve in general. There are no general-purpose algorithms for sampling from truncated distributions, so each time you sample from a distribution and want to stay in bounds, rejecting invalid samples is pretty much the best that you can do.
"why such a large proportion (44%) of samples is located at zero for an equation domain ranging from 0 to 10?"
The bar at zero in your discrete node is also a bug: we display the NaNs incorrectly as zero. We will fix this in the upcoming release.
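As a rough feel for how many samples rejection costs, the sketch below compares the empirical acceptance rate for a normal N(5,1) with a fixed lower bound against the analytic value 1 - Phi((lower - mean) / stddev). It is plain Python, independent of GeNIe, and the function names are made up for this example.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def acceptance_rate(mean, stddev, lower, n, seed=0):
    """Fraction of plain normal draws that survive a lower bound."""
    rng = random.Random(seed)
    kept = sum(1 for _ in range(n) if rng.gauss(mean, stddev) >= lower)
    return kept / n

for lower in (3.0, 5.0, 7.0):
    empirical = acceptance_rate(5.0, 1.0, lower, 50_000)
    expected = 1.0 - normal_cdf((lower - 5.0) / 1.0)
    print(f"lower={lower}: accepted {empirical:.3f}, expected {expected:.3f}")
```

The further the bound moves into the bulk of the distribution, the more draws are thrown away, which is why the share of valid samples drops as the threshold rises.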
Cheers,
Marek
Re: Any idea to create a truncated distribution?
Thank you Marek for the response. Really appreciate it.
Charlie