Any idea to create a truncated distribution?

Post by **charlie** » Fri Feb 22, 2019 2:20 am

Hi everyone,

I want to create a truncated distribution, e.g. a normal distribution N(5,1) truncated at a threshold, say 3, so that nothing occurs below the threshold (3). To make it a bit more complex, the threshold itself is stochastic and follows its own distribution (rather than being a fixed number such as 3).
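For reference, the target can be written down explicitly using the standard definition of a truncated normal. With mean mu = 5, standard deviation sigma = 1, phi and Phi the standard normal density and CDF, a fixed threshold t, and g the (assumed) density of the stochastic threshold T, the desired distribution is the truncated density averaged over the threshold:

```latex
f(x \mid t) =
\begin{cases}
  \dfrac{\frac{1}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right)}
        {1 - \Phi\!\left(\frac{t-\mu}{\sigma}\right)}, & x \ge t,\\[6pt]
  0, & x < t,
\end{cases}
\qquad
f(x) = \int f(x \mid t)\, g(t)\, dt .
```

The denominator renormalizes the part of the normal above t, so no probability mass is left over to pile up at zero or anywhere else.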

Could anyone advise how to do this?

Attached is an example net to demonstrate the problem.

I initially thought it would be easy. The equation node for the truncated distribution is formulated as follows: if the value of the original distribution node is >= the value of the threshold node, it equals the value of the original distribution node; otherwise it equals zero. I then set the lower bound of the equation domain to a non-zero value smaller than the smallest possible threshold (1 in the example net) in order to discard the accumulated frequency of the zero value.
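To see where the frequency at zero comes from, here is a minimal Python sketch of that equation node under forward sampling. It assumes a hypothetical uniform(1, 6) threshold distribution, since the attached net's actual threshold distribution isn't given; `equation_node` is an illustrative name. Every draw of the original node that falls below its threshold is mapped to 0 rather than removed, so the share of zeros equals P(X < T).

```python
import random
from statistics import NormalDist

def equation_node(x, threshold):
    # The equation from the post: keep x if it clears the threshold, otherwise 0.
    return x if x >= threshold else 0.0

rng = random.Random(1)
n = 100_000
values = []
for _ in range(n):
    t = rng.uniform(1.0, 6.0)   # hypothetical threshold distribution
    x = rng.gauss(5.0, 1.0)     # original N(5, 1) node
    values.append(equation_node(x, t))

zero_share = sum(v == 0.0 for v in values) / n

# Analytic P(X < T) = (1/5) * integral over t in [1, 6] of Phi(t - 5),
# using z*Phi(z) + phi(z) as an antiderivative of the standard normal CDF.
std = NormalDist()

def antiderivative(z):
    return z * std.cdf(z) + std.pdf(z)

expected = (antiderivative(1.0) - antiderivative(-4.0)) / 5.0
print(f"simulated share at 0: {zero_share:.3f}, analytic P(X < T): {expected:.3f}")
```

Shifting the threshold distribution upward raises P(X < T), which matches the observation that higher thresholds give a larger bar at zero.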

However, this doesn't work. The frequency of the zero value is still present (as you can see by comparing the left and right sets of nodes in the example). The lower bound of the domain doesn't seem to have the intended effect (I also found that once the network is discretized, the zero value does get chopped off, but I don't really want to discretize it). The higher the threshold, the higher the frequency of the zero value, which completely distorts the truncated distribution.

In the example, I built two sets of nearly identical nets (left and right) for comparison. In each set, the top net has no evidence for the threshold node, and the bottom one has the evidence set to 4 as a further test.

The only difference between the left and right sets is that the right set has the lower bound of the equation domain set to zero (instead of 1 as in the left set).

The only way I can create a truncated distribution shape without the frequency of the zero value is to use a chance node (as shown in the example) that redistributes the frequency of zero equally to the non-zero states (see the chance node in the right set). However, by doing this I lose the numerical nature of the distribution and cannot use it in further calculations.

Thanks

Charlie

Fri Feb 22, 2019 11:30 am

> I want to create a truncated distribution, e.g. a normal distribution N(5,1) truncated at a threshold, say 3, so that nothing occurs below the threshold (3). To make it a bit more complex, the threshold itself is stochastic and follows its own distribution (rather than being a fixed number such as 3).

Try using the TruncNormal(5,1,3) distribution; see the attached image for sampling results (the cutoff to the left of 3.0 is real, not just a histogram displaying a subset of samples). Any of the three fixed parameters in this example can be replaced by an expression, so you're not limited to a fixed threshold of 3.

Attachment: truncnormal.png

Post by **charlie** » Fri Feb 22, 2019 8:21 pm

Thank you. TruncNormal does appear to be the solution. But please see the attached net: a TruncNormal node whose min parameter is fed by a stochastic threshold node. The threshold node's distribution starts at 3. The "test" chance node is there to verify whether the CPT correctly reflects the truncation; each of its ten states corresponds one-to-one to the 10 discretized intervals of the TruncNormal node.

You can see that when there is evidence for the threshold node (right side), the test node correctly represents the truncation. However, when there is no evidence (left side), the test node strangely shows a 52% chance for the first state, which corresponds to the 0 to 1 interval of the TruncNormal node. This should be 0%, as the threshold node's distribution starts at 3. Why is that?

Charlie

Fri Feb 22, 2019 9:57 pm

The TruncNormal distribution generates samples by drawing a normal sample with the mean and standard deviation given by its first two arguments, then comparing it against the third argument (the cutoff). If the value is greater than the cutoff, the sample is accepted; otherwise the procedure is repeated. To avoid an infinite (or very long-running) loop, TruncNormal first checks the cutoff, and if it's equal to or greater than the mean, an "invalid sample" value is generated. You can see this on the Value tab of "Truncated Distribution": the grid with sample values contains numerous orange cells with nan (not a number) values. This is caused by the "Threshold" node's distribution, which generates values in the [3..9] range, while "Truncated Distribution" has a mean of 5. Any sample generated for "Threshold" that is greater than or equal to 5 makes the corresponding "Truncated Distribution" sample invalid.
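The procedure described above can be sketched in Python roughly as follows. This illustrates the described behavior and is not GeNIe's source code; `trunc_normal_as_described` is a hypothetical name, and because the thread doesn't give the Threshold node's actual distribution, a uniform(3, 9) stand-in is assumed, so the invalid share will not match the net in the thread.

```python
import math
import random

def trunc_normal_as_described(mean, sd, cutoff, rng):
    """Accept/reject sampling with a guard on the cutoff (illustrative sketch)."""
    if cutoff >= mean:
        # Guard against an infinite or very long loop: emit an invalid sample.
        return math.nan
    while True:
        x = rng.gauss(mean, sd)
        if x > cutoff:          # accept only draws above the cutoff
            return x

rng = random.Random(42)
# Hypothetical Threshold node: uniform on [3, 9]; the real net's distribution may differ.
thresholds = [rng.uniform(3.0, 9.0) for _ in range(10_000)]
samples = [trunc_normal_as_described(5.0, 1.0, t, rng) for t in thresholds]
nan_share = sum(math.isnan(s) for s in samples) / len(samples)
print(f"invalid (nan) share: {nan_share:.3f}")
```

With this stand-in, roughly two thirds of the thresholds fall at or above the mean of 5, so roughly two thirds of the samples come out as nan.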

While continuous nodes store invalid samples in their sample vectors, discrete nodes can't do this. This biases the "test" node toward its s0_1 state. We're reviewing the issue now to check whether there's a reasonable way to handle invalid samples in a model configuration like the one you've attached.

An easy way to fix this is to enable the "Reject out-of-bounds and invalid samples" option on the Inference tab of the network properties. If you enable it for your model, you'll see that roughly 48% of the samples of "Truncated Distribution" are valid (see the samples grid on the Value tab), and the discrete "test" node will behave correctly.
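A sketch of the effect on the discrete "test" node, assuming ten equal-width states over [0, 10] (state index 0 standing in for s0_1) and again a hypothetical uniform(3, 9) threshold. The `discretize` helper and its `reject_invalid` flag are illustrative names, not GeNIe settings; the flag only mimics the effect of the "Reject out-of-bounds and invalid samples" option.

```python
import math
import random

N_STATES, LO, HI = 10, 0.0, 10.0

def state_of(x):
    """Index of the equal-width interval of [LO, HI] containing x (clamped to the ends)."""
    width = (HI - LO) / N_STATES
    return min(max(int((x - LO) // width), 0), N_STATES - 1)

def discretize(samples, reject_invalid):
    """Turn continuous samples into state probabilities for the discrete node."""
    counts = [0] * N_STATES
    for x in samples:
        if math.isnan(x):
            if reject_invalid:
                continue        # invalid sample is dropped entirely
            counts[0] += 1      # mimics the bias described above: nan lands in s0_1
        else:
            counts[state_of(x)] += 1
    total = sum(counts)
    return [c / total for c in counts]

rng = random.Random(7)
samples = []
for _ in range(10_000):
    t = rng.uniform(3.0, 9.0)           # hypothetical Threshold distribution
    if t >= 5.0:
        samples.append(math.nan)        # cutoff at or above the mean: invalid sample
        continue
    x = rng.gauss(5.0, 1.0)
    while x <= t:                       # accept/reject loop as described above
        x = rng.gauss(5.0, 1.0)
    samples.append(x)

biased = discretize(samples, reject_invalid=False)
fixed = discretize(samples, reject_invalid=True)
print(f"P(state 0) without rejection: {biased[0]:.2f}, with rejection: {fixed[0]:.2f}")
```

Every valid sample lies above its threshold (at least 3), so once the invalid samples are dropped the first three states get no probability at all.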

Another option is to change the distribution of the "Threshold" node to ensure it does not produce samples greater than 5.

Post by **charlie** » Fri Feb 22, 2019 11:07 pm

Thank you so much for the detailed explanation. The suggested solution of rejecting out-of-bounds and invalid samples also works well; it's a great feature.

Charlie

Post by **charlie** » Sat Feb 23, 2019 9:16 pm

With your advice, I now realize what the problems were.

With the first net I sent, GeNIe samples outside the lower bound (1 in my case, with an upper bound of 10) and does not discard the samples at zero.

With the second net, TruncNormal doesn't allow the cutoff to exceed the mean. Any such samples become "nan" and get dumped into the first state of the discrete node.

I then realized that the TruncNormal function doesn't serve my purpose, as I do want the cutoff to be able to exceed the mean of the normal distribution. Why does TruncNormal have this restriction?
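One way to see why an accept/reject loop needs such a guard: with X ~ N(mu, sigma), each draw is accepted with probability 1 - Phi((cutoff - mu) / sigma), so the expected number of draws per accepted sample is the reciprocal of that probability. A short Python sketch for N(5, 1); `rejection_cost` is a hypothetical helper used only for illustration.

```python
from statistics import NormalDist

def rejection_cost(mean, sd, cutoff):
    """Acceptance probability and expected draws per accepted sample
    for the loop 'draw N(mean, sd) until x > cutoff'."""
    p_accept = 1.0 - NormalDist(mean, sd).cdf(cutoff)
    return p_accept, 1.0 / p_accept

for cutoff in (3, 5, 7, 9):
    p, tries = rejection_cost(5.0, 1.0, cutoff)
    print(f"cutoff={cutoff}: acceptance={p:.5f}, expected draws={tries:,.1f}")
```

A cutoff at the mean already costs two draws per sample, and a cutoff of 9 (four standard deviations above the mean) costs on the order of 30,000, which is consistent with the "very long-running loop" concern given earlier in the thread.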

I have now gone back to the first net, but set the lower bound to a very small non-zero number (0.01 in my case) and checked the "Reject out-of-bounds and invalid samples" box. This seems to have resolved my problem perfectly. However, as you pointed out, the Value tab of the "truncated distribution" node shows that I now get only roughly 64% valid samples. This means 44% of samples were placed at zero and wasted. So here are my questions:

Why does GeNIe sample outside the bounds in the first place? Isn't this a waste of resources, let alone a source of problems?

Why is such a large proportion (44%) of samples located at zero for an equation domain ranging from 0 to 10?

Thanks
Charlie

Post by **marek [BayesFusion]** » Mon Feb 25, 2019 5:52 pm

> Why does GeNIe sample outside the bounds in the first place? Isn't this a waste of resources, let alone a source of problems?

This is a theoretical problem that is hard to solve in general. There are no general-purpose algorithms for sampling from truncated distributions, so whenever you sample from a distribution and want to stay within bounds, rejecting invalid samples is pretty much the best you can do.
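For the normal distribution specifically, where both the CDF and its inverse are readily available, inverse-transform sampling is a common alternative: it uses exactly one uniform draw per sample, so it does not slow down when the cutoff exceeds the mean. This is a different technique from the accept/reject loop that TruncNormal uses according to this thread, and it is not a GeNIe feature; the Python sketch below uses the standard library's `statistics.NormalDist`, and `trunc_normal_inverse_cdf` is a hypothetical helper name.

```python
import random
from statistics import NormalDist

def trunc_normal_inverse_cdf(mean, sd, lower, rng):
    """Draw from N(mean, sd) restricted to [lower, inf) by inverse-transform sampling."""
    dist = NormalDist(mean, sd)
    f_lower = dist.cdf(lower)
    # Map a uniform draw onto [F(lower), 1), then invert the normal CDF.
    u = f_lower + rng.random() * (1.0 - f_lower)
    # inv_cdf needs an argument strictly inside (0, 1); clamp against rounding.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Guard against the inverse landing a hair below the cutoff from rounding.
    return max(dist.inv_cdf(u), lower)

rng = random.Random(0)
# Cutoff above the mean (7 = mean + 2 sd), the case the accept/reject loop avoids.
draws = [trunc_normal_inverse_cdf(5.0, 1.0, 7.0, rng) for _ in range(5_000)]
print(f"min = {min(draws):.3f}, mean = {sum(draws) / len(draws):.3f}")
```

A stochastic threshold fits the same way: draw the threshold first and pass it as `lower`. Far in the tail (many standard deviations above the mean), `1 - F(lower)` loses floating-point precision, so extreme cutoffs would need a more careful method.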

> Why is such a large proportion (44%) of samples located at zero for an equation domain ranging from 0 to 10?

The bar at zero in your discrete node is also a bug: we incorrectly display the NaNs as zero. We will fix this in the upcoming release.
Cheers,

Marek

Post by **charlie** » Tue Feb 26, 2019 12:30 am

Thank you, Marek, for the response. I really appreciate it.
Charlie

BayesFusion Support Forum
