Distance and entropy-based measures

The diagnostic measures calculated for observations depend on the pursued fault set. Pursued faults define the focus of reasoning, and changes in their probabilities serve as inputs to the measure algorithms. To select an algorithm, use DiagNetwork.set_single_fault_algorithm (single-fault diagnosis) and DiagNetwork.set_multi_fault_algorithm (multi-fault diagnosis). Each algorithm outputs a single number for every uninstantiated observation. The algorithm identifiers are defined in the DiagNetwork class.

Single-fault diagnosis algorithms:

Max probability change, identified by SingleFaultAlgorithmType.PROB_CHANGE (default): For each observation, the probability changes of the pursued fault are compared by absolute value, and the change with the largest magnitude is selected. This is a signed measure; its value is negative when the largest-magnitude change is a decrease in probability.
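A minimal sketch of this selection rule in plain Python, not through the SMILE wrapper. The function name `max_prob_change` and its inputs (the fault's current probability and its hypothetical posterior after each observation outcome) are assumptions made for illustration; this is not SMILE's implementation.

```python
from typing import Sequence


def max_prob_change(prior: float, posteriors: Sequence[float]) -> float:
    """Signed change of the pursued fault's probability with the largest magnitude.

    prior      -- current probability of the pursued fault (hypothetical input)
    posteriors -- fault probability after each possible outcome of one observation
    """
    changes = [post - prior for post in posteriors]
    # Select by absolute value, but return the change with its sign intact.
    return max(changes, key=abs)


# Fault currently at 0.30; the observation has two outcomes.
print(max_prob_change(0.30, [0.35, 0.05]))  # the decrease to 0.05 dominates
```

Here the result is negative because the outcome that lowers the fault probability moves it further than the outcome that raises it.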

Cross-entropy, identified by SingleFaultAlgorithmType.CROSSENTROPY: An information-theoretic, unsigned measure that considers both the information content of individual observation states and the likelihood of observing them. For example, a positive cancer test result may cause a large probability change, but if a positive result is unlikely in a generally healthy person, the expected diagnostic value of the test is small.
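One common way to formalize this trade-off, assumed here rather than taken from SMILE's documentation, is the expected Kullback-Leibler divergence between the fault distribution after each outcome and the current fault distribution, weighted by the probability of that outcome (this equals the mutual information between fault and observation). SMILE's exact formula and log base may differ. The helpers `bayes_update` and `expected_cross_entropy` are hypothetical, and the cancer-test numbers are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def bayes_update(
    prior: Sequence[float], likelihood: Sequence[Sequence[float]]
) -> Tuple[List[float], List[List[float]]]:
    """Return P(o) and P(fault | o) for each outcome o.

    likelihood[f][o] is P(outcome o | fault state f); every outcome is
    assumed to have nonzero probability.
    """
    outcome_probs, posteriors = [], []
    for o in range(len(likelihood[0])):
        joint = [prior[f] * likelihood[f][o] for f in range(len(prior))]
        p_o = sum(joint)
        outcome_probs.append(p_o)
        posteriors.append([j / p_o for j in joint])
    return outcome_probs, posteriors


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; states with p == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def expected_cross_entropy(
    prior: Sequence[float],
    outcome_probs: Sequence[float],
    posteriors: Sequence[Sequence[float]],
) -> float:
    """Outcome-weighted KL divergence of each posterior from the prior (unsigned)."""
    return sum(p_o * kl_divergence(post, prior)
               for p_o, post in zip(outcome_probs, posteriors))


# Hypothetical numbers: fault states (cancer, healthy); outcomes (positive, negative).
prior = [0.01, 0.99]
likelihood = [[0.90, 0.10],   # P(positive), P(negative) given cancer
              [0.05, 0.95]]   # P(positive), P(negative) given healthy
p_o, post = bayes_update(prior, likelihood)
print(p_o[0], post[0][0])     # P(positive) and P(cancer | positive)
print(expected_cross_entropy(prior, p_o, post))
```

With these numbers a positive result raises P(cancer) from 0.01 to roughly 0.15, but a positive result has probability below 0.06, so its large divergence is heavily down-weighted in the expectation.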

Normalized cross-entropy, identified by SingleFaultAlgorithmType.NORM_CROSSENTROPY: The cross-entropy divided by the current entropy of the pursued fault node.
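The normalization step can be sketched on its own. Here the cross-entropy value is passed in as a number and divided by the Shannon entropy (in bits, an assumption) of the pursued fault node's current distribution. Returning 0.0 when that entropy is zero is a convention chosen for this sketch, not documented SMILE behavior, and the function names are hypothetical.

```python
import math
from typing import Sequence


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def normalized_cross_entropy(cross_entropy: float, fault_dist: Sequence[float]) -> float:
    """Cross-entropy divided by the current entropy of the pursued fault node."""
    h = entropy(fault_dist)
    if h == 0.0:
        # The fault node's state is already certain; assumed convention: report 0.
        return 0.0
    return cross_entropy / h


print(normalized_cross_entropy(0.25, [0.5, 0.5]))  # entropy is 1 bit, so 0.25
```

If the cross-entropy is taken to be the mutual information between fault and observation, it can never exceed the fault's entropy, so the normalized value lies between 0 and 1; that bound holds for this formulation and is not necessarily a property of SMILE's measure.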

Multi-fault diagnosis algorithms:

Max probability change, identified by MultiFaultAlgorithmType.MAX_PROB_CHANGE (default): The measure value is the maximum probability change over all pursued faults and observation outcomes. This is a signed measure.
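A sketch of the multi-fault version in plain Python. It assumes, by analogy with the single-fault measure, that "maximum" means the change with the largest absolute value and that its sign is preserved; the function name and the input layout are hypothetical.

```python
from typing import Sequence


def multi_fault_max_prob_change(
    priors: Sequence[float],
    posteriors: Sequence[Sequence[float]],
) -> float:
    """Largest-magnitude signed change over all pursued faults and all outcomes.

    priors     -- current probability of each pursued fault
    posteriors -- posteriors[o][i] is the probability of fault i after outcome o
    """
    changes = (
        post[i] - priors[i]
        for post in posteriors
        for i in range(len(priors))
    )
    # Compare by magnitude, keep the sign (assumed convention).
    return max(changes, key=abs)


# Two pursued faults, two outcomes; the drop of fault 2 from 0.6 to 0.2 dominates.
print(multi_fault_max_prob_change([0.2, 0.6], [[0.25, 0.7], [0.1, 0.2]]))
```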

Euclidean distance (L2 norm), identified by MultiFaultAlgorithmType.L2_NORMALIZED_DISTANCE: Computes the Euclidean distance between fault probability vectors before and after the observation. Distances are normalized so that a change from impossible (all probabilities zero) to certain (all probabilities one) equals 1.0. The maximum distance over all outcomes is selected for each observation. Larger distances indicate a greater impact.
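The largest possible Euclidean distance between two vectors of n probabilities is sqrt(n), reached when every entry moves from 0 to 1, so dividing by sqrt(n) gives the normalization described above. The following is a sketch with hypothetical function names; SMILE's internal implementation may differ.

```python
import math
from typing import Sequence


def l2_normalized(before: Sequence[float], after: Sequence[float]) -> float:
    """Euclidean distance divided by sqrt(n): all zeros -> all ones gives 1.0."""
    squared = sum((a - b) ** 2 for a, b in zip(after, before))
    return math.sqrt(squared) / math.sqrt(len(before))


def l2_measure(before: Sequence[float],
               posteriors_per_outcome: Sequence[Sequence[float]]) -> float:
    """Maximum normalized distance over all outcomes of one observation."""
    return max(l2_normalized(before, after) for after in posteriors_per_outcome)


# Two pursued faults at 0.5; the second outcome pushes them to 0.9 and 0.1.
print(l2_measure([0.5, 0.5], [[0.5, 0.5], [0.9, 0.1]]))
```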

Cityblock distance, identified by MultiFaultAlgorithmType.CITYBLOCK_DISTANCE: Similar to L2, but using the cityblock (Manhattan) metric.

Averaged L2 and cityblock distance, identified by MultiFaultAlgorithmType.AVG_L2_CITY_DISTANCE: The average of the L2 and cityblock distances.
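A sketch of the cityblock distance and the averaged variant. It assumes that the cityblock distance is normalized like L2 (divided by n, the largest possible sum of absolute changes) and that the averaged measure is the arithmetic mean of the two normalized distances for each outcome, taken before the maximum over outcomes. Both are assumptions, and all function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cityblock_normalized(before: Vector, after: Vector) -> float:
    """Sum of absolute changes divided by n: all zeros -> all ones gives 1.0."""
    return sum(abs(a - b) for a, b in zip(after, before)) / len(before)


def l2_normalized(before: Vector, after: Vector) -> float:
    """Euclidean distance divided by sqrt(n)."""
    squared = sum((a - b) ** 2 for a, b in zip(after, before))
    return math.sqrt(squared) / math.sqrt(len(before))


def avg_l2_city(before: Vector, after: Vector) -> float:
    """Arithmetic mean of the two normalized distances (assumed combination rule)."""
    return (l2_normalized(before, after) + cityblock_normalized(before, after)) / 2


def distance_measure(before: Vector, posteriors_per_outcome: Sequence[Vector],
                     distance: Callable[[Vector, Vector], float]) -> float:
    """Apply a distance per outcome and take the maximum over the outcomes."""
    return max(distance(before, after) for after in posteriors_per_outcome)


before = [0.2, 0.2]
outcomes = [[0.6, 0.2], [0.1, 0.1]]
print(distance_measure(before, outcomes, cityblock_normalized))
print(distance_measure(before, outcomes, avg_l2_city))
```

For a fixed total amount of change, cityblock gives the same value whether the change is concentrated in one fault or spread over several, while L2 is larger when the change is concentrated; averaging the two sits between these behaviors.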

Cosine distance (cosine similarity), identified by MultiFaultAlgorithmType.COSINE_DISTANCE: Calculated between the fault probability vectors before and after the observation. Because probabilities are non-negative, the measure is always non-negative.
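A sketch computing the cosine similarity between the before and after fault probability vectors, with cosine distance taken as one minus that similarity. Whether SMILE reports the similarity or the distance, and how it treats an all-zero vector, is not stated here; this sketch simply raises an error for a zero vector. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for an all-zero probability vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; in [0, 1] (up to rounding) for non-negative vectors."""
    return 1.0 - cosine_similarity(u, v)


print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 1.0
print(cosine_distance([0.2, 0.4], [0.1, 0.2]))  # same direction: (close to) 0.0
```

Because cosine depends only on the direction of the vectors, halving every fault probability yields a distance of about zero, whereas the L2 and cityblock distances would report a change.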

Entropy-based measures: A family of six entropy-based measures. These measures require the joint probability distribution over all pursued faults, and because computing it exactly is prohibitively expensive, they use approximations of it instead. The approximations rest on two strong assumptions about the dependencies among the faults: (1) complete independence, taken by the first group of approaches, and (2) complete dependence, taken by the second group. Each of these two extremes is divided into three variants: (1) At Least One, (2) Only One, and (3) All. The variants differ in how the combinations of faults are partitioned in the cross-entropy calculation. The identifiers of the algorithms are:

MultiFaultAlgorithmType.INDEPENDENCE_AT_LEAST_ONE

MultiFaultAlgorithmType.INDEPENDENCE_ONLY_ONE

MultiFaultAlgorithmType.INDEPENDENCE_ONLY_ALL

MultiFaultAlgorithmType.DEPENDENCE_AT_LEAST_ONE

MultiFaultAlgorithmType.DEPENDENCE_ONLY_ONE

MultiFaultAlgorithmType.DEPENDENCE_ONLY_ALL
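To make the complete-independence assumption concrete, the sketch below computes, from the marginal probabilities of the pursued faults alone, the probability that at least one fault is present, that exactly one is present, and that all are present. These follow directly from independence. How SMILE feeds such quantities into its cross-entropy calculation, and how the complete-dependence variants are computed, is not described here, so the sketch stops at the event probabilities. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def partition_probs_independent(p: Sequence[float]) -> Dict[str, float]:
    """Probabilities of three fault-combination events, assuming independent faults.

    p -- marginal probability of each pursued fault
    """
    # Probability that no pursued fault is present.
    none = math.prod(1 - pi for pi in p)
    # Fault i present and every other fault absent, summed over i.
    exactly_one = sum(
        pi * math.prod(1 - pj for j, pj in enumerate(p) if j != i)
        for i, pi in enumerate(p)
    )
    return {
        "at_least_one": 1 - none,
        "only_one": exactly_one,
        "all": math.prod(p),
    }


print(partition_probs_independent([0.1, 0.2, 0.3]))
```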

Marginal probability-based measures: Much faster than the joint-distribution approaches based on independence or dependence, but less accurate, because they make a stronger assumption about the joint probability distribution. Entropy calculations in this approach are based purely on the marginal probabilities of the pursued faults. The two algorithms that use the marginal probability approach differ essentially in the function they use to select the tests to perform. Both functions are scaled to return values between 0 and 1. Entropy/Marginal 1 uses a function without support for maximum distance; its minimum is reached when all fault probabilities are equal to 0.5. Entropy/Marginal 2 uses a function that supports maximum distance and is continuous in the domain [0,1]. The identifiers of the algorithms are MultiFaultAlgorithmType.MARGINAL_1 and MultiFaultAlgorithmType.MARGINAL_2.
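The exact scoring functions behind MARGINAL_1 and MARGINAL_2 are not given here. As a purely hypothetical illustration of a marginal-only entropy score with the two properties stated for Entropy/Marginal 1 (values scaled to [0, 1] and a minimum when every fault probability is 0.5), one can take one minus the mean binary entropy of the fault marginals. This is not SMILE's function, and the names are hypothetical.

```python
import math
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-state variable with P(fault present) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def marginal_score(marginals: Sequence[float]) -> float:
    """Hypothetical score: 1 - mean binary entropy of the fault marginals.

    Lies in [0, 1]; equals 0 exactly when every marginal is 0.5 and 1 when
    every marginal is 0 or 1. Uses only marginals, ignoring fault dependencies.
    """
    return 1.0 - sum(binary_entropy(p) for p in marginals) / len(marginals)


print(marginal_score([0.5, 0.5]))  # maximal uncertainty: 0.0
print(marginal_score([0.1, 0.7]))
```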