How to calculate the accuracy?

hylx
Posts: 10
Joined: Fri May 23, 2008 4:32 pm

Post by hylx »

Hi,

I learned a network with the GREEDY algorithm and I want to test the accuracy of this model. The network has a class variable named "motivation_level", and I want to use the other variables to predict it and measure how accurate that prediction is. How can I test the accuracy, that is, how can I evaluate my model using SMILE?

Thanks. Any help will be appreciated.
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

There is no built-in functionality in SMILE to test the accuracy. But how about comparing the prediction of the learned model to the actual value of motivation_level in the data set?
hylx
Posts: 10
Joined: Fri May 23, 2008 4:32 pm

Post by hylx »

Thanks for your quick response. I know there is no built-in functionality to compute accuracy. But how do I obtain the "prediction of the learned model" that you mentioned?

Cheers.
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Load a record of the data set into the learned network except for the target variable (in this case motivation_level). Then perform inference and see if the state with the highest posterior probability is the state that was recorded in the data set. Repeat this procedure for all records in the data set. This is a simple approach and variations are possible.

(This approach assumes that you learned the parameters as well.)
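
The bookkeeping part of this procedure can be sketched in plain C++ without any SMILE calls. This is only a sketch under assumptions: `argmaxState` and `accuracy` are hypothetical helper names, and the posteriors are assumed to have already been read from the target node after inference, one vector per record.

```cpp
#include <cstddef>
#include <vector>

// Index of the most probable state in one posterior distribution.
// Assumes the distribution is non-empty.
std::size_t argmaxState(const std::vector<double>& posterior) {
    std::size_t best = 0;
    for (std::size_t s = 1; s < posterior.size(); ++s) {
        if (posterior[s] > posterior[best]) best = s;
    }
    return best;
}

// Fraction of records whose most probable state equals the recorded state.
// posteriors[r] is the target node's posterior for record r, computed with
// the target left unobserved; actual[r] is the state index from the data set.
double accuracy(const std::vector<std::vector<double>>& posteriors,
                const std::vector<std::size_t>& actual) {
    if (posteriors.empty() || posteriors.size() != actual.size()) return 0.0;
    std::size_t correct = 0;
    for (std::size_t r = 0; r < posteriors.size(); ++r) {
        if (argmaxState(posteriors[r]) == actual[r]) ++correct;
    }
    return static_cast<double>(correct) / posteriors.size();
}
```

Testing on the same records that were used for learning tends to give an optimistic estimate, so holding out part of the data is one of the possible variations.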
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Hi

I was just looking through the forum and found this topic. I am trying to do something similar, but I don't know what I am doing wrong.

First I learn my BN from a data set file using greedy:

Code:

greedy.Learn(m_dataset,result);

Once I have the resulting BN, I read the original file and go through each line, setting the evidence for every variable except the target:

Code:

result.SetTarget(TARGET);
for each line:
{
    readline()
    for each column i:
    {
        value = column.value;
        if (i != TARGET)
            result.GetNode(i)->Value()->SetEvidence(value);
        else
            targetValue = value;
    }
    result.UpdateBeliefs();
    int value = result.GetNode(TARGET)->Value()->GetEvidence();
    if (value == targetValue)
        ... ok it works good
    else
        ... we got a fail
}
So basically I read each line, set the evidence, and get the target value to compare it with the known value from the file.
But GetEvidence() just returns -2. I don't know if I should use another method or if the evidence propagates by itself.
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

I think I can answer my own question :)
GetEvidence() is not the right call here (the -2 is normal, since no evidence was set on the target O:) ). Instead, read the posterior from the value matrix:

Code:

DSL_Dmatrix* mat=result.GetNode(TARGET)->Value()->GetMatrix();
double value_state0=mat->GetItems().Subscript(0);
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Yes, that was the problem.
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Hi all,

I'm writing a simple test to check the accuracy of the BN I created.
For testing I just learn from the whole data file:

Code:

crossValid->readFile("data.txt");
if (greedy.Learn(*crossValid->m_dataSet,result)!=DSL_OKAY)
{
     ExitProcess(0);
}

int TARGET_INDEX=result.FindNode("Class");
result.SetTarget(TARGET_INDEX);
result.WriteFile("test.xdsl");
After that, if I open the test.xdsl file and enter the evidence from data.txt by hand, I get the correct results on the Class node.
But right after the code above I want to run the same test in my program, so I do the following (I rewrote parts of the code by hand for clarity):

Code:

StreamReader* file = new StreamReader("data.txt");

while ( line = file->ReadLine() )
{
    ...
        result.ClearAllEvidence();
        float targetStatus[TARGET_NUM_STATUS]; // NUM_STATUS=2

        while (s is string In Column)
        {
            int value;
            if (s->Equals("Yes"))
                value=0;
            else if (s->Equals("No"))
                value=1;

            if (i==TARGET_INDEX)
            {
                if (value==1)
                {
                    targetStatus[0]=1;
                    targetStatus[1]=0;
                }
                else
                {
                    targetStatus[0]=0;
                    targetStatus[1]=1;
                }
            }
            else
                result.GetNode(i)->Value()->SetEvidence(value);

            i++;
        }

        result.UpdateBeliefs();
        DSL_Dmatrix* matriz=result.GetNode(TARGET_INDEX)->Value()->GetMatrix();
        float inferenceStatus[TARGET_NUM_STATUS];
        inferenceStatus[0]=matriz->GetItems().Subscript(0);
        inferenceStatus[1]=matriz->GetItems().Subscript(1);
        float difference=euclideanDistance(targetStatus,inferenceStatus,2);
        if (difference<=1)
        {
            correct++;
        }
    }
    numTotal++;
}

float porcentage=correct/(float)numTotal;
So basically I go through the file I used to learn the network and set all the evidence except for the Class node, then update the beliefs and compare the two vectors using the Euclidean distance.
The problem is that with this method I should get 100%, but I only get about 52% :(

Any idea what I could be doing wrong?

Thank you very much in advance
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Is it possible that you're not doing anything wrong, but the predictions you're making are simply imperfect? Please note that the correct value for the target node is 0 or 1, but that you will not obtain these values after performing inference (you'll get a probability distribution).

Also, why are you checking for difference <= 1?
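
To see why the threshold matters, here is a small sketch of the distance-threshold rule. The `euclideanDistance` below is a hypothetical reimplementation (the poster's own helper is not shown in the thread, and it uses `float` rather than `double`), and `countedCorrect` is an illustrative name.

```cpp
#include <cmath>
#include <cstddef>

// Euclidean distance between two arrays of length n.
double euclideanDistance(const double* a, const double* b, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Distance-threshold rule: a record counts as correct when the posterior
// lies within `threshold` of the one-hot target vector.
bool countedCorrect(const double* target, const double* posterior,
                    std::size_t n, double threshold) {
    return euclideanDistance(target, posterior, n) <= threshold;
}
```

With target {1, 0} and posterior {0.3, 0.7}, the distance is sqrt(0.98), about 0.99, so a threshold of 1 counts the record as correct even though the other state is the more probable one.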
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Hi Mark!

I'll try to explain myself better. (Btw, I made a mistake: it shouldn't be difference<=1 but difference<=0.5.)

The Class node has 2 states, Yes and No, and inference gives a probability for each of them, not a single value that is 0 or 1.
So if I read from the file that the expected value is Yes, the probabilities should be:

Code:

State(Yes)=1
State(No)=0

And for No:
State(Yes)=0
State(No)=1
So if I get Class=YES in one line, and the probabilities (GetNode->Value->GetMatrix) are, let's say:

Code:

ProbYes=matriz->GetItems().Subscript(0); <-- 0.8
ProbNo=matriz->GetItems().Subscript(1); <-- 0.2
The distance =
sqrt( (State(YES)-ProbYes)^2 + (State(NO)-ProbNo)^2)=
sqrt( (1-0.8)^2 + (0-0.2)^2)=0.2828 (It's correct because diff<=0.5)
In a wrong case: I read Class=NO and the probabilities from inference are:

Code:

ProbYes=0.6
ProbNo=0.4

Distance=sqrt( (0-0.6)^2 + (1-0.4)^2) = 0.849 >0.5 so wrong estimation.
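
For a two-state node the arithmetic above has a closed form. Writing p for the posterior probability of the state recorded in the file (a symbol introduced here for illustration), the distance to the one-hot target depends only on p:

```latex
\begin{aligned}
d(p) &= \sqrt{(1-p)^2 + \bigl(0-(1-p)\bigr)^2} = \sqrt{2}\,(1-p),\\
d(p) \le 0.5 &\iff p \ge 1 - \tfrac{0.5}{\sqrt{2}} \approx 0.646,\\
p > 0.5 &\iff d(p) < \tfrac{\sqrt{2}}{2} \approx 0.707.
\end{aligned}
```

This reproduces both examples (p = 0.8 gives 0.2828, p = 0.4 gives 0.8485). It also shows that the 0.5 cutoff is stricter than picking the most probable state: a record with p between 0.5 and roughly 0.646 is counted as wrong even though the recorded state is still the most probable one.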
----

The thing is, right after calling result.WriteFile("test.xdsl"); I can open that file in GeNIe, enter the evidence one by one, and get the expected Class from the file. I just want to get the same result in code, which should be possible if I can get it in the GeNIe interface, no?

I hope it's clear; if not, let me know. I'm really confused by this problem :(
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Another example X)
The first line of the file is:

Code:

No No Yes No Yes
where the last column is the class.

So I go to GeNIe, set the evidence on the first 4 nodes, click Update Beliefs, and get this in the Class node:

Code:

Yes: 85%
No: 15%
But when I read the file in my program, call SetEvidence for the same nodes, and call Value()->GetMatrix() on the Class node, I get:

Code:

(Yes) 0: 0.38
(No) 1: 0.62
I don't understand why I don't get the same values if the network I'm using is the same :S
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Obviously, you should be able to get the same values in SMILE that you get in GeNIe. To judge what is going wrong, I need some more code and the data and network as well. Are you sure you are setting the evidence correctly?
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Hi Mark!

Here is the code:

Code:

	if (m_dataset.ReadFile(filename)!=DSL_OKAY)
		ExitProcess(0);
	
	if (greedy.Learn(*crossValid->m_dataSet,result)!=DSL_OKAY)
	{
		ExitProcess(0);
	}

	//result.SetTarget(m_studyData->getNumAttributes()-1);
	int TARGET_INDEX=result.FindNode("Paliza");
	result.SetTarget(TARGET_INDEX);
	result.WriteFile("D://testest.xdsl");

	StreamReader* file = new StreamReader("D://test2.txt");
	
	String* line;

	bool first=true;
	int numTotal=-1; // First row it's 0
	int correct=0;

	while ( line = file->ReadLine() )
	{
		if (first)
		{
			first=false;
		}
		else
		{
			String *split[];
			String* delimStr = S" ";
			Char delimiter[] = delimStr->ToCharArray();
			split=line->Split(delimiter,20);

			result.ClearAllEvidence();
		
			IEnumerator* myEnum = split->GetEnumerator();
			int i=0;

			float targetStatus[TARGET_NUM_STATUS];

			while (myEnum->MoveNext())
			{
				String* s = __try_cast<String*>(myEnum->Current);
				int value;

				if (s->Equals("Si"))
					value=0;
				else if (s->Equals("No"))
					value=1;

				if (i==TARGET_INDEX)
				{
					if (value==1)
					{
						targetStatus[0]=1;
						targetStatus[1]=0;
					}
					else
					{
						targetStatus[0]=0;
						targetStatus[1]=1;
					}
				}
				else
					result.GetNode(i)->Value()->SetEvidence(value);

				i++;
			}

			
			result.UpdateBeliefs();
			DSL_Dmatrix* matriz=result.GetNode(TARGET_INDEX)->Value()->GetMatrix() ;
						
			float inferenceStatus[TARGET_NUM_STATUS];
			inferenceStatus[0]=matriz->GetItems().Subscript(0);
			inferenceStatus[1]=matriz->GetItems().Subscript(1);
			
			float difference=euclideanDistance(targetStatus,inferenceStatus,2);

			if (difference<=0.5)
			{
				correct++;
			}
		}
		numTotal++;
	}
I've also uploaded the VS 2003 project in case you want to check the whole thing, but the main part is the code I copied above.

http://kile.stravaganza.org/temp/bayesants.zip
mark
Posts: 179
Joined: Tue Nov 27, 2007 4:02 pm

Post by mark »

Are you sure that handle 0 in the network corresponds to the first column in the data set? And similarly, for the other handles? Handle 0 could refer to a different variable in the network than the variable name in the first column of the data set, and it's not obvious to me that they match.
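
One way to avoid relying on positional order is to resolve each column's handle by variable name. The sketch below is self-contained and makes no SMILE calls: `columnsToHandles` is a hypothetical helper, and the `handleByName` map stands in for a name-to-handle lookup on the network (in the code posted in this thread, `result.FindNode(name)` plays that role).

```cpp
#include <map>
#include <string>
#include <vector>

// For each data-set column, look up the node handle by variable name.
// `handleByName` is a stand-in for a lookup on the network (hypothetical
// here); -1 marks a column with no matching node.
std::vector<int> columnsToHandles(const std::vector<std::string>& header,
                                  const std::map<std::string, int>& handleByName) {
    std::vector<int> handles;
    handles.reserve(header.size());
    for (const std::string& name : header) {
        const auto it = handleByName.find(name);
        handles.push_back(it == handleByName.end() ? -1 : it->second);
    }
    return handles;
}
```

Evidence for column c would then be set on the node with handle `handles[c]` rather than on node c.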
kile
Posts: 19
Joined: Sat Apr 25, 2009 3:36 pm

Post by kile »

Hi Mark!

I've double-checked that each node is exactly the one I need, using the following statement:

Code:

const char* name=result.GetNode(i)->GetId();
It gives me the variable names in the same order in which they are read. But as you point out, I think my problem comes not from the variable names but from the state names, as another user asked in a previous post:

http://genie.sis.pitt.edu/forum/viewtopic.php?t=217

If the first value read is NO and the second is YES, the states will be (0: NO, 1: YES), but if the order is different, the state indices will change too.
I'll try to work out how to load the state names and use them without depending on the order in which they were read.

Thank you for your answer. I'll post some results as soon as I get them working ;)
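
A minimal sketch of that idea, assuming the state names of each node have already been read from the learned network in the network's own order (how to obtain them from SMILE is not shown here, and `stateIndex` is a hypothetical helper):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Index of `value` among a node's state names, or -1 if it is not a state.
int stateIndex(const std::vector<std::string>& stateNames,
               const std::string& value) {
    const auto it = std::find(stateNames.begin(), stateNames.end(), value);
    return it == stateNames.end()
               ? -1
               : static_cast<int>(it - stateNames.begin());
}
```

Using the returned index in place of hard-coded 0 and 1 keeps the evidence consistent with whatever state order the learning step produced, and the -1 case catches values that match no state.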