unbalanced data and parameter learning

manman
Posts: 7
Joined: Mon Jun 17, 2019 7:07 am

Post by manman » Sat Jun 22, 2019 10:04 am

Hello! I am looking for suggestions on dealing with unbalanced data.

When I run validation on a class variable with two states, state 1 and state 0, the accuracies of the two states are 0.33 and 0.90 respectively, because the data set is unbalanced (many more state 0 records than state 1). The model reports a total accuracy over state 0 and state 1 of 0.86. However, that result is not very useful, because we need high accuracy for state 1. Could anyone suggest how to improve the accuracy of state 1?
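As a quick sanity check on these numbers, the overall accuracy is a weighted average of the per-state accuracies, so the share of state 1 records can be backed out from the three figures quoted above. This is only a rough sketch: the inputs are the rounded values from this post, and the variable names are made up for illustration.

```python
# Per-state accuracies and overall accuracy quoted in the post (rounded).
acc_state1 = 0.33
acc_state0 = 0.90
acc_total = 0.86

# Overall accuracy is a weighted average of the per-state accuracies:
#   acc_total = p1 * acc_state1 + (1 - p1) * acc_state0
# Solving for p1, the fraction of records in state 1:
p1 = (acc_state0 - acc_total) / (acc_state0 - acc_state1)

print(f"implied fraction of state 1 records: {p1:.3f}")
```

With these inputs the implied share of state 1 is roughly 7%, which is why the total accuracy sits so close to the state 0 accuracy.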

I tried using a balanced data set for parameter learning, and the accuracies of the two states became balanced. Is this method sound? Can anyone give me suggestions or point me to literature that supports it?

I would really appreciate any suggestions!

marek [BayesFusion]
Site Admin
Posts: 271
Joined: Tue Dec 11, 2007 4:24 pm

Re: unbalanced data and parameter learning

Post by marek [BayesFusion] » Mon Jun 24, 2019 9:43 pm

There is no direct support for unbalanced classes in GeNIe, so you will have to use general machine learning results for unbalanced classes and prepare your training data set accordingly. If you prepare a balanced data set to learn parameters, please keep in mind that your prior probabilities of classes will not reflect the frequencies in the test data. I would perhaps look for a different decision threshold than the one GeNIe uses (GeNIe picks the most likely class). You can get an idea of what threshold will work for you by looking at the ROC curves. You can create an output data set when validating and then "manually" (e.g., in Excel) change the decision criterion.
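The "manual" threshold change can also be scripted. Below is a minimal sketch in Python, assuming you have already read the true class and the predicted probability of state 1 for each record from the validation output into two lists (how those columns are named depends on your model and file, so the loading step is left out). The records are made up for illustration, and picking the threshold by Youden's J (sensitivity + specificity - 1) is just one common criterion, not something GeNIe prescribes.

```python
def sweep_thresholds(labels, probs, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    labels: true classes, 1 for state 1 and 0 for state 0
    probs:  predicted P(state 1) for each record
    A record is classified as state 1 when its probability is >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rows = []
    for t in thresholds:
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        rows.append((t, tp / n_pos, tn / n_neg))
    return rows


def best_by_youden(rows):
    """Pick the row that maximizes sensitivity + specificity - 1."""
    return max(rows, key=lambda r: r[1] + r[2] - 1.0)


# Made-up illustrative records, not data from this thread.
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.70, 0.40, 0.30, 0.45, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]

rows = sweep_thresholds(labels, probs, [0.5, 0.4, 0.3, 0.2])
for t, sens, spec in rows:
    print(f"threshold={t:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")

best = best_by_youden(rows)
print(f"chosen threshold: {best[0]:.2f}")
```

On this toy data the default 0.5 cut-off catches only one of the three state 1 records, while a lower threshold catches all three at the cost of one false alarm. Which trade-off is acceptable depends on the relative cost of the two kinds of error in your application.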
I hope this helps,

Marek

manman
Posts: 7
Joined: Mon Jun 17, 2019 7:07 am

Re: unbalanced data and parameter learning

Post by manman » Tue Jun 25, 2019 9:47 am

Hey Marek, Thanks a lot for your suggestions!

About the point that the "prior probabilities of classes will not reflect the frequencies in the test data", I think that is a really good point to consider.
The prior probability of the selected class (two states) does change after balancing, but the joint distribution of the selected class (two states) does not change during the balancing process. Do you think that is OK for the classification result?
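One way to look at this question, as a sketch: if balancing only resamples records at random within each class, the distribution of the other variables given the class stays the same and only the class prior moves. Under that assumption, Bayes' rule lets you re-weight a posterior from the balanced model back to the original class frequencies. The function name and the 0.07 prior below are hypothetical, chosen only for illustration.

```python
def correct_posterior(p1_balanced, train_prior1, true_prior1):
    """Re-weight P(state 1 | evidence) from a model trained on resampled data.

    p1_balanced:  posterior for state 1 from the model learned on balanced data
    train_prior1: fraction of state 1 in the balanced training set (e.g. 0.5)
    true_prior1:  fraction of state 1 in the original or test population
    Assumes P(evidence | class) was not changed by the balancing step.
    """
    w1 = p1_balanced * true_prior1 / train_prior1
    w0 = (1.0 - p1_balanced) * (1.0 - true_prior1) / (1.0 - train_prior1)
    return w1 / (w1 + w0)


# Hypothetical priors: 50/50 after balancing, 7% state 1 in the original data.
for p in (0.5, 0.9):
    corrected = correct_posterior(p, train_prior1=0.5, true_prior1=0.07)
    print(f"balanced posterior {p:.2f} -> corrected posterior {corrected:.3f}")
```

The ranking of records by probability of state 1 is unchanged by this correction; what changes is where a fixed cut-off such as 0.5 falls, so the balanced model effectively behaves like the original model with a lower decision threshold.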

Looking forward to your further suggestions!
