Too many states in nature nodes

asamedgoze
Posts: 2
Joined: Mon Jun 27, 2022 4:26 pm

Too many states in nature nodes

Post by asamedgoze »

I have a large sales dataset and I would like to create a database that includes not only sales but also colour, fit, city, model, store and season. The data are recorded for every product-store pair, so there are millions of rows. When I divide them into nodes, the network is created fine except for the season: I am not sure how to include season in the network.
However, since the "product" node has thousands of states by itself, the other free program I used for BN is insufficient to calculate it. So here is my question, and I would be very happy if you could help me.

Is creating the database with a Bayesian network not the right way? Can I use BN for this kind of database? Is BN only appropriate for many nodes with few states (e.g. yes/no) and not appropriate for a small network (5-6 nature nodes) with thousands of states?

And (my real question) is your program able to do that kind of calculation and representation?
I am asking because my computer is too old to just download and set up your program.
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Too many states in nature nodes

Post by marek [BayesFusion] »

Generally, probabilistic representations are wonderfully efficient but only if there are independencies in the domain. Still, probability tables grow exponentially with the number of parents. The base of the exponent is the number of states in the node and parents. Having a node with thousands of states does not sound like a good idea, even for software as efficient and as fast as GeNIe. I doubt any software will be able to handle such a node if it is connected to other nodes. If it has parents, then the table in the node will grow. If it has children, these children will have thousands of conditional probability distributions.
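
To make the growth concrete, here is a minimal Python sketch that counts the entries in a conditional probability table from the number of states of a node and of its parents. The function name `cpt_entries` and all state counts in the example are illustrative assumptions; they are not taken from the poster's model or from GeNIe's API.

```python
from math import prod

def cpt_entries(node_states, parent_states):
    """Number of probabilities in a CPT: one distribution over the node's
    states for every combination of parent states."""
    return node_states * prod(parent_states)

# Hypothetical cardinalities, chosen only for illustration.
small_binary = cpt_entries(2, [2, 2, 2])            # binary node, three binary parents
product_alone = cpt_entries(5000, [])               # 5000-state node, no parents
child_of_product = cpt_entries(10, [5000])          # 10-state child of that node
product_with_parents = cpt_entries(5000, [400, 4])  # same node with two parents

print(small_binary, product_alone, child_of_product, product_with_parents)
```

Each extra parent multiplies the table by that parent's number of states, which is why a single node with thousands of states becomes a problem as soon as it is connected to anything.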

I could perhaps give you more precise advice if you posted a (very) simplified model along the lines of what you want to do. I know that it will involve downloading and installing GeNIe but an alternative is to draw what you want. The added benefit of software like GeNIe is that it will make sure that your model is syntactically correct.
I hope this helps,

Marek
asamedgoze
Posts: 2
Joined: Mon Jun 27, 2022 4:26 pm

Re: Too many states in nature nodes

Post by asamedgoze »

Hi Marek,

I am adding an example of a BN that I created. The number of stores is limited to 10 in this example, but it is actually 400. PHID shows the product sold. Many more nature nodes will be added after I figure out how to do it.
I am not sure how to add time to this BN. One week's data is included in this example, but I need to add 53 more. I thought a new nature node called week could be added, with states week1, week2, etc. However, the amount of data will grow exponentially.
Also, in some papers I read, some features like colour or fit are added after sales (PHID in the example). But as I see it, they are not a result of the sales; they are a reason for the sales. So they should be causing the sales, as I drew in the example.

Thank you.
Attachments
BN-min.jpeg (52.95 KiB)
marek [BayesFusion]
Site Admin
Posts: 430
Joined: Tue Dec 11, 2007 4:24 pm

Re: Too many states in nature nodes

Post by marek [BayesFusion] »

I roughly see what you want to accomplish. I'm afraid your CPT in node ProductHierarchyID will have 40,000 conditional probability distributions and, given that the node itself has 50 states, it will have 2,000,000 probabilities. If you multiply these by the 8 bytes needed to represent a floating point number, you are getting a very large table. You also want to model time explicitly. This will add further complication and increase the space needed for the CPT. You want the moon :-). Netica (I believe your model so far is represented in Netica) is well-written, efficient software. While SMILE may be faster and more efficient (we have actually never tested this), it will not be enough to help you. I think you need to start thinking about your problem in a different way. The way you are pursuing will not work with any Bayesian network software.
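
The arithmetic above can be reproduced with a few lines of Python. The first three figures (40,000 distributions, 50 states, 8 bytes) come from this post. The two extensions are assumptions for illustration only: that a Week node with 54 states (the one week in the example plus 53 more) would become an additional parent of the node, and that Store is one of its parents, so going from 10 to 400 stores multiplies the table by 40.

```python
BYTES_PER_PROBABILITY = 8  # size of one floating point number, as stated above

distributions = 40_000     # conditional distributions in the CPT (from the post)
states = 50                # states of the node itself (from the post)

probabilities = distributions * states
table_bytes = probabilities * BYTES_PER_PROBABILITY

# Assumption: Week (54 states) is added as one more parent of the node.
weeks = 54
with_weeks = probabilities * weeks
with_weeks_bytes = with_weeks * BYTES_PER_PROBABILITY

# Assumption: Store is a parent, so 400 stores instead of 10 scales by 40.
store_factor = 400 // 10
full_size = with_weeks * store_factor
full_size_bytes = full_size * BYTES_PER_PROBABILITY

print(f"example model:   {probabilities:,} probabilities, {table_bytes:,} bytes")
print(f"plus 54 weeks:   {with_weeks:,} probabilities, {with_weeks_bytes:,} bytes")
print(f"plus 400 stores: {full_size:,} probabilities, {full_size_bytes:,} bytes")
```

Under these assumptions the single table grows from 2,000,000 entries to over four billion, before any further nodes are added.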

I hope this helps,

Marek