Tuesday, March 27, 2007

SQL 2005 Analysis Services Project: Training Set

The main reason SQL 2005 Analysis Services projects fail is a lack of understanding of the purpose and importance of the training set in data mining. In data mining, the training set takes the place of the scientific theory, meaning a body of facts known to be true or false. The key is specificity. For example, if you are trying to find out which cancer drugs have the most effective chemical compounds for fighting cancer, you must have the specific chemical compounds and their associated values for each drug. Gathering these facts is the first step; in Analysis Services Data Mining Structures (DMS) they are called inputs. The second step is to decide what you want to predict. Do you want to predict a discrete state (yes or no)? Do you want to predict a continuous numerical value (e.g., the price of a particular item)? The third step is to determine your key column, the unique identifier for each row.
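To make the three steps concrete, here is a minimal Python sketch that describes a training set by its key column, input columns, and predictable column, then checks the rows for the most common problems: missing facts, duplicate keys, and non-numeric values where a continuous prediction is expected. This is not the Analysis Services API. The `MiningStructure` class, the `validate_training_set` function, and the drug columns (`DrugID`, `CompoundA`, `CompoundB`, `Effective`) are hypothetical names chosen for illustration, and the rows are made-up sample data.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MiningStructure:
    """Hypothetical description of a training-set layout (not the SSAS API)."""
    key: str            # unique identifier for each row (step 3)
    inputs: list[str]   # specific, known facts, e.g. compound values (step 1)
    predictable: str    # the column we want to predict (step 2)
    content: str        # "discrete" (e.g. yes/no) or "continuous" (e.g. price)


def validate_training_set(structure: MiningStructure, rows: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in the training set (empty if none)."""
    problems = []
    if structure.content not in ("discrete", "continuous"):
        problems.append(f"unknown content type: {structure.content}")
    required = [structure.key, *structure.inputs, structure.predictable]
    seen = set()
    for i, row in enumerate(rows):
        # Every row must supply the key, every input, and the predictable value.
        missing = [c for c in required if row.get(c) is None]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        # The key column must uniquely identify each row.
        k = row[structure.key]
        if k in seen:
            problems.append(f"row {i}: duplicate key {k!r}")
        seen.add(k)
        # A continuous prediction needs a numeric target (bool is excluded).
        value = row[structure.predictable]
        if structure.content == "continuous" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            problems.append(f"row {i}: {structure.predictable} must be numeric")
    return problems


# Hypothetical drug training set: predict a discrete yes/no outcome.
drugs = MiningStructure(key="DrugID", inputs=["CompoundA", "CompoundB"],
                        predictable="Effective", content="discrete")
rows = [
    {"DrugID": 1, "CompoundA": 0.8, "CompoundB": 1.2, "Effective": "yes"},
    {"DrugID": 2, "CompoundA": 0.1, "CompoundB": None, "Effective": "no"},
    {"DrugID": 1, "CompoundA": 0.5, "CompoundB": 0.9, "Effective": "no"},
]
for problem in validate_training_set(drugs, rows):
    print(problem)
# row 1: missing ['CompoundB']
# row 2: duplicate key 1
```

Running a check like this before building the model makes the "specificity" requirement explicit: a row without all of its facts, or without a unique key, cannot take part in the theory.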

Always ask yourself: what am I trying to predict, and what is the scientific theory? The theory and your training set are always specific to what you want to predict. Remember, Microsoft provides the tool, but you must provide the specific theory.

Once you have successfully built one model, you can use it to predict similar situations. If you are selling fruit, build the model for apples first. Once that model works, change the training set to reflect oranges and apply the same model to oranges. The combination of all your models is your enterprise data mining system.
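The apples-then-oranges approach can be sketched in Python as one model design trained separately on each product's training set. This is a sketch under stated assumptions: the `train` function is a toy majority-vote lookup standing in for a real Analysis Services algorithm, and the `Product`, `Season`, and `Sold` columns and their rows are made-up illustrative data.

```python
from collections import Counter, defaultdict


def train(rows, input_col, target_col):
    """Toy learner (majority vote per input value); stands in for an SSAS algorithm."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[input_col]][row[target_col]] += 1
    # For each input value, predict the most frequent outcome seen in training.
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def build_models(sales, products, input_col="Season", target_col="Sold"):
    """Reuse the same model design, swapping only the training set per product."""
    models = {}
    for product in products:  # e.g. apples first, then oranges
        training_set = [r for r in sales if r["Product"] == product]
        models[product] = train(training_set, input_col, target_col)
    return models  # the collection of models is the "enterprise system"


# Hypothetical sales history.
sales = [
    {"Product": "apple", "Season": "fall", "Sold": "high"},
    {"Product": "apple", "Season": "fall", "Sold": "high"},
    {"Product": "apple", "Season": "summer", "Sold": "low"},
    {"Product": "orange", "Season": "winter", "Sold": "high"},
    {"Product": "orange", "Season": "summer", "Sold": "low"},
    {"Product": "orange", "Season": "summer", "Sold": "low"},
]

models = build_models(sales, ["apple", "orange"])
print(models["apple"]["fall"])     # high
print(models["orange"]["summer"])  # low
```

The design choice is that only the training set changes between products; the columns and the learning step stay the same, so a model that works for apples can be repeated for oranges without redesign.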

Business Analytics

