The latest developments in data mining, predictive modeling, marketing analytics, artificial intelligence, analytics, intelligent agents, semiconductors, distributed computing, and network security. SAS, Fair Isaac, Microsoft Analysis Services, SPSS, Cognos, Hyperion, Business Objects, Oracle, KXEN, or R. Healthcare, Pharmaceutical, Retail, CPG, Travel, Financial, Banking, Telecommunications, or Insurance. Unleashing the Power of the Mind©™
Saturday, September 22, 2007
Mobile Business Intelligence - the Next Big Step
Wednesday, September 19, 2007
Duke Plots Course Beyond the Smart Grid
Friday, September 14, 2007
VP, Decision Support Systems
Wednesday, September 12, 2007
Market Forecasting and Modeling for the Power System of the Future
1. physical assets
2. contract prices
3. economic forecasting
Modeling of this size and complexity could require a combination of most of the tools and methodologies in the current data mining and predictive modeling market, plus the development of some new tools.
Predictive Planning for Supply Chain Management
Tuesday, September 11, 2007
F.B.I. Data Mining Reached Beyond Initial Targets
Monday, September 03, 2007
Frequent Doesn’t Mean Loyal: Using Segmentation Marketing to Build Shopper Loyalty
Data Mining Analysis and Modeling for Marketing Based on Attributes of Customer Relationship
SUPERVISORY [BANK] RISK ASSESSMENT AND EARLY WARNING SYSTEMS
Data Mining Applications in Higher Education
Data Mining Technologies and Decision Support Systems for Business and Scientific Applications
Integrating Customer Value Considerations into Predictive Modeling
Thursday, August 30, 2007
Predictive Analytics and Data Mining
1. A robust back end to handle large amounts of diverse and complex data;
2. Creation of client, industry, and business problem variables that can assist in determining patterns in the data;
3. Utilization of multiple data mining or predictive modeling algorithms to classify the data; and
4. Utilization of statistical techniques to help forecast, partition, or determine areas with common patterns.
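As a minimal sketch of items 2 and 3 above (hypothetical claims data and helper names, Python standard library only), the code below derives one business-problem variable and then fits two deliberately simple algorithms, a majority-class baseline and a nearest-centroid classifier, on the same data so their holdout accuracy can be compared. Real projects would use far more data and stronger algorithms; the point is only the pattern of running several algorithms side by side.

```python
from collections import Counter
from math import dist
from statistics import fmean

# Hypothetical claims data: (billed, paid) amounts with a review label.
TRAIN = [((100.0, 90.0), "ok"), ((200.0, 180.0), "ok"), ((150.0, 140.0), "ok"),
         ((100.0, 20.0), "flag"), ((300.0, 60.0), "flag")]
HOLDOUT = [((120.0, 110.0), "ok"), ((250.0, 40.0), "flag")]

def derive(raw):
    """Item 2: add a business-problem variable (the paid/billed ratio)."""
    billed, paid = raw
    return (billed, paid, paid / billed)

def majority_class(xs, ys):
    """Baseline algorithm: always predict the most common training label."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label

def nearest_centroid(xs, ys):
    """Second algorithm: predict the label whose mean feature vector is closest."""
    centroids = {}
    for label in set(ys):
        members = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [fmean(col) for col in zip(*members)]
    return lambda x: min(centroids, key=lambda lab: dist(x, centroids[lab]))

def compare(algorithms=(majority_class, nearest_centroid)):
    """Item 3: fit several algorithms on the same data; report holdout accuracy."""
    xs = [derive(raw) for raw, _ in TRAIN]
    ys = [label for _, label in TRAIN]
    results = {}
    for algo in algorithms:
        model = algo(xs, ys)
        hits = sum(model(derive(raw)) == label for raw, label in HOLDOUT)
        results[algo.__name__] = hits / len(HOLDOUT)
    return results
```

Calling `compare()` returns a dictionary of accuracy per algorithm; adding a third algorithm only means passing another fit function with the same `(xs, ys) -> predict` shape.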
On the Advantages and Disadvantages of BI Search
Tuesday, August 21, 2007
Paper Kills: Transforming Health and Healthcare with Information Technology
Monday, August 20, 2007
Donald Farmer on Data Mining
Donald's blog: http://www.beyeblogs.com/donaldfarmer/
Look at his data visualization music video link! http://www.youtube.com/watch?v=KHEIvF1U4PM
Thursday, August 16, 2007
Technology: Is Data Mining Misguided?
The change-management challenge is to get users of data mining to understand that it is a process, and that for it to work they need to invest resources (mostly time and technology).
Wednesday, August 15, 2007
Google, Microsoft and the glacial healthcare revolution
Thursday, August 02, 2007
Korean stem cell fraud masked a true advance
Monday, July 30, 2007
Genetic breakthrough in multiple sclerosis -- biggest for decades
Monday, July 23, 2007
New processors present problems, payoff
Conceptually I think that it will be like this:
- Data mining technologies will provide the fundamentals of pattern and error detection. Due to the complexity and diversity of the rich data environment that we currently face, we will need the ability to embed part of this technology into any program, and we will probably need multiple, different data mining models analyzing data simultaneously to meet the customized needs of end users;
- Intelligent mobile agent technologies would be fundamental to access and process data from servers, mainframes, and handheld devices like cellphones;
- Web-based technologies will be fundamental in finding patterns and in improving remote communications;
- Parallel computing technologies will be needed to optimize the processing of large quantities of data; and
- Visualization technologies that make complex patterns easily understood, while simultaneously adhering to established laws of nature (e.g., in medicine or physics) or previous experience (business rules), would also be a keystone in this endeavor.
Our biggest challenge is going to be reaching across multiple disciplines and technologies to integrate all of them into one coherent schema. In this sense, we are all pioneers. We bring different skill sets that, combined, will mark the path for others to follow. It will not be easy, but it will be worthwhile!
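The parallel-computing point can be illustrated with a map-reduce pattern: split the records into chunks, count patterns in each chunk on separate workers, and merge the partial counts. This is a minimal sketch using Python's standard `concurrent.futures` module; the record format (plain code strings) and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import List, Sequence

def count_chunk(records: Sequence[str]) -> Counter:
    """Map step: count how often each pattern (here, a code value) occurs in one chunk."""
    return Counter(records)

def chunked(records: Sequence[str], size: int) -> List[Sequence[str]]:
    """Split the data into fixed-size chunks that workers can process independently."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def parallel_counts(records: Sequence[str], chunk_size: int = 1000,
                    workers: int = 4, use_processes: bool = True) -> Counter:
    """Run the map step on a worker pool, then reduce by summing the partial counts."""
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=workers) as pool:
        partials = pool.map(count_chunk, chunked(records, chunk_size))
        return sum(partials, Counter())
```

With `use_processes=True`, call it from under an `if __name__ == "__main__":` guard, which process pools need on platforms that start workers in a fresh interpreter. `use_processes=False` runs the same code on threads, which is convenient for quick checks but does not speed up CPU-bound counting in CPython.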
Wednesday, July 11, 2007
Understanding Molecular Imaging
Web Analytics and Healthcare: Disease Progression
Monday, July 09, 2007
Moving Closer To Solving Lou Gehrig's Disease Mystery
http://www.medicalnewstoday.com/medicalnews.php?newsid=75539
This is an area where I hope predictive modeling and data mining can make a difference. If we can model disease progression linearly at the cellular level, we might be able to diagnose and prevent ALS before its onset.
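The post does not describe a specific model. As a minimal sketch, assuming a single numeric progression score measured at several time points (hypothetical data), a linear progression model can be fit by ordinary least squares and projected forward to the time at which it would reach a chosen threshold. The names and the threshold idea are illustrative assumptions, not clinical methodology.

```python
from statistics import fmean
from typing import Optional, Sequence, Tuple

def fit_line(times: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of score = intercept + slope * time."""
    t_bar, y_bar = fmean(times), fmean(scores)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    if sxx == 0:
        raise ValueError("need measurements at two or more distinct times")
    slope = sxy / sxx
    return y_bar - slope * t_bar, slope

def time_to_threshold(times: Sequence[float], scores: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Time at which the fitted line reaches `threshold` (None for a flat trend).
    The result may lie in the past if the threshold has already been crossed."""
    intercept, slope = fit_line(times, scores)
    if slope == 0:
        return None
    return (threshold - intercept) / slope
```

For example, scores of 10, 12, 14, and 16 at times 0 to 3 fit the line 10 + 2t, which would reach a threshold of 30 at t = 10.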
Thursday, June 21, 2007
Web Analytics: Future Applications in Predictive Modeling
A lot of time and effort is being channeled into web analytics. This term refers to:
“[t]he measurement of data as it relates to an Internet site, including the behavior of visitors, the amount of traffic, the conversion rates, web server performance, user experience, and other information in order to understand and proof of results and continually improve the results of a site towards a set of objectives.”
Since web analytics is another area of predictive modeling, we must ask whether the methodologies, analytics software, and visualization tools developed in web analytics could have an impact on other industries that use predictive modeling, such as healthcare, banking, insurance, retail, and manufacturing. I think that the processes and software developed for web analytics will ultimately be used in many other industries, because the intersection of the Internet and other industries is already a reality.
Predictive modeling and web analytics have the same objective: to provide a measurement (or baseline) and to predict future behavior. One of the key contributions of web analytics has been software that can withstand the rigors of commercial use. The scalability of web analytics software is crucial for other industries in which large databases have become the norm.
Another significant contribution of web analytics to predictive modeling is the industry's ability to come together and agree on a series of metrics and benchmarks. Although some may disagree with this assessment, if we look at the history of the healthcare industry, it is apparent that the inability to agree upon benchmarks and metrics has negatively impacted the cost of healthcare in the United States. Moreover, those involved in web analytics could give industries like banking, insurance, and retail an innovative new look at what needs to be measured.
A third contribution of web analytics to predictive analytics is the healthy, spirited, and robust debate about privacy. The Internet has raised serious, relevant, and pertinent questions about privacy, and other industries could benefit from that discussion.
A fourth contribution of web analytics to predictive modeling is the development of new return on investment (ROI) models in business. As companies adopt these ROI models for their advertising, new media, and marketing strategies, they may find that the models are also applicable to other lines of business.
Last but not least, web analytics has contributed a new set of visualization tools that summarize previously hidden nuggets of gold in a way that can be easily understood and acted upon.
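Two of the metrics web analytics popularized, conversion rate and ROI, can be computed the same way in any industry. A minimal sketch, assuming the classic definitions (conversion rate = conversions / visits; ROI = (revenue - cost) / cost); the `Channel` class and its figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Channel:
    """One marketing channel's results (hypothetical figures)."""
    name: str
    visits: int
    conversions: int
    revenue: float
    cost: float

    @property
    def conversion_rate(self) -> float:
        """Share of visits that converted."""
        return self.conversions / self.visits if self.visits else 0.0

    @property
    def roi(self) -> float:
        """Classic ROI: (revenue - cost) / cost."""
        if self.cost <= 0:
            raise ValueError("ROI is undefined without a positive cost")
        return (self.revenue - self.cost) / self.cost

def rank_by_roi(channels: List[Channel]) -> List[str]:
    """Channel names ordered from best to worst ROI."""
    return [c.name for c in sorted(channels, key=lambda c: c.roi, reverse=True)]
```

Swapping "visits" for claims reviewed or loan applications processed is all it takes to reuse the same baseline-and-ROI structure in healthcare or banking.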
Tuesday, June 19, 2007
Geovisual Analytics and Crisis Management
NIH-NSF Visualization Research Challenges Report
Monday, June 18, 2007
Friday, June 15, 2007
What Data Mining Can and Can't Do
Wednesday, June 13, 2007
Evaluation of noise reduction techniques in the splice junction recognition problem
A review of symbolic analysis of experimental data
Enhancing Data Analysis with Noise Removal
Tuesday, June 12, 2007
Incremental Mining of Sequential Patterns in Large Databases
The problem: "As databases evolve the problem of maintaining sequential patterns over a significantly long period of time becomes essential, since a large number of new records may be added to a database. To reflect the current state of the database where previous sequential patterns would become irrelevant and new sequential patterns might appear, there is a need for efficient algorithms to update, maintain and manage the information discovered [12]. Several efficient algorithms for maintaining association rules have been developed [12–15]. Nevertheless, the problem of maintaining sequential patterns is much more complicated than maintaining association rules, since transaction cutting and sequence permutation have to be taken into account [16]."
The proposed solution: "This method is based on the discovery of frequent sequences by only considering frequent sequences obtained by an earlier mining step. By proposing an iterative approach based only on such frequent sequences we are able to handle large databases without having to maintain negative border information, which was proved to be very memory consuming [16]. Maintaining such a border is well adapted to incremental association mining [26,19], where association rules are only intended to discover intra-transaction patterns (itemsets). Nevertheless, in sequence mining, we also have to discover inter-transaction patterns (sequences) and the set of all frequent sequences is an unbounded superset of the set of frequent itemsets (bounded) [16]. The main consequence is that such approaches are very limited by the negative border size."
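A minimal sketch of the easy half of the problem, under the assumption that the increment consists only of new customer sequences: the support of every previously frequent sequence can then be updated by scanning the increment alone and pruning anything that falls below the minimum support. It does not discover new frequent sequences or handle transactions appended to existing customers, which is the harder case the quoted method addresses. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Itemset = FrozenSet[str]
Pattern = Tuple[Itemset, ...]  # a sequence of itemsets, in time order

def contains(data_seq: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct, strictly
    later transaction of `data_seq` (greedy earliest matching)."""
    pos = 0
    for itemset in pattern:
        while pos < len(data_seq) and not itemset <= data_seq[pos]:
            pos += 1
        if pos == len(data_seq):
            return False
        pos += 1  # the next itemset must match a later transaction
    return True

def update_supports(old_counts: Dict[Pattern, int], old_db_size: int,
                    increment: List[List[Itemset]], min_support: float):
    """Re-check previously frequent patterns after new customer sequences
    are added: scan only the increment, then prune below min_support."""
    new_size = old_db_size + len(increment)
    kept = {}
    for pattern, count in old_counts.items():
        count += sum(contains(seq, pattern) for seq in increment)
        if count / new_size >= min_support:
            kept[pattern] = count
    return kept, new_size
```

Note the asymmetry the quote points to: for itemsets only subset containment matters, while for sequences the order of transactions also matters, which is why `contains` has to walk forward through time.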
Friday, June 08, 2007
Molecular Staging for Survival Prediction of Colorectal Cancer Patients
Go Stanford! That's where one of my daughters graduated, and SAM is a product of Stanford University.
The treatment of missing values and its effect in the classifier accuracy
Nevertheless, this is the crucial recommendation (p. 8): "We recommend that we can deal with datasets having up to 20 % of missing values. For the CD (Complete Deletion) method we have up to 60 % of instances containing missing values and still have a reasonable performance."
For healthcare, pharma, and biotech data, this paper is important because of the complexity and diversity of these data.
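The two quoted guidelines (up to 20% missing values overall; up to 60% of instances with a missing value when using complete deletion) are easy to check before modeling. A minimal sketch, assuming rows are dictionaries with `None` marking a missing value; the thresholds come from the quote, and the function names are illustrative.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]  # None marks a missing value

def missing_value_rate(rows: List[Row]) -> float:
    """Share of all cells that are missing."""
    cells = [v for row in rows for v in row.values()]
    return sum(v is None for v in cells) / len(cells)

def incomplete_instance_rate(rows: List[Row]) -> float:
    """Share of rows (instances) with at least one missing value."""
    return sum(any(v is None for v in row.values()) for row in rows) / len(rows)

def complete_deletion(rows: List[Row]) -> List[Row]:
    """CD method: drop every instance that has any missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]

def summarize(rows: List[Row], max_value_rate: float = 0.20,
              max_instance_rate: float = 0.60) -> Dict[str, Any]:
    """Compare a dataset against the quoted 20% / 60% guidelines."""
    value_rate = missing_value_rate(rows)
    instance_rate = incomplete_instance_rate(rows)
    return {
        "missing_value_rate": value_rate,
        "incomplete_instance_rate": instance_rate,
        "within_value_guideline": value_rate <= max_value_rate,
        "cd_within_instance_guideline": instance_rate <= max_instance_rate,
        "rows_after_complete_deletion": len(complete_deletion(rows)),
    }
```

The two rates can diverge sharply: a few missing cells spread across many rows keep the overall rate low while making complete deletion discard most instances.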
An Assessment of Accuracy, Error, and Conflict with Support Values from
When molecular biology theories are tested with real data, we need to be cautious in reading bootstrap values if we are assuming that they underestimate the actual support. For example (this example is mine, not the article's), if you are comparing a decision tree with a Bayesian logistic regression model, be cautious in how you assess the accuracy of each model, since the decision tree tends to underestimate and Bayesian models tend to overestimate.
I have found that, to increase a model's classifier accuracy, this type of distinction (non-parametric bootstrap values vs. Bayesian probabilities) is fundamental.
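By analogy with bootstrap support values in phylogenetics (the fraction of resampled datasets that recover a clade), one can ask what fraction of bootstrap resamples favor one classifier over another. This is an illustrative analogy, not the cited article's method, and the result is a resampling frequency, not a Bayesian posterior probability, which is exactly the distinction the post warns about. The function name and inputs are hypothetical.

```python
import random
from typing import Sequence

def bootstrap_support(correct_a: Sequence[int], correct_b: Sequence[int],
                      replicates: int = 1000, seed: int = 0) -> float:
    """Non-parametric bootstrap: share of resamples (drawn with replacement,
    paired across both classifiers) in which A is strictly more accurate than B.
    Inputs are per-instance correctness flags (1/0) on the same evaluation set."""
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("score both classifiers on the same, non-empty set")
    rng = random.Random(seed)
    n = len(correct_a)
    wins = 0
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]  # one resampled evaluation set
        if sum(correct_a[i] for i in idx) > sum(correct_b[i] for i in idx):
            wins += 1
    return wins / replicates
```

Resampling the same indices for both classifiers keeps the comparison paired, so instances that are hard for both models do not inflate the apparent difference.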
Phase II Studies: Which is Worse, False Positive or False Negative
Monday, May 21, 2007
SPSS Launches Enhanced Predictive Analytics Platform
The Advantages of Smart Data Mining
Data-mining moves into the mainstream, in search of profit
Wednesday, May 16, 2007
General Healthcare Data Mining Model
Metric 1: 5.15% with the old model vs. 20.3% with the new model
Metric 2: 6.06% with the old model vs. 53.4% with the new model
Right now we are fine-tuning the model and writing up our findings, since we must make sure that we have good documentation. We processed over 396 million claims in a 4.7 TB environment (SQL Server 2005). Currently we can refresh every month, and the goal is to refresh once a week within the next couple of months.
The model can be used in healthcare, pharmaceutical, and biotech industries.
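To make the two reported results easier to compare, the relative lift (new / old) and the change in percentage points can be computed directly from the figures above. The post does not say what the metrics measure; this assumes higher is better.

```python
# Figures as reported in the post: (old model %, new model %).
REPORTED = {"Metric 1": (5.15, 20.3), "Metric 2": (6.06, 53.4)}

def improvement(old_pct: float, new_pct: float) -> dict:
    """Relative lift (new / old) and absolute change in percentage points."""
    return {"lift": new_pct / old_pct, "points": new_pct - old_pct}

for name, (old, new) in REPORTED.items():
    result = improvement(old, new)
    print(f"{name}: {result['lift']:.2f}x, {result['points']:+.2f} points")
# Metric 1: 3.94x, +15.15 points
# Metric 2: 8.81x, +47.34 points
```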
Tuesday, May 01, 2007
Doctors test gene therapy to treat blindness
Tuesday, April 24, 2007
Father and me
He was a giant when, on one knee, he would listen to me
He was wise when over the years I learned to listen
He was a leader when he showed me by example
He defined courage in the most difficult struggle
I will be a giant like him when on bended knee
I will be wise by becoming a listener
I will lead by example
I will have the courage to be his son
Monday, April 23, 2007
Studies back Parkinson’s and pesticides link
So, friends and colleagues, that is my goal: to create a healthcare data mining model that has multiple uses and to make sure that as many people as possible have access to it. I thought that if I created this tool, I could help scientists and companies find some kind of relief for some terrible diseases. Eighteen months after dad died, my mother died of ALS. Now you know what drives me.
Thursday, March 29, 2007
Predicting breast cancer survivability: comparison of 3 models
Intel details new chip technology
Wednesday, March 28, 2007
Successful Data Mining Applications
Mining biotech's data mother lode
Pellucid Agent Architecture for Administration Based Processes
Another application of data mining and intelligent agents
http://ups.savba.sk/parcom/publications/agents/IISAS_IAWTIC-2003.pdf
Application of Data Mining and Intelligent Agent Technologies to Concurrent Engineering
This is a good article about a potential application of data mining and intelligent agents in the manufacturing industry:
http://issel.ee.auth.gr/ktree/Documents/Root%20Folder/ISSEL/Publications/3_MITKAS_IJAM.pdf
Tuesday, March 27, 2007
Intelligent Agents
And the Future of Data Mining
An intelligent agent is: (1) a software agent, if it is a piece of software that acts in a relationship of agency for a user or another program; or (2) an intelligent actor, if it interacts with its environment. The first definition refers mostly to data mining, while the latter refers to a robot-like machine.
Although in the science and technology communities we tend to separate the two definitions of intelligent agent, advances in computer processors are bringing both environments closer to one another. I imagine integrating an Electronic Medical Record (EMR) system like GE Centricity and healthcare-specific data mining algorithms using Microsoft SQL 2005 Analysis Services into machine learning hardware that will assist physicians and other healthcare providers in measuring and improving clinical outcomes in real time.
This type of technology could also be applied to PDAs and trading in the financial markets, to purchasing goods and services (brick and mortar or through the Internet), or to deciding what food to buy or which movie to watch. How quickly the technological challenges are met will depend on advances by companies like Intel and Motorola in designing devices that are smaller, faster, and have greater storage capacity. Other challenges involve network security and privacy issues that affect consumers. These challenges are great, but without a doubt the framework to integrate intelligent mobile agents and data mining is already in place.
Strategic alliances in the technology industry are no longer limited to the industrialized countries; they are a worldwide phenomenon. Nor are they limited to the large technology companies. I will not predict who, when, or which industries and companies will benefit from the merger of both technologies, but I do predict that we should see the first fruits of this merger in the next eighteen to twenty-four months.
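A software agent in the first sense above, acting for a user, can be as simple as a perceive-decide-act loop over a data stream. The sketch below is hypothetical: it is not connected to GE Centricity or Analysis Services, and the z-score rule, warm-up length, and threshold are illustrative assumptions, not clinical guidance.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Tuple

@dataclass
class MonitoringAgent:
    """Hypothetical software agent acting on a clinician's behalf: it watches
    one stream of readings and flags any reading far from the running mean."""
    threshold: float = 3.0   # alert when |z| exceeds this many std devs
    warmup: int = 5          # readings to observe before any alert
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # running sum of squared deviations (Welford)
    alerts: List[Tuple[int, float]] = field(default_factory=list)

    def observe(self, reading: float) -> bool:
        """Perceive one reading, decide whether to alert, then learn from it."""
        alert = False
        if self.n >= max(self.warmup, 2):  # need 2+ readings for a sample std dev
            std = sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(reading - self.mean) / std > self.threshold:
                alert = True
                self.alerts.append((self.n, reading))
        # Update the running mean and variance with Welford's online algorithm.
        self.n += 1
        delta = reading - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reading - self.mean)
        return alert
```

Because the statistics are updated online, the agent needs constant memory per stream, which matters on small devices like the PDAs mentioned above.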
SQL 2005 Analysis Services Project: Training Set
The main reason SQL 2005 Analysis Services projects fail is a lack of understanding of the purpose and importance of the training set in data mining. The training set takes the place of the scientific theory in data mining. The scientific theory refers to facts known to be true or false. The key is specificity. For example, if you are trying to find out which cancer drugs have the best chemical compounds to fight cancer, you must have the specific chemical compounds and their associated values for each drug. These are called inputs in Analysis Services Data Mining Structures (DMS). The second step is to decide what you want to predict. Do you want to predict a discrete state (yes or no)? Do you want to predict a continuous numerical value (e.g., the price of a particular item)? The third step is to determine your key column, the unique identifier for each row.
Always ask yourself: what am I trying to predict, or what is the scientific theory? The theory and your training set are always specific to what you want to predict. Remember, Microsoft provides the tool, but you must provide the specific theory.
Once you successfully build one model, you can use that model to predict similarly situated cases. If you are selling fruit, build the model for selling apples first. Once this model is working, change the training set to reflect oranges and apply the same model to oranges. The combination of all your models is your enterprise data mining system.
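The three steps described in this post (inputs, what to predict, and a key column) can be checked before any model is built. A minimal sketch with hypothetical column names; it is plain Python, not the Analysis Services API, though the three roles correspond to the key, input, and predictable columns of a mining structure.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

@dataclass
class TrainingSetSpec:
    key: str                # step 3: unique identifier for each row
    inputs: Sequence[str]   # step 1: the known facts (e.g., compound values)
    predictable: str        # step 2: what you want to predict

def validate(spec: TrainingSetSpec, rows: List[Row]) -> List[str]:
    """Return a list of problems: missing columns or duplicate keys."""
    problems: List[str] = []
    seen = set()
    required = (spec.key, *spec.inputs, spec.predictable)
    for i, row in enumerate(rows):
        missing = [col for col in required if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        if row[spec.key] in seen:
            problems.append(f"row {i}: duplicate key {row[spec.key]!r}")
        seen.add(row[spec.key])
    return problems

def target_kind(spec: TrainingSetSpec, rows: List[Row], max_discrete_states: int = 10) -> str:
    """Rough heuristic (an assumption, not an Analysis Services rule): numeric
    targets with many distinct values are treated as continuous."""
    values = [row[spec.predictable] for row in rows]
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    return "continuous" if numeric and len(set(values)) > max_discrete_states else "discrete"
```

Because the spec is separate from the data, the same "apples" spec can be validated against an "oranges" training set before the model is reused.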
Friday, March 23, 2007
Microsoft SQL 2005 Analysis Services: Ten Best Practices©
By Alberto Roldan
A number of data mining, executive management, and IT professionals seem to be experiencing the same issue with Microsoft SQL 2005 Analysis Services (MAS): how do I make this product work for my enterprise? These ten best practices should help in dealing with this issue.
1. Training: Any organization using this product must have at least one person who has received training in SQL 2005 Analysis Services, and the basic principles of data mining and predictive modeling.
a. Do not expect the Information Technology Department to create an enterprise data mining project without proper training in the technology and the science of data mining. One of the reasons data mining projects fail is that the IT department understands neither the technology nor the science behind data mining. MAS makes an enterprise data mining project feasible if at least one member of the staff understands the technology and science behind it.
i. Potential Solution – Since this is cutting-edge technology that merges science and technology, the Chief Technology Officer, Chief Information Officer, and Enterprise Architect must receive some training in this area. They do not need to be experts, but they must understand the basic principles behind it. This training will make sure that expectations and strategic business initiatives are properly aligned. Also, make sure that at least two software engineers take the online tutorials.
2. Strategic investment, not simply a cost center: Most IT projects are linear (i.e., project scope, charter, resource allocation, design, development, QC, testing, and deployment). Data mining is always a cyclical project because it is research and development; it is heuristic by nature. Organizations invest in enterprise data mining because they realize that their data must be managed differently to transform it into actionable information. Expecting to do something differently while using the same practices we have used in the past is an oxymoron.
a. Plan data mining projects by incorporating research and development techniques into IT software engineering best practices. You need a specific theory for your research, proposed research steps, a timetable, and dedicated resources. Also, document all the steps, define your metrics, complete the research, QC the results, evaluate the results, and determine the next research step in your continuous improvement process.
i. Potential Solution – Concentrate on your research theory. The more specific your theory, the greater the probability of conducting an experiment that gives you specific results you can use (whether they confirm your theory or not). If the theory is something to the effect of "save the planet and cure cancer," I guarantee you that it will not work. Never underestimate the ability to transform a strategic business initiative into multiple theories for data mining research.
3. Dedicated Resources: One of the missions of any IT department is to prove its added value to the enterprise by reducing costs. As a consequence, the standard in some organizations is to share resources as much as possible in order to decrease costs. Nevertheless, the architecture of an enterprise data mining project requires, at a minimum, its own server due to the amount of processing time required.
a. The SQL 2005 Server is a powerful tool that can accommodate many users and serve multiple purposes in the enterprise. As with any other resource, we should always strive to optimize it. Nevertheless, I have found that when operational and data mining research projects compete within the same metadata environment, the competition causes unnecessary friction and delays for all parties.
i. Potential Solution – The best practice is to have separate, dedicated IT resources (staff, hardware, and software). If this is not feasible, a detailed communication and resource utilization plan must be implemented, and the goals and expectations of any data mining project must be adjusted to reflect the additional time (25% to 50% more) required to complete it.
4. "One Theory at a Time" Rule: You can use this tool to address multiple business issues, but doing so will probably require multiple models. If you are looking for the needle in the haystack, you must consider that there could be multiple haystacks and multiple needles, which calls for multiple models for the haystacks and the needles. The complexity of this issue should not be underestimated.
a. Many enterprises spend vast resources on their organizational planning and structure because they understand that their business is complex and requires a clear chain of command to successfully implement strategic and operational initiatives. Nevertheless, these same enterprises fail to recognize this complexity when attempting to implement data mining systems.
i. Potential Solution – Study your organizational chart. This could be a roadmap as to the priorities for a successful enterprise data mining system. It will assist in defining the theories that you want to test in a research environment.
5. Models are never generic; they are always specific: The terms data mining and artificial intelligence are sometimes used out of context in the business environment, more often in a science fiction sense than in a business one. This contributes to unrealistic expectations of what data mining can do to assist in implementing strategic business initiatives. It is imperative that the CIO and the CTO have a basic understanding of the science and technology behind data mining so the executive management team makes well-informed decisions about incorporating these tools into any initiative.
a. A data mining system is not going to lower your operational costs by ten percent overnight. A data mining project is not magic and, like any other strategic initiative, involves planning, knowledge management, and change management. Expectations must be realistic relative to the size of the investment. The investment is not just hardware and software; it also involves training and making sure that you have the right people to do the job well. It requires being intellectually honest in determining your needs, and candid about your business expectations given the size of your investment.
i. Potential Solution: The first step before making an investment is to evaluate where the organization currently is, how it got there, the nature of the competition, and what you think a data mining system can do for your organization. The axiom that those who fail to plan are planning to fail applies to data mining. Executive management support and sponsorship are a keystone of the process. Do not underestimate the challenges and cyclical nature of this type of initiative, but make sure that the message throughout the organization is clear: we will achieve an enterprise data mining system because it is part of how we intend to stay competitive in the future by lowering costs and increasing revenues.
6. Three Main Categories: The variables (input), training set (sample), and data mining structures are the keystones of Analysis Services. A clear understanding of these three areas will assist in the creation of a data mining system.
a. IT architects and developers who do not understand the three main components of Analysis Services will have difficulty completing even a single data mining project successfully. It will be impossible for them to successfully design and implement an enterprise data mining system. This knowledge requires a basic understanding of the technology and the science of data mining and predictive modeling. Acquiring it does not need to be extremely costly or time-consuming. Nevertheless, expecting the IT department to successfully complete this type of project without allowing them time to acquire this knowledge and training is setting them up to fail.
i. Potential Solution: The technical and scientific knowledge to successfully complete a data mining project can be acquired through training (classes or online), engaging the services of a consultant with a proven record of completing data mining projects, or self-directed reading. I would suggest looking within your own organization for individuals with at least a statistics or mathematics background, or those who have an interest in the sciences (genetics, biology, astronomy, physics, or chemistry). Those individuals could have a predisposition to quickly learn the science and methodology behind the technology of data mining.
7. Statistically accepted best practices as metrics: Because this product joins science and technology, and its projects are cyclical rather than linear, we must incorporate statistically accepted best practices if we want the continuous improvement process that research requires. Besides traditional business metrics (cost per employee, revenue per employee, and profits per employee) and IT metrics (reduction of hours to complete a process, increase in revenues, CPU utilization per employee, and total system downtime), we now need to incorporate scientific metrics that will help improve a data mining system. Understanding scientific metrics is important for measuring success, improvement, or lack of improvement.
a. For organizations that have never had a research component, applying an additional set of metrics to gauge the performance of a data mining system is a change. The tendency is to use only the metrics that have been used in the past. The error is in assuming that data mining is a linear type of project rather than a cyclical one. For example, a data mining system could seem to be a failure from the business point of view but be successful when measured by statistical metrics. In this scenario, the statistical metrics can help diagnose the problem.
i. Potential Solution: I would suggest using an add-in statistical software package to compute the Variance Inflation Factor (VIF) for the numerical variables, to detect whether a particular variable's collinearity with the other inputs is having an undue influence on your model. I would also suggest measuring the Type II error rate (false negatives) to assess the predictive qualities of your model (i.e., what your model is actually measuring).
8. Design of Experiments: In a research environment, projects are cyclical. Creating a successful data mining system requires research, and research requires experimentation. This is one of the areas where business and science seem to conflict: science expects that some experiments will not be successful, while businesses tend to be risk-averse. Nevertheless, businesses constantly take risks to improve their profitability and growth. Therefore, it is not that businesses do not take risks, but that they want to be able to quantify and qualify those risks.
a. When science and technology join in the business arena some compromises must be made that are beneficial for all the parties. Science and technology cannot operate within a “pure science” mentality, and businesses must face the inherent risks head-on.
i. Potential Solution: If you have invested in the SQL 2005 Server and its software you have already incurred in part of the financial risk. The issue then becomes do you want to use the server as a simple storage facility or are you willing to make an additional investment (i.e., training, knowledge transfer, or consultant) to use the full potential of this tool. I would suggest stating by having two people in your staff go thru all the online tutorials (no shortcuts) and then let them try to successfully create and deploy a test analysis services project using limited data. This process will bring out a series of unanswered questions, and those questions will help you plan the options that you have to acquire the knowledge that you need to design, test, and implement a data mining system. Also, change the name design of experiments to design of research since it will make it more easily understood by others within the enterprise.
9. 9. Quality Control and Testing: Design multiple quality control staging areas during the process. Although Microsoft has made this product in such a way that it writes about ninety percent (90%) of all the code you will find that you need to make small modifications and wrap coding sometimes. Also, optimization of the processes will take place if you need to create specific variables like Z-scores. Lastly, you will find resistance to changes within the IT and Operational organizations of the enterprise to making any changes of the current processes. This resistance to change will require a specific change management strategy
a. The potential of a successful data mining system in an enterprise tends to create apprehension. This apprehension is rooted in the mistaken belief that if a data mining system is successful it will result in people losing their jobs. In the work place nothing is as personal as the instability that a potential change could bring if the perception for managers and staff alike is that their jobs might be in jeopardy if this new system is successful. The implications for QC and testing are immeasurable in terms of utilization of productive time.
i. Potential Solution: The development of realistic communication, quality control, and testing plans as part of the initial executive management evaluation whether or not to design a data mining enterprise system is a must. This plan should include goals and expectations at all levels of the enterprise. Specifically, the communication plan should address the issue of the potential changes in duties and responsibilities of staff and managers. It is a lot easier for managers and staff to buy into this type of strategic initiative if they see the role that they will play during the different stages of an enterprise data mining project.
1 10. Expectations: The term “high but realistic expectations” does not need to be a contradiction. The high expectations refer to the ability of the enterprise to learn to effectively use the tools at its disposal. Realistic expectations refers to the increase in value that a data mining system should give you based on your prior experience in growing and developing the business. Also, expectations should be directly correlated to the investment in training and dedicated resources to a data mining project.
a. Some organizations tend to go to their IT department and ask them to build them a data mining or analytics enterprise system that will solve their business issues, as well as all the pressing world issues. The CIO, CTO, or VP of Technology sometime do not have the knowledge required to explain that a more specific approach is required in data mining. Hence, the failure of data mining projects is the failure to properly plan having high but realistic expectations:
i. Potential Solution: Microsoft has made a product that streamlines much of the design and development of a data mining system, but the keys are specific planning, knowledge transfer from the business areas to the IT department, and a clear definition of the specific business needs. Putting this together takes time and effort. The first step is to develop a plan that accounts for the resources and training needed to use this new tool. This plan should serve as a roadmap for how we are going to implement the initiative.
I hope that you can use some or all of these best practices in using SQL Server 2005 Analysis Services to create an enterprise analytics or data mining system. The barriers are technical, scientific, and organizational (change management). The potential is immeasurable. Contact: alberto_roldan_2001@yahoo.com
Monday, February 19, 2007
Predictive Modeling and Microsoft Analysis Services 2005
This product is scalable (we use it on more than seven terabytes of data every month) and user-friendly. It integrates fairly simply with Reporting Services.
The key to using Analysis Services for a supervised model is the training sample. My main recommendation is to bring all of your data tags into the training sample. To size the training sample, multiply the number of data tags by five; the tags will then make up 20% of the training population.
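The sizing rule above is simple arithmetic: with T tagged records, the training sample holds 5T records, so the tags are T / 5T = 20% of it. The sketch below assumes that "data tags" means the records carrying the target tag (for example, known responders) and that the remaining 80% is a simple random draw of untagged records. The function names, the seed, and the random-fill strategy are illustrative assumptions, not Analysis Services features.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Rule of thumb from the text: sample size = tagged records x 5,
# so tagged records make up 1/5 = 20% of the training sample.
SAMPLE_MULTIPLIER = 5


def training_sample_size(n_tagged: int, multiplier: int = SAMPLE_MULTIPLIER) -> int:
    """Total training-sample size implied by the number of tagged records."""
    return n_tagged * multiplier


def build_training_sample(
    tagged: Sequence[T],
    untagged: Sequence[T],
    multiplier: int = SAMPLE_MULTIPLIER,
    seed: int = 0,
) -> List[T]:
    """Keep every tagged record; fill the rest with a random draw of untagged ones.

    Hypothetical helper: the random fill is an assumption, not the author's method.
    """
    n_fill = training_sample_size(len(tagged), multiplier) - len(tagged)
    if n_fill > len(untagged):
        raise ValueError("not enough untagged records to reach the target sample size")
    rng = random.Random(seed)
    sample = list(tagged) + rng.sample(list(untagged), n_fill)
    rng.shuffle(sample)  # avoid all tagged rows sitting at the front
    return sample
```

For example, 2,000 tagged records imply a 10,000-record training sample that contains 8,000 untagged records.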
Another key issue is tuning the algorithm parameters, specifically the maximum states. To determine the maximum number of states in your data, I suggest combining partition and distribution analyses. You can also use the Microsoft Decision Trees algorithm for this purpose.
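A minimal sketch of the distribution half of that analysis: count how many distinct states each input has, and how many of its most frequent states are needed to cover a chosen share of the rows (95% here). That coverage-based count is one candidate value for the maximum states setting. The 95% threshold and the helper names are assumptions for illustration, not Analysis Services defaults.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def states_needed(values: Iterable[Hashable], coverage: float = 0.95) -> int:
    """Smallest number of most-frequent states whose rows cover `coverage` of the data."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0
    covered = 0
    for k, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= coverage:
            return k
    return len(counts)  # only reached if coverage > 1.0


def state_profile(rows: List[Dict[str, Hashable]], coverage: float = 0.95) -> Dict[str, Dict[str, int]]:
    """Distinct-state count and coverage-based state count for every column (hypothetical helper)."""
    columns = rows[0].keys() if rows else []
    profile = {}
    for col in columns:
        col_values = [r[col] for r in rows]
        profile[col] = {
            "distinct_states": len(set(col_values)),
            "states_for_coverage": states_needed(col_values, coverage),
        }
    return profile
```

A column whose distinct-state count is far above its coverage-based count has a long tail of rare states; that gap is where a lower maximum states setting is worth testing.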
David did a great job with the data mining algorithms, but those of us who have been in the data mining industry for a long time need more detailed, peer-reviewed articles about the algorithms. For example, the Predict and PredictProbability functions return negative values, which should be a mathematical impossibility in an unsupervised model. Even after we filter out all the negative inputs, we still get negative output. I suspect a data type issue, but we are still researching it.
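Until the cause is found, a cheap guard is to check every scored probability before it reaches a report. A minimal sketch, assuming the values returned by PredictProbability have been exported as (case ID, value) pairs; the function name and input shape are hypothetical. It only checks probabilities, since a value returned by Predict for a continuous target can legitimately be negative.

```python
import math
from typing import Iterable, List, Tuple


def out_of_range_probabilities(scores: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Return (case_id, value) pairs whose value is not a valid probability."""
    bad = []
    for case_id, p in scores:
        # A probability must be a finite number between 0 and 1 inclusive;
        # NaN and infinities are flagged along with negative values and values above 1.
        if not (isinstance(p, (int, float)) and math.isfinite(p) and 0.0 <= p <= 1.0):
            bad.append((case_id, p))
    return bad
```

Logging the flagged case IDs alongside their input data types may help confirm or rule out the data type explanation.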
Another issue the algorithms do not address is whether any variable or input is improperly influencing the predictive output. Specifically, I would prefer that the models report the variance inflation factor (VIF) for each input. Otherwise, we may find ourselves in one of those situations that are "too good to be true."
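For reference, the VIF for input j is 1 / (1 - R_j^2), where R_j^2 is the R-squared from regressing input j on all the other inputs; values above roughly 5 to 10 are a common rule-of-thumb warning sign of multicollinearity. Since the models do not report it, the sketch below computes it outside Analysis Services with NumPy, assuming the inputs are already numeric (categorical inputs would need encoding first). The function name is hypothetical.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Design matrix: an intercept plus every other input column.
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        if ss_tot == 0.0:
            vifs[j] = np.nan  # constant column: VIF is undefined
            continue
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs
```

An input with a very large VIF is largely a restatement of other inputs, which is one way a model ends up looking "too good to be true."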
The last issue is that the Type II error rate is extremely high in these models when we apply the model built on the training set to the entire population. Specifically, I am referring to Type II error rates greater than 60%!!!
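A Type II error is a false negative, so the Type II error rate is FN / (FN + TP): the share of actual positives the model missed. A minimal sketch for computing it when scoring the full population, assuming a binary target whose positive label is 1 and assuming the 60% figure is measured against actual positives; the helper names are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def confusion_counts(
    actual: Sequence[Hashable], predicted: Sequence[Hashable], positive: Hashable = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a binary target."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must be the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p == positive:
            fp += 1
        elif a == positive:
            fn += 1  # actual positive the model missed: a Type II error
        else:
            tn += 1
    return tp, fp, fn, tn


def type_ii_error_rate(false_negatives: int, true_positives: int) -> float:
    """Share of actual positives the model missed: FN / (FN + TP)."""
    actual_positives = false_negatives + true_positives
    if actual_positives == 0:
        raise ValueError("no actual positives in the scored population")
    return false_negatives / actual_positives
```

With 5 actual positives of which the model catches only 2, the rate is 3 / 5 = 0.6, which is the 60% level described above.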
Microsoft, through Jamie's group, is providing us with great technical support, and I want to congratulate them on their efforts.
About Me
- alberto
- See my resume at: https://docs.google.com/document/d/1-IonTpDtAgZyp3Pz5GqTJ5NjY0PhvCfJsYAfL1rX8KU/edit?hl=en_USid=1gr_s5GAMafHRjwGbDG_sTWpsl3zybGrvu12il5lRaEw