Friday, April 04, 2008

Computer-aided Detection of Lung Cancer from Computed Tomography Images



Good technique to separate the clusters and clarify the driving factors.

Thursday, April 03, 2008

Sr .Net Architect



I usually do not do this, but a friend of mine is looking for a Sr .Net Architect in the Nashville, TN area. It can pay up to $120/hr.

Wednesday, April 02, 2008

Conference or Seminar on Data Mining and Visualization Tools



A member of the Business Analytics Group wants to know if anyone knows a conference or seminar that will cover the issue of data mining and visualization tools. If anyone has any information please let me know.
Alberto

More data usually beats better algorithms



This article pinpoints something that has been true for a long time: more data usually beats better algorithms. Therefore, assuming that the data mining algorithms are not the issue (assuming good science behind them, which I have found in all the major software vendors), the issue then becomes the quality of the interactive visualization tool that allows end-users to make better decisions. Fed Chairman Bernanke, when at Princeton, published a paper that is complementary to this issue.

Tuesday, April 01, 2008

Business Analytics is Evolving Says SAS



Good article explaining the definition and value of business analytics.

Schlegel on Search, Analytics and Visualization



Gartner's Schlegel on next BI industry consolidation trend: predictive analytics and visualization tools.

People would rather live with a problem they cannot solve than accept a solution they cannot understand



This quote by Robert Woolsey is the main business analytics issue. The science and the technology now allow predictive analytics and data mining to be available to every business regardless of size and complexity of the problem. The issue is how to explain the solution to organizations. I believe that an interactive visualization tool that mirrors the goals and strategy of an organization is the key to making businesses understand and embrace the solution. A picture speaks a thousand words!

Saturday, March 29, 2008

Predictive Analysis with SQL Server 2008



This is a good article. In the interest of full disclosure, I am familiar with Donald Farmer, Jamie MacLennan, Kamal Hathi and some of the individuals of the SQL data mining team within Microsoft. Also, I believe in the underlying strategic principle that making predictive analytics available to all users within an enterprise is the keystone of turning data into actionable information for better decision-making. I know that the underlying predictive algorithms are based on good science and technology (as are the products of SAS, SPSS, Business Objects, and many other vendors).

Two issues that I would like to explore with this product:
1. The interactive visualization of Analysis Services with Excel 2007. I believe that this is the key for data mining for the masses; and
2. The creation of industry-specific modules that allow for easier design of a multidimensional vector that is industry-specific.

Family Study Associates Pesticide Use With Parkinson's Risk



This article shows the type of pattern that made me go into data mining. My father was a pure scientist (chemistry and mathematics) who dealt with pesticides in his research. The entire laboratory staff (scientists and lab workers) all died of Parkinson's disease. The only thing these people had in common was working with pesticides. These were very careful people who took extraordinary precautions in dealing with pesticides because they were aware of their harmful effects (even at the cellular level over a long period of time). My goal is to create a data mining visualization and predictive tool that can be used by as many people as possible to detect this type of pattern. Hopefully, with many people (experts as well as laypeople) looking at data patterns, we will be able to solve some of the serious problems we face as a society.

Friday, March 28, 2008

TowerGroup: Losses Resulting from Mortgage Fraud in U.S. Will Reach $2.5 Billion in 2008



For U.S. companies, mortgage fraud software can be provided as software as a service with analysts offshore. This will bring a cost-efficient and robust business analytics solution to this problem for financial institutions. As the Federal Reserve Board discusses the new oversight requirements for investment banks that are borrowing money from the Fed, a robust fraud prevention program could become mandatory.

Thursday, March 27, 2008

Data Mining for Disease Management: Adding Value to Patient Records



This is a good article. I still question the underlying premise within the medical community that expects "perfect data" to equal perfect results. Data mining is about finding patterns (known and unknown) in the data that would allow imperfect people to make better decisions.

Wednesday, March 26, 2008

Software as a Service overview



This is a good overview video of software as a service (SaaS) if you need to explain the concept to others. In the case of SaaS analytics, the issue is to create modules by industry. For example, I have worked on modules for payers, providers, pharmaceutical and DME companies. I am currently working on modules for the financial industry: banks, investment banks, brokerage companies, and insurance companies. I am also working on modules for supply chain and biotechnology. Is anyone else working on SaaS modules by industry?

Tuesday, March 25, 2008

The Financial Market Crisis and Risks for Latin America



The purpose of this article is to give an example of the variables used in international credit risk exposure due to the changing world financial markets. We know that financial institutions in the U.S. and Europe have been negatively impacted, but banks in China and Latin America have a lower exposure because their main growth is organic, in their own domestic markets.

Therapeutic Cloning Works in Mice With Parkinson's



This is the type of discovery on which a data mining system can have a direct impact.

Why good companies go bad



This is a classic article that, in these changing economic times, could be a good reminder to companies that change could mean doing something different instead of just doing the same things a company has done in the past, only more or faster. An enterprise analytics solution could help executives determine whether doing more of the same is helping their companies or getting them into a bigger rut.

Monday, March 24, 2008

Enterprise Analytics: A Business Decision

The last three years have seen advances in efficient data mining algorithms, along with computing advances that allow software companies to provide powerful analytical tools that were not available to the business community some years ago. Whether we refer to this type of software as data mining, predictive analytics, business intelligence, or analytics (web or business), its purpose is to efficiently detect patterns in large datasets that can lead to increased revenues or lower costs. The purpose of this article is to give businesses a general framework for evaluating the different analytics solutions available in the marketplace.

First, let us draw the distinction between an analytical tool and an analytical solution. Most companies need an analytical solution instead of an analytical tool. Software vendors are in the business of selling analytical tools (SAS, Microsoft, Oracle, Business Objects, and SPSS). All these companies are using state-of-the-art science and technology to create powerful analytical tools. A hammer, a saw, and nails in the hands of a master carpenter can create a beautiful house. Those same tools in somebody else's hands are something that you put in the garage. Therefore, the question is: do you need an analytic tool or do you need an analytic solution? If your IT staff has not built an advanced analytical decision support system in the past, what you need is a solution. If they have this experience, then what you might need is a new tool. One way to tell the difference is whether the sum and the average (mean) are the most common measurements used in your organization. If this is the case, you need an analytics solution and not a new tool.

Most carpenters will tell you that some people have all the tools needed to build a house in their garage. Therefore, before going out and buying new tools, you need to know: 1. What do you want to build? 2. What skills do you have in designing and building an analytical system? 3. What tools do you already have in your IT department?

Alignment with specific business objectives
What do you want to build? The answer to this question is that you want an analytical solution which is aligned with your organization's strategic and operational objectives. This is one of the keystones of an analytical solution: it must match strategic and operational objectives. The enterprise's strategic objectives represent the schematics or floor plan of your analytical solution. The enterprise's operational objectives represent the plumbing and the electrical systems of your analytical solution.

An analytical solution must ultimately represent the vision of executive management, while simultaneously being efficient to operate for those responsible for day-to-day operations. We know that a CEO can write his strategic vision even on a napkin, and the role of his team should be to translate this vision into a series of strategic and operational goals. Hence the importance of having the overall strategic design of an analytics solution start at the CEO and executive management team level. The decision to build an enterprise analytics system is an executive decision. A CEO thinks in terms of analytic solutions to business problems, while the IT department's tendency is to think in terms of tools.

An experienced analytics architect should be able to turn the executives' input into an analytics solution prototype with some additional research and input from operations. This prototype should be detailed and must be approved by executive management before proceeding with specific business requirements and technical design. See http://www.youtube.com/watch?v=bVmLfCajDjI.

Encompass all relevant data
Your data is the equivalent of your raw materials (wood, stone, bricks, pipes, flooring, carpets, and drywall). Since not all materials are the same, the analytics solution architect and the lead software developers need to have the proper experience with large, diverse and complex datasets. Not all houses are the same, but the materials used are mostly the same. The same logic applies to the design and building of an analytical solution. It is important not to confuse the quality of a tool with the quality of the materials. The tools can make a job easier for the builder, but it is the quality of the materials in the hands of an expert builder that is going to turn an average house into an outstanding home.

The skill level of technical personnel varies and is as diverse as the number of home builders in the country. Some IT departments are made up of technical staff that can maintain current operations. Other IT departments have a specialized area for development, and others have fully functional project management offices (PMO). Very seldom do IT departments have the integrated analytics expertise (statisticians and actuaries) and specialized analytics software developers (data cleansing, OLAP, and interactive visualization) that would allow the creation of advanced analytics solutions.

Flexibility of use for different business users
A robust analytical solution needs to be flexible for the different business users within an organization, and be adaptable to future analytical needs. Some users are interested in strategic objectives and others in operational objectives. An enterprise analytics solution should meet both strategic and operational demands. A keystone of an analytical solution is that it should allow different business users to have access to it. A business problem may have different potential solutions, and an analytical solution should take advantage of different perspectives from different users (human interaction) to identify them.

Also, it should have the capacity to accommodate future growth and changing business needs. The current business analytics requirements of an enterprise may be different from the analytical needs of the future. An analytics solution needs to be designed with the capacity to grow, and the flexibility to accommodate unforeseen business issues.

In conclusion, a business analytics solution should cover the fundamentals:
1. Emphasize the alignment of the strategic and operational objectives to the analytic solution instead of the analytical tool;
2. Make sure that you have the correct materials (staff and data) to build your solution; and
3. Design a flexible analytical solution that could be used by strategic and operational users.

Sunday, March 23, 2008

Human Computer Interaction

This is the site for the Human Computer Interaction at the University of Konstanz in Germany. If you want to know about best practices in interactions and visualization techniques, see http://hci.uni-konstanz.de/index.php?a=teaching&b=corner&c=15851838&lang=en

Saturday, March 22, 2008

Interaction Techniques for High Resolution Displays

A video of interactive visualization. Imagine analyzing data using these types of techniques, available to everybody in an organization.

Friday, March 21, 2008

Real Time Scalable Visual Analysis on Mobile Devices

This is the next logical step: integrating analytics visualization with mobile devices.

Gartner: Emerging Technologies Will Help Drive Mainstream BI Adoption

Gartner's analysis is 100% on target. Business intelligence is more than creating an enterprise data warehouse; it is about transforming data into actionable information by allowing all individuals within an organization to make decisions based on business analytics.

Mid-Market Insurance Provider Selects Blink Logic for SaaS BI Solution

Two main reasons that I am posting this article:
1. It illustrates that analytics can be provided using a software as a service (SaaS) business model; and
2. It brings home the point that business analytics is not only for highly-skilled analysts, but available to the decision-makers as well.
This is the future of analytics!

Thursday, March 20, 2008

Human Factors In Visualization Research

It is the interaction of computing capabilities and the capabilities of the human mind that allows for evidence-based decision making in situations involving large sets of data. Therefore, in order to make evidence-based decisions, data must be summarized in a way that takes advantage of the incredible capabilities of the human mind. A good data mining visualization tool should be able to separate the clusters of data and clarify the driving factors by making the patterns or trends in the data clearly and cognitively recognizable.

This is a good article that emphasizes the cooperation between the end-users and the development of visualization tools for data mining and business analytics. The bottom line: make it simple and intuitive!
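To make "separate the clusters and clarify the driving factors" concrete, here is a minimal sketch in pure Python: a toy two-cluster k-means pass over invented customer data, where the resulting cluster memberships are what a visualization tool would then render. All data and names here are hypothetical.

```python
# Toy k-means with k=2: separate two clusters so a visualization
# can then clarify the driving factors behind each group.
# Assumes the data really does contain two non-empty groups.

def kmeans2(points, iters=20):
    # Seed centroids with the first and last of the sorted points.
    c0, c1 = points[0], points[-1]
    for _ in range(iters):
        a, b = [], []
        for p in points:
            d0 = sum((x - y) ** 2 for x, y in zip(p, c0))
            d1 = sum((x - y) ** 2 for x, y in zip(p, c1))
            (a if d0 <= d1 else b).append(p)
        c0 = tuple(sum(x) / len(a) for x in zip(*a))
        c1 = tuple(sum(x) / len(b) for x in zip(*b))
    return a, b

# Invented data with two obvious groups (e.g. low vs. high activity).
data = [(1, 2), (2, 1), (1, 1), (9, 10), (10, 9), (10, 10)]
low, high = kmeans2(sorted(data))
print(len(low), len(high))
```

In practice a real tool would run this over many dimensions and let the user inspect which variables drive the separation; the point of the sketch is only the two-step shape: cluster first, then explain.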

Wednesday, March 19, 2008

Visualization Projects for Data Mining

These are the current projects for the visualization group at the Lawrence Berkeley National Laboratory. One of the members of the Business Analytics Group expressed interest in the latest developments in network traffic analysis and cybersecurity (see, http://vis.lbl.gov/Vignettes/QDV-NetworkTraffic/qdv-vignette.html).
I would suggest spending 1-2 minutes on each project, just looking at the visualizations, and determining whether this is something that can be applied to an organization as part of your business analytics and predictive modeling. The visualization technology is available right now. The question is whether the business is in a position to accept these concepts as a way to look at large datasets. We know that a spreadsheet cannot capture the complexity of large datasets. You may want to start by proposing a simpler visualization, such as SAS, SPSS, Cognos, Business Objects, or Microsoft Analysis Services with Excel 2007. The key for these visualization tools is to train the business users in using them. My approach is to teach executive management first and find which tools they find most useful. The purpose of these visualization tools is to let the human brain look at the totality of the data and start discovering new trends and patterns.

Tuesday, March 18, 2008

SAS Advances Enterprise Intelligence

Without a visualization example these are just words.

Risk Management Confidence: Higher-ups More Conservative

My interpretation of this article: executives that use predictive modeling understand the complexity of the risks involved and are therefore less prone to take non-fact-based risks.

MADE IN IBM LABS: IBM Software Finds Hidden Product and Service Insight in Customer Interactions

I wish that when this type of news comes out, a small visualization example of the output would be published, since the visualization is what tells us whether a business analytics tool is efficient in assisting an end user in looking at the totality of the data by separating the clusters in the data and clarifying the driving factors.

Monday, March 17, 2008

Financial Markets Crisis: A Potential Solution

In the late 1980s I had the opportunity to participate in the discussions of what to do with billions of dollars in commercial bank debt for countries in Latin America. In a nutshell, commercial banks had acquired a very large exposure to debt from Latin American countries, and in reality most of that debt was worthless although the interest rates were very high. The issue was similar to today's subprime mortgage-backed securities: the lending institutions failed to follow fundamentally sound practices in their portfolios, and their exposure came to the point of affecting the world financial markets. I remember the 2nd International Conference on Debt and Trade, where the policy makers, financial institutions, and debt-ridden countries met. One of the solutions was to create markets for this exposure, and hence the birth of the Brady bonds. At the time, I was a proponent of a more drastic measure: to let the market forces decide the outcome of each financial institution according to its exposure. Over the years I have come to the conclusion that the Brady bond secondary market has substantially assisted in bringing order to the financial markets in a way that has not negatively affected Latin American countries.

My proposed solution to the current financial markets crisis is a three tier solution:
1. Allow a market for mortgage-backed securities in which central banks around the world could buy some of the exposures of large financial institutions. I would suggest up to 3% of the market capitalization of each large financial institution. This solution would allow the internationalization of the financial markets by giving central banks direct ownership in private financial institutions. I know that this is a radical solution, but the liquidity problems in the financial markets are of such magnitude that they require the intervention of central banks around the world in a different way, and at a different magnitude, than past remedies. A special class of stock could be created that would allow foreign central banks to participate while restricting voting rights, yet simultaneously allowing their input into investment practices.
2. Create a tax incentive for financial institutions through which they could lower the interest rate and/or the principal of the mortgage loan by a total of up to 3%, for properties with a mortgage value of up to $250,000. The total amount of the tax incentive could be capped at $2,500 per qualifying loan.
3. For properties with a mortgage value greater than $250,000, I would suggest letting the market forces dictate what happens to these properties and the financial institutions that made those investments.
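The arithmetic of tiers 2 and 3 can be sketched in a few lines; the interpretation below (incentive equals 3% of the mortgage, capped at $2,500, with larger mortgages left to the market) is my reading of the proposal, not a definitive rule:

```python
# Hypothetical incentive calculator for the three-tier proposal.
# Integer math keeps the 3% computation exact.

def incentive(mortgage, cap=2500):
    if mortgage > 250_000:
        return 0                      # tier 3: market forces decide
    return min(mortgage * 3 // 100, cap)  # tier 2: 3%, capped

print(incentive(200_000))  # 2500 (3% would be 6000, so the cap binds)
print(incentive(60_000))   # 1800
print(incentive(300_000))  # 0
```

Note that the cap binds for any qualifying mortgage above roughly $83,333, so in practice most qualifying loans would receive the flat $2,500.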

Friday, March 14, 2008

DATA MINING IN BANKING AND FINANCE: A NOTE FOR BANKERS

This article covers the fundamentals of data mining in financial markets and banking. The section on risk management (financial market risk and credit risk) is worth revisiting in today's changing financial markets. It is amazing how many times we need to go back to the fundamentals in the banking and finance industries to make sense of the corrections in the markets.

Business Analytics and Financial Markets Liquidity Issues

The news that Bear Stearns needs liquidity assistance from JP Morgan Chase and the Federal Reserve was predictable using off-the-shelf predictive analytics software. For those who are engaged in financial markets analytics, let me suggest that instead of using Bear Stearns as your sampling data (before applying it to the entire dataset), use the Carlyle Fund data as your market for predictions. This data is less noisy and would help you determine the ratio of mortgage-backed securities to total capital under management in order to assess liquidity risk.

Also, please look at the other variables (i.e., geography, diversification, currency hedges, etc.) to determine what constitutes a robust liquidity scenario. In other words, invert the scenario and analyze the driving factors. Let me suggest creating multidimensional vectors and combining them with cluster analysis so you can separate the clusters of data and clarify the driving factors. This is a great opportunity to bring potential solutions to the executives.
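The MBS-to-capital screen described above can be sketched very simply; all figures and fund names here are invented for illustration, and 0.5 is an arbitrary flag threshold, not an industry standard:

```python
# Hypothetical liquidity-risk screen: flag institutions whose ratio of
# mortgage-backed securities (MBS) to total capital under management
# exceeds a chosen threshold. Values in billions, all invented.

funds = {
    "Fund A": {"mbs": 20.0, "capital": 22.0},  # heavily concentrated
    "Fund B": {"mbs": 5.0,  "capital": 50.0},  # diversified
}

def liquidity_ratio(f):
    return f["mbs"] / f["capital"]

flagged = {name: round(liquidity_ratio(f), 2)
           for name, f in funds.items() if liquidity_ratio(f) > 0.5}
print(flagged)  # {'Fund A': 0.91}
```

A real analysis would add the other variables mentioned (geography, diversification, currency hedges) as further dimensions of the vector before clustering.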

A technical framework for sense-and-respond business management

This paper from the IBM Labs presents a good technical framework for the processes that could be needed to achieve change management when introducing predictive analytics (phrase taken from James Taylor) in a rapidly changing environment. I am not suggesting that this process is exclusive or innovative, but I am suggesting that a process is indeed needed for change management to take place (if you fail to plan, you plan to fail). What other processes, or components of a process, are needed to introduce predictive analytics into a rapidly changing environment? If you take into account the changing nature of the world economy and how it is affecting organizations, what processes need to be in place to introduce predictive analytics into an organization? We know that it is not a matter of the science and technology, since those are already in place.

Thursday, March 13, 2008

Emerging Trends in Business Analytics

I specifically like this article's emphasis on and analysis of business users as they relate to business analytics. The emphasis on solving business problems (not technical problems) and the ability to measure results are great points too. This brings me to the topic of change management and metrics. A colleague of mine, Mitch Weisberg, is the expert in this area: how to have the right metrics and the right processes that will incorporate advanced business analytics into the fabric of an organization.

Wednesday, March 12, 2008

Top 10 Unsolved Information Visualization Problems

A well thought out paper about the issues with visualization of information. I believe that the issue has been defined by a colleague, Edith Ohri, when she told me that we (those with the expertise) have the responsibility to explain the issues to the executives in such a way that they can understand. Her definition of multidimensional vector analysis is beautiful: separating the clusters of data while clarifying the driving factors.

Cognos and SPSS Forge Partnership to Deliver Predictive Analytics Integration

This is big news. This moves Cognos from a reporting tool to a predictive analytics visualization tool, and it brings IBM and SPSS into direct competition with Microsoft, SAP, and SAS.

Tuesday, March 11, 2008

Information Visualization and Visual Data Mining

A little thick, but extremely accurate. As companies prepare to provide analytics as software as a service (SaaS), they must be aware of the interaction between data mining visualization and the human component. Data mining visualization is the next evolutionary step in utilizing the full capabilities of the human mind.

Monday, March 10, 2008

Business intelligence on demand

The next generation of predictive analytics is here!
Congratulations on a job well done!
Software as a Service (SaaS) in predictive analytics!

Saturday, March 08, 2008

Why Don't American CIOs Want to Lead In Emerging Technologies?

Read this article. My answers to the question: timidity; a lack of understanding of what it means to be an executive; the fact that some people are followers and some are leaders; and that it is the safe route.

Thursday, March 06, 2008

Why Business Analytics as a Service Won’t Spook IT

This is a very good and insightful article. Business Analytics as a Service (SaaS) is where the market is moving. The main reason is that CIOs and business users do not want the details of advanced analytics; they just want a tool that works. Last week I was talking with a couple of IT executives from different large companies, and they are looking for something "actionable". SaaS meets this requirement.

My only difference with the authors is that when people talk about "Excel hell" they do not know the capabilities that SQL Server 2005 Analysis Services and Excel 2007 can bring to an organization in terms of advanced analytics. I can see the combination of these two products becoming the first successful SaaS product for large enterprises as well as small businesses.

Saturday, March 01, 2008

Visualization of Data Mining

I have found that using a cluster analysis visualization that includes a multidimensional vector is an efficient way to cut the cost of predictive modeling while simultaneously making data mining available to everyone within an organization. The advantage of this process is that it allows everyone to see large data sets in such a way that they can draw their own conclusions even without subject matter expertise.
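The first step behind such a visualization is building the multidimensional vector itself: turning heterogeneous record fields into comparable numeric dimensions. A minimal sketch, with invented field names and values, using simple min-max normalization:

```python
# Build normalized multidimensional vectors from raw records so they
# can be fed to cluster analysis and plotted on common 0-1 axes.
# Records and field names are hypothetical.

records = [
    {"revenue": 100, "visits": 3},
    {"revenue": 400, "visits": 9},
    {"revenue": 250, "visits": 6},
]

def to_vectors(rows, keys):
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [tuple((r[k] - lo[k]) / (hi[k] - lo[k]) for k in keys)
            for r in rows]

vecs = to_vectors(records, ["revenue", "visits"])
print(vecs)  # [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
```

Min-max scaling is only one choice; z-scores or rank transforms may suit other data, but the point is that every dimension must be on a comparable scale before distances between vectors mean anything.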

Monday, February 04, 2008

Microsoft and Yahoo

I personally think that Microsoft's proposed acquisition of Yahoo is a good idea. Specifically, from my little part of the universe (data mining, predictive modeling, and analytics), Microsoft has powerful tools in SQL Server 2005 Analysis Services and Excel 2007 when they are used together. Microsoft and Yahoo have mature, first-class research laboratories. I believe that this type of acquisition will benefit the consumer in healthcare and other industries.

Saturday, February 02, 2008

Preparing Your Company For Recession

This article on CFO.com clearly indicates the need for companies to act quickly and smartly during a recession. It is important to look at the Hackett Group paper and see that it identifies forecasting and analytics as one of the three main areas to look at during this time that could increase the efficiency of operations. I specifically agree with the recommendation of using predictive modeling for revenue and costs.

Saturday, September 22, 2007

Mobile Business Intelligence - the Next Big Step

Cognos has taken the lead in the area of mobile business intelligence. This is a huge step!

Wednesday, September 19, 2007

Duke Plots Course Beyond the Smart Grid

This is one of the most forward-thinking business intelligence projects in the world. Duke Energy is taking the steps to create a smart power grid. The creation of the intelligent real-time applications behind this concept will revolutionize the world.

Friday, September 14, 2007

VP, Decision Support Systems

I was contacted for a position as a VP, Decision Support Systems, in New York with a prestigious financial institution. Although I am not interested, some of you may be interested in this position. If you are, contact Heinz Bartesh at heinz@pcninc.com

Wednesday, September 12, 2007

Market Forecasting and Modeling for the Power System of the Future

This paper addresses the utilization of predictive modeling and forecasting in the power supply industry. The issues herein were identified a couple of years ago, but the implementation is occurring at this time. The challenge of a forecasting system in power supply is the many "what if" scenarios that different models will need to consider. These "what if" scenarios need to take into consideration:
1. physical assets
2. contract prices
3. economic forecasting

Modeling at this size and complexity could require a combination of most of the tools and methodologies in the current data mining and predictive modeling market, plus the development of some new tools.
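The scenario explosion described above comes from crossing the three inputs listed. A minimal sketch of that enumeration, with invented placeholder values and a toy revenue formula:

```python
# Enumerate "what if" scenarios by crossing physical assets,
# contract prices, and economic forecasts. All numbers and the
# revenue formula are illustrative placeholders.
import itertools

asset_capacity = [100, 120]       # MW available (physical assets)
contract_price = [40, 55]         # $/MWh (contract prices)
demand_growth  = [0.01, 0.03]     # annual growth (economic forecasting)

scenarios = [
    {"capacity": c, "price": p, "growth": g,
     "revenue": c * p * (1 + g)}
    for c, p, g in itertools.product(asset_capacity,
                                     contract_price,
                                     demand_growth)
]
print(len(scenarios))  # 8
```

With realistic numbers of levels per variable the cross product grows multiplicatively, which is exactly why these forecasting systems need a combination of modeling tools rather than a single one.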

Predictive Planning for Supply Chain Management

This paper shows one methodology of using predictive modeling in planning and scheduling decisions in supply chain management. It is important to remember that the variables will be different depending on the industry and client-specific requirements.

Tuesday, September 11, 2007

F.B.I. Data Mining Reached Beyond Initial Targets

It seems that the definition of "community of interest" association will be the cluster results from I2 Notebook. I have used this tool many times and the results are impressive. If you are using this tool, you may consider using additional analyses (logistic regression and decision trees) to further refine your results.
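The refinement idea can be illustrated with a toy decision-tree-style rule: after clustering flags a group, a single split (a decision "stump") chosen to minimize misclassifications can confirm which variable separates the flagged community. The data below is invented:

```python
# Toy refinement of cluster labels with a one-variable decision stump.
# Each row is (calls_per_week, flagged_by_cluster); data is invented.

calls = [
    (2, 0), (3, 0), (4, 0), (18, 1), (22, 1), (30, 1),
]

def best_split(rows):
    # Try each observed value as a threshold; keep the one with the
    # fewest misclassifications for the rule "x >= threshold -> flagged".
    best = (None, len(rows))
    for threshold, _ in rows:
        errs = sum((x >= threshold) != bool(y) for x, y in rows)
        if errs < best[1]:
            best = (threshold, errs)
    return best

print(best_split(calls))  # (18, 0): x >= 18 separates perfectly
```

A full decision tree or logistic regression does the same job over many variables at once; the stump just shows the shape of the check.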

Monday, September 03, 2007

Frequent Doesn’t Mean Loyal: Using Segmentation Marketing to Build Shopper Loyalty

This is a classic article regarding the theory of how to translate customer loyalty into a "profitable differentiation".

Data Mining Analysis and Modeling for Marketing Based on Attributes of Customer Relationship

This article on data mining in CRM for the retail industry shows the utilization of cluster analysis, association rules, and linear regression in determining attributes of customer relationship (ACR).

SUPERVISORY [BANK] RISK ASSESSMENT AND EARLY WARNING SYSTEMS

This 2000 paper from the Bank for International Settlements gives a good overall picture of the statistical models used to analyze and determine risk assessment in the banking industry. The mortgage industry and associated lenders and market leaders should consider implementing these early warning system models to prevent the current mortgage crisis from repeating itself in other areas.

Data Mining Applications in Higher Education

Good article about data mining applications in higher education.

Data Mining Technologies and Decision Support Systems for Business and Scientific Applications

The issue is no longer whether you have the information or data, since a lot of companies and organizations have large amounts of data. "The challenge is to be able to utilize the available information, to gain a better understanding of the past, and predict or influence the future through better decision-making."

Integrating Customer Value Considerations into Predictive Modeling

This is a good article about how to measure "success" in applied predictive modeling. The example is from the telecommunications industry, but the "valuable customer" approach can be used in any industry.

Thursday, August 30, 2007

Predictive Analytics and Data Mining

Excellent article by David in terms of the utilization of data mining and predictive modeling concepts. I believe that expectations and corporate strategy are not properly aligned in this area. Data mining, predictive modeling, and business intelligence give an enterprise the opportunity to build a decision support system that is the marriage of the best that technology and science have to offer. It does not replace intuition, but augments it. The best way that I can describe this enterprise system is:
1. A robust back end to handle large amounts of diverse and complex data;
2. Creation of client, industry, and business problem variables that can assist in determining patterns in the data;
3. Utilization of multiple data mining or predictive modeling algorithms to classify the data; and
4. Utilization of statistical techniques to help forecast, partition, or determine areas with common patterns.
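Conceptually, steps 2 through 4 above can be sketched in a few lines of Python. Everything here is invented purely for illustration (the records, the derived variable, and the two scoring rules); it is a sketch of the flow, not a production model:

```python
# Hypothetical sketch of steps 2-4: derive variables from raw records,
# score them with more than one simple model, then partition the results
# into common-pattern groups. All data and rules are invented.
from statistics import mean

records = [
    {"claims": 12, "avg_cost": 430.0},
    {"claims": 2,  "avg_cost": 80.0},
    {"claims": 9,  "avg_cost": 510.0},
    {"claims": 1,  "avg_cost": 95.0},
]

# Step 2: create problem-specific variables (here, a utilization index).
for r in records:
    r["util_index"] = r["claims"] * r["avg_cost"]

# Step 3: apply more than one model; here, two toy scoring rules.
def model_a(r):  # frequency-driven score
    return r["claims"] / 12.0

def model_b(r):  # cost-driven score, capped at 1.0
    return min(r["avg_cost"] / 500.0, 1.0)

for r in records:
    r["score"] = mean([model_a(r), model_b(r)])

# Step 4: partition into high/low pattern groups around the mean score.
cutoff = mean(r["score"] for r in records)
groups = {"high": [r for r in records if r["score"] >= cutoff],
          "low":  [r for r in records if r["score"] < cutoff]}
print(len(groups["high"]), len(groups["low"]))
```

In a real enterprise system the toy scoring rules would be replaced by actual mining algorithms, but the shape of the pipeline stays the same.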



On the Advantages and Disadvantages of BI Search

Stephen has written an easy-to-read article on the challenges for next-generation BI. Let me add that text-mining technologies are constantly improving. We have seen it with Yahoo and Google and their association algorithms when you start typing in the search bar. As PCs, laptops, and portable handheld devices become embedded with intelligent agents, we will start seeing the future unfolding right before our eyes. At the same time, you will see servers with the capacity to analyze the information from the intelligent agents. This is exciting!

Tuesday, August 21, 2007

Paper Kills: Transforming Health and Healthcare with Information Technology

If you have a role in healthcare strategy or data mining, this book is a must read. It features thought leaders like Dr. Brandon Savage at GE Healthcare. Once medical records are transformed into digital form, data mining and healthcare analytics will be at the core of the vision for the future of healthcare in the US. Hence, what we are working on today will be one of the building blocks of this vision.

Monday, August 20, 2007

Donald Farmer on Data Mining

Donald and his team at Microsoft are first-class professionals in data mining. If you have not visited Donald's blog, I recommend you do so.

Donald's blog: http://www.beyeblogs.com/donaldfarmer/

Look at his data visualization music video link! http://www.youtube.com/watch?v=KHEIvF1U4PM

Thursday, August 16, 2007

Technology: Is Data Mining Misguided?

When I read this article I see clear confusion regarding the expectations of data mining technologies and how they should interact with statistical methodologies. The purpose of data mining should be to create a classification (think of a list of items going in a particular order: 1, 2, 3, 4, 5...). This classification is based on a value that is expressed as a probability. Once you have a good measurement tool (this is what data mining should do for you), you then apply statistical techniques (distribution, clustering, cause-and-effect analysis, correlation) to determine the areas that should "group" together (using relevant discrete and numerical variables, including but not limited to the data mining value obtained). Once you have determined the areas you want to study, you then use the data mining value (and other variables) and statistical methods to make your recommendations. Again, the process is: 1. variables, 2. data mining models, 3. determination of areas of classification, 4. statistical methods, and 5. recommendations.
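The process can be made concrete with a tiny Python sketch. The cases and probabilities are invented and the bands are arbitrary; the point is the order of operations: model value first, then statistical grouping, then recommendations:

```python
# Invented example: a model emits a probability for each case, cases are
# ranked into an ordered classification, and a simple statistical
# grouping is then applied per band.
probs = {"case1": 0.91, "case2": 0.15, "case3": 0.67,
         "case4": 0.08, "case5": 0.72}

# Order the cases by the data mining value (highest probability first).
ranked = sorted(probs, key=probs.get, reverse=True)

# Apply a simple grouping rule; the cutoffs here are arbitrary.
def band(p):
    return "high" if p >= 0.7 else "medium" if p >= 0.3 else "low"

groups = {}
for case in ranked:
    groups.setdefault(band(probs[case]), []).append(case)

print(ranked[0], groups["high"])
```

From here, the recommendations step would look at each band with whatever statistical methods and business variables are relevant.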

The change management challenge is to get users of data mining to understand that it is a process, and that for it to work you need to invest resources (mostly time and technology).

Wednesday, August 15, 2007

Google, Microsoft and the glacial healthcare revolution

Good article on ZDNet that explains how Microsoft and Google are competing in their strategic initiatives in the healthcare industry. I believe that the main issue is how to effectively aggregate and find value in the vast amount of healthcare data. I think that the solution is going to be a combination of predictive modeling, data mining, powerful servers, and artificial intelligence tools connected through the Internet. I am honored to be a participant in this effort.

Thursday, August 02, 2007

Korean stem cell fraud masked a true advance

The stem cell fraud case in Korea shows how scientific fraud can actually hold back progress. If Dr. Hwang had been careful in his methodology and reporting, he would still be considered a reputable scientist. Lesson to be learned: be careful in your methodology, and even more careful in your reporting of findings.

Monday, July 30, 2007

Genetic breakthrough in multiple sclerosis -- biggest for decades

This is what data mining and predictive modeling are all about: a tool for subject matter experts to identify "new suspects". Once predictive modeling helps identify new suspects, the subject matter experts apply their knowledge to determine whether the finding has value in their field.

Monday, July 23, 2007

New processors present problems, payoff

This article presents the new challenge and opportunity in designing microprocessors. New operating systems will be needed to optimize the utilization of these microprocessors. In my opinion, the combination of data mining technologies that allow "automatic data mining (or predictive modeling) factories", intelligent agents, and parallel computing will be the fundamental blocks in addressing this challenge. Those technologies, combined with gaming, simulation, and other visualization technologies, will be part of the ingredients needed for this leap into the future.

Conceptually I think that it will be like this:
  1. Data mining technologies will provide the fundamentals of pattern and error detection. Due to the complexity and diversity of the rich data environment that we currently face, we will need the ability to embed part of this technology into any program, and we will probably need multiple different data mining models analyzing data simultaneously so as to meet the needs of the end users;
  2. Intelligent mobile agent technologies will be fundamental to access and process data from servers, mainframes, and handheld devices like cellphones;
  3. Web-based technologies will be fundamental in finding patterns and in improving remote communications;
  4. Parallel computing technologies will be needed to optimize the processing of large quantities of data; and
  5. Visualization technologies that make complex patterns easily understood, while simultaneously adhering to established laws of nature (i.e., medicine or physics) or previous experience (business rules), will also be a keystone in this endeavour.

Our biggest challenge is going to be reaching out across multiple disciplines to integrate all these technologies into a grand schema. In this sense we are all pioneers. We bring different skill sets that, combined, will mark the path for others to follow. It will not be easy, but it will be worthwhile!

Wednesday, July 11, 2007

Understanding Molecular Imaging

GE Healthcare is correct in its assessment that if you can track molecular changes in cells and link them to disease progression, an enterprise will be demonstrating "the power of molecular imaging". I believe that web analytics algorithms and software are what will make this step possible. The reason is that web analytics algorithms allow you to predict a variable (i.e., disease) given a series of inputs (medical procedures and other diagnoses) over a sequence.

Web Analytics and Healthcare: Disease Progression

We are starting to develop a healthcare model for disease progression prediction using the Microsoft Sequence Clustering algorithm in SQL Server 2005 Analysis Services. It seems to work well, but I would like to make a comparison with other algorithms. I was wondering if anyone in the community knows how we can obtain Google's permission to use (or adapt) their web analytics algorithm for disease progression prediction, or if anyone has any other suggestion for web analytics software that we could try. We have the largest private payer healthcare database in the U.S., so we need robust algorithms.
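For anyone curious about the idea behind sequence-based models, here is a toy Python sketch (invented disease states, and not the Microsoft algorithm itself) that estimates first-order Markov transition probabilities from visit sequences, the kind of statistic that underlies sequence clustering:

```python
# Toy illustration: estimate P(next state | current state) from a few
# invented patient state sequences. Real sequence clustering combines
# such Markov chains with clustering, which is omitted here.
from collections import defaultdict

sequences = [
    ["healthy", "prediabetes", "diabetes"],
    ["healthy", "prediabetes", "healthy"],
    ["prediabetes", "diabetes", "complications"],
]

counts = defaultdict(lambda: defaultdict(int))
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1

# Normalize counts into transition probabilities.
transitions = {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
               for a, nxt in counts.items()}

print(transitions["prediabetes"])
```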



Monday, July 09, 2007

Moving Closer To Solving Lou Gehrig's Disease Mystery

http://www.medicalnewstoday.com/medicalnews.php?newsid=75539

This is an area where I hope predictive modeling and data mining can make a difference. If we can model disease progression at the cellular level, we might be able to diagnose and prevent ALS before its onset.

Thursday, June 21, 2007

Web Analytics: Future Applications in Predicting Modeling


A lot of time and effort is being channeled into the area of web analytics. This term refers to:
“[t]he measurement of data as it relates to an Internet site, including the behavior of visitors, the amount of traffic, the conversion rates, web server performance, user experience, and other information in order to understand and proof of results and continually improve the results of a site towards a set of objectives.”


Since web analytics is another area of predictive modeling, we must ask whether the methodologies, analytics software, and visualization tools developed in web analytics could have an impact on other industries that use predictive modeling, like the healthcare, banking, insurance, retail, and manufacturing industries. I think that the processes and software developed for web analytics will ultimately be used in many other industries, because the intersection of the Internet and other industries is already a reality.

Predictive modeling and web analytics have the same objective: to provide a measurement (or baseline) and to predict future behavior. One of the key contributions of web analytics has been software that can withstand the rigors of commercial use. The scalability components of web analytics are crucial for other industries in which large databases have become the norm.

Another significant contribution web analytics has made to the area of predictive modeling is the ability to come together and provide a series of metrics and benchmarks for the industry. Although some may disagree with this assessment, if we look at the history of the healthcare industry it is apparent that the inability to agree upon benchmarks and metrics has negatively impacted the cost of healthcare in the United States. Moreover, those involved in web analytics could give industries like banking, insurance, and retail an innovative new look at what needs to be measured.

A third contribution web analytics has made to predictive analytics is the healthy, spirited, and robust exchange in the area of privacy. The Internet has created and raised serious, relevant, and pertinent questions regarding privacy that other industries could find beneficial.
A fourth contribution web analytics has made to predictive modeling is the development of new return on investment (ROI) models in business. As companies adopt these ROI models for their advertisement, new media, and marketing strategies, they may find that these models are also applicable to other lines of business.

Last but not least, web analytics has contributed a new set of visualization tools that summarize previously hidden nuggets of gold in a way that can be easily understood and acted upon.

Tuesday, June 19, 2007

Geovisual Analytics and Crisis Management

Good article about geovisual analytics, with good utilization of I2 and GIS technologies.

NIH-NSF Visualization Research Challenges Report

This article is for all the developers and scientists working on visualization tools in analytics and data mining. Enjoy!

Monday, June 18, 2007

Friday, June 15, 2007

What Data Mining Can and Can't Do

I include this article because Peter Fader is an expert in behavioral predictive modeling. The caveat is that there is a difference between predicting behavior and predicting patterns that are not necessarily related to behavior. For example, in molecular genetics we try to predict how a gene or a chemical substance has an impact on the physiology of a person. These reactions are mostly physical rather than behavioral. Nevertheless, I agree with Peter that executives' expectations of data mining are out of proportion to the investment in predictive modeling. Predictive modeling using Excel spreadsheets might work for behavioral analysis, but I do not think it will work in the health care and biotechnology industries. When I was practicing law I used to refer to this as the "what is at stake" syndrome: in a civil case it is money, but in a criminal case it is a person's freedom. Experience has taught me that generalizations make a great sound bite, but can be dangerous in the real world.

Wednesday, June 13, 2007

Evaluation of noise reduction techniques in the splice junction recognition problem

The authors have done a good job of evaluating noise reduction techniques that use pre-processing algorithms in large genetic databases, which are characterized by the presence of noisy data that can affect the performance of data mining processes.

A review of symbolic analysis of experimental data

This article suggests time-series analysis as a way to reduce noise when analyzing large databases. My first impression was, "you must be kidding; time series have nothing to do with noise reduction". Then I ran an experiment using my 4.7 terabytes of data and found that a time-series analysis could detect the cause of noise in my sample data (or training set). When I re-read the article after the experiment, I found that this methodology is for processes that are non-linear and possibly chaotic. I am using healthcare data that is non-linear and chaotic. I found that the time-series analysis was a good methodology for identifying noise in the training set for the data tags=1. I still need to do a lot of reverse engineering to understand the why, but in the meantime I thought this was worth passing on.
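As a rough illustration of how a time-series view can surface noise, here is a small Python sketch (invented monthly values, not my actual data, and far simpler than the symbolic analysis in the article) that flags points deviating sharply from a rolling window:

```python
# Flag candidate noise: values far outside the statistics of the
# preceding rolling window. Data is invented; 250 is injected noise.
from statistics import mean, stdev

values = [100, 102, 99, 101, 250, 98, 103, 100]
window = 3
noisy = []
for i in range(window, len(values)):
    local = values[i - window:i]
    m, s = mean(local), stdev(local)
    if s > 0 and abs(values[i] - m) > 3 * s:
        noisy.append(i)

print(noisy)  # indices of flagged points
```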

Enhancing Data Analysis with Noise Removal

I am working on applying noise reduction to an enterprise data mining model, and I thought that this was a good overall article on the different techniques applicable.

Tuesday, June 12, 2007

Incremental Mining of Sequential Patterns in Large Databases

The fundamentals of this algorithm could be used in large databases.

The problem: "As databases evolve the problem of maintaining sequential patterns over a significantly long period of time becomes essential, since a large number of new records may be added to a database. To reflect the current state of the database where previous sequential patterns would become irrelevant and new sequential patterns might appear, there is a need for efficient algorithms to update, maintain and manage the information discovered [12]. Several efficient algorithms for maintaining association rules have been developed [12–15]. Nevertheless, the problem of maintaining sequential patterns is much more complicated than maintaining association rules, since transaction cutting and sequence permutation have to be taken into account [16]."

The proposed solution: "This method is based on the discovery of frequent sequences by only considering frequent sequences obtained by an earlier mining step. By proposing an iterative approach based only on such frequent sequences we are able to handle large databases without having to maintain negative border information, which was proved to be very memory consuming [16]. Maintaining such a border is well adapted to incremental association mining [26,19], where association rules are only intended to discover intra-transaction patterns (itemsets). Nevertheless, in sequence mining, we also have to discover inter-transaction patterns (sequences) and the set of all frequent sequences is an unbounded superset of the set of frequent itemsets (bounded) [16]. The main consequence is that such approaches are very limited by the negative border size."
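To illustrate the incremental idea in miniature, here is a toy Python sketch (invented sequences and support threshold, far simpler than the paper's algorithm) that updates support counts for previously discovered sequences as new records arrive, instead of re-mining the whole database:

```python
# Toy incremental maintenance: keep support counts for sequences found
# in an earlier mining step and update them with new records.
from collections import Counter

min_support = 2
# Support counts from the earlier mining step (invented).
support = Counter({("a", "b"): 3, ("b", "c"): 2})

def contains(seq, pattern):
    """True if pattern occurs in seq in order (not necessarily adjacent)."""
    it = iter(seq)
    return all(item in it for item in pattern)

def update(support, new_sequences):
    for seq in new_sequences:
        for pattern in support:
            if contains(seq, pattern):
                support[pattern] += 1

update(support, [["a", "x", "b"], ["b", "c", "d"]])
frequent = {p for p, n in support.items() if n >= min_support}
print(sorted(frequent))
```

The real algorithm also has to discover patterns that only become frequent after the update, which this sketch deliberately ignores.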

Friday, June 08, 2007

Molecular Staging for Survival Prediction of Colorectal Cancer Patients

This article shows the potential of data mining for disease prognosis using microarray (SAM) data.

Go Stanford! That's where one of my daughters graduated from, and SAM is a product of Stanford University.

The treatment of missing values and its effect in the classifier accuracy

Good paper on the effects of missing values on the accuracy of your model. The organization of this paper would improve if the authors had included their recommendation as part of the Summary.

Nevertheless, this is the crucial recommendation (p.8): "We recommend that we can deal with datasets having up to 20 % of missing values. For the CD (Complete Deletion) method we have up to 60 % of instances containing missing values and still have a reasonable performance."

For healthcare, pharma, and biotech data this paper is important because of the complexity and diversity of this data.
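As a small illustration of the two regimes the paper describes, here is a Python sketch (the patient rows are invented; the 20% threshold is the paper's, the rest is mine) that imputes a column when its missing rate is low enough, then falls back to complete deletion for any rows still missing values:

```python
# Invented patient rows; None marks a missing value.
rows = [
    {"age": 64, "bmi": 31.0},
    {"age": 58, "bmi": None},
    {"age": None, "bmi": 27.5},
    {"age": 71, "bmi": 29.9},
    {"age": 66, "bmi": 30.2},
]

def missing_rate(rows, col):
    return sum(r[col] is None for r in rows) / len(rows)

# Each column here is 20% missing, so impute with the column mean.
for col in ("age", "bmi"):
    if missing_rate(rows, col) <= 0.20:
        vals = [r[col] for r in rows if r[col] is not None]
        fill = sum(vals) / len(vals)
        for r in rows:
            if r[col] is None:
                r[col] = fill

# Complete Deletion fallback: drop any row that still has a missing value.
rows = [r for r in rows if None not in r.values()]
print(len(rows))
```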

An Assessment of Accuracy, Error, and Conflict with Support Values from

This article is for experienced biostatisticians. Nevertheless, here is the interpretation for the layman:
When molecular biology theories are tested with real data, we need to be cautious in reading bootstrap values if we are assuming an underestimation of the actual support. For example (my example is not in this article), if comparing a decision tree vs. a Bayesian logistic regression model, be cautious in how you assess the accuracy of your model, since decision trees tend to underestimate and Bayesian models tend to overestimate.

I have found that to increase a classifier's accuracy in a model, this type of distinction (non-parametric bootstrap values vs. Bayesian probabilities) is fundamental.

Phase II Studies: Which is Worse, False Positive or False Negative

A short but powerful article that helps one understand the effects of Type I and Type II errors in clinical trials.

Monday, May 21, 2007

SPSS Launches Enhanced Predictive Analytics Platform

I have not tried this product yet, but SPSS tends to have good products in predictive modeling.

The Advantages of Smart Data Mining

Good general article about data mining in the retail and POS industry. Free registration required.

Data-mining moves into the mainstream, in search of profit

Good general article about how data mining is moving into different fields.

Wednesday, May 16, 2007

General Healthcare Data Mining Model

We went into production on May 7 with our General Healthcare Data Mining Model. Our metrics comparison (without giving intellectual property away) is as follows:

Metric 1 - 5.15% (old) vs. 20.3% (new model)

Metric 2 - 6.06% (old) vs. 53.4% (new model)

Right now we are fine-tuning the model and reducing our findings to writing, since we must make sure that we have good documentation. We processed over 396 million claims in a 4.7 TB environment (SQL Server 2005). We can refresh every month right now, and the goal is to refresh once a week within the next couple of months.

The model can be used in healthcare, pharmaceutical, and biotech industries.

Tuesday, May 01, 2007

Doctors test gene therapy to treat blindness

This is the type of therapy that, once there is a single success, is going to revolutionize the way that we look at data mining in the health care field.

Tuesday, April 24, 2007

Father and me


He was a giant when in one knee will listen to me
He was wise when over the years I learned to listen
He was a leader when showed me by example
He defined courage in the most difficult struggle

I will be a giant like him when on bended knee
I will be wise by becoming a listener
I will lead by example
I will have the courage to be his son

Monday, April 23, 2007

Studies back Parkinson’s and pesticides link

This article touches a very personal issue that explains why I have invested the last seven years of my professional life to perfecting data mining in healthcare and the biotech industry. My father was a pure research scientist in the area of pesticides research for the last 10+ years of his professional career. About 5 years ago he died of Parkinson's disease. As I tried to understand his disease, I discovered that all the scientists and lab workers who worked for him in the Pesticides Laboratories died of Parkinson's disease too! I decided that my experience in data mining, mathematics, business, and law could be used to help create a healthcare data mining model with multiple uses: from finding hidden patterns for outcomes research, to molecular biology research, to healthcare insurance claims.

So, friends and colleagues, that is my goal: to create and make sure that as many people as possible have access to a healthcare data mining model that has multiple uses. I thought that if I created this tool, I could assist scientists and companies in finding some kind of relief for some terrible diseases. Eighteen months after dad died, my mother died of ALS. Now you know what drives me.

Thursday, March 29, 2007

Predicting breast cancer survivability: a comparison of 3 models

I like that this article uses good methodology for the comparison of the three different models. On the other hand, my experience tells me that it is the combination of the three models that increases the predictive power of any enterprise model.
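That combination point can be made concrete with a tiny soft-voting sketch in Python. The three "models" below are invented stand-ins (simple rules, not the article's actual decision tree, regression, and neural network), so treat this only as an illustration of averaging model outputs:

```python
# Soft voting: average the probability estimates of several models.
def model_tree(x):        # invented stand-in for a decision tree
    return 0.9 if x["tumor_size"] > 2.0 else 0.2

def model_regression(x):  # invented stand-in for a regression model
    return min(0.95, 0.1 + 0.3 * x["tumor_size"])

def model_nn(x):          # invented stand-in for a neural network
    return 0.8 if x["node_positive"] else 0.3

def ensemble(x):
    scores = [model_tree(x), model_regression(x), model_nn(x)]
    return sum(scores) / len(scores)

patient = {"tumor_size": 2.4, "node_positive": True}
print(round(ensemble(patient), 3))
```

The averaging tends to smooth out the individual models' biases, which is why combined models often outperform any single one.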

Intel details new chip technology

The first part of the puzzle of intelligent agents and data mining is already taking place.

Wednesday, March 28, 2007

Successful Data Mining Applications

For those who are interested in which industries use data mining successfully. There is obviously a lot of growth potential.

Mining the Genome

A basic article about bioinformatics. It includes the issues and challenges.

Mining biotech's data mother lode

A good article that shows the utilization of data mining in the biotechnology (biotech) industry.

Pellucid Agent Architecture for Administration Based Processes

Another application of data mining and intelligent agents.

http://ups.savba.sk/parcom/publications/agents/IISAS_IAWTIC-2003.pdf

Application of Data Mining and Intelligent Agent Technologies to Concurrent Engineering

This is a good article about a potential application of data mining and intelligent agents in the manufacturing industry.

http://issel.ee.auth.gr/ktree/Documents/Root%20Folder/ISSEL/Publications/3_MITKAS_IJAM.pdf

Tuesday, March 27, 2007

Intelligent Agents

And the Future of Data Mining

An intelligent agent is: (1) a software agent, if it is a piece of software that acts in a relationship of agency for a user or another program; or (2) an intelligent actor, if it interacts with its environment. The first definition refers mostly to data mining, while the latter refers to a robot-like machine.

Although in the science and technology communities we tend to separate both definitions of intelligent agent, the advances in computer processors are bringing both environments closer to one another. I imagine the integration of an Electronic Medical Record (EMR) device like GE Centricity and healthcare-specific data mining algorithms using Microsoft SQL 2005 Analysis Services into machine learning hardware that will assist physicians and other healthcare providers in improving and measuring clinical outcomes in real time.

This type of technology could also be applicable to PDAs and trading in the financial markets, to the purchasing of goods and services (brick and mortar or through the Internet), or to the decision-making process of what food to buy or movie to watch. The technological challenges will be correlated with the advances by companies like Intel and Motorola in designing smaller, faster chips with greater storage capacity. Other challenges involve data network security and privacy issues, which affect consumers. These challenges are great, but without any doubt the framework to integrate intelligent mobile agents and data mining is already in place.

Strategic alliances in the technology industry are no longer limited to the industrialized countries, but are a worldwide phenomenon. Nor are they solely the realm of the large technology companies. I would not predict who, when, or what industries and companies will benefit from the merger of both technologies. I do predict that we should see the first fruits of that merger in the next eighteen to twenty-four months.

SQL 2005 Analysis Services Project: Training Set

The main reason why SQL 2005 Analysis Services projects fail is a lack of understanding of the purpose and importance of the training set in data mining. The training set takes the place of the scientific theory in data mining. The scientific theory refers to facts known to be true or false. The key is specificity. For example, if you are trying to find out which cancer drugs have the best chemical compounds to fight off cancer, you must have the specific chemical compounds and their associated values for each drug. These are called inputs in Analysis Services Data Mining Structures (DMS). The second step is to decide what you want to predict. Do you want to predict a discrete state (yes or no)? Do you want to predict a numerical continuous value (i.e., the price of a particular item)? The third step is to determine your key column, the unique identifier for a particular row.

Always ask yourself: what am I trying to predict, or what is the scientific theory? The theory and your training set are always specific to what you want to predict. Remember, Microsoft is providing the tool, but you must provide the specific theory.

Once you successfully build one model, you can use that model to predict similarly situated cases. If you are selling fruits, build the model for selling apples first. Once this model is working, change the training set to reflect oranges and apply the same model to oranges. The combination of all your models is your data mining enterprise system.
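A minimal sketch of that training-set structure, with invented column names and a deliberately trivial "model", might look like this in Python (the key column, input columns, and predictable column mirror the three steps above):

```python
# Invented training set: a key column ("id"), inputs ("price",
# "ripeness"), and one discrete predictable column ("sold").
apples_training = [
    {"id": 1, "price": 0.40, "ripeness": 0.9, "sold": "yes"},
    {"id": 2, "price": 0.80, "ripeness": 0.3, "sold": "no"},
    {"id": 3, "price": 0.50, "ripeness": 0.7, "sold": "yes"},
]

def train_majority(rows, predictable):
    """A trivial 'model': predict the most common outcome in the set."""
    outcomes = [r[predictable] for r in rows]
    return max(set(outcomes), key=outcomes.count)

apple_model = train_majority(apples_training, "sold")

# Same structure, new training set: only the data changes, not the model.
oranges_training = [
    {"id": 1, "price": 0.60, "ripeness": 0.8, "sold": "no"},
    {"id": 2, "price": 0.55, "ripeness": 0.9, "sold": "no"},
    {"id": 3, "price": 0.30, "ripeness": 0.6, "sold": "yes"},
]
orange_model = train_majority(oranges_training, "sold")
print(apple_model, orange_model)
```

Swapping the training set while keeping the model definition is exactly the apples-to-oranges move described above; a real project would use an Analysis Services mining structure rather than this toy function.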

Friday, March 23, 2007

Microsoft SQL 2005 Analysis Services: Ten Best Practices©

By Alberto Roldan

A number of data mining, executive management, and IT professionals seem to be experiencing the same issue with Microsoft SQL 2005 Analysis Services (MAS): how do I make this product work for my enterprise? These ten best practices should provide some assistance in dealing with this issue.

1. Training: Any organization using this product must have at least one person who has received training in SQL 2005 Analysis Services and in the basic principles of data mining and predictive modeling.

a. Do not expect the Information Technology Department to create an enterprise data mining project without proper training in the technology and the science of data mining. One of the reasons for the lack of success of data mining projects is that the IT department understands neither the technology nor the science behind data mining. MAS makes the development of an enterprise data mining project feasible if at least one member of the staff understands the technology and science behind it.

i. Potential Solution – Since this is cutting-edge technology that merges science and technology, the Chief Technology Officer, Chief Information Officer, and Enterprise Architect must receive some training in this area. They do not need to be experts, but they must understand the basic principles behind it. This training will make sure that expectations and strategic business initiatives are properly aligned. Also, make sure at least two software engineers take the online tutorials.

2. Strategic investment, not simply a cost center: Most IT projects are linear (i.e., project scope, charter, resource allocation, design, development, QC, testing, and deployment). Data mining is always a cyclical project because it is research and development. It is heuristic by nature. Organizations invest in enterprise data mining because they realize that the amount of data must be managed differently to transform data into actionable information. The expectation that we are going to do something different but use the same practices that we have used in the past is an oxymoron.

a. Plan data mining projects by incorporating research and development techniques into IT software engineering best practices. You need a specific theory for your research, proposed research steps, a timetable, and dedicated resources. Also, document all the steps, define your metrics, complete the research, QC the results, evaluate the results, and determine the next step for research in your continuous improvement process.

i. Potential Solution – Concentrate on your theory for research. The more specific your theory, the greater the probability of conducting an experiment that will give you specific results that you can use (whether by confirming your theory or not). If the theory is something to the effect of "save the planet and cure cancer," I guarantee you that it will not work. Never underestimate the ability to transform a strategic business initiative into multiple theories for data mining research.

3. Dedicated Resources: One of the missions of any IT department is to provide added value to the enterprise by reducing costs. As a consequence, the standard in some organizations is to share resources as much as possible in order to decrease costs. Nevertheless, the architecture of an enterprise data mining project requires, at a minimum, its own server due to the amount of processing time required.

a. The SQL 2005 Server is a powerful tool that has the ability to accommodate many users and serve multiple purposes in the enterprise. As with any other resource, we should always strive to optimize it. Nevertheless, I have found that in a metadata environment, competition between operational and data mining research projects within the same environment causes unnecessary friction and delays for all parties.

i. Potential Solution – The best practice is to have separate and dedicated IT resources (staff, hardware, and software). If this is not feasible, however, a detailed communication and utilization plan for resources must be implemented. The goals and expectations of any data mining project must then be adjusted to reflect the additional time (25%-50% more) required to complete it.

4. "One Theory at a Time" Rule: You can use this tool to address multiple business issues, but it will probably require multiple models for multiple issues. If you are looking for the needle in the haystack, you must consider that you could have multiple haystacks and multiple needles. Therefore, you are in a situation that requires multiple models for the haystacks and the needles. The complexity of this issue cannot be underestimated.

a. Many enterprises spend vast resources on their organizational planning and structure. The reason for this expenditure is that they understand that their business is complex and requires a clear chain of command to successfully implement strategic and operational initiatives. Nevertheless, these same enterprises fail to recognize this complexity when attempting to implement data mining systems.

i. Potential Solution – Study your organizational chart. This could be a roadmap as to the priorities for a successful enterprise data mining system. It will assist in defining the theories that you want to test in a research environment.

5. Models are never generic; they are always specific: The terms data mining and artificial intelligence are sometimes used out of context in the business environment. These terms tend to be used more in a science fiction context than in a business context. This contributes to unrealistic expectations of what data mining can do to assist in implementing strategic business initiatives. It is imperative that the CIO and the CTO have a basic understanding of the science and technology behind data mining so the executive management team makes well-informed decisions about the incorporation of these tools into any initiative.

a. A data mining system is not going to lower your operational costs by ten percent overnight. A data mining project is not magic and, like any other strategic initiative, involves planning, knowledge management, and change management. Expectations must be realistic relative to the size of the investment. The investment is not just hardware and software; it also involves training and making sure that you have the right people to do the job well. It requires being intellectually honest about what your needs are, and candid about your business expectations given the size of your investment.

i. Potential Solution: The first step, before making any investment, is to evaluate where the organization currently is, how it got there, the nature of the competition, and what you think a data mining system can do for your organization. The axiom that those who fail to plan are planning to fail applies to data mining. Executive management support and sponsorship is a keystone of the process. Do not underestimate the challenges and cyclical nature of this type of initiative, but make sure that the message throughout the organization is clear: we will achieve an enterprise data mining system because it is part of how we intend to stay competitive in the future by lowering costs and increasing revenues.

6. Three Main Categories: The variables (input), training set (sample), and data mining structures are the keystones of Analysis Services. A clear understanding of these three areas will assist in the creation of a data mining system.

a. IT architects and developers who do not understand the three main components of Analysis Services will have a difficult time successfully completing even a single data mining project; successfully designing and implementing an enterprise data mining system will be impossible for them. This requires a basic understanding of the technology and the science of data mining and predictive modeling. Acquiring this knowledge does not need to be extremely costly or time-consuming. Nevertheless, expecting the IT department to successfully complete this type of project without allowing them time to acquire this knowledge and training is setting them up to fail.

i. Potential Solution: The technical and scientific knowledge to successfully complete a data mining project can be acquired through training (classes or online), engaging the services of a consultant with a proven record of completing data mining projects, or self-directed reading. I would suggest looking within your own organization for individuals with at least a statistics or mathematics background, or those with an interest in the sciences (genetics, biology, astronomy, physics, or chemistry). Those individuals could have a predisposition to quickly learn the science and methodology behind the technology of data mining.

7. Statistically accepted best practices as metrics: Because this product joins science and technology, and because the process is cyclical rather than linear, we must incorporate statistically accepted best practices to sustain the continuous improvement that research requires. Beyond traditional business metrics (cost per employee, revenue per employee, and profit per employee) and IT metrics (reduction of hours to complete a process, increase in revenues, CPU utilization per employee, and total system downtime), we now need to incorporate scientific metrics that will assist in improving a data mining system. Understanding these scientific metrics is important for measuring success, improvement, or lack of improvement.

a. It is a change for organizations that have never had a research component to apply an additional set of metrics to gauge the performance of a data mining system. The tendency is to use only the same metrics that have been used in the past. The error is in assuming that data mining is a linear type of project rather than a cyclical one. The best example is a data mining system that seems to be a failure from the business point of view but is successful when measured by statistical metrics; in this scenario, the statistical metrics can help diagnose the problem.

i. Potential Solution: I would suggest using an add-in statistical software package to compute the Variance Inflation Factor (VIF) for the numerical variables, to detect whether a particular variable is having undue influence on your model. Also, I would suggest measuring Type II error to determine the predictive qualities of your model (i.e., what your model is actually measuring).
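As a sketch of the VIF calculation itself (the function here is my own illustration, not part of any add-in package): each variable is regressed on the remaining variables, and VIF = 1 / (1 - R²), so a large value flags a variable whose information is largely duplicated elsewhere.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of a numeric matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns.  A common rule of thumb is that values
    above ~10 suggest the variable is collinear with the others and may
    be exerting undue influence on the model."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add an intercept term
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1 - ss_res / ss_tot
        out.append(1.0 / max(1e-12, 1 - r2))        # guard against R^2 == 1
    return out
```

Running this over the training-set inputs before modeling gives exactly the "too good to be true" check discussed later in this blog.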

8. Design of Experiments: In a research environment, projects are cyclical. The creation of a successful data mining system requires research, and research requires experimentation. This is one of the areas where business and science seem to conflict: science expects that some experiments will not be successful, and businesses tend to be risk averse. Nevertheless, businesses constantly take risks to improve their profitability and growth. Therefore, it is not that businesses do not take risks, but that they want to be able to quantify and qualify those risks.

a. When science and technology join in the business arena some compromises must be made that are beneficial for all the parties. Science and technology cannot operate within a “pure science” mentality, and businesses must face the inherent risks head-on.

i. Potential Solution: If you have invested in SQL Server 2005 and its software, you have already incurred part of the financial risk. The issue then becomes whether you want to use the server as a simple storage facility or are willing to make an additional investment (i.e., training, knowledge transfer, or a consultant) to use the full potential of this tool. I would suggest starting by having two people on your staff go through all the online tutorials (no shortcuts) and then letting them try to create and deploy a test Analysis Services project using limited data. This process will surface a series of unanswered questions, and those questions will help you plan the options you have for acquiring the knowledge needed to design, test, and implement a data mining system. Also, consider renaming "design of experiments" to "design of research," since that term will be more easily understood by others within the enterprise.

9. Quality Control and Testing: Design multiple quality control staging areas during the process. Although Microsoft has built this product so that it writes roughly ninety percent (90%) of the code, you will find that you sometimes need to make small modifications and write wrapper code. Also, you will need to optimize processes if you create derived variables such as Z-scores. Lastly, you will find resistance within the IT and operational organizations of the enterprise to making any changes to current processes. This resistance will require a specific change management strategy.

a. The potential of a successful data mining system in an enterprise tends to create apprehension, rooted in the mistaken belief that success will result in people losing their jobs. In the workplace, nothing is as personal as the instability a potential change can bring when managers and staff alike perceive that their jobs might be in jeopardy if the new system succeeds. The implications for QC and testing, in terms of lost productive time, are immeasurable.

i. Potential Solution: The development of realistic communication, quality control, and testing plans as part of the initial executive management evaluation of whether or not to design an enterprise data mining system is a must. This plan should include goals and expectations at all levels of the enterprise. Specifically, the communication plan should address the potential changes in duties and responsibilities of staff and managers. It is much easier for managers and staff to buy into this type of strategic initiative if they see the role that they will play during the different stages of an enterprise data mining project.

10. Expectations: The term "high but realistic expectations" does not need to be a contradiction. High expectations refer to the ability of the enterprise to learn to effectively use the tools at its disposal. Realistic expectations refer to the increase in value that a data mining system should give you based on your prior experience in growing and developing the business. Also, expectations should be directly correlated to the investment in training and the resources dedicated to a data mining project.

a. Some organizations tend to go to their IT department and ask them to build a data mining or analytics enterprise system that will solve their business issues, as well as all the pressing world issues. The CIO, CTO, or VP of Technology sometimes does not have the knowledge required to explain that a more specific approach is required in data mining. Hence, data mining projects often fail because of a failure to plan properly with high but realistic expectations.

i. Potential Solution: Microsoft has made a product that streamlines much of the design and development of a data mining system, but the key is specific planning, knowledge transfer from the business areas to the IT department, and defining the specific business needs. It is going to take time and effort to put this together. The first step is to develop a plan that takes into consideration the resources and training necessary to use this new tool. This plan should serve as a roadmap for how the initiative will be implemented.

I hope that you can use some or all of these best practices in using SQL 2005 Analysis Services to create an enterprise analytics or data mining system. The barriers are technical, scientific, and organizational (change management). The potential is immeasurable. Contact: alberto_roldan_2001@yahoo.com

Monday, February 19, 2007

Predictive Modeling and Microsoft Analysis Services 2005

I have been using this product for 6 months now. Also, I went to Microsoft and got three days of training from Jamie (thanks!). This is a good product, and Microsoft has done an excellent job of bringing data mining to the "masses".

This product is scalable (we are using it on over seven terabytes of data every month) and user friendly. It integrates fairly simply with Reporting Services.

The key to using Analysis Services in a supervised model is the training sample. My main recommendation is that you bring all your data tags into your training sample. To determine the size of your training sample population, multiply the number of data tags by five; your data tags will then represent 20% of the sample.
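The sizing rule is simple arithmetic; a small helper (the function name is my own, not an Analysis Services API) makes it explicit:

```python
def training_sample_size(n_tagged):
    """Sizing rule from the post: total sample = tagged records x 5,
    so the tagged records end up as 20% of the training sample."""
    total = n_tagged * 5
    untagged_needed = total - n_tagged   # records to draw from the untagged pool
    return total, untagged_needed

# 2,000 tagged records -> a 10,000-record sample (8,000 untagged),
# and 2,000 / 10,000 = 20% tagged, as the rule intends.
```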

Another key issue is modifying the algorithm parameters, specifically the maximum states. To determine the maximum number of states in your data, I suggest a combination of partition and distribution analyses. You can also use the Microsoft Decision Trees algorithm.
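As a sketch of the distribution-analysis step (the helper below is my own illustration, not part of Analysis Services), you can count how many distinct states a column has and how concentrated they are. That tells you whether to raise the maximum-states parameter above its default or to fold rare states into an "Other" bucket first:

```python
from collections import Counter

def state_distribution(values, top=5):
    """Distribution analysis for one categorical column: the
    distinct-state count plus the population share held by the
    most frequent states.  If the top few states cover most of
    the rows, bucketing the tail may beat raising maximum states."""
    counts = Counter(values)
    n = len(values)
    return {
        "distinct_states": len(counts),
        "top_states": [(state, c / n) for state, c in counts.most_common(top)],
    }
```

Running this per column over the training sample gives a quick partition-by-column view of where the state counts actually sit.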

David did a great job with the data mining algorithms, but those of us who have been in the data mining industry for a long time need more detailed (as well as peer-reviewed) articles about the algorithms. For example, the predict and predict-probability functions return negative output values, which should be a mathematical improbability in an unsupervised model. Even if we filter out all the negative inputs, we still get negative output. I think this is a data-type issue, but we are still researching it.

Another issue that is not addressed in the algorithms is whether any variable or input is improperly influencing the predictive output. Specifically, I would prefer that the models give us the VIF value for each input. Otherwise, we may find ourselves in one of those situations that are "too good to be true."

The last issue is that the Type II error rate in these models is extremely large when we apply the training set to the entire population. Specifically, I am referring to Type II error rates greater than 60%!
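For concreteness, the Type II error rate here is the false-negative rate: the share of true positives the model missed when scoring the full population. A minimal sketch (the function name is my own):

```python
def type_ii_error_rate(actual, predicted):
    """Type II error rate = false negatives / actual positives.

    A rate above 0.60 means the model is missing more than 60% of
    the cases it was built to find, which is the problem reported
    in this post."""
    positives = [(a, p) for a, p in zip(actual, predicted) if a]
    if not positives:
        return 0.0                       # no actual positives to miss
    misses = sum(1 for a, p in positives if not p)
    return misses / len(positives)
```

Computed alongside the VIF check, this separates "the model is too good to be true" from "the model barely finds anything" before the results reach the business side.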

Microsoft through Jamie's group is providing us with great technical support and I want to congratulate them for their efforts.

Tuesday, August 22, 2006

Enterprise Decision Management

James Taylor has moved the EDM (Enterprise Decision Management) blog to a new URL. Jim is an expert in the field, so I decided to let everyone know the new link: the primary URL for the EDM blog is www.edmblog.com

Wednesday, August 16, 2006

Artificial Intelligence II

Everything you wanted to know (more or less) but were afraid to ask. It is not complete and not perfect, but it will help you navigate some concepts. Good for beginners.
Read: http://www.resultspk.net/artificial_intelligence/

Artificial intelligence applied to network load balancing using Ant Colony Optimisation

Good potential solution of using a genetic algorithm to network load balancing issues. Read: http://www.codeproject.com/useritems/Ant_Colony_Optimisation.asp

Tuesday, August 15, 2006

Eleventh International Conference on Computer Aided Systems Theory

Date: February 12-16, 2007

Looks good!

http://www.iuctc.ulpgc.es/spain/eurocast2007/workshop.html

Segregative Genetic Algorithms (SEGA): A Hybrid Superstructure

This sounds like a robust and efficient solution for the issue of premature convergence (the child is no better than the parent) in genetic algorithms.
http://www.heuristiclab.com/publications/papers/affenzeller01d.pdf

Business Analytics

Business Analytics

About Me

My photo
See my resume at: https://docs.google.com/document/d/1-IonTpDtAgZyp3Pz5GqTJ5NjY0PhvCfJsYAfL1rX8KU/edit?hl=en_USid=1gr_s5GAMafHRjwGbDG_sTWpsl3zybGrvu12il5lRaEw