Newsletter Archive

ETL
ETL stands for Extract, Transform, Load: the process that extracts data from source systems, transforms it, and loads it into the data warehouse (DWH), where information from the different sources is merged and integrated. Authorities such as Gartner, Kimball and Inmon warn that 60-90% of DWH development budgets are typically spent on ETL. Major project risks reside here, in particular when sources prove unreliable and poorly documented.
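
To make the three steps concrete, here is a minimal sketch in Python using only the standard library. An in-memory SQLite database stands in for the DWH. The CSV extract, the table name `fact_revenue` and the cleansing rules are all hypothetical, and real ETL tooling adds the scheduling, logging and error handling that this sketch omits.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """customer_id,country,revenue
1, nl ,100.50
2,BE,
3,nl,75.25
"""

def extract(text):
    """Extract: read the raw rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: cast types, standardize country codes, treat blank revenue as missing."""
    records = []
    for row in rows:
        revenue = row["revenue"].strip()
        records.append((
            int(row["customer_id"]),
            row["country"].strip().upper(),
            float(revenue) if revenue else None,
        ))
    return records

def load(conn, records):
    """Load: write the conformed records into a (hypothetical) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_revenue "
        "(customer_id INTEGER, country TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO fact_revenue VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stands in for the DWH
load(conn, transform(extract(SOURCE_CSV)))
summary = conn.execute(
    "SELECT country, SUM(revenue) FROM fact_revenue GROUP BY country ORDER BY country"
).fetchall()
print(summary)  # [('BE', None), ('NL', 175.75)]
```

The blank revenue for customer 2 is kept as NULL rather than silently turned into zero. Decisions like this belong to the transform step and are where unreliable, poorly documented sources tend to cause trouble.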

Neural Networks
Neural Networks (NNs) are sometimes considered the epitome of data mining algorithms. Loosely modeled after the human brain (hence the name), NNs "learn" patterns from repeated exposure to data, much as our brain learns to recognize patterns in the outside world. This is how NNs can identify fraudulent transactions, high-potential prospects, and so on. They have been used to improve manufacturing process control, predict exchange and interest rate fluctuations, predict utility usage, and in many other applications.
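
As a minimal illustration of learning from repeated exposure, the sketch below trains the simplest possible neural network: a single perceptron. The training data are a hypothetical two-signal pattern. This is a toy example, because a real NN stacks many such units in layers and is typically trained with gradient-based methods (backpropagation) rather than the perceptron rule. It does, however, show the weights being adjusted every time the network makes a mistake.

```python
def train_perceptron(samples, epochs=20, learning_rate=1):
    """Perceptron learning rule: nudge the weights after every misclassified example."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):  # repeated exposure to the same data
        for features, target in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction  # -1, 0 or +1
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, features):
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0

# Hypothetical pattern: flag a transaction (1) only when both risk signals are present.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])  # [0, 0, 0, 1]
```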

Corporate Strategy
Strategy is one of those elusive concepts that everybody ‘knows’ but few people can define. As Freedman puts it: "… strategy is the framework of choices that determine the nature and direction of an organization. The choices in the framework relate to what products and services will be offered and not offered, what markets will be served and not served, and what capabilities are needed to take products to markets."

Missing Data
Missing data are a fact of life. No matter how hard we try, and how carefully we assemble our data sets, there will always be missing data. In fact, sometimes data are supposed to be missing, for instance because a particular attribute does not apply to a person. In some cases, the pattern of missing data can be as informative as the information that is present. In general, however, missing data limit the amount and quality of available information.
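
A first step with missing data is to measure it. A second step is to preserve the missingness pattern as a feature in its own right, because that pattern may be informative. The following is a minimal Python sketch with hypothetical records and field names. Here `partner_income` illustrates an attribute that is sometimes legitimately absent.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "partner_income": None},
    {"age": None, "income": 61000, "partner_income": 40000},
    {"age": 45, "income": None, "partner_income": None},
    {"age": 29, "income": 38000, "partner_income": 35000},
]

def missing_rates(rows):
    """Share of records with a missing value, per attribute."""
    fields = rows[0].keys()
    return {field: sum(r[field] is None for r in rows) / len(rows) for field in fields}

def add_missing_flags(rows, fields):
    """Keep the missingness pattern as explicit 0/1 indicator features."""
    return [
        {**r, **{f"{field}_missing": int(r[field] is None) for field in fields}}
        for r in rows
    ]

print(missing_rates(records))
# {'age': 0.25, 'income': 0.25, 'partner_income': 0.5}
flagged = add_missing_flags(records, ["income"])
print([r["income_missing"] for r in flagged])  # [0, 0, 1, 0]
```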

Segmentation
Segmentation refers to the process of dividing a heterogeneous population into segments that are each more or less homogeneous. The purpose is to identify subgroups that display similar behaviors and have similar needs. This makes the market more transparent and allows for a differentiated strategy per segment.
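
One common technique for finding such segments is k-means clustering. The sketch below is a plain Python implementation under simplifying assumptions:
- two hypothetical behavioral variables (annual spend and number of visits);
- a fixed number of segments k;
- the first k records used as starting centroids.

In practice you would standardize the variables first, because spend dominates the distance here, and you would try several starting points.

```python
import math
from statistics import fmean

def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm); the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(fmean(dim) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (annual spend, visits per year)
customers = [(120, 2), (900, 20), (150, 3), (950, 25), (130, 1), (880, 22)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```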

Decision Trees
Decision trees are among the most flexible, intuitive and powerful data analytic tools for exploring complex data structures. Because decision trees can be used for both prediction and insight, any data miner can benefit from applying them in diverse projects.
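
At the heart of most decision tree algorithms is a search for the split that best separates the classes. The sketch below shows that single step for one numeric attribute. It scores candidate thresholds by weighted Gini impurity, the criterion used by CART; other algorithms use entropy or chi-square instead. The age/response data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, higher when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must produce two non-empty branches
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

ages = [22, 25, 30, 35, 40, 52]
responded = ["no", "no", "no", "yes", "yes", "yes"]
print(best_split(ages, responded))  # (30, 0.0)
```

A full tree applies this search recursively to each branch, over all attributes, until a stopping rule (minimum node size, maximum depth) is met.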

XBRL
XBRL, an acronym for eXtensible Business Reporting Language, will permanently transform the creation, exchange and comparison of financial information. XBRL is based on XML (eXtensible Markup Language) and was ‘invented’ by Charles Hoffman, CPA, in April 1998, shortly after the World Wide Web Consortium (W3C) released the first official XML specification in February 1998. Although XBRL is currently used mainly for exchanging financial information, it offers the possibility to break down “language barriers” in any kind of business data exchange.

OLAP
OLAP, short for On-Line Analytical Processing, occupies a unique position between SQL and spreadsheet functionality. There are four core requirements for which neither SQL nor spreadsheets are fully adequate. These are support for:
|
|
Data Quality Assessment
Data quality (DQ) assessment is as much about assessing data as it is about assessing the impact that data quality (or the lack thereof) has on business processes. The business case for DQ comes from documenting how data flaws hamper the business. In this information age, data is considered an asset that should be managed and leveraged just like any other corporate asset. DQ assessment is part of the auditing that ensures responsible corporate governance.
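
A DQ assessment often starts by expressing business expectations as explicit rules and measuring how many records violate each one. The following is a minimal sketch in Python. The rules, the thresholds (such as the plausible birth-year range) and the records are hypothetical. A real assessment would go on to attach a business cost to each failure type.

```python
import re

# Hypothetical business rules; each returns True when a record passes.
RULES = {
    "email_valid": lambda r: bool(
        re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", r.get("email") or "")
    ),
    "birth_year_plausible": lambda r: (
        r.get("birth_year") is not None and 1900 <= r["birth_year"] <= 2010
    ),
    "postcode_present": lambda r: bool(r.get("postcode")),
}

def assess(records, rules):
    """Fraction of records failing each rule (0.0 = clean, 1.0 = every record fails)."""
    return {
        name: sum(not rule(r) for r in records) / len(records)
        for name, rule in rules.items()
    }

customers = [
    {"email": "ann@example.com", "birth_year": 1971, "postcode": "1012AB"},
    {"email": "bob(at)example.com", "birth_year": 1985, "postcode": ""},
    {"email": None, "birth_year": 1880, "postcode": "3511CD"},
    {"email": "dee@example.org", "birth_year": 1990, "postcode": "2511EF"},
]
print(assess(customers, RULES))
# {'email_valid': 0.5, 'birth_year_plausible': 0.25, 'postcode_present': 0.25}
```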

Dashboards and Scorecards
Dashboards and scorecards are where strategy, corporate performance management (CPM) and business intelligence (BI) come together. When implemented properly, they communicate how the execution of strategy should become visible in results. They display results and progress, enabling management by objectives. The metrics should translate an organization’s strategy into observable outcomes and allow performance to be compared against goals. This is where the strategy rubber meets the road.
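
To illustrate how a scorecard compares performance against goals, here is a minimal sketch that assigns a traffic-light status to each metric. The metrics, the targets and the 10% amber band are hypothetical, and organizations choose their own thresholds. Some metrics, such as churn, are better when lower, so the ratio is inverted for those.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str
    actual: float
    target: float
    higher_is_better: bool = True

def status(metric, amber_band=0.10):
    """Green at or beyond target, amber within the band below it, red otherwise."""
    if metric.higher_is_better:
        ratio = metric.actual / metric.target
    else:
        ratio = metric.target / metric.actual
    if ratio >= 1.0:
        return "green"
    if ratio >= 1.0 - amber_band:
        return "amber"
    return "red"

scorecard = [
    Metric("Revenue (k EUR)", actual=950, target=1000),
    Metric("New customers", actual=1200, target=1000),
    Metric("Churn rate (%)", actual=6.0, target=4.0, higher_is_better=False),
]
for m in scorecard:
    print(f"{m.name:<18} {status(m)}")
```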

Data Mining for CRM
Customer Relationship Management (CRM) was an over-hyped term that has since fallen from grace. Nonetheless, the principles for managing the value of a portfolio of customers remain as valid as ever. Data mining can serve many roles in supporting fact-based decision making to optimize customer relationships.

Data Mining Algorithms
Data mining algorithms come in many shapes and forms. Because the profession is so young, there is as yet no agreed-upon, comprehensive taxonomy of algorithms. One distinction everybody agrees on, however, is between supervised algorithms, which learn from examples with a known outcome (target), and unsupervised algorithms, which look for structure in data without a target. Most new data mining algorithms are developed in the Machine Learning community.

Data Preparation
Data preparation appears to be where data miners spend most of their time. Some estimate that 80-90% of the time in an average data mining project is spent “merely” preparing the data. This is time well spent, because careful preparation is what makes a good predictive model possible; yet data preparation and feature extraction remain underrepresented in the data mining literature.
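
A typical feature-extraction step is rolling up a raw transaction log to one row per customer. The sketch below derives recency, frequency and monetary value, the classic RFM variables. The transactions and the reference date are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("c1", date(2007, 1, 5), 40.0),
    ("c1", date(2007, 3, 20), 60.0),
    ("c2", date(2006, 11, 2), 250.0),
    ("c1", date(2007, 3, 28), 20.0),
]

def rfm_features(rows, as_of):
    """One row per customer: days since last purchase, number of purchases, total spend."""
    by_customer = defaultdict(list)
    for customer, day, amount in rows:
        by_customer[customer].append((day, amount))
    features = {}
    for customer, purchases in by_customer.items():
        last_purchase = max(day for day, _ in purchases)
        features[customer] = {
            "recency_days": (as_of - last_purchase).days,
            "frequency": len(purchases),
            "monetary": sum(amount for _, amount in purchases),
        }
    return features

print(rfm_features(transactions, as_of=date(2007, 4, 1)))
```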

Campaign Optimisation
Campaign optimization can take place at three levels:
|
|
Affinity Analysis
Affinity analysis is an association technique based on the premise that the products consumers purchase, and the preferences they express, are indicative of their future behavior. By identifying product affinity patterns, one can predict future behavior to enhance service and promote cross-selling.
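
Affinity patterns are commonly quantified with support, confidence and lift. Below is a minimal sketch for product pairs, using hypothetical baskets. A lift above 1 means two products are bought together more often than they would be if purchases were independent. Full market basket algorithms such as Apriori extend the same idea to larger item sets.

```python
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"milk", "cereal"},
    {"bread", "milk"},
    {"cereal", "milk"},
]

def pair_rules(baskets, min_count=2):
    """Map (lhs, rhs) -> (support, confidence, lift) for pairs seen at least min_count times."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        if count < min_count:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            support = count / n                      # P(lhs and rhs)
            confidence = count / item_counts[lhs]    # P(rhs | lhs)
            lift = count * n / (item_counts[lhs] * item_counts[rhs])  # confidence / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules

rules = pair_rules(baskets)
for (lhs, rhs), (support, confidence, lift) in sorted(rules.items(), key=lambda kv: -kv[1][2]):
    print(f"{lhs} -> {rhs}: support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```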

Vendor Selection
Vendor selection has become a crucial skill given the recent trend towards outsourcing and offshoring. Outsourcing has grown from 1% in 2003 to 9% in 2007 (Meta Group). Offshoring was used by 26% of all institutions in 2003, a share that has since grown to over 70% (Deloitte). This trend has included both moving work to low-wage countries and divesting non-core activities.

System Dynamics
System dynamics deals with systems whose behavior cannot be explained by simply adding up the intentions and actions of their constituent parts. Such systems are often called non-linear or complex systems, and one of their characteristics is extreme sensitivity to initial conditions. The metaphorical example is a butterfly flapping its wings in India that sets off a hurricane in America.
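
Sensitivity to initial conditions is easy to demonstrate numerically. The sketch below uses the logistic map, a textbook one-line non-linear feedback model. It is swapped in for illustration only and is not a full stock-and-flow system dynamics model. Two runs start one part in a billion apart.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x -> r * x * (1 - x), a classic non-linear feedback model."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)  # a "butterfly-sized" difference in the starting point
gaps = [abs(x - y) for x, y in zip(a, b)]
for step in (0, 10, 20, 30, 40, 50):
    print(step, f"{gaps[step]:.2e}")
```

The gap roughly doubles at every step. Within a few dozen iterations the two trajectories bear no resemblance to each other, even though the rule itself is fully deterministic.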

Credit Scoring
Credit scoring has dramatically changed the face of the underwriting business. It is a “young” discipline: only 30 years ago, the majority of credit acceptance decisions were made intuitively by underwriters. As statistical evidence accumulated, and early adopters of automated techniques conquered their markets, a sea change took place. Nowadays, almost all credit decisions are processed automatically, using scorecards to estimate the odds of default (and sometimes of profitability).
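
Scorecards commonly express the odds as points on a logarithmic scale. A fixed number of points, the "points to double the odds" (PDO), always doubles the good:bad odds. Below is a minimal sketch of that scaling convention. The base score of 600 at 50:1 odds and the PDO of 20 are illustrative assumptions. In practice, the default probabilities would come from a statistical model such as logistic regression.

```python
import math

def make_scaler(base_score=600, base_odds=50.0, pdo=20):
    """Map good:bad odds to points so that every `pdo` points doubles the odds."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return lambda odds: offset + factor * math.log(odds)

def odds_from_probability(p_default):
    """Good:bad odds implied by a predicted probability of default."""
    return (1 - p_default) / p_default

score = make_scaler()
for p in (0.05, 0.02, 0.01):
    odds = odds_from_probability(p)
    print(f"P(default)={p:.2f} -> odds {odds:.0f}:1 -> score {score(odds):.0f}")
```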

Forecasting
Forecasting is the “art” of predicting the future. Given prevailing conditions, and within bounds that can and will be influenced, an estimate of future demand is derived. Sales targets should follow forecasts, never the other way around: targets should take into account which attainable conditions need to be created to succeed.
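
One of the simplest ways to derive an estimate of future demand from history is simple exponential smoothing, which weights recent observations more heavily. Below is a minimal sketch. The demand series and the smoothing constant alpha = 0.3 are hypothetical, and the method assumes the series has no trend or seasonality.

```python
def exponential_smoothing(history, alpha=0.3):
    """Simple exponential smoothing: returns the one-step-ahead forecast."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for observation in history[1:]:
        # New level = weighted blend of the latest observation and the previous level.
        level = alpha * observation + (1 - alpha) * level
    return level

monthly_demand = [120, 132, 128, 141, 150, 147]
print(round(exponential_smoothing(monthly_demand), 1))  # 139.2
```

A larger alpha reacts faster to recent changes but also passes on more noise.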

Web Usage Analysis
Web usage analysis is one of the frontiers in data analysis. Because every mouse click and every page view is recorded, together with when it happened, one can follow in minute detail every step a person takes on the web. At the moment, we still grossly lack the conceptual models needed to fully exploit the richness these data might offer. Not only which pages are viewed together, but also the navigation paths visitors choose, can offer tremendous insight.
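
A basic step in analyzing navigation paths is to split the click log into sessions and count page-to-page transitions. Below is a minimal sketch. The click log is hypothetical, and the 30-minute inactivity timeout is a common convention rather than a standard.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed convention

clicks = [  # hypothetical click log: (visitor, timestamp, page)
    ("v1", datetime(2007, 5, 1, 10, 0), "/home"),
    ("v1", datetime(2007, 5, 1, 10, 2), "/products"),
    ("v1", datetime(2007, 5, 1, 10, 5), "/checkout"),
    ("v2", datetime(2007, 5, 1, 11, 0), "/home"),
    ("v2", datetime(2007, 5, 1, 11, 1), "/products"),
    ("v1", datetime(2007, 5, 1, 15, 0), "/home"),
]

def sessionize(log, timeout=SESSION_TIMEOUT):
    """Split each visitor's clicks into sessions at gaps longer than `timeout`."""
    by_visitor = defaultdict(list)
    for visitor, ts, page in sorted(log):  # sorts by visitor, then time
        by_visitor[visitor].append((ts, page))
    sessions = []
    for visits in by_visitor.values():
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions

paths = sessionize(clicks)
transitions = Counter(pair for path in paths for pair in zip(path, path[1:]))
print(paths)
print(transitions.most_common(2))
```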

Customer Profitability
Measuring and understanding customer profitability at the individual level enables a firm to appreciate how relationship value is distributed across its customers, so it can allocate resources accordingly. The value of a firm can be seen as the aggregate value of its customer relationships. Hence, the search for shareholder value is akin to managing a portfolio of customers.
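
To see how relationship value is distributed, rank customers by profit and track the cumulative share of total profit. Below is a minimal sketch with hypothetical revenue and cost-to-serve figures. Note how the cumulative share can exceed 100% before unprofitable customers bring it back down.

```python
# Hypothetical per-customer figures: (revenue, fully allocated cost-to-serve)
customers = {
    "A": (5000, 1500),
    "B": (1200, 1400),
    "C": (3000, 900),
    "D": (800, 1000),
    "E": (2000, 700),
}

def profit_distribution(data):
    """Customers ranked by profit, with the running share of total profit."""
    profits = sorted(
        ((cid, revenue - cost) for cid, (revenue, cost) in data.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    total = sum(profit for _, profit in profits)
    running = 0
    ranked = []
    for cid, profit in profits:
        running += profit
        ranked.append((cid, profit, running / total))
    return ranked

for cid, profit, share in profit_distribution(customers):
    print(f"{cid}: profit {profit:>6}, cumulative share {share:.0%}")
```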

Problem Analysis
Problems manifest themselves as the discrepancy between an “as is” state and an “as it should be” desire. The question “What’s the problem?” should be complemented with “Who has a problem?” and “Why is that a problem?” to deepen understanding. Different people suffer in different ways from the same “situation”. For every person or party, one must find out what the essence of their problem is. This will often reveal many facets of the same “problem”.

Customer Satisfaction & Loyalty
Customer satisfaction is an important driver of business success because it embodies value creation for the customer. The assumption is that satisfaction results in repeat business and that, as positive experiences accumulate, the relationship will strengthen. This makes customers less susceptible to competing offers and lets the firm avoid competing on price.

IT Governance
IT Governance is about encouraging and leveraging creative powers throughout the enterprise, while at the same time ensuring compliance with the company's strategic direction and policies. This conundrum can be resolved by installing the appropriate decision structures. Good IT Governance simultaneously empowers and controls. In short, IT Governance keeps resources productive and aligned.

Market Research
Market research can deal with people's behavior as well as their attitudes or opinions. Research methods can be arranged on a scale from subjective (e.g. self-report questionnaires) to objective (observation or behavioral data).

Search Engines
Search engines are the window to the web. In May 2004 there were 50 million websites; by October 2006 this number had doubled to 100 million (Netcraft research). Google recently counted 8 billion unique pages. Given some 6.5 billion searches per month, it becomes clear how important search engines are for organizing and accessing these oceans of data.
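
To put the search volume in perspective, here is a quick back-of-the-envelope calculation, assuming a 30-day month:

```latex
\frac{6.5 \times 10^{9}\ \text{searches/month}}{30 \times 86\,400\ \text{s/month}}
  = \frac{6.5 \times 10^{9}}{2\,592\,000}\ \text{searches/s}
  \approx 2\,500\ \text{searches per second}
```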

Marketing Accountability
Marketing accountability refers to a fundamentally new way of viewing marketing expenses. Whereas the marketing budget used to be seen as a cost, nowadays it is rather seen as an investment in the relationship with customers. With this new perspective, marketing expenses have come under the same kind of scrutiny we apply to other investment decisions: what's the ROI?
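
Here is a minimal sketch of the ROI question applied to a single campaign. It assumes ROI is measured on incremental margin: the extra sales attributable to the campaign (for example, measured against a control group), multiplied by the gross margin rate. That way, sales that would have happened anyway are not credited to marketing. All figures are hypothetical.

```python
def marketing_roi(incremental_revenue, gross_margin_rate, campaign_cost):
    """ROI on incremental contribution: (margin earned - cost) / cost."""
    incremental_margin = incremental_revenue * gross_margin_rate
    return (incremental_margin - campaign_cost) / campaign_cost

# Hypothetical campaign: EUR 50,000 extra sales at a 40% margin, costing EUR 15,000.
print(f"{marketing_roi(50_000, 0.40, 15_000):.0%}")  # 33%
```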

CRM
CRM, Customer Relationship Management, is a business strategy. CRM gained interest at a time when customer-centric marketing became the fashionable thing to do. It is an attempt by large corporations to mimic the customer intimacy that small-scale suppliers can offer because they understand their customers’ needs.

Data Mining Models
What is a model? A model is a purposeful simplification of reality. Models can take on many forms: a built-to-scale replica, a mathematical equation, a spreadsheet, a person, a scene, and many more. In all cases, the model captures only part of reality; that’s why it’s a simplification. And in all cases, the way one reduces the complexity of real life is chosen with a purpose: to focus on particular characteristics at the expense of losing extraneous detail.
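
A fitted straight line is about the simplest mathematical model there is: it deliberately reduces the data to an intercept and a slope. Below is a minimal ordinary-least-squares sketch with hypothetical data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x: a deliberately simple model of the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: advertising spend (k EUR) versus sales (k EUR)
spend = [1, 2, 3, 4, 5]
sales = [2.9, 5.2, 6.8, 9.1, 11.0]
intercept, slope = fit_line(spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")
```

The line ignores everything except the linear trend (the scatter around it, seasonality, competitor actions). That is precisely the purposeful loss of detail that makes it a model.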

Privacy
Privacy is one of those topics that nobody cares about until their own privacy is violated. Privacy threats have been compared to George Orwell’s “1984”, in which a totalitarian regime crushed individual freedom. Nowadays, the privacy threat doesn’t come from communist states but from capitalism, free markets, the exchange of digital information and the smart use of advanced technology.

Data Warehousing
|
Data Quality
Data quality gives a competitive edge. Everybody agrees on how important good data quality is. And everybody has agonized over erroneous data. We've all lost a lot of time working with crappy data, and "Garbage In, Garbage Out" is probably the most commonly cited proverb in IT. Then how come it is always so hard to find volunteers to do something about it?


