The digital revolution has made electronic information easy to capture, process, store, distribute, and transmit. With significant progress in digitization, most organizations are continuously collecting huge amounts of data. These data have diverse characteristics and are stored in databases.
The rate at which such data are stored is growing phenomenally. Knowledgeable sources reveal that about 160 terabytes of information are produced each year worldwide.
With this growth in electronic information, most organizations have realized that the information stored or gathered over the years makes up an important strategic asset, and that there is substantial policymaking intelligence hidden in the large volumes of data.
This intelligence can be the secret resource on which the success of an organization may depend.
It is thus imperative to develop techniques for discovering policymaking information in these mountains of accumulated data. The field of data mining provides such techniques.
What is Data Mining?
Data mining is often defined as finding hidden information in a database. Alternatively, it has been called exploratory data analysis, data-driven discovery, and deductive learning.
The term data mining describes the concept of discovering knowledge from databases using powerful computers.
It is a broad term that applies to many different forms of analysis. At its core, data mining is the process of identifying valid, novel, useful, and ultimately understandable patterns in data.
Data Mining Background
The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes’ theorem (1700s) and regression analysis (1800s).
As data sets have grown in size and complexity, direct hands-on data analysis has increasingly been augmented with indirect, automatic data processing.
This has been aided by other discoveries in computer science, such as neural networks, clustering, and genetic algorithms (1950s), decision trees (1960s), and support vector machines (1990s).
Data mining is the process of applying these methods to data to uncover hidden patterns.
Data mining derives its name from the similarities between searching for valuable and indispensable information in a large database and mining a mountain for a vein of valuable ore.
Both processes require either sifting through an immense amount of material to discover a profitable vein or intelligently probing it to find where the value resides.
Data mining is a useful tool, a new approach that combines discovery with analysis. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions.
Data mining tools can answer business questions that traditionally were too time consuming to resolve.
They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.
What Can Data Mining Do?
Data mining is primarily used today by companies with a strong consumer focus – retail, financial, communication, and marketing organizations.
It enables these companies to determine relationships among “internal” factors such as price, product positioning, or staff skills, and “external” factors such as economic indicators, competition, and customer demographics.
It also enables them to determine the impact of these factors on sales, customer satisfaction, and corporate profits.
Finally, it enables them to “drill down” into summary information to view detailed transactional data.
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual’s purchase history.
By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.
Elements of Data Mining
Data mining consists of five major elements:
- Extract, transform and load transaction data onto the data warehouse system.
- Store and manage the data in a multidimensional database system.
- Provide data access to business analysts and information technology professionals.
- Analyze the data by application software.
- Present the data in a useful format, such as a graph or table.
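The five elements above can be sketched end to end on a toy scale. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the data warehouse, and the `RAW_SALES` records and the function names `transform`, `load`, `analyze`, and `present` are hypothetical, not part of any particular product.

```python
import sqlite3

# Hypothetical raw point-of-sale records: (store, product, amount as text).
RAW_SALES = [
    ("north", "Tent", "120.00"),
    ("north", "tent ", "80.50"),
    ("south", "Stove", "45.25"),
    ("south", "Tent", "not-a-number"),  # unparseable row, dropped in transform
]


def transform(rows):
    """Normalize product names and parse amounts, skipping unparseable rows."""
    clean = []
    for store, product, amount in rows:
        try:
            clean.append((store, product.strip().lower(), float(amount)))
        except ValueError:
            continue
    return clean


def load(conn, rows):
    """Load cleaned rows into one table that stands in for the warehouse."""
    conn.execute("CREATE TABLE sales (store TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def analyze(conn):
    """Return total revenue per product, highest first."""
    query = (
        "SELECT product, SUM(amount) AS revenue FROM sales "
        "GROUP BY product ORDER BY revenue DESC"
    )
    return conn.execute(query).fetchall()


def present(results):
    """Format the analysis results as a plain-text table."""
    lines = [f"{'product':<10}{'revenue':>10}"]
    lines += [f"{product:<10}{revenue:>10.2f}" for product, revenue in results]
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
load(conn, transform(RAW_SALES))
summary = analyze(conn)
print(present(summary))
```

A real deployment would separate these stages across systems; here they are collapsed into one script only to show how the stages hand data to one another.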
Applications of Data Mining
Data-mining technology provides two unique capabilities to the researcher or manager: pattern discovery and the prediction of trends and behavior. Data-mining tools perform exploratory and confirmatory statistical analysis to discover and validate relationships.
These tools even extend confirmatory statistical approaches by allowing the automated examination of a large number of hypotheses. The type of data available and the nature of the information sought determine which of the numerous data-mining techniques to select.
Data mining is being used for a wide variety of applications.
For businesses, data mining is used to discover patterns and relationships in the data to help make better business decisions.
The example of a credit card company with large volumes of data illustrates a data mining application known as customer discovery. The credit card company will probably gather such information as the age, gender, number of children, job status, income level, and past credit history of each customer.
Very often, the data about these background characteristics of the customers will be mined to find the patterns that make a particular individual a good or bad credit risk.
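As a rough illustration of mining customer attributes for credit-risk patterns, the sketch below fits a one-level decision tree (a "decision stump") to a single numeric attribute. The `CUSTOMERS` records, the attribute choice, and the function name `best_stump` are all hypothetical; a real scoring model would use far more data and a validated method.

```python
# Hypothetical training records:
# (annual income in thousands, number of past defaults, known label)
CUSTOMERS = [
    (25, 2, "bad"),
    (32, 1, "bad"),
    (40, 0, "good"),
    (58, 0, "good"),
    (61, 3, "bad"),
    (75, 0, "good"),
]


def best_stump(records, feature_index):
    """Find the threshold on one numeric feature with the fewest errors.

    Returns (threshold, label_if_at_or_below, label_if_above, errors).
    """
    best = None
    for threshold in sorted({r[feature_index] for r in records}):
        # Try both orientations of the rule at this threshold.
        for low, high in (("good", "bad"), ("bad", "good")):
            errors = sum(
                1
                for r in records
                if (low if r[feature_index] <= threshold else high) != r[-1]
            )
            if best is None or errors < best[3]:
                best = (threshold, low, high, errors)
    return best


income_rule = best_stump(CUSTOMERS, 0)
defaults_rule = best_stump(CUSTOMERS, 1)
print("income rule:", income_rule)
print("defaults rule:", defaults_rule)
```

On this toy data, the past-defaults attribute separates the two groups with no errors, while income alone misclassifies one customer, which is the kind of comparison a mining tool automates across many attributes.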
Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Data mining tools sweep through databases and identify previously hidden patterns.
An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data-entry keying errors.
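One simple way to flag anomalous values such as possible keying errors is a robust outlier test. The sketch below uses the modified z-score based on the median absolute deviation, with the commonly used cutoff of 3.5; this technique is chosen here for illustration and is not one the text prescribes. The `amounts` values and the name `flag_outliers` are hypothetical.

```python
from statistics import median


def flag_outliers(values, cutoff=3.5):
    """Return values whose modified z-score exceeds the cutoff.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # All deviations collapse to zero; this simple test cannot rank them.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > cutoff]


# Hypothetical transaction amounts; 2075.00 looks like 20.75 keyed without
# the decimal point.
amounts = [19.99, 21.50, 20.75, 18.20, 2075.00, 22.10, 19.40]
suspects = flag_outliers(amounts)
print(suspects)
```

A flagged value is only a candidate for review; whether it is an error or a legitimate large transaction still requires a human or a second check.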
Some of the specific uses of data mining in business and other areas are as follows:
- Market segmentation: Data mining helps identify the common characteristics of customers who buy the same products from your company, and this knowledge can be used to develop targeted marketing programs.
- Customer churn: Data mining may be used to predict which customers are likely to leave your company and go to a competitor.
- Business transactions: Today, businesses are consolidating, and more and more businesses have millions of customers and billions of transactions. They need to understand risks (whether a transaction is fraudulent, whether customers will pay) and opportunities (expected profit, how likely a customer is to buy). Data mining plays an important role here.
- Marketing: It helps marketers discover distinct groups in their customer base, and they use this knowledge to develop targeted marketing programs.
- Website or web-store design and promotion: Data mining finds the affinity of visitors to particular web pages, and the layout can then be modified accordingly.
- Fraud detection: It identifies which transactions are most likely to be fraudulent.
- Security: It may be used in face-recognition, identification, biometrics, etc.
- Medicine and health care: It helps determine disease outcomes and the effectiveness of treatments by analyzing patient disease histories to find relationships between diseases.
- Direct marketing: Data mining identifies which prospects should be included in a mailing list to obtain the highest response rate.
- Interactive marketing: It is useful in predicting what each individual accessing a website is most likely interested in seeing.
- Market basket analysis: It helps to understand what products or services are commonly purchased together, e.g., beer and diapers.
- Trend analysis: It reveals the difference between typical customers this month and last.
- Multimedia retrieval: It searches for and identifies images, video, voice, and text in multimedia databases, which may be compressed.
- Land use: It may be used in the identification of areas of similar land use in an earth observation database.
- Scientific data analysis: It may be used to identify new galaxies by searching for sub-clusters.
- City planning: It identifies groups of houses according to their house type, value, and geographical location.
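To make the market basket item in the list above concrete, the following sketch counts how often pairs of products appear in the same transaction and reports each pair's support (the fraction of baskets containing both items). The `BASKETS` data and the function names `pair_support` and `frequent_pairs` are hypothetical; production tools use more scalable algorithms than this exhaustive pair count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of products per basket.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers", "milk"},
    {"bread", "chips"},
]


def pair_support(baskets):
    """Map each product pair to the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) form.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n / len(baskets) for pair, n in counts.items()}


def frequent_pairs(baskets, min_support):
    """Return pairs with support >= min_support, most frequent first."""
    support = pair_support(baskets)
    return sorted(
        (pair for pair, s in support.items() if s >= min_support),
        key=lambda pair: (-support[pair], pair),
    )


print(frequent_pairs(BASKETS, min_support=0.5))
```

Raising or lowering `min_support` trades off between reporting only the strongest co-purchases and surfacing rarer ones that may be noise.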
In recent years, data mining has been used in the area of science and engineering, such as bioinformatics, genetics, education, and electric power engineering.
In the area of study on human genetics, the data mining technique is used to find out how the changes in an individual’s DNA sequence affect the risk of developing common diseases such as cancer.
This is very important to help improve the diagnosis, prevention, and treatment of the diseases. The data mining technique that is used to perform this task is known as multifactor dimensionality reduction.
In the area of electrical power engineering, data mining techniques have been widely used for condition monitoring of high voltage electrical equipment.
The purpose of condition monitoring is to obtain valuable information on the health status of the equipment’s insulation.
Data mining techniques have also been applied to dissolved gas analysis (DGA) on power transformers. DGA, as a diagnostic for power transformers, has been available for many years.
Data mining techniques such as the self-organizing map (SOM) have been applied to analyze data and to determine trends that are not obvious to the standard DGA ratio techniques such as the Duval Triangle.
Another area of application for data mining in science and engineering is educational research, where data mining has been used to study the factors leading students to choose to engage in behaviors that reduce their learning and to understand the factors influencing university student retention.
How does data mining work?
While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two.
Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries.
Several types of analytical software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships is sought:
- Classification: Stored data are used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
- Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
- Associations: Data can be mined to identify associations. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
- Sequential patterns: Data are mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer’s purchase of sleeping bags and hiking shoes.
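The outdoor-retailer example in the last item can be sketched as a simple frequency estimate: among customers who have bought both a sleeping bag and hiking shoes, what share went on to buy a backpack afterwards? The `HISTORIES` data and the function name `follow_rate` are hypothetical, and this counting approach is a deliberately minimal stand-in for dedicated sequential-pattern algorithms.

```python
# Hypothetical purchase histories, each ordered from oldest to newest.
HISTORIES = {
    "ann": ["sleeping bag", "hiking shoes", "backpack"],
    "bob": ["hiking shoes", "sleeping bag", "stove", "backpack"],
    "cy": ["sleeping bag", "hiking shoes"],
    "dee": ["backpack", "sleeping bag", "hiking shoes"],
    "eve": ["stove", "backpack"],
}


def follow_rate(histories, antecedents, consequent):
    """Share of customers who bought the consequent after the antecedents.

    Only customers who bought every antecedent item are counted, and the
    consequent must come after the antecedent set was completed.
    """
    qualified = followed = 0
    for purchases in histories.values():
        if not all(item in purchases for item in antecedents):
            continue
        qualified += 1
        # Index at which the last antecedent item was first bought.
        completed_at = max(purchases.index(item) for item in antecedents)
        if consequent in purchases[completed_at + 1:]:
            followed += 1
    return followed / qualified if qualified else 0.0


rate = follow_rate(HISTORIES, {"sleeping bag", "hiking shoes"}, "backpack")
print(f"backpack follow rate: {rate:.0%}")
```

Note that the customer who bought the backpack first does not count toward the rate, which is what distinguishes a sequential pattern from a plain association.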