Data Generalization In Data Mining
From a data analysis perspective, data mining can be categorized into two classes:
- Descriptive mining: It describes the data set in a concise, summarized manner and presents interesting general properties of the data.
- Predictive mining: It analyzes the data to construct one or more models, and attempts to predict the behavior of new data sets.
Databases often store a considerable amount of data in great detail.
However, users usually prefer to view sets of summarized data in concise, descriptive terms.
Such data descriptions may provide an overall picture of a class of data or distinguish it from a set of comparative classes.
Such descriptive data mining is known as concept description and forms an important component of data mining.
What Is Concept Description
The simplest form of descriptive data mining is known as concept description.
A concept usually refers to a collection of data such as frequent_buyers, graduate_students, etc.
As a data mining task, concept description is not a simple enumeration of the data.
Instead, concept description generates descriptions for characterization and comparison of the data.
It is sometimes called class description when the concept to be described refers to a class of objects.
- Characterization: It provides a succinct summarization of the given collection of data.
- Comparison: It provides descriptions comparing two or more collections of data.
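The difference between characterization and comparison can be illustrated with a minimal sketch. The records, the `characterize` and `compare` helpers, and the choice of "major" as the summarized attribute are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical records: (class label, major). The values are made up.
students = [
    ("graduate", "CS"),
    ("graduate", "CS"),
    ("graduate", "Physics"),
    ("undergraduate", "CS"),
    ("undergraduate", "Biology"),
    ("undergraduate", "Biology"),
]

def characterize(records, target):
    """Summarize one class: share of each major among records of that class."""
    majors = Counter(major for label, major in records if label == target)
    total = sum(majors.values())
    return {major: count / total for major, count in majors.items()}

def compare(records, target, contrast):
    """Put the summaries of two classes side by side, one entry per major."""
    a = characterize(records, target)
    b = characterize(records, contrast)
    return {m: (a.get(m, 0.0), b.get(m, 0.0)) for m in sorted(set(a) | set(b))}

graduate_summary = characterize(students, "graduate")
comparison = compare(students, "graduate", "undergraduate")
print(graduate_summary)
print(comparison)
```

Characterization looks at one collection only, while comparison needs at least two collections described over the same attributes.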
Data Generalization & Summarization
Data and objects in databases often contain detailed information at the primitive concept level.
For example, the item relation in a sales database may contain attributes describing low-level item information such as item_ID, name, brand, category, supplier, place_made, and price.
It is useful to be able to summarize a large set of data and present it at a high conceptual level.
For example, summarizing a large set of items relating to Christmas season sales provides a general description of such data, which can be very helpful for sales and marketing managers.
This requires an important functionality called data generalization.
Data Generalization in Data Mining
- A process that abstracts a large set of task-relevant data in a database from a low conceptual level to higher ones.
- Data generalization is a summarization of the general features of objects in a target class and produces what are called characteristic rules.
- The data relevant to a user-specified class are normally retrieved by a database query and run through a summarization module to extract the essence of the data at different levels of abstraction.
For example, one may want to characterize the OurVideoStore customers who regularly rent more than 30 movies a year.
With concept hierarchies on the attributes describing the target class, the attribute-oriented induction method can be used, for example, to carry out data summarization.
Note that with a data cube containing a summarization of data, simple OLAP operations fit the purpose of data characterization.
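The core idea of attribute-oriented induction can be sketched as follows: each attribute value is replaced by a higher-level concept from its hierarchy, and tuples that become identical are merged while a count is accumulated. This is a simplified sketch, not a full implementation of the method. The customer tuples, the city-to-country hierarchy, the age ranges, and the decision to drop `movies_rented` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical task-relevant tuples for the target class: (city, age, movies_rented).
customers = [
    ("Vancouver", 23, 41),
    ("Toronto", 27, 35),
    ("Seattle", 45, 52),
    ("Vancouver", 29, 38),
    ("Chicago", 51, 33),
]

# Assumed concept hierarchy: city -> country.
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Seattle": "USA",
    "Chicago": "USA",
}

def age_range(age):
    """Assumed concept hierarchy for a numeric attribute: age -> coarse range."""
    if age < 30:
        return "under_30"
    if age < 50:
        return "30_to_49"
    return "50_plus"

def generalize(rows):
    """Climb each attribute one level, drop movies_rented, and merge identical tuples with a count."""
    generalized = Counter()
    for city, age, _movies in rows:
        generalized[(CITY_TO_COUNTRY[city], age_range(age))] += 1
    return generalized

generalized_relation = generalize(customers)
for (country, ages), count in sorted(generalized_relation.items()):
    print(country, ages, count)
```

Five detailed tuples collapse into three generalized tuples, each carrying the number of original tuples it covers.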
Approaches:
- Data cube approach (OLAP approach).
- Attribute-oriented induction approach.
Presentation Of Generalized Results
Generalized Relation:
- Relations where some or all attributes are generalized, with counts or other aggregation values accumulated.
Cross-Tabulation:
- Mapping results into cross-tabulation form (similar to contingency tables).
Visualization Techniques:
- Pie charts, bar charts, curves, cubes, and other visual forms.
Quantitative characteristic rules:
- Mapping generalized results into characteristic rules with quantitative information associated with them.
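Two of these presentation forms can be derived from the same generalized relation, as the following sketch suggests. The relation, the `cross_tab` and `quantitative_rule` helpers, the class name `frequent_renter`, and the textual rule format are hypothetical. The percentage attached to each disjunct is simply the share of the class covered by that generalized tuple, which is one plausible choice of quantitative information.

```python
# Hypothetical generalized relation: (country, age_range) -> count.
generalized_relation = {
    ("Canada", "under_30"): 3,
    ("Canada", "30_plus"): 1,
    ("USA", "under_30"): 2,
    ("USA", "30_plus"): 4,
}

def cross_tab(relation):
    """Pivot (row, column) -> count pairs into a nested dict with row and column totals."""
    rows = sorted({r for r, _ in relation})
    cols = sorted({c for _, c in relation})
    table = {r: {c: relation.get((r, c), 0) for c in cols} for r in rows}
    for r in rows:
        table[r]["total"] = sum(table[r][c] for c in cols)
    table["total"] = {c: sum(table[r][c] for r in rows) for c in cols + ["total"]}
    return table

def quantitative_rule(relation, class_name):
    """Render each generalized tuple as a disjunct annotated with its share of the class."""
    total = sum(relation.values())
    parts = [
        f"(country = {country} AND age = {ages}) [{count / total:.0%}]"
        for (country, ages), count in sorted(relation.items())
    ]
    return f"{class_name}(x) => " + " OR ".join(parts)

table = cross_tab(generalized_relation)
rule = quantitative_rule(generalized_relation, "frequent_renter")
print(table)
print(rule)
```

The cross-tabulation keeps the raw counts and their totals, while the rule form trades the table layout for a single readable statement about the class.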
Data Cube Approach
It amounts to performing computations and storing the results in data cubes.
Strengths
- An efficient implementation of data generalization.
- Computation of various kinds of measures, e.g., count( ), sum( ), average( ), max( ).
- Generalization and specialization can be performed on a data cube by roll-up and drill-down.
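Roll-up and drill-down can be illustrated with a small sketch that aggregates the same fact rows at different levels of detail. The fact rows, the dimension names, and the `aggregate` helper are hypothetical. Unlike a real data cube, which stores precomputed results, this sketch recomputes each aggregate from the base rows for simplicity.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, city, quarter, sales amount).
facts = [
    ("Canada", "Vancouver", "Q1", 100),
    ("Canada", "Vancouver", "Q2", 150),
    ("Canada", "Toronto", "Q1", 200),
    ("USA", "Seattle", "Q1", 300),
    ("USA", "Seattle", "Q2", 250),
]

DIMENSIONS = ("country", "city", "quarter")

def aggregate(rows, group_by):
    """Compute count, sum, avg and max of sales for each combination of the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[i] for i in idx)].append(row[-1])
    return {
        key: {"count": len(v), "sum": sum(v), "avg": sum(v) / len(v), "max": max(v)}
        for key, v in groups.items()
    }

# Detailed view at the city/quarter level.
by_city_quarter = aggregate(facts, ("city", "quarter"))
# Roll-up: climb from city to country and drop the quarter dimension.
by_country = aggregate(facts, ("country",))
# Drill-down: go from country back to the more detailed country/city level.
by_country_city = aggregate(facts, ("country", "city"))

print(by_country)
```

Rolling up reduces the number of groups and generalizes the data, while drilling down adds a dimension back and specializes it.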
Limitations
- It handles only dimensions of simple non-numeric data and measures of simple aggregated numeric values.
- It lacks intelligent analysis: it cannot tell which dimensions should be used or what levels the generalization should reach.
Summary
- Data generalization in data mining is the process that abstracts a large set of task-relevant data in a database from a low conceptual level to higher ones.
- It is a summarization of the general features of objects in a target class and produces what are called characteristic rules.