Product Description
About the Authors
Pang-Ning Tan is currently an assistant professor in the Department of Computer Science and Engineering at Michigan State University, where he teaches courses on data mining and database systems. His research focuses on developing data mining algorithms suitable for a broad range of applications, including medical informatics, earth science, social networks, Web mining, and computer security.
Michael Steinbach holds a bachelor's degree in mathematics, a master's degree in statistics, and a Ph.D. in computer science, all from the University of Minnesota. He is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities.
Vipin Kumar is currently the head of the Department of Computer Science and Engineering and the William Norris Professor at the University of Minnesota. From 1998 to 2005, he served as Director of the Army High Performance Computing Research Center.
Table of Contents
1 Introduction
1.1 What Is Data Mining?
1.2 Motivating Challenges
1.3 The Origins of Data Mining
1.4 Data Mining Tasks
1.5 Scope and Organization of the Book
1.6 Bibliographic Notes
1.7 Exercises
2 Data
2.1 Types of Data
2.1.1 Attributes and Measurement
2.1.2 Types of Data Sets
2.2 Data Quality
2.2.1 Measurement and Data Collection Issues
2.2.2 Issues Related to Applications
2.3 Data Preprocessing
2.3.1 Aggregation
2.3.2 Sampling
2.3.3 Dimensionality Reduction
2.3.4 Feature Subset Selection
2.3.5 Feature Creation
2.3.6 Discretization and Binarization
2.3.7 Variable Transformation
2.4 Measures of Similarity and Dissimilarity
2.4.1 Basics
2.4.2 Similarity and Dissimilarity between Simple Attributes
2.4.3 Dissimilarities between Data Objects
2.4.4 Similarities between Data Objects
2.4.5 Examples of Proximity Measures
2.4.6 Issues in Proximity Calculation
2.4.7 Selecting the Right Proximity Measure
2.5 Bibliographic Notes
2.6 Exercises
3 Exploring Data
3.1 The Iris Data Set
3.2 Summary Statistics
3.2.1 Frequencies and the Mode
3.2.2 Percentiles
3.2.3 Measures of Location: Mean and Median
3.2.4 Measures of Spread: Range and Variance
3.2.5 Multivariate Summary Statistics
3.2.6 Other Ways to Summarize the Data
3.3 Visualization
3.3.1 Motivations for Visualization
3.3.2 General Concepts
3.3.3 Techniques
3.3.4 Visualizing Higher-Dimensional Data
3.3.5 Do's and Don'ts
3.4 OLAP and Multidimensional Data Analysis
3.4.1 Representing Iris Data as a Multidimensional Array
3.4.2 Multidimensional Data: The General Case
3.4.3 Analyzing Multidimensional Data
3.4.4 Final Comments on Multidimensional Data Analysis
3.5 Bibliographic Notes
3.6 Exercises
4 Classification: Basic Concepts, Decision Trees, and Model Evaluation
4.1 Preliminaries
4.2 General Approach to Solving a Classification Problem
4.3 Decision Tree Induction
4.3.1 How a Decision Tree Works
4.3.2 How to Build a Decision Tree
4.3.3 Methods for Expressing Attribute Test Conditions
4.3.4 Measures for Selecting the Best Split
4.3.5 Algorithm for Decision Tree Induction
4.3.6 An Example: Web Robot Detection
4.3.7 Characteristics of Decision Tree Induction
4.4 Model Overfitting
4.4.1 Overfitting Due to Presence of Noise
4.4.2 Overfitting Due to Lack of Representative Samples
4.4.3 Overfitting and the Multiple Comparison Procedure
4.4.4 Estimation of Generalization Errors
4.4.5 Handling Overfitting in Decision Tree Induction
4.5 Evaluating the Performance of a Classifier
4.5.1 Holdout Method
4.5.2 Random Subsampling
4.5.3 Cross-Validation
4.5.4 Bootstrap
4.6 Methods for Comparing Classifiers
4.6.1 Estimating a Confidence Interval for Accuracy
4.6.2 Comparing the Performance of Two Models
4.6.3 Comparing the Performance of Two Classifiers
4.7 Bibliographic Notes
4.8 Exercises
5 Classification: Alternative Techniques
5.1 Rule-Based Classifier
5.1.1 How a Rule-Based Classifier Works
5.1.2 Rule-Ordering Schemes
5.1.3 How to Build a Rule-Based Classifier
5.1.4 Direct Methods for Rule Extraction
5.1.5 Indirect Methods for Rule Extraction
5.1.6 Characteristics of Rule-Based Classifiers
5.2 Nearest-Neighbor Classifiers
5.2.1 Algorithm
5.2.2 Characteristics of Nearest-Neighbor Classifiers
5.3 Bayesian Classifiers
5.3.1 Bayes Theorem
5.3.2 Using the Bayes Theorem for Classification
5.3.3 Naive Bayes Classifier
5.3.4 Bayes Error Rate
5.3.5 Bayesian Belief Networks
5.4 Artificial Neural Network (ANN)
5.4.1 Perceptron
5.4.2 Multilayer Artificial Neural Network
5.4.3 Characteristics of ANN
5.5 Support Vector Machine (SVM)
5.5.1 Maximum Margin Hyperplanes
5.5.2 Linear SVM: Separable Case
5.5.3 Linear SVM: Nonseparable Case
5.5.4 Nonlinear SVM
5.5.5 Characteristics of SVM
5.6 Ensemble Methods
5.6.1 Rationale for Ensemble Method
5.6.2 Methods for Constructing an Ensemble Classifier
5.6.3 Bias-Variance Decomposition
5.6.4 Bagging
5.6.5 Boosting
5.6.6 Random Forests
5.6.7 Empirical Comparison among Ensemble Methods
5.7 Class Imbalance Problem
5.7.1 Alternative Metrics
5.7.2 The Receiver Operating Characteristic Curve
5.7.3 Cost-Sensitive Learning
5.7.4 Sampling-Based Approaches
5.8 Multiclass Problem
5.9 Bibliographic Notes
5.10 Exercises
6 Association Analysis: Basic Concepts and Algorithms
6.1 Problem Definition
6.2 Frequent Itemset Generation
6.2.1 The Apriori Principle
6.2.2 Frequent Itemset Generation in the Apriori Algorithm
6.2.3 Candidate Generation and Pruning
6.2.4 Support Counting
6.2.5 Computational Complexity
6.3 Rule Generation
6.3.1 Confidence-Based Pruning
6.3.2 Rule Generation in Apriori Algorithm
6.3.3 An Example: Congressional Voting Records
6.4 Compact Representation of Frequent Itemsets
6.4.1 Maximal Frequent Itemsets
6.4.2 Closed Frequent Itemsets
6.5 Alternative Methods for Generating Frequent Itemsets
6.6 FP-Growth Algorithm
……