Classification, Parameter Estimation and State Estimation
An Engineering Approach using MATLAB
F. van der Heijden
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
The Netherlands
R.P.W. Duin
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
The Netherlands
D. de Ridder
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
The Netherlands
D.M.J. Tax
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology
The Netherlands
Contents
Preface xi
Foreword xv
1 Introduction 1
1.1 The scope of the book 2
1.1.1 Classification 3
1.1.2 Parameter estimation 4
1.1.3 State estimation 5
1.1.4 Relations between the subjects 6
1.2 Engineering 9
1.3 The organization of the book 11
1.4 References 12
2 Detection and Classification 13
2.1 Bayesian classification 16
2.1.1 Uniform cost function and minimum error rate 23
2.1.2 Normal distributed measurements; linear
and quadratic classifiers 25
2.2 Rejection 32
2.2.1 Minimum error rate classification with
reject option 33
2.3 Detection: the two-class case 35
2.4 Selected bibliography 43
2.5 Exercises 43
3 Parameter Estimation 45
3.1 Bayesian estimation 47
3.1.1 MMSE estimation 543.1.2 MAP estimation 55
3.1.3 The Gaussian case with linear sensors 56
3.1.4 Maximum likelihood estimation 57
3.1.5 Unbiased linear MMSE estimation 59
3.2 Performance of estimators 62
3.2.1 Bias and covariance 63
3.2.2 The error covariance of the unbiased linear
MMSE estimator 67
3.3 Data fitting 68
3.3.1 Least squares fitting 68
3.3.2 Fitting using a robust error norm 72
3.3.3 Regression 74
3.4 Overview of the family of estimators 77
3.5 Selected bibliography 79
3.6 Exercises 79
4 State Estimation 81
4.1 A general framework for online estimation 82
4.1.1 Models 83
4.1.2 Optimal online estimation 86
4.2 Continuous state variables 88
4.2.1 Optimal online estimation in linear-Gaussian
systems 89
4.2.2 Suboptimal solutions for nonlinear
systems 100
4.2.3 Other filters for nonlinear systems 112
4.3 Discrete state variables 113
4.3.1 Hidden Markov models 113
4.3.2 Online state estimation 117
4.3.3 Offline state estimation 120
4.4 Mixed states and the particle filter 128
4.4.1 Importance sampling 128
4.4.2 Resampling by selection 130
4.4.3 The condensation algorithm 131
4.5 Selected bibliography 135
4.6 Exercises 136
5 Supervised Learning 139
5.1 Training sets 140
5.2 Parametric learning 142
5.2.1 Gaussian distribution, mean unknown 143
vi CONTENTS5.2.2 Gaussian distribution, covariance matrix
unknown 144
5.2.3 Gaussian distribution, mean and covariance
matrix both unknown 145
5.2.4 Estimation of the prior probabilities 147
5.2.5 Binary measurements 148
5.3 Nonparametric learning 149
5.3.1 Parzen estimation and histogramming 150
5.3.2 Nearest neighbour classification 155
5.3.3 Linear discriminant functions 162
5.3.4 The support vector classifier 168
5.3.5 The feed-forward neural network 173
5.4 Empirical evaluation 177
5.5 References 181
5.6 Exercises 181
6 Feature Extraction and Selection 183
6.1 Criteria for selection and extraction 185
6.1.1 Inter/intra class distance 186
6.1.2 Chernoff–Bhattacharyya distance 191
6.1.3 Other criteria 194
6.2 Feature selection 195
6.2.1 Branch-and-bound 197
6.2.2 Suboptimal search 199
6.2.3 Implementation issues 201
6.3 Linear feature extraction 202
6.3.1 Feature extraction based on the
Bhattacharyya distance with Gaussian
distributions 204
6.3.2 Feature extraction based on inter/intra
class distance 209
6.4 References 213
6.5 Exercises 214
7 Unsupervised Learning 215
7.1 Feature reduction 216
7.1.1 Principal component analysis 216
7.1.2 Multi-dimensional scaling 220
7.2 Clustering 226
7.2.1 Hierarchical clustering 228
7.2.2 K-means clustering 232
CONTENTS vii7.2.3 Mixture of Gaussians 234
7.2.4 Mixture of probabilistic PCA 240
7.2.5 Self-organizing maps 241
7.2.6 Generative topographic mapping 246
7.3 References 250
7.4 Exercises 250
8 State Estimation in Practice 253
8.1 System identification 256
8.1.1 Structuring 256
8.1.2 Experiment design 258
8.1.3 Parameter estimation 259
8.1.4 Evaluation and model selection 263
8.1.5 Identification of linear systems with
a random input 264
8.2 Observability, controllability and stability 266
8.2.1 Observability 266
8.2.2 Controllability 269
8.2.3 Dynamic stability and steady state solutions 270
8.3 Computational issues 276
8.3.1 The linear-Gaussian MMSE form 280
8.3.2 Sequential processing of the measurements 282
8.3.3 The information filter 283
8.3.4 Square root filtering 287
8.3.5 Comparison 291
8.4 Consistency checks 292
8.4.1 Orthogonality properties 293
8.4.2 Normalized errors 294
8.4.3 Consistency checks 296
8.4.4 Fudging 299
8.5 Extensions of the Kalman filter 300
8.5.1 Autocorrelated noise 300
8.5.2 Cross-correlated noise 303
8.5.3 Smoothing 303
8.6 References 306
8.7 Exercises 307
9 Worked Out Examples 309
9.1 Boston Housing classification problem 309
9.1.1 Data set description 309
9.1.2 Simple classification methods 311
viii CONTENTS9.1.3 Feature extraction 312
9.1.4 Feature selection 314
9.1.5 Complex classifiers 316
9.1.6 Conclusions 319
9.2 Time-of-flight estimation of an acoustic tone burst 319
9.2.1 Models of the observed waveform 321
9.2.2 Heuristic methods for determining the ToF 323
9.2.3 Curve fitting 324
9.2.4 Matched filtering 326
9.2.5 ML estimation using covariance models
for the reflections 327
9.2.6 Optimization and evaluation 332
9.3 Online level estimation in an hydraulic system 339
9.3.1 Linearized Kalman filtering 341
9.3.2 Extended Kalman filtering 343
9.3.3 Particle filtering 344
9.3.4 Discussion 350
9.4 References 352
Appendix A Topics Selected from Functional Analysis 353
A.1 Linear spaces 353
A.1.1 Normed linear spaces 355
A.1.2 Euclidean spaces or inner product spaces 357
A.2 Metric spaces 358
A.3 Orthonormal systems and Fourier series 360
A.4 Linear operators 362
A.5 References 366
Appendix B Topics Selected from Linear Algebra
and Matrix Theory 367
B.1 Vectors and matrices 367
B.2 Convolution 370
B.3 Trace and determinant 372
B.4 Differentiation of vector and matrix functions 373
B.5 Diagonalization of self-adjoint matrices 375
B.6 Singular value decomposition (SVD) 378
B.7 References 381
Appendix C Probability Theory 383
C.1 Probability theory and random variables 383
C.1.1 Moments 386
CONTENTS ixC.1.2 Poisson distribution 387
C.1.3 Binomial distribution 387
C.1.4 Normal distribution 388
C.1.5 The Chi-square distribution 389
C.2 Bivariate random variables 390
C.3 Random vectors 395
C.3.1 Linear operations on Gaussian random
vectors 396
C.3.2 Decorrelation 397
C.4 Reference 398
Appendix D Discrete-time Dynamic Systems 399
D.1 Discrete-time dynamic systems 399
D.2 Linear systems 400
D.3 Linear time invariant systems 401
D.3.1 Diagonalization of a system 401
D.3.2 Stability 402
D.4 References 403
Appendix E Introduction to PRTools 405
E.1 Motivation 405
E.2 Essential concepts in PRTools 406
E.3 Implementation 407
E.4 Some details 410
E.4.1 Data sets 410
E.4.2 Classifiers and mappings 411
E.5 How to write your own mapping 414
Appendix F MATLAB Toolboxes Used 417
Index 41
Index
Acceptance boundary, 297
Algorithm
backward, 122
condensation, 131
forward, 116
forward–backward, 123
Viterbi, 125
ARIMA, 264
ARMA, 264
Autoregressive model, 264
first order, 91, 341
second order, 92, 137
Autoregressive, moving average
models, 137
Back-propagation training, 175
Baseline removal, 259
Batch processing, 166
Bayes estimation, 142
Bayes’ theorem, 7, 20, 48
Bayesian classification, 16, 21
Bhattacharyya upper bound, 192
Bias, 63, 142, 332
Binary measurements, 148
Branch-and-bound, 197
Chernoff bound, 192
Chi-square test, 346
Classifier
Bayes, 8, 33
Euclidean distance, 147
feed-forward neural network, 173, 317
least squared error, 166
linear, 29, 311
linear discriminant function, 162
Mahalanobis distance, 147
maximum a posteriori (MAP), 14, 35
minimum distance, 30
minimum error rate, 24, 33
nearest neighbour, 155, 312
Parzen density-based, 150, 312
perceptron, 164
quadratic, 27, 311
support vector, 168, 316
Clustering, 226
average-link, 229
characteristics, 216
complete-link, 229
hierarchical, 228
K-means, 228
quality, 227
single-link, 229
Completely
controllable, 269
observable, 272
Computational complexity, 178
Computational issues, 253
Condensing, 159, 160
Confusion matrix, 178
Consistency checks, 292,
296, 342
Control vector, 89
Controllability matrix, 269
Cost
absolute value, 50
function, 19, 33, 50
matrix, 19
quadratic, 50
uniform, 35, 50
Covariance, 63
Covariance model (CVM) based
estimator, 331
Covariance models, 327
Cross-validation, 180, 312, 332
Classification, Parameter Estimation and State Estimation: An Engineering Approach using MATLAB
F. van der Heijden, R.P.W. Duin, D. de Ridder and D.M.J. Tax
2004 John Wiley & Sons, Ltd ISBN: 0-470-09013-8Curve
calibration, 350
fitting, 324
Decision boundaries, 27, 150
Decision function, 17
Dendrogram, 230
Depth-first search, 198
Design set, 140
Detection, 35
Differencing, 301
Discrete
algebraic Ricatti equation,
98
Lyapunov equation, 90, 274
Ricatti equation, 271
Discriminability, 39, 41
Discriminant function, 162
generalized linear, 163
linear, 162
Dissimilarity, 226
Distance
Bhattacharyya, 191, 204
Chernoff, 186, 191
cosine, 226
Euclidean, 226
inter/intraclass, 186, 209
interclass, 189
intraclass, 189
Mahalanobis, 205
Matusita, 194
probabilistic, 194
Distribution
Gamma, 49
matrix, 89
test, 297
Drift, 261
Dynamic stability, 270
Editing, 159
Entropy, 195
Envelope detection, 323
Ergodic Markov model, 114
Error correction, 166
Error covariance matrix, 98
Error function, 39
Error rate, 24, 33
Estimation
maximum a posteriori
(MAP), 50, 51
maximum likelihood, 57
minimum mean absolute error
(MMAE), 50, 51
minimum mean squared error
(MMSE), 50, 51
minimum variance, 55
Estimation loop, 277
Evaluation, 263
Evaluation set, 177
Expectation-maximization, 235
Experiment design, 258
Feature, 14
Feature extraction, 185
Feature reduction, 216
Feature selection, 185
generalized sequential forward,
200
Plusl – take away r, 200
selection of good components, 330
sequential forward, 200
Feed-forward neural network, 173
Fisher approach, 57
Fisher’s linear discriminant, 213
Fudge factor, 300
Gain matrix, 89
Generative topographic mapping,
246
Goodness of fit, 346
Gradient ascent, 164
Hidden data, 235
Hidden Markov model (HMM), 113
Hidden neurons, 174
Hill climbing algorithm, 235
Histogramming, 150
Holdout method, 179
i.i.d., 140
Identification of linear systems, 264
Image compression, 219
Importance sampling, 128, 131
Incomplete data, 235
Indicator variables, 236
Information
filter, 269, 283
matrix, 67, 284
Innovation(s), 62, 293
matrix, 98
Input vector, 89
Kalman
filtering, 254
form, 62, 280
linear-Gaussian MMSE form, 280
420 INDEXKalman filter, 97
discrete, 97, 98
extended, 105, 343
iterated extended, 108
linearized, 101, 341
unscented, 112
Kalman gain matrix, 62, 98
Kernel, 151
Gaussian, 171
polynomial, 171
radial basis function (RBF),
171
trick, 171
K-nearest neighbour rule,
157
Kohonen map, 241
Lagrange multipliers, 170
Latent variable, 246
Learning, 139
least squared error, 166
nonparametric, 149–50
parametric, 142
perceptron, 164
supervised, 139
unsupervised, 139
Learning data, 140
Learning rate, 164
Least squared error (LSE), 68
Leave-one-out method, 180
Left–right model, 114
Level estimation, 339
Likelihood
function, 36, 57
ratio, 36
Linear
dynamic equation, 89
plant equation, 89
state equation, 89
system equation, 89
Linear feature extraction,
202
Linear feedback, 102
Linear-Gaussian system, 89
Log-likelihood, 262
Loss function, 19
Mahalanobis distance, 28
Mahalanobis distance
classifier, 213
Margin, 169
Markov condition, 83, 114
Matched filtering, 326
MATLAB functions
HMM analysis, 127
particle filtering, 112
steady state Kalman filtering,
137, 268, 270, 273
Maximum likelihood, 326, 327
Mean square error, 64
Measure
divergence, 194
Matusita, 194
Shannon’s entropy, 195
Measurement
matrix, 96
model, 86
noise, 96
space, 14, 86
vector, 14, 163
Minimum error rate, 158
Minimum mean squared
error, 217
Minimum risk classification,
21
Missing data, 235
Mixture
of Gaussians, 228
of probabilistic PCA, 240
Model selection, 263
Models, 83
Monte Carlo simulation, 128
Moving average models
first order, 137
Multi-dimensional scaling,
220
Multi-edit algorithm, 160
Nearest neighbour rule, 157
Neuron, 173
Noise, 48, 75
autocorrelated, 300
cross-correlated, 303
measurement, 96
plant, 89
process, 89
quantization, 17
quantum, 17
system, 89
thermal, 17, 96
Nominal trajectory, 105
Nonlinear mapping, 221
Normalized
estimation error squared, 294
importance weights, 130
innovation squared, 295, 342
INDEX 421Observability, 253, 266
complete, 266
Gramian, 267
matrix, 267
stochastic, 269
Offset correction, 259
Online estimation, 82
Optimal filtering, 82
Optimization criterion, 185
Outlier clusters, 228
Outliers, 72
Overfitting, 76, 177, 184, 263
Parameter vector, 48
Partial autocorrelation
function, 265
Particle filter, 112, 128, 344
consistency criterion, 345
implementation, 347
Particles, 128–9
Parzen estimation, 150
Perceptron, 165
Periodogram, 297
Place coding, 169
Posterior, 48
prior, 48
Potter’s square root filter, 287
Predicted measurement, 98
Prediction, 82, 94
fixed interval, 95
fixed lead, 95
Principal
component analysis, 216, 313, 329
components, 218
directions, 218
Principle of orthogonality, 294
Probabilistic dependence, 194
Probability
posterior, 20
prior, 16–17
Probability density
conditional, 17, 36, 48, 83, 86
posterior, 86
unconditional, 17
Proposal density, 129
Quadratic decision function, 27
Quantization errors, 96
Random walk, 92
Rauch–Tung–Striebel smoother, 304
Regression, 74, 321, 351
curve, 75
Regularization, 146
parameter, 146
Reject rate, 33
Rejection, 32
class, 32
Resampling by selection, 130
Residual(s), 68, 75, 293
Retrodiction, 82
Ricatti loop, 277
Risk, 21
average, 21, 51
conditional, 20, 23, 51
Robust error norm, 72
Robustness, 65
ROC-curve, 40
Root mean square (RMS), 337
Sammon mapping, 224
Sample
covariance, 145
mean, 143
Scatter diagram, 15
Scatter matrix, 188
between-scatter matrix, 188
within-scatter matrix, 188
Self-organizing map, 241
Sensor, 258
location, 258
Sensory system, 13
Sequential update, 283
Signal-to-noise ratio, 38, 190
Single sample processing, 166
Smoothing, 303
Square root filtering, 287
Stability, 65, 253, 270
State space model, 81, 83
State
augmentation, 120
estimation
offline, 120
online, 117
mixed, 128
variable, 81
continuous, 81, 88
discrete, 81, 113
Statistical linearization, 112
Steady state, 98, 270
Steepest ascent, 164
Stochastic observability, 269
Stress measure, 221
Subspace
dimension, 240
structure, 216
422 INDEXSum of squared differences
(SSD), 68, 324
Support vector, 170
System identification,
253, 254, 256
Target vector, 166, 175
Test set, 177
Time-of-flight estimation, 319
Topology, 241
Training, 139
Training set, 140
Transfer function, 173
True class, 140
Unbiased, 64
absolutely, 64
Unit cost, 23
Validation set, 177
Variance, 142
Winning neuron, 242
Wishart distribution, 144
Yule–Walker equations, 264
كلمة سر فك الضغط : books-world.net
The Unzip Password : books-world.net
تحميل
يجب عليك التسجيل في الموقع لكي تتمكن من التحميل
تسجيل | تسجيل الدخول