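Deep Learning "Flower Book" Notes 3 - Matrix Diagonalization, Singular Value Decomposition (SVD), Maximum Likelihood Estimation, the Equivalence of Gaussian-Distributed Errors and Least-Squares Estimation, and PCA Principles and Derivation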
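- 1. Matrix diagonalization
- 2. Singular value decomposition
- 3. Maximum likelihood estimation, and the equivalence of Gaussian-distributed errors and least-squares estimation
- 4. PCA principles and derivation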
1. Matrix diagonalization
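As a small illustration of this topic, the sketch below diagonalizes a matrix with NumPy and checks the factorization A = V Λ V⁻¹. The matrix `A` and all variable names are assumptions chosen for the example; they are not taken from the notes.

```python
import numpy as np

# Hypothetical 2x2 example; its eigenvalues are distinct, so it is diagonalizable
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of V are eigenvectors of A; eigen_values holds the matching eigenvalues
eigen_values, V = np.linalg.eig(A)
Lambda = np.diag(eigen_values)

# Diagonalization: A = V @ Lambda @ V^{-1}
A_rebuilt = V @ Lambda @ np.linalg.inv(V)

# One practical payoff: matrix powers only need powers of the diagonal entries
A_pow5 = V @ np.diag(eigen_values ** 5) @ np.linalg.inv(V)

print(np.allclose(A_rebuilt, A))
print(np.allclose(A_pow5, np.linalg.matrix_power(A, 5)))
```

Not every square matrix can be diagonalized this way; the example relies on `V` being invertible, which holds here because the eigenvalues are distinct.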
2. Singular value decomposition
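A minimal sketch of the SVD with NumPy, assuming a random 5x3 matrix as input (the matrix and the variable names are illustrative only). It checks that A = U Σ Vᵀ, that the squared singular values equal the eigenvalues of AᵀA, and builds the best rank-1 approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # hypothetical non-square matrix

# Thin SVD: U is 5x3, s holds 3 singular values (descending), Vt is 3x3
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = U @ np.diag(s) @ Vt

# The squared singular values are the eigenvalues of A^T A
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]  # reversed into descending order

# Rank-1 approximation from the largest singular value and its vectors
A_rank1 = s[0] * np.outer(U[:, 0], Vt[0])

print(np.allclose(A_rebuilt, A))
print(np.allclose(s ** 2, eig_AtA))
```

Unlike the eigendecomposition, the SVD exists for every real matrix, including non-square ones such as this `A`.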
3. Maximum likelihood estimation, and the equivalence of Gaussian-distributed errors and least-squares estimation
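The equivalence named in this heading can be sketched in a few lines. The notation is an assumption made for this sketch, not taken from the notes: a linear model with parameters $w$, inputs $x_i$, targets $y_i$, and i.i.d. Gaussian errors with fixed variance $\sigma^2$.

```latex
% Assumed model: y_i = w^\top x_i + \epsilon_i, with \epsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d.
\begin{aligned}
\log L(w)
  &= \sum_{i=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(y_i - w^\top x_i)^2}{2\sigma^2} \right) \right] \\
  &= -\frac{n}{2} \log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 , \\
\hat{w}_{\mathrm{MLE}}
  &= \arg\max_{w} \log L(w)
   = \arg\min_{w} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 .
\end{aligned}
```

The first term does not depend on $w$ and the factor $1/(2\sigma^2)$ is positive, so maximizing the likelihood under Gaussian errors picks the same $w$ as minimizing the sum of squared residuals.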
4. PCA principles and derivation
PCA implementation code:
First, generate the data:
''' Created on Apr 6, 2020 Author: *** '''
import numpy as np
import matplotlib.pyplot as plt

n = 400  # total number of data points
half = n // 2  # points per class
classes = [0, 1]

# Class 0: uniform samples over [1, 10) in both coordinates
x1 = np.random.uniform(1, 10, half)
y1 = np.random.uniform(1, 10, half)
# Class 1: uniform samples over [9, 15) in both coordinates
x2 = np.random.uniform(9, 15, half)
y2 = np.random.uniform(9, 15, half)

# Write one tab-separated row per point: x, y, class label
with open('data.txt', 'w') as fw:
    for i in range(half):
        fw.write("%f\t%f\t%d\n" % (x1[i], y1[i], classes[0]))
    for i in range(half):
        fw.write("%f\t%f\t%d\n" % (x2[i], y2[i], classes[1]))

# Plot the two classes with different markers and colors
figure = plt.figure()
ax = figure.add_subplot(111)
ax.scatter(x1, y1, marker='o', s=80, c='green')
ax.scatter(x2, y2, marker='^', s=80, c='red')
plt.show()
The PCA implementation:
''' Created on Jun 1, 2011 @author: Peter '''
import numpy as np
import matplotlib.pyplot as plt


def loadDataSet(fileName, delim='\t'):
    # Parse every non-empty line into a row of floats
    with open(fileName) as fr:
        stringArr = [line.strip().split(delim) for line in fr if line.strip()]
    return np.array([list(map(float, line)) for line in stringArr])


def pca(data, max_n=99999):
    mean_values = np.mean(data, axis=0)  # per-feature mean
    remove_mean = data - mean_values  # center the data
    cov_values = np.cov(remove_mean, rowvar=False)  # covariance matrix (columns are features)
    # The covariance matrix is symmetric, so eigh applies and returns real values
    eigen_values, eigen_vectors = np.linalg.eigh(cov_values)
    eigen_value_indexes = np.argsort(eigen_values)  # indexes in ascending order of eigenvalue
    eigen_value_indexes = eigen_value_indexes[:-(max_n + 1):-1]  # keep the max_n largest eigenvalues
    red_eigen_vectors = eigen_vectors[:, eigen_value_indexes]  # the matching eigenvectors
    low_dimension_data = remove_mean @ red_eigen_vectors  # project into the low-dimensional space
    new_data = low_dimension_data @ red_eigen_vectors.T + mean_values  # reconstruct in the original space
    return low_dimension_data, new_data


if __name__ == '__main__':
    dataMat = loadDataSet('data.txt')
    # data.txt stores the class label in its third column; run PCA on x and y only
    features = dataMat[:, :2]
    lowDMat, reconMat = pca(features, 1)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(features[:, 0], features[:, 1], marker='^', s=90)
    ax.scatter(reconMat[:, 0], reconMat[:, 1], marker='o', s=50, c='red')
    plt.show()
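Result: the plot shows the original points as triangles and the reconstruction from one principal component as red circles, which fall on a straight line along the direction of greatest variance.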
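As a cross-check that ties the PCA routine above back to the SVD section, the sketch below computes the first principal direction two ways: from the eigendecomposition of the covariance matrix, and from the SVD of the centered data. It uses synthetic data drawn from the same ranges as the generation script rather than reading `data.txt`; the seed and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D data standing in for the x and y columns of data.txt
X = np.vstack([rng.uniform(1, 10, size=(200, 2)),
               rng.uniform(9, 15, size=(200, 2))])

mean = X.mean(axis=0)
X_centered = X - mean

# Route 1: eigenvector of the covariance matrix with the largest eigenvalue
eig_vals, eig_vecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
pc_eig = eig_vecs[:, np.argmax(eig_vals)]

# Route 2: first right singular vector of the centered data
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
pc_svd = Vt[0]

# Covariance eigenvalues are the squared singular values divided by (n - 1)
var_svd = s ** 2 / (X.shape[0] - 1)

# The two directions agree up to sign
same_direction = np.isclose(abs(pc_eig @ pc_svd), 1.0)

# Reconstruction from one component; every point lands on a single line
recon = np.outer(X_centered @ pc_svd, pc_svd) + mean

print(same_direction)
print(np.allclose(np.sort(var_svd), np.sort(eig_vals)))
```

The SVD route avoids forming the covariance matrix explicitly, which is one reason it is often preferred numerically; either route gives the same subspace.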