1. Theory

1.1 Closed-Form Solution

Model: $h_\theta(x)=\theta^Tx$ (stacking all $m$ samples gives the prediction vector $X\theta$)

Loss function: $J(\theta)=\|X\theta-Y\|_2^2$

Objective: $\theta=\arg\min_\theta J(\theta)$

Notation:
$$
\begin{cases}\theta\in\mathbb{R}^{d\times1}\\[2ex]X\in\mathbb{R}^{m\times d}\\[2ex]Y\in\mathbb{R}^{m\times1}\end{cases}
$$
Solving via the normal equation means minimizing $J(\theta)$ directly.

First expand $J(\theta)$:
$$\begin{align*}
J(\theta) &= \|X\theta - Y\|_{2}^{2} \\
&= (X\theta - Y)^{T}(X\theta - Y) \\
&= (\theta^{T}X^{T} - Y^{T})(X\theta - Y) \\
&= \theta^{T}X^{T}X\theta - \theta^{T}X^{T}Y - Y^{T}X\theta + Y^{T}Y \\
&= \theta^{T}X^{T}X\theta - 2Y^{T}X\theta + Y^{T}Y
\end{align*}
$$
Differentiate $J(\theta)$ with respect to $\theta$:
$$\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta}& =\frac{\partial}{\partial\theta}\left(\theta^TX^TX\theta-2Y^TX\theta+Y^TY\right) \\
&=2X^{T}X\theta-2X^{T}Y
\end{aligned}$$
Setting $\frac{\partial J(\theta)}{\partial\theta}=0$ gives:
$$\begin{aligned}
2X^{T}X\theta-2X^{T}Y& =0 \\
X^{T}X\theta & =X^{T}Y \\
\theta & =(X^TX)^{-1}X^TY
\end{aligned}$$
The expression above is the closed-form solution. One caveat: if the feature matrix $X$ is not of full column rank (i.e., some features are linearly dependent), then $X^TX$ is singular and the matrix inversion in the normal equation can fail or become numerically unstable.
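As a minimal sketch (with made-up NumPy data, not the housing dataset used later), the rank-deficient case can be handled by replacing the explicit inverse with the Moore-Penrose pseudoinverse, or by calling a least-squares solver directly:

```python
import numpy as np

# Hypothetical toy data: m = 5 samples, a bias column plus 2 features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
Y = rng.normal(size=(5, 1))

# Normal equation via the pseudoinverse: theta = (X^T X)^+ X^T Y,
# which coincides with (X^T X)^{-1} X^T Y when X has full column rank.
theta = np.linalg.pinv(X.T @ X) @ X.T @ Y

# np.linalg.lstsq solves the same least-squares problem and is usually
# the numerically safer choice.
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(theta, theta_lstsq))  # True up to floating-point error
```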
1.2 Gradient Descent Solution

Model: $h_\theta(x)=\sum_{i=1}^d\theta_ix_i$

Note: $x_i$ denotes the $i$-th component of $x$.

Loss function: $J(\theta)=\frac1{2m}\sum_{j=1}^m\left(y^j-h_\theta(x^j)\right)^2$

Note: $x^j$ denotes the $j$-th sample.

Objective: $\theta=\arg\min_\theta J(\theta)$

Notation:
$$
\begin{cases}\theta\in\mathbb{R}^d\\[2ex]x\in\mathbb{R}^d\\[2ex]y\in\mathbb{R}^m\end{cases}
$$
The loss function $J(\theta)$ is quadratic in the parameters $\theta$. Expanding $J(\theta)$:
$$\begin{aligned}
J(\theta)& =\frac{1}{2m}\sum_{j=1}^{m}\Big(y^{j}-h_{\theta}(x^{j})\Big)^{2} \\
&=\frac{1}{2m}\sum_{j=1}^{m}\Bigg(y^{j}-\sum_{i=1}^{d}\theta_{i}x_{i}^{j}\Bigg)^{2}
\end{aligned}$$
Taking the partial derivative of $J(\theta)$ with respect to $\theta_i$:
$$\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_i}& =\frac{\partial}{\partial\theta_{i}}\frac{1}{2m}\sum_{j=1}^{m}\Bigg(y^{j}-\sum_{k=1}^{d}\theta_{k}x_{k}^{j}\Bigg)^{2} \\
&=\frac{1}{m}\sum_{j=1}^{m}\Bigg( y^{j}-\sum_{k=1}^{d}\theta_{k}x_{k}^{j}\Bigg)(-x_{i}^{j}) \\
&=\frac{1}{m}\sum_{j=1}^{m}\Bigg(\sum_{k=1}^{d}\theta_{k}x_{k}^{j}-y^{j}\Bigg)x_{i}^{j}
\end{aligned}$$
At each step the parameters are updated along the negative gradient:
$$\begin{aligned}
\theta_{i}& :=\theta_i-\alpha\frac{\partial J(\theta)}{\partial\theta_i} \\
&=\theta_i-\alpha\frac1m\sum_{j=1}^m\biggl(\sum_{k=1}^d\theta_kx_k^j-y^j\biggr)x_i^j \\
&=\theta_i+\alpha \frac{1}{m}\sum_{j=1}^m\Bigg( y^j-\sum_{k=1}^d\theta_kx_k^j\Bigg)x_i^j
\end{aligned}$$
Gradient descent procedure:

$\text{Repeat until convergence } \{$
$$\theta_i:=\theta_i+\alpha\:\frac{1}{m}\sum_{j=1}^m\Bigg(y^j-\sum_{k=1}^d\theta_kx_k^j\Bigg)x_i^j$$
$\}$
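For reference, the per-coordinate update above is equivalent to the vectorized form $\theta\leftarrow\theta+\frac{\alpha}{m}X^T(Y-X\theta)$, which is what the implementation in Section 2.4.3 below uses. A small sketch with made-up data confirms that the element-wise and vectorized gradients agree:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 8, 3
X = rng.normal(size=(m, d))        # rows are samples x^j, columns are features
Y = rng.normal(size=(m, 1))
theta = rng.normal(size=(d, 1))

# Element-wise gradient: dJ/dtheta_i = (1/m) * sum_j (sum_k theta_k x_k^j - y^j) * x_i^j
grad_elem = np.array([
    np.mean((X @ theta - Y).ravel() * X[:, i]) for i in range(d)
]).reshape(-1, 1)

# Vectorized gradient: (1/m) * X^T (X theta - Y)
grad_vec = (X.T @ (X @ theta - Y)) / m

print(np.allclose(grad_elem, grad_vec))  # True
```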
2. Python Implementation

2.0 Imports

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
import time

%matplotlib inline
%config InlineBackend.figure_format = 'svg'
```
2.1 Load the Dataset

```python
df = pd.read_csv("./housing.csv")
print(df.head())
df.info()  # info() prints its summary directly and returns None
```
2.2 Data Preprocessing

```python
# Fill missing values in total_bedrooms with the column median
total_bedrooms = df.loc[:, "total_bedrooms"].values.reshape(-1, 1)
filled_df = df.copy()
filled_df.loc[:, "total_bedrooms"] = SimpleImputer(strategy="median").fit_transform(total_bedrooms)
filled_df.info()

# One-hot encode the categorical feature ocean_proximity
code = OneHotEncoder().fit_transform(filled_df.loc[:, "ocean_proximity"].values.reshape(-1, 1))
coded_df = pd.concat([filled_df, pd.DataFrame(code.toarray())], axis=1)
coded_df.drop(["ocean_proximity"], axis=1, inplace=True)
coded_df.columns = list(coded_df.columns[:-5]) + ["ocean_0", "ocean_1", "ocean_2", "ocean_3", "ocean_4"]
coded_df.head(10)
```
2.3 Train/Test Split

```python
# Features: the first 8 numeric columns plus the 5 one-hot columns; label: median_house_value
feature = coded_df.iloc[:, :8].join(coded_df.iloc[:, -5:])
label = coded_df["median_house_value"]
Xtrain, Xtest, Ytrain, Ytest = train_test_split(feature, label, test_size=0.3)
Xtrain.head()
```
2.4 Fitting the Model

2.4.0 Evaluation Metric: R^2

```python
def R2(y, y_pred):
    return 1 - (np.sum((y - y_pred) ** 2) / np.sum((y - np.mean(y)) ** 2))
```
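As a quick sanity check, this hand-rolled metric can be compared against sklearn.metrics.r2_score on the same inputs (a minimal sketch with made-up numbers):

```python
from sklearn.metrics import r2_score
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(R2(y_true, y_pred), r2_score(y_true, y_pred))  # both about 0.9486
```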
2.4.1 Standardization

```python
def normalize(X):
    # Standardize each column to zero mean and unit variance
    sigma = np.std(X, axis=0)
    mu = np.mean(X, axis=0)
    X = (X - mu) / sigma
    return np.array(X)

X = np.array(Xtrain).reshape(np.size(Xtrain, 0), -1)
y = np.array(Ytrain).T.reshape(-1, 1)
X = normalize(X)
y = normalize(y)
```
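Note that normalize recomputes the mean and standard deviation from whatever array it receives; if the held-out split is standardized later for evaluation, the training-set statistics should be reused. A possible adjustment (a sketch only; Xte and yte are hypothetical names not used elsewhere):

```python
# Reuse the training-set statistics when standardizing the held-out split.
Xtr = np.array(Xtrain).reshape(np.size(Xtrain, 0), -1)
mu, sigma = np.mean(Xtr, axis=0), np.std(Xtr, axis=0)
Xte = (np.array(Xtest).reshape(np.size(Xtest, 0), -1) - mu) / sigma

ytr = np.array(Ytrain).reshape(-1, 1)
y_mu, y_sigma = np.mean(ytr), np.std(ytr)
yte = (np.array(Ytest).reshape(-1, 1) - y_mu) / y_sigma
```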
2.4.2 Closed-Form Solution

```python
def Normal_Equation(X, y):
    # theta = (X^T X)^{-1} X^T y
    return np.linalg.inv(X.T @ X) @ X.T @ y

start_time = time.time()
theta_ne = Normal_Equation(X, y)
print(f"Elapsed time: {time.time() - start_time}")
print(f"R^2: {R2(y, X @ theta_ne)}")

result_cf = pd.DataFrame({"ColumnName": list(Xtrain.columns), "Theta": theta_ne.flatten()})
result_cf
```
2.4.3 Gradient Descent Solution

```python
def MSE_Loss(y, y_pred):
    return np.sum((y_pred - y) ** 2) / (2 * np.size(y))

def GD(X, y, lr=0.01, epochs=5000):
    m, n = X.shape
    theta = np.random.randn(n, 1)
    loss = np.zeros(epochs)
    for epoch in range(epochs):
        # Vectorized gradient: (1/m) * X^T (X theta - y)
        gradient = (1 / m) * (X.T @ (X @ theta - y))
        theta -= lr * gradient
        loss[epoch] = MSE_Loss(y, X @ theta)
    return theta, loss

start_time = time.time()
theta_gd, loss = GD(X, y)
print(f"Elapsed time: {time.time() - start_time}")
print(f"R^2: {R2(y, X @ theta_gd)}")

result_gd = pd.DataFrame({"ColumnName": list(Xtrain.columns), "Theta": theta_gd.flatten()})
result_gd

# Plot the training loss curve
sns.lineplot(x=np.arange(5000), y=loss.flatten(), label='Loss Curve')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Gradient Descent Loss Curve')
```
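Since both solvers minimize the same objective on the same standardized data, their coefficient vectors should agree closely once gradient descent has converged; a quick check (assuming theta_ne and theta_gd from the cells above):

```python
# Compare the normal-equation and gradient-descent coefficients.
print(np.allclose(theta_ne, theta_gd, atol=1e-2))  # expected to be True after convergence
print(np.max(np.abs(theta_ne - theta_gd)))         # largest remaining difference
```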
3. Experimental Results