
[Machine Learning] Linear Regression Prediction

Preface

Regression analysis predicts the relationship between an input variable (the independent variable) and an output variable (the dependent variable): when the input changes, the output changes with it. Put simply, regression means fitting a function to data, and linear regression fits the data with a linear function. Machine learning cannot make prophecies, only rough predictions. Here we model how housing prices relate to other factors.
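As a concrete illustration of "fitting data with a linear function", here is a minimal sketch with scikit-learn on a made-up toy dataset (the numbers are hypothetical, chosen so the data lies roughly on y = 2x + 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2*x + 1 with a little noise.
x = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([1.1, 2.9, 5.2, 7.0, 9.1])

model = LinearRegression()
model.fit(x, y)

# The fitted line recovers slope ~2 and intercept ~1.
print(model.coef_[0], model.intercept_)

# A new input is predicted by extending the fitted line.
print(model.predict(np.array([[5.0]])))
```

The same fit/predict pattern carries over unchanged to the 13-feature housing data below; only the shape of `x` grows.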

Boston Housing Price Prediction

Downloading the dataset

  • The dataset is the open-source Boston housing dataset: 506 rows and 14 columns.

```python
import wget

wget.download(url='https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data', out='housing.data')
wget.download(url='https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.names', out='housing.names')
wget.download(url='https://archive.ics.uci.edu/ml/machine-learning-databases/housing/Index', out='Index')
```

Processing the dataset

```python
feature_names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
feature_num = len(feature_names)
print(feature_num)

# Reshape the flat array of 7084 values into 506 rows x 14 columns
housing_data = housing_data.reshape(housing_data.shape[0]//feature_num, feature_num)
print(housing_data.shape[0])
# Print the first row
print(housing_data[:1])


## Normalization

feature_max = housing_data.max(axis=0)
feature_min = housing_data.min(axis=0)
feature_avg = housing_data.sum(axis=0)/housing_data.shape[0]
```
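The per-element double loop in the full code's `feature_norm` (shown later) can also be written in vectorized form with NumPy broadcasting, using the same mean-centred min-max formula. A sketch, where the 3x2 `demo` array is made up for illustration:

```python
import numpy as np

def feature_norm_vec(features):
    # Same formula as the loop version: (x - mean) / (max - min),
    # applied column-wise in one shot via broadcasting.
    f_max = features.max(axis=0)
    f_min = features.min(axis=0)
    f_avg = features.mean(axis=0)
    return (features - f_avg) / (f_max - f_min)

# Tiny sanity check: each column is centred and scaled by its range.
demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(feature_norm_vec(demo))
```

Broadcasting keeps the code short and is considerably faster than looping over 506 rows in Python.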

Defining the model

```python
## Instantiate the model
def Model():
    model = linear_model.LinearRegression()
    return model

# Fit the model
def train(model, x, y):
    model.fit(x, y)
```
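A quick usage sketch of these two helpers on made-up two-feature data (the toy arrays are hypothetical, constructed so the target depends only on the first feature):

```python
import numpy as np
from sklearn import linear_model

def Model():
    model = linear_model.LinearRegression()
    return model

def train(model, x, y):
    model.fit(x, y)

# Hypothetical toy data: y = 3*x0, independent of x1.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]])
y = 3.0 * X[:, 0]

m = Model()
train(m, X, y)
# The learned weights expose which features matter: ~[3, 0].
print(m.coef_, m.intercept_)
```

After fitting on the housing data, `model.coef_` likewise holds one weight per input feature, which is a handy way to inspect which factors drive the predicted price.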

Visualizing model performance

```python
def draw_infer_result(ground_truths, infer_results):
    title = 'Boston'
    plt.title(title, fontsize=24)
    # Reference line y = x: points on it are perfect predictions
    x = np.arange(1, 40)
    y = x
    plt.plot(x, y)
    plt.xlabel('ground truth')
    plt.ylabel('infer results')
    plt.scatter(ground_truths, infer_results, edgecolors='green', label='training cost')
    plt.grid()
    plt.show()
```
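The scatter plot gives a visual check; numeric metrics are a useful complement. A sketch using scikit-learn's `mean_squared_error` and `r2_score`, on hypothetical ground-truth and predicted prices (in $1000s, as in MEDV):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical ground-truth prices and model predictions.
y_true = np.array([24.0, 21.6, 34.7, 33.4])
y_pred = np.array([25.0, 20.0, 33.0, 35.0])

mse = mean_squared_error(y_true, y_pred)  # average squared error
r2 = r2_score(y_true, y_pred)             # 1.0 means a perfect fit
print(mse, r2)
```

On the real test split these would be called as `mean_squared_error(y_test, predictions)`; MSE summarizes how far the scatter points sit from the y = x line.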

Full code

```python
## Housing-price prediction based on linear regression
## Fit a function model
## Gradient descent method

## Open-source housing price dataset

import wget
import numpy as np
import os
import matplotlib
import matplotlib.pyplot as plt

import pandas as pd

from sklearn import linear_model


## Comment this out after the first download
'''
wget.download(url='https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data', out='housing.data')
wget.download(url='https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.names', out='housing.names')
wget.download(url='https://archive.ics.uci.edu/ml/machine-learning-databases/housing/Index', out='Index')
'''
'''
 1. CRIM     per capita crime rate by town
 2. ZN       proportion of residential land zoned for lots over
             25,000 sq.ft.
 3. INDUS    proportion of non-retail business acres per town
 4. CHAS     Charles River dummy variable (= 1 if tract bounds
             river; 0 otherwise)
 5. NOX      nitric oxides concentration (parts per 10 million)
 6. RM       average number of rooms per dwelling
 7. AGE      proportion of owner-occupied units built prior to 1940
 8. DIS      weighted distances to five Boston employment centres
 9. RAD      index of accessibility to radial highways
 10. TAX     full-value property-tax rate per $10,000
 11. PTRATIO pupil-teacher ratio by town
 12. B       1000(Bk - 0.63)^2 where Bk is the proportion of blacks
             by town
 13. LSTAT   % lower status of the population
 14. MEDV    Median value of owner-occupied homes in $1000's
'''
## Load the data

datafile = './housing.data'

housing_data = np.fromfile(datafile, sep=' ')

print(housing_data.shape)


feature_names = ['CRIM','ZN','INDUS','CHAS','NOX','RM','AGE','DIS','RAD','TAX','PTRATIO','B','LSTAT','MEDV']
feature_num = len(feature_names)
print(feature_num)

# Reshape the flat array of 7084 values into 506 rows x 14 columns
housing_data = housing_data.reshape(housing_data.shape[0]//feature_num, feature_num)
print(housing_data.shape[0])
# Print the first row
print(housing_data[:1])


## Normalization

feature_max = housing_data.max(axis=0)
feature_min = housing_data.min(axis=0)
feature_avg = housing_data.sum(axis=0)/housing_data.shape[0]

def feature_norm(input):
    f_size = input.shape
    output_features = np.zeros(f_size, np.float32)
    for batch_id in range(f_size[0]):
        for index in range(13):
            output_features[batch_id][index] = (input[batch_id][index]-feature_avg[index])/(feature_max[index]-feature_min[index])

    return output_features


housing_features = feature_norm(housing_data[:, :13])

housing_data = np.c_[housing_features, housing_data[:, -1]].astype(np.float32)


## Split the data 8:2 into train and test sets
ratio = 0.8

offset = int(housing_data.shape[0]*ratio)

train_data = housing_data[:offset]
test_data = housing_data[offset:]

print(train_data[:2])


## Model configuration
## Linear regression

## Instantiate the model
def Model():
    model = linear_model.LinearRegression()
    return model

# Fit the model
def train(model, x, y):
    model.fit(x, y)


## Model training

X, y = train_data[:, :13], train_data[:, -1:]

model = Model()
train(model, X, y)

x_test, y_test = test_data[:, :13], test_data[:, -1:]
predictions = model.predict(x_test)

## Model evaluation

def draw_infer_result(ground_truths, infer_results):
    title = 'Boston'
    plt.title(title, fontsize=24)
    # Reference line y = x: points on it are perfect predictions
    x = np.arange(1, 40)
    y = x
    plt.plot(x, y)
    plt.xlabel('ground truth')
    plt.ylabel('infer results')
    plt.scatter(ground_truths, infer_results, edgecolors='green', label='training cost')
    plt.grid()
    plt.show()


draw_infer_result(y_test, predictions)
```
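The header comment mentions gradient descent, but scikit-learn's `LinearRegression` actually solves the least-squares problem in closed form. For comparison, here is a minimal hand-rolled gradient-descent fit of a linear model, on made-up 1-D data (the data, learning rate, and iteration count are all illustrative choices):

```python
import numpy as np

# Hypothetical 1-D data lying exactly on y = 2*x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # parameters of the model y_hat = w*x + b
lr = 0.02         # learning rate
for _ in range(5000):
    err = (w * x + b) - y
    # Gradients of the mean squared error with respect to w and b.
    w -= lr * 2.0 * np.mean(err * x)
    b -= lr * 2.0 * np.mean(err)

print(w, b)  # converges toward w = 2, b = 1
```

With normalized features, as in the housing code above, gradient descent and the closed-form solution arrive at essentially the same weights; the closed form is simply faster for a problem this small.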

Results

[Figure: scatter plot of predicted prices against ground-truth prices, clustered around the y = x reference line]

Summary

Linear regression prediction is fairly simple: you can think of it as function fitting. The data comes from the open-source Boston housing dataset, and the algorithm itself ships prepackaged in a library, which makes it easy to use.

Source: https://www.cnblogs.com/hjk-airl/p/16405474.html