ML Regression

Overview: ML Regression

[TOC]

1. Linear Regression

# Load the linear model module
from sklearn import linear_model
# Create a linear regression model object
regr = linear_model.LinearRegression()
# Fit the linear model on the training set
regr.fit(X_train, y_train)
# Predict on the test set
y_pred = regr.predict(X_test)

2. Ridge Regression

Ridge regression addresses some of the problems of ordinary least squares by penalizing the size of the coefficients with an L2 penalty.
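
Concretely, in scikit-learn's formulation, ridge regression minimizes a penalized residual sum of squares:

$$\min_{w} \; \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$$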

# Load the Ridge model
from sklearn.linear_model import Ridge
# Create a ridge regression model object
reg = Ridge(alpha=.5)
# Fit the ridge model on the training set
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
# Print the coefficients and the intercept
print(reg.coef_)
print(reg.intercept_)

3. Lasso Regression

Lasso adds an L1 penalty term to the least-squares objective.
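
In scikit-learn's formulation, the Lasso objective is:

$$\min_{w} \; \frac{1}{2 n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_1$$

The L1 penalty drives some coefficients exactly to zero, so Lasso also performs feature selection.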

# Load the Lasso model
from sklearn.linear_model import Lasso
# Create a Lasso regression model object
reg = Lasso(alpha=0.1)
# Fit the Lasso model on the training set
reg.fit([[0, 0], [1, 1]], [0, 1])
"""
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
"""
# Predict on new data
reg.predict([[1, 1]])

4. Elastic Net Regression

Elastic Net uses both the L1 and L2 norms together as the penalty term.
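
In scikit-learn's formulation (where $\rho$ corresponds to the l1_ratio parameter):

$$\min_{w} \; \frac{1}{2 n_{\text{samples}}} \lVert Xw - y \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2$$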

# Load the ElasticNet model
from sklearn.linear_model import ElasticNet
# Load a synthetic dataset
from sklearn.datasets import make_regression
X, y = make_regression(n_features=2, random_state=0)
# Create an ElasticNet regression model object
regr = ElasticNet(random_state=0)
# Fit the ElasticNet model on the training set
regr.fit(X, y)
print(regr.coef_)
print(regr.intercept_)
print(regr.predict([[0, 0]]))

5. Bayesian Ridge Regression

Bayesian ridge regression is similar to ridge regression, but it estimates its regularization parameters by maximizing the marginal log-likelihood.
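
In the standard formulation (as described in the scikit-learn documentation), the noise has precision $\alpha$ and the weights get a spherical Gaussian prior with precision $\lambda$; both $\alpha$ and $\lambda$ are estimated from the data by maximizing the marginal log-likelihood:

$$p(y \mid X, w, \alpha) = \mathcal{N}(y \mid Xw, \alpha^{-1}), \qquad p(w \mid \lambda) = \mathcal{N}(w \mid 0, \lambda^{-1} I)$$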

from sklearn.linear_model import BayesianRidge
# Toy training data
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = BayesianRidge()
reg.fit(X, Y)

6. SGD Regression

The linear models above are fit by least squares. SGD regression is also a linear model, but it minimizes a regularized empirical loss with stochastic gradient descent.
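
Concretely, SGDRegressor minimizes a regularized training error of the form (following the scikit-learn documentation, where $L$ is a per-sample loss, squared error by default, and $R$ is a regularization term):

$$E(w, b) = \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big) + \alpha R(w)$$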

import numpy as np
from sklearn import linear_model
# Random toy data: 10 samples with 5 features each
n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)
# Fit an SGD regressor (squared-error loss by default)
clf = linear_model.SGDRegressor(max_iter=1000, tol=1e-3)
clf.fit(X, y)

7. SVR (Support Vector Regression)

# Load the SVR model
from sklearn.svm import SVR
# Training data
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
# Create an SVR model object
clf = SVR()
# Fit the SVR model on the training set
clf.fit(X, y)
"""
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
    gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
    tol=0.001, verbose=False)
"""
clf.predict([[1, 1]])

8. KNN Regression

from sklearn.neighbors import KNeighborsRegressor
# Toy 1-D training data
X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit(X, y)
print(neigh.predict([[1.5]]))
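
Here the prediction for 1.5 is the average of the targets of its two nearest neighbors, x=1 and x=2, i.e. (0 + 1) / 2 = 0.5.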

9. Decision Tree Regression

from sklearn.tree import DecisionTreeRegressor
# Toy training data
X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = DecisionTreeRegressor()
clf = clf.fit(X, y)
clf.predict([[1, 1]])

10. Neural Network

from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor()
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
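
As above, X_train, y_train, and X_test must be defined beforehand. A runnable sketch on synthetic data (the hidden layer size and iteration count are illustrative choices, not recommendations):

from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# max_iter is raised so the default Adam solver can converge on this toy data
mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)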

11. Random Forest Regression

from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression
# Synthetic dataset: 4 features, 2 of which are informative
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
regr = RandomForestRegressor(max_depth=2, random_state=0, n_estimators=100)
regr.fit(X, y)
print(regr.feature_importances_)
print(regr.predict([[0, 0, 0, 0]]))

12. XGBoost Regression

import xgboost as xgb
from sklearn.metrics import mean_squared_error
# Assumes X_train, y_train, X_test, y_test already exist
# (e.g. from a train/test split as in the linear regression sketch above)
xgb_model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100,
                             objective='reg:squarederror', eval_metric='rmse',
                             n_jobs=-1)
# Monitor training-set RMSE every 100 boosting rounds while fitting
xgb_model.fit(X_train, y_train, eval_set=[(X_train, y_train)], verbose=100)
y_pred = xgb_model.predict(X_test)
print(mean_squared_error(y_test, y_pred))

13. LightGBM Regression

LightGBM is another gradient boosting framework that uses tree-based learning algorithms. Compared with XGBoost, its advantages are: faster training speed and higher efficiency; lower memory usage; better accuracy; support for parallel learning; and the ability to handle large-scale data.

import lightgbm as lgb
from sklearn.metrics import mean_squared_error
# Assumes X_train, y_train, X_test, y_test already exist (e.g. from train_test_split)
gbm = lgb.LGBMRegressor(num_leaves=31, learning_rate=0.05, n_estimators=20)
# 'l2' (mean squared error) is LightGBM's regression metric;
# log_evaluation prints the metric every 100 iterations
gbm.fit(X_train, y_train, eval_set=[(X_train, y_train)], eval_metric='l2',
        callbacks=[lgb.log_evaluation(100)])
y_pred = gbm.predict(X_test)
print(mean_squared_error(y_test, y_pred))

Beginner Competition

Kaggle: House Prices Prediction

As one of the most basic regression problems, this competition is a good fit for newcomers to machine learning.

URL: https://www.kaggle.com/c/house-prices-advanced-regression-techniques

Classic solutions:

XGBoost solution: https://www.kaggle.com/dansbecker/xgboost

Lasso solution: https://www.kaggle.com/mymkyt/simple-lasso-public-score-0-12102

Intermediate Competition

Kaggle: Predict Future Sales

This competition is a classic time-series problem: the goal is to predict next month's total sales for every product and store.

URL: https://www.kaggle.com/c/competitive-data-science-predict-future-sales

Classic solutions:

LightGBM: https://www.kaggle.com/sanket30/predicting-sales-using-lightgbm

XGBoost: https://www.kaggle.com/fabianaboldrin/eda-xgboost

1st place solution: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/discussion/74835#latest-503740

Top Competition Solutions

Kaggle: Restaurant Visitor Forecasting

URL: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting

Solutions:

1st place solution: https://www.kaggle.com/plantsgo/solution-public-0-471-private-0-505

7th place solution: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/49259#latest-284437

8th place solution: https://github.com/MaxHalford/kaggle-recruit-restaurant

12th place solution: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/49251#latest-282765

Kaggle: Corporación Favorita Grocery Sales Forecasting

URL: https://www.kaggle.com/c/favorita-grocery-sales-forecasting

Solutions:

1st place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47582#latest-360306

2nd place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47568#latest-278474

3rd place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47560#latest-302253

4th place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47529#latest-271077

5th place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47556#latest-270515

6th place solution: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47575#latest-269568