Overview: ML Regression
[TOC]
1. Linear Regression
```python
from sklearn import linear_model

# X_train, y_train, X_test are assumed to be defined beforehand
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
```
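The snippet above assumes `X_train`, `y_train`, and `X_test` already exist. A minimal end-to-end sketch, using synthetic data from `make_regression` (an assumption for illustration) and scoring the fit with MSE and R²:

```python
from sklearn import linear_model
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# synthetic regression data stands in for a real dataset
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)
print(mean_squared_error(y_test, y_pred))
print(r2_score(y_test, y_pred))
```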
2. Ridge Regression
Ridge regression addresses the weaknesses of ordinary least squares (e.g., unstable coefficients under multicollinearity) by penalizing the size of the coefficients with an L2 norm.
```python
from sklearn.linear_model import Ridge

reg = Ridge(alpha=0.5)  # alpha sets the L2 penalty strength; 0.5 is illustrative
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
reg.coef_
reg.intercept_
```
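Since the penalty strength `alpha` matters a lot in practice, a common refinement is to pick it by cross-validation. A minimal sketch with `RidgeCV`, where the `alphas` grid is an arbitrary illustrative choice:

```python
from sklearn.linear_model import RidgeCV

# RidgeCV evaluates each candidate alpha by (leave-one-out) cross-validation
reg = RidgeCV(alphas=[0.1, 1.0, 10.0])
reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print(reg.alpha_)  # the selected penalty strength
```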
3. Lasso Regression
Adds an L1-norm penalty term on top of the least-squares objective.
```python
from sklearn.linear_model import Lasso

reg = Lasso(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])
"""
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
"""
reg.predict([[1, 1]])
```
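A practical consequence of the L1 penalty is that it drives some coefficients exactly to zero, so Lasso doubles as feature selection. A sketch on synthetic data (the dataset shape and `alpha` here are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# 10 features, only 2 of which actually drive the target
X, y = make_regression(n_samples=100, n_features=10, n_informative=2,
                       noise=5.0, random_state=0)
reg = Lasso(alpha=1.0)
reg.fit(X, y)
print(reg.coef_)  # the uninformative features typically get coefficient 0
print(int(np.sum(reg.coef_ != 0)), "non-zero coefficients")
```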
4. Elastic Net Regression
Uses the L1 and L2 norms together as the penalty term.
```python
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

X, y = make_regression(n_features=2, random_state=0)
regr = ElasticNet(random_state=0)
regr.fit(X, y)
print(regr.coef_)
print(regr.intercept_)
print(regr.predict([[0, 0]]))
```
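The balance between the two penalties is controlled by `l1_ratio` (1.0 is pure Lasso, 0.0 is pure Ridge). A sketch on the same synthetic data; the values below are illustrative, not tuned:

```python
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

X, y = make_regression(n_features=2, random_state=0)

# alpha scales the overall penalty; l1_ratio splits it between L1 and L2
regr = ElasticNet(alpha=0.1, l1_ratio=0.5, random_state=0)
regr.fit(X, y)
print(regr.coef_)
```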
5. Bayesian Ridge Regression
Bayesian ridge regression is similar to ridge regression; it estimates the model parameters by maximizing the marginal log-likelihood.
```python
from sklearn.linear_model import BayesianRidge

X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = BayesianRidge()
reg.fit(X, Y)
```
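One payoff of the Bayesian treatment is a built-in uncertainty estimate: `predict` can return the standard deviation of the predictive distribution alongside the mean. A sketch (the query point is an arbitrary example):

```python
from sklearn.linear_model import BayesianRidge

X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
Y = [0., 1., 2., 3.]
reg = BayesianRidge().fit(X, Y)

# return_std=True also yields the predictive standard deviation
y_mean, y_std = reg.predict([[1.5, 1.5]], return_std=True)
print(y_mean, y_std)
```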
6. SGD Regression
The linear models above optimize their loss functions via least squares. SGD regression is also a linear model; the difference is that it minimizes a regularized empirical loss by stochastic gradient descent.
```python
import numpy as np
from sklearn import linear_model

n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)
clf = linear_model.SGDRegressor(max_iter=1000, tol=1e-3)
clf.fit(X, y)
```
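Because gradient steps are scale-sensitive, SGD regression is usually run on standardized features. A minimal sketch with a `StandardScaler` pipeline, on the same kind of toy data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

np.random.seed(0)
y = np.random.randn(10)
X = np.random.randn(10, 5)

# standardize features, then fit; the pipeline applies both steps in order
pipe = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
pipe.fit(X, y)
```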
7. SVR
```python
from sklearn.svm import SVR

X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = SVR()
clf.fit(X, y)
"""
SVR(C=1.0, cache_size=200, coef0=0.0, degree=3, epsilon=0.1,
    gamma='auto_deprecated', kernel='rbf', max_iter=-1, shrinking=True,
    tol=0.001, verbose=False)
"""
clf.predict([[1, 1]])
```
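SVR's fit depends heavily on `C` (penalty for training errors) and `epsilon` (width of the zero-penalty tube), so these are typically tuned. A sketch with `GridSearchCV` on synthetic data; the grid values are illustrative assumptions:

```python
from sklearn.svm import SVR
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=100, n_features=2, noise=5.0, random_state=0)

param_grid = {'C': [0.1, 1.0, 10.0], 'epsilon': [0.01, 0.1, 1.0]}
search = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```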
8. KNN Regression
```python
from sklearn.neighbors import KNeighborsRegressor

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsRegressor(n_neighbors=2)
neigh.fit(X, y)
print(neigh.predict([[1.5]]))
```
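By default the k neighbors contribute equally; `weights='distance'` instead weights them by inverse distance, so nearer points dominate. A sketch on the same toy data, with a query point chosen to make the difference visible:

```python
from sklearn.neighbors import KNeighborsRegressor

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]

# closer neighbors get proportionally larger weight
neigh = KNeighborsRegressor(n_neighbors=2, weights='distance')
neigh.fit(X, y)
print(neigh.predict([[1.2]]))  # pulled toward y=0, unlike the uniform average 0.5
```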
9. Decision Tree Regression
```python
from sklearn.tree import DecisionTreeRegressor

X = [[0, 0], [2, 2]]
y = [0.5, 2.5]
clf = DecisionTreeRegressor()
clf = clf.fit(X, y)
clf.predict([[1, 1]])
```
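An unconstrained tree will fit the training data perfectly, so in practice its depth is usually capped. A sketch comparing training R² of an unconstrained vs. a depth-limited tree (the data and `max_depth` value are illustrative):

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=4, noise=10.0, random_state=0)

deep = DecisionTreeRegressor(random_state=0).fit(X, y)  # grows until leaves are pure
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(deep.score(X, y), shallow.score(X, y))  # training R^2: 1.0 vs. lower
```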
10. Neural Network
```python
from sklearn.neural_network import MLPRegressor

# X_train, y_train, X_test are assumed to be defined beforehand
mlp = MLPRegressor()
mlp.fit(X_train, y_train)
y_pred = mlp.predict(X_test)
```
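As with SGD, MLPs train much better on standardized inputs, and the architecture is set via `hidden_layer_sizes`. A self-contained sketch on synthetic data; the layer sizes and `max_iter` are illustrative choices, not tuned values:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
mlp.fit(X_train, y_train)
print(mlp.score(X_test, y_test))  # R^2 on the held-out split
```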
11. RandomForest Regression
```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)
regr = RandomForestRegressor(max_depth=2, random_state=0, n_estimators=100)
regr.fit(X, y)
print(regr.feature_importances_)
print(regr.predict([[0, 0, 0, 0]]))
```
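Because each tree trains on a bootstrap sample, the rows it never saw provide a free validation estimate; `oob_score=True` exposes it. A sketch reusing the same synthetic data:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2,
                       random_state=0, shuffle=False)

# each tree is scored on the samples its bootstrap draw left out
regr = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
regr.fit(X, y)
print(regr.oob_score_)  # out-of-bag R^2
```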
12. XGBoost Regression
```python
import xgboost as xgb
from sklearn.metrics import mean_squared_error

# X_train, y_train, X_test, y_test are assumed to be defined beforehand
xgb_model = xgb.XGBRegressor(max_depth=3, learning_rate=0.1, n_estimators=100,
                             objective='reg:squarederror', n_jobs=-1)
xgb_model.fit(X_train, y_train,
              eval_set=[(X_train, y_train)],
              eval_metric='rmse',  # logloss is a classification metric; use RMSE for regression
              verbose=100)
y_pred = xgb_model.predict(X_test)
print(mean_squared_error(y_test, y_pred))
```
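Rather than evaluating on the training set, a common refinement is to hold out a validation set and stop adding trees once its RMSE stops improving. A sketch with `early_stopping_rounds` on synthetic data; note that in newer XGBoost versions these arguments move to the constructor:

```python
import xgboost as xgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = xgb.XGBRegressor(n_estimators=1000, learning_rate=0.1,
                         objective='reg:squarederror')
model.fit(X_train, y_train,
          eval_set=[(X_valid, y_valid)],
          eval_metric='rmse',
          early_stopping_rounds=10,  # stop after 10 rounds without improvement
          verbose=False)
print(model.best_iteration)
```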
13. LightGBM Regression
LightGBM is another gradient boosting framework built on tree-based learning algorithms. Compared with XGBoost, it offers faster training and higher efficiency, lower memory usage, better accuracy, support for parallel learning, and the ability to handle large-scale data.
```python
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

# X_train, y_train, X_test, y_test are assumed to be defined beforehand
gbm = lgb.LGBMRegressor(num_leaves=31, learning_rate=0.05, n_estimators=20)
gbm.fit(X_train, y_train,
        eval_set=[(X_train, y_train)],
        eval_metric='l2',  # logloss is a classification metric; l2 (MSE) fits regression
        verbose=100)
y_pred = gbm.predict(X_test)
print(mean_squared_error(y_test, y_pred))
```
Beginner Competitions
Kaggle: House Price Prediction
As one of the most basic regression problems, this competition is a good fit for newcomers to machine learning.
URL: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Classic solutions:
XGBoost solution: https://www.kaggle.com/dansbecker/xgboost
Lasso solution: https://www.kaggle.com/mymkyt/simple-lasso-public-score-0-12102
Intermediate Competitions
Kaggle: Sales Prediction
A classic time-series problem: the goal is to predict next month's total sales for every product and store.
URL: https://www.kaggle.com/c/competitive-data-science-predict-future-sales
Classic solutions:
LightGBM: https://www.kaggle.com/sanket30/predicting-sales-using-lightgbm
XGBoost: https://www.kaggle.com/fabianaboldrin/eda-xgboost
1st place solution: https://www.kaggle.com/c/competitive-data-science-predict-future-sales/discussion/74835#latest-503740
Top Competition Solutions
Kaggle: Restaurant Visitor Forecasting
URL: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting
Solutions:
1st place: https://www.kaggle.com/plantsgo/solution-public-0-471-private-0-505
7th place: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/49259#latest-284437
8th place: https://github.com/MaxHalford/kaggle-recruit-restaurant
12th place: https://www.kaggle.com/c/recruit-restaurant-visitor-forecasting/discussion/49251#latest-282765
Kaggle: Corporación Favorita Grocery Sales Forecasting
URL: https://www.kaggle.com/c/favorita-grocery-sales-forecasting
Solutions:
1st place: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47582#latest-360306
2nd place: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47568#latest-278474
3rd place: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47560#latest-302253
4th place: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47529#latest-271077
5th place: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47556#latest-270515
6th place: https://www.kaggle.com/c/favorita-grocery-sales-forecasting/discussion/47575#latest-269568