Principle:
https://www.cnblogs.com/pinard/p/6140514.html
https://zhuanlan.zhihu.com/p/108641227
Examples:
https://zhuanlan.zhihu.com/p/40356430
https://www.pythonf.cn/read/5079
Parameter explanation for randomly splitting train/test samples:
https://www.cnblogs.com/pinard/p/6143927.html
https://www.cnblogs.com/Yanjy-OnlyOne/p/11288098.html
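The links above explain the parameters of `train_test_split`; a minimal sketch on toy data (the arrays here are illustrative, not from the notes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10)

# test_size: fraction of samples held out for testing
# random_state: seed, so the split is reproducible
# shuffle=True (the default) shuffles the samples before splitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=10)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```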
Hyperparameter tuning
Grid search (GridSearchCV): https://zhuanlan.zhihu.com/p/37310443
# Hyperparameter tuning with grid search
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Candidate values for each hyperparameter to search over
cv_params = {'learning_rate': [0.1, 0.05, 0.01],
             'max_depth': [1, 3, 5, 7, 10],
             'n_estimators': [100, 200, 300]}
# Fixed parameters passed straight to the estimator
ind_params = {'random_state': 10}
optimized_GBM = GridSearchCV(GradientBoostingRegressor(**ind_params),
                             cv_params,
                             scoring='neg_mean_squared_error',
                             cv=5, n_jobs=-1, verbose=10)
optimized_GBM.fit(X_pr_train, y_pr_train)
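After fitting, the best configuration can be read off the search object via `best_params_` and `best_score_`. A self-contained sketch on synthetic regression data (the dataset and the reduced grid are illustrative, chosen so the example runs quickly):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for X_pr_train / y_pr_train
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=10)

# A reduced grid (2 x 2 = 4 candidates) so the search finishes fast
cv_params = {'learning_rate': [0.1, 0.05], 'max_depth': [1, 3]}
search = GridSearchCV(GradientBoostingRegressor(random_state=10),
                      cv_params, scoring='neg_mean_squared_error', cv=3)
search.fit(X, y)

print(search.best_params_)  # e.g. {'learning_rate': 0.1, 'max_depth': 3}
print(search.best_score_)   # mean CV score of the best candidate (negative MSE)
```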
Random search (RandomizedSearchCV): https://blog.csdn.net/juezhanangle/article/details/80051256
# Hyperparameter tuning with randomized search
from time import time
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

param_dist = {'learning_rate': [0.1, 0.05, 0.01],
              'max_depth': [10, 50, 100],
              'n_estimators': [100, 200, 300]}
ind_params = {'random_state': 10}
n_iter_search = 20
# Note: 'roc_auc' is a classification metric and fails for a regressor;
# use a regression metric such as 'neg_mean_squared_error' instead
random_search = RandomizedSearchCV(GradientBoostingRegressor(**ind_params),
                                   param_distributions=param_dist,
                                   n_iter=n_iter_search, cv=3,
                                   scoring='neg_mean_squared_error',
                                   n_jobs=-1)
start = time()
random_search.fit(X_nopr_train, y_nopr_train)
print("RandomizedSearchCV took %.2f seconds for %d candidate"
      " parameter settings." % ((time() - start), n_iter_search))
report(random_search.cv_results_)
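The `report` helper called above is not defined anywhere in these notes; a minimal version, adapted from the scikit-learn model-selection examples, might look like:

```python
import numpy as np

def report(results, n_top=3):
    """Print the top-ranked candidates from a cv_results_ dict."""
    for i in range(1, n_top + 1):
        # All candidates sharing rank i (ties are possible)
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                results['mean_test_score'][candidate],
                results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")
```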
Bayesian optimization: a better way to tune hyperparameters
https://zhuanlan.zhihu.com/p/29779000
# Bayesian optimization (bayes_opt package)
from bayes_opt import BayesianOptimization
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Objective: cross-validated score as a function of the hyperparameters.
# BayesianOptimization proposes continuous values, so integer-valued
# parameters are cast with int() inside the objective.
def rf_cv(n_estimators, min_samples_split, learning_rate, max_depth):
    val = cross_val_score(
        GradientBoostingRegressor(n_estimators=int(n_estimators),
                                  min_samples_split=int(min_samples_split),
                                  learning_rate=min(learning_rate, 0.999),  # float
                                  max_depth=int(max_depth),
                                  random_state=2),
        X_nopr_train, y_nopr_train,
        # 'roc_auc' is a classification metric; for a regressor use e.g.
        # 'neg_mean_squared_error' (maximizing it minimizes the MSE)
        scoring='neg_mean_squared_error', cv=5).mean()
    return val

# Build the Bayesian optimization object with bounds for each parameter
# (BayesianOptimization has no n_jobs parameter, so it is dropped here)
rf_bo = BayesianOptimization(rf_cv,
                             {'n_estimators': (100, 300),
                              'min_samples_split': (2, 25),
                              'learning_rate': (0.01, 0.999),
                              'max_depth': (10, 150)})
rf_bo.maximize()
rf_bo.max  # best score found and the parameters that achieved it
plot_partial_dependence() official documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.inspection.plot_partial_dependence.html