ML with XGBoost: binary-classification prediction on the Pima Indians Diabetes dataset using an XGBoost model (cross-validation + grid-search parameter tuning)

Source: Chongqing Municipal Software Legalization Service Center    |    Date: 2022-09-19


Table of Contents

Output

Design approach

Core code


Output

Design approach

Core code

param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, Y)
class GridSearchCV(BaseSearchCV):
    """Exhaustive search over specified parameter values for an estimator.

    Important members are fit, predict.

    GridSearchCV implements a "fit" and a "score" method.
    It also implements "predict", "predict_proba", "decision_function",
    "transform" and "inverse_transform" if they are implemented in the
    estimator used.

    The parameters of the estimator used to apply these methods are optimized
    by cross-validated grid-search over a parameter grid.

    Read more in the :ref:`User Guide <grid_search>`.

    Parameters
    ----------
    estimator : estimator object.
        This is assumed to implement the scikit-learn estimator interface.
        Either estimator needs to provide a ``score`` function,
        or ``scoring`` must be passed.

    param_grid : dict or list of dictionaries
        Dictionary with parameters names (string) as keys and lists of
        parameter settings to try as values, or a list of such
        dictionaries, in which case the grids spanned by each dictionary
        in the list are explored. This enables searching over any sequence
        of parameter settings.

    scoring : string, callable, list/tuple, dict or None, default: None
        A single string (see :ref:`scoring_parameter`) or a callable
        (see :ref:`scoring`) to evaluate the predictions on the test set.

        For evaluating multiple metrics, either give a list of (unique)
        strings or a dict with names as keys and callables as values.

        NOTE that when using custom scorers, each scorer should return a
        single value. Metric functions returning a list/array of values can
        be wrapped into multiple scorers that return one value each.

        See :ref:`multimetric_grid_search` for an example.

        If None, the estimator's default scorer (if available) is used.

    fit_params : dict, optional
        Parameters to pass to the fit method.

        .. deprecated:: 0.19
           ``fit_params`` as a constructor argument was deprecated in version
           0.19 and will be removed in version 0.21. Pass fit parameters to
           the ``fit`` method instead.

    n_jobs : int, default=1
        Number of jobs to run in parallel.

    pre_dispatch : int, or string, optional
        Controls the number of jobs that get dispatched during parallel
        execution. Reducing this number can be useful to avoid an
        explosion of memory consumption when more jobs get dispatched
        than CPUs can process. This parameter can be:

            - None, in which case all the jobs are immediately
              created and spawned. Use this for lightweight and
              fast-running jobs, to avoid delays due to on-demand
              spawning of the jobs

            - An int, giving the exact number of total jobs that are
              spawned

            - A string, giving an expression as a function of n_jobs,
              as in '2*n_jobs'

    iid : boolean, default=True
        If True, the data is assumed to be identically distributed across
        the folds, and the loss minimized is the total loss per sample,
        and not the mean loss across the folds.

    cv : int, cross-validation generator or an iterable, optional
        Determines the cross-validation splitting strategy.
        Possible inputs for cv are:

        - None, to use the default 3-fold cross validation,
        - integer, to specify the number of folds in a `(Stratified)KFold`,
        - An object to be used as a cross-validation generator.
        - An iterable yielding train, test splits.

        For integer/None inputs, if the estimator is a classifier and ``y``
        is either binary or multiclass, :class:`StratifiedKFold` is used. In
        all other cases, :class:`KFold` is used.

        Refer :ref:`User Guide <cross_validation>` for the various
        cross-validation strategies that can be used here.

    refit : boolean, or string, default=True
        Refit an estimator using the best found parameters on the whole
        dataset.

        For multiple metric evaluation, this needs to be a string denoting
        the scorer used to find the best parameters for refitting the
        estimator at the end.

        The refitted estimator is made available at the ``best_estimator_``
        attribute and permits using ``predict`` directly on this
        ``GridSearchCV`` instance.

        Also for multiple metric evaluation, the attributes ``best_index_``,
        ``best_score_`` and ``best_params_`` will only be available if
        ``refit`` is set and all of them will be determined w.r.t this
        specific scorer.

        See ``scoring`` parameter to know more about multiple metric
        evaluation.

    verbose : integer
        Controls the verbosity: the higher, the more messages.

    error_score : 'raise' (default) or numeric
        Value to assign to the score if an error occurs in estimator fitting.
        If set to 'raise', the error is raised. If a numeric value is given,
        FitFailedWarning is raised. This parameter does not affect the refit
        step, which will always raise the error.

    return_train_score : boolean, optional
        If ``False``, the ``cv_results_`` attribute will not include training
        scores.

        Current default is ``'warn'``, which behaves as ``True`` in addition
        to raising a warning when a training score is looked up.
        That default will be changed to ``False`` in 0.21.
        Computing training scores is used to get insights on how different
        parameter settings impact the overfitting/underfitting trade-off.
        However computing the scores on the training set can be
        computationally expensive and is not strictly required to select
        the parameters that yield the best generalization performance.

    Examples
    --------
    >>> from sklearn import svm, datasets
    >>> from sklearn.model_selection import GridSearchCV
    >>> iris = datasets.load_iris()
    >>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
    >>> svc = svm.SVC()
    >>> clf = GridSearchCV(svc, parameters)
    >>> clf.fit(iris.data, iris.target)
    ...                         # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    GridSearchCV(cv=None, error_score=...,
           estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                         decision_function_shape='ovr', degree=..., gamma=...,
                         kernel='rbf', max_iter=-1, probability=False,
                         random_state=None, shrinking=True, tol=...,
                         verbose=False),
           fit_params=None, iid=..., n_jobs=1,
           param_grid=..., pre_dispatch=..., refit=...,
           return_train_score=..., scoring=..., verbose=...)
    >>> sorted(clf.cv_results_.keys())
    ...                         # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS
    ['mean_fit_time', 'mean_score_time', 'mean_test_score',...
     'mean_train_score', 'param_C', 'param_kernel', 'params',...
     'rank_test_score', 'split0_test_score',...
     'split0_train_score', 'split1_test_score', 'split1_train_score',...
     'split2_test_score', 'split2_train_score',...
     'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]

    Attributes
    ----------
    cv_results_ : dict of numpy (masked) ndarrays
        A dict with keys as column headers and values as columns, that can be
        imported into a pandas ``DataFrame``.

        For instance the below given table

        +------------+-----------+------------+-----------------+---+---------+
        |param_kernel|param_gamma|param_degree|split0_test_score|...|rank_t...|
        +============+===========+============+=================+===+=========+
        |  'poly'    |    --     |     2      |       0.8       |...|    2    |
        +------------+-----------+------------+-----------------+---+---------+
        |  'poly'    |    --     |     3      |       0.7       |...|    4    |
        +------------+-----------+------------+-----------------+---+---------+
        |  'rbf'     |    0.1    |    --      |       0.8       |...|    3    |
        +------------+-----------+------------+-----------------+---+---------+
        |  'rbf'     |    0.2    |    --      |       0.9       |...|    1    |
        +------------+-----------+------------+-----------------+---+---------+

        will be represented by a ``cv_results_`` dict of::

            {
            'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                                         mask = [False False False False]...)
            'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                                        mask = [ True  True False False]...),
            'param_degree': masked_array(data = [2.0 3.0 -- --],
                                         mask = [False False  True  True]...),
            'split0_test_score'  : [0.8, 0.7, 0.8, 0.9],
            'split1_test_score'  : [0.82, 0.5, 0.7, 0.78],
            'mean_test_score'    : [0.81, 0.60, 0.75, 0.82],
            'std_test_score'     : [0.02, 0.01, 0.03, 0.03],
            'rank_test_score'    : [2, 4, 3, 1],
            'split0_train_score' : [0.8, 0.9, 0.7],
            'split1_train_score' : [0.82, 0.5, 0.7],
            'mean_train_score'   : [0.81, 0.7, 0.7],
            'std_train_score'    : [0.03, 0.03, 0.04],
            'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
            'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
            'mean_score_time'    : [0.007, 0.06, 0.04, 0.04],
            'std_score_time'     : [0.001, 0.002, 0.003, 0.005],
            'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
            }

        NOTE

        The key ``'params'`` is used to store a list of parameter
        settings dicts for all the parameter candidates.

        The ``mean_fit_time``, ``std_fit_time``, ``mean_score_time`` and
        ``std_score_time`` are all in seconds.

        For multi-metric evaluation, the scores for all the scorers are
        available in the ``cv_results_`` dict at the keys ending with that
        scorer's name (``'_<scorer_name>'``) instead of ``'_score'`` shown
        above. ('split0_test_precision', 'mean_train_precision' etc.)

    best_estimator_ : estimator or dict
        Estimator that was chosen by the search, i.e. estimator
        which gave highest score (or smallest loss if specified)
        on the left out data. Not available if ``refit=False``.

        See ``refit`` parameter for more information on allowed values.

    best_score_ : float
        Mean cross-validated score of the best_estimator

        For multi-metric evaluation, this is present only if ``refit`` is
        specified.

    best_params_ : dict
        Parameter setting that gave the best results on the hold out data.

        For multi-metric evaluation, this is present only if ``refit`` is
        specified.
    """
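The docstring above notes that ``cv_results_`` is a dict of columns that loads directly into a pandas DataFrame, and that the best configuration is exposed via ``best_params_`` and ``best_score_``. A minimal illustration, reusing the iris example from the docstring (the ``cv=3`` argument and the selected columns are choices made here for brevity):

```python
import pandas as pd
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}
clf = GridSearchCV(svm.SVC(), parameters, cv=3)
clf.fit(iris.data, iris.target)

# cv_results_ is column-oriented, so it converts straight to a DataFrame:
# one row per parameter candidate (2 kernels x 2 C values = 4 rows here).
results = pd.DataFrame(clf.cv_results_)
print(results[['param_kernel', 'param_C', 'mean_test_score', 'rank_test_score']])

# The winning parameter combination and its mean cross-validated score
print(clf.best_params_, clf.best_score_)
```

Sorting the DataFrame by ``rank_test_score`` gives a quick leaderboard of all candidates, which is often more informative than looking at ``best_params_`` alone.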
