EL (Ensemble Learning) / Bagging: Building a Bagging model on the Kaggle Titanic dataset to predict whether each passenger survived


Contents

Output

Design approach

Core code

Output

Design approach

Core code

from sklearn.ensemble import BaggingRegressor

# clf_LoR is the logistic-regression base estimator fitted earlier in the article
bagging_clf = BaggingRegressor(clf_LoR, n_estimators=10, max_samples=0.8, max_features=1.0, bootstrap=True, bootstrap_features=False, n_jobs=-1)
bagging_clf.fit(X, y)
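
For context, the surrounding pipeline might look like the following minimal, self-contained sketch. It assumes Kaggle's train.csv layout (columns Survived, Pclass, Sex, Age, SibSp, Parch, Fare); the feature handling here is deliberately simplified and is not the article's exact preprocessing, and clf_LoR, X, and y mirror the snippet above.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingRegressor

data = pd.read_csv("train.csv")
data["Sex"] = (data["Sex"] == "male").astype(int)       # encode sex as 0/1
data["Age"] = data["Age"].fillna(data["Age"].median())  # fill missing ages
data["Fare"] = data["Fare"].fillna(data["Fare"].median())

X = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]].values
y = data["Survived"].values

clf_LoR = LogisticRegression()  # base estimator, as in the snippet above
bagging_clf = BaggingRegressor(clf_LoR, n_estimators=10, max_samples=0.8,
                               max_features=1.0, bootstrap=True,
                               bootstrap_features=False, n_jobs=-1)
bagging_clf.fit(X, y)

# The regressor averages the 0/1 outputs of the base classifiers,
# so threshold the averaged prediction at 0.5 for survived / not survived
survived = (bagging_clf.predict(X) > 0.5).astype(int)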
The BaggingRegressor class, as found at sklearn.ensemble.bagging:

class BaggingRegressor(BaseBagging, RegressorMixin):
    """A Bagging regressor.

    A Bagging regressor is an ensemble meta-estimator that fits base
    regressors each on random subsets of the original dataset and then
    aggregates their individual predictions (either by voting or by averaging)
    to form a final prediction. Such a meta-estimator can typically be used as
    a way to reduce the variance of a black-box estimator (e.g., a decision
    tree), by introducing randomization into its construction procedure and
    then making an ensemble out of it.

    This algorithm encompasses several works from the literature. When random
    subsets of the dataset are drawn as random subsets of the samples, then
    this algorithm is known as Pasting [1]_. If samples are drawn with
    replacement, then the method is known as Bagging [2]_. When random subsets
    of the dataset are drawn as random subsets of the features, then the
    method is known as Random Subspaces [3]_. Finally, when base estimators
    are built on subsets of both samples and features, then the method is
    known as Random Patches [4]_.

    Read more in the :ref:`User Guide <bagging>`.

    Parameters
    ----------
    base_estimator : object or None, optional (default=None)
        The base estimator to fit on random subsets of the dataset.
        If None, then the base estimator is a decision tree.

    n_estimators : int, optional (default=10)
        The number of base estimators in the ensemble.

    max_samples : int or float, optional (default=1.0)
        The number of samples to draw from X to train each base estimator.
        - If int, then draw `max_samples` samples.
        - If float, then draw `max_samples * X.shape[0]` samples.

    max_features : int or float, optional (default=1.0)
        The number of features to draw from X to train each base estimator.
        - If int, then draw `max_features` features.
        - If float, then draw `max_features * X.shape[1]` features.

    bootstrap : boolean, optional (default=True)
        Whether samples are drawn with replacement.

    bootstrap_features : boolean, optional (default=False)
        Whether features are drawn with replacement.

    oob_score : bool
        Whether to use out-of-bag samples to estimate
        the generalization error.

    warm_start : bool, optional (default=False)
        When set to True, reuse the solution of the previous call to fit
        and add more estimators to the ensemble, otherwise, just fit
        a whole new ensemble.

    n_jobs : int, optional (default=1)
        The number of jobs to run in parallel for both `fit` and `predict`.
        If -1, then the number of jobs is set to the number of cores.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    verbose : int, optional (default=0)
        Controls the verbosity of the building process.

    Attributes
    ----------
    estimators_ : list of estimators
        The collection of fitted sub-estimators.

    estimators_samples_ : list of arrays
        The subset of drawn samples (i.e., the in-bag samples) for each base
        estimator. Each subset is defined by a boolean mask.

    estimators_features_ : list of arrays
        The subset of drawn features for each base estimator.

    oob_score_ : float
        Score of the training dataset obtained using an out-of-bag estimate.

    oob_prediction_ : array of shape = [n_samples]
        Prediction computed with out-of-bag estimate on the training
        set. If n_estimators is small it might be possible that a data point
        was never left out during the bootstrap. In this case,
        `oob_prediction_` might contain NaN.

    References
    ----------
    .. [1] L. Breiman, "Pasting small votes for classification in large
           databases and on-line", Machine Learning, 36(1), 85-103, 1999.
    .. [2] L. Breiman, "Bagging predictors", Machine Learning, 24(2), 123-140,
           1996.
    .. [3] T. Ho, "The random subspace method for constructing decision
           forests", Pattern Analysis and Machine Intelligence, 20(8),
           832-844, 1998.
    .. [4] G. Louppe and P. Geurts, "Ensembles on Random Patches", Machine
           Learning and Knowledge Discovery in Databases, 346-361, 2012.
    """

    def __init__(self,
                 base_estimator=None,
                 n_estimators=10,
                 max_samples=1.0,
                 max_features=1.0,
                 bootstrap=True,
                 bootstrap_features=False,
                 oob_score=False,
                 warm_start=False,
                 n_jobs=1,
                 random_state=None,
                 verbose=0):
        super(BaggingRegressor, self).__init__(
            base_estimator,
            n_estimators=n_estimators,
            max_samples=max_samples,
            max_features=max_features,
            bootstrap=bootstrap,
            bootstrap_features=bootstrap_features,
            oob_score=oob_score,
            warm_start=warm_start,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose)

    def predict(self, X):
        """Predict regression target for X.

        The predicted regression target of an input sample is computed as the
        mean predicted regression targets of the estimators in the ensemble.

        Parameters
        ----------
        X : {array-like, sparse matrix} of shape = [n_samples, n_features]
            The training input samples. Sparse matrices are accepted only if
            they are supported by the base estimator.

        Returns
        -------
        y : array of shape = [n_samples]
            The predicted values.
        """
        check_is_fitted(self, "estimators_features_")

        # Check data
        X = check_array(X, accept_sparse=['csr', 'csc'])

        # Parallel loop
        n_jobs, n_estimators, starts = _partition_estimators(self.n_estimators,
                                                             self.n_jobs)

        all_y_hat = Parallel(n_jobs=n_jobs, verbose=self.verbose)(
            delayed(_parallel_predict_regression)(
                self.estimators_[starts[i]:starts[i + 1]],
                self.estimators_features_[starts[i]:starts[i + 1]],
                X)
            for i in range(n_jobs))

        # Reduce: average the per-chunk prediction sums over the ensemble
        y_hat = sum(all_y_hat) / self.n_estimators

        return y_hat

    def _validate_estimator(self):
        """Check the estimator and set the base_estimator_ attribute."""
        super(BaggingRegressor, self)._validate_estimator(
            default=DecisionTreeRegressor())

    def _set_oob_score(self, X, y):
        n_samples = y.shape[0]

        predictions = np.zeros((n_samples,))
        n_predictions = np.zeros((n_samples,))

        for estimator, samples, features in zip(self.estimators_,
                                                self.estimators_samples_,
                                                self.estimators_features_):
            # Create mask for OOB samples
            mask = ~samples

            predictions[mask] += estimator.predict((X[mask, :])[:, features])
            n_predictions[mask] += 1

        if (n_predictions == 0).any():
            warn("Some inputs do not have OOB scores. "
                 "This probably means too few estimators were used "
                 "to compute any reliable oob estimates.")

            n_predictions[n_predictions == 0] = 1

        predictions /= n_predictions

        self.oob_prediction_ = predictions
        self.oob_score_ = r2_score(y, predictions)
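
The taxonomy in the docstring (Pasting, Bagging, Random Subspaces, Random Patches) maps directly onto parameter combinations. A hedged sketch with illustrative values, not taken from the article:

from sklearn.ensemble import BaggingRegressor

# Pasting [1]: random subsets of samples, drawn without replacement
pasting = BaggingRegressor(n_estimators=10, max_samples=0.8, bootstrap=False)

# Bagging [2]: random subsets of samples, drawn with replacement
bagging = BaggingRegressor(n_estimators=10, max_samples=0.8, bootstrap=True)

# Random Subspaces [3]: random subsets of features
subspaces = BaggingRegressor(n_estimators=10, max_features=0.5,
                             bootstrap=False)

# Random Patches [4]: random subsets of both samples and features
patches = BaggingRegressor(n_estimators=10, max_samples=0.8,
                           max_features=0.5, bootstrap=True)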
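
Likewise, _set_oob_score only runs when the estimator is constructed with oob_score=True (which requires bootstrap=True, so that each base estimator leaves some samples out-of-bag). A minimal sketch on synthetic data, purely illustrative:

import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X[:, 0] + 0.1 * rng.randn(200)  # noisy linear target

reg = BaggingRegressor(n_estimators=50, max_samples=0.8,
                       bootstrap=True, oob_score=True, random_state=0)
reg.fit(X, y)

print(reg.oob_score_)           # R^2 computed only on out-of-bag samples
print(reg.oob_prediction_[:5])  # per-sample out-of-bag predictions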