
Stop using Boston dataset in tests and examples #494

Open
StrikerRUS wants to merge 6 commits into master from drop_boston

StrikerRUS (Member) commented Jan 26, 2022

The Boston dataset is deprecated as of scikit-learn 1.0 and will be removed in 1.2 for ethical reasons, as the warning below states. Details: scikit-learn/scikit-learn#16155.

  /usr/local/lib/python3.7/dist-packages/sklearn/utils/deprecation.py:87: FutureWarning: Function load_boston is deprecated; `load_boston` is deprecated in 1.0 and will be removed in 1.2.
  
      The Boston housing prices dataset has an ethical problem. You can refer to
      the documentation of this function for further details.
  
      The scikit-learn maintainers therefore strongly discourage the use of this
      dataset unless the purpose of the code is to study and educate about
      ethical issues in data science and machine learning.
  
      In this special case, you can fetch the dataset from the original
      source::
  
          import pandas as pd
          import numpy as np
  
  
          data_url = "http://lib.stat.cmu.edu/datasets/boston"
          raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
          data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
          target = raw_df.values[1::2, 2]
  
      Alternative datasets include the California housing dataset (i.e.
      :func:`~sklearn.datasets.fetch_california_housing`) and the Ames housing
      dataset. You can load the datasets as follows::
  
          from sklearn.datasets import fetch_california_housing
          housing = fetch_california_housing()
  
      for the California housing dataset and::
  
          from sklearn.datasets import fetch_openml
          housing = fetch_openml(name="house_prices", as_frame=True)
  
      for the Ames housing dataset.
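
While migrating, one way to locate remaining calls to a deprecated loader is to escalate `FutureWarning` to an error so that the offending call fails loudly. The following is a minimal sketch using only the standard library. `deprecated_loader` and `emits_future_warning` are hypothetical names introduced for illustration; the loader is a stand-in that mimics the warning above and is not scikit-learn code.

```python
import warnings


def deprecated_loader():
    """Hypothetical stand-in for a deprecated loader such as load_boston."""
    warnings.warn(
        "Function load_boston is deprecated",  # abbreviated message
        FutureWarning,
        stacklevel=2,
    )
    # Dummy (data, target) pair; the values are placeholders, not real data.
    return [[0.0, 1.0]], [2.0]


def emits_future_warning(func):
    """Return True if calling func() triggers a FutureWarning."""
    with warnings.catch_warnings():
        # Escalate FutureWarning to an exception inside this block only;
        # the previous filter state is restored on exit.
        warnings.simplefilter("error", FutureWarning)
        try:
            func()
        except FutureWarning:
            return True
    return False


if __name__ == "__main__":
    print(emits_future_warning(deprecated_loader))
    print(emits_future_warning(lambda: None))
```

The same escalation can be applied to a whole run with Python's `-W error::FutureWarning` option, which pytest also accepts, assuming the test suite has no other `FutureWarning` sources that should stay non-fatal.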
