An end-to-end data pipeline that extracts raw Mercedes-Benz global car sales data from Kaggle, transforms it into three analytical fact tables, and loads the results as CSV reports, ready for dashboards or further analysis.
## Project Structure

```
.
├── main.py                       # Pipeline entry point
├── requirements.txt
├── README.md
├── reports/                      # Output CSVs written here after load
│   ├── fact_annual_metrics.csv
│   ├── fact_fuel_performance.csv
│   └── fact_model_profitability.csv
├── src/
│   ├── config/
│   │   ├── logging.py            # Logging setup
│   │   └── settings.py           # Constants: paths, column lists, body type map
│   ├── extract/
│   │   └── read_source.py        # Pulls data from Kaggle via kagglehub
│   ├── transform/
│   │   └── data_cleaning.py      # Validation, cleaning, and aggregation
│   └── load/
│       └── load_data.py          # Writes fact tables to CSV
└── tests/                        # Full unittest suite
    ├── extract/
    │   └── test_source.py
    ├── transform/
    │   ├── test_transformation_schema.py
    │   └── test_transformation.py
    ├── load/
    │   └── test_loading.py
    ├── full_pipeline.py
    └── utils.py
```
## Pipeline Flow

```
Kaggle Dataset
      │
      ▼
[Extract]   read_source.py   → raw DataFrame
      │
      ▼
[Transform] data_cleaning.py → fact_annual_metrics
                             → fact_fuel_performance
                             → fact_model_profitability
      │
      ▼
[Load]      load_data.py     → reports/*.csv
```
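The three stages above can be sketched as a thin orchestration in the style of `main.py`. The function bodies here are illustrative stand-ins; the real implementations live in `src/extract/read_source.py`, `src/transform/data_cleaning.py`, and `src/load/load_data.py`, and their names and signatures may differ.

```python
from pathlib import Path
import pandas as pd

def extract() -> pd.DataFrame:
    # Stand-in for read_source.py, which pulls the dataset from Kaggle
    # via kagglehub; here we just fabricate two rows.
    return pd.DataFrame({
        "Year":  [2020, 2021],
        "Model": ["C-Class", "E-Class"],
        "Units": [120, 150],
        "Price": [42000.0, 55000.0],
    })

def transform(raw: pd.DataFrame) -> dict[str, pd.DataFrame]:
    # Stand-in for data_cleaning.py: one dict entry per fact table
    # (only fact_annual_metrics is sketched here).
    raw = raw.assign(Revenue=raw["Units"] * raw["Price"])
    annual = raw.groupby("Year", as_index=False).agg(
        total_units=("Units", "sum"),
        total_revenue=("Revenue", "sum"),
    )
    return {"fact_annual_metrics": annual}

def load(facts: dict[str, pd.DataFrame], out_dir: str = "reports") -> None:
    # Stand-in for load_data.py: one CSV per fact table.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for name, table in facts.items():
        table.to_csv(Path(out_dir) / f"{name}.csv", index=False)

if __name__ == "__main__":
    load(transform(extract()))
```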
## Fact Tables

| Table | Grain | Key Metrics |
|---|---|---|
| `fact_annual_metrics` | Year | Total units sold, total revenue, avg price, YoY revenue growth % |
| `fact_fuel_performance` | Year × Fuel Type | Units sold, avg price, avg horsepower, market share % |
| `fact_model_profitability` | Year × Model × Body Type | Units sold, total revenue, avg horsepower |
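As a rough illustration of how the first two tables' derived metrics can fall out of a pandas `groupby`, here is a toy example; the column names and rounding are assumptions, not the pipeline's actual schema.

```python
import pandas as pd

# Toy rows standing in for the cleaned Kaggle data.
sales = pd.DataFrame({
    "Year":      [2020, 2020, 2021, 2021],
    "Fuel Type": ["Petrol", "Diesel", "Petrol", "Diesel"],
    "Units":     [100, 50, 120, 80],
    "Price":     [40000.0, 60000.0, 45000.0, 65000.0],
})
sales["Revenue"] = sales["Units"] * sales["Price"]

# fact_annual_metrics: one row per Year, with YoY revenue growth %.
annual = sales.groupby("Year", as_index=False).agg(
    total_units=("Units", "sum"),
    total_revenue=("Revenue", "sum"),
    avg_price=("Price", "mean"),
)
annual["yoy_revenue_growth_pct"] = (
    annual["total_revenue"].pct_change() * 100
).round(2)

# fact_fuel_performance: one row per Year × Fuel Type, with market share %
# computed against each year's total units (shares sum to 100 per year).
fuel = sales.groupby(["Year", "Fuel Type"], as_index=False).agg(
    units=("Units", "sum"),
    avg_price=("Price", "mean"),
)
fuel["market_share_pct"] = (
    fuel["units"] / fuel.groupby("Year")["units"].transform("sum") * 100
).round(2)
```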
## Getting Started

### 1. Clone the repository

```bash
git clone https://github.com/franchierchie/End-to-End-Data-Solution-Mercedes-Global-Car-Sales-2020-2025.git
cd End-to-End-Data-Solution-Mercedes-Global-Car-Sales-2020-2025
```

### 2. Create and activate a virtual environment

```bash
python -m venv .venv
source .venv/bin/activate   # macOS / Linux
.venv\Scripts\activate      # Windows
```

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

### 4. Authenticate with Kaggle

The pipeline uses kagglehub to download the dataset. Authenticate once before running:
Place your API token at `~/.kaggle/kaggle.json` (`%USERPROFILE%\.kaggle\kaggle.json` on Windows):

```bash
mkdir -p ~/.kaggle
echo '{"username":"<your_kaggle_username>","key":"<your_kaggle_api_key>"}' > ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
```

Or export the environment variables:

```bash
export KAGGLE_USERNAME=<your_kaggle_username>
export KAGGLE_KEY=<your_kaggle_api_key>
```

You can generate an API key from your Kaggle account settings.
## Run the Pipeline

```bash
python main.py
```

Output CSVs are written to the `reports/` directory.
## Run the Tests

```bash
python -m unittest discover tests -v
```

The test suite is self-contained: it does not require a Kaggle connection or the real project package to be installed. All data source calls are replaced by an in-memory fixture.
### Test Coverage

| Area | What is tested |
|---|---|
| Extract | Returns a DataFrame, non-empty, all required columns present, correct dtypes |
| Validate schema | Raises ValueError for missing columns; error message names the missing column |
| Transform | Column renaming, Sales Volume removal, Body Type mapping (known & unknown models), aggregation correctness, YoY growth, market share sums to 100%, rounding, null-Model guard, deduplication |
| Load | Three CSVs are created, non-empty, round-trip correctly; raises ValueError for any empty fact table |
| End-to-end | Full pipeline with standard data; edge cases: single year, single fuel type, all-unknown models, YoY growth direction, CSV row count consistency |
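The schema-validation row above can be sketched as a small `unittest` case in the style of `tests/transform/test_transformation_schema.py`. The `validate_schema` helper and `REQUIRED_COLUMNS` list here are hypothetical; the real names live in the project's `src/transform` and test modules.

```python
import unittest
import pandas as pd

# Illustrative required-column list; the pipeline defines its own in
# src/config/settings.py.
REQUIRED_COLUMNS = ["Year", "Model", "Fuel Type", "Price"]

def validate_schema(df: pd.DataFrame) -> None:
    # Raise a ValueError that names the first missing required column.
    for col in REQUIRED_COLUMNS:
        if col not in df.columns:
            raise ValueError(f"Missing required column: {col}")

class TestValidateSchema(unittest.TestCase):
    def test_missing_column_is_named_in_error(self):
        df = pd.DataFrame({"Year": [2020], "Model": ["C-Class"], "Price": [42000.0]})
        with self.assertRaises(ValueError) as ctx:
            validate_schema(df)
        self.assertIn("Fuel Type", str(ctx.exception))

    def test_complete_schema_passes(self):
        df = pd.DataFrame({col: [0] for col in REQUIRED_COLUMNS})
        validate_schema(df)  # must not raise

if __name__ == "__main__":
    unittest.main(exit=False)
```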
See `requirements.txt` for the full dependency list.