Closed
Changes from 51 commits
Commits
71 commits
3a02bef
Add Cercarbono project processing and update raw columns mapping
andersy005 Dec 10, 2025
47829e2
Merge branch 'main' into add-Cercarbano
andersy005 Dec 11, 2025
95d9a1a
Update Cercarbono mappings in projects-raw-columns-mapping.json
andersy005 Dec 11, 2025
dbbc2c6
Add method to generate project URLs for Cercarbono projects
andersy005 Dec 11, 2025
8a06d33
Add processing method for Cercarbono transactions and update column m…
andersy005 Dec 11, 2025
7a1dcc4
Update transaction date conversion to use ISO8601 format
andersy005 Dec 11, 2025
a05fc97
Extract vintage year from vintage_of_credits in process_cercarbono_tr…
andersy005 Dec 11, 2025
328d074
Add missing columns handling in process_cercarbono_transactions
andersy005 Dec 11, 2025
2c2df90
Refactor process_cercarbono_projects to accept credits DataFrame and …
andersy005 Dec 11, 2025
3a36872
Remove unnecessary parameter from process_vcs_projects calls in tests
andersy005 Dec 11, 2025
3f6029a
Add process_isometric_projects function to handle Isometric project data
andersy005 Dec 11, 2025
121a275
Add isometric project mappings to projects-raw-columns-mapping.json
andersy005 Dec 11, 2025
e8d93cc
Add project URL handling and enhance isometric project processing
andersy005 Dec 11, 2025
53b461e
Rename process_cercarbono_transactions to process_cercarbono_credits …
andersy005 Dec 12, 2025
a7a7540
Enhance process_isometric_credits function to include datetime conver…
andersy005 Dec 12, 2025
156694b
Add project ID and vintage year extraction to process_isometric_credi…
andersy005 Dec 12, 2025
31b6cb4
Change integer columns to Float32 in project_schema and credit_withou…
andersy005 Dec 12, 2025
cf5ca9c
Uncomment methods to add retired and issued totals, and first issuanc…
andersy005 Dec 12, 2025
783b1d2
Refactor process_isometric_credits function to handle transaction typ…
andersy005 Dec 12, 2025
eaa2599
Add 'isometric' and 'cercarbono' to registry abbreviation mapping
andersy005 Jan 5, 2026
04072d5
Update project_id mapping in cercarbono retirements and remove redund…
andersy005 Jan 14, 2026
d97f43b
Add project ID methods for Cercarbono and Isometric credits dataframe…
andersy005 Jan 14, 2026
5efeea9
Fix project ID assignment order in process_cercarbono_projects and up…
andersy005 Jan 14, 2026
0b6d441
Refactor process_cercarbono_credits to streamline data handling for i…
andersy005 Jan 14, 2026
3966307
Merge branch 'main' into add-Cercarbano
andersy005 Jan 14, 2026
ad1804a
Enhance process_isometric_credits to support project ID mapping with …
andersy005 Jan 14, 2026
d2bc9a2
Add harmonization option for beneficiary data in process functions
andersy005 Jan 14, 2026
62153ae
Refactor process_isometric_credits to improve flow and readability by…
andersy005 Jan 14, 2026
75a1c69
Merge branch 'main' into add-Cercarbano
andersy005 Jan 29, 2026
1afb6e8
Retrigger CI
andersy005 Feb 12, 2026
0c1c6d9
Merge branch 'main' into add-Cercarbano
andersy005 Feb 12, 2026
adcfc70
Refactor import statements for pandera to use pandas submodule
andersy005 Feb 12, 2026
9816997
Add new project types and update isometric project type inference logic
andersy005 Feb 12, 2026
99b03bf
Add Cercarbono project type inference and update protocol mapping
andersy005 Feb 12, 2026
c40bcf1
Merge branch 'main' into add-Cercarbano
andersy005 Feb 25, 2026
09f7c6e
Refactor protocol mapping: rename 'ccb-reforest' to 'ccb-refor' and r…
andersy005 Feb 25, 2026
f772e5b
Remove Cercarbono and Isometric project type inference from processin…
andersy005 Feb 25, 2026
56e9bb8
Add project type inference to process_cercarbono_projects and process…
andersy005 Feb 25, 2026
d5a6282
Update cercarbano config for protocol definition
badgley Feb 26, 2026
5dbec5b
Fix typos in CCB methodology descriptions in all-protocol-mapping.json
andersy005 Feb 26, 2026
b87fc10
Map more cercarbono protocol strings
badgley Mar 3, 2026
143606c
Merge branch 'main' into add-Cercarbano
andersy005 Mar 3, 2026
e9d8de0
fix formatting
andersy005 Mar 3, 2026
48e7d43
remove infer_cercarbono_project_type and infer_isometric_project_type…
andersy005 Mar 4, 2026
0276b2c
Fix project type mapping string
badgley Mar 4, 2026
6531715
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 4, 2026
70bb898
Add protocol mapping to project processing functions and update add_c…
andersy005 Mar 4, 2026
767444e
Change project schema to use Float64 for retired, issued, and quantit…
andersy005 Mar 5, 2026
c8dea2f
Update project_id generation to preserve full code for Cercarbono pro…
andersy005 Mar 5, 2026
146d145
Add optional projects parameter to process_cercarbono_credits for imp…
andersy005 Mar 5, 2026
d364b97
Refactor project ID generation in process_cercarbono_credits to use g…
andersy005 Mar 11, 2026
3599b43
Update beneficiary data to include new registries
badgley Mar 16, 2026
d3e7f52
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 16, 2026
d404725
Merge branch 'main' into add-Cercarbano
andersy005 Apr 14, 2026
496b98f
Merge main: adopt refactored test signatures from #153
andersy005 Apr 14, 2026
530fef3
add tests for cercarbono and isometric
andersy005 Apr 14, 2026
773fff9
remove stale download_type='projects' from both test calls
andersy005 Apr 14, 2026
3ed8931
Refactor CI workflow: separate unit-test and integration-test jobs
andersy005 Apr 14, 2026
4bb16e9
Update credits mapping and normalize data processing in isometric.py
andersy005 Apr 15, 2026
e911f1f
Add transaction_url field to credits mapping and models
andersy005 Apr 15, 2026
62e1f0b
Add transaction_url field for issuances in process_isometric_credits …
andersy005 Apr 15, 2026
a7e74c2
update isometric retirement data
andersy005 Apr 15, 2026
e6ac8bc
Update scratch date in test fixtures and adjust test cases for cercar…
andersy005 Apr 15, 2026
9cb0328
Refactor tests to use subtests for improved clarity
andersy005 Apr 15, 2026
377f59d
Update vintage mapping for isometric issuances and adjust transaction…
andersy005 Apr 15, 2026
e0e9c3f
Fix precision errors in credit totals calculation by rounding based o…
andersy005 Apr 15, 2026
a200308
Merge branch 'main' into add-Cercarbano
andersy005 Apr 15, 2026
dd06d31
Merge branch 'main' into add-Cercarbano
andersy005 Apr 21, 2026
efd8640
Upadte protocol mapping
badgley Apr 21, 2026
53a64bb
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 21, 2026
49fcbab
Strip cercarbono-specific code; scope branch to Isometric only
andersy005 Apr 22, 2026
7 changes: 5 additions & 2 deletions offsets_db_data/apx.py
@@ -8,6 +8,7 @@
PROJECT_SCHEMA_UPATH,
load_column_mapping,
load_inverted_protocol_mapping,
+load_protocol_mapping,
load_registry_project_column_mapping,
load_type_category_mapping,
)
@@ -212,6 +213,7 @@ def process_apx_projects(
)
inverted_column_mapping = {value: key for key, value in registry_project_column_mapping.items()}
inverted_protocol_mapping = load_inverted_protocol_mapping()
+protocol_mapping = load_protocol_mapping()
type_category_mapping = load_type_category_mapping()
data = df.rename(columns=inverted_column_mapping)
if registry_name == 'art-trees':
@@ -234,8 +236,9 @@ def process_apx_projects(
override_data_path=BERKELEY_PROJECT_TYPE_UPATH, source_str='berkeley'
)
.add_category(
-type_category_mapping=type_category_mapping
-) # must come after types; type -> category
+type_category_mapping=type_category_mapping,
+protocol_mapping=protocol_mapping,
+) # category derived from protocol; project_type is independent
.map_project_type_to_display_name(type_category_mapping=type_category_mapping)
.add_is_compliance_flag()
.add_retired_and_issued_totals(credits=credits)
197 changes: 197 additions & 0 deletions offsets_db_data/cercarbono.py
@@ -0,0 +1,197 @@
import pandas as pd
import pandas_flavor as pf

from offsets_db_data.common import (
BERKELEY_PROJECT_TYPE_UPATH,
CREDIT_SCHEMA_UPATH,
PROJECT_SCHEMA_UPATH,
load_column_mapping,
load_inverted_protocol_mapping,
load_protocol_mapping,
load_registry_project_column_mapping,
load_type_category_mapping,
)
from offsets_db_data.credits import (
aggregate_issuance_transactions, # noqa: F401
harmonize_beneficiary_data, # noqa: F401
merge_with_arb, # noqa: F401
)
from offsets_db_data.models import credit_without_id_schema, project_schema
from offsets_db_data.projects import (
add_category, # noqa: F401
add_first_issuance_and_retirement_dates, # noqa: F401
add_is_compliance_flag, # noqa: F401
add_retired_and_issued_totals, # noqa: F401
harmonize_country_names, # noqa: F401
harmonize_status_codes, # noqa: F401
map_protocol, # noqa: F401
)


@pf.register_dataframe_method
def add_cercarbono_project_url(df: pd.DataFrame) -> pd.DataFrame:
"""Add project URL column for Cercarbono projects.

Parameters
----------
df : pd.DataFrame
Input dataframe containing Cercarbono project data.

Returns
-------
pd.DataFrame
Dataframe with added project URL column.
"""
base_url = 'https://www.ecoregistry.io/projects'
df['project_url'] = df['project_id'].apply(lambda x: f'{base_url}/{x}')
return df


@pf.register_dataframe_method
def add_cercarbono_project_id(df: pd.DataFrame, prefix: str = 'CCB') -> pd.DataFrame:
"""Add project ID column for Cercarbono credits dataframe.

Parameters
----------
df : pd.DataFrame
Input dataframe containing Cercarbono data; must include the numeric `id` column.
prefix : str, optional
Prefix to add to project IDs, by default "CCB"

Returns
-------
pd.DataFrame
Dataframe with added project ID column.
"""
df = df.copy()
# Use the globally unique numeric id (not the per-prefix code number) to avoid collisions.
# Different code prefixes (CDC, CP, CGS, CDB, CBA) share numeric suffixes (e.g. CDC-1,
# CP-1, CGS-1) but each project has a distinct id across the whole registry.
df['project_id'] = prefix + df['id'].astype(str)
return df


@pf.register_dataframe_method
def process_cercarbono_credits(
df: pd.DataFrame,
*,
download_type: str,
registry_name: str = 'cercarbono',
prefix: str = 'CCB',
harmonize_beneficiary_info: bool = False,
) -> pd.DataFrame:
"""Process Cercarbono transactions dataframe to conform to offsets-db schema.

Parameters
----------
df : pd.DataFrame
Input dataframe containing Cercarbono credit transactions data.
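download_type : str
Type of download being processed. 'issuances' selects the issuance branch; any other
value is handled as 'retirements'.
registry_name : str, optional
Name of the registry to be added to the dataframe, by default "cercarbono"
prefix : str, optional
Prefix to add to project IDs, by default "CCB"
harmonize_beneficiary_info : bool, optional
Whether to harmonize beneficiary data after validation, by default False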

Returns
-------
pd.DataFrame
Processed dataframe conforming to offsets-db schema.
"""

if download_type == 'issuances':
# TODO: @badgley, please confirm this is the correct way to extract vintage year for issuances
df['vintage'] = df['vintage_of_credits'].str.split(' / ').str[-1].str[:4].astype(int)
df['transaction_type'] = 'issuance'
# Extract numeric project ID from serial — this is the globally unique id.
# Standard format: CDC_1_... → id at index 1
# Revised format: CDC_R_16_... → id at index 2 (R indicates revision)
parts = df.serial.str.split('_')
numeric_id = parts.str[1].where(parts.str[1] != 'R', parts.str[2])
df['project_id'] = prefix + numeric_id

else:
df['transaction_type'] = 'retirement'
# project_id in the raw retirements data is the numeric id
df['project_id'] = prefix + df['project_id'].astype(str)

column_mapping = load_column_mapping(
registry_name=registry_name, download_type=download_type, mapping_path=CREDIT_SCHEMA_UPATH
)

columns = {v: k for k, v in column_mapping.items()}

data = (
df.rename(columns=columns)
.set_registry(registry_name=registry_name)
.convert_to_datetime(columns=['transaction_date'], format='ISO8601')
.add_missing_columns(schema=credit_without_id_schema)
.validate(schema=credit_without_id_schema)
)

if harmonize_beneficiary_info:
data = data.pipe(
harmonize_beneficiary_data, registry_name=registry_name, download_type=download_type
)
return data


@pf.register_dataframe_method
def process_cercarbono_projects(
df: pd.DataFrame,
*,
credits: pd.DataFrame,
registry_name: str = 'cercarbono',
) -> pd.DataFrame:
"""Process Cercarbono projects dataframe to conform to offsets-db schema.

Parameters
----------
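df : pd.DataFrame
Input dataframe containing Cercarbono project data.
credits : pd.DataFrame
Credits dataframe used to compute retired and issued totals and the first issuance
and retirement dates.
registry_name : str, optional
Name of the registry to be added to the dataframe, by default "cercarbono"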

Returns
-------
pd.DataFrame
Processed dataframe conforming to offsets-db schema.
"""

registry_project_column_mapping = load_registry_project_column_mapping(
registry_name=registry_name, file_path=PROJECT_SCHEMA_UPATH
)
inverted_column_mapping = {value: key for key, value in registry_project_column_mapping.items()}
type_category_mapping = load_type_category_mapping()
inverted_protocol_mapping = load_inverted_protocol_mapping()
protocol_mapping = load_protocol_mapping()
df = df.copy()
# Extract country from locations by taking the first entry.
df['country'] = df.locations.map(lambda x: x[0]['country'])

data = (
df.rename(columns=inverted_column_mapping)
.set_registry(registry_name=registry_name)
.add_cercarbono_project_url() # this must be called before adding project id because the url function uses the original project_id value
.add_cercarbono_project_id()
.harmonize_country_names()
.harmonize_status_codes()
.map_protocol(inverted_protocol_mapping=inverted_protocol_mapping)
.infer_project_type()
.override_project_types(
override_data_path=BERKELEY_PROJECT_TYPE_UPATH, source_str='berkeley'
)
.add_category(
type_category_mapping=type_category_mapping,
protocol_mapping=protocol_mapping,
) # category derived from protocol; project_type is independent
.map_project_type_to_display_name(type_category_mapping=type_category_mapping)
.add_is_compliance_flag()
.add_retired_and_issued_totals(credits=credits)
.add_first_issuance_and_retirement_dates(credits=credits)
.add_missing_columns(schema=project_schema)
.convert_to_datetime(columns=['listed_at', 'first_issuance_at', 'first_retirement_at'])
.validate(schema=project_schema)
)

return data
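
The issuance branch of `process_cercarbono_credits` derives `project_id` from the serial and `vintage` from `vintage_of_credits`. The sketch below isolates those two steps so they can be run on their own. The helper names `extract_project_id` and `extract_vintage` are hypothetical (they are not part of this PR), and the sample serials and vintage strings are invented for illustration; real Cercarbono values may look different.

```python
import pandas as pd


def extract_project_id(serial: pd.Series, prefix: str = 'CCB') -> pd.Series:
    """Hypothetical helper: build prefixed project IDs from underscore-delimited serials."""
    parts = serial.str.split('_')
    # Standard serials carry the id at index 1. Revised serials have 'R' at index 1,
    # so the id is taken from index 2 instead.
    numeric_id = parts.str[1].where(parts.str[1] != 'R', parts.str[2])
    return prefix + numeric_id


def extract_vintage(vintage_of_credits: pd.Series) -> pd.Series:
    """Hypothetical helper: year = first four characters of the last ' / ' segment."""
    return vintage_of_credits.str.split(' / ').str[-1].str[:4].astype(int)


# Invented sample rows (assumed formats, not taken from the registry).
sample = pd.DataFrame(
    {
        'serial': ['CDC_1_2_3', 'CDC_R_16_2_3'],
        'vintage_of_credits': ['2019-01-01 / 2019-12-31', '2021-01-01 / 2021-06-30'],
    }
)

project_ids = extract_project_id(sample['serial']).tolist()
vintages = extract_vintage(sample['vintage_of_credits']).tolist()
print(project_ids, vintages)
```

Note that taking the last segment means a multi-year range would resolve to its end year under this parsing, which is the behavior the TODO in the diff asks to have confirmed.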
6 changes: 4 additions & 2 deletions offsets_db_data/common.py
@@ -5,7 +5,7 @@
import numpy as np
import pandas as pd
import pandas_flavor as pf
-import pandera as pa
+import pandera.pandas as pa
import upath

CREDIT_SCHEMA_UPATH = (
@@ -54,7 +54,9 @@ def load_inverted_protocol_mapping() -> dict:
return store


-def load_column_mapping(*, registry_name: str, download_type: str, mapping_path: str) -> dict:
+def load_column_mapping(
+*, registry_name: str, download_type: str, mapping_path: upath.UPath | str
+) -> dict:
with open(mapping_path) as f:
registry_credit_column_mapping = json.load(f)
return registry_credit_column_mapping[registry_name][download_type]
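
For reviewers who want to see how this loader feeds the rename step in the credit processors, here is a small standalone sketch. It copies the function body from the diff, but the JSON content, the file name `example-mapping.json`, and the raw column names are all invented for illustration, and a `pathlib.Path` stands in for the `upath.UPath` the real module passes.

```python
import json
import tempfile
from pathlib import Path


def load_column_mapping(*, registry_name: str, download_type: str, mapping_path) -> dict:
    # Same logic as in the diff: read the JSON file, then index by registry
    # name and download type.
    with open(mapping_path) as f:
        registry_credit_column_mapping = json.load(f)
    return registry_credit_column_mapping[registry_name][download_type]


# Hypothetical mapping content: schema column -> raw registry column.
example = {
    'cercarbono': {
        'issuances': {'quantity': 'raw_quantity', 'transaction_date': 'raw_date'},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'example-mapping.json'
    path.write_text(json.dumps(example))
    mapping = load_column_mapping(
        registry_name='cercarbono', download_type='issuances', mapping_path=path
    )

# The processors invert the mapping so it can be passed to DataFrame.rename(columns=...).
columns = {v: k for k, v in mapping.items()}
print(columns)
```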