Models API

Energy and time aggregation from raw measurement logs.

This module provides functions to aggregate energy and time measurements from raw CSV logs, computing averages across multiple runs.

pgsi_analyzer.models.aggregation.stress_test_aggregation_regex(folder_path: str | Path, kind: str = 'energy') → dict

Regex stress test: attempts to process a folder containing a variety of filenames. Returns counts of files that were accepted, rejected (wrong pattern), and skipped (partial/temp).

pgsi_analyzer.models.aggregation.aggregate_energy(folder_path: str | Path, output_path: str | Path | None = None) → DataFrame

Compute average energy consumption from raw CSV logs in a folder.

Reads all CSV files in the specified folder and computes the average ‘package (uJ)’ value for each file.

Parameters:

folder_path – Path to folder containing energy CSV files. Each CSV should have a ‘package (uJ)’ column.
output_path – Optional path to save the aggregated results CSV.

Returns:

DataFrame with columns:

‘filename’: Base name of the CSV file (without extension)
‘average_package (uJ)’: Average energy in microjoules

Return type:

DataFrame

Examples

>>> df = aggregate_energy('energy_benchmark/')
>>> df.head()
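
The per-file averaging described above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the helper name `average_column_per_file` is hypothetical, and the sample file and its values are invented for the demonstration.

```python
import csv
import tempfile
from pathlib import Path
from statistics import mean


def average_column_per_file(folder, column):
    """Return {file stem: mean of `column`} for every CSV in `folder`.

    Hypothetical helper that mirrors the documented behaviour
    (one average per file, keyed by the base name without extension).
    """
    averages = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            values = [float(row[column]) for row in csv.DictReader(handle)]
        if values:  # skip files that have a header but no data rows
            averages[csv_path.stem] = mean(values)
    return averages


# Build a throwaway folder holding one log with three runs (made-up values).
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "nbody_cpython.csv"
    log.write_text("package (uJ)\n100\n200\n300\n")
    result = average_column_per_file(tmp, "package (uJ)")

print(result)
```

The same helper would cover the time logs by passing ‘execution_time (s)’ as the column name.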

pgsi_analyzer.models.aggregation.aggregate_time(folder_path: str | Path, output_path: str | Path | None = None) → DataFrame

Compute average execution time from raw CSV logs in a folder.

Reads all CSV files in the specified folder and computes the average ‘execution_time (s)’ value for each file.

Parameters:

folder_path – Path to folder containing time CSV files. Each CSV should have an ‘execution_time (s)’ column.
output_path – Optional path to save the aggregated results CSV.

Returns:

DataFrame with columns:

‘filename’: Base name of the CSV file (without extension)
‘execution_time (s)’: Average execution time in seconds

Return type:

DataFrame

Examples

>>> df = aggregate_time('time_benchmark/')
>>> df.head()

Combine energy and time results from multiple execution methods.

This module provides functions to merge aggregated results from different execution methods (e.g., CPython, PyPy, Cython) into comparison tables.

pgsi_analyzer.models.combination.extract_algorithm_name(full_name: str, method_name: str = '') → str

Extract algorithm name from full filename.

When method_name is provided (e.g. from the parent directory), strips ‘_’ + method_name from the end so that e.g. ‘nbody_py_compile’ with method ‘py_compile’ yields ‘nbody’ instead of ‘nbody_py’.

Parameters:

full_name – Full filename (e.g., ‘nbody_py_compile’, ‘nbody_cpython’)
method_name – Method name (e.g., ‘py_compile’, ‘cpython’). If provided and full_name ends with ‘_’ + method_name, that suffix is removed.

Returns:

Algorithm name (e.g., ‘nbody’)

Examples

>>> extract_algorithm_name('nbody_py_compile', 'py_compile')
'nbody'
>>> extract_algorithm_name('nbody_cpython', 'cpython')
'nbody'
>>> extract_algorithm_name('bubble_sort_cpython')
'bubble_sort'
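
The suffix-stripping rule can be sketched as follows. The function name `strip_method_suffix` is hypothetical. The fallback used when no method name is given (dropping the last underscore-separated token) is an assumption inferred from the examples above, not the confirmed implementation.

```python
def strip_method_suffix(full_name, method_name=""):
    """Remove a trailing '_' + method_name from full_name when present."""
    suffix = "_" + method_name
    if method_name and full_name.endswith(suffix):
        return full_name[: -len(suffix)]
    # Assumed fallback: drop only the last underscore-separated token.
    return full_name.rsplit("_", 1)[0]


with_method = strip_method_suffix("nbody_py_compile", "py_compile")
without_method = strip_method_suffix("nbody_py_compile")
print(with_method, without_method)
```

Passing the method name matters for multi-word methods: without it, only the final token is removed, which leaves ‘nbody_py’.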

pgsi_analyzer.models.combination.combine_energy_results(file_paths: List[str | Path], output_path: str | Path) → DataFrame

Merge energy results from multiple execution methods.

Combines aggregated energy results from different methods (e.g., CPython, PyPy) into a single comparison table with algorithms as rows and methods as columns.

Parameters:

file_paths – List of paths to aggregated energy CSV files. Method name is extracted from the parent directory name.
output_path – Path to save the combined results CSV.

Returns:

DataFrame with:

‘algorithm’ column
One column per method with average energy values

Return type:

DataFrame

Examples

>>> paths = [
...     'cpython/energy_avg.csv',
...     'pypy/energy_avg.csv'
... ]
>>> df = combine_energy_results(paths, 'energy_com.csv')
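
To make the resulting table shape concrete, here is a rough stdlib-only sketch of the pivot: the method name comes from each file's parent directory, and per-method averages become columns keyed by algorithm. The helper `pivot_by_method` and all numeric values are hypothetical, and the real function may treat missing algorithms differently.

```python
from pathlib import Path


def pivot_by_method(per_method):
    """Turn {method: {algorithm: value}} into one row dict per algorithm."""
    algorithms = sorted({name for values in per_method.values() for name in values})
    rows = []
    for algorithm in algorithms:
        row = {"algorithm": algorithm}
        for method, values in per_method.items():
            # None marks an algorithm that a method did not run (assumption).
            row[method] = values.get(algorithm)
        rows.append(row)
    return rows


# The method name is the parent directory of each aggregated file.
paths = ["cpython/energy_avg.csv", "pypy/energy_avg.csv"]
methods = [Path(p).parent.name for p in paths]

# Made-up averages standing in for the contents of the two files.
table = pivot_by_method({
    methods[0]: {"nbody": 200.0, "bubble_sort": 80.0},
    methods[1]: {"nbody": 50.0},
})
print(table)
```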

pgsi_analyzer.models.combination.combine_time_results(file_paths: List[str | Path], output_path: str | Path) → DataFrame

Merge execution time results from multiple execution methods.

Combines aggregated time results from different methods (e.g., CPython, PyPy) into a single comparison table with algorithms as rows and methods as columns.

Parameters:

file_paths – List of paths to aggregated time CSV files. Method name is extracted from the parent directory name.
output_path – Path to save the combined results CSV.

Returns:

DataFrame with:

‘algorithm’ column
One column per method with average execution time values

Return type:

DataFrame

Examples

>>> paths = [
...     'cpython/time_avg.csv',
...     'pypy/time_avg.csv'
... ]
>>> df = combine_time_results(paths, 'time_com.csv')

Carbon footprint calculation from energy consumption data.

This module provides functions to calculate carbon dioxide equivalent (CO₂e) emissions from energy consumption data using configurable carbon intensity factors.

pgsi_analyzer.models.carbon.calculate_carbon_footprint(energy_csv_path: str | Path, output_path: str | Path | None = None, carbon_intensity: float = 0.000475) → DataFrame

Calculate carbon footprint from energy consumption data.

Converts energy values (in microjoules) to carbon dioxide equivalent (CO₂e) in grams using a configurable carbon intensity factor.

Parameters:

energy_csv_path – Path to CSV file containing energy data. Expected format: ‘algorithm’ column and method columns with energy in μJ.
output_path – Optional path to save the carbon footprint CSV. If None, file is not written.
carbon_intensity – Carbon intensity factor in gCO₂e per Joule. Default: 0.000475 (global average).

Returns:

DataFrame with carbon footprint data:

‘algorithm’ column
Method columns with ‘_CO2e_g’ suffix (carbon in grams)

Return type:

DataFrame

Raises:

FileNotFoundError – If energy_csv_path does not exist.
ValueError – If the input CSV is missing the algorithm column.
OSError – If output_path is provided but cannot be created/written.

Examples

>>> df = calculate_carbon_footprint("results/energy_combined.csv")
>>> "algorithm" in df.columns
True
>>> df = calculate_carbon_footprint(
...     "results/energy_combined.csv",
...     output_path="results/carbon_footprint.csv",
...     carbon_intensity=0.000475,
... )
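
The unit conversion can be worked through by hand. The sketch below assumes the calculation is a plain scaling (microjoules to joules, then joules times the intensity factor), which matches the parameter descriptions above but is not confirmed against the source. The helper name is hypothetical.

```python
MICROJOULES_PER_JOULE = 1_000_000


def microjoules_to_grams_co2e(energy_uj, carbon_intensity=0.000475):
    """Convert an energy reading in microjoules to grams of CO2e.

    carbon_intensity is taken in gCO2e per joule, as documented above.
    """
    joules = energy_uj / MICROJOULES_PER_JOULE
    return joules * carbon_intensity


# 2,000,000 uJ is 2 J, so the result is 2 * 0.000475 g.
grams = microjoules_to_grams_co2e(2_000_000)
print(grams)
```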

GreenScore calculation and metric normalization.

This module provides functions to calculate GreenScore, a composite metric combining energy consumption, execution time, and carbon footprint with configurable weights.

pgsi_analyzer.models.greenscore.normalize_metrics(df: DataFrame, output_path: str | Path | None = None) → DataFrame

Normalize metrics across methods (row-wise) per algorithm.

Applies min-max normalization to each row, normalizing values between 0 and 1. This allows fair comparison across different algorithms with different scales.

Parameters:

df – DataFrame with one row per algorithm, ‘algorithm’ column, and metric columns.
output_path – Optional path to save the normalized DataFrame as CSV.

Returns:

DataFrame with ‘algorithm’ and normalized method columns.

Examples

>>> df = pd.DataFrame({
...     'algorithm': ['algo1', 'algo2'],
...     'method1': [100, 200],
...     'method2': [50, 150]
... })
>>> normalized = normalize_metrics(df)
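
Row-wise min-max normalization maps the smallest value in a row to 0 and the largest to 1. A minimal sketch for a single row is shown below. The helper name is hypothetical, and returning zeros for a row whose values are all equal is an assumption, since the documentation does not say how that case is handled.

```python
def min_max_normalize_row(values):
    """Scale one row of method values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed behaviour for a constant row: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Rows from the example above: (method1, method2) for algo1 and algo2.
algo1 = min_max_normalize_row([100, 50])
algo2 = min_max_normalize_row([200, 150])
print(algo1, algo2)
```

With only two methods, every row reduces to a 0 and a 1, so the normalized table records which method was better per algorithm but not by how much.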

pgsi_analyzer.models.greenscore.calculate_greenscore(energy_df: DataFrame, time_df: DataFrame, carbon_df: DataFrame, alpha: float = 0.4, beta: float = 0.4, gamma: float = 0.2, output_path: str | Path | None = None, aggregated_energy_paths: Dict[str, str | Path] | None = None) → DataFrame

Compute the GreenScore for each method by combining normalized energy, time, and carbon scores with weighted averaging.

GreenScore = α·energy + β·carbon + γ·time

Lower scores indicate better sustainability (lower energy, time, and carbon).

Parameters:

energy_df – Raw energy DataFrame (with ‘algorithm’ column).
time_df – Raw time DataFrame (with ‘algorithm’ column).
carbon_df – Raw carbon DataFrame (with ‘algorithm’ column).
alpha – Weight for energy component (default: 0.4).
beta – Weight for carbon component (default: 0.4).
gamma – Weight for time component (default: 0.2).
output_path – Optional path to save the final ranking CSV.
aggregated_energy_paths – Optional dict mapping each method name to the path of that method’s energy_aggregated.csv, used to add points_measured / points_estimated to the output.

Returns:

DataFrame sorted by green score (ascending, lower is better) with columns:

‘method’: Method name
‘energy_mean’: Mean normalized energy
‘time_mean’: Mean normalized time
‘carbon_mean’: Mean normalized carbon
‘green_score’: Composite GreenScore
‘points_measured’: (if aggregated_energy_paths given) Count of hardware-measured points
‘points_estimated’: (if aggregated_energy_paths given) Count of estimated points
‘data_source_consistency’: “Consistent” or “Inconsistent Data Source” (the latter when a method mixes hardware-measured and estimated points)

Return type:

DataFrame

Examples

>>> energy_df = pd.read_csv('energy.csv')
>>> time_df = pd.read_csv('time.csv')
>>> carbon_df = pd.read_csv('carbon.csv')
>>> ranking = calculate_greenscore(energy_df, time_df, carbon_df)
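
The weighted sum and ranking can be illustrated with plain Python. The sketch below applies the documented formula to per-method mean normalized scores. The helper name and the input values are made up for the example, and the real function additionally normalizes the raw DataFrames first.

```python
def green_score(energy_mean, carbon_mean, time_mean,
                alpha=0.4, beta=0.4, gamma=0.2):
    """GreenScore = alpha*energy + beta*carbon + gamma*time (lower is better)."""
    return alpha * energy_mean + beta * carbon_mean + gamma * time_mean


# Illustrative mean normalized scores per method (not measured data).
methods = {
    "cpython": {"energy": 1.0, "carbon": 1.0, "time": 1.0},
    "pypy": {"energy": 0.0, "carbon": 0.0, "time": 0.5},
}

scores = {
    name: green_score(m["energy"], m["carbon"], m["time"])
    for name, m in methods.items()
}

# Ascending order puts the best (lowest) score first.
ranking = sorted(scores, key=scores.get)
print(ranking, scores)
```

With the default weights summing to 1 and inputs in [0, 1], the score also stays within [0, 1].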