Models API
Energy and time aggregation from raw measurement logs.
This module provides functions to aggregate energy and time measurements from raw CSV logs, computing averages across multiple runs.
- pgsi_analyzer.models.aggregation.stress_test_aggregation_regex(folder_path: str | Path, kind: str = 'energy') → dict
Regex stress test: attempts to process a folder containing a variety of filenames. Returns a dict with counts of accepted, rejected (wrong pattern), and skipped (partial or temporary) files.
- pgsi_analyzer.models.aggregation.aggregate_energy(folder_path: str | Path, output_path: str | Path | None = None) → DataFrame
Compute average energy consumption from raw CSV logs in a folder.
Reads all CSV files in the specified folder and computes the average ‘package (uJ)’ value for each file.
- Parameters:
folder_path – Path to folder containing energy CSV files. Each CSV should have a ‘package (uJ)’ column.
output_path – Optional path to save the aggregated results CSV.
- Returns:
DataFrame with columns:
‘filename’: Base name of the CSV file (without extension)
‘average_package (uJ)’: Average energy in microjoules
- Return type:
DataFrame
Examples
>>> df = aggregate_energy('energy_benchmark/')
>>> df.head()
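The per-file averaging described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `average_column_per_file` is hypothetical, the sample values are made up, and only the ‘package (uJ)’ column name comes from the documentation.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def average_column_per_file(folder_path, column="package (uJ)"):
    """Return {file stem: mean of `column`} for every CSV in the folder.

    Hypothetical helper for illustration only; aggregate_energy itself
    returns a DataFrame and may treat malformed files differently.
    """
    averages = {}
    for csv_path in sorted(Path(folder_path).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            values = [float(row[column]) for row in csv.DictReader(handle)]
        if values:  # assumption: files without data rows are skipped
            averages[csv_path.stem] = statistics.fmean(values)
    return averages


# Build a throwaway folder holding one log with three runs.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "nbody_cpython.csv"
    log.write_text("package (uJ)\n100\n200\n300\n")
    result = average_column_per_file(tmp)
```

Passing `column="execution_time (s)"` to the same helper gives the analogous per-file mean for time logs.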
- pgsi_analyzer.models.aggregation.aggregate_time(folder_path: str | Path, output_path: str | Path | None = None) → DataFrame
Compute average execution time from raw CSV logs in a folder.
Reads all CSV files in the specified folder and computes the average ‘execution_time (s)’ value for each file.
- Parameters:
folder_path – Path to folder containing time CSV files. Each CSV should have an ‘execution_time (s)’ column.
output_path – Optional path to save the aggregated results CSV.
- Returns:
DataFrame with columns:
‘filename’: Base name of the CSV file (without extension)
‘execution_time (s)’: Average execution time in seconds
- Return type:
DataFrame
Examples
>>> df = aggregate_time('time_benchmark/')
>>> df.head()
Combine energy and time results from multiple execution methods.
This module provides functions to merge aggregated results from different execution methods (e.g., CPython, PyPy, Cython) into comparison tables.
- pgsi_analyzer.models.combination.extract_algorithm_name(full_name: str, method_name: str = '') → str
Extract algorithm name from full filename.
When method_name is provided (e.g. from the parent directory), strips ‘_’ + method_name from the end so that e.g. ‘nbody_py_compile’ with method ‘py_compile’ yields ‘nbody’ instead of ‘nbody_py’.
- Parameters:
full_name – Full filename (e.g., ‘nbody_py_compile’, ‘nbody_cpython’)
method_name – Method name (e.g., ‘py_compile’, ‘cpython’). If provided and full_name ends with ‘_’ + method_name, that suffix is removed.
- Returns:
Algorithm name (e.g., ‘nbody’)
Examples
>>> extract_algorithm_name('nbody_py_compile', 'py_compile')
'nbody'
>>> extract_algorithm_name('nbody_cpython', 'cpython')
'nbody'
>>> extract_algorithm_name('bubble_sort_cpython')
'bubble_sort'
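The suffix rule above can be sketched as follows. This is a hedged reconstruction from the documented examples, not the library's source. The name `extract_algorithm_name_sketch` is hypothetical, and the fallback (dropping the last underscore-separated token when no method suffix matches) is an assumption inferred from the ‘bubble_sort_cpython’ and ‘nbody_py’ cases.

```python
def extract_algorithm_name_sketch(full_name: str, method_name: str = "") -> str:
    """Strip '_' + method_name from the end of full_name when it matches."""
    suffix = "_" + method_name
    if method_name and full_name.endswith(suffix):
        return full_name[: -len(suffix)]
    # Assumed fallback: drop the last underscore-separated token.
    head, separator, _ = full_name.rpartition("_")
    return head if separator else full_name


with_method = extract_algorithm_name_sketch("nbody_py_compile", "py_compile")
without_method = extract_algorithm_name_sketch("nbody_py_compile")
```

Here `with_method` is 'nbody', while `without_method` is 'nbody_py', which is the ambiguity that the method_name parameter resolves.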
- pgsi_analyzer.models.combination.combine_energy_results(file_paths: List[str | Path], output_path: str | Path) → DataFrame
Merge energy results from multiple execution methods.
Combines aggregated energy results from different methods (e.g., CPython, PyPy) into a single comparison table with algorithms as rows and methods as columns.
- Parameters:
file_paths – List of paths to aggregated energy CSV files. Method name is extracted from the parent directory name.
output_path – Path to save the combined results CSV.
- Returns:
DataFrame with:
‘algorithm’ column
One column per method with average energy values
- Return type:
DataFrame
Examples
>>> paths = [
...     'cpython/energy_avg.csv',
...     'pypy/energy_avg.csv'
... ]
>>> df = combine_energy_results(paths, 'energy_com.csv')
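To make the "algorithms as rows, methods as columns" layout concrete, here is a small standard-library sketch. The helpers `method_from_path` and `combine_tables` are hypothetical, the numbers are illustrative, and representing a missing algorithm/method pair as None is an assumption; the real function works on CSV files and returns a DataFrame.

```python
from pathlib import Path


def method_from_path(file_path):
    """Take the method name from the parent directory, as documented."""
    return Path(file_path).parent.name


def combine_tables(per_method):
    """Pivot {method: {algorithm: value}} into one row per algorithm.

    Hypothetical helper; gaps become None (an assumption).
    """
    methods = list(per_method)
    algorithms = sorted({algo for table in per_method.values() for algo in table})
    return [
        {"algorithm": algo, **{m: per_method[m].get(algo) for m in methods}}
        for algo in algorithms
    ]


# Illustrative per-method averages keyed by algorithm name.
per_method = {
    method_from_path("cpython/energy_avg.csv"): {"nbody": 900.0, "fib": 300.0},
    method_from_path("pypy/energy_avg.csv"): {"nbody": 250.0},
}
rows = combine_tables(per_method)
```

Each entry of `rows` corresponds to one line of the comparison table, with an ‘algorithm’ key followed by one key per method.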
- pgsi_analyzer.models.combination.combine_time_results(file_paths: List[str | Path], output_path: str | Path) → DataFrame
Merge execution time results from multiple execution methods.
Combines aggregated time results from different methods (e.g., CPython, PyPy) into a single comparison table with algorithms as rows and methods as columns.
- Parameters:
file_paths – List of paths to aggregated time CSV files. Method name is extracted from the parent directory name.
output_path – Path to save the combined results CSV.
- Returns:
DataFrame with:
‘algorithm’ column
One column per method with average execution time values
- Return type:
DataFrame
Examples
>>> paths = [
...     'cpython/time_avg.csv',
...     'pypy/time_avg.csv'
... ]
>>> df = combine_time_results(paths, 'time_com.csv')
Carbon footprint calculation from energy consumption data.
This module provides functions to calculate carbon dioxide equivalent (CO₂e) emissions from energy consumption data using configurable carbon intensity factors.
- pgsi_analyzer.models.carbon.calculate_carbon_footprint(energy_csv_path: str | Path, output_path: str | Path | None = None, carbon_intensity: float = 0.000475) → DataFrame
Calculate carbon footprint from energy consumption data.
Converts energy values (in microjoules) to carbon dioxide equivalent (CO₂e) in grams using a configurable carbon intensity factor.
- Parameters:
energy_csv_path – Path to CSV file containing energy data. Expected format: ‘algorithm’ column and method columns with energy in μJ.
output_path – Optional path to save the carbon footprint CSV. If None, file is not written.
carbon_intensity – Carbon intensity factor in gCO₂e per Joule. Default: 0.000475 (global average).
- Returns:
DataFrame with carbon footprint data:
‘algorithm’ column
Method columns with ‘_CO2e_g’ suffix (carbon in grams)
- Return type:
DataFrame
- Raises:
FileNotFoundError – If energy_csv_path does not exist.
ValueError – If the input CSV is missing the algorithm column.
OSError – If output_path is provided but cannot be created/written.
Examples
>>> df = calculate_carbon_footprint("results/energy_combined.csv")
>>> "algorithm" in df.columns
True
>>> df = calculate_carbon_footprint(
...     "results/energy_combined.csv",
...     output_path="results/carbon_footprint.csv",
...     carbon_intensity=0.000475,
... )
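The unit conversion described above (microjoules to joules, then joules times the intensity factor) can be sketched for a single table row. This assumes the documented unit of gCO₂e per joule for carbon_intensity. The helper `carbon_row` and the energy values are hypothetical; the actual function operates on a CSV and returns a DataFrame.

```python
MICROJOULES_PER_JOULE = 1_000_000


def carbon_row(row, carbon_intensity=0.000475):
    """Convert one {'algorithm': ..., method: energy_uJ} row to grams CO2e.

    Hypothetical helper mirroring the documented column naming.
    """
    if "algorithm" not in row:
        raise ValueError("input row is missing the 'algorithm' column")
    converted = {"algorithm": row["algorithm"]}
    for method, energy_uj in row.items():
        if method == "algorithm":
            continue
        joules = energy_uj / MICROJOULES_PER_JOULE
        converted[f"{method}_CO2e_g"] = joules * carbon_intensity
    return converted


# Illustrative energies in microjoules for two methods.
row = carbon_row({"algorithm": "nbody", "cpython": 2_000_000.0, "pypy": 500_000.0})
```

With the default factor, 2,000,000 μJ is 2 J, so the ‘cpython_CO2e_g’ value comes out at about 0.00095 g.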
GreenScore calculation and metric normalization.
This module provides functions to calculate GreenScore, a composite metric combining energy consumption, execution time, and carbon footprint with configurable weights.
- pgsi_analyzer.models.greenscore.normalize_metrics(df: DataFrame, output_path: str | Path | None = None) → DataFrame
Normalize metrics across methods (row-wise) per algorithm.
Applies min-max normalization to each row, normalizing values between 0 and 1. This allows fair comparison across different algorithms with different scales.
- Parameters:
df – DataFrame with one row per algorithm, ‘algorithm’ column, and metric columns.
output_path – Optional path to save the normalized DataFrame as CSV.
- Returns:
DataFrame with ‘algorithm’ and normalized method columns.
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'algorithm': ['algo1', 'algo2'],
...     'method1': [100, 200],
...     'method2': [50, 150]
... })
>>> normalized = normalize_metrics(df)
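Row-wise min-max normalization maps each value v in a row to (v - min) / (max - min), using that row's own minimum and maximum. The sketch below applies it to the numbers from the example above. The helper `min_max_row` is hypothetical, and returning 0.0 for a row whose values are all equal is an assumption, since the documentation does not say how that case is handled.

```python
def min_max_row(values):
    """Scale one row of method values into the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumption: a constant row carries no ranking information.
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


# Same figures as the example: columns are method1, method2.
rows = {"algo1": [100, 50], "algo2": [200, 150]}
normalized_rows = {algo: min_max_row(values) for algo, values in rows.items()}
```

Both rows normalize to [1.0, 0.0]: method2 is the lower-cost method for each algorithm, even though the absolute scales of the two algorithms differ.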
- pgsi_analyzer.models.greenscore.calculate_greenscore(energy_df: DataFrame, time_df: DataFrame, carbon_df: DataFrame, alpha: float = 0.4, beta: float = 0.4, gamma: float = 0.2, output_path: str | Path | None = None, aggregated_energy_paths: Dict[str, str | Path] | None = None) → DataFrame
Compute the GreenScore for each method by combining normalized energy, time, and carbon scores with weighted averaging.
GreenScore = α·energy + β·carbon + γ·time
Lower scores indicate better sustainability (lower energy, time, and carbon).
- Parameters:
energy_df – Raw energy DataFrame (with ‘algorithm’ column).
time_df – Raw time DataFrame (with ‘algorithm’ column).
carbon_df – Raw carbon DataFrame (with ‘algorithm’ column).
alpha – Weight for energy component (default: 0.4).
beta – Weight for carbon component (default: 0.4).
gamma – Weight for time component (default: 0.2).
output_path – Optional path to save the final ranking CSV.
aggregated_energy_paths – Optional dict mapping each method name to the path of that method’s energy_aggregated.csv; used to add points_measured / points_estimated to the output.
- Returns:
DataFrame sorted by green score (ascending, lower is better) with columns:
‘method’: Method name
‘energy_mean’: Mean normalized energy
‘time_mean’: Mean normalized time
‘carbon_mean’: Mean normalized carbon
‘green_score’: Composite GreenScore
‘points_measured’: (if aggregated_energy_paths given) Count of hardware-measured points
‘points_estimated’: (if aggregated_energy_paths given) Count of estimated points
‘data_source_consistency’: “Consistent” or “Inconsistent Data Source” (when method has both hardware and estimation)
- Return type:
DataFrame
Examples
>>> import pandas as pd
>>> energy_df = pd.read_csv('energy.csv')
>>> time_df = pd.read_csv('time.csv')
>>> carbon_df = pd.read_csv('carbon.csv')
>>> ranking = calculate_greenscore(energy_df, time_df, carbon_df)
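The weighted sum GreenScore = α·energy + β·carbon + γ·time and the ascending ranking can be sketched in plain Python. The helper `green_score` and the per-method means are hypothetical illustrations; the sketch starts from already-normalized means and leaves out the normalization and data-source bookkeeping that the real function performs.

```python
def green_score(energy, carbon, time, alpha=0.4, beta=0.4, gamma=0.2):
    """Weighted sum of normalized means; lower is better."""
    return alpha * energy + beta * carbon + gamma * time


# Illustrative normalized means per method (not measured data).
means = {
    "cpython": {"energy_mean": 0.8, "carbon_mean": 0.8, "time_mean": 0.5},
    "pypy": {"energy_mean": 0.2, "carbon_mean": 0.2, "time_mean": 0.5},
}

# Rank methods from best (lowest score) to worst.
ranking = sorted(
    (
        (method, green_score(m["energy_mean"], m["carbon_mean"], m["time_mean"]))
        for method, m in means.items()
    ),
    key=lambda pair: pair[1],
)
```

With the default weights, ‘pypy’ scores about 0.26 and ‘cpython’ about 0.74, so ‘pypy’ is ranked first.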