Architecture

PGSI Analyzer is structured as a layered benchmarking and analysis pipeline that prioritizes repeatability, cross-method comparability, and transparent auditability.

For setup and first-run commands, see Getting Started. For module-level reference details, see API Reference.

System Layers

CLI Layer

The CLI entrypoint (pgsi_analyzer.cli.main) parses user intent, resolves runtime configuration, and dispatches to benchmark operations. It is intentionally thin: orchestration and analytics logic live in dedicated modules so they remain reusable and testable.
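
As a rough illustration of what "intentionally thin" can mean, the sketch below assumes an argparse-based parser. The flag names (--algorithms, --methods, --runs), their defaults, and the run_benchmarks function are hypothetical stand-ins and are not taken from the actual pgsi_analyzer code; only the benchmark run subcommand comes from this document.

```python
import argparse


def run_benchmarks(algorithms, methods, runs):
    """Hypothetical stand-in for the orchestration layer."""
    return {"algorithms": algorithms, "methods": methods, "runs": runs}


def build_parser():
    # Flag names and defaults are illustrative assumptions, not the real CLI surface.
    parser = argparse.ArgumentParser(prog="pgsi-analyzer")
    commands = parser.add_subparsers(dest="command", required=True)
    benchmark = commands.add_parser("benchmark")
    actions = benchmark.add_subparsers(dest="action", required=True)
    run = actions.add_parser("run")
    run.add_argument("--algorithms", nargs="+", default=["all"])
    run.add_argument("--methods", nargs="+", default=["all"])
    run.add_argument("--runs", type=int, default=5)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # The entrypoint only translates arguments into one call; no analytics live here.
    return run_benchmarks(args.algorithms, args.methods, args.runs)
```

Because main accepts an explicit argument list, a test can call it directly without spawning a subprocess, which is one practical benefit of keeping the entrypoint this small.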

Orchestration Layer

The orchestration package (pgsi_analyzer.benchmark) coordinates the benchmark lifecycle:

input normalization (algorithms/methods/runs),
method-specific build steps,
benchmark process execution,
artifact collection and placement.

This layer enforces execution ordering and minimizes coupling between benchmark implementations and result models.
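
The lifecycle above can be sketched as a fixed sequence of injected callables. This is a minimal sketch under assumptions: BenchmarkPlan, normalize, run_lifecycle, and the normalization rules (lowercasing, de-duplication, clamping runs to at least 1) are hypothetical and only illustrate how ordering can be enforced without tying benchmark code to result handling.

```python
from dataclasses import dataclass, field


@dataclass
class BenchmarkPlan:
    algorithms: list
    methods: list
    runs: int
    log: list = field(default_factory=list)


def normalize(algorithms, methods, runs):
    # Assumed rules: trim, lowercase, drop duplicates (keeping order), runs >= 1.
    def clean(names):
        return list(dict.fromkeys(n.strip().lower() for n in names if n.strip()))

    return BenchmarkPlan(clean(algorithms), clean(methods), max(1, int(runs)))


def run_lifecycle(plan, build, execute, collect):
    # Stages always run in this order: build, then execute, then collect.
    for method in plan.methods:
        build(method)
        plan.log.append(("build", method))
    for algorithm in plan.algorithms:
        for method in plan.methods:
            execute(algorithm, method, plan.runs)
            plan.log.append(("execute", algorithm, method))
    collect()
    plan.log.append(("collect",))
    return plan.log
```

Passing build, execute, and collect as parameters keeps the ordering logic testable with trivial stubs.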

Measurement Layer

The measurement package provides decorators and estimation utilities:

measure_time_to_csv for repeated runtime capture,
energy measurement with fallback logic (hardware-dependent primary paths, with an estimator as the fallback),
CPU-power resolution utilities backed by packaged data.

The design goal is graceful degradation: obtain usable energy estimates even when hardware counters or optional dependencies are unavailable.
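
To make the two ideas above concrete, here is a minimal sketch of a repeated-timing decorator and a fallback energy estimate. It is not the real measure_time_to_csv: the name time_to_csv_sketch, its parameters, the CSV columns, and the 15 W default are all assumptions chosen for illustration. The fallback simply uses energy = power × time.

```python
import csv
import functools
import time


def time_to_csv_sketch(path, runs=3):
    """Illustrative decorator: call the function `runs` times, one CSV row per run."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = None
            with open(path, "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["run", "seconds"])
                for index in range(1, runs + 1):
                    start = time.perf_counter()
                    result = func(*args, **kwargs)
                    writer.writerow([index, time.perf_counter() - start])
            return result
        return wrapper
    return decorator


def energy_joules(elapsed_seconds, read_counter=None, assumed_watts=15.0):
    """Prefer a hardware reading; fall back to power x time if it is unavailable."""
    if read_counter is not None:
        try:
            return read_counter(), "measured"
        except (OSError, RuntimeError):
            pass  # Counter missing or unreadable: degrade to the estimator.
    return assumed_watts * elapsed_seconds, "estimated"
```

Returning a label alongside the value lets downstream stages distinguish measured from estimated energy instead of silently mixing the two.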

Modeling Layer

Model modules transform raw artifacts into comparative analytics:

aggregation per method,
combination across methods,
carbon footprint derivation from energy,
GreenScore computation using weighted components.

This separation keeps analytics formulas isolated from execution mechanics and simplifies experimentation.
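
The last two transforms can be illustrated with a short sketch. The carbon conversion uses the standard relation carbon = energy in kWh × grid carbon intensity. The score function is a generic weighted average and is only an assumption about the general shape of such a metric: the actual GreenScore components, normalization, and weights are not specified in this section, and the component names below are hypothetical.

```python
JOULES_PER_KWH = 3_600_000.0


def carbon_grams(energy_joules, grams_co2_per_kwh):
    # Carbon (gCO2e) = energy (kWh) x grid carbon intensity (gCO2e/kWh).
    return energy_joules / JOULES_PER_KWH * grams_co2_per_kwh


def weighted_score(components, weights):
    """Weighted average of components assumed to be normalized to [0, 1]."""
    if set(components) != set(weights):
        raise ValueError("components and weights must share the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(components[key] * weights[key] for key in components) / total


# Hypothetical component names and weights, for illustration only.
score = weighted_score({"energy": 1.0, "time": 0.5}, {"energy": 3, "time": 1})
```

Keeping functions like these free of file I/O is what allows a formula to change without touching the runner.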

Benchmark Content Layer

Built-in benchmarks are shipped with the package, while user benchmarks can be scaffolded and merged through discovery. Both flows conform to a shared directory contract so the orchestrator can treat them uniformly.
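
A minimal sketch of merging the two sources into one registry follows. It assumes, purely for illustration, that the shared directory contract is "one subdirectory per algorithm" and that a user benchmark overrides a built-in of the same name; the real contract and clash policy may differ, and discover and build_registry are hypothetical names.

```python
from pathlib import Path


def discover(root):
    # Assumed contract for this sketch: one subdirectory per algorithm.
    root = Path(root)
    if not root.is_dir():
        return {}
    return {entry.name: entry for entry in sorted(root.iterdir()) if entry.is_dir()}


def build_registry(builtin_root, user_root=None):
    registry = {
        name: ("builtin", path) for name, path in discover(builtin_root).items()
    }
    if user_root is not None:
        for name, path in discover(user_root).items():
            # Assumption: user entries win on a name clash.
            registry[name] = ("user", path)
    return registry
```

Because both sources pass through the same discover function, the orchestrator only ever sees one mapping from algorithm name to location.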

Core Execution Flow

At a high level, pgsi-analyzer benchmark run performs the following steps in order:

Parse and validate CLI arguments.
Resolve tool paths from flags, environment, and optional .env.
Build a unified benchmark registry (built-ins + user project).
Build method prerequisites where needed (for example, Cython extensions).
Execute benchmark workloads for requested algorithms/methods.
Collect raw energy/time files into a normalized workspace/output structure.
Aggregate, combine, compute carbon metrics, and produce GreenScore.
Write audit metadata and final report artifacts.
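
One way to picture this flow is as an ordered list of named stages sharing a context, with a per-stage record kept for the audit trail. The sketch below is hypothetical: run_pipeline, the record format, and the fail-fast behavior are assumptions, not a description of how the tool actually handles stage failures.

```python
def run_pipeline(stages, context=None):
    """Run (name, callable) stages in order, threading a shared context dict."""
    context = {} if context is None else context
    audit = []
    for name, stage in stages:
        try:
            stage(context)
        except Exception as exc:
            audit.append({"stage": name, "status": "failed", "error": str(exc)})
            break  # Assumption: later stages depend on earlier ones, so stop.
        audit.append({"stage": name, "status": "ok"})
    return context, audit
```

Each stage reads what earlier stages placed in the context, so the ordering in the list is the only place where sequencing is encoded.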

Data Contracts and File Semantics

Benchmark output collection depends on naming conventions that communicate artifact intent:

time outputs follow time_*.csv,
energy outputs follow energy_*.csv,
combined outputs live at the output root for downstream analysis stages.

These conventions are implementation-level contracts between decorators, collector/provider modules, and model transforms.
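
A collector can rely on these patterns with nothing more than glob-style matching. The sketch below uses the two patterns named above; the function name classify_artifact and the example file names are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def classify_artifact(filename):
    # Match on the base name only, case-sensitively on every platform.
    name = Path(filename).name
    if fnmatchcase(name, "time_*.csv"):
        return "time"
    if fnmatchcase(name, "energy_*.csv"):
        return "energy"
    return "other"


kinds = [
    classify_artifact(name)
    for name in ("time_fib_cython.csv", "energy_fib_cython.csv", "notes.txt")
]
```

Since the prefix is the only signal of intent, renaming an output file is effectively a contract change for every consumer of that file.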

Configuration Resolution Strategy

Runtime executables (Python, PyPy, C compiler) are resolved in the following precedence order, strongest first, so users can balance convenience and determinism:

explicit CLI flags,
environment variables,
optional .env file,
fallback defaults.

This strategy enables both ad-hoc local runs and strict CI reproducibility.
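
The precedence chain can be sketched as "first non-empty source wins". The helper names, the tiny .env parser, and the PGSI_PYPY and PGSI_CC variable names below are hypothetical and used only for illustration; the real variable names are not given in this section.

```python
import os


def parse_dotenv(text):
    """Tiny KEY=VALUE parser for this sketch; skips blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_tool(flag_value, env_key, dotenv, default, environ=None):
    environ = os.environ if environ is None else environ
    # First non-empty source wins: flag > environment > .env > default.
    for candidate in (flag_value, environ.get(env_key), dotenv.get(env_key)):
        if candidate:
            return candidate
    return default


# Hypothetical variable names, for illustration only.
dotenv = parse_dotenv("# tools\nPGSI_PYPY=/opt/pypy/bin/pypy3\nPGSI_CC='clang'\n")
pypy = resolve_tool(None, "PGSI_PYPY", dotenv, "pypy3", environ={})
```

In CI, passing every tool path as an explicit flag bypasses the lower-precedence sources entirely, which is what makes a run independent of the host environment.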

Extensibility Model

PGSI Analyzer is designed for extension along two axes:

Benchmark extension: add new algorithms via scaffold + registry update.
Analysis extension: add or modify model formulas while preserving existing CSV contracts.

The architecture keeps these concerns decoupled so analytical evolution does not require runner rewrites.

Quality and Reliability

Reliability is supported by tests that cover CLI behavior, execution orchestration, measurements, model computations, and audit integrity checks. Audit artifacts are emitted alongside results to support traceability and post-run diagnostics.