Data Management

PsDataManager is the central data store. It extends Python's dict, using composite tuple keys to store PsData objects.
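
As a rough illustration of a dict subclass indexed by composite tuple keys, consider the minimal sketch below. `TupleKeyStore` and its `add` method are hypothetical names used only for this example, not part of psPlotKit, and plain lists stand in for PsData objects.

```python
class TupleKeyStore(dict):
    """Hypothetical stand-in for a dict-based store keyed by tuples."""

    def add(self, directory, data_key, value):
        # Combine the directory label and the data key into one tuple key.
        self[(directory, data_key)] = value


store = TupleKeyStore()
store.add("case_a", "LCOW", [0.45, 0.50])
store.add("case_a", "recovery", [40, 50])

# Ordinary dict access works because the store is itself a dict.
lcow = store[("case_a", "LCOW")]
```

Because the store is a plain dict underneath, iteration, membership tests, and `len()` behave as usual.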

Creating a Manager

from psPlotKit.data_manager.ps_data_manager import PsDataManager

dm = PsDataManager("my_sweep_results.h5")

Registering Data Keys

Before loading, tell the manager which keys to import:

dm.register_data_key(
    file_key="fs.costing.LCOW",   # key in the .h5 file
    return_key="LCOW",            # your short name
    units="USD/m**3",             # optional: convert to these units
)

dm.register_data_key(
    "fs.water_recovery",
    "recovery",
    assign_units="%",             # assign units without conversion
)

register_data_key Parameters

Parameter          Description
file_key           Key path in the HDF5/JSON file
return_key         Short name for referencing
units              Convert imported data to these units
assign_units       Assign units without converting
conversion_factor  Manual scaling factor
directories        Restrict to specific directories

Loading Data

dm.load_data()

load_data performs three steps:

  1. Import — reads data from files for all registered keys
  2. Check import status — verifies all keys were found (controllable via check_import_status)
  3. Evaluate expressions — computes any registered expressions (controllable via evaluate_expressions)

# Warn on missing keys instead of raising
dm.load_data(raise_error=False)

# Skip import checking
dm.load_data(check_import_status=False)

# Skip expression evaluation
dm.load_data(evaluate_expressions=False)
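
The difference between raising and warning on missing keys can be pictured with the sketch below. It is not the psPlotKit implementation: the function name `check_import_status`, its arguments, and the choice of `KeyError` are assumptions made for illustration.

```python
import warnings


def check_import_status(registered_keys, imported_keys, raise_error=True):
    """Hypothetical check: report registered keys that were not imported."""
    missing = sorted(set(registered_keys) - set(imported_keys))
    if missing:
        message = f"Keys not found during import: {missing}"
        if raise_error:
            # Assumed exception type; the library may raise something else.
            raise KeyError(message)
        warnings.warn(message)
    return missing


# "recovery" was registered but not imported, so it is reported as missing.
missing = check_import_status(
    ["LCOW", "recovery"], ["LCOW"], raise_error=False
)
```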

Composite Tuple Keys

Data is stored under composite tuple keys built from directory labels and data keys:

  • Single-directory files: ("LCOW",) or simply "LCOW"
  • Multi-directory files: (("erd_type", "pressure_exchanger"), "membrane_cost", "LCOW")

Inspecting Data

dm.display()              # all (directory, data_key) entries
dm.display_keys()         # unique data keys only
dm.display_directories()  # unique directory keys only
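
Conceptually, the unique data keys and unique directories can be derived from the composite keys themselves. The sketch below assumes that the last element of each tuple is the data key and everything before it is the directory part; the keys shown are made up for illustration.

```python
# Hypothetical composite keys: the last element is the data key,
# everything before it is the directory part (an assumption).
keys = [
    (("erd_type", "pressure_exchanger"), "LCOW"),
    (("erd_type", "pressure_exchanger"), "recovery"),
    (("erd_type", "pump_as_turbine"), "LCOW"),
]

# dict.fromkeys removes duplicates while keeping first-seen order.
data_keys = list(dict.fromkeys(key[-1] for key in keys))
directories = list(dict.fromkeys(key[:-1] for key in keys))
```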

Accessing Data

dir_key = dm.directory_keys[0]
lcow = dm.get_data(dir_key, "LCOW")  # returns PsData object

Adding Computed Data

recovery = dm.get_data(dir_key, "recovery")  # second operand; lcow comes from "Accessing Data"
ratio = lcow / recovery
dm.add_data(dir_key, "my_ratio", ratio)

Selecting Data for Plotting

dm.select_data(["LCOW", "recovery"])
selected = dm.get_selected_data()        # dict for plotters
dm.clear_selected_data()                 # reset selection

Selection Parameters

Parameter                Description
selected_keys            List of key names to select
require_all_in_dir       Only include directories with all keys
exact_keys               Require exact key match
add_to_existing          Append to current selection
return_all_if_non_found  Fall back to all data if no match

Reducing / Stacking Data

Combine data across directories:

dm.reduce_data(
    stack_keys="number_of_stages",
    data_key="LCOW",
    reduction_type="min",
)

This stacks data from directories sharing stack_keys, then applies the reduction ("min", "max", "unique").
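
The stack-then-reduce idea can be sketched in plain Python. The values are invented, and treating the reduction as element-wise across the stacked directories is an assumption about how psPlotKit behaves, not a documented fact.

```python
# Hypothetical LCOW values from three directories that differ only in
# "number_of_stages" (numbers are made up for illustration).
lcow_by_stages = {
    1: [0.60, 0.55, 0.52],
    2: [0.58, 0.50, 0.54],
    3: [0.62, 0.51, 0.49],
}

# Stack: one row per directory.
stacked = list(lcow_by_stages.values())

# Reduce: compare the rows position by position.
reduced_min = [min(column) for column in zip(*stacked)]
reduced_max = [max(column) for column in zip(*stacked)]
```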

Normalizing Data

dm.normalize_data(
    base_value_dict={"LCOW": 1.0},
    norm_units="%",
)

Evaluating Custom Functions

dm.eval_function(
    directory=dir_key,
    name="custom_calc",
    function=my_function,
    function_dict={"x": "LCOW", "y": "recovery"},
    units="dimensionless",
)
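
The call above leaves `my_function` undefined. One plausible reading is that `function_dict` maps the function's argument names to data keys; the sketch below shows that mapping with plain floats standing in for PsData objects. Both the function body and the way the arguments are resolved are assumptions for illustration.

```python
def my_function(x, y):
    """Hypothetical calculation: ratio of two quantities."""
    return x / y


# Plain floats stand in for the stored data.
data = {"LCOW": 0.5, "recovery": 50.0}
function_dict = {"x": "LCOW", "y": "recovery"}

# Look up each data key and pass it under the matching argument name.
kwargs = {arg: data[key] for arg, key in function_dict.items()}
result = my_function(**kwargs)
```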

Exporting Data to CSV

You can export all loaded data to CSV files directly from the manager:

dm.export_data_to_csv("results")

The export behaviour depends on how many directories the manager contains:

  • Single directory: writes one CSV file. If the path doesn't end in .csv, the extension is appended automatically (e.g. "results" → results.csv).
  • Multiple directories: creates a folder and writes one CSV per directory. If the path ends in .csv, the extension is stripped to form the folder name (e.g. "results.csv" → results/).

Column headers are built from each data key's label and units (e.g. LCOW (USD/m**3)).

# single directory — creates results.csv
dm.export_data_to_csv("results")

# multiple directories — creates output/ folder with one CSV per directory
dm.export_data_to_csv("output")
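
The path rules and header format described above can be sketched as follows. `resolve_export_target` and `column_header` are hypothetical helpers written for this example; they mirror the documented behaviour but are not psPlotKit functions.

```python
from pathlib import Path


def resolve_export_target(path, n_directories):
    """Hypothetical helper mirroring the described path rules."""
    target = Path(path)
    if n_directories <= 1:
        # Single directory: make sure the file name ends in .csv.
        if target.suffix != ".csv":
            target = target.with_name(target.name + ".csv")
        return target
    # Multiple directories: drop a trailing .csv to get a folder name.
    if target.suffix == ".csv":
        target = target.with_suffix("")
    return target


def column_header(label, units):
    """Build a header such as 'LCOW (USD/m**3)'."""
    return f"{label} ({units})"


single = resolve_export_target("results", 1)
folder = resolve_export_target("results.csv", 3)
header = column_header("LCOW", "USD/m**3")
```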

You can also use the PsDataExporter class directly for more control:

from psPlotKit.data_manager.ps_data_exporter import PsDataExporter

exporter = PsDataExporter(dm, "my_results.csv")
written_files = exporter.export()