Getting started

This tutorial walks you through installing nemdb, downloading your first dataset, and querying it from Python. By the end you will have a local cache of NEM dispatch data and know how to filter it by date and region.

Prerequisites

  • Python 3.13 or higher
  • uv package manager
  • Internet connection (data is downloaded from AEMO's NEMWEB portal)

1. Install nemdb

Clone the repository and install it in editable (development) mode:

git clone https://github.com/ymiftah/nemdb.git
cd nemdb
uv pip install -e .

Verify the installation:

python -c "from nemdb import NEMWEBManager; print('OK')"

2. Download data with the CLI

The populate command downloads NEMWEB tables and stores them as Parquet files. Let's fetch one month of data:

uv run populate --location ./data --date_range 2024-01-01->2024-01-31

When prompted for a table, type all to download all active tables, or enter a specific table name like DISPATCHREGIONSUM.

The data is saved under ./data/ in Hive-partitioned Parquet format. Each table gets its own directory:

data/
├── DISPATCHREGIONSUM/
│   └── archive_month=2024-01-01/
│       └── DISPATCHREGIONSUM-0.parquet
├── DISPATCHLOAD/
│   └── ...
└── ...

3. Query data from Python

Create a Python script or open a REPL:

from nemdb import NEMWEBManager, Config

# Point to your data directory
Config.set_cache_dir("./data")

# Create the manager
nemweb = NEMWEBManager(Config)

# List available tables
print(nemweb)

Query by settlement date

Tables like DISPATCHREGIONSUM are indexed by 5-minute settlement dates:

# Get regional demand at noon on January 15
df = nemweb.DISPATCHREGIONSUM.get_data("2024/01/15 12:00:00")
print(df.select("REGIONID", "TOTALDEMAND", "DEMANDFORECAST"))

Expected output:

shape: (5, 3)
┌──────────┬─────────────┬────────────────┐
│ REGIONID ┆ TOTALDEMAND ┆ DEMANDFORECAST │
│ ---      ┆ ---         ┆ ---            │
│ cat      ┆ f32         ┆ f32            │
╞══════════╪═════════════╪════════════════╡
│ NSW1     ┆ ...         ┆ ...            │
│ QLD1     ┆ ...         ┆ ...            │
│ SA1      ┆ ...         ┆ ...            │
│ TAS1     ┆ ...         ┆ ...            │
│ VIC1     ┆ ...         ┆ ...            │
└──────────┴─────────────┴────────────────┘

Use lazy scanning for large queries

For analytical queries across the full dataset, use scan() to get a Polars LazyFrame:

import polars as pl

# Scan without loading everything into memory
lf = nemweb.DISPATCHREGIONSUM.scan()

# Filter and aggregate lazily
result = (
    lf.filter(pl.col("REGIONID") == "NSW1")
    .select("SETTLEMENTDATE", "TOTALDEMAND")
    .sort("SETTLEMENTDATE")
    .collect()
)
print(result.head())

Query unit dispatch data

# Get all generator dispatch at a specific interval
dispatch = nemweb.DISPATCHLOAD.get_data("2024/01/15 12:00:00")
print(dispatch.select("DUID", "TOTALCLEARED", "AVAILABILITY").head(10))

Query unit details

The DUDETAILSUMMARY table uses start/end date filtering:

# Get unit details valid on a given date
units = nemweb.DUDETAILSUMMARY.get_data("2024/01/15")
print(units.select("DUID", "REGIONID", "DISPATCHTYPE", "SCHEDULE_TYPE").head(10))

4. Add more data incrementally

You don't have to re-download everything. Add a new month:

nemweb.DISPATCHREGIONSUM.add_data(year=2024, month=2)

Or populate a date range (existing months are skipped):

nemweb.populate(slice("2024-01-01", "2024-06-30"))

To force re-download (overwrites existing data):

nemweb.populate(slice("2024-01-01", "2024-01-31"), force_new=True)

Next steps