# Getting started
This tutorial walks you through installing nemdb, downloading your first dataset, and querying it from Python. By the end you will have a local cache of NEM dispatch data and know how to filter it by date and region.
## Prerequisites
- Python 3.13 or higher
- uv package manager
- Internet connection (data is downloaded from AEMO's NEMWEB portal)
## 1. Install nemdb
Clone the repository and install in development mode:
```shell
git clone https://github.com/ymiftah/nemdb.git
cd nemdb
uv pip install -e .
```
Verify the installation:
```shell
python -c "from nemdb import NEMWEBManager; print('OK')"
```
## 2. Download data with the CLI
The `populate` command downloads NEMWEB tables and stores them as Parquet files. Let's fetch one month of data:
```shell
uv run populate --location ./data --date_range 2024-01-01->2024-01-31
```
When prompted for a table, type `all` to download all active tables, or enter a specific table name such as `DISPATCHREGIONSUM`.
The data is saved under `./data/` in Hive-partitioned Parquet format. Each table gets its own directory:
```
data/
├── DISPATCHREGIONSUM/
│   └── archive_month=2024-01-01/
│       └── DISPATCHREGIONSUM-0.parquet
├── DISPATCHLOAD/
│   └── ...
└── ...
```
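The partition layout above follows a simple convention. As an illustrative sketch (standard-library Python only, not nemdb's internal code), here is how the `archive_month=YYYY-MM-01` path for a table could be built:

```python
from datetime import date
from pathlib import Path

def partition_path(root: str, table: str, day: date) -> Path:
    # One directory per table, one Hive-style partition per archive month
    month_start = day.replace(day=1)
    return Path(root) / table / f"archive_month={month_start:%Y-%m-%d}"

print(partition_path("data", "DISPATCHREGIONSUM", date(2024, 1, 15)).as_posix())
# data/DISPATCHREGIONSUM/archive_month=2024-01-01
```

Hive-style `key=value` directory names let Parquet readers prune whole months from a query without opening any files.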
## 3. Query data from Python
Create a Python script or open a REPL:
```python
from nemdb import NEMWEBManager, Config

# Point to your data directory
Config.set_cache_dir("./data")

# Create the manager
nemweb = NEMWEBManager(Config)

# List available tables
print(nemweb)
```
### Query by settlement date
Tables like `DISPATCHREGIONSUM` are indexed by 5-minute settlement dates:
```python
# Get regional demand at noon on January 15
df = nemweb.DISPATCHREGIONSUM.get_data("2024/01/15 12:00:00")
print(df.select("REGIONID", "TOTALDEMAND", "DEMANDFORECAST"))
```
Expected output:
```
shape: (5, 3)
┌──────────┬─────────────┬────────────────┐
│ REGIONID ┆ TOTALDEMAND ┆ DEMANDFORECAST │
│ ---      ┆ ---         ┆ ---            │
│ cat      ┆ f32         ┆ f32            │
╞══════════╪═════════════╪════════════════╡
│ NSW1     ┆ ...         ┆ ...            │
│ QLD1     ┆ ...         ┆ ...            │
│ SA1      ┆ ...         ┆ ...            │
│ TAS1     ┆ ...         ┆ ...            │
│ VIC1     ┆ ...         ┆ ...            │
└──────────┴─────────────┴────────────────┘
```
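Dispatch intervals sit on 5-minute boundaries, so query timestamps need to line up with that grid. As a hedged illustration (plain standard-library Python, not a nemdb API), an arbitrary timestamp can be snapped down to the previous boundary like this:

```python
from datetime import datetime, timedelta

def floor_to_interval(ts: datetime, minutes: int = 5) -> datetime:
    # Drop the sub-interval remainder to land on the previous boundary
    remainder = timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )
    return ts - remainder

ts = datetime.strptime("2024/01/15 12:03:27", "%Y/%m/%d %H:%M:%S")
print(floor_to_interval(ts))  # 2024-01-15 12:00:00
```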
### Use lazy scanning for large queries
For analytical queries across the full dataset, use `scan()` to get a Polars `LazyFrame`:
```python
import polars as pl

# Scan without loading everything into memory
lf = nemweb.DISPATCHREGIONSUM.scan()

# Filter and aggregate lazily
result = (
    lf.filter(pl.col("REGIONID") == "NSW1")
    .select("SETTLEMENTDATE", "TOTALDEMAND")
    .sort("SETTLEMENTDATE")
    .collect()
)
print(result.head())
```
### Query unit dispatch data
```python
# Get all generator dispatch at a specific interval
dispatch = nemweb.DISPATCHLOAD.get_data("2024/01/15 12:00:00")
print(dispatch.select("DUID", "TOTALCLEARED", "AVAILABILITY").head(10))
```
### Query unit details
The `DUDETAILSUMMARY` table uses start/end date filtering:
```python
# Get unit details valid on a given date
units = nemweb.DUDETAILSUMMARY.get_data("2024/01/15")
print(units.select("DUID", "REGIONID", "DISPATCHTYPE", "SCHEDULE_TYPE").head(10))
```
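Start/end date filtering means each row carries a validity window, and a query keeps only the rows whose window contains the given date. A minimal sketch with hypothetical records (the tuple layout below is an assumption for illustration, not the table's actual schema):

```python
from datetime import date

# Hypothetical (DUID, start, end) validity windows
records = [
    ("UNIT1", date(2020, 1, 1), date(2023, 12, 31)),
    ("UNIT1", date(2024, 1, 1), date(2099, 12, 31)),
    ("UNIT2", date(2022, 6, 1), date(2099, 12, 31)),
]

def valid_on(rows, day):
    # Keep rows whose [start, end] window contains the query date
    return [r for r in rows if r[1] <= day <= r[2]]

print(valid_on(records, date(2024, 1, 15)))
```

Note that a unit can appear several times with non-overlapping windows; the date filter picks the record that was in force on the query date.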
## 4. Add more data incrementally
You don't have to re-download everything. Add a new month:
```python
nemweb.DISPATCHREGIONSUM.add_data(year=2024, month=2)
```
Or populate a date range (existing months are skipped):
```python
nemweb.populate(slice("2024-01-01", "2024-06-30"))
```
To force re-download (overwrites existing data):
```python
nemweb.populate(slice("2024-01-01", "2024-01-31"), force_new=True)
```
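Conceptually, a date range expands to the set of archive months to fetch, and months already on disk are skipped unless forced. A rough standard-library sketch of that expansion (illustrative only, not nemdb's internal logic):

```python
from datetime import date

def months_in_range(start: str, end: str) -> list[date]:
    # First-of-month dates covered by an inclusive ISO date range
    s, e = date.fromisoformat(start), date.fromisoformat(end)
    current, months = s.replace(day=1), []
    while current <= e:
        months.append(current)
        # Advance one calendar month
        if current.month == 12:
            current = current.replace(year=current.year + 1, month=1)
        else:
            current = current.replace(month=current.month + 1)
    return months

print(months_in_range("2024-01-01", "2024-06-30"))
# six first-of-month dates, 2024-01-01 through 2024-06-01
```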
## Next steps
- Fetch NEMWEB data -- detailed guide on all available tables and query patterns
- Build a network model -- use GIS data to create a pandapower model
- Configuration reference -- cache directories, cloud storage, environment variables