# Gathering Data
AustralianElectricityMarkets.jl provides a thin wrapper around the nemdb Python package. This integration allows you to easily download and cache data from the AEMO NEMWEB data archive.
## How it Works
The data is fetched from the NEMWEB archive and stored locally as hive-partitioned Parquet files. This provides a good trade-off between data compression and query efficiency using tools like DuckDB (via TidierDB.jl).
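As an illustration, a hive-partitioned cache encodes partition keys in the directory names, so query engines like DuckDB can skip irrelevant files entirely. The sketch below is hypothetical — the exact partition keys and file names used by the package are an assumption, not its documented layout:

```text
~/.nemweb_cache/
└── DISPATCHREGIONSUM/
    └── year=2024/
        └── month=01/
            └── data.parquet
```

A filter such as `year = 2024 AND month = 1` then only touches the matching directory, which is what makes this layout a good trade-off between compression and query speed.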
## Configuration
By default, data is cached in `~/.nemweb_cache`. You can customize the cache location and filesystem using `PyHiveConfiguration`.
```julia
using AustralianElectricityMarkets

# Configure a custom cache directory
config = PyHiveConfiguration(
    base_dir = "/path/to/my_cache",
    filesystem = "local",  # also supports Amazon S3 ("s3") and Google Cloud Storage ("gs")
)
```

## Listing Available Tables
To see which AEMO tables are currently supported and available for download:
```julia
list_available_tables()
```

## Populating the Database
You can download and populate the cache for a specific table over a given date range using `fetch_table_data`.
```julia
using Dates

# Download dispatch data for early 2024
fetch_table_data(:DISPATCHREGIONSUM, Date(2024, 1, 1):Date(2024, 1, 2))
```

To download data for all supported tables over a specific period:
```julia
fetch_table_data(Date(2024, 1, 1):Date(2024, 1, 2))
```

To display the data requirements for specific network configurations:
```julia
table_requirements(RegionalNetworkConfiguration())
```

## Reading the Data
Once the data is cached, you can load it for analysis. The package provides high-level functions that parse the raw data into structured DataFrames via TidierDB.jl.
```julia
using TidierDB

# Connect to a local DuckDB instance
db = aem_connect(duckdb())

# Low-level access to a specific hive table
df_raw = read_hive(db, :DISPATCH_UNIT_SOLUTION) |> @collect

# Load unit information from the cached data
units = read_units(db)
```
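Because the table reference returned by `read_hive` is lazy, filters can be pushed down to DuckDB before materializing results. The following is a hedged sketch, not a documented recipe: it assumes `read_hive` composes with TidierDB's `@chain`/`@filter`/`@collect` macros, and that `REGIONID` is a column of the cached `DISPATCHREGIONSUM` table (as in AEMO's dispatch schema).

```julia
# Hypothetical usage: filter one region inside DuckDB, then collect
# only the matching rows into a DataFrame.
nsw = @chain read_hive(db, :DISPATCHREGIONSUM) begin
    @filter(REGIONID == "NSW1")
    @collect
end
```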