# Scripts

The `scripts` module provides command-line utilities for training, evaluating, and managing reinforcement learning experiments. These scripts streamline common RL tasks with standardized workflows.
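Each script is a standalone command-line entry point. As a quick orientation (assuming the scripts expose the conventional `--help` flag, as is typical for Python CLI tools), you can list them and print their flag reference directly:

```bash
# List the available scripts and show the flag reference for one of them.
# Assumes a standard --help flag; the exact output format will vary.
ls scripts/
python scripts/train.py --help
```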
## Training Script

`train.py` trains a reinforcement learning agent using a pre-populated replay buffer, managing agent creation, experience collection, and metrics logging.
### Parameters
Parameter | Required | Default | Description |
---|---|---|---|
--starter-buffer-path | Yes | N/A | Path to the pre-populated replay buffer |
--experiment-name | Yes | N/A | Name of the experiment for saving results |
--agent-type | No | 'sac' | Type of agent to train: 'sac', 'td3', or 'ddpg' |
--train-iterations | No | 300 | Total number of training iterations |
--collect-steps-per-training-iteration | No | 50 | Number of environment steps to collect per training iteration |
--batch-size | No | 256 | Batch size for training (number of samples per gradient update) |
--log-interval | No | 1 | Interval (in steps) for logging training metrics |
--eval-interval | No | 10 | Interval (in steps) for evaluating the agent |
--num-eval-episodes | No | 1 | Number of episodes to run during each evaluation |
--checkpoint-interval | No | 10 | Interval (in steps) for checkpointing the replay buffer |
--learner-iterations | No | 200 | Number of gradient updates to perform per training iteration |
--scenario-config-path | No | smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin | Path to the scenario configuration file (.gin) |
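Assuming each training iteration both collects experience and runs the learner (as the flag names suggest), the defaults above work out to roughly 300 × 50 = 15,000 environment steps collected and 300 × 200 = 60,000 gradient updates over a full training run.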
### Example Usage

```bash
python scripts/train.py \
  --starter-buffer-path data/buffers/initial_buffer \
  --experiment-name hvac_control_sac \
  --agent-type sac \
  --train-iterations 300 \
  --collect-steps-per-training-iteration 50 \
  --batch-size 256 \
  --scenario-config-path configs/custom_config.gin
```
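Outputs are organized under the experiment name; judging from the evaluation example below, the trained policy ends up in a directory of the form `experiments/<experiment-name>/policies/greedy_policy` (the exact layout may differ in your setup).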
## Evaluation Script

`eval.py` evaluates a trained policy or the baseline schedule policy in a configured environment, producing performance metrics and optional trajectory data.
### Parameters
Parameter | Required | Default | Description |
---|---|---|---|
--policy-dir | Yes | N/A | Path to the saved policy directory or "schedule" for baseline policy |
--gin-config | No | smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-7_starttimestamp-2023-07-06.gin | Path to the environment configuration file (.gin) |
--num-eval-episodes | No | 1 | Number of episodes to run for evaluation |
--experiment-name | Yes | N/A | Name of the evaluation experiment for saving results |
--save-trajectory | No | True | Whether to save detailed trajectory data for each episode |
### Example Usage

```bash
python scripts/eval.py \
  --policy-dir experiments/hvac_control_sac/policies/greedy_policy \
  --gin-config configs/building_sim.gin \
  --num-eval-episodes 5 \
  --experiment-name sac_evaluation \
  --save-trajectory False
```
## Buffer Population Script

`populate_starter_buffer.py` populates an initial replay buffer with exploration data using the baseline schedule policy, aiding off-policy learning.
### Parameters
Parameter | Required | Default | Description |
---|---|---|---|
--buffer-name | Yes | N/A | Name to identify the saved replay buffer |
--capacity | No | 50000 | Maximum capacity of the replay buffer |
--steps-per-run | No | 672 | Number of steps to collect per actor run (episode) |
--num-runs | No | 10 | Number of actor runs (episodes) to perform |
--sequence-length | No | 2 | Sequence length for storing trajectories in the buffer |
--env-gin-config-file-path | No | smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin | Path to the environment configuration file (.gin) |
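Note that the default of 672 steps per run lines up with the 900-second timestep in the default config: 672 × 900 s = 604,800 s, i.e. exactly 7 simulated days per actor run (assuming one environment step per control timestep).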
### Example Usage

```bash
python scripts/populate_starter_buffer.py \
  --buffer-name initial_exploration \
  --capacity 100000 \
  --steps-per-run 1000 \
  --num-runs 20 \
  --sequence-length 2 \
  --env-gin-config-file-path configs/custom_env.gin
```
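In this example, 20 runs of 1,000 steps each produce at most 20,000 transitions, comfortably below the 100,000-item capacity, so the buffer should not need to overwrite older data.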
## Configuration Generator Script

`generate_gin_config_files.py` generates multiple gin config files from a parameter grid for systematic experimentation.
### Parameters
Parameter | Required | Default | Description |
---|---|---|---|
base_config | Yes | N/A | Path to the base gin config file (positional argument) |
--output-dir | No | 'generated_configs' | Directory to save the generated config files |
--time-steps | No | '300' | Comma-separated list of time_step_sec values to grid over |
--num-days | No | '1,7,14,30' | Comma-separated list of num_days_in_episode values to grid over |
--start-timestamps | No | '2023-07-06' | Comma-separated list of start_timestamp dates to grid over |
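The grid values are encoded in the generated filenames; the default paths referenced earlier appear to follow the pattern `config_timestepsec-<T>_numdaysinepisode-<N>_starttimestamp-<DATE>.gin`.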
### Example Usage

```bash
python scripts/generate_gin_config_files.py configs/base_config.gin \
  --output-dir configs/generated \
  --time-steps 300,600,900 \
  --num-days 1,7,14 \
  --start-timestamps 2023-07-06,2023-10-06
```
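Since each generated file is a self-contained scenario config, a natural follow-up is to sweep a training run over all of them. A minimal sketch, assuming a pre-populated starter buffer at `data/buffers/starter` and an illustrative experiment naming scheme:

```bash
# Hypothetical sweep: train one SAC agent per generated config.
# The starter buffer path and experiment names are assumptions for illustration.
for cfg in configs/generated/*.gin; do
  name="sac_$(basename "${cfg%.gin}")"
  python scripts/train.py \
    --starter-buffer-path data/buffers/starter \
    --experiment-name "$name" \
    --agent-type sac \
    --scenario-config-path "$cfg"
done
```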
## Typical Workflow

A typical RL experiment workflow includes:
1. Generate configurations:

   ```bash
   python scripts/generate_gin_config_files.py configs/template.gin \
     --output-dir configs/generated
   ```

2. Populate initial buffer:

   ```bash
   python scripts/populate_starter_buffer.py \
     --buffer-name starter \
     --env-gin-config-file-path configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin
   ```

3. Train agent:

   ```bash
   python scripts/train.py \
     --starter-buffer-path data/buffers/starter \
     --experiment-name my_experiment \
     --scenario-config-path configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin
   ```

4. Evaluate trained policy:

   ```bash
   python scripts/eval.py \
     --policy-dir experiments/my_experiment/policies/greedy_policy \
     --gin-config configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin \
     --experiment-name eval_my_experiment
   ```

5. Compare against baseline:

   ```bash
   python scripts/eval.py \
     --policy-dir schedule \
     --gin-config configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin \
     --experiment-name baseline_evaluation
   ```
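Since the same generated config file is reused across the buffer-population, training, and evaluation steps, it can be convenient to store the path once in a shell variable, for example:

```bash
# Store the shared scenario config once and reuse it in the later commands.
CONFIG=configs/generated/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-07-06.gin
python scripts/populate_starter_buffer.py \
  --buffer-name starter \
  --env-gin-config-file-path "$CONFIG"
```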