Environment Module
The `environment` module provides the reinforcement learning environment in which the agent interacts with the building simulation to control HVAC systems.
`environment/environment.py`
Purpose: Implements a controllable building RL environment compatible with TF-Agents, allowing agents to control various setpoints with the goal of operating the HVAC system more efficiently.
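Because the class follows the TF-Agents `PyEnvironment` interface, it can be driven with the usual reset/step loop. The sketch below is illustrative only: it assumes `env` is an already-constructed `Environment` (constructor parameters are covered in the workflow section) and that the action spec is a single bounded array spec.

```python
import numpy as np

def run_one_episode(env):
    """Drives one episode with uniformly sampled normalized actions.

    Illustrative only: a trained TF-Agents policy would normally pick the
    actions; `env` is assumed to be an already-constructed Environment.
    """
    spec = env.action_spec()                 # assumed to be one BoundedArraySpec
    time_step = env.reset()                  # first observation of the episode
    episode_reward = 0.0
    while not time_step.is_last():
        action = np.random.uniform(
            spec.minimum, spec.maximum, size=spec.shape).astype(spec.dtype)
        time_step = env.step(action)         # advances the building simulation
        episode_reward += time_step.reward
    return episode_reward
```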
Key Classes and Functions
- `Environment` (inherits from `py_environment.PyEnvironment`):
  - Attributes:
    - Building and Simulation Components:
      - `_building`: Instance of `BaseBuilding`, representing the simulated building environment.
      - `_time_zone`: Time zone of the building/environment.
      - `_start_timestamp` and `_end_timestamp`: Start and end times of the episode.
      - `_step_interval`: Time interval between environment steps.
      - `_num_timesteps_in_episode`: Number of timesteps in an episode.
      - `_observation_request`: Template for requesting observations from the building.
    - Agent Interaction Components:
      - `_action_spec`: Specification of the actions that can be taken.
      - `_observation_spec`: Specification of the observations.
      - `_action_normalizers`: Mapping from action names to their normalizers.
      - `_action_names`: List of action field names.
      - `_field_names`: List of observation field names.
      - `_observation_normalizer`: Normalizes observations received from the building.
      - `_default_policy_values`: Default actions in the policy.
    - Reward and Metrics:
      - `_reward_function`: Instance of `BaseRewardFunction` used to compute rewards.
      - `_discount_factor`: Discount factor for future rewards.
      - `_metrics`: Stores metrics for analysis and visualization.
      - `_metrics_reporting_interval`: Interval for reporting metrics.
      - `_summary_writer`: Writes summary data for visualization tools such as TensorBoard.
    - Episode and Step Tracking:
      - `_episode_ended`: Boolean flag indicating whether the episode has ended.
      - `_step_count`: Number of steps taken in the current episode.
      - `_global_step_count`: Total number of steps taken across episodes.
      - `_episode_count`: Number of episodes completed.
      - `_episode_cumulative_reward`: Cumulative reward for the current episode.
  - Methods:
    - `__init__(...)`: Initializes the environment with the specified parameters and configurations.
    - Environment Lifecycle Methods:
      - `_reset()`: Resets the environment to its initial state, preparing for a new episode.
      - `_step(action)`: Executes one time step in the environment given an action from the agent.
      - `_has_episode_ended(last_timestamp)`: Checks whether the episode has ended based on time or step count.
    - Agent Interaction Methods:
      - `action_spec()`: Returns the specification of the actions that can be taken.
      - `observation_spec()`: Returns the specification of the observations.
      - `_get_observation()`: Retrieves and processes observations from the building, including normalizing and handling missing data.
      - `_create_action_request(action_array)`: Converts normalized agent actions into native action values for the building (illustrated in the sketch after this class description).
    - Reward Calculation Methods:
      - `_get_reward()`: Computes the reward for the last action taken, based on the reward function.
      - `_write_summary_reward_info_metrics(reward_info)`: Writes reward input metrics to the summary logs.
      - `_write_summary_reward_response_metrics(reward_response)`: Writes reward output metrics to the summary logs.
      - `_commit_reward_metrics()`: Aggregates and writes reward metrics, then resets the accumulators.
    - Utility Methods:
      - `_get_action_spec_and_normalizers(...)`: Defines the action space and normalizers based on the building devices and action configurations.
      - `_get_observation_spec(...)`: Defines the observation space based on the building devices and observation configurations.
      - `_format_action(action, action_names)`: Reformats actions if necessary (extension point for subclasses).
      - `render(mode)`: Not implemented; intended for rendering the environment state.
  - Properties:
    - `steps_per_episode`: Number of steps in an episode.
    - `start_timestamp`: Start time of the episode.
    - `end_timestamp`: End time of the episode.
    - `default_policy_values`: Default actions in the policy.
    - `label`: Label for the environment or episode.
    - `current_simulation_timestamp`: Current simulation time.
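To make the spec/normalizer relationship concrete, the sketch below builds a normalized action spec in [-1, 1] and a `_create_action_request`-style denormalization. The field names, value ranges, and linear mapping are assumptions for illustration; the real environment derives them from the building devices and the action configuration.

```python
import numpy as np
from tf_agents.specs import array_spec

# Hypothetical action fields and native setpoint ranges; the real values come
# from the building devices and the action configuration.
ACTION_RANGES = {
    'supply_water_setpoint': (285.0, 315.0),        # Kelvin
    'supply_air_flowrate_setpoint': (0.0, 10.0),    # kg/s
}
ACTION_NAMES = list(ACTION_RANGES)

# Agents act in a normalized [-1, 1] box with one dimension per action field.
ACTION_SPEC = array_spec.BoundedArraySpec(
    shape=(len(ACTION_NAMES),), dtype=np.float32,
    minimum=-1.0, maximum=1.0, name='action')

def create_action_request(action_array):
    """Maps normalized agent actions back to native setpoint values
    (a linear mapping is assumed here for illustration)."""
    request = {}
    for value, name in zip(action_array, ACTION_NAMES):
        low, high = ACTION_RANGES[name]
        request[name] = low + (float(value) + 1.0) * 0.5 * (high - low)
    return request
```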
Environment Workflow
- Initialization:
  - The environment is initialized with the specified parameters, including the building simulation, reward function, observation normalizer, and action configurations (a construction sketch follows this list).
  - Action and observation specifications are set up from the building devices and configurations, along with the corresponding action normalizers and name mappings.
  - Auxiliary features such as time of day and day of week are prepared.
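For orientation, construction might look roughly like the following. The keyword arguments mirror the attributes listed above but are assumptions, not the verified constructor signature; `environment_cls` stands in for the `Environment` class itself.

```python
import pandas as pd

def build_environment(environment_cls, building, reward_function,
                      observation_normalizer, action_config):
    """Hypothetical construction helper: `environment_cls` is the Environment
    class from environment/environment.py; the keyword arguments mirror the
    attributes described above and may not match the real constructor exactly."""
    return environment_cls(
        building=building,                          # BaseBuilding instance
        reward_function=reward_function,            # BaseRewardFunction instance
        observation_normalizer=observation_normalizer,
        action_config=action_config,
        start_timestamp=pd.Timestamp('2023-07-01 00:00', tz='US/Pacific'),
        end_timestamp=pd.Timestamp('2023-07-08 00:00', tz='US/Pacific'),
        step_interval=pd.Timedelta(minutes=5),
        discount_factor=0.99,
        metrics_reporting_interval=100,
    )
```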
- Resetting the Environment:
  - The `_reset()` method resets the building simulation and the environment metrics.
  - Episode counters and timestamps are initialized.
  - The initial observation is generated by calling `_get_observation()` (a simplified reset sketch follows this list).
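In TF-Agents terms, a reset restores the simulation to the episode start, zeroes the episode counters, and wraps the first observation in a restart time step. A simplified sketch (the building reset hook is assumed):

```python
from tf_agents.trajectories import time_step as ts

def _reset(self):
    """Simplified reset sketch; not the actual implementation."""
    self._building.reset()                  # assumed BaseBuilding reset hook
    self._step_count = 0
    self._episode_cumulative_reward = 0.0
    self._episode_ended = False
    observation = self._get_observation()   # initial observation for the agent
    return ts.restart(observation)
```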
- Stepping through the Environment: each call to `_step(action)` applies the agent's action, retrieves a new observation, computes the reward, and checks for episode termination, as detailed below.
- Action Application:
  - The `_step(action)` method processes the action received from the agent.
  - The agent's normalized actions are converted into native setpoint values using `_create_action_request(action_array)`.
  - The action request is sent to the building simulation via `self.building.request_action(action_request)`.
  - The environment handles the action responses, logging and handling any rejections (a sketch follows this list).
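A sketch of the action-application part of `_step(action)`. The `request_action` call follows the description above; the shape of its response and the rejection handling are assumptions.

```python
import logging

def _apply_action(self, action):
    """Illustrative sketch of the action-application half of _step(action)."""
    if self._episode_ended:
        # TF-Agents convention: a step after termination restarts the episode.
        return self.reset()

    # Convert normalized agent outputs into native setpoint values.
    action_request = self._create_action_request(action)

    # Send the request to the building simulation; the response structure and
    # rejection handling below are assumed for illustration.
    response = self.building.request_action(action_request)
    for single_response in getattr(response, 'single_action_responses', []):
        if getattr(single_response, 'accepted', True) is False:
            logging.warning('Action rejected: %s', single_response)
    return response
```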
- Observation Retrieval:
  - After the action is applied, `_get_observation()` retrieves observations from the building.
  - Observations are normalized using the observation normalizer.
  - Time features (hour of day, day of week) and occupancy features are added to the observation.
  - Missing or invalid observations are handled using past data (one such strategy is sketched below).
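Hour-of-day and day-of-week features are often encoded cyclically so that times just before and after midnight end up close in feature space, and missing readings can be carried forward from the last valid value. Both ideas are sketched below; the exact encoding and fill strategy used by the environment may differ.

```python
import math
import numpy as np
import pandas as pd

def time_features(timestamp: pd.Timestamp) -> np.ndarray:
    """Cyclic hour-of-day and day-of-week features (a common encoding)."""
    hour_frac = (timestamp.hour * 60 + timestamp.minute) / (24.0 * 60.0)
    dow_frac = timestamp.dayofweek / 7.0
    return np.array([
        math.sin(2 * math.pi * hour_frac), math.cos(2 * math.pi * hour_frac),
        math.sin(2 * math.pi * dow_frac), math.cos(2 * math.pi * dow_frac),
    ], dtype=np.float32)

def fill_missing(values: np.ndarray, last_valid: np.ndarray) -> np.ndarray:
    """Replaces NaN readings with the most recent valid values (assumed strategy)."""
    return np.where(np.isnan(values), last_valid, values)
```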
- Reward Calculation:
  - The reward is computed using `_get_reward()`, which invokes the reward function's `compute_reward()` method.
  - Reward metrics are logged and written to the summary writer if configured (a sketch follows this list).
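Putting the reward pieces together, a hedged sketch of the `_get_reward` flow: ask the reward function for the reward, accumulate metrics, and flush them to the summary writer every `_metrics_reporting_interval` steps. The `reward_info` argument, the `_reward_accumulator` attribute, and the metric name are assumptions.

```python
import tensorflow as tf

def _get_reward(self, reward_info):
    """Illustrative sketch; reward_info, _reward_accumulator, and the metric
    name are hypothetical and only stand in for the real bookkeeping."""
    reward = self._reward_function.compute_reward(reward_info)
    self._episode_cumulative_reward += reward
    self._reward_accumulator.append(reward)   # hypothetical list attribute

    # Periodically commit the accumulated metrics to the summary writer.
    if self._global_step_count % self._metrics_reporting_interval == 0:
        with self._summary_writer.as_default():
            tf.summary.scalar(
                'mean_reward',
                sum(self._reward_accumulator) / len(self._reward_accumulator),
                step=self._global_step_count)
        self._reward_accumulator.clear()
    return reward
```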
- Episode Termination:
  - The environment checks whether the episode has ended using `_has_episode_ended(last_timestamp)`.
  - If the episode has ended, a terminal time step is returned and the environment is reset for the next episode (see the sketch below).
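A simplified sketch of the termination check and of how the final time step differs from an ordinary transition (the attribute names follow the list above):

```python
from tf_agents.trajectories import time_step as ts

def _has_episode_ended(self, last_timestamp):
    """Episode ends when the simulation clock or the step budget runs out."""
    return (last_timestamp >= self._end_timestamp
            or self._step_count >= self._num_timesteps_in_episode)

def _finish_step(self, observation, reward, last_timestamp):
    """Illustrative: wraps the step result in a terminal or transition TimeStep."""
    if self._has_episode_ended(last_timestamp):
        self._episode_ended = True
        return ts.termination(observation, reward)
    return ts.transition(observation, reward, discount=self._discount_factor)
```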