Environment Module
The environment module provides the reinforcement learning environment where the agent interacts with the building simulation to control HVAC systems.
environment/environment.py
Purpose: Implements a controllable building RL environment compatible with TF-Agents, allowing agents to control various setpoints with the goal of making the HVAC system more efficient.
Key Classes and Functions
- Environment (inherits from py_environment.PyEnvironment):
  - Attributes:
    - Building and Simulation Components:
      - _building: Instance of BaseBuilding, representing the simulated building environment.
      - _time_zone: Time zone of the building/environment.
      - _start_timestamp and _end_timestamp: Start and end times of the episode.
      - _step_interval: Time interval between environment steps.
      - _num_timesteps_in_episode: Number of timesteps in an episode.
      - _observation_request: Template for requesting observations from the building.
    - Agent Interaction Components:
      - _action_spec: Specification of the actions that can be taken.
      - _observation_spec: Specification of the observations.
      - _action_normalizers: Mapping from action names to their normalizers.
      - _action_names: List of action field names.
      - _field_names: List of observation field names.
      - _observation_normalizer: Normalizes observations received from the building.
      - _default_policy_values: Default actions in the policy.
    - Reward and Metrics:
      - _reward_function: Instance of BaseRewardFunction used to compute rewards.
      - _discount_factor: Discount factor for future rewards.
      - _metrics: Stores metrics for analysis and visualization.
      - _metrics_reporting_interval: Interval for reporting metrics.
      - _summary_writer: Writes summary data for visualization tools such as TensorBoard.
    - Episode and Step Tracking:
      - _episode_ended: Boolean flag indicating whether the episode has ended.
      - _step_count: Number of steps taken in the current episode.
      - _global_step_count: Total number of steps taken across episodes.
      - _episode_count: Number of episodes completed.
      - _episode_cumulative_reward: Cumulative reward for the current episode.
  - Methods:
    - __init__(...): Initializes the environment with the specified parameters and configurations.
    - Environment Lifecycle Methods:
      - _reset(): Resets the environment to its initial state, preparing for a new episode.
      - _step(action): Executes one time step in the environment given an action from the agent.
      - _has_episode_ended(last_timestamp): Checks whether the episode has ended based on time or step count.
    - Agent Interaction Methods:
      - action_spec(): Returns the specification of the actions that can be taken.
      - observation_spec(): Returns the specification of the observations.
      - _get_observation(): Retrieves and processes observations from the building, including normalizing and handling missing data.
      - _create_action_request(action_array): Converts normalized agent actions into native action values for the building.
    - Reward Calculation Methods:
      - _get_reward(): Computes the reward for the last action taken based on the reward function.
      - _write_summary_reward_info_metrics(reward_info): Writes reward input metrics into summary logs.
      - _write_summary_reward_response_metrics(reward_response): Writes reward output metrics into summary logs.
      - _commit_reward_metrics(): Aggregates and writes reward metrics, and resets accumulators.
    - Utility Methods:
      - _get_action_spec_and_normalizers(...): Defines the action space and normalizers based on the building devices and action configurations.
      - _get_observation_spec(...): Defines the observation space based on the building devices and observation configurations.
      - _format_action(action, action_names): Reformats actions if necessary (extension point for subclasses).
      - render(mode): Not implemented; intended for rendering the environment state.
  - Properties:
    - steps_per_episode: Number of steps in an episode.
    - start_timestamp: Start time of the episode.
    - end_timestamp: End time of the episode.
    - default_policy_values: Default actions in the policy.
    - label: Label for the environment or episode.
    - current_simulation_timestamp: Current simulation time.
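Because the class subclasses TF-Agents' py_environment.PyEnvironment, it must expose the spec accessors and the protected _reset/_step hooks. The skeleton below is a minimal, illustrative sketch of that contract, not the actual implementation; the spec shapes, episode length, and dummy observation are placeholders.

```python
import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts


class BuildingEnvSketch(py_environment.PyEnvironment):
  """Minimal illustration of the PyEnvironment contract Environment fulfills."""

  def __init__(self, num_actions=2, num_obs=16, discount_factor=0.99):
    super().__init__()
    self._discount_factor = discount_factor
    self._num_timesteps_in_episode = 24  # placeholder episode length
    self._step_count = 0
    self._action_spec = array_spec.BoundedArraySpec(
        shape=(num_actions,), dtype=np.float32,
        minimum=-1.0, maximum=1.0, name='action')
    self._observation_spec = array_spec.ArraySpec(
        shape=(num_obs,), dtype=np.float32, name='observation')

  def action_spec(self):
    return self._action_spec

  def observation_spec(self):
    return self._observation_spec

  def _get_observation(self):
    # Stand-in for querying the building simulation and normalizing the result.
    return np.zeros(self._observation_spec.shape, dtype=np.float32)

  def _reset(self):
    self._step_count = 0
    return ts.restart(self._get_observation())

  def _step(self, action):
    self._step_count += 1
    observation = self._get_observation()
    reward = 0.0  # stand-in for the reward function
    if self._step_count >= self._num_timesteps_in_episode:
      return ts.termination(observation, reward)
    return ts.transition(observation, reward, discount=self._discount_factor)
```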
Environment Workflow
Initialization:
- The environment is initialized with the specified parameters, including the building simulation, reward function, observation normalizer, and action configurations.
- Action and observation specifications are set up from the building devices and configurations, including the action normalizers and name mappings (a sketch of this setup follows below).
- Auxiliary features such as time of day and day of week are prepared.
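The following is a hedged sketch of how an action space and per-action normalizers might be assembled from a list of controllable setpoints. The ActionConfig structure, the field names, and the linear min/max normalizer are illustrative assumptions, not the repository's actual API.

```python
import dataclasses
import numpy as np
from tf_agents.specs import array_spec


@dataclasses.dataclass
class ActionConfig:
  """Illustrative config for one controllable setpoint (not the real class)."""
  device_id: str
  setpoint_name: str
  min_native_value: float
  max_native_value: float


class LinearActionNormalizer:
  """Maps agent actions in [-1, 1] to native setpoint values."""

  def __init__(self, min_value: float, max_value: float):
    self._min = min_value
    self._max = max_value

  def agent_to_native(self, agent_value: float) -> float:
    return self._min + (agent_value + 1.0) / 2.0 * (self._max - self._min)


def get_action_spec_and_normalizers(action_configs):
  """Builds a bounded action spec plus a name -> normalizer mapping."""
  action_names, normalizers = [], {}
  for config in action_configs:
    name = f'{config.device_id}_{config.setpoint_name}'
    action_names.append(name)
    normalizers[name] = LinearActionNormalizer(
        config.min_native_value, config.max_native_value)
  action_spec = array_spec.BoundedArraySpec(
      shape=(len(action_names),), dtype=np.float32,
      minimum=-1.0, maximum=1.0, name='action')
  return action_spec, action_names, normalizers
```

For example, a single ActionConfig spanning 285.0 to 295.0 (Kelvin) would produce one bounded action in [-1, 1] whose normalizer maps that interval back to native setpoint values.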
Resetting the Environment:
- The _reset() method resets the building simulation and the environment metrics.
- Episode counters and timestamps are initialized.
- The initial observation is generated by calling _get_observation() and returned in the first time step of the new episode (a usage sketch follows below).
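For orientation, this is how a training or evaluation loop typically drives the reset; env stands for an Environment instance constructed elsewhere with a building simulation, reward function, and normalizers.

```python
# Illustrative usage; `env` is an Environment instance constructed elsewhere.
time_step = env.reset()          # invokes _reset() under the hood
assert time_step.is_first()      # first step of the new episode
initial_observation = time_step.observation
```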
Stepping through the Environment:

Action Application:
- The _step(action) method processes the action from the agent.
- The normalized agent actions are converted into native values using _create_action_request(action_array).
- The action request is sent to the building simulation via self.building.request_action(action_request).
- The environment handles the action responses, including logging and handling rejections (a sketch of the conversion step follows below).
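Below is a sketch of how normalized actions could be mapped back to native setpoint values and packaged into a request, reusing the LinearActionNormalizer idea sketched earlier. The SingleActionRequest container, the name_to_setpoint mapping, and the function signature are illustrative assumptions rather than the actual API.

```python
import dataclasses
from typing import Dict, List, Sequence, Tuple


@dataclasses.dataclass
class SingleActionRequest:
  """Illustrative container for one setpoint change (not the real class/proto)."""
  device_id: str
  setpoint_name: str
  value: float


def create_action_request(
    action_array: Sequence[float],
    action_names: Sequence[str],
    action_normalizers: Dict[str, object],
    name_to_setpoint: Dict[str, Tuple[str, str]]) -> List[SingleActionRequest]:
  """Converts normalized agent actions into native setpoint requests."""
  requests = []
  for name, agent_value in zip(action_names, action_array):
    device_id, setpoint_name = name_to_setpoint[name]
    # Denormalize from the agent's [-1, 1] range to the device's native range.
    native_value = action_normalizers[name].agent_to_native(float(agent_value))
    requests.append(SingleActionRequest(device_id, setpoint_name, native_value))
  return requests
```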
Observation Retrieval:
- After the action is applied, _get_observation() retrieves observations from the building.
- Observations are normalized using the observation normalizer.
- Time features (hour of day, day of week) and occupancy features are added to the observation.
- Missing or invalid observations are handled by falling back to past data (see the feature sketch below).
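A sketch of the kind of post-processing this step performs: cyclic time-of-day and day-of-week features, plus a carry-forward fill for missing measurements. The sine/cosine encoding and the carry-forward rule are assumptions for illustration.

```python
import datetime
import numpy as np


def time_features(timestamp: datetime.datetime) -> np.ndarray:
  """Encodes hour of day and day of week as smooth cyclic features."""
  hour_angle = 2.0 * np.pi * (timestamp.hour + timestamp.minute / 60.0) / 24.0
  dow_angle = 2.0 * np.pi * timestamp.weekday() / 7.0
  return np.array([np.sin(hour_angle), np.cos(hour_angle),
                   np.sin(dow_angle), np.cos(dow_angle)], dtype=np.float32)


def fill_missing(current: np.ndarray, previous: np.ndarray) -> np.ndarray:
  """Replaces NaN readings with the most recent valid values."""
  return np.where(np.isnan(current), previous, current).astype(np.float32)
```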
Reward Calculation:
- The reward is computed using _get_reward(), which invokes the reward function's compute_reward() method.
- Reward metrics are logged and written to summary writers if configured (a sketch follows below).
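A hedged sketch of the reward step: the environment packages reward inputs, delegates to the reward function, and optionally logs scalars. The exact compute_reward() signature, the agent_reward_value field, and the tf.summary usage here are illustrative assumptions.

```python
import tensorflow as tf


def get_reward(reward_function, reward_info, summary_writer=None, step=0):
  """Illustrative reward step: delegate to the reward function, then log."""
  # The reward function turns raw energy/comfort inputs into a scalar reward.
  reward_response = reward_function.compute_reward(reward_info)
  reward = reward_response.agent_reward_value  # assumed field name

  if summary_writer is not None:
    with summary_writer.as_default():
      tf.summary.scalar('reward/agent_reward', reward, step=step)
  return reward
```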
Episode Termination:
- The environment checks whether the episode has ended using _has_episode_ended(last_timestamp).
- If the episode has ended, a terminal time step is returned and the environment is reset for the next episode (a sketch of the check follows below).
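A sketch of the termination check; the two stopping conditions (simulation time reaching end_timestamp, or the step budget being exhausted) mirror the description above, while the exact comparisons and the standalone function form are assumptions.

```python
import datetime


def has_episode_ended(last_timestamp: datetime.datetime,
                      end_timestamp: datetime.datetime,
                      step_count: int,
                      num_timesteps_in_episode: int) -> bool:
  """True when either the time horizon or the step budget is exhausted."""
  return (last_timestamp >= end_timestamp
          or step_count >= num_timesteps_in_episode)


# Inside _step(), this flag selects between a terminal and a transition time
# step, as in the ts.termination / ts.transition calls of the contract sketch.
```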