Environment Module
The `environment` module provides the reinforcement learning environment in which the agent interacts with the building simulation to control HVAC systems.
`environment/environment.py`
Purpose: Implements a controllable building RL environment compatible with TF-Agents, allowing agents to control various setpoints with the goal of operating the HVAC system more efficiently.
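Because the class follows the TF-Agents `PyEnvironment` interface, it can be driven with the usual reset/step loop. The sketch below is illustrative only: it assumes `env` is an already-constructed `Environment` (constructor parameters are covered in the workflow section) and that the action spec is a single bounded array spec.

```python
import numpy as np

def run_one_episode(env):
    """Drives one episode with uniformly sampled normalized actions.

    Illustrative only: a trained TF-Agents policy would normally pick the
    actions; `env` is assumed to be an already-constructed Environment.
    """
    spec = env.action_spec()                 # assumed to be one BoundedArraySpec
    time_step = env.reset()                  # first observation of the episode
    episode_reward = 0.0
    while not time_step.is_last():
        action = np.random.uniform(
            spec.minimum, spec.maximum, size=spec.shape).astype(spec.dtype)
        time_step = env.step(action)         # advances the building simulation
        episode_reward += time_step.reward
    return episode_reward
```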
Key Classes and Functions
- `Environment` (inherits from `py_environment.PyEnvironment`):
  - Attributes:
    - Building and Simulation Components:
      - `_building`: Instance of `BaseBuilding`, representing the simulated building environment.
      - `_time_zone`: Time zone of the building/environment.
      - `_start_timestamp` and `_end_timestamp`: Start and end times of the episode.
      - `_step_interval`: Time interval between environment steps.
      - `_num_timesteps_in_episode`: Number of timesteps in an episode.
      - `_observation_request`: Template for requesting observations from the building.
    - Agent Interaction Components:
      - `_action_spec`: Specification of the actions that can be taken.
      - `_observation_spec`: Specification of the observations.
      - `_action_normalizers`: Mapping from action names to their normalizers.
      - `_action_names`: List of action field names.
      - `_field_names`: List of observation field names.
      - `_observation_normalizer`: Normalizes observations received from the building.
      - `_default_policy_values`: Default actions in the policy.
    - Reward and Metrics:
      - `_reward_function`: Instance of `BaseRewardFunction` used to compute rewards.
      - `_discount_factor`: Discount factor for future rewards.
      - `_metrics`: Stores metrics for analysis and visualization.
      - `_metrics_reporting_interval`: Interval for reporting metrics.
      - `_summary_writer`: Writes summary data for visualization tools such as TensorBoard.
    - Episode and Step Tracking:
      - `_episode_ended`: Boolean flag indicating whether the episode has ended.
      - `_step_count`: Number of steps taken in the current episode.
      - `_global_step_count`: Total number of steps taken across episodes.
      - `_episode_count`: Number of episodes completed.
      - `_episode_cumulative_reward`: Cumulative reward for the current episode.
  - Methods:
    - `__init__(...)`: Initializes the environment with the specified parameters and configurations.
    - Environment Lifecycle Methods:
      - `_reset()`: Resets the environment to its initial state, preparing for a new episode.
      - `_step(action)`: Executes one time step in the environment given an action from the agent.
      - `_has_episode_ended(last_timestamp)`: Checks whether the episode has ended based on time or step count.
    - Agent Interaction Methods:
      - `action_spec()`: Returns the specification of the actions that can be taken.
      - `observation_spec()`: Returns the specification of the observations.
      - `_get_observation()`: Retrieves and processes observations from the building, including normalizing and handling missing data.
      - `_create_action_request(action_array)`: Converts normalized agent actions into native action values for the building (illustrated in the sketch after this class description).
    - Reward Calculation Methods:
      - `_get_reward()`: Computes the reward for the last action taken, based on the reward function.
      - `_write_summary_reward_info_metrics(reward_info)`: Writes reward input metrics to the summary logs.
      - `_write_summary_reward_response_metrics(reward_response)`: Writes reward output metrics to the summary logs.
      - `_commit_reward_metrics()`: Aggregates and writes reward metrics, then resets the accumulators.
    - Utility Methods:
      - `_get_action_spec_and_normalizers(...)`: Defines the action space and normalizers based on the building devices and action configurations.
      - `_get_observation_spec(...)`: Defines the observation space based on the building devices and observation configurations.
      - `_format_action(action, action_names)`: Reformats actions if necessary (extension point for subclasses).
      - `render(mode)`: Not implemented; intended for rendering the environment state.
  - Properties:
    - `steps_per_episode`: Number of steps in an episode.
    - `start_timestamp`: Start time of the episode.
    - `end_timestamp`: End time of the episode.
    - `default_policy_values`: Default actions in the policy.
    - `label`: Label for the environment or episode.
    - `current_simulation_timestamp`: Current simulation time.
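To make the spec/normalizer relationship concrete, the sketch below builds a normalized action spec in [-1, 1] and a `_create_action_request`-style denormalization. The field names, value ranges, and linear mapping are assumptions for illustration; the real environment derives them from the building devices and the action configuration.

```python
import numpy as np
from tf_agents.specs import array_spec

# Hypothetical action fields and native setpoint ranges; the real values come
# from the building devices and the action configuration.
ACTION_RANGES = {
    'supply_water_setpoint': (285.0, 315.0),        # Kelvin
    'supply_air_flowrate_setpoint': (0.0, 10.0),    # kg/s
}
ACTION_NAMES = list(ACTION_RANGES)

# Agents act in a normalized [-1, 1] box with one dimension per action field.
ACTION_SPEC = array_spec.BoundedArraySpec(
    shape=(len(ACTION_NAMES),), dtype=np.float32,
    minimum=-1.0, maximum=1.0, name='action')

def create_action_request(action_array):
    """Maps normalized agent actions back to native setpoint values
    (a linear mapping is assumed here for illustration)."""
    request = {}
    for value, name in zip(action_array, ACTION_NAMES):
        low, high = ACTION_RANGES[name]
        request[name] = low + (float(value) + 1.0) * 0.5 * (high - low)
    return request
```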
Environment Workflow
- Initialization:
  - The environment is initialized with the specified parameters, including the building simulation, reward function, observation normalizer, and action configurations (a construction sketch follows this list).
  - Action and observation specifications are set up from the building devices and configurations, along with the corresponding action normalizers and name mappings.
  - Auxiliary features such as time of day and day of week are prepared.
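For orientation, construction might look roughly like the following. The keyword arguments mirror the attributes listed above but are assumptions, not the verified constructor signature; `environment_cls` stands in for the `Environment` class itself.

```python
import pandas as pd

def build_environment(environment_cls, building, reward_function,
                      observation_normalizer, action_config):
    """Hypothetical construction helper: `environment_cls` is the Environment
    class from environment/environment.py; the keyword arguments mirror the
    attributes described above and may not match the real constructor exactly."""
    return environment_cls(
        building=building,                          # BaseBuilding instance
        reward_function=reward_function,            # BaseRewardFunction instance
        observation_normalizer=observation_normalizer,
        action_config=action_config,
        start_timestamp=pd.Timestamp('2023-07-01 00:00', tz='US/Pacific'),
        end_timestamp=pd.Timestamp('2023-07-08 00:00', tz='US/Pacific'),
        step_interval=pd.Timedelta(minutes=5),
        discount_factor=0.99,
        metrics_reporting_interval=100,
    )
```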
- Resetting the Environment:
  - The `_reset()` method resets the building simulation and the environment metrics.
  - Episode counters and timestamps are initialized.
  - The initial observation is generated by calling `_get_observation()` (a simplified reset sketch follows this list).
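In TF-Agents terms, a reset restores the simulation to the episode start, zeroes the episode counters, and wraps the first observation in a restart time step. A simplified sketch (the building reset hook is assumed):

```python
from tf_agents.trajectories import time_step as ts

def _reset(self):
    """Simplified reset sketch; not the actual implementation."""
    self._building.reset()                  # assumed BaseBuilding reset hook
    self._step_count = 0
    self._episode_cumulative_reward = 0.0
    self._episode_ended = False
    observation = self._get_observation()   # initial observation for the agent
    return ts.restart(observation)
```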
- Stepping through the Environment: each call to `_step(action)` applies the agent's action, retrieves a new observation, computes the reward, and checks for episode termination, as detailed below.
- Action Application:
  - The `_step(action)` method processes the action received from the agent.
  - The agent's normalized actions are converted into native setpoint values using `_create_action_request(action_array)`.
  - The action request is sent to the building simulation via `self.building.request_action(action_request)`.
  - The environment handles the action responses, logging and handling any rejections (a sketch follows this list).
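A sketch of the action-application part of `_step(action)`. The `request_action` call follows the description above; the shape of its response and the rejection handling are assumptions.

```python
import logging

def _apply_action(self, action):
    """Illustrative sketch of the action-application half of _step(action)."""
    if self._episode_ended:
        # TF-Agents convention: a step after termination restarts the episode.
        return self.reset()

    # Convert normalized agent outputs into native setpoint values.
    action_request = self._create_action_request(action)

    # Send the request to the building simulation; the response structure and
    # rejection handling below are assumed for illustration.
    response = self.building.request_action(action_request)
    for single_response in getattr(response, 'single_action_responses', []):
        if getattr(single_response, 'accepted', True) is False:
            logging.warning('Action rejected: %s', single_response)
    return response
```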
- Observation Retrieval:
  - After the action is applied, `_get_observation()` retrieves observations from the building.
  - Observations are normalized using the observation normalizer.
  - Time features (hour of day, day of week) and occupancy features are added to the observation.
  - Missing or invalid observations are handled using past data (one such strategy is sketched below).
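Hour-of-day and day-of-week features are often encoded cyclically so that times just before and after midnight end up close in feature space, and missing readings can be carried forward from the last valid value. Both ideas are sketched below; the exact encoding and fill strategy used by the environment may differ.

```python
import math
import numpy as np
import pandas as pd

def time_features(timestamp: pd.Timestamp) -> np.ndarray:
    """Cyclic hour-of-day and day-of-week features (a common encoding)."""
    hour_frac = (timestamp.hour * 60 + timestamp.minute) / (24.0 * 60.0)
    dow_frac = timestamp.dayofweek / 7.0
    return np.array([
        math.sin(2 * math.pi * hour_frac), math.cos(2 * math.pi * hour_frac),
        math.sin(2 * math.pi * dow_frac), math.cos(2 * math.pi * dow_frac),
    ], dtype=np.float32)

def fill_missing(values: np.ndarray, last_valid: np.ndarray) -> np.ndarray:
    """Replaces NaN readings with the most recent valid values (assumed strategy)."""
    return np.where(np.isnan(values), last_valid, values)
```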
- Reward Calculation:
  - The reward is computed using `_get_reward()`, which invokes the reward function's `compute_reward()` method.
  - Reward metrics are logged and written to the summary writer if configured (a sketch follows this list).
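Putting the reward pieces together, a hedged sketch of the `_get_reward` flow: ask the reward function for the reward, accumulate metrics, and flush them to the summary writer every `_metrics_reporting_interval` steps. The `reward_info` argument, the `_reward_accumulator` attribute, and the metric name are assumptions.

```python
import tensorflow as tf

def _get_reward(self, reward_info):
    """Illustrative sketch; reward_info, _reward_accumulator, and the metric
    name are hypothetical and only stand in for the real bookkeeping."""
    reward = self._reward_function.compute_reward(reward_info)
    self._episode_cumulative_reward += reward
    self._reward_accumulator.append(reward)   # hypothetical list attribute

    # Periodically commit the accumulated metrics to the summary writer.
    if self._global_step_count % self._metrics_reporting_interval == 0:
        with self._summary_writer.as_default():
            tf.summary.scalar(
                'mean_reward',
                sum(self._reward_accumulator) / len(self._reward_accumulator),
                step=self._global_step_count)
        self._reward_accumulator.clear()
    return reward
```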
- Episode Termination:
  - The environment checks whether the episode has ended using `_has_episode_ended(last_timestamp)`.
  - If the episode has ended, a terminal time step is returned and the environment is reset for the next episode (see the sketch below).
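A simplified sketch of the termination check and of how the final time step differs from an ordinary transition (the attribute names follow the list above):

```python
from tf_agents.trajectories import time_step as ts

def _has_episode_ended(self, last_timestamp):
    """Episode ends when the simulation clock or the step budget runs out."""
    return (last_timestamp >= self._end_timestamp
            or self._step_count >= self._num_timesteps_in_episode)

def _finish_step(self, observation, reward, last_timestamp):
    """Illustrative: wraps the step result in a terminal or transition TimeStep."""
    if self._has_episode_ended(last_timestamp):
        self._episode_ended = True
        return ts.termination(observation, reward)
    return ts.transition(observation, reward, discount=self._discount_factor)
```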