System Interaction and Architecture

Data Flow

The system operates in discrete time steps within an episode, following the reinforcement learning loop:

Agent Action:
- The agent selects an action based on the current observation.
- The action is normalized and sent to the environment.
Environment Response:
- The environment applies the action to the building simulation.
- Actions are converted from normalized values to native setpoint values using action normalizers.
- The building simulation updates its state based on the action.
Observation Retrieval:
- The environment retrieves observations from the building after the action.
- Observations are normalized and processed, including time and occupancy features.
- Missing or invalid observations are handled using previous data or default values.
Reward Calculation:
- The reward function computes the reward based on productivity, energy cost, and carbon emissions.
- The reward is provided to the agent.
State Update:
- The environment updates internal metrics and logs information.
- Checks if the episode has ended based on the number of steps or time.
- If the episode has ended, the environment resets for the next episode.

Environment and Building Simulation:
- The Environment interacts with the SimulatorBuilding, which integrates the building simulation (TFSimulator), HVAC systems, weather controller, and occupancy model.
- Actions are applied to the building simulation, and observations are retrieved after each step.
Reward Functions:
- The environment uses the reward function (e.g., SetpointEnergyCarbonRegretFunction) to compute rewards based on the RewardInfo from the building.
- The reward function accesses energy consumption data, occupancy levels, and temperatures to compute productivity and costs.
Energy Cost Models:
- ElectricityEnergyCost and NaturalGasEnergyCost provide cost and carbon emission calculations based on energy usage and time.
- The reward function uses these models to compute energy costs and carbon emissions for the reward.
Normalization and Configuration:
- ActionConfig defines how actions are normalized and mapped to building setpoints.
- Observation normalizers are defined for each measurement to ensure consistent scaling.
- Gin configuration files specify parameters and bindings for all components.