System Interaction and Architecture
Data Flow
The system operates in discrete time steps within an episode, following the reinforcement learning loop:
- Agent Action:
  - The agent selects an action based on the current observation.
  - The normalized action is sent to the environment.
- Environment Response:
  - The environment applies the action to the building simulation.
  - Actions are converted from normalized values to native setpoint values using action normalizers.
  - The building simulation updates its state based on the action.
- Observation Retrieval:
  - The environment retrieves observations from the building after the action is applied.
  - Observations are normalized and processed, including time and occupancy features.
  - Missing or invalid observations are handled using previous data or default values.
- Reward Calculation:
  - The reward function computes the reward based on productivity, energy cost, and carbon emissions.
  - The reward is provided to the agent.
- State Update:
  - The environment updates internal metrics and logs step information.
  - The environment checks whether the episode has ended, based on the number of steps or elapsed time.
  - If the episode has ended, the environment resets for the next episode.
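The loop above can be sketched as a standard gym-style episode loop. Everything in this snippet is illustrative: `ToyEnv`, its placeholder reward, and the `policy` callable are stand-ins for the real environment and agent, not the actual API.

```python
class ToyEnv:
    """Toy stand-in for the building environment (illustrative only)."""

    def __init__(self, episode_len=5):
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # normalized observation

    def step(self, action):
        self.t += 1
        obs = float(self.t)
        reward = -abs(action)  # placeholder reward term
        done = self.t >= self.episode_len
        return obs, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one agent-environment loop: act, step, accumulate reward, stop on done."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)                        # 1. agent selects an action
        obs, reward, done, info = env.step(action)  # 2-4. env applies it, returns obs + reward
        total += reward                             # 5. metrics update
        if done:                                    # episode-end check
            break
    return total


total = run_episode(ToyEnv(), policy=lambda obs: 0.5)  # → -2.5
```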
Component Interactions
- Environment and Building Simulation:
  - The `Environment` interacts with the `SimulatorBuilding`, which integrates the building simulation (`TFSimulator`), HVAC systems, weather controller, and occupancy model.
  - Actions are applied to the building simulation, and observations are retrieved after each step.
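As a rough sketch of the action-to-setpoint path, the toy class below denormalizes an action in [-1, 1] to a native setpoint and advances a trivial first-order temperature model. The class name, setpoint range, and dynamics are assumptions for illustration, not the real `SimulatorBuilding` or `TFSimulator` behavior.

```python
class SimulatorBuildingSketch:
    """Illustrative stand-in: converts a normalized action to a native
    setpoint and advances a trivial zone-temperature model one step."""

    def __init__(self, min_setpoint=285.0, max_setpoint=305.0):
        self.min_sp = min_setpoint        # Kelvin (assumed range)
        self.max_sp = max_setpoint
        self.temperature = 295.0          # current zone temperature

    def denormalize(self, action):
        """Map a normalized action in [-1, 1] to a native setpoint."""
        return self.min_sp + (action + 1.0) / 2.0 * (self.max_sp - self.min_sp)

    def step(self, action):
        """Apply the action and return the resulting observation."""
        setpoint = self.denormalize(action)
        # Move the zone temperature 10% of the way toward the setpoint.
        self.temperature += 0.1 * (setpoint - self.temperature)
        return {"zone_temperature": self.temperature, "setpoint": setpoint}
```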
- Reward Functions:
  - The environment uses a reward function (e.g., `SetpointEnergyCarbonRegretFunction`) to compute rewards based on the `RewardInfo` from the building.
  - The reward function accesses energy consumption data, occupancy levels, and temperatures to compute productivity and costs.
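A minimal sketch of how such a reward might combine the three terms, assuming a simple weighted difference; the actual `SetpointEnergyCarbonRegretFunction` and the fields of `RewardInfo` may differ.

```python
def setpoint_energy_carbon_reward(productivity, energy_cost, carbon_cost,
                                  energy_weight=1.0, carbon_weight=1.0):
    """Hypothetical reward: productivity minus weighted energy and carbon costs.

    The weights and the linear form are assumptions for illustration; the
    real reward function derives its terms from the building's RewardInfo.
    """
    return productivity - energy_weight * energy_cost - carbon_weight * carbon_cost
```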
- Energy Cost Models:
  - `ElectricityEnergyCost` and `NaturalGasEnergyCost` provide cost and carbon emission calculations based on energy usage and time.
  - The reward function uses these models to compute energy costs and carbon emissions for the reward.
- Normalization and Configuration:
  - `ActionConfig` defines how actions are normalized and mapped to building setpoints.
  - Observation normalizers are defined for each measurement to ensure consistent scaling.
  - Gin configuration files specify parameters and bindings for all components.
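For illustration, Gin bindings of the kind described might look like the fragment below; the binding names and values are hypothetical, not taken from the actual configuration files.

```
# Hypothetical Gin fragment (illustrative names and values only):
Environment.reward_function = @SetpointEnergyCarbonRegretFunction()
Environment.action_config = @ActionConfig()
ActionConfig.min_native_value = 285.0
ActionConfig.max_native_value = 305.0
```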