src.Environments.single_agent package
Submodules
src.Environments.single_agent.BaseEnvironment module
- class src.Environments.single_agent.BaseEnvironment.BaseRegime(**kwargs)
Bases:
MujocoEnv, EzPickle, ABC
Base class for creating training regimes in a MuJoCo simulation environment. It sets up the environment, including the drone and target, and defines the properties and methods that subclasses must implement.
This class should not be instantiated directly but extended by subclasses to define specific training regimes.
- Parameters:
kwargs – Keyword arguments for environment configuration.
- abstract property done: bool
Determine whether the episode is done. Subclasses must implement this property.
- Returns:
True if the episode is finished, False otherwise.
- property drone_hit_ground: bool
Check if the drone has hit the ground.
- Returns:
True if the drone has made contact with the ground, False otherwise.
- property drone_target_vector: ndarray
Compute and return the vector from the drone to the target.
- Returns:
A numpy array representing the vector from the drone to the target.
- abstract property metrics: dict[str, Any]
Return additional data or metrics for logging purposes. Subclasses must implement this property.
- Returns:
A dictionary containing metrics or additional information.
- property observation: ObsType
Get the current observation of the environment.
- Returns:
The current environment observation.
- pre_simulation() -> None
Perform any necessary actions before each simulation step. This method should be overridden by subclasses.
- reset_model() -> ObsType
Reset the environment to an initial state and return the initial observation.
- Returns:
The initial observation after resetting the environment.
- abstract property reward: SupportsFloat
Calculate and return the current reward. Subclasses must implement this property.
- Returns:
The current reward.
- step(action: ActType) -> Tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]]
Execute one time step within the environment.
- Parameters:
action – The action to be executed.
- Returns:
A tuple containing the new observation, reward, done flag, truncated flag, and info dictionary.
- abstract property truncated: bool
Determine whether the episode is truncated. Subclasses must implement this property.
- Returns:
True if the episode is truncated, False otherwise.
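The relationship between step() and the abstract properties can be illustrated with a minimal sketch. It does not use MuJoCo: RegimeSketch and TimeLimitRegime are hypothetical stand-ins, and the step counter replaces the real physics update. The sketch only assumes that step() assembles its five-element return value from the observation, reward, done, truncated, and metrics properties, which is one plausible reading of the interface above rather than the actual implementation.

```python
from abc import ABC, abstractmethod
from typing import Any


class RegimeSketch(ABC):
    """Hypothetical stand-in for BaseRegime without the MuJoCo dependency."""

    def __init__(self) -> None:
        self.steps = 0  # placeholder for the simulation state

    @property
    @abstractmethod
    def reward(self) -> float: ...

    @property
    @abstractmethod
    def done(self) -> bool: ...

    @property
    @abstractmethod
    def truncated(self) -> bool: ...

    @property
    @abstractmethod
    def metrics(self) -> dict[str, Any]: ...

    @property
    def observation(self) -> list[float]:
        # Placeholder observation; the real class returns an ObsType.
        return [float(self.steps)]

    def pre_simulation(self) -> None:
        """Hook called before each step; a no-op unless overridden."""

    def step(self, action: float):
        self.pre_simulation()
        self.steps += 1  # assumption: stands in for advancing the physics
        return self.observation, self.reward, self.done, self.truncated, self.metrics


class TimeLimitRegime(RegimeSketch):
    """Hypothetical subclass that truncates after three steps."""

    @property
    def reward(self) -> float:
        return -1.0

    @property
    def done(self) -> bool:
        return False

    @property
    def truncated(self) -> bool:
        return self.steps >= 3

    @property
    def metrics(self) -> dict[str, Any]:
        return {"steps": self.steps}


env = TimeLimitRegime()
truncated = False
while not truncated:
    obs, reward, done, truncated, info = env.step(0.0)
```

Because every abstract property is declared with @abstractmethod, instantiating RegimeSketch directly raises TypeError, which mirrors the note that the base class is meant to be extended rather than instantiated.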
- class src.Environments.single_agent.BaseEnvironment.Drone(data: ~mujoco._structs.MjData, spawn_box: ~numpy.ndarray, spawn_max_velocity: float, rng: ~numpy.random._generator.Generator = Generator(PCG64))
Bases:
object
Represents a drone in the simulation environment. It manages the drone’s state, including its position, velocity, and IMU sensor readings.
- Parameters:
data – The MjData instance containing the simulation state.
spawn_box – A numpy array defining the boundaries for the drone’s initial position.
spawn_max_velocity – The maximum initial velocity of the drone.
rng – An instance of a random number generator (optional).
- property imu_accel: ndarray
Gets the drone’s accelerometer readings from the IMU.
- Returns:
A numpy array representing the drone’s current accelerometer readings.
- property imu_gyro: ndarray
Gets the drone’s gyroscope readings from the IMU.
- Returns:
A numpy array representing the drone’s current gyroscope readings.
- property imu_orientation: ndarray
Gets the drone’s orientation readings from the IMU.
- Returns:
A numpy array representing the drone’s current orientation.
- property position: ndarray
Gets the drone’s position.
- Returns:
A numpy array representing the drone’s current position.
- reset()
Resets the drone’s position and velocity to initial values within the defined spawn box and velocity limits.
- property velocity: ndarray
Gets the drone’s velocity.
- Returns:
A numpy array representing the drone’s current velocity.
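As a rough illustration of what resetting "within the defined spawn box and velocity limits" could look like, the sketch below samples a position and a velocity with NumPy. It makes several assumptions that the documentation does not confirm: spawn_box is taken to have shape (2, 3) with the lower corner in row 0 and the upper corner in row 1, and the velocity is drawn with a random direction and a speed of at most spawn_max_velocity. The function name sample_spawn_state is hypothetical.

```python
import numpy as np


def sample_spawn_state(spawn_box, spawn_max_velocity, rng):
    """Hypothetical helper: draw an initial position and velocity.

    Assumes spawn_box has shape (2, 3): row 0 is the lower corner,
    row 1 is the upper corner of an axis-aligned box.
    """
    low, high = spawn_box
    position = rng.uniform(low, high)  # one uniform sample per axis

    # Random direction on the unit sphere, scaled by a random speed.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    speed = rng.uniform(0.0, spawn_max_velocity)
    return position, direction * speed


rng = np.random.default_rng(0)  # seeded generator for reproducible spawns
box = np.array([[-1.0, -1.0, 0.5],
                [1.0, 1.0, 2.0]])
pos, vel = sample_spawn_state(box, 0.5, rng)
```

Passing a seeded Generator explicitly, as the rng parameter allows, makes the spawn sequence reproducible across runs.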
- class src.Environments.single_agent.BaseEnvironment.Target(data: ~mujoco._structs.MjData, spawn_box: ~numpy.ndarray, spawn_max_velocity: float, spawn_max_angular_velocity: float, rng: ~numpy.random._generator.Generator = Generator(PCG64))
Bases:
object
Represents a target in the simulation environment. It manages the target’s state, including its position, velocity, and orientation.
- Parameters:
data – The MjData instance containing the simulation state.
spawn_box – A numpy array defining the boundaries for the target’s initial position.
spawn_max_velocity – The maximum initial velocity of the target.
spawn_max_angular_velocity – The maximum initial angular velocity of the target.
rng – An instance of a random number generator (optional).
- property orientation: ndarray
Gets the target’s orientation.
- Returns:
A numpy array representing the target’s current orientation.
- property position: ndarray
Gets the target’s position.
- Returns:
A numpy array representing the target’s current position.
- reset()
Resets the target’s position, velocity, and orientation to initial values within the defined spawn box and velocity limits.
- property velocity: ndarray
Gets the target’s velocity.
- Returns:
A numpy array representing the target’s current velocity.
src.Environments.single_agent.ShootingRegime module
src.Environments.single_agent.TargetRegime module
- class src.Environments.single_agent.TargetRegime.TargetRegime(tolerance_distance: float, max_time: float, reward_distance_coefficient: float, reward_distance_exp: float, reward_distance_max: float, reward_goal: float, reward_velocity_coefficient: float, reward_velocity_exp: float, reward_velocity_max: float, penalty_time: float, penalty_crash: float, **kwargs)
Bases:
BaseRegime
A regime that defines the behavior and objectives for a drone in a target-reaching scenario within a simulation environment. It extends BaseRegime with specific reward and penalty mechanisms related to reaching a target, maintaining velocity, and avoiding crashes.
- Parameters:
tolerance_distance – The distance within which the drone is considered to have reached the target. Type: float
max_time – The maximum duration for which the simulation or episode is allowed to run. If exceeded, the episode is considered truncated. Type: float
reward_distance_coefficient – Coefficient for the reward based on the inverse of the distance to the target. The reward increases as the drone gets closer to the target. Type: float
reward_distance_exp – The exponent applied to the inverse distance reward calculation. Type: float
reward_distance_max – The maximum reward granted for distance to the target. Prevents the reward from becoming excessively large as the distance approaches zero. Type: float
reward_goal – The reward given when the drone reaches the target. Type: float
reward_velocity_coefficient – Coefficient for the reward based on the drone’s velocity. Type: float
reward_velocity_exp – The exponent applied to the velocity in the velocity-based reward calculation. Type: float
reward_velocity_max – The maximum reward granted for the drone’s velocity. Type: float
penalty_time – The penalty applied at each timestep, encouraging the drone to reach the target faster. Type: float
penalty_crash – The penalty applied if the drone crashes, i.e., hits the ground. Type: float
kwargs – Additional keyword arguments passed to the base class (BaseRegime).
- property done: bool
Determine whether the episode has concluded. An episode is considered done if the drone crashes, reaches the target, or the simulation time exceeds max_time.
- Returns:
True if the episode is done; otherwise, False.
- property goal_reached: bool
Determine whether the drone has reached the target by checking if its distance to the target is less than or equal to tolerance_distance.
- Returns:
True if the drone has reached the target; otherwise, False.
- property metrics: dict[str, Any]
Collect and return various metrics related to the drone’s performance and the environment state.
- Returns:
A dictionary containing metrics such as distance to the target, goal-reaching status, drone’s velocity, whether the drone hit the ground, and the simulation time.
- property reward: float
Aggregate the total reward for the current timestep, combining distance, time, crash, goal-reaching, and velocity-based rewards.
- Returns:
The total reward for the current timestep.
- property reward_crash: float
Calculate the crash penalty, applied if the drone hits the ground.
- Returns:
The crash penalty if the drone hits the ground; otherwise, zero.
- property reward_distance: float
Calculate the distance-based reward, which grows with the inverse of the distance between the drone and the target, scaled by reward_distance_coefficient and raised to reward_distance_exp. The reward is capped at reward_distance_max so that it stays finite as the distance approaches zero.
- Returns:
The distance-based reward, scaled and exponentiated based on the distance to the target, with a maximum limit.
- property reward_goal: float
Calculate the reward for reaching the goal. This reward is granted when the drone’s distance to the target is less than or equal to tolerance_distance.
- Returns:
The reward for reaching the goal if the goal is reached; otherwise, zero.
- property reward_time: float
Calculate the time-based penalty at each timestep to encourage the drone to reach its target faster.
- Returns:
The time-based penalty, which is a constant negative value defined by penalty_time.
- property reward_velocity: float
Calculate the velocity-based reward, which grows with the drone’s current velocity, raised to reward_velocity_exp, scaled by reward_velocity_coefficient, and capped at reward_velocity_max.
- Returns:
The velocity-based reward, scaled and exponentiated based on the drone’s current velocity, with a maximum limit.
- property truncated: bool
Check if the episode is truncated due to exceeding the maximum allowed simulation time (max_time).
- Returns:
True if the simulation time exceeds max_time; otherwise, False.
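The reward components described above can be combined as in the following sketch. The exact formulas are not given in this documentation, so everything here is an assumption: the distance term is taken as min(c / d**e, max), the velocity term as min(c * speed**e, max) using the scalar speed, and penalty_time and penalty_crash are treated as positive magnitudes that are subtracted. RewardParams and total_reward are hypothetical names; only the field names follow the constructor parameters.

```python
from dataclasses import dataclass


@dataclass
class RewardParams:
    """Hypothetical container mirroring the TargetRegime reward parameters."""
    tolerance_distance: float
    reward_distance_coefficient: float
    reward_distance_exp: float
    reward_distance_max: float
    reward_goal: float
    reward_velocity_coefficient: float
    reward_velocity_exp: float
    reward_velocity_max: float
    penalty_time: float
    penalty_crash: float


def total_reward(p: RewardParams, distance: float, speed: float, hit_ground: bool) -> float:
    # Distance term (assumed form): inverse power of the distance, capped.
    if distance <= 0.0:
        r_distance = p.reward_distance_max  # avoid division by zero
    else:
        r_distance = min(
            p.reward_distance_coefficient / distance ** p.reward_distance_exp,
            p.reward_distance_max,
        )

    # Velocity term (assumed form): power of the speed, capped.
    r_velocity = min(
        p.reward_velocity_coefficient * speed ** p.reward_velocity_exp,
        p.reward_velocity_max,
    )

    # Goal bonus applies once the drone is within tolerance_distance.
    r_goal = p.reward_goal if distance <= p.tolerance_distance else 0.0

    # Penalties are assumed to be positive magnitudes, hence the negation.
    r_time = -p.penalty_time
    r_crash = -p.penalty_crash if hit_ground else 0.0

    return r_distance + r_velocity + r_goal + r_time + r_crash


params = RewardParams(
    tolerance_distance=0.1,
    reward_distance_coefficient=1.0,
    reward_distance_exp=1.0,
    reward_distance_max=10.0,
    reward_goal=100.0,
    reward_velocity_coefficient=0.5,
    reward_velocity_exp=2.0,
    reward_velocity_max=2.0,
    penalty_time=0.01,
    penalty_crash=50.0,
)
far = total_reward(params, distance=2.0, speed=1.0, hit_ground=False)
near = total_reward(params, distance=0.05, speed=0.0, hit_ground=False)
crashed = total_reward(params, distance=2.0, speed=1.0, hit_ground=True)
```

With these illustrative values, the cap on the distance term is what keeps the reward bounded when the drone is very close to the target, while the goal bonus dominates once the tolerance is met.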