src.Environments.single_agent package

Submodules

src.Environments.single_agent.BaseEnvironment module

class src.Environments.single_agent.BaseEnvironment.BaseRegime(**kwargs)[source]

Bases: MujocoEnv, EzPickle, ABC

Base class for creating training regimes in a Mujoco simulation environment. It sets up the environment, including the drone and target, and defines the necessary properties and methods that should be implemented by subclasses.

This class should not be instantiated directly but extended by subclasses to define specific training regimes.

Parameters:

kwargs – Keyword arguments for environment configuration.
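The abstract properties listed below (done, truncated, reward, metrics) define what a subclass has to supply. The following is a minimal sketch of that contract, not the project's code: RegimeInterface is a hypothetical stand-in that mirrors only the abstract property names and does not inherit from MujocoEnv, and HoverRegime with its constructor arguments is invented purely for illustration.

```python
from abc import ABC, abstractmethod
from typing import Any


class RegimeInterface(ABC):
    """Hypothetical stand-in mirroring BaseRegime's abstract properties."""

    @property
    @abstractmethod
    def done(self) -> bool: ...

    @property
    @abstractmethod
    def truncated(self) -> bool: ...

    @property
    @abstractmethod
    def reward(self) -> float: ...

    @property
    @abstractmethod
    def metrics(self) -> dict[str, Any]: ...


class HoverRegime(RegimeInterface):
    """Illustrative regime: penalise deviation from a target altitude."""

    def __init__(self, altitude: float, target_altitude: float,
                 sim_time: float, max_time: float) -> None:
        self.altitude = altitude
        self.target_altitude = target_altitude
        self.sim_time = sim_time
        self.max_time = max_time

    @property
    def done(self) -> bool:
        # Assumed terminal condition: the drone is at or below ground level.
        return self.altitude <= 0.0

    @property
    def truncated(self) -> bool:
        return self.sim_time > self.max_time

    @property
    def reward(self) -> float:
        return -abs(self.altitude - self.target_altitude)

    @property
    def metrics(self) -> dict[str, Any]:
        return {"altitude": self.altitude, "time": self.sim_time}
```

Instantiating RegimeInterface directly raises TypeError, which matches the note above that the base class is meant to be extended rather than instantiated.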

abstract property done: bool

Determine whether the episode is done. Subclasses must implement this property.

Returns:

True if the episode is finished, False otherwise.

property drone_hit_ground: bool

Check if the drone has hit the ground.

Returns:

True if the drone has made contact with the ground, False otherwise.

property drone_target_vector: ndarray

Compute and return the vector from the drone to the target.

Returns:

A numpy array representing the vector from the drone to the target.
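As a rough illustration of the quantity described above, the vector from the drone to the target is the target position minus the drone position. The sketch below uses plain tuples and the standard library so that it runs without dependencies; the function names are hypothetical, and with numpy arrays the same result would be `target_position - drone_position`, with `numpy.linalg.norm` giving the distance.

```python
import math


def drone_target_vector(drone_position, target_position):
    """Vector pointing from the drone to the target (target minus drone)."""
    return tuple(t - d for d, t in zip(drone_position, target_position))


def vector_length(vector):
    """Euclidean length of a vector, i.e. the drone-to-target distance."""
    return math.sqrt(sum(c * c for c in vector))


vector = drone_target_vector((0.0, 0.0, 1.0), (3.0, 4.0, 1.0))
distance = vector_length(vector)
```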

abstract property metrics: dict[str, Any]

Return additional data or metrics for logging purposes. Subclasses must implement this property.

Returns:

A dictionary containing metrics or additional information.

property observation: ObsType

Get the current observation of the environment.

Returns:

The current environment observation.

pre_simulation() → None[source]

Perform any necessary actions before each simulation step. This method should be overridden by subclasses.

reset_model() → ObsType[source]

Reset the environment to an initial state and return the initial observation.

Returns:

The initial observation after resetting the environment.

abstract property reward: SupportsFloat

Calculate and return the current reward. Subclasses must implement this property.

Returns:

The current reward.

step(action: ActType) → Tuple[ObsType, SupportsFloat, bool, bool, dict[str, Any]][source]

Execute one time step within the environment.

Parameters:

action – The action to be executed.

Returns:

A tuple containing the new observation, reward, done flag, truncated flag, and info dictionary.
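The five-element return value is typically consumed in a loop that stops once either flag is set. The sketch below shows that pattern against a hypothetical stub, CountdownEnv, which only imitates the shape of the tuple; it is not a Mujoco environment, and its observation, reward, and step limit are placeholders.

```python
from typing import Any, Callable


class CountdownEnv:
    """Hypothetical stub whose step() returns the same 5-tuple shape."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def step(self, action: float) -> tuple[float, float, bool, bool, dict[str, Any]]:
        self.steps += 1
        observation = float(self.steps)
        reward = -1.0                       # placeholder per-step reward
        done = False                        # the stub never reaches a goal
        truncated = self.steps >= self.max_steps
        return observation, reward, done, truncated, {"steps": self.steps}


def rollout(env: CountdownEnv, policy: Callable[[float], float]):
    """Step until the episode is done or truncated; return the summed reward."""
    total_reward = 0.0
    observation = 0.0
    info: dict[str, Any] = {}
    done = truncated = False
    while not (done or truncated):
        observation, reward, done, truncated, info = env.step(policy(observation))
        total_reward += reward
    return total_reward, info
```

A call such as `rollout(CountdownEnv(3), lambda obs: 0.0)` runs three steps and then stops because the truncated flag is set.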

abstract property truncated: bool

Determine whether the episode is truncated. Subclasses must implement this property.

Returns:

True if the episode is truncated, False otherwise.

class src.Environments.single_agent.BaseEnvironment.Drone(data: mujoco.MjData, spawn_box: numpy.ndarray, spawn_max_velocity: float, rng: numpy.random.Generator = Generator(PCG64))[source]

Bases: object

Represents a drone in the simulation environment. It manages the drone’s state, including its position, velocity, and IMU sensor readings.

Parameters:
  • data – The MjData instance containing the simulation state.

  • spawn_box – A numpy array defining the boundaries for the drone’s initial position.

  • spawn_max_velocity – The maximum initial velocity of the drone.

  • rng – A numpy.random.Generator instance (optional; defaults to a PCG64-backed Generator).

property imu_accel: ndarray

Gets the drone’s accelerometer readings from the IMU.

Returns:

A numpy array representing the drone’s current accelerometer readings.

property imu_gyro: ndarray

Gets the drone’s gyroscope readings from the IMU.

Returns:

A numpy array representing the drone’s current gyroscope readings.

property imu_orientation: ndarray

Gets the drone’s orientation readings from the IMU.

Returns:

A numpy array representing the drone’s current orientation.

property position: ndarray

Gets the drone’s position.

Returns:

A numpy array representing the drone’s current position.

reset()[source]

Resets the drone’s position and velocity to initial values within the defined spawn box and velocity limits.
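One way such a reset could draw its values is sketched below. This is an assumption-laden illustration, not the documented implementation: it assumes spawn_box is laid out as a (lower corner, upper corner) pair, that positions are sampled uniformly, and that spawn_max_velocity bounds the speed (the magnitude of the velocity). It also uses the standard-library random.Random in place of a numpy Generator so the example runs without dependencies.

```python
import math
import random


def sample_spawn_state(spawn_box, spawn_max_velocity, rng):
    """Sample a position inside the box and a velocity with bounded speed."""
    lower, upper = spawn_box  # assumed layout: (lower corner, upper corner)
    position = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]

    # Random direction (normalised Gaussian sample) scaled to a random speed.
    direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in direction)) or 1.0
    speed = rng.uniform(0.0, spawn_max_velocity)
    velocity = [speed * c / norm for c in direction]
    return position, velocity


rng = random.Random(0)  # seeded so repeated runs give the same spawn state
box = ((-1.0, -1.0, 0.5), (1.0, 1.0, 2.0))
position, velocity = sample_spawn_state(box, 0.3, rng)
```

Passing the generator in explicitly, as the rng parameter does, keeps spawn states reproducible for a given seed.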

property velocity: ndarray

Gets the drone’s velocity.

Returns:

A numpy array representing the drone’s current velocity.

class src.Environments.single_agent.BaseEnvironment.Target(data: mujoco.MjData, spawn_box: numpy.ndarray, spawn_max_velocity: float, spawn_max_angular_velocity: float, rng: numpy.random.Generator = Generator(PCG64))[source]

Bases: object

Represents a target in the simulation environment. It manages the target’s state, including its position, velocity, and orientation.

Parameters:
  • data – The MjData instance containing the simulation state.

  • spawn_box – A numpy array defining the boundaries for the target’s initial position.

  • spawn_max_velocity – The maximum initial velocity of the target.

  • spawn_max_angular_velocity – The maximum initial angular velocity of the target.

  • rng – A numpy.random.Generator instance (optional; defaults to a PCG64-backed Generator).

property orientation: ndarray

Gets the target’s orientation.

Returns:

A numpy array representing the target’s current orientation.

property position: ndarray

Gets the target’s position.

Returns:

A numpy array representing the target’s current position.

reset()[source]

Resets the target’s position, velocity, and orientation to initial values within the defined spawn box and velocity limits.

property velocity: ndarray

Gets the target’s velocity.

Returns:

A numpy array representing the target’s current velocity.

src.Environments.single_agent.ShootingRegime module

src.Environments.single_agent.TargetRegime module

class src.Environments.single_agent.TargetRegime.TargetRegime(tolerance_distance: float, max_time: float, reward_distance_coefficient: float, reward_distance_exp: float, reward_distance_max: float, reward_goal: float, reward_velocity_coefficient: float, reward_velocity_exp: float, reward_velocity_max: float, penalty_time: float, penalty_crash: float, **kwargs)[source]

Bases: BaseRegime

A regime that defines the behavior and objectives for a drone in a target-reaching scenario within a simulation environment. It extends BaseRegime with specific reward and penalty mechanisms related to reaching a target, maintaining velocity, and avoiding crashes.

Parameters:
  • tolerance_distance – The distance within which the drone is considered to have reached the target. Type: float

  • max_time – The maximum duration for which the simulation or episode is allowed to run. If exceeded, the episode is considered truncated. Type: float

  • reward_distance_coefficient – Coefficient for the reward based on the inverse of the distance to the target. The reward increases as the drone gets closer to the target. Type: float

  • reward_distance_exp – The exponent applied to the inverse distance reward calculation. Type: float

  • reward_distance_max – The maximum reward granted for distance to the target. Prevents the reward from becoming excessively large as the distance approaches zero. Type: float

  • reward_goal – The reward given when the drone reaches the target. Type: float

  • reward_velocity_coefficient – Coefficient for the reward based on the drone’s velocity. Type: float

  • reward_velocity_exp – The exponent applied to the velocity in the velocity-based reward calculation. Type: float

  • reward_velocity_max – The maximum reward granted for the drone’s velocity. Type: float

  • penalty_time – The penalty applied at each timestep, encouraging the drone to reach the target faster. Type: float

  • penalty_crash – The penalty applied if the drone crashes, i.e., hits the ground. Type: float

  • kwargs – Additional keyword arguments passed to the base class (BaseRegime).

property done: bool

Determine whether the episode has concluded. An episode is considered done if the drone crashes, reaches the target, or the simulation time exceeds max_time.

Returns:

True if the episode is done; otherwise, False.

property goal_reached: bool

Determine whether the drone has reached the target by checking if its distance to the target is less than or equal to tolerance_distance.

Returns:

True if the drone has reached the target; otherwise, False.

property metrics: dict[str, Any]

Collect and return various metrics related to the drone’s performance and the environment state.

Returns:

A dictionary containing metrics such as distance to the target, goal-reaching status, drone’s velocity, whether the drone hit the ground, and the simulation time.

property reward: float

Aggregate the total reward for the current timestep, combining distance, time, crash, goal-reaching, and velocity-based rewards.

Returns:

The total reward for the current timestep.
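The aggregation can be pictured as a plain sum of the five component terms. The sketch below assumes exactly that (an unweighted sum) and assumes penalties arrive as negative numbers; RewardTerms is a hypothetical container, not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardTerms:
    """Hypothetical container for the per-timestep reward components."""

    distance: float
    time: float      # assumed to be negative (a penalty)
    crash: float     # assumed to be negative or zero
    goal: float
    velocity: float

    def total(self) -> float:
        # Assumption: the total reward is the unweighted sum of all terms.
        return self.distance + self.time + self.crash + self.goal + self.velocity


terms = RewardTerms(distance=0.25, time=-0.125, crash=0.0, goal=0.0, velocity=0.5)
total_reward = terms.total()
```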

property reward_crash: float

Calculate the crash penalty, applied if the drone hits the ground.

Returns:

The crash penalty if the drone hits the ground; otherwise, zero.

property reward_distance: float

Calculate the distance-based reward, which is inversely proportional to the distance between the drone and the target. The reward is capped by reward_distance_max to prevent it from becoming infinitely large as the distance approaches zero.

Returns:

The distance-based reward, scaled and exponentiated based on the distance to the target, with a maximum limit.
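A plausible reading of this description is reward = min(coefficient * (1 / distance) ** exponent, maximum). The sketch below implements that reading; the exact formula and the handling of zero distance are assumptions, and the parameter names are shortened versions of reward_distance_coefficient, reward_distance_exp, and reward_distance_max.

```python
def reward_distance(distance: float, coefficient: float,
                    exponent: float, maximum: float) -> float:
    """Capped inverse-distance reward (assumed form, see the note above)."""
    if distance <= 0.0:
        # Avoid dividing by zero: at zero distance the cap applies.
        return maximum
    return min(coefficient * (1.0 / distance) ** exponent, maximum)


far = reward_distance(2.0, coefficient=1.0, exponent=2.0, maximum=10.0)
near = reward_distance(0.01, coefficient=1.0, exponent=2.0, maximum=10.0)
```

Here the far case evaluates the inverse-distance term directly, while the near case exceeds the limit and is clipped to the maximum.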

property reward_goal: float

Calculate the reward for reaching the goal. This reward is granted when the drone’s distance to the target is less than or equal to tolerance_distance.

Returns:

The reward for reaching the goal if the goal is reached; otherwise, zero.
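Both reward_crash and reward_goal follow the same pattern: a fixed value when an event occurs and zero otherwise. The helper below is a hypothetical sketch of that pattern; whether the crash penalty is stored as a negative number or negated when applied is not stated here, so the example simply passes a negative value.

```python
def event_reward(triggered: bool, value: float) -> float:
    """Return value if the event occurred, otherwise zero."""
    return value if triggered else 0.0


def goal_reached(distance: float, tolerance_distance: float) -> bool:
    """The goal counts as reached at or within the tolerance distance."""
    return distance <= tolerance_distance


goal_term = event_reward(goal_reached(0.05, 0.1), 100.0)
crash_term = event_reward(False, -50.0)  # illustrative values only
```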

property reward_time: float

Calculate the time-based penalty at each timestep to encourage the drone to reach its target faster.

Returns:

The time-based penalty, which is a constant negative value defined by penalty_time.

property reward_velocity: float

Calculate the velocity-based reward, which grows with the drone’s current velocity and is capped by reward_velocity_max.

Returns:

The velocity-based reward, scaled and exponentiated based on the drone’s current velocity, with a maximum limit.
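By analogy with the distance term, one plausible form is reward = min(coefficient * speed ** exponent, maximum), where speed is the magnitude of the velocity vector. That form, and the use of speed rather than a velocity component, are assumptions made for this sketch.

```python
import math


def reward_velocity(velocity, coefficient: float,
                    exponent: float, maximum: float) -> float:
    """Capped reward that grows with speed (assumed form, see the note above)."""
    speed = math.sqrt(sum(c * c for c in velocity))
    return min(coefficient * speed ** exponent, maximum)


slow = reward_velocity((0.0, 2.0, 0.0), coefficient=0.5, exponent=2.0, maximum=5.0)
fast = reward_velocity((6.0, 8.0, 0.0), coefficient=0.5, exponent=2.0, maximum=5.0)
```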

property truncated: bool

Check if the episode is truncated due to exceeding the maximum allowed simulation time (max_time).

Returns:

True if the simulation time exceeds max_time; otherwise, False.
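Taken together, the termination properties of TargetRegime can be sketched as simple predicates over the drone state. The functions below are hypothetical free-standing versions written from the descriptions in this section; in particular, is_done follows the statement that an episode is done on a crash, on reaching the target, or when the simulation time exceeds max_time.

```python
def is_goal_reached(distance: float, tolerance_distance: float) -> bool:
    return distance <= tolerance_distance


def is_truncated(sim_time: float, max_time: float) -> bool:
    # Strictly greater than, matching "exceeds max_time".
    return sim_time > max_time


def is_done(hit_ground: bool, distance: float, tolerance_distance: float,
            sim_time: float, max_time: float) -> bool:
    return (
        hit_ground
        or is_goal_reached(distance, tolerance_distance)
        or is_truncated(sim_time, max_time)
    )


# In flight, far from the target, with time remaining: the episode continues.
still_running = not is_done(False, 3.0, 0.1, sim_time=2.0, max_time=10.0)
```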

Module contents