Version: 🚧 Alpha 🚧

9. Sensors, Movement, and Rewards

By Killian Trouillet


Content

In Step 8 we defined the GymAgent bridge with stub actions. Now we implement the three forager actions that make the bridge actually work: movement, observation, and reward computation.

The Forager Species

The forager agent has three responsibilities:

  1. Apply the action received from Python (movement)
  2. Compute its observation (what it "sees")
  3. Compute the reward (how well it did)

species forager {
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;

    GymAgent gym_agent <- GymAgent[0];

    init {
        previous_distance <- location distance_to food_location;
        do compute_observation();
    }
}

Continuous Movement

Action Format

Python sends a 2D acceleration vector [ax, ay], where each component is in [-1, 1]. The forager has a velocity that is damped by friction, which produces smooth, natural trajectories instead of instant direction changes.

// Velocity variables on the forager species
float velocity_x <- 0.0;
float velocity_y <- 0.0;
float friction <- 0.8;      // damping per step
float accel_force <- 1.5;   // how much the action pushes
float max_speed <- 3.0;     // speed cap
bool bumped <- false;       // tracking collision for penalty

action apply_action() {
    if (empty(gym_agent.next_action)) { return; }
    list act <- list(gym_agent.next_action);
    if (length(act) < 2) { return; }
    float ax <- float(act[0]); // acceleration [-1, 1]
    float ay <- float(act[1]);

    // Update velocity: friction + acceleration
    velocity_x <- velocity_x * friction + ax * accel_force;
    velocity_y <- velocity_y * friction + ay * accel_force;

    // Cap speed
    float speed <- sqrt(velocity_x^2 + velocity_y^2);
    if (speed > max_speed) {
        velocity_x <- velocity_x * max_speed / speed;
        velocity_y <- velocity_y * max_speed / speed;
    }

    float move_x <- velocity_x;
    float move_y <- velocity_y;
    // ... collision + sliding (see complete model) ...
    current_step <- current_step + 1;
}

Why friction? Without it, velocity would never decay: the forager would keep drifting at whatever speed it had reached (up to max_speed) until it actively pushed the other way. friction = 0.8 means that 20% of the velocity is lost at each step. To maintain speed, the agent must keep "pushing". To stop, it outputs [0, 0] and the velocity decays on its own.
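The interplay between friction and the speed cap can be checked outside GAMA with a few lines of Python. This is a minimal sketch that mirrors the update rule of apply_action with the constants listed above; the function name step_velocity is hypothetical and not part of the model.

```python
import math

FRICTION = 0.8      # fraction of velocity kept each step
ACCEL_FORCE = 1.5   # scale applied to the action
MAX_SPEED = 3.0     # speed cap


def step_velocity(vx, vy, ax, ay):
    """One velocity update: damping, then acceleration, then speed cap."""
    vx = vx * FRICTION + ax * ACCEL_FORCE
    vy = vy * FRICTION + ay * ACCEL_FORCE
    speed = math.hypot(vx, vy)
    if speed > MAX_SPEED:
        vx, vy = vx * MAX_SPEED / speed, vy * MAX_SPEED / speed
    return vx, vy


# Push right at full strength for 5 steps, then release the controls.
vx, vy = 0.0, 0.0
for _ in range(5):
    vx, vy = step_velocity(vx, vy, 1.0, 0.0)
pushed_speed = math.hypot(vx, vy)

for _ in range(10):
    vx, vy = step_velocity(vx, vy, 0.0, 0.0)
coasting_speed = math.hypot(vx, vy)

print(pushed_speed, coasting_speed)
```

With these constants, a constant full push on its own would settle at accel_force / (1 - friction) = 7.5 units per step, so in practice it is the cap that limits the top speed to 3.0. Once the push stops, the speed shrinks by 20% per step.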

Compared to Part 1

| Part 1 | Part 2 |
| --- | --- |
| switch direction { match 0 { new_y <- new_y - 1; } ... } | location <- location + {dx * 3, dy * 3} |
| 4 possible actions | Infinite actions (any direction + speed) |
| Always moves exactly 1 cell | Moves 0 to 3 units per step |

The Observation Vector (13 values)

The forager builds a rich observation using its position, food direction, and obstacle sensors.

Position & Food (5 values)

float dist <- location distance_to food_location;
float angle <- location towards food_location;

gym_agent.state <- [
    location.x / world_size,                  // [0] Normalized X
    location.y / world_size,                  // [1] Normalized Y
    min([1.0, dist / (world_size * 1.414)]),  // [2] Normalized distance to food
    (cos(angle) + 1.0) / 2.0,                 // [3] Food direction X (normalized)
    (sin(angle) + 1.0) / 2.0                  // [4] Food direction Y (normalized)
];

Why normalize? Neural networks learn much better when all inputs are in the same range [0, 1]. Raw values like x=85.0 and dist=120.0 would create imbalance.
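To see the normalization at work, the five values can be reproduced in Python for the start position {5, 5} and the food at {95, 95} used in the complete model. This is an illustrative sketch: the function name is hypothetical, and it works in radians with math.atan2, whereas the GAML code relies on towards, cos and sin operating in degrees.

```python
import math

WORLD_SIZE = 100.0


def position_and_food_features(x, y, food_x, food_y):
    """Return the first five observation values, each in [0, 1]."""
    dist = math.hypot(food_x - x, food_y - y)
    angle = math.atan2(food_y - y, food_x - x)  # radians (GAML uses degrees)
    max_dist = WORLD_SIZE * 1.414               # approximate world diagonal, as in the model
    return [
        x / WORLD_SIZE,
        y / WORLD_SIZE,
        min(1.0, dist / max_dist),
        (math.cos(angle) + 1.0) / 2.0,
        (math.sin(angle) + 1.0) / 2.0,
    ]


features = position_and_food_features(5.0, 5.0, 95.0, 95.0)
print([round(v, 3) for v in features])  # → [0.05, 0.05, 0.9, 0.854, 0.854]
```

The raw distance of about 127 units becomes 0.9, and the direction components land at 0.854, so every entry sits on the same scale as the position values.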

Ray-Cast Sensors (8 values)

The forager casts 8 rays around itself (every 45°) to detect obstacles:

              270° (up)
        225°     |     315°
             \   |   /
180° (left) ─────●───── 0° (right)
             /   |   \
        135°     |     45°
              90° (down)

(GAMA's y-axis points down, so 90° is below the agent.)

list<float> sensors;
loop i from: 0 to: 7 {
    float ray_angle <- float(i) * 45.0;
    point ray_end <- {
        location.x + sensor_range * cos(ray_angle),
        location.y + sensor_range * sin(ray_angle)
    };
    geometry ray <- line([location, ray_end]);

    float closest <- 1.0; // 1.0 = nothing detected
    loop obs over: obstacles {
        if (ray intersects obs) {
            geometry inter <- ray inter obs;
            if (inter != nil) {
                float d <- (location distance_to inter.location) / sensor_range;
                closest <- min([closest, d]);
            }
        }
    }
    add closest to: sensors;
}

Each sensor returns a value between 0 and 1:

  • 0.0 = obstacle is touching the agent
  • 0.5 = obstacle is at half sensor range
  • 1.0 = no obstacle detected in this direction
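The meaning of these readings can be illustrated with a small, self-contained Python sketch. It is not a transcription of the GAML code: obstacles are assumed to be axis-aligned boxes given as (min_x, min_y, max_x, max_y), the distance is measured to the first point where the ray enters the box, and the function names are hypothetical.

```python
import math

SENSOR_RANGE = 30.0


def ray_hit_fraction(origin, angle_deg, box, sensor_range=SENSOR_RANGE):
    """Fraction of sensor_range travelled before the ray enters `box`.

    `box` is (min_x, min_y, max_x, max_y). Returns 1.0 when nothing is hit.
    """
    ox, oy = origin
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    t_enter, t_exit = 0.0, sensor_range
    for o, d, lo, hi in ((ox, dx, box[0], box[2]), (oy, dy, box[1], box[3])):
        if abs(d) < 1e-12:
            # Ray runs parallel to this axis: a hit is only possible
            # if the origin already lies between the two box sides.
            if o < lo or o > hi:
                return 1.0
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_enter = max(t_enter, min(t1, t2))
        t_exit = min(t_exit, max(t1, t2))
    if t_enter > t_exit:
        return 1.0
    return t_enter / sensor_range


def read_sensors(origin, boxes):
    """Eight readings, one every 45 degrees, keeping the closest hit per ray."""
    return [
        min([1.0] + [ray_hit_fraction(origin, i * 45.0, b) for b in boxes])
        for i in range(8)
    ]


# One 10x10 obstacle centred at (25, 25); the agent stands 15 units left of its edge.
obstacle = (20.0, 20.0, 30.0, 30.0)
readings = read_sensors((5.0, 25.0), [obstacle])
print([round(r, 2) for r in readings])
```

Only the 0° ray points at the obstacle, and it meets the edge after 15 of its 30 units, so the first reading is 0.5 while the other seven stay at 1.0.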

Complete Observation Table

| Index | Value | Range | Description |
| --- | --- | --- | --- |
| 0 | x / world_size | [0, 1] | Agent's X position |
| 1 | y / world_size | [0, 1] | Agent's Y position |
| 2 | dist / max_dist | [0, 1] | Distance to food |
| 3 | (cos(angle)+1)/2 | [0, 1] | Food direction (X) |
| 4 | (sin(angle)+1)/2 | [0, 1] | Food direction (Y) |
| 5 | sensor 0° | [0, 1] | Obstacle right |
| 6 | sensor 45° | [0, 1] | Obstacle down-right |
| 7 | sensor 90° | [0, 1] | Obstacle down |
| 8 | sensor 135° | [0, 1] | Obstacle down-left |
| 9 | sensor 180° | [0, 1] | Obstacle left |
| 10 | sensor 225° | [0, 1] | Obstacle up-left |
| 11 | sensor 270° | [0, 1] | Obstacle up |
| 12 | sensor 315° | [0, 1] | Obstacle up-right |
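When debugging on the Python side, it can help to attach names to the 13 entries rather than remembering indices. The sketch below is one possible helper; the field names and the function label_observation are made up for illustration and are not defined by the model or by gama-gymnasium.

```python
# Hypothetical names, listed in the same order as the observation table.
OBSERVATION_FIELDS = [
    "x", "y", "food_distance", "food_dir_x", "food_dir_y",
] + [f"sensor_{angle}" for angle in range(0, 360, 45)]


def label_observation(obs):
    """Pair each of the 13 observation values with its name."""
    if len(obs) != len(OBSERVATION_FIELDS):
        raise ValueError(f"expected {len(OBSERVATION_FIELDS)} values, got {len(obs)}")
    return dict(zip(OBSERVATION_FIELDS, obs))


# Example vector: start position, food far away, no obstacle in sensor range.
example = [0.05, 0.05, 0.9, 0.854, 0.854] + [1.0] * 8
labeled = label_observation(example)
print(labeled["food_distance"], labeled["sensor_90"])
```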

Reward Shaping

A well-designed reward function is critical for learning. We use reward shaping to guide the agent.

action compute_reward() {
    float dist <- location distance_to food_location;

    // Goal reached!
    if (dist < food_radius) {
        gym_agent.reward <- 100.0;
        gym_agent.terminated <- true;
        gym_agent.truncated <- false;
        gym_agent.info <- map([]);
        ask gym_agent { do update_data(); }
        return;
    }

    // Timeout
    if (current_step >= max_steps) {
        gym_agent.reward <- -1.0;
        gym_agent.terminated <- false;
        gym_agent.truncated <- true;
        gym_agent.info <- map([]);
        ask gym_agent { do update_data(); }
        return;
    }

    // Shaping: encourage getting closer
    float delta <- previous_distance - dist;
    float step_reward <- delta * 2.0 - 0.01;
    if (bumped) { step_reward <- step_reward - 1.0; }
    gym_agent.reward <- step_reward;
    previous_distance <- dist;

    gym_agent.terminated <- false;
    gym_agent.truncated <- false;
    gym_agent.info <- map([]);
    ask gym_agent { do update_data(); }
}

Reward Breakdown

| Event | Reward | Why? |
| --- | --- | --- |
| Reach food | +100.0 | Large positive reward for completing the task |
| Get closer | +delta × 2.0 | Encourages approach. If the agent moves 1.5 units closer, the shaping term is +3.0 |
| Get farther | delta × 2.0 (with delta < 0) | Penalizes moving away from food |
| Each step | −0.01 | Small living penalty to encourage speed |
| Wall contact | −1.0 | Penalty if the agent bumps or rubs against a wall or obstacle |
| Timeout | −1.0 | Mild penalty for not finding food in 300 steps |
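The table can be turned into a small Python function to check individual cases by hand. This is a sketch that follows the order of checks in compute_reward (goal first, then timeout, then shaping); it is a standalone illustration with hypothetical argument names, not code that runs inside GAMA.

```python
FOOD_RADIUS = 5.0
MAX_STEPS = 300


def compute_reward(prev_dist, dist, step, bumped):
    """Return (reward, terminated, truncated) for one step."""
    if dist < FOOD_RADIUS:
        return 100.0, True, False      # goal reached
    if step >= MAX_STEPS:
        return -1.0, False, True       # timeout
    delta = prev_dist - dist           # positive when the agent got closer
    reward = delta * 2.0 - 0.01        # shaping term plus living penalty
    if bumped:
        reward -= 1.0                  # contact penalty
    return reward, False, False


closer = compute_reward(50.0, 48.5, step=10, bumped=False)
farther_with_bump = compute_reward(50.0, 51.0, step=10, bumped=True)
print(closer, farther_with_bump)
```

Moving 1.5 units closer yields 3.0 - 0.01 = 2.99 for the step, while moving 1 unit away and touching a wall yields -2.0 - 0.01 - 1.0 = -3.01.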

terminated vs truncated

This is a Gymnasium convention:

  • terminated = true: The episode ended because of the environment logic (food found).
  • truncated = true: The episode ended because of a time limit (max steps).

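Both flags end the episode and lead to env.reset() on the Python side, but they are not interchangeable. A terminated state has no future reward, whereas a truncated episode was merely cut short. The distinction matters for logging and for algorithms that bootstrap from the estimated value of the final state.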
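The difference can be made concrete with a one-step TD target, a common way for value-based and actor-critic methods to use the two flags. This is a generic sketch, not the code of any particular library; GAMMA, td_target and episode_over are illustrative names, and whether a given training library handles truncation this way depends on its implementation.

```python
GAMMA = 0.99  # discount factor (illustrative value)


def td_target(reward, next_value, terminated, truncated):
    """One-step TD target that treats the two end-of-episode flags differently.

    terminated: the next state is final, so its value counts as zero.
    truncated:  the episode was only cut off, so the estimated value of
                the next state is still used in the target.
    """
    bootstrap = 0.0 if terminated else next_value
    return reward + GAMMA * bootstrap


def episode_over(terminated, truncated):
    """Either flag means the caller should reset the environment."""
    return terminated or truncated


food_found = td_target(100.0, next_value=12.0, terminated=True, truncated=False)
timed_out = td_target(-1.0, next_value=12.0, terminated=False, truncated=True)
print(food_found, timed_out)
```

In the food-found case the target is just the reward of 100.0. In the timeout case the target still includes the discounted value estimate, because the forager could have kept going had the step limit not been reached.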


Forager Aspect

aspect default {
    draw circle(0.8) color: #blue;
    draw circle(sensor_range) color: rgb(0, 0, 255, 30) border: rgb(0, 0, 255, 80);
}

The forager is drawn as a blue circle with a semi-transparent sensor range indicator.


Complete Model

/**
 * Name: SmartForagerGym - Continuous Forager for Gymnasium
 * Author: Killian Trouillet
 * Description: A continuous-world version of the Smart Forager that exposes
 * itself as a Gymnasium environment via the gama-gymnasium bridge.
 * Tags: reinforcement-learning, gymnasium, ppo, continuous, tutorial
 */

model SmartForagerGym

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0}; // Native cell (9,9)
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;

    init {
        // Same 6 obstacle cells as native, each = 10×10 square
        obstacles << square(10) at_location {25.0, 25.0}; // cell (2,2)
        obstacles << square(10) at_location {35.0, 25.0}; // cell (3,2)
        obstacles << square(10) at_location {25.0, 35.0}; // cell (2,3)
        obstacles << square(10) at_location {65.0, 45.0}; // cell (6,4)
        obstacles << square(10) at_location {75.0, 45.0}; // cell (7,4)
        obstacles << square(10) at_location {75.0, 55.0}; // cell (7,5)

        create GymAgent;
        GymAgent[0].action_space <- [
            "type"::"Box",
            "low"::[-1.0, -1.0],
            "high"::[1.0, 1.0],
            "dtype"::"float"
        ];
        GymAgent[0].observation_space <- [
            "type"::"Box",
            "low"::list_with(13, 0.0),
            "high"::list_with(13, 1.0),
            "dtype"::"float"
        ];

        create forager number: 1 {
            location <- {5.0, 5.0}; // Native cell (0,0)
        }
    }

    reflex main_loop {
        if (empty(GymAgent[0].next_action)) { return; }
        ask forager[0] {
            do apply_action();
            do compute_observation();
            do compute_reward();
        }
    }
}

species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    list<float> state <- list_with(13, 0.0);
    float reward <- 0.0;
    bool terminated <- false;
    bool truncated <- false;
    map<string, unknown> info <- [];
    list<float> next_action;
    map<string, unknown> data <- [];

    // Called by gama-gymnasium with a JSON string like '[0.32, -0.08]'
    // Generic: works for any action type and any number of dimensions
    action set_action(string json_str) {
        next_action <- list(from_json(json_str));
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}

species forager {
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;
    float velocity_x <- 0.0;
    float velocity_y <- 0.0;
    float friction <- 0.8;
    float accel_force <- 1.5;
    float max_speed <- 3.0;
    bool bumped <- false;
    GymAgent gym_agent <- GymAgent[0];

    init {
        previous_distance <- location distance_to food_location;
        do compute_observation();
        ask gym_agent { do update_data(); }
    }

    action apply_action() {
        bumped <- false;
        if (empty(gym_agent.next_action)) { return; }
        list act <- list(gym_agent.next_action);
        if (length(act) < 2) { return; }
        float ax <- float(act[0]);
        float ay <- float(act[1]);

        // Update velocity: friction + acceleration, then cap the speed
        velocity_x <- velocity_x * friction + ax * accel_force;
        velocity_y <- velocity_y * friction + ay * accel_force;
        float speed <- sqrt(velocity_x^2 + velocity_y^2);
        if (speed > max_speed) {
            velocity_x <- velocity_x * max_speed / speed;
            velocity_y <- velocity_y * max_speed / speed;
        }
        float move_x <- velocity_x;
        float move_y <- velocity_y;
        float new_x <- location.x + move_x;
        float new_y <- location.y + move_y;

        // Keep the agent inside the world, 1 unit away from each border
        if (new_x < 1.0) { new_x <- 1.0; velocity_x <- 0.0; bumped <- true; }
        if (new_x > world_size - 1.0) { new_x <- world_size - 1.0; velocity_x <- 0.0; bumped <- true; }
        if (new_y < 1.0) { new_y <- 1.0; velocity_y <- 0.0; bumped <- true; }
        if (new_y > world_size - 1.0) { new_y <- world_size - 1.0; velocity_y <- 0.0; bumped <- true; }

        // Check whether the target position falls inside an obstacle
        point new_loc <- {new_x, new_y};
        bool blocked <- false;
        loop obs over: obstacles {
            if (new_loc intersects obs) { blocked <- true; bumped <- true; break; }
        }
        if (!blocked) {
            location <- new_loc;
        } else {
            // Sliding: try moving along one axis only
            point slide_x <- {new_x, location.y};
            point slide_y <- {location.x, new_y};
            bool x_ok <- true;
            loop obs over: obstacles { if (slide_x intersects obs) { x_ok <- false; break; } }
            bool y_ok <- true;
            loop obs over: obstacles { if (slide_y intersects obs) { y_ok <- false; break; } }
            if (x_ok and y_ok) {
                location <- (abs(move_x) >= abs(move_y)) ? slide_x : slide_y;
            } else if (x_ok) {
                location <- slide_x;
                velocity_y <- 0.0;
            } else if (y_ok) {
                location <- slide_y;
                velocity_x <- 0.0;
            } else {
                velocity_x <- 0.0;
                velocity_y <- 0.0;
            }
        }
        current_step <- current_step + 1;
    }

    action compute_observation() {
        float dist <- location distance_to food_location;
        float angle <- 0.0;
        if (dist > 0.001) {
            angle <- location towards food_location;
        }
        list<float> sensors;
        loop i from: 0 to: 7 {
            float ray_angle <- float(i) * 45.0;
            point ray_end <- {
                location.x + sensor_range * cos(ray_angle),
                location.y + sensor_range * sin(ray_angle)
            };
            geometry ray <- line([location, ray_end]);
            float closest <- 1.0;

            // The world border counts as an obstacle too
            geometry border <- polyline([{0,0}, {world_size,0}, {world_size,world_size}, {0,world_size}, {0,0}]);
            if (ray intersects border) {
                geometry inter <- ray inter border;
                if (inter != nil) {
                    float d <- (location distance_to inter.location) / sensor_range;
                    closest <- min([closest, d]);
                }
            }
            loop obs over: obstacles {
                if (ray intersects obs) {
                    geometry inter <- ray inter obs;
                    if (inter != nil) {
                        float d <- (location distance_to inter.location) / sensor_range;
                        closest <- min([closest, d]);
                    }
                }
            }
            add closest to: sensors;
        }
        gym_agent.state <- [
            location.x / world_size,
            location.y / world_size,
            min([1.0, dist / (world_size * 1.414)]),
            (cos(angle) + 1.0) / 2.0,
            (sin(angle) + 1.0) / 2.0
        ] + sensors;
    }

    action compute_reward() {
        float dist <- location distance_to food_location;

        // Goal reached
        if (dist < food_radius) {
            gym_agent.reward <- 100.0;
            gym_agent.terminated <- true;
            gym_agent.truncated <- false;
            gym_agent.info <- map([]);
            ask gym_agent { do update_data(); }
            return;
        }

        // Timeout
        if (current_step >= max_steps) {
            gym_agent.reward <- -1.0;
            gym_agent.terminated <- false;
            gym_agent.truncated <- true;
            gym_agent.info <- map([]);
            ask gym_agent { do update_data(); }
            return;
        }

        // Shaping: encourage getting closer
        float delta <- previous_distance - dist;
        float step_reward <- delta * 2.0 - 0.01;
        if (bumped) { step_reward <- step_reward - 1.0; }
        gym_agent.reward <- step_reward;
        previous_distance <- dist;
        gym_agent.terminated <- false;
        gym_agent.truncated <- false;
        gym_agent.info <- map([]);
        ask gym_agent { do update_data(); }
    }

    aspect default {
        draw circle(0.8) color: #blue;
        draw circle(sensor_range) color: rgb(0, 0, 255, 30) border: rgb(0, 0, 255, 80);
    }
}

experiment gym_env {
    parameter "communication_port" var: gama_server_port;
    parameter "Random Seed" var: seed <- 0.0;
    output {
        display "Continuous World" type: 2d {
            graphics "background" {
                draw rectangle(world_size, world_size) at: {world_size/2, world_size/2} color: rgb(240, 240, 240);
            }
            graphics "obstacles" {
                loop obs over: obstacles {
                    draw obs color: rgb(80, 80, 80);
                }
            }
            graphics "food" {
                draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
            }
            species forager;
        }
    }
}