Version: 🚧 Alpha 🚧

9. Sensors, Movement & Rewards

By Killian Trouillet



In Step 8 we defined the GymAgent bridge with stub actions. Now we implement the three forager actions that make the bridge actually work: movement, observation, and reward computation.

The Forager Species

The forager agent has three responsibilities:

  1. Apply the action received from Python (movement)
  2. Compute its observation (what it "sees")
  3. Compute the reward (how well it did)
species forager {
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;

    GymAgent gym_agent <- GymAgent[0];

    init {
        previous_distance <- location distance_to food_location;
        do compute_observation();
    }
}

Continuous Movement

Action Format

Python sends a 2D velocity vector [dx, dy], where each component is in [-1, 1].

action apply_action() {
    if (empty(gym_agent.next_action)) { return; }
    list act <- list(gym_agent.next_action);
    if (length(act) < 2) { return; }
    float dx <- float(act[0]); // normalized x velocity in [-1, 1]
    float dy <- float(act[1]); // normalized y velocity in [-1, 1]

    // Scale: max movement = 3.0 units per step
    float move_x <- dx * 3.0;
    float move_y <- dy * 3.0;

    float new_x <- location.x + move_x;
    float new_y <- location.y + move_y;

    // Clamp to world bounds (1-unit margin)
    new_x <- max([1.0, min([world_size - 1.0, new_x])]);
    new_y <- max([1.0, min([world_size - 1.0, new_y])]);

    point new_loc <- {new_x, new_y};

    // Check obstacle collision
    bool blocked <- false;
    loop obs over: obstacles {
        if (new_loc intersects obs) {
            blocked <- true;
            break;
        }
    }

    if (!blocked) {
        location <- new_loc;
    }

    current_step <- current_step + 1;
}

Compared to Part 1

| Part 1 | Part 2 |
|---|---|
| `switch direction { match 0 { new_y <- new_y - 1; } ... }` | `location <- location + {dx * 3, dy * 3}` |
| 4 possible actions | Infinite actions (any direction + speed) |
| Always moves exactly 1 cell | Moves 0 to 3 units per step |
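To make the contrast concrete, here is a small Python sketch of both action formats. Only `match 0` of the Part 1 switch is shown above, so the other three directions in the mapping are assumptions for illustration:

```python
# Part 1: four discrete actions, each a fixed one-cell move.
# Action 0 is (0, -1) as in the snippet above; 1-3 are assumed.
DISCRETE_MOVES = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}

# Part 2: any point in [-1, 1]^2 is a valid action, scaled by the max speed.
def continuous_move(dx, dy, speed=3.0):
    return (dx * speed, dy * speed)

print(DISCRETE_MOVES[0])            # (0, -1): exactly one cell
print(continuous_move(0.5, -0.25))  # (1.5, -0.75): any direction and magnitude
```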

The Observation Vector (13 values)

The forager builds a rich observation using its position, food direction, and obstacle sensors.

Position & Food (5 values)

float dist <- location distance_to food_location;
float angle <- location towards food_location;

gym_agent.state <- [
    location.x / world_size,                  // [0] Normalized X
    location.y / world_size,                  // [1] Normalized Y
    min([1.0, dist / (world_size * 1.414)]),  // [2] Normalized distance to food (max = diagonal, world_size * sqrt(2))
    (cos(angle) + 1.0) / 2.0,                 // [3] Food direction X, mapped from [-1, 1] to [0, 1]
    (sin(angle) + 1.0) / 2.0                  // [4] Food direction Y, mapped from [-1, 1] to [0, 1]
];

Why normalize? Neural networks learn much better when all inputs lie in the same range [0, 1]. Raw values like x = 85.0 and dist = 120.0 would put inputs on very different scales and unbalance the early layers.
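As a sanity check, the first five observation values can be reproduced in plain Python. This is a sketch, not part of the bridge; note that Python's `math` works in radians, whereas GAML's `cos`, `sin`, and `towards` use degrees:

```python
import math

WORLD_SIZE = 100.0
MAX_DIST = WORLD_SIZE * math.sqrt(2)  # the diagonal, i.e. the 1.414 factor in the model

def position_food_obs(x, y, food_x, food_y):
    """First five observation values, all squashed into [0, 1]."""
    dist = math.hypot(food_x - x, food_y - y)
    angle = math.atan2(food_y - y, food_x - x)  # radians here, degrees in GAML
    return [
        x / WORLD_SIZE,
        y / WORLD_SIZE,
        min(1.0, dist / MAX_DIST),
        (math.cos(angle) + 1.0) / 2.0,
        (math.sin(angle) + 1.0) / 2.0,
    ]

# Start cell (5, 5) looking at the food at (95, 95): all values land in [0, 1]
obs = position_food_obs(5.0, 5.0, 95.0, 95.0)
assert all(0.0 <= v <= 1.0 for v in obs)
```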

Ray-Cast Sensors (8 values)

The forager casts 8 rays around itself (every 45°) to detect obstacles:

          270°
           |
    225°   |   315°
       \   |   /
 180° ─────●───── 0°
       /   |   \
    135°   |   45°
           |
          90°
list<float> sensors;
loop i from: 0 to: 7 {
    float ray_angle <- float(i) * 45.0;
    point ray_end <- {
        location.x + sensor_range * cos(ray_angle),
        location.y + sensor_range * sin(ray_angle)
    };
    geometry ray <- line([location, ray_end]);

    float closest <- 1.0; // 1.0 = nothing detected
    loop obs over: obstacles {
        if (ray intersects obs) {
            geometry inter <- ray inter obs;
            if (inter != nil) {
                // distance to the centroid of the intersection, normalized by sensor range
                float d <- (location distance_to inter.location) / sensor_range;
                closest <- min([closest, d]);
            }
        }
    }
    add closest to: sensors;
}

Each sensor returns a value between 0 and 1:

  • 0.0 = obstacle is touching the agent
  • 0.5 = obstacle is at half sensor range
  • 1.0 = no obstacle detected in this direction
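To test the sensor math outside GAMA, here is a standalone Python sketch of one ray against an axis-aligned square obstacle. It uses the classic slab method instead of GAML's `intersects`/`inter` operators, and it returns the distance to the entry point of the box rather than to the centroid of the intersection, so values differ slightly from the model's:

```python
import math

def ray_box_distance(ox, oy, angle_deg, box_min, box_max, sensor_range=30.0):
    """Normalized distance in [0, 1] from (ox, oy) along angle_deg to an
    axis-aligned box, via the slab method; 1.0 means no hit within range."""
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    t_near, t_far = 0.0, sensor_range
    for o, d, lo, hi in ((ox, dx, box_min[0], box_max[0]),
                         (oy, dy, box_min[1], box_max[1])):
        if abs(d) < 1e-12:
            if o < lo or o > hi:
                return 1.0  # ray parallel to this slab and outside it
        else:
            t1, t2 = (lo - o) / d, (hi - o) / d
            t_near = max(t_near, min(t1, t2))
            t_far = min(t_far, max(t1, t2))
            if t_near > t_far:
                return 1.0  # slab intervals do not overlap: no hit
    return min(1.0, t_near / sensor_range)

# Obstacle cell (2,2) from the model: a 10x10 square centered on (25, 25).
# A ray cast from (5, 25) at 0 degrees hits it 15 units away: 15/30 = 0.5.
print(ray_box_distance(5.0, 25.0, 0.0, (20.0, 20.0), (30.0, 30.0)))  # 0.5
```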

Complete Observation Table

| Index | Value | Range | Description |
|---|---|---|---|
| 0 | `x / world_size` | [0, 1] | Agent's X position |
| 1 | `y / world_size` | [0, 1] | Agent's Y position |
| 2 | `dist / max_dist` | [0, 1] | Distance to food |
| 3 | `(cos(angle)+1)/2` | [0, 1] | Food direction (X) |
| 4 | `(sin(angle)+1)/2` | [0, 1] | Food direction (Y) |
| 5 | sensor 0° | [0, 1] | Obstacle right |
| 6 | sensor 45° | [0, 1] | Obstacle down-right |
| 7 | sensor 90° | [0, 1] | Obstacle down |
| 8 | sensor 135° | [0, 1] | Obstacle down-left |
| 9 | sensor 180° | [0, 1] | Obstacle left |
| 10 | sensor 225° | [0, 1] | Obstacle up-left |
| 11 | sensor 270° | [0, 1] | Obstacle up |
| 12 | sensor 315° | [0, 1] | Obstacle up-right |

Reward Shaping

A well-designed reward function is critical for learning. We use reward shaping to guide the agent.

action compute_reward() {
    float dist <- location distance_to food_location;

    // Goal reached!
    if (dist < food_radius) {
        gym_agent.reward <- 100.0;
        gym_agent.terminated <- true;
        gym_agent.truncated <- false;
        gym_agent.info <- [];
        ask gym_agent { do update_data(); }
        return;
    }

    // Timeout
    if (current_step >= max_steps) {
        gym_agent.reward <- -1.0;
        gym_agent.terminated <- false;
        gym_agent.truncated <- true;
        gym_agent.info <- [];
        ask gym_agent { do update_data(); }
        return;
    }

    // Shaping: encourage getting closer
    float delta <- previous_distance - dist;
    gym_agent.reward <- delta * 2.0 - 0.01;
    previous_distance <- dist;

    gym_agent.terminated <- false;
    gym_agent.truncated <- false;
    gym_agent.info <- [];
    ask gym_agent { do update_data(); }
}

Reward Breakdown

| Event | Reward | Why? |
|---|---|---|
| Reach food | +100.0 | Large positive reward for completing the task |
| Get closer | delta × 2.0 (positive) | Encourages approach: moving 1.5 units closer gives a shaping term of +3.0 |
| Get farther | delta × 2.0 (negative) | Penalizes moving away from food |
| Each step | −0.01 | Small living penalty to encourage speed |
| Timeout | −1.0 | Mild penalty for not finding food in 300 steps |
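The branching above is simple enough to mirror in a few lines of Python, which is handy for unit-testing reward expectations. This is a sketch of the same logic; the default values match the model:

```python
def compute_reward(dist, prev_dist, step, food_radius=5.0, max_steps=300):
    """Mirror of the GAML compute_reward: returns (reward, terminated, truncated)."""
    if dist < food_radius:
        return 100.0, True, False            # goal reached
    if step >= max_steps:
        return -1.0, False, True             # timeout
    delta = prev_dist - dist                 # > 0 when moving toward the food
    return delta * 2.0 - 0.01, False, False  # shaping term + living penalty

# Moving 1.5 units closer: shaping +3.0 minus the 0.01 step penalty, i.e. about +2.99
print(compute_reward(60.0, 61.5, 10))
```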

terminated vs truncated

This is a Gymnasium convention:

  • terminated = true: The episode ended because of the environment logic (food found).
  • truncated = true: The episode ended because of a time limit (max steps).

Python handles them identically (both trigger env.reset()), but they mean different things for logging.
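A minimal loop illustrating this convention (the `DummyEnv` stand-in is purely illustrative; with gama-gymnasium you would step a real Gymnasium environment instead):

```python
class DummyEnv:
    """Trivial stand-in following the Gymnasium step/reset API."""
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return [0.0] * 13, {}                 # observation, info
    def step(self, action):
        self.t += 1
        terminated = False                    # environment logic would set this
        truncated = self.t >= 5               # time limit
        return [0.0] * 13, -0.01, terminated, truncated, {}

env = DummyEnv()
obs, info = env.reset()
episodes = 0
for _ in range(12):
    obs, reward, terminated, truncated, info = env.step([0.0, 0.0])
    if terminated or truncated:               # handled identically...
        episodes += 1                         # ...but can be logged separately
        obs, info = env.reset()
print(episodes)  # -> 2 (two truncated episodes within 12 steps)
```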


Forager Aspect

aspect default {
    draw circle(3) color: #blue;
    draw circle(sensor_range) color: rgb(0, 0, 255, 30) border: rgb(0, 0, 255, 80);
}

The forager is drawn as a blue circle with a semi-transparent sensor range indicator.


Complete Model

/**
 * Name: SmartForagerGym - Continuous Forager for Gymnasium
 * Author: Killian Trouillet
 * Description: A continuous-world version of the Smart Forager that exposes
 *              itself as a Gymnasium environment via the gama-gymnasium bridge.
 * Tags: reinforcement-learning, gymnasium, ppo, continuous, tutorial
 */

model SmartForagerGym

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0}; // Native cell (9,9)
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;

    init {
        // Same 6 obstacle cells as the native model, each a 10x10 square
        obstacles << square(10) at_location {25.0, 25.0}; // cell (2,2)
        obstacles << square(10) at_location {35.0, 25.0}; // cell (3,2)
        obstacles << square(10) at_location {25.0, 35.0}; // cell (2,3)
        obstacles << square(10) at_location {65.0, 45.0}; // cell (6,4)
        obstacles << square(10) at_location {75.0, 45.0}; // cell (7,4)
        obstacles << square(10) at_location {75.0, 55.0}; // cell (7,5)

        create GymAgent;
        GymAgent[0].action_space <- [
            "type"::"Box",
            "low"::[-1.0, -1.0],
            "high"::[1.0, 1.0],
            "dtype"::"float"
        ];
        GymAgent[0].observation_space <- [
            "type"::"Box",
            "low"::list_with(13, 0.0),
            "high"::list_with(13, 1.0),
            "dtype"::"float"
        ];

        create forager number: 1 {
            location <- {5.0, 5.0}; // Native cell (0,0)
        }
    }

    reflex main_loop {
        if (empty(GymAgent[0].next_action)) { return; }
        ask forager[0] {
            do apply_action();
            do compute_observation();
            do compute_reward();
        }
    }
}

species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    list<float> state;
    float reward;
    bool terminated;
    bool truncated;
    map<string, unknown> info;
    list<float> next_action;
    map<string, unknown> data;

    // Receives action values from Python (avoids a list-literal bug in the expression API)
    action set_action(float dx, float dy) {
        next_action <- [dx, dy];
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}

species forager {
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;
    GymAgent gym_agent <- GymAgent[0];

    init {
        previous_distance <- location distance_to food_location;
        do compute_observation();
    }

    action apply_action() {
        if (empty(gym_agent.next_action)) { return; }
        list act <- list(gym_agent.next_action);
        if (length(act) < 2) { return; }
        float dx <- float(act[0]);
        float dy <- float(act[1]);
        float move_x <- dx * 3.0;
        float move_y <- dy * 3.0;
        float new_x <- location.x + move_x;
        float new_y <- location.y + move_y;
        new_x <- max([1.0, min([world_size - 1.0, new_x])]);
        new_y <- max([1.0, min([world_size - 1.0, new_y])]);
        point new_loc <- {new_x, new_y};
        bool blocked <- false;
        loop obs over: obstacles {
            if (new_loc intersects obs) {
                blocked <- true;
                break;
            }
        }
        if (!blocked) {
            location <- new_loc;
        }
        current_step <- current_step + 1;
    }

    action compute_observation() {
        float dist <- location distance_to food_location;
        float angle <- location towards food_location;
        list<float> sensors;
        loop i from: 0 to: 7 {
            float ray_angle <- float(i) * 45.0;
            point ray_end <- {
                location.x + sensor_range * cos(ray_angle),
                location.y + sensor_range * sin(ray_angle)
            };
            geometry ray <- line([location, ray_end]);
            float closest <- 1.0;
            loop obs over: obstacles {
                if (ray intersects obs) {
                    geometry inter <- ray inter obs;
                    if (inter != nil) {
                        float d <- (location distance_to inter.location) / sensor_range;
                        closest <- min([closest, d]);
                    }
                }
            }
            add closest to: sensors;
        }
        gym_agent.state <- [
            location.x / world_size,
            location.y / world_size,
            min([1.0, dist / (world_size * 1.414)]),
            (cos(angle) + 1.0) / 2.0,
            (sin(angle) + 1.0) / 2.0
        ] + sensors;
    }

    action compute_reward() {
        float dist <- location distance_to food_location;
        if (dist < food_radius) {
            gym_agent.reward <- 100.0;
            gym_agent.terminated <- true;
            gym_agent.truncated <- false;
            gym_agent.info <- [];
            ask gym_agent { do update_data(); }
            return;
        }
        if (current_step >= max_steps) {
            gym_agent.reward <- -1.0;
            gym_agent.terminated <- false;
            gym_agent.truncated <- true;
            gym_agent.info <- [];
            ask gym_agent { do update_data(); }
            return;
        }
        float delta <- previous_distance - dist;
        gym_agent.reward <- delta * 2.0 - 0.01;
        previous_distance <- dist;
        gym_agent.terminated <- false;
        gym_agent.truncated <- false;
        gym_agent.info <- [];
        ask gym_agent { do update_data(); }
    }

    aspect default {
        draw circle(3) color: #blue;
        draw circle(sensor_range) color: rgb(0, 0, 255, 30) border: rgb(0, 0, 255, 80);
    }
}

experiment gym_env {
    parameter "communication_port" var: gama_server_port;
    output {
        display "Continuous World" type: 2d {
            graphics "background" {
                draw rectangle(world_size, world_size) at: {world_size / 2, world_size / 2} color: rgb(240, 240, 240);
            }
            graphics "obstacles" {
                loop obs over: obstacles {
                    draw obs color: rgb(80, 80, 80);
                }
            }
            graphics "food" {
                draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
            }
            species forager;
        }
    }
}