9. Sensors, Movement & Rewards
By Killian Trouillet
In Step 8 we defined the GymAgent bridge with stub actions. Now we implement the three forager actions that make the bridge actually work: movement, observation, and reward computation.
The Forager Species
The forager agent has three responsibilities:
- Apply the action received from Python (movement)
- Compute its observation (what it "sees")
- Compute the reward (how well it did)
species forager {
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;
    GymAgent gym_agent <- GymAgent[0];

    init {
        previous_distance <- location distance_to food_location;
        do compute_observation();
    }
}
Continuous Movement
Action Format
Python sends a 2D velocity vector [dx, dy], where each component is in [-1, 1].
action apply_action() {
    if (empty(gym_agent.next_action)) { return; }
    list act <- list(gym_agent.next_action);
    if (length(act) < 2) { return; }
    float dx <- float(act[0]); // normalized x velocity [-1, 1]
    float dy <- float(act[1]); // normalized y velocity [-1, 1]

    // Scale: max movement = 3.0 units per step
    float move_x <- dx * 3.0;
    float move_y <- dy * 3.0;
    float new_x <- location.x + move_x;
    float new_y <- location.y + move_y;

    // Clamp to world bounds
    new_x <- max([1.0, min([world_size - 1.0, new_x])]);
    new_y <- max([1.0, min([world_size - 1.0, new_y])]);
    point new_loc <- {new_x, new_y};

    // Check obstacle collision
    bool blocked <- false;
    loop obs over: obstacles {
        if (new_loc intersects obs) {
            blocked <- true;
            break;
        }
    }
    if (!blocked) {
        location <- new_loc;
    }
    current_step <- current_step + 1;
}
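On the Python side, this action format corresponds to a two-dimensional Box space. A minimal sketch using the standard Gymnasium API (the space mirrors the GAML action_space map declared in the complete model below; the sampled values are illustrative):

from gymnasium import spaces
import numpy as np

# Mirrors the GAML action_space map: a 2D Box with components in [-1, 1]
action_space = spaces.Box(low=np.array([-1.0, -1.0]),
                          high=np.array([1.0, 1.0]),
                          dtype=np.float32)

action = action_space.sample()  # e.g. array([0.42, -0.87], dtype=float32)
# The four Part 1 moves are special points of this space (y grows downward in GAMA):
# right = [1, 0], down = [0, 1], left = [-1, 0], up = [0, -1]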
Compared to Part 1
| Part 1 | Part 2 |
|---|---|
| switch direction { match 0 { new_y <- new_y - 1; } ... } | location <- location + {dx * 3, dy * 3} |
| 4 possible actions | Infinite actions (any direction + speed) |
| Always moves exactly 1 cell | Moves 0 to 3 units per step |
The Observation Vector (13 values)
The forager builds a rich observation using its position, food direction, and obstacle sensors.
Position & Food (5 values)
float dist <- location distance_to food_location;
float angle <- location towards food_location;
gym_agent.state <- [
    location.x / world_size,                  // [0] Normalized X
    location.y / world_size,                  // [1] Normalized Y
    min([1.0, dist / (world_size * 1.414)]),  // [2] Normalized distance to food (max = diagonal)
    (cos(angle) + 1.0) / 2.0,                 // [3] Food direction X (normalized)
    (sin(angle) + 1.0) / 2.0                  // [4] Food direction Y (normalized)
];
Why normalize? Neural networks learn much better when all inputs are in the same range [0, 1]. Raw values like x = 85.0 and dist = 120.0 would put features on very different scales and create an imbalance.
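As a quick sanity check, here are the same five values written out in Python with hypothetical inputs (note that GAMA's cos and sin take degrees, hence the radians conversion):

import math

world_size = 100.0
x, y = 85.0, 40.0  # hypothetical agent position
dist = 120.0       # hypothetical distance to food
angle = 60.0       # hypothetical angle toward food, in degrees

obs_head = [
    x / world_size,                                # 0.85
    y / world_size,                                # 0.40
    min(1.0, dist / (world_size * math.sqrt(2))),  # ~0.85 (the diagonal is the largest possible distance)
    (math.cos(math.radians(angle)) + 1.0) / 2.0,   # 0.75
    (math.sin(math.radians(angle)) + 1.0) / 2.0,   # ~0.93
]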
Ray-Cast Sensors (8 values)
The forager casts 8 rays around itself (every 45°) to detect obstacles:
         270°
   225°  \ | /  315°
180° ──────●────── 0°
   135°  / | \  45°
         90°

Angles follow GAMA's screen convention: 0° points right and 90° points down, since the y axis grows downward.
list<float> sensors;
loop i from: 0 to: 7 {
    float ray_angle <- float(i) * 45.0;
    point ray_end <- {
        location.x + sensor_range * cos(ray_angle),
        location.y + sensor_range * sin(ray_angle)
    };
    geometry ray <- line([location, ray_end]);
    float closest <- 1.0; // 1.0 = nothing detected
    loop obs over: obstacles {
        if (ray intersects obs) {
            geometry inter <- ray inter obs;
            if (inter != nil) {
                float d <- (location distance_to inter.location) / sensor_range;
                closest <- min([closest, d]);
            }
        }
    }
    add closest to: sensors;
}
Each sensor returns a value between 0 and 1:
- 0.0 = obstacle is touching the agent
- 0.5 = obstacle is at half sensor range
- 1.0 = no obstacle detected in this direction
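For intuition, here is the same single-ray computation in Python using shapely (an assumption for illustration only; the bridge does not use shapely, and GAMA's intersects and inter operators play the role of intersection here):

from shapely.geometry import LineString, Point, box
import math

sensor_range = 30.0
origin = Point(10.0, 25.0)              # hypothetical agent position
obstacle = box(20.0, 20.0, 30.0, 30.0)  # a 10x10 square, like the model's obstacles

ray_angle = 0.0                         # sensor 0 looks along +x
end = (origin.x + sensor_range * math.cos(math.radians(ray_angle)),
       origin.y + sensor_range * math.sin(math.radians(ray_angle)))
ray = LineString([(origin.x, origin.y), end])

closest = 1.0                           # 1.0 = nothing detected
inter = ray.intersection(obstacle)
if not inter.is_empty:
    # like the GAML code, measure to the intersection's centroid (inter.location)
    closest = min(closest, origin.distance(inter.centroid) / sensor_range)

print(closest)  # 0.5 -> the obstacle sits at half the sensor range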
Complete Observation Table
| Index | Value | Range | Description |
|---|---|---|---|
| 0 | x / world_size | [0, 1] | Agent's X position |
| 1 | y / world_size | [0, 1] | Agent's Y position |
| 2 | dist / max_dist | [0, 1] | Distance to food |
| 3 | (cos(angle)+1)/2 | [0, 1] | Food direction (X) |
| 4 | (sin(angle)+1)/2 | [0, 1] | Food direction (Y) |
| 5 | sensor 0° | [0, 1] | Obstacle right |
| 6 | sensor 45° | [0, 1] | Obstacle down-right |
| 7 | sensor 90° | [0, 1] | Obstacle down |
| 8 | sensor 135° | [0, 1] | Obstacle down-left |
| 9 | sensor 180° | [0, 1] | Obstacle left |
| 10 | sensor 225° | [0, 1] | Obstacle up-left |
| 11 | sensor 270° | [0, 1] | Obstacle up |
| 12 | sensor 315° | [0, 1] | Obstacle up-right |
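On the Python side this table corresponds to a single 13-dimensional Box, which is exactly what the GAML observation_space map below declares (a sketch with the standard Gymnasium API):

from gymnasium import spaces
import numpy as np

# 13 values, each normalized to [0, 1]
observation_space = spaces.Box(low=0.0, high=1.0, shape=(13,), dtype=np.float32)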
Reward Shaping
A well-designed reward function is critical for learning. We use reward shaping to guide the agent.
action compute_reward() {
    float dist <- location distance_to food_location;

    // Goal reached!
    if (dist < food_radius) {
        gym_agent.reward <- 100.0;
        gym_agent.terminated <- true;
        gym_agent.truncated <- false;
        gym_agent.info <- [];
        ask gym_agent { do update_data(); }
        return;
    }

    // Timeout
    if (current_step >= max_steps) {
        gym_agent.reward <- -1.0;
        gym_agent.terminated <- false;
        gym_agent.truncated <- true;
        gym_agent.info <- [];
        ask gym_agent { do update_data(); }
        return;
    }

    // Shaping: encourage getting closer
    float delta <- previous_distance - dist;
    gym_agent.reward <- delta * 2.0 - 0.01;
    previous_distance <- dist;
    gym_agent.terminated <- false;
    gym_agent.truncated <- false;
    gym_agent.info <- [];
    ask gym_agent { do update_data(); }
}
Reward Breakdown
| Event | Reward | Why? |
|---|---|---|
| Reach food | +100.0 | Large positive reward for completing the task |
| Get closer | +delta × 2.0 | Encourages approach. Moving 1.5 units closer gives delta = 1.5, so reward = +2.99 after the step penalty |
| Get farther | delta × 2.0 (negative) | Penalizes moving away from food |
| Each step | −0.01 | Small living penalty to encourage speed |
| Timeout | −1.0 | Mild penalty for not finding food in 300 steps |
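A worked example of the per-step computation, with hypothetical distances:

previous_distance = 50.0          # hypothetical distance at the previous step
dist = 48.5                       # the agent moved 1.5 units closer
delta = previous_distance - dist  # +1.5
reward = delta * 2.0 - 0.01       # +2.99

dist = 51.0                       # ...or 1.0 unit farther away
delta = previous_distance - dist  # -1.0
reward = delta * 2.0 - 0.01       # -2.01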
terminated vs truncated
This is a Gymnasium convention:
- terminated = true: the episode ended because of the environment logic (food found).
- truncated = true: the episode ended because of a time limit (max steps reached).
Python handles them identically (both trigger env.reset()), but they mean different things for logging.
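A minimal sketch of the standard Gymnasium loop, where env is assumed to be the bridged environment instance; both flags funnel into the same reset:

obs, info = env.reset()
for _ in range(10_000):
    action = env.action_space.sample()  # or: a trained policy's output
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:         # food found OR 300-step timeout
        obs, info = env.reset()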
Forager Aspect
aspect default {
    draw circle(3) color: #blue;
    draw circle(sensor_range) color: rgb(0, 0, 255, 30) border: rgb(0, 0, 255, 80);
}
The forager is drawn as a blue circle with a semi-transparent sensor range indicator.
Complete Model
/**
 * Name: SmartForagerGym - Continuous Forager for Gymnasium
 * Author: Killian Trouillet
 * Description: A continuous-world version of the Smart Forager that exposes
 *              itself as a Gymnasium environment via the gama-gymnasium bridge.
 * Tags: reinforcement-learning, gymnasium, ppo, continuous, tutorial
 */
model SmartForagerGym

global {
    float world_size <- 100.0;
    point food_location <- {95.0, 95.0}; // Native cell (9,9)
    float food_radius <- 5.0;
    list<geometry> obstacles <- [];
    int max_steps <- 300;
    int current_step <- 0;
    int gama_server_port <- 0;

    init {
        // Same 6 obstacle cells as native, each = 10×10 square
        obstacles << square(10) at_location {25.0, 25.0}; // cell (2,2)
        obstacles << square(10) at_location {35.0, 25.0}; // cell (3,2)
        obstacles << square(10) at_location {25.0, 35.0}; // cell (2,3)
        obstacles << square(10) at_location {65.0, 45.0}; // cell (6,4)
        obstacles << square(10) at_location {75.0, 45.0}; // cell (7,4)
        obstacles << square(10) at_location {75.0, 55.0}; // cell (7,5)

        create GymAgent;
        GymAgent[0].action_space <- [
            "type"::"Box",
            "low"::[-1.0, -1.0],
            "high"::[1.0, 1.0],
            "dtype"::"float"
        ];
        GymAgent[0].observation_space <- [
            "type"::"Box",
            "low"::list_with(13, 0.0),
            "high"::list_with(13, 1.0),
            "dtype"::"float"
        ];

        create forager number: 1 {
            location <- {5.0, 5.0}; // Native cell (0,0)
        }
    }

    reflex main_loop {
        if (empty(GymAgent[0].next_action)) { return; }
        ask forager[0] {
            do apply_action();
            do compute_observation();
            do compute_reward();
        }
    }
}
species GymAgent {
    map<string, unknown> action_space;
    map<string, unknown> observation_space;
    list<float> state;
    float reward;
    bool terminated;
    bool truncated;
    map<string, unknown> info;
    list<float> next_action;
    map<string, unknown> data;

    // Action to receive action values from Python (avoids list literal bug in expression API)
    action set_action(float dx, float dy) {
        next_action <- [dx, dy];
        return "ok";
    }

    action update_data() {
        data <- [
            "State"::state,
            "Reward"::reward,
            "Terminated"::terminated,
            "Truncated"::truncated,
            "Info"::info
        ];
    }
}
species forager {
    float sensor_range <- 30.0;
    float previous_distance <- 0.0;
    GymAgent gym_agent <- GymAgent[0];

    init {
        previous_distance <- location distance_to food_location;
        do compute_observation();
    }

    action apply_action() {
        if (empty(gym_agent.next_action)) { return; }
        list act <- list(gym_agent.next_action);
        if (length(act) < 2) { return; }
        float dx <- float(act[0]);
        float dy <- float(act[1]);
        float move_x <- dx * 3.0;
        float move_y <- dy * 3.0;
        float new_x <- location.x + move_x;
        float new_y <- location.y + move_y;
        new_x <- max([1.0, min([world_size - 1.0, new_x])]);
        new_y <- max([1.0, min([world_size - 1.0, new_y])]);
        point new_loc <- {new_x, new_y};
        bool blocked <- false;
        loop obs over: obstacles {
            if (new_loc intersects obs) {
                blocked <- true;
                break;
            }
        }
        if (!blocked) {
            location <- new_loc;
        }
        current_step <- current_step + 1;
    }

    action compute_observation() {
        float dist <- location distance_to food_location;
        float angle <- location towards food_location;
        list<float> sensors;
        loop i from: 0 to: 7 {
            float ray_angle <- float(i) * 45.0;
            point ray_end <- {
                location.x + sensor_range * cos(ray_angle),
                location.y + sensor_range * sin(ray_angle)
            };
            geometry ray <- line([location, ray_end]);
            float closest <- 1.0;
            loop obs over: obstacles {
                if (ray intersects obs) {
                    geometry inter <- ray inter obs;
                    if (inter != nil) {
                        float d <- (location distance_to inter.location) / sensor_range;
                        closest <- min([closest, d]);
                    }
                }
            }
            add closest to: sensors;
        }
        gym_agent.state <- [
            location.x / world_size,
            location.y / world_size,
            min([1.0, dist / (world_size * 1.414)]),
            (cos(angle) + 1.0) / 2.0,
            (sin(angle) + 1.0) / 2.0
        ] + sensors;
    }

    action compute_reward() {
        float dist <- location distance_to food_location;
        if (dist < food_radius) {
            gym_agent.reward <- 100.0;
            gym_agent.terminated <- true;
            gym_agent.truncated <- false;
            gym_agent.info <- [];
            ask gym_agent { do update_data(); }
            return;
        }
        if (current_step >= max_steps) {
            gym_agent.reward <- -1.0;
            gym_agent.terminated <- false;
            gym_agent.truncated <- true;
            gym_agent.info <- [];
            ask gym_agent { do update_data(); }
            return;
        }
        float delta <- previous_distance - dist;
        gym_agent.reward <- delta * 2.0 - 0.01;
        previous_distance <- dist;
        gym_agent.terminated <- false;
        gym_agent.truncated <- false;
        gym_agent.info <- [];
        ask gym_agent { do update_data(); }
    }

    aspect default {
        draw circle(3) color: #blue;
        draw circle(sensor_range) color: rgb(0, 0, 255, 30) border: rgb(0, 0, 255, 80);
    }
}
experiment gym_env {
    parameter "communication_port" var: gama_server_port;

    output {
        display "Continuous World" type: 2d {
            graphics "background" {
                draw rectangle(world_size, world_size) at: {world_size/2, world_size/2} color: rgb(240, 240, 240);
            }
            graphics "obstacles" {
                loop obs over: obstacles {
                    draw obs color: rgb(80, 80, 80);
                }
            }
            graphics "food" {
                draw circle(food_radius) at: food_location color: rgb(50, 180, 50);
            }
            species forager;
        }
    }
}