Version: 🚧 Alpha 🚧

14. Training Multiple Agents

By Killian Trouillet


Starting GAMA Headless

Windows

gama-headless.bat -socket 1001

Linux / macOS

./gama-headless.sh -socket 1001

Port 1000 is reserved for the GUI. Use any other port for headless training.
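
Before launching a long training run, it can help to confirm that the headless server is actually listening. The following is a minimal sketch using only the Python standard library; the helper name `port_is_open` is hypothetical and not part of GAMA or gama_pettingzoo, and it assumes that briefly opening and closing a TCP connection does not disturb the server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_is_open("localhost", 1001)` should return True once the command above has started GAMA with `-socket 1001`.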


Parameter-Shared PPO

In Part 2, we trained a single forager with PPO. Here we have two foragers that share the same observation and action structure and the same goal, so we use parameter sharing: a single neural network acts for both agents.

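Why parameter sharing?

| Criterion       | Parameter Sharing (ours)  | Independent PPO           | MADDPG   |
|-----------------|---------------------------|---------------------------|----------|
| # Networks      | 1 (shared)                | 1 per agent               | Complex  |
| Data efficiency | Best (2× data per update) | Standard                  | Standard |
| Cooperation     | Emerges naturally         | Must emerge independently | Explicit |
| Implementation  | Simple                    | Simple                    | Complex  |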

Each agent feeds its own observation (15 values, including the teammate's position) into the same network and gets its own action back. Because both agents contribute trajectory data to the same network, each update sees twice as much experience per environment step as a single-agent run, which generally speeds up learning.
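
To make the idea concrete, here is a toy sketch of parameter sharing: one set of weights, queried once per agent with that agent's own observation. It is not the tutorial's PPO network. The linear-plus-tanh policy, the class name `SharedLinearPolicy`, and the agent IDs `forager_0` and `forager_1` are assumptions chosen for illustration; only the sizes (15 observations, 2 action values) come from the text.

```python
import numpy as np

OBS_DIM, ACT_DIM = 15, 2  # sizes taken from the text above


class SharedLinearPolicy:
    """Toy stand-in for the shared network: one weight set for all agents."""

    def __init__(self, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(OBS_DIM, ACT_DIM))
        self.b = np.zeros(ACT_DIM)

    def act(self, obs: np.ndarray) -> np.ndarray:
        # tanh keeps each action component in [-1, 1] (an assumption here)
        return np.tanh(obs @ self.W + self.b)


policy = SharedLinearPolicy()
rng = np.random.default_rng(1)

# Hypothetical agent IDs; each agent has its own observation vector.
observations = {
    "forager_0": rng.normal(size=OBS_DIM),
    "forager_1": rng.normal(size=OBS_DIM),
}

# Same weights, different inputs, hence different actions per agent.
actions = {agent_id: policy.act(obs) for agent_id, obs in observations.items()}
```

The two agents behave differently only because their observations differ; any gradient computed from either agent's data updates the same `W` and `b`.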


Using GamaParallelEnv Directly

Unlike Part 2, where we used gym.make(), here we instantiate GamaParallelEnv directly. It exposes the PettingZoo Parallel API:

from gama_pettingzoo.gama_parallel_env import GamaParallelEnv

env = GamaParallelEnv(
    gaml_experiment_path="path/to/forager_petz.gaml",
    gaml_experiment_name="petz_env",
    gama_ip_address="localhost",
    gama_port=1001,
)

obs, infos = env.reset()
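
For readers new to the Parallel API, the sketch below shows the dict-based shapes that `reset()` and `step()` return. `ToyParallelEnv` is a hypothetical stand-in written for this illustration, not part of gama_pettingzoo; the agent IDs and random observations are assumptions. It can also serve to dry-run a training loop without a GAMA server.

```python
import random


class ToyParallelEnv:
    """Hypothetical stand-in mimicking the PettingZoo Parallel API shapes."""

    possible_agents = ["forager_0", "forager_1"]  # assumed IDs

    def __init__(self, obs_dim=15, max_steps=5, seed=0):
        self.obs_dim = obs_dim
        self.max_steps = max_steps
        self._rng = random.Random(seed)
        self.agents = []
        self._t = 0

    def _observe(self, agent_ids):
        # One observation vector per agent, keyed by agent ID
        return {a: [self._rng.random() for _ in range(self.obs_dim)]
                for a in agent_ids}

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self._t = 0
        return self._observe(self.agents), {a: {} for a in self.agents}

    def step(self, actions):
        self._t += 1
        acting = list(self.agents)
        truncated = self._t >= self.max_steps
        obs = self._observe(acting)
        rewards = {a: 0.0 for a in acting}
        terms = {a: False for a in acting}
        truncs = {a: truncated for a in acting}
        infos = {a: {} for a in acting}
        if truncated:
            self.agents = []  # finished agents leave the agent list
        return obs, rewards, terms, truncs, infos
```

Every return value is a dict keyed by agent ID, and `env.agents` becomes empty once the episode is over, which is a condition a training loop can check.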

Batch Inference

We query the shared network for all agents at once using select_actions_batch():

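# Agents that still have an observation this step
active = [a for a in AGENT_IDS if a in obs]
obs_list = [np.array(obs[a], dtype=np.float32) for a in active]

# One forward pass of the shared network for all active agents
actions_np, log_probs, values = agent.select_actions_batch(obs_list)

# Map each row of the batch back to its agent ID
actions_dict = {a: actions_np[i] for i, a in enumerate(active)}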

This is the same approach used in the Pistonball benchmark.
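
The source does not show the body of select_actions_batch(), so the following is only a sketch of what such a method might do: stack the observations, run one forward pass, sample from a diagonal Gaussian, and return actions, log-probabilities, and value estimates. The function name `select_actions_batch_sketch`, the linear actor and critic (`W_pi`, `W_v`), and the Gaussian policy are all assumptions.

```python
import numpy as np


def select_actions_batch_sketch(obs_list, W_pi, W_v, log_std, rng):
    """Sample actions for several agents in one batched forward pass."""
    obs = np.stack(obs_list)                 # (n_agents, obs_dim)
    mean = obs @ W_pi                        # (n_agents, act_dim)
    std = np.exp(log_std)                    # (act_dim,)
    actions = mean + std * rng.standard_normal(mean.shape)

    # Log-density of a diagonal Gaussian, summed over action dimensions
    z = (actions - mean) / std
    log_probs = (-0.5 * z**2 - log_std - 0.5 * np.log(2 * np.pi)).sum(axis=1)

    values = (obs @ W_v).squeeze(-1)         # (n_agents,)
    return actions, log_probs, values


rng = np.random.default_rng(0)
obs_list = [rng.normal(size=15).astype(np.float32) for _ in range(2)]
W_pi = rng.normal(scale=0.1, size=(15, 2))   # toy actor weights
W_v = rng.normal(scale=0.1, size=(15, 1))    # toy critic weights
log_std = np.zeros(2)

actions, log_probs, values = select_actions_batch_sketch(
    obs_list, W_pi, W_v, log_std, rng
)
```

Row i of each returned array belongs to the i-th entry of `obs_list`, which is why the snippet above can rebuild the action dict with `enumerate(active)`.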


The Training Loop

import numpy as np
import torch

# PPOAgent, RolloutBuffer, AGENT_IDS and NUM_EPISODES are assumed to be
# defined earlier in the training script.
agent = PPOAgent(state_dim=15, action_dim=2)
UPDATE_EVERY = 2048  # pooled transitions to collect between PPO updates

total_steps = 0
agent_buffers = {a: RolloutBuffer() for a in AGENT_IDS}

for ep in range(1, NUM_EPISODES + 1):
    obs, _ = env.reset()
    step = 0
    done = False

    while not done and step < 300:  # cap each episode at 300 steps
        active = [a for a in AGENT_IDS if a in obs]
        obs_list = [np.array(obs[a], dtype=np.float32) for a in active]
        actions_np, lps, vals = agent.select_actions_batch(obs_list)

        # Store each agent's state, action, log-prob and value in its own buffer
        actions_dict = {}
        for i, a in enumerate(active):
            actions_dict[a] = actions_np[i]
            agent_buffers[a].states.append(torch.FloatTensor(obs_list[i]))
            agent_buffers[a].actions.append(torch.FloatTensor(actions_np[i]))
            agent_buffers[a].logprobs.append(torch.tensor(lps[i]))
            agent_buffers[a].values.append(torch.tensor(vals[i]))

        next_obs, rewards, terms, truncs, _ = env.step(actions_dict)

        # Rewards and done flags arrive only after the environment step
        for a in active:
            agent_buffers[a].rewards.append(rewards.get(a, 0.0))
            agent_buffers[a].dones.append(terms.get(a, False) or truncs.get(a, False))

        obs = next_obs
        step += 1
        total_steps += len(active)  # one transition per active agent
        done = not env.agents or all(terms.get(a, False) for a in AGENT_IDS)

    # PPO update: all agents' data pooled into a single update
    if total_steps >= UPDATE_EVERY:
        agent.update(agent_buffers)
        agent_buffers = {a: RolloutBuffer() for a in AGENT_IDS}
        total_steps = 0
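
The internals of agent.update() are not shown here. One common way to pool per-agent buffers is to compute advantages separately for each agent's trajectory, so that one agent's rewards never leak into the other's estimates, and only then concatenate the results into a single batch. The sketch below assumes Generalized Advantage Estimation (GAE); the function names and the plain-dict rollout layout are hypothetical.

```python
def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95, last_value=0.0):
    """GAE over one agent's trajectory; returns (advantages, returns)."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns


def pool_agent_rollouts(rollouts, gamma=0.99, lam=0.95):
    """Compute GAE per agent, then concatenate into one training batch."""
    pooled_adv, pooled_ret = [], []
    for agent_id, r in rollouts.items():
        adv, ret = compute_gae(r["rewards"], r["values"], r["dones"], gamma, lam)
        pooled_adv.extend(adv)
        pooled_ret.extend(ret)
    return pooled_adv, pooled_ret
```

Keeping one buffer per agent, as the loop above does, is what makes this per-trajectory computation possible before the data is merged.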

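Key differences from Part 2

| Aspect          | Part 2 (Gymnasium)   | Part 3 (PettingZoo)                                       |
|-----------------|----------------------|-----------------------------------------------------------|
| Environment     | gym.make()           | GamaParallelEnv()                                         |
| Observations    | Single array         | Dict {agent_id: array}                                    |
| Actions         | Single array         | Dict {agent_id: array}                                    |
| Rollout buffers | One buffer           | One buffer per agent                                      |
| PPO update      | agent.update(buffer) | agent.update(agent_buffers), which pools all agents' data |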

What to Expect

Ep 10/500 | steps 300 | f0 -4.2 f1 -3.8 | success 0%
Ep 50/500 | steps 241 | f0 +8.1 f1 +6.4 | success 5%
Ep 100/500 | steps 127 | f0 +42.3 f1 +38.7 | success 30%
Ep 200/500 | steps 74 | f0 +81.4 f1 +79.2 | success 72%
Ep 300/500 | steps 59 | f0 +88.6 f1 +87.1 | success 85%
Ep 500/500 | steps 52 | f0 +91.2 f1 +90.4 | success 92%
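
How the training script computes and prints these lines is not shown. As a sketch, a progress line in the same layout could be produced with a small formatter and a rolling success rate. `format_progress`, `SuccessTracker`, and the window size are hypothetical names and values chosen for this illustration.

```python
from collections import deque


def format_progress(ep, num_episodes, steps, returns, success_rate):
    """Build one progress line; `returns` maps a short agent label to its return."""
    per_agent = " ".join(f"{name} {ret:+.1f}" for name, ret in returns.items())
    return (f"Ep {ep}/{num_episodes} | steps {steps} | "
            f"{per_agent} | success {success_rate:.0%}")


class SuccessTracker:
    """Fraction of successful episodes over the most recent `window` episodes."""

    def __init__(self, window=50):  # window size is an assumption
        self._recent = deque(maxlen=window)

    def record(self, success):
        self._recent.append(bool(success))

    @property
    def rate(self):
        return sum(self._recent) / len(self._recent) if self._recent else 0.0
```

A rolling window smooths the per-episode noise, so the success column reflects recent policy quality and not the whole training history.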

Running the Training

cd models/petz
python train_forager_petz.py

The shared model is saved to saved_models/ppo_forager.pth.


Key Files

| File                              | Description                         |
|-----------------------------------|-------------------------------------|
| models/petz/forager_petz.gaml     | GAMA model with PetzAgent bridge    |
| models/petz/train_forager_petz.py | MARL training script (headless)     |
| models/petz/test_forager_petz.py  | Testing script (GUI visualization)  |