From Best-of-n to Ranking n: A New Approach to Collective Decision-Making in Robot Swarms
Background
As a Postdoctoral Research Associate (PDRA) at the University of Bristol, I explored how multi-agent systems can make collective decisions under greater limitations than typically considered in traditional approaches to the "best-of-n" problem in robot swarms. In our paper, published in Swarm Intelligence (2021), we introduce a novel model for collective preference learning that allows robot swarms to learn not just which option is best, but to rank all available options.
Beyond binary choices
Traditional approaches to the best-of- problem often focus on identifying only the single best option from a set of alternatives, discarding information about the remaining options. While this works for simple scenarios, real-world applications like search and rescue operations require more nuanced decision-making.
Consider a swarm of robots searching for casualties in a disaster scenario. The traditional approach would only identify the highest priority casualty, requiring the entire decision-making process to be repeated for each remaining casualty. This is inefficient and potentially costly in time-critical situations.
Our three-valued logic approach
Instead of using binary logic (option A is better/worse than option B), we've developed a model using three-valued logic where robots can hold beliefs about preference relations in three states:
- True (option A is better than option B)
- False (option A is not better than option B)
- Uncertain
This enables agents to maintain and refine a complete preference ordering over all options simultaneously, rather than simply identifying which option is best.
Key findings
Our experiments revealed several important insights:
Robust to Noise: The model performs well even in noisy environments where evidence might be contradictory or incorrect. When evidence is sparse and noise levels are high, preserving the transitivity of agents' beliefs improves accuracy further.
Scalable Solution: Our approach successfully scales to environments with up to options before accuracy is significantly affected in noisy environments.

High-noise scenario

Low-noise scenario
Communication Efficiency: We developed a bandwidth-limited version allowing agents to communicate only a subset of their preference relations, which still performs well and sometimes outperforms the standard model.
Speed-Accuracy Trade-off: There's a clear trade-off between increased accuracy (by applying transitive closure operations) and computational cost, giving system designers flexibility depending on application requirements.
Why this matters
This research represents a significant step toward deploying swarm robotics in complex real-world scenarios where ranking multiple options is crucial. For example:
- Search and rescue operations where casualties must be prioritized
- Environmental monitoring where multiple phenomena need to be tracked and ranked
- Infrastructure inspection where multiple defects must be classified by severity
By learning complete preference orderings rather than just identifying the best option, swarm robotic systems can make more informed decisions and adapt more effectively to complex environments. This approach also avoids the complete loss of information once the task of highest priority is completed, as the next task is already decided.
Future research directions
We're continuing to investigate how this approach performs with physical constraints like limited communication range and observation capabilities. We're also exploring how to adapt the model for dynamic environments where preferences might change over time.
More detailed information about this research can be found in our paper: "Collective preference learning in the best-of- problem: From best-of-n to ranking " published in Swarm Intelligence (2021).