Structure Exploiting Reinforcement Learning

This project is funded by an NSF CAREER Award.

Background and Challenge

Multi-agent networked systems play an indispensable role in advancing our modern society. Examples span a broad spectrum, including power grids, transportation systems, networked robots, water and gas distribution systems, smart buildings, and the Internet of Things. The control and operation of such systems have long been a tremendous challenge. Recent advances in Machine Learning (ML), particularly Reinforcement Learning (RL), are widely recognized as holding great potential for revolutionizing the control of such large-scale networked systems.

However, despite a rich literature on RL and Multi-Agent RL (MARL), (MA)RL algorithms are widely recognized to suffer from scalability, stability, and safety issues when applied to large-scale networked systems.

Proposed Solutions: Structure Exploiting RL

This project takes a distinctive approach to the above challenges: it exploits the structure of the underlying system to design scalable, stable, and safe MARL for large-scale networked systems. Most existing (MA)RL approaches treat the system as a black box and do not use its structure. Real-world systems, in contrast, have rich structural properties. In the fast time-scale swing dynamics of power systems, each node’s state is directly affected only by its neighbors in the network; aerodynamic interactions between drones in a swarm occur only when they are in close proximity; and the load balancing problem in a multi-server system has a homogeneous structure.

In light of these structural properties, the overarching goal of this project is to develop systematic tools that exploit the underlying structure to design scalable, stable, and safe MARL for large-scale networked systems. Below, we introduce several typical structures that we study: network structure, time-varying networks, homogeneity (symmetry), and physics-based dynamics.

Network Structure

Network structure, such as the topology of a power grid or a transportation network, is one of the most common types of structure. Mathematically, this locality is reflected in the sparsity of the transition dynamics: each agent's state transition is affected only by its neighbors. By leveraging such network structure, our work below improves the scalability of multi-agent RL.
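
To make the locality idea concrete, the sketch below computes κ-hop neighborhoods on a sparse graph. A common way to exploit sparse transitions in the networked MARL literature is to let agent i's value estimate depend only on the agents within κ hops, so its input size is governed by the neighborhood rather than by the total number of agents n. This is a minimal illustration under stated assumptions (a line graph, 3 local states per agent, and the hypothetical helpers `line_graph` and `truncated_input_size`), not a description of the project's algorithm.

```python
from collections import deque


def k_hop_neighborhood(adj, i, kappa):
    """Return the set of agents within kappa hops of agent i (including i), via BFS."""
    seen = {i}
    frontier = deque([(i, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue  # do not expand beyond kappa hops
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def line_graph(n):
    """Hypothetical example topology: adjacency lists for n agents on a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def truncated_input_size(adj, i, kappa, local_states):
    """Joint states a kappa-hop truncated table for agent i must cover (assumed sizing)."""
    return local_states ** len(k_hop_neighborhood(adj, i, kappa))


if __name__ == "__main__":
    n, S = 100, 3
    adj = line_graph(n)
    print(sorted(k_hop_neighborhood(adj, 50, 2)))
    print(truncated_input_size(adj, 50, 2, S))
    print(len(str(S ** n)), "digits in the full joint state count")
```

On a line of 100 agents, agent 50's 2-hop neighborhood is {48, 49, 50, 51, 52}, so a truncated table covers 3^5 = 243 joint local states instead of 3^100.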

Application: Power Systems

We have also applied the approach to power system frequency control, substantially increasing the number of agents that RL can scale to.
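
For readers unfamiliar with the frequency-control setting, here is a minimal simulation sketch of linearized swing dynamics, with a fixed local droop controller u_i = -k ω_i standing in for a learned policy. All parameters (inertia M, damping D, line coupling b, gain k, step size dt), the 3-bus line network, and the helper names are illustrative assumptions; the point is only that bus i's update reads its neighbors' phase angles and nothing else.

```python
def swing_step(theta, omega, p, adj, M, D, b, k, dt):
    """One explicit-Euler step of linearized swing dynamics with local droop control.

    Bus i only reads theta[j] for j in adj[i], mirroring the network sparsity.
    """
    n = len(theta)
    new_theta = [theta[i] + dt * omega[i] for i in range(n)]
    new_omega = []
    for i in range(n):
        coupling = sum(b * (theta[i] - theta[j]) for j in adj[i])
        u = -k * omega[i]  # droop: proportional feedback on the local frequency deviation
        domega = (-D * omega[i] - coupling + p[i] + u) / M
        new_omega.append(omega[i] + dt * domega)
    return new_theta, new_omega


def simulate(p, adj, k, M=1.0, D=1.0, b=1.0, dt=0.01, steps=10_000):
    """Start from rest, apply power imbalance p, and return final frequency deviations."""
    n = len(p)
    theta, omega = [0.0] * n, [0.0] * n
    for _ in range(steps):
        theta, omega = swing_step(theta, omega, p, adj, M, D, b, k, dt)
    return omega


if __name__ == "__main__":
    adj = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical 3-bus line
    load_step = [-0.3, 0.0, 0.0]       # extra load at bus 0
    print("no droop:", simulate(load_step, adj, k=0.0))
    print("droop k=4:", simulate(load_step, adj, k=4.0))
```

With this load step, every bus settles at the common frequency deviation Σp / Σ(D + k), about -0.1 without droop and -0.02 with k = 4.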

Time-varying network

Another typical structure is the time-varying network, motivated by networked robots in which each robot interacts with nearby robots, e.g., through aerodynamic interactions between drones. As the robots move, the set of nearby robots can change drastically from step to step, i.e., the network is "time-varying". In the following recent work, we show that it is still possible to achieve scalable RL under a specific information structure.
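
As a hedged illustration of why the network is time-varying, the sketch below recomputes each robot's interaction set from its current position, using a distance threshold as a stand-in for proximity-based interactions such as aerodynamic effects between drones. The radius, the random-walk motion model, and the helper names are assumptions for illustration only; the information structure studied in the work is not modeled here.

```python
import math
import random


def neighbor_sets(positions, radius):
    """Return {agent: set of agents strictly closer than `radius`} at this step."""
    n = len(positions)
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) < radius:
                nbrs[i].add(j)
                nbrs[j].add(i)
    return nbrs


def random_walk(positions, step, rng):
    """Hypothetical motion model: each agent moves by a uniform offset in [-step, step] per axis."""
    return [(x + rng.uniform(-step, step), y + rng.uniform(-step, step))
            for x, y in positions]


if __name__ == "__main__":
    rng = random.Random(0)
    pos = [(rng.uniform(0, 5), rng.uniform(0, 5)) for _ in range(6)]
    for t in range(3):
        # The interaction graph is rebuilt from scratch because positions changed.
        print(t, neighbor_sets(pos, radius=1.5))
        pos = random_walk(pos, step=1.0, rng=rng)
```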

Symmetry

Consider a power system demand response program, in which a system dispatcher interacts with a large number of individual customers, or a queueing system, in which a dispatcher assigns jobs to a large number of queues. In both cases there are many agents, but they share similar structures and properties. In this line of work, we draw inspiration from the "power-of-two-choices" idea in queueing theory and develop a MARL method that effectively handles a large number of similar agents.
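
The power-of-two-choices idea itself is easy to demonstrate. In this sketch, each arriving job polls d queues uniformly at random and joins the shortest; d = 1 is purely random dispatching and d = 2 is the power-of-two-choices policy. Jobs never depart (a static balls-into-bins simplification), and the function name is hypothetical. It shows only the classical queueing idea the paragraph cites, not the MARL method built on it.

```python
import random


def max_queue_length(num_jobs, num_queues, d, rng):
    """Dispatch jobs one at a time; each polls d random queues and joins the shortest.

    d = 1 is uniformly random dispatching; d = 2 is power-of-two-choices.
    Jobs never depart, so this is the static balls-into-bins version.
    """
    lengths = [0] * num_queues
    for _ in range(num_jobs):
        polled = rng.sample(range(num_queues), d)  # d distinct queues
        shortest = min(polled, key=lambda q: lengths[q])
        lengths[shortest] += 1
    return max(lengths)


if __name__ == "__main__":
    n = 10_000
    print("random (d=1):", max_queue_length(n, n, 1, random.Random(0)))
    print("two choices (d=2):", max_queue_length(n, n, 2, random.Random(0)))
```

Classical results predict that the maximum queue length grows like log n / log log n under random dispatching but only like log log n with two choices, so sampling just one extra queue per job sharply reduces the worst-case load.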

Physics-based model

[Figure: Model-based diffusion]

Many engineering systems, such as power systems and robots, come with a physics-based dynamics model derived from first principles. In contrast, RL approaches do not assume a specific model class and instead learn a controller (often a neural network) in a data-driven manner. In this line of work, we seek to leverage such physics-based models to improve the efficiency of RL and other ML-based approaches.
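
One simple way a physics-based model class can reduce data requirements is to keep the first-principles structure and fit only its unknown parameters. The sketch assumes a pendulum, θ'' = -(g/l) sin θ - c θ' + u, with g/l known and the damping c unknown; because the model is linear in c, a closed-form least-squares fit recovers it from a handful of transitions. The pendulum, the constant `G_OVER_L`, and both helpers are hypothetical choices for illustration, not the project's method.

```python
import math

G_OVER_L = 9.81  # assumed known from first principles (g / l with l = 1 m)


def physics_accel(theta, omega, u, c):
    """Assumed model class: theta'' = -(g/l) sin(theta) - c * omega + u."""
    return -G_OVER_L * math.sin(theta) - c * omega + u


def fit_damping(samples):
    """Least-squares estimate of c from (theta, omega, u, observed_accel) tuples.

    The model is linear in c: accel - (-(g/l) sin(theta) + u) = -c * omega,
    so c_hat = -sum(residual * omega) / sum(omega ** 2).
    """
    num = 0.0
    den = 0.0
    for theta, omega, u, accel in samples:
        residual = accel - (-G_OVER_L * math.sin(theta) + u)
        num += residual * omega
        den += omega * omega
    return -num / den


if __name__ == "__main__":
    true_c = 0.35
    states = [(0.1, 0.5, 0.0), (-0.4, -1.2, 0.3), (1.0, 2.0, -0.5)]
    data = [(th, om, u, physics_accel(th, om, u, true_c)) for th, om, u in states]
    print("estimated damping:", fit_damping(data))
```

Three noise-free transitions suffice here because only one scalar is unknown; a black-box model of the same dynamics would have to learn the sin θ term from data as well.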