(Teaching kites to fly using) Reinforcement Learning

An earlier topic: Teaching a glider to soar using reinforcement learning in the field

Links to more info and related topics here.


…

Original topic: Swiss researchers invent drone-flying AI that tops champions


Champion-level Drone Racing using Deep Reinforcement Learning (Nature, 2023) - YouTube


Autonomous Drone Racing with Deep Reinforcement Learning - YouTube

Off-topic here, but still relevant to kite control:

NVIDIA’s New AI Just Made Real Physics Look Slow – Two Minute Papers

A few thoughts on this use of Reinforcement Learning (RL):

  • RL is normally done in simulation (as opposed to on an actual kite, in this case), because
    1. it needs a lot of trials to learn a useful behavior. If the simulation can run sufficiently fast, the learning system has “more time” to learn (a minimal sketch follows this list).
    2. failures of a physical machine can be expensive.
  • While simulations can be made almost arbitrarily faithful to the real physical “thing”, there are some important caveats:
    1. they can predict only what we foresee, e.g. if we do not (or don’t know how to) model kite-line failure conditions and modes, the RL model will never learn how to handle a line break.
    2. we cannot tell in advance which properties of the kite are most relevant to its dynamic behavior (e.g. does humidity influence fabric weight and elasticity?), so one feels compelled to fill in every property in detail without knowing whether it is needed, which takes significant work.
    3. we also need to account for inaccuracies/differences between the physical sensors and the simulated ones.
    4. for wind-dependent machines it is hard both to simulate wind changes (turbulence, gusts) in the presence of ground features (trees, hills, buildings) and to predict all possible ground features of a future wind-power-station site.
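
To make the first point concrete, here is a minimal sketch of what “RL in simulation” looks like: a tabular Q-learning agent running thousands of cheap steps in a toy, entirely made-up kite model, with the wind profile randomized every episode as one crude answer to caveat 4. All names, dynamics and constants below are invented for illustration; nothing here comes from the linked papers.

```python
# A minimal, invented sketch of RL in simulation: tabular Q-learning on a
# toy 1-D “kite” whose elevation is discretized into bins.
import random

N_POS = 10            # discretized kite elevation bins
ACTIONS = (-1, 0, 1)  # ease line / hold / pull line

def simulate_step(pos, action, wind):
    """Toy dynamics: the kite drifts with the wind and responds to line input."""
    pos = max(0, min(N_POS - 1, pos + action + wind))
    reward = 1.0 if 3 <= pos <= 6 else -0.1   # reward staying in the “power zone”
    return pos, reward

Q = [[0.0] * len(ACTIONS) for _ in range(N_POS)]
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(5000):
    # Crude domain randomization (cf. caveat 4): draw a new gust probability
    # each episode so the policy does not overfit a single wind profile.
    gust_prob = random.uniform(0.0, 0.4)
    pos = random.randrange(N_POS)
    for t in range(50):
        if random.random() < eps:                          # explore
            a = random.randrange(len(ACTIONS))
        else:                                              # exploit
            a = max(range(len(ACTIONS)), key=lambda i: Q[pos][i])
        wind = random.choice((-1, 1)) if random.random() < gust_prob else 0
        next_pos, r = simulate_step(pos, ACTIONS[a], wind)
        # Standard Q-learning update
        Q[pos][a] += alpha * (r + gamma * max(Q[next_pos]) - Q[pos][a])
        pos = next_pos
```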

An alternative to reinforcement learning is supervised learning from human demonstrations (often called imitation learning or behavioral cloning), by which both a human’s “correct” control actions and the physical state of the controlled system (the kite’s speed, acceleration, position, orientation, etc.) are recorded for long enough to collect sufficient data for a learning algorithm that learns directly from real-world data.
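
As a minimal sketch of what that could look like (the log file name and column layout are hypothetical, assumed only for illustration):

```python
# Behavioral cloning sketch: fit a policy to logged (state, human action)
# pairs recorded while a person flies the real kite.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical log columns: speed, acceleration, pos_x, pos_y, orientation, line_action
log = np.loadtxt("kite_flight_log.csv", delimiter=",")
states, actions = log[:, :5], log[:, 5]

# Fit a policy that imitates the human pilot: state -> control action.
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
policy.fit(states, actions)

# At flight time the learned policy replaces the human:
# action = policy.predict(current_state.reshape(1, -1))
```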

The advantages here are:

  • the learning algorithm is fed actual, accurate sensor data, so we don’t need the expertise and detailed work required to create an accurate simulation.
  • we can skip the lengthy initial trial-and-error stage in which a totally “dumb” RL agent begins to grasp the right actions, which translates into less training experience required (a sketch of this follows the list).
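
One way to picture that last point, as a hedged sketch reusing the hypothetical `policy` trained above: instead of starting exploration from a random policy, start from the cloned one, so almost every action taken on the real kite is already sensible.

```python
# Sketch of combining the two approaches: explore around the cloned policy
# instead of starting from scratch. Reuses the hypothetical `policy` above.
import numpy as np

def act(state, eps=0.05):
    """Mostly follow the human-derived policy; explore only occasionally.

    Because the starting policy is already sensible, far fewer (and far
    safer) exploratory trials are needed than with a from-scratch agent.
    """
    if np.random.random() < eps:
        return float(np.random.uniform(-1.0, 1.0))  # small random line input
    return float(policy.predict(np.asarray(state, dtype=float).reshape(1, -1))[0])
```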

Some disadvantages persist, e.g. having no recorded data on unexpected circumstances, and the possibly long stretch of human work spent handling a real kite. But IMO it might be easier to fly a kite for days or weeks to collect sufficient data than to attempt to develop a simulation of the same kite, if only because it is more fun and draws on a more readily available domain of expertise.
