Is Fisherian runaway gradient hacking?

TL;DR: No; there is no directed agency that enforces sexual selection through an exploitable proxy. However, Fisherian runaway is an insightful example of the path-dependence of local search, where an easily acquired and apparently useful proxy goal can be so strongly favored that disadvantageous traits emerge as side effects.

Why are peacocks so ornamented that they are at greatly increased risk of predation? How could natural selection favor such energetically expensive plumage that offers no discernible survival advantage? The answer is “sex”, or more poetically, “demons in imperfect search”.

Fisherian runaway is a natural process in which an easy-to-measure proxy for a “desired” trait is “hacked” by the optimization pressure of evolution, leading to “undesired” traits. In the peacock example, a more ornamented tail could serve as a highly visible proxy for male fitness: peacocks that survive with larger tails are more likely to be agile and good at acquiring resources for energy. Alternatively, perhaps a preference for larger tail size is randomly acquired. In any case, once sexual selection by peahens has zeroed in on “plumage size” as a desirable feature, males with more plumage will likely have more offspring, reinforcing the trait in the population. Consequently, females are further driven to mate with large-tailed males, as their male offspring will have larger tails and thus be more favored by mates. This selection process may then “run away” and produce peacocks with ever larger tails via positive feedback, until the fitness detriment of this trait exceeds the benefit of selecting for fitter birds.
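This feedback loop is easy to caricature in code. Below is a minimal, purely illustrative simulation (all names and parameter values are my own invention, not taken from any ecological model): heritable tail size and heritable preference become correlated through mate choice, and whether the mean tail keeps growing or stalls depends on how harsh the survival cost and noise parameters are.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy parameters (invented for illustration, not calibrated to any real system)
N = 1000            # population size per generation
GENERATIONS = 200
COST = 0.05         # viability cost of tail size: larger => harsher natural selection
MUT_SD = 0.05       # mutational noise added to each offspring trait

# Each individual carries two heritable values: tail size (expressed in males)
# and mating preference (expressed in females).
tail = rng.normal(0.0, 0.2, N)
pref = rng.normal(0.5, 0.2, N)

for gen in range(GENERATIONS):
    males = rng.permutation(N)[: N // 2]
    females = np.setdiff1d(np.arange(N), males)

    # Natural selection: big tails reduce male survival.
    survival = np.exp(-COST * tail[males] ** 2)
    surviving = males[rng.random(males.size) < survival]
    if surviving.size == 0:
        break

    # Sexual selection: each mother picks a surviving male with probability
    # increasing in (her preference) x (his tail size).
    offspring_tail = np.empty(N)
    offspring_pref = np.empty(N)
    for i, f in enumerate(rng.choice(females, N)):
        score = pref[f] * tail[surviving]
        w = np.exp(score - score.max())        # softmax weights, numerically stable
        m = rng.choice(surviving, p=w / w.sum())
        # Midparent inheritance plus noise: sons of choosy mothers tend to have
        # big tails, so tail size and preference become genetically correlated.
        offspring_tail[i] = (tail[m] + tail[f]) / 2 + rng.normal(0.0, MUT_SD)
        offspring_pref[i] = (pref[m] + pref[f]) / 2 + rng.normal(0.0, MUT_SD)

    tail, pref = offspring_tail, offspring_pref
    if gen % 20 == 0:
        print(f"gen {gen:3d}: mean tail {tail.mean():+.2f}, mean preference {pref.mean():+.2f}")
```

With a weak cost, the mating advantage of big tails and the inherited preference for them reinforce each other; turning the cost up lets the “outer” selection pressure dominate again, a point I return to in the insights below.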

In outsourcing to sexual selection, natural selection has found an optimization demon. The overall decrease in peacock fitness is possible because the sexual selection pressure of the peahen locally exceeds the selection pressure imposed by predation and food availability. Peacocks have reached an evolutionary “dead-end”, where a maladaptive trait is dominant and persistent. If peacocks were moved “off distribution” to an environment where predation was harsher or food more scarce, they would fare significantly worse than their less ornamented, “unsexy” ancestors.

Gradient hacking is a process by which an internally acquired “mesa-optimizer” might compromise the optimization process of stochastic gradient descent (SGD) in a machine learning system. A mesa-optimizer might accomplish this by:

  1. Introducing a countervailing, “artificial” performance penalty that “masks” the performance benefits of model modifications that do well on the SGD objective, but not on the mesa-objective (a toy sketch of this masking appears after this list);
  2. “Spoofing” performance benefits of certain ML modifications that are desirable to the mesa-objective by withholding performance gains until their implementation; or
  3. In a reinforcement learning context, selectively sampling environmental states that will either leave the mesa-objective unchanged or “steer” the ML model in a way that favors the mesa-objective.
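To make the first mechanism concrete, here is a deliberately artificial sketch. The “mesa” penalty below is hard-coded by hand; a real gradient hacker would have to realize something like it inside its own weights, and nothing here shows that this is achievable. The point is only that a countervailing penalty term can mask the base objective’s preferred solution from gradient descent.

```python
# Base objective: plain gradient descent "wants" theta -> 2.0.
def base_loss(theta):
    return (theta - 2.0) ** 2

# Hand-coded stand-in for a mesa-optimizer's countervailing penalty: it "masks"
# any improvement on the base objective whenever theta drifts away from the
# mesa-preferred value of 0.0.
def mesa_penalty(theta, strength=10.0):
    return strength * theta ** 2

def total_loss(theta):
    return base_loss(theta) + mesa_penalty(theta)

def grad(f, theta, eps=1e-5):
    # Finite-difference gradient, to keep the sketch dependency-free.
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)

theta, lr = 0.0, 0.01
for _ in range(2000):
    theta -= lr * grad(total_loss, theta)

print(f"theta after descent:  {theta:.3f}")   # ~0.18, far from the base optimum of 2.0
print(f"base loss still paid: {base_loss(theta):.3f}")
```

Gradient descent on the combined loss settles near the mesa-preferred value (theta ≈ 0.18) rather than the base optimum at 2.0, permanently paying a base-loss cost of roughly 3.3 to keep the mask in place.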

Mesa-optimization might be an “easily acquired policy” for good performance on a sufficiently complex ML task. Many mesa-objectives that allow for good performance in training may point to a proxy that, when optimized for in deployment, leads to undesirable behavior. Worse still is the case where a mesa-optimizer is instrumentally motivated to “deceive” SGD because it has acquired both a mesa-objective that is misaligned with the outer objective and the capability to preserve or pursue that mesa-objective via gradient hacking.

Fisherian runaway seems similar to the first gradient hacking mechanism in that:

  • Sexual selection amplifies the proxy objective of “enormous tail plumage” because it serves as a locally good indicator of “fitness”. Producing a fit species is hard for natural selection, given its reliance on random, undirected search and the sparse feedback provided by signals such as “death by predation after maturity” (i.e. after acquiring plumage). Outsourcing to sexual selection based on an easily discerned proxy for fitness (tails are enormous!) allows for quicker, more reliable feedback.
  • The gradient of adaptations that would lead to globally better fitness is “masked” by local search. All else being equal, peacocks with smaller tails are more agile and conserve more energy. Only the local “speed bump” of sexual selection pressure prevents peacocks from being guided toward the trait natural selection would otherwise favor: far smaller tails.
  • The runaway amplification of maladaptive traits by sexual selection compromises the apparent objective of natural selection (fitness) in a manner similar to how gradient hacking results in compromised performance on the base objective.
  • If a peacock is moved “out-of-distribution”, it will “fail hard” according to the objective of natural selection. This is analogous to the framing of proxy misalignment failures as generalization failures.

Fisherian runaway seems unlike gradient hacking in that:

  • Sexual selection does not “choose” the proxy objective of “larger tails” via an agentic process. Fisherian runaway boosts somewhat arbitrary traits, not just ones that compromise fitness. “Larger tails” may in fact be a randomly acquired preference that is boosted by positive feedback and an implicit “agreement” among the population that larger tails are sexier. There is no “population” employed in SGD, although perhaps there is an analogous feature in genetic algorithms.
  • Natural and sexual selection are likely far noisier and more susceptible to local optima than SGD. It is unclear whether SGD traps ML models in local minima that compromise global performance to the extent seen in Fisherian runaway.

Fisherian runaway offers the following insights for AI alignment:

  • For inner alignment, the selection pressure of the outer optimizer should exceed whatever pressure the mesa-optimizer can apply. If we desire peacocks with higher agility or better energy conservation, we should shape the training environment so that predation and food scarcity are strong enough incentives that any excessive plumage is disfavored (see the snippet after this list). The existence of mild incentives for fitness, without a sufficiently harsh local penalty, is what allows maladaptive local processes to undergo runaway amplification.
  • If a trait appears dominant in an AI system, maybe we should not commit the Darwinian fallacy of assuming that the trait has arisen because it is “purposeful” or globally advantageous. It is unclear to me if the simplicity prior of SGD prohibits the random selection of proxy goals that are then boosted by positive-feedback mechanisms.
  • “Agentic” search might not be necessary for something quite similar to gradient hacking to emerge. The local nature of search via SGD might be sufficient to birth optimization demons.
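As a usage note on the first point: in the (entirely hypothetical) tail-size simulation earlier in the post, this corresponds to making the viability penalty dominate the mating advantage, e.g.:

```python
# In the toy Fisherian-runaway simulation above, strengthen the "outer" pressure
# (hypothetical value) so natural selection overwhelms the sexual-selection loop:
COST = 1.0   # was 0.05; survival now falls off steeply with tail size
```

With a cost this harsh, big-tailed males should rarely survive to mate, so the preference has little to amplify.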

Fisherian runaway in peacock plumage is a surprisingly useful “intuition pump” for exploring gradient hacking. I suspect there are many further examples of possible Fisherian runaway processes in nature that could be mined for useful insight, such as that discussed here. Ecological models that favor Fisherian runaway might be adapted into useful mathematical approximations of gradient hacking, allowing the phenomenon to be instantiated and studied in minimal ML models.


This article was posted on LessWrong.

Written on April 10, 2022