Abstract: We explored the possibility that, for longer-form expressions of reinforcement learning (win-calmness, loss-restlessness) to manifest across tasks, they must first develop from micro-transactions within tasks. We found no evidence of win-calmness or loss-restlessness when wins could not be maximised (unexploitable opponents), nor when the threat of win minimisation was presented (exploiting opponents), but we did find evidence of win-calmness (though not loss-restlessness) when wins could be maximised (exploitable opponents).