It would be nice if puppet agent had the ability had the ability to retry failed http requests to the puppet master.
I have multiple puppet masters sitting behind a load balancer in AWS (ELB). Whenever we update a puppet master, the node is removed from the load balancer while the update occurs. Unfortunately the AWS ELB does not allow quiescent removal of nodes (such that existing connections are allowed to close gracefully). Instead it just severs the connections immediately. This causes errors for agents which are in the middle of making requests to that master.
Another related scenario is when you're updating multiple puppet masters. The masters might be in the middle of updating, and so some masters have newer code than the others. A puppet agent gets a catalog from one master, which says a certain file should exist, but then when the agent goes to fetch that file, it fails because the master it tried to fetch from hasn't updated. Retrying wouldn't be an ideal solution for this scenario as a retry could just hit that same out-of-date master again, but it could possibly work. Yes the ideal solution here is session persistence, but the AWS ELB does not support it.
It might be useful to even allow a configurable backoff (failure; sleep 2; failure; sleep 5; failure; abort...), though a single retry would be sufficient for the first scenario indicated above.
If a backoff is implemented, I think it should only be done once in the whole puppet run. So that if you have a 100 different http requests that have to be made to the puppet master, you don't do the backoff wait thing 100+ times.