Implement, document, and test --cautious mode #211
I believe this would be a helpful feature for sure. Some thoughts. I think there are two workflows for changes:
Workflow 2 is the one that the "cautious" rollout works for. A smaller percentage of clients take the change over a longer time period, but rollback is quicker for the clients that have already taken it, resulting in lower impact.
Workflow 1 is common, but fits the two-round rollout rather than a "cautious" rollout. It gives you a shorter time to convergence across all clients and a quick rollback. At the same time, a larger percentage of clients take the change in a shorter amount of time, so if it's a bad change the impact is bigger.
One thing that comes to mind, which caused a big issue for me with Terraform: when you do "cautious" and apply, you have changed the state on the provider, but you haven't changed the state locally. This is similar to making changes outside of Terraform, or making changes and applying but not pushing up your changes. Two cases I can think of that would become an issue:
I know in the case of GitHub, you're pushing up changes, there's a noop process on a branch, review, and then a deploy of the branch, then merge. In the case of cautious, there is a period where the branch does not reflect the true state. If someone else made a branch and did a noop, I guess the "removing cautious" change would hopefully be caught during review, before their deploy went out. However, if you're banking on having some time before your record is updated to "normal" to protect your rollback, you effectively have to halt all DNS deploys until you've either rolled back or gone forward with bringing it to "normal". That period could be a while, especially if you're waiting for traffic to shift, and even then you could have an issue that only comes up under load, in which case you'd definitely want the ability to roll back.
I think as far as the feature goes, it's definitely a great addition to OctoDNS. The points above are, I believe, process problems, not a tool problem. Just curious if you have any thoughts or have run into any issues like that, if you've been using the cautious feature internally for some time. I'll pull the branch down in the morning and give it a try :)
Thanks @ross, this is a great feature! I agree with @yzguy that it comes with a workflow challenge. I was thinking that an intermediate branch could push a "cautious" state and the master branch would set the normal state. But this would require another (manual) merge, and you don't want the records in a "cautious" state for too long either.
    def make_cautious(self):
        for change in self.changes:
            if change.new:
                change.new.ttl = 60
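For anyone who wants to try the snippet above in isolation, here is a minimal self-contained sketch. The `Record`, `Change`, and `Plan` classes are hypothetical stand-ins written for this example, not octoDNS's real classes; only `make_cautious` and the 60s value come from the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional

CAUTIOUS_TTL = 60  # seconds, the value used in the diff above


@dataclass
class Record:
    # Hypothetical stand-in for a DNS record, not the octoDNS class
    name: str
    value: str
    ttl: int


@dataclass
class Change:
    # Assumption: existing is None for a create, new is None for a delete
    existing: Optional[Record]
    new: Optional[Record]


@dataclass
class Plan:
    changes: List[Change] = field(default_factory=list)

    def make_cautious(self) -> None:
        # Override the TTL on every record that will exist after the plan
        # is applied; deletes have no new record and are left alone.
        for change in self.changes:
            if change.new:
                change.new.ttl = CAUTIOUS_TTL


plan = Plan([
    # An update: the value changes, the intended TTL is one hour
    Change(existing=Record("www", "1.2.3.4", 3600),
           new=Record("www", "5.6.7.8", 3600)),
    # A delete: there is no new record to touch
    Change(existing=Record("old", "1.2.3.4", 300), new=None),
])
plan.make_cautious()
print(plan.changes[0].new)
```

Note that the sketch mutates the planned records in place, as the diff does, so the intended TTL would have to be recovered from the source config in a later run.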
This seems like it describes the use-case this was intended for. Changes are rolled out with a short TTL so that if something goes wrong and you revert, the reversion will take effect quickly, thus "cautious."
I think I'd call this one aggressive more than cautious. Lowering the TTL before making a change would cause all the clients to pick it up quickly, so if there was a problem, everyone would see it before you'd have a chance to revert. You could at least still revert quickly, assuming you left the lower TTL in place when the value change happened.
Our deployment system ensures that only one branch deploy is active at a time. In general I think there are a lot of problems you could run into without "doit" locking of some sort. Terraform's serial numbers and saved-plans style setup could help with that, but that was a direction we consciously didn't end up taking, to avoid the need for state of some sort (and the extra complexity).

ross commented Feb 25, 2018
Normal plan:
Cautious plan:
This automates something we've done a couple of times now to "cautiously" roll out changes in a way where we have a quick path to revert. We've been doing this with two rounds of changes, where we manually set the TTLs on the 😨 bits to 60s in the first round and then to the real value we want in the subsequent one. This just builds that concept into octoDNS itself.
It seems useful, but feedback/thoughts solicited.
/cc @theojulienne @joewilliams @yzguy @vanbroup
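As a rough illustration of the manual two-round process described above, the sketch below computes what each round would push. It is not how octoDNS does it: the `Zone` mapping and the `two_round_rollout` helper are made up for this example, and records are reduced to a `(value, ttl)` pair.

```python
from typing import Dict, List, Tuple

CAUTIOUS_TTL = 60  # seconds, the short TTL used for the first round

# Hypothetical zone model for this example: name -> (value, ttl)
Zone = Dict[str, Tuple[str, int]]


def two_round_rollout(current: Zone, desired: Zone) -> List[Zone]:
    """Return the zone contents to push in round one and round two."""
    first: Zone = {}
    for name, (value, ttl) in desired.items():
        if current.get(name) != (value, ttl):
            # Changed or new record: ship the new value with a short TTL
            # so that a revert propagates quickly.
            first[name] = (value, CAUTIOUS_TTL)
        else:
            # Unchanged record: leave it exactly as it is.
            first[name] = (value, ttl)
    # Round two restores the real TTLs once the change looks healthy.
    return [first, dict(desired)]


current = {"www": ("1.2.3.4", 3600), "api": ("9.9.9.9", 300)}
desired = {"www": ("5.6.7.8", 3600), "api": ("9.9.9.9", 300)}

rounds = two_round_rollout(current, desired)
for number, zone in enumerate(rounds, start=1):
    print(f"round {number}: {zone}")
```

Between the two rounds the provider holds TTLs that the config does not describe, which is the window the workflow concerns in this thread are about.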