The Wayback Machine - https://web.archive.org/web/20201004074030/https://github.com/jetstack/cert-manager/issues/3222
Failed certificates are re-tried infinitely without backoff #3222

Open
mitom opened this issue Aug 26, 2020 · 3 comments

Comments

@mitom mitom commented Aug 26, 2020

Describe the bug:
The rate-limiting work queue in the controllers does not appear to actually rate limit: certificate failures are retried immediately, without backoff.

Expected behaviour:
When a certificate fails to be provisioned with an error, it should be retried with exponential backoff.

Steps to reproduce the bug:

  • Set up a certificate request with a DNS challenge on Route 53 for a hosted zone that cert-manager is not permitted to manage.
  • Watch the logs: the challenge is retried continuously.

This should reproduce with any other setup as well, since the problem seems to be in the generic error-handling retry mechanism; I simply only have Route 53 and the Let's Encrypt issuer set up.

Anything else we need to know?:

I have tried debugging it a bit but didn't get far. The logs led me to the rate-limiting conclusion: see the timing between retry attempts below, which are only a few hundred milliseconds apart. Based on https://github.com/jetstack/cert-manager/blob/v0.16.0/pkg/controller/acmechallenges/controller.go#L87 there should be a minimum delay of 30s between the 1st and 2nd attempt, but that does not appear to be the case. I've also added another log statement to check whether the queue is unaware of the retries, but that part seems to work as well (see the lines with "Item has been re-queued" ... "req"="0"; req is just b.queue.NumRequeues(obj)). I am not sure what is missing.

(Note: the retries go on forever and exhaust the AWS Route 53 API rate limit.)

I0826 12:10:18.099774       1 controller.go:152] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:11:22.400468       1 controller.go:152] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:40.400383       1 controller.go:153] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:51.408314       1 controller.go:159] cert-manager/controller/challenges "msg"="Item has been re-queued" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098" "req"="0"
E0826 12:13:51.408414       1 controller.go:160] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: AccessDenied: User: arn:aws:sts::****************:assumed-role/k8s-test-cert-manager/1598444030958166056 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/****************status code: 403, request id: 6f748fa6-faad-4320-a7ef-db84f5c7484c" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:51.408988       1 controller.go:153] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:51.688875       1 controller.go:159] cert-manager/controller/challenges "msg"="Item has been re-queued" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098" "req"="1"
E0826 12:13:51.688895       1 controller.go:160] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: AccessDenied: User: arn:aws:sts::****************:assumed-role/k8s-test-cert-manager/1598444031409407943 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/****************status code: 403, request id: 8c5b8d8e-d514-493a-8e2f-cbb4b435360d" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:51.688914       1 controller.go:153] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:51.888651       1 controller.go:159] cert-manager/controller/challenges "msg"="Item has been re-queued" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098" "req"="2"
E0826 12:13:51.888682       1 controller.go:160] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: AccessDenied: User: arn:aws:sts::****************:assumed-role/k8s-test-cert-manager/1598444031689124036 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/****************status code: 403, request id: bd14d8d7-6a35-4fb1-bea7-8aaf9e4440f2" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:51.888701       1 controller.go:153] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:52.287100       1 controller.go:159] cert-manager/controller/challenges "msg"="Item has been re-queued" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098" "req"="3"
E0826 12:13:52.287129       1 controller.go:160] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: AccessDenied: User: arn:aws:sts::****************:assumed-role/k8s-test-cert-manager/1598444031888943787 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/****************status code: 403, request id: b12e5ee0-6fd9-4e25-be76-e9ee010b2c43" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:52.287156       1 controller.go:153] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:52.506037       1 controller.go:159] cert-manager/controller/challenges "msg"="Item has been re-queued" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098" "req"="4"
E0826 12:13:52.506182       1 controller.go:160] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: AccessDenied: User: arn:aws:sts::****************:assumed-role/k8s-test-cert-manager/1598444032287385276 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/****************status code: 403, request id: 233a3d3a-3c16-4e13-ba08-f88716ff05ba" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:52.506275       1 controller.go:153] cert-manager/controller/challenges "msg"="syncing item" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"
I0826 12:13:52.790780       1 controller.go:159] cert-manager/controller/challenges "msg"="Item has been re-queued" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098" "req"="5"
E0826 12:13:52.790881       1 controller.go:160] cert-manager/controller/challenges "msg"="re-queuing item  due to error processing" "error"="Failed to change Route 53 record set: AccessDenied: User: arn:aws:sts::****************:assumed-role/k8s-test-cert-manager/1598444032507068901 is not authorized to perform: route53:ChangeResourceRecordSets on resource: arn:aws:route53:::hostedzone/****************status code: 403, request id: 654b0cee-c630-4478-b56c-9cf1abc7fdf4" "key"="monitoring/grafana-tls-nc6lr-1800940701-3405944098"

Environment details:

  • Kubernetes version (e.g. v1.10.2): v1.16.8-eks-e16311
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): AWS
  • cert-manager version (e.g. v0.4.0): 0.16.0
  • Install method (e.g. helm or static manifests): helm

/kind bug

@mitom mitom changed the title from "Rate limiting queue does not rate limit" to "Failed certificates are re-tried infinitely without backoff" on Aug 26, 2020
@meyskens meyskens commented Sep 22, 2020 · Member

/priority important-soon
/area acme/dns01

@meyskens meyskens commented Sep 22, 2020 · Member

/good-first-issue

@jetstack-bot jetstack-bot commented Sep 22, 2020 · Collaborator

@meyskens:
This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

/good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Projects: None yet
Linked pull requests: None yet
3 participants