[occm] slow reconciliation of LoadBalancer service with multiple listeners on k8s node add/remove #2858

@kayrus

Description

Is this a BUG REPORT or FEATURE REQUEST?:

/kind feature

What happened:

LoadBalancer services with multiple listeners (ports; see the example below) take too much time to reconcile pool members.
This is due to the Octavia API design: if at least one of a load balancer's child resources is in PENDING_UPDATE, Octavia responds with 409 and does not allow modifying any other child resource.

OCCM has hardcoded retry and backoff parameters and checks resource statuses with an exponentially increasing delay:

klog.InfoS("Waiting for load balancer ACTIVE", "lbID", loadbalancerID)
steps := getTimeoutSteps("OCCM_WAIT_LB_ACTIVE_STEPS", waitLoadbalancerActiveSteps)
backoff := wait.Backoff{
	Duration: waitLoadbalancerInitDelay,
	Factor:   waitLoadbalancerFactor,
	Steps:    steps,
}

  • Duration = 1 second (initial wait time)
  • Factor = 1.2 (multiplier for each step)
  • Steps = 23 (maximum retries)
  • Total time to wait for each pool: 2-3m

Considering the above, the status update is usually detected on step 19-20, by which point the cumulative delay is between 2.5 and 3.1 minutes, even though the actual status change most likely happens within 2 minutes. If we capped the per-step delay at 10 s, we could save up to 10 minutes (roughly 40% of the 26-30 minute total wait) for 10 pools.
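As a sanity check, the cumulative wait implied by these parameters can be computed directly. This is a standalone sketch, not OCCM code; `cumulativeWait` is a helper introduced here for illustration, mirroring the `wait.Backoff` values above (Duration=1s, Factor=1.2, Steps=23):

```go
package main

import "fmt"

// cumulativeWait returns the total seconds slept after the first n backoff
// steps with an initial delay of 1 s and a factor of 1.2 per step.
func cumulativeWait(n int) float64 {
	delay, total := 1.0, 0.0
	for i := 0; i < n; i++ {
		total += delay
		delay *= 1.2
	}
	return total
}

func main() {
	for _, n := range []int{19, 20, 23} {
		s := cumulativeWait(n)
		fmt.Printf("after step %d: %.0f s (%.1f min)\n", n, s, s/60)
	}
	// after step 19: 155 s (2.6 min)
	// after step 20: 187 s (3.1 min)
	// after step 23: 326 s (5.4 min)
}
```

This matches the 2.5-3.1 minute window at steps 19-20 quoted above, and shows the full 23-step budget tops out at roughly 5.4 minutes per wait.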

What you expected to happen:

These delay settings should be configurable via service annotations. In addition, it should be possible to check the LB status every second, without an exponential delay.
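One possible shape for such annotations, shown on the document's example Service. Note that the annotation names below are hypothetical, invented here to illustrate the proposal; they do not exist in OCCM today:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: openstack-loadbalancer
  annotations:
    loadbalancer.openstack.org/protocol: "tcp"
    # Hypothetical tuning knobs (not implemented):
    loadbalancer.openstack.org/wait-active-init-delay: "1s"   # initial poll delay
    loadbalancer.openstack.org/wait-active-max-delay: "10s"   # upper cap on delay
    loadbalancer.openstack.org/wait-active-factor: "1.0"      # 1.0 = no exponential growth
spec:
  type: LoadBalancer
```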

How to reproduce it:

Create a service on a cluster with one node:

apiVersion: v1
kind: Service
metadata:
  name: openstack-loadbalancer
  annotations:
    loadbalancer.openstack.org/protocol: "tcp"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - name: port-80
      protocol: TCP
      port: 80
      targetPort: 80
    - name: port-81
      protocol: TCP
      port: 81
      targetPort: 81
    - name: port-82
      protocol: TCP
      port: 82
      targetPort: 82
    - name: port-83
      protocol: TCP
      port: 83
      targetPort: 83
    - name: port-84
      protocol: TCP
      port: 84
      targetPort: 84
    - name: port-85
      protocol: TCP
      port: 85
      targetPort: 85
    - name: port-86
      protocol: TCP
      port: 86
      targetPort: 86
    - name: port-87
      protocol: TCP
      port: 87
      targetPort: 87
    - name: port-88
      protocol: TCP
      port: 88
      targetPort: 88
    - name: port-89
      protocol: TCP
      port: 89
      targetPort: 89
    - name: port-90
      protocol: TCP
      port: 90
      targetPort: 90

Add a new node to the cluster (or remove one); the service reconciliation will take ~30 minutes.

Anything else we need to know?:

See also #1770

Another approach to fixing this is to implement a batch update of pools in the upstream Octavia API.

Environment:

  • openstack-cloud-controller-manager(or other related binary) version:
  • OpenStack version:
  • Others:

    Labels

kind/bug (Categorizes issue or PR as related to a bug), lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale)