Horizontal Pod Autoscaling

Configure Horizontal Pod Autoscaling with your OpenTelemetry Collector

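Collectors managed by the OpenTelemetry Operator have built-in support for Horizontal Pod Autoscaling (HPA). HPA increases or decreases the number of replicas (copies) of your Kubernetes Pods based on a set of metrics. These metrics are typically CPU and/or memory consumption.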

Having the OpenTelemetry Operator manage the Collector's HPA functionality means that you don't have to create a separate Kubernetes HorizontalPodAutoscaler resource to autoscale your Collector.

Since HPA only applies to StatefulSets and Deployments in Kubernetes, make sure that your Collector's spec.mode is either deployment or statefulset.

To configure HPA, you must first define your resource requests and limits by adding a spec.resources configuration to your OpenTelemetryCollector YAML.

resources:
  limits:
    cpu: 100m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 64Mi

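The limits configuration specifies the maximum memory and CPU values. In this example, the CPU limit is 100 millicores (0.1 core) and the memory limit is 128Mi (mebibytes, where 1 mebibyte == 1024 kibibytes).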

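The requests configuration specifies the minimum amount of resources guaranteed to the container. In this example, the minimum allocation is 100 millicores of CPU and 64 mebibytes of memory.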
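To make the units above concrete, the following is a minimal sketch that converts the quantity strings used in this example into plain numbers. The helper names cpu_to_millicores and memory_to_bytes are hypothetical, and the sketch handles only the suffixes that appear here (m for CPU; Ki, Mi, Gi for memory), not the full Kubernetes quantity grammar.

```python
# Hypothetical helpers for illustration only; they cover just the
# suffixes used in this page, not every Kubernetes quantity format.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m' or '0.1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # A bare number is expressed in whole cores.
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    # No suffix: the value is already in bytes.
    return int(quantity)


if __name__ == "__main__":
    print(cpu_to_millicores("100m"))   # CPU limit and request, in millicores
    print(memory_to_bytes("128Mi"))    # memory limit, in bytes
    print(memory_to_bytes("64Mi"))     # memory request, in bytes
```

The memory request is the value that matters most for autoscaling, because utilization targets are evaluated as a percentage of the request rather than the limit.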

Next, configure the autoscaling rules by adding a spec.autoscaler configuration to your OpenTelemetryCollector YAML.

autoscaler:
  minReplicas: 1
  maxReplicas: 2
  targetCPUUtilization: 50
  targetMemoryUtilization: 60
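As a rough illustration of how these targets drive scaling, the sketch below applies the replica formula described in the Kubernetes HPA documentation: the desired count is the current count multiplied by the ratio of current to target utilization, rounded up, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). When several metrics are configured, the largest proposal wins, and the result is clamped to minReplicas and maxReplicas. The function names are hypothetical, and the real controller also accounts for details such as pod readiness and stabilization windows that are ignored here.

```python
import math


def desired_for_metric(current_replicas, current_pct, target_pct, tolerance=0.1):
    """Replica proposal for one metric, ignoring readiness and stabilization."""
    ratio = current_pct / target_pct
    # Skip scaling when utilization is close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas):
    """Combine per-metric proposals; metrics is a list of (current_pct, target_pct)."""
    proposals = [
        desired_for_metric(current_replicas, current, target)
        for current, target in metrics
    ]
    # The largest proposal wins, then it is clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # One replica at 37% CPU (target 50) and 68% memory (target 60),
    # bounded by minReplicas 1 and maxReplicas 2.
    print(desired_replicas(1, [(37, 50), (68, 60)], 1, 2))
```

In this sketch, memory at 68% against a 60% target is what pushes the proposal above one replica, while CPU at 37% against 50% would not trigger a scale-up on its own.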

Putting it all together, the start of your OpenTelemetryCollector YAML should look something like this:

apiVersion: opentelemetry.io/v1beta1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
  namespace: opentelemetry
spec:
  mode: statefulset
  image: otel/opentelemetry-collector-contrib:v0.142.0
  serviceAccount: otelcontribcol
  autoscaler:
    minReplicas: 1
    maxReplicas: 2
    targetCPUUtilization: 50
    targetMemoryUtilization: 60
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 64Mi
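The requirements described so far (a scalable spec.mode, plus resource requests for every metric that has a utilization target) can be checked before applying the manifest. The following is a minimal sketch of such a pre-flight check. It is not part of the Operator; the function hpa_preflight and its messages are hypothetical, and it operates on the spec already loaded into a Python dict. As an assumption of this sketch, a missing mode or a missing explicit request is reported as a problem so that the choice is always visible in the manifest.

```python
SCALABLE_MODES = {"deployment", "statefulset"}


def hpa_preflight(spec: dict) -> list[str]:
    """Return a list of problems that would likely prevent HPA from working."""
    problems = []

    # Assumption of this sketch: require an explicit, scalable mode.
    if spec.get("mode") not in SCALABLE_MODES:
        problems.append("spec.mode must be deployment or statefulset")

    autoscaler = spec.get("autoscaler", {})
    requests = spec.get("resources", {}).get("requests", {})

    # Utilization targets are percentages of the request, so a request is needed.
    if "targetCPUUtilization" in autoscaler and "cpu" not in requests:
        problems.append("targetCPUUtilization set without a cpu request")
    if "targetMemoryUtilization" in autoscaler and "memory" not in requests:
        problems.append("targetMemoryUtilization set without a memory request")

    min_r = autoscaler.get("minReplicas")
    max_r = autoscaler.get("maxReplicas")
    if min_r is not None and max_r is not None and min_r > max_r:
        problems.append("minReplicas is greater than maxReplicas")

    return problems


if __name__ == "__main__":
    # The spec from the example manifest, written as a dict.
    example_spec = {
        "mode": "statefulset",
        "autoscaler": {
            "minReplicas": 1,
            "maxReplicas": 2,
            "targetCPUUtilization": 50,
            "targetMemoryUtilization": 60,
        },
        "resources": {
            "limits": {"cpu": "100m", "memory": "128Mi"},
            "requests": {"cpu": "100m", "memory": "64Mi"},
        },
    }
    print(hpa_preflight(example_spec))
```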

Once the OpenTelemetryCollector is deployed to Kubernetes with HPA enabled, the Operator creates a HorizontalPodAutoscaler resource for your Collector in Kubernetes. You can check this by running:

kubectl get hpa -n <your_namespace>

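If everything is working as expected, the output of the command should look similar to this (the exact values depend on your configuration and load):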

NAME                REFERENCE                        TARGETS                         MINPODS   MAXPODS   REPLICAS   AGE
otelcol-collector   OpenTelemetryCollector/otelcol   memory: 68%/60%, cpu: 37%/50%   1         3         2          77s

To get more detailed information, you can describe your HPA resource by running:

kubectl describe hpa <your_collector_name> -n <your_namespace>

If everything is working as expected, the output of the command should look similar to this:

Name:                                                     otelcol-collector
Namespace:                                                opentelemetry
Labels:                                                   app.kubernetes.io/benchmark-test=otelcol-contrib
                                                          app.kubernetes.io/component=opentelemetry-collector
                                                          app.kubernetes.io/destination=dynatrace
                                                          app.kubernetes.io/instance=opentelemetry.otelcol
                                                          app.kubernetes.io/managed-by=opentelemetry-operator
                                                          app.kubernetes.io/name=otelcol-collector
                                                          app.kubernetes.io/part-of=opentelemetry
                                                          app.kubernetes.io/version=0.126.0
Annotations:                                              <none>
CreationTimestamp:                                        Mon, 02 Jun 2025 17:23:52 +0000
Reference:                                                OpenTelemetryCollector/otelcol
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  71% (95779498666m) / 60%
  resource cpu on pods  (as a percentage of request):     12% (12m) / 50%
Min replicas:                                             1
Max replicas:                                             3
OpenTelemetryCollector pods:                              3 current / 3 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  True    TooManyReplicas   the desired replica count is more than the maximum replica count
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  2m (x4 over 2m29s)   horizontal-pod-autoscaler  unable to get metric memory: no metrics returned from resource metrics API
  Warning  FailedGetResourceMetric  89s (x7 over 2m29s)  horizontal-pod-autoscaler  No recommendation
  Normal   SuccessfulRescale        89s                  horizontal-pod-autoscaler  New size: 2; reason: memory resource utilization (percentage of request) above target
  Normal   SuccessfulRescale        59s                  horizontal-pod-autoscaler  New size: 3; reason: memory resource utilization (percentage of request) above target
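The raw value in the Metrics section can be hard to read: 95779498666m uses the milli suffix, so it denotes roughly 95.8 million bytes averaged across pods. The sketch below converts that value and relates it to a memory request. The helper names are hypothetical. Working backwards, a reading of 71% corresponds to a request of about 128Mi, which suggests that this sample output was captured with a larger memory request than the 64Mi used in the example manifest; treat that as an inference from the arithmetic rather than a documented fact.

```python
def milli_quantity_to_units(quantity: str) -> float:
    """Convert a value such as '95779498666m' (milli-units) to base units."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def utilization_pct(usage_bytes: float, request_bytes: int) -> float:
    """Utilization as a percentage of the request, as HPA reports it."""
    return 100 * usage_bytes / request_bytes


if __name__ == "__main__":
    MIB = 2**20
    usage = milli_quantity_to_units("95779498666m")  # average bytes per pod
    print(f"usage: {usage / MIB:.1f} MiB")
    print(f"against a 128Mi request: {utilization_pct(usage, 128 * MIB):.1f}%")
    print(f"against a 64Mi request: {utilization_pct(usage, 64 * MIB):.1f}%")
```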

Last modified June 11, 2025: Alert cleanup (#7090) (c392c714)