OBI and Cilium compatibility

Compatibility notes for running OBI alongside Cilium

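Cilium is an open source security, networking, and observability platform that uses eBPF to provide networking and security for Kubernetes clusters. In some cases, the eBPF programs that Cilium uses can conflict with the eBPF programs that OBI uses and cause issues.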

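OBI and Cilium both use eBPF traffic control classifier programs, BPF_PROG_TYPE_SCHED_CLS. These programs attach to the ingress and egress data paths of the kernel networking stack. Together they form a chain of programs that can inspect, and potentially modify, packets as they traverse the network stack.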

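OBI programs never disrupt the flow of a packet, but Cilium changes the flow of packets as part of its operation. If Cilium processes a packet before OBI does, it can affect OBI's ability to process that packet.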

Attachment priority

OBI uses either the Traffic Control eXpress (TCX) API or the Netlink interface in the Linux kernel to attach traffic control (TC) programs.

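TCX is a newer API that lets you attach programs at the head, middle, or tail of the program list. OBI and Cilium automatically detect whether the kernel supports TCX and use it by default.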

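When both OBI and Cilium use TCX, they do not interfere with each other. OBI attaches its eBPF programs to the head of the list, and Cilium attaches to the tail. TCX is the preferred mode of operation whenever it is available.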

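When TCX is not available, both OBI and Cilium use the Netlink interface to install eBPF programs. If OBI detects that Cilium runs programs with priority 1, OBI exits and displays an error. You can resolve this error by configuring Cilium to use a priority greater than 1.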

OBI also refuses to run if it is configured to use Netlink attachment and detects that Cilium uses TCX.
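The attachment rules in this section can be summarized as a small decision function. This is only an illustrative model of the documented behavior, not OBI's actual implementation: the function name obi_can_attach and its arguments are hypothetical, and the combination of OBI on TCX with Cilium on Netlink is reported as not covered because this page does not describe it.

```shell
#!/usr/bin/env bash
# Hypothetical model of the compatibility rules described in this section.
# Usage: obi_can_attach <obi_backend> <cilium_backend> [cilium_priority]
#   obi_backend, cilium_backend: "tcx" or "netlink"
#   cilium_priority: Cilium's TC priority (required when both use Netlink)
obi_can_attach() {
  local obi_backend="$1" cilium_backend="$2" cilium_priority="${3:-}"

  case "${obi_backend}/${cilium_backend}" in
    tcx/tcx)
      echo "ok: OBI attaches at the head, Cilium at the tail"
      ;;
    netlink/tcx)
      echo "error: OBI uses Netlink but Cilium uses TCX"
      return 1
      ;;
    netlink/netlink)
      if [ -z "$cilium_priority" ]; then
        echo "usage: cilium_priority is required when both use Netlink" >&2
        return 2
      fi
      if [ "$cilium_priority" = "1" ]; then
        echo "error: Cilium runs at priority 1, configure a priority greater than 1"
        return 1
      fi
      echo "ok: Cilium priority is ${cilium_priority}, OBI can attach"
      ;;
    *)
      # Not described on this page, so the model makes no claim.
      echo "unknown: combination not covered by this page"
      return 2
      ;;
  esac
}

obi_can_attach netlink netlink 2
```

The function returns 0 when the documented rules allow OBI to run, 1 when they say OBI exits or refuses to run, and 2 when the page gives no answer.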

Cilium priority configuration

You can configure the Cilium priority with the bpf.tc.priority Helm value or the tc-filter-priority CLI option.

bpf:
  tc:
    priority: 2

This setting ensures that OBI programs always run before Cilium programs.
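As a sketch of how the Helm value might be applied to an existing installation, the following wrapper validates the priority and then runs helm upgrade. It assumes a release named cilium installed from the cilium/cilium chart in the kube-system namespace. These names, and the set_cilium_tc_priority function itself, are assumptions for illustration, so adjust them to match your cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set Cilium's TC priority through the bpf.tc.priority Helm value.
# Assumes release "cilium", chart "cilium/cilium", and namespace "kube-system".
set_cilium_tc_priority() {
  local priority="$1"

  # Accept only whole numbers.
  case "$priority" in
    ''|*[!0-9]*)
      echo "priority must be a positive integer, got: '${priority}'" >&2
      return 1
      ;;
  esac
  # Priority 1 is the value that makes OBI exit, so require something larger.
  if [ "$priority" -le 1 ]; then
    echo "priority must be greater than 1" >&2
    return 1
  fi

  # --reuse-values keeps the values of the current release and overrides only this one.
  helm upgrade cilium cilium/cilium \
    --namespace kube-system \
    --reuse-values \
    --set "bpf.tc.priority=${priority}"
}
```

Calling set_cilium_tc_priority 2 applies the same value as the YAML fragment above.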

OBI attachment mode configuration

Refer to the configuration documentation to configure the OBI TC attachment mode with the OTEL_EBPF_BPF_TC_BACKEND configuration option.

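You can do the following:

  • Set the value to tcx to use the TCX API
  • Set the value to netlink to use the Netlink interface
  • Set the value to auto to automatically detect the best available option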
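If OBI already runs as a DaemonSet, one way to change the backend is to set the environment variable on the workload. The sketch below assumes a DaemonSet named obi in the obi namespace, as in the demo on this page. The set_obi_tc_backend function is a hypothetical helper, not part of OBI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set OTEL_EBPF_BPF_TC_BACKEND on the OBI DaemonSet.
# Assumes a DaemonSet named "obi" in the "obi" namespace.
set_obi_tc_backend() {
  local backend="$1"

  # Only the three values listed above are accepted.
  case "$backend" in
    tcx|netlink|auto) ;;
    *)
      echo "invalid backend '${backend}': expected tcx, netlink, or auto" >&2
      return 1
      ;;
  esac

  kubectl set env daemonset/obi --namespace obi \
    "OTEL_EBPF_BPF_TC_BACKEND=${backend}"
}
```

For example, set_obi_tc_backend auto asks OBI to detect the best available option.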

OBI and Cilium demo

The following example demonstrates OBI and Cilium working together in a Kubernetes environment to propagate trace context.

Prerequisites

  • A Kubernetes cluster with Cilium installed
  • kubectl configured to access the cluster
  • Helm 3.0 or later

Deploy test services

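Use the following definitions to deploy the sample services. These are small toy services that call one another and let you see how OBI handles trace context propagation.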

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodejs-service
  template:
    metadata:
      labels:
        app: nodejs-service
    spec:
      containers:
        - name: nodejs-service
          image: ghcr.io/open-telemetry/obi-testimg:node-0.1.1
          ports:
            - containerPort: 3000
          env:
            - name: NODEJS_SERVICE_PORT
              value: '3000'
            - name: NODEJS_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: nodejs-service
spec:
  selector:
    app: nodejs-service
  ports:
    - port: 3000
      targetPort: 3000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-service
  template:
    metadata:
      labels:
        app: go-service
    spec:
      containers:
        - name: go-service
          image: ghcr.io/open-telemetry/obi-testimg:go-0.1.1
          ports:
            - containerPort: 8080
          env:
            - name: GO_SERVICE_PORT
              value: '8080'
            - name: GO_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: go-service
spec:
  selector:
    app: go-service
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-service
  template:
    metadata:
      labels:
        app: python-service
    spec:
      containers:
        - name: python-service
          image: ghcr.io/open-telemetry/obi-testimg:python-0.1.1
          ports:
            - containerPort: 8080
          env:
            - name: PYTHON_SERVICE_PORT
              value: '8080'
            - name: PYTHON_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: python-service
spec:
  selector:
    app: python-service
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruby-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ruby-service
  template:
    metadata:
      labels:
        app: ruby-service
    spec:
      containers:
        - name: ruby-service
          image: ghcr.io/open-telemetry/obi-testimg:rails-0.1.1
          ports:
            - containerPort: 3000
          env:
            - name: RAILS_SERVICE_PORT
              value: '3000'
            - name: RAILS_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: ruby-service
spec:
  selector:
    app: ruby-service
  ports:
    - port: 3000
      targetPort: 3000

Deploy OBI

Create the OBI namespace:

kubectl create namespace obi

Apply the permissions:

apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: obi
  name: obi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: obi
rules:
  - apiGroups: ['apps']
    resources: ['replicasets']
    verbs: ['list', 'watch']
  - apiGroups: ['']
    resources: ['pods', 'services', 'nodes']
    verbs: ['list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: obi
subjects:
  - kind: ServiceAccount
    name: obi
    namespace: obi
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: obi

Deploy the OBI ConfigMap and DaemonSet:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: obi
  name: obi-config
data:
  obi-config.yml: |
    attributes:
      kubernetes:
        enable: true
    routes:
      unmatched: heuristic
    # instrument only the four demo services deployed above
    discovery:
      instrument:
        - k8s_deployment_name: "nodejs-service"
        - k8s_deployment_name: "go-service"
        - k8s_deployment_name: "python-service"
        - k8s_deployment_name: "ruby-service"
    trace_printer: text
    ebpf:
      context_propagation: all
      traffic_control_backend: tcx
      disable_blackbox_cp: true
      track_request_headers: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: obi
  name: obi
spec:
  selector:
    matchLabels:
      instrumentation: obi
  template:
    metadata:
      labels:
        instrumentation: obi
    spec:
      serviceAccountName: obi
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: obi
          image: otel/ebpf-instrument:main
          securityContext:
            privileged: true
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /config
              name: obi-config
            - mountPath: /var/run/obi
              name: var-run-obi
          env:
            - name: OTEL_EBPF_CONFIG_PATH
              value: '/config/obi-config.yml'
      volumes:
        - name: obi-config
          configMap:
            name: obi-config
        - name: var-run-obi
          emptyDir: {}

Forward the port to the host and trigger a request:

kubectl port-forward services/nodejs-service 3000:3000 &
curl http://localhost:3000/traceme

Finally, check your OBI Pod logs:

for i in `kubectl get pods -n obi -o name | cut -d '/' -f2`; do kubectl logs -n obi $i | grep "GET " | sort; done

You should see output that shows the requests OBI detected, along with trace context propagation, similar to the following:

2025-01-17 21:42:18.11794218 (5.045099ms[5.045099ms]) HTTPClient 200 GET /tracemetoo [10.244.1.92 as go-service.default:37450]->[10.96.214.17 as python-service.default:8080] size:0B svc=[default/go-service go] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-319fb03373427a41[cfa6d5d448e40b00]-01]
2025-01-17 21:42:18.11794218 (5.284521ms[5.164701ms]) HTTP 200 GET /gotracemetoo [10.244.2.144 as nodejs-service.default:57814]->[10.244.1.92 as go-service.default:8080] size:0B svc=[default/go-service go] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-cfa6d5d448e40b00[cce1e6b5e932b89a]-01]
2025-01-17 21:42:18.11794218 (1.934744ms[1.934744ms]) HTTP 403 GET /users [10.244.2.32 as ruby-service.default:46876]->[10.244.2.176 as ruby-service.default:3000] size:222B svc=[default/ruby-service ruby] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-57d77d99e9665c54[3d97d26b0051112b]-01]
2025-01-17 21:42:18.11794218 (2.116628ms[2.116628ms]) HTTPClient 403 GET /users [10.244.2.32 as ruby-service.default:46876]->[10.96.69.89 as ruby-service.default:3000] size:256B svc=[default/ruby-service ruby] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-ff48ab147cc92f93[2770ac4619aa0042]-01]
2025-01-17 21:42:18.11794218 (4.281525ms[4.281525ms]) HTTP 200 GET /tracemetoo [10.244.1.92 as go-service.default:37450]->[10.244.2.32 as ruby-service.default:8080] size:178B svc=[default/ruby-service ruby] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-2770ac4619aa0042[319fb03373427a41]-01]
2025-01-17 21:42:18.11794218 (5.391191ms[5.391191ms]) HTTPClient 200 GET /gotracemetoo [10.244.2.144 as nodejs-service.default:57814]->[10.96.134.167 as go-service.default:8080] size:256B svc=[default/nodejs-service nodejs] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-202ee68205e4ef3b[9408610968fa20f8]-01]
2025-01-17 21:42:18.11794218 (6.939027ms[6.939027ms]) HTTP 200 GET /traceme [127.0.0.1 as 127.0.0.1:44720]->[127.0.0.1 as 127.0.0.1.default:3000] size:86B svc=[default/nodejs-service nodejs] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-9408610968fa20f8[0000000000000000]-01]
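To confirm that context propagation worked, you can check that every span from a single request carries the same trace ID. The helper below is a minimal sketch that assumes the text trace printer format shown above, where each line contains a field of the form traceparent=[00-<trace-id>-...]. The name count_trace_ids is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read OBI text trace output on stdin and print the number
# of distinct trace IDs found in the traceparent=[00-<trace-id>-...] fields.
count_trace_ids() {
  grep -o 'traceparent=\[00-[0-9a-f]\{32\}' \
    | sed 's/^traceparent=\[00-//' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}

# Example with two spans from the sample output above (line prefixes shortened).
count_trace_ids <<'EOF'
HTTPClient 200 GET /tracemetoo traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-319fb03373427a41[cfa6d5d448e40b00]-01]
HTTP 200 GET /gotracemetoo traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-cfa6d5d448e40b00[cce1e6b5e932b89a]-01]
EOF
```

The example prints 1 because both spans share one trace ID. Against a live cluster, pipe the filtered Pod logs from the previous step into count_trace_ids instead of the inline sample.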