OBI and Cilium compatibility
Cilium is an open source security, networking, and observability platform that uses eBPF to provide networking and security for Kubernetes clusters. In some situations, the eBPF programs Cilium uses can conflict with the eBPF programs OBI uses, and cause problems.
Both OBI and Cilium use eBPF traffic control classifier programs, BPF_PROG_TYPE_SCHED_CLS. These programs attach to the ingress and egress data paths of the kernel networking stack. Together they form a chain of programs that can inspect, and potentially modify, packets as they traverse the networking stack.
OBI programs never interrupt the flow of packets, but Cilium redirects packets as part of its operation. If Cilium processes a packet before OBI does, it can affect OBI's ability to process that packet.
Attachment priority
OBI attaches its traffic control (TC) programs by using either the Traffic Control eXpress (TCX) API or the Netlink interface in the Linux kernel.
TCX is a newer API that lets you attach programs at the head, middle, or tail of the chain. Both OBI and Cilium automatically detect whether the kernel supports TCX and use it by default.
When both OBI and Cilium use TCX, they don't interfere with each other: OBI attaches its eBPF programs at the head of the list, while Cilium attaches at the tail. Where possible, TCX is the preferred mode of operation.
Falling back to Netlink
When TCX isn't available, both OBI and Cilium use the Netlink interface to install eBPF programs. If OBI detects that Cilium runs its programs with priority 1, OBI exits with an error. You can resolve this error by configuring Cilium to use a priority greater than 1.
OBI also refuses to run if it's configured to use Netlink attachments and detects that Cilium is using TCX.
Cilium priority configuration
You can configure Cilium's priority with the bpf.tc.priority Helm value or the tc-filter-priority CLI option.
bpf:
  tc:
    priority: 2
This ensures that OBI programs always run before Cilium programs.
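For example, you could set this value on an existing installation with Helm. This is a sketch: the release name cilium and the kube-system namespace are assumptions based on a typical Cilium install, so adjust them to match yours.

```shell
# Raise Cilium's TC filter priority above 1 so that, in Netlink
# mode, OBI's programs (priority 1) run before Cilium's.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set bpf.tc.priority=2
```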
OBI attachment mode configuration
Refer to the configuration documentation for the OTEL_EBPF_BPF_TC_BACKEND configuration option, which controls the OBI TC attachment mode.
You can do the following:
- Set the value to tcx to use the TCX API
- Set the value to netlink to use the Netlink interface
- Set the value to auto to automatically detect the best available option
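For example, to pin the backend instead of relying on auto-detection, you could set the environment variable on the OBI container. This is a sketch of a container env entry, not a complete manifest:

```yaml
env:
  - name: OTEL_EBPF_BPF_TC_BACKEND
    value: 'tcx' # or 'netlink', or 'auto'
```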
OBI and Cilium demo
The following example demonstrates OBI and Cilium working together in a Kubernetes environment to propagate trace context.
Prerequisites
- A Kubernetes cluster with Cilium installed
- kubectl configured to access the cluster
- Helm 3.0 or later
Deploy the test services
Use the following definitions to deploy the example services. These are small toy services that call each other and let you see OBI's trace context propagation in action.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodejs-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nodejs-service
  template:
    metadata:
      labels:
        app: nodejs-service
    spec:
      containers:
        - name: nodejs-service
          image: ghcr.io/open-telemetry/obi-testimg:node-0.1.1
          ports:
            - containerPort: 3000
          env:
            - name: NODEJS_SERVICE_PORT
              value: '3000'
            - name: NODEJS_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: nodejs-service
spec:
  selector:
    app: nodejs-service
  ports:
    - port: 3000
      targetPort: 3000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: go-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: go-service
  template:
    metadata:
      labels:
        app: go-service
    spec:
      containers:
        - name: go-service
          image: ghcr.io/open-telemetry/obi-testimg:go-0.1.1
          ports:
            - containerPort: 8080
          env:
            - name: GO_SERVICE_PORT
              value: '8080'
            - name: GO_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: go-service
spec:
  selector:
    app: go-service
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-service
  template:
    metadata:
      labels:
        app: python-service
    spec:
      containers:
        - name: python-service
          image: ghcr.io/open-telemetry/obi-testimg:python-0.1.1
          ports:
            - containerPort: 8080
          env:
            - name: PYTHON_SERVICE_PORT
              value: '8080'
            - name: PYTHON_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: python-service
spec:
  selector:
    app: python-service
  ports:
    - port: 8080
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ruby-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ruby-service
  template:
    metadata:
      labels:
        app: ruby-service
    spec:
      containers:
        - name: ruby-service
          image: ghcr.io/open-telemetry/obi-testimg:rails-0.1.1
          ports:
            - containerPort: 3000
          env:
            - name: RAILS_SERVICE_PORT
              value: '3000'
            - name: RAILS_SERVICE_HOST
              value: '0.0.0.0'
---
apiVersion: v1
kind: Service
metadata:
  name: ruby-service
spec:
  selector:
    app: ruby-service
  ports:
    - port: 3000
      targetPort: 3000
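Save the definitions above to a file and apply them. The file name services.yaml is arbitrary:

```shell
kubectl apply -f services.yaml
```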
Deploy OBI
Create the OBI namespace:
kubectl create namespace obi
Apply the permissions:
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: obi
  name: obi
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: obi
rules:
  - apiGroups: ['apps']
    resources: ['replicasets']
    verbs: ['list', 'watch']
  - apiGroups: ['']
    resources: ['pods', 'services', 'nodes']
    verbs: ['list', 'watch']
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: obi
subjects:
  - kind: ServiceAccount
    name: obi
    namespace: obi
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: obi
Deploy OBI:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: obi
  name: obi-config
data:
  obi-config.yml: |
    attributes:
      kubernetes:
        enable: true
    routes:
      unmatched: heuristic
    # let's instrument only the demo services
    discovery:
      instrument:
        - k8s_deployment_name: "nodejs-service"
        - k8s_deployment_name: "go-service"
        - k8s_deployment_name: "python-service"
        - k8s_deployment_name: "ruby-service"
    trace_printer: text
    ebpf:
      context_propagation: all
      traffic_control_backend: tcx
      disable_blackbox_cp: true
      track_request_headers: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  namespace: obi
  name: obi
spec:
  selector:
    matchLabels:
      instrumentation: obi
  template:
    metadata:
      labels:
        instrumentation: obi
    spec:
      serviceAccountName: obi
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
        - name: obi
          image: otel/ebpf-instrument:main
          securityContext:
            privileged: true
            readOnlyRootFilesystem: true
          volumeMounts:
            - mountPath: /config
              name: obi-config
            - mountPath: /var/run/obi
              name: var-run-obi
          env:
            - name: OTEL_EBPF_CONFIG_PATH
              value: '/config/obi-config.yml'
      volumes:
        - name: obi-config
          configMap:
            name: obi-config
        - name: var-run-obi
          emptyDir: {}
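Apply the ConfigMap and DaemonSet above, then wait for the OBI Pods to become ready. The file name obi.yaml is arbitrary:

```shell
kubectl apply -f obi.yaml
kubectl rollout status daemonset/obi -n obi
```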
Port forward the service to your host and trigger a request:
kubectl port-forward services/nodejs-service 3000:3000 &
curl http://localhost:3000/traceme
Finally, check your OBI Pod logs:
for i in `kubectl get pods -n obi -o name | cut -d '/' -f2`; do kubectl logs -n obi $i | grep "GET " | sort; done
You should see output, similar to the following, showing the requests OBI detected and the propagated trace context:
2025-01-17 21:42:18.11794218 (5.045099ms[5.045099ms]) HTTPClient 200 GET /tracemetoo [10.244.1.92 as go-service.default:37450]->[10.96.214.17 as python-service.default:8080] size:0B svc=[default/go-service go] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-319fb03373427a41[cfa6d5d448e40b00]-01]
2025-01-17 21:42:18.11794218 (5.284521ms[5.164701ms]) HTTP 200 GET /gotracemetoo [10.244.2.144 as nodejs-service.default:57814]->[10.244.1.92 as go-service.default:8080] size:0B svc=[default/go-service go] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-cfa6d5d448e40b00[cce1e6b5e932b89a]-01]
2025-01-17 21:42:18.11794218 (1.934744ms[1.934744ms]) HTTP 403 GET /users [10.244.2.32 as ruby-service.default:46876]->[10.244.2.176 as ruby-service.default:3000] size:222B svc=[default/ruby-service ruby] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-57d77d99e9665c54[3d97d26b0051112b]-01]
2025-01-17 21:42:18.11794218 (2.116628ms[2.116628ms]) HTTPClient 403 GET /users [10.244.2.32 as ruby-service.default:46876]->[10.96.69.89 as ruby-service.default:3000] size:256B svc=[default/ruby-service ruby] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-ff48ab147cc92f93[2770ac4619aa0042]-01]
2025-01-17 21:42:18.11794218 (4.281525ms[4.281525ms]) HTTP 200 GET /tracemetoo [10.244.1.92 as go-service.default:37450]->[10.244.2.32 as ruby-service.default:8080] size:178B svc=[default/ruby-service ruby] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-2770ac4619aa0042[319fb03373427a41]-01]
2025-01-17 21:42:18.11794218 (5.391191ms[5.391191ms]) HTTPClient 200 GET /gotracemetoo [10.244.2.144 as nodejs-service.default:57814]->[10.96.134.167 as go-service.default:8080] size:256B svc=[default/nodejs-service nodejs] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-202ee68205e4ef3b[9408610968fa20f8]-01]
2025-01-17 21:42:18.11794218 (6.939027ms[6.939027ms]) HTTP 200 GET /traceme [127.0.0.1 as 127.0.0.1:44720]->[127.0.0.1 as 127.0.0.1.default:3000] size:86B svc=[default/nodejs-service nodejs] traceparent=[00-14f07e11b5e57f14fd2da0541f0ddc2f-9408610968fa20f8[0000000000000000]-01]
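The traceparent values in this output follow the W3C Trace Context format version-traceid-parentid-flags, with OBI printing the original parent span ID in brackets after the parent ID. As a quick sanity check, you can parse a couple of the values above and confirm the spans belong to the same distributed trace. This parser is a minimal sketch for illustration, not part of OBI:

```python
# Minimal W3C traceparent parser, used here to check that spans
# emitted by different services share one trace ID.
def parse_traceparent(tp: str) -> dict:
    version, trace_id, parent_id, flags = tp.split("-")
    return {"version": version, "trace_id": trace_id,
            "parent_id": parent_id, "flags": flags}

# Two values from the sample output above, with the bracketed
# original-parent span ID removed.
samples = [
    "00-14f07e11b5e57f14fd2da0541f0ddc2f-319fb03373427a41-01",
    "00-14f07e11b5e57f14fd2da0541f0ddc2f-9408610968fa20f8-01",
]
parsed = [parse_traceparent(s) for s in samples]
# All spans belong to the same distributed trace.
assert len({p["trace_id"] for p in parsed}) == 1
print(parsed[0]["trace_id"])  # → 14f07e11b5e57f14fd2da0541f0ddc2f
```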