使用 OpenTelemetry Python 收集日志
博客文章在发布后不会更新。这篇文章已经发布一年多了,其内容可能已过时,部分链接可能无效。在依赖任何信息之前,请务必核实。
在下文中,我们将介绍如何使用 OpenTelemetry (OTel) 进行日志收集。为了简单起见,我们将使用 Python 作为演示编程语言,但请注意,截至撰写本文时,Python 中的日志支持仍处于早期阶段,因此可能需要更新。
我们将展示从使用 print 语句进行日志记录(婴儿 Grogu级别)到使用OpenTelemetry Collector 将日志记录到文件(专家 Grogu级别),再到使用 OTel 日志桥接 API 直接将OTLP(尤达级别)摄取到 Collector 中。
如果您想跟随学习,您需要安装 Docker,并首先克隆 `git clone https://github.com/mhausenblas/ref.otel.help.git`,然后进入 `how-to/logs-collection/` 目录。
婴儿 Grogu 级别
我们的旅程从婴儿 Grogu 开始,这是为了保护无辜的 ;) 他是一名初级开发人员,对 Python 有一定的了解,但对遥测(特别是日志记录)不了解或不在乎。有一天,婴儿 Grogu 被要求编写一个“练习遥测”的代码,包括捕获错误的输入。代码会是什么样子?婴儿 Grogu 会如何与外界沟通代码执行的进度和潜在的错误情况?
首先,进入 baby-grogu/ 目录。
我们使用 `baby-grogu/main.py` 中的婴儿 Grogu 的 Python 代码作为示例,其中有趣的部分位于 `practice()` 函数中。
start_time = time.time()
try:
how_long_int = int(how_long)
print(f"Starting to practice The Telemetry for {how_long_int} second(s)")
while time.time() - start_time < how_long_int:
next_char = random.choice(string.punctuation)
print(next_char, end="", flush=True)
time.sleep(0.5)
print("\nDone practicing")
except ValueError as ve:
print(f"I need an integer value for the time to practice: {ve}")
return False
except Exception as e:
print(f"An unexpected error occurred: {e}")
return False
return True
上面的 Python 代码并没有什么真正用处,只是打印出指定时间的随机标点符号,这代表了“练习”。但是,请注意婴儿 Grogu 在这里使用的 `print()` 函数的不同语义。
例如,当他们说 `print(next_char, end="", flush=True)` 时,他们实际上是在执行工作,而当他们写 `print("\nDone practicing")` 时,这是一个关于工作已完成的信息性消息。这将是日志消息的一个很好的候选!
同样,`print(f"I need an integer value for the time to practice: {ve}")` 实际上是婴儿 Grogu 在沟通发生了错误。
要执行代码,您可以直接使用 `python3 main.py 3` 来让婴儿 Grogu 练习 3 秒钟,或者您可以使用容器化版本(需要 Python 3.11)。
对于容器化版本,我们使用以下 `Dockerfile`。
FROM python:3.11
WORKDIR /usr/src/app
COPY . .
上面的 Dockerfile,我们将在以下 Docker Compose 文件 `docker-compose.yaml` 的上下文中将其使用。
version: '3'
services:
baby-grogu:
build: .
command: python main.py 3
volumes:
- .:/usr/src/app
此时,您可以通过运行 `docker compose -f docker-compose.yaml` 来享受婴儿 Grogu 的成果,您应该会看到类似以下内容的输出(注意:已编辑以突出显示最重要的部分)。
baby-grogu-baby-grogu-1 | Starting to practice The Telemetry for 2 second(s)
baby-grogu-baby-grogu-1 | /)||
baby-grogu-baby-grogu-1 | Done practicing
baby-grogu-baby-grogu-1 | Practicing The Telemetry completed: True
好了,婴儿 Grogu 已经做得很好,现在是时候休息了。站起来,喝点水,然后带着清醒的头脑回来,让我们升级一下,使用 OTel!
专家 Grogu 级别
随着时间的推移,婴儿 Grogu 已经了解了可观测性和遥测方面的知识。他们已经晋升为专家 Grogu 级别。怎么做到的?很高兴您提问,我来给您展示。
首先,进入 expert-grogu/ 目录。
在此场景中,专家 Grogu 将日志记录到其 Python 应用程序中的一个文件(JSON 格式)。然后,他们使用 OpenTelemetry Collector 读取该日志文件,使用 OTel Collector 中的 filelog receiver 解析日志记录,最后使用 debug exporter 将其输出到 `stdout`。有意义吗?让我们来看看它的实际运行情况……
总体而言,我们有以下设置。
flowchart LR
app["python main.py"] --> logfile["exgru.log"]
logfile["exgru.log"] --> otelcol["OTel collector"]
otelcol["OTel collector"] --> stdout["stdout"]让我们先来看看专家 Grogu 在日志记录方面做了什么(在 `expert-grogu/main.py` 的 `practice()` 函数中)。
start_time = time.time()
try:
how_long_int = int(how_long)
logger.info("Starting to practice The Telemetry for %i second(s)", how_long_int)
while time.time() - start_time < how_long_int:
next_char = random.choice(string.punctuation)
print(next_char, end="", flush=True)
time.sleep(0.5)
logger.info("Done practicing")
except ValueError as ve:
logger.error("I need an integer value for the time to practice: %s", ve)
return False
except Exception as e:
logger.error("An unexpected error occurred: %s", e)
return False
return True
因此,在上面的函数中,我们看到专家 Grogu 使用 `logger.xxx()` 函数来沟通状态/进度以及错误情况,例如用户为练习时间提供错误的输入值(例如,`python main.py ABC` 而不是 `python main.py 5`,因为前者无法解析为整数)。
我们使用以下 `Dockerfile`(安装我们有的一个依赖项 `python-json-logger==2.0.7`)。
FROM python:3.11
WORKDIR /usr/src/app
COPY requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .
使用以下 OTel Collector 配置(通过 OTelBin 进行可视化)。
receivers:
filelog:
include: [/usr/src/app/*.log]
operators:
- type: json_parser
timestamp:
parse_from: attributes.asctime
layout: '%Y-%m-%dT%H:%M:%S'
severity:
parse_from: attributes.levelname
exporters:
debug:
verbosity: detailed
service:
pipelines:
logs:
receivers: [filelog]
exporters: [debug]
在看起来像这样的 Docker Compose 文件中,我们将以上所有内容整合在一起。
version: '3'
services:
collector:
image: otel/opentelemetry-collector-contrib:latest
volumes:
- ./otel-config.yaml:/etc/otelcol-contrib/config.yaml
- ./:/usr/src/app
command: ['--config=/etc/otelcol-contrib/config.yaml']
ports:
- '4317:4317'
baby-grogu:
build: .
command: python main.py 10
volumes:
- .:/usr/src/app
您可以使用 `docker compose -f docker-compose.yaml` 来运行它,您应该会看到类似以下的内容。
expert-grogu-collector-1 | 2023-11-15T17:21:32.811Z info service@v0.88.0/telemetry.go:84 Setting up own telemetry...
expert-grogu-collector-1 | 2023-11-15T17:21:32.812Z info service@v0.88.0/telemetry.go:201 Serving Prometheus metrics {"address": ":8888", "level": "Basic"}
expert-grogu-collector-1 | 2023-11-15T17:21:32.812Z info exporter@v0.88.0/exporter.go:275 Deprecated component. Will be removed in future releases. {"kind": "exporter", "data_type": "logs", "name": "logging"}
expert-grogu-collector-1 | 2023-11-15T17:21:32.812Z info service@v0.88.0/service.go:143 Starting otelcol-contrib... {"Version": "0.88.0", "NumCPU": 4}
expert-grogu-collector-1 | 2023-11-15T17:21:32.812Z info extensions/extensions.go:33 Starting extensions...
expert-grogu-collector-1 | 2023-11-15T17:21:32.812Z info adapter/receiver.go:45 Starting stanza receiver {"kind": "receiver", "name": "filelog", "data_type": "logs"}
expert-grogu-collector-1 | 2023-11-15T17:21:32.813Z info service@v0.88.0/service.go:169 Everything is ready. Begin running and processing data.
expert-grogu-collector-1 | 2023-11-15T17:21:33.014Z info fileconsumer/file.go:182 Started watching file {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/usr/src/app/exgru.log"}
expert-grogu-collector-1 | 2023-11-15T17:21:33.113Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 4}
expert-grogu-collector-1 | 2023-11-15T17:21:33.113Z info ResourceLog #0
expert-grogu-collector-1 | Resource SchemaURL:
expert-grogu-collector-1 | ScopeLogs #0
expert-grogu-collector-1 | ScopeLogs SchemaURL:
expert-grogu-collector-1 | InstrumentationScope
expert-grogu-collector-1 | LogRecord #0
expert-grogu-collector-1 | ObservedTimestamp: 2023-11-15 17:21:33.01473246 +0000 UTC
expert-grogu-collector-1 | Timestamp: 2023-11-15 17:16:58 +0000 UTC
expert-grogu-collector-1 | SeverityText: INFO
expert-grogu-collector-1 | SeverityNumber: Info(9)
expert-grogu-collector-1 | Body: Str({"asctime": "2023-11-15T17:16:58", "levelname": "INFO", "message": "Starting to practice The Telemetry for 10 second(s)", "taskName": null})
expert-grogu-collector-1 | Attributes:
expert-grogu-collector-1 | -> log.file.name: Str(exgru.log)
expert-grogu-collector-1 | -> asctime: Str(2023-11-15T17:16:58)
expert-grogu-collector-1 | -> levelname: Str(INFO)
expert-grogu-collector-1 | -> message: Str(Starting to practice The Telemetry for 10 second(s))
expert-grogu-collector-1 | -> taskName: Str(<nil>)
expert-grogu-collector-1 | Trace ID:
expert-grogu-collector-1 | Span ID:
expert-grogu-collector-1 | Flags: 0
expert-grogu-collector-1 | LogRecord #1
expert-grogu-collector-1 | ObservedTimestamp: 2023-11-15 17:21:33.014871669 +0000 UTC
expert-grogu-collector-1 | Timestamp: 2023-11-15 17:17:08 +0000 UTC
expert-grogu-collector-1 | SeverityText: INFO
expert-grogu-collector-1 | SeverityNumber: Info(9)
expert-grogu-collector-1 | Body: Str({"asctime": "2023-11-15T17:17:08", "levelname": "INFO", "message": "Done practicing", "taskName": null})
expert-grogu-collector-1 | Attributes:
expert-grogu-collector-1 | -> log.file.name: Str(exgru.log)
expert-grogu-collector-1 | -> asctime: Str(2023-11-15T17:17:08)
expert-grogu-collector-1 | -> levelname: Str(INFO)
expert-grogu-collector-1 | -> message: Str(Done practicing)
expert-grogu-collector-1 | -> taskName: Str(<nil>)
expert-grogu-collector-1 | Trace ID:
expert-grogu-collector-1 | Span ID:
expert-grogu-collector-1 | Flags: 0
expert-grogu-collector-1 | LogRecord #2
expert-grogu-collector-1 | ObservedTimestamp: 2023-11-15 17:21:33.01487521 +0000 UTC
expert-grogu-collector-1 | Timestamp: 2023-11-15 17:17:08 +0000 UTC
expert-grogu-collector-1 | SeverityText: INFO
expert-grogu-collector-1 | SeverityNumber: Info(9)
expert-grogu-collector-1 | Body: Str({"asctime": "2023-11-15T17:17:08", "levelname": "INFO", "message": "Practicing The Telemetry completed: True", "taskName": null})
expert-grogu-collector-1 | Attributes:
expert-grogu-collector-1 | -> message: Str(Practicing The Telemetry completed: True)
expert-grogu-collector-1 | -> taskName: Str(<nil>)
expert-grogu-collector-1 | -> asctime: Str(2023-11-15T17:17:08)
expert-grogu-collector-1 | -> log.file.name: Str(exgru.log)
expert-grogu-collector-1 | -> levelname: Str(INFO)
expert-grogu-collector-1 | Trace ID:
expert-grogu-collector-1 | Span ID:
expert-grogu-collector-1 | Flags: 0
expert-grogu-collector-1 | LogRecord #3
expert-grogu-collector-1 | ObservedTimestamp: 2023-11-15 17:21:33.01487771 +0000 UTC
expert-grogu-collector-1 | Timestamp: 2023-11-15 17:21:32 +0000 UTC
expert-grogu-collector-1 | SeverityText: INFO
expert-grogu-collector-1 | SeverityNumber: Info(9)
expert-grogu-collector-1 | Body: Str({"asctime": "2023-11-15T17:21:32", "levelname": "INFO", "message": "Starting to practice The Telemetry for 10 second(s)", "taskName": null})
expert-grogu-collector-1 | Attributes:
expert-grogu-collector-1 | -> log.file.name: Str(exgru.log)
expert-grogu-collector-1 | -> asctime: Str(2023-11-15T17:21:32)
expert-grogu-collector-1 | -> levelname: Str(INFO)
expert-grogu-collector-1 | -> message: Str(Starting to practice The Telemetry for 10 second(s))
expert-grogu-collector-1 | -> taskName: Str(<nil>)
expert-grogu-collector-1 | Trace ID:
expert-grogu-collector-1 | Span ID:
expert-grogu-collector-1 | Flags: 0
尤达级别
现在我们换个角度,看看尤达,一位遥测大师。
首先,进入 yoda/ 目录。
在此场景中,我们看到尤达在 Python 应用程序中使用 OTel 日志桥接 API,将日志以 OpenTelemetry Protocol (OTLP) 格式直接摄取到 OTel Collector 中。这比先将日志记录到文件然后让 Collector 读取它要快得多,也更可靠!
总体而言,尤达使用的是以下设置。
flowchart LR
app["python main.py"]-- OTLP --> otelcol["OTel collector"]
otelcol["OTel collector"] --> stdout["stdout"]使用以下 OTel Collector 配置(通过 OTelBin 进行可视化)。
receivers:
otlp:
protocols:
grpc:
exporters:
debug:
verbosity: detailed
service:
pipelines:
logs:
receivers: [otlp]
exporters: [debug]
现在使用 `docker compose -f docker-compose.yaml` 运行尤达的设置,您应该会看到类似如下的内容。
yoda-collector-1 | 2023-11-15T16:54:22.545Z info service@v0.88.0/telemetry.go:84 Setting up own telemetry...
yoda-collector-1 | 2023-11-15T16:54:22.546Z info service@v0.88.0/telemetry.go:201 Serving Prometheus metrics {"address": ":8888", "level": "Basic"}
yoda-collector-1 | 2023-11-15T16:54:22.546Z info exporter@v0.88.0/exporter.go:275 Deprecated component. Will be removed in future releases. {"kind": "exporter", "data_type": "logs", "name": "logging"}
yoda-collector-1 | 2023-11-15T16:54:22.547Z info service@v0.88.0/service.go:143 Starting otelcol-contrib... {"Version": "0.88.0", "NumCPU": 4}
yoda-collector-1 | 2023-11-15T16:54:22.547Z info extensions/extensions.go:33 Starting extensions...
yoda-collector-1 | 2023-11-15T16:54:22.547Z warn internal@v0.88.0/warning.go:40 Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks {"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
yoda-collector-1 | 2023-11-15T16:54:22.549Z info otlpreceiver@v0.88.0/otlp.go:83 Starting GRPC server {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
yoda-collector-1 | 2023-11-15T16:54:22.550Z info service@v0.88.0/service.go:169 Everything is ready. Begin running and processing data.
yoda-collector-1 | 2023-11-15T16:54:27.667Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 1}
yoda-collector-1 | 2023-11-15T16:54:27.668Z info ResourceLog #0
yoda-collector-1 | Resource SchemaURL:
yoda-collector-1 | Resource attributes:
yoda-collector-1 | -> telemetry.sdk.language: Str(python)
yoda-collector-1 | -> telemetry.sdk.name: Str(opentelemetry)
yoda-collector-1 | -> telemetry.sdk.version: Str(1.21.0)
yoda-collector-1 | -> service.name: Str(train-the-telemetry)
yoda-collector-1 | -> service.instance.id: Str(33992a23112e)
yoda-collector-1 | ScopeLogs #0
yoda-collector-1 | ScopeLogs SchemaURL:
yoda-collector-1 | InstrumentationScope opentelemetry.sdk._logs._internal
yoda-collector-1 | LogRecord #0
yoda-collector-1 | ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
yoda-collector-1 | Timestamp: 2023-11-15 16:54:22.651675136 +0000 UTC
yoda-collector-1 | SeverityText: INFO
yoda-collector-1 | SeverityNumber: Info(9)
yoda-collector-1 | Body: Str(Starting to practice The Telemetry for 10 second(s))
yoda-collector-1 | Trace ID:
yoda-collector-1 | Span ID:
yoda-collector-1 | Flags: 0
yoda-collector-1 | {"kind": "exporter", "data_type": "logs", "name": "logging"}
yoda-collector-1 | 2023-11-15T16:54:32.715Z info LogsExporter {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 2}
yoda-collector-1 | 2023-11-15T16:54:32.716Z info ResourceLog #0
yoda-collector-1 | Resource SchemaURL:
yoda-collector-1 | Resource attributes:
yoda-collector-1 | -> telemetry.sdk.language: Str(python)
yoda-collector-1 | -> telemetry.sdk.name: Str(opentelemetry)
yoda-collector-1 | -> telemetry.sdk.version: Str(1.21.0)
yoda-collector-1 | -> service.name: Str(train-the-telemetry)
yoda-collector-1 | -> service.instance.id: Str(33992a23112e)
yoda-collector-1 | ScopeLogs #0
yoda-collector-1 | ScopeLogs SchemaURL:
yoda-collector-1 | InstrumentationScope opentelemetry.sdk._logs._internal
yoda-collector-1 | LogRecord #0
yoda-collector-1 | ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
yoda-collector-1 | Timestamp: 2023-11-15 16:54:32.713701888 +0000 UTC
yoda-collector-1 | SeverityText: INFO
yoda-collector-1 | SeverityNumber: Info(9)
yoda-collector-1 | Body: Str(Done practicing)
yoda-collector-1 | Trace ID:
yoda-collector-1 | Span ID:
yoda-collector-1 | Flags: 0
yoda-collector-1 | LogRecord #1
yoda-collector-1 | ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
yoda-collector-1 | Timestamp: 2023-11-15 16:54:32.714062336 +0000 UTC
yoda-collector-1 | SeverityText: INFO
yoda-collector-1 | SeverityNumber: Info(9)
yoda-collector-1 | Body: Str(Practicing The Telemetry completed: True)
yoda-collector-1 | Trace ID:
yoda-collector-1 | Span ID:
yoda-collector-1 | Flags: 0
yoda-collector-1 | {"kind": "exporter", "data_type": "logs", "name": "logging"}
yoda-baby-grogu-1 | =`;*'+.|,+?):(*-<}~}
有趣吧?您现在可以随意修改尤达的源代码,添加更多上下文信息,并添加处理器来处理 Collector 传递的日志记录。
愿遥测与你同在,年轻的帕达万!
下一步?
现在您已经熟悉了遥测及其良好实践,您可以扩展尤达的代码来执行以下操作:
- 添加更多上下文。例如,尝试使用 OTel 资源属性和语义约定来使执行的上下文更加明确。
- 在 OTel Collector 中丰富日志或使用处理器(如transform 或 attributes 处理器)过滤某些严重性级别。
- 通过发出 Span 来添加跟踪支持,在适当的地方。
- 将可观测性后端(如 OpenSearch(连同Data Prepper))添加到设置中,允许以 OTLP 格式摄取 Span 和日志。
- 一旦将 Span 和日志摄取到后端,尝试在后端与前端(如 Grafana)关联这两种遥测信号类型。
- 使用零代码插桩进一步丰富遥测数据。
社区目前正在开发事件 API 接口,这是一个继续研究并可能提供反馈的好地方?
鸣谢与参考
感谢 Severin Neumann 和 Houssam Chehab,他们都非常耐心地帮助我,并且在实现尤达级别的工作中发挥了关键作用,我欠你们的!
如果您想深入了解 OTel 日志收集(尤其是在 Python 中),请参阅以下资源:
- OpenTelemetry 日志记录 (OTel 文档)
- 事件 API 接口 (OTel 文档)
- 通用日志属性 (语义约定)
- OpenTelemetry Python (GitHub 仓库)
- OpenTelemetry 在 Python 中的特定语言实现 (OTel 文档)
- OpenTelemetry 日志记录插桩 (Python 文档)
- OpenTelemetry 日志 SDK 示例 (Python 文档)