使用 OpenTelemetry Python 收集日志

博客文章在发布后不会更新。这篇文章已经发布一年多了,其内容可能已过时,部分链接可能无效。在依赖任何信息之前,请务必核实。

在下文中,我们将介绍如何使用 OpenTelemetry (OTel) 进行日志收集。为了简单起见,我们将使用 Python 作为演示编程语言,但请注意,截至撰写本文时,Python 中的日志支持仍处于早期阶段,因此可能需要更新。

我们将展示从使用 print 语句进行日志记录(婴儿 Grogu级别)到使用OpenTelemetry Collector 将日志记录到文件(专家 Grogu级别),再到使用 OTel 日志桥接 API 直接将OTLP尤达级别)摄取到 Collector 中。

如果您想跟随学习,您需要安装 Docker,并首先克隆 `git clone https://github.com/mhausenblas/ref.otel.help.git`,然后进入 `how-to/logs-collection/` 目录。

婴儿 Grogu 级别

我们的旅程从婴儿 Grogu 开始,这是为了保护无辜的 ;) 他是一名初级开发人员,对 Python 有一定的了解,但对遥测(特别是日志记录)不了解或不在乎。有一天,婴儿 Grogu 被要求编写一个“练习遥测”的代码,包括捕获错误的输入。代码会是什么样子?婴儿 Grogu 会如何与外界沟通代码执行的进度和潜在的错误情况?

首先,进入 baby-grogu/ 目录。

我们使用 `baby-grogu/main.py` 中的婴儿 Grogu 的 Python 代码作为示例,其中有趣的部分位于 `practice()` 函数中。

start_time = time.time()
try:
    how_long_int = int(how_long)
    print(f"Starting to practice The Telemetry for {how_long_int} second(s)")
    while time.time() - start_time < how_long_int:
        next_char = random.choice(string.punctuation)
        print(next_char, end="", flush=True)
        time.sleep(0.5)
    print("\nDone practicing")
except ValueError as ve:
    print(f"I need an integer value for the time to practice: {ve}")
    return False
except Exception as e:
    print(f"An unexpected error occurred: {e}")
    return False
return True

上面的 Python 代码并没有什么真正用处,只是打印出指定时间的随机标点符号,这代表了“练习”。但是,请注意婴儿 Grogu 在这里使用的 `print()` 函数的不同语义。

例如,当他们说 `print(next_char, end="", flush=True)` 时,他们实际上是在执行工作,而当他们写 `print("\nDone practicing")` 时,这是一个关于工作已完成的信息性消息。这将是日志消息的一个很好的候选!

同样,`print(f"I need an integer value for the time to practice: {ve}")` 实际上是婴儿 Grogu 在沟通发生了错误。

要执行代码,您可以直接使用 `python3 main.py 3` 来让婴儿 Grogu 练习 3 秒钟,或者您可以使用容器化版本(需要 Python 3.11)。

对于容器化版本,我们使用以下 `Dockerfile`。

FROM python:3.11
WORKDIR /usr/src/app
COPY . .

上面的 Dockerfile,我们将在以下 Docker Compose 文件 `docker-compose.yaml` 的上下文中将其使用。

version: '3'
services:
  baby-grogu:
    build: .
    command: python main.py 3
    volumes:
      - .:/usr/src/app

此时,您可以通过运行 `docker compose -f docker-compose.yaml` 来享受婴儿 Grogu 的成果,您应该会看到类似以下内容的输出(注意:已编辑以突出显示最重要的部分)。

baby-grogu-baby-grogu-1  | Starting to practice The Telemetry for 2 second(s)
baby-grogu-baby-grogu-1  | /)||
baby-grogu-baby-grogu-1  | Done practicing
baby-grogu-baby-grogu-1  | Practicing The Telemetry completed: True

好了,婴儿 Grogu 已经做得很好,现在是时候休息了。站起来,喝点水,然后带着清醒的头脑回来,让我们升级一下,使用 OTel!

专家 Grogu 级别

随着时间的推移,婴儿 Grogu 已经了解了可观测性和遥测方面的知识。他们已经晋升为专家 Grogu 级别。怎么做到的?很高兴您提问,我来给您展示。

首先,进入 expert-grogu/ 目录。

在此场景中,专家 Grogu 将日志记录到其 Python 应用程序中的一个文件(JSON 格式)。然后,他们使用 OpenTelemetry Collector 读取该日志文件,使用 OTel Collector 中的 filelog receiver 解析日志记录,最后使用 debug exporter 将其输出到 `stdout`。有意义吗?让我们来看看它的实际运行情况……

总体而言,我们有以下设置。

flowchart LR
    app["python main.py"] -->  logfile["exgru.log"]
    logfile["exgru.log"] -->  otelcol["OTel collector"]
    otelcol["OTel collector"] -->  stdout["stdout"]

让我们先来看看专家 Grogu 在日志记录方面做了什么(在 `expert-grogu/main.py` 的 `practice()` 函数中)。

start_time = time.time()
try:
    how_long_int = int(how_long)
    logger.info("Starting to practice The Telemetry for %i second(s)", how_long_int)
    while time.time() - start_time < how_long_int:
        next_char = random.choice(string.punctuation)
        print(next_char, end="", flush=True)
        time.sleep(0.5)
    logger.info("Done practicing")
except ValueError as ve:
    logger.error("I need an integer value for the time to practice: %s", ve)
    return False
except Exception as e:
    logger.error("An unexpected error occurred: %s", e)
    return False
return True

因此,在上面的函数中,我们看到专家 Grogu 使用 `logger.xxx()` 函数来沟通状态/进度以及错误情况,例如用户为练习时间提供错误的输入值(例如,`python main.py ABC` 而不是 `python main.py 5`,因为前者无法解析为整数)。

我们使用以下 `Dockerfile`(安装我们有的一个依赖项 `python-json-logger==2.0.7`)。

FROM python:3.11
WORKDIR /usr/src/app
COPY requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt
COPY . .

使用以下 OTel Collector 配置(通过 OTelBin 进行可视化)。

receivers:
  filelog:
    include: [/usr/src/app/*.log]
    operators:
      - type: json_parser
        timestamp:
          parse_from: attributes.asctime
          layout: '%Y-%m-%dT%H:%M:%S'
        severity:
          parse_from: attributes.levelname
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]

在看起来像这样的 Docker Compose 文件中,我们将以上所有内容整合在一起。

version: '3'
services:
  collector:
    image: otel/opentelemetry-collector-contrib:latest
    volumes:
      - ./otel-config.yaml:/etc/otelcol-contrib/config.yaml
      - ./:/usr/src/app
    command: ['--config=/etc/otelcol-contrib/config.yaml']
    ports:
      - '4317:4317'
  baby-grogu:
    build: .
    command: python main.py 10
    volumes:
      - .:/usr/src/app

您可以使用 `docker compose -f docker-compose.yaml` 来运行它,您应该会看到类似以下的内容。

expert-grogu-collector-1   | 2023-11-15T17:21:32.811Z   info    service@v0.88.0/telemetry.go:84 Setting up own telemetry...
expert-grogu-collector-1   | 2023-11-15T17:21:32.812Z   info    service@v0.88.0/telemetry.go:201        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
expert-grogu-collector-1   | 2023-11-15T17:21:32.812Z   info    exporter@v0.88.0/exporter.go:275        Deprecated component. Will be removed in future releases.       {"kind": "exporter", "data_type": "logs", "name": "logging"}
expert-grogu-collector-1   | 2023-11-15T17:21:32.812Z   info    service@v0.88.0/service.go:143  Starting otelcol-contrib...     {"Version": "0.88.0", "NumCPU": 4}
expert-grogu-collector-1   | 2023-11-15T17:21:32.812Z   info    extensions/extensions.go:33     Starting extensions...
expert-grogu-collector-1   | 2023-11-15T17:21:32.812Z   info    adapter/receiver.go:45  Starting stanza receiver        {"kind": "receiver", "name": "filelog", "data_type": "logs"}
expert-grogu-collector-1   | 2023-11-15T17:21:32.813Z   info    service@v0.88.0/service.go:169  Everything is ready. Begin running and processing data.
expert-grogu-collector-1   | 2023-11-15T17:21:33.014Z   info    fileconsumer/file.go:182        Started watching file   {"kind": "receiver", "name": "filelog", "data_type": "logs", "component": "fileconsumer", "path": "/usr/src/app/exgru.log"}
expert-grogu-collector-1   | 2023-11-15T17:21:33.113Z   info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 4}
expert-grogu-collector-1   | 2023-11-15T17:21:33.113Z   info    ResourceLog #0
expert-grogu-collector-1   | Resource SchemaURL:
expert-grogu-collector-1   | ScopeLogs #0
expert-grogu-collector-1   | ScopeLogs SchemaURL:
expert-grogu-collector-1   | InstrumentationScope
expert-grogu-collector-1   | LogRecord #0
expert-grogu-collector-1   | ObservedTimestamp: 2023-11-15 17:21:33.01473246 +0000 UTC
expert-grogu-collector-1   | Timestamp: 2023-11-15 17:16:58 +0000 UTC
expert-grogu-collector-1   | SeverityText: INFO
expert-grogu-collector-1   | SeverityNumber: Info(9)
expert-grogu-collector-1   | Body: Str({"asctime": "2023-11-15T17:16:58", "levelname": "INFO", "message": "Starting to practice The Telemetry for 10 second(s)", "taskName": null})
expert-grogu-collector-1   | Attributes:
expert-grogu-collector-1   |      -> log.file.name: Str(exgru.log)
expert-grogu-collector-1   |      -> asctime: Str(2023-11-15T17:16:58)
expert-grogu-collector-1   |      -> levelname: Str(INFO)
expert-grogu-collector-1   |      -> message: Str(Starting to practice The Telemetry for 10 second(s))
expert-grogu-collector-1   |      -> taskName: Str(<nil>)
expert-grogu-collector-1   | Trace ID:
expert-grogu-collector-1   | Span ID:
expert-grogu-collector-1   | Flags: 0
expert-grogu-collector-1   | LogRecord #1
expert-grogu-collector-1   | ObservedTimestamp: 2023-11-15 17:21:33.014871669 +0000 UTC
expert-grogu-collector-1   | Timestamp: 2023-11-15 17:17:08 +0000 UTC
expert-grogu-collector-1   | SeverityText: INFO
expert-grogu-collector-1   | SeverityNumber: Info(9)
expert-grogu-collector-1   | Body: Str({"asctime": "2023-11-15T17:17:08", "levelname": "INFO", "message": "Done practicing", "taskName": null})
expert-grogu-collector-1   | Attributes:
expert-grogu-collector-1   |      -> log.file.name: Str(exgru.log)
expert-grogu-collector-1   |      -> asctime: Str(2023-11-15T17:17:08)
expert-grogu-collector-1   |      -> levelname: Str(INFO)
expert-grogu-collector-1   |      -> message: Str(Done practicing)
expert-grogu-collector-1   |      -> taskName: Str(<nil>)
expert-grogu-collector-1   | Trace ID:
expert-grogu-collector-1   | Span ID:
expert-grogu-collector-1   | Flags: 0
expert-grogu-collector-1   | LogRecord #2
expert-grogu-collector-1   | ObservedTimestamp: 2023-11-15 17:21:33.01487521 +0000 UTC
expert-grogu-collector-1   | Timestamp: 2023-11-15 17:17:08 +0000 UTC
expert-grogu-collector-1   | SeverityText: INFO
expert-grogu-collector-1   | SeverityNumber: Info(9)
expert-grogu-collector-1   | Body: Str({"asctime": "2023-11-15T17:17:08", "levelname": "INFO", "message": "Practicing The Telemetry completed: True", "taskName": null})
expert-grogu-collector-1   | Attributes:
expert-grogu-collector-1   |      -> message: Str(Practicing The Telemetry completed: True)
expert-grogu-collector-1   |      -> taskName: Str(<nil>)
expert-grogu-collector-1   |      -> asctime: Str(2023-11-15T17:17:08)
expert-grogu-collector-1   |      -> log.file.name: Str(exgru.log)
expert-grogu-collector-1   |      -> levelname: Str(INFO)
expert-grogu-collector-1   | Trace ID:
expert-grogu-collector-1   | Span ID:
expert-grogu-collector-1   | Flags: 0
expert-grogu-collector-1   | LogRecord #3
expert-grogu-collector-1   | ObservedTimestamp: 2023-11-15 17:21:33.01487771 +0000 UTC
expert-grogu-collector-1   | Timestamp: 2023-11-15 17:21:32 +0000 UTC
expert-grogu-collector-1   | SeverityText: INFO
expert-grogu-collector-1   | SeverityNumber: Info(9)
expert-grogu-collector-1   | Body: Str({"asctime": "2023-11-15T17:21:32", "levelname": "INFO", "message": "Starting to practice The Telemetry for 10 second(s)", "taskName": null})
expert-grogu-collector-1   | Attributes:
expert-grogu-collector-1   |      -> log.file.name: Str(exgru.log)
expert-grogu-collector-1   |      -> asctime: Str(2023-11-15T17:21:32)
expert-grogu-collector-1   |      -> levelname: Str(INFO)
expert-grogu-collector-1   |      -> message: Str(Starting to practice The Telemetry for 10 second(s))
expert-grogu-collector-1   |      -> taskName: Str(<nil>)
expert-grogu-collector-1   | Trace ID:
expert-grogu-collector-1   | Span ID:
expert-grogu-collector-1   | Flags: 0

尤达级别

现在我们换个角度,看看尤达,一位遥测大师。

首先,进入 yoda/ 目录。

在此场景中,我们看到尤达在 Python 应用程序中使用 OTel 日志桥接 API,将日志以 OpenTelemetry Protocol (OTLP) 格式直接摄取到 OTel Collector 中。这比先将日志记录到文件然后让 Collector 读取它要快得多,也更可靠!

总体而言,尤达使用的是以下设置。

flowchart LR
    app["python main.py"]-- OTLP --> otelcol["OTel collector"]
    otelcol["OTel collector"] -->  stdout["stdout"]

使用以下 OTel Collector 配置(通过 OTelBin 进行可视化)。

receivers:
  otlp:
    protocols:
      grpc:
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs:
      receivers: [otlp]
      exporters: [debug]

现在使用 `docker compose -f docker-compose.yaml` 运行尤达的设置,您应该会看到类似如下的内容。

yoda-collector-1   | 2023-11-15T16:54:22.545Z   info    service@v0.88.0/telemetry.go:84 Setting up own telemetry...
yoda-collector-1   | 2023-11-15T16:54:22.546Z   info    service@v0.88.0/telemetry.go:201        Serving Prometheus metrics      {"address": ":8888", "level": "Basic"}
yoda-collector-1   | 2023-11-15T16:54:22.546Z   info    exporter@v0.88.0/exporter.go:275        Deprecated component. Will be removed in future releases.       {"kind": "exporter", "data_type": "logs", "name": "logging"}
yoda-collector-1   | 2023-11-15T16:54:22.547Z   info    service@v0.88.0/service.go:143  Starting otelcol-contrib...     {"Version": "0.88.0", "NumCPU": 4}
yoda-collector-1   | 2023-11-15T16:54:22.547Z   info    extensions/extensions.go:33     Starting extensions...
yoda-collector-1   | 2023-11-15T16:54:22.547Z   warn    internal@v0.88.0/warning.go:40  Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks    {"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks"}
yoda-collector-1   | 2023-11-15T16:54:22.549Z   info    otlpreceiver@v0.88.0/otlp.go:83 Starting GRPC server    {"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
yoda-collector-1   | 2023-11-15T16:54:22.550Z   info    service@v0.88.0/service.go:169  Everything is ready. Begin running and processing data.
yoda-collector-1   | 2023-11-15T16:54:27.667Z   info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 1}
yoda-collector-1   | 2023-11-15T16:54:27.668Z   info    ResourceLog #0
yoda-collector-1   | Resource SchemaURL:
yoda-collector-1   | Resource attributes:
yoda-collector-1   |      -> telemetry.sdk.language: Str(python)
yoda-collector-1   |      -> telemetry.sdk.name: Str(opentelemetry)
yoda-collector-1   |      -> telemetry.sdk.version: Str(1.21.0)
yoda-collector-1   |      -> service.name: Str(train-the-telemetry)
yoda-collector-1   |      -> service.instance.id: Str(33992a23112e)
yoda-collector-1   | ScopeLogs #0
yoda-collector-1   | ScopeLogs SchemaURL:
yoda-collector-1   | InstrumentationScope opentelemetry.sdk._logs._internal
yoda-collector-1   | LogRecord #0
yoda-collector-1   | ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
yoda-collector-1   | Timestamp: 2023-11-15 16:54:22.651675136 +0000 UTC
yoda-collector-1   | SeverityText: INFO
yoda-collector-1   | SeverityNumber: Info(9)
yoda-collector-1   | Body: Str(Starting to practice The Telemetry for 10 second(s))
yoda-collector-1   | Trace ID:
yoda-collector-1   | Span ID:
yoda-collector-1   | Flags: 0
yoda-collector-1   |    {"kind": "exporter", "data_type": "logs", "name": "logging"}
yoda-collector-1   | 2023-11-15T16:54:32.715Z   info    LogsExporter    {"kind": "exporter", "data_type": "logs", "name": "logging", "resource logs": 1, "log records": 2}
yoda-collector-1   | 2023-11-15T16:54:32.716Z   info    ResourceLog #0
yoda-collector-1   | Resource SchemaURL:
yoda-collector-1   | Resource attributes:
yoda-collector-1   |      -> telemetry.sdk.language: Str(python)
yoda-collector-1   |      -> telemetry.sdk.name: Str(opentelemetry)
yoda-collector-1   |      -> telemetry.sdk.version: Str(1.21.0)
yoda-collector-1   |      -> service.name: Str(train-the-telemetry)
yoda-collector-1   |      -> service.instance.id: Str(33992a23112e)
yoda-collector-1   | ScopeLogs #0
yoda-collector-1   | ScopeLogs SchemaURL:
yoda-collector-1   | InstrumentationScope opentelemetry.sdk._logs._internal
yoda-collector-1   | LogRecord #0
yoda-collector-1   | ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
yoda-collector-1   | Timestamp: 2023-11-15 16:54:32.713701888 +0000 UTC
yoda-collector-1   | SeverityText: INFO
yoda-collector-1   | SeverityNumber: Info(9)
yoda-collector-1   | Body: Str(Done practicing)
yoda-collector-1   | Trace ID:
yoda-collector-1   | Span ID:
yoda-collector-1   | Flags: 0
yoda-collector-1   | LogRecord #1
yoda-collector-1   | ObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC
yoda-collector-1   | Timestamp: 2023-11-15 16:54:32.714062336 +0000 UTC
yoda-collector-1   | SeverityText: INFO
yoda-collector-1   | SeverityNumber: Info(9)
yoda-collector-1   | Body: Str(Practicing The Telemetry completed: True)
yoda-collector-1   | Trace ID:
yoda-collector-1   | Span ID:
yoda-collector-1   | Flags: 0
yoda-collector-1   |    {"kind": "exporter", "data_type": "logs", "name": "logging"}
yoda-baby-grogu-1  | =`;*'+.|,+?):(*-<}~}

有趣吧?您现在可以随意修改尤达的源代码,添加更多上下文信息,并添加处理器来处理 Collector 传递的日志记录。

遥测与你同在,年轻的帕达万!

下一步?

现在您已经熟悉了遥测及其良好实践,您可以扩展尤达的代码来执行以下操作:

  1. 添加更多上下文。例如,尝试使用 OTel 资源属性语义约定来使执行的上下文更加明确。
  2. 在 OTel Collector 中丰富日志或使用处理器(如transform 或 attributes 处理器)过滤某些严重性级别。
  3. 通过发出 Span 来添加跟踪支持,在适当的地方。
  4. 将可观测性后端(如 OpenSearch(连同Data Prepper))添加到设置中,允许以 OTLP 格式摄取 Span 和日志。
  5. 一旦将 Span 和日志摄取到后端,尝试在后端与前端(如 Grafana)关联这两种遥测信号类型。
  6. 使用零代码插桩进一步丰富遥测数据。

社区目前正在开发事件 API 接口,这是一个继续研究并可能提供反馈的好地方?

鸣谢与参考

感谢 Severin NeumannHoussam Chehab,他们都非常耐心地帮助我,并且在实现尤达级别的工作中发挥了关键作用,我欠你们的!

如果您想深入了解 OTel 日志收集(尤其是在 Python 中),请参阅以下资源:

最后修改日期 2024 年 11 月 26 日:将 prettier 更新至 3.4.0 (#5684) (1505d056)