Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 16 additions & 1 deletion astrbot/core/platform/sources/telegram/tg_adapter.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import re
import sys
import uuid
import os

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from telegram import BotCommand, Update
Expand All @@ -21,10 +22,13 @@
register_platform_adapter,
)
from astrbot.core.platform.astr_message_event import MessageSesion
from astrbot.core.utils.astrbot_path import get_astrbot_temp_path
from astrbot.core.utils.media_utils import convert_audio_to_wav
from astrbot.core.star.filter.command import CommandFilter
from astrbot.core.star.filter.command_group import CommandGroupFilter
from astrbot.core.star.star import star_map
from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

导入了 file_to_base64 函数,但根据当前PR的改动,该函数似乎并未在 tg_adapter.py 文件中使用。建议移除未使用的导入,以保持代码的整洁性。


from .tg_event import TelegramPlatformEvent

Expand Down Expand Up @@ -375,8 +379,19 @@ async def convert_message(

elif update.message.voice:
file = await update.message.voice.get_file()

file_basename = os.path.basename(file.file_path)
temp_dir = get_astrbot_temp_path()
temp_path = os.path.join(temp_dir, file_basename)
Comment on lines +383 to +385
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): 临时文件仅使用 basename 作为文件名,在并发或重复处理场景下可能发生冲突。

由于 temp_path 只是来自 os.path.basename(file.file_path),不同的语音消息如果有相同的 basename(或者同一条消息被并发处理),就可能在 temp_dir 中互相覆盖,从而导致竞争条件或损坏的音频文件。请在临时文件名中增加一个唯一标识(例如 UUID、时间戳或 update/message ID)。

Original comment in English

issue (bug_risk): Using only the basename for temp files can cause collisions under concurrent or repeated processing.

Because temp_path comes only from os.path.basename(file.file_path), different voice messages with the same basename (or concurrent handling of the same message) can overwrite each other in temp_dir, leading to race conditions or corrupted audio. Please add a uniqueness component (e.g., UUID, timestamp, or update/message ID) to the temp filename.

temp_path = await download_image_by_url(file.file_path, path=temp_path)
path_wav = os.path.join(
temp_dir,
f"{file_basename}.wav",
)
path_wav = await convert_audio_to_wav(temp_path, path_wav)
Comment on lines +384 to +391
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): 考虑一下这些临时音频文件在何时以及如何被清理,以避免磁盘空间被占满。

原始下载文件(temp_path)和转换后的文件(path_wav)都会保留在 temp_dir 中且没有清理逻辑。对于短时处理的场景,它们应该写入到一个会被定期清理的位置,或者在处理完成后显式删除;否则,对于长期运行的 bot,这些文件会逐渐占满磁盘。

推荐实现方式:

import os

from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64
        elif update.message.voice:
            file = await update.message.voice.get_file()

            file_basename = os.path.basename(file.file_path)
            temp_dir = get_astrbot_temp_path()
            temp_path = os.path.join(temp_dir, file_basename)
            path_wav = os.path.join(
                temp_dir,
                f"{file_basename}.wav",
            )

            try:
                temp_path = await download_image_by_url(file.file_path, path=temp_path)
                path_wav = await convert_audio_to_wav(temp_path, path_wav)

                message.message = [
                    Comp.Record(file=path_wav, url=path_wav),
                ]
            finally:
                # Best-effort cleanup of temporary audio files to avoid disk buildup
                for _path in (temp_path, path_wav):
                    if _path and os.path.exists(_path):
                        try:
                            os.remove(_path)
                        except OSError:
                            # Ignore cleanup errors; they shouldn't break message handling
                            pass

这里假设所有需要读取 WAV 文件的处理逻辑都会在同一个调用栈中同步完成(即在 finally 执行之前完成)。如果你的处理流水线会延迟读取 Comp.Record.file/.url(例如在另一个任务或 worker 中),就需要将清理逻辑移动到生命周期的更后阶段——在语音消息完全处理完之后——或者重构 Comp.Record,使其使用内存数据而不是文件路径。

Original comment in English

suggestion (performance): Consider how/when these temporary audio files are cleaned up to avoid disk buildup.

Both the original download (temp_path) and converted file (path_wav) remain in temp_dir with no cleanup. For short‑lived processing, they should either be written to a location that’s periodically cleaned or explicitly deleted once processing is complete; otherwise, long‑running bots will gradually fill the disk.

Suggested implementation:

import os

from astrbot.core.star.star_handler import star_handlers_registry
from astrbot.core.utils.io import download_file, download_image_by_url, file_to_base64
        elif update.message.voice:
            file = await update.message.voice.get_file()

            file_basename = os.path.basename(file.file_path)
            temp_dir = get_astrbot_temp_path()
            temp_path = os.path.join(temp_dir, file_basename)
            path_wav = os.path.join(
                temp_dir,
                f"{file_basename}.wav",
            )

            try:
                temp_path = await download_image_by_url(file.file_path, path=temp_path)
                path_wav = await convert_audio_to_wav(temp_path, path_wav)

                message.message = [
                    Comp.Record(file=path_wav, url=path_wav),
                ]
            finally:
                # Best-effort cleanup of temporary audio files to avoid disk buildup
                for _path in (temp_path, path_wav):
                    if _path and os.path.exists(_path):
                        try:
                            os.remove(_path)
                        except OSError:
                            # Ignore cleanup errors; they shouldn't break message handling
                            pass

This change assumes that any processing that needs to read the WAV file happens synchronously within the same call stack (i.e., before the finally block executes). If your pipeline defers reading Comp.Record.file/.url until later (e.g., in another task or worker), you'll need to move the cleanup logic to a later point in the lifecycle—after the voice message has been fully processed—or refactor Comp.Record to work with in-memory data instead of a file path.

Comment on lines +383 to +391
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

A high-severity Path Traversal vulnerability exists here. The filename from file.file_path is used to construct local file paths without proper sanitization, which could allow an attacker to write files to arbitrary locations. Additionally, the use of download_image_by_url is misleading and incorrect for OGG audio files, as it internally calls save_temp_img which saves files as .jpg. While the provided suggestion addresses the path traversal by generating a secure, random filename and explicitly setting the .ogg extension for the temporary file, ensure that download_image_by_url correctly saves the file with the specified extension, or consider using a more generic file download function like download_file if download_image_by_url forces a .jpg extension.

            # To prevent path traversal, generate a random filename instead of using the one from the URL.
            safe_basename = uuid.uuid4().hex
            temp_dir = get_astrbot_temp_path()
            # Assume the downloaded voice file is in ogg format as per the PR description.
            temp_path = os.path.join(temp_dir, f\"{safe_basename}.ogg\")
            await download_image_by_url(file.file_path, path=temp_path)
            path_wav = os.path.join(
                temp_dir, f\"{safe_basename}.wav\"
            )
            path_wav = await convert_audio_to_wav(temp_path, path_wav)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

下载的临时OGG文件 (temp_path) 在转换为WAV格式后并未被删除。这可能导致临时文件在文件系统中累积,占用不必要的磁盘空间。建议在文件转换完成后清理这些临时文件。

            path_wav = await convert_audio_to_wav(temp_path, path_wav)
            # 清理临时OGG文件
            if os.path.exists(temp_path):
                os.remove(temp_path)


message.message = [
Comp.Record(file=file.file_path, url=file.file_path),
Comp.Record(file=path_wav, url=path_wav),
]

elif update.message.photo:
Expand Down
Loading