997 changes: 548 additions & 449 deletions Pipfile.lock

2 changes: 1 addition & 1 deletion docs/open-api-docs.yaml
@@ -2,7 +2,7 @@ openapi: 3.0.3
 info:
   title: The Agent's user-facing API
   description: The user-facing parts of The Agent's API service (excluding system-level endpoints, chat completion, maintenance endpoints, etc.)
-  version: 5.12.3
+  version: 5.13.3
   license:
     name: MIT
     url: https://opensource.org/licenses/MIT
@@ -0,0 +1,2 @@
schema: spec-driven
created: 2026-05-11
@@ -0,0 +1,76 @@
## Context

The LLM tool `process_attachments` currently only accepts attachment IDs: references to files stored in the DB after being received from Telegram/WhatsApp. The downstream processors (`ChatAttachmentProcessor`, `ChatImageEditService`) work with `ChatMessageAttachment` objects that carry a URL, MIME type, and extension. These platform attachment URLs contain bot tokens in their paths and must never be exposed to the LLM or logged.

External URLs provided by users are public and don't have this sensitivity. They need to enter the same processing pipeline without going through DB storage or platform SDK refresh.

## Goals / Non-Goals

**Goals:**
- Allow the LLM to process external URLs through the same media pipeline as chat attachments
- Support both `analyze` and `image-edit` operations for URL-sourced media
- Keep the change minimal: reuse existing processors rather than forking the pipeline
- Cache URL-based analysis results using URL hash as cache key

**Non-Goals:**
- No DB persistence of URL-sourced media
- No authentication/cookie support for fetching URLs behind login walls
- No new file format support β€” URLs must resolve to already-supported formats
- No changes to the attachment ingestion pipeline from platforms

## Decisions

### 1. Separate `urls` parameter instead of mixed-format input

Add `urls: str | None` as a new comma-separated parameter alongside existing `attachment_ids`. Do not merge them into a single field.

**Why:** LLMs (especially weaker ones) reliably fill separate named parameters but struggle with mixed-format strings in which each item must be prefixed correctly. Two params with clear names (`attachment_ids` for 📎 IDs, `urls` for HTTP links) are unambiguous.

**Alternative considered:** Single `sources` param with prefix-based parsing (`📎abc,https://...`). Rejected for LLM reliability reasons.

### 2. Virtual attachments built at tool-library level

Resolve URLs into ephemeral `ChatMessageAttachment` objects in `llm_tool_library.py` before passing them to `ChatAttachmentProcessor` or `ChatImageEditService`. The processors receive the same type they already expect.

**Why:** Minimizes changes to downstream processors. They don't need to know whether an attachment came from DB or a URL β€” they just need an object with `last_url`, `mime_type`, and `extension`.

**Alternative considered:** Teaching each processor to accept raw URLs directly. Rejected because it duplicates resolution logic and touches more code.

### 3. MIME type detection via HTTP HEAD + URL extension fallback

For external URLs:
1. Parse extension from URL path (strip query params)
2. Send HTTP HEAD request to get `Content-Type` header
3. Use HEAD response if available, fall back to extension-based lookup from `supported_files.py`
4. Reject if MIME type can't be determined or isn't in `KNOWN_FILE_FORMATS`

**Why:** HEAD is cheap and gives the authoritative MIME type. Extension fallback handles servers that don't return Content-Type. Rejecting unknown types prevents the pipeline from silently failing on unsupported media.

### 4. Virtual attachment identity

Virtual attachments use a deterministic ID derived from a URL hash (e.g., `url-<md5>`). This gives:
- Stable cache keys for repeat analysis of the same URL
- Clear distinction from DB-backed attachment IDs (which are UUIDs)
- No collision with real attachment IDs

The `chat_id` field on virtual attachments will use the current invoker's chat ID.
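A sketch of the deterministic `url-<md5>` ID and an ephemeral attachment record built around it. `VirtualAttachment`, `virtual_attachment_id`, `is_virtual_attachment_id`, and `build_virtual_attachment` are hypothetical names; `VirtualAttachment` stands in for `ChatMessageAttachment`, whose real constructor is not shown in this PR, and `chat_id` is assumed to be a string.

```python
import hashlib
from dataclasses import dataclass

VIRTUAL_ID_PREFIX = "url-"


@dataclass(frozen=True)
class VirtualAttachment:
    """Hypothetical stand-in for an ephemeral, never-persisted ChatMessageAttachment."""
    id: str
    chat_id: str  # the current invoker's chat ID (assumed to be a string here)
    last_url: str
    mime_type: str
    extension: str


def virtual_attachment_id(url: str) -> str:
    """Deterministic ID: the same URL string always yields the same 'url-<md5>' value."""
    # MD5 serves only as a stable fingerprint here, not as a security measure
    digest = hashlib.md5(url.encode("utf-8"), usedforsecurity=False).hexdigest()
    return f"{VIRTUAL_ID_PREFIX}{digest}"


def is_virtual_attachment_id(attachment_id: str) -> bool:
    """Canonical UUID strings contain only hex digits and hyphens, so they never start with 'url-'."""
    return attachment_id.startswith(VIRTUAL_ID_PREFIX)


def build_virtual_attachment(url: str, chat_id: str, mime_type: str, extension: str) -> VirtualAttachment:
    """Wrap an already-validated external URL in an attachment-shaped object."""
    return VirtualAttachment(
        id=virtual_attachment_id(url),
        chat_id=chat_id,
        last_url=url,
        mime_type=mime_type,
        extension=extension,
    )
```

Because the ID depends only on the exact URL string, `https://example.com/a.jpg` and `https://example.com/a.jpg?x=1` get different IDs; any URL normalization would have to happen before hashing.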

### 5. Processor changes: accept pre-resolved attachments

Both `ChatAttachmentProcessor` and `ChatImageEditService` currently resolve attachments from IDs internally. They need a second entry path that accepts already-resolved `ChatMessageAttachment` objects (the virtual ones).

Approach: Add an optional `pre_resolved_attachments` parameter to both. When provided, these skip DB lookup and platform refresh. The existing `attachment_ids` path remains unchanged.

### 6. Rename `process_attachments` → `process_media`

The tool name and its `ALL_LLM_TOOLS` key both change. The docstring is updated to mention URLs. The `attachment_ids` parameter keeps its name and description (📎 IDs), and `urls` is added alongside it.

## Risks / Trade-offs

**Large file downloads** → No mitigation beyond HTTP timeouts. Users could paste URLs to very large files. The existing `requests.get()` in processors already loads the full content into memory, so this is a pre-existing concern, not one introduced by this change.

**URL liveness** → External URLs may go stale between processing calls. Unlike platform attachments (which get refreshed), there's no refresh mechanism. Acceptable for ephemeral, non-persisted media.

**HEAD request may be blocked** → Some servers block HEAD or return a different Content-Type than the actual content. The extension fallback mitigates this. If both fail, the tool returns an error to the LLM.

**Cache key stability** → The same URL may serve different content over time, and the cache TTL (13 weeks, matching the existing attachment cache) is long. Acceptable trade-off: users can re-send the URL if the content changed.
@@ -0,0 +1,32 @@
## Why

The `process_attachments` LLM tool only accepts chat message attachment IDs (files sent via Telegram/WhatsApp). When a user pastes a URL to an image, audio file, or document in their message, the agent has no way to process that media; it can only fetch web page text via `fetch_web_content`. This means users can't say "analyze this image: https://example.com/photo.jpg" or "edit this photo: https://imgur.com/abc.png" and get the same media processing they'd get from sending the file directly.

## What Changes

- Rename `process_attachments` to `process_media` in the LLM tool library
- Add a new `urls` parameter (comma-separated list of external URLs) alongside the existing `attachment_ids` parameter
- Build URL-to-virtual-attachment resolution: HTTP HEAD for MIME type detection, extension parsing from URL path, construction of ephemeral `ChatMessageAttachment` objects
- Modify `ChatAttachmentProcessor` to accept pre-resolved attachments (skip DB lookup and platform refresh for virtual attachments)
- Modify `ChatImageEditService` to accept pre-resolved attachments (same skip logic)
- Use URL hash as cache key for URL-based inputs (same caching infra, no DB persistence)
- Both `analyze` and `image-edit` operations work with external URLs

## Capabilities

### New Capabilities

- `external-url-media-resolution`: Resolving external URLs into virtual `ChatMessageAttachment` objects with MIME type detection and extension parsing, usable by existing media processing pipelines

### Modified Capabilities

None.

## Impact

- `src/features/chat/llm_tools/llm_tool_library.py`: rename function, add `urls` param, URL resolution logic
- `src/features/chat/chat_attachment_processor.py`: accept pre-resolved attachments, skip DB/refresh for them
- `src/features/chat/chat_image_edit_service.py`: accept pre-resolved attachments, skip DB/refresh for them
- `src/di/di.py` β€” update factory signatures if needed
- Tests for all modified files
- No API changes, no DB migrations, no new dependencies
@@ -0,0 +1,61 @@
## ADDED Requirements

### Requirement: LLM tool accepts external URLs for media processing

The `process_media` tool SHALL accept an optional `urls` parameter containing a comma-separated list of external URLs. At least one of `attachment_ids` or `urls` MUST be provided. Both MAY be provided simultaneously, in which case results from both sources are merged.

#### Scenario: URL-only media analysis
- **WHEN** the LLM calls `process_media` with `urls: "https://example.com/photo.jpg"` and `operation: "analyze"`
- **THEN** the system fetches the URL, detects its MIME type, routes it through the image analysis pipeline, and returns a text description

#### Scenario: URL-only image editing
- **WHEN** the LLM calls `process_media` with `urls: "https://example.com/photo.jpg"` and `operation: "image-edit"`
- **THEN** the system fetches the URL, treats it as a reference image, and generates an edited image delivered to the user

#### Scenario: Mixed attachments and URLs
- **WHEN** the LLM calls `process_media` with both `attachment_ids: "📎abc-123"` and `urls: "https://example.com/photo.jpg"` and `operation: "analyze"`
- **THEN** the system processes both sources and returns combined results for all inputs

#### Scenario: Neither parameter provided
- **WHEN** the LLM calls `process_media` with both `attachment_ids` and `urls` empty or missing
- **THEN** the system returns a validation error indicating at least one source is required

### Requirement: External URLs are resolved into virtual attachments

The system SHALL resolve each external URL into an ephemeral `ChatMessageAttachment` object with a deterministic ID (`url-<md5_of_url>`), the URL as `last_url`, and detected MIME type and extension. These virtual attachments SHALL NOT be persisted to the database.

#### Scenario: MIME type detected from HTTP HEAD
- **WHEN** an external URL is resolved and the server responds to HTTP HEAD with a valid `Content-Type` header matching a known format
- **THEN** the virtual attachment uses that MIME type and the corresponding extension

#### Scenario: MIME type detected from URL extension
- **WHEN** an external URL is resolved and the HTTP HEAD either fails or returns no usable `Content-Type`, but the URL path contains a recognized file extension
- **THEN** the virtual attachment uses the MIME type mapped from that extension via `supported_files.py`

#### Scenario: MIME type cannot be determined
- **WHEN** an external URL is resolved and neither HTTP HEAD nor URL extension yields a known MIME type
- **THEN** the system returns an error for that URL indicating the media type is unsupported

### Requirement: Virtual attachments skip DB lookup and platform refresh

When `ChatAttachmentProcessor` or `ChatImageEditService` receives pre-resolved virtual attachments, it SHALL skip the database lookup and platform SDK refresh steps. The processing pipeline (image analysis, audio transcription, document search, image editing) SHALL work identically regardless of attachment source.

#### Scenario: Analyze path with virtual attachment
- **WHEN** `ChatAttachmentProcessor` receives a virtual attachment with a valid image URL and MIME type
- **THEN** it processes the image through computer vision analysis without any DB or platform SDK calls

#### Scenario: Image-edit path with virtual attachment
- **WHEN** `ChatImageEditService` receives a virtual attachment with a valid image URL
- **THEN** it passes the URL to `ImageEditor` without calling `refresh_attachments_by_ids`

### Requirement: URL-based results are cached using URL hash

The system SHALL cache analysis results for URL-based inputs using the URL's MD5 hash as the cache key (same prefix and TTL as attachment-based caching). Repeat analysis of the same URL with the same context within the cache TTL SHALL return cached results.

#### Scenario: Cache hit on repeated URL analysis
- **WHEN** the same URL is analyzed twice with the same context within the cache TTL
- **THEN** the second call returns cached results without re-fetching or re-analyzing

#### Scenario: Cache miss on different context
- **WHEN** the same URL is analyzed with a different context string
- **THEN** the system re-fetches and re-analyzes, producing a new cache entry
@@ -0,0 +1,22 @@
## 1. URL Resolution

- [x] 1.1 Add URL-to-virtual-attachment resolution function in `llm_tool_library.py`: HTTP HEAD for MIME type, extension parsing from URL path, fallback to `supported_files.py` lookup, build ephemeral `ChatMessageAttachment` with `url-<md5>` ID
- [x] 1.2 Add validation: reject URLs whose MIME type can't be determined or isn't in `KNOWN_FILE_FORMATS`

## 2. LLM Tool Changes

- [x] 2.1 Rename `process_attachments` → `process_media` in function name, `ALL_LLM_TOOLS` dict key, and docstring; add `urls: str | None` parameter; update docstring to describe both input sources
- [x] 2.2 Wire up `process_media` to resolve URLs into virtual attachments and merge with DB-resolved attachments before passing to processors; validate that at least one of `attachment_ids` or `urls` is provided

## 3. Processor Changes

- [x] 3.1 Modify `ChatAttachmentProcessor.__init__` to accept optional `pre_resolved_attachments` list; when provided, skip DB lookup and platform refresh for those; merge with any DB-resolved attachments
- [x] 3.2 Modify `ChatImageEditService.__init__` to accept optional `pre_resolved_attachments` list; when provided, skip `refresh_attachments_by_ids` for those; merge with any DB-resolved attachments
- [x] 3.3 Update DI factory methods (`chat_attachment_processor`, `chat_image_edit_service`) to pass through the new parameter

## 4. Tests

- [x] 4.1 Add tests for URL resolution: MIME from HEAD, MIME from extension fallback, unsupported type rejection, malformed URL handling
- [x] 4.2 Add tests for `process_media`: URL-only, attachment-only, mixed, neither (validation error)
- [x] 4.3 Add tests for `ChatAttachmentProcessor` with pre-resolved virtual attachments (verify no DB/SDK calls)
- [x] 4.4 Add tests for `ChatImageEditService` with pre-resolved virtual attachments (verify no DB/SDK calls)
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "the-agent"
-version = "5.12.3"
+version = "5.13.3"

 [tool.setuptools]
 package-dir = {"" = "src"}
2 changes: 1 addition & 1 deletion src/db/model/chat_config.py
@@ -120,7 +120,7 @@ def __ge__(self, other):
     title = Column(String, nullable = True)
     is_private = Column(Boolean, nullable = False)
     reply_chance_percent = Column(Integer, nullable = False)
-    release_notifications = Column(EnumSQL(ReleaseNotifications), nullable = False, default = ReleaseNotifications.all)
+    release_notifications = Column(EnumSQL(ReleaseNotifications), nullable = False, default = ReleaseNotifications.major)
     media_mode = Column(EnumSQL(MediaMode), nullable = False, default = MediaMode.photo)
     chat_type = Column(EnumSQL(ChatType), nullable = False)

17 changes: 12 additions & 5 deletions src/di/di.py
@@ -66,6 +66,7 @@
 from features.chat.telegram.sdk.telegram_bot_sdk import TelegramBotSDK
 from features.chat.telegram.telegram_data_resolver import TelegramDataResolver
 from features.chat.telegram.telegram_domain_mapper import TelegramDomainMapper
+from features.chat.url_attachment_resolver import UrlAttachmentResolver
 from features.chat.whatsapp.sdk.whatsapp_bot_api import WhatsAppBotAPI
 from features.chat.whatsapp.sdk.whatsapp_bot_sdk import WhatsAppBotSDK
 from features.chat.whatsapp.whatsapp_data_resolver import WhatsAppDataResolver
@@ -898,13 +899,14 @@ def file_uploader(

     def chat_image_edit_service(
         self,
-        attachment_ids: list[str],
-        operation_guidance: str | None,
+        attachment_ids: list[str] | None,
+        urls: list[str] | None = None,
+        operation_guidance: str | None = None,
         aspect_ratio: str | None = None,
         output_size: str | None = None,
     ) -> "ChatImageEditService":
         from features.chat.chat_image_edit_service import ChatImageEditService
-        return ChatImageEditService(attachment_ids, operation_guidance, aspect_ratio, output_size, self)
+        return ChatImageEditService(attachment_ids, urls, operation_guidance, aspect_ratio, output_size, self)

     def image_editor(
         self,
@@ -957,13 +959,18 @@ def audio_transcriber(
             def_extension, audio_content,
         )

+    def url_attachment_resolver(self, url: str) -> "UrlAttachmentResolver":
+        from features.chat.url_attachment_resolver import UrlAttachmentResolver
+        return UrlAttachmentResolver(url, self)
+
     def chat_attachment_processor(
         self,
         additional_context: str | None,
-        attachment_ids: list[str],
+        attachment_ids: list[str] | None,
+        urls: list[str] | None,
     ) -> "ChatAttachmentProcessor":
         from features.chat.chat_attachment_processor import ChatAttachmentProcessor
-        return ChatAttachmentProcessor(additional_context, attachment_ids, self)
+        return ChatAttachmentProcessor(additional_context, attachment_ids, urls, self)

     def dev_announcements_service(
         self,
3 changes: 2 additions & 1 deletion src/features/audio/audio_transcriber.py
@@ -17,6 +17,7 @@
 from features.external_tools.configured_tool import ConfiguredTool
 from features.external_tools.external_tool import ToolType
 from features.integrations import prompt_resolvers
+from features.web_browsing.web_fetcher import DEFAULT_HEADERS
 from util import log
 from util.error_codes import LLM_UNEXPECTED_RESPONSE
 from util.errors import ExternalServiceError
@@ -57,7 +58,7 @@ def __init__(

     def __validate_content(self, audio_url: str, audio_content: bytes | None):
         log.t(f"Fetching and validating audio from URL '{audio_url}'")
-        self.__audio_content = audio_content or requests.get(audio_url).content
+        self.__audio_content = audio_content or requests.get(audio_url, headers = DEFAULT_HEADERS).content

         if self.__extension not in SUPPORTED_AUDIO_FORMATS.keys():
             log.t(f" Unsupported audio format: '.{self.__extension}'")