explore: image-to-image editing for Gemini + closed visual loop #1
Open
looptech-explorer[bot] wants to merge 2 commits into
Adds edit_image_gemini, the image-to-image counterpart the generation-only server never built. It accepts up to 14 reference images (inlineData parts) for editing, multi-image composition, and character consistency, and returns the edited image to the caller as a viewable MCP ImageContent block (not just a file path) so the model can see its output and iterate.

- image_edit.py: pure, offline-testable helpers (mime sniffing from magic bytes, request-body construction, response parsing) + injectable async gemini_edit().
- server.py: edit_image_gemini tool returning ToolResult(image + metadata).
- tests/ + demo_edit.py: full round-trip proven offline via httpx.MockTransport, no network and no API key.
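To make the helper description above concrete, here is a minimal sketch of magic-byte mime sniffing and request-body construction. It is not the PR's actual image_edit.py: the names sniff_mime and build_edit_request are hypothetical, the format table is an illustrative subset, and the body shape (contents / parts / inlineData with mimeType and base64 data) is assumed from the public Gemini REST conventions. Only the 14-image limit and the use of inlineData parts come from the description.

```python
from __future__ import annotations

import base64

# Magic-byte prefixes for a few common formats (illustrative subset, not exhaustive).
_MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]

MAX_REFERENCE_IMAGES = 14  # limit stated in the PR description


def sniff_mime(data: bytes) -> str:
    """Guess an image MIME type from its leading bytes (hypothetical helper)."""
    for prefix, mime in _MAGIC:
        if data.startswith(prefix):
            return mime
    # WebP is a RIFF container: "RIFF" + 4-byte size + "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("unrecognized image format")


def build_edit_request(prompt: str, images: list[bytes]) -> dict:
    """Build a request body with one text part and one inlineData part per image.

    The exact body shape is an assumption, not taken from the PR.
    """
    if not images:
        raise ValueError("at least one reference image is required")
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"at most {MAX_REFERENCE_IMAGES} reference images are supported")
    parts: list[dict] = [{"text": prompt}]
    for data in images:
        parts.append(
            {
                "inlineData": {
                    "mimeType": sniff_mime(data),
                    "data": base64.b64encode(data).decode("ascii"),
                }
            }
        )
    return {"contents": [{"parts": parts}]}
```

Because both functions are pure (bytes in, dict out), they can be unit-tested without a network connection or an API key, which is the property the description relies on.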
- parse_gemini_image_response: take the FIRST image/text part, not the last (the last-wins loop silently dropped the earlier image on multi-part responses).
- edit_image_gemini: wrap the output write so disk errors surface as ValueError, consistent with the rest of the server, instead of a raw OSError.
- demo_edit.py: scope the _make_client patch + GOOGLE_API_KEY via context managers (was a process-wide leak); close the Part B mock client via async with.

pytest 14 passed - demo 17/17 - ruff clean - mypy clean
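The first two fixes can be sketched as follows. This is an illustration under stated assumptions, not the code in the PR: parse_first_image_and_text and write_output are hypothetical names, and the response shape (candidates / content / parts) is assumed from the public Gemini REST conventions.

```python
from __future__ import annotations

import base64
from pathlib import Path
from typing import Optional


def parse_first_image_and_text(response: dict) -> tuple[Optional[bytes], Optional[str]]:
    """Return the FIRST image part and FIRST text part found (hypothetical helper).

    A last-wins loop would overwrite `image` on every inlineData part and
    silently drop earlier images; the `is None` guards keep the first one.
    """
    image: Optional[bytes] = None
    text: Optional[str] = None
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            inline = part.get("inlineData")
            if inline is not None and image is None:
                image = base64.b64decode(inline["data"])
            if "text" in part and text is None:
                text = part["text"]
    return image, text


def write_output(path: str, data: bytes) -> str:
    """Write the edited image, converting disk errors to ValueError.

    Assumes, as the commit message says, that ValueError is the error type
    the rest of the server raises.
    """
    try:
        Path(path).write_bytes(data)
    except OSError as exc:
        raise ValueError(f"could not write edited image to {path}: {exc}") from exc
    return path
```

Chaining with `from exc` keeps the original OSError available in the traceback while callers only need to handle ValueError.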
Review + fixes applied

Reviewed this autonomously generated PR. Verdict: merge with nits.

Verified live:

Pushed

Suite green after fixes. Ready for merge.
Direction
Turn the generation-only server into a closed visual loop by adding Gemini image-to-image editing (reference images) and returning the edited image to the caller as a viewable MCP image instead of just a file path.
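As a rough illustration of what "returning the edited image as a viewable MCP image" means on the wire, the sketch below builds a tool result from plain dicts. The PR itself uses a ToolResult object in server.py; build_tool_result is a hypothetical stand-in, and the field names (type, data, mimeType, content, structuredContent) are assumed from the MCP specification rather than taken from the PR. The metadata keys are the ones the PR lists.

```python
from __future__ import annotations

import base64
from typing import Optional


def build_tool_result(
    image_bytes: bytes,
    *,
    mime_type: str,
    reference_count: int,
    model: str,
    file_path: str,
    text_response: Optional[str] = None,
) -> dict:
    """Package an edited image plus metadata as an MCP-style tool result (sketch)."""
    # Image content block: base64 payload the client can render directly,
    # so the calling model sees the picture and not only a path on disk.
    image_block = {
        "type": "image",
        "data": base64.b64encode(image_bytes).decode("ascii"),
        "mimeType": mime_type,
    }
    # Structured metadata, using the keys named in the PR body.
    metadata = {
        "reference_count": reference_count,
        "model": model,
        "file_path": file_path,
        "text_response": text_response,
    }
    return {"content": [image_block], "structuredContent": metadata}
```

Returning both the image block and the file path lets the caller iterate visually while still having a durable artifact to pass back in as a reference image on the next edit.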
Lens
internal-gaps
Proven
- edit_image_gemini registers as a 4th MCP tool without disturbing the 3 existing generation tools (verified via in-memory FastMCP Client.list_tools()).
- Reference images are sent as inlineData parts, proven against a stubbed Gemini endpoint (httpx.MockTransport), so this is real image-to-image, not text-to-image.
- The result is returned as an ImageContent block (mime image/png) AND structured metadata (reference_count, model, file_path, text_response): the "model can see what it made" closed loop.
- demo_edit.py runs end-to-end with NO network and NO API key and exits 0 (17 checks); pytest tests/ passes (14 tests).
- server.py still imports and all original tools remain.

Sketched
- The generativelanguage.googleapis.com API is not exercised here (no API key); only the request shape and response handling are proven offline.
- /images/edits and a Grok image-edit path were identified (external-frontier lens) but not built; edit_image_gemini is the single-provider PoC.

How to run
Runners-up
- /images/edits image-to-image with masks/multi-image input (external-frontier).
- Passing format/transparent params through to the API (dead-end: accepted but never sent).

🤖 Generated with Claude Code