Merged
41 changes: 41 additions & 0 deletions .github/styles/config/vocabularies/VGI/accept.txt
@@ -0,0 +1,41 @@
VGI
vgi
[Vv]gi-python
[Vv]gi-rpc
DuckDB
Arrow
[Aa]pache Arrow
Haybarn
pyarrow
PyArrow
RecordBatch
RecordBatches
mkdocstrings
MkDocs
uv
uvx
subprocess
scalar
[Aa]ggregations?
[Dd]eserialize
[Ss]erializable
Diátaxis
ATTACH
classmethod
dataclass
runnable
bool
config
[Mm]etadata
namespace
namespaces
struct
async
stdin
stdout
optimizer
[Pp]ushdown
JWT
HTTP
OAuth
TTL
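
The entries above mix plain words with bracketed patterns such as `[Aa]ggregations?`. A minimal sketch of how such entries behave, assuming each line is applied as a case-sensitive regular expression against a whole word (Python's `re` stands in for Vale's own matcher here, and `is_accepted` is a hypothetical helper, not part of Vale):

```python
import re

# A handful of entries copied from accept.txt above; each is treated as a regex.
ACCEPT_PATTERNS = [
    "VGI",
    "[Vv]gi-python",
    "[Aa]ggregations?",
    "[Pp]ushdown",
    "RecordBatch",
]


def is_accepted(word: str, patterns=ACCEPT_PATTERNS) -> bool:
    """Return True if any pattern matches the whole word (case-sensitive)."""
    return any(re.fullmatch(pattern, word) for pattern in patterns)


if __name__ == "__main__":
    for word in ["aggregation", "Aggregations", "AGGREGATION", "recordbatch"]:
        print(f"{word}: {is_accepted(word)}")
```

Under that assumption, plain entries such as `RecordBatch` accept only the exact casing, which is why the list carries both `pyarrow` and `PyArrow`.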
44 changes: 44 additions & 0 deletions .github/workflows/ci.yml
@@ -65,6 +65,50 @@ jobs:
        run: uv run ty check vgi/
        continue-on-error: true

  docs:
    name: Docs (build + examples + prose)
    runs-on: ubuntu-latest
    env:
      DISABLE_MKDOCS_2_WARNING: "true"
    steps:
      - uses: actions/checkout@v5

      - name: Install uv
        uses: astral-sh/setup-uv@v8.2.0

      - name: Set up Python 3.13
        run: uv python install 3.13

      - name: Install dependencies
        run: uv sync --all-extras --group docs

      - name: Install d2
        run: curl -fsSL https://d2lang.com/install.sh | sh -s --

      - name: Build docs (strict)
        run: uv run mkdocs build --strict

      - name: Test documentation examples
        run: uv run pytest tests/test_documentation_examples.py tests/test_examples_workers.py -q

      # Prose lint. Runs on every PR as a signal. Kept non-blocking for now:
      # the Google package + spelling check needs a vocabulary-tuning pass
      # against a real run before it can gate without false positives.
      # fail_on_error is already true, so dropping continue-on-error is all it
      # takes to make this step gate once the vocab in
      # .github/styles/config/vocabularies/VGI/accept.txt is settled.
      - name: Prose lint (Vale)
        uses: errata-ai/vale-action@v2
        continue-on-error: true
        with:
          files: docs
          fail_on_error: true

      - name: Link check (lychee)
        uses: lycheeverse/lychee-action@v2
        with:
          args: "--no-progress --offline docs/**/*.md *.md"
          fail: true

  s3-offload-localstack:
    name: S3 Offload Tests (LocalStack)
    runs-on: ubuntu-latest
46 changes: 46 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,46 @@
name: Docs

on:
  push:
    branches: [main]
    paths:
      - "docs/**"
      - "mkdocs.yml"
      - "vgi/**"
      - "pyproject.toml"
  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: pages
  cancel-in-progress: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5

      - name: Install uv
        uses: astral-sh/setup-uv@v8.1.0

      - name: Set up Python
        run: uv python install 3.13

      - name: Install dependencies
        run: uv sync --group docs

      - name: Install d2
        run: curl -fsSL https://d2lang.com/install.sh | sh -s --

      - name: Build docs
        run: uv run mkdocs build --strict

      - name: Deploy to Cloudflare Pages
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }}
          accountId: ${{ secrets.CLOUDFLARE_ACCOUNT_ID }}
          command: pages deploy site --project-name=vgi-python-docs --commit-dirty=true
5 changes: 5 additions & 0 deletions .gitignore
@@ -6,6 +6,11 @@ dist/
wheels/
*.egg-info

# MkDocs build output
/site/
# MkDocs plugin caches (d2 diagram render cache)
/.cache/

# Virtual environments
.venv

19 changes: 19 additions & 0 deletions .vale.ini
@@ -0,0 +1,19 @@
# Vale prose-lint config for vgi-python docs.
# Run locally with: vale docs/ (after `vale sync` to fetch the Google package)
StylesPath = .github/styles

# Only fail on errors. The Google package emits many style suggestions/warnings;
# gating on those would be noisy, so the CI gate enforces error-level issues
# (Vale core checks + spelling against the VGI vocabulary) only.
MinAlertLevel = error

Packages = Google

Vocab = VGI

[*.md]
BasedOnStyles = Vale, Google

# Snippet directives and our auto-generated API pages aren't prose to lint.
[docs/api/*]
BasedOnStyles =
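
The effect of `MinAlertLevel = error` can be pictured as a threshold over Vale's three alert levels. The sketch below is only an illustration under that reading: the alert dictionaries are made-up sample data shaped loosely like Vale's JSON output, and `alerts_at_or_above` is a hypothetical helper rather than anything Vale ships.

```python
# Severity order used for thresholding; the names mirror Vale's alert levels.
LEVELS = {"suggestion": 0, "warning": 1, "error": 2}

# Hypothetical alerts, invented for illustration only.
SAMPLE_ALERTS = [
    {"Check": "Google.Passive", "Severity": "suggestion", "Line": 3},
    {"Check": "Google.WordList", "Severity": "warning", "Line": 8},
    {"Check": "Vale.Spelling", "Severity": "error", "Line": 12},
]


def alerts_at_or_above(alerts, min_level="error"):
    """Keep only alerts whose severity meets the minimum level."""
    threshold = LEVELS[min_level]
    return [alert for alert in alerts if LEVELS[alert["Severity"]] >= threshold]


if __name__ == "__main__":
    for alert in alerts_at_or_above(SAMPLE_ALERTS):
        print(alert["Line"], alert["Check"])
```

With the threshold at `error`, only the spelling alert in this sample would surface. Lowering it to `warning` would also surface the style alert, which is the noise the config comment is trying to avoid.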
122 changes: 122 additions & 0 deletions DOCS_ACCEPTANCE_CRITERIA.md
@@ -0,0 +1,122 @@
# VGI-Python Documentation — Acceptance Criteria (review-ready v1)

> Status: DRAFT for senior DX-engineer review. Derived from a requirements interview.
> This document defines what "done and good" means for the documentation rework
> before the site goes live at `vgi-python.query.farm`.

## North Star

**A developer who has never used VGI can build and run a real worker fast.**
Everything on the site is optimized around that job-to-be-done; depth is available
but never blocks the fast path.

## Target audience (mixed — serve all via progressive disclosure)

The reader could be a Python developer new to DuckDB/Arrow, a DuckDB/SQL user newer
to Python, or someone fluent in both. Therefore:

- The happy path is skimmable by experts (dense, copy-paste-ready).
- Newcomers are served by **progressive disclosure**: inline "New to Arrow? →" /
"New to DuckDB extensions? →" callouts and links, not walls of prerequisite text.
- We never assume knowledge silently; we either explain briefly or link out.

## Information architecture — Diátaxis

Top-level navigation is reorganized into the four Diátaxis modes (this directly
addresses the current "hard to orient" problem):

1. **Tutorial** — one guided, end-to-end "build your first worker" path.
2. **How-to guides** — task-oriented recipes ("Add a table function", "Run over
HTTP with auth", "Persist aggregate state").
3. **Concepts** — explanations: worker lifecycle (bind/init/process/finalize),
transports, the Arrow data model, catalogs & ATTACH, parallel workers.
4. **API Reference** — the existing auto-generated mkdocstrings pages.

The current 11 hand-written guides are **re-homed** into How-to vs Concepts (not
left in a flat "Guides" bucket).

## Scope

### In scope for v1 (must be fully documented: tutorial coverage + how-to + runnable example)

- **All four function patterns**: scalar, table, table-in-out, aggregate.
- **Catalogs / ATTACH model** — how functions are surfaced to DuckDB.
- **State storage** — `FunctionStorage` backends for stateful/aggregate functions.
- **Auth + HTTP transport** — running a worker over HTTP with bearer/JWT auth.
- **Filter pushdown & column statistics** — optimizer integration for table functions.

### Out of scope for v1 (reference-only / deferred — must NOT block launch)

- Transactor (transactional DB access)
- External storage / large-payload offload (S3/GCS)
- Observability (OpenTelemetry / Sentry)
- Sharding / meta-worker, cross-language client codegen, standalone secret service

These remain available in the auto-generated API reference but get no tutorial/how-to
investment in v1.

## Headline acceptance test (Time-To-First-Success)

> **An unfamiliar developer, working unaided from the docs, has both a custom
> scalar function AND a custom table function callable from DuckDB within
> 20 minutes.**

- "Callable from DuckDB" = `SELECT my_cat.my_scalar(col) FROM t` returns rows, and
`SELECT * FROM my_cat.my_table(args)` returns rows.
- Engine for the timed path: **Haybarn** (`uvx haybarn-cli`) as the primary happy
path; a stock-DuckDB variant (`INSTALL vgi FROM community; LOAD vgi;`) shown in a
callout/tab for portability.
- Every place a test participant gets stuck is logged and fixed before sign-off.

## Per-page orientation standard (applies to every tutorial / how-to / concept page)

Each page must contain:

1. **Lead "what + who" line** — one sentence at the top: what this page is and who
it's for (reader self-orients in <10 s).
2. **Prerequisites stated** — explicit assumed knowledge, prior steps, and required
extras (`vgi-python[http]`, etc.), with links.
3. **At least one complete, runnable example** — no elisions; covered by the CI
example tests (see Quality Gates).
4. **"Next steps" links** — a closing section pointing to the logical next page(s);
no dead ends.
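
The four-point standard above could be checked mechanically, at least roughly. The following is a minimal sketch, not an existing tool in this repository: `check_page` and its heuristics (an H1 followed by a prose line, a "Prerequisites" heading or bold label, a fenced Python block, a "Next steps" heading) are assumptions about how pages might be laid out.

```python
import re

FENCE = "`" * 3  # built this way so the sketch can sit inside a fenced block


def check_page(markdown: str) -> list[str]:
    """Return the orientation-standard items a page appears to be missing."""
    missing = []
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]

    # 1. Lead line: the first non-blank line after the H1 should be prose.
    if len(lines) < 2 or not lines[0].startswith("# ") or lines[1].startswith("#"):
        missing.append("lead line")

    # 2. Prerequisites: a heading or bold label that starts with the word.
    if not re.search(r"(?im)^(#+\s*|\*\*)prerequisites", markdown):
        missing.append("prerequisites")

    # 3. Runnable example: at least one fenced Python block.
    if FENCE + "python" not in markdown:
        missing.append("runnable example")

    # 4. Next steps: a closing heading so the page is not a dead end.
    if not re.search(r"(?im)^#+\s*next steps", markdown):
        missing.append("next steps")

    return missing
```

A check like this only confirms that the sections exist. Whether the lead line actually orients a reader in under 10 seconds still needs the human review described under Quality gates.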

## Example correctness bar

- **100% of Python code blocks are copy-paste runnable and CI-tested** (e.g. via
`pytest-examples`, already a dev dependency).
- The tutorial worker is **built and queried end-to-end in an automated test**.
- A broken example fails the build.
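
To make "a broken example fails the build" concrete, here is a standard-library sketch of the idea: pull every fenced Python block out of a page and execute it, letting any exception propagate. It is a stand-in for illustration only. The criteria name `pytest-examples` as the likely tool, and `python_blocks` and `run_blocks` are hypothetical names.

```python
import re

FENCE = "`" * 3  # built this way so the sketch can sit inside a fenced block
BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def python_blocks(markdown: str) -> list[str]:
    """Return the source of every fenced Python block in the page."""
    return BLOCK_RE.findall(markdown)


def run_blocks(markdown: str) -> int:
    """Execute each block in a fresh namespace and return how many ran.

    Any exception raised by a block propagates, which is what would turn
    a broken example into a failing test.
    """
    count = 0
    for source in python_blocks(markdown):
        code = compile(source, "<doc-example>", "exec")
        exec(code, {"__name__": "__doc_example__"})
        count += 1
    return count
```

Running each block in its own namespace keeps examples from silently depending on one another, which matches the "copy-paste runnable" bar: a block that only works after an earlier block has run would fail here.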

## Quality gates (all three required to sign off v1)

1. **Fresh-dev usability test** — ≥1 developer unfamiliar with VGI completes the
headline acceptance test (scalar + table from DuckDB, ≤20 min, unaided). All
stumbling points resolved.
2. **Senior DX reviewer rubric** — named senior DX engineer(s) score the site
against a written checklist: orientation, scannability, completeness vs the
in-scope list, correctness, navigation, and the per-page standard above. All
must-fix items resolved before merge.
3. **Automated quality gates in CI**:
   - `mkdocs build --strict` passes with zero warnings (no broken links / refs).
   - All documentation examples execute successfully.
   - Link check + prose/style lint pass.
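
For running the command-line gates locally before pushing, something like the sketch below may help. The two commands are the ones used in the CI workflow in this change; `run_gates` is a hypothetical helper, and the link check and prose lint are left out because CI runs them as GitHub Actions rather than plain commands.

```python
import subprocess

# Commands taken from the docs job in .github/workflows/ci.yml.
GATES = [
    ["uv", "run", "mkdocs", "build", "--strict"],
    [
        "uv", "run", "pytest",
        "tests/test_documentation_examples.py",
        "tests/test_examples_workers.py",
        "-q",
    ],
]


def run_gates(gates=GATES, runner=subprocess.run):
    """Run each gate in order and return the command lines that failed."""
    failed = []
    for cmd in gates:
        result = runner(cmd)
        if result.returncode != 0:
            failed.append(" ".join(cmd))
    return failed


if __name__ == "__main__":
    failures = run_gates()
    for line in failures:
        print("FAILED:", line)
    raise SystemExit(1 if failures else 0)
```

The `runner` parameter exists only so the helper can be exercised without invoking `uv`; by default it calls `subprocess.run`.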

## Definition of Done (v1)

- [ ] Diátaxis nav live (Tutorial / How-to / Concepts / API Reference); existing
guides re-homed.
- [ ] Guided tutorial takes a reader from zero → scalar + table function queried
from Haybarn, with the stock-DuckDB variant noted.
- [ ] How-to + runnable example exists for each in-scope topic (4 patterns +
catalogs + state storage + auth/HTTP + pushdown/stats).
- [ ] Concept pages cover lifecycle, transports, Arrow model, catalogs, parallelism.
- [ ] Every page meets the 4-point orientation standard.
- [ ] All examples runnable and CI-tested; tutorial validated end-to-end in CI.
- [ ] Out-of-scope topics confined to reference; not advertised as v1 guides.
- [ ] All three quality gates passed and signed off.

## Open items to confirm with reviewers

- Named senior DX reviewer(s) and the recruited fresh-dev test participant.
- Final wording/threshold of the prose-style lint (e.g. Vale ruleset), if adopted.
67 changes: 67 additions & 0 deletions DOCS_REVIEW_RUBRIC.md
@@ -0,0 +1,67 @@
# vgi-python docs — senior DX review rubric

> Status: review checklist for the review-ready v1 of the documentation
> (see `DOCS_ACCEPTANCE_CRITERIA.md`). A reviewer scores each item Pass / Fix /
> N/A. Every **Fix** must be resolved (or explicitly waived) before the site
> goes live at `vgi-python.query.farm`.

Reviewer: ________________ Date: ________________ Commit: ________________

## 1. Orientation (the problem this rework targets)

- [ ] The home page makes it obvious in <10 s what vgi-python is and where to start.
- [ ] Top-level nav clearly separates **Tutorial / How-to / Concepts / API Reference**
(Diátaxis); a newcomer can tell which to open for their need.
- [ ] Every tutorial/how-to/concept page opens with a **"what + who"** line.
- [ ] No page is a dead end — each ends with **"Next steps"** links.

## 2. The fast path (job-to-be-done: ship a worker fast)

- [ ] The tutorial gets a reader from zero → a **scalar + table** function callable from
DuckDB, and is realistically completable in **≤20 minutes**.
- [ ] The first step yields a working query quickly (scalar before table).
- [ ] Haybarn is the primary path; the stock-DuckDB variant is present and correct.
- [ ] Copy-paste works: the worker shown is complete and runnable as-is.

## 3. Completeness vs. the in-scope list

- [ ] All four function patterns are documented with a runnable example: scalar, table,
table-in-out, aggregate.
- [ ] Catalogs / ATTACH, state storage, auth + HTTP, and filter pushdown & stats each have a
how-to.
- [ ] Out-of-scope topics (transactor, external storage, observability, sharding/codegen/secret
service) are reference-only and not advertised as v1 guides.

## 4. Correctness

- [ ] Every code example is accurate and runs (CI: `test_documentation_examples.py` +
`test_examples_workers.py` green).
- [ ] SQL snippets use correct catalog/function names and match the worker shown.
- [ ] Conceptual claims (lifecycle phases, transports, Arrow semantics) are accurate.
- [ ] API reference renders for every in-scope module (CI: `mkdocs build --strict` green).

## 5. Scannability & progressive disclosure

- [ ] Pages use headings, tables, and short paragraphs; an expert can skim.
- [ ] Newcomer background is in collapsible callouts, not blocking the main flow.
- [ ] Prerequisites and required extras (`[http]`, `[oauth]`, …) are stated where needed.

## 6. Navigation & polish

- [ ] No broken links (CI: lychee + strict build green).
- [ ] Search returns sensible results for common terms (worker, scalar, aggregate, ATTACH).
- [ ] Light/dark themes, logo, and code-copy all work.

## Automated gates (must be green at review time)

- [ ] `mkdocs build --strict` — zero warnings
- [ ] `pytest tests/test_documentation_examples.py tests/test_examples_workers.py`
- [ ] lychee link-check
- [ ] Vale prose lint (advisory until vocab is tuned; note residual warnings)

## Sign-off

- [ ] All **Fix** items resolved or waived (waivers noted below).
- [ ] Fresh-dev usability test passed (see `DOCS_USABILITY_TEST.md`).

Waivers / notes: