Pages API
A page is a single URL that has been fetched, chunked, and embedded for a bot. Each page belongs to exactly one source and is the unit of inclusion: marking a page as excluded keeps it in the database but drops it from retrieval during chat. These endpoints power the Content tab in the dashboard and are useful for automating content hygiene (for example, a script that excludes out-of-date changelog entries).
Endpoints
The page object
| Field | Type | Description |
|---|---|---|
| id | string (uuid) | Page id. |
| botId | string (uuid) | Owning bot. |
| sourceId | string (uuid) | Source this page was discovered from. |
| url | string | Original URL as discovered. |
| normalizedUrl | string | Canonicalized URL used for dedup. |
| title | string \| null | Extracted `<title>` or `<h1>`. |
| contentHash | string \| null | Hash of the parsed content, used to skip re-embedding on unchanged recrawls. |
| tokenCount | number | Total tokens across chunks. |
| chunkCount | number | Number of chunks this page was split into. |
| headingBreadcrumb | string \| null | First chunk's heading path, e.g. "Docs > Pricing". |
| status | "ready" \| "excluded" \| "error" | Derived lifecycle; see below. |
| included | boolean | Whether the page participates in retrieval. |
| httpStatus | number \| null | HTTP status code from the last crawl. |
| error | string \| null | Crawl error message if any. |
| lastCrawledAt | string \| null | ISO 8601 of the last successful crawl. |
| createdAt | string | ISO 8601. |
| updatedAt | string | ISO 8601. |
The status field is derived server-side from included and error: excluded when included = false, error when error is non-null, otherwise ready.
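The rule above can be mirrored client-side, for example to predict what a page's status will be after toggling included. The sketch below assumes the conditions are checked in the order the text lists them, so an excluded page that also has a crawl error reports excluded. That precedence is an assumption, and `derive_status` is a hypothetical helper, not part of the API.

```python
from typing import Optional


def derive_status(included: bool, error: Optional[str]) -> str:
    """Mirror the documented status rule on the client.

    Assumption: conditions are evaluated in the order the text lists them,
    so included = false takes precedence over a non-null error.
    """
    if not included:
        return "excluded"
    if error is not None:
        return "error"
    return "ready"
```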
The Get a page response additionally carries contentMarkdown, an ordered chunks array ({ id, chunkIndex, headingPath, tokenCount, content }), and a headingTree list of all heading breadcrumbs present in the page.
List pages
GET /api/bots/{botId}/pages
Query parameters
| Name | Type | Default | Description |
|---|---|---|---|
| q | string | (none) | Case-insensitive substring match over url and title. |
| status | "ready" \| "excluded" \| "error" | (none) | Filter by derived status. |
| sourceId | string (uuid) | (none) | Restrict to pages from a single source. |
| section | string | (none) | Filter to pages whose first chunk's heading breadcrumb starts with this prefix. |
| page | number | 1 | Page number. |
| pageSize | number (1–200) | 50 | Rows per page. |
Request
```bash
curl "https://api.mimicbot.app/api/bots/$BOT_ID/pages?status=ready&pageSize=25" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"
```
Response 200
```json
{
  "pages": [ { "id": "...", "url": "https://docs.acme.com/pricing", "status": "ready" } ],
  "total": 128,
  "page": 1,
  "pageSize": 25,
  "sections": ["Docs", "Pricing"]
}
```
Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
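Because results are paginated, a script usually needs to walk every page of the listing. The following is a minimal sketch, assuming `total` is the number of rows matching the filters across all pages (the sample response suggests this but does not state it). `list_pages_url`, `iter_pages`, and the caller-supplied `fetch_json` are hypothetical names, not part of the API.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://api.mimicbot.app"


def list_pages_url(bot_id: str, page: int = 1, page_size: int = 50,
                   status: Optional[str] = None, q: Optional[str] = None) -> str:
    """Build the List pages URL, omitting filters that are not set."""
    params = {"page": page, "pageSize": page_size}
    if status is not None:
        params["status"] = status
    if q is not None:
        params["q"] = q
    return f"{BASE_URL}/api/bots/{bot_id}/pages?{urlencode(params)}"


def iter_pages(bot_id: str, fetch_json: Callable[[str], Dict],
               page_size: int = 50, **filters) -> Iterator[Dict]:
    """Yield every page object, requesting successive pages until total is reached.

    fetch_json is a caller-supplied function (hypothetical) that performs an
    authenticated GET and returns the decoded JSON body.
    """
    page, seen = 1, 0
    while True:
        body = fetch_json(list_pages_url(bot_id, page, page_size, **filters))
        rows = body["pages"]
        yield from rows
        seen += len(rows)
        # Also stop on an empty page, so a shrinking result set cannot loop forever.
        if not rows or seen >= body["total"]:
            return
        page += 1
```

Injecting `fetch_json` keeps the paging logic independent of any particular HTTP client and lets it run against canned responses.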
Get a page
GET /api/pages/{pageId}
Returns the full PageDetail, including contentMarkdown, chunks[], and headingTree[].
Request
```bash
curl "https://api.mimicbot.app/api/pages/$PAGE_ID" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"
```
Response 200
```json
{
  "page": {
    "id": "...",
    "url": "https://docs.acme.com/pricing",
    "status": "ready",
    "contentMarkdown": "# Pricing\n\n...",
    "chunks": [
      { "id": "...", "chunkIndex": 0, "headingPath": "Pricing", "tokenCount": 412, "content": "..." }
    ],
    "headingTree": ["Pricing", "Pricing > Enterprise"]
  }
}
```
Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
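One use of the chunks array is to see how a page's tokens are distributed across its headings. The sketch below is illustrative only: `tokens_by_heading` is a hypothetical helper that takes the `page` object from the response above, and it assumes every chunk carries a string `headingPath`.

```python
from typing import Dict


def tokens_by_heading(page: Dict) -> Dict[str, int]:
    """Sum chunk token counts per headingPath, visiting chunks in chunkIndex order."""
    totals: Dict[str, int] = {}
    for chunk in sorted(page["chunks"], key=lambda c: c["chunkIndex"]):
        path = chunk["headingPath"]
        totals[path] = totals.get(path, 0) + chunk["tokenCount"]
    return totals
```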
Update a page
PATCH /api/pages/{pageId}
Toggles whether a page participates in retrieval.
Request body
| Field | Type | Required |
|---|---|---|
| included | boolean | yes |
Request
```bash
curl -X PATCH "https://api.mimicbot.app/api/pages/$PAGE_ID" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "included": false }'
```
Response 200
```json
{ "page": { "id": "...", "included": false, "status": "excluded" } }
```
Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
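The same request can be issued from Python's standard library. This is a sketch rather than an official client: `build_set_included` is a hypothetical helper that only constructs the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request

BASE_URL = "https://api.mimicbot.app"


def build_set_included(page_id: str, included: bool, token: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that toggles included."""
    body = json.dumps({"included": included}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/pages/{page_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# To send it:
#   with urllib.request.urlopen(build_set_included(page_id, False, token)) as resp:
#       page = json.load(resp)["page"]
```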
Recrawl a page
POST /api/pages/{pageId}/recrawl
Spawns a single-page reindex workflow that refetches and re-embeds this one URL. Returns the new crawl job id and Temporal workflow id.
Request
```bash
curl -X POST "https://api.mimicbot.app/api/pages/$PAGE_ID/recrawl" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"
```
Response 202
```json
{ "jobId": "8f3b...", "workflowId": "page-...-reindex-1712998400000" }
```
Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
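Since this endpoint can answer 429 or 503, a caller may want to retry with backoff. The sketch below assumes both codes are transient and that repeating the POST is acceptable; the reference does not say whether a retried request can start a duplicate workflow, so treat that as an open question. `recrawl_with_retry` and the caller-supplied `send` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple

RETRYABLE = {429, 503}  # RATE_LIMITED and TEMPORAL_UNAVAILABLE (assumed transient)


def recrawl_with_retry(send: Callable[[], Tuple[int, Dict]],
                       attempts: int = 4, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call send until it returns 202, backing off exponentially on 429/503.

    send is a caller-supplied function (hypothetical) that issues
    POST /api/pages/{pageId}/recrawl and returns (status_code, json_body).
    """
    for attempt in range(attempts):
        status, body = send()
        if status == 202:
            return body  # contains jobId and workflowId
        if status not in RETRYABLE or attempt == attempts - 1:
            raise RuntimeError(f"recrawl failed with HTTP {status}: {body}")
        sleep(base_delay * 2 ** attempt)  # doubles each time: 1s, 2s, 4s, ...
    raise RuntimeError("attempts must be at least 1")
```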
Bulk page action
POST /api/pages/bulk
Applies an action to up to 50 pages in one call. For include and exclude, the updated field in the response is the number of rows changed. For recrawl, jobIds contains one job id per page, each backed by its own Temporal workflow.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| pageIds | string[] (1–50 uuids) | yes | The pages to act on. |
| action | "exclude" \| "include" \| "recrawl" | yes | The action to apply. |
Request
```bash
curl -X POST https://api.mimicbot.app/api/pages/bulk \
  -H "Authorization: Bearer $MIMICBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "pageIds": ["uuid-1","uuid-2"], "action": "exclude" }'
```
Response 200
```json
{ "updated": 2, "jobIds": [] }
```
Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
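A script that acts on more than 50 pages has to split its ids across several calls. The following sketch shows one way to produce the request bodies; `bulk_payloads` is a hypothetical helper, and sending each body to POST /api/pages/bulk is left to the caller.

```python
from typing import Dict, Iterator, List

MAX_BULK_IDS = 50  # documented upper bound for pageIds


def bulk_payloads(page_ids: List[str], action: str) -> Iterator[Dict]:
    """Yield one request body per batch of at most 50 page ids."""
    if action not in ("exclude", "include", "recrawl"):
        raise ValueError(f"unsupported action: {action!r}")
    for start in range(0, len(page_ids), MAX_BULK_IDS):
        yield {"pageIds": page_ids[start:start + MAX_BULK_IDS], "action": action}
```

Summing `updated` across the responses gives the total number of rows changed for include and exclude.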