
Pages API

A page is a single URL that has been fetched, chunked, and embedded for a bot. Each page belongs to exactly one source, and pages are the unit of inclusion: marking a page as excluded keeps it in the database but drops it from retrieval during chat. These endpoints power the Content tab in the dashboard and are useful for automating content hygiene, for example excluding out-of-date changelog entries from a script.

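Endpoints

| Method | Path | Operation |
|---|---|---|
| GET | /api/bots/{botId}/pages | List pages |
| GET | /api/pages/{pageId} | Get a page |
| PATCH | /api/pages/{pageId} | Update a page |
| POST | /api/pages/{pageId}/recrawl | Recrawl a page |
| POST | /api/pages/bulk | Bulk page action |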

The page object

| Field | Type | Description |
|---|---|---|
| id | string (uuid) | Page id. |
| botId | string (uuid) | Owning bot. |
| sourceId | string (uuid) | Source this page was discovered from. |
| url | string | Original URL as discovered. |
| normalizedUrl | string | Canonicalized URL used for deduplication. |
| title | string \| null | Extracted <title> or <h1>. |
| contentHash | string \| null | Hash of the parsed content, used to skip re-embedding on unchanged recrawls. |
| tokenCount | number | Total tokens across chunks. |
| chunkCount | number | Number of chunks this page was split into. |
| headingBreadcrumb | string \| null | First chunk's heading path, e.g. "Docs > Pricing". |
| status | "ready" \| "excluded" \| "error" | Derived lifecycle state (see below). |
| included | boolean | Whether the page participates in retrieval. |
| httpStatus | number \| null | HTTP status code from the last crawl. |
| error | string \| null | Crawl error message, if any. |
| lastCrawledAt | string \| null | ISO 8601 timestamp of the last successful crawl. |
| createdAt | string | ISO 8601 creation timestamp. |
| updatedAt | string | ISO 8601 last-update timestamp. |

The status field is derived server-side from included and error: excluded when included = false, error when error is non-null, otherwise ready.
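The derivation can be mirrored client-side, for example to sanity-check cached page objects. This is a minimal sketch: derive_status is a hypothetical helper, and it assumes the rules are evaluated in the order listed, which the description does not state explicitly for a page that is both excluded and errored.

```python
from typing import Literal, Optional

Status = Literal["ready", "excluded", "error"]


def derive_status(included: bool, error: Optional[str]) -> Status:
    """Mirror the documented server-side status derivation (hypothetical helper).

    Assumption: rules are checked in the documented order, so an excluded page
    that also carries a crawl error reports "excluded".
    """
    if not included:
        return "excluded"
    if error is not None:
        return "error"
    return "ready"
```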

The Get a page response additionally carries contentMarkdown, an ordered chunks array ({ id, chunkIndex, headingPath, tokenCount, content }), and a headingTree list of all heading breadcrumbs present in the page.
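Because tokenCount is documented as the total across chunks and chunkCount as the number of chunks, a client can cross-check a PageDetail against its chunks array. A sketch, assuming the detail object also carries the page-level chunkCount and tokenCount fields and that chunkIndex starts at 0 (as in the example response); check_chunks is a hypothetical name.

```python
from typing import Any, Dict, List


def check_chunks(page: Dict[str, Any]) -> List[str]:
    """Return human-readable inconsistencies between a PageDetail and its chunks."""
    problems: List[str] = []
    chunks = sorted(page.get("chunks", []), key=lambda c: c["chunkIndex"])

    # Assumption: chunkIndex runs 0, 1, 2, ... with no gaps.
    indexes = [c["chunkIndex"] for c in chunks]
    if indexes != list(range(len(chunks))):
        problems.append(f"non-contiguous chunkIndex values: {indexes}")

    if page.get("chunkCount") is not None and page["chunkCount"] != len(chunks):
        problems.append(f"chunkCount {page['chunkCount']} != {len(chunks)} chunks")

    total = sum(c["tokenCount"] for c in chunks)
    if page.get("tokenCount") is not None and page["tokenCount"] != total:
        problems.append(f"tokenCount {page['tokenCount']} != chunk sum {total}")

    return problems
```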

List pages

GET /api/bots/{botId}/pages

Query parameters

| Name | Type | Default | Description |
|---|---|---|---|
| q | string | (none) | Case-insensitive substring match over url and title. |
| status | "ready" \| "excluded" \| "error" | (none) | Filter by derived status. |
| sourceId | string (uuid) | (none) | Restrict to pages from a single source. |
| section | string | (none) | Filter to pages whose first chunk's heading breadcrumb starts with this prefix. |
| page | number | 1 | Page number. |
| pageSize | number (1–200) | 50 | Rows per page. |

Request

curl "https://api.mimicbot.app/api/bots/$BOT_ID/pages?status=ready&pageSize=25" \
-H "Authorization: Bearer $MIMICBOT_TOKEN"
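The same query can be assembled programmatically. This sketch builds only the URL (no request is sent); list_pages_url is a hypothetical helper, BASE_URL is taken from the curl examples, and the range checks simply mirror the documented limits, so the server remains the authority on validation.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://api.mimicbot.app"  # from the curl examples


def list_pages_url(
    bot_id: str,
    *,
    q: Optional[str] = None,
    status: Optional[str] = None,
    source_id: Optional[str] = None,
    section: Optional[str] = None,
    page: int = 1,
    page_size: int = 50,
) -> str:
    """Build a List pages URL, omitting filters that were not supplied."""
    if status not in (None, "ready", "excluded", "error"):
        raise ValueError(f"unknown status: {status!r}")
    if not 1 <= page_size <= 200:
        raise ValueError("page_size must be between 1 and 200")
    if page < 1:
        raise ValueError("page must be >= 1")  # assumes 1-based paging

    params = {
        "q": q,
        "status": status,
        "sourceId": source_id,
        "section": section,
        "page": page,
        "pageSize": page_size,
    }
    # Only send filters the caller actually set.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/api/bots/{quote(bot_id, safe='')}/pages?{query}"
```

urlencode percent-encodes values, so a section prefix such as "Docs > Pricing" is sent as section=Docs+%3E+Pricing.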

Response 200

{
  "pages": [
    { "id": "...", "url": "https://docs.acme.com/pricing", "status": "ready" }
  ],
  "total": 128,
  "page": 1,
  "pageSize": 25,
  "sections": ["Docs", "Pricing"]
}
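Since the response reports total alongside page and pageSize, a script can walk every result page. A sketch with the HTTP call injected as fetch (a hypothetical callable returning the decoded response body) so the loop can run without network access; it assumes pages are numbered from 1 and that total counts all matching rows. Results can shift if pages are added or excluded between requests.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: fetch(page, page_size) returns the decoded List pages body.
FetchPage = Callable[[int, int], Dict[str, Any]]


def iter_all_pages(fetch: FetchPage, page_size: int = 200) -> Iterator[Dict[str, Any]]:
    """Yield every page object, advancing `page` until `total` rows are seen."""
    page = 1
    seen = 0
    while True:
        body = fetch(page, page_size)
        rows = body["pages"]
        yield from rows
        seen += len(rows)
        # An empty batch also ends the loop, in case total shrinks mid-iteration.
        if not rows or seen >= body["total"]:
            return
        page += 1
```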

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
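Scripts that loop over many pages can hit 429 RATE_LIMITED, and the recrawl and bulk endpoints can also return 503 TEMPORAL_UNAVAILABLE. The Rate limits page may prescribe a specific policy (such as a header to honor); this sketch assumes none and simply retries with exponential backoff and jitter. send_with_backoff and the injected send callable are hypothetical.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical transport: send() performs one request and returns (status, body).
Send = Callable[[], Tuple[int, bytes]]


def send_with_backoff(
    send: Send,
    retry_statuses: Tuple[int, ...] = (429, 503),
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Retry transient statuses; return the last response once attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in retry_statuses or attempt >= max_attempts:
            return status, body
        # Delays of 0.5s, 1s, 2s, ... plus up to 10% jitter to spread retries out.
        delay = base_delay * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay * 0.1))
```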

Get a page

GET /api/pages/{pageId}

Returns the full PageDetail, including contentMarkdown, chunks[], and headingTree[].

Request

curl https://api.mimicbot.app/api/pages/$PAGE_ID \
-H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 200

{
  "page": {
    "id": "...",
    "url": "https://docs.acme.com/pricing",
    "status": "ready",
    "contentMarkdown": "# Pricing\n\n...",
    "chunks": [
      { "id": "...", "chunkIndex": 0, "headingPath": "Pricing", "tokenCount": 412, "content": "..." }
    ],
    "headingTree": ["Pricing", "Pricing > Enterprise"]
  }
}
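The chunks array makes it possible to see which sections of a page dominate its token budget. A sketch that sums tokenCount per heading prefix; tokens_by_heading is hypothetical, it assumes headingPath uses the " > " separator seen in the examples, and a heading whose own text contains ">" would be split incorrectly.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def tokens_by_heading(page: Dict[str, Any], depth: int = 1) -> List[Tuple[str, int]]:
    """Sum chunk tokens per heading prefix of `depth` levels, largest first."""
    totals: Dict[str, int] = defaultdict(int)
    for chunk in page["chunks"]:
        path = chunk.get("headingPath") or "(no heading)"
        parts = [part.strip() for part in path.split(">")]
        totals[" > ".join(parts[:depth])] += chunk["tokenCount"]
    # Sort by token count descending, then by name for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```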

Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.

Update a page

PATCH /api/pages/{pageId}

Sets whether a page participates in retrieval. Send included: false to exclude the page, or included: true to restore it.

Request body

| Field | Type | Required |
|---|---|---|
| included | boolean | yes |

Request

curl -X PATCH https://api.mimicbot.app/api/pages/$PAGE_ID \
-H "Authorization: Bearer $MIMICBOT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "included": false }'

Response 200

{ "page": { "id": "...", "included": false, "status": "excluded" } }
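The same PATCH can be built with Python's standard library. This sketch constructs the request without sending it; build_update_request is a hypothetical helper and BASE_URL comes from the curl examples. Passing the result to urllib.request.urlopen sends it (urlopen raises HTTPError for 4xx/5xx responses).

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.mimicbot.app"  # from the curl examples


def build_update_request(page_id: str, included: bool, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets a page's included flag."""
    if not isinstance(included, bool):
        # The field is a required boolean; reject strings like "false" or 0/1.
        raise TypeError("included must be a bool")
    body = json.dumps({"included": included}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/pages/{quote(page_id, safe='')}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```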

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.

Recrawl a page

POST /api/pages/{pageId}/recrawl

Spawns a single-page reindex workflow that refetches and re-embeds this one URL. Returns the new crawl job id and Temporal workflow id.

Request

curl -X POST https://api.mimicbot.app/api/pages/$PAGE_ID/recrawl \
-H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 202

{ "jobId": "8f3b...", "workflowId": "page-...-reindex-1712998400000" }

Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
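A common reason to recrawl is staleness. This sketch picks candidate page ids from page objects using lastCrawledAt; stale_page_ids is hypothetical, and it assumes the objects carry the included and lastCrawledAt fields from the page object table (the List pages example response is abbreviated) and that timestamps without an offset are UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional


def _parse_iso(ts: str) -> datetime:
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Assumption: a timestamp without an offset is UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def stale_page_ids(
    pages: Iterable[Dict[str, Any]],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return ids of included pages with no successful crawl within max_age."""
    now = now or datetime.now(timezone.utc)
    stale: List[str] = []
    for page in pages:
        if not page.get("included", True):
            continue  # excluded pages are not used in retrieval; skip them
        last = page.get("lastCrawledAt")
        # null means no successful crawl has been recorded yet.
        if last is None or now - _parse_iso(last) > max_age:
            stale.append(page["id"])
    return stale
```

Each returned id can be recrawled individually, or in groups of up to 50 through the bulk endpoint with action "recrawl".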

Bulk page action

POST /api/pages/bulk

Applies an action to up to 50 pages in one call. For include and exclude, the response field updated is the number of pages changed and jobIds is empty. For recrawl, jobIds contains one entry per page, each backed by its own Temporal workflow.

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| pageIds | string[] (1–50 uuids) | yes | The pages to act on. |
| action | "exclude" \| "include" \| "recrawl" | yes | The action to apply. |

Request

curl -X POST https://api.mimicbot.app/api/pages/bulk \
-H "Authorization: Bearer $MIMICBOT_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "pageIds": ["uuid-1","uuid-2"], "action": "exclude" }'

Response 200

{ "updated": 2, "jobIds": [] }

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
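To act on more than 50 pages, for example excluding every changelog entry, a script can split the ids into batches. A sketch with the HTTP call injected as send (a hypothetical callable that POSTs one body to /api/pages/bulk and returns the decoded JSON); bulk_apply and changelog_page_ids are hypothetical, and the "/changelog/" URL marker is an assumed site convention. Batches are separate calls, so a failure partway through leaves earlier batches applied.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence

MAX_BULK = 50  # documented per-call limit on pageIds
ACTIONS = ("exclude", "include", "recrawl")

# Hypothetical transport: send(body) POSTs to /api/pages/bulk, returns decoded JSON.
SendBulk = Callable[[Dict[str, Any]], Dict[str, Any]]


def changelog_page_ids(pages: Iterable[Dict[str, Any]], marker: str = "/changelog/") -> List[str]:
    """Select pages whose URL contains an (assumed) changelog path segment."""
    return [p["id"] for p in pages if marker in p["url"]]


def bulk_apply(page_ids: Sequence[str], action: str, send: SendBulk) -> Dict[str, Any]:
    """Apply one action to any number of pages, at most MAX_BULK ids per request."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    unique = list(dict.fromkeys(page_ids))  # de-duplicate while keeping order
    updated = 0
    job_ids: List[str] = []
    requests = 0
    for start in range(0, len(unique), MAX_BULK):
        result = send({"pageIds": unique[start:start + MAX_BULK], "action": action})
        requests += 1
        updated += result.get("updated", 0)
        job_ids.extend(result.get("jobIds", []))
    return {"updated": updated, "jobIds": job_ids, "requests": requests}
```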