Pages API

A page is a single URL that has been fetched, chunked, and embedded for a bot. Each page belongs to exactly one source and is the unit of inclusion: marking a page as excluded keeps it in the database but drops it from retrieval during chat. These endpoints power the Content tab in the dashboard and are useful for automating content hygiene (for example, a script that excludes out-of-date changelog entries).

The page object

Field	Type	Description
`id`	`string` (uuid)	Page id.
`botId`	`string` (uuid)	Owning bot.
`sourceId`	`string` (uuid)	Source this page was discovered from.
`url`	`string`	Original URL as discovered.
`normalizedUrl`	`string`	Canonicalized URL used for dedup.
`title`	`string \| null`	Extracted `<title>` or `<h1>`.
`contentHash`	`string \| null`	Hash of the parsed content — used to skip re-embedding on unchanged recrawls.
`tokenCount`	`number`	Total tokens across chunks.
`chunkCount`	`number`	Number of chunks this page was split into.
`headingBreadcrumb`	`string \| null`	First chunk's heading path, e.g. `"Docs > Pricing"`.
`status`	`"ready" \| "excluded" \| "error"`	Derived lifecycle — see below.
`included`	`boolean`	Whether the page participates in retrieval.
`httpStatus`	`number \| null`	HTTP status code from the last crawl.
`error`	`string \| null`	Crawl error message if any.
`lastCrawledAt`	`string \| null`	ISO 8601 timestamp of the last successful crawl.
`createdAt`	`string`	ISO 8601 creation timestamp.
`updatedAt`	`string`	ISO 8601 timestamp of the last update.

The `status` field is derived server-side from `included` and `error`: `excluded` when `included` is `false`, `error` when `error` is non-null, otherwise `ready`.
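
The derivation rule can be mirrored client-side, for example to predict the status a page will report after an update. This is a minimal sketch, not the server's implementation: `derive_status` is a hypothetical helper, and it assumes the exclusion check takes precedence when a page is both excluded and in error, which the rule's ordering suggests but does not state.

```python
from typing import Optional


def derive_status(included: bool, error: Optional[str]) -> str:
    """Client-side mirror of the documented rule for the derived status field."""
    # Assumption: exclusion is checked first, following the order in which the
    # rules are listed. The documentation does not state the precedence.
    if not included:
        return "excluded"
    if error is not None:
        return "error"
    return "ready"
```

Under that assumption, `derive_status(False, "timeout")` yields `"excluded"` rather than `"error"`.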

The Get a page response additionally carries `contentMarkdown`, an ordered `chunks` array (`{ id, chunkIndex, headingPath, tokenCount, content }`), and `headingTree`, a list of all heading breadcrumbs present in the page.

List pages

GET /api/bots/{botId}/pages

Query parameters

Name	Type	Default	Description
`q`	`string`	—	Case-insensitive substring match over `url` and `title`.
`status`	`"ready" \| "excluded" \| "error"`	—	Filter by derived status.
`sourceId`	`string` (uuid)	—	Restrict to pages from a single source.
`section`	`string`	—	Filter to pages whose first chunk's heading breadcrumb starts with this prefix.
`page`	`number`	`1`	Page number.
`pageSize`	`number` (1–200)	`50`	Rows per page.

Request

curl "https://api.mimicbot.app/api/bots/$BOT_ID/pages?status=ready&pageSize=25" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 200

{
  "pages": [ { "id": "...", "url": "https://docs.acme.com/pricing", "status": "ready" } ],
  "total": 128,
  "page": 1,
  "pageSize": 25,
  "sections": ["Docs", "Pricing"]
}

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
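
Walking every result means following `page` until `total` rows have been seen. The sketch below shows one way to do that. `list_pages_url` and `iter_pages` are hypothetical helpers, `fetch_json` is a caller-supplied function that performs the authenticated GET and returns the parsed JSON body, and the loop assumes `total` counts all rows matching the filters across pages.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://api.mimicbot.app"


def list_pages_url(bot_id: str, page: int = 1, page_size: int = 50, **filters: str) -> str:
    """Build the List pages URL from the documented query parameters."""
    if not 1 <= page_size <= 200:
        raise ValueError("pageSize must be between 1 and 200")
    params = {**filters, "page": page, "pageSize": page_size}
    return f"{BASE_URL}/api/bots/{bot_id}/pages?{urlencode(params)}"


def iter_pages(
    bot_id: str,
    fetch_json: Callable[[str], dict],
    page_size: int = 50,
    **filters: str,
) -> Iterator[dict]:
    """Yield page objects across all result pages.

    fetch_json is supplied by the caller (hypothetical): it sends the GET with
    the Authorization header and returns the decoded response body.
    """
    page, seen = 1, 0
    while True:
        body = fetch_json(list_pages_url(bot_id, page, page_size, **filters))
        rows = body["pages"]
        yield from rows
        seen += len(rows)
        # Assumption: "total" is the number of matching rows across all pages.
        if not rows or seen >= body["total"]:
            return
        page += 1
```

Passing the transport in as `fetch_json` keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.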

Get a page

GET /api/pages/{pageId}

Returns the full `PageDetail`, including `contentMarkdown`, `chunks[]`, and `headingTree[]`.

Request

curl https://api.mimicbot.app/api/pages/$PAGE_ID \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 200

{
  "page": {
    "id": "...",
    "url": "https://docs.acme.com/pricing",
    "status": "ready",
    "contentMarkdown": "# Pricing\n\n...",
    "chunks": [
      { "id": "...", "chunkIndex": 0, "headingPath": "Pricing", "tokenCount": 412, "content": "..." }
    ],
    "headingTree": ["Pricing", "Pricing > Enterprise"]
  }
}

Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
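
One use of the detail response is to see where a page's tokens are concentrated. The following sketch totals `tokenCount` per top-level heading. `tokens_by_top_heading` is a hypothetical helper, and it assumes `headingPath` uses the `" > "` separator seen in the examples and may be null or empty for chunks with no heading.

```python
from collections import defaultdict
from typing import Dict


def tokens_by_top_heading(page: dict) -> Dict[str, int]:
    """Sum chunk token counts under each top-level heading of a page."""
    totals: Dict[str, int] = defaultdict(int)
    for chunk in sorted(page["chunks"], key=lambda c: c["chunkIndex"]):
        path = chunk.get("headingPath") or ""
        # Assumption: breadcrumbs are joined with " > ", as in "Pricing > Enterprise".
        top = path.split(" > ")[0] if path else "(no heading)"
        totals[top] += chunk["tokenCount"]
    return dict(totals)
```

The input is the `page` object from the response body, not the outer envelope.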

Update a page

PATCH /api/pages/{pageId}

Sets whether a page participates in retrieval.

Request body

Field	Type	Required
`included`	`boolean`	yes

Request

curl -X PATCH https://api.mimicbot.app/api/pages/$PAGE_ID \
  -H "Authorization: Bearer $MIMICBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "included": false }'

Response 200

{ "page": { "id": "...", "included": false, "status": "excluded" } }

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.

Recrawl a page

POST /api/pages/{pageId}/recrawl

Spawns a single-page reindex workflow that refetches and re-embeds this one URL. Returns the new crawl job id and Temporal workflow id.

Request

curl -X POST https://api.mimicbot.app/api/pages/$PAGE_ID/recrawl \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 202

{ "jobId": "8f3b...", "workflowId": "page-...-reindex-1712998400000" }

Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
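
A script that triggers recrawls may want to retry when the call is rate limited or the workflow backend is unavailable. The sketch below treats 429 and 503 as transient, which is an assumption rather than documented behavior, and backs off exponentially between attempts. `recrawl_with_retry` and the `send` callable (which performs the authenticated request and returns the status code and parsed body) are hypothetical.

```python
import time
from typing import Callable, Tuple

# Assumption: these two documented error statuses are safe to retry.
RETRYABLE = {429, 503}


def recrawl_with_retry(
    page_id: str,
    send: Callable[[str, str], Tuple[int, dict]],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """POST the recrawl request, retrying transient failures with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    path = f"/api/pages/{page_id}/recrawl"
    for attempt in range(attempts):
        status, body = send("POST", path)
        if status == 202:
            return body  # contains jobId and workflowId
        if status not in RETRYABLE or attempt == attempts - 1:
            raise RuntimeError(f"recrawl failed with HTTP {status}")
        # Wait base_delay, then 2x, 4x, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

Non-retryable statuses such as 401 or 404 raise immediately, since repeating the same request would not change the outcome.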

Bulk page action

POST /api/pages/bulk

Apply an action to up to 50 pages in one call. For `include` and `exclude`, the `updated` field in the response counts the number of rows changed. For `recrawl`, `jobIds` contains one entry per page, each backed by its own Temporal workflow.

Request body

Field	Type	Required	Description
`pageIds`	`string[]` (1–50 uuids)	yes	The pages to act on.
`action`	`"exclude" \| "include" \| "recrawl"`	yes	The action to apply.

Request

curl -X POST https://api.mimicbot.app/api/pages/bulk \
  -H "Authorization: Bearer $MIMICBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "pageIds": ["uuid-1","uuid-2"], "action": "exclude" }'

Response 200

{ "updated": 2, "jobIds": [] }

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
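
Because each call accepts at most 50 ids, acting on a larger set means splitting it into batches. The sketch below builds one request body per batch. `bulk_payloads` is a hypothetical helper, and sending each body to `POST /api/pages/bulk` is left to the caller.

```python
from typing import Iterator, List

MAX_BULK = 50  # documented per-call limit on pageIds
ACTIONS = {"exclude", "include", "recrawl"}


def bulk_payloads(page_ids: List[str], action: str) -> Iterator[dict]:
    """Yield request bodies for POST /api/pages/bulk, at most 50 ids each."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    for start in range(0, len(page_ids), MAX_BULK):
        yield {"pageIds": page_ids[start:start + MAX_BULK], "action": action}
```

An empty id list yields no payloads, which avoids sending a request that would fail the documented 1 to 50 validation.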
