Sources API

A source tells the crawler where to start and which URLs to include or exclude for a bot. Every bot starts with one default website source pointing at the clientUrl it was created with. You can add more sources (for example, a sitemap source pointing at /sitemap.xml, or a single_page source for a specific landing page) and refine the include and exclude patterns as the crawl surfaces junk pages you don't want in retrieval. Creating a source automatically queues an indexing crawl; deleting a source hard-deletes its pages along with it.
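The include and exclude behavior described above can be pictured as a small URL filter. The sketch below is illustrative only: it assumes patterns are shell-style globs matched against the URL path and that an exclude match always wins over an include match, neither of which this page states. The function name should_index is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def should_index(url: str, include_patterns: list[str], exclude_patterns: list[str]) -> bool:
    """Return True if a crawled URL would be kept under the given patterns.

    Assumptions (not confirmed by this page): patterns are shell-style globs
    matched against the URL path, and an exclude match always wins.
    """
    path = urlsplit(url).path or "/"
    if any(fnmatchcase(path, pattern) for pattern in exclude_patterns):
        return False
    # An empty include list means "no restriction"; otherwise one must match.
    if include_patterns:
        return any(fnmatchcase(path, pattern) for pattern in include_patterns)
    return True
```

Under these assumptions, should_index("https://docs.acme.com/changelog/v2", [], ["/changelog/*"]) is False, while the same call for /guides/intro is True.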

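Endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/bots/{botId}/sources | List sources |
| POST | /api/bots/{botId}/sources | Create a source |
| PATCH | /api/sources/{sourceId} | Update a source |
| DELETE | /api/sources/{sourceId} | Delete a source |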

The source object

| Field | Type | Description |
| --- | --- | --- |
| id | string (uuid) | Source id. |
| botId | string (uuid) | Owning bot. |
| type | "website" \| "sitemap" \| "single_page" | How the crawler should interpret rootUrl. |
| rootUrl | string | Starting URL. |
| includePatterns | string[] | URL patterns. If non-empty, only matching pages are indexed. |
| excludePatterns | string[] | URL patterns to skip. |
| maxDepth | number (1–20) | How many link hops from the root the crawler will follow. Default 5. |
| maxPages | number (1–5000) | Hard cap on pages to crawl from this source. Default 500. |
| status | "pending" \| "crawling" \| "ready" \| "error" | Current lifecycle state. |
| lastCrawledAt | string \| null | ISO 8601 of the last successful crawl. |
| createdAt | string | ISO 8601. |
| updatedAt | string | ISO 8601. |
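If you consume source objects from Python, a small typed wrapper can make the fields above easier to work with. This is a minimal sketch, not an official client: the Source class, its snake_case attribute names, and the is_terminal helper are hypothetical, and every field except id is treated as optional because the example responses on this page are abbreviated.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Source:
    """Local mirror of the source object; JSON keys follow the field table."""

    id: str
    bot_id: Optional[str] = None
    source_type: Optional[str] = None  # JSON key: "type"
    root_url: Optional[str] = None
    include_patterns: list[str] = field(default_factory=list)
    exclude_patterns: list[str] = field(default_factory=list)
    max_depth: Optional[int] = None
    max_pages: Optional[int] = None
    status: Optional[str] = None
    last_crawled_at: Optional[str] = None
    created_at: Optional[str] = None
    updated_at: Optional[str] = None

    @classmethod
    def from_json(cls, data: dict[str, Any]) -> "Source":
        """Build a Source from one decoded JSON object."""
        return cls(
            id=data["id"],
            bot_id=data.get("botId"),
            source_type=data.get("type"),
            root_url=data.get("rootUrl"),
            include_patterns=list(data.get("includePatterns", [])),
            exclude_patterns=list(data.get("excludePatterns", [])),
            max_depth=data.get("maxDepth"),
            max_pages=data.get("maxPages"),
            status=data.get("status"),
            last_crawled_at=data.get("lastCrawledAt"),
            created_at=data.get("createdAt"),
            updated_at=data.get("updatedAt"),
        )

    @property
    def is_terminal(self) -> bool:
        """True once the status is "ready" or "error" (no crawl in flight)."""
        return self.status in ("ready", "error")
```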

List sources

GET /api/bots/{botId}/sources

Returns every source attached to a bot in creation order.

Request

curl "https://api.mimicbot.app/api/bots/$BOT_ID/sources" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 200

{ "sources": [ { "id": "...", "type": "website", "rootUrl": "https://docs.acme.com", "status": "ready" } ] }

Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
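The list call can also be made from Python with only the standard library. This is a sketch under the same assumptions as the curl example (base URL and bearer token as shown on this page); the function names build_list_sources_request and list_sources are hypothetical. The request is built separately from sending it so it can be inspected without touching the network.

```python
import json
import urllib.request

API_BASE = "https://api.mimicbot.app"  # base URL taken from the curl example


def build_list_sources_request(bot_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for a bot's sources."""
    return urllib.request.Request(
        f"{API_BASE}/api/bots/{bot_id}/sources",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def list_sources(bot_id: str, token: str) -> list[dict]:
    """Send the request and return the "sources" array from the response.

    urllib raises urllib.error.HTTPError for the 4xx statuses listed above.
    """
    request = build_list_sources_request(bot_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["sources"]
```

A typical call would be list_sources(os.environ["BOT_ID"], os.environ["MIMICBOT_TOKEN"]), mirroring the environment variables used in the curl example.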

Create a source

POST /api/bots/{botId}/sources

Adds a new source and immediately kicks off a crawl scoped to it.

Request body

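| Field | Type | Required | Default |
| --- | --- | --- | --- |
| type | "website" \| "sitemap" \| "single_page" | no | "website" |
| rootUrl | string (URL) | yes | |
| includePatterns | string[] | no | [] |
| excludePatterns | string[] | no | [] |
| maxDepth | number (1–20) | no | 5 |
| maxPages | number (1–5000) | no | 500 |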

Request

curl -X POST "https://api.mimicbot.app/api/bots/$BOT_ID/sources" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "type": "sitemap",
    "rootUrl": "https://docs.acme.com/sitemap.xml",
    "excludePatterns": ["/changelog/*"]
  }'
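The same request can be assembled in Python, with the documented ranges checked client-side before anything is sent. This is a minimal sketch: build_create_body and build_create_request are hypothetical helpers, the http/https scheme check is an assumption about what "string (URL)" accepts, and the server remains the authority on validation.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlsplit

API_BASE = "https://api.mimicbot.app"  # base URL taken from the curl example
SOURCE_TYPES = ("website", "sitemap", "single_page")


def build_create_body(
    root_url: str,
    source_type: str = "website",
    include_patterns: Optional[list[str]] = None,
    exclude_patterns: Optional[list[str]] = None,
    max_depth: int = 5,
    max_pages: int = 500,
) -> dict:
    """Validate inputs against the request-body table and return the JSON body."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"type must be one of {SOURCE_TYPES}")
    parts = urlsplit(root_url)
    # Assumption: only absolute http(s) URLs are accepted as rootUrl.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("rootUrl must be an absolute http(s) URL")
    if not 1 <= max_depth <= 20:
        raise ValueError("maxDepth must be between 1 and 20")
    if not 1 <= max_pages <= 5000:
        raise ValueError("maxPages must be between 1 and 5000")
    return {
        "type": source_type,
        "rootUrl": root_url,
        "includePatterns": list(include_patterns or []),
        "excludePatterns": list(exclude_patterns or []),
        "maxDepth": max_depth,
        "maxPages": max_pages,
    }


def build_create_request(bot_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a source."""
    return urllib.request.Request(
        f"{API_BASE}/api/bots/{bot_id}/sources",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result of build_create_request to urllib.request.urlopen sends it. The sketch sends the documented defaults explicitly, which should be equivalent to omitting them.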

Response 201

{
  "source": {
    "id": "...",
    "botId": "...",
    "type": "sitemap",
    "rootUrl": "https://docs.acme.com/sitemap.xml",
    "includePatterns": [],
    "excludePatterns": ["/changelog/*"],
    "maxDepth": 5,
    "maxPages": 500,
    "status": "pending"
  }
}

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits), 503 TEMPORAL_UNAVAILABLE. See Errors.
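A new source starts in the "pending" state, so a client usually needs to wait until the crawl finishes. The sketch below polls until the status becomes "ready" or "error". It assumes polling the list endpoint is acceptable, since this page documents no single-source GET; wait_for_source, its parameters, and the polling interval are all hypothetical choices. The fetch function is injected so the loop can be exercised without network access.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"ready", "error"}


def wait_for_source(
    fetch_sources: Callable[[], list[dict]],
    source_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the source reaches a terminal status, then return it.

    fetch_sources should return the "sources" array from the list endpoint.
    Raises LookupError if the source disappears and TimeoutError if it is
    still pending or crawling after max_attempts polls.
    """
    for attempt in range(max_attempts):
        for source in fetch_sources():
            if source.get("id") == source_id:
                if source.get("status") in TERMINAL_STATUSES:
                    return source
                break  # found, but still pending or crawling
        else:
            raise LookupError(f"source {source_id} not found")
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"source {source_id} not finished after {max_attempts} polls")
```

Keep the interval generous: every poll counts against the rate limit that produces 429 RATE_LIMITED.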

Update a source

PATCH /api/sources/{sourceId}

Updates a source's crawl scope. Supply at least one of includePatterns, excludePatterns, maxDepth, or maxPages. Changes take effect on the next crawl; to apply them immediately, call POST /api/bots/{botId}/crawl.

Request body

| Field | Type |
| --- | --- |
| includePatterns | string[] |
| excludePatterns | string[] |
| maxDepth | number (1–20) |
| maxPages | number (1–5000) |

Request

curl -X PATCH "https://api.mimicbot.app/api/sources/$SOURCE_ID" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "maxPages": 1000 }'

Response 200

{ "source": { "id": "...", "maxPages": 1000 } }

Errors: 400 VALIDATION_ERROR, 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
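The "at least one field" rule for updates is easy to enforce before sending. The following sketch builds a PATCH body that contains only the fields you pass and rejects an empty update; build_update_body and build_update_request are hypothetical names, and the range checks simply mirror the request-body table.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.mimicbot.app"  # base URL taken from the curl example


def build_update_body(
    include_patterns: Optional[list[str]] = None,
    exclude_patterns: Optional[list[str]] = None,
    max_depth: Optional[int] = None,
    max_pages: Optional[int] = None,
) -> dict:
    """Return a PATCH body holding only the supplied fields.

    None means "leave unchanged"; an empty list is sent as-is, which is
    presumably how a pattern list would be cleared (an assumption).
    """
    body: dict = {}
    if include_patterns is not None:
        body["includePatterns"] = list(include_patterns)
    if exclude_patterns is not None:
        body["excludePatterns"] = list(exclude_patterns)
    if max_depth is not None:
        if not 1 <= max_depth <= 20:
            raise ValueError("maxDepth must be between 1 and 20")
        body["maxDepth"] = max_depth
    if max_pages is not None:
        if not 1 <= max_pages <= 5000:
            raise ValueError("maxPages must be between 1 and 5000")
        body["maxPages"] = max_pages
    if not body:
        raise ValueError(
            "supply at least one of includePatterns, excludePatterns, maxDepth, maxPages"
        )
    return body


def build_update_request(source_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PATCH request for a source."""
    return urllib.request.Request(
        f"{API_BASE}/api/sources/{source_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )
```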

Delete a source

DELETE /api/sources/{sourceId}

Hard-deletes the source. The ON DELETE CASCADE on pages.source_id removes every one of its pages in the same transaction. There is no soft delete and no undo.
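To make the cascade concrete, here is a self-contained illustration using an in-memory SQLite database as a stand-in. Only the pages.source_id column name comes from this page; the rest of the schema, the sample rows, and the function name demo_cascade_delete are invented for the demonstration, and the service's actual database is not specified here.

```python
import sqlite3


def demo_cascade_delete() -> tuple[int, int]:
    """Return (page count before, page count after) deleting one source."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sources (id TEXT PRIMARY KEY);
        CREATE TABLE pages (
            id INTEGER PRIMARY KEY,
            source_id TEXT NOT NULL REFERENCES sources(id) ON DELETE CASCADE,
            url TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO sources (id) VALUES ('src-1')")
    conn.executemany(
        "INSERT INTO pages (source_id, url) VALUES ('src-1', ?)",
        [("/a",), ("/b",)],
    )
    before = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    # Deleting the parent row removes its pages in the same transaction.
    with conn:
        conn.execute("DELETE FROM sources WHERE id = 'src-1'")
    after = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
    conn.close()
    return before, after
```

In this stand-in, deleting the single source row leaves the pages table empty without any explicit DELETE against pages.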

Request

curl -X DELETE "https://api.mimicbot.app/api/sources/$SOURCE_ID" \
  -H "Authorization: Bearer $MIMICBOT_TOKEN"

Response 200

{ "ok": true }

Errors: 401 UNAUTHENTICATED, 403 NO_AGENCY, 404 NOT_FOUND, 429 RATE_LIMITED (see Rate limits). See Errors.
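Of the statuses listed for these endpoints, 429 RATE_LIMITED and 503 TEMPORAL_UNAVAILABLE are the ones a client might reasonably retry. The sketch below wraps any call in exponential backoff. Treating those two statuses as retryable, the delay schedule, and the name call_with_retry are assumptions; check Rate limits and Errors for the authoritative guidance.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUSES = {429, 503}  # assumption: transient, safe to retry


def call_with_retry(
    send: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(), retrying on 429/503 with exponential backoff.

    Any other HTTPError, or a retryable one on the final attempt, is
    re-raised to the caller unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return send()
        except urllib.error.HTTPError as err:
            is_last = attempt == max_attempts - 1
            if err.code not in RETRYABLE_STATUSES or is_last:
                raise
            sleep(base_delay_s * 2 ** attempt)  # 1s, 2s, 4s, ...
            attempt += 1
```

Here send is any zero-argument callable that performs one request, for example a lambda around urllib.request.urlopen. Be cautious about retrying POST: if a create request succeeded but the response was lost, a retry could add a duplicate source.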