Ingestion API
Ingest content directly into your search index without crawling. Ideal for CMS integrations, dynamic content, or bulk imports.
Ingest Pages
POST /organisations/{orgId}/ai-search/sites/{siteId}/pages
{
  "pages": [
    {
      "url": "https://example.com/docs/getting-started",
      "title": "Getting Started Guide",
      "content": "Welcome to our platform! This guide will help you get up and running...",
      "summary": "A beginner's guide to setting up and using the platform.",
      "tags": ["getting-started", "tutorial", "beginner"]
    }
  ]
}
Page Fields
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Unique URL identifier for the page |
| content | string | Yes | Full page content (plain text or HTML) |
| title | string | No | Page title (auto-extracted if HTML) |
| summary | string | No | Brief description for search results |
| tags | string[] | No | Keywords for search relevance |
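
The request above can be assembled in a few lines of Python. This is a minimal sketch, not an official client: the host in `BASE_URL` and the bearer-token header are placeholders, because this page does not state the API host or the authentication scheme. Only the path and the body shape come from the documentation.

```python
import json
import urllib.request

# Assumption: placeholder host; the real API host is not given on this page.
BASE_URL = "https://api.example.com"


def build_ingest_request(org_id, site_id, pages, token):
    """Validate pages and return a ready-to-send POST request."""
    for page in pages:
        # url and content are the only required fields.
        for field in ("url", "content"):
            if not page.get(field):
                raise ValueError(f"page is missing required field: {field}")
    endpoint = f"{BASE_URL}/organisations/{org_id}/ai-search/sites/{site_id}/pages"
    body = json.dumps({"pages": pages}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth is hypothetical.
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Checking required fields locally avoids spending a request on a page the API would reject.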
Response
{
  "processed": 1,
  "skipped": 0,
  "errors": []
}
Content Processing
When you ingest content:
- HTML is cleaned: navigation, headers, and footers are removed
- Metadata is extracted: title, summary, and tags (if not provided)
- Content is chunked: split into searchable segments
- Embeddings are generated: used for semantic search
Pre-processed Content
If your content is already clean plain text, you can skip AI processing by providing
all metadata fields (title, summary, tags). The path taken depends on the content type and the fields supplied:
- HTML content → Always processed (to remove boilerplate)
- Plain text + all metadata → Indexed directly (faster, cheaper)
- Plain text, missing metadata → AI extracts missing fields
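The three rules above can be checked before sending, for example to estimate how many pages will go through AI processing. The sketch below is an illustration only: the regular expression is a crude stand-in for whatever HTML detection the service performs, which this page does not describe, and the return labels are made up for the example.

```python
import re

METADATA_FIELDS = ("title", "summary", "tags")

# Assumption: a simple tag pattern approximates the service's HTML detection.
_HTML_TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def processing_path(page):
    """Return which of the three documented paths a page would likely take."""
    if _HTML_TAG.search(page.get("content", "")):
        return "processed"  # HTML is always cleaned
    missing = [f for f in METADATA_FIELDS if not page.get(f)]
    if missing:
        # Plain text with gaps: the missing fields are extracted by AI.
        return "ai-extracts:" + ",".join(missing)
    return "indexed-directly"  # plain text with all metadata
```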
Batch Ingestion
Ingest up to 100 pages per request:
{
  "pages": [
    { "url": "https://example.com/page1", "title": "Page 1", "content": "..." },
    { "url": "https://example.com/page2", "title": "Page 2", "content": "..." },
    { "url": "https://example.com/page3", "title": "Page 3", "content": "..." }
  ]
}
For large imports, split the pages into batches of at most 100 and retry any batch that fails or reports errors.
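One way to batch and retry is sketched below. `send` is a hypothetical caller-supplied function that POSTs one batch and returns the parsed response; the retry count and backoff delays are arbitrary choices, not values from this API. Because the format of entries in `errors` is not documented here, the sketch retries the whole batch rather than individual pages, which is safe since re-ingesting a URL replaces its content.

```python
import time


def chunk_pages(pages, size=100):
    """Split pages into batches no larger than the per-request limit."""
    return [pages[i:i + size] for i in range(0, len(pages), size)]


def ingest_all(pages, send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Send every batch, retrying a batch while it fails or reports errors.

    Returns the list of batches that still failed after max_attempts.
    """
    failed = []
    for batch in chunk_pages(pages):
        for attempt in range(1, max_attempts + 1):
            try:
                response = send(batch)
                if not response.get("errors"):
                    break  # batch fully accepted
            except OSError:
                pass  # network failure (urllib's URLError is an OSError)
            if attempt < max_attempts:
                sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
        else:
            failed.append(batch)  # no attempt succeeded
    return failed
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.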
Updating Content
To update a page, simply ingest it again with the same URL. The new content replaces the old.
Delete Content
DELETE /organisations/{orgId}/ai-search/sites/{siteId}/pages
{
  "url": "https://example.com/docs/getting-started"
}
Removes the page and all its chunks from the index. The URL must match exactly.
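A delete call might be built as follows. As with any client code for this API, the host and the bearer-token header are placeholders that this page does not specify. The sketch deliberately leaves the page URL untouched, since the match must be exact.

```python
import json
import urllib.request

# Assumption: placeholder host; the real API host is not given on this page.
BASE_URL = "https://api.example.com"


def build_delete_request(org_id, site_id, page_url, token):
    """Return a DELETE request whose JSON body names the page to remove."""
    endpoint = f"{BASE_URL}/organisations/{org_id}/ai-search/sites/{site_id}/pages"
    # No trailing-slash or case normalisation: the URL must match the
    # ingested one exactly.
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="DELETE",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
    )
```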
Use Cases
CMS Integration
Trigger ingestion when content is published:
- Listen for publish/update webhooks from your CMS
- Fetch the page content
- POST to the ingestion API
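The three steps above could look like this in a webhook handler. The event shape (`type`, `url`, `title`) is hypothetical, since every CMS defines its own webhook payload, and `fetch_content` is a caller-supplied function that returns the page body for a URL.

```python
def page_from_webhook(event, fetch_content):
    """Map a CMS publish/update event to a page for the ingestion API.

    Returns None for events that should not trigger ingestion.
    """
    # Assumption: the CMS labels events with a "type" field.
    if event.get("type") not in ("publish", "update"):
        return None
    page = {"url": event["url"], "content": fetch_content(event["url"])}
    if event.get("title"):
        page["title"] = event["title"]  # optional field, sent only if known
    return page
```

The returned page goes into the `pages` array of a POST to the ingestion endpoint.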
Bulk Import
Import existing content:
- Export content from your source system
- Transform into the required format
- Batch ingest in chunks of 100
- Monitor for errors and retry failures
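The transform step depends entirely on the source system. As one illustration, the sketch below converts a CSV export into pages, assuming hypothetical columns named `url`, `title`, `body`, and `tags` (semicolon-separated). Rows without a URL or body are set aside rather than sent, because both fields are required.

```python
import csv
import io


def pages_from_csv(csv_text):
    """Convert exported rows into pages; return (pages, rejected_rows)."""
    pages, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row.get("url") or not row.get("body"):
            rejected.append(row)  # missing a required field
            continue
        page = {"url": row["url"], "content": row["body"]}
        if row.get("title"):
            page["title"] = row["title"]
        if row.get("tags"):
            # Assumption: tags are semicolon-separated in the export.
            page["tags"] = [t.strip() for t in row["tags"].split(";") if t.strip()]
        pages.append(page)
    return pages, rejected
```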
Dynamic Content
For content that changes frequently:
- Ingest on a schedule (e.g., hourly)
- Or trigger on content changes
- Consider using tags to track content freshness
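For scheduled runs, one option is to re-ingest only pages whose content has changed since the previous run. The sketch below does this with a local content hash. This is a client-side technique, not a feature of the API, and the `seen_hashes` dictionary is a hypothetical store that you would persist between runs.

```python
import hashlib


def changed_pages(pages, seen_hashes):
    """Return pages whose content differs from the last recorded hash.

    Updates seen_hashes in place with the new hashes.
    """
    changed = []
    for page in pages:
        digest = hashlib.sha256(page["content"].encode("utf-8")).hexdigest()
        if seen_hashes.get(page["url"]) != digest:
            seen_hashes[page["url"]] = digest
            changed.append(page)
    return changed
```

Note that the hash is recorded before ingestion happens. If a request later fails, remove that page's entry so the next run picks it up again.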
Limits
| Limit | Value |
|---|---|
| Pages per request | 100 |
| Content size per page | 1 MB |
| Requests per minute | 60 |
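
These limits can be checked on the client before a request is sent. The sketch below makes two assumptions that this page does not confirm: that 1 MB means 1,000,000 bytes, and that size is measured on the UTF-8 encoded content. Spacing requests one second apart is one simple way to stay within 60 requests per minute.

```python
MAX_PAGES_PER_REQUEST = 100
MAX_CONTENT_BYTES = 1_000_000  # assumption: 1 MB = 10^6 bytes of UTF-8
MIN_SECONDS_BETWEEN_REQUESTS = 60 / 60  # 60 requests per minute


def check_batch(pages):
    """Return a list of limit violations for one batch (empty if none)."""
    problems = []
    if len(pages) > MAX_PAGES_PER_REQUEST:
        problems.append(f"{len(pages)} pages exceeds {MAX_PAGES_PER_REQUEST}")
    for page in pages:
        size = len(page["content"].encode("utf-8"))
        if size > MAX_CONTENT_BYTES:
            problems.append(f"{page['url']}: {size} bytes exceeds {MAX_CONTENT_BYTES}")
    return problems


def seconds_to_wait(last_sent, now):
    """Return how long to pause so requests stay evenly spaced."""
    return max(0.0, MIN_SECONDS_BETWEEN_REQUESTS - (now - last_sent))
```

`last_sent` and `now` are timestamps in seconds, for example from `time.monotonic()`.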