Ingestion API

Ingest content directly into your search index without crawling. Ideal for CMS integrations, dynamic content, or bulk imports.

Ingest Pages

POST /organisations/{orgId}/ai-search/sites/{siteId}/pages

{
  "pages": [
    {
      "url": "https://example.com/docs/getting-started",
      "title": "Getting Started Guide",
      "content": "Welcome to our platform! This guide will help you get up and running...",
      "summary": "A beginner's guide to setting up and using the platform.",
      "tags": ["getting-started", "tutorial", "beginner"]
    }
  ]
}
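
A minimal sketch of building this request in Python, using only the standard library. The base URL, the bearer-token header, and the org/site identifiers are placeholders assumed for illustration; substitute the values for your account. The request is built but not sent.

```python
import json
import urllib.request

# Assumed placeholder: the real API host is not given in this document.
BASE_URL = "https://api.example.com"


def build_ingest_request(org_id, site_id, pages, token):
    """Build (but do not send) a POST request for the pages endpoint."""
    url = f"{BASE_URL}/organisations/{org_id}/ai-search/sites/{site_id}/pages"
    body = json.dumps({"pages": pages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check how your account authenticates.
            "Authorization": f"Bearer {token}",
        },
    )


request = build_ingest_request(
    "my-org",
    "my-site",
    [{
        "url": "https://example.com/docs/getting-started",
        "title": "Getting Started Guide",
        "content": "Welcome to our platform! ...",
    }],
    token="YOUR_API_TOKEN",
)
# To send it: urllib.request.urlopen(request)
```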

Page Fields

Field    Type      Required  Description
url      string    Yes       Unique URL identifier for the page
content  string    Yes       Full page content (plain text or HTML)
title    string    No        Page title (auto-extracted if HTML)
summary  string    No        Brief description for search results
tags     string[]  No        Keywords for search relevance

Response

{
  "processed": 1,
  "skipped": 0,
  "errors": []
}
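
One way to check this response in client code is sketched below. The shape of individual entries in "errors" is not documented here, so the sketch only counts them and treats them as opaque; the function name is illustrative.

```python
import json


def summarize_ingest_response(raw_body):
    """Parse the response JSON and report whether any errors were returned."""
    result = json.loads(raw_body)
    errors = result.get("errors", [])
    return {
        "processed": result.get("processed", 0),
        "skipped": result.get("skipped", 0),
        "error_count": len(errors),
        "ok": len(errors) == 0,
    }


summary = summarize_ingest_response('{"processed": 1, "skipped": 0, "errors": []}')
```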

Content Processing

When you ingest content:

  1. HTML is cleaned - Navigation, headers, and footers are removed
  2. Metadata is extracted - Title, summary, tags (if not provided)
  3. Content is chunked - Split into searchable segments
  4. Embeddings are generated - For semantic search

Pre-processed Content

If your content is already clean plain text, you can skip AI processing by providing all three metadata fields (title, summary, tags). How a page is handled depends on its content type and metadata:

  • HTML content → Always processed (to remove boilerplate)
  • Plain text + all metadata → Indexed directly (faster, cheaper)
  • Plain text, missing metadata → AI extracts missing fields
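
The three rules above can be expressed as a small client-side helper for predicting how a page will be handled before you send it. This is a sketch of the documented rules only; how the service itself detects HTML is not described here, so the caller passes that in as a flag.

```python
METADATA_FIELDS = ("title", "summary", "tags")


def expected_processing(page, content_is_html):
    """Return the processing path the rules above imply for one page."""
    if content_is_html:
        return "processed to remove boilerplate"
    missing = [field for field in METADATA_FIELDS if not page.get(field)]
    if missing:
        return "AI extracts: " + ", ".join(missing)
    return "indexed directly"


complete = {
    "url": "https://example.com/a",
    "content": "Plain text body",
    "title": "A",
    "summary": "About A",
    "tags": ["a"],
}
partial = {"url": "https://example.com/b", "content": "Plain text body", "title": "B"}

direct_path = expected_processing(complete, content_is_html=False)
partial_path = expected_processing(partial, content_is_html=False)
html_path = expected_processing(complete, content_is_html=True)
```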

Batch Ingestion

Ingest up to 100 pages per request:

{
  "pages": [
    { "url": "https://example.com/page1", "title": "Page 1", "content": "..." },
    { "url": "https://example.com/page2", "title": "Page 2", "content": "..." },
    { "url": "https://example.com/page3", "title": "Page 3", "content": "..." }
  ]
}

For large imports, split the pages into batches of at most 100, stay within the limit of 60 requests per minute, and retry any request that fails.
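
A minimal sketch of splitting a large page list into request bodies of at most 100 pages each. The function name is illustrative.

```python
def batch_pages(pages, batch_size=100):
    """Yield request bodies containing at most batch_size pages each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(pages), batch_size):
        yield {"pages": pages[start:start + batch_size]}


all_pages = [
    {"url": f"https://example.com/page{i}", "title": f"Page {i}", "content": "..."}
    for i in range(250)
]
batches = list(batch_pages(all_pages))
batch_sizes = [len(batch["pages"]) for batch in batches]
```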

Updating Content

To update a page, simply ingest it again with the same URL. The new content replaces the old.

Delete Content

DELETE /organisations/{orgId}/ai-search/sites/{siteId}/pages

{
  "url": "https://example.com/docs/getting-started"
}

Removes the page and all its chunks from the index. The URL must exactly match the one used when the page was ingested.
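
A sketch of building the delete request with the Python standard library. As with ingestion, the base URL and bearer-token header are assumed placeholders, and the request is built without being sent.

```python
import json
import urllib.request

# Assumed placeholder: the real API host is not given in this document.
BASE_URL = "https://api.example.com"


def build_delete_request(org_id, site_id, page_url, token):
    """Build (but do not send) a DELETE request carrying the page URL as JSON."""
    url = f"{BASE_URL}/organisations/{org_id}/ai-search/sites/{site_id}/pages"
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="DELETE",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


delete_request = build_delete_request(
    "my-org",
    "my-site",
    "https://example.com/docs/getting-started",
    token="YOUR_API_TOKEN",
)
# To send it: urllib.request.urlopen(delete_request)
```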

Use Cases

CMS Integration

Trigger ingestion when content is published:

  1. Listen for publish/update webhooks from your CMS
  2. Fetch the page content
  3. POST to the ingestion API
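
The steps above might look like the following in a webhook handler. The webhook payload shape (the "event", "entry", "permalink", and "body" keys) is hypothetical; every CMS uses its own format. The sketch also assumes the webhook already carries the page content, so no separate fetch is shown.

```python
def webhook_to_ingest_body(event):
    """Turn a (hypothetical) CMS webhook payload into an ingestion request body.

    Returns None for events that should not trigger ingestion.
    """
    if event.get("event") not in ("publish", "update"):
        return None
    entry = event["entry"]
    page = {"url": entry["permalink"], "content": entry["body"]}
    if entry.get("title"):
        page["title"] = entry["title"]
    return {"pages": [page]}


body = webhook_to_ingest_body({
    "event": "publish",
    "entry": {
        "permalink": "https://example.com/blog/launch",
        "title": "Launch",
        "body": "We shipped.",
    },
})
ignored = webhook_to_ingest_body({"event": "draft_saved", "entry": {}})
```

The returned body can then be sent with a POST to the pages endpoint.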

Bulk Import

Import existing content:

  1. Export content from your source system
  2. Transform into the required format
  3. Batch ingest in chunks of 100
  4. Monitor for errors and retry failures
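
Steps 3 and 4 can be sketched as a retry loop with exponential backoff. The `send` callable is supplied by the caller and is assumed to perform the POST, return the parsed response, and raise `OSError` (which includes `urllib.error.URLError`) on transport failure. The retry count and delays are illustrative defaults, not documented values.

```python
import time


def ingest_with_retry(batches, send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Send each batch, retrying failures; return (pages processed, failed batches)."""
    processed = 0
    failed = []
    for batch in batches:
        for attempt in range(1, max_attempts + 1):
            try:
                result = send(batch)
            except OSError:
                if attempt == max_attempts:
                    failed.append(batch)  # give up on this batch
                else:
                    sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
            else:
                processed += result.get("processed", 0)
                break
    return processed, failed


# Demo with a fake sender that fails once, then succeeds.
calls = {"count": 0}


def flaky_send(batch):
    calls["count"] += 1
    if calls["count"] == 1:
        raise OSError("temporary network failure")
    return {"processed": len(batch["pages"]), "skipped": 0, "errors": []}


delays = []
processed, failed = ingest_with_retry(
    [{"pages": [{"url": "a"}, {"url": "b"}]}, {"pages": [{"url": "c"}]}],
    flaky_send,
    sleep=delays.append,  # record delays instead of actually sleeping
)
```

Batches left in `failed` can be logged or queued for a later run.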

Dynamic Content

For content that changes frequently:

  • Ingest on a schedule (e.g., hourly)
  • Or trigger on content changes
  • Consider using tags to track content freshness
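
One possible way to track freshness with tags is to stamp each page with the date it was last ingested. The "updated:YYYY-MM-DD" tag format below is a hypothetical convention chosen for illustration, not something the API defines.

```python
from datetime import datetime, timezone


def with_freshness_tag(page, now=None, prefix="updated:"):
    """Return a copy of the page whose tags carry a single, current date tag."""
    now = now or datetime.now(timezone.utc)
    # Drop any stale date tag before adding the new one.
    tags = [tag for tag in page.get("tags", []) if not tag.startswith(prefix)]
    tags.append(prefix + now.strftime("%Y-%m-%d"))
    return {**page, "tags": tags}


page = {
    "url": "https://example.com/pricing",
    "content": "Current pricing...",
    "tags": ["pricing", "updated:2024-01-01"],
}
tagged = with_freshness_tag(page, now=datetime(2024, 5, 1, tzinfo=timezone.utc))
```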

Limits

Limit                  Value
Pages per request      100
Content size per page  1 MB
Requests per minute    60
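
A client-side pre-flight check against these limits can catch oversized requests before they are sent. The sketch assumes that 1 MB means 1,000,000 bytes and that size is measured on the UTF-8 encoded content; neither detail is specified above.

```python
MAX_PAGES_PER_REQUEST = 100
MAX_CONTENT_BYTES = 1_000_000  # assumption: 1 MB = 10**6 bytes (could be 1024 * 1024)
REQUESTS_PER_MINUTE = 60

# Spacing requests this far apart keeps a single client within the rate limit.
MIN_SECONDS_BETWEEN_REQUESTS = 60 / REQUESTS_PER_MINUTE


def check_limits(pages):
    """Return a list of human-readable problems; empty if the batch looks fine."""
    problems = []
    if len(pages) > MAX_PAGES_PER_REQUEST:
        problems.append(f"{len(pages)} pages exceeds {MAX_PAGES_PER_REQUEST} per request")
    for page in pages:
        size = len(page.get("content", "").encode("utf-8"))
        if size > MAX_CONTENT_BYTES:
            problems.append(f"{page.get('url')}: content is {size} bytes")
    return problems


small = [{"url": "https://example.com/a", "content": "hello"}]
oversized = [{"url": "https://example.com/big", "content": "x" * 1_000_001}]

small_problems = check_limits(small)
oversized_problems = check_limits(oversized)
too_many_problems = check_limits(small * 101)
```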