Crawler Settings
The QuantSearch crawler discovers and indexes your website content. Configure it to match your site structure and performance requirements.
Basic Settings
Max Pages
The maximum number of pages to crawl per job. This is limited by your plan:
| Plan | Max Pages per Site |
|---|---|
| Free | 50 |
| Pro | 2,000 |
| Enterprise | 10,000 |
Max Depth
How many links deep to follow from your start URL (a minimal sketch of how depth is counted follows the list):
- 0 - Only crawl the start URL(s)
- 1 - Start URL + pages directly linked from it
- 3 - Recommended for most documentation sites
- 5+ - For deeply nested content
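If it helps to see how depth is counted, here is a minimal, hypothetical sketch of a depth-limited breadth-first crawl. It is not QuantSearch's actual implementation; the `fetch_links` helper is an assumed stand-in for fetching a page and extracting its links.

```python
# Illustrative sketch only: how a max-depth limit bounds link-following.
from collections import deque

def crawl(start_url, max_depth, fetch_links):
    """fetch_links(url) -> list of absolute URLs on that page (assumed helper)."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # depth 0 is the start URL itself
    while queue:
        url, depth = queue.popleft()
        print(f"crawling {url} at depth {depth}")
        if depth == max_depth:
            continue  # limit reached: index this page, follow no further links
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
```

With `max_depth=0` only the start URL is visited; with `max_depth=1` the crawler also visits pages directly linked from it, matching the values above.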
Concurrency
Number of parallel workers (1-20). Higher values crawl faster but put more load on your server. Default is 5.
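Conceptually, the concurrency setting caps how many requests are in flight at once. A rough sketch of that behaviour using asyncio (the `fetch` coroutine is a hypothetical stand-in for a real HTTP request):

```python
# Illustrative sketch: a semaphore caps parallel work at `concurrency`.
import asyncio

async def fetch(url):
    await asyncio.sleep(0.1)  # stand-in for a real HTTP request
    return url

async def crawl_all(urls, concurrency=5):
    sem = asyncio.Semaphore(concurrency)  # at most `concurrency` requests in flight

    async def worker(url):
        async with sem:
            return await fetch(url)

    return await asyncio.gather(*(worker(u) for u in urls))

# asyncio.run(crawl_all([f"https://example.com/page/{i}" for i in range(20)]))
```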
Content Filtering
Include Patterns
Only crawl URLs matching these regex patterns. One pattern per line.
```
# Only crawl /docs/ section
^/docs/

# Only crawl English pages
^/en/
```

Exclude Patterns
Skip URLs matching these patterns:
```
# Skip preview/draft pages
\?.*preview=true

# Skip private areas
^/admin/
^/internal/

# Skip generated files
\.pdf$
\.zip$
```
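If you want to sanity-check patterns before running a crawl, you can test them locally. A minimal sketch, assuming each pattern is applied as an unanchored regex search against the URL path (an assumption on our part; confirm against your actual crawl results):

```python
# Hypothetical local test of include/exclude patterns; the matching
# semantics are assumed, not confirmed against QuantSearch itself.
import re

include = [r"^/docs/", r"^/en/"]
exclude = [r"\?.*preview=true", r"^/admin/", r"^/internal/", r"\.pdf$", r"\.zip$"]

def should_crawl(path):
    if include and not any(re.search(p, path) for p in include):
        return False  # include patterns act as an allowlist when present
    return not any(re.search(p, path) for p in exclude)

print(should_crawl("/docs/setup"))      # True
print(should_crawl("/docs/guide.pdf"))  # False: matches \.pdf$
print(should_crawl("/blog/post"))       # False: matches no include pattern
```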
JavaScript Rendering
Enable JavaScript rendering for Single Page Applications (SPAs) or sites with dynamically loaded content. This uses a headless browser to render pages.
Performance Note
JavaScript rendering is slower and more resource-intensive. Only enable it if your content requires JavaScript to display.
Custom Headers
Send custom HTTP headers with each request. Useful for:
- Basic authentication (staging sites)
- API keys
- Custom user agents
```json
{
  "Authorization": "Basic YWRtaW46cGFzc3dvcmQ=",
  "X-Custom-Header": "value"
}
```
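The Basic auth value is the base64 encoding of `user:password`. One way to generate it (the `admin:password` pair below is the placeholder that encodes to the example value above):

```python
# Encode credentials for a Basic auth header.
import base64

credentials = "admin:password"  # placeholder; substitute your staging credentials
token = base64.b64encode(credentials.encode()).decode()
print(f"Basic {token}")  # -> Basic YWRtaW46cGFzc3dvcmQ=
```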
Single URL Crawls
You can also crawl specific URLs without following links. This is useful for:
- Refreshing specific pages
- Adding new content immediately
- Testing changes
Enter up to 50 URLs (one per line) in the "Crawl URLs" modal.
Robots.txt
By default, the crawler respects your robots.txt file. It identifies as:
```
User-agent: QuantBot/1.0 (+https://quantcdn.io/bot)
```

To allow QuantSearch while blocking other bots:
```
# robots.txt
User-agent: *
Disallow: /

User-agent: QuantBot
Allow: /
```
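You can verify rules like these locally with Python's standard library. The `QuantBot` token follows the crawler's stated identity; `example.com` is a placeholder for your domain:

```python
# Check what your robots.txt permits for a given user agent.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")  # placeholder domain
rp.read()  # fetch and parse the live robots.txt

print(rp.can_fetch("QuantBot", "https://example.com/docs/"))  # True with the rules above
print(rp.can_fetch("OtherBot", "https://example.com/"))       # False: blocked by the wildcard rule
```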
Crawl Frequency
Currently, crawls are triggered manually from the dashboard. Scheduled crawls are coming soon.
Recommended frequency:
- Documentation - After each release
- Blog - Weekly
- Marketing site - When content changes
Troubleshooting
Pages not being indexed
Check the following (a quick status-code check is sketched after the list):
- URL matches your include patterns (if any)
- URL doesn't match exclude patterns
- Max depth allows reaching the page
- Page returns 200 status code
- robots.txt allows crawling
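For the status-code check, a minimal sketch using only the standard library (the URL is a placeholder; the User-Agent mirrors the crawler's stated identity):

```python
# Quick check of the HTTP status a page returns to a plain GET.
import urllib.error
import urllib.request

url = "https://example.com/docs/setup"  # placeholder: the page that isn't indexed
req = urllib.request.Request(
    url, headers={"User-Agent": "QuantBot/1.0 (+https://quantcdn.io/bot)"}
)
try:
    with urllib.request.urlopen(req) as resp:
        print(url, "->", resp.status)  # expect 200
except urllib.error.HTTPError as e:
    print(url, "->", e.code)  # e.g. 404 or 403 explains a missing page
```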
Content not extracted correctly
If content appears wrong in search results:
- Enable JavaScript rendering for SPAs
- Check if content is behind login/authentication
- Ensure content is in standard HTML (not canvas/images)
Crawl too slow
- Increase concurrency (but keep an eye on your server load)
- Disable JavaScript rendering if not needed
- Use include patterns to focus on important sections