URL Connector
Automatically ingest and index content from web pages to power your Knowledge Repository.
This connector crawls publicly accessible URLs and extracts both page text and data from images (via OCR), preparing the content for Agentic Retrieval-Augmented Generation (A-RAG).
💡 Core Concepts
To use this connector effectively, understand how it processes web content.
1. What does the URL Connector do?
Unlike the Webpage Search tool (which an agent uses actively during a conversation), the URL Connector is used to pre-load knowledge. It crawls a target URL, scrapes the content, and stores it in your Knowledge Repository. This allows agents to answer questions based on that content later.
⚙️ Configuration Steps
Follow these steps to add a web source to your Knowledge Repository.
Prepare the Source
Identify the URL you wish to ingest. The connector works best with:
- Public documentation sites.
- Company docs, blogs, or news pages.
- Static knowledge bases.
Add Connector
In your Knowledge Repository configuration, select the URL connector type.
- Input: Provide the full URL string (e.g., https://docs.svahnar.com).
Ingestion Process
Once triggered, the system will:
- Crawl: Access the page.
- Extract: Scrape HTML text and perform OCR on <img> tags.
- Index: Chunk the data and store embeddings in the Knowledge Repository.
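The crawl, extract, and chunk stages can be pictured with a minimal standard-library sketch. This is an illustration only, not the connector's actual implementation: the names `fetch`, `extract`, and `chunk`, along with the chunk size and overlap values, are assumptions, and the OCR and embedding steps are left out (the sketch only collects the image URLs that an OCR step would receive).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Crawl step: download one page and decode it to text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class PageExtractor(HTMLParser):
    """Collects visible text and <img> sources from an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_sources = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_sources.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract(html):
    """Extract step: return (visible text, list of image URLs to OCR)."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_sources


def chunk(text, size=200, overlap=20):
    """Index step (first half): split text into overlapping character windows."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
        start += size - overlap
    return chunks
```

In this sketch, `extract(fetch("https://docs.svahnar.com"))` would return the page text plus its image URLs, and `chunk` would split that text into pieces ready for embedding. Character-based windows are used only to keep the example short; a real pipeline may chunk by tokens or by document structure.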
🚑 Troubleshooting
Content not appearing in RAG
- Check whether the URL is behind a login or firewall. The connector requires public access.
- Ensure the website allows crawling. Some sites block bots via robots.txt or Cloudflare challenges.
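One way to check the robots.txt side of this before ingesting is Python's built-in `urllib.robotparser`. The sketch below is a rough pre-flight check under stated assumptions: the helper name `is_allowed` is hypothetical, and the user agent the connector actually sends is not documented here, so the wildcard `*` is used as a stand-in. It cannot detect Cloudflare challenges.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt content permits fetching url.

    robots_txt is the text of the site's robots.txt file.
    user_agent is an assumption: the connector's real agent string is unknown.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

To test a live site, download its `/robots.txt` first and pass the text to `is_allowed` along with the page URL you plan to ingest.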
Poor OCR Results
- OCR accuracy depends on image resolution. Low-quality screenshots may yield gibberish.
- Ensure the images on the target URL are not lazy-loaded in a way that prevents the crawler from seeing them.
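To spot images that may be affected by lazy loading, you can scan the page's raw HTML for common lazy-loading markers. The sketch below is only a heuristic: the `data-src` attribute is a widespread convention rather than a standard, the class and function names are hypothetical, and whether the connector's crawler executes JavaScript or resolves such images is not stated in this document.

```python
from html.parser import HTMLParser


class LazyImageFinder(HTMLParser):
    """Records <img> tags that use common lazy-loading patterns."""

    def __init__(self):
        super().__init__()
        self.lazy_images = []  # list of (marker, image URL) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        if "data-src" in attributes:
            # Script-driven lazy loading: the real URL is not in src.
            self.lazy_images.append(("data-src", attributes["data-src"] or ""))
        elif (attributes.get("loading") or "").lower() == "lazy":
            # Native lazy loading: src is present, loading is deferred.
            self.lazy_images.append(("loading=lazy", attributes.get("src") or ""))


def find_lazy_images(html):
    """Return the lazy-loaded images found in an HTML string."""
    finder = LazyImageFinder()
    finder.feed(html)
    finder.close()
    return finder.lazy_images
```

Images flagged with `data-src` are the more likely problem, because their real URL only reaches `src` after a script runs in a browser.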
Ingestion Failures
- Verify that the URL includes the protocol (it must start with https:// or http://).
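The protocol check can be run locally before submitting a URL. This is a minimal sketch using `urllib.parse` from the standard library; the function name `has_valid_protocol` is hypothetical, and the connector's own validation rules may be stricter.

```python
from urllib.parse import urlparse


def has_valid_protocol(url):
    """Return True if url starts with http:// or https:// and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

For example, `has_valid_protocol("https://docs.svahnar.com")` passes, while `has_valid_protocol("docs.svahnar.com")` fails because the protocol is missing.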
💰 Additional Credit Consumption
Extracted content is split into A4-sized pages (8.5" x 11"). Each page consumes one credit.
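Since each page costs one credit, a rough estimate is the extracted character count divided by the characters that fit on one page, rounded up. The sketch below shows only that arithmetic. How the platform actually paginates content is not documented here, so `chars_per_page` is a hypothetical parameter you would need to calibrate against real usage.

```python
def estimate_credits(char_count, chars_per_page):
    """Estimate credits as the number of pages, rounding partial pages up.

    chars_per_page is an assumed value, not a documented platform constant.
    """
    if char_count < 0 or chars_per_page <= 0:
        raise ValueError("char_count must be >= 0 and chars_per_page > 0")
    # Integer ceiling division: a partially filled page still counts as a page.
    return (char_count + chars_per_page - 1) // chars_per_page
```

With a placeholder value of 3,000 characters per page, 7,500 extracted characters would come to `estimate_credits(7500, 3000)`, or 3 credits.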