URL Connector

Automatically ingest and index content from web pages to power your Knowledge Repository.

This connector crawls publicly accessible URLs and extracts both the page text and any text found in images (via OCR), preparing that content for Agentic Retrieval-Augmented Generation (A-RAG).

💡 Core Concepts

To use this connector effectively, understand how it processes web content.

1. What does the URL Connector do?

Unlike the Webpage Search tool (which an agent uses actively during a conversation), the URL Connector is used to pre-load knowledge. It crawls a target URL, scrapes the content, and stores it in your Knowledge Repository. This allows agents to answer questions based on that content later.

⚙️ Configuration Steps

Follow these steps to add a web source to your Knowledge Repository.

Prepare the Source

Identify the URL you wish to ingest. This works best for:

- Public documentation sites.
- Company docs, blogs, or news pages.
- Static knowledge bases.

Add Connector

In your Knowledge Repository configuration, select the URL connector type.

Input: Provide the full URL, including the protocol (e.g., https://docs.svahnar.com).
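Because the connector requires the protocol in the URL, a quick local pre-flight check can catch malformed input before you submit it. The helper below is a hypothetical sketch built on Python's standard `urllib.parse`; it is not part of the connector itself.

```python
from urllib.parse import urlparse


def is_valid_source_url(url: str) -> bool:
    """Hypothetical pre-flight check: require an http(s) scheme and a host."""
    parsed = urlparse(url.strip())
    # Without "https://" or "http://", urlparse puts the host in .path, so .netloc is empty.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_valid_source_url("https://docs.svahnar.com"))  # True
print(is_valid_source_url("docs.svahnar.com"))          # False
```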

Ingestion Process

Once triggered, the system will:

- Crawl: Access the page.
- Extract: Scrape the HTML text and run OCR on the images referenced by <img> tags.
- Index: Chunk the data and store embeddings in the Knowledge Repository.
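The extract and chunk steps can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the connector's implementation: `PageExtractor` and `chunk_words` are hypothetical names, OCR and embedding are omitted (image URLs are only collected), and fixed-size word windows with overlap stand in for whatever chunking strategy the connector actually uses.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collects visible text and <img> sources from an HTML document."""

    SKIP = {"script", "style"}  # tags whose contents are not readable text

    def __init__(self):
        super().__init__()
        self.text_parts: list[str] = []
        self.image_srcs: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # In a real pipeline, these images would be downloaded and sent to OCR.
                self.image_srcs.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into word windows; consecutive chunks share `overlap` words."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]


page = PageExtractor()
page.feed("<body><p>Hello world</p><img src='chart.png'><script>var x=1;</script></body>")
print(page.text_parts)  # ['Hello world']
print(page.image_srcs)  # ['chart.png']
print(chunk_words(" ".join(page.text_parts)))  # ['Hello world']
```

Overlapping windows are a common choice because they keep sentences that straddle a chunk boundary retrievable from either side.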

🚑 Troubleshooting

Content not appearing in RAG
- Check whether the URL is behind a login or firewall. The connector requires public access.
- Ensure the website allows crawling. Some sites block bots via robots.txt or Cloudflare challenges.
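To check whether a site's robots.txt blocks a path, you can test its rules locally with Python's built-in `urllib.robotparser`. This is a sketch under assumptions: the user agent string the connector sends is not documented here, so the wildcard `"*"` stands in for it, and Cloudflare challenges cannot be detected this way.

```python
from urllib import robotparser


def crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt body permits user_agent to fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = "User-agent: *\nDisallow: /private/"
print(crawl_allowed(rules, "https://example.com/docs/intro"))   # True
print(crawl_allowed(rules, "https://example.com/private/page"))  # False
```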
Poor OCR Results
- OCR accuracy depends on image resolution. Low-quality screenshots may yield gibberish.
- Ensure the images on the target URL are not lazy-loaded in a way that prevents the crawler from seeing them.
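Many script-based lazy loaders keep the real image URL in a `data-*` attribute and leave `src` empty or set to an inline placeholder until the page is scrolled, so a crawler that does not run the script never sees the image. The sketch below flags such tags; the attribute names it checks (`data-src`, `data-lazy-src`, `data-original`) are common conventions assumed for illustration, not a list the connector is known to use.

```python
from html.parser import HTMLParser


class LazyImageFinder(HTMLParser):
    """Flags <img> tags whose real URL appears to live only in a data-* attribute."""

    LAZY_ATTRS = ("data-src", "data-lazy-src", "data-original")  # assumed conventions

    def __init__(self):
        super().__init__()
        self.suspect: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        lazy_url = next((a[k] for k in self.LAZY_ATTRS if a.get(k)), None)
        src = a.get("src") or ""
        # Missing src, or only an inline data: placeholder, while a data-* URL exists.
        if lazy_url and (not src or src.startswith("data:")):
            self.suspect.append(lazy_url)


finder = LazyImageFinder()
finder.feed('<img src="a.png"><img data-src="b.png">'
            '<img src="data:image/gif;base64,R0lG" data-src="c.png">')
print(finder.suspect)  # ['b.png', 'c.png']
```

Images that use the native `loading="lazy"` attribute keep a real `src`, so this check does not flag them.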
Ingestion Failures
- Verify that the URL includes the protocol (https:// or http://).

💰 Additional Credit Consumption

Extracted content is split into standard document pages (8.5" x 11", roughly A4 size), and each page consumes one credit.
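Since each page costs one credit, a rough budget is the number of pages your extracted content fills. The sketch below is illustrative only: the page density (`chars_per_page = 3000`) and the rule that a partial page is rounded up to a full credit are hypothetical assumptions, because the exact page-splitting rule is not documented here.

```python
import math


def estimate_credits(extracted_chars: int, chars_per_page: int = 3000) -> int:
    """Rough credit estimate: one credit per page.

    ASSUMPTIONS: chars_per_page is a hypothetical density, and partial
    pages are assumed to round up to a full credit.
    """
    if extracted_chars <= 0:
        return 0
    return math.ceil(extracted_chars / chars_per_page)


print(estimate_credits(7_500))  # 3
```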
