URL

Use the URL connector to import content directly from web pages. The connector crawls publicly accessible web content, ingests it, and stores it in the Knowledge Repository for retrieval-augmented generation (RAG) queries.

When to use

  • Quick ingestion of public documentation, blogs, or knowledge bases
  • Periodic scraping of frequently updated pages

Usage

  • Provide the URL in the form https://yoururl.com. The connector scrapes the page content; if the page contains images, they are scraped as well.
  • Text in images is extracted using OCR.
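
Before submitting a URL, it can help to confirm that it matches the expected form. The following is a minimal sketch using only the Python standard library. The function name looks_like_connector_url is hypothetical, and the rule that only https URLs are accepted is an assumption based on the example form above; the connector itself may accept other schemes.

```python
from urllib.parse import urlparse


def looks_like_connector_url(url: str) -> bool:
    """Return True if `url` is an absolute https URL with a host.

    Assumption: the connector expects URLs of the form https://yoururl.com.
    This is a local sanity check, not part of the connector itself.
    """
    try:
        parsed = urlparse(url.strip())
        # Accessing .hostname can also raise ValueError on malformed hosts.
        return parsed.scheme == "https" and bool(parsed.hostname)
    except ValueError:
        return False
```

For example, looks_like_connector_url("https://yoururl.com") passes, while a bare host name such as "yoururl.com" is rejected because it has no scheme.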

Notes

  • Ensure the target URLs allow crawling and scraping.
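
One way to check whether a site permits crawling is to test the URL against the site's robots.txt rules. The sketch below uses Python's standard urllib.robotparser module on robots.txt text you have already downloaded. The function name allowed_by_robots and the default user agent "*" are assumptions for illustration; this page does not state which user agent the connector sends or how it treats robots.txt, and a site's terms of service may impose further restrictions.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`.

    `robots_txt` is the text of the site's robots.txt file.
    The user agent is a placeholder; the connector's actual value is unknown.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only to illustrate the check.
sample_rules = "User-agent: *\nDisallow: /private/\n"
docs_ok = allowed_by_robots(sample_rules, "https://example.com/docs/intro")
private_ok = allowed_by_robots(sample_rules, "https://example.com/private/page")
```

With the sample rules, the documentation page is allowed and the page under /private/ is not.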