How do I add my sitemap to Libraria?

  1. Go to Knowledge Base > Add knowledge > Sitemap
  2. Paste your sitemap URL

Why should I use a sitemap instead of crawling a website?

  1. Efficiency: A sitemap provides a clear roadmap of all the URLs on a website, saving time and resources as the scraper doesn’t need to discover these URLs by following links.
  2. Completeness: A sitemap lists the pages the site owner wants to be found, so important pages are less likely to be missed, even if they aren’t linked from other parts of the site.
  3. Prioritization: Sitemaps can include an optional <priority> tag for each URL, helping scrapers determine which pages to scrape first.
  4. Reduced Risk of Getting Blocked: Using a sitemap makes scraping activities resemble the behavior of search engines, reducing the risk of being seen as suspicious and getting blocked.
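
To make the efficiency and prioritization points concrete, here is a minimal Python sketch of how a scraper could read a sitemap and order its URLs by <priority>. It is an illustration only, not a description of how Libraria processes sitemaps. The function name urls_by_priority and the sample XML are hypothetical; the namespace and the 0.5 default priority come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_by_priority(sitemap_xml: str) -> List[str]:
    """Return the <loc> URLs of a sitemap, highest <priority> first.

    Hypothetical helper for illustration. URLs without a <priority>
    tag get 0.5, the default stated in the sitemap protocol.
    """
    root = ET.fromstring(sitemap_xml)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # skip malformed entries with no location
        text = url.findtext("sm:priority", default="0.5", namespaces=SITEMAP_NS)
        try:
            priority = float(text)
        except ValueError:
            priority = 0.5  # fall back to the default on unparsable values
        entries.append((priority, loc))
    # The sort is stable, so equal priorities keep their sitemap order.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return [loc for _, loc in entries]


# Made-up sample sitemap used only to demonstrate the helper.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/about</loc><priority>0.3</priority></url>
  <url><loc>https://example.com/</loc><priority>1.0</priority></url>
  <url><loc>https://example.com/blog</loc></url>
</urlset>"""

ordered = urls_by_priority(SAMPLE)
```

Every URL comes from a single document, so no link-following is needed to discover the pages.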

How do I find my sitemap?

Finding your website’s sitemap depends on how your website was created and hosted. Here are some general steps and places to check:

TLDR: Most sitemaps are located at https://<your_website>.com/sitemap.xml or https://<your_website>.com/sitemap_index.xml. If your website does not have a sitemap, you can use a tool like XML-Sitemaps.com to generate one, or crawl the website instead.

  1. Default Sitemap URL: Most platforms and content management systems have default locations for sitemaps.

    • Example: If your website is https://example.com, try accessing https://example.com/sitemap.xml.
  2. Content Management System (CMS):

    • WordPress: If you’re using WordPress with an SEO plugin like Yoast or All in One SEO, the plugin often generates a sitemap for you. With Yoast, you can find it at https://example.com/sitemap_index.xml.
    • Other CMS: Check the documentation or settings of your CMS. There’s often a section dedicated to SEO or sitemaps.
  3. Webmaster Tools:

    • If you’ve submitted your sitemap to search engine webmaster tools (like Google Search Console or Bing Webmaster Tools), you can find the sitemap URL there.
  4. Robots.txt:

    • Sitemaps are sometimes referenced in the robots.txt file. Try accessing https://example.com/robots.txt and see if there’s a sitemap URL mentioned there.
  5. Website Footer or Header:

    • Some websites link to their sitemap from the footer or header for user accessibility.
  6. Ask Your Web Developer or Hosting Provider:

    • If you hired someone to develop your website or if you’re using a hosting provider that offers website building tools, they might know where the sitemap is located.
  7. Search for It:

    • If you’re unsure of the exact URL, you can try doing a site-specific search on a search engine like Google. Type site:example.com sitemap.xml into the search bar and see if any results come up.
  8. Manual Generation:

    • If you can’t find your sitemap, it’s possible you don’t have one. In that case, there are many online tools and plugins available that can help you generate a sitemap for your website.
  9. Check Website Source Code:

    • Sometimes, the sitemap link is included in the source code of the website. Right-click on the homepage and select “View Page Source” or a similar option, then use the browser’s “Find” function (usually Ctrl+F or Cmd+F) and search for “sitemap”.
  10. Check .htaccess or server configuration:

    • If you have access to the server or hosting environment, check the .htaccess file (for Apache servers) or other server configuration files to see if there’s a rewrite rule or redirection related to the sitemap.
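
The default-URL and robots.txt checks above can be combined into a short script. The following Python sketch is one possible approach: the helper names sitemaps_from_robots and candidate_sitemap_urls and the sample robots.txt content are hypothetical, and the two default paths are the ones mentioned in the TLDR. It only builds a list of URLs worth trying and does not make any network requests.

```python
from typing import List
from urllib.parse import urljoin

# Default locations mentioned above; other platforms may use different paths.
DEFAULT_PATHS = ("/sitemap.xml", "/sitemap_index.xml")


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Collect the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comment lines
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_sitemap_urls(site_url: str, robots_txt: str = "") -> List[str]:
    """List sitemap URLs to try: robots.txt entries first, then defaults."""
    candidates = sitemaps_from_robots(robots_txt)
    for path in DEFAULT_PATHS:
        url = urljoin(site_url, path)
        if url not in candidates:
            candidates.append(url)
    return candidates


# Made-up robots.txt content used only to demonstrate the helpers.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /admin
# Sitemap: https://example.com/old.xml
sitemap: https://example.com/news-sitemap.xml
"""

candidates = candidate_sitemap_urls("https://example.com", SAMPLE_ROBOTS)
```

You can then open each candidate in a browser, or request it from a script, and use the first one that returns a valid XML sitemap.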

Remember, not all websites have a sitemap. If your website does not have one, you can use a tool like XML-Sitemaps.com to generate one, or crawl the website instead.