Crawling
Crawl a website up to 3 levels deep
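To make "up to 3 levels deep" concrete, here is a minimal sketch of a depth-limited, breadth-first crawl over an in-memory link graph. It is only an illustration: the `LINKS` site map, the `crawl` function, and the convention that the starting URL counts as level 0 are all assumptions, not a description of how the product's crawler is implemented.

```python
from collections import deque

# Hypothetical site map: each page lists the pages it links to.
LINKS = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/setup"],
    "/blog": ["/"],
    "/docs/setup": ["/docs/setup/advanced"],
    "/docs/setup/advanced": ["/docs/setup/advanced/tuning"],
    "/docs/setup/advanced/tuning": [],
}

def crawl(start, links, max_depth=3):
    """Breadth-first crawl that stops following links at max_depth.

    Assumes the start page is level 0; returns {page: depth}.
    """
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # pages at the depth limit are kept but not expanded
        for target in links.get(page, []):
            if target not in depth_of:  # skip pages already found
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of

found = crawl("/", LINKS)
```

In this sketch the page four links away from the start (`/docs/setup/advanced/tuning`) is never reached, which is why deeply nested pages may not appear in the crawl results.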
How do I crawl a website?
- Go to the Knowledge Base of the Library you want to add knowledge to.
- Click the Add Knowledge button.
- Select Crawl website from the Add Knowledge window.
The page then updates so you can crawl a website.
- Paste the URL of the site you want to crawl in the URL field.
- Click the Submit button. Crawling the website takes some time after you submit. While it runs, the number of pages found so far is shown in the bottom-left corner of the Crawl Website section.
- Once the crawling is done, a window will show you the crawled pages and the number of credits that will be consumed to add the pages to the knowledge base.
You can selectively include crawled pages in your knowledge base by using the checkboxes located in the third column of the table.
For instance, in this case, I opted to exclude the Log in to DoNoHarm page from the knowledge base. Excluding a page also reduces the credits consumed by the credit value of that page.
- Click the Submit button in the window to confirm and add the selected pages to your library.
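The effect of unchecking a page on the credit total can be sketched as simple arithmetic. The page titles, credit values, and the `credits_needed` helper below are hypothetical and exist only to illustrate the idea; the actual credit cost per page is whatever the confirmation window reports.

```python
# Hypothetical crawl results: (page title, credit cost, checkbox ticked?)
pages = [
    ("Home", 2, True),
    ("Pricing", 1, True),
    ("Log in", 1, False),  # unchecked, so it is left out of the knowledge base
]

def credits_needed(pages):
    """Sum the credit cost of the pages whose checkbox is ticked."""
    return sum(cost for _, cost, included in pages if included)

total = credits_needed(pages)  # excluded pages do not count toward the total
```

With all three pages ticked the total in this example would be 4 credits; excluding the 1-credit page brings it down to 3.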
It’s scraping unnecessary content. How do I refine what I scrape?
You can open the Advanced accordion and use selectors to choose which content is scraped from the page at the URL.
- Open the Advanced accordion by clicking on it.
- Enter the selectors for the content you want to scrape in the text field. These selectors pick out the content to scrape from the page at the URL.
For example, you have a page that has the following code section:
And you only want to scrape the text in Inclusive Design and Prior Art. You can use the following CSS selector to scrape that specific content.
- Click the Submit button.
Once submitted, a window will appear showing how many credits the scrape will use.
An error will appear in the window if the selector you entered is invalid. Make sure the selector is valid before you proceed.
- Click the Submit button in the window to confirm the scraping and add it to your library.
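To show how a selector narrows what gets scraped, here is a small sketch using only Python's standard library. The HTML markup, the `#inclusive-design, #prior-art` selector, and the `IdExtractor` class are all hypothetical; the sketch handles only a comma-separated list of id selectors, whereas a real CSS selector engine supports far more.

```python
from html.parser import HTMLParser

# Hypothetical page markup, invented for this illustration.
HTML = """
<nav>Home | About</nav>
<section id="inclusive-design"><h2>Inclusive Design</h2><p>Design for everyone.</p></section>
<section id="prior-art"><h2>Prior Art</h2><p>Earlier related work.</p></section>
<footer>Copyright</footer>
"""

class IdExtractor(HTMLParser):
    """Collects text inside elements whose id appears in the selector.

    Supports only selectors of the form "#a, #b" and assumes every
    opened tag is explicitly closed (no bare void tags such as <br>).
    """

    def __init__(self, selector):
        super().__init__()
        self.wanted = {part.strip().lstrip("#") for part in selector.split(",")}
        self.depth = 0    # nesting depth inside a matched element
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside a matched element
        elif dict(attrs).get("id") in self.wanted:
            self.depth = 1   # entering a matched element

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

extractor = IdExtractor("#inclusive-design, #prior-art")
extractor.feed(HTML)
scraped = extractor.chunks
```

Only the text inside the two matched sections is collected; the navigation bar and footer are skipped, which is the kind of unnecessary content a selector is meant to filter out.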
Advanced Scraping Output