Editor’s note: If you want to crawl a website instead, we suggest using the sitemap or crawler option.
How many URLs can I add at a time?
You can paste up to 100 URLs at a time.
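If you have more than 100 URLs, you can split them into groups before pasting. The helper below is a minimal sketch of that idea; the function name `batch_urls` and the `example.com` URLs are placeholders, and dropping blank lines and duplicates is an assumption about what you would want, not something the product requires.

```python
from typing import Iterator, List

MAX_URLS_PER_SUBMISSION = 100  # limit stated in the answer above


def batch_urls(urls: List[str], size: int = MAX_URLS_PER_SUBMISSION) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs.

    Blank entries and exact duplicates are dropped first (an assumption,
    made so that the same page is not submitted twice).
    """
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            cleaned.append(url)
    for start in range(0, len(cleaned), size):
        yield cleaned[start:start + size]


# Hypothetical list of 250 pages; the domain is a placeholder.
urls = [f"https://example.com/page-{i}" for i in range(250)]
batches = list(batch_urls(urls))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each group can then be pasted as a separate submission.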
How do I add a URL to my knowledge base?
- Go to the Knowledge Base of the Library you want to add knowledge to.
- Click the Add Knowledge button.
- Select Scrape with links from the Add Knowledge window.
- Paste the URLs you want to scrape in the field. Click Add more if you want to add more URLs.
- Choose a scrape type by clicking the button for the option you want.
Regular Scraping
Scrapes all the content on the page, including unnecessary content such as the header and footer.
AI Scraping
Libraria’s AI scrapes and summarizes the most important information. The output does not include unnecessary content.
- Click the Submit button. Once submitted, a window will appear showing how many credits the scrape will use.
The credits used for Regular Scraping depend on how much content is on the page: the more content, the more credits you use.
The credits used for AI Scraping also depend on content volume, but AI Scraping costs five times as much as Regular Scraping.
- Click the Submit button in the window to confirm the scrape and add it to your library.
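The credit rule in the steps above can be sketched as simple arithmetic. Only the five-times multiplier comes from this article; the per-page figures below are made-up numbers, and the confirmation window remains the authoritative source for the actual cost.

```python
from typing import List

AI_MULTIPLIER = 5  # from the text: AI Scraping costs five times Regular Scraping


def estimate_credits(regular_credits_per_page: List[int], use_ai: bool) -> int:
    """Total credits for a batch, given a Regular Scraping estimate per page."""
    total = sum(regular_credits_per_page)
    return total * AI_MULTIPLIER if use_ai else total


# Hypothetical Regular Scraping estimates for three pages (illustrative only).
pages = [2, 3, 5]
regular_total = estimate_credits(pages, use_ai=False)
ai_total = estimate_credits(pages, use_ai=True)
print(regular_total, ai_total)  # 10 50
```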
It’s scraping unnecessary content. How do I refine what I scrape?
Option 1: Advanced
You can open the Advanced accordion and use selectors to choose which content to scrape from the URL.
- Open the Advanced accordion by clicking on it.
- Enter the selectors for the content you want to scrape in the text field. These selectors determine which parts of the page are scraped.
For example, suppose a page contains the following section:
<section role="doc-introduction">
<h2 id="introduction" tabindex="-1">Introduction</h2>
<p>I realize not everybody’s going to ditch the Web and switch to Gemini or Gopher today (<span data-literal="that would be a difficult and unrealistic transition">that’ll take, like, at least a month /s</span>). Until that happens, here’s a non-exhaustive, highly-opinionated list of best practices for websites that focus primarily on text. I don’t expect anybody to fully agree with the list; nonetheless, the article should have at least some useful information for any web content author or front-end web developer.</p>
<h3 id="inclusive-design" tabindex="-1">Inclusive design</h3>
<p>My primary focus is <a href="https://100daysofa11y.com/2019/12/03/accommodation-versus-inclusive-design/">inclusive design</a>. Specifically, I focus on supporting <em>underrepresented ways to read a page</em>. Not all users load a page in a common web-browser and navigate effortlessly with their eyes and hands. Authors often neglect people who read through accessibility tools, tiny viewports, machine translators, “reading mode” implementations, the Tor network, printouts, hostile networks, and uncommon browsers, to name a few. I list more niches in <a href="#conclusion">the conclusion</a>. Compatibility with so many niches sounds far more daunting than it really is: if you only selectively override browser defaults and use plain-old, semantic HTML (<abbr title="plain-old, semantic HTML">POSH</abbr>), you’ve done half of the work already.</p>
<p>One of the core ideas behind the flavor of inclusive design I present is <dfn id="inc-by-default" tabindex="-1">inclusivity by default</dfn>. Web pages shouldn’t use accessible overlays, reduced-data modes, or other personalizations if these features can be available all the time. Personalization isn’t always possible: Tor users, students using school computers, and people with restrictive corporate policies can’t “make websites work for them”; that’s a webmaster’s responsibility.</p>
<p>At the same time, many users do apply personalizations; sites should respect those personalizations whenever possible. Balancing these two needs is difficult. Some features conflict; you can’t display a light and dark color scheme simultaneously. Personalization is a fallback strategy to resolve conflicting needs. Disproportionately underrepresented needs deserve disproportionately greater attention, so they come before personal preferences instead of being relegated to a separate lane.</p>
<h3 id="prior-art" tabindex="-1">Prior art</h3>
<p>You can regard this article as an elaboration on existing work by the Web Accessibility Initiative (<abbr title="Web Accessibility Initiative’s">WAI</abbr>).</p>
<p>I’ll cite the <abbr>WAI’s</abbr> <span class="h-cite" itemprop="citation" itemscope="" itemtype="https://schema.org/TechArticle"><cite itemprop="name" class="p-name"><a class="u-url" itemprop="url" href="https://www.w3.org/WAI/WCAG22/Techniques/">Techniques for WCAG 2.2</a></cite></span> a number of times. Each “Success Criterion” (requirement) of the WCAG has possible techniques. Unlike the <cite>Web Content Accessibility Guidelines</cite> (<abbr title="Web Content Accessibility Guidelines">WCAG</abbr>), the Techniques document does not list requirements; rather, it serves to non-exhaustively educate authors about <em>how</em> to use specific technologies to comply with the WCAG. I don’t find much utility in the technology-agnostic goals enumerated by the WCAG without the accompanying technology-specific techniques to meet those goals.</p>
<p>I’ll also cite <span class="h-cite" itemid="https://www.w3.org/TR/coga-usable/" itemprop="citation" itemscope="" itemtype="https://schema.org/TechArticle"><cite itemprop="name" class="p-name"><a class="u-url" itemprop="url" href="https://www.w3.org/TR/coga-usable/">Making Content Usable for People with Cognitive and Learning Disabilities</a></cite>, by <span itemscope="" itemtype="https://schema.org/Organization" itemprop="publisher">the WAI</span></span>. The document lists eight objectives. Each objective has associated personas, and can be met by several design patterns.</p>
<h3 id="why-this-article-exists" tabindex="-1">Why this article exists</h3>
<p>Performance and accessibility guidelines are scattered across multiple WAI documents and blog posts. Moreover, guidelines tend to be overly general and avoid giving specific advice. Guidelines from different places tend to contradict each other, especially when they have different goals (e.g., security and accessibility). They also tend to be focused on large corporate sites rather than the simple text-oriented content the Web was made for.</p>
<p>I wanted to create a single reference with non-contradictory guidelines, containing advice more specific and opinionated than existing material. I also wanted to approach the very different aspects of site design from the same perspective and in the same place, allowing readers to draw connections between them.</p>
</section>
Suppose you only want to scrape the text under Inclusive design and Prior art. You can use the following CSS selector to target that content:
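#inclusive-design ~ p, #prior-art ~ p
Note that the ~ combinator matches every later sibling paragraph, so this selector also picks up the paragraphs under Why this article exists. If the scraper accepts complex selectors inside :not(), you can exclude them with:
#inclusive-design ~ p:not(#why-this-article-exists ~ p)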
- Click the Submit button.
Once submitted, a window will appear showing how many credits the scrape will use.
An error will appear in the window if the selector you entered is invalid. Make sure the selector is valid before proceeding.
- Click the Submit button in the window to confirm the scrape and add it to your library.
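Before spending credits, it can help to check locally what a sibling selector such as `#inclusive-design ~ p` would match. The sketch below emulates the CSS general sibling combinator with Python's standard library on a shortened stand-in for the section above. The class and function names are hypothetical, the paragraph text is abbreviated, and the parser assumes well-formed markup without void elements such as `<br>`. It is not how Libraria evaluates selectors.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ChildCollector(HTMLParser):
    """Record the direct children of the first <section> as [tag, id, text]."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # 0 = outside the section, 1 = directly inside it
        self.children = []  # one [tag, id, text] entry per direct child

    def handle_starttag(self, tag, attrs):
        if tag == "section" and self.depth == 0:
            self.depth = 1
        elif self.depth >= 1:
            if self.depth == 1:
                self.children.append([tag, dict(attrs).get("id"), ""])
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth >= 1:
            self.depth -= 1

    def handle_data(self, data):
        # Text nested anywhere inside a direct child belongs to that child.
        if self.depth >= 2 and self.children:
            self.children[-1][2] += data


def _start_index(children, anchor_id):
    ids = [child[1] for child in children]
    return ids.index(anchor_id) + 1 if anchor_id in ids else len(children)


def general_sibling(children, anchor_id, tag="p"):
    """Emulate `#anchor_id ~ tag`: every later sibling with that tag."""
    start = _start_index(children, anchor_id)
    return [c[2] for c in children[start:] if c[0] == tag]


def until_next_heading(children, anchor_id, tag="p"):
    """Only the `tag` siblings between the anchor and the next heading."""
    matched = []
    for c in children[_start_index(children, anchor_id):]:
        if c[0] in HEADINGS:
            break
        if c[0] == tag:
            matched.append(c[2])
    return matched


# Shortened stand-in for the example section; paragraph text is abbreviated.
SAMPLE = """
<section role="doc-introduction">
<h2 id="introduction">Introduction</h2>
<p>intro</p>
<h3 id="inclusive-design">Inclusive design</h3>
<p>inclusive 1</p>
<p>inclusive 2</p>
<h3 id="prior-art">Prior art</h3>
<p>prior 1</p>
<h3 id="why-this-article-exists">Why this article exists</h3>
<p>why 1</p>
</section>
"""

parser = ChildCollector()
parser.feed(SAMPLE)
children = parser.children

print(general_sibling(children, "inclusive-design"))
# ['inclusive 1', 'inclusive 2', 'prior 1', 'why 1']
print(until_next_heading(children, "prior-art"))
# ['prior 1']
```

The first call shows that a `~` selector keeps matching paragraphs under every later heading in the same parent, which is worth keeping in mind when a page has more sections than the ones you want.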
Option 2: Use AI Scraping
Libraria’s AI will scrape and summarize the most important information. Refer to the AI Scraping section above for more information.