<!DOCTYPE html>
<html lang="en">
<head>
  <title>jsoup Javadoc overview</title>
</head>
<body>
<h1>jsoup: Java HTML parser that makes sense of real-world HTML soup.</h1>

<p><b>jsoup</b> is a Java library for working with real-world HTML. It provides a convenient API for fetching URLs
  and for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.</p>
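<p>As a minimal sketch of the extraction API described above, the hypothetical class below reads the same links two ways: with DOM methods such as <code>getElementById</code> and <code>getElementsByTag</code>, and with a CSS selector passed to <code>select</code>. The names <code>ExtractLinks</code>, <code>linkTextsViaDom</code>, and <code>absoluteHrefsViaSelector</code> are illustrative and not part of jsoup. It assumes jsoup is on the classpath, and it parses a string rather than fetching a URL so that it runs without network access; <code>https://example.com/</code> is a placeholder base URI.</p>

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class: assumes the jsoup library is on the classpath.
// It parses a string instead of fetching a URL so that it runs offline.
final class ExtractLinks {
    // Sample markup with one relative and one absolute link.
    static final String HTML =
        "<div id='nav'><a href='/docs'>Docs</a> <a href='https://jsoup.org/'>Home</a></div>";

    // DOM-method style: look up the container by id, then its <a> descendants by tag name.
    static List<String> linkTextsViaDom(String html) {
        Document doc = Jsoup.parse(html);
        Element nav = doc.getElementById("nav");
        return nav == null ? List.of() : nav.getElementsByTag("a").eachText();
    }

    // CSS-selector style: one query for every <a href> under #nav.
    // The "abs:" prefix resolves each href against the document's base URI.
    static List<String> absoluteHrefsViaSelector(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        return doc.select("#nav a[href]").eachAttr("abs:href");
    }

    public static void main(String[] args) {
        System.out.println(linkTextsViaDom(HTML));
        System.out.println(absoluteHrefsViaSelector(HTML, "https://example.com/"));
    }
}
```

<p>Both styles return the links in document order; the selector version is usually shorter when the path to the data spans several levels of the tree.</p>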

<p>jsoup implements the <a href="https://html.spec.whatwg.org/multipage/">WHATWG HTML</a> specification and parses HTML to the same DOM
  as modern browsers do.</p>
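<p>One way to see the browser-equivalent tree is to parse markup that omits tags a browser would imply. In this sketch (a hypothetical <code>BrowserDom</code> class, assuming jsoup is on the classpath), the source has no <code>&lt;head&gt;</code>, <code>&lt;body&gt;</code>, or <code>&lt;tbody&gt;</code>, yet the parsed document places the <code>&lt;title&gt;</code> in the head and wraps the table row in a <code>&lt;tbody&gt;</code>, as the WHATWG tree-construction rules specify.</p>

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical example class: assumes the jsoup library is on the classpath.
final class BrowserDom {
    // No <html>, <head>, <body>, or <tbody> tags in the source.
    static final String SOURCE = "<title>Rows</title><table><tr><td>1</td></tr></table>";

    // Per the WHATWG tree-construction rules, a <tr> directly inside <table>
    // is wrapped in an implied <tbody>; count the rows found there.
    static int rowsInsideTbody(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("table > tbody > tr").size();
    }

    // The <title> lands in the implied <head>, as it would in a browser.
    static String titleFromHead(String html) {
        Document doc = Jsoup.parse(html);
        return doc.head().select("title").text();
    }

    public static void main(String[] args) {
        System.out.println(rowsInsideTbody(SOURCE));
        System.out.println(titleFromHead(SOURCE));
    }
}
```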

<ul>
  <li>parse HTML from a URL, file, or string
  <li>find and extract data, using DOM traversal or CSS selectors
  <li>manipulate the HTML elements, attributes, and text
  <li>clean user-submitted content against a safelist, to prevent XSS
  <li>output tidy HTML
</ul>
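<p>A rough sketch of the manipulate, clean, and output items in the list above follows. The class and method names (<code>ManipulateAndClean</code>, <code>retitle</code>, <code>sanitize</code>) are hypothetical; the code assumes a jsoup release that provides <code>org.jsoup.safety.Safelist</code>, and it uses the built-in <code>Safelist.basic()</code> rather than a custom safelist.</p>

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical example class: assumes a jsoup version that provides Safelist.
final class ManipulateAndClean {
    // Manipulate, then output: rewrite the first <h1>, append a paragraph,
    // and serialize the <body> contents (pretty-printed by default).
    static String retitle(String html, String heading) {
        Document doc = Jsoup.parse(html);
        Element h1 = doc.selectFirst("h1");
        if (h1 != null) {
            h1.text(heading)         // replace the text content
              .attr("id", "title")   // set or overwrite an attribute
              .addClass("lead");     // add a class name
        }
        doc.body().appendElement("p").text("Appended by jsoup");
        return doc.body().html();
    }

    // Clean: keep only the tags and attributes that Safelist.basic() permits;
    // <script> elements and event-handler attributes such as onclick are dropped.
    static String sanitize(String untrusted) {
        return Jsoup.clean(untrusted, Safelist.basic());
    }

    public static void main(String[] args) {
        System.out.println(retitle("<h1>Old</h1><p>Body text", "New"));
        System.out.println(sanitize(
            "<p><a href='https://example.com/' onclick='steal()'>Link</a></p><script>alert(1)</script>"));
    }
}
```

<p>Passing a different <code>Safelist</code> changes which tags and attributes survive; jsoup also predefines <code>Safelist.none()</code>, <code>simpleText()</code>, <code>basicWithImages()</code>, and <code>relaxed()</code>.</p>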

<p>jsoup is designed to deal with all varieties of HTML found in the wild: from pristine and validating
  to invalid tag-soup, jsoup will create a sensible parse tree.</p>
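<p>For example, the unclosed tags in the sketch below (a hypothetical <code>TagSoup</code> class, assuming jsoup is on the classpath) are invalid HTML, but the parser still produces two sibling paragraphs: the second <code>&lt;p&gt;</code> start tag implicitly closes the first, as it would in a browser.</p>

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical example class: assumes the jsoup library is on the classpath.
final class TagSoup {
    // Invalid markup: no end tags for <p> or <b>, and no <html>/<body> wrapper.
    static final String SOUP = "<p>Lorem <b>ipsum<p>dolor sit";

    // A second <p> start tag implicitly closes the first paragraph,
    // so the parse yields two sibling paragraphs rather than nested ones.
    static List<String> paragraphTexts(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("p").eachText();
    }

    // Count <p> elements nested inside another <p> (none, after tree building).
    static int nestedParagraphs(String html) {
        return Jsoup.parse(html).select("p p").size();
    }

    public static void main(String[] args) {
        System.out.println(paragraphTexts(SOUP));
        System.out.println(nestedParagraphs(SOUP));
    }
}
```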

<p>See <a href="https://jsoup.org/"><b>jsoup.org</b></a> for downloads, documentation, and examples.</p>

@author <a href="https://jonathanhedley.com/">Jonathan Hedley</a>

</body>
</html>