<!DOCTYPE html>
<html lang="en">
<head>
  <title>jsoup Javadoc overview</title>
</head>
<body>
<h1>jsoup: Java HTML parser that makes sense of real-world HTML soup.</h1>

<p><b>jsoup</b> is a Java library for working with real-world HTML. It provides a convenient API for fetching URLs
  and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.</p>

<p>jsoup implements the <a href="https://html.spec.whatwg.org/multipage/">WHATWG HTML</a> specification and parses HTML to the same DOM
  as modern browsers do.</p>

<ul>
  <li>parse HTML from a URL, file, or string
  <li>find and extract data, using DOM traversal or CSS selectors
  <li>manipulate the HTML elements, attributes, and text
  <li>clean user-submitted content against a safelist, to prevent XSS
  <li>output tidy HTML
</ul>

<p>jsoup is designed to deal with all varieties of HTML found in the wild. Whether the input is pristine and valid
  or invalid tag soup, jsoup will create a sensible parse tree.</p>

<p>See <a href="https://jsoup.org/"><b>jsoup.org</b></a> for downloads, documentation, and examples.</p>

@author <a href="https://jonathanhedley.com/">Jonathan Hedley</a>

</body>
</html>