---
c: Copyright (C) Daniel Stenberg, <[email protected]>, et al.
SPDX-License-Identifier: curl
Title: curl_url_get
Section: 3
Source: libcurl
See-also:
  - CURLOPT_CURLU (3)
  - curl_url (3)
  - curl_url_cleanup (3)
  - curl_url_dup (3)
  - curl_url_set (3)
  - curl_url_strerror (3)
Protocol:
  - All
Added-in: 7.62.0
---

# NAME

curl_url_get - extract a part from a URL

# SYNOPSIS

~~~c
#include <curl/curl.h>

CURLUcode curl_url_get(const CURLU *url,
                       CURLUPart part,
                       char **content,
                       unsigned int flags);
~~~

# DESCRIPTION

Given a *url* handle of a URL object, this function extracts an individual
piece or the full URL from it.

The *part* argument specifies which part to extract (see list below) and
*content* points to a 'char *' to get updated to point to a newly
allocated string with the contents.

The *flags* argument is a bitmask with individual features.

The returned content pointer must be freed with curl_free(3) after use.

# FLAGS

The flags argument is zero, one or more bits set in a bitmask.

## CURLU_DEFAULT_PORT

If the handle has no port stored, this option makes curl_url_get(3)
return the default port for the used scheme.

## CURLU_DEFAULT_SCHEME

If the handle has no scheme stored, this option makes curl_url_get(3)
return the default scheme instead of an error.

## CURLU_NO_DEFAULT_PORT

Instructs curl_url_get(3) to not return a port number if it matches the
default port for the scheme.

## CURLU_URLDECODE

Asks curl_url_get(3) to URL decode the contents before returning them. It
does not decode the scheme, the port number or the full URL.

The query component also gets plus-to-space conversion as a bonus when this
bit is set.

Note that this URL decoding is charset unaware and you get a zero terminated
string back with data that could be intended for a particular encoding.

If there are byte values lower than 32 in the decoded string, the get
operation returns an error instead.

## CURLU_URLENCODE

If set, curl_url_get(3) URL encodes the hostname part when a full URL is
retrieved. If not set (default), libcurl returns the URL with the hostname raw
so that IDN names appear as-is. IDN hostnames typically use non-ASCII bytes
that otherwise get percent-encoded.

Note that even when not asking for URL encoding, the '%' (byte 37) is URL
encoded to make sure the hostname remains valid.

## CURLU_PUNYCODE

If set and *CURLU_URLENCODE* is not set, and asked to retrieve the
**CURLUPART_HOST** or **CURLUPART_URL** parts, libcurl returns the hostname
in its punycode version if it contains any non-ASCII octets (and is an
IDN name).

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname contains
anything outside the ASCII range.

(Added in curl 7.88.0)

## CURLU_PUNY2IDN

If set and asked to retrieve the **CURLUPART_HOST** or **CURLUPART_URL**
parts, libcurl returns the hostname in its IDN (International Domain Name)
UTF-8 version if it otherwise is a punycode version. If the punycode name
cannot be converted to IDN correctly, libcurl returns
*CURLUE_BAD_HOSTNAME*.

If libcurl is built without IDN capabilities, using this bit makes
curl_url_get(3) return *CURLUE_LACKS_IDN* if the hostname is using
punycode.

(Added in curl 8.3.0)

## CURLU_GET_EMPTY

When this flag is used in curl_url_get(), it makes the function return empty
query and fragment parts, both when they are extracted individually and when
they appear in the full URL. By default, libcurl otherwise considers empty
parts non-existing.

An empty query part is one where there is nothing following the question mark
(before the possible fragment). An empty fragment part is one where there is
nothing following the hash sign.

(Added in curl 8.8.0)

## CURLU_NO_GUESS_SCHEME

When this flag is used in curl_url_get(), it treats the scheme as non-existing
if it was set as a result of a previous guess, that is, when
CURLU_GUESS_SCHEME was used while parsing a URL.

Using this flag when getting CURLUPART_SCHEME if the scheme was set as the
result of a guess makes curl_url_get() return CURLUE_NO_SCHEME.

Using this flag when getting CURLUPART_URL if the scheme was set as the result
of a guess makes curl_url_get() return the full URL without the scheme
component. Such a URL can then only be parsed with curl_url_set() if
CURLU_GUESS_SCHEME is used.

(Added in curl 8.9.0)

# PARTS

## CURLUPART_URL

When asked to return the full URL, curl_url_get(3) returns a normalized and
possibly cleaned up version using all available URL parts.

We advise using the *CURLU_PUNYCODE* option to get the URL as "normalized" as
possible since IDN allows hostnames to be written in many different ways that
still end up as the same punycode version.

Zero-length queries and fragments are excluded from the URL unless
CURLU_GET_EMPTY is set.

## CURLUPART_SCHEME

Scheme cannot be URL decoded on get.

## CURLUPART_USER

## CURLUPART_PASSWORD

## CURLUPART_OPTIONS

The options field is an optional field that might follow the password in the
userinfo part. It is only recognized/used when parsing URLs for the following
schemes: pop3, smtp and imap. The URL API still allows users to set and get
this field independently of scheme when not parsing full URLs.

## CURLUPART_HOST

The hostname. If it is an IPv6 numeric address, the zone id is not part of it
but is provided separately in *CURLUPART_ZONEID*. IPv6 numerical addresses
are returned within brackets ([]).

IPv6 addresses are normalized when set, which should make them as short as
possible while maintaining correct syntax.

## CURLUPART_ZONEID

If the hostname is a numeric IPv6 address, this field might also be set.

## CURLUPART_PORT

A port cannot be URL decoded on get. This number is returned in a string just
like all other parts. That string is guaranteed to hold a valid port number in
ASCII using base 10.

## CURLUPART_PATH

The path is always at least a slash ('/') even if no path was supplied
in the URL. A URL path always starts with a slash.

## CURLUPART_QUERY

The initial question mark that denotes the beginning of the query part is a
delimiter only. It is not part of the query contents.

A not-present query returns *content* set to NULL.

A zero-length query returns *content* as NULL unless CURLU_GET_EMPTY is set.

The query part gets pluses converted to space when asked to URL decode on get
with the CURLU_URLDECODE bit.

## CURLUPART_FRAGMENT

The initial hash sign that denotes the beginning of the fragment is a
delimiter only. It is not part of the fragment contents.

A not-present fragment returns *content* set to NULL.

A zero-length fragment returns *content* as NULL unless CURLU_GET_EMPTY is
set.

# %PROTOCOLS%

# EXAMPLE

~~~c
#include <stdio.h>
#include <curl/curl.h>

int main(void)
{
  CURLUcode rc;
  CURLU *url = curl_url();
  if(!url)
    return 1;
  rc = curl_url_set(url, CURLUPART_URL, "https://example.com", 0);
  if(!rc) {
    char *scheme;
    rc = curl_url_get(url, CURLUPART_SCHEME, &scheme, 0);
    if(!rc) {
      printf("the scheme is %s\n", scheme);
      /* the returned string must be freed by the caller */
      curl_free(scheme);
    }
  }
  /* free the handle even when parsing the URL failed */
  curl_url_cleanup(url);
  return rc ? 1 : 0;
}
~~~

# %AVAILABILITY%

# RETURN VALUE

Returns a CURLUcode error value, which is CURLUE_OK (0) if everything went
fine. See the libcurl-errors(3) man page for the full list with
descriptions.

If this function returns an error, no URL part is returned.