ada-url
This is ada_url, a fast, standards-compliant Python library for working with URLs, based on the Ada URL
parser.
Installation
Install from PyPI:
pip install ada_url
Usage examples
Parsing URLs
The URL class is intended to match the one described in the
WHATWG URL spec.
>>> from ada_url import URL
>>> urlobj = URL('https://example.org/path/../file.txt')
>>> urlobj.href
'https://example.org/path/file.txt'
The parse_url function returns a dictionary of all URL elements:
>>> from ada_url import parse_url
>>> parse_url('https://user:pass@example.org:80/api?q=1#2')
{
'href': 'https://user:pass@example.org:80/api?q=1#2',
'username': 'user',
'password': 'pass',
'protocol': 'https:',
'port': '80',
'hostname': 'example.org',
'host': 'example.org:80',
'pathname': '/api',
'search': '?q=1',
'hash': '#2',
'origin': 'https://example.org:80',
'host_type': <HostType.DEFAULT: 0>,
'scheme_type': <SchemeType.HTTPS: 2>
}
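The keys fit together in a predictable way, which the following sketch uses to rebuild a URL without its credentials. It is only an illustration: redacted_href is a hypothetical helper, not part of ada_url, and it assumes a URL with a // authority like the one above. The dictionary literal is copied from the output above so the snippet runs without the package installed.

```python
def redacted_href(parts):
    """Rebuild a URL string from parse_url-style parts, omitting credentials."""
    # 'protocol' keeps its trailing colon and 'host' already includes the port,
    # while 'search' and 'hash' keep their leading '?' and '#'.
    return (
        parts['protocol'] + '//' + parts['host']
        + parts['pathname'] + parts['search'] + parts['hash']
    )


# Values copied from the parse_url output shown above.
parts = {
    'protocol': 'https:',
    'username': 'user',
    'password': 'pass',
    'host': 'example.org:80',
    'pathname': '/api',
    'search': '?q=1',
    'hash': '#2',
}

print(redacted_href(parts))  # https://example.org:80/api?q=1#2
```

With the package installed, the same helper should accept the result of parse_url directly, since it only reads the documented keys.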
Altering URLs
Replacing URL components with the URL class:
>>> from ada_url import URL
>>> urlobj = URL('https://example.org/path/../file.txt')
>>> urlobj.host = 'example.com'
>>> urlobj.href
'https://example.com/file.txt'
Replacing URL components with the replace_url function:
>>> from ada_url import replace_url
>>> replace_url('https://example.org/path/../file.txt', host='example.com')
'https://example.com/file.txt'
Search parameters
The URLSearchParams class is intended to match the one described in the
WHATWG URL spec.
>>> from ada_url import URLSearchParams
>>> obj = URLSearchParams('key1=value1&key2=value2')
>>> list(obj.items())
[('key1', 'value1'), ('key2', 'value2')]
The parse_search_params function returns a dictionary of search keys mapped to
value lists:
>>> from ada_url import parse_search_params
>>> parse_search_params('key1=value1&key2=value2')
{'key1': ['value1'], 'key2': ['value2']}
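For comparison, the standard library's urllib.parse.parse_qs returns a dictionary of the same shape, so code written against one can usually consume the other. The sketch below uses only the standard library. One difference to keep in mind is that parse_qs drops empty values unless keep_blank_values=True is passed, while the WHATWG spec keeps them; treating the two as interchangeable beyond simple inputs is an assumption.

```python
from urllib.parse import parse_qs

# Same key -> list-of-values shape as the dictionary shown above.
simple = parse_qs('key1=value1&key2=value2')
print(simple)  # {'key1': ['value1'], 'key2': ['value2']}

# By default parse_qs silently drops keys whose value is empty ...
dropped = parse_qs('key1=&key2=value2')
print(dropped)  # {'key2': ['value2']}

# ... unless keep_blank_values=True is passed.
kept = parse_qs('key1=&key2=value2', keep_blank_values=True)
print(kept)  # {'key1': [''], 'key2': ['value2']}
```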
Internationalized domain names
The idna class can encode and decode IDNs:
>>> from ada_url import idna
>>> idna.encode('Bücher.example')
b'xn--bcher-kva.example'
>>> idna.decode(b'xn--bcher-kva.example')
'bücher.example'
WHATWG URL compliance
This library is compliant with the WHATWG URL spec. This means, among other things, that it properly encodes IDNs and resolves paths:
>>> from ada_url import URL
>>> parsed_url = URL('https://www.GOoglé.com/./path/../path2/')
>>> parsed_url.hostname
'www.xn--googl-fsa.com'
>>> parsed_url.pathname
'/path2/'
Contrast that with the Python standard library’s urllib.parse module, which loosely
follows the older RFC 3986 standard:
>>> from urllib.parse import urlparse
>>> parsed_url = urlparse('https://www.GOoglé.com/./path/../path2/')
>>> parsed_url.hostname
'www.googlé.com'
>>> parsed_url.path
'/./path/../path2/'
Performance
This package uses CFFI to call
the Ada C library’s functions, which makes it faster than the Python standard
library’s urllib.parse module for most applications.
An alternative package, can_ada, uses
pybind11 to interact with the Ada
C++ library functions, which is even faster.
Building from source
You will need to have Python 3 development files installed.
On macOS, you will have these if you installed Python with brew.
On Linux, you may need to install some packages (e.g., python3-dev and python3-venv).
You will also need a C++ toolchain.
On macOS, Xcode will provide this for you.
On Linux, you may need to install some more packages (e.g., build-essential).
Clone the git repository to a directory for development:
git clone https://github.com/ada-url/ada-python.git ada_url_python
cd ada_url_python
Create a virtual environment to use for building:
python3 -m venv env
source ./env/bin/activate
After that, you’re ready to build the package:
python -m pip install -r requirements/development.txt
python -m build --no-isolation
This will create a .whl file in the dist directory. You can install it in other virtual environments on the same machine.
To run tests, first build a package. Then:
python -m pip install -e .
python -m unittest
Leave the virtual environment with the deactivate command.
API Documentation
- class ada_url.URL(url, base=None)
Parses a url (with an optional base) according to the WHATWG URL parsing standard.
>>> from ada_url import URL
>>> old_url = 'https://example.org:443/file.txt?q=1'
>>> urlobj = URL(old_url)
>>> urlobj.host
'example.org'
>>> urlobj.host = 'example.com'
>>> new_url = urlobj.href
>>> new_url
'https://example.com/file.txt?q=1'
You can read and write the following attributes:
href, protocol, username, password, host, hostname, port, pathname, search, hash
You can additionally read these attributes:
- origin, which will be a str
- host_type, which will be a HostType enum
- scheme_type, which will be a SchemeType enum
The class also exposes a static method that checks whether the input url (and optional base) can be parsed:
>>> url = 'file_2.txt'
>>> base = 'https://example.org:443/file_1.txt'
>>> URL.can_parse(url, base)
True
See the WHATWG docs for more details on the URL class.
- class ada_url.HostType
Enum for URL host types:
- DEFAULT hosts like https://example.org are 0.
- IPV4 hosts like https://192.0.2.1 are 1.
- IPV6 hosts like https://[2001:db8::] are 2.
>>> from ada_url import HostType
>>> HostType.DEFAULT
<HostType.DEFAULT: 0>
- class ada_url.SchemeType
Enum for URL scheme types:
- HTTP URLs like http://example.org are 0.
- NOT_SPECIAL URLs like git://example.org are 1.
- HTTPS URLs like https://example.org are 2.
- WS URLs like ws://example.org are 3.
- FTP URLs like ftp://example.org are 4.
- WSS URLs like wss://example.org are 5.
- FILE URLs like file://example are 6.
>>> from ada_url import SchemeType
>>> SchemeType.HTTPS
<SchemeType.HTTPS: 2>
- ada_url.check_url(s)
Returns True if s represents a valid URL, and False otherwise.
>>> from ada_url import check_url
>>> check_url('bogus')
False
>>> check_url('http://a/b/c/d;p?q')
True
- ada_url.join_url(base_url, s)
Returns the URL that results from joining base_url to s. Raises ValueError if no valid URL can be constructed.
>>> from ada_url import join_url
>>> base_url = 'http://a/b/c/d;p?q'
>>> join_url(base_url, '../g')
'http://a/b/g'
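When resolving many references against one base, the ValueError can be handled per item. The sketch below shows one way to do that. resolve_links is a hypothetical helper, and it defaults to the standard library's urljoin as a stand-in so that it runs without ada_url installed; passing join=join_url should give Ada's behavior instead, since the argument order matches the signature above.

```python
from urllib.parse import urljoin


def resolve_links(base_url, hrefs, join=urljoin):
    """Resolve each href against base_url, skipping ones that cannot be joined."""
    resolved = []
    for href in hrefs:
        try:
            resolved.append(join(base_url, href))
        except ValueError:
            # join_url is documented to raise ValueError when no valid URL
            # can be constructed; such references are skipped here.
            continue
    return resolved


links = resolve_links('http://a/b/c/d;p?q', ['../g', 'g'])
print(links)  # ['http://a/b/g', 'http://a/b/c/g']
```

Skipping invalid references is a design choice made for this sketch; collecting them for reporting would work just as well.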
- ada_url.normalize_url(s)
Returns a “normalized” URL with all '..' and '/' characters resolved.
>>> from ada_url import normalize_url
>>> normalize_url('http://a/b/c/../g')
'http://a/b/g'
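To illustrate what resolving dot segments means for the path part, here is a simplified pure-Python sketch. It is not how Ada implements normalization: remove_dot_segments is a hypothetical helper that assumes an absolute path starting with '/', and it ignores details the WHATWG parser handles, such as percent-encoded dots.

```python
def remove_dot_segments(path):
    """Resolve '.' and '..' segments in an absolute URL path (simplified)."""
    segments = path.split('/')
    output = []
    for segment in segments:
        if segment == '.':
            # '.' refers to the current directory and contributes nothing.
            continue
        if segment == '..':
            # Drop the previous segment, but never the leading empty one (the root).
            if len(output) > 1:
                output.pop()
            continue
        output.append(segment)
    # A trailing '.' or '..' still names a directory, so keep the final slash.
    if segments[-1] in ('.', '..'):
        output.append('')
    return '/'.join(output)


print(remove_dot_segments('/b/c/../g'))          # /b/g
print(remove_dot_segments('/./path/../path2/'))  # /path2/
```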
- ada_url.parse_url(s[, attributes])
Returns a dictionary with the parsed components of the URL represented by s.
>>> from ada_url import parse_url
>>> url = 'https://user_1:password_1@example.org:8080/dir/../api?q=1#frag'
>>> parse_url(url)
{
'href': 'https://user_1:password_1@example.org:8080/api?q=1#frag',
'username': 'user_1',
'password': 'password_1',
'protocol': 'https:',
'host': 'example.org:8080',
'port': '8080',
'hostname': 'example.org',
'pathname': '/api',
'search': '?q=1',
'hash': '#frag',
'origin': 'https://example.org:8080',
'host_type': <HostType.DEFAULT: 0>,
'scheme_type': <SchemeType.HTTPS: 2>
}
The names of the dictionary keys correspond to the components of the “URL class” in the WHATWG URL spec.
host_type is a HostType enum. scheme_type is a SchemeType enum.
Pass in a sequence of attributes to limit which keys are returned.
>>> from ada_url import parse_url
>>> url = 'https://user_1:password_1@example.org:8080/dir/../api?q=1#frag'
>>> parse_url(url, attributes=('protocol',))
{'protocol': 'https:'}
Unrecognized attributes are ignored.
- ada_url.replace_url(s, **kwargs)
Start with the URL represented by s, replace the attributes given in the kwargs mapping, and return a normalized URL with the result.
Provide an empty string to unset an attribute.
>>> from ada_url import replace_url
>>> base_url = 'https://user_1:password_1@example.org/resource'
>>> replace_url(base_url, username='user_2', password='', protocol='http:')
'http://user_2@example.org/resource'
Unrecognized attributes are ignored.
href is replaced first if it is given. hostname is replaced before host if both are given.
ValueError is raised if the input URL or one of the components is not valid.
- class ada_url.URLSearchParams(params)
Parses the given params string according to the WHATWG URL parsing standard.
The attribute and methods from the standard are implemented:
>>> from ada_url import URLSearchParams
>>> obj = URLSearchParams('key1=value1&key2=value2&key2=value3')
>>> obj.size
3
>>> obj.append('key2', 'value4')
>>> str(obj)
'key1=value1&key2=value2&key2=value3&key2=value4'
>>> obj.delete('key1')
>>> str(obj)
'key2=value2&key2=value3&key2=value4'
>>> obj.delete('key2', 'value2')
>>> str(obj)
'key2=value3&key2=value4'
>>> obj.get('key2')
'value3'
>>> obj.get_all('key2')
['value3', 'value4']
>>> obj.has('key2')
True
>>> obj.has('key2', 'value5')
False
>>> obj.set('key1', 'value6')
>>> str(obj)
'key2=value3&key2=value4&key1=value6'
>>> obj.sort()
>>> str(obj)
'key1=value6&key2=value3&key2=value4'
Iterators for the keys, values, and items are also implemented:
>>> obj = URLSearchParams('key1=value1&key2=value2&key2=value3')
>>> list(obj.keys())
['key1', 'key2', 'key2']
>>> list(obj.values())
['value1', 'value2', 'value3']
>>> list(obj.items())
[('key1', 'value1'), ('key2', 'value2'), ('key2', 'value3')]
See the WHATWG docs for more details on the URLSearchParams class.
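The (key, value) pairs produced by items() can be regrouped into a key-to-list mapping when repeated keys need to be handled together. The following sketch does this in plain Python. group_pairs is a hypothetical helper, and the pair list is copied from the items() output above so the snippet runs without ada_url installed.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group (key, value) pairs into a dict mapping each key to its values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Repeated keys accumulate their values in order of appearance.
        grouped[key].append(value)
    return dict(grouped)


# Copied from the list(obj.items()) output shown above.
pairs = [('key1', 'value1'), ('key2', 'value2'), ('key2', 'value3')]

print(group_pairs(pairs))  # {'key1': ['value1'], 'key2': ['value2', 'value3']}
```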
- ada_url.parse_search_params(s)
Returns a dictionary representing the parsed URL parameters specified by s. The returned dictionary maps each key to a list of values associated with it.
>>> from ada_url import parse_search_params
>>> parse_search_params('key1=value1&key1=value2&key2=value3')
{'key1': ['value1', 'value2'], 'key2': ['value3']}
- ada_url.replace_search_params(s, *args)
Returns a string representing the URL parameters specified by s, modified by the (key, value) pairs passed in as args.
>>> from ada_url import replace_search_params
>>> replace_search_params(
...     'key1=value1&key1=value2',
...     ('key1', 'value3'),
...     ('key2', 'value4')
... )
'key1=value3&key2=value4'
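As a rough mental model, the documented example can be reproduced with the standard library alone. This is a sketch under assumptions, not the library's implementation: replace_params_sketch is a hypothetical helper that drops every existing value for a replaced key and appends the new pairs at the end. How ada_url orders or encodes the result for more complex inputs is not covered here.

```python
from urllib.parse import parse_qsl, urlencode


def replace_params_sketch(query, *pairs):
    """Replace all values of the given keys, keeping other parameters as-is."""
    replaced_keys = {key for key, _ in pairs}
    # Keep only the parameters whose key is not being replaced.
    kept = [
        (key, value)
        for key, value in parse_qsl(query, keep_blank_values=True)
        if key not in replaced_keys
    ]
    # Assumption: new pairs are appended after the surviving parameters.
    return urlencode(kept + list(pairs))


result = replace_params_sketch(
    'key1=value1&key1=value2',
    ('key1', 'value3'),
    ('key2', 'value4'),
)
print(result)  # key1=value3&key2=value4
```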
- class ada_url.idna
Process international domains according to the UTS #46 standard.
idna.encode() implements the UTS #46 ToASCII operation. Its output is a Python bytes object. It is also available as idna_to_ascii().
>>> from ada_url import idna
>>> idna.encode('meßagefactory.ca')
b'xn--meagefactory-m9a.ca'
idna.decode() implements the UTS #46 ToUnicode operation. Its output is a Python str object. It is also available as idna_to_unicode().
>>> from ada_url import idna
>>> idna.decode('xn--meagefactory-m9a.ca')
'meßagefactory.ca'
Both functions accept either str or bytes objects as input.
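The choice of standard matters in practice. Python's built-in 'idna' codec implements the older IDNA 2003 rules (RFC 3490), which agree with the results in this document for some names but not for others. The sketch below uses only the standard library to show one case where they agree and one where they differ: IDNA 2003 maps ß to ss, whereas the idna.encode() example above preserves it.

```python
# The built-in 'idna' codec follows IDNA 2003 (RFC 3490), not UTS #46.

# For this name the result matches the idna.encode() example in the usage section.
same = 'Bücher.example'.encode('idna')
print(same)  # b'xn--bcher-kva.example'

# Here it differs: IDNA 2003 case-folds 'ß' to 'ss' before encoding,
# so the result is a different domain than the one shown above.
different = 'meßagefactory.ca'.encode('idna')
print(different)  # b'messagefactory.ca'

# Decoding a Punycode label works the same way in both.
decoded = b'xn--bcher-kva.example'.decode('idna')
print(decoded)  # bücher.example
```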