ada-url
The urllib.parse module in Python does not follow the legacy RFC 3986 standard, nor does it follow the newer WHATWG URL specification. It is also relatively slow.
This is ada_url, a fast, standards-compliant Python library for working with URLs, based on the Ada URL parser.
Installation
Install from PyPI:
pip install ada_url
Usage examples
Parsing URLs
The URL class is intended to match the one described in the WHATWG URL spec:
>>> from ada_url import URL
>>> urlobj = URL('https://example.org/path/../file.txt')
>>> urlobj.href
'https://example.org/file.txt'
The parse_url function returns a dictionary of all URL elements:
>>> from ada_url import parse_url
>>> parse_url('https://user:pass@example.org:80/api?q=1#2')
{
'href': 'https://user:pass@example.org:80/api?q=1#2',
'username': 'user',
'password': 'pass',
'protocol': 'https:',
'port': '80',
'hostname': 'example.org',
'host': 'example.org:80',
'pathname': '/api',
'search': '?q=1',
'hash': '#2',
'origin': 'https://example.org:80',
'host_type': <HostType.DEFAULT: 0>,
'scheme_type': <SchemeType.HTTPS: 2>
}
Altering URLs
Replacing URL components with the URL class:
>>> from ada_url import URL
>>> urlobj = URL('https://example.org/path/../file.txt')
>>> urlobj.host = 'example.com'
>>> urlobj.href
'https://example.com/file.txt'
Replacing URL components with the replace_url function:
>>> from ada_url import replace_url
>>> replace_url('https://example.org/path/../file.txt', host='example.com')
'https://example.com/file.txt'
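For comparison, here is a minimal sketch of the closest standard-library approach, using only urllib.parse. It replaces the netloc of a urlsplit() result with the namedtuple _replace() method. The host changes, but the '..' segment survives because urllib.parse does not resolve paths:

```python
from urllib.parse import urlsplit

# Replace the network location (host) using only the standard library.
parts = urlsplit('https://example.org/path/../file.txt')
stdlib_url = parts._replace(netloc='example.com').geturl()

# urllib.parse does not resolve '..', so the path keeps its dot segments.
print(stdlib_url)  # https://example.com/path/../file.txt
```

Note that netloc also covers any user:password@ prefix, so it is not a one-to-one counterpart of the host component.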
Search parameters
The URLSearchParams class is intended to match the one described in the WHATWG URL spec:
>>> from ada_url import URLSearchParams
>>> obj = URLSearchParams('key1=value1&key2=value2')
>>> list(obj.items())
[('key1', 'value1'), ('key2', 'value2')]
The parse_search_params function returns a dictionary of search keys mapped to value lists:
>>> from ada_url import parse_search_params
>>> parse_search_params('key1=value1&key2=value2')
{'key1': ['value1'], 'key2': ['value2']}
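The standard library's parse_qs returns a dictionary of the same shape. One difference is worth knowing. The WHATWG application/x-www-form-urlencoded parser, which URLSearchParams follows, keeps a pair such as key1= as a key with an empty value. By default, parse_qs silently drops it. This sketch uses only urllib.parse:

```python
from urllib.parse import parse_qs

# Same shape as parse_search_params: each key maps to a list of values.
print(parse_qs('key1=value1&key2=value2'))  # {'key1': ['value1'], 'key2': ['value2']}

# By default, parse_qs drops keys whose value is empty...
print(parse_qs('key1=&key2=value2'))  # {'key2': ['value2']}

# ...unless keep_blank_values=True is passed.
print(parse_qs('key1=&key2=value2', keep_blank_values=True))  # {'key1': [''], 'key2': ['value2']}
```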
Internationalized domain names
The idna class can encode and decode IDNs:
>>> from ada_url import idna
>>> idna.encode('Bücher.example')
b'xn--bcher-kva.example'
>>> idna.decode(b'xn--bcher-kva.example')
'bücher.example'
WHATWG URL compliance
This library is compliant with the WHATWG URL spec. This means, among other things, that it properly encodes IDNs and resolves paths:
>>> from ada_url import URL
>>> parsed_url = URL('https://www.GOoglé.com/./path/../path2/')
>>> parsed_url.hostname
'www.xn--googl-fsa.com'
>>> parsed_url.pathname
'/path2/'
Contrast that with the Python standard library’s urllib.parse module:
>>> from urllib.parse import urlparse
>>> parsed_url = urlparse('https://www.GOoglé.com/./path/../path2/')
>>> parsed_url.hostname
'www.googlé.com'
>>> parsed_url.path
'/./path/../path2/'
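The path resolution shown above can be approximated in a few lines. The following is a simplified sketch modeled on the remove_dot_segments algorithm from RFC 3986 (section 5.2.4). It is not ada's implementation. The WHATWG path parser also treats percent-encoded dots such as %2e as dot segments and handles special cases that this sketch ignores. The function name remove_dot_segments is used here for illustration only.

```python
from urllib.parse import urlsplit


def remove_dot_segments(path: str) -> str:
    """Resolve '.' and '..' segments in an absolute path (simplified sketch)."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    output = []
    segments = path.split("/")[1:]  # drop the empty string before the leading '/'
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment in (".", ".."):
            if segment == ".." and output:
                output.pop()  # go up one level, but never above the root
            if is_last:
                output.append("")  # '/a/b/..' resolves to '/a/', keeping the trailing '/'
        else:
            output.append(segment)
    return "/" + "/".join(output)


path = urlsplit('https://www.GOoglé.com/./path/../path2/').path
print(remove_dot_segments(path))  # /path2/
```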
Alternative Python bindings
This package uses CFFI to call the Ada library’s functions, which has a performance cost.
The alternative can_ada (Canadian Ada) package uses pybind11 to generate a Python extension module, which is more performant.
Building from source
You will need to have Python 3 development files installed.
On macOS, you will have these if you installed Python with brew.
On Linux, you may need to install some packages (e.g., python3-dev and python3-venv).
You will also need a C++ toolchain.
On macOS, Xcode will provide this for you.
On Linux, you may need to install some more packages (e.g., build-essential).
Clone the git repository to a directory for development:
git clone https://github.com/ada-url/ada-python.git ada_url_python
cd ada_url_python
Create a virtual environment to use for building:
python3 -m venv env
source ./env/bin/activate
After that, you’re ready to build the package:
python -m pip install -r requirements/development.txt
c++ -c "ada_url/ada.cpp" -fPIC -std="c++17" -O2 -o "ada_url/ada.o"
python -m build --no-isolation
This will create a .whl file in the dist directory. You can install it in other virtual environments on the same machine.
To run tests, first build a package. Then:
python -m pip install -e .
python -m unittest
Leave the virtual environment with the deactivate command.
API Documentation
- class ada_url.URL(url, base=None)
Parses a url (with an optional base) according to the WHATWG URL parsing standard.
>>> from ada_url import URL
>>> old_url = 'https://example.org:443/file.txt?q=1'
>>> urlobj = URL(old_url)
>>> urlobj.host
'example.org'
>>> urlobj.host = 'example.com'
>>> new_url = urlobj.href
>>> new_url
'https://example.com/file.txt?q=1'
You can read and write the following attributes:
href
protocol
username
password
host
hostname
port
pathname
search
hash
You can additionally read these attributes:
origin, which will be a str
host_type, which will be a HostType enum
scheme_type, which will be a SchemeType enum
The class also exposes a static method that checks whether the input url (and optional base) can be parsed:
>>> url = 'file_2.txt'
>>> base = 'https://example.org:443/file_1.txt'
>>> URL.can_parse(url, base)
True
See the WHATWG docs for more details on the URL class.
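In the URL example above, the host of 'https://example.org:443/file.txt?q=1' is 'example.org'. That is because 443 is the default port for https, and the WHATWG parser drops default ports. Below is a minimal sketch of that rule using only the standard library. The helper host_without_default_port is hypothetical and not part of ada_url. It uses urllib.parse only to split the URL, so it does not perform the rest of WHATWG host parsing.

```python
from urllib.parse import urlsplit

# Default ports of the WHATWG "special" schemes ("file" has no default port).
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def host_without_default_port(url: str) -> str:
    """Hypothetical helper: return 'hostname[:port]', omitting a default port.

    Simplified: IPv6 literals come back without their square brackets.
    """
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    port = parts.port  # None when the URL has no explicit port
    if port is None or DEFAULT_PORTS.get(parts.scheme) == port:
        return hostname
    return f"{hostname}:{port}"


print(host_without_default_port("https://example.org:443/file.txt?q=1"))  # example.org
print(host_without_default_port("https://user:pass@example.org:80/api"))  # example.org:80
```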
- class ada_url.HostType
Enum for URL host types:
DEFAULT: hosts like https://example.org are 0.
IPV4: hosts like https://192.0.2.1 are 1.
IPV6: hosts like https://[2001:db8::] are 2.
>>> from ada_url import HostType
>>> HostType.DEFAULT
<HostType.DEFAULT: 0>
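Here is a rough, standard-library-only sketch of the same classification, in which anything that is not an IP address literal counts as DEFAULT. HostTypeSketch and classify_host are hypothetical names that mirror the enum values listed above. The ipaddress module is stricter than the WHATWG IPv4 parser, which also accepts shorthand forms such as 127.1 or hexadecimal parts, so the sketch would misclassify those hosts.

```python
import enum
import ipaddress
from urllib.parse import urlsplit


class HostTypeSketch(enum.IntEnum):
    """Hypothetical stand-in that mirrors the values of ada_url.HostType."""
    DEFAULT = 0
    IPV4 = 1
    IPV6 = 2


def classify_host(url: str) -> HostTypeSketch:
    """Classify the host of url; anything that is not an IP literal is DEFAULT."""
    hostname = urlsplit(url).hostname or ""  # brackets around IPv6 are stripped
    try:
        address = ipaddress.ip_address(hostname)
    except ValueError:
        return HostTypeSketch.DEFAULT
    return HostTypeSketch.IPV4 if address.version == 4 else HostTypeSketch.IPV6


for example in ("https://example.org", "https://192.0.2.1", "https://[2001:db8::]"):
    print(example, classify_host(example).name)
```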
- class ada_url.SchemeType
Enum for URL scheme types:
HTTP: URLs like http://example.org are 0.
NOT_SPECIAL: URLs like git://example.org are 1.
HTTPS: URLs like https://example.org are 2.
WS: URLs like ws://example.org are 3.
FTP: URLs like ftp://example.org are 4.
WSS: URLs like wss://example.org are 5.
FILE: URLs like file://example are 6.
>>> from ada_url import SchemeType
>>> SchemeType.HTTPS
<SchemeType.HTTPS: 2>
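The WHATWG spec treats the six schemes listed above as "special." Every other scheme, such as git, is NOT_SPECIAL. Here is a minimal pure-Python sketch of that classification. SchemeTypeSketch and classify_scheme are hypothetical names that mirror the enum values above and are not part of ada_url.

```python
import enum
from urllib.parse import urlsplit


class SchemeTypeSketch(enum.IntEnum):
    """Hypothetical stand-in that mirrors the values of ada_url.SchemeType."""
    HTTP = 0
    NOT_SPECIAL = 1
    HTTPS = 2
    WS = 3
    FTP = 4
    WSS = 5
    FILE = 6


# The six WHATWG "special" schemes; every other scheme is NOT_SPECIAL.
SPECIAL_SCHEMES = {
    "http": SchemeTypeSketch.HTTP,
    "https": SchemeTypeSketch.HTTPS,
    "ws": SchemeTypeSketch.WS,
    "ftp": SchemeTypeSketch.FTP,
    "wss": SchemeTypeSketch.WSS,
    "file": SchemeTypeSketch.FILE,
}


def classify_scheme(url: str) -> SchemeTypeSketch:
    scheme = urlsplit(url).scheme  # urlsplit lower-cases the scheme
    return SPECIAL_SCHEMES.get(scheme, SchemeTypeSketch.NOT_SPECIAL)


print(classify_scheme("git://example.org").name)    # NOT_SPECIAL
print(classify_scheme("HTTPS://example.org").name)  # HTTPS
```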
- ada_url.check_url(s)
Returns True if s represents a valid URL, and False otherwise.
>>> from ada_url import check_url
>>> check_url('bogus')
False
>>> check_url('http://a/b/c/d;p?q')
True
- ada_url.join_url(base_url, s)
Returns the URL that results from joining base_url to s. Raises ValueError if no valid URL can be constructed.
>>> from ada_url import join_url
>>> base_url = 'http://a/b/c/d;p?q'
>>> join_url(base_url, '../g')
'http://a/b/g'
- ada_url.normalize_url(s)
Returns a “normalized” URL with all '..' and '/' characters resolved.
>>> from ada_url import normalize_url
>>> normalize_url('http://a/b/c/../g')
'http://a/b/g'
- ada_url.parse_url(s[, attributes])
Returns a dictionary with the parsed components of the URL represented by s.
>>> from ada_url import parse_url
>>> url = 'https://user_1:password_1@example.org:8080/dir/../api?q=1#frag'
>>> parse_url(url)
{
'href': 'https://user_1:password_1@example.org:8080/api?q=1#frag',
'username': 'user_1',
'password': 'password_1',
'protocol': 'https:',
'host': 'example.org:8080',
'port': '8080',
'hostname': 'example.org',
'pathname': '/api',
'search': '?q=1',
'hash': '#frag',
'origin': 'https://example.org:8080',
'host_type': <HostType.DEFAULT: 0>,
'scheme_type': <SchemeType.HTTPS: 2>
}
The names of the dictionary keys correspond to the components of the “URL class” in the WHATWG URL spec. host_type is a HostType enum, and scheme_type is a SchemeType enum.
Pass in a sequence of attributes to limit which keys are returned.
>>> from ada_url import parse_url
>>> url = 'https://user_1:password_1@example.org:8080/dir/../api?q=1#frag'
>>> parse_url(url, attributes=('protocol',))
{'protocol': 'https:'}
Unrecognized attributes are ignored.
- ada_url.replace_url(s, **kwargs)
Start with the URL represented by s, replace the attributes given in the kwargs mapping, and return a normalized URL with the result.
Provide an empty string to unset an attribute.
>>> from ada_url import replace_url
>>> base_url = 'https://user_1:password_1@example.org/resource'
>>> replace_url(base_url, username='user_2', password='', protocol='http:')
'http://user_2@example.org/resource'
Unrecognized attributes are ignored.
href is replaced first if it is given.
hostname is replaced before host if both are given.
ValueError is raised if the input URL or one of the components is not valid.
- class ada_url.URLSearchParams(params)
Parses the given params string according to the WHATWG URL parsing standard.
The attribute and methods from the standard are implemented:
>>> from ada_url import URLSearchParams
>>> obj = URLSearchParams('key1=value1&key2=value2&key2=value3')
>>> obj.size
3
>>> obj.append('key2', 'value4')
>>> str(obj)
'key1=value1&key2=value2&key2=value3&key2=value4'
>>> obj.delete('key1')
>>> str(obj)
'key2=value2&key2=value3&key2=value4'
>>> obj.delete('key2', 'value2')
>>> str(obj)
'key2=value3&key2=value4'
>>> obj.get('key2')
'value3'
>>> obj.get_all('key2')
['value3', 'value4']
>>> obj.has('key2')
True
>>> obj.has('key2', 'value5')
False
>>> obj.set('key1', 'value6')
>>> str(obj)
'key2=value3&key2=value4&key1=value6'
>>> obj.sort()
>>> str(obj)
'key1=value6&key2=value3&key2=value4'
Iterators for the keys, values, and items are also implemented:
>>> obj = URLSearchParams('key1=value1&key2=value2&key2=value3')
>>> list(obj.keys())
['key1', 'key2', 'key2']
>>> list(obj.values())
['value1', 'value2', 'value3']
>>> list(obj.items())
[('key1', 'value1'), ('key2', 'value2'), ('key2', 'value3')]
See the WHATWG docs for more details on the URLSearchParams class.
- ada_url.parse_search_params(s)
Returns a dictionary representing the parsed URL search parameters specified by s. The returned dictionary maps each key to a list of the values associated with it.
>>> from ada_url import parse_search_params
>>> parse_search_params('key1=value1&key1=value2&key2=value3')
{'key1': ['value1', 'value2'], 'key2': ['value3']}
- ada_url.replace_search_params(s, *args)
Returns a string representing the URL search parameters specified by s, modified by the (key, value) pairs passed in as args.
>>> from ada_url import replace_search_params
>>> replace_search_params(
...     'key1=value1&key1=value2',
...     ('key1', 'value3'),
...     ('key2', 'value4')
... )
'key1=value3&key2=value4'
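The example above behaves as if the WHATWG URLSearchParams set() method were applied once per (key, value) pair. The first key1 takes value3, the second key1 is removed, and key2 is appended. The stdlib-only sketch below reproduces that behavior under this assumption. The name replace_search_params_sketch is hypothetical, and the real function may differ in edge cases such as percent-encoding details.

```python
from urllib.parse import parse_qsl, urlencode


def replace_search_params_sketch(query: str, *pairs: tuple) -> str:
    """Hypothetical approximation: apply WHATWG-style set() for each (key, value)."""
    items = parse_qsl(query, keep_blank_values=True)
    for key, value in pairs:
        replaced = False
        updated = []
        for existing_key, existing_value in items:
            if existing_key != key:
                updated.append((existing_key, existing_value))
            elif not replaced:
                # The first pair with this key takes the new value...
                updated.append((key, value))
                replaced = True
            # ...and any later pairs with the same key are dropped.
        if not replaced:
            updated.append((key, value))  # unknown keys are appended at the end
        items = updated
    return urlencode(items)


print(replace_search_params_sketch(
    'key1=value1&key1=value2', ('key1', 'value3'), ('key2', 'value4')
))  # key1=value3&key2=value4
```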
- class ada_url.idna
Process international domains according to the UTS #46 standard.
idna.encode() implements the UTS #46 ToASCII operation. Its output is a Python bytes object. It is also available as idna_to_ascii().
>>> from ada_url import idna
>>> idna.encode('meßagefactory.ca')
b'xn--meagefactory-m9a.ca'
idna.decode() implements the UTS #46 ToUnicode operation. Its output is a Python str object. It is also available as idna_to_unicode().
>>> from ada_url import idna
>>> idna.decode('xn--meagefactory-m9a.ca')
'meßagefactory.ca'
Both functions accept either str or bytes objects as input.
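For comparison, Python's built-in idna codec implements the older IDNA 2003 rules (RFC 3490) rather than UTS #46. For many names the two agree. However, IDNA 2003 maps ß to ss, whereas idna.encode() above preserves it. As a result, the same input can become two different hostnames. This short check uses only the standard library:

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490), not UTS #46.
ascii_buecher = "Bücher.example".encode("idna")
ascii_message = "meßagefactory.ca".encode("idna")

print(ascii_buecher)  # b'xn--bcher-kva.example' (same result as idna.encode)
print(ascii_message)  # b'messagefactory.ca' (IDNA 2003 maps ß to "ss")
```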