Semantically-Lossless Normalizations
There is a set of normalizations that do not change the semantics of a URL. These are defined as Normalizer::PRESERVING_NORMALIZATIONS. The normalizer applies this set of normalizations if no specific normalizations are requested.
- capitalize percent encoding
- decode unreserved characters
- convert empty http path
- remove default file host
- remove default port
- remove path dot segments
- convert host unicode to punycode
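<?php
use webignition\Url\Normalizer;
use webignition\Url\Url;

$url = new Url('http://♥.example.com:80/p%61th/../?option=%3f');
// No flags are passed, so Normalizer::PRESERVING_NORMALIZATIONS is applied.
$normalizedUrl = Normalizer::normalize($url);

(string) $normalizedUrl;
// Port 80 is the http default and "/path/../" collapses to "/".
// "http://xn--g6h.example.com/?option=%3F"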
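Two of the normalizations listed above, capitalizing percent encoding and decoding unreserved characters, can be illustrated without the library. The following is a minimal sketch assuming the RFC 3986 definition of unreserved characters; the function name normalizePercentEncoding is hypothetical and this is not the library's implementation.

```php
<?php

/**
 * Illustrative sketch only, not the webignition\Url implementation.
 * Upper-cases the hex digits of percent-encoded triplets and decodes
 * triplets that represent RFC 3986 unreserved characters
 * (A-Z, a-z, 0-9, "-", ".", "_", "~").
 */
function normalizePercentEncoding(string $component): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})/',
        function (array $matches): string {
            $character = chr(hexdec($matches[1]));

            // Unreserved characters never need encoding, so decode them.
            if (preg_match('/^[A-Za-z0-9\-._~]$/', $character) === 1) {
                return $character;
            }

            // Everything else stays encoded, with upper-case hex digits.
            return '%' . strtoupper($matches[1]);
        },
        $component
    );
}

echo normalizePercentEncoding('/p%61th/'), "\n";   // /path/
echo normalizePercentEncoding('option=%3f'), "\n"; // option=%3F
```

Neither change alters what the URL identifies: %61 and "a" are equivalent by definition, and the hex digits in a percent-encoded triplet are case-insensitive.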
Normalizer::PRESERVING_NORMALIZATIONS can be combined with additional normalization flags using the bitwise OR operator (|).
<?php
use webignition\Url\Normalizer;
use webignition\Url\Url;

$url = new Url('http://♥.example.com:80/p%61th/../?option=%3f&b=bear&a=apple');
$normalizedUrl = Normalizer::normalize(
    $url,
    Normalizer::PRESERVING_NORMALIZATIONS |
    Normalizer::SORT_QUERY_PARAMETERS
);

(string) $normalizedUrl;
// "http://xn--g6h.example.com/?a=apple&b=bear&option=%3F"
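Combining flags with | works because each normalization occupies its own bit, and a preset is the OR of its members. The sketch below shows the general bitmask pattern; the EXAMPLE_ constants, their values, and the isEnabled helper are all hypothetical and do not reflect the library's actual definitions.

```php
<?php

// Hypothetical flag values for illustration only; the library's real
// constants and their values may differ.
const EXAMPLE_CAPITALIZE_PERCENT_ENCODING = 1;
const EXAMPLE_DECODE_UNRESERVED_CHARACTERS = 2;
const EXAMPLE_SORT_QUERY_PARAMETERS = 4;

// A preset is the bitwise OR of the flags it contains.
const EXAMPLE_PRESERVING = EXAMPLE_CAPITALIZE_PERCENT_ENCODING
    | EXAMPLE_DECODE_UNRESERVED_CHARACTERS;

// Returns true when every bit of $flag is set in $options.
function isEnabled(int $options, int $flag): bool
{
    return ($options & $flag) === $flag;
}

$options = EXAMPLE_PRESERVING | EXAMPLE_SORT_QUERY_PARAMETERS;

var_dump(isEnabled($options, EXAMPLE_SORT_QUERY_PARAMETERS));            // bool(true)
var_dump(isEnabled($options, EXAMPLE_DECODE_UNRESERVED_CHARACTERS));     // bool(true)
var_dump(isEnabled(EXAMPLE_PRESERVING, EXAMPLE_SORT_QUERY_PARAMETERS));  // bool(false)
```

Passing only an additional flag, without OR-ing in the preset, would under this pattern select that single normalization and nothing else.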