URL parsing

Note: this was originally written as an answer to the Quora question “Are there URLs one cannot send over Facebook messenger due to its broken URL parsing?”.

Yes, at least if you want the URL to be clickable. One case is a URL that ends with * (a star, or asterisk). Facebook treats a star followed by a space as the end of the URL and leaves the star out of the link, so clicking on http://exp.issarice.com/lol/* goes to http://exp.issarice.com/lol/ instead. Percent-encoding the star gives http://exp.issarice.com/lol/%2A, which is clickable, but in this demo the server returns a different page for it. (Try visiting both the version with the * and the version with the %2A.)

Here’s what I get using curl:

% curl 'http://exp.issarice.com/lol/*'
You cannot access this by clicking on Facebook!
% curl 'http://exp.issarice.com/lol/%2A'
This URL was URL-encoded so can be clicked on Facebook.
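
For reference, here is a minimal Python sketch of how the percent-encoded form is produced and decoded. It uses the standard library's urllib.parse; the variable names are just for illustration, and this is not a claim about what Facebook does internally.

```python
from urllib.parse import quote, unquote

raw_path = "/lol/*"

# quote() leaves "/" alone by default but percent-encodes "*".
encoded_path = quote(raw_path)
print(encoded_path)  # /lol/%2A

# Decoding recovers the original path, so the two spellings differ only
# in how they are written on the wire.
print(unquote(encoded_path) == raw_path)  # True
```

So the two URLs decode to the same path, and it is up to the server whether it treats them as the same resource.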

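This demo runs on nginx with the following configuration. The location is matched against the percent-decoded path, so both requests reach it, while $request_uri holds the request exactly as the client sent it: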

location = /lol/* {
    if ($request_uri = "/lol/*") {
        return 200 "You cannot access this by clicking on Facebook!";
    }
    if ($request_uri = "/lol/%2A") {
        return 200 "This URL was URL-encoded so can be clicked on Facebook.";
    }
}
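
The same check can be sketched in Python to make the logic explicit. This is only a rough stand-in for the nginx block, not nginx itself: the function name respond is hypothetical, and returning None when neither comparison matches is my assumption about the fallback, since the block above does not say what happens in that case.

```python
from typing import Optional
from urllib.parse import unquote


def respond(request_uri: str) -> Optional[str]:
    """Hypothetical stand-in for the nginx block above."""
    # Like the `location = /lol/*` line: match on the percent-decoded path.
    if unquote(request_uri) != "/lol/*":
        return None
    # Like the `if ($request_uri = ...)` lines: compare the raw request text.
    if request_uri == "/lol/*":
        return "You cannot access this by clicking on Facebook!"
    if request_uri == "/lol/%2A":
        return "This URL was URL-encoded so can be clicked on Facebook."
    # Other spellings, such as lowercase "/lol/%2a", match neither comparison.
    # Assumption: no demo response in that case.
    return None


print(respond("/lol/*"))
print(respond("/lol/%2A"))
```

The point of the sketch is that the server sees the raw request text and can branch on it, even though both spellings decode to the same path.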

This may seem contrived, but I first ran into it when trying to send an archive.org link to a friend. The URL ending in * (which Facebook cannot send as a clickable link) was interpreted to mean “search all the URLs in this domain”, whereas the percent-encoded URL ending in %2A was interpreted to mean “find the URL in this domain containing just %2A”.

I also encountered a problem once when I sent a series of long URLs in a single message (I’ll have to dig that example up).