How many of us actually know the URI scheme of a URL when we access it (where the scheme here is either http:// or https://)? Often, we may have noticed that accessing a website as, e.g., http://www.foo.com or simply foo.com automatically redirects us to the https://www.foo.com version of the site (if it supports SSL/TLS). This HTTPS version is the preferred, legitimate version of the site, so we are automatically redirected to it whenever we arrive via HTTP. Once the site runs over SSL/TLS, all data is encrypted in transit, and the rest is history.
This redirect could happen every time the website is accessed over unsecured HTTP, which is a bad approach because it opens the door to Man-In-The-Middle (MITM) attacks. Since the request travels over the wire on unsecured HTTP for a short time, a malicious user could eavesdrop on it and alter it without the server noticing. To understand how this small window can affect you, let's take an example of how requests are actually dispatched to the target server. We will use traceroute to get the detailed path of a request. Let's say that we want to access www.google.com. The traceroute log for this request looks like this:
We find that there are 7 hops, i.e. 7 intermediate routers through which the request passes before it is finally received by the server hosted at Google. This also means that if the request is sent over unsecured HTTP, there are 7 points at which its integrity could be compromised by a potential MITM attack.
Always Allow Requests to go over HTTPS
To make this happen, certain mechanisms need to be put in place so that the integrity of the request is maintained at all times. Such mechanisms instruct the browser that the site must ALWAYS be accessed over HTTPS. In other words, the server tells the browser to perform the scheme switch itself, on the client side.
This mechanism/policy is called HTTP Strict Transport Security (HSTS). The server implements it by sending a particular response header, Strict-Transport-Security.
The server needs to send this header with the response back to the client. When accessed over HTTP, instead of serving the requested content, the server should respond with HTTP status code 301 (Moved Permanently) with the correct Location header set (merely changing the scheme). Note that special attention should be paid when the Location value is built dynamically, as unvalidated input there is a potential attack vector (open redirects or header injection, which can in turn enable XSS).
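To make the redirect step concrete, here is a minimal sketch in Python of how a server might build the 301 response. The function name https_redirect and the ALLOWED_HOSTS allow-list are hypothetical, introduced only for illustration. The idea is to change nothing but the scheme and to refuse to build a Location value from a host the server does not recognize.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical allow-list of hosts this server is willing to redirect to.
ALLOWED_HOSTS = {"foo.com", "www.foo.com"}


def https_redirect(request_url: str):
    """Return (status, headers) for a plain-HTTP request that should move to HTTPS."""
    parts = urlsplit(request_url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "http" or host not in ALLOWED_HOSTS:
        # Never echo an unexpected scheme or host back in the Location header.
        return 400, {}
    # Only the scheme changes; host, path and query are carried over.
    # (Any explicit port is dropped in this sketch.)
    location = urlunsplit(("https", host, parts.path or "/", parts.query, ""))
    return 301, {"Location": location}


print(https_redirect("http://www.foo.com/login?next=home"))
```

Checking the host against a fixed list, rather than trusting whatever the request claims, is one simple way to keep the Location value from being steered by an attacker.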
When a site is first accessed via HTTPS, the server adds the Strict-Transport-Security header to the response, specifying a max-age directive (in seconds). Since we ideally want our site to always work over HTTPS, max-age is set to a very large value. The optional includeSubDomains directive specifies that the same policy holds for all subdomains of the site. The browser records this information. Whenever the site is accessed again, the browser uses the recorded information and loads the site only over HTTPS (this means that the browser takes care of the client-side switch from HTTP to HTTPS if necessary).
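As a rough illustration of the header itself, the sketch below builds a Strict-Transport-Security value on the server side and parses it the way a browser might. The helper names build_hsts and parse_hsts are made up for this example, and the parser is deliberately simplified: it only understands max-age and includeSubDomains.

```python
def build_hsts(max_age: int, include_subdomains: bool = False) -> str:
    """Build the value of a Strict-Transport-Security header."""
    value = f"max-age={max_age}"
    if include_subdomains:
        value += "; includeSubDomains"
    return value


def parse_hsts(value: str):
    """Return (max_age, include_subdomains); max_age is None if missing or invalid."""
    max_age = None
    include_subdomains = False
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            arg = arg.strip().strip('"')
            if arg.isascii() and arg.isdigit():
                max_age = int(arg)
        elif name == "includesubdomains":
            include_subdomains = True
    return max_age, include_subdomains


header = build_hsts(31536000, include_subdomains=True)
print("Strict-Transport-Security:", header)
print(parse_hsts(header))
```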
Facebook uses state-of-the-art security features to protect its site. The Chrome DevTools snapshot above shows two major headers specifically dedicated to application security.
content-security-policy - an approach to fighting XSS attacks in which the site specifies a set of policies (a whitelist of URLs) and asks the browser to load dynamic content only if it originates from these URLs. For more details on content security policy, refer to this article: Content Security Policy.
strict-transport-security - the topic under discussion. Note here that the max-age directive is set to 2592000 seconds, or 30 days. However, the includeSubDomains directive is not set.
When the expiration time specified by the Strict-Transport-Security header elapses, the next attempt to load the site via HTTP will proceed as normal instead of automatically using HTTPS. Whenever the Strict-Transport-Security header is delivered to the browser again, the browser updates the expiration time for that site, so that HTTPS once again becomes the default access scheme.
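The expiry behaviour can be sketched with a toy model of the browser's bookkeeping. The HstsStore class below is hypothetical and much simpler than what real browsers do (it ignores includeSubDomains, for instance). Time is passed in explicitly, in seconds, so the example is easy to follow.

```python
class HstsStore:
    """Toy model of a browser's record of HSTS hosts (an illustrative assumption)."""

    def __init__(self):
        self._expiry = {}  # host -> time (in seconds) at which the policy expires

    def note(self, host: str, max_age: int, now: float) -> None:
        """Record a Strict-Transport-Security header received over HTTPS."""
        host = host.lower()
        if max_age == 0:
            # max-age=0 asks the browser to forget the policy for this host.
            self._expiry.pop(host, None)
        else:
            # Every new header pushes the expiry forward from the current time.
            self._expiry[host] = now + max_age

    def must_use_https(self, host: str, now: float) -> bool:
        expiry = self._expiry.get(host.lower())
        return expiry is not None and now < expiry


THIRTY_DAYS = 30 * 24 * 60 * 60  # 2592000 seconds, the max-age seen above

store = HstsStore()
store.note("foo.com", THIRTY_DAYS, now=0)
print(store.must_use_https("foo.com", now=100))           # inside the window
print(store.must_use_https("foo.com", now=THIRTY_DAYS))   # policy has expired
```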
One thing of particular note is that the Strict-Transport-Security header only takes effect once the browser has received it over HTTPS, that is, when the site is first accessed over HTTPS or again after the max-age value has expired. Any requests made over HTTP before that point are still unprotected, which leaves room for SSL stripping.
Imagine a user A who is not aware of the scheme of the website she wants to visit, so she just types in the URL as foo.com (without any scheme). The browser detects no scheme and, by default, loads the website over unsecured HTTP. Now, there is a malicious user B who wants to eavesdrop on her conversation with the website. Since the initial request goes over HTTP, user B can easily read it. He then starts acting as a relay between user A and the server. Once the request reaches the server, the server detects that the site needs to be loaded via HTTPS, so it sends back a redirect. The response, however, goes back to user B (instead of user A), who can strip the SSL and pass the response on to user A over unsecured HTTP. From then on, all communication from user A goes over HTTP through user B, who uses a secure channel to communicate with the server. User A has no way of telling whether the site is still on HTTP because of an attack or simply because it doesn't support SSL/TLS. This attack effectively blinds both user A and the server: it keeps user A on HTTP while making the server believe that the requests are coming over HTTPS as intended.
However, browsers like Chrome and Firefox implement a handy workaround that relies on a preloaded list of sites that support HSTS. The list is shipped with the browser, which uses it to decide whether a request (even the very first one) must go over HTTPS, depending on whether the corresponding domain is on the list. This, however, is not a complete solution, as the list can never really be exhaustive. What else could be done? How about not rendering any HTTP content at all when in doubt? Nah.. not all servers actually support SSL/TLS. Use AI to detect the scheme based on data….?? (that just blew my mind !!). Only time will tell…..
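For a feel of how such a preloaded list could be consulted before the very first request, here is a small sketch. The PRELOADED table and the is_preloaded function are invented for illustration. Real browser lists are far larger and stored differently, and the per-entry subdomain flag is an assumption of this example.

```python
# Hypothetical, tiny stand-in for a browser's preload list:
# host -> whether its subdomains are covered as well.
PRELOADED = {"foo.com": True, "bar.example": False}


def is_preloaded(host: str) -> bool:
    """Return True if the host, or a parent domain that covers it, is on the list."""
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        include_subdomains = PRELOADED.get(candidate)
        if include_subdomains is None:
            continue
        # An exact match always counts; a parent domain only counts
        # if its entry also covers subdomains.
        if i == 0 or include_subdomains:
            return True
    return False


for name in ("foo.com", "www.foo.com", "api.bar.example", "unknown.org"):
    scheme = "https" if is_preloaded(name) else "http"
    print(f"{name}: first request goes over {scheme}")
```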