Back in 2012, when Juliano Rizzo and Thai Duong announced the CRIME attack, TLS / SSL Compression attack against HTTPS, the ability to recover selected parts of the traffic through side channel attacks has been proved. This attack was mitigated by disabling the TLS / SSL level compression by most of the browsers. This year at Black Hat, a new attack called BREACH (Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext) was announced which commanded the attention of entire industry. This presentation which came up with the title “SSL Gone in 30 seconds” is something that is not properly understood and hence there seems to be some confusion about how to mitigate this. So I felt this article would give some detailed insight into how notorious the attack is, how it works, how practical it is and what needs to be done to mitigate it. So let’s proceed and have a look at the same.
Unlike the previously known attacks such as BEAST, LUCKY etc, BREACH is not an attack against TLS. BREACH is basically an attack against the HTTP. If you are familiar with the famous Oracle Padding attack, BREACH is somewhat easy to understand. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted). The attacker just needs to trick the victim to visit a malicious link for executing the attack. Before going into the details, let me explain a little bit more about the basic things to know. Web pages are generally compressed before the responses are sent out, which is called HTTP Compression, primarily to make better use of available bandwidth and to provide greater transmission speeds. Browser usually tells the server (through ‘Accept-Encoding’ header), what compression methods it supports and server accordingly compresses the content and sends it across. If the browser does not support any compression then the response is not compressed. The most commonly used compression algorithms are gzip and deflate.
Accept-Encoding: gzip, deflate
When the content arrives, it is uncompressed by the browser and processed. So basically with SSL enabled web sites, the content is first compressed, then encrypted and sent. But you can determine the length of this compressed content even when it’s wrapped by SSL.
How it works?
The attack primarily works by taking leverage of the ‘compressed size’ of the text when there are repetitive terms. So here is a small example which explains how deflate takes advantage of repetitive terms to reduce the compressed size of the response.
1. Consider the below search page which is present after logging into this site:
2. Observe that the text highlighted in red box is the username. Now enter any text (say ‘random’) and click search.
3. So you can control the response through the input parameter in the URL. Now imagine what if the search term is ‘Pentesting’ (which is the username in this case).
Now when deflate algorithm is compressing the above response, it finds that the term ‘Pentesting’ is repeated more than once in the response. So instead of displaying it second time the compressor says ‘this text is found 101 characters ago’. So this reduces the size of the compressed output. In other words, by controlling the input search parameter you can guess the username. How? The compressed size would be least when the search parameter matches the username. This concept is the base for the BREACH attack.
Now let us see how an attacker would practically exploit this issue and steal any sensitive information. Consider the below site and assume a legitimate user has just signed in.
[Before signing in to the application]
[Search page which is accessible after logging in]
As shown in the above figure, also assume that there is some sensitive data which is present in the Search page for example let card number be one such sensitive data in the application. When the user searches for something (say ‘test’) below is the message displayed.
Now the attacker can also get the corresponding compressed sizes of the responses for each of these requests. Can you guess why the compresses sizes for each of these responses would differ and can you guess which request would have the smallest compressed size? Below would be the request with the smallest compressed sizes.
Below is the explanation as to why the above requests have least compressed sizes. Now take the first request. Below would be the response from the server.
As shown above, when the deflate algorithm encounters this, it makes an easy representation of the repetitions and thus results in a least compressed size. So by analyzing the compressed size for each of the requests from 100-10000, an attacker can simply deduce what the card number is in this case. This the beauty of this attack lies in the fact that we did not decrypt any traffic but just by analyzing the size of the responses we were able to predict the text.
To summarize in simple steps, for an application to be vulnerable to this breach attack, below are the conditions that it must fulfill:
1. The server should be using HTTP level compression.
2. A parameter which reflects the input text. (This will be controlled by the attacker).
3. The page should contain some sensitive text which would be of interest to the attacker.
Turning off the HTTP Compression would save the day but that cannot be a possible solution since all the servers rely on it to effectively manage the bandwidth. Here are some of the other solutions that can be tried:
- Protect the vulnerable pages with a CSRF token
- Adding random bytes to the response so as to hide the actual compressed length.
- Separating the sensitive data from pages where input text is displayed.