Wednesday, June 2, 2010

Security Matters

When I was little, I recall watching this popular science program in which Peter Ustinov popularized the theory of relativity. There was a rocket, a man on the rocket, a launch of the man and the rocket, and a transmission from the really fast man on the really fast rocket. The man on the really fast rocket saw earthlings slow down. Conversely, the earthlings saw the man talk really fast. Makes sense?

No, it doesn't. Even a cursory understanding of relativity tells you that the effect has to be symmetric: if the man sees earth's time slow down, earth must see his time slow down too, not speed up. So both the man on the rocket and the earth station would have observed an apparent slowdown in the other's time. Of course, that seems confusing, since once the man on the really fast rocket returns to earth, much more time will have passed on earth than for the man - but the reason for that is not speed, but acceleration.

This intro serves to explain something that has been bothering me for a while: the way people misunderstand information security concepts and continually use the wrong thing for the right purpose. It's really not hard, since there are only a few, very distinct concepts - yet people get them wrong all the time. It's a little as if people take "security" as a one-size-fits-all umbrella, and doing something secure means doing everything under the umbrella.


I was reading this Slashdot article this morning. Apparently, people were using TOR routing to send confidential information back and forth, not realizing that TOR anonymizes connections (that is, it hides the correlation between information source and destination) but not content. Anyone running a TOR exit node can snoop on all the data leaving the network through it, and if the data is not protected, it's fair game.

So, what are the fundamental things you can do in security? Here is a partial list:
  • Protect content from eavesdropping (encryption in transit)
  • Protect content as stored (encryption at rest)
  • Ensure the content received is the content you sent (signing and private key encryption)
  • Ensure only the intended recipient can read the content you sent (public key encryption)
  • Ensure nobody knows sender and recipient are talking to each other (anonymization)
  • Ensure the content you received is really from the sender you expect (PKI, certificates)
  • Ensure the person connecting with you is who they say they are (login)
  • Ensure the connection made is from a person you have logged in (authentication)
  • Ensure the person who is requesting an action is allowed to perform it (authorization)
At first glance, all of these may seem to be one and the same thing, but they really aren't. If you try to accomplish one of the tasks on the list with the solution to another (for instance, anonymizing a connection and expecting the content to be protected), you gain nothing and most likely make things worse than if you had done nothing at all.

Protecting Content - Encryption

When you don't want other people to see the content you are sending or receiving, you encrypt it. Encryption comes in many forms, but the most important distinction for us is whether you want to encrypt something permanently or only during transmission.

Here you have to understand that "transmission" is a technical term and means "exchange of messages between two end points." An email, for instance, is not a "transmission," because it is handled by as many intermediate "end points" (mail servers) as necessary, each of which may store it along the way. Hence, to the technical user, an email has to be treated as something in need of permanent encryption.

Now, how do you encrypt your message? You have a lot of options - some better, some worse. In general, think of encryption as a lock box into which you put the message. The better the box, the safer the message. The lock matters just as much: if the key is easy to copy or fake, your safety is gone.

The nerd distinguishes between two types of encryption: symmetric, in which the same key is used to lock and unlock the box, and asymmetric, in which two different keys do the trick. In general, symmetric encryption is easier to handle, but asymmetric encryption is much more powerful.
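To make "the same key locks and unlocks the box" concrete, here is a minimal Python sketch of a toy symmetric cipher. It is an illustration under loud assumptions, not a real standard: it XORs the message with a pseudo-random stream derived from the key via SHA-256 and offers no protection against tampering. Real software uses vetted standards such as AES instead; the function names are hypothetical.

```python
import hashlib
import secrets

NONCE_SIZE = 16  # random bytes stored in front of each box so the same key never reuses a stream


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy only: stretch key + nonce into `length` pseudo-random bytes (SHA-256 in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def lock(key: bytes, message: bytes) -> bytes:
    """Put the message in the box: XOR it with the keystream, keep the nonce up front."""
    nonce = secrets.token_bytes(NONCE_SIZE)
    stream = keystream(key, nonce, len(message))
    return nonce + bytes(m ^ s for m, s in zip(message, stream))


def unlock(key: bytes, box: bytes) -> bytes:
    """Open the box with the SAME key: XORing with the same keystream undoes the lock."""
    nonce, body = box[:NONCE_SIZE], box[NONCE_SIZE:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


box = lock(b"shared secret", b"Meet me at noon")
print(unlock(b"shared secret", box))  # b'Meet me at noon'
```

Both sides need `b"shared secret"`, which is exactly the handling problem of symmetric encryption: the key itself has to reach the recipient somehow.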

Think of the encryption standard you use as the lock box itself, and the particular password as the key to the box. Sometimes the password is called a passphrase or key, or more generally the secret. It really has the same function as the key to the box: it makes sure that only the person who holds this particular key can open it.

Typically, you do not have to consciously choose a standard. The software you use to encrypt data will choose an appropriate one, and if you keep the software up to date, it will also switch standards as security improves.

Asymmetric Encryption - Private and Public Keys

With symmetric encryption, both end points use the same key to encrypt and decrypt data. For instance, you would put a password on a ZIP file, and the recipient would use the same password to open it. In asymmetric encryption, though, encryption happens with one password and decryption with another. How is this possible? Well, the two passwords are not chosen randomly. Instead, they are mathematically linked so that each is the exact counterpart of the other, and such passwords are always generated in pairs.

Just like in a Sudoku puzzle, where you have plenty of leeway in placing the numbers but ultimately face strict constraints, in asymmetric encryption you cannot choose just any pair of passwords. You cannot even choose one of the two. Instead, the two passwords are generated for you, and they are made up of gobbledygook that only security software is really happy with.

The two passwords, or keys, have slightly different functions. One of the two can be used to generate the second, but the second cannot be used to generate the first. Because of this, the first one is more valuable and must be kept safe at all times. The other one, on the other hand, gives nothing away about the first, so you can handle it much less strictly. That's how they got their respective names: the first one is called the private key, the other one the public key.
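The one-way relationship between the two keys can be sketched with textbook RSA, one common asymmetric scheme (the post doesn't name a particular one). This minimal sketch assumes tiny, hand-picked primes so the numbers stay readable; real keys use primes hundreds of digits long. The function names are hypothetical, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
from math import gcd


def generate_key_pair(p: int, q: int, e: int = 17):
    """Toy RSA key generation from two primes p and q (far too small for real use)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must share no factor with (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse of e: the private exponent
    private_key = {"p": p, "q": q, "e": e, "d": d}
    return private_key, public_key_from(private_key)


def public_key_from(private_key: dict) -> tuple:
    """Private -> public is cheap: just multiply the primes."""
    return (private_key["p"] * private_key["q"], private_key["e"])


# Public -> private would mean factoring n back into p and q:
# trivial for this toy n, but infeasible for a real 2048-bit n.
private_key, public_key = generate_key_pair(61, 53)
print(public_key)        # (3233, 17)
print(private_key["d"])  # 2753
```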

Private keys are so important that you usually encrypt them with symmetric encryption to ensure nobody can use them even if they get to them - at least not use them quickly. So, when you generate a key pair, the software will ask you to assign a passphrase to the private key (not to the public key).
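A minimal sketch of protecting a private key with a passphrase, assuming a toy construction: the standard-library `hashlib.pbkdf2_hmac` stretches the passphrase into a keystream that is XORed with the key bytes. The high iteration count is what makes each guess slow ("at least not use them quickly"). Real tools use standardized, authenticated formats; `lock_private_key`, `unlock_private_key`, and the key bytes are hypothetical.

```python
import hashlib
import secrets

ITERATIONS = 200_000  # deliberately slow: every passphrase guess costs an attacker this much work


def _stream(passphrase: str, salt: bytes, length: int) -> bytes:
    # PBKDF2 turns the passphrase plus salt into exactly `length` pseudo-random bytes.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, ITERATIONS, dklen=length)


def lock_private_key(passphrase: str, private_key: bytes) -> bytes:
    """Toy only: fresh random salt, stored in front of the scrambled key."""
    salt = secrets.token_bytes(16)
    stream = _stream(passphrase, salt, len(private_key))
    return salt + bytes(a ^ b for a, b in zip(private_key, stream))


def unlock_private_key(passphrase: str, blob: bytes) -> bytes:
    """A wrong passphrase yields garbage, not an error: this toy has no integrity check."""
    salt, body = blob[:16], blob[16:]
    stream = _stream(passphrase, salt, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


secret_key = b"p=61;q=53;d=2753"  # hypothetical serialized private key
locked = lock_private_key("my long passphrase", secret_key)
print(unlock_private_key("my long passphrase", locked) == secret_key)  # True
```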

While the asymmetry is fundamental, there is one thing in which the key pair is symmetric: what is encrypted with either key can only be decrypted with the matching other. Since only you have the private key, that makes for all sorts of interesting applications.
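The "either key opens what the other locked" property can be checked with the well-known textbook RSA numbers n = 3233, e = 17, d = 2753 (toy-sized, for illustration only). In RSA, locking and unlocking are the same operation, raising a number to the key's exponent modulo n; this sketch assumes the message has already been turned into a number smaller than n.

```python
# Textbook toy RSA key pair (61 * 53); far too small for real use.
N = 3233          # modulus, part of both keys
PUBLIC_E = 17     # public exponent
PRIVATE_D = 2753  # private exponent


def apply_key(number: int, exponent: int) -> int:
    """RSA's single core operation: number ** exponent mod N, for locking and unlocking alike."""
    if not 0 <= number < N:
        raise ValueError("the toy can only handle numbers from 0 to N - 1")
    return pow(number, exponent, N)


message = 65  # e.g. the character code of "A"

locked_with_public = apply_key(message, PUBLIC_E)
print(locked_with_public)                        # 2790
print(apply_key(locked_with_public, PRIVATE_D))  # 65: the private key opens it

locked_with_private = apply_key(message, PRIVATE_D)
print(apply_key(locked_with_private, PUBLIC_E))  # 65: the public key opens it
```

Locking with the public key is the basis of application 1 below; locking with the private key is the basis of application 2.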

1. Sending a Message Only You Can Read

You are the only one with the private key. Anyone who encrypts a message with your public key ensures that nobody can read it but you. Not even they can read it once it's encrypted!

2. Sending a Message That's Certainly Yours

When you send a message encrypted with your private key, anyone with the public key can decrypt it. But since they have to use your public key, they know the message came from you, since only you can encrypt with the private key.

Signing

As we just saw, when you send a message encrypted with your private key, you essentially state that the message came from you. But what if you want anyone to be able to read the message as is, and still want to ensure they know it's really from you?

Decades of unreliable connections have left us with the concept of a checksum. That's a number computed from the content of a message or file that serves as a summary, or digest, of the message itself. The message can be any length, while the digest is typically only a few bytes; its only function is to tell you whether the message was received accurately.

To give you a rough idea of how that works, imagine that you take the value of each letter of the alphabet and add them all up. You tack on the sum at the end of a message, and that's your digest. A = 1, B = 2, etc. Then the digest of this paragraph is 7997.
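The letter-sum digest is easy to write down. This sketch assumes that upper and lower case count the same and that anything other than the letters A to Z is ignored; the function name is hypothetical. It also shows why real systems use stronger digests such as CRC32 or SHA-256: the toy sum can't tell reordered letters apart.

```python
def letter_sum_digest(text: str) -> int:
    """Toy checksum: A = 1, B = 2, ..., Z = 26, summed over every letter.

    Assumption: case is ignored and non-letters (spaces, digits, punctuation) are skipped.
    """
    return sum(ord(ch) - ord("A") + 1 for ch in text.upper() if "A" <= ch <= "Z")


print(letter_sum_digest("CAB"))    # 6  (3 + 1 + 2)
print(letter_sum_digest("Hello"))  # 52 (8 + 5 + 12 + 12 + 15)

# A garbled transmission usually changes the digest...
print(letter_sum_digest("Hellp"))  # 53
# ...but this toy is weak: reordering the letters leaves the sum unchanged.
print(letter_sum_digest("BAC"))    # 6
```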

Now, imagine you create a digest of your message, but encrypt it with your private key before tacking it onto the message. Suddenly, only you can have created that digest, and anyone with your public key will be able to decrypt it, compute the digest of the message they received, and compare the two. If they match, they know that the message is yours and that it was received as sent. That's quite brilliant, because it allows you to send something you don't mean to hide from view, while giving people who care a good way of verifying it's really from you.
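Signing as just described can be sketched with the textbook toy RSA numbers n = 3233, e = 17, d = 2753: hash the message, encrypt the digest with the private exponent, and let the recipient decrypt it with the public exponent and compare. Assumptions: the SHA-256 digest is reduced modulo the tiny n so the toy key can handle it, and no padding is used. Real signature schemes (for example RSA-PSS) add padding and use far larger keys; the function names are hypothetical.

```python
import hashlib

# Textbook toy RSA key pair (61 * 53); real keys are thousands of bits long.
N, PUBLIC_E, PRIVATE_D = 3233, 17, 2753


def digest(message: bytes) -> int:
    """SHA-256 digest, squeezed below N so the toy key can handle it (an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Encrypt the digest with the PRIVATE key; the result is tacked onto the message."""
    return pow(digest(message), PRIVATE_D, N)


def verify(message: bytes, signature: int) -> bool:
    """Decrypt the signature with the PUBLIC key and compare it to a freshly computed digest."""
    return pow(signature, PUBLIC_E, N) == digest(message)


message = b"The meeting moves to Thursday."
signature = sign(message)
print(verify(message, signature))            # True
print(verify(message, (signature + 1) % N))  # False: a forged signature doesn't decrypt to the digest
```

A changed message fails the same check in practice too, although with a toy n this small roughly one altered message in 3233 would collide by chance, which is one more reason real keys are huge.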

Certificates and Chain of Trust

Now, imagine you wanted to send something to someone who doesn't have your public key. You want it to be either encrypted or signed; in any case, you want them to know it's from you and only from you. Well, to do that, you would have to send them your public key, no? Easy!

But wait! If you send them your public key, how do they know it's your public key? Imagine a rogue government that listens in on message exchanges and substitutes its own public key for any other one it detects. Once it does that, it controls all encrypted traffic. And it doesn't even have to be a government - imagine the WiFi network at your coffee shop!
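The coffee-shop attack can be simulated in a few lines with toy RSA (tiny primes, no padding, messages as small numbers). Everything here is hypothetical: "Mallory" stands for whoever controls the network, and all she has to do is swap one public key in transit.

```python
def make_keys(p: int, q: int, e: int = 17):
    """Toy RSA key pair from two small primes: returns (public, private)."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def encrypt(number: int, public_key: tuple) -> int:
    n, e = public_key
    return pow(number, e, n)


def decrypt(number: int, private_key: tuple) -> int:
    n, d = private_key
    return pow(number, d, n)


alice_public, alice_private = make_keys(61, 53)
mallory_public, mallory_private = make_keys(47, 59)  # whoever runs the coffee-shop WiFi

# Alice sends her public key to Bob, but Mallory swaps in her own on the way.
key_bob_receives = mallory_public

secret = 1234  # Bob's message, as a number below both toy moduli
ciphertext = encrypt(secret, key_bob_receives)
stolen = decrypt(ciphertext, mallory_private)   # Mallory reads it...
forwarded = encrypt(stolen, alice_public)       # ...and re-encrypts it so nobody notices
print(stolen, decrypt(forwarded, alice_private))  # 1234 1234
```

Both Bob and Alice see a perfectly working encrypted exchange; nothing in the math tells them the key was swapped.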

Fortunately, there is a solution to that, in the form of certificates. The idea here is that there is someone in the world you trust, because they are known to be trustworthy and because they have a process in place that ensures their word is worth your trust. You get their public key, and whenever they send you a message encrypted with the corresponding private key, you know it's them.

Imagine now they sent you a message saying something like, "Yes, I know XYZ, and if they tell you abc is their public key, then that's right." Why, then you could trust the public key "abc"! Note that you have to trust a lot here: you have to trust the verifier both to keep their list of good public keys safe and not to snoop on the messages you send using "abc". After all, they told you "abc" is good!
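A certificate is, at its core, the verifier's signature over the statement "XYZ's public key is abc". This minimal sketch assumes toy RSA for the verifier's key pair and a made-up text format for the statement; real certificates follow the X.509 standard and use much larger keys. The names `issue_certificate` and `check_certificate` and the key `(3127, 3)` are hypothetical.

```python
import hashlib

# The verifier's toy RSA key pair (61 * 53); everyone already holds the public half.
CA_N, CA_E, CA_D = 3233, 17, 2753


def statement(name: str, public_key: tuple) -> bytes:
    """The claim being vouched for, in a made-up text format."""
    return f"{name} owns public key {public_key}".encode()


def digest(data: bytes) -> int:
    # SHA-256 reduced below the toy modulus (an assumption of this sketch).
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def issue_certificate(name: str, public_key: tuple) -> dict:
    """The verifier signs the claim with its PRIVATE key."""
    signature = pow(digest(statement(name, public_key)), CA_D, CA_N)
    return {"name": name, "public_key": public_key, "signature": signature}


def check_certificate(cert: dict) -> bool:
    """Anyone with the verifier's PUBLIC key can confirm the verifier made this claim."""
    expected = digest(statement(cert["name"], cert["public_key"]))
    return pow(cert["signature"], CA_E, CA_N) == expected


cert = issue_certificate("XYZ", (3127, 3))  # hypothetical key "abc" belonging to XYZ
print(check_certificate(cert))              # True
forged = dict(cert, signature=(cert["signature"] + 1) % CA_N)
print(check_certificate(forged))            # False
```

The sketch also makes the trust problem visible: whoever holds `CA_D` can issue a valid-looking certificate for any key they like.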

In practice, that's done all the time without your knowing it. Browsers connect to secure sites (the ones whose URL starts with https:// instead of http://) by doing just that: the site presents a certificate that contains some basic information, including the site's public key and the name of someone trustworthy who vouches for the information on the certificate. That someone is vouched for by someone else, who is vouched for by someone else again, until the chain reaches someone your browser already trusts. In the end, you trust that first certificate because of all the other people's and sites' trust.
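Walking the chain can be sketched the same way. Assumptions: toy RSA keys built from small primes, a made-up certificate layout, and hypothetical names (`Root CA`, `Intermediate CA`, `mail.example.com`). Each certificate must be signed by the next one up, and the last one by a key the browser already trusts. Real browsers also check expiry dates, host names, and revocation, which this sketch leaves out.

```python
import hashlib


def make_keys(p: int, q: int, e: int = 17):
    """Toy RSA key pair from two small primes: returns (public, private)."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))


def digest(subject: str, public_key: tuple, n: int) -> int:
    data = f"{subject} owns public key {public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def issue(subject: str, subject_key: tuple, issuer: str, issuer_private: tuple) -> dict:
    n, d = issuer_private
    signature = pow(digest(subject, subject_key, n), d, n)
    return {"subject": subject, "public_key": subject_key, "issuer": issuer, "signature": signature}


def verify_chain(chain: list, trusted_roots: dict) -> bool:
    """chain[0] is the site's certificate; each must be signed by the next, the last by a trusted root."""
    for i, cert in enumerate(chain):
        if i + 1 < len(chain):
            parent = chain[i + 1]
            if parent["subject"] != cert["issuer"]:
                return False  # the chain is broken
            issuer_key = parent["public_key"]
        else:
            issuer_key = trusted_roots.get(cert["issuer"])
            if issuer_key is None:
                return False  # nobody we already trust vouches for the top
        n, e = issuer_key
        if pow(cert["signature"], e, n) != digest(cert["subject"], cert["public_key"], n):
            return False  # signature doesn't match
    return True


root_public, root_private = make_keys(61, 53)  # shipped with the browser
mid_public, mid_private = make_keys(47, 59)
site_public, _ = make_keys(71, 89)

chain = [
    issue("mail.example.com", site_public, "Intermediate CA", mid_private),
    issue("Intermediate CA", mid_public, "Root CA", root_private),
]
print(verify_chain(chain, {"Root CA": root_public}))  # True
print(verify_chain(chain, {}))                        # False: no trusted root
```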

When your browser connects to a secure URL, e.g. https://mail.google.com, it immediately demands a certificate. Only once it has verified all the data and the certificate chain does it talk to the server and trust it. Browsers usually display that in a very non-obvious way, for instance by showing a little lock at the bottom of the screen. If you click on the lock, you'll probably get the certificate information on screen. When you look at it, you'll notice that the fields in there all make sense now.
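The same checks a browser performs are available from Python's standard `ssl` module. This sketch assumes Python 3: `ssl.create_default_context()` loads the system's trusted root certificates and refuses connections whose chain or host name doesn't check out. The helper `fetch_certificate` is hypothetical, and since the actual connection needs network access, that part is left commented out.

```python
import socket
import ssl


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Connect like a browser would; the handshake fails unless chain and host name check out."""
    context = ssl.create_default_context()  # trusted roots from the operating system
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()  # the verified certificate, as a dict


# The default context insists on both checks.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True

# Needs network access:
# cert = fetch_certificate("mail.google.com")
# print(cert["subject"], cert["issuer"], cert["notAfter"])
```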

Anonymization

Sometimes it's just as important to know that two parties talked as it is to know what they said. You probably remember how big a deal it was when rumors surfaced that an Al Qaeda operative had been in secret talks with the Iraqi government during the build-up to the Iraq War. It didn't really matter what they had said: it was the fact itself that they had talked that mattered.
 
Frankly, on the Internet the problem is less one of spies and terrorists and more one of not wanting certain parties to know you are using certain services. Maybe you don't want your Internet provider to know you are surfing for porn, or maybe you don't want your government to know that you are reading up on a massacre it perpetrated.

Anonymization provides that level of protection. It takes a message and bounces it around so that neither the end point nor anyone untrusted in the middle knows where it came from.

Today, there are two main forms of anonymization available: trusted proxies and TOR. The former is mostly used to bypass restricted networks, the latter... oh, well, look it up yourselves, I am not going to get into a controversy here.

Basically, in both cases the idea is to connect through an encrypted tunnel. Your requests and the responses may themselves be in the clear; only the tunnel through which they flow is encrypted, and only up to the point where it ends.

Let's consider the easier case of a trusted proxy, since it is conceptually the same as TOR. Assume you are in a country that doesn't allow you, I don't know, to search on Google. It does, though, allow you to connect to secure sites. Now, since the content of connections to secure sites cannot be read by whoever monitors them, what if there were a secure site that just went out and searched Google for you?

Well, that's what a secure proxy does: you connect to it, and it connects on your behalf to the site you really want to go to. When the response comes back, the proxy sends it to you as its very own response, but through the secure channel that only it and you can penetrate.

The downside to this is that the proxy will see everything you are sending and receiving, including passwords and content - whatever you didn't want others to see. That's why you need to establish that the proxy is trusted: if it betrays you, it will know everything there is to know about you.
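The TOR situation from the beginning of the post can be illustrated with a toy version of layered ("onion") encryption. Assumptions: a hypothetical XOR-with-SHA-256 stream as the cipher, made-up relay keys, and a made-up request. Each relay can peel off only its own layer, and the request is still scrambled after the first two hops; the exit relay, however, like the proxy above, ends up holding the request in the clear.

```python
import hashlib


def toy_layer(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256-based keystream. Applying it a second time removes it."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


# Hypothetical keys the sender shares with each relay on the path.
relay_keys = [b"entry relay key", b"middle relay key", b"exit relay key"]
request = b"POST /login user=alice&password=hunter2"

# The sender wraps one layer per relay (with real ciphers the order matters; with XOR it doesn't).
onion = request
for key in reversed(relay_keys):
    onion = toy_layer(key, onion)

# Each relay peels off its own layer and forwards the rest.
for hop, key in enumerate(relay_keys):
    onion = toy_layer(key, onion)
    print(hop, onion == request)  # 0 False, 1 False, 2 True

print(onion.decode())  # the exit relay reads the password in the clear
```

This is why anything confidential sent through TOR or a proxy still needs its own end-to-end encryption, such as https.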

Lastly: Login, Authentication, Authorization

One of the questions that comes up regularly is, "Why is it that the server has to prove who it is with a certificate, and the user doesn't?"


Indeed, the web would be a much better place if client certificates were required. Unfortunately, when the whole security thing came up, trusted certificates were expensive and really hard to come by, so there was no way to ensure everybody used one. Even more damningly, it is hard to set up a server that authenticates users by certificate, so even if you had one, it would be of little use.

Instead, the web moved to a model in which you first prove who you are by providing credentials. You essentially come to a web server as an unknown entity, but you provide a set of required data items and the web server accepts you. Usually, that's a user name and password combination, but some sites request more (like a rotating security question).
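On the server side, checking a user name and password is best done without storing the password itself. This minimal sketch assumes a salted, deliberately slow PBKDF2 hash from Python's standard library and a hypothetical in-memory user table; real sites keep this in a database, and `check_login` is a hypothetical name.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 200_000  # slow on purpose, so guessing passwords from a stolen table is expensive


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """What the server stores at sign-up: a random salt and a slow hash, never the password."""
    salt = salt if salt is not None else secrets.token_bytes(16)
    return salt, hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical user table, filled in at sign-up.
USERS = {"alice": hash_password("correct horse battery staple")}


def check_login(username: str, password: str) -> bool:
    """The login step: do the submitted credentials match what we stored?"""
    record = USERS.get(username)
    if record is None:
        return False
    salt, stored = record
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison


print(check_login("alice", "correct horse battery staple"))  # True
print(check_login("alice", "password123"))                   # False
```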

The point of this login phase is to establish that you are who you say you are and to provide you with "something" that tells the server on your next request that it's you. This avoids having to send your user name and password back and forth all the time.

When you connect to the server again, for instance because you submitted a form, you present this "something" (usually a combination of an encrypted cookie and an encrypted form field), and the server makes sure that those credentials are valid. Notice that we replaced something you provided (user name and password) with something the server provided. Since the server owns the new credentials, it can do with them as it pleases, including declaring them invalid at any point.
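A minimal sketch of the "something" the server hands out after login, and of the check on each later request. As an assumption, it swaps in a random session token looked up in server-side storage instead of the encrypted cookie and form field described above; the idea is the same: the server issues the credential, checks it every time, and can revoke it whenever it likes. All names and the 30-minute lifetime are hypothetical.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

SESSION_LIFETIME = 30 * 60  # seconds; hypothetical, every site picks its own
_sessions: Dict[str, Tuple[str, float]] = {}  # token -> (username, expiry timestamp)


def start_session(username: str) -> str:
    """Called once, after a successful login: hand out a random, unguessable token."""
    token = secrets.token_urlsafe(32)
    _sessions[token] = (username, time.time() + SESSION_LIFETIME)
    return token  # sent to the browser, e.g. in a cookie


def authenticate(token: str) -> Optional[str]:
    """Called on EVERY request touching user data: whose token is this, and is it still valid?"""
    entry = _sessions.get(token)
    if entry is None:
        return None
    username, expires = entry
    if time.time() > expires:
        del _sessions[token]
        return None
    return username


def end_session(token: str) -> None:
    """The server issued the token, so it can declare it invalid at any time."""
    _sessions.pop(token, None)


token = start_session("alice")
print(authenticate(token))  # alice
end_session(token)
print(authenticate(token))  # None
```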

Every web service must include this verification step in every request that touches user data. Failing to do so can cause some of the worst security issues, which is what gives this step its fundamental importance in web security. It's called authentication.

Once you are authenticated, the server still has to decide whether you may do what you ask. Some users may be allowed to do things that others are not, for instance perform administrative tasks. Sometimes that's handled by creating separate applications for different tasks, with separate user accounts on each. But more and more frequently, applications are merged for ease of development, and access control lists are used instead. This final step is called authorization.
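Authorization can be sketched as a lookup in an access control list, run only after authentication has succeeded. This assumes a hypothetical in-memory ACL and maps the two failure cases onto the HTTP status codes commonly used for them: 401 when the server doesn't know who you are, 403 when it knows but won't let you.

```python
from typing import Dict, Optional, Set

# Hypothetical access control list: action -> users allowed to perform it.
ACL: Dict[str, Set[str]] = {
    "read_own_mail": {"alice", "bob", "carol"},
    "delete_user": {"carol"},  # only the administrator
}


def authorize(username: str, action: str) -> bool:
    """Authorization: the user is already known; may they do THIS? Unknown actions are denied."""
    return username in ACL.get(action, set())


def handle_request(username: Optional[str], action: str) -> str:
    if username is None:
        return "401 Unauthorized"  # authentication failed: we don't know who you are
    if not authorize(username, action):
        return "403 Forbidden"     # we know who you are, but you may not do this
    return "200 OK"


print(handle_request("carol", "delete_user"))  # 200 OK
print(handle_request("bob", "delete_user"))    # 403 Forbidden
print(handle_request(None, "read_own_mail"))   # 401 Unauthorized
```

Denying any action missing from the list is a deliberate choice: a forgotten entry then fails safe. And HTTP's own naming mixes up the very concepts this post is about: status 401 is called "Unauthorized" but actually signals failed authentication.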

Summary

At this point, you should understand the fundamental differences between the terms used in security and be able to make informed choices among a variety of options. Please comment with omissions and requests for clarification.
