PyCon 2012 Notes – Advanced Security Topics

Here are my raw notes.

Friday – 1:45pm – Advanced Security Topics

Paul McMillan

 

Hashing

Common crypto hashes: md5, sha1, sha256

If you’re typing md5 into your code, you’re probably doing it wrong

 

Message signing – did the message come from who I expect it to come from?

 

How can we be attacked?

 

Did this file get corrupted?

 

When doing a basic md5 check, we compare the file’s hash against the hash that we have. Where did we get that hash?

If we know that the md5 hash that we have is good, then we’re pretty secure – it’s hard to generate a different file that matches a given hash (reasonably hard problem).
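A minimal sketch of that integrity check, assuming the trusted hash was obtained over a channel the attacker cannot touch. The function name and the sample bytes are made up for illustration, and SHA-256 is used instead of md5, in line with the advice further down in these notes.

```python
import hashlib


def matches_known_hash(data, expected_hex):
    """Return True if the SHA-256 of data equals the trusted hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex


# Stand-in for a hash that was published over a trusted channel.
original = b"package contents"
trusted_hex = hashlib.sha256(original).hexdigest()

ok = matches_known_hash(original, trusted_hex)
corrupted = matches_known_hash(b"package contentz", trusted_hex)
```

The check is only as good as the source of `trusted_hex`: if the hash arrives alongside the file, an attacker who can change one can change the other.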

 

For message signing, it’s different.

H = md5(secret + message)

NO!

 

Given md5(secret + message), an attacker can compute a valid hash for a longer message without knowing the secret (a length-extension attack):

md5(secret + message + junk + attacker_message)

Maybe your app is strict about format and this will result in an error.

If an attacker can, given one message, sign a different message, it’s not good.

Solution: use HMAC

HMAC is essentially: hash(secret + hash(secret + message))

Use the hmac lib, and salt your secret key:

import hashlib, hmac

salt = b'session_cookie_signing'

hmac.new(salt + secret_key, msg, hashlib.sha256)

If you don’t salt based on the use in each area of your app, then an attacker could take a signed message from one area and use it in another area.
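Here is a small sketch of per-purpose salting, following the salt + secret_key approach from the talk. The key, the purpose strings, and the `sign` helper are all hypothetical names chosen for the example.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real one would be long and random.
SECRET_KEY = b"hypothetical-secret-key"


def sign(purpose, message):
    """HMAC-SHA256 of message under a key derived from the purpose salt."""
    key = purpose + SECRET_KEY
    return hmac.new(key, message, hashlib.sha256).hexdigest()


cookie_sig = sign(b"session_cookie_signing", b"user_id=42")
reset_sig = sign(b"password_reset", b"user_id=42")
```

The same message gets a different signature in each area, so a signed session cookie value cannot be replayed as a password reset token.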

 

Don’t use MD5! Avoid SHA1. Use SHA256 (for now).

There’s also SHA512, but it can have serious performance problems on 32-bit machines. SHA256 is similarly secure.

 

Encryption

You should not be implementing encryption, in almost all cases.

Why do you need it?

When protecting data in transit, use SSL/TLS.

Protecting data at rest: use underlying OS. There are already good solutions for this.

 

Random numbers

Generating a secret key:

 

import random

secret = ''.join(random.choice(allowed_chars) for i in range(length))

 

Don’t do that! The default random module in Python is predictable.

 

Solution:

from random import SystemRandom

SystemRandom will use entropy data from various sources.

 

But… you don’t know how much entropy is actually there. In some cases this can be a real problem, e.g. if you’re running right after a virtualized machine boots, there’s not a lot of entropy yet. It’ll run out of entropy and give you predictable numbers.
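A sketch of the secret key generator rewritten on top of SystemRandom. The function name, default length, and alphabet are assumptions for the example, not something from the talk.

```python
import string
from random import SystemRandom


def make_secret(length=50, allowed_chars=string.ascii_letters + string.digits):
    """Build a random string using the OS entropy source (os.urandom)."""
    rng = SystemRandom()
    return "".join(rng.choice(allowed_chars) for _ in range(length))


secret = make_secret()
```

Newer Python versions (3.6+) also ship a `secrets` module in the standard library that wraps the same OS source for this kind of use.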

 

Timing attacks

# incoming, salt and secret_key are bytes

message, sig = incoming.rsplit(b'|', 1)

sig2 = hmac.new(salt + secret_key, message, hashlib.sha256).hexdigest().encode()

if sig == sig2:

    do_something()

 

== is the problem because string comparison short-circuits at the first character that doesn’t match.

 

If you send a lot of messages, it’s possible to figure out a long string, one character at a time.

 

Very small difference but it’s statistically decidable in hundreds-to-millions of messages depending on the app. Over the internet there’s a lot of latency so it’s impractical. But a lot of apps are virtualized – I’m on the same machine as you, can ask you lots of questions really fast.

 

Solution: use a constant-time compare function.

Check that the lengths match, then do an operation on every pair of characters.

 

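def constant_time_compare(val1, val2):
    # The length is not secret, so returning early here is fine.
    if len(val1) != len(val2):
        return False
    result = 0
    # OR together the XOR of every character pair; never exit early.
    for x, y in zip(val1, val2):
        result |= ord(x) ^ ord(y)
    return result == 0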
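Newer Python versions (3.3+) include `hmac.compare_digest` in the standard library, which does this kind of comparison for you. Below is a sketch of the signature check from above rewritten with it. The key, the salt, and the "message|signature" wire format are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical key and salt for illustration.
SECRET_KEY = b"hypothetical-secret-key"
SALT = b"message_signing"


def sign(message):
    """Hex HMAC-SHA256 signature of message, as ASCII bytes."""
    digest = hmac.new(SALT + SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest.encode("ascii")


def verify(incoming):
    """Check a b'message|signature' blob without a short-circuiting ==."""
    message, sep, sig = incoming.rpartition(b"|")
    if not sep:
        return False  # no separator, so no signature to check
    return hmac.compare_digest(sig, sign(message))


good = b"hello|" + sign(b"hello")
forged = b"hello|" + b"0" * 64
```

Here `verify(good)` accepts the message and `verify(forged)` rejects it, and the time taken does not depend on how many leading characters of the forged signature happen to be right.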

 

Pickle

Unpickling an untrusted string is equivalent to running eval() on that string!
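To make that concrete, here is a harmless sketch of how a pickle can carry a call to an arbitrary function. The `Payload` class is made up for the demo; an attacker would craft the bytes directly and pick something nastier than arithmetic.

```python
import pickle


class Payload:
    """Object whose pickle tells the loader to call eval() on a string."""

    def __reduce__(self):
        # pickle stores "call eval with this argument" instead of object state.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The receiving side never sees the Payload class; loading just runs eval().
result = pickle.loads(blob)
```

After loading, `result` is the value the expression evaluated to, not a `Payload` instance, which shows that the code ran during `pickle.loads`.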

Use JSON or something, not pickle, for untrusted data.

If you must use pickle, sign and verify it… but use JSON if possible to avoid complexity of signing.
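A rough sketch of the sign-and-verify approach, assuming a server-side secret the attacker does not know. The key, the salt, and the helper names are hypothetical, and the layout (32-byte signature followed by the pickle) is just one possible choice.

```python
import hashlib
import hmac
import pickle

# Hypothetical key and salt for illustration.
SECRET_KEY = b"hypothetical-secret-key"
SALT = b"pickle_signing"
SIG_LEN = hashlib.sha256().digest_size  # 32 bytes


def dumps_signed(obj):
    """Pickle obj and prepend an HMAC-SHA256 signature of the pickle."""
    payload = pickle.dumps(obj)
    sig = hmac.new(SALT + SECRET_KEY, payload, hashlib.sha256).digest()
    return sig + payload


def loads_signed(blob):
    """Verify the signature first; only unpickle if it matches."""
    sig, payload = blob[:SIG_LEN], blob[SIG_LEN:]
    expected = hmac.new(SALT + SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature, refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"user_id": 42})
restored = loads_signed(blob)
```

The important detail is the order: the signature is checked before `pickle.loads` ever sees the bytes. This is the extra complexity that JSON lets you skip.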

 

“You would think that…”

No. Always verify your assumptions.

 

For example….

pip install Django

- I trust the Django authors

- I trust the people who run PyPI

- I trust the people who wrote pip

- pip verifies the md5 hashes of packages

- packages on pypi can be pgp signed

- pypi uses ssl

 

So I’m safe right?

Of course not… all of these things require you to trust everyone on the internet.

- pip verifies md5 hash, but it downloaded the hash from the same page it downloaded the package from, and that’s in plain text. If someone can change the code, they can change the hash.

- pgp signed — no tool currently checks pgp

- pypi uses ssl — not really

 

PyPI:

- untrusted (by default) certificate

- plaintext by default

- easy_install and pip don’t use SSL

Python doesn’t make it easy to check SSL certs. This is a problem.

 

You’d think that when you open a HTTPS url you’d get encryption… and you do, but it doesn’t verify the certificate.

 

Recommendation: use the ‘requests’ library. It *does* do the cert check by default.
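For comparison, here is a standard-library sketch of what “doing the cert check” means. It reflects newer Python versions rather than the state of things at the time of the talk: `ssl.create_default_context()` returns a context that verifies both the certificate chain and the hostname. The `fetch` helper is a made-up name and is not called here, so nothing touches the network.

```python
import ssl
import urllib.request

# A context with certificate and hostname verification switched on.
context = ssl.create_default_context()

verifies_cert = context.verify_mode == ssl.CERT_REQUIRED
verifies_host = context.check_hostname


def fetch(url):
    """Fetch url over HTTPS, failing if the certificate does not verify."""
    with urllib.request.urlopen(url, context=context) as response:
        return response.read()
```

With requests, the equivalent is simply `requests.get(url)`, since verification is on unless you pass `verify=False`.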

 

Demo

DNS lookup – UDP. Computer will trust the first response it gets back. So on open wifi, easy for someone to spoof your dns and say that pypi.org is whatever they want.

 

“sudo pip install certifi”

 

Runs setup.py — so anyone can put code in setup.py and then it’ll run on your computer as root.

 

PyPI offers everything necessary to make this not possible, but we don’t use it.

 

How do we help the python community?

- use the requests library

- if you’re using hashes, use hmac.

 

Q&A

Q: Putting packages on pypi — is there a better way to do this at a release-process level?

A: It’s a hard problem. If you put something in the package that verifies the package is what you expected, that’s no good. If you provide something alongside the package, it’s the same problem. Signed PGP files are great, but no one checks them. We need to make the tools.

 

Q: How much of this is actually new? How much of this is actually security-specific?

The HMAC stuff, for example, is all “don’t reinvent the wheel”, and security is the context. Are there cases where you have to do something because of a security reason? What about the signing stuff – that’s been seen before too.

A: None of this is new. But as a community we pretend it doesn’t exist. The end result of most conversations about this is “it’s not pypi’s problem, users should be doing something secure.” But that’s not the way to think about this: we should make the obvious way the right way.

Q: But right now it’s more work to do it the right way, so it’s bad engineering on the part of users.

A: Higher-level constructs tend to make code more secure. But have to use

 

Q: Why didn’t the demo work? Issues with the wifi – security features?

A: Script was set up for WPA2 network, but this is an older network.

 

Q: Timing attack – constant-time string compare is not the obvious way to do it…

A: Have to think about security implications. Python should have a constant-time compare function.

 

Q: Constant-time compare function probably has problems because of mallocs. Don’t think it’s possible to really have a constant time compare function.

A: Is possible to get very very close. Not possible in C either but can get close.

 

Q: Suggestions for downloading packages for doing releases?

A: Most large software projects have a system for finalizing packages. Unusual to install things directly from the cheese shop. Pip team is sprinting on adding auth.

 

Q: crate.io – what does this actually do to increase security if installers are still the same?

A: Storing sha256 hashes. Storing hashes long-term so you have a history of what files were there. In PyPI things can be changed without you noticing. crate will maintain a history.

 

 

 
