The “Do it the Hard Way” Trap

I’ve recently noticed that I’ve been intentionally doing more things the hard way.

The allure is this: the hard way feels like it ought to produce the best results. If you’re uniquely talented or qualified to do whatever it is you’re doing the “hard” or “best” or “right” way, it should follow that doing so will give you a competitive advantage. You’ll have the best product, and no one else will be able to replicate it.

This of course is a fallacy. Why? Because the relationship between raw effort and results is not linear, or even strongly correlated. Returns often diminish past a threshold, and only effort in the direction of the goal counts (think of the dot product of effort and goals).

The Hard Way is a tempting trap. We’ve been conditioned toward it in school, having been rewarded for going “above and beyond” to get the A+. Brute force doesn’t require much creativity, but it’s still Hard. It can feel like a sure-fire way to build the best product or get the most customers: customers will recognize your greatness, and competitors won’t be able to replicate you, because you did it The Hard Way. But customers don’t care how hard it was to build your product; they care how well it solves their problems. And smart competitors will find easier and more effective ways to compete.

Building a custom datastore for your web app? Maybe you should use something off the shelf instead.

Trying to win by making your design or marketing copy or user interface “perfect”? Consider whether “pretty good” is good enough. (This is what I’ve been catching myself doing lately.)

Doing things The Easy Way requires constant reflection about how well your effort is aligned with what you’re trying to accomplish, and being ruthless about cutting out work that isn’t pulling its weight. Are you getting a good return on your effort? What can you get by without doing? Can you be more focused? Every time I stop to ask these questions, I’m glad I did.

How to stream the World Series live if you’re in the US

MLB.tv has a legit, HD, live stream of the World Series… but it’s blacked out in the US and Canada. Here’s what I did to be able to watch the game from the comfort of my desk at work.

Things to note:

  • This is not free; it’ll cost you around $30 to watch the rest of the World Series.
  • The following guide is somewhat technical (though doesn’t require any programming), and is Mac-specific.
  • Do this at your own risk; I bear no responsibility whatsoever for what happens if you follow the instructions below.

Part 1: Get a Linode server in the UK

I may have left out a few minor steps here — if you get stuck, ask for help in the comments.

a. Sign up for an account at http://www.linode.com . Choose the smallest server size – the $19.99/month one is fine.

Image

It’s $19.99 for a whole month, but Linode will reimburse you for unused full days when you cancel. So just make sure to cancel it when you’re done with it after the World Series.

b. Choose “London” as the location

c. Go to the Dashboard for your new Linode, and click “boot”. Choose “Ubuntu 12.04 LTS” as the operating system. Use the defaults for disk and swap size.

d. Wait for it to finish booting – will take a few minutes. You’ll know it’s done when you see this on the top right:

Image

f. Click to the Remote Access tab. Next to where it says “SSH Access”, note the IP address.

Image

g. Open a Terminal window. (Press command-space to open Spotlight, then type Terminal, then select the Terminal app and press enter.)

Image

h. Type the following command (using your IP address from step (f) instead of 100.101.102.103) and press enter:

ssh -C -D 8080 root@100.101.102.103

You’ll be prompted for the root password you created when setting up your Linode. Enter it now.

i. You now have an SSH tunnel set up that will let you proxy your Mac’s internet traffic through your London-based Linode. Don’t close this window. You can minimize it if you want. If your connection gets disrupted, you may need to run the above command again to log back in.

Part 2: Tunnel your internet traffic through your new Linode

a. Open up System Preferences (command-space to open Spotlight, type System Preferences, choose the System Preferences app, press enter), then open the Network pane.

Image

b. Click “Advanced…”

Image

c. Click Proxies

Image

 

d. Check the box next to “SOCKS Proxy” (make sure it’s the only box checked). Enter “localhost” for the proxy server name, and 8080 for the port number.

Image 

e. Press OK, then Apply.

Image

f. You’re now tunneling your HTTP traffic through your Linode. To check that it worked, go to http://www.whatismyip.com and confirm that it shows your Linode’s IP address.

Part 3: Sign up for MLB.tv

a. Go here and sign up. It’s $24.99 for the rest of the postseason.

b. Once you log in, you’ll see a page like this:

Image

c. At game time, press “Watch”. It’ll pop up the MLB.tv player. If all is well, the player will load and the stream will be live!

If you get a message about blackout restrictions, it probably means that something in Part 2 didn’t go right. Make sure that your IP address according to whatismyip.com is your Linode’s IP, and try again.

Good luck and go Giants!

What Adobe’s new pricing for Flash means for social game developers

My jaw dropped this morning when I read the news: Adobe will begin charging a 9% revenue share on revenue above $50k for using some “premium features” in Flash Player. The new pricing goes into effect for apps launched after August 1st that use both of the following features:

  • ApplicationDomain.domainMemory
  • Stage3D.request3DContext, when using hardware acceleration

These features are used together by the Unity and Unreal engines and allow console-quality games to be played in the browser. They could also be used together inadvertently by developers rolling their own optimizations; for example, using domainMemory for code optimizations and Stage3D for rendering would also fall under the new terms.

Up until today, developers like me have assumed that, like everything else in Flash Player, these features would always be free.

So, what exactly does “9% of net revenue” mean for social game developers? In the cutthroat, thinning-margin social games business, it probably means that no one will be able to afford to use the new features. Dropping your effective LTV from $3 to $2.73, when you’re paying $2.50 to acquire that user, means you now have 54% less money to cover both marginal costs (hosting and support) and fixed costs (initial and ongoing development). That’s huge.
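The arithmetic behind that 54% figure is easy to check. This is a small sketch using only the numbers quoted above; for simplicity it assumes the 9% applies to every dollar of LTV (ignoring the $50k threshold), and the variable names are mine.

```python
# Numbers from the paragraph above; the 9% is applied to all revenue for simplicity.
ltv = 3.00          # lifetime value per user, in dollars
cac = 2.50          # cost to acquire that user
rev_share = 0.09    # Adobe's cut

margin_before = ltv - cac                       # what is left per user today
margin_after = ltv * (1 - rev_share) - cac      # what is left after the revshare
reduction = 1 - margin_after / margin_before    # fraction of the margin lost

print(f"margin before: ${margin_before:.2f}")
print(f"margin after:  ${margin_after:.2f}")
print(f"reduction:     {reduction:.0%}")
```

The revshare is taken off the top line, so a 9% cut of revenue turns into a much larger cut of what is left after acquisition cost.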

I suspect we’ll see a few developers give it a shot, but in the longer run this is going to push more people to HTML5 / WebGL and contribute to the eventual abandonment of Flash as a platform for social games.

I can only assume that the Flash Player team is doing this because they’ve been told they have to figure out a way to pay for themselves. But I hope they figure out a different way–perhaps taking a revshare on tools like Unity instead of games themselves–so that the Flash gaming ecosystem can continue to live and grow.

Thanks to Justin Rosenthal for reading drafts of this.

Farewell, Lolapps

Today marks my last day at 6waves Lolapps. 6waves Lolapps was formed last July as the result of a merger between 6waves, a Hong Kong-based publisher of social games, and Lolapps, the company I co-founded 4 years ago. On Monday this week we let go substantially all of the development staff, and today is the end of my short transition-out period.

It’s been a crazy ride of ups and downs and it feels very strange that it’s come to an end. I’m proud of the culture that we built, and it was especially evident this week as suddenly-former employees banded together to commiserate and find new companies to call home. I’m also very proud of the products we built, from the well-tuned viral apps in the early days (Quiz Creator and Gift Creator) to the envelope-pushing games we made more recently (Ravenwood Fair and Ravenskye City). And I’ve thoroughly enjoyed being part of an amazing team of talented, fun, and ambitious people.

I’ve learned more than I could have imagined and look forward to cataloging that over the coming few weeks.

What I’m doing next: still to be determined. What I do know: I want to start another company, and I want to try something very different. Check out the markets I follow on AngelList to see the general direction.

Countless thanks to Annie Chang, Kamo Asatryan, Kavin Stewart, Arjun Sethi, AJ Cantu, Justin Rosenthal, Cory Virok, Vivek Tatineni, and the rest of the Lolapps team for an amazing four years. I can’t wait to see what we all come up with next.

My experience note-taking at PyCon 2012

First off: the notes are here. I took most of them, but got some help from Roman Kofman and Alex Graveley. Thanks guys!

The rest of this post is a reflection on the note-taking experience. I’m not able to attend PyCon today so hopefully someone else will carry the torch the rest of the way :)

I started my notetaking extravaganza as a means of remembering what happened, and writing down interesting things to share with the rest of the engineering team at 6waves Lolapps who weren’t in attendance. I took them in TextEdit. Paul Graham’s keynote was really good so I posted the notes on this blog. I did the same for the first few sessions I attended, but the formatting was really terrible. Then I remembered seeing something on TechCrunch about using HackPad to take shared notes at SXSW, and tried it out on the next talk. It worked pretty well so I ended up taking notes at each of the sessions I went to.

Here’s my quick review of HackPad for this purpose:

  • Liked the interface – easy to type stuff pretty fast
  • Liked that sharing it is super easy. Cool to see people online in real-time.
  • Sync worked reasonably well over the conference wifi. Not realtime, but updated every ~second.
  • Got disconnected a lot on Saturday. Not sure if that was HackPad’s problem or the wifi’s.
  • Didn’t appear to be a way to delete any pads, or remove them from collections. This kept me from using a Collection for the notes – I couldn’t remove the default “Welcome to [this collection]” pad that would’ve been confusing because I already had a “home page” pad.
  • Better code formatting support would be nice. You can format things as “code” by indenting 4 spaces, but it feels a little fragile, and it’s hard to indent multiple lines of code that have their own indentation.
  • Auto-“table of contents” by looking at things that are bold was cool. Worked reasonably well here, though on some of the larger documents it would’ve been nice to have multiple layers of headings.

Some more general thoughts about the shared-note-taking experience:

  • Taking notes was pretty useful for me, to remember what was presented and force myself to stay engaged.
  • One problem with them being shared is that I feel a little awkward promoting them because I don’t really own them.
  • The notes feel the most useful and most meaningful for the talks that didn’t have as much info on the slides. There were a few talks where I found myself trying to type everything that was on the slides… that seems like an unnecessary duplication of effort. The talks where the slides were minimal or nonexistent felt better from a note-taking perspective.

All in all, this was a lot of fun and I’ll definitely do it again, especially if more people get involved. And for those of you at PyCon now, go take some notes!

 

 

PyCon 2012 Notes – Advanced Security Topics

Here are my raw notes.

Friday – 1:45pm – Advanced Security Topics

Paul McMillan

 

Hashing

Common crypto hashes: md5, sha1, sha256

If you’re typing md5 into your code, you’re probably doing it wrong

 

Message signing – did the message come from who i expect it to come from?

 

How can we be attacked?

 

Did this file get corrupted

 

When doing a basic md5 hash check, we compare against the hash that we have. Where did we get that hash?

If we know that the md5 hash that we have is good, then we’re pretty secure – hard to generate a hash collision (reasonably hard problem).

 

For message signing, it’s different.

H = md5(secret + message)

NO!

 

Given md5(secret + message), an attacker can compute a valid hash for a longer message without knowing the secret (a length-extension attack):

md5(secret + message + padding + attacker_message)

maybe your app is strict about format and this will result in an error

If an attacker can, given one message, sign a different message, it’s not good.

Solution: use HMAC

HMAC is essentially: hash(secret + hash(secret + message)), with the secret padded differently for the inner and outer hash

use hmac lib

and salt your secret key.

salt = b'session_cookie_signing'

hmac.new(salt + secret_key, msg, hashlib.sha256)

If you don’t salt based on the use in each area of your app, then an attacker could take a signed message from one area and use it in another area.
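Here is a minimal sketch of per-purpose signing with the standard hmac module. The sign() helper, the purpose strings, and the placeholder key are my own illustration, not code from the talk.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical placeholder


def sign(purpose: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 signature scoped to one purpose."""
    # Mixing the purpose into the key gives each area of the app its own key.
    return hmac.new(purpose + SECRET_KEY, message, hashlib.sha256).hexdigest()


cookie_sig = sign(b"session_cookie_signing", b"user_id=42")
reset_sig = sign(b"password_reset", b"user_id=42")

# Same message, different purpose: the signatures differ, so a signed
# cookie cannot be replayed as a password-reset token.
print(cookie_sig != reset_sig)
```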

 

Don’t use MD5! Avoid SHA1. Use SHA256 (for now).

There’s also SHA512, but on 32-bit machines it can have serious performance problems. SHA256 is similarly secure.

 

Encryption

You should not be implementing encryption, in almost all cases.

Why do you need it?

When protecting data in transit, use SSL/TLS.

Protecting data at rest: use underlying OS. There are already good solutions for this.

 

Random numbers

Generating a secret key:

 

import random

secret = ''.join(random.choice(allowed_chars) for i in range(length))

 

Don’t do that! Default random in python is predictable.

 

Solution:

from random import SystemRandom

 

will use entropy data from various sources.

 

But… you don’t know how much entropy is actually there. In some cases this can be a real problem. For example, if you’re running right after a virtualized machine boots, there’s not a lot of entropy yet. It’ll run out of entropy and give you predictable numbers.
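Putting the fix together, a secret key generator might look like the sketch below. The alphabet and length are arbitrary choices of mine. The second half uses the standard-library secrets module (Python 3.6+), which postdates the talk and is a swapped-in alternative, not what was presented.

```python
import secrets
import string
from random import SystemRandom

allowed_chars = string.ascii_letters + string.digits  # arbitrary alphabet
length = 50                                           # arbitrary length

# SystemRandom draws from os.urandom() instead of the Mersenne Twister.
secure_random = SystemRandom()
secret = "".join(secure_random.choice(allowed_chars) for _ in range(length))

# Alternative on Python 3.6+: the secrets module is built for this job.
token = secrets.token_urlsafe(32)

print(len(secret))
```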

 

Timing attacks

message, sig = incoming.split(b'|')
sig2 = hmac.new(salt + secret_key, message, hashlib.sha256).hexdigest().encode()
if sig == sig2:
    do_something()

 

== is the problem because string comparison will short circuit when there isn’t a match.

 

If you send a lot of messages, it’s possible to figure out a long string, one character at a time.

 

Very small difference but it’s statistically decidable in hundreds-to-millions of messages depending on the app. Over the internet there’s a lot of latency so it’s impractical. But a lot of apps are virtualized – I’m on the same machine as you, can ask you lots of questions really fast.

 

Solution: use a constant-time compare function.

Check that the lengths match, then do an operation on every pair of characters.

 

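def constant_time_compare(val1, val2):
    # val1 and val2 are strings. Lengths are not secret, so exiting early here is fine.
    if len(val1) != len(val2):
        return False
    result = 0
    for x, y in zip(val1, val2):
        # Accumulate differences instead of returning at the first mismatch.
        result |= ord(x) ^ ord(y)
    return result == 0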
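Since this talk, Python has added a constant-time comparison to the standard library: hmac.compare_digest (Python 3.3+ and 2.7.7+). Here is a sketch of the verification step using it. The make_sig() and verify() helpers, the "message|signature" layout, and the placeholder key are my assumptions for illustration.

```python
import hashlib
import hmac

secret_key = b"replace-with-a-real-secret"  # hypothetical placeholder
salt = b"session_cookie_signing"


def make_sig(message: bytes) -> bytes:
    """Hex HMAC-SHA256 of the message, as ASCII bytes."""
    digest = hmac.new(salt + secret_key, message, hashlib.sha256).hexdigest()
    return digest.encode("ascii")


def verify(incoming: bytes) -> bool:
    """Check a 'message|signature' payload without a short-circuiting ==."""
    message, _, sig = incoming.rpartition(b"|")
    return hmac.compare_digest(sig, make_sig(message))


good = b"user_id=42|" + make_sig(b"user_id=42")
bad = b"user_id=43|" + make_sig(b"user_id=42")
print(verify(good), verify(bad))
```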

 

Pickle

Loading an untrusted string is equivalent to running eval() on that string!

Use JSON or something, not pickle, for untrusted data.

If you must use pickle, sign and verify it… but use JSON if possible to avoid complexity of signing.
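To see why unpickling is equivalent to eval(), here is a deliberately harmless sketch. The Exploit class is made up for the demo; a real attacker would swap the arithmetic for something destructive.

```python
import json
import pickle


class Exploit:
    # pickle calls __reduce__ to learn how to rebuild the object;
    # returning (callable, args) makes the loader call callable(*args).
    def __reduce__(self):
        return (eval, ("1 + 1",))


payload = pickle.dumps(Exploit())
result = pickle.loads(payload)  # runs eval("1 + 1") while loading
print(result)

# JSON only ever produces plain data (dicts, lists, strings, numbers, ...).
data = json.loads('{"user_id": 42, "expr": "1 + 1"}')
print(data["expr"])
```

Note that what comes back from pickle.loads() is not even an Exploit instance: it is whatever the attacker's callable returned.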

 

“You would think that…”

No. Always verify your assumptions.

 

For example….

pip install Django

- I trust the django authors

- i trust the people who run pypi

- i trust the people who wrote pip

- pip verifies the md5 hashes of packages

- packages on pypi can be pgp signed

- pypi uses ssl

 

So I’m safe right?

Of course not… all of these things require you to trust everyone on the internet.

- pip verifies md5 hash  — but it downloaded the hash from the same page it downloaded the package from, and that’s in plain text. if someone can change the code, they can change the hash.

- pgp signed — no tool currently checks pgp

- pypi uses ssl — not really

 

PyPI:

- untrusted (by default) certificate

- plaintext by default

- easy_install and pip don’t use SSL

Python doesn’t make it easy to check SSL certs. This is a problem.

 

You’d think that when you open a HTTPS url you’d get encryption… and you do, but it doesn’t verify the certificate.

 

Recommendation: use the ‘requests’ library. It *does* do the cert check by default.
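These notes are from 2012. As far as I know, current Python versions can also verify certificates with just the standard library, so it is worth checking what your own interpreter does. A minimal sketch that makes no network calls:

```python
import ssl

# create_default_context() builds a client-side context with verification on.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # certificate chain is checked
print(context.check_hostname)                    # hostname must match the cert
```

If either line prints False in your environment, HTTPS connections made with that context are encrypted but not authenticated, which is the problem described above.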

 

Demo

DNS lookup – UDP. Computer will trust the first response it gets back. So on open wifi, easy for someone to spoof your dns and say that pypi.org is whatever they want.

 

“sudo pip install certifi”

 

Runs setup.py — so anyone can put code in setup.py and then it’ll run on your computer as root.

 

PyPI offers everything necessary to make this not possible, but we don’t use it.

 

How do we help the python community?

- use the requests library

- if you’re using hashes, use hmac.

 

Q&A

Q: Putting packages on pypi — is there a better way to do this at a release-process level?

A: It’s a hard problem. If you put something in the package that verifies the package is what you expected, that’s no good. If you provide something alongside the package, it’s the same problem. Signed PGP files are great, but no one checks them. We need to make the tools.

 

Q: How much of this is actually new? How much of this is actually security-specific?

The HMAC stuff, for example, is all “don’t reinvent the wheel”, and security is the context. Are there cases where you have to do something because of a security reason? What about the signing stuff – that’s been seen before too.

A: None of this is new. But as a community we pretend it doesn’t exist. The end result of most conversations about this is “it’s not pypi’s problem, users should be doing something secure.” But that’s not the way to think about this: we should make the obvious way the right way.

Q: But right now it’s more work to do it the right way, so it’s bad engineering on the part of users.

A: Higher-level constructs tend to make code more secure. But have to use

 

Q: Why didn’t the demo work? Issues with the wifi – security features?

A: Script was set up for WPA2 network, but this is an older network.

 

Q: Timing attack – constant-time string compare is not the obvious way to do it…

A: Have to think about security implications. Python should have a constant-time compare function.

 

Q: Constant-time compare function probably has problems because of mallocs. Don’t think it’s possible to really have a constant time compare function.

A: Is possible to get very very close. Not possible in C either but can get close.

 

Q: Suggestions for downloading packages for doing releases?

A: Most large software projects have a system for finalizing packages. Unusual to install things directly from the cheese shop. Pip team is sprinting on adding auth.

 

Q: crate.io – what does this actually do to increase security if installers are still the same?

A: Storing sha256 hashes. Storing hashes long-term so you have a history of what files were there. In PyPI things can be changed without you noticing. crate will maintain a history.

 

 

 

PyCon 2012 – The Art of Subclassing (Raymond Hettinger)

Here are my notes. Many thanks to Raymond for this excellent talk. Hopefully he can share his slides soon :)

 

Friday – 11:30am – The Art of Subclassing

Raymond Hettinger

 

- revisit the notion of a class and subclass with a view to deepening our core concepts of what it means to be a class/subclass/instance

- discuss use cases, principles, and design patterns

- demonstrate super()

- use examples from standard library

 

terminology:

adding: Dog adds the bark() method, which Animal didn’t know how to do before

overriding: snake replaces walk()

extending: cat modifies walk() to add tail swishing
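The three terms in code. This is my own sketch of the Animal example; the method bodies are invented.

```python
class Animal:
    def walk(self):
        return "walking"


class Dog(Animal):
    # Adding: a method the parent does not have.
    def bark(self):
        return "woof"


class Snake(Animal):
    # Overriding: replace the parent's behavior entirely.
    def walk(self):
        return "slithering"


class Cat(Animal):
    # Extending: call the parent's method, then add to it.
    def walk(self):
        return super().walk() + " with tail swishing"


print(Dog().bark(), "|", Snake().walk(), "|", Cat().walk())
```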

 

Patterns for subclassing:

1. Frameworks

- parent class supplies all of the controller functionality and makes calls to prenamed stub methods

- the subclass overrides stub methods of interest

- e.g. SimpleHTTPServer runs an event loop and dispatches HTTP requests to stub methods like do_HEAD() and do_GET()

(these start as just “pass”)

- someone writing an HTTP server would use a subclass to supply the desired actions in the event of a GET or HEAD request

 

very static. choices to override are predefined. framework tells you “here are the things you can subclass”
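A stripped-down sketch of the framework pattern. MiniServer and HelloServer are hypothetical names, and this is not how the standard library's HTTP server is actually written; it only shows the shape: the parent owns the control flow, the subclass fills in stubs.

```python
class MiniServer:
    """Parent: owns the control flow and calls prenamed stub methods."""

    def handle(self, method, path):
        if method == "GET":
            return self.do_GET(path)
        if method == "HEAD":
            return self.do_HEAD(path)
        return "501 Unsupported method"

    # Stubs: the framework fixes these names; subclasses fill them in.
    def do_GET(self, path):
        pass

    def do_HEAD(self, path):
        pass


class HelloServer(MiniServer):
    def do_GET(self, path):
        return "200 hello from " + path


print(HelloServer().handle("GET", "/index.html"))
```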

 

2. Dynamic dispatch to subclass methods

- parent class uses getattr() to dispatch to new functionality

- child class implements appropriately named methods

example: cmd.py
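A rough sketch of the getattr() dispatch idea, loosely modeled on the do_<command> naming that cmd.py uses. MiniShell and GreetShell are made-up names and this is not the module's actual code.

```python
class MiniShell:
    """Parent: looks up a handler by name at call time."""

    def onecmd(self, line):
        name, _, arg = line.partition(" ")
        handler = getattr(self, "do_" + name, None)
        if handler is None:
            return self.default(line)
        return handler(arg)

    def default(self, line):
        return "unknown command: " + line


class GreetShell(MiniShell):
    # Any method named do_<something> becomes a command automatically.
    def do_greet(self, arg):
        return "hello, " + arg


shell = GreetShell()
print(shell.onecmd("greet world"))
print(shell.onecmd("quit"))
```

Unlike the framework pattern, the parent never lists the commands; the subclass can add new ones without the parent knowing about them.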

 

* best way to become a better python programmer is to read code written by great python programmers, like the standard library

 

polish a class and think about subclassers (like yourself) in the future

dynamic dispatch is a great method.

 

3. Call Pattern

don’t hardwire class name in __repr__ — use self.__class__.__name__
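A quick illustration of why this matters (Point and LabeledPoint are my own example classes):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Look the name up at call time so subclasses report their own name.
        return "%s(%r, %r)" % (self.__class__.__name__, self.x, self.y)


class LabeledPoint(Point):
    pass


print(repr(Point(1, 2)))
print(repr(LabeledPoint(1, 2)))
```

With a hardwired "Point(...)" string, the subclass would misreport itself and would have to override __repr__ just to fix the name.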

 

super:

#1 misunderstanding: in other languages, super means “I’m calling one or more of my parents.” Not so in Python.

‘self’ might not be you.

when you call super, it’s not your ancestry tree that matters, it’s the caller’s.

super means go up from your children – start at the bottom.
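A small sketch of what “self might not be you” means. The class names are mine. Left only inherits from Base, yet inside a Child instance its super() call lands on Right.

```python
class Base:
    def who(self):
        return ["Base"]


class Left(Base):
    def who(self):
        # "Next in line after Left" depends on the class of self.
        return ["Left"] + super().who()


class Right(Base):
    def who(self):
        return ["Right"] + super().who()


class Child(Left, Right):
    def who(self):
        return ["Child"] + super().who()


print(Left().who())    # self is a Left: next after Left is Base
print(Child().who())   # self is a Child: next after Left is Right
print([c.__name__ for c in Child.__mro__])
```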

 

Let’s retool our thinking about subclasses.

 

patterns from java, etc. cause you to think in a constrained way in python.

 

What does it mean to be an object/class?

Object: an entity that encapsulates data together with functions (methods) for manipulating that data.

We implement that with dictionaries:

- instance dictionaries hold state and point to their class

- class dicts hold the functions (methods)

 

InstDict1 ->

InstDict2 ->  ClassDict

InstDict3 ->

 

instance points at the class, not the other way around.
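You can poke at these dictionaries directly. A small sketch with a made-up Account class:

```python
class Account:
    def deposit(self, amount):
        self.balance += amount


a = Account()
a.balance = 10
b = Account()
b.balance = 99

print(a.__dict__)                      # per-instance state
print("deposit" in Account.__dict__)   # the function lives in the class dict
print("deposit" in a.__dict__)         # ...and not in the instance dict
print(a.__class__ is Account)          # the instance points at its class
```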

 

So what is a subclass?

- a class that delegates work to another class

- a subclass and its parent are just two different dicts that contain functions

- subclass points to its parent

- pointer means “i delegate work to this class”

 

InstDict1 ->

InstDict2 ->  SubClassDict -> ParClassDict

InstDict3 ->

 

“subclass means to specialize” –> no, that’s not it in python. it’s delegation.

 

- Subclassing can be viewed as a technique for code reuse.

- It is the subclass that is in charge

- The subclass decides what work gets delegated.

 

Operational (implementation) view of subclassing:

- classes are dicts of functions

- subclasses point to other dictionaries to reuse their code

- subclasses are in complete control of what happens

 

Conceptual view of subclasses:

[the way many people think of them - not how it actually is!]

- parent classes define an interface

- subclasses can extend that interface

- parents are in charge

- subclasses are just specializations of the parent

 

Shift from the view at the bottom to the view at the top.

 

Liskov Substitution Principle

“If S is a subtype of T, then objects of type T may be replaced with objects of type S”

 

Why do we care about Liskov?

- It is all about polymorphism and substitutability so that our subclass can be used in client code without changing the client code.

- Example: consider a large body of code for a cash register that calls an accept_payment() method on a payment object

e.g. deployed code given to clients

- We can write separate classes for credit cards, cash, checks, etc. that all work without changing the cash register code

- We can write subclasses of the credit card class that provide custom handling for Visa, Amex, etc.
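A toy version of the cash register example. The class names, fee rates, and return strings are all invented for illustration; only the accept_payment() method name comes from the talk.

```python
class Cash:
    def accept_payment(self, amount):
        return "cash: %.2f" % amount


class CreditCard:
    fee_rate = 0.02  # hypothetical processing fee

    def accept_payment(self, amount):
        total = amount * (1 + self.fee_rate)
        return "%s: %.2f" % (self.__class__.__name__, total)


class Amex(CreditCard):
    fee_rate = 0.03  # hypothetical; only the custom part changes


def ring_up(payment, amount):
    # Client code: never changes when new payment types are added.
    return payment.accept_payment(amount)


for payment in (Cash(), CreditCard(), Amex()):
    print(ring_up(payment, 100))
```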

 

It’s a Principle not a Law.

 

Liskov Violations:

- any part of the API that isn’t fully substitutable

- this is common and normal

- useful subclasses commonly have different constructor signatures

- for example, the array API is very similar to the list API but the constructor is different:

s = list(someiterable)

s = array('c', someiterable)

-> not substitutable.

 

Goal: minimize or isolate the impact when signature is different.

 

e.g. sets:

MutableSet instances support union, intersection, difference

So they need to be able to create new instances of MutableSets

But the signature of the constructor is unknown

So, we factor out calls to the constructor in _from_iterable().

-> if your constructor has a different signature, override _from_iterable and you’re good to go.

 

-> Factor out all your Liskov violations into one place to make it easier on subclasses. Just one adapter.
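Here is a sketch of that adapter using collections.abc.Set, the read-only sibling of MutableSet, which has the same _from_iterable hook. TaggedSet and its tag argument are my own invention to force a non-standard constructor signature.

```python
from collections.abc import Set


class TaggedSet(Set):
    """A set whose constructor takes an extra argument (a tag)."""

    def __init__(self, tag, iterable):
        self.tag = tag
        self._items = []
        for item in iterable:
            if item not in self._items:
                self._items.append(item)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    # The one adapter: the Set mixins (&, |, -, ^) build results through this.
    def _from_iterable(self, iterable):
        return TaggedSet(self.tag, iterable)


a = TaggedSet("a", [1, 2, 3])
b = TaggedSet("b", [2, 3, 4])
both = a & b
print(type(both).__name__, both.tag, list(both))
```

Without the override, a & b would try to call TaggedSet(iterable) and fail, because the default _from_iterable assumes a one-argument constructor.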

 

The Circle / Ellipse Problem

- in math, circle is just a special case of ellipse

- if one ellipse method stretches an axis, what does that mean for circle instances? (e.g. circle.skew())

- the problem is that circles have less info than an ellipse and have constraints that don’t apply to general ellipses

- the reverse wouldn’t work either because circles have capabilities that don’t apply to ellipses (e.g. the bounding box is a square)

 

-> there’s no Liskov-wise right answer as to which is the superclass

 

So how do you decide?

Not specialization: think about code reuse.

Principle for deciding which is the parent class: whichever maximizes code reuse. (whichever has the most reusable code should be the parent.)

 

Taxonomy hierarchies do not neatly transform into useful class hierarchies.

- e.g. making a tree of exceptions

- substitutability can be a hard problem.

something is kind of a TypeError, kind of a RuntimeError… use multiple inheritance.

 

substitutability is a nontrivial problem. if you start a sentence with “well you just…” that’s probably not a good answer.

 

* think about code reuse.

 

Open-Closed Principle

- “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”

- has many different interpretations

- sometimes it refers to use of abstract base classes to create fixed interfaces with multiple impls

 

Facts of life when subclassing builtins:

- dicts: overriding __getitem__ does not solve it for get()

it’s possible for you to do something that breaks an invariant.

so need to override every method instead of just one
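This is easy to see for yourself. UpperDict is a made-up example; the behavior shown is what CPython's built-in dict does.

```python
class UpperDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key).upper()


d = UpperDict(name="raymond")
print(d["name"])       # goes through the override
print(d.get("name"))   # dict.get() does not call the overridden __getitem__
```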

 

OCP in python with name mangling

- A method named __update in a class called MyDict has its name transformed into _MyDict__update

- This makes the method invisible to subclasses. Use this to create protected internal calls in addition to overridable public methods.

 

class MyDict:

    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    # Private copy of the original update(); name-mangled to _MyDict__update
    __update = update

 

This is what the double-underscore is for.
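To see the payoff, here is the same class with a subclass that overrides update() using a different signature. PairDict is my own addition for the demo.

```python
class MyDict:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)   # always the original, via _MyDict__update

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update             # private copy of the original method


class PairDict(MyDict):
    # Overrides the public update() with a different signature.
    def update(self, keys, values):
        for pair in zip(keys, values):
            self.items_list.append(pair)


d = PairDict(["a", "b"])          # __init__ still works
d.update(["x"], [1])
print(d.items_list)
```

If __init__ called self.update(iterable) instead, constructing a PairDict would fail with a TypeError, because the subclass's update() expects two arguments.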