PyCon 2012 – The Art of Subclassing (Raymond Hettinger)

Here are my notes. Many thanks to Raymond for this excellent talk. Hopefully he can share his slides soon :)

 

Friday – 11:30am – The Art of Subclassing

Raymond Hettinger

 

- revisit the notion of a class and subclass with a view to deepening our core concepts of what it means to be a class/subclass/instance

- discuss use cases, principles, and design patterns

- demonstrate super()

- use examples from standard library

 

terminology:

adding: Dog adds a bark() method, something Animal didn’t know how to do before

overriding: Snake replaces walk()

extending: Cat modifies walk() to add tail swishing
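
The three terms can be sketched in a few lines. This is only an illustration: the class bodies and return strings are made up, only the class and method names come from the notes.

```python
class Animal:
    def walk(self):
        return "walking"


class Dog(Animal):
    # Adding: bark() does not exist on Animal at all.
    def bark(self):
        return "woof"


class Snake(Animal):
    # Overriding: walk() is replaced outright; Animal.walk() never runs.
    def walk(self):
        return "slithering"


class Cat(Animal):
    # Extending: reuse Animal.walk() and add behaviour around it.
    def walk(self):
        return super().walk() + " with tail swishing"


print(Dog().bark())
print(Snake().walk())
print(Cat().walk())
```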

 

Patterns for subclassing:

1. Frameworks

- parent class supplies all of the controller functionality and makes calls to prenamed stub methods

- the subclass overrides stub methods of interest

- e.g. SimpleHTTPServer runs an event loop and dispatches HTTP requests to stub methods like do_HEAD() and do_GET()

(these start as just “pass”)

- someone writing an HTTP server would use a subclass to supply the desired actions in the event of a GET or HEAD request

 

very static. choices to override are predefined. framework tells you “here are the things you can subclass”
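
A minimal sketch of the framework pattern, assuming a toy parent class. MiniFramework and FileServer are hypothetical names invented here; this is not the standard library's HTTP server code, just the same shape: the parent owns the control flow and calls prenamed stubs.

```python
class MiniFramework:
    """Hypothetical parent: owns the control flow, calls prenamed stubs."""

    def handle(self, verb, path):
        # The set of hooks is fixed by the framework author.
        if verb == "GET":
            return self.do_GET(path)
        if verb == "HEAD":
            return self.do_HEAD(path)
        raise ValueError("unsupported verb: " + verb)

    def do_GET(self, path):
        pass  # stub, meant to be overridden

    def do_HEAD(self, path):
        pass  # stub, meant to be overridden


class FileServer(MiniFramework):
    # The subclass overrides only the stubs it cares about.
    def do_GET(self, path):
        return "contents of " + path


server = FileServer()
print(server.handle("GET", "/index.html"))
print(server.handle("HEAD", "/index.html"))  # still the do-nothing stub
```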

 

2. Dynamic dispatch to subclass methods

- parent class uses getattr() to dispatch to new functionality

- child class implements appropriately named methods

example: cmd.py
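
A rough sketch of the idea, in the spirit of cmd.py but not copied from it. Dispatcher, Shell, run() and default() are hypothetical names for this illustration; the do_ prefix mirrors the naming convention cmd.py uses.

```python
class Dispatcher:
    """Hypothetical parent: looks methods up by name at call time."""

    def run(self, line):
        name, _, arg = line.partition(" ")
        # No fixed list of hooks: any do_<name> method on the subclass works.
        method = getattr(self, "do_" + name, None)
        if method is None:
            return self.default(line)
        return method(arg)

    def default(self, line):
        return "unknown command: " + line


class Shell(Dispatcher):
    # Adding a command is just a matter of defining a well-named method.
    def do_greet(self, arg):
        return "hello " + arg


shell = Shell()
print(shell.run("greet world"))
print(shell.run("quit"))
```

Unlike the framework pattern, the parent does not need to know in advance which commands exist.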

 

* best way to become a better python programmer is to read code written by great python programmers, like the standard library

 

polish a class and think about subclassers (like yourself) in the future

dynamic dispatch is a great method.

 

3. Call Pattern

don’t hardwire class name in __repr__ — use self.__class__.__name__
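
A small sketch of why this matters; Point and LabeledPoint are hypothetical classes used only for illustration.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Look the name up on the instance's class instead of writing "Point".
        return "%s(%r, %r)" % (self.__class__.__name__, self.x, self.y)


class LabeledPoint(Point):
    pass  # inherits __repr__ unchanged, yet reports its own name


print(repr(Point(1, 2)))
print(repr(LabeledPoint(1, 2)))
```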

 

super:

#1 misunderstanding (carried over from other languages): super means I’m calling one or more of my parents.

‘self’ might not be you.

when you call super, it’s not your ancestry tree that matters, it’s the caller’s.

super means go up from your children – start at the bottom.
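
A sketch of what that means in practice, using hypothetical classes Root, A, B and C. Inside A, super() does not always reach A's parent: the next class is chosen from the method resolution order of the instance's class.

```python
class Root:
    def describe(self):
        return ["Root"]


class A(Root):
    def describe(self):
        # "Next in line" depends on type(self), not on A's own parent.
        return ["A"] + super().describe()


class B(Root):
    def describe(self):
        return ["B"] + super().describe()


class C(A, B):
    def describe(self):
        return ["C"] + super().describe()


print([cls.__name__ for cls in C.__mro__])
print(A().describe())  # self is an A: A's super() reaches Root
print(C().describe())  # self is a C: A's super() reaches B first
```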

 

Let’s retool our thinking about subclasses.

 

patterns from java, etc. cause you to think in a constrained way in python.

 

What does it mean to be an object/class?

Object: an entity that encapsulates data together with functions (methods) for manipulating that data.

We implement that with dictionaries:

- instance dictionaries hold state and point to their class

- class dicts hold the functions (methods)

 

InstDict1 ->

InstDict2 ->  ClassDict

InstDict3 ->

 

instance points at the class, not the other way around.

 

So what is a subclass?

- a class that delegates work to another class

- a subclass and its parent are just two different dicts that contain functions

- subclass points to its parent

- pointer means “i delegate work to this class”

 

InstDict1 ->

InstDict2 ->  SubClassDict -> ParClassDict

InstDict3 ->
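
The diagram can be checked directly in the interpreter. Parent and Child are hypothetical classes for this sketch; vars(), type() and __bases__ are the standard ways to look at the dictionaries and pointers involved.

```python
class Parent:
    def greet(self):
        return "hello"


class Child(Parent):
    def wave(self):
        return "waving"


c = Child()
c.name = "Rex"

print(vars(c))                   # instance dict: state only
print(type(c) is Child)          # instance points at its class
print("wave" in vars(Child))     # class dict holds the subclass's functions
print("greet" in vars(Child))    # not here...
print("greet" in vars(Parent))   # ...it lives in the parent's dict
print(Child.__bases__)           # subclass points at its parent
print(c.greet())                 # lookup is delegated along those pointers
```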

 

“subclass means to specialize” –> no, that’s not it in python. it’s delegation.

 

- Subclassing can be viewed as a technique for code reuse.

- It is the subclass that is in charge

- The subclass decides what work gets delegated.

 

Operational (implementation) view of subclassing:

- classes are dicts of functions

- subclasses point to other dictionaries to reuse their code

- subclasses are in complete control of what happens

 

Conceptual view of subclasses:

[the way many people think of them - not how it actually is!]

- parent classes define an interface

- subclasses can extend that interface

- parents are in charge

- subclasses are just specializations of the parent

 

Shift from the conceptual view (the second list) to the operational view (the first list).

 

Liskov Substitution Principle

“If S is a subtype of T, then objects of type T may be replaced with objects of type S”

 

Why do we care about Liskov?

- It is all about polymorphism and substitutability, so that our subclass can be used in client code without changing the client code.

- Example: consider a large body of code for a cash register that calls an accept_payment() method on a payment object

e.g. deployed code given to clients

- We can write separate classes for credit cards, cash, checks, etc. that all work without changing the cash register code

- We can write subclasses of the credit card class that provide custom handling for Visa, Amex, etc.
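
A sketch of the cash register example. Only accept_payment() comes from the talk; the class names, the checkout() function and the returned strings are assumptions made up for illustration.

```python
class Cash:
    def accept_payment(self, amount):
        return "cash: %d" % amount


class CreditCard:
    def accept_payment(self, amount):
        return "card: %d" % amount


class Visa(CreditCard):
    # Custom handling layered on the generic credit card behaviour.
    def accept_payment(self, amount):
        return super().accept_payment(amount) + " (Visa)"


def checkout(payment, amount):
    """Stands in for the deployed cash register code: it never changes."""
    return payment.accept_payment(amount)


for payment in (Cash(), CreditCard(), Visa()):
    print(checkout(payment, 20))
```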

 

It’s a Principle not a Law.

 

Liskov Violations:

- any part of the API that isn’t fully substitutable

- this is common and normal

- useful subclasses commonly have different constructor signatures

- for example, the array API is very similar to the list API but the constructor is different:

s = list(someiterable)

s = array('c', someiterable)

-> not substitutable.
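
A quick check of the difference, assuming Python 3. The 'c' typecode from the notes only exists in Python 2, so this sketch uses 'i' (signed int) instead.

```python
from array import array

values = [1, 2, 3]

as_list = list(values)
as_array = array("i", values)  # extra leading typecode argument

# After construction the two behave alike for many operations...
as_list.append(4)
as_array.append(4)
print(list(as_array) == as_list)

# ...but the constructor call itself cannot be swapped in blindly.
try:
    array(values)
    same_constructor = True
except TypeError:
    same_constructor = False
print(same_constructor)
```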

 

Goal: minimize or isolate the impact when signature is different.

 

e.g. sets:

MutableSet instances support union, intersection, difference

So they need to be able to create new instances of MutableSets

But the signature of the subclass’s constructor is unknown

So, we factor out calls to the constructor into _from_iterable().

-> if your constructor has a different signature, override _from_iterable and you’re good to go.

 

-> Factor out all your Liskov violations into one place to make it easier on subclasses. Just one adapter.
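
A sketch of that adapter in use. TaggedSet and its tag argument are hypothetical; MutableSet and the _from_iterable() classmethod hook are part of collections.abc. The default _from_iterable() calls cls(iterable), which would hand the elements to the tag parameter here, so the subclass overrides it.

```python
from collections.abc import MutableSet


class TaggedSet(MutableSet):
    """Hypothetical set whose constructor takes an extra leading argument."""

    def __init__(self, tag, iterable=()):
        self.tag = tag
        self._items = set(iterable)

    # The five abstract methods MutableSet requires.
    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def add(self, item):
        self._items.add(item)

    def discard(self, item):
        self._items.discard(item)

    # The single place where the nonstandard constructor is adapted.
    @classmethod
    def _from_iterable(cls, iterable):
        return cls("derived", iterable)


a = TaggedSet("a", [1, 2, 3])
b = TaggedSet("b", [2, 3, 4])

both = a & b      # mixin method, builds its result via _from_iterable()
either = a | b

print(type(both).__name__, both.tag, sorted(both))
print(type(either).__name__, either.tag, sorted(either))
```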

 

The Circle / Ellipse Problem

- in math, circle is just a special case of ellipse

- if one ellipse method stretches an axis, what does that mean for circle instances? (e.g. circle.skew())

- the problem is that circles have less info than an ellipse and have constraints that don’t apply to general ellipses

- the reverse wouldn’t work either because circles have capabilities that don’t apply to ellipses (e.g. the bounding box is a square)

 

-> there’s no Liskov-wise right answer as to which is the superclass

 

So how do you decide?

Not specialization: think about code reuse.

Principle for deciding which is the parent class: whichever maximizes code reuse. (whichever has the most reusable code should be the parent.)

 

Taxonomy hierarchies do not neatly transform into useful class hierarchies.

- e.g. making a tree of exceptions

- substitutability can be a hard problem.

something is kind of a TypeError, kind of a RuntimeError… use multiple inheritance.
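
A short sketch of that advice. BadPluginError is a hypothetical exception invented here; io.UnsupportedOperation is a real standard library exception that inherits from both OSError and ValueError in the same way.

```python
import io


class BadPluginError(TypeError, RuntimeError):
    """Hypothetical error that is 'kind of' both parents."""


def caught_by(exc_type):
    # Report whether a handler for exc_type would catch BadPluginError.
    try:
        raise BadPluginError("plugin returned the wrong type at runtime")
    except exc_type:
        return True


print(caught_by(TypeError))
print(caught_by(RuntimeError))

# The standard library does the same thing:
print(issubclass(io.UnsupportedOperation, OSError))
print(issubclass(io.UnsupportedOperation, ValueError))
```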

 

substitutability is a nontrivial problem. if you start a sentence with “well you just…” that’s probably not a good answer.

 

* think about code reuse.

 

Open-Closed Principle

- “software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification”

- has many different interpretations

- sometimes it refers to use of abstract base classes to create fixed interfaces with multiple impls

 

Facts of life when subclassing builtins:

- dicts: overriding __getitem__ does not change the behavior of get()

it’s possible for you to do something that breaks an invariant.

so you need to override every affected method instead of just one
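
A sketch of the problem as it shows up in CPython. LoudDict and ConsistentLoudDict are hypothetical names; the behavior shown is that the built-in dict.get() does not route through an overridden __getitem__.

```python
class LoudDict(dict):
    # Only __getitem__ is overridden.
    def __getitem__(self, key):
        return dict.__getitem__(self, key).upper()


d = LoudDict(a="x")
print(d["a"])      # goes through the override
print(d.get("a"))  # built-in get() bypasses it


class ConsistentLoudDict(LoudDict):
    # get() has to be overridden too to keep the two lookups in agreement.
    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


e = ConsistentLoudDict(a="x")
print(e["a"], e.get("a"), e.get("missing", 0))
```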

 

OCP in python with name mangling

- A method named __update in a class called MyDict has its name mangled to _MyDict__update

- This makes the method invisible to subclasses. Use this to create protected internal calls in addition to overridable public methods.

 

class MyDict:

    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update   # private copy of the original update() method

 

This is what the double-underscore is for.
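
To see the effect, here is the same class again with a hypothetical subclass, PairDict, that overrides update() with a different signature. Because __init__ calls the mangled private copy, the override cannot break it.

```python
class MyDict:
    def __init__(self, iterable):
        self.items_list = []
        self.__update(iterable)      # compiled as self._MyDict__update

    def update(self, iterable):
        for item in iterable:
            self.items_list.append(item)

    __update = update                # private copy of the original update()


class PairDict(MyDict):
    # New signature for update(); MyDict.__init__ is unaffected.
    def update(self, keys, values):
        for item in zip(keys, values):
            self.items_list.append(item)


d = PairDict([("a", 1)])     # __init__ still uses the original update()
d.update(["b"], [2])         # public calls use the override
print(d.items_list)
print("_MyDict__update" in vars(MyDict))
```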
