Python 3 Porting Guide

Author: Brian Curtin <curtin@acm.org>
Date: July 29, 2010

Warning

This document is under construction.

Introduction

The move from Python 2.x to 3.x introduced a window of time where a number of changes could be made in order to clean up the language. In doing so, a level of backwards incompatibility was introduced for the betterment of the language.

Outlined below are details of the changes introduced in Python 3 and their impact on porting. Where possible, example code is used.

Organizational Changes

Over the lifetime of Python, the names of some packages and modules have deviated from the standards laid out in PEP 8. During the creation of Python 3, several changes were made to bring names back to conformance with the standard and reorganize some of the common functionality which existed side-by-side.

Name Changes

The following modules were renamed outright.

Python 2 name Python 3 name
__builtin__ builtins
ConfigParser configparser
copy_reg copyreg
cPickle pickle
Queue queue
repr reprlib
SocketServer socketserver
Tkinter tkinter
_winreg winreg
thread _thread
dummy_thread _dummy_thread
markupbase _markupbase

When writing code to support both Python 2 and 3 in the same codebase, a common import idiom is to try the new name first, then fall back to the old name imported as the new name.

try:
    import queue
except ImportError:
    import Queue as queue

Reorganization

The following objects were renamed and moved into packages in order to group common functionality.

Python 2 name Python 3 name
xrange() range()
reduce() functools.reduce()
intern() sys.intern()
unichr() chr()
basestring() str()
long() int()
itertools.izip() zip()
itertools.imap() map()
itertools.ifilter() filter()
itertools.ifilterfalse() itertools.filterfalse()
cookielib http.cookiejar
Cookie http.cookies
htmlentitydefs html.entities
HTMLParser html.parser
httplib http.client
Dialog tkinter.dialog
FileDialog tkinter.filedialog
ScrolledText tkinter.scrolledtext
SimpleDialog tkinter.simpledialog
Tix tkinter.tix
Tkconstants tkinter.constants
Tkdnd tkinter.dnd
tkColorChooser tkinter.colorchooser
tkCommonDialog tkinter.commondialog
tkFileDialog tkinter.filedialog
tkFont tkinter.font
tkMessageBox tkinter.messagebox
tkSimpleDialog tkinter.simpledialog
robotparser urllib.robotparser
urlparse urllib.parse
cStringIO.StringIO() io.StringIO
UserString collections.UserString
UserList collections.UserList

The contents of the following modules were merged into other modules which share a common theme.

Python 2 name Python 3 name
BaseHTTPServer http.server
CGIHTTPServer http.server
SimpleHTTPServer http.server
whichdb dbm
anydbm dbm
dbm dbm.ndbm
dumbdbm dbm.dumb
gdbm dbm.gnu
DocXMLRPCServer xmlrpc.server
SimpleXMLRPCServer xmlrpc.server
commands subprocess

The following built-in functions were moved into packages.

Python 2 name Python 3 name
reduce() functools.reduce()
reload() imp.reload()
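
The try-the-new-name-first idiom shown earlier extends to names that moved into packages. The following is a minimal sketch, assuming the code only needs urlparse() and reduce(); the variable names are illustrative.

```python
# Try the Python 3 location first, then fall back to the Python 2 name.
try:
    from urllib.parse import urlparse
except ImportError:
    from urlparse import urlparse

try:
    from functools import reduce
except ImportError:
    pass  # Python 2.5 and earlier: reduce() is still a built-in

parts = urlparse("http://example.com/docs?page=2")
total = reduce(lambda a, b: a + b, [1, 2, 3, 4])
```

After the imports, the rest of the module can use urlparse() and reduce() without caring which version it runs on.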

Optimized Modules

Modules such as pickle and StringIO have traditionally been offered with C implementations for performance reasons. Rather than continue to expose two different implementations, a decision was made to expose both implementations under the same name, choosing to utilize the C implementation if available and falling back to the pure-Python version if not.

This decision removes the need for the following idiom, instead making the more performant decision for the user.

try:
    import cPickle as pickle
except ImportError:
    import pickle

Printing

One of the most obvious changes when porting code to Python 3 is that print is no longer a statement. Introduced in Python 3 and backported as far back as 2.6 (although not as the default), print() became a built-in function.

print as a function offers all of the same features of print as a statement but in a more natural syntax that fits with the rest of the language.

In the event that you need to support both Python 3 and Python 2.5 or prior, it is best to use the sys module’s output capabilities: the sys.stdout or sys.stderr file objects. This method leaves the handling of print features like separators and line endings up to the user.

Python 2.5 compatible format            Python 2.6+ format            Python 3 format
sys.stdout.write("hello world\n")       print "hello world"           print("hello world")
sys.stdout.write("hello %s\n" % name)   print "hello", name           print("hello", name)
sys.stdout.write("\n".join([x, y]))     print "\n".join([x, y])       print(x, y, sep="\n")
sys.stderr.write("error\n")             print >> sys.stderr, "error"  print("error", file=sys.stderr)
sys.stdout.write("one line")            print "one line",             print("one line", end="")
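
Since writing to sys.stdout directly leaves separators and line endings up to the user, a small helper can restore them. The following is a minimal sketch; emit() is a hypothetical name, and its sep, end, and file options are modeled on those of print().

```python
import sys

def emit(*values, **options):
    """Write values to a stream, mimicking print()'s sep, end and file options."""
    sep = options.get("sep", " ")
    end = options.get("end", "\n")
    stream = options.get("file", sys.stdout)
    # str() each value so that non-string arguments can be joined
    stream.write(sep.join([str(value) for value in values]) + end)

emit("hello", "world")
emit("error", file=sys.stderr)
```

Options are collected through **options rather than named keyword arguments because keyword-only arguments are not available in Python 2.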

Backporting Note

The print() function was backported to Python 2.6 by way of the __future__ module. from __future__ import print_function exposes print() as a function and removes the ability to use print as a statement.

Executing Arbitrary Code

exec Statement

As with the print function, exec has also become a function. exec() is used for the dynamic execution of arbitrary Python code either as a string or a code object. Using exec() as a function is similar to exec as a statement.

Where the exec statement used to take the format exec some_code in global_namespace, local_namespace, the order is still the same, but just passed as parameters to the exec() function.

Python 2 format                   Python 3 format
exec "print 'hello'"              exec("print('hello')")
exec code in global_ns            exec(code, global_ns)
exec code in global_ns, local_ns  exec(code, global_ns, local_ns)

execfile Function

Starting with Python 3, the execfile() built-in function is no longer available. An alternative is to use the compile() function in conjunction with exec(). compile() can create a code object from the contents of a file, which can then be passed into exec().

exec(compile(source_code, source_file_name, "exec"))
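
The one-liner above can be wrapped into a reusable function. The following is a minimal sketch; exec_file() is a hypothetical name, and unlike the old execfile(), it runs the code in a fresh namespace by default instead of the caller's.

```python
def exec_file(path, global_ns=None, local_ns=None):
    """Run the Python source stored at path, roughly as execfile() did."""
    if global_ns is None:
        # Assumption: a fresh namespace is an acceptable default
        global_ns = {"__name__": "__main__"}
    with open(path) as source_file:
        source_code = source_file.read()
    # Passing the path to compile() keeps file names accurate in tracebacks
    code = compile(source_code, path, "exec")
    if local_ns is None:
        exec(code, global_ns)
    else:
        exec(code, global_ns, local_ns)
    return global_ns
```

Returning the namespace lets the caller inspect whatever names the executed file defined.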

Exceptions

Exceptions were changed in a few ways for Python 3. First, strings are no longer usable as exceptions. Additionally, the raise syntax no longer accepts comma-separated arguments, instead working with exception instances. Perhaps the largest difference in Python 3 is that an except clause can bind the exception object only via the as keyword, a syntax which was introduced in 2.6.

Raising Exceptions

Raising an exception creates an instance of Exception or a subclass, so it follows that the raise statement uses the same syntax required to create other class instances.

Python 2 format              Python 3 format
raise IOError, "file error"  raise IOError("file error")
raise "ahhhh!"               raise Exception("ahhhh!")
raise TypeError, msg, tb     raise TypeError(msg).with_traceback(tb)
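
To illustrate the three-argument form in its Python 3 spelling, the following is a minimal sketch of re-raising one exception as another while keeping the original traceback. It runs on Python 3 only, and parse_port() is a hypothetical function.

```python
import sys

def parse_port(text):
    """Convert text to a port number, re-raising failures as TypeError."""
    try:
        return int(text)
    except ValueError:
        # Keep the traceback of the original failure
        tb = sys.exc_info()[2]
        raise TypeError("port must be numeric: %r" % (text,)).with_traceback(tb)
```

with_traceback() is called on the exception instance and returns that same instance, which is why it can be used directly in the raise statement.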

Handling Exceptions

A major change to exception handling is the use of the as keyword for assignment of the exception object. Catching multiple exception classes remains the same as before, implemented using a tuple with explicit parentheses.

Python 2 format:

try:
    fn()
except IOError, err:
    print err

Python 3 format:

try:
    fn()
except IOError as err:
    print(err)

Python 2 format, multiple exception classes:

try:
    fn()
except (IOError, TypeError), err:
    print err

Python 3 format, multiple exception classes:

try:
    fn()
except (IOError, TypeError) as err:
    print(err)

Backporting Note

The except Exception as var syntax was backported to Python 2.6, which allows you to use either the 2.x or 3.x way simultaneously.

Because the as keyword isn’t supported in except clauses in Python 2.5 and before, code which must run on versions with and without as support can use the following idiom.

import sys
try:
    fn()
except (IOError, TypeError):
    err = sys.exc_info()[1]
    print(err)

See sys.exc_info() for further details on its use.

Exceptions from Generators

Generators have a throw() method which raises an exception at the point where the generator is paused, then returns the next value the generator yields. The throw() method follows rules similar to those in Raising Exceptions: string exceptions are not allowed, and custom messages come in the form of an exception instance. The general case of calling gen.throw(Exception) remains the same across 2 and 3.

Python 2 format                     Python 3 format
gen.throw(ValueError, "bad value")  gen.throw(ValueError("bad value"))
gen.throw("bad value")              Not supported (string exceptions were removed)
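
As an illustration of throw() with an exception instance, the following is a minimal sketch; counter() is a hypothetical generator which treats ValueError as a request to start over.

```python
def counter():
    """Yield 0, 1, 2, ... and restart from 0 when ValueError is thrown in."""
    n = 0
    while True:
        try:
            yield n
            n += 1
        except ValueError:
            # The exception surfaces at the paused yield expression
            n = 0

gen = counter()
first = next(gen)                            # 0
second = next(gen)                           # 1
restarted = gen.throw(ValueError("reset"))   # handled inside; yields 0 again
```

Because the generator handles the exception itself, throw() returns the next yielded value instead of propagating ValueError to the caller.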

Division

Python 3 makes true division the default behavior of the / division operator, as proposed and outlined in PEP 238. Although the functionality has been available since Python 2.2, it did not become the default operation for / until Python 3.

True division works for int() and long() as well as float(). In all cases a float() object is the return value.

PEP 238 Excerpt

Note that for int and long arguments, true division may lose information; this is in the nature of true division (as long as rationals are not in the language). Algorithms that consciously use longs should consider using //, as true division of longs retains no more than 53 bits of precision (on most platforms).

The following examples show the difference between the division operator on Python 2 and 3.

Python 2 format:

>>> 1/10
0
>>> 1.0/10
0.10000000000000001
>>> 10/1
10

Python 3 format:

>>> 1/10
0.1
>>> 1.0/10
0.1
>>> 10/1
10.0

Preparation Options

Whether you are preparing to move from Python 2 or you need to support both 2 and 3 at the same time, there are two ways to get true division on Python 2.

True division can be enabled in code, one module at a time, with from __future__ import division, which is available dating back to Python 2.2.

Additionally, the -Q command line option to the Python 2 interpreter allows the user to globally define what will occur during division operations. The option accepts four arguments.

  • old is the default in Python 2.2 and forces the “classic” division operator to be enabled.
  • warn causes “classic” division of int and long to issue a DeprecationWarning.
  • warnall issues DeprecationWarning on “classic” division of float or complex numbers, in addition to the characteristics of warn.
  • new changes the / division operator to use true division. This is the same result as using from __future__ import division.

The // floor division operator remains unchanged, providing the same functionality across both versions.
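
Code which can rely on neither the __future__ import nor the -Q option can also be explicit about which kind of division it wants. The following is a minimal sketch; split_evenly() is a hypothetical function.

```python
def split_evenly(total, parts):
    """Return (true_share, whole_share, remainder) for dividing total by parts."""
    true_share = total / float(parts)  # float() forces true division on 2.x and 3.x
    whole_share = total // parts       # floor division is the same on both
    remainder = total % parts
    return true_share, whole_share, remainder

result = split_evenly(7, 2)  # (3.5, 3, 1)
```

Converting one operand with float() gives the same answer on both versions, at the cost of the precision limits described in the PEP 238 excerpt above.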

Long Integers

PEP 237, first drafted in 2001, introduced an effort to remove the distinction between int and long integers. The work was completed in a three-phased approach spanning Python 2.2 through 2.4. Python 3.0 added a final step by officially removing the long() type and long literals (e.g., 123456789L).

In places where you would use long(), int() is the replacement and it will store the value in the correct internal representation.

In places where you used the L suffix to produce long literals, removal of the L is necessary, otherwise a SyntaxError will be raised.
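
Type checks which must accept both int and long on Python 2 need some care in a shared codebase. The following is a minimal sketch; integer_types and is_integer() are hypothetical names.

```python
import sys

if sys.version_info[0] >= 3:
    integer_types = (int,)
else:
    integer_types = (int, long)  # the long name only exists on Python 2

def is_integer(value):
    """Report whether value is an integer of any size."""
    return isinstance(value, integer_types)

# int() produces a value of whatever size is needed; no L suffix required
big = int("123456789012345678901234567890")
```

The reference to long is only evaluated on Python 2, so the module still imports cleanly on Python 3.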

Iterators

Python 3 introduced a next() function to replace the next() method on iterator objects. Rather than calling the method on the iterator, the next() function is called with the iterator object as its sole parameter, which calls the underlying __next__() method.

Iterating

Given the following simple generator function, getting the values one-by-one is done slightly differently across versions.

def word(letters):
    for letter in letters:
        yield letter

Python 2 format:

>>> x = word("hi")
>>> x.next()
'h'
>>> x.next()
'i'
>>> x.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Python 3 format:

>>> x = word("hi")
>>> next(x)
'h'
>>> next(x)
'i'
>>> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

Backporting Note

The next() function was backported to Python 2.6 which allows you to use either the 2.x or 3.x way simultaneously.

Creating Iterators

Due to the change discussed in Iterating, creation of iterators is also slightly different in Python 3. The next() function now calls an iterator’s __next__() special method, whereas Python 2.5 and prior call the iterator’s next() method directly. The good thing is that it’s easy to write iterators that can run on versions prior to 2.6 and 3 at the same time.

class Countdown(object):
    def __init__(self, max):
        self.value = max

    def __iter__(self):
        return self

    def __next__(self):
        """2.6-3.x version"""
        return self.next()

    def next(self):
        """2.5 version"""
        self.value -= 1
        if self.value == 0:
            raise StopIteration
        else:
            return self.value
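
An alternative to delegating from one method to the other is to alias the two names in the class body. The following is a minimal sketch of that variation; Countup is a hypothetical class, not part of the example above.

```python
class Countup(object):
    """Iterate over 1, 2, ..., limit."""

    def __init__(self, limit):
        self.value = 0
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        if self.value >= self.limit:
            raise StopIteration
        self.value += 1
        return self.value

    # Python 2 looks up next(); point it at the same function
    next = __next__

numbers = list(Countup(3))  # [1, 2, 3]
```

Both names refer to one function, so there is a single implementation to maintain.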

Dictionaries

A number of changes were made to the dict type to better integrate it with the current trends in the language. PEP 3106, which was adopted in 3.0, outlines the introduction of a new feature called dictionary views, a more lightweight replacement for the lists returned by several methods. Additionally, the PEP explains the removal of the iter* methods of dictionaries. Another removal is the dict.has_key() method.

Supporting Dictionary Views

Python 3.0 introduced the concept of views, or a dynamic peek into the contents of a dict. Even when a dictionary is mutated, all of its views are kept in-sync to reflect the current state of the dictionary. Views are supported on dict.items(), dict.keys(), and dict.values() to replace the old-style form which returned a list.

Backporting Note

Since the view concept has no equivalent in 2.x, the backporting of views was done in Python 2.7 under different names. View-returning methods are prefixed with view, while the list-returning methods remain unchanged.

Supporting 2.7 and 3.x

Due to the name difference in the APIs, supporting views in both 2.7 and 3.x results in a less than beautiful snippet of code.

import sys

# Views are named items() on 3.x and viewitems() on 2.7
view_attr = "items" if sys.version_info.major == 3 else "viewitems"
location = {"city" : "Chicago", "state" : "Illinois"}

def view_location():
    for key, value in getattr(location, view_attr)():
        print("{}: {}".format(key, value))

>>> view_location()
city: Chicago
state: Illinois

Supporting pre-2.7 and 3.x

If your project supports versions prior to 2.7 where there is no concept of a view, you can create a list from a view in order to safely remain compatible across versions.

keys = list(location.keys())

Without a list, you should be aware of how views function when the dictionary is changed. In a lot of cases you will be fine as-is. Other cases may require an explicit list like above.

Python 2.6.5+ (release26-maint:83007:83008, Jul 27 2010, 22:52:5)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> location = {"city" : "Chicago", "state" : "Illinois"}
>>> keys = location.keys()
>>> print(keys)
['city', 'state']
>>> del location["city"]
>>> print(keys)
['city', 'state']

Python 3.2a0 (py3k:83172M, Jul 27 2010, 23:01:10)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> location = {"city" : "Chicago", "state" : "Illinois"}
>>> keys = location.keys()
>>> print(keys)
dict_keys(['city', 'state'])
>>> del location["city"]
>>> print(keys)
dict_keys(['state'])

Notice that the 3.2 code is using a view for dict.keys, which results in the keys object being updated after the dictionary was modified. Contrast that with the 2.6 example which uses a list for keys, which does not stay in-sync with the dictionary the keys came from.

Replacing iter* methods

Use of dict.iteritems() should be replaced by the following familiar looping construct. dict.itervalues() is replaced in the same way, looping over d.values().

for key, value in d.items():
    print(key, value)

The same could be done for dict.iterkeys(); however, for key in d is a more common option.
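
Code which must keep the lazy behavior of iteritems() on Python 2 while still running on Python 3 can hide the difference behind a helper. The following is a minimal sketch; iter_items() is a hypothetical name.

```python
def iter_items(mapping):
    """Return an iterator over (key, value) pairs on both 2.x and 3.x."""
    if hasattr(mapping, "iteritems"):
        # Python 2: avoid building the full list that items() returns
        return mapping.iteritems()
    # Python 3: items() is already a lightweight view
    return iter(mapping.items())

location = {"city": "Chicago", "state": "Illinois"}
pairs = sorted(iter_items(location))
```

Checking for the attribute rather than the interpreter version also covers dict-like objects which only provide items().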

Replacing dict.has_key()

The dict.has_key() method is not supported in Python 3.0, in favor of the key in d idiom which was introduced in 2.2. Formal deprecation of dict.has_key() began in 2.6.

Python 2 format:

if d.has_key("foo"):
    bar()

Python 3 format:

if "foo" in d:
    bar()

Strategies

When planning support for both Python 2 and 3 simultaneously, there are a number of solutions to a number of problems. Perhaps the most important prerequisite to supporting multiple (or even one!) versions of Python is tests. Without tests, it will be very hard to tell if your code works the way you intend it to across versions, especially across 2 and 3. Consider expanding your test coverage in order to ensure the quality of your application as you introduce your users to Python 3.

What follows are three strategies for simultaneously supporting Python 2 and 3. An option not listed here is to create two branches for your project and support 2 and 3 separately. That method may require more work for a developer, but it’s straightforward to do and doesn’t require any discussion here.

Single Codebase

One possibility for supporting both Python 2 and its backwards incompatible successor, Python 3, is to write all of your code in a single codebase. Some users have been able to support ranges as wide as Python 2.0 through 3.0 all from one source.

Imports

As listed in Organizational Changes, a number of package and module names have changed in Python 3. In order to support import name changes, a common idiom is to try one name and fall back to another.

try:
    # 3.x name
    import configparser
except ImportError:
    # 2.x name
    import ConfigParser as configparser

However, the further back your support needs to go, the further you get away from the as keyword and its niceties. If you need to support Python 2.4 and prior, your conditional importing can use __import__().

try:
    # 3.x name
    import configparser
except ImportError:
    # 2.x name
    configparser = __import__("ConfigParser")

Defeating Deprecation

Over the 10 years since Python 2.0 came out in October 2000, plenty of things have come and gone in the language. Due to the deprecation and introduction of some things, codebases which support a wide range of versions will need to do some operations in two different ways depending on the runtime version.

A good way to support differing underlying implementations is to create a compatibility module. You can use sys.version_info to figure out which version you are running on.

compat/
    __init__.py
    two.py
    three.py

compat/__init__.py

import sys
if sys.version_info[0] == 2:
    from two import *
else:
    from .three import *

The idea here is that your 2.x-specific code resides in compat.two and is only accessed when you import compat while running on 2.x. The same for compat.three – it can use the new way to do things.

compat/two.py

from types import DictType
def is_dict(obj):
    return type(obj) == DictType

As you may recognize, two uses an old method for figuring out if some object is a dictionary.

compat/three.py

def is_dict(obj):
    return isinstance(obj, dict)

three uses the newer method, which also works on recent versions of Python 2.x.
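
For small projects, the same version switch can live in a single module instead of a package. The following is a minimal sketch; PY2, string_types, iter_values(), and is_string() are all hypothetical names.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    string_types = (basestring,)  # only evaluated on Python 2

    def iter_values(mapping):
        return mapping.itervalues()
else:
    string_types = (str,)

    def iter_values(mapping):
        return iter(mapping.values())

def is_string(obj):
    """Report whether obj is a text string on the running version."""
    return isinstance(obj, string_types)
```

Names which exist on only one version are safe inside the branch for that version, because the other branch is never executed.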

Introducing six

A compatibility-focused package currently exists to support this exact method of working with Python 2 and 3 at the same time. six is a package which includes compatibility layers for imports, constants, syntax, and other areas which differ between the two versions.

Pure 2.x Source

Maintaining individual branches for 2.x and 3.x support isn’t a well-liked solution for most projects. With the introduction of 2to3, projects can keep their source in 2.x format and run 2to3 when preparing for a release. 2to3 applies what are known as “fixers” in lib2to3 to convert 2.x code into the equivalent 3.x code.

example.py

# 2.x print without trailing new-line
print "hello world",

Output:

$ 2to3 example.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- example.py (original)
+++ example.py (refactored)
@@ -1,2 +1,2 @@
 # 2.x print without trailing new-line
-print "hello world",
+print("hello world", end=' ')
RefactoringTool: Files that need to be modified:
RefactoringTool: example.py

See the documentation for 2to3 for all available options.

Tests

In order to successfully use 2to3 at release time, your project should have an exhaustive test suite. Although 2to3 intends to be as complete a tool as possible, there may be situations it does not currently handle which could leave your project in a state different than expected.

Along with tests, you should manually review the changes made by 2to3 to ensure that your project was modified in a correct manner.

Pure 3.x Source

Thanks to 3to2, the possibility exists to have 3.x source as your base and convert it to 2.x – the opposite of Pure 2.x Source.

The project is currently underway for the Google Summer of Code, an extension of the work done during the 2009 GSoC. For further details and examples, see http://bitbucket.org/amentajo/lib3to2.

As with the 2to3 method, a good test suite is instrumental in successfully using a 3.x base and 3to2 for conversion.