Beginner’s Guide to Python Core Development

Author: Brian Curtin <curtin@acm.org>
Date: October 26, 2010

Introduction

The development of Python takes a wide range of individuals from a wide range of backgrounds and forms a successful team. From students to retirees, from those in the East to the West, it takes all kinds. It takes people like you.

When starting out in a project like Python, there are a number of guidelines to ease your introduction and maximize your success. What follows is a series of those guidelines, from getting started to getting your work accepted to the core, and many things in between.

The Setup

Before you can start working, you need to acquire a few things. You’ll have to download some software and code in order to get started.

Source

In order to work on Python you’ll need to get its source. The development team uses Subversion for source control, so you should start by downloading it. Using Subversion to check out the source gives you a local copy of the most up-to-date version of each file in the project.

What you check out depends on what you are interested in working on. The following table shows some popular paths.

Version   Path
2.7       branches/release27-maint
3.1       branches/release31-maint
3.2       branches/py3k

If you are interested in working on 3.2, you’ll want to look at branches/py3k. To check it out, run this command:

svn co http://svn.python.org/projects/python/branches/py3k

That command will check out the latest files in the py3k branch and put them in a subdirectory named py3k. To specify a different directory name, add it at the end of the command (separated by a space).

For other versions, replace branches/py3k in the above example with the path you are interested in. This page shows all available paths.
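As a rough illustration of how the table above maps to checkout commands, the following sketch builds the svn co command for a given version. The names BASE_URL, BRANCH_PATHS, and checkout_command are hypothetical helpers invented for this example; only the URL and the branch paths come from this guide. The sketch only builds and prints the command, it does not run Subversion.

```python
# Hypothetical helper: maps the versions from the table to their branch paths.
BASE_URL = "http://svn.python.org/projects/python/"

BRANCH_PATHS = {
    "2.7": "branches/release27-maint",
    "3.1": "branches/release31-maint",
    "3.2": "branches/py3k",
}


def checkout_command(version, directory=None):
    """Return the svn command (as a list of arguments) for a version.

    If directory is given, it is appended so the checkout lands in that
    directory instead of the default one named after the branch.
    """
    command = ["svn", "co", BASE_URL + BRANCH_PATHS[version]]
    if directory is not None:
        command.append(directory)
    return command


# Print the commands rather than running them.
print(" ".join(checkout_command("3.2")))
print(" ".join(checkout_command("2.7", "python27")))
```

To actually run a command built this way, you could pass the list to subprocess.check_call, assuming svn is installed and on your PATH.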

For more information on Subversion, see the free online book Version Control with Subversion.

Now that you have the source, you’ll want to build it.

Building the Source

Windows

In order to build Python on Windows, you will need to have Microsoft Visual Studio 2008. The zero-cost version, called Express, is the minimum required version.

Use Visual Studio 2008 to open the PCbuild\pcbuild.sln file in your checkout folder. Using the Build menu, choose Build Solution. After the build completes, you’ll have a new Python interpreter at PCbuild\python.exe.

Tip

You can build a debug version of Python by choosing the Debug build configuration, which produces PCbuild\python_d.exe.

You may have noticed that a number of extension modules failed to build, such as _ssl and _sqlite3. Several subprojects require the source of external libraries, i.e., more checkouts. Unless you intend to work on any of the failed modules, you can safely move along. If you require any of those modules, please read through the PCbuild\readme.txt file and the Tools\buildbot\external*.bat files for further information.

Mac, Linux, and similar

Python development on UNIX-like systems involves a set of tools typically available through your package manager. For your particular operating system, please see your documentation for how best to obtain these tools.

Packages

On Ubuntu you’ll need to have the build-essential package installed.

To begin, you must run the configure script:

./configure

Tip

You can build a debug version of Python by adding the --with-pydebug option to configure.

This checks for the existence of various dependencies and prepares the environment for compilation. Once it finishes, run make:

make

make will compile the source tree with GCC and output the python binary in the current directory.
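If you find yourself rebuilding often, the two steps above can be scripted. This is only a sketch: build_commands and run_build are hypothetical names, and run_build assumes it is pointed at the top-level directory of your checkout on a UNIX-like system with the build tools installed. Nothing is executed when the block is loaded; the last lines only print the commands.

```python
import subprocess


def build_commands(debug=False):
    """Return the configure and make commands described above.

    With debug=True, the --with-pydebug option is passed to configure.
    """
    configure = ["./configure"]
    if debug:
        configure.append("--with-pydebug")
    return [configure, ["make"]]


def run_build(checkout_dir, debug=False):
    """Run each build step inside checkout_dir, stopping at the first failure."""
    for command in build_commands(debug):
        # check_call raises CalledProcessError if a step exits with an error.
        subprocess.check_call(command, cwd=checkout_dir)


# Show what would be run for a debug build.
for command in build_commands(debug=True):
    print(" ".join(command))
```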

Building the Documentation

In order to work with the documentation, it’s helpful to know how to build it.

The documentation is located in the Doc subdirectory of your checkout. Change into that directory and run the make command (make.bat on Windows) to see a list of options. The most common option is make html, which builds the same documentation you see online.

make html will first check that you have the required dependencies (Sphinx, docutils, Jinja, and Pygments), and if not, it will obtain them for you. The script will then build the documentation and put the output in the build/html subdirectory. Opening build/html/index.html is the same as going to docs.python.org.

Additionally, http://www.python.org/dev/doc/ contains a number of resources for the Python documentation.

Running Python

The first thing you’ll want to do with your newly built Python interpreter is run it. Depending on your platform, you invoke the interpreter in different ways. When you run it, you’ll get that familiar >>> prompt that we all know and love.

Mac: ./python.exe
Windows: PCbuild\python.exe
Linux: ./python

Regression Tests

A more helpful start would be to run the regression tests to see what works and what doesn’t. For brevity, the examples from here on assume you are on Linux:

./python -m test.regrtest
<list of tests as they are run>
<number of tests that are OK>
<list of tests that failed>
<list of tests skipped>

Hopefully nothing fails. If anything does, that might be a place to start working. In any case, now you know how to run the tests. For more test info, add the -h option to regrtest.py to receive the full list of options.
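The per-platform interpreter paths and the regrtest invocation can be combined in a small helper. This is a sketch under a few assumptions: INTERPRETERS and regrtest_command are hypothetical names, the platform keys follow the values of sys.platform, and the test names are passed as extra arguments so that only those tests run (test_os is used purely as an example).

```python
import sys

# Interpreter paths from the "Running Python" section, keyed by sys.platform.
# Any platform not listed here falls back to the Linux path.
INTERPRETERS = {
    "darwin": "./python.exe",
    "win32": "PCbuild\\python.exe",
}


def regrtest_command(platform=sys.platform, tests=()):
    """Return the command that runs the regression tests on a platform.

    tests is an optional sequence of test names; when it is empty,
    the whole suite is run.
    """
    interpreter = INTERPRETERS.get(platform, "./python")
    return [interpreter, "-m", "test.regrtest"] + list(tests)


# Print the command for the current platform, limited to one test.
print(" ".join(regrtest_command(tests=["test_os"])))
```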

Getting to Work

Once you’re all set up, it’s time to get to the good stuff. There are a number of ways to get started, but before you begin it would be helpful to know and understand the following.

  1. Read PEP 7 and PEP 8. These are the style guides for the Python codebase. They specify rules such as using spaces instead of tabs. The PEPs are a quick and easy read and ensure that you are on the same page as everyone else.
  2. Always include tests when you change code! Without tests, you can’t prove that your code works. Try it, I bet you can’t do it :) As best you can, come up with all of the scenarios your code can go through. The tests for each module are located in the Lib/test directory.
  3. Be open. In an environment like Python’s, a number of people will see your work as it goes through the stages of development. Some of them have worked with the code you are fixing. Code reviews and their resulting comments are a necessary step that will not only help you write better code, but also help Python by keeping its codebase in better shape.

Working on Issues

The Python core team uses an issue tracker called Roundup, located at bugs.python.org. There you will find the list of all bug reports and feature requests for the code and documentation.

Bugs

Even the best projects in the world have bugs. Bugs in the code may even be the reason you are reading this document right now. When looking for bugs to work on, there are a few things to keep in mind.

  1. The regression tests are a great place to start. Users have many different hardware and software configurations, so running the tests may show failures specific to your machine. You’ll eventually have to run the tests anyway, so why not start there?

  2. The easy issues summary on the left navigation pane contains a number of issues tagged as being good for beginners. These issues tend to involve small isolated fixes, easy enough for someone to fix without getting their hands too dirty.

  3. Queries are your friend. The default view when going to bugs.python.org shows every open issue, from the most recently acted upon to the oldest. When you are starting out, you may want to narrow your search. On the left navigation pane, hit the edit link on Your Queries. From there you can create or edit queries, with options for every field. Interested in 2.7 crash bugs on Windows? Select 2.7 from the versions dropdown, select crash from the type dropdown, select Windows from the component dropdown, and then hit Search.

    • Check out the needs patch and needs review queries. needs patch is exactly that: a list of issues which need a patch. needs review is useful because without reviews, many issues get stalled. It’s even more useful to a new contributor because the more you review other people’s work, the more you’ll learn, and the more your name will be seen. When you need a review, it’s much easier to get others to review your work when they’ve seen that you offer to do the same.
  4. It’s good to be nosy. In Roundup, the nosy list is a list of interested users. If you find an interesting issue that you don’t immediately have a comment for, you can add yourself as nosy and you’ll be emailed when the issue receives comments. This works as a bookmark that lets you come back to things later without missing any chatter in between.
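To make the idea of a query concrete, here is a small sketch of the "2.7 crash bugs on Windows" search expressed as a filter over a list of records. This is not Roundup’s API: ISSUES and run_query are hypothetical, and the issue ids, titles, and field values are made up purely for illustration.

```python
# Hypothetical issue records. None of these come from bugs.python.org.
ISSUES = [
    {"id": 1, "title": "example crash report",
     "versions": ["2.7"], "type": "crash", "components": ["Windows"]},
    {"id": 2, "title": "example documentation typo",
     "versions": ["3.2"], "type": "behavior", "components": ["Documentation"]},
    {"id": 3, "title": "example crash in a library module",
     "versions": ["2.7", "3.2"], "type": "crash", "components": ["Library (Lib)"]},
]


def run_query(issues, version=None, issue_type=None, component=None):
    """Return the issues matching every criterion that was given.

    A criterion left as None is ignored, just like a dropdown
    left unselected in a search form.
    """
    results = []
    for issue in issues:
        if version is not None and version not in issue["versions"]:
            continue
        if issue_type is not None and issue["type"] != issue_type:
            continue
        if component is not None and component not in issue["components"]:
            continue
        results.append(issue)
    return results


crash_bugs = run_query(ISSUES, version="2.7", issue_type="crash",
                       component="Windows")
print([issue["id"] for issue in crash_bugs])
```

Each field narrows the result set further, which is why starting with one or two fields and adding more is usually the easiest way to find a manageable list.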

Bug fixes will most often end up in numerous branches, as expected when versions build off of previous versions. Python’s policy of bug fixes typically states that the previous version will receive bug fixes for some time, so if you fix a bug in 2.7, it may also end up in 2.6 if applicable. Fixing the bug for the most recent version is often enough to get the job done. If you feel inclined and are comfortable, look at the maintenance branches to see if your work applies there as well and if any additional work is needed.

Features

New features to Python are often the most fun parts to work on. When thinking about new features, there are a few things to keep in mind.

  1. New features can only go into new major or minor versions, where releases are numbered Major.Minor.Revision. This means that once 3.2.0 is released, no new features can go into 3.2.1 or beyond – they will go into 3.3.0. Be sure to check the release schedule (defined in a release-specific PEP) to know what the window for new features is for the specific release.
  2. Major features should be discussed with the community before being considered for inclusion. Think you have a way to fix the GIL? You should probably consult the python-ideas mailing list first. The list is used for fleshing out major changes in the language, from adding new features to changing old ones. Have an addition to expand on an existing API? It’s usually fine to go ahead to suggest and discuss it on Roundup.
  3. New modules need to go through the PEP process before inclusion. There are a lot of great modules out there, but they need to be proven in the wild first, and then examined to find how and where they fit in.
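The version numbering rule in the first point can be sketched in a few lines. The function names here are hypothetical, and the sketch deliberately ignores the feature windows set by the release schedule as well as major version bumps; it only encodes the rule that new features wait for the next minor version.

```python
def parse_version(version):
    """Split a 'Major.Minor.Revision' string into three integers."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError("expected Major.Minor.Revision, got %r" % (version,))
    major, minor, revision = (int(part) for part in parts)
    return major, minor, revision


def is_bugfix_release(version):
    """A release with a non-zero revision only receives bug fixes."""
    return parse_version(version)[2] > 0


def next_feature_release(latest_release):
    """Return the earliest version that can accept a new feature,
    given the latest release of the current minor series."""
    major, minor, _revision = parse_version(latest_release)
    return "%d.%d.0" % (major, minor + 1)


# Once 3.2.0 is out, new features are headed for 3.3.0.
print(next_feature_release("3.2.0"))
print(is_bugfix_release("3.2.1"))
```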

When writing code for a new feature, you should consider the following:

  1. New features should come with documentation. Ideally you should include the shortest, most straightforward explanation of your code as you can. If you aren’t comfortable with the way documentation is written for Python, do your best to write as much helpful information as you can and someone can help you with the format.
  2. New features should come with tests. As mentioned earlier, untested code can’t be shown to work. Python uses its own unittest module for testing and includes a test runner to run all available tests. Do your best to exercise any and all cases your code can go through. Should your function raise a ValueError when given certain values? Make sure you have a test for it.
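As an illustration of the ValueError case mentioned above, here is a minimal unittest sketch. The function parse_percentage is a hypothetical stand-in for whatever you are adding; it is not part of the standard library. Real tests would live in the appropriate file under Lib/test, and the run_tests helper exists only so this example can be run on its own.

```python
import unittest


def parse_percentage(text):
    """Hypothetical function under test: convert text to an integer
    percentage, rejecting values outside the range 0 to 100."""
    value = int(text)  # int() itself raises ValueError for non-numeric text
    if not 0 <= value <= 100:
        raise ValueError("percentage out of range: %r" % (value,))
    return value


class ParsePercentageTests(unittest.TestCase):

    def test_valid_value(self):
        self.assertEqual(parse_percentage("42"), 42)

    def test_out_of_range_raises(self):
        # The error case needs a test just as much as the success case.
        self.assertRaises(ValueError, parse_percentage, "101")

    def test_non_numeric_raises(self):
        self.assertRaises(ValueError, parse_percentage, "abc")


def run_tests():
    """Load and run the test case, returning the result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(ParsePercentageTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that the tests cover the boundaries of the accepted range as well as input that is not a number at all; those edge cases are usually where bugs hide.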

Getting Your Work Accepted

After the work has been done, it’s time to show what you’ve completed and start the road to getting it into a Python release. In some cases, this might be where most of your time is spent for a particular issue. In other cases, this step can be a breeze. In any case, there are processes in place to ensure that Python’s quality and dependability remain intact.

Creating a Patch

So you’ve already fixed that tricky bug and added a test case for it? The next step is to get that code into a core developer’s hands. The development team takes patches – files showing the difference between the original file and your changes – from contributors like you all the time.

The best way to generate a patch is to use the diff command within Subversion.

svn diff > issue1234.diff

Here are a few tips for making sure your patch is easy to work with.

  1. Create your patches using the unified diff format. Subversion’s diff command uses the unified format by default. This is the format that core developers are used to working with, so it makes the process go quicker if your patch is in the expected format.
  2. Create your patches from the top-level directory of your checkout, e.g., the py3k directory rather than the py3k\Doc directory.
  3. Name your patch appropriately. my_fix.diff isn’t very descriptive and will get lost in a sea of other diff files. issue1234_doc_typos.diff tells a developer exactly what’s in it. If you need to upload a second patch, a common idiom is to version them, so adding v2 on your second patch makes it stand out from the first.
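If you have never looked inside a patch, the following sketch shows what the unified format contains. It uses the standard library’s difflib module purely for illustration; svn diff remains the way to produce patches for submission. The file path and the file contents are made up for the example, and make_patch is a hypothetical helper name.

```python
import difflib


def make_patch(original, modified, path):
    """Return a unified diff between two versions of a file's text."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=path + " (original)",
        tofile=path + " (working copy)",
    )
    return "".join(diff)


# Made-up file contents: the change fixes a typo in a string.
original = "def greet():\n    print('helo')\n"
modified = "def greet():\n    print('hello')\n"

patch = make_patch(original, modified, "Lib/example.py")
print(patch)
```

In the output, lines starting with a minus sign were removed, lines starting with a plus sign were added, and unmarked lines are context that helps a reviewer see where the change sits.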

Updating the Issue

Now that you’ve written the code or documentation and have your patch ready, it’s time to let everyone know what you have done. In order to maximize communication and minimize ambiguity, there are a number of ways to effectively display your work.

Comments

Writing a good comment for an issue can be an easy task. Just write your thoughts about a bug or feature. However, once you’ve made an attempt at fixing that issue, the task gets a little more complex.

  1. Be thorough. If you questioned yourself while working on the issue, someone else would have done the same, and may do so while looking at your patch. Trivial things like “I make sure the file exists before opening it” can be left out. More complex things like “Sometimes foo() can return None which used to cause bar() to raise a BlahError. I fixed bar() to handle that case by...” can be helpful and answer questions before they get asked.
  2. Keep the comments on-topic and specific to the issue at hand. If you found another issue in the process, it might deserve a quick mention, but it’s better off being fully discussed in its own separate issue.
  3. Use tracker links to cross-reference other issues, comments, or revisions which supplement your comment.

Uploading a Patch

Patches can be uploaded with a comment or by themselves, along with a description of the file itself. The File Description box is useful for stating which version the patch is for, and for which revision it was created against. A good description would be “patch for py3k r81842”.
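The naming and description conventions can be captured in two small helpers. Both function names are hypothetical, and the exact placement of the version suffix in the file name is an assumption; this guide only says that adding v2 to a second patch is a common idiom.

```python
def patch_filename(issue_number, summary, version=1):
    """Build a descriptive patch file name such as issue1234_doc_typos.diff.

    For a second or later patch, a version suffix is added. Where the
    suffix goes is an assumption made for this example.
    """
    name = "issue%d_%s" % (issue_number, summary)
    if version > 1:
        name += "_v%d" % version
    return name + ".diff"


def patch_description(branch, revision):
    """Build a File Description stating the branch and revision."""
    return "patch for %s r%d" % (branch, revision)


print(patch_filename(1234, "doc_typos"))
print(patch_filename(1234, "doc_typos", version=2))
print(patch_description("py3k", 81842))
```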

Classification Fields

When updating an issue, it’s helpful to make sure all of the various fields are correct for the current state. The fields are used in search queries, so at certain stages in the process the issue will show up in different queries and attract different sets of people. If you don’t have access to update certain fields, feel free to leave a comment suggesting that a field be changed from one option to another.

Title
The title of an issue should be the shortest, most clear description of the issue at hand. As investigation into an issue sometimes shows, the issue itself may be different than originally reported. Updating the title to reflect the currently understood issue is always welcome.
Type
Some types convey information better than others, such as the crash type. behavior is a type that also applies for crashes, but given that crash situations are pretty important, it’s better to choose the more descriptive type.
Components
There are a number of component types, and as issues progress through discussion, sometimes components change. This is often seen when a bug in some code is reported, but it is later found to be a documentation issue. In that case, the component might change from Library (Lib) to Documentation.
Versions
The version multi-select field is used to show the versions which will be receiving the fix to the bug or new feature. As time goes on, certain versions are no longer applicable to a particular issue, either because a feature window has closed or because a bug is no longer seen on certain versions.
Nosy List
As mentioned earlier, this is a list of interested users. If you know of a user who may be able to comment on the issue, you can add them to the list.

Waiting for the Commit

Outside of trivial issues such as typos, taking an issue from its submission to its checkin can take anywhere from days to weeks to, in some cases, months. It all depends on how well the issue is understood, how well the fix has been written, and the available time of one of the core developers. The core developers are responsible for checking in your patches to the appropriate areas of the codebase, and doing so takes time.

Code Review

All patches go through some form of code review by at least one person. Trivial fixes are often accepted with a simple review by a core developer. More complex fixes are often reviewed by more than one person, usually including the maintainer of that specific area of code.

In any event, code reviews are in place to ensure the quality of the codebase. Having more eyes on the code hopefully results in a lower issue rate with the code in question. When going through the process of working with issues in Python, it helps to be receptive to comments or constructive criticism of your work. The faster you are able to react and the more open you are to suggestion, the more likely your work will be quickly accepted.

A Matter of Time

Sometimes it all comes down to time. You found a bug, you fixed it, you submitted it, it looks alright. As is often the case in free software, time is one of the bottlenecks. There are typically fewer people entrusted with commit access than there are contributors at any given time, meaning that core developers’ time has to be split up not only among the things they want to do, but also the things that you want to do.

Conclusion

Working on Python can be a very rewarding experience. Many people choose to give back by working on the code because Python gave them the power to solve tough problems. It gave them the power to make the next best tool and to benefit in a number of ways. For others, giving back is a form of education, a way to learn the depths of the language and expand their knowledge while providing a service. Whatever your reasons, your contributions to the community are invaluable, and hopefully this document helps you make the best contributions possible.