python-dev Summary for 2006-06-01 through 2006-06-15
- Announcements
- Summaries
- Getting more comparable results from pybench
- PEP 360: Externally Maintained Packages
- Universally unique identifiers (UUIDs)
- PEP 275: Switching on Multiple Values
- The period of the random module's random number generator
- Pre-PEP: Allow Empty Subscript List Without Parentheses
- PEP 337: Logging Usage in the Standard Library
- inspect.isgenerator
- Unescaping entities with sgmllib
- Scoping vs augmented assignment vs sets (Re: 'fast locals' in Python 2.5)
- Checking out an older version of Python
- Source control tools
- Underscore assignment in the interactive interpreter
- Removing Mac OS 9 cruft
- Fixing buffer object's char buffer support
- Importing subpackages in Jython
- RFC 3986: Uniform Resource Identifiers (URIs)
- False instead of TypeError for frozenset.__contains__
- IOError or ValueError for invalid file modes
- Testing, unittest and py.test
- hex(), oct() and the 'L' suffix for long numbers
- Adding an index of Python symbols
- Behavior of searching for empty substrings
- subprocess.IGNORE
- Deferred Threads
- Previous Summaries
- Skipped Threads
- Epilogue
[The HTML version of this Summary is available at http://www.python.org/dev/summary/2006-06-01_2006-06-15]
Announcements
Python 2.5 schedule
Python 2.5 is moving steadily towards its next release. See PEP 356 for more details and the full schedule.
Contributing threads:
Request for Bug Trackers to replace SourceForge
The Python Software Foundation's Infrastructure committee asked for suggestions for tracker systems that could replace SourceForge. The minimum requirements are:
- Can import SourceForge data
- Can export data
- Has an email interface
and if you'd like to suggest a particular tracker system all you need to do is:
- Install a test tracker
- Import the SourceForge data dump
- Make the Infrastructure committee members administrators of the tracker
- Add your tracker to the wiki page
- Email the Infrastructure committee
Be sure to check the wiki page for additional information.
Contributing thread:
Summaries
Getting more comparable results from pybench
Skip Montanaro mentioned that the NeedForSpeed folks had some trouble with the pybench string and unicode tests. In some discussions both on the checkins list and off-list, Fredrik Lundh had concluded that stringbench more reliably reported performance than pybench. There was then a long discussion about how to improve pybench including:
- Using time.clock() on Windows and time.time() on Linux. This was accompanied by a long debate about whether to use wall time or process time. Both can see interference from other programs running at the same time: wall time because the time consumed by those other programs is also counted, and process time because it is sampled, so other processes can charge their time to the running process by using less than a full time slice. In general, the answer was to use the timer with the best resolution.
- Using the minimum time rather than the average. Andrew Dalke explained that timing results do not have a Gaussian distribution (they have more of a gamma distribution) and provided some graphs generated on his machine to demonstrate this. Since the slower runs are typically caused by other things running at the same time (which is pretty much unpredictable), it's much better to report the fastest run, which should more consistently approximate the best possible time.
- Making sure to use an appropriate warp factor. Marc-Andre Lemburg explained that each testing round of pybench is expected to take around 20-50 seconds. If rounds are much shorter than this, pybench's warp factor should be adjusted until they are long enough.
At the end of the thread, Marc-Andre checked in pybench 2.0, which included the improvements suggested above.
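As a rough illustration of the minimum-versus-average point above, here is a small sketch that uses the standard timeit module rather than pybench itself. The helper name best_of and the statement being timed are made up for this example.

```python
import timeit


def best_of(stmt, setup="pass", repeat=5, number=10000):
    """Return the fastest per-loop time (in seconds) over several repeats."""
    # timeit.repeat returns one total time per repeat; slower repeats mostly
    # reflect interference from other processes, so keep only the minimum.
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number


# Hypothetical statement to time; any short snippet would do.
best = best_of("'-'.join(['a', 'b', 'c'])")
print("best time per loop: %.3g seconds" % best)
```

If a single repeat finishes too quickly for the timer's resolution, raising number plays a role loosely similar to adjusting pybench's warp factor.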
Contributing threads:
PEP 360: Externally Maintained Packages
After checking wsgiref into the Python repository, Phillip J. Eby requested in PEP 360 that patches to wsgiref be passed to him before being committed on the trunk. After a number of changes were committed to the trunk and he had to go through a complicated two-way merge, he complained that people were not following the posted procedures. Guido suggested that PEP 360 was a mistake, and that whenever possible, development for any module in the stdlib should be done in the Python trunk, not externally. He also requested that the PEP indicate that even for externally maintained modules, bugfixes and ordinary maintenance should be allowed on the trunk so that bugs in external modules don't hold up Python core development.
A number of solutions were discussed for authors of code that is also distributed standalone. Using svn:externals is mostly undesirable because svn is much slower at checking whether or not an svn:externals directory is updated, and because upgrading to a newer version would require making sure that no changes made by Python developers were lost in the new version. Phillip suggested adding an "Externals" directory and modifying Python's setup to invoke all the Externals/*/setup.py scripts, though this would mean having some Python code that lives outside of the Lib/ subtree. Barry Warsaw explained that for the email package, he maintains a directory in the sandbox with all the distutils and documentation stuff needed for the standalone releases as well as the email package from the Python repository through svn:externals. This means having to create some extra directories (since svn:externals doesn't work with individual files) and having one checkout per version of Python supported, but seemed to work pretty well for Barry. People seemed to like Phillip's Externals idea (possibly renamed to Packages), but work on that was postponed for Python 2.6.
One of the side benefits of these discussions was that Thomas Heller generously offered to move ctypes development fully into the Python repository.
Contributing threads:
- wsgiref documentation
- wsgiref doc draft; reviews/patches wanted
- [Web-SIG] wsgiref doc draft; reviews/patches wanted
- FYI: wsgiref is now checked in
- Please stop changing wsgiref on the trunk
- Dropping externally maintained packages (Was: Please stop changing wsgiref on the trunk)
- External Package Maintenance (was Re: Please stop changing wsgiref on the trunk)
- External Package Maintenance
- rewording PEP 360
- Updating packages from external ?
Universally unique identifiers (UUIDs)
Ka-Ping Yee was looking to put his uuid module into Python 2.5. He addressed a number of requests from the last round of discussions, including making UUIDs immutable, removing curly braces from the UUID string and adding the necessary tests to the test suite. Then he asked about how best to address the fact that uuid1() required looking up a MAC address, a potentially slow procedure. At the suggestion of Fredrik Lundh, he changed the API to allow a MAC address to be passed in if it was already known. If a MAC address is not passed in to uuid1(), the getnode() utility function is called, which searches for the MAC address through a variety of routes, including some quicker paths through ctypes that Thomas Heller and others helped Ka-Ping with. The code was checked into the Python trunk.
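For readers who want to see the resulting API in action, the following sketch uses the uuid module as it ships in current Python releases. The hardware address is a made-up value used only for illustration.

```python
import uuid

# Hypothetical 48-bit hardware address, used only for illustration.
KNOWN_NODE = 0x001122334455

# Passing node= skips the (potentially slow) MAC address lookup.
fast = uuid.uuid1(node=KNOWN_NODE)

# Without node=, uuid1() falls back to uuid.getnode() to find an address.
slow = uuid.uuid1()

# The string form is the canonical one, without surrounding curly braces.
text = str(fast)

# UUID objects are immutable: attribute assignment is rejected.
try:
    fast.node = 0
except TypeError:
    immutable = True
else:
    immutable = False
```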
Contributing thread:
PEP 275: Switching on Multiple Values
Thomas Lee offered up a patch implementing the switch statement from PEP 275. People brought up a number of concerns with the implementation (and the switch statement in general). The implementation provided no way to map multiple values to the same case (without repeating the code in the case). It also made the switch statement essentially syntactic sugar for a series of if/elif/else statements, and people were concerned that just adding another way to write if/elif/else was not much of a gain for Python. The discussion continued on into the next fortnight.
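For comparison, this is roughly what the existing if/elif/else spelling looks like, including one common way to send several values down the same branch. The function name and the status codes are invented for this sketch; it does not reproduce the syntax of Thomas's patch.

```python
def describe(code):
    """Classify a hypothetical status code with a plain if/elif/else chain."""
    if code in (200, 201, 204):   # several values share one branch
        return "success"
    elif code == 404:
        return "not found"
    else:
        return "other"


print(describe(201))
```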
Contributing thread:
The period of the random module's random number generator
Alex Martelli noticed a note in random.shuffle.__doc__ which said that most permutations of a long sequence would never be generated due to the short period of the random number generator. This turned out to be an artifact from back when Python used the Wichmann-Hill generator instead of the Mersenne Twister generator it uses currently. There was some discussion as to whether the comment should be removed or updated, and Robert Kern pointed out that at sequence lengths of 2081 or greater, the comment was still true. Tim Peters decided it was best to just remove the comment, explaining that "anyone sophisticated enough to understand an accurate warning correctly would have no need to be warned".
Contributing thread:
Pre-PEP: Allow Empty Subscript List Without Parentheses
Noam Raphael presented a pre-PEP for empty subscript lists in getitem-style access to objects. This would allow zero-dimensional arrays to work in a similar manner to all other N dimensional arrays, and make all of the following equivalences hold:
x[i, j] <--> x[(i, j)]
x[i,] <--> x[(i,)]
x[i] <--> x[(i)]
x[] <--> x[()]
Most people felt that zero-dimensional arrays were uncommon enough that either they could be replaced with simple names, e.g. x, or could use the currently available syntax, i.e. x[()]. Zero-dimensional arrays are even uncommon in numpy where, after rehashing the issue innumerable times, zero-dimensional arrays have been almost entirely replaced with scalars.
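The first three equivalences already hold today, and the currently available spelling x[()] passes an empty tuple. A tiny sketch makes this visible; the Echo class is a hypothetical helper that just returns whatever key it receives.

```python
class Echo:
    """Hypothetical helper: indexing returns the key that was passed in."""

    def __getitem__(self, key):
        return key


x = Echo()

two_dims = x[1, 2]    # same as x[(1, 2)]
one_tuple = x[1,]     # same as x[(1,)]
scalar_key = x[1]     # same as x[(1)]
empty_key = x[()]     # the existing spelling for a zero-dimensional index
```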
Contributing thread:
PEP 337: Logging Usage in the Standard Library
For the Google Summer of Code, Jackilyn Hoxworth has been working on implementing parts of PEP 337 to use the logging module in parts of the stdlib. When Jim Jewett, who is mentoring her, brought up a few issues, people got concerned that this work was being done at all, given that PEP 337 had not been approved. Jim and A.M. Kuchling clarified that the goal of Jackilyn's work is both to clarify the PEP (e.g. determine exactly which modules would benefit from logging) and to provide an implementation that can be tweaked as necessary if the PEP is accepted. For the first draft at least, it looked like Jackilyn would keep things simple -- using "py." + __name__ for the logger name, not adding any new logging messages, not changing any message formats, and generally aiming only to give stderr and stdout messages across different modules a common choke point.
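To make the "common choke point" idea concrete, here is a minimal sketch of the naming scheme using the logging module. It is not Jackilyn's code; the module name "asyncore" stands in for __name__ and the message text is invented.

```python
import io
import logging

# One handler on the shared "py" parent logger acts as the choke point.
stream = io.StringIO()
parent = logging.getLogger("py")
parent.setLevel(logging.INFO)
parent.addHandler(logging.StreamHandler(stream))

# Inside a stdlib module this would be: logging.getLogger("py." + __name__)
# "asyncore" is only a stand-in module name for this sketch.
log = logging.getLogger("py." + "asyncore")
log.info("closing channel")

# The record propagated from "py.asyncore" up to the "py" handler.
captured = stream.getvalue()
```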
Contributing thread:
inspect.isgenerator
Michele Simionato asked for a new function in the inspect module that would identify a function as being a generator function. Phillip J. Eby pointed out that any function can return a generator-iterator (though generator functions are of course guaranteed to do so) and suggested that the perceived need for this inspect function was misguided. Michele agreed and withdrew the proposal.
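Phillip's point can be shown in a few lines. The sketch below relies on inspect.isgeneratorfunction and inspect.isgenerator, which exist in later Python releases than the one under discussion; the two function names are made up.

```python
import inspect


def gen_func():
    """A generator function: calling it always returns a generator-iterator."""
    yield 1


def plain_func():
    """An ordinary function that happens to return a generator-iterator."""
    return gen_func()


is_gen_func = inspect.isgeneratorfunction(gen_func)
is_plain_gen_func = inspect.isgeneratorfunction(plain_func)
plain_returns_generator = inspect.isgenerator(plain_func())
```

A caller that only inspects the function would treat plain_func differently, even though both calls hand back the same kind of object.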
Contributing threads:
Unescaping entities with sgmllib
Sam Ruby asked why sgmllib unescapes entities selectively, not all or nothing (which would be easier to work around), and Fred L. Drake, Jr. explained that sgmllib is really only intended as support for htmllib. Sam suggested isolating the code that attempts to resolve character references into a single method so that subclasses could override this behavior as needed. Martin v. Löwis agreed that this seemed reasonable, though he suggested two functions, one for character references and one for entity references. Sam implemented the suggested behavior and provided a patch to sgmllib.
Contributing thread:
Scoping vs augmented assignment vs sets (Re: 'fast locals' in Python 2.5)
A bug in Python 2.5 that did not detect augmented assignment as creating a local name allowed code like the following to work:
>>> g = 1
>>> def f1():
...     g += 1
...
>>> f1()
>>> g
2
This of course started the usual discussion about giving Python a way to rebind names in enclosing scopes. Boris Borcic in particular was hoping that the bug could be considered a feature, but Terry Reedy explained that Python was not willing to give up the near equivalence between x = x + 1 and x += 1. Since the former creates a local name, the latter ought to do the same thing. The thread seemed like it might drift on further until Guido cut it off, pronouncing that the behavior of augmented assignments creating local names was not going to change.
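With the intended behavior in place, both spellings make g a local name, so both fail in the same way when called. A short sketch of that near equivalence (f2 is added here for comparison and is not from the thread):

```python
g = 1


def f1():
    g += 1      # augmented assignment makes g local to f1


def f2():
    g = g + 1   # plain assignment makes g local to f2 in the same way


results = []
for func in (f1, f2):
    try:
        func()
    except UnboundLocalError:
        # The local g is read before it has been assigned.
        results.append("UnboundLocalError")
    else:
        results.append("no error")

print(results)
```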
Contributing threads:
- 'fast locals' in Python 2.5
- Scoping vs augmented assignment vs sets (Re: 'fast locals' in Python 2.5)
- Comparing closures and arguments (was Re: Scoping vs augmented assignment vs sets (Re: 'fast locals' in Python 2.5)
- The baby and the bathwater (Re: Scoping, augmented assignment, 'fast locals' - conclusion)
Checking out an older version of Python
Skip Montanaro asked about checking out a particular version of Python. Oleg Broytmann and Tim Peters explained that tags are just directories in Subversion, and you can view all the existing ones and their corresponding revision numbers at http://svn.python.org/projects/python/tags/. Oleg also explained that the difference between:
svn switch svn+ssh://pythondev@svn.python.org/python/tags/r242
and noting that the r242 tag corresponds to revision 39619 and doing:
svn up -r 39619
is that with the latter, commits will go to the trunk (assuming the update was performed on a trunk checkout), while with the former, commits will go to the appropriate tag or branch. Giovanni Bajo provided a nice explanation of this, describing Subversion's 2D coordinate system of [url, revision], and Skip added the explanation to the Development FAQ.
Contributing thread:
Source control tools
In the externally maintained packages discussion, Guido suggested offhand that some other version control project might make it easier to resolve some of the issues. Thomas Wouters put forward a number of considerations. On the negative side of changing to one of the newer version control systems:
- Workflow would have to change somewhat to use most of the new branch-oriented systems.
- Everyone would have to download the whole repository (at least once) since with the newer systems everyone usually has their own repository.
But on the positive side:
- History can be preserved for merges of branches (unlike Subversion), which is a big gain for when the trunk is switched to 3.0.
Thomas tried importing the Python repository into a number of different systems, and after playing around with them, concluded that in the short term, none of the other version control systems were quite ready yet, though he seemed optimistic for them in the next few years. He also promised to publish imports of the Python repository into Git, Darcs, Mercurial, Bazaar-NG and Monotone somewhere once he was able to successfully import them all.
Contributing thread:
Underscore assignment in the interactive interpreter
Raymond Hettinger noted that in the interactive interpreter, an expression that returns None not only suppresses the printing of that None, but also suppresses the assignment to _. Raymond asked if this was intentional as it makes code like the following break:
>>> import re, string
>>> re.search('lmnop', string.letters)
<_sre.SRE_Match object at 0xb6f2c480>
>>> re.search('pycon', string.letters)
>>> if _ is not None:
...     print _.group()
...
lmnop
Fredrik Lundh pointed out that users just need to recognize that _ holds the most recently printed result. Guido pronounced that this would not change. Terry Reedy suggested documenting this behavior in Language Reference 2.3.2 Reserved Classes of Identifiers, in Tutorial 2.1.2 Interactive Mode, or in both, but it was unclear if any doc changes were committed.
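The same rule can be observed outside the interactive prompt by calling the default display hook directly. This sketch is written for Python 3, where the variable lives in the builtins module, and the value 42 is arbitrary.

```python
import builtins
import sys

# The default hook prints the value and stores it in builtins._
sys.__displayhook__(42)
after_value = builtins._

# For None it prints nothing and leaves builtins._ untouched.
sys.__displayhook__(None)
after_none = builtins._
```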
Contributing thread:
Removing Mac OS 9 cruft
A number of old Mac OS 9 bits and pieces that are no longer used were removed:
- IDE scripts
- MPW
- Tools/IDE
- Tools/macfreeze
- Unsupported/mactcp/dnrglue.c
- Wastemods
This should solve some problems for Windows checkouts where files with trailing dots are not supported.
Contributing threads:
Fixing buffer object's char buffer support
Brett Cannon found that import array; int(buffer(array.array('c'))) caused the interpreter to segfault because buffer objects were redirecting tp_as_buffer->bf_getcharbuffer to the wrong tp_as_buffer slot. Brett fixed the bug and updated the docs a bit to clarify what was intended for the implementation, but kept changes pretty minimal as Python 3.0 will ditch buffer for the bytes type anyway.
Contributing threads:
Importing subpackages in Jython
In Jython 2.1, importing a module makes all subpackages beneath it available, unlike in regular Python, where subpackages must be imported separately. Samuele Pedroni explained that this was intentional so that imports in Jython would work like imports in Java do. Guido suggested that having imports work this way in Jython was fine as long as a Java package was being imported, but when a Python package was being imported, Jython should use the Python semantics.
Contributing thread:
RFC 3986: Uniform Resource Identifiers (URIs)
There was some continued discussion of Paul Jimenez's proposed uriparse module which more faithfully implements RFC 3986 than the current urlparse module. Nick Coghlan submitted an alternate implementation that kept all parsed URIs as (scheme, authority, path, query, fragment) tuples by allowing some of these elements to be non-strings, e.g. authority could be a (user, password, host, port) tuple, and path could be a (user, host) tuple. People seemed to like Nick's implementation, but no final decision on the module was made.
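To give a feel for the five-element shape, the sketch below uses urllib.parse.urlsplit from current Python (the Python 3 home of urlparse). It is neither Paul's uriparse nor Nick's implementation; the URI is invented, and the authority tuple is assembled by hand to mirror the structure described above.

```python
from urllib.parse import urlsplit

# Hypothetical URI chosen to exercise every component.
parts = urlsplit("http://user:secret@example.com:8080/a/b?q=1#frag")

# (scheme, authority, path, query, fragment), with authority as one string.
five_tuple = tuple(parts)

# The authority broken out as a (user, password, host, port) tuple.
authority = (parts.username, parts.password, parts.hostname, parts.port)
```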
Contributing thread:
False instead of TypeError for frozenset.__contains__
Collin Winter suggested that code like {} in frozenset([1, 2, 3]) should return False instead of raising a TypeError. Guido didn't like the idea because he thought it would mask bugs where, say, a user-defined __hash__() method accidentally raised a TypeError.
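Guido's concern is easier to see side by side. In the sketch below, BrokenHash is a hypothetical class whose __hash__ has a bug; if membership tests swallowed TypeError, its failure would be indistinguishable from an honest "not in the set".

```python
class BrokenHash:
    """Hypothetical class whose __hash__ has a bug that raises TypeError."""

    def __hash__(self):
        raise TypeError("bug inside __hash__")


members = frozenset([1, 2, 3])

outcomes = []
for candidate in ({}, BrokenHash()):
    try:
        candidate in members
    except TypeError:
        # Unhashable dict and buggy __hash__ both surface as TypeError.
        outcomes.append("TypeError")
    else:
        outcomes.append("no error")

print(outcomes)
```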
Contributing thread:
IOError or ValueError for invalid file modes
Kristján V. Jónsson asked why open()/file() throws an IOError for an invalid mode string instead of a ValueError. Georg Brandl explained that either an IOError or a ValueError can be raised depending on whether the invalid mode was detected in Python's code or in the OS's fopen call. Guido suggested that this couldn't really be fixed until Python gets rid of its stdio-based implementation in Python 3.0.
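For what it is worth, current Python 3 releases validate the mode string before touching the file system, so the outcome is consistent there. A small sketch, with a made-up file name that does not need to exist:

```python
import os
import tempfile

# Hypothetical path; the mode is rejected before the file is ever opened.
path = os.path.join(tempfile.gettempdir(), "no-such-file-for-mode-demo")

try:
    open(path, "q")           # "q" is not a valid mode
except ValueError:
    raised = "ValueError"
except OSError:               # IOError is an alias of OSError in Python 3
    raised = "OSError"
else:
    raised = "nothing"

print(raised)
```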
Contributing thread:
Testing, unittest and py.test
Martin Blais checked in an un-unittestification of test_struct, and a number of people questioned whether that was a wise thing to do. Thomas Wouters suggested that unittest should merge as many features from py.test as possible. This would reduce some of the class-based boilerplate currently required, and also allow some nice additional features like test cases generated on the fly. He didn't get much of a response though, so it was unclear what the plans for Python 2.6 were.
Contributing thread:
hex(), oct() and the 'L' suffix for long numbers
Ka-Ping Yee asked why hex() and oct() still produced an 'L' suffix for long numbers even now that ints and longs have basically been unified. PEP 237 had mentioned the removal of this suffix but had not named a specific release for it, so people decided it was best to wait until Python 3.0, when the 'L' suffix will also be removed from repr().
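Run under Python 3, a quick check shows the suffix-free output for a value that would have been a long in Python 2. The value 2 ** 64 is just an example.

```python
big = 2 ** 64          # large enough to have been a long in Python 2

hex_text = hex(big)    # no trailing 'L'
oct_text = oct(big)    # no trailing 'L'; Python 3 uses the '0o' prefix

print(hex_text, oct_text)
```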
Contributing thread:
Adding an index of Python symbols
Terry Reedy suggested adding a page to the Python Language Reference index that would list each symbol in Python (e.g. (), [] and @) along with the places in the documentation where it was discussed. Terry promised to submit a plain-text version in time for the Python 2.5 release, so that someone could convert it to LaTeX and merge it into the docs.
Contributing thread:
Behavior of searching for empty substrings
Fredrik Lundh resolved the issues discussed previously with searching for an empty substring at a position past the end of the string. The current behavior looks like:
>>> "ab".find("")
0
>>> "ab".find("", 1)
1
>>> "ab".find("", 2)
2
>>> "ab".find("", 3)
-1
Both Tim Peters and Guido applauded the final resolution.
Contributing thread:
subprocess.IGNORE
Martin Blais asked about adding subprocess.IGNORE along the lines of subprocess.PIPE which would ignore the child's output without being susceptible to buffer deadlock problems. Under Unix, IGNORE could be implemented as open('/dev/null', 'w'), and on Windows, open('nul:', 'w'). People seemed to think this was a useful feature, but at the time of this summary, no patch had yet been provided.
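Until something like IGNORE exists, the same effect can be had by hand. The sketch below uses os.devnull, which names the null device on both platforms, and a made-up child command that just prints a lot of text. Later Python 3 releases also provide subprocess.DEVNULL for this purpose.

```python
import os
import subprocess
import sys

# Child process that writes a lot of output; sys.executable keeps it portable.
child = [sys.executable, "-c", "print('x' * 100000)"]

# os.devnull is '/dev/null' on Unix and 'nul' on Windows.
with open(os.devnull, "w") as sink:
    # The output is discarded, so no pipe buffer can fill up and deadlock.
    returncode = subprocess.call(child, stdout=sink)

print(returncode)
```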
Contributing thread:
Skipped Threads
- Segmentation fault of Python if build on Solaris 9 or10 with Sun Studio 11
- Possible bug in complexobject.c (still in Python 2.5)
- [Python-checkins] r46300 - in python/trunk: Lib/socket.py Lib/test/test_socket.py Lib/test/test_struct.py Modules/_struct.c Modules/arraymodule.c Modules/socketmodule.c
- test_struct failure on 64 bit platforms
- string inconsistency
- S/390 buildbot URLs problematic
- SF patch #1473257: "Add a gi_code attr to generators"
- test_unicode failure on MIPS
- valgrind report
- test_ctypes failures on ppc64 debian
- Request for patch review
- patch #1454481 vs buildbot
- Seeking Core Developers for Vancouver Python Workshop
- [Python-checkins] Python Regression Test Failures refleak (1)
- Include/structmember.h, Py_ssize_t
- DC Python sprint on July 29th
- tarfile and unicode filenames in windows
- [Python-checkins] buildbot warnings in hppa Ubuntu dapper trunk
- -Wi working for anyone else?
- Inject some tracing ...
- Segmentation fault in collections.defaultdict
- Add pure python PNG writer module to stdlib?
- crash in dict on gc collect
- "can't unpack IEEE 754 special value on non-IEEE platform"
- socket._socketobject.close() doesn't really close sockets
- DRAFT: python-dev summary for 2006-04-16 to 2006-04-30
- [Python-checkins] r46795 - in python/trunk: Doc/lib/libstdtypes.tex Lib/test/string_tests.py Misc/NEWS Objects/stringobject.c Objects/unicodeobject.c
- xrange vs. int.__getslice__
- request for review: patch 1446489 (zip64 extensions in zipfile)
- DRAFT: python-dev summary for 2006-05-01 to 2006-05-15
- pychecker warnings in Lib/encodings
- Moving PEP 343 to Final
- Python sprint at Google Aug. 21-24
- Long options support
- High Level Virtual Machine
- sqlite3 test errors - was : Re: [Python-checkins] r46936 - in python/trunk: Lib/sqlite3/test/regression.py Lib/sqlite3/test/types.py Lib/sqlite3/test/userfunctions.py Modules/_sqlite/connection.c Modules/_sqlite/cursor.c Modules/_sqlite/module.c Modules/_sqlite/module.h
- [Python-checkins] sqlite3 test errors - was : Re: r46936 - in python/trunk: Lib/sqlite3/test/regression.py Lib/sqlite3/test/types.py Lib/sqlite3/test/userfunctions.py Modules/_sqlite/connection.c Modules/_sqlite/cursor.c Modules/_sqlite/module.c Modules/_sqlite/module.h
- [Python-checkins] sqlite3 test errors - was : Re: r46936 - in python/trunk: Lib/sqlite3/test/regression.py Lib/sqlite3/test/types.py Lib/sqlite3/test/userfunctions.py Modules/_sqlite/connection.c Modules/_sqlite/cursor.c Modules/_sqlite/module.c
- [Python-checkins] sqlite3 test errors - was : Re: r46936 - in python/trunk: Lib/sqlite3/test/regression.py Lib/sqlite3/test/types.py Lib/sqlite3/test/userfunctions.py Modules/_sqlite/connection.c Modules/_sqlite/cursor.c Modules/_sql
- Last-minute curses patch
- DRAFT: python-dev summary for 2006-05-16 to 2006-05-31
- Bug: xml.dom.pulldom never gives you END_DOCUMENT events with an Expat parser
- Misleading error message from PyObject_GenericSetAttr
- About dynamic module loading
Epilogue
This is a summary of traffic on the python-dev mailing list from June 01, 2006 through June 15, 2006. It is intended to inform the wider Python community of on-going developments on the list on a semi-monthly basis. An archive of previous summaries is available online.
An RSS feed of the titles of the summaries is available. You can also watch comp.lang.python or comp.lang.python.announce for new summaries (or through their email gateways of python-list or python-announce, respectively, as found at http://mail.python.org).
This python-dev summary is the 6th written by Steve Bethard. (Please, ma, don't make me do the switch statement summary!)
To contact me, please send email:
- Steve Bethard (steven.bethard at gmail.com)
Do not post to comp.lang.python if you wish to reach me.
The Python Software Foundation is the non-profit organization that holds the intellectual property for Python. It also tries to advance the development and use of Python. If you find the python-dev Summary helpful please consider making a donation. You can make a donation at http://python.org/psf/donations.html . Every cent counts so even a small donation with a credit card, check, or by PayPal helps.
Commenting on Topics
To comment on anything mentioned here, just post to comp.lang.python (or email python-list@python.org which is a gateway to the newsgroup) with a subject line mentioning what you are discussing. All python-dev members are interested in seeing ideas discussed by the community, so don't hesitate to take a stance on something. And if all of this really interests you then get involved and join python-dev!
How to Read the Summaries
This summary is written using reStructuredText. Any unfamiliar punctuation is probably markup for reST (otherwise it is probably regular expression syntax or a typo :); you can safely ignore it. We do suggest learning reST, though; it's simple and is accepted for PEP markup and can be turned into many different formats like HTML and LaTeX.
