More Thoughts On Arrays in PyPy

My previous blog post was trying to diplomatically point out that Numpy is a whole lot more than merely a fast array library. Somehow the more subtle point seems to be lost on a number of people, so this follow-up post is going to be more blunt. (But not mean or angry – I was feeling rather angry when I first started this blog entry, but then I realized it was because I was hungry, so I had a Snickers bar and re-wrote it.)

I think people should re-read Travis’s blog post on porting Numpy to PyPy, and really read it carefully. Travis not only created Numpy and helped create Scipy, but in his professional life he has been invited to labs and universities all over the world to speak about the use of Python in scientific computing. He has seen, first-hand, what works and what doesn’t when trying to push Python into this space, so even though he makes many high-level points in his blog, he has been in the trenches enough to have a very good sense of what users are looking for. He does not use his words lightly, and he very much means what he says.

In looking over the discussion on this subject, I realized that Travis has basically already said all that needs to be said. However, certain things bear highlighting.

A Low-level API Is A Feature

Travis: “Most of the scientists and engineers who have come to Python over the past years have done so because it is so easy to integrate their legacy C/C++ and Fortran code into Python.

“Most of these [Scipy, Matplotlib, Sage] rely on not just the Python C-API but also the NumPy C-API which you would have to have a story for to make a serious technical user of Python get excited about a NumPy port to PyPy.” (emphasis mine)

The bottom line: There is no such thing as “Numpy on PyPy” without a low-level API for extensions.

Call it “Array Intrinsics for PyPy” if you will. Call it “Fast Arrays for PyPy”. But to call it “Numpy Support for PyPy” without offering the integration ability that Travis alludes to, and without actually using any of Numpy’s code (for FFTs, linear algebra, etc.) is kind of false advertising. (David Cournapeau tries to gently make this point on the mailing list by saying that “calling it numpy is a bit confusing”.)

As I pointed out in a previous post, there are dynamic compilers and JITters for Numpy right now; however, none of them call themselves “PyPy for Numpy”, because there is a huge portion of the PyPy feature set they don’t support. Doing some JITting does not make them PyPy. Likewise, writing an array object with operator overloading does not mean you have Numpy.
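
To make that last point concrete, here is a toy sketch of what “an array object with operator overloading” amounts to (the class is invented purely for illustration):

    # A toy array class with operator overloading -- illustration only.
    class MiniArray:
        def __init__(self, values):
            self.values = list(values)

        def __add__(self, other):
            return MiniArray(a + b for a, b in zip(self.values, other.values))

        def __mul__(self, other):
            return MiniArray(a * b for a, b in zip(self.values, other.values))

    a = MiniArray([1.0, 2.0, 3.0])
    b = MiniArray([4.0, 5.0, 6.0])
    print((a + b).values)  # [5.0, 7.0, 9.0] -- elementwise, Numpy-flavored syntax
    print((a * b).values)  # [4.0, 10.0, 18.0]

You get pleasant elementwise syntax, and nothing else: no dtypes, no broadcasting, no linear algebra or FFTs, and no C-API for extensions to consume.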

Travis: “It’s not to say that there isn’t some value in re-writing NumPy in PyPy, it just shouldn’t be over-sold and those who fund it should understand what they aren’t getting in the transaction.” (emphasis mine)

On the pypy-dev mailing list, Jacob Hallen said: “We did a survey last spring, in which an overwhelming number of people asked for numpy support.” I am curious about the exact wording of this survey. By “Numpy support”, did respondents mean “a fast array library, with no hope of integration with Scipy, Scikits, Matplotlib, etc.”? If so, then the PyPy team may very well be able to meet their demand. However, I will wager that when most people ask for “Numpy support”, they also imply some level of compatibility or at least a reasonable timeline for integration with the rest of the Scipy stack. It might be worth re-polling those folks who wanted “Numpy support” and ask them how useful a fast array library in PyPy would be, if it could not be integrated with Scipy, Matplotlib, etc., and if there was no way to integrate their own extension modules.

In my experience, there is virtually no one who uses Numpy that does not also (directly or indirectly) use something that consumes its C-API. Most folks get at that API via Swig, Cython, Weave, f2py, etc., but nonetheless they are very much relying on it.
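
For a minimal picture of what that low-level reliance looks like, consider handing a Numpy array’s raw C buffer to foreign code. The sketch below uses ctypes.memmove as a stand-in for a real C routine:

    import ctypes
    import numpy as np

    src = np.arange(5, dtype=np.float64)
    dst = np.empty_like(src)

    # .ctypes exposes the raw data pointer -- exactly the kind of C-level
    # contract that an array library without a low-level API cannot offer
    ctypes.memmove(dst.ctypes.data, src.ctypes.data, src.nbytes)
    print(dst)  # -> the five float64 values, copied byte-for-byte

Swig, Cython, Weave, and f2py ultimately traffic in pointers and structs just like this, only through the Numpy C-API itself.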

The Joy of Glue

On PyPy’s donation page, there is this claim:

“bringing NumPy into the equation is a reasonable next step – as it’s a very convenient and popular tool for doing this kind of work. The resulting implementation should move Python in scientific world from being a merely glue language into being the main implementation language for a lot of people in the scientific/numeric worlds.”

This statement encapsulates most of the philosophical differences between the PyPy guys and the Scipy guys in this discussion. I’m pretty sure that while almost everyone in the Scipy world would rather be coding Python than wading through mounds of legacy C or FORTRAN, most of them have been in the trenches long enough to know that reimplementing libraries is simply not possible in most real-world cases. Even if they could make all new development happen in RPython, they would still need to interface with those legacy libraries. If the PyPy devs want to field their Python implementation as a serious contender in the scientific computing space, they absolutely have to have a way to use it as a “glue” language.

Furthermore, the use of the word “merely” in “merely glue language” really highlights the philosophic difference. I’m sure that for language purists who enjoy geeking out on compilers and VMs and Lambda-the-Ultimate, something as mundane as a low-level VM API for an external language as unsexy as FORTRAN deserves the sneer of “merely”. But for many tens of thousands of scientists and researchers around the world who rely on Python to get their jobs done, the ability to glue together disparate tools, frameworks, languages, and data is utterly priceless. I would even argue that this is one of its essential features; without it, to many people, Python would just be a slightly prettier MATLAB. (Actually, lacking an infix matrix multiplication operator, some would argue that it’s not even that much prettier.)

A High-Level Future

In truth, I am personally torn by this brouhaha with PyPy, because I actually agree with their ultimate goals. I, too, envision a better world in which people are writing high-level languages to achieve better performance than what is even possible at the C level. However, I have seen enough of the problem space to know that (1) any solution that doesn’t integrate with legacy code will be Dead On Arrival, and (2) JIT optimizations of the normal PyPy variety (i.e. approaches that work well for Algol-derived languages) have limited efficacy in the space of serious scientific and high-performance computing. As Travis says in his blog post:

“I am also a true-believer in the ability for high-level languages to achieve faster-than-C speeds. In fact, I’m not satisfied with a Python JIT. I want the NumPy constructs such as vectorization, fancy indexing, and reduction to be JIT compiled.”

It is very important to note that Travis says “JIT compiled” and not “JIT optimized”. In a subsequent post, I will discuss why JIT optimization is not the be-all and end-all for scientific computing, and what could be even better.

How Can PyPy and Numpy Work Together?

It occurs to me that it might help illuminate the discussion if we looked at a concrete example of an alternative approach to collaboration between Numpy and PyPy. A few years ago, Ilan Schnell implemented a very interesting “fast_vectorize” decorator for Numpy ufuncs, which used PyPy’s translator to dynamically generate natively-compiled versions of ufuncs written in Python. It only took him about a week, and I believe most of that time went into isolating the RPython-to-C translation functionality.
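
I don’t have his exact interface in front of me, but the usage pattern was roughly the following (the decorator name and import path here are my guesses, not necessarily his actual API):

    import math
    from fast_vectorize import fast_vectorize  # hypothetical import path

    @fast_vectorize
    def sigmoid(x):
        # written as plain scalar Python; the decorator runs it through the
        # RPython-to-C translator and registers the result as a compiled ufunc
        return 1.0 / (1.0 + math.exp(-x))

    # afterwards it broadcasts over arrays like any other Numpy ufunc:
    #   sigmoid(numpy.linspace(-6.0, 6.0, 1000))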

Furthermore, at this year’s Scipy conference alone, I counted at least three projects that were doing AST walking and code munging to attempt to convert high-level Python code into kernels for fast evaluation, and not merely for Numpy. These included differential equation solvers and the like. Supporting efforts for dynamic compilation of Python would be massively useful for the Scipy community at large, and there is almost no one better poised to do this than the PyPy devs. I think that would be a far, far more useful application of their efforts.

Conclusion

I understand that speed has become PyPy’s raison d’être. I also understand that they’ve had wonderfully encouraging results with optimizing vanilla procedural code, like what is found in the Python standard library. But I think it is unwise to announce a big, loud foray into scientific computing while ignoring the wisdom of their fellow Python coders who have been toiling in this space for over a decade.

If the goal is to create a nifty little array module for users who are happy living within the walls of the PyPy interpreter, then I wish them luck and think they will do very well. But if their goal is to have an impact on Python’s role in scientific computing at large, on the scale of what Numpy and Scipy have achieved, then they cannot ignore the integration question, and, IMHO, Cython currently looks to be the most promising avenue for that.

29 Responses to More Thoughts On Arrays in PyPy

  1. Peter Wang says:

    One more thing: Speed is currently PyPy’s raison d’être, but that was not always the case. I remember at PyCon 2005 when Armin showed it off and then mentioned that it was only a few thousand times slower than CPython. The general reaction around the room was “Nifty, but… meh.”

    I personally think that providing a Python interpreter that supports on-the-fly, dynamic compilation to native code is *itself* a killer feature for the scientific world. On the merits of that alone, PyPy could win the hearts of the scientific community, and gain a second, indisputable advantage over CPython. However, a low-level interface for extension modules is an absolute necessity for those users, no matter how much they wish it were not the case.

    I believe that over the course of the next year or two, something will emerge to fill this dynamic compilation void in the Python/Scipy space. There is a clear demand for it, across many different application domains. If one of the usual suspects doesn’t step up, there are many dark horses in the shadows waiting to ride up, including forking/extending the Python language syntactically (as Cython has already done, to some extent), or reviving some sort of LLVM effort.

  2. Andrew Dalke says:

    I am one of a number of serious technical users who are interested in a NumPy without SciPy.

    I’ve been doing scientific programming in a subfield of computational chemistry for about 15 years. Python is big in computational chemistry. There are several commercial/proprietary tools built in Python, as well as a number of free ones. Only one of them uses NumPy, and it does not use SciPy. (They do use SWIG or Boost for the C++/Python bindings, and I agree that it will take some time before there’s an easy PyPy migration path for them.)

    I program in NumPy about one day a year, and haven’t used SciPy. This is pretty standard for my field, which deals with molecular graphs. I could use NumPy more often, especially to store 2-D array information, but I end up writing my code so I don’t need that extra dependency. Very little of SciPy or Sage looks useful, although matplotlib support might be nice.

    Chemists usually use Excel, Spotfire, or some other tool for graphics display. They need a tool which displays chemical structures. They are not sitting at an IPython shell.

    I double-checked with a neighboring field – biology instead of chemistry. The Biopython source code again uses only NumPy and not the other SciPy tools, except for one support module that works with matplotlib. It has 12 C files and 288 Python files, so porting to PyPy would be doable, if the performance is there.

    You conclude saying “a nifty little array module for users who are happy living within the walls of the PyPy interpreter.” That to me shows your own ignorance of the people doing scientific computing with Python. Support for just that “nifty module” would be an immediate boon to people in computational chemistry and in computational biology. We’re already happy living within the walls of Python, so triple performance for little work sounds great!

    It might be that you don’t hear about these cases because those who do scientific computing, in Python, using NumPy, don’t go to the SciPy conference or talk to the SciPy developers because they, well, don’t use SciPy.

    • Peter Wang says:

      Andrew, I think you misunderstand me. I do not deny that an array module for PyPy would be useful for some people; I am merely echoing Travis’s sentiment in his very first quote that *most* scientific users of Python have adopted it in part because of the integration with legacy software.

      I don’t know the exact numbers, but I’m pretty sure that the vast majority of scientific users of Python *do* use Numpy, and probably IPython and Matplotlib, even if they are not using Scipy.

      Additionally, it’s important to point out that even if you are *only* using Numpy (and only on one day of each year) you may still be using parts of it that will *not* be included in the PyPy port: the fast linear algebra support, the FFT libraries, etc.

      The more subtle point is this: Without digging through your code right now, and without looking through the detailed docs/FAQ for the in-progress PyPy array library, can you be absolutely certain that your usage of Numpy will be supported by PyPy?

      Imagine that you are new to the Python space, and are trying to understand the differences between Python2 vs. Python3, EPD vs. PythonXY vs. Activestate vs. Homebrew vs. OS X system Python vs. Macports vs. Fink, etc. Imagine adding to this confusion with mixed messages of “Numpy is faster on PyPy!” but “It’s not the full Numpy, so some things won’t work, but it’s probably good enough for many people!”. If a veteran Python user like yourself cannot be absolutely sure their code will work on PyPy without looking through technical details about sub-packages and the like, imagine how the situation must feel for a new user who is staring blankly at a tutorial web page somewhere, and must make the decision about whether or not to use Numpy on PyPy. This is the fragmentation that people like David Cournapeau allude to on the pypy-dev mailing list, and it’s a real concern.

      • “I don’t know the exact numbers, but I’m pretty sure that the vast majority of scientific users of Python *do* use Numpy, and probably IPython and Matplotlib, even if they are not using Scipy.”

        How about being scientific and finding some hard numbers instead of guessing?

        “you may still be using parts of it that will *not* be included in the PyPy port: the fast linear algebra support, the FFT libraries, etc.”

        this argument sounds like FUD to me. Who said that these parts of Numpy will not be included in PyPy? The workplan of PyPy’s Numpy proposal states:

        “We’ll implement all NumPy APIs that are officially documented and we’ll pass all of NumPy’s tests that cover documented APIs and are not implementation details. Specifically, we don’t plan to:
        - implement NumPy’s C API
        - implement other scientific libraries, like SciPy, matplotlib or biopython”

      • Peter Wang says:

        In response to Carl:

        “””
        “you may still be using parts of it that will *not* be included in the PyPy port: the fast linear algebra support, the FFT libraries, etc.”

        this argument sounds like FUD to me. Who said that these parts of Numpy will not be included in PyPy?
        “””

        This came out in the discussion on the mailing list. Those parts of Numpy are implemented in FORTRAN or contain external libraries that would have to be re-written in RPython. Maybe I’m not as l33t a hacker as the PyPy guys, but I think even they would have trouble re-writing (and fully testing) something like ATLAS in RPython in 3 months.

      • “This came out in the discussion on the mailing list.”

        That may well be, but it was never said by anybody who is part of the PyPy project. And in fact, I think Maciek and Armin tried several times to correct this notion (maybe not expressing themselves well enough, or being drowned out by the volume of the discussion).

        “Those parts of Numpy are implemented in FORTRAN or contain external libraries that would have to be re-written in RPYthon.”

        No, that’s wrong. It’s not like RPython has no way at all to interface with external libraries. I don’t think anybody ever planned to reimplement the heavy lifting in RPython, or claimed to.

        Of course the approach of doing the interfacing in RPython doesn’t actually scale, so your points about needing Cython still stand. However, I think that support for Cython is a completely separate project that can be done by different contributors and in parallel to the work of supporting a fast array module.

        Meta-point of this comment: I think there are some misunderstandings in this whole discussion about various points (probably on both sides), which led to the whole “brouhaha”.

      • Andrew Dalke says:

        My reply is because I take umbrage at the belief that a “serious” technical user of Python must use Scipy, and that a … “numpypy” shall we say? … by itself isn’t enough, even as a first step. It’s good enough for me, and many of the scientists I work with. I have bias error because I know few people who use SciPy. You and Travis have bias error because you know few people who only use NumPy. I believe you are more right than I am, but I, as a scientific software programmer in Python, feel somewhat ostracized by your implied view that I am not a serious programmer because I don’t use SciPy.

        For many of the things I do, after all, “integration with legacy software” means “talk to the program through the command-line.”

        You asked: “can you be absolutely certain that your usage of Numpy will be supported by PyPy?”

        Yes, I am absolutely certain of that. I’ve been using Python for scientific work since before there was a numeric package. Missing features get added because people work on it. If I need it, then I can contribute the code to PyPy. Have you doubts?

        You then asked about “someone new to the Python space” and expressed concern about the fragmentation that might occur. I do not share your conservatism; or perhaps I should say pessimism. If your view were true there would be no Jython, no IronPython. If it were true there would be no Boost (since SWIG could do most of it), no numarray (since Numeric could do most of it), no … shall I go on?

        If you are new to the Python space then you look to see what others (in your lab, in your field, etc.) are doing, and you often choose that. If most people in chemistry are doing PyPy then you’ll likely end up with PyPy. There will be crappy bridges at the start, like using file I/O to send data over to matplotlib running under CPython, but these will be improved and fixed over time.

        Of that I have no doubt.

        I have no need to be “absolutely sure [my] code will work on PyPy” just like I have no need to be absolutely sure it will work on Python. I’ve had to work around limitations in CPython, in various C compilers, in Perl, and every other language implementation. Why should PyPy be any different?

        I have no doubt there will be Cython->PyPy adapters. I have no doubt that someone will have PyPy support SWIG’s .i files. These take time, and I can wait; and occasionally help.

        What you’re forgetting is that PyPy opens a whole new set of solutions. For example, people use f2c to translate LAPACK from Fortran to C. There will be people who write the equivalent to convert Fortran into Python … and people who improve PyPy for that sort of code. It won’t be as fast as Fortran, but it will likely be fast enough.

        I do point out that the de facto “glue language” across the four main Python implementations is ctypes. Last week I shipped a ctypes interface to one of my (scientific) clients, and replaced the mishmash of SWIG code they had; version-specific SWIG, since it was 1.x code which didn’t work with 2.x. In CPython I wouldn’t do that for time-critical bindings, because CPython’s ctypes call interface is several times worse than going through the C-API interface. But in PyPy it’s faster, so ctypes may be the right solution there.

        That means that the Biopython developers, who currently drop to C for performance, along with Python/C API code, might with PyPy not need C at all, or just have a normal C library and call it with PyPy’s ctypes interface.
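
        The mechanism is simple enough to sketch generically, with the standard C math library standing in for a real scientific library:

            import ctypes
            import ctypes.util

            # load libm as a stand-in for a real scientific library;
            # find_library may return None on some platforms
            libm = ctypes.CDLL(ctypes.util.find_library("m"))
            libm.cos.argtypes = [ctypes.c_double]  # declare the C signature up front
            libm.cos.restype = ctypes.c_double

            print(libm.cos(0.0))  # -> 1.0, a direct foreign-function call, no C-API needed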

        So what you’re complaining about seems to be that PyPy hasn’t had the 10+ years that CPython has had to acquire multiple glue interfaces.

        Lastly, I don’t think you really believe this statement: “I personally think that providing a Python interpreter that supports on-the-fly, dynamic compilation to native code is *itself* a killer feature for the scientific world.”

        IronPython and Jython both compile down, with their respective JITs, to native code. The IronPython demo at PyCon a few years ago clearly showed, in Visual Studio’s debugger, the Intel assembly corresponding to the Python code. Yet most scientific programmers haven’t switched over.

        The problem is that most of it is boilerplate, with lots of overhead. It takes something like PyPy to figure out which of the boilerplate can be omitted.

        P.S. Hi Ian!

    • Regarding Biopython using NumPy, we’re already trying it out under PyPy. Large chunks of Biopython do not use NumPy at all, and although there are a few problems on PyPy 1.6 (one due to a missing XML library; bug filed), most of that seems to work.

      A few things using numpy work with PyPy 1.6’s “micronumpy”, but there are major gaps (e.g. missing functions like numpy.dot and the whole of numpy.linalg – bugs filed), and of course there is some of our code using the numpy C API to think about.

  3. I’ll wade in here. Hi Peter, I’m the chap who started the main discussion thread a few days back on pypy-dev. In general I agree with your points, particularly the point about surveying for ‘numpy’. I was one of those who said Yes to numpy but made no mention of scipy/matplotlib; it didn’t seem relevant at the time. I do fear that this survey answer has led to a skewed understanding of the needs of pypy users.

    Andrew – hi again, we met at EuroPython. Your answer comes as a bit of a surprise (but a welcome one). For my physics clients I often go from numpy into bits of scipy and/or cython. For AI work I can end up using the various scikits. I confess I’d assumed that most scientific domains would take advantage of scipy/cython, it is interesting to note that that’s not necessarily the case.

  4. As a biologist, I almost never need to use numpy directly. What I do often use are tools like matplotlib and now pandas, which build on numpy. Where these are pure Python, a reimplementation would work (once they actually get more of it done). But the pieces using the C-API, especially matplotlib, are pretty critical.

    For my own work, PyPy’s JIT would not save me enough time to make up for the loss of access to CPython extensions.

    As an IPython developer, it’s an annoyance that PyPy already calls its implementation ‘numpy’. Various parts of our code, which work perfectly if numpy can’t be imported, fail because they find a ‘numpy’ which doesn’t include many key parts of numpy.

    • Thomas Kluyver says:

      Hmmph, typo: PyPy’s JIT would definitely *not* save enough time to make up for the lack of extensions like matplotlib.

    • Peter Wang says:

      FYI Thomas – Pandas has Cython code (and therefore consumes the Numpy C API), and is getting more each day. Wes seems to really be in love with Cython. :)

      • Thomas Kluyver says:

        Indeed – I’ve read some of his posts about it. But I think the key parts (read_csv is critical for me) could easily fall back to pure-Python implementations if they had to – although I’d much rather the Cython implementation could work on PyPy.

        It seems to me PyPy and Cython are a fairly obvious match. PyPy has two sets of machinery to compile Python-like code without explicit types – the JIT interpreter and the RPython translation toolchain. Could one or other be adapted to use the type declarations in Cython code to statically compile it so it can be imported in PyPy?

      • Peter Wang says:

        Yes, PyPy + Cython would be great, and I agree that it seems like an obvious match. This is the direction that several people are pushing for, and it would be great. I don’t know enough about PyPy internals, but I think that they might have to provide a .pyd file for interfacing some runtime internals of the JIT interpreter, like objects/types.

      • The plan for supporting Cython is actually to write a Cython backend that produces normal Python code which uses ctypes to call into C functions. Work on this started several months ago, but it is stalled right now, because we have nobody who wants to lead this work.

      • Thomas Kluyver says:

        I know – it seems like it was a bit too much for one GSoC student to finish.

        My point was that this approach seems somewhat wasteful: Cython code typically includes static type declarations so it can run faster. Converting it back to Python+ctypes throws that information away, only for the JIT to infer those types again at runtime.

        Of course, the upside is that this code could be used by any Python implementation with a working ctypes implementation.
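
        To make the round trip concrete, the generated output might look something like this sketch (library, function, and types are all invented for illustration):

            import ctypes

            # hypothetical compiled half of a Cython module, loaded via ctypes
            _lib = ctypes.CDLL("./libkernels.so")
            _lib.dot1d.argtypes = [ctypes.POINTER(ctypes.c_double),
                                   ctypes.POINTER(ctypes.c_double),
                                   ctypes.c_size_t]
            _lib.dot1d.restype = ctypes.c_double

            def dot1d(xs, ys):
                # generated wrapper: marshal Python sequences and call through
                # ctypes; the static types from the .pyx source survive only as
                # these explicit signatures
                n = len(xs)
                ArrayT = ctypes.c_double * n
                return _lib.dot1d(ArrayT(*xs), ArrayT(*ys), n)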

  5. So, is what you are saying equivalent to the following: “PyPy should also work on Cython support, not just on implementing basic arrays.”?

    If yes, is it necessary to do the Cython support *before* doing the “nifty little array module”?

    • Peter Wang says:

      “So, is what you are saying equivalent to the following: “PyPy should also work on Cython support, not just on implementing basic arrays.”?”

      Yes. It’s not just me – Travis Oliphant, David Cournapeau, and others have said the same.

      “If yes, is it necessary to do the Cython support *before* doing the “nifty little array module”?”

      If PyPy supported Cython, then there would be effectively no need for a new array module implementation. Furthermore, they would be able to bring in a large number of existing modules (like pandas), and they would get all of the improvements to Numpy that are going to happen over the next year or two (including loop and stream fusion and automatic parallelization).

      Nobody – least of all me – is going to tell any other hacker that they can’t go and do something cool. I totally see the appeal of their pure RPython approach to an array library. The reason why the Scipy folks are getting involved at all is because this is being touted as some kind of replacement for Numpy (and the subsequent buzz on HN/Reddit/StackOverflow/etc. about “is this the future of Numpy?!”)

      Additionally, the PyPy guys seem to have expressed interest in furthering the Python story for scientific computing. To me, it’s not clear if this is just because they can JIT optimize some array loops, and think that answers all the performance problems on the scientific side. (If so, then they are in for a rude surprise when they meet the MKL.) As I keep saying: if they really want to make an impact on Python in science – and I sincerely believe they can – they should be listening to the input from the Scipy folks about the most effective ways to do so.

      • fijal says:

        “If PyPy supported Cython, then there would be effectively no need for a new array module implementation. Furthermore, they would be able to bring in a large number of existing modules (like pandas), and they would get all of the improvements to Numpy that are going to happen over the next year or two (including loop and stream fusion and automatic parallelization).”

        I would be very interested to know how supporting cython would give us the array module for free. I’m all ears.

    • I think cython support is important if pypy wants to go beyond just numpy, because most scientific packages built on top of numpy use it (scikit.learn, scipy, etc…). It also seems to me that it is the fastest route toward easy integration with compiled code (be it C/C++/Fortran). My understanding is that pypy can interface with compiled code already, so the main point would be automated wrapping of compiled code (there are at least several thousand such wrapped functions in scipy alone).

  6. Stuart Axon says:

    The pypy guys can only focus on one thing at a time, and they want an easy way to start.

    Getting a basic small implementation working fast is a start; their way will also work with the JIT.

    People are right, it won’t integrate with all the other stuff straight away, but it will be more than we have at the moment (a little more).

    It seems a lot of people want it to work with the other stuff, so maybe someone will step forward and do that; having something that works a little bit will show the way and move things forward. With open source you have to either wait or take part, so there’s not much point in reiterating this point: if people want it to work better with, say, a Cython FFI, then by all means go and work on Cython/ctypes integration – it’s not the pypy guys’ job.

    • Stuart, the issue is that if the focus is on a small subset of numpy working well, it is confusing to call it numpy. I completely understand the pypy developers’ wish to leverage their infrastructure and do fun things with arrays. But calling it numpy means something else for quite a few people (on the ML alone, there were several people who said they meant scipy/matplotlib when they wrote numpy).

      I don’t think it is fair to call FUD here, as Carl did, when multiple people on the pypy ML mentioned they were confused as well.

      • “if the focus is on a small subset of numpy working well”

        But it’s not! I don’t know where people get this idea. The work plan on pypy.org clearly states that we want to support the full Python-facing side of the Numpy module, step by step.

  7. Jacob Hallén says:

    I think this is a very useful discussion that brings out the needs of the Scipy community when it comes to linking with libraries. I am sure the PyPy community will use the information when planning the next stage of improvements. However, it is clear that the implementation of numpy in PyPy has a number of applications even without extensive library integration. I agree that there may be some confusion about the applicability of this project, and it would probably be useful to work a bit more on the language on the project page. However, numpy development is a great stepping stone for people who want to get into coding on PyPy. Therefore it makes sense for us to push ahead in this direction.

  8. The discussion is interesting and informative, but the reality remains that the PyPy guys are going to effectively split and confuse the community by continuing what they are doing and code re-use will decrease. They should at least rename the project.

    I would be interested in ways that ideas from PyPy could be used to improve the CPython eco-system instead of re-create it. But, everybody has their right to work on what they want. Diversity in the end is a good thing as it forces us all to think differently to solve hard problems.

    R, Matlab, NumRuby, PDL, and all other high-level expressions of scientific code also “split” the community and it’s not necessarily a bad thing. I wish them all the best and will be looking for ways to leverage their work and try to cooperate where it makes sense.

    Perhaps my position can be summarized like this: If someone is willing to spend 10 million dollars or more, then I would be more enthused about the NumPy on PyPy project, as it would then potentially have the ability to cover the domains and use-cases I actually care about – and even then I would have to get much more comfortable with the runtime environment that PyPy code actually lives in. On the other hand, for less than $500,000 I believe you could get the same benefits and more of NumPy on PyPy by a project like Dynamically Compiled Python, which can already leverage all of the CPython stack today.

  9. santagada says:

    “On the other hand, for less than $500,000 I believe you could get the same benefits and more of NumPy on PyPy by a project like Dynamically Compiled Python which can already leverage all of the CPython stack today.”

    I think exactly the opposite: cpython is a dead end in terms of performance, and the only future for python is pypy. When unladen swallow started I said the same thing, but people wasted lots of time only to get a marginal performance boost while making cpython many times more complex. I still think that most of these other projects that try to make python fast don’t know how hard it is.

    If I was a company with more than a hundred bucks to invest in python performance I would give to the pypy guys instead of wasting my money on the other dead ends. If you have less than 100 bucks to invest then go buy some tasty food while you wait for pypy to make magic happen :)

  10. Andrew says:

    I agree with santagada that CPython is a dead end. I use both numpy and scipy, and these packages are now a liability to me. Almost everything else works on PyPy, which gives 20x performance improvements on some of my projects.

    The longer the numpy and scipy projects continue to fail on PyPy, the more I will pine for alternatives. Instead of the scipy project wasting its time trying to improve the CPython ecosystem, they should try to find out how to make as much of their code as possible work with PyPy.
