Python web development: the dilemma

There are a bunch of Python-based web frameworks. So many that it’s really hard to choose one. Most of them come with their own templating language and a different approach.

Dilemma 1

I really think most frameworks should have nothing to do with templating. We should give up on that mix-o-matic. A framework is a framework, not a templating language. So when looking for a good framework, never look at its templating system: you should use the one you want, not the one that comes with the framework.

Dilemma 2

The second thing that gets me into trouble is the approach. I found about 3 different ways to handle requests, and I’m still wondering which is the best choice.

Thread approach

Webware takes this kind of approach. It maintains a pool of servlets and uses Apache as a front end to dispatch requests to the Webware server. This approach is really interesting since you can tweak performance by keeping some things cached. On the other hand, you get all the drawbacks of working this way:

  • A lot of Python modules aren’t thread-safe. Even SQLObject, which is supposed to be written with threads in mind, has trouble with them. Phil lost part of a night discovering that SQLObject + SQLite gave him a bunch of errors.
  • Really hard to deploy: since the servlet pool doesn’t support advanced caching, every servlet that gets loaded stays loaded all the time. (I want to deploy a bunch of websites, something like 20, with low traffic, so this isn’t the right way.)
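
To make the thread-safety point concrete, here is a minimal sketch of one common workaround: give each worker thread its own database connection instead of sharing one. This is not how Webware or SQLObject do it internally; the helper name get_connection is hypothetical and only the standard library is used.

```python
import sqlite3
import threading

# Hypothetical helper (not part of Webware or SQLObject):
# each thread lazily opens and then reuses its own connection.
_local = threading.local()


def get_connection(path):
    """Return the calling thread's private SQLite connection."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        # sqlite3 connections refuse cross-thread use by default,
        # so one connection per thread sidesteps those errors.
        conn = sqlite3.connect(path)
        _local.conn = conn
    return conn
```

A servlet would call get_connection() at the start of each request. The price is one open connection per thread, which is exactly the kind of resource I would like to be able to cap.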

Mod_Python approach

A lot of frameworks work this way. The one I really enjoy is MP Servlets. Its major feature is that it relies on Apache 2’s threading behaviour, so you don’t have to use Python threads. This sounds really good, and you don’t have to launch a Python server on the host. But the major drawback is that there is no way to maintain (or even limit) a pool of objects between requests. (It is possible, but you need to tweak the way Apache handles requests to disable threading for a process, and this isn’t a good approach for virtually hosted websites.)

Twisted approach

Twisted tries to fix the thread problem by using a single loop and async request handling. With that you can:

  • enjoy the Python world without the thread nightmare
  • design small dedicated servers

But once again this way has a major drawback: you will be unable to use stuff like SQLObject, since Twisted requires special database connections to avoid blocking (remember, single loop). And I don’t want to go back to the old-fashioned SQL way of building websites. Another thing to remember about Twisted is that you need Apache to proxy requests to the host, and I don’t really think that’s great for performance.
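
To illustrate the single-loop problem, here is a small sketch of handing a blocking call to a worker thread so the loop stays free. It uses the standard library’s asyncio as a stand-in, not Twisted’s own API (the comments below mention Twisted’s deferToThread, which plays a similar role). blocking_query is a hypothetical placeholder for an ORM or DB-API call.

```python
import asyncio
import time


def blocking_query(n):
    # Hypothetical stand-in for a blocking ORM / DB-API call.
    time.sleep(0.05)
    return n * 2


async def handle_request(n):
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool; the event
    # loop keeps serving other requests in the meantime.
    return await loop.run_in_executor(None, blocking_query, n)


async def main():
    # Five "requests" handled concurrently; results keep their order.
    return await asyncio.gather(*(handle_request(i) for i in range(5)))


def run_demo():
    return asyncio.run(main())
```

Calling blocking_query() directly inside handle_request() would stall every other request for the duration of the query, which is the locking problem described above. Note that the worker threads bring back the usual thread-safety questions for whatever library they call.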

Conclusion?

I don’t have a good conclusion. I know Zope can fix some of these problems, but Zope is too hard for me to develop for and maintain. Zope3 tends to be simpler in a lot of ways, but it is still under development and takes a lot of work to learn right now.

Update: Nobody seems to have a decent solution :) Really strange, no?




16 thoughts on “Python web development: the dilemma”

  1. If it’s just the caching issue with Webware, that can certainly be resolved — it would just be a small modification to ServletFactory to make it keep its cache down to a reasonable size. You can also mark your servlets as not being reusable, which probably wouldn’t be much of a performance hit.

    As far as threading issues, it is a problem, but on some level any system has to deal with concurrency, and concurrency is hard. It’s the same in Zope (though they tend to keep threads more isolated to avoid some of these issues). Twisted doesn’t solve this either, it just uses a different technique for concurrency. And if you want to use something like SQLObject (which you can) you’ll have to do so with threads — leaving you potentially with both thread and async issues.

    I think there are some advantages to the forking system that Apache uses. SkunkWeb also does this, but unlike mod_python it doesn’t run in the Apache process. Keeping the two isolated can provide some advantages, and make it all more controllable.
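
The idea of keeping the servlet cache down to a reasonable size could look roughly like the sketch below. This is not Webware’s ServletFactory code; BoundedCache and its methods are hypothetical names, and it simply evicts the least recently used entry once a size limit is reached.

```python
import threading
from collections import OrderedDict


class BoundedCache:
    """Hypothetical LRU cache: keeps at most max_size entries."""

    def __init__(self, max_size=50):
        self.max_size = max_size
        self._items = OrderedDict()
        self._lock = threading.Lock()  # the app server is threaded

    def get(self, key, factory):
        """Return the cached value for key, building it on a miss."""
        with self._lock:
            if key in self._items:
                # Mark as most recently used.
                self._items.move_to_end(key)
                return self._items[key]
            value = factory(key)
            self._items[key] = value
            if len(self._items) > self.max_size:
                # Drop the least recently used entry.
                self._items.popitem(last=False)
            return value

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

Here factory would be whatever loads and parses a servlet or template. Holding the lock while it runs keeps the sketch simple, at the cost of serializing cache misses.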

  2. Answer to Ian:

    • I think the best way to fix my trouble with Webware is to tweak the ServletFactory, since I want advanced caching. I want the servlets to be reusable because I work with ZPT, and ZPT templates take a long time to parse.
    • Mixing Twisted with threads won’t be a nice approach, I think, since as you say, ‘leaving you potentially with both thread and async issues’. I haven’t looked deeply into ZPublisher, but Zope doesn’t suffer from this kind of issue.
    • The fork approach (which I forgot in the post) can be a way to do this too. But once again I guess this can eat a lot of CPU, and I try to avoid that. But I’m going to look at Skunk more closely.
  3. Actually Twisted already uses threads for DB querying. It has a fantastic package, twisted.enterprise, that can handle all the most popular databases available for Python (postgres, msserver, firebird, mysql, oracle and so on), and it also has an object-relational mapper built in called Row, which works even though it’s not maintained anymore. I would suggest that if you really need the object mapping stuff, you could offer to take over its maintenance.

    Also there is a good light and fast OODB called Atop which is available at http://www.divmod.org

    Twisted has NO problems working with databases itself, and it has no problems working with threads. Actually it uses a ThreadPool to connect to various databases. If you need to run some heavy blocking tasks, you can always use ‘deferToThread’, which sends your task to a thread and then returns the result to the main Twisted thread.

    Also you can rely on processes, which are far better than threads in python (at least), because they can be debugged easily and can be stopped easily and can scale to more computers. 32 computers are cheaper than a 32-way SMP supercomputer.

    Also Nevow is arguably the best web framework out there for scalability and speed :P

  4. Answer for Dialtone (the Twisted enthusiast of #twisted.web)

    • For me an ORM has nothing to do with a web-oriented framework. I think SQLObject or Modeling offer a large set of possibilities for this kind of issue.
    • OODB: hmm, if I needed to choose one OODB I guess I would choose ZODB. But once again that doesn’t fit my needs.
    • Do you mean that using SQLObject in Twisted is doable? Hmm, mixing threads with async is perhaps not such a bad idea. I don’t know.
    • Processes are a good approach too. Perhaps a mix, I mean: one Twisted server for page rendering and another for the DB.
    • Last: yes, Nevow is pretty cool for templating (one more!!) but I don’t feel confident with Stan :) I think a template is an HTML file, and Stan… no thanks :)
  5. Actually Stan is just an s-expr like syntax to build a template without using (x)html.

    It is also used to produce XML, like an RSS feed, with little to no effort.

    The most important thing of all, anyway, is that you don’t have to use stan if you don’t like it. It is widely used in the examples because with stan you can fit everything inside just one python file (which is handy for that matter).

    BTW, the reason you have a 2-day vacuum on your DB is probably that you haven’t written your SQL by hand (which is great for optimizing a database).

    Give Nevow a try and don’t look at Stan if you don’t like it (it is great for prototyping, though).

    PS: I’m no Twisted enthusiast ;P

  6. Another approach to handling requests is implemented by Quixote using SCGI. An SCGI server listens on a local port, and Apache routes requests there via mod_scgi.

    Performance is quite high, as the SCGI server can spawn more processes if requests come in rapidly. Threading within the request handler can be used, but does not need to be used, and is usually not needed.

    SCGI is a very lightweight protocol. http://www.quixote.ca
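
To give an idea of how lightweight SCGI is, here is a sketch of parsing one raw request as laid out in the SCGI specification: a netstring holding NUL-separated header names and values, followed by the body. The function name parse_scgi_request is hypothetical and this is not Quixote’s code; a real server would also read incrementally from the socket.

```python
def parse_scgi_request(data):
    """Split one raw SCGI request (bytes) into (headers, body)."""
    # The header block is a netstring: b"<length>:<payload>,"
    length_text, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing netstring length")
    header_len = int(length_text)
    header_block = rest[:header_len]
    if rest[header_len:header_len + 1] != b",":
        raise ValueError("netstring not terminated by a comma")

    # Payload layout: name NUL value NUL name NUL value NUL ...
    fields = header_block.split(b"\0")[:-1]
    if len(fields) % 2:
        raise ValueError("header name without a value")
    headers = {
        fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
        for i in range(0, len(fields), 2)
    }

    # CONTENT_LENGTH is mandatory and gives the body size in bytes.
    body_start = header_len + 1
    body = rest[body_start:body_start + int(headers["CONTENT_LENGTH"])]
    return headers, body
```

The headers are the usual CGI variables (REQUEST_METHOD, REQUEST_URI and so on), so the request handler behind it can stay close to a plain CGI script.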

    • Quixote and Webware have quite the same approach (handling is done through mod_webkit and executed in the threaded AppServer on Webware). But in Webware, objects are shared through the AppServer, which is a threaded TCPServer. This means using threads, and most of us know by now that threading in Python is a holy grail :). I’m wondering how Quixote handles this case.
    • I just finished a test with the Twisted producer/consumer. It works, but:
      • splitting a page rendering into chunks is quite hard (or impossible)
      • if you use chunks that are too small, you get really bad performance.
    • On performance, I’m wondering: should we do a bit of benchmarking? I can collect results, but I don’t have a lot of time to bench each system.
  7. ??? shared objects are bloody simple with mod_python; module-level globals persist between requests and threads quite nicely.

  8. I too have this dilemma.  I recently learned python (have a pretty good understanding of it after a week or so) and now want to write web stuff with it…  The best approach to writing things I’ve managed to find so far while looking is  HTML::Mason, but that unfortunately uses perl :-)

    In that, a page is a component, which can be any combination of perl and html, usually stuff like

    The result is=<% $text %>

    <%args>
    $greet => "Hello"
    </%args>

    <%init>
    my $text = "$greet, World";
    </%init>

    which is pretty obvious what it does, but then you can automatically call it as mypage?greet=hi, or from another page like

    <& mypage, greet=>"hi" &>

    which means that bigger pages can be made up of sub pages that don’t even know they are sub pages.  The closest thing to this in python seems to be Spyce, but that doesn’t look very popular.

    In most of the frameworks I’ve looked at, there is no way to do this type of embedding of components, without the parent page knowing about the sub page, or the sub page knowing about the parent.

    see the example for nested pages here:

    http://www.onlamp.com/pub/a/python/2004/02/26/python_server_pages.html

    • On mod_python: I have a bunch of websites, so I want to tweak what should be shared and what shouldn’t, and this kind of thing involves timestamping some objects and disabling them when nothing happens. I guess Zope does it without trouble, though. To be really clear, I want to control how many DB connections I maintain and things like that. Another thing that comes to mind is that to use Session (in mod_python) you need Apache 2, which is a threaded app, so I will get the same thread issue. The only way to solve this is to use a single mod_python-enabled Apache fork, but tweaking Apache this way is good for customer apps, not for a virtual hoster like me.
    • For cherry, I looked at it, but never felt happy with it, simply because I like classes and cherry doesn’t tend to use them. And my dilemma isn’t about the Pythonic way to do this, but more about scaling and reduce_to_the_max(tm) the CPU load.
    • To Justin (HTML::Mason): I did a lot of Perl in a past life. I got this question many times on the python-fr mailing list, and I usually answer that Cheetah is close to what Mason does.
  9. Hey…

    I’m developing a fairly complex DB-driven site for my small business and I was posed with the same dilemma. I just really didn’t like Zope, wanted something web-oriented like PHP without the nasty warts, and wanted to leverage the ease of expression of Python. Check out skunkweb (http://www.skunkweb.org) — it’s really amazing (it’s closest to Webware among the ones you mentioned above). The name is pretty retarded but so far it has met all of our needs.

    It does have a lightweight templating language but it makes sense (and makes easy things easy and hard things possible.) You can do some pretty kickass stuff with very little code, and for me it makes web dev fun again.

    I looked at the other frameworks you mentioned and found that they were either clumsy or had showstopper issues like the ones you discussed. (To the poster above — I hated the way CherryPy interleaves html and python — it’s clumsy as hell and brings back ulcer-inducing nightmares of ugly, unmaintainable PHP written by novices. Skunkweb’s STML tags are clearly identifiable as STML tags and are visually easy to separate from HTML, which is great.)

    I can post more about it if you’d like. Check out PyDO (comes with Skunkweb; also investigating SQLObject, sqlobject.org), which makes DB interaction really great; you basically don’t have to write SQL, and every row returned can be accessed like an object or dict.

    e.g.

    u = Users.getUnique( username='xyz', password='foo' )
    print u['address']

    Oh, it also runs off a forking model, so a runaway request won’t take down your server, and you don’t have ugly threading issues. (Flipside is that it’s kind of a resource hog, at least in terms of memory footprint). Hope this helps — Skunkweb really rocks and I was lucky to stumble upon it. It will (already has) save me months of development.

  10. Hi

    Check http://orderweb.co.za

    My approach does not use any templating language; you can choose what you want. I personally prefer HTMLGen, so I never mix HTML with code but rather generate HTML from Python objects.

    Code will be released at the end of August on SourceForge

  11. As far as Twisted goes, I think many people get really confused about this event queue thing. In reality, just like the event queue and interrupts on your processor, you don’t need to know about it!! Knowing about it lets you understand what goes on behind the scenes a little more and play with some more powerful stuff, but 99% of the time it’s irrelevant.

    As for Nevow, a colleague helps with the development and I’ve played devil’s advocate repeatedly on web development issues. Nevow handles static, on-disk templates better than any templating system I know. I’m not even just a blind convert: I’ve tried out 20+ different systems and in the end was in the middle of developing my own with what I thought were the best features. Nevow beat me to it. Even things like automatic form generation (formless) can be rendered from on-disk templates. Your templates don’t even have to be split into lots of little bits and they will all render accurately as normal HTML.

    I’ve started to look at Stan and, with the caveat that your HTML is VVV.Semantic and that all of your presentation is done through CSS (which is how we build our websites and applications), then Stan can be a very good thing for component-level building. I’m still wary of using Stan for framework templates (the page level ones that may contain) but we’re building these sorts of things at the moment so we’ll try it out.

  12. Just a bit of feedback on the Twisted/Nevow thing. We’ve just finished a largish project which is built on Nevow and Twisted. We had three people working on the project: one was just learning Python (never mind Twisted/Nevow), another (myself) was using Nevow/Twisted in anger for the first time, and the final person was experienced in Nevow. We were also using deferred database access and even threading in places in order to prevent blocking the Twisted ‘queue’ (processing large images).

    The project went remarkably well although there were a few frustrations. Firstly the frustrations :-

    1 Nevow errors are indecipherable at times

    2 The documentation for Nevow is virtually non-existent

    3 Table layouts are evil (nothing to do with Nevow but I had to say it)

    4 forms are a pain

    just to answer a couple of these :-

    1 However, the majority of the errors we had were pretty much instantly recognisable by our ‘expert’, and I’ve been informed that a suggestion has been made to offer ‘possible problem solutions’ alongside the error messages that are associated with common problems.

    2 Hmm… combine this with the indecipherable error messages and this led to many head-scratching moments that were only resolved when our ‘expert’ was brought in. 99% of these moments could have been prevented with a good FAQ and documentation. On the documentation side I believe people are already working on it.

    3 use css layout where possible.

    4 Use formless where possible. The main problem with formless is the lack of control over the layout of the form. I’ve also been informed this is being worked on and, in my opinion, when you can use formless and have control over the layout of the form, this will be the point where using nevow will be near unquestionable.

    just to look at a couple of comments

    * OODB: hmm, if I needed to choose one OODB I guess I would choose ZODB. But once again that doesn’t fit my needs.

    We’ve gone through a lot of pain in looking at persistence and we finally came to the conclusion that there isn’t an ORM out there that can efficiently handle complex interrelated data in the fashion that a database can. A database, although a pain, just offers too many advantages at this point in time. Pypersist looks like a very interesting concept but I’d always be worried that my memory would run out. We’ve finally settled on using Postgres as our persistence layer and psycopg2 as the DB-API driver. Time will tell where ORM goes; it seems to offer the functionality at the moment but not the performance.

        * Do you mean that using SQLObject in Twisted is doable? Hmm, mixing threads with async is perhaps not such a bad idea. I don’t know.

    As I mentioned, threads are very simple in Twisted.

        * Processes are a good approach too. Perhaps a mix, I mean: one Twisted server for page rendering and another for the DB.

    Hmm not sure I know what you mean here.

        * Last: yes, Nevow is pretty cool for templating (one more!!) but I don’t feel confident with Stan :) I think a template is an HTML file, and Stan… no thanks :)

    Ah.. I know exactly how you feel here. However, in my thinking about templates I went through the following process.

    1) If a template needs processing logic, that processing should be done using the best tool for the job. In 99% of cases this is the underlying programming language.

    2) Given this, templates should only contain the ability to mark up a block as a ‘pattern’ for use, to provide slots for data and to mark up attributes for replacement.

    3) When a template becomes complex, for instance grouping a list of sports fixtures by round and then date, with a special header for each, marking up fixtures with scores differently from fixtures without scores, and marking up alternate rows with different colours: this situation would need so many fragments in order to be driven from disk that it wouldn’t make any sense anyway.

    4) This is the final one… once disk fragments become dissociated from each other in this way, they are typically small enough to be represented using Stan without ending up with an unreadable mess.

    5) you now have all the power of python, the majority of your page templates on disk and only at the component level do you pull your templates into stan. These templates then become self commenting and very easily modified.

    6) final caveat.. if you really want to you can leave the larger fragments on disk, all marked up with patterns, and then use simple stan elements to populate the contents of them.

    ahh… enough waffle now.. back to work on a PHP project for a moment :-(
