Posted over 13 years ago by Fetchez le Python
What’s this? Read this post.
What happened
I am getting ready for an important push today that will switch some services over to Python. Things are looking good.
On a side note: the fish in my aquarium kept dying, and I finally found the problem by having the water analyzed. The nice little river in my small town (175 people), which looks so clean, is basically saturated with nitrates. I used it for my aquarium because I did not want to use tap water combined with products to remove the chlorine. I thought I was helping those poor fish out with nice river water, but I was killing them. This is insane.
Cross-team reviews
At the previous summit, we discussed an interesting idea at the MoPy meeting: what if people could ask for a code review from anyone on another team who has the skills to do the review? I first thought about writing some kind of plugin for Bugzilla, giving every registered user a set of skills, and then proposing a reviewer in the patch UI.
But that means people need to use the Bugzilla review process, and sometimes they use something else. I also wanted this review to be just an extra review with low commitment from the reviewers. In Bugzilla, if you are asked to review something, it will sit there waiting for your review forever, even if you never do it. I don’t think there’s a way to time out a review.
Last, cross-team reviews could be something broader than Mozilla teams. What about getting a review from someone in another Python project?
Anyway, I started “Bug Brothers”, a prototype to do this. There’s a demo running at http://bugbro.ziade.org and the code is at https://bitbucket.org/tarek/bugbro. This was a good opportunity to try Pyramid, and yeah, there are no tests yet and the code is not very clean.
It’s not finished, but it already lets people ask for a review by providing a link to a diff. When you review something you earn credits, and when you ask for a review you pay credits. Everything is email-driven.
The next steps are to add more features like Rietveld has, tighter integration with Bugzilla, GitHub, etc., but without introducing a dependency on any tool, so it can work for every team.
Mozilla Pulse
Coming from the Plone/Zope/Python world, I miss my checkins mailing lists in Mozilla projects: that is, getting a mail every time a commit lands in one of the projects you work on. You can always read the Atom feeds of the various Mercurial repos, but that’s not the same.
What I want is a diff in a mail that I can look at quickly. This is very useful for getting instant reviews from other people: you usually catch more typos or mistakes, and it also helps initiate coding discussions.
Christian Legnitto has started the Mozilla Pulse project, which is exactly what I needed: a way to get notified of every change in the Mozilla ecosystem. I was waiting for Pulse to get hooked into all our repos, and that is now done.
The script that sends an e-mail on each commit is very simple: https://bitbucket.org/tarek/services-pulse/src/tip/consumer.py . I still need to add a diff view in the e-mail and a few options, but that’s basically it. For now it only keeps events happening in hg.mozilla.org/services, and it sends e-mail to our services mailing list.
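For the curious, here is a minimal sketch of what the mail-sending half of such a consumer could look like. The event dictionary layout, addresses and helper name are my own assumptions for illustration, not the actual Pulse message format (see the linked script for the real thing):

import smtplib
from email.mime.text import MIMEText

def mail_commit(event, to_addr="services-dev@example.com"):
    # `event` is assumed to look like:
    # {'author': ..., 'repo': ..., 'message': ..., 'url': ...}
    body = "%(author)s pushed to %(repo)s:\n\n%(message)s\n\n%(url)s" % event
    msg = MIMEText(body)
    msg["Subject"] = "[commit] %s: %s" % (event["repo"], event["message"][:60])
    msg["From"] = "pulse-consumer@example.com"
    msg["To"] = to_addr

    server = smtplib.SMTP("localhost")
    try:
        server.sendmail(msg["From"], [to_addr], msg.as_string())
    finally:
        server.quit()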
Overall, Pulse is a good way for anyone to watch a particular area of the Mozilla project.
Stop guessing encodings
We had a bug in our Services code related to a password containing a non-ASCII character. It’s a shame that, as a Frenchman, I did not insist on unicode vs. str earlier. So here we go.
In Python 2 we have two types to deal with strings: str and unicode. The str type basically stores bytes, so a str is encoded using a particular encoding; by default that encoding is ASCII. The unicode type stores strings as 16- or 32-bit integers and covers the whole unicode table. The most common error is to make no assumption whatsoever about the type of string you get. What happens then is that functions that need bytes will simply try to encode a unicode string with the ascii codec, or vice-versa:
>>> import base64
>>> def encode(data):
...     return base64.encodestring(data)
...
>>> encode('I am oke')
'SSBhbSBva2U=\n'
>>> encode(u'I am oké')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in encode
  File "/usr/lib/python2.7/base64.py", line 315, in encodestring
    pieces.append(binascii.b2a_base64(chunk))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)
One solution that comes to mind is to check the type of the string in your function:
>>> def encode(data):
...     if isinstance(data, unicode):
...         data = data.encode('utf8')
...     return base64.encodestring(data)
...
>>> encode(u'I am oké')
'SSBhbSBva8Op\n'
This is tempting but leads to another issue: if your program accepts both str and unicode everywhere by default, chances are high that you’ll forget the type check somewhere, or combine str and unicode in some places.
A much better approach is to use only unicode internally in your program and deal with conversions at the inputs and outputs. In a Python web app this boils down to checking that all inputs are unicode (beware of JSON).
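Here is a minimal sketch of that convention; the helper names are mine, and the point is simply that conversions happen only at the edges:

# Convert at the boundaries, keep pure unicode inside the program.
def from_wire(raw, encoding='utf8'):
    # Input boundary: bytes come in, unicode comes out.
    if isinstance(raw, str):
        return raw.decode(encoding)
    return raw  # already unicode

def to_wire(text, encoding='utf8'):
    # Output boundary: unicode goes in, bytes come out.
    if isinstance(text, unicode):
        return text.encode(encoding)
    return text  # already bytes

password = from_wire('I am ok\xc3\xa9')  # -> u'I am ok\xe9'
raw = to_wire(password)                  # -> 'I am ok\xc3\xa9'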
The other issue is the encoding and the decoding. Which codec should we use? The answer is utf-8, because it’s the most universal. To make sure there’s no misunderstanding: a unicode string is a decoded string, and a str is encoded. So you decode() a str and encode() a unicode:
>>> u'é'.encode('utf8')
'\xc3\xa9'
>>> 'é'.decode('utf8')
u'\xe9'
So, use only unicode in your apps, and when encoding, use the utf8 codec by default.
More on this here: http://docs.python.org/howto/unicode.html
What’s next
More Python deployments
Some benchmarks on the Sync server
Posted over 13 years ago by gmane.comp.web.zope.announce
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I have pushed out releases of PAS in order to address a potential
security vulnerability reported by Alan Hoey:
https://bugs.launchpad.net/zope-pas/+bug/789858
The releases are available on PyPI:
http://pypi.python.org/pypi/Products.PluggableAuthService/1.5.5
http://pypi.python.org/pypi/Products.PluggableAuthService/1.6.5
http://pypi.python.org/pypi/Products.PluggableAuthService/1.7.5
If you cannot install one of these versions for some reason, the patch
on the Launchpad should be applicable (with some fuzz) to any PAS
version since 1.4.
Tres.
- --
===================================================================
Tres Seaver +1 540-429-0999 tseaver< at >palladion.com
Palladion Software "Excellence by Design" http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk3j2PoACgkQ+gerLs4ltQ7yfQCgt+WJSObr8jaZTyGdvH8oTFOC
R6gAoJtsV
Posted over 13 years ago by Plone News
The 2011 Plone Conference will be held at the San Francisco State University downtown campus November 3-6. The last day of the conference will be a half day of sessions, including a brunch, and a half day of sprint preparation. This will give those who aren't sprinting extra time to travel, and sprinters extra time to coordinate and group up. Training will be November 1-2 and sprints will be November 6-8.
There are hundreds of hotels and hostels in the downtown San Francisco area. For those who need to plan right away, please see this map to get an idea of the location. We recommend flying into SFO and staying downtown; renting a car is not advised. We are currently working with a travel agent who will help coordinate group flights and visas.
Tickets will be $250 at the early bird rate and $300 at full price. Speakers will get a discount at $225, and students will be able to snag a seat at $175. Unlike previous years, we will NOT be catering any food except for a brunch on Sunday. There is a full food court in the basement, in addition to hundreds of restaurants and food stands within a 1-mile radius of the venue. Not catering allows us to keep the price down and helps you keep your stomach happier. Don't worry! We will still have coffee, snacks, and plenty of water coolers for chatting around. The ticket price includes a "celebration" on Saturday night in traditional Plone style.
In addition to the standard Plone tracks, we will also have a "related technologies" track, with a heavy emphasis on Pyramid and WSGI application development. There will be training available for this as well. A full track of open space talks will run throughout the conference, and lightning talks will start and end each day.
On top of all that nerdy bang for your buck, there will be coordinated activities for family and friends of Ploners who are travelling along, in the form of a "Better Half" track. San Francisco is a great experience for the whole family, and we are doing what we can to make this year's conference especially friendly for travelling partners.
Tickets go on sale June 1st, and we will have a conference website and full details very soon. Please note that conference attendance will be limited to 400 people due to very strict city fire laws, so start planning now.
Thanks for your patience, and we look forward to seeing you in San Francisco!
Posted over 13 years ago by Weblog
Armin Ronacher gives the keynote "State of webdev in Python" at the PyGrunn conference in Groningen, The Netherlands, organized by Paylogic and Goldmund Wyldebeast & Wunderliebe.
I am a founding member of the Pocoo Team, doing Jinja2, Crossroads, Werkzeug, etc. Python 3 is kind of the elephant in the room that no one is talking about. So Python in 2011 has the big Python 2 versus 3 debate. 'Unladen Swallow' is resting, Python 3.2 was released, and the packaging infrastructure is being worked on, including distutils2.
PyPy has become really fast. PyPy is Python written in Python. PyPy trunk is on average 3.7 times faster than standard CPython; see http://speed.pypy.org. There is only experimental support for the Python C API. Garbage collection behaves differently: there is no reference counting. So a few things will break, depending on what you are using PyPy for. Django, ctypes, pyglet and twisted all work.
All language development is now happening on Python 3. It adds unicode to the whole stack. The language is cleaned up. It does break backwards compatibility. Most code does not run on both 2 and 3, but a few packages do (lxml, at least at some point). Python 3 introduces unicode in exceptions and source compilation, as well as in identifiers (although I urge you not to try that). There is a greatly improved I/O API for unicode, there are some new language constructs, and the implementation was cleaned up a lot.
New constructs: extended iterable unpacking (a, *b = some_iterable), keyword-only arguments, nonlocal, and function parameter and return value annotations (use them e.g. for documentation purposes). print is now a function. There is improved syntax for catching and raising exceptions, and a new Ellipsis (...) syntax element.
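A short Python 3 sketch of a few of these constructs (all names arbitrary):

# Extended iterable unpacking: grab the head, keep the rest.
first, *rest = [1, 2, 3, 4]          # first == 1, rest == [2, 3, 4]

# Keyword-only arguments: anything after the bare * must be named.
def connect(host, *, port=80, timeout=10):
    return (host, port, timeout)

connect("example.org", port=8080)    # port has to be passed by name

# Parameter and return annotations, e.g. for documentation.
def scale(value: "a number", factor: float = 2.0) -> "value * factor":
    return value * factor

# print is a function now, and exceptions bind with 'as'.
try:
    {}["missing"]
except KeyError as exc:
    print("lookup failed:", exc)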
Some behaviours differ: metaclasses are more powerful, and list comprehensions are closer to generators now. Lesson: don't rely on undocumented 'features'.
Classic classes are gone. Imports are absolute by default.
Python 2.6 and 2.7 make it possible to write code that you can run through the 2to3 program to turn it into Python 3 code. You can use abstract base classes to check for certain interfaces (not: is this a dict, but: is this dict-like), as in the sketch below.
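For instance, a dict-like check could look like this minimal sketch (the abstract base classes live in the collections module in 2.6/2.7; the defaults and helper are made up):

import collections

DEFAULTS = {'debug': False, 'port': 8080}

def merge_defaults(config):
    # Not "is this a dict?" but "is this dict-like?": any object
    # implementing the Mapping ABC (dict, OrderedDict, custom
    # classes) passes the check.
    if not isinstance(config, collections.Mapping):
        raise TypeError("expected a mapping, got %r" % type(config))
    merged = dict(DEFAULTS)
    merged.update(config)
    return merged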
Do you want beauty of code? Use Python 3. Do you want raw speed?
Use PyPy.
Numeric libraries work great on Python 3 and benefit from improvements
in the language.
Predictions:
Most people will write their code against 2.7 with the intention of
supporting PyPy.
Libraries that require the C API will become less common.
We will see libraries that support targeting both Python 2.7 and
3.x.
Now the second part of this talk: Python and the Web. WSGI has a new specification for Python 3, and some work has been done to port implementations to Python 3. It just works; it's not really an issue anymore.
New developments: improvements to PyPy support for database adapters, improvements in template compilation in e.g. Django to take advantage of PyPy's behaviour, and some libraries being ported over to Python 3.
Python 3 can work. You can start porting libraries over. Issues will
only be resolved if you actually try to port. Higher level code is
usually easier to port; low level libraries are trickier. Porting is
easier if you drop support for 2.6. For porting, see
http://bit.ly/python3-now.
WSGI works well in practice on Python 3. Pylons and BFG are now Pyramid, which is a nice introduction to the Zope world. There is less and less framework-specific code out there; it is easier to share code.
At the low level, Werkzeug and WebOb may merge at some point; they are
much alike.
Frameworks are good. In new frameworks we can explore new paradigms
and concepts. It is surprisingly easy to switch frameworks or parts
of them. Frameworks themselves are even merging.
I think PyPy will gain more traction in the Python web world. It may eventually become more popular than CPython. Supporting Python 3 in PyPy should be easier than Python 2.
Things like 0MQ (ZeroMQ) may help to have parts of your code in Python 2 and parts in Python 3.
Posted over 13 years ago by Weblog
Jobert Abma, ethical hacker at Online24, talks about the ten commandments of security at the PyGrunn conference in Groningen, The Netherlands, organized by Paylogic and Goldmund Wyldebeast & Wunderliebe. I will discuss ten things you need to think of to get a secure application.
1. Your application is not the only attack vector. There can be weak
passwords in other parts of the stack or server. Social engineering
can become an issue.
2. Conduct risk assessments to identify risks. Then you start
controlling them. You can score a risk on Confidentiality, Integrity,
Availability.
3. Only trust your own code. And double check. The platform you are
developing on can have security problems.
4. 'Security by design' solves major issues. Application logic is an
important part. Centralize validation.
5. Always be aware of technical issues,
like CSRF, XSS.
6. Time (mis)management. You don't always get time from your manager to solve security issues, even when you are aware of them.
7. Keep track of design documents and documentation. Is the design
secure? Does it still match the current functionality?
8. Process design is one of the most important parts of securing an application. If a checkout process in a web shop is designed so badly that 10,000 euros a day end up in someone else's bank account, that is a problem.
9. Security can clash with usability. 'This email is not in our database' is potentially interesting knowledge for an attacker.
10. Information is power. Encryption on the server side and on the
transport layer. If your database gets hacked, does that give the
attacker information he can use, like passwords and credit card
numbers?
One more thing: handle input as being dangerous. It will save your
ass more than once.
Summary: Security is not just a bunch of tricks. It is a process.
Posted over 13 years ago by Weblog
Òscar Vilaplana (Paylogic) talks about ØMQ (ZeroMQ) at the PyGrunn conference in Groningen, The Netherlands, organized by Paylogic and Goldmund Wyldebeast & Wunderliebe.
0MQ is sockets as they should be. Bindings are available in many, many languages, including Python. Of course messaging looks simple: you send a message and the guy on the other end receives it! Well, it still requires work. But 0MQ is indeed simple (in Python: import zmq). Messages are sent in the background. You have a queue. If the receiver is not there, the message just stays in the queue longer.
Messages are strings and they have a length. With multipart messages they can have a sender. You can send over TCP or UDP. You can publish and subscribe. You can send messages down a pipeline. As infrastructure you can choose a queue, a forwarder or a streamer.
You can poll from a 0MQ socket or a regular TCP socket or stdin.
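As a rough sketch of the pipeline pattern with the pyzmq bindings (port number and message contents made up):

import zmq

ctx = zmq.Context()

# Producer end of a pipeline.
sender = ctx.socket(zmq.PUSH)
sender.bind("tcp://*:5555")

# Worker end: connecting creates the queue, so messages sent from
# now on are buffered even before the worker reads them.
receiver = ctx.socket(zmq.PULL)
receiver.connect("tcp://localhost:5555")

sender.send("job 1")   # goes to the background queue

# Polling lets you wait on several sockets at once.
poller = zmq.Poller()
poller.register(receiver, zmq.POLLIN)
events = dict(poller.poll(timeout=1000))   # milliseconds
if events.get(receiver) == zmq.POLLIN:
    print receiver.recv()                  # 'job 1'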
Code from this talk: http://oscarvilaplana.cat/zmqtalk.tar.gz
Posted over 13 years ago by Weblog
Pieter Noordhuis: Redis in Practice, at the PyGrunn conference in Groningen, The Netherlands, organized by Paylogic and Goldmund Wyldebeast & Wunderliebe.
Redis is a key-value store. It can be compared to memcached, but it natively supports
strings, lists, sets, sorted sets and hashes.
Everything is stored in memory, which puts a limit on what you can put in it but also makes it very fast. Unlike memcached, though, you can also persist it. Redis supports replication, so you can have one master that you write to and, say, fifty slaves just for reading.
Any blob will do: ascii, utf-8, png. Example:
redis> set str "hello world"
OK
redis> get str
"hello world"
It runs in a single thread: no race conditions or locks; every
operation is atomic. This greatly simplifies replication.
Invalidate immediately:
redis> del page:/home
(integer) 1
You can have rate limiting:
INCR limit
EXPIRE limit 60 # iff INCR == 1
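The same pattern as a rough sketch with the redis-py client (key name and limits are arbitrary):

import redis

r = redis.Redis()

def allow_request(client_id, limit=10, window=60):
    """True if client_id may do another request in this window."""
    key = "limit:%s" % client_id
    current = r.incr(key)       # atomic, returns the new value
    if current == 1:
        r.expire(key, window)   # first hit opens the window
    return current <= limit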
Lists are a natural fit for job queues: RPUSH a new job (push it onto the right end of the list) and LPOP a job from the left end. With Pub/Sub you can set up notifications for when jobs are done.
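A small sketch of such a queue with redis-py (queue name and job payload made up):

import redis

r = redis.Redis()

# Producer: push a job onto the right end of the list.
r.rpush("jobs", "resize-image:42")

# Worker: block until a job is available, pop from the left (FIFO).
queue, job = r.blpop("jobs")
print job   # 'resize-image:42'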
Sets: unordered collections of unique values. Sorted sets: ordered sets of unique values. ZADD adds an item with a score (adding it again updates the score), ZREM removes it. You can easily use this to show currently logged-in users, top users by some measurement, etc.
We can do 100,000 gets and sets per second on commodity hardware.
With tweaking and better hardware we have heard of 1.5 million per
second.
You can have durability through snapshotting: save every N seconds or
every N changes.
Getting started: http://redis.io/download. There are no dependencies; make it and you are done.
Posted over 13 years ago by Reinout van Rees' weblog
I got a request to repeat an earlier talk about practical project automation at the PyGrunn conference, a conference in Groningen in the north of the Netherlands about Python and related subjects. Sounded great, so I set out to update the talk. The basic structure could remain intact, as I'm still practicing what I preached in that older talk, even though I'm at a new job right now. So it is still valid :-)
Projects are more than just a collection of code. Projects are important. Projects are only finished when they're deployed: on the server so that the website is visible to the customer, for instance, or installed on the customer's laptop. So one way or another you'll have to gather all the pieces together and install them somewhere. But you also have to do that yourself on your development laptop, right? And your colleagues, too. How many pages of instructions do you want to give your colleagues? You'd better be automating this! And... automation helps heaps with quality.
The cobbler's children go barefoot: we write software for our clients, but our own processes are non-automated manual and menial steps. Menial is bad, as it is boring. So you cut corners. And you make mistakes. So automate away those boring bits and gain more time for the fun stuff!
The first and most essential step is build automation: building your software, collecting all the dependencies and installing them together with your project, preparing everything, generating documentation, running tests. Whether it is ./configure; make; make install, ant/maven or Python's buildout, you basically need a single command to get your project all set up. That's the only way to make sure you and your colleagues can reliably and comfortably get started on a project. And it's the only way you can rely on a proper installation on the webserver or your customer's laptop.
Those build tools are great points for hooking in more automation. All build
tools are extensible. Buildout, for instance, has so-called "recipes" to
extend it.
For instance, with this buildout snippet ...:
[apacheconf]
recipe = collective.recipe.template
input = ${buildout:directory}/etc/mysite.apache.conf.in
output = ${buildout:directory}/etc/mysite.apache.conf
... the following example apache config file snippet ...:
ServerName mysite.example.org
CustomLog ${buildout:directory}/var/log/access.log combined
ErrorLog ${buildout:directory}/var/log/error.log
DocumentRoot ${buildout:directory}/var/www
...
RewriteRule ^(.*) http://localhost:${django:gunicorn_port}$1 [P]
... is automatically and reliably converted to:
ServerName mysite.example.org
CustomLog /srv/mysite/var/log/access.log combined
ErrorLog /srv/mysite/var/log/error.log
DocumentRoot /srv/mysite/var/www
...
RewriteRule ^(.*) http://localhost:10003$1 [P]
So the filesystem paths and port numbers are all stored in just one place and reliably inserted into the actual apache configuration. Oh wait, we do need to make sure that the local apache logfile directory exists, otherwise apache won't start. But don't worry: we'll make buildout generate that directory for us automatically:
[mkdir]
recipe = z3c.recipe.mkdir
paths =
${buildout:directory}/generated
${buildout:directory}/var/geotiffs
${buildout:directory}/var/log
${buildout:directory}/var/media
${buildout:directory}/var/sqlite
${buildout:directory}/var/static
When we can set up our project this way, running tests automatically also becomes possible. Just grab a copy of the Jenkins continuous integration server and play with it. Set it up to run the available tests on your software every time you change something in your version control system's repository. This prevents errors like hardcoded local filename paths and files that you forgot to commit.
And, again, extend. Your Jenkins can also be extended. It runs tests just fine, but it can do more with a couple of plugins. Coverage testing! Look up coverage and hook it up in Jenkins. You'll get a nice report on the share of lines in your codebase that's covered by tests. A great quality measure!
Likewise, I'd recommend hooking up pep8 and pyflakes in Jenkins for even more code quality monitoring. But they're also very useful stand-alone: automated pep8 code style checking, and pyflakes' excellent detection of missing/extraneous imports and unused/missing variables. And if you've got javascript: make sure your javascript code is always automatically checked with jslint or jshint. It helps a lot with preventing errors in internet explorer.
An important extra mindset regarding automation: make it easy to do the right thing. Human beings follow the path of least resistance. If it is easier to cut corners, corners will be cut. If it is easier to call some script that does the right thing, the right thing will be done.
Take setting up a project, for instance. Do you take the time to add a proper documentation generation setup? Do you add test setup code even though you don't have tests yet? Do you add tests later if there's no test setup yet? So generate a new project automatically with all these goodies already in place. See the movie below for an example. Use so-called "paster skeletons". See also one of my blog entries about gathering and distributing organizational knowledge with those kinds of skeletons.
Paster skeleton demo (with nensskel):
Releasing a package is also something where corners are cut. Do you make proper tags all the time? Do you add a new header in the changelog for the new version? Do you even update the version number, both in the changelog and in setup.py? Automate those boring bits away and do the right thing, all the time. Use zest.releaser.
zest.releaser demo:
So, in summary:
Make it easy to do the right thing.
Automate your builds. This is the number one step you really really really
need.
Automate your tests and get a test server like jenkins to run them
automatically.
Extend your build tool and your test server. Those are handy points to hook
in extra automation steps: don't let that opportunity pass!
Here's the presentation (minus the two movies shown above):
Practical project automation (PyGrunn conference)
Redis in practice - Pieter Noordhuis (PyGrunn conference)
http://reinout.vanrees.org/weblog/2011/05/20/pygrunn-redis.html
Reinout van Rees, 2011-05-20T13:02:00+01:00
Pieter works on the core of Redis. The core is just two people, but still :-)
So it was fun to have him give a talk here at this conference.
Redis is an advanced key/value store. You have probably heard of memcached, which is also a key/value store. Redis is something similar, but it has more data types: memcached only stores blobs, while redis also has lists, sets, etc.
Redis stores everything in memory, so it is blindingly fast. It runs in a single thread, so just fire up multiple instances if you've got more cores. Unlike memcached, you can persist it to disk. Replication is built-in. And it has client libraries for some 20 languages.
Redis has built-in expiry. Just explicitly set the expiration time; no need to keep track of expiration yourself.
Regarding lists, redis shows why it is more advanced than other non-datatype key/value stores: it supports pushing and popping at the beginning and end of lists. You can use this for instance for showing recent events on some dashboard, or for a twitter-like timeline. To prevent these lists from becoming too big, you can tell redis to limit the length to a certain number. Then multiple clients can add items to the list with impunity without making the list too big.
Lists are also a natural fit for job queues. You add new jobs with a right
push and workers grab jobs with a left pop. FIFO queue.
A different datatype: sets. Just unordered sets of items, with no duplicates, as is normal for sets. The good thing about sets is constant-time membership tests, plus cheap intersections and unions.
A really powerful datatype is the sorted set: an ordered set of unique values. Redis orders them in memory by score. Technically, behind the scenes, it is a so-called skiplist. The extra sorting takes extra time: inserting isn't constant-time anymore, but O(log N).
Every command in redis is atomic. But what if we want to group multiple commands in a transaction? For this, redis can queue up commands until you tell it to execute them all in one go. So there's no rollback, but the whole batch executes at once. The way redis works, this is enough.
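With the redis-py client, that queueing is wrapped by the pipeline object; a small sketch (key names made up):

import redis

r = redis.Redis()

# Commands are buffered client-side, then sent as one MULTI/EXEC
# block: all of them run in one go, with no partial execution.
pipe = r.pipeline()
pipe.incr("pageviews")
pipe.lpush("recent", "/home")
pipe.ltrim("recent", 0, 99)   # cap the list at 100 entries
results = pipe.execute()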
On a regular machine, you can get 100k GET/SET per second. Very performant.
Persistence is handled with snapshotting: you can save every x seconds or every x changes. You can also write all writes to a log that you can replay.
Side effects of trying to understand a large codebase - Henk Doornbos (PyGrunn conference)
http://reinout.vanrees.org/weblog/2011/05/20/pygrunn-large-codebase.html
Reinout van Rees, 2011-05-20T12:02:00+01:00
Since a few months, Henk Doornbos has been head of Paylogic's architecture department. So obviously he had to learn all about the existing codebase. What does it do? What's infrastructure? What's front office, what's back office?
To get an idea, he ran sloccount over the codebase: mostly Python, some 84k lines of code. 560 meters of code, printed out :-)
Becoming familiar with such a codebase takes some doing. You can look at the components, the classes, the features and, in the end, the code. A good way to get started is by looking at bug reports and feature requests and figuring them out. (They use fogbugz as a bug reporting system, btw.)
He dug into a specific bug and had to check by hand in the website whether the issue was fixed already. So he started thinking about automating the tests. What if we used the robot framework with selenium, and described the bug reports in the language needed by robot? Normally a ticket already includes the steps needed to reproduce the problem and the expected result, right? That would effectively make the bug reports executable!
With robot framework you can write reasonably understandable lines like:
Given I go to the backoffice customer 12345
And I order a new xyz
And I set the country to France
Then I should not get an error 500
Parts of those sentences then have to be configured in robot framework. They're basically macros: "And I order a ..." is a macro, taking a parameter, that executes a couple of selenium commands.
In the end, a bug report often really expresses a missing requirement. What we
now have with these executable bug reports are automated acceptance tests for
those requirements.
What he wants: requirements expressed as proper user stories:
To have some fun (the reason for the action)
as a customer (the actor)
I want to buy a ticket. (the actual action)
He wants this format as it forces you to write down an actual
reason. Without a reason, there's no business value. Without value, the
requirement should not be implemented. And it forces you to be clear about the
actual actor.
Conclusion: requirements can be written such that the customer understands
them, developers can implement them and machines can understand them.
The state of python and the web - Armin Ronacher (PyGrunn conference)
http://reinout.vanrees.org/weblog/2011/05/20/pygrunn-state-of-webframeworks.html
Reinout van Rees, 2011-05-20T15:32:00+01:00
Armin is a founding member of the pocoo team, with projects like jinja2, werkzeug and sphinx.
Python 3 is the elephant in the room that nobody's talking about. There's a
lot of python 2 code and porting is turning out to be harder than expected.
Some recent python developments:
Unladen swallow is resting. Development stopped. It was the number one
hope to make python faster.
Python 3.2 got released.
Python's packaging is being worked on a lot.
Pypy turns out to be quite fast nowadays. Really really fast.
What's pypy? It is python written in python. Well, in restricted python, which can be translated to C automatically. It is 3.7 to 40 times faster than regular cpython! Things that will work: Django, flask, pyglet, twisted, sqlite, ctypes and so on.
A problem with pypy is that there's only experimental support for the python C
API. And C-level extensions will always be slow. And there is no reference
counting, so you need (for instance) to close files yourself.
But, in summary, python 3 is where all the new language development is
happening. What does python 3 mean?
Unicode cleanup. All text-like things are either unicode text or a binary
blob.
The language is cleaned up a bit.
The cost... there's some backwards incompatibility.
The good parts about python 3:
Unicode everywhere it wasn't possible before, like in exceptions. And python source code is now also compiled as unicode, so unicode identifiers are now possible.
Big IO improvements regarding unicode.
A couple of handy new language constructs. For instance extended iterable unpacking like a, b, *c = some_tuple() with c containing the rest of the items.
The under-the-hood implementation was cleaned up a lot.
Various smaller things. Print is now a function. Obscure standard library
modules were removed. No more classic classes. Absolute imports are standard
now. Etc.
Python 2.6 and 2.7 contain features that make it possible to write code that also works on python 3:
Print as a function.
New IO library.
Explicit byte literals to match python 3's new string/unicode/bytestring
handling.
Advanced string formatting: more advanced than the regular %s string formatting, a new syntax allows you to rearrange items in the string, which is often needed for translatable strings (see the sketch after this list).
More things, like support for fractions, abstract base classes (basically documented ducktyping...) and the multiprocessing package (which works much like the thread library, but for subprocesses).
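For instance, the new formatting syntax lets a translated string reorder its parameters, something positional %s formatting cannot do (strings and names made up):

# str.format() placeholders can be reordered freely; positional
# %s placeholders cannot, which hurts translations.
english = "{name} owes {amount} euro"
reordered = "{amount} euro is owed by {name}"

print english.format(name="Anna", amount=12)
print reordered.format(name="Anna", amount=12)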
Going forward: currently you basically have to choose between beauty or speed.
The beauty of the code (python 3) or the raw performance of pypy.
Pypy itself will stay written in python 2, but the interpreter might someday
support python 3.
Library support:
Numeric libraries work great on python 3.
Pypy still lacks proper C API support.
Some predictions:
Most people will write for python 2.7 without worrying about backwards
compatibility. And with the intention of supporting pypy.
Python C api libraries will become less common. ctypes could help a lot and
is often better. Pure python (with the help of pypy) is also fast.
We will see libraries that support both 2.7 and 3.x.
Python and the web, especially regarding python 3.
There's a wsgi spec revision for python 3, and some work has been done on porting implementations to python 3. Right now WSGI itself isn't something people really want to change anymore: it just works.
Django and jinja could take advantage of pypy's behaviour for their template compilation. He's going to work on an implementation that allows django and jinja to use the same code.
How can we make python 3 work on the web?
Port libraries over to 3.x.
Issues with python 3 will only be resolved when people are actively porting
and trip over those issues.
Higher level items like Django are easier to port over; the low-level libraries for network connections and the like are the really hard nuts to crack.
It is not too hard to port if you can drop python 2.6 support. Good python
2.7 code can generally survive a 2-to-3 conversion just fine.
See http://bit.ly/python3-now for some pointers and tips.
We're doing great regarding python web in general:
WSGI works out well in practice.
Pylons and BFG merged into pyramid, which makes for a nice introduction into the great Zope world.
There's less and less framework-specific code out there; there's lots of reusable generic code.
New frameworks help figure out new paradigms.
Switching frameworks or using parts of different frameworks in the same code
is generally no big problem. Python's namespacing helps a lot here. You can
use zope code in django just fine.
The ten commands of security - Jobert Abma (PyGrunn conference)
http://reinout.vanrees.org/weblog/2011/05/20/pygrunn-ten-commands-security.html
Reinout van Rees, 2011-05-20T14:22:00+01:00
He's an ethical hacker at Online24, hacking SME and government websites as a job.
Some items to keep in mind regarding security.
Your application isn't the only attack vector. Your app is on a server. Who
has access to that server? Which other apps run on that server? And the
bigger the organization, the more chances for social engineering: for
instance mimicking an employee and asking for a password for xyz.
Conduct risk assessments to identify risks. Identify the risks, only then
can you take action on them. Rate your identified risks.
Only trust your own code. And double check. It is not OK to rely implicitly
on the underlying platform.
Something that can solve a lot of major issues: security by design. Design
it in. Centralize certain processes like validation or authentication and
make them secure.
Always be aware of technical issues. If you catch all errors, for instance, make sure you still find out when something actually goes wrong.
Time (mis)management is a big problem. If you don't get enough time to do security properly, you'll have problems. Jobert thinks the big playstation hack was a security risk they were aware of, but just weren't given the time to fix properly.
Change management: keep track of design documentation. Make sure you keep
the design documents up to date, otherwise you won't see the impact on
security that a certain change could have. Also keep the risk assessment up
to date when making changes!
Most important aspect in securing an application: the process design. If the
checkout process in a web shop isn't properly designed, you can have a
serious security breach.
Security is more important than usability, even though people sometimes say the opposite. Look at a lost-password form. Don't say "this email address is invalid" when the address isn't in your database, as that means you're confirming that all non-failing addresses ARE in your database. Suddenly the attacker has the first half of the user/password combination! But you can get good usability anyway by saying "An email was sent. If you don't get a mail, try again (or your email address wasn't in our database)."
Information is power. But do you want to have the power? Don't store
unencrypted passwords, as they can only get stolen. Minimize the risks of
your application. Just hash the passwords, for instance.
Input is dangerous. Handle it that way. Look for instance at django's good input filtering and sanitization.
Fossgis: mapproxy workshop - Dominik Helle and Oliver Tonnhofer
http://reinout.vanrees.org/weblog/2011/04/08/fossgis-mapproxy-workshop.html
Reinout van Rees, 2011-04-08T11:01:00+01:00
After the actual conference was over, I attended another workshop, about mapproxy. At my work, we have done nothing regarding map caching and proxying and tiling (we have done a lot on regular web caching, though). We're just generating complete images on demand via WMS, as lots of our data is pretty dynamic. But there is mostly-static data too, and by setting up our software differently, tiling and proxying and caching become possible. At the moment I can only dream about what that'll do to the speed of our web applications... :-) So that's why I attended this workshop: getting to know at least one of the available map proxy tools.
Ok, what's mapproxy?
Started in 2008.
Made with python, so platform-independent. (It is harder to install on
windows than on osx/linux, though, probably because of some dependencies).
It is more or less a proxy, a cache, between a WMS server and clients. It speeds things up.
The advantage of mapproxy compared to most other proxies is that it supports
all of WMS, so desktop clients also work fine with it.
A core piece of mapproxy's functionality is that it tiles the map: in order to cache a map, it subdivides it into parts.
What are the possible input data types?
WMS
WMS-C
TMS
Tiles (like google maps, openstreetmap).
And what can come out? (Note that this means that you can theoretically take
google tiles and turn them into a WMS service!)
WMS
WMS-C
TMS (Tile Map Service, so google-tile-like, only specified properly)
KML
Some comments from the hands-on-keyboard part of the workshop:
MapProxy is just a pip install MapProxy away. Well packaged!
You can create an example configuration with paster. This made a good
impression on me, as it means it is really a well-packaged python product.
Whoever designed those German keyboard layouts should be shot. Twice. Darn,
I kept mis-typing all the time.
Hey, a nice .ini python paste file to run the server with. Integration with standard python wsgi tools, and thus easy integration into web servers. Looks good.
Some instructions are at http://workshop.omniscale.net/workshop.html
In the end, I found it a very good workshop. Glad I attended. And the mapproxy
software left a very good impression on me: I'm looking forward to using it.
Fossgis: teamwork on webgis with mapfish - Anja Sigesmund
http://reinout.vanrees.org/weblog/2011/04/06/fossgis-teamwork.html
Reinout van Rees, 2011-04-06T09:53:00+01:00
They used the mapfish web framework (python based: pylons) for a website about bio-energy. They developed it fully with open source software, and they were also open (to the other project partners) during development, which was new for them.
Software they used: postgis, mapserver, mapfish, openlayers. For the
development process: trac.
Before starting the main development, they had a workshop with all
participants where they talked about the project. Looking at geo websites they
liked, for instance. Coming up with user interface ideas. Everyone was really
energized by the workshop.
The software was developed by several people and companies, so everyone was constantly working on the same code. Some of the participants normally work alone (Einzelkämpfer, 'lone fighters') and they liked being able to cooperate for a change.
Some advantages of this development model:
Broad acceptance of the web application by cooperative development and early
involvement.
Knowledge transfer. Everyone has a much broader knowledge than would
otherwise have been the case.
Costs and resource usage were lower. That is important for the municipal organisation that started the project: cost is always a problem, and they simply don't have the resources anymore to do everything themselves.
Intelligent, friendly, fun cooperation.
See http://www.energieatlas.org/ (click on bioenergie).