Skip to content. | Skip to navigation

Personal tools
Log in
You are here: Home

Open Source Posts

Paul Ramsey: 2.1.8 Released

From Planet PostgreSQL. Published on Jul 06, 2015.

Due to a number of bugs capable of crashing a back-end, we have released 2.1.8. If you are running an earlier version on a public site, we recommend that you update as soon as possible.

View all closed tickets for 2.0.8.

Michael Paquier: Postgres 9.5 feature highlight: New missing_ok option for file access functions

From Planet PostgreSQL. Published on Jul 05, 2015.

Finding its origin as a fix for pg_rewind where rewind process could fail if a file marked as listed in the source server was removed before fetching its data (see more details here, this could impact temporary files or relation file for example), here is a small feature of PostgreSQL 9.5 that may be useful for application developers:

commit: cb2acb1081e13b4b27a76c6b5311115528e49c59
author: Heikki Linnakangas <>
date: Sun, 28 Jun 2015 21:35:46 +0300
Add missing_ok option to the SQL functions for reading files.

This makes it possible to use the functions without getting errors, if there
is a chance that the file might be removed or renamed concurrently.
pg_rewind needs to do just that, although this could be useful for other
purposes too. (The changes to pg_rewind to use these functions will come in
a separate commit.)

The read_binary_file() function isn't very well-suited for extensions.c's
purposes anymore, if it ever was. So bite the bullet and make a copy of it
in extension.c, tailored for that use case. This seems better than the
accidental code reuse, even if it's a some more lines of code.

Michael Paquier, with plenty of kibitzing by me.

Postgres has a set of superuser functions allowing to have a look at the files of PGDATA from the SQL interface:

  • pg_ls_dir, to list the files of a given path.
  • pg_read_file, to read a given file and return its data as text.
  • pg_read_binary_file, to read a given file and return its data as bytea.
  • pg_stat_file, to get statistics about a file similarly to the system function stat().

With the above commit, all those functions have gained a missing_ok option allowing to not fail should a file (or path for pg_ls_dir) selected by the user not exist, returning a NULL result instead (or more or less an empty record for pg_ls_dir). In each case, the default is false.

In the case of pg_read_file and pg_read_binary_file, this new boolean flag is available with a new 4-argument version of those functions (for backward-compatibility purposes).

=# SELECT pg_read_file('', 0, 32, true) AS data;
 # Do not edit this file manually
(1 row)
=# SELECT pg_read_file('file_not_exists', 0, 1, true) AS data;
(1 row)

For pg_stat_file, a second argument is added.

=# SELECT * FROM pg_stat_file('', true);
-[ RECORD 1 ]+-----------------------
size         | 88
access       | 2015-07-05 21:15:52+09
modification | 2015-07-03 21:42:12+09
change       | 2015-07-03 21:42:12+09
creation     | null
isdir        | f
=# SELECT * FROM pg_stat_file('file_not_exists', true);
-[ RECORD 1 ]+-----
size         | null
access       | null
modification | null
change       | null
creation     | null
isdir        | null

pg_ls_dir is a bit more special, actually two options have been added:

  • missing_ok to avoid an error in case of a missing path.
  • include_dot_dirs to list '.' and '..' in the content list of this path.

The problem with including only missing_ok would be that as it returns an empty content list should the path specified by user be missing, it would not be possible to make the difference between an empty folder that exists, and one which is missing. include_dot_dirs is here to make the difference between both cases: if the content listed just contains '.' and '..' and missing_ok is true, it means that the directory defined by user is empty but exists. missing_ok is the second argument of the function and include_dot_dirs its third argument.

-- This one does not exist.
=# SELECT pg_ls_dir('not_exists', true, true);
(0 rows)
-- No 2PC transactions are running.
=# SELECT pg_ls_dir('pg_twophase', true, true);
(2 rows)

Combined together, this set of functions can be quite powerful, here is for example a WITH RECURSIVE query able to list all the files in PGDATA (query more or less a copycat of the one used by pg_rewind in libpq_fetch.c), informing as well the size of each entry or if it is a directory or not.

WITH RECURSIVE files (path, filename, size, isdir) AS (
  SELECT '' AS path, filename, size, isdir FROM
   (SELECT pg_ls_dir('.', true, false) AS filename) AS fn,
   pg_stat_file(fn.filename, true) AS this
  SELECT parent.path || parent.filename || '/' AS path,
   fn, this.size, this.isdir
   FROM files AS parent,
   pg_ls_dir(parent.path || parent.filename, true, false) AS fn,
   pg_stat_file(parent.path || parent.filename || '/' || fn, true) AS this
   WHERE parent.isdir = 't' AND
   this.size IS NOT NULL)
SELECT path || filename AS file_path, size, isdir
FROM files;

Then use for example pg_read_file to grab the data of a given file and it is possible to fetch a complete copy of PGDATA via SQL (superuser rights needed for that of course).

Barry Jones: Heroku PostgreSQL vs Amazon RDS for PostgreSQL

From Planet PostgreSQL. Published on Jul 03, 2015.

PostgreSQL is becoming the relational database of choice for web development for a whole host of good reasons. That means that development teams have to make a decision on whether to host their own or use a database as a service provider. The two biggest players in the world of PostgreSQL are Heroku PostgreSQL and Amazon RDS for PostgreSQL. Here's a detailed comparison.

Christoph Berg: PostgreSQL 9.5 in Debian

From Planet PostgreSQL. Published on Jul 02, 2015.

Today saw the release of PostgreSQL 9.5 Alpha 1. Packages for all supported Debian and Ubuntu releases are available on

deb YOUR_RELEASE_HERE-pgdg main 9.5

The package is also waiting in NEW to be accepted for Debian experimental.

Being curious which PostgreSQL releases have been in use over time, I pulled some graphics from Debian's popularity contest data:

Before we included the PostgreSQL major version in the package name, "postgresql" contained the server, so that line represents the installation count of the pre-7.4 releases at the left end of the graph.

Interestingly, 7.4 reached its installation peak well past 8.1's. Has anyone an idea why that happened?

Joshua Drake: Elevating your confidence with the Elephant's restoration capabilities

From Planet PostgreSQL. Published on Jul 02, 2015.

In the beginning

There was Unix, Linux and Windows. They all run on hardware and that hardware all has bugs. What is the best way to work around hardware bugs? Backups.

You haven't had bad hardware, only bad developers? That's o.k., we have a solution to them too. It is called backups.

You haven't had bad hardware or bad developers, just bosses who still demand to have direct access to the data even though they haven't proven an ability to extract useful information without an extreme amount of hand holding? That's o.k., we have a solution to them too. It is called backups.

You haven't had any of the above? Lucky you! Can we move to your country or will you share whatever it is that you are putting in your tea? It sounds fantastic.

Either way, we have a solution to your data confidence, it is called backups and that is what this training is about.

See more here!

Register here!

Josh Berkus: Test PostgreSQL 9.5 Alpha in a Docker container

From Planet PostgreSQL. Published on Jul 02, 2015.

Yay, the first alpha of PostgreSQL 9.5 is out!  This has lots of cool stuff in it, which you can read about elsewhere.  What I'm posting here is a new experiment: offering up Docker images for testing the alphas and betas.

TL:DR = Image here, information and instructions on the wiki.

This occurred to me last week at DockerCon (naturally enough); one of the reasons more people don't test the PostgreSQL beta releases is that it can be a pain to install them, especially with all of the stuff you want to test.  So I've just put a couple days' work into making it easier for you: all you gotta do is install the image and you can test away.

And, since you're testing in a container, you don't have to worry about messing up the PostgreSQL installation on the desktop/laptop/server you normally use.  If you feel like doing your testing in a public cloud, many of them have ways to run containers as cloud servers, so that's easy to do.

I created this image crammed full of stuff: all of contrib, jsquery, pgxn, PL/python and PL/perl, a sample database, etc., so that you'd have something to test.  I wasn't able to include PL/Lua due to build issues, nor PostGIS because I'm building from source, and building PostGIS from source is kinda scary.  If you want to help me build a better test image, though, the Dockerfile project for it is here.

In the meantime, I've removed another excuse for not testing PostgreSQL 9.5.  So what are you waiting for?  Test already!

Julien Rouhaud: How About Hypothetical Indexes ?

From Planet PostgreSQL. Published on Jul 02, 2015.

After so much time missing this features, HypoPG implements hypothetical indexes support for PostgreSQl, available as an extension.


It’s now been some time since the second version of PoWA has been announced. One of the new feature of this version is the pg_qualstats extension, written by Ronan Dunklau.

Thanks to this extension, we can now gather real-time statistics to detect missing indexes, and much more (if you’re interested in this extension, you should read Ronan’s article about pg_qualstats). And used with PoWA, you have an interface that allows you to find the most consuming queries, and will suggest you the missing indexes if they’re needed.

That’s really nice, but now a lot of people come with this natural question: Ok, you say that I should create this index, but will PostgreSQL eventually use it ?. That’s actually a good question, because depending on many parameters (in many other things), PostgreSQL could choose to just ignore your freshly created index. That could be a really bad surprise, especially if you had to wait many hours to have it built.

Hypothetical Indexes

So yes, the answer to this question is hypothetical indexes support. That’s really not a new idea, a lot of popular RDBMS support them.

There has already been some previous work on this several years ago, presented at pgCon 2010, which was implenting much more than hypothetical indexes, but this was a research work, which means that we never saw those features coming up in PostgreSQL. This great work is only available as a fork of a few specific PostgreSQL versions, the most recent being 9.0.1.

lightweight implementation: HypoPG

I had quite a different approach in HypoPG to implement hypothetical indexes support.

  • first of all, it must be completely pluggable. It’s available as an extension and can be used (for now) on any 9.2 or higher PostgreSQL server.
  • it must be as non intrusive as it’s possible. It’s usable as soon as you create the extension, without restart. Also, each backend has it’s own set of hypothetical indexes, which mean that adding an hypothetical index will not disturb other connections. Also, the hypothetical indexes are stored in memory, adding/removing a huge amount of them will not bloat your system catalog.

The only restriction in implementing such a feature as an extension is that you can’t change the syntax without modifying the PostgreSQL source code. So, everything has to be done through user defined functions, and change regular behaviour of existing functionnalities, like the EXPLAIN command. We’ll study the details later.


For now, the following functions are available:

  • hypopg(): return the list of hypothetical indexes (in a similar way as pg_index).
  • hypopg_add_index(schema, table, attribute, access_method): create a 1-column only hypothetical index.
  • hypopg_create_index(query): create an hypothetical index using a standard CREATE INDEX statement.
  • hypopg_drop_index(oid): remove the specified hypothetical index.
  • hypopg_list_indexes(): return a short human readable version list of available hypothetical indexes.
  • hypopg_relation_size(oid): return the estimated size of an hypothetical index
  • hypopg_reset(): remove all hypothetical indexes

If some hypothetical indexes exists for some relations used in an EXPLAIN (without ANALYZE) statement, they will automatically added to the list of real indexes. PostgreSQL will then choose to use them or not.


Installing HypoPG is quite simple. Assuming you downloaded and extracted a tarball in the hypopg-0.0.1 directory, are using a packaged version of PostgreSQL and have -dev packages:

$ cd hypopg-0.0.1
$ make
$ sudo make install

Then HypoPG should be available:

rjuju=# CREATE EXTENSION hypopg ;

Let’s try some really simple tests. First, create a small table:

rjuju=# CREATE TABLE testable AS SELECT id, 'line ' || id val
rjuju=# FROM generate_series(1,1000000) id;
SELECT 100000
rjuju=# ANALYZE testable ;

Then, let’s see a query plan that should benefit an index that’s not here:

rjuju=# EXPLAIN SELECT * FROM testable WHERE id < 1000 ;
                          QUERY PLAN
 Seq Scan on testable  (cost=0.00..17906.00 rows=916 width=15)
   Filter: (id < 1000)
(2 rows)

No surprise, a sequential scan is the only way to do. Now, let’s try to add an hypothetical index, and EXPLAIN again:

rjuju=# SELECT hypopg_create_index('CREATE INDEX ON testable (id)');
(1 row)

Time: 0,753 ms

rjuju=# EXPLAIN SELECT * FROM testable WHERE id < 1000 ;
                                          QUERY PLAN
 Index Scan using <41079>btree_testable_id on testable  (cost=0.30..28.33 rows=916 width=15)
   Index Cond: (id < 1000)
(2 rows)

Yeah! Our hypothetical index is used. We also notice that the hypothetical index creation is more or less 1ms, which is way less than the real index creation would have last.

And of course, this hypothetical index is not used in an EXPLAIN ANALYZE:

rjuju=# EXPLAIN ANALYZE SELECT * FROM testable WHERE id < 1000 ;
                                                 QUERY PLAN
 Seq Scan on testable  (cost=0.00..17906.00 rows=916 width=15) (actual time=0.076..234.218 rows=999 loops=1)
   Filter: (id < 1000)
   Rows Removed by Filter: 999001
 Planning time: 0.083 ms
 Execution time: 234.377 ms
(5 rows)

Now let’s go further:

rjuju=# EXPLAIN SELECT * FROM testable
rjuju=# WHERE id < 1000 and val LIKE 'line 100000%';
                                         QUERY PLAN
 Index Scan using <41079>btree_testable_id on testable  (cost=0.30..30.62 rows=1 width=15)
   Index Cond: (id < 1000)
   Filter: (val ~~ 'line 100000%'::text)
(3 rows)

Our hypothetical index is still used, but an index on id and val should help this query. Also, as there’s a wildcard on the right-side of the LIKE pattern, the operator class text_pattern_ops is needed. Let’s check that:

rjuju=# SELECT hypopg_create_index('CREATE INDEX ON testable (id, val text_pattern_ops)');
(1 row)

Time: 1,194 ms

rjuju=# EXPLAIN SELECT * FROM testable
rjuju=# WHERE id < 1000 and val LIKE 'line 100000%';
                                              QUERY PLAN
 Index Only Scan using <41080>btree_testable_id_val on testable on testable  (cost=0.30..26.76 rows=1 width=15)
   Index Cond: ((id < 1000) AND (val ~>=~ 'line 100000'::text) AND (val ~<~ 'line 100001'::text))
   Filter: (val ~~ 'line 100000%'::text)
(3 rows)

And yes, PostgreSQL decides to use our new index!

Index size estimation

For now, the index size estimation is done quickly, which can give us a clue on what would be the real index size.

Let’s check the estimated size of our two hypothetical indexes:

rjuju=# SELECT indexname,pg_size_pretty(hypopg_relation_size(indexrelid))
rjuju=# FROM hypopg();
           indexname           | pg_size_pretty 
 <41080>btree_testable_id     | 25 MB
 <41079>btree_testable_id_val | 49 MB
(2 rows)

Now, create the real indexes, and compare the sizes:

rjuju=# CREATE INDEX ON testable (id);
Time: 1756,001 ms

rjuju=# CREATE INDEX ON testable (id, val text_pattern_ops);
Time: 2179,185 ms

rjuju=# SELECT relname,pg_size_pretty(pg_relation_size(oid))
rjuju=# FROM pg_class WHERE relkind = 'i' AND relname LIKE '%testable%';
       relname       | pg_size_pretty 
 testable_id_idx     | 21 MB
 testable_id_val_idx | 30 MB

The estimated index size is a bit higher than the real size. It’s on purpose. If the estimated index size is less than an existing index, PostgreSQL would prefer to use the hypothetical index than the real index, which is definitively not interesting. Also, to simulate a bloated index (which is quite frequent on real indexes), a hardcoded 20% bloat factor is added. Finally, the estimation could also be improved a lot.


This 0.0.1 version of HypoPG is still a work in progress, and a lot of work is still needed.

Here are the main limitations (at least that I’m aware of):

  • only btree hypothetical indexes are supported
  • no hypothetical indexes on expression
  • no hypothetical indexes on predicate
  • tablespace specification is not possible
  • index size estimation could be improved, and it’s not possible to change the bloat factor

However, I believe it can already be helpful.

What’s next ?

Now, the next step is to implement HypoPG support in PoWA, to help DBA decide wether they should create the suggested index or not, and remove the current limitations.

If you want to try HypoPG, here is the github repository:

Stay tuned!

How About Hypothetical Indexes ? was originally published by Julien Rouhaud at rjuju's home on July 02, 2015.

Greg Sabino Mullane: How fast is pg_upgrade anyway?

From Planet PostgreSQL. Published on Jul 01, 2015.

Back in the old days, upgrading Postgres required doing a pg_dump and loading the resulting logical SQL into the new database. This could be a very slow, very painful process, requiring a lot of downtime. While there were other solutions (such as Bucardo) that allowed little (or even zero) downtime, setting them up was a large complex task. Enter the pg_upgrade program, which attempts to upgrade a cluster with minimal downtime. Just how fast is it? I grew tired of answering this question from clients with vague answers such as "it depends" and "really, really fast" and decided to generate some data for ballpark answers.

Spoiler: it's either about 3.5 times as fast as pg_dump, or insanely fast at a flat 15 seconds or so. Before going further, let's discuss the methodology used.

I used the venerable pgbench program to generate some sample tables and data, and then upgraded the resulting database, going from Postgres version 9.3 to 9.4. The pgbench program comes with Postgres, and simply requires an --initialize argument to create the test tables. There is also a --scale argument you can provide to increase the amount of initial data - each increment increases the number of rows in the largest table, pgbench_accounts, by one hundred thousand rows. Here are the scale runs I did, along with the number of rows and overall database size for each level:

Effect of --scale
--scaleRows in pgbench_accountsDatabase size
10010,000,0001418 MB
15015,000,0002123 MB
20020,000,0002829 MB
25025,000,0003535 MB
30030,000,0004241 MB
35035,000,0004947 MB
40040,000,0005652 MB
45045,000,0006358 MB
50050,000,0007064 MB
55055,000,0007770 MB
60060,000,0008476 MB

To test the speed of the pg_dump program, I used this simple command:

$ pg_dump postgres | psql postgres -q -p 5433 -f -

I did make one important optimization, which was to set fsync off on the target database (version 9.4). Although this setting should never be turned off in production - or anytime you cannot replace all your data, upgrades like this are an excellent time to disable fsync. Just make sure you flip it back on again right away! There are some other minor optimizations one could make (especially boosting maintenance_work_mem), but for the purposes of this test, I decided that the fsync was enough.

For testing the speed of pg_upgrade, I used the following command:

$ pg_upgrade -b $BIN1 -B $BIN2 -d $DATA1 -D $DATA2 -P 5433

The speed difference can be understood because pg_dump rewrites the entire database, table by table, row by row, and then recreates all the indexes from scratch. The pg_upgrade program simply copies the data files, making the minimum changes needed to support the new version. Because of this, it will always be faster. How much faster depends on a lot of variables, e.g. the number and size of your indexes. The chart below shows a nice linear slope for both methods, and yielding on average a 3.48 increase in speed of pg_upgrade versus pg_dump:

pg_dump versus pg_upgrade
--scaleDatabase sizepg_dump
1001.4 GB210.074.72.82
1502.1 GB305.079.43.86
2002.8 GB457.6122.23.75
2503.5 GB636.1172.13.70
3004.2 GB832.2215.13.87
3504.9 GB1098.8320.73.43
4005.7 GB1172.7361.43.25
4506.4 GB1340.2426.73.15
5007.1 GB1509.6476.33.17
5507.8 GB1664.0480.03.47
6008.5 GB1927.06073.17

If you graph it out, you can see both of them having a similar slope, but with pg_upgrade as the clear winner:

I mentioned earlier that there were some other optimizations that could be done to make the pg_dump slightly faster. As it turns out, pg_upgrade can also be made faster. Absolutely, beautifully, insanely faster. All we have to do is add the --link argument. What this does is rather than copying the data files, it simply links them via the filesystem. Thus, each large data file that makes up the majority of a database's size takes a fraction of a second to link to the new version. Here are the new numbers, generated simply by adding a --link to the pg_upgrade command from above:

pg_upgrade --link is crazy fast
--scaleDatabase sizepg_upgrade --link
1001.4 GB12.9
1502.1 GB13.4
2002.8 GB13.5
2503.5 GB13.2
3004.2 GB13.6
3504.9 GB14.4
4005.7 GB13.1
4506.4 GB13.0
5007.1 GB13.2
5507.8 GB13.1
6008.5 GB12.9

No, those are not typos - an average of thirteen seconds despite the size of the database! The only downside to this method is that you cannot access the old system once the new system starts up, but that's a very small price to pay, as you can easily backup the old system first. There is no point in graphing these numbers out - just look at the graph above and imagine a nearly flat line traveling across the bottom of the graph :)

Are there any other options that can affect the time? While pgbench has a handy --foreign-keys argument I often use to generate a more "realistic" test database, both pg_dump and pg_upgrade are unaffected by any numbers of foreign keys. One limitation of pg_upgrade is that it cannot change the --checksum attribute of a database. In other words, if you want to go from a non-checksummed version of Postgres to a checksummed version, you need to use pg_dump or some other method. On the plus side, my testing found negligible difference between upgrading a checksummed versus a non-checksummed version.

Another limitation of the pg_upgrade method is that all internal stats are blown away by the upgrade, so the database starts out in a completely unanalyzed state. This is not as much an issue as it used to be, as pg_upgrade will generate a script to regenerate these stats, using the handy --analyze-in-stages argument to vacuum. There are a few other minor limitations to pg_upgrade: read the documentation for a complete list. In the end, pg_upgrade is extraordinarily fast and should be your preferred method for upgrading. Here is a final chart showing the strengths and weaknesses of the major upgrade methods.

Postgres upgrade methods compared
  • Always works
  • Battle tested
  • Slowest method
  • Maximum downtime
  • Requires lots of disk space
  • Very fast
  • --link mode super fast
  • Cannot always be used (finicky)
  • Stats are lost
  • Minimal but non-zero downtime
  • Handles complex cases
  • Zero-downtime possible
  • Complex to setup
  • Requires primary keys on large tables

(As an addendum of sorts, pg_upgrade is fantastic, but the Holy Grail is still out of sight: true in-place upgrades. This would mean dropping in a new major version (similar to the way revisions can be dropped in now), and this new version would be able to read both old and new data file formats, and doing an update-on-write as needed. Someday!)

Einladung zur Django-UserGroup Hamburg am 08. Juli

By Arne Brodowski from Django community aggregator: Community blog posts. Published on Jul 01, 2015.

Das nächste Treffen der Django-UserGroup Hamburg findet am Mittwoch, den 08.07.2015 um 19:30 statt. Achtung: Neue Location! Dieses Mal treffen wir uns in den Räumen der Smaato Inc., Valentinskamp 70, Emporio 19. Stock in 20355 Hamburg.

Auf diesem Treffen gibt es einen Vortrag über TDD für APIs von Michael Kuehne.

Bitte seid um ca. 19:20 unten im Foyer, wir fahren dann gemeinsam in den 19. Stock. Wer später kommt, sagt bitte am Empfang bescheid, dass er zur Smaato Inc möchte und wird dann hoch gelassen.

Eingeladen ist wie immer jeder der Interesse hat sich mit anderen Djangonauten auszutauschen. Eine Anmeldung ist nicht erforderlich, hilft aber bei der Planung.

Weitere Informationen über die UserGroup gibt es auf unserer Webseite

Die Organisation der Django-UserGroup Hamburg findet ab jetzt über Meetup statt. Um automatisch über zukünftige Treffen informiert zu werden, werdet bitte Mitglied in unserer Meetup-Gruppe:

Observations on the nature of time. And javascript.

By Isotoma Blog from Django community aggregator: Community blog posts. Published on Jun 30, 2015.

In the course of working on one of our latest projects, I picked up an innocuous looking ticket that said: “Date pickers reset to empty on form submission”. “Easy”, I thought. It’s just the values being lost somewhere in form validation.And then I saw the ‘in Firefox and IE’ description. Shouldn’t be too hard, it’ll be a formatting issue or something, maybe even a placeholder, right?

Yeah, no.

Initial Investigations

Everything was fine in Chrome, but not in Firefox. I confirmed the fault also existed in IE (and then promptly ignored IE for now).

The responsible element looked like this:
<input class="form-control datepicker" data-date-format="{{ js_datepicker_format }}" type="date" name="departure_date" id="departure_date" value="{{ form.departure_date.value|default:'' }}">

This looks pretty innocent. It’s a date input, how wrong can that be?

Sit comfortably, there’s a rabbit hole coming up.

On Date Inputs

Date type inputs are a relatively new thing, they’re in the HTML5 Spec. Support for it is pretty mixed. This jumps out as being the cause of it working in Chrome, but nothing else. Onwards investigations (and flapping at colleagues) led to the fact that we use bootstrap-datepicker to provide a JS/CSS based implementation for the browsers that have no native support.

We have an isolated cause for the problem. It is obviously something to do with bootstrap-datepicker, clearly. Right?

On Wire Formats and Localisation

See that data-date-format="{{ js_datepicker_format }}" attribute of the input element. That’s setting the date format for bootstrap-datepicker. The HTML5 date element doesn’t have similar. I’m going to cite this stackoverflow answer rather than the appropriate sections of the documentation. The HTML5 element has the concept of a wire format and a presentation format. The wire format is YYYY-MM-DD (iso8601), the presentation format is whatever the user has the locale set to in their browser.

You have no control over this, it will do that and you can do nothing about it.

bootstrap-datepicker, meanwhile has the data-date-format element, which controls everything about the date that it displays and outputs. There’s only one option for this, the wire and presentation formats are not separated.

This leads to an issue. If you set the date in YYYY-MM-DD format for the html5 element value, then Chrome will work. If you set it to anything else, then Chrome will not work and bootstrap-datepicker might, depending on if the format matches what is expected.

There’s another issue. bootstrap-datepicker doesn’t do anything with the element value when you start it. So if you set the value to YYYY-MM-DD format (for Chrome), then a Firefox user will see 2015-06-24, until they select something, at which point it will change to whatever you specified in data-date-format. But a Chrome user will see it in their local format (24/06/2015 for me, GB format currently).

It’s all broken, Jim.

A sidetrack into Javascript date formats.

The usual answer for anything to do with dates in JS is ‘use moment.js’. But why? It’s a fairly large module, this is a small problem, surely we can just avoid it?

Give me a date:

>>> var d = new Date();

Lets make a date string!

>>> d.getYear() + d.getMonth() + d.getDay() + ""

Wat. (Yeah, I know that’s not how you do string formatting and therefore it’s my fault.)

>>> d.getDay()

It’s currently 2015-06-24. Why 3?.

Oh, that’s day of the week. Clearly.

>>> d.getDate()

The method that gets you the day of the month is called getDate(). It doesn’t, you know, RETURN A DATE.

>>> var d = new Date('10-06-2015')
>>> d
Tue Oct 06 2015 00:00:00 GMT+0100 (BST)

Oh. Default date format is US format (MM-DD-YYYY). Right. Wat.

>>> var d = new Date('31-06-2015')
>>> d
Invalid Date

That’s… reasonable, given the above. Except that’s a magic object that says Invalid Date. But at least I can compare against it.

>>> var d = new Date('31/06/2015')
>>> d
Invalid Date

Oh great, same behaviour if I give it UK date formats (/ rather than -). That’s okay.

>>> var d = new Date('31/06/2015')
>>> d
"Date 2017-07-05T23:00:00.000Z"


What’s going on?

The difference here is that I’ve used Firefox, the previous examples are in Chrome. I tried to give an explanation of what that’s done, but I actually have no idea. I know it’s 31 months from something, as it’s parsed the 31 months and added it to something. But I can’t work out what, and I’ve spent too long on this already. Help. Stop.

So. Why you should use moment.js. Because otherwise the old great ones will be summoned and you will go mad.


ISO Date Format is not supported in Internet Explorer 8 standards mode and Quirks mode.


The Actual Problem

Now I knew all of this, I could see the problem.

  1. The HTML5 widget expects YYYY-MM-DD
  2. The JS widget will set whatever you ask it to
  3. We were outputting GB formats into the form after submission
  4. This would then be an incorrect format for the HTML 5 widget
  5. The native widget would not change an existing date until a new one is selected, so changing the output format to YYYY-MM-DD meant that it changed when a user selected something.

A Solution In Two Parts

The solution is to standardise the behaviour and formats across both options. Since I have no control over the HTML5 widget, looks like it’s time to take a dive into bootstrap-datepicker and make that do the same thing.

Deep breath, and here we go…

Part 1

First job is to standardise the output date format in all the places. This means that the template needs to see a datetime object, not a preformatted date.

Once this is done, can feed the object into the date template tag, with the format filter. Which takes PHP date format strings. Okay, that’s helpful in 2015. Really.

Figured that out, changed the date parsing Date Input Formats and make sure it has the right ISO format in it.

That made the HTML5 element work consistently. Great.

Then, to the javascript widget.

bootstrap-datepicker does not do anything with the initial value of the element. To make it behave the same as the HTML5 widget, you need to:

1. Get the locale of the user

2. Get the date format for that locale

3. Set that as the format of the datepicker

4. Read the value

5. Convert the value into the right format

6. Call the setValue event of the datepicker with that value

This should be relatively straightforward, with a couple of complications.

  1. moment.js uses a different date format to bootstrap-datepicker
  2. There is no easy way to get a date format string, so a hardcoded list is the best solution.

// taken from bootstrap-datepicker.js
function parseFormat(format) {
    var separator = format.match(/[.\/\-\s].*?/),
        parts = format.split(/\W+/);
    if (!separator || !parts || parts.length === 0){
        throw new Error("Invalid date format.");
    return {separator: separator, parts: parts};

var momentUserDateFormat = getLocaleDateString(true);
var datepickerUserDateFormat = getLocaleDateString(false);

$datepicker.each(function() {
    var $this = $(this);
    var presetData = $this.val();
    $'datepicker').format = parseFormat(datepickerUserDateFormat);
    if (presetData) {
        $this.datepicker('setValue', moment(presetData).format(momentUserDateFormat));

A bit of copy and paste code from the bootstrap-datepicker library, some jquery and moment.js and the problem is solved.

Part 3

Now we have the dates displaying in the right format on page load, we need to ensure they’re sent in the right format after the user has submitted the form. Should just be the reverse operation.

 function rewriteDateFormat(event) {
    var $this = $(;
    if ($this.val()) {
        var momentUserDateFormat = getLocaleDateString(true);
        $this.val(moment($this.val(), [momentUserDateFormat, 'YYYY-MM-DD']).format('YYYY-MM-DD'));

$datepicker.each(function() {
    var $this = $(this);
     // set the form handler for rewriting the format on submit
    var $form = $this.closest('form');
    $form.on('submit', {input: this}, rewriteDateFormat);

And we’re done.


Some final points that I’ve learnt.

  1. Always work in datetime objects until the last possible point. You don’t have to format them.
  2. Default to ISO format unless otherwise instructed
  3. Use parsing libraries


Leo Hsu and Regina Obe: PostgreSQL OGR FDW update and PostGIS 2.2 news

From Planet PostgreSQL. Published on Jun 29, 2015.

PostGIS 2.2 is planned to reach feature freeze June 30th 2015 so we can make the September PostgreSQL 9.5 curtain call with confidence. Great KNN enhancements for PostgreSQL 9.5 only users. I've been busy getting all my ducks lined up. A lot on tiger geocoder and address standardizer extension to be shipped with windows builds, story for later. One other feature we plan to ship with the windows PostGIS 2.2 builds is the ogr_fdw ogr_fdw Foreign data wrapper extension. I've been nagging Paul Ramsey a lot about issues with it, this in particular, and after some prodding, he finally put his nose in and fixed them and pinged Even Rouault for some help on a GDAL specific item.

Needless to say, I've been super happy with the progress and support I've gotten with ogr_fdw development and really enjoying my ogr_fdw use. The XLSX reading a file saved after the connection was open required a fix in GDAL 2.0 branch (which missed GDAL 2.0.0 release, so because of this, this new package contains a GDAL 2.0.1ish library. Hopeful GDAL 2.0.1 will be out before PostGIS 2.2.0 comes out so I can release without guilt with this fix.

Continue reading "PostgreSQL OGR FDW update and PostGIS 2.2 news"

Michael Paquier: Postgres 9.5 feature highlight: New JSONB functions and operators

From Planet PostgreSQL. Published on Jun 28, 2015.

jsonb is coming up with a set of new features in Postgres 9.5. Most of them have been introduced by the following commit:

commit: c6947010ceb42143d9f047c65c1eac2b38928ab7
author: Andrew Dunstan <>
date: Tue, 12 May 2015 15:52:45 -0400
Additional functions and operators for jsonb

jsonb_pretty(jsonb) produces nicely indented json output.
jsonb || jsonb concatenates two jsonb values.
jsonb - text removes a key and its associated value from the json
jsonb - int removes the designated array element
jsonb - text[] removes a key and associated value or array element at
the designated path
jsonb_replace(jsonb,text[],jsonb) replaces the array element designated
by the path or the value associated with the key designated by the path
with the given value.

Original work by Dmitry Dolgov, adapted and reworked for PostgreSQL core
by Andrew Dunstan, reviewed and tidied up by Petr Jelinek.

Note that some slight modifications have been done after this commit though. So the list of new operators and functions presented here is not exactly the one listed in this commit log but the one that will be included in Postgres 9.5 alpha 1 that will be released next week. Also, something worth mentioning is that portion of this work is available as the extension jsonbx that is compatible even with 9.4 installations (see here).

So, 4 new operators have been added in the existing jsonb set in 9.5.

jsonb || jsonb for concatenation on two jsonb fields, where two things can be noted. First, key name ordering is done depending on their names (this is not surprising as on-disk-format is a parsed tree). Then, the last value of a given key will be used as jsonb enforces key uniqueness, even of course if values are of json type.

=# SELECT '{"a1":"v1","a3":"v3"}'::jsonb || '{"a2":"v2"}'::jsonb AS field;
 {"a1": "v1", "a2": "v2", "a3": "v3"}
(1 row)
=# SELECT '{"a1":{"b1":"y1","b2":"y2"},"a2":"v2"}'::jsonb ||
          '{"a1":"v1"}'::jsonb AS field;
 {"a1": "v1", "a2": "v2"}
(1 row)

jsonb - text, which can be used to remove in a jsonb field a given key at the top-level of the field (no nested operations possible here).

=# SELECT '{"a1":"v1","a2":"v2"}'::jsonb - 'a1' AS field;
 {"a2": "v2"}
(1 row)

jsonb - int, to remove in a jsonb array field the value matching the given position, negative values counting from the end of the array, and 0 as the first element. If the position matches no existing value, the array remains the same.

=# SELECT '["a", "b"]'::jsonb - 0 AS pos_0,
    '["a", "b"]'::jsonb - 1 AS pos_1,
    '["a", "b"]'::jsonb - 2 AS pos_2,
    '["a", "b"]'::jsonb - -1 AS pos_less_1;
 pos_0 | pos_1 |   pos_2    | pos_less_1
 ["b"] | ["a"] | ["a", "b"] | ["a"]
(1 row)

jsonb #- text[] (operator has been renamed for clarity after more discussion) to remove a key in the given nested path. An integer can as well be used to remove an element in an array at the wanted position. Here is for example how this works with a mix of nested arrays and json values:

=# SELECT '{"a1":{"b1":"y1","b2":["c1", {"c2":"z1","c3":"z3"}]},"a2":"v2"}'::jsonb #- '{a1}' AS level_1;
 {"a2": "v2"}
(1 row)
=# SELECT '{"a1":{"b1":"y1","b2":["c1", {"c2":"z1","c3":"z3"}]},"a2":"v2"}'::jsonb #- '{a1,b2}' AS level_2;
 {"a1": {"b1": "y1"}, "a2": "v2"}
(1 row)
=# SELECT '{"a1":{"b1":"y1","b2":["c1", {"c2":"z1","c3":"z3"}]},"a2":"v2"}'::jsonb #- '{a1,b2,1}' AS level_3;
 {"a1": {"b1": "y1", "b2": ["c1"]}, "a2": "v2"}
(1 row)
=# SELECT '{"a1":{"b1":"y1","b2":["c1", {"c2":"z1","c3":"z3"}]},"a2":"v2"}'::jsonb #- '{a1,b2,1,c3}' AS level_4;
 {"a1": {"b1": "y1", "b2": ["c1", {"c2": "z1"}]}, "a2": "v2"}
(1 row)

Then there are three new functions.

jsonb_pretty(), to format a jsonb value with a nice json indentation. This will be useful for many applications aiming at having nice-looking data output.

=# SELECT jsonb_pretty('{"a1":{"b1":"y1","b2":["c1", {"c2":"z1","c3":"z3"}]},"a2":"v2"}'::jsonb);
 {                          +
     "a1": {                +
         "b1": "y1",        +
         "b2": [            +
             "c1",          +
             {              +
                 "c2": "z1",+
                 "c3": "z3" +
             }              +
         ]                  +
     },                     +
     "a2": "v2"             +
(1 row)

jsonb_set() to update a value for a given key. This function has support for nested keys as well as a path to redirect to a nested key can be specified, even within an array. The second argument of the function defines the path where the key is located, and the third argument assigns the new value. A fourth argument can be specified as well to enforce the creation of a new key/value pair if the key does not exist in the specified path. Default is true.

-- Here the value in path a1->b2 gets updated.
=# SELECT jsonb_set('{"a1":{"b1":"y1","b2":"y2"},"a2":"v2"}'::jsonb,
                    '{a1,b2}', '"z2"');
 {"a1": {"b1": "y1", "b2": "z2"}, "a2": "v2"}
(1 row)
-- Here a new key is added in a1 as b3 does not exist.
=# SELECT jsonb_set('{"a1":{"b1":"y1","b2":"y2"},"a2":"v2"}'::jsonb,
                    '{a1,b3}', '"z3"', true);
 {"a1": {"b1": "y1", "b2": "y2", "b3": "z3"}, "a2": "v2"}
(1 row)
-- Key does not exist in path a1, do not create it then.
=# SELECT jsonb_set('{"a1":{"b1":"y1","b2":"y2"},"a2":"v2"}'::jsonb,
                    '{a1,b3}', '"z3"', false);
 {"a1": {"b1": "y1", "b2": "y2"}, "a2": "v2"}
(1 row)

Finally there is jsonb_strip_nulls() (available as well for json data type with json_strip_nulls) to remove key/value pairs with NULL values. This function does through all the parsed tree levels, and does not affect NULL values in arrays.

=# SELECT json_strip_nulls('{"a1":{"b1":"y1","b2":null},"a2":null}');
(1 row)
=# SELECT json_strip_nulls('{"a1":[1,null,"a"],"a2":null}');
(1 row)

Each one of those functions is going to alleviate the amount of code that applications had to create previously with some equivalent in either plpgsql/sql on backend side, or even things on frontend side (think jsonb_pretty for user-facing applications for example), so that's definitely useful. Note that Postgres 9.5 alpha 1 will be released next week. So if you have remarks or complaints about its features (not limited to this ticket), don't hesitate to contact community about that. And be sure to test this new stuff, again and again.

Josh Berkus: NDAs at SFPUG: Survey Results

From Planet PostgreSQL. Published on Jun 26, 2015.

It's boomtime in San Francisco, which means we're also full swing into The Cult of the NDA.  This includes many of our venues for SFPUG meetups; they require signing a confidentiality disclaimer before entering their office.  While I question the utility of this, since these hosts are providing us with free space, food, and drink, I'm really not in a position to argue.  So I launched a survey a month ago to see how much of a problem this is for our members.  I thought it might be useful to share the results with other PUG leaders so that they can also develop policies around this.

Here's the results. First, let me give you the overall results in a pie chart.  Scroll down to the bottom of my post for my suggested policy conclusions.

Incidentally, these are all graphed using iPython Notebook and Pylab, which is awesome way to do one-off graphs.  Here's the code for that graph:

    %matplotlib inline
    import psycopg2
    from pylab import *

    conn=psycopg2.connect('dbname=sfpug host=')
    cur = conn.cursor()
    cur.execute("""SELECT att, count(*) as members
        FROM ndasurvey GROUP BY att ORDER BY att""");

    labels = []
    fracs = []
    explode = []
    for rec in cur:

    figure(1, figsize=(6,6))
    pie(fracs, explode=explode, labels=labels,
                    autopct='%1.0f%%', shadow=True, startangle=90)
    title('Attitudes Towards Venue NDAs: Overall')

So overall we have a somewhat split distribution.  BTW, here's the definitions of the attitudes:

  • won't attend: I won't go to a meetup which requires signing
  • needs review: I/my employer must review the agreement first
  • depends on text: depends on what agreement says
  • go anway: I don't like them, but I'll still go
  • don't care: I don't care, whatever
Does this differ for which of our three divisions (San Francisco, East Bay, and South Bay) it is?

So, East Bay attendees don't seem to care in general, and South Bay attendees are much more concerned about reviewing the text of the confidentiality statement.  Which makes sense, if you think about it.

The other division we have is how frequent of an attendee someone is; is there a difference in attitude about our regulars?  Apparently there is:

... so the "won't attend" group is also, generally, a group which doesn't regularly go to SFPUG meetups.  What does this tell us for a policy for SFPUG? 

Well, the 5 options actually fall into three broader groups: those who will go regardless, those who need to see the text of the agreement, and those who won't go.  What if we group it that way?

So from the look of things, 83% of SFPUG will go to a meetup with an confidentiality agreement, provided that they get to review it first.  This suggests that a good policy for SFPUG should be:

"Venues which require confidentiality agreements are acceptable, if not preferred, provided that the text of the agreement is disclosed to the group beforehand."

It also suggests that I and the other leaders should be dilligent in finding out about such agreements whenever booking a venue.

Oh, and for full data, here's the population breakdown of the above:

Josh Berkus: Elephants in Containers

From Planet PostgreSQL. Published on Jun 26, 2015.


In the wake of DockerCon, the July meetup in San Francisco will be all about Linux Containers:

  • Chris Winslett of will present Governor, an auto-failover system for managing a cluster of PostgreSQL replicas.
  • Madhuri Yechuri of ClusterHQ will explain Flocker, which uses the brand-new plugin interface for Docker in order to support volume management and data migration for containers.

If you're not familiar with all of this containerish stuff, Josh Berkus will do a 5-minute primer on using Postgres with Docker at the beginning of the session. Hosted by in their new Rincon Cafe.

(note: Salesforce will require attendees to RSVP with their full names, and sign a confidentiality statement, which will be supplied to attendees in advance)

Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: In The Loop

From Planet PostgreSQL. Published on Jun 26, 2015.

As a database, PGDB (PostgreSQL) is fairly standard in its use of SQL. Developers of all colors however, might have trouble switching gears and thinking in set operations, since so many language constructs focus on conditionals and looping. Last week in the performance pitfalls series, we discussed a bit of Set Theory, and how ignorance of its implications can be disastrous. But what about the more mundane?

What happens, for instance, when we treat a database like a programming language?

This time, the example tables will emulate a pretty terrible ordering system with a user table of 1000 users, and a million orders distributed over roughly the last three years. Here it is, in all its glory:

    user_id      SERIAL       NOT NULL,
    username     VARCHAR      NOT NULL,
    password     VARCHAR      NOT NULL,
    last_order   TIMESTAMPTZ  NULL,
    created_dt   TIMESTAMPTZ  NOT NULL DEFAULT now(),
    modified_dt  TIMESTAMPTZ  NOT NULL DEFAULT now()
INSERT INTO sys_user (username, password, created_dt, modified_dt)
SELECT 'user' ||,
       md5('use-bcrypt-instead' || 'user' || || 'somepassword'),
       now() - ( % 1000 || 'd')::INTERVAL,
       now() - ( % 100 || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id);
ALTER TABLE sys_user ADD CONSTRAINT pk_user_id
      PRIMARY KEY (user_id);
ANALYZE sys_user;
CREATE TABLE sys_order
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    user_id      INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
INSERT INTO sys_order (product_id, item_count, user_id, order_dt, valid_dt)
SELECT ( % 100000) + 1, ( % 100) + 1, ( % 1000) + 1,
       now() - ( % 1000 || 'd')::INTERVAL,
       CASE WHEN % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
  FROM generate_series(1, 1000000) a(id);
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt DESC);
ANALYZE sys_order;

With that out of the way, please note that I do not now, nor will I ever advocate using md5 for passwords. Just don’t. Use crypt (bcrypt) from the pgcrypto extension instead. Note that in even this woefully simplified schema, I use a rudimentary salt. I’m not a security analyst so going further is beyond the scope of this article. But if you plan on writing any kind of secured application of any kind, read up on proper cryptography protocols and two-factor authentication. Never, never, never throw an authentication system together, or you will be hacked and every password in your system will be compromised. Just ask linkedin

Going back to our example, imagine we want to keep some denormalized information about the user in the user table itself. There’s a batch job that updates the user table and maintains the last time the user ordered something. The job does something like this:

\timing ON
DO $$
    i INT;
    FOR i IN SELECT user_id
               FROM sys_order
              WHERE order_dt > CURRENT_DATE - INTERVAL '1 month'
      UPDATE sys_user
         SET last_order = now()
       WHERE user_id = i;
$$ LANGUAGE plpgsql;
TIME: 4107.704 ms

“Contrived,” you scream! And you’re partially right. Unfortunately, this kind of scenario happens more often than you might expect. This is actually a modified example I caught in a system I was auditing two years ago, and its execution time was on the order of hours. In that scenario, there was also the language and network overhead to consider. It wasn’t a PGDB DO loop running directly in the database, but some Groovy code that was operating on every result from another query.

So while this example is clearly contrived, that situation definitely wasn’t. Part of the reason I began my education crusade within our company was to prevent such things from ever occurring. Which brings me to the alternative: anything else. Almost literally any other database-focused solution will work better than a language looping construct.

I’ve specifically advocated against using IN syntax in many contexts, but look what happens when we convert the loop to operating on the set of results in the loop query:

\timing ON
UPDATE sys_user
   SET last_order = now()
 WHERE user_id IN (
         SELECT user_id
           FROM sys_order o
          WHERE order_dt > CURRENT_DATE - INTERVAL '1 month'
TIME: 17.466 ms

This particular example is almost 250x faster, but I’ve seen instances where the duration differs by five or even six orders of magnitude. Remember that example I mentioned that used to run for hours? The post-modification execution time was fifteen seconds. If we had further modified the update job to use a temporary table, it would have been five seconds.

It all depends, of course, on the number of records involved, the speed of the underlying disks, the health and correlation of underlying indexes and data heap, and a lot of other factors. The lesson here is, as with the last article, to try and think in sets and groups, and perform operations on the whole collection of data at once.

But there’s yet another way I suggest using that’s even simpler. PGDB has UPDATE ... FROM syntax that lets you combine the two statements into a single operation that resembles a standard JOIN. Here’s what it looks like using our example:

\timing ON
UPDATE sys_user u
   SET last_order = now()
  FROM sys_order o
 WHERE o.user_id = u.user_id
   AND o.order_dt > CURRENT_DATE - INTERVAL '1 month';
TIME: 40.822 ms

Anything you can do in a SELECT can be translated into this syntax. And like the IN example, it’s compatible with EXPLAIN so we can investigate how many records are affected before executing it. But why is it slower? In this particular instance, the reason is due to the number of orders that occurred in the last month. Because of the data distribution, there are 32,000 orders from 32 customers. Unfortunately the planner sees the 32,000 first, and doesn’t know it’s only 32 distinct values, so it performs a sequence scan on the user table, inflating the overall execution time.

That won’t always be the case, and regardless, it’s still much faster than the alternative. The primary benefit from this syntax is its simplicity. If you can write a JOIN, you can use UPDATE ... FROM, and not have to worry about properly formatting or vetting a subselect.

But wait, there’s more! Examples like this are too straight-forward. Sometimes a lot of preliminary work is required before the final UPDATE. Sometimes this means using several intermediate temporary tables, or as before, building a program loop to perform several supplementary tasks and issuing individual updates.

With PGDB Common Table Expression (CTE) syntax, those prerequisite steps can be done inline. Check this out:

\timing ON
      FROM sys_order
     WHERE order_dt > CURRENT_DATE - INTERVAL '1 month'
UPDATE sys_user u
   SET modified_dt = now()
  FROM ord
 WHERE u.user_id = ord.user_id;
TIME: 19.849 ms

We’ve now done two things.

  1. Brought execution time back inline with the IN case.
  2. Broken the UPDATE into two distinct steps.

PGDB CTEs act as an optimization fence, because each segment is physically materialized as a temporary object. This object has very real dimensions, and the results can be fed into further CTE steps, or used as in our example, directly in a final query. Here’s how adding a second step might look:

\timing ON
WITH ord AS (
      FROM sys_order
     WHERE order_dt > CURRENT_DATE - INTERVAL '1 month'
good AS (
    SELECT user_id
      FROM ord
     WHERE user_id % 3 = 0
UPDATE sys_user u
   SET modified_dt = now()
  FROM good
 WHERE u.user_id = good.user_id;
TIME: 19.741 ms

Now the example really is contrived, because we could have just added the extra WHERE clause to the first step. But that’s because our example is hilariously basic. In a real system, there would be dozens or even hundreds of other tables, and a lot more potential for subsequent calculations, joins, and so on.

The point here isn’t that the second step was pointless, but that it was possible at all. This is what happens when you think in sets. Each result set is something that can be reduced or combined with another result set, and the combination can be used for further operations. Imagine it as object-oriented programming; each set is a blob that should be considered one entity with various attributes.

I’ll end with this bit of Python that should make it obvious what I mean:

# This is set theory:
foo = [1, 2, 3, 3, 3, 4]
print foo.count(3)
# This isn't:
total = 0
for i in foo:
  if i == 3:
    total += 1
print total

Rajeev Rastogi: Presented paper in PGCon 2015 on Native Compilation Technology

From Planet PostgreSQL. Published on Jun 25, 2015.

I am back from the PGCon 2015 and was fortunate enough to present my first paper on "Native Compilation" technology. Also got opportunity to meet most creamy  folks of PostgreSQL community, found everyone to be very polite and easy to go.

As part of this Native Compilation technology, I mostly focused on the Native Compilation of Relation, which we call it as Schema Binding.

Some of the details from presentation are as below (For complete presentation please visit Go Faster With Native Compilation):

Native Compilation:
Native Compilation is a methodology to reduce CPU instructions by executing only instruction specific to given query/objects unlike interpreted execution. Steps are:

  • Generate C-code specific to objects/query.
  • Compile C-code to generate DLL and load with server executable.
  • Call specialized function instead of generalized function.
Schema Binding:
Native Compilation of relation is called the Schema Binding. Since most of the properties of a particular remains same once it is created, so its data gets stored and accessed in the similar patter irrespective of any data. So instead of accessing tuples for same relation in generic way, we create a specialized access function for the relation during its creation and the same gets used for further query containing that table.

It gives performance improvement of upto 30% on standard TPC-H benchmark.

Ernst-Georg Schmid: nseq - A datatype for the efficient storage of nucleotide sequences in PostgreSQL

From Planet PostgreSQL. Published on Jun 25, 2015.

The nseq datatype allows to store DNA and RNA sequences consisting of the letters AGCT or AGCU respectively in PostgreSQL.

By encoding four bases per Byte, it uses 75 percent less space on disk than text. While this idea is obvious and far from being novel, it still is one of the most efficient compression schemes for DNA/RNA, if not the most efficient.

As of now, nseq only supports very basic native operations. The main shortcoming is, that it has to be decompressed into text hence expanding by 4x, for e. g. substring operations and the like.

This will change.

Enough said - here it is...

Raghavendra Rao: Compiling write-able mongo_fdw extension on binary format of PostgreSQL installation.

From Planet PostgreSQL. Published on Jun 25, 2015.

A short blog to enable write-able mongo_fdw extension in PostgreSQL 9.4. PostgreSQL provides a powerful feature called Foreign Data Wrappers (FDW), which allows DBAs to connect to other data sources from within PostgreSQL. Foreign Data Wrapper implementation is based on SQL/MED, that's supported from PostgreSQL 9.1 version onwards, which means we can now access remote database through PostgreSQL seamlessly. Today we have variety of FDW's available, in this blog, we will be compiling a latest version of write-able FDW "mongo_fdw" to access MongoDB.

Latest mongo_fdw extension is based on Mongo-c-driver and Libbson. To implement mongo_fdw, first we need to compile all the dependencies required by the extension. Below are the step-by-step execution on my CentOS 7 (64bit) machine with PostgreSQL 9.4 installed.

Step 1. First install dependency packages required by Mongo-c-Driver and Libbson.
yum install git automake autoconf libtool gcc
Step 2. Clone mongo_fdw repository from Github.
git clone
Step 3. Pre-compilation require pkgconfig/pkg-config (installed in Step 1) and PostgreSQL pg_config location set in the path.
[root@localhost ~]# export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
[root@localhost ~]# export PATH=/opt/PostgreSQL/9.4/bin:$PATH

[root@localhost mongo_fdw]# type pg_config
pg_config is /opt/PostgreSQL/9.4/bin/pg_config
Step 4. Mongo_fdw compilation can be done manually or with the help of auto-compilation script ( provided in the bundle. Here, I will be using auto-compilation script, which will  download and install required mongo-c-driver and libbson libraries in default location(/usr/local/lib). For more details on compilation script refer to the documentation here.
cd mongo_fdw/
./ --with-master
make install
After compilation, we can notice the files created in PostgreSQL home directory.
-bash-4.2$ find $PWD -name "mongo*"
Fine, now we can create the extension in the database.
-bash-4.2$ psql
psql.bin (9.4.4)
Type "help" for help.

postgres=# create extension mongo_fdw;
ERROR: could not load library "/opt/PostgreSQL/9.4/lib/postgresql/": cannot open shared object file: No such file or directory
Oops...seems I forgot to set the library path for newly created and MongoDB libs. To enable libraries, PostgreSQL server should be restarted after setting the library path.
-bash-4.2$ export LD_LIBRARY_PATH=/opt/PostgreSQL/9.4/lib:/usr/local/lib
-bash-4.2$ /opt/PostgreSQL/9.4/bin/pg_ctl -D /opt/PostgreSQL/9.4/data/ start
server starting
Hope, this time there won't be any errors..
-bash-4.2$ psql
psql.bin (9.4.4)
Type "help" for help.

postgres=# create extension mongo_fdw;

postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
adminpack | 1.0 | pg_catalog | administrative functions for PostgreSQL
mongo_fdw | 1.0 | public | foreign data wrapper for MongoDB access
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(3 rows)
Thats cool... we have mongo_fdw extension create in PostgreSQL server.

To play with the extension, you can refer to the documentation. [1],[2].


Shaun M. Thomas: PGCon 2015 Unconference: A Community

From Planet PostgreSQL. Published on Jun 23, 2015.

Well, I’ve returned from PGCon 2015 in Canada, and after a couple days to decompress, it’s time to share. I wrote about the PGCon 2014 unconference after returning to Chicago last year, so I felt it was only fitting that I start there. I feel as strongly now as I did a year ago, that directly interacting with the PostgreSQL maintainers at this level helps the community thrive. Even though PGCon is generally a more developer-focused conference, being able to brainstorm with the bigwigs, even if nothing comes of it, means the ideas have been given a fair shake.

The format this year was quite a bit different than last year. I attribute this to Josh Berkus once again, in what I see as his attempt to formalize the process and give it more weight. Indeed, it’s hard to argue with his results. Just take a look at the 2015 Unconference Wiki page. It’s a deluge of information I wish we had about the 2014 talks, from attendees and talk summaries, to relevant external links and the all-important schedule. I’m a bit biased in that regard because I tend to produce and consume vast swaths of excessive information, but it’s an excellent reflection on how much the PostgreSQL community values documentation in general.

Unfortunately due to inclement weather, I missed the voting process and the first day of talks entirely. I desperately missed watching the talk selection process, though Josh said they did a lot of that electronically because several people would be late to the conference. I’m not sure how I missed that, so I’ll blame email; it deserves it anyway. Regardless, after witnessing the 2014 talk selection, I stand by my earlier assertion that it’s a sight to behold. It warms my crooked old heart to watch people so excited about technology and the amicable interaction they have with the friendly developers.

Despite the setbacks, I did attend several chats on Wednesday, which is another departure from last year’s format. In 2014, the selection process and every talk were constrained to one long Saturday and was very ad-hoc. I can’t tell whether or not distributing everything across two days is an improvement, but it certainly worked in the favor of anyone offset by rain delays this year. And the information available on Wednesday was certainly copious. Those at the same sessions I watched got a summary of semiconductor storage variants, PostgreSQL and Docker, the possible future of horizontal scaling, how pg_shard is progressing, and how partitioning might be better integrated into the core (my personal favorite). All told, it was a good assortment and most of them were fairly informative.

Through all of these audiences, something felt different from before, and it took me a while to figure out what it was: spontaneity and interactivity. Every session I attended, barring Josh’s own topic on Docker, had slides. It’s extremely difficult to have community discussion or collaboration when there’s a single speaker pushing his or her own agenda. From what I saw this year, the majority of the Unconference was plagued by this propensity to allow the speaker an open forum, as if they had won a bonus PGCon slot. I get the impression that was not the intention of the format, and I’ve heard slides might be disallowed next year. If that’s the case, that’s an impressively prompt response, and suggests I wasn’t the only one who voiced the concern.

I also suspect the unconference was harmed by holding it before the primary conference itself. I think part of what made everything work last year, was that the unconference was a response to the primary talks. Once we had a chance to learn about the current direction of PostgreSQL, potential upcoming features, extensions, modules, kitchen sinks, and so on, everyone could discuss inherent implications. We bootstrapped our brainstorming with all of the exciting information presented in the conference, and organically produced further discussions based on it. Without that, the unconference was just… more talks, and unnecessarily covered some overlapping information. It’s impossible to know everything a speaker will include in a topic, so we were operating blindly in most cases. I think that might have contributed to the passive audiences; there was no direction to work with.

And that really is the challenge, isn’t it? Is it possible to wrangle a group of developers and DBAs into a room and encourage semi-focused conversations without allowing it to devolve into anecdotes and speculation? Yes! But can it be done without injecting a ringmaster into each session to ensure there’s some kind of convergence on the stated topic? I definitely don’t want the unconference to become the equivalent of a moderated internet forum, because that cheapens its impact. This is in many respects a once-in-a-lifetime opportunity for many attendees: to be part of PostgreSQL’s future. I’d hate for anyone to miss it because they didn’t want to listen to a bunch of loose squabbling.

And that’s why I have really high hopes for next year. The trick with anything is to find balance. The first attempt revealed the potential of the unconference format, the second gave it some necessary structure but may have overreached slightly, and the third will combine everything into a cohesive amalgam everyone can love. These people are all geniuses, and I have every bit of faith they’ll obliterate my tiny, little mind with something I never even considered. That’s what they do, it’s who they are.

Me? I just try to pay attention so I don’t miss anything. Join one of the mailing lists so you can, too. Also, don’t be afraid to attend a PostgreSQL User Group in your area. It’s not uncommon to see a committer there depending on the location; you don’t always have to attend a conference to bask in the glow of these luminous god creatures!

Wrapping Up CycleMento

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Jun 23, 2015.

in this video we wrap up the Building a Product series by doing an overview of topics we discussed in the previous 11 videos.
Watch Now...

Announcing the Evennia example-game project "Ainneve"

By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Jun 22, 2015.

The Evennia example-game project is underway!

I was quite impressed with the response I got on the mailing list to my call for developing an Evennia example game (see my Need your Help blog post).

The nature of the responses varied, many were from interested people with little to no experience in Evennia or Python whereas others had the experience but not the time to lead it. It was however clear that the interest to work on an "official" Evennia game is quite big.

I'm happy to announce, however, that after only a week we now have a solid lead developer/manager, George Oliver. Helping him on the technical/architecture side is Whitenoise (who, despite a barren github profile, is a professional developer).

George put together a game proposal based on the OpenAdventure rpg, an open-source (CC-SA) ruleset that is also found on github. The example game is to be named "Ainneve" and its development is found in a in a separate repository under the github Evennia organisation.

All the relevant links and future discussion can be found on the mailing list.

George and whitenoise have already made it clear that they aim to not only make Ainneve a good example Evennia game for others to learn from and build on, but to make the development itself a possibility for people of all skill levels to get involved. So get in touch with them if you are at all interested in Python, Evennia and mud development!

So thanks to George and whitenoise for taking this on, looking forward to see where it leads!

image from loveintoblender.

Bruce Momjian: PgCon 2015 Developer Meeting Photo

From Planet PostgreSQL. Published on Jun 20, 2015.

This nice photo was taken during this year's PgCon Developer Meeting. My family was surprised at how many other developers they have met over the years. A signed copy of the photo was sold for USD ~$500 at the conference's charity auction.

The meeting was very productive, and in fact the entire conference has been productive.

Andrew Dunstan: Conference Slides - 9.5 Json improvements and Rotfang FDW

From Planet PostgreSQL. Published on Jun 20, 2015.

Here are my slides from pgCon. First, my slides and Dmitry Dolgov's slides from the presentation on 9.5 Json Features, and also those from my lightning talk on the Rotfang Foreign Data Wrapper.

Reading MT940 files using Python

By from Django community aggregator: Community blog posts. Published on Jun 19, 2015.

Some time ago I wrote a library to read MT940 files with Python. While there are multiple libraries available for this target, none of the others really work properly and/or support all variants of the format.

The MT940 library I wrote is slightly different, it’s designed to be able to parse any MT940 file, regardless whether it’s correct or complete. The initial version of the library was very strict and only supported files that perfectly followed the standards, a few months after the release it became obvious that most banks either used different standards when implementing the standard or interpreted the standard different. Regardless, the library gave little to no results or even crashed on some MT940 files.

Upon reflection I rewrote nearly all of the code to have a script that is flexible enough to support any format (even supporting custom processors for specific format) and wrote test code that tested every MT940 file I could find on the web. The result… a library that parsers pretty much everything out there while still maintaining a reasonable amount of results.

Usage? As simple as you might imagine. After installing (pip install mt-940, note the dash) usage can be as simple as this:

import mt940
import pprint

transactions = mt940.parse('tests/jejik/abnamro.sta')

print 'Transactions:'
print transactions

for transaction in transactions:
    print 'Transaction: ', transaction

For more examples, have a look at the tests. For example, the preprocessor test:

import pytest
import mt940

def sta_data():
    with open('tests/jejik/abnamro.sta') as fh:

def test_pre_processor(sta_data):
    transactions = mt940.models.Transactions(processors=dict(

    assert['closing_balance'].amount.currency == 'USD'
    assert['opening_balance'].amount.currency == 'EUR'

def test_post_processor(sta_data):
    transactions = mt940.models.Transactions(processors=dict(

    assert 'closing_balance_day' not in

Andrew Dunstan: Rotfang, the random data generator

From Planet PostgreSQL. Published on Jun 19, 2015.

The other day I gave a lightning talk at pgCon about a Foreign Data Wrapper called Rotfang, which is for generating arbitrary amounts of random data. This is intended for generating test cases. The software supports a small number of data types natively, but also allows you to use user-supplied functions to generate data. It's available on bitbucket. A short example:

CREATE FUNCTION random_tstz (typmod int)
RETURNS timestamptz
SELECT now() - (random() * 20.0) * interval '1 year'

CREATE FOREIGN TABLE rand2 (b boolean, ts timestamptz)
SERVER rotfang
OPTIONS (maxrows '10', func_ts 'random_tstz');

SELECT * FROM rand2;

Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: Functionally Bankrupt

From Planet PostgreSQL. Published on Jun 19, 2015.

Functions are great. Having cut my teeth on a database that didn’t even provide the ability to define functions, I’ve come to almost take them for granted in PGDB (PostgreSQL). However, with this kind of ubiquity, sometimes they can be overused in ways that don’t seem to be part of the common programmer lexicon. In this week’s PG Phriday series on performance-killing missteps, I’m going to talk a bit about set theory, and how a certain amount of familiarity is necessary to properly interact with a database.

One of the axioms of set theory states that f(x) = y, and x !~ y. That is, a function applied to some value produces a value that may not be equivalent to the original. Put another way, a spayed pet does not have the same reproductive capacity as a regular pet, and is thus not equal to the original. In the context of a database, this is extremely relevant because equivalence is what makes calculations and restrictions possible.

Given this, consider what a database index does. When creating an index, the database takes the value of one or more columns and essentially builds a pointer tree to quickly locate a record in ln time. This is much faster than reading every record in a table and applying a filter to remove unwanted records. That’s something even novice developers tend to know. But they don’t know how functions modify this scenario.

Given the very simplified introduction to set theory, applying a function to the column value means the database must discard equality. Remember: x !~ y. The database has indexed x, not y. So when a function is used in a WHERE clause, any index on x will be ignored. This makes perfect sense, considering there are an infinite number of possible functions, and the output of each is indeterminate. It’s not possible to predict the end result of every possible function.

To further illustrate this, we’ll reuse one of our tried-and-true examples:

CREATE TABLE sys_order
    order_id     SERIAL       NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
INSERT INTO sys_order (product_id, item_count, order_dt, valid_dt)
SELECT ( % 100000) + 1, ( % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL,
       CASE WHEN % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
  FROM generate_series(1, 1000000) a(id);
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt DESC);

Pay special attention to the index on order_dt. I’ve applied a modifier that builds the index in descending order, as it’s very common to retrieve the newest records of a table. This doesn’t actually affect the values being indexed, and is a very handy feature of PGDB. We’re also going to try and use this index, because we don’t want to search through one-million records any time we want to get a few recent data points.

Here’s a query that will count the number of orders yesterday:

  FROM sys_order
 WHERE date_trunc('day', order_dt) = CURRENT_DATE - INTERVAL '1 day';
                             QUERY PLAN
 Aggregate  (cost=29865.50..29865.51 ROWS=1 width=0)
            (actual TIME=989.773..989.774 ROWS=1 loops=1)
   ->  Seq Scan ON sys_order
         (cost=0.00..29853.00 ROWS=5000 width=0)
         (actual TIME=0.021..988.407 ROWS=1000 loops=1)
         FILTER: (date_trunc('day'::text, order_dt) = 
                 (('now'::cstring)::DATE - '1 day'::INTERVAL))
         ROWS Removed BY FILTER: 999000
 Planning TIME: 0.120 ms
 Execution TIME: 989.810 ms

From this output, we can see that PGDB ignored the index on order_dt and did exactly what we didn’t want. Instead of using an index to jump to the relevant values, it scanned the entire table and filtered out the values that didn’t apply. Yet the implications are actually much worse than that. The date_trunc function, even though it’s written in C, is not a free operation. So not only have we unnecessarily read a million rows, we applied a function to each and every row, just to see if the result is within the boundary we specified. And my example fits in memory; the situation degrades exponentially when that isn’t possible.

That’s insanely expensive, and our relatively small million-row table illustrates that well enough. Imagine the same operation on a table with 100M rows or more. So we lose twice: disk resources are wasted retrieving unwanted rows, and CPU time is consumed performing unnecessary function executions. Given how many times I’ve encountered this at several unrelated companies, it’s a pretty serious problem. Here’s how the query should look:

  FROM sys_order
 WHERE order_dt >= CURRENT_DATE - INTERVAL '1 day'
   AND order_dt < CURRENT_DATE;
                             QUERY PLAN
Aggregate  (cost=2564.11..2564.12 ROWS=1 width=0)
[ snip ]
 ->  Bitmap INDEX Scan ON idx_order_order_dt
       (cost=0.00..21.38 ROWS=894 width=0)
       (actual TIME=0.381..0.381 ROWS=1000 loops=1)
       INDEX Cond: ((order_dt >= 
                     (('now'::cstring)::DATE - '1 day'::INTERVAL))
                 AND (order_dt < ('now'::cstring)::DATE))
 Planning TIME: 0.211 ms
 Execution TIME: 5.855 ms

The overall execution plan is slightly more complicated since the index involved now, but note the execution time: it’s almost 200 times faster than the original. All we did was modify the query to use a range that includes all the possible date and time combinations for yesterday. We needed to do that for the same reason we’d tried date_trunc previously, but the end result is the same. The only difference is that this allows the database to use a value range scan on the index and obtain all matching rows immediately.

If you’ve fallen into this trap, don’t feel bad. I’ve seen everyone do this at least once. From newbies right out of college, to highly seasoned technical leads, and even including other DBAs, there doesn’t seem to be any discernible pattern. It’s too easy to frame a query without considering the underlying mechanisms that make everything work. I also want to point out that since PGDB supports functional indexes, it’s also possible to do something like this:

CREATE INDEX idx_order_order_dt
    ON sys_order (date_trunc('day', order_dt));

In this case, we’re simply indexing the resulting value of f(x), so as long as the function call is the same in any WHERE clauses, the index will be used. To PGDB, it’s all the same. If 99% of the development staff, the application itself, and stray dogs are all using a function instead of doing it the “right” way, it’s actually the DBA that is going against the tide.

The only reason I don’t tend to recommend this pattern, is that the functional index is generally too restrictive. What if we wanted to search for a four hour window during yesterday? Well, now we can’t, because the index is only relevant for the date information. The simple index case is applicable to far more potential scenarios, and in the database world, index versatility is extremely important. I try to ensure developers I work with are cognizant of the pitfalls of arbitrarily using functions to filter results.

After all, it’s better to learn these things preemptively!

Michael Paquier: Postgres 9.5 feature highlight: archive_mode = always

From Planet PostgreSQL. Published on Jun 19, 2015.

After seeing an email on pgsql-general about a user willing to be able to archive WAL from a standby to store them locally and to save bandwidth by only receiving the WAL segments through a WAL stream, let's talk about a new feature of Postgres 9.5 that will introduce exactly what this user was looking for, as known as being able to archive WAL from a standby to have more complicated archiving strategies. This feature has been introduced by this commit:

commit: ffd37740ee6fcd434416ec0c5461f7040e0a11de
author: Heikki Linnakangas <>
date: Fri, 15 May 2015 18:55:24 +0300
Add archive_mode='always' option.

In 'always' mode, the standby independently archives all files it receives
from the primary.

Original patch by Fujii Masao, docs and review by me.

As mentioned in the commit message, setting archive_mode = 'always' will make a standby receiving WAL from a primary server archive the segments whose reception has been completed. While it can be interesting for even a set of nodes running on the same host to have each of them archive independently WAL segments on different partitions, this becomes more interesting when nodes are on separate hosts to be able for example to reduce the bandwidth usage as the bandwidth necessary to archive the WAL segments on the standby host is directly included in the WAL stream that a standby gets from its root node, saving resources at the same time.

Let's have a look at how this actually works with a simple set of nodes, one master and one standby running on the same host, listening respectively to ports 5432 and 5433 for example. Each node runs the following archiving configuration:

$ psql -At -c 'show archive_command' -p 5432
cp -i %p /path/to/archive/5432/%f.master
$ psql -At -c 'show archive_command' -p 5433
cp -i %p /path/to/archive/5432/%f.standby
$ psql -At -c 'show archive_mode' -p 5432
$ psql -At -c 'show archive_mode' -p 5433

So with that, both the standby and its primary node will archive their WAL segments once they are considered as complete. And when enforcing a switch to the next segment like that:

$ cd /path/to/archive && ls -l
total 229384
-rw-------  1 michael  staff    16M Jun 19 16:06 000000010000000000000001.master
-rw-------  1 michael  staff   302B Jun 19 16:06 000000010000000000000002.00000028.backup
-rw-------  1 michael  staff    16M Jun 19 16:06 000000010000000000000002.master
-rw-------  1 michael  staff    16M Jun 19 16:06 000000010000000000000002.standby
-rw-------  1 michael  staff    16M Jun 19 16:07 000000010000000000000003.master
-rw-------  1 michael  staff    16M Jun 19 16:07 000000010000000000000003.standby
$ psql -At -c 'SELECT pg_switch_xlog()' -p 5432

The new segments have been both archived from the standby and the master, and they are identical:

$ cd /path/to/archive && ls -l
total 229384
-rw-------  1 michael  staff    16M Jun 19 16:06 000000010000000000000001.master
-rw-------  1 michael  staff   302B Jun 19 16:06 000000010000000000000002.00000028.backup
-rw-------  1 michael  staff    16M Jun 19 16:06 000000010000000000000002.master
-rw-------  1 michael  staff    16M Jun 19 16:06 000000010000000000000002.standby
-rw-------  1 michael  staff    16M Jun 19 16:07 000000010000000000000003.master
-rw-------  1 michael  staff    16M Jun 19 16:07 000000010000000000000003.standby
-rw-------  1 michael  staff    16M Jun 19 16:12 000000010000000000000004.master
-rw-------  1 michael  staff    16M Jun 19 16:12 000000010000000000000004.standby
$ [[ `md5 -q 000000010000000000000004.master` == \
     `md5 -q 000000010000000000000004.standby` ]] && \
  echo equal || echo not-equal

Have fun with that.

Denish Patel: Postgres Replication using Replication Slots

From Planet PostgreSQL. Published on Jun 18, 2015.

Postgres 9.4 introduced an awesome feature Replication Slots. This allows you to implement Postgres replication without using any external archive management tools. Yesterday, I presented three hours long tutorial on “Out of the box Postgres 9.4 Replication using Replication slots” at PgCon (Postgres conference) in Ottawa,Canada. If you want to follow along slides with VM, you … Continue reading Postgres Replication using Replication Slots

Django Birthday Party

By Revolution Systems Blog from Django community aggregator: Community blog posts. Published on Jun 17, 2015.

Django Birthday Party

Beyond Request-Response

By Andrew Godwin from Django community aggregator: Community blog posts. Published on Jun 17, 2015.

Examining one of Django's key abstractions and how it could be updated for a more modern age.

While I love Django dearly, and I think the rate of progress we keep on the project draws a fine balance between progress and backwards-compatibility, sometimes I look ahead to the evolving Web and wonder what we can do to adapt to some more major changes.

We're already well-placed to be the business-logic and data backend to more JavaScript-heavy web apps or native apps; things like Django REST Framework play especially well into that role, and for more traditional sites I still think Django's view and URL abstractions do a decent job - though the URL routing could perhaps do with a refresh sometime soon.

The gaping hole though, to me, was always WebSocket support. It used to be long-poll/COMET support, but we seem to have mostly got past that period now - but even so, the problem remains the same. As a framework, Django is tied to a strict request-response cycle - a request comes in, a worker is tied up handling it until a response is sent, and you only have a small number of workers.

Trying to service long-polling would eat through your workers (and your RAM if you tried to spin up more), and WebSockets are outside the scope of even WSGI itself.

That's why I've come up with a proposal to modify the way Django handles requests and views. You can see the full proposal in this gist, and there's a discussion thread on django-developers, but I wanted to make a blog post explaining more of my reasoning.

The Abstraction

I'm going to skip over why we would need support for these features - I think that's relatively obvious - and jump straight to the how.

In particular, the key thing here is that this is NOT making Django asynchronous in the way where we make core parts nonblocking, or make everything yield or take callbacks.

Instead, the key change is quite small - changing the core "chunk" of what Django runs from views to consumers.

Currently, a view takes a single request and returns a single response:

In my proposed new model, a consumer takes a single message on a channel and returns zero to many responses on other channels:

The code inside the consumer runs like normal Django view code, complete with things like transaction auto-management if desired - it doesn't have to do anything special or use any new async-style APIs.

On top of that, the old middleware-url-view model can itself run as a consumer, if we have a channel for incoming requests and a channel (per client) for outgoing responses.

In fact, we can extend that model to more than just requests and responses; we can also define a similar API for WebSockets, but with more channels - one for new connections, one for incoming data packets, and one per client for outgoing data.

What this means is that rather than just reacting to requests and returning responses, Django can now react to a whole series of events. You could react to incoming WebSocket messages and write them to other WebSockets, like a chat server. You could dispatch task descriptions from inside a view and then handle them later in a different consumer, once the response is sent back.

The Implementation

Now, how do we run this? Clearly it can't run in the existing Django WSGI infrastructure - that's tied to the request lifecycle very explicitly.

Instead, we split Django into three layers:

  • The interface layers, initially just WSGI and WebSockets at launch. These are responsible for turning the client connections into channel messages and vice-versa.
  • The channel layer, a pluggable backend which transports messages over a network - initially two backends, one database-backed and one redis-backed.
  • The worker layer, which are processes that loop and run any pending consumers when messages are available for them.

The worker is pretty simple - it's all synchronous code, just finding a pending message, picking the right consumer to run it, and running the function until it returns (remember, consumers can't block on channels, only send to them - they can only ever receive from one, the one they're subscribed to, precisely to allow this worker model and prevent deadlocks).

The channel layer is pluggable, and also not terribly complicated; at its core, it just has a "send" and a "receive_many" method. You can see more about this in the prototype code I've written - see the next section.

The interface layers are the more difficult ones to explain. They're responsible for interfacing the channel-layer with the outside world, via a variety of methods - initially, the two I propose are:

  • A WSGI interface layer, that translates requests and responses
  • A WebSocket interface layer, that translates connects, closes, sends and receives

The WSGI interface can just run as a normal WSGI app (it doesn't need any async code to write to a channel and then block on the response channel until a message arrives), but the WebSocket interface has to be more custom - it's the bit of code that lets us write our logic in clean, separate consumer functions by handling all of that connection juggling and keeping track of potentially thousands of clients.

I'm proposing that the first versions of the WebSocket layer are written in Twisted (for Python 2) and asyncio (for Python 3), largely because that's what Autobahn|Python supports, but there's nothing to stop someone writing an interface server that uses any async tech they like (even potentially another language, though you'd have to then also write channel layer bindings).

The interface layers are the glue that lets us ignore asynchrony and connection volumes in the rest of our Django code - they're the things responsible for terminating and handling protocols and interfacing them with a more standard set of channel interactions (though it would always be possible to write your own with its own channel message style if you wanted).

An end-user would only ever run premade one of them; they're the code that solves the nasty part of the common problem, and all the issues about tuning and tweaking them fall to Django - and I think that's the job of a framework, to handle those complicated parts for you.

Why Workers?

Some people will wonder why this is just a simple worker model - there's nothing particularly revolutionary here, and it's nowhere near rewriting Django to be "asynchronous" internally.

Basically, I don't think we need that. Writing asynchronous code correctly is difficult for even experienced programmers, and what would it accomplish? Sure, we'd be able to eke out more performance from individual workers if we were sending lots of long database queries or API requests to other sites, but Django has, for better or worse, never really been about great low-level performance.

I do think it will perform slightly better than currently - the channel layer, providing it can scale well enough, will "smooth out" the peaks in requests across the workers. Scaling the channel layer is perhaps the biggest potential issue for large sites, but there's some potential solutions there (especially as only the channels listened to by interface servers need to be global and not sharded off into chunks of workers)

What I want is the ability for anyone from beginner programmers and up to be able to write code that deals with WebSockets or long-poll requests or other non-traditional interaction methods, without having to get into the issues of writing async code (blocking libraries, deadlocks, more complex code, etc.).

The key thing is that this proposal isn't that big a change to Django both in terms of code and how developers interact with it. The new abstraction is just an extension of the existing view abstraction and almost as easy to use; I feel it's a reasonably natural jump for both existing developers and new ones working through tutorials, and it provides the key features Django is missing as well as adding other ways to do things that currently exist, like some tasks you might send via Celery.

Django will still work as it does today; everything will come configured to run things through the URL resolver by default, and things like runserver and running as a normal WSGI app will still work fine (internally, an in-memory channel layer will run to service things - see the proper proposal for details). The difference will be that now, when you want to go that step further and have finer control over HTTP response delays or WebSockets, you can now just drop down and do them directly in Django rather than having to go away and solve this whole new problem.

It's also worth noting that while some kind of "in-process" async like greenlets, Twisted or asyncio might let Django users solve some of these problems, like writing to and from WebSockets, they're still process local and don't enable things like chat message broadcast between different machines in a cluster. The channel layer forces this cross-network behaviour on you from the start and I think that's very healthy in application design; as an end developer you know that you're programming in a style that will easily scale horizontally.

Show Me The Code

I think no proposal is anywhere near complete until there's some code backing it up, and so I've written and deployed a first version of this code, codenamed channels.

You can see it on GitHub:

While this feature would be rolled into Django itself in my proposal, developing it as a third-party app initially allows much more rapid prototyping and the ability to test it with existing sites without requiring users to run an unreleased version or branch of Django.

In fact, it's running on this very website, and I've made a simple WebSocket chat server that's running at The code behind it is pretty simple; here's the file:

import redis
from channels import Channel

redis_conn = redis.Redis("localhost", 6379)

def ws_connect(path, send_channel, **kwargs):
    redis_conn.sadd("chatroom", send_channel)

def ws_receive(channel, send_channel, content, binary, **kwargs):
    # Ignore binary messages
    if binary:
    # Re-dispatch message
    for channel in redis_conn.smembers("chatroom"):
        Channel(channel).send(content=content, binary=False)

def ws_disconnect(channel, send_channel, **kwargs):
    redis_conn.srem("chatroom", send_channel)
    # NOTE: this does not clean up server crash disconnects,
    # you'd want expiring keys as well real life.

Obviously, this is a simple example, but it shows how you can have Django respond to WebSockets and both push and receive data. Plenty more patterns are possible; you could push out chat messages in a post_save signal hook, you could dispatch thumbnailing tasks when image uploads complete, and so on.

There's not enough space here for all the examples and options, but hopefully it's given you some idea what I'm going for. I'd also encourage you to, if you're interested, download and try the example code; it's nowhere near consumer ready yet, and I aim to get it much further and get better documentation soon, but the README should give you some idea.

Your feedback on the proposal and my alpha code is more than welcome; I'd love to know what you think, what you don't like, and what issues you're worried about. You can chime in on the django-developers thread, or you can email me personally at

Buildout and Django: djangorecipe updated for gunicorn support

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 15, 2015.

Most people in the Django world probably use pip to install everything. I (and the company were I work, Nelen & Schuurmans) use buildout instead. If there are any other buildout users left outside of zope/plone, I'd love to hear it :-)

First the news about the new update, after that I'll add a quick note about what's good about buildout, ok?

Djangorecipe 2.1.1 is out. The two main improvements:

  • Lots of old unused functionality has been removed. Project generation, for instance. Django's own startproject is good enough right now. And you can also look at cookiecutter. Options like projectegg and wsgilog are gone as they're not needed anymore.

  • The latest gunicorn releases didn't come with django support anymore. You used to have a bin/django run_gunicorn (or python run_gunicorn) management command, but now you just have to run bin/gunicorn yourproject.wsgi. And pass along an environment variable that points at your django settings.

    With the latest djangorecipe, you can add a scripts-with-settings = gunicorn option and it'll create a bin/gunicorn-with-settings script for you that sets the environment variable automatically. Handy!

Advantage of buildout. To me, the advantage of buildout is threefold:

  • Buildout is more fool-proof. With pip/virtualenv you should remember to activate the virtualenv. With buildout, the scripts themselves make sure the correct sys.path is set.

    With pip install something you shouldn't forget the -r requirements.txt option. With buildout, the requirement restrictions ("versions") are applied automatically.

    With pip, you need to set the django settings environment variable in production and staging. With buildout, it is just bin/django like in development: it includes the correct reference to the correct settings file automatically.

    There just isn't anything you can forget!

  • Buildout is extensible. You can extend it with "recipes". Like a django recipe that helps with the settings and so. Or a template recipe that generates an ngnix config based on a template with the django port and hostname already filled in from the buildout config file. Or a sysegg recipe that selectively injects system packages (=hard to compile things like numpy, scipy, netcdf4).

  • Buildout "composes" your entire site, as far as possible. Pip "just" grabs your python packages. Buildout can also build NPM and run grunt to grab your javascript and can automatically run bin/django collectstatic -y when you install it in production. And generate an nginx/apache file based on your config's gunicorn port. And generate a supervisord config with the correct gunicorn call with the same port number.

Of course there are drawbacks:

  • The documentation is definitively not up to the standards of django itself. Actually, I don't really want to point at the effectively unmaintained main documentation site at You need some experience with buildout to be able to get and keep it working.
  • Most people use pip.

Why do I still use it?

  • The level of automation you can get with buildout ("composability") is great.
  • It is fool-proof. One bin/buildout and everything is set up correctly. Do you trust every colleague (including yourself) to remember 5 different commands to set up a full environment?
  • If you don't use buildout, you have to use pip and virtualenv. And a makefile or something like that to collect all the various parts. Or you need ansible even to set up a local environment.
  • Syseggrecipe makes it easy to include system packages like numpy, scipy, mapnik and so on. Most pip-using web developers only need a handful of pure python packages. We're deep into GIS and numpy/gdal territory. You don't want to compile all that stuff by hand. You don't want to have to keep track of all the development header file packages!

So... hurray for buildout and for the updated djangorecipe functionality! If you still use it, please give me some feedback at or in the comments below. I've removed quite some old functionality and I might have broken some usecases. And buildout/django ideas and thoughts are always welcome.

Need your help

By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Jun 15, 2015.

This for all you developers out there who want to make a game with Evennia but are not sure about what game to make or where to start off.

We need an example game

One of the main critiques Evennia get from newbies is the lack of an (optional) full game implementation to use as an example and base to build from. So, Evennia needs a full, BSD-licensed example game. I'm talking "diku-like", something you could in principle hook up and allow players into within minutes of installing Evennia. The Tutorial world we already have is a start but it is more of a solo quest, it's not designed to be a full multiplayer game. Whereas Evennia supports other forms of MU* too, the idea is that the systems from a more "code-heavy" MUD could easily be extracted and adopted to a more freeform-style game whereas the reverse is not generally true.

The exact structure of such a game would be up to the person or team taking this on, but it should be making use of Evennia's api and come distributed as a custom game folder (the folder you get with evennia --init). We will set this up as a separate repository under the Evennia github organisation - a spin-off from the main evennia project, and maintained separately.

We need you!

Thing is, while I am (and, I'm sure other Evennia core devs) certainly willing to give considerable help and input on such a project, it's not something I have time to take the lead on myself. So I'm looking for enthusiastic coders who would be willing to step up to both help and take the lead on this; both designing and (especially) coding such an example game. Even if you have your own game in mind for the future, you still need to build most of these systems, so starting with a generic system will still help you towards that final goal - plus you get to be immortalized in the code credits, of course.

Suggestion for game

Being an example game, it should be well-documented and following good code practices (this is something we can always fix and adjust as we go though). The systems should be designed as stand-alone/modular as possible to make them easy to rip out and re-purpose (you know people will do so anyway). These are the general features I would imagine are needed (they are open to discussion):
  • Generic Tolkien-esque fantasy theme (lore is not the focus here, but it can still be made interesting)
  • Character creation module
  • Races (say, 2-3)
  • Classes (say 2-3)
  • Attributes and Skills (based on D&D? Limit number of skills to the minimal set)
  • Rule module for making skill checks, rolls etc (D&D rules?)
  • Combat system (twitch? Turn-based?)
  • Mobs, both friendly and aggressive, with AI
  • Trade with NPC / other players (money system)
  • Quest system
  • Eventual new GM/admin tools as needed
  • Small game world (batch-built) to demonstrate all features (of good quality to show off)
  • More? Less?

I'm interested!

Great! We are as a first step looking for a driven lead dev for this project, a person who has the enthusiasm, coding experience and drive to see the project through and manage it. You will (hopefully) get plenty of collaborators willing to help out but It is my experience that a successful hobby project really needs at least one person taking responsibility to "lead the charge" and having the final say on features: Collaborative development can otherwise easily mean that everyone does their own thing or cannot agree on a common course. This would be a spin-off from the main Evennia project and maintained separately as mentioned above.

Reply to this thread if you are willing to participate at any level to the project, including chipping in with code from your already ongoing development. I don't know if there'd be any "competition" over the lead-dev position but if multiple really enthusiastic and willing devs step forward we'll handle that then.

So get in touch!

Andrew Dunstan: What is a Set type?

From Planet PostgreSQL. Published on Jun 14, 2015.

Yesterday I proposed an Unconference talk about Set types, and I've had a couple of people ask me about what Set types are and what they would be for, so here is a brief rundown.

Say you have a table of things that you want to be able to apply some set of tags to. It might be blog posts, for example. The tags might be subject classifications, or reader ratings. Let's say for the sake of argument that it's going to be subject classifications, and that we actually have a (large) enum type to standardize that.

The classic way to do this is to have a table of {blog_post_id, tag} which will be unique on the combination. Up to now adding a tag is a nuisance - you need to make sure the tag doesn't already exist or you'll get a uniqueness violation. In 9.5 that's improved with INSERT ... ON CONFLICT IGNORE. Even then it's a bit more work that I really want to do. What I really want is something like this:
UPDATE blog_posts
SET subject_tags = subject_tags + 'my_new_subject'
WHERE post_id = 12345
and then I'll be able to do
SELECT * FROM blog_posts
WHERE subject_tags ? 'some subject'
It's also possible to do this, somewhat inefficiently, by using hstore or jsonb fields (in jsonb you'd just have a top level object). In both cases you would use dummy values - say make everything have a value of 1. But that's ugly, and error prone, and rather hackish.

Now I don't know if there is enough extra value here to justify the work and code maintenance involved. I've managed to live without it all these years without attacking what little hair I have left. But it's something where I have occasionally thought "I could do this so much more simply and elegantly if I had a set type." So I'd like to see if there is enough interest to make it into a real project.

Stanford Social Innovation Review Highlights Caktus' Work in Libya

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jun 12, 2015.

The Stanford Social Innovation Review recently featured Caktus in “Text the Vote” in Suzie Boss’ “What’s Next: New Approaches to Social Change” column. It describes how our team of developers built the world’s first SMS voter registration system in Libya using RapidSMS.

Article excerpt

In a classic leapfrogging initiative, Libya has enabled its citizens to complete voter registration via digital messaging technology.

In late 2013, soon after Vinod Kurup joined Caktus Group, an open source software firm based in Durham, N.C., he became the lead developer for a new app. The client was the government of Libya, and the purpose of the app would be to support voter registration for the 2014 national elections in that country. Bomb threats and protests in Libya made in-person registration risky. “I realized right away that this wasn’t your standard tech project,” says Kurup.

As a result of that project, Libya became the first country in the world where citizens can register to vote via SMS text messaging. By the end of 2014, 1.5 million people—nearly half of all eligible voters in Libya— had taken advantage of the Caktus-designed app during two national elections. “This never would have happened in a country like the United States, where we have established systems in place [for registering voters],” says Tobias McNulty, co-founder and CEO of Caktus. “Libya was perfect for it. They didn’t have an infrastructure. They were looking for something that could be built and deployed fast.”

To read the rest of article, visit the Stanford Social Innovation Review online.

Josh Berkus: Today's update release: NOW it's time to apply

From Planet PostgreSQL. Published on Jun 12, 2015.

You may have noticed that we've had a lot of churn in PostgreSQL update releases lately, doing three updates in less than two months.  We expect that we're done with that now, so it's time to get on updating all of your servers with today's update release.

Users of PostgreSQL 9.3 and 9.4 should update this weekend if possible.  This cumulative update fixes several problems with our "multixact" mechanism which have been plaguing Postgres since the release of 9.3.0.  While most users will not hit these issues, if you do, the bugs cause unrecoverable data corruption.  In other words, waiting "until you see a problem" is a really, really bad idea.

Additionally, one of the problems fixed requires cleaning up some garbage left by the version of pg_upgrade which shipped with Postgres versions 9.3.0 to 9.3.4.  If you used pg_upgrade to upgrade to one of those versions -- even if you subsequently upgraded to 9.4 -- then the server is liable to trigger a database-wide all-table autovacuum as soon as you restart after applying 9.3.9 or 9.4.2. 

If you are prepared for minutes to hours of degraded performance after applying the update, this is not a problem.  If it is a problem, though, you can do precautionary vacuuming before you apply the update.  Detailed instructions are supplied in the release notes on how to do this.

Now, go update those servers!

Bruce Momjian: Postgres 9.5 Draft Release Notes

From Planet PostgreSQL. Published on Jun 12, 2015.

The draft Postgres 9.5 release notes are now online. This document will be adjusted regularly until the final release, which is expected later this year.

Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: Out Of Order

From Planet PostgreSQL. Published on Jun 12, 2015.

There are a lot of database engines out there. As such, a developer or DBA will naturally have varying levels of experience with each, and some of this might conflict with how PGDB (PostgreSQL) operates. These kinds of embedded misunderstandings can cause potential issues by themselves, but in this particular case, corrective action is fairly simple.

So this week, I’d like to talk about indexes. Many people treat them as a “make query faster” button, and this often results in egregious misuse. Indexes take space, require CPU resources to maintain, and bring overhead that adversely affects INSERT and UPDATE performance. Most know these drawbacks and generally acknowledge that too many indexes can be detrimental. However, with some databases, even column order can make a drastic difference. PGDB is among these.

To illustrate the point, let’s make a simple test case of one-million records in an inventory tracking table of one thousand products over the course of about three years.

CREATE TABLE sys_inventory_snapshot
    product_id   SERIAL       NOT NULL,
    record_dt    TIMESTAMPTZ  NOT NULL,
    item_count   INT          NOT NULL,
    order_count  INT          NOT NULL
INSERT INTO sys_inventory_snapshot (
       product_id, item_count, order_count, record_dt
       now() - ( || 'd')::INTERVAL
  FROM generate_series(1, 1000) a(id),
       generate_series(1, 1000) b(id);
ALTER TABLE sys_inventory_snapshot ADD CONSTRAINT pk_inventory_snapshot
      PRIMARY KEY (product_id, record_dt);

At first glance, this looks pretty good. If we want information based on date or product, it’s right there. Things are even better if we have both values available! The query plan is also encouraging:

  FROM sys_inventory_snapshot
 WHERE record_dt >= CURRENT_DATE - INTERVAL '1 day';
                            QUERY PLAN                             
 Bitmap Heap Scan ON sys_inventory_snapshot
        (cost=22912.60..24880.07 ROWS=674 width=20)
        (actual TIME=106.051..106.530 ROWS=1000 loops=1)
   Recheck Cond: (record_dt >= (('now'::cstring)::DATE - '1 day'::INTERVAL))
   Heap Blocks: exact=7
   ->  Bitmap INDEX Scan ON pk_inventory_snapshot
         (cost=0.00..22912.43 ROWS=674 width=0)
         (actual TIME=106.027..106.027 ROWS=1000 loops=1)
         INDEX Cond: (record_dt >= (('now'::cstring)::DATE - '1 day'::INTERVAL))
 Planning TIME: 0.115 ms
 Execution TIME: 106.988 ms

Look at that! It’s using our date constraint to grab the most recent rows using the primary key for the table. That’s good, right?

Well, no. What’s actually happened here is somewhat esoteric, and the only real clue we have that something isn’t right, is the execution time. We just spent 100ms fetching 1000 records, which is downright offensive to an experienced DBA. Imagine this query in an application that’s executing it hundreds of times per second from multiple threads. In the database world, 100ms is basically forever. This table and query do not warrant such inexcusably bad performance.

So what happened? Buckets. Indexes take one value and point it to a bucket of other values. In the case of a composite index, one bucket value points to another bucket, which contains the values which satisfy both constraints. Now consider that record_dt is the second bucket. In order to ensure we get the most recent data for all products, we have to visit all of the top-level buckets before we can continue to the most recent date for each.

See the problem yet? Instead of simply jumping straight to the first applicable date value and retrieving the entire bucket, we have to visit 1000 buckets and then continue to 1000 smaller buckets. This is a fine access pattern if we had both the product_id and record_dt in the query, since that jumps straight to the values we want. But for everything else, it’s a non-ideal solution. It might be faster than a sequence scan if we’re lucky, but that’s not a guarantee. In some scenarios, it’s actually much slower.

PGDB veterans tend to solve this one of two ways:

  1. Reverse the column order of the index. If it turns out that 90% of queries use both columns, and 10% only use the second column, it’s better to flip them. The primary key in this case still fills its role, but without impacting performance of date searches.
  2. Add another index to represent the second column. This happens when the first column is actually used in exclusion or combination most of the time, but queries using only the second column are frequent enough to cause concern.

Because I’m lazy and don’t want to redefine the primary key, let’s take the second approach:

CREATE INDEX idx_inventory_snapshot_record_dt
    ON sys_inventory_snapshot (record_dt);
  FROM sys_inventory_snapshot
 WHERE record_dt >= CURRENT_DATE - INTERVAL '1 day';
                            QUERY PLAN                             
 INDEX Scan USING idx_inventory_snapshot_record_dt
    ON sys_inventory_snapshot
       (cost=0.43..28.24 ROWS=674 width=20)
       (actual TIME=0.038..0.663 ROWS=1000 loops=1)
   INDEX Cond: (record_dt >= (('now'::cstring)::DATE - '1 day'::INTERVAL))
 Planning TIME: 0.238 ms
 Execution TIME: 1.087 ms

That’s… quite a difference. If you were unfamiliar with databases or PGDB, you might be surprised by this result. But the reality is quite clear: simply listing a column in an index does not magically improve performance of queries that reference it. Order matters. In addition, the further a column strays from the principal position, the worse the degradation becomes.

I’ve seen indexes with six columns, and then a baffled developer asks why his query is slow. Well, his query only used the fourth column in the list. At that point, reading the entire 30-million row table and filtering for desired results would have been faster. The planner tries to account for this, but sometimes statistics or settings lead it astray, and it performs an index scan and dutifully follows all those buckets down the chain to the fourth column. It happens.

Now that you know why, you’re armed to avoid contributing to the problem. Think about indexes before adding them. Consider existing indexes and queries against their parent table, and ensure that column order faithfully represents the real access pattern. It’s an important—and often missed—optimization technique.

As a side note, this is why serious companies have at least one DBA on staff. Designing table structures and optimizing queries are hard enough, but invisible killers like these are more prevalent than one might realize. An experienced database-driven application developer might be familiar with these kind of minutiae, but that’s a lot to expect. Not everyone reads database performance articles, or have encountered and addressed related problems.

Until next week!

Denish Patel: Why should you consider upgrading Postgres again?

From Planet PostgreSQL. Published on Jun 12, 2015.

As title suggests,  if you are running Postgres in your environment, it is very important that you plan to upgrade Postgres with the latest announced release.

I upgraded Postgres recently in last couple of weeks. Should I upgrade again? Why?

Yes, you should plan to upgrade again.

Postgres released data corruption and security bug fix release on May 22nd and follow up release on June 4th to fix some of the issue introduced in previous release. However, the June 4th release did not fix the  multixact wraparound bugs (Thread1Thread2 ) introduced in recent versions.  The wraparound bug could cause problem starting Postgres after the crash , so it is critical to apply. The wraparound bug is more critical for 9.3 & 9.4 but it is very good idea to upgrade for other versions to make sure there isn’t any problem. Read the release notes for the further details/explanation.

Special care should be taken regarding vacuuming , if you ever ran Postgres 9.3 version and used pg_upgrade for the production databases.

What if I run Postgres 8.X (8.1,8.2, ?

Yes, absolutely!!  You are running EOL or unsupported version of Postgres. It is very important that you upgrade Postgre database to supported release  because it is most likely to hit any or all of the  security and/or data loss bugs fixed in recent Postgres releases.

What if I run Postgres 9.X  but have not applied the the latest minor release?

Yes, you should upgrade to the latest minor release (3rd digit in version numbering). The minor release only requires installing new binaries and restart the database.  It does _not_ require to use pg_dump/restore or pg_upgrade.

You should be upgrading to the following latest relevant Postgres release for your environment ASAP:

  • 9.4.4
  • 9.3.9
  • 9.2.13
  • 9.1.18
  • 9.0.22
How can I keep track of the supported Postgres releases?

Postgres releases support can be found here. It is worth to note that , if you are running Postgres 9.0,  you should plan for major version upgrade (most likely to latest Postgres 9.4.X release)  before the holidays season begin because  Postgres 9.0.X will be EOL or unsupported by September,2015.

Hope this will help to convince your manager(s) for the Postgres database upgrade!

Hans-Juergen Schoenig: Using temporary tables the way they should not be used

From Planet PostgreSQL. Published on Jun 12, 2015.

Temporary tables are a core feature of SQL and are commonly used by people around the globe. PostgreSQL provides a nice implementation of temporary tables, which has served me well over the years. An interesting question arises when using a temporary table: What if a temporary table has the same name as a “real” table? What […]

gabrielle roth: PDXPUG: June meeting next week

From Planet PostgreSQL. Published on Jun 11, 2015.

When: 6-8pm Thursday June 18, 2015
Where: Iovation
Who: Mark Wong
What: Pg Performance

Come have a chat with our local performace expert, Mark Wong.

This presentation will take a look at the performance changes in the development version of PostgreSQL. A selection of features, some proposed and some already committed, are evaluated using synthetic open source workloads. One of the workloads used is the former OSDL’s Database Test 3 that consists of business oriented ad-hob queries with concurrent data modification.

If you have a job posting or event you would like me to announce at the meeting, please send it along. The deadline for inclusion is 5pm the day before the meeting.

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Elevators open at 5:45 and building security closes access to the floor at 6:30.

The building is on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots.

See you there!

Robots Robots Ra Ra Ra!!! (PyCon 2015 Must-See Talk: 6/6)

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jun 10, 2015.

Part six of six in our PyCon 2015 Must-See Series, a weekly highlight of talks our staff enjoyed at PyCon.

I've had an interest in robotics since high school, but always thought it would be expensive and time consuming to actually do. Over the past few years, though, I've observed the rise of open hardware such as the Arduino and the Raspberry Pi, and modules and kits built on top of them, that make this type of project more affordable and accessible to the casual hobbyist. I was excited by Katherine's talk because Robot Operating System (ROS) seems to do for the software side what Arduino and such do for the hardware side.

ROS is a framework that can be used to control a wide range of robots and hardware. It abstracts away the hard work, allowing for a publish-subscribe method of communicating with your robot's subsystems. A plus side is that you can use higher-level programming languages such as Python or Lisp, not just C and C++, and there is an active and vibrant open source community built up around it already. Katherine did multiple demonstrations with a robot arm that she'd brought to the talk, that did much with a relatively small amount of easily understandable code. She showed that it was even easy to hook in OpenCV and do such things as finding a red bottle cap in the robot's field of vision.

Why Doesn't Python Have Switch/Case?

By pydanny's blog from Django community aggregator: Community blog posts. Published on Jun 09, 2015.


Unlike every other programming language I've used before, Python does not have a switch or case statement. To get around this fact, we use dictionary mapping:

def numbers_to_strings(argument):
    switcher = {
        0: "zero",
        1: "one",
        2: "two",
    return switcher.get(argument, "nothing")

This code is analogous to:

    switch(argument) {
        case 0:
            return "zero";
        case 1:
            return "one";
        case 2:
            return "two";
            return "nothing";

While the Python code is often more terse than the standard method of handling cases, I could argue it is more arcane. When I first started Python it felt weird and distracting. Over time it grew on me, the use of a dictionary key being the identifier in a switch becoming more and more habitual.

Dictionary Mapping for Functions

In Python we can also include functions or lambdas in our dictionary mapping:

def zero():
    return "zero"

def one():
    return "one"

def numbers_to_functions_to_strings(argument):
    switcher = {
        0: zero,
        1: one,
        2: lambda: "two",
    # Get the function from switcher dictionary
    func = switcher.get(argument, lambda: "nothing")
    # Execute the function
    return func()

While the code inside zero() and one are simple, many Python programs use dictionary mappings like this to dispatch complex procedures.

Dispatch Methods for Classes

If we don't know what method to call on a class, we can use a dispatch method to determine it at runtime.

class Switcher(object):
    def numbers_to_methods_to_strings(self, argument):
        """Dispatch method"""
        # prefix the method_name with 'number_' because method names
        # cannot begin with an integer.
        method_name = 'number_' + str(argument)
        # Get the method from 'self'. Default to a lambda.
        method = getattr(self, method_name, lambda: "nothing")
        # Call the method as we return it
        return method()

    def number_0(self):
        return "zero"

    def number_1(self):
        return "one"

    def number_2(self):
        return "two"

Pretty nifty, right?

The Official Answer

The official answer says, "You can do this easily enough with a sequence of if... elif... elif... else". And that you can use dictionary mapping for functions and dispatch methods for classes.

Arguably the official answer doesn't explain anything except for workarounds. In other words, a "non-answer". In my opinion, what the official answer is really trying to say is, "Python doesn't need a case statement."


Yup. But there's more. I've heard people I respect say that switch/case statements in code can be really hard to debug.

Personally I find that argument breaks down as soon as you run into gigantic nested dictionaries used for mapping of code branches. Think about it, a 100+ element nested dictionary is just as hard to debug as a nested switch and case block with 100+ cases.

Maybe Dictionary Mapping Runs Faster?

Moot as Python doesn't have a case statement. Talking about benchmarks from other languages is pointless as what is faster in one language is not always faster in another. Let's move on.

The Significant Advantage of Python's Approach

Every once in a while I walk into a scenario where Python's approach just works better than a switch/case statement. This is when at runtime I need to add or remove potential items from the mapping. When this occurs, my years of practice of writing dictionary mappings and dispatch methods pays off. I have insights now that I never had back in the day when I relied on switch/case statements.

Closing Thoughts

To me, that Python forced me to accumalate lots of practical experience with mappings is a blessing in disguise. The constraint of not having switch/case statements allowed me to create approaches and ideas I may not have developed with it.

Intentional or not, Python's lack of switch/case has been a social construct that made me a better coder.

Enough so that I think this accidental social construct is a better answer than the official one of 'Do this instead!'

The reference book I co-authored with Audrey Roy Greenfeld on Django best practices, Two Scoops of Django 1.8, is now available in both print paperback and PDF formats.

Testing Client-Side Applications with Django Post Mortem

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jun 08, 2015.

I had the opportunity to give a webcast for O’Reilly Media during which I encountered a presenter’s nightmare: a broken demo. Worse than that it was a test failure in a presentation about testing. Is there any way to salvage such an epic failure?

What Happened

It was my second webcast and I chose to use the same format for both. I started with some brief introductory slides but most of the time was spent as a screen share, going through the code as well as running some commands in the terminal. Since this webcast was about testing this was mostly writing more tests and then running them. I had git branches setup for each phase of the process and for the first forty minutes this was going along great. Then it came to the grand finale. Integrate the server and client tests all together and run one last time. And it failed.

Test Failure

I quickly abandoned the idea of attempting to live debug this error and since I was at the end away I just went into my wrap up. Completely humbled and embarrassed I tried to answer the questions from the audience as gracefully as I could while inside I wanted to just curl up and hide.

Tracing the Error

The webcast was the end of the working day for me so when I was done I packed up and headed home. I had dinner with my family and tried not to obsess about what had just happened. The next morning with a clearer head I decided to dig into the problem. I had done much of the setup on my personal laptop but ran the webcast on my work laptop. Maybe there was something different about the machine setups. I ran the test again on my personal laptop. Still failed. I was sure I had tested this. Was I losing my mind?

I looked through my terminal history. There it was and I ran it again.

Single Test Passing

It passed! I’m not crazy! But what does that mean? I had run the test in isolation and it passed but when run in the full suite it failed. This points to some global shared state between tests. I took another look at the test.

import os

from django.conf import settings
from django.contrib.staticfiles.testing import StaticLiveServerTestCase
from django.test.utils import override_settings

from selenium import webdriver
from import By
from import expected_conditions
from import WebDriverWait

    os.path.join(os.path.dirname(__file__), 'static'), ))
class QunitTests(StaticLiveServerTestCase):
    """Iteractive tests with selenium."""

    def setUpClass(cls):
        cls.browser = webdriver.PhantomJS()

    def setUpClass(cls):
        cls.browser = webdriver.PhantomJS()

    def tearDownClass(cls):

    def test_qunit(self):
        """Load the QUnit tests and check for failures."""

        self.browser.get(self.live_server_url + settings.STATIC_URL + 'index.html')
        results = WebDriverWait(self.browser, 5).until(
                (By.ID, 'qunit-testresult')))
        total = int(results.find_element_by_class_name('total').text)
        failed = int(results.find_element_by_class_name('failed').text)
        self.assertTrue(total and not failed, results.text)

It seemed pretty isolated to me. The test gets its own webdriver instance. There is no file system manipulation. There is no interaction with the database and even if it did Django runs each test in its own transaction and rolls it back. Maybe this shared state wasn’t in my code.

Finding a Fix

I’ll admit when people on IRC or Stackoverflow claim to have found a bug in Django my first instinct is to laugh. However, Django does have some shared state in its settings configuration. The test is using the override_settings decorator but perhaps there was something preventing it from working. I started to dig into the staticfiles code and that’s where I found it. Django was using the lru_cache decorator for the construction of the staticfiles finders. This means they were being cached after their first access. Since this test was running last in the suite it meant that the change to STATICFILES_DIRS was not taking effect. To fix my test meant that I simply needed to bust this cache at the start of my test.

from django.contrib.staticfiles import finders, storage
from django.utils.functional import empty
class QunitTests(StaticLiveServerTestCase):
    def setUp(self):
        # Clear the cache versions of the staticfiles finders and storage
        # See
        storage.staticfiles_storage._wrapped = empty

All Tests Passing

Fixing at the Source

Digging into this problem, it became clear that this wasn’t just a problem with the STATICFILES_DIRS setting but was a problem with using override_settings with most of the contrib.staticfiles related settings. In fact I found the easiest fix for my test case by looking at Django’s own test suite. I decided this really needed to be fixed in Django so that this issue wouldn’t bite any other developers. I opened a ticket and a few days later I created a pull request with the fix. After some helpful review from Tim Graham it was merged and was included in the recent 1.8 release.

What’s Next

Having a test which passes alone and fails when running in the suite is a very frustrating problem. It wasn’t something that I planned to demonstrate when I started with this webcast but that’s where I ended up. The problem I experienced was entirely preventable if I had prepared for the webcast better. However, my own failing lead to a great example of tracking down global state in a test suite and ultimately helped to improve my favorite web framework in just the slightest amount. All together I think it makes the webcast better than I could have planned it.

This was DjangoCon Europe 2015

By Horst Gutmann from Django community aggregator: Community blog posts. Published on Jun 07, 2015.

DjangoCon Europe is special. Most conferences I attend each year are basically just that: Conferences, meet-ups of like-minded people discussing interesting stuff. But what happened this past week in Cardiff is more like a huge family gatherings but where you can talk tech with anyone 😀

The City Hall as seen from Cardiff Castle

And this year the organisers tried to make this family much bigger by explicitly inviting new speakers to the event and helping them to get started. But not only speakers: There was also a whole day prior to the actual conference with workshops and talks that were free to attend hosted by Cardiff University in the same rooms as last year’s DjangoWeekend. No registration required, to strings attached. And from what I could see, quite a few people gave Django a shot there 😊

Markus had a workshop scheduled on that day for people already familiar with Python to learn the basics of Django. According to the sign-up page he could expect around 20 attendees so he asked around if someone was willing to help out. In the end we were about half a dozen trainers helping … half a dozen students. Slight overkill there on our end but still an awesome experience for me as I had never done something like that before. If time permits I definitely want to help with workshops again in the future 😁

Welsh Cakes!

Then came the conference days and with them some awesome talks by old and new speakers. But the really interesting stuff happened between the talks (and I don’t mean the delicious Welsh Cakes 😉). There was, for instance, and impromptu meeting of people interested in working on code-for-all-like projects in the park in front of the City Hall (the location of the main conference part). We didn’t get that far there but there is now at least a mailing-list where we can stay in touch. It is open to the public, so if you’re into this topic, join!

The obligatory fight with the projector that killed quite a few laptops.

IIRC that topic came up after the organisers announced that there would be anonymous wellbeing sessions with professionals from Cardiff University being held alongside the conference. That institute was currently struggling with some software issues so a couple of us gathered to help them as much as we could. During the sprints we also started to work on a prototype for a system that might facilitate some of their work-flows in the future.

If this still sounds too classic-conferencey for you, how about that:

  • On the last sprint day we sang Happy Birthday to one of the workshop tutors after walking through half the university in order to make a surprise entrance.

  • Daniele receives a book and a CD about/of his probably favorite boy-band by that time as a thank-you gift.

  • There was a quiet-room for when you want to be left alone to cool down a bit.

  • People talked very openly about whatever problems they had, not limited to technical but also social and psychological ones.

  • The whole event was supposed to be as waste-free as possible, so there were no plastic cups or plates but every attendee received a Klean Kanteen bottle and there were water coolers everywhere.

This year’s event also tried to offer as many evening activities as possible with one conference dinner for all held at the National Museum of Cardiff and visits to The Clink and the Vegetarian Food Studio. On Thursday there was also a BBQ!

As mentioned above, there were also (as always) sprints going on after the session-days. Due to my travel-plans I could only attend the first one but managed to get a ton of stuff done there again including some fixes for django-backup!

Aside from the conference I also did a bit of sight-seeing but I will try to put that into some other posts either here or on my travelogue, so stay tuned 😉

There is not all that much left to say here other than a big THANK YOU to all the organisers, volunteers, speakers and sponsors who made this event possible and I can’t wait for next year’s DjangoCon Europe in Budapest 😃

PyLadies RDU and Astro Code School Team Up for an Intro to Django Workshop

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jun 05, 2015.

This past Saturday, Caktus developer Rebecca Conley taught a 4-hour introductory level workshop in Django hosted by PyLadies RDU. PyLadies RDU is the local chapter of an international mentorship group for women who love coding in Python. Their main focus is to empower women to become more active participants and leaders in the Python open-source community.

The workshop was held in our Astro Code School space and sponsored by Windsor Circle, Astro Code School, and Caktus Group. Leslie Ray, the local organizer of PyLadies, is always looking for new opportunities “to create a supportive atmosphere for women to learn and teach Python.” With a strong interest in building projects in Django herself, Leslie thought an introductory workshop was the perfect offering for those looking to expand their knowledge in Python as well as a great platform from which Rebecca could solidify her own skills in the language.

“Django is practical,” explains Rebecca, “and it’s the logical next step for those with experience in Python looking to expand their toolkit.”

The event was extremely successful, with a total of thirty students in attendance. Rebecca was impressed with the students, who were “ enthusiastic and willing to work cooperatively,” which is always key in workshop environments. The class attracted everyone from undergraduates, to PhD students, to those looking into mid-career changes. In addition, she was glad to team up with PyLadies for the workshop, appreciating the group’s goal to provide a free and friendly environment for those wishing to improve and expand on their skills.

“It’s important to create new channels for individuals to explore programming. Unfortunately, the lack of diversity in tech is an indication not of who is interested in programming or technology, but of the lack of entryways into that industry. So any opportunity to widen that gateway, or to create more gateways, or to give more people the power to program is to be valued and diversity will ultimately make the field better.”It’s important to create new gateways for people to enter the field. The group of people with interest in and aptitude for programming is large and diverse, and diversity will make this field better. It’s up to those of us already in the field to open more doors and actively welcome and support people when they come in.”

For more information on PyLadies and their local programming, be sure to join their Meetup page, follow them on Twitter, or check out the international PyLadies group page. Other local groups that provide opportunities to code and that we’re proud sponsors of include Girl Develop It! RDU, TriPython, and Code for Durham. For women in tech seeking career support, Caktus also founded Durham Women in Tech.

Formatting python log messages

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 04, 2015.

I see three conflicting styles of log formatting in most of the code I come across. Basically:

import logging

logging = logging.getLogger(__name__)"%s went %s wrong", 42, 'very')"{} went {} wrong".format(42, 'very'))"%s went %s wrong" % (42, 'very'))

I looked at the official PEP 282 and at the official docs.

With the help of someone's stackoverflow question I think I understand it now.

  • Use the first version of the three examples. So:

    • The actual log message with the old, well-known %s (and %d, %f, etc) string formatting indicators.
    • As many extra arguments as you have %s-thingies in your string.
  • Don't use the second and third example, as both of them format the string before it gets passed to the logger. So even if the log message doesn't need to be actually logged somewhere, the full string gets created.

    The first example only gets passed a string and some extra arguments and only turns it into a real full string for in your logfile if it actually gets logged. So if you only display WARN level and higher, your DEBUG messages don't need to be calculated.

  • There is no easy way to use the first example with {} instead of %s.

So: use the"%s went %s wrong", 42, 'very') form.

(Unless someone corrects me, of course)

Djangocon sprint: zest.releaser 5.0 runs on python 3

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 04, 2015.

Good news! zest.releaser supports python 3.3+ now!

Now... what is zest.releaser? zest.releaser takes care of the boring release tasks for you. So: make easy, quick and neat releases of your Python packages. The boring bit?

  • Change the version number (1.2.dev0 -> 1.2 before releasing and 1.2 -> 1.3.dev0 after releasing).
  • Add a new heading in your changelog. Preferrably record the release date in the changelog, too.
  • svn/git/bzr/hg tag your project.
  • Perhaps upload it to pypi.

Zest.releaser takes care of this for you! Look at the docs on for details. The short page on our assumptions about a releasable python project might be a good starting point.

Now on to the python 3.3 support. It was on our wish list for quite some time and now Richard Mitchell added it, kindly sponsored by isotoma! We got a big pull request last week and I thought it would be a good idea to try and merge it at the djangocon sprint. Reason? There are people there who can help me with python 3.

Pull request 101 was the important one. So if you want to get an idea of what needs to be done on a python package that was first released in 2008, look at that one :-)

Long-time zest.releaser users: read Maurits' list of improvements in 4.0, released just two weeks ago.

Running on python 3 is enough of a change to warrant a 5.0 release, though. Apart from python 3, the biggest change is that we now test the long description (what ends up on your pypi page) with the same readme library that pypi itself uses. So no more unformatted restructured text on your pypi page just because there was a small error somewhere. Handy. Run longtest to check it. If it renders OK, it'll be opened in your webbrowser, always handy when you're working on your readme.

Zweite Hamburger Python Unconference

By Arne Brodowski from Django community aggregator: Community blog posts. Published on Jun 04, 2015.

Python Unconference Hamburg

Vom 04. bis 06. September 2015 findet die 2​. Python Unconference​ Hamburg im Institut für organische Chemie der Universität Hamburg statt.

Erwartet werden weit über 200 Python­-User bzw. Developer aus Hamburg und dem Rest der Welt. Von "Greenhorns" bis zu "proven Experts" aus den Bereichen Mathematik, Data Science, System­-Administration und DevOps bis hin zu Web­-Development und Python­-Core-­Entwicklung werden alle nur erdenklichen Facetten der Python­-Welt vertreten sein.

Wie im letzten Jahr findet die Unconference am Samstag und Sonntag im Barcamp-Stil statt, mit Vorträgen und Diskussionen aus allen Bereichen der Python Welt. Neu ist der Freitag: am Freitag gibt es Raum für Sprints, Tutorials und Workshops.

Aktuell gibt es noch Early Bird Tickets für 42,- Euro.

Mehr Informationen gibt es unter

Djangocon: React.js workshop - Maik Hoepfel

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 03, 2015.

He compares react.js to angular. Angular basically want to take over your whole page. Multiple apps on one page, mixed with some jquery, isn't really possible. You also have to learn a lot of terminology. It is a full stack framework. And angular 2.0 won't have a migration path from angular 1.x...

React.js only does one thing and does it well: the view. It doesn't do data fetching, url routing and so.

React.js is quite pythonic. You can start adding little bits at a time to your code. You cannot really do anything wrong with it.

A core idea is that the html is in one corner. That's the output. And the state is in the other corner. They're strictly seperated, which makes it easier to reason about.

It is build by facebook. They take page load time seriously. You can even run react.js partially on the server side ("react native", inside node.js), taking down rendering time, especially on mobile. Facebook is pretty deeply invested in it, so the chances of it disappearing are low. (As opposed to angular: google is barely using it itself).

When you start using react.js on a page, you'll have to start thinking about components. An "unordered list of results", a "search box". Those kinds of components.

  • Component: basically a piece of html, handled by react. If react renders it, html comes out. It looks at its state and properties to render it ("it calls the .render() method").

  • State: the current state of the component. The component can change this.

    If you can do without state: do it. You're better off calculating it from something else. And if it makes sense to store the state on a higher-level component: do it.

    If a component's state changes, you can assume that the render method gets called (it does some magic so that it only updates html if there are changes).

  • Properties: what you pass a component from the outside. Think parameters, like a list of products. The component cannot change these.

React.js code is quite simple to reason about as there's a simple path through the code. If some method changes the state, react handles the rendering. The rendering might set up click handlers or other things. If such a click handler changes the state, react renders everything and so on and so on.

Integration with existing jquery plugins is generally possible, but you'd rather not do it. React.js tries to store state nicely in a corner and jquery plugins normally store state on the DOM itself. So watch out a bit. You might have to build some more custom html than you'd have to do otherwise as you'd just download one of the existing jquery plugins.

A note about facebook's flux. "A flux" is just a structure for building large applications. There are multiple ones. He hopes you don't have to build such a large application yet, as the flux you're choosing might not be the one that wins out in the end.

If you need a good example of django and react.js, look at django-mediacat.

Question: "How do you integrate with django? Does django still render forms?" Answer: normally, you'd decouple the front-end and back-end, at least for the bits you let react.js handle. So perhaps generate some javascript variables in the django template. And grab the rest via an REST API.

Some clarification to the previous paragraph: you can choose yourself how much of the page you let react.js handle. He himself is using react.js to handle a couple of interactive elements on an otherwise-static django-rendered page.

Question: "Are you happy with react.js?". Answer: "YES!". It is quite small in scope. It doesn't get in your way. Easy to get started with. You can't do an awfully lot wrong (he has seen terrible examples with angular: it gives you much more rope to hang yourself with). You can progressively add interactivity to your page.

Tip: keep your react.js components small. Only let it handle spitting out html. Do any calculations you might have to do on the server side, behind your REST API.

Loop Letters | Tracy Osborn

By Lincoln Loop from Django community aggregator: Community blog posts. Published on Jun 03, 2015.

The following is the second installment of our Loop Letters series, an interview with Tracy Osborn.

Tracy Osborn avatar

Tracy is the author of Hello Web App, a book to help designers and non-programmers get up and running with Python and Django. Tracy comes from a designer background herself and offers a unique and refreshing take on programming in her writing. We asked Tracy all about getting involved with Django, recent talks she's given, any advice she has, and what she enjoys doing away from a screen.

Let's get started!

Your startup, Wedding Lovely, has turned into your full time job. What inspired you to get this going? What’s your favorite part about what you do each day?

I have an Art degree and I've always loved wedding invitations — they're such beautiful pieces of art. When I first decided to do a startup, I wanted to build a website to help people do the typography for their own invitations. Unfortunately, that turned into a harder problem to solve than I thought; outputting a PDF with perfect typography is no easy task! Instead, I decided to teach myself programming and start with an easier idea: building a simple directory of freelance and small studio wedding invitation designers. I launched the site in about six weeks after I started teaching myself Python and Django, and have been working on it ever since!

Honestly my favorite part is the freedom. Every day is different (and a lot of that is because I'm the designer and developer of WeddingLovely as well as founder, so there is always something to do!) and I get to choose my hours and my tasks. It's very freeing.

You also recently published a book, Hello Web App (congrats!), what’s the story behind why you decided to write this?

When I taught myself to code, I kept being annoyed at tutorials that assumed previous programming knowledge. Most Django tutorials don't emphasize the templates — the website bits — sticking to setting up models and views and checking output in the command line. But as a front-end developer, everything I set up didn't feel "real" until I saw it as a website. As I continued teaching myself Django and playing with web apps, I kept thinking, "Wow, this is so easy. If tutorials started with this, I would have been a lot more interested and engaged." Eventually I decided that I should just write it myself.

Photo of Hello Web App book

Could you tell us a bit about the main points you touched on at your Refresh Portland presentation where you discussed the lessons learned from writing a book?

The whole book process took a long time because I was trying to figure out the best process for writing, the best process for marketing, the best process for designing, etc. Now that I have a system, I could write another a lot faster; future books are definitely being planned! Essentially, self-publishing is so much easier now that it really doesn't make sense to go with a publisher unless you score a deal with one of the top companies.

Crowdfunding on Kickstarter or Indiegogo is a great way to get an "advance" while still keeping 100% of your royalties not to mention a great way to validate your book idea. Writing in Markdown is recommended since a lot of writing platforms use it (I use Draft), it's easily transformed to HTML, as well as easily ported into InDesign (and editorial design program.) For technical books, I wish I found the Leanpub platform earlier as their eBook generation is superb, and you can use their .mobi/Kindle file to sell on Amazon. I sell most of my books on Gumroad, and I added a video tier on a whim, and thank goodness I did — it generates 60%+ of my revenue.

You mention in your Kickstarter video for Hello Web App that you are fairly new to programming. What did you find to be the most challenging aspect of getting started coming from a design background?

I wanted to see results in the browser and so many tutorials avoid the website and templates! It was frustrating because I just wanted to learn how to build a basic web app, but Googling for "how to create a contact page" didn't come up with anything. Eventually I figured out that I needed to understand how to send emails in general first, and then how forms worked. If I tied these two together I could make a contact page, but that last mental leap (when my hand wasn't being held) was super hard.

What do you find especially appealing about Django and the Django community?

Everyone has been so encouraging! My husband is a Python programmer so he's great for Python questions, but doesn't use Django. I tweeted out that I was learning Django, and wonderful Django-ers in my area (whom I hadn't met in person yet) offered to meet up with me and help out and teach me best practices. I immediately felt welcomed. Generally, the Django community is very open and welcoming to new programmers.

When teaching yourself programming did you find any resources to be super handy? Any advice for other designers looking to break into coding?

Well, I ended up writing the book I wished existed! Other than Hello Web App, I used Learn Python the Hard Way to teach myself basic Python programming and logic. For best practices and just reading about Django, Two Scoops of Django is great.

Generally, I am a huge fan of learn-by-doing. It's okay not to 100% understand what you're working on or coding — as different programming tasks are completed, information will slowly sink into your brain. I encourage readers in Hello Web App to just work through the examples and I avoid a lot of explanation of what exactly is going on behind the scenes. Django is kind of famous for its "magic," which might be frustrating for advanced programmers but totally great for beginners.

The other big programming tip that really helped me was learning while working on a "real" project — something unique to me. I originally followed a blog tutorial to build my directory, because the structure between both are similar. Being invested in my own project (rather than the verbatim tutorial project) really helped the information stick.

As you advance your programming skills what concepts are you finding to be more and more difficult to break down for complete beginners?

Installation is the hardest. Unfortunately you have to set up Python and Django on your computer — it's not like HTML where you can create a file and the browser will show what you've built. Downloading the utilities you need for programming is kind of scary, especially since those sites are really written in "developer speak" and aren't very reassuring to beginners, not to mention you'll probably have to use the command line. Once you get everything installed, the rest is easy peasy ... until deployment. That's also really tough. I use Heroku in Hello Web App, but it was still a hard task to write the instructions needed to deploy on Heroku in a beginner-friendly way.

Finally, do you have any hobbies outside of tech that you are especially good at? How about any that you are terrible at but enjoy nonetheless?

I LOVE pretty much anything outdoorsy. Last summer I completed the John Muir Trail by myself, 200+ fun miles between Yosemite and Mt. Whitney in California. It was an amazing experience (other than my last day.) I love rafting and almost quit tech to become a rafting guide. I've dabbled in mountaineering and want to climb Rainier someday. I'm actually kind of terrible at hiking (I'm usually the slowest) but slow and steady wins the race, right?

You can stay up to date on Tracy’s adventures on Twitter.

Read our previous Loop Letter with Henrik Joreteg here.

Djangocon: CMS Panel - Iacopo Spalletti and Tom Dyson

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 02, 2015.

(One of the summaries of a talk at the 2015 Djangocon EU conference).

The panel consisted of Iacopo Spalletti of Django CMS and Tom Dyson of Wagtail.

For many people, a CMS is the way they get acquainted with django, so it is pretty important. Often they come from Wordpress originally.

  • What is the main philosophy of your CMS?

    Django CMS tries to be just another django app. After a while, django cms grew to support multiple roles. The content manager, the front-end designer, the main editors and the other editors. The main editors set up the content structure of the site and the other editors fill it.

    Wagtail doesn't try to give editors as much power as django CMS (and Drupal and so) do. Often the super-powers aren't even used. It is more aimed at django programmers. It has its own UI (instead of the standard django admin). Performance is a key priority.

  • In practice, how does the philosophy work out for you?

    Django CMS is build upon the flexibility of the django admin. They use it everywhere and it gives them a lot of power. The only pain point is the URLconf. They love django. They wouldn't be anywhere without django.

    Wagtail also tries to make the most use of all the great tools that django gives them. A point of friction for first-time users is django's installation. It just isn't as easy as uploading a couple of PHP files.

  • Ultimately, every CMS has a UI. Does django need to change anything in the way it deals with the front-end?

    Wagtail is often used through its API. The front-ent is often a separate javascript application. No problems there.

    CMSs are often used to create traditional websites. Websites that need to be crawlable by search engine bots. So that often means a regular html website. And if you need it there are more than enough ways to get your data into a javascript front-end.

  • Do you have any advice for the other CMS?

    Django CMS's lesson number one is "communicate correctly with the community". This is especially necessary for an open source project. They had some grumbling about features that got removed from one version to another.

    Wagtail: they made a couple mistakes with the community because they didn't respond quickly enough to pull requests at the start of the project. So watch out for that.

  • How far away are we from a tidal wave of new developers because we've pushed wordpress off its throne?

    Django CMS: deployment is the bottleneck. You could solve it at the service provider level, perhaps, because then you wouldn't have to install it yourself.

    Wagtail: yes, agreed. A second point is themes. Wordpress has lots of themes. But I don't know if that would be a good fit for us.

  • We don't really have a commercialization story. You can't really sell django themes at the moment, for instance.

    Wagtail: they're mainly developing wagtail to make it easy for themselves to build websites quickly. They're not sure themselves how it could be commercialized (in addition to making money with the websites).

Holiday picture from the Eifel region

Djangocon: What it’s really like building RESTful APIs with Django - Paul Hallett

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 02, 2015.

(One of the summaries of a talk at the 2015 Djangocon EU conference).

Paul Hallett considers himself an "API fanatic". He works for lyst, a website for buying clothes. They had an existing API, json-rpc, and wanted to have a better API that fitted better with http.

RPC (remote procedure call) means you only have one endpoint (=url) and everything is a POST request, even if you only request information.

"RPC for http is bad" is what he said. He wanted to do it with django rest framework. Django is a batteries-included webframework, django rest framework is a batteries-included API framework.

If you need to re-build some existing API as a rest API, think of it as a chance to rebuild. If it is an existing API, the program probably changed a bit from how it looked originally. Shouldn't you clean it up a bit? Change a couple of models? The alternatively is to shoe-horn non-fitting models into an otherwise nice API...

REST focuses on Resources. In django terms, you often think "a model". This isn't necessarily true, it might be either more or less. A model might actually have two separate sets of attributes that you'd rather treat as two separate resources.

The core rest framework technology you need to use for this is serializers. Basically translators between django models and resources.

Permissions tip: set DEFAULT_PERMISSION_CLASSES to the admin-only one. That way, if you make a mistake, only the admin can access it. Otherwise you might leave too many API endpoints open.

Authentication: always have it. Always do something. Even if it is on the same server. Tokens, a simple string, whatever. But the best option is oauth2. Install the oauthtoolkit to set it up.

Don't be afraid to bend the rules for http a little bit. If you can make a nice API with just a little bit of rule-bending: that's fine.

Documentation is necessary. If you don't have documentation, you have a bad API. You might have a great API and clear endpoints, but documentation is still key. At the bare minimum provide documentation on how to obtain an oauth2 token!

Holiday picture from the Eifel region

Djangocon: An ageing coder tackles his ageing code - Owen Campbell

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 02, 2015.

(One of the summaries of a talk at the 2015 Djangocon EU conference).

Owen Campbell himself is the ageing coder he talks about. He picked the title because he qualified for early bird tickets because of his age :-)

His experience started at age 11 with a 1kB synclair ZX81 that you could only program in z80 assembly language. Storage? Cassette tapes.

In university he did assembly programming on the 8086 and 86000. And for an extra course he did Fortran, a "high level language" which meant submitting code to the mainframe and getting a print back that you missed a comma. Please re-submit.

He showed an impressive list of programming languages he used in all those years.

For his own company he needed an accounting system. He re-used an older Lotus project and re-did it in microsoft access. Later he moved it to MSSQL server and a dotnet frontend. All in all it ran well for 15 years. But he got enough of the server license costs and the microsoft development environment.

That accounting system is the "old code" from the title. He wanted to convert it to a web app. He started out with Ruby & Rails. He had quite some experience with it.

He hit a brick wall instantly. Rails is very opinionated. Which works fine when you start a new project and buy into the opinions. But for an existing project with existing data models... Rails' opinion of that was low.

So he looked further. nodejs+javascript looked nice, but working for months with javascript: that would have been horrible.

Then he looked at Scala and Play. Nice language, reasonable web framework. But there are hardly any libraries, so he would need to build most of it himself.

He then looked at python and django. Not too opinionated. The admin was easy to set up. It worked fine.

  • He moved from MSSQL to postgres. The business logic was originally in MSSQL as stored procedures. Those are now moved in django itself.
  • Django rest framework and a front end with Angularjs. He used angularjs because he wanted to get to know it because a customer used it for something and he didn't know it yet. To avoid javascript he uses coffeescript.

What did he get out of it?

  • A nicely structured modular application.
  • It is entirely open source.
  • He still uses his original database design. That's a point of pride for him, that his original database design is still being used after 15 year and will continue to be used.
Holiday picture from the Eifel region

Djangocon: lookups, transforms and expressions - Anssi Kääriäinen

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 02, 2015.

(One of the summaries of a talk at the 2015 Djangocon EU conference).

(Anssi Kääriäinen gave a great talk last year at "django under the hood" about the django ORM. Look at that one, too.)

Lookups, transforms and expressions are new features in the ORM (part in 1.7, part in 1.8). They are ways to inject custom SQL in a well-structured way.

Custom lookups are things like __glob that you can write yourself in addition to the build-in ones like __gte. Custom transforms? __lower, as an example.

How you could do this previously was by using .extra(), you could tweak the sql this way. The new approach is more reusable. It can also be chained (somefield__lower__glob)! Another drawback of .extra() is that it quite a hacky part of the django code. The new functionality

Lookups. Those are basically "a condition in the query's WHERE clause". Examples in django are lt, gt, gte, contains and so. To create a custom lookup, you subclass models.Lookup. You set the lookup_name and you implement the .as_sql() method. Afterwards you register it for the various Field types for which it is valid.

An example (very quickly and incompletely copied over from the presentation):

class GlobLookup(models.Lookup):
    lookup_name = 'glob'
    def as_sql(self, compiler, connection):
        lhs_sql, lhs_params = .....
        # ^^^ lhs means "left hand side", same rhs
        rhs_sql, rhs_params = .....

        return "%s glob %s" % (lhs_sql, rhs_sql) # + plus something more

# try it out

Transform example, in this case for lowercasing a string:

@models.CharField.register_lookup  # this will be register_transform later
class LowerTransform(models.Transform):
     lookup_name = 'lower'
     # bilateral = True
     # ^^^ Enable this if you also want to transform the right hand side.
     def as_sql(self, compiler, connection):
         lhs_sql, lhs_params = compiler.compile(....)
         return 'lower(%s)' % ...

# try it out

Expressions are an element in the SELECT clause. Subclass models.Func (or models.Aggregate) and set a method:

class GroupConcat(models.aggregate):
      function = 'group_concat'


You can combine it all (pardon me any typos...):


You can do all sorts of queries with this that weren't possible in earlier versions!

Some extra notes:

  • You can add output_field if the field type changes. If you have a Length expression, you probably effectively convert a list of strings into one single number.
  • Multiple sql dialects are supported, too. If a certain database needs special treatment, you can add methods like .as_postgres() or .as_oracle(). If available, when you use that database, they are used in preference to the generic .as_sql().

The above functionality allowed django to support some specific postgres fields for HStore, for instance:

                   data={'breed': 'labrador',
                         'something': 'else'})
# ^^^ Data is an hstore field: key/value store.


Same with a postgres list field. .filter(tags__1__lower__glob='*django').... You can combine anything.

What's next?

  • Implement extra common expressions in django core (or in contrib).
  • Ordering by transform results.
  • Unify expression and transform implementations.

If you want to know more, check the django documentation:

And... a lot of this work was possible because of, so support django!

Holiday picture from the Eifel region

Djangocon: How to write a view - David Winterbottom

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 02, 2015.

(One of the summaries of a talk at the 2015 Djangocon EU conference).

David Winterbottom says that on the one hand, views are simple: a request comes in and a response goes out. On the other hand, you have lots of freedom in views, which might lead to problems.

A central principle in software architecture is separation of concerns. For instance by encapsulation: providing a clean interface and hiding information from other parts of the system.

A well-known design pattern is MVC, model-view-controller. Originally used for desktop application, but also used in most web frameworks. Note that every web framework interpretes it differently.

  • Model: business logic. Domain logic. The data, the rules.
  • View: what the user sees.
  • Controller: steers the whole process. The user interacts with the controller.

Django talks about MTV: model, template, view. He'd say that what's the view in MVC is mostly the templates and static files. The Controller in django is then views, middleware,

What David often notices in django projects is that the is too big. There's data validation in there. Some of the model layer creeps into the view. And so on. You're not encapsulating everything like you could be.

Django is for perfectionists with deadlines. With models and views you can quickly get something working. But especially with the new class based views, it is easy to mix the two up a bit, functionality-wise.

Perhaps it is caused by django's convention of placing the database models in There's talk about "fat models, thin view" and the other way around. That's just shifting code between and

Wouldn't it be better to capture the domain knowledge in multiple files like,,, Something like that might help make your code cleaner and clearer.

Some "view smells":

If your view start smelling, you have to know when to stop adjusting it and when to just retreat back to a regular modelform, for instance.

What you should take away from this talk: keep views boring! Views should be simple. The hard code should be in the domain layer. And: separate concerns.

What might help is asking yourself questions like:

  • How would I do it if it were a REST API?
  • How would I do this in a management command? (This question is great for finding places where you're wrongly passing a request around).
Holiday picture from the Eifel region

Is Open Source Consulting Dead?

By chrism from . Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemification

By chrism from . Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from . Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at

Why I Like ZODB

By chrism from . Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str. __iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from . Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from . Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from . Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from . Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from . Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from . Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from . Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from . Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from . Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from . Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.

PyCon 2011 Report

By chrism from . Published on Mar 19, 2011.

My personal PyCon 2011 Report