
Open Source Posts

Michael Paquier: Postgres 9.5 feature highlight: Parallel VACUUM with vacuumdb

From Planet PostgreSQL. Published on Jan 26, 2015.

A couple of days back, a new mode was added to vacuumdb to support parallel jobs:

commit: a17923204736d8842eade3517d6a8ee81290fca4
author: Alvaro Herrera <>
date: Fri, 23 Jan 2015 15:02:45 -0300
vacuumdb: enable parallel mode

This mode allows vacuumdb to open several server connections to vacuum
or analyze several tables simultaneously.

Author: Dilip Kumar.  Some reworking by Álvaro Herrera
Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund

When specifying a number of jobs with -j, max_connections must be set higher than the number of jobs requested: the process opens one connection to the target database per job, and then reuses those connections to process the specified tables.

This of course supports all the modes already present in vacuumdb, like --analyze, --analyze-in-stages, etc. The list of tables processed in parallel can also be customized by passing --table several times.

An important thing to note is that when using this feature with -f (VACUUM FULL), there are risks of deadlocks when processing catalog tables. For example, in this case the deadlock was a conflict between pg_index and pg_depend:

$ vacuumdb -j 32 -f -d postgres
vacuumdb: vacuuming database "postgres"
vacuumdb: vacuuming of database "postgres" failed: ERROR:  deadlock detected
DETAIL:  Process 2656 waits for RowExclusiveLock on relation 2608 of database 12974; blocked by process 2642.
Process 2642 waits for AccessShareLock on relation 2610 of database 12974; blocked by process 2656.
HINT:  See server log for query details.
$ psql -At -c "SELECT relname FROM pg_class WHERE oid IN (2608,2610);"
pg_depend
pg_index

Note that this is more likely to happen if:

  • the number of relations in the processed database is low;
  • the quantity of data to be processed is low;
  • the number of jobs is high.

So be careful when using parallel jobs with FULL on a complete database.

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Use abbreviated keys for faster sorting of text datums.

From Planet PostgreSQL. Published on Jan 26, 2015.

On 19th of January, Robert Haas committed patch: Use abbreviated keys for faster sorting of text datums.   This commit extends the SortSupport infrastructure to allow operator classes the option to provide abbreviated representations of Datums; in the case of text, we abbreviate by taking the first few characters of the strxfrm() blob. If the […]

Daniel Pocock: Get your Nagios issues as an iCalendar feed

From Planet PostgreSQL. Published on Jan 26, 2015.

The other day I demonstrated how to get your Github issues/bugs as an iCalendar feed.

I'm planning to take this concept further and I just whipped up another Python script, exposing Nagios issues as an iCalendar feed.

The script is nagios-icalendar. Usage is explained concisely in the README file; it takes just minutes to get up and running.

One interesting feature is that you can append a contact name to the URL to get just the issues assigned to that contact.


Here I demonstrate using Mozilla Lightning / Iceowl-extension to aggregate issues from Nagios, the Fedora instance of Bugzilla and Lumicall's Github issues into a single to-do list.

Gabriele Bartolini: Incremental backup with Barman 1.4.0

From Planet PostgreSQL. Published on Jan 26, 2015.

Today version 1.4.0 of Barman has been officially released. The most important feature is incremental backup support, which relies on rsync and hard links and helps you reduce both backup time and disk space by 50-70%.

Barman adds one configuration option, called reuse_backup. By setting this option to link, Barman will transparently reuse the latest available backup in the catalogue of a given server when issuing a barman backup command.

“Mhh … hang on … Did you just say ‘Reuse’? In what terms?”

Essentially, Barman will reuse the previous backup in two phases:

  • when reading the backup files;
  • when permanently saving the backup files.

Barman simply exploits rsync’s robust and proven technology in order:

  • to skip transferring those database files that have not changed since the latest base backup;
  • to use hard links on the backup side in order to save disk space (data deduplication).
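In the server section of Barman's configuration this is a one-line change. A minimal sketch, in which the server name, connection settings and paths are illustrative (only the reuse_backup option itself comes from the release notes; it can also be set globally):

```ini
[main-db]
description = "Main PostgreSQL server"
conninfo = host=pg.example.com user=barman dbname=postgres
ssh_command = ssh postgres@pg.example.com
; Reuse the latest available backup via rsync hard links:
reuse_backup = link
```

With this in place, a plain `barman backup main-db` transparently performs an incremental backup.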

If you happen to follow me on Twitter, you might have already seen this message about data deduplication on one of our customer’s database:

#pgbarman #postgresql #IncrementalBackup #database size: 13.2 TB. Actual size on disk: 5.0 TB (-62.01% #deduplication ratio). Saved 8.2TB!

This particular case involves a very large database of 13.2 terabytes. Consider doing a weekly backup of a 13.2 TB database. You face two major problems:

  • backup time
  • backup size (with a large impact on retention policies)

As you can see, over 8.2 TB of data had not changed between the two backups, with a corresponding reduction in backup time (17 hours instead of more than 50!), disk space (5 TB instead of 13 TB!) and network bandwidth (by the way, Barman allows you to enable network compression too).

I must confess that a 13TB database is quite an exceptional case. However, what we have witnessed so far, even on smaller (and much smaller) databases, is an almost constant deduplication ratio in all the backups that we are managing with 2ndQuadrant. On average, deduplication ratio is between 50% and 70%.
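The arithmetic behind that deduplication ratio is simple. A minimal sketch (the function name is mine; the sizes are the rounded figures from the tweet, so the result differs slightly from the quoted -62.01%, which was computed from exact byte counts):

```python
def dedup_ratio(logical_tb: float, on_disk_tb: float) -> float:
    """Percentage of the backup that did not need to be stored again."""
    return (logical_tb - on_disk_tb) / logical_tb * 100

# 13.2 TB database, 5.0 TB actually stored on disk.
print(f"{dedup_ratio(13.2, 5.0):.2f}% deduplication, "
      f"{13.2 - 5.0:.1f} TB saved")
```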

The experience gained on this topic by Barman’s development team won’t stop here. A similar approach is being followed by Marco Nenciarini in his attempt to introduce file-level incremental backup in PostgreSQL 9.5 through streaming replication and pg_basebackup.

On a final note, I would like to thank BIJ12, Jobrapido, Navionics and Sovon Vogelonderzoek Nederland for their support towards this release.

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – vacuumdb: enable parallel mode

From Planet PostgreSQL. Published on Jan 26, 2015.

On 23rd of January, Alvaro Herrera committed patch: vacuumdb: enable parallel mode   This mode allows vacuumdb to open several server connections to vacuum or analyze several tables simultaneously.   Author: Dilip Kumar. Some reworking by Álvaro Herrera Reviewed by: Jeff Janes, Amit Kapila, Magnus Hagander, Andres Freund This is great for multi-table vacuums/analyzes. Previously, […]

Guillaume LELARGE: A new vmstat-like tool for PostgreSQL

From Planet PostgreSQL. Published on Jan 25, 2015.

The simplest tools are usually the best.

One of the tools I usually need when I visit customers is vmstat. Nothing beats vmstat for a real overview of what the server is actually doing: it gives system metrics such as CPU and disk usage, which is quite useful for finding where a bottleneck comes from.

I wish I had a PostgreSQL tool like that. I wished enough to eventually build it. I call it pgstat because I couldn't find a better name for it.

It's a command-line tool that connects to a database and grabs its activity statistics. As PostgreSQL has many statistics, you have a command switch to choose the ones you want (-s):

  • archiver for pg_stat_archiver
  • bgwriter for pg_stat_bgwriter
  • connection for connections by type
  • database for pg_stat_database
  • table for pg_stat_all_tables
  • tableio for pg_statio_all_tables
  • index for pg_stat_all_indexes
  • function for pg_stat_user_functions
  • statement for pg_stat_statements
  • pbpools for pgBouncer pools statistics
  • pbstats for pgBouncer general statistics

It looks a lot like vmstat. You ask it the statistics you want, and the frequency to gather these statistics. Just like this:

$ pgstat -s connection
 - total - active - lockwaiting - idle in transaction - idle -
    1546       15             0                     0   1531  
    1544       17             0                     0   1527  
    1544       14             0                     0   1530  
    1546       26             0                     0   1520  
    1543       21             0                     0   1522 

Yeah, way too many idle connections. Actually, way too many connections. Definitely needs a pooler there.
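The per-type counts printed above come from pg_stat_activity. A minimal sketch of how such rows might be bucketed into those columns (the helper and the exact grouping are my own illustration, not pgstat's actual code; servers of that era expose a boolean waiting column alongside state):

```python
from collections import Counter

def tally_connections(rows):
    """Bucket (state, waiting) pairs, as returned by
    SELECT state, waiting FROM pg_stat_activity,
    into the columns shown in the pgstat output above."""
    counts = Counter()
    for state, waiting in rows:
        if waiting:
            counts["lockwaiting"] += 1
        elif state in ("active", "idle", "idle in transaction"):
            counts[state] += 1
    counts["total"] = len(rows)
    return dict(counts)

sample = ([("idle", False)] * 3 + [("active", False)] * 2 +
          [("idle in transaction", False), ("active", True)])
print(tally_connections(sample))
```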

This is what happens during a 10-second, 10-client pgbench test:

$ pgstat -s database 1
- backends - ------ xacts ------ -------------- blocks -------------- -------------- tuples -------------- ------ temp ------ ------- misc --------
                commit rollback     read    hit read_time write_time      ret    fet    ins    upd    del    files     bytes   conflicts deadlocks
         1      224041       17    24768 2803774         0          0   4684398 234716 2105701  16615    113        1  14016512           0         0
         1           0        0        0      0         0          0        0      0      0      0      0        0         0           0         0
         1           3        0        0    205         0          0       92     92      0      0      0        0         0           0         0
        11          20        0        0    500         0          0     1420    184      0      1      0        0         0           0         0
        11          69        0        1   4438         0          0     1736    986     68    204      0        0         0           0         0
        11         136        0       12   4406         0          0     1767    270    135    405      0        0         0           0         0
        11         108        0        0   3434         0          0     1394    214    107    321      0        0         0           0         0
        11          96        0        0   3290         0          0     1240    190     95    285      0        0         0           0         0
        11         125        0        0   4045         0          0     1620    248    124    372      0        0         0           0         0
        11         126        0        0   4222         0          0     1628    250    125    375      0        0         0           0         0
        11         111        0        0   3644         0          0     1436    220    110    330      0        0         0           0         0
        11          78        0        0   2549         0          0     1918    161     75    225      0        0         0           0         0
        11         118        0        0   3933         0          0     1524    234    117    351      0        0         0           0         0
         1         130        0        0   4276         0          0     1685    258    129    387      0        0         0           0         0
         1           1        0        0      0         0          0        0      0      0      0      0        0         0           0         0
         1           1        0        0      0         0          0        0      0      0      0      0        0         0           0         0
         1           1        0        0      0         0          0        0      0      0      0      0        0         0           0         0

You clearly see when it starts, when it stops, and what it did during the 10 seconds. Here is what happens at the tables level:

$ pgstat -s table -d b1 1
-- sequential -- ------ index ------ ----------------- tuples -------------------------- -------------- maintenance --------------
   scan  tuples     scan  tuples         ins    upd    del hotupd   live   dead analyze   vacuum autovacuum analyze autoanalyze
  68553  1467082   264957  266656      7919869  59312    113  57262 4611779   3782   5401      22         10       4          22
      3     430        0       0           0      0      0      0      0      0      0       0          0       0           0
      3     430        0       0           0      0      0      0      0      0      0       0          0       0           0
    231    2351     1116    1222          61    184      0    180     61    124    245       2          0       0           0
    431    1750      240     240         120    360      0    358    120    242    480       0          0       0           0
    385    1640      220     220         110    330      0    327    110     11    440       0          0       0           0
    340    1475      190     190          95    285      0    285     95    189    380       0          0       0           0
    398    1651      222     222         111    333      0    331    111     -2    444       0          0       0           0
    353    1519      198     198          99    297      0    293     99    200    396       0          0       0           0
    335    1453      186     186          93    279      0    274     93   -210    372       0          0       0           0
    446    1838      256     256         128    384      0    381    128    104    512       0          0       0           0
    425    1739      238     238         119    357      0    354    119    241    476       0          0       0           0
    360    1552      204     204         102    306      0    305    102    -10    408       0          0       0           0
    386    1629      218     218         109    327      0    325    109     57    436       0          0       0           0
    437    1761      242     242         121    363      0    363    121   -292    484       0          0       0           0
    373    1563      206     206         103    309      0    305    103     -1    412       0          0       0           0
    323    1442      184     184          92    276      0    273     92    188    368       0          0       0           0
    412    1706      232     232         116    348      0    346    116     76    464       0          0       0           0
    291    1332      164     164          82    246      0    245     82   -216    328       0          0       0           0
    189    1013      106     106          53    159      0    158     53    106    212       0          0       0           0
    346    1508      196     196          98    294      0    290     98    -18    392       0          0       0           0
    304    1376      172     172          86    258      0    258     86   -156    344       0          0       0           0
    442    1794      248     248         124    372      0    368    124   -260    496       0          0       0           0
      9    1371      157     260           0     13      0     13 -11602   -329  -6053       0          2       0           3
      3     430        0       0           0      0      0      0      0      0      0       0          0       0           0
      3     430        0       0           0      0      0      0      0      0      0       0          0       0           0

You can also filter by table name with the -f command-line switch:

$ pgstat -s table -d b1 -f pgbench_history 1
-- sequential -- ------ index ------ ----------------- tuples -------------------------- -------------- maintenance --------------
   scan  tuples     scan  tuples         ins    upd    del hotupd   live   dead analyze   vacuum autovacuum analyze autoanalyze
      0       0        0       0       21750      0      0      0   2022      0      0       1          0       1           7
      0       0        0       0           0      0      0      0      0      0      0       0          0       0           0
      0       0        0       0          64      0      0      0     64      0     64       0          0       0           0
      0       0        0       0         122      0      0      0    122      0    122       0          0       0           0
      0       0        0       0         106      0      0      0    106      0    106       0          0       0           0
      0       0        0       0          99      0      0      0     99      0     99       0          0       0           0
      0       0        0       0          88      0      0      0     88      0     88       0          0       0           0
      0       0        0       0         116      0      0      0    116      0    116       0          0       0           0
      0       0        0       0          99      0      0      0     99      0     99       0          0       0           0
      0       0        0       0          61      0      0      0     61      0     61       0          0       0           0
      0       0        0       0          42      0      0      0     42      0     42       0          0       0           0
      0       0        0       0         106      0      0      0    106      0    106       0          0       0           0
      0       0        0       0          55      0      0      0     55      0     55       0          0       0           0
      0       0        0       0         121      0      0      0    121      0    121       0          0       0           0
      0       0        0       0          68      0      0      0  -1942      0  -1011       0          0       0           1
      0       0        0       0          99      0      0      0     99      0     99       0          0       0           0
      0       0        0       0         109      0      0      0    109      0    109       0          0       0           0
      0       0        0       0          94      0      0      0     94      0     94       0          0       0           0
      0       0        0       0         120      0      0      0    120      0    120       0          0       0           0
      0       0        0       0         110      0      0      0    110      0    110       0          0       0           0
      0       0        0       0         100      0      0      0    100      0    100       0          0       0           0
      0       0        0       0         115      0      0      0    115      0    115       0          0       0           0
      0       0        0       0           0      0      0      0      0      0      0       0          0       0           0
      0       0        0       0           0      0      0      0      0      0      0       0          0       0           0

We see that the activity on this table is quite different from what happens to the other tables.

Today, I added reporting from the pg_stat_statements extension. It works pretty well:

$ pgstat -s statement -d b1
--------- misc ---------- ----------- shared ----------- ----------- local ----------- ----- temp ----- -------- time --------
  calls      time   rows      hit   read  dirty written      hit   read  dirty written    read written        read   written
 383843   1756456.50 13236523   9277049  38794  50915    1640   1008844  17703   8850    8850    1711    1711        0.00      0.00
      1     0.75      1        0      0      0       0        0      0      0       0       0       0        0.00      0.00
      1     0.50      1        0      0      0       0        0      0      0       0       0       0        0.00      0.00
      1     0.75      1        0      0      0       0        0      0      0       0       0       0        0.00      0.00
    310   2709.88    220     1527     10     63       0        0      0      0       0       0       0        0.00      0.00
    797   8555.00    569     3736     10    109       0        0      0      0       0       0       0        0.00      0.00
    725   9215.25    519     3610     23    115       0        0      0      0       0       0       0        0.00      0.00
    266   7729.38    190     1257      2     43       0        0      0      0       0       0       0        0.00      0.00
    831   10196.12    594     3988     11    112       0        0      0      0       0       0       0        0.00      0.00
    788   8678.38    563     3803      8     92       0        0      0      0       0       0       0        0.00      0.00
    736   9080.62    526     3616      7     89       0        0      0      0       0       0       0        0.00      0.00
    792   8395.50    566     3742     11     96       0        0      0      0       0       0       0        0.00      0.00
    814   9346.75    582     3985      9     84       0        0      0      0       0       0       0        0.00      0.00
    763   8941.12    545     3799      9     84       0        0      0      0       0       0       0        0.00      0.00
    728   8543.25    520     3549      8     62       0        0      0      0       0       0       0        0.00      0.00
    589   9143.62    421     2812      7     45       0        0      0      0       0       0       0        0.00      0.00
    785   8710.00    561     3788      4     60       0        0      0      0       0       0       0        0.00      0.00
    785   9117.25    561     3885      4     60       0        0      0      0       0       0       0        0.00      0.00
    785   8397.12    561     3788      1     52       0        0      0      0       0       0       0        0.00      0.00
    799   9398.12    571     3925      7     60       0        0      0      0       0       0       0        0.00      0.00
    765   9033.88    547     3757      3     43       0        0      0      0       0       0       0        0.00      0.00
    805   8663.25    575     3886      6     57       0        0      0      0       0       0       0        0.00      0.00
    765   8490.50    547     3661      7     39       0        0      0      0       0       0       0        0.00      0.00
    764   8850.00    546     3698      4     41       0        0      0      0       0       0       0        0.00      0.00
    396   6706.50    283     1992      1     14       0        0      0      0       0       0       0        0.00      0.00
      1     0.38      1        0      0      0       0        0      0      0       0       0       0        0.00      0.00
      1     0.62      1        0      0      0       0        0      0      0       0       0       0        0.00      0.00

Of course, it first searches for the extension and complains if it isn't there:

$ pgstat -s statement -d b2
pgstat: Cannot find the pg_stat_statements extension.

I'll continue to work on this little tool. It definitely needs more love: better code and lots of documentation. But it's already a nice tool to have when you work on a PostgreSQL server.

If you want to test it, go to my github page, download the tool, and compile it. Let me know what you think of it by email.

Daniel Pocock: Get your Github issues as an iCalendar feed

From Planet PostgreSQL. Published on Jan 24, 2015.

I've just whipped up a Python script that renders Github issue lists from your favourite projects as an iCalendar feed.

The project is called github-icalendar. It uses Python Flask to expose the iCalendar feed over HTTP.

It is really easy to get up and running. All the dependencies are available on a modern Linux distribution, for example:

$ sudo apt-get install python-yaml python-icalendar python-flask python-pygithub

Just create an API token in Github and put it into a configuration file with a list of your repositories like this:

api_token: 6b36b3d7579d06c9f8e88bc6fb33864e4765e5fac4a3c2fd1bc33aad
bind_address: ::0
bind_port: 5000
repositories:
- repository: your-user-name/your-project
- repository: your-user-name/another-project

Run it from the shell:

$ ./github_icalendar/ github-ics.cfg

and connect to it with your favourite iCalendar client.

Consolidating issue lists from Bugzilla, Github, Debian BTS and other sources

A single iCalendar client can usually support multiple sources and thereby consolidate lists of issues from multiple bug trackers.

This can be much more powerful than combining RSS bug feeds because iCalendar has built-in support for concepts such as priority and deadline. The client can use these to help you identify the most critical issues across all your projects, no matter which bug tracker they use.
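To make that concrete, a bug can be carried as an iCalendar VTODO component with PRIORITY and DUE properties (RFC 5545). A minimal stdlib-only sketch with made-up helper names and field values (the scripts above use the icalendar package rather than hand-built strings):

```python
def bug_to_vtodo(uid, summary, priority, due):
    """Render one bug as an iCalendar VTODO component (RFC 5545).
    priority: 1 (highest) .. 9 (lowest); due: a YYYYMMDD date string."""
    return "\r\n".join([
        "BEGIN:VTODO",
        f"UID:{uid}",
        f"SUMMARY:{summary}",
        f"PRIORITY:{priority}",
        f"DUE;VALUE=DATE:{due}",
        "END:VTODO",
    ])

feed = "\r\n".join([
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "PRODID:-//example//bug feed//EN",
    bug_to_vtodo("42@tracker.example", "Fix crash on login", 1, "20150301"),
    "END:VCALENDAR",
])
print(feed)
```

An RSS item has no equivalent of those PRIORITY and DUE fields, which is exactly what lets an iCalendar client rank issues across trackers.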

Bugzilla bugtrackers already expose iCalendar feeds directly, just look for the iCalendar link at the bottom of any search results page. Here is an example URL from the Mozilla instance of Bugzilla.

The Ultimate Debian Database consolidates information from the Debian and Ubuntu universe and can already export it as an RSS feed, there is discussion about extrapolating that to an iCalendar feed too.

Further possibilities

  • Prioritizing the issues in Github and mapping these priorities to iCalendar priorities
  • Creating tags in Github that allow issues to be ignored/excluded from the feed (e.g. excluding wishlist items)
  • Creating summary entries instead of listing all the issues, e.g. a single task entry with the title Fix 2 critical bugs for project foo


The screenshots below are based on the issue list of the Lumicall secure SIP phone for Android.

Screenshot - Mozilla Thunderbird/Lightning (Icedove/Iceowl-extension on Debian)

Leo Hsu and Regina Obe: Using SSL https connections with www_fdw on windows

From Planet PostgreSQL. Published on Jan 24, 2015.

One of the foreign data wrappers I included in the PostgreSQL 9.3 Windows FDW bag and PostgreSQL 9.4 Windows FDW bag is the www_fdw extension, used for querying web services. Someone pointed out that since I didn't build curl with SSL support, they were unable to use it with https connections. The main reason I didn't is that the EDB installers come with ssleay32.dll and libeay32.dll (even the 64-bit ones), which are dependencies of curl when built with SSL support. I wanted to minimize the issue of distributing dlls that are already packaged with the Windows PostgreSQL installers.

Though this article is specific to using www_fdw on Windows systems, many of the issues are equally applicable to other platforms, so may be worth a read if you are running into similar issues with using specialty SSL certificates on Linux/Unix/Mac.

Continue reading "Using SSL https connections with www_fdw on windows"

Amit Kapila: Read Scalability in PostgreSQL 9.5

From Planet PostgreSQL. Published on Jan 24, 2015.

In PostgreSQL 9.5, we will see a boost in scalability for read workloads when the data fits in RAM. I ran a pgbench read-only load to compare the performance of 9.4 and HEAD (62f5e447) on an IBM POWER-8 with 24 cores, 192 hardware threads and 492GB of RAM; here is the performance data.

The data mainly covers two kinds of workload: when all the data fits in shared buffers (scale_factor = 300), and when it can't fit in shared buffers but can fit in RAM (scale_factor = 1000).

First, let's talk about the 300 scale factor case. In 9.4 throughput peaks at 32 clients; now it peaks at 64 clients, the improvement reaches ~98%, and performance is better at all client counts from 32 upwards. The main work leading to this improvement is commit ab5194e6 (Improve LWLock scalability). The previous implementation had a bottleneck around the spinlocks acquired for LWLock acquisition and release; the 9.5 implementation instead uses atomic operations to manipulate the LWLock state. Thanks to Andres Freund, the author of this patch (and, in my view, credit also goes to the reviewers, Robert Haas and myself, who reviewed multiple versions of it), many PostgreSQL users will be happy.

Now let's discuss the 1000 scale factor case. Here we see a good performance improvement (~25%) even at 32 clients, rising to ~96% at higher client counts; again, where 9.4 peaked at 32 clients, 9.5 peaks at 64 and is better at all higher client counts. The main work leading to this improvement is commits 5d7962c6 (Change locking regimen around buffer replacement) and 3acc10c9 (Increase the number of buffer mapping partitions to 128). There were two main bottlenecks: (a) the BufFreeList LWLock was acquired to find a free buffer for a page (which requires running the clock sweep), and this became a bottleneck when many clients tried to do so simultaneously; and (b) to change a buffer's association in the buffer mapping hash table, an LWLock is acquired on the hash partition the buffer belongs to, and with just 16 such partitions there was huge contention when multiple clients operated on the same partition. To reduce bottleneck (a), we now use a spinlock that is held just long enough to pop the freelist or advance the clock sweep hand, and then released; if the clock sweep must advance further, the spinlock is reacquired once per buffer. To reduce bottleneck (b), the number of buffer partitions was increased to 128. The crux of this improvement is that both bottlenecks (a and b) had to be resolved together to see a major gain in scalability. I prepared the initial patch, and Robert Haas then extracted the important part and committed it. Many thanks to both Robert Haas and Andres Freund, who not only reviewed the patch but also gave a lot of useful suggestions during this work.

During the work on buffer management improvements, I noticed that the next big bottleneck that could buy us a reasonably good improvement in read workloads is the dynamic hash table used to manage shared buffers, so improving its concurrency could further improve read performance. There has been some discussion of using a concurrent hash table for shared buffers (a patch by Robert Haas), but it has not yet materialized.

Peter Geoghegan: Abbreviated keys: exploiting locality to improve PostgreSQL's text sort performance

From Planet PostgreSQL. Published on Jan 23, 2015.

On Monday, Robert Haas committed a patch of mine that considerably speeds up the sorting of text in PostgreSQL. This was the last and the largest in a series of such patches, the patch that adds "abbreviated keys". PostgreSQL 9.5 will have big improvements in sort performance.

In realistic cases, CREATE INDEX operations on text are over three times faster than in PostgreSQL 9.4. Not every such utility operation or data warehousing query involving a big sort is sped up by that much, but many will be.

This was a piece of work that I spent a considerable amount of time on over the past few months. It's easy to justify that effort, though: sorting text is a very fundamental capability of any database system. Sorting is likely the dominant cost when creating B-Tree indexes, performing CLUSTER operations, and, most obviously, for sort nodes that are required by many plans that are executed in the service of queries with ORDER BY or DISTINCT clauses, or aggregates using the GroupAggregate strategy. Most of the utility statements that need to perform sorts must perform them with a very disruptive lock on the target relation (CREATE INDEX CONCURRENTLY is a notable exception), so quite apart from the expense of the sort, the duration of sorts often strongly influences how long a production system is seriously disrupted.

My interest in sorting is not new: I first worked on it in 2011. Early research on it back then prompted Robert Haas and Tom Lane to write the SortSupport infrastructure, which I've now extended here. Originally, the SortSupport infrastructure was all about providing alternative versions of comparators for use in sort routines, versions that avoided certain overhead otherwise inherent to calling functions that are generally accessible from SQL. As a highly extensible system, PostgreSQL requires that sort behavior be defined in terms of a default B-Tree operator class, which is itself defined in terms of SQL operators with underlying SQL-callable functions. These functions are written in C for built-in types, but in principle they could be written in a PL, like PL/Python, for example. When the underlying comparator can be expected to compile to just a few CPU instructions, "fmgr elision" becomes important.

Note that I generalized SortSupport to work for more cases, so B-Tree index builds will get a nice little boost in PostgreSQL 9.5, even for types like integer and float8. That's not what this blog post is really about, though. This blog post is about the interesting new direction that the SortSupport infrastructure has been taken in, beyond mere "fmgr elision" - abbreviation.

A well known problem with the text datatype in PostgreSQL is that it uses the operating system/C standard library strcoll() function to resolve comparisons as tuples are sorted, which is very expensive. It's at least a thousand times more expensive than comparing integers, for example. This general problem is something that Robert Haas has expressed concern about in the past.

The expense relates to a normalization process whereby string comparisons use complex tables to make sure that strings are compared according to the rules of some particular culture or nation (that is, some particular collation associated with a locale). Even in English speaking countries, this is important; for example, the en_US collation considers differences in case (upper case versus lower case) after alphabetical ordering and diacritical differences, so case is considered last of all. In addition, while English usually doesn't have diacritics, sometimes it does. At work, I'm still sometimes annoyed by the sort order of the Linux Hipchat client's user list, which uses the C locale. Hi Ómar!

It was always suspected that we could more effectively amortize the cost of these locale-aware comparisons, by performing a transformation of strings into binary keys using strxfrm(), and sorting the keys instead (using a strcmp()-based comparator with the keys, which only considers raw byte ordering). This comparison will produce equivalent results to just using strcoll() directly. But the binary keys are much larger than the original strings - typically almost 4x larger. Moreover, we'd still need to do a tie-breaker strcmp() comparison (to check for strict binary equality) using the original string, when strcoll() reports equality, because the historic idea of equality that the text type offers is strict binary equality. There were some historic edge cases where a tie-breaker strcmp() was not performed following strcoll() returning '0', resulting in corrupt B-Tree indexes on a Hungarian database. strcoll() could return 0 despite not being passed a pair of bitwise-identical strings.
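The amortization idea above can be sketched in Python, whose locale module wraps the same C library calls; the word list is just an illustration:

```python
import functools
import locale

# Use whatever collation the environment provides; the equivalence
# holds for any locale, including the default "C" locale.
locale.setlocale(locale.LC_COLLATE, "")

words = ["lunchtime", "Lunch", "lunch", "illusion", "Illusion"]

# Compare strings directly: strcoll() runs O(n log n) times on average.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))

# Transform each string once (O(n) strxfrm() calls), then sort the
# transformed keys with plain ordinal comparison.
by_strxfrm = sorted(words, key=locale.strxfrm)

assert by_strcoll == by_strxfrm
```

Both orderings agree by construction, which is exactly why sorting the transformed keys is attractive despite their size.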

Having to keep around the original text datum seemed like an additional burden on the whole idea of using strxfrm() blobs as sort keys. It seemed like using binary keys to sort had a lot of promise, but we couldn't quite work out how to exploit that idea - until recently.

Abbreviated keys were committed:

Use abbreviated keys for faster sorting of text datums.

This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob.  If the abbreviated comparison is insufficient
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.

There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys.  HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.

Peter Geoghegan, reviewed by me.

It's surprisingly effective to just store the first 8 bytes of a strxfrm() blob, and tie-break relatively infrequently by using a full old-style comparison, rather than the more obvious approach of sorting with pointers to strxfrm()-generated blobs (the approach that the C standard recommends for general purpose text sorting).


By storing just the first 8 bytes (on 64-bit platforms; 4 bytes on 32-bit platforms) of the strxfrm() blob in a field that would otherwise contain a pointer-to-text (since text is a pass-by-reference type) - the same type-punned field that directly stores the representation of pass-by-value types like integer - most comparisons can be resolved using just those 8 bytes directly, and without pointer-chasing. At the same time, the cost of locale transformations is still quite effectively amortized, because, as always when using strxfrm(), the transformation is performed O(n) times, rather than an average of O(n log n) times (the transformation process performed by strxfrm() and the comparison work done by strcoll() may not be exactly comparable, but close enough).
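A rough Python model of the scheme (Python's strxfrm() returns a str rather than bytes, so the blob is UTF-8 encoded here, which preserves its ordering; the 8-byte width and the sample words are illustrative):

```python
import locale

locale.setlocale(locale.LC_COLLATE, "")

def abbreviated_key(s, width=8):
    # Pack the first `width` bytes of the transformed blob into an
    # integer, zero-padded, so it compares like a 64-bit Datum.
    blob = locale.strxfrm(s).encode("utf-8")
    return int.from_bytes(blob[:width].ljust(width, b"\x00"), "big")

words = ["abbreviate", "abbreviation", "abbey", "zebra", "abbreviate"]

# Abbreviated comparison first; the full transform and then the raw
# string act as tie-breakers (text's notion of equality is strict
# binary equality).
ordered = sorted(words, key=lambda s: (abbreviated_key(s), locale.strxfrm(s), s))

assert ordered == sorted(words, key=lambda s: (locale.strxfrm(s), s))
```

Because byte order in the prefix is consistent with the full blob's order, sorting on the abbreviated key plus a tie-breaker always produces the same ordering as the full transform.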

It turns out that the large binary key blobs produced by strxfrm(), while much larger than the original strings, have a significant concentration of entropy towards the start of the blob (assuming the use of the Unicode collation algorithm, or an algorithm with similar properties). This is because the representation consists of a series of "levels". The "primary weights", which appear first, represent primary alphabetical ordering when using Latin scripts. So whitespace differences and punctuation differences are not represented at that level (nor are differences in case). For accented Latin characters, for example, diacritics are represented at a subsequent level, and so the abbreviated key representation typically won't vary if accents are added or removed to a text datum. This is important because languages that use accents extensively, like French or Spanish, will get a concentration of entropy in their 8 byte abbreviated keys that's about the same as if no accents were used, even though accented code points usually take 2 bytes of storage in UTF-8, rather than 1 byte on unaccented Latin alphabet code points.


A more general problem with sort performance is the problem of cache misses. My earlier work on sorting targeted pass-by-value PostgreSQL types like integer and float8. These pass-by-value types naturally have great locality of reference. Their comparisons are integral operations, which are fast, but operating on a tightly packed representation is what makes sorting integers with Quicksort perhaps as fast as is practically possible for a comparison-based sort. Cache miss penalties are likely to be the dominant cost on modern CPUs, which become more bottlenecked on memory bandwidth and latency with every successive generation; sorting infrastructure must weigh this heavily.

When I initially discussed the idea of abbreviated keys, there was a certain degree of skepticism from other Postgres hackers. What if most comparisons are not resolved by abbreviated comparisons, due to text datums with a lot of redundant or repeated information at the beginning? Could all the strxfrm() work go to waste when that happens? Well, for one thing, low cardinality sets (tuples with text columns that have a relatively low number of distinct values) are not a problem. That's because strcoll() is still a huge cost, and if we can have our authoritative tie-breaker comparator observe that the strings are identical, then no strcoll() comparison is ever needed - we can just exit early with a simple, cheap opportunistic binary comparison (memcmp()), which is almost as good. But what about when there are many different strings with differences towards the end of the string, past the 8th or so byte?

CPU cache characteristics have necessitated complicated engineering trade-offs for sorting infrastructure for a long time. Database luminary Jim Gray proposed an abbreviation-like technique as early as 1994, in his AlphaSort paper (PDF). He describes a "key-prefix sort" in the paper. Even back in 1994, Gray observed that memory latency was the dominant cost by a wide margin. The underlying trends in CPU performance characteristics have continued apace since then. Before his death in 2007, Gray officiated the Sort Benchmark. Among the rules for the "Daytona sort" category, which concerns the sort performance of general-purpose algorithms (which is what I'm interested in), it states that Daytona sort entrants "must not be overly dependent on the uniform and random distribution of key values in the sort input". It's almost as if Gray was saying: "of course I expect you to use abbreviated keys, but don't push your luck!". And so it is for PostgreSQL. Some cases benefit much more than others, and some cases might even be slightly regressed.

Merge Joins

An earlier piece of work for 9.5 had conventional comparisons (not involving abbreviated keys) always try an opportunistic memcmp() tie-breaker first. This is likely to be particularly beneficial for merge joins (quite apart from any sort node that may feed the merge join), since they must "synchronize" relations using comparisons that can often be expected to indicate equality. Multi-column sorts on text are also considerably accelerated, where many leading column comparisons can be expected to indicate equality. It's also important for abbreviated keys, because as already mentioned we can still win big with low cardinality sets, provided the full tie-breaker comparisons are resolved with a cheap memcmp().
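The opportunistic tie-breaker is easy to model; this is only a sketch, the real comparator being C code in PostgreSQL's text opclass:

```python
import functools
import locale

locale.setlocale(locale.LC_COLLATE, "")

def text_cmp(a, b):
    # Opportunistic equality check first: bitwise-identical strings are
    # equal under any collation, so the expensive strcoll() is skipped.
    if a == b:
        return 0
    return locale.strcoll(a, b)

# Low cardinality input: most comparisons hit the cheap equality path.
rows = ["widget", "widget", "gadget", "widget", "gadget"]
ordered = sorted(rows, key=functools.cmp_to_key(text_cmp))
```

With only two distinct values, the bulk of comparisons resolve without ever reaching the locale-aware path.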


Fundamentally, when you do a cost/benefit analysis, abbreviated keys are very compelling. The upsides are clearly very large, and the break-even point for switching to using abbreviation is surprisingly far out. We cannot ignore the performance benefits of these techniques because some much rarer cases will be slightly regressed. But, as it happens, we have cheap worst case insurance: HyperLogLog is used to cheaply and fairly accurately check the cardinality of both abbreviated keys and the original text values. If abbreviated cardinality is an effective proxy for full cardinality, then most comparisons will either use abbreviated comparisons, or use a cheap memcmp() tie-breaker, which is almost as good. Otherwise, we abort abbreviation before the sort proper is underway.
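The abort decision can be caricatured like this; exact set counts stand in for the HyperLogLog estimates, raw string prefixes stand in for strxfrm() prefixes, and the threshold is an invented illustration rather than the constant PostgreSQL uses:

```python
def should_abort_abbreviation(full_values, abbreviated_keys, min_ratio=0.1):
    # If the abbreviated keys have far fewer distinct values than the
    # full strings, abbreviated comparisons will rarely resolve anything
    # and the strxfrm() work is wasted.
    distinct_full = len(set(full_values))
    distinct_abbrev = len(set(abbreviated_keys))
    return distinct_full > 0 and distinct_abbrev / distinct_full < min_ratio

# Pathological case: a long shared prefix collapses every 8-byte
# abbreviated key down to a single value.
strings = ["shared-prefix-%06d" % i for i in range(1000)]
assert should_abort_abbreviation(strings, [s[:8] for s in strings])
assert not should_abort_abbreviation(strings, strings)
```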


Abbreviated keys are just infrastructure. While text is the most compelling case, there are at least a few other datatypes that would greatly benefit from support for abbreviation. These include:

  • numeric
  • character(n)
  • citext (case insensitive text, from contrib/citext)
I welcome others with an interest in making sorting faster to work on the relevant opclass support for each of these types, and possibly others. Other people may be able to come up with novel encoding schemes for these types, that maximize the entropy within the finished abbreviated keys. Order-preserving compression is likely to be an area where text's support could be improved, by making comparisons resolved at the abbreviated key level more frequent. Hopefully the benefits of the abbreviated key infrastructure will not be limited to accelerating sorts on text.

robert berry: Regular Expression Stop Words for Postgresql

From Planet PostgreSQL. Published on Jan 23, 2015.

Regular Expression Stop Words for Postgresql

Jan 24, 2015 – Portland

While working on an NLP project with OCR-derived text, I discovered a need that wasn't immediately met by the built-in parsers and dictionaries which ship with Postgresql. Fortunately, text search in Postgresql is fairly extensible.

This post looks at the process of implementing a custom dictionary which can be used to create stop words which are really stop regular expressions.

Background on Parsing, Dictionaries, and Stop Words

The first step in using most text search features is converting unstructured text into text search vectors. A text search vector is a set of (lexeme, position) pairs.

=# select to_tsvector('Time is an illusion.  Lunchtime doubly so.');
 'doubli':6 'illus':4 'lunchtim':5 'time':1
(1 row)

As you might expect, the internal representation of a ts_vector is simply information about where words are located inside a big string.

typedef struct
{
  int32   vl_len_;    /* varlena header (do not touch directly!) */
  int32   size;
  WordEntry entries[1];   /* variable length */
  /* lexemes follow the entries[] array */
} TSVectorData;

The transformation from text to ts_vector involves parsing text into tokens, then filtering the tokens through a dictionary which may change or eliminate words.
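That pipeline - parse into tokens, then let a dictionary normalize or drop each one - can be sketched as follows (no stemming here, so unlike the to_tsvector output above the lexemes stay unstemmed):

```python
import re

def simple_to_tsvector(text, dictionary):
    # Tokenize, then pass each token through the dictionary; returning
    # None drops the token, exactly like a stop word.
    vector = {}
    for pos, token in enumerate(re.findall(r"\w+", text.lower()), start=1):
        lexeme = dictionary(token)
        if lexeme is not None:
            vector.setdefault(lexeme, []).append(pos)
    return vector

stop_words = {"is", "an", "so"}
vec = simple_to_tsvector("Time is an illusion.  Lunchtime doubly so.",
                         lambda tok: None if tok in stop_words else tok)
assert vec == {"time": [1], "illusion": [4], "lunchtime": [5], "doubly": [6]}
```

Note that positions are assigned before filtering, which is why dropped stop words leave gaps in the numbering - just as in the real ts_vector output.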

A text search configuration can be used to map token categories to dictionaries. The official documentation contains additional details.

The Problem Statement

The text corpus is derived from an OCR process which results in some unhelpful tokens.

-43 44 40
Lunchtime is 46 #@$dfsdf an illusion.Q!~ Lunchtime ksdfkjsadjfksdjf so.

So how to get pleasant ts_vectors?

One solution would be adding a token type, say ocr_gibberish, to the parser and removing the mappings from ocr_gibberish to any dictionary. While you can write your own parser, the default is pretty good, and it does not appear to be easily extended.

Another approach would be to use a dictionary as a white list. Unfortunately this corpus has creative authors who use words — and regularly invent words — not available in a dictionary.

Due to these limitations, the chosen solution was to create a dictionary which removes tokens based on regular expressions.
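In Python terms, the behaviour of such a dictionary looks like this (the real dict_exclude is C code using PostgreSQL's dictionary API; the sample rule is the one used later in the post):

```python
import re

def make_exclude_dictionary(rules):
    # Compile the rule file's patterns once, at "init" time.
    patterns = [re.compile(rule) for rule in rules]

    def lexize(token):
        # Drop any token matching an exclusion pattern; pass the rest
        # through unchanged for later dictionaries to handle.
        if any(p.search(token) for p in patterns):
            return None
        return token

    return lexize

# Three consecutive non-vowels mark a token as OCR gibberish.
drop_gibberish = make_exclude_dictionary([r"[^aeiouAEIOU]{3}"])
assert drop_gibberish("ksdfkjsadjfksdjf") is None
assert drop_gibberish("illusion") == "illusion"
```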

Writing a Custom Dictionary

This turned out to be fairly painless thanks to examples in pg_contrib.

Implementing a dictionary requires just two functions: dict_exclude_init, to compile regular expressions from a rule file, and dict_exclude_lexize, to apply the regular expressions in the lexize process.

Once these functions are implemented and installed, the dictionary can be added.

CREATE TEXT SEARCH TEMPLATE dict_exclude_template (
        LEXIZE = dict_exclude_lexize,
        INIT   = dict_exclude_init
);

CREATE TEXT SEARCH DICTIONARY dict_exclude (
  TEMPLATE = dict_exclude_template
);

A text search configuration can pass tokens to dict_exclude first, which will prevent other dictionaries from resolving lexemes.

create text search configuration ocr_gibberish ( COPY = pg_catalog.english );
alter text search configuration ocr_gibberish 
  alter mapping for asciihword, asciiword
    with dict_exclude, english_stem;

By adding a regular expression like ([^aeiouAEIOU]{3}) to an exclude.rules file, some gibberish can be removed.

=# select to_tsvector('ocr_gibberish', 
                      'Time is an illusion. sdfsdfsdf  Lunchtime doubly so.');

 'Time':1 'an':3 'illusion':4 'is':2 'so':8

Additional details on using dict_exclude are available in the project’s readme.

If you have solved this problem another way, please let me know. A solution that did not require an external library would be preferable.

Binod Nirvan: MixERP PgDoc on OSX and Linux

From Planet PostgreSQL. Published on Jan 23, 2015.

MixERP PgDoc is a console-based application that creates beautiful PostgreSQL database documentation. The first release supported Windows, and with the second release today, we now support OSX and Linux (tested on OSX Yosemite and Ubuntu 14.04). Please make sure you have Mono installed before you download PgDoc here:

Running on OSX and Linux

Extract the downloaded archive, open a terminal, and type:

mono /path/to/mixerp-pgdoc.exe <arguments>

Or simply

mono /users/nirvan/desktop/pg-doc/mixerp-pgdoc.exe -s=localhost -d=mixerp -u=postgres -p=secret -o=/users/nirvan/desktop/db-doc

I have created a category in MixERP Forums for further discussion.

Gabriele Bartolini: How monitoring of WAL archiving improves with PostgreSQL 9.4 and pg_stat_archiver

From Planet PostgreSQL. Published on Jan 23, 2015.

PostgreSQL 9.4 introduces a new statistic in the catalogue, called pg_stat_archiver.
Thanks to the SQL language, it is now possible to check in an instant the state of the archiving process for transaction logs (WALs), a crucial component of a PostgreSQL disaster recovery system.

PostgreSQL and Barman

Introduction and reasons

The need for the pg_stat_archiver view comes from the last few years of experience with Barman as a solution for disaster recovery of PostgreSQL databases in business continuity environments.

In particular, some of the needs and recurring questions that DBAs, system administrators, CTOs and CIOs repeatedly – and legitimately – share with us are:

  • how much disk space will I need?
  • how can I keep the whole backup process, including continuous archiving, under control?

The starting point can only be a business requirement, and it is expressed through the concept of a retention policy.
Usually, a company defines a disaster recovery plan within a business continuity plan, which clearly defines the retention period of backup data. In the same documents we find the definitions of both the recovery point objective (RPO) and the recovery time objective (RTO), the two key metrics that respectively measure the amount of data that a business can afford to lose and the maximum time allowed to recover from a disaster.
Monitoring and studying the past behaviour of a database are important elements for correctly sizing the storage requirements of a PostgreSQL backup solution.

For example, a company may decide to retain data of a PostgreSQL database for a month, in order to reconstruct the state of the database in a consistent manner at any point in time from the first available backup to the last available WAL file (through the rock solid Point-in-Time-Recovery technology of PostgreSQL, introduced 10 years ago).

The required size is given not only by the number of periodic full backups (for example, one a week), but also the number of WAL files stored in the Barman archive, each containing all the transactions that have been properly executed in the database, in a differential manner.

One of the needed metrics is therefore the number of archived WAL files per second, through which it is possible to estimate the number of WALs generated in a week and to predict disk usage requirements.
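Given that metric, the sizing arithmetic is straightforward. A sketch with made-up numbers (the archiving rate is hypothetical; 16 MiB is the default WAL segment size):

```python
# Hypothetical workload: 0.02 archived WALs per second.
wals_per_second = 0.02
segment_mib = 16                      # default WAL segment size in MiB
seconds_per_week = 7 * 24 * 3600

# Uncompressed WAL volume accumulated per week, in GiB.
weekly_wal_gib = wals_per_second * segment_mib * seconds_per_week / 1024
assert round(weekly_wal_gib) == 189
```

So this hypothetical server would produce roughly 189 GiB of WAL files per week before compression - the kind of figure needed to size a Barman archive for a one-month retention policy.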

Before PostgreSQL 9.4, unless you:

  1. used tools for sampling and trending (e.g. Munin), or
  2. examined the timestamp of each WAL file, taking advantage of a high value of wal_keep_segments, or
  3. delegated this information to a custom script invoked by archive_command,

it was not possible to get any of the following pieces of information from the database server: number of archived WAL files, timestamp and name of the last archived WAL file, number of failures, timestamp and WAL name of the last failure, … and so on.

For this reason, last year I decided to write a small patch for Postgres which then became part of the core for version 9.4. This patch adds a real-time statistic in the PostgreSQL catalogue, called pg_stat_archiver.

Statistics overview

The pg_stat_archiver catalogue view in PostgreSQL 9.4 makes available to Barman users – and from a more general point of view to everyone using classic continuous backup with WAL file shipping – the following fields:

  • archived_count: number of WAL files successfully archived;
  • last_archived_wal: name of the last successfully archived WAL file;
  • last_archived_time: timestamp of the last successfully archived WAL file;
  • failed_count: number of failed WAL archiving attempts;
  • last_failed_wal: WAL name of the last archiving failure;
  • last_failed_time: timestamp of the last archiving failure;
  • stats_reset: timestamp of the last reset of statistics.

Here is an example taken from a local database server (with very low workload):

postgres=# SELECT * FROM pg_stat_archiver;
-[ RECORD 1 ]------+-----------------------------------------
archived_count     | 17
last_archived_wal  | 00000001000000000000000B.00000028.backup
last_archived_time | 2014-12-23 08:40:17.858291+01
failed_count       | 13
last_failed_wal    | 000000010000000000000001
last_failed_time   | 2014-12-04 13:09:07.348307+01
stats_reset        | 2014-12-03 16:52:21.755025+01

Basically, once continuous archiving is activated through archive_mode, PostgreSQL becomes responsible for ensuring that the archive_command program is successfully executed for each WAL file that is generated, retrying indefinitely in case of error (disk space permitting).

Unlike previous versions, PostgreSQL 9.4 is now able to collect some information regarding the two main events of the archiving process: success and failure. In particular, for both operations, the catalogue view reports:

  • count (since cluster initialisation or the last time statistics were reset);
  • WAL name of the last operation;
  • time of the last operation.

Moreover, you can reset the statistics of the archiver with the following SQL statement:

-- Requires superuser privileges
SELECT pg_stat_reset_shared('archiver');

Integration with Barman

From version 1.4 onwards, Barman automatically takes advantage of the pg_stat_archiver view for PostgreSQL 9.4 databases, transparently reporting information regarding the archiver process in some common informative commands such as status and show-server.

Furthermore, using the power of the SQL language, reliability of the check command has been improved so that an archiver problem is detected directly from the source.

Let’s walk through the query that has been introduced in Barman:

SELECT *,
    current_setting('archive_mode')::BOOLEAN
        AND (last_failed_wal IS NULL
            OR last_failed_wal <= last_archived_wal)
        AS is_archiving,
    CAST (archived_count AS NUMERIC)
        / EXTRACT (EPOCH FROM age(now(), stats_reset))
        AS current_archived_wals_per_second
FROM pg_stat_archiver

In addition to retrieving all columns of the pg_stat_archiver view, the query calculates two fields, directly from the source:

  • is_archiving: is the process of WAL archiving in progress or not?
  • current_archived_wals_per_second: frequency of archived WALs per second.

The is_archiving field must be TRUE, as the process of archiving is necessary for Barman to work correctly. Therefore, archive_mode must be active and the value of the last failed WAL must be either undefined (NULL) or not higher than the last properly archived WAL. This control is now part of the barman check command for Postgres 9.4 servers (and subsequent versions).
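The same check, transcribed into Python purely for illustration (WAL file names compare correctly as plain strings, because they are fixed-width hexadecimal names):

```python
def is_archiving(archive_mode, last_failed_wal, last_archived_wal):
    # archive_mode must be on, and the last failure (if any) must not
    # be more recent than the last successfully archived WAL.
    return bool(archive_mode) and (
        last_failed_wal is None
        or (last_archived_wal is not None
            and last_failed_wal <= last_archived_wal))

# WAL names patterned after the example output above: the last failure
# predates the last success, so archiving is considered healthy.
assert is_archiving(True, "000000010000000000000001",
                    "00000001000000000000000B")
assert not is_archiving(True, "00000001000000000000000C",
                        "00000001000000000000000B")
```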

The second field, on the other hand, returns a very interesting statistic on the workload produced by the Postgres server. This metric allows operation managers to estimate the disk space that is required to store days, weeks and months of WAL files (even compressed), thus responding to one of the major initial questions.

How to check the operating status of archiving

Thanks to pg_stat_archiver, checking the status of WAL continuous archiving comes down to the execution of a single SQL query.

As a result, you can use the above query, already used by Barman, to verify that archiving is correctly working and integrate it in the probes and plugins used by your company’s alerting system.


Those using Barman with a PostgreSQL 9.4 server that have already integrated barman check within Nagios or Icinga will transparently enjoy this feature.


In its simplicity, the pg_stat_archiver view is a very important tool for those who consider disaster recovery a critical component – not a peripheral one – of a PostgreSQL database in a business continuity system.

Although PostgreSQL allows you to perform backups using streaming replication, continuous archiving of WAL files through shipping is a very safe and reliable fallback method (and still the only one so far supported by Barman). Therefore, proper monitoring of this component significantly increases the robustness of the entire Postgres database solution.

Finally, being able to access some statistics about the number of archived WAL files by a PostgreSQL server in a given period through a simple SQL query is an important step towards the analysis of transactional workload and the prediction of disk usage for a backup solution.

US PostgreSQL Association: How We Selected Talks for PGConf US 2015

From Planet PostgreSQL. Published on Jan 22, 2015.

In the spirit of open-source, we would like to share how we handled the talk selection process for PGConf US 2015. This post will discuss the entire process of how a talk ends up in one of our United States PostgreSQL Association conferences; our goal is to help you understand what our conference is looking for in talk proposals and help you decide what to submit for PGConf US 2016!

read more

Leo Hsu and Regina Obe: Installing PostGIS packaged address_standardizer on Ubuntu

From Planet PostgreSQL. Published on Jan 22, 2015.

One of the changes coming to you in PostGIS 2.2 is additional extensions. Two that are close to my heart are the address_standardizer (which was a separate project before being folded into PostGIS for the upcoming 2.2) and the SFCGAL extension for doing very advanced 3D stuff. We had a need to have the address standardizer running on our Ubuntu box, but since PostGIS 2.2 isn't released yet, you can't get it without some compiling. Luckily the steps are fairly trivial if you are already running PostGIS 2.1. In this article, I'll walk through building and installing just the address_standardizer extension from the PostGIS 2.2 code base. Though I'm doing this on Ubuntu, the instructions are pretty much the same on any Linux; just substitute your Linux package manager.

Continue reading "Installing PostGIS packaged address_standardizer on Ubuntu"

gabrielle roth: PDXPUG: January Lab Next Week: postgres_fdw

From Planet PostgreSQL. Published on Jan 22, 2015.

When: Thu Jan 29, 6pm-8pm
Where: Renewable Funding, 400 SW 6th Ave

We’ll go over Foreign Data Wrappers, focusing on the Postgres FDW.

Our labs are casual, attendee-led affairs with no agenda set in advance. The goal is to share what we know & learn together.

Show up with a couple of Pg databases – they can be on the same cluster or different ones, doesn’t really matter. One of them should have at least one table with data in it.

We do need to know who’s coming so we can notify building security, so please sign up in advance here.

Grab dinner at the carts & come on over!

damien clochard: PoWA 1.2.1 is out !

From Planet PostgreSQL. Published on Jan 22, 2015.

We’ve just released a new version of PoWA, the PostgreSQL Workload Analyzer. Check out the complete release notes here.

This new version adds a highly requested feature: you can now collect stats from multiple PostgreSQL servers using a single PoWA instance. This should be useful if you have many servers to monitor. We’ve also improved the UI in many ways: the graphs should be easier to read in full screen, and the install process is more convenient.

This version is probably the last major release of the 1.x branch. We’re currently working on a branch that will change almost everything: PoWA 2.x will only be compatible with PostgreSQL 9.4, because we want to use the new stats and features of the latest version. This should allow PoWA 2.x to do some nice things, like finding missing indexes or displaying advanced filesystem stats. The 2.x version will also have a brand new user interface; we hope this will be similar to what we did when we completely rewrote the pgBadger interface 2 years ago…

Here’s a preview:

PoWA 2.x screenshot


Of course, we will continue to maintain PoWA 1.x for the users who installed it on PostgreSQL 9.3 servers.

If you go to FOSDEM 2015 next week, come see us ! We will be presenting these new features and more !

If you can’t go to FOSDEM 2015, subscribe to the powa-users mailing list to receive news and updates.

Denish Patel: Postgres – ERROR: Unsupported startup parameter: replication

From Planet PostgreSQL. Published on Jan 22, 2015.

Postgres provides much clearer error messages compared to the other databases/data-stores I have worked with at my $DAYJOB. However, someone reported the following error on their secondary (replicated) Postgres database server.

2015-01-22 15:20:49.247 GMT 56914 LOG: unexpected pageaddr 142D/73000000 in log file 5166, segment 75, offset 0
2015-01-22 15:20:49.413 GMT 42933 FATAL: could not connect to the primary server: ERROR: Unsupported startup parameter: replication

At first sight it looks scary, but it's actually not!

I googled a bit but couldn't find a quick answer, so I investigated further to understand what the reported error means. Fortunately, replication was up-to-date, because most of our clients have been recommended or set up to use hybrid replication (WAL shipping using OmniPITR + streaming replication).

While discussing further with the team, it turned out they had introduced pgbouncer, a connection pooling solution, into their architecture to better manage connections on the db server. pgbouncer was set up to run on port 5432 (the default Postgres port) on the master database server, and Postgres was moved to listen on port 5433. This change wasn't reflected in the secondary server's recovery.conf file, which was still pointing to port 5432. We had found the cause of the error, and the mystery was solved!!

The recovery.conf file was modified to use port 5433, and the secondary database was restarted to load the config change. Yay!! The secondary database is now connected to the primary using streaming replication.
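A sketch of the relevant recovery.conf setting after the fix (the host and user values are placeholders, not the ones from the incident):

```
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5433 user=replicator'
```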

2015-01-22 15:41:09.357 GMT 49294 LOG: database system is ready to accept read only connections
2015-01-22 15:41:09.512 GMT 49296 LOG: restored log file "000000030000142E0000006A" from archive
2015-01-22 15:41:10.016 GMT 49296 LOG: unexpected pageaddr 142D/8E000000 in log file 5166, segment 107, offset 0
2015-01-22 15:41:10.182 GMT 49408 LOG: streaming replication successfully connected to primary

I wanted to share my experience in case you see this error in your own Postgres installation. Hope this helps :)

David Christensen: PGConf.US NYC: "Choosing a Logical Replication System"

From Planet PostgreSQL. Published on Jan 22, 2015.

I'm excited to have my talk "Choosing a Logical Replication System" accepted to PGConf.US! I'll be speaking on Friday, March 27th from 2:00 - 2:50, as part of the Strategy track.

In this talk I will cover a variety of existing Logical Replication systems for PostgreSQL and go over some of the differences between requirements, supported capabilities, and why you might choose one system over another. I'll also cover some of the changes in PostgreSQL 9.4.

Read about the talk here.

Dimitri Fontaine: My First Slashdot Effect

From Planet PostgreSQL. Published on Jan 21, 2015.

Thanks to the Postgres Weekly issue #89 and a post that reached the Hacker News front page (see Pgloader: A High-speed PostgreSQL Swiss Army Knife, Written in Lisp), it seems that I just had my first Slashdot effect...

Well actually you know what? I don't...

So please consider using the new mirror and maybe voting on Hacker News for either tooling around your favorite database system, PostgreSQL or your favorite programming language, Common Lisp...

It all happens at

Coming to FOSDEM?

If you want to know more about pgloader and are visiting FOSDEM PGDAY or plain FOSDEM I'll be there talking about Migrating to PostgreSQL, the new story (that's pgloader) and about some more reasons why You'd better have tested backups...

If you're not there on the Friday but still want to talk about pgloader, join us at the PostgreSQL devroom and booth!

Glyn Astill: Copying Pavel Stehule’s simple history table but with the jsonb type

From Planet PostgreSQL. Published on Jan 19, 2015.

On 15/01/2015 Pavel Stehule wrote about implementing a dead simple history table using the hstore type. On Friday evening I wanted to copy this almost line for line, switching the hstore type for jsonb, but I couldn't really see how to replicate the update part so simply without creating a delete operator. Once that […]

US PostgreSQL Association: Accepting code donations from the Ukraine (Crimea) and other sanctioned countries

From Planet PostgreSQL. Published on Jan 19, 2015.

President Obama recently signed Executive Order #13685 [1]; in short, this Order states:

(a) The following are prohibited:
(i) new investment in the Crimea region of Ukraine by a United States person, wherever located;
(ii) the importation into the United States, directly or indirectly, of any goods, services, or technology from the Crimea region of Ukraine;
(iii) the exportation, reexportation, sale, or supply, directly or indirectly, from the United States, or by a United States person, wherever located, of any goods, services, or technology to the Crimea region of Ukraine;

read more

Binod Nirvan: MixERP PostgreSQL Database Documenter

From Planet PostgreSQL. Published on Jan 19, 2015.

We chose the PostgreSQL database for our ERP software MixERP, which will reach its Beta 1 release very soon. Unlike other ERP solutions that support PostgreSQL, we do not use PostgreSQL merely to store tables; we try to take full advantage of PostgreSQL's power and capabilities. Just to remind you, MixERP is an open source ERP solution that has been under development for a year. Despite being a very small team, we have been getting a substantial number of requests for our database design.

Since we needed a good documentation generator, I investigated the existing tools that could do the job. I was not pleased with any of them, so I thought I would give it a try myself.

For the task I used Npgsql and, of course, PostgreSQL as well. During the last few days I was on the PostgreSQL documentation site all the time:

The documentation site covered the metadata thoroughly. Everything I needed was already there, except that I could not find a way to ask PostgreSQL for the definitions of composite types. Interestingly, pgAdmin3 could tell me exactly how a type was defined. Upon investigating what pgAdmin was doing under the hood, I was able to create a function to extract that information. :)
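For illustration, here is a hedged sketch of the kind of catalog query that recovers composite type definitions; the query shape is my own and not necessarily what pgAdmin3 actually runs:

```sql
-- List the attributes of all user-defined composite types.
-- relkind = 'c' marks the hidden pg_class entry backing each composite type.
SELECT t.typname                               AS type_name,
       a.attname                               AS attribute,
       format_type(a.atttypid, a.atttypmod)    AS data_type
  FROM pg_type t
  JOIN pg_class c     ON c.oid = t.typrelid AND c.relkind = 'c'
  JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum > 0
                     AND NOT a.attisdropped
 ORDER BY t.typname, a.attnum;
```

From this result set, reassembling a CREATE TYPE statement is a matter of string aggregation.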

Coming back to my project: MixERPPgDoc is a small console application exposing a few arguments, and it packs all necessary dependencies inside itself. Since it is a self-contained executable file, you should be able to grab it quickly and play around with it. MixERPPgDoc creates HTML documentation on the fly. I used a new CSS component library called SemanticUI for the design, and prism.js for code highlighting.


Usage:

mixerp-pgdoc.exe -s=[server] -d=[database] -u=[pg_user] -p=[pwd] -o=[output_dir]

Example:

mixerp-pgdoc.exe -s=localhost -d=mixerp -u=postgres -p=secret -o="c:\mixerp-doc"

Example Documentation Site (Produced by MixERPPgDoc)

MixERPPgDoc is free software, even for commercial use. Do not run it against your production database; I am not liable if you screw things up.

If you like it, please do share. MixERP project needs you. :)

Michael Paquier: Postgres ODBC driver: libpq to govern them all

From Planet PostgreSQL. Published on Jan 18, 2015.

Many improvements are being made in the ODBC driver for PostgreSQL these days, one example being the addition of more and more integrated regression tests ensuring code quality. One particularly interesting feature has been committed recently:

commit: e85fbb24249ae81975b1b2e46da32479de0b58d6
author: Heikki Linnakangas <>
date: Wed, 31 Dec 2014 14:49:20 +0200
Use libpq for everything.

Instead of speaking the frontend/backend protocol directly, use libpq's
functions for executing queries. This makes libpq a hard dependency, but
removes direct dependencies to SSL and SSPI, and a lot of related code.

This feature can be summed up in a single word: simplification. Before discussing it, see for example how much cleanup this has brought to the driver code, in terms of numbers:

$ git log -n1 e85fbb2 --format=format: --shortstat
53 files changed, 1720 insertions(+), 8173 deletions(-)

Note the total size of the source code after this commit: it has been reduced by a bit more than 10%, which is huge!

# All code
$ git ls-files "*.[c|h]" | xargs wc -l | tail -n1
55998 total
# Regression tests
$ cd test/ && git ls-files "*.[c|h]" | xargs wc -l | tail -n1
5910 total

Now, let's consider the advantages that this new feature has.

First of all, libpq is an in-core PostgreSQL library managing communication with the backend server, well-maintained by the core developers of Postgres. Before the all-libpq move, it was a soft dependency in Postgres ODBC: the driver was usable through SSPI, SSL, or neither, thanks to the additional code it carried to speak the backend/frontend communication protocol directly, whereas libpq exposes APIs for exactly this purpose. A large portion of the simplification comes from removing that code (along with the code managing the communication socket and SSPI).

Hence, this move is an excellent thing, particularly for the Windows installer of Postgres ODBC: until now it needed to bundle a version of OpenSSL, which can be vulnerable depending on the issues found in it (remember Heartbleed to convince yourself). The driver's MSI installer therefore needed an update each time OpenSSL was bumped to a new version, making its maintenance more frequent. With libpq as a hard dependency those updates are no longer needed; only PostgreSQL itself needs an update when a vulnerability is found in OpenSSL, for example.

What would be a next step now that the driver side is simpler? More work on Postgres itself. That effort has already begun with some legwork in the upcoming 9.5 to allow support of other SSL implementations, making the infrastructure more pluggable with commit 680513a. This opens the door for things like SChannel on Windows; new implementations are interesting because they give more freedom in the choice of dependencies for a custom build.

Pavel Stehule: how to push parameters to DO statement from command line

From Planet PostgreSQL. Published on Jan 18, 2015.

The PostgreSQL DO statement doesn't support parametrization. But with psql variables we can safely "inject" parameters into a DO statement, like this:
bash-4.1$ cat 

echo "
set myvars.msgcount TO :'msgcount';
DO \$\$
FOR i IN 1..current_setting('myvars.msgcount')::int LOOP
END \$\$" | psql postgres -v msgcount=$1

bash-4.1$ ./ 3
Time: 0.386 ms
Time: 1.849 ms

Glyn Astill: Wanting for a hstore style delete operator in jsonb

From Planet PostgreSQL. Published on Jan 16, 2015.

PostgreSQL 9.4 introduced the jsonb type, but it’d be nice to be able to delete keys and pairs using the “-” operator, just like you can with the hstore type. Fortunately Postgres makes creating an operator really easy for us, so let’s have a go at creating a delete operator for jsonb. First let’s try […]
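The post itself is truncated here, but the general shape of such an operator can be sketched as follows. This is my own illustration, not necessarily the author's implementation (and note that PostgreSQL 9.5 later shipped a built-in jsonb - text operator):

```sql
-- Sketch: delete one key from a jsonb value, 9.4-compatible.
CREATE FUNCTION jsonb_delete_key(j jsonb, k text)
RETURNS jsonb AS $$
  SELECT COALESCE(
           (SELECT json_object_agg(key, value)::jsonb
              FROM jsonb_each(j)      -- expand to (key, value) pairs
             WHERE key <> k),         -- drop the matching key
           '{}'::jsonb);
$$ LANGUAGE sql IMMUTABLE;

-- Wire it up as a "-" operator, hstore-style.
CREATE OPERATOR - (LEFTARG = jsonb, RIGHTARG = text,
                   PROCEDURE = jsonb_delete_key);

SELECT '{"a": 1, "b": 2}'::jsonb - 'b';
```

With the operator in place, the update branch of a history trigger can compute key-level differences much like the hstore original does.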

Dimitri Fontaine: New release: pgloader 3.2

From Planet PostgreSQL. Published on Jan 16, 2015.

PostgreSQL comes with an awesome bulk copy protocol and tooling, best known as the COPY and \copy commands. Being a transactional system, PostgreSQL's COPY implementation will ROLLBACK any work done if a single error is found in the data set you're importing. That's the reason why pgloader got started: it provides error handling for the COPY protocol.

That's basically what pgloader used to be all about.

As soon as we had the capability to load data from unreliable sources, another use case appeared on the horizon, and soon enough pgloader grew the capacity to load data from other databases, some of which have a more liberal notion of what constitutes sane data type input.

To be able to adapt to advanced use cases in database data migration support, pgloader has grown an advanced command language wherein you can define your own load-time data projection and transformations, and your own type casting rules too.

New in version 3.2 is that in simple cases, you don't need that command file any more. Check out the pgloader quick start page to see some examples where you can use pgloader all from your command line!

Here's one such example, migrating a whole MySQL database data set over to PostgreSQL, including automated schema discovery, automated type casting and on-the-fly data cleanup (think about zero dates or booleans in tinyint(1) disguise), support for indexes, primary keys, foreign keys and comments. It's as simple as:

$ createdb sakila
$ pgloader mysql://root@localhost/sakila pgsql:///sakila
2015-01-16T09:49:36.068000+01:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2015-01-16T09:49:36.074000+01:00 LOG Data errors in '/private/tmp/pgloader/'
                    table name       read   imported     errors            time
------------------------------  ---------  ---------  ---------  --------------
               fetch meta data         43         43          0          0.222s
                  create, drop          0         36          0          0.130s
------------------------------  ---------  ---------  ---------  --------------
                         actor        200        200          0          0.133s
                       address        603        603          0          0.035s
                      category         16         16          0          0.027s
                          city        600        600          0          0.018s
                       country        109        109          0          0.017s
                      customer        599        599          0          0.035s
                          film       1000       1000          0          0.075s
                    film_actor       5462       5462          0          0.147s
                 film_category       1000       1000          0          0.035s
                     film_text       1000       1000          0          0.053s
                     inventory       4581       4581          0          0.086s
                      language          6          6          0          0.041s
                       payment      16049      16049          0          0.436s
                        rental      16044      16044          0          0.474s
                         staff          2          2          0          0.170s
                         store          2          2          0          0.010s
        Index Build Completion          0          0          0          0.000s
------------------------------  ---------  ---------  ---------  --------------
                Create Indexes         40         40          0          0.343s
               Reset Sequences          0         13          0          0.026s
                  Primary Keys         16         14          2          0.013s
                  Foreign Keys         22         22          0          0.078s
                      Comments          0          0          0          0.000s
------------------------------  ---------  ---------  ---------  --------------
             Total import time      47273      47273          0          2.261s

Other options are available to support a variety of input file formats, including compressed csv files found on a remote location, as in:

curl \
    | gunzip -c                                                        \
    | pgloader --type csv                                              \
               --field "usps,geoid,aland,awater,aland_sqmi,awater_sqmi,intptlat,intptlong" \
               --with "skip header = 1"                                \
               --with "fields terminated by '\t'"                      \
               -                                                       \

2015-01-16T10:09:06.027000+01:00 LOG Main logs in '/private/tmp/pgloader/pgloader.log'
2015-01-16T10:09:06.032000+01:00 LOG Data errors in '/private/tmp/pgloader/'
                    table name       read   imported     errors            time
------------------------------  ---------  ---------  ---------  --------------
                         fetch          0          0          0          0.010s
------------------------------  ---------  ---------  ---------  --------------
             districts_longlat        440        440          0          0.087s
------------------------------  ---------  ---------  ---------  --------------
             Total import time        440        440          0          0.097s

As is usual with unix commands, the - input filename stands for standard input, allowing data to be streamed from a remote compressed file straight into PostgreSQL.

So if you have any data loading job, including data migrations from SQLite, MySQL or MS SQL server: have a look at pgloader!

David Kerr: Introducing MrTuner and RDSTune

From Planet PostgreSQL. Published on Jan 15, 2015.

Two simple gems to help tune your Postgres databases.

I'm a big fan of PgTune. I think that in many cases you can run PgTune and set-and-forget your Postgres parameters. I like it so much that I often wish I had access to it in my code, especially when working with Puppet to provision new database servers.

When I started looking into RDS Postgres a while back I realized that the default configuration for those instances was lacking and I really wished I could run PgTune on the RDS instances.

These two projects were formed to solve the problems above:

  • RDSTune will create a MrTuner-ized RDS Parameter Group
  • MrTuner is a Ruby gem that follows in the spirit of PgTune, if not directly in its footsteps.

Both will run from the command line but, more importantly, they can be `required` by your ruby projects to allow you to access these values programmatically. 

Both gems are available on rubygems; source, examples, configuration and docs are available at their respective bitbucket pages.

RDSTune -
MrTuner -

Feedback and Pull requests very welcome!

Pavel Stehule: a most simple implementation of a history table with the hstore extension

From Planet PostgreSQL. Published on Jan 15, 2015.

Postgres has had this nice extension (hstore) for many years. It can be used to simulate some features of document databases, or to implement a generic trigger for a history table:

I have a table test and a table history:

CREATE TABLE test(a int, b int, c int);

CREATE TABLE history(
  event_time timestamp(2),
  executed_by text,
  origin_value hstore,
  new_value hstore
);

-- The trigger functions were truncated in this copy; the reconstruction
-- below (including the function names) is illustrative.
CREATE OR REPLACE FUNCTION history_insert() RETURNS trigger AS $$
BEGIN
  INSERT INTO history(event_time, executed_by, new_value)
  VALUES(CURRENT_TIMESTAMP, SESSION_USER, hstore(NEW));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_delete() RETURNS trigger AS $$
BEGIN
  INSERT INTO history(event_time, executed_by, origin_value)
  VALUES(CURRENT_TIMESTAMP, SESSION_USER, hstore(OLD));
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION history_update() RETURNS trigger AS $$
DECLARE
  hs_new hstore := hstore(NEW);
  hs_old hstore := hstore(OLD);
BEGIN
  -- store only the pairs that actually changed, in both directions
  INSERT INTO history(event_time, executed_by, origin_value, new_value)
  VALUES(CURRENT_TIMESTAMP, SESSION_USER, hs_old - hs_new, hs_new - hs_old);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER test_history_insert AFTER INSERT ON test
  FOR EACH ROW EXECUTE PROCEDURE history_insert();

CREATE TRIGGER test_history_delete AFTER DELETE ON test
  FOR EACH ROW EXECUTE PROCEDURE history_delete();

CREATE TRIGGER test_history_update AFTER UPDATE ON test
  FOR EACH ROW EXECUTE PROCEDURE history_update();

INSERT INTO test VALUES(1000, 1001, 1002);
UPDATE test SET a = 10, c = 20;
DELETE FROM test;

postgres=# SELECT * FROM history;
       event_time       | executed_by |           origin_value            |               new_value
------------------------+-------------+-----------------------------------+---------------------------------------
 2015-01-15 20:59:05.52 | pavel       |                                   | "a"=>"1000", "b"=>"1001", "c"=>"1002"
 2015-01-15 20:59:05.6  | pavel       | "a"=>"1000", "c"=>"1002"          | "a"=>"10", "c"=>"20"
 2015-01-15 20:59:06.51 | pavel       | "a"=>"10", "b"=>"1001", "c"=>"20" |
(3 rows)

Tested on PostgreSQL 9.2

Denish Patel: Postgres work_mem setting is a powerful performance tuning knob!

From Planet PostgreSQL. Published on Jan 15, 2015.

Postgres provides various parameters in postgresql.conf for performance tuning, to help make better use of database server resources. At OmniTI, one of our clients' applications was reporting slowness during high-intensity but very critical daily operations. The Postgres DB server services a mail generation application at 3-4K TPS and contains more than 400K partitioned tables. The systems, including the DB servers, are monitored through Circonus, so it was easy to review graph patterns to find out whether anything had changed recently.

While investigating the graph patterns, we came across a significantly high IO increase (~10-11K per sec) on the Circonus IOSTAT graph around the end of November, which matched a software upgrade performed in the last week of Nov 2014. As you can see below in the graph, IO increased to 10-11K per second after the upgrade.


It was easy to dig further because daily pgbadger log analysis reports were ready for review. Since the software upgrade, the most frequently called stored function had been generating ~8TB of temp files per day! Digging further into the report, the average size of the generated temp files was around ~100MB. As you might be aware, hash and sort operations happen in memory only as long as they fit within the work_mem setting; if an operation is larger than work_mem, it spills to disk. work_mem was set to 50MB. It was clear that the last software upgrade introduced a change in the function and its underlying query, and that this change was the root cause of the spike in disk IO activity observed in the IOSTAT graph.
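As an aside, temp-file spills of this kind can also be spotted from Postgres itself; the settings and views sketched below exist in 9.2 and later:

```sql
-- Log every temporary file created (value is a size threshold in kB; 0 = log all)
SET log_temp_files = 0;

-- Cumulative temp file usage per database since the last stats reset
SELECT datname, temp_files, temp_bytes
  FROM pg_stat_database
 ORDER BY temp_bytes DESC;

-- For an individual query, EXPLAIN ANALYZE reports on-disk sorts as
-- "Sort Method: external merge  Disk: ..." instead of "quicksort  Memory: ..."
```

Cross-referencing these numbers with pgbadger reports makes it straightforward to attribute the IO to specific queries.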

The first idea was to tune the query so that it avoids sorting on disk, but there were two challenges:

(1) It wasn’t easy to change the application code, because any change has to go through the application change process, which might take 1-2 months.

(2) The query accesses pg_catalog tables to gather details, and those tables are pretty large because of the large number of partitioned tables. For example, the pg_class table is 2597 MB.

However, the server specs are beefy! It has 252 GB of RAM in total, with enough free memory to allocate for in-memory sorting, so we decided to increase work_mem from 50MB to 128MB. Because Postgres allows work_mem to be changed without a database restart, it was easy to convince the client to buy into the proposed change. As you can see in the graph below, the change cut IO in half! That's an amazing improvement for a single online change :)
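For reference, work_mem can be raised online at several scopes, so a targeted change doesn't have to be cluster-wide. The role name below is hypothetical, and ALTER SYSTEM requires 9.4+:

```sql
SET work_mem = '128MB';                      -- current session only
ALTER ROLE app_user SET work_mem = '128MB';  -- one role (hypothetical role name)
ALTER SYSTEM SET work_mem = '128MB';         -- cluster-wide default, 9.4+
SELECT pg_reload_conf();                     -- apply the new setting without a restart
SHOW work_mem;                               -- verify the effective value
```

Scoping the increase to the affected role or session limits the risk, since work_mem is allocated per sort/hash node and a global increase multiplies across all concurrent queries.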


We are still working on the problematic query and some other temp-table generating queries to reduce IO on the system. Clearly, the work_mem knob helped a lot until the queries could be fixed permanently.

shutting down

By Adomas Paltanavičius from Planet Django. Published on Mar 08, 2014.

Is Open Source Consulting Dead?

By chrism from plope. Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemification

By chrism from plope. Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from plope. Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at

Why I Like ZODB

By chrism from plope. Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from plope. Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from plope. Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from plope. Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from plope. Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from plope. Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from plope. Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from plope. Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from plope. Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from plope. Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from plope. Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.

PyCon 2011 Report

By chrism from plope. Published on Mar 19, 2011.

My personal PyCon 2011 Report