Open Source Posts

Magnus Hagander: The end of PostgreSQL 8.4 - a small tribute

From Planet PostgreSQL. Published on Jul 24, 2014.

Today's release marks the end of life for PostgreSQL 8.4, after 5 years of service.

I've found it interesting that PostgreSQL 8.4 was probably the first release to actually make it possible to run fairly busy PostgreSQL installations without any hands-on at all. There were many important parts to 8.4 that made that happen, such as much better standby database support (though not hot standby yet - that came in 9.0) and better statistics and monitoring systems. 8.4 also came with Recursive CTEs (WITH-queries) and Window Aggregates, both of which made a whole new set of applications possible.
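
For anybody who has not used them yet, here is a minimal sketch of what those two features look like. The employees table and its columns are made up purely for illustration, they are not from any system discussed here:

-- Recursive CTE (8.4+): walk an employee hierarchy from the top down
WITH RECURSIVE subordinates AS (
    SELECT id, manager_id, name
      FROM employees
     WHERE manager_id IS NULL
    UNION ALL
    SELECT e.id, e.manager_id, e.name
      FROM employees e
      JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;

-- Window function (8.4+): rank salaries within each department
-- without collapsing the rows into groups
SELECT name, department,
       rank() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
  FROM employees;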

But I think nothing actually brought about this "run without hands-on" as much as the new code for the Visibility Map (VM) and in particular the automatic sizing of the Free Space Map (FSM). Anybody who deployed 8.3 or earlier in any kind of busy environment knows the pain of trying to configure max_fsm_pages correctly. It was almost impossible to get it right, and the value of "right" kept changing with your load and data. And the worst part of it all was that if you got it wrong you were in trouble - there wasn't really any remedy other than taking your database offline (or at least read-only) for manual full-database vacuums (there were tricks to get around the vacuum specifics, but the core problem remained). So what happened was that a lot of those people who knew what to do just increased that number to something that was "big enough", which usually meant "way too big", thus wasting resources. And even with that, they sometimes ran into it not being big enough because the workload changed.

In fact, I think more or less every single system I reviewed for customers on those versions had a substantially wrong value for max_fsm_pages - usually the default, because they had no idea what to do with it - and was paying the price for it.
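
For those who never ran those versions, here is a quick sketch of the knobs in question on 8.3 and earlier (both settings are gone as of 8.4, which is exactly the point):

-- 8.3 and earlier only; removed in 8.4 together with the old FSM code
SHOW max_fsm_pages;
SHOW max_fsm_relations;

-- a database-wide VACUUM VERBOSE ended with a summary of free space map
-- usage, and warned when max_fsm_pages was set too low
VACUUM VERBOSE;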

What is extra funny is that I've been talking to Heikki (who wrote the dynamic FSM and VM code) a few times recently, and he's still surprised that these changes had such a big impact on real-life deployments. That's probably because it's one of those things that is usually not a big problem at all in small installations, developer installations, and testing systems, but can quickly snowball in real-life production. I think many of us didn't realize, before the release actually made it out there, how useful it would be.

This feature, together with things like Recursive CTEs and Window Functions, really makes 8.4 one of the landmark PostgreSQL releases. It's easier to point to releases like 9.0, which had Streaming Replication and Hot Standby, because those are sexier and more marketable features, but infrastructure like this is more important than we often think.

8.4 got rid of some of the hardest things to tune. Let's stay on the path of trying to keep the system simple enough to use that it doesn't need handholding, and look forward to the upcoming 9.4 release!

Greg Sabino Mullane: Postgresql conflict handling with Bucardo and multiple data sources

From Planet PostgreSQL. Published on Jul 23, 2014.


Image by Flickr user Rebecca Siegel (cropped)

Bucardo's much publicized ability to handle multiple data sources often raises questions about conflict resolution. People wonder, for example, what happens when a row in one source database gets updated one way, and the same row in another source database gets updated a different way? This article will explain some of the solutions Bucardo uses to solve conflicts. The recently released Bucardo 5.1.1 has some new features for conflict handling, so make sure you use at least that version.

Bucardo does multi-source replication, meaning that users can write to more than one source at the same time. (This is also called multi-master replication, but "source" is a much more accurate description than "master"). Bucardo deals in primary keys as a way to identify rows. If the same row has changed on one or more sources since the last Bucardo run, a conflict has arisen and Bucardo must be told how to handle it. In other words, Bucardo must decide which row is the "winner" and thus gets replicated to all the other databases.

For this demo, we will again use an Amazon AWS instance. See the earlier post about Bucardo 5 for directions on installing Bucardo itself. Once it is installed (after the './bucardo install' step), we can create some test databases for our conflict testing. Recall that we have a handy database named "shake1". As this name can get a bit long for some of the examples below, let's make a few database copies with shorter names. We will also teach Bucardo about the databases, and create a sync named "ctest" to replicate between them all:

createdb aa -T shake1
createdb bb -T shake1
createdb cc -T shake1
bucardo add db A,B,C dbname=aa,bb,cc
## autokick=0 means new data won't replicate right away; useful for conflict testing!
bucardo add sync ctest dbs=A:source,B:source,C:source tables=all autokick=0
bucardo start

Bucardo has three general ways to handle conflicts: built-in strategies, a list of databases, or custom conflict handlers. The primary strategy, and also the default one for all syncs, is known as bucardo_latest. When this strategy is invoked, Bucardo scans all copies of the conflicted table across all source databases, and then orders the databases according to when they were last changed. This generates a list of databases, for example "B C A". For each conflicting row, the database most recently updated - of all the ones involved in the conflict for that row - is the winner. The other built-in strategy is called "bucardo_latest_all_tables", which scans all the tables in the sync across all source databases to find a winner.

There may be other built-in strategies added as experience/demand dictates, but it is hard to develop generic solutions to the complex problem of conflicts, so non-built-in strategies are preferred. Before getting into those other solutions, let's see the default strategy (bucardo_latest) in action:

## This is the default, but it never hurts to be explicit:
bucardo update sync ctest conflict=bucardo_latest
Set conflict strategy to 'bucardo_latest'
psql aa -c "update work set totalwords=11 where title~'Juliet'"; \
psql bb -c "update work set totalwords=21 where title~'Juliet'"; \
psql cc -c "update work set totalwords=31 where title~'Juliet'"
UPDATE 1
UPDATE 1
UPDATE 1
bucardo kick sync ctest 0
Kick ctest: [1 s] DONE!
## Because cc was the last to be changed, it wins:
for i in {aa,bb,cc}; do psql $i -tc "select current_database(), \
totalwords from work where title ~ 'Juliet'"; done
aa   |   31
bb   |   31
cc   |   31

Under the hood, Bucardo actually applies the list of winning databases to each conflicting row, such that the example above of "B C A" means that database B wins in a conflict in which a row was updated by B and C, or B and A, or B and C and A. However, if B did not change the row, and the conflict is only between C and A, then C will win.

As an alternative to the built-ins, you can set conflict_strategy to a list of the databases in the sync, ordered from highest priority to lowest, for example "C B A". The list does not have to include all the databases, but it is a good idea to do so. Let's see it in action. We will change the conflict_strategy for our test sync and then reload the sync to have it take effect:


bucardo update sync ctest conflict='B A C'
Set conflict strategy to 'B A C'
bucardo reload sync ctest
Reloading sync ctest...Reload of sync ctest successful
psql aa -c "update work set totalwords=12 where title~'Juliet'"; \
psql bb -c "update work set totalwords=22 where title~'Juliet'"; \
psql cc -c "update work set totalwords=32 where title~'Juliet'"
UPDATE 1
UPDATE 1
UPDATE 1
bucardo kick sync ctest 10
Kick ctest: [1 s] DONE!
## This time bb wins, because B comes before A and C
for i in {aa,bb,cc}; do psql $i -tc "select current_database(), \
totalwords from work where title ~ 'Juliet'"; done
aa   |   22
bb   |   22
cc   |   22

The final strategy for handling conflicts is to write your own code. Many will argue this is the best approach. It is certainly the only one that will allow you to embed your business logic into the conflict handling.

Bucardo allows loading of snippets of Perl code known as "customcodes". These codes take effect at specified times, such as after triggers are disabled, or when a sync has failed because of an exception. The specific time we want is called "conflict", and it is an argument to the "whenrun" attribute of the customcode. A customcode needs a name, the whenrun argument, and a file to read in for its content. They can also be associated with one or more syncs or tables.

Once a conflict customcode is in place and a conflict is encountered, the code will be invoked, and it will in turn pass information back to Bucardo telling it how to handle the conflict.

The code should expect a single argument, a hashref containing information about the current sync. This hashref names the current table and gives a list of all conflicted rows. The code can tell Bucardo which database to consider the winner for each conflicted row, or it can simply declare a winning database for all rows, or even for all tables. It can even modify the data in any of the tables itself. What it cannot do (thanks to the magic of DBIx::Safe) is commit, rollback, or do other dangerous actions since we are in the middle of an important transaction.

It's probably best to show by example at this point. Here is a file called ctest1.pl that asks Bucardo to skip to the next applicable customcode if the conflict is in the table "chapter". Otherwise, it tells Bucardo to have database "C" win all conflicts for this table, falling back to database "B" and then "A".

## ctest1.pl - a sample conflict handler for Bucardo
use strict;
use warnings;

my $info = shift;
## If this table is named 'chapter', do nothing
if ($info->{tablename} eq 'chapter') {
    $info->{skip} = 1;
}
else {
    ## Winning databases, in order
    $info->{tablewinner} = 'C B A';
}
return;

Let's add in this customcode, and associate it with our sync. Then we will reload the sync and cause a conflict.

bucardo add customcode ctest \
  whenrun=conflict src_code=ctest1.pl sync=ctest
Added customcode "ctest"
bucardo reload sync ctest
Reloading sync ctest...Reload of sync ctest successful
psql aa -c "update work set totalwords=13 where title~'Juliet'"; \
psql bb -c "update work set totalwords=23 where title~'Juliet'"; \
psql cc -c "update work set totalwords=33 where title~'Juliet'"
UPDATE 1
UPDATE 1
UPDATE 1
bucardo kick sync ctest 0
Kick ctest: [1 s] DONE!
## This time cc wins, because we set all rows to 'C B A'
for i in {aa,bb,cc}; do psql $i -tc "select current_database(), \
totalwords from work where title ~ 'Juliet'"; done
aa   |   33
bb   |   33
cc   |   33

We used the 'skip' hash value to tell Bucardo to not do anything if the table is named "chapter". In real life, we would have another customcode to handle the skipped table, else any conflict in it will cause the sync to stop. Any number of customcodes can be attached to syncs or tables.

The database preference will last for the remainder of this sync's run, so any other conflicts in other tables will not even bother to invoke the code. You can use the hash key "tablewinneralways" to make this decision sticky, in that it will apply for all future runs by this sync (its KID technically) - which effectively means the decision stays until Bucardo restarts.

One of the important structures sent to the code is a hash named "conflicts", which contains all the changed primary keys, and, for each one, a list of which databases were involved in the sync. A Data::Dumper peek at it would look like this:

$VAR1 = {
  'romeojuliet' => {
    'C' => 1,
    'A' => 1,
    'B' => 1,
  }
};

The job of the conflict handling code (unless using one of the "winner" hash keys) is to change each of those conflicted rows from a hash of involved databases into a string describing the preferred order of databases. The Data::Dumper output would thus look like this:

$VAR1 = {
  'romeojuliet' => 'B'
};

The code snippet would look like this:

## ctest2.pl - a simple conflict handler for Bucardo.
use strict;
use warnings;

my $info = shift;
for my $row (keys %{ $info->{conflicts} }) {
  ## Equivalent to 'A C B'
  $info->{conflicts}{$row} = exists $info->{conflicts}{$row}{A} ? 'A' : 'C';
}

## We don't want any other customcodes to fire: we have handled this!
$info->{lastcode} = 1;
return;

Let's see that code in action. Assuming the above "bucardo add customcode" command was run, we will need to load an updated version, and then reload the sync. We create some conflicts, and check on the results:


bucardo update customcode ctest src_code=ctest2.pl
Changed customcode "ctest" src_code with content of file "ctest2.pl"
bucardo reload sync ctest
Reloading sync ctest...Reload of sync ctest successful
psql aa -c "update work set totalwords=14 where title~'Juliet'"; \
psql bb -c "update work set totalwords=24 where title~'Juliet'"; \
psql cc -c "update work set totalwords=34 where title~'Juliet'"
UPDATE 1
UPDATE 1
UPDATE 1
bucardo kick sync ctest 10
Kick ctest: [2 s] DONE!
## This time aa wins, because we set all rows to 'A C B'
for i in {aa,bb,cc}; do psql $i -tc "select current_database(), \
totalwords from work where title ~ 'Juliet'"; done
aa   |   14
bb   |   14
cc   |   14

That was an obviously oversimplified example, as we picked 'A' for no discernible reason! These conflict handlers can be quite complex, and are only limited by your imagination - and your business logic. As a final example, let's have the code examine some other things in the database, and even jump out of the database itself (!) to determine the resolution to the conflict:

## ctest3.pl - a somewhat silly conflict handler for Bucardo.
use strict;
use warnings;
use LWP;

my $info = shift;

## What is the weather in Walla Walla, Washington?
## If it's really hot, we cannot trust server A
my $max_temp = 100;
my $weather_url = 'http://wxdata.weather.com/wxdata/weather/rss/local/USWA0476?cm_ven=LWO&cm_cat=rss';
my $ua = LWP::UserAgent->new;
my $req = HTTP::Request->new(GET => $weather_url);
my $response = $ua->request($req)->content();
my $temp = ($response =~ /(\d+) \°/) ? $1 : 75;
## Store in our shared hash so we don't have to look it up every run
## Ideally we'd add something so we only call it if the temp has not been checked in last hour
$info->{shared}{wallawallatemp} = $temp;

## We want to count the number of sessions on each source database
my $SQL = 'SELECT count(*) FROM pg_stat_activity';
for my $db (sort keys %{ $info->{dbinfo} }) {
    ## Only source databases can have conflicting rows
    next if ! $info->{dbinfo}{$db}{issource};
    ## The safe database handles are stored in $info->{dbh}
    my $dbh = $info->{dbh}{$db};
    my $sth = $dbh->prepare($SQL);
    $sth->execute();
    $info->{shared}{dbcount}{$db} = $sth->fetchall_arrayref()->[0][0];
}

for my $row (keys %{ $info->{conflicts} }) {
    ## If the temp is too high, remove server A from consideration!
    if ($info->{shared}{wallawallatemp} > $max_temp) {
        delete $info->{conflicts}{$row}{A}; ## May not exist, but we delete anyway
    }

    ## Now we can sort by number of connections and let the least busy db win
    (my $winner) = sort {
        $info->{shared}{dbcount}{$a} <=> $info->{shared}{dbcount}{$b}
        or
        ## Fallback to reverse alphabetical if the session counts are the same
        $b cmp $a
    } keys %{ $info->{conflicts}{$row} };

    $info->{conflicts}{$row} = $winner;
}

## We don't want any other customcodes to fire: we have handled this!
$info->{lastcode} = 1;
return;

We'll forego the demo: suffice to say that B always won in my tests, as Walla Walla never got over 97, and all my test databases had the same number of connections. Note some of the other items in the $info hash: "shared" allows arbitrary data to be stored across invocations of the code. The "lastcode" key tells Bucardo not to fire any more customcodes. While this example is very impractical, it does demonstrate the power available to you when solving conflicts.

Hopefully this article answers many of the questions about conflict handling with Bucardo. Suggestions for new default handlers and examples of real-world conflict handlers are particularly welcome, as well as any other questions or comments. You can find the mailing list at bucardo-general@bucardo.org, and subscribe by visiting the bucardo-general Info Page.

Gurjeet Singh: Announcing TPC-C.js; a Lightweight Implementation of TPC-C

From Planet PostgreSQL. Published on Jul 23, 2014.

I am glad to announce the beta release of TPC-C.js, which implements one of the most popular database benchmarks, TPC-C. It’s not a coincidence that today is also the 22nd anniversary of the TPC-C benchmark.

It currently supports the Postgres database, but can be easily extended to test other database systems.

You might ask “Why another TPC-C implementation when we already have so many of them?”

Short answer: This one is very light on system resources, so you can

  1. Run the benchmark strictly adhering to the specification, and
  2. Invest more in database hardware, rather than client hardware.

Long answer: It’s covered in the Motivation section of TPC-C.js, which I’ll quote here:

Motivation

The TPC-C benchmark drivers currently available to us, like TPCC-UVa, DBT2, HammerDB, BenchmarkSQL, etc., all run one process (or thread) per simulated client. Because the TPC-C benchmark specification limits the max tpmC metric (transactions per minute of benchmark-C) from any single client to be 1.286 tpmC, this means that to get a result of, say, 1 million tpmC we have to run about 833,000 clients. Even for a decent number as low as 100,000 tpmC, one has to run 83,000 clients.

Given that running a process/thread, even on modern operating systems, is a bit expensive, it requires a big upfront investment in hardware to run the thousands of clients required for driving a decent tpmC number. For example, the current TPC-C record holder had to run 6.8 million clients to achieve 8.55 million tpmC, and they used 16 high-end servers to run these clients, which cost them about $ 220,000 (plus $ 550,000 in client-side software).

So, to avoid those high costs, these existing open-source implementations of TPC-C compromise on one of the core requirements of the TPC-C benchmark: keying and thinking times. These implementations resort to just hammering the SUT (system under test) with a constant barrage of transactions from a few clients (ranging from 10-50).

So you can see that even though a decent modern database (running on a single machine) can serve a few hundred clients simultaneously, it ends up serving very few (10-50) clients. I strongly believe that this way the database is not being tested to its full capacity; at least not as the TPC-C specification intended.

The web servers of yesteryear also suffered from the same problem; using one process for each client request prohibited them from scaling, because the underlying operating system cannot run thousands of processes efficiently. The web servers solved this problem (known as the c10k problem) by using an event-driven architecture which is capable of handling thousands of clients using a single process, and with minimal effort on the operating system's part.

So this implementation of TPC-C uses a similar architecture, built on NodeJS and its event-driven model, to run thousands of clients against a database.

Hans-Juergen Schoenig: Optimization issues: Cross column correlation

From Planet PostgreSQL. Published on Jul 23, 2014.

Planner estimates have already been discussed on this blog in my previous posting and also in some posting before that. A couple of years ago I stumbled over an interesting issue which is commonly known as “cross correlation”. Let us consider the following example: test=# CREATE TABLE t_test (a int, b int); CREATE TABLE test=# […]

Andrew Dunstan: Code size

From Planet PostgreSQL. Published on Jul 21, 2014.

Someone was just talking about the size of some source files in PostgreSQL. The source code (.c, .h, .y and .l files) weighs in at a bit over 1 million lines of code. The documentation source has another roughly 300,000 lines. That's a large project, but by no means enormous by today's standards. The biggest source code file is pg_dump.c, at around 15,700 lines. The biggest documentation file is funcs.sgml, at around 17,600 lines. Both of these might well do with a bit of reorganization.

Michael Paquier: Postgres 9.5 feature highlight: Display failed queries in psql

From Planet PostgreSQL. Published on Jul 19, 2014.

Postgres 9.5 is coming up with a new ECHO mode for psql that has been introduced by this commit:

commit: 5b214c5dd1de37764797b3fb9164af3c885a7b86
author: Fujii Masao <fujii@postgresql.org>
date: Thu, 10 Jul 2014 14:27:54 +0900
Add new ECHO mode 'errors' that displays only failed commands in psql.

When the psql variable ECHO is set to 'errors', only failed SQL commands
are printed to standard error output. Also this patch adds -b option
into psql. This is equivalent to setting the variable ECHO to 'errors'.

Pavel Stehule, reviewed by Fabrízio de Royes Mello, Samrat Revagade,
Kumar Rajeev Rastogi, Abhijit Menon-Sen, and me.

Up to now, there have been two ECHO modes:

  • "all", to print to the standard output all the queries before they are parsed or executed. This can be set when starting psql with option -a.
  • "queries", to have psql print all the queries sent to server. This can be set additionally with option -e of psql.

The new mode is called "errors" and can be set either with the option -b when starting psql or with the \set command in a psql client like this:

=# \set ECHO errors

The feature added is simple: have psql print all the failed queries in the standard error output. The failed query is printed in an additional field prefixed with STATEMENT:

=# CREATE TABLES po ();
ERROR:  42601: syntax error at or near "TABLES"
LINE 1: CREATE TABLES po ();
           ^
LOCATION:  scanner_yyerror, scan.l:1053
STATEMENT:  CREATE TABLES po ();

If multiple queries are specified within a single input, only the query that failed is displayed:

=# CREATE TABLE aa (a int); CREATE FOO po; CREATE TABLE bb (a int);
CREATE TABLE
ERROR:  42601: syntax error at or near "FOO"
LINE 1: CREATE FOO po;
               ^
LOCATION:  scanner_yyerror, scan.l:1053
STATEMENT:  CREATE FOO po;
CREATE TABLE

Also, queries that are typed in multiple lines are shown as they are, spaces included:

=# SELECT
      col1_not_here,
      col2_not_here
   FROM
      table_not_here;
ERROR:  42P01: relation "table_not_here" does not exist
LINE 5:     table_not_here;
            ^
LOCATION:  parserOpenTable, parse_relation.c:986
STATEMENT:  SELECT
    col1_not_here,
    col2_not_here
FROM
    table_not_here;

Hans-Juergen Schoenig: Detecting wrong planner estimates

From Planet PostgreSQL. Published on Jul 17, 2014.

In 99% of all the cases the PostgreSQL planner is doing a perfect job to optimize your queries and to make sure, that you can enjoy high performance and low response times. The infrastructure ensuring this is both sophisticated as well as robust. However, there are some corner cases, which can turn out to be quite […]

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – psql: Show tablespace size in \db+

From Planet PostgreSQL. Published on Jul 15, 2014.

On 14th of July, Alvaro Herrera committed patch: psql: Show tablespace size in \db+   Fabrízio de Royes Mello As I previously mentioned – I'm a sucker for psql additions. And while today's patch is rather small, I really like it. What does it do? It simply makes \db+ show the size of all objects in a given tablespace. […]

gabrielle roth: PDXPUG: July meeting – OSCON BoF Session

From Planet PostgreSQL. Published on Jul 15, 2014.

When: 7-8pm Tuesday
Where: Oregon Convention Center, Room E147

We’re having a Birds of a Feather session at OSCON instead of our usual July meeting. Come hang out with Pg peeps who are in town for the conference! You do not need to be registered for the conference to attend the BoF.

There will be an additional social hour afterwards should we need it.

See you there!


Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Implement IMPORT FOREIGN SCHEMA.

From Planet PostgreSQL. Published on Jul 14, 2014.

On 10th of July, Tom Lane committed patch: Implement IMPORT FOREIGN SCHEMA.   This command provides an automated way to create foreign table definitions that match remote tables, thereby reducing tedium and chances for error. In this patch, we provide the necessary core-server infrastructure and implement the feature fully in the postgres_fdw foreign-data wrapper. Other […]

Magnus Hagander: Getting information about PostgreSQL SSL connections

From Planet PostgreSQL. Published on Jul 14, 2014.

PostgreSQL can, like many other products, use SSL to secure client/server communications. It can be configured to be optional or required, to require a client certificate, or even to use the SSL client certificate to log in. However, the DBA tools are currently a bit lacking in this regard. In particular, there is no way for a DBA to see what SSL parameters are in force (or indeed if SSL is enabled at all) for an already existing connection.

There are multiple ways to see the status of a specific connection (such as the libpq PQgetssl() function, the psql startup message, or the sslinfo module). Unfortunately all these methods have one thing in common - they are only available to the process making the connection to the database, not to a DBA inspecting the system from the outside.

9.4 will make it a little bit better, because log_connections now includes SSL information when the user connects, similar to:
LOG:  connection authorized: user=mha database=postgres SSL enabled (protocol=TLSv1.1, cipher=ECDHE-RSA-AES256-SHA)

But this can still be a bit of a pain to figure out for existing connections, of course.

To deal with this problem, I've written a small PostgreSQL extension called pg_sslstatus. When loaded using shared_preload_libraries it will keep track of the SSL status for each connection, and expose it through a system view named pg_sslstatus like this:

postgres=# SELECT * FROM pg_sslstatus;
  pid  | ssl | bits | compression | version |        cipher        |                         clientdn                         
-------+-----+------+-------------+---------+----------------------+----------------------------------------------------------
 27286 | t   |  256 | f           | TLSv1   | ECDHE-RSA-AES256-SHA | 
 26682 | t   |  256 | t           | TLSv1   | ECDHE-RSA-AES256-SHA | /C=AU/ST=Some-State/O=Internet Widgits Pty Ltd/CN=magnus
 26693 | f   |      |             |         |                      | 
(3 rows)

It will include one row for each connection, and can then be joined with either pg_stat_activity or pg_stat_replication to view the SSL data in a bigger context:

postgres=# SELECT datname, usename, client_addr, ssl, cipher FROM pg_stat_activity INNER JOIN pg_sslstatus ON pg_stat_activity.pid=pg_sslstatus.pid;
 datname  | usename | client_addr | ssl |        cipher        
----------+---------+-------------+-----+----------------------
 postgres | mha     | 127.0.0.1   | t   | ECDHE-RSA-AES256-SHA
 postgres | mha     |             | f   | 
 pgweb    | pgweb   |             | f   | 
(3 rows)

The extension is available from my github account today for existing versions of PostgreSQL, under the standard PostgreSQL license. My plan is to work on having a similar view included in PostgreSQL 9.5 by default, but it's unfortunately too late to include it in 9.4 at this point.

Michael Paquier: Postgres 9.5 feature highlight: IMPORT FOREIGN SCHEMA

From Planet PostgreSQL. Published on Jul 11, 2014.

IMPORT FOREIGN SCHEMA is a SQL command defined in the SQL specification that allows importing, from a foreign source, a schema made of foreign tables. Its support has been added in Postgres 9.5 with the following commit:

commit 59efda3e50ca4de6a9d5aa4491464e22b6329b1e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Jul 10 15:01:31 2014 -0400

Implement IMPORT FOREIGN SCHEMA.

This command provides an automated way to create foreign table definitions
that match remote tables, thereby reducing tedium and chances for error.
In this patch, we provide the necessary core-server infrastructure and
implement the feature fully in the postgres_fdw foreign-data wrapper.
Other wrappers will throw a "feature not supported" error until/unless
they are updated.

Ronan Dunklau and Michael Paquier, additional work by me

This feature is made of two parts:

  • A new API available for foreign data wrappers to support this command
  • Support for it in postgres_fdw, the foreign-data wrapper (FDW) for PostgreSQL available in core.

The new API available has the following shape:

List *
ImportForeignSchema (ImportForeignSchemaStmt *stmt, Oid serverOid);

ImportForeignSchemaStmt is a parsed representation of the raw IMPORT FOREIGN SCHEMA query and serverOid is the OID of the FDW server used for the import. The parsed statement contains all the information an FDW needs to rebuild a schema fetched from a remote source, mainly:

  • Type of import, available in stmt->list_type, along with the table list (not used for ALL):
    • FDW_IMPORT_SCHEMA_LIMIT_TO (LIMIT TO clause specified in the query) for a restricted list of table names to import
    • FDW_IMPORT_SCHEMA_EXCEPT (EXCEPT clause specified in the query) for a list of tables not to fetch during the import
    • FDW_IMPORT_SCHEMA_ALL (no LIMIT TO or EXCEPT clause in the query) to let the FDW know that all the tables from the foreign schema should be imported
  • Remote schema name
  • List of options to customize the import

Then this API needs to return a list of raw queries that the server will parse and apply as-is. The local schema is overridden by the server to avoid any abuse. The documentation should be used as a reference for more details as well.
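
To make that a bit more concrete, here is roughly the kind of statement such a callback could hand back for the remote_tab1 table used in the example below. This is only an illustration of the expected output, not code taken from postgres_fdw:

-- illustrative only: a raw query an ImportForeignSchema callback could return
CREATE FOREIGN TABLE public.remote_tab1 (
    a integer OPTIONS (column_name 'a') NOT NULL
) SERVER postgres_server
  OPTIONS (schema_name 'public', table_name 'remote_tab1');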

The second part of the feature is the support of IMPORT FOREIGN SCHEMA in postgres_fdw itself, allowing a schema to be imported from a different node. For example, let's take the case of two instances on the same server. The first node listens on port 5432 and uses postgres_fdw to connect to a second node listening on port 5433 (for more details on how to set that up, refer to a previous post on the subject or have a look directly at the official documentation).

On the remote node (listening on 5433) the two following tables are created in the default schema, aka "public":

=# CREATE TABLE remote_tab1 (a int not null);
CREATE TABLE
=# CREATE TABLE remote_tab2 (b timestamp default now());
CREATE TABLE

Importing them locally into schema public is a matter of running this command on the local node (the local schema name is defined with the INTO clause, and the foreign schema at the beginning of the query):

=# IMPORT FOREIGN SCHEMA public FROM SERVER postgres_server INTO public;
IMPORT FOREIGN SCHEMA
=# \d
                List of relations
 Schema |    Name     |     Type      | Owner  
--------+-------------+---------------+--------
 public | remote_tab1 | foreign table | ioltas
 public | remote_tab2 | foreign table | ioltas
(2 rows)

IMPORT FOREIGN SCHEMA offers some control over the list of tables imported with LIMIT TO and EXCEPT, so this query would import only the table remote_tab1 into schema test_import1:

=# CREATE SCHEMA test_import1;
CREATE SCHEMA
=# IMPORT FOREIGN SCHEMA public LIMIT TO (remote_tab1)
   FROM SERVER postgres_server INTO test_import1;
IMPORT FOREIGN SCHEMA
=# \d test_import1.*
      Foreign table "test_import1.remote_tab1"
  Column |  Type   | Modifiers |    FDW Options    
 --------+---------+-----------+-------------------
  a      | integer | not null  | (column_name 'a')
 Server: postgres_server
 FDW Options: (schema_name 'public', table_name 'remote_tab1')

And this query would import everything except remote_tab1 into schema test_import2:

 =# CREATE SCHEMA test_import2;
 CREATE SCHEMA
 =# IMPORT FOREIGN SCHEMA public EXCEPT (remote_tab1)
    FROM SERVER postgres_server INTO test_import2;
 IMPORT FOREIGN SCHEMA
 =# \d test_import2.*
                Foreign table "test_import2.remote_tab2"
  Column |            Type             | Modifiers |    FDW Options    
 --------+-----------------------------+-----------+-------------------
  b      | timestamp without time zone |           | (column_name 'b')
Server: postgres_server
FDW Options: (schema_name 'public', table_name 'remote_tab2')

By default, the import will try to import collations and NOT NULL constraints. So, coming back to what has been imported on schema public, relation remote_tab1 is defined like this, with a NOT NULL constraint:

=# \d remote_tab1
         Foreign table "public.remote_tab1"
  Column |  Type   | Modifiers |    FDW Options    
 --------+---------+-----------+-------------------
  a      | integer | not null  | (column_name 'a')
 Server: postgres_server
 FDW Options: (schema_name 'public', table_name 'remote_tab1')

Note that this can be controlled with the OPTIONS clause of IMPORT FOREIGN SCHEMA (see the sketch after this list), postgres_fdw offering 3 options:

  • import_collate to import collations, default is true
  • import_default to import default expressions, default is false
  • import_not_null to import NOT NULL constraints, default is true

import_default is the tricky part, particularly for volatile expressions. Import will also fail if the default expression is based on objects not created locally, like what would happen when trying to import a relation with a SERIAL column.
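
For reference, here is a sketch of the OPTIONS syntax, reusing the postgres_server defined above; test_import3 is just an extra schema created for this example:

=# CREATE SCHEMA test_import3;
CREATE SCHEMA
=# IMPORT FOREIGN SCHEMA public LIMIT TO (remote_tab2)
   FROM SERVER postgres_server INTO test_import3
   OPTIONS (import_default 'true', import_not_null 'false');
IMPORT FOREIGN SCHEMA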

Dave Cramer: PostgreSQL JDBC Driver version 9_3_1102 released

From Planet PostgreSQL. Published on Jul 10, 2014.

This is a maintenance release with a few interesting upgrades

Version 9.3-1102 (2014-07-10)

Author:epgrubmair bug #161
    fix copyOut close hanging bug #161 from epgrubmair

Author:romank0

    backpatch exception during close of fully read stream from romank0

Author:Christophe Canovas

    Added caching for ResultSetMetaData  complete commit

Author:Elizabeth Chatman
    NullPointerException in AbstractJdbc2DatabaseMetaData.getUDTs

    setNull, setString, setObject may fail if a specified type cannot be transferred in a binary mode #151

    backpatch fix for changing datestyle before copy

Author:TomonariKatsumata
    binary transfer fixes new feature -1 for forceBinaryTransfer

Author:Sergey Chernov
    connectTimeout property support backpatch
   
Author:Naoya Anzai
    fix prepared statement ERROR due to EMPTY_QUERY defined as static.

9.4 jars can also be found on the site 

Greg Sabino Mullane: Version differences via Github from the command line

From Planet PostgreSQL. Published on Jul 09, 2014.

I work with a lot of open source projects, and I use the command-line for almost everything. It often happens that I need to examine a file from a project, and thanks to bash, Github, and curl, I can do so easily, without even needing to have the repo handy. One of the things I do sometimes is compare a file across versions to see what has changed. For example, I needed to see what changes were made between versions 1.21 and 1.22 to the file includes/UserMailer.php which is part of the MediaWiki project. For this trick to work, the project must be on Github, and must label their versions in a consistent manner, either via git branches or git tags.

MediaWiki exists on Github as wikimedia/mediawiki-core. The MediaWiki project tags all of their releases in the format X.Y.Z, so in this example we can use the git tags 1.21.0 and 1.22.0. Github is very nice because you can view a specific file at a certain commit (aka a tag), and even grab it over the web as a plain text file. The format is:

https://raw.githubusercontent.com/PROJECTNAME/BRANCH-OR-TAG/FILE

Note that you can use a tag OR a branch! So to compare these two files, we can use one of these pairs:

https://raw.githubusercontent.com/wikimedia/mediawiki-core/REL1_21/includes/UserMailer.php
https://raw.githubusercontent.com/wikimedia/mediawiki-core/REL1_22/includes/UserMailer.php

https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.21.0/includes/UserMailer.php
https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.22.0/includes/UserMailer.php

All that is left is to treat git as a web service and compare the two files at the command line ourselves. The program curl is a great tool for downloading the files, as it dumps to stdout by default. We will add a -s flag (for "silent") to prevent it from showing the progress meter as it usually does. The last bit of the puzzle is to use <(), bash's process substitution feature, to trick diff into comparing the curl outputs as if they were files. So our final command is:

diff <(curl -s https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.21.0/includes/UserMailer.php) \
<(curl -s https://raw.githubusercontent.com/wikimedia/mediawiki-core/1.22.0/includes/UserMailer.php) \
| more

Voila! A quick and simple glance at what changed between those two tags. This should work for any project on Github. You can also replace the branch or tag with the word "master" to see the current version. For example, the PostgreSQL project lives on github as postgres/postgres. They use the format RELX_Y_Z in their tags. To see what has changed since release 9.3.4 in the psql help file (as a context diff), run:

diff -c <(curl -s https://raw.githubusercontent.com/postgres/postgres/REL9_3_4/src/bin/psql/help.c) \
<(curl -s https://raw.githubusercontent.com/postgres/postgres/master/src/bin/psql/help.c)

You are not limited to diff, of course. For a final example, let's see how many times Tom Lane is mentioned in the version 9 release notes:

for i in {0,1,2,3,4}
do grep -Fc 'Tom Lane' \
<(curl -s https://raw.githubusercontent.com/postgres/postgres/master/doc/src/sgml/release-9.$i.sgml)
done
272
206
174
115
16

The last number is so low relative to the rest because 9.4 is still under development. Rest assured Tom's contributions have not slowed down! :) Thanks to Github for providing such a useful service for so many open source projects, and for providing the raw text to allow useful hacks like this.

Kirk Roybal: DFWPUG Meetup August 6, 2014

From Planet PostgreSQL. Published on Jul 08, 2014.

Just a quick reminder that Dallas/Fort Worth PostgreSQL Users Group has a Meetup the first Wednesday of every month.

What: PostgreSQL-backed Websites: Middleware Considered Harmful

Who: Justin Tocci

When: Wednesday, August 6, 2014 7:00 PM

Where:
Improving Enterprises
16633 Dallas Parkway Suite 110 Addison, TX 75001

DFW PUG on Meetup

Paul Ramsey: FOSS4G 2014 in Portland, Oregon, September 2014

From Planet PostgreSQL. Published on Jul 07, 2014.

Just a quick public service announcement for blog followers in the Pacific Northwest and environs: you've got a once in a not-quite-lifetime opportunity to attend the "Free and Open Source Software for Geospatial" (aka FOSS4G) conference this year in nearby Portland, Oregon, a city so hip they have trouble seeing over their pelvis.

Anyone in the GIS / mapping world should take the opportunity to go, to learn about what technology the open source world has available for you, to meet the folks writing the software, and to learn from other folks like you who are building cool things.

September 8th-13th, be there and be square.

Baji Shaik: Updating pg_cast helps, however sometimes !!

From Planet PostgreSQL. Published on Jul 05, 2014.

One of our customers was migrating a table from SQL Server to PostgreSQL using EnterpriseDB's Migration ToolKit. This table has a column of boolean datatype. During migration, MTK converts the "boolean" datatype to "bit" in PostgreSQL, and the process was taking 6 hrs to complete. The customer wanted to change the datatype from "bit" to "integer", and the ALTER command for changing the type was taking another 6 hrs in PostgreSQL. If he migrates only the structure to PostgreSQL first and then changes the type from "bit" to "integer" before loading, the data load fails with the error below. If it were allowed, the whole migration would take only 6 hrs, since there would be no need to alter the type after the data load.

ERROR: column "hidehelm" is of type integer but expression is of type boolean
Hint: You will need to rewrite or cast the expression.

So I found a workaround: make the cast from "boolean" to "integer" implicit by updating the "pg_cast" catalog as below. With this, he can load the boolean data directly into the integer column, which saves the time of altering the type from bit to integer after migrating.

postgres=# insert into tarik values (1::boolean);
ERROR:  column "t" is of type integer but expression is of type boolean
LINE 1: insert into tarik values (1::boolean);
HINT:  You will need to rewrite or cast the expression.
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
castsource | casttarget | castfunc | castcontext | castmethod
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | e           | f
(1 row)
postgres=# update pg_cast set castcontext ='i' where castsource='boolean'::regtype and casttarget='int4'::regtype;
UPDATE 1
postgres=# select * from pg_cast where castsource='boolean'::regtype and casttarget='int4'::regtype;
 castsource | casttarget | castfunc | castcontext | castmethod 
------------+------------+----------+-------------+------------
         16 |         23 |     2558 | i           | f
(1 row)
postgres=# insert into tarik values (1::boolean);
INSERT 0 1

Irrespective of any side effects of this workaround (which I'm not aware of), it worked and the migration took only 6 hrs. Of course, the customer rolled back this setting after migrating the table. Updating catalogs is very dangerous, so maybe I should have concentrated instead on how to reduce the time of the ALTER command after migration? Anyway, it worked, so I was happy !! ;-)

Thanks for any suggestions/comments.

Baji Shaik: Oops I corrupted my table, of-course just to recover salvaged data.

From Planet PostgreSQL. Published on Jul 04, 2014.


The way I started the title might be confusing - "I corrupted my table" - and everyone's first reaction is "Crap!! Why did you do that!?", so let me justify it.
I see many customers coming to us to recover corrupted tables without any backup. In such cases it is hard to recover the tables completely and it takes a lot of work; however, we can recover the salvageable data if they don't care about the corrupted rows.

Let me corrupt the table first.. :-)

I created a million-row table called "to_be_damaged":
postgres=# select count(*) from to_be_damaged ;
  count
---------
 1000000
(1 row)
postgres=# select relfilenode,relname from pg_class where relname='to_be_damaged';
 relfilenode |    relname
-------------+---------------
      461257 | to_be_damaged
(1 row)
I've used "hexedit" to damage it. Open relfilenode file from OS level using hexedit and try picking a line which is the start of an 8K boundary and typing hex DE AD BE EF across it.

postgres=# select count(*) from to_be_damaged ;
ERROR:  invalid page in block 0 of relation base/12896/461257

Now create an identical table "salvaged" to receive the salvageable data from the "to_be_damaged" table.
postgres=# create table salvaged(t int);
CREATE TABLE
I prepared the function below, which copies the rows that are still salvageable:
create or replace function salvage_damaged()
  returns void
  language plpgsql
as $$
declare
  pageno int;
  tupno int;
  pos tid;
begin
  <<pageloop>>
  for pageno in 0..35930 loop  -- pg_class.relpages for the damaged table
    for tupno in 1..1000 loop
      pos = ('(' || pageno || ',' || tupno || ')')::tid;
      begin
        insert into salvaged select * from to_be_damaged where ctid = pos;
      exception
        when sqlstate 'XX001' then
          raise warning 'skipping page %', pageno;
          continue pageloop;
        when others then
          raise warning 'skipping row %', pos;
      end;
    end loop;
  end loop;
end;
$$;

Now run the function to copy the salvageable rows:
postgres# select salvage_damaged();
WARNING: skipping page 0
salvage_damaged
-----------------

(1 row)
postgres=# select count(*) from salvaged ;
count
----------
12999815
(1 row)

postgres=# select 13000000-12999815;
?column?
----------
185
(1 row)

I hope it helps someone. Thanks for reading.

Michael Paquier: Postgres 9.5 feature highlight: WHERE clause pushdown in subqueries with window functions

From Planet PostgreSQL. Published on Jul 04, 2014.

Postgres 9.5 is going to improve the performance of subqueries using window functions by allowing the pushdown of WHERE clauses within them. Here is a commit, done during commit fest 1, that is the origin of this improvement:

commit d222585a9f7a18f2d793785c82be4c877b90c461
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Fri Jun 27 23:08:08 2014 -0700

Allow pushdown of WHERE quals into subqueries with window functions.

We can allow this even without any specific knowledge of the semantics
of the window function, so long as pushed-down quals will either accept
every row in a given window partition, or reject every such row.  Because
window functions act only within a partition, such a case can't result
in changing the window functions' outputs for any surviving row.
Eliminating entire partitions in this way obviously can reduce the cost
of the window-function computations substantially.

David Rowley, reviewed by Vik Fearing; some credit is due also to
Thomas Mayer who did considerable preliminary investigation.

The pushdown of the WHERE qual is done only if two conditions are satisfied (a counter-example is sketched after this list):

  • Only the partitioning columns are referenced
  • The qual contains no volatile functions
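
To illustrate the first condition, here is a minimal sketch; the table t(part_col, val) is made up for this illustration and is not the data set used below:

-- Qual on the PARTITION BY column: can be pushed down, so rows of other
-- partitions are filtered out before the sort and window computation.
SELECT *
  FROM (SELECT part_col,
               sum(val) OVER (PARTITION BY part_col) AS part_sum
          FROM t) sub
 WHERE part_col = 42;

-- Qual on the window function output (not a partitioning column):
-- cannot be pushed down, the whole table still gets sorted and aggregated.
SELECT *
  FROM (SELECT part_col,
               sum(val) OVER (PARTITION BY part_col) AS part_sum
          FROM t) sub
 WHERE part_sum > 1000;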

Let's have a look at how things are improved by using a simple data set: a list of individuals referenced by an ID, with information about the city where they live and their respective age (the authorities lacked imagination for the city names).

=# CREATE TABLE population (
     person_id serial,
     age int,
     city text);
CREATE TABLE
=# INSERT INTO population (age, city)
     SELECT round(random() * 100),
            'city ' || round(random() * 200)
     FROM generate_series(1, 1000000);
INSERT 0 1000000

Now, here is a query that evaluates the average, minimum and maximum age of the cities where the population is living. In 9.4, this query returns the following plan:

=# EXPLAIN SELECT * FROM
     (SELECT city,
        avg(age) OVER (PARTITION BY city) avg_age,
        min(age) OVER (PARTITION BY city) min_age,
        max(age) OVER (PARTITION BY city) max_age
      FROM population) age_all
   WHERE city in ('city 26', 'city 47')
   GROUP BY avg_age, city, min_age, max_age;
                                         QUERY PLAN
 ---------------------------------------------------------------------------------------------
  HashAggregate  (cost=184834.34..184844.29 rows=995 width=48)
   Group Key: age_all.avg_age, age_all.city, age_all.min_age, age_all.max_age
    ->  Subquery Scan on age_all  (cost=149734.84..184734.84 rows=9950 width=48)
          Filter: (age_all.city = ANY ('{"city 26","city 47"}'::text[]))
          ->  WindowAgg  (cost=149734.84..172234.84 rows=1000000 width=12)
                ->  Sort  (cost=149734.84..152234.84 rows=1000000 width=12)
                      Sort Key: population.city
                      ->  Seq Scan on population  (cost=0.00..15896.00 rows=1000000 width=12)
  Planning time: 0.227 ms
 (9 rows)

As you can notice, the subquery does a sequential scan of the whole table ("Seq Scan on population"), while the WHERE clause is applied only after generating the results through a costly sort operation on all the rows. This query took 2 seconds to run on the author's machine (the sort did not spill to disk as work_mem was set high enough).

 =# SELECT * FROM
     (SELECT city,
        avg(age) OVER (PARTITION BY city) avg_age,
        min(age) OVER (PARTITION BY city) min_age,
        max(age) OVER (PARTITION BY city) max_age
      FROM population) age_all
    WHERE city in ('city 26', 'city 47')
    GROUP BY avg_age, city, min_age, max_age;
   city   |       avg_age       | min_age | max_age
 ---------+---------------------+---------+---------
  city 47 | 49.6150433555152248 |       0 |     100
  city 26 | 49.7953169156237384 |       0 |     100
 (2 rows)
 Time: 2276.422 ms

In Postgres 9.5, here is the plan obtained for the same query (the plan has been reformatted a bit here to fit on this blog page):

                                        QUERY PLAN
 ------------------------------------------------------------------------------------
  HashAggregate  (cost=15171.49..15178.33 rows=684 width=72)
    Group Key: avg(population.age) OVER (?),
               population.city,
               min(population.age) OVER (?),
               max(population.age) OVER (?)
    ->  WindowAgg  (cost=14880.83..15034.71 rows=6839 width=36)
          ->  Sort  (cost=14880.83..14897.93 rows=6839 width=36)
                Sort Key: population.city
                ->  Seq Scan on population  (cost=0.00..14445.20 rows=6839 width=36)
                      Filter: (city = ANY ('{"city 26","city 47"}'::text[]))
 Planning time: 0.203 ms
(6 rows)

Things are getting better: the WHERE clause is evaluated within the subquery, drastically reducing the cost of the sort by reducing the number of tuples selected. Running this query now takes only 300ms as well, which is a nice improvement compared to the pre-commit behavior.

Devrim GÜNDÜZ: Using huge pages on RHEL 7 and PostgreSQL 9.4

From Planet PostgreSQL. Published on Jul 04, 2014.

PostgreSQL 9.4 will finally support huge pages. This article covers configuring huge pages on a RHEL 7 box.



Continue reading "Using huge pages on RHEL 7 and PostgreSQL 9.4"

Devrim GÜNDÜZ: Running more than one 9.4 /9.3 instance in parallel on RHEL 7

From Planet PostgreSQL. Published on Jul 03, 2014.

In the 3rd post of the "PostgreSQL on RHEL 7" series, I will cover running more than one 9.4 instance on RHEL 7. This article is also valid for 9.3.
Continue reading "Running more than one 9.4 /9.3 instance in parallel on RHEL 7"

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – Add cluster_name GUC which is included in process titles if set.

From Planet PostgreSQL. Published on Jul 02, 2014.

On 29th of June, Andres Freund committed patch: Add cluster_name GUC which is included in process titles if set.   When running several postgres clusters on one OS instance it's often inconveniently hard to identify which "postgres" process belongs to which postgres instance.   Add the cluster_name GUC, whose value will be included as part […]

Pavel Golub: Get PostgreSQL sources using SVN

From Planet PostgreSQL. Published on Jul 02, 2014.

As you probably know, PostgreSQL sources are managed by the Git version control system. Which is great, but for me as an SVN user it would be much better to get access to the sources using familiar tools, e.g. TortoiseSVN, mainly because I don't need write access - read-only mode is enough. There is a workaround for people like me.

First of all, there is a mirror of the PostgreSQL sources on GitHub. Second, GitHub supports the SVN protocol using a bridge that translates svn commands for GitHub.

So in my case I need only trunk with the latest commits. I've created an empty folder and did a Checkout of the https://github.com/postgres/postgres/trunk URL.



Devrim GÜNDÜZ: Installing and configuring PostgreSQL 9.3 and 9.4 on RHEL 7

From Planet PostgreSQL. Published on Jul 01, 2014.

Red Hat Enterprise Linux 7 was released recently. The day after the release, we announced PostgreSQL 9.4 RPMs for RHEL 7, and last week we pushed PostgreSQL 9.3 RPMs for RHEL 7.

Since Red Hat switched to systemd as of RHEL 7, there are some changes in the packaging as compared to RHEL 5 and RHEL 6. So, how do you install and configure PostgreSQL on RHEL 7?
Continue reading "Installing and configuring PostgreSQL 9.3 and 9.4 on RHEL 7"

Christophe Pettus: Djangocon 2014 US CFP Closes Soon

From Planet PostgreSQL. Published on Jul 01, 2014.

Just a reminder that the Djangocon US 2014 Call for Proposals ends July 15, 2014… we would love your talks and tutorials!

Kirk Roybal: DFWPUG Meetup July 2, 2014

From Planet PostgreSQL. Published on Jul 01, 2014.

Just a quick reminder that Dallas/Fort Worth PostgreSQL Users Group has a Meetup the first Wednesday of every month.

What: Brainstorming on Crypto-Digests and Relational Databases

Who: John Scott

When: Wednesday, July 2, 2014 7:00 PM

Where:
Improving Enterprises
16633 Dallas Parkway Suite 110 Addison, TX 75001

DFW PUG on Meetup

Kirk Roybal: Create a media calendar in PostgreSQL

From Planet PostgreSQL. Published on Jul 01, 2014.

Here’s a quick snippet that I wrote because I needed a media calendar that covers the time period of most living people. Hopes this helps keep somebody from typing this up themselves.

CREATE TABLE calendar (
  id bigserial primary key,
  gregorian_date date not null,
  long_date text,
  day_of_week text,
  day_in_week smallint,
  day_number_in_month smallint,
  day_number_in_year smallint,
  last_day_in_week boolean,
  last_day_in_month boolean,
  week_ending_day date,
  month_ending_day date,
  week_in_month smallint,
  week_in_year smallint,
  month_name text,
  month_number smallint,
  year_month text,
  quarter text,
  year_quarter text,
  year_number smallint,
  roman_year text,
  roman_month text,
  roman_day text,
  load_date timestamp with time zone NOT NULL DEFAULT now()
);

COMMENT ON COLUMN calendar.gregorian_date IS 'Date in current calendar';
COMMENT ON COLUMN calendar.long_date IS 'Date in full formal notation';
COMMENT ON COLUMN calendar.day_of_week IS 'Text representation of the day of week';
COMMENT ON COLUMN calendar.day_in_week IS 'Numeric position of the day in the week';
COMMENT ON COLUMN calendar.day_number_in_month IS 'Numeric position of the day in the month';
COMMENT ON COLUMN calendar.day_number_in_year IS 'Numeric position of the day within the year';
COMMENT ON COLUMN calendar.last_day_in_week IS 'Is this the last day of the week';
COMMENT ON COLUMN calendar.last_day_in_month IS 'Is this the last day of the month';
COMMENT ON COLUMN calendar.week_ending_day IS 'What is the date of Saturday in this week';
COMMENT ON COLUMN calendar.month_ending_day IS 'What is the last date of this month';
COMMENT ON COLUMN calendar.week_in_month IS 'Numeric position of the week in the month';
COMMENT ON COLUMN calendar.week_in_year IS 'Numeric position of the week in the year';
COMMENT ON COLUMN calendar.month_name IS 'Text name of the month';
COMMENT ON COLUMN calendar.month_number IS 'Numeric value of the month in year';
COMMENT ON COLUMN calendar.year_month IS 'YYYY-MM formatted';
COMMENT ON COLUMN calendar.quarter IS 'Q1 - Q4';
COMMENT ON COLUMN calendar.year_quarter IS 'YYYY-Q(1-4)';
COMMENT ON COLUMN calendar.year_number IS 'Year as an integer';
COMMENT ON COLUMN calendar.load_date IS 'the timestamp when this row was created';
CREATE OR REPLACE FUNCTION media_calendar () RETURNS TRIGGER
AS $$
BEGIN
	NEW.long_date := to_char(NEW.gregorian_date, 'FMDay, the FMDDth of FMMonth, YYYY');
	NEW.day_of_week := to_char(NEW.gregorian_date, 'Day');
	NEW.day_in_week := to_char(NEW.gregorian_date, 'D')::smallint;
	NEW.day_number_in_month := to_char(NEW.gregorian_date, 'DD')::smallint;
	NEW.day_number_in_year := to_char(NEW.gregorian_date, 'DDD')::smallint;	
	NEW.month_ending_day := to_char(NEW.gregorian_date + interval '1 month', 'YYYY-MM-01')::date -1; --  (to_char(NEW.gregorian_date, 'YYYY') || '-' || (to_char(NEW.gregorian_date, 'MM')::integer + 1)::text || '-01')::date -1;
	NEW.last_day_in_month := NEW.gregorian_date = NEW.month_ending_day;
	NEW.week_ending_day := NEW.gregorian_date + (7 - NEW.day_in_week);
	NEW.last_day_in_week := NEW.gregorian_date = NEW.week_ending_day;
	NEW.week_in_month := to_char(NEW.gregorian_date, 'W')::smallint;
	NEW.week_in_year := to_char(NEW.gregorian_date, 'WW')::smallint;
	NEW.month_name := to_char(NEW.gregorian_date, 'Month');
	NEW.month_number := to_char(NEW.gregorian_date, 'MM')::smallint;
	NEW.year_month := to_char(NEW.gregorian_date, 'YYYY-MM');
	NEW.quarter := 'Q' || to_char(NEW.gregorian_date, 'Q');
	NEW.year_quarter := to_char(NEW.gregorian_date, 'YYYY') || '-' || NEW.quarter;
	NEW.year_number := to_char(NEW.gregorian_date, 'YYYY')::smallint;
	NEW.roman_year := to_char(to_char(NEW.gregorian_date, 'YYYY')::integer, 'FMRN');
	NEW.roman_month := to_char(NEW.gregorian_date, 'RM');
	NEW.roman_day := to_char(to_char(NEW.gregorian_date, 'DD')::integer, 'FMRN');

	RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER media_calendar
BEFORE INSERT ON calendar
FOR EACH ROW EXECUTE PROCEDURE media_calendar();

--covers 1890-01-01 to 2099-12-31
INSERT INTO calendar (gregorian_date) select '1890-01-01'::date + s.a as dates from generate_series(0,76700,1) as s(a);

CREATE UNIQUE INDEX idx_calendar_gregorian ON calendar(gregorian_date);
CLUSTER calendar USING idx_calendar_gregorian;
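
As a quick usage sketch, a lookup of a single (arbitrary) date shows the derived columns filled in by the trigger:

SELECT long_date, day_of_week, week_in_year, year_quarter, roman_year
  FROM calendar
 WHERE gregorian_date = DATE '2014-07-01';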

Michael Paquier: Postgres 9.5 feature highlight: Tracking processes with cluster_name

From Planet PostgreSQL. Published on Jun 30, 2014.

Here is a small feature that showed up during the first commit fest of Postgres 9.5, allowing complementary information to be added to the process names displayed by a server:

commit 51adcaa0df81da5e94b582d47de64ebb17129937
Author: Andres Freund <andres@anarazel.de>
Date:   Sun Jun 29 14:15:09 2014 +0200

Add cluster_name GUC which is included in process titles if set.

When running several postgres clusters on one OS instance it's often
inconveniently hard to identify which "postgres" process belongs to
which postgres instance.

Add the cluster_name GUC, whose value will be included as part of the
process titles if set. With that processes can more easily identified
using tools like 'ps'.

To avoid problems with encoding mismatches between postgresql.conf,
consoles, and individual databases replace non-ASCII chars in the name
with question marks. The length is limited to NAMEDATALEN to make it
less likely to truncate important information at the end of the
status.

Thomas Munro, with some adjustments by me and review by a host of people.

This is helpful to identify which server a process is attached to when running multiple instances on the same host. Here is for example the case of two nodes, a master and a standby (feel free to not believe that, by the way!).

$ psql -At -p 5432 -c 'show cluster_name'
master
$ psql -At -p 5433 -c 'show cluster_name'
standby
$ ps x | grep "master\|standby" | grep -v 'grep'
80624   ??  Ss     0:00.00 postgres: standby: logger process
80625   ??  Ss     0:00.02 postgres: standby: startup process   recovering 000000010000000000000004
80633   ??  Ss     0:00.01 postgres: standby: checkpointer process
80634   ??  Ss     0:00.07 postgres: standby: writer process
80635   ??  Ss     0:00.00 postgres: standby: stats collector process
80655   ??  Ss     0:00.00 postgres: master: logger process
80657   ??  Ss     0:00.01 postgres: master: checkpointer process
80658   ??  Ss     0:00.07 postgres: master: writer process
80659   ??  Ss     0:00.04 postgres: master: wal writer process
80660   ??  Ss     0:00.02 postgres: master: autovacuum launcher process
80661   ??  Ss     0:00.01 postgres: master: archiver process
80662   ??  Ss     0:00.05 postgres: master: stats collector process
80669   ??  Ss     0:00.76 postgres: standby: wal receiver process   streaming 0/4000428
80670   ??  Ss     0:00.01 postgres: master: wal sender process ioltas 127.0.0.1(55677) streaming 0/4000428

Non-ASCII characters are printed as question marks:

$ grep cluster_name $PGDATA/postgresql.conf
cluster_name = 'éèê'
$ ps x | grep postgres | head -n1
81485   ??  Ss     0:00.00 postgres: ??????: logger process

For development purposes, this makes debugging easier...

Pavel Stehule: plpgsql_check is available for PostgreSQL 9.2

From Planet PostgreSQL. Published on Jun 27, 2014.

Short note: I did a backport of plpgsql_check to PostgreSQL 9.2.

What is plpgsql_check? It is a plpgsql validator that checks a function and all code paths inside it without having to run it.
postgres=# select oid::regprocedure, plpgsql_check_function(oid::regprocedure)
             from pg_proc where prolang = 12599 and prorettype <> 2279;
      oid      │                         plpgsql_check_function
───────────────┼─────────────────────────────────────────────────────────────────────────
 bubu(integer) │ error:42702:3:RETURN:column reference "a" is ambiguous
 bubu(integer) │ Query: SELECT (select * from t where t.a = a)
 bubu(integer) │ --                                         ^
 bubu(integer) │ Detail: It could refer to either a PL/pgSQL variable or a table column.
(4 rows)

Time: 4.857 ms
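
As a minimal sketch of how it is used on a single function, assuming plpgsql_check has been installed as an extension; broken_fn is a made-up example function:

CREATE EXTENSION plpgsql_check;

CREATE OR REPLACE FUNCTION broken_fn() RETURNS void LANGUAGE plpgsql AS $$
BEGIN
  -- the referenced table does not exist; plain plpgsql would only notice at run time
  INSERT INTO missing_table VALUES (1);
END;
$$;

-- report problems without executing the function
SELECT * FROM plpgsql_check_function('broken_fn()');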

PlanetDjango.org shutting down

By Adomas Paltanavičius from Planet Django. Published on Mar 08, 2014.

Is Open Source Consulting Dead?

By chrism from plope. Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemification

By chrism from plope. Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from plope. Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at http://connpass.com/event/1439/

Why I Like ZODB

By chrism from plope. Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from plope. Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from plope. Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from plope. Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from plope. Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from plope. Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from plope. Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from plope. Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from plope. Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from plope. Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from plope. Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.

PyCon 2011 Report

By chrism from plope. Published on Mar 19, 2011.

My personal PyCon 2011 Report