Skip to content. | Skip to navigation

Personal tools
Log in
You are here: Home

Open Source Posts

Robert Haas: Why The Clock is Ticking for MongoDB

From Planet PostgreSQL. Published on Apr 16, 2014.

Last month, ZDNet published an interview with MongoDB CEO Max Schireson which took the position that the document databases, such as MongoDB, are better-suited to today's applications than traditional relational databases; the title of the article implies that the days of relational databases are numbered. But it is not, as Schireson would have us believe, that the relational database community is ignorant of or has not tried the design paradigms which he advocates, but that they have been tried and found, in many cases, to be anti-patterns. Certainly, there are some cases in which the schemaless design pattern that is perhaps MongoDB's most distinctive feature is just the right tool for the job, but it is also misleading to think that such designs must use a document store. Relational databases can also handle such workloads, and their capabilities in this area are improving rapidly.

Read more »

Jim Mlodgenski: PostgreSQL Videos

From Planet PostgreSQL. Published on Apr 16, 2014.

With all of the new users to PostgreSQL, one of the things we wanted to do for PGConf NYC this year was to have a video showing the history of PostgreSQL. We ended up with a very professional video showing how the features have grown over the years and by the applause of the attendees delivered on our goals. You can see it at:

However, our proof of concept video was very different. Since I was done by me and my video editing skills are very limited, its not nearly as clean and professional. I also have the feeling that Jonathon Katz was afraid we’d freak out some people with the Five Finger Death Punch background music so we ended up going to more professional route. I still think it had some potential so I posted it up to YouTube:

Also, thanks to Kirk Roybal for putting together the animation of PostgreSQL source code changes over the years.


Dan Langille: Have you never been to PGCon?

From Planet PostgreSQL. Published on Apr 15, 2014.

I remember a time when I’d never been to a conference related to my passions. Once I went, things changed. I realized that making strong working relationships with others who share my passion is important. Not only does this solidify the community of which you are a member, it also helps you personally. Every conference [...]

Hans-Juergen Schoenig: Regular expressions unleased

From Planet PostgreSQL. Published on Apr 15, 2014.

When cleaning up some old paperwork this weekend I stumbled over a very old tutorial. In fact, I have received this little handout during a UNIX course I attended voluntarily during my first year at university. It seems that those two days have really changed my life – the price tag: 100 Austrian Schillings which […]

gabrielle roth: PDXPUG: April meeting this week

From Planet PostgreSQL. Published on Apr 14, 2014.

When: 7-9pm Thu April 17, 2014
Where: Iovation
Who: Brian Panulla
What: Intro to Graph Databases

Brian Panulla is a Business Intelligence Developer at Acureo. He’s been fulfilling PDXPUG’s rhombic triacontahedron needs for over a year. Next week, he’ll be giving us an Intro to Graph Databases!

Our meeting will be held at Iovation, on the 32nd floor of the US Bancorp Tower at 111 SW 5th (5th & Oak). It’s right on the Green & Yellow Max lines. Underground bike parking is available in the parking garage; outdoors all around the block in the usual spots. No bikes in the office, sorry!

Building security will close access to the floor at 7:30.

See you there!

Hubert 'depesz' Lubaczewski: Waiting for 9.4 – Provide moving-aggregate support for a bunch of aggregates.

From Planet PostgreSQL. Published on Apr 13, 2014.

On 13th of April, Tom Lane committed patch: Provide moving-aggregate support for a bunch of numerical aggregates.   First installment of the promised moving-aggregate support in built-in aggregates: count(), sum(), avg(), stddev() and variance() for assorted datatypes, though not for float4/float8.   In passing, remove a 2001-vintage kluge in interval_accum(): interval array elements have been […]

Michael Paquier: Testing heartbleed on Postgres

From Planet PostgreSQL. Published on Apr 11, 2014.

Except if you have been cut from the Internet the last week, you have already heard of Heartbleed. This good candidate for the "Bug of the Year 2014" price is already costing a better-not-to-count amount of money in maintenance and development for many companies around the world.

It has already been mentioned in which cases a PostgreSQL server would be vulnerable, but you might want to run some tests to check that the update of openssl on your server is effective. After this bug went public on 2014/04/07, many scripts have popped around on the net to help you checking if a server is vulnerable to this bug or not. However, you need to know that you may not be able to test them directly on a Postgres server as Postgres uses a custom protocol before handling the connection to openssl. A connection needs to send first a message called SSLRequest described here, consisting of two 32-bit integers, 8 and 80877103. The server then answers a single byte, either 'S' if SSL connection is supported, or 'N' if not. Once 'S' is received the SSL startup handshake message can be sent for further processing.

Taking that into account, an example of script usable to test Heartbleed vulnerability on a Postgres server can be found here, extended by my colleague Heikki Linnakangas and forked from here. Particularly, note this portion of the code to handle the PostgreSQL custom protocol:

sslrequest = h2bin('''
00 00 00 08 04 D2 16 2F


print 'Sending PostgreSQL SSLRequest...'
print 'Waiting for server response...'

sslresponse = recvall(s, 1)
if sslresponse == None:
  print 'Server closed connection without responding to SSLRequest.'
# Server responds 'S' if it accepts SSL, or 'N' if SSL is not supported.
pay = struct.unpack('>B', sslresponse)
if pay[0] == 0x4E: # 'N'
  print 'PostgreSQL server does not accept SSL connections.'
if pay[0] != 0x53: # 'S'
  print 'Unexpected response to SSLRequest: %d.', pay

# Continue with SSL start handshake...

The variable "sslrequest" is the hexadecimal conversion of SSLRequest explained above. Now let's test that on a Linux box, Archlinux more precisely. In order to reproduce the vulnerability openssl has been temporarily downgraded to 1.0.1f as it was still available in the pacman cache. A PostgreSQL server with ssl enabled has been deployed as well:

$ pacman -Ss openssl | grep 1.0.1
core/openssl 1.0.1.g-1 [installed: 1.0.1.f-2]
$ psql -c "show ssl"
(1 row)

Note the part "installed: 1.0.1.f" corresponding to the installed version of openssl, which is not the one available through the packager. And here is the result obtained when checking the vulnerability with the script:

$ ./ -p 5432 | tail -n 1
WARNING: server returned more data than it should - server is vulnerable!

Yes server is vulnerable. Now let's see after an upgrade to 1.0.1g.

$ pacman -Ss openssl | grep 1.0.1
core/openssl 1.0.1.g-1 [installed]
$ ./ -p 5432 | tail -n 1
No heartbeat response received, server likely not vulnerable

Feeling better... Feel free to have a look at the script itself for more details.

Szymon Guz: Speeding Up Saving Millions of ORM Objects in PostgreSQL

From Planet PostgreSQL. Published on Apr 11, 2014.

The Problem

Sometimes you need to generate sample data, like random data for tests. Sometimes you need to generate it with huge amount of code you have in your ORM mappings, just because an architect decided that all the logic needs to be stored in the ORM, and the database should be just a dummy data container. The real reason is not important - the problem is: let’s generate lots of, millions of rows, for a sample table from ORM mappings.

Sometimes the data is read from a file, but due to business logic kept in ORM, you need to load the data from file to ORM and then save the millions of ORM objects to database.

This can be done in many different ways, but here I will concentrate on making that as fast as possible.

I will use PostgreSQL and SQLAlchemy (with psycopg2) for ORM, so all the code will be implemented in Python. I will create a couple of functions, each implementing another solution for saving the data to the database, and I will test them using 10k and 100k of generated ORM objects.

Sample Table

The table I used is quite simple, just a simplified blog post:

  title TEXT NOT NULL,
  payload TEXT NOT NULL

SQLAlchemy Mapping

I'm using SQLAlchemy for ORM, so I need a mapping, I will use this simple one:
class BlogPost(Base):
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Text)
    body = Column(Text)
    payload = Column(Text)

The payload field is just to make the object bigger, to simulate real life where objects can be much more complicated, and thus slower to save to the database.

Generating Random Object

The main idea for this test is to have a randomly generated object, however what I really check is the database speed, and the whole randomness is used at the client side, so having a randomly generated object doesn’t really matter at this moment. The overhead of a fully random function is the same regardless of the method of saving the data to the database. So instead of randomly generating the object, I will use a static one, with static data, and I will use the function below:

TITLE   = "title"      * 1764
BODY    = "body"       * 1764
PAYLOAD = "dummy data" * 1764

def generate_random_post():
    "Generates a kind of random blog post"
    return BlogPost(title=TITLE, body=BODY, payload=PAYLOAD)

Solution Ideas

Generally there are two main ideas for such a bulk inserting of multiple ORM objects:

  • Insert them one-by-one with autocommit
  • Insert them one-by-one in one transaction

Save One By One

This is the simplest way. Usually we don’t save just one object, but instead we save many different objects in one transaction, and making a couple of related changes in multiple transactions is a great way leading to a database with bad data.

For generating millions of unrelated objects this shouldn’t cause data inconsistency, but this is highly inefficient. I’ve seen this multiple times in code: create an object, save it to the database, commit, create another object and so on. It works, but is quite slow. Sometimes it is fast enough, but for the cost of making a very simple change in this algorithm we can make it 10 times faster.

I’ve implemented this algorithm in the function below:

def save_objects_one_by_one(count=MAX_COUNT):
    for i in xrange(1, MAX_COUNT+1):
        post = generate_random_post()

Save All in One Transaction

This solution is as simple as: create objects, save them to the database, commit the transaction at the end, so do everything in one huge transaction.

The implementation differs only by four spaces from the previous one, just run commit() once, after adding all objects:

def save_objects_one_transaction(count=MAX_COUNT):
    for i in xrange(1, MAX_COUNT+1):
        post = generate_random_post()

Time difference

I ran the tests multiple times, truncating the table each time. The average results of saving 10k objects were quite predictable:

  • Multiple transactions - 268 seconds
  • One transaction - 25 seconds

The difference is not surprising, the whole table size is 4.8MB, but after each transaction the database needs to write the changes on disk, which slows the procedure a lot.


So far, I’ve described the most common methods of generating and storing many ORM objects. I was wondering about another, which may seem surprising a little bit at the beginning.

PostgreSQL has a great COPY command which can copy data between a table and a file. The file format is simple: one table row per one file row, fields delimited with a defined delimiter etc. It can be a normal csv or tsv file.

My crazy idea was: how about using the COPY for loading all the generated ORM objects? To do that, I need to serialize them to a text representation, to create a text file with all of them. So I created a simple function, which does that. This function is made outside the BlogPost class, so I don't need to change the data model.

def serialize_post_to_out_stream(post, out):
    import csv
    writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    writer.writerow([post.title, post.body, post.payload])

The function above gets two parameters:

  • post - the object to be serialized
  • out - the output stream where the row with the post object will be saved, in Python it is a file-like object, so an object with all the functions a file object has

Here I use a standard csv module, which supports reading and writing csv files. I really don’t want to write my own function for escaping all the possible forms of data I could have - this usually leads to many tricky bugs.

The only thing left is to use the COPY command. I don’t want to create a file with data and load that later; the generated data can be really huge, and creating temporary files can just slow things down. I want to keep the whole procedure in Python, and use pipes for data loading.

I will use the psql program for accessing the PostgreSQL database. Psql has a different command called \COPY, which can read the csv file from psql's standard input. This can be done using e.g.: cat file.csv | psql database.

To use it in Python, I’m going to use the subprocess module, and create a psql process with stdin=subprocess.PIPE which will give me write access to the pipe psql reads from. The function I’ve implemented is:

def save_objects_using_copy(count=MAX_COUNT):
    import subprocess
    p = subprocess.Popen([
        'psql', 'pgtest', '-U', 'pgtest',
        '-c', '\COPY posts(title, body, payload) FROM STDIN',
        ], stdin=subprocess.PIPE
    for i in xrange(1, MAX_COUNT+1):
        post = generate_random_post()
        serialize_post_to_out_stream(post, p.stdin)


I’ve also tested that on the same database table, truncating the table before running it. After that I’ve also checked this function, and the previous one (with one transaction) on a bigger sample - 100k of BlogPost objects.

The results are:

Sample size Multiple Transactions One Transaction COPY
10k 268 s 25 s 5 s
100k 262 s 51 s

I haven’t tested the multiple transactions version for 100k sample, as I just didn’t want to wait multiple hours for finishing that (as I run each of the tests multiple times to get more reliable results).

As you can see, the COPY version is the fastest, even 5 times faster than the full ORM version with one huge transaction. This version is also memory friendly, as no matter how many objects you want to generate, it always needs to store one ORM object in memory, and you can destroy it after saving.

The Drawbacks

Of course using psql poses a couple of problems:

  • you need to have psql available; sometimes that’s not an option
  • calling psql creates another connection to the database; sometimes that could be a problem
  • you need to set up a password in ~/.psql file; you cannot provide it in the command line

You could also get the pcycopg2 cursor directly from the SQLAlchemy connection, and then use the copy_from() function, but this method needs to have all the data already prepared in memory, as it reads from a file-like object, e.g. StringIO. This is not a good solution for inserting millions of objects, as they can be quite huge - streaming is much better in this case.

Another solution to this is to write a generator, which is a file like object, and the copy_from() method can read from it directly. This function calls the file's read() method trying to read 8192 bytes per call. This can be a good idea when you don't have access to the psql, however due to the overhead for generating the 8192 bytes strings, it should be slowever than the psql version.

Francisco Figueiredo Jr: Npgsql 2.1.3 released!

From Planet PostgreSQL. Published on Apr 09, 2014.

This version restores the @@ operator support removed in the previous version. Thanks Glen Parker who provided a fix.

Downloads can be found at the usual locations:

github download page
the nuget repository.

Pgfoundry will be updated soon.

Oleg Bartunov: Jsonb: "Wildcard" query

From Planet PostgreSQL. Published on Apr 08, 2014.

Just to let people know about our experiments.

Consider this top-level query, which well supported by GIN in 9.4.

postgres=# select count(*) from jb where jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
(1 row)

What if I want to find just {"term":"NYC"}, or even "NYC" ? Now, with some improvements, jsonb_hash_ops supports such queries with less than 5% bigger index size and not sacrificing performance !

postgres=# select count(*) from jb where jb @>> '{"term":"NYC"}'::jsonb;
(1 row)
postgres=# select count(*) from jb where jb @>> '"NYC"'::jsonb;
(1 row)

I can see the use-case for wildcard queries for shop aggregators, which combine different hierachies, so it's difficult to say if some specific key is on the same level in different sources.

Szymon Guz: Filling Gaps in Cumulative Sum in Postgres

From Planet PostgreSQL. Published on Apr 08, 2014.

I found an interesting problem. There was a table with some data, among which there was a date and an integer value. The problem was to get cumulative sum for all the dates, however including dates for which we don't have entries. In case of such dates we should use the last calculated sum.

Example Data

I will use an example table:


with sample data:

# INSERT INTO test(d,v)
  VALUES('2014-02-01', 10),
        ('2014-02-02', 30),
        ('2014-02-05', 10),
        ('2014-02-10', 3);

Then the data in the table looks like:

# SELECT * FROM test;
     d      |  v
 2014-02-01 | 10
 2014-02-02 | 30
 2014-02-05 | 10
 2014-02-10 |  3
(4 rows)

What I want is to have a cumulative sum for each day. Cumulative sum is a sum for all the earlier numbers, so for the above data I want to get:

     d      |  v
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-05 | 50
 2014-02-10 | 53
(4 rows)

The simple query for getting the data set like shown above is:

FROM test

Filling The Gaps

The query calculates the cumulative sum for each row. Unfortunately this way there are gaps between dates, and the request was to fill those in using the values from previous days.

What I want to get is:

     d      |  v
 2014-02-01 | 10
 2014-02-02 | 40
 2014-02-03 | 40
 2014-02-04 | 40
 2014-02-05 | 50
 2014-02-06 | 50
 2014-02-07 | 50
 2014-02-08 | 50
 2014-02-09 | 50
 2014-02-10 | 53

My first idea was to use the generate_series() function, which can generate a series of data. What I need is a series of all dates between min and max dates. This can be done using:

# SELECT generate_series(
    '1 day')::date;

The generate_series() function arguments are (begin, end, interval). The function returns all timestamps from beginning to end with given interval. The return value is timestamp, so I had to cast it to date with '::date', which is a nice PostgreSQL shortcut for the standard syntax, CAST(generate_series(...) AS DATE).

I also want to use the first query to use the cumulative sum I calculated before. It can be simply achieved using the great WITH command which creates something like a temporary table, which can be queried:

# WITH temp AS 
  SELECT generate_series(1, 1000) d
FROM temp
WHERE d < 4


Combining all the above queries resulted in the below one:

  FROM test
  (SELECT v 
   FROM y 
   WHERE y.d <= g.d
   LIMIT 1)
  (SELECT generate_series(min(d), max(d), '1 day')::date d 
   FROM y) g

After the earlier explanations understanding this one should be easy.

  • I placed the original query calculating the cumulative sum in the WITH block.
  • SELECT creates a row set with two columns
    • The first column is date returns from subselect, just before ORDER BY. There are returned all dates between min and max date from the original data.
    • The second column is a subquery getting calculated cumulative sum. It gets the sum for current date (from the first column), or the previous calculated.
  • And of course we need ordering at the end. The database can reorder the data as it wants during executing the query, so we always need to declare the ordering at the end. Otherwise strange things can happen (like having the same ordering of rows for years, and suddenly a totally different one, just because someone added new row, deleted some other, or just restarted application).

Magnus Hagander: PostgreSQL and the OpenSSL Heartbleed vulnerability

From Planet PostgreSQL. Published on Apr 08, 2014.

Is your PostgreSQL installation vulnerable to the Heartbleed bug in OpenSSL? The TL;DR; version is "maybe, it depends, you should read this whole thing to find out". If you are vulnerable, it is a high risk vulnerability!

The slightly longer version is that it will be vulnerable if you are using SSL, and not vulnerable if you are not. But the situation is not quite that easy, as you may be using SSL even without planning to. PostgreSQL also not provide any extra protection against the bug - if you are using SSL, you are vulnerable to the bug just as with any other service.

As the bug is in OpenSSL, however, what you need to get patched is your OpenSSL installation and not PostgreSQL itself. And of course, remember to restart your services (this includes both PostgreSQL and any other services using SSL on your system). You will then have to consider in your scenario if you have to replace your SSL keys or not - the same rules apply as to any other service.

It depends on if SSL is enabled

PostgreSQL by default ships with SSL turned off on most platforms. The most notable exception is Debian and derivatives (such as Ubuntu), which enable SSL by default.

If SSL is disabled globally, your installation is not vulnerable.

The easiest way to check this is to just use a simple SQL query:

postgres=# show ssl;
(1 row)

If this parameter returns off, you are not vulnerable. If it returns on, you are.

If you do not need SSL, the easiest fix is to turn this off and restart PostgreSQL. This also brings additional benefits of not paying the overhead of encryption if you don't need it. If you actually use SSL, this is of course not an option.

It depends on your installation

If you have installed PostgreSQL using a package based system, such as yum (from redhat/fedora or from, apt (from debian/ubuntu/etc or from, FreeBSD ports etc, it is up to your operating system to provide a patch. Most major distributions have already done this - you just need to to install it (and restart your services!). If your distribution has not yet updated, you need to convince them to do so ASAP.

If you are using a PostgreSQL installation package that bundles OpenSSL, you need an updated version of this package. The most common example of this is the EnterpriseDB Graphical Installers primarily used on Windows and Mac. We expect a new version of these installers to be released within a day or a few. is also vulnerable and needs an update, but is normally not used for servers.

The OpenSCG separate download packages are also vulnerable.

For each of these you will have to wait for an updated package to show up in the next couple of days. All package maintainers have been notified, so it's only a matter of time.

Per the download pages we do recommend that you always use the "package manager" system for any platform where this is supported, which means most modern Linux or BSD distributions. If you are currently using one of the above installers on these platforms, a quick fix before the packages are out would be to switch to one of the "package manager" platforms that rely on the operating system update process. This may or may not be an option of course, depending on the complexity of the installation.

If you are using a platform where this is not available (such as Windows), your only option is to wait.

pg_hba does not protect you

In PostgreSQL, the SSL negotiation happens before pg_hba.conf is matched. And the vulnerability in OpenSSL is in the negotiation phase. For this reason, even if you have restricted access to your server using pg_hba.conf IP filter rules, or your pg_hba.conf specifies only hostnossl records, this does not protect you.

Obviously, if you have an IP level firewall, either at the host or on the network, that will protect you. But pg_hba does not.

Usage in pgcrypto

The pgcrypto module in PostgreSQL uses OpenSSL to provide encryption functions when available. Since the vulnerability is specifically in the protocol negotiation, use in pgcrypto is not vulnerable to this issue.

Joel Jacobson: SET search_path for all SECURITY DEFINER functions

From Planet PostgreSQL. Published on Apr 07, 2014.

As pointed out by Peter Eisentraut in a blog post named
Schema Search Paths Considered Pain in the Butt, you need to make sure the search_path is explicitly set for all SECURITY DEFINER functions in PostgreSQL.

Fixing this manually for, in my case, 2106 functions, is, indeed a “pain in the butt”, so I crafted a little query to automate the job:

\pset format unaligned
\o /tmp/fix_search_path_for_security_definer_functions.sql
            -- inject SET search_path in-between LANGUAGE and SECURITY DEFINER in the declaration
                E'(LANGUAGE [a-z]+)\\s+(SECURITY DEFINER)',
                E'\\1\n SET search_path TO public, pg_temp\n \\2'
from pg_proc
where prosecdef is true -- SECURITY DEFINER functions
-- don't include functions for which we have already specified a search_path
and not (coalesce(array_to_string(proconfig,''),'') like '%search_path%')
-- public schema
and pronamespace = 2200
\i /tmp/fix_search_path_for_security_definer_functions.sql
-- If all goes well you should see a lot of CREATE FUNCTION being spammed on the screen

Bruce Momjian: Postgres Gets the Business

From Planet PostgreSQL. Published on Apr 06, 2014.

I just returned from attending PGConf NYC. They had 259 participants, more than double last year's total. The conference was in a hotel near Wall Street.

While I have been to many Postgres user conferences, this felt like my first corporate Postgres conference. Presenters from multinational banks Goldman Sachs and Morgan Stanley explained how and why they use Postgres in their organizations. These are trend-setting organizations, and their public embrace of Postgres will have lasting benefits.

In fact, one hallway discussion was how to enable large organizations like these, particularly those with significant legal and regulatory requirements, to work with the Postgres community. Some companies have employees post from non-company email accounts like Gmail, while others contract with consulting companies to work with the Postgres community on their behalf. Unfortunately, neither of these approaches have the companies working with the community openly.

Continue Reading »

Michael Paquier: Postgres 9.4 feature highlight: Indexing JSON data with jsonb data type

From Planet PostgreSQL. Published on Apr 06, 2014.

PostgreSQL 9.4 is shipping with a new feature called jsonb, which is a new data type able to store JSON data supporting GIN indexing (!). In short, this feature, one of the most important of the upcoming release, if not the most important, puts Postgres directly in good position in the field of document-oriented database systems.

Since 9.2, an integrated JSON datatype already exists, completed with a set of functions (data generation and parsing functions) as well as operators added in 9.3. When using "json" data type, data is stored as an exact copy of the input text which functions working on it need to reparse causing some processing overhead.

The new jsonb data type stores data in a decomposed binary format, so inserting it is less performant than json because of the overhead necessary to put it in shape but it is faster as it does not need reparsing, and it has the advantage to support GIN indexing. For this last reason it is actually recommended to use jsonb for your applications instead of json (you might need only json depending on your needs though). Note as well that jsonb has the same operators as functions as json, you can refer to my previous posts on the matter to get some insight on them or directly at the documentation of Postgres.

Now let's see how jsonb works and let's compare it with json with as data sample a dump of geobase, worth 8.6 million tuples and 1.1GB, with many fields like the city name, country code (you can refer to a complete list of the fields here). After storing the data into a new table with a raw COPY, let's transform it into json/jsonb in a set of tables with a fillfactor at 100 to see how much space they use:

=# COPY geodata FROM '$HOME/Downloads/allCountries.txt';
COPY 8647839
=# CREATE TABLE geodata_jsonb (data jsonb) with (fillfactor=100);
=# CREATE TABLE geodata_json (data json) with (fillfactor=100);
=# \timing
Timing is on.
=# INSERT INTO geodata_json SELECT row_to_json(geodata) FROM geodata;
INSERT 0 8647839
Time: 287158.457 ms
=# INSERT INTO geodata_jsonb SELECT row_to_json(geodata)::jsonb FROM geodata;
INSERT 0 8647839
Time: 425825.967 ms

Inserting jsonb data took a little bit more time. And what is the difference of size?

=# SELECT pg_size_pretty(pg_relation_size('geodata_json'::regclass)) AS json,
          pg_size_pretty(pg_relation_size('geodata_jsonb'::regclass)) AS jsonb;
  json   |  jsonb  
 3274 MB | 3816 MB
(1 row)

Creating indexes on json data is possible even with 9.3, for example by indexing some given keys using the operators present (note that '->>' is used as it returns text, and that the set of keys in the index is chosen depending on the queries):

=# CREATE INDEX geodata_index ON
    geodata_json ((data->>'country_code'), (data->>'asciiname'));
=# SELECT pg_size_pretty(pg_relation_size('geodata_index'::regclass))
    AS json_index;
 310 MB
(1 row)
=# SELECT (data->>'population')::int as population,
          data->'latitude' as latitude,
          data->'longitude' as longitude
   FROM geodata_json WHERE data->>'country_code' = 'JP' AND
        data->>'asciiname' = 'Tokyo' AND
        (data->>'population')::int != 0;
 population | latitude | longitude 
    8336599 | 35.6895  | 139.69171
(1 row)
=# -- Explain of previous query
                                                       QUERY PLAN                                                        
 Bitmap Heap Scan on geodata_json  (cost=6.78..865.24 rows=215 width=32)
   Recheck Cond: (((data ->> 'country_code'::text) = 'JP'::text) AND ((data ->> 'asciiname'::text) = 'Tokyo'::text))
   Filter: (((data ->> 'population'::text))::integer <> 0)
   ->  Bitmap Index Scan on geodata_index  (cost=0.00..6.72 rows=216 width=0)
         Index Cond: (((data ->> 'country_code'::text) = 'JP'::text) AND ((data ->> 'asciiname'::text) = 'Tokyo'::text))
 Planning time: 0.172 ms
(6 rows)

In this case the planner is able to use a bitmap index scan and uses the index created previously.

Now one of the new things that jsonb has and not json is the possibility to check containment within some data with the operator @>, which is indexable using GIN, as well as the existence operators ?, ?| and ?& (to check if given key(s) exist) by the way. GIN indexing is possible with two operator classes:

  • Default operator class that all four operators listed previously
  • jsonb_hash_ops, supporting only @> but performing better when searching data and having a smaller on-disk size.

Here is how it works:

=# CREATE INDEX geodata_gin ON geodata_jsonb
      USING GIN (data jsonb_hash_ops);
=# SELECT (data->>'population')::int as population,
      data->'latitude' as latitude,
      data->'longitude' as longitude
   FROM geodata_jsonb WHERE data @> '{"country_code": "JP", "asciiname": "Tokyo"}' AND
       (data->>'population')::int != 0;
 population | latitude | longitude 
    8336599 | 35.6895  | 139.69171
(1 row)
 =# SELECT pg_size_pretty(pg_relation_size('geodata_gin'::regclass)) AS jsonb_gin;
 1519 MB
(1 row)
=# -- EXPLAIN of previous query
                                     QUERY PLAN                                      
 Bitmap Heap Scan on geodata_jsonb  (cost=131.01..31317.76 rows=8605 width=418)
   Recheck Cond: (data @> '{"asciiname": "Tokyo", "country_code": "JP"}'::jsonb)
   Filter: (((data ->> 'population'::text))::integer <> 0)
   ->  Bitmap Index Scan on geodata_gin  (cost=0.00..128.86 rows=8648 width=0)
         Index Cond: (data @> '{"asciiname": "Tokyo", "country_code": "JP"}'::jsonb)
  Planning time: 0.134 ms

Depending on the application needs, you might prefer a less space-consuming index like a btree tree on the field names of the JSON data as showed previously, the GIN indexing here having the advantage to be more generic as it covers all the fields of JSON and checks their containment.

Raghavendra Rao: While performing PITR, would it be possible to Pause/Resume in PostgreSQL ?

From Planet PostgreSQL. Published on Apr 06, 2014.

Yes, truly possible and handled smartly by PostgreSQL. To demo this, first I need to take after the standard technique of Point in Time Recovery in PostgreSQL. Various Books/Articles/Blogs demoed extremely well by extraordinary authors, hence am not going into details of how to do it, however, heading off directly to the subject i.e., how to pause while recovering with same technique. Arguably, I put forth a mathematical expression out of PITR as "PITR = (Last Filesystem Backup(LFB) + WAL Archives generated after LFB + Un-Archived WAL's in current $PGDATA/pg_xlogs)". For better understanding, I have put this into graph, in light of the fact that it clear the thought more: (Sorry, this blog is bit long, unknowingly it happened while going in details of the concept)

PITR steps,which am going to follow with slight changes that I talk about soon:

Step 1. Restore the most recent File System-level backup(FSB) to any location where recovery is planned to perform.
Step 2. If FSB is tar,then untar it, and clean the pg_xlog directory leaving archive_status. If backup has excluded this directory, then create the empty pg_xlog directory in FSB.
Step 3. Copy un-archived WAL's from crashed cluster $PGDATA/pg_xlog into $FSB/pg_xlog (Step 2)
Step 4. Delete the from FSB directory.
Step 5. Create recovery.conf file in FSB directory.
Step 6. Start the cluster (FSB). 

We should put question, when pausing the recovery required ?. Maybe, to prevent multiple base restorations or roll-forward recovery but check in between or rollback a particular tables data or interest to see how far it has recovered :). Remember, pause in recovery means, its allowing to connect while recovering. To outline this, I have reproduced a situation in chart of a particular table rows improvement until to a mishap.

From above diagram, its agreeable a DEMO table rows were 10,00,000 when file system-level backup($PGDATA) taken and 40,00,000 rows before crash. In my local VM, I have made the situation on groundwork of TIME instead of date.

1. File System-Level Backup when DEMO tables having 10,00,000 rows.
2. From that point forward, WAL Archives before crash where DEMO table having 40,00,000 rows.
3. WAL Archives Location: /opt/PostgreSQL/9.3/archives.
4. Data Directory : /opt/PostgreSQL/9.3/data (PGDATA)
5. Backup Location : /opt/PostgreSQL/9.3/backups

Keep in mind, working with pause recovery need compulsory changes on main cluster($PGDATA) "wal_level" set to "hot_standby" and on recovery cluster(file system-level backup) "hot_standby" set to "ON". I have made these changes to main cluster, restarted the cluster to take effect and initiated the backup. If you don't mind make a note it simply a demo, so my WAL archives might not be gigantic  number's as they are in few numbers. I have listed WAL archives too here, which were generated from the time of backup to crash.
-bash-4.1$ psql -c "select count(*), now() from demo;"
count | now
1000000 | 2014-04-04 15:06:04.036928-07
(1 row)

-bash-4.1$ pg_basebackup -D /opt/PostgreSQL/9.3/backup/data_pitr -- I have my $PGDATA, $PGUSER, $PGPORT set, so its a straight command in my case
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
Current state of WAL archives and $PGDATA/pg_xlog
-bash-4.1$ ls -lrth /opt/PostgreSQL/9.3/archives
-rw------- 1 postgres postgres 16M Apr 4 16:01 00000001000000000000001C
-rw------- 1 postgres postgres 16M Apr 4 16:01 00000001000000000000001D
-rw------- 1 postgres postgres 289 Apr 4 16:06 00000001000000000000001E.000000C8.backup
-rw------- 1 postgres postgres 16M Apr 4 16:06 00000001000000000000001E

-bash-4.1$ ls -lrth /opt/PostgreSQL/9.3/data/pg_xlog | tail -4
-rw------- 1 postgres postgres 289 Apr 4 16:06 00000001000000000000001E.000000C8.backup
-rw------- 1 postgres postgres 16M Apr 4 16:06 00000001000000000000001E
-rw------- 1 postgres postgres 16M Apr 4 16:06 00000001000000000000001F
drwx------ 2 postgres postgres 4.0K Apr 4 16:13 archive_status
Fine now, we have the backup copy, lets INSERT few records in three parts by noting the time, so it will help to pause recovery and additionally see the WAL's produced from the time of FSB.
-bash-4.1$ psql -c "insert into demo values (generate_series(1,1000000));"
INSERT 0 1000000
-bash-4.1$ psql -c "select count(*),now() from demo;"
count | now
2000000 | 2014-04-04 16:06:34.941615-07
(1 row)
-bash-4.1$ psql -c "insert into demo values (generate_series(1,1000000));"
INSERT 0 1000000
-bash-4.1$ psql -c "select count(*),now() from demo;"
count | now
3000000 | 2014-04-04 16:10:31.136725-07
(1 row)
-bash-4.1$ psql -c "insert into demo values (generate_series(1,1000000));"
INSERT 0 1000000
-bash-4.1$ psql -c "select count(*),now() from demo;"
count | now
4000000 | 2014-04-04 16:13:00.136725-07
(1 row)
Check the number of WAL's produced during the INSERT.
-bash-4.1$ ls -lrth /opt/PostgreSQL/9.3/archives
-rw------- 1 postgres postgres 289 Apr 4 16:06 00000001000000000000001E.000000C8.backup
-rw------- 1 postgres postgres 16M Apr 4 16:06 00000001000000000000001E
-rw------- 1 postgres postgres 16M Apr 4 16:06 00000001000000000000001F
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000020
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000021
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000022
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000023
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000024
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000025
-rw------- 1 postgres postgres 16M Apr 4 16:06 000000010000000000000026
-rw------- 1 postgres postgres 16M Apr 4 16:10 000000010000000000000027
-rw------- 1 postgres postgres 16M Apr 4 16:10 000000010000000000000028
-rw------- 1 postgres postgres 16M Apr 4 16:10 000000010000000000000029
-rw------- 1 postgres postgres 16M Apr 4 16:10 00000001000000000000002A
-rw------- 1 postgres postgres 16M Apr 4 16:13 00000001000000000000002B
Assume at this point mishap happened and you have to do recovery using FSB + WAL archives + Unarchived WAL's(if any). During recovery, I want to pause three time to see each recovery of 20,00,000, 30,00,000 and 40,00,000 rows of DEMO table by connecting to the database in READ-ONLY mode. For each resume of recovery need a restart of recovery cluster by bumping to new timeline in recovery.conf/recovery_target_time. Also,in $FSB/postgresql.conf, we have to set hot_standby=on. Here's my recovery.conf file:
-bash-4.1$ more recovery.conf
pause_at_recovery_target = true
#recovery_target_time = '2014-04-04 16:06:34' # For 2 lakh records
#recovery_target_time = '2014-04-04 16:10:31' # For 3 lakh records
#recovery_target_time = '2014-04-04 16:13:00' # For 4 lakh records
restore_command = 'cp /opt/PostgreSQL/9.3/archives/%f %p'
Let's start recovery for 20,00,000 records:
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_ctl -D /opt/PostgreSQL/9.3/data_pitr/ start
server starting

Now in logs:

-bash-4.1$ more postgresql-2014-04-04_162524.log
2014-04-04 16:25:24 PDT-24187---[] LOG: starting point-in-time recovery to 2014-02-06 18:48:56-08
2014-04-04 16:25:24 PDT-24187---[] LOG: restored log file "00000001000000000000001E" from archive
2014-04-04 16:25:24 PDT-24187---[] LOG: redo starts at 0/1E0000C8
2014-04-04 16:25:24 PDT-24187---[] LOG: consistent recovery state reached at 0/1E000190
2014-04-04 16:25:24 PDT-24185---[] LOG: database system is ready to accept read only connections
2014-04-04 16:25:24 PDT-24187---[] LOG: restored log file "00000001000000000000001F" from archive
2014-04-04 16:25:24 PDT-24187---[] LOG: restored log file "000000010000000000000020" from archive
2014-04-04 16:25:25 PDT-24187---[] LOG: restored log file "000000010000000000000021" from archive
2014-04-04 16:25:25 PDT-24187---[] LOG: restored log file "000000010000000000000022" from archive
2014-04-04 16:25:25 PDT-24187---[] LOG: recovery stopping before commit of transaction 1833, time 2014-04-04 16:06:23.893487-07
2014-04-04 16:25:25 PDT-24187---[] LOG: recovery has paused
2014-04-04 16:25:25 PDT-24187---[] HINT: Execute pg_xlog_replay_resume() to continue
Cool, see in logs it has paused and a smart HINT asking to resume. Here, if the recovery was satisfactory, you can resume it by calling "select pg_xlog_replay_resume();"(You can check it out). Lets not resume now, however check the number of rows recovered by connecting to the server.
-bash-4.1$ psql -c "select count(*),pg_is_in_recovery() from demo;"
count | pg_is_in_recovery
2000000 | t
(1 row)
Good, it has reached to the point and paused where I requested. Lets move one more step ahead for recovering 30,00,000 rows.  Now, set the next timeline in recovery.conf/recovery_target_time and restart the cluster.
2014-04-04 16:28:40 PDT-24409---[] LOG:  restored log file "00000001000000000000002A" from archive
2014-04-04 16:28:40 PDT-24409---[] LOG: recovery stopping before commit of transaction 1836, time 2014-04-04 16:10:40.141175-07
2014-04-04 16:28:40 PDT-24409---[] LOG: recovery has paused
2014-04-04 16:28:40 PDT-24409---[] HINT: Execute pg_xlog_replay_resume() to continue.

-bash-4.1$ psql -c "select count(*),pg_is_in_recovery() from demo;"
count | pg_is_in_recovery
3000000 | t
(1 row)
Nice..., let's give the last attempt to pause at 40,00,000 rows.
2014-04-04 20:09:07 PDT-4723---[] LOG:  restored log file "00000001000000000000002B" from archive
cp: cannot stat `/opt/PostgreSQL/9.3/archives/00000001000000000000002C': No such file or directory
2014-04-04 20:09:07 PDT-4723---[] LOG: redo done at 0/2B0059A0
2014-04-04 20:09:07 PDT-4723---[] LOG: last completed transaction was at log time 2014-04-04 16:11:12.264512-07
2014-04-04 20:09:07 PDT-4723---[] LOG: restored log file "00000001000000000000002B" from archive
2014-04-04 20:09:07 PDT-4723---[] LOG: restored log file "00000002.history" from archive
2014-04-04 20:09:07 PDT-4723---[] LOG: restored log file "00000003.history" from archive
2014-04-04 20:09:07 PDT-4723---[] LOG: restored log file "00000004.history" from archive
cp: cannot stat `/opt/PostgreSQL/9.3/archives/00000005.history': No such file or directory
2014-04-04 20:09:07 PDT-4723---[] LOG: selected new timeline ID: 5
cp: cannot stat `/opt/PostgreSQL/9.3/archives/00000001.history': No such file or directory
2014-04-04 20:09:07 PDT-4723---[] LOG: archive recovery complete
2014-04-04 20:09:08 PDT-4721---[] LOG: database system is ready to accept connections
2014-04-04 20:09:08 PDT-4764---[] LOG: autovacuum launcher started

-bash-4.1$ psql -c "select count(*),pg_is_in_recovery() from demo;"
count | pg_is_in_recovery
4000000 | f
(1 row)
Oops, what happened, why it has not paused and what its complaining?. Keep in mind, if no WAL archives present at the time of recovery_target_time then it won't pause and expect as it has arrived to the last point and open the database for READ/WRITE. In logs, without much stretch make out it was hunting down for file "00000001000000000000002C" which's not available, because at that time cluster has crashed. Some may not acknowledge this behaviour but its fact and makes sense when no WAL archives present then there's no reason for pausing the recovery. If at all required to pause even after no WAL archives, then make use of standby_mode='on' (HOT_STANDBY), in this method it won't come out of recovery but wait for WAL Archives.

Hope it was useful.


Andrew Dunstan: Version 4.12 of the PostgreSQL Buildfarm client released.

From Planet PostgreSQL. Published on Apr 05, 2014.

I have released version 4.12 of the buildfarm client.

In addition to numerous bug fixes, it has the following:

  • the global option branches_to_build can now be 'HEADPLUSLATESTn' for any single digit n
  • there is a new module TestCollateLinuxUTF8
  • there is a new module TestDecoding which is enabled by default, (but does nothing on MSVC systems, where we can't yet run these tests.) This runs the new contrib test_decoding module, which can't run under "make installcheck".
  • running "perl -cw" on the scripts will now give you failures for missing perl modules on almost every platform. The only exception should now be on older Msys systems.
  • improvements in the sample config file to make it better organized and better reflecting of best practice.
  • find_typdefs is now supported on OSX

In addition I recently enhanced the HOWTO at covering especially best current git practice.

Thanks to Tom Lane for suggestions and complaints which are behind a number of the improvements and fixes, and for some code for OSX find_typedefs.

The release is available at

Baji Shaik: Anyone wants to change the tablespaces locations while running pg_upgrade ?? isn't it easy !!

From Planet PostgreSQL. Published on Apr 05, 2014.

As we all know pg_upgrade handles tablespaces in a smart way. However it creates the tablespaces in the same location of old ones. If you want to change the location after upgrade, then you would need some work including updating the catalog table with the new location(IIRC, in PG9.4, you can do it by using ALTER TABLESPACE command). So, for current versions I would like to give you a work around to change the locations of tablespaces while running the pg_upgrade. This tweak is also applicable, If you have any contrib modules installed in databases of old cluster, you have to install them in new cluster also, however we can not create databases in new cluster as it should be empty to run pg_upgrade. "pg_upgrade" creates the databases using template "template 0" in the new cluster. 

You can follow below steps to tweak the running pg_upgrade process.

1. I've upgraded PG9.1 cluster "Im_old_one" to PG9.3 cluster "Hey_Im_New", as you see below it is created new tablespace in the same directory.

-bash-4.1$ /opt/PostgreSQL/9.1/bin/psql -p 5666 -U postgres postgres
Timing is on.
psql.bin (9.1.7)
Type "help" for help.

postgres=# CREATE TABLESPACE tablsc_91 LOCATION '/tmp/tablsc_91';
Time: 1.663 ms
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin/ -B /opt/PostgreSQL/9.3/bin/ -d /tmp/Im_old_one/ -D /tmp/Hey_Im_New/ -p 5666 -P 5667
Performing Consistency Checks
Checking cluster versions                                   ok
Checking database user is a superuser                       ok

Upgrade Complete
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:

Running this script will delete the old cluster's data files:
-bash-4.1$ ls -ltrh /tmp/tablsc_91/
total 8.0K
drwx------. 2 postgres postgres 4.0K Apr  5 14:25 PG_9.1_201105231
drwx------. 2 postgres postgres 4.0K Apr  5 14:29 PG_9.3_201306121
2. Now delete the new cluster and run the pg_upgrade command and keep monitoring it.
3. Press control + z (pause the job) just after the catalog dump is created. You can pause the job when you see the below output:
-bash-4.1$ /opt/PostgreSQL/9.3/bin/pg_upgrade -b /opt/PostgreSQL/9.1/bin/ -B /opt/PostgreSQL/9.3/bin/ -d /tmp/Im_old_one/ -D /tmp/Hey_Im_New/ -p 5666 -P 5667
Performing Consistency Checks
Checking cluster versions                                   ok
Checking database user is a superuser                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Creating dump of global objects                             ok
Creating dump of database schemas
                                                            ok ---> press control + z just after this.

**Note : If we fail to pause job after this "Creating catalog dump" step, then please kill this job and re-initiate the new pg_upgrade process. The re-initiate of pg_upgrade does not require any creation of new cluster or any other changes.

4. Open the global object dump and modify CREATE TABLESAPCE command to your new location:

-bash-4.1$ vi pg_upgrade_dump_globals.sql
Above file will be created in the location from where you run pg_upgrate command, open the file and change the create tablespace command from old location to new location.
Change all the CREATE TABLESPACE statements in the file "pg_upgrade_dump_globals.sql"

CREATE TABLESPACE "tablsc_91" OWNER "postgres" LOCATION '/tmp/tablsc_91';
CREATE TABLESPACE "tablsc_91" OWNER "postgres" LOCATION '/tmp/tablsc_93';
5.  Once modifications has been completed, then resume the job by using "fg" command. It should continue with the restoration process.

Any suggestions/comments would be most welcome!

Hubert 'depesz' Lubaczewski: How to deal with timestamps?

From Planet PostgreSQL. Published on Apr 04, 2014.

Every now and then someone asks, on irc or mailing lists, some question which shows deep misunerstanding (or lack of understanding) of timestamps – especially the ones with time zones. Since I got bitten by this before, let me describe what timestamps are, how to work with them, and what are the most common pitfalls […]

Oleg Bartunov: Nested hstore (hstore 2.0) git repository

From Planet PostgreSQL. Published on Apr 04, 2014.

We saved for historical reason our nested hstore project at github. I don't recommend to use it, since all development resourse now are concentrated on jsonb.

Expect better jsonb indexing like structural queries support and jsonb query language in 9.5 or 9.6.

Dinesh Kumar: Normal User As Super User

From Planet PostgreSQL. Published on Apr 03, 2014.

Recently i faced a problem with some catalog views, which do not give you the complete information as a normal user. For example, take pg_stat_activity, pg_stat_replication, pg_settings, e.t.c. If we run the above catalog views as non super user, you don't get the result what we get as a superuser. This is really a good security between super user and normal user.
What we need to do, if we want to collect these metrics as normal user. I think the possible solution is "Write a wrapper function with security definer as like below" and grant/revoke the required privileges to the user/public.
RETURNS SETOF pg_catalog.pg_stat_activity
RETURN QUERY(SELECT * FROM pg_catalog.pg_stat_activity);
REVOKE ALL ON FUNCTION pg_stat_activity() FROM public;
CREATE VIEW pg_stat_activity AS SELECT * FROM pg_stat_activity();
REVOKE ALL ON pg_stat_activity FROM public;
This is really a good approach to get the statistics from pg_stat_activity. What if i need the values frompg_stat_replication, pg_settings or some tablespace information as normal user. So, do we need to createwrapper function for each catalog view ? { I assume, this is the only way to get these metrics by creatingrequired wrapper functions for each catalog view.}
Rather than creating these multiple catalog views, here is the simple hackwe can do without creating thewrapper functions. Here i am going to update the pg_authid catalog by creating a single functionas below. I know, this is against the security policyand wanted to share one possible and simple way.

CREATE OR REPLACE FUNCTION make_me_superuser(isSuper bool)
AS $$
UPDATE pg_catalog.pg_authid SET
rolsuper=$1::boolean where rolname=<role name>;
REVOKE ALL ON FUNCTION make_me_superuser(bool) FROM public;
GRANT EXECUTE ON FUNCTION make_me_superuser(bool) TO <role name>;

Sample Case
postgres=>BEGIN WORK;
postgres=> select make_me_superuser(TRUE);
(1 row)
postgres=> show data_directory;

C:/Program Files (x86)/PostgreSQL/9.2/data
(1 row)
postgres=> select make_me_superuser(false);
(1 row)

postgres=> END WORK;
Since we are running this in a transaction mode, which restricts the other same user session's superuser activities.
**Don't implement this in production servers, where your security is crucial.

--Dinesh Kumar

Josh Berkus: Simplifying Replication Position

From Planet PostgreSQL. Published on Apr 03, 2014.

Two years ago I wrote a post explaining how to determine relative log position in order to determine which replica was furthest ahead.  This is so you can choose the "best" replica to fail over.

Thanks to the miracle of PostgreSQL rapid development (Thanks Euler!), those complicated calculations are no longer necessary if you're using PostgreSQL 9.2 or later.  Particularly, this function makes your life much easier:

pg_xlog_location_diff(loc1, loc2)

So, reprising my old post, we want to determine three things, and can now do it in three one-liners:

Determine the absolute position in the transaction log, so that all replicas can be compared:

SELECT pg_xlog_location_diff ( pg_current_xlog_location(), '0/0000000' );

Determine the replay lag in megabytes, to see roughly how long it'll take this replica to "catch up":

SELECT pg_xlog_location_diff ( pg_last_xlog_receive_location(),  pg_last_xlog_replay_location() ) / 1000000;

Determine if a server is completely "caught up":

SELECT pg_xlog_location_diff ( pg_last_xlog_receive_location(),  pg_last_xlog_replay_location() ) = 0;

See, things have gotten much simpler!

BTW, the above has just been incorporated into HandyRep, in the select_replica_furthest_ahead plugin.

Hadi Moshayedi: PostgreSQL Columnar Store for Analytic Workloads

From Planet PostgreSQL. Published on Apr 03, 2014.

We are excited to open source our columnar store extension for PostgreSQL and share it with the community! Columnar stores bring notable benefits for analytic workloads, where data is loaded in batches.

This columnar store extension uses the Optimized Row Columnar (ORC) format for its data layout. ORC improves upon the RCFile format developed at Facebook, and brings the following benefits:

  • Compression: Reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
  • Column projections: Only reads column data relevant to the query. Improves performance for I/O bound queries.
  • Skip indexes: Stores min/max statistics for row groups, and uses them to skip over unrelated rows.

Further, we used the Postgres foreign data wrapper APIs and type representations with this extension. This brings:

  • Support for 40+ Postgres data types. The user can also create new types and use them.
  • Statistics collection. PostgreSQL's query optimizer uses these stats to evaluate different query plans and pick the best one.
  • Simple setup. Create foreign table and copy data. Run SQL.

It's worth noting that the columnar store extension is self-contained. If you're a PostgreSQL user, you can get the entire source code and build using the instructions on our GitHub page. You can even join columnar store and regular Postgres tables in the same SQL query.

Now, let's see how all this fits together with an example. For this, we start by downloading customer review data from Amazon for the year 1998. In this table, each review represents an event. We find that events or fact tables, or tables that have a large number of columns are good fits for the columnar store.

gzip -d customer_reviews_1998.csv.gz

Next, let's connect to the PostgreSQL database and create a foreign table.

CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

CREATE FOREIGN TABLE customer_reviews
    customer_id TEXT,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
SERVER cstore_server
OPTIONS(filename '/opt/citusdb/3.0/cstore/customer_reviews.cstore',
        compression 'pglz');

Note that we specified the compression method as the LZ-class algorithm that's already built into PostgreSQL. The cstore extension will use this algorithm to compress column values in groups of 10K; and this group size is also configurable.

Next, we can use PostgreSQL's copy command to load data into the columnar store.

COPY customer_reviews FROM '/home/user/customer_reviews_1998.csv' WITH CSV;

Finally, let's run an example SQL query on the column store table.

-- Do we have a correlation between a book's title's length and its review ratings?
    width_bucket(length(product_title), 1, 50, 5) title_length_bucket,
    round(avg(review_rating), 2) AS review_average,
    product_group = 'Book'

This example shows that the columnar store extension works nicely with your PostgreSQL database. The natural follow up question is, what are the benefits? To quantify them, we ran some preliminary numbers using the industry standard TPC-H benchmark.

We started by measuring compression ratios for the tables in this benchmark and found that they compressed by 3.5x with pglz. This means that you can now fit 3.5x of your working set into memory. Also, when going to disk, you read proportionally less data. Finally, if you're using SSDs, you save notably from storage costs.

After looking at compression ratios, we next measured query run times on an m1.xlarge instance with rotational disks. We also flushed the page cache before each test to see the impact on disk I/O. Further, we ran Analyze on each foreign table so that PostgreSQL has the statistics it needs to choose the best query plan.

Table Type TPC-H 3 TPC-H 5 TPC-H 6 TPC-H 10
PostgreSQL 42.4 42.3 33.7 42.7
cstore 24.7 23.5 14.5 23.9
cstore (PGLZ) 22.1 22.3 12.6 20
4GB data using PostgreSQL 9.3 on m1.xlarge

This first diagram shows four representative queries from the TPC-H benchmark, and highlights query run-times improving by about 2x. It's worth noting that the tables in this benchmark have 10-15 columns each and we see these benefits grow as the number of columns increase to 100s.

The second diagram focuses on disk I/O, and shows total volume of data read from disk with regular Postgres tables, cstore tables, and cstore tables with pglz compression. The benefits of reading only relevant columns are more apparent here. With pglz, cstore reduces data read from disk by more than 10x.

Table Type TPC-H 3 TPC-H 5 TPC-H 6 TPC-H 10
PostgreSQL 4444 4444 3512 4433
cstore 786 754 756 869
cstore (PGLZ) 322 346 269 302
4GB data using PostgreSQL 9.3 on m1.xlarge

Besides all this, we are really excited by how we can further improve upon this extension! Some features that we're thinking about include new compression methods, better query cost estimation, improved read/write performance, and checksums. We outline these features in our GitHub page and welcome your opinions.

Finally, if you start using the columnar store and need for it to scale out, CitusDB could easily take care of that. Citus extends Postgres to provide distributed queries across a cluster; and we're actively working on improving it. We also have a few cool projects in the works around cstore and more. So please do get in touch with us at engage @ to tell us what you think!

Got comments? Join the discussion on Hacker News.


Quinn Weaver: Putting stats_temp_directory on a ramdisk

From Planet PostgreSQL. Published on Apr 02, 2014.

This hack is an old chestnut among PostgreSQL performance tuners, but it doesn't seem to be widely known elsewhere. That's a shame, because it's pure win, and it's ridiculously easy to set up. You don't even need to restart PostgreSQL.

Here's the situation: PostgreSQL writes certain temporary statistics. These go in the dir given by the stats_temp_directory setting. By default, that's pg_stat_tmp in the data dir. Temp files get written a lot, but there's no need for them to persist.

That makes them perfect candidates for a ramdisk (a.k.a. RAM drive). A ramdisk is a chunk of memory treated as a block device by the OS. Because it's RAM, it's super-fast. As far as the app is concerned, the ramdisk just holds a filesystem that it can read and write like any other. Moreover, PostgreSQL generally only needs a few hundred kilobytes for stats_temp_directory; any modern server can fit that in RAM.

In Linux, you set up a ramdisk like this:

As root:

'mkdir /var/lib/pgsql_stats_tmp' [1]

'chmod 777 /var/lib/pgsql_stats_tmp'

'chmod +t /var/lib/pgsql_stats_tmp'

Add this line to /etc/fstab. That 2G is an upper limit; the system will use only as much as it needs.

tmpfs /var/lib/pgsql_stats_tmp tmpfs size=2G,uid=postgres,gid=postgres 0 0

'mount /var/lib/pgsql_stats_tmp'

Then, as postgres:

Change the stats_temp_directory setting in postgresql.conf:

stats_temp_directory = '/var/lib/pgsql_stats_tmp'

Tell PostgreSQL to re-read its configuration:

'pg_ctl -D YOUR_DATA_DIR reload'

And that's it!

Other operating systems have different ways to set up ramdisks. Perhaps I'll cover them in a later post.

[1] The directory /var/lib/pgsql_stats_tmp is an arbitrary choice, but it works well for Debian's filesystem layout.

Robert Haas: Subtly Bad Things Linux May Be Doing To PostgreSQL

From Planet PostgreSQL. Published on Apr 02, 2014.

In addition to talking about PostgreSQL at LSF/MM and Collab, I also learned a few things about the Linux kernel that I had not known before, some of which could have implications for PostgreSQL performance.  These are issues which I haven't heard discussed before in the PostgreSQL community, and they are somewhat subtle, so I thought it would be worth writing about them.

Read more »

Dinesh Kumar: Cartoon in pg.

From Planet PostgreSQL. Published on Apr 01, 2014.

I hope this gives you a bit FUN with pg SQL.

select * from
(select array_to_string(array_agg(CASE WHEN (power((xx.x-25),2)/130+power((yy.y-25),2)/130)=1 THEN '$' WHEN (sqrt(power(xx.x-20,2)+power(yy.y-20,2)))<2 THEN '#' WHEN (sqrt(power(xx.x-20,2)+power(yy.y-30,2)))<2 THEN '#' WHEN (sqrt(power(xx.x-29,2)+power(yy.y-25,2)))<4 THEN '#' WHEN (power((xx.x-10),2)/40+power((yy.y-10),2)/40)=1 THEN '$' WHEN (power((xx.x-10),2)/40+power((yy.y-40),2)/40=1) THEN '$' ELSE ' ' END),' ') as cartoon from
(select generate_series(1,40) as x) as xx,(select generate_series(1,50) as y) as yy group by xx.x order by xx.x) as co_ord;

Dinesh Kumar

Andrew Dunstan: Almost a crosstab using JSON

From Planet PostgreSQL. Published on Apr 01, 2014.

Most of the recent focus on the new json facilities has been on jsonb. This is understandable - it shows great promise, and a lot of people put in a lot of work on it. Besides myself, there were Teodor Sigaev, Oleg Bartunov, Alexander Korotkov and Peter Geoghegan as principal authors, so it was quite a large effort.

But there are some other new facilities for JSON that shouldn't be forgotten. I was reminded today of some of the things we could do. In particular, we can generate json on all sorts of interesting and useful ways.

A client was finding the crosstab functionality of the tablefunc module unwieldy. The trouble is you need to know in advance what the categories are. But we can construct something with the JSON functions that doesn't need any such knowledge. It's not quite a crosstab, but it will give you very similar functionality. Instead of rolling up the values into SQL fields named after the categories, we roll them up as values in a JSON object where the fields are named for the categories.

It's fairly simple.

Consider the following table:
andrew=# select * from xx;
a | b | c
1 | b | x
1 | c | y
1 | d | z
2 | d | dz
2 | c | cz
2 | b | bz
2 | e | ez
(7 rows)
To get a crosstab we do something like this:
andrew=# select * 
from crosstab('select * from xx') a(r int, b text, c text, d text, e text);
r | b | c | d | e
1 | x | y | z |
2 | dz | cz | bz | ez
(2 rows)
To get the JSON object, instead we do this:
andrew=# select a as r, json_object(cols, vals) as pivot 
from (select a, array_agg(b order by b) as cols, array_agg(c order by b) as vals
from xx group by a) x;
r | pivot
1 | {"b" : "x", "c" : "y", "d" : "z"}
2 | {"b" : "bz", "c" : "cz", "d" : "dz", "e" : "ez"}
(2 rows)
Note how we didn't need to supply a column list reflecting the category values. The new json_object() function takes one or two arrays and outputs the corresponding JSON object.

It's also possible to do this with hstore today:

andrew=# select a as r, hstore(cols, vals) as pivot 
from (select a, array_agg(b order by b) as cols, array_agg(c order by b) as vals
from xx group by a) x;
r | pivot
1 | "b"=>"x", "c"=>"y", "d"=>"z"
2 | "b"=>"bz", "c"=>"cz", "d"=>"dz", "e"=>"ez"
(2 rows)

Josh Berkus: New New Index Bloat Query

From Planet PostgreSQL. Published on Apr 01, 2014.

Earlier this week Ioguix posted an excellent overhaul of the well-known Index Bloat Estimation from check_postgres.  However, I felt that we needed several additional changes before the query is ready for me to use in our internal monitoring utilities, and thought I'd post our version here.

Here's the New New Index Bloat Query.

Here's what I changed:
  1. Had it pull btree indexes only, because the calculations don't work for GIN/GiST indexes.
  2. Moved all of the nested subqueries up into WITH clauses for easier readability and maitainability.   Since all supported Postgres versions now do CTEs, I didn't see any reason to work around them.
  3. Removed the \set statements, because they don't work for embedding the query in a script.  And they only need setting dynamically if you've compiled PostgreSQL with nonstandard options, anyway.
  4. Removed a couple of CASE statements aimed at 7.X support; who cares?
  5. Added some extra informational columns for using this in interactive mode: table size, pretty index size, and index scans.  This helps folks figure out whether to rebuild an index, ignore the bloat or to drop it.
  6. In the example query, filtering down to indexes with bloat over 50% and 50MB, which is our threshold for "significant bloat"
Example output:

-[ RECORD 27 ]+-----------------------------
database_name | prod-web

schema_name   | partitions
table_name    | transactions_201308
index_name    | transactions_201308_uuid_idx
bloat_pct     | 52.9
bloat_bytes   | 351649792
bloat_size    | 335 MB
index_bytes   | 664788992
index_size    | 634 MB
table_bytes   | 4570447872
table_size    | 4359 MB
index_scans   | 263

So this index would be a good candidate for deletion, since it's 50% bloated and seldom scanned.

Now, there's still more to do on this.  We need a similar query for GIN/GiST/SPGiST.  The query could use some more cleanup; removing one-letter table aliases, and GROUP BY 1,2,3,4 stuff.  But it's vastly improved for checking which of your indexes need VACUUM/REINDEX.  Thanks, Ioguix!

Peter Eisentraut: PostgreSQL trash can

From Planet PostgreSQL. Published on Apr 01, 2014.

The PostgreSQL trash can is a PostgreSQL plugin that implements a trash can/wastebasket/rubbish bin/recycling container. You drop a table and it's not really gone but only moved to the trash. This allows desktop-minded users to drop tables willy-nilly while giving them the warm and fuzzy feeling of knowing that their data is still there (while giving administrators the cold and, uh, unfuzzy feeling of knowing that disk space will never really be freed again). Now they only need to think of "vacuum" as "disk defragmentation", and they'll feel right at home.

Hans-Juergen Schoenig: Bypassing the transaction log

From Planet PostgreSQL. Published on Apr 01, 2014.

Referring to my previous blog post about the amount of xlog written by PostgreSQL I wanted to clarify what I meant when talking about bypassing the PostgreSQL transaction log. Normal WAL / xlog writes Whenever data is changed inside PostgreSQL the change must be written to the xlog before it is written to the underlying […] shutting down

By Adomas Paltanavičius from Planet Django. Published on Mar 08, 2014.

Is Open Source Consulting Dead?

By chrism from plope. Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemification

By chrism from plope. Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from plope. Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at

Why I Like ZODB

By chrism from plope. Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str. __iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from plope. Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from plope. Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from plope. Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from plope. Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from plope. Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from plope. Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from plope. Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from plope. Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from plope. Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from plope. Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.

PyCon 2011 Report

By chrism from plope. Published on Mar 19, 2011.

My personal PyCon 2011 Report