
Open Source Posts

Josh Berkus: SFPUG August: Tour de Bruce

From Planet PostgreSQL. Published on Jul 30, 2015.


Original Core Team member Bruce Momjian will be visiting the Bay Area this August, and will be doing two different talks, both for SF City and North Bay. If you want the full Brucealicious experience, you can go to both! For San Francisco, we're taking a vote on which topic Bruce should present.

Join us, and welcome Bruce.

Photo (c) 2006 Michael Glaesmann

Josh Berkus: Texas Trip: DjangoCon and Postgres Open

From Planet PostgreSQL. Published on Jul 30, 2015.



This September, I will be having a great Texas adventure, and you're invited along. 

September 8-10, Austin: DjangoCon.US.   I will be presenting "PostgreSQL Performance in 15 Minutes" on the 9th. 

September 9th or 10th, Austin: I will speak at AustinPUG.  Date, location and exact topic still TBA.  Follow the AustinPUG Meetup, or check back here in a week for an update.

September 16-18, Dallas: Postgres Open:  I will be doing my Replication 101 tutorial, and then Explaining Explain.

September is a great time to be in Texas: temperatures are reasonable, Texans are friendly, and there's lots of great Mexican food and barbecue.  So, register for DjangoCon and/or Postgres Open today and join me!

Oh, and the photo? That's one of the hand-thrown DjangoCon pint cups I'm making as speaker gifts for DjangoCon.  Don't you wish you'd submitted a talk, now?

Automatic visual image enhancement for your web application

By Cloudinary Blog - Django from Django community aggregator: Community blog posts. Published on Jul 29, 2015.

VIESUS automatic image enhancement

Various factors affect the visual quality of photos captured by digital cameras. Technical limitations of cameras, coupled with the changing conditions in which users take photos, result in a wide range of visual quality. Camera-related limitations arise from a combination of poor optics, noisy sensors, and the modest capabilities of mobile camera phones that are used to take photos in conditions that range from bright daylight to indoor scenes with incandescent light or even dark night scenes.

If you have lots of spare time, one option is to spend hours trying to enhance your images by adjusting brightness and color, restoring sharpness, removing noise, correcting for overexposure or underexposure, etc. The results achieved will depend not only on your training and experience with the photo editing software, but also on the quality, condition and calibration of the monitor used. Manual fine-tuning is also time-consuming, and as the amount of image content is constantly growing, there is an obvious need for automatic image enhancement.

VIESUS logo

VIESUS™ is a software application developed by Imaging Solutions AG that takes everyday digital camera images and enhances them to look more visually attractive. VIESUS first analyses the image data then automatically applies any processing steps as needed: fixing dull colors and bad color balance, removing digital noise, adjusting poor sharpness / blurriness, correcting for overexposure or underexposure, and more.

Cloudinary provides an add-on for using VIESUS's image enhancement capabilities, fully integrated into Cloudinary's image management and manipulation pipeline. With VIESUS's image enhancement add-on, you can extend Cloudinary's powerful image manipulation and optimization capabilities by automatically enhancing images to their best visual quality.

Automatically enhancing images

Cloudinary already supports on-the-fly manipulation using URLs for resizing, cropping, applying effects, etc. Now you can also use VIESUS as an effect by setting the effect transformation parameter to viesus_correct (or e_viesus_correct for URLs) which tells Cloudinary to dynamically enhance the image to the best visual quality using the VIESUS add-on.
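If you only need the delivery URL itself rather than a full image tag, the Python SDK's URL helper can build it for you. The following is a minimal sketch with placeholder credentials and the demo image used below; it assumes the cloudinary Python package and sets sign_url, matching the SDK examples that follow.

import cloudinary
from cloudinary.utils import cloudinary_url

# Placeholder credentials -- replace with your own cloud settings.
cloudinary.config(cloud_name="demo", api_key="KEY", api_secret="SECRET")

# cloudinary_url returns (url, options); the URL includes the e_viesus_correct transformation.
url, _options = cloudinary_url(
    "golden_gate_side.jpg",
    width=350,
    crop="scale",
    effect="viesus_correct",  # becomes e_viesus_correct in the URL
    sign_url=True,
)
print(url)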

Take a look at the following photo of the Golden Gate Bridge in San Francisco, which was uploaded to Cloudinary's demo account as golden_gate_side.jpg. The original photo on the left has darkened colors, low contrast and poor sharpness, and looks like it was taken on an overcast day. In the VIESUS-enhanced photo on the right, the brightness and contrast are increased and the colors appear sharper and more vivid; the photo now looks like it was taken on a bright sunny day.

Original image | Auto-corrected image

Ruby:
cl_image_tag("golden_gate_side.jpg", :width=>350, :crop=>:scale, :effect=>"viesus_correct", :sign_url=>true)
PHP:
cl_image_tag("golden_gate_side.jpg", array("width"=>350, "crop"=>"scale", "effect"=>"viesus_correct", "sign_url"=>true))
Python:
CloudinaryImage("golden_gate_side.jpg").image(width=350, crop="scale", effect="viesus_correct", sign_url=True)
Node.js:
cloudinary.image("golden_gate_side.jpg", {width: 350, crop: "scale", effect: "viesus_correct", sign_url: true})
Java:
cloudinary.url().transformation(new Transformation().width(350).crop("scale").effect("viesus_correct")).signed(true).imageTag("golden_gate_side.jpg")
jQuery:
$.cloudinary.image("golden_gate_side.jpg", {width: 350, crop: "scale", effect: "viesus_correct"})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation().Width(350).Crop("scale").Effect("viesus_correct")).Signed(true).BuildImageTag("golden_gate_side.jpg")

Further image manipulations

Visual enhancement using the VIESUS add-on can be mixed with any of Cloudinary's rich set of image manipulation capabilities. The VIESUS add-on can also enhance a generated image, so instead of improving the original large photo, you can separately enhance each thumbnail or cropped version you would like to display.

For example, the following code generates and delivers a version of the uploaded golden_gate_side photo as follows:

  • Crops the photo to 80% of its width and 35% of its height with east gravity, and applies the viesus_correct effect.
  • Adds another uploaded PNG image named viesus_icon as an overlay. The overlay is resized to a width of 400 pixels, positioned 10 pixels from the top right corner of the containing image, and given 40% opacity.
  • Scales the entire image down to a width of 600 pixels and adds rounded corners.

Without visual enhancement:

golden_gate_side.jpg cropped to 600 pixels with rounded corners and a logo overlay

With VIESUS visual enhancement:

golden_gate_side.jpg cropped to 600 pixels with rounded corners, enhanced with viesus and a logo overlay

Ruby:
cl_image_tag("golden_gate_side.jpg", :sign_url=>true, :transformation=>[
  {:width=>0.8, :height=>0.35, :crop=>:crop, :gravity=>:east, :effect=>"viesus_correct"},
  {:opacity=>40, :overlay=>"viesus_icon", :width=>400, :x=>10, :y=>10, :crop=>:scale, :gravity=>:north_east},
  {:radius=>20, :width=>600, :crop=>:scale}
  ])
PHP:
cl_image_tag("golden_gate_side.jpg", array("sign_url"=>true, "transformation"=>array(
  array("width"=>0.8, "height"=>0.35, "crop"=>"crop", "gravity"=>"east", "effect"=>"viesus_correct"),
  array("opacity"=>40, "overlay"=>"viesus_icon", "width"=>400, "x"=>10, "y"=>10, "crop"=>"scale", "gravity"=>"north_east"),
  array("radius"=>20, "width"=>600, "crop"=>"scale")
  )))
Python:
CloudinaryImage("golden_gate_side.jpg").image(sign_url=True, transformation=[
  {"width": 0.8, "height": 0.35, "crop": "crop", "gravity": "east", "effect": "viesus_correct"},
  {"opacity": 40, "overlay": "viesus_icon", "width": 400, "x": 10, "y": 10, "crop": "scale", "gravity": "north_east"},
  {"radius": 20, "width": 600, "crop": "scale"}
  ])
Node.js:
cloudinary.image("golden_gate_side.jpg", {sign_url: true, transformation: [
  {width: 0.8, height: 0.35, crop: "crop", gravity: "east", effect: "viesus_correct"},
  {opacity: 40, overlay: "viesus_icon", width: 400, x: 10, y: 10, crop: "scale", gravity: "north_east"},
  {radius: 20, width: 600, crop: "scale"}
  ]})
Java:
cloudinary.url().transformation(new Transformation()
  .width(0.8).height(0.35).crop("crop").gravity("east").effect("viesus_correct").chain()
  .opacity(40).overlay("viesus_icon").width(400).x(10).y(10).crop("scale").gravity("north_east").chain()
  .radius(20).width(600).crop("scale")).signed(true).imageTag("golden_gate_side.jpg")
jQuery:
$.cloudinary.image("golden_gate_side.jpg", {transformation: [
  {width: 0.8, height: 0.35, crop: "crop", gravity: "east", effect: "viesus_correct"},
  {opacity: 40, overlay: "viesus_icon", width: 400, x: 10, y: 10, crop: "scale", gravity: "north_east"},
  {radius: 20, width: 600, crop: "scale"}
  ]})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(0.8).Height(0.35).Crop("crop").Gravity("east").Effect("viesus_correct").Chain()
  .Opacity(40).Overlay("viesus_icon").Width(400).X(10).Y(10).Crop("scale").Gravity("north_east").Chain()
  .Radius(20).Width(600).Crop("scale")).Signed(true).BuildImageTag("golden_gate_side.jpg")

For more detailed information on implementing this automatic visual enhancement to your images, see the VIESUS™ add-on documentation, and for a full list of Cloudinary's image manipulation options, see the Image transformations documentation.

Summary

Enhancing your images and user-uploaded photos makes your website look nicer and improves user engagement. The VIESUS add-on extends Cloudinary's powerful image manipulation and optimization capabilities by automatically enhancing images to their best visual quality. Simply add a single parameter to your image URLs and everything is done seamlessly, dynamically and automatically for you.

VIESUS automatic visual enhancement  add-on

The free tier of the VIESUS add-on is available on all our free and paid plans. If you don't have a Cloudinary account, you are welcome to sign up for a free account and try it out.

Hadi Moshayedi: cstore_fdw 1.3 released

From Planet PostgreSQL. Published on Jul 29, 2015.

Citus Data is excited to announce the release of cstore_fdw 1.3, which is available on GitHub at github.com/citusdata/cstore_fdw. cstore_fdw is an open source columnar store extension for PostgreSQL created by Citus Data which reduces the data storage footprint and disk I/O for PostgreSQL databases.

cstore_fdw Changes

The changes in this release include:

  • ALTER FOREIGN TABLE ADD/DROP COLUMN ... support. You can now add/drop columns on existing cstore tables. Default values on newly added columns are also supported with some restrictions.
  • Improved COPY FROM support. Column references in the COPY command are now supported, so you can specify the list of columns to be copied from a file to a cstore table (see the sketch after this list).
  • Query performance improvements. Table row count estimation works better, which allows the query planner to prepare better query plans.
  • (BugFix) Whole row references. You can now use whole-row references in function calls, like SELECT to_json(cstore_table.*) FROM cstore_table.
  • (BugFix) Insert concurrency. Deadlock issue during concurrent insert is resolved.
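To make the first two items above concrete, here is a small psycopg2 sketch; the table, columns and file path are hypothetical, and the SQL simply exercises the new ALTER FOREIGN TABLE and column-list COPY support described in the list.

import psycopg2

# Hypothetical cstore table and CSV file -- adjust names and paths to your setup.
conn = psycopg2.connect("dbname=analytics")
conn.autocommit = True
cur = conn.cursor()

# New in 1.3: add a column to an existing cstore table
# (default values are supported with some restrictions).
cur.execute("ALTER FOREIGN TABLE events ADD COLUMN source text DEFAULT 'import'")

# New in 1.3: COPY with an explicit column list into a cstore table.
with open("/tmp/events.csv") as f:
    cur.copy_expert("COPY events (event_time, user_id) FROM STDIN WITH CSV", f)

cur.close()
conn.close()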

For installation and update instructions, you can download the cstore_fdw Quick Start Guide from our website or you can view installation and update instructions on the cstore_fdw page on GitHub.

To learn more about what’s coming up for cstore_fdw, see our development roadmap.

Got questions?

If you have questions about cstore_fdw, please contact us using the cstore-users Google group.

If you discover an issue when using cstore_fdw, please submit it to cstore_fdw’s issue tracker on GitHub.

Further information about cstore_fdw is available on our website where you can also find information on ways to contact us with questions.

Read more...

Using Unsaved Related Models for Sample Data in Django 1.8

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jul 28, 2015.

Note: In between the time I originally wrote this post and it getting published, a ticket and pull request were opened in Django to remove allow_unsaved_instance_assignment and move validation to the model save() method, which makes much more sense anyways. There's a chance this will even be backported to Django 1.8. So, if you're using a version of Django that doesn't require this, hopefully you'll never stumble across this post in the first place! If this is still an issue for you, here's the original post:

In versions of Django prior to 1.8, it was easy to construct "sample" model data by putting together a collection of related model objects, even if none of those objects was saved to the database. Django 1.8 adds a restriction that prevents this behavior. Errors such as this are generally a sign that you're encountering this issue:

ValueError: Cannot assign "...": "MyRelatedModel" instance isn't saved in the database.

The justification for this is that unsaved foreign keys were previously silently lost if the related object was never saved to the database. Django 1.8 does provide a backwards compatibility flag to allow working around the issue. The workaround, per the Django documentation, is to create a new ForeignKey field class that removes this restriction, like so:

class UnsavedForeignKey(models.ForeignKey):
    # A ForeignKey which can point to an unsaved object
    allow_unsaved_instance_assignment = True

class Book(models.Model):
    author = UnsavedForeignKey(Author)

This may be undesirable, however, because it means you lose the protection for every use of this foreign key, even in cases where you want Django to ensure that values have been saved before being assigned.

There is a middle ground, not immediately obvious, that involves changing this attribute temporarily during the assignment of an unsaved value and then immediately changing it back. This can be accomplished by writing a context manager to change the attribute, for example:

import contextlib

@contextlib.contextmanager
def allow_unsaved(model, field):
    # Temporarily allow assigning unsaved instances to model.field.
    model_field = model._meta.get_field(field)
    saved = model_field.allow_unsaved_instance_assignment
    model_field.allow_unsaved_instance_assignment = True
    try:
        yield
    finally:
        # Restore the original setting even if the block raises.
        model_field.allow_unsaved_instance_assignment = saved

To use this context manager, surround any assignment of an unsaved foreign key value with it as follows:

with allow_unsaved(MyModel, 'my_fk_field'):
    my_obj.my_fk_field = unsaved_instance
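Putting it together with the Book and Author models from the earlier example, building unsaved sample data might look like the sketch below. The name and title fields and the preview use case are illustrative assumptions; the point is simply that neither object ever touches the database.

from myapp.models import Author, Book  # hypothetical app layout

# Build sample objects without saving anything.
author = Author(name="Jane Doe")
book = Book(title="An Unsaved Tale")

with allow_unsaved(Book, 'author'):
    book.author = author

# Neither object is saved; use them to render a preview template,
# generate sample screenshots, etc.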

The specifics of how you access the field to pass into the context manager are important; any other way will likely generate the following error:

RelatedObjectDoesNotExist: MyModel has no instance.

While strictly speaking this approach is not thread safe, it should work for any process-based worker model (such as the default "sync" worker in Gunicorn).

This took a few iterations to figure out, so hopefully it will (still) prove useful to someone else!

Josh Berkus: PipelineDB: streaming Postgres

From Planet PostgreSQL. Published on Jul 28, 2015.

If you've been following the tech news, you might have noticed that we have a new open source PostgreSQL fork called "PipelineDB".  Since I've joined the advisory board of PipelineDB, I thought I'd go over what it is, what it does, and what you'd use it for.  If you're not interested in Pipeline, or Postgres forks, you can stop reading now.

PipelineDB is a streaming database version of PostgreSQL.  The idea of a streaming database, first introduced in the PostgreSQL fork TelegraphCQ back in 2003, is that queries are run against incoming data before it is stored, as a kind of stream processing with full query support.  If the idea of a standard database is "durable data, ephemeral queries" the idea of a streaming database is "durable queries, ephemeral data".  This was previously implemented in StreamBase, StreamSQL, and the PostgreSQL fork Truviso. In the Hadoop world, the concept is implemented in Apache SparkSQL.

On a practical level, what streaming queries do is allow you to eliminate a lot of ETL and redundant or temporary storage.

PipelineDB 0.7.7 contains 100% of PostgreSQL 9.4, plus the ability to create Continuous Views, which are essentially standing queries that produce different data each time you query them, depending on the incoming stream.  The idea is that you create the queries which filter and/or summarize the data you're looking for in the stream, and store only the data you want to keep, which can go in regular PostgreSQL tables.

As an example of this, we're going to use PipelineDB to do tag popularity counts on Twitter.  Twitter has a nice streaming API, which gives us some interesting stream data to work with.  First I spun up a PipelineDB Docker container.  Connecting to it, I created the "twitter" database and a static stream called "tweets":

Creating a static stream isn't, strictly speaking, necessary; you can create a Continuous View without one.  As a career DBA, though, implied object names give me the heebie-jeebies.  Also, in some future release of PipelineDB, static streams will have performance optimizations, so it's a good idea to get used to creating them now.

    docker run pipelinedb/pipelinedb
    josh@Radegast:~$ psql -h 172.17.0.88 -p 6543 -U pipeline
    Password for user pipeline:
    psql (9.4.1, server 9.4.4)
    Type "help" for help.
    pipeline=# create database twitter;
    CREATE DATABASE
    pipeline=# \c twitter
    twitter=# create stream tweets ( content json );
    CREATE STREAM


Then I created a Continuous View which pulls out all identified hashtags from each tweet.  To do this, I have to reach deep inside the JSON of the tweet structure and use json_array_elements to expand that into a column.  Continuous Views also automatically add a timestamp column called "arrival_timestamp", which is the server timestamp when that particular streaming row showed up.  We can use this to create a 1-hour sliding window over the stream by comparing it to clock_timestamp().  Unlike regular views, volatile expressions are allowed in Continuous Views.

    CREATE CONTINUOUS VIEW tagstream as
    SELECT json_array_elements(content #>
      ARRAY['entities','hashtags']) ->> 'text' AS tag
    FROM tweets
    WHERE arrival_timestamp >
          ( clock_timestamp() - interval '1 hour' );


This pulls a continuous column of tags which appear in the San Francisco stream.

Then I created a linked Docker container with all of the Python tools I need to use TwitterAPI, and then wrote this little script based on a TwitterAPI example.  This pulls a stream of tweets with geo turned on and identified as being in the area of San Francisco.  Yes, INSERTing into the stream is all that's required for a client to deliver stream data.  If you have a high volume of data, it's better to use the COPY interface if your language supports it.
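A rough Python sketch of such a script might look like the following, using the TwitterAPI package and psycopg2, with placeholder Twitter credentials and the connection details from the psql session above; treat it as an illustration rather than the original script.

import psycopg2
from psycopg2.extras import Json
from TwitterAPI import TwitterAPI

# Placeholder Twitter credentials.
api = TwitterAPI("consumer_key", "consumer_secret",
                 "access_token", "access_token_secret")

conn = psycopg2.connect(host="172.17.0.88", port=6543,
                        dbname="twitter", user="pipeline")
conn.autocommit = True
cur = conn.cursor()

# Stream tweets geotagged around San Francisco and push each one into the
# "tweets" stream; a plain INSERT is all PipelineDB needs.
sf_box = '-122.75,36.8,-121.75,37.8'
response = api.request('statuses/filter', {'locations': sf_box})
for tweet in response.get_iterator():
    cur.execute("INSERT INTO tweets (content) VALUES (%s)", (Json(tweet),))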

Then I started it up, and it started pushing tweets into my stream in PipelineDB.  After that, I waited an hour for the stream to populate with an hour's worth of data.

Now, let's do some querying:

    twitter=# select * from tagstream limit 5;
        tag    
    ------------
    Glendale
    Healthcare
    Job
    Jobs
    Hiring


How about the 10 most popular tags in the last hour?

    twitter=# select tag, count(*) as tag_count from tagstream group
              by tag order by tag_count desc limit 10;
         tag       | tag_count
    ---------------+-----------
    Hiring         |       211
    Job            |       165
    CareerArc      |       163
    Jobs           |       154
    job            |       138
    SanFrancisco   |        69
    hiring         |        60
    FaceTimeMeNash |        42
    CiscoRocks     |        35
    IT             |        35


I detect a theme here.  Namely, one of sponsored tweets by recruiters.

Now, obviously you could do this by storing all of the tweets in an unlogged table, then summarizing them into another table, etc.  However, using continuous views avoids a bunch of disk-and-time-wasting store, transform, store again if you're uninterested in the bulk of the data (and tweet metadata is bulky indeed). Further, I can create more continuous views based on the same stream, and pull different summary information on it in parallel. 

So, there you have it: PipelineDB, the latest addition to the PostgreSQL family.  Download it or install it using the Docker container I built for it.

Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: Sex Offenders

From Planet PostgreSQL. Published on Jul 24, 2015.

We’re finally at the end of the 10-part PGDB (PostgreSQL) performance series I use to initiate new developers into the database world. To that end, we’re going to discuss something that affects everyone at one point or another: index criteria. Or to put it another way:

Why isn’t the database using an index?

It’s a fairly innocuous question, but one that may have a surprising answer: the index was created using erroneous assumptions. Let’s explore what happens in a hospital environment with a pared-down table of patients.

DROP TABLE IF EXISTS sys_patient;
 
CREATE TABLE sys_patient
(
    patient_id  SERIAL   NOT NULL,
    full_name   VARCHAR  NOT NULL,
    birth_dt    DATE     NOT NULL,
    sex         CHAR     NOT NULL
);
 
INSERT INTO sys_patient (full_name, birth_dt, sex)
SELECT 'Crazy Person ' || a.id,
       CURRENT_DATE - (a.id % 100 || 'y')::INTERVAL
                    + (a.id % 365 || 'd')::INTERVAL,
       CASE WHEN a.id % 2 = 0 THEN 'M' ELSE 'F' END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_patient ADD CONSTRAINT pk_patient_id
      PRIMARY KEY (patient_id);
 
CREATE INDEX idx_patient_birth_dt ON sys_patient (birth_dt);
CREATE INDEX idx_patient_sex ON sys_patient (sex);
 
ANALYZE sys_patient;

This particular hospital has a few queries that operate based on the sex of the patient, so someone created an index on that column. One day, another developer is doing some code refactoring and, being well-trained by the resident DBA, runs the query through EXPLAIN to check the query plan. Upon seeing the result, the dev curses a bit, tries a few variants, and ultimately takes the issue to the DBA.

This is what the developer saw:

EXPLAIN ANALYZE 
SELECT *
  FROM sys_patient
 WHERE sex = 'F';
 
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan ON sys_patient  
      (cost=0.00..19853.00 ROWS=498233 width=29)
      (actual TIME=0.018..541.738 ROWS=500000 loops=1)
   FILTER: (sex = 'F'::bpchar)
   ROWS Removed BY FILTER: 500000
 
 Planning TIME: 0.292 ms
 Execution TIME: 823.901 ms

No matter what the dev did, the database adamantly refused to use the idx_patient_sex index. The answer is generally obvious to a DBA or a relatively seasoned developer, but this actually happens far more frequently than one might think. This is an extreme example, yet even experienced database users, report writers, and analysts make this mistake.

Before using an index, the database essentially asks a series of questions:

  1. How many matches do I expect from this index?
  2. What proportion of the table do these matches represent?
  3. Are the cumulative random seeks faster than filtering the table?

If the answer to any of those questions is too large or negative, the database will not use the index. In our example, the sex column only has two values, and thus the answers to the above questions are more obvious than usual. With one million rows, a query only on the sex column would match half of them. In addition, randomly seeking 500,000 results is likely an order of magnitude slower than simply filtering the whole table for matches.

But it’s not always so easy to figure out what kind of cardinality to expect from a table column. Short of checking every column of every table with count(DISTINCT my_col) or something equally ridiculous, someone unfamiliar with the data in a complex table architecture would get stuck. However, in order to answer the above questions, the database itself must track certain statistics about table contents.

It just so happens that PGDB makes that data available to everyone through the pg_stats view. Let’s check what PostgreSQL has stored regarding the sys_patient table.

SELECT attname AS column_name, n_distinct
  FROM pg_stats
 WHERE tablename = 'sys_patient';
 
 column_name | n_distinct 
-------------+------------
 patient_id  |         -1
 full_name   |         -1
 birth_dt    |       7310
 sex         |          2

Interpreting these results is actually very easy. Any column with a negative n_distinct value is a ratio approaching 1. At -1, there’s a one-to-one relationship with the number of rows in the table, and the number of distinct values in that column. As a general rule, nearly any column with a negative value here is a good index candidate because a WHERE clause will reduce the potential results significantly.

Positive values are an absolute count of unique values for that column. During table analysis, the database checks a random sampling of rows and tabulates statistics based on them. That means the value in n_distinct is representative instead of exact, but usually doesn’t deviate by a significant margin. The data here doesn’t need to be viable for reporting, just to calculate an efficient query plan.

From here, we can see that the sex column would likely be a terrible index candidate, even if we know nothing else about the table. There are simply not enough distinct values to reduce the amount of matches for a query.
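If you'd rather not eyeball pg_stats by hand, a short script can flag suspicious candidates across a schema. This is only a sketch with an arbitrary threshold and a hypothetical database name; remember that negative n_distinct values are ratios rather than counts, and those columns generally index well.

import psycopg2

conn = psycopg2.connect("dbname=hospital")  # hypothetical database
cur = conn.cursor()

# Columns with a small positive n_distinct are usually poor index candidates.
cur.execute("""
    SELECT tablename, attname, n_distinct
      FROM pg_stats
     WHERE schemaname = 'public'
       AND n_distinct BETWEEN 1 AND 100
""")
for table, column, n_distinct in cur.fetchall():
    print("%s.%s: only %d distinct values" % (table, column, n_distinct))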

Given all of this, the dev has nothing to fear; the idx_patient_sex index should have never existed in the first place. A query that needs to fetch all of any particular sex will simply require a sequential scan, and that’s fine.

Creating indexes can be a game, and sometimes the only way to win is not to play.

PyCon 2015 Workshop Video: Building SMS Applications with Django

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jul 24, 2015.

As proud sponsors of PyCon, we hosted a one and a half hour free workshop. We see the workshops as a wonderful opportunity to share some practical, hands-on experience in our area of expertise: building applications in Django. In addition, it’s a way to give back to the open source community.

This year, Technical Director Mark Lavin and Developers Caleb Smith and David Ray presented “Building SMS Applications with Django.” In the workshop, they taught the basics of SMS application development using Django and Django-based RapidSMS. Aside from covering the basic anatomy of an SMS-based application, as well as building SMS workflows and testing SMS applications, Mark, David, and Caleb were able to bring their practical experience with Caktus client projects to the table.

We’ve used SMS on behalf of international aid organizations and agencies like UNICEF as a cost-effective and pervasive method for conveying urgent information. We’ve built tools to help Libyans register to vote via SMS, deliver critical infant HIV/AIDs results in Zambia and Malawi, and alert humanitarian workers of danger in and around Syria.

Interested in SMS applications and Django? Don’t worry. If you missed the original workshop, we have good news: we recorded it. You can participate by watching the video above!

Michael Paquier: Minimalistic Docker container with Postgres

From Planet PostgreSQL. Published on Jul 23, 2015.

Docker is well-known, used everywhere and by everybody, and is a nice piece of technology; there is nothing to say about that.

Now, before moving on with the real stuff, note that for the sake of this post all the experiments are done on a Raspberry Pi 2, to increase the difficulty of the exercise a bit and to get a wider understanding of how to manipulate Docker containers and images at a rather low level, for the reasons given in the next paragraph.

So, to move back to Docker... It is a bit sad to see that there are not many container images based on ARM architectures, even though there are many such machines around. Also, the size of a single container image can easily reach a couple of hundred megabytes in its most simple shape (which does not change the fact that some of those images are very popular, so perhaps the author of this blog should not do experiments on such small-scale machines to begin with).

Not all container images are that large though; there is for example one based on the minimalistic distribution Alpine Linux, with a size of less than 5MB. Many packages are available for it as well, which makes it a nice base image for more extended operations. Now, the fact is that even though Alpine Linux does publish deliverables for ARM, there are no Docker containers around that make use of them, and trying to use a container image compiled for, say, x86_64 would just result in an epic failure.

Hence, by extending a bit a script from the upstream Docker facility of Alpine Linux, it is actually easily possible to create from scratch a container image able to run on ARM architectures (the trick has been to consider the fact that Alpine Linux publishes its ARM deliverables with the alias armhf). Note in any case the following things about this script:

  • root rights are needed
  • an ARM environment needs to be used to generate an ARM container
  • the script is here

Roughly, what this script does is fetch a minimal base image of Alpine Linux and then import it into an image using "docker import".

Once run simply as follows, it will register a new container image:

$ ./mkimage-alpine.sh
[...]
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
alpine-armv7l       edge                448a4f53f4df        About an hour ago   4.937 MB
alpine-armv7l       latest              448a4f53f4df        About an hour ago   4.937 MB

The size is drastically small, and comparable to the container image already available in the Docker registry. Now, moving on to things directly regarding Postgres: how much would it cost to have a container image able to run Postgres? Let's use the following Dockerfile and have a look at it (when building, the file needs to be named Dockerfile):

$ cat Dockerfile_postgres
FROM alpine-armv7l:edge
RUN echo http://nl.alpinelinux.org/alpine/edge/testing >> /etc/apk/repositories && \
apk update && \
apk add shadow postgresql bash

Note that here the package shadow is included to have pam-related utilities like useradd and usermod as Postgres cannot run as root, and it makes life simpler (and shadow is only available in the repository testing). After building the new container image, let's look at its size:

$ docker build -t alpine-postgres .
[...]
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
alpine-postgres     latest              3bcc06a7ce79        2 hours ago         23.46 MB
alpine-armv7l       edge                448a4f53f4df        2 hours ago         4.937 MB
alpine-armv7l       latest              448a4f53f4df        2 hours ago         4.937 MB

Without bash this gets down to 22.55 MB, and without shadow + bash its size is 20.86 MB. This container image includes only the necessary binaries and libraries to be able to run a PostgreSQL server, and does nothing to initialize it or configure it. Let's use it then and create a server:

$ docker run -t -i alpine-postgres /bin/bash
# useradd -m -g wheel postgres
# su - postgres
$ initdb -D data
[...]
$ pg_ctl start -D data
$ psql -At -c 'SELECT version();'
PostgreSQL 9.4.4 on armv6-alpine-linux-muslgnueabihf, compiled by gcc (Alpine 5.1.0) 5.1.0, 32-bit

And things are visibly working fine. Now let's look at how much space a development box for Postgres would consume as a container image, using the following Dockerfile spec with some packages needed to compile and work on the code:

FROM alpine-armv7l:edge
RUN echo http://nl.alpinelinux.org/alpine/edge/testing >> /etc/apk/repositories && \
apk update && \
apk add shadow bash gcc bison flex git make autoconf

Once built, this image gets quite a bit larger, at 125MB, but that's not really a surprise...

$ docker build -t alpine-dev .
[...]
$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
alpine-dev          latest              15dc9934cc36        16 minutes ago      125 MB
alpine-postgres     latest              3bcc06a7ce79        About an hour ago   23.46 MB
alpine-armv7l       edge                448a4f53f4df        About an hour ago   4.937 MB
alpine-armv7l       latest              448a4f53f4df        About an hour ago   4.937 MB

All the files and Dockerfile specs have been pushed here. Feel free to use them and play with them.

Josh Berkus: unsafe UPSERT using WITH clauses

From Planet PostgreSQL. Published on Jul 23, 2015.

By now you've read about PostgreSQL 9.5 and our shiny new UPSERT feature.  And you're thinking, "hey, 9.5 is still in alpha, how do I get me some UPSERT action now?"  While there are some suggestions for workarounds on the PostgreSQL wiki, I'm going to show you one more method for approximating UPSERT while you help us test 9.5 so that it will come out sooner.

The general suggested way for handling UPSERT situations is one of the following:
  1. Try to do an INSERT.  On error, have the application retry with an UPDATE.
  2. Write a PL/pgSQL procedure which does insert-then-update or update-then-insert.
Both of these approaches have the drawback of being very high overhead: the first involves multiple round-trips to the database and lots of errors in the log, and the second involves major subtransaction overhead.  Neither is concurrency-safe, but then the method I'm about to show you isn't either.  At least this method avoids a lot of the overhead, though.

What's the method?  Using writeable WITH clauses.  This feature, introduced in 9.1, allows you to do a multi-step write transaction as a single query.  For an example, let's construct a dummy table with a unique key on ID and a value column, then populate it:

     create table test_upsert ( id int not null primary key, 
        val text );
     insert into test_upsert select i, 'aaa'
       from generate_series (1, 100) as gs(i);


Now, let's say we wanted to update ID 50, or insert it if it doesn't exist.  We can do that like so:

    WITH
    newrow ( id, val ) as (
        VALUES ( 50::INT, 'bbb'::TEXT ) ),
    tryupdate as (
        UPDATE test_upsert SET val = newrow.val
        FROM newrow
        WHERE test_upsert.id = newrow.id
        RETURNING test_upsert.id
    )
    INSERT INTO test_upsert
    SELECT id, val
        FROM newrow
    WHERE id NOT IN ( SELECT id FROM tryupdate );


The above tries to update  ID=50.  If no rows are updated, it inserts them.  This also works for multiple rows:

    WITH
    newrow ( id, val ) as (
        VALUES ( 75::INT, 'ccc'::TEXT ),
               ( 103::INT, 'ccc'::TEXT )
    ),
    tryupdate as (
        UPDATE test_upsert SET val = newrow.val
        FROM newrow
        WHERE test_upsert.id = newrow.id
        RETURNING test_upsert.id
    )
    INSERT INTO test_upsert
    SELECT id, val
        FROM newrow
    WHERE id NOT IN ( SELECT id FROM tryupdate );


... and will update or insert each row as called for.
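If you're driving this from application code, the same statement parameterizes cleanly. Here's a minimal psycopg2 sketch for the single-row case, with a placeholder connection string; the concurrency caveats discussed below still apply.

import psycopg2

UPSERT_SQL = """
WITH newrow ( id, val ) as (
    VALUES ( %s::INT, %s::TEXT ) ),
tryupdate as (
    UPDATE test_upsert SET val = newrow.val
    FROM newrow
    WHERE test_upsert.id = newrow.id
    RETURNING test_upsert.id
)
INSERT INTO test_upsert
SELECT id, val
    FROM newrow
WHERE id NOT IN ( SELECT id FROM tryupdate );
"""

conn = psycopg2.connect("dbname=test")  # placeholder connection string
with conn, conn.cursor() as cur:
    # Update-or-insert ID 50 with value 'bbb'.
    cur.execute(UPSERT_SQL, (50, 'bbb'))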

Given that we can do the above, why do we need real UPSERT?  Well, there's some problems with this approximate method:
  • It's not concurrency-safe, and can produce unexpected results given really bad timing of multiple connections wanting to update the same rows.
  • It will still produce key violation errors given bad concurrency timing, just fewer of them than method 1 above.
  • It's still higher overhead than 9.5's UPSERT feature, which is optimized.
  • It will return INSERT 0 0  from calls that do only updates, possibly making the app think the upsert failed.
  • It's not safe to use with INSERT/UPDATE triggers on the same table.
But ... if you can't wait for 9.5, then at least it's a temporary workaround for you.  In the meantime, download 9.5 alpha and get testing the new UPSERT.

Django and Python 3 How to Setup pyenv for Multiple Pythons

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Jul 23, 2015.

We need to be doing Django development in Python 3. Unfortunately, we have a lot of projects still in Python 2.7, so switching between the two versions can be frustrating. Fortunately, pyenv takes the guesswork out of switching and makes it super simple.
Watch Now...

Ernst-Georg Schmid: Deriving the elemental composition from a molformula in pgchem::tigress

From Planet PostgreSQL. Published on Jul 22, 2015.

pgchem::tigress can generate molecular formulae like C3H6NO2- from chemical structures.

But what if we need access to the elemental composition as a relation, e.g:

element | count
--------+------
C       | 3
N       | 1
O       | 2

Fortunately, PostgreSQL is awesome:

CREATE OR REPLACE FUNCTION elemental_composition(molformula TEXT)
  RETURNS TABLE(element TEXT, count INTEGER) AS
$BODY$
DECLARE token TEXT[];
DECLARE elements TEXT[];
BEGIN
elements := ARRAY['C','N','O','P','S','Cl'];
molformula := REPLACE(REPLACE(molformula,'-',''),'+','');

FOREACH element IN ARRAY elements LOOP
count := 1;
token := REGEXP_MATCHES(molformula, element || '[\d?]*');

IF (token[1] IS NOT NULL) THEN
    token := REGEXP_MATCHES(token[1],'[0-9]+');
    IF (token[1] IS NOT NULL) THEN
        count := token[1]::INTEGER;
    END IF;
    RETURN NEXT;
END IF;
END LOOP;
RETURN;
END;
$BODY$
  LANGUAGE plpgsql IMMUTABLE STRICT
  COST 1000;


SELECT * FROM elemental_composition('C3H6NO2-');

And that's it. Did I already mention that PostgreSQL is awesome? :-)

Ozgun Erdogan: First PGConf Silicon Valley Speakers Announced Today

From Planet PostgreSQL. Published on Jul 22, 2015.

As a member of the PGConf Silicon Valley Conference Committee, I have been extremely happy with the volume and quality of the talks submitted to the conference. The Committee has been working hard on sorting through the talks, and I am pleased to announce the first 5 of the 24 total breakout sessions:

  • Grant McAlister, Senior Principal Engineer for Amazon.com, on “Cloud Amazon RDS for PostgreSQL - What’s New and Lessons Learned”
  • Kenny Gorman, CTO of Data for Rackspace, on “Cloud PostgreSQL Automation Management with Ansible”
  • Magnus Hagander, Database Architect, Systems Administrator & Developer for Redpill Linpro, on “What’s New in PostgreSQL 9.5”
  • Ryan Lowe, Production Engineer at Square, on “Postgres for MySQL DBAs”
  • Matthew Kelly, In House Postgres Expert for TripAdvisor, on “At the Heart of a Giant: Postgres at TripAdvisor”

PGConf Silicon Valley is November 17-18, 2015 at the South San Francisco Conference Center. It is a technical conference aimed at the local Silicon Valley PostgreSQL community and is an opportunity for leading industry experts and the local PostgreSQL community to discuss and learn about the major new capabilities of PostgreSQL 9.4 (and 9.5!) and how to optimize a PostgreSQL environment.

If you plan to attend, Super Saver pricing is available through July 25, 2015. You can reserve your seat now by visiting the conference website.

In addition to great talks, we're also pleased to see great support from sponsors with Platinum level sponsor 2ndQuadrant joined by additional sponsors EnterpriseDB, VividCortex, PostgreSQL Experts, and Consistent State. We're also happy to welcome our first two media sponsors, Database Trends & Applications and Datanami.

We hope you will register now to attend the conference at the highly discounted Super Saver rates, which end on July 25, 2015. See you in November!

Read more...

A couple quick tips

By James Bennett from Django community aggregator: Community blog posts. Published on Jul 22, 2015.

As noted yesterday, I’ve spent the last little while working to freshen up various bits of open-source code I maintain, in order to make sure all of them have at least one recent release. Along the way I’ve picked up a few little tips and tricks; some of them may be old hat to you if you’ve been lucky enough to be working with fairly modern Django and Python versions for a while, but I ...

Read full entry

Julien Rouhaud: Keep an eye on your PostgreSQL configuration

From Planet PostgreSQL. Published on Jul 22, 2015.

Have you ever wished you knew what configuration changed during the last few weeks, when everything was so much faster, or wanted to check what happened on your beloved cluster while you were on vacation?

pg_track_settings is a simple, SQL-only extension that helps you to know all of that and more very easily. As it's designed as an extension, it requires PostgreSQL 9.1 or later.

Some insights

As with almost any extension, you have to compile it from source, or use the pgxn client, since there's no package yet. Assuming you just extract the tarball of the release 1.0.0 with a typical server configuration:

$ cd pg_track_settings-1.0.0
$ sudo make install

Then the extension is available. Create the extension on the database of your choice:

postgres=# CREATE EXTENSION pg_track_settings ;
CREATE EXTENSION

In order to historize the settings, you need to schedule a simple function call on a regular basis. This function is the pg_track_settings_snapshot function. It’s really cheap to call, and won’t have any measurable impact on your cluster. This function will do all the smart work of storing all the parameters that changed since the last call.

For instance, if you want to be able to know what changed on your server within a 5 minutes accuracy, a simple cron entry like this for the postgres user is enough:

*/5 *  * * *     psql -c "SELECT pg_track_settings_snapshot()" > /dev/null 2>&1

A background worker could be used on PostgreSQL 9.3 and later, but as we only have to call one function every few minutes, it'd be overkill to add one just for this. If you really want one, you'd better consider setting up PoWA for that, or another extension that allows you to run tasks, like pgAgent.
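If cron isn't convenient (say, inside a minimal container), a trivial loop achieves the same thing; this is only a sketch with a placeholder connection string, and the cron entry above remains the simplest option.

import time
import psycopg2

# Placeholder connection string; run this as the user that owns the extension.
conn = psycopg2.connect("dbname=postgres user=postgres")
conn.autocommit = True

while True:
    with conn.cursor() as cur:
        cur.execute("SELECT pg_track_settings_snapshot()")
    time.sleep(300)  # every 5 minutes, matching the cron example above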

How to use it

Let’s call the snapshot function to get ti initial values:

postgres=# select pg_track_settings_snapshot();
 pg_track_settings_snapshot
----------------------------
 t
(1 row)

A first snapshot with the initial settings values is saved. Now, I’ll just change a setting in the postgresql.conf file (ALTER SYSTEM could also be used on a PostgreSQL 9.4 or more release), reload the configuration and take another snapshot:

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# select * from pg_track_settings_snapshot();
 pg_track_settings_snapshot
----------------------------
 t
(1 row)

Now, the fun part. What information is available?

First, what changed between two timestamp. For instance, let’s check what changed in the last 2 minutes:

postgres=# SELECT * FROM pg_track_settings_diff(now() - interval '2 minutes', now());
        name         | from_setting | from_exists | to_setting | to_exists
---------------------+--------------+-------------+------------+----------
 max_wal_size        | 93           | t           | 31         | t
(1 row)

What do we learn?

  • as the max_wal_size parameter exists, I'm using the 9.5 alpha release. Yes, what PostgreSQL really needs right now is people testing the upcoming release! It's simple, and the more people test it, the faster it'll be available. See the how to page to see how you can help :)
  • the max_wal_size parameter existed 2 minutes ago (from_exists is true), and also exists right now (to_exists is true). Obviously, the regular settings will not disappear, but think of extension related settings like pg_stat_statements.* or auto_explain.*
  • the max_wal_size changed from 93 (from_setting) to 31 (to_setting).

Also, we can get the history of a specific setting:

postgres=# SELECT * FROM pg_track_settings_log('max_wal_size');
              ts               |     name     | setting_exists | setting
-------------------------------+--------------+----------------+---------
 2015-07-17 22:42:01.156948+02 | max_wal_size | t              | 31
 2015-07-17 22:38:02.722206+02 | max_wal_size | t              | 93
(2 rows)

You can also retrieve the entire configuration at a specified timestamp. For instance:

postgres=# SELECT * FROM pg_track_settings('2015-07-17 22:40:00');
                name                 |     setting
-------------------------------------+-----------------
[...]
 max_wal_senders                     | 5
 max_wal_size                        | 93
 max_worker_processes                | 8
[...]

The same functions are provided to know what settings have been overloaded for a specific user and/or database (the ALTER ROLE … SET, ALTER ROLE … IN DATABASE … SET and ALTER DATABASE … SET commands), with the functions:

  • pg_track_db_role_settings_diff()
  • pg_track_db_role_settings_log()
  • pg_track_db_role_settings()

And finally, just in case, you can also see when PostgreSQL was last restarted:

postgres=# SELECT * FROM pg_reboot;
              ts
-------------------------------
 2015-07-17 08:39:37.315131+02
(1 row)

That’s all for this extension. I hope you’ll never miss or forget a configuration change again!

If you want to install it, the source code is available on the github repository github.com/rjuju/pg_track_settings.

Limitations

As the only way to know the current value of a setting is to query pg_settings (or call current_setting()), you must be aware that the user calling pg_track_settings_snapshot() may see an overloaded value (set via ALTER ROLE … SET param = value) rather than the original value. As the pg_db_role_setting table is also historized, it's pretty easy to know that you're not seeing the original value, but there's no way to know what the original value really is.

Keep an eye on your PostgreSQL configuration was originally published by Julien Rouhaud at rjuju's home on July 22, 2015.

How to automatically create images for Responsive design

By Cloudinary Blog - Django from Django community aggregator: Community blog posts. Published on Jul 21, 2015.

Responsive web design is a method of designing websites to provide an optimal viewing experience to users, irrespective of the device, window size, orientation, or resolution used to view the website. A site designed responsively adapts its layout to the viewing environment, resizing and moving elements dynamically based on the properties of the browser or device the site is being displayed on.

Responsive design uses CSS for dynamic content changes and for controlling the text font size, the layout grid used and the various image dimensions, based on media queries and the browser window size. Most of the dynamic changes can be accomplished this way (or with frameworks like Bootstrap), but not when it comes to images.

The simple solution is to always deliver the image in the highest resolution and then simply scale it down with CSS for lower resolution devices or smaller browser windows. However, high resolution images use more bytes and take more time to deliver, so this solution can needlessly waste bandwidth and loading time, increasing costs and harming the user experience.

To offset the problem, a more complex framework could be designed that can load different images for each responsive mode, but when considering the number of images needed and the time to create all the different resolutions and image dimensions, implementing this solution becomes complex and hard to maintain.

Cloudinary can help reduce the complexity with dynamic image manipulation. You can simply build image URLs with any image width or height based on the specific device resolution and window size. This means you don't have to pre-create the images, with dynamic resizing taking place on-the-fly as needed.

Responsive images solution

The solution for simply and dynamically integrating images within responsive design layouts can be implemented with a method added to Cloudinary's Javascript library a few months ago. The Cloudinary Javascript library, which is based on jQuery, automatically builds image URLs to match the size available for each image in the responsive layout and works as follows:

  • A Cloudinary dynamic manipulation URL is automatically built on the fly to deliver an uploaded image that is scaled to the exact available width.

  • If the browser window is subsequently enlarged, new higher resolution images are automatically delivered, while stop-points (every 10px by default) are used to prevent loading too many images.

  • If the browser window is scaled down, browser-side scaling is used instead of delivering a new image.

This feature allows you to provide one high resolution image, and have it automatically adapted to the resolution and size appropriate to each user’s device or browser on the fly. This ensures a great user experience by delivering the best possible resolution image, based on the device's resolution and the width available, without needlessly wasting bandwidth or loading time.

Implementing responsive design with Cloudinary's Javascript library

Implementing the responsive design in code using the Cloudinary jQuery plugin is a very simple process.

Step 1:

Include the jQuery plugin in your HTML pages (see the jQuery plugin getting started guide for more information).

Step 2:

For each image to display responsively:

  1. Set the data-src attribute of the img tag to the URL of an image that was uploaded to Cloudinary. The src attribute is not set and the actual image is updated dynamically (you can set the src attribute to a placeholder image that is displayed until the image is loaded).

  2. Set the width parameter to auto (w_auto in URLs). This allows the jQuery plugin to dynamically generate an image URL scaled to the correct width value, based on the detected width actually available for the image in the containing element.

  3. Add the cld-responsive class to the image tag. This is the default class name, but you can use custom class names and programmatically make HTML elements become responsive.

For example:

<img data-src="http://res.cloudinary.com/demo/image/upload/w_auto/smiling_man.jpg" class="cld-responsive">

Step 3:

Add Cloudinary's responsive Javascript method call at the end of the HTML page.

<script type="text/javascript">$.cloudinary.responsive()</script>

The responsive method looks for all images in the page that have the "cld-responsive" class name, detects the available width for the image on the page, and then updates the HTML image tags accordingly. The image is also updated whenever the window size or screen resolution changes.

Note that the three step process presented above covers the simplest and most general solution. The behaviour can be further customized to control whether to update images on resize, when to update the image using stop-points, preserving the CSS image height and more. See the Cloudinary Javascript library for more details.

That's it! Check out the following demo images created using Cloudinary (for the images) and Bootstrap (for the layout). The images also include a text overlay that is updated on-the-fly to display the actual width of the image and the Device Pixel Ratio setting (see further on in this blog post for more details on DPR).

Resize this browser window to see how the layout and images dynamically respond to the changes.

4 columns

4-3-2 grid

3-2-1 grid

As can be seen in the demo images above, the URL of an image can be further manipulated on the fly like any other image uploaded to Cloudinary.

Implementing responsive design with the Cloudinary SDKs

To make things even easier, responsive design can be implemented with the Cloudinary SDK's view helper methods (e.g. cl_image_tag in Ruby on Rails). Setting the width parameter to auto creates an HTML image tag with a blank src attribute while the data-src attribute points to a dynamic image manipulation URL. When you load Cloudinary's jQuery plugin and call the responsive method, the image tags are automatically updated and URLs are replaced with the correct width value. You can also set a placeholder image using the responsive_placeholder parameter, or set it to an inline blank image by setting the parameter to blank.

For example, creating an HTML image tag for the "smiling_man.jpg" image with the width automatically determined on the fly as needed, and using a blank image placeholder:

Ruby:
cl_image_tag("smiling_man.jpg", :width => :auto, 
  :responsive_placeholder => "blank")
PHP:
cl_image_tag("smiling_man.jpg",  array("width" => "auto", 
  "responsive_placeholder" => "blank"));
Python:
cloudinary.CloudinaryImage("smiling_man.jpg").image(width = "auto",
  responsive_placeholder = "blank")
Node.js:
cloudinary.image("smiling_man.jpg",  { width: "auto", 
  responsive_placeholder: "blank" })
Java:
cloudinary.url().transformation(new Transformation().width("auto").
  responsive_placeholder("blank")).imageTag("smiling_man.jpg");

The code above generates the following HTML image tag:

<img class="cld-responsive" 
data-src="http://res.cloudinary.com/demo/image/upload/w_auto/smiling_man.jpg"
src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"  />

Responsive design with support for retina and HiDPI devices

You can also simultaneously create the correct DPR image for devices that support higher resolutions by simply adding the dpr parameter set to auto to the URL or SDK method. The Javascript code will check the DPR of the device as well as the space available for the image. Delivery and manipulation URLs are then built automatically (and lazily) for all image tags to match the specific settings, with all image generation happening in the cloud. Users of devices with high pixel density will get a great visual result, while low-DPR users don't have to wait needlessly for larger images to load (see this blog post for more details).

For example, creating an HTML image tag for the "woman.jpg" image with the width and DPR automatically determined on the fly as needed, and using a blank image placeholder:

Ruby:
cl_image_tag("woman.jpg", :width => :auto,  :dpr => :auto,
  :responsive_placeholder => "blank")
PHP:
cl_image_tag("woman.jpg",  array("width" => "auto",  "dpr" => "auto",
  "responsive_placeholder" => "blank"));
Python:
cloudinary.CloudinaryImage("woman.jpg").image(width = "auto", dpr = "auto", 
  responsive_placeholder = "blank")
Node.js:
cloudinary.image("woman.jpg",  { width: "auto", dpr: "auto", 
  responsive_placeholder: "blank" })
Java:
cloudinary.url().transformation(new Transformation().width("auto").dpr("auto"). 
  responsive_placeholder("blank")).imageTag("woman.jpg");

The code above generates the following HTML image tag:

<img class="cld-responsive" 
data-src=
"http://res.cloudinary.com/demo/image/upload/w_auto,dpr_auto/woman.jpg"
src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"  
/>

Summary

In the modern world, applications have to look good on both the web and on various mobile devices, and therefore need to become more responsive to support the large number of devices available and adjust to the varying amount of space available for displaying content. Responsive frameworks such as Bootstrap can help with the layout and text, but have no support for images beyond client-side resizing.

Cloudinary allows you to manage your images in a very simple way: just upload your hi-res images, use any web framework to add your image tag with automatic width and automatic DPR, and add one line of Javascript for all your images to become responsive. Improve your users' experience with plenty more of Cloudinary's image optimization and manipulation capabilities, all done in the cloud, without the need to install image processing software or pre-generate all the image versions and resolutions, while reducing needless page loading times and saving bandwidth.

Responsive support is available in all Cloudinary plans, including the free plan. If you don't have a Cloudinary account, you are welcome to sign up for a free account and try it out.

Hubert 'depesz' Lubaczewski: Waiting for 9.6 – Add psql \ev and \sv commands for editing and showing view definitions.

From Planet PostgreSQL. Published on Jul 21, 2015.

On 3rd of July, Tom Lane committed patch: Add psql \ev and \sv commands for editing and showing view definitions.   These are basically just like the \ef and \sf commands for functions.   Petr Korobeinikov, reviewed by Jeevan Chalke, some changes by me It's not a huge patch, but it's the first patch for […]

Testing Django Views Without Using the Test Client

By Ian's Blog from Django community aggregator: Community blog posts. Published on Jul 21, 2015.


Ernst-Georg Schmid: 1st impression of pg_shard

From Planet PostgreSQL. Published on Jul 21, 2015.

1st impression of pg_shard:

  1. It works as described, documented limitations apply
  2. Status '3' of a shard_placement means inactive, '1' means online
  3. Read the issues list - unless you want to discover them yourselves :-)
  4. If a whole worker node goes offline this may go unnoticed, since there seems to be no heartbeat between the head and worker nodes unless you try to write data

Hans-Juergen Schoenig: SKIP LOCKED: One of my favorite 9.5 features

From Planet PostgreSQL. Published on Jul 21, 2015.

PostgreSQL 9.5 is just around the corner and many cool new features have been added to this wonderful release. One of the most exciting ones is definitely SKIP LOCKED. To make sure that concurrent operations don’t lead to race conditions, SELECT FOR UPDATE has been supported for many years now and it is essential to […]

The uWSGI Swiss Army Knife

By Lincoln Loop from Django community aggregator: Community blog posts. Published on Jul 20, 2015.

uWSGI is one of those interesting projects that keeps adding features with every new release without becoming totally bloated, slow, and/or unstable. In this post, we'll look at some of its lesser used features and how you might use them to simplify your Python web service.

Let's start by looking at a common Python web project's deployment stack.

  • Nginx: Static file serving, SSL termination, reverse proxy
  • Memcached: Caching
  • Celery: Background task runner
  • Redis or RabbitMQ: Queue for Celery
  • uWSGI: Python WSGI server

Five services. That's a lot of machinery to run for a basic site. Let's see how uWSGI can help you simplify things:

Static File Serving

uWSGI can serve static files quite efficiently. It can even do so without tying up the same worker/thread pool your application uses, thanks to its offloading subsystem. There are a bunch of configuration options around static files, but the common ones we use are:

  • offload-threads: the number of threads to dedicate to serving static files
  • check-static: works like Nginx's try_files directive, checking for the existence of a static file before hitting the Python application
  • static-map: does the same, but only when a URL pattern is matched

Other options exist to allow you to control gzipping and expires headers among other things. An ini configuration for basic static file serving might look like this:

offload-threads = 4
static-map = /static=/var/www/project/static
static-map = /media=/var/www/project/media
static-expires = /var/www/project/static/* 2592000

More information on static file handling is available on a topic page in the uWSGI docs. When placed behind a CDN, this setup is sufficient for even high-traffic sites.

SSL Termination

uWSGI can handle SSL connections and even the SPDY protocol. Here's an example configuration which will use HTTPS and optionally SPDY as well as redirecting HTTP requests to HTTPS:

https2 = addr=0.0.0.0:8443,cert=domain.crt,key=domain.key,spdy=1
http-to-https = true

Reverse Proxy

uWSGI speaks HTTP and can efficiently route requests to multiple workers. Here's an example that will start an HTTP listener on port 80:

master = true
http = 80
# http://uwsgi-docs.readthedocs.org/en/latest/articles/SerializingAccept.html
thunder-lock = true
uid = www-data
gid = www-data

In this scenario, you'll need to start uwsgi as the root user to access port 80, but it will drop privileges to an unprivileged account via the uid/gid arguments.

You can also do routes and redirects (see the docs for more complex examples):

route = ^/favicon\.ico$ permanent-redirect:/static/favicon.ico

Note: It is unclear to me whether uWSGI's HTTP server is vulnerable to DoS attacks such as Slowloris. Please leave a comment if you have any more information here.

Caching

Did you know uWSGI includes a fast in-memory caching framework? The configuration for it looks like this:

cache2 = name=default,items=5000,purge_lru=1,store=/tmp/uwsgi_cache

This will configure a cache named default capable of holding up to 5000 items, purging the least recently used keys in the event of an overflow. The cache will periodically be flushed asynchronously to disk (/tmp/uwsgi_cache) so the uWSGI process can be restarted without also dropping the entire cache.

You can find the caching framework docs here and a Django-compliant cache backend, django-uwsgi-cache, is available on PyPI.
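If you want to poke at the cache directly from Python code running under uWSGI (outside of Django), the uwsgi module exposes it as well. Here is a minimal sketch assuming the cache2 configuration above; the key, value, and function name are made up for illustration:

import uwsgi  # only importable inside a uWSGI worker

def get_greeting():
    # Look in the in-memory cache first; the name matches the cache2 config above.
    value = uwsgi.cache_get('greeting', 'default')
    if value is None:
        value = b'hello world'
        # Keep the entry for 300 seconds in the 'default' cache.
        uwsgi.cache_set('greeting', value, 300, 'default')
    return value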

Task Queuing

Yes, that's right, uWSGI includes a task queue too. The uWSGI spooler can not only queue tasks for immediate execution, but also provide cron-like functionality to schedule tasks to run at some point in the future. It is configured simply by providing a directory to store the queue and the number of workers to run:

spooler = /tmp/uwsgi_spooler
spooler-processes = 4

The uwsgi Python package provides a uwsgidecorators module that can be used to place jobs on the queue for execution. A simple example:

from uwsgidecorators import cron, spool

@cron(0, 0, -1, -1, -1)
def cronjob_task():
    # This will run everyday at midnight
    ...

@spool
def queued_task(**kwargs):
    # Something slow you want to queue
    ...

# Put a task into the queue
queued_task.spool(foo=1, bar='test')

Conclusion

As you can see, uWSGI really is a Swiss Army knife for serving Python web services. Actually, it's not even limited to Python. You can use it for Ruby and Perl sites as well. We've used many of these features on production sites with great success. While specialized services are certainly going to be more robust for high-volume workloads, they are simply overkill for the majority of sites.

Distributed microservice architectures may be all the rage, but the reality is that most sites can run on a single server. Reducing the number of services and dependencies makes deployment easier and removes points of failure in your system. Before you jump to add more tools to your stack, it's worth checking if you can make do with what you already have.

News and such

By James Bennett from Django community aggregator: Community blog posts. Published on Jul 20, 2015.

First things first: though I announced this sort of quietly at the time, I should probably announce it publicly as well: after four years as part of the MDN development team, as of last month I am no longer at Mozilla. There are some parallels to the last time I moved on, so I’ll link to that in lieu of writing it all over again. For the moment I’m enjoying a summer vacation, but I’m ...

Read full entry

Julien Rouhaud: Keep an eye on your PostgreSQL configuration

From Planet PostgreSQL. Published on Jul 20, 2015.

Have you ever wished to know what configuration changed during the last weeks, when everything was so much faster, or wanted to check what happened on your beloved cluster while you were on vacation?

pg_track_settings is a simple, SQL-only extension that helps you to know all of that and more very easily. As it's designed as an extension, it requires PostgreSQL 9.1 or later.

Some insights

As with almost any extension, you have to install it from source or use the pgxn client, since there's no package yet. Assuming you just extracted the tarball of release 1.0.0, with a typical server configuration:

$ cd pg_track_settings-1.0.0
$ sudo make install

Then the extension is available. Create it in the database of your choice:

postgres=# CREATE EXTENSION pg_track_settings;
CREATE EXTENSION

In order to keep a history of the settings, you need to schedule a simple function call on a regular basis: the pg_track_settings_snapshot function. It's really cheap to call, and won't have any measurable impact on your cluster. This function does all the smart work of storing the parameters that changed since the last call.

For instance, if you want to be able to know what changed on your server within a 5 minutes accuracy, a simple cron entry like this for the postgres user is enough:

*/5 *  * * *     psql -c "SELECT pg_track_settings_snapshot()" > /dev/null 2>&1

A background worker could be used on PostgreSQL 9.3 and later, but as we only have to call one function every few minutes, it'd be overkill to add one just for this. If you really want one, you'd better consider setting up PoWA for that, or another tool that can run scheduled tasks, like pgAgent.
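If you would rather drive the snapshot from Python (for instance from a small script fired by a systemd timer or an existing scheduler) than from cron, the call is just as trivial. A minimal sketch using psycopg2; the DSN and function name are placeholders of mine:

import psycopg2

def take_settings_snapshot(dsn="dbname=postgres"):
    # Same thing the cron entry above does: one cheap function call per run.
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT pg_track_settings_snapshot()")
        conn.commit()
    finally:
        conn.close()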

How to use it

Let’s call the snapshot function to get ti initial values:

postgres=# select pg_track_settings_snapshot();
 pg_track_settings_snapshot
-----------------------------
 t
(1 row)

A first snapshot with the initial settings values is saved. Now, I’ll just change a setting in the postgresql.conf file (ALTER SYSTEM could also be used on a PostgreSQL 9.4 or more release), reload the configuration and take another snapshot:

postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# select * from pg_track_settings_snapshot();
 pg_track_settings_snapshot
-----------------------------
 t
(1 row)

Now, the fun part. What information is available?

First, what changed between two timestamp. For instance, let’s check what changed in the last 2 minutes:

postgres=# SELECT * FROM pg_track_settings_diff(now() - interval '2 minutes', now());
     name     | from_setting | from_exists | to_setting | to_exists
--------------+--------------+-------------+------------+-----------
 max_wal_size | 93           | t           | 31         | t
(1 row)

What do we learn ?

  • as the max_wal_size parameter exists, I'm using the 9.5 alpha release. Yes, what PostgreSQL really needs right now is people testing the upcoming release! It's simple, and the more people test it, the faster it'll be available. See the how to page to see how you can help :)
  • the max_wal_size parameter existed 2 minutes ago (from_exists is true), and also exists right now (to_exists is true). Obviously, the regular settings will not disappear, but think of extension related settings like pg_stat_statements.* or auto_explain.*
  • the max_wal_size changed from 93 (from_setting) to 31 (to_setting).

Also, we can get the history of a specific setting:

postgres=# SELECT * FROM pg_track_settings_log('max_wal_size');
              ts               |     name     | setting_exists | setting
-------------------------------+--------------+----------------+---------
 2015-07-17 22:42:01.156948+02 | max_wal_size | t              | 31
 2015-07-17 22:38:02.722206+02 | max_wal_size | t              | 93
(2 rows)

You can also retrieve the entire configuration at a specified timestamp. For instance:

postgres=# SELECT * FROM pg_track_settings('2015-07-17 22:40:00');
                name                 |     setting
-------------------------------------+-----------------
[...]
 max_wal_senders                     | 5
 max_wal_size                        | 93
 max_worker_processes                | 8
[...]

The same functions are provided to track the settings that have been overridden for a specific user and/or database (the ALTER ROLE … SET, ALTER ROLE … IN DATABASE … SET and ALTER DATABASE … SET commands), with the functions:

  • pg_track_db_role_settings_diff()
  • pg_track_db_role_settings_log()
  • pg_track_db_role_settings()

And finally, just in case, you can also see when PostgreSQL has been restarted:

postgres=# SELECT * FROM pg_reboot;
              ts
-------------------------------
 2015-07-17 08:39:37.315131+02
(1 row)

That’s all for this extension. I hope you’ll never miss or forget a configuration change again!

If you want to install it, the source code is available on the github repository github.com/rjuju/pg_track_settings.

Limitations

As the only way to know the current value of a setting is to query pg_settings (or call current_setting()), you must be aware that the user calling pg_track_settings_snapshot() may see an overridden value (like ALTER ROLE … SET param = value) rather than the original value. As the pg_db_role_setting table is also tracked, it's pretty easy to know that you aren't seeing the original value, but there's no way to know what the original value really is.

Keep an eye on your PostgreSQL configuration was originally published by Julien Rouhaud at rjuju's home on July 20, 2015.

Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: Indexing the World

From Planet PostgreSQL. Published on Jul 17, 2015.

An easy way to give PGDB (PostgreSQL) a performance boost is to judiciously use indexes based on queries observed in the system. For most situations, this is as simple as indexing columns that are referenced frequently in WHERE clauses. PGDB is one of the few database engines that takes this idea even further with partial indexes. Unfortunately as a consequence of insufficient exposure, most DBAs and users are unfamiliar with this extremely powerful functionality.

Imagine we have an order system that tracks order state, such that entries are marked as new, processing, or done. These kinds of transient states are not uncommon in various inventory management systems, so it’s a great example for this use case. Often with such systems, data is distributed in such a way that more than 90% of orders are marked as ‘done’. To make this interesting, let’s just cap the done state at 90%, and distribute another 5% to processing, and 5% to new.

This somewhat complex SQL should emulate the above scenario:

DROP TABLE IF EXISTS sys_order;
 
CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    account_id   INT          NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_state  CHAR         NOT NULL DEFAULT 'N',
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (
         account_id, product_id, item_count, order_state,
         order_dt, valid_dt
       )
SELECT (a.id % 100) + 1, (a.id % 100000) + 1, (a.id % 100) + 1,
       CASE WHEN b.id BETWEEN 1 AND 5 THEN 'N'
            WHEN b.id BETWEEN 6 AND 10 THEN 'P'
            ELSE 'D'
       END,
       now() - (a.id % 1000 || 'd')::INTERVAL
             + (a.id || 'ms')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (a.id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 10000) a(id),
       generate_series(1, 100) b(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_account_id
    ON sys_order (account_id);
 
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt DESC);
 
ANALYZE sys_order;

In many, if not most situations, automated tools and interfaces are only interested in pending order states. If an order is new, or in the middle of processing, it’s going to see a lot of activity. Because of this, a lot of queries will request orders based on status. Let’s presume an example of this is to fetch all new orders from a certain account, possibly for display on a web site.

This is the query PGDB sees:

SELECT *
  FROM sys_order
 WHERE account_id = 42
   AND order_state = 'N';

In my virtual environment, this executes in about 3ms and uses the index on account_id as expected. Few people would consider a 3ms execution time as slow, so this is where most optimization stops. But we know something about this data! We know that 90% (or more) of all possible order states are ‘D’. In the case of this query, PGDB has to reduce 10,000 matches from the index, down to 500 to return our results.

This is hardly ideal. A naive approach to correct this may be to index both account_id and order_state. While this works, we're still indexing 90% of values that provide no benefit to the index. This is where PGDB differs from many other database engines. Let's go a bit further with our example and create a partial index and try the query again:

CREATE INDEX idx_order_state_account_id
    ON sys_order (account_id, order_state)
 WHERE order_state != 'D';
 
SELECT *
  FROM sys_order
 WHERE account_id = 42
   AND order_state = 'N';

The revised execution time of our query with the new index is about 0.7ms. While this is 4-5x faster than before, the benefits go beyond execution time. Let’s take a look at the size of each index on disk:

SELECT indexrelid::REGCLASS::TEXT AS index_name,
       pg_relation_size(indexrelid) / 1048576 AS size_mb
  FROM pg_index
 WHERE indrelid::REGCLASS::TEXT = 'sys_order';
 
         index_name         | size_mb 
----------------------------+---------
 pk_order_id                |      21
 idx_order_account_id       |      21
 idx_order_order_dt         |      21
 idx_order_state_account_id |       2

Hopefully it comes as no surprise that an index which includes 90% less data will be 90% smaller. Also, keep in mind that this contrived example was designed to somewhat downplay the effect of partial indexes. In a real order system, far more than 90% of orders would be marked as done, thus magnifying the speed increase and index size reduction.

There are a lot of ways partial indexes can be used, and like most tools, they’re not appropriate for all situations. But when the data cooperates, they’re extremely relevant. When time permits, take a look at data distribution and queries against the system. Chances are, there will be at least one situation that could be improved with a partial index.
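As a starting point for that kind of review, the planner statistics already expose how skewed each column is. The following rough sketch (Python with psycopg2; the 90% threshold, DSN, and script itself are my own additions, not part of the original article) lists columns where a single value dominates the table, which are natural candidates for a partial index that excludes that value:

import psycopg2

QUERY = """
SELECT schemaname, tablename, attname,
       (most_common_vals::text::text[])[1] AS top_value,
       most_common_freqs[1] AS top_freq
  FROM pg_stats
 WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
   AND most_common_freqs[1] > 0.9   -- one value covers more than 90% of rows
 ORDER BY most_common_freqs[1] DESC;
"""

conn = psycopg2.connect("dbname=postgres")
with conn.cursor() as cur:
    cur.execute(QUERY)
    for schema, table, column, value, freq in cur.fetchall():
        print("%s.%s.%s: value %r covers %.0f%% of rows" %
              (schema, table, column, value, freq * 100))
conn.close()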

If only every database was so accommodating.

Understanding Django Middlewares

By Agiliq's Django Blog from Django community aggregator: Community blog posts. Published on Jul 17, 2015.

I assume you have read the official Django docs on middleware. I will elaborate on things mentioned in the documentation, but I assume you are familiar with the basics of middleware.

In this post we will discuss the following.

  • What is a middleware
  • When to use middleware
  • Things to remember when writing middleware
  • Writing some middlewares to understand how order of middleware matters

What is a middleware

Middlewares are hooks to modify the Django request or response object. Quoting the definition of middleware from the Django docs:

Middleware is a framework of hooks into Django's request/response processing. It's a light, low-level plugin system for globally altering Django's input or output.

When to use middleware

You can use middleware if you want to modify the request, i.e. the HttpRequest object which is sent to the view. Or you might want to modify the HttpResponse object returned from the view. Both of these can be achieved by using a middleware.

You might want to perform an operation before the view executes. In such case you would use a middleware.

Django provides some default middleware, e.g. AuthenticationMiddleware.

Very often you would have used request.user inside a view. Django wants the user attribute to be set on the request before any view executes. Django takes a middleware approach to accomplish this. So Django provides an AuthenticationMiddleware which can modify the request object.

And Django modifies the request object like:

https://github.com/django/django/blob/master/django/contrib/auth/middleware.py#L22

Similarly you might have an application which works with users in different timezones. You want to use the user's timezone while showing any page to the user. You want access to the user's timezone in all the views. It makes sense to add it to the session in such a case. So you can add a middleware like this:

class TimezoneMiddleware(object):
    def process_request(self, request):
        # Assuming user has a OneToOneField to a model called Profile
        # And Profile stores the timezone of the User.
        request.session['timezone'] = request.user.profile.timezone

TimezoneMiddleware is dependent on request.user, and request.user is populated in AuthenticationMiddleware. So the TimezoneMiddleware written by us must come after Django's AuthenticationMiddleware in the settings.MIDDLEWARE_CLASSES tuple, for example as shown below.
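A minimal sketch of that ordering (the app label yourapp is hypothetical; use wherever your middleware.py actually lives):

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # Must come after AuthenticationMiddleware so request.user is available.
    'yourapp.middleware.TimezoneMiddleware',
)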

We will get a better idea about the order of middlewares in the coming examples.

Things to remember when using middleware

  • Order of middlewares is important.
  • A middleware only needs to extend from the object class.
  • A middleware is free to implement some of the methods and not implement other methods.
  • A middleware may implement process_request but not implement process_response and process_view. In fact, this is very common, and a lot of Django-provided middlewares do it.
  • A middleware may implement process_response but not implement process_request.

AuthenticationMiddleware only implements process_request and doesn't implement process_response. You can check it here

GZipMiddleware only implements process_response and doesn't implement process_request or process_view. You can see it here

Writing some middlewares

Make sure you have a Django project with a url and a view, and that you are able to access that view. Since we will try several things with request.user, make sure authentication is properly set for you and that request.user prints the right thing in this view.

Create a file middleware.py in any of your app.

I have an app called books and so I am writing this in books/middleware.py

class BookMiddleware(object):
    def process_request(self, request):
        print "Middleware executed"

Add this middleware in MIDDLEWARE_CLASSES

MIDDLEWARE_CLASSES = (
    'books.middleware.BookMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)

Make a request to any URL. This should get printed on the runserver console:

Middleware executed

Modify BookMiddleware.process_request so it looks like

class BookMiddleware(object):
    def process_request(self, request):
        print "Middleware executed"
        print request.user

Make a request to a URL again. This will raise an error:

'WSGIRequest' object has no attribute 'user'

This happened because the user attribute hasn't been set on the request yet.

Now change the order of middlewares so BookMiddleware comes after AuthenticationMiddleware

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'books.middleware.BookMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)

Make a request to any URL. This should get printed on the runserver console:

Middleware executed
<username>

This shows that process_request() is executed for the middlewares in the order in which they are listed in settings.MIDDLEWARE_CLASSES.

You can verify it further. Add another middleware in your middleware.py

class AnotherMiddleware(object):
    def process_request(self, request):
        print "Another middleware executed"

Add this middleware in MIDDLEWARE_CLASSES too.

MIDDLEWARE_CLASSES = (
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'books.middleware.BookMiddleware',
    'books.middleware.AnotherMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)

Now the output would be:

Middleware executed
<username>
Another middleware executed

How returning HttpResponse from process_request changes things

Modify BookMiddleware so it looks like

from django.http import HttpResponse

class BookMiddleware(object):
    def process_request(self, request):
        print "Middleware executed"
        print request.user
        return HttpResponse("some response")

Try any URL now and your output would be:

Middleware executed
<username>

You will notice two things:

  • Your view will no longer be executed, and no matter which URL you try, you will see "some response".
  • AnotherMiddleware.process_request will not be executed anymore.

So if a middleware's process_request() returns an HttpResponse object, then process_request() of any subsequent middleware is bypassed, and so is the view execution. You would rarely do this or require this in your projects.

Comment "return HttpResponse("some response")" so process_request of both middlewares keep executing.

Working with process_response

Add a process_response method to both middlewares:

class AnotherMiddleware(object):
    def process_request(self, request):
        print "Another middleware executed"

    def process_response(self, request, response):
        print "AnotherMiddleware process_response executed"
        return response

class BookMiddleware(object):
    def process_request(self, request):
        print "Middleware executed"
        print request.user
        #return HttpResponse("some response")
        #self._start = time.time()

    def process_response(self, request, response):
        print "BookMiddleware process_response executed"
        return response

Try some URL. The output would be:

Middleware executed
<username>
Another middleware executed
AnotherMiddleware process_response executed
BookMiddleware process_response executed

AnotherMiddleware.process_response() is executed before BookMiddleware.process_response(), while AnotherMiddleware.process_request() executes after BookMiddleware.process_request(). So process_response() follows the reverse of the order used for process_request(): process_response() is executed for the last middleware, then the second-to-last middleware, and so on up to the first middleware.

process_view

Django applies middleware's process_view() in the order it’s defined in MIDDLEWARE_CLASSES, top-down. This is similar to the order followed for process_request().

Also if any process_view() returns an HttpResponse object, then subsequent process_view() calls are bypassed and not executed.
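As a quick illustration (this example is mine, not from the Django docs), here is a minimal middleware that implements only process_view and logs which view is about to run; returning None lets the remaining process_view() methods and the view itself execute normally:

class ViewLoggerMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        # Called after process_request of all middlewares, right before the view runs.
        print "About to call view: %s" % view_func.__name__
        # Returning None means Django keeps going: remaining process_view()
        # methods run, and then the view is executed.
        return None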

Check our next post to see a practical use of middleware.

Profiling Django Middlewares

By Agiliq's Django Blog from Django community aggregator: Community blog posts. Published on Jul 17, 2015.

I assume you have a basic understanding of Profiling, what it means and why we use it.

Why this post

Recently I had to profile a Django application which wasn't performing as fast as it should. This application had several custom middlewares too. So it was possible that custom middlewares were the cause of slow performance.

There are some existing Django libraries to profile Django code, e.g. Django Debug Toolbar, django-silk, django_cprofile, etc. Most of them can profile view code well, but they can't profile other middlewares.

I wanted a way to profile middlewares too.

Problem with Django Debug Toolbar

I assume you understand middlewares and how the order in which middlewares are defined matters. If you want to get a better idea about middlewares, this post might help.

Django debug toolbar is probably designed for profiling the views. It uses process_view() and returns an HttpResponse instance from process_view(). process_request() of all middlewares runs before any middleware's process_view(). So using Django debug toolbar, it's not possible to profile what's going on inside process_request() of different middlewares.

And since process_view() of debug toolbar returns an HttpResponse, process_view() of other middlewares is bypassed, so we can't profile process_view() of other middlewares either.

So I guess it is not possible to profile middleware code using Django debug toolbar.

django-silk

Django silk seemed better at profiling middlewares too. It looks promising and I will play more with it.

But Django silk also tracks the queries executed, inserts the results into the db, etc. If you only want to know the time it takes to execute different functions and to find out the most time-consuming ones, you might not want the overhead of django-silk.

Writing our own middleware

We want to write a simple middleware that just reports the most expensive functions/methods and the time it took to execute them. We don't want to capture SQL queries or anything fancy.

We will use standard Python provided cProfile to achieve our goal. This official doc can help you get familiar with cProfile in 10 mins.

Add the following in any app's middleware.py. Supposing you have an app called books and you add this in books/middleware.py

import cProfile, pstats, StringIO

class ProfilerMiddleware(object):
    def process_request(self, request):
        pr = cProfile.Profile()
        pr.enable()
        request._pr = pr

    def process_response(self, request, response):
        request._pr.disable()
        s = StringIO.StringIO()
        sortby = 'cumulative'
        # Sort the output by the cumulative time spent in functions/methods.
        ps = pstats.Stats(request._pr, stream=s).sort_stats(sortby)
        # Print only 10 most time consuming functions
        ps.print_stats(10)
        print s.getvalue()
        return response

And add books.middleware.ProfilerMiddleware at the top of your MIDDLEWARE_CLASSES.

MIDDLEWARE_CLASSES = (
    'books.middleware.ProfilerMiddleware',
    # Assuming you have some custom middlewares here, even they will be profiled
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # This middleware will be profiled too.
    # 'books.middleware.SomeCustomMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
)

Try any url and you should see the profiler output on the runserver console.

Explanation

  • We put our middleware at top of MIDDLEWARE_CLASSES.
  • So this middleware's process_request() will be executed before any other middleware's process_request(). It will also be executed before any other method of any other middleware, like process_view().
  • We enable profiling in process_request() so everything hereafter will be profiled. So process_request() and process_view() of any other middleware will be profiled.
  • We disable profiling in process_response() of our middleware. process_response() of this middleware will run last, i.e. after process_response() of all other middlewares has run.
  • This way process_response() of all other middlewares get profiled too.

Pavel Stehule: aliasing psql

From Planet PostgreSQL. Published on Jul 15, 2015.

When you run psql without arguments, psql tries to connect to a database named after the current user. I dislike this behavior. Naturally, I have no database named "pavel". There is a trivial solution - add this code to your .bashrc:
function psql {
    if [[ $# -eq 0 ]]; then
        env psql postgres
    else
        env psql "$@"
    fi
}

gabrielle roth: PDXPUG – July meeting – OSCON BoF Session

From Planet PostgreSQL. Published on Jul 15, 2015.

When: 7-8pm Wed July 22
Where: Oregon Convention Center, Room E141

We’re having a Birds of a Feather session at OSCON instead of our usual July meeting.

Registration is not required to attend the BoF. O’Reilly would prefer we register, though – here’s a code for a free pass if you would like to also check out the Expo Hall: https://twitter.com/oscon/status/621013720257269760


Greg Sabino Mullane: Selectively firing Postgres triggers

From Planet PostgreSQL. Published on Jul 15, 2015.

Being able to disable Postgres triggers selectively can be an important skill when doing tasks like bulk updates, in which you only want a subset of the triggers on the table to be fired. Read below for the long explanation, but the TL;DR version of the best solution is to set a WHEN clause on the trigger you wish to skip, making it conditional on a variable such as session_replication_role or application_name.


CREATE TRIGGER mytrig AFTER INSERT ON foobar FOR EACH 
  ROW WHEN (current_setting('session_replication_role') <> 'local') EXECUTE PROCEDURE myfunc();
BEGIN;
SET LOCAL session_replication_role = 'local';
UPDATE foobar SET baz = 123;
COMMIT;

I decided to spin up a free Heroku "Hobby Dev" database to illustrate the solutions. Generating a test table was done by using the Pagila project, as it has tables which contain triggers. Heroku gives you a randomly generated user and database name. To install the Pagila schema, I did:

$ export H="postgres://vacnvzatmsnpre:2iCDp-46ldaFxgdIx8HWFeXHM@ec2-34-567-89.compute-1.amazonaws.com:5432/d5q5io7c3alx9t"
$ cd pagila-0.10.1
$ psql $H -q -f pagila-schema.sql
$ psql $H -q -f pagila-data.sql

Errors appeared on the import, but they can be safely ignored. One error was because the Heroku database does not have a user named "postgres", and the other error was due to the fact that the Heroku user is not a superuser. The data, however, was all intact. The sample data is actually quite funny, as the movie titles were semi auto-generated at some point. For example, seven random movie descriptions:

  • A Brilliant Panorama of a Madman And a Composer who must Succumb a Car in Ancient India
  • A Touching Documentary of a Madman And a Mad Scientist who must Outrace a Feminist in An Abandoned Mine Shaft
  • A Lackluster Reflection of a Eskimo And a Wretch who must Find a Fanny Pack in The Canadian Rockies
  • A Emotional Character Study of a Robot And a A Shark who must Defeat a Technical Writer in A Manhattan Penthouse
  • A Amazing Yarn of a Hunter And a Butler who must Defeat a Boy in A Jet Boat
  • A Beautiful Reflection of a Womanizer And a Sumo Wrestler who must Chase a Database Administrator in The Gulf of Mexico
  • A Awe-Inspiring Reflection of a Waitress And a Squirrel who must Kill a Mad Cow in A Jet Boat

The table we want to use for this post is named "film", and comes with two triggers on it, 'film_fulltext_trigger', and 'last_updated':

heroku=> \d film
                            Table "public.film"
        Column        |            Type             |       Modifiers
----------------------+-----------------------------+---------------------------------
 film_id              | integer                     | not null default 
                                                      nextval('film_film_id_seq'::regclass)
 title                | character varying(255)      | not null
 description          | text                        | 
 release_year         | year                        | 
 language_id          | smallint                    | not null
 original_language_id | smallint                    | 
 rental_duration      | smallint                    | not null default 3
 rental_rate          | numeric(4,2)                | not null default 4.99
 length               | smallint                    | 
 replacement_cost     | numeric(5,2)                | not null default 19.99
 rating               | mpaa_rating                 | default 'G'::mpaa_rating
 last_update          | timestamp without time zone | not null default now()
...
Triggers:
    film_fulltext_trigger BEFORE INSERT OR UPDATE ON film FOR EACH ROW EXECUTE 
       PROCEDURE tsvector_update_trigger('fulltext', 'pg_catalog.english', 'title', 'description')
    last_updated BEFORE UPDATE ON film FOR EACH ROW EXECUTE PROCEDURE last_updated()

The last_updated trigger calls the last_updated() function, which simply sets the last_update column to CURRENT_TIMESTAMP, often seen in its shorter-to-type form, now(). This is a handy metric to track, but there are times when you want to make changes and *not* update this field. A typical example is some sort of bulk change that does not warrant changing all the rows' last_update fields. How to accomplish this? We need to ensure that the trigger does not fire when we do our UPDATE. The approach many people are familiar with is to simply disable all triggers on the table. So you would do something like this:

BEGIN;
ALTER TABLE film DISABLE TRIGGER ALL;
UPDATE film SET rental_duration = 10;
ALTER TABLE film ENABLE TRIGGER ALL;
COMMIT;

When using Heroku, you are given a regular user, not a Postgres superuser, so the above will generate an error that looks like this:

ERROR:  permission denied: "RI_ConstraintTrigger_a_88776583" is a system trigger.

This is caused by the failure of a normal user to disable the internal triggers Postgres uses to maintain foreign key relationships between tables. So the better way is to simply disable the specific trigger like so:

BEGIN;
ALTER TABLE film DISABLE TRIGGER last_updated;
UPDATE film SET rental_duration = 10;
ALTER TABLE film ENABLE TRIGGER last_updated;
COMMIT;

This works on Heroku, but there are two major problems with the ALTER TABLE solution. First, the ALTER TABLE will take a very heavy lock on the entire table, meaning that nobody else will be able to access the table - even to read it! - until your transaction is complete (although 9.5 will reduce this lock!). The other problem with disabling triggers this way is that it is too easy to accidentally leave it in a disabled state (although the check_postgres program has a specific check for this!). Let's take a look at the lock, and double check that the trigger has been disabled as well:

heroku=> SELECT last_update FROM film WHERE film_id = 123;
        last_update         
----------------------------
 2015-06-21 16:38:00.891019
heroku=> BEGIN;
heroku=> ALTER TABLE film DISABLE TRIGGER last_updated;
heroku=> SELECT last_update FROM film WHERE film_id = 123;
heroku=> UPDATE film SET rental_duration = 10;
-- We need the subselect because we share with a gazillion other Heroku databases!
heroku=> select relation::regclass,mode,granted from pg_locks where database = 
heroku->   (select oid from pg_database where datname = current_database());
 relation |        mode         | granted 
----------+---------------------+---------
 pg_locks | AccessShareLock     | t
 film     | RowExclusiveLock    | t
 film     | AccessExclusiveLock | t  ## This is a very heavy lock!
## Version 9.5 and up will have a ShareRowExclusive lock only!
heroku=> ALTER TABLE film ENABLE TRIGGER last_updated;
heroku=> COMMIT;

-- This is the same value, because the trigger did not fire when we updated
heroku=> select last_update FROM film WHERE film_id = 123;
        last_update         
----------------------------
 2015-06-21 16:38:00.891019

What we really want is to use the powerful session_replication_role parameter to safely disable the triggers. The problem is that the canonical way to disable triggers, by setting session_replication_role to 'replica', will disable ALL triggers and rules, for ALL tables. This is not wanted. In our example, we want to stop the last_updated trigger from firing, but also want all the other user triggers to fire, as well as the hidden system triggers that are enforcing foreign key referential integrity.

You can set session_replication_role to one of three values: origin (the default), local, and replica. Setting it to "replica" is commonly used in replication systems such as Bucardo and Slony to prevent all rules and triggers from firing. It can also be used for careful bulk loading. Only triggers explicitly set as "replica triggers" will fire when the session_replication_role is set to 'replica'. The local setting is a little harder to understand, as it does not have a direct mapping to a trigger state, as 'origin' and 'replica' do. Instead, it can be thought of as an alias to 'origin' - same functionality, but with a different name. What use is that? Well, you can check the value of session_replication_role and do things differently depending on whether it is 'origin' or 'local'. Thus, it is possible to teach a trigger that it should not fire when session_replication_role is set to 'local' (or to fire *only* when it is set to 'local').

Thus, our previous problem of preventing the last_updated trigger from firing can be solved by careful use of the session_replication_role. We want the trigger to NOT fire when session_replication_role is set to 'local'. This can be accomplished in two ways: modification of the trigger, or modification of the underlying function. Each has its strengths and weaknesses. Note that session_replication_role can only be set by a superuser, which means I'll be switching from Heroku (which only allows connecting as a non-superuser) to a local Pagila database.

For the modify-the-function route, add a quick block at the top to short-circuit the trigger if the session_replication_role (srr) is set to 'local'. An advantage to this method is that all triggers that invoke this function will be affected. In the pagila database, there are 14 tables that have a trigger that calls the last_updated function. Another advantage is that the exception to the function firing is clearly visible in the function's definition itself, and thus easy to spot when you examine the function. Here is how you would modify the last_updated function so that it does nothing when in 'local' srr mode:

CREATE OR REPLACE FUNCTION public.last_updated()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $bc$
BEGIN
  IF current_setting('session_replication_role') = 'local' THEN
    RETURN NEW;
  END IF;

  NEW.last_update = CURRENT_TIMESTAMP;
  RETURN NEW;
END
$bc$;

To invoke it, we change session_replication_role (temporarily!) to 'local', then make our changes. Observe how the value of last_update does not change when we are in 'local' mode:

pagila=# show session_replication_role \t\g
 origin

pagila=# begin;
BEGIN
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:00.711411

pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:03.543831
pagila=# commit;
COMMIT

pagila=# begin;
BEGIN
pagila=# set LOCAL session_replication_role = 'local';
SET
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:03.543831
pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
 2015-06-21 16:38:03.543831
pagila=# commit;
COMMIT

pagila=# show session_replication_role;
 origin

The second method for skipping a trigger by using session_replication_role is to modify the trigger definition itself, rather than changing the function. This has the advantage of not having to touch the function at all, and also allows you to see that the trigger has been modified when doing a \d of the table. Using ALTER TRIGGER only allows a rename, so we will need to drop and recreate the trigger. By adding a WHEN clause to the trigger, we can ensure that it does NOT fire when session_replication_role is set to 'local'. The SQL looks like this:

pagila=# begin;
BEGIN
pagila=# drop trigger last_updated ON film;
DROP TRIGGER
pagila=# create trigger last_updated before update on film for each row 
pagila-#   when (current_setting('session_replication_role') <> 'local') execute procedure last_updated();
CREATE TRIGGER
pagila=# commit;
COMMIT

Voila! As before, we can test it out by setting session_replication_role to 'local' and confirming that the function does not modify the last_update column. Before doing that, let's also change the function back to its original form, to keep things honest:

-- Restore the original version, with no session_replication_role logic:
pagila=# CREATE OR REPLACE FUNCTION public.last_updated() RETURNS TRIGGER LANGUAGE plpgsql
AS $bc$ BEGIN NEW.last_update = CURRENT_TIMESTAMP; RETURN NEW; END $bc$;
CREATE FUNCTION

-- Normal update will change the last_update column:
pagila=# select last_update from film where film_id = 203;
        last_update         
----------------------------
 2015-06-21 16:38:00.121011

pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
        last_update         
----------------------------
 2015-06-21 16:38:03.011004

pagila=# begin;
pagila=# set LOCAL session_replication_role = 'local';
SET
pagila=# update film set rental_duration = 10 WHERE film_id = 203;
UPDATE 1
pagila=# select last_update from film where film_id = 203;
        last_update         
----------------------------
 2015-06-21 16:38:03.011004

-- Show that we are not holding a heavy lock:
pagila=# select relation::regclass,mode,granted from pg_locks where relation::regclass::text = 'film';
 relation |       mode       | granted 
----------+------------------+---------
 film     | AccessShareLock  | t
 film     | RowExclusiveLock | t

pagila=# commit;
COMMIT

Those are the three main ways to selectively disable a trigger on a table: using ALTER TABLE to completely disable it (and invoking a heavy lock), having the function check session_replication_role (affects all triggers using it, requires superuser), and having the trigger use a WHEN clause (requires superuser). Sharp readers may note that being a superuser is not really required, as something other than session_replication_role could be used. Thus, a solution is to use a parameter that can be changed by anyone, that will not affect anything else, and can be set to a unique value. Here is one such solution, using the handy "application_name" parameter. We will return to the Heroku database for this one:

heroku=> drop trigger last_updated on film;
heroku=> create trigger last_updated before update on film for each row 
  when (current_setting('application_name') <> 'skiptrig') execute procedure last_updated();

heroku=> select last_update from film where film_id = 111;
 2015-06-21 16:38:00.365103
heroku=> update film set rental_duration = 10 WHERE film_id = 111;
UPDATE 1
heroku=> select last_update from film where film_id = 111;
 2015-06-21 16:38:03.101115

heroku=> begin;
BEGIN
heroku=> set LOCAL application_name = 'skiptrig';
SET
heroku=> update film set rental_duration = 10 WHERE film_id = 111;
UPDATE 1
heroku=> select last_update from film where film_id = 111;
 2015-06-21 16:38:03.101115

-- Show that we are not holding a heavy lock:
heroku=> select relation::regclass,mode,granted from pg_locks where database = 
heroku->   (select oid from pg_database where datname = current_database());
 relation |       mode       | granted 
----------+------------------+---------
 film     | AccessShareLock  | t
 film     | RowExclusiveLock | t

heroku=> commit;
COMMIT

So there you have it - four solutions to the problem of skipping a single trigger. Which to use depends on your circumstances. I prefer the WHEN + session_replication_role option, as it forces you to be a superuser, and is very visible when looking at the trigger via \d.
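For completeness, here is roughly what the application_name variant looks like from application code. This is a sketch of mine using psycopg2 (any driver that passes application_name through to libpq will do); the database and table names follow the examples above:

import psycopg2

# Connecting with the "magic" application_name makes the WHEN clause
# on the last_updated trigger evaluate to false for this session.
conn = psycopg2.connect(dbname='pagila', application_name='skiptrig')
with conn.cursor() as cur:
    # Bulk change that should not touch the last_update column.
    cur.execute("UPDATE film SET rental_duration = 10")
conn.commit()
conn.close()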

Andrew Dunstan: Dropping columns on partitioned tables.

From Planet PostgreSQL. Published on Jul 15, 2015.

Say you have a partitioned table and you want to add a column. There is no problem - you just add the column to the parent table, and it is added to all the children. But what if you want to drop a column? Then things are not so straightforward. If the child's column was created before it was inherited then it won't be dropped just by dropping it on the parent. So it very much depends on how the child is set up. If you do:
create table child () inherits (parent);
then dropping a column in the parent drops it in the child too. But if you do:
create table child (like parent);
alter table child inherit parent;
then dropping a column in the parent won't drop it in the child. The pg_partman package follows this second pattern when setting up child partitions, as I discovered yesterday when a client ran into this problem. In this case you have to delete the column from the children yourself. I devised the following snippet of code to accomplish this after you have deleted the column from the parent:
do $$
declare
    child oid;
begin
    for child in
        select inhrelid
          from pg_inherits
         where inhparent = 'parent'::regclass
    loop
        execute 'alter table ' || child::regclass ||
                ' drop column if exists some_column';
    end loop;
end;
$$;

Michael Paquier: Postgres 9.5 feature highlight: pg_file_settings to finely track system configuration

From Planet PostgreSQL. Published on Jul 15, 2015.

PostgreSQL 9.5 is coming up with a new feature aimed at simplifying the tracking of GUC parameters when they are set across multiple files, by introducing a new system view called pg_file_settings:

commit: a97e0c3354ace5d74c6873cd5e98444757590be8
author: Stephen Frost <sfrost@snowman.net>
date: Fri, 8 May 2015 19:09:26 -0400
Add pg_file_settings view and function

The function and view added here provide a way to look at all settings
in postgresql.conf, any #include'd files, and postgresql.auto.conf
(which is what backs the ALTER SYSTEM command).

The information returned includes the configuration file name, line
number in that file, sequence number indicating when the parameter is
loaded (useful to see if it is later masked by another definition of the
same parameter), parameter name, and what it is set to at that point.
This information is updated on reload of the server.

This is unfiltered, privileged, information and therefore access is
restricted to superusers through the GRANT system.

Author: Sawada Masahiko, various improvements by me.
Reviewers: David Steele

In short, pg_file_settings can prove to be quite useful when a set of configuration files is used to set up the server, for example by including them with include or include_if_not_exists. For example, let's imagine a server with the following tiny configuration:

$ cat $PGDATA/postgresql.conf
shared_buffers = '1GB'
work_mem = '50MB'
include = 'other_params.conf'
$ cat $PGDATA/other_params.conf
log_directory = 'pg_log'
logging_collector = on
log_statement = 'all'

Then this new system view is able to show where each parameter comes from and the value assigned to it:

=# SELECT * FROM pg_file_settings;
          sourcefile          | sourceline | seqno |       name        | setting | applied | error
------------------------------+------------+-------+-------------------+---------+---------+-------
 /to/pgdata/postgresql.conf   |          1 |     1 | shared_buffers    | 1GB     | t       | null
 /to/pgdata/postgresql.conf   |          2 |     2 | work_mem          | 50MB    | t       | null
 /to/pgdata/other_params.conf |          1 |     3 | log_directory     | pg_log  | t       | null
 /to/pgdata/other_params.conf |          2 |     4 | logging_collector | on      | t       | null
 /to/pgdata/other_params.conf |          3 |     5 | log_statement     | all     | t       | null
 (5 rows)

Among the information given, such as the line of the configuration file where the parameter has been detected, is what makes this view useful for operators: "applied" is a boolean defining whether a given parameter can be applied on the server or not. If the parameter cannot be applied correctly, the reason why can be seen by looking at the "error" field.

Note that the configuration file postgresql.auto.conf is also taken into account. Then let's see what happens when setting new values for parameters already defined in other files:

=# ALTER SYSTEM SET work_mem = '25MB';
ALTER SYSTEM
=# ALTER SYSTEM SET shared_buffers = '250MB';
ALTER SYSTEM
=# SELECT sourcefile, name, setting, applied, error
   FROM pg_file_settings WHERE name IN ('work_mem', 'shared_buffers');
           sourcefile            |      name      | setting | applied |            error
---------------------------------+----------------+---------+---------+------------------------------
 /to/pgdata/postgresql.conf      | shared_buffers | 1GB     | f       | null
 /to/pgdata/postgresql.conf      | work_mem       | 50MB    | f       | null
 /to/pgdata/postgresql.auto.conf | work_mem       | 25MB    | t       | null
 /to/pgdata/postgresql.auto.conf | shared_buffers | 250MB   | f       | setting could not be applied
(4 rows)

Note that, as already mentioned above, "applied" defines whether the parameter can be applied on the server, and it reflects the state the server would face after reloading its parameters. Hence in this case, if parameters are reloaded, the new value of work_mem, which is 25MB, can be applied successfully, while the new value of shared_buffers, although selected as the winning candidate, cannot be applied because it requires a server restart, as reported by the view. Then, when restarting the server, the values are applied for all fields:

=# SELECT sourcefile, name, setting, applied, error
   FROM pg_file_settings WHERE name IN ('work_mem', 'shared_buffers');
            sourcefile           |      name      | setting | applied | error
---------------------------------+----------------+---------+---------+-------
 /to/pgdata/postgresql.conf      | shared_buffers | 1GB     | f       | null
 /to/pgdata/postgresql.conf      | work_mem       | 50MB    | f       | null
 /to/pgdata/postgresql.auto.conf | work_mem       | 25MB    | t       | null
 /to/pgdata/postgresql.auto.conf | shared_buffers | 250MB   | t       | null
(4 rows)

And it is possible to clearly see the values that are selected by the system for each parameter.

Incorrect parameters also get special treatment. For example, when defining a parameter that the server cannot identify, here is what pg_file_settings complains about:

=# SELECT sourcefile, name, error FROM pg_file_settings WHERE name = 'incorrect_param';
          sourcefile          |      name       |                error
------------------------------+-----------------+--------------------------------------
 /to/pgdata/other_params.conf | incorrect_param | unrecognized configuration parameter
(1 row)

This is definitely going to be useful for operators who manipulate large sets of configuration files daily and need to determine whether a modified parameter will be correctly taken into account by the server or not. And that's very appealing for the upcoming 9.5.
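For monitoring purposes, a small script can flag every entry that would not take effect after a reload, whether it is masked by a later duplicate or outright broken. A minimal sketch with psycopg2 (the DSN and function name are placeholders of mine; remember the view is restricted to superusers):

import psycopg2

def check_file_settings(dsn="dbname=postgres"):
    conn = psycopg2.connect(dsn)
    with conn.cursor() as cur:
        cur.execute("""
            SELECT sourcefile, sourceline, name, setting, error
              FROM pg_file_settings
             WHERE NOT applied OR error IS NOT NULL
        """)
        rows = cur.fetchall()
    conn.close()
    for sourcefile, sourceline, name, setting, error in rows:
        print("%s:%s %s = %s (%s)" %
              (sourcefile, sourceline, name, setting, error or "not applied"))
    return rows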

Django Birthday: Recap

By Revolution Systems Blog from Django community aggregator: Community blog posts. Published on Jul 14, 2015.

Django Birthday: Recap

Pavel Stehule: simple parallel run statement in every database in PostgreSQL cluster

From Planet PostgreSQL. Published on Jul 14, 2015.

When I need to execute some statement in every database of a PostgreSQL cluster, I use a script:
for db in `psql -At -c "select datname from pg_database where datname not in ('template0','template1')"`;
do
    psql -At -c "select current_database(), pg_size_pretty(pg_table_size('pg_attribute')) where pg_table_size('pg_attribute') > 100 * 1024 * 1024" $db;
done
Today I needed to run a VACUUM statement on selected databases, so I needed to find a way to run this slow statement. I was surprised how simple this task is thanks to the powerful xargs command. And a nice bonus - I can run these slow queries in parallel, because xargs can run the given command with multiple workers (the -P option):
# find databases with bloated pg_attribute table, and enforce VACUUM
for db in `psql -At -c "select datname from pg_database where datname not in ('template0','template1')"`;
do
    psql -At -c "select current_database() where pg_table_size('pg_attribute') > 100 * 1024 * 1024" $db;
done | xargs -P 3 -I % psql % -c "vacuum full verbose analyze pg_attribute"
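The same pattern can be expressed in Python if you would rather avoid shell quoting. A rough sketch of mine using multiprocessing and psycopg2, mirroring the 100MB threshold and the three workers of the shell version:

from multiprocessing import Pool
import psycopg2

def bloated_databases():
    # Collect databases whose pg_attribute table exceeds 100MB.
    conn = psycopg2.connect(dbname='postgres')
    with conn.cursor() as cur:
        cur.execute("select datname from pg_database "
                    "where datname not in ('template0', 'template1')")
        dbs = [row[0] for row in cur.fetchall()]
    conn.close()
    bloated = []
    for db in dbs:
        conn = psycopg2.connect(dbname=db)
        with conn.cursor() as cur:
            cur.execute("select pg_table_size('pg_attribute') > 100 * 1024 * 1024")
            if cur.fetchone()[0]:
                bloated.append(db)
        conn.close()
    return bloated

def vacuum_pg_attribute(db):
    conn = psycopg2.connect(dbname=db)
    conn.autocommit = True  # VACUUM cannot run inside a transaction block
    with conn.cursor() as cur:
        cur.execute("vacuum full verbose analyze pg_attribute")
    conn.close()

if __name__ == '__main__':
    # Three workers, like xargs -P 3.
    Pool(3).map(vacuum_pg_attribute, bloated_databases())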

Q3 2015 ShipIt Day ReCap

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Jul 14, 2015.

Last Friday marked another ShipIt Day at Caktus, a chance for our employees to set aside client work for experimentation and personal development. It’s always a wonderful chance for our developers to test new boundaries, learn new skills and sometimes even build something entirely new in a single day.


NC Nwoko and Mark Lavin teamed up to develop a pizza calculator app. The app simply and efficiently calculates how much pizza any host or catering planner needs to order to feed a large group of people. We eat a lot of pizza at Caktus. Noticing deficiencies in other calculators on the internet, NC and Mark built something simple, clean, and (above all) well researched. In the future, they hope to add size mixing capabilities and as well as a function for calculating the necessary ratios to provide for certain dietary restrictions, such as vegan, vegetarian, or gluten-free eaters.

Jeff Bradberry and Scott Morningstar worked on getting Ansible functioning to replace SALT states in the Django project template and made a lot of progress. And Karen Tracey approached some recent test failures, importing solutions from the database, while Rebecca Muraya began the massive task of updating some of our client based projects to Python 3.

Hunter MacDermut continued building the game he started last ShipIt Day, an HTML5 game using the Phaser framework. He added logic and other game-like elements to make a travelable board with the goal of destroying opponents. He also added animated sprites, including animations for an attack, giving each character their own unique moves. The result was a lot of fun to watch!

Dmitriy Chukhin and Caleb Smith developed a YouTube listening queue using ReactJS, using JQuery for the data layer. They loved the tag functions inherent in ReactJS as well as the speed.

Victor Rocha wrote a new admin action that enables a user to export models as a CSV file. He even found time to open source his work.

Vinod Kurup spent his day fixing RapidSMS bugs, creating two new pull requests. You can find them here and here. Once reviewed, they will be incorporated in the next RapidSMS release.

Neil Ashton worked through three chapters of experiments from The Foundations of Statistics: a Simulation-based Approach using iPython Notebook. He subsequently fell in love with iPython Notebook. An interactive computational environment, the iPython notebook seems the perfect platform for Neil’s love of data visualization and interactive experimentation. The iPython notebook ultimately allows the user to combine code execution, rich text, mathematics, plots, and other rich media.

Ross Pike spent the day exploring Font Awesome, the open-source library for scalable vector icons. He also took several tutorials in Sketch, an application for designing websites, interfaces, icons, and pretty much anything else.

Tobias McNulty spent some time working on the next release of django-cache-machine, a 3rd party Django app that adds caching and automatic invalidation to your Django models on a per-model basis. This ShipIt Day he worked on adding Python 3 support (with help from Vinod) and added a feature to support invalidation of queries when new model instances are created.

Finally, inspired by the open data apps built by Code for Durham, Rebecca Conley used D3 to write data visualizations of data on North Carolina’s public schools. Eventually, she wants to test more complex bar graph visualizations as well as learn data visualization in D3 beyond the bar graph.

We had a number of people on vacation this ShipIt Day and several administrators and team members who couldn’t put away their typical workload this time around. But no matter; there is always the next ShipIt Day!

Repository for existing Extbase model class

By zoe.vc - Development & Freelancing Blog from Django community aggregator: Community blog posts. Published on Jul 13, 2015.

If you want to use an existing extbase model like Category but want to create a custom repository to implement some custom queries, here is the way to go:

Create typo3conf/ext/your_ext/Classes/Domain/Model/Category.php:

<?php
namespace Dummy\YourExt\Domain\Model;
/**
 * Category
 */
class Category extends \TYPO3\CMS\Extbase\Domain\Model\Category {

}

Create typo3conf/ext/your_ext/Classes/Domain/Repository/CategoryRepository.php:

<?php
namespace Dummy\YourExt\Domain\Repository;
/**
 * The repository for Categories
 */
class CategoryRepository extends \TYPO3\CMS\Extbase\Domain\Repository\CategoryRepository {
    // some custom query methods here
}

Create typo3conf/ext/your_ext/ext_typoscript_setup.txt:

config.tx_extbase{
    persistence{
        classes{
            Dummy\YourExt\Domain\Model\Category {
                mapping {
                    tableName = sys_category
                    recordType = Tx_YourExt_Category
                }
            }
        }
    }
}

Clear the cache and you're ready to go!

Frontend Code Strategy with Django

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Jul 13, 2015.

Web development is getting more and more focused on front-end development. This leads to a lot of questions about the best way to handle compiling css, javascript, coffeescript, stylus, etc. I have been involved with a project where the frontend deals with 100+ files between stylus files, coffeescript files, javascript, and css, which is kind of crazy to manage without a good strategy or even knowing how to accomplish it.

How Frontend Code Works in Django

When working with django you need to consider how the code is going to go from your code base to being displayed in the browser. The normal path is for it to live in a static directory which you edit and run locally. Then in production you run the collectstatic command and it moves everything to a folder for your webserver to serve from. With that in mind, as long as the above happens you can do almost anything you want for your frontend code. Let's take a look at the 3 most common ways you can do front-end development with django.

Three Popular Ways to Write Frontend Code

Everything in the static folder

The most common way we start projects is to just put everything in a static folder at the root of our project. We add img, css, and js folders to keep things organized. We then use the staticfiles app's template helpers to load our static files without having to hard code their paths. This will get us a long way, and for most sites is a good approach, because let's face it, most sites don't move past hobby projects.

To get this working about the only configuration you need is:

INSTALLED_APPS = (
    ...
    'django.contrib.staticfiles',
    ...
)

STATIC_URL = '/static/'
STATICFILES_DIRS = (
    os.path.join(BASE_DIR, 'static'),
)

STATIC_ROOT = os.path.join(BASE_DIR, 'static_final')

To see it in action you can watch a video here: Staticfiles, template filters, and commands

Third Party Django Library

The next step people often move to is using a third party django app which kind of takes care of everything for you. You set in the settings what you want to use to compile and preprocess code. The django app then does all the processing and finally outputs the result where you tell it to, generally via the settings above. These libraries are fairly simple and straightforward to use, especially since they don't really break your workflow, and you get the benefit of using the latest cool technologies.

The two most popular packages I have seen are django-compressor and django-pipeline, a fork of compressor. Both are fairly simple to set up and configure, but they do have their limitations and oddities. I would recommend doing a side project with each to evaluate which one you like and to determine whether it is right for you.
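
As a rough illustration of the settings-driven approach, here is the kind of configuration django-pipeline expected in its 1.x-era releases; the bundle names and file paths below are made up, and the exact setting names may differ between versions, so check the project documentation:

# settings.py (sketch only; paths and bundle names are invented)
STATICFILES_STORAGE = 'pipeline.storage.PipelineCachedStorage'

PIPELINE_CSS = {
    'site': {
        'source_filenames': (
            'css/site.styl',      # compiled by a stylus compiler, if one is configured
            'css/vendor/*.css',
        ),
        'output_filename': 'css/site.min.css',
    },
}

PIPELINE_JS = {
    'site': {
        'source_filenames': (
            'js/app.coffee',      # compiled by a coffeescript compiler, if one is configured
            'js/vendor/*.js',
        ),
        'output_filename': 'js/site.min.js',
    },
}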

If you want to check out a video for django-pipeline go here: Compile and compress assets with django-pipeline

Front-end Build System

The final way most often seen for building front end code is to use an entire build system outside of django completely. You setup your static file locations like we did above, and then you tell the independent build system to output your files there. Generally you would use something like grunt, gulp or brunch to do this. This method gives you the most flexibility and freedom, but is the most time consuming to setup and learn. If you heavily rely on front end code then this is probably the best solution, especially for single page apps.

Also, by taking the time to learn some of these tools, you become a better all-around developer by getting involved in things beyond just django and python. Generally these build systems are completely custom, or based on something in the node.js world.

There is also a video on this method, using gulp; check it out here: Use gulp.js to manage static assets

Conclusion

Knowing the best way to build your assets is hard; I use all three methods in various projects. So I take a tiered approach to decide when to choose which method. If it is a small site that I am going to write once and forget about, or maybe touch a couple of times a year, I don't worry about a build system at all; it is not worth the effort. If I am going to touch the code about once a month, I will generally use django-pipeline. However, if I am touching the code base 2+ times a month it means I am probably touching it more than that and I need the most flexibility, so I will use gulp.

These are the rules I follow, but I suggest you take a look at these methods and find your own sweet spot.

Hans-Juergen Schoenig: Transactional DDLs

From Planet PostgreSQL. Published on Jul 13, 2015.

When checking out those new features of Sybase 15.7 (yes, from time to time I got to see what commercial databases are up to),  I stumbled over an interesting and yet amusing line: “Fully Recoverable DDL”. The details seem to indicate that not just Sybase is still having a hard time to handle transactional and […]

Pavel Stehule: New Orafce and Plpgsql_check extensions released

From Planet PostgreSQL. Published on Jul 11, 2015.

I released a new versions of these packages: Orafce and Plpgsql_check. Its mostly bugfix releases with PostgreSQL 9.5 support.

Rendering your STL files with matplotlib using numpy-stl

By wol.ph from Django community aggregator: Community blog posts. Published on Jul 10, 2015.

It’s been a while since the first version of numpy-stl and a lot has changed since.

Most importantly, usage has become even easier. So here’s a demo on how to use matplotlib to render your stl files:

from stl import mesh
from mpl_toolkits import mplot3d
from matplotlib import pyplot

# Create a new plot
figure = pyplot.figure()
axes = mplot3d.Axes3D(figure)

# Load the STL files and add the vectors to the plot
your_mesh = mesh.Mesh.from_file('tests/stl_binary/HalfDonut.stl')
axes.add_collection3d(mplot3d.art3d.Poly3DCollection(your_mesh.vectors))

# Auto scale to the mesh size
scale = your_mesh.points.flatten(-1)
axes.auto_scale_xyz(scale, scale, scale)

# Show the plot to the screen
pyplot.show()

You can make the render prettier yourself of course, but it is certainly useful for testing.

Screen Shot 2015-07-10 at 18.53.28

Shaun M. Thomas: PG Phriday: 10 Ways to Ruin Performance: MAXimized Value

From Planet PostgreSQL. Published on Jul 10, 2015.

I apologize for putting this series on a short hiatus last week for the 4th of July. But worry not, for this week is something special for all the developers out there! I’m going to try to make your life easier for a change. Screw the database!

As a PGDB (PostgreSQL) DBA, it’s easy to get tied up in performance hints, obscure syntax, and mangled queries, but it’s really all about the people. These men and women who hit our databases all day long in an attempt to hit insane deadlines often stare at the screen in defeat, trying to get something to work because they’re afraid to ask for help. I know, because I used to be one of them in my bygone developer days.

Sometimes performance isn’t just about the database. Queries can be insanely complex for seemingly simple requests. Even developers who are familiar with SQL syntax don’t speak it fluently, and will often come up with rather bizarre—yet functional—solutions to get a specific data set. Nobody wins when this happens, not the poor dev who fought a truculent query all day, nor the DBA who probably doesn’t understand all the business logic that drove it.

Let’s use an example from a company I worked with in the past. This data set should give us something fairly close to what happened back then.

CREATE TABLE sys_order
(
    order_id     SERIAL       NOT NULL,
    account_id   INT          NOT NULL,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now(),
    valid_dt     TIMESTAMPTZ  NULL
);
 
INSERT INTO sys_order (account_id, product_id, item_count,
       order_dt, valid_dt)
SELECT (a.id % 1000) + 1, (a.id % 100000) + 1, (a.id % 100) + 1,
       now() - (id % 1000 || 'd')::INTERVAL + (id || 'ms')::INTERVAL,
       CASE WHEN a.id % 499 = 0
            THEN NULL
            ELSE now() - (id % 999 || 'd')::INTERVAL
       END
  FROM generate_series(1, 1000000) a(id);
 
ALTER TABLE sys_order ADD CONSTRAINT pk_order_id
      PRIMARY KEY (order_id);
 
CREATE INDEX idx_order_account_id
    ON sys_order (account_id);
 
CREATE INDEX idx_order_order_dt
    ON sys_order (order_dt DESC);
 
ANALYZE sys_order;

Consider the scenario: retrieve all of the order details for the most recent order on every account in the system that ordered something in the last month. The developer tasked with solving this conundrum knew about GROUP BY syntax, and also had enough experience to realize it wouldn’t work. Using GROUP BY is mandatory for all non-aggregate columns in a query. Thus, if we wanted the maximum order date for each account, we’d lose all of the other order information.

Given those constraints, the developer came up with this:

SELECT o.*
  FROM sys_order o
  JOIN (
        SELECT account_id,
               MAX(order_dt) AS order_dt
          FROM sys_order
         WHERE order_dt > CURRENT_DATE
                        - INTERVAL '1 month'
         GROUP BY account_id
       ) s USING (account_id, order_dt);

It may not be immediately obvious, but this approach only worked because the order system was built in such a way that an account could not order two different items simultaneously. Without that scenario, more than one row would come back for each account when joined back to the order table. But beyond that caveat, there’s an even more distressing implication. Forget the previous discussion if you can, then ask yourself this question: what does that query do?

I’ll admit I scratched my head for a couple of minutes the first time I encountered it. The intent is almost completely obfuscated by the structure, which really defeats the purpose of SQL as a language. Generally the idea of SQL is to make a request, not to tell the database how to fulfill it. So it’s a good thing SQL has syntax specifically to address this kind of request.

In PGDB, the DISTINCT ON clause lets us specify columns that should be used in determining whether or not a row is actually distinct, independent of any potential aggregation. It works like this: the first unique combination of any listed columns wins. That means we can naturally control the output simply by issuing a standard ORDER BY.

Here’s how that looks:

SELECT DISTINCT ON (account_id) *
  FROM sys_order
 WHERE order_dt > CURRENT_DATE
                - INTERVAL '1 month'
 ORDER BY account_id, order_dt DESC

Now look at the query and ask the same question as before: what does this query do? Well, for each distinct account, return all columns for the most recent order in the last month. Isn’t that much more straight-forward? As a bonus, this doesn’t have the potential for breaking if the system changes and allows batch ordering of multiple products per transaction.

So we now have a query that’s easier to understand, is safer and more consistent, and uses far simpler SQL syntax. The only possible drawback is that DISTINCT ON is often a bit slower than a more convoluted approach. But that’s fine; developer time is valuable. As a DBA, I know which query I’d rather try and debug!

Hopefully, I’m not the only one.

Joshua Drake: Tip for West side U.S. folks going to PgConf.EU in October

From Planet PostgreSQL. Published on Jul 09, 2015.

This tip works very well for me because of my physical location (Bellingham, WA) but it would also work reasonably well for anyone flying from Denver->West Coast including places such as Houston. It does take a little bit of patience though.

A normal trip for me would mean driving down to SEA, which takes 90 minutes to 2 hours. This year, I decided on a whim to see what it would take to fly out of YVR (Vancouver, B.C.), which is only a 60 minute drive.

Since I would be flying out of YVR on a non-connecting flight, I paid Canadian Dollars. For those that haven't been paying attention, the U.S. dollar has been doing very well lately (even overtaking the euro on some days). For example, the Canadian dollar is 27% cheaper right now than the U.S. dollar and thus my flight was 27% cheaper.

You can't use YVR as a connection; you must fly out of YVR. Therefore if you are in the aforementioned areas, you would fly into Seattle or Bellingham and then drive to YVR to catch a separate flight. Be patient and give yourself enough time (so you don't miss your flight), and you are going to save a lot of money.

Cheerio and make sure you register for my class!

Kevin Grittner: Deleting backup_label on restore will corrupt your database!

From Planet PostgreSQL. Published on Jul 09, 2015.

The quick summary of this issue is that the backup_label file is an integral part of your database cluster binary backup, and removing it to allow the recovery to proceed without error is very likely to corrupt your database.  Don't do that.

Note that this post does not attempt to provide complete instructions for how to restore from a binary backup -- the documentation has all that, and it is of no benefit to duplicate it here; this is to warn people about a common error in the process that can corrupt databases when people try to take short-cuts rather than following the steps described in the documentation.

How to Lose Data


The Proximate Cause

If you are not careful to follow the documentation's instructions for archiving, binary backup, and PITR restore the attempt to start the restored database may fail, and you may see this in the log:
FATAL:  could not locate required checkpoint record
HINT:  If you are not restoring from a backup, try removing the file "$PGDATA/backup_label".
... where $PGDATA is the path to the data directory.  It is critically important to note that the hint says to try removing the file "If you are not restoring from a backup".  If you are restoring from a backup, removing the file will prevent recovery from knowing what set of WAL records need to be applied to the copy to put it into a coherent state; it will assume that it is just recovering from a crash "in place" and will be happy to apply WAL forward from the completion of the last checkpoint.  If that last checkpoint happened after you started the backup process, you will not replay all the WAL needed to achieve a coherent state, and you are very likely to have corruption in the restored database.  This corruption could result in anything from the database failing to start to errors about bad pages to silently returning incorrect results from queries when a particular index is used.  These problems may appear immediately or lie dormant for months before causing visible problems.

Note that you might sometimes get lucky and not experience corruption.  That doesn't mean that deleting the file when restoring from a backup is any more safe than stepping out onto a highway without checking for oncoming traffic -- failure to get clobbered one time provides no guarantee that you will not get clobbered if you try it again.


Secondary Conditions

Now, if you had followed all the other instructions from the documentation for how to restore, making the above mistake would not corrupt your database.  It can only do so as the last step in a chain of mistakes.  Note that for restoring a backup you are supposed to make sure that the postmaster.pid file and the files in the pg_xlog subdirectory have been deleted.  Failure to do so can cause corruption if the database manages to recover in spite of the transgressions.  But if you have deleted (or excluded from backup) the files in the pg_xlog directory, deleting the backup_label file is likely to result in another failure to start, with this in the log:
LOG:  invalid primary checkpoint record
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
What the hint from the first error above doesn't say is that if you are restoring from a backup, you should check that you don't have any files in pg_xlog from the time of the backup, you should check that you do not have a postmaster.pid file, and you should make sure you have a recovery.conf file with appropriate contents (including a restore_command entry that will copy from your archive location).
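
To make that checklist concrete, here is a purely illustrative helper one could run against a restored data directory before starting the server; it is a sketch of the checks above, not a substitute for the documented restore procedure:

import os

def check_restored_datadir(pgdata):
    """Rough sanity checks mirroring the points above (sketch only)."""
    problems = []
    if not os.path.exists(os.path.join(pgdata, "backup_label")):
        problems.append("backup_label is missing -- never delete it when restoring a backup")
    if os.path.exists(os.path.join(pgdata, "postmaster.pid")):
        problems.append("postmaster.pid was carried over from the backup source")
    xlog = os.path.join(pgdata, "pg_xlog")
    if os.path.isdir(xlog) and os.listdir(xlog):
        problems.append("pg_xlog still contains files from the time of the backup")
    if not os.path.exists(os.path.join(pgdata, "recovery.conf")):
        problems.append("recovery.conf (with a restore_command) is missing")
    return problems

# Example: print the problems for a hypothetical restore location.
for problem in check_restored_datadir("/restore/pgdata"):
    print(problem)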


Why Does This Happen?


The Recovery Process

Restoring from a binary backup makes use of the same recovery process that prevents data loss on a crash of the server.  As pages for relations (tables, indexes, etc.) and other internal structures are modified, these changes are made in RAM buffers which are not written out to the OS until they have been journalled to the Write Ahead Log (WAL) files and flushed to persistent storage (e.g., disk).  Periodically there is a checkpoint, which writes all of the modified pages out to the OS and tells the OS to flush them to permanent storage.  So, if there is a crash, the recovery process can look to the last checkpoint and apply all WAL from that point forward to reach a consistent state.  WAL replay will create, extend, truncate, or remove tables as needed, modify data within files, and will tolerate the case that these changes were already flushed to the main files or have not yet made it to persistent storage.  To handle possible race conditions around the checkpoint, the system tracks the last two checkpoints, and if it can't use one of them it will go to the other.

When you run pg_start_backup() it waits for a distributed (or "paced") checkpoint in process to complete, or (if requested to do so with the "fast" parameter) forces an immediate checkpoint at maximum speed.  You can then copy the files in any order while they are being modified as long as the copy is completed before pg_stop_backup() is called.  Even though there is not consistency among the files (or even within a single file), WAL replay (if it starts from the point of the checkpoint related to the call to pg_start_backup()) will bring things to a coherent state just as it would in crash recovery.
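
As an illustration of that ordering only (not a complete backup procedure; follow the documentation), a 9.4-era exclusive backup driven from Python might look roughly like this, where psycopg2, the paths and the connection string are all assumptions:

import shutil
import psycopg2

conn = psycopg2.connect("dbname=postgres")
conn.autocommit = True
cur = conn.cursor()

# The second argument requests an immediate ("fast") checkpoint.
cur.execute("SELECT pg_start_backup(%s, true)", ("nightly",))
try:
    # The copy does not have to be internally consistent; WAL replay starting
    # from the checkpoint recorded in backup_label will fix that up.
    shutil.copytree("/var/lib/pgsql/data", "/backups/nightly",
                    ignore=shutil.ignore_patterns("pg_xlog", "postmaster.pid"))
finally:
    cur.execute("SELECT pg_stop_backup()")
conn.close()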

The backup_label File

How does the recovery process know where in the WAL stream it has to start replay for it to be possible to reach a consistent state?  For crash recovery it's simple: it goes to the last checkpoint that is in the WAL based on data saved in the global/pg_control file.  For restoring a backup, the starting point in the WAL steam must be recorded somewhere for the recovery process to find and use.  That is the purpose of the backup_label file.  The presence of the file indicates to the recovery process that it is restoring from a backup, and tells it what WAL is needed to reach a consistent state.  It also contains information that may be of interest to a DBA, and is in a human-readable format; but that doesn't change the fact that it is an integral part of a backup, and the backup is not complete or guaranteed to be usable if it is removed.

Recovery


If you delete the file and cannot prove that there were no checkpoints after pg_start_backup() was run and before the backup copy was completed, you should assume that the database has hidden corruption.  If you can restore from a backup correctly, that is likely to be the best course; if not, you should probably use pg_dump and/or pg_dumpall to get a logical dump, and restore it to a fresh cluster (i.e., use initdb to get a cluster free from corruption to restore into).

Avoidance

If you read the documentation for restoring a binary backup, and follow the steps provided, you will never see this error during a restore and will not suffer the corruption problems.

Marco Slot: Scalable PostgreSQL on Amazon RDS using masterless pg_shard

From Planet PostgreSQL. Published on Jul 09, 2015.

The pg_shard extension provides a transparent, automatic sharding solution for PostgreSQL. It can shard a table across a cluster of PostgreSQL nodes, storing shards in regular tables. All communication between the pg_shard master and worker nodes happens using regular SQL commands, which allows almost any PostgreSQL server to act as a worker node, including Amazon RDS instances.

Apart from simplifying database administration, using Amazon RDS or a similar solution with pg_shard has another benefit. RDS instances have automatic failover using streaming replication, which means that it is not necessary to use pg_shard’s built-in replication for high availability. Without replication, pg_shard can be used in a multi-master / masterless set-up.

At this week’s PGDay UK, we demonstrated a distributed PostgreSQL cluster consisting of 4 worker nodes on RDS and 2 master nodes on EC2 with pg_shard installed (as shown below). We showed how the cluster automatically recovers when you terminate workers or master nodes while running queries. To make it even more interesting, we put the master nodes in an auto-scaling group and put a load-balancer in front of them. This architecture is somewhat experimental, but it can support a very high number of transactions per second and very large data sizes.

Architecture diagram

Now what good is a demo if you can’t play with it yourself? To start your very own state-of-the-art distributed PostgreSQL cluster with 4 worker nodes on Amazon RDS: Launch it using CloudFormation! Make sure to enter a (long) database password and your EC2 keypair in the Parameters screen. You can leave the other settings on their defaults.

Once stack creation completes (~25 minutes), you can find the hostname of the load-balancer in the Outputs tab in the CloudFormation console (may require refresh), the hostnames of the master nodes can be found in the EC2 console, and the worker nodes in the RDS console.

We recommend you start by connecting to one of the master nodes over SSH. On the master node, run psql and enter the following commands:

CREATE TABLE customer_reviews
(
    customer_id TEXT NOT NULL,
    review_date DATE,
    review_rating INTEGER,
    review_votes INTEGER,
    review_helpful_votes INTEGER,
    product_id CHAR(10),
    product_title TEXT,
    product_sales_rank BIGINT,
    product_group TEXT,
    product_category TEXT,
    product_subcategory TEXT,
    similar_product_ids CHAR(10)[]
);

SELECT master_create_distributed_table('customer_reviews', 'customer_id');
SELECT master_create_worker_shards('customer_reviews', 128, 1);
\q

Every master node has a script to sync metadata to the other master nodes. In the shell, run:

sync-metadata customer_reviews

Now you should be able to run queries on the customer_reviews table via the load-balancer by running the command below. See the pg_shard github page for some example queries to use.

psql -h $(cat /etc/load-balancer)

To ingest some interesting data, use the following commands to INSERT rows using 256 parallel streams:

wget http://examples.citusdata.com/customer_reviews_{1998..2004}.csv.gz
gzip -d customer_reviews_*.csv.gz
parallel-copy-in -P 256 -C -h $(cat /etc/load-balancer) customer_reviews_1998.csv customer_reviews

In our initial benchmarks using the CloudFormation template we saw well over 100k INSERTS/second across the 4 RDS instances. It should be noted that there will be some limitations to using Amazon RDS with pg_shard until Amazon adds the pg_shard extension to RDS. For example, copying shards between worker nodes is not supported.

Sharded tables can also be queried in parallel for real-time analytics using CitusDB, which pushes down computation to the worker nodes and supports JOINs. This uniquely positions PostgreSQL as a platform that can support real-time data ingestion, fast sharded queries, and real-time analytics at a massive scale. An example of running queries using CitusDB is shown below:

\timing
SET pg_shard.use_citusdb_select_logic TO on;
SELECT review_date, count(*) FROM customer_reviews WHERE review_date BETWEEN '1998-01-01' AND '1998-01-31' GROUP BY review_date ORDER BY review_date ASC;

If you’d like to learn more about using CitusDB for real-time analytics for big data, don’t hesitate to contact us.

Dinesh Kumar: Parallel Operations With pl/pgSQL

From Planet PostgreSQL. Published on Jul 09, 2015.


Hi,

I am pretty sure there is a better heading for this post; for now, I am going with this one. If you can suggest a proper heading, I will update it :-)

OK, let me explain the situation. Then I will let you know what I am trying to do here, and how I did it.

Situation here is,

We have a table on which we need to run an UPDATE for "R" number of records. The update query uses some joins to get the desired result and then updates the table. Processing these "R" records takes "H" hours, and it also puts load on the production server. So we planned to run this UPDATE as a batch process. Each batch processes "N" records, and one batch UPDATE takes "S" seconds.

With the above batch process, the production server is pretty stable and doing great. So we planned to run these batch updates in parallel, that is, "K" sessions running UPDATEs on different records. Of course, we could also increase the batch size here, but we want to use more CPUs to complete all these UPDATEs as soon as possible.

Problem here is,

So, as I said, we need to run multiple UPDATEs on multiple records in parallel. But how is one session going to communicate with the other sessions about these batches of records?
I mean,
if one session is running updates on records 1 to 1000, how does the second session know that the other session is processing 1 to 1000?
If the second session knows this information, it will start from 1001 to 2000 in parallel. This is the problem I am trying to solve here.

I am not sure whether this is the optimal solution, but it works for my requirement. :-) Let me know if you see any problems with it.

Object Definitions
                      Table "public.test"
Column | Type | Modifiers
--------+---------+----------------------------------------------------
t | text |
i | boolean |
seq | bigint | not null default nextval('test_seq_seq'::regclass)

postgres=# INSERT INTO test VALUES(generate_series(1, 9000), false, generate_series(1, 9000));
INSERT 0 9000

postgres=# \ds testing
          List of relations
 Schema |  Name   |   Type   |  Owner
--------+---------+----------+----------
 public | testing | sequence | postgres
(1 row)


CREATE OR REPLACE FUNCTION public.update_test_parallel(batch integer)
RETURNS void
LANGUAGE plpgsql
AS $function$
DECLARE
VAR BIGINT;
DUMMY TEXT;
BEGIN

-- Adding this for Demo
--

SELECT pg_sleep(10) INTO DUMMY;

SELECT pg_advisory_lock(-1234) INTO DUMMY;

SELECT nextval('testing') INTO VAR;
EXECUTE E'SELECT nextval(\'testing\') FROM generate_series('||VAR||','||VAR+BATCH||')';

-- We need to decrease the sequence value by two, since we executed nextval expression twice
-- Otherwise, it will affect the other session''s execution.
--
SELECT setval('testing', currval('testing')-2) INTO DUMMY;

SELECT pg_advisory_unlock(-1234) INTO DUMMY;

-- I want to update the test table of the column "I" with value "true".
--
UPDATE test SET I=true WHERE SEQ BETWEEN VAR AND (VAR+BATCH);


RAISE NOTICE 'VAR IS %, VAR+BATCH IS %', VAR, (VAR+BATCH);
RAISE NOTICE 'CURRENT SEQ VALUE IS %', currval('testing');

EXCEPTION WHEN OTHERS THEN

-- If there is an exception, we need to reset the sequence to it''s start position again.
-- So that, the other sessions, will try with the same sequence numbers.
--
SELECT setval('testing', VAR-1) INTO DUMMY;
SELECT pg_advisory_unlock(-1234) INTO DUMMY;
RAISE EXCEPTION '%', SQLERRM;
END;
$function$;
Session 1
 
postgres=# SELECT public.update_test_parallel(3000);
NOTICE: VAR IS 1, VAR+BATCH IS 3001
NOTICE: CURRENT SEQ VALUE IS 3000
update_test_parallel
----------------------

(1 row)
Session 2
 
postgres=# SELECT public.update_test_parallel(3000);
NOTICE: VAR IS 3001, VAR+BATCH IS 6001
NOTICE: CURRENT SEQ VALUE IS 6000
update_test_parallel
----------------------

(1 row)
Session 3
 
postgres=# SELECT public.update_test_parallel(3000);
NOTICE: VAR IS 6001, VAR+BATCH IS 9001
NOTICE: CURRENT SEQ VALUE IS 9000
update_test_parallel
----------------------

(1 row)
Desired result
 
postgres=# SELECT COUNT(*) FROM test WHERE i is true;
count
-------
9000
(1 row)

In the above implementation, I used a sequence for the sessions' parallel execution, with the help of advisory locks. Hope this helps others as well.
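
As an aside, driving the "K" parallel sessions can itself be scripted. Here is a rough sketch in Python, where psycopg2, concurrent.futures and the connection string are assumptions and only the function name comes from the code above:

from concurrent.futures import ThreadPoolExecutor
import psycopg2

def run_batch(batch_size):
    # Each worker uses its own connection, i.e. its own backend session.
    conn = psycopg2.connect("dbname=postgres")
    conn.autocommit = True
    with conn.cursor() as cur:
        cur.execute("SELECT public.update_test_parallel(%s)", (batch_size,))
    conn.close()

K = 3  # number of parallel sessions
with ThreadPoolExecutor(max_workers=K) as pool:
    for _ in range(K):
        pool.submit(run_batch, 3000)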

Thanks as always for reading, and I welcome your inputs.
 --Dinesh Kumar

Craig Ringer: BDR 0.9.2 and BDR-PostgreSQL 9.4.4 released

From Planet PostgreSQL. Published on Jul 07, 2015.

Version 0.9.2 of the BDR (Bi-Directional Replication) extension for PosgreSQL has been released.

This is a maintenance release in the current stable 0.9.x series, focused on bug fixes, stability and usability improvements. In particular bdr_init_copy, the pg_basebackup-based node bring-up tool, is significantly improved in this update.

This release also updates the BDR-patched version of PostgreSQL itself to version 9.4.4.

Sources and RPMs (for RHEL and CentOS 6 and 7, Fedora 21 and 22) are available. Debian/Ubuntu packages will be pushed to the repository soon.

As before, the sources will build and run fine on any Linux/BSD or on Mac OS X, but do not currently support Windows.

The release notes have more detail on what has changed in this release.

Paul Ramsey: 2.1.8 Released

From Planet PostgreSQL. Published on Jul 06, 2015.

Due to a number of bugs capable of crashing a back-end, we have released 2.1.8. If you are running an earlier version on a public site, we recommend that you update as soon as possible.

http://download.osgeo.org/postgis/source/postgis-2.1.8.tar.gz

View all closed tickets for 2.0.8.

Invitation to the Django-UserGroup Hamburg on July 8

By Arne Brodowski from Django community aggregator: Community blog posts. Published on Jul 01, 2015.

The next meeting of the Django-UserGroup Hamburg takes place on Wednesday, 08.07.2015 at 19:30. Note: new location! This time we are meeting in the offices of Smaato Inc., Valentinskamp 70, Emporio 19th floor, 20355 Hamburg.

At this meeting there will be a talk about TDD for APIs by Michael Kuehne.

Please be in the foyer downstairs at around 19:20; we will then go up to the 19th floor together. If you arrive later, please tell the reception desk that you want to go to Smaato Inc and you will be let up.

As always, everyone who is interested in exchanging ideas with other Djangonauts is invited. Registration is not required, but it helps with planning.

More information about the user group is available on our website www.dughh.de.

The organisation of the Django-UserGroup Hamburg now takes place via Meetup. To be automatically informed about future meetings, please become a member of our Meetup group: http://www.meetup.com/django-hh

Observations on the nature of time. And javascript.

By Isotoma Blog from Django community aggregator: Community blog posts. Published on Jun 30, 2015.

In the course of working on one of our latest projects, I picked up an innocuous looking ticket that said: “Date pickers reset to empty on form submission”. “Easy”, I thought. It’s just the values being lost somewhere in form validation. And then I saw the ‘in Firefox and IE’ description. Shouldn’t be too hard, it’ll be a formatting issue or something, maybe even a placeholder, right?

Yeah, no.

Initial Investigations

Everything was fine in Chrome, but not in Firefox. I confirmed the fault also existed in IE (and then promptly ignored IE for now).

The responsible element looked like this:
<input class="form-control datepicker" data-date-format="{{ js_datepicker_format }}" type="date" name="departure_date" id="departure_date" value="{{ form.departure_date.value|default:'' }}">

This looks pretty innocent. It’s a date input, how wrong can that be?

Sit comfortably, there’s a rabbit hole coming up.

On Date Inputs

Date type inputs are a relatively new thing, they’re in the HTML5 Spec. Support for it is pretty mixed. This jumps out as being the cause of it working in Chrome, but nothing else. Onwards investigations (and flapping at colleagues) led to the fact that we use bootstrap-datepicker to provide a JS/CSS based implementation for the browsers that have no native support.

We have an isolated cause for the problem. It is obviously something to do with bootstrap-datepicker, clearly. Right?

On Wire Formats and Localisation

See that data-date-format="{{ js_datepicker_format }}" attribute of the input element. That’s setting the date format for bootstrap-datepicker. The HTML5 date element doesn’t have similar. I’m going to cite this stackoverflow answer rather than the appropriate sections of the documentation. The HTML5 element has the concept of a wire format and a presentation format. The wire format is YYYY-MM-DD (iso8601), the presentation format is whatever the user has the locale set to in their browser.

You have no control over this, it will do that and you can do nothing about it.

bootstrap-datepicker, meanwhile has the data-date-format element, which controls everything about the date that it displays and outputs. There’s only one option for this, the wire and presentation formats are not separated.

This leads to an issue. If you set the date in YYYY-MM-DD format for the html5 element value, then Chrome will work. If you set it to anything else, then Chrome will not work and bootstrap-datepicker might, depending on if the format matches what is expected.

There’s another issue. bootstrap-datepicker doesn’t do anything with the element value when you start it. So if you set the value to YYYY-MM-DD format (for Chrome), then a Firefox user will see 2015-06-24, until they select something, at which point it will change to whatever you specified in data-date-format. But a Chrome user will see it in their local format (24/06/2015 for me, GB format currently).

It’s all broken, Jim.

A sidetrack into Javascript date formats.

The usual answer for anything to do with dates in JS is ‘use moment.js’. But why? It’s a fairly large module, this is a small problem, surely we can just avoid it?

Give me a date:

>>> var d = new Date();
undefined

Lets make a date string!

>>> d.getYear() + d.getMonth() + d.getDay() + ""
"123"

Wat. (Yeah, I know that’s not how you do string formatting and therefore it’s my fault.)

>>> d.getDay()
3

It’s currently 2015-06-24. Why 3?

Oh, that’s day of the week. Clearly.

>>> d.getDate()
24

The method that gets you the day of the month is called getDate(). It doesn’t, you know, RETURN A DATE.

>>> var d = new Date('10-06-2015')
undefined
>>> d
Tue Oct 06 2015 00:00:00 GMT+0100 (BST)

Oh. Default date format is US format (MM-DD-YYYY). Right. Wat.

>>> var d = new Date('31-06-2015')
undefined
>>> d
Invalid Date

That’s… reasonable, given the above. Except that’s a magic object that says Invalid Date. But at least I can compare against it.

>>> var d = new Date('31/06/2015')
undefined
>>> d
Invalid Date

Oh great, same behaviour if I give it UK date formats (/ rather than -). That’s okay.

>>> var d = new Date('31/06/2015')
undefined
>>> d
"Date 2017-07-05T23:00:00.000Z"

Wat.

What’s going on?

The difference here is that I’ve used Firefox, the previous examples are in Chrome. I tried to give an explanation of what that’s done, but I actually have no idea. I know it’s 31 months from something, as it’s parsed the 31 months and added it to something. But I can’t work out what, and I’ve spent too long on this already. Help. Stop.

So. Why you should use moment.js. Because otherwise the old great ones will be summoned and you will go mad.

Also:

ISO Date Format is not supported in Internet Explorer 8 standards mode and Quirks mode.

Yep.

The Actual Problem

Now I knew all of this, I could see the problem.

  1. The HTML5 widget expects YYYY-MM-DD
  2. The JS widget will set whatever you ask it to
  3. We were outputting GB formats into the form after submission
  4. This would then be an incorrect format for the HTML 5 widget
  5. The native widget would not change an existing date until a new one is selected, so changing the output format to YYYY-MM-DD meant that it changed when a user selected something.

A Solution In Two Parts

The solution is to standardise the behaviour and formats across both options. Since I have no control over the HTML5 widget, looks like it’s time to take a dive into bootstrap-datepicker and make that do the same thing.

Deep breath, and here we go…

Part 1

First job is to standardise the output date format in all the places. This means that the template needs to see a datetime object, not a preformatted date.

Once this is done, we can feed the object into the date template tag, with the format filter. Which takes PHP date format strings. Okay, that’s helpful in 2015. Really.

Having figured that out, I changed the date parsing Date Input Formats and made sure the right ISO format is in there.

That made the HTML5 element work consistently. Great.
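
For reference, the Django side of Part 1 can be sketched roughly like this; the form and field names are hypothetical, and only the idea of parsing several input formats while rendering back out in ISO format comes from the steps above:

from django import forms

class BookingForm(forms.Form):
    departure_date = forms.DateField(
        # Accept the ISO wire format first, plus the GB presentation format.
        input_formats=['%Y-%m-%d', '%d/%m/%Y'],
        # Render the stored value back out as ISO so the HTML5 widget accepts it.
        widget=forms.DateInput(
            format='%Y-%m-%d',
            attrs={'type': 'date', 'class': 'form-control datepicker'},
        ),
    )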

Then, to the javascript widget.

bootstrap-datepicker does not do anything with the initial value of the element. To make it behave the same as the HTML5 widget, you need to:

1. Get the locale of the user

2. Get the date format for that locale

3. Set that as the format of the datepicker

4. Read the value

5. Convert the value into the right format

6. Call the setValue event of the datepicker with that value

This should be relatively straightforward, with a couple of complications.

  1. moment.js uses a different date format to bootstrap-datepicker
  2. There is no easy way to get a date format string, so a hardcoded list is the best solution.

// taken from bootstrap-datepicker.js
function parseFormat(format) {
    var separator = format.match(/[.\/\-\s].*?/),
        parts = format.split(/\W+/);
    if (!separator || !parts || parts.length === 0){
        throw new Error("Invalid date format.");
    }
    return {separator: separator, parts: parts};
}

var momentUserDateFormat = getLocaleDateString(true);
var datepickerUserDateFormat = getLocaleDateString(false);

$datepicker.each(function() {
    var $this = $(this);
    var presetData = $this.val();
    $this.data('datepicker').format = parseFormat(datepickerUserDateFormat);
    if (presetData) {
        $this.datepicker('setValue', moment(presetData).format(momentUserDateFormat));
    }
});

A bit of copy and paste code from the bootstrap-datepicker library, some jquery and moment.js and the problem is solved.

Part 2

Now that we have the dates displaying in the right format on page load, we need to ensure they’re sent in the right format after the user has submitted the form. It should just be the reverse operation.

 function rewriteDateFormat(event) {
    var $this = $(event.data.input);
    if ($this.val()) {
        var momentUserDateFormat = getLocaleDateString(true);
        $this.val(moment($this.val(), [momentUserDateFormat, 'YYYY-MM-DD']).format('YYYY-MM-DD'));
    }
}

$datepicker.each(function() {
    var $this = $(this);
     // set the form handler for rewriting the format on submit
    var $form = $this.closest('form');
    $form.on('submit', {input: this}, rewriteDateFormat);
});

And we’re done.

Takeaways

Some final points that I’ve learnt.

  1. Always work in datetime objects until the last possible point. You don’t have to format them.
  2. Default to ISO format unless otherwise instructed
  3. Use parsing libraries

 

Wrapping Up CycleMento

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Jun 23, 2015.

In this video we wrap up the Building a Product series by doing an overview of topics we discussed in the previous 11 videos.
Watch Now...

Announcing the Evennia example-game project "Ainneve"

By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Jun 22, 2015.

The Evennia example-game project is underway!

I was quite impressed with the response I got on the mailing list to my call for developing an Evennia example game (see my Need your Help blog post).

The nature of the responses varied, many were from interested people with little to no experience in Evennia or Python whereas others had the experience but not the time to lead it. It was however clear that the interest to work on an "official" Evennia game is quite big.

I'm happy to announce, however, that after only a week we now have a solid lead developer/manager, George Oliver. Helping him on the technical/architecture side is Whitenoise (who, despite a barren github profile, is a professional developer).

George put together a game proposal based on the OpenAdventure rpg, an open-source (CC-SA) ruleset that is also found on github. The example game is to be named "Ainneve" and its development is found in a separate repository under the github Evennia organisation.

All the relevant links and future discussion can be found on the mailing list.

George and whitenoise have already made it clear that they aim to not only make Ainneve a good example Evennia game for others to learn from and build on, but to make the development itself a possibility for people of all skill levels to get involved. So get in touch with them if you are at all interested in Python, Evennia and mud development!

So thanks to George and whitenoise for taking this on, looking forward to see where it leads!


image from loveintoblender.

Reading MT940 files using Python

By wol.ph from Django community aggregator: Community blog posts. Published on Jun 19, 2015.

Some time ago I wrote a library to read MT940 files with Python. While there are multiple libraries available for this purpose, none of the others really work properly and/or support all variants of the format.

The MT940 library I wrote is slightly different, it’s designed to be able to parse any MT940 file, regardless whether it’s correct or complete. The initial version of the library was very strict and only supported files that perfectly followed the standards, a few months after the release it became obvious that most banks either used different standards when implementing the standard or interpreted the standard different. Regardless, the library gave little to no results or even crashed on some MT940 files.

Upon reflection I rewrote nearly all of the code into a parser that is flexible enough to support any format (even supporting custom processors for specific formats) and wrote test code that exercised every MT940 file I could find on the web. The result: a library that parses pretty much everything out there while still producing a reasonable amount of results.

Usage? As simple as you might imagine. After installing (pip install mt-940, note the dash) usage can be as simple as this:

import mt940
import pprint

transactions = mt940.parse('tests/jejik/abnamro.sta')

print 'Transactions:'
print transactions
pprint.pprint(transactions.data)

print
for transaction in transactions:
    print 'Transaction: ', transaction
    pprint.pprint(transaction.data)

For more examples, have a look at the tests. For example, the preprocessor test:

import pytest
import mt940


@pytest.fixture
def sta_data():
    with open('tests/jejik/abnamro.sta') as fh:
        return fh.read()


def test_pre_processor(sta_data):
    transactions = mt940.models.Transactions(processors=dict(
        pre_closing_balance=[
            mt940.processors.add_currency_pre_processor('USD'),
        ],
        pre_opening_balance=[
            mt940.processors.add_currency_pre_processor('EUR'),
        ],
    ))

    transactions.parse(sta_data)
    assert transactions.data['closing_balance'].amount.currency == 'USD'
    assert transactions.data['opening_balance'].amount.currency == 'EUR'


def test_post_processor(sta_data):
    transactions = mt940.models.Transactions(processors=dict(
        post_closing_balance=[
            mt940.processors.date_cleanup_post_processor,
        ],
    ))

    transactions.parse(sta_data)
    assert 'closing_balance_day' not in transactions.data

Django Birthday Party

By Revolution Systems Blog from Django community aggregator: Community blog posts. Published on Jun 17, 2015.

Django Birthday Party

Beyond Request-Response

By Andrew Godwin from Django community aggregator: Community blog posts. Published on Jun 17, 2015.

Examining one of Django's key abstractions and how it could be updated for a more modern age.

While I love Django dearly, and I think the rate of progress we keep on the project draws a fine balance between progress and backwards-compatibility, sometimes I look ahead to the evolving Web and wonder what we can do to adapt to some more major changes.

We're already well-placed to be the business-logic and data backend to more JavaScript-heavy web apps or native apps; things like Django REST Framework play especially well into that role, and for more traditional sites I still think Django's view and URL abstractions do a decent job - though the URL routing could perhaps do with a refresh sometime soon.

The gaping hole though, to me, was always WebSocket support. It used to be long-poll/COMET support, but we seem to have mostly got past that period now - but even so, the problem remains the same. As a framework, Django is tied to a strict request-response cycle - a request comes in, a worker is tied up handling it until a response is sent, and you only have a small number of workers.

Trying to service long-polling would eat through your workers (and your RAM if you tried to spin up more), and WebSockets are outside the scope of even WSGI itself.

That's why I've come up with a proposal to modify the way Django handles requests and views. You can see the full proposal in this gist, and there's a discussion thread on django-developers, but I wanted to make a blog post explaining more of my reasoning.

The Abstraction

I'm going to skip over why we would need support for these features - I think that's relatively obvious - and jump straight to the how.

In particular, the key thing here is that this is NOT making Django asynchronous in the way where we make core parts nonblocking, or make everything yield or take callbacks.

Instead, the key change is quite small - changing the core "chunk" of what Django runs from views to consumers.

Currently, a view takes a single request and returns a single response:

In my proposed new model, a consumer takes a single message on a channel and returns zero to many responses on other channels:
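
A rough sketch of that change in shape, reusing the Channel API from the prototype shown later in this post (the channel name here is a placeholder, not one the prototype actually defines):

from django.http import HttpResponse
from channels import Channel

# Today: a view -- one request in, one response out.
def my_view(request):
    return HttpResponse("Hello")

# Proposed: a consumer -- one message in, zero or more messages sent on other channels.
@Channel.consumer("example.incoming")          # placeholder channel name
def my_consumer(send_channel, content, **kwargs):
    Channel(send_channel).send(content=content)   # reply on the per-client channel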

The code inside the consumer runs like normal Django view code, complete with things like transaction auto-management if desired - it doesn't have to do anything special or use any new async-style APIs.

On top of that, the old middleware-url-view model can itself run as a consumer, if we have a channel for incoming requests and a channel (per client) for outgoing responses.

In fact, we can extend that model to more than just requests and responses; we can also define a similar API for WebSockets, but with more channels - one for new connections, one for incoming data packets, and one per client for outgoing data.

What this means is that rather than just reacting to requests and returning responses, Django can now react to a whole series of events. You could react to incoming WebSocket messages and write them to other WebSockets, like a chat server. You could dispatch task descriptions from inside a view and then handle them later in a different consumer, once the response is sent back.

The Implementation

Now, how do we run this? Clearly it can't run in the existing Django WSGI infrastructure - that's tied to the request lifecycle very explicitly.

Instead, we split Django into three layers:

  • The interface layers, initially just WSGI and WebSockets at launch. These are responsible for turning the client connections into channel messages and vice-versa.
  • The channel layer, a pluggable backend which transports messages over a network - initially two backends, one database-backed and one redis-backed.
  • The worker layer, which are processes that loop and run any pending consumers when messages are available for them.

The worker is pretty simple - it's all synchronous code, just finding a pending message, picking the right consumer to run it, and running the function until it returns (remember, consumers can't block on channels, only send to them - they can only ever receive from one, the one they're subscribed to, precisely to allow this worker model and prevent deadlocks).

The channel layer is pluggable, and also not terribly complicated; at its core, it just has a "send" and a "receive_many" method. You can see more about this in the prototype code I've written - see the next section.
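
To make that concrete, here is a toy, single-process illustration of those two pieces; it is not the prototype's code, just the minimal shape of a channel layer with send/receive_many plus a worker loop:

from collections import defaultdict, deque

class InMemoryChannelLayer(object):
    """Toy channel layer: just send and receive_many."""
    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, channel, message):
        self.queues[channel].append(message)

    def receive_many(self, channels):
        # Return the first pending (channel, message) pair, or (None, None).
        for channel in channels:
            if self.queues[channel]:
                return channel, self.queues[channel].popleft()
        return None, None

def worker_loop(layer, consumers):
    """consumers maps a channel name to a plain synchronous function."""
    while True:
        channel, message = layer.receive_many(list(consumers))
        if channel is None:
            continue   # a real worker would block or back off here
        consumers[channel](**message)   # run the consumer until it returns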

The interface layers are the more difficult ones to explain. They're responsible for interfacing the channel-layer with the outside world, via a variety of methods - initially, the two I propose are:

  • A WSGI interface layer, that translates requests and responses
  • A WebSocket interface layer, that translates connects, closes, sends and receives

The WSGI interface can just run as a normal WSGI app (it doesn't need any async code to write to a channel and then block on the response channel until a message arrives), but the WebSocket interface has to be more custom - it's the bit of code that lets us write our logic in clean, separate consumer functions by handling all of that connection juggling and keeping track of potentially thousands of clients.

I'm proposing that the first versions of the WebSocket layer are written in Twisted (for Python 2) and asyncio (for Python 3), largely because that's what Autobahn|Python supports, but there's nothing to stop someone writing an interface server that uses any async tech they like (even potentially another language, though you'd have to then also write channel layer bindings).

The interface layers are the glue that lets us ignore asynchrony and connection volumes in the rest of our Django code - they're the things responsible for terminating and handling protocols and interfacing them with a more standard set of channel interactions (though it would always be possible to write your own with its own channel message style if you wanted).

An end-user would only ever run a premade one of these; they're the code that solves the nasty part of the common problem, and all the issues about tuning and tweaking them fall to Django - and I think that's the job of a framework, to handle those complicated parts for you.

Why Workers?

Some people will wonder why this is just a simple worker model - there's nothing particularly revolutionary here, and it's nowhere near rewriting Django to be "asynchronous" internally.

Basically, I don't think we need that. Writing asynchronous code correctly is difficult for even experienced programmers, and what would it accomplish? Sure, we'd be able to eke out more performance from individual workers if we were sending lots of long database queries or API requests to other sites, but Django has, for better or worse, never really been about great low-level performance.

I do think it will perform slightly better than currently - the channel layer, providing it can scale well enough, will "smooth out" the peaks in requests across the workers. Scaling the channel layer is perhaps the biggest potential issue for large sites, but there's some potential solutions there (especially as only the channels listened to by interface servers need to be global and not sharded off into chunks of workers)

What I want is the ability for anyone from beginner programmers and up to be able to write code that deals with WebSockets or long-poll requests or other non-traditional interaction methods, without having to get into the issues of writing async code (blocking libraries, deadlocks, more complex code, etc.).

The key thing is that this proposal isn't that big a change to Django both in terms of code and how developers interact with it. The new abstraction is just an extension of the existing view abstraction and almost as easy to use; I feel it's a reasonably natural jump for both existing developers and new ones working through tutorials, and it provides the key features Django is missing as well as adding other ways to do things that currently exist, like some tasks you might send via Celery.

Django will still work as it does today; everything will come configured to run things through the URL resolver by default, and things like runserver and running as a normal WSGI app will still work fine (internally, an in-memory channel layer will run to service things - see the proper proposal for details). The difference will be that now, when you want to go that step further and have finer control over HTTP response delays or WebSockets, you can now just drop down and do them directly in Django rather than having to go away and solve this whole new problem.

It's also worth noting that while some kind of "in-process" async like greenlets, Twisted or asyncio might let Django users solve some of these problems, like writing to and from WebSockets, they're still process local and don't enable things like chat message broadcast between different machines in a cluster. The channel layer forces this cross-network behaviour on you from the start and I think that's very healthy in application design; as an end developer you know that you're programming in a style that will easily scale horizontally.

Show Me The Code

I think no proposal is anywhere near complete until there's some code backing it up, and so I've written and deployed a first version of this code, codenamed channels.

You can see it on GitHub: https://github.com/andrewgodwin/django-channels

While this feature would be rolled into Django itself in my proposal, developing it as a third-party app initially allows much more rapid prototyping and the ability to test it with existing sites without requiring users to run an unreleased version or branch of Django.

In fact, it's running on this very website, and I've made a simple WebSocket chat server that's running at http://aeracode.org/chat/. The code behind it is pretty simple; here's the consumers.py file:

import redis
from channels import Channel

redis_conn = redis.Redis("localhost", 6379)

@Channel.consumer("django.websocket.connect")
def ws_connect(path, send_channel, **kwargs):
    redis_conn.sadd("chatroom", send_channel)

@Channel.consumer("django.websocket.receive")
def ws_receive(channel, send_channel, content, binary, **kwargs):
    # Ignore binary messages
    if binary:
        return
    # Re-dispatch the message to every member of the chatroom
    for member_channel in redis_conn.smembers("chatroom"):
        Channel(member_channel).send(content=content, binary=False)

@Channel.consumer("django.websocket.disconnect")
def ws_disconnect(channel, send_channel, **kwargs):
    redis_conn.srem("chatroom", send_channel)
    # NOTE: this does not clean up disconnects caused by server crashes;
    # you'd want expiring keys as well in real life.

Obviously, this is a simple example, but it shows how you can have Django respond to WebSockets and both push and receive data. Plenty more patterns are possible; you could push out chat messages in a post_save signal hook, you could dispatch thumbnailing tasks when image uploads complete, and so on.
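
As a rough sketch of that post_save pattern, here's what the chat broadcast might look like as a signal handler. It reuses the "chatroom" Redis set from the example above; the ChatMessage model and its text field are hypothetical.

import redis
from django.db.models.signals import post_save
from django.dispatch import receiver
from channels import Channel

from .models import ChatMessage  # hypothetical model with a `text` field

redis_conn = redis.Redis("localhost", 6379)

@receiver(post_save, sender=ChatMessage)
def broadcast_chat_message(sender, instance, created, **kwargs):
    # Push newly saved messages to every connected WebSocket client,
    # much like the ws_receive consumer above does for live messages.
    if not created:
        return
    for member_channel in redis_conn.smembers("chatroom"):
        Channel(member_channel).send(content=instance.text, binary=False)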

There's not enough space here for all the examples and options, but hopefully this has given you some idea of what I'm going for. If you're interested, I'd also encourage you to download and try the example code; it's nowhere near production-ready yet, and I aim to get it much further along with better documentation soon, but the README should give you some idea.

Your feedback on the proposal and my alpha code is more than welcome; I'd love to know what you think, what you don't like, and what issues you're worried about. You can chime in on the django-developers thread, or you can email me personally at andrew@aeracode.org.

Buildout and Django: djangorecipe updated for gunicorn support

By Reinout van Rees' weblog from Django community aggregator: Community blog posts. Published on Jun 15, 2015.

Most people in the Django world probably use pip to install everything. I (and the company where I work, Nelen & Schuurmans) use buildout instead. If there are any other buildout users left outside of zope/plone, I'd love to hear from you :-)

First the news about the update; after that I'll add a quick note about what's good about buildout, ok?

Djangorecipe 2.1.1 is out. The two main improvements:

  • Lots of old unused functionality has been removed. Project generation, for instance. Django's own startproject is good enough right now. And you can also look at cookiecutter. Options like projectegg and wsgilog are gone as they're not needed anymore.

  • The latest gunicorn releases don't come with django support anymore. You used to have a bin/django run_gunicorn (or python manage.py run_gunicorn) management command, but now you just have to run bin/gunicorn yourproject.wsgi and pass along an environment variable that points at your django settings.

    With the latest djangorecipe, you can add a scripts-with-settings = gunicorn option and it'll create a bin/gunicorn-with-settings script for you that sets the environment variable automatically. Handy!
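
    For reference, a buildout snippet using the new option might look roughly like this. Only the scripts-with-settings = gunicorn option itself comes from the release above; the part name, eggs and settings path are placeholders and the other option values may differ for your setup.

    [buildout]
    parts = django

    [django]
    recipe = djangorecipe
    settings = yourproject.settings
    eggs =
        yourproject
        gunicorn
    # New in djangorecipe 2.1.1: also generate bin/gunicorn-with-settings,
    # a wrapper that sets the django settings environment variable before
    # starting gunicorn against yourproject.wsgi.
    scripts-with-settings = gunicorn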

Advantage of buildout. To me, the advantage of buildout is threefold:

  • Buildout is more fool-proof. With pip/virtualenv you should remember to activate the virtualenv. With buildout, the scripts themselves make sure the correct sys.path is set.

    With pip install something you shouldn't forget the -r requirements.txt option. With buildout, the requirement restrictions ("versions") are applied automatically.

    With pip, you need to set the django settings environment variable in production and staging. With buildout, it is just bin/django like in development: it includes the correct reference to the correct settings file automatically.

    There just isn't anything you can forget!

  • Buildout is extensible. You can extend it with "recipes". Like a django recipe that helps with the settings and so on. Or a template recipe that generates an nginx config based on a template with the django port and hostname already filled in from the buildout config file. Or a sysegg recipe that selectively injects system packages (= hard-to-compile things like numpy, scipy, netcdf4).

  • Buildout "composes" your entire site, as far as possible. Pip "just" grabs your python packages. Buildout can also build NPM and run grunt to grab your javascript and can automatically run bin/django collectstatic -y when you install it in production. And generate an nginx/apache file based on your config's gunicorn port. And generate a supervisord config with the correct gunicorn call with the same port number.

Of course there are drawbacks:

  • The documentation is definitely not up to the standards of django itself. Actually, I don't really want to point at the effectively unmaintained main documentation site at http://www.buildout.org/... You need some experience with buildout to be able to get and keep it working.
  • Most people use pip.

Why do I still use it?

  • The level of automation you can get with buildout ("composability") is great.
  • It is fool-proof. One bin/buildout and everything is set up correctly. Do you trust every colleague (including yourself) to remember 5 different commands to set up a full environment?
  • If you don't use buildout, you have to use pip and virtualenv. And a makefile or something like that to collect all the various parts. Or you even need ansible just to set up a local environment.
  • Syseggrecipe makes it easy to include system packages like numpy, scipy, mapnik and so on. Most pip-using web developers only need a handful of pure python packages. We're deep into GIS and numpy/gdal territory. You don't want to compile all that stuff by hand. You don't want to have to keep track of all the development header file packages!

So... hurray for buildout and for the updated djangorecipe functionality! If you still use it, please give me some feedback at reinout@vanrees.org or in the comments below. I've removed quite a bit of old functionality and I might have broken some use cases. And buildout/django ideas and thoughts are always welcome.

Need your help

By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Jun 15, 2015.

This for all you developers out there who want to make a game with Evennia but are not sure about what game to make or where to start off.

We need an example game

One of the main critiques Evennia gets from newbies is the lack of an (optional) full game implementation to use as an example and a base to build from. So, Evennia needs a full, BSD-licensed example game. I'm talking "diku-like": something you could in principle hook up and let players into within minutes of installing Evennia. The Tutorial world we already have is a start, but it is more of a solo quest; it's not designed to be a full multiplayer game. While Evennia supports other forms of MU* too, the idea is that the systems from a more "code-heavy" MUD can easily be extracted and adapted to a more freeform-style game, whereas the reverse is not generally true.

The exact structure of such a game would be up to the person or team taking this on, but it should be making use of Evennia's api and come distributed as a custom game folder (the folder you get with evennia --init). We will set this up as a separate repository under the Evennia github organisation - a spin-off from the main evennia project, and maintained separately.

We need you!


Thing is, while I (and, I'm sure, other Evennia core devs) am certainly willing to give considerable help and input on such a project, it's not something I have time to take the lead on myself. So I'm looking for enthusiastic coders willing to step up and take the lead on this: both designing and (especially) coding such an example game. Even if you have your own game in mind for the future, you will still need to build most of these systems, so starting with a generic system will still help you towards that final goal - plus you get to be immortalized in the code credits, of course.


Suggestion for game

Being an example game, it should be well-documented and following good code practices (this is something we can always fix and adjust as we go though). The systems should be designed as stand-alone/modular as possible to make them easy to rip out and re-purpose (you know people will do so anyway). These are the general features I would imagine are needed (they are open to discussion):
  • Generic Tolkien-esque fantasy theme (lore is not the focus here, but it can still be made interesting)
  • Character creation module
  • Races (say, 2-3)
  • Classes (say 2-3)
  • Attributes and Skills (based on D&D? Limit number of skills to the minimal set)
  • Rule module for making skill checks, rolls etc (D&D rules?)
  • Combat system (twitch? Turn-based?)
  • Mobs, both friendly and aggressive, with AI
  • Trade with NPC / other players (money system)
  • Quest system
  • Eventual new GM/admin tools as needed
  • Small game world (batch-built) to demonstrate all features (of good quality to show off)
  • More? Less?

I'm interested!

Great! As a first step we are looking for a driven lead dev for this project: a person who has the enthusiasm, coding experience and drive to see the project through and manage it. You will (hopefully) get plenty of collaborators willing to help out, but it is my experience that a successful hobby project really needs at least one person taking responsibility to "lead the charge" and have the final say on features; collaborative development can otherwise easily mean that everyone does their own thing or cannot agree on a common course. This would be a spin-off from the main Evennia project and maintained separately, as mentioned above.

Reply to this thread if you are willing to participate in the project at any level, including chipping in with code from your own ongoing development. I don't know if there'd be any "competition" over the lead-dev position, but if multiple really enthusiastic and willing devs step forward, we'll handle that then.

So get in touch!

Is Open Source Consulting Dead?

By chrism from . Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemnification

By chrism from . Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from . Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at http://connpass.com/event/1439/

Why I Like ZODB

By chrism from . Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from . Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from . Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from . Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from . Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from . Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from . Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from . Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from . Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from . Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from . Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.

PyCon 2011 Report

By chrism from . Published on Mar 19, 2011.

My personal PyCon 2011 Report