Open Source Posts

Shaun M. Thomas: PG Phriday: In the Window

From Planet PostgreSQL. Published on Feb 24, 2017.

I’ll be the first to admit that I found Postgres window functions fantastically confusing when I first encountered them. They’re a powerful and versatile tool for building reports and summaries, but that functionality hides behind a fairly steep learning curve. One of the ways to combat their inherent complexity is to fully explore how they work, instead of just trying to wrap our heads around the expected results.

Window doggies have gotten decidedly smug

To that end, let’s set up a quick set of data in the customary fashion:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE + INTERVAL '1d' - ((s.id * 10) || 'm')::INTERVAL
  FROM generate_series(1, 5000) s(id);

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

ANALYZE sensor_log;

Yes, it’s the trusty sensor_log table once again. This time around, we only really care about demonstration in lieu of volume, so we’ve elected for five thousand rows in place of the usual five million. Our data represents one thousand sensors sequentially taking readings every minute. This should provide enough overlap to easily demonstrate what’s going on behind the scenes.

Let’s start with probably the easiest window function of the lot: row_number. All it does is number rows in the result set so we have a kind of counter that’s useful in a lot of different contexts. How do the first ten rows for today look?

SELECT location, reading, row_number() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;

 location | reading | row_number
----------+---------+------------
 143      |      43 |          1
 142      |      42 |          2
 141      |      41 |          3
 140      |      40 |          4
 139      |      39 |          5
 138      |      38 |          6
 137      |      37 |          7
 136      |      36 |          8
 135      |      35 |          9
 134      |      34 |         10

Window functions must be called on some kind of data window. An empty set of () represents the entire data set, with no ordering, groups, or other shenanigans involved. We’re just numbering the results, and the output would have been no different if we removed the window function. This kind of use is very similar to Oracle’s ROWNUM pseudo-column.

Yet a row number by itself isn’t that interesting. Let’s number the rows in order of sensor reading, and fetch the first ten rows of those results:

SELECT location, reading, row_number() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;

 location | reading | row_number
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

If () was the window representing all rows, then (ORDER BY reading) is that same content after being sorted by the reading column. Not only did Postgres sort our results, but it numbered them in the post-sorted order. This is a very fine distinction! Consider what happens when we move the ORDER BY clause into the query proper.

SELECT location, reading, row_number() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;

 location | reading | row_number
----------+---------+------------
 100      |       0 |         44
 1        |       1 |        143
 101      |       1 |         43
 2        |       2 |        142
 102      |       2 |         42
 3        |       3 |        141
 103      |       3 |         41
 104      |       4 |         40
 4        |       4 |        140
 105      |       5 |         39

What the heck happened here? The data looks exactly the same, but the artificial row numbers are seemingly arbitrary. Indeed they are! By design. This is part of the reason window functions are so difficult to explain and comprehend. The fact of the matter is that each window is a virtual and separate manifestation of the plain query results.

Anyone who has struggled with pointers in C or C++ knows that abstracted structures introduce certain pitfalls into obtaining desired results.

Window functions and you!

To help unravel the mystery a bit, let’s look at the natural state of the results without any window function nonsense. Of course we must also shift the data by 40 rows so we can see some of the same information the window received.

SELECT location, reading
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10 OFFSET 40;

 location | reading
----------+---------
 103      |       3
 102      |       2
 101      |       1
 100      |       0
 99       |      99
 98       |      98
 97       |      97
 96       |      96
 95       |      95
 94       |      94

This output represents the table rows as sorted by the index we created on reading_date. Since these results are not artificially sorted in any way, this is what the window function actually sees without its own specific sort operation. We shifted the results by 40 rows, and as expected, row 44 has a reading of 0. The window function gave us exactly what we requested, but it numbered the rows before Postgres sorted them.

We can actually watch this in action by looking at the two query plans:

EXPLAIN
SELECT location, reading, row_number() OVER (ORDER BY reading)
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 LIMIT 10;

QUERY PLAN

 Limit  (cost=15.91..16.08 rows=10 width=11)
   ->  WindowAgg  (cost=15.91..18.41 rows=143 width=11)
         ->  Sort  (cost=15.91..16.27 rows=143 width=11)
               Sort Key: reading
               ->  Index Scan using idx_sensor_log_date on sensor_log  (cost=0.29..10.79 rows=143 width=11)
                     Index Cond: (reading_date > ('now'::cstring)::date)

EXPLAIN
SELECT location, reading, row_number() OVER ()
  FROM sensor_log
 WHERE reading_date > CURRENT_DATE
 ORDER BY reading
 LIMIT 10;

QUERY PLAN

 Limit  (cost=15.67..15.69 rows=10 width=11)
   ->  Sort  (cost=15.67..16.03 rows=143 width=11)
         Sort Key: reading
         ->  WindowAgg  (cost=0.29..12.58 rows=143 width=11)
               ->  Index Scan using idx_sensor_log_date on sensor_log  (cost=0.29..10.79 rows=143 width=11)
                     Index Cond: (reading_date > ('now'::cstring)::date)

Note that the WindowAgg step occurs at the end of query execution in the first example, taking place directly after a sort operation. The second query sorts after the WindowAgg, indicating the window only has access to unsorted rows. The key detail is that window functions only have access to rows within the query as if it had executed without them. This also happens after other aggregates are applied, meaning it’s a bad idea (or even impossible) to mix regular aggregates with window functions.

The easiest way to comprehend how a window function works is to run the query without it. That’s the data the window has access to, regardless of how we slice and dice it within the window itself. It also explains why we’re unable to refer to window function elements in other query clauses. They’re unusable in predicates and we can’t leverage their calculations to group or limit results.

So imagine for a moment we don’t have the standard Postgres LIMIT clause. This is how we would snag the top ten results of our location readings:

SELECT *
  FROM (SELECT location, reading,
               row_number() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE row_number <= 10;

 location | reading | row_number
----------+---------+------------
 100      |       0 |          1
 101      |       1 |          2
 1        |       1 |          3
 102      |       2 |          4
 2        |       2 |          5
 103      |       3 |          6
 3        |       3 |          7
 4        |       4 |          8
 104      |       4 |          9
 105      |       5 |         10

This is clearly silly when using row_number, but the trick works the same with other window functions. Here’s how we’d obtain the 10th ranked readings for today’s data:

SELECT *
  FROM (SELECT location, reading,
               dense_rank() OVER (ORDER BY reading)
          FROM sensor_log
         WHERE reading_date > CURRENT_DATE
       ) sub
 WHERE dense_rank = 10;

 location | reading | dense_rank
----------+---------+------------
 109      |       9 |         10
 9        |       9 |         10

Keep in mind that Postgres must fetch the full results internally to materialize them for the window functions. This is true whether there are 143 rows as with our example, or 143-million.

The more advanced use cases for window functions are a topic for another day. Consider this a very high-level introduction to how they work and their inherent limitations instead of a comprehensive guide. There’s a lot of material here that deserves closer inspection, so there’s no need to rush. Either way, don’t let window functions confuse you more than necessary. Like any independent agent, you just need to know what they’re doing behind the scenes.

Like this, but with slightly fewer bombs

Bruce Momjian: Index Order Does Matter

From Planet PostgreSQL. Published on Feb 24, 2017.

Postgres has supported multi-column indexes since 1997, e.g. CREATE INDEX i_test ON test (a, b, c). It can easily use an index if the supplied columns are all at the front of the index, e.g. a and b in the previous index, but it can also use the index if some of the indexed column values are not supplied, e.g. columns a and c in the previous index. It does this by looking up a in the index, then looking through the index for matches of c, ignoring values of b, e.g.

CREATE TABLE test (a INT, b INT, c INT);
INSERT INTO test
        SELECT x, x, x FROM generate_series(1, 100000) AS f(x);
CREATE INDEX i_test ON test(a, b, c);
ANALYZE test;
 
EXPLAIN (COSTS false)
SELECT * FROM test WHERE a = 1 AND c = 1;
              QUERY PLAN
--------------------------------------
 Index Only Scan using i_test on test
   Index Cond: ((a = 1) AND (c = 1))

Continue Reading »

Joshua Drake: Are you ready for Next-Generation Parallel Query?

From Planet PostgreSQL. Published on Feb 23, 2017.

The next version of Parallel Query is set to hit in PostgreSQL v10, and you can learn all about it at PgConf.US next month!

Until then, we invite you to catch up on what Parallel Query can do now and Robert Haas's general thoughts on the feature. Watch the video from PgConf.US 2016 here.

Once you are done, make sure to grab a ticket to the largest PostgreSQL conference in North America.

Make Your Developer's Life Easier By Reducing Number Of Opened Tabs in Pycharm

By timonweb.com Django posts from Django community aggregator: Community blog posts. Published on Feb 23, 2017.

When coding, I often find myself overwhelmed by the number of open tabs in my editor. I just seem to have opened every single .py file of a Django project. And it's kinda hard to find my way in this ever-growing pile of tabs.

Given that, I was really happy to ...

Read now

Bruce Momjian: Expression Index Statistics and Joins

From Planet PostgreSQL. Published on Feb 22, 2017.

In my previous blog post, I showed how statistics generated on expression indexes can be used to produce more accurate row counts, and potentially better plans. While I did show more accurate row counts via EXPLAIN, I did not show changed query plans. I plan to do so in this blog post. First, the setup:

CREATE TABLE test1 AS
        SELECT * FROM generate_series(1, 100000) AS f(x);
CREATE TABLE test2 AS
        SELECT * FROM generate_series(1, 2) AS f(x);
 
ANALYZE test1;
ANALYZE test2;

Continue Reading »

Python type annotations

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Feb 22, 2017.

When it comes to programming, I have a belt and suspenders philosophy. Anything that can help me avoid errors early is worth looking into.

The type annotation support that's been gradually added to Python is a good example. Here's how it works and how it can be helpful.

Introduction

The first important point is that the new type annotation support has no effect at runtime. Adding type annotations in your code has no risk of causing new runtime errors: Python is not going to do any additional type-checking while running.

Instead, you'll be running separate tools to type-check your programs statically during development. I say "separate tools" because there's no official Python type checking tool, but there are several third-party tools available.

So, if you choose to use the mypy tool, you might run:

$ mypy my_code.py

and it might warn you that a function that was annotated as expecting string arguments was going to be called with an integer.

Of course, for this to work, you have to be able to add information to your code to let the tools know what types are expected. We do this by adding "annotations" to our code.

One approach is to put the annotations in specially-formatted comments. The obvious advantage is that you can do this in any version of Python, since it doesn't require any changes to the Python syntax. The disadvantages are the difficulties in writing these things correctly, and the coincident difficulties in parsing them for the tools.

To help with this, Python 3.0 added support for adding annotations to functions (PEP-3107), though without specifying any semantics for the annotations. Python 3.6 adds support for annotations on variables (PEP-526).

Two additional PEPs, PEP-483 and PEP-484, define how annotations can be used for type-checking.

Since I try to write all new code in Python 3, I won't say any more about putting annotations in comments.

Getting started

Enough background, let's see what all this looks like.

Python 3.6 was just released, so I’ll be using it. I'll start with a new virtual environment, and install the type-checking tool mypy (whose package name is mypy-lang):

$ virtualenv -p $(which python3.6) try_types
$ . try_types/bin/activate
$ pip install mypy-lang

Let's see how we might use this when writing some basic string functions. Suppose we're looking for a substring inside a longer string. We might start with:

def search_for(needle, haystack):
    offset = haystack.find(needle)
    return offset

If we were to call this with anything that's not text, we'd consider it an error. To help us avoid that, let's annotate the arguments:

def search_for(needle: str, haystack: str):
    offset = haystack.find(needle)
    return offset

Does Python care about this?:

$ python search1.py
$

Python is happy with it. There's not much yet for mypy to check, but let's try it:

$ mypy search1.py
$

In both cases, no output means everything is okay.

(Aside: mypy uses information from the files and directories on its command line plus all packages they import, but it only does type-checking on the files and directories on its command line.)

So far, so good. Now, let's call our function with a bad argument by adding this at the end:

search_for(12, "my string")

If we tried to run this, it wouldn't work:

$ python search2.py
Traceback (most recent call last):
    File "search2.py", line 4, in <module>
        search_for(12, "my string")
    File "search2.py", line 2, in search_for
        offset = haystack.find(needle)
TypeError: must be str, not int

In a more complicated program, we might not have run that line of code until sometime when it would be a real problem, and so wouldn't have known it was going to fail. Instead, let's check the code immediately:

$ mypy search2.py
search2.py:4: error: Argument 1 to "search_for" has incompatible type "int"; expected "str"

Mypy spotted the problem for us and explained exactly what was wrong and where.

We can also indicate the return type of our function:

def search_for(needle: str, haystack: str) -> str:
    offset = haystack.find(needle)
    return offset

and ask mypy to check it:

$ mypy search3.py
search3.py: note: In function "search_for":
search3.py:3: error: Incompatible return value type (got "int", expected "str")

Oops, we're actually returning an integer but we said we were going to return a string, and mypy was smart enough to work that out. Let's fix that:

def search_for(needle: str, haystack: str) -> int:
    offset = haystack.find(needle)
    return offset

And see if it checks out:

$ mypy search4.py
$

Now, maybe later on we forget just how our function works, and try to use the return value as a string:

x = len(search_for('the', 'in the string'))

Mypy will catch this for us:

$ mypy search5.py
search5.py:5: error: Argument 1 to "len" has incompatible type "int"; expected "Sized"

We can't call len() on an integer. Mypy wants something of type Sized -- what's that?

More complicated types

The built-in types will only take us so far, so Python 3.5 added the typing module, which both gives us a bunch of new names for types, and tools to build our own types.

In this case, typing.Sized represents anything with a __len__ method, which is the only kind of thing we can call len() on.
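
As a quick illustration of my own (not from the post), anything that defines __len__ satisfies Sized:

from typing import Sized

def count_items(container: Sized) -> int:
    # str, list, dict, set, ... all define __len__, so all satisfy Sized
    return len(container)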

Let's write a new function that'll return a list of the offsets of all of the instances of some string in another string. Here it is:

from typing import List

def multisearch(needle: str, haystack: str) -> List[int]:
    # Not necessarily the most efficient implementation
    offset = haystack.find(needle)
    if offset == -1:
        return []
    return [offset] + multisearch(needle, haystack[offset+1:])

Look at the return type: List[int]. You can define a new type, a list of a particular type of elements, by saying List and then adding the element type in square brackets.

There are a number of these - e.g. Dict[keytype, valuetype] - but I'll let you read the documentation to find these as you need them.
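
For instance, a quick illustration of my own (not from the post), using the Python 3.6 variable annotations mentioned earlier:

from typing import Dict, List

# Map each word to the list of offsets where it was found
offsets_by_word: Dict[str, List[int]] = {
    'the': [3, 10],
    'string': [7],
}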

mypy passed the code above, but suppose we had accidentally had it return None when there were no matches:

def multisearch(needle: str, haystack: str) -> List[int]:
    # Not necessarily the most efficient implementation
    offset = haystack.find(needle)
    if offset == -1:
        return None
    return [offset] + multisearch(needle, haystack[offset+1:])

mypy should spot that there's a case where we don't return a list of integers, like this:

$ mypy search6.py
$

Uh-oh - why didn't it spot the problem here? It turns out that by default, mypy considers None compatible with everything. To my mind, that's wrong, but luckily there's an option to change that behavior:

$ mypy --strict-optional search6.py
search6.py: note: In function "multisearch":
search6.py:7: error: Incompatible return value type (got None, expected List[int])

I shouldn't have to remember to add that to the command line every time, though, so let's put it in a configuration file just once. Create mypy.ini in the current directory and put in:

[mypy]
strict_optional = True

And now:

$ mypy search6.py
search6.py: note: In function "multisearch":
search6.py:7: error: Incompatible return value type (got None, expected List[int])

But speaking of None, it's not uncommon to have functions that can either return a value or None. We might change our search_for method to return None if it doesn't find the string, instead of -1:

def search_for(needle: str, haystack: str) -> int:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

But now we don't always return an int and mypy will rightly complain:

$ mypy search7.py
search7.py: note: In function "search_for":
search7.py:4: error: Incompatible return value type (got None, expected "int")

When a method can return different types, we can annotate it with a Union type:

from typing import Union

def search_for(needle: str, haystack: str) -> Union[int, None]:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

There's also a shortcut, Optional, for the common case of a value being either some type or None:

from typing import Optional

def search_for(needle: str, haystack: str) -> Optional[int]:
    offset = haystack.find(needle)
    if offset == -1:
        return None
    else:
        return offset

Wrapping up

I've barely touched the surface, but you get the idea.

One nice thing is that the Python libraries are all annotated for us already. You might have noticed above that mypy knew that calling find on a str returns an int - that's because str.find is already annotated. So you can get some benefit just by calling mypy on your code without annotating anything at all -- mypy might spot some misuses of the libraries for you.

For more reading:

New in Django 1.11: Template-based widget rendering

By timonweb.com Django posts from Django community aggregator: Community blog posts. Published on Feb 22, 2017.

There's a very interesting change coming with Django 1.11 - To make customizing widgets easier, form widget rendering is now done using the template system. In older versions, widgets were rendered using Python. You can get more details about it from The form rendering API docs page: https://docs.djangoproject ...

Read now

Kaarel Moppel: PostgreSQL and reporting software

From Planet PostgreSQL. Published on Feb 22, 2017.

I was recently asked if Postgres was “covered” in the area of common reporting needs? By reporting I mean ad hoc or predefined chart generation,  mostly drag-and-drop style software, used most commonly by Business Intelligence people. The answer is of course “yes” – you can do reporting on PostgreSQL as easily as with other popular […]

The post PostgreSQL and reporting software appeared first on Cybertec - The PostgreSQL Database Company.

Pavan Deolasee: Looking forward to PGConf India 2017

From Planet PostgreSQL. Published on Feb 21, 2017.

It has taken a lot of planning and efforts, but I’m happy to see that PGConf India 2017 is coming along very well and promises to be a great event. This is our third year in a row in Bengaluru, but there are many distinguishing factors that make this year stand out:

  1. For the first time, we’re having a dedicated training day ahead of the main conference. All seats for the trainings are sold out and we had to turn away many interested folks.
  2. For the first time, we’re having a multi-track conference.
  3. And for the first time, we’ve crossed 225 conference delegate registrations, and there are still a few days left.

When we started planning for the conference, we were a bit nervous about whether we could get a good response to the training programme, but we've been almost overwhelmed by the positive response we've received. It clearly shows PostgreSQL’s growing popularity and how companies and developers are adopting this technology.

The keynote this year will be delivered by Simon Riggs. He will share his thoughts on why “Persistence” is “Key to PostgreSQL’s Success”. Other than that, there are several topics which will interest developers and administrators, as well as IT managers. So if you want to know how to extract maximum performance from your PostgreSQL database, or how to port from Oracle to PostgreSQL, or want to know more about parallel query processing, then you must attend the conference. You’ll hear not only from some of the best developers, but also some very interesting case studies from real users of the database.

What started as a small meetup of interested folks 4 years back has now turned into a full-scale conference with a large venue, great sponsors and an amazing list of speakers.

Visit the conference website to know more details or contact the organisers at contact@pgconf.in. See you at the conference.

How to Add reCAPTCHA to a Django Site

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Feb 20, 2017.

Google’s reCAPTCHA is a very popular solution to protect your application or website against bots and spam. It is fairly simple to implement. In this tutorial you will find a working example using only built-in libraries, an alternative using requests and also an implementation using decorators, to reuse the reCAPTCHA verification across your application.


Setup

First thing, register your application in the reCAPTCHA admin.

reCAPTCHA Admin

I added the 127.0.0.1 IP address as my domain for testing purposes. Here you are supposed to add your website domain.

After registering your website, you will be handed a Site key and a Secret key. The Site key will be used in the reCAPTCHA widget which is rendered within the page where you want to place it. The Secret key will be stored safely in the server, made available through the settings.py module.

settings.py

GOOGLE_RECAPTCHA_SECRET_KEY = '6LdRSRYUAAAAAOnk5yomm1dI9BmQkJWTg_wIlMJ_'

PS: It is not a good idea to keep this kind of information directly in the settings.py. I’m adding it here so the example is more explicit. Please refer to this article Package of the Week: Python Decouple to learn how to separate configuration from settings, and keep sensitive information in a safe place.
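
As a rough sketch of what that separation might look like (assuming the python-decouple package described in that article; the setting name is simply the one used above):

# settings.py
from decouple import config

# The real value lives in a .env file or an environment variable,
# safely outside version control.
GOOGLE_RECAPTCHA_SECRET_KEY = config('GOOGLE_RECAPTCHA_SECRET_KEY')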


Implementing the reCAPTCHA

Let’s say we want to add a reCAPTCHA in a comment section of a website. Inside the form you are currently using to post the data, add the code provided by reCAPTCHA Admin page:

{% extends 'base.html' %}

{% block content %}
  <form method="post">
    {% csrf_token %}
    {{ form.as_p }}

    <script src='https://www.google.com/recaptcha/api.js'></script>
    <div class="g-recaptcha" data-sitekey="6LdRSRYUAAAAAFCqQ1aZnYfRGJIlAUMX3qkUWlcF"></div>

    <button type="submit" class="btn btn-primary">Post</button>
  </form>
{% endblock %}

Make sure you change the data-sitekey with the correct key for your website. You may also place the script tag in the <head> of your template, or in the bottom of the page (depending on how you are organizing the assets).

Just by adding the tags, the reCAPTCHA widget will already show up.

reCAPTCHA


Validating the reCAPTCHA

Next step is to actually validate the data. It is done by making a POST request to the endpoint https://www.google.com/recaptcha/api/siteverify, containing your Secret key and the data from the reCAPTCHA widget, which is identified by g-recaptcha-response.

Python 2 Solution without Third Party Libraries

You can validate it directly in the view function, using just built-in libs:

views.py

import urllib
import urllib2
import json

from django.shortcuts import render, redirect
from django.conf import settings
from django.contrib import messages

from .models import Comment
from .forms import CommentForm


def comments(request):
    comments_list = Comment.objects.order_by('-created_at')

    if request.method == 'POST':
        form = CommentForm(request.POST)
        if form.is_valid():

            ''' Begin reCAPTCHA validation '''
            recaptcha_response = request.POST.get('g-recaptcha-response')
            url = 'https://www.google.com/recaptcha/api/siteverify'
            values = {
                'secret': settings.GOOGLE_RECAPTCHA_SECRET_KEY,
                'response': recaptcha_response
            }
            data = urllib.urlencode(values)
            req = urllib2.Request(url, data)
            response = urllib2.urlopen(req)
            result = json.load(response)
            ''' End reCAPTCHA validation '''

            if result['success']:
                form.save()
                messages.success(request, 'New comment added with success!')
            else:
                messages.error(request, 'Invalid reCAPTCHA. Please try again.')

            return redirect('comments')
    else:
        form = CommentForm()

    return render(request, 'core/comments.html', {'comments': comments_list, 'form': form})

Basically result['success'] will return True or False, indicating whether the reCAPTCHA is valid or not.
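
For reference, the parsed response is a small dictionary along these lines (field names come from Google's siteverify documentation; the values shown here are purely illustrative):

result = {
    'success': True,                          # False when the validation fails
    'challenge_ts': '2017-02-20T12:34:56Z',   # timestamp of the challenge
    'hostname': '127.0.0.1',                  # site where the reCAPTCHA was solved
}
# On failure, an 'error-codes' list describes what went wrong.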

Python 3 Solution without Third Party Libraries

views.py

import json
import urllib

from django.shortcuts import render, redirect
from django.conf import settings
from django.contrib import messages

from .models import Comment
from .forms import CommentForm


def comments(request):
    comments_list = Comment.objects.order_by('-created_at')

    if request.method == 'POST':
        form = CommentForm(request.POST)
        if form.is_valid():

            ''' Begin reCAPTCHA validation '''
            recaptcha_response = request.POST.get('g-recaptcha-response')
            url = 'https://www.google.com/recaptcha/api/siteverify'
            values = {
                'secret': settings.GOOGLE_RECAPTCHA_SECRET_KEY,
                'response': recaptcha_response
            }
            data = urllib.parse.urlencode(values).encode()
            req =  urllib.request.Request(url, data=data)
            response = urllib.request.urlopen(req)
            result = json.loads(response.read().decode())
            ''' End reCAPTCHA validation '''

            if result['success']:
                form.save()
                messages.success(request, 'New comment added with success!')
            else:
                messages.error(request, 'Invalid reCAPTCHA. Please try again.')

            return redirect('comments')
    else:
        form = CommentForm()

    return render(request, 'core/comments.html', {'comments': comments_list, 'form': form})

Alternative Solution With a Third Party Library

If you don’t mind adding an extra dependency, install the requests library:

pip install requests

Then you can make the POST request in a much simpler way:

views.py

import requests

from django.shortcuts import render, redirect
from django.conf import settings
from django.contrib import messages

from .models import Comment
from .forms import CommentForm


def comments(request):
    comments_list = Comment.objects.order_by('-created_at')

    if request.method == 'POST':
        form = CommentForm(request.POST)
        if form.is_valid():

            ''' Begin reCAPTCHA validation '''
            recaptcha_response = request.POST.get('g-recaptcha-response')
            data = {
                'secret': settings.GOOGLE_RECAPTCHA_SECRET_KEY,
                'response': recaptcha_response
            }
            r = requests.post('https://www.google.com/recaptcha/api/siteverify', data=data)
            result = r.json()
            ''' End reCAPTCHA validation '''

            if result['success']:
                form.save()
                messages.success(request, 'New comment added with success!')
            else:
                messages.error(request, 'Invalid reCAPTCHA. Please try again.')

            return redirect('comments')
    else:
        form = CommentForm()

    return render(request, 'core/comments.html', {'comments': comments_list, 'form': form})

reCAPTCHA Decorator

This is an extra for this post: just an idea of what you can do to reuse the reCAPTCHA verification code across the project.

decorators.py

from functools import wraps

from django.conf import settings
from django.contrib import messages

import requests

def check_recaptcha(view_func):
    @wraps(view_func)
    def _wrapped_view(request, *args, **kwargs):
        request.recaptcha_is_valid = None
        if request.method == 'POST':
            recaptcha_response = request.POST.get('g-recaptcha-response')
            data = {
                'secret': settings.GOOGLE_RECAPTCHA_SECRET_KEY,
                'response': recaptcha_response
            }
            r = requests.post('https://www.google.com/recaptcha/api/siteverify', data=data)
            result = r.json()
            if result['success']:
                request.recaptcha_is_valid = True
            else:
                request.recaptcha_is_valid = False
                messages.error(request, 'Invalid reCAPTCHA. Please try again.')
        return view_func(request, *args, **kwargs)
    return _wrapped_view

Then you can use it like this:

views.py

from django.shortcuts import render, redirect
from django.conf import settings
from django.contrib import messages

from .decorators import check_recaptcha
from .models import Comment
from .forms import CommentForm


@check_recaptcha
def comments(request):
    comments_list = Comment.objects.order_by('-created_at')

    if request.method == 'POST':
        form = CommentForm(request.POST)
        if form.is_valid() and request.recaptcha_is_valid:
            form.save()
            messages.success(request, 'New comment added with success!')
            return redirect('comments')
    else:
        form = CommentForm()

    return render(request, 'core/comments.html', {'comments': comments_list, 'form': form})

That’s about it! Hope you can find it useful somehow. Google’s reCAPTCHA is a very common solution for blocking spam and bots, and it can also be used to mitigate brute force attacks on login pages, for example.

As usual, all the source code is available on GitHub so you can try it by yourself. Make sure you register your own application to get valid Site key and Secret key.

github.com/sibtc/simple-recaptcha

Alexey Lesovsky: Deep dive into postgres stats: Introduction

From Planet PostgreSQL. Published on Feb 20, 2017.

Everything you always wanted to know about PostgreSQL stats


This blog post is all about postgres activity stats. Stats are very important since they allow us to understand what's going on with the system. At the same time, stats also have some weak points, and that’s what I would like to discuss here.

In this blog series I will try to explain how to use stats effectively and how to detect and solve problems with their help. I will start with an overview and then move on to specific usages, with recipes and built-in tools.

I hope my posts will provide you with some hands-on tools that will help you use postgres stats and will reassure you that stats aren’t as scary as they seem at first glance.

What is postgres? For people who are not familiar with it, postgres is a bunch of processes in the 'ps auxf' output. What they can see are operating system metrics, such as the CPU, memory or swap consumed by these processes, but nothing more. This, however, is not sufficient to effectively troubleshoot postgres. To do that, one needs to know how postgres stats are collected throughout the postgres lifetime and also how to use them properly.

If we look inside postgres and try to understand what it consists of, we see that there are many subsystems that work together, and problems in one part may cause problems in other parts. Generally, postgres can be split into two abstract parts: the first is servicing clients and the second is background service operations, e.g. vacuums, checkpoints, write-ahead logs. Thus a slowdown in background operations will negatively affect client-servicing tasks and vice versa. Postgres activity stats are used for observing, predicting or eliminating possible problems, and almost all postgres parts have specific stats which describe what's going on there.

Even with all these strengths, stats also have a few weak points. First is the fact that there are so many of them, and one should know which source to use in each particular case. Second, almost all stats are provided as permanently incrementing counters, so you often see billion-scale values that say nothing on their own. Third, stats don't provide history, so there is no built-in way to check what happened five, ten or thirty minutes ago. And last but not least, postgres doesn't provide a handy tool for working with stats, so the end user has to rely on SQL clients such as psql.

Despite all that, stats are the first thing that can help you troubleshoot postgres. Let's take a closer look. Stats provide different types of information:
  1. Events that happened in the postgres instance (counters), such as table or index operations, numbers of commits and rollbacks, block hits, reads, etc.
  2. Properties of database objects (current values), e.g. transaction or query start times and states, and relation sizes.
  3. Time spent on reading and writing operations (counters).

Stats tracking is enabled by default and no additional configuration steps are required (except for stats provided by contrib modules). The stats interface is based on functions and views, which are available by default in all existing and newly created databases. Thus stats can be retrieved with psql or any other client by running SELECT queries against these views.
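
For example, here are two of the most commonly queried views (a couple of illustrative queries of my own, not from the post):

-- Cumulative counters per database: commits, rollbacks, block reads and cache hits
SELECT datname, xact_commit, xact_rollback, blks_read, blks_hit
  FROM pg_stat_database;

-- Current-state information: what every non-idle backend is doing right now
SELECT pid, state, now() - query_start AS query_runtime, query
  FROM pg_stat_activity
 WHERE state <> 'idle';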

As mentioned above, there are many stats functions and views and in addition postgres has multiple subsystems. This information can be represented in the following diagram.
postgresql observability

In my next post in this series I will focus on a particular stats view and explain what kinds of problems it helps to solve and how.

Why You Should Pin Your Dependencies by My Mistakes

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Feb 20, 2017.

Have you ever been bitten by not pinning your dependencies in your django project? If not, be glad, and come learn from my problems.

Pinning your dependencies is important to solve future unknown issues, better the devil you know and all that.

In this week's video I talk about three times I had issues: not pinning my dependencies, a weird edge case with pinning and Python, and not really understanding what I was doing with pinned dependencies.
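
For anyone who hasn't done it before, pinning simply means recording exact versions in your requirements file, so a fresh install gets the same packages you tested against (the version numbers below are only an example):

Django==1.10.5
psycopg2==2.6.2
requests==2.13.0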

Why You Should Pin Your Dependencies

Bruce Momjian: Statistics on Expression Indexes

From Planet PostgreSQL. Published on Feb 20, 2017.

Most people know that Postgres allows the creation of indexes on expressions. This is helpful if you need index lookups of expressions used in WHERE clauses.

However, there is another benefit to expression indexes, and that is optimizer statistics. Not only do expression indexes allow rapid lookups of matching expressions, but they also provide optimizer statistics, which improve row estimates and hence query plans. Here is an example:

CREATE TABLE test (x INTEGER);
INSERT INTO test SELECT x FROM generate_series(1, 100) AS t(x);
 
ANALYZE test;
 
SELECT COUNT(*) FROM test WHERE x % 2 = 1;
 count
-------
    50
 
EXPLAIN SELECT * FROM test WHERE x % 2 = 1;
                     QUERY PLAN
----------------------------------------------------
 Seq Scan on test  (cost=0.00..2.50 rows=1 width=4)
   Filter: ((x % 2) = 1)

Continue Reading »

Hubert 'depesz' Lubaczewski: Getting first and last values per group

From Planet PostgreSQL. Published on Feb 20, 2017.

Every so often someone needs a solution for getting the first (or first few) values for a given column. Or the last. In some cases (when there are not many groups) you can use recursive queries. But it's not always the best choice. Let's try to implement first() and last() aggregates, so these could be easily used by […]
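
For reference, the classic recipe (the widely circulated first()/last() aggregate definitions; not necessarily the exact implementation the full post arrives at) looks like this:

-- State function: keep the first value seen and ignore the rest
CREATE OR REPLACE FUNCTION first_agg(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS 'SELECT $1';

CREATE AGGREGATE first (anyelement) (
  SFUNC = first_agg,
  STYPE = anyelement
);

-- State function: always keep the most recent value
CREATE OR REPLACE FUNCTION last_agg(anyelement, anyelement)
RETURNS anyelement LANGUAGE sql IMMUTABLE STRICT AS 'SELECT $2';

CREATE AGGREGATE last (anyelement) (
  SFUNC = last_agg,
  STYPE = anyelement
);

-- Usage, assuming a sensor_log table with location, reading and reading_date columns:
SELECT location,
       first(reading ORDER BY reading_date) AS first_reading,
       last(reading ORDER BY reading_date)  AS last_reading
  FROM sensor_log
 GROUP BY location;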

Pavel Stehule: new command line tool pgimportdoc

From Planet PostgreSQL. Published on Feb 19, 2017.

More and more often I have had to import XML documents into Postgres. How to do it simply? Moreover, some XML documents are not in UTF8 encoding, so some conversion and modification is necessary.

<?xml version="1.0" encoding="windows-1250"?>
<enprimeur>
<vino>
<id>1</id>
<nazev>Alter Ego de Palmer</nazev>
<vyrobce>63</vyrobce>
<rocnik>2012</rocnik>
<cena0375>0</cena0375>
<cena1500></cena1500>
<cena3000>0</cena3000>

It is not hard work, but it is manual and tedious work, because PostgreSQL has the necessary functionality, but that functionality is not accessible from the psql console.

I wrote a simple tool, pgimportdoc, that helps with importing any text, JSON, binary (to a bytea field) or XML document into Postgres.

Usage:
cat ~/Stažené/enprimeur.xml | ./pgimportdoc postgres -c 'insert into xmldata values($1)' -t XML
./pgimportdoc postgres -f ~/Stažené/enprimeur.xml -c 'insert into xmldata values($1)' -t XML

Supported formats are XML, BYTEA and TEXT. The TEXT format can be used for text, json, jsonb target formats.

Link: https://github.com/okbob/pgimportdoc

How to Create User Sign Up View

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Feb 17, 2017.

In this tutorial I will cover a few strategies to create Django user sign up/registration. Usually I implement it from scratch. You will see it’s very straightforward.

For the examples I will use an empty Django project named mysite. Inside the mysite folder I created an app named core. So every time you see mysite and/or core, change to the suitable project name and app name.

A brief summary of what you are going to find here:


Basic Sign Up

The simplest way to implement a user sign up is by using the UserCreationForm as it is. This strategy is suitable if you are using the default Django user, authenticating with the username, and are interested only in setting the username and password upon sign up.

urls.py

from django.conf.urls import url
from mysite.core import views as core_views

urlpatterns = [
    ...
    url(r'^signup/$', core_views.signup, name='signup'),
]

views.py

from django.contrib.auth import login, authenticate
from django.contrib.auth.forms import UserCreationForm
from django.shortcuts import render, redirect

def signup(request):
    if request.method == 'POST':
        form = UserCreationForm(request.POST)
        if form.is_valid():
            form.save()
            username = form.cleaned_data.get('username')
            raw_password = form.cleaned_data.get('password1')
            user = authenticate(username=username, password=raw_password)
            login(request, user)
            return redirect('home')
    else:
        form = UserCreationForm()
    return render(request, 'signup.html', {'form': form})

signup.html

{% extends 'base.html' %}

{% block content %}
  <h2>Sign up</h2>
  <form method="post">
    {% csrf_token %}
    {{ form.as_p }}
    <button type="submit">Sign up</button>
  </form>
{% endblock %}

Basically what we are doing here is handling the UserCreationForm processing. Business as usual. After the code reaches form.save(), the user is created. But here we need to take a step further: manually authenticate the user. You could redirect the user to the login page, but performing the authentication is good for the user experience.

In order to perform the authentication, we need to grab the raw password from the input that came from POST. The user.password stores the hash, and we can’t use it directly to authenticate.

If the authenticate() function executes successfully (in this case it will always succeed), it returns a user instance (meaning the username and password match), and we can now securely log the user in. It’s done by calling the login() function, passing the request and the user instance as parameters. After that, simply redirect the user to wherever you want.

If you want more control over the signup.html form, you can extract the fields in a for loop:

signup.html

{% extends 'base.html' %}

{% block content %}
  <h2>Sign up</h2>
  <form method="post">
    {% csrf_token %}
    {% for field in form %}
      <p>
        {{ field.label_tag }}<br>
        {{ field }}
        {% if field.help_text %}
          <small style="color: grey">{{ field.help_text }}</small>
        {% endif %}
        {% for error in field.errors %}
          <p style="color: red">{{ error }}</p>
        {% endfor %}
      </p>
    {% endfor %}
    <button type="submit">Sign up</button>
  </form>
{% endblock %}

It will look like this:

Sign Up Basic Example

The code is available on GitHub: github.com/sibtc/simple-signup/tree/master/basic-example.


Sign Up With Extra Fields

So, what if I wanted to also get the user’s email address and full name upon sign up?

This strategy will work if you are using the Django user as it is, or if you have extended it using the AbstractUser or AbstractBaseUser. If you extended the Django user using a Profile model, hold tight, we will get there too.

Now we need to go a step further: we have to extend the UserCreationForm.

forms.py

from django import forms
from django.contrib.auth.forms import UserCreationForm
from django.contrib.auth.models import User


class SignUpForm(UserCreationForm):
    first_name = forms.CharField(max_length=30, required=False, help_text='Optional.')
    last_name = forms.CharField(max_length=30, required=False, help_text='Optional.')
    email = forms.EmailField(max_length=254, help_text='Required. Inform a valid email address.')

    class Meta:
        model = User
        fields = ('username', 'first_name', 'last_name', 'email', 'password1', 'password2', )

Now in the view, just change the form class to use our new SignUpForm.

views.py

from django.contrib.auth import login, authenticate
from django.shortcuts import render, redirect

from mysite.core.forms import SignUpForm

def signup(request):
    if request.method == 'POST':
        form = SignUpForm(request.POST)
        if form.is_valid():
            form.save()
            username = form.cleaned_data.get('username')
            raw_password = form.cleaned_data.get('password1')
            user = authenticate(username=username, password=raw_password)
            login(request, user)
            return redirect('home')
    else:
        form = SignUpForm()
    return render(request, 'signup.html', {'form': form})

It will look like this:

Sign Up Basic Example With Extra Fields

By the way, if you are wondering about the template, I’m using the same signup.html template from the previous example.

The code is available on GitHub: github.com/sibtc/simple-signup/tree/master/extra-fields.


Sign Up With Profile Model

With some tweaks we can also make it work using a profile model. Consider the model definition:

models.py

from django.db import models
from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver

class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    bio = models.TextField(max_length=500, blank=True)
    location = models.CharField(max_length=30, blank=True)
    birth_date = models.DateField(null=True, blank=True)

@receiver(post_save, sender=User)
def update_user_profile(sender, instance, created, **kwargs):
    if created:
        Profile.objects.create(user=instance)
    instance.profile.save()

In this particular case, the profile is created using a Signal. It’s not mandatory, but usually it is a good way to implement it. You can learn more about extending the user model in this post.

Let’s say we want to also get the birth_date of the user upon sign up. First, let’s update the model form:

forms.py

from django import forms
from django.contrib.auth.forms import UserCreationForm
from django.contrib.auth.models import User

class SignUpForm(UserCreationForm):
    birth_date = forms.DateField(help_text='Required. Format: YYYY-MM-DD')

    class Meta:
        model = User
        fields = ('username', 'birth_date', 'password1', 'password2', )

This form won’t automatically save the birth_date on form.save(). Instead, we have to handle it manually:

views.py

from django.contrib.auth import login, authenticate
from django.shortcuts import render, redirect
from mysite.core.forms import SignUpForm

def signup(request):
    if request.method == 'POST':
        form = SignUpForm(request.POST)
        if form.is_valid():
            user = form.save()
            user.refresh_from_db()  # load the profile instance created by the signal
            user.profile.birth_date = form.cleaned_data.get('birth_date')
            user.save()
            raw_password = form.cleaned_data.get('password1')
            user = authenticate(username=user.username, password=raw_password)
            login(request, user)
            return redirect('home')
    else:
        form = SignUpForm()
    return render(request, 'signup.html', {'form': form})

Because of the Signal handling the Profile creation, we have a synchronism issue here. It is easily solved by calling the user.refresh_from_db() method. This will cause a hard refresh from the database, which will retrieve the profile instance.

If you don’t call user.refresh_from_db(), when you try to access the user.profile, it will return None.

After refreshing the user model, set the cleaned data on the fields that matter and save the user model. Saving the user triggers the profile save as well; that’s why you don’t need to call user.profile.save(), you just call user.save().

I updated the birth_date directly in the view, but in case you are dealing with a bigger amount of fields in the profile model, I would suggest creating two forms (say, UserForm and ProfileForm), and process both at once. Check this gist for an example.
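
A rough sketch of that two-form idea (the class and field names here are my own; the linked gist is the authoritative version):

# forms.py
from django import forms
from django.contrib.auth.forms import UserCreationForm
from django.contrib.auth.models import User

from .models import Profile


class UserForm(UserCreationForm):
    class Meta:
        model = User
        fields = ('username', 'email', 'password1', 'password2', )


class ProfileForm(forms.ModelForm):
    class Meta:
        model = Profile
        fields = ('bio', 'location', 'birth_date', )


# views.py (inside the POST branch of the signup view)
user_form = UserForm(request.POST)
profile_form = ProfileForm(request.POST)
if user_form.is_valid() and profile_form.is_valid():
    user = user_form.save()
    user.refresh_from_db()  # load the profile instance created by the signal
    # Re-bind the profile form to that instance and save the extra fields
    ProfileForm(request.POST, instance=user.profile).save()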

The code is available on GitHub: github.com/sibtc/simple-signup/tree/master/profile-model.


Sign Up With Confirmation Mail

This one requires a more complicated setup. Because you know, you need to configure email, need some extra fields, unique links with expiration date, and stuff like that.

I have already published in great detail each one of those parts, so in this post I will just put the pieces together.

For testing purposes, I will be using the console email backend to debug the code:

settings.py

EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'

If you want to configure a production quality email service, read more in this post: How to Send Email in a Django App.

Create a field to determine if the email is confirmed or not:

models.py

from django.db import models
from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver

class Profile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    email_confirmed = models.BooleanField(default=False)
    # other fields...

@receiver(post_save, sender=User)
def update_user_profile(sender, instance, created, **kwargs):
    if created:
        Profile.objects.create(user=instance)
    instance.profile.save()

In a previous article I explained how to make use of some Django internal APIs to create a one-time link. You can read more about it here: How to Create a One Time Link (in case you want to learn more about how it works).

For our use case, create a new module named tokens.py.

tokens.py

from django.contrib.auth.tokens import PasswordResetTokenGenerator
from django.utils import six

class AccountActivationTokenGenerator(PasswordResetTokenGenerator):
    def _make_hash_value(self, user, timestamp):
        return (
            six.text_type(user.pk) + six.text_type(timestamp) +
            six.text_type(user.profile.email_confirmed)
        )

account_activation_token = AccountActivationTokenGenerator()

We basically extended the PasswordResetTokenGenerator to create a unique token generator for confirming email addresses. This makes use of your project’s SECRET_KEY, so it is a pretty safe and reliable method.

Now we need to enforce the email registration on sign up.

forms.py

from django import forms
from django.contrib.auth.forms import UserCreationForm
from django.contrib.auth.models import User


class SignUpForm(UserCreationForm):
    email = forms.EmailField(max_length=254, help_text='Required. Inform a valid email address.')

    class Meta:
        model = User
        fields = ('username', 'email', 'password1', 'password2', )

In the view we will no longer authenticate the user; instead, we will send him/her an account activation email:

views.py

from django.contrib.sites.shortcuts import get_current_site
from django.shortcuts import render, redirect
from django.utils.encoding import force_bytes
from django.utils.http import urlsafe_base64_encode
from django.template.loader import render_to_string
from mysite.core.forms import SignUpForm
from mysite.core.tokens import account_activation_token

def signup(request):
    if request.method == 'POST':
        form = SignUpForm(request.POST)
        if form.is_valid():
            user = form.save(commit=False)
            user.is_active = False
            user.save()
            current_site = get_current_site(request)
            subject = 'Activate Your MySite Account'
            message = render_to_string('account_activation_email.html', {
                'user': user,
                'domain': current_site.domain,
                'uid': urlsafe_base64_encode(force_bytes(user.pk)),
                'token': account_activation_token.make_token(user),
            })
            user.email_user(subject, message)
            return redirect('account_activation_sent')
    else:
        form = SignUpForm()
    return render(request, 'signup.html', {'form': form})

Basically this is the same strategy used for password resets. Notice that I’m setting user.is_active to False, so the user can’t log in before confirming the email address. After that, we send the email to the user. See below for the contents of the email template.

account_activation_email.html

{% autoescape off %}
Hi {{ user.username }},

Please click on the link below to confirm your registration:

http://{{ domain }}{% url 'activate' uidb64=uid token=token %}
{% endautoescape %}

And here are the routes you will need:

urls.py

from django.conf.urls import url
from mysite.core import views as core_views

urlpatterns = [
    url(r'^account_activation_sent/$', core_views.account_activation_sent, name='account_activation_sent'),
    url(r'^activate/(?P<uidb64>[0-9A-Za-z_\-]+)/(?P<token>[0-9A-Za-z]{1,13}-[0-9A-Za-z]{1,20})/$',
        core_views.activate, name='activate'),
]

The account_activation_sent view is a simple HTML template:

Sign Up Confirmation Link

This page is shown right after the user submits the sign up form. Then, at the same time, the user will receive an email with the link:

Sign Up Confirmation Link

By clicking on the link, the user is sent to the activate view:

views.py

from django.contrib.auth import login
from django.contrib.auth.models import User
from django.shortcuts import render, redirect
from django.utils.encoding import force_text
from django.utils.http import urlsafe_base64_decode
from mysite.core.tokens import account_activation_token

def activate(request, uidb64, token):
    try:
        uid = force_text(urlsafe_base64_decode(uidb64))
        user = User.objects.get(pk=uid)
    except (TypeError, ValueError, OverflowError, User.DoesNotExist):
        user = None

    if user is not None and account_activation_token.check_token(user, token):
        user.is_active = True
        user.profile.email_confirmed = True
        user.save()
        login(request, user)
        return redirect('home')
    else:
        return render(request, 'account_activation_invalid.html')

Here we do all the magic, checking if the user exists and if the token is valid. If everything checks out, we switch the is_active and email_confirmed flags to True and log the user in.

Changing the value of the email_confirmed field also invalidates the link, since that field is part of the token hash.

The code is available on GitHub: github.com/sibtc/simple-signup/tree/master/confirmation-email.


That’s it! I hope you enjoyed this post. In a future post I can explore the third party libraries that help the registration process. But usually I prefer to implement it by myself, and avoid an extra dependency.

All the four examples are available on GitHub in the same repository:

github.com/sibtc/simple-signup

Caktus Attends Wagtail CMS Sprint in Reykjavik

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Feb 17, 2017.

Caktus CEO Tobias McNulty and Sales Engineer David Ray recently had the opportunity to attend a development sprint for the Wagtail Content Management System (CMS) in Reykjavik, Iceland. The two-day software development sprint attracted 15 attendees hailing from a total of 5 countries across North America and Europe.

Wagtail sprinters in Reykjavik

Wagtail was originally built for the Royal College of Art by UK-firm Torchbox and is now one of the fastest-growing open source CMSs available. Being longtime champions of the Django framework, we’re also thrilled that Wagtail is Django-based. This makes Wagtail a natural fit for content-heavy sites that might still benefit from the customization made possible through the CMS’ Django roots.

Tobias & Tom in Reykjavik

The team worked on a wide variety of projects, including caching optimizations, an improved content model, a new React-based page explorer, the integration of a new rich-text editor (Draft.js), performance enhancements, other new features, and bug fixes.

David & Scot in Reykjavik

Team Wagtail Bakery stole the show with a brand-new demo site that’s visually appealing and better demonstrates the level of customization afforded by the Wagtail CMS. The new demo site, which is still in development as of the time of this post, can be found at wagtail/bakerydemo on GitHub.

Wagtail Bakery on laptop screen

After the sprint was over, our hosts at Overcast Software were kind enough to take us on a personalized tour of the countryside around Reykjavik. We left Iceland with significant progress on a number of pull requests on Wagtail, new friends, and a new appreciation for the country's magical landscapes.

Wagtail sprinters on road trip, in front of waterfall

We were thrilled to attend and are delighted to be a part of the growing Wagtail community. If you're interested in participating in the next Wagtail sprint, it is not far away. Wagtail Space is taking place in Arnhem, The Netherlands March 21st-25th and is being organized to accommodate both local and remote sprinters. We hope to connect with you then!

Shaun M. Thomas: PG Phriday: Getting Assertive

From Planet PostgreSQL. Published on Feb 17, 2017.

There are a lot of languages available for authoring Postgres functions, but there’s nothing quite like the classic PL/pgSQL. It’s SQL! It’s not SQL! It’s a kind of horrifying mutant crossbreed suitable only for terrifying small children and generating complex reports from a cavalcade of dubious sources! And deep within its twisted entrails is an often overlooked feature usually only available in far more mature entities.

Just wait until they add sharding

That’s right, it’s obvious we’re referring to the ASSERT statement.

When we say “often overlooked”, we’re not kidding. The pertinent documentation offers a meager overview following a far more comprehensive summary of the RAISE statement. We thought it deserved better.

Let’s begin with a simple table example that can form the basis of a more complex function:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);

INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 1000, s.id % 100, CURRENT_DATE - ((s.id * 10) || 's')::INTERVAL FROM generate_series(1, 5000000) s(id);

CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);

ANALYZE sensor_log;

What can we do with five million rows of simulated data from 1000 sensors that collect data every ten seconds? A good place to start is a summary or report table of some kind. We’re only doing cursory analysis, so we should begin with aggregating the daily average, minimum, and maximum. If we get adventurous later, we can add standard deviations, jitter, deltas, and other fancy logic report owners might find useful.

Since this type of data is easy to tabulate, we require one single table and a function to manage its contents.

CREATE TABLE sensor_report (
  id            SERIAL PRIMARY KEY,
  location      VARCHAR NOT NULL,
  reading_date  DATE NOT NULL,
  avg_reading   BIGINT NOT NULL,
  min_reading   BIGINT NOT NULL,
  max_reading   BIGINT NOT NULL
);

CREATE INDEX idx_sensor_report_date ON sensor_report (reading_date);

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.

  SELECT INTO last_update max(reading_date) FROM sensor_report;

  -- Bootstrap the report table if it's currently empty.

  IF last_update IS NULL THEN
    SELECT INTO last_update min(reading_date) FROM sensor_log;
  END IF;

  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.

  DELETE FROM sensor_report
   WHERE reading_date >= last_update;

  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.

  WITH capture AS (
    INSERT INTO sensor_report (
      location, reading_date, avg_reading, min_reading, max_reading
    )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
    RETURNING *
  )
  SELECT count(*) INTO insert_count FROM capture;

  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;

SELECT sp_trickle_report();

sp_trickle_report

        579000

ANALYZE sensor_report;

Thankfully this is a very simple (and somewhat naive) reporting function. Imagine this is our first iteration and we want to debug some of its activity. If we were using Python, for example, we might pepper the code with assert statements or craft specific tests to accomplish that same goal. Well, PL/pgSQL allows us to do just that.

For the second iteration, let’s assume we want to avoid bootstrapping the report table if it’s empty. After all, bootstrapping only needs to be done once, so why pollute the function with that edge case? Besides, it’s entirely possible we might require a separate procedure to initialize the reporting table, and isolated functionality is generally less buggy.

If the table isn’t bootstrapped, it’s probably because we’re in a testing environment that is missing data. In this case, it makes sense to complain so the missing data is loaded, even though it’s not strictly an erroneous scenario. So let’s replace the bootstrap with an assertion. While we’re at it, let’s say a new requirement from the upstream code suggests a minimum of 2000 sensors are required for meaningful output. In such a case, we might need to revise our fixture data in the development and QA environments.

Here’s how that might look:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.

  SELECT INTO last_update max(reading_date) FROM sensor_report;

  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';

  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.

  DELETE FROM sensor_report
   WHERE reading_date >= last_update;

  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.

  WITH capture AS (
    INSERT INTO sensor_report (
      location, reading_date, avg_reading, min_reading, max_reading
    )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
    RETURNING *
  )
  SELECT count(*) INTO insert_count FROM capture;

  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';

  RETURN insert_count;
END;
$$ LANGUAGE plpgsql;

The function itself isn’t that much different; the important part is the result of our changes. The only truly meaningful modification is that the function no longer initializes the report table if it was empty. Ignoring that, we can treat the function exactly as before, as no output will change, and no new failure conditions are triggered.

This is because assertion checking can be switched off entirely; assertions are primarily intended for debugging purposes, so they only fire when the plpgsql.check_asserts parameter is enabled. We can set that parameter in postgresql.conf, but we can also toggle it within our current session.

Let’s use our new assertions:

SET plpgsql.check_asserts = TRUE;

TRUNCATE TABLE sensor_report;

SELECT sp_trickle_report();

ERROR: Need to bootstrap report table.
CONTEXT: PL/pgSQL function sp_trickle_report() line 12 at ASSERT

INSERT INTO sensor_report ( location, reading_date, avg_reading, min_reading, max_reading ) SELECT location, reading_date::DATE, avg(reading), min(reading), max(reading) FROM sensor_log GROUP BY location, reading_date::DATE;

ANALYZE sensor_report;

SELECT sp_trickle_report();

ERROR: Not enough current sensor activity!
CONTEXT: PL/pgSQL function sp_trickle_report() line 38 at ASSERT

INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 5000, s.id % 100, CURRENT_DATE - ((s.id * 1) || 's')::INTERVAL FROM generate_series(1, 50000) s(id);

SELECT sp_trickle_report();

sp_trickle_report

          5000

We begin by truncating the report table so we can witness the first assertion. As we can see, Postgres correctly complains that the table is empty when it shouldn't be. Since this is a true ERROR, execution immediately halts where the assertion failed. We see that error, use some alternative method for filling the report table, and try the function again.

Stop! Assert time!

The second execution fails because the sensor_log table only contains data based on 1000 sensors, yet we expect 2000. Besides telling us to update our fixture data to account for more sensors, we also learn what the function expects. So we decide to insert some fake data from 5000 sensors and try again. The last attempt works as expected, and we’re free to move on with our lives.

It seems odd initially that a formerly ignored statement gets escalated all the way to a fatal error, doesn’t it? But how else should assertions work? If we just needed information, we could have simply used RAISE NOTICE with the relevant details. The power of ASSERT is that we don’t need to remove the statements to deactivate them.

The reason assertions are fatal is that they raise an ASSERT_FAILURE exception. We can actually take advantage of that in debugging: we can catch that exception just like any other, meaning we can ignore it or escalate it as necessary. Consider this example:

CREATE OR REPLACE FUNCTION sp_trickle_report()
RETURNS BIGINT AS
$$
DECLARE
  last_update DATE;
  insert_count BIGINT;
BEGIN
  -- Find the most recent data in the report so we only
  -- refresh those contents.

  SELECT INTO last_update max(reading_date) FROM sensor_report;

  ASSERT last_update IS NOT NULL, 'Need to bootstrap report table.';

  -- Since data comes in regularly, we only need to rebuild
  -- current data. Delete old aggregates so we can replace them.

  DELETE FROM sensor_report
   WHERE reading_date >= last_update;

  -- Perform the insert in a CTE so we can capture the count of
  -- inserted rows for further analysis or to return to our caller.

  WITH capture AS (
    INSERT INTO sensor_report (
      location, reading_date, avg_reading, min_reading, max_reading
    )
    SELECT location, reading_date::DATE,
           avg(reading), min(reading), max(reading)
      FROM sensor_log
     WHERE reading_date >= last_update
     GROUP BY location, reading_date::DATE
    RETURNING *
  )
  SELECT count(*) INTO insert_count FROM capture;

  ASSERT insert_count >= 2000, 'Not enough current sensor activity!';

  RETURN insert_count;

EXCEPTION
  WHEN ASSERT_FAILURE THEN
    RAISE NOTICE 'Tripped over debugging code: %', SQLERRM;
    RETURN 0;
END;
$$ LANGUAGE plpgsql;

The only thing we added was a single exception block to catch any exceptions thrown within the function. Then we toned down the error slightly while maintaining the original assertion message, and return 0 from the function since it didn’t process any rows due to the exception.

We can see this for ourselves by truncating the report table again:

SET plpgsql.check_asserts = TRUE;

TRUNCATE TABLE sensor_report;

SELECT sp_trickle_report();

NOTICE: Tripped over debugging code: Need to bootstrap report table.

sp_trickle_report

             0

Now the function itself doesn’t fail or introduce an error into any transaction contexts we might have open, but we can still see something is wrong. The function claims to have inserted no rows, and we see a notice specifically from our own debugging system. In the production environment, this code would be completely silent and the function would operate without the overly oppressive assertions.

Is ASSERT overlooked because there are more suitable approaches to solve these problems? Or is it because the feature was added in 9.5, and it’s extremely easy to miss in the deluge of other headline-grabbing advancements? Honestly it’s probably a combination of those two and a myriad of other causes. The important thing is that we know about it now.

And as we are all well aware…

G.I. Joe!

Bruce Momjian: Still No Query Hints?

From Planet PostgreSQL. Published on Feb 17, 2017.

So, it is 2017 and Postgres still doesn't support query hints like other relational databases? Yep, it's true, and probably will be forever since "'Oracle-style' optimizer hints" is listed in the "Features We Do Not Want" section of the Postgres TODO list. A wiki page outlines the reasons for this.

While this seems perfectly logical to people who have used Postgres for years, it strikes new users as rigid and extreme. There are several reasons for this divergence.

First, what new users don't realize is that there are ways to control the optimizer, just not with inline query hints. The most useful are the planner cost constants and other planner options, which allow you to tailor the optimizer to your specific hardware and query types. This is a more holistic approach to optimizer tuning because, if set properly, these settings improve all queries, compared to "use this index"-style hints, which must be added to every query.
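
For example, here is a minimal sketch of that kind of session-level tuning (the values are purely illustrative, not recommendations, and the sensor_log table from the earlier article is borrowed only as an example): cost constants describe your hardware to the planner, and the enable_* switches are a blunt, temporary instrument while investigating one query.

-- Cost constants describing the hardware (illustrative values only)
SET random_page_cost = 1.1;        -- SSDs make random I/O relatively cheap
SET effective_cache_size = '8GB';  -- how much of the data is likely cached

-- A per-session planner toggle while investigating a single query
SET enable_nestloop = off;
EXPLAIN SELECT count(*) FROM sensor_log WHERE reading_date > CURRENT_DATE;
RESET enable_nestloop;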

Continue Reading »

Craig Kerstiens: Citus 6.1 Released–Horizontally scale your Postgres database

From Planet PostgreSQL. Published on Feb 16, 2017.

Microservices and NoSQL get a lot of hype, but in many cases what you really want is a relational database that simply works, and can easily scale as your application data grows. Microservices can help you split up areas of concern, but they also introduce complexity and often heavy engineering work to migrate to them. Yet, there are a lot of monolithic apps out there that do need to scale. If you don't want the added complexity of microservices but do need to continue scaling your relational database, then you can with Citus. With Citus 6.1 we're continuing to make scaling out your database even easier, with all the benefits of Postgres (SQL, JSONB, PostGIS, indexes, etc.) still packed in there.

With this new release, customers like Heap and Convertflow are able to scale from single-node Postgres to horizontal, linear scale. Citus 6.1 brings several improvements, making scaling your multi-tenant app even easier. These include:

  • Integrated reference table support
  • Tenant Isolation
  • View support on distributed tables
  • Distributed Vacuum / Analyze

All of this with the same language bindings, clients, drivers, libraries (like ActiveRecord) that Postgres already works with.

Give Citus 6.1 a try today on Citus Cloud, our fully managed database-as-a-service on top of AWS, or read on to learn more about all that’s included in this release.

Reference table support: Sharing data across tenants

Applications that are B2B fit smoothly into a model of sharding by customer. This means your customer only interacts with their own data, and all that data can be automatically co-located together, giving you all the power of SQL while still maintaining flexibility. Still, in some cases you may have smaller lookup or reference tables that don't make sense to distribute. This could be something like a list of countries or an order status table. These types of tables don't have the same large write volume as the tables you'll want to shard, but they may still have a relationship to them. As of today, you have cleanly defined APIs to create these tables. To create a reference table you first create your table, then run the function to distribute it:

-- a reference table
CREATE TABLE states (
  code char(2) PRIMARY KEY,
  full_name text NOT NULL,
  general_sales_tax numeric(4,3)
);

-- distribute it to all workers
SELECT create_reference_table('states');

Under the covers, when interacting with this table we perform a two-phase commit (2PC) to guarantee all nodes are in a consistent state when you write/update/delete from it. This allows you to have the same data consistency guarantees as you would on single-node Postgres, without having to build extra logic into your application.

Tenant isolation: More resources for bigger tenants

Under the covers, when you choose to distribute your tables, multiple customers or tenants will live in the same table. Each shard is essentially just a Postgres table. This means that if you were to shard something like a pageviews table by customer, you would have 32 tables under the covers: pageviews_001, pageviews_002, etc. When you insert data, your application inserts into a normal pageviews table, and based on a hash of the customer_id the row is placed in the appropriate shard.
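
As a minimal sketch of that setup (the pageviews table and its customer_id column are just the example from the paragraph above, not something prescribed by the release), distributing the table is a single function call:

-- an example multi-tenant table
CREATE TABLE pageviews (
  customer_id bigint NOT NULL,
  page_url    text NOT NULL,
  viewed_at   timestamptz NOT NULL DEFAULT now()
);

-- shard it by customer; rows for the same customer_id hash to the same shard
SELECT create_distributed_table('pageviews', 'customer_id');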

For most data distributions this works well. Data at a scale large enough to need sharding tends to smooth out across your distributed tables: you'll have some customers with larger datasets and some with smaller ones within the same table. But in some cases you may want even further isolation to improve performance. Citus makes this trivial by allowing you to isolate a specific tenant. To isolate your very large tenant you'd run the following, which includes the table you want to isolate as well as the id of the tenant:

SELECT isolate_tenant_to_new_shard('table_name', tenant_id);

Once run, the above will split the shard the tenant was in into three shards in total: one table containing all the lower-range hash values, one containing just this tenant, and another for the higher-range hash values. The new shard is created on the same node as the shard from which the tenant was removed. For true hardware isolation you can then move that shard to a separate node if you see fit, and you can do this while keeping your database up and running.

View support on distributed tables

With Citus 6.0 we saw a number of customers start to leverage Citus for distributed roll-ups of data to help power real-time analytics. Once they were using this, we heard requests to make it easy to create views across the entire dataset so it is easier to query within your application. Now when you create a view on a distributed table, that view is automatically propagated to all of your distributed nodes.
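
As a rough sketch (reusing the hypothetical pageviews table from above, not an example taken from the release notes), such a roll-up view is just ordinary SQL created on the coordinator:

CREATE VIEW daily_pageviews AS
SELECT customer_id,
       viewed_at::date AS view_date,
       count(*) AS views
  FROM pageviews
 GROUP BY customer_id, viewed_at::date;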

Distributed Vacuum / Analyze

Being able to scale out all the resources your database needs, not only storage but also memory and compute, is at the core of how Citus is able to give significant performance gains to applications over single-node Postgres. On the compute side there are a lot of tasks that may not happen every transaction but do happen routinely, such as creating indexes or vacuuming. Vacuum is one operation that can be quite intensive within Postgres. In Citus 6.1, vacuum is distributed and run in parallel automatically across your cluster.

Get started

We’re excited to continue expanding the capabilities of Citus so it applies to an even broader set of use cases. You can give it a try today by creating an account on Citus Cloud or set up a local installation with Docker.

If you’re curious to learn more about how Citus might be able to help you feel free to drop us a line or join our upcoming webcast: a tour of Citus Cloud.

Joshua Drake: Training: Detecting performance problems and fixing them

From Planet PostgreSQL. Published on Feb 16, 2017.

Congratulations to Hans-Jürgen Schönig, who sold out his training: Detecting performance problems and fixing them. This marks one of the quickest training sales in PgConf US history.

Hans-Jürgen Schönig has been working with PostgreSQL professionally for over 16 years and has dealt with customers around the globe. His company Cybertec Schönig & Schönig GmbH offers professional services for PostgreSQL, which include training, 24x7 support, as well as consulting.

We at PgConf US are appreciative of all contributors willing to help us grow the largest PostgreSQL Conference in North America. It is quality content like this training that allows PostgreSQL to continue its upward growth. What a great community we have! Now might be the time to get your tickets!


HTML Template to PDF in Django

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 15, 2017.

** Still in Development ** ...

Create a Blank Django Project

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 15, 2017.

An easy to use guide for creati...

History of the URL, or why you are confused by serving many domains with a single server

By Django deployment from Django community aggregator: Community blog posts. Published on Feb 15, 2017.

I’ve noticed that many people attempting to deploy Django have trouble understanding how a single web server installation can serve many domains. One reason is that words matter. What Apache calls a “virtual host” is exactly what nginx calls a “server” and what HTTP calls a “host”. It is neither a host (let alone a virtual one) nor a server; it’s a domain. Let’s clear this up.

The first version of the HyperText Transfer Protocol was very simple. If your browser wanted to visit page http://djangodeployment.com/apage, it would connect to the server djangodeployment.com, port 80, and after the TCP connection was established it would send this line to the server, terminated with a newline:

GET /apage

The server would then respond with the content of that page and close the connection. The content of the page was usually something like this:

<html>
  <head>
    <title>Hello</title>
  </head>
  <body>
    <p>hello, world</p>
  </body>
</html>

In that era, a single computer could not distinguish different domains in the request. Whether you visited http://djangodeployment.com/apage or http://71.19.145.109/apage you’d get the same response. This is because the whole syntax of the URL, protocol://host[:port]/path, assumed you would get a path from a server. In that context, “host” is synonymous to “server”. At that time, no-one had thought of serving many domains from a single computer (having even a single web site was such a novelty that no-one was dreaming of having a second one).

Soon HTTP was enhanced to what we still use today. The GET directive is followed by a list of “HTTP headers”, terminated with an empty line.

GET /apage HTTP/1.1
Host: djangodeployment.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

And the response is like this:

HTTP/1.1 200 OK
Date: Wed, 15 Feb 2017 15:04:50 GMT
Server: Apache/2.4.10 (Debian)
Last-Modified: Wed, 15 Feb 2017 15:02:36 GMT
Content-Length: 102
Content-Type: text/html

<html>
  <head>
    <title>Hello</title>
  </head>
  <body>
    <p>hello, world</p>
  </body>
</html>

It’s the “Host” header in the request that makes it possible for a single server installation to respond to each domain differently. However, the word chosen for that header, “Host”, was wrong from its conception; the correct one would have been “Domain”. Since the format of the URL was protocol://host[:port]/path, the HTTP designers apparently thought you were asking the server for a specific “host”. And because, in that context, “host” is a synonym for “server”, they thought that a single physical host could serve many “virtual hosts”. Hence the confusion that persists to this day.

The moral is that you should mentally translate the currently used keywords into others. Specifically, you should understand the HTTP “Host” header as “domain”; the Apache “VirtualHost”, “ServerName” and “ServerAlias” directives as “domain”, “domain name” and “domain alias”; and the nginx “server” and “server_name” directives as “domain” and “domain name”.

Incidentally, it’s not only the words that are wrong. In fact, the whole design of the URL is suboptimal. I don’t, of course, hold anything against its designers; I wouldn’t have done better in their place, and it’s Tim Berners Lee himself who said the syntax is clumsy. In a URL like https://www.djangoproject.com/start/overview/, the “host” part, that is, the domain, goes from the most specific (djangoproject) to the most general (com), whereas the path goes from the most general to the most specific. What’s more, why should the user care about the protocol, the host, and the port? The URL should be something like “/com/djangoproject/start/overview”; the web browser would ask the DNS where to find it, and the DNS would reply “to get this resource, connect to 146.20.110.22, port 80, with HTTP”. While in 1990 the URL design looked cool, it’s the root of today’s confusion.

Related: How to use ngrep to debug HTTP headers

External link: Tim Berners Lee on the design of the URL

 

The post History of the URL, or why you are confused by serving many domains with a single server appeared first on Django deployment.

Joshua Drake: How PostgreSQL is tested

From Planet PostgreSQL. Published on Feb 15, 2017.

At PgConf US 2016, PostgreSQL core team member Peter Eisentraut  presented: How PostgreSQL is tested. Below is the video for that presentation. Join us at the end of March to participate in knowledge craft that is rarely seen outside of the PostgreSQL ecosystem.



Bruce Momjian: Going Deep on Stats

From Planet PostgreSQL. Published on Feb 15, 2017.

Postgres includes lots of monitoring tools that allow you to see what is happening, but there are a few settings that really go into detail, e.g. log_statement_stats:

SET client_min_messages = log;
SET log_statement_stats = true;
 
SELECT 100;
LOG:  QUERY STATISTICS
DETAIL:  ! system usage stats:
!       0.000000 s user, 0.000000 s system, 0.000201 s elapsed
!       [0.000000 s user, 0.000000 s system total]
!       0/0 [0/0] filesystem blocks in/out
!       0/14 [0/363] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [3/0] voluntary/involuntary context switches
 ?column?
----------
      100

Continue Reading »

Pavan Deolasee: Corruption in CREATE INDEX CONCURRENTLY

From Planet PostgreSQL. Published on Feb 14, 2017.

Two weeks back we discovered an ancient bug in PostgreSQL which may cause index corruption when an index is built via CREATE INDEX CONCURRENTLY (aka CIC). All supported releases of PostgreSQL are affected by this bug. The bug could be a source of the index corruption reports that we receive from the field once in a while.

Now, before we discuss the problem, let me make it absolutely clear that PostgreSQL is known for its gold standard as far as code quality and product stability are concerned. But just like any other software, the database engine can have unnoticed bugs. The PostgreSQL community takes every known bug very seriously and tries to fix it as a priority. No wonder the bug fix went into the scheduled 9.6.2 release (and all other stable releases) at very short notice, but not before the community discussed and deliberated the pros and cons of adding a crucial bug fix so close to a planned release.

To understand origin of the bug, you must first understand how CREATE INDEX CONCURRENTLY works. So if you haven’t, read my previous blog on the topic.

Relation Cache

You would also need to know about another technical detail. The PostgreSQL engine internally uses something known as the "Relation Cache", or relcache, to store information about database objects such as tables and indexes. These objects belong to the system catalogs, which are nothing more than a bunch of tables, but created in a system schema and whose presence is well known to the database engine. These catalogs are accessed very frequently, since almost everything that gets executed in the engine needs access to them. It would be grossly damaging if these catalogs were scanned and queried like normal tables, given how frequently they are accessed. Also, catalogs don't change too often. So information about various database objects is cached, so that it can be quickly accessed without going to the database. There are a variety of caches, but the cache that stores information about tables, their attributes, index columns, etc. is called the relcache. Caches are maintained in backend-local private memory and are invalidated whenever catalog information changes.

CIC Bug: Origin, Symptoms and Effects

This particular bug affects CIC in the presence of concurrent updates, where updates change from HOT to non-HOT precisely because of the new index being added. This of course does not happen very often, since users typically try to avoid creating an index which will make HOT updates impossible. But if it does happen, the bug may prevent CIC from inserting the necessary index entries into the new index, or cause it to incorrectly assume that the index entry added during the first phase of CIC can be used to identify the tuple (at this point, you may want to read my previous blog on the workings of CIC and HOT). You would notice that HOT updates skip adding new index entries if none of the index columns are being modified; the updated versions are still reachable from the old index entries. Now obviously when a new index is added, the view of indexed columns changes if the new index indexes new columns. CIC has a special mechanism to deal with that, as explained in the previous blog. But what if that's broken? What if we can't reliably detect the set of indexed columns as a new index is added and concurrent updates are executed? PostgreSQL might think that an UPDATE is a HOT update and avoid indexing that tuple. It might assume that the existing index entry is enough to reach the updated tuple, but that's not correct because the indexed columns have changed. That's exactly what happened, resulting in the corruption.

If the race condition materialises, you might find that you can't access certain rows via the index, or index access may return the wrong rows. For example:

CREATE TABLE testtab (a int UNIQUE, b int, c int);

-- Insert data and then run UPDATE statements
INSERT INTO testtab SELECT generate_series(1, 100), generate_series(1001, 1100), 10;
UPDATE testtab SET b = b + 1000 WHERE a = 1;

-- At the same time, build a new index concurrently
CREATE INDEX CONCURRENTLY testindx ON testtab(b);

Without the new index, each UPDATE can be a HOT update, and hence the database does not insert new index entries. But when the new index is built, the update becomes a non-HOT update. Now say that when the first phase of CIC is executed, it indexes the old version (1, 1001, 10) of the row, adding an index entry for (b = 1001). If the race condition occurs when b is updated from 1001 to 2001, the buggy code skips the index entry for (b = 2001) because it can't yet see the new index and incorrectly assumes the update can be a HOT update. As a result, if the user later queries the table for (b = 2001), it may not find the row (unless of course the database decides not to use the new index because of cost estimates). On the other hand, since the new row is still reachable from the old index entry, a query for (b = 1001) may return the new row instead.

You might wonder what this has to do with the relcache. Well, the information about indexed columns is stored in the relcache. When a new index is added, the relcache for the underlying table is invalidated and then rebuilt. But there was a race condition which could cause the relcache information to remain stale and miss the columns indexed by the new index. Subsequent UPDATEs used this stale information and incorrectly assumed certain updates to be HOT updates.

Data corruption bugs are often very subtle and may not show up immediately and that’s one reason why they are very hard to diagnose and fix.

How We Discovered the Bug

To be honest, the bug was discovered quite accidentally. We're currently working on a new feature called WARM, which has tremendous potential to reduce the write amplification caused by UPDATEs to a table with a large number of indexes when only very few of those indexed columns are actually updated. As part of the development process, we were running stress tests on patched PostgreSQL, dropping and creating indexes concurrently while the test updated rows in many concurrent sessions. We noticed certain data consistency issues, which were at first naturally attributed to the new feature, since WARM touches these very areas of the database engine. But after hours of careful review of the new code, we concluded that this was a pre-existing bug. After some more code reading, it was clear how the relcache can get into a race condition, causing stale data and problems thereafter.

Once we had the theory in place, it wasn't too hard to create a fully reproducible case, which was duly reported to the PostgreSQL community.

Remedy

The PostgreSQL community deliberated at great length on what the correct fix should be. Given the project's focus on data consistency and integrity, we ensured that a bug fix was committed in time for the scheduled point releases. So all of the latest minor releases of PostgreSQL contain the bug fix. If you think one or more of your indexes that were built CONCURRENTLY could possibly be corrupt, make sure you rebuild them after updating to the corresponding latest release.
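
As a minimal sketch using the testindx example from above (not taken from the original post): REINDEX blocks writes to the table while it runs, so on a busy table the concurrent variant may be preferable.

-- straightforward rebuild; blocks writes to the table while it runs
REINDEX INDEX testindx;

-- or rebuild without blocking writes: build a replacement concurrently,
-- then swap it in
CREATE INDEX CONCURRENTLY testindx_new ON testtab(b);
DROP INDEX CONCURRENTLY testindx;
ALTER INDEX testindx_new RENAME TO testindx;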

Different types of testing in Django

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Feb 14, 2017.

Testing is always one of those topics that can be interesting to talk about. There are a lot of different opinions on testing so it can be fun.

Django comes with some great tools for testing, so this week we will talk a little bit about the different types of tests. Then expand on that with how that relates to Django.

I also present to you a new type of test at the end of the video that I have been using; special thanks to a co-worker for coming up with the idea. It is really specific to Django, and I haven't heard of others doing it.

Different Types of Tests with Django

Joshua Drake: Third Annual Regulated Industry Summit

From Planet PostgreSQL. Published on Feb 14, 2017.

Regulated Industry Summit

We just released the schedule for the Regulated Industry Summit!

The "Third Annual Regulated Industry Summit: Finance, Government, Healthcare, and PostgreSQL" is a community summit that brings together PostgreSQL contributors and enterprise developers to discuss challenges and solutions in utilizing PostgreSQL in regulated environments.

Some of the topics of the Regulated Industry Summit are:
  • Engineering cloud infrastructure for HIPAA-compliance
  • Blockchain, Distributed Security and Regulated Industries
  • Secure Technical Implementation Guide (STIG) For PostgreSQL
  • Security Enhancements in Postgres 10 

Entrance to the Regulated Industry Summit is part of conference admission, but if you are just interested in attending the Summit, you can purchase a ticket for the Regulated Industry Summit for $99.

The Largest PostgreSQL Conference in North America 

We are inching closer to March 28 - 31. This year PGConf US 2017 will feature the familiar conference format on March 29 - 31 with a mix of specialized sessions, keynotes, and the ability to connect with PostgreSQL fans at various receptions throughout the day. As with past PostgreSQL conferences, we will feature presentations from the perspective of developers, DBAs, systems administrators, business-intelligence analysts, and decision makers.

If you haven't purchased your tickets yet, buy them before they sell out!

Get your tickets

http://www.pgconf.us/conferences/2017

ProTip

Reach out to our sponsors for a special discount code.

Hotel
For those planning to attend, don’t forget to book your room soon for the best rates as our group block is only available until March 6. We have reserved a block of rooms at the special conference rate of $236/night plus applicable taxes for attendees who make their reservations via the Westin website. You can book your room at:
https://www.starwoodmeeting.com/Book/PgConfUS2017

PGConf US 2017 is hosted by the United States PostgreSQL Association, a nonprofit 501(c)(3) created to support PostgreSQL in the United States through user group development, conferences, educational initiatives, and fun.  

The generous support from sponsors is what makes PGConf US 2017 possible:

Sponsors
Diamond: Amazon Web Services
Platinum: OpenSCG, EnterpriseDB, 2ndQuadrant
Gold: Heroku, Crunchy Data, GridGain, Braintree, JetBrains, Google Cloud Platform, Select Star, Airfacts

For more information, please visit http://www.pgconf.us

We look forward to seeing you in March!                         

Hans-Juergen Schoenig: Experimenting with scaling and full parallelism in PostgreSQL

From Planet PostgreSQL. Published on Feb 14, 2017.

A lot has been said and written about scaling PostgreSQL to many cores and subsequently to many machines. Running PostgreSQL in a server farm is something that has not been possible for many years. However, this has definitely changed. A lot of development has gone into scaling PostgreSQL and to running a single query on […]

The post Experimenting with scaling and full parallelism in PostgreSQL appeared first on Cybertec - The PostgreSQL Database Company.

Podcasttime.io - How Much Time Do Your Podcasts Take To Listen To?

By Peter Bengtsson from Django community aggregator: Community blog posts. Published on Feb 13, 2017.

It's a web app where you search for and find the podcasts *you* listen to. It then gives you a breakdown of how much time it takes to keep up with them per day, per week, and per month.

Joshua Drake: Advanced use of pg_stat_statements

From Planet PostgreSQL. Published on Feb 13, 2017.

The following presentation took place at PgConf US 2016. Join us in 2017 to see other excellent content like this! We have a full line up and an extra day.





Bruce Momjian: Performance Goalposts

From Planet PostgreSQL. Published on Feb 13, 2017.

I often get the question, "How do I get good performance out of Postgres?", and sometimes I am kind of miffed because I get the feeling that they expect a short answer. Those who are familiar with Postgres know there is no short answer, and it really isn't Postgres's fault, because there isn't a simple answer to that question for any relational database.

I guess it comes down to the complexity of relational systems. There is the SQL parser, the optimizer, and background workers. No matter how much auto-tuning we do, there are still going to be things that administrators must adjust for optimal performance. And then there is the hardware, which is stressed by all relational systems. Proper tuning of Postgres must happen at all three levels:

  • Hardware
  • Database parameter configuration
  • SQL (both data modeling and queries)

Unless all three are tuned, you can't say the system is optimized. I have already covered hardware tuning in my Database Hardware Selection Guidelines presentation. Database parameter configuration is a tough one because every site has different requirements, though there are a few common tuning cases. Tuning at the SQL level involves proper schema and index setup, plus monitoring with something like log_min_duration_statement, pg_stat_statements, or pgBadger. Once you identify a problem SQL query, you often have to do detective work with EXPLAIN.
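
As a minimal sketch of that last step (not from the original post; the sensor_log table from the earlier article stands in as an example), pg_stat_statements can point at the worst offenders and EXPLAIN can show why they are slow:

-- requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- the statements consuming the most total time
SELECT calls, total_time, query
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 5;

-- then dig into one of them
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM sensor_log WHERE reading_date > CURRENT_DATE;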

Continue Reading »

Chris Travers: PostgreSQL at 10TB and Beyond Recorded Talk

From Planet PostgreSQL. Published on Feb 13, 2017.

The PostgreSQL at 10 TB And Beyond talk has now been released on YouTube. Feel free to watch. For the folks seeing this on Planet Perl Iron Man, there is a short function, written in Perl and running in PostgreSQL, that extends SQL in the final 10 minutes or so of the lecture.

This lecture discusses human and technical approaches to solving volume, velocity, and variety problems on PostgreSQL in the 10TB range on a single, non-sharded large server.

As a side but related note, I am teaching a course through Edument on the topics discussed in Sweden discussing many of the technical aspects discussed here, called Advanced PostgreSQL for Programmers.  You can book the course for the end of this month.  It will be held in Malmo, Sweden.

Michael Paquier: Postgres 10 highlight - Password file paths as libpq connection parameter

From Planet PostgreSQL. Published on Feb 12, 2017.

Here is a feature for Postgres 10 that a couple of people will find useful regarding the handling of password files:

commit: ba005f193d88a8404e81db3df223cf689d64d75e
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Tue, 24 Jan 2017 17:06:34 -0500
Allow password file name to be specified as a libpq connection parameter.

Formerly an alternate password file could only be selected via the
environment variable PGPASSFILE; now it can also be selected via a
new connection parameter "passfile", corresponding to the conventions
for most other connection parameters.  There was some concern about
this creating a security weakness, but it was agreed that that argument
was pretty thin, and there are clear use-cases for handling password
files this way.

Julian Markwort, reviewed by Fabien Coelho, some adjustments by me

Discussion: https://postgr.es/m/a4b4f4f1-7b58-a0e8-5268-5f7db8e8ccaa@uni-muenster.de

Connection strings can be used to connect to a PostgreSQL instance and can be customized in many ways to decide how the client should try to connect to the backend server. The documentation offers a large, nicely documented list, and most of these parameters can also be overridden using the corresponding environment variables listed here.

The commit above makes it possible to override the location of the password file directly with a path, without the need for an environment variable. This is a major advantage for some classes of users. For example, imagine a case where Postgres is used on a host shared by many users, where trusted connections cannot be used even with local Unix domain socket paths under the control of a specific group or user, because those users rely on default paths like /tmp or the default localhost (the limitation here being that pg_hba.conf assumes that “local” entries map to all local Unix domain sockets). When creating a service that links to PostgreSQL and is monitored by some higher-level application, the service may not be able to use environment variables to find the path to a password file. While the path to the password file still has to be hardcoded somewhere, what is more of a pain is the extra logic needed to parse the password file in place and then use its data directly in the connection string. The above commit makes this whole class of parsing problems disappear, and that's very welcome.

The environment variable PGPASSFILE is already available to set the path of the password file at session level, and now the parameter called “passfile” can be used directly in a connection string to specify where the user information can be found, as in the following example:

$ psql -d "passfile=/path/to/pgpass dbname=dbfoo" -U userfoo

This simply attempts a connection to the instance at address localhost, using database user “userfoo”, on database “dbfoo”. If the password file specified in the connection string has a matching entry, a lookup is done on it so that no password needs to be entered. Note that no errors are reported if the password file is missing, that the password file path cannot be a symlink to something else, and that it cannot have world or group permissions. There is nothing new here compared to past versions of PostgreSQL; the same checks apply to this new connection parameter.

Magnus Hagander: Logging transactions that dropped tables

From Planet PostgreSQL. Published on Feb 12, 2017.

In a previous post I discussed a way to find out which transaction dropped a table by examining the transaction log, in order to set a restore point to right before the table was dropped.

But what if we have the luxury of planning ahead (right? Well, let's call it the second time it happens?). Shouldn't we be able to log which transaction dropped a table, and use that? Of course we should.

The first thing one tries is then of course something like this in postgresql.conf:

log_statement='ddl'
log_line_prefix = '%t [%u@%d] <%x> '

to include the transaction id in the log line. Unfortunately:

2017-02-12 12:16:39 CET [mha@postgres] <0> LOG:  statement: drop table testtable;

The 0 as a transaction id indicates that this command was run in a virtual transaction, and did not have a real transaction id. The reason for this is that the statement logging happens before the statement has actually acquired a transaction. For example, if I instead drop two tables, and do so in a transaction:

postgres=# BEGIN;
BEGIN
postgres=# DROP TABLE test1;
DROP TABLE
postgres=# DROP TABLE test2;
DROP TABLE
postgres=# COMMIT;
COMMIT

I get this interesting output:

2017-02-12 12:17:43 CET [mha@postgres] <0> LOG:  statement: DROP TABLE test1;
2017-02-12 12:17:45 CET [mha@postgres] <156960> LOG:  statement: DROP TABLE test2;

Which shows two different transaction ids (one real and one not) for statements in the same transaction. That's obviously not true - they were both dropped by transaction 156960. The transaction id just wasn't available at the time of logging.

So what can we do about that? Event triggers to the rescue!
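
The post leaves off there, but as a rough sketch of the idea (not necessarily the exact approach the author goes on to describe), an event trigger on sql_drop runs inside the dropping transaction, so it can log the real transaction id alongside each dropped table:

CREATE OR REPLACE FUNCTION log_dropped_tables()
RETURNS event_trigger AS
$$
DECLARE
  obj record;
BEGIN
  -- pg_event_trigger_dropped_objects() lists everything dropped by the command
  FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
  LOOP
    IF obj.object_type = 'table' THEN
      RAISE LOG 'table % dropped by transaction %',
                obj.object_identity, txid_current();
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_drops ON sql_drop
  EXECUTE PROCEDURE log_dropped_tables();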

Bruce Momjian: Postgres Gives Developers More Options with NoSQL

From Planet PostgreSQL. Published on Feb 11, 2017.

I just co-authored a blog for IBM's Linux on Power developer site. While there is similar content in my YeSQL slides, the IBM blog explains the four options available to application developers, from pure relational to pure NoSQL, and the options in between possible with Postgres.

Set Up Travis CI For Django project

By Micropyramid django from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

Travis CI is a continuous integration service used to build and test applications hosted on GitHub. Travis CI supports integration with other tools such as coverage analyzers.

Why use Travis?

Travis CI runs your program's tests every time you commit to GitHub, so you can easily discover when your code breaks. Setting up Travis CI is very easy.

To run tests in Travis you have to create a ".travis.yml" file in the root directory of your project.

A .travis.yml file for a Django application:

    language: python # => 1

    python: # => 2
      - "2.6"
      - "2.7"

    services: # => 3
      - mysql

    env: # => 4
      - DJANGO=1.8 DB=mysql

    install: # => 5
      - pip install -r requirements.txt

    before_script: # => 6
      - mysql -e 'create database test;' -u root

    script: # => 7
      - python manage.py test

Explanation of the numbered comments above:
1. "language: python" declares that the application is written in Python.
2. Test your application against multiple versions of Python by listing them under the python key.
3. Define the services your application requires (e.g. Elasticsearch, Redis, MySQL) under services.
4. Specify the Django version and database to use.
5. Install the application requirements.
6. before_script: as the name suggests, these commands run before your actual test cases.
7. The command to run the tests.

Add the above file to your Django project and commit it to GitHub. Then check the build status in Travis CI.

Django Unit Test cases with Forms and Views

By Micropyramid django from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

Test Cases For Forms, Views

In this post, we'll see a brief introduction to writing unit test cases for any project involved.

Having tests for any project helps you structure good code and find bugs. If a specific function breaks, you will know about it. Tests also make debugging breaks in code much easier.

Django's unit tests use the Python standard library module unittest. This module defines tests using a class-based approach.

Unit Tests: Unit Tests are isolated tests that test one specific function.

Test Case: A test case is the smallest unit of testing. It checks for a specific response to a particular set of inputs.

Test suite: A test suite is a collection of test cases. It is used to aggregate tests that should be executed together.

In general, tests result in either a success (expected results), a failure (unexpected results), or an error. You not only need to test for expected results but also for how well your code handles unexpected inputs.

Testing the Forms:

Consider a Form:

from django import forms
from .models import *

class UserForm(forms.ModelForm):
    class Meta:
        model = User
        fields = ('email', 'password', 'first_name', 'phone')

setUp(): The setUp() method allows you to define instructions that will be executed before each test method runs.

Every TestCase method should start with "test" because, when you run your tests, the default behavior of the test utility is to find all the test cases in any file whose name begins with test, automatically build a test suite out of them, and run that suite.

For testing any form, we create the required objects in "Setup_Class" and then test whether the created object details match or not.

Write in your tests.py

from django.test import TestCase
from django.test import Client
from .forms import *   # import all forms

class Setup_Class(TestCase):

    def setUp(self):
        self.user = User.objects.create(email="user@mp.com", password="user", first_name="user", phone=12345678)

class User_Form_Test(TestCase):

    # Valid Form Data
    def test_UserForm_valid(self):
        form = UserForm(data={'email': "user@mp.com", 'password': "user", 'first_name': "user", 'phone': 12345678})
        self.assertTrue(form.is_valid())

    # Invalid Form Data
    def test_UserForm_invalid(self):
        form = UserForm(data={'email': "", 'password': "mp", 'first_name': "mp", 'phone': ""})
        self.assertFalse(form.is_valid())

Here assertTrue() or assertFalse() is used to verify a condition, returning "True" or "False".

The functions above in the "User_Form_Test" class return "True" or "False" depending on the data given.

Testing the Views:

While testing the views, we test for the response codes, and then we test the actual response.

First, we check that the user is logged in; then we fetch the URL with the client, store the result in a variable, and test our assertions.

class User_Views_Test(Setup_Class):

    def test_home_view(self):
        user_login = self.client.login(email="user@mp.com", password="user")
        self.assertTrue(user_login)
        response = self.client.get("/")
        self.assertEqual(response.status_code, 302)

    def test_add_user_view(self):
        response = self.client.get("include url for add user view")
        self.assertEqual(response.status_code, 200)
        self.assertTemplateUsed(response, "include template name to render the response")

    # Invalid Data
    def test_add_user_invalidform_view(self):
        response = self.client.post("include url to post the data given", {'email': "admin@mp.com", 'password': "", 'first_name': "mp", 'phone': 12345678})
        self.assertTrue('"error": true' in response.content)

    # Valid Data
    def test_add_admin_form_view(self):
        user_count = User.objects.count()
        response = self.client.post("include url to post the data given", {'email': "user@mp.com", 'password': "user", 'first_name': "user"})
        self.assertEqual(response.status_code, 200)
        self.assertEqual(User.objects.count(), user_count+1)
        self.assertTrue('"error": false' in response.content)

assertEqual(): checks for an expected result.
assertTemplateUsed(): asserts that the template with the given name was used in rendering the response.

Command line usage to Run the tests:

    python manage.py test

Third-party package:

Coverage: It is a tool used for measuring the effectiveness of tests, showing the percentage of your codebase covered by tests.

Installation: pip install coverage

Use "coverage run" to run your program and gather data:

coverage run manage.py test
coverage report -m  # provides the report for the tests

Django - migrating from function based views to class based views

By Micropyramid django from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

The single most significant advantage of Django class-based views is inheritance. On a large project, it's likely that we will have lots of similar views. Instead of writing repeated code, we can simply have our views inherit from a base view. Also, Django ships with a collection of generic view classes that can be used to do some of the most common tasks.

1. Template View

     Function based view

  urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^about-us/$', views.about_us, name="about_us"),
]

   views.py

from django.shortcuts import render

def about_us(request):
    return render(request, 'templates/about.html')
 
     Class based view

      urls.py

from django.conf.urls import url
from .views import AboutUs
urlpatterns = [
         url(r'^about-us/$', AboutUs.as_view(), name="about_us"),
]

      views.py

from django.views.generic import TemplateView
class AboutUs(TemplateView):
        template_name = "templates/about.html"

or we can directly write it in    "urls.py" 

      urls.py

from django.conf.urls import url
from django.views.generic import TemplateView
urlpatterns = [
         url(r'^about-us/$', TemplateView.as_view(template_name= "templates/about.html"), name="about_us"),
]

 

2. Detail View

     Function based view

  urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^author/(?P<pk>[0-9]+)/details/$', views.author_details, name="author_details"),
]

  views.py

from django.shortcuts import render
from django.shortcuts import get_object_or_404
from books.models import Author

def author_details(request, pk):
    author = get_object_or_404(Author, id=pk)
    context = {'author': author}
    return render(request, 'templates/author_details.html', context)
 
    Class based view

       urls.py

from django.conf.urls import url
from .views import AuthorDetails
urlpatterns = [
         url(r'^author/(?P<pk>[0-9]+)/details/$', AuthorDetails.as_view(), name="author_details"),
]

      views.py

from django.views.generic import DetailView
from books.models import Author

class  AuthorDetails(DetailView):
        model = Author
        slug_field = 'pk'         #  for identifying the object uniquely
        context_object_name = 'author'         # this name is used access the object inside template (it's optional)
        template_name = "templates/author_details.html"

 we can access the object in the template as "object" or as the model name in lowercase letters, "modelname" (e.g. object.name or author.name)

      Note: To send extra context data to the template, we can override the superclass method get_context_data:

from django.views.generic import DetailView
from .models import Author, Book

class  AuthorDetails(DetailView):
        model = Author
        slug_field = 'pk'
        template_name = "templates/author_details.html"
        
        def get_context_data(self, **kwargs):
               context = super(AuthorDetails, self).get_context_data(**kwargs)
               context['book_list'] = Book.objects.filter(author=self.get_object())
               return context

 

3. List View

    Function based view

  urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^authors-list/$', views.authors_list, name="authors_list"),
]

  views.py

from django.shortcuts import render
from books.models import Author
def authors_list(request):
    authors_list = Author.objects.all()
    context = {'authors_list': authors_list}
    return render(request, 'templates/authors_list.html', context)
 
    Class based view

       urls.py

from django.conf.urls import url
from .views import AuthorsList
urlpatterns = [
         url(r'^authors-list/$', AuthorsList.as_view(), name="authors_list"),
]

      views.py

from django.views.generic import ListView
from books.models import Author

class  AuthorsList(ListView):
        model = Author
        template_name = "templates/authors_list.html"

 we can access the objects in the template as "object_list" or as the model name in lowercase letters with a "_list" suffix, "modelname_list"

    List view with dynamic filtering

  urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^authors-list/subject/(?P<category>[-\w]+)/$', views.authors_list, name="authors_list"),
]

  views.py

from django.shortcuts import render
from books.models import Author
def authors_list(request, category):
    authors_list = Author.objects.filter(category=category)
    context = {'authors_list': authors_list,
               'category': category}
    return render(request, 'templates/authors_list.html', context)
 
      Class based view

       urls.py

from django.conf.urls import url
from .views import AuthorsList
urlpatterns = [
         url(r'^authors-list/subject/(?P<category>[-\w]+)/$', AuthorsList.as_view(), name="authors_list"),
]

      views.py

from django.views.generic import ListView
from books.models import Author

class  AuthorsList(ListView):
        model = Author
        template_name = "templates/authors_list.html"
        paginate_by = 10      # paginate the objects so that each page contains at most 10 objects
        
        def get_queryset(self):
              query_set = self.model.objects.filter(category=self.kwargs.get('category'))
              return query_set

 We override the parent class method get_queryset to return the desired queryset.
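
 To also expose the selected category in the template, as the function based view above does, get_context_data can be overridden as well (a minimal sketch):

from django.views.generic import ListView
from books.models import Author

class AuthorsList(ListView):
        model = Author
        template_name = "templates/authors_list.html"
        paginate_by = 10

        def get_queryset(self):
              return self.model.objects.filter(category=self.kwargs.get('category'))

        def get_context_data(self, **kwargs):
              context = super(AuthorsList, self).get_context_data(**kwargs)
              context['category'] = self.kwargs.get('category')
              return context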

 

4. Form View

 forms.py

from django import forms
from django.core.mail import send_mail

class ContactForm(forms.Form):
    name = forms.CharField(max_length=100)
    subject = forms.CharField(max_length=100, required=False)
    email = forms.EmailField()
    message = forms.CharField(widget=forms.Textarea)

    def send_email(self):
        name = self.cleaned_data.get('name', '')
        subject = self.cleaned_data.get('subject', '')
        email = self.cleaned_data.get('email', '')
        message = self.cleaned_data.get('message', '')

        send_mail(subject,
                  message,
                  email,
                  ['help@micropyramid.com'],
                  fail_silently=False)
       

    Function based view

  urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^contact-us/$', views.contact_us, name="contact_us"),
]

 views.py

from django.shortcuts import render
from django.http import HttpResponseRedirect
from .forms import ContactForm

def contact_us(request):
      form = ContactForm()
      if request.POST:
            form = ContactForm(request.POST)
            if form.is_valid():
                form.send_email()
                return HttpResponseRedirect("/thanks/")
      context = {'form': form}
      return render(request,
                    'templates/contact_us.html',
                    context)

   Class based view

       urls.py

from django.conf.urls import url
from .views import  ContactUs
urlpatterns = [
         url(r'^contact-us/$', ContactUs.as_view(), name="contact_us"),
]

      views.py

from django.views.generic.edit import FormView
from .forms import ContactForm

class ContactUs(FormView):
     template_name = 'templates/contact_us.html'
     form_class = ContactForm
     success_url = '/thanks/'

     def form_valid(self, form):
          form.send_email()
          return super(ContactUs, self).form_valid(form)

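 FormView exposes other hooks that can be overridden in the same way. For example, get_initial can prefill the form; a minimal sketch that seeds the subject field from a query parameter (the parameter name is purely illustrative):

from django.views.generic.edit import FormView
from .forms import ContactForm

class ContactUs(FormView):
     template_name = 'templates/contact_us.html'
     form_class = ContactForm
     success_url = '/thanks/'

     def get_initial(self):
          initial = super(ContactUs, self).get_initial()
          # Prefill the subject field from ?subject=... when present (illustrative only).
          initial['subject'] = self.request.GET.get('subject', '')
          return initial

     def form_valid(self, form):
          form.send_email()
          return super(ContactUs, self).form_valid(form)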
 

5. Create View

   forms.py

from django import forms
from .models import Author

class AuthorForm(forms.ModelForm):
    class Meta:
          model = Author
          fields  = ("first_name",
                         "last_name",
                         "about",
                         "country")
 

    Function based view

   urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^create-author/$', views.create_author, name="create_author"),
]

  views.py

from django.shortcuts import render
from django.http import HttpResponseRedirect
from .forms import AuthorForm

def create_author(request):
      form = AuthorForm()
      if request.POST:
            form = AuthorForm(request.POST)
            if form.is_valid():
                form.save()
                return HttpResponseRedirect("/authors-list/")
      context = {'form': form}
      return render(request,
                    'templates/create_author.html',
                    context)
   
   Class based view

       urls.py

from django.conf.urls import url
from .views import  CreateAuthor
urlpatterns = [
         url(r'^create-author/$', CreateAuthor.as_view(), name="create_author"),
]

      views.py

from django.views.generic.edit import CreateView
from .forms import AuthorForm
from .models import Author

class CreateAuthor(CreateView):
     model = Author                 # only needed if you don't supply a ModelForm via form_class
     template_name = 'templates/create_author.html'
     form_class = AuthorForm
     success_url = '/authors-list/'          

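 As the comment above notes, CreateView can also build a ModelForm for you from the model and a fields list, in which case no custom form class is needed (a minimal sketch):

from django.views.generic.edit import CreateView
from .models import Author

class CreateAuthor(CreateView):
     model = Author
     fields = ("first_name", "last_name", "about", "country")
     template_name = 'templates/create_author.html'
     success_url = '/authors-list/'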
 

6. Update View

     Function based view:

   urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^update-author/(?P<author_id>[\d]+)/$', views.update_author, name="update_author"),
]

  views.py

from django.http import HttpResponseRedirect
from django.shortcuts import render, get_object_or_404
from .forms import AuthorForm
from .models import Author


def update_author(request, author_id):
      author = get_object_or_404(Author, id=author_id)
      if request.POST:
            form = AuthorForm(request.POST, instance=author)
            if form.is_valid():
                  form.save()
                  return HttpResponseRedirect("/authors-list/")
      else:
            form = AuthorForm(instance=author)
      context = {'form': form}
      return render(request,
                    'templates/create_author.html',
                    context)
   
   
Class based view

       urls.py

from django.conf.urls import url
from .views import  UpdateAuthor
urlpatterns = [
         url(r'^update-author/(?P<author_id>[\d]+)/$', UpdateAuthor.as_view(), name="update_author"),
]

      views.py

from django.views.generic.edit import UpdateView
from .forms import AuthorForm
from .models import Author

class UpdateAuthor(UpdateView):
     model = Author                 # only needed if you don't supply a ModelForm via form_class
     pk_url_kwarg = 'author_id'     # the URL captures author_id instead of the default pk
     template_name = 'templates/create_author.html'
     form_class = AuthorForm
     success_url = '/authors-list/'

 

7. Delete View

     Function based view:

   urls.py

from django.conf.urls import url
from . import views
urlpatterns = [
         url(r'^delete-author/(?P<author_id>[\d]+)/$', views.delete_author, name="delete_author"),
]

  views.py

from django.http import HttpResponseRedirect
from django.shortcuts import get_object_or_404
from .models import Author

def delete_author(request, author_id):
      author = get_object_or_404(Author, id=author_id)
      author.delete()
      return HttpResponseRedirect("/authors-list/")
      
    
   
Class based view

 urls.py

from django.conf.urls import url
from .views import DeleteAuthor
urlpatterns = [
         url(r'^delete-author/(?P<author_id>[\d]+)/$', DeleteAuthor.as_view(), name="delete_author"),
]

  views.py

from django.views.generic.edit import DeleteView
from .models import Author

class DeleteAuthor(DeleteView):
     model = Author
     pk_url_kwarg = 'author_id'     # the URL captures author_id instead of the default pk
     success_url = '/authors-list/'
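
Note that DeleteView renders a confirmation page on GET (by default it looks for <app>/<model>_confirm_delete.html, e.g. books/author_confirm_delete.html) and only deletes the object on POST. If you prefer an explicit template path, a minimal sketch (the file name is just an example):

from django.views.generic.edit import DeleteView
from .models import Author

class DeleteAuthor(DeleteView):
     model = Author
     pk_url_kwarg = 'author_id'
     template_name = 'templates/author_confirm_delete.html'
     success_url = '/authors-list/'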


You can always override the parent class methods if required; see the UML diagrams of Django's generic views for a better understanding: http://epydoc.pythondiary.com/generic-views/
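
For example, here is a minimal sketch of overriding a couple of these methods on the UpdateView shown earlier (the 'authors_list' name comes from the URL patterns above):

from django.core.urlresolvers import reverse   # django.urls on Django 1.10+
from django.views.generic.edit import UpdateView
from .forms import AuthorForm
from .models import Author

class UpdateAuthor(UpdateView):
     model = Author
     pk_url_kwarg = 'author_id'
     form_class = AuthorForm
     template_name = 'templates/create_author.html'

     def get_success_url(self):
          # Build the redirect target from the named URL pattern instead of hard-coding it.
          return reverse('authors_list')

     def form_valid(self, form):
          # Place to hook in extra behaviour (logging, notifications, ...) before the
          # parent class saves the object and redirects.
          return super(UpdateAuthor, self).form_valid(form)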

Deploying Your Django app on Heroku

By Micropyramid django from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

Heroku is a cloud application platform and a new way of building and deploying web apps. It makes it easy to host your application in the cloud with a simple git push command.

Heroku supports several programming languages, including Python, Java and PHP.

Install the Heroku Toolbelt:

The first thing you need to do is install the Heroku Toolbelt. The Toolbelt is command line software that makes it easy to interact with the Heroku service from your terminal.

To install the Heroku Toolbelt on Debian/Ubuntu, run this from your terminal:

    wget -O- https://toolbelt.heroku.com/install-ubuntu.sh | sh

For other operating systems, select your platform on the Heroku Toolbelt download page ("Download Heroku Toolbelt for...") and follow the installation instructions there.

After successfully installing the Toolbelt, you can use the heroku command from your terminal.

heroku login

You will be prompted for your Heroku credentials (email and password); once you have authenticated you can use both the heroku and git commands.

Create your heroku app:

The following command creates an app:

heroku create your-app-name

Here 'your-app-name' must be unique; Heroku will generate a default app name if you don't specify one.

To create a git remote for your app, use this command:

heroku git:remote -a your-app-name

Define a Procfile:

A Procfile is a mechanism for declaring which commands are run by your application's dynos on the Heroku platform.

Use a Procfile, a text file in the root directory of your application, to explicitly declare what command should be executed to start your app.

First, install gunicorn using pip:

    pip install gunicorn

Then add the following line to your Procfile (replacing yoursettings with your project's package name):

    web: gunicorn yoursettings.wsgi --log-file -

Create a new file named "app.json" in the root directory of your application:

{
    "name": "Your-App-Name",
    "description": "Self-hosted app",
    "repository": "provide your git repository url",
    "scripts": {
        "postdeploy": "python manage.py makemigrations && python manage.py migrate"
    }
}

"app.json" is a json format file for describing web apps. It declares environment variables, Description, and other information required to run an app on Heroku.

To serve your app's static files you need "WhiteNoise", which can be installed using pip:

pip install whitenoise

You also need to update your "wsgi.py" file:

    from whitenoise.django import DjangoWhiteNoise

    application = DjangoWhiteNoise(application)

and add the storage backend to your "settings.py":

    STATICFILES_STORAGE = 'whitenoise.django.GzipManifestStaticFilesStorage'

Also update STATIC_ROOT in your "settings.py" file so your static files can be collected:

STATIC_ROOT = (os.path.join(BASE_DIR, "static"))

Database Configuration:

To configure your database, first install "dj-database-url" and "psycopg2" using pip:

    pip install dj_database_url
    pip install psycopg2

Then update the DATABASES setting in your "settings.py":

    import dj_database_url

    DATABASES = {
        'default': dj_database_url.config(
        default='sqlite:////{0}'.format(os.path.join(BASE_DIR, 'db.sqlite3'))
        )
    }

Dependencies used:

    1. dj-database-url
    2. Django
    3. gunicorn
    4. psycopg2
    5. whitenoise

Update all your app dependencies in the "requirements.txt" file in the root directory of your app. The packages above are the basic dependencies used.

Deploying Your app on Heroku:

    This can be done with a simple git command. Heroku will:
        * Read your Django application
        * Install the requirements you provided
        * Detect your Procfile
        * Collect your static files

    git push heroku master

After successfully deploying your app, you will be provided with the URL to open it.

You can also open it with the command:

heroku open

Angular 2 Setup Guide

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

# This guide is still being de...

Srvup 2 is here

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

The future of education will l...

Typescript Setup Guide

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

** This guide is still being d...

Andreas Scherbaum: PostgreSQL @ FOSDEM 2017 - Review

From Planet PostgreSQL. Published on Feb 10, 2017.

Author
Andreas 'ads' Scherbaum

FOSDEM 2017 is over, and hopefully everyone is back home. Time for a short review.

Once again, PostgreSQL ran a pre-conference to FOSDEM, called PGDay. One day just filled with awesome PostgreSQL talks, and meeting community members.

Many speakers uploaded or linked their talks to the PostgreSQL Wiki. If the slides for a talk are missing, please contact the speaker and ask them to upload or link the slides. (For the record, my slides are here)

The FOSDEM video team already uploaded the recordings, and the PostgreSQL Volunteers reviewed every single video. Find them here. Devrim also did short recordings of most of the talks, and posted them on Twitter.

The Hacker Public Radio did interviews with many projects, including PostgreSQL. Find the interview here.

The PostgreSQL Project had a Devroom on Sunday. The schedule is part of the regular PGDay/FOSDEM schedule. And of course we managed to fill the room to the last seat, standing room only. If only people would obey the "Room is full" sign on the door ;-)

We also had a booth (pic 2, pic 3) in the main building, where you could talk with community members about anything around PostgreSQL, and some nice merchandise. Matching the bad weather, the umbrellas went very well. We promise to have more next year.

Last but not least, Slonik visited FOSDEM. Here is just a short selection of pictures.

Hope to see you all next year!

 

P.S.: The next PostgreSQL Europe Conference will be in Warsaw, in October. Details can be found here.

Shaun M. Thomas: PG Phriday: Down in the Dumps

From Planet PostgreSQL. Published on Feb 10, 2017.

These days with multiple Postgres database environments a commonality, it’s not unheard of to copy data from one to another. Perhaps a production extract is necessary to properly vet data in a staging environment. Maybe the development environment needs to update its data to reflect an especially pernicious and intractable edge case someone observed. In any of these scenarios, we are likely to extract data from multiple tables to import it elsewhere. The question is: how?

Last week I flatly stated that pg_dump is not an adequate backup tool. Despite that, it’s perfect for situations like these, where we only need a few tables out of an entire database. However, tables frequently reference other tables in foreign keys. While such integrity concerns do not affect data export, they greatly complicate import efforts.

The more deeply nested the referential integrity rules become, the more hair we'll pull out trying to trace them all back to their requisite tables. Even after we've found them all, we still need to unravel the horrors again in the target cluster.

A typical table dependency chart

To better illustrate what we mean by “horrors”, take this relatively simple table hierarchy:

CREATE TABLE tab_a (id INT PRIMARY KEY);
CREATE TABLE tab_a1 (id INT PRIMARY KEY REFERENCES tab_a);
CREATE TABLE tab_b (id INT PRIMARY KEY REFERENCES tab_a);
CREATE TABLE tab_b1 (id INT PRIMARY KEY REFERENCES tab_b);
CREATE TABLE tab_c (id INT PRIMARY KEY REFERENCES tab_b);
CREATE TABLE tab_c1 (id INT PRIMARY KEY REFERENCES tab_c);

INSERT INTO tab_a VALUES (1);
INSERT INTO tab_a1 VALUES (1);
INSERT INTO tab_b VALUES (1);
INSERT INTO tab_b1 VALUES (1);
INSERT INTO tab_c VALUES (1);
INSERT INTO tab_c1 VALUES (1);

Imagine these are six tables out of potentially hundreds. From an outside perspective, we may only know that tab_a needs to be copied from one system to another. No matter what tool we use for this, finding the other tables is somewhat irritating. An ORM will display the relationships in a physical chart, and the psql command line client or pgAdmin will report tables that immediately depend on a table we examine. Not a great way to just isolate them all at once.

Fortunately we have the Postgres system catalog. The pg_constraint system table tracks all of those table relationships, and we can recursively fetch the entire tree using a CTE. Remember how awesome CTEs are? Let’s mix some chocolate with our peanut butter by using one to mine the catalog.

Starting with tab_a, here’s how we might find all of the other tables in the list:

WITH RECURSIVE deptree AS (
  SELECT conrelid, conrelid::REGCLASS::TEXT AS table_name,
         confrelid, confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint
   WHERE contype IN ('f', 'p')
     AND (confrelid::REGCLASS::TEXT = 'tab_a' OR
          conrelid::REGCLASS::TEXT = 'tab_a')
   UNION
  SELECT c.conrelid, c.conrelid::REGCLASS::TEXT AS table_name,
         c.confrelid, c.confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint c
    JOIN deptree d ON (d.conrelid = c.confrelid)
   WHERE c.contype = 'f'
)
SELECT *
  FROM deptree;

 conrelid | table_name | confrelid | dep_table
----------+------------+-----------+-----------
    69327 | tab_a      |         0 | -
    69332 | tab_a1     |     69327 | tab_a
    69342 | tab_b      |     69327 | tab_a
    69352 | tab_b1     |     69342 | tab_b
    69362 | tab_c      |     69342 | tab_b
    69372 | tab_c1     |     69362 | tab_c

As complicated as this query appears, it’s not actually that bad. We simply bootstrap the CTE with any tables that depend on tab_a, and repeat the process with each successive loop. Even if there are dozens or hundreds of incestuous relationships, we’ll capture all of them.

In any case, we have all the data we need to construct a pg_dump command to export all of these tables. To keep this as speedy as possible, let’s perform a parallel dump of the indicated tables into a directory we can transmit to another server.

pg_dump -t tab_a -t tab_a1 -t tab_b -t tab_b1 -t tab_c -t tab_c1 \
        -a -Fd -j 8 -f /tmp/tab_exports postgres

Though we can list each table individually, it would be nice if we could supply a filename of desired tables instead. Dependency graphs of sufficient depth could make for a literally impossible command-line given character limit constraints. This means we may need to execute multiple dumps or rely on wildcards and hope that there’s enough name overlap.

Given that caveat, maybe we want to combine pg_dump with xargs and its parallel capabilities instead:

mkdir /tmp/tab_exports

cat /tmp/tablist.txt | xargs -I{} -P 8 \
    pg_dump -t {} -a -Fc -f /tmp/tab_exports/{}.pgr postgres

Either way, we now have a data-only representation of every table in the dependency tree of the table we’re actually importing. Now we just have to insert these values into the target system. And this is usually where we first encounter a major roadblock: existing contents.

First of all, that data is in our way. Secondly, the most efficient manner of ridding ourselves of their oppression is the TRUNCATE command. While TRUNCATE is a DDL-level command and can’t be circumvented, it does have a CASCADE decorator which will empty the entire dependency chain. This actually helps us since we have data to rebuild anything that might be affected by a cascading truncate.

Let’s give it a try on the recipient database:

TRUNCATE TABLE tab_a;

ERROR: cannot truncate a table referenced in a foreign key constraint
DETAIL: Table "tab_a1" references "tab_a".
HINT: Truncate table "tab_a1" at the same time, or use TRUNCATE ... CASCADE.

TRUNCATE TABLE tab_a CASCADE;

NOTICE: truncate cascades to table "tab_a1"
NOTICE: truncate cascades to table "tab_b"
NOTICE: truncate cascades to table "tab_b1"
NOTICE: truncate cascades to table "tab_c"
NOTICE: truncate cascades to table "tab_c1"

Well that was terrifying. Also note that the error only reported one of the dependencies. If we relied only on this information, we might attempt to drop the foreign key constraint thinking it was the only one. Every subsequent attempt to truncate the table would result in another error.

We’ve avoided all of that, but imagine if we had attempted to restrict our efforts to only the data in tab_a. We’d be left with no benefits of cascade, manually tracing the errors until success, inability to reestablish the constraints due to likely missing keys, and so on. No thanks.

Though truncate and the foreign keys aided us in emptying the tables on the recipient, restoring from our dump is somewhat problematic. We want to do it in parallel, but foreign keys directly prevent this by enforcing key integrity at each level. So we’re stuck either importing each table in order of the dependency tree, or we need to do a bit of preparation.

We can leverage that CTE again and build a couple of SQL scripts to help us out:

CREATE TEMP TABLE tab_deps AS
WITH RECURSIVE deptree AS (
  SELECT conrelid, conrelid::REGCLASS::TEXT AS table_name,
         confrelid, confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint
   WHERE contype IN ('f', 'p')
     AND (confrelid::REGCLASS::TEXT = 'tab_a' OR
          conrelid::REGCLASS::TEXT = 'tab_a')
   UNION
  SELECT c.conrelid, c.conrelid::REGCLASS::TEXT AS table_name,
         c.confrelid, c.confrelid::REGCLASS::TEXT AS dep_table
    FROM pg_constraint c
    JOIN deptree d ON (d.conrelid = c.confrelid)
   WHERE c.contype = 'f'
)
SELECT *
  FROM deptree;

COPY (
  SELECT 'ALTER TABLE ' || table_name || ' DISABLE TRIGGER ALL;'
    FROM tab_deps
) TO '/tmp/no_trigs.sql';

COPY (
  SELECT 'ALTER TABLE ' || table_name || ' ENABLE TRIGGER ALL;'
    FROM tab_deps
) TO '/tmp/fix_trigs.sql';

One thing that’s interesting about Postgres is that foreign keys are actually enforced by hidden system-level triggers. Given that information, we can simply turn them off for the duration of our import, and re-enable them afterwards. Now we have all the parts we need to restore the donor data into the environment of our choice.

It should be cautioned that all of the import steps should be performed on the target system itself, even if this means synchronizing the dump directory to another system first. We don't want to fat-finger these commands in production or an equally important system.

In any case, this is how that might go:

psql -f /tmp/no_trigs.sql postgres
psql -c 'TRUNCATE TABLE tab_a CASCADE;' postgres
pg_restore -d postgres -j 8 /tmp/tab_exports
psql -f /tmp/fix_trigs.sql postgres

Or if we used xargs, we’ll want this procedure instead:

psql -f /tmp/no_trigs.sql postgres
psql -c 'TRUNCATE TABLE tab_a CASCADE;' postgres

cat /tmp/tablist.txt | xargs -I{} -P 8 \
    pg_restore -d postgres /tmp/tab_exports/{}.pgr

psql -f /tmp/fix_trigs.sql postgres

And voila, we’re saved!

Of course table structures between the environments may not match. In that case, we would simply drop all of the involved tables instead of truncating them and stunning their constraint triggers. As such, we would want to remove the -a data-only flag from pg_dump commands we used so pg_restore can recreate them. Ultimately, we’ve broken the paradox of satisfying constraints while simultaneously circumventing them.

It’s not an ideal situation, but hey, whatever works.

How To Export Django Model Data Along With Its Parent Model Data via dumpdata Command

By timonweb.com Django posts from Django community aggregator: Community blog posts. Published on Feb 10, 2017.

Django's dumpdata command lets us export model objects as fixtures and store them in json / xml formats. All is good and works fine until your model is a child of a concrete model and shares two db tables (one for the parent, another for the model). If you ...

Read now

Activate, Reactivate, Deactivate your Virtualenv

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 09, 2017.

Here's a quick guide to activa...

Andrew Dunstan: Problem with Buildfarm registrations fixed

From Planet PostgreSQL. Published on Feb 09, 2017.

Recently we switched the PostgreSQL Buildfarm to use https for its front-facing web transactions. That’s worked pretty well, but one thing got broken in the change. The registration page uses a Captcha to help avoid web spam, which has been a problem in the past. However, the code to fetch the relevant html was still getting plain http, and thus browsers were refusing to display the Captcha because it came from an untrusted source. Without the Captcha the registration was rejected. This was discovered yesterday, and has now been fixed. The Captcha code now uses https. If you have recently found a buildfarm registration rejected like this, please resubmit it.

Jason Owen: AWS PostgreSQL RDS Passwords

From Planet PostgreSQL. Published on Feb 09, 2017.

In order to set a strong password for the PostgreSQL database I provisioned on Amazon RDS, I looked up the limits. In my case, there are two sources of constraints:

  1. Amazon RDS limits
    • Must contain 8 to 128 characters
    • The password for the master database user can be any printable ASCII character except /, `, or @.
  2. Characters allowed in Amazon Lambda environment variables
    • Member must satisfy regular expression pattern: [^,]* (I cannot find documentation for this, except the error message when you try to save a value that has a comma in it.)

We can generate a password that meets these restrictions with makepasswd(1):

$ makepasswd --chars=128 \
    --string 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789`~!#$%^&*()-_=+[]{}\|;:<>.?'\'

Note the '\' at the end: that means "close the single-quoted string, and append an escaped single-quote."
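
If makepasswd isn't available, a similar password can be generated in Python; a minimal sketch using the standard library secrets module (Python 3.6+), with a character set built from the constraints above:

import secrets
import string

# Printable ASCII letters, digits and punctuation, minus the characters RDS
# forbids (/, `, @) and the comma that Lambda environment variables reject.
allowed = ''.join(
    c for c in string.ascii_letters + string.digits + string.punctuation
    if c not in '/`@,'
)

password = ''.join(secrets.choice(allowed) for _ in range(128))
print(password)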

You can then save this to your ~/.pgpass file, being sure to escape \ and : characters:

$ sed -e 's/\\/\\\\/g;s/:/\\:/g'

Jason Owen: Variable names in PostgreSQL stored procedures

From Planet PostgreSQL. Published on Feb 08, 2017.

I am building a web application that delegates authentication to a third party. Once the third party authenticates the user, the app creates a session for the user - and maybe creates the user, too, if they don't already exist!

My first draft of this had all the SQL queries in the code. The logic is something like:

does user exist?
  yes: create session
  no: create user, then create session

I wasn't very happy with the code, for a couple of reasons. First, the SQL queries were rather ugly string constants. Multi-line strings aren't really great in any language, and embedding SQL in another language's source file makes it harder for editors to do syntax highlighting. Second, handling errors and encoding the (fairly simple) logic above was obscuring my intent.

Stored procedures are a way to keep database logic in the database. Among other benefits, this can dramatically simplify the calling code.

I ended up with a function like the following:

CREATE FUNCTION create_session(
  external_user_id bigint,
  external_user_name text,
  OUT session_id uuid,
  OUT session_expiration TIMESTAMP WITH TIME ZONE
) AS $$
DECLARE
  existing_user_id INTEGER;
  new_user_id INTEGER;
BEGIN
  SELECT INTO existing_user_id user_id
    FROM users_external
    WHERE users_external.external_user_id = external_user_id;

  IF existing_user_id IS NULL THEN
    INSERT INTO users_external (external_user_id, external_user_name)
      VALUES (external_user_id, external_user_name)
      RETURNING user_id
      INTO new_user_id;
    INSERT INTO sessions (user_id)
      VALUES (new_user_id)
      RETURNING session_id, session_expiration
      INTO session_id, session_expiration;
  ELSE
    INSERT INTO sessions (user_id)
      VALUES (existing_user_id)
      RETURNING session_id, expires
      INTO session_id, session_expiration;
  END IF;
END;
$$ LANGUAGE plpgsql;

This is a syntactically correct function and will be accepted by PostgreSQL, but fails at runtime:

> select * from create_session(12345, 'example');
ERROR:  column reference "external_user_id" is ambiguous
LINE 3:     WHERE users_external.external_user_id = external_user_id
                                                    ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
QUERY:  SELECT                       user_id
    FROM users_external
    WHERE users_external.external_user_id = external_user_id
CONTEXT:  PL/pgSQL function create_session(bigint,text) line 6 at SQL statement

How do you disambiguate a parameter from a column name?

There is some useful documentation about how variable substitution works in PL/pgSQL. In particular, it mentions that you can disambiguate column names from variable names by labelling the declaring block:

<<block>>
DECLARE
    foo int;
BEGIN
    foo := ...;
    INSERT INTO dest (col) SELECT block.foo + bar FROM src;

This is tangentially related, but it does not cover the issue I was having. However, it links to Structure of PL/pgSQL, which includes a note near the bottom:

There is actually a hidden "outer block" surrounding the body of any PL/pgSQL function. This block provides the declarations of the function's parameters (if any), as well as some special variables such as FOUND (see Section 41.5.5). The outer block is labeled with the function's name, meaning that parameters and special variables can be qualified with the function's name.

That was the key piece I was missing! You can disambiguate a parameter from a column by prefixing the parameter with the function name. Here's what we needed to change to get the example to work:

@@ -13 +13 @@
-    WHERE users_external.external_user_id = external_user_id;
+    WHERE users_external.external_user_id = create_session.external_user_id;
@@ -17 +17,4 @@
-      VALUES (external_user_id, external_user_name)
+      VALUES (
+        create_session.external_user_id,
+        create_session.external_user_name
+      )
@@ -22,2 +25,4 @@
-      RETURNING session_id, session_expiration
-      INTO session_id, session_expiration;
+      RETURNING sessions.session_id,
+                sessions.session_expiration
+      INTO create_session.session_id,
+           create_session.session_expiration;
@@ -27,2 +32,4 @@
-      RETURNING session_id, expires
-      INTO session_id, session_expiration;
+      RETURNING sessions.session_id,
+                sessions.session_expiration
+      INTO create_session.session_id,
+           create_session.session_expiration;

This simplifies the calling code to a single query!

> SELECT * FROM create_session(12345, 'example');
              session_id              |      session_expiration
--------------------------------------+-------------------------------
 7be20fa5-63ec-4937-8a02-3417df54571b | 2017-02-15 18:44:29.653136-05
(1 row)
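
The calling code itself can now be a single statement as well; a minimal psycopg2 sketch (the connection parameters are placeholders):

import psycopg2

# Connection parameters are placeholders; adjust for your environment.
conn = psycopg2.connect(dbname='example')

with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT session_id, session_expiration FROM create_session(%s, %s)",
        (12345, 'example'),
    )
    session_id, session_expiration = cur.fetchone()
    print(session_id, session_expiration)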

Here's the schema I'm using for this example:

CREATE TABLE users_external(
  user_id SERIAL PRIMARY KEY,
  external_user_id BIGINT NOT NULL,
  external_user_name TEXT NOT NULL
);
CREATE TABLE sessions(
  session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id INTEGER NOT NULL REFERENCES users_external (user_id),
  session_expiration TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() + '1 week'
);

And the complete, rewritten, working stored procedure:

CREATE FUNCTION create_session(
  external_user_id bigint,
  external_user_name text,
  OUT session_id uuid,
  OUT session_expiration TIMESTAMP WITH TIME ZONE
) AS $$
DECLARE
  existing_user_id INTEGER;
  new_user_id INTEGER;
BEGIN
  SELECT INTO existing_user_id user_id
    FROM users_external
    WHERE users_external.external_user_id = create_session.external_user_id;

  IF existing_user_id IS NULL THEN
    INSERT INTO users_external (external_user_id, external_user_name)
      VALUES (
        create_session.external_user_id,
        create_session.external_user_name
      )
      RETURNING user_id
      INTO new_user_id;
    INSERT INTO sessions (user_id)
      VALUES (new_user_id)
      RETURNING sessions.session_id,
                sessions.session_expiration
      INTO create_session.session_id,
           create_session.session_expiration;
  ELSE
    INSERT INTO sessions (user_id)
      VALUES (existing_user_id)
      RETURNING sessions.session_id,
                sessions.session_expiration
      INTO create_session.session_id,
           create_session.session_expiration;
  END IF;
END;
$$ LANGUAGE plpgsql;

Django PositionField

By Coding for Entrepreneurs | A Django Programming Class for the Non-Technical Founder from Django community aggregator: Community blog posts. Published on Feb 06, 2017.

Sometimes you need ordering in...

5 Reasons to Use Class Based Views

By GoDjango - Django Screencasts from Django community aggregator: Community blog posts. Published on Feb 06, 2017.

Anytime anyone brings up Class Based Views, generic and otherwise, it is similar to the Vim vs Emacs debate. So lets pile on a bit more.

But not really. In actuality, I feel like there are good, solid, and legitimate reasons to use Class Based Views which get passed over in the middle of arguments.

In this weeks topic I talk about 5 of those reasons I think people should use Class Based Views. In reality there are more, but I wanted to keep the video short'ish.

5 Reasons to Use Class Based Views

How to Implement Case-Insensitive Username

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Feb 05, 2017.

Inspired by a discussion in the How to Extend Django User Model comments, I decided to compile a few options on how to implement a case insensitive authentication using the built in Django User. Thanks to Paul Spiteri for bringing up the question and also to provide a possible solution!


Option 1: Custom Authentication Backend

Please note that there is a small difference in the implementation between Django 1.10 and 1.11, as from 1.11 the authenticate method receives a request object.

Django Version 1.10.x Or Below

If your application is already up and running and you can't afford to customize the Django User model, this is the least intrusive way.

Create a python module named backends.py anywhere in your project and add the following snippet:

backends.py

from django.contrib.auth import get_user_model
from django.contrib.auth.backends import ModelBackend


class CaseInsensitiveModelBackend(ModelBackend):
    def authenticate(self, username=None, password=None, **kwargs):
        UserModel = get_user_model()
        if username is None:
            username = kwargs.get(UserModel.USERNAME_FIELD)
        try:
            case_insensitive_username_field = '{}__iexact'.format(UserModel.USERNAME_FIELD)
            user = UserModel._default_manager.get(**{case_insensitive_username_field: username})
        except UserModel.DoesNotExist:
            # Run the default password hasher once to reduce the timing
            # difference between an existing and a non-existing user (#20760).
            UserModel().set_password(password)
        else:
            if user.check_password(password) and self.user_can_authenticate(user):
                return user

Now switch the authentication backend in the settings.py module:

settings.py

AUTHENTICATION_BACKENDS = ('mysite.core.backends.CaseInsensitiveModelBackend', )

Please note that 'mysite.core.backends.CaseInsensitiveModelBackend' must be changed to the valid path, where you created the backends.py module.

Django Version 1.11.x Or Higher

If your application is already up and running and you can't afford to customize the Django User model, this is the least intrusive way.

Create a python module named backends.py anywhere in your project and add the following snippet:

backends.py

from django.contrib.auth import get_user_model
from django.contrib.auth.backends import ModelBackend


class CaseInsensitiveModelBackend(ModelBackend):
    def authenticate(self, request, username=None, password=None, **kwargs):
        UserModel = get_user_model()
        if username is None:
            username = kwargs.get(UserModel.USERNAME_FIELD)
        try:
            case_insensitive_username_field = '{}__iexact'.format(UserModel.USERNAME_FIELD)
            user = UserModel._default_manager.get(**{case_insensitive_username_field: username})
        except UserModel.DoesNotExist:
            # Run the default password hasher once to reduce the timing
            # difference between an existing and a non-existing user (#20760).
            UserModel().set_password(password)
        else:
            if user.check_password(password) and self.user_can_authenticate(user):
                return user

Now switch the authentication backend in the settings.py module:

settings.py

AUTHENTICATION_BACKENDS = ('mysite.core.backends.CaseInsensitiveModelBackend', )

Please note that 'mysite.core.backends.CaseInsensitiveModelBackend' must be changed to the valid path, where you created the backends.py module.

Drawbacks

This change only affects the authentication process; it doesn't enforce the uniqueness of the case-insensitive username. That means you can still have a username saved as John, another as john and another as JOHN. If that's the case, the application will crash when John or JOHN tries to authenticate, because UserModel._default_manager.get() will return more than one row and eventually throw an exception.

The first thing you want to do, before you change the authentication backends, is to make sure there are no conflicting usernames in the database (the same username with mixed case).
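
One way to look for such conflicts is to group the existing usernames case-insensitively; a minimal sketch using the Django ORM (the Lower database function requires Django 1.8+):

from django.contrib.auth.models import User
from django.db.models import Count
from django.db.models.functions import Lower

# Usernames that occur more than once when compared case-insensitively.
conflicts = (User.objects
             .annotate(lower_username=Lower('username'))
             .values('lower_username')
             .annotate(total=Count('id'))
             .filter(total__gt=1))

for conflict in conflicts:
    print(conflict['lower_username'], conflict['total'])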

If you are good to go, make sure every input where the end user can set the username field is properly validated.

For example, if you are using the built-in UserCreationForm from django.contrib.auth.forms, extend it and enforce the validation. Below, a very straightforward example:

forms.py

from django.contrib.auth.forms import UserCreationForm
from django.contrib.auth.models import User


class CaseInsensitiveUserCreationForm(UserCreationForm):
    def clean(self):
        cleaned_data = super(CaseInsensitiveUserCreationForm, self).clean()
        username = cleaned_data.get('username')
        if username and User.objects.filter(username__iexact=username).exists():
            self.add_error('username', 'A user with that username already exists.')
        return cleaned_data

The whole idea behind this strategy is to take care of all the possible ways the end user has to set the username, either by creating a new user or by editing it. Use it with caution.
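
The same care applies when an existing user is edited. A minimal sketch of an edit form that excludes the user being edited from the uniqueness check (the form class and its field list are just an example, not part of the original post):

from django import forms
from django.contrib.auth.models import User


class CaseInsensitiveUserChangeForm(forms.ModelForm):
    class Meta:
        model = User
        fields = ('username', 'first_name', 'last_name', 'email')

    def clean(self):
        cleaned_data = super(CaseInsensitiveUserChangeForm, self).clean()
        username = cleaned_data.get('username')
        if username:
            conflicts = (User.objects
                         .filter(username__iexact=username)
                         .exclude(pk=self.instance.pk))
            if conflicts.exists():
                self.add_error('username', 'A user with that username already exists.')
        return cleaned_data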


Option 2: Extending The User Model

This is the best option if your project is just beginning and you can afford to swap the authentication User model. Actually, any of the strategies described in the How to Extend Django User Model post will work (except for the Proxy model, because the AUTH_USER_MODEL can't be a Proxy).

Below, the most basic way to implement it, without messing around with anything:

models.py

from django.contrib.auth.models import AbstractUser, UserManager

class CustomUserManager(UserManager):
    def get_by_natural_key(self, username):
        case_insensitive_username_field = '{}__iexact'.format(self.model.USERNAME_FIELD)
        return self.get(**{case_insensitive_username_field: username})

class CustomUser(AbstractUser):
    objects = CustomUserManager()

Now, it’s just a matter of swapping the AUTH_USER_MODEL in the settings.py module:

settings.py

AUTH_USER_MODEL = 'core.CustomUser'

And that’s it!


I hope this tutorial could help you somehow!

Know a better way to solve this issue? Any questions? Comments? Leave a message in the comments below!

News items from the new year

By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Feb 05, 2017.

The last few months have been mostly occupied with fixing bugs and straightening out usage quirks as more and more people take Evennia through its paces.

Webclient progress

One of our contributors, mewser/titeuf87, has put in work on implementing part of our roadmap for the webclient. In the first merged batch, the client now has an option window for adjusting and saving settings. This is an important first step towards expanding the client's functionality. Other features are showing help in an (optional) popup window and reporting window activity by popup and/or sound.

The goal for the future is to allow the user or developer to split the client window into panes to which they can then direct various output from the server as they please. It's early days still, but some of the example designs being discussed can be found in the wiki webclient brainstorm (see the title image of this blog for one of the mockups).
 
New server stuff

Last year saw the death of our old demo server on horizondark.com; luckily the new one at silvren.com has worked out fine with no hiccups. As part of setting that up, we also put together a more proper list of recommended hosts for Evennia games. Evennia requires more memory than your average C code base, so this is important information to have. It seems most of our users run Evennia on various cloud hosting services rather than from a traditional remote server login.

Arx going strong 

The currently largest Evennia game, the mush Arx - After the Reckoning, has helped a lot in stress testing. Their lead coder Tehom has also been active both in reporting issues and fixing them - kudos! There are however some lingering issues which appear rarely enough that they have not been possible to reproduce yet; we're working on those. Overall though I must say that considering how active Arx is, I would have expected to see even more "childhood diseases" than we have.

Launch scripts and discussions

It is always interesting to get feedback, and some time back another discussion thread erupted over on musoapbox, here. The musoapbox regulars have strong opinions about many things, and this time some were critical of Evennia's install process. They felt it was too elaborate, with too many steps, especially if you are approaching the system with no knowledge of Python. Apparently the average MUSH server has a much shorter path to go (even though that does require C compiling). While I don't necessarily agree with all notions in that thread, it's valuable feedback - I've long acknowledged that it's hard to know just what is hard or not for a beginner.

Whereas we are planning to eventually move Evennia to pypi (so you can do pip install evennia), the instructions around getting virtualenv set up are not likely to change. So there are now unix shell scripts supplied with the system for installing on debian-derived systems (Debian, Ubuntu, Mint etc). I also made scripts for automating the setup and launch of Evennia and for using it with linux' initd within the scope of a virtualenv.
So far these scripts are not tested by enough people to warrant them being generally recommended, but if you are on a supported OS and are interested in trying them, they are found at the top of the Evennia repo, in bin/unix/. More info can be found on their documentation page.

Docker

Speaking of installing, Evennia now has an official Docker image, courtesy of the work of contributor and Ainneve dev feend78. The image is automatically kept up to date with the latest Evennia repo and allows Evennia to be easily deployed in a production environment (most cloud services support this). See the Docker wiki page for more info.


Lots of new PRs

There was a whole slew of contributions waiting for me when I returned from Christmas break, and this has not slowed since. Github makes it easy to contribute and I think we are really starting to see this effect (Google Code back in the day was not as simple in this regard). The best thing about many of these PRs is that they address common things that people need to do but which could be made simpler or more flexible. It's hard to plan for all possibilities, so many people using the system is the best way to find such solutions.

Apart from the map-creation contribs from last year, we also have a new Wilderness system by mewser/titeuf87. This implements wilderness according to an old idea I had on the mailing list - instead of making a room per location, players get a single room. The room tracks its coordinate in the wilderness and updates its description and exits dynamically every time you move. This way you could in principle have an infinite wilderness without it taking any space. It's great to see the idea turned into a practical implementation and that it seems to work so well. It will be fun to see what people can do with it in the future!

I'm Gonna Regret This

By chrism from . Published on Jun 14, 2016.

A plea for liberals to fight for individual rights.

Is Open Source Consulting Dead?

By chrism from . Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemnification

By chrism from . Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from . Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at http://connpass.com/event/1439/

Why I Like ZODB

By chrism from . Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from . Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from . Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from . Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from . Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from . Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from . Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from . Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from . Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from . Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from . Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.