Open Source Posts

Django, pytz and NonExistentTimeError

By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 30, 2015.

Brief: In one of the projects I work on we had to convert some old naive datetime objects to timezone-aware ones. Converting a naive datetime to a timezone-aware one is usually a straightforward job. In Django you even have a nice utility function for this. For example:

import datetime

import pytz
from django.utils import timezone

timezone.make_aware(datetime.datetime(2012, 3, 25, 3, 52),
                    timezone=pytz.timezone('Europe/Stockholm'))
# returns datetime.datetime(2012, 3, 25, 3, 52,
#                           tzinfo=<DstTzInfo 'Europe/Stockholm' CEST+2:00:00 DST>)

Problem: You can use this for quite a long time until one day you end up with something like this:

timezone.make_aware(datetime.datetime(2012, 3, 25, 2, 52),
                    timezone=pytz.timezone('Europe/Stockholm'))
# which leads to
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ilian/venvs/test/lib/python3.4/site-packages/django/utils/timezone.py", line 358, in make_aware
    return timezone.localize(value, is_dst=None)
  File "/home/ilian/venvs/test/lib/python3.4/site-packages/pytz/tzinfo.py", line 327, in localize
    raise NonExistentTimeError(dt)
pytz.exceptions.NonExistentTimeError: 2012-03-25 02:52:00

Explanation: The reason for this error is that in the real world this datetime does not exist. Due to the DST change on this date the clock jumps from 01:59 directly to 03:00. Fortunately (or not) pytz is aware of the fact that this time is invalid and will throw the exception above.

Why does this happen? Well, we can't be sure how exactly this one got into our legacy data, but the assumption is that at the moment the record was saved the server was in a different timezone where this was a valid time.

Solution: The fix is quite simple, just add an hour if the error occurs:

from datetime import datetime, timedelta

import pytz
from django.utils.timezone import make_aware

tz = pytz.timezone('Europe/Stockholm')
try:
    # date_time is the legacy timestamp value being converted
    date = make_aware(datetime.fromtimestamp(date_time), timezone=tz)
except pytz.NonExistentTimeError:
    date = make_aware(datetime.fromtimestamp(date_time) + timedelta(hours=1),
                      timezone=tz)
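An alternative, if you would rather not catch the exception, is to call pytz's localize() yourself with an is_dst hint. make_aware passes is_dst=None, which is what triggers the error; passing True or False instead makes pytz pick one of the two possible interpretations rather than raise. A minimal sketch:

import datetime

import pytz

tz = pytz.timezone('Europe/Stockholm')
naive = datetime.datetime(2012, 3, 25, 2, 52)

# is_dst=None (what make_aware uses) raises NonExistentTimeError for this value;
# is_dst=True or False resolves the non-existent time instead of raising.
aware = tz.localize(naive, is_dst=True)
print(aware)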

Andrew Dunstan: Testing patches with a couple of commands using a buildfarm animal

From Planet PostgreSQL. Published on Mar 30, 2015.

I've blogged before about how the buildfarm client software can be useful for developers and reviewers. Yesterday was a perfect example. I was testing a set of patches for a bug fix for pg_upgrade running on Windows, and they go all the way back to the 9.0 release. The simplest way to test these was using a buildfarm animal. On jacana, I applied the relevant patch in each branch repo, and then simply did this to build and test them all:

for f in root/[RH]* ; do 
br=`basename $f`
perl ./run_build.pl --from-source=`pwd`/$f/pgsql --config=jacana.conf --verbose $br
done

After it was all done and everything worked, I cleaned up the git repositories so they were ready for more buildfarm runs:

for f in root/[RH]* ; do 
pushd $f/pgsql
git reset --hard
git clean -dfxq
popd
done

Pretty simple! The commands are shown here on multiple lines for clarity, but in fact I wrote each set on one line, so after applying the patches the whole thing took two lines. (Because jacana only builds back to release 9.2, I had to repeat this on frogmouth for 9.0 and 9.1.)

Tastypie with ForeignKey

By Agiliq Blog: Django web development from Django community aggregator: Community blog posts. Published on Mar 29, 2015.

This is a follow-up post to Getting started with tastypie. We will use the same project setup as in the last post.

This post will cover:

  • Fetch ForeignKey data in GET calls
  • Create an object with ForeignKeys using POST calls

Setup the application

Let's add the capability to categorise the expenses

Add a model called ExpenseCategory

class ExpenseCategory(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()

Add a FK from Expense to ExpenseCategory

class Expense(models.Model):
    description = models.CharField(max_length=100)
    amount = models.IntegerField()
    user = models.ForeignKey(User, null=True)
    category = models.ForeignKey(ExpenseCategory, null=True)

Some Expense rows already exist in the db without an associated category, so make the category field nullable.

Create and apply migrations

python manage.py makemigrations
python manage.py migrate

Let's create an expensecategory from shell and associate it with an expense of user Sheryl.

from django.contrib.auth.models import User
from expenses.models import Expense, ExpenseCategory

u = User.objects.get(username='sheryl')
ec = ExpenseCategory.objects.create(name='Misc', description='Miscellaneous expenses')
e = Expense.objects.create(description='Went to Stockholm', amount=5000, user=u, category=ec)

Get FK fields in response too.

We want category in Expense GET endpoint too.

Our first approach would be adding 'category' to ExpenseResource.Meta.fields. Try it:

fields = ['description', 'amount', 'category']

Try the expense GET endpoint for Sheryl

http://localhost:8000/api/expense/?username=sheryl&api_key=1a23&format=json

We still don't see category in the response. We need something more than this.

Adding fields.ForeignKey on ExpenseResource

There is no easy way to achieve this without adding a resource for ExpenseCategory.

We need to create an ExpenseCategoryResource similar to ExpenseResource

Add ExpenseCategoryResource to expenses/api.py

class ExpenseCategoryResource(ModelResource):
    class Meta:
        queryset = ExpenseCategory.objects.all()
        resource_name = 'expensecategory'

Add proper url pattern for ExpenseCategoryResource in expenses/api.py

expense_category_resource = ExpenseCategoryResource()

urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'^api/', include(expense_resource.urls)),
    url(r'^api/', include(expense_category_resource.urls)),
)

Verify things are properly setup for ExpenseCategoryResource by accessing

http://localhost:8000/api/expensecategory/?format=json

Add the following field to ExpenseResource:

category = fields.ForeignKey(ExpenseCategoryResource, attribute='category', null=True)
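This needs from tastypie import fields at the top of expenses/api.py, and ExpenseCategoryResource must be defined above ExpenseResource in that file. Roughly, with the Meta options we already have (filtering and authentication from the previous post omitted for brevity), ExpenseResource now looks like this:

from tastypie import fields
from tastypie.resources import ModelResource

from .models import Expense


class ExpenseResource(ModelResource):
    # Expose the FK as a related resource; null=True because older expenses
    # have no category yet. ExpenseCategoryResource is defined earlier in
    # this same module (see above).
    category = fields.ForeignKey(ExpenseCategoryResource, attribute='category', null=True)

    class Meta:
        queryset = Expense.objects.all()
        resource_name = 'expense'
        fields = ['description', 'amount', 'category']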

Try

http://localhost:8000/api/expense/?username=sheryl&api_key=1a23&format=json

After this you'll be able to see category in the response.

This will return the resource_uri of the ExpenseCategory by default.

Using full=True

You probably want to see the name and description of the category in the response.

Make the following modification

category = fields.ForeignKey(ExpenseCategoryResource, attribute='category', null=True, full=True)

Try the GET endpoint again

http://localhost:8000/api/expense/?username=sheryl&api_key=1a23&format=json
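A quick way to check the nested representation from a Python shell (URL and credentials as set up in the previous post; the printed values are illustrative):

import requests

r = requests.get('http://localhost:8000/api/expense/',
                 params={'username': 'sheryl', 'api_key': '1a23', 'format': 'json'})
for expense in r.json()['objects']:
    category = expense['category']
    if category:  # may be None for older expenses without a category
        # With full=True this is a nested dict instead of just a resource_uri.
        print(category['name'])         # e.g. 'Misc'
        print(category['description'])  # e.g. 'Miscellaneous expenses'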

POST data with FK

There are several ways in which we can set category on expense while making POST call to create expenses.

Post with resource_uri of FK

We already have one ExpenseCategory in the db and the resource_uri for that expensecategory is '/api/expensecategory/1/'

We want to create an expense and set the category as our earlier created expensecategory.

import json
import requests

headers = {'Content-type': 'application/json'}
post_data = {'description': 'Bought a phone for testing', 'amount': 2200, 'category': '/api/expensecategory/1/'}
post_url = 'http://localhost:8000/api/expense/?username=sheryl&api_key=1a23'
r = requests.post(post_url, data=json.dumps(post_data), headers=headers)
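A quick sanity check that the new expense points at the existing category (run from a Django shell where Expense is imported; values are illustrative):

print(r.status_code)  # expect 201 (created)

e = Expense.objects.latest('id')
print(e.category.name)  # 'Misc' -- the category behind /api/expensecategory/1/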

Posting entire data of FK

You find that the expense you want to create doesn't fit in any of the existing categories. You want to create a new expensecategory while making POST data to expense endpoint.

So we want to create an ExpenseCategory and an Expense together.

You need to post the following data in that case:

post_data = {'description': 'Went to paris to attend a conference', 'amount': 9000, 'category': {'name': 'Travel', 'description': 'Expenses incurred on travelling'}}

No category exists for Travel yet.

Check the count of ExpenseCategory currently so that later you can verify that a new ExpenseCategory is created.

ExpenseCategory.objects.count()
1   #output

POST the data to expense endpoint

r = requests.post(post_url, data=json.dumps(post_data), headers=headers)
print r.status_code    # 401

Why you got a 401

Even though you tried creating an Expense at the expense POST endpoint, tastypie internally tries to create an ExpenseCategory because of the structure of post_data. But tastypie finds that ExpenseCategoryResource doesn't have an authorization class that allows POST yet.

So we need to add proper authorization to ExpenseCategoryResource before this POST call can succeed.

Add the following to ExpenseCategoryResource.Meta

authorization = Authorization()
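Putting it together, ExpenseCategoryResource now looks roughly like this (the Authorization import is the same one used on ExpenseResource in the previous post):

from tastypie.authorization import Authorization
from tastypie.resources import ModelResource

from .models import ExpenseCategory


class ExpenseCategoryResource(ModelResource):
    class Meta:
        queryset = ExpenseCategory.objects.all()
        resource_name = 'expensecategory'
        # Allows any client to write via POST; fine for a tutorial, too
        # permissive for production.
        authorization = Authorization()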

POSTing again

Try the post call again.

r = requests.post(post_url, data=json.dumps(post_data), headers=headers)

This time the call should succeed and a new ExpenseCategory should have been created.

ExpenseCategory.objects.count()
2    #output

Also, the new expense gets associated with the newly created ExpenseCategory.

Christophe Pettus: PostgreSQL and JSON: 2015

From Planet PostgreSQL. Published on Mar 28, 2015.

The slides from my talk at PGConf US 2015 are now available.

Josh Berkus: Crazy SQL Saturday: replacing SciPy with SQL

From Planet PostgreSQL. Published on Mar 28, 2015.

I have a data analytics project which produces multiple statistical metrics for a large volume of sensor data.  This includes percentiles (like median and 90%) as well as average, min and max.  Originally this worked using PL/R, which was pretty good except that some of the R modules were crashy, which was not so great for uptime.

This is why, two years ago, I ripped out all of the PL/R and replaced it with PL/Python and SciPy.  I love SciPy because it gives me everything I liked about R, without most of the things I didn't like.  But now, I've ripped out the SciPy as well.  What am I replacing it with?  Well, SQL.

In version 9.4, Andrew Gierth added support for percentiles to PostgreSQL via WITHIN GROUP aggregates. As far as I'm concerned, this is second only to JSONB in reasons to use 9.4.

Now, one of the more complicated uses I make of aggregates is doing "circular" aggregates, that is producing percentiles for a set of circular directions in an effort to determine the most common facings for certain devices.  Here's the PL/Python function I wrote for this, which calculates circular aggregates using the "largest gap" method.  This algorithm assumes that the heading measurements are essentially unordered, so to find the endpoints of the arc we look for two measurements which are the furthest apart on the circle.  This means shifting the measurements to an imaginary coordinate system where the edge of this gap is the low measurement, calculating percentiles, and then shifting it back.  Note that this method produces garbage if the device turned around a complete circle during the aggregate period.

Now, that SciPy function was pretty good and we used it for quite a while.  But we were unhappy with two things: first, SciPy is rather painful as a dependency because the packaging for it is terrible; second, having PostgreSQL call out to SciPy for each iteration isn't all that efficient.

So, since 9.4 has percentiles now, I started writing a function based on the built-in SQL percentiles. Initially I was thinking it would be a PL/pgSQL function, but was pleasantly surprised to find that I could write it entirely as a SQL function! Truly, Postgres's SQL dialect is Turing-complete.

So here's the new all-SQL function, with some helper functions.

Then I performance tested it, and was pleasantly surprised again. The SciPy version took 2.6 seconds* to aggregate 100,000 sets of 20 measurements. The new SQL version takes 40 milliseconds, cutting response time by 98%. Wow!

And I've eliminated a hard-to-install dependency.  So it's all win.  Of course, if anyone has ideas on making it even faster, let me know.

Pushing the limits of SQL to the edge of insanity.

(* note: I expect that most of the extra time for the SciPy version is in calling out to Python through PL/Python, rather than in SciPy itself.)

Welcome to Our New Staff Members

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Mar 27, 2015.

We’ve hit one of our greatest growth points yet in 2015, adding nine new team members since January to handle our increasing project load. There are many exciting things on the horizon for Caktus and our clients, so it’s wonderful to have a few more hands on deck.

One of the best things about working at Caktus is the diversity of our staff’s interests and backgrounds. In order of their appearance from left to right in the photos above, here’s a quick look at our new Cakti’s roles and some fun facts:

Neil Ashton

Neil was also a Caktus contractor who has made the move to full-time Django developer. He is a keen student of more than programming languages; he holds two degrees in Classics and another Master’s in Linguistics.

Jeff Bradberry

Though Jeff has been working as a contractor at Caktus, he recently became a full-time developer. In his spare time, he likes to play around with artificial intelligence, sometimes giving his creations a dose of inexplicable, random behavior to better mimic us poor humans.

Ross Pike

Ross is our new Lead Designer and has earned recognition for his work from Print, How Magazine, and the AIGA. He also served in the Peace Corps for a year in Bolivia on a health and water mission.

Lucas Rowe

Lucas joins us for six months as a game designer, courtesy of a federal grant to reduce the spread of HIV. When he’s not working on Epic Allies, our HIV medication app, he can be found playing board games or visiting local breweries.

Erin Mullaney

Erin has more than a decade of development experience behind her, making her the perfect addition to our team of Django developers. She loves cooking healthy, vegan meals and watching television shows laden with 90s nostalgia.

Liza Chabot

Liza is an English major who loves to read, write, and organize, all necessary skills as Caktus’ Administrative and Marketing Assistant. She is also a weaver and sells and exhibits her handwoven wall hangings and textiles in the local craft community.

NC Nwoko

NC’s skills are vast in scope. She graduated from UNC Chapel Hill with a BA in Journalism and Mass Communication with a focus on public relations and business as well as a second major in International Studies with a focus on global economics. She now puts this experience to good use as Caktus’ Digital Health Product Manager, but on the weekends you can find her playing video games and reading comic books.

Edward Rowe

Edward is joining us for six months as a game developer for the Epic Allies project. He loves developing games for social good. Outside of work, Edward continues to express his passion for games as an avid indie game developer, UNC basketball fan, and board and video game player.

Rob Lineberger

Rob is our new Django contractor. Rob is a renaissance man; he’s not only a skilled and respected visual artist, he’s trained in bioinformatics, psychology, information systems and knows his way around the kitchen.

To learn more about our team, visit our About Page. And if you’re wishing you could spend your days with these smart, passionate people, keep in mind that we’re still hiring.

Shaun M. Thomas: PG Phriday: High Availability Through Delayed Replication

From Planet PostgreSQL. Published on Mar 27, 2015.

High availability of PostgreSQL databases is incredibly important to me. You might even say it's a special interest of mine. It's one reason I'm both excited and saddened by a feature introduced in 9.4. I'm excited because it's a feature I plan to make extensive use of, and saddened because it has flown under the radar thus far. It's not even listed in the What's new in PostgreSQL 9.4 Wiki page. If they'll let me, I may have to rectify that.

What is this mysterious change that has me drooling all over my keyboard? The new recovery_min_apply_delay standby server setting. In name and intent, it forces a standby server to delay application of upstream changes. The implications, however, are much, much more important.

Let me tell you a story; it’s not long, actually. A couple years ago, I had to help a client that was using a hilariously over-engineered stack to prevent data loss. I only say that because at first glance, the number of layers and duplicate servers would shock most people, and the expense would finish the job. This was one of my full recommended stacks, plus a few extra bits for the truly paranoid. DRBD-bonded servers, Pacemaker failover, off-site disaster recovery streaming clones, nightly backup, off-site backup and historical WAL storage, and long-term tape archival in a vault for up to seven years. You would need to firebomb several cities to get rid of this data.

But data permanence and availability are not synonymous. All it took was a single misbehaving CPU to take out the entire constellation of database servers, and corrupt a bunch of recent WAL files for good measure. How this is possible, and how difficult it is to avoid, is a natural extension of using live streaming replicas for availability purposes. We always need to consider one important factor: immediacy applies to everything.

Here’s what actually happened:

  1. A CPU on master-1 went bad.
  2. Data being written to database files was corrupt.
  3. DRBD copied the bad blocks, immediately corrupting master-2.
  4. Shared memory became corrupt.
  5. Streaming replication copied the corrupt data to dr-master-1.
  6. DRBD copied the bad blocks, immediately corrupting dr-master-2.
  7. In turn, PostgreSQL noticed the corruption and crashed on each server.
  8. Monitoring systems started screaming on all channels.

Just like that, a bulletproof high-availability cluster imploded into itself. All we had left at that point were the pristine backups, and the off-site WAL archives. This is one of the major reasons I wrote walctl, actually. Keeping archived WAL files on a tertiary server isolates them from issues that affect the primary or disaster recovery clusters. Further, it means the files can be pulled by any number of new clones without overloading the masters, which are intended to be used for OLTP.

In this case, we pulled a backup from the off-site backup vault, gathered the WAL files that were generated before the CPU went bad, and got the cluster running again in a couple of hours. But this could have easily been much worse, and without the previously-mentioned expensive paranoia and surplus of redundancy levels, it would have. And despite the fact we recovered everything, there’s still the several-hour outage to address.

You see, we weren't paranoid enough. For a truly highly available architecture, corruption of the data source should always be considered a possibility. Both DRBD and PostgreSQL strive to copy data as quickly as possible, just as they should. Synchronization delay is another huge, but unrelated, problem applications often need to circumvent when communicating with replicas. One way to solve this is to keep a third standby server that uses traditional WAL consumption, and then implement a time delay.

Effectively, this means preventing the extra server from processing WAL files for some period of time. This interval allows a DBA to interrupt replay before corruption reaches a fully online replica. It takes time for monitoring systems to report outages, and for the DBA to log into a server and diagnose the problem. As we’ve seen, it can already be too late; the data is already corrupt, and a backup is the only recourse. But a delayed server is already online, can easily be recovered to a point right before the corruption started, and can drastically reduce the duration of an outage.

There are several ways of imposing this delay, and all of them require at least one more series of scripts or software to strictly regulate file availability. They’re also largely irrelevant since the introduction of PostgreSQL 9.4 and the recovery_min_apply_delay setting. Instead of a cron job, or using a complicated script as the restore_command in recovery.conf, or some other method, we just set this variable and we get the desired offset. Here’s a two-hour window:

recovery_min_apply_delay = '2h'

This works with both streaming replication and more traditional WAL file recovery. There is, however, one caveat to using this setting. Since the replica cannot apply the changes as they're presented, they are held in the pg_xlog directory until the imposed purgatory expires. On highly transactional systems, this can result in unexpected space usage on replicas that activate the setting. The larger the safety margin, the more files will accumulate awaiting replay.

Barring that, it’s a huge win for anyone who wants to run a highly available cluster. In fact, it can even be integrated into cluster automation, so a delayed server is stopped if the primary system is down. This keeps our desired window intact while we investigate the problem, without us having to stop troubleshooting and shut down the time-offset replica ourselves.

In addition, a delayed server can be used for standard recovery purposes. If a user erroneously deletes data, or a rogue process drops a critical object, there’s a replica ready and waiting to let us recover the data and reintroduce it to the master server.

Having a server sitting around with self-inflicted synchronization offset seems ridiculous at first glance. But from the perspective of a DBA, it can literally save the database if used properly. I highly recommend anyone who can afford to implement this technique, does so. Your system uptime will thank you.

Michael Paquier: Postgres 9.5 feature highlight: Scale-out with Foreign Tables now part of Inheritance Trees

From Planet PostgreSQL. Published on Mar 27, 2015.

This week the following commit has landed in PostgreSQL code tree, introducing a new feature that will be released in 9.5:

commit: cb1ca4d800621dcae67ca6c799006de99fa4f0a5
author: Tom Lane <tgl@sss.pgh.pa.us>
date: Sun, 22 Mar 2015 13:53:11 -0400
Allow foreign tables to participate in inheritance.

Foreign tables can now be inheritance children, or parents.  Much of the
system was already ready for this, but we had to fix a few things of
course, mostly in the area of planner and executor handling of row locks.

[...]

Shigeru Hanada and Etsuro Fujita, reviewed by Ashutosh Bapat and Kyotaro
Horiguchi, some additional hacking by me

As mentioned in the commit message, foreign tables can now be part of an inheritance tree, be it as a parent or as a child.

Well, seeing this commit, one phrase comes immediately to mind: in-core sharding. This feature opens up such possibilities, with for example a parent table locally managing a partition made up of foreign child tables located on a set of foreign servers.

PostgreSQL already offers a way to do partitioning by using CHECK constraints (a non-intuitive system, but there may be improvements in this area in the near future). Combined with the feature just committed, here is a small example of how to do sharding without the need for any external plugin or tool, only postgres_fdw being needed to define the foreign tables.

Now let's take the example of 3 Postgres servers, running on the same machine for simplicity, using ports 5432, 5433 and 5434. 5432 will hold a parent table, that has two child tables, the two being foreign tables, located on servers listening at 5433 and 5434. The test case is simple: a log table partitioned by year.

First on the foreign servers, let's create the child tables. Here it is for the table on server 5433:

=# CREATE TABLE log_entry_y2014(log_time timestamp,
       entry text,
       check (date(log_time) >= '2014-01-01' AND
              date(log_time) < '2015-01-01'));
CREATE TABLE

And the second one on 5434:

=# CREATE TABLE log_entry_y2015(log_time timestamp,
       entry text,
       check (date(log_time) >= '2015-01-01' AND
              date(log_time) < '2016-01-01'));
CREATE TABLE

Now it is time to do the rest of the work on server 5432, by creating a parent table, and foreign tables that act as children, themselves linking to the relations on servers 5433 and 5434 already created. First here is some preparatory work to define the foreign servers.

=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
=# CREATE SERVER server_5433 FOREIGN DATA WRAPPER postgres_fdw
   OPTIONS (host 'localhost', port '5433', dbname 'postgres');
CREATE SERVER
=# CREATE SERVER server_5434 FOREIGN DATA WRAPPER postgres_fdw
   OPTIONS (host 'localhost', port '5434', dbname 'postgres');
CREATE SERVER
=# CREATE USER MAPPING FOR PUBLIC SERVER server_5433 OPTIONS (password '');
CREATE USER MAPPING
=# CREATE USER MAPPING FOR PUBLIC SERVER server_5434 OPTIONS (password '');
CREATE USER MAPPING

And now here are the local tables:

=# CREATE TABLE log_entries(log_time timestamp, entry text);
CREATE TABLE
=# CREATE FOREIGN TABLE log_entry_y2014_f (log_time timestamp,
                                           entry text)
   INHERITS (log_entries) SERVER server_5433 OPTIONS (table_name 'log_entry_y2014');
CREATE FOREIGN TABLE
=# CREATE FOREIGN TABLE log_entry_y2015_f (log_time timestamp,
                                           entry text)
   INHERITS (log_entries) SERVER server_5434 OPTIONS (table_name 'log_entry_y2015');
CREATE FOREIGN TABLE

Tuple insertion from the parent table into the children can be achieved using, for example, a plpgsql function like this one, combined with a trigger on the parent relation log_entries.

=# CREATE FUNCTION log_entry_insert_trigger()
   RETURNS TRIGGER AS $$
   BEGIN
     IF date(NEW.log_time) >= '2014-01-01' AND date(NEW.log_time) < '2015-01-01' THEN
       INSERT INTO log_entry_y2014_f VALUES (NEW.*);
     ELSIF date(NEW.log_time) >= '2015-01-01' AND date(NEW.log_time) < '2016-01-01' THEN
       INSERT INTO log_entry_y2015_f VALUES (NEW.*);
     ELSE
       RAISE EXCEPTION 'Timestamp out-of-range';
     END IF;
     RETURN NULL;
   END;
   $$ LANGUAGE plpgsql;
CREATE FUNCTION
=# CREATE TRIGGER log_entry_insert BEFORE INSERT ON log_entries
   FOR EACH ROW EXECUTE PROCEDURE log_entry_insert_trigger();
CREATE TRIGGER

Once the environment is in place, log entries can be inserted on the parent table, and will be automatically sharded across the foreign servers.

=# INSERT INTO log_entries VALUES (now(), 'Log entry of 2015');
INSERT 0 0
=# INSERT INTO log_entries VALUES (now() - interval '1 year', 'Log entry of 2014');
INSERT 0 0
=# INSERT INTO log_entries VALUES (now(), 'Log entry of 2015-2');
INSERT 0 0
=# INSERT INTO log_entries VALUES (now() - interval '1 year', 'Log entry of 2014-2');
INSERT 0 0

The entries inserted are of course localized on their dedicated foreign tables:

=# SELECT * FROM log_entry_y2014_f;
          log_time          |        entry
----------------------------+---------------------
 2014-03-27 22:34:04.952531 | Log entry of 2014
 2014-03-27 22:34:28.06422  | Log entry of 2014-2
(2 rows)
=# SELECT * FROM log_entry_y2015_f;
          log_time          |        entry
----------------------------+---------------------
 2015-03-27 22:31:19.042066 | Log entry of 2015
 2015-03-27 22:34:18.425944 | Log entry of 2015-2
(2 rows)

Something useful to note as well is that EXPLAIN is now verbose enough to identify all the tables targeted by a DML query. For example in this case (this is not limited to foreign tables):

=# EXPLAIN UPDATE log_entries SET log_time = log_time + interval '1 day';
                                      QUERY PLAN
-----------------------------------------------------------------------------------
 Update on log_entries  (cost=0.00..296.05 rows=2341 width=46)
   Update on log_entries
   Foreign Update on log_entry_y2014_f
   Foreign Update on log_entry_y2015_f
   ->  Seq Scan on log_entries  (cost=0.00..0.00 rows=1 width=46)
   ->  Foreign Scan on log_entry_y2014_f  (cost=100.00..148.03 rows=1170 width=46)
   ->  Foreign Scan on log_entry_y2015_f  (cost=100.00..148.03 rows=1170 width=46)
(7 rows)

And this makes a day.

High Performance Django Infrastructure Preview

By Lincoln Loop from Django community aggregator: Community blog posts. Published on Mar 26, 2015.

One of the most common requests we've heard since releasing our book, High Performance Django is: "Do you have more code/configuration examples?" It's a pretty loaded question because the book covers everything from Python code to deploying and configuring servers. After some thinking on how to deliver this in a format people could easily understand, I realized the answer was already right under our noses.

We've been users of Salt for configuration management for almost three years. Over the last few weeks I've been extracting our internal Salt states into a reusable and extensible system I like to call "infrastructure-in-a-box". It encompasses all the lessons we've learned over the years with our different clients and allows anyone to set up a complete Python website (load balancer, web accelerator, cache, database, task queue, etc.) in about 15 minutes. The exact same code can be used to set up a single Vagrant machine on your laptop or a ten-server farm in the cloud. I whipped together a quick screencast preview of it in action (apologies for the low-quality audio):

I'm really excited about being able to offer this as a companion product to our book. It's going to save people a lot of time and money (not to mention heartache) figuring it out on their own.

Here's the thing though, I need your feedback to get this released. I know this would have been useful for us and for many of our clients, but there's a lot of work left to take it from where it is to a polished product. Is this something that interests you? What topics would you like to see included? Leave a comment or send us an email with your thoughts. Thanks!

RevSys Roundup - March 2015

By Revolution Systems Blog from Django community aggregator: Community blog posts. Published on Mar 26, 2015.

How to automatically and professionally remove photo backgrounds

By Cloudinary Blog - Django from Django community aggregator: Community blog posts. Published on Mar 26, 2015.

It is common for e-commerce, media, and news sites to remove image backgrounds or make them transparent in order to place the main element of the image on either white or color backgrounds. The final result better integrates an image into a site or specific page’s graphic design. For example, a fashion site that presents clothes or shoes should have the main element of a photo (e.g. shoes) extracted from the original image background, then edited to fit the site’s catalogue design and structure.

Remove-The-Background add-on

We are glad to introduce the Remove-The-Background editing add-on, a third party image processing add-on that supports image background removal. This add-on is brought to you by Remove-The-Background, a leading vendor of image editing solution components, including professional background removal performed by a team of human experts. We, at Cloudinary, have tried it multiple times and the results were pretty impressive.

There are automatic tools that can aid in background removal. Nonetheless, if your goal is to create perfect results, utilizing a graphic editor/designer would be your best bet. However, instead of hiring an in-house or freelance designer, Cloudinary’s Remove-The-Background add-on makes this process much simpler. Since the new add-on is fully integrated into Cloudinary's image management pipeline, when you upload an image, you can easily and automatically have it edited by Remove-The-Background experts.

How to remove a photo background with Cloudinary

We’d like to demonstrate this process, starting with the picture below:

Ruby:
cl_image_tag("shoes_orig.jpg")
PHP:
cl_image_tag("shoes_orig.jpg")
Python:
CloudinaryImage("shoes_orig.jpg").image()
Node.js:
cloudinary.image("shoes_orig.jpg")
Java:
cloudinary.url().imageTag("shoes_orig.jpg")
jQuery:
$.cloudinary.image("shoes_orig.jpg")
.Net:
cloudinary.Api.UrlImgUp.BuildImageTag("shoes_orig.jpg")
Original shoes image

You can begin the process either while the photo is being uploaded to Cloudinary, using the upload API demonstrated in the code sample below, or by using the Admin API for images that have already been uploaded. Simply specify the background_removal parameter in either API.

Ruby:
Cloudinary::Uploader.upload("shoes.jpg",
  :public_id => "shoes",
  :background_removal => 'remove_the_background')
PHP:
\Cloudinary\Uploader::upload("shoes.jpg", 
  array(
    "public_id" => "shoes",
    "background_removal" => "remove_the_background"
  ));
Python:
cloudinary.uploader.upload("shoes.jpg",
  public_id = "shoes",
  background_removal = "remove_the_background")
Node.js:
cloudinary.uploader.upload("shoes.jpg", 
  function(result) { console.log(result); }, 
  { public_id: "shoes",
    background_removal: "remove_the_background" });
Java:
cloudinary.uploader().upload("shoes.jpg", Cloudinary.asMap(
  "public_id", "shoes",
  "background_removal", "remove_the_background"));

As mentioned above, the actual background removal is performed by Remove-The-Background’s team of experts and it could therefore take up to 24 hours to complete. Cloudinary processes the request asynchronously, then when the background removal is complete, the original uploaded image is replaced by an edited one. A backup of the original image is automatically saved to Cloudinary. It is also possible to receive a notification that indicates when the editing process is complete. Below, you can see how the picture's background was removed with great results:

Ruby:
cl_image_tag("shoes.jpg")
PHP:
cl_image_tag("shoes.jpg")
Python:
CloudinaryImage("shoes.jpg").image()
Node.js:
cloudinary.image("shoes.jpg")
Java:
cloudinary.url().imageTag("shoes.jpg")
jQuery:
$.cloudinary.image("shoes.jpg")
.Net:
cloudinary.Api.UrlImgUp.BuildImageTag("shoes.jpg")
Resulting image with background removed
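If you would rather be pinged when the asynchronous editing finishes than poll for it, the upload API also accepts a notification_url option pointing at your own webhook endpoint (the URL below is a placeholder; check the add-on documentation for the exact callback payload). A sketch in Python:

import cloudinary.uploader

# Assumes cloudinary is already configured (cloud_name, api_key, api_secret).
# Request background removal and ask Cloudinary to call our webhook once the
# asynchronous editing by Remove-The-Background is complete.
cloudinary.uploader.upload("shoes.jpg",
    public_id="shoes",
    background_removal="remove_the_background",
    notification_url="https://example.com/cloudinary/webhook")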

Pictures can be further manipulated to fit your own graphics and design using Cloudinary's manipulation URLs. For example, below, you can see the same image cropped to 250 x 250, with increased saturation.

Ruby:
cl_image_tag("shoes.jpg", :width=>250, :height=>250, :crop=>:fill, :effect=>"saturation:80")
PHP:
cl_image_tag("shoes.jpg", array("width"=>250, "height"=>250, "crop"=>"fill", "effect"=>"saturation:80"))
Python:
CloudinaryImage("shoes.jpg").image(width=250, height=250, crop="fill", effect="saturation:80")
Node.js:
cloudinary.image("shoes.jpg", {width: 250, height: 250, crop: "fill", effect: "saturation:80"})
Java:
cloudinary.url().transformation(new Transformation().width(250).height(250).crop("fill").effect("saturation:80")).imageTag("shoes.jpg")
jQuery:
$.cloudinary.image("shoes.jpg", {width: 250, height: 250, crop: "fill", effect: "saturation:80"})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation().Width(250).Height(250).Crop("fill").Effect("saturation:80")).BuildImageTag("shoes.jpg")
250x250 cropped shoes image with background removed

This add-on can remove the background from any type of photo, including pictures of people.

Ruby:
cl_image_tag("woman.jpg")
PHP:
cl_image_tag("woman.jpg")
Python:
CloudinaryImage("woman.jpg").image()
Node.js:
cloudinary.image("woman.jpg")
Java:
cloudinary.url().imageTag("woman.jpg")
jQuery:
$.cloudinary.image("woman.jpg")
.Net:
cloudinary.Api.UrlImgUp.BuildImageTag("woman.jpg")
Original woman photo Woman photo with background removed

The images below have been dynamically created using Cloudinary's manipulation URLs. 200x200 face-detection based thumbnails were created. The image on the left is a thumbnail of the original image while the image on the right is a thumbnail with the background removed.

Ruby:
cl_image_tag("woman.jpg", :width=>200, :height=>200, :crop=>:thumb, :gravity=>:face)
PHP:
cl_image_tag("woman.jpg", array("width"=>200, "height"=>200, "crop"=>"thumb", "gravity"=>"face"))
Python:
CloudinaryImage("woman.jpg").image(width=200, height=200, crop="thumb", gravity="face")
Node.js:
cloudinary.image("woman.jpg", {width: 200, height: 200, crop: "thumb", gravity: "face"})
Java:
cloudinary.url().transformation(new Transformation().width(200).height(200).crop("thumb").gravity("face")).imageTag("woman.jpg")
jQuery:
$.cloudinary.image("woman.jpg", {width: 200, height: 200, crop: "thumb", gravity: "face"})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation().Width(200).Height(200).Crop("thumb").Gravity("face")).BuildImageTag("woman.jpg")
Thumbnail of original woman photo Thumbnail of woman photo with background removed

Remove The Background supports additional editing profiles that can be specified via Cloudinary’s API (e.g. keep/remove shadow, transparent/white background, and more). Please contact us if you need a custom editing profile. For more details about this add-on check out our Remove-The-Background add-on documentation.

Final Notes

Cloudinary’s Remove-The-Background add-on helps preserve your site or app’s professional look, without the need for in-house graphic designers or long and complex editing processes. Customers of the Basic plan or higher can try the Remove-The-Background add-on for free and later subscribe to a plan that best meets their specific requirements.

If you don't have a Cloudinary account yet, sign up for a free account here.

Josh Berkus: Save the Date: pgConf Silicon Valley

From Planet PostgreSQL. Published on Mar 26, 2015.

On November 18th, 2015, we will have an independent, multi-track conference all about high performance PostgreSQL: pgConf SV. This conference is being organized by CitusData at the South San Francisco Convention Center. Stay tuned for call for presentations, sponsorships, and more details soon.

David Fetter: Formatting!

From Planet PostgreSQL. Published on Mar 26, 2015.

SQL is code.

This may seem like a simple idea, but out in the wild, you will find an awful lot of SQL programs which consist of a single line, which makes them challenging to debug.

Getting it into a format where debugging was reasonably easy used to be tedious and time-consuming, but no more!

Peter Eisentraut: Retrieving PgBouncer statistics via dblink

From Planet PostgreSQL. Published on Mar 25, 2015.

PgBouncer has a virtual database called pgbouncer. If you connect to that you can run special SQL-like commands, for example

$ psql -p 6432 pgbouncer
=# SHOW pools;
┌─[ RECORD 1 ]───────────┐
│ database   │ pgbouncer │
│ user       │ pgbouncer │
│ cl_active  │ 1         │
│ cl_waiting │ 0         │
│ sv_active  │ 0         │
│ sv_idle    │ 0         │
│ sv_used    │ 0         │
│ sv_tested  │ 0         │
│ sv_login   │ 0         │
│ maxwait    │ 0         │
└────────────┴───────────┘

This is quite nice, but unfortunately, you cannot run full SQL queries against that data. So you couldn’t do something like

SELECT * FROM pgbouncer.pools WHERE maxwait > 0;

Well, here is a way: From a regular PostgreSQL database, connect to PgBouncer using dblink. For each SHOW command provided by PgBouncer, create a view. Then that SQL query actually works.

But before you start doing that, I have already done that here:

Here is another useful example. If you’re tracing back connections from the database server through PgBouncer to the client, try this:

SELECT * FROM pgbouncer.servers LEFT JOIN pgbouncer.clients ON servers.link = clients.ptr;

Unfortunately, different versions of PgBouncer return a different number of columns for some commands. Then you will need different view definitions. I haven’t determined a way to handle that elegantly.

Rajeev Rastogi: Index Scan Optimization for ">" condition

From Planet PostgreSQL. Published on Mar 24, 2015.

In PostgreSQL 9.5, we can see improved performance for index scans on a ">" condition.

In order to explain this optimization, consider the below schema:
create table tbl2(id1 int, id2 varchar(10), id3 int);
create index idx2 on tbl2(id2, id3);

Query as:
                select count(*) from tbl2 where id2>'a' and id3>990000;

As per the design prior to this patch, the above query used the following steps to retrieve index tuples:

  • Find the scan start position by searching for the first position in the BTree that satisfies the first key condition, i.e. id2>'a'.
  • Then it fetches each tuple from the position found in step 1.
  • For each tuple, it checks all scan key conditions; in our example it checks both scan key conditions.
  • If the conditions match, it returns the tuple; otherwise the scan stops.


Now, the problem here is that the first scan key condition was already matched in order to find the scan start position (step 1), so it is obvious that any further tuple will also match the first scan key condition (as the records are sorted).

So comparing the first scan key condition again in step 3 is redundant.

So we changed the BTree scan algorithm to avoid the redundant check, i.e. to remove the first key comparison for each tuple, as it is guaranteed to always be true.

Performance result summary:

[Performance chart from the original post omitted.]

I would like to thank Simon Riggs for verifying and committing this patch. Simon Riggs also confirmed an improvement of 5% on both short and long indexes with the least beneficial data type, which is considered a very positive win overall.

Twitter way back machine

By Thierry Schellenbach from Django community aggregator: Community blog posts. Published on Mar 24, 2015.

One of the associates at Techstars created this beautiful demo of Stream’s technology (github repo). If you ever wondered about Thomas Edison’s or Nikola Tesla’s tweets, check it out! :)

Josh Berkus: pgDay SF recap

From Planet PostgreSQL. Published on Mar 24, 2015.

On March 10th, we had our third ever pgDay for SFPUG, which was a runaway success. pgDaySF 2015 was held together with FOSS4G-NA and EclipseCon; we were especially keen to join FOSS4G because of the large number of PostGIS users attending the event. In all, around 130 DBAs, developers and geo geeks joined us for pgDay SF ... so many that the conference had to reconfigure the room to add more seating!

standing room only

The day started out with Daniel Caldwell showing how to use PostGIS for offline mobile data, including a phone demo.

Daniel Caldwell setting up

Ozgun Erdogan presented pg_shard with a short demo.

Ozgun presents pg_shard with PostGIS

Gianni Ciolli flew all the way from London to talk about using Postgres' new Logical Decoding feature for database auditing.

Gianni Ciolli presenting

Peak excitement of the day was Paul Ramsey's "PostGIS Feature Frenzy" presentation.

Paul Ramsey making quotes

We also had presentations by Mark Wong and Bruce Momjian, and lightning talks by several presenters. Slides for some sessions are available on the FOSS4G web site. According to FOSS4G, videos will be available sometime soon.

Of course, we couldn't have done it without our sponsors: Google, EnterpriseDB, 2ndQuadrant, CitusDB and pgExperts. So a big thank you to our sponsors, our speakers, and the staff of FOSS4G-NA for creating a great day.

gabrielle roth: Upgrading an existing RDS database to Postgres 9.4

From Planet PostgreSQL. Published on Mar 23, 2015.

Last Thursday, I had this short and one-sided conversation with myself: “Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 […]

Heikki Linnakangas: pg_rewind in PostgreSQL 9.5

From Planet PostgreSQL. Published on Mar 23, 2015.

Before PostgreSQL got streaming replication, back in version 9.0, people kept asking when we’re going to get replication. That was a common conversation-starter when standing at a conference booth. I don’t hear that anymore, but this dialogue still happens every now and then:

- I have streaming replication set up, with a master and standby. How do I perform failover?
- That’s easy, just kill the old master node, and run “pg_ctl promote” on the standby.
- Cool. And how do I fail back to the old master?
- Umm, well, you have to take a new base backup from the new master, and re-build the node from scratch..
- Huh, what?!?

pg_rewind is a better answer to that. One way to think of it is that it’s like rsync on steroids. Like rsync, it copies files that differ between the source and target. The trick is in how it determines which files have changed. Rsync compares timestamps, file sizes and checksums, but pg_rewind understands the PostgreSQL file formats, and reads the WAL to get that information instead.

I started hacking on pg_rewind about a year ago, while working for VMware. I got it working, but it was a bit of a pain to maintain. Michael Paquier helped to keep it up-to-date, whenever upstream changes in PostgreSQL broke it. A big pain was that it has to scan the WAL, and understand all different WAL record types – miss even one and you might end up with a corrupt database. I made big changes to the way WAL-logging works in 9.5, to make that easier. All WAL record types now contain enough information to know what block it applies to, in a common format. That slashed the amount of code required in pg_rewind, and made it a lot easier to maintain.

I have just committed pg_rewind into the PostgreSQL git repository, and it will be included in the upcoming 9.5 version. I always intended pg_rewind to be included in PostgreSQL itself; I started it as a standalone project to be able to develop it faster, outside the PostgreSQL release cycle, so I’m glad it finally made it into the main distribution now. Please give it a lot of testing!

PS. I gave a presentation on pg_rewind in Nordic PGDay 2015. It was a great conference, and I think people enjoyed the presentation. Have a look at the slides for an overview on how pg_rewind works. Also take a look at the page in the user manual.

Getting started with Django tastypie

By Agiliq Blog: Django web development from Django community aggregator: Community blog posts. Published on Mar 23, 2015.

Django tastypie is a library to write RESTful apis in Django.

Why use REST

You have a database backed web application. This application tracks expenses. The application allows you to enter your expenses, view all your expenses, delete an expense, etc. Essentially this application provides CRUD functionality. The Django application has access to the database credentials, but they are never seen by the users of the web application. The Django application decides what to show to which user, and ensures that a particular user only sees the expenses entered by him and not somebody else's expenses.

Now you want to provide a mobile application (Android or iOS) corresponding to this web application. The Android application should allow the user to view his expenses, create an expense, and use any other CRUD functionality. But database credentials cannot be put in the Android code, as it is not too hard to decompile an apk and get the db credentials. And we never want a user to get the db credentials, else he would be in a position to see everyone's expenses and the entire database. So there has to be another way to allow mobile applications to get things from the database. This is where REST comes into the picture.

With REST, we have three components: a database, a Django application and a mobile application. The mobile application never accesses the database directly. It makes a REST api call to the Django application, and also sends an api_key specific to the mobile user. Based on the api_key, the Django application determines what data to make visible to this particular api_key owner and sends the corresponding data in the response.

Resource

REST stands for Representational State Transfer. It is a standard for transferring the state of a Resource, from web to mobile.

What do I mean by state of a Resource?

An expense could be a resource. A person could be a resource. A blog post could be a resource. Basically any object or instance your program deals with could be a resource. And a resource's state is maintained in its attributes. e.g. you could have a model called Expense; the state of an expense instance is represented by its attributes.

Any REST library should be able to create and return a representation of such resource, which simply stated means that REST library should be able to tell us the attributes and their values for different model instances. And tastypie is adept at doing this.

Setting up the application

I am using Django 1.7. Some things might be different for you if you are using different version of Django.

As with all projects, I want to keep things in a virtual environment

$ mkvirtualenv tastier
$ workon tastier

Install Django

$ pip install Django

Start a Django project

(tastier) $ django-admin.py startproject tastier

(tastier) $ cd tastier/

Start an app

(tastier) $ python manage.py startapp expenses

Add this app to INSTALLED_APPS

Run migration

(tastier)~ $ python manage.py migrate

Runserver

(tastier)~ $ python manage.py runserver

Check that you are able to access http://localhost:8000/admin/login/

I have pushed the code for this project to Github. You will be able to checkout at different commits in the project to see specific things.

Getting started

Install django-tastypie.

(tastier) $ pip install django-tastypie

Create a file called expenses/api.py where you will keep all the tastypie related things.

Suppose your program deals with a resource called Expense. Let's create a model Expense in expenses/models.py

class Expense(models.Model):
    description = models.CharField(max_length=100)
    amount = models.IntegerField()

Run migrations

python manage.py makemigrations
python manage.py migrate

We will later add a ForeignKey(User) to Expense to associate an expense with User. Don't worry about it for now, we will come back to it.

Let's add few Expense instances in the database.

from expenses.models import Expense

Expense.objects.create(description='Ate pizza', amount=100)
Expense.objects.create(description='Went to Cinema', amount=200)

Handling GET

You want the ability to get the representation of all expenses in your program at url "http://localhost:8000/api/expenses/".

To deal with a resource, tastypie requires a class which extends ModelResource. Let's call our class ExpenseResource. Add the following to expenses/api.py:

from tastypie.resources import ModelResource

from .models import Expense

class ExpenseResource(ModelResource):

    class Meta:
        queryset = Expense.objects.all()
        resource_name = 'expense'

And you need to add the following to tastier/urls.py

from expenses.api import ExpenseResource

expense_resource = ExpenseResource()

urlpatterns = patterns('',
    url(r'^admin/', include(admin.site.urls)),
    url(r'^api/', include(expense_resource.urls)),
)

GET all expenses

After this you should be able to hit

http://localhost:8000/api/expense/?format=json

and you will see all the expenses from the database in the response.

The response would be:

{"meta": {"limit": 20, "next": null, "offset": 0, "previous": null, "total_count": 2}, "objects": [{"amount": 100, "description": "Ate pizza", "id": 1, "resource_uri": "/api/expense/1/"}, {"amount": 200, "description": "Went to Cinema", "id": 2, "resource_uri": "/api/expense/2/"}]}

You will find the representation of the expense instances in the objects key of the response.
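If you are consuming this endpoint from Python rather than the browser, a minimal sketch of fetching and reading the list (same URL as above):

import requests

resp = requests.get('http://localhost:8000/api/expense/', params={'format': 'json'}).json()
print(resp['meta']['total_count'])  # 2
for obj in resp['objects']:
    print(obj['resource_uri'])      # e.g. /api/expense/1/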

Get a particular expense

You can get the representation of expense with id 1 at

http://localhost:8000/api/expense/1/?format=json

See how you are able to hit these two urls without adding them in urlpatterns. These urlpatterns are added by tastypie internally.

How do these endpoints help, and how do they tie in with the mobile application example?

If the mobile app wants to show all the expenses, it could use the url http://localhost:8000/api/expense/?format=json, get the response, parse the response and show the result in the app.

Right now every user will see all the expenses. As we move forward we will see how only a user's expenses will be returned when a REST call is made from his/her mobile device.

Serialization

You must have realized that REST returns you serialized data. You might be wondering why use django-tastypie to achieve it, and not just use json.dumps. You can undoubtedly use json.dumps and not use django-tastypie to provide REST endpoints. But django-tastypie makes many more things very easy, as you will soon agree. Just hang on.
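For contrast, a hand-rolled version of the GET endpoint without tastypie might look like the sketch below (the view name and URL wiring are illustrative); filtering, pagination, authentication and POST handling would then all be yours to write:

import json

from django.http import HttpResponse

from expenses.models import Expense


def expense_list(request):
    # Manually serialize each Expense; tastypie does this (and much more) for us.
    data = [{'id': e.id, 'description': e.description, 'amount': e.amount}
            for e in Expense.objects.all()]
    return HttpResponse(json.dumps({'objects': data}),
                        content_type='application/json')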

Changing Meta.resource_name

You can change ExpenseResource.Meta.resource_name from expense to expenditure.

class ExpenseResource(ModelResource):

    class Meta:
        queryset = Expense.objects.all()
        resource_name = 'expenditure'

And then the old urls will stop working. Your new GET urls in that case will be

http://localhost:8000/api/expenditure/?format=json
http://localhost:8000/api/expenditure/1/?format=json

Changing the resource_name changes the urls tastypie makes available to you.

Now change the resource_name back to expense.

We have our first commit at this point. You can checkout to this commit to see the code till this point.

git checkout b6a9c6

Meta.fields

Suppose you only want description in expense representation, but don't want amount. So you can add a fields attribute on ExpenseResource.Meta

class Meta:
    queryset = Expense.objects.all()
    resource_name = 'expense'
    fields = ['description']

Try

http://localhost:8000/api/expense/?format=json

So if you don't have fields attribute on Meta, all the attributes of Model will be sent in response. If you have fields, only attributes listed in fields will be sent in response.

Let's add amount also to fields. Though this gives us the same behaviour as not having ExpenseResource.Meta.fields at all.

class Meta:
    queryset = Expense.objects.all()
    resource_name = 'expense'
    fields = ['description', 'amount']

We have our second commit at this point. You can checkout till this point by doing:

git checkout 61194c

Filtering

Suppose you only want the Expenses where amount exceeds 150.

If we had to do this with Django model we would say:

Expense.objects.filter(amount__gt=150)

amount__gt is the key thing here. This could be appended to our url as a query parameter to get the expenses where amount exceeds 150.

This could be achieved at url

http://localhost:8000/api/expense/?amount__gt=150&format=json

Try this. You will get an error because we haven't asked tastypie to allow filtering yet.

Add filtering attribute to ExpenseResource.Meta

class Meta:
    queryset = Expense.objects.all()
    resource_name = 'expense'
    fields = ['description', 'amount']
    filtering = {
        'amount': ['gt']
    }

You should be able to use

http://localhost:8000/api/expense/?amount__gt=150&format=json

This will only return the expenses where amount exceeds 150.

Now we want to get all the expenses on pizza. We could get pizza expenses in the following way from the shell:

Expense.objects.filter(description__icontains='pizza')

So to achieve this in the api, we need to make the following changes to ExpenseResource.Meta.filtering:

class Meta:
    queryset = Expense.objects.all()
    resource_name = 'expense'
    fields = ['description', 'amount']
    filtering = {
        'amount': ['gt'],
        'description': ['icontains']
    }

And then the following url would give us the pizza expenses:

http://localhost:8000/api/expense/?description__icontains=pizza&format=json
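Both filters are now registered, so they can also be combined in a single request; tastypie applies them together like a chained queryset filter. For example, from Python:

import requests

# Pizza expenses costing more than 150
r = requests.get('http://localhost:8000/api/expense/',
                 params={'description__icontains': 'pizza', 'amount__gt': 150, 'format': 'json'})
print(r.json()['objects'])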

With GET endpoints we were able to do the Read operations. With POST we will be able to do Create operations, as we will see in next section.

Handling POST

It's hard to do POST from the browser. So we will use requests library to achieve this.

Check expense count before doing POST.

>>> Expense.objects.count()
2

Tastypie by default doesn't authorize a person to make POST requests. The default authorization class is ReadOnlyAuthorization, which allows GET calls but doesn't allow POST calls. So you will have to disable authorization checks for the time being. Add the following to ExpenseResource.Meta

authorization = Authorization()

You'll need to import Authorization class for it.

from tastypie.authorization import Authorization

After this, ExpenseResource would look like:

class ExpenseResource(ModelResource):

    class Meta:
        queryset = Expense.objects.all()
        resource_name = 'expense'
        fields = ['description', 'amount']
        filtering = {
            'amount': ['gt'],
            'description': ['icontains']
        }
        authorization = Authorization()

Don't get into detail of Authorization for now, I will come back to it.

Let's make a POST request to our REST endpoint, which will create an Expense object in the database.

post_url = 'http://localhost:8000/api/expense/'
post_data = {'description': 'Bought first Discworld book', 'amount': 399}
headers = {'Content-type': 'application/json'}
import requests
import json
r = requests.post(post_url, json.dumps(post_data), headers=headers)
>>> print r.status_code
201

status_code 201 means that your Expense object was properly created. You can also verify it by checking that Expense count increased by 1.

>>> Expense.objects.count()
3

If you hit the GET endpoint from your browser, you will see this new Expense object too in the response. Try

http://localhost:8000/api/expense/?format=json

We have third commit at this point.

git checkout 749cf3

Explanation of POST

  • You need to POST at the same url where you get all the expenses. Compare the two urls.
  • One way of posting is to POST json encoded data. So we used json.dumps
  • If you are sending JSON-encoded data, you need to send the appropriate Content-type header too.


How this ties in with mobile

Android and iOS both provide ways to make a POST request to a given URL with headers. So you tell the mobile app about the endpoint it needs to POST to and the data to post. The app calls this REST endpoint, the posted data is handled by Django tastypie, and the proper row is created in the database.

Adding authentication

Currently GET and POST endpoints respond to every request. So even users who aren't registered with the site will be able to see the expenses. Our first step is ensuring that only registered users are able to use the GET endpoints.

Api tokens and sessions

In a web application, a user logs in once and is then able to make any number of web requests without being asked to log in every time. For example, a user logs in once and can then see her expense list page. After the first request she can refresh the page and still get a response without being asked for her login credentials again. This works because Django uses sessions and cookies to store user state. The browser sends a cookie to Django every time the user makes a request, and the Django app can associate the cookie with a user and show the data for that particular user.

With mobile apps there is no concept of sessions, unless the app is working with a WebView. The mobile-app equivalent of a session is an API key: an api key is associated with a user, every REST call includes this api key, and tastypie uses the key to verify which user is making the request.

Creating user and api token

Let's create a user in our system and a corresponding api token for her.

On a shell

from django.contrib.auth.models import User
u = User.objects.create_user(username='sheryl', password='abc', email='sheryl@abc.com')

Tastypie provides a model called ApiKey which allows storing tokens for users. Let's create a token for Sheryl.

from tastypie.models import ApiKey
ApiKey.objects.create(key='1a23', user=u)

We are setting the api token for sheryl as '1a23'

You need to ensure tastypie is in INSTALLED_APPS and that you have migrated before you can create an ApiKey instance.
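For reference, a minimal sketch of the relevant settings; the app name 'expenses' comes from this tutorial, and the rest of your INSTALLED_APPS stays as it is.

# settings.py (sketch)
INSTALLED_APPS = [
    # ... the default Django apps ...
    'tastypie',
    'expenses',
]

# then create tastypie's tables (ApiKey among them):
# python manage.py migrate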

The default authentication class provided by tastypie is Authentication, which allows anyone to make GET requests. We need to set ExpenseResource.Meta.authentication to ensure that only users who provide a valid api key get a response from the GET endpoints.

Add the following to ExpenseResource.Meta.

authentication = ApiKeyAuthentication()

You need to import ApiKeyAuthentication.

from tastypie.authentication import ApiKeyAuthentication

Try the GET endpoint to get the list of expenses

http://localhost:8000/api/expense/?format=json

You will not see anything in the response. If you look at your runserver terminal, you'll notice that status code 401 is returned.

The api key must be sent with the request to get a proper response.

Try the following url

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23

With this, Sheryl will get a proper api response.

Try sending a wrong api_key for sheryl and you will not see a proper response.

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a2

With this we ensure that only registered users of the system with a proper api key are able to make GET requests.

Fourth commit at this point

git checkout 48725f

How this ties in with mobile app

When the user installs the app, he logs in with his username and password for the first time. These credentials are sent to the Django server in a REST call, and the server returns the api key corresponding to this user. The mobile app stores this token locally and then uses it for every subsequent REST call, so the user doesn't have to provide the credentials anymore.
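The tutorial doesn't implement that credentials-for-key endpoint, but a minimal sketch of one could look like the plain Django view below. The view name and URL are hypothetical; everything else uses Django and tastypie pieces we have already seen, and JsonResponse assumes Django 1.7+ (which holds here, since the tutorial already uses makemigrations).

# expenses/views.py (sketch); wire it to a URL such as /api/get-api-key/
from django.contrib.auth import authenticate
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from tastypie.models import ApiKey


@csrf_exempt
def obtain_api_key(request):
    # the mobile app POSTs username and password once, right after install
    user = authenticate(username=request.POST.get('username'),
                        password=request.POST.get('password'))
    if user is None:
        return JsonResponse({'error': 'invalid credentials'}, status=401)
    # reuse the user's key if it exists, otherwise let tastypie generate one
    api_key, _ = ApiKey.objects.get_or_create(user=user)
    return JsonResponse({'username': user.username, 'api_key': api_key.key})

Tastypie also ships a create_api_key signal handler in tastypie.models that you can connect to the User model's post_save signal, so that every new user automatically gets a key.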

Making unauthenticated POST requests

Unauthenticated POST requests will not work anymore

Try creating an Expense without passing any api key.

post_data = {'description': 'Bought Two scoops of Django', 'amount': 399}
headers = {'Content-type': 'application/json'}
r = requests.post("http://localhost:8000/api/expense/", data=json.dumps(post_data), headers=headers)
print r.status_code       #This will give 401

Check that the Expense count hasn't increased.

Making authenticated POST requests

You only need to change the URL to include username and api_key. This makes the request authenticated.

r = requests.post("http://localhost:8000/api/expense/?username=sheryl&api_key=1a23", data=json.dumps(post_data), headers=headers)

This should work, and the Expense count should increase by one.

Try with a wrong api_key and it will fail.

Getting only User's expense

So far we haven't associated an Expense with a User. Let's add a ForeignKey from Expense to User.

Expense model becomes:

from django.db import models
from django.contrib.auth.models import User


class Expense(models.Model):
    description = models.CharField(max_length=100)
    amount = models.IntegerField()
    user = models.ForeignKey(User, null=True)

Since we already have some Expenses in the db which aren't associated with a User, we keep user as a nullable field.

Make and run migrations

python manage.py makemigrations
python manage.py migrate

Right now our authorization class is set to Authorization. With this every user is authorized to see every expense. We will have to add a custom authorization class to enforce that users see only their expenses.

Add the following to expenses/api.py

class ExpenseAuthorization(Authorization):

    def read_list(self, object_list, bundle):
        return object_list.filter(user=bundle.request.user)

And change authorization on ExpenseResource.Meta so it becomes:

class ExpenseResource(ModelResource):

    class Meta:
        queryset = Expense.objects.all()
        resource_name = 'expense'
        fields = ['description', 'amount']
        filtering = {
            'amount': ['gt'],
            'description': ['icontains']
        }
        authorization = ExpenseAuthorization()
        authentication = ApiKeyAuthentication()

Explanation of ExpenseAuthorization

  • When the GET endpoint is called for the expense list, an object_list is created which contains all the expenses.
  • After this, authorization is checked, where further filtering can be done.
  • In case of a GET on the list endpoint, the authorization class's read_list() method is called, and object_list is passed to it.
  • In tastypie there is an object called bundle, and bundle has access to the request via bundle.request.
  • When authentication is used properly, bundle.request.user is populated with the correct user.

Try expense list endpoint for Sheryl

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23

You will not get any expenses after adding ExpenseAuthorization:

{"meta": {"limit": 20, "next": null, "offset": 0, "previous": null, "total_count": 0}, "objects": []}

This happened because at this point no expense is associated with Sheryl.

Create an expense for Sheryl and try the GET endpoint

On the shell

u = User.objects.get(username='sheryl')
Expense.objects.create(description='Paid for the servers', amount=1000, user=u)
Expense.objects.create(description='Paid for CI server', amount=500, user=u)

Try expense list endpoint for Sheryl again

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23

You should be able to see all of Sheryl's expenses.

Fifth commit here.

git checkout 26f7c1

How mobile app will use it.

When Sheryl installs the app, she will be asked to log in for the first time. There will be a REST endpoint which takes a user's username and password and, if the credentials are right, returns the api key for that user. Sheryl's api key will be returned to the mobile app, which will store it in local storage. And when Sheryl wants to see her expenses, this REST call will be made to the Django server:

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23

This will only return Sheryl's expenses.

POST and create Sheryl's expense

So far, if a POST request is made, even with Sheryl's api key, the expense is created in the db but is not associated with Sheryl.

We want to add functionality so that if a POST request is made from Sheryl's device the expense is associated with Sheryl, and if it is made from Mark's device it is associated with Mark.

Tastypie provides several hook points; we will use one of them here. ModelResource provides a method called hydrate() which we need to override. Add the following method to ExpenseResource:

def hydrate(self, bundle):
    bundle.obj.user = bundle.request.user
    return bundle

  • This method is called during POST/PUT calls.
  • bundle.obj is an Expense instance about to be saved in the database.
  • So we set user on bundle.obj by reading it from bundle.request. We have already discussed how bundle.request.user gets populated during the authentication flow.

Make a POST request now with Sheryl's api_key.

post_data = {'description': 'Paid for iDoneThis', 'amount': 700}
r = requests.post("http://localhost:8000/api/expense/?username=sheryl&api_key=1a23", data=json.dumps(post_data), headers=headers)

Verify that the latest expense instance gets associated with Sheryl. You can also verify this by seeing that the object is returned by the GET expense list endpoint:

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23
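Or, on the Django shell, a quick check (a sketch; the import path assumes the app is called expenses, as in this tutorial):

from expenses.models import Expense

latest = Expense.objects.latest('id')
print latest.description, latest.user
# something like: Paid for iDoneThis sheryl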

Sixth commit at this point

git checkout 17b932

Try on your own

  • Create one more user in database, from shell.
  • Create api key for this user.
  • POST to REST endpoint with this new user's api_key and username and verify that the expense gets associated with this new user.
  • Check GET expense list for this new user and verify that only expense created for this user is in the response.

Now is a good time to dig deeper into django-tastypie and understand the following:

  • Dehydrate cycle. It is used during GET calls.
  • Hydrate cycle. It is used during POST/PUT calls. Once you read about the hydrate cycle, you will understand when the hydrate() method is called.
  • More about authorization and the different methods available on Authorization that you can override; a small sketch of these hooks follows this list.
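Here is that sketch (not part of the tutorial's commits). Authorization exposes read/create/update/delete hooks in both list and detail flavours: the list hooks return a filtered queryset, while the detail hooks return a boolean or raise Unauthorized.

from tastypie.authorization import Authorization
from tastypie.exceptions import Unauthorized


class OwnerOnlyAuthorization(Authorization):

    def read_list(self, object_list, bundle):
        # only the requesting user's objects show up in list endpoints
        return object_list.filter(user=bundle.request.user)

    def read_detail(self, object_list, bundle):
        return bundle.obj.user == bundle.request.user

    def update_detail(self, object_list, bundle):
        return bundle.obj.user == bundle.request.user

    def delete_detail(self, object_list, bundle):
        # raising Unauthorized results in a 401 for the caller
        raise Unauthorized("Deletes are not allowed.")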

Want more?

I am still trying a few things with tastypie. From here on I will not give much explanation, but I will point to the commit where each piece of functionality is achieved.

Authorization on detail endpoint.

Expense with id 1 is not associated with any user. But Sheryl is still able to see it at:

http://localhost:8000/api/expense/1/?format=json&username=sheryl&api_key=1a23

She shouldn't be able to see it as it is not her expense.

So add the following to ExpenseAuthorization

def read_detail(self, object_list, bundle):
    # object_list here contains just the expense being requested
    obj = object_list[0]
    return obj.user == bundle.request.user

After this, Sheryl will not be able to see the detail endpoint of any expense which doesn't belong to her. Try it:

http://localhost:8000/api/expense/1/?format=json&username=sheryl&api_key=1a23

Commit id for this:

e650f3
git show e650f3

PUT endpoint

Expense with id 5 belongs to Sheryl. She wants to update this expense; essentially she wants to change the description.

The current state of the expense can be seen at:

http://localhost:8000/api/expense/5/?format=json&username=sheryl&api_key=1a23

Make PUT request

put_url = "http://localhost:8000/api/expense/5/?username=sheryl&api_key=1a23"
put_data = {'description': 'Paid for Travis'}
headers = {'Content-type': 'application/json'}
r = requests.put(put_url, data=json.dumps(put_data), headers=headers)

The description of Expense 5 is updated, as you can verify by trying the detail endpoint again.

http://localhost:8000/api/expense/5/?format=json&username=sheryl&api_key=1a23

Notice that amount remains unchanged. So PUT updates whatever fields you provide in the api call and leaves everything else as it is.
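A quick verification sketch with requests, reusing put_url from above:

r = requests.get(put_url + "&format=json")
data = r.json()
print data['description'], data['amount']
# the description is the new one, the amount is untouched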

DELETE endpoint

First check all of Sheryl's expenses

http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23

Sheryl wants to delete her expense with id 5. After this is done she will have one less expense in db.

delete_url = "http://localhost:8000/api/expense/5/?username=sheryl&api_key=1a23"
r = requests.delete(delete_url)

Verify that this expense got deleted.
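A small sketch of that check: tastypie answers a successful DELETE with 204 No Content, and the deleted expense disappears from Sheryl's list endpoint.

print r.status_code        # expected: 204
r2 = requests.get("http://localhost:8000/api/expense/?format=json&username=sheryl&api_key=1a23")
print len(r2.json()['objects'])    # one less than before the DELETE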

So we were able to do Create, Read, Update and Delete with REST api calls.

Restrict POST request to certain users only

Suppose we want users to be able to create expenses from the web end but don't want to allow creating expenses from mobile using the api. Yes, a weird requirement.

Also we don't want to disallow POST for all users. We still want Sheryl to be able to POST.

To try this we first need a new user with an api key in our system. Create it from the Django shell:

u = User.objects.create_user(username='mark', password='def', email='mark@abc.com')
ApiKey.objects.create(key='2b34', user=u)

Restricting POST for everyone except Sheryl

Add following method to ExpenseAuthorization

def create_detail(self, object_list, bundle):
    user = bundle.request.user
    # Return True if current user is Sheryl else return False
    return user.username == "sheryl"

Try making a POST request as Mark and see that you are not able to do it. If you want, you can note the expense count at this point.

post_data = {'description': 'Petty expense', 'amount': 3000}
r = requests.post("http://localhost:8000/api/expense/?username=mark&api_key=2b34", data=json.dumps(post_data), headers=headers)
print r.status_code #Should have got 401

Also, you can check the expense count again to verify that the expense wasn't created.

Status code 401 tells you that you aren't authorized to perform this operation.

Verify that Sheryl is still able to create expense

Try posting the same post_data as Sheryl

r = requests.post("http://localhost:8000/api/expense/?username=sheryl&api_key=1a23", data=json.dumps(post_data), headers=headers)
print r.status_code

The status code should be 201 in this case, which means the expense was created. You will be able to see this expense in Sheryl's GET expense list endpoint.

Verify that Mark is still able to do GET

Mark, or any other user, should still be able to make GET requests even if he isn't able to make POST requests.

http://localhost:8000/api/expense/?username=mark&api_key=2b34&format=json

Since Mark doesn't have any expenses in the db, there are no objects in the objects key of the response. Try creating an expense for this user from the shell and then try the GET endpoint again.
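For example, on the Django shell (a sketch; the import path assumes the app is called expenses, and the expense itself is made up):

from django.contrib.auth.models import User
from expenses.models import Expense

mark = User.objects.get(username='mark')
Expense.objects.create(description='Team lunch', amount=800, user=mark)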

Marco Slot: PGConf.Russia talk on pg_shard

From Planet PostgreSQL. Published on Mar 23, 2015.

Last month we went to PGConf.Russia and gave a talk on pg_shard, now available for all to see:

We got some very interesting questions during the talk that we wanted to highlight and clarify.

  • Does pg_shard/CitusDB run my queries in parallel? In pg_shard, a query will use one thread on a master node and one on a worker node. You can run many queries in parallel by making multiple connections to the master node(s), whereas the real work is being done by the worker nodes. UPDATE and DELETE queries on the same shard are serialized to ensure consistency between the replicas. To parallelize multi-shard SELECT queries across the worker nodes, you can upgrade to CitusDB.
  • Can I use stored procedures in my queries? Yes, and this is a powerful feature of pg_shard. Function calls in queries are executed on the worker nodes, which allows you to include arbitrary logic in your queries and scale it out to many worker nodes.
  • How do I ALTER a distributed table?
    pg_shard currently does not automatically propagate ALTER TABLE commands on a distributed table to the individual shards on the workers, but you can easily do this with a simple shell script. For pg_shard: alter-pgshard-table.sh and for CitusDB: alter-citusdb-table.sh.
  • What kind of lock is used when copying shard placements? In the latest version of pg_shard, we've added a master_copy_shard_placement function which takes an exclusive lock of the shard. This will temporarily block changes to the shard, while selects can still go through.
  • What's the difference between pg_shard and PL/Proxy? PL/Proxy allows you to scale out stored procedures on a master node across many worker nodes, whereas pg_shard allows you to (transparently) scale out a table and queries on the table across many worker nodes using replication and sharding.
  • Can I use cstore_fdw and pg_shard without CitusDB? You certainly can! cstore_fdw and pg_shard can be used both in a regular PostgreSQL database or in combination with CitusDB.

We would like to thank the organizers for a great conference and providing the recording of the talk!

Umair Shahid: NoSQL Support in PostgreSQL

From Planet PostgreSQL. Published on Mar 23, 2015.

 

Developers have been really excited about the addition of JSON support starting PostgreSQL v9.2. They feel they now have the flexibility to work with a schema-less unstructured dataset while staying within a relational DBMS. So what’s the buzz all about? Let’s explore below …

Why is NoSQL so attractive?

Rapid turnaround time … it is as simple as that. With the push to decrease time-to-market, developers are under constant pressure to turn POCs around very quickly. Actually, it is not just POCs; marketable products are increasingly getting the same treatment. The attitude is, “If I don’t get it out, someone else will.”

Any decent sized application will need to store data somewhere. Rather than going through the pains of designing schemas and the debates on whether to normalize or not, developers just want to get to the next step. That’s how databases like MongoDB gained such tremendous popularity. They allow for schema-less, unstructured data to be inserted in document form and the developers find it easy to convert class objects within their code into that document directly.

There is a trade-off, however. Document (and key/value store) databases are very unfriendly to relations. While retrieving data, you will have a very hard time cross-referencing between different tables, making analytics nearly impossible. And, nightmare of nightmares for mission-critical applications, these databases are not ACID compliant.

In walks PostgreSQL with JSON and HSTORE support.

NoSQL in PostgreSQL

While the HSTORE contrib module has been providing key/value data types in standard PostgreSQL table columns since v8.2, the introduction of native JSON support in v9.2 paves way for the true power of NoSQL within PostgreSQL.

Starting with v9.3, not only do you have the ability to declare JSON data types in standard tables, you now have functions to encode data to JSON format and to extract data elements from a JSON column. What’s more, you can also interchange data between JSON and HSTORE using simple & intuitive functions.

… and this is all ACID compliant!
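As a small illustration (a sketch, not from the original post; the table and data are made up, and psycopg2 is assumed as the Python driver), this is roughly what working with a json column looks like:

import json
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body json)")
cur.execute("INSERT INTO docs (body) VALUES (%s)",
            [json.dumps({"name": "Alice", "tags": ["nosql", "postgres"]})])
# ->> extracts a field from the JSON document as text (available since 9.3)
cur.execute("SELECT body ->> 'name' FROM docs")
print cur.fetchone()[0]    # Alice
conn.commit()
cur.close()
conn.close()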

The Power

Talk about bringing the best of both worlds together, the power that NoSQL capabilities bring to a traditional relational database is amazing. Developers now have the ability to kick-start their application development without any database bottlenecks using unstructured data. At the stage where analytics are required, they can be gradually structured to accommodate enterprise requirements within the same PostgreSQL database without the need for expensive migrations.

Have questions? Contact us NOW!

The post NoSQL Support in PostgreSQL appeared first on Stormatics.

Paul Ramsey: Magical PostGIS

From Planet PostgreSQL. Published on Mar 21, 2015.

I did a new PostGIS talk for FOSS4G North America 2015, an exploration of some of the tidbits I've learned over the past six months about using PostgreSQL and PostGIS together to make "magic" (any sufficiently advanced technology...)

 

Astro Code School Now Accepting Applications - Intermediate Django + Python

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Mar 20, 2015.

 

Code

 

I'm really happy to officially announce the first Python and Django Web Engineering class at Astro Code School. I’ll outline some details here and you can also find them on our classes page.

This class is twelve weeks long and full time Monday to Friday from 9 AM – 5 PM. It'll be taught here at the Astro Code School at 108 Morris Street, Suite 1b, Durham, NC. We will conduct two Python and Django Web Engineering classes in 2015. The first one in term two starts May 18, 2015 and ends August 10, 2015. The second one in term three starts September 22, 2015 and ends December 15, 2015.

Enrollment for both sections opens today March 20. There is space for twelve students in each class. More information about the enrollment process is on our Apply page. Part of that process is an entrance exam that is designed to ensure you're ready to succeed. The price per person for Python and Django Web Engineering is $12,000.

The Python and Django Web Engineering class is intended for intermediate level students. Its goal is to help you start your career as a backend web engineer. To start down this path we recommend you prepare yourself. A few things you can do are: read some books on Python & Django, complete the Django Girls tutorial, watch videos on Youtube, and take an online class or two in Python.

 

 

Python and Django make a powerful team to build maintainable web applications quickly. When you take this course you will build your own web application during lab time with assistance from your teacher and professional Django developers. You’ll also receive help preparing your portfolio and resume to find a job using the skills you’ve learned.

Here's the syllabus:

  1. Python Basics, Git & GitHub, Unit Testing
  2. Object Oriented Programming, Functional Programming, Development Process, Command Line
  3. HTML, HTTP, CSS, LESS, JavaScript, DOM
  4. Portfolio Development, Intro to Django, Routing, Views, Templates
  5. SQL, Models, Migrations, Forms, Debugging
  6. Django Admin, Integrating Apps, Upgrading Django, Advanced Django
  7. Ajax, JavaScript, REST
  8. Linux Admin, AWS, Django Deployment, Fabric
  9. Interviewing Skills, Computer Science Topics, Review
  10. Final Project Labs
  11. Final Project Labs
  12. Final Project Labs

 

This comprehensive course is taught by experienced developer and trained teacher Caleb Smith. He's been working full time at Caktus Consulting Group, the founders of Astro Code School and the nation’s largest Django firm. He’s worked on many client projects over the years. He’s also applied his experience as a former public school teacher to teach Girl Develop It Python classes and as an adjunct lecturer at the University of North Carolina-Chapel Hill. I think you'll really enjoy working with and learning from Caleb. He's a wonderful person.

For the past six months we've been working very hard to launch the school. A large amount of our time has been spent on an application to receive our license from the State of North Carolina to conduct a proprietary school. As of today Astro is one of two code schools in North Carolina that have received this license. We found it a very important task to undertake. It helped us do our due diligence to run an honest and fair school that will protect the rights of students who will be attending Astro Code School. This long process also explains why we've waited to tell you all the details. We're required to wait until we have a license to open our application process.

Thanks for checking out Astro Code School. If you have any questions please contact me.

Paul Ramsey: Making Lines from Points

From Planet PostgreSQL. Published on Mar 20, 2015.

Somehow I've gotten through 10 years of SQL without ever learning this construction, which I found while proof-reading a colleague's blog post and looked so unlikely that I had to test it before I believed it actually worked. Just goes to show, there's always something new to learn.

Suppose you have a GPS location table:

  • gps_id: integer
  • geom: geometry
  • gps_time: timestamp
  • gps_track_id: integer

You can get a correct set of lines from this collection of points with just this SQL:


SELECT
gps_track_id,
ST_MakeLine(geom ORDER BY gps_time ASC) AS geom
FROM gps_points
GROUP BY gps_track_id

Those of you who already knew about placing ORDER BY within an aggregate function are going "duh", and the rest of you are, like me, going "whaaaaaa?"

Prior to this, I would solve this problem by ordering all the groups in a CTE or sub-query first, and only then pass them to the aggregate make-line function. This, is, so, much, nicer.

Test django view with cookies

By Andrey Zhukov's blog from Django community aggregator: Community blog posts. Published on Mar 20, 2015.

To test some view which use cookies:

from Cookie import SimpleCookie
from django import test

class SomeTest(test.TestCase):

    def test_some_view(self):
        self.client.cookies = SimpleCookie({'test_cookie': 'test_value'})
        response = self.client.get('/some-url/')

        self.assertEqual(response.client.cookies['test_cookie'].value, 'test_value')

Shaun M. Thomas: PG Phriday: Date Based Partition Constraints

From Planet PostgreSQL. Published on Mar 20, 2015.

PostgreSQL has provided table partitions for a long time. In fact, one might say it has always had partitioning. The functionality and performance of table inheritance has increased over the years, and there are innumerable arguments for using it, especially for larger tables consisting of hundreds of millions of rows. So I want to discuss a quirk that often catches developers off guard. In fact, it can render partitioning almost useless or counter-productive.

PostgreSQL has a very good overview in its partitioning documentation. And the pg_partman extension at PGXN follows the standard partitioning model to automate many of the pesky tasks for maintaining several aspects of partitioning. With modules like this, there’s no need to manually manage new partitions, constraint maintenance, or even some aspects of data movement and archival.

However, existing partition sets exist, and not everyone knows about extensions like this, or have developed in-house systems instead. Here’s something I encountered recently:

CREATE TABLE sys_order
(
    order_id     SERIAL       PRIMARY KEY,
    product_id   INT          NOT NULL,
    item_count   INT          NOT NULL,
    order_dt     TIMESTAMPTZ  NOT NULL DEFAULT now()
);

CREATE TABLE sys_order_part_201502 ()
       INHERITS (sys_order);

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-02-01'::DATE + INTERVAL '1 mon');

This looks innocuous enough, but PostgreSQL veterans are already shaking their heads. The documentation alludes to how this could be a problem:

Keep the partitioning constraints simple, else the planner may not be able to prove that partitions don’t need to be visited.

The issue in this case, is that adding the interval of a month changes the right boundary of this range constraint into a dynamic value. PostgreSQL will not use dynamic values in evaluating check constraints. Here’s a query plan from PostgreSQL 9.4.1, which is the most recent release as of this writing:

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Well, it looks like the PostgreSQL planner wants to check both tables, even though the constraint we added to the child does not apply. Now, this isn’t a bug per se, but it might present as somewhat counter-intuitive. Let’s replace the constraint with one that does not use a dynamic value and try again:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::DATE AND
             order_dt < '2015-03-01'::DATE);

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..30.38 rows=9 width=20)
   ->  Seq Scan on sys_order  ...
   ->  Seq Scan on sys_order_part_201502  ...

Wait a minute… what happened here? There’s no dynamic values; the constraint is a simple pair of static dates. Yet still, PostgreSQL wants to check both tables. Well, this was a trick question of sorts, because the real answer lies in the data types used in the constraint. The TIMESTAMP WITH TIME ZONE type, you see, is not interchangeable with TIMESTAMP. Since the time zone is preserved in this type, the actual time and date can vary depending on how it’s cast.

Watch what happens when we change the constraint to match the column type used for order_dt:

ALTER TABLE sys_order_part_201502
 DROP CONSTRAINT chk_order_part_201502;

ALTER TABLE sys_order_part_201502
  ADD CONSTRAINT chk_order_part_201502
      CHECK (order_dt >= '2015-02-01'::TIMESTAMPTZ AND
             order_dt < '2015-03-01'::TIMESTAMPTZ);

EXPLAIN
SELECT * FROM sys_order
 WHERE order_dt = '2015-03-02';

                QUERY PLAN                                    
---------------------------------------------
 Append  (cost=0.00..0.00 rows=1 width=20)
   ->  Seq Scan on sys_order  ...

Now all of the types will be directly compatible, removing any possibility of time zones being cast to a different date than the constraint uses. This is an extremely subtle type mismatch, as many developers and DBAs alike consider these types interchangeable. This is further complicated by the fact that DATE seems to be the best type to use for the constraint, since time isn’t relevant to the desired boundaries.

It’s important to understand that even experienced developers and DBAs can get types wrong. This is especially true when the extra information, like the time zone here, appears completely innocent. In fact, it’s the default PostgreSQL datetime type for a very good reason: time zones change. Without the time zone, data in the column is bound to the time zone wherever the server is running. That this applies to dates as well can come as a bit of a surprise.

The lesson here is to always watch your types. PostgreSQL removed a lot of automatic casting in 8.3, and received no small amount of backlash for doing so. However, we can see how subtly incompatible types can cause major issues down the line. In the case of partitioning, a type mismatch can be the difference between reading 10-thousand rows, or 10-billion.

Craig Ringer: Dynamic SQL-level configuration for BDR 0.9.0

From Planet PostgreSQL. Published on Mar 19, 2015.

The BDR team has recently introduced support for dynamically adding new nodes to a BDR group from SQL into the current development builds. Now no configuration file changes are required to add nodes and there’s no need to restart the existing or newly joining nodes.

This change does not appear in the current 0.8.0 stable release; it’ll land in 0.9.0 when that’s released, and can be found in the bdr-plugin/next branch in the mean time.

New nodes negotiate with the existing nodes for permission to join. Soon they’ll be able to join the group without disrupting any DDL locking, global sequence voting, etc.

There’s also an easy node removal process so you don’t need to modify internal catalog tables and manually remove slots to drop a node anymore.

New node join process

With this change, the long-standing GUC-based configuration for BDR has been removed. bdr.connections no longer exists and you no longer configure connections with bdr.[conname]_dsn etc.

Instead, node addition is accomplished with the bdr.bdr_group_join(...) function. Because this is a function in the bdr extension, you must first CREATE EXTENSION bdr;. PostgreSQL doesn’t have extension dependencies and the bdr extension requires the btree_gist extension so you’ll have to CREATE EXTENSION btree_gist first.

Creating the first node

Creation of the first node must now be done explicitly using bdr.bdr_group_create. This promotes a standalone PostgreSQL database to a single-node BDR group, allowing other nodes to then be joined to it.

You must pass a node name and a valid externally-reachable connection string for the dsn parameter, e.g.:

CREATE EXTENSION btree_gist;

CREATE EXTENSION bdr;

SELECT bdr.bdr_group_create(
  local_node_name := 'node1',
  node_external_dsn := 'host=node1 dbname=mydb'
);

Note that the dsn is not used by the root node itself. It’s used by other nodes to connect to the root node, so you can’t use a dsn like host=localhost dbname=mydb if you intend to have nodes on multiple machines.

Adding other nodes

You can now join other nodes to form a fully functional BDR group by calling bdr.bdr_group_join and specifying a connection string that points to an existing node for the join_using_dsn. e.g.:

CREATE EXTENSION btree_gist;

CREATE EXTENSION bdr;

SELECT bdr.bdr_group_join(
    local_node_name := 'node2',
    node_external_dsn := 'host=node2 dbname=mydb',
    join_using_dsn := 'host=node1 dbname=mydb'
);

Here, node_external_dsn is an externally reachable connection string that can be used to establish connection to the new node, just like you supplied for the root node.

The join_using_dsn specifies the node that this new node should connect to when joining the group and establishing its membership. It won’t be used after joining.

Waiting until a node is ready

It’s now possible to tell when a new node has finished joining by calling bdr.node_join_wait(). This function blocks until the local node reports that it’s successfully joined a BDR group and is ready to execute commands.

Database name “bdr” now reserved

Additionally, the database name bdr is now reserved. It may not be used for BDR nodes, as BDR requires it for internal management. Hopefully this requirement can be removed later once a patch to the BGWorkers API has been applied to core.

Documentation moving into the source tree

The documentation in the BDR sections of the PostgreSQL wiki is being converted into the same format as is used for PostgreSQL itself. It’s being added to the BDR extension source tree and will be available as part of the 0.9.0 release.

Trying it out

If you’d like to test out bdr-plugin/next, which is due to become BDR 0.9.0, take a look at the source install instructions and the quick-start guide.

There are no packages for BDR 0.9.0 yet, so if you try to install from packages you’ll just get 0.8.0.

Comments? Questions?

Please feel free to leave comments and questions here, or post to pgsql-general with BDR-related questions.

We’re also now using GitHub to host a mirror of the BDR repository. We’re using the issue tracker there, so if you’ve found a bug and can supply a detailed report with the exact version and steps to reproduce, please file it there.

Paul Ramsey: PostGIS 2.1.6 Released

From Planet PostgreSQL. Published on Mar 19, 2015.

The 2.1.6 release of PostGIS is now available.

The PostGIS development team is happy to release a patch for PostGIS 2.1, the 2.1.6 release. As befits a patch release, the focus is on bugs, breakages, and performance issues. Users with large tables of points will want to prioritize this patch, for substantial (~50%) disk space savings.

http://download.osgeo.org/postgis/source/postgis-2.1.6.tar.gz


Transform your image overlays with on-the-fly manipulation

By Cloudinary Blog - Django from Django community aggregator: Community blog posts. Published on Mar 19, 2015.

Front-end developers may want to combine multiple images into a single image. For example, when creating and adding watermarks to stock photos, adding shapes or badges, preparing content for print (e.g. placing a logo on a t-shirt or a mug), adding a caption, and so on.

Multiple images can be combined by overlaying them one on top of the other. However, since it is not a given that both the underlying and overlaying images match each other, let alone your graphic design, you may need to perform further manipulations (e.g. resize, crop, change colors, create a better fit). This is where Cloudinary comes in.

Cloudinary's image overlay feature helps users easily combine multiple images. It supports image and text overlays using on-the-fly manipulation URLs. In this blog post, we will show you how to separately manipulate, process, and transform underlying and overlaying images, then dynamically generate a resulting image that you can embed on your site.

Manipulating image overlays

Suppose you have a website that sells personalized gifts. Users can upload their own photos, add text, and your site will automatically crop and manipulate those photos and text on the gift of their choice. For example, a couple may want to place their picture on a coffee mug. This would require you to resize and manipulate both the underlying image of the coffee mug and the overlaying picture of the couple until they fit together in harmony. Once the images are put in place, you can add text and perform further manipulations if necessary.

Below is an example of the original images of the couple and coffee mug that were uploaded to the cloud for further manipulation and delivery.

Coffee cup Nice couple

You can add an image overlay using Cloudinary's overlay parameter (or l for URLs). Returning to our example, here is what the final version of the coffee mug would look like with the overlaying picture of the couple:

Ruby:
cl_image_tag("coffee_cup.jpg", :transformation=>[
  {:width=>400, :height=>250, :crop=>:fill, :gravity=>:south},
  {:overlay=>"nice_couple", :width=>90, :x=>-20, :y=>18, :gravity=>:center}
  ])
PHP:
cl_image_tag("coffee_cup.jpg", array("transformation"=>array(
  array("width"=>400, "height"=>250, "crop"=>"fill", "gravity"=>"south"),
  array("overlay"=>"nice_couple", "width"=>90, "x"=>-20, "y"=>18, "gravity"=>"center")
  )))
Python:
CloudinaryImage("coffee_cup.jpg").image(transformation=[
  {"width": 400, "height": 250, "crop": "fill", "gravity": "south"},
  {"overlay": "nice_couple", "width": 90, "x": -20, "y": 18, "gravity": "center"}
  ])
Node.js:
cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 90, x: -20, y: 18, gravity: "center"}
  ]})
Java:
cloudinary.url().transformation(new Transformation()
  .width(400).height(250).crop("fill").gravity("south").chain()
  .overlay("nice_couple").width(90).x(-20).y(18).gravity("center")).imageTag("coffee_cup.jpg")
jQuery:
$.cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 90, x: -20, y: 18, gravity: "center"}
  ]})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(400).Height(250).Crop("fill").Gravity("south").Chain()
  .Overlay("nice_couple").Width(90).X(-20).Y(18).Gravity("center")).BuildImageTag("coffee_cup.jpg")
Image overlay

Transformation instructions are used to perform the image manipulations using dynamic URLs. Cloudinary's client libraries assist in building these URLs. You can apply multiple chained transformations by separating them with a slash / in your image's URL.
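For instance, the first coffee mug example above corresponds to a delivery URL along these lines (the cloud name demo is a placeholder; the order of parameters inside each chained component does not matter):

http://res.cloudinary.com/demo/image/upload/w_400,h_250,c_fill,g_south/l_nice_couple,w_90,x_-20,y_18,g_center/coffee_cup.jpg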

In order to better manipulate image overlays, you can set the flags parameter to layer_apply (or fl_layer_apply for URLs), which then tells Cloudinary that all chained transformations that were specified up until the flag, are to be applied on the overlaying image instead of the containing image.

Using our coffee mug example, below you can see how we have applied multiple manipulations both on the containing image as well as the overlay. The containing image has been cropped to fill a 400x250 rectangle and the overlaying image of the couple has been cropped using face detection. Color saturation has been increased by 50% and the vignette effect has been applied. Finally, the resulting image has been resized to 100 pixels wide, converted to a circular shape and positioned with 20 pixels offset from the center of the containing image.

Ruby:
cl_image_tag("coffee_cup.jpg", :transformation=>[
  {:width=>400, :height=>250, :crop=>:fill, :gravity=>:south},
  {:overlay=>"nice_couple", :width=>1.3, :height=>1.3, :crop=>:crop, :gravity=>:faces, :flags=>:region_relative},
  {:effect=>"saturation:50"},
  {:effect=>"vignette"},
  {:radius=>"max", :width=>100, :x=>-20, :y=>20, :gravity=>:center, :flags=>:layer_apply}
  ])
PHP:
cl_image_tag("coffee_cup.jpg", array("transformation"=>array(
  array("width"=>400, "height"=>250, "crop"=>"fill", "gravity"=>"south"),
  array("overlay"=>"nice_couple", "width"=>1.3, "height"=>1.3, "crop"=>"crop", "gravity"=>"faces", "flags"=>"region_relative"),
  array("effect"=>"saturation:50"),
  array("effect"=>"vignette"),
  array("radius"=>"max", "width"=>100, "x"=>-20, "y"=>20, "gravity"=>"center", "flags"=>"layer_apply")
  )))
Python:
CloudinaryImage("coffee_cup.jpg").image(transformation=[
  {"width": 400, "height": 250, "crop": "fill", "gravity": "south"},
  {"overlay": "nice_couple", "width": 1.3, "height": 1.3, "crop": "crop", "gravity": "faces", "flags": "region_relative"},
  {"effect": "saturation:50"},
  {"effect": "vignette"},
  {"radius": "max", "width": 100, "x": -20, "y": 20, "gravity": "center", "flags": "layer_apply"}
  ])
Node.js:
cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 1.3, height: 1.3, crop: "crop", gravity: "faces", flags: "region_relative"},
  {effect: "saturation:50"},
  {effect: "vignette"},
  {radius: "max", width: 100, x: -20, y: 20, gravity: "center", flags: "layer_apply"}
  ]})
Java:
cloudinary.url().transformation(new Transformation()
  .width(400).height(250).crop("fill").gravity("south").chain()
  .overlay("nice_couple").width(1.3).height(1.3).crop("crop").gravity("faces").flags("region_relative").chain()
  .effect("saturation:50").chain()
  .effect("vignette").chain()
  .radius("max").width(100).x(-20).y(20).gravity("center").flags("layer_apply")).imageTag("coffee_cup.jpg")
jQuery:
$.cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 1.3, height: 1.3, crop: "crop", gravity: "faces", flags: "region_relative"},
  {effect: "saturation:50"},
  {effect: "vignette"},
  {radius: "max", width: 100, x: -20, y: 20, gravity: "center", flags: "layer_apply"}
  ]})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(400).Height(250).Crop("fill").Gravity("south").Chain()
  .Overlay("nice_couple").Width(1.3).Height(1.3).Crop("crop").Gravity("faces").Flags("region_relative").Chain()
  .Effect("saturation:50").Chain()
  .Effect("vignette").Chain()
  .Radius("max").Width(100).X(-20).Y(20).Gravity("center").Flags("layer_apply")).BuildImageTag("coffee_cup.jpg")
Image overlay with further manipulation

Learn more about Cloudinary’s image manipulation capabilities

Manipulating multiple image overlays

In addition to being able to manipulate a single image overlay, Cloudinary allows you to add and manipulate multiple overlays, as well. You can do this by chaining another overlay, setting the flags parameter to layer_apply (or fl_layer_apply for URLs), and applying multiple image transformations. Adding another overlay is as simple as manipulating a picture to suit the existing underlying and overlaying images. In our coffee mug example, we added a balloon as an additional overlay and performed the following manipulations: resized to be 30 pixels wide, changed the hue level to pink, and rotated it five degrees.

Ruby:
cl_image_tag("coffee_cup.jpg", :transformation=>[
  {:width=>400, :height=>250, :crop=>:fill, :gravity=>:south},
  {:overlay=>"nice_couple", :width=>1.3, :height=>1.3, :crop=>:crop, :gravity=>:faces, :flags=>:region_relative},
  {:effect=>"saturation:50"},
  {:effect=>"vignette"},
  {:radius=>"max", :width=>100, :x=>-20, :y=>20, :gravity=>:center, :flags=>:layer_apply},
  {:overlay=>"balloon", :width=>30},
  {:angle=>5, :effect=>"hue:-20"},
  {:x=>30, :y=>5, :flags=>:layer_apply}
  ])
PHP:
cl_image_tag("coffee_cup.jpg", array("transformation"=>array(
  array("width"=>400, "height"=>250, "crop"=>"fill", "gravity"=>"south"),
  array("overlay"=>"nice_couple", "width"=>1.3, "height"=>1.3, "crop"=>"crop", "gravity"=>"faces", "flags"=>"region_relative"),
  array("effect"=>"saturation:50"),
  array("effect"=>"vignette"),
  array("radius"=>"max", "width"=>100, "x"=>-20, "y"=>20, "gravity"=>"center", "flags"=>"layer_apply"),
  array("overlay"=>"balloon", "width"=>30),
  array("angle"=>5, "effect"=>"hue:-20"),
  array("x"=>30, "y"=>5, "flags"=>"layer_apply")
  )))
Python:
CloudinaryImage("coffee_cup.jpg").image(transformation=[
  {"width": 400, "height": 250, "crop": "fill", "gravity": "south"},
  {"overlay": "nice_couple", "width": 1.3, "height": 1.3, "crop": "crop", "gravity": "faces", "flags": "region_relative"},
  {"effect": "saturation:50"},
  {"effect": "vignette"},
  {"radius": "max", "width": 100, "x": -20, "y": 20, "gravity": "center", "flags": "layer_apply"},
  {"overlay": "balloon", "width": 30},
  {"angle": 5, "effect": "hue:-20"},
  {"x": 30, "y": 5, "flags": "layer_apply"}
  ])
Node.js:
cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 1.3, height: 1.3, crop: "crop", gravity: "faces", flags: "region_relative"},
  {effect: "saturation:50"},
  {effect: "vignette"},
  {radius: "max", width: 100, x: -20, y: 20, gravity: "center", flags: "layer_apply"},
  {overlay: "balloon", width: 30},
  {angle: 5, effect: "hue:-20"},
  {x: 30, y: 5, flags: "layer_apply"}
  ]})
Java:
cloudinary.url().transformation(new Transformation()
  .width(400).height(250).crop("fill").gravity("south").chain()
  .overlay("nice_couple").width(1.3).height(1.3).crop("crop").gravity("faces").flags("region_relative").chain()
  .effect("saturation:50").chain()
  .effect("vignette").chain()
  .radius("max").width(100).x(-20).y(20).gravity("center").flags("layer_apply").chain()
  .overlay("balloon").width(30).chain()
  .angle(5).effect("hue:-20").chain()
  .x(30).y(5).flags("layer_apply")).imageTag("coffee_cup.jpg")
jQuery:
$.cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 1.3, height: 1.3, crop: "crop", gravity: "faces", flags: "region_relative"},
  {effect: "saturation:50"},
  {effect: "vignette"},
  {radius: "max", width: 100, x: -20, y: 20, gravity: "center", flags: "layer_apply"},
  {overlay: "balloon", width: 30},
  {angle: 5, effect: "hue:-20"},
  {x: 30, y: 5, flags: "layer_apply"}
  ]})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(400).Height(250).Crop("fill").Gravity("south").Chain()
  .Overlay("nice_couple").Width(1.3).Height(1.3).Crop("crop").Gravity("faces").Flags("region_relative").Chain()
  .Effect("saturation:50").Chain()
  .Effect("vignette").Chain()
  .Radius("max").Width(100).X(-20).Y(20).Gravity("center").Flags("layer_apply").Chain()
  .Overlay("balloon").Width(30).Chain()
  .Angle(5).Effect("hue:-20").Chain()
  .X(30).Y(5).Flags("layer_apply")).BuildImageTag("coffee_cup.jpg")
Multiple image overlays with manipulations

Manipulating text overlays

Cloudinary supports adding dynamic text overlays in any style of customized text. What's more, the text overlay can be further manipulated, just like image overlays. So, returning to our example, we have now added a text overlay that we have colored, using the colorize effect, and rotated.

Ruby:
cl_image_tag("coffee_cup.jpg", :transformation=>[
  {:width=>400, :height=>250, :crop=>:fill, :gravity=>:south},
  {:overlay=>"nice_couple", :width=>1.3, :height=>1.3, :crop=>:crop, :gravity=>:faces, :flags=>:region_relative},
  {:effect=>"saturation:50"},
  {:effect=>"vignette"},
  {:radius=>"max", :width=>100, :x=>-20, :y=>20, :gravity=>:center, :flags=>:layer_apply},
  {:overlay=>"balloon", :width=>30},
  {:angle=>5, :effect=>"hue:-20"},
  {:x=>30, :y=>5, :flags=>:layer_apply},
  {:color=>"#f08", :overlay=>"text:Cookie_40_bold:Love", :effect=>"colorize"},
  {:angle=>20, :x=>-45, :y=>44, :flags=>:layer_apply}
  ])
PHP:
cl_image_tag("coffee_cup.jpg", array("transformation"=>array(
  array("width"=>400, "height"=>250, "crop"=>"fill", "gravity"=>"south"),
  array("overlay"=>"nice_couple", "width"=>1.3, "height"=>1.3, "crop"=>"crop", "gravity"=>"faces", "flags"=>"region_relative"),
  array("effect"=>"saturation:50"),
  array("effect"=>"vignette"),
  array("radius"=>"max", "width"=>100, "x"=>-20, "y"=>20, "gravity"=>"center", "flags"=>"layer_apply"),
  array("overlay"=>"balloon", "width"=>30),
  array("angle"=>5, "effect"=>"hue:-20"),
  array("x"=>30, "y"=>5, "flags"=>"layer_apply"),
  array("color"=>"#f08", "overlay"=>"text:Cookie_40_bold:Love", "effect"=>"colorize"),
  array("angle"=>20, "x"=>-45, "y"=>44, "flags"=>"layer_apply")
  )))
Python:
CloudinaryImage("coffee_cup.jpg").image(transformation=[
  {"width": 400, "height": 250, "crop": "fill", "gravity": "south"},
  {"overlay": "nice_couple", "width": 1.3, "height": 1.3, "crop": "crop", "gravity": "faces", "flags": "region_relative"},
  {"effect": "saturation:50"},
  {"effect": "vignette"},
  {"radius": "max", "width": 100, "x": -20, "y": 20, "gravity": "center", "flags": "layer_apply"},
  {"overlay": "balloon", "width": 30},
  {"angle": 5, "effect": "hue:-20"},
  {"x": 30, "y": 5, "flags": "layer_apply"},
  {"color": "#f08", "overlay": "text:Cookie_40_bold:Love", "effect": "colorize"},
  {"angle": 20, "x": -45, "y": 44, "flags": "layer_apply"}
  ])
Node.js:
cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 1.3, height: 1.3, crop: "crop", gravity: "faces", flags: "region_relative"},
  {effect: "saturation:50"},
  {effect: "vignette"},
  {radius: "max", width: 100, x: -20, y: 20, gravity: "center", flags: "layer_apply"},
  {overlay: "balloon", width: 30},
  {angle: 5, effect: "hue:-20"},
  {x: 30, y: 5, flags: "layer_apply"},
  {color: "#f08", overlay: "text:Cookie_40_bold:Love", effect: "colorize"},
  {angle: 20, x: -45, y: 44, flags: "layer_apply"}
  ]})
Java:
cloudinary.url().transformation(new Transformation()
  .width(400).height(250).crop("fill").gravity("south").chain()
  .overlay("nice_couple").width(1.3).height(1.3).crop("crop").gravity("faces").flags("region_relative").chain()
  .effect("saturation:50").chain()
  .effect("vignette").chain()
  .radius("max").width(100).x(-20).y(20).gravity("center").flags("layer_apply").chain()
  .overlay("balloon").width(30).chain()
  .angle(5).effect("hue:-20").chain()
  .x(30).y(5).flags("layer_apply").chain()
  .color("#f08").overlay("text:Cookie_40_bold:Love").effect("colorize").chain()
  .angle(20).x(-45).y(44).flags("layer_apply")).imageTag("coffee_cup.jpg")
jQuery:
$.cloudinary.image("coffee_cup.jpg", {transformation: [
  {width: 400, height: 250, crop: "fill", gravity: "south"},
  {overlay: "nice_couple", width: 1.3, height: 1.3, crop: "crop", gravity: "faces", flags: "region_relative"},
  {effect: "saturation:50"},
  {effect: "vignette"},
  {radius: "max", width: 100, x: -20, y: 20, gravity: "center", flags: "layer_apply"},
  {overlay: "balloon", width: 30},
  {angle: 5, effect: "hue:-20"},
  {x: 30, y: 5, flags: "layer_apply"},
  {color: "#f08", overlay: "text:Cookie_40_bold:Love", effect: "colorize"},
  {angle: 20, x: -45, y: 44, flags: "layer_apply"}
  ]})
.Net:
cloudinary.Api.UrlImgUp.Transform(new Transformation()
  .Width(400).Height(250).Crop("fill").Gravity("south").Chain()
  .Overlay("nice_couple").Width(1.3).Height(1.3).Crop("crop").Gravity("faces").Flags("region_relative").Chain()
  .Effect("saturation:50").Chain()
  .Effect("vignette").Chain()
  .Radius("max").Width(100).X(-20).Y(20).Gravity("center").Flags("layer_apply").Chain()
  .Overlay("balloon").Width(30).Chain()
  .Angle(5).Effect("hue:-20").Chain()
  .X(30).Y(5).Flags("layer_apply").Chain()
  .Color("#f08").Overlay("text:Cookie_40_bold:Love").Effect("colorize").Chain()
  .Angle(20).X(-45).Y(44).Flags("layer_apply")).BuildImageTag("coffee_cup.jpg")
Multiple image and text overlays with further manipulation

Summary

Cloudinary’s powerful capabilities allow you to manipulate and generate complex, combined images that match your graphic design requirements. You can use Cloudinary's dynamic manipulation URLs with user-uploaded images in order to manipulate the images while combining multiple images and text overlays into a single new image. With the new features introduced in this post, you can apply Cloudinary’s rich set of manipulation capabilities separately on each layer of underlying or overlaying images or text. All overlay manipulation features are available with all Cloudinary plans, including the free tier. If you don't have a Cloudinary account yet, sign up for a free account here.

Josh Berkus: PostgreSQL data migration hacks from Tilt

From Planet PostgreSQL. Published on Mar 19, 2015.

Since the folks at Tilt.com aren't on Planet Postgres, I thought I'd link their recent blog post on cool data migration hacks.  Tilt is a YCombinator company, and a SFPUG supporter.

Jason Petersen: Announcing pg_shard 1.1

From Planet PostgreSQL. Published on Mar 19, 2015.

Last winter, we open-sourced pg_shard, a transparent sharding extension for PostgreSQL. It brought straightforward sharding capabilities to PostgreSQL, allowing tables and queries to be distributed across any number of servers.

Today we’re excited to announce the next release of pg_shard. The changes in this release include:

  • Improved performance — INSERT commands run up to four times faster
  • Shard repair — Easily bring inactive placements back up to speed
  • Copy script — Quickly import data from CSV and other files from the command line
  • CitusDB integration — Expose pg_shard’s metadata for CitusDB’s use
  • Resource improvements — Execute larger queries than ever before

For more information about recent changes, you can view all the issues closed during this release cycle on GitHub.

Upgrading or installing is a breeze: see pg_shard’s GitHub page for detailed instructions.

Whether you want a distributed document store alongside your normal PostgreSQL tables or need the extra computational power afforded by a sharded cluster, pg_shard can help. We continue to grow pg_shard’s capabilities and are open to feature requests.

Got questions?

If you have any questions about pg_shard, please contact us using the pg_shard-users mailing list.

If you discover an issue when using pg_shard, please submit it to our issue tracker on GitHub.

Further information is available on our website, where you are free to contact us with any general questions you may have.

Michael Paquier: Postgres 9.5 feature highlight: More flexible expressions in pgbench

From Planet PostgreSQL. Published on Mar 19, 2015.

A nice feature extending the usage of pgbench, the in-core Postgres tool aimed at benchmarks, has landed in 9.5 with this commit:

commit: 878fdcb843e087cc1cdeadc987d6ef55202ddd04
author: Robert Haas <rhaas@postgresql.org>
date: Mon, 2 Mar 2015 14:21:41 -0500
pgbench: Add a real expression syntax to \set

Previously, you could do \set variable operand1 operator operand2, but
nothing more complicated.  Now, you can \set variable expression, which
makes it much simpler to do multi-step calculations here.  This also
adds support for the modulo operator (%), with the same semantics as in
C.

Robert Haas and Fabien Coelho, reviewed by Álvaro Herrera and
Stephen Frost

pgbench has long supported custom input files via -f, with custom variables that can be set with commands like \set or \setrandom and then used in the script's SQL queries:

\set id 10 * :scale
\setrandom id2 1 :id
SELECT name, email FROM users WHERE id = :id;
SELECT capital, country FROM world_cities WHERE id = :id2;

Up to 9.4, those custom variables could only be calculated with simple rules of the form "var operator var2" (the commit message above is explicit enough), forcing many intermediate steps and variables for more complicated calculations (note as well that any operands and variables provided after the first three are simply ignored):

\setrandom ramp 1 200
\set scale_big :scale * 10
\set min_big_scale :scale_big + :ramp
SELECT :min_big_scale;

In 9.5, such cases become much easier because pgbench now includes a parser for complex expressions. The calculation above can be written more simply, and far fancier things are possible:

\setrandom ramp 1 200
\set min_big_scale :scale * 10 + :ramp
SELECT :min_big_scale;

With pgbench run for a couple of transactions, here is what you could get:

$ pgbench -f test.sql -t 5
[...]
$ tail -n5 $PGDATA/pg_log/postgresql.log
LOG:  statement: SELECT 157;
LOG:  statement: SELECT 53;
LOG:  statement: SELECT 32;
LOG:  statement: SELECT 115;
LOG:  statement: SELECT 43;

Another important thing to mention is that this commit also adds support for the modulo operator "%". In any case, be careful not to overdo it with this feature: grouping expressions can be good for readability, but doing too much of it makes it harder to understand later how a given script was designed.

Robert Haas: Parallel Sequential Scan for PostgreSQL 9.5

From Planet PostgreSQL. Published on Mar 18, 2015.

Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5.  It is not all there yet, but we are making very good progress.  I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort.  Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done.  Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.
Read more »

Hubert 'depesz' Lubaczewski: Waiting for 9.5 – array_offset() and array_offsets()

From Planet PostgreSQL. Published on Mar 18, 2015.

On 18th of March, Alvaro Herrera committed patch: array_offset() and array_offsets()   These functions return the offset position or positions of a value in an array.   Author: Pavel Stěhule Reviewed by: Jim Nasby It's been a while since my last “waiting for" post – mostly because while there is a lot of work happening, […]

Devrim GÜNDÜZ: Mark your calendars: May 9 2015, PGDay.TR in Istanbul!

From Planet PostgreSQL. Published on Mar 18, 2015.

Turkish PostgreSQL Users' and Developer's Association is organizing 4th PGDay.TR on May 9, 2015 at Istanbul. Dave Page, one of the community leaders, will be giving the keynote.

This year, we are going to have 1 full English track along with 2 Turkish tracks, so if you are close to Istanbul, please join us for a wonderful city, event and fun!

We are also looking for sponsors for this great event. Please email to sponsor@postgresql.org.tr for details.

See you in Istanbul!

Conference website: http://pgday.postgresql.org.tr/en/

Joshua Drake: WhatcomPUG meeting last night on: sqitch and... bitcoin friends were made!

From Planet PostgreSQL. Published on Mar 18, 2015.

Last night I attended the second WhatcomPUG. This meeting was about Sqitch, an interesting database revision control mechanism. The system is written in Perl and was developed by David Wheeler of PgTap fame. It looks and feels like git. As it is written in Perl it definitely has too many options. That said, what we were shown works, works well and appears to be a solid and thorough system for the job.

I also met a couple of people from CoinBeyond. They are a point-of-sale software vendor that specializes in letting "regular" people (read: not I or likely the people reading this blog) use Bitcoin!

That's right folks, the hottest young currency in the market today is using the hottest middle aged technology for their database, PostgreSQL. It was great to see that they are also located in Whatcom County. The longer I am here, the more I am convinced that Whatcom County (and especially Bellingham) is a quiet tech center working on profitable ventures without the noise of places like Silicon Valley. I just keep running into people doing interesting things with technology.

Oh, for reference:

  • Twitter: @coinbeyond
  • Facebook: CoinBeyond
  • LinkedIn: Linkedin

    Julien Rouhaud: Talking About OPM and PoWA at pgconf.ru

    From Planet PostgreSQL. Published on Mar 18, 2015.

    Last month, I had the chance to talk about PostgreSQL monitoring, and present some of the tools I’m working on at pgconf.ru.

    This talk was a good opportunity to work on an overview of existing projects dealing with monitoring or performance, see what may be lacking and what can be done to change this situation.

    Here are my slides:

    If you’re interested in this topic, or if you developed a tool I missed while writing these slides (my apologies if that's the case), the official wiki page is the place you should go first.

    I’d also like to thank all the pgconf.ru staff for their work, this conference was a big success, and the biggest postgresql-centric event ever organized.

    Talking About OPM and PoWA at pgconf.ru was originally published by Julien Rouhaud at rjuju's home on March 18, 2015.

    Rajeev Rastogi: Overview of PostgreSQL Engine Internals

    From Planet PostgreSQL. Published on Mar 18, 2015.

    POSTGRESQL is an open-source, full-featured relational database. This blog gives an overview of how POSTGRESQL engine processes queries received from the user.
    Typical simplified flow of PostgreSQL engine is:

    SQL Engine Flow

    As part of this blog, I am going to cover all modules marked in yellow colour.

    Parser:
    The parser module is responsible for the syntactic analysis of the query. It consists of two sub-modules:
    1. Lexical scanner
    2. Bison rules/actions

    Lexical Scanner:
    The lexical scanner reads each character of the given query and returns the appropriate token based on the matching rules. E.g. rules can be as follows:

       

        The name given in <> is the state name; in the above example, <xc> is the state for the start of a comment. Once the scanner sees the comment-start characters, the comment body tokens are read only in the <xc> state.

    Bison:
    Bison reads the tokens returned from the scanner, matches them against the grammar rule for a particular query, and performs the associated actions. E.g. the bison rule for a SELECT statement is:

           

    Each returned token is matched against the rule mentioned above in left-to-right order; if at any point no matching rule is found, Bison either moves on to the next possible matching rule or throws an error.

    Analyzer:
    The analyzer module is responsible for the semantic analysis of the given query. Each piece of raw information about the query received from the parser module is transformed into the database's internal object form to get the corresponding object id. E.g. the relation name "tbl" gets replaced with its object id.
    The output of the analyzer module is a query tree; its layout can be seen in the "Query" structure in src/include/nodes/parsenodes.h.

    Optimizer:
    The optimizer module, often considered the brain of the SQL engine, is responsible for choosing the best path for executing the query. The best path for a query is selected based on the cost of each path; the path with the least cost is the winner.
    Based on the winner path, a plan is created, which is used by the executor to execute the query.
    Some of the important decisions involve the following methods:

    1. Scan Method
      • Sequential scan: Simply reads the heap file from start to end, so it is considered very slow when many records have to be fetched.
      • Index scan: Uses a secondary data structure to quickly find the records that satisfy a certain predicate, and then fetches the rest of each record from the heap. It therefore involves the extra cost of random page access.
    2. Join Method
      • Nested Loop Join: In this approach, each record of the outer table is matched against each record of the inner table. A simple algorithm for a nested loop join between Outer and Inner on Outer.k = Inner.k is:

            for each tuple r in Outer:
                for each tuple s in Inner with s.k = r.k:
                    emit output tuple (r, s)

        Equivalently: Outer is left, Inner is right.

      • Merge Join: This join is suitable only when the records of each participating table are sorted, and only for "=" join clauses. A simple algorithm is:

            for r in Outer, s in Inner (both scanned in sorted order):
                if r.k = s.k:
                    emit output tuple (r, s)
                    advance Outer and Inner
                else if r.k < s.k:
                    advance Outer
                else:
                    advance Inner

      • Hash Join: This join does not require the records to be sorted, but it too is used only for "=" join clauses (see the toy Python sketch after this list). A simple algorithm for a hash join between Inner and Outer on Inner.k = Outer.k is:

            -- build phase
            for each tuple r in Inner:
                insert r into hash table T with key r.k
            -- probe phase
            for each tuple s in Outer:
                for each tuple r in bucket T[s.k]:
                    if s.k = r.k:
                        emit output tuple (r, s)
    3. Join Order: The mechanism for deciding the order in which tables are joined.
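
      To make the hash join steps concrete, here is a toy Python sketch of the build and probe phases described above; it is only an illustration of the algorithm, not PostgreSQL's executor code, and the sample data is made up:

      def hash_join(inner, outer, inner_key, outer_key):
          """Toy hash join: inner/outer are lists of dicts, *_key are join column names."""
          # Build phase: hash the (usually smaller) inner relation on its join key.
          table = {}
          for r in inner:
              table.setdefault(r[inner_key], []).append(r)
          # Probe phase: look up each outer tuple's join key in the hash table.
          for s in outer:
              for r in table.get(s[outer_key], []):
                  yield (r, s)

      # Example with made-up data:
      friends = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
      orders = [{"friend_id": 1, "item": "book"}, {"friend_id": 2, "item": "pen"}]
      print(list(hash_join(friends, orders, "id", "friend_id")))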

     Typical output of the plan is:
      postgres=# explain select firstname from friend where age=33      order by firstname;
                              QUERY PLAN
    --------------------------------------------------------------
     Sort  (cost=1.06..1.06 rows=1 width=101)
       Sort Key: firstname
       ->  Seq Scan on friend  (cost=0.00..1.05 rows=1 width=101)
             Filter: (age = 33)
    (4 rows)
      Executor:
      The executor is the module that takes the planner's output as input and transforms each plan node into a state tree node. Each node is then executed in turn to perform the corresponding operation.
      Execution of the state tree starts at the root; to get its input, execution keeps descending into child nodes until it reaches a leaf node, which is then executed to pass its output up to the parent node. When a node has two children, the outer (i.e. left) node is evaluated first.
      At this point the executor uses the storage module's interface to retrieve the actual data.

      Typically the execution process can be divided into:
      • Executor Start: Prepares the plan for execution. It processes each plan node recursively and generates the corresponding state tree node. It also initializes memory to hold the projection list, the qualification expressions and the slot for holding the resultant tuple.
      • Executor Run: Recursively processes each state tree node; each resultant tuple is sent to the front end using the registered destination function.
      • Executor Finish: Frees all the allocated resources.
      A typical flow of execution is as follows:
      In this flow, execution starts at the Merge Join node, but it needs input to process, so it first descends to its left child node and takes one tuple using an index scan; it then requests an input tuple from its right child node. The right child is a Sort node, so it requests tuples from its own child, which in turn does a sequential scan. Once all the tuples have been received and sorted at the Sort node, it passes the first tuple up to its parent node.

      Reference:
      Older papers from PostgreSQL.  

      Tomas Vondra: Performance since PostgreSQL 7.4 / fulltext

      From Planet PostgreSQL. Published on Mar 17, 2015.

      After discussing the pgbench and TPC-DS results, it's time to look at the last benchmark, testing performance of built-in fulltext (and GIN/GiST index implementation in general).

      The one chart you should remember from this post is this one, GIN speedup between 9.3 and 9.4:

      fulltext-timing-speedups.png

      Interpreting this chart is a bit tricky - x-axis tracks duration on PostgreSQL 9.3 (log scale), while y-axis (linear scale) tracks relative speedup 9.4 vs. 9.3, so 1.0 means 'equal performance', and 0.5 means that 9.4 is 2x faster than 9.3.

      The chart pretty much shows exponential speedup for vast majority of queries - the longer the duration on 9.3, the higher the speedup on 9.4. That's pretty awesome, IMNSHO. What exactly caused that will be discussed later (spoiler: it's thanks to GIN fastscan). Also notice that almost no queries are slower on 9.4, and those few examples are not significantly slower.

      Benchmark

      While both pgbench and TPC-DS are well established benchmarks, there's no such benchmark for testing fulltext performance (as far as I know). Luckily, I had played with the fulltext features a while ago, implementing archie - an in-database mailing list archive.

      It's still quite experimental and I use it for testing GIN/GiST related patches, but it's suitable for this benchmark too.

      So I've taken the current archives of PostgreSQL mailing lists, containing about 1 million messages, loaded them into the database and then executed 33k real-world queries collected from postgresql.org. I can't publish those queries because of privacy concerns (there's no info on users, but still ...), but the queries look like this:

      SELECT id FROM messages
       WHERE body_tsvector @@ ('optimizing & bulk & update')::tsquery
       ORDER BY ts_rank(body_tsvector, ('optimizing & bulk & update')::tsquery)
                DESC LIMIT 100;
      

      The number of search terms varies quite a bit - the simplest queries have a single letter, the most complex ones often tens of words.

      PostgreSQL config

      The PostgreSQL configuration was mostly default, with only minor changes:

      shared_buffers = 512MB
      work_mem = 64MB
      maintenance_work_mem = 128MB
      checkpoint_segments = 32
      effective_cache_size = 4GB
      

      Loading the data

      We have to load the data first, of course. In this case that involves a fair amount of additional logic implemented either in Python (parsing the mbox files into messages, loading them into the database), or PL/pgSQL triggers (thread detection, ...). The time needed to load all the 1M messages, producing ~6GB database, looks like this:

      fulltext-load.png

      Note: The chart only shows releases where the performance changed, so if only data for 8.2 and 9.4 are shown, it means that the releases up until 9.3 behave like 8.2 (more or less).

      The common wisdom is that GIN indexes are faster to query than GiST, but more expensive when it comes to maintenance (creation, etc.).

      If you look at PostgreSQL 8.2, the oldest release supporting GIN indexes, that certainly was true - the load took ~1300 seconds with GIN indexes and only ~800 seconds with GiST indexes. But 8.4 significantly improved this, making GIN indexes only slightly more expensive than GiST.

      Of course, this is incremental load - it might look very different if the indexes were created after all the data is loaded, for example. But I argue that incremental performance is more important here, because that's what usually matters in actual applications.

      The other argument might be that the overhead of the Python parser and PL/pgSQL triggers is overshadowing the GIN / GiST difference. That may be true, but that overhead should be about the same for both index types, and real-world applications are likely to have similar overhead.

      So I believe that GIN maintenance is not significantly more expensive than GiST - at least in this particular benchmark, but probably in other applications too. I have no doubt it's possible to construct examples where GIN maintenance is much more expensive than GiST maintenance.

      Query performance

      The one thing that's missing in the previous section is query performance. Let's assume your workload is 90% reads, and GIN is 10x faster than GiST for the queries you do - how much do you care if GIN maintenance is 10x more expensive than GiST in that case? In most cases, you'll choose GIN indexes because that'll probably give you better performance overall. (It's more complicated, of course, but I'll ignore that here.)

      So, how did the GIN and GiST performance evolve over time? GiST indexes were introduced first - in PostgreSQL 8.0 as a contrib module (aka extension in new releases), and then in core PostgreSQL 8.3. Using the 33k queries, the time to run all of them on each release is this (i.e. lower values are better):

      fulltext-gist.png

      Interesting. It took only ~3200 seconds on PostgreSQL 8.0 - 8.2, and then it slowed down to ~5200 seconds. That may be seen as a regression, but my understanding is that this is the cost of the move into core - the contrib module was probably limited in various ways, and proper integration with the rest of the core required fixing these shortcomings.

      What about GIN? This feature was introduced in PostgreSQL 8.2, directly as in-core feature (so not as contrib module first).

      fulltext-gin.png

      Interestingly it was gradually slowing down a bit (by about ~15% between 8.2 and 9.3) - I take it as a sign that we really need regular benchmarking as part of development. Then, on 9.4 the performance significantly improved, thanks to this change:

      • Improve speed of multi-key GIN lookups (Alexander Korotkov, Heikki Linnakangas)

      also known as "GIN fastscan".

      I was discussing GIN vs. GiST maintenance cost vs. query performance a few paragraphs back, so what is the performance difference between GIN and GiST?

      fulltext-gist-vs-gin.png

      Well, in this particular benchmark, GIN indexes are about 10x faster than GiST (would be sad otherwise, because fulltext is the primary use of GIN), and as we've seen before it was not much slower than GiST maintenance-wise.

      GIN fastscan

      So what is the GIN fastscan about? I'll try to explain this, although it's of course a significantly simplified explanation.

      GIN indexes are used for indexing non-scalar data - for example, when it comes to fulltext, each document (stored in a TEXT column as a single value) is transformed into a tsvector, a list of words in the document (along with some other data, but that's irrelevant here). For example, let's assume the document with ID=10 contains the popular sentence

      10 => "The quick brown fox jumps over the lazy dog"
      

      This will get split into an array of words (this transformation may even remove some words, perform lemmatization):

      10 => ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
      

      If you build GIN index on this "vector" representation, the index will effectively invert the direction of the mapping by mapping words to IDs of all the rows containing that word (each row ID is a pair of block number and offset on that block):

      "The" => [(0,1), (0,10), (0,15), (2,4), ...]
      "quick" => [(0,1), (0,2), (2,10), (2,15), (3,18), ...]
      "brown" => [(1,10), (1,11), (1,12), ...]
      ...
      

      Then, if you do a fulltext query on the document, say

      SELECT * FROM documents WHERE to_tsvector(body) @@ to_tsquery('quick & fox');
      

      it can simply fetch the lists for quick and fox and combine them, to get only IDs of the documents containing both words.

      And this is exactly where GIN fastscan was applied. Until PostgreSQL 9.4, the performance of this combination step was determined by the longest list of document IDs, because it had to be walked. So if you had a query combining rare and common words (included in many documents, thus having long lists of IDs), it was often slow.

      GIN fastscan changes this, starting with the short posting lists, and combining the lists in a smart way (by using the fact that the lists of IDs are sorted), so that the duration is determined by the shortest list of IDs.
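
      As a rough illustration of why driving the intersection from the shortest list helps, here is a simplified Python sketch; it is only a conceptual model, not the actual GIN code, and it omits the skipping tricks the real implementation uses:

      from bisect import bisect_left

      def contains(sorted_ids, value):
          """Binary-search membership test in a sorted list of row IDs."""
          i = bisect_left(sorted_ids, value)
          return i < len(sorted_ids) and sorted_ids[i] == value

      def intersect_posting_lists(lists):
          """Intersect sorted posting lists, driven by the shortest one."""
          lists = sorted(lists, key=len)          # start from the rarest word
          shortest, rest = lists[0], lists[1:]
          return [rid for rid in shortest
                  if all(contains(longer, rid) for longer in rest)]

      # A rare word (short list) combined with a very common word (long list):
      print(intersect_posting_lists([[2, 7, 42], list(range(1, 1000))]))   # -> [2, 7, 42]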

      How much impact can this have? Let's see!

      Compression

      The fastscan is not the only improvement in 9.4 - the other significant improvement is compression of the posting lists (lists of row IDs). If you look at the previous example, you might notice that the posting list can be made quite compressible - you may sort the row IDs (first by block number, then by row offset). The block numbers will then repeat a lot, and the row offsets will be an increasing sequence.

      This redundancy may be exploited by various encoding schemes - RLE, delta, ... and that's what was done in PostgreSQL 9.4. The result is that GIN indexes are often much smaller. How much smaller really depends on the dataset, but for the dataset used in this benchmark the size dropped to 50% - from ~630MB to ~330MB. Other developers reported up to 80% savings in some cases.
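
      To get a feel for why sorted posting lists compress so well, here is a toy Python sketch of delta encoding; it only illustrates the principle and is not the actual encoding PostgreSQL 9.4 uses on disk:

      def delta_encode(sorted_ids):
          """Store each row ID as the difference from the previous one."""
          prev, deltas = 0, []
          for rid in sorted_ids:
              deltas.append(rid - prev)
              prev = rid
          return deltas

      def delta_decode(deltas):
          """Rebuild the original sorted row IDs from the deltas."""
          total, ids = 0, []
          for d in deltas:
              total += d
              ids.append(total)
          return ids

      ids = [1000, 1001, 1002, 1050, 1051]
      print(delta_encode(ids))    # [1000, 1, 1, 48, 1] - mostly small numbers
      assert delta_decode(delta_encode(ids)) == ids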

      Relative speedup

      The following chart (already presented at the beginning of this blog post) presents the speedup of a random sample from the 33k queries (plotting all the queries would only make it less readable). It shows relative speedup depending on the duration on PostgreSQL 9.3, i.e. each point plots

      • x-axis (log-scale) - duration on PostgreSQL 9.3
      • y-axis (linear) - (duration on PostgreSQL 9.4) / (duration on PostgreSQL 9.3)

      So if the query took 100 ms on PostgreSQL 9.3, and only takes 10 ms on PostgreSQL 9.4, this is represented by a point [100, 0.1].

      fulltext-timing-speedups.png

      There are a few interesting observations:

      • Only very few queries slowed down on PostgreSQL 9.4. Those queries are either very fast, taking less than 1 ms, with a slowdown factor below 1.6 (this may easily be noise), or longer but with a slowdown well below 10% (again, possibly noise).
      • The vast majority of queries are significantly faster than on PostgreSQL 9.3, which is clearly visible as an area with a high density of blue dots. The most interesting thing is that the higher the PostgreSQL 9.3 duration, the higher the speedup.

      This is perfectly consistent with the GIN fastscan - the queries that combine frequent and rare words took time proportional to the frequent word's posting list on PostgreSQL 9.3, but thanks to fastscan the performance is now determined by the rare words. Hence the exponential speedup.

      Fulltext dictionaries

      While I'm quite excited about the speedup, the actual performance depends on other things too - for example which dictionary you use. In this benchmark I've been using the english dictionary, based on a snowball stemmer - a simple algorithmic stemmer that does not use any kind of dictionary.

      If you're using a more complicated configuration - for example a dictionary-based stemmer, because that's necessary for your language, this may take quite a significant amount of time (especially if you're not using connection pooling and so the dictionaries need to be parsed over and over again - my shared_ispell project might be interesting in this case).

      GIN indexes as bitmap indexes

      PostgreSQL does not have traditional bitmap indexes, i.e. indexes serialized into simple on-disk bitmaps. There were attempts to implement that feature in the past, but the gains never really outweighed the performance issues (locking and such), especially since 8.2 when bitmap index scans were implemented (i.e. construction of bitmaps from btree indexes at runtime).

      But if you think about it, GIN indexes really are bitmap indexes, with a different bitmap serialization format. If you're craving bitmap indexes (not uncommon in analytical workloads), you might try the btree_gin extension, which makes it possible to create GIN indexes on scalar types (by default GIN can be built only on vector-like types - tsvector and such).

      Summary

      • The wisdom "GIN indexes are faster to query but more expensive to maintain" may not be true anymore, especially if the query performance is more important for you.
      • Load performance improved a lot, especially in PostgreSQL 8.2 (GiST) and 8.4 (GIN).
      • Query performance for GiST is mostly the same (at least since PostgreSQL 8.3 when GiST was included into core).
      • For GIN, the query performance was mostly the same until PostgreSQL 9.4, when the "fastscan" significantly improved performance of queries combining rare and frequent keys.

      Joshua Drake: Stomping to PgConf.US: Webscale is Dead; PostgreSQL is King! A challenge, do you accept?

      From Planet PostgreSQL. Published on Mar 17, 2015.

      I submitted to PgConf.US. I submitted talks from my general pool. All of them have been recently updated. They are also all solid talks that have been well received in the past. I thought I would end up giving my, "Practical PostgreSQL Performance: AWS Edition" talk. It is a good talk, is relevant to today and the community knows of my elevated opinion of using AWS with PostgreSQL (there are many times it works just great, until it doesn't and then you may be stuck).

      I also submitted a talk entitled: "Suck it! Webscale is Dead; PostgreSQL is King!". This talk was submitted as a joke. I never expected it to be accepted, it hadn't been written, the abstract was submitted on the fly, improvised and in one take. Guess which talk was accepted? "Webscale is Dead; PostgreSQL is King!". They changed the first sentence of the title which is absolutely acceptable. The conference organizers know their audience best and what should be presented.

      What I have since learned is that the talk submission committee was looking for dynamic talks, dynamic content, and new, inspired ideas. A lot of talks that would have been accepted in years past weren't and my attempt at humor fits the desired outcome. At first I thought they were nuts but then I primed the talk at SDPUG/PgUS PgDay @ Southern California Linux Expo.

      I was the second to last presenter on Thursday. I was one hour off the plane. I was only staying the night and flying home the next morning, early. The talk was easily the best received talk I have given. The talk went long, the audience was engaged; laughter, knowledge and opinions abounded. When the talk was over, it was given enthusiastic applause and, with a definite need for water, I left the room.

      I was followed by at least 20 people, if not more. I don't know how many there were but it was more than I have ever had follow me after a talk before. I was deeply honored by the reception. One set of guys that approached me said something to the effect of: "You seem like you don't mind expressing your opinions". At this point, some of you reading may need to get a paper towel for your coffee because those that know me, know I will readily express an opinion. I don't care about activist morality or political correctness. If you don't agree with me, cool. Just don't expect me to agree with you. My soapbox is my own, rent is 2500.00 a minute, get in line. I digress, what did those guys ask me about? Systemd, I don't think they were expecting my answer, because I don't really have a problem with Systemd.

      Where am I going with this post? I am stomping my way to PgConf.US with an updated version of this talk (You always learn a few things after giving a performance). I am speaking in the first slot on Friday and I am going to do everything I can to bring it. I can't promise to be the best, I can promise to do everything in my power to be my best. I am being recorded this time. My performance will be on the inner tubes forever. I have no choice.

      A challenge, do you accept?

      I challenge all speakers at this voyage of PgConf.US to take it up a notch. If you were accepted, you have a responsibility to do so. Now, now, don't get me wrong. I am not suggesting that you put on a chicken suit and Fox News t-shirt to present. I am however suggesting that if you are a monotone speaker, try not to be. If you are boring, your audience will be bored and that is the last thing the conference, you or the audience wants. So speak from your diaphragm, engage the audience and make their time worth it!

      Why RapidSMS for SMS Application Development

      By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Mar 16, 2015.

      Caktus has been involved in quite a few projects (Libyan voter registration, UNICEF Project Mwana, and several others) that include text messaging (a.k.a. Short Message Service, or SMS), and we always use RapidSMS as one of our tools. We've also invested our own resources in supporting and extending RapidSMS.

      There are other options; why do we consistently choose RapidSMS?

      What is RapidSMS

      First, what is RapidSMS? It's an open source package of useful tools that extend the Django web development framework to support processing text messages. It includes:

      • A framework for writing code to be invoked when a text message is received and to respond to it (see the minimal sketch just after this list)
      • A set of backends - pluggable code modules that can interface to various ways of connecting your Django program to the phone network to pass text messages back and forth
      • Sample applications
      • Documentation
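
      As a rough idea of what the first item looks like in practice, here is a minimal sketch of a RapidSMS app; the app and its "ping" keyword are invented for this example:

      from rapidsms.apps.base import AppBase

      class PingApp(AppBase):
          def handle(self, msg):
              # msg.text is the incoming SMS body; respond() queues a reply
              # through whichever backend the message arrived on.
              if msg.text.strip().lower() == "ping":
                  msg.respond("pong")
                  return True    # handled - stop other apps from processing it
              return False       # not ours - let other apps take a look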

      The backends are required because unlike email, there's no universal standard for sending and receiving text messages over the Internet. Often we get access to the messages via a third party vendor, like Twilio or Tropo, that provides a proprietary interface. RapidSMS isolates us from the differences among vendors.

      RapidSMS is open source, under the BSD license, with UNICEF acting as holder of the contributors' agreements (granting a license for RapidSMS to use and distribute their contributions). See the RapidSMS license for more about this.

      Alternatives

      Here are some of the alternatives we might have chosen:

      • Writing from scratch: starting each project new and building the infrastructure to handle text messages again
      • Writing to a particular vendor's API: writing code that sends and receives text messages using the programming interface provided by one of the online vendors that provide that service, then building applications around that
      • Other frameworks

      Why RapidSMS

      Why did we choose RapidSMS?

      • RapidSMS builds on Django, our favorite web development framework.
      • RapidSMS is at the right level for us. It provides components that we can use to build our own applications the way we need to, and the flexibility to customize its behavior.
      • RapidSMS is open source, under the BSD license. There are no issues with our use of it, and we are free to extend it when we need to for a particular project. We then have the opportunity to contribute our changes back to the RapidSMS community.
      • RapidSMS is vendor-neutral. We can build our applications without being tied to any particular vendor of text messaging services. That's good for multiple reasons:
        • We don't have to pick a vendor before we can start.
        • We could change vendors in the future without having to rewrite the applications.
        • We can deploy applications to different countries that might not have any common vendor for messaging services.

      It's worth noting that using RapidSMS doesn't even require using an Internet text messaging vendor. We can use other open source applications like Vumi or Kannel as a gateway to provide us with even more options:

      • use hardware called a "cellular/GSM modem" (basically a cell phone with a connection to a computer instead of a screen)
      • interface directly to a phone company's own servers over the Internet, using several widely used protocols

      Summary

      RapidSMS is a good fit for us at Caktus, it adds a lot to our projects, and we've been pleased to be able to contribute back to it.

      Caktus will be leading a workshop on building RapidSMS applications during PyCon 2015 on Tuesday, April 7th 3:00-5:30.

      My approach to Class Based Views

      By Luke Plant from Django community aggregator: Community blog posts. Published on Mar 16, 2015.

      I've written in the past about my dislike for Django's Class Based Views. Django's CBVs add a lot of complexity and verbosity, and simply get in the way of some moderately common patterns (e.g. when you have two forms in a single view). It seems I'm not alone as a Django core dev who thinks that way.

      In this post, however, I'll write about a different approach that I took in one project, which can be summed up like this:

      Write your own base class.

      For really simple model views, Django's own CBVs can be a time saver. For anything more complex, you will run into difficulties, and will need some heavy documentation at the very least.

      One solution is to use a simplified re-implementation of Class Based Views. My own approach is to go even further and start from nothing, writing your own base class, while borrowing the best ideas and incorporating only what you need.

      Steal the good ideas

      The as_view method provided by Django's View class is a great idea — while it may not be obvious, it was hammered out after a lot of discussion as a way to help promote request isolation by creating a new instance of the class to handle every new request. So I'll happily steal that!

      Reject the bad

      Personally I dislike the dispatch method with its assumption that handling of GET and POST is going to be completely different, when often they can overlap a lot (especially for typical form handling). It has even introduced bugs for me where a view rejected POST requests, when what it needed to do was just ignore the POST data, which required extra code!

      So I replaced that with a simple handle function that you have to implement to do any logic.
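
      To make that concrete, here is a minimal sketch of such a base class (my own illustration of the approach, not the author's actual code):

      class View(object):
          @classmethod
          def as_view(cls):
              # A fresh instance per request keeps request state isolated.
              def view(request, *args, **kwargs):
                  instance = cls()
                  instance.request = request
                  return instance.handle(request, *args, **kwargs)
              return view

          def handle(self, request, *args, **kwargs):
              # Subclasses implement all GET/POST logic in one place.
              raise NotImplementedError

      A subclass then implements handle() and is hooked into the URLconf via as_view(), just like Django's own CBVs.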

      I also don't like the way that template names are automatically built from model names etc. — this is convention over configuration, and it makes life unnecessarily hard for a maintenance programmer who greps to find out where a template is used. If that kind of logic is used, you just Have To Know where to look to see if a template is used at all and how it is used. So that is going.

      Flatten the stack

      A relatively flat set of base classes is going to be far easier to manage than a large set of mixins and base classes. By using a flat stack, I can avoid writing crazy hacks to subvert what I have inherited.

      Write the API you want

      For instance, one of the things I really dislike about Django's CBVs is the extremely verbose way of adding new data to the context, which is something that ought to be really easy, but instead requires 4 lines:

      class MyView(ParentView):
          def get_context_data(self, **kwargs):
              context = super().get_context_data(**kwargs)
              context['title'] = "My title"  # This is the only line I want to write!
              return context
      

      In fact, it is often worse, because the data to add to the context may actually have been calculated in a different method, and stuck on self so that get_context_data could find it. And you also have the problem that it is easy to do it wrong e.g. if you forget the call to super things start breaking in non-obvious ways.

      (In searching GitHub for examples, I actually found hundreds and hundreds of examples that look like this:

      class HomeView(TemplateView):
          # ...
      
          def get_context_data(self):
              context = super(HomeView, self).get_context_data()
              return context
      

      This doesn't make much sense, until I realised that people are using boilerplate generators/snippets to create new CBVs — such as this for emacs and this for vim, and this for Sublime Text. You know you have created an unwieldy API when people need these kinds of shortcuts.)

      So, the answer is:

      Imagine the API you want, then implement it.

      This is what I would like to write for static additions to the context:

      class MyView(ParentView):
          context = {'title': "My title"}
      

      and for dynamic:

      class MyView(ParentView):
          def context(self):
              return {'things': Thing.objects.all() if self.request.user.is_authenticated()
                                else Thing.objects.public()}

          # Or perhaps using a lambda:
          context = lambda self: ...
      

      And I would like any context defined by ParentView to be automatically accumulated, even though I didn't explicitly call super. (After all, you almost always want to add to context data, and if necessary a subclass could remove specific inherited data by setting a key to None).

      I'd also like for any method in my CBV to simply be able to add data to the context directly, perhaps by setting/updating an instance variable:

      class MyView(ParentView):
      
          def do_the_thing(self):
              if some_condition():
                  self.context['foo'] = 'bar'
      

      Of course, it goes without saying that this shouldn't clobber anything at the class level and violate request isolation, and all of these methods should work together nicely in the way you would expect. And it should be impossible to accidentally update any class-defined context dictionary from within a method.

      Now, sometimes after you've finished dreaming, you find your imagined API is too tricky to implement due to a language issue, and has to be modified. In this case, the behaviour is easily achievable, although it is a little bit magic, because normally defining a method in a subclass without using super means that the super class definition would be ignored, and for class attributes you can't use super at all.

      So, my own preference is to make this more obvious by using the name magic_context for the first two (the class attribute and the method). That way I get the benefits of the magic, while not tripping up any maintainer — if something is called magic_foo, most people are going to want to know why it is magic and how it works.

      The implementation uses a few tricks, the heart of which is using reversed(self.__class__.mro()) to get all the super-classes and their magic_context attributes, iteratively updating a dictionary with them.
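
      For illustration, a minimal sketch of that accumulation might look like this (a guess at the shape based on the description above, not the project's actual code):

      class MagicContextMixin(object):
          def get_magic_context(self):
              context = {}
              # Walk the MRO from most-generic to most-specific class, so that
              # subclasses can override keys defined by their parents.
              for klass in reversed(self.__class__.mro()):
                  magic = klass.__dict__.get("magic_context")
                  if magic is None:
                      continue
                  # magic_context can be a plain dict or a method returning one.
                  context.update(magic(self) if callable(magic) else magic)
              return context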

      Notice too how the TemplateView.handle method is extremely simple, and just calls out to another method to do all the work:

      class TemplateView(View):
          # ...
          def handle(self, request):
              return self.render({})
      

      This means that a subclass that defines handle to do the actual logic doesn't need to call super, but just calls the same method directly:

      class MyView(TemplateView):
          template_name = "mytemplate.html"
      
          def handle(self, request):
              # logic here...
              return self.render({'some_more': 'context_data'})
      

      In addition to these things, I have various hooks that I use to handle things like AJAX validation for form views, and RSS/Atom feeds for list views etc. Because I'm in control of the base classes, these things are simple to do.

      Conclusion

      I guess the core idea here is that you shouldn't be constrained by what Django has supplied. There is actually nothing about CBVs that is deeply integrated into Django, so your own implementation is just as valid as Django's, but you can make it work for you. I would encourage you to write the actual code you want to write, then make the base class that enables it to work.

      The disadvantage, of course, is that maintenance programmers who have memorised the API of Django's CBVs won't benefit from that in the context of a project which uses another set of base classes. However, I think the advantages more than compensate for this.

      Feel free to borrow any of the code or ideas if they are useful!

      Saving processes and threads in a WSGI server with Moya

      By Will McGugan from Django community aggregator: Community blog posts. Published on Mar 14, 2015.

      I have a webserver with 3 WSGI applications running on different domains (1, 2, 3). All deployed with a combination of Gunicorn and NGINX. A combination that works really well, but there are two annoyances that are only going to get worse the more sites I deploy:

      A) The configuration for each server resides in a different location on the filesystem, so I have to recall & type a long path to edit settings.

      B) More significantly, each server adds extra resource requirements. I follow the advice of running each WSGI application with (2 * number_of_cores + 1) processes, each with 8 threads. The threads may be overkill, but that ensures that the server can use all available capacity to handle dynamic requests. On my 4 core server, that's 9 processes, 72 threads per site. Or 27 processes, and 216 threads for the 3 sites. Clearly that's not scalable if I want to host more web applications on one server.

      A new feature recently added to Moya fixes both those problems. Rather than deploy a WSGI application for each site, Moya can now optionally create a single WSGI application that serves many sites. With this new system, configuration is read from /etc/moya/, which contains a directory structure like this:

      |-- logging.ini
      |-- moya.conf
      |-- sites-available
      |   |-- moyapi.ini
      |   |-- moyaproject.ini
      |   `-- notes.ini
      `-- sites-enabled
          |-- moyapi.ini
          |-- moyaproject.ini
          `-- notes.ini

      At the top level is “moya.conf” which contains a few server-wide settings, and “logging.ini” which contains logging settings. The directories “sites-available” and “sites-enabled” work like Apache and NGINX servers; settings for each site are read from “sites-enabled”, which contains symlinks to files in “sites-available”.

      Gunicorn (or another WSGI server) can run these sites with a single instance by specifying the WSGI module as “moya.service:application”. This application object dispatches the request to the appropriate server (based on a domain defined in the INI).
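
      Conceptually, such a dispatcher just picks a site based on the request's Host header. Here is a tiny, purely hypothetical WSGI sketch of that idea (not Moya's actual code; the site_apps mapping is made up):

      def make_dispatcher(site_apps, default_app):
          """site_apps maps a domain name to a WSGI application object."""
          def application(environ, start_response):
              host = environ.get("HTTP_HOST", "").split(":")[0].lower()
              app = site_apps.get(host, default_app)
              return app(environ, start_response)
          return application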

      Because all three sites are going through a single Gunicorn instance, only one lot of processes / threads will ever be needed. And the settings files are much easier to locate. Another advantage is that there will be less configuration required to add another site.

      This new multi-server system is somewhat experimental, and hasn't been documented. But since I believe in eating my own dog-food, it has been live now for a whole hour–with no problems.

      How to dynamically create SEO friendly URLs for your site's images

      By Cloudinary Blog - Django from Django community aggregator: Community blog posts. Published on Mar 11, 2015.

      SEO friendly image URLs

      Image URLs tend to appear as a long list of random characters that are not intended for viewers and are not very useful to search engines. Concise and meaningful image file names are better for search engines to extract information about an image, therefore supporting your site's SEO ranking.

      Oftentimes, users' uploaded image files do not have descriptive names. This creates a great challenge for developers and site content managers who need to maintain SEO friendly, short, and meaningful URLs. Cloudinary helped tackle this issue by offering users two new features: Root Path URL and Dynamic SEO suffixes. These features can be useful for website owners as well as web application owners, and are especially recommended for content-heavy sites, like online magazines and news sites.

      Root Path URLs

      Cloudinary's image URLs are delivered via CDN and use folder names as their path prefixes. These include resource and image types, such as /image/upload and /raw/upload. The most popular prefix among Cloudinary's users is /image/upload. Now, with Cloudinary's Root Path URL feature, the /image/upload section of URLs can be removed, leaving only the image's public ID (file name) at the path's root.

      Below is an example of an image that was uploaded to Cloudinary and was assigned basketball_shot as the public ID:

      And here is an example of a Cloudinary image URL that uses the Root Path URL feature:

      Ruby:
      cl_image_tag("basketball_shot.jpg", :use_root_path=>true)
      PHP:
      cl_image_tag("basketball_shot.jpg", array("use_root_path"=>true))
      Python:
      CloudinaryImage("basketball_shot.jpg").image(use_root_path=True)
      Node.js:
      cloudinary.image("basketball_shot.jpg", {use_root_path: true})
      Java:
      cloudinary.url().useRootPath(true).imageTag("basketball_shot.jpg")
      jQuery:
      $.cloudinary.image("basketball_shot.jpg", {use_root_path: true})
      .Net:
      cloudinary.Api.UrlImgUp.UseRootPath(true).BuildImageTag("basketball_shot.jpg")

      Both URLs yield the same uploaded image:

      Basketball shot sample image

      With the Root Path URL capability, users can also add parameters for on-the-fly image manipulation. For example, if an uploaded image needs to be cropped to 200 x 200 pixels, it can be transformed simply by setting the width and height parameters to 200 and the crop mode to 'fill':

      Ruby:
      cl_image_tag("basketball_shot.jpg", :width=>200, :height=>200, :crop=>:fill, :use_root_path=>true)
      PHP:
      cl_image_tag("basketball_shot.jpg", array("width"=>200, "height"=>200, "crop"=>"fill", "use_root_path"=>true))
      Python:
      CloudinaryImage("basketball_shot.jpg").image(width=200, height=200, crop="fill", use_root_path=True)
      Node.js:
      cloudinary.image("basketball_shot.jpg", {width: 200, height: 200, crop: "fill", use_root_path: true})
      Java:
      cloudinary.url().transformation(new Transformation().width(200).height(200).crop("fill")).useRootPath(true).imageTag("basketball_shot.jpg")
      jQuery:
      $.cloudinary.image("basketball_shot.jpg", {width: 200, height: 200, crop: "fill", use_root_path: true})
      .Net:
      cloudinary.Api.UrlImgUp.Transform(new Transformation().Width(200).Height(200).Crop("fill")).UseRootPath(true).BuildImageTag("basketball_shot.jpg")
      200x200 basketball shot thumbnail

      When using Cloudinary's client libraries (SDKs) to build delivery URLs and to add image tags, you simply need to set the new parameter, use_root_path, to true.

      The following code sample was used to create an HTML image tag with an image URL using the Root Path URL feature:

      Ruby:
      cl_image_tag("basketball_shot.jpg", :width=>200, :height=>200, :crop=>:fill, :use_root_path=>true)
      PHP:
      cl_image_tag("basketball_shot.jpg", array("width"=>200, "height"=>200, "crop"=>"fill", "use_root_path"=>true))
      Python:
      CloudinaryImage("basketball_shot.jpg").image(width=200, height=200, crop="fill", use_root_path=True)
      Node.js:
      cloudinary.image("basketball_shot.jpg", {width: 200, height: 200, crop: "fill", use_root_path: true})
      Java:
      cloudinary.url().transformation(new Transformation().width(200).height(200).crop("fill")).useRootPath(true).imageTag("basketball_shot.jpg")
      jQuery:
      $.cloudinary.image("basketball_shot.jpg", {width: 200, height: 200, crop: "fill", use_root_path: true})
      .Net:
      cloudinary.Api.UrlImgUp.Transform(new Transformation().Width(200).Height(200).Crop("fill")).UseRootPath(true).BuildImageTag("basketball_shot.jpg")

      Dynamic SEO Suffixes

      Many of our users have requested a Cloudinary capability that creates image URLs that are more comprehensive and descriptive. Each image uploaded to Cloudinary is given a public ID, which is its name for delivery URLs. Cloudinary already offers the ability to define custom public IDs with a string of text or multiple folder names (separated by slashes), while uploading images. These public IDs can be as descriptive as necessary.

      Our new feature allows you to separate the process of uploading an image and assigning a public ID from creating descriptive URLs. If an image is not given a suitable name during the upload process, you will be able to assign additional URLs to the image afterwards. For example, with this feature, you can dynamically add multiple different suffixes to create as many descriptive URLs as necessary for your site. You may want to use these URLs to support different languages for a single image or to reflect specific content on certain pages.

      To add a dynamic SEO suffix, an image’s path prefix must first be changed from the default /image/upload to the shorter version /images.

      Here is an example of an image that was uploaded with the ID ltepu4mm0qzw6lkfxt1m and is delivered by the following CDN optimized URL using a standard path prefix:

      Ruby:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg")
      PHP:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg")
      Python:
      CloudinaryImage("ltepu4mm0qzw6lkfxt1m.jpg").image()
      Node.js:
      cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg")
      Java:
      cloudinary.url().imageTag("ltepu4mm0qzw6lkfxt1m.jpg")
      jQuery:
      $.cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg")
      .Net:
      cloudinary.Api.UrlImgUp.BuildImageTag("ltepu4mm0qzw6lkfxt1m.jpg")
      Image delivery

      Below, the suffix basketball-game-in-college was added, which is the text that search engines use to index the page and image:

      Ruby:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg", :url_suffix=>"basketball-game-in-college")
      PHP:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg", array("url_suffix"=>"basketball-game-in-college"))
      Python:
      CloudinaryImage("ltepu4mm0qzw6lkfxt1m.jpg").image(url_suffix="basketball-game-in-college")
      Node.js:
      cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg", {url_suffix: "basketball-game-in-college"})
      Java:
      cloudinary.url().urlSuffix("basketball-game-in-college").imageTag("ltepu4mm0qzw6lkfxt1m.jpg")
      jQuery:
      $.cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg", {url_suffix: "basketball-game-in-college"})
      .Net:
      cloudinary.Api.UrlImgUp.UrlSuffix("basketball-game-in-college").BuildImageTag("ltepu4mm0qzw6lkfxt1m.jpg")

      In the URL below, the same image is given an additional, separate suffix in Spanish:

      Ruby:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg", :url_suffix=>"baloncesto-juego-en-universidad")
      PHP:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg", array("url_suffix"=>"baloncesto-juego-en-universidad"))
      Python:
      CloudinaryImage("ltepu4mm0qzw6lkfxt1m.jpg").image(url_suffix="baloncesto-juego-en-universidad")
      Node.js:
      cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg", {url_suffix: "baloncesto-juego-en-universidad"})
      Java:
      cloudinary.url().urlSuffix("baloncesto-juego-en-universidad").imageTag("ltepu4mm0qzw6lkfxt1m.jpg")
      jQuery:
      $.cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg", {url_suffix: "baloncesto-juego-en-universidad"})
      .Net:
      cloudinary.Api.UrlImgUp.UrlSuffix("baloncesto-juego-en-universidad").BuildImageTag("ltepu4mm0qzw6lkfxt1m.jpg")

      Additional image transformations can be easily made by adding parameters to Cloudinary's on-the-fly manipulation URLs. Here, the same image is transformed to a 200 x 200 pixel crop with rounded corners and increased saturation:

      Ruby:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg", :radius=>30, :width=>200, :height=>200, :crop=>:fill, :gravity=>:west, :effect=>"saturation:50", :url_suffix=>"basketball-game-in-college")
      PHP:
      cl_image_tag("ltepu4mm0qzw6lkfxt1m.jpg", array("radius"=>30, "width"=>200, "height"=>200, "crop"=>"fill", "gravity"=>"west", "effect"=>"saturation:50", "url_suffix"=>"basketball-game-in-college"))
      Python:
      CloudinaryImage("ltepu4mm0qzw6lkfxt1m.jpg").image(radius=30, width=200, height=200, crop="fill", gravity="west", effect="saturation:50", url_suffix="basketball-game-in-college")
      Node.js:
      cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg", {radius: 30, width: 200, height: 200, crop: "fill", gravity: "west", effect: "saturation:50", url_suffix: "basketball-game-in-college"})
      Java:
      cloudinary.url().transformation(new Transformation().radius(30).width(200).height(200).crop("fill").gravity("west").effect("saturation:50")).urlSuffix("basketball-game-in-college").imageTag("ltepu4mm0qzw6lkfxt1m.jpg")
      jQuery:
      $.cloudinary.image("ltepu4mm0qzw6lkfxt1m.jpg", {radius: 30, width: 200, height: 200, crop: "fill", gravity: "west", effect: "saturation:50", url_suffix: "basketball-game-in-college"})
      .Net:
      cloudinary.Api.UrlImgUp.Transform(new Transformation().Radius(30).Width(200).Height(200).Crop("fill").Gravity("west").Effect("saturation:50")).UrlSuffix("basketball-game-in-college").BuildImageTag("ltepu4mm0qzw6lkfxt1m.jpg")
      200x200 thumbnail with dynamic SEO suffix

This capability is also applicable to non-image raw file uploads. In the delivery URL, the /raw/upload resource type should be replaced by /files, and a suffix added after it. When using Cloudinary's SDKs for the various development frameworks, simply set the new url_suffix parameter to any text; URLs will be built automatically with the /files prefix as well as the added suffix.
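In the Python SDK, for instance, a hedged sketch might look like the following (the public ID, suffix and cloud name are made-up illustrative values, and private_cdn is assumed because, as noted in the summary below, URL suffixes require a private CDN setup):

import cloudinary.utils

# Hedged sketch: "report.pdf", the suffix and the cloud name are illustrative.
# private_cdn=True is assumed since URL suffixes require a private CDN setup.
url, options = cloudinary.utils.cloudinary_url(
    "report.pdf",
    resource_type="raw",
    url_suffix="annual-sales-report",
    cloud_name="demo",
    private_cdn=True,
)
# url is now built with the /files delivery type plus the chosen suffix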

      Use Your Own Domain

The Dynamic SEO Suffix capability is also available with a custom domain (CNAME) or private CDN setup. Cloudinary enables you to use your own domain name to further customize URLs with our Advanced plan or higher.

      For example:

http://images.<mydomain.com>/w_200/afe6c8e2ca/basketball-game.jpg

      We invite you to contact us to learn more and enable these advanced features.

      Summary

      With these two capabilities, Cloudinary aims to help you easily create advanced image manipulation and delivery URLs to better optimize your site for search engines. Cloudinary users can use both the Root Path URLs and Dynamic SEO Suffix features together to build a short and descriptive image URL. The Root Path URL capability is available for all accounts, including the free tier, and the Dynamic SEO Suffix capability is available with Cloudinary’s Advanced Plan or higher with a private CDN setup.

If you don't have a Cloudinary account yet, we invite you to sign up for a free account here.

      Documenting Python without Sphinx

      By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Mar 09, 2015.

Last week Evennia merged its development branch with all the features mentioned in the last post. Since the merge we have gone through and fixed the remaining bugs, shortening the list at a good clip.

One thing I have been considering is how to make Evennia's API auto-documenting - we are, after all, a MUD creation library, and while our code has always been well documented, the docs were previously only accessible from the source files themselves.

Now, when you hear "Python" and "documentation" in the same sentence, the first thought usually involves Sphinx or Sphinx autodoc in some form. Sphinx produces very nice looking documentation indeed. My problems, however, are as follows:
      • I don't want our API documentation to be written in a different format from the rest of our documentation, which is in Github's wiki using Markdown.  Our users should be able to help document Evennia without remembering which formatting language is to be used.
      • I don't like reStructuredText syntax. This is a personal thing. I get that it is powerful but it is also really, really ugly to read in its raw form in the source code. I feel the sources must be easy to read on their own.
• Sphinx plugins like napoleon understand this ugliness and allow you to document your functions and classes in a saner form, such as the "Google style". One still needs reST for in-place formatting though.
• Organizing Sphinx document trees is fiddly, and having had a few runs with Sphinx autodoc, it's just a mess trying to get it to section the Evennia sources in a way that makes sense. It could probably be done if I worked a lot more with it, but it's a generic page generator and I feel I would eventually have to sit down and write those toctrees myself before I'm happy.
      • I want to host the API docs as normal Wiki files on Github (this might be possible with reST too I suppose).

      Long story short, rather than messing with getting Sphinx to do what I want, I ended up writing my own api-to-github-Markdown parser for the Evennia sources: api2md. Using Python's inspect module and aiming for a subset of the Google formatted docstrings, this was maybe a day's work in total - the rest was/is fine-tuning for prettiness.
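The core idea is small enough to sketch. The snippet below is not the actual api2md code, just a minimal illustration of using inspect to turn one module's docstrings into a Markdown page (the module and output file names are hypothetical):

import inspect

def module_to_markdown(module):
    # one Markdown page per module: module docstring first,
    # then one section per public class or function
    lines = ["# %s" % module.__name__, "", inspect.getdoc(module) or ""]
    for name, obj in inspect.getmembers(module):
        if name.startswith("_"):
            continue
        if inspect.isclass(obj) or inspect.isfunction(obj):
            lines.append("")
            lines.append("## %s" % name)
            lines.append(inspect.getdoc(obj) or "*(no docstring yet)*")
    return "\n".join(lines)

# usage (hypothetical module and output file):
# import evennia.objects.models as mod
# open("evennia.objects.models.md", "w").write(module_to_markdown(mod))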
       
Now whenever the source is updated, I use the following procedure to fully update the API docs:

      1. I pull the latest version of Evennia's wiki git repository from github alongside the latest version of the main Evennia repository.
2. I run api2md on the changed Evennia sources. This crawls the main repo for top-level package imports (a small list currently hard-coded in the crawler, so it knows which modules should create "submodule" pages rather than try to list class contents etc). Under each package I specify, it then recursively gets all modules. For each module in that package it creates a new Markdown-formatted wiki page, which it drops in a folder in the wiki repository. The files are named after the module's path in the library, meaning you get files like evennia.objects.models.md and can easily cross-link to subsections (i.e. classes and functions) on a page using page anchors.
3. I add any new files and commit the changes, then push the result to the Github wiki online. Done!
      (I could probably automate this with a git hook. Maybe as a future project.)

      The api2md program currently has some Evennia-custom elements in it (notably in which packages it imports) but it's otherwise a very generic parser of Python code into Markdown. It could maybe be broken out into its own package at some point if there's interest.   

The interesting thing is that since I already have code for converting our wiki to reST and ReadTheDocs, I should be able to get the best of both worlds and convert our API wiki pages the same way later. The result will probably not be quite as polished as a Sphinx-generated autodoc (Markdown is a lot simpler in what formatting options it can offer), but that is a lesser concern.
So far very few of Evennia's docstrings are actually updated for the Google-style syntax (or any type of formatting, really), so the result is often not too useful. We hope that many people will help us with the docstrings in the future - it's a great and easy way to get to know Evennia while helping out.

      But where the sources are updated, the auto-generated wiki page looks pretty neat.


      (Image from Wikimedia commons)

      How to build a notification feed using Stream

      By Thierry Schellenbach from Django community aggregator: Community blog posts. Published on Mar 06, 2015.

      Tommaso wrote a quick tutorial on how to build a notification feed using Django and Stream.
      How to build a notification feed using Stream.


      Connecting Django Models with outer applications

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

Preface: Sometimes parts of the data that you have to display in your application reside outside the Django models. A simple example is the following case: the client wants you to build them a webshop, but they already have a CRM solution that holds their product info. Of course, they provide you with a mechanism to read this data from their CRM.

Specialty: The problem is that the data in their CRM does not hold some of the product information that you need. For instance, it lacks an SEO-friendly description and a product image. So you will have to set up a model on your side and store this extra data there. Joining the two is easy; the only thing you need is a simple unique key for every product.

      Solution: Here we use the product_id field to make the connection between the CRM data and the Django model.
# in models.py
from django.db import models
from django.utils.translation import ugettext_lazy as _
from filer.fields.image import FilerImageField

import crm_api  # the client's CRM access layer


class Product(models.Model):
    product_id = models.IntegerField(_('Original Product'), unique=True)
    description = models.TextField(_('SEO-friendly Description'), blank=True)
    pod_image = FilerImageField(verbose_name=_('Product Image'), blank=True, null=True)

    @property
    def name(self):
        return crm_api.get_product_name(self.product_id)


# in forms.py
from django import forms

import crm_api

from .models import Product


class ProductForm(forms.ModelForm):
    name = forms.CharField(required=False,
                           widget=forms.TextInput(attrs={'readonly': True,
                                                         'style': 'border: none'}))

    class Meta:
        model = Product
        widgets = {
            'product_id': forms.Select(),
        }

    def __init__(self, *args, **kwargs):
        super(ProductForm, self).__init__(*args, **kwargs)
        self.fields['product_id'].widget.choices = crm_api.get_product_choices()
        if self.instance.id:
            self.fields['name'].initial = self.instance.name

The form here should be used in the admin (add/edit) page of the model. We define that the product_id field will use the select widget, and we use a method that connects to the CRM and returns the product choices list.
The "self.instance.id" check is used to fill the name field for products that are already saved.

Final words: This is a very simple example, but its idea is to show the basic way to connect your models with another app. I strongly recommend using caching if your CRM data is not modified very often, in order to save some bandwidth and speed up your application.
Also, if you have multiple fields, it may be better to override the __getattr__ method instead of defining a separate property for each value that you need to pull from the outer application.
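As a minimal sketch of the caching suggestion (not part of the original snippet), the CRM lookup could be wrapped with Django's cache framework; crm_api is the same hypothetical CRM access layer used above, and the one-hour timeout is arbitrary:

from django.core.cache import cache

import crm_api


def get_product_name(product_id):
    # cache the CRM lookup so repeated page views don't hit the CRM every time
    key = 'crm_product_name_%s' % product_id
    name = cache.get(key)
    if name is None:
        name = crm_api.get_product_name(product_id)
        cache.set(key, name, 60 * 60)  # keep it for an hour
    return name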

      P.S. Thanks to Miga for the code error report.

      Software for business

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

I am starting a new blog. The reason is that I want to keep this one more technically oriented, while the other will be more business- and customer-oriented. Its name is Software for business, and the idea is to show businesses, in less technical detail, how modern IT technologies like CRM & ERP systems, web sites, e-commerce solutions, online marketing and so on can help them.
      If you find the topic interesting feel free to join.

      Faking attributes in Python classes...

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

      ... or how to imitate dynamic properties in a class object

Preface: When your application talks to other systems, the data is frequently not in the most useful form for your needs. Having an API is awesome, but sometimes it just does not act the way you want, and your code quickly becomes a series of repeating API calls like api.get_product_property(product_id, property).
Of course, it will be easier if you can use objects to represent the data in your code, so you can create something like a proxy class to this API:
class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    @property
    def name(self):
        return api_obj.get_product_property(self.id, 'name')

    @property
    def price(self):
        return api_obj.get_product_property(self.id, 'price')

# usage
product = Product(product_id)
print product.name

In my opinion this is cleaner, prettier and more useful than the direct API calls. But still there is something not quite right.

Problem: Your model has not two but twenty properties. Defining twenty methods does not make the code look good, and amending the code every time you need a new property is quite boring. So is there a better way? As I mentioned at the end of Connecting Django Models with outer applications, if you have a class that plays the role of a proxy to another API or other data, it may be easier to override the __getattr__ method.

Solution:

class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    def __getattr__(self, key):
        return api_obj.get_product_property(self.id, key)

# usage
product = Product(product_id)
print product.name

Now you can directly use the product properties as attribute names of the Product class. Depending on how the API works, it would be good to raise AttributeError if there is no such property for the product.
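A minimal sketch of that last point, assuming the API offers some way to ask whether a property exists (the has_product_property call below is hypothetical):

class Product(object):
    def __init__(self, product_id):
        self.id = product_id

    def __getattr__(self, key):
        # has_product_property is a hypothetical check; use whatever the API offers
        if not api_obj.has_product_property(self.id, key):
            raise AttributeError(key)
        return api_obj.get_product_property(self.id, key)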

      Simple Site Checker

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

      ... a command line tool to monitor your sitemap links

I had been thinking about making such a tool for a while and fortunately I found some time, so here it is.

Simple Site Checker is a command line tool that allows you to run a check over the links in your XML sitemap.

How it works: The script requires a single argument - a URL or a relative/absolute path to an XML sitemap. It loads the XML, reads all loc tags in it and starts checking the links in them one by one.
By default you will see no output unless there is an error - either the script is unable to load the sitemap or a link check fails.
Using the verbosity argument you can control the output if you need more detailed information, like elapsed time, checked links etc.
You can run this script through a cron-like tool and get an e-mail in case of error.
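The essence of such a check is small enough to sketch. The snippet below is not the Simple Site Checker code itself, just a rough illustration of the loop it performs over the sitemap's loc tags:

import urllib2
import xml.etree.ElementTree as ET

SITEMAP_NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

def check_sitemap(sitemap_url):
    # load the sitemap and collect every <loc> URL
    tree = ET.parse(urllib2.urlopen(sitemap_url))
    errors = []
    for loc in tree.iter(SITEMAP_NS + 'loc'):
        url = loc.text.strip()
        try:
            urllib2.urlopen(url)
        except urllib2.HTTPError as error:
            errors.append((url, error.code))
    return errors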

      I will appreciate any user input and ideas so feel free to comment.

      HTTP Status Codes Site

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

During the development of Simple Site Checker I realised that it would be useful for test purposes if there were a website returning all possible HTTP status codes. Thanks to Google App Engine and the webapp2 framework, building such a website was a piece of cake.

      The site can be found at http://httpstatuscodes.appspot.com.

The home page provides a list of all HTTP status codes and their names. If you want to get an HTTP response with a specific status code, just add the code after the slash, for example:
      http://httpstatuscodes.appspot.com/200 - returns 200 OK
      http://httpstatuscodes.appspot.com/500 - returns 500 Internal Server Error
At the end of each page there is also a link to the HTTP protocol Status Code Definitions, with a detailed explanation of each one of them.
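The core of such an app fits in a few lines of webapp2. The handler below is only a hedged sketch of the idea, not the actual site code:

import webapp2

class StatusHandler(webapp2.RequestHandler):
    def get(self, code):
        # set the requested status code on the response and echo it back
        code = int(code)
        self.response.set_status(code)
        self.response.write('Returned status %d' % code)

app = webapp2.WSGIApplication([
    (r'/(\d{3})', StatusHandler),
])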

      The website code is publicly available in github at HTTP Status Codes Site.

      If you find it useful feel free to comment and/or share it.

      Fabric & Django

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

Or how to automate the creation of new projects with a simple script

Preface: Do you remember all those tiny little steps that you have to perform every time you start a new project - create a virtual environment, install packages, start and set up a Django project? Kind of annoying repetition, isn't it? How about automating it a bit?

Solution: Recently I started learning Fabric and thought "What better way to test it in practice than automating a simple, repetitive task?". So, let's list the tasks that I want the script to perform:

      1. Create virtual environment with the project name
      2. Activate the virtual environment
      3. Download list of packages and install them
      4. Make 'src' directory where the project source will reside
      5. Create new Django project in source directory
      6. Update the settings

Thanks to the local command the first one was easy. The problem was with the second one. Obviously each local command is run autonomously, so I had to find some way to have the virtual environment activated for every task after this. Fortunately the prefix context manager works like a charm. I had some issues making it read and write in the paths I wanted, and voilà, it was working exactly as I wanted.
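In rough outline the combination looks like this. This is only a minimal sketch of the approach with Fabric 1.x, not the actual script from the gist below; the PACKAGES list is illustrative:

from fabric.api import lcd, local, prefix

PACKAGES = ['django']  # illustrative; the real script installs a longer list

def start_project(project_name):
    # 1. create the virtual environment named after the project
    local('virtualenv %s' % project_name)
    # 2. every command below runs inside the project dir with the env activated
    with lcd(project_name), prefix('. bin/activate'):
        # 3. install the packages
        for package in PACKAGES:
            local('pip install %s' % package)
        # 4-5. create the source directory and start the Django project in it
        local('mkdir src')
        local('cd src && django-admin.py startproject %s .' % project_name)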

      The script is too long to place it here but is publicly available at https://gist.github.com/2818562
It is quite simple to use; you only need Python, Fabric and virtualenv. Then just run the following command:
fab start_project:my_new_project

To Do: Here are a few things that can be improved:

      • Read the packages from a file
      • Update urls.py to enable admin
      • Generate Nginx server block file

So this is my first try with Fabric; I hope that you will like it and find it useful. As always, any comments, questions and/or improvement ideas are welcome.

      Python is not a Panacea ...

      By Between engineering and real life from Django community aggregator: Community blog posts. Published on Mar 05, 2015.

      ... neither is any other language or framework

This post was inspired by the recurring discussion on the topic "Python vs other languages" (in this specific case the other one was PHP, and the question was asked in a Python group, so you may guess whether there were any answers in favor of PHP). It is very simple: I believe that every Python developer will tell you that Python is the greatest language ever built, how easy it is to learn, how readable and flexible it is, how much fun it is to work with and so on. They will tell you that you can do everything with it: web and desktop development, testing, automation, scientific simulations etc. But what most of them will forget to tell you is that it is not a Panacea.

As a matter of fact, you can build "ugly" and unstable applications in Python too. Most problems come not from the language or framework used, but from bad coding practices and poor understanding of the environment. Python will force you to write readable code, but it won't solve all your problems. It is hard to make a complete list of what exactly you must know before starting to build an application - a big part of the knowledge comes with experience - but here is a small list of essential things.

• Write clean code with meaningful variable/class names.
• Exceptions are raised, learn how to handle them.
• Learn OOP (Object Oriented Programming).
• Use functions to split up functionality and make code reusable.
• DRY (Don't Repeat Yourself).
• If you are going to develop web applications, learn about the client-server relation.
• Use "layers" to separate the different parts of your application - database methods, business logic, output etc. MVC is a nice example of such separation.
• Never store passwords in plain text. Even hashed passwords are not completely safe - check what Rainbow Tables are.
• Comment/Document your code.
• Write unit tests and learn TDD.
      • Learn how to use version control.
      • There is a client waiting on the other side - don't make him wait too long.
      • Learn functional programming.

I hope the above does not sound like an anti-Python talk. That is not its idea - firstly because there are things that are more important than the language itself (the list above), and secondly because... Python is awesome )))
There are languages that will help you learn the things above faster, and Python is one of them - built-in documentation features, easy to learn and try, and extremely useful. My advice is not to start with PHP as your first programming language: it will make you think that mixing variables of different types is OK. It may be fast for some things, but most of the time it is not safe, so you had better start with a more type-strict language where you can learn casting, escaping user output etc.

Probably I have missed a few (or more) points, but I hope I've covered the basics. If you think that anything important is missing, just add it in the comments and I will update the post.

      Is Open Source Consulting Dead?

      By chrism from . Published on Sep 10, 2013.

      Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemnification

      By chrism from . Published on Aug 09, 2013.

      Article about consulting and patent indemnification

      Python Advent Calendar 2012 Topic

      By chrism from . Published on Dec 24, 2012.

      An entry for the 2012 Japanese advent calendar at http://connpass.com/event/1439/

      Why I Like ZODB

      By chrism from . Published on May 15, 2012.

      Why I like ZODB better than other persistence systems for writing real-world web applications.

A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

      By chrism from . Published on Mar 03, 2012.

      A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

      In Praise of Complaining

      By chrism from . Published on Jan 01, 2012.

      In praise of complaining, even when the complaints are absurd.

      2012 Python Meme

      By chrism from . Published on Dec 24, 2011.

      My "Python meme" replies.

      In Defense of Zope Libraries

      By chrism from . Published on Dec 19, 2011.

      A much too long defense of Pyramid's use of Zope libraries.

      Plone Conference 2011 Pyramid Sprint

      By chrism from . Published on Nov 10, 2011.

      An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

      Jobs-Ification of Software Development

      By chrism from . Published on Oct 17, 2011.

      Try not to Jobs-ify the task of software development.

      WebOb Now on Python 3

      By chrism from . Published on Oct 15, 2011.

      Report about porting to Python 3.

      Open Source Project Maintainer Sarcastic Response Cheat Sheet

      By chrism from . Published on Jun 12, 2011.

      Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

      Pylons Miniconference #0 Wrapup

      By chrism from . Published on May 04, 2011.

      Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

      Pylons Project Meetup / Minicon

      By chrism from . Published on Apr 14, 2011.

      In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.

      PyCon 2011 Report

      By chrism from . Published on Mar 19, 2011.

      My personal PyCon 2011 Report