Open Source Posts

Shaun M. Thomas: PG Phriday: Ambling Architecture

From Planet PostgreSQL. Published on Dec 02, 2016.

It’s about the time for year-end performance reviews. While I’m always afraid I’ll narrowly avoid being fired for gross incompetence, that’s not usually how it goes. But that meeting did remind me about a bit of restructuring I plan to impose for 2017 that should vastly improve database availability across our organization. Many of the techniques to accomplish that—while Postgres tools in our case—are not Postgres-specific concepts.

Much of database fabric design comes down to compromise. What kind of resources are involved? How much downtime is tolerable? Which failover or migration process is the least disruptive? Is it important that components integrate self-healing? There are several questions that demand answers, and in most cases, Postgres replication is completely sufficient to ensure data security and uptime.

This is what most Postgres admins and users probably see in the wild:

Disaster Recovery Pair

Usually it’s just two servers set up as a mirror using asynchronous or synchronous Postgres replication. On top is generally some kind of virtual IP address or DNS pointer to ensure connections always target the primary server. This is only critically important for connections that need write access, but it’s still a common theme.

Even some of the most heavy-duty Postgres high availability solutions are just variations of this theme. In 2012, I gave a presentation at Postgres Open on using Pacemaker. The presentation notes are available on the wiki for that year. This is no simple stack, either:

  • LVM: Logical Volume Manager
  • DRBD: Distributed Replicated Block Device
  • Pacemaker: Failover automation
  • Corosync: Pacemaker’s communication layer
  • VIP: Virtual IP address

That particular setup uses DRBD for block-level synchronization instead of Postgres replication because it was designed for an extremely high volume transaction processing system. It's difficult to survive 300 million writes per day with synchronous replication latency unless the replication is fairly low-level.

For normal server pairs that don’t require absolutely bulletproof data-loss prevention and throughput guarantees, standard Postgres replication is fine. Yet even injecting Postgres replication in place of DRBD and accounting for LVM being standard on most Linux hosts, we must still account for Pacemaker, Corosync, and VIP network structure requirements. That’s a lot to ask for smaller companies or university research. Even mid-sized companies with a full infrastructure department tend to shy away from Pacemaker due to its management complexity.

So what else can we do for easy and quick database promotions in the case of unexpected outages or managed system migrations? In early 2015, Compose wrote about their solution to the problem. The diagram for the setup looks like this:

Governor Stack

Essentially the Governor process acts as a central nexus controller for Postgres and a few other pieces. The etcd process is just a distributed key-value storage system with a robust election system to ensure consistent values across the cluster. And HAProxy hides all of our IP addresses so we never have to know which system is the leader. Connecting to HAProxy will always contact the primary Postgres server.

It looks complicated—and to a certain extent it is—but it readily beats the alternative. Here’s how the whole process basically works while it’s running (a rough sketch in code follows the list):

  1. The Governor checks etcd for the presence of a leader.
  2. If no leader is found, it sets a key claiming the position with a relatively short TTL.
  3. If there’s already a leader, it tries to put the local Postgres instance in a state where it can begin replication from that system.
  4. Postgres is restarted to fit current roles if necessary.
  5. The Governor presents a REST interface to HAProxy as a health status. Only the leader will report a successful check.
  6. Repeat.
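A rough sketch of that loop in Python might look like the following. This is not Governor's actual code: it assumes etcd's v2 HTTP key API, invents a local postgres helper object for the promote/follow operations, and skips all error handling.

import time
import requests

ETCD_LEADER_KEY = 'http://127.0.0.1:2379/v2/keys/service/pgcluster/leader'
NODE_NAME = 'pg-node-1'
TTL = 30  # seconds before the leader key expires if it isn't refreshed


def run_loop(postgres):
    while True:
        # Try to claim leadership; prevExist=false makes this an atomic "create if absent"
        resp = requests.put(
            ETCD_LEADER_KEY,
            params={'prevExist': 'false'},
            data={'value': NODE_NAME, 'ttl': TTL},
        )
        if resp.status_code == 201:
            postgres.promote()  # we won the race: run as the primary
        else:
            leader = requests.get(ETCD_LEADER_KEY).json()['node']['value']
            if leader == NODE_NAME:
                # Still the leader: refresh the key so the TTL doesn't lapse
                requests.put(ETCD_LEADER_KEY,
                             params={'prevExist': 'true'},
                             data={'value': NODE_NAME, 'ttl': TTL})
            else:
                postgres.follow(leader)  # replicate from the current leader
        time.sleep(TTL // 3)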

If we connect to this stack through HAProxy, it only redirects traffic to the Postgres server that reports itself as the leader. There’s never a need for a VIP, or a CNAME, or any other kind of DNS shenanigans. Just connect to HAProxy. Empty servers get bootstrapped with the most recent data. Old leaders are rewound and become part of the existing cluster. It’s elastic and self-healing, and much easier to manage than Pacemaker.
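For reference, the HAProxy side of this can be as small as a health-checked TCP listener. The snippet below is only a sketch: the addresses are placeholders, and the REST health-check port (8008 here) depends on how the Governor process is configured. Only the node whose health check returns 200, i.e. the current leader, receives connections.

listen postgres
    bind *:5000
    mode tcp
    option httpchk GET /
    http-check expect status 200
    default-server inter 3s fall 3 rise 2
    server pg1 10.0.0.11:5432 check port 8008
    server pg2 10.0.0.12:5432 check port 8008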

Of course, this leaves us with a couple of other issues to resolve. The first is that of race conditions. If both Postgres servers are down, how do we know the first to win the leader race is the one with the most recent data? We don’t. Once all keys and positions have expired from the key-value store, there’s a blank slate that opens up the possibility that a very old server could take over as the new primary. Once the server with the most recent data tries to connect, it will notice the mismatch and fail, pending admin intervention.

This is what we call a Split Brain, and this scenario is only one way to achieve it. Each member of the cluster thinks it should be the leader for perfectly valid reasons, yet the “wrong” one is now in charge. The only fool-proof method to resolve this is to always have at least one online Governor available to provide new members with a comparison transaction status. The more Postgres servers we have in any particular cluster, the easier this is to achieve.

We can also help by setting a short transaction log archive timeout and sending archived logs to a globally shared location available to all cluster members. This ensures a single minimal source of transaction status and contents. Even a very old server would apply these archived transaction contents, and we’d “only” lose data since the last archival. Not ideal, but it helps to at least mitigate risk. If we’re coming back from an outage that took down every cluster member, it’s likely we have other problems anyway.
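In postgresql.conf terms, that amounts to something like the following sketch, where the archive path stands in for whatever shared storage is actually available:

archive_mode = on
archive_timeout = 60    # force a WAL segment switch at least once a minute
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'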

The other concern with this kind of structure is actually horizontal scaling. Most automated consensus algorithms have membership limits due to interaction complexity at higher counts. If we have a larger cluster with dozens of members, the overhead of maintaining the key-value store could sap precious hardware resources or fail outright. If we have dozens of separate Postgres clusters for various disparate applications, we are limited to either maintaining multiple parallel stacks, or sharing the key-value store and HAProxy between them. In the latter case, we run into the same membership overload.

This is where decoupling comes in. It’s easy, and even advisable, to split things up for shared environments. Consider this revision to our previous diagram:

Scalable Governor Stack

Each of those Postgres groups is a separate cluster with an arbitrary number of replicas. We’ve moved HAProxy and etcd to their own resources, whether those are VMs, shared servers, or some kind of container.

In that separate location, HAProxy can connect to any of the Postgres cluster members. Now we have a much smaller pool to consider as connection candidates. It’s a layer of abstraction that might introduce more latency, but it also means we don’t need to divulge the location of any Postgres server members. For configuration purposes, this greatly simplifies larger application stacks that may consist of dozens of interacting elements. Why maintain a list of ten different hosts for various Postgres needs? Just use the HAProxy pool.

Separated from the numerous Postgres hosts, consensus complexity is greatly reduced for etcd. It’s probably a good idea to have more than three members for larger constellations, but the concept remains sound. So long as the key-value pool survives, the Governor process will always have a secure location to track the Postgres leader and the transaction state of all replicas.

It’s “easy” to simply deploy the entire stack to every Postgres server and rely on local connections in isolation. For smaller pools and non-shared resources, that’s probably the most reliable approach. But for mid to large-size businesses, Postgres doesn’t operate in isolation. It’s likely there are already other services that make use of HAProxy or etcd. This approach allows all of them to share the components, and prevents us from maintaining a parallel installation of potentially unnecessary daemons.

As a final note, Zalando forked Governor and produced Patroni for occasions like those above. In existing environments, there’s probably a key-value store already in place, so why not use it? While Governor is restricted to using etcd for this purpose, Patroni is also compatible with ZooKeeper or Consul. Because of that extra bit of functionality, we’ll probably start integrating it into our organization using that last diagram as a model.

And then? Well, the only thing better than a Postgres cluster is an immortal Postgres cluster. We just have to hope it never gains sentience and desires revenge for badly written queries.

Bruce Momjian: Built-In Sharding Wiki Page

From Planet PostgreSQL. Published on Dec 01, 2016.

As part of today's PgConf.Asia 2016 Developer Meeting, I was asked to write a wiki page with a rough outline of how built-in sharding could be implemented. I have now created such a page.

Josh Williams: Throw It Away: Suppressing Writes on PostgreSQL Replicas

From Planet PostgreSQL. Published on Nov 30, 2016.

We needed a way to suppress specific write commands on a Postgres streaming replica. The replica was set up for a DR configuration, with the applications able to be brought up into full service at a moment's notice. But since it's a hot standby, we'd like to still allow the applications to be used in a working but read-only state.

One of the applications on this database is MediaWiki, which worked great in this configuration. But a couple of the other apps have the classic behavior of updating their user object's "last login" field in one form or another when someone authenticates, which on a read-only standby would cause the login process to fail entirely.

Of course we want updates to fail, up until that point when (knock on wood) the master server is declared down for the count and the official fail-over happens. Except for the one command that executes on login.

We don't really care about the "last login" type field -- the data is available through logs and other means. The affected apps could probably all be monkey patched to work around that part of the process. But we had a couple different apps doing this, and that adds a maintenance burden for each. And if we could figure out how to make it work at the database level then it'd work for all of them, plus anything else that might pop up.

The first thing we looked at was writing a trigger to intercept the commands, but triggers don't execute on a hot standby replica so that was out pretty quickly. The next hypothesis was that we could write a foreign data wrapper that'd just absorb the writes, or even just use postgres_fdw to route the commands to a local writable database that's more or less a throw-away data sink. But to our surprise, even writes to foreign tables get rejected on a hot standby. I'm slightly tempted to dig in and see what it'd take to enable that.

The third time was the charm: rules. Rules hook in pretty far down into the query parser, and they can be notoriously tricky to work with. But since they're embedded pretty deep, down to the point where views rely on them, they're obeyed even on a replica.

So the technique was this: On the master (... obviously) we set up a separate schema, inside which a view was created with the same name as the target table and which had certain commands suppressed:

CREATE SCHEMA replica;

CREATE VIEW replica.users AS SELECT * FROM public.users;

CREATE RULE users_disable_update AS ON UPDATE TO replica.users DO INSTEAD NOTHING;
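If the replica should swallow other write commands as well, the same DO INSTEAD NOTHING pattern extends to them; for example (a sketch, not part of the original setup, which only needed UPDATE suppressed):

CREATE RULE users_disable_insert AS ON INSERT TO replica.users DO INSTEAD NOTHING;

CREATE RULE users_disable_delete AS ON DELETE TO replica.users DO INSTEAD NOTHING;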

Plus any permission adjustments the app user might need. On the master server this schema and view are pretty much ignored, as the application user just uses the default search path. But on the replica, we adjust the default search path in postgresql.conf so that it applies to just that server:

search_path = '"$user",replica,public'

With that in place, the replica silently absorbs the app's login-time update:

app@app:5432=> UPDATE "auth_user" SET "last_login" = now() WHERE "auth_user"."id" = 13;
UPDATE 0

It doesn't quite work everywhere, sadly! Notice the "UPDATE 0"? We found Django actually checks that, and panics with an error to the user when it tries to change something and the row count it gets back is different than what it expects.

Another caveat is that if the target table's schema changes, the view won't automatically follow. Presumably your user table doesn't receive changes all that often, but if you're applying this technique to something else, that might not be the case. Something to be aware of!
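If that does happen, the recovery is straightforward, if manual: on the master, drop and recreate the wrapper view (dropping the view also drops its rules) so it picks up the new columns, then recreate the rule. A sketch:

DROP VIEW replica.users;

CREATE VIEW replica.users AS SELECT * FROM public.users;

CREATE RULE users_disable_update AS ON UPDATE TO replica.users DO INSTEAD NOTHING;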

Birthday retrospective

By Griatch's Evennia musings (MU* creation with Django+Twisted) from Django community aggregator: Community blog posts. Published on Nov 30, 2016.

So, recently Evennia celebrated its ten-year anniversary. That is, on Nov 20, 2006, Greg Taylor made the first repo commit to what would eventually become the Evennia of today. Greg has said that Evennia started out as a "weird experiment" of building a MUD/MUX using Django. He took the strange name from a cheesy NPC in the Guild Wars MMORPG, and Greg's first post to the mailing list also echoes the experimental intention of the codebase. The merger with Twisted came pretty early too, replacing the early asyncore hack he used and immediately delivering a massive speedup. Evennia got attention from the MUD community - clearly a Python-based MUD system sounded attractive.

When I first joined the project I had been looking at doing something MUD-like in Python for a good while. I had looked over the various existing Python code bases at the time and found them all to be either abandoned or very limited. I had a few weeks' stint working with pymoo before asking myself why I was going through the trouble of parsing a custom script language ... in Python ... Why not use Python throughout? That is when I came upon Evennia. I started making contributions, and around 2010 I took over development as real-life commitments forced Greg to step down.

Over the years we have gone through a series of changes. We have gone from using SVN to Mercurial and then to Git. We have transitioned from GoogleCode to GitHub - the main problem there being converting the wiki documentation (Evennia has extensive documentation).

For a long time we used Python's reload() function to add code to the running game. It worked ... sometimes, depending on what you changed. Eventually it turned out to be so unpredictable that we now use two processes, one to connect clients to and the other running the game, meaning we can completely restart one process without disconnecting anyone.

Back in the day you were also expected to create your own game in a folder game/ inside the Evennia repo itself. It made it really hard for us to update that folder without creating merge conflicts all over. Now Evennia is a proper library and the code you write is properly separated from ours.

So in summary, many things have happened over the years, much of it documented in this blog. With 3500 commits, 28 000 lines of code (+46% comments) and some 25 people contributing in the last year, Openhub lists us as

"A mature, well-established codebase with a stable commit history, a large development team and very well documented source code". 

It's just words compiled by an algorithm, but they still feel kinda good!


While Evennia was always meant to be used for any type of multiplayer text game, this general use has been expanded and cleaned up a lot over the years.

This has been reflected in the breadth of people wanting to use it for different genres: over time the MUSH people influenced us to add the option to play the same character from many different clients at the same time (apparently, playing on the bus and then continuing on another device later is common for such games). Others have wanted to use Evennia for interactive fiction, for hack&slash, deep roleplay, strategy, education or just for learning Python.

Since Evennia is a framework/library and tries to not impose any particular game systems, it means there is much work to do when building a game using Evennia. The result is that there are dozens of games "in production" using Evennia (and more we probably don't know about), but few public releases yet.

The first active "game" may have been an Evennia game/chat supporting the Russian version of 4chan... The community driven Evennia demo-game Ainneve is also progressing, recently adding combat for testing. This is aimed at offering an example of more game-specific code people can build from (the way Diku does). There are similar projects meant for helping people create RPI (RolePlay Intensive) and MUSH-style games. That said, the Evennia-game Arx, After the Reckoning is progressing through beta at a good clip and is showing all signs of becoming the first full-fledged released Evennia game. 


So cheers, Evennia for turning 10. That's enough of the introspection and history. I'll get back to more technical aspects in the next post.

Ernst-Georg Schmid: Stampede is out!

From Planet PostgreSQL. Published on Nov 30, 2016.

Stampede just got live!

I'll try this for sure, when I have time, when I have time...

Álvaro Hernández: Announcing ToroDB Stampede 1.0 beta

From Planet PostgreSQL. Published on Nov 30, 2016.

A few days ago we published a blog post, “The conundrum of BI/aggregate queries on MongoDB”, where we analyzed and measured some performance issues that happen on MongoDB with aggregate/OLAP/DW types of queries. We also showed that if we transformed the JSON data into a relational form and queried it with SQL on a PostgreSQL database, performance could be up to orders of magnitude better. Impressive!

However, this requires a significant effort. The DDL needs to be defined, and this may be non-trivial if the origin data is of high variety. The data also needs to be migrated, and while there are many ETL tools for that, it is still an involved process. And it won’t happen in real time! What if I require real-time statistics? What if my origin data adds a new property that is not reflected in the DDL? Do I need to sacrifice the “schema-less“ness of MongoDB to be able to perform analytic queries?

Of course not! At least not anymore, starting today. I’m very pleased to announce ToroDB Stampede, the first member of the ToroDB family and 8Kdata’s first commercial product.


With ToroDB Stampede you will see how your MongoDB collections are transformed, in real time, to a relational structure in PostgreSQL. From there, you can run your native SQL queries on the data and use your favorite Business Intelligence tools, without requiring any ETL or change in your current database infrastructure.

So how does it work? It’s very simple:

  • Stampede works as a secondary (hidden) node on a MongoDB replica set.
  • Once started it will perform a full initial database sync and then will switch to streaming replication mode.
  • All the incoming data is transformed on-the-fly from a document shape (strictly speaking, BSON) into a set of relational tables. Tables are named after your document properties, arrays and nested documents are transformed into relations, and columns are named after the corresponding keys (a rough illustration follows this list).
  • You don’t need to provide any DDL. All the DDL is automagically created by Stampede. Even if new keys or embedded documents appear, new columns and/or tables will be automatically and transparently created.
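As a rough illustration of the transformation described above (the table and column names below are hypothetical, not ToroDB's exact naming scheme), a document with an embedded object and an array ends up spread across a root table and child tables, roughly like this:

-- Source MongoDB document:
--   { "name": "Ada", "address": { "city": "London" }, "tags": ["dev", "math"] }

-- Approximate relational shape:
CREATE TABLE person (
    did  integer,   -- internal document id
    name text
);

CREATE TABLE person_address (
    did  integer,   -- links back to the parent document
    city text
);

CREATE TABLE person_tags (
    did   integer,
    seq   integer,  -- position within the array
    value text
);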

And this is where the fun begins. Now you have all your MongoDB data in perfectly shaped tables in a PostgreSQL database! Visualization and data exploration are greatly improved, and, more importantly, SQL querying, native SQL querying, is at hand! Use it to connect to your favorite BI tools. Use it to migrate off of MongoDB to PostgreSQL. Use it to have a SQL replica. Unleash your unstructured data into a relational database! See the example below to understand how ToroDB generates the tables and columns out of JSON documents, and check the documentation for more information.

Example mapping from JSON documents to tables and columns

Surely enough, performance matters. Does ToroDB Stampede deliver on the promise of 10-100x faster queries? There’s only one way to find out. Benchmark time! The following benchmarks used one or more (when MongoDB was used in a sharded configuration) AWS i2.xlarge instances (4 vCPUs, 30GB RAM, 800Gb local SSD). We used an XFS filesystem, and basic tuning was done on both the MongoDB and PostgreSQL configurations. For each dataset, we manually created 6 different queries that try to extract business value out of the information. MongoDB queries were done via the Aggregation Framework and Stampede ones with regular (Postgres) SQL. All the tests were run 5 times, using the first two to warm up the caches, and the numbers show the average of the last three runs.

Based on the Github Archive we performed an initial benchmark over a 500Gb dataset. We ran 6 different queries (named A through F) which you may check here: MongoDB (A, B, C, D, E, F) and Stampede/PostgreSQL (A, B, C, D, E, F).

Benchmark results: GitHub Archive, 500Gb dataset

Up to 57x faster! All queries are significantly faster than MongoDB, and only one (A) is slightly slower compared to a 3-node MongoDB cluster. Trying with a smaller dataset reveals even bigger differences. This is likely due to much better buffer management in PostgreSQL:

Benchmark results: GitHub Archive, 100Gb dataset

Woah! 267x faster! Query D takes 2,400 seconds on a single MongoDB node (about 20 minutes), 383 seconds on a three-node MongoDB shard (better, but still more than 6 minutes) and just 9 seconds on a single node PostgreSQL.

Here both MongoDB and Stampede had an index on both the _id and actor.login fields. Stampede will automatically replicate any index created in MongoDB (no action required on your side). We also wanted to check whether the indexes were being used and what impact they had on performance:

Benchmark results: GitHub Archive, 100Gb dataset, without indexes

From the results we can conclude that: a) PostgreSQL results are almost the same, which is consistent with the assumption that indexes are usually not required for aggregate queries; b) MongoDB worsened the results for query A without indexes… but significantly improved query time for query D when the index is removed! This is probably an issue with the query planner.

We also benchmarked another data set, based on the flight stats information from Transtats. Similar aggregate queries were written. The data size is smaller (50Gb), which leads to smaller differences:

Benchmark results: Transtats flight data, 50Gb dataset

Still, results are consistently faster, even when pitted against the three-node MongoDB sharded cluster. And up to 11x faster queries, which is a very significant improvement! While developing Stampede we have performed benchmarks where we have observed more than 2000x faster queries. Of course, this may be a degraded case for MongoDB, and surely Stampede does not always perform this well in every single circumstance.

So the recommendation is always the same: please don’t trust our numbers. Do your own. Benchmark Stampede, and please let us know the results.

ToroDB Stampede is fully open source, and it’s already available on our website for download and use. Check the user documentation to learn how to install, configure and run ToroDB Stampede.

If you need more information or you just simply would like to give us your opinion, please feel free to comment below or join the discussion on Hacker News! Thank you.

Optimizing the construction of Django QuerySets

By Adam’s Tech Blog from Django community aggregator: Community blog posts. Published on Nov 29, 2016.

Deep construction details

Django’s ORM is normally fast enough as-is, but if you’ve ever profiled a high traffic view with a fairly complicated query, you might have found that constructing a QuerySet can take a noticeable portion of your request time. For example, I once found a query on the front page of the site I was working on that took 1ms to construct and 1ms for the database to answer. With a performance budget of 100ms, that was 1% gone on computing the exact same SQL every time.

Thankfully we don’t need to instantly drop down to raw SQL to optimize such cases, as Django’s QuerySet API naturally lends itself to caching the intermediate objects. Since each operation on a QuerySet returns a new object with the change applied, QuerySets are always lazy about executing the SQL, and operations can (normally) be chained in any order, you can build the non-specific part of your QuerySet up as a cached object and then apply the final, specific filtering at request time.

Just a note before we dive in: this should be one of the least reached for tools in your optimization toolbox - normally it’s enough to fix the basics such as avoiding N+1 queries with select_related() / prefetch_related(), and adding caching of data between requests with Django’s caching framework. On that old front page I was talking about, the reason the rest of the page fit in 98ms was because most of it came from a few cache keys, avoiding even some template rendering.

As an example, let’s say we’re building an autocomplete feature on Django’s User model. We might have a function that looks like this:

from django.contrib.auth.models import User
from django.db.models.functions import Length


def full_construction_autocomplete(typed):
    return User.objects.annotate(
        username_length=Length('username')
    ).order_by(
        'username_length'
    ).filter(
        username__startswith=typed
    )

If we time it with IPython’s %timeit on ./manage.py shell, it will time just the construction time, since nothing is iterating the QuerySet and causing the lazy fetching of results from the database. It comes out taking about a quarter of a millisecond on my machine:

In [2]: %timeit full_construction_autocomplete('ad')
1000 loops, best of 3: 263 µs per loop

To cache most of the construction, we can just define the non-specific part of the query as a module-level object, and apply the filter() at the last step:

cached_qs = User.objects.annotate(
    username_length=Length('username')
).order_by(
    'username_length'
)


def cached_construction_autocomplete(typed):
    return cached_qs.filter(
        username__startswith=typed
    )

And just like that, we’ve sped the function calls up by more than 50%:

In [4]: %timeit cached_construction_autocomplete('ad')
10000 loops, best of 3: 105 µs per loop

Obviously 160 microseconds on its own is hardly going to be noticeable to your end users, but if you find yourself looking at a complicated or frequently called QuerySet, this technique might help you make your performance budget. It’s also a simple optimization.

You don’t necessarily have to cache with a module-level object; for example, in ModelAdmin classes you could cache partially constructed QuerySets on the class itself. This will work as long as you don’t do per-request modifications in get_queryset or get_ordering:

class MyModelAdmin(ModelAdmin):
    def get_queryset(self, request):
        if not hasattr(self, '_queryset'):
            self._queryset = super().get_queryset(request=None).annotate(
                username_length=Length('username'),
            )
        return self._queryset.all()

The all() here is important to make sure we hand a copy of the QuerySet to the caller and avoid caching the results once for the whole class!

I’m sure you can come up with the right caching scheme for wherever it is you construct your QuerySets, such as your class-based views, or custom managers.
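For instance, the same trick in a custom manager might look like the sketch below; the model and manager names are made up for illustration:

from django.db import models
from django.db.models.functions import Length


class ProfileManager(models.Manager):
    _base_qs = None  # class-level cache of the non-specific QuerySet

    def by_username_length(self):
        cls = type(self)
        if cls._base_qs is None:
            cls._base_qs = self.get_queryset().annotate(
                username_length=Length('username')
            ).order_by('username_length')
        # .all() returns a copy, so cached results never leak between callers
        return cls._base_qs.all()


class Profile(models.Model):
    username = models.CharField(max_length=150)

    objects = ProfileManager()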

The Django core team are aware of the time that can be wasted on QuerySet construction, and have looked at optimizing it. This would be particularly useful for some Django internals, such as when constructing a QuerySet during Model.save(). Anssi Kääriäinen created a patch to add an option to QuerySets to not clone themselves on operations in Ticket 20880, and Josh Smeaton opened an experimental Pull Request implementing the same idea as a (maybe) public API. I personally think that it’s going to be better kept as a private API for Django’s core, as there are many ways of optimizing user code, including this strategy of caching partially constructed QuerySets :)

Django Under the Hood 2016 Recap

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Nov 29, 2016.

Caktus was a proud sponsor of Django Under the Hood (DUTH) 2016 in Amsterdam this year. Organized by Django core developers and community members, DUTH is a highly technical conference that delves deep into Django.

Django core developers and Caktus Technical Manager Karen Tracey and CEO/Co-founder Tobias McNulty both flew to Amsterdam to spend time with fellow Djangonauts. Since not all of us could go, we wanted to ask them what Django Under the Hood was like.

Can you tell us more about Django Under the Hood?

Tobias: This was my first Django Under the Hood. The venue was packed. It’s an in-depth, curated talk series by invite-only speakers. It was impeccably organized. Everything is thought through. They even have little spots where you can pick up toothbrush and toothpaste.

Karen: I’ve been to all three. They sell out very quickly. Core developers are all invited, get tickets, and some funding depending on sponsorship. This is the only event where some costs are covered for core developers. DjangoCon EU and US have core devs going, but they attend it however they manage to get funds for it.

What was your favorite part of Django Under the Hood?

Tobias: The talks: they’re longer and more detailed than typical conference talks; they’re curated and confined to a single track so the conference has a natural rhythm to it. I really liked the talks, but also being there with the core team. Just being able to meet these people you see on IRC and mailing list, there’s a big value to that. I was able to put people in context. I’d met quite a few of the core team before but not all.

Karen: I don’t have much time to contribute to Django because of heavy involvement in cat rescue locally and a full time job, but this is a great opportunity to have at least a day to do Django stuff at the sprint and to see a lot of people I don’t otherwise have a chance to see.

All the talk videos are now online. Which talk do you recommend we watch first?

Karen: Depends on what you’re interested in. I really enjoyed the Instagram one. As someone who contributed to the Django framework, to see it used and scaled to the size of Instagram 500 million plus users is interesting.

Tobias: There were humorous insights, like the Justin Bieber effect. Originally they’d sharded their database by user ID, so everybody on the ops team had memorized his user ID to be prepared in case he posted anything. At that scale, maximizing the number of requests they can serve from a single server really matters.

Karen: All the monitoring was interesting too.

Tobias: I liked Ana Balica’s testing talk. It included a history of testing in Django, which was educational to me. Django didn’t start with a framework for testing your applications. It was added as a ticket in the low thousands. She also had practical advice on how to treat your test suite as part of the application, like splitting out functional tests and unit tests. She had good strategies to make your unit tests as fast as possible so you can run them as often as needed.

What was your favorite tip or lesson?

Tobias: Jennifer Akullian gave a keynote on mental health that had a diagram of how to talk about feelings in a team. You try to dig into what that means. She talked about trying to destigmatize mental health in tech. I think that’s an important topic we should be discussing more.

Karen: I learned things in each of the talks. I have a hard time picking out one tip that sticks with me. I’d like to look into what Ana Balica said about mutation testing and learn more about it.

What are some trends you’re seeing in Django?

Karen: The core developers met for a half-day meeting the first day of the conference. We talked about what’s going on with Django, what’s happened in the past year, and what’s the future of Django. The theme was “Django is boring.”

Tobias: “Django is boring” because it is no longer unknown. It’s an established, common framework now used by big organizations like NASA, Instagram, Pinterest, US Senate, etc. At the start, it was a little known bootstrappy cutting edge web framework. The reasons why we hooked up with Django nine years ago at Caktus, like security and business efficacy, all of those arguments are ever so much stronger today. That can make it seem boring for developers but it’s a good thing for business.

Karen: It’s been around for a while. Eleven years. A lot of the common challenges in Django have been solved. Not that there aren’t cutting edge web problems. But should you solve some problems elsewhere? For example, in third party, reusable apps like channels, REST framework.

Tobias: There was also recognition that Django is so much more than the software. It’s the community and all the packages around it. That’s what makes Django great.

Where do you see Django going in the future?

Karen: I hate those sorts of questions. I don’t know how to answer that. It’s been fun to see the Django community grow and I expect to see continued growth.

Tobias: That’s not my favorite question either. But Django has a role in fostering and continuing to grow the community it has. Django can set an example for open source communities on how to operate and fund themselves in sustainable ways. Django is experimenting with funding right now. How do we make open source projects like this sustainable without relying on people with full-time jobs volunteering their nights and weekends? This is definitely not a “solved problem,” and I look forward to seeing the progress Django and other open source communities make in the coming years.

Thank you to Tobias and Karen for sharing their thoughts.

Denish Patel: Running Postgres in Docker

From Planet PostgreSQL. Published on Nov 29, 2016.

For the last six months, I have been working on moving Postgres from bare metal and VM based systems into Docker. As of today, I have migrated a couple of mission critical Postgres DBs (~2TB) from bare metal onto a Docker environment.

During the migration journey, I have listed some of the things to consider when running production Postgres instances in a Docker environment (a minimal, illustrative Dockerfile sketch follows the list).

  1.  Do not use the default Postgres Docker image. Start with your own Docker image from scratch.
  2.  Handle Postgres service shutdown gracefully in docker with SIGTERM
  3.  OS can be stored in Docker container
  4.  Data volume MUST be stored in persistent storage
  5. Use some kind of framework to manage docker containers
    • Apache Mesos & Aurora
    • OpenStack & Kubernetes
  6.  You can mount NetApp for backups and WAL files on to container
  7.  Make templates for resources for different kind of workloads
    • Aurora job for resource templates
    • postgresql.conf templates
  8.  Use static IPs for services: DBs, front-end and back-end servers
    • It will be easier to control access at container level for better security
    • failover is easy to manage with static IP for master DB server
  9.  Benchmark your existing system and compare with new system
    • Keep eyes on TPS using pgbench and benchmarksql
  10.  Monitoring solution for Postgres DB
    • collectd or other agent based monitoring
      • pg_stat_statements is very useful
    • Docker container should be monitored separately
      •  docker stats
  11.  Backup container for taking backups
  12.  Standby container for setting up standby jobs
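To make points 1, 2 and 4 concrete, here is a minimal Dockerfile sketch. It is illustrative only, not the image described above; package names, versions and paths are assumptions you would adapt to your own base image.

# Build our own image instead of the default Postgres one (point 1)
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y postgresql-9.5 && rm -rf /var/lib/apt/lists/*
USER postgres

# The data directory lives on a volume, i.e. persistent storage (point 4)
VOLUME ["/var/lib/postgresql/9.5/main"]
EXPOSE 5432

# "docker stop" sends SIGTERM, which postgres treats as a smart shutdown;
# keeping postgres as PID 1 ensures it actually receives the signal (point 2)
STOPSIGNAL SIGTERM
CMD ["/usr/lib/postgresql/9.5/bin/postgres", "-D", "/var/lib/postgresql/9.5/main", "-c", "config_file=/etc/postgresql/9.5/main/postgresql.conf"]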

I hope it will be useful for someone working on migrating Postgres into Docker environment!

Introduction to Django Models

By David Fozo's Blog from Django community aggregator: Community blog posts. Published on Nov 29, 2016.

In this part of the tutorial, I will show you some basic model definition by creating a simple blog. As discussed earlier, database tables are translated from the models.py files.

 

If you haven't followed along with earlier tutorials, you can do the setup here. Choose branch exercise4.

 

As usual we start by creating a new branch:

git checkout -b blog

 

Activate virtualenv, if you haven't already:

source ../virtualenv/bin/activate

 

A blog is an application by itself, so we create one inside Django with the following command.

python3 manage.py startapp blog

 

You should see these directories and files in your source dir:

blog/
main/
manage.py
MyTutorial/
requirements.txt

 

In order for Django to discover the new application, you need to include it in the installed apps in settings.py.

***MyTutorial/settings.py***

...

INSTALLED_APPS = [
    'main',
    'blog',
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
]

As usual, we write some tests first. If you have a look at the other test file, main/tests.py, you can see that we have a very similar goal here. It might be tempting to import those tests and tweak them a little, but that wouldn't be right. Unit tests need to be completely independent from each other, so it is not a problem if they look very much alike.

 

***blog/tests.py***

from django.core.urlresolvers import resolve
from django.test import TestCase
from django.http import HttpRequest
from blog.models import BlogPost
from blog.views import view_post


# Create your tests here.
class BlogPostTest(TestCase):

    def setUp(self):
        self.title = "Blog Post Test"
        self.slug = "blog-post-test"
        self.body = "This is a test."
        self.date = "2016-11-26"

        BlogPost.objects.create(
            title=self.title,
            slug=self.slug,
            body=self.body,
            date=self.date)

    def test_blogpost_resolves_to_slug(self):
        blogpost = resolve('/blog/' + self.slug + '/')
        self.assertEqual(blogpost.func, view_post)

    def test_blogpost_displays_article(self):
        request = HttpRequest()
        response = view_post(request, self.slug)
        self.assertTrue(response.content.startswith(b'<!doctype html>'))
        for i in [self.title, self.body]:
            self.assertIn(i, response.content.decode())

    def tearDown(self):
        BlogPost.objects.all().filter(title=self.title).delete()

 

So what did we do here? We created a new test class, which tests our blog application. In the setUp function, we create title, slug and body variables, which we can refer to later. We also create a BlogPost object. In the first test we check that our custom URL triggers the right view function. I think the second test's title speaks for itself: we call the view_post function, which we will create in a moment, with our custom slug (the blog post identifier as it appears in the URL), then check that the response contains our chosen title and body.

 

If you run the tests now:

python3 manage.py test

 

You should have two failures:

FF.......

 

So let's create the model classes first, because that's the most interesting part.

 

***blog/models.py***

from django.db import models
from django.core.urlresolvers import reverse
from django.template.defaultfilters import slugify


# Create your models here.
class BlogPost(models.Model):
    title = models.CharField(max_length=255)
    slug = models.SlugField(max_length=255)
    body = models.TextField()
    date = models.DateField(auto_now_add=True)

    def save(self, *args, **kwargs):
        if not self.id:
            self.slug = slugify(self.title)
        super(BlogPost, self).save(*args, **kwargs)

    def get_absolute_url(self):
        return reverse('blog.views.view_post', args=[str(self.slug)])

 

What is happening here? Our database tables are model classes in Django, and the fields are class attributes: you declare an attribute, and its name becomes the name of the database column/field. Then you choose the appropriate field type. One of the methods saves your title as a slug (URL fragment) if it's a new post (no id yet); the other method lets you reference this post as a URL.

 

Now, let's connect these models to the templates, by the views functions.

 

***blog/views.py***

from django.shortcuts import render_to_response, get_object_or_404
from blog.models import BlogPost


# Create your views here.
def view_post(request, slug):
    post = get_object_or_404(BlogPost, slug=slug)
    return render_to_response("blogpost.html", {'post': post})

 

Okay, so our view_post function takes in the slug and fetches the corresponding BlogPost model object. Then it renders the blogpost.html template, where the post variable refers to our BlogPost object.

 

Let's add our BlogPost slug to the urls:

***MyTutorial/urls.py***

from django.conf.urls import url
from django.contrib import admin
from main.views import home, form, subscribed, thanks
from blog.views import view_post

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^form/', form),
    url(r'^thanks/', thanks),
    url(r'^subscribed/', subscribed),
    url(r'^$', home),
    url(r'^blog/(?P<slug>[-\w\d\_]+)/$', view_post, name='view_post'),
]

 

So now you can reference your Post as /blog/my-blog-post-title.

 

Let's create a basic template. See how it all comes full circle!

***blog/blogpost.html***

<!doctype html>
<html>
<head>
    <title>{{ post.title }}</title>
</head>
<body>
    <h1>{{ post.title }}</h1>
    <p>{{ post.body }}</p>
    <br>
    <p>created at {{ post.date }}</p>
</body>
</html>

 

If you run the tests, you will still have no luck, even though every piece of code is in place. You need to update the database to reflect the changes.

python3 manage.py makemigrations

python3 manage.py migrate

 

Now, if you want to make a blog post, it's not so convenient:

python3 manage.py shell

>>> from blog.models import BlogPost
>>> BlogPost.objects.create(title="My Title", slug="my-title", body="Hello World", date="2016-11-29")
>>> exit()

 

So you are able to see it, if you start up your development server:

python3 manage.py runserver

 

Type the URL into your browser's address bar: http://localhost:8000/blog/my-title/ . You should see this:

### Screenshot ###

 

Run the tests:

python3 manage.py test

 

It should all pass.

 

Let's commit our changes and merge into the master branch.

git status
git add .
git commit -m "Added blog application, able to create basic blog posts"

 

Change your branch back to master:

git checkout master

 

Merge the blog branch back into master and then delete the branch, which is no longer needed.

git merge blog

git branch -d blog

 

So you learned how to create some basic database model in Django. It is not really user friendly yet, but soon you will see how to be the master of your Django universe.

 

Stay tuned for the next lesson, where I will show you how to handle your models with grace in the Django Admin.

 

Until next time.

Marco Slot: Real-time event aggregation at scale using Postgres w/ Citus

From Planet PostgreSQL. Published on Nov 29, 2016.

Citus is commonly used to scale out event data pipelines on top of PostgreSQL. Its ability to transparently shard data and parallelise queries over many machines makes it possible to have real-time responsiveness even with terabytes of data. Users with very high data volumes often store pre-aggregated data to avoid the cost of processing raw data at run-time. With Citus 6.0 this type of workflow became even easier using a new feature that enables pre-aggregation inside the database in a massively parallel fashion using standard SQL. For large datasets, querying pre-computed aggregation tables can be orders of magnitude faster than querying the facts table on demand.

To create aggregations for distributed tables, the latest version of Citus supports the INSERT .. SELECT syntax for tables that use the same distribution column. Citus 6 automatically ‘co-locates’ the shards of distributed tables such that the same distribution column value is always placed on the same worker node, which allows us to transfer data between tables as long as the distribution column value is preserved. A common way of taking advantage of co-location is to follow the multi-tenant data model and shard all tables by tenant_id or customer_id. Even without that model, as long as your tables share the same distribution column, you can leverage the INSERT .. SELECT syntax.

INSERT .. SELECT queries that can be pushed down to the workers are supported, which excludes some SQL functionality such as limits, unions, and window functions. Since the result will be inserted into a co-located shard in the destination table, we need to make sure that the distribution column (e.g. tenant_id) is preserved in the aggregation and is included in joins. INSERT .. SELECT commands on distributed tables will usually look like:

INSERT INTO aggregation_table (tenant_id, ...)
SELECT tenant_id, ... FROM facts_table ...

Now let’s walk through the steps of creating aggregations for a typical example of high-volume data: page views. We set up a Citus 6 on PostgreSQL 9.6 cluster using Citus Cloud consisting of 4 workers with 4 cores each, and create a distributed facts table with several indexes:

CREATE TABLE page_views (
    tenant_id int,
    page_id int,
    host_ip inet,
    view_time timestamp default now()
);
CREATE INDEX view_tenant_idx ON page_views (tenant_id);
CREATE INDEX view_time_idx ON page_views USING BRIN (view_time);

SELECT create_distributed_table('page_views', 'tenant_id');

Next, we generate 100 million rows of fake data (takes a few minutes) and load it into the database:

\COPY (SELECT s % 307, (random()*5000)::int, '203.0.113.' || (s % 251), now() + random() * interval '60 seconds' FROM generate_series(1,100000000) s) TO '/tmp/views.csv' WITH CSV

\COPY page_views FROM '/tmp/views.csv' WITH CSV

We can now perform aggregations at run-time by performing a SQL query against the facts table:

-- Most views in the past week
SELECT page_id, count(*) AS view_count
FROM page_views
WHERE tenant_id = 5 AND view_time >= date '2016-11-23'
GROUP BY tenant_id, page_id
ORDER BY view_count DESC LIMIT 3;
 page_id | view_count 
---------+------------
    2375 |         99
    4538 |         95
    1417 |         93
(3 rows)

Time: 269.125 ms

However, we can do much better by creating a pre-computed aggregation, which we also distribute by tenant_id. Citus automatically co-locates the table with the page_views table:

CREATE TABLE daily_page_views (
    tenant_id int,
    day date,
    page_id int,
    view_count bigint,
    primary key (tenant_id, day, page_id)
);

SELECT create_distributed_table('daily_page_views', 'tenant_id');

We can now populate the aggregation using a simple INSERT..SELECT command, which is parallelised across the cores in our workers, processing around 10 million events per second and generating 1.7 million aggregates:

INSERT INTO daily_page_views (tenant_id, day, page_id, view_count)
  SELECT tenant_id, view_time::date AS day, page_id, count(*) AS view_count
  FROM page_views
  GROUP BY tenant_id, view_time::date, page_id;

INSERT 0 1690649

Time: 10649.870 ms 

After creating the aggregation, we can get the results from the aggregation table in a fraction of the query time:

-- Most views in the past week
SELECT page_id, view_count
FROM daily_page_views
WHERE tenant_id = 5 AND day >= date '2016-11-23'
ORDER BY view_count DESC LIMIT 3;
 page_id | view_count 
---------+------------
    2375 |         99
    4538 |         95
    1417 |         93
(3 rows)

Time: 4.528 ms

We typically want to keep aggregations up-to-date, even as the current day progresses. We can achieve this by expanding our original command to only consider new rows and updating existing rows to consider the new data using ON CONFLICT (“upsert”). If we insert data for a primary key (tenant_id, day, page_id) that already exists in the aggregation table, then the count will be added instead.

INSERT INTO page_views VALUES (5, 10, '203.0.113.1');


INSERT INTO daily_page_views (tenant_id, day, page_id, view_count)
  SELECT tenant_id, view_time::date AS day, page_id, count(*) AS view_count
  FROM page_views
  WHERE view_time >= '2016-11-23 23:00:00' AND view_time < '2016-11-24 00:00:00'
  GROUP BY tenant_id, view_time::date, page_id
  ON CONFLICT (tenant_id, day, page_id) DO UPDATE SET
  view_count = daily_page_views.view_count + EXCLUDED.view_count;

INSERT 0 1

Time: 2787.081 ms

To regularly update the aggregation, we need to keep track of which rows in the facts table have already been processed, so as to avoid counting them more than once. A basic approach is to aggregate up to the current time, store the timestamp in a table, and continue from that timestamp on the next run. We do need to be careful that there may be in-flight requests with a lower timestamp, which is especially true when using bulk ingestion through COPY. We therefore roll up to a timestamp that lies slightly in the past, with the assumption that all requests that started before then have finished by now. We can easily codify this logic into a PL/pgSQL function:

CREATE TABLE aggregations (name regclass primary key, last_update timestamp);
INSERT INTO aggregations VALUES ('daily_page_views', now());


CREATE OR REPLACE FUNCTION compute_daily_view_counts()
RETURNS void LANGUAGE plpgsql AS $function$
DECLARE
  start_time timestamp;
  end_time timestamp := now() - interval '1 minute'; -- exclude in-flight requests
BEGIN
  SELECT last_update INTO start_time FROM aggregations WHERE name = 'daily_page_views'::regclass;
  UPDATE aggregations SET last_update = end_time WHERE name = 'daily_page_views'::regclass;

  SET LOCAL citus.all_modifications_commutative TO on; -- for on-premises, replication factor >1 only

  EXECUTE $$
    INSERT INTO daily_page_views (tenant_id, day, page_id, view_count)
      SELECT tenant_id, view_time::date AS day, page_id, count(*) AS view_count
      FROM page_views
      WHERE view_time >= $1 AND view_time < $2
      GROUP BY tenant_id, view_time::date, page_id
      ON CONFLICT (tenant_id, day, page_id) DO UPDATE SET
      view_count = daily_page_views.view_count + EXCLUDED.view_count$$
  USING start_time, end_time;
END;
$function$;

After creating the function, we can periodically call SELECT compute_daily_view_counts() to continuously update the aggregation with 1-2 minutes delay. More advanced approaches can bring down this delay to a few seconds.
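For example, a crontab entry along these lines (the connection details are placeholders) would run the roll-up once a minute:

* * * * * psql -h <coordinator-host> -d <database> -c "SELECT compute_daily_view_counts();"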

A few caveats to note:

  • In this example, we used a single, database-generated time column, but it’s generally better to distinguish between the time at which the event happened at the source and the database-generated ingestion time used to keep track of whether an event was already processed.
  • Due to a current limitation in the INSERT .. SELECT implementation, we recommend using timestamp instead of timestamptz for the ingestion time column.
  • When running Citus on-premises with built-in replication, we recommend you set citus.all_modifications_commutative to on before any INSERT..SELECT command, since Citus otherwise locks the source tables to avoid inconsistencies between replicas. On Citus Cloud this is a non-issue as we leverage Postgres streaming replication.

You might be wondering why we used a page_id in the examples instead of something more meaningful like a URL. Are we trying to dodge the overhead of storing URLs for every page view to make our numbers look better? We certainly are! With Citus you can often avoid the cost of denormalization that you would pay in distributed databases that don’t support joins. You can simply put the static details of a page inside another table and perform a join:

CREATE TABLE pages (
    tenant_id int,
    page_id int,
    url text,
    language varchar(2),
    primary key (tenant_id, page_id)
);

SELECT create_distributed_table('pages', 'tenant_id');

... insert pages ...

-- Most views in the past week
SELECT url, view_count
FROM daily_page_views JOIN pages USING (tenant_id, page_id)
WHERE tenant_id = 5 AND day >= date '2016-11-23'
ORDER BY view_count DESC LIMIT 3;
   url    | view_count 
----------+------------
 /home    |         99
 /contact |         95
 /product |         93
(3 rows)

Time: 7.042 ms

You can also perform joins in the INSERT..SELECT command, allowing you to create more detailed aggregations, e.g. by language.

Distributed aggregation adds another tool to Citus’ broad toolchest in dealing with big data problems. With parallel INSERT .. SELECT, parallel indexing, parallel querying, scaling write throughput through Citus MX, and many other features, Citus can not only horizontally scale your multi-tenant database, but can also unify many different parts of your data pipeline into one platform.

Álvaro Hernández: The conundrum of BI/aggregate queries on MongoDB

From Planet PostgreSQL. Published on Nov 28, 2016.

MongoDB is a very popular NoSQL database. MongoDB indeed popularized the concept of the “document”, a hierarchical set of key-value pairs with a non-fixed structure (schema). Having flexible schemas and a 1:1 mapping between the business logic and the database objects are, arguably, key to MongoDB’s popularity, and the reasons why MongoDB is used as the source of truth for many OLTP applications.

Surprisingly to some, a dynamic schema does not do away with data design, nor does it prevent the performance implications of that design:

When designing data models, always consider the application usage of the data (i.e. queries, updates, and processing of the data) as well as the inherent structure of the data itself.

But what if queries cannot be known ahead of time? What if different sets of queries have almost orthogonal design needs? After all, relational theory and data normalization are precisely aimed at solving this problem: a data representation that is completely independent of the queries.

Queries that need to retrieve a document or part of it, typically via an index, perform very well. But what are the consequences of running queries on MongoDB that the data was not modeled for? Indeed, how do aggregate queries, which represent a completely different query pattern from the typical OLTP gimme-this-single-document query, perform? How does MongoDB perform when used with Business Intelligence (BI) applications, where queries are unknown and left to the end users?

Let’s reason intuitively first. What is NoSQL’s “schema-less” concept? It means that any document, with any given structure (there’s always a structure), is stored into the system as-is. One after the other. So if there is mixed information, different data belonging to different “parts” of the business logic, some documents with certain properties and others without, what does it look like? Chaos!


Aggregate queries typically require scanning many documents, looking for specific properties of the data, often without the help of an index. Since NoSQL stores documents with any structure and data, how does the database know which documents match the query criteria? It doesn’t! It has to scan the whole dataset (normally, the whole collection) to answer the query.

This scan is not cheap. Below is depicted how it works on MongoDB. First, decompression may need to be performed. Then, for each document, the BSON document needs to be traversed and evaluated with the query criteria. Worst case, every single key-pair of every document needs to be evaluated.

Let’s use the Github Archive dataset as an example, and illustrate a query to return the top 10 most active actors. This is how MongoDB would internally process all the data in the collection to find the query results:

MongoDB scan operation

And the MongoDB aggregate query:

db.githubarchive.aggregate([
  {
    $group: {
      _id: '$actor.login', events: { $sum: 1 }
    }
  },
  { $sort: { events: -1 }},
  { $limit: 10 }
])

In comparison, an RDBMS with a normalized data design will typically target a subset of the tables, and only the columns involved will be evaluated. So much less I/O, which is normally the bottleneck of a database, is required to answer the same query. In this case, only the table github_actor is needed to answer the query:

PostgreSQL table scan

The query on a PostgreSQL relational database would be:

SELECT count(*), login
FROM github_actor
GROUP BY login
ORDER BY 1 DESC
FETCH FIRST 10 ROWS ONLY;

 

The I/O required by both queries can be measured using iotop -o -a:

iotop: MongoDB vs PostgreSQL

Using a 1.4 GB sample of the Github Archive dataset, the query in MongoDB requires exactly the same I/O as the collection’s reported storage size (536.37 MB). This was expected: the whole collection is scanned. PostgreSQL scans just a sixth of the data to answer the query. Taking this further, if we use Citus Data’s cstore (a columnar storage extension for PostgreSQL), the results are even more surprising: answering the query requires just 1/100th of the I/O that MongoDB requires:

iotop: MongoDB vs PostgreSQL vs PostgreSQL with cstore
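For reference, setting up a cstore table looks roughly like this (a minimal sketch assuming cstore_fdw is installed; the table and column names follow the github_actor example above):

CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Columnar copy of the actors table: only the columns the query touches
-- are read from disk, which is where the extra I/O savings come from.
CREATE FOREIGN TABLE github_actor_cstore (
    login text
) SERVER cstore_server
  OPTIONS (compression 'pglz');

INSERT INTO github_actor_cstore SELECT login FROM github_actor;

The aggregate query shown earlier then runs unchanged against github_actor_cstore.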

So the performance of aggregate queries on NoSQL, more precisely MongoDB, is not as good as one might expect. What does that look like in practice? We measured the execution time of the above queries, comparing MongoDB and PostgreSQL on 1 GB, 10 GB and 100 GB samples of the Github Archive dataset (measuring query execution time; less is better):

Aggregate query performance: MongoDB vs PostgreSQL

Without zooming in, it is hard to appreciate PostgreSQL’s results:

Aggregate query performance: MongoDB vs PostgreSQL

Similar aggregate queries deliver the same results. At 8Kdata, while developing ToroDB, we have run dozens of similar queries. The results are consistently clear: a relational design almost always outperforms MongoDB aggregate queries by several factors, often orders of magnitude. Although we have seen even more pathological cases (i.e., even faster), it can safely be said that it is “100x faster”.

So one solution for the BI/aggregate queries on MongoDB is to bring them to SQL! Indeed, Stripe did that. There are ETLs too. However, all of these solutions require complex processes and designing the SQL schema of the destination data on your RDBMS. There’s no easy way to bring MongoDB to relational.

Hope is not lost, though. In a few days, 8Kdata will be publicly announcing a solution to this problem. If you are interested, keep watching this blog, subscribe to our newsletter and follow us on Twitter. Stay tuned!

Joshua Drake: What should I submit to PgConf US 2017?

From Planet PostgreSQL. Published on Nov 28, 2016.

From the title, that is the question. This is the last week of the PgConf US 2017 CFP (you can submit here: http://www.pgconf.us/2017/submit/) and I have no idea what to submit.

I am blessed that my talks are very well attended, the audience is engaged, and we all have a good time, often laughing at me because I have a hard time staying on one specific topic (especially if someone brings a kid into the room). There is a disclaimer I have to put up on my slides because some in the community can't handle humor or PG-13 content, but we must all love our neighbors and enjoy them for who they are. I have more than once seen a community member shaking their head at me (Grant, I am looking at you), but that is part of the fun and part of the show. Let's get everyone loving PostgreSQL. That gets me to my point.

I have no idea what to submit.

What would you like to see me speak on?

  • Replication?
    • What type?
  • Backups?
    • pg_dump, or binary?
  • Data types?
  • Modeling?
  • Deploying with Kubernetes?
  • What about LXC/LXD?
  • What the PostgreSQL Snap packages are?
  • Bare metal to cloud comparison?
  • The constant nagging issue of no PostgreSQL issue tracker?
  • How Steven Universe uses PostgreSQL? (That's a joke but meant to say... insert anything else here)

You tell me. Help me out here. I want to once again have a standing-room-only party of community members who want to love and learn about PostgreSQL. I just don't know what people want to hear about this year.

How to break Python

By James Bennett from Django community aggregator: Community blog posts. Published on Nov 28, 2016.

Don’t worry, this isn’t another piece about Python 3. I’m fully in favor of Python 3, and on record as to why. And if you’re still not convinced, I suggest this thoroughly comprehensive article on the topic, which goes over not just the bits people get angry about but also the frankly massive amount of cool stuff that only works in Python 3, and that you’re missing out on if you still only use Python 2 ...

Read full entry

How to Filter QuerySets Dynamically

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Nov 28, 2016.

Filtering QuerySets dynamically is a fairly common use case. Sure enough, there is a pluggable app to make your life easier. This tutorial is about how to use the django-filter app to add hassle-free filtering to your views. To illustrate this tutorial I will implement a view to search for users. As usual, the code used in this tutorial is available on GitHub. You can find the link at the end of this post.


Installation

Easiest way is to install it with pip:

pip install django-filter

That’s it. It’s ready to be used. Make sure you update your requirements.txt.

The default language of the app is English. It already comes with some localization and language support. Currently the supported languages are: de, es_AR, fr, pl, ru, zh_CN. Unless you want to use one of those languages in your project, you don’t need to add django_filters to INSTALLED_APPS.
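If you do want one of those translations, it is just a matter of listing the app in your settings; a minimal sketch of settings.py (the app label is django_filters):

INSTALLED_APPS = [
    # ... your project apps ...
    'django_filters',  # only needed for the bundled translations
]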


Usage

Create a file named filters.py inside your app folder:

filters.py

from django.contrib.auth.models import User
import django_filters

class UserFilter(django_filters.FilterSet):
    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name', ]

The view is as simple as:

views.py

from django.contrib.auth.models import User
from django.shortcuts import render
from .filters import UserFilter

def search(request):
    user_list = User.objects.all()
    user_filter = UserFilter(request.GET, queryset=user_list)
    return render(request, 'search/user_list.html', {'filter': user_filter})

Then a route:

urls.py

from django.conf.urls import url
from mysite.search import views

urlpatterns = [
    url(r'^search/$', views.search, name='search'),
]

And finally the template:

user_list.html

{% extends 'base.html' %}

{% block content %}
  <form method="get">
    {{ filter.form.as_p }}
    <button type="submit">Search</button>
  </form>
  <ul>
  {% for user in filter.qs %}
    <li>{{ user.username }} - {{ user.get_full_name }}</li>
  {% endfor %}
  </ul>
{% endblock %}

The magic happens inside the UserFilter class. We simply have to pass the request.GET data to the UserFilter class, along with the QuerySet we want to filter. It will generate a Django Form with the search fields as well as return the filtered QuerySet.

So basically we will be working inside the UserFilter definition and the HTML template, displaying the data properly.

This is what the initial example looks like:

Django Filter

And after submitting the form:

Django Filter


Using The Generic Class-Based View

If you won’t be doing anything special inside the view function, you can easily replace it with django-filter’s generic class-based view.

You may pass either a model or a filterset_class as a parameter. We will be working with the filterset_class, to give us more flexibility:

urls.py

from django.conf.urls import url
from django_filters.views import FilterView
from mysite.search.filters import UserFilter

urlpatterns = [
    url(r'^search/$', FilterView.as_view(filterset_class=UserFilter,
        template_name='search/user_list.html'), name='search'),
]

Note that I also passed the template_name as a parameter. By default django-filter will look for the template based on the model’s app and its name, following the logic: <app_name>/<model_name>_filter.html.

Since I’m implementing a filter for the Django User, which lives inside the auth app (and I don’t have access to it), the django-filter library would look for a template in the path: auth/user_filter.html.


Exploring the Filtering Options

It is very straightforward to use the filters. Basically it uses Django’s QuerySet field lookups.

So, let’s say we want to filter by first_name; the input should be case-insensitive and may match part of the name.

filters.py

class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(lookup_expr='icontains')
    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name', ]

Django Filter

We may also add the date_joined field, and filter it by the year only:

filters.py

class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(lookup_expr='icontains')
    year_joined = django_filters.NumberFilter(name='date_joined', lookup_expr='year')
    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name', ]

Django Filter

Note that I didn’t add it to the fields list. If you are defining it as a class attribute, you don’t need to explicitly add it to the Meta class.

We may also add some extra attributes, related to the year:

class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(lookup_expr='icontains')
    year_joined = django_filters.NumberFilter(name='date_joined', lookup_expr='year')
    year_joined__gt = django_filters.NumberFilter(name='date_joined', lookup_expr='year__gt')
    year_joined__lt = django_filters.NumberFilter(name='date_joined', lookup_expr='year__lt')
    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name', ]

Django Filter

Another option is to define the fields as a dictionary:

class UserFilter(django_filters.FilterSet):
    class Meta:
        model = User
        fields = {
            'username': ['exact', ],
            'first_name': ['icontains', ],
            'last_name': ['exact', ],
            'date_joined': ['year', 'year__gt', 'year__lt', ],
        }

Works fine with ManyToManyField too:

class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(lookup_expr='icontains')
    year_joined = django_filters.NumberFilter(name='date_joined', lookup_expr='year')

    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name', 'year_joined', 'groups']

Django Filter

We may also override the default widget for the ManyToManyField groups field, using checkboxes instead:

from django import forms
from django.contrib.auth.models import User, Group
import django_filters

class UserFilter(django_filters.FilterSet):
    first_name = django_filters.CharFilter(lookup_expr='icontains')
    year_joined = django_filters.NumberFilter(name='date_joined', lookup_expr='year')
    groups = django_filters.ModelMultipleChoiceFilter(queryset=Group.objects.all(),
        widget=forms.CheckboxSelectMultiple)

    class Meta:
        model = User
        fields = ['username', 'first_name', 'last_name', 'year_joined', 'groups']

Django Filter

For more details about the field options, please refer to the django-filter’s official documentation.


Improving the Template

Actually this is an extra for the post. In the end, the filter.form we access in the template is just a regular Django form. But in case you are wondering how to make it look prettier, here is what we can do:

{% extends 'base.html' %}

{% load widget_tweaks %}

{% block content %}
  <form method="get">
    <div class="well">
      <h4 style="margin-top: 0">Filter</h4>
      <div class="row">
        <div class="form-group col-sm-4 col-md-3">
          {{ filter.form.username.label_tag }}
          {% render_field filter.form.username class="form-control" %}
        </div>
        <div class="form-group col-sm-4 col-md-3">
          {{ filter.form.first_name.label_tag }}
          {% render_field filter.form.first_name class="form-control" %}
        </div>
        <div class="form-group col-sm-4 col-md-3">
          {{ filter.form.last_name.label_tag }}
          {% render_field filter.form.last_name class="form-control" %}
        </div>
        <div class="form-group col-sm-4 col-md-3">
          {{ filter.form.year_joined.label_tag }}
          {% render_field filter.form.year_joined class="form-control" %}
        </div>
        <div class="form-group col-sm-8 col-md-6">
          {{ filter.form.groups.label_tag }}
          <div>
            {% for choice in filter.form.groups %}
              <label class="checkbox-inline">
                {{ choice.tag }} {{ choice.choice_label }}
              </label>
            {% endfor %}
          </div>
        </div>
      </div>
      <button type="submit" class="btn btn-primary">
        <span class="glyphicon glyphicon-search"></span> Search
      </button>
    </div>
  </form>

  <table class="table table-bordered">
    <thead>
      <tr>
        <th>Username</th>
        <th>First name</th>
        <th>Last name</th>
        <th>Date joined</th>
        <th>Groups</th>
      </tr>
    </thead>
    <tbody>
      {% for user in filter.qs %}
        <tr>
          <td>{{ user.username }}</td>
          <td>{{ user.first_name }}</td>
          <td>{{ user.last_name }}</td>
          <td>{{ user.date_joined }}</td>
          <td>
            {% for group in user.groups.all %}
              {{ group }}
            {% empty %}
              <em class="text-muted">No group</em>
            {% endfor %}
          </td>
        </tr>
      {% empty %}
        <tr>
          <td colspan="5">No data</td>
        </tr>
      {% endfor %}
    </tbody>
  </table>
{% endblock %}

The result:

Django Filter Bootstrap

Please note that the {% render_field %} tag is available in the django-widget-tweaks app. If you want to learn more about how to use it, have a look at this post: Package of the Week: Django Widget Tweaks.


Conclusions

That’s it! I just wanted to share a little bit about this package. There’s much more you can do with it. It also integrates well with Django Rest Framework.

The code used in this tutorial is available on GitHub simple-django-filter.


Douglas Hunley: EXPLAINing intermittent perf problems

From Planet PostgreSQL. Published on Nov 28, 2016.

We've all gotten the dreaded email/call from a user stating that a query is "slow sometimes". If you're lucky, the "sometimes" actually ends up being fairly consistent and you can easily determine what's happening (an errant cron job, for example). All too often, though, the issue really is sporadic, fleeting, and nondeterministic. So how do you track these down? And more importantly, what do you do about them once found?

For starters, you as the DBA should have your PostgreSQL logging configured to log these slow-performing queries. After all, you, the devs, and the users can agree that all queries should complete in some measure of time (1 sec, 1 minute, etc). So, once you know what this acceptable elapsed time is, you can easily log any query that runs longer by just setting this in your postgresql.conf:

log_min_duration_statement = 1000   # log anything running longer than 1s

And now, you have all queries with long run times logged automatically. And these show up nicely in your pgBadger reports too!

If you're lucky, you'll be able to use EXPLAIN to see why the query is behaving poorly. However, if your life is like mine, the explain plan will be reasonable and won't have any smoking guns to speak of. Which means the performance is either load dependent or being influenced by other processes (something is blowing out your caches, for example). In these cases, what you really need is the EXPLAIN output from the very instant that it performed poorly. However, you can't go back in time to get it. But what you can do is make use of the auto_explain module that ships with PostgreSQL.

In case the name wasn't obvious enough, the auto_explain module causes PostgreSQL to automatically run EXPLAIN on queries according to thresholds that you configure. These automatically generated plans are then logged into the normal PostgreSQL logs. Let's walk through setting it up and see how it works.

First, in your postgresql.conf we want to enable the module:

shared_preload_libraries = 'auto_explain'  # change requires restart

As stated, you will have to restart the postmaster to get the module to load. However, let's configure it in postgresql.conf first:

# Add settings for extensions here
#
# auto_explain
# http://www.postgresql.org/docs/current/static/auto-explain.html
auto_explain.log_analyze = true
auto_explain.log_timing = true
auto_explain.log_verbose = true
auto_explain.log_min_duration = '1000ms'
auto_explain.log_nested_statements = true
auto_explain.log_buffers = true
# auto_explain

What we've done here is configure auto_explain to

  • use EXPLAIN ANALYZE [1]
  • to use the TIMING option of EXPLAIN
  • to use the VERBOSE option of EXPLAIN
  • to log the plan for anything running longer than 1 second (matches log_min_duration_statement, above)
  • to include statements inside a function to also be logged
  • to use the BUFFERS option of EXPLAIN

As with most GUCs in PostgreSQL, these can all be changed using SET in a given session, but we're setting the defaults here.
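For example, to log the plan for every statement in just your current session (which is essentially what I do for the demo below), a quick sketch:

-- Session-level override: capture a plan for everything, this session only
SET auto_explain.log_min_duration = '0ms';

Now that we have them set up, let's see what it looks like in practice.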

(doug@[local]:5432/doug[28838]) # CREATE TABLE x(t text);
CREATE TABLE
Time: 6.022 ms
(doug@[local]:5432/doug[28838]) # INSERT INTO x(t) SELECT generate_series(1,10000);
INSERT 0 10000
Time: 23.565 ms
(doug@[local]:5432/doug[28838]) #

We connected to PostgreSQL, created a test table, and then used generate_series to insert 10k rows. In our logs, the following were added:

2016-11-28 13:20:30 EST [28838]: [18-1] user=doug,db=doug,app=psql,client=[local] LOG:  duration: 33.987 ms  statement: CREATE TABLE x(t text);
2016-11-28 13:20:59 EST [28838]: [19-1] user=doug,db=doug,app=psql,client=[local] LOG:  duration: 16.461 ms  plan:
  Query Text: INSERT INTO x(t) SELECT generate_series(1,10000);
  Insert on public.x  (cost=0.00..50.02 rows=1000 width=32) (actual time=16.459..16.459 rows=0 loops=1)
    Buffers: shared hit=10085 read=47 dirtied=45
    I/O Timings: read=0.012
    ->  Subquery Scan on "*SELECT*"  (cost=0.00..50.02 rows=1000 width=32) (actual time=0.010..4.755 rows=10000 loops=1)
          Output: "*SELECT*".generate_series
          ->  Result  (cost=0.00..15.02 rows=1000 width=4) (actual time=0.007..1.364 rows=10000 loops=1)
                Output: generate_series(1, 10000)
2016-11-28 13:20:59 EST [28838]: [20-1] user=doug,db=doug,app=psql,client=[local] LOG:  duration: 23.374 ms  statement: INSERT INTO x(t) SELECT generate_series(1,10000);
2016-11-28 13:21:00 EST [30079]: [1-1] user=,db=,app=,client= LOG:  automatic analyze of table "doug.public.x" system usage: CPU 0.00s/0.11u sec elapsed 0.14 sec

(Note that for illustrative purposes, I issued SET auto_explain.log_min_duration = '0ms')

So, you can see that the CREATE TABLE didn't log anything through the auto_explain module, but the INSERT INTO did. This is a boring example, so let's try a SELECT against our table:

(doug@[local]:5432/doug[28838]) # SELECT * FROM x ORDER BY t LIMIT 10;
┌───────┐
│   t   │
├───────┤
│ 1     │
│ 10    │
│ 100   │
│ 1000  │
│ 10000 │
│ 1001  │
│ 1002  │
│ 1003  │
│ 1004  │
│ 1005  │
└───────┘
(10 rows)

Time: 11.982 ms

and the logs look like:

2016-11-28 13:27:38 EST [322]: [7-1] user=,db=,app=,client= LOG:  checkpoint starting: time
2016-11-28 13:27:46 EST [322]: [8-1] user=,db=,app=,client= LOG:  checkpoint complete: wrote 75 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=7.569 s, sync=0.092 s, total=7.920 s; sync files=23, longest=0.092 s, average=0.004 s; distance=685 kB, estimate=685 kB
2016-11-28 13:28:48 EST [28838]: [21-1] user=doug,db=doug,app=psql,client=[local] LOG:  duration: 11.120 ms  plan:
  Query Text: SELECT * FROM x ORDER BY t LIMIT 10;
  Limit  (cost=561.10..561.12 rows=10 width=4) (actual time=11.073..11.073 rows=10 loops=1)
    Output: t
    Buffers: shared hit=45
    ->  Sort  (cost=561.10..586.10 rows=10000 width=4) (actual time=11.072..11.072 rows=10 loops=1)
          Output: t
          Sort Key: x.t
          Sort Method: top-N heapsort  Memory: 25kB
          Buffers: shared hit=45
          ->  Seq Scan on public.x  (cost=0.00..345.00 rows=10000 width=4) (actual time=0.018..1.224 rows=10000 loops=1)
                Output: t
                Buffers: shared hit=45
2016-11-28 13:28:48 EST [28838]: [22-1] user=doug,db=doug,app=psql,client=[local] LOG:  duration: 11.813 ms  statement: SELECT * FROM x ORDER BY t LIMIT 10;

(You can safely ignore the checkpoint lines at the top there)

There you have both the statement we ran, and the full EXPLAIN plan. You can see we did a sequential scan on the table (looks like it was all in the shared_buffers too) and then we passed that up to a sort node for an in-memory sort, and then passed that result set up to the limit node.

While this is a stupid simple example, I hope you can see that having this in production for large, complicated queries will allow you to better diagnose issues. For example, simply doing a manual EXPLAIN ANALYZE on the same query and seeing that you get a different plan is potentially enough to rule out (or in) certain culprits for the intermittent performance issue.



[1] This option causes the postmaster to collect info on every statement executed, even if auto_explain isn't going to log it. It has a measurable impact on overall performance. Please test on your workload and decide for yourself if the overhead is worth the trade-off.

Federico Campoli: pg chameleon 1.0 alpha1

From Planet PostgreSQL. Published on Nov 27, 2016.

As mentioned in the presentation I did at the Brighton PostgreSQL Meetup, pg chameleon is a commute project.
In general I work on it when I travel, and thanks to pgconf and other commitments I've travelled a lot recently.

This is the presentation's recording (sorry the audio is suboptimal).


So I was able to build a minimum viable product which seems to work properly.

The alpha1 release is available for download and testing here

https://github.com/the4thdoctor/pg_chameleon/releases/tag/1.0-alpha.1

The system comes with the following limitations.

Installation in virtualenv

To work properly, the requirements should be installed via pip inside a virtualenv. I'm planning to make the package available via pip once it reaches the RC level.
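Something along these lines should do (just a sketch; the requirements file name is assumed to be the one shipped in the pg_chameleon source tree):

virtualenv venv
source venv/bin/activate
pip install -r requirements.txt   # assumed name of the project's requirements file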

No daemon yet

The script should be executed in a screen session to keep it running. Currently there's no respawning of the process on failure, nor a failure detector.

psycopg2 requires python and postgresql dev files

psycopg2's pip installation requires the Python development files and the PostgreSQL development packages.
Please refer to your distribution for fulfilling those requirements.

DDL replica limitations

DDL and DML mixed in the same transaction are not decoded in the right order. This can result in replica breakage caused by a wrong jsonb descriptor if the DML changes the data on the same table modified by the DDL. I know about the issue and I'm working on a solution.

Test, please!

Please submit the issues you find.
Bear in mind this is an alpha release. If you use the software in production, keep an eye on the process to ensure the data is correctly replicated.

Regina Obe: PostGIS 2.3.1 Released

From Planet PostgreSQL. Published on Nov 27, 2016.

The PostGIS development team is pleased to announce the release of PostGIS 2.3.1 As befits a patch release, the focus is on bugs and breakages.

Continue Reading by clicking title hyperlink ..

Markus Winand: Big News In Databases

From Planet PostgreSQL. Published on Nov 27, 2016.

2016 has seen lots of big news. Also when it comes to databases. Let's have a look at the really BIG news.

MySQL and MariaDB become modern

For a long time MySQL and MariaDB were mostly limited to the functionality of SQL-92. Now, both have decided to move forward by supporting the WITH and OVER clauses in their next major releases (MySQL 8.0, MariaDB 10.2). The MariaDB beta release already supports both features, while the current MySQL “lab release” only supports the WITH clause so far.
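For a quick taste of what these two clauses enable, here is a generic sketch (table and column names are made up for illustration):

-- A named derived table via WITH, then a running total via OVER
WITH paid_orders AS (
    SELECT customer_id, order_date, amount
      FROM orders
     WHERE status = 'paid'
)
SELECT customer_id,
       order_date,
       amount,
       SUM(amount) OVER (PARTITION BY customer_id
                         ORDER BY order_date) AS running_total
  FROM paid_orders;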

Parallel execution in PostgreSQL 9.6

Commercial databases have been able to do it for decades, but in the open source database scene it is quite unique: since release 9.6, PostgreSQL can execute single queries on multiple cores and thus make better use of modern hardware for data-intensive queries. At the moment, only some special cases can take advantage of in-query parallelism. The next releases will gradually extend this functionality. Speaking of next releases: the next major PostgreSQL release will be 10.0.

SQL Server on Linux (Beta)

A beta of the next version of SQL Server (currently called “vNext”) was released, also for Linux. On supported distributions (Red Hat, Ubuntu) the installation is done by the regular package manager. Further, Microsoft provides Docker images. The “What's New” list of vNext is rather modest from a developer's perspective. However, SQL Server finally gets STRING_AGG. Also noteworthy is that CREATE OR ALTER was added to SQL Server 2016 SP1, which was released at the same time the vNext beta went public.
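For those who haven't seen it, STRING_AGG collapses a group of rows into one delimited string, along these lines (illustrative table and column names):

SELECT customer_id,
       STRING_AGG(tag, ', ') AS tags   -- one row per customer: 'red, blue, green'
  FROM customer_tags
 GROUP BY customer_id;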

Oracle 12cR2 currently in the cloud only

On the one hand, version 12.2 of the Oracle Database has already been released. On the other hand, you cannot download it. Cloud first seems to be the new strategy for Oracle. For developers the new release is not very exciting anyway: error handling for CASE and LISTAGG, approximate aggregates and, of course, more JSON. Find the full list in the “New Features” documentation.

DB2 learns OFFSET – unfortunately

DB2 for Linux, UNIX and Windows (LUW for short) got release 11.1 earlier this year. The general theme of the new SQL features is compatibility. It seems like IBM aims to support every single SQL dialect out there. And so it comes that DB2 supports the SQL:2011 OFFSET clause even without enabling the MySQL compatibility vector. A big setback for my No-Offset movement.

SQLite gets Row-Values

SQLite compensates for this by introducing row values support in version 3.15.0: the SQL-92 feature that often makes life without OFFSET easier.
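Row values are exactly what you need for keyset (seek) pagination, the OFFSET-free way to fetch the next page; a rough sketch with made-up column names:

-- Fetch the page after the last row already shown, instead of skipping N rows
SELECT *
  FROM news
 WHERE (published_at, id) < (?, ?)   -- values taken from the last row of the previous page
 ORDER BY published_at DESC, id DESC
 LIMIT 10;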

Free No-Offset sticker

If you don't know about No-Offset yet, you should definitely read this article. During this year's Cyber Monday sale of my book, I'd be glad to mail you a No-Offset sticker for free. You'll also get the Use The Index, Luke! and modern SQL stickers. Just fill in this form and get them without any obligation:

Order free stickers now

“Big News In Databases” by Markus Winand was originally published at Use The Index, Luke!.

Ernst-Georg Schmid: pgchem::tigress 4.0 released

From Planet PostgreSQL. Published on Nov 27, 2016.

pgchem::tigress 4.0 is finally out!
  • This builds against PostgreSQL >= 9.4.x and OpenBabel 4.2.x on Linux.
  • It contains all fixes and contributions of the previous versions.
  • Support for building with Indigo has been removed
  • A lot of dead code has been removed
  • All palloc calls have been replaced by palloc0
  • MOLECULE is not binary compatible and needs to be recreated when you upgrade from 3.2 to 4.0!
So: CREATE EXTENSION pgchem_tigress;

Django Channels: Using Custom Channels

By Abu Ashraf Masnun from Django community aggregator: Community blog posts. Published on Nov 26, 2016.

In my earlier blog post - Introduction to Django Channels, I mentioned that we can create our own channels for various purposes. In this blog post, we will discuss where custom channels can be useful, what the challenges could be, and of course we will see some code examples. But before we begin, please make sure you are familiar with the concepts of Django Channels. I would recommend going through the above mentioned post and the official docs to familiarize yourself with the basics.

Our Use Case

Channels is just a queue which has consumers (workers) listening to it. With that concept in mind, we might be able to think of many innovative use cases a queue could have. But in our example, we will keep the idea simple. We are going to use Channels as a means of background task processing.

We will create our own channels for different tasks. There will be consumers waiting for messages on these channels. When we want to do something in the background, we pass it on to the appropriate channel and the workers will take care of the task. For example, we want to create a thumbnail of a user-uploaded photo? We pass it to the thumbnails channel. We want to send a confirmation email? We send it to the welcome_email channel. Like that. If you are familiar with Celery or Python RQ, this will sound pretty familiar to you.

Now here’s my use case - in one of the projects I am working on, we’re building APIs for mobile applications. We use Braintree for payment integration. The mobile application sends a nonce - it’s like a token that we can use to initiate the actual transaction. The transaction has two steps: first we initiate it using the nonce and get back a transaction id, then we query whether the transaction succeeded or failed. I felt it would be a good idea to process this in the background. We already have a websocket endpoint implemented using Channels, so I thought it would be great to leverage the existing setup instead of introducing something new into the stack.

Challenges

It has so far worked pretty well. But we have to remember that Channels does not guarantee delivery of the messages and there is no retrying if a message fails. So we wrote a custom management command that checks the orders for any records that have the nonce set but no transaction id, or that have a transaction id but no final result stored. We then scheduled this command to run at a certain interval and queue up the unfinished/incomplete orders again. In our case, it doesn’t hurt if the orders take some 5 to 10 minutes to process.
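The re-queue command boils down to filtering the incomplete orders and sending them to the channel again; here is a rough sketch (the model path, field names and payload are illustrative, not our actual code):

from django.core.management.base import BaseCommand
from channels import Channel

from orders.models import Order  # hypothetical app path


class Command(BaseCommand):
    help = "Re-queue orders whose payment processing never finished"

    def handle(self, *args, **options):
        # Orders that got a nonce but never reached a final result
        # (field names are illustrative)
        stuck = Order.objects.filter(
            braintree_nonce__isnull=False,
            transaction_result__isnull=True,
        )
        for order in stuck:
            Channel("braintree_process").send({
                "order": {
                    "braintree_nonce": order.braintree_nonce,
                    "total": str(order.total),
                },
                "order_id": order.id,
            })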

But if we were working on a product where the message delivery was time critical for our business, we probably would have considered Celery for the background processing part.

Let’s see the codes!

First we needed to write a handler. The handler receives the messages on the subscribed channel and processes them. Here’s the handler:

def braintree_process(message):
    order_data = message.content.get('order')
    order_id = message.content.get('order_id')
    order_instance = Order.objects.get(pk=order_id)

    if order_data:
        nonce = order_data.get("braintree_nonce")
        if nonce:
            # [snipped]

            TRANSACTION_SUCCESS_STATUSES = [
                braintree.Transaction.Status.Authorized,
                braintree.Transaction.Status.Authorizing,
                braintree.Transaction.Status.Settled,
                braintree.Transaction.Status.SettlementConfirmed,
                braintree.Transaction.Status.SettlementPending,
                braintree.Transaction.Status.Settling,
                braintree.Transaction.Status.SubmittedForSettlement
            ]

            result = braintree.Transaction.sale({
                'amount': str(order_data.get('total')),
                'payment_method_nonce': nonce,
                'options': {
                    "submit_for_settlement": True
                }
            })

            if result.is_success or result.transaction:
                transaction = braintree.Transaction.find(result.transaction.id)
                if transaction.status in TRANSACTION_SUCCESS_STATUSES:
                    # [snipped]
                else:
                    # [snipped]
            else:
                errors = []
                for x in result.errors.deep_errors:
                    errors.append(str(x.code))

                # [snipped]

Then we needed to define a routing so that the messages on a certain channel are passed on to this handler. So in our channel routing, we added this:

from channels.routing import route
from .channel_handlers import braintree_process

channel_routing = [
    route("braintree_process", braintree_process),
    # [snipped] ...
]

We now have a routing set and a handler ready to accept messages. So we’re ready! All we need to do is to start passing the data to this channel.

When the API receives a nonce, it just passes the order details to this channel:

Channel("braintree_process").send({
    "order": data,
    "order_id": order.id
})

And then the workers start working. They accept the message and start processing the payment request.

In our case, we already had the workers running (since they were serving our websocket requests). If you don’t have any workers running, don’t forget to run them.

python manage.py runworker

If you are wondering about how to deploy channels, I have you covered - Deploying Django Channels using Daphne

Prioritizing / Scaling Channels

In our project, Django Channels does two things: handling websocket connections for realtime communication, and processing delayed jobs in the background. As you can probably guess, the realtime part is more important. In our current setup, the running workers handle both types of requests as they come. But we want to dedicate more workers to the websockets, and perhaps just one worker should keep processing the payments.

Luckily, we can limit our workers to certain channels using the --only-channels flag. Or alternatively, we can exclude certain channels by using the --exclude-channels flag.
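For example, something like this split (the channel name follows the example above; the exact worker counts are up to you):

# Workers dedicated to everything except payments
python manage.py runworker --exclude-channels=braintree_process

# A single worker that only processes the payment channel
python manage.py runworker --only-channels=braintree_process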

Concluding Thoughts

I personally find the design of Channels very straightforward, simple and easy to reason about. When Channels gets merged into Django, it’s going to be quite useful, not just for implementing HTTP/2 or websockets, but also as a way to process background tasks with ease and without introducing third party libraries.

Denis Gasparin: pgrepup – upgrade PostgreSQL using logical replication

From Planet PostgreSQL. Published on Nov 26, 2016.

pgrepup is a tool written in Python for upgrading a PostgreSQL cluster to a new major version using logical replication and pglogical extension.

pgrepup simplifies the setup of 2nd Quadrant’s pglogical extension, giving hints for correctly configuring the source and destination pgsql clusters.

The supported versions of PostgreSQL are 9.4, 9.5 and 9.6.

Quick start

Requirements

pgrepup requires both a source and a destination PostgreSQL cluster.

The clusters can have been installed into the same server or on different hosts.

pgrepup doesn’t need to be installed on the clusters’ server.
It can be safely executed from a remote host that can access both pgsql clusters. In this case, it’s recommended that SSL is enabled in pg_hba.conf of both clusters because pgsql credentials are sent over the network.

Installation

pip install pgrepup

All versions of Python >= 2.7 are supported.

Replication

A pgsql cluster can be replicated and upgraded using pgrepup following these four steps:

  1. pgrepup config: a simple wizard asks for the basic configuration parameters needed by pgrepup
    • Source and Destination database cluster
    • Directory where to store temporary files
  2. pgrepup check: various checks are performed on both the Source and Destination cluster
    • if a check fails, pgrepup outputs a hint to help you configure each cluster
  3. pgrepup setup: if the checks are all ok, this step installs and configures pglogical in both pgsql clusters
  4. pgrepup start: starts the replication process

After the start command, you can monitor the replication process using the command pgrepup status.

The output of the status command displays an entry for each database of the source cluster along with the status reported by the pglogical extension.
The status can be one of the following three values:

  • initializing: pglogical is copying data from the source to the destination cluster
  • replicating: pglogical is using pgsql logical replication to replicate and upgrade new data changed in the source cluster
  • down: replication is down; check the PostgreSQL log in both clusters

After issuing the start command, all databases will be in the initializing status. During this phase pglogical background workers are executing the SQL dump of the source cluster, so it can take a while to complete.
When the dump is completed, each database status will change to replicating as the data is progressively copied from the source cluster.

Upgrade

When the replication is working fine, you can switch your application to the destination cluster at any moment.
Just follow these steps:

  • stop your application connecting to the source cluster
  • ensure no more connections are made to the source cluster
  • stop replication using the pgrepup stop command
  • change the DSN in your application (or in your connection pooler) and point to the destination cluster
  • start your application
  • upgrade done! 🙂

Caveats and limits

pgrepup is still experimental. Please feel free to open an issue on github if you encounter problems.

DDL commands

DDL commands issued in a source cluster database are not replicated to the destination cluster. This is a limit of how pgsql logical replication works.
Use the pglogical.replicate_ddl_command SQL function on the source database in order to replicate the DDL on the destination cluster.
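For example, instead of running the DDL directly on the source database, something like this (the table and column here are just placeholders):

SELECT pglogical.replicate_ddl_command(
    'ALTER TABLE public.my_table ADD COLUMN note text;'
);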

Be aware that, at the moment, pgrepup doesn’t handle the automatic subscription of newly created tables added using pglogical.replicate_ddl_command.
The recommended procedure is to re-start the replication process using the stop, setup and start commands.

A solution is in the works and will be available in the next release of pgrepup.

Sequences

Sequences are replicated between the source and destination cluster. When the stop command is given, pgrepup uses a pglogical function to do a final synchronization of each sequence value.
The pglogical function adds an artificial +1000 to the actual sequence value: see this discussion on the pglogical mailing list on Google Groups.

High number of databases

After issuing a start command, the pglogical background workers all start simultaneously dumping the data of the source databases into the destination databases.

This can generate very high cpu/disk load on both clusters depending on the number of databases to replicate.

A feature to limit the number of databases that are dumped concurrently is in the works.

Contributions

pgrepup is licensed under the GPL-3 license. Source code is available at the project page on GitHub: https://github.com/rtshome/pgrepup

Contributions are welcome!

Regina Obe: PostGIS 2.2.4 Released

From Planet PostgreSQL. Published on Nov 25, 2016.

The PostGIS development team is pleased to announce the release of PostGIS 2.2.4 As befits a patch release, the focus is on bugs and breakages.

Continue Reading by clicking title hyperlink ..

Shaun M. Thomas: PG Phriday: Extended Elections

From Planet PostgreSQL. Published on Nov 25, 2016.

One of the best features Postgres boasts is the ability to adapt. Any schmo off the street can write an extension and bolt it onto Postgres with nary a second glance. As proof, I’m going to whip one up really quick. That should be enough to convince anyone that it takes no skill at all to add functionality to Postgres.

Just so our extension actually does something, let’s start off with the instant-runoff code we wrote a few weeks ago. Except this time, we also want to transform it into a kind of utility to justify the fact it’s an extension. If it’s just a schema, a few tables and associated functions, it’s just another application.

Let’s start with the preamble in our SQL file:

\echo USE "CREATE EXTENSION irv;" TO LOAD this file. \quit
 
SET client_min_messages = warning;
 
--------------------------------------
-- CREATE EXTENSION USER
--------------------------------------
 
DO $$
BEGIN
  PERFORM 1
    FROM pg_roles
   WHERE rolname = 'irv_role';
 
  IF NOT FOUND THEN
    EXECUTE 'CREATE ROLE irv_role';
    EXECUTE 'GRANT USAGE ON SCHEMA @extschema@ TO irv_role';
  END IF;
END;
$$ LANGUAGE plpgsql;

It’s best to begin with a line that prevents the extension from being executed as a simple SQL script. After that, we set the extension to report any warnings during installation.

The giant DO block is an old trick some extension authors use. When an extension is installed, it is generally owned only by the user who installed it. In most cases, this is a superuser of some kind, and that means nobody else can use the extension. Here, we’re creating a role that will have all necessary grants to call extension functions or view table contents. Then we can just grant the role to any users that should have access to the extension.
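Granting access then boils down to a single statement per user (the user name here is hypothetical):

GRANT irv_role TO some_app_user;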

Next we should create our tables:

 
CREATE TABLE election
(
  election_id    SERIAL PRIMARY KEY,
  election_name  VARCHAR NOT NULL,
  schema_name    VARCHAR NOT NULL UNIQUE
);
 
CREATE INDEX idx_election_election_name
    ON election (election_name);
 
SELECT pg_catalog.pg_extension_config_dump('election', '');
 
CREATE TABLE candidate
(
  candidate_id    SERIAL PRIMARY KEY,
  election_id     INT NOT NULL REFERENCES election,
  candidate_name  VARCHAR NOT NULL
);
 
CREATE INDEX idx_candidate_candidate_name
    ON candidate (candidate_name);
 
SELECT pg_catalog.pg_extension_config_dump('candidate', '');
 
CREATE TABLE vote
(
  election_id     INT NOT NULL REFERENCES election,
  candidate_list  INT[] NOT NULL
);
 
CREATE INDEX idx_vote_election_id
    ON vote (election_id);
 
CREATE TABLE scratch
(
  candidate_name  VARCHAR,
  votes           BIGINT,
  percent         NUMERIC,
  my_round        INT
);

By and large, these are the same as before. But notice that we’re invoking pg_extension_config_dump after two of the tables. This is because the election and candidate tables contain election metadata, and we want pg_dump to retain that information. Normally extensions don’t contain user data, so the presumption is that recreating the extension during database population is sufficient. Well, if we want to dump and then restore our database contents and retain the elections and associated candidates, we need to register those tables with Postgres.

But what about the vote table? This is where our extension earns its name. The thing about elections is that there are a lot of them, and millions of people vote. To prevent having a gigantic vote table, we’ve elected to “shard” the election vote results into separate tables in a schema named after the election itself. This means the vote table here is just scaffolding to help define the table which gets created by the extension.

Now let’s talk about how elections are started:

CREATE OR REPLACE FUNCTION start_election(
  elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR := SUBSTRING(
    regexp_replace(elect, '\W', '', 'g'),
    1, 32
  );
BEGIN
  INSERT INTO @extschema@.election (election_name, schema_name)
  VALUES (elect, use_schema);
 
  EXECUTE 'CREATE SCHEMA ' || use_schema;
 
  PERFORM set_config('search_path', use_schema, TRUE);
 
  CREATE TABLE irv_vote (
    vote_id BIGSERIAL PRIMARY KEY,
    LIKE @extschema@.vote INCLUDING ALL
  );
END;
$$ LANGUAGE PLPGSQL;

We start by removing all non-word characters from the election name, and then extracting the first 32 characters to create a valid schema name. Then we capture the election information as we did before. After that, we create the schema that will contain our votes, and the vote table itself. We’ve prefixed the vote table with “irv” to note that the extension created it.

We’ve also set the search_path to default to the new schema we just created. This way, we don’t need to build a SQL statement string and use EXECUTE to invoke it. There’s also the @extschema@ syntax. Some extensions allow users to specify a schema during installation. By using this syntax, Postgres will substitute @extschema@ for the schema where extension objects will be stored. Normally the schema is set by the extension author.

The rest of the functions have similar modifications; a preamble to set the vote schema, and @extschema@ replacements where necessary. Here’s what they look like:

CREATE OR REPLACE FUNCTION no_losers(orig INT[], remove INT[])
RETURNS INT[] AS
$$
DECLARE
  item INT; 
  ret_arr INT[] := orig;
BEGIN
  IF array_length(remove, 1) IS NULL THEN
    RETURN ret_arr;
  END IF;
 
  FOR item IN 1 .. array_length(remove, 1) LOOP
    ret_arr := array_remove(ret_arr, remove[item]);
  END LOOP;
 
  RETURN ret_arr;
END;
$$ LANGUAGE PLPGSQL IMMUTABLE STRICT;
 
CREATE OR REPLACE FUNCTION register_candidate(
  elect VARCHAR,
  cand VARCHAR
)
RETURNS VOID AS
$$
  INSERT INTO @extschema@.candidate (election_id, candidate_name)
  SELECT election_id, cand
    FROM @extschema@.election
   WHERE election_name = elect;
$$ LANGUAGE SQL;
 
CREATE OR REPLACE FUNCTION register_vote(
  elect VARCHAR,
  candidates VARCHAR[]
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR;
BEGIN
  SELECT schema_name INTO use_schema
    FROM @extschema@.election
   WHERE election_name = elect;
 
  PERFORM set_config('search_path', use_schema, TRUE);
 
  WITH ranked AS (
    SELECT candidate_name, ROW_NUMBER() OVER () AS rank
      FROM unnest(candidates) candidate_name
  ),
  conv AS (
    SELECT c.election_id, c.candidate_id, r.rank
      FROM @extschema@.candidate c
      JOIN ranked r USING (candidate_name)
     ORDER BY rank
  )
  INSERT INTO irv_vote (election_id, candidate_list)
  SELECT election_id, array_agg(candidate_id)
    FROM conv
   GROUP BY 1;
END;
$$ LANGUAGE PLPGSQL;
 
CREATE OR REPLACE FUNCTION tabulate_votes(
  elect VARCHAR
)
RETURNS VOID AS
$$
DECLARE
  use_schema VARCHAR;
  num_candidates INT;
  num_votes NUMERIC;
  vote_round INT;
  losers INT[] := '{}'::INT[]; -- Non-null empty array.
  winner VARCHAR;
  results RECORD;
BEGIN
  SELECT schema_name INTO use_schema
    FROM @extschema@.election
   WHERE election_name = elect;
 
  PERFORM set_config('search_path', use_schema, TRUE);
 
  DROP TABLE IF EXISTS working_round;
  CREATE TABLE working_round (LIKE @extschema@.scratch);
 
  -- Given the number of candidates in this election, we can
  -- reduce calculations.
 
  SELECT COUNT(*) INTO num_candidates
    FROM @extschema@.candidate c
    JOIN @extschema@.election e USING (election_id)
   WHERE e.election_name = elect;
 
  -- We need the total number of votes cast for display
  -- percentages.
 
  SELECT COUNT(*) INTO num_votes
    FROM irv_vote v
    JOIN @extschema@.election e USING (election_id)
   WHERE e.election_name = elect;
 
  -- Record each round of the IRV process. At the end of each round,
  -- eliminate the worst performer and try again. Do this until we
  -- reach > 50% for one candidate.
 
  FOR vote_round IN 1 .. (num_candidates - 1) LOOP
    RAISE NOTICE '== Round % ==', vote_round;
 
    INSERT INTO working_round
    SELECT c.candidate_name, COUNT(v.vote_id) AS votes,
           round(COUNT(v.vote_id) / 
                 num_votes * 100.0, 2) AS percent,
           vote_round AS my_round
      FROM irv_vote v
      JOIN @extschema@.candidate c ON (
             c.candidate_id = (
               @extschema@.no_losers(v.candidate_list, losers)
             )[1]
           )
      JOIN @extschema@.election e ON (e.election_id = v.election_id)
     WHERE e.election_name = elect
     GROUP BY c.candidate_name;
 
    -- Print the results of the round for spot-checking.
 
    FOR results IN
        SELECT * FROM working_round
         WHERE my_round = vote_round
    LOOP
      RAISE NOTICE '%: % (%)', results.candidate_name,
        results.votes, results.percent;
    END LOOP;
 
    -- If this round has a winner, short-circuit so we can
    -- just report the results.
 
    SELECT candidate_name INTO winner
      FROM working_round
     WHERE percent > 50
       AND my_round = vote_round;
 
    IF winner IS NOT NULL THEN
      RAISE NOTICE 'Winner of % is %!', elect, winner;
      EXIT;
    END IF;
 
    -- This is where we determine the loser of this round.
    -- It's just the lowest ranked result.
 
    SELECT array_append(losers, c.candidate_id) INTO losers
      FROM working_round w
      JOIN @extschema@.candidate c USING (candidate_name)
     WHERE my_round = vote_round
     ORDER BY w.votes
     LIMIT 1;
 
  END LOOP;
 
END;
$$ LANGUAGE PLPGSQL;

But how to install it? First we need a file named irv.control that will contain some parameters that tell Postgres how to install and identify the extension. It looks something like this:

comment = 'Extension for managing instant-runoff elections.'
default_version = '1.0'
relocatable = false
schema = irv

Don’t worry about that relocatable line. That just means we don’t want the extension to be moved with ALTER EXTENSION after initial installation. Our schema contains data instead of simply a collection of utility functions, so it’s not good practice to relocate willy nilly.

All Postgres expects is that this control file exists along with the SQL file in its configured extension directory. The easiest way to find this is to install the Postgres dev libraries so we have access to pg_config. Extensions are normally found in the extension subdirectory of the location reported by pg_config --sharedir.

Then we just need to create our SQL file there, along with the control file. Postgres does have a naming scheme we need to follow, however. We need to take the extension name and version, and separate them by a double dash. So in our case, the SQL file should be named irv--1.0.sql.
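In practice that means copying both files into place, something like this (a sketch; the exact path comes from your own pg_config):

cp irv.control irv--1.0.sql "$(pg_config --sharedir)/extension/"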

Of course, we can’t forget the grants we spoke about in the beginning:

REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ FROM PUBLIC;
GRANT ALL ON ALL TABLES IN SCHEMA @extschema@ TO irv_role;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA @extschema@ TO irv_role;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA @extschema@ TO irv_role;

If we combine all of the SQL from this article, and save it as irv--1.0.sql along with the specified control file, we’ve successfully written an extension. But one question remains: does it work?

Well, let’s see! Votes have been coming in for weeks now, so we can even use the new totals under the same assumptions as before.

CREATE EXTENSION irv;
 
SELECT irv.start_election('Pres 2016');
 
SELECT irv.register_candidate('Pres 2016', 'Clinton');
SELECT irv.register_candidate('Pres 2016', 'Trump');
SELECT irv.register_candidate('Pres 2016', 'Stein');
SELECT irv.register_candidate('Pres 2016', 'Johnson');
 
SELECT irv.register_vote('Pres 2016', '{Clinton, Stein}')
  FROM generate_series(1, 64433);
 
SELECT irv.register_vote('Pres 2016', '{Stein, Clinton}')
  FROM generate_series(1, 1395);
 
SELECT irv.register_vote('Pres 2016', '{Trump, Johnson}')
  FROM generate_series(1, 62337);
 
SELECT irv.register_vote('Pres 2016', '{Johnson, Trump}')
  FROM generate_series(1, 4418);
 
SELECT irv.tabulate_votes('Pres 2016');
 
NOTICE:  == Round 1 ==
NOTICE:  Stein: 1395 (1.05)
NOTICE:  Trump: 62337 (47.02)
NOTICE:  Clinton: 64433 (48.60)
NOTICE:  Johnson: 4418 (3.33)
NOTICE:  == Round 2 ==
NOTICE:  Trump: 62337 (47.02)
NOTICE:  Clinton: 65828 (49.65)
NOTICE:  Johnson: 4418 (3.33)
NOTICE:  == Round 3 ==
NOTICE:  Trump: 66755 (50.35)
NOTICE:  Clinton: 65828 (49.65)
NOTICE:  Winner of Pres 2016 is Trump!

We can even see the tables where all of the data exists for this particular election. This pattern will persist for any election controlled by this extension.

SELECT schemaname, tablename
  FROM pg_tables
 WHERE schemaname = 'pres2016';
 
 schemaname |   tablename   
------------+---------------
 pres2016   | working_round
 pres2016   | irv_vote

Regardless of how you might feel about the US 2016 election, it’s great to see the extension working as expected!

And isn’t that all that really matters?

Kaarel Moppel: PostgreSQL: writer and wal writer processes explained

From Planet PostgreSQL. Published on Nov 25, 2016.

Looking at the running processes list on a server where Postgres is running, one sees two writer processes amongst other background and client processes. One process is more specifically named wal writer though. But still confusing – why are there 2 similarly named writers? What do they do? From my experiences from trainings purely this […]

The post PostgreSQL: writer and wal writer processes explained appeared first on Cybertec - The PostgreSQL Database Company.

Aislan Wendling: Importing from Oracle to PostgreSQL using CSV file OR How to get Oracle data without a Foreign Data Wrapper

From Planet PostgreSQL. Published on Nov 24, 2016.

Hi, folks.

At my work, my boss urgently needed a report involving PostgreSQL and Oracle databases. As almost all the situations that already use data from both are handled by PHP, I had never had the need to install oracle_fdw. So I started installing it and writing the SQL. At some point, I ran into a problem with a library.

ERROR: could not load library "/usr/local/pg936/lib/postgresql/oracle_fdw.so": libclntsh.so.12.1: cannot open shared object file: No such file or directory

The problem, in fact, is not a problem. You just need to restart the database.

– Wait a minute! You just said the word “restart”???

Well, I had that problem too. There was no possibility of restarting anything at that very moment and my boss was waiting for the report. So I needed to do it another way.

Doing two queries and copying the results into a spreadsheet could be a possibility, but it is not very effective when you need to do a lot of cross-referencing between them, so I tried another solution.

All this text was to give some background to the following; don’t worry, I will be more technical from now on.

Install the Oracle instantclient:

It’s quite easy: just head to the Oracle website, accept their terms and download 3 packages: Basic, SQL*Plus and SDK;

Install them as in this example:
rpm -ivh oracle-instantclient12.1-basic-12.1.0.2.0-1.x86_64.rpm

Make the environment variables permanent as in this example:

root@dbserver: vim /etc/profile
[a lot of code up here]...
export LD_LIBRARY_PATH=/usr/lib/oracle/12.1/client64/lib:$LD_LIBRARY_PATH
export PATH=/usr/lib/oracle/12.1/client64/bin:$PATH
:wq
[a lot of code down here]...

Now you’re ready to go.

Connect, Export, and Import:

There’s a bunch of ways to do that, so feel free to do as you please.

As postgres user, create a .sh file (maybe exportoracle.sh)

Inside it, write down the connection string pointing to the file with the query that you want to run on Oracle, and the psql command to import the result into your PostgreSQL database:

sqlplus64 oracleuser/password@host/sid @exportquery.sql
psql -d database -f importquery.sql

Looks easy from above.

If you have ever tried to do that, you may have encountered a lot of little problems, from date format incompatibilities to concatenated data.

Let’s look inside the queries to understand better.

exportquery.sql
alter session set nls_date_format='DD/MM/YYYY';

set colsep ,
set pagesize 0
set trimspool on
set headsep off
set linesize 1000
set numw 16
set feedback off

spool exportoracle.csv

select alias1.column,
alias2.column,
alias3.column,
ROW_NUMBER() OVER (PARTITION BY
alias1.column, alias2.column ORDER BY 1,2) as row_order
from table1 alias1
join table2 alias2 on (alias2.column2 = alias1.column2)
join table3 alias3 on (alias3.column2 = alias1.column2)
where alias1.column = value
and alias3.column between to_char(SYSTIMESTAMP - INTERVAL '90' DAY, 'DD/MM/YYYY') and to_char(SYSTIMESTAMP, 'DD/MM/YYYY')
order by 1,2,3;
exit

All those SETs make your file human readable and, most important, importable.

importquery.sql
BEGIN;
CREATE TABLE schema.importedtable (
column1 INTEGER NOT NULL,
column2 INTEGER NOT NULL,
column3 DATE NOT NULL,
column4 INTEGER NOT NULL
)
WITH (oids = false)
TABLESPACE dbtablespace;

CREATE INDEX importedtable_idx1 ON schema.importedtable
USING btree (column1 , column3 )
TABLESPACE idxtablespace;
COMMIT;
BEGIN;
COPY schema.importedtable FROM '/home/postgres/database/exportoracle.csv' DELIMITER ',' CSV;
COMMIT;

Now you can do your queries on your PostgreSQL database without restarting it.


How to Add User Profile To Django Admin

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Nov 23, 2016.

There are several ways to extend the default Django User model. Perhaps one of the most common ways (and also the least intrusive) is to extend the User model using a one-to-one link. This strategy is also known as a User Profile. One of the challenges of this particular strategy, if you are using Django Admin, is how to display the profile data in the User edit page. And that’s what this tutorial is about.


Background

A while ago I published an article about How to Extend Django User Model, describing in great detail all the different strategies. If you are still not familiar with it, I strongly suggest that you have a look at that article.

This tutorial is about the User Profile strategy. So, consider we have an app named core with the following model definition:

models.py

from django.contrib.auth.models import User
from django.db import models
from django.db.models.signals import post_save
from django.dispatch import receiver

class Profile(models.Model):
    STUDENT = 1
    TEACHER = 2
    SUPERVISOR = 3
    ROLE_CHOICES = (
        (STUDENT, 'Student'),
        (TEACHER, 'Teacher'),
        (SUPERVISOR, 'Supervisor'),
    )
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    location = models.CharField(max_length=30, blank=True)
    birthdate = models.DateField(null=True, blank=True)
    role = models.PositiveSmallIntegerField(choices=ROLE_CHOICES, null=True, blank=True)

    def __str__(self):  # __unicode__ for Python 2
        return self.user.username

@receiver(post_save, sender=User)
def create_or_update_user_profile(sender, instance, created, **kwargs):
    if created:
        Profile.objects.create(user=instance)
    instance.profile.save()

What we want to achieve is making the fields location, birthdate and role available to be edited on Django Admin.


Editing The User Profile

The trick is very simple. We will extend the default UserAdmin, add the profile instance as an inline and switch the UserAdmin Django uses.

admin.py

from django.contrib import admin
from django.contrib.auth.admin import UserAdmin
from django.contrib.auth.models import User

from .models import Profile

class ProfileInline(admin.StackedInline):
    model = Profile
    can_delete = False
    verbose_name_plural = 'Profile'
    fk_name = 'user'

class CustomUserAdmin(UserAdmin):
    inlines = (ProfileInline, )

    def get_inline_instances(self, request, obj=None):
        if not obj:
            return list()
        return super(CustomUserAdmin, self).get_inline_instances(request, obj)


admin.site.unregister(User)
admin.site.register(User, CustomUserAdmin)

A few things here: I intentionally defined the verbose_name_plural as Profile, because each user can have only one profile and the component we are using is a StackedInline (which is normally used for formsets); this avoids displaying "Profiles" in plural when we only have one profile. It's just a minor thing.

Then a very important thing: we need to override the get_inline_instances method so that the inlines are displayed only in the edit form. Otherwise we might get some problems because of how the signals work. Remember that the signal is responsible for creating the Profile instance.

Since by default Django only asks for the username and password upon user creation, this way we keep the default behavior.

Finally, we unregister the old User admin and register the new custom one, which includes the Profile model.

It will look like this:

Django Admin User Profile

Cool, right? It just works.


Adding Profile Fields to List View

Now what if we want to display the User location (which is defined inside the profile model), in the list view, like this:

Django Admin User Profile

Since we are already extending the UserAdmin, it’s just a matter of overriding the list_display attribute.

class CustomUserAdmin(UserAdmin):
    inlines = (ProfileInline, )
    list_display = ('username', 'email', 'first_name', 'last_name', 'is_staff', 'get_location')
    list_select_related = ('profile', )

    def get_location(self, instance):
        return instance.profile.location
    get_location.short_description = 'Location'

    def get_inline_instances(self, request, obj=None):
        if not obj:
            return list()
        return super(CustomUserAdmin, self).get_inline_instances(request, obj)

We have to define an extra method, get_location, and append it to list_display, because the field is defined in an external model (the relationship is defined in the Profile model, not in the User model). So for the Django admin to understand how to display the location attribute, we have to play it this way.

The get_location.short_description is just there to display a prettier label in the table header. Otherwise the Django admin would display it as "Get Location" (which is the name of the method).

A very important detail: note the list_select_related. We are adding the profile relationship there so as to avoid firing hundreds of unnecessary queries against the database. For more on that subject, check this post: Django Tips #3 Optimize Database Queries.


Conclusions

That's it! Plain and simple. This also opens up an opportunity to explore the User model in the Django admin further. You can override several attributes and very easily add new list actions, for example.

Django's documentation is awesome. But even better is to explore its source code. For example, to understand more about the definition of UserAdmin, don't be afraid to explore the source code and see the class's attributes and methods: django.contrib.auth.admin.

Also the source code I used in this tutorial is available on GitHub: sibtc/django-admin-user-profile.

Yeray Darias: PostgreSQL page layout

From Planet PostgreSQL. Published on Nov 23, 2016.

As developers we commonly use databases, even daily, but do we really know how they work? How do they store information? Which internal structure do they use? This post will cover how information is stored in PostgreSQL, from the table structure to the row’s organization at a disk level.

Getting page information

To get detailed information about the data structures we can use the pageinspect extension. After installing this extension we will be able to get the low-level details of PostgreSQL pages.

To install it:

# create extension pageinspect;

Tables from our database

PostgreSQL has many more tables than the ones included in our schema. It stores a lot of data, for example the tables and columns from our schemas.

There is more than one way to get the schema tables, but the easiest one is the command \dt.

# \dt eightkdata.*

The previous command will return a list of the relations included in the schema eightkdata.

Page structure in PostgreSQL

Once we know the tables in our schema, it is possible to analyze the page structure using some functions included in the pageinspect extension. For example, we can get the header information of each page, as is done in the next example for the first page of the table eightkdata.httparchive.

# SELECT * FROM page_header(get_raw_page('eightkdata.httparchive', 0));
┌────────────┬──────────┬───────┬───────┬───────┬─────────┬──────────┬─────────┬───────────┐
│    lsn     │ checksum │ flags │ lower │ upper │ special │ pagesize │ version │ prune_xid │
├────────────┼──────────┼───────┼───────┼───────┼─────────┼──────────┼─────────┼───────────┤
│ 6/AA8DD3F0 │        0 │     0 │   928 │   960 │    8192 │     8192 │       4 │         0 │
└────────────┴──────────┴───────┴───────┴───────┴─────────┴──────────┴─────────┴───────────┘

The header of a page is its first 24 bytes and contains basic information needed to manage future changes, like pointers to the free space or the size of the page (8KB by default in PostgreSQL).

Each column in page_header has a meaning:

  • lsn: Log Sequence Number: is the address of the next byte to be used in the page xlog.
  • checksum: Page checksum.
  • flags: Various flag bits.
  • lower: The offset where the free space starts; it will be the initial address of the next tuple created.
  • upper: The offset where the free space ends.
  • special: The offset where the special space starts, which is actually at the end of the page.
  • pagesize: It is the size of the page, 8KB by default, but it can be configured.
  • version: Version number of the page.
  • prune_xid: Signals when a pruning operation could be a good option to improve the system.

This data gives a nice view of the page organization that is represented in the next diagram.

PostgreSQL page layout

There is some data in the previous diagram that doesn't appear in page_header; it is part of the PostgreSQL configuration, like the fill factor or the alignment padding.

Fill factor

The fill factor is a value that tells PostgreSQL when to stop storing tuples in the current page and switch to a new one. By default pages are not completely filled. This allows tuple updates to be stored in the same original page, increasing the system performance.
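If you want to experiment with this, the fill factor can be set per table. A minimal sketch, with an invented table name and an arbitrary value:

-- reserve ~30% of each page for future updates of existing rows
CREATE TABLE sensor_readings (
    id      serial PRIMARY KEY,
    reading bigint NOT NULL
) WITH (fillfactor = 70);

-- or change it on an existing table; only newly written pages are affected
ALTER TABLE sensor_readings SET (fillfactor = 70);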

Alignment padding

To increase the efficiency of I/O operations, PostgreSQL uses a word length that depends on the machine where it is running. In modern computers with 64-bit processors the word length is 8 bytes.

This means tuples do not occupy exactly their logical size on disk, because PostgreSQL uses this extra space (alignment) to increase I/O performance.
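A quick way to observe this padding is the pg_column_size() function on composite values; the exact numbers below assume a 64-bit machine with 8-byte alignment:

-- a smallint followed by a bigint needs 6 padding bytes so that
-- the bigint starts on an 8-byte boundary (typically 40 bytes)
SELECT pg_column_size(ROW(1::smallint, 2::bigint));

-- reversing the order removes the padding (typically 34 bytes)
SELECT pg_column_size(ROW(2::bigint, 1::smallint));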

Tuple structure in PostgreSQL

Tuples can also be analyzed using another function called heap_page_items.

# select * from heap_page_items(get_raw_page('eightkdata.httparchive', 0)) limit 10;
┌────┬────────┬──────────┬────────┬────────┬────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬───────┐
│ lp │ lp_off │ lp_flags │ lp_len │ t_xmin │ t_xmax │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid │
├────┼────────┼──────────┼────────┼────────┼────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼───────┤
│  1 │   8160 │        1 │     29 │   6202 │      0 │        9 │ (0,1)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  2 │   8128 │        1 │     29 │   6202 │      0 │        9 │ (0,2)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  3 │   8096 │        1 │     29 │   6202 │      0 │        9 │ (0,3)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  4 │   8064 │        1 │     29 │   6202 │      0 │        9 │ (0,4)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  5 │   8032 │        1 │     29 │   6202 │      0 │        9 │ (0,5)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  6 │   8000 │        1 │     29 │   6202 │      0 │        9 │ (0,6)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  7 │   7968 │        1 │     29 │   6202 │      0 │        9 │ (0,7)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  8 │   7936 │        1 │     29 │   6202 │      0 │        9 │ (0,8)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│  9 │   7904 │        1 │     29 │   6202 │      0 │        9 │ (0,9)  │           2 │       2304 │     24 │ ¤      │     ¤ │
│ 10 │   7872 │        1 │     29 │   6202 │      0 │        9 │ (0,10) │           2 │       2304 │     24 │ ¤      │     ¤ │
└────┴────────┴──────────┴────────┴────────┴────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴───────┘

For each tuple there is some information available about its position inside the page, its visibility and its size, among other things.

  • lp: The index of the tuple in the page.
  • lp_off: Offset of the tuple inside the page.
  • lp_flags: Keeps the status of the item pointer.
  • lp_len: Length of the tuple.
  • t_xmin: Transaction number when the tuple was created.
  • t_xmax: Transaction number when the tuple was deleted.
  • t_field3: It can contain one of two possible values, t_cid or t_xvac. The t_cid is the CID stamp from the insert or delete. The t_xvac is the XID of the VACUUM operation when the row version changes.
  • t_ctid: Current TID.
  • t_infomask2: Number of attributes and some flag bits.
  • t_infomask: Some flag bits.
  • t_hoff: The offset where the user data is stored inside the tuple.

Tuple header layout

TOAST (The Oversized-Attribute Storage Technique)

Looking at the page size it is easy to see that some data cannot be stored in such a small space. For these cases there is a mechanism called TOAST.

By default PostgreSQL has two compile-time constants, TOAST_TUPLE_THRESHOLD and TOAST_TUPLE_TARGET, both with a value of roughly 2kB. When a tuple being stored is larger than that, the fields to which the technique can be applied (not all of them are TOAST-able) are stored in a TOAST table.
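To check whether a table has an associated TOAST table, and what it is called, the catalog can be queried directly; a small sketch using the table from the earlier examples:

SELECT reltoastrelid::regclass AS toast_table
  FROM pg_class
 WHERE relname = 'httparchive';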

Tomas Vondra: On the impact of full-page writes

From Planet PostgreSQL. Published on Nov 23, 2016.

While tweaking postgresql.conf, you might have noticed there’s an option called full_page_writes. The comment next to it says something about partial page writes, and people generally leave it set to on – which is a good thing, as I’ll explain later in this post. It’s however useful to understand what full page writes do, because the impact on performance may be quite significant.

Unlike my previous post on checkpoint tuning, this is not a guide how to tune the server. There’s not much you can tweak, really, but I’ll show you how some application-level decisions (e.g. choice of data types) may interact with full page writes.

Partial Writes / Torn Pages

So what are full page writes about? As the comment in postgresql.conf says it’s a way to recover from partial page writes – PostgreSQL uses 8kB pages (by default), but other parts of the stack use different chunk sizes. Linux filesystems typically use 4kB pages (it’s possible to use smaller pages, but 4kB is the max on x86), and at the hardware level the old drives used 512B sectors while new devices often write data in larger chunks (often 4kB or even 8kB).

So when PostgreSQL writes the 8kB page, the other layers of the storage stack may break this into smaller chunks, managed separately. This presents a problem regarding write atomicity. The 8kB PostgreSQL page may be split into two 4kB filesystem pages, and then into 512B sectors. Now, what if the server crashes (power failure, kernel bug, …)?

Even if the server uses storage system designed to deal with such failures (SSDs with capacitors, RAID controllers with batteries, …), the kernel already split the data into 4kB pages. So it’s possible that the database wrote 8kB data page, but only part of that made it to disk before the crash.

At this point you’re now probably thinking that this is exactly why we have transaction log (WAL), and you’re right! So after starting the server, the database will read WAL (since the last completed checkpoint), and apply the changes again to make sure the data files are complete. Simple.

But there’s a catch – the recovery does not apply the changes blindly, it often needs to read the data pages etc. Which assumes that the page is not already borked in some way, for example due to a partial write. Which seems a bit self-contradictory, because to fix data corruption we assume there’s no data corruption.

Full page writes are a way around this conundrum – when modifying a page for the first time after a checkpoint, the whole page is written into WAL. This guarantees that during recovery, the first WAL record touching a page contains the whole page, eliminating the need to read the – possibly broken – page from data file.

Write amplification

Of course, the negative consequence of this is increased WAL size – changing a single byte on the 8kB page will log the whole page into WAL. The full page write only happens on the first write after a checkpoint, so making checkpoints less frequent is one way to improve the situation – typically, there's a short "burst" of full page writes right after a checkpoint, and then relatively few full page writes until the end of the checkpoint interval.
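Checkpoint tuning itself is covered in the post linked above; as a quick reminder, the relevant postgresql.conf settings look roughly like this (the values are only an illustration, not a recommendation):

checkpoint_timeout = 30min           # spread checkpoints further apart
max_wal_size = 16GB                  # allow more WAL between checkpoints
checkpoint_completion_target = 0.9   # smooth out the checkpoint I/O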

UUID vs. BIGSERIAL keys

But there are some unexpected interactions with design decisions made at the application level. Let’s assume we have a simple table with primary key, either a BIGSERIAL or UUID, and we insert data into it. Will there be a difference in the amount of WAL generated (assuming we insert the same number of rows)?
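To make the comparison concrete, the two table shapes being compared might look roughly like this (a sketch only, assuming the pgcrypto extension for gen_random_uuid()):

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- sequential keys: new rows land on the same btree leaf pages
CREATE TABLE events_bigserial (
    id      BIGSERIAL PRIMARY KEY,
    payload TEXT
);

-- random keys: each insert is likely to hit a different leaf page
CREATE TABLE events_uuid (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    payload TEXT
);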

It seems reasonable to expect both cases to produce about the same amount of WAL, but as the following charts illustrate, there’s a huge difference in practice.

fpw-insert-integer-vs-uuid

This shows the amount of WAL produced during a 1h benchmark, throttled to 5000 inserts per second. With a BIGSERIAL primary key this produces ~2GB of WAL, while with UUID it's more than 40GB. That's quite a significant difference, and quite clearly most of the WAL is associated with the index backing the primary key. Let's look at the types of WAL records.

fpw-insert-integer-vs-uuid-details

Clearly, the vast majority of the records are full-page images (FPI), i.e. the result of full-page writes. But why is this happening?

Of course, this is due to the inherent UUID randomness. With BIGSERIAL, new values are sequential, and so get inserted into the same leaf pages in the btree index. As only the first modification to a page triggers the full-page write, only a tiny fraction of the WAL records are FPIs. With UUID it's a completely different case, of course – the values are not sequential at all; in fact each insert is likely to touch a completely new index leaf page (assuming the index is large enough).

There’s not much the database can do – the workload is simply random in nature, triggering many full-page writes.

It's not difficult to get similar write amplification even with BIGSERIAL keys, of course. It only requires a different workload – for example with an UPDATE workload, randomly updating records with a uniform distribution, the chart looks like this:

fpw-update-uniform-integer-vs-uuid

Suddenly, the differences between data types are gone – the access is random in both cases, resulting in almost exactly the same amount of WAL produced. Another difference is that most of the WAL is associated with “heap”, i.e. tables, and not indexes. The “HOT” cases were designed to allow HOT UPDATE optimization (i.e. update without having to touch an index), which pretty much eliminates all index-related WAL traffic.

But you might argue that most applications don’t update the whole data set. Usually, only a small subset of data is “active” – people only access posts from the last few days on a discussion forum, unresolved orders in an e-shop, etc. How does that change the results?

Thankfully, pgbench supports non-uniform distributions, and for example with exponential distribution touching 1% subset of data ~25% of the time, the chart looks like this:

fpw-update-25-integer-vs-uuid

And after making the distribution even more skewed, touching the 1% subset ~75% of the time:

fpw-update-75-integer-vs-uuid

This again shows how big a difference the choice of data types may make, and also the importance of tuning for HOT updates.
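For anyone who wants to reproduce a skewed workload like the ones above, pgbench custom scripts can use the random_exponential() function. The following is only a rough sketch against the standard pgbench_accounts table; the parameter value 10 is a placeholder, not the exact value used for the charts:

\set naccounts 100000 * :scale
\set aid random_exponential(1, :naccounts, 10)
UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = :aid;

Saved as skew.sql, it could be run with something like pgbench -f skew.sql -c 8 -T 3600 testdb.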

8kB and 4kB pages

An interesting question is how much WAL traffic we could save by using smaller pages in PostgreSQL (which requires compiling a custom package). In the best case, it might save up to 50% of the WAL, thanks to logging 4kB instead of 8kB pages. For the workload with uniformly distributed UPDATEs it looks like this:

fpw-update-uniform-4k-8k-integer-vs-uuid

So the saving is not exactly 50%, but a reduction from ~140GB to ~90GB is still quite significant.
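For reference, the smaller page size mentioned above is a compile-time choice; a source build with 4kB pages would look roughly like this (a sketch, assuming the standard --with-blocksize configure option):

./configure --with-blocksize=4    # block size in kB; the default is 8
make
make install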

Do we still need full-page writes?

It might seem ridiculous after explaining the danger of partial writes, but disabling full page writes might be a viable option, at least in some cases.

Firstly, I wonder whether modern Linux filesystems are still vulnerable to partial writes. The parameter was introduced in PostgreSQL 8.1, released in 2005, so perhaps some of the many filesystem improvements introduced since then make this a non-issue. Probably not universally for arbitrary workloads, but maybe assuming some additional condition (e.g. using a 4kB page size in PostgreSQL) would be sufficient? Also, PostgreSQL never overwrites only a subset of the 8kB page – the whole page is always written out.

I've done a lot of tests recently trying to trigger a partial write, and I haven't managed to cause a single case yet. Of course, that's not really proof the issue does not exist. But even if it's still an issue, data checksums may be sufficient protection (they won't fix the issue, but will at least let you know there's a broken page).

Secondly, many systems nowadays rely on streaming replication replicas – instead of waiting for the server to reboot after a hardware issue (which can take quite a long time) and then spend more time performing recovery, the systems simply switch to a hot standby. If the database on the failed primary is removed (and then cloned from the new primary), partial writes are a non-issue.

But I guess if we started recommending that, then “I don’t know how the data got corrupted, I’ve just set full_page_writes=off on the systems!” would become one of the most common sentences right before death for DBAs (together with the “I’ve seen this snake on reddit, it’s not poisonous.”).

Summary

There’s not much you can do to tune full-page writes directly. For most workloads, most full-page writes happen right after a checkpoint, and then disappear until the next checkpoint. So it’s important to tune checkpoints not to happen too often.

Some application-level decisions may increase the randomness of writes to tables and indexes – for example, UUID values are inherently random, turning even a simple INSERT workload into random index updates. The schema used in the examples was rather trivial – in practice there will be secondary indexes, foreign keys etc. But using BIGSERIAL primary keys internally (and keeping the UUID as surrogate keys) would at least reduce the write amplification.
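A rough sketch of that last suggestion, with invented names and again assuming pgcrypto for gen_random_uuid():

CREATE TABLE orders (
    id         BIGSERIAL PRIMARY KEY,                           -- internal, sequential key
    public_id  UUID NOT NULL DEFAULT gen_random_uuid() UNIQUE,  -- identifier exposed to clients
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

The unique index on public_id still sees random inserts, but foreign keys and joins referencing id stay sequential, which is where the reduction would come from.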

I’m really interested in discussion about the need for full-page writes on current kernels / filesystems. Sadly I haven’t found many resources, so if you have relevant info, let me know.

Leo Hsu and Regina Obe: ODBC FDW now supports 9.5 and 9.6

From Planet PostgreSQL. Published on Nov 22, 2016.

A while ago, when Foreign Data Wrappers in PostgreSQL were a fairly new thing, we talked about the ODBC_FDW foreign data wrapper. Since then, people have been asking us how to get the ODBC FDW to work on newer PostgreSQL. Sadly the ODBC_FDW was stuck in time, not having been updated to the newer FDW API standards. Our recommendation was just to use OGR_FDW, which many distributions, both Linux and Windows, have compiled with ODBC support. True, OGR_FDW is coined as a spatial data wrapper, but the reality is that spatial data rarely lives apart from regular attribute data, so a good spatial vector driver supports both vector data and bread-and-butter data types. OGR_FDW is still our go-to for working with spreadsheets and folders of CSV files.

Recently the fine folks at Carto patched the ODBC FDW to work with PostgreSQL 9.5. I do hope they accept my modest patch to make it work with PostgreSQL 9.6 as well. So now there are 2 FDWs to choose from for connecting to ODBC data sources. Which one is better? The answer, as almost always, is IT DEPENDS.


Continue reading "ODBC FDW now supports 9.5 and 9.6"

Django Multiple Files Upload Using Ajax

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Nov 22, 2016.

In this tutorial I will guide you through the steps to implement an AJAX multiple file upload with Django using jQuery. For this tutorial we will be using a specific plug-in called jQuery File Upload, which takes care of the server communication using AJAX and also the compatibility with different browsers.

The plug-in is great, but it has so many features that it can sometimes become challenging to get started. You will notice that some of the examples are a little bit redundant, repeating code and so on. That's on purpose, to avoid code abstraction and make the examples clearer.

At the end of this post you will also find the link to download all the code used in this tutorial.


Basic Configuration

Before you move forward, if you are not familiar at all with file uploads in Django, it is a good idea to check this post I published a while ago: How to Upload Files With Django. It will give you an overview of the basics and some caveats.

To work with file upload you will need to set the MEDIA_URL and MEDIA_ROOT.

settings.py

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

And to test the uploads in the development environment, add this to the bottom of your root urlconf:

urls.py

from django.conf import settings
from django.conf.urls import url, include
from django.conf.urls.static import static

urlpatterns = [
    # ...
]

if settings.DEBUG:
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)

Now the rest are static assets, mostly JavaScript: jQuery, Bootstrap, and the jQuery File Upload plug-in.

You will see that the jQuery File Upload plug-in comes with several script files; they all have a purpose and you will only need some of them for certain features.

Here is what my base template looks like:

base.html

{% load static %}<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>{% block title %}Photos Library - Simple is Better Than Complex{% endblock %}</title>
    <link href="{% static 'css/bootstrap.min.css' %}" rel="stylesheet">
    <!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
  </head>
  <body>
    {% include 'includes/header.html' %}
    <div class="container">
      {% block content %}
      {% endblock %}
    </div>
    <script src="{% static 'js/jquery-3.1.1.min.js' %}"></script>
    <script src="{% static 'js/bootstrap.min.js' %}"></script>
    {% block javascript %}
    {% endblock %}
  </body>
</html>

Bootstrap and jQuery in the base template, and the jQuery File Upload plug-in will be added using the {% block javascript %}. We will add them as we need.


Working Example

We will be working in an app called photos. Consider the model below for the rest of this tutorial:

models.py

class Photo(models.Model):
    title = models.CharField(max_length=255, blank=True)
    file = models.FileField(upload_to='photos/')
    uploaded_at = models.DateTimeField(auto_now_add=True)

And the form, as simple as:

forms.py

from .models import Photo

class PhotoForm(forms.ModelForm):
    class Meta:
        model = Photo
        fields = ('file', )

PS: Perhaps using an ImageField instead of a FileField would be more suitable for the use case, but to avoid the hassle of installing Pillow locally, let's just use FileField. That way we have an example that is easier to run.


Basic File Upload

First we need a route:

urls.py

from django.conf.urls import url
from . import views

urlpatterns = [
    url(r'^basic-upload/$', views.BasicUploadView.as_view(), name='basic_upload'),
]

A view to process the upload, nothing special:

views.py

from django.shortcuts import render
from django.http import JsonResponse
from django.views import View

from .forms import PhotoForm
from .models import Photo

class BasicUploadView(View):
    def get(self, request):
        photos_list = Photo.objects.all()
        return render(self.request, 'photos/basic_upload/index.html', {'photos': photos_list})

    def post(self, request):
        form = PhotoForm(self.request.POST, self.request.FILES)
        if form.is_valid():
            photo = form.save()
            data = {'is_valid': True, 'name': photo.file.name, 'url': photo.file.url}
        else:
            data = {'is_valid': False}
        return JsonResponse(data)

We are using a basic class-based view, defining request processing for the GET and POST methods. Essentially, when we access the page where we handle the upload, we are going to show the user a list of uploaded photos.

The POST is where the upload handling happens, using Django's model forms. When we call form.save(), Django will create a Photo instance and save the file in the file system.

We will follow the same strategy for the next examples as well.

Now to where the magic happens, on the client side. First, the required jQuery File Upload scripts:

photos/basic_upload/index.html

{% block javascript %}
  {# JQUERY FILE UPLOAD SCRIPTS #}
  <script src="{% static 'js/jquery-file-upload/vendor/jquery.ui.widget.js' %}"></script>
  <script src="{% static 'js/jquery-file-upload/jquery.iframe-transport.js' %}"></script>
  <script src="{% static 'js/jquery-file-upload/jquery.fileupload.js' %}"></script>

  {# PHOTOS PAGE SCRIPTS #}
  <script src="{% static 'photos/js/basic-upload.js' %}"></script>
{% endblock %}

  • jquery.ui.widget.js: a dependency for the plug-in
  • jquery.iframe-transport.js: the Iframe Transport is required for browsers without support for XHR file uploads
  • jquery.fileupload.js: the basic File Upload plug-in

And finally the script basic-upload.js is where we will implement our photo upload.

Now let’s see the rest of our template and explore the meat of the page:

photos/basic_upload/index.html

{# 1. BUTTON TO TRIGGER THE ACTION #}
<button type="button" class="btn btn-primary js-upload-photos">
  <span class="glyphicon glyphicon-cloud-upload"></span> Upload photos
</button>

{# 2. FILE INPUT TO BE USED BY THE PLUG-IN #}
<input id="fileupload" type="file" name="file" multiple
       style="display: none;"
       data-url="{% url 'photos:basic_upload' %}"
       data-form-data='{"csrfmiddlewaretoken": "{{ csrf_token }}"}'>

{# 3. TABLE TO DISPLAY THE UPLOADED PHOTOS #}
<table id="gallery" class="table table-bordered">
  <thead>
    <tr>
      <th>Photo</th>
    </tr>
  </thead>
  <tbody>
    {% for photo in photos %}
      <tr>
        <td><a href="{{ photo.file.url }}">{{ photo.file.name }}</a></td>
      </tr>
    {% endfor %}
  </tbody>
</table>

First, block 1 of the snippet is the button to start the workflow. We will hook into the CSS class .js-upload-photos to open the file explorer window.

Block 2 is the most important part of the page. It's the file input that will be used to load the jQuery File Upload component. A few things to note:

  • The name of the input must match the model form field. That is, if the field is called document, the input must be named name="document" as well. Otherwise the processing of the file will fail.
  • The multiple attribute will enable multiple file selection in the file explorer window.
  • The data-url attribute must point to the route/view where the file form will be processed.
  • The data-form-data attribute should be defined exactly this way! Don't try to swap the " and ' of the HTML. It won't work. This line is important to instruct the plug-in to send the file along with the CSRF middleware token.

Finally, block 3 is just a regular table displaying the photos.

Now let’s explore the page’s script, which is responsible for putting every piece of this puzzle together:

photos/js/basic-upload.js

$(function () {
  /* 1. OPEN THE FILE EXPLORER WINDOW */
  $(".js-upload-photos").click(function () {
    $("#fileupload").click();
  });

  /* 2. INITIALIZE THE FILE UPLOAD COMPONENT */
  $("#fileupload").fileupload({
    dataType: 'json',
    done: function (e, data) {  /* 3. PROCESS THE RESPONSE FROM THE SERVER */
      if (data.result.is_valid) {
        $("#gallery tbody").prepend(
          "<tr><td><a href='" + data.result.url + "'>" + data.result.name + "</a></td></tr>"
        )
      }
    }
  });

});

Remember how we returned the data in the View class?

def post(self, request):
    form = PhotoForm(self.request.POST, self.request.FILES)
    if form.is_valid():
        photo = form.save()
        data = {'is_valid': True, 'name': photo.file.name, 'url': photo.file.url}
    else:
        data = {'is_valid': False}
    return JsonResponse(data)

This JsonResponse will end up in the data parameter, passed to the anonymous function hooked to the done event of the File Upload component.

See what we are doing here? When we access data.result.name, we are accessing the name we returned in the JsonResponse. So, let's say we returned this instead:

return JsonResponse({'message': 'Success'})

We would be able to catch it inside the done function, this way:

done: function (e, data) {
  if (data.result.message === 'Success') {
    // do something...
  }
}

But what we are actually doing there is putting some pieces of HTML together and prepending them to the table body.

Let's look at some screenshots to see what it looks like:

Basic File Upload

Then, when the user clicks the Upload photos button:

Basic File Upload

The user selects as many files as they want and hits the Open button:

Basic File Upload

And all the files are uploaded to the server and added to the table, with a valid link! This is a minimal example. When you grab the code you will see it is really easy to reproduce.

It's great already! It works on most major web browsers (even IE 6.0+!). And it is just a matter of deciding what data to return, and what to do with it.


Displaying the Progress

An improvement we can make is to add a progress bar, so as not to leave the users hanging without knowing what is going on.

In our template, add the following snippet. It’s a Bootstrap modal:

photos/basic_upload/index.html

<div class="modal fade" id="modal-progress" data-backdrop="static" data-keyboard="false">
  <div class="modal-dialog">
    <div class="modal-content">
      <div class="modal-header">
        <h4 class="modal-title">Uploading...</h4>
      </div>
      <div class="modal-body">
        <div class="progress">
          <div class="progress-bar" role="progressbar" style="width: 0%;">0%</div>
        </div>
      </div>
    </div>
  </div>
</div>

Let's improve the JavaScript part so it calculates the progress:

photos/js/basic-upload.js

$(function () {

  $(".js-upload-photos").click(function () {
    $("#fileupload").click();
  });

  $("#fileupload").fileupload({
    dataType: 'json',
    sequentialUploads: true,  /* 1. SEND THE FILES ONE BY ONE */
    start: function (e) {  /* 2. WHEN THE UPLOADING PROCESS STARTS, SHOW THE MODAL */
      $("#modal-progress").modal("show");
    },
    stop: function (e) {  /* 3. WHEN THE UPLOADING PROCESS FINALIZE, HIDE THE MODAL */
      $("#modal-progress").modal("hide");
    },
    progressall: function (e, data) {  /* 4. UPDATE THE PROGRESS BAR */
      var progress = parseInt(data.loaded / data.total * 100, 10);
      var strProgress = progress + "%";
      $(".progress-bar").css({"width": strProgress});
      $(".progress-bar").text(strProgress);
    },
    done: function (e, data) {
      if (data.result.is_valid) {
        $("#gallery tbody").prepend(
          "<tr><td><a href='" + data.result.url + "'>" + data.result.name + "</a></td></tr>"
        )
      }
    }

  });

});

The sequentialUploads attribute will instruct the component to send one file at a time. What we are doing here, basically, is showing the loading modal when the upload starts and closing it when it finishes; meanwhile we update the percentage in the progress bar. The rest is the same as in the other example.

Here is how it will look like:

Progress Bar


Drag and Drop Upload

Something great about this plug-in: by default you can drop files anywhere on the page to start the upload process! This means the previous examples will simply work.

What we can do to help the users discover the functionality is just adding this simple HTML, for example:

photos/basic_upload/index.html

<div class="well text-muted text-center" style="padding-top: 4rem; padding-bottom: 4rem;">
  <span class="glyphicon glyphicon-arrow-down" style="font-size: 4rem;"></span>
  <h3>Drop Photos Here to Upload</h3>
</div>

And the result will be something like this:

Drag and Drop Upload


Conclusions

I wanted to give more tips and examples about jQuery File Upload usage, but the post was already getting big, so I will continue this next week. Let me know what you want to see next!

Anyway, the examples shown in this tutorial should cover the Django part. The rest is a matter of configuration and playing with the plug-in. For more details about its API and features, refer to the official documentation: jQuery File Upload Docs

The code used in this tutorial is available on GitHub: github.com/sibtc/multiple-file-upload, so you can try it locally. It’s very straightforward to get it running.

Magnus Hagander: A more secure Planet PostgreSQL

From Planet PostgreSQL. Published on Nov 22, 2016.

Today, Planet PostgreSQL was switched over from http to https. Previously, https was only used for the logged in portions for blog owners, but now the whole site uses it. If you access the page with the http protocol, you will automatically be redirected to https.

As part of this, the RSS feeds have also changed address from http to https (the path part of the URLs remain unchanged). If your feed reader does not automatically follow redirects, this will unfortunately make it stop updating until you have changed the URL.

In a couple of days we will enable HTTP Strict Transport Security on the site as well.

We apologize for the inconvenience for those of you who have to reconfigure your feeds, but we hope you agree the change is for the better.

Douglas Hunley: PostgreSQL logging, strftime, and you

From Planet PostgreSQL. Published on Nov 21, 2016.

PostgreSQL has a pretty extensive logging facility. I've talked briefly about configuring it to get the most out of pgBadger before, but today I'm gonna talk a bit about the naming of the log file itself. The chosen filename doesn't have to be static. You can, in fact, have the name dynamically created by using strftime() escapes. But what exactly are those?

    The strftime() function formats the information from timeptr into the buffer s,
    according to the string pointed to by format.

    The format string consists of zero or more conversion specifications and ordi-
    nary characters.  All ordinary characters are copied directly into the buffer.
    A conversion specification consists of a percent sign ```%''' and one other
    character.

Say what? Essentially, given a timestamp and a format specification, you'll get the timestamp back in a different formatted output. So what are these format specifications? Well, they are defined by your systems libc implementation. On Linux, this is (typically) glibc, whereas on OSX and the BSDs, it's BSD libc. You really shouldn't see a difference between these two, but it could happen. On macOS Sierra, they are (for the examples below, we'll use today's date of Monday, November 21, 2016 with a timestamp of 1pm EST):

    %A    is replaced by national representation of the full weekday name ("Monday")
    %a    is replaced by national representation of the abbreviated weekday name ("Mon")
    %B    is replaced by national representation of the full month name ("November")
    %b    is replaced by national representation of the abbreviated month name ("Nov")
    %C    is replaced by (year / 100) as decimal number; single digits are preceded by a zero ("20")
    %c    is replaced by national representation of time and date ("Mon Nov 21 13:00:00 2016")
    %D    is equivalent to ``%m/%d/%y'' ("11/21/16")
    %d    is replaced by the day of the month as a decimal number (01-31) ("21")
    %e    is replaced by the day of the month as a decimal number (1-31); single digits are preceded by a blank ("21")
    %F    is equivalent to ``%Y-%m-%d'' ("2016-11-21")
    %G    is replaced by a year as a decimal number with century.  This year is the one that contains the greater part of the week (Monday as the first day of the week) ("2016")
    %g    is replaced by the same year as in ``%G'', but as a decimal number without century (00-99) ("16")
    %H    is replaced by the hour (24-hour clock) as a decimal number (00-23) ("13")
    %h    the same as %b ("Nov")
    %I    is replaced by the hour (12-hour clock) as a decimal number (01-12) ("01")
    %j    is replaced by the day of the year as a decimal number (001-366) ("326")
    %k    is replaced by the hour (24-hour clock) as a decimal number (0-23); single digits are preceded by a blank ("13")
    %l    is replaced by the hour (12-hour clock) as a decimal number (1-12); single digits are preceded by a blank (" 1")
    %M    is replaced by the minute as a decimal number (00-59) ("00")
    %m    is replaced by the month as a decimal number (01-12) ("11")
    %n    is replaced by a newline
    %p    is replaced by national representation of either "ante meridiem" (a.m.) or "post meridiem" (p.m.)  as appropriate ("PM")
    %R    is equivalent to ``%H:%M'' ("13:00")
    %r    is equivalent to ``%I:%M:%S %p'' ("01:00:00 PM")
    %S    is replaced by the second as a decimal number (00-60) ("00")
    %s    is replaced by the number of seconds since the Epoch, UTC ("1479751200")
    %T    is equivalent to ``%H:%M:%S'' ("13:00:00")
    %t    is replaced by a tab
    %U    is replaced by the week number of the year (Sunday as the first day of the week) as a decimal number (00-53) ("47")
    %u    is replaced by the weekday (Monday as the first day of the week) as a decimal number (1-7) ("1")
    %V    is replaced by the week number of the year (Monday as the first day of the week) as a decimal number (01-53).  If the week containing January 1 has four or more days in the new year, then it is week 1; otherwise it is the last week of the previous year, and the next week is week 1 ("47")
    %v    is equivalent to ``%e-%b-%Y'' ("21-Nov-2016")
    %W    is replaced by the week number of the year (Monday as the first day of the week) as a decimal number (00-53) ("47")
    %w    is replaced by the weekday (Sunday as the first day of the week) as a decimal number (0-6) ("1")
    %X    is replaced by national representation of the time ("13:00:00")
    %x    is replaced by national representation of the date ("11/21/2016")
    %Y    is replaced by the year with century as a decimal number ("2016")
    %y    is replaced by the year without century as a decimal number (00-99) ("16")
    %Z    is replaced by the time zone name ("EST")
    %z    is replaced by the time zone offset from UTC; a leading plus sign stands for east of UTC, a minus sign for west of UTC, hours and minutes follow with two digits each and no delimiter between them ("-0500")
    %%    is replaced by `%'

Phew! That was a lot, wasn't it? And where exactly does this come into play? As the postgresql.conf says:

log_filename = 'postgresql-%a.log'      # log file name pattern,
                                        # can include strftime() escapes

So, you can use any of the escapes above to craft a filename that automagically gets updated according to the current timestamp. In my example above, I'm getting PostgreSQL to create a new logfile with the local weekday abbreviation. So my $PGDATA/pg_log directory will only ever contain:

postgresql-Sun.log postgresql-Mon.log postgresql-Tue.log
postgresql-Wed.log postgresql-Thu.log postgresql-Fri.log
postgresql-Sat.log

But wait! The key to all this is that PostgreSQL only evaluates the filename when it first opens/creates the logfile. So, if you start your cluster on Mon and do not restart it and don't have it configured to log rotate, you'll still be writing to postgresql-Mon.log forever. Fortunately, PostgreSQL has you covered here too:

log_rotation_age = 1d      # Automatic rotation of logfiles will
                           # happen after that time.  0 disables.
log_rotation_size = 10MB   # Automatic rotation of logfiles will
                           # happen after that much log output.
                           # 0 disables.

So, using my log_filename from above, I enable log_rotation_age (set to 1 day) and disable log_rotation_size and I automatically get a new log every day. If I wanted hourly logs, I could do something like:

log_filename = 'postgresql-%a-%H'
log_rotation_age = 1h

which would give me logs of postgresql-Mon-00, postgresql-Mon-01, etc. You should be able to see how combining these three postgresql.conf parameters and some crafty strftime() escapes gives you a ton of flexibility in your logging. So go forth and tweak those logs!

Federico Campoli: Brighton PostgreSQL Meetup last presentation and recording

From Planet PostgreSQL. Published on Nov 20, 2016.

This is the recording of the pg_chameleon presentation I gave at the Brighton PostgreSQL Meetup

The slides are available here
 http://www.slideshare.net/FedericoCampoli/pg-chameleon-mysql-to-postgresql-replica

Django Girls - Kraków #3

By Krzysztof Żuraw Personal Blog from Django community aggregator: Community blog posts. Published on Nov 20, 2016.

As I have said many times on this blog, I really like teaching others so I can improve myself. That's why when I heard about Django Girls Kraków I didn't hesitate and joined this event as a coach. This is a short recap of Django Girls Kraków #3.

Installation party

The main event was held on Saturday, but the day before there was a small installation party where, for two hours, the girls installed the necessary tools for the workshops, such as Python, Django, virtualenv and git. As for my team, there were 3 girls on it: Joanna, Olga and Magda. Beforehand, the Django Girls organizers came up with a wonderful idea: to get to know everyone in the team a little bit better, every person had to write a few sentences about themselves. Thanks to that, there were already conversation starters. The installation went well without any major problems (considering that the girls used Windows). After the installation party there was a pleasant surprise: dinner for the coaches to thank them for their work. Super cool!

Workshop day

The workshops started early, at 9 am. The girls started working on the Django Girls tutorial. I decided to do the same: if I hadn't done this, I wouldn't have known where problems could occur. Fortunately there is a Windows virtual machine image, so I could work in the same environment as my protégés. I have to say that the whole workshop lasted 10 hours and it was really demanding to stay focused the whole time. Because of that, there were breaks for lunch and a contest. What is more, a small session of lightning talks was held. I have to say the girls did an amazing job, so I didn't have much to do, but some problems were not trivial. I learnt not to do something called 'dive back driving', that is, doing everything myself rather than letting the girls do it and learn along the way. I also have to train my patience.

Conclusion

I learnt quite a few things during these workshops. I also want to thank the organizers for their hard work on this event. Moreover, I want to thank my protégés: girls, you did an amazing job, keep it up!

Special thanks to Kasia for being editor for this post. Thank you.

Cover photo taken by me during Django Girls Kraków #3.

Michael Paquier: Postgres 10 highlight - pg_basebackup improvements

From Planet PostgreSQL. Published on Nov 20, 2016.

Lately three improvements and behavior changes have been added to pg_basebackup that will be part of Postgres 10.

The first one relates to the state of an on-disk backup folder on failure. In short, in the event of a failure, pg_basebackup will remove any folders it has created while processing. This has been introduced by commit 9083353.

There are a couple of things to be aware of when using this feature though:

  • When pg_basebackup is stopped with a signal, no cleanup action is taken.
  • The format used, either plain or tar, does not matter, cleanup actions are taken in the target directory.
  • When including WAL segments in the backup directory, cleanup actions are taken as well.

In order to get the pre-10 behavior, one can specify the new option --no-clean to keep the created folders and contents around; this is mainly useful for test and development purposes. The new default removes the need to manually remove the target directory, which contains cluster data in a corrupted state anyway, though keeping it around may still be useful for debugging.

A second thing to be aware of is that support for --xlog-method=stream has been added to the tar mode, support added by commit 56c7d8d. So commands of the following type are no longer a problem in Postgres 10:

pg_basebackup -D $PGDATA --xlog-method=stream --format=t

As the WAL streaming happens in a different process, forked from the main one in charge of taking the base backup, this creates a second tar file named pg_wal.tar. Hence the base backup finishes with a tar file for each tablespace, as well as the contents to save into the folder pg_wal/ (pg_xlog/ for pre-10 clusters). That's the main point to be aware of: when restoring a backup taken with the tar format, the contents of the newly-created tar file of course need to be untarred, but more importantly copied into their correct place.
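As an illustration of that last point, restoring such a backup could look roughly like this (the paths are assumptions; base.tar and pg_wal.tar are the files pg_basebackup produces in tar mode):

# extract the main backup into the new data directory
tar xf /backups/base.tar -C /var/lib/postgresql/data
# the streamed WAL must end up inside pg_wal/ in that directory
tar xf /backups/pg_wal.tar -C /var/lib/postgresql/data/pg_wal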

A third thing that has been improved in pg_basebackup is the handling of a couple of folders that are now excluded from a base backup. This has been added by commit 6ad8ac6. Here is the list of the folders whose symlinks in the source server are changed into empty folders in the base backup:

  • pg_notify/ for NOTIFY/LISTEN contents.
  • pg_serial/ for serializable transaction commits.
  • pg_snapshots/ for external snapshot data.
  • pg_stat_tmp/, default path for temporary statistics.
  • pg_subtrans/ for sub-transaction status data.

Note that the pre-10 behavior actually fails to handle symlinks for those paths, so be aware of the limitation in this case. For example, pg_stat_tmp/ is the default setting for the temporary statistics directory, and instead of specifying an absolute path in postgresql.conf some users prefer keeping the default value and using a symlink to a different destination instead. Also, an important thing to notice is that pg_xlog/ (or pg_wal/ for version 10), as well as pg_replslot/, are already included as empty directories in a base backup if they are present as symlinks in the source server.

Since all those features have been committed, there are no other developments planned for Postgres 10 at the moment this post is written. There is of course no guarantee that nothing else will happen, but the current state of things gives a good image of what pg_basebackup will be able to do when the next major version of Postgres is released.

Ernst-Georg Schmid: tcn again

From Planet PostgreSQL. Published on Nov 18, 2016.

In the previous post, I suggested that contrib/tcn for example could be used to selectively update caches.

OK, here is my try:


By touching the cache only when the database signals a change to the base table and using the passed information to touch the affected rows only, this should be much more effective than other strategies like routinely invalidating the cache every n minutes or so.
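For readers who have not used contrib/tcn before, a minimal sketch of how it is wired up follows; the table and channel names are invented, and this is not necessarily the exact approach the author used:

CREATE EXTENSION tcn;

-- the table must have a primary key; the NOTIFY payload contains
-- the table name, the operation (I/U/D) and the key column values
CREATE TRIGGER base_table_tcn
    AFTER INSERT OR UPDATE OR DELETE ON base_table
    FOR EACH ROW
    EXECUTE PROCEDURE triggered_change_notification('cache_events');

-- the cache maintenance process simply listens for those payloads
LISTEN cache_events;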

Shaun M. Thomas: PG Phriday: Primal Planner Prep

From Planet PostgreSQL. Published on Nov 18, 2016.

The Postgres query planner is a house of cards built upon the ever-shifting sand of our data. It has the utterly impossible mission of converting our ridiculous and inane requests into a logical series of fetch, filter, sort, join, and other instructions. Then the resulting steps must be ruthlessly efficient or the execution phase could very well saturate every hardware resource available; Set Theory isn't very forgiving.

Forewarned is forearmed is very apt when applied to database query planners. Without proper statistics, they are reduced to assumptions that make adequate first approximations. But scale is the utter enemy of imprecision, as multiplicative effects quickly overwhelm reality. This allows seemingly simple report scripts to fall endlessly into a pit of smoldering system resources.

To perhaps translate that analogy a bit, let’s start with a very basic schema:

CREATE TABLE sensor_log (
  id            SERIAL PRIMARY KEY NOT NULL,
  location      VARCHAR NOT NULL,
  reading       BIGINT NOT NULL,
  reading_date  TIMESTAMP NOT NULL
);
 
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, s.id % 100,
       CURRENT_DATE - (s.id || 's')::INTERVAL
  FROM generate_series(1, 5000000) s(id);
 
CREATE INDEX idx_sensor_log_reading_date
    ON sensor_log (reading_date DESC);
 
ANALYZE sensor_log;

It’s just a basic sensor log table with a mere five million rows. There are only one thousand sensors spread across one hundred locations, and readings are captured once per second. There’s nothing ground-breaking here.

Given how unassuming this structure appears, imagine we have a regularly scheduled script that executes in multiple steps. Each step produces an intermediate UNLOGGED TABLE because there's a chance the script is reentrant and we taught our users that valuable trick.

But there’s a problem. Execution time, disk IO, and CPU usage are all much higher than we’d like. If we investigated the first two parts of the script, we might see something like this:

CREATE UNLOGGED TABLE recent_info AS
SELECT * FROM sensor_log
 WHERE reading_date >= CURRENT_DATE - INTERVAL '5 day';
 
EXPLAIN ANALYZE
SELECT * 
  FROM sensor_log
 WHERE id IN (
         SELECT id FROM recent_info
          WHERE reading BETWEEN 10 AND 50
       );
 
                              QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=6966.50..6977.01 ROWS=2500000 width=23)
              (actual TIME=225.389..2151.500 ROWS=177120 loops=1)
   ->  HashAggregate  (cost=6966.07..6966.09 ROWS=2 width=4)
                      (actual TIME=225.342..334.923 ROWS=177120 loops=1)
         GROUP KEY: r.id
         ->  Seq Scan ON recent_info r
                 (cost=0.00..6962.56 ROWS=1404 width=4)
                 (actual TIME=0.030..135.195 ROWS=177120 loops=1)
               FILTER: ((reading >= 10) AND (reading <= 50))
               ROWS Removed BY FILTER: 254880
   ->  INDEX Scan USING sensor_log_pkey ON sensor_log s
           (cost=0.43..5.45 ROWS=1 width=23)
           (actual TIME=0.009..0.010 ROWS=1 loops=177120)
         INDEX Cond: (id = r.id)
 
 Planning TIME: 0.290 ms
 Execution TIME: 2164.980 ms

The first step creates an unlogged table to store some small fraction of the source data. That in itself is extremely common. The problems start immediately when we try to use that intermediate table as the basis for further tables. If we examine the query that builds the second table, it’s readily apparent something is horribly wrong.

Don’t worry about learning to read EXPLAIN output. In all of this output, the first set of parentheses is the planner estimate for that particular step, while the second outlines what actually happened. If we just compare the expected row counts between the two, the estimate of matched rows in recent_info was off by two orders of magnitude. Due to this drastic underestimation, the planner figured it would be faster to loop through the 1404 rows, and find corresponding matches in sensor_log.

Well, there's a dramatic difference between looping over 1,000 matches and looping over nearly 200,000. Our example isn't terrible due to the scale, but a real system likely dwarfs ours by a factor of 100 at minimum. Errors scale, and not always linearly.

Postgres has a mechanism for fixing bad estimates like this called ANALYZE. There’s even a series of background workers with the sole duty of analyzing tables as data accumulates. If we inserted an arbitrary pause in the script, it’s possible one of those workers would eventually accumulate the missing statistics and transform the planner assumptions into hard facts. Or we could force the issue and analyze the table ourselves.

Let’s see how that changes the execution plan:

ANALYZE recent_info;
 
EXPLAIN ANALYZE
SELECT * 
  FROM sensor_log
 WHERE id IN (
         SELECT id FROM recent_info
          WHERE reading BETWEEN 10 AND 50
       );
 
                              QUERY PLAN
------------------------------------------------------------------------
 MERGE Semi JOIN  (cost=26533.87..42842.33 ROWS=176707 width=23)
                  (actual TIME=195.161..419.301 ROWS=177120 loops=1)
   MERGE Cond: (s.id = r.id)
   ->  INDEX Scan USING sensor_log_pkey ON sensor_log s
           (cost=0.43..141129.93 ROWS=5000000 width=23)
           (actual TIME=0.010..92.786 ROWS=431951 loops=1)
   ->  Materialize  (cost=26533.15..27416.68 ROWS=176707 width=4)
                    (actual TIME=195.130..237.153 ROWS=177120 loops=1)
         ->  Sort  (cost=26533.15..26974.92 ROWS=176707 width=4)
                   (actual TIME=195.125..218.283 ROWS=177120 loops=1)
               Sort KEY: r.id
               Sort Method: external sort  Disk: 2424kB
               ->  Seq Scan ON recent_info r
                       (cost=0.00..9232.00 ROWS=176707 width=4)
                       (actual TIME=0.018..93.871 ROWS=177120 loops=1)
                     FILTER: ((reading >= 10) AND (reading <= 50))
                     ROWS Removed BY FILTER: 254880
 
 Planning TIME: 0.305 ms
 Execution TIME: 426.620 ms

The degree of difference here really illustrates how well the planner adapts. Instead of a nested loop, it opted to fetch, filter, and sort the rows from recent_info and merge that into the primary key for sensor_log to find the intersection. All of our row estimates are much better, too. Why such a radical departure from the original plan?

To find that answer, we need to examine the Postgres catalog. This is where Postgres maintains everything it knows about tables, indexes, and other objects that reside within its confines. Of particular interest to us are the pg_stats view and pg_class table. This is a small part of what they contain after we analyzed recent_info:

SELECT reltuples, relpages
  FROM pg_class
 WHERE relname = 'recent_info';
 
 reltuples | relpages
-----------+----------
    432000 |     2752
 
SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'recent_info';
 
   attname    | n_distinct 
--------------+------------
 id           |         -1
 location     |       1000
 reading      |        100
 reading_date |         -1

If we ran these same two queries immediately after creating recent_info, the first would report zero tuples, and the second would show no matches at all. At that point, Postgres knew next to nothing about the table, and that is reflected in the row estimates and planner decisions.

After we analyzed recent_info, Postgres garnered a vast smorgasbord of pertinent statistics. Not only does it have an approximate row count, it also knows how many data pages the table occupies. This lets Postgres calculate the expense of hardware interaction: how much work is involved in fetching and processing these rows.
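
To make that concrete, the 9232.00 estimate on the Seq Scan node above can be reproduced (approximately, and assuming the default planner cost settings) directly from those two numbers:

-- Sketch: rebuilding the Seq Scan estimate from relpages and reltuples,
-- assuming seq_page_cost = 1.0, cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025
SELECT 2752 * 1.0                 -- pages read (relpages * seq_page_cost)
     + 432000 * 0.01              -- per-row CPU cost (reltuples * cpu_tuple_cost)
     + 432000 * 2 * 0.0025        -- two filter comparisons per row
       AS estimated_seq_scan_cost;

That works out to 9232, which matches what the planner reported for scanning and filtering recent_info.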

What Postgres gains from the contents of pg_stats is altogether different. When Postgres analyzes a table, it performs a heuristic statistical sampling of its physical contents. This includes facts such as the most frequent values for each column, the average size of column data, the number of distinct values per column, and so on. As we can see, Postgres did a great job of scanning the table contents, as it identified the exact number of distinct values for our location and reading data.

Negative values in n_distinct denote a ratio between the number of distinct values for that column and the total row count of the table. From that, we can derive that there’s basically one unique id and one unique reading_date per row. Neat! For us mundane humans, this shows the best columns to index. For Postgres, it means value histograms and frequencies can be combined to produce much more accurate row estimates. Better estimates almost always result in improved query execution times.
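
As a rough illustration (a hypothetical helper query, not from the original script), the negative entries can be converted into estimated distinct counts by multiplying their absolute value by reltuples:

-- Sketch: translate n_distinct into an approximate distinct count per column
SELECT s.attname,
       CASE WHEN s.n_distinct < 0
            THEN round(-s.n_distinct * c.reltuples)
            ELSE s.n_distinct
       END AS approx_distinct
  FROM pg_stats s
  JOIN pg_class c ON c.relname = s.tablename
 WHERE s.tablename = 'recent_info';

With the statistics above, id and reading_date come out near the full 432,000-row count, while location and reading stay at 1,000 and 100.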

Without table analysis, Postgres is effectively operating blindly. Adding an ANALYZE statement after initializing every temporary or unlogged table is a critical element to script performance. Yet it’s also a step that’s often omitted. Inexperienced users aren’t aware of Postgres internals, and may not even know the ANALYZE command exists.
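
In practice the fix is a one-liner. A sketch of the pattern (the filter condition here is just a placeholder) looks like this:

-- Sketch: analyze every intermediate table as soon as it's populated
CREATE UNLOGGED TABLE recent_info AS
SELECT * FROM sensor_log
 WHERE reading_date > CURRENT_DATE - INTERVAL '1 day';

ANALYZE recent_info;

-- Everything built on recent_info now starts with real statistics.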

There is, of course, a way to cheat and prevent our users from having to become Postgres experts. Postgres added event triggers in version 9.3. These triggers can activate any time DDL is detected, and that means we can detect new tables right when they’re created.

Watch this:

CREATE OR REPLACE FUNCTION analyze_new_table()
RETURNS event_trigger AS
$$
DECLARE
  tab_name TEXT;
BEGIN
  -- Only fire for statements that create and populate a table in one step.
  IF tg_tag IN ('CREATE TABLE AS', 'SELECT INTO') THEN
    FOR tab_name IN
      SELECT objid::REGCLASS::TEXT
        FROM pg_event_trigger_ddl_commands()
    LOOP
      EXECUTE 'ANALYZE ' || tab_name;
    END LOOP;
  END IF;
END;
$$ LANGUAGE plpgsql;
 
CREATE EVENT TRIGGER t_analyze_new
    ON ddl_command_end 
       EXECUTE PROCEDURE analyze_new_table();

Now any time a table is created with CREATE TABLE AS or SELECT INTO, Postgres will instantly analyze the contents. Depending on the size of the table and the granularity of the default settings, this may be a relatively demanding operation. However, considering how greatly statistics determine performance, such a trigger may actually be a requirement in some settings.
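
To sanity-check the trigger, here’s a hypothetical test (the trigger_test table is made up for illustration): create a table with CREATE TABLE AS and immediately peek at pg_class.

-- Hypothetical check: statistics should exist the moment the table is created
CREATE TABLE trigger_test AS
SELECT * FROM sensor_log LIMIT 1000;

SELECT reltuples, relpages
  FROM pg_class
 WHERE relname = 'trigger_test';

Without the event trigger, reltuples would read zero until an ANALYZE ran; with it, the count should already land around 1,000.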

The alternative is tolerating unfamiliarity with Postgres internals or an accidentally forgotten ANALYZE, the consequences of which could spell the difference between a report script finishing in a matter of minutes and one grinding on for several grueling hours.

I know which scenario I’d prefer!

Bruce Momjian: A Dozen Conferences in Four Months

From Planet PostgreSQL. Published on Nov 18, 2016.

While considering future conference attendance, I realized there are 12 dedicated Postgres conferences or PGDays in the next four months, and nicely spread worldwide:

The continent count is:

  • Asia (3)
  • Europe (5)
  • Latin America (1)
  • North America (3)

Unfortunately, they are not spread chronologically — 8 of the 12 are in March of 2017. You can see most of these listed on the Postgres events web page.

Continue Reading »

Douglas Hunley: Upgrading PostgreSQL 5x faster

From Planet PostgreSQL. Published on Nov 18, 2016.

Upgrading your PostgreSQL database from one major version (e.g. 9.4.x) to another major version (e.g. 9.5.x) used to be a painful and exceedingly slow process. You essentially had two options:

  • dump / reload the data
  • use one of the complex logical replication tools

Thankfully, the PostgreSQL team introduced pg_upgrade back in version 9.0. Because the way PostgreSQL stores data internally in its datafiles rarely changes between releases, pg_upgrade is able to re-use the existing datafiles (while manipulating some catalog entries) to "short circuit" the upgrade process. While this isn't (yet) a true "in place upgrade" as done by some other databases, it's pretty close. And it's stupid fast. In my testing on my overworked MacBook Pro, it took 1/5 as long to upgrade as a traditional dump and reload. So, let's look at this process, shall we?

First, we assume that we have both PostgreSQL 9.5 and 9.6 installed and both have initialized (empty) clusters (see here if you need to do this). We're going to use pgbench to create some data in our PostgreSQL 9.5 instance:

doug@Douglass-MacBook-Pro ~/foo » pg 9.5
doug@Douglass-MacBook-Pro ~/foo » createdb bench1; createdb bench2; createdb bench3
doug@Douglass-MacBook-Pro ~/foo » pgbench -i -s 15 bench1 ; pgbench -i -s 70 bench2 ; pgbench -i -s 600 bench3
doug@Douglass-MacBook-Pro ~/foo » pgbench -c 4 -j 2 -T 600 bench1 ; pgbench -c 4 -j 2 -T 600 bench2 ; pgbench -c 4 -j 2 -T 600 bench3

Now that we've got data in our cluster, we can do the dump. If this were a production instance, this is where you'd have to stop your application(s).

doug@Douglass-MacBook-Pro ~/foo » time pg_dumpall > data.sql
pg_dumpall > data.sql  20.57s user 30.63s system 4% cpu 18:43.70 total

We've now dumped out all our data, and spent 18 minutes with the application(s) down. Let's restore our data to the PostgreSQL 9.6 cluster now:

doug@Douglass-MacBook-Pro ~/foo » pg 9.6
doug@Douglass-MacBook-Pro ~/foo » time psql -f data.sql
psql -f data.sql  14.53s user 18.30s system 1% cpu 37:48.49 total

After 37 minutes, our data is back and we can start our applications back up. An outage of approximately 56.5 minutes.

Now, let's blow away our PostgreSQL 9.6 cluster and use pg_upgrade to complete the same task. You would do this with the application(s) down as well!

doug@Douglass-MacBook-Pro ~/foo » rm -fr $PGDATA/*
doug@Douglass-MacBook-Pro ~/foo » initdb $PGDATA
doug@Douglass-MacBook-Pro ~/foo » export OPGDATA=$PGDATA/../9.5
doug@Douglass-MacBook-Pro ~/foo » time pg_upgrade -d $OPGDATA -D $PGDATA -b /usr/local/opt/postgresql-9.5/bin -B /usr/local/opt/postgresql-9.6/bin
pg_upgrade -d $OPGDATA -D $PGDATA -b /usr/local/opt/postgresql-9.5/bin -B   0.40s user 12.12s system 1% cpu 10:26.64 total

And we're done in 10.5 minutes. It took 1/5 the outage of the dump / load method. And that's on my puny dataset with my overworked laptop! Pretty impressive, no?

For the curious, the pg_upgrade output that I omitted above for readability's sake is:

```
Performing Consistency Checks

Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* system OID user data types                ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for roles starting with 'pg_'                      ok
Creating dump of global objects                             ok
Creating dump of database schemas                           ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the new cluster before continuing.

Performing Upgrade

Analyzing all rows in the new cluster                       ok
Freezing all rows on the new cluster                        ok
Deleting files from new pg_clog                             ok
Copying old pg_clog to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster               ok
Copying user relation files                                 ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok

Upgrade Complete

Optimizer statistics are not transferred by pg_upgrade so, once you start the new server, consider running: ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files: ./delete_old_cluster.sh
```
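
As the output says, optimizer statistics are not carried over, so once the new server is started the generated helper scripts finish the job. A minimal sketch, assuming the scripts were written to the current working directory:

./analyze_new_cluster.sh    # rebuild optimizer statistics on the upgraded cluster
./delete_old_cluster.sh     # optional: remove the old cluster's files once you're happy with the new one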

A Primer to Django Forms

By David Fozo's Blog from Django community aggregator: Community blog posts. Published on Nov 16, 2016.

 

If you want some interactivity with your users, it all starts with forms. Luckily, Django provides a straightforward, out-of-the-box solution for us.

 

For this tutorial we are going to build a basic website that surveys a person's age, eye color, and name, and whether they want to subscribe or not.

 

If you haven't followed along, you can initiate the tutorial repository if you download it from my Github account. Choose branch exercise3. Further instructions here.

 

So first of all, we have an idea: we should implement a new feature. For that, we create a new “feature branch” so we can experiment freely and only merge it back when the feature is properly implemented.

 

This new feature will be a form, so let's do this:

git checkout -b form

git branch

 

You can see that we have two branches now:

* form

master

 

As good TDD practice, start by writing a test first. A new feature deserves its own test class. I also know that I will need a new function from main.views; you will see that later.

***main/tests.py***

...

from main.views import home, form

class FormTest(TestCase):

 

    def test_form_renders_on_page_properly(self):

        request = HttpRequest()

        response = form(request)

        for i in ['form','input','human','color','age','name','email']:

            self.assertIn(i,response.content.decode())

 

Run the test. It should fail.

python3 manage.py test

 

Output:

Ran 5 tests in 0.321s

 

FAILED (failures=1)

Make sure it is a failure, not an error (marked by E). An error means an exception was raised while running the test (a bug somewhere), while a failure means the test ran and its assertion didn't pass.

 

By convention, form classes live in a forms.py file inside your application; Django doesn't require the name, but it keeps the code organized and easy to find.

touch main/forms.py

 

***main/forms.py***

from django import forms

 

class SurveyForm(forms.Form):

    human = forms.BooleanField(label="Are you human?",)

    age = forms.IntegerField(label="How old are you?")

    color = forms.ChoiceField(

        label="What color is your eye?",

        choices=(('bl','Blue'),('br','Brown'),('bk','Black'),('gr','Green')),)

    name = forms.CharField(label="What's your name?",)

    subscription = forms.BooleanField(label="Do you want to subscribe?",)

    email = forms.EmailField(label="Your email:",)

 

As you can see the forms library gives us various types of fields. These settings (“label” and occasional “choices”) are the absolute minimum to initiate a form field. If you want to see more options, check out the documentation here.

 

So we have a forms.py file that describes the form, but that form also needs to be displayed somewhere. For simplicity, I suggest we make a new HTML template.

touch main/templates/form.html

 

You can see {{ form }} is a template variable. Don't mind the “as_p” for now; it just means each field will be rendered wrapped in a <p> tag. Above it you can see {% csrf_token %}, Django's built-in defense against Cross-Site Request Forgery, a malicious technique for hijacking your users' sessions. With Django's default middleware a POST without this token is simply rejected, so don't leave it out. A submit input and all of this are wrapped between two form tags. This is the bare minimum to deploy a form on your website.

 

*** main/templates/form.html***

<!DOCTYPE HTML>

<head>

  <title>Django Forms</title>

</head>

<body>

  <h1>Survey</h1>

  <form method="POST">

    {% csrf_token %}

    {{ form.as_p }}

    <input type="submit" value="Submit" />

  </form>

</body>

</html>

 

As usual, views.py will knit together the backend logic with the templates. We will declare our form here and include it in our render function.

 

*** main/views.py ***

from django.shortcuts import render

from main.models import Article

from main.forms import SurveyForm

 

# Create your views here.

def home(request):

    article = Article.objects.last()

    return render(request,'index.html',{'article':article},)

 

def form(request):

    form = SurveyForm()

    return render(request,'form.html',{'form': form},)

 

To make things easy we will create a new URL address for the form.

*** MyTutorial/urls.py ***

 

from django.conf.urls import url

from django.contrib import admin

from main.views import home, form

 

urlpatterns = [

    url(r'^admin/', admin.site.urls),

    url(r'^form/',form),

    url(r'^$', home),

]

 

Try the test now:

python3 manage.py test

 

It should pass:

Creating test database for alias 'default'...

.....

----------------------------

Ran 5 tests in 0.032s

 

OK

 

Destroying test database for alias 'default'...

 

Even though we tested it, we also have to have a look on what's going on with the real site. Let's spin up the development server and have a look.

python3 manage.py runserver

 

Check http://localhost:8000/form/ in your browser. It should look like this:

 

 

Go back to your terminal and Ctrl+C to stop the development server.

 

All tests pass and we've reached our milestone. The branch is stable, so we can make a commit.

git status

 

Output:

...

modified: MyTutorial/urls.py

modified: main/tests.py

modified: main/views.py

...

main/forms.py

main/templates/form.html

...

git diff

git add .

git commit -m "Form is initiated and visible"

 

Okay. If you fill in the form and click the submit button, it seems like nothing happens. The data does get posted to the server (our view just ignores it for now), but for the user that's far from clear. Therefore we'll implement a redirect to give them some feedback.

 

As usual, we write a test first; this is the second test in the FormTest class. The redirection will go like this: if you check the subscribe box, you are redirected to a “You are subscribed.” page; otherwise you land on a “Thank you!” page:

***main/tests.py***

...

def test_form_redirection(self):

 

    def subs_n_test(subs,red_url):

        post_dict = {

            'human':True,

            'color':'bl',

            'age':29,

            'name':'David Fozo',

            'subscription':subs,

            'email':'example@gmail.com'}

        response = self.client.post('/form/',post_dict)

        self.assertRedirects(response,red_url)

 

    subs_n_test(False,'/thanks/')

    subs_n_test(True, '/subscribed/')

 

Run the test.

python3 manage.py test

 

As expected:

F.....

=============================

FAIL: test_form_redirection (main.tests.FormTest)

 

Redirecting and processing the incoming data will be handled in views.py. Import HttpResponseRedirect and do this major rewrite of the file.

***main/views.py***

from django.http import HttpResponseRedirect

...

def form(request):

    form = SurveyForm()

 

    def redirection(subscription):

        if subscription:

            return HttpResponseRedirect('/subscribed/')

        else:

            return HttpResponseRedirect('/thanks/')

 

    if request.method == 'POST':

        form = SurveyForm(request.POST)

        if form.is_valid():

            return redirection(form.cleaned_data['subscription'])

    else:

        form = SurveyForm()

 

    return render(request,'form.html',{'form': form},)

 

def subscribed(request):

    return render(request,'subscribed.html')

 

def thanks(request):

    return render(request,'thanks.html')

 

So what is happening here? When someone sends a POST request (a filled-in form) to our /form/ URL, we feed that data into a form object (SurveyForm(request.POST)). That form object performs the validation, and its is_valid() method returns True or False. If it's True, we redirect based on the processed form data (available in form.cleaned_data).

 

Let's create the two other destination websites with some command line magic:

tem=main/templates

touch $tem/thanks.html $tem/subscribed.html

 

***main/templates/thanks.html***

<!doctype html>

<html>

<head>

  <meta http-equiv="Content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1">

</head>

<body>

  <h1>Thank you!</h1>

</body>

</html>

 

***main/templates/subscribed.html***

<!doctype html>

<html>

<head>

  <meta http-equiv="Content-type" content="text/html; charset=utf-8">

  <meta name="viewport" content="width=device-width, initial-scale=1">

</head>

<body>

  <h1>You are subscribed!</h1>

</body>

</html>

 

Give some URL address to these functions.

*** MyTutorial/urls.py***

 

from django.conf.urls import url

from django.contrib import admin

from main.views import home, form, subscribed, thanks # Changed!

 

urlpatterns = [

    url(r'^admin/', admin.site.urls),

    url(r'^form/',form),

    url(r'^thanks/',thanks),

    url(r'^subscribed/',subscribed),

    url(r'^$', home),

]

Try it out.

python3 manage.py runserver

 

Ctrl-C to finish. It seems okay, except the form refuses to validate if you don't want to subscribe (subscription is still a required field). That is too aggressive for our users, so let's adjust main/forms.py.

***main/forms.py***

...

    subscription = forms.BooleanField(label="Do you want to subscribe?",required=False)

Run the test:

python3 manage.py test

 

Okay, the test passes:

Creating test database for alias 'default'...

......

----------------------------------------------------------------------

Ran 6 tests in 0.042s

 

OK

Destroying test database for alias 'default'...

 

All tests pass, so we're ready to commit.

git status

...

modified: MyTutorial/urls.py

modified: main/forms.py

modified: main/tests.py

modified: main/views.py

 

Untracked files:

(use "git add <file>..." to include in what will be committed)

 

main/templates/subscribed.html

main/templates/thanks.html

git add .

git commit -m "Form redirects after user submission"

 

Okay, let's say you want to provide with some custom validation. For example, if someone says he is not human, he cannot have red eyes (silly example, right?). For this third test, I reorganized some of the code, this reorganization is called refactoring. Most important aspect is to not repeat yourself. If you have two pieces of code that seemed too similar figure out some way to make it into one variable or function. In this case I had request=HttpRequest() line and post_dict variable two times. Therefore I moved them up into the SetUp function.

 

***main/tests.py***

from main.forms import SurveyForm

...

class FormTest(TestCase):

 

    def setUp(self):

        self.request = HttpRequest()

        self.post_dict = {'human':True,'color':'bl','age':29,'name':'David Fozo','subscription':True,'email':'example@gmail.com'}

 

    def test_form_renders_on_page_properly(self):

        response = form(self.request)

        for i in ['form','input','human','color','age','name','email']:

            self.assertIn(i,response.content.decode())

 

    def test_form_redirection(self):

 

        def subs_n_test(subs,red_url):

            self.post_dict['subscription'] = subs

            response = self.client.post('/form/',self.post_dict)

            self.assertRedirects(response,red_url)

 

        subs_n_test(False,'/thanks/')

        subs_n_test(True, '/subscribed/')

 

    def test_not_human_raises_error(self):

        self.post_dict['human'] = False

        self.post_dict['color'] = 'rd'

        form = SurveyForm(self.post_dict)

        self.assertEqual(form.is_valid(),False)

 

Run the test.

python3 manage.py test

 

Output:

.......

---------------------------------

Ran 7 tests in 0.043s

 

Wow! It passes without us doing anything. That shouldn't happen. The reason is that SurveyForm doesn't allow non-humans to submit yet (human is still a required field), and there is no red eye option, so the form automatically rejects our post_dict dictionary. Let's change that.

***main/forms.py***

...

   human = forms.BooleanField(label="Are you human?", required=False)

   color = forms.ChoiceField(

   label="What color is your eye?",

   choices=(('bl','Blue'),('br','Brown'),('bk','Black'),('gr','Green'),('rd','Red')),)

...

Let's run a test now.

Finally! We actually have to work in order to have a failing test.

..F....

==============================

FAIL: test_not_human_raises_error (main.tests.FormTest)

 

Okay, so form validation logic is in the main/forms.py file and in our case it will be done by the clean function.

 

***main/forms.py***

class SurveyForm(forms.Form):

...

   def clean(self):

       cleaned_data = super(SurveyForm,self).clean()

       if cleaned_data['human'] is False and cleaned_data['color'] == 'rd':

           raise forms.ValidationError("Humans cannot have red eyes!")

       return cleaned_data

 

So what is happening here? super(SurveyForm,self).clean() runs the form's usual validation routine once, BEFORE your own custom validation, and returns a dictionary of the processed data. All of that happens without you writing a single line of code; now you want to insert your own validation logic. So you take that cleaned data, raise an error if it doesn't meet your criteria, and then return the cleaned data.

 

Run the test:

python3 manage.py test

 

Output:

.......

---------------------

Ran 7 tests in 0.044s

 

OK

 

Okay, so test passes, time for a commit.

git status

...

modified: main/forms.py

modified: main/tests.py

...

git add .

git commit -m "Form validates eye color"

 

Merge it back to master:

git branch

* form

master

 

First we need to squash the commits together.

git rebase -i HEAD~3

 

An interactive menu will pop up; change the last two entries from “pick” to “squash”:

pick 43aa283 Form is initiated and visible

squash 7c88872 Form redirects after user submission

squash e43e886 Form validates eye color

 

Ctrl+O to save and Ctrl+X to exit if you are using nano. Then, edit the second file so it starts this way:

Form visible, redirects and validates eye color.

 

# Please enter the commit message for your changes. Lines starting

# with '#' will be ignored, and an empty message aborts the commit.

...

Now, it's time to merge it back to master.

git checkout master

git merge form

git log

 

You should see the newest commit - “Form visible, redirects and validates eye color.” - on top. Now you can delete the 'form' branch.

git branch -d form

 

Congratulations! You learned the basics of dealing with Django forms. There is a lot more to them. If you want to see more of this, let me know in the comment section.

 

Until next time.

 

RapidCon 2016: RapidPro Developer's Recap

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Nov 16, 2016.

Developer Erin Mullaney was just in Amsterdam for RapidCon, a UNICEF-hosted event for developers using RapidPro, an SMS tool built on Django. The teams that have worked on RapidPro and its predecessor RapidSMS have gotten to know each other virtually over the years. This marks the second time they’ve all come from across the globe to share learnings on RapidPro and to discuss its future.

RapidPro has the potential to transform how field officers build surveys, collect data, and notify populations. It allows users with no technical background to quickly build surveys and message workflows. With over 100% cell phone saturation in many developing regions, SMS presents a cheap, fast means of reaching many quickly.

Erin worked closely with UNICEF Uganda in the development of a data analytics and reporting tool called TracPro for RapidPro. The organizers invited her to speak about the tool with other RapidPro users.

How was the conference?

Erin: The conference was amazing and I was ecstatic to go. Meeting the folks who work at UNICEF for the first time was exciting because we normally only speak via audio over Skype. It was nice to see them in person. We had an evening event, so it was fun to get to know them better in a social atmosphere. It was also a great opportunity to get together with other technical people who are very familiar with RapidPro and to think about ways we could increase usage of this very powerful product.

What was your talk about?

Erin: The title of my talk was “TracPro: How it Works and What it Does”. TracPro is an open source dashboard for use with RapidPro. You can use it for activities like real-time monitoring of education surveys. Nyaruka originally built it for UNICEF and it’s now being maintained by Caktus.

I was one of two developers who worked on TracPro at Caktus. We worked to flesh out the data visualizations including bar charts, line charts over date ranges and maps. We also improved celery tasks and added other features like syncing more detailed contact data from RapidPro.

What do you hope your listeners came away with?

Erin: I delved into the code for how we synced data locally via Celery and the RapidPro API, and how we did it in a way that is not server-intensive. I also had examples of how to build the visualizations. Both of those features were hopefully helpful for people thinking of building their own dashboards. Building custom dashboards in a short amount of time is really easy and fun. For example, it only took me a ShipIt Day to build a custom RapidPro dashboard for PyCon that called the RapidPro API.

What did you learn at RapidCon?

Erin: People discussed the tools they were building. UNICEF talked about a new project, eTools, being used for monitoring. That sounds like an interesting project that will grow.

RapidPro has had exponential usage and growth and Nyaruka and UNICEF are working really hard to manage that. It was interesting to learn about the solutions Nyaruka is looking at to deal with incredibly large data sets from places with a ton of contacts. They’ll be erasing unnecessary data and looking at other ways to minimize these giant databases.

UNICEF is pretty happy with how RapidPro is working now and don’t expect to add too many new features to it. They’re looking ahead to managing dashboard tools like TracPro. So their focus is really on these external dashboards and building them out. The original RapidPro was really not for dashboards.

What was the best part of RapidCon for you?

Erin: It was pretty cool to be in a room and selected for this. I was one of only two women. Having them say “You have this knowledge that other developers don’t have” was rewarding. I felt like I had a value-add to this conference based on the past year and a half working on RapidPro-related projects.

Will you be sharing more of your RapidPro knowledge in the future?

Erin: So far, it seems we’ve been the only ones giving talks about RapidPro. I gave a RapidPro talk at PyData Carolinas this year with Rebecca Muraya, [Reach More People: SMS Data Collection with RapidPro](https://caktusgroup.com/talks/reach-more-people-sms-data-collection-rapidpro/), and during a PyCon 2016 sponsor workshop. I’ve been encouraged to give this talk at more conferences to spread the word about RapidPro further. I plan to submit it to a few 2017 conferences for sure!

Thank you Erin for sharing your experience with us!

To view another RapidPro talk Erin gave during PyData 2016 Carolinas, view the video here.

How does the Django Cross-site request forgery protection work?

By Practicing web development from Django community aggregator: Community blog posts. Published on Nov 15, 2016.

Dan Poirier wrote an article on the Caktus Group blog about common web site security vulnerabilities. In it he talked about the CSRF protection in Django. Although he is right about a CSRF token having to be part of the POST request, this is not the entire story.

It is not my intention to claim that mister Poirier does not know how the CSRF protection in Django works. I only want to present a more complete version.

First things first, for those of you who have not read Dan Poirier's article, here’s a short summary of the CSRF-related parts.

Cross-site request forgery (CSRF or XSRF) is a type of attack where a malicious site is trying to make your browser send requests to another site in an attempt to leverage the permissions of the user—you. (For more information and examples, check the original article or the OWASP page on CSRF.)

Besides making sure that GET requests do not change data, the article talks about the CSRF protection provided by Django. Specifically it states the following (emphasis mine):

Django’s protection is to always include a user-specific, unguessable string as part of such requests, and reject any such request that doesn’t include it. This string is called the CSRF token. Any form on a Django site that does a POST etc has to include it as one of the submitted parameters. Since the malicious site doesn’t know the token, it cannot generate a malicious POST request that the Django site will pay any attention to.

This is where the author is not incorrect (the POST request indeed has to include the CSRF token), but this is only one half of the mechanism. The other half is a cookie that is set by the original site where the user is logged in. Only when the server receives both values from the browser (and they match) will the POST request be valid. This is described as the “Double Submit Cookie” defence in the Cross-Site Request Forgery (CSRF) Prevention Cheat Sheet by OWASP.

Example

Here’s an example with a very simple form (source code on GitHub). Let’s first request the form:

$ curl -i http://localhost:8000/post_to_me/

The response will look something like this:

HTTP/1.0 200 OK
Date: Tue, 15 Nov 2016 20:25:58 GMT
Server: WSGIServer/0.2 CPython/3.4.3
Content-Type: text/html; charset=utf-8
Vary: Cookie
X-Frame-Options: SAMEORIGIN
Set-Cookie:  csrftoken=wVFdNQ1Hz487w7yk2mVjre2qlsclXi99w2jEcGyvXorojDLd7jH09NGhmbavG3tx; expires=Tue, 14-Nov-2017 20:55:58 GMT; Max-Age=31449600; Path=/

<!doctype html>
<html>
  <body>
    <form action="/post_to_me/" method="post">
      <input type='hidden' name='csrfmiddlewaretoken' value='JBWuPGvKU54xW9YIwEIknst1azSkBmg3JIAVew2yipnOJFbBBBu1517SbiQuk7Ar' />
      <tr><th><label for="id_name">Name:</label></th><td><input id="id_name" name="name" type="text" required /></td></tr>
      <input type="submit" value="Post" />
    </form>
  </body>
</html>

Note the csrftoken value in the Set-Cookie header. Also note the csrfmiddlewaretoken value in the form in the body of the response. We’ll use these values in our examples where we send POST requests.

First a demonstration that we can successfully post a value using both the cookie and the value in the form (in the --data parameter):

$ curl -s -D - -o /dev/null \
-H 'Cookie: csrftoken=wVFdNQ1Hz487w7yk2mVjre2qlsclXi99w2jEcGyvXorojDLd7jH09NGhmbavG3tx' \
--data 'csrfmiddlewaretoken=JBWuPGvKU54xW9YIwEIknst1azSkBmg3JIAVew2yipnOJFbBBBu1517SbiQuk7Ar&name=mark' \
http://localhost:8000/post_to_me/

The response is a nice 200 OK:

HTTP/1.0 200 OK
Date: Tue, 15 Nov 2016 20:30:35 GMT
Server: WSGIServer/0.2 CPython/3.4.3
X-Frame-Options: SAMEORIGIN
Content-Type: text/html; charset=utf-8

If we do not send the cookie along, we expect that the POST request will fail:

$ curl -s -D - -o /dev/null \
--data 'csrfmiddlewaretoken=JBWuPGvKU54xW9YIwEIknst1azSkBmg3JIAVew2yipnOJFbBBBu1517SbiQuk7Ar&name=mark' \
http://localhost:8000/post_to_me/

And indeed the server responds with a 403 Forbidden status:

HTTP/1.0 403 Forbidden
Date: Tue, 15 Nov 2016 20:32:47 GMT
Server: WSGIServer/0.2 CPython/3.4.3
X-Frame-Options: SAMEORIGIN
Content-Type: text/html

We could try to include the cookie but leave out the value in the form to check if the cookie alone is enough:

curl -s -D - -o /dev/null \
-H 'Cookie: csrftoken=wVFdNQ1Hz487w7yk2mVjre2qlsclXi99w2jEcGyvXorojDLd7jH09NGhmbavG3tx' \
--data 'name=mark' \
http://localhost:8000/post_to_me/

However, this has the same result:

HTTP/1.0 403 Forbidden
Date: Tue, 15 Nov 2016 20:33:35 GMT
Server: WSGIServer/0.2 CPython/3.4.3
X-Frame-Options: SAMEORIGIN
Content-Type: text/html

As you can see, Django requires both components to be present. The actual value of the token is less relevant. Sure, it is “unguessable”, but that is (in my humble opinion) not the most relevant part. The CSRF token is also not stored on the server. Django does not care much about the actual value: as long as the value in the cookie matches the one in the POST data, the token is considered valid.

To demonstrate that I can make up my own values if I want to (as long as they are 32 or 64 characters in length):

$ curl -s -D - -o /dev/null \
-H 'Cookie: csrftoken= markmarkmarkmarkmarkmarkmarkmark' \
--data 'csrfmiddlewaretoken=markmarkmarkmarkmarkmarkmarkmark&name=mark' \
http://localhost:8000/post_to_me/

This succeeds:

HTTP/1.0 200 OK
Date: Tue, 15 Nov 2016 20:35:55 GMT
Server: WSGIServer/0.2 CPython/3.4.3
X-Frame-Options: SAMEORIGIN
Content-Type: text/html; charset=utf-8

This example uses an older version of the CSRF token. As of Django 1.10 the CSRF form token value is salted and changes with every request. (For details, see this commit and the related issue.) This does not change the mechanism behind the defence though.

Why does this defence work?

Back to the attack. Why does it matter that the POST request has to have matching tokens both via the cookie and the POST data?

Cookies are set for a specific domain. Your browser protects this in two ways:

  • You cannot set a cookie for a different domain.
  • Cookies for one domain are not sent to another domain.

This defence against CSRF works because although the evil site can force the browser to make a request to the site it wants to abuse, the attacker can only manipulate the request and its data. The attacker cannot set, modify or even read a cookie for a different domain than its own. As a result the attacker cannot determine or even guess which CSRF token should be in the request and thus the request will fail.

Note that this is the protection chosen by Django. Other forms of defence are possible. See the aforementioned Cross-Site Request Forgery (CSRF) Prevention Cheat Sheet for more information.

As already stated at the top of the article, I do not want to imply that mister Poirier does not know how the CSRF protection in Django works; perhaps he decided to leave out some of these details to make his article more succinct. Either way, in my opinion his article only told half of the story with regards to CSRF protection. So I decided to talk about the other half.

How to Implement CRUD Using Ajax and Json

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Nov 15, 2016.

Using Ajax to make asynchronous requests that manipulate Django models is a very common use case. It can be used to provide inline editing in a table, or to create a new model instance without bouncing back and forth in the website. It also brings some challenges, such as keeping the state of the objects consistent.

In case you are not familiar with the term CRUD, it stands for Create, Read, Update, Delete.

Those are the basic operations we perform on the application's entities. For the most part, the Django admin is all about CRUD.




Basic Configuration

For this tutorial we will be using jQuery to implement the Ajax requests. Feel free to use any other JavaScript framework (or to implement it using bare JavaScript). The concepts should remain the same.

Grab a copy of jQuery, either download it or refer to one of the many CDN options.

jquery.com/download/

I usually like to have a local copy, because sometimes I have to work offline. Place the jQuery in the bottom of your base template:

base.html

{% load static %}<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Bookstore - Simple is Better Than Complex</title>
    <link href="{% static 'css/bootstrap.min.css' %}" rel="stylesheet">
    <!--[if lt IE 9]>
      <script src="https://oss.maxcdn.com/html5shiv/3.7.3/html5shiv.min.js"></script>
      <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
  </head>
  <body>
    {% include 'includes/header.html' %}
    <div class="container">
      {% block content %}
      {% endblock %}
    </div>
    <script src="{% static 'js/jquery-3.1.1.min.js' %}"></script>  <!-- JQUERY HERE -->
    <script src="{% static 'js/bootstrap.min.js' %}"></script>
    {% block javascript %}
    {% endblock %}
  </body>
</html>

I will also be using Bootstrap. It is not required, but it provides a good base CSS and some useful HTML components, such as a modal and pretty tables.


Working Example

I will be working in an app called books. For the CRUD operations consider the following model:

models.py

class Book(models.Model):
    HARDCOVER = 1
    PAPERBACK = 2
    EBOOK = 3
    BOOK_TYPES = (
        (HARDCOVER, 'Hardcover'),
        (PAPERBACK, 'Paperback'),
        (EBOOK, 'E-book'),
    )
    title = models.CharField(max_length=50)
    publication_date = models.DateField(null=True)
    author = models.CharField(max_length=30, blank=True)
    price = models.DecimalField(max_digits=5, decimal_places=2)
    pages = models.IntegerField(blank=True, null=True)
    book_type = models.PositiveSmallIntegerField(choices=BOOK_TYPES)

Listing Books

Let’s get started by listing all the book objects.

We need a route in the urlconf:

urls.py:

from django.conf.urls import url, include
from mysite.books import views

urlpatterns = [
    url(r'^books/$', views.book_list, name='book_list'),
]

A simple view to list all the books:

views.py

from django.shortcuts import render
from .models import Book

def book_list(request):
    books = Book.objects.all()
    return render(request, 'books/book_list.html', {'books': books})

book_list.html

{% extends 'base.html' %}

{% block content %}
  <h1 class="page-header">Books</h1>
  <table class="table" id="book-table">
    <thead>
      <tr>
        <th>#</th>
        <th>Title</th>
        <th>Author</th>
        <th>Type</th>
        <th>Publication date</th>
        <th>Pages</th>
        <th>Price</th>
      </tr>
    </thead>
    <tbody>
      {% for book in books %}
        <tr>
          <td>{{ book.id }}</td>
          <td>{{ book.title }}</td>
          <td>{{ book.author }}</td>
          <td>{{ book.get_book_type_display }}</td>
          <td>{{ book.publication_date }}</td>
          <td>{{ book.pages }}</td>
          <td>{{ book.price }}</td>
        </tr>
      {% empty %}
        <tr>
          <td colspan="7" class="text-center bg-warning">No book</td>
        </tr>
      {% endfor %}
    </tbody>
  </table>
{% endblock %}

So far nothing special. Our template should look like this:

Books List


Create Book

First thing, let’s create a model form. Let Django do its work.

forms.py

from django import forms
from .models import Book

class BookForm(forms.ModelForm):
    class Meta:
        model = Book
        fields = ('title', 'publication_date', 'author', 'price', 'pages', 'book_type', )

We need now to prepare the template to handle the creation operation. We will be working with partial templates to render only the parts that we actually need.

The strategy I like to use is to place a generic bootstrap modal, and use it for all the operations.

book_list.html

{% extends 'base.html' %}

{% block content %}
  <h1 class="page-header">Books</h1>

  <!-- BUTTON TO TRIGGER THE ACTION -->
  <p>
    <button type="button" class="btn btn-primary js-create-book">
      <span class="glyphicon glyphicon-plus"></span>
      New book
    </button>
  </p>

  <table class="table" id="book-table">
    <!-- TABLE CONTENT SUPPRESSED FOR BREVITY'S SAKE -->
  </table>

  <!-- THE MODAL WE WILL BE USING -->
  <div class="modal fade" id="modal-book">
    <div class="modal-dialog">
      <div class="modal-content">
      </div>
    </div>
  </div>
{% endblock %}

Note that I already added a button that will be used to start the creation process. I added the class js-create-book to hook the click event. I usually add a class starting with js- to every element that interacts with JavaScript code; it makes the code easier to debug later on. It's not enforced, just a convention, but it helps code quality.

Add a new route:

urls.py:

from django.conf.urls import url, include
from mysite.books import views

urlpatterns = [
    url(r'^books/$', views.book_list, name='book_list'),
    url(r'^books/create/$', views.book_create, name='book_create'),
]

Let’s implement the book_create view:

views.py

from django.http import JsonResponse
from django.template.loader import render_to_string
from .forms import BookForm

def book_create(request):
    form = BookForm()
    context = {'form': form}
    html_form = render_to_string('books/includes/partial_book_create.html',
        context,
        request,
    )
    return JsonResponse({'html_form': html_form})

Note that we are not rendering a template but returning a Json response.

Now we create the partial template to render the form:

partial_book_create.html

{% load widget_tweaks %}

<form method="post">
  {% csrf_token %}
  <div class="modal-header">
    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
      <span aria-hidden="true">&times;</span>
    </button>
    <h4 class="modal-title">Create a new book</h4>
  </div>
  <div class="modal-body">
    {% for field in form %}
      <div class="form-group{% if field.errors %} has-error{% endif %}">
        <label for="{{ field.id_for_label }}">{{ field.label }}</label>
        {% render_field field class="form-control" %}
        {% for error in field.errors %}
          <p class="help-block">{{ error }}</p>
        {% endfor %}
      </div>
    {% endfor %}
  </div>
  <div class="modal-footer">
    <button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
    <button type="submit" class="btn btn-primary">Create book</button>
  </div>
</form>

I’m using the django-widget-tweaks library to render the form fields properly using the bootstrap class. You can read more about it in a post I published last year: Package of the Week: Django Widget Tweaks.

Now the glue that will put everything together: JavaScript.

Create an external JavaScript file. I created mine in the path: mysite/books/static/books/js/books.js

books.js

$(function () {

  $(".js-create-book").click(function () {
    $.ajax({
      url: '/books/create/',
      type: 'get',
      dataType: 'json',
      beforeSend: function () {
        $("#modal-book").modal("show");
      },
      success: function (data) {
        $("#modal-book .modal-content").html(data.html_form);
      }
    });
  });

});

Don’t forget to include this JavaScript file in the book_list.html template:

book_list.html

{% extends 'base.html' %}

{% load static %}

{% block javascript %}
  <script src="{% static 'books/js/books.js' %}"></script>
{% endblock %}

{% block content %}
  <!-- BLOCK CONTENT SUPPRESSED FOR BREVITY'S SAKE -->
{% endblock %}

Let’s explore the JavaScript snippet in great detail:

This is a jQuery shortcut that tells the browser to wait for all the HTML to be rendered before executing the code:

$(function () {
  ...
});

Here we are hooking into the click event of the element with class js-create-book, which is our Add book button.

$(".js-create-book").click(function () {
  ...
});

When the user clicks in the js-create-book button, this anonymous function with the $.ajax call will be executed:

$.ajax({
  url: '/books/create/',
  type: 'get',
  dataType: 'json',
  beforeSend: function () {
    $("#modal-book").modal("show");
  },
  success: function (data) {
    $("#modal-book .modal-content").html(data.html_form);
  }
});

Now, what is this ajax request saying to the browser:

Hey, the resource I want is in this path:

url: '/books/create/',

Make sure you request my data using the HTTP GET method:

type: 'get',

Oh, by the way, I want to receive the data in JSON format:

dataType: 'json',

But just before you communicate with the server, please execute this code:

beforeSend: function () {
  $("#modal-book").modal("show");
},

(This will open the Bootstrap Modal before the Ajax request starts.)

And right after you receive the data (in the data variable), execute this code:

success: function (data) {
  $("#modal-book .modal-content").html(data.html_form);
}

(This will render the partial form defined in the partial_book_create.html template.)

Let’s have a look on what we have so far:

Books Add Button

Then when the user clicks the button:

Books Modal

Great stuff. The book form is being rendered asynchronously, but it's not doing much at the moment. The good news is that the structure is ready; now it's a matter of playing with the data.

Let’s implement now the form submission handling.

First let’s improve the book_create view function:

views.py

def book_create(request):
    data = dict()

    if request.method == 'POST':
        form = BookForm(request.POST)
        if form.is_valid():
            form.save()
            data['form_is_valid'] = True
        else:
            data['form_is_valid'] = False
    else:
        form = BookForm()

    context = {'form': form}
    data['html_form'] = render_to_string('books/includes/partial_book_create.html',
        context,
        request
    )
    return JsonResponse(data)

partial_book_create.html

{% load widget_tweaks %}

<form method="post" action="{% url 'book_create' %}" class="js-book-create-form">
  <!-- FORM CONTENT SUPPRESSED FOR BREVITY'S SAKE -->
</form>

I added the action attribute to tell the browser where it should send the submission, and the class js-book-create-form for us to use on the JavaScript side, hooking into the form's submit event.

books.js

  $("#modal-book").on("submit", ".js-book-create-form", function () {
    ...
  });

The way we are listening to the submit event is a little different from what we implemented before. That's because the element with class .js-book-create-form didn't exist on the initial page load of the book_list.html template, and we can't register a listener on an element that doesn't exist.

A workaround is to register the listener on an element that will always exist in the page context; #modal-book is the closest one. What actually happens is a little more complex, but long story short, HTML events propagate up through parent elements until they reach the document root.

Hooking into the body element would have the same effect, but it would be slightly worse, because the event would have to travel through several more HTML elements before reaching it. So always pick the closest one.

Now the actual function:

books.js

  $("#modal-book").on("submit", ".js-book-create-form", function () {
    var form = $(this);
    $.ajax({
      url: form.attr("action"),
      data: form.serialize(),
      type: form.attr("method"),
      dataType: 'json',
      success: function (data) {
        if (data.form_is_valid) {
          alert("Book created!");  // <-- This is just a placeholder for now for testing
        }
        else {
          $("#modal-book .modal-content").html(data.html_form);
        }
      }
    });
    return false;
  });

A very important detail here: at the end of the function we return false. That's because we are capturing the form submission event, so to keep the browser from performing a full HTTP POST to the server, we cancel the default behavior by returning false.

So, what we are doing here:

var form = $(this);

In this context, this refers to the element with class .js-book-create-form, which is the element that fired the submit event. So when we select $(this), we are selecting the actual form.

url: form.attr("action"),

Now I’m using the form attributes to build the Ajax request. The action here refers to the action attribute in the form, which translates to /books/create/.

data: form.serialize(),

As the name suggests, we are serializing all the data from the form, and posting it to the server. The rest follows the same concepts as I explained before.

Before we move on, let’s have a look on what we have so far.

The user fills the data:

New Book

The user clicks on the Create book button:

New Book

The data was invalid. No hard refresh, nothing; just this tiny part of the page changed to show the validation errors. This is what happened:

  1. The form was submitted via Ajax
  2. The view function processed the form
  3. The form data was invalid
  4. The view function rendered the invalid state into data['html_form'], using render_to_string
  5. The Ajax request returned to the JavaScript function
  6. The Ajax success callback was executed, replacing the contents of the modal with the new data['html_form']

Please note that the Ajax success callback:

$.ajax({
  // ...
  success: function (data) {
    // ...
  }
});

Refers to the status of the HTTP request, which has nothing to do with the state of your form or whether it was successfully processed. It only means that the HTTP request returned a 200 status, for example.

Let’s fix the publication date value and submit the form again:

New Book

There we go: the alert tells us that the form was successfully processed and the book was, hopefully, created in the database.

success: function (data) {
  if (data.form_is_valid) {
    alert("Book created!");  // <-- This line was executed! Means success
  }
  else {
    $("#modal-book .modal-content").html(data.html_form);
  }
}

It is not 100% what we want, but we are getting close. Let’s refresh the screen and see if the new book shows in the table:

Book List

Great. We are getting there.

What we want now: after the form is successfully processed, close the Bootstrap modal and update the table with the newly created book. To do that, we will extract the body of the table into an external partial template, and return the new table body in the Ajax success callback.

Watch that:

book_list.html

<table class="table" id="book-table">
  <thead>
    <tr>
      <th>#</th>
      <th>Title</th>
      <th>Author</th>
      <th>Type</th>
      <th>Publication date</th>
      <th>Pages</th>
      <th>Price</th>
    </tr>
  </thead>
  <tbody>
    {% include 'books/includes/partial_book_list.html' %}
  </tbody>
</table>

partial_book_list.html

{% for book in books %}
  <tr>
    <td>{{ book.id }}</td>
    <td>{{ book.title }}</td>
    <td>{{ book.author }}</td>
    <td>{{ book.get_book_type_display }}</td>
    <td>{{ book.publication_date }}</td>
    <td>{{ book.pages }}</td>
    <td>{{ book.price }}</td>
  </tr>
{% empty %}
  <tr>
    <td colspan="7" class="text-center bg-warning">No book</td>
  </tr>
{% endfor %}

Now we can reuse the partial_book_list.html snippet without repeating ourselves.

Next step: book_create view function.

views.py

def book_create(request):
    data = dict()

    if request.method == 'POST':
        form = BookForm(request.POST)
        if form.is_valid():
            form.save()
            data['form_is_valid'] = True
            books = Book.objects.all()
            data['html_book_list'] = render_to_string('books/includes/partial_book_list.html', {
                'books': books
            })
        else:
            data['form_is_valid'] = False
    else:
        form = BookForm()

    context = {'form': form}
    data['html_form'] = render_to_string('books/includes/partial_book_create.html',
        context,
        request
    )
    return JsonResponse(data)

A proper success handler in the JavaScript side:

books.js

  $("#modal-book").on("submit", ".js-book-create-form", function () {
    var form = $(this);
    $.ajax({
      url: form.attr("action"),
      data: form.serialize(),
      type: form.attr("method"),
      dataType: 'json',
      success: function (data) {
        if (data.form_is_valid) {
          $("#book-table tbody").html(data.html_book_list);  // <-- Replace the table body
          $("#modal-book").modal("hide");  // <-- Close the modal
        }
        else {
          $("#modal-book .modal-content").html(data.html_form);
        }
      }
    });
    return false;
  });

Sweet. It’s working!


Edit Book

As you can expect, this will be very similar to what we did in the Create Book section, except that we will need to pass the ID of the book we want to edit. The rest is mostly the same, and we will be reusing several parts of the code.

urls.py:

from django.conf.urls import url, include
from mysite.books import views

urlpatterns = [
    url(r'^books/$', views.book_list, name='book_list'),
    url(r'^books/create/$', views.book_create, name='book_create'),
    url(r'^books/(?P<pk>\d+)/update/$', views.book_update, name='book_update'),
]

Now we refactor the book_create view to reuse its code in the book_update view:

views.py

from django.shortcuts import render, get_object_or_404
from django.http import JsonResponse
from django.template.loader import render_to_string

from .models import Book
from .forms import BookForm


def save_book_form(request, form, template_name):
    data = dict()
    if request.method == 'POST':
        if form.is_valid():
            form.save()
            data['form_is_valid'] = True
            books = Book.objects.all()
            data['html_book_list'] = render_to_string('books/includes/partial_book_list.html', {
                'books': books
            })
        else:
            data['form_is_valid'] = False
    context = {'form': form}
    data['html_form'] = render_to_string(template_name, context, request)
    return JsonResponse(data)


def book_create(request):
    if request.method == 'POST':
        form = BookForm(request.POST)
    else:
        form = BookForm()
    return save_book_form(request, form, 'books/includes/partial_book_create.html')


def book_update(request, pk):
    book = get_object_or_404(Book, pk=pk)
    if request.method == 'POST':
        form = BookForm(request.POST, instance=book)
    else:
        form = BookForm(instance=book)
    return save_book_form(request, form, 'books/includes/partial_book_update.html')

Basically, the view functions book_create and book_update are responsible for receiving the request, preparing the form instance and passing it to save_book_form, along with the name of the template to use in the rendering process.

Next step is to create the partial_book_update.html template. Similar to what we did with the view functions, we will also refactor the partial_book_create.html to reuse some of the code.

partial_book_form.html

{% load widget_tweaks %}

{% for field in form %}
  <div class="form-group{% if field.errors %} has-error{% endif %}">
    <label for="{{ field.id_for_label }}">{{ field.label }}</label>
    {% render_field field class="form-control" %}
    {% for error in field.errors %}
      <p class="help-block">{{ error }}</p>
    {% endfor %}
  </div>
{% endfor %}

partial_book_create.html

<form method="post" action="{% url 'book_create' %}" class="js-book-create-form">
  {% csrf_token %}
  <div class="modal-header">
    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
      <span aria-hidden="true">&times;</span>
    </button>
    <h4 class="modal-title">Create a new book</h4>
  </div>
  <div class="modal-body">
    {% include 'books/includes/partial_book_form.html' %}
  </div>
  <div class="modal-footer">
    <button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
    <button type="submit" class="btn btn-primary">Create book</button>
  </div>
</form>

partial_book_update.html

<form method="post" action="{% url 'book_update' form.instance.pk %}" class="js-book-update-form">
  {% csrf_token %}
  <div class="modal-header">
    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
      <span aria-hidden="true">&times;</span>
    </button>
    <h4 class="modal-title">Update book</h4>
  </div>
  <div class="modal-body">
    {% include 'books/includes/partial_book_form.html' %}
  </div>
  <div class="modal-footer">
    <button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
    <button type="submit" class="btn btn-primary">Update book</button>
  </div>
</form>

This is good enough. Now we need to add an Edit button to trigger the action.

partial_book_list.html

{% for book in books %}
  <tr>
    <td>{{ book.id }}</td>
    <td>{{ book.title }}</td>
    <td>{{ book.author }}</td>
    <td>{{ book.get_book_type_display }}</td>
    <td>{{ book.publication_date }}</td>
    <td>{{ book.pages }}</td>
    <td>{{ book.price }}</td>
    <td>
      <button type="button"
              class="btn btn-warning btn-sm js-update-book"
              data-url="{% url 'book_update' book.id %}">
        <span class="glyphicon glyphicon-pencil"></span> Edit
      </button>
    </td>
  </tr>
{% empty %}
  <tr>
    <td colspan="8" class="text-center bg-warning">No book</td>
  </tr>
{% endfor %}

The js-update-book class will be used to start the edit process. Note that I also added an extra HTML attribute named data-url; this is the URL that will be used to build the Ajax request dynamically.

Take the time to refactor the js-create-book button to use the data-url strategy as well, so we can remove the hardcoded URL from the Ajax request.

book_list.html

{% extends 'base.html' %}

{% block content %}
  <h1 class="page-header">Books</h1>

  <p>
    <button type="button"
            class="btn btn-primary js-create-book"
            data-url="{% url 'book_create' %}">
      <span class="glyphicon glyphicon-plus"></span>
      New book
    </button>
  </p>

  <!-- REST OF THE PAGE... -->

{% endblock %}

books.js

$(".js-create-book").click(function () {
  var btn = $(this);  // <-- HERE
  $.ajax({
    url: btn.attr("data-url"),  // <-- AND HERE
    type: 'get',
    dataType: 'json',
    beforeSend: function () {
      $("#modal-book").modal("show");
    },
    success: function (data) {
      $("#modal-book .modal-content").html(data.html_form);
    }
  });
});

The next step is to create the edit functions. They are pretty much the same as the create ones, so what we want to do is extract the anonymous functions we are using and reuse them for the edit buttons and forms. Check it out:

books.js

$(function () {

  /* Functions */

  var loadForm = function () {
    var btn = $(this);
    $.ajax({
      url: btn.attr("data-url"),
      type: 'get',
      dataType: 'json',
      beforeSend: function () {
        $("#modal-book").modal("show");
      },
      success: function (data) {
        $("#modal-book .modal-content").html(data.html_form);
      }
    });
  };

  var saveForm = function () {
    var form = $(this);
    $.ajax({
      url: form.attr("action"),
      data: form.serialize(),
      type: form.attr("method"),
      dataType: 'json',
      success: function (data) {
        if (data.form_is_valid) {
          $("#book-table tbody").html(data.html_book_list);
          $("#modal-book").modal("hide");
        }
        else {
          $("#modal-book .modal-content").html(data.html_form);
        }
      }
    });
    return false;
  };


  /* Binding */

  // Create book
  $(".js-create-book").click(loadForm);
  $("#modal-book").on("submit", ".js-book-create-form", saveForm);

  // Update book
  $("#book-table").on("click", ".js-update-book", loadForm);
  $("#modal-book").on("submit", ".js-book-update-form", saveForm);

});

Let’s have a look at what we have so far.

Book Edit

The user clicks the Edit button.

Book Edit

The user changes some data, such as the title of the book, and hits the Update book button:

Book Edit

Cool! Now just the delete and we are done.


Delete Book

urls.py:

from django.conf.urls import url, include
from mysite.books import views

urlpatterns = [
    url(r'^books/$', views.book_list, name='book_list'),
    url(r'^books/create/$', views.book_create, name='book_create'),
    url(r'^books/(?P<pk>\d+)/update/$', views.book_update, name='book_update'),
    url(r'^books/(?P<pk>\d+)/delete/$', views.book_delete, name='book_delete'),
]

views.py

def book_delete(request, pk):
    book = get_object_or_404(Book, pk=pk)
    data = dict()
    if request.method == 'POST':
        book.delete()
        data['form_is_valid'] = True  # This is just to play along with the existing code
        books = Book.objects.all()
        data['html_book_list'] = render_to_string('books/includes/partial_book_list.html', {
            'books': books
        })
    else:
        context = {'book': book}
        data['html_form'] = render_to_string('books/includes/partial_book_delete.html',
            context,
            request,
        )
    return JsonResponse(data)

partial_book_delete.html

<form method="post" action="{% url 'book_delete' book.id %}" class="js-book-delete-form">
  {% csrf_token %}
  <div class="modal-header">
    <button type="button" class="close" data-dismiss="modal" aria-label="Close">
      <span aria-hidden="true">&times;</span>
    </button>
    <h4 class="modal-title">Confirm book deletion</h4>
  </div>
  <div class="modal-body">
    <p class="lead">Are you sure you want to delete the book <strong>{{ book.title }}</strong>?</p>
  </div>
  <div class="modal-footer">
    <button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
    <button type="submit" class="btn btn-danger">Delete book</button>
  </div>
</form>

partial_book_list.html

{% for book in books %}
  <tr>
    <td>{{ book.id }}</td>
    <td>{{ book.title }}</td>
    <td>{{ book.author }}</td>
    <td>{{ book.get_book_type_display }}</td>
    <td>{{ book.publication_date }}</td>
    <td>{{ book.pages }}</td>
    <td>{{ book.price }}</td>
    <td>
      <button type="button"
              class="btn btn-warning btn-sm js-update-book"
              data-url="{% url 'book_update' book.id %}">
        <span class="glyphicon glyphicon-pencil"></span> Edit
      </button>
      <button type="button"
              class="btn btn-danger btn-sm js-delete-book"
              data-url="{% url 'book_delete' book.id %}">
        <span class="glyphicon glyphicon-trash"></span> Delete
      </button>
    </td>
  </tr>
{% empty %}
  <tr>
    <td colspan="8" class="text-center bg-warning">No book</td>
  </tr>
{% endfor %}

books.js

// Delete book
$("#book-table").on("click", ".js-delete-book", loadForm);
$("#modal-book").on("submit", ".js-book-delete-form", saveForm);

And the result will be:

Book Delete

The user clicks a Delete button:

Book Delete

The user confirms the deletion, and the table in the background is refreshed:

Book Delete


Conclusions

I decided to use function-based views in this example because they are easier to read and involve less of the configuration magic that could distract those who are learning Django.

I tried to include as much detail as I could and to discuss some of the code design decisions. If anything is not clear to you, or you want to suggest some improvements, feel free to leave a comment! I would love to hear your thoughts.

The code is available on GitHub: github.com/sibtc/simple-ajax-crud, so you can try it locally. It’s very straightforward to get it running.

Git Cheat Sheet

By Django日本語ブログ from Django community aggregator: Community blog posts. Published on Nov 15, 2016.

For a Git GUI, I recommend www.gitkraken.com.

Create a new branch

git branch new_feature

 

Check out a branch

git checkout new_feature

 

Create a new branch and check it out

git checkout -b new_feature

 

Check the repository status

git status

 

Add all modified files to the staging area

git add .

 

Add a specific folder or file to the staging area

git add test.py

 

Commit the staged files

git commit -m "commit message"

 

View the commit history

git log

 

Push new commits to the master branch

git push origin master

The post Git Cheat Sheet was originally published on w3b.jp.

Command Line Tricks for Ridiculously Fast Django Development

By David Fozo's Blog from Django community aggregator: Community blog posts. Published on Nov 13, 2016.

Command Line Icon

 

The command line is one of the most important tools in your arsenal. Knowing it well and being fast with it will seriously boost your performance and effectiveness. One side of that is knowing the commands well; the other side is aliases and custom variables.

 

Today we will focus on aliases and the most important shortcuts. An alias gives another name to a command, possibly a much shorter one. For example, say you want a faster way to invoke the Python interpreter: instead of "python3" you could just type "p". The command goes like this:

alias p=python3

 

This setting will cease to exist when you exit the terminal. You can make it permanent by adding it to the .bashrc file in your home directory. Open up ~/.bashrc with your editor; I use nano:

nano ~/.bashrc

 

Head to the bottom of the file and copy the following:

*** ~/.bashrc ***

#my custom aliases
alias v="source ../virtualenv/bin/activate"
alias dea="deactivate"
alias r="python3 manage.py runserver"
alias te="python3 manage.py test"
alias c="clear"
alias mdkir="mkdir"
alias ..="cd .."
alias ....="cd ../.."
alias ......="cd ../../.."

#my custom variables
tut=~/Tutorial/DjangoTutorial/source  # replace with your own working directory

 

If you haven't followed along with the tutorial, here is the way to set it up: clone the exercise3 branch.

 

Now, let's play around with these new tricks. Navigate into the $tut directory:

cd $tut

 

That gets you to the ~/Tutorial/DjangoTutorial/source directory from anywhere on your computer.

 

Change directory to the grandparent directory; in my case, the Tutorial/ folder.

....

 

Let's say you want to clone your DjangoTutorial project.

mkdir Testrepo

cd   # Press Alt + .

 

Alt + . inserts the last argument of the previous command, in this case "Testrepo". So your command autocompletes to:

cd Testrepo

 

Clone the $tut repository into our folder. Don't forget the dot at the end:

git clone $tut .

 

Should see this output:

Cloning into '.'...

done.

 

Check our files:

ls

 

Output:

main manage.py MyTutorial requirements.txt

 

This is identical, but if you really want to make sure:

diff $tut .

Output:

Common subdirectories: /home/david/Tutorial/DjangoTutorial/source/.git and ./.git

Common subdirectories: /home/david/Tutorial/DjangoTutorial/source/main and ./main

Common subdirectories: /home/david/Tutorial/DjangoTutorial/source/MyTutorial and ./MyTutorial

 

Okay, you changed your mind, and want to erase this repo. Change back to parent directory.

..

 

Delete with the recursive (-r) and forced (-f) options. That means it goes through all subdirectories and files and erases everything in its path. Forced means you don't have to say "yes" every time there is a sensitive file. Needless to say, use this command with extreme caution!

rm Testrepo -r -f

 

Before we go back, let me demonstrate the single most important trick you will use, if you aren't using it already: Tab as autocomplete.

cat $tut/main/mo   # Press Tab

 

You should see your command expanded and autocompleted.

cat /home/david/Tutorial/DjangoTutorial/source/main/models.py

This goes for any file or folder. Just provide enough letters for the computer to be able to guess.

 

Let's move back to our $tut folder.

cd $tut

 

Activate virtualenv.

v

 

The command line prompt should switch into virtual environment mode:

(virtualenv) david@my-machine:~/Tutorial/DjangoTutorial/source$

 

Let's run the development server:

r

 

Output:

Performing system checks...

 

System check identified no issues (0 silenced).

November 12, 2016 - 15:40:40

Django version 1.10, using settings 'MyTutorial.settings'

Starting development server at http://127.0.0.1:8000/

Quit the server with CONTROL-C.

 

Press Ctrl + C to escape. One of the commands I use most frequently is "clear". A lot of the time you will need it to avoid getting confused by earlier output and to keep your sanity.

c

 

Testing will also need to be repeated dozens of times.

te

 

Output:

Creating test database for alias 'default'...

....

----------------------------------------------------------------------

Ran 4 tests in 0.020s

 

OK

Destroying test database for alias 'default'...

 

If you are wondering why I didn't use just the letter "t", it's because I keep it for special occasions. Let's say you accidentally modify the article variables in index.html the following way:

***main/templates/index.html***

...

<p>Hello World</p>

<p>{{ article }}</p>

<p>{{ article }}</p>

...

Run the test:

te

 

Output:

Creating test database for alias 'default'...

...F

======================================================================

FAIL: test_index_html_displays_article (main.tests.ModelTest)

...

So you need to pinpoint the cause of the error (let's pretend you don't know what the cause of the problem is). You can run only this one failing test, like so:

python3 manage.py test main.tests.ModelTest.test_index_html_displays_article

(Run test from main application, tests module, ModelTest class, test_index_html_displays_article function)

 

That's unnecessarily long and difficult to remember. It's easier to set up an alias:

alias t="python3 manage.py test main.tests.ModelTest.test_index_html_displays_article"

 

Next time you need to run it, just type:

t

 

Remember if you close your terminal window, this alias won't be remembered next time.

 

So you figure there is something going on with the "article" object. Let's check all the files that have this word in them, and the exact lines.

grep -r article

 

Output:

main/views.py: article = Article.objects.last()

main/views.py: return render(request,'index.html',{'article':article,})

main/migrations/0002_auto_20161109_1542.py: model_name='article',

Binary file main/migrations/__pycache__/0002_auto_20161109_1542.cpython-34.pyc matches

main/templates/index.html: <p>{{ article }}</p>

main/templates/index.html: <p>{{ article }}</p>

Binary file main/__pycache__/views.cpython-34.pyc matches

Binary file main/__pycache__/tests.cpython-34.pyc matches

main/tests.py: self.article = Article.objects.all().filter(title=self.test_title)

main/tests.py: self.article.delete()

main/tests.py: self.assertEqual(self.article.values()[0]['text'],self.test_body)

main/tests.py: def test_index_html_displays_article(self):

 

Okay, so you've narrowed down the problem. The cause of the error must be in index.html, tests.py or views.py. You also remember that the last time you committed, all tests were passing. So let's check git diff to see what has been modified.

git diff

 

Output:

diff --git a/main/templates/index.html b/main/templates/index.html

index d95f9cf..1ac081a 100644

--- a/main/templates/index.html

+++ b/main/templates/index.html

@@ -6,7 +6,7 @@

</head>

<body>

<p>Hello World</p>

- <p>{{ article.title }}</p>

- <p>{{ article.text }}</p>

+ <p>{{ article }}</p>

+ <p>{{ article }}</p>

</body>

</html>

 

Oh! Now you see what happened. Instead of changing the lines directly, you revert all changes through git.

git reset --hard

 

Output:

HEAD is now at 1452ef7 Article model is displayed on home page

 

That's enough for today; let's deactivate the virtualenv:

dea

 

Congratulations!

 

You've learned to be savvy with command line shortcuts and will likely be seriously faster in the future! Remember the .bashrc file and update it regularly to match your workflow. Let me know in the comment section if you have any questions.

 

See you next time!

Django Tutorial Setup

By David Fozo's Blog from Django community aggregator: Community blog posts. Published on Nov 13, 2016.

Github Setup Icon

This article is an appendix to the other tutorial exercises on the site.

 

Follow these steps to clone my repository from GitHub and set up the tutorial on your computer. Replace the branch "exerciseX" with your current exercise branch.

 
mkdir -p DjangoTutorial/{static,virtualenv,source,database,media}

virtualenv --python=python3 DjangoTutorial/virtualenv/

git clone https://github.com/fozodavid/DjangoTutorial.git --branch exerciseX --single-branch DjangoTutorial/source

cd DjangoTutorial/source

touch MyTutorial/local_settings.py

 

***MyTutorial/local_settings.py***

import os

from MyTutorial.settings import BASE_DIR

SECRET_KEY = 'rf@7y-$2a41o+4&z$ki0&=z)(ao=@+$fseu1f3*f=25b6xtnx$'

DEBUG = True

ALLOWED_HOSTS = []

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, '..', 'database', 'db.sqlite3'),
    }
}

*** end of MyTutorial/local_settings.py ***

git branch -m exerciseX master

 

You are ready to start development.

JSON Web Tokens in django application- part four

By Krzysztof Żuraw Personal Blog from Django community aggregator: Community blog posts. Published on Nov 13, 2016.

When I started this series I got a comment from my co-worker that, besides authentication, JWT can be used to sign one-time links. After reading through the documentation I found that it could be a great idea, so I decided to write a blog post about it.

Use case

Nowadays, when a user creates an account, he or she has to confirm their identity. This is done by sending an email with a link to confirm and activate the account.

As this link has to expire and be safe, it is a good use case for JSON Web Tokens. Such tokens can be generated for every user and set to expire, for example, after two hours. How can it be done in Django? Let's jump into the code.

JSON Web Tokens in urls

First I changed the previous code from the series and made a separate Django app just for users. But first the user has to register - that's why I made a new endpoint in urls.py:

from users.views import UserViewSet, CreateUserView
urlpatterns = [
    # rest of url patterns
    url('^api-register/$', CreateUserView.as_view()),
]

CreateUserView looks as follows:

from django.contrib.auth.models import User

from rest_framework import permissions, status
from rest_framework.generics import CreateAPIView
from rest_framework.response import Response
from rest_framework.reverse import reverse
from rest_framework_jwt.settings import api_settings

from users.serializers import UserSerializer  # assuming the serializer lives in users/serializers.py

jwt_payload_handler = api_settings.JWT_PAYLOAD_HANDLER
jwt_encode_handler = api_settings.JWT_ENCODE_HANDLER

class CreateUserView(CreateAPIView):

    model = User.objects.all()
    permission_classes = [
        permissions.AllowAny # Or anon users can't register
    ]
    serializer_class = UserSerializer

    def create(self, request, *args, **kwargs):
        serializer = self.get_serializer(data=request.data)
        serializer.is_valid(raise_exception=True)
        self.perform_create(serializer)
        headers = self.get_success_headers(serializer.data)
        user = self.model.get(username=serializer.data['username'])
        payload = jwt_payload_handler(user)
        token = jwt_encode_handler(payload)
        return Response(
            {
                'confirmation_url': reverse(
                    'activate-user', args=[token], request=request
                )
            },
            status=status.HTTP_201_CREATED, headers=headers
        )

In this view, I simply add a few additional lines for creating the JWT; the rest is standard DRF code. First I create the payload by passing the user into the JWT creation process, then I create the token from the payload by calling jwt_encode_handler. At the end, instead of returning the user data, I return a confirmation_url for the end user to visit and activate the account. By default Django makes every user active, so I have to write my own create method for UserSerializer:

from django.contrib.auth.models import User
from rest_framework import serializers
from tasks.models import Task

class UserSerializer(serializers.ModelSerializer):
    tasks = serializers.PrimaryKeyRelatedField(
        many=True, queryset=Task.objects.all()
    )

    class Meta:
        model = User
        fields = ('username', 'password', 'tasks', 'email')

    def create(self, validated_data):
        user = User(
            email=validated_data['email'],
            username=validated_data['username']
        )
        user.set_password(validated_data['password'])
        user.is_active = False
        user.save()
        return user

This simply sets the user as inactive during account creation. Right now, when a user wants to create an account, he/she has to send the following request:

$ http POST 127.0.0.1:9000/api-register/ username=krzysiek password=krzysiek email=krzysztof@kz.com
HTTP/1.0 201 Created
Allow: POST, OPTIONS
Content-Type: application/json
Date: Sun, 13 Nov 2016 15:16:33 GMT
Server: WSGIServer/0.2 CPython/3.5.2
Vary: Accept
X-Frame-Options: SAMEORIGIN

{
    "confirmation_url": "http://127.0.0.1:9000/api-activate/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJlbWFpbCI6ImtyenlzenRvZkBrei5jb20iLCJ1c2VyX2lkIjoyNSwidXNlcm5hbWUiOiJrcnp5c2llayIsImV4cCI6MTQ3OTA1MDQ5M30.CMcW8ZtU6AS9LfVvO-PoLyqcwi6cOK1VzI2o7pEPX2k/"
}

How does this confirmation_url work? I made an additional urlpattern:

from users.views import ActivateUser

urlpatterns = [
    # rest of url patterns
    url(
        '^api-activate/(?P<token>.+?)/$',
        ActivateUser.as_view(),
        name='activate-user'
    ),
]

and in ActivateUser:

from django.contrib.auth.models import User
from django.utils.translation import ugettext as _

import jwt
from rest_framework import exceptions, status
from rest_framework.response import Response
from rest_framework.views import APIView
from rest_framework_jwt.settings import api_settings

jwt_decode_handler = api_settings.JWT_DECODE_HANDLER


class ActivateUser(APIView):

    def get(self, request, *args, **kwargs):
        token = kwargs.pop('token')
        try:
            payload = jwt_decode_handler(token)
        except jwt.ExpiredSignature:
            msg = _('Signature has expired.')
            raise exceptions.AuthenticationFailed(msg)
        except jwt.DecodeError:
            msg = _('Error decoding signature.')
            raise exceptions.AuthenticationFailed(msg)
        except jwt.InvalidTokenError:
            raise exceptions.AuthenticationFailed()

        user_to_activate = User.objects.get(id=payload.get('user_id'))
        user_to_activate.is_active = True
        user_to_activate.save()

        return Response(
            {'User Activated'},
            status=status.HTTP_200_OK
        )

This is a generic APIView, so I wrote a get method for handling GET requests. I was wondering whether it's a good idea to activate the user in a GET request or to do it in a PUT; if you have some thoughts about this, I will be happy to hear them. In get I simply take the token from kwargs and perform validation on that token - whether it's valid or expired. This part of the code usually lives in an authentication backend, but in such a class I don't have access to the URL of the request, so in this case I had to implement it this way. If you have other ways of handling such a case, please let me know! So, if everything looks good, I activate the user:

$ http GET http://127.0.0.1:9000/api-activate/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJlbWFpbCI6ImtyenlzenRvZkBrei5jb20iLCJ1c2VyX2lkIjoyNSwidXNlcm5hbWUiOiJrcnp5c2llayIsImV4cCI6MTQ3OTA1MDQ5M30.CMcW8ZtU6AS9LfVvO-PoLyqcwi6cOK1VzI2o7pEPX2k/
HTTP/1.0 200 OK
Allow: GET, HEAD, OPTIONS
Content-Type: application/json
Date: Sun, 13 Nov 2016 15:17:37 GMT
Server: WSGIServer/0.2 CPython/3.5.2
Vary: Accept
X-Frame-Options: SAMEORIGIN

[
    "User Activated"
]

$ http GET http://127.0.0.1:9000/api-activate/eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJlbWFpbCI6ImtyenlzenRvZkBrei5jb20iLCJ1c2VyX2lkIjoyNSwidXNlcm5hbWUiOiJrcnp5c2llayIsImV4cCI6MTQ3OTA1MDQ5M30.CMcW8ZtU6AS9LfVvO-PoLyqcwi6cOK1VzI2o7pEPX2k/
 HTTP/1.0 401 Unauthorized
 Allow: GET, HEAD, OPTIONS
 Content-Type: application/json
 Date: Sun, 13 Nov 2016 15:28:00 GMT
 Server: WSGIServer/0.2 CPython/3.5.2
 Vary: Accept
 WWW-Authenticate: JWT realm="api"
 X-Frame-Options: SAMEORIGIN

 {
     "detail": "Signature has expired."
 }
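
If you want to reproduce these requests yourself, you can mint a token for an existing user straight from the Django shell, reusing the same handlers the views import (a small sketch; the username comes from the registration example above):

# python3 manage.py shell
from django.contrib.auth.models import User
from rest_framework_jwt.settings import api_settings

jwt_payload_handler = api_settings.JWT_PAYLOAD_HANDLER
jwt_encode_handler = api_settings.JWT_ENCODE_HANDLER

user = User.objects.get(username='krzysiek')
token = jwt_encode_handler(jwt_payload_handler(user))
print(token)  # paste this into /api-activate/<token>/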

By default django-rest-framework-jwt sets the token expiry time to 5 minutes. If you want to change that, add the following lines to settings.py:

import datetime

JWT_AUTH = {
    'JWT_EXPIRATION_DELTA': datetime.timedelta(seconds=7)
}

That's all for today! Feel free to comment and check the repo for this blog post under this link.

JSON Web Token (JWT) Authentication in a Django/AngularJS web app

By octobot blog - Django from Django community aggregator: Community blog posts. Published on Nov 11, 2016.

Whether you are an experienced developer or starting your first app, there is a task we all face someday in our lives as developers: user authentication.

Nowadays, there are several kinds of authentication techniques available, and many of them could fit your needs. This post, however, is not about authentication mechanisms in general; it is about how to implement JSON Web Token authentication in an application with a Django-based backend, using a REST API to offer resources to an AngularJS frontend app (which fits very well into Octobot's technology stack, and maybe into yours).

First of all, why JWT? Well, because it is a compact and self-contained way for securely transmitting information between parties as a JSON object. Compact is good (we all know that), but self-contained? The JWT payload contains all the required information about the user, avoiding the need to query the database more than once. This makes JWT lightweight, scalable and easy to use.

Once a user has successfully logged in to your application using a username and password, he/she obtains a JWT which should be sent with every further request to the backend as an Authorization header; this token tells the backend who the user making the request is.

The following image (extracted from here) illustrates this process.


So, let me show you how simple it is to integrate this mechanism, starting from the Django backend:

BACKEND

  1. Install django-rest-framework and django-rest-framework-jwt
    pip install djangorestframework djangorestframework-jwt
  2. Add them to your Django’s installed apps (in settings.py)
    INSTALLED_APPS = ( ... 'rest_framework', 'rest_framework_jwt', )
  3. Configure django-rest-framework permissions to accept JSON Web tokens by default. (You could avoid this step and configure each endpoint independently).
    REST_FRAMEWORK = { 'DEFAULT_AUTHENTICATION_CLASSES': ( 'rest_framework_jwt.authentication.JSONWebTokenAuthentication', ), }
  4. Configure JWT additional settings. There are several parameters available.
    JWT_AUTH = {
    'JWT_SECRET_KEY': SECRET_KEY,
    'JWT_AUTH_HEADER_PREFIX': 'Bearer',
    }

    (SECRET_KEY is the key used to sign the JWT. Make sure this is safe and not shared or public.)
  5. URLs (urls.py). Let’s suppose that we have a set of public data which doesn’t require authentication and will live under the /public-data endpoint, and another set that requires authentication under /protected-data. Last, an authentication endpoint to obtain the JWT from a username and password (api-token-auth).
    from django.conf.urls import url
    from YOUR_APP.views import PublicDataViewSet, ProtectedDataViewSet
    
    urlpatterns = [
        url(r'public-data', PublicDataViewSet),
        url(r'protected-data', ProtectedDataViewSet),
        url(r'^api-token-auth/', 'rest_framework_jwt.views.obtain_jwt_token'),
    ]
  6. Finally, let’s create and setup the Viewsets mentioned (PublicDataViewSet and ProtectedDataViewSet).
    from django.contrib.auth.models import User
    from YOUR_APP.serializers import YOUR_MODEL_Serializer
    from rest_framework import viewsets
    from rest_framework.permissions import IsAuthenticated
    
    class PublicDataViewSet(viewsets.ModelViewSet):
        queryset = YOUR_MODEL.objects.all()
        serializer_class = YOUR_MODEL_Serializer    
    
    class ProtectedDataViewSet(viewsets.ModelViewSet):
        permission_classes = (IsAuthenticated, )
        queryset = YOUR_MODEL.objects.all()
        serializer_class = YOUR_MODEL_Serializer

Note that the ProtectedDataViewSet has a permission class (IsAuthenticated, ), so only authenticated users can reach this data.

And that’s it for the backend side. The magic is inside django-rest-framework.
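
Before moving on to the frontend, here is a quick way to exercise the backend from Python. This is only a sketch using the requests library: the base URL and credentials are placeholders, while the endpoint paths and the 'Bearer' prefix follow the configuration above.

import requests

BASE = 'http://localhost:8000'  # placeholder

# 1. Obtain a token from the api-token-auth endpoint
resp = requests.post(BASE + '/api-token-auth/',
                     json={'username': 'alice', 'password': 'secret'})
token = resp.json()['token']

# 2. Public data needs no credentials
print(requests.get(BASE + '/public-data').status_code)

# 3. Protected data requires the JWT in the Authorization header,
#    using the 'Bearer' prefix configured in JWT_AUTH
headers = {'Authorization': 'Bearer ' + token}
print(requests.get(BASE + '/protected-data', headers=headers).status_code)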


FRONTEND

Now, we’ll setup the AngularJS frontend application.

  1. Install angular-storage and angular-jwt
    bower install a0-angular-storage angular-jwt
  2. Inject the dependencies in the angular app module.
    angular.module('frontendApp', [
        ...
        'angular-storage',
        'angular-jwt'
    ])
  3. Add an interceptor to include the JWT in every http request.
    .config(function($stateProvider, $urlRouterProvider, $httpProvider, jwtInterceptorProvider) {
    jwtInterceptorProvider.tokenGetter = function(store) {
        return store.get('token');
    };
    // Add a simple interceptor that will fetch all requests and add the jwt token to its authorization header.
    $httpProvider.interceptors.push('jwtInterceptor');
    });
  4. Finally, in your login method, take care of saving the token in the browser's local storage. This way, the interceptor will be able to access it before each request, even after refreshing the page or closing the browser.
    $scope.login = function(){
        $http.post("backend/api-token-auth/", {'username': $scope.username, 'password': $scope.password}).success(function(response) {
            $scope.logged = true;
            $scope.jwt = response.token;
            store.set('token', response.token);
        }).error(function(response, status) {
            $scope.logged = false;
        });
    }

That’s it!

We hope this information is useful to you!

Common web site security vulnerabilities

By Caktus Consulting Group from Django community aggregator: Community blog posts. Published on Nov 10, 2016.

I recently decided I wanted to understand better what Cross-Site Scripting and Cross-Site Request Forgery were, and how they compared to that classic vulnerability, SQL Injection.

I also looked into some ways that sites protect against those attacks.

Vulnerabilities

SQL Injection

SQL Injection is a classic vulnerability. It probably dates back almost to punch cards.

Suppose a program uses data from a user in a database query.

For example, the company web site lets users enter a name of an employee, free-form, and the site will search for that employee and display their contact information.

A naive site might build a SQL query as a string using code like this, including whatever the user entered as NAME:

"SELECT * FROM employees WHERE name LIKE '" + NAME + "'"

If NAME is "John Doe", then we get:

SELECT * FROM employees WHERE name LIKE 'John Doe'

which is fine. But suppose someone types this into the NAME field:

John Doe'; DROP TABLE EMPLOYEES;

then the site will end up building this query:

SELECT * FROM employees WHERE name LIKE 'John Doe'; DROP TABLE EMPLOYEES;'

which might delete the whole employee directory. It could instead do something less obvious but even more destructive in the long run.

This is called a SQL Injection attack, because the attacker is able to inject whatever they want into a SQL command that the site then executes.

Cross Site Scripting

Cross Site Scripting, or XSS, is a similar idea. If an attacker can get their Javascript code embedded into a page on the site, so that it runs whenever someone visits that page, then the attacker's code can do anything on that site using the privileges of the user.

For example, maybe an attacker posts a comment on a page that looks to users like:

Great post!

but what they really put in their comment was:

Great post!<script> do some nefarious Javascript stuff </script>

If the site displays comments by just embedding the text of the comment in the page, then whenever a user views the page, the browser will run the Javascript - it has no way to know this particular Javascript on the page was written by an attacker rather than the people running the site.

This Javascript is running in a page that was served by the site, so it can do pretty much anything the user who is currently logged in can do. It can fetch all their data and send it somewhere else, or if the user is particularly privileged, do something more destructive, or create a new user with similar privileges and send its credentials somewhere the bad guy can retrieve them and use them later, even after the vulnerability has been discovered and fixed.

So, clearly, a site that accepts data uploaded by users, stores it, and then displays it, needs to be careful of what's in that data.

But even a site that doesn't store any user data can be vulnerable. Suppose a site lets users search by going to http://example.com/search?q=somethingtosearchfor (Google does something similar to this), and then displays a page showing what the search string was and what the results were. An attacker can embed Javascript into the search term part of that link, put that link somewhere people might click on it, and maybe label it "Cute Kitten Pictures". When a user clicks the link to see the kittens, her browser visits the site and tries the search. It'll probably fail, but if the site embeds the search term in the results page unchanged (which Google doesn't do), the attacker's code will run.

Why is it called Cross-Site Scripting? Because it allows an attacker to run their script on a site they don't control.

CSRF

Cross Site Request Forgeries

The essence of a CSRF attack is a malicious site making a request to another site, the site under attack, using the current user's permissions.

That last XSS example could also be considered a CSRF attack.

As another, extreme example, suppose a site implemented account deletion by having a logged-in user visit (GET) /delete-my-account. Then all a malicious site would have to do is link to yoursite.com/delete-my-account and if a user who was logged into yoursite.com clicked the link, they'd make the /delete-my-account request and their account would be gone.

In a more sophisticated attack, a malicious site can build a form or make AJAX calls that do a POST or other request to the site under attack when a user visits the malicious site.

Protecting against vulnerabilities

Protections in the server and application

SQL Injection protection

Django's ORM, and most database interfaces I've seen, provide a way to specify parameters to queries directly, rather than having the programmer build the whole query as a string. Then the database API can do whatever is appropriate to protect against malicious content in the parameters.
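
For example, the employee lookup from the beginning of this post could be written like this (a sketch; the Employee model and table are hypothetical):

from django.db import connection

from myapp.models import Employee  # hypothetical model for this sketch


def find_employees(name):
    # ORM: user input is bound as a query parameter, never spliced into the SQL
    return Employee.objects.filter(name__icontains=name)


def find_employees_raw(name):
    # Raw SQL: pass the value as a parameter; the driver handles quoting safely
    with connection.cursor() as cursor:
        cursor.execute("SELECT * FROM employees WHERE name LIKE %s", [name])
        return cursor.fetchall()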

XSS protection

Django templates apply "escaping" to all embedded content by default. This marks characters that ordinarily would be special to the browser, like "<", so that the browser will just display the "<" instead of interpreting it. That means if content includes "<SCRIPT>...</SCRIPT>", instead of the browser executing the "..." part, the user will just see "<SCRIPT>...</SCRIPT>" on the page.
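
You can see the same escaping outside a template with django.utils.html.escape (a small sketch):

from django.utils.html import escape

comment = 'Great post!<script> do some nefarious Javascript stuff </script>'
print(escape(comment))
# Great post!&lt;script&gt; do some nefarious Javascript stuff &lt;/script&gt;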

CSRF protection

We obviously can't disable links to other sites - that would break the entire web. So to protect against CSRF, we have to make sure that another site cannot build any request to our site that would actually do anything harmful.

The first level of protection is simply making sure that request methods like GET don't change anything, or display unvalidated data. That blocks the simplest possible attack, where a simple link from another site causes harm when followed.

A malicious site can still easily build a form or make AJAX calls that do a POST or other request to the site under attack, so how do we protect against that?

Django's protection is to always include a user-specific, unguessable string as part of such requests, and reject any such request that doesn't include it. This string is called the CSRF token. Any form on a Django site that does a POST etc has to include it as one of the submitted parameters. Since the malicious site doesn't know the token, it cannot generate a malicious POST request that the Django site will pay any attention to.
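
You can watch this protection in action with Django's test client, which skips CSRF checks unless told otherwise (a sketch; the /comments/ URL is hypothetical and assumes CsrfViewMiddleware is enabled for the view):

from django.test import Client

# enforce_csrf_checks=True makes the test client behave like an external request
client = Client(enforce_csrf_checks=True)

# A POST without the CSRF token is rejected with HTTP 403
response = client.post('/comments/', {'text': 'Great post!'})
print(response.status_code)  # 403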

Protections in the browser

Modern browsers implement a number of protections against these kinds of attacks.

"But wait", I hear you say. "How can I trust browsers to protect my application, when I have no control over the browser being used?"

I frequently have to remind myself that browser protections are designed to protect the user sitting in front of the browser, who for these attacks, is the victim, not the attacker. The user doesn't want their account hacked on your site any more than you do, and these browser protections help keep the attacker from doing that to the user, and incidentally to your site.

Same-origin security policy

All modern browsers implement a form of Same Origin Policy, which I'll call SOP. In some cases, it prevents a page loaded from one site from accessing resources on other sites, that is, resources that don't have the same origin.

The most important thing about SOP is that AJAX calls are restricted by default. Since an AJAX call can use POST and other data-modifying HTTP requests, and would send along the user's cookies for the target site, an AJAX call could do anything it wanted using the user's permissions on the target site. So browsers don't allow it.

What kind of attack does this prevent? Suppose the attacker sets up a site with lots of cute kitten pictures, and gets a user victim to access it. Without SOP, pages on that site could run Javascript that made AJAX calls (in the background) to the user's bank. Such calls would send along whatever cookies the user's browser had stored for the bank site, so the bank would treat them as coming from the user. But with SOP, the user's browser won't let those AJAX calls to another site happen. They can only talk to the attacker's own site, which doesn't do the attacker any good.

CSP

Content Security Policy (CSP)

CSP is a newer mechanism that browsers can use to better protect from these kinds of attacks.

If a response includes the CSP header, then by default the browser will not allow any inline javascript, CSS, or use of javascript "eval" on the page. This blocks many forms of XSS. Even if an attacker manages to trick the server into including malicious code on the page, the browser will refuse to execute it.

For example, if someone uploads a comment that includes a <script> tag with some Javascript, and the site includes that in the page, the browser just won't run the Javascript.
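
Serving a CSP header from Django is just a matter of adding it to the response, for example in a small middleware (a sketch; the policy shown is only an example, and packages such as django-csp handle this more flexibly):

def csp_middleware(get_response):
    # Add a Content-Security-Policy header to every response
    def middleware(request):
        response = get_response(request)
        response['Content-Security-Policy'] = "default-src 'self'"
        return response
    return middleware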

Conclusion

I've barely touched the surface on these topics here. Any web developer ought to have at least a general knowledge of common vulnerabilities, if only to know what areas might require more research on a given project.

A reasonable place to start is Django's Security Overview.

The OWASP Top Ten is a list of ten of the most commonly exploited vulnerabilities, with links to more information about each. The ones I've described here are numbers 1, 3, and 8 on the list, so you can see there are many more to be aware of.

Django's models, views and templates

By David Fozo's Blog from Django community aggregator: Community blog posts. Published on Nov 10, 2016.

MVC overview

Django loosely follows the MVC design pattern, which stands for Model-View-Controller. The Model is the database-handling layer defined in models.py. The View is the display layer: the HTML files defined in the "templates" directory, with views.py doing part of this work too. The Controller is responsible for the user's input; perhaps surprisingly, that work is also done in views.py. You will see all these parts working together in action.

 

 

In today's tutorial we will display an article on our website. Let's dive in!

 

Setup

 

If you haven't followed the earlier tutorials, you can clone my repository from GitHub:

 

mkdir -p DjangoTutorial/{static,virtualenv,source,database,media}

virtualenv --python=python3 DjangoTutorial/virtualenv/

git clone https://github.com/fozodavid/DjangoTutorial.git --branch exercise2 --single-branch DjangoTutorial/source

cd DjangoTutorial/source

touch MyTutorial/local_settings.py

 
***MyTutorial/local_settings.py***

import os

from MyTutorial.settings import BASE_DIR

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'rf@7y-$2a41o+4&z$ki0&=z)(ao=@+$fseu1f3*f=25b6xtnx$'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

ALLOWED_HOSTS = []

# Database
# https://docs.djangoproject.com/en/1.10/ref/settings/#databases
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': os.path.join(BASE_DIR, '..', 'database', 'db.sqlite3'),
    }
}

*** end of MyTutorial/local_settings.py ***

git branch -m exercise2 master

 

Tutorial

 
As with the previous tutorial, we will follow git and TDD best practices. So let's create a new branch for the new feature we will implement. Let's call it the article branch:

git checkout -b article

git branch

 

You should see two branches and “article” selected:

* article

master

 

 Let's activate virtualenv.

source ../virtualenv/bin/activate

 

Let's create our homepage. Before anything else, the principle of TDD requires us to write some tests. We will test whether our root domain (e.g. example.com) returns our index.html template. Our bet is that if it starts with <!doctype html> and there is "Hello World" in it, then it's a good enough test to determine that it's our index.html. So let's insert these snippets at the top and the bottom of main/tests.py.

***main/tests.py***

from django.http import HttpRequest

...

def test_root_loads_index_html(self):
    request = HttpRequest()
    response = home(request)
    self.assertTrue(response.content.startswith(b'<!doctype html>\n<html>\n<head>'))
    self.assertIn("Hello World", response.content.decode())

 

As a side note, it took me a lot of headaches to wrap my mind around the fact that writing a test is often more difficult than writing the actual thing (at least at the beginning). So you don't need to understand exactly what's going on in the tests; focus on the "real" code for now. You will get the hang of it with time.

 

Let's run the test. It should fail as expected.
python3 manage.py test

 

Okay, so let's create index.html. Django reads HTML files from directories called templates inside its apps. Let's do this.

mkdir main/templates

touch main/templates/index.html

 

Let's write some minimal html.

<!doctype html>

<html>

<head>

<meta http-equiv="Content-type" content="text/html; charset=utf-8">

<meta name="viewport" content="width=device-width, initial-scale=1">

</head>

<body>

<p>Hello World</p>

</body>

</html>

 

You should have a look at how the website actually looks.

python3 manage.py runserver

 

That will start our server. Type "localhost" into your browser as the address.

You should see:

Hello World

 

Let's run the test again. Ctrl+C to stop the server.

python3 manage.py test

 

It should pass now.

 

..

------------------------------------------------

Ran 2 tests in 0.016s

 

OK

 

We have some working code on our hand, it's time for a commit.

git status

 

You should see these differences:

...

modified: main/tests.py

modified: main/views.py

...

Untracked files:

(use "git add <file>..." to include in what will be committed)

 

main/templates/

...

 

Let's finish the procedure:

git add .

git commit -m "Index.html created, views.py updated"

 

It's time to add some dynamic content to our homepage. Before we dive in, a word about models. Models are the equivalent of database tables in a SQL database. Django has a thing called an ORM (Object-Relational Mapping), which translates your Python classes and queries into SQL. Let's see a basic example.

***main/models.py***

 

class Article(models.Model):
    title = models.CharField(max_length=100)
    text = models.TextField()

 

This will create the following SQL query:

CREATE TABLE "main_article" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "title" varchar(100) NOT NULL, "text" text NOT NULL);
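
The same translation happens for queries: ORM calls on the Article model turn into SQL behind the scenes. A small sketch, with the roughly equivalent SQL in the comments:

from main.models import Article

# INSERT INTO "main_article" ("title", "text") VALUES ('Hello', 'World');
Article.objects.create(title="Hello", text="World")

# SELECT ... FROM "main_article" WHERE "main_article"."title" LIKE '%Hello%';
articles = Article.objects.filter(title__contains="Hello")

# You can inspect the SQL a queryset will run:
print(articles.query)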

 

As usual, before we do anything, let's write some tests first. Insert these snippets at the top and the bottom.

***main/tests.py***

 

from main.models import Article

...

def test_able_to_save_entries_to_db(self):
    Article.objects.create(title="My Awesome Article", text="This is a text.")
    a = Article.objects.get(title="My Awesome Article")
    self.assertEqual(a.text, "This is a text.")

 

Run test, expected failure:

 

ImportError: cannot import name 'Article'

 

So, it's time to create our actual model.

***main/models.py ***

from django.db import models


# Create your models here.
class Article(models.Model):
    title = models.CharField(max_length=100)
    text = models.TextField()

 

Run test:

django.db.utils.OperationalError: no such table: main_article

 

As we saw earlier, the contents of models.py translate into an SQL query, but first we need to execute that query.

 

So let me introduce a new concept called migrations. Migrations are a sort of version control for your database. If you make changes to your models in models.py, you also need to "make migrations"; these are actual files in the main/migrations folder describing the changes you made. You then need to execute these files: when you run the migrate command, the changes take effect in your database. Usually we run these two commands together.

 

python3 manage.py makemigrations

python3 manage.py migrate

 

So now, the test should pass, as indeed there is a table in the database called main_article.

 

python3 manage.py test

 

Should see:

...

---------------------------------------------

Ran 3 tests in 0.016s

 

OK

 

Test passes, time for commit.

 

git status

 

Output:

...

modified: main/models.py

modified: main/tests.py

...

 

Check your work with the git diff command. Also, you should not see any migration files there; we took care of that in the .gitignore file. If you see them, just leave it for now and download my repo from GitHub for the next exercise. .gitignore needs to be configured properly at the beginning of a project; you cannot easily change it afterwards.

 

git diff

 

Everything fine, let's commit.

 

git add .

git commit -m "Article model defined."

 

Now let's display our model content on our site. That will require some more complex tests and also some serious reorganization. As you can see, I created a separate class called ModelTest and moved test_able_to_save_entries_to_db over there. I also implemented setUp and tearDown methods, which act as their names imply: before each test, setUp sets up the initial environment, tearDown destroys that environment afterwards, and the tests of the class run in between. I invite you to study it and let me know in the comment section if I should make some points clearer.

 

***main/tests.py***

from django.core.urlresolvers import resolve
from django.test import TestCase
from django.http import HttpRequest

from main.views import home
from main.models import Article


# Create your tests here.
class HomepageTest(TestCase):

    def test_root_resolves_to_home(self):
        root = resolve('/')
        self.assertEqual(root.func, home)

    def test_root_loads_index_html(self):
        request = HttpRequest()
        response = home(request)
        self.assertTrue(response.content.startswith(b'<!doctype html>\n<html>\n<head>'))
        self.assertIn("Hello World", response.content.decode())


class ModelTest(TestCase):

    def setUp(self):
        self.test_title = "TEST My Awesome Article TEST"
        self.test_body = "This is a text"
        Article.objects.create(title=self.test_title, text=self.test_body)
        self.article = Article.objects.all().filter(title=self.test_title)

    def tearDown(self):
        self.article.delete()

    def test_able_to_save_entries_to_db(self):
        self.assertEqual(self.article.values()[0]['text'], self.test_body)

    def test_index_html_displays_article(self):
        request = HttpRequest()
        response = home(request)
        self.assertTrue(response.content.startswith(b'<!doctype html>\n<html>\n<head>'))
        self.assertIn(self.test_body, response.content.decode())

 

Let's run our tests. Expected failure: our article is not displayed. The test is looking for the article text, so let's write that in.

 

***index.html***

...

<body>

<p>Hello World</p>

<p>{{ article.text }}</p>

</body>

 

What you've seen here is the Django Template Language in action. The dynamic content will be displayed in these “holes” marked with curly braces.
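
If you want to experiment with the template language on its own, you can render a template string directly in the Django shell (a small sketch; run it inside python3 manage.py shell so the settings are loaded):

from django.template import Template, Context

t = Template("<p>{{ article.text }}</p>")
print(t.render(Context({'article': {'text': 'Wonderful new site'}})))
# <p>Wonderful new site</p>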

 

Run the test.

python3 manage.py test

Still failing; the article is nowhere to be found. The reason is that you need to connect the data with the template somehow. The solution lies in rewriting views.py, specifically the home function.

 

***main/views.py***

from django.shortcuts import render

from main.models import Article


# Create your views here.
def home(request):
    article = Article.objects.last()
    return render(request, 'index.html', {'article': article})

 

Okay. What happened here is that we imported the Article class from main.models. We get the last Article object and then render the template. The third argument of render is a dictionary that sets the variables for our index.html template. We already have an article variable in the curly braces; views.py gives that article variable its meaning.

Also, just for fun, let's create an article ourselves. Django has a built-in shell interface; it is very useful for testing and exploring.

 

python3 manage.py shell

>>> from main.models import Article

>>> Article.objects.create(title="Main Title", text="Wonderful new site")

>>> exit()
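
You can reopen the shell at any time to explore the ORM further. For example (these extra queries are my own suggestions, not part of the tutorial's required steps):

python3 manage.py shell
>>> from main.models import Article
>>> Article.objects.count()                         # how many articles exist
>>> Article.objects.filter(title__contains="Main")  # query by title
>>> Article.objects.last().text                     # should print 'Wonderful new site'
>>> exit()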

 

Have a look at the site right now:

python3 manage.py runserver

Point your browser at http://localhost:8000/ (the default address for runserver).

You should see:

Hello World

Wonderful new site

 

Also, let's run the tests again.

....
----------------------------------------------------------------------
Ran 4 tests in 0.016s

OK

 

Finally everything is passing. Let's do a commit.

 

git status

git add .

git commit -m "Article is displayed on home page"

 

We did everything we set out to do and the code is stable. Let's merge these advancements into the master branch.

git log

 

You should see 5 commits. If you remember, only the top three belong to our branch.

git rebase -i HEAD~3

 

An interactive menu pops up. Change the bottom two lines from "pick" to "squash".

 

pick ff96ece index.html created, views.py updated

squash fd4c544 Article model defined

squash 278150b Article is displayed on home page

 

Save and quit (Ctrl+O, Ctrl+X if you are using nano). Another editor window pops up. You might edit the commit message the following way:

 

# This is a combination of 3 commits.

# The first commit's message is:

Article model is displayed on home page

 

# Please enter the commit message for your changes. Lines starting

Save and quit again: Ctrl+O, Ctrl+X if you are using nano.

 

Change the branch to master:

git checkout master

 

Merge our article branch and then delete it.

git merge article

git branch -d article

 

Finally, deactivate virtualenv.

deactivate

 

Woah! You have made big steps towards mastering Django. You've seen how templates, views and models interact, and you have also done some solid test-driven development. Let me know in the comment section if you have any questions.

 

See you next time!

Securing Django with multi factor authentication using Django-MFA

By Micropyramid django from Django community aggregator: Community blog posts. Published on Nov 09, 2016.

What is MFA?

Multifactor authentication (MFA) is a security system that requires more than one method of authentication to verify the user’s identity for a login or other transaction.

Why go for MFA?

One of the major problems with traditional user ID and password login is the need to maintain a password database. Whether encrypted or not, if the database is captured it gives the attacker a source against which to verify guesses at speeds limited only by their hardware. Given enough time, a captured password database will fall. To avoid this kind of breach, we prefer multifactor authentication.

Multifactor Authentication Technologies:

There are multiple ways to implement MFA: hardware devices that the user carries to authorize access to a network service; software-based security token applications that generate a single-use login PIN (soft tokens are often used for multifactor mobile authentication, where the device itself, such as a smartphone, provides the possession factor); SMS messages and phone calls sent to a user as an out-of-band method; and smartphone OTP apps.

In this blog post, we will see how to implement MFA in Django.

How can we implement MFA in Django:

We have an awesome Django package called Django-MFA that gives us flexibility in setting up MFA. Django-MFA can generate two types of one-time passwords: HMAC-based One-Time Password (HOTP) and Time-based One-Time Password (TOTP). In this blog post, we will see how to enable TOTP-based MFA using Django-MFA.
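To illustrate the idea behind TOTP independently of Django-MFA (this sketch uses the third-party pyotp library, which is my own choice for the example and not something Django-MFA requires), both the server and the user's phone app derive the same short-lived code from a shared secret and the current time:

# Minimal TOTP illustration with pyotp (illustrative only; Django-MFA ships its own implementation).
import pyotp

secret = pyotp.random_base32()   # shared once with the user's app, e.g. via a QR code
totp = pyotp.TOTP(secret)

code = totp.now()                # the 6-digit code currently shown on the user's phone
print(totp.verify(code))         # True while the code is still within its time window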

We can add MFA with Django-MFA by following these simple steps.

  1. Install Django-MFA with the following command.
    pip install django-mfa
  2. Keep the following settings in your settings.py
    INSTALLED_APPS = [
    	   ...
    	   'otp_app',
    	]
    MIDDLEWARE_CLASSES = [
    	   ...
    	   'otp_app.middleware.MfaMiddleware',
    	]
  3. Include the following in your root urls.py
    urlpatterns = [
    	   ...
    	   url(r'^settings/', include('otp_app.urls', namespace="mfa")),
    	]

That's it, your Django project now has MFA. Once you have followed the above steps, just go to "/settings/security/" in your address bar and you will be guided through enabling MFA for your account.

Django Under the Hood 2016

By Simple is Better Than Complex from Django community aggregator: Community blog posts. Published on Nov 08, 2016.

This was the third edition of the Django: Under The Hood (DUTH) conference. Two days of awesome talks and two days of sprints. The conference was organised by members of the Django community, including several members of the Django core team, and in association with the Dutch Django Association.

The conference was great! It was my first time in a DUTH conference and also my first time in Amsterdam, so it was quite an experience for me! I’m writing this post to share a little bit of what happened there.


About the Conference

Django Under The Hood

Picture by Bartek Pawlik: https://500px.com/photo/181370473/group-photo-at-django-under-the-hood-2016-by-django-under-the-hood

Over 300 Djangonauts. Stellar organization. Nine great talks. Two days of sprints. Awesome venue. A very healthy and friendly community. You could feel the excitement and enthusiasm of everyone participating in the conference. This was quite a surprise for me, because my only point of comparison was the academic conferences I had attended before, and they are far from being that fun :-)

On the first day we had three talks, starting with Andrew Godwin presenting the internals of Channels and discussing the Django-specific parts of the implementation.

Ana Balica talked about testing in Django. She presented how the testing framework has evolved since Django 1.0 and how it works under the hood, and gave some great insights and tips about extra resources to increase the quality of your tests.

The last talk of the first day was presented by Aymeric Augustin. He talked about debugging performance, both on the frontend and the backend, tackling performance issues related to CSS, JS and much more. On the backend, he talked about the implications of the ORM and gave some insights on the differences between select_related and prefetch_related, as well as other ORM-related optimizations.

The second day started with a keynote by Jennifer Akullian from Keen.io. She talked about mental health in tech. It is a very big and serious topic: many developers worldwide are working way too much and experiencing burnout, and we are not talking about it.

Loïc Bistuer talked about Django validation, exploring the main concerns of data validation, such as enforcement, user experience, performance and convenience. He also talked about the different approaches to validating user input.

Idan Gazit is a software developer at Heroku and gave a presentation about modern JavaScript. He talked about how to work with the newest JavaScript versions, and discussed frameworks, packaging, orchestration and deployment.

Custom database backends were the next topic, presented by Michael Manfre. He works at Microsoft and is one of the maintainers of the django-mssql package, a third-party database backend that supports Microsoft SQL Server. He talked about the challenges of implementing a custom database backend and shared his experience implementing the Microsoft SQL Server backend for Django.

Nadia Eghbal tackled the challenges of funding open source. She highlighted how tough it is and how different it is from other types of funding, such as startup funding, academic funding, etc. The problem is not only the money itself but also access to it. Right now most open source funding happens ad hoc, through crowdfunding, bounties and tipping, and none of these are sustainable.

Finally, the last talk was about Django at Instagram, presented by Carl Meyer, a Django core developer and Instagram employee. He talked about how important Django was for Instagram's growth and how it is still used today. He shared some very impressive numbers: 95 million photos and videos are uploaded every day, the service handles up to 4.2 billion likes per day, and the database holds more than 2.3 trillion likes in total. To do that, Instagram has tens of thousands of Django servers running. He talked about the evolution of Instagram and shared some fun facts about how they migrated from Django 1.3 to Django 1.8.


Talks

Below you can find all nine talks from the first two days of the conference.

Channels By ANDREW GODWIN

Channels By Andrew Godwin

@andrewgodwin
Slides
Recording

Testing By ANA BALICA

Testing By Ana Balica

@anabalica
Slides
Recording

Debugging By AYMERIC AUGUSTIN

Debugging By Aymeric Augustin

@aymericaugustin
Slides
Recording

Mental Health By JENNIFER AKULLIAN

Mental Health By Jennifer Akullian

@JennyAkullian
Slides
Recording

Validation By LOÏC BISTUER

Validation By Loïc Bistuer

@loic84
Slides
Recording

Modern JavaScript By IDAN GAZIT

Modern JavaScript By Idan Gazit

@idangazit
Slides
Recording

Database Backends By MICHAEL MANFRE

Database Backends By Michael Manfre

@manfre
Slides
Recording

OSS Funding By NADIA EGHBAL

OSS Funding By Nadia Eghbal

@nayafia
Slides
Recording

Django at Instagram By CARL MEYER

Django at Instagram By Carl Meyer

@carljm
Slides
Recording


It was a unique experience. The talks were great. For the next weeks I will try to summarize some of the contents from the talks and write it down in form of posts here in the blog, so we can explore together some of the topics and learn more about it!

Some extra links:

Django Under The Hood 2016 recap

By Krzysztof Żuraw Personal Blog from Django community aggregator: Community blog posts. Published on Nov 06, 2016.

From the beginning I really wanted to contribute to Django. I asked a friend of mine, "Do you know where I can start contributing?" She answered, "Go to Django Under The Hood". So I went. This is my small recap of the event.

Day one

After wandering a little bit around the city I finally got to the venue and the talks started. The first one was Channels by Andrew Godwin. Until then I had heard about the topic but I hadn't really gone into detail about what it is useful for. Andrew presented a very well-thought-through explanation of what channels really are and what they can be used for, but I would like to see them in production to see how this is going to work. As someone who hadn't dug into the topic before, I liked it very much.

Right after that was a talk about testing by Ana Balica. She started by introducing how testing in Django has evolved, which I really liked. Then there was an introduction to what happens when you execute a test suite in Django, and what happens in the various test case classes and clients. I really liked the segment about tools you can use to enhance your testing and the 8 tips on how to speed up tests. Another really interesting thing. You can find the slides here.

The last talk of the day was about debugging by Aymeric Augustin. It was a talk about how to speed up your page load. As it turns out, the backend is responsible for only about 20% of page load time, which is a good thing to consider when improving performance. To speed up your page load you should start by improving your frontend and then move on to the backend. When it comes to the backend, I heard some interesting ideas on how to improve performance.

Day two

The second day started with a keynote by Jennifer Akullian. It was a talk about mental health in IT. I found the topic really interesting and I was happy that it was raised.

The next talk was a more technical one about validation by Loïc Bistuer. It was a really interesting talk about forms and validation. It was deeply technical, which was sometimes difficult for me to understand, but that is very good: if it isn't a bit uncomfortable, you aren't learning.

Then there was a talk about JavaScript by Idan Gazit. It gave me a lot because of my rising interest in JavaScript. I heard about various tools and what it means to write modern JavaScript. I also heard about promises, the thing that is currently on top in the JavaScript world, so it came up in every other talk on this subject :). Overall the talk gave me a lot of information that I can use further.

The next one was a database backends talk by Michael Manfre. It dove deep into the Django ORM to show how to develop a new database backend for Microsoft SQL Server. A lot of useful info.

After a coffee break, there was a talk about open source funding by Nadia Eghbal. A nice talk about what it means to find funding for open source projects and what challenges you may face along the way.

The last talk was about Instagram and how it uses Django, by Carl Meyer. It was an amazing talk! I really liked hearing how they evolved and what was replaced or improved along the way. The funny thing was about Justin Bieber: his photos (especially the likes on those photos) heat up the Postgres database. I enjoyed the way Instagram handles performance.

Day three & four

As the talk days ended, the time came for the sprints! They were held at another location in Amsterdam, but I found that one comfortable too. The experience was really nice, with about 300 people developing the same framework at the same time. At the beginning of the sprint I decided to work on some GeoDjango tickets. I was able to close one and write some documentation. Awesome time!

Conclusion

It was a great time in Amsterdam! The talks were deeply technical and the sprints productive. Superb organization. Highly recommended to everyone!

Cover picture taken from DUTH twitter account: Under the Hood made by Bartek.

Special thanks to Kasia for being editor for this post. Thank you.

I'm Gonna Regret This

By chrism from . Published on Jun 14, 2016.

A plea for liberals to fight for individual rights.

Is Open Source Consulting Dead?

By chrism from . Published on Sep 10, 2013.

Has Elvis left the building? Will we be able to sustain ourselves as open source consultants?

Consulting and Patent Indemnification

By chrism from . Published on Aug 09, 2013.

Article about consulting and patent indemnification

Python Advent Calendar 2012 Topic

By chrism from . Published on Dec 24, 2012.

An entry for the 2012 Japanese advent calendar at http://connpass.com/event/1439/

Why I Like ZODB

By chrism from . Published on May 15, 2012.

Why I like ZODB better than other persistence systems for writing real-world web applications.

A str.__iter__ Gotcha in Cross-Compatible Py2/Py3 Code

By chrism from . Published on Mar 03, 2012.

A bug caused by a minor incompatibility can remain latent for long periods of time in a cross-compatible Python 2 / Python 3 codebase.

In Praise of Complaining

By chrism from . Published on Jan 01, 2012.

In praise of complaining, even when the complaints are absurd.

2012 Python Meme

By chrism from . Published on Dec 24, 2011.

My "Python meme" replies.

In Defense of Zope Libraries

By chrism from . Published on Dec 19, 2011.

A much too long defense of Pyramid's use of Zope libraries.

Plone Conference 2011 Pyramid Sprint

By chrism from . Published on Nov 10, 2011.

An update about the happenings at the recent 2011 Plone Conference Pyramid sprint.

Jobs-Ification of Software Development

By chrism from . Published on Oct 17, 2011.

Try not to Jobs-ify the task of software development.

WebOb Now on Python 3

By chrism from . Published on Oct 15, 2011.

Report about porting to Python 3.

Open Source Project Maintainer Sarcastic Response Cheat Sheet

By chrism from . Published on Jun 12, 2011.

Need a sarcastic response to a support interaction as an open source project maintainer? Look no further!

Pylons Miniconference #0 Wrapup

By chrism from . Published on May 04, 2011.

Last week, I visited the lovely Bay Area to attend the 0th Pylons Miniconference in San Francisco.

Pylons Project Meetup / Minicon

By chrism from . Published on Apr 14, 2011.

In the SF Bay Area on the 28th, 29th, and 30th of this month (April), 3 separate Pylons Project events.