One aspect of working with database (doesn't relates only to GAE) is retrieving stored data. It can be single entity or several entities. Quite often, need is to look at large number of data whether is to displaying for user or some background batch processing. To retrieve large number of data is slower and leads to increasing latency so often pagination is used. Pagination (in context of web development) is process in which we instead of retrieving large number of entities from database, we retrieve smaller number of entities with fixed size and then we retrieve next "page" and so on. Large and small number of entities is relative. We could say that large is anything above 100 items. Common case is table of 10, 25, or 50 items with buttons (links) for next/previours pages or links to go to table page 2, 3, 4 etc. In this article I want to write some ways how to do pagination. Examples will be in done Python with use of NDB.
Let's create app.yaml for project settings with basic stuff:
application: my-gae-app version: 1 runtime: python27 api_version: 1 threadsafe: yes handlers: - url: .* script: main.app libraries: - name: webapp2 version: "2.5.2" - name: jinja2 version: "latest"
then we can create file models.py in which we will define our simple test model:
from google.appengine.ext import ndb class TestPagModel(ndb.Model): number = ndb.IntegerProperty() created = ndb.DateTimeProperty(auto_now_add=True)
I set number as integer property so that it's easier to track which entities we are using. Created is datetime property which represents creation of datetime.
Lets create handler in file main.py for creation of 100 entities of TestPagModel and in this examples we will set size for page of 10 entities.
import webapp2 from google.appengine.ext import ndb from models import TestModelPag class CreateHandler(webapp2.RequestHandler): """Create items which we will use for pagination""" def get(self): entities = [] for i in range(100): tm = TestModelPag(name=i) entities.append(tm) ndb.put_multi(entities) self.response.write('Done!!!') app = webapp2.WSGIApplication([ ('/create', CreateHandler), ], debug=True)
we can run locally from project's folder with
dev_appserver.py runserver .
or
gcloud preview app run .
or deploy to server with:
appcfg.py update .
gcloud preview app deploy .
Now I will describe possibilities how to do pagination with NDB:
When we look at documentation for Query class https://cloud.google.com/appengine/docs/python/ndb/queryclass we see several options which we will use in examples. Offset is number of query results to skip.
Lets create folder templates in project's folder in which we will create file offset_pagination.html which will be used for rendering results. We will create simple html table in which we will display property of our model and implement pagination links for Next and Previous results (if applicable).
<!DOCTYPE html> <html> <head lang="en"> <meta charset="UTF-8"> <title>Offset Pagination</title> </head> <body> <table> <thead> <tr> </tr> </thead> <tbody> {% for obj in objects %} <tr> <td>{{ obj.number }}</td> </tr> {% endfor %} </tbody> <tfoot> <tr> <p> {% if prev %} <a href="/offset_pagination?offset={{ prev_offset }}">Previous</a> {% endif %} {% if next %} <a href="/offset_pagination?offset={{ next_offset }}">Next</a> {% endif %} </p> </tr> </tfoot> </table> </body> </html>
In link we are passing as url parameter offset variable which will have different value for Previous and for Next link.
We'll create class method for our model to do pagination queries via offset.
ITEMS = 10 class TestPagModel(ndb.Model): number = ndb.IntegerProperty() created = ndb.DateTimeProperty(auto_now_add=True) @classmethod def offset_pagination(cls, offset): """Pagination through query offset""" if offset: try: offset = int(offset) except ValueError: offset = 0 objects, next_cursor, more = cls.query().order(cls.number).fetch_page(ITEMS, offset=offset) prev_offset = max(offset - ITEMS, 0) prev = True if offset else False next_ = True if more else False next_offset = '' if next_: next_offset = offset + ITEMS return {'objects': objects, 'prev': prev, 'next': next_, 'prev_offset': prev_offset, 'next_offset': next_offset}
Brief explanation: we are using method fetch_page which returns tuple of objects, cursor for next query and boolean value if there are possible more entities. I am doing ordering by number so that we have objects order from 1 until 100 as we created entities, it's easier to follow when displaying. In fetch_page we have two arguments ITEMS - number of entities we want to retrieve and offset as option of have many items to skip.
so for example to get first 10 items we would call:
fetch_page(10, offset=0)
to get next items we would call
fetch_page(10, offset=10)
etc. based on more variable we estimate whether there will be more results to show hide through template variables prev and next. In variables prev_offset and next_offset is calculated next offset or previous offset. Minimal offset is 0, that's why when calculation previous offset it's either zero or some positive number. In this method we are returning dictionary with all variables which we will use for displaying. Handler looks simply (added line in main.py):
import os import jinja2 JINJA_ENVIRONMENT = jinja2.Environment( loader=jinja2.FileSystemLoader(os.path.dirname(__file__)), extensions=['jinja2.ext.autoescape'], autoescape=True) class OffsetPaginationHandler(webapp2.RequestHandler): """Display through query offset""" def get(self): offset = self.request.get('offset', '0') res = TestModelPag.offset_pagination(offset) template = JINJA_ENVIRONMENT.get_template('templates/offset_pagination.html') self.response.write(template.render(res))
we will updating routing to route this handler to url offset_pagination:
app = webapp2.WSGIApplication([ ('/create', CreateHandler), ('/offset_pagination', OffsetPaginationHandler), ], debug=True)
In handler we are getting url parameter offset and passing to model method offset_pagination. Returned dictionary with results is rendered in template.
This looks fairly simple and logical, but: Don't do it like this!!!
Biggest reason is that in order to get desired results, it needs to retrieve offset + ITEMS (in our case) number of entities from database. So when you need to get first 10, 20, 30 items it's ok. But what if you want to get 10th page, i.e. from 100 to 110. That means that query will get 110 items from database and skip 100 and return last 10 items (in case number of items to get with query is 10). So if you have for example 1000 items and you are on 990 item... you'll get 1000 items from database with side effect that you will increase latency of your application, since larger number of entities takes longer to get from database. On the other side, most users go through first several pages so potentially safe... but don't be surprised if billing will increase. So just to repeat, don't do it like this.
So if you try out web application, you see that you have on first page from 0 to 9, then from 10 until 19 and so on. Offset parameter in url is changing according direction, i.e. whether we are looking for previous or next page.
Second choice to implement pagination is to use query option start_cursor. Based on documentation cursor is: A query cursor is a small opaque data structure representing a resumption point in a query. Let's see how this can be implemented for our TestPagModel:
@classmethod def cursor_pagination(cls, prev_cursor_str, next_cursor_str): if not prev_cursor_str and not next_cursor_str: objects, next_cursor, more = cls.query().order(cls.number).fetch_page(ITEMS) prev_cursor_str = '' if next_cursor: next_cursor_str = next_cursor.urlsafe() else: next_cursor_str = '' next_ = True if more else False prev = False elif next_cursor_str: cursor = Cursor(urlsafe=next_cursor_str) objects, next_cursor, more = cls.query().order(cls.number).fetch_page(ITEMS, start_cursor=cursor) prev_cursor_str = next_cursor_str next_cursor_str = next_cursor.urlsafe() prev = True next_ = True if more else False elif prev_cursor_str: cursor = Cursor(urlsafe=prev_cursor_str) objects, next_cursor, more = cls.query().order(-cls.number).fetch_page(ITEMS, start_cursor=cursor) objects.reverse() next_cursor_str = prev_cursor_str prev_cursor_str = next_cursor.urlsafe() prev = True if more else False next_ = True return {'objects': objects, 'next_cursor': next_cursor_str, 'prev_cursor': prev_cursor_str, 'prev': prev, 'next': next_}
Corresponding html template would look like this:
<!DOCTYPE html> <html> <head lang="en"> <meta charset="UTF-8"> <title>Cursor pagination</title> </head> <body> <table> <thead> </thead> <tbody> {% for obj in objects %} <tr> <td>{{ obj.number }}</td> </tr> {% endfor %} </tbody> <tfoot> <tr> <p> {% if prev %} <a href="/cursor_pagination?prev_cursor={{ prev_cursor }}">Previous</a> {% endif %} {% if next %} <a href="/cursor_pagination?next_cursor={{ next_cursor }}">Forward</a> {% endif %} </p> </tr> </tfoot> </table> </body> </html>
To use cursors (as parameters in url) we have to convert into string representation via:
cursor_str = cursor.urlsafe()
to convert back into cursor object which can used in query:
cursor = Cursor(urlsafe=cursor_str)
Method takes two arguments: cursor to be used for previous page, cursor to be used in next page. Method returns dictionary with variables that are used in html template. Overall method looks a bit more complicated so I'll try to explain each part.
First if branch is used for start (first) page when there are no url parameters. Again we use method fetch_page this time with parameter how many items we want to retrieve. There is no previous cursor so we set empty string, as also boolean flag set to false. If there are more results is based on more variable and next_ variable is set accordingly.
Second branch is used for forward direction, i.e. when user clicks Next link. We take next_cursor and used it in fetch as a value for parameter start_cursor. Now previous cursor is set cursor which was used as next_cursor and new next_cursor is the one returned for query.
Third branch is for backward direction, i.e. when user clicks on Previous link. Here we use opposite ordering. If there is no ordering then it should be used ordering by key value, so in this case it would be query().order(-cls.key) etc. As input start cursor we use previous cursor. Since it's opposite ordering returned results are in opposite order, for example first item is 19 and last 10. To display consistent order of items we are reversing list of results.
Clear advance in comparison with first approach is that you retrieve all time items from database how much is necessary (set in fetch_page method) not anything extra hidden. Drawnback (not so big) is that you need to have extra index for opposite ordering (in case of using based on some field) and little bit uglier url, since string representation of cursor is quite long :).
For the sake of completeness here is handler code in main.py file:
class CursorPaginationHandler(webapp2.RequestHandler): def get(self): prev_cursor = self.request.get('prev_cursor', '') next_cursor = self.request.get('next_cursor', '') res = TestModelPag.cursor_pagination(prev_cursor, next_cursor) template = JINJA_ENVIRONMENT.get_template('templates/cursor_pagination.html') self.response.write(template.render(res))
app = webapp2.WSGIApplication([ ('/create', CreateHandler), ('/cursor_pagination', CursorPaginationHandler), ('/offset_pagination', OffsetPaginationHandler), ], debug=True)
Complete code is on this link https://github.com/zdenulo/gae-ndb-pagination.