So finally, at Cloud Next 2017 (March 2017), Google App Engine Flexible was announced as generally available. You can read about the history and my views on GAE Flex in these posts:
Google App Engine Flexible Environment status
Reflections on Google App Engine Flexible going Beta
Since the functionality should be stable now, let's move forward and see how we can use it. First I want to write about some general properties and a comparison with GAE Standard, and then provide some examples in Python.
In one sentence, "GAE Flex is a Docker container running on top of Google Compute Engine". But there's more to it than that. Let's see what.
So when would it be a good idea to use GAE Flex (instead of GAE Standard)?
There is a big difference between having an existing GAE application and starting from scratch. If you have an existing application, it depends on what kind of GAE services you are using and how big your code base is. Nothing is impossible, but at what price? :) So points 2, 3 and 4 are very relative. The best approach in such a situation is to isolate the code into a separate microservice where you implement the custom code / use a custom machine type (check the example below).
When, on the contrary, should you use GAE Standard?
Last but not least, of course, there is pricing:
One 2.4 GHz core with 1 GB RAM (and 10 GB HDD) costs $44 per month (this is currently the minimal configuration for GAE Flex), whereas an instance with the same parameters on GAE Standard (F4_1G/B4_1G) would cost $176 per month. A normal Compute Engine instance with these parameters costs ~$20 per month (with sustained use discount). So keep this in mind as well. Maybe you would be fine with a GCE instance and don't necessarily need GAE Flex. GAE Flex can be an advantage over plain GCE if you deploy often or have regular spikes in traffic (regional usage, for example), where autoscaling comes in handy.
In conclusion, deciding how to architect your application and which environment to use doesn't have to be a trivial task, but I guess it's always possible to find a solution based on your needs and priorities.
Now let's look at some examples using GAE Flexible. I am doing this in Python, since it's my primary programming language. If you don't have it already, you will need to install the Google Cloud SDK and configure/authenticate it for the Google Cloud project you want to work on.
So far I have made these examples:
Google App Engine Standard + Flexible (with Pandas)
Django, Python 3 and Cloud SQL
Let's say that you have some GAE application and you desperately need to use the Pandas library (used for data analysis and manipulation), which is one of the most requested libraries for GAE. Since we are already using GAE Standard, we will add another service on GAE Flex where we will use Pandas.
For the sake of simplicity, in the Flex service we generate a random Pandas series with Numpy, convert it into an HTML table and save that HTML string in Datastore with the client library (since we cannot use the native GAE Datastore service). In GAE Standard we fetch the same entities and display the generated HTML.
In both services we use the webapp2 framework. In GAE Standard it's included in the SDK, but in Flex we need to install it explicitly, as well as some extra libraries like Gunicorn, Paste and WebOb (see the requirements.txt sketch further below).
With the dispatch.yaml file we customize the routing so that the GAE Flex service is available under the url /pandas/.
Here is the full repository; I will go through the most important files:
dispatch.yaml
dispatch:
  - url: "<app-id>.appspot.com/pandas/"
    service: flex-module
  - url: "<app-id>.appspot.com/"
    service: default
In this file, the default service (GAE Standard) serves content under the root url, and the GAE Flex service, which I named flex-module, serves content under /pandas/. This configuration depends on your needs, of course; the new flex-module service is also automatically available under the flex-module.<app-id>.appspot.com/pandas/ url.
For the default service (GAE Standard), the main.py file looks like this:
import webapp2
import jinja2

from models import PandasText

JINJA_ENV = jinja2.Environment(
    loader=jinja2.FileSystemLoader('templates', ),
)


class MainHandler(webapp2.RequestHandler):
    def get(self):
        notes = PandasText.query().fetch()
        template = JINJA_ENV.get_template('index.html')
        self.response.write(template.render(notes=notes))


app = webapp2.WSGIApplication([
    ('/', MainHandler),
], debug=True)
Like I emphasized earlier, nothing fancy is going on here: in MainHandler we just fetch the PandasText entities and render them in the index.html file.
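The PandasText model itself lives in the repository; a minimal sketch of what it could look like (my assumption, the only real requirement is that the kind and the 'text' property match what the Flex service writes) is:

from google.appengine.ext import ndb


class PandasText(ndb.Model):
    # HTML table generated by the flex-module service and stored
    # by the Datastore client library under the 'text' property
    text = ndb.TextProperty()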
This is the app.yaml file:
runtime: python27
api_version: 1
threadsafe: yes
service: default

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: .*
  script: main.app

libraries:
- name: webapp2
  version: "2.5.2"
- name: jinja2
  version: "latest"
For the flex-module service (GAE Flex), the main.py file is:
import os

import webapp2
import pandas as pd
import numpy as np
from webapp2_extras import routes
from google.cloud import datastore

client = datastore.Client(project=os.environ['APP_ID'])


class PandasHandler(webapp2.RequestHandler):
    def get(self):
        s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
        df = s.to_frame()
        html = df.to_html()
        key = client.key('PandasText')
        entity = datastore.Entity(key=key)
        entity['text'] = html
        client.put(entity)
        self.response.write(html)


app = webapp2.WSGIApplication([
    routes.PathPrefixRoute('/pandas', [
        webapp2.Route('/', PandasHandler),
    ])
], debug=True)


def main():
    from paste import httpserver
    httpserver.serve(app, host='127.0.0.1', port='8081')


if __name__ == '__main__':
    main()
PandasHandler makes minimal use of Pandas (it generates a random series and converts it into HTML), saves the data in Datastore with the Datastore client library and renders the generated data on the page.
app.yaml is a bit different:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT main:app
service: flex-module

runtime_config:
  python_version: 2

resources:
  cpu: 1
  memory_gb: 1
  disk_size_gb: 10

manual_scaling:
  instances: 1

health_check:
  enable_health_check: False

env_variables:
  APP_ID: '<app id>'
With env: flex we say that this is the Flexible environment, not Standard.
With entrypoint we define the command which is run when the instance starts; in this case Gunicorn (a WSGI server) handles requests to the WSGI app in the main.py file.
Under resources I defined a minimalistic configuration, and I set just one instance which is always alive (without autoscaling). Note that the minimal configuration is 1 CPU.
I disabled health_check (although health check requests are still visible in the logs, not sure why).
I added the APP_ID variable to hold the app id, in order to initialize the Datastore client library in the main.py file. The project id is actually available automatically under the GCLOUD_PROJECT variable, but by defining the app id explicitly it's also possible to access Datastore locally, i.e. when we type
python flex/main.py
this runs the flex-module app locally, and when hitting 127.0.0.1:8081/pandas/ the Pandas data is generated and an entity is created and saved in the online Datastore! Keep this in mind during development.
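For this to work locally, the flex-module needs the APP_ID environment variable and local credentials for the Datastore client. One way to do that (just a sketch; <app id> is of course your own project id):

export APP_ID=<app id>
gcloud auth application-default login
python flex/main.py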
This example project doesn't work so well locally, because GAE Standard has its own local database, and while we could use the Datastore emulator for GAE Flex, each service would save data somewhere else, so it's not possible to share data. It's possible to run both at the same time though; that's why I set port 8081 for the flex module, since GAE Standard usually runs on 8080.
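If you prefer not to touch the online Datastore while experimenting, the Datastore emulator can be started with gcloud. A rough sketch, assuming the beta emulator components are installed:

gcloud beta emulators datastore start
# in a second terminal, export the emulator environment variables
# (DATASTORE_EMULATOR_HOST etc.) so the client library talks to the emulator
$(gcloud beta emulators datastore env-init)
python flex/main.py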
All dependencies are listed in the requirements.txt file, since the container for the Python runtime automatically picks up the requirements.txt file from the app root folder and installs the dependencies.
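For reference, the Flex service's requirements.txt could look roughly like this (the exact entries and versions are my assumption; check the repository for the real ones):

webapp2==2.5.2
WebOb
Paste
gunicorn
pandas
numpy
google-cloud-datastore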
Finally, to deploy everything, we use the command:
gcloud app deploy dispatch.yaml flex/app.yaml standard/app.yaml --verbosity=debug --promote
Here we deploy both the Standard and the Flexible service and also the dispatch.yaml file to set up routing. It's of course possible to deploy each part separately. By using the --promote flag, we automatically promote the deployed services to handle traffic. Keep in mind that deploying a GAE Flex application can take several minutes. If after 10 minutes you don't have confirmation of a successful deployment, check out the logs for the GAE application in the cloud console.
Of course, in this example everything could have been done in GAE Flex, but for the sake of the example we assumed that you have a big app in GAE Standard to which you want to add some specific functionality which involves Pandas.
This is an example of using Django with Python 3.4 (for which there is a predefined GAE Flex image). Django 1.11 (currently the latest release) is the last version which will support Python 2, so it makes sense to start development in Python 3 so that the upgrade to newer Django versions is smooth. It may happen (I honestly hope so) that Python 3 becomes available in GAE Standard, at which point this example stops making sense... but that's not yet the case, so... I'll provide the example.
Since GAE Flex instances are not suitable for saving state (they are created/destroyed occasionally), in this example I will use Google Cloud SQL, which is a managed MySQL/PostgreSQL database. It gives flexibility regarding machine type, storage space, automatic backups, automatic storage increase, read replicas etc., so it goes well with Django.
There are a few steps to be done to set this up:
- Create and configure Django project
The first thing to start working with Cloud SQL is to enable the following APIs in your cloud project console: the Google Cloud SQL API and the Google Cloud SQL Admin API.
Now we can create a MySQL Cloud SQL instance (second generation) with the gcloud command:
gcloud sql instances create gae-flex-django --tier=db-f1-micro --activation-policy=ALWAYS
This will create an instance named "gae-flex-django" with the db-f1-micro machine type (the smallest one, since we are just experimenting). It will take a few minutes to create. The Google Cloud SQL instance has a public IP, but by default network access is not allowed, although a GAE application in the same cloud project gets automatic access. There are also other options to set up, like automatic backups, replication etc. You can check more if needed at https://cloud.google.com/sql/docs/mysql/.
After that we can set the root password with the command:
gcloud sql instances set-root-password gae-flex-django --password myrootpassword
This will set the root password for the instance.
One thing that we will need a few times is the Cloud SQL instance connection name; it's typically in the format "<PROJECT_ID>:<REGION>:<INSTANCE_NAME>".
You can check it with this command:
gcloud sql instances describe gae-flex-django | grep "connectionName"
Now, in order to connect from our local computer to the Cloud SQL instance we have two possibilities. One is to use the Cloud SQL Proxy, the other is to allow network traffic to the Cloud SQL instance from the IP address of your local computer. I'll use the first option, since then I don't have to mess around with the Cloud SQL network settings.
In order to use the Cloud SQL Proxy we need to download the binary and make it executable:
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64
chmod +x cloud_sql_proxy.linux.amd64
The simplest command to run the Cloud SQL Proxy is:
./cloud_sql_proxy.linux.amd64 -instances="[INSTANCE_CONNECTION_NAME]"=tcp:3306
Now we can connect to the Cloud SQL instance with the mysql client (you need to provide the password which you set for root):
mysql --host 127.0.0.1 --user root --password
Now, in the MySQL console, we can create a database for Django; let's call it "gae_flex_db":
CREATE DATABASE gae_flex_db;
A little comment: it would be a better choice to create an extra database user and grant it access to the newly created database, but to keep things simple I'll use the root user in this example (a rough sketch of the dedicated-user setup is below).
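If you do want a dedicated user, the MySQL statements could look roughly like this (django_user and the password are hypothetical placeholders):

CREATE USER 'django_user'@'%' IDENTIFIED BY '<some-password>';
GRANT ALL PRIVILEGES ON gae_flex_db.* TO 'django_user'@'%';
FLUSH PRIVILEGES;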
Finally it's time to move on to the Django stuff.
You can create a project (repository) folder on your local computer; I called mine gae_flex_django. Then, as with other Python projects, the best thing is to create a virtual environment and install Django there. Now we can create the Django project, which in this example I called mywebsite:
django-admin startproject mywebsite
From now on, everything happens inside this folder. Let's create requirements.txt and add the dependencies which we will need:
Django==1.10
mysqlclient==1.3.10
gunicorn==19.7.1
We need mysqlclient in order to connect to the database, and we will use gunicorn as the web server. requirements.txt needs to be in the mywebsite folder.
We also need to create app.yaml and define our GAE Flex settings:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT mywebsite.wsgi

runtime_config:
  python_version: 3

resources:
  cpu: 1
  memory_gb: 1
  disk_size_gb: 10

manual_scaling:
  instances: 1

beta_settings:
  cloud_sql_instances: '<Cloud SQL Instance connection name>'
A few notes here:
In entrypoint, as I explained earlier, gunicorn is run; in this case it serves the WSGI application defined in the wsgi.py file in the inner mywebsite package (which sits inside the root mywebsite folder).
In runtime_config we explicitly define the Python 3 version.
The beta_settings part is obligatory (although at the moment neither GAE Flex nor Cloud SQL is in beta), and keep in mind that you need to enter the Cloud SQL instance connection name inside quotes, for example 'my-project:us-central1:gae-flex-django', otherwise it won't work.
Next we need to configure the settings.py file, which is in the inner mywebsite folder. The most important part to start with is the database settings, which look like this:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'gae_flex_db',
        # 'HOST': '<Cloud SQL IP>',
        'USER': 'root',
        'PASSWORD': 'rootpass',
        # 'PORT': '3306'
    }
}

DATABASES['default']['HOST'] = '/cloudsql/my-project:us-central1:gae-flex-django'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

if not os.getenv('GAE_INSTANCE'):
    DATABASES['default']['HOST'] = '127.0.0.1'
    # DEBUG = True
A few comments:
os.getenv('GAE_INSTANCE') checks whether the application is deployed (in that case the GAE_INSTANCE variable is set), and that way we know whether we are running locally or on a deployed instance. When running locally, the database host is set to localhost (127.0.0.1) because of the Cloud SQL Proxy which we started locally during the MySQL setup. In this setup, everything we do locally with the database is actually done in the MySQL database in the cloud. Keep this in mind once you have separate local and production environments.
The database host needs to be in the format '/cloudsql/<Cloud SQL Instance connection name>' in order to connect to the Cloud SQL instance.
Like I mentioned earlier, it's possible to connect to the database without the proxy; in that case you need to set the IP address of your Cloud SQL instance as the host and uncomment the PORT variable, which for MySQL is 3306. It's also required to change the network traffic settings of the Cloud SQL instance so that it allows connections from your local IP.
It's also a good idea, prior to deployment, to add basic logging, because that way we can see in the logs what went wrong if the application is not working after deployment. I copied the basic logging settings from the Django documentation:
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': 'INFO'
        },
    },
}
Don't forget to set up the ALLOWED_HOSTS variable according to your needs. In this project it looks like this:
ALLOWED_HOSTS = ['.appspot.com', ]
Of course, it's better to specify the hosts/domains concretely, but for the sake of genericity in this example I am allowing all appspot.com subdomains. This setting is of course only enforced when the DEBUG variable is set to False.
Finally we can run the initial database migration:
python manage.py migrate
This will create the tables in our Cloud SQL database. Keep in mind that in order for this to work, you need to have the Cloud SQL Proxy running.
We will also create a Django superuser so we can log into the admin part of the website:
python manage.py createsuperuser
Regarding static files, it's possible for the application to serve them itself, but that's not recommended; it's better to use a dedicated server (like nginx), a CDN or, in our case, Google Cloud Storage. Google Cloud Storage is a product on Google Cloud Platform that is well suited for storing and serving files like videos, images, text data files etc. Keep in mind that a Google Cloud Storage bucket name has to be globally unique, and it's possible to connect a custom domain to a bucket.
We need to create the bucket where static files will be uploaded. Run the command:
gsutil mb gs://<gcs-bucket-name>
And to make it public, we need to run this command:
gsutil defacl set public-read gs://<gcs-bucket-name>
In the settings.py file we need to set the STATIC_URL path so that it points to the GCS bucket:
STATIC_URL = 'https://storage.googleapis.com/<GCS_BUCKET_NAME>/'
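collectstatic also needs to know where to put the collected files, so that the deploy script shown below can sync them to the bucket. A minimal sketch, assuming the default BASE_DIR variable from the generated settings.py:

# os and BASE_DIR are already defined in the generated settings.py;
# collect static files into a local "static" folder in the project root,
# which deploy.sh then rsyncs to the GCS bucket
STATIC_ROOT = os.path.join(BASE_DIR, 'static')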
Finally, I created a bash script, deploy.sh, which is useful for automating deployment:
#!/bin/bash
python manage.py migrate # apply database migrations in project
python manage.py collectstatic <<< yes # collect static files with automatic yes answer
gsutil rsync -R static gs://GCS_BUCKET_NAME # upload/sync static files folder with GCS bucket
gcloud app deploy --verbosity=debug --promote <<< Y # deploy application and automatically make it default / serving
I hope the comments are self-explanatory. If the deployment runs with no errors, you should reach your application at http://<my-app-id>.appspot.com/admin/. With the superuser credentials you can log into the Django admin. If you get an Internal Server Error, check the logs for GAE first; check all types of logs so you see logs from the application, the nginx server, the VM instance... Note: error logs from the Django app are displayed as normal logs.
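One possible way to fetch recent logs from the command line (the service name here is an assumption; adjust it to the service you deployed):

gcloud app logs read --service=default --limit=50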
Now you can proceed with creating some app :)