So finally, at Cloud Next 2017 (March 2017), Google App Engine Flexible was announced as generally available. You can read about the history and my views on GAE Flex in these posts:
Google App Engine Flexible Environment status
Reflections on Google App Engine Flexible going Beta
Since the functionality should be stable now, let's move forward and see how we can use it. First I want to write about some general properties and a comparison with GAE Standard, and then provide some examples in Python.
In one sentence, "GAE Flex is a Docker container running on top of Google Compute Engine". But there's more to it than that. Let's see what.
So when would it be a good idea to use GAE Flex (instead of GAE Standard)?
There is a big difference between having an existing GAE application and starting from scratch. If you have an existing application, it depends on what kind of GAE services you are using and how big your code base is. Nothing is impossible, but at what price? :) So points 2, 3 and 4 are very relative. The best approach in such a situation is to isolate the code into a separate microservice where you implement the custom code / use a custom machine type (check the example below).
When, on the contrary, should you use GAE Standard?
Last but not least, of course, there is pricing:
One 2.4 GHz core with 1 GB RAM (and 10 GB HDD) costs $44 per month (this is currently the minimal configuration for GAE Flex), whereas an instance with the same parameters on GAE Standard (F4_1G/B4_1G) would cost $176 per month. A normal Compute Engine instance with these parameters costs ~$20 per month (with sustained use discount). So keep this in mind as well. Maybe you would be fine with a GCE instance and don't necessarily need GAE Flex. GAE Flex can be an advantage over plain GCE if you deploy often or have regular spikes in traffic (regional usage, for example), where autoscaling comes in handy.
In conclusion, deciding how to architect your application and which environment to use doesn't have to be a trivial task, but I guess it's always possible to find a solution based on your needs and priorities.
Now let's look at some examples using GAE Flexible. I am doing this in Python, since it's my primary programming language. If you don't have it already, you will need to install the Google Cloud SDK and configure/authenticate it for the Google Cloud project you want to work on.
So far I have made these examples:
Google App Engine Standard + Flexible (with Pandas)
Django, Python 3 and Cloud SQL
Let's say that you have some GAE application and you desperately need to use the Pandas library (used for data analysis and manipulation), which is one of the most requested libraries for GAE. Since we are already using GAE Standard, we will add another service on GAE Flex where we will use Pandas.
For the sake of simplicity, in the Flex service we generate a random Pandas series with Numpy, convert it into an HTML table and save that HTML string in Datastore with the client library (since we cannot use the native GAE Datastore service). In GAE Standard we fetch the same entities and display the generated HTML.
In both services we use the webapp2 framework. In GAE Standard it's included in the SDK, but in Flex we need to install it explicitly, as well as some extra libraries like Gunicorn, Paste and WebOb (see the requirements.txt sketch further below).
With the dispatch.yaml file we customize the routing so that the GAE Flex service is available under the url /pandas/.
Here is the full repository; I will go through the most important files:
dispatch.yaml
dispatch:
  - url: "<app-id>.appspot.com/pandas/"
    service: flex-module
  - url: "<app-id>.appspot.com/"
    service: default
In this file, the default service (GAE Standard) serves content under the root url, and the GAE Flex service, which I named flex-module, serves content under /pandas/. This configuration depends on your needs, of course; the new flex-module service is also automatically available under the flex-module.<app-id>.appspot.com/pandas/ url.
For the default service (GAE Standard), the main.py file looks like this:
import webapp2
import jinja2

from models import PandasText

JINJA_ENV = jinja2.Environment(
    loader=jinja2.FileSystemLoader('templates', ),
)


class MainHandler(webapp2.RequestHandler):
    def get(self):
        notes = PandasText.query().fetch()
        template = JINJA_ENV.get_template('index.html')
        self.response.write(template.render(notes=notes))


app = webapp2.WSGIApplication([
    ('/', MainHandler),
], debug=True)
Like I emphasized earlier, nothing fancy is going on here: in MainHandler we just fetch the PandasText entities and render them in the index.html file.
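The PandasText model itself lives in the repository; a minimal sketch of what it could look like (my assumption, the only real requirement is that the kind and the 'text' property match what the Flex service writes) is:

from google.appengine.ext import ndb


class PandasText(ndb.Model):
    # HTML table generated by the flex-module service and stored
    # by the Datastore client library under the 'text' property
    text = ndb.TextProperty()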
This is the app.yaml file:
runtime: python27
api_version: 1
threadsafe: yes
service: default

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico

- url: .*
  script: main.app

libraries:
- name: webapp2
  version: "2.5.2"
- name: jinja2
  version: "latest"
For the flex-module service (GAE Flex), the main.py file is:
import os

import webapp2
import pandas as pd
import numpy as np
from webapp2_extras import routes
from google.cloud import datastore

client = datastore.Client(project=os.environ['APP_ID'])


class PandasHandler(webapp2.RequestHandler):
    def get(self):
        s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
        df = s.to_frame()
        html = df.to_html()
        key = client.key('PandasText')
        entity = datastore.Entity(key=key)
        entity['text'] = html
        client.put(entity)
        self.response.write(html)


app = webapp2.WSGIApplication([
    routes.PathPrefixRoute('/pandas', [
        webapp2.Route('/', PandasHandler),
    ])
], debug=True)


def main():
    from paste import httpserver
    httpserver.serve(app, host='127.0.0.1', port='8081')


if __name__ == '__main__':
    main()
PandasHandler makes minimal use of Pandas (it generates a random series and converts it into HTML), saves the data in Datastore with the Datastore client library and renders the generated data on the page.
app.yaml is a bit different:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT main:app
service: flex-module

runtime_config:
  python_version: 2

resources:
  cpu: 1
  memory_gb: 1
  disk_size_gb: 10

manual_scaling:
  instances: 1

health_check:
  enable_health_check: False

env_variables:
  APP_ID: '<app id>'
With env: flex we say that this is the Flexible environment, not Standard.
With entrypoint we define the command which is run when the instance starts; in this case Gunicorn (a WSGI server) handles requests to the WSGI app in the main.py file.
Under resources I defined a minimalistic configuration, and I set just one instance which is always alive (without autoscaling). Note that the minimal configuration is 1 CPU.
I disabled health_check (although health check requests are still visible in the logs, not sure why).
I added the APP_ID variable to hold the app id, in order to initialize the Datastore client library in the main.py file. The project id is actually available automatically under the GCLOUD_PROJECT variable, but by defining the app id explicitly it's also possible to access Datastore locally, i.e. when we type
python flex/main.py
this runs the flex-module app locally, and when hitting 127.0.0.1:8081/pandas/ the Pandas data is generated and an entity is created and saved in the online Datastore! Keep this in mind during development.
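For this to work locally, the flex-module needs the APP_ID environment variable and local credentials for the Datastore client. One way to do that (just a sketch; <app id> is of course your own project id):

export APP_ID=<app id>
gcloud auth application-default login
python flex/main.py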
This example project doesn't work so well locally, because GAE Standard has its own local database, and while we could use the Datastore emulator for GAE Flex, each service would save data somewhere else, so it's not possible to share data. It's possible to run both at the same time though; that's why I set port 8081 for the flex module, since GAE Standard usually runs on 8080.
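If you prefer not to touch the online Datastore while experimenting, the Datastore emulator can be started with gcloud. A rough sketch, assuming the beta emulator components are installed:

gcloud beta emulators datastore start
# in a second terminal, export the emulator environment variables
# (DATASTORE_EMULATOR_HOST etc.) so the client library talks to the emulator
$(gcloud beta emulators datastore env-init)
python flex/main.py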
All dependencies are listed in the requirements.txt file, since the container for the Python runtime automatically picks up the requirements.txt file from the app root folder and installs the dependencies.
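For reference, the Flex service's requirements.txt could look roughly like this (the exact entries and versions are my assumption; check the repository for the real ones):

webapp2==2.5.2
WebOb
Paste
gunicorn
pandas
numpy
google-cloud-datastore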
Finally, to deploy everything, we use the command:
gcloud app deploy dispatch.yaml flex/app.yaml standard/app.yaml --verbosity=debug --promote
Here we deploy both the Standard and the Flexible service and also the dispatch.yaml file to set up routing. It's of course possible to deploy each part separately. By using the --promote flag, we automatically promote the deployed services to handle traffic. Keep in mind that deploying a GAE Flex application can take several minutes. If after 10 minutes you don't have confirmation of a successful deployment, check out the logs for the GAE application in the cloud console.
Of course, in this example everything could have been done in GAE Flex, but for the sake of the example we assumed that you have a big app in GAE Standard to which you want to add some specific functionality which involves Pandas.
This is an example of using Django with Python 3.4 (for which there is a predefined GAE Flex image). Django 1.11 (currently the latest release) is the last version which will support Python 2, so it makes sense to start development in Python 3 so that the upgrade to newer Django versions is smooth. It may happen (I honestly hope so) that Python 3 becomes available in GAE Standard, at which point this example stops making sense... but that's not yet the case, so... I'll provide the example.
Since GAE Flex instances are not suitable for saving state (they are created/destroyed occasionally), in this example I will use Google Cloud SQL, which is a managed MySQL/PostgreSQL database. It gives flexibility regarding machine type, storage space, automatic backups, automatic storage increase, read replicas etc., so it goes well with Django.
There are a few steps to be done to set this up:
- Create and configure Django project
The first thing to start working with Cloud SQL is to enable the following APIs in your cloud project console: the Google Cloud SQL API and the Google Cloud SQL Admin API.
Now we can create a MySQL Cloud SQL instance (second generation) with the gcloud command:
gcloud sql instances create gae-flex-django --tier=db-f1-micro --activation-policy=ALWAYS
This will create an instance named "gae-flex-django" with the db-f1-micro machine type (the smallest one, since we are just experimenting). It will take a few minutes to create. The Google Cloud SQL instance has a public IP, but by default network access is not allowed, although a GAE application in the same cloud project gets automatic access. There are also other options to set up, like automatic backups, replication etc. You can check more if needed at https://cloud.google.com/sql/docs/mysql/.
After that we can set the root password with the command:
gcloud sql instances set-root-password gae-flex-django --password myrootpassword
This will set the root password for the instance.
One thing that we will need a few times is the Cloud SQL instance connection name; it's typically in the format "<PROJECT_ID>:<REGION>:<INSTANCE_NAME>".
You can check it with this command:
gcloud sql instances describe gae-flex-django | grep "connectionName"
Now, in order to connect from our local computer to the Cloud SQL instance we have two possibilities. One is to use the Cloud SQL Proxy, the other is to allow network traffic to the Cloud SQL instance from the IP address of your local computer. I'll use the first option, since then I don't have to mess around with the Cloud SQL network settings.
In order to use the Cloud SQL Proxy we need to download the binary and make it executable:
wget https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64
chmod +x cloud_sql_proxy.linux.amd64
The simplest command to run the Cloud SQL Proxy is:
./cloud_sql_proxy.linux.amd64 -instances="[INSTANCE_CONNECTION_NAME]"=tcp:3306
Now we can connect to the Cloud SQL instance with the mysql client (you need to provide the password which you set for root):
mysql --host 127.0.0.1 --user root --password
Now, in the MySQL console, we can create a database for Django; let's call it "gae_flex_db":
CREATE DATABASE gae_flex_db;
A little comment: it would be a better choice to create an extra database user and grant it access to the newly created database, but to keep things simple I'll use the root user in this example (a rough sketch of the dedicated-user setup is below).
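If you do want a dedicated user, the MySQL statements could look roughly like this (django_user and the password are hypothetical placeholders):

CREATE USER 'django_user'@'%' IDENTIFIED BY '<some-password>';
GRANT ALL PRIVILEGES ON gae_flex_db.* TO 'django_user'@'%';
FLUSH PRIVILEGES;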
Finally it's time to move on to the Django stuff.
You can create a project (repository) folder on your local computer; I called mine gae_flex_django. Then, as with other Python projects, the best thing is to create a virtual environment and install Django there. Now we can create the Django project, which in this example I called mywebsite:
django-admin startproject mywebsite
From now on, everything happens inside this folder. Let's create requirements.txt and add the dependencies which we will need:
Django==1.10
mysqlclient==1.3.10
gunicorn==19.7.1
We need mysqlclient in order to connect to the database, and we will use gunicorn as the web server. requirements.txt needs to be in the mywebsite folder.
We also need to create app.yaml and define our GAE Flex settings:
runtime: python
env: flex
entrypoint: gunicorn -b :$PORT mywebsite.wsgi

runtime_config:
  python_version: 3

resources:
  cpu: 1
  memory_gb: 1
  disk_size_gb: 10

manual_scaling:
  instances: 1

beta_settings:
  cloud_sql_instances: '<Cloud SQL Instance connection name>'
A few notes here:
In entrypoint, as I explained earlier, gunicorn is run; in this case it serves the WSGI application defined in the wsgi.py file in the inner mywebsite package (which sits inside the root mywebsite folder).
In runtime_config we explicitly define the Python 3 version.
The beta_settings part is obligatory (although at the moment neither GAE Flex nor Cloud SQL is in beta), and keep in mind that you need to enter the Cloud SQL instance connection name inside quotes, for example 'my-project:us-central1:gae-flex-django', otherwise it won't work.
Next we need to configure the settings.py file, which is in the inner mywebsite folder. The most important part to start with is the database settings, which look like this:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'gae_flex_db',
        # 'HOST': '<Cloud SQL IP>',
        'USER': 'root',
        'PASSWORD': 'rootpass',
        # 'PORT': '3306'
    }
}

DATABASES['default']['HOST'] = '/cloudsql/my-project:us-central1:gae-flex-django'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

if not os.getenv('GAE_INSTANCE'):
    DATABASES['default']['HOST'] = '127.0.0.1'
    # DEBUG = True
A few comments:
os.getenv('GAE_INSTANCE') checks whether the application is deployed (in that case the GAE_INSTANCE variable is set), and that way we know whether we are running locally or on a deployed instance. When running locally, the database host is set to localhost (127.0.0.1) because of the Cloud SQL Proxy which we started locally during the MySQL setup. In this setup, everything we do locally with the database is actually done in the MySQL database in the cloud. Keep this in mind once you have separate local and production environments.
The database host needs to be in the format '/cloudsql/<Cloud SQL Instance connection name>' in order to connect to the Cloud SQL instance.
Like I mentioned earlier, it's possible to connect to the database without the proxy; in that case you need to set the IP address of your Cloud SQL instance as the host and uncomment the PORT variable, which for MySQL is 3306. It's also required to change the network traffic settings of the Cloud SQL instance so that it allows connections from your local IP.
It's also a good idea, prior to deployment, to add basic logging, because that way we can see in the logs what went wrong if the application is not working after deployment. I copied the basic logging settings from the Django documentation:
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': 'INFO'
        },
    },
}
Don't forget to set up the ALLOWED_HOSTS variable according to your needs. In this project it looks like this:
ALLOWED_HOSTS = ['.appspot.com', ]
Of course, it's better to specify the hosts/domains concretely, but for the sake of genericity in this example I am allowing all appspot.com subdomains. This setting is of course only enforced when the DEBUG variable is set to False.
Finally we can run the initial database migration:
python manage.py migrate
This will create the tables in our Cloud SQL database. Keep in mind that in order for this to work, you need to have the Cloud SQL Proxy running.
We will also create a Django superuser so we can log into the admin part of the website:
python manage.py createsuperuser
Regarding static files, it's possible for the application to serve them itself, but that's not recommended; it's better to use a dedicated server (like nginx), a CDN or, in our case, Google Cloud Storage. Google Cloud Storage is a product on Google Cloud Platform that is well suited for storing and serving files like videos, images, text data files etc. Keep in mind that a Google Cloud Storage bucket name has to be globally unique, and it's possible to connect a custom domain to a bucket.
We need to create the bucket where static files will be uploaded. Run the command:
gsutil mb gs://<gcs-bucket-name>
And to make it public, we need to run this command:
gsutil defacl set public-read gs://<gcs-bucket-name>
In the settings.py file we need to set the STATIC_URL path so that it points to the GCS bucket:
STATIC_URL = 'https://storage.googleapis.com/<GCS_BUCKET_NAME>/'
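collectstatic also needs to know where to put the collected files, so that the deploy script shown below can sync them to the bucket. A minimal sketch, assuming the default BASE_DIR variable from the generated settings.py:

# os and BASE_DIR are already defined in the generated settings.py;
# collect static files into a local "static" folder in the project root,
# which deploy.sh then rsyncs to the GCS bucket
STATIC_ROOT = os.path.join(BASE_DIR, 'static')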
Finally, I created a bash script, deploy.sh, which is useful for automating deployment:
#!/bin/bash
python manage.py migrate # apply database migrations in project
python manage.py collectstatic <<< yes # collect static files with automatic yes answer
gsutil rsync -R static gs://GCS_BUCKET_NAME # upload/sync static files folder with GCS bucket
gcloud app deploy --verbosity=debug --promote <<< Y # deploy application and automatically make it default / serving
I hope the comments are self-explanatory. If the deployment runs with no errors, you should reach your application at http://<my-app-id>.appspot.com/admin/. With the superuser credentials you can log into the Django admin. If you get an Internal Server Error, check the logs for GAE first; check all types of logs so you see logs from the application, the nginx server, the VM instance... Note: error logs from the Django app are displayed as normal logs.
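One possible way to fetch recent logs from the command line (the service name here is an assumption; adjust it to the service you deployed):

gcloud app logs read --service=default --limit=50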
Now you can proceed with creating some app :)