On May 21st, 2017 it was (somewhat) quietly announced that Cloud Tasks now has Beta support for HTTP targets. I want to explain what that means in this article, since I think it deserves more attention, but first let's go back in history to understand the whole context.
Cloud Tasks is a product on Google Cloud Platform that manages the dispatching of asynchronous tasks. You post data to a specific Cloud Tasks queue, and the queue forwards it to a target worker, which does the processing. This is especially handy when you have a long-running task and you don't want to delay your response: instead, you hand the long-running work to a task queue, which forwards it to a worker.
The history of Cloud Tasks starts within App Engine. Since App Engine Standard (the oldest product on GCP) has a 60-second response limit, there was a need for a service that could handle longer requests. This was done through Task Queues, a service integrated within App Engine. In recent years, as part of the effort to decouple the original App Engine services into separate products, Cloud Tasks was introduced with initial support for App Engine targets only, which still limited the service to App Engine usage.
From now on, however, Cloud Tasks finally supports any HTTP target, not just App Engine (and not just Google Cloud). This, of course, expands its usability, since now basically any URL can be used as a worker.
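To make the worker side concrete, here is a minimal sketch of what a worker's request handling could look like. The names `process` and `handle_task` and the `url` field are hypothetical and chosen for illustration; the only convention relied on is that a 2xx status tells the queue the task succeeded, while any other status marks it as failed.

```python
import json
from typing import Tuple


def process(payload: dict) -> None:
    """Hypothetical long-running job; raises an exception on failure."""
    if "url" not in payload:
        raise ValueError("payload is missing 'url'")
    # ... the slow work (download, transform, upload) would happen here ...


def handle_task(body: bytes) -> Tuple[int, str]:
    """Turn a task request body into an HTTP status code and a message."""
    try:
        payload = json.loads(body)
        process(payload)
    except Exception as exc:
        # A non-2xx status signals failure to the queue.
        return 500, f"failed: {exc}"
    return 200, "ok"
```

In a real deployment this logic would sit behind whatever HTTP framework or serverless runtime hosts the worker.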
Cloud Tasks has some nice features that should make you consider using it, in case you haven't so far:
- retry with exponential backoff. If the task worker doesn't return a 2xx response, the task is retried up to a predefined number of attempts.
- pause/resume of task execution. For example, if you deploy a new version of the worker code and it doesn't work as expected, you can pause task execution.
- rate limiting. You can set the maximum number of tasks dispatched from a queue per second, or the maximum number of tasks that can be dispatched concurrently. This is handy when you need to throttle task processing. For example, if a third-party API limits how many requests per second you can make, these parameters let you easily control the execution rate.
- postponed task dispatch. When you create a task, it's dispatched for processing as soon as possible. You can, however, set a date in the future when the task should be executed. I admit I didn't find out whether there is a limit on how far into the future you can postpone dispatching.
- only push tasks are supported. App Engine Task Queues supported both push and pull, but Cloud Tasks supports only push.
- tasks can be created with client libraries for specific languages as well as through the REST API.
- price. 1 million operations per month are free; after that, every 5 million cost $0.40.
- a web UI with information about queues and processed tasks.
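To give a feel for what retrying with exponential backoff means in practice, here is a small sketch that computes the waiting times between attempts. It uses plain capped doubling and is not the exact schedule Cloud Tasks applies internally; the function name and the default values are assumptions made for illustration.

```python
from typing import List


def backoff_delays(max_attempts: int,
                   min_backoff: float = 1.0,
                   max_backoff: float = 60.0) -> List[float]:
    """Seconds to wait before each retry, doubling until capped.

    Illustrative only: plain capped doubling, not the exact schedule
    Cloud Tasks uses internally.
    """
    retries = max(max_attempts - 1, 0)  # the first attempt is not a retry
    return [min(min_backoff * 2 ** i, max_backoff) for i in range(retries)]


print(backoff_delays(5, min_backoff=1.0, max_backoff=10.0))
```

With `max_attempts=1` the list is empty, so a failed task is not retried at all.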
The other distributed messaging platform on Google Cloud is PubSub. Of course, there are differences as well as similarities. I won't go into them, because there is a very detailed comparison of Cloud Tasks and Cloud PubSub at https://cloud.google.com/tasks/docs/comp-pub-sub in case you need to decide which one to use. It can be said that Cloud Tasks offers a subset of PubSub's functionality. PubSub is more event-driven, whereas Cloud Tasks has more of a web/backend focus. Again, check the URL for a better comparison.
Since Cloud Tasks is a managed service, it plays very nicely with other managed/serverless products like Cloud Functions, Cloud Run, App Engine, etc.
To start with Cloud Tasks, you need to create at least one queue with the Google Cloud SDK (gcloud). It's important to be aware that when you create a queue for the first time, gcloud prompts you to create an App Engine application as well, since at the moment Cloud Tasks is tied to the region where App Engine is set up for your project. This can't be changed for the project. So Cloud Tasks lives in the same region as your App Engine application (even if you haven't deployed any App Engine application and don't plan to).
This is what creating the first queue looks like. With this command I am creating the queue "datasets-queue" with a maximum of 10 concurrent dispatches and just one execution attempt. The default maximum number of attempts is 100, so for development/testing it's useful to set it to 1 or a similarly low value, because otherwise a failing task would keep being retried up to 100 times (which can stretch over the whole day). When using Cloud Tasks through gcloud, the Cloud Tasks API needs to be enabled (which can be done during queue creation).
    gcloud tasks queues create datasets-queue --max-concurrent-dispatches 10 --max-attempts 1
    API [cloudtasks.googleapis.com] not enabled on project [301020687502].
    Would you like to enable and retry (this will take a few minutes)? (y/N)? y
    Enabling service [cloudtasks.googleapis.com] on project [301020687502]...
    Waiting for async operation operations/acf.12adf55c-d1e1-40c2-ab93-00e534eaff57 to complete...
    Operation finished successfully. The following command can describe the Operation details:
     gcloud services operations describe operations/tmo-acf.12adf55c-d1e1-40c2-ab93-00e534eaff57
    There is no App Engine app in project [cz-open-data].
    Would you like to create one (Y/n)?
    You are creating an app for project [cz-open-data].
    WARNING: Creating an App Engine application for a project is irreversible and the region
    cannot be changed. More information about regions is at
    <https://cloud.google.com/appengine/docs/locations>.
    Please choose the region where you want your App Engine application located:
     [1] asia-east2 (supports standard and flexible)
     [2] asia-northeast1 (supports standard and flexible)
     [3] asia-northeast2 (supports standard and flexible)
     [4] asia-south1 (supports standard and flexible)
     [5] australia-southeast1 (supports standard and flexible)
     [6] europe-west (supports standard and flexible)
     [7] europe-west2 (supports standard and flexible)
     [8] europe-west3 (supports standard and flexible)
     [9] europe-west6 (supports standard and flexible)
     [10] northamerica-northeast1 (supports standard and flexible)
     [11] southamerica-east1 (supports standard and flexible)
     [12] us-central (supports standard and flexible)
     [13] us-east1 (supports standard and flexible)
     [14] us-east4 (supports standard and flexible)
     [15] us-west2 (supports standard and flexible)
     [16] cancel
    Please enter your numeric choice: 8
    Creating App Engine application in project [cz-open-data] and region [europe-west3]....done.
    WARNING: You are managing queues with gcloud, do not use queue.yaml or queue.xml in the future.
    More details at: https://cloud.google.com/cloud-tasks/docs/queue-yaml.
    Created queue [datasets-queue].
For one of my personal projects, I am collecting Czech Open Data and uploading it into BigQuery, and Cloud Tasks is a perfect tool for this: I can dispatch tasks to it and then have them executed in parallel. In practice, that means that as input I have a list of URLs with some other metadata, which are encoded and sent to Cloud Tasks. The main download/upload work is done in a Cloud Function, which is used as the HTTP target. I've decided to use Cloud Functions because they have a concurrency of 1: each function instance handles one request at a time, so every request gets its own instance and with it its own working memory (2GB max). I need as much memory as possible, since I am downloading each file into memory.
Cloud function does the following:
I will not include the code for the Cloud Function since it's a bit lengthy, but here is the code I use to create tasks.

    import json
    from typing import Dict

    from google.cloud import tasks_v2beta3

    # GCP_PROJECT, CF_URL and SERVICE_ACCOUNT_EMAIL are constants defined elsewhere.
    tasks = tasks_v2beta3.CloudTasksClient()
    tasks_parent = tasks.queue_path(GCP_PROJECT, 'europe-west3', 'datasets-queue')


    def create_task(data: Dict[str, str]):
        # Describe an HTTP POST to the Cloud Function, authenticated with an OIDC token.
        task_data = {
            'http_request': {
                'http_method': 'POST',
                'url': CF_URL,
                'body': json.dumps(data).encode(),
                'oidc_token': {'service_account_email': SERVICE_ACCOUNT_EMAIL}
            }
        }
        return tasks.create_task(tasks_parent, task_data)
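Since tasks can also be created through the REST API and can be postponed, here is a hedged sketch of how an equivalent request body might be assembled with the standard library only. The function name `build_task_body` and the example values are hypothetical; the field names (`httpRequest`, `oidcToken`, `scheduleTime`) reflect my understanding of the REST representation, where the body is base64-encoded and the schedule time is an RFC 3339 UTC timestamp, so check them against the API reference before relying on them.

```python
import base64
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def build_task_body(url: str,
                    data: Dict[str, str],
                    service_account_email: str,
                    delay_seconds: int = 0,
                    now: Optional[datetime] = None) -> dict:
    """Build a JSON-serializable request body for creating one HTTP task.

    `now` must be timezone-aware; it exists mainly to make testing easy.
    """
    task = {
        "httpRequest": {
            "httpMethod": "POST",
            "url": url,
            # Bytes fields travel as base64 text in JSON.
            "body": base64.b64encode(json.dumps(data).encode()).decode("ascii"),
            "oidcToken": {"serviceAccountEmail": service_account_email},
        }
    }
    if delay_seconds > 0:
        start = now or datetime.now(timezone.utc)
        run_at = (start + timedelta(seconds=delay_seconds)).astimezone(timezone.utc)
        task["scheduleTime"] = run_at.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"task": task}
```

Leaving `delay_seconds` at 0 omits the schedule time, which corresponds to dispatching the task as soon as possible.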
An important part of creating task workers is securing them, i.e. allowing only authorized callers and thereby preventing unwanted access. Cloud Run and App Engine have supported this from the start, whereas for Cloud Functions, controlling access via IAM accounts is still in alpha. I found out about that possibility by reading this article. The process of securing a Cloud Function is the following:
Create a service account
    gcloud iam service-accounts create tasks-creator --display-name="Creates tasks for upload"
    Created service account [tasks-creator].
Grant the invoker role to the created service account on the Cloud Function which will be used as the task worker.

    ~> gcloud alpha functions add-iam-policy-binding upload-dataset --member serviceAccount:[email protected] --role roles/cloudfunctions.invoker --region europe-west1 --project cz-open-data
    bindings:
    - members:
      - serviceAccount:[email protected]
      - allUsers
      role: roles/cloudfunctions.invoker
    etag: BwWKA8atmFg=
    version: 1
As can be seen from the response, the binding also contains the member "allUsers", which means that anybody can invoke the function, so I need to remove it.

    ~> gcloud alpha functions remove-iam-policy-binding upload-dataset --member allUsers --role roles/cloudfunctions.invoker --region europe-west1 --project cz-open-data
    bindings:
    - members:
      - serviceAccount:[email protected]
      role: roles/cloudfunctions.invoker
Now only tasks-creator can execute this function.
In the above code for task creation, I am passing the service account email as "oidc_token", in this case "[email protected]". With this, Cloud Tasks will create the appropriate JWT token, which the Cloud Function (or some other Google Cloud service) will automatically decode and authenticate, which is awesome. Note that you don't have to download a key file and use it to initialize the client or anything similar; you just need to pass the appropriate email address.
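To illustrate what such a JWT carries, here is a small sketch that reads the payload segment of a token without verifying its signature. It is meant for inspection only and is not a replacement for the verification the platform performs; the sample token and its `email` and `aud` values are made up for the example.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are base64url without padding, so restore the padding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (sample data only)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# A made-up, unsigned sample token used only to exercise the decoder.
sample_token = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT"}),
    _b64url({"email": "worker@example.com", "aud": "https://example.com/fn"}),
    "signature",
])
claims = decode_jwt_payload(sample_token)
```

Because the signature is never checked here, the decoded claims must not be trusted for access decisions.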
The web UI provides a nice interface with useful information like queue settings or details of task executions.
I hope this article inspires you to start using Cloud Tasks (if you haven't already).