When somebody mentions Google Cloud Storage, probably the first things that come to your mind are buckets, blobs, Nearline, Coldline, setting ACLs for buckets and objects... but let's have a look at some cool features which are not mentioned so often. All code examples are in this github repository.
Transcoding is a feature that allows you to upload files compressed with gzip, but when they are accessed, they are served uncompressed. Practical use case: save $$$ by paying for compressed storage instead of full-size files, plus reduced transferred data.
To make this work, you need to upload the file gzip-compressed and set the file's Content-Encoding in Storage to gzip. That's all. It is also good practice to set the Content-Type accordingly.
So now, when you request the compressed file from Storage, the response is automatically decompressed.
Let's see how this can be done in Python using the client library for Google Cloud Storage.
from settings import SERVICE_JSON_FILE, BUCKET_NAME

from google.cloud import storage

client = storage.Client.from_service_account_json(SERVICE_JSON_FILE)
bucket = storage.Bucket(client, BUCKET_NAME)

compressed_file = 'test_file.txt.gz'

blob = bucket.blob(compressed_file, chunk_size=262144 * 10)
blob.content_encoding = 'gzip'
blob.upload_from_filename(compressed_file, content_type='text/plain')
blob.make_public()
print(blob.public_url)
I am using a Service Account created in IAM & Admin with the Storage Admin role to access Cloud Storage with the client library. As I mentioned before, we need to set the Content-Encoding explicitly. I am also making the file public.
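To double-check what ended up in the bucket, here is a minimal sketch (reusing the blob from the example above) that reloads the object's metadata; note that size reports the stored, i.e. compressed, size:

blob.reload()  # fetch the object's current metadata from Cloud Storage
print(blob.content_encoding)  # 'gzip'
print(blob.content_type)      # 'text/plain'
print(blob.size)              # stored size in bytes, i.e. the compressed size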
Now when I use wget to download the file from the public URL, the whole (uncompressed) content of the file is downloaded:
wget <public url>
Interestingly, the requests library downloads the file compressed and decompresses it afterwards. Since the Python library for Storage also uses requests, this method first downloads the file compressed and then decompresses it:
blob.download_to_filename('downloaded.txt')
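If you want to see this behavior directly, here is a small sketch using requests on its own; the URL is a placeholder for the public URL printed above:

import requests

url = '<public url>'  # placeholder for blob.public_url from above
response = requests.get(url)
# requests sends Accept-Encoding: gzip by default, so the server returns
# the stored compressed bytes and requests transparently decompresses them
print(response.headers.get('Content-Encoding'))  # 'gzip'
print(response.text)  # uncompressed content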
To download the file compressed, you need to set the header Accept-Encoding: gzip. With wget, for example:
wget --header="Accept-Encoding: gzip" <public url>
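The requests equivalent is to read the raw, undecoded stream instead of the decompressed body; a sketch with the same placeholder URL:

import requests

url = '<public url>'  # placeholder for blob.public_url from above
response = requests.get(url, stream=True)
# response.raw is the undecoded urllib3 stream, so the gzip bytes
# arrive exactly as stored in the bucket
compressed_bytes = response.raw.read()
with open('downloaded.txt.gz', 'wb') as f:
    f.write(compressed_bytes)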
Browsers usually send such headers, so files are downloaded compressed and then decompressed on the client, which makes serving static files another good example of this use case.
Of course, compressing/decompressing makes sense for files which compress well, like text files; images and binary files not so much, so keep that in mind. It all depends on the particular case.
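The upload example above assumes test_file.txt.gz already exists; here is a minimal sketch of producing it with Python's standard gzip module:

import gzip
import shutil

# compress test_file.txt into test_file.txt.gz, matching the
# filenames used in the upload example above
with open('test_file.txt', 'rb') as src:
    with gzip.open('test_file.txt.gz', 'wb') as dst:
        shutil.copyfileobj(src, dst)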
A detailed description can be found here: https://cloud.google.com/storage/docs/transcoding
Object Lifecycle Management, or in plain English: "do something with objects in a bucket based on date and time". Use cases for this: archiving, deleting objects or object versions, moving objects to another storage class. Rules with actions on objects are defined per bucket. Currently two actions are supported: Delete and SetStorageClass.
Conditions for actions are: age, createdBefore, isLive, matchesStorageClass and numNewerVersions.
Here is an example of setting lifecycle rules with the Python library:
from settings import SERVICE_JSON_FILE, BUCKET_NAME

from google.cloud import storage

client = storage.Client.from_service_account_json(SERVICE_JSON_FILE)
bucket = storage.Bucket(client, BUCKET_NAME)

rules = [
    {
        'action': {
            'type': 'SetStorageClass',
            'storageClass': 'COLDLINE'
        },
        'condition': {
            'age': 365
        }
    },
    {
        'action': {
            'type': 'Delete'
        },
        'condition': {
            'age': 365 * 10,
            'matchesStorageClass': ['COLDLINE']
        }
    }
]

bucket.lifecycle_rules = rules
bucket.update()
The rules set on the bucket in this example mean the following: objects older than 365 days are moved to the COLDLINE storage class, and objects in COLDLINE older than 10 years are deleted.
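To verify that the rules were applied, a quick sketch (same names as above) that fetches the bucket's current metadata and prints its lifecycle configuration:

from settings import SERVICE_JSON_FILE, BUCKET_NAME

from google.cloud import storage

client = storage.Client.from_service_account_json(SERVICE_JSON_FILE)
bucket = client.get_bucket(BUCKET_NAME)  # fetches current bucket metadata
print(bucket.lifecycle_rules)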
This feature is well suited for backups and similar use cases. An action is not executed immediately when a condition is met (all conditions on an object need to be met for the action to be executed) but asynchronously, with some time lag. More info at this link: https://cloud.google.com/storage/docs/lifecycle
Cloud Storage can keep versions of the same file. Versioning needs to be enabled at the bucket level. When versioning is enabled and a file with the same name is uploaded, a new version (or, in GCS terms, a generation) is created. As usual, this is best explained with an example.
from settings import SERVICE_JSON_FILE, BUCKET_NAME

from google.cloud import storage
from google.auth.transport.requests import AuthorizedSession
from google.oauth2 import service_account
from pprint import pprint

client = storage.Client.from_service_account_json(SERVICE_JSON_FILE)

credentials = service_account.Credentials.from_service_account_file(SERVICE_JSON_FILE)
credentials = credentials.with_scopes(['https://www.googleapis.com/auth/cloud-platform'])
authed_session = AuthorizedSession(credentials)

bucket = storage.Bucket(client, BUCKET_NAME)

# name of the file which will be created in Cloud Storage
FILENAME = 'test_versioned_file.txt'

# enable versioning for the bucket
bucket.versioning_enabled = True
bucket.update()

# upload different content 3 times under the same filename
blob = bucket.blob(FILENAME)
blob.upload_from_string('first version', content_type='text/plain')
blob.upload_from_string('second version', content_type='text/plain')
blob.upload_from_string('third version', content_type='text/plain')

# list all objects in the bucket with prefix equal to the filename,
# including all generations
blob_versions_url = "https://www.googleapis.com/storage/v1/b/{bucket}/o?versions=true&prefix={filename}".format(
    bucket=BUCKET_NAME, filename=FILENAME)

# get versions
response = authed_session.get(blob_versions_url)
data = response.json()
versions = data['items']
pprint(versions)

# get first and latest generation
first_generation_url = versions[0]['mediaLink']
last_generation_url = versions[-1]['mediaLink']

resp = authed_session.get(first_generation_url)
pprint(resp.text)

resp = authed_session.get(last_generation_url)
pprint(resp.text)

print("current content ", blob.download_as_string())
The output of the program is:
[{'bucket': 'adventures-on-gcp',
  'contentType': 'text/plain',
  'crc32c': 'ZTopYw==',
  'etag': 'CL/os/GogtYCEAE=',
  'generation': '1504211601519679',
  'id': 'adventures-on-gcp/test_versioned_file.txt/1504211601519679',
  'kind': 'storage#object',
  'md5Hash': '6eI3FXDa7C57cPqk8PHquA==',
  'mediaLink': 'https://www.googleapis.com/download/storage/v1/b/adventures-on-gcp/o/test_versioned_file.txt?generation=1504211601519679&alt=media',
  'metageneration': '1',
  'name': 'test_versioned_file.txt',
  'selfLink': 'https://www.googleapis.com/storage/v1/b/adventures-on-gcp/o/test_versioned_file.txt',
  'size': '13',
  'storageClass': 'STANDARD',
  'timeCreated': '2017-08-31T20:33:21.440Z',
  'timeDeleted': '2017-08-31T20:33:21.894Z',
  'timeStorageClassUpdated': '2017-08-31T20:33:21.440Z',
  'updated': '2017-08-31T20:33:21.440Z'},
 {'bucket': 'adventures-on-gcp',
  'contentType': 'text/plain',
  'crc32c': 'M11WNQ==',
  'etag': 'CO7YyvGogtYCEAE=',
  'generation': '1504211601894510',
  'id': 'adventures-on-gcp/test_versioned_file.txt/1504211601894510',
  'kind': 'storage#object',
  'md5Hash': '8IS+N+2E6dDSoC1NS+WXRQ==',
  'mediaLink': 'https://www.googleapis.com/download/storage/v1/b/adventures-on-gcp/o/test_versioned_file.txt?generation=1504211601894510&alt=media',
  'metageneration': '1',
  'name': 'test_versioned_file.txt',
  'selfLink': 'https://www.googleapis.com/storage/v1/b/adventures-on-gcp/o/test_versioned_file.txt',
  'size': '14',
  'storageClass': 'STANDARD',
  'timeCreated': '2017-08-31T20:33:21.820Z',
  'timeDeleted': '2017-08-31T20:33:22.355Z',
  'timeStorageClassUpdated': '2017-08-31T20:33:21.820Z',
  'updated': '2017-08-31T20:33:21.820Z'},
 {'bucket': 'adventures-on-gcp',
  'contentType': 'text/plain',
  'crc32c': 'xXXOtw==',
  'etag': 'CKns5vGogtYCEAE=',
  'generation': '1504211602355753',
  'id': 'adventures-on-gcp/test_versioned_file.txt/1504211602355753',
  'kind': 'storage#object',
  'md5Hash': 'gE7JlW9qdJW8bWNO4NE6hw==',
  'mediaLink': 'https://www.googleapis.com/download/storage/v1/b/adventures-on-gcp/o/test_versioned_file.txt?generation=1504211602355753&alt=media',
  'metageneration': '1',
  'name': 'test_versioned_file.txt',
  'selfLink': 'https://www.googleapis.com/storage/v1/b/adventures-on-gcp/o/test_versioned_file.txt',
  'size': '13',
  'storageClass': 'STANDARD',
  'timeCreated': '2017-08-31T20:33:22.295Z',
  'timeStorageClassUpdated': '2017-08-31T20:33:22.295Z',
  'updated': '2017-08-31T20:33:22.295Z'}]
'first version'
'third version'
current content  b'third version'
Briefly, to explain the code: first I update the bucket so that it has versioning enabled, then I create a blob and upload three different contents under the same name. Since the Python library for Cloud Storage currently doesn't support listing object versions, I do it with "raw" REST requests, using credentials from the service account file. Then I make a request to list all files in the bucket, using the prefix parameter as a filter to get only the versions of the related file, and I print info about the objects. Unless I missed something, there is currently no endpoint to get all versions of an object directly. I take the URLs of the first and latest generations of the file from the 'mediaLink' field, download them and print them. Finally, I download the current version of the file via the client library, which is the same as the latest generation.
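Individual generations can also be addressed directly in the JSON API via the generation query parameter. Here is a sketch (reusing authed_session and the names from the example above) that permanently deletes the oldest generation of the object:

# permanently delete the oldest generation of the object; the
# generation query parameter targets that specific version
oldest_generation = versions[0]['generation']
delete_url = "https://www.googleapis.com/storage/v1/b/{bucket}/o/{filename}?generation={generation}".format(
    bucket=BUCKET_NAME, filename=FILENAME, generation=oldest_generation)
response = authed_session.delete(delete_url)
print(response.status_code)  # 204 on success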
The practical use is obvious: whenever you need to keep versions of the same file and make them available, this is a built-in solution. There are still some gotchas, so it's better to read the more detailed info at https://cloud.google.com/storage/docs/object-versioning