I'm using Airflow (GCP Composer).
I know it has a GCS hook and that I can download GCS files with it, but I'd like to read a file only partially.
Can I use the following Python logic with a PythonOperator in a DAG?
from google.cloud import storage

def my_func():
    client = storage.Client()
    bucket = client.get_bucket("mybucket")
    blob = bucket.get_blob("myfile")
    data = blob.download_as_bytes(end=100)
    return data
Is calling the client API directly, without using hooks, forbidden in an Airflow task?
You can, but a more Airflow-idiomatic way to handle functionality missing from the hook is to extend the hook:
from airflow.providers.google.cloud.hooks.gcs import GCSHook

class MyGCSHook(GCSHook):
    def download_bytes(
        self,
        bucket_name: str,
        object_name: str,
        end: int,
    ) -> bytes:
        client = self.get_conn()
        bucket = client.bucket(bucket_name)
        blob = bucket.blob(blob_name=object_name)
        return blob.download_as_bytes(end=end)
Then you can use the hook function in PythonOperator or in a custom operator.
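For example, a minimal sketch (assuming Airflow 2.x; the DAG id, task id, bucket, and object names are placeholders, and MyGCSHook from above is assumed to be importable) of calling the extended hook from a PythonOperator:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def read_first_bytes():
    # Uses the default google_cloud_default connection; reads only the first 100 bytes.
    hook = MyGCSHook()
    data = hook.download_bytes(bucket_name="mybucket", object_name="myfile", end=100)
    print(f"read {len(data)} bytes")


with DAG(
    dag_id="gcs_partial_read_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    read_head = PythonOperator(task_id="read_head", python_callable=read_first_bytes)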
Note that GCSHook has a download function, as you mention.
What you may have missed is that if you don't provide a filename, it downloads the object as bytes (see the source code). It doesn't let you configure the end parameter as you expect, but that would be an easy fix to PR to Airflow if you are looking to contribute to Airflow open source.
I am generally new to programming, with a large capacity for self-learning. After taking Harvard's CS50 program, I have found myself unable to use a database in Flask (using Python).
I have created the database with SQLite and have it saved to my work environment, with a copy in external folders.
I would rather not use Flask-SQLAlchemy, as I am 100% comfortable in SQL and don't want to defamiliarize myself with its basic usage.
I am using Visual Studio and have Flask properly installed and already in use to define working routes.
The database is named trial.db and can be assumed to be properly set up.
import os

from cs50 import SQL
from flask import Flask, flash, redirect, render_template, request, session, jsonify
from flask_session import Session
from tempfile import mkdtemp
from werkzeug.exceptions import default_exceptions, HTTPException, InternalServerError
from werkzeug.security import check_password_hash, generate_password_hash

from helpers import apology, login_required, lookup, usd

app = Flask(__name__)
app.config["TEMPLATES_AUTO_RELOAD"] = True

@app.after_request
def after_request(response):
    response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate"
    response.headers["Expires"] = 0
    response.headers["Pragma"] = "no-cache"
    return response

app.jinja_env.filters["usd"] = usd

app.config["SESSION_FILE_DIR"] = mkdtemp()
app.config["SESSION_PERMANENT"] = False
app.config["SESSION_TYPE"] = "filesystem"
Session(app)

db = SQL("sqlite:///finance.db")
**The above code is what I am used to writing, based on CS50's library, which is excessively generous.**
SQL, as above, is used like so:

cs50.SQL(url)

Parameters: url – a str that indicates the database dialect and connection arguments
Returns: a cs50.SQL object that represents a connection to a database

Example usage:

db = cs50.SQL("sqlite:///file.db")  # For SQLite, file.db must exist
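For illustration, here is a minimal sketch of pointing that same object at trial.db and running plain SQL from a Flask route (it assumes the cs50 package is installed and that trial.db contains a hypothetical users table):

from cs50 import SQL
from flask import Flask, jsonify

app = Flask(__name__)
db = SQL("sqlite:///trial.db")  # plain SQL, no Flask-SQLAlchemy

@app.route("/users")
def users():
    # db.execute returns a list of dicts, one per row, for SELECT statements
    rows = db.execute("SELECT * FROM users")
    return jsonify(rows)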
Please help, and thank you.
The FastAPI documentation recommends using lru_cache-decorated functions to retrieve the config file. That makes sense, to avoid repeated I/O when reading the env file.
config.py

from pydantic import BaseSettings

class Settings(BaseSettings):
    app_name: str = "Awesome API"
    admin_email: str
    items_per_user: int = 50

    class Config:
        env_file = ".env"
and then, in other modules, the documentation implements a function that gets the settings:
# module_omega.py
from functools import lru_cache

from . import config

@lru_cache()
def get_settings():
    return config.Settings()

settings = get_settings()
print(settings.ENV_VAR_ONE)
I am wondering whether this method is better practice, or advantageous, compared to just initializing a settings object in the config module and then importing it, like below.
# config.py
from pydantic import BaseSettings

class Settings(BaseSettings):
    app_name: str = "Awesome API"
    admin_email: str
    items_per_user: int = 50

    class Config:
        env_file = ".env"

settings = Settings()

# module_omega.py
from .config import settings

print(settings.ENV_VAR_ONE)
I realize it's been a while since you asked, and though I agree with the commenters that these can be functionally equivalent, I can point out another important difference that I think motivates the use of @lru_cache.
What the @lru_cache approach can help with is limiting the amount of code that is executed when the module is imported.
settings = Settings()
By doing this, as you suggested, you are exporting an instance of your settings, which means you are transitively executing any code needed to create your settings immediately when your module is imported.
While module exports are cached similarly to how @lru_cache caches, you don't have as much control over deferring the loading of your settings, since in Python we typically place our imports at the top of a file.
The @lru_cache technique is especially useful if you have more expensive settings, like reading the filesystem or going to the network. That way you can defer loading your settings until you actually need them.
from . import get_settings

def do_something_with_deferred_settings():
    print(get_settings().my_setting)

if __name__ == "__main__":
    do_something_with_deferred_settings()
Other things to look into:
@cache in Python 3.9+ instead of @lru_cache.
Module __getattr__ doesn't add anything here IMO, but it can be useful when working with dynamism and the import system; a small sketch follows below.
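For completeness, a minimal sketch (my own variation, not from the FastAPI docs) of how a module-level __getattr__ (PEP 562, Python 3.7+) lets config.settings be built lazily on first attribute access rather than when the module is imported:

# config.py
from functools import lru_cache

from pydantic import BaseSettings


class Settings(BaseSettings):
    app_name: str = "Awesome API"

    class Config:
        env_file = ".env"


@lru_cache()
def get_settings() -> Settings:
    return Settings()


def __getattr__(name):
    # Called only for names not found in the module (PEP 562), so
    # Settings() is not constructed until config.settings is first accessed.
    if name == "settings":
        return get_settings()
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

With this, a plain "import config" does no settings work; the first read of config.settings triggers get_settings(), and @lru_cache keeps subsequent reads cheap.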
I am looking for a tutorial or document on how to access Datastore using Cloud Functions (Python).
However, it seems there is only a tutorial for Node.js:
https://github.com/GoogleCloudPlatform/nodejs-docs-samples/tree/master/functions/datastore
Can anybody help me out?
Thanks
There is no special setup needed to access Datastore from Cloud Functions in Python.
You just need to add google-cloud-datastore to requirements.txt and use the Datastore client as usual.
requirements.txt
# Function dependencies, for example:
# package>=version
google-cloud-datastore==1.8.0
main.py
from google.cloud import datastore

datastore_client = datastore.Client()

def foo(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values...
    """
    query = datastore_client.query(kind=<KindName>)
    data = query.fetch()
    for e in data:
        print(e)
Read more:
Python Client for Google Cloud Datastore
Setting Up Authentication for Server to Server Production Applications
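Writing works the same way through the client. A minimal sketch (the "Task" kind and its properties are hypothetical, for illustration only):

from google.cloud import datastore

client = datastore.Client()

def save_task(request):
    # Entity with a partial key: Datastore assigns a numeric id on put().
    entity = datastore.Entity(key=client.key("Task"))
    entity.update({"description": "buy milk", "done": False})
    client.put(entity)
    return "saved"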
I am trying out aiohttp (to test it against Flask, and just to learn it) and am having an issue with passing data via the Application. The examples say that I can set a key-value in the app in order to pass static info (e.g., a database connection), but somehow this information is getting lost, and I suspect the nested applications are the cause, though I'm not sure.
app.py:
import asyncio
from aiohttp import web
import logging

from data import data_handler
from data import setup_web_app as data_setup_web_app

logging.basicConfig()
log = logging.getLogger('data')
log.setLevel(logging.DEBUG)

async def my_web_app():
    loop = asyncio.get_event_loop()
    app = web.Application(loop=loop)
    app['test'] = 'here'
    data_setup_web_app(web, app)
    return app
data.py:
from aiohttp import web
import logging

logging.basicConfig()
log = logging.getLogger('data')
log.setLevel(logging.DEBUG)

def setup_web_app(web, app):
    data = web.Application()
    data.add_routes([web.get('/{name}', data_handler, name='data')])
    app.add_subapp('/data/', data)

async def data_handler(request):
    name = request.match_info['name']
    log.debug('test data is {}'.format(request.app['test']))
    return web.json_response({'handler': name})
And I am using gunicorn to run it: gunicorn app:my_web_app --bind localhost:8080 --worker-class aiohttp.worker.GunicornWebWorker --workers=2
But when I go to http://127.0.0.1:8080/data/asdf in the browser I get a KeyError: 'test' in the data.py debug print statement.
I suspect the app data is not being passed through correctly to the nested applications, but not sure.
Currently, keys from the main app are not visible from the sub-app and vice versa.
Please read the issue for more details.
I'd like to support a kind of chained map for this, but the feature is not implemented yet.
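In the meantime, one workaround (a sketch based on the code in the question, not an aiohttp feature) is to copy the shared key onto the sub-app when it is created, so request.app['test'] resolves inside the sub-app's handlers:

# data.py (workaround sketch)
def setup_web_app(web, app):
    data = web.Application()
    data['test'] = app['test']  # mirror the parent app's key on the sub-app
    data.add_routes([web.get('/{name}', data_handler, name='data')])
    app.add_subapp('/data/', data)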
I am working on an Ionic 3 project with TypeScript to integrate Firebase into my app.
I used the code below to integrate Firebase with the Ionic project:
constructor(angFire: AngularFireDatabase){
}
books: FirebaseListObservable<any>;
To send data from my app to Firebase, I used the push method, and to update entries I used update($key). Now I have all the data in the Firebase backend.
Now, how can I sync the Firebase database with Google Sheets so that every entry added to the Firebase backend gets updated in the sheet? I used a third-party service, Zapier, for this integration, but it would be nice to learn how to do this sync on my own.
From searching around, there are many tutorials for getting data from Google Sheets into Firebase, but I didn't come across any for the other direction.
I followed the tutorial below, but it doesn't cover spreadsheets.
https://sites.google.com/site/scriptsexamples/new-connectors-to-google-services/firebase
Any help would be greatly appreciated!
I looked into importing Firebase directly into Google Scripts, either through the JavaScript SDK or the REST API. Both have requirements/steps that Google Scripts cannot satisfy, or that are extremely difficult to satisfy.
There is no foreseeable way to load the JavaScript SDK inside a Google Script, because almost every method requires a DOM, which you don't have with a Google Sheet.
The REST API requires GoogleCredentials which, at a short glance, appear very difficult to obtain inside Google Scripts as well.
So the other option is to interact with Firebase in a true server-side environment. This would be a lot of code, but here are the steps I would take:
1) Set up a Pyrebase project so you can interact with your Firebase project via Python.
import pyrebase

config = {
    "apiKey": "apiKey",
    "authDomain": "projectId.firebaseapp.com",
    "databaseURL": "https://databaseName.firebaseio.com",
    "storageBucket": "projectId.appspot.com",
    "serviceAccount": "path/to/serviceAccountCredentials.json"
}

firebase = pyrebase.initialize_app(config)
...
db = firebase.database()
all_users = db.child("users").get()
2) Set up a Google Sheets API project as a Python class that can interact with your Google Sheet.
from __future__ import print_function
import httplib2
import os
import time

from apiclient import discovery
from oauth2client import client
from oauth2client import tools
from oauth2client.file import Storage

try:
    import argparse
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
except ImportError:
    flags = None

# If modifying these scopes, delete your previously saved credentials
# at ~/.credentials/sheets.googleapis.com-python-quickstart.json
# Note: the write() method below needs full (not read-only) spreadsheet access.
SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'


class GoogleSheets:
    ...
    # The rest of the functions from that link can go here
    ...

    def write(self, sheet, sheet_name, row, col):
        """
        Write data to the specified Google Sheet.
        """
        if sheet is None or sheet == "":
            print("Sheet not specified.")
            return

        day = time.strftime("%m/%d/%Y")
        clock = time.strftime("%H:%M:%S")
        datetime = day + " - " + clock
        values = [[datetime]]

        spreadsheetId = sheet
        rangeName = sheet_name + "!" + str(row) + ":" + str(col)
        body = {
            'values': values
        }

        credentials = self.get_credentials()
        http = credentials.authorize(httplib2.Http())
        discoveryUrl = ('https://sheets.googleapis.com/$discovery/rest?'
                        'version=v4')
        service = discovery.build('sheets', 'v4', http=http,
                                  discoveryServiceUrl=discoveryUrl)
        result = service.spreadsheets().values().update(
            spreadsheetId=spreadsheetId, range=rangeName,
            valueInputOption="RAW", body=body).execute()
3) Call the GoogleSheets class somewhere inside your Pyrebase project.
from GoogleSheets import GoogleSheets
...
g = GoogleSheets()
g.write(<project-id>, <sheet-name>, <row>, <col>)
...
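To make the flow concrete, here is a rough glue-script sketch (not part of the original steps; the spreadsheet id, sheet name, and users kind are hypothetical, and write() as defined above only records a timestamp, so you would extend it to take the record's fields):

# sync.py — ties step 1 (Pyrebase) to step 3 (GoogleSheets)
import pyrebase
from GoogleSheets import GoogleSheets

config = {
    "apiKey": "apiKey",
    "authDomain": "projectId.firebaseapp.com",
    "databaseURL": "https://databaseName.firebaseio.com",
    "storageBucket": "projectId.appspot.com",
    "serviceAccount": "path/to/serviceAccountCredentials.json"
}
firebase = pyrebase.initialize_app(config)
db = firebase.database()

sheets = GoogleSheets()
row = 2  # leave row 1 for headers
for user in db.child("users").get().each():
    print("syncing", user.key(), user.val())
    # write() currently stamps the time into the given range;
    # adapt it to also write user.val() as cell values
    sheets.write("my-spreadsheet-id", "Sheet1", row, row)
    row += 1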
4) Set up a cron job to run the python script every so often
# every 2 minutes
*/2 * * * * /root/my_projects/file_example.py
You will need some basic server (Heroku, Digital Ocean) to run this.
This is not exhaustive, since there is a lot of code to be written, but it should get the basics done. Makes me want to make a package now.
You can go for Zapier, a third-party service through which you can easily integrate Firebase and Google Sheets in either direction. It also has some support for Google Docs and other features.
https://zapier.com/zapbook/firebase/google-sheets/
Firebase can't be used as a trigger in Zapier, only as an action, so you can't send data from it to Google Sheets.