How to hide the query rendering in Airflow (hide secrets from logs)

I have a query in Snowflake that shares data to AWS S3, and I have to enter the access keys in the query.
How can I hide the query rendering in Airflow so the keys don't show up in the logs?
CREATE OR REPLACE STAGE MY_STAGE
  url = 's3://my_bucket/others'
  credentials = (
    aws_key_id = 'XXXXXXXXXXXXXXXXXXXX'
    aws_secret_key = 'DFgsdFGSdfgAqTRjfFGHJ343'
    aws_token = 'hsfdjfhksdfhskdfhsdkjfhiauowqegkhbHSALDfkshdfisuhiqwuger8748sf0!$#%FGH#$%'
  )

Moving Simon's answer from a comment to an answer, for closure:
Create your stage beforehand, or create it with a storage integration:
https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration.html
If you create the stage beforehand, you can just reference it by name, without needing to put the credentials in the query again.
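For example, a rough sketch of what the Airflow side could look like once the stage exists (the connection id, stage, integration, and table names below are placeholders, not from the original question); only the stage name appears in the rendered SQL, so no keys ever show up in the task logs:

from datetime import datetime

from airflow import DAG
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Assumes the stage was created once, outside of Airflow, e.g. with a storage integration:
#   CREATE STAGE MY_STAGE
#     URL = 's3://my_bucket/others'
#     STORAGE_INTEGRATION = MY_S3_INTEGRATION;
with DAG("unload_to_s3", start_date=datetime(2023, 1, 1), schedule_interval=None) as dag:
    unload = SnowflakeOperator(
        task_id="unload",
        snowflake_conn_id="my_snowflake_conn",  # placeholder connection id
        # Only the stage name is rendered into the query -- no credentials in the logs.
        sql="COPY INTO @MY_STAGE FROM MY_TABLE FILE_FORMAT = (TYPE = CSV)",
    )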

Related

How to set a field for every document in a Cosmos db?

What would a Cosmos stored procedure look like that would set the PumperID field for every record to a default value?
We need to do this to repair some data, so the procedure would visit every record that has a PumperID field (not all docs have one) and set it to a default value.
Assuming a one-time data maintenance task, arguably the simplest solution is to create a single-purpose .NET Core console app and use the SDK to query for the items that require changes and perform the updates. I've used this approach to rename properties, for example. This works for any Cosmos database and doesn't require deploying any stored procs or anything else.
Ideally, it is designed to be idempotent so it can be run multiple times if several passes are required to catch new data coming in. If the item count is large, you can optionally use the SDK operations to scale up throughput on start and scale back down when finished. For performance, run it close to the endpoint, on an Azure Virtual Machine or Function.
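As a rough illustration of that pattern (sketched here with the Python SDK, azure-cosmos, rather than .NET; the endpoint, key, database, container, and default value are all placeholders):

from azure.cosmos import CosmosClient

# Placeholder account details.
client = CosmosClient("https://myaccount.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycontainer")

# Only touch items that actually carry a PumperID; skipping ones already at the
# default value keeps the script idempotent, so it can be re-run safely.
query = "SELECT * FROM c WHERE IS_DEFINED(c.PumperID) AND c.PumperID != 'DEFAULT'"
for item in container.query_items(query=query, enable_cross_partition_query=True):
    item["PumperID"] = "DEFAULT"
    container.replace_item(item=item["id"], body=item)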
For scenarios where you want to iterate through every item in a container and update a property, the best means to accomplish this is to use the Change Feed Processor and run the operation in an Azure function or VM. See Change Feed Processor to learn more and examples to start with.
With Change Feed you will want to start it to read from the beginning of the container. To do this see Reading Change Feed from the beginning.
Then within your delegate you will read each item off the change feed, check its value, and call ReplaceItemAsync() to write it back if it needs to be updated.
static async Task HandleChangesAsync(IReadOnlyCollection<MyType> changes, CancellationToken cancellationToken)
{
    Console.WriteLine("Started handling changes...");
    foreach (MyType item in changes)
    {
        if (item.PumperID == null)
        {
            item.PumperID = "some value";
            // call ReplaceItemAsync(), etc.
        }
    }
    Console.WriteLine("Finished handling changes.");
}

Firebase Cloud Functions onDelete - How to access parent's information?

So, from Firebase functions, I'm listening to this event -
exports.populateVairations_delete =
    functions.database.ref('/parentA/parentB/child').onDelete(event => {
        // I know how to get the previous value for what I'm listening to...
        const val = event.data.previous.val();
        ...
    });
This function is also invoked when deleting the parent, which is exactly what I want.
But when deleting a parent, how do I access data from /parentA before it's being deleted?
onDelete triggers are always executed after the delete has occurred. There's no way to prevent a delete from happening with a function. Your onDelete code will be delivered an event that contains only the data that was deleted. The event object itself can't be used to see other parts of the database.
If you need to access other parts of the database inside a database trigger, you can use the Admin SDK to make those queries. There is a lot of official sample code that illustrates how to do this.
With context.resource.name you could get a string containing the data path.
Just use the Admin SDK for Firebase. It gives you administrator access to the Firebase DB, and from there you can do basically anything with the database.
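As a small illustration of what "query other parts of the database with the Admin SDK" can look like (sketched with the Python firebase_admin package; inside a Node Cloud Function the admin.database() API works the same way, and the service-account file and database URL below are placeholders):

import firebase_admin
from firebase_admin import credentials, db

# Placeholder credentials and database URL.
cred = credentials.Certificate("service-account.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://my-project.firebaseio.com"})

# Admin access is not limited to the path that fired the trigger:
# any other part of the tree can be read, e.g. the data under /parentA.
parent_a = db.reference("/parentA").get()
print(parent_a)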

Optimize Firebase database design

I am having trouble designing the database of my app. In the app users are allowed to create jobs and then using GeoFire I find people nearby.
This is my design for the jobs so far:
As you can see, there are the users and then the workers. After pushing the new job to the user's Unique ID (UID) under serviceUsers, I then use GeoFire to find the workerUsers that are nearby. I then push the jobs into the UIDs of the workerUsers.
Now here are my questions:
I am basically creating copies of these jobs. Once for the person who created it (under serviceUsers) and once for every nearby workerUsers.
Is this inefficient? Should I rather pass some kind of pointer instead of the whole job object to the nearby users?
And here is the more important question: if the design is fine as it is, how would I go about it when the creator of the job deletes it? I would then need to find each job in workerUsers and delete the job with the job's UID. Does Firebase support queries for this?
Thank you very much in advance!
I am basically creating copies of these jobs. Once for the person who
created it (under serviceUsers) and once for every nearby workerUsers.
Is this inefficient? Should I rather pass some kind of pointer instead
of the whole job object to the nearby users?
Every job should have a UUID which can act as a "pointer" (I'd rather call it a key). Then every user should include a job UUID, not a whole copy, so you can refer to it. I won't completely replicate your use case, but you should get the idea.
{
  users: {
    exampleUserId: {
      jobs: ['exampleUUID']
    }
  },
  jobs: {
    exampleUUID: {
      name: 'awesome job'
    }
  }
}
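For instance, writing the job once and fanning out only its key could look roughly like this (a sketch with the Python Admin SDK; the UIDs are placeholders, and the key is stored as a child key rather than an array index, which is the usual Realtime Database idiom):

from firebase_admin import db  # assumes firebase_admin.initialize_app(...) was already called

# Write the job once under /jobs; push() generates its unique key.
job_ref = db.reference("jobs").push({"name": "awesome job"})
job_key = job_ref.key

# Fan out only the key to the creator and to each nearby worker.
for uid in ["serviceUserUid", "workerUserUid1", "workerUserUid2"]:
    db.reference("users/{}/jobs/{}".format(uid, job_key)).set(True)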
If the design is fine as it is, how would I go on about when the
creator of the job deletes it? I would then need to find each job in
workerUsers and delete the job with the jobs UID. Does Firebase
support queries for this?
It does support it, but you should implement my suggestion from above to do it in a sane way. After that, you can create a cloud function whose job is essentially: "When a job with a given UUID is removed, go through every user and remove the reference to it if it exists."
exports.checkReferences = functions.database.ref('/jobs/{uuid}').onWrite(event => {
    // check information here
    if (!event.data.val()) {
        // job was removed! get its uuid and iterate through users and remove the uuid from them
    }
});
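The body of that function boils down to a walk over /users; here is the same logic sketched with the Python Admin SDK for illustration (the Node admin.database() API offers equivalent calls, and the paths follow the structure above):

from firebase_admin import db  # assumes firebase_admin.initialize_app(...) was already called

def remove_job_references(job_key):
    # Visit every user and delete any reference to the removed job.
    users = db.reference("users").get() or {}
    for uid, user in users.items():
        jobs = (user or {}).get("jobs") or {}
        # jobs may be a keyed map or an array-style list; handle both.
        entries = jobs.items() if isinstance(jobs, dict) else enumerate(jobs)
        for child_key, value in list(entries):
            if child_key == job_key or value == job_key:
                db.reference("users/{}/jobs/{}".format(uid, child_key)).delete()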

What is the best practice to update a Vertex after it is detached from the DB with Tinkerpop Frames?

Let me give an example:
I receive a Vertex with Tinkerpop Blueprints, then I use Frames to convert it into an entity.
I close the database (so from now on the node is detached from the DB)
and I show the node on a web page to let the user modify it.
The user makes some modifications, then I should persist the changes.
The problem is that the instance of the database is already closed, so the entity is detached from the database: what is the best practice (considering performance and memory usage too) to update the node?
This may be the code example:
FramedGraph<OrientGraph> graph = factory.getFramedGraph();
User user = graph.addVertex(null, User.class);
graph.shutdown();
then I want to update later the node:
user.name = "Donald Duck";
user.... ?
Thank you,
Andrea
I found this way, which seems quite efficient:
public User persistUser(User user){
    FramedGraph<OrientGraph> graph = factory.getFramedGraph();
    user = graph.frame(user.asVertex(), User.class);
    factory.persist();
    graph.shutdown();
    return user;
}
So the framework automatically merges the entity back into the database; then you have to persist.

Salesforce Batch Apex Class - Querying Against Large Data Sets

I have a batch Apex class where I'm building collections of websites and emails, so that I can use those collections to filter other queries which will be made into collections. With all collections set, I want to run through a final loop of the scope to perform business processes.
Mockup:
for(Object o : scope)
{
    listEmails.add(o.Email);
    listWebsites.add(o.Website);
}
Map<String, Account> accounts = Gather all accounts where website not in :listWebsites; //Website is key
Map<String, Contact> contacts = Gather all contacts where email not in :listEmails; //Email is key
for(Object o : scope)
{
    Account a = accounts.get(o.Website);
    Contact c = contacts.get(o.Email);
    // Perform business logic here
}
The problem is that when I run this batch it stays processing for hours. It works fine with a rather small database, but in a larger environment perhaps this is not the best solution.
Can anyone help me speed up the batch process with a more effective approach?
Is there any way to post the entire batch Apex class, or help us understand the data more?
It looks like from your map that all of your accounts (in theory) have unique websites and all of your contacts have unique emails?
I assume you build those maps by hand? That is you loop over the accounts and do a
map.put(account.website,account)?
Do you have any system debug statements to confirm your map sizes?
What happens if there is no account or no contact when you call accounts.get()?
And the business logic - is it more looping?
And are you using batch variables in a static manner, e.g. a counter to count the total number of records processed? If so, is your variable a list? That can be dangerous, of course.
Also what object is your scope object? Not that it matters, but I'd think you'd want to have your scope be the Accounts themselves or the Contacts themselves.
I'd try adding system.debug statements to your batch to verify it's running and to see where the infinite loop may be occurring.
