Delete millions of fields in Firebase Realtime Database - firebase

I have to delete multiple fields in different locations.
I have millions of "messages" with multiple fields like:
messagesId: {
  text: "...",
  datetime: "...",
  unusedField: "...", <-- REMOVE
  ...
}, ...
To save storage space, I want to delete old, unused fields from each message. That would free tens of gigabytes, which translates into money (the hundreds of dollars we currently pay for everything beyond the first GB included by Firebase).
Problem #1 - Database peak: deleting a huge amount of data programmatically pushes the load peak to 100%, temporarily blocking the database.
To avoid this, the solution suggested by Firebase support is to use the CLI (firebase database:remove <path>).
Once I've listed millions of lines:
firebase database:remove /root/messages/messageid_1/field --confirm &&
firebase database:remove /root/messages/messageid_2/field --confirm &&
...
even considering a few milliseconds of execution for each line, the overall execution may take an unacceptably long time (days).
Problem #2 - Delete locally and re-upload the DB: another suggested solution is to download the entire database, remove the JSON paths locally, and re-upload it.
Currently, the entire database weighs 60GB.
Is it possible to re-import the entire database from the Firebase console? (Given that I would have to suspend all writes in the meantime, to avoid data loss.)
Are there any other possible solutions?

The common path for this is:
Enable automated backups for your database, and download the JSON from the Storage bucket.
Process the JSON locally, and determine the exact path of all nodes to remove.
Process the paths in reasonably sized chunks through the API, using multi-location updates.
Removing each chunk of nodes would be something like:
var nodesToRemove = ["/root/messages/messageid_1/field", "/root/messages/messageid_2/field"];
var updates = {};
nodesToRemove.forEach(function(path) {
  updates[path] = null; // writing null to a path deletes that node
});
firebase.database().ref().update(updates);
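Putting steps 2 and 3 together, a minimal Node.js sketch could look like the following. It assumes the backup has been downloaded to a local backup.json, that the field to drop is literally named unusedField, and that the same initialized firebase app as above is available; the file name, field name, and chunk size are all hypothetical stand-ins.

// Sketch: collect the paths of the unused field from the downloaded backup,
// then delete them in chunks via multi-location updates.
var backup = require('./backup.json'); // hypothetical local backup file

var pathsToRemove = Object.keys(backup.root.messages)
  .filter(function(id) { return backup.root.messages[id].unusedField !== undefined; })
  .map(function(id) { return '/root/messages/' + id + '/unusedField'; });

var CHUNK_SIZE = 1000; // hypothetical; tune so each write stays small

async function removeInChunks() {
  for (var i = 0; i < pathsToRemove.length; i += CHUNK_SIZE) {
    var updates = {};
    pathsToRemove.slice(i, i + CHUNK_SIZE).forEach(function(path) {
      updates[path] = null; // null deletes the node
    });
    await firebase.database().ref().update(updates); // one write per chunk
  }
}

removeInChunks().then(function() { console.log('done'); });

Spreading the work over many medium-sized multi-location updates keeps each individual write small enough that the database load never spikes the way a single massive delete does.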

Related

Firebase storage url, new file keep same access token

Duplicate of: Firebase storage URL keeps changing with new token
When a user uploads a profile pic I store this in firebase storage with the file name as the uid.
Let's say the user then goes and makes, say, 100 posts + 500 comments, and then updates their profile image.
Currently I have a trigger that goes and updates the profile image URL in all of the post and comment documents. I have to do this because when the image is changed in Storage, the access token changes, and since the token is part of the URL, the old URL no longer works.
What I want to do is not have the access token change. If I can do this I can avoid the mass updates that will massively increase my firestore writes.
Is there any way to do this? or an alternative?
Edit:
Another solution if you don't mind making the file public.
Add this storage rule and you won't have to use a token to access the file.
This will allow read access to any file inside a folder named "mydir", at any depth.
match /{path=**}/mydir/{doc} {
  allow read: if true;
}
There are only two options here:
You store the profile image URL only once, probably in the user's profile document, and look it up every time it is needed. In return you only have to write it once.
You store the profile image URL for every post, in which case you only have to load the post documents and not the profile document for each. In return you'll have to write the profile URL into each post document, and update it there whenever it changes.
For smaller networks the former is more common, since you're more likely to see multiple posts from the same user, so you amortize the cost of the extra lookup over multiple posts.
The bigger the network of users, the more interesting the second approach becomes, as you'll care about read performance and simplicity more than the writes you're focusing on right now.
In the end, there's no singular right answer here though. You'll have to decide for yourself what performance and cost profile you want your app to have.
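A minimal sketch of the first option, assuming hypothetical collection and field names (a users/{uid} document with a photoURL field, and posts with an authorId field):

// Sketch: store the profile image URL once on the user document and look it
// up when rendering a post, instead of copying it into every post document.
async function renderPost(postId) {
  var db = firebase.firestore();
  var postSnap = await db.collection('posts').doc(postId).get();
  var post = postSnap.data();

  // One extra read per author; cache this locally if you render many posts
  // from the same user, so the lookup cost is amortized.
  var userSnap = await db.collection('users').doc(post.authorId).get();

  showPost(post, userSnap.data().photoURL); // hypothetical rendering function
}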
Answer provided by #Prodigy here: https://stackoverflow.com/a/64129850/10222449
I tried this and it works well.
This will save millions of writes.
var storage = firebase.storage();
var pathReference = storage.ref('users/' + userId + '/avatar.jpg');
pathReference.getDownloadURL().then(function (url) {
  $("#large-avatar").attr('src', url);
}).catch(function (error) {
  // Handle any errors
});

firebase real time database takes a long time to load data

Firebase Realtime Database is taking a long time to load data. What can I do to optimize the loading? Also, are there other places that I can store the data other than Firebase? The data is 3.8MB in size and has the following structure:
{"10-happier": {body: "test"}, "zero-to-one": {body: "test2"}}
Here's my code
var defer = Q.defer();
app.database().ref('content').once('value').then(snapshot => {
  if (snapshot && snapshot.val()) {
    defer.resolve(snapshot.val());
  } else {
    defer.resolve({});
  }
}).catch(error => {
  defer.reject(error);
});
return defer.promise;
There's nothing you can do to optimize a query like this. When you fetch an entire node:
app.database().ref('content').once('value')
The SDK will download everything, and it will take as long as it takes to get the whole thing. The total performance is going to be determined by the speed of the client's connection to the server. If you want the query to be faster, your only viable option is to get a faster connection to Realtime Database.
Alternatively, you can bypass the use of a database altogether and use a different method of storage that involves some form of compression or CDN to deliver the content more efficiently for the end user. Since recommendations for software and services are off-topic for Stack Overflow, you will have to do some research to figure out what your options are and what will work best for your specific situation.
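The other lever, if the client does not actually need the whole content node on every load, is simply to read less. A minimal sketch, assuming a key like 10-happier from the structure above:

// Sketch: read a single child instead of the whole 3.8MB 'content' node, so
// only that entry is transferred over the wire. This only helps if the client
// does not need everything at once.
app.database().ref('content/10-happier').once('value')
  .then(snapshot => snapshot.val() || {});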

Safe to Save Binary Data in Cloud Firestore Database?

I've always used the Cloud Firestore Database (and the old Realtime Database) to store text, and then used Storage for images.
While using SurveyJS and AngularFirestore, I discovered I can push binary files into and out of the Firestore Database with the attached code. My question is: is this OK? It works great, but I don't want to incur extra cost or a network slowdown... Thanks
var resultAsString = JSON.stringify(this.survey.data);
this.qs.saveSupplierQuestionnaire(this.companyid, this.id, this.survey.data);
...
saveSupplierQuestionnaire(userid: string, questionnaireid: string, questionnaireData: any) {
  var resultAsString = JSON.stringify(questionnaireData);
  var numCompleted = 0; ///// test grading
  const dbRef = this.afs.collection<questionnaire>('companies/' + userid + '/questionnaires/')
    .doc(questionnaireid)
    .update({ results: resultAsString });
}
If it meets the needs of your application, then it's OK.
You should be aware that any time a document is read, the entire document is transferred to the client. So, even if you don't use the field with the binary data, you are going to make the user wait for the entire contents to be downloaded. This is true for all fields of a document, regardless of their types. There is really nothing special about binary fields, other than how the data is typed.
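Two hedged notes that follow from this, neither of which is in the answer above: Firestore documents are capped at 1 MiB, so a very large serialized survey will eventually fail to write; and if the bulky string is rarely read, you can keep it in its own document so reads of the main questionnaire document stay small. A sketch, using a hypothetical questionnaireResults collection and updatedAt field:

// Sketch: keep the bulky serialized results in their own document so reads of
// the frequently-used questionnaire document don't pay for the large string.
// 'questionnaireResults' and 'updatedAt' are hypothetical names.
saveSupplierQuestionnaire(userid, questionnaireid, questionnaireData) {
  var resultAsString = JSON.stringify(questionnaireData);
  return Promise.all([
    // small document the app reads often
    this.afs.doc('companies/' + userid + '/questionnaires/' + questionnaireid)
      .update({ updatedAt: Date.now() }),
    // bulky document, fetched only when the full results are needed
    this.afs.doc('companies/' + userid + '/questionnaireResults/' + questionnaireid)
      .set({ results: resultAsString })
  ]);
}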

Firebase data structure - is the Firefeed structure relevant?

Firefeed is a very nice example of what can be achieved with Firebase - a fully client-side Twitter clone. There is this page: https://firefeed.io/about.html where the logic behind the adopted data structure is explained. It helps a lot to understand Firebase security rules.
At the end of the demo, there is this snippet of code:
var userid = info.id; // info is from the login() call earlier.
var sparkRef = firebase.child("sparks").push();
var sparkRefId = sparkRef.name();

// Add spark to global list.
sparkRef.set(spark);

// Add spark ID to user's list of posted sparks.
var currentUser = firebase.child("users").child(userid);
currentUser.child("sparks").child(sparkRefId).set(true);

// Add spark ID to the feed of everyone following this user.
currentUser.child("followers").once("value", function(list) {
  list.forEach(function(follower) {
    var childRef = firebase.child("users").child(follower.name());
    childRef.child("feed").child(sparkRefId).set(true);
  });
});
It shows how the writes are done in order to keep the reads simple - as stated:
When we need to display the feed for a particular user, we only need to look in a single place
So I do understand that. But if we take a look at Twitter, we can see that some accounts have several million followers (the most followed is Katy Perry, with over 61 million!). What would happen with this structure and this approach? Whenever Katy posted a new tweet, it would mean 61 million write operations. Wouldn't this simply kill the app? And even more, isn't it consuming a lot of unnecessary space?
With denormalized data, the only way to connect data is to write to every location it's read from. So yeah, publishing a tweet to 61 million followers would require 61 million writes.
You wouldn't do this in the browser. The server would listen for child_added events for new tweets, and then a cluster of workers would split up the load, paginating through a subset of followers at a time. You could potentially prioritize online users so they get the writes first.
With normalized data, you write the tweet once, but pay for the join on reads. If you cache the tweets in feeds to avoid hitting the database for each request, you're back to 61 million writes to Redis for every Katy Perry tweet. To push the tweet in real time, you need to write the tweet to a socket for every online follower anyway.
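A minimal sketch of that server-side fan-out against the Firefeed-style layout (users/$uid/followers and users/$uid/feed), using the Admin SDK; the spark's author field and the batch size are assumptions, not taken from the snippet above:

// Sketch: fan a new spark out to followers in batches, server-side.
// A real worker would also track which sparks it has already processed.
var admin = require('firebase-admin');
admin.initializeApp();
var db = admin.database();

var BATCH_SIZE = 500; // hypothetical; tune to your write limits

db.ref('sparks').on('child_added', async function(sparkSnap) {
  var sparkId = sparkSnap.key;
  var authorId = sparkSnap.val().author; // assumes the spark stores its author's uid

  var followersSnap = await db.ref('users/' + authorId + '/followers').once('value');
  var followerIds = Object.keys(followersSnap.val() || {});

  for (var i = 0; i < followerIds.length; i += BATCH_SIZE) {
    var updates = {};
    followerIds.slice(i, i + BATCH_SIZE).forEach(function(uid) {
      updates['users/' + uid + '/feed/' + sparkId] = true;
    });
    await db.ref().update(updates); // one multi-location write per batch
  }
});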

Please suggest a way to store a temp file in Windows Azure

Here I have a simple feature in ASP.NET MVC3, hosted on Azure.
1st step: the user uploads a picture
2nd step: the user crops the uploaded picture
3rd step: the system saves the cropped picture and deletes the temp file, which is the uploaded original picture
Here is the problem I am facing now: where to store the temp file?
I tried the Windows file system and LocalResources; the problem is that these resources are per instance, so there is no guarantee that the instance showing the picture to crop will be the same instance that saved the temp file.
Do you have any idea on this temp file issue?
normally the file exists just for a while before it is deleted
the temp file needs to be instance independent
ideally the file would have some expiry setting (for example, 1H) so it deletes itself, in case the code crashes somewhere
OK. So what you're after is basically something that is shared storage but expires. Amazon have just announced a rather nice setting called object expiration (https://forums.aws.amazon.com/ann.jspa?annID=1303). There's nothing like this for Windows Azure Storage yet, unfortunately, but that doesn't mean we can't come up with some other approach; indeed, we may even come up with a better (more cost-effective) approach.
You say that it needs to be instance independent, which means using a local temp drive is out of the picture. As others have said, my initial leaning would be towards Blob storage, but you will have some cleanup effort there. If you are working with large images (>1MB) or low throughput (<100rps), then I think Blob storage is the only option. If you are working with smaller images AND high throughput, then the transaction costs for Blob storage will start to really add up (I have a white paper coming out soon which shows some modelling of this, but some quick thoughts are below).
For a scenario with small images and high throughput, a better option might be to use the Windows Azure Cache as your temporary storage area. At first glance it is eye-wateringly expensive on a per-GB basis (roughly $110 per GB/month for Cache versus 12c per GB/month for Storage). But with Storage your transactions are paid for, whereas with Cache they are 'free'. (Quotas are here: http://msdn.microsoft.com/en-us/library/hh697522.aspx#C_BKMK_FAQ8) This can really add up; e.g. using 100kb temp files held for 20 minutes with a system throughput of 1500rps, Cache comes to about $1000 per month versus $15000 per month for Storage transactions.
The Azure Cache approach is well worth considering, but to be sure it is the 'best' approach I'd really want to know:
Size of images
Throughput per hour
A bit more detail on the actual client interaction with the server during the crop process. Is it an interactive process where the user pulls the image into their browser and crops it visually? Or is it just a simple crop?
Here is what I see as a possible approach:
the user uploads the picture
your code saves it to a blob and records in some data backend the relation between the user session and the uploaded image (marking it as a temp image)
display the image in the cropping user interface
when the user is done cropping on the client:
4.1. retrieve the original from the blob
4.2. crop it according to the data sent from the user
4.3. delete the original from the blob and the record in the data backend used in step 2
4.4. save the final image to another blob (the final blob)
And have one background process checking for "expired" temp images in the data backend (used in step 2), deleting both the images and the records in the data backend.
Please note that even in a WebRole you still have the RoleEntryPoint descendant, and you can still override the Run method. By implementing an infinite loop in Run() (that method shall never exit!), you can check whether there is anything to delete every N seconds (depending on your Thread.Sleep() in Run()).
You can use Azure Blob storage. Have a look at this tutorial.
The sample below may help you:
https://code.msdn.microsoft.com/How-to-store-temp-files-in-d33bbb10
You have two ways to handle temp files in Azure:
1. You can use the Path.GetTempPath and Path.GetTempFileName() functions for the temp file name.
2. You can use an Azure blob to simulate it.
// Assumed: 'container' is a CloudBlobContainer initialized elsewhere
// (classic Microsoft.WindowsAzure.Storage SDK).
private CloudBlobContainer container;

private long TotalLimitSizeOfTempFiles = 100 * 1024 * 1024;

private async Task SaveTempFile(string fileName, long contentLength, Stream inputStream)
{
    try
    {
        // First, check whether the container exists; create it if it doesn't.
        await container.CreateIfNotExistsAsync();
        // Init a blob reference.
        CloudBlockBlob tempFileBlob = container.GetBlockBlobReference(fileName);
        // If the blob already exists, delete the old blob.
        await tempFileBlob.DeleteIfExistsAsync();
        // Check whether the total blob size is over the limit; if so, clear old blobs.
        await CleanStorageIfReachLimit(contentLength);
        // Then upload the new file.
        await tempFileBlob.UploadFromStreamAsync(inputStream);
    }
    catch (Exception ex)
    {
        if (ex.InnerException != null)
        {
            throw ex.InnerException;
        }
        else
        {
            throw;
        }
    }
}

// Check whether the total blob size is over the limit; if so, delete the oldest blobs.
private async Task CleanStorageIfReachLimit(long newFileLength)
{
    List<CloudBlob> blobs = container.ListBlobs()
        .OfType<CloudBlob>()
        .OrderBy(m => m.Properties.LastModified)
        .ToList();
    // Get the total size of all blobs.
    long totalSize = blobs.Sum(m => m.Properties.Length);
    // Calculate how much space may be used before this upload.
    long realLimitSize = TotalLimitSizeOfTempFiles - newFileLength;
    // Delete the oldest blobs until there is enough free space, then stop.
    foreach (CloudBlob item in blobs)
    {
        if (totalSize <= realLimitSize)
        {
            break;
        }
        await item.DeleteIfExistsAsync();
        totalSize -= item.Properties.Length;
    }
}
