Tables in Cloud Datastore suddenly gone - google-cloud-datastore

In our development project, two tables in datastore are suddenly missing.
In the dashboard the summary of the tables is still shown, but there's no way of querying them.
How can I find out how/by whom/why these tables got deleted?
How is this even possible?

There are only two ways to remove tables from Datastore:
1. Running a script that deletes all entries from the table, after which the table itself is gone.
2. Deleting all the entries manually using the Google Cloud Datastore UI.
If someone used option 1, you might be able to track it down through logs; if it was done via option 2, then I'm sorry, there is no way to know who did it.
Hope this answers your question!!

Related

Deleting a very large collection in firestore from the firebase console

I have a very large collection of approximately 2 million documents. All of them are outdated and need to be deleted.
I need to do this operation only once; the new data has a TTL (time to live), so I won't run into this problem again.
Should I use the Firestore console UI to delete them, or is there a better way to do this? Is it possible to do this in one shot, or should I split it up?
There's no single way that is pertinently better here.
The simplest option is probably to delete the documents from the console, but I often also use the Firebase CLI's firestore:delete command - and writing your own logic through the API is equally fine. Any of these can work fine, all will need to read the documents before deleting them, and none of them is going to be significantly faster than the other.
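If you do write your own logic, the delete will in practice be split into chunks. Below is a minimal Python sketch of just that batching part. The `delete_batch` callable is a hypothetical stand-in for whatever actually performs the delete (for example a batched write through a client library), and the batch size of 500 is an assumption to tune, not a documented requirement; nothing here calls a real Firestore API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` ids."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def delete_in_batches(
    ids: Iterable[str],
    delete_batch: Callable[[List[str]], None],  # hypothetical: does the real delete
    batch_size: int = 500,                      # assumed value, adjust as needed
) -> int:
    """Call `delete_batch` once per chunk and return how many ids were sent."""
    deleted = 0
    for batch in chunked(ids, batch_size):
        delete_batch(batch)
        deleted += len(batch)
    return deleted
```

Because `ids` can be any iterable, the document ids can be streamed from a query instead of being loaded into memory all at once.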

Changing data structure after app has been published - Firestore

I have just published an app that uses Firestore as a backend.
I want to change how the data is structured;
for example, if some documents are stored in subcollections like 'PostsCollection/userId/SubcolletionPosts/postIdDocument', I want to move all of these postIdDocument documents directly into the first collection, 'PostsCollection'.
Obviously doing so would prevent users of the previous app version from writing and reading the right collection and all data would be lost.
Since I don't know how to approach this issue, I want to ask you what is the best approach that also big companies use when changing the data structure of their projects.
So the approach I have used is document versioning. There is an explanation here.
You basically version your documents so that when your app reads them, it knows how to update those documents to get them to the desired version. So in your case, your documents currently have no version and need to get to version 1, which means moving the sub-collection documents into the top collection and removing the sub-collection before working with the document.
Yes, it is more work, but it allows an iterative approach to document changes. And sometimes a script is written to update everything to the desired state and new code is deployed 😛. That usually happens when someone wants it done yesterday, and with many documents it can have its own issues.
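As a rough illustration of the versioning idea, here is a minimal Python sketch that works on plain dicts rather than on Firestore itself. The `schemaVersion` and `legacyPath` field names, the `userId` field, and the `upgrade` helper are all assumptions made up for this example; in a real app the version 0 to version 1 step would also have to write the document to the top-level collection and delete the old one.

```python
from typing import Any, Callable, Dict

Doc = Dict[str, Any]
CURRENT_VERSION = 1


def v0_to_v1(doc: Doc) -> Doc:
    """Hypothetical step: a top-level post can no longer infer its owner
    from its path, so copy the user id out of the old path into a field."""
    migrated = {k: v for k, v in doc.items() if k != "legacyPath"}
    migrated["userId"] = doc["legacyPath"].split("/")[1]
    return migrated


# Maps "version the document is at" -> "function that lifts it one version".
MIGRATIONS: Dict[int, Callable[[Doc], Doc]] = {0: v0_to_v1}


def upgrade(doc: Doc, target: int = CURRENT_VERSION) -> Doc:
    """Return a copy of `doc` upgraded step by step to `target`.

    Documents without a `schemaVersion` field are treated as version 0.
    """
    version = doc.get("schemaVersion", 0)
    if version > target:
        raise ValueError(f"document is newer ({version}) than this app ({target})")
    upgraded = dict(doc)
    while version < target:
        upgraded = MIGRATIONS[version](upgraded)
        version += 1
        upgraded["schemaVersion"] = version
    return upgraded
```

Calling `upgrade` on every read keeps old and new documents usable side by side, and adding a later version only means adding one more entry to `MIGRATIONS`.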

How to find which kinds are not being used in Google Datastore

Is there any way to list the kinds that are not being used in Google's Datastore by our App Engine app without having to look into our code and/or logic? : )
I'm not talking about indexes, which I can list by issuing a
gcloud datastore indexes list
and then compare with the datastore-indexes.xml or index.yaml.
I tried to check datastore kinds statistics and other metadata but I could not find anything useful to help me on this matter.
Should I give up on Datastore providing me with useful stats, and instead write something that keeps collecting Datastore statistics (like data size) over a long period, so that I at least have a clue which kinds are not being used, and only after that look into our app code to see whether the kind's Model was removed?
Example:
select bytes from __Stat_Kind__
Store it somewhere and keep updating it for a period. If the kind's byte size does not change, then the kind is probably not being used anymore.
The idea is to do some cleaning in datastore.
I would like to find which kinds are not being used anymore, maybe for a long time or were created manually to be used once... You know, like a table in oracle that no one knows what is used for and then if we look into the statistics of that table we would see that this table was only used once 5 years ago. I'm trying to achieve the same in datastore, I want to know which kinds are not being used anymore or were used a while ago, then ask around and backup/delete it if no owner was found.
It's an interesting question.
I think you would be best-placed to audit your code and instill organizational practice that requires this documentation to be performed in future as a business|technical pre-prod requirement.
IIRC, Datastore doesn't automatically timestamp Entities and keys (rightly) aren't incremental. So there appears no intrinsic mechanism to track changes short of taking a snapshot (expensive) and comparing your in-flight and backup copies for changes (also expensive and inconclusive).
One challenge with identifying a Kind that appears to be non-changing is that it could be referenced (rarely) by another Kind and so, while it does not change, it is required.
Auditing your code and documenting it for posterity should not only provide you with a definitive answer (and identify owners), but it also pays off a significant technical debt that has been incurred, and it heads off this and probably other requirements (e.g. GDPR-like ones) that will arise in the future.
Assuming you are referring to records being created/updated, then I can think of the following options
Via the Cloud Console (Datastore > Dashboard) - This lists all your 'Kinds' and the number of records in each Kind. Theoretically, you can take a screen shot and compare the counts so that you know which one has experienced an increase or not.
Use of Created/LastModified Date columns - I usually add these 2 columns to most of my datastore tables. If you have them, then you can have a stored function that queries them. For example, you run a query to sort all of your Kinds in descending order of creation (or last modified date) and you only pull the first record from each one. This tells you the last time a record was created or modified.
I would write a function as part of my App, put it behind a page which requires admin privilege (only app creator can run it) and then just clicking a link on my App would give me the information.
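The compare-over-time idea from this thread (whether on byte sizes from `__Stat_Kind__` or on entity counts from the dashboard) can be sketched as below. This is only an illustration in plain Python: the snapshots are assumed to be dicts mapping kind name to a size or count that you have collected yourself, and `unchanged_kinds` is a made-up helper, not part of any Datastore API.

```python
from typing import Dict, List


def unchanged_kinds(snapshots: List[Dict[str, int]]) -> List[str]:
    """Return kinds whose recorded value is identical in every snapshot.

    `snapshots` is ordered oldest to newest; each maps kind name to the
    byte size (or entity count) observed at that time.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to compare")
    first, later = snapshots[0], snapshots[1:]
    stable = [
        kind
        for kind, value in first.items()
        # A kind missing from a later snapshot counts as changed.
        if all(snap.get(kind) == value for snap in later)
    ]
    return sorted(stable)
```

As noted above, an unchanged kind is only a candidate for cleanup: it may still be read by the app or referenced by another kind, so the result is a list of things to ask around about, not a list of things to delete.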

How do you remove columns in Azure Cosmos DB View?

I'm pretty new to NoSQL in general but I'm trying azure's cosmos db to store my logs. I've been really struggling with their UI.
In the image below, I just created a random collection and added a simple entity with 3 fields. All I want to do at this point is to reduce the columns so that I only see the ones I want. Granted, I can't remove the first 3 columns, but whenever I try to remove any column at all, I get this error, and instead of removing the column it just leaves it blank. I can't find any documentation for this simple UI (which shouldn't really be the case), so here are some things maybe someone can help me with.
How to remove columns. Like in SQL, only display the columns I want?
How to create custom query? The query builder here is really limited and unusable to be honest.
Is there any other DB Client similar to MongoDB's compass that might make the UI easier?
1. How to remove columns. Like in SQL, only display the columns I want?
You can choose your desired columns in the advanced options, but PartitionKey, RowKey and Timestamp are default columns in the Cosmos DB Table API; you can't remove them in the portal Data Explorer.
2. How to create custom query?
Same as above: the portal UI is limited here.
3. Is there any other DB Client similar to MongoDB's compass that might make the UI easier?
You could download Azure Storage Explorer, which lets you easily manage the contents of your storage account.
It's more flexible than the portal UI; for example, you can filter the columns, including the default columns.
However, the general design of the tool's UI is similar to the portal UI.
BTW, as @David mentioned in the comment, if you want to query the data more flexibly with SQL, you should consider using the Cosmos DB SQL API instead of the Cosmos DB Table API.
I agree that the experience could be quite daunting. As a suggestion, you could try Cerebrata (https://cerebrata.com/) as a cost-effective solution.
The tool basically allows you to remove the desired columns in just a few clicks (as projected in the GIF below). The tool also allows you to edit multiple entities in bulk and more.
CosmosDB Delete entities Table API - Cerebrata

Create dashboard on Firebase Database for various metrics

I have events in a Firebase database table where each event has certain fields. One of the fields is event_type. What I want to achieve is to visualize, in graphical form, how many events of each type come in daily.
How do I do something like that in firebase database?
Q1. Is it possible to directly do this in firebase?
Q2. Do I need to move data to some other datasource (like Big query) and setup dashboard there?
It is definitely possible to create a dashboard with aggregate data directly on the Firebase Realtime Database. But you'll have to take a different approach than with e.g. BigQuery.
With relational databases, you'll create a dashboard by running aggregation queries. For example to show how many events of each type, you'll run something like SELECT type, COUNT(*) FROM events GROUP BY type.
The Firebase Realtime Database (like most NoSQL databases) doesn't have such a GROUP BY operation, nor a COUNT() method. That means you'd have to load all the data into your dashboard and group/count it there, which is quite expensive. That's why on NoSQL databases you'll typically keep a running count for each type in the database and update it on every write operation. While this puts an overhead on each write operation, the dashboard itself suddenly becomes very simple when you do this. For an example of a simple counter, see the functions-samples repo.
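To make the running-count idea concrete, here is a minimal in-memory Python sketch. It is not Firebase code: the `EventCounters` class and its methods are hypothetical, and in a real setup the increment would happen in the database (for example from a function triggered on each write). It only shows the shape of the data the dashboard would read, one counter per day and event type.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict


class EventCounters:
    """Keeps a running count per (UTC day, event type), updated on every write."""

    def __init__(self) -> None:
        # counts["2024-01-01"]["login"] -> number of login events that day
        self.counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def record(self, event_type: str, timestamp: datetime) -> None:
        """Increment the counter for this event's type and day.

        `timestamp` is assumed to be timezone-aware; days are bucketed in UTC.
        """
        day = timestamp.astimezone(timezone.utc).date().isoformat()
        self.counts[day][event_type] += 1

    def daily(self, day: str) -> Dict[str, int]:
        """Return the per-type counts for one day; the dashboard reads only this."""
        return dict(self.counts.get(day, {}))
```

The dashboard then needs a single small read per day instead of scanning every event, which is the trade-off described above: a little extra work on each write in exchange for cheap reads.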
This approach only works if you know up front what counters (and other aggregates) you want to show in the dashboard. If that isn't the case, many developers use the nightly backups from the Realtime Database to ingest the data into another system that lends itself more to exploratory querying, such as BigQuery.
Either approach can work fine. The right approach is a matter of your exact use-case (e.g. do you know the exact data you want in the dashboard, or are you still figuring that out?) and what you're most comfortable with.

Resources