Delete a Google Cloud Storage folder (directory), including all versions of the objects inside

Hi, and thanks in advance. I want to delete a folder from Google Cloud Storage, including all versions of every object inside it. That's easy when you use gsutil from your laptop (you can just use the folder name as a prefix and pass the flag that deletes all versions/generations of each object)
..but I want it in a script that is triggered periodically (for example, while I'm on holiday). My current ideas are Apps Script and Google Cloud Functions (or Firebase Functions). The problem is that in these cases I don't have an interface as powerful as gsutil; I have to use the REST API, so I cannot say "delete everything with this prefix" or "all versions of this object". Thus the best I can do is:
a) List all the objects under a given prefix. So for the prefix "myFolder" I receive:
myFolder/obj1 - generation 10
myFolder/obj1 - generation 15
myFolder/obj2 - generation 12
... and so on for hundreds of files and at least 1 generation/version per file.
b) For each object generation, delete it, giving the complete object name plus its generation number.
As you can see, that seems like a lot of work. Do you know a better alternative?

Listing the objects you want to delete and then deleting them is the only way to achieve what you want.
The only alternative is Object Lifecycle Management, which can delete objects for you automatically based on conditions, if those conditions satisfy your requirements.
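For completeness, here is a minimal sketch of that list-and-delete loop using the google-cloud-storage Python client, which could run inside a scheduled Cloud Function; the bucket and prefix names are placeholders:

from google.cloud import storage

def purge_prefix(bucket_name: str, prefix: str) -> None:
    """Delete every generation of every object under `prefix`."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    # versions=True makes list_blobs return every generation, not just the live one.
    for blob in client.list_blobs(bucket_name, prefix=prefix, versions=True):
        # Passing the generation targets that specific version of the object.
        bucket.delete_blob(blob.name, generation=blob.generation)

# Example call (placeholder names):
# purge_prefix("my-bucket", "myFolder/")

It is still "list, then delete each generation" under the hood, but the client library hides the REST pagination and per-object calls for you.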

Related

Changing data structure after app has been published - Firestore

I have just published an app that uses Firestore as a backend.
I want to change how the data is structured;
For example, if some documents are stored in subcollections like 'PostsCollection/userId/SubcolletionPosts/postIdDocument', I want to move each of those postIdDocument documents into the top-level collection 'PostsCollection'.
Obviously, doing so would prevent users of the previous app version from writing and reading the right collection, and all their data would be lost.
Since I don't know how to approach this issue, I want to ask what the best approach is (the one big companies also use) when changing the data structure of their projects.
So the approach I have used is document versioning. There is an explanation here.
You basically version your documents so that when your app reads them, it knows how to update them to get them to the desired version. So in your case, documents would start with no version and need to get to version 1, which means reading the sub-collection documents up into the top-level collection and removing the sub-collection before working with the document.
Yes, it is more work, but it allows an iterative approach to document changes. And sometimes a script is written to migrate everything to the desired state and the new code is deployed 😛, which usually happens when someone wants it done yesterday; with many documents that can have its own issues.
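If it helps, here is a rough sketch of such a one-off migration script using the firebase-admin Python SDK; the collection names come from the question, while the schemaVersion/userId fields are hypothetical additions:

import firebase_admin
from firebase_admin import firestore

firebase_admin.initialize_app()  # uses application-default credentials
db = firestore.client()

# Move every post from PostsCollection/{userId}/SubcolletionPosts/{postId}
# up into the top-level PostsCollection, stamping it with a schema version.
# Assumes the per-user parent documents actually exist (otherwise stream()
# will not list them and you would need to enumerate user ids another way).
user_docs = list(db.collection("PostsCollection").stream())
for user_doc in user_docs:
    for post in user_doc.reference.collection("SubcolletionPosts").stream():
        data = post.to_dict()
        data["schemaVersion"] = 1     # hypothetical version field
        data["userId"] = user_doc.id  # keep track of the original owner
        db.collection("PostsCollection").document(post.id).set(data)
        post.reference.delete()       # drop the old sub-collection copy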

Parsing FB-Purity's Firefox idb (Indexed Database API) object_data blob from Linux bash

From a Linux bash script, I want to read the structured data stored by a particular Firefox add-on called FB-Purity.
I have found a folder called .mozilla/firefox/b8eab5j0.default/storage/default/moz-extension+++37a9788c-671d-4cae-ba5c-fbdb8788499a^userContextId=4294967295/ that contains a .metadata file containing the string moz-extension://37a9788c-671d-4cae-ba5c-fbdb8788499a, a URL which, when opened in Firefox, shows the add-on's details, so I am pretty sure that this folder belongs to the add-on.
That folder contains an idb directory, which sounds like the Indexed Database API, a W3C standard apparently used by Firefox since last year to store add-on data.
The idb folder only contains an empty folder and an SQLite file.
The SQLite file, unfortunately, does not contain much structured application data, but the object_data table contains a 95 KB blob which probably holds the real structured data:
INSERT INTO `object_data` VALUES (1,'0pmegsjfoetupsf.742612367',NULL,NULL,
X'e08b0d0403000101c0f1ffe5a201000400ffff7b00220032003100380035003000320022003a002
2005300610074006f0072007500200055007205105861006e00690022002c00220036003100350036
[... 95KB ...]
00780022007d00000000000000');
Question: any clue what this blob's format is? How can I extract it (using the command line, any library, or any Linux tool) to JSON or another readable format?
Well, I had a fun day today figuring this out and ended up creating a Python tool that can read the data from these IndexedDB database files and print it (and maybe more at some point): moz-idb-edit
To answer the technical parts of the question first:
Both the key (name) and the data (value) use a Mozilla-proprietary format whose only documentation, at this time, appears to be its source code.
The keys use a special just-for-this-use-case encoding whose rough description is available in mozilla-central/dom/indexedDB/Key.cpp – the file also contains the only known implementation. Its unique selling point appears to be that it is relatively compact while being compatible with all the possible index types websites may throw at you, as well as being in the correct binary sorting order by default.
The values are stored using SpiderMonkey's internal StructuredClone representation, which is also used when moving values between processes in the browser. Again, there are no docs to speak of, but one can read the source code, which fortunately is quite easy to understand. Before being added to the database, however, the generated binary is compressed on the fly using Google's Snappy compression, which “does not aim for maximum compression [but instead …] aims for very high speeds and reasonable compression” – probably not a bad idea considering that we're dealing with wasteful web content here.
To locate the correct IndexedDB file for an extension's local storage data, one needs to resolve the extension's static ID to a so-called “internal UUID” whose value is different in every browser profile instance (to make tracking based on installed add-ons a lot harder). The mapping table for this is stored as a pref (“extensions.webextensions.uuids”) in prefs.js. The IDB path then is ${MOZ_PROFILE}/storage/default/moz-extension+++${EXT_UUID}^userContextId=4294967295/idb/3647222921wleabcEoxlt-eengsairo.sqlite
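That resolution step can be scripted; here is a rough Python sketch, under the assumption that the pref can simply be scraped out of prefs.js with a regex (the database file name is taken verbatim from the path above):

import json
import re
from pathlib import Path

def idb_path(profile_dir: str, ext_id: str) -> Path:
    """Resolve an extension's static ID to its internal UUID and build the IDB path."""
    prefs = Path(profile_dir, "prefs.js").read_text(encoding="utf-8")
    # The pref value is a JSON object serialised into a quoted prefs.js string.
    match = re.search(
        r'user_pref\("extensions\.webextensions\.uuids",\s*"(.*?)"\);', prefs)
    uuid_map = json.loads(match.group(1).replace('\\"', '"'))
    ext_uuid = uuid_map[ext_id].strip("{}")  # strip braces if the profile stores them
    return Path(profile_dir, "storage", "default",
                f"moz-extension+++{ext_uuid}^userContextId=4294967295",
                "idb", "3647222921wleabcEoxlt-eengsairo.sqlite")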
For all practical intents and purposes you can read the value of a single storage key of any extension by downloading the project mentioned above. Basic usage is:
$ ./moz-idb-edit --extension "${EXT_ID}" --profile "${MOZ_PROFILE}" "${STORAGE_KEY}"
Where ${EXT_ID} is the extension's static ID (check its manifest.json file or look in about:support#extensions-tbody if you're unsure), ${MOZ_PROFILE} is the Firefox profile directory (also shown in about:support) and ${STORAGE_KEY} is the name of the key you'd like to query (unfortunately, querying all keys is not supported yet).
Writing data is not currently supported either.
I'll update this answer as I implement more features (or drop me an issue on the project page!).

How to use multiple namespaces in DoctrineCacheBundle cacheDriver?

I know I can set up multiple namespaces for DoctrineCacheBundle in the config.yml file. But can I use one driver with multiple namespaces?
The case is that in my app I want to cache all queries for all of my entities. The problem is flushing the cache on create/update actions: I want to flush only part of my cached queries. My app is used by multiple clients, so when a client updates something in his data, for instance in the Article entity, I want to clear the cache only for that client and only for Article. I could add proper IDs to each query and remove them manually, but the queries are built dynamically. In my API, the mobile app sends the version number for which the DB should return data, so I don't know what kind of IDs will be used in the end.
Unfortunately, I don't think what you want to do can be solved with some configuration magic. What you want is some sort of indexed cache, and for that you have to find a more powerful tool.
You can take a look at Doctrine's second-level cache. I don't know how good it is now (I tried it once when it was in beta and it did not make the cut for me).
Or you can build your own cache manager. If you do, I recommend using Redis: its data structures will help you keep your indexes (this can be simulated with memcached, but it requires more work). What I mean by indexes:
You will have a key like client_1_articles, where 1 is the client id, and in that key you will store all the ids of the articles of client 1. For every article id you will have a key like article_x, where x is the id of the article. In this example, client_1_articles is a rudimentary index that will help you if, at some point, you want to invalidate all the cached articles coming from client 1.
The abstract implementation of the above example ends up being a graph-like structure over your cache, with possibly:
- composed indexes, e.g. 'client_1:category_1' => {article_1, article_2}
- multiple indexes for one item, e.g. 'category_1' => {article_1, article_2, article_3}, 'client_1' => {article_1, article_3}
- etc.
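To make the index idea concrete, here is a minimal sketch using Redis sets from Python (redis-py); the key names follow the example above, and the JSON serialization is just a placeholder for however you cache your query results:

import json
import redis

r = redis.Redis()

def cache_article(client_id: int, article: dict) -> None:
    """Cache one article and register it in the per-client index set."""
    article_key = f"article_{article['id']}"
    r.set(article_key, json.dumps(article))
    r.sadd(f"client_{client_id}_articles", article_key)

def invalidate_client_articles(client_id: int) -> None:
    """Drop every cached article belonging to one client."""
    index_key = f"client_{client_id}_articles"
    members = r.smembers(index_key)
    if members:
        r.delete(*members)
    r.delete(index_key)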
Hope this helps you in some way; at least that was my solution for a similar problem.
Good luck with your project,
Alexandru Cosoi

DocumentDb and how to create folder?

I'm new to DocumentDB and I am trying to determine the best way to store documents. We are uploading documents every 15 minutes and I need to keep them as easily separated by upload as possible. At first glance, I thought I could have a database and a collection for each upload; then I discovered you can only have 3 collections per database. This leaves me with either adopting a naming convention or trying to use folders and paths. According to the same source (http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/), we are limited to 100 paths per collection, which leaves folders. I have been looking, but I haven't found anything concrete on creating folders within a collection; the object API doesn't have an obvious add/create method.
Is this possible? If so, are we limited to how many (assuming I stay within the allowed collection/database size)?
You could define a sequential naming convention and create a range index in the collection's indexing policy. That way, if you need to retrieve a range of documents, you can do so while efficiently leveraging DocumentDB's indexing capabilities.
As a recommendation, examine the request-charge response header on the requests you fire off during your tests. This lets you gauge how efficient your setup is (how demanding it is on the database, which translates into your cost structure for the service).
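As a hedged illustration of the sequential-naming idea using the azure-cosmos Python SDK (a newer SDK than what existed when this question was asked; the account URL, key, database, container, and property names are all placeholders):

from datetime import datetime, timezone
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("uploads-db").get_container_client("uploads")

# Tag each document with a sortable batch id derived from the upload time,
# e.g. "20240101T1015" for the 10:15 batch.
batch_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M")
container.create_item(body={"id": f"{batch_id}-doc-001",
                            "uploadBatch": batch_id,
                            "payload": {}})

# Later, pull back everything from a single upload batch (or a range of them).
docs = container.query_items(
    query="SELECT * FROM c WHERE c.uploadBatch = @batch",
    parameters=[{"name": "@batch", "value": batch_id}],
    enable_cross_partition_query=True,
)
for doc in docs:
    print(doc["id"])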
Sorry about the comment. What we ended up doing was just dumping everything into one collection. The Azure DocumentDB query language (which is SQL-like) seems robust enough to handle detailed queries, though I am not sure what the efficiency will be like once we have a ton of documents in there.

How to Access Data in ZODB

I have a Plone site that has a lot of data in it and I would like to query the database for usage statistics, e.g. how many calendars have more than one entry, how many blogs per group have entries after a given date, etc.
I want to run the script from the command line... something like so:
bin/instance [script name]
I've been googling for a while now but can't find out how to do this.
Also, can anybody provide some help on how to get user-specific information, like last login time or items created?
Thanks!
Eric
In general, you can query the portal_catalog to locate content by searching various indexes. See http://plone.org/documentation/manual/developer-manual/indexing-and-searching/querying-the-catalog and http://docs.zope.org/zope2/zope2book/SearchingZCatalog.html for an introduction to the catalog.
In some cases the built-in indexes will allow you to do the query you want. In other cases you may need to write some Python to narrow down the results after doing an initial catalog query.
If you put your querying code in a file called foo.py, you can run it via:
bin/instance run foo.py
Within foo.py, you can refer to the root of the database as 'app'. The catalog would then be found at app.site.portal_catalog, where 'site' is the id of your Plone site.
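For example, a foo.py along those lines might look like the sketch below; it assumes the Plone site id is 'Plone', and the portal_type and date are placeholders you would swap for your own content types and criteria:

# foo.py -- run with: bin/instance run foo.py
from DateTime import DateTime

site = app.Plone                # 'app' is the ZODB root injected by "bin/instance run"
catalog = site.portal_catalog

# All items of a given type created after a given date, counted per parent folder.
brains = catalog(portal_type="News Item",                  # placeholder content type
                 created={"query": DateTime("2013/01/01"),  # placeholder date
                          "range": "min"})

counts = {}
for brain in brains:
    parent = "/".join(brain.getPath().split("/")[:-1])
    counts[parent] = counts.get(parent, 0) + 1

for parent, count in sorted(counts.items()):
    print("%s: %d" % (parent, count))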
Finding information about users happens via a separate API (for the Pluggable Auth Service). I'd suggest asking a separate question about that.
