Is there a benefit to storing protobuf in SQLite?

In my mobile (hybrid) app, I want to allow the user to take their data to another device. There will be no server-side components from my end. The data the user would carry would contain images, audio, and video along with text, timestamps, etc. My design evolved as below:
1. Store each entry in a JSON file with the image, audio and video as Data URIs and export this file to cloud sync platforms. The problem with this approach is that, even though JSON is better than XML, there could be better options. See below.
2. Store each entry in a BSON file with the image, audio and video as Data URIs and export this file to cloud sync platforms. The problem with this approach is that, as mentioned on its site, the field names will still be repeated, so protobuf could be a better fit.
3. Store each entry in a protocol buffer file with the image, audio and video as Data URIs and export this file to cloud sync platforms.
Then I stumbled across greenDAO, where they mention:
greenDAO lets you persist protocol buffer (protobuf) objects directly
into the database.
What benefit will I get by storing the protobuf objects in an SQLite DB? Will I be able to export the SQLite file instead of a file containing the objects in protobuf format?

Well, the data still has to be serialized somehow into the database; greenDAO just hides the serialization from you. Since you have specific needs, you are probably best off building your own solution, tailored to your needs.
If you don't anticipate the field names changing, why not just store the entries as database rows? This has a number of nice advantages, including the ability to have sortable and searchable entries.
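To make that concrete, here is a minimal sketch of the "entries as rows" idea (Python with the built-in sqlite3 module; the table and column names are made up), with the media kept in BLOB columns instead of Data URIs:

import sqlite3

# Hypothetical schema: one row per entry, media stored as raw BLOBs.
conn = sqlite3.connect("entries.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS entry (
        id      INTEGER PRIMARY KEY,
        created INTEGER NOT NULL,   -- Unix timestamp, sortable
        text    TEXT,
        image   BLOB,               -- raw bytes instead of a Data URI
        audio   BLOB,
        video   BLOB
    )
""")

with open("photo.jpg", "rb") as f:
    photo = f.read()

conn.execute(
    "INSERT INTO entry (created, text, image) VALUES (?, ?, ?)",
    (1700000000, "First entry", sqlite3.Binary(photo)),
)
conn.commit()

# Entries stay sortable and searchable without deserializing whole records:
for row in conn.execute("SELECT id, created, text FROM entry ORDER BY created DESC"):
    print(row)

The single .db file can then be exported to the cloud sync platform as-is, which would cover the "export the SQLite file" part of the question.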

Related

cosmos database to be indexed in pull approach + files

I have items and files. There is a 1:m relationship between items and files. Items are stored in a relational database and files in folders. The association between items and files is stored in the relational database. Files can be PDFs, Word docs, emails, etc. I intend to POC Cognitive Search to be able to search items and associated documents.
My current understanding is that a pull approach might be cheaper in comparison to the push approach when using Cognitive Search (the latency requirements are not stringent and eventual consistency is OK). Hence, I intend to move the data into a Cosmos database, which can then be indexed via the pull approach. Curious: how does this work with the documents? Would I need to crack them on-prem?
There is also the option of attachments and blob storage for documents. The latter is most likely more future-proof. I would think that if I put documents into blob storage, Cognitive Search indexing would still need to crack the documents and apply skills?
This sounds like a good approach. In terms of data sources, Cognitive Search supports Cosmos DB, blob storage, and some relational databases. I would probably:
Create a new Cognitive Search resource in the Azure portal.
In that Cognitive Search resource, click "Import data" to create a new indexer (this is the "pull" option that you mention above). You may want to do this twice, assuming that your items are in CosmosDB or a relational DB, and your documents are stored separately in blob storage.
The first indexer has a data source which points to your items/relationship data in whatever DB you decide to put them, applies any skills that you want, and puts everything in an index.
The second indexer has a different data source which points to your documents in blob storage, applies any skills that you want, and puts everything in the same index.
If you use indexers, they will take care of the document cracking. If you push data directly into the index, you will need to crack the documents yourself.
This gives a simple walkthrough of creating an indexer with the portal (skillset is optional, and change the data source to your own data): https://learn.microsoft.com/en-us/azure/search/cognitive-search-quickstart-blob
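For what it's worth, here is a rough sketch of the same setup done against the REST API instead of the portal (Python with requests; the service URL, keys, data-source, index and api-version values are all placeholders you would substitute with your own):

import requests

SERVICE = "https://<your-service>.search.windows.net"   # placeholder
HEADERS = {"api-key": "<admin-key>", "Content-Type": "application/json"}
PARAMS = {"api-version": "2023-11-01"}                   # adjust to the version you use

# Data source #1: the items/association data in Cosmos DB (placeholder names).
requests.put(f"{SERVICE}/datasources/items-ds", headers=HEADERS, params=PARAMS, json={
    "name": "items-ds",
    "type": "cosmosdb",
    "credentials": {"connectionString": "<cosmos-connection-string>"},
    "container": {"name": "items"},
})

# Data source #2: the documents in blob storage.
requests.put(f"{SERVICE}/datasources/docs-ds", headers=HEADERS, params=PARAMS, json={
    "name": "docs-ds",
    "type": "azureblob",
    "credentials": {"connectionString": "<storage-connection-string>"},
    "container": {"name": "documents"},
})

# Two indexers feeding the same index; the blob indexer takes care of document cracking.
for name, ds in [("items-indexer", "items-ds"), ("docs-indexer", "docs-ds")]:
    requests.put(f"{SERVICE}/indexers/{name}", headers=HEADERS, params=PARAMS, json={
        "name": name,
        "dataSourceName": ds,
        "targetIndexName": "combined-index",   # the shared index, placeholder name
        # "skillsetName": "my-skillset",       # optional enrichment skills
    })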

Parsing FB-Purity's Firefox idb (Indexed Database API) object_data blob from Linux bash

From a Linux bash script, I want to read the structured data stored by a particular Firefox add-on called FB-Purity.
I have found a folder called .mozilla/firefox/b8eab5j0.default/storage/default/moz-extension+++37a9788c-671d-4cae-ba5c-fbdb8788499a^userContextId=4294967295/ that contains a .metadata file which contains the string moz-extension://37a9788c-671d-4cae-ba5c-fbdb8788499a, a URL which, when opened in Firefox, shows the add-on's details, so I am pretty sure that this folder belongs to the add-on.
That folder contains an idb directory, which sounds like the Indexed Database API, a W3C standard apparently used by Firefox since last year to store add-on data.
The idb folder only contains an empty folder and an SQLite file.
The SQLite file, unfortunately, does not contain much structured application data, but the object_data table contains a 95KB blob which probably contains the real structured data:
INSERT INTO `object_data` VALUES (1,'0pmegsjfoetupsf.742612367',NULL,NULL,
X'e08b0d0403000101c0f1ffe5a201000400ffff7b00220032003100380035003000320022003a002
2005300610074006f0072007500200055007205105861006e00690022002c00220036003100350036
[... 95KB ...]
00780022007d00000000000000');
Question: Any clue what this blob's format is? How to extract it (using command line or any library or Linux tool) to JSON or any other readable format?
Well, I had a fun day today figuring this out and ended up creating a Python tool that can read the data from these IndexedDB database files and print it (and maybe more at some point): moz-idb-edit
To answer the technical parts of the question first:
Both the key (name) and the data (value) use a Mozilla-proprietary format whose only documentation, at this time, appears to be its source code.
The keys use a special just-for-this-use-case encoding whose rough description is available in mozilla-central/dom/indexedDB/Key.cpp – that file also contains the only known implementation. Its unique selling point appears to be that it is relatively compact while being compatible with all the possible index types websites may throw at you, and that it sorts correctly as raw binary by default.
The values are stored using SpiderMonkey's internal StructuredClone representation, which is also used when moving values between processes in the browser. Again there are no docs to speak of, but one can read the source code, which is fortunately quite easy to understand. Before being added to the database, however, the generated binary is compressed on the fly using Google's Snappy compression, which “does not aim for maximum compression [but instead …] aims for very high speeds and reasonable compression” – probably not a bad idea considering that we're dealing with wasteful web content here.
To locate the correct IndexedDB file for an extension's local storage data, one needs to resolve the extension's static ID to a so-called “internal UUID” whose value is different in every browser profile instance (to make tracking based on installed add-ons a lot harder). The mapping table for this is stored as a pref (“extensions.webextensions.uuids”) in prefs.js. The IDB path then is ${MOZ_PROFILE}/storage/default/moz-extension+++${EXT_UUID}^userContextId=4294967295/idb/3647222921wleabcEoxlt-eengsairo.sqlite
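Putting those pieces together, a rough sketch of locating the file and getting at the (still StructuredClone-encoded) payload could look like this in Python; the prefs.js parsing is deliberately crude, the value is assumed to be a single Snappy block, and decoding the StructuredClone data itself is the part moz-idb-edit implements for you:

import json
import re
import sqlite3

import snappy  # pip install python-snappy

MOZ_PROFILE = "/home/user/.mozilla/firefox/b8eab5j0.default"  # adjust to your profile
EXT_ID = "<extension-id>"  # the add-on's static ID from its manifest.json

# 1. Resolve the static ID to the per-profile "internal UUID" via the
#    extensions.webextensions.uuids pref in prefs.js (crude parse).
prefs = open(f"{MOZ_PROFILE}/prefs.js").read()
match = re.search(r'"extensions\.webextensions\.uuids",\s*"(.+?)"\);', prefs)
uuid_map = json.loads(match.group(1).encode().decode("unicode_escape"))
ext_uuid = uuid_map[EXT_ID].strip("{}")

# 2. Open the add-on's IndexedDB SQLite file.
db_path = (f"{MOZ_PROFILE}/storage/default/moz-extension+++{ext_uuid}"
           f"^userContextId=4294967295/idb/3647222921wleabcEoxlt-eengsairo.sqlite")
conn = sqlite3.connect(db_path)

# 3. The data column is Snappy-compressed; decompressing yields SpiderMonkey's
#    StructuredClone binary, which still needs a StructuredClone reader.
#    The key column uses the Key.cpp encoding described above.
for key, data in conn.execute("SELECT key, data FROM object_data"):
    payload = snappy.decompress(data)
    print(repr(key), "->", len(payload), "bytes of StructuredClone data")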
For all practical intents and purposes you can read the value of a single storage key of any extension by downloading the project mentioned above. Basic usage is:
$ ./moz-idb-edit --extension "${EXT_ID}" --profile "${MOZ_PROFILE}" "${STORAGE_KEY}"
Where ${EXT_ID} is the extension's static ID (check its manifest.json file or look in about:support#extensions-tbody if you're unsure), ${MOZ_PROFILE} is the Firefox profile directory (also shown in about:support) and ${STORAGE_KEY} is the name of the key you'd like to query (unfortunately querying all keys is not supported yet).
Writing data is not currently supported either.
I'll update this answer as I implement more features (or drop me an issue on the project page!).

Using Azure Search index to index blobs in Azure Blob Storage (Images and Videos)

I want to index blobs of type image and video.
From what I have read Azure Search cannot index image and video types.
What I was thinking of doing was using the blob's metadata_storage_path. However, that is my key and it is encoded.
Decoding it is really a performance killer.
Is there any way I can index images and videos using an Azure Search index?
If not, is there any other way?
IIUC, you want to index the metadata attached to the blob but not its content, correct? If so, set dataToExtract parameter to storageMetadata as described in Controlling which parts of the blob are indexed.
The cost of base64-decoding the encoded metadata_storage_path to correlate with the rest of your system is likely to be negligible compared to other work your app is doing, such as calls to the database or Azure Search. However, you can avoid the need for decoding if you fork metadata_storage_path into a new non-key field in your index, which won't need to be encoded. You can use field mappings to fork the field.
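As a hedged sketch (Python with requests; the service, indexer, data-source, index and field names as well as the api-version are placeholders), the relevant parts of the indexer definition would look something like this:

import requests

SERVICE = "https://<your-service>.search.windows.net"   # placeholder
HEADERS = {"api-key": "<admin-key>", "Content-Type": "application/json"}

indexer = {
    "name": "media-blob-indexer",            # placeholder
    "dataSourceName": "media-blobs-ds",      # placeholder
    "targetIndexName": "media-index",        # placeholder; assumes the index has a "path" field
    # Index only the blobs' metadata, not the image/video content itself:
    "parameters": {"configuration": {"dataToExtract": "storageMetadata"}},
    "fieldMappings": [
        # Keep the encoded path as the document key...
        {"sourceFieldName": "metadata_storage_path",
         "targetFieldName": "id",
         "mappingFunction": {"name": "base64Encode"}},
        # ...and fork it into a plain, non-key field that needs no decoding.
        {"sourceFieldName": "metadata_storage_path",
         "targetFieldName": "path"},
    ],
}

requests.put(f"{SERVICE}/indexers/media-blob-indexer",
             headers=HEADERS,
             params={"api-version": "2023-11-01"},
             json=indexer)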

Save image url or save image file in sql database?

We can save an image in two ways:
1. upload the image to a server and save the image URL in the database.
2. save the image directly into the database.
Which one is better?
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or document are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or document are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this filegroup for the large data:
CREATE TABLE dbo.EmployeePhoto
(
    EmployeeID INT NOT NULL PRIMARY KEY,
    Photo      VARBINARY(MAX) NOT NULL
)
ON [Data]                  -- the basic "Data" filegroup for the regular row data
TEXTIMAGE_ON [LARGE_DATA]; -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
Like many questions, the answer is "it depends." Systems like SharePoint use option 2. Many ticket tracking systems (I know for sure Trac does this) use option 1.
Think also of any (potential) limitations. As your volume increases, are you going to be limited by the size of your database? This has particular relevance to hosted databases and applications where increasing the size of your database is much more expensive than increasing your storage allotment.
Saving the image to the server will work better for a website, given that these images are incidental to your site, like per-customer branding images - if you're setting up the next Flickr, obviously the answer would be different :). You'd want to set up one server to act as a file server, share out the /uploaded_images directory (or whatever you name it), and set up an application variable defining the base URL of uploaded images.
Why is it better? Cost. File servers are dirt-cheap commodity hardware. You can back up the file contents using dirt-cheap commodity (even just consumer-grade) backup software. And if your file server croaks and someone loses a day of uploaded images? Who cares - they just upload them again.
Our database server is an enterprise cluster running on an SSD SAN. Our backups and tran logs are shipped to remote sites over expensive bandwidth and kept even on tape for x period. We use it for all the data where we need the ACID (atomicity, consistency, isolation, durability) benefits of an RDBMS. We don't use it for company logos.
Store them in the database unless you have a good reason not to.
Storing them in the filesystem is premature optimization.
With a database you get referential integrity, you can back everything up at once, integrated security, etc.
The book SQL Antipatterns calls storing files in the filesystem an anti-pattern.

LocalStorage or SQLite Database?

I'm currently developing a mobile application which uses AJAX requests to get data from a server.
To enable offline navigation in my application, I need to store all data collected.
My application is fairly data-heavy: there's a section where the user can see charts (powered by Highcharts).
I'm wondering about the best solution for caching the collected data in JSON format.
Is it light or efficient enough to JSON.stringify the data array into local storage like:
localStorage.setItem("graph_1_datas", JSON.stringify(json_data_array));
Or would it be better to create a database and a table like this:
TABLE
-----
id
graphId
blockId
x
y
I have 3 graphIds per blockId, and about 10 blockIds...
Storing the JSON strings in local storage should be fairly fast and efficient. Just store a separate entry (key) for each request, which will give you clear, simple code for getting the data either from local storage or from the web service.
If you are likely to want to edit the data offline then you may wish to consider an SQLite database as it will make it easier/more efficient to add code to track changes.
You may also want to consider an SQLite database if your object graph gets more complicated and fits a relational database model.
