Is there any downside to using a GUID as the file name for an uploaded image, to avoid duplication?
Your filenames will be unique, true, but there won't be any meaningful way to sort them.
You could put a Unix timestamp in front of the GUID so that files sort by name (and support other such operations) without needing a look-up table in your database.
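A minimal sketch of that naming scheme in Python (the .jpg extension is just a placeholder):

import time
import uuid

def make_upload_name(extension=".jpg"):
    # <unix-timestamp>_<GUID> sorts chronologically by name and stays unique
    return "%d_%s%s" % (int(time.time()), uuid.uuid4(), extension)

print(make_upload_name())  # e.g. 1354060800_3f2b8c1e-....jpg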
If you store uploaded files under a name based on a hash (e.g. SHA-1) of the file contents, then you also store files with identical contents only once, saving space.
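For example, a rough Python sketch of such content-addressed storage (directory layout and extension handling are up to you):

import hashlib
import shutil
from pathlib import Path

def store_by_hash(src, upload_dir):
    # Name the file after the SHA-1 of its contents, so identical uploads
    # collapse to a single stored copy.
    src = Path(src)
    digest = hashlib.sha1(src.read_bytes()).hexdigest()
    dest = Path(upload_dir) / (digest + src.suffix.lower())
    if not dest.exists():
        shutil.copy2(src, dest)
    return dest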
I think it's unique enough, but consider writing your own generator that uses the date and time plus a serial number; I think the names would be more expressive.
There is a downside to using GUIDs in a filename: CLSIDs are GUIDs, and some virus scanners will think you are trying to use this exploit and will flag your files as potential malware.
See Microsoft Windows CLSID Hidden File Extension Vulnerability for more information.
From a Linux bash script, I want to read the structured data stored by a particular Firefox add-on called FB-Purity.
I have found a folder called .mozilla/firefox/b8eab5j0.default/storage/default/moz-extension+++37a9788c-671d-4cae-ba5c-fbdb8788499a^userContextId=4294967295/ that contains a .metadata file containing the string moz-extension://37a9788c-671d-4cae-ba5c-fbdb8788499a, a URL which, when opened in Firefox, shows the add-on's details, so I am pretty sure that this folder belongs to the add-on.
That folder contains an idb directory, which sounds like the Indexed Database API, a W3C standard apparently used by Firefox since last year to store add-on data.
The idb folder only contains an empty folder and an SQLite file.
The SQLite file, unfortunately, does not contain much structured application data directly, but the object_data table contains a 95 KB blob which probably holds the real structured data:
INSERT INTO `object_data` VALUES (1,'0pmegsjfoetupsf.742612367',NULL,NULL,
X'e08b0d0403000101c0f1ffe5a201000400ffff7b00220032003100380035003000320022003a002
2005300610074006f0072007500200055007205105861006e00690022002c00220036003100350036
[... 95KB ...]
00780022007d00000000000000');
Question: Any clue what this blob's format is? How to extract it (using command line or any library or Linux tool) to JSON or any other readable format?
Well, I had a fun day today figuring this out and ended up creating a Python tool that can read the data from these IndexedDB database files and print it (and maybe more at some point): moz-idb-edit
To answer the technical parts of the question first:
Both the key (name) and the data (value) use a Mozilla-proprietary format whose only documentation, at this time, appears to be its source code.
The keys use a special just-for-this use-case encoding whose rough description is available in mozilla-central/dom/indexedDB/Key.cpp – the file also contains the only known implementation. Its unique selling point appears to be the fact that it is relatively compact while being compatible with all the possible index types websites may throw at you as well as being in the correct binary sorting order by default.
The values are stored using SpiderMonkey's internal StructuredClone representation, which is also used when moving values between processes in the browser. Again, there are no docs to speak of, but one can read the source code, which fortunately is quite easy to understand. Before being added to the database, however, the generated binary is compressed on the fly using Google's Snappy compression, which “does not aim for maximum compression [but instead …] aims for very high speeds and reasonable compression” – probably not a bad idea considering that we're dealing with wasteful web content here.
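As a rough illustration, undoing the Snappy layer is the easy part; this sketch assumes the python-snappy package, that the values are raw (unframed) Snappy, and that the object_data columns are named key and data. Parsing the resulting StructuredClone stream is the hard part and is what moz-idb-edit actually implements.

import sqlite3
import snappy  # pip install python-snappy (assumption: raw, unframed Snappy)

def raw_values(idb_path):
    con = sqlite3.connect(idb_path)
    # assumed column names in Firefox's object_data table
    for key, data in con.execute("SELECT key, data FROM object_data"):
        # data is Snappy-compressed StructuredClone bytes
        yield key, snappy.uncompress(data)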
To locate the correct IndexedDB file for an extension's local storage data, one needs to resolve the extension's static ID to a so-called “internal UUID”, whose value is different in every browser profile instance (to make tracking based on installed add-ons a lot harder). The mapping table for this is stored as a pref (“extensions.webextensions.uuids”) in prefs.js. The IDB path then is ${MOZ_PROFILE}/storage/default/moz-extension+++${EXT_UUID}^userContextId=4294967295/idb/3647222921wleabcEoxlt-eengsairo.sqlite
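A hedged sketch of that lookup in Python (the pref value is a JSON object serialized into prefs.js with escaped quotes; the extension ID below is a placeholder):

import json
import re
from pathlib import Path

def extension_uuid(profile, ext_id):
    prefs = (Path(profile) / "prefs.js").read_text(encoding="utf-8")
    m = re.search(r'user_pref\("extensions\.webextensions\.uuids",\s*"(.*)"\);', prefs)
    if m is None:
        raise KeyError("extensions.webextensions.uuids pref not found")
    # un-escape the quotes inside the pref string, then parse the JSON mapping
    mapping = json.loads(m.group(1).replace('\\"', '"'))
    return mapping[ext_id]

profile = Path.home() / ".mozilla/firefox/b8eab5j0.default"
ext_uuid = extension_uuid(profile, "extension@example.com")  # placeholder ID
idb_dir = profile / ("storage/default/moz-extension+++%s^userContextId=4294967295/idb" % ext_uuid)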
For all practical intents and purposes you can read the value of a single storage key of any extension by downloading the project mentioned above. Basic usage is:
$ ./moz-idb-edit --extension "${EXT_ID}" --profile "${MOZ_PROFILE}" "${STORAGE_KEY}"
Where ${EXT_ID} is the extension's static ID (check its manifest.json file or look in about:support#extensions-tbody if you're unsure), ${MOZ_PROFILE} is the Firefox profile directory (also in about:support) and ${STORAGE_KEY} is the name of the key you'd like to query (unfortunately querying all keys is not supported yet).
Writing data is not currently supported either.
I'll update this answer as I implement more features (or drop me an issue on the project page!).
In my Windows 8/RT app I use a SQLite database (sqlite-net) which is stored in Isolated Storage. The database holds a lot of data, including links to files (images, PDFs and others) that I get from a web server. When I receive a link, I want to download the file and store it locally.
My question is: what is the best way to store a large number of files (100+)? One important thing: I need to be able to find the desired file quickly.
I have three ideas:
Create another database just for the files (I can't modify the existing one)
Create a folder in Isolated Storage and store the files there directly.
Create a list of files and store it in Isolated Storage.
Which would be better/faster? Or does somebody have another good solution?
100 files isn't such a big number as you can easily store up to 100k files (or folders) in a single (NTFS) directory.
If you receive the files from a webserver, then the question is whether the source guarantees there are no duplicate filenames. If this can't be assured, I'd recommend having a database table mapping the original filename and metadata to the file's hash (SHA-256 or similar) and storing the file under a filename derived from that hash.
Then, when using the file, you can hand it to the user under its original filename via the StorageFile API.
Going beyond 100k files, you could create a subfolder structure from the first two letters of the hash.
Either way, storing the file metadata in a database and the files in a directory has been the most useful approach for us in the past.
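To make the scheme concrete, here is a sketch written in Python rather than C#/sqlite-net, purely for brevity; the table and column names are made up for the example.

import hashlib
import sqlite3
from pathlib import Path

def open_catalog(db_path):
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files ("
                "original_name TEXT, sha256 TEXT UNIQUE, path TEXT)")
    return con

def store_download(con, root, original_name, content):
    digest = hashlib.sha256(content).hexdigest()
    # two-letter subfolder keeps any single directory from growing huge
    dest = Path(root) / digest[:2] / digest
    dest.parent.mkdir(parents=True, exist_ok=True)
    if not dest.exists():
        dest.write_bytes(content)
    with con:  # metadata row and file stay consistent
        con.execute("INSERT OR IGNORE INTO files(original_name, sha256, path) "
                    "VALUES (?, ?, ?)", (original_name, digest, str(dest)))
    return dest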
100 files with an average size of 1 MB is only 100 MB.
Most people say that storing binary files in a database is wrong and suggest storing the files separately and only keeping the file names in the database, but I think it is fine provided you know what you are doing and why.
The big advantage of storing files in the database is that the files and their properties are kept together logically in one place. Also, you can simply copy one file and that backs up everything.
A database also gives you transaction support. You may run into some issues reading and writing BLOBs, but it is not very difficult.
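For instance, a small sketch with Python's built-in sqlite3 module: the file bytes go into a BLOB column and each insert runs inside a transaction, so a crash cannot leave half a file behind (the table name is illustrative).

import sqlite3
from pathlib import Path

con = sqlite3.connect("files.db")
con.execute("CREATE TABLE IF NOT EXISTS blobs (name TEXT PRIMARY KEY, data BLOB NOT NULL)")

def save_file(path):
    path = Path(path)
    with con:  # commits on success, rolls back on exception
        con.execute("INSERT OR REPLACE INTO blobs(name, data) VALUES (?, ?)",
                    (path.name, path.read_bytes()))

def load_file(name):
    row = con.execute("SELECT data FROM blobs WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None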
I have an application that stores configuration files as XML on disk. I'd like to reduce the risk of data file corruption in the case of crashing etc. It seems like the common recommendation is to use SQLite.
What is your opinion on just using BLOBs to store the current XML format? The table would look like:
CREATE TABLE t ( filename TEXT, filedata BLOB )
On the one hand, this seems inelegant, but on the other it would avoid all the work (and corresponding bugs) of converting the configuration to an appropriate format.
Sounds inefficient. You'll need to load and parse the BLOB to get your configuration values as well as save the entire configuration file for every change.
I'm assuming the reason you're switching to a SQLite database is that the transaction mechanism will give you some amount of fault tolerance against crashes. If you store each of your configuration files as one BLOB, then you will need to save the entire file before the transaction completes, as opposed to just saving the updated values, which should be quicker.
In addition if you're using a DOM based XML parser you'll end up loading both the BLOB and the parsed DOM tree into memory at the same time. Depending on the size and number of your configuration files that could be resource intensive.
IMHO you're better off creating a table for each configuration file, with a row for each of your configuration values. You'll get better read/write performance, lower memory usage, and the ability to use all the relational mechanisms of SQLite.
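A rough sketch of that layout using Python's sqlite3 (folded into a single table keyed by file name and key for brevity; the names are illustrative):

import sqlite3

con = sqlite3.connect("config.db")
con.execute("CREATE TABLE IF NOT EXISTS config ("
            "file TEXT NOT NULL, key TEXT NOT NULL, value TEXT, "
            "PRIMARY KEY (file, key))")

def set_value(file, key, value):
    with con:  # only the changed row is written, not a whole XML blob
        con.execute("INSERT OR REPLACE INTO config(file, key, value) VALUES (?, ?, ?)",
                    (file, key, value))

def get_value(file, key):
    row = con.execute("SELECT value FROM config WHERE file = ? AND key = ?",
                      (file, key)).fetchone()
    return row[0] if row else None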
I'm developing a web application that lets users upload files such as images and documents. These files fall into two categories:
binary files
document files
I want to allow users to search the uploaded documents, especially using full-text search. What data types should I use for these two file types?
You can store the data in binary form and use full-text search to interpret the binary data and extract the textual information from .doc, .txt, .xls, .ppt and .htm files. The extracted text is indexed and becomes available for querying (make sure you use the CONTAINS keyword). Needless to say, full-text search has to be enabled. I'm not sure how adding a full-text index will affect your system, i.e. its size. You'll also need to look at the execution plan to ensure the index gets used at query time.
For more information look at this:
http://technet.microsoft.com/en-us/library/ms142499(SQL.90).aspx
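As a hypothetical example, querying such an index from Python with pyodbc might look like this; the DSN, table and column names are assumptions.

import pyodbc

con = pyodbc.connect("DSN=MyDocs;Trusted_Connection=yes")
cur = con.cursor()
# CONTAINS uses full-text search syntax, not LIKE patterns
cur.execute("SELECT DocId, FileName FROM Documents WHERE CONTAINS(Content, ?)",
            ('"invoice" AND "2012"',))
for doc_id, file_name in cur.fetchall():
    print(doc_id, file_name)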
Pros:
The main advantage of storing data in the database is that it makes the data "self-contained". Since all of the data is contained within the database, backing up the data, moving the data from one database server to another, replicating the database, and so on, is much easier.
You can also enable versioning of files, and it makes things easier for load-balanced web farms.
Cons:
You can read about them here: https://dba.stackexchange.com/questions/3924/sql-server-2005-large-binary-storage. But storing the data in the database is something you have to do in order to search through the files efficiently.
The other thing I could suggest is storing keywords in the database and linking them to the files on the file share.
Here is an article discussing the use of FILESTREAM with a database: http://blogs.msdn.com/b/manisblog/archive/2007/10/21/filestream-data-type-sql-server-2008.aspx
You first need to convert the PDF to text. There are libraries and tools for this sort of thing (e.g. PowerGREP). Then I'd recommend storing the text of the PDF files in a database. If you need full-text searching with logic such as "on the same line", then you'll need to store one record per line of text. If you just want to search for text anywhere in a file, you can adjust your SQL schema to match your needs.
For docx files, I would convert them to RTF and search them that way while stored in SQL.
For images, Microsoft has a program called Microsoft OneNote that does OCR (optical character recognition) so you can search for text within images. It doesn't matter what tool you use, just that it supports OCR.
Essentially, if you don't have a way to directly read the binary file, then you need to convert it to text with some library, then worry about doing your searching.
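For example, one way to get text out of a PDF before indexing it is the pypdf library (just one option among many; any PDF-to-text tool works), keeping one record per line so "on the same line" queries stay possible:

from pypdf import PdfReader  # pip install pypdf

def pdf_lines(path):
    reader = PdfReader(path)
    for page_no, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""
        for line_no, line in enumerate(text.splitlines(), start=1):
            yield page_no, line_no, line  # insert each tuple as one row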
A full-text index can be created on columns of any of the following data types: CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, NTEXT, VARBINARY, VARBINARY(MAX), IMAGE and XML.
In addition, to use full-text search you must create a full-text index on the table against which you want to run full-text queries. For a particular SQL Server table or indexed view you can create at most one full-text index.
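A hedged sketch of creating the catalog and that single full-text index from Python via pyodbc; the table, column and key-index names are assumptions, and full-text DDL cannot run inside a user transaction, hence autocommit.

import pyodbc

con = pyodbc.connect("DSN=MyDocs;Trusted_Connection=yes", autocommit=True)
cur = con.cursor()
cur.execute("CREATE FULLTEXT CATALOG DocumentsCatalog AS DEFAULT")
# TYPE COLUMN names the file-extension column so the right filter is used for VARBINARY data
cur.execute("CREATE FULLTEXT INDEX ON dbo.Documents "
            "(Content TYPE COLUMN FileExtension LANGUAGE 1033) "
            "KEY INDEX PK_Documents")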
These are two articles about it:
SQL SERVER - 2008 - Creating Full Text Catalog and Full Text Search
Using Full Text Search in SQL Server 2008
For my website I've just implemented TinyMCE (just a word processor). Everything works fine except when I try to store the resulting string in a SQL Server database. I want to store the string without the HTML tags making me exceed the 8000-character length limit (the HTML tags take up most of that space). My question is: is there a solution that lets me store my document, HTML tags included, without shortening it? Thanks
Some ideas I've had, but I'm not sure if they'll work:
Create an if statement that checks the length; if it's over 8000, split the string apart and insert the pieces into separate fields.
Maybe there is a compression feature I'm unaware of?
Paul
Could you store it as a BLOB, or possibly even FILESTREAM? I know BLOBs have a size limit of 2 GB and are probably less than ideal, depending on the average size of the content you expect, because of the hit to the log file. FILESTREAM was added in SQL Server 2008 to handle large files by writing them directly to the filesystem; you enable it by setting an attribute on the varbinary type.
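A sketch of the BLOB route from Python via pyodbc, just to show that the 8000-byte limit of CHAR/VARCHAR(n) disappears with a VARBINARY(MAX) column; the DSN, table and column names are made up for the example.

import pyodbc

con = pyodbc.connect("DSN=MySite;Trusted_Connection=yes")
cur = con.cursor()
cur.execute("IF OBJECT_ID('dbo.Articles') IS NULL "
            "CREATE TABLE dbo.Articles ("
            "ArticleId INT IDENTITY PRIMARY KEY, Body VARBINARY(MAX) NOT NULL)")

html = "<p>" + "very long document " * 2000 + "</p>"  # well past 8000 characters
cur.execute("INSERT INTO dbo.Articles(Body) VALUES (?)", (html.encode("utf-8"),))
con.commit()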