I have a binary dump of a mobile phone. The dump stores the messages and the phone-book contacts. I have already extracted the messages from it, but now I have to extract the contacts saved in the phone book. The data in this binary file is stored in SQLite format, as I found the string 53514C69746520666F726D617420330000 in the file. How do I extract the list of contacts saved in the phone book?
You need to first work out the format of the file from which you are extracting information, then write code to extract it. A good starting point would be The SQLite Database File Format.
The first part of that string you give (53514C69746520666F726D6174203300) is ASCII hex for SQLite format 3<nul>, which matches the header shown in that link above, so that may go some way toward helping you figure out how best to process it.
That said, given that it appears to be just a normal SQLite database file, you may get lucky and be able to use it as-is with a normal SQLite instance. That would be the first thing I'd try, since you can then use regular SQL queries to output the data in a more usable form.
For example, if the file is called pax.db, simply run:
sqlite3 pax.db
to open it, then you may find you can use all the regular investigative commands like .databases, .schema, .tables and so on.
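If the SQLite header is not at the very start of your dump, you may need to carve the database out first. Here is a rough Python sketch of that idea (the file names are placeholders; it simply copies everything from the header to the end of the dump, which often works in practice since unreferenced trailing bytes are ignored):

import sqlite3

# Carve the embedded SQLite database out of the raw phone dump.
# "phone.bin" and "carved.db" are placeholder file names.
with open("phone.bin", "rb") as f:
    dump = f.read()

start = dump.find(b"SQLite format 3\x00")    # the header string you found as hex
if start < 0:
    raise SystemExit("no SQLite header found in the dump")

with open("carved.db", "wb") as f:
    f.write(dump[start:])                    # header up to the end of the dump

# Inspect it like any normal SQLite database and look for a contacts table.
conn = sqlite3.connect("carved.db")
for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    print(name)
conn.close()

Once you can see the schema, a plain SELECT against whatever table holds the phone book should give you the contact list.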
Related
I'm trying to use Azure Data Explorer to ingest some logs (IIS Logs, POP3 logs, IMAP logs) that contain values delimited by space.
I would have expected Azure Data Explorer to infer the correct schema from the files as separate columns; however, it only identifies a single column containing the entire data.
The reason for this seems to be the header and metadata rows, which I can't find a way to skip (I would have expected there to be an option for that).
However, even if I manually remove the metadata rows from the log file, it still doesn't seem to recognize the schema for the table.
I have also tried creating the table beforehand with KQL queries and pointing the import at the already existing table instead of creating a new one. However, when I do this, it doesn't identify any rows to be imported from the logs.
I'm not sure what else can be done; I thought Azure Data Explorer (and Log Explorer, which I tried too and which behaves the same way) would be a perfect solution for log files created by Windows apps.
The documentation would have been a good starting point.
It is very clear about which formats are supported for ingestion.
IIS Logs, POP3 logs & IMAP logs are not listed.
Data formats supported by Azure Data Explorer for ingestion
As for the TXT format, an entire line is ingested as a single value; no additional parsing is done there.
Format: TXT
Extension: .txt
Description: A text file with lines delimited by \n. Empty lines are skipped.
You could use the TXT format to load the data and then parse it and split it into columns within ADX, probably by using a regex.
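If the regex route inside ADX gets awkward, another option (not what the answer above describes, just an alternative) is to pre-process the log before ingestion so the columns are already split. A rough Python sketch, assuming a W3C-style IIS log with a #Fields: directive line; the file names are placeholders:

import csv

# Convert a space-delimited W3C/IIS log into CSV so ADX can infer the schema.
# "u_ex200101.log" and "iis.csv" are placeholder file names.
with open("u_ex200101.log", encoding="utf-8") as src, \
     open("iis.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    header_written = False
    for line in src:
        line = line.rstrip("\n")
        if line.startswith("#Fields:"):
            if not header_written:
                # Turn the directive into a CSV header row.
                writer.writerow(line[len("#Fields:"):].split())
                header_written = True
        elif line.startswith("#") or not line.strip():
            continue                     # drop the other metadata rows and blanks
        else:
            writer.writerow(line.split(" "))

The resulting CSV then ingests with the CSV format, which is in the supported-formats list.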
From a Linux bash script, I want to read the structured data stored by a particular Firefox add-on called FB-Purity.
I have found a folder called .mozilla/firefox/b8eab5j0.default/storage/default/moz-extension+++37a9788c-671d-4cae-ba5c-fbdb8788499a^userContextId=4294967295/ that contains a .metadata file containing the string moz-extension://37a9788c-671d-4cae-ba5c-fbdb8788499a, a URL which, when opened in Firefox, shows the add-on's details, so I am pretty sure that this folder belongs to the add-on.
That folder contains an idb directory, which sounds like the Indexed Database API, a W3C standard apparently used by Firefox since last year to store add-on data.
The idb folder only contains an empty folder and an SQLite file.
The SQLite file, unfortunately, does not contain much readable structured application data, but the object_data table contains a 95KB blob which probably contains the real structured data:
INSERT INTO `object_data` VALUES (1,'0pmegsjfoetupsf.742612367',NULL,NULL,
X'e08b0d0403000101c0f1ffe5a201000400ffff7b00220032003100380035003000320022003a002
2005300610074006f0072007500200055007205105861006e00690022002c00220036003100350036
[... 95KB ...]
00780022007d00000000000000');
Question: Any clue what this blob's format is? How to extract it (using command line or any library or Linux tool) to JSON or any other readable format?
Well, I had a fun day today figuring this out and ended up creating a Python tool that can read the data from these IndexedDB database files and print it (and maybe more at some point): moz-idb-edit
To answer the technical parts of the question first:
Both the key (name) and the data (value) use a Mozilla-proprietary format whose only documentation appears to be its source code at this time.
The keys use a special just-for-this use-case encoding whose rough description is available in mozilla-central/dom/indexedDB/Key.cpp – the file also contains the only known implementation. Its unique selling point appears to be the fact that it is relatively compact while being compatible with all the possible index types websites may throw at you as well as being in the correct binary sorting order by default.
The values are stored using SpiderMonkey's internal StructuredClone representation, which is also used when moving values between processes in the browser. Again, there are no docs to speak of, but one can read the source code, which fortunately is quite easy to understand. Before being added to the database, however, the generated binary is compressed on the fly using Google's Snappy compression, which “does not aim for maximum compression [but instead …] aims for very high speeds and reasonable compression” – probably not a bad idea considering that we're dealing with wasteful web content here.
To locate the correct indexedDB file for an extension's local storage data, one needs to resolve the extension's static ID to a so-called “internal UUID” whose value is different in every browser profile instance (to make tracking based on installed add-ons a lot harder). The mapping table for this is stored as a pref (“extensions.webextensions.uuids”) in prefs.js. The IDB path then is ${MOZ_PROFILE}/storage/default/moz-extension+++${EXT_UUID}^userContextId=4294967295/idb/3647222921wleabcEoxlt-eengsairo.sqlite
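For reference, a rough Python sketch of that lookup (a simplified re-implementation, not code taken from moz-idb-edit; it assumes the pref sits on a single user_pref(...) line and that the stored UUID is wrapped in braces, which the directory name omits):

import json
import re
from pathlib import Path

def extension_idb_path(profile_dir: str, ext_id: str) -> Path:
    """Resolve an extension's static ID to its indexedDB sqlite file."""
    prefs = Path(profile_dir, "prefs.js").read_text(encoding="utf-8")
    # The mapping is a JSON string embedded in a user_pref(...) call,
    # so the inner quotes are escaped as \" and need to be unescaped.
    match = re.search(
        r'user_pref\("extensions\.webextensions\.uuids",\s*"(.*?)"\);', prefs)
    if match is None:
        raise LookupError("extensions.webextensions.uuids pref not found")
    uuids = json.loads(match.group(1).replace('\\"', '"'))
    internal_uuid = uuids[ext_id].strip("{}")
    return Path(
        profile_dir, "storage", "default",
        f"moz-extension+++{internal_uuid}^userContextId=4294967295",
        "idb", "3647222921wleabcEoxlt-eengsairo.sqlite")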
For all practical purposes, you can read the value of a single storage key of any extension by downloading the project mentioned above. Basic usage is:
$ ./moz-idb-edit --extension "${EXT_ID}" --profile "${MOZ_PROFILE}" "${STORAGE_KEY}"
Where ${EXT_ID} is the extension's static ID (check its manifest.json file or look in about:support#extensions-tbody if you're unsure), ${MOZ_PROFILE} is the Firefox profile directory (also shown in about:support), and ${STORAGE_KEY} is the name of the key you'd like to query (unfortunately, querying all keys is not supported yet).
Writing data is not currently supported either.
I'll update this answer as I implement more features (or drop me an issue on the project page!).
I am trying to run a search query on my SQLite db and am having problems with special characters that are stored.
I have a column called site_name which contains records like castle, chàteau, church. When someone uses chateau as their search term, I want it to pull out the chàteau record.
Is there a method for handling this in SQLite?
Thanks
See here
The link references Android development, but it appears to answer your question.
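If the query is issued from application code rather than from the sqlite shell, one common workaround (a sketch of the general idea, not taken from the linked answer) is to register a small user-defined function that strips accents and compare the normalized values. In Python, using the site_name column from the question and placeholder database/table names:

import sqlite3
import unicodedata

def unaccent(text):
    # Decompose characters and drop the combining accent marks.
    if text is None:
        return None
    return "".join(c for c in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(c))

conn = sqlite3.connect("sites.db")                 # placeholder database name
conn.create_function("unaccent", 1, unaccent)

term = "chateau"
rows = conn.execute(
    "SELECT site_name FROM sites WHERE unaccent(site_name) LIKE unaccent(?)",
    ("%" + term + "%",)).fetchall()
print(rows)                                        # matches 'chàteau' as well

The trade-off is that the function runs on every row, so for large tables you would typically store a pre-normalized copy of the column and index that instead.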
I am developing a web application that lets users upload files such as images and documents. These files fall into two categories:
binary files
document files
I want to allow users to search the documents they have uploaded, especially using full-text search. Which data types should I use for these two file types?
You can store the data as binary and use full-text search to interpret the binary data and extract the textual information: .doc, .txt, .xls, .ppt, .htm. The extracted text is indexed and becomes available for querying (make sure you use the CONTAINS keyword). Needless to say, full-text search has to be enabled. I'm not sure how adding a full-text index will affect your system, i.e. its size. You'll also need to look at the execution plan to ensure the index gets used at query time.
For more information look at this:
http://technet.microsoft.com/en-us/library/ms142499(SQL.90).aspx
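To illustrate the query side, here is a minimal sketch assuming SQL Server with a full-text index already built on a varbinary(max) column; the connection string, table and column names are placeholders:

import pyodbc

# Placeholder connection string, table and column names.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=Files;"
    "Trusted_Connection=yes;")

search_term = "invoice"
rows = conn.cursor().execute(
    """
    SELECT Id, FileName
    FROM dbo.Documents
    WHERE CONTAINS(Content, ?)   -- Content is the indexed varbinary(max) column
    """,
    search_term).fetchall()
for row in rows:
    print(row.Id, row.FileName)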
Pros:
The main advantage of storing data in the database is that it makes the data "self-contained". Since all of the data is contained within the database, backing up the data, moving the data from one database server to another, replicating the database, and so on, is much easier.
You can also enable versioning of files, and it makes things easier for load-balanced web farms.
Cons:
You can read about them here: https://dba.stackexchange.com/questions/3924/sql-server-2005-large-binary-storage. But storing the data in the database is something you have to do in order to search through the files efficiently.
The other thing I could suggest is storing keywords in the database and then linking them to the files on the file share.
Here is an article discussing the use of the FILESTREAM data type with a database: http://blogs.msdn.com/b/manisblog/archive/2007/10/21/filestream-data-type-sql-server-2008.aspx
You first need to convert the PDF to text. There are tools for this sort of thing (e.g., PowerGREP). Then I'd recommend storing the text of the PDF files in a database. If you need to do full-text searching and logic such as "on the same line", then you'll need to store one record per line of text. If you just want to search for text in a file, then you can change the structure of your SQL schema to match your needs.
For docx files, I would convert them to RTF and search them that way while stored in SQL.
For images, Microsoft has a program called Microsoft OneNote that does OCR (optical character recognition) so you can search for text within images. It doesn't matter what tool you use, just that it supports OCR.
Essentially, if you don't have a way to directly read the binary file, then you need to convert it to text with some library, then worry about doing your searching.
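As a small sketch of that "convert to text first" step for PDFs, here is one way to pull the text out line by line with the pypdf library (my choice of library and output layout, not something prescribed by the answer); each printed tuple could become one row in your table:

from pypdf import PdfReader

# Extract the text of a PDF one line at a time, ready to be stored
# as one record per line ("contract.pdf" is a placeholder file name).
reader = PdfReader("contract.pdf")
for page_number, page in enumerate(reader.pages, start=1):
    text = page.extract_text() or ""
    for line_number, line in enumerate(text.splitlines(), start=1):
        if line.strip():
            print(page_number, line_number, line)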
A full-text index can be created for columns that use any of the following data types: CHAR, NCHAR, VARCHAR, NVARCHAR, TEXT, NTEXT, VARBINARY, VARBINARY(MAX), IMAGE and XML.
In addition, to use full-text search you must create a full-text index on the table against which you want to run full-text search queries. For a particular SQL Server table or indexed view, you can create at most one full-text index.
Here are two articles about it:
SQL SERVER - 2008 - Creating Full Text Catalog and Full Text Search
Using Full Text Search in SQL Server 2008
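For completeness, here is a sketch of the index-creation step driven from Python with pyodbc; the catalog, table, column and key-index names are all hypothetical and would need to match your own schema:

import pyodbc

# Full-text DDL cannot run inside a user transaction, hence autocommit.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=.;DATABASE=Files;"
    "Trusted_Connection=yes;", autocommit=True)
cur = conn.cursor()

# One catalog per database, then at most one full-text index per table.
cur.execute("CREATE FULLTEXT CATALOG DocumentsCatalog AS DEFAULT")
cur.execute("""
    CREATE FULLTEXT INDEX ON dbo.Documents
        (Content TYPE COLUMN FileExtension LANGUAGE 1033)
        KEY INDEX PK_Documents
        ON DocumentsCatalog
""")

The TYPE COLUMN clause points at a column holding the file extension (e.g. '.docx'), which tells the indexer which filter to use for the varbinary content.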
An ASP.NET app inserts a Microsoft Word 2007 .docx file into a row of a DB2 OS/390 BLOB table. A different VB.NET app retrieves the BLOB data from DB2 OS/390. The VB.NET app launches Microsoft Word to open the .docx file, but Word then pops up a message that the data is corrupted. Word will let you repair the data so the file can be viewed, but that means extra steps and users complain.
I've seen some examples where .docx can be converted to .doc but they only talk about stripping out the text. Some of our .docx have pictures in them.
Any ideas?
I see that this question is 10 months old. I hope it's not too late to be helpful.
Neither DB2 nor any other database that allows a "Blob" data type would know that the data came from a .docx file, or do anything that would cause Word to complain. The data is supposed to be an exact copy of whatever data you pass to it.
Similarly, the Word document does not "know" that it has been copied to a BLOB object and then back.
Therefore, the problem is almost certainly with your handling of the BLOB data, in one or both of your programs.
Please run your first program to copy the .docx file into the database, then run the second one to read it back out. Then use a byte-by-byte tool to compare the two files. One way to do this would be to open a command window and type:
fc /b Doc1.docx Doc2.docx
If you have access to some better compare tools, by all means use them... but make sure they look at EVERY BYTE, not just the printable characters.
Obviously, you ARE going to find differences, or else Microsoft Word wouldn't give you errors on the second one when the first one is just fine. Once you see what the differences are, hopefully you will understand what is going wrong and how to fix them.
I had a similar problem several years ago (I was storing graphics, but it's the same basic problem). It turns out that the document size was being affected - I would store 8005 bytes into the BLOB object, and when I read it back out I was getting 8192 bytes. NUL (0) bytes were being appended to the end of the data.
My solution at the time was to append an "X" to the end of the BLOB data when I wrote it to the database. Then, when I read it back, I would search for the very last "X" in the data and remove it, along with any data after it. That way, I could recover the original data. What I should have done was store the data length in the database along with the BLOB data. Then you could truncate the file to that size, eliminating the corruption.
If appended NUL bytes aren't your problem, then you'll need to do something else to fix the problem. But you don't have a clue until you know what changed. Something did.
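For what it's worth, here is a small Python sketch of that "store the length alongside the blob" fix; it uses SQLite purely for illustration, since the technique is the same whichever database pads the data:

import sqlite3

# Store the original byte length next to the blob so any padding added
# by the storage layer can be stripped again on the way out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB, body_len INTEGER)")

with open("report.docx", "rb") as f:               # placeholder file name
    data = f.read()
conn.execute("INSERT INTO docs (body, body_len) VALUES (?, ?)", (data, len(data)))

body, body_len = conn.execute(
    "SELECT body, body_len FROM docs WHERE id = 1").fetchone()
with open("report_out.docx", "wb") as f:
    f.write(body[:body_len])                       # truncate away any appended NULs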