Importing a very large SQLite table to BigQuery

I have a relatively large SQLite table (5 million rows, 2 GB) that I'm trying to move to Google BigQuery. The easy solution, which I've used for other tables in the db, was to use something like SQLite Manager (the Firefox extension) to export to CSV, but this fails with what I'd imagine is an out-of-memory error when trying to export the table in question. I'm trying to think of the best way to approach this, and have come up with the following options:
1. Write something that will manually write a single, gigantic CSV. This seems like a bad idea for many reasons, but the big ones are that one of the fields is text data which will inevitably screw things up with any of the delimiters supported by BQ's import tools, and I'm not sure BQ could even support a single CSV that big.
2. Write a script to manually export everything to a series of CSVs of ~100k rows each; the main problem is that this then requires importing 50 files.
3. Write everything to a series of JSON files and figure out how to deal with them from there (same issue as above).
4. Try to import it into MySQL and then do a mysqldump, which apparently can be read by BQ.
5. Use Avro, which seems like the same as #2 except it's binary, so it will be harder to debug when it inevitably fails.
I also have some of this data on a local Elasticsearch node, but I couldn't find any way of migrating that to BQ either. Does anyone have any suggestions? Most of what I've found online has been about getting data out of BQ, not putting it in.

(2) is not a problem: BQ can import up to 10k files per import job.
BQ can also import very large CSV/JSON/Avro files, as long as the input can be sharded (text-based formats must be uncompressed, and CSV files must not contain quoted newlines). A chunked export along the lines of the sketch below keeps each file small in any case.
See https://cloud.google.com/bigquery/quota-policy#import for more.
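For what it's worth, a chunked export along the lines of options 2 and 3 is only a few dozen lines. Here is a rough Go sketch (a sketch only, not the asker's setup) that dumps the table to newline-delimited JSON in ~100k-row files, which also sidesteps the delimiter worry from option 1; the table name, column names, and the mattn/go-sqlite3 driver are assumptions to adjust to the real schema.

package main

import (
	"bufio"
	"database/sql"
	"encoding/json"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3" // assumed SQLite driver
)

// record mirrors a hypothetical two-column schema; adjust to the real table.
type record struct {
	ID   int64  `json:"id"`
	Body string `json:"body"` // free text is safe here: JSON has no delimiter problem
}

func main() {
	db, err := sql.Open("sqlite3", "data.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const chunk = 100000 // roughly option 2's "~100k rows each"
	for part := 0; ; part++ {
		rows, err := db.Query(
			"SELECT id, body FROM big_table ORDER BY id LIMIT ? OFFSET ?",
			chunk, part*chunk)
		if err != nil {
			log.Fatal(err)
		}

		out, err := os.Create(fmt.Sprintf("export-%04d.json", part))
		if err != nil {
			log.Fatal(err)
		}
		w := bufio.NewWriter(out)
		enc := json.NewEncoder(w) // Encode writes one object per line, i.e. BQ's newline-delimited JSON

		n := 0
		for rows.Next() {
			var r record
			if err := rows.Scan(&r.ID, &r.Body); err != nil {
				log.Fatal(err)
			}
			if err := enc.Encode(r); err != nil {
				log.Fatal(err)
			}
			n++
		}
		if err := rows.Err(); err != nil {
			log.Fatal(err)
		}
		rows.Close()
		w.Flush()
		out.Close()

		if n == 0 {
			os.Remove(out.Name()) // the last iteration produced an empty file
			break
		}
	}
}

LIMIT/OFFSET gets slow at deep offsets on a 5-million-row table; if there is an integer primary key, paging with WHERE id > last_seen_id is faster. Per the answer above, all of the resulting files can then go into a single load job.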

Related

Can I import data to Cloud Firestore without having to have exported first?

I want to populate my database with a lot of data, and I've found that there is an import command, but it seems to require a prior export. Can I just prepare several JSON files of my collections and use the import command to make the population faster, or is the most efficient way simply to use batched writes?
I have reviewed the documentation and haven't found anything like this, but I wanted to be completely sure I'm not ruling out a very good option. Thank you for your time.
In a nutshell: can I put my data into a specific format so that I can use the import command and load it quickly without batched writes? The documentation says I can use the import command only on data I previously exported, but that's impossible for me because I don't have any data in my database yet.
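As far as I know, the managed import only accepts the output of a prior managed export (the export format isn't documented for building by hand), so for an initially empty database batched writes through a client library are the usual route. A minimal sketch using the Go client (cloud.google.com/go/firestore), where the project ID, file name, and collection name are placeholders:

package main

import (
	"context"
	"encoding/json"
	"log"
	"os"

	"cloud.google.com/go/firestore"
)

func main() {
	ctx := context.Background()

	// "my-project" and "users.json" are placeholders for your project and data file.
	client, err := firestore.NewClient(ctx, "my-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Load one collection's documents from a plain JSON file: {"docID": {...fields...}, ...}
	raw, err := os.ReadFile("users.json")
	if err != nil {
		log.Fatal(err)
	}
	var docs map[string]map[string]interface{}
	if err := json.Unmarshal(raw, &docs); err != nil {
		log.Fatal(err)
	}

	// Firestore batches are capped at 500 writes, so commit in chunks.
	// (Newer client versions also offer BulkWriter for large loads.)
	batch := client.Batch()
	n := 0
	for id, fields := range docs {
		batch.Set(client.Collection("users").Doc(id), fields)
		n++
		if n == 500 {
			if _, err := batch.Commit(ctx); err != nil {
				log.Fatal(err)
			}
			batch = client.Batch()
			n = 0
		}
	}
	if n > 0 {
		if _, err := batch.Commit(ctx); err != nil {
			log.Fatal(err)
		}
	}
}

Each batch is capped at 500 writes, hence the chunked commits; running several loaders like this in parallel over different files is the usual way to speed up a bulk load.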

Update a single sheet in a workbook

I like using Excel as a poor man's database for storing data dictionaries and such, because Excel makes it super easy to edit the data there without the pain of installing an RDBMS.
Now I've hit an unexpected problem: I can't find a simple way to rewrite just one of the worksheets, at least not without reading and writing the whole file.
write.xlsx(df,file ="./codebook.xlsx",sheetName="mysheet",overwrite=F)
This complains that the file exists. With overwrite=T, my other sheets are lost.
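If stepping outside R is acceptable, most xlsx libraries can replace a single worksheet while leaving the rest of the workbook alone (the whole file is still rewritten on save, since .xlsx is a zip archive, but the other sheets survive). A rough Go sketch using the excelize package, with the file and sheet names taken from the question and the row contents as placeholders:

package main

import (
	"fmt"
	"log"

	"github.com/xuri/excelize/v2"
)

func main() {
	// Open the existing workbook so the other sheets are preserved.
	f, err := excelize.OpenFile("codebook.xlsx")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Drop and recreate just the one sheet we want to rewrite.
	f.DeleteSheet("mysheet")
	f.NewSheet("mysheet")

	// Write the replacement rows (a trivial two-column placeholder here).
	rows := [][]interface{}{
		{"variable", "description"},
		{"age", "age in years"},
	}
	for i, row := range rows {
		cell := fmt.Sprintf("A%d", i+1)
		if err := f.SetSheetRow("mysheet", cell, &row); err != nil {
			log.Fatal(err)
		}
	}

	if err := f.Save(); err != nil {
		log.Fatal(err)
	}
}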

Neo4j Configuration with Gephi

I want to use Neo4j to store a number of graphs I created in python. I was using Gephi for visualization, and I thought the export to Neo4j plugin would be a very simple way to get the data across. The problem is that the server is seemingly not recognizing the neostore...db files that Gephi generated.
I'm guessing I configured things incorrectly, but is there a way to fix that?
Alternatively, I'm also open to importing the files directly. I have two files: one with node titles and attributes and another with an edge list of title to title.
I'm guessing that I would need to convert the titles to ids, right? What would be the fastest way to do that?
Thank you in advance!
If you have the data as tab-separated CSV files, feel free to import them directly. There are some options; check out this page: http://www.neo4j.org/develop/import
In particular, the CSV batch importer can help you: http://maxdemarzi.com/2012/02/28/batch-importer-part-1/
Or if it is just a little bit of data, use the spreadsheet approach: http://blog.neo4j.org/2013/03/importing-data-into-neo4j-spreadsheet.html
Please report back if you were successful.
I used Gephi to generate a Neo4j store file directory in the past, and it worked like a charm...
I assume you deleted the default graph.db directory and renamed your Gephi-generated directory to graph.db? That worked for me...
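On the title-to-id part of the question: a small preprocessing pass that numbers each title and rewrites the edge list with those numbers is roughly what the batch importers above expect. A rough Go sketch using only the standard library; the file names and column layout (title in the first column of nodes.csv, source and target titles in edges.csv) are assumptions, and the exact headers your importer wants may differ:

package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

// readAll loads a whole CSV file into memory.
func readAll(path string) [][]string {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	return rows
}

// writeAll writes the rows back out as CSV.
func writeAll(path string, rows [][]string) {
	f, err := os.Create(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := csv.NewWriter(f).WriteAll(rows); err != nil {
		log.Fatal(err)
	}
}

func main() {
	// nodes.csv: title,attr1,attr2,...   edges.csv: sourceTitle,targetTitle
	nodes := readAll("nodes.csv")
	edges := readAll("edges.csv")

	// Assign each title a numeric id in file order.
	ids := make(map[string]int, len(nodes))
	nodesOut := make([][]string, 0, len(nodes))
	for i, row := range nodes {
		ids[row[0]] = i
		nodesOut = append(nodesOut, append([]string{strconv.Itoa(i)}, row...))
	}

	// Rewrite the edge list in terms of those ids.
	// Assumes every title in edges.csv also appears in nodes.csv.
	edgesOut := make([][]string, 0, len(edges))
	for _, e := range edges {
		edgesOut = append(edgesOut, []string{
			strconv.Itoa(ids[e[0]]),
			strconv.Itoa(ids[e[1]]),
		})
	}

	writeAll("nodes_with_ids.csv", nodesOut)
	writeAll("edges_with_ids.csv", edgesOut)
}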

Store map key/values in a persistent file

I will be creating a structure more or less of the form:
type FileState struct {
	LastModified int64
	Hash         string
	Path         string
}
I want to write these values to a file and read them in on subsequent calls. My initial plan is to read them into a map and lookup values (Hash and LastModified) using the key (Path). Is there a slick way of doing this in Go?
If not, what file format can you recommend? I have read about and experimented with some key/value file stores in previous projects, but not in Go. Right now, my requirements are probably fairly simple, so a big database server system would be overkill. I just want something I can write to and read from quickly, easily, and portably (Windows, Mac, Linux). Because I have to deploy on multiple platforms, I am trying to keep my non-Go dependencies to a minimum.
I've considered XML, CSV, JSON. I've briefly looked at the gob package in Go and noticed a BSON package on the Go package dashboard, but I'm not sure if those apply.
My primary goal here is to get up and running quickly, which means the least amount of code I need to write along with ease of deployment.
As long as your entire data set fits in memory, you shouldn't have a problem. Using an in-memory map and writing snapshots to disk regularly (e.g. by using the gob package) is a good idea. The Practical Go Programming talk by Andrew Gerrand uses this technique.
If you need to access those files from different programs, using a popular encoding like JSON or CSV is probably a good idea. If you only have to access those files from within Go, I would use the excellent gob package, which has a lot of nice features.
As soon as your data gets bigger, it's not a good idea to write the whole database to disk on every change, and your data might no longer fit in RAM. In that case, you might want to take a look at the leveldb key-value database package by Nigel Tao, another Go developer. It's currently under active development (and not yet usable), but it will also offer some advanced features like transactions and automatic compression. Also, the read/write throughput should be quite good because of the leveldb design.
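A minimal sketch of that snapshot approach, reusing the FileState type from the question with the map keyed by Path (the snapshot file name is arbitrary):

package main

import (
	"encoding/gob"
	"log"
	"os"
)

type FileState struct {
	LastModified int64
	Hash         string
	Path         string
}

// save writes a snapshot of the whole map to disk.
func save(path string, states map[string]FileState) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewEncoder(f).Encode(states)
}

// load reads a previously saved snapshot; a missing file yields an empty map.
func load(path string) (map[string]FileState, error) {
	states := make(map[string]FileState)
	f, err := os.Open(path)
	if os.IsNotExist(err) {
		return states, nil
	}
	if err != nil {
		return nil, err
	}
	defer f.Close()
	err = gob.NewDecoder(f).Decode(&states)
	return states, err
}

func main() {
	states, err := load("filestate.gob")
	if err != nil {
		log.Fatal(err)
	}

	// Look up and update entries by Path, then snapshot back to disk.
	states["/tmp/a.txt"] = FileState{LastModified: 1700000000, Hash: "abc123", Path: "/tmp/a.txt"}
	if s, ok := states["/tmp/a.txt"]; ok {
		log.Printf("hash=%s modified=%d", s.Hash, s.LastModified)
	}

	if err := save("filestate.gob", states); err != nil {
		log.Fatal(err)
	}
}

save rewrites the whole snapshot each time, which is exactly the "fits in memory, snapshot regularly" trade-off described above.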
There's an ordered key-value persistence library for Go that I wrote, called gkvlite:
https://github.com/steveyen/gkvlite
JSON is very simple but produces bigger files because of the repeated field names. XML has no advantage. You should go with CSV, which is really simple too; your program will be less than a page.
But it depends, in fact, on your modifications. If you make a lot of modifications and must have them stored synchronously on disk, you may need something a little more complex than a single file. If your map is mostly read-only, or if you can afford to dump it to file only occasionally (not every second), a single CSV file alongside an in-memory map will keep things simple and efficient.
BTW, use Go's encoding/csv package to do this.
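A short sketch of that CSV variant, again keyed by Path; each record is path,hash,lastModified, and encoding/csv quotes any field that happens to contain a comma:

package main

import (
	"encoding/csv"
	"log"
	"os"
	"strconv"
)

type FileState struct {
	LastModified int64
	Hash         string
	Path         string
}

// saveCSV dumps the map as one path,hash,lastModified record per line.
func saveCSV(path string, states map[string]FileState) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	w := csv.NewWriter(f)
	for _, s := range states {
		rec := []string{s.Path, s.Hash, strconv.FormatInt(s.LastModified, 10)}
		if err := w.Write(rec); err != nil {
			return err
		}
	}
	w.Flush()
	return w.Error()
}

// loadCSV reads the dump back into a map keyed by Path.
func loadCSV(path string) (map[string]FileState, error) {
	states := make(map[string]FileState)
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	records, err := csv.NewReader(f).ReadAll()
	if err != nil {
		return nil, err
	}
	for _, r := range records {
		mod, err := strconv.ParseInt(r[2], 10, 64)
		if err != nil {
			return nil, err
		}
		states[r[0]] = FileState{Path: r[0], Hash: r[1], LastModified: mod}
	}
	return states, nil
}

func main() {
	states := map[string]FileState{
		"/tmp/a.txt": {LastModified: 1700000000, Hash: "abc123", Path: "/tmp/a.txt"},
	}
	if err := saveCSV("filestate.csv", states); err != nil {
		log.Fatal(err)
	}
	back, err := loadCSV("filestate.csv")
	if err != nil {
		log.Fatal(err)
	}
	log.Println(back["/tmp/a.txt"].Hash)
}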

FileHelpers: Excel to Oracle DB

I want to import Excel data into an Oracle DB. I've had enough help with the Excel part; can you help me with the Oracle side?
Is it possible to import into an Oracle DB with FileHelpers? Please provide some sample code for reference.
Thanks
Dee
If you save the spreadsheet data as .csv files, it is relatively straightforward to import it into Oracle using SQL*Loader or external tables. Because we can use SQL with external tables, they are generally easier to work with, and so are preferable to SQL*Loader in almost all cases. SQL*Loader is the better choice when dealing with huge amounts of data and when ultra-fast loading is paramount.
Find out more about external tables here. You'll find the equivalent reference for SQL*Loader here.
A simple and relatively fool-proof way to do that for one-off tasks is to create a new column in the Excel sheet containing a formula like this:
="insert into foobar values('"&A1&"','"&A2&"');"
Copy that to all rows, then copy the whole column into an editor and run it in SQL*Plus or SQL Developer.
I am using SQL*Loader to upload data from CSV files to an Oracle DB.
