Split sqlite file into chunks for appcfg.py - sqlite

I have a 750 MB sql3 file that I want to load into appcfg.py, a program that can restore App Engine data. It's taking forever to load. Is there a way I could split it into smaller, totally separate chunks to be loaded independently?
I don't need to run queries across the data, or maintain any other kind of relationship. I just need to copy a list of the records to my appengine app.
Elaboration:
I'm trying to restore a 750 MB sql3 file I got from
appcfg.py download_data --appl=myapp --url=https://myapp.appspot.com/remote_path --file=backup.sql3
Now, I'm trying to restore the file with
appcfg.py upload_data --appl=restoreapp --url=https://restoreapp.appspot.com/remote_api --file=backup.sql3
I also set some parameters tweaking the default limits.
This prints out some initial logging information, repeating the parameters, etc. Then nothing happens for about 45 minutes, except that Python uses about 50% CPU for the duration. Then, finally, it starts to upload to App Engine.
From there, it seems to work. But if there's an error in the transmission, I have to wait the 45 minutes again, even after specifying the progress database. That's why I'm looking for a way to split up the file, or something along those lines.
FWIW, both the original app and the restore app use the Java SDK.
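One approach worth sketching (untested against appcfg.py itself, so treat it as an assumption): the .sql3 file is an ordinary SQLite database, so a short Python script can copy its rows into several smaller SQLite files. The table name result and the reuse of its CREATE TABLE statement below are guesses about the bulkloader's schema; check the real layout first with sqlite3 backup.sql3 ".schema", and verify that upload_data accepts a chunk file before relying on this.

# Copy rows from backup.sql3 into several smaller SQLite files.
# "result" is an assumed table name -- verify it with .schema first.
import sqlite3

SOURCE = "backup.sql3"
CHUNK_ROWS = 100000  # rows per output file; tune as needed

src = sqlite3.connect(SOURCE)
schema_sql = src.execute(
    "SELECT sql FROM sqlite_master WHERE type='table' AND name='result'"
).fetchone()[0]

cursor = src.execute("SELECT * FROM result")
chunk = 0
while True:
    rows = cursor.fetchmany(CHUNK_ROWS)
    if not rows:
        break
    dest = sqlite3.connect("backup_chunk_%d.sql3" % chunk)
    dest.execute(schema_sql)  # recreate the table with its original definition
    placeholders = ",".join("?" * len(rows[0]))
    dest.executemany("INSERT INTO result VALUES (%s)" % placeholders, rows)
    dest.commit()
    dest.close()
    chunk += 1
src.close()

Each chunk file could then be passed to upload_data with its own --file argument and its own progress database.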

Related

Download CSV in Shiny app every 24 hours & display download time

I have a CSV that I want to download. I do not want it to download every time a user joins or uses the app.
I want to run the code every 24 hours and also display any of: 1) a timer since the last download, 2) a timer until the next download, or 3) a timestamp of the last download.
Below is what I have right now, which works, but will probably cause unnecessary downloads. Is doing something with invalidateLater going to work, or is there a better way?
CSV.Path <- "https://oracleselixir-downloadable-match-data.s3-us-west-2.amazonaws.com/2021_LoL_esports_match_data_from_OraclesElixir_20210404.csv"
download.file(CSV.Path, "lol2021")
lol2021 <- read.csv("lol2021")
There are two ways to approach this:
Check to see if it should be downloaded when the app starts; if the file is more recent than 24h, do not re-download it. This can be resolved fairly easily with:
# Re-download only if the local copy is missing or older than one day
fileage <- difftime(Sys.time(), file.info("lol2021")[["mtime"]], units = "days")
if (is.na(fileage) || fileage > 1) {
  CSV.Path <- "https://oracleselixir-downloadable-match-data.s3-us-west-2.amazonaws.com/2021_LoL_esports_match_data_from_OraclesElixir_20210404.csv"
  download.file(CSV.Path, "lol2021")
}
lol2021 <- read.csv("lol2021")
(The is.na is there in case the file does not exist.)
One complicating factor with this is that two simultaneous users might attempt to download it at the same time. There should likely be some mutex file-access control here if that is a possibility.
Make sure this script is run every 24h, regardless of what users are or are not using the app. On what type of server are you running this app? Something like shiny-server does not do cron-like running, I believe, and you might not be able to guarantee that the app is "awake" every 24h. RStudio Connect does allow scheduled jobs, which might be a consideration for you.
Lacking that, if you have good access to the server, you might just add it as a cron job using Rscript or similar to download and overwrite the file.
Note about mutex file access: many networked filesystems (common in cloud and server architectures) do not guarantee file locking. A common technique is to download into a temporary file and then move (or copy) that temp file onto the "real" file name in one step. This guards against one process reading from the file while another process is writing to it; partial-file reads would be a frustrating and difficult-to-reproduce bug.
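To make that last point concrete, here is a minimal sketch of the temp-file-then-rename idea, shown in Python for illustration (the URL and file names are placeholders; the same pattern works in R with download.file() to a temp path followed by file.rename()).

# Sketch of "download to a temp file, then rename over the real one".
# URL and target name are placeholders, not values from the thread.
import os
import tempfile
import urllib.request

url = "https://example.com/data.csv"
target = "lol2021"

target_dir = os.path.dirname(os.path.abspath(target))
fd, tmp_path = tempfile.mkstemp(dir=target_dir)   # temp file on the same filesystem
try:
    with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as resp:
        tmp.write(resp.read())
    os.replace(tmp_path, target)   # atomic rename: readers never see a partial file
except BaseException:
    if os.path.exists(tmp_path):
        os.unlink(tmp_path)
    raise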

ADF Pipeline Bulk copy activity to Log files

I have a bulk copy template set up in ADF for an Azure Blob Storage data transfer. This activity will dynamically produce 'n' number of files.
I need to write a log file (txt format) after the pipeline activity has finished.
The log file should contain the pipeline start and completion datetimes, as well as the number of files produced, the status, etc.
What is the best way, or which activity should I choose, to do this?
Firstly, I have to say that ADF won't generate log files about the execution information automatically. You could see Visually monitor and Programmatically monitor for activities in ADF.
In the above link, you can get the start time of the pipeline: Run Start. Even though it does not expose a Run End, you can calculate it yourself: Run End = Run Start + Duration.
As for the number of files, please refer to this link.
Anyway, all these metrics need to be retrieved programmatically, I think; you can choose whichever language you are most comfortable with.
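For example, if Python is an option, a hedged sketch with the azure-mgmt-datafactory SDK could append a run's start, end and status to a text file. The subscription, resource group, factory name and run id below are placeholders, and the exact model fields may differ between SDK versions, so verify against the SDK docs.

# Sketch: fetch one pipeline run's metadata and append it to a txt log.
# Placeholders: subscription id, resource group, factory name, run id.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
run = client.pipeline_runs.get("<resource-group>", "<factory-name>", "<pipeline-run-id>")

with open("pipeline_run_log.txt", "a") as log:
    log.write("pipeline=%s start=%s end=%s status=%s\n"
              % (run.pipeline_name, run.run_start, run.run_end, run.status))

The number of files written would still have to come from the copy activity's output (or by listing the destination container), so that part is left out here.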

Failing to upload JSON file through Chrome to Firebase Database

This is really frustrating. I have a 104 MB JSON file that I want to upload to my Firebase database through the web front end, but after a random period of time (I've timed it, it's not constant, anywhere from 2 to 20 seconds) I get the error:
There was a problem contacting the server. Try uploading your file again.
So I do try again, and it just keeps failing. I've uploaded files nearly this big before, and the limit for stored data in the Realtime Database is 1 GB; I'm not even close to that. Why does it keep failing to upload?
This is the error I get in chrome dev tools:
Failed to load resource: net::ERR_CONNECTION_ABORTED
https://project.firebaseio.com/.upload?auth=eyJhbGciOiJIUzI1NiIsInR5cCI6…Q3NiwiYWRtaW4iOnRydWUsInYiOjB9.CihvjvLSlx43nOBynAJeyibkBRtygeRlG4Yo1t3jKVA
Failed to load resource: net::ERR_CONNECTION_ABORTED
If I click on the link that shows up in the error, it's a page with the words POST request required.
Turns out the answer is to ignore the web importer entirely and use firebase-import. It worked perfectly the first time and only took a minute to upload the whole JSON. It also has merging capabilities.
Using firebase-import as the accepted answer suggested, I get this error:
Error: WRITE_TOO_BIG: Data to write exceeds the maximum size that can be modified with a single request.
However, with the firebase-cli I was successful in deleting my entire database:
firebase database:remove /
It seems to traverse down your database tree automatically, finding requests that are under the size limit, and then issues multiple delete requests. It takes some time, but it definitely works.
You can also import via a json file:
firebase database:set / data.json
I'm unsure if firebase database:set supports merging.
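If the file is too large for a single write, one workaround worth trying (assuming the root of data.json is an object whose top-level keys are independent) is to split it into one file per top-level key and set each path separately, e.g. firebase database:set /<key> <key>.json. A minimal Python sketch:

# Split a big JSON object into one file per top-level key so each write
# stays under the single-request size limit. Assumes data.json has an
# object (not an array) at its root.
import json

with open("data.json") as f:
    data = json.load(f)

for key, value in data.items():
    with open("%s.json" % key, "w") as out:
        json.dump(value, out)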

sqlite disk i/o error when performing SELECT statement [duplicate]

We have a new beta version of our software with some changes, but not around our database layer.
We've just started getting Error 3128 reported in our server logs. It seems that once it happens, it happens for as long as the app is open. The part of the code where it is most apparent is where we log data every second via SQLite. We've generated 47k errors on our server this month alone.
3128 Disk I/O error occurred. Indicates that an operation could not be completed because of a disk I/O error. This can happen if the runtime is attempting to delete a temporary file and another program (such as a virus protection application) is holding a lock on the file. This can also happen if the runtime is attempting to write data to a file and the data can't be written.
I don't know what could be causing this error. Maybe an anti-virus program? Maybe our app is getting confused and writing data on top of each other? We're using async connections.
It's causing lots of issues and we're at a loss. It has happened in our older version, but maybe 100 times in a month rather than 47,000 times. Either way I'd like to make it happen "0" times.
Possible solution: Exception Message: Some kind of disk I/O error occurred
Summary: There is probably not a problem with the database but a problem creating (or deleting) the temporary file once the database is opened. AIR may have permissions to the database, but not to create or delete files in the directory.
One answer that has worked for me is to use the PRAGMA statement to set the journal_mode value to something other than DELETE. You do this by issuing a PRAGMA statement in the same way you would issue a query statement.
PRAGMA journal_mode = OFF
Unfortunately, if the application crashes in the middle of a transaction while the OFF journaling mode is set, the database file will very likely become corrupted. [1]
[1] http://www.sqlite.org/pragma.html#pragma_journal_mode
The solution was to make sure database deletes, updates, and inserts only happened one at a time by writing a small wrapper around them. On top of that, we had to watch for error 3128 and retry. I think this is because we have a trigger running that could lock the database after we inserted data.
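The AIR-side wrapper isn't shown in the thread; as a rough illustration of the same "serialize writes and retry on transient errors" idea, here is what it can look like with Python's sqlite3 module (the retry count and delay are arbitrary choices, not values from the thread):

# Illustration of the serialize-and-retry pattern using Python's sqlite3.
import sqlite3
import time

def execute_with_retry(conn, sql, params=(), retries=5, delay=0.25):
    for attempt in range(retries):
        try:
            with conn:                     # one transaction per write
                conn.execute(sql, params)
            return
        except sqlite3.OperationalError:   # e.g. locked database, disk I/O error
            if attempt == retries - 1:
                raise
            time.sleep(delay)

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, value TEXT)")
execute_with_retry(conn, "INSERT INTO log VALUES (?, ?)", (time.time(), "sample"))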

Any method for going through large log files?

// Java programmers: when I say method, I mean a 'way to do things'...
Hello All,
I'm writing a log miner script to monitor various log files at my company. It's written in Perl, though I have access to Python and, if I REALLY need to, C (though my company doesn't like binary files). It needs to go through the last 24 hours of logs, take the log code, and check whether we should ignore it or email the appropriate people (me). The script would run as a cron job on Solaris servers. Now, here is what I had in mind (this is only pseudo-ish... and badly written pseudo):
sub main {
    my $today     = Get_Current_Date();
    my $yesterday = Subtract_One_Day($today);

    `grep '$yesterday' /path/to/log > /tmp/log`;     # Get logs from the previous day
    `awk '{print \$X}' /tmp/log > /tmp/log_codes`;   # Pull out the log-code column ($X = column number)

    SubRoutine_to_Compare_Log_Codes('/tmp/log_codes');
}
main();
Another thought was to load the log file into memory and read it there... that is all fine and dandy except for two small problems.
These servers are production servers and serve a couple million customers...
The Log files average 3.3GB (which are logs for about two days)
So not only would grep take a while to go through each file, it would also use up CPU and memory that are needed elsewhere. And loading a 3.3 GB file into memory is not the wisest of ideas (at least IMHO). Now, I had a crazy idea involving assembly code and memory locations, but I don't know SPARC assembly, so flush that idea.
Anyone have any suggestions?
Thanks for reading this far =)
Possible solutions: 1) have the system start a new log file every midnight -- this way you could mine the finite-size log file of the previous day at a reduced priority; and 2) modify the logging system so that it automatically extracts certain messages for further processing on the fly.
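Since Python is on the table, a streaming scan is another option: it keeps memory flat regardless of file size because it never holds more than one line at a time. The date format, the log-code column index and the ignore list below are placeholders, not values from the question.

# Streaming alternative to grep/awk: read the log once, line by line.
import datetime

IGNORE_CODES = {"INFO100", "DEBUG200"}   # placeholder codes to ignore
yesterday = (datetime.date.today() - datetime.timedelta(days=1)).isoformat()

codes_to_report = set()
with open("/path/to/log") as log:
    for line in log:
        if yesterday not in line:        # keep only yesterday's entries
            continue
        fields = line.split()
        if len(fields) > 4:
            code = fields[4]             # placeholder column index for the log code
            if code not in IGNORE_CODES:
                codes_to_report.add(code)

for code in sorted(codes_to_report):
    print(code)                          # or build an email body here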
