Download CSV in Shiny app every 24 hours & display download time - r

I have a CSV that I want to download. I do not want it to download every time a user joins or uses the app.
I want to run the code every 24 hours and also display any of 1) timer since last download 2) timer until next download 3) timestamp of last download
Below is what I have right now, which works, but will probably cause unnecessary downloads. Is doing something with invalidateLater going to work, or is there a better way?
CSV.Path <- "https://oracleselixir-downloadable-match-data.s3-us-west-2.amazonaws.com/2021_LoL_esports_match_data_from_OraclesElixir_20210404.csv"
download.file(CSV.Path, "lol2021")
lol2021 <- read.csv("lol2021")

There are two ways to approach this:
Check to see if it should be downloaded when the app starts; if the file is more recent than 24h, do not re-download it. This can be resolved fairly easily with:
fileage <- difftime(Sys.time(), file.info("lol2021")$mtime, units = "days")
if (is.na(fileage) || fileage > 1) {
  CSV.Path <- "https://oracleselixir-downloadable-match-data.s3-us-west-2.amazonaws.com/2021_LoL_esports_match_data_from_OraclesElixir_20210404.csv"
  download.file(CSV.Path, "lol2021")
}
lol2021 <- read.csv("lol2021")
(The is.na is there in case the file does not exist.)
One complicating factor with this is that two simultaneous users might attempt to download it at the same time. There should likely be some mutex file-access control here if that is a possibility.
Make sure this script is run every 24h, regardless of what users are or are not using the app. On what type of server are you running this app? Something like shiny-server does not do cron-like running, I believe, and you might not be able to guarantee that the app is "awake" every 24h. RStudio Connect does allow scheduled jobs, which might be a consideration for you.
Lacking that, if you have good access to the server, you might just add it as a cron job using Rscript or similar to download and overwrite the file.
Note about mutex file access: many networked filesystems (common in cloud and server architectures) do not guarantee file locking. A common technique is to download into a temporary file and then move (or copy) this temp file into the "real" file name in one step. This guards against the possibility that one process is reading from the file while another process is writing to it ... partial-file reads will be a frustrating and difficult-to-reproduce bug.
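The staleness check and the temp-file-then-rename technique described above can be sketched as follows. This is a minimal sketch in Python rather than R, and the function names (is_stale, atomic_write, refresh_if_stale) are hypothetical; the same pattern should carry over to R with file.info() and file.rename().

```python
import os
import tempfile
import time
import urllib.request


def is_stale(path, max_age_seconds=24 * 60 * 60):
    """Return True if the file is missing or older than max_age_seconds."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:  # the file does not exist yet
        return True
    return age > max_age_seconds


def atomic_write(dest, data):
    """Write bytes to a temp file next to dest, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # A single rename: readers see either the old file or the new one,
        # provided both paths are on the same filesystem.
        os.replace(tmp_path, dest)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def refresh_if_stale(url, dest):
    """Download url to dest only when the local copy is older than 24 hours."""
    if is_stale(dest):
        with urllib.request.urlopen(url) as response:
            atomic_write(dest, response.read())
    return os.path.getmtime(dest)  # timestamp of the last download
```

The value returned by refresh_if_stale is the file's modification time, which could feed the "timestamp of last download" display the question asks for.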

Related

In R, how can I schedule function execution in a cronjob-like way?

I would like to write an R script that runs for the whole day. It should basically be an infinite loop that fetches trading data, applies smart stuff to it, and uploads the trading data somewhere else again.
I'm thinking about starting the script at 8:30am, then it would automatically "do nothing" until 9:00am, then start running in a loop until 5:00pm, and then idle again, until I shut down the R session.
What's the best way to achieve this behavior?
I have no access to Linux machines, so multiple scripts and cronjobs are not possible, unfortunately.
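# Keep the session alive during hours 8-17; do the work only during hours 9-17.
while (as.numeric(format(Sys.time(), format = "%H")) %in% 8:17) {
  if (as.numeric(format(Sys.time(), format = "%H")) %in% 9:17) {
    # your code here
  }
  Sys.sleep(60)  # pause between checks instead of spinning the CPU
}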
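An alternative to polling the clock is to compute how long to sleep until the working window opens. The sketch below is in Python, not R, and the function name seconds_until_open is hypothetical; the 9:00 and 17:00 defaults come from the question. The same arithmetic could be done in R with Sys.time() and Sys.sleep().

```python
from datetime import datetime, time as dtime, timedelta


def seconds_until_open(now, open_at=dtime(9, 0), close_at=dtime(17, 0)):
    """Return 0 inside the window, else the seconds until it next opens."""
    today_open = datetime.combine(now.date(), open_at)
    today_close = datetime.combine(now.date(), close_at)
    if today_open <= now < today_close:
        return 0.0
    # Before opening: wait for today's start. After closing: wait for tomorrow's.
    next_open = today_open if now < today_open else today_open + timedelta(days=1)
    return (next_open - now).total_seconds()
```

A driver loop would sleep for the returned number of seconds when it is non-zero and run one iteration of the work otherwise.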
You can take a look at this link
Windows Task scheduler can do the task
Even with Rstudio you can
Some more links here and here Link
Then there is the taskscheduleR package
If you're on a Windows consumer platform, you could use the 'scheduleR' package. Otherwise, if you're on a Windows Server, then you could use the Windows Scheduler.

Split SQLite file into chunks for appcfg.py

I have a 750MB sql3 file that I want to load into appcfg.py, a program that can restore appengine data. It's taking forever to load in there. Is there a way I could split it into smaller, totally separate chunks, to be loaded independently?
I don't need to run queries across the data, or maintain any other kind of relationship. I just need to copy a list of the records to my appengine app.
Elaboration:
I'm trying to restore a 750 MB sql3 file I got from
appcfg.py download_data --appl=myapp --url=https://myapp.appspot.com/remote_path --file=backup.sql3
Now, I'm trying to restore the file with
appcfg.py upload_data --appl=restoreapp --url=https://restoreapp.appspot.com/remote_api --file=backup.sql3
I also set some parameters tweaking the default limits.
This prints out some initial logging information, repeating the parameters, etc. Then nothing happens for about 45 minutes, except that python takes about 50% cpu for the duration. Then, finally, it starts to upload to appengine.
From there, it seems to work. But, if there's an error in the transmission, I have to wait the 45 minutes again, even after specifying the progress database. That's why I'm looking for a way to split up the file, or something.
FWIW, both the original app and the restore app use the Java sdk

have R halt the EC2 machine it's running on

I have a few workflows where I would like R to halt the Linux machine it's running on after completion of a script. I can think of two similar ways to do this:
run R as root and then call system("halt")
run R from a root shell script (could run the R script as any user) then have the shell script run halt after the R bit completes.
Are there other easy ways of doing this?
The use case here is for scripts running on AWS where I would like the instance to stop after script completion so that I don't get charged for machine time post job run. My instance I use for data analysis is an EBS backed instance so I don't want to terminate it, simply suspend. Issuing a halt command from inside the instance is the same effect as a stop/suspend from AWS console.
I'm impressed that works. (For anyone else surprised that an instance can stop itself, see notes 1 & 2.)
You can also try "sudo halt", as you wouldn't need to run as a root user, as long as the user account running R is capable of running sudo. This is pretty common on a lot of AMIs on EC2.
Be careful about what constitutes an assumption of R quitting - believe it or not, one can crash R. It may be better to have a separate script that watches the R pid and, once that PID is no longer active, terminates the instance. Doing this command inside of R means that if R crashes, it never reaches the call to halt. If you call it from within another script, that can be dangerous, too. If you know Linux well, what you're looking for is the PID from starting R, which you can pass to another script that checks ps, say every 1 second, and then terminates the instance once the PID is no longer running.
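The watcher idea can be sketched as below, assuming a Linux instance. This is a sketch, not a tested shutdown tool: pid_alive and wait_for_exit are hypothetical names, and the final shutdown command is left to the caller.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only checks existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, poll_seconds=1.0):
    """Block until the watched process is gone, whether it finished or crashed."""
    while pid_alive(pid):
        time.sleep(poll_seconds)
```

After wait_for_exit(r_pid) returns, the watcher would run whatever stop command is appropriate, for example subprocess.run(["sudo", "halt"]) if sudo is configured for that user. Because the check runs outside R, it still fires when R crashes.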
I think a better solution is to use the EC2 API tools (see: http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/ for documentation) to terminate OR stop instances. There's a difference between the two of these, and it matters if your instance is EBS backed or S3 backed. You needn't run as root in order to terminate the instance - the fact that you have the private key and certificate shows Amazon that you're the BOSS, way above the hoi polloi who merely have root access on your instance.
Because these credentials can be used for mischief, be careful about running API tools from a given server, you'll need your certificate and private key on the server. That's a bad idea in the event that you have a security problem. It would be better to message to a master server and have it shut down the instance. If you have messaging set up in any way between instances, this can do all the work for you.
Note 1: Eric Hammond reports that the halt will only suspend an EBS instance, so you still have storage fees. If you happen to start a lot of such instances, this can clutter things up. Your original question seems unclear about whether you mean to terminate or stop an instance. He has other good advice on this page
Note 2: A short thread on the EC2 developers forum gives advice for Linux & Windows users.
Note 3: EBS instances are billed for partial hours, even when restarted. (See this thread from the developer forum.) Having an auto-suspend close to the hour mark can be useful, assuming the R process isn't working, in case one might re-task that instance (i.e. to save on not restarting). Other useful tools to consider: setTimeLimit and setSessionTimeLimit, and various checkpointing tools (I have a Q that mentions a couple). Using an auto-kill is useful if one has potentially badly behaved code.
Note 4: I recently learned of the shutdown command in package fun. This is multi-platform. See this blog post for commentary, and code is here. Dangerous stuff, but it could be useful if you want to adapt to Windows. I haven't tried it, though.
Update 1. Three more ideas:
You could use .Last() and runLast = TRUE for q() and quit(), which could shut down the instance.
If using littler or a script that invokes the script via Rscript, the same command line functions could be used.
My favorite package of today, tcltk2, has a neat timer mechanism called tclTaskSchedule() that can be used to schedule the execution of an expression. You could then go crazy with the execution of stuff just before an hourly interval has elapsed.
system("echo 'rootpassword' | sudo -S halt")
However, the downside is having your root password in plain text in the script.
AFAIK those ways you mentioned are the only ones. In any case the script will have to run as root to be able to shut down the machine (if you find a way to do it without root that's possibly an exploit). You ask for an easier way but system("halt") is just an additional line at the end of your script.
sudo is an option -- it allows you to run certain commands without prompting for any password. Just put something like this in /etc/sudoers
<username> ALL=(ALL) PASSWD: ALL, NOPASSWD: /sbin/halt
(of course replacing <username> with the name of the user running R) and system('sudo halt') should just work.

Any method for going through large log files?

// Java programmers, when I mean method, I mean a 'way to do things'...
Hello All,
I'm writing a log miner script to monitor various log files at my company. It's written in Perl, though I have access to Python and, if I REALLY need to, C (though my company doesn't like binary files). It needs to be able to go through the last 24 hours, take each log code, and check whether we should ignore it or email the appropriate people (me). The script would run as a cron job on Solaris servers. Now here is what I had in mind (this is only pseudo-ish... and badly written pseudo)
main()
{
    $today = Get_Current_Date();
    $yesterday = Subtract_One_Day($today);
    `grep $yesterday '/path/to/log' > /tmp/log`; # Get logs from previous day
    `awk '{print $X}' /tmp/log > /tmp/log_codes`; # Get Log Code
    SubRoutine_to_Compare_Log_Codes('/tmp/log_codes');
}
Another thought was to load the log file into memory and read it in there... that is all fine and dandy except for two small problems.
These servers are production servers and serve a couple million customers...
The Log files average 3.3GB (which are logs for about two days)
So not only would grep take a while to go through each file, but it would use up CPU and memory in the process, which are needed elsewhere. And loading a 3.3GB file into memory is not the wisest of ideas. (At least IMHO). Now I had a crazy idea involving assembly code and memory locations but I don't know SPARC assembly sooo flush that idea.
Anyone have any suggestions?
Thanks for reading this far =)
Possible solutions: 1) have the system start a new log file every midnight -- this way you could mine the finite-size log file of the previous day at a reduced priority; and 2) modify the logging system so that it automatically extracts certain messages for further processing on the fly.
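Whichever option is chosen, the file can be scanned in a single streaming pass, so memory use stays flat regardless of file size. The sketch below is in Python, which the question says is available. It assumes a hypothetical log format in which each line starts with its date and the log code is a whitespace-separated field; count_codes and its parameters are illustrative names, not part of any existing tool.

```python
from collections import Counter


def count_codes(log_path, date_prefix, code_field):
    """Stream the log once, counting codes on lines that start with date_prefix."""
    counts = Counter()
    with open(log_path, "r", errors="replace") as log:
        for line in log:  # only one line is held in memory at a time
            if not line.startswith(date_prefix):
                continue
            fields = line.split()
            if len(fields) > code_field:
                counts[fields[code_field]] += 1
    return counts
```

Running such a script under nice from cron would give it the reduced priority mentioned above.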

How do I unlock a SQLite database?

When I enter this query:
sqlite> DELETE FROM mails WHERE (id = 71);
SQLite returns this error:
SQL error: database is locked
How do I unlock the database so this query will work?
On Windows you can try this program http://www.nirsoft.net/utils/opened_files_view.html to find out which process is holding the db file. Try closing that program to unlock the database.
In Linux and macOS you can do something similar, for example, if your locked file is development.db:
$ fuser development.db
This command will show what process is locking the file:
> development.db: 5430
Just kill the process...
kill -9 5430
...And your database will be unlocked.
I caused my sqlite db to become locked by crashing an app during a write. Here is how I fixed it:
echo ".dump" | sqlite old.db | sqlite new.db
Taken from: http://random.kakaopor.hu/how-to-repair-an-sqlite-database
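The same dump-and-reload idea can be sketched with Python's standard sqlite3 module, whose Connection.iterdump() yields SQL text similar to what the shell's .dump prints. The function name rebuild_database is hypothetical, and the sketch assumes new_path does not already contain the same tables.

```python
import sqlite3


def rebuild_database(old_path, new_path):
    """Copy schema and rows from old_path into a fresh database at new_path."""
    old = sqlite3.connect(old_path)
    new = sqlite3.connect(new_path)
    try:
        # Replay the dumped SQL statements against the new, empty database.
        new.executescript("\n".join(old.iterdump()))
    finally:
        old.close()
        new.close()
```

As with the shell pipeline, this only helps if the old file can still be opened for reading.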
The SQLite wiki DatabaseIsLocked page offers an explanation of this error message. It states, in part, that the source of contention is internal (to the process emitting the error). What this page doesn't explain is how SQLite decides that something in your process holds a lock and what conditions could lead to a false positive.
This error code occurs when you try to do two incompatible things with a database at the same time from the same database connection.
Changes to file locking were introduced in v3; they may be useful to future readers and are described here: File Locking And Concurrency In SQLite Version 3
If you want to remove a "database is locked" error then follow these steps:
Copy your database file to some other location.
Replace the database with the copied database. This will dereference all processes which were accessing your database file.
Deleting the -journal file sounds like a terrible idea. It's there to allow sqlite to roll back the database to a consistent state after a crash. If you delete it while the database is in an inconsistent state, then you're left with a corrupted database. Citing a page from the sqlite site:
If a crash or power loss does occur and a hot journal is left on the disk, it is essential that the original database file and the hot journal remain on disk with their original names until the database file is opened by another SQLite process and rolled back. [...]
We suspect that a common failure mode for SQLite recovery happens like this: A power failure occurs. After power is restored, a well-meaning user or system administrator begins looking around on the disk for damage. They see their database file named "important.data". This file is perhaps familiar to them. But after the crash, there is also a hot journal named "important.data-journal". The user then deletes the hot journal, thinking that they are helping to cleanup the system. We know of no way to prevent this other than user education.
The rollback is supposed to happen automatically the next time the database is opened, but it will fail if the process can't lock the database. As others have said, one possible reason for this is that another process currently has it open. Another possibility is a stale NFS lock, if the database is on an NFS volume. In that case, a workaround is to replace the database file with a fresh copy that isn't locked on the NFS server (mv database.db original.db; cp original.db database.db). Note that the sqlite FAQ recommends caution regarding concurrent access to databases on NFS volumes, because of buggy implementations of NFS file locking.
I can't explain why deleting a -journal file would let you lock a database that you couldn't before. Is that reproducible?
By the way, the presence of a -journal file doesn't necessarily mean that there was a crash or that there are changes to be rolled back. Sqlite has a few different journal modes, and in PERSIST or TRUNCATE modes it leaves the -journal file in place always, and changes the contents to indicate whether or not there are partial transactions to roll back.
The SQLite db files are just files, so the first step would be to make sure the file isn't read-only. The other thing to do is to make sure that you don't have some sort of GUI SQLite DB viewer with the DB open. You could have the DB open in another shell, or your code may have the DB open. Typically you would see this if a different thread, or an application such as SQLite Database Browser, has the DB open for writing.
My lock was caused by the system crashing and not by a hanging process. To resolve this, I simply renamed the file then copied it back to its original name and location.
Using a Linux shell that would be:
mv mydata.db temp.db
cp temp.db mydata.db
If a process has a lock on an SQLite DB and crashes, the DB stays locked permanently. That's the problem. It's not that some other process has a lock.
I had this problem just now, using an SQLite database on a remote server, stored on an NFS mount. SQLite was unable to obtain a lock after the remote shell session I used had crashed while the database was open.
The recipes for recovery suggested above did not work for me (including the idea to first move and then copy the database back). But after copying it to a non-NFS system, the database became usable and no data appears to have been lost.
Some functions, like INDEX'ing, can take a very long time - and it locks the whole database while it runs. In instances like that, it might not even use the journal file!
So the best/only way to check if your database is locked because a process is ACTIVELY writing to it (and thus you should leave it the hell alone until it has completed its operation) is to md5 (or md5sum on some systems) the file twice.
If you get a different checksum, the database is being written, and you really really REALLY don't want to kill -9 that process because you can easily end up with a corrupt table/database if you do.
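The checksum-twice check can be sketched as below. The names file_digest and is_being_written are hypothetical, and the five-second default wait is an arbitrary assumption. An unchanged checksum over a short window does not prove the writer is idle, so a longer wait gives a more trustworthy answer.

```python
import hashlib
import time


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_being_written(path, wait_seconds=5.0):
    """Hash the file twice, wait_seconds apart; True if the contents changed."""
    before = file_digest(path)
    time.sleep(wait_seconds)
    return file_digest(path) != before
```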
I'll reiterate, because it's important - the solution is NOT to find the locking program and kill it - it's to find if the database has a write lock for a good reason, and go from there. Sometimes the correct solution is just a coffee break.
The only way to create this locked-but-not-being-written-to situation is if your program runs BEGIN EXCLUSIVE, because it wanted to do some table alterations or something, then for whatever reason never sends an END afterwards, and the process never terminates. All three conditions being met is highly unlikely in any properly-written code, and as such 99 times out of 100 when someone wants to kill -9 their locking process, the locking process is actually locking your database for a good reason. Programmers don't typically add the BEGIN EXCLUSIVE condition unless they really need to, because it prevents concurrency and increases user complaints. SQLite itself only adds it when it really needs to (like when indexing).
Finally, the 'locked' status does not exist INSIDE the file as several answers have stated - it resides in the Operating System's kernel. The process which ran BEGIN EXCLUSIVE has requested from the OS a lock be placed on the file. Even if your exclusive process has crashed, your OS will be able to figure out if it should maintain the file lock or not!! It is not possible to end up with a database which is locked but no process is actively locking it!!
When it comes to seeing which process is locking the file, it's typically better to use lsof rather than fuser (this is a good demonstration of why: https://unix.stackexchange.com/questions/94316/fuser-vs-lsof-to-check-files-in-use). Alternatively if you have DTrace (OSX) you can use iosnoop on the file.
I added "Pooling=true" to connection string and it worked.
This error can be thrown if the file is in a remote folder, like a shared folder. I changed the database to a local directory and it worked perfectly.
I found the documentation of the various states of locking in SQLite to be very helpful. Michael, if you can perform reads but can't perform writes to the database, that means that a process has gotten a RESERVED lock on your database but hasn't executed the write yet. If you're using SQLite3, there's a new lock called PENDING where no more processes are allowed to connect but existing connections can still perform reads, so if this is the issue you should look at that instead.
I had this problem in an app that accessed SQLite through two connections: one read-only, the other for reading and writing. It looked as though the read-only connection was blocking writes from the second connection. It turned out that prepared statements must be finalized, or at least reset, IMMEDIATELY after use. As long as a prepared statement stays open, the database remains blocked for writing.
DON'T FORGET TO CALL:
sqlite_reset(xxx);
or
sqlite_finalize(xxx);
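In Python's standard sqlite3 module there is no explicit reset or finalize call; exhausting or closing the cursor plays that role. A minimal sketch of the "release the read statement before writing" habit, with read_rows as a hypothetical helper name:

```python
import sqlite3
from contextlib import closing


def read_rows(conn, sql, params=()):
    """Run a query and release its statement before returning the rows."""
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql, params)
        # Fetching every row finishes the statement; closing() then
        # guarantees the cursor is released even if fetching fails.
        return cursor.fetchall()
```

With the read statement released, a second connection to the same file can proceed with its write.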
I just had something similar happen to me - my web application was able to read from the database, but could not perform any inserts or updates. A reboot of Apache solved the issue at least temporarily.
It'd be nice, however, to be able to track down the root cause.
The lsof command on my Linux environment helped me figure out that a hanging process was keeping the file open.
I killed the process and the problem was solved.
This link solved the problem: When Sqlite gives : Database locked error
It solved my problem and may be useful to you.
You can also use begin transaction and end transaction to keep the database from becoming locked in future.
It seems to be an internal problem with the database...
For me it showed up after trying to browse the database with "SQLite manager"...
So, if you can't find another process connected to the database and you just can't fix it,
just try this radical solution:
Export your tables (you can use "SQLite manager" on Firefox)
If the migration altered your database schema, delete the last failed migration
Rename your "database.sqlite" file
Execute "rake db:migrate" to make a new working database
Give the database the right permissions for importing the tables
Import your backed up tables
Write the new migration
Execute it with "rake db:migrate"
In my experience, this error is caused by opening multiple connections,
e.g.:
1 or more sqlitebrowser (GUI)
1 or more electron thread
rails thread
I am not sure about the details of how SQLite3 handles multiple threads/requests, but when I close the sqlitebrowser and the electron thread, rails runs well and doesn't block any more.
I ran into this same problem on Mac OS X 10.5.7 running Python scripts from a terminal session. Even though I had stopped the scripts and the terminal window was sitting at the command prompt, it would give this error the next time it ran. The solution was to close the terminal window and then open it up again. Doesn't make sense to me, but it worked.
I just had the same error.
After 5 minutes of googling I found that I hadn't closed one shell which was using the db.
Just close it and try again ;)
I had the same problem. Apparently the rollback function seems to overwrite the db file with the journal which is the same as the db file but without the most recent change. I've implemented this in my code below and it's been working fine since then, whereas before my code would just get stuck in the loop as the database stayed locked.
Hope this helps
my python code
# Assumption: the original imports were not shown; sqlite3 and time are assumed here.
import sqlite3 as sqlite
import time

##############
#### Defs ####
##############
def conn_exec(connection, cursor, cmd_str):
    tries = 0  # integer counter; accumulating 0.05 in a float never lands exactly on 60.0
    while True:
        try:
            cursor.execute(cmd_str)
            return
        except sqlite.IntegrityError:
            # Ignore this error because it means the item already exists in the database
            return
        except Exception as error:
            if tries % 1200 == 0:  # print error every minute (1200 tries x 0.05 s)
                print("\t", "Error executing command", cmd_str)
                print("Message:", error)
            if tries % 2400 == 0:  # on the first failure and every 2 minutes after, roll back
                print("Forcing Unlock")
                connection.rollback()
            time.sleep(0.05)
            tries += 1

def conn_comit(connection):
    tries = 0
    while True:
        try:
            connection.commit()
            return
        except sqlite.IntegrityError:
            # Ignore this error because it means the item already exists in the database
            return
        except Exception as error:
            if tries % 1200 == 0:  # print error every minute
                print("\t", "Error committing")  # the original printed cmd_str, which is undefined here
                print("Message:", error)
            if tries % 2400 == 0:  # on the first failure and every 2 minutes after, roll back
                print("Forcing Unlock")
                connection.rollback()
            time.sleep(0.05)
            tries += 1

##################
#### Run Code ####
##################
# db_path must already hold the path to your database file
connection = sqlite.connect(db_path)
cursor = connection.cursor()
# Create tables if database does not exist
conn_exec(connection, cursor, '''CREATE TABLE IF NOT EXISTS fix (path TEXT PRIMARY KEY);''')
conn_exec(connection, cursor, '''CREATE TABLE IF NOT EXISTS tx (path TEXT PRIMARY KEY);''')
conn_exec(connection, cursor, '''CREATE TABLE IF NOT EXISTS completed (fix DATE, tx DATE);''')
conn_comit(connection)
One common reason for getting this exception is when you are trying to do a write operation while still holding resources for a read operation. For example, if you SELECT from a table, and then try to UPDATE something you've selected without closing your ResultSet first.
I was having "database is locked" errors in a multi-threaded application as well, which appears to be the SQLITE_BUSY result code, and I solved it with setting sqlite3_busy_timeout to something suitably long like 30000.
(On a side-note, how odd that on a 7 year old question nobody found this out already! SQLite really is a peculiar and amazing project...)
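For readers using Python rather than the C API, the standard sqlite3 module exposes the same busy timeout through the timeout argument of sqlite3.connect, given in seconds rather than milliseconds. A minimal sketch, with connect_with_busy_timeout as a hypothetical wrapper name:

```python
import sqlite3


def connect_with_busy_timeout(db_path, timeout_seconds=30.0):
    """Open a connection that waits for locks instead of failing at once."""
    # timeout is how long a locked operation keeps retrying before
    # sqlite3 raises OperationalError ("database is locked").
    return sqlite3.connect(db_path, timeout=timeout_seconds)
```

A long timeout only masks contention; if writers routinely hold the lock for longer than this, the error will still surface.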
Before going down the reboot option, it is worthwhile to see if you can find the user of the sqlite database.
On Linux, one can employ fuser to this end:
$ fuser database.db
$ fuser database.db-journal
In my case I got the following response:
philip 3556 4700 0 10:24 pts/3 00:00:01 /usr/bin/python manage.py shell
Which showed that I had another Python program with pid 3556 (manage.py) using the database.
An old question, with a lot of answers, here's the steps I've recently followed reading the answers above, but in my case the problem was due to cifs resource sharing. This case is not reported previously, so hope it helps someone.
Check no connections are left open in your java code.
Check no other processes are using your SQLite db file with lsof.
Check the user owner of your running jvm process has r/w permissions over the file.
Try to force the lock mode on the connection opening with
final SQLiteConfig config = new SQLiteConfig();
config.setReadOnly(false);
config.setLockingMode(LockingMode.NORMAL);
connection = DriverManager.getConnection(url, config.toProperties());
If you're using your SQLite db file over an NFS shared folder, check this point of the SQLite faq, and review your mounting configuration options to make sure you're avoiding locks, as described here:
//myserver /mymount cifs username=*****,password=*****,iocharset=utf8,sec=ntlm,file,nolock,file_mode=0700,dir_mode=0700,uid=0500,gid=0500 0 0
I got this error in a scenario a little different from the ones described here.
The SQLite database rested on an NFS filesystem shared by 3 servers. On 2 of the servers I was able to run queries on the database successfully; on the third one, though, I was getting the "database is locked" message.
The thing with this 3rd machine was that it had no space left on /var. Every time I tried to run a query in ANY SQLite database located in this filesystem I got the "database is locked" message and also this error in the logs:
Aug 8 10:33:38 server01 kernel: lockd: cannot monitor 172.22.84.87
And this one also:
Aug 8 10:33:38 server01 rpc.statd[7430]: Failed to insert: writing /var/lib/nfs/statd/sm/other.server.name.com: No space left on device
Aug 8 10:33:38 server01 rpc.statd[7430]: STAT_FAIL to server01 for SM_MON of 172.22.84.87
After the space situation was handled everything got back to normal.
If you're trying to unlock the Chrome database to view it with SQLite, then just shut down Chrome.
Windows
%userprofile%\Local Settings\Application Data\Google\Chrome\User Data\Default\Web Data
or
%userprofile%\Local Settings\Application Data\Google\Chrome\User Data\Default\Chrome Web Data
Mac
~/Library/Application Support/Google/Chrome/Default/Web Data
From your previous comments you said a -journal file was present.
This could mean that you have opened an (EXCLUSIVE?) transaction and have not yet committed the data. Did your program or some other process leave the -journal behind?
Restarting the sqlite process will look at the journal file and clean up any uncommitted actions and remove the -journal file.
As Seun Osewa has said, sometimes a zombie process will sit in the terminal with a lock acquired, even if you don't think it possible. Your script runs, crashes, and you go back to the prompt, but there's a zombie process spawned somewhere by a library call, and that process has the lock.
Closing the terminal you were in (on OSX) might work. Rebooting will work. You could look for "python" processes (for example) that are not doing anything, and kill them.
