Finding total edit time for R files - r

I am trying to determine how much time I have spent on a project, most of which has been done in .R files. I know the file.info function will extract metadata on a file for me, but since I have opened the file several times over several days, I don't know how to use that information to determine total editing time. Is there a function that finds this information, or a way to go through the file system to find it?

Just a thought: you could maintain a log file to which you write the following from your R script: start time, stop time, and the R script file name.
You can add simple code to your script to do this. You would then need a separate script that analyses the log and tells you how much time was spent using the script.
For a single user this would work.
Note: this catches script execution time and not the time spent on editing the files. The log would still have merit: you would have a record of when you were working on the script under the assumption that you run your scripts frequently when developing code.
How about using an old-fashioned time sheet to record development time? Tools such as JIRA are well suited to that purpose.
For example at the start of the script:
logFile <- file("log.txt", open = "a")  # open in append mode so earlier entries are not overwritten
writeLines(paste0("Scriptname start: ", Sys.time()), logFile)
close(logFile)
And at the end of the script:
logFile <- file("log.txt", open = "a")
writeLines(paste0("Scriptname stop: ", Sys.time()), logFile)
close(logFile)
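For the separate analysis step, here is a minimal sketch that assumes the log format written above (matched "Scriptname start:"/"Scriptname stop:" lines in log.txt) and sums the elapsed time between each pair:
lines  <- readLines("log.txt")
# Sys.time() is written out as "YYYY-MM-DD HH:MM:SS", which as.POSIXct() parses back
starts <- as.POSIXct(sub("^Scriptname start: ", "", grep("^Scriptname start: ", lines, value = TRUE)))
stops  <- as.POSIXct(sub("^Scriptname stop: ",  "", grep("^Scriptname stop: ",  lines, value = TRUE)))
# assumes every start has a matching stop; otherwise trim the longer vector first
total  <- sum(difftime(stops, starts, units = "mins"))
total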

Related

Task Scheduler - Argument - Add timestamp to opened Excel document and close Excel

tldr - Task Scheduler, Edit Action, Argument = (open excel without app selection, add date-time to cell, close Excel)
I'm new to using Task Scheduler and am looking to add a date-time stamp into the opened Excel document.
It can be very simple; the timestamp just needs to go in the next cell/row, and then Excel should close.
It also seems there is a difference when an application is opened through Task Scheduler, because it will ALWAYS ask what to open the file with (yes, the default extensions/apps are set in Settings and for the file).
So it seems even with an argument, it will still be stopped? Is there a way around this as well? I have modified permissions on the file, but that doesn't seem to work.
You can use a .txt if it is easier.
Thank you all you wonderful community!

How to keep track of the number of runs of an R script per day

I am currently looking to write a function in R that can keep track of the number of completed runs of an .R file within any particular day. Note that the runs might be conducted at different times of day. I did some research on this problem and came across this post (To show how many times user has run the script). So far I have been unable to build on the first commenter's code when converting it into R (the main obstacle is replicating the try...except block). However, I need to add the restriction that the count is measured only within a single day (exactly from 00:00:00 to 24:00:00 EST).
Can someone please offer some help on how to accomplish this goal?
Either I didn't get the problem, or it seems a rather easy one: create a temporary file (use Sys.Date() to name it) and store the current run number there; at the beginning of your .R file, read the temporary file, increment the number, write the file back.
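A hedged sketch of that suggestion (the file name and location are placeholders; use any directory that persists between runs):
# Counter file named after today's date, so the count automatically resets at midnight
counter_file <- paste0("run_count_", Sys.Date(), ".txt")
count <- if (file.exists(counter_file)) as.integer(readLines(counter_file, n = 1)) else 0L
count <- count + 1L
writeLines(as.character(count), counter_file)
message("Runs recorded today: ", count)
Because a new day produces a new file name, nothing extra is needed to enforce the 00:00:00 to 24:00:00 window.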

RStudio converted code to gibberish, froze, and saved as gibberish

I had a small but important R file that I have been working on for a few days.
I created and uploaded a list of about 1,000 IDs to SQL Server the other day, and today I was repeating the process with a different type of ID. I frequently save the file, and after having added a couple of lines and saved, I ran the sqlSave() statement to upload the new IDs.
RStudio promptly converted all of my code to gibberish and froze (see screen shot).
After letting it try to finish for several minutes I closed RStudio and reopened it. It automatically re-opened my untitled text files where I had a little working code, but didn't open my main code file.
When I tried to open it I was informed that the file is 55 megabytes and thus too large to open. Indeed, I confirmed that it really is 55 MB now, and when opening it in an external text editor I see the same gibberish as in this screenshot.
Is there any hope of recovering my code?
I suppose low memory must be to blame. The object and command I was executing at the time were not resource intensive; however, a few minutes before that I did retrieve an overly large dataframe from SQL Server.
You overwrote your code with a binary representation of your objects with this line:
save.image('jive.R')
save.image saves the R objects, not your R script file. To save your script, you can just click "File->Save". To save your objects, you would have to put that in a different file.
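For illustration, the two kinds of saving look like this ("jive.RData" is just an assumed name for the workspace file):
save.image("jive.RData")   # saves the objects in the workspace to an .RData file
# The script itself (jive.R) is plain text and is saved from the editor (File -> Save);
# never pass a .R file name to save.image(), or the script gets overwritten as happened here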

Submit a new script after all parallel jobs in R have completed

I have an R script that creates multiple scripts and submits these simultaneously to a computer cluster, and after all of the multiple scripts have completed and the output has been written in the respective folders, I would like to automatically launch another R script that works on these outputs.
I haven't been able to figure out whether there is a way to do this in R: the function 'wait' is not what I want since the scripts are submitted as different jobs and each of them completes and writes its output file at different times, but I actually want to run the subsequent script after all of the outputs appear.
One way I thought of is to count the files that have been created and, if the correct number of output files are there, submit the next script. However, to do this I guess I would have to keep a script open that checks for the presence of the files every now and then, and I am not sure this is a good idea, since it probably takes a day or more for the first scripts to complete.
Can you please help me find a solution?
Thank you very much for your help
-fra
I think you are looking at this the wrong way:
Not an R problem at all, R happens to be the client of your batch job.
This is an issue that queue / batch processors can address on your cluster.
Worst case, you could just wait/sleep in a shell (or R script) until a 'final condition reached' file has been touched.
Inter-dependencies can be expressed with make too.
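If the polling route is acceptable, a rough sketch in R might look like this (the output directory, file pattern, expected count, and follow-up script name are all assumptions):
expected_outputs <- 100                          # number of jobs submitted
repeat {
  done <- length(list.files("results", pattern = "\\.out$"))
  if (done >= expected_outputs) break
  Sys.sleep(300)                                 # check again in five minutes
}
source("followup_analysis.R")                    # run the next script once everything is there
Letting the cluster's queue/batch system express the dependency (a job that only starts after the others finish) is still the cleaner option, as noted above.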

One-to-one correspondence to files - in unix - log files

I am writing a Log Unifier program. That is, I have a system that produces logs:
my.log, my.log.1, my.log.2, my.log.3...
I want on each iteration to store the number of lines I've read from a certain file, so that on the next iteration - I can continue reading on from that place.
The problem is that when the files are full, they roll:
The last log is deleted
...
my.log.2 becomes my.log.3
my.log.1 becomes my.log.2
my.log becomes my.log.1
and a new my.log is created
I can of course keep track of them using inodes - which are almost a one-to-one correspondence to files.
I say "almost" because I fear the following scenario:
Between two of my iterations - some files are deleted (let's say the logging is very fast), and then new files are created, some of which get the inodes of the files just deleted. The problem is that I will then mistake these new files for old ones - and start reading from line 500 (for example) instead of 0.
So I am hoping to find a way to solve this- here are a few directions I thought about - that may help you help me:
Either another 1-to-1 correspondence other than inodes.
An ability to mark a file. I thought about using chmod +x to mark a file as an
existing file; new files that lack these permissions would then be recognisable as new - but if somebody were to change the permissions manually, that would confuse my program. So any other way to mark files would help.
I thought about creating soft links to a file that are deleted when the file is deleted. That would allow me to know which files got deleted.
Any way to get the "creation date"
Any idea that comes to mind - maybe using timestamps, atime, ctime, mtime in some clever way - all will be good, as long as they will allow me to know which files are new, or any idea creating a one-to-one correspondence to files.
Thank you
I can think of a few alternatives:
Use POSIX extended attributes to store metadata about each log file that your program can use for its operation.
It should be a safe assumption that the contents of old log files are not modified after being archived, i.e. after my.log becomes my.log.1. You could generate a hash for each file (e.g. SHA-256) to uniquely identify it.
All decent log formats embed a timestamp in each entry. You could use the timestamp of the first entry - or even the whole entry itself - in the file for identification purposes. Log files are usually rolled on a periodic basis, which would ensure a different starting timestamp for each file.
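As a rough illustration of the third idea in R (the file names, the state file, and the assumption that every log file has at least one line are all mine):
log_files <- list.files(pattern = "^my\\.log")
# The first entry of a rolled log never changes, so it works as a stable key across renames
first_lines <- vapply(log_files, function(f) readLines(f, n = 1), character(1))
# Map key -> number of lines already read, persisted between iterations
state <- if (file.exists("reader_state.rds")) readRDS("reader_state.rds") else list()
already_read <- vapply(first_lines, function(k) if (is.null(state[[k]])) 0L else state[[k]], integer(1))
After processing, the updated line counts would be written back with saveRDS() under the same first-line keys.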
