If statement for directory starting with specific character - r

I have a script in which I call R and depending on the directory I specify I want it to carry out a different process. One directory starts with L and the other with S. I have numerous directories that either start with L or S and they all end differently.
I specify the directory in bash and run a script like so:
./script L_dir
or
./script S_dir
So within my R script I have it set up as such:
args <- commandArgs(TRUE)
img_dir <- args[1]
if(img_dir == "^L*"){
do_process_1
} else {
do_process_2
}
Everything works fine except that no matter what directory I specify, the process called will always be do_process_2.
I have looked at this question and tried to adapt it but can't get it to work.

After changing my code to
if(grepl("^LM*", img_dir)){
do_process_1
} else {
do_process_2
}
it worked. Be careful if you change it to the above and it still carries out process_2. This may be because what you are looking for, in my case ^L*, may also be in your second directory name i.e. dir_L = LMNOP, dir_S = STUVLJH. But once i specified ^LM* it did what i wanted it to do.

Related

R: Using source and loop cycle to run a 2nd script

I am using command source in a Script (Script Number 1) to run other script file saved (Script number 2).
My idea is use a loop to read continually the Script number 2 that is modified continually. I Never stop the loop. But Lamentably this dont work.
Always Script number 1 read the original Script number 2 and not change the result when I change the script number 2. I am using a sublime Text to change the script and not stop the loop cycle.
Example:
Example Script number 1:
repeat{
source("C:/Users/myPC/Desktop/script.R")
Sys.sleep(10)
}
Example Script number 2 modified (script saved as script.R in my desktop):
repeat {
print('Checking files')
Sys.sleep(time=10)
}
This run ok. But during the loop cycle I make changes to the script number 2 (and save file) :
Script number 2 modified:
repeat {
print('NOW RE-Checking files')
Sys.sleep(time=10)
}
The result always is this. Not read the script number 2 modified.
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
[1] "Checking files"
Not knowing what your scripts are really doing, it's a little guess-work, but I'll work on the two things I do know from this:
Checking to see if a file has been changed and then reloading it will work 95-99% or more of the time. Unfortunately, the risk is that whatever is writing the file will still be writing when your script-1 tries to source it: writing contents to a file is never atomic, so you cannot guarantee without some form of file-lock (not supported on all filesystems) or other coordination mechanism.
To remedy this, I suggest that what mechanism (sight-unseen) that is modifying script-2 needs to write to a temp-file and then destructively rename it to be script2.R. While writing to a file is not atomic, renaming/moving a file is atomic. With this, when script1.R notices that a file has been updated, it will get to read the whole thing unabated.
I suggest that instead of running code in script2.R, you should define a function that is called (repeatedly) by script-1. That way, script-1 retains control and can check for updates in script2.R and, as necessary, reload it. The function one writes in script-2 needs to do one thing and then yield control back to the caller; that is, no repeat or while loops unless they are very well contained/limited and will not take longer to exit than you expect script2.R to be updated. (Recognize that script-2's while/repeat loop will not be interrupted, so plan accordingly.)
(script2.R)
work <- function() {
print("Checking files")
Sys.sleep(10)
}
(script1.R)
prevmt <- NA
repeat {
mt <- file.info("script2.R")$mtime
if (is.na(prevmt) || prevmt < mt) {
source("script2.R")
}
work()
Sys.sleep(10)
}

Is there a way to check where R is 'stuck' within a for loop? (R)

I am using system() to run several files iteratively through a program via CMD. It deposits each outputs into a sub-directory designated for specifically and only that input file. So # of inputs is exactly equal to the number of output directories/outputs.
My code works for the first iteration, but I can see in the console that it won't move on to the second file after completing the first. The stop sign remains active so I know R is still 'running', but since the for loop environment is unique I can't really tell what it's stuck on. It just stays like this for hours. Therefore I'm not sure how to begin to diagnose the issue I'm having. Is there a way of tracing what happened after cancelling the code, for example?
If your curious, the code looks like this btw. I don't know how to make it reproducible, so I just commented each line:
for (i in 1:length(flist)) {
##flist is a vector of character strings. Each
row of characters is both the name of the input file and the name of the
output directory
setwd(paste0(solutions_dir, "\\", flist[i]))
#sets the appropriate dir
system(paste0(program_dir,"\\program.exe I=",
file_dir, "\\", flist[i], " O=",solutions_dir, "\\", flist[i],
"\\solv"))
##line that inputs program's exe file and the appropriate input/output
locations
}

How to prevent from overwriting the file?

I'm looking for a way to prevent R from overwriting files during the session. The more general solution then better.
Currently I got bunch of functions called e.g.: safe.save, safe.png, safe.write.table which are implemented more or less as
safe.smth <- function(..., file) {
if (file.exists(file))
stop("File exists!")
else
smth(..., file=file)
}
It works, but only if I got control over execution. If some (not mine) function created file I can't stop it from overwrite.
Another way is to set read only flag on files, which also top R from overwriting existing files. But this has drawbacks as well (e.g.: you don't know which files needs to be protected).
Or write one-liner:
protect <- function(p) if (file.exists(p)) stop("File exsits!") else p
and use it always when providing filename.
Is there a way to force this behaviour session wide? Some kind of global setting for connections? Maybe only for subset of functions (graphics devices, file-created connections, etc)? Maybe some system specific solution?
The following could be used as test case:
test <- function(i) {
try(write.table(i, "test_001.csv"))
try(writeLines(as.character(i), "test_002.txt"))
try({png("test_003.png");plot(i);dev.off()})
try({pdf("test_004.pdf");plot(i);dev.off()})
try(save(i, file="test_005.RData"))
try({f<-file("test_006.txt", "w");cat(as.character(i), file=f);close(f)})
}
test(1)
magic_incantations() # or magic_incantations(test(2)), etc.
test(2) # should fail on all writes (to test set read-only to files from first call)
The conventional way to avoid clobbering data files isn't to look for OS hacks, but to use filenames and directories that are special for your session.
session.dir <- tempdir()
...
write.table(i, file.path(session.dir,"test_001.csv"))
writeLines(as.character(i), file.path(session.dir,"test_002.txt"))
...
or
session.pid <- Sys.getpid()
...
write.table(i, paste0("test_001.",session.pid,".csv"))
writeLines(as.character(i), paste0("test_002.",session.pid,".txt"))
...

How to create a new output file in R if a file with that name already exists?

I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine for the first time as no file with that name already existED in the directory. Problem is I want the R-script to be running every other hour. So, I need some solution that creates a new file in the directory instead of replacing that file. Just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name.
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a twice-hourly job I reckon a timestamp down to minutes is probably enough.
See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.

R: While loop input

I am bit new to R and have a question about a program I am trying to write. I am hoping to take in files (as many as a user pleases) with a while loop (eventually using read.table on each) but it keeps breaking on me.
Here is what I have so far:
cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
fil<-readLines(con="stdin", 1)
cat(fil, "\n")
while (!input=='X' | !input=='x'){
inputfile=input
input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
}
if(input=='X' | input=='x'){
exit -1
}
When I run it (from the commandline (UNIX)) I get these results:
> library("lattice")
>
> cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
Please enter the full path for your files, if you have no more files to add enter 'X': > fil<-readLines(con="stdin", 1)
x
> cat(fil, "\n")
x
> while (!input=='X' | !input=='x'){
+ inputfile=input
+ input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
+ }
Error: object 'input' not found
Execution halted
I am not quite sure how to fix the problem, but I am pretty sure that it is probably a simple problem.
Any suggestions?
Thanks!
when you first run the script input doesnt exist. Assign
input<-c()
say before your while statement or put
inputfile=input
below input<- readline....
I'm not exactly sure what the underlying problem is for your issue. It may be that you're inputting the directory path incorrectly.
Here's a solution I've used a few times. It makes it much easier for the user. Basically, your code will not require user input, all it requires is that you have a certain naming convention for your files.
setwd("Your/Working/Directory") #This doesn't change
filecontents <- 1
i <- 1
while (filecontents != 0) {
mydata.csv <- try(read.csv(paste("CSV_file_",i,".csv", sep = ""), header = FALSE), silent = TRUE)
if (typeof(mydata.csv) != "list") { #checks to see if the imported data is a list
filecontents <- 0
}
else {
assign(paste('dataset',i, sep=''), mydata)
#Whatever operations you want to do on the files.
i <- i + 1
}
}
As you can see, the naming convention for the files is CSV_file_n where n is any number of input files (i took this code out of one of my programs, in which I load csv's). One of the problems I kept having was Error messages popping up when my code looked for a file that wasn't there. With this loop, those messages won't arise. If it assigns the contents of a non-existant file to mydata.csv, it merely checks to see if mydata.csv is a list. If it is, it continues operating. If not, it stops. If you're worried about differentiating between your data from different files within the code, just insert any relevant information about the file in a constant location within the file itself. For example, in my csv's, My 3rd column always contained the name of the image from which I gathered the information contained in the rest of the csv.
Hope this helps you a bit, even though I see you've already got a solution :-). It's really just an option if you want your program to be more autonomous.

Resources