I am bit new to R and have a question about a program I am trying to write. I am hoping to take in files (as many as a user pleases) with a while loop (eventually using read.table on each) but it keeps breaking on me.
Here is what I have so far:
cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
fil<-readLines(con="stdin", 1)
cat(fil, "\n")
while (!input=='X' | !input=='x'){
inputfile=input
input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
}
if(input=='X' | input=='x'){
exit -1
}
When I run it (from the commandline (UNIX)) I get these results:
> library("lattice")
>
> cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
Please enter the full path for your files, if you have no more files to add enter 'X': > fil<-readLines(con="stdin", 1)
x
> cat(fil, "\n")
x
> while (!input=='X' | !input=='x'){
+ inputfile=input
+ input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
+ }
Error: object 'input' not found
Execution halted
I am not quite sure how to fix the problem, but I am pretty sure that it is probably a simple problem.
Any suggestions?
Thanks!
when you first run the script input doesnt exist. Assign
input<-c()
say before your while statement or put
inputfile=input
below input<- readline....
I'm not exactly sure what the underlying problem is for your issue. It may be that you're inputting the directory path incorrectly.
Here's a solution I've used a few times. It makes it much easier for the user. Basically, your code will not require user input, all it requires is that you have a certain naming convention for your files.
setwd("Your/Working/Directory") #This doesn't change
filecontents <- 1
i <- 1
while (filecontents != 0) {
mydata.csv <- try(read.csv(paste("CSV_file_",i,".csv", sep = ""), header = FALSE), silent = TRUE)
if (typeof(mydata.csv) != "list") { #checks to see if the imported data is a list
filecontents <- 0
}
else {
assign(paste('dataset',i, sep=''), mydata)
#Whatever operations you want to do on the files.
i <- i + 1
}
}
As you can see, the naming convention for the files is CSV_file_n where n is any number of input files (i took this code out of one of my programs, in which I load csv's). One of the problems I kept having was Error messages popping up when my code looked for a file that wasn't there. With this loop, those messages won't arise. If it assigns the contents of a non-existant file to mydata.csv, it merely checks to see if mydata.csv is a list. If it is, it continues operating. If not, it stops. If you're worried about differentiating between your data from different files within the code, just insert any relevant information about the file in a constant location within the file itself. For example, in my csv's, My 3rd column always contained the name of the image from which I gathered the information contained in the rest of the csv.
Hope this helps you a bit, even though I see you've already got a solution :-). It's really just an option if you want your program to be more autonomous.
Related
So my problem is the following, I am reading through a text file using readlines, and have to go line by line until there is a trigger, and then I have to read the following line using read.fortran().
The problem is that read.fortran() only read text files, and I need to read a line (a string). At the moment I have solved this problem with a disastrous function which writes a file and then I read the file.
read.fortran.string <- function(x_string, x_format, x_file, col.names){
# Disastrous function to be changed
if (missing(x_file)){
x_file = r'(C:\Users\your_user\Documents\tempfolder\line.txt)'
}
write(x_string,x_file)
df = read.fortran(x_file, x_format)
if (!missing(col.names)){
colnames(df) <-col.names
}
return(df)
}
I was expecting that someone will point to a different solution in which:
Do not require to hardcode a path
Do not require having to have writing permission
I believe that
read.fortran(textConnection(x_string), x_format)
will do what you want (untested!)
I am using system() to run several files iteratively through a program via CMD. It deposits each outputs into a sub-directory designated for specifically and only that input file. So # of inputs is exactly equal to the number of output directories/outputs.
My code works for the first iteration, but I can see in the console that it won't move on to the second file after completing the first. The stop sign remains active so I know R is still 'running', but since the for loop environment is unique I can't really tell what it's stuck on. It just stays like this for hours. Therefore I'm not sure how to begin to diagnose the issue I'm having. Is there a way of tracing what happened after cancelling the code, for example?
If your curious, the code looks like this btw. I don't know how to make it reproducible, so I just commented each line:
for (i in 1:length(flist)) {
##flist is a vector of character strings. Each
row of characters is both the name of the input file and the name of the
output directory
setwd(paste0(solutions_dir, "\\", flist[i]))
#sets the appropriate dir
system(paste0(program_dir,"\\program.exe I=",
file_dir, "\\", flist[i], " O=",solutions_dir, "\\", flist[i],
"\\solv"))
##line that inputs program's exe file and the appropriate input/output
locations
}
I have a script in which I call R and depending on the directory I specify I want it to carry out a different process. One directory starts with L and the other with S. I have numerous directories that either start with L or S and they all end differently.
I specify the directory in bash and run a script like so:
./script L_dir
or
./script S_dir
So within my R script I have it set up as such:
args <- commandArgs(TRUE)
img_dir <- args[1]
if(img_dir == "^L*"){
do_process_1
} else {
do_process_2
}
Everything works fine except that no matter what directory I specify, the process called will always be do_process_2.
I have looked at this question and tried to adapt it but can't get it to work.
After changing my code to
if(grepl("^LM*", img_dir)){
do_process_1
} else {
do_process_2
}
it worked. Be careful if you change it to the above and it still carries out process_2. This may be because what you are looking for, in my case ^L*, may also be in your second directory name i.e. dir_L = LMNOP, dir_S = STUVLJH. But once i specified ^LM* it did what i wanted it to do.
I'm having issues when trying to read in a binary file I've previously written into another program. I have been able to open it and read it to an array with out compilation errors, however, the array is not populated (all 0's). Any suggestions or thoughts would be great. Here is the open/read statement I'm using:
allocate(dummy(imax,jmax))
open(unit=io, file=trim(input), form='binary', access='stream', &
iostat=ioer, status='old', action='READWRITE')
if(ioer/=0) then
print*, 'Cannot open file'
else
print*,'success opening file'
end if
read(unit=io, fmt=*, iostat=ioer) dummy
j=0
k=0
size: do j=1, imax
do k=1, jmax
if(dummy(j,k) > 0.) print*,dummy(j,k)
end do
end do size
Please let me know if you need more info.
Here is how the file is originally written:
out_file = trim(output_dir)//'SEVIRI_FRP_.08deg_'//trim(season)//'.bin'
print*, out_file
print*, i_max,' i_max,',j_max,' j_max'
open (io, file = out_file, access = 'direct', status = 'replace', recl = i_max*j_max*4)
write(io, rec = 1) sev_frp
write(io, rec = 2) count_sev_frp
write(io, rec = 3) sum_sev_frp
check: do n=1, i_max
inna: do m=1, j_max
!if (sev_frp(n,m) > 0) print*, count_sev_frp(n,m)
end do inna
end do check
print*,'n-',n,'m-',m
close(io)
First of all the form takes two possible values as far as I know: "FORMATTED" or "UNFORMATTED".
Second, to read, you should use a open that is symmetric to the open statement that you used to write the file, Unless you know exactely what you are doing. I suggest that for reading, you open with:
open(unit=io, file=trim(input), access='direct', &
iostat=ioer, status='old', action='READ', recl = i_max*j_max*4)
That corresponds to the open statement that you used to save the file.
As innoSPG says, you have a mismatch in the way the file is written and how it is read.
An external file may be connected with one of three access methods: sequential; direct; stream. Further, a connection may be formatted or unformatted.
When the file is opened for writing it uses the direct access method with unformatted records. The records are unformatted because this is the default (in the abscence of the form= specifier).
When you open the file for reading you use the non-standard extension of form="binary" and stream access. There is possibly nothing wrong with this, but it does require care.
However, with the read statements you are using formatted (list-directed) input. This will not be allowed.
The way suggested in the previous answer, of using a similar access method and record length will require a further change to the code. [You'll also need to set the value of the record length somehow.]
Not only will you need to remove the format, to match the unformatted records written, but you'll want to use the rec= specifier to access the records of the file.
Finally, if you are using the iostat= specifier you really should check the resulting value.
I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine for the first time as no file with that name already existED in the directory. Problem is I want the R-script to be running every other hour. So, I need some solution that creates a new file in the directory instead of replacing that file. Just like what happens when a file is downloaded multiple times from Google Chrome.
I'd just time-stamp the file name.
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a twice-hourly job I reckon a timestamp down to minutes is probably enough.
See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.