User independent data import

User independent data import - r

Say X and Y are working on a project which requieres a file data.csv. They have some common file within a cloud service named main.R. Now assume that within main.R X and Y respectively are importing data via
# uncomment first line if you are X, otherwise uncomment second line
# data <- read.csv("C:/User/X/Documents/cloud/project/data.csv")
# data <- read.csv("C:/User/Y/Desktop/cloud/project/data.csv")
Instead of uncomment one of the lines depending who is running the script I'd like to have one command in total which is universal and referes to the part of the file path which they have in common, say,
data <- read.csv(".../cloud/project/data.csv")
Any idea how to accomplish this?

Try this:
#check if directory exists
dataDir <-
if(dir.exists(paste0("C:/Users/",Sys.info()["effective_user"], "/Documents/cloud/project/"))){
paste0("C:/Users/", Sys.info()["effective_user"],"/Documents/cloud/project/")
} else if(dir.exists(paste0("C:/Users/", Sys.info()["effective_user"],"/Desktop/cloud/project/"))) {
paste0("C:/Users/", Sys.info()["effective_user"],"/Desktop/cloud/project/")
}
#if exists then read in
if(!is.null(dataDir)){ read.csv(paste0(dataDir,"data.csv")) }

Related

Create multiple directories based on files found

I'm trying to create directories to store image sequences based on a 'find' command.
Let's say there are 2 image sequences in different locations within the 'test' directory
test_123.####.dpx
test_abc.####.dpx
I would run something like:
testDir=$(find /Users/Tim/test/ -type f -name "test*.dpx")
and it would return all of the files as listed above.
What I would like to do is create two directories named test_123 and test_abc.
mkdir /Users/Tim/test/scan/${testDir:t:r:r}
If I run this then it will only create one directory, presumably based on the first result.
How would I be able to make this work to create directories that share the same base name for an unlimited number of results? (not just two as in the case of this example).

if it's not important that it's done in zsh, you could just do it in python like this:
#!/usr/bin/python3
import os
x = 0
y = 0
if input(f"{os.getcwd()} is current working dir. press y to continue \n") == "y":
for file in os.listdir(): # get files in current folder
split = file.split(".") # split by .
if len(split) > 1: # only files with more than one . are considered
if split[0] not in os.listdir(): # if the folder it fits doesn't exist
os.mkdir(split[0]) # make the folder
print(f"made new folder {split[0]}")
y = y + 1
os.rename(file, os.path.join(split[0],file)) # then move it there - OS agnostic path generated!
x = x + 1
print(f"moved {x} files to {y} folders!")
I added a check before it runs, just to prevent random people who find this from wreaking havoc. The if len(split) > 1: should also prevent some accidents by making sure it's only files with at least 2 dots that are moved, as that's an unusual naming scheme.

If statement for directory starting with specific character

I have a script in which I call R and depending on the directory I specify I want it to carry out a different process. One directory starts with L and the other with S. I have numerous directories that either start with L or S and they all end differently.
I specify the directory in bash and run a script like so:
./script L_dir
or
./script S_dir
So within my R script I have it set up as such:
args <- commandArgs(TRUE)
img_dir <- args[1]
if(img_dir == "^L*"){
do_process_1
} else {
do_process_2
}
Everything works fine except that no matter what directory I specify, the process called will always be do_process_2.
I have looked at this question and tried to adapt it but can't get it to work.

After changing my code to
if(grepl("^LM*", img_dir)){
do_process_1
} else {
do_process_2
}
it worked. Be careful if you change it to the above and it still carries out process_2. This may be because what you are looking for, in my case ^L*, may also be in your second directory name i.e. dir_L = LMNOP, dir_S = STUVLJH. But once i specified ^LM* it did what i wanted it to do.

How to create a new output file in R if a file with that name already exists?

I am trying to run an R-script file using windows task scheduler that runs it every two hours. What I am trying to do is gather some tweets through Twitter API and run a sentiment analysis that produces two graphs and saves it in a directory. The problem is, when the script is run again it replaces the already existing files with that name in the directory.
As an example, when I used the pdf("file") function, it ran fine for the first time as no file with that name already existED in the directory. Problem is I want the R-script to be running every other hour. So, I need some solution that creates a new file in the directory instead of replacing that file. Just like what happens when a file is downloaded multiple times from Google Chrome.

I'd just time-stamp the file name.
> filename = paste("output-",now(),sep="")
> filename
[1] "output-2014-08-21 16:02:45"
Use any of the standard date formatting functions to customise to taste - maybe you don't want spaces and colons in your file names:
> filename = paste("output-",format(Sys.time(), "%a-%b-%d-%H-%M-%S-%Y"),sep="")
> filename
[1] "output-Thu-Aug-21-16-03-30-2014"
If you want the behaviour of adding a number to the file name, then something like this:
serialNext = function(prefix){
if(!file.exists(prefix)){return(prefix)}
i=1
repeat {
f = paste(prefix,i,sep=".")
if(!file.exists(f)){return(f)}
i=i+1
}
}
Usage. First, "foo" doesn't exist, so it returns "foo":
> serialNext("foo")
[1] "foo"
Write a file called "foo":
> cat("fnord",file="foo")
Now it returns "foo.1":
> serialNext("foo")
[1] "foo.1"
Create that, then it returns "foo.2" and so on...
> cat("fnord",file="foo.1")
> serialNext("foo")
[1] "foo.2"
This kind of thing can break if more than one process might be writing a new file though - if both processes check at the same time there's a window of opportunity where both processes don't see "foo.2" and think they can both create it. The same thing will happen with timestamps if you have two processes trying to write new files at the same time.
Both these issues can be resolved by generating a random UUID and pasting that on the filename, otherwise you need something that's atomic at the operating system level.
But for a twice-hourly job I reckon a timestamp down to minutes is probably enough.

See ?files for file manipulation functions. You can check if file exists with file.exists, and then either rename the existing file, or create a different name for the new one.

R - Connect Scripts via Pipes

I have a number of R scripts that I would like to chain together using a UNIX-style pipeline. Each script would take as input a data frame and provide a data frame as output. For example, I am imagining something like this that would run in R's batch mode.
cat raw-input.Rds | step1.R | step2.R | step3.R | step4.R > result.Rds
Any thoughts on how this could be done?

Writing executable scripts is not the hard part, what is tricky is how to make the scripts read from files and/or pipes. I wrote a somewhat general function here: https://stackoverflow.com/a/15785789/1201032
Here is an example where the I/O takes the form of csv files:
Your step?.R files should look like this:
#!/usr/bin/Rscript
OpenRead <- function(arg) {
if (arg %in% c("-", "/dev/stdin")) {
file("stdin", open = "r")
} else if (grepl("^/dev/fd/", arg)) {
fifo(arg, open = "r")
} else {
file(arg, open = "r")
}
}
args <- commandArgs(TRUE)
file <- args[1]
fh.in <- OpenRead(file)
df.in <- read.csv(fh.in)
close(fh.in)
# do something
df.out <- df.in
# print output
write.csv(df.out, file = stdout(), row.names = FALSE, quote = FALSE)
and your csv input file should look like:
col1,col2
a,1
b,2
Now this should work:
cat in.csv | ./step1.R - | ./step2.R -
The - are annoying but necessary. Also make sure to run something like chmod +x ./step?.R to make your scripts executables. Finally, you could store them (and without extension) inside a directory that you add to your PATH, so you will be able to run it like this:
cat in.csv | step1 - | step2 -

Why on earth you want to cram your workflow into pipes when you have the whole R environment available is beyond me.
Make a main.r containing the following:
source("step1.r")
source("step2.r")
source("step3.r")
source("step4.r")
That's it. You don't have to convert the output of each step into a serialised format; instead you can just leave all your R objects (datasets, fitted models, predicted values, lattice/ggplot graphics, etc) as they are, ready for the next step to process. If memory is a problem, you can rm any unneeded objects at the end of each step; alternatively, each step can work with an environment which it deletes when done, first exporting any required objects to the global environment.
If modular code is desired, you can recast your workflow as follows. Encapsulate the work done by each file into one or more functions. Then call these functions in your main.r with the appropriate arguments.
source("step1.r") # defines step1_read_input, step1_f2
source("step2.r") # defines step2_f2
source("step3.r") # defines step3_f1, step3_f2, step3_f3
source("step4.r") # defines step4_write_output
step1_read_input(...)
step1_f2(...)
....
step4write_output(...)

You'll need to add a line at the top of each script to read in from stdin. Via this answer:
in_data <- readLines(file("stdin"),1)
You'll also need to write the output of each script to stdout().

R: While loop input

I am bit new to R and have a question about a program I am trying to write. I am hoping to take in files (as many as a user pleases) with a while loop (eventually using read.table on each) but it keeps breaking on me.
Here is what I have so far:
cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
fil<-readLines(con="stdin", 1)
cat(fil, "\n")
while (!input=='X' | !input=='x'){
inputfile=input
input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
}
if(input=='X' | input=='x'){
exit -1
}
When I run it (from the commandline (UNIX)) I get these results:
> library("lattice")
>
> cat("Please enter the full path for your files, if you have no more files to add enter 'X': ")
Please enter the full path for your files, if you have no more files to add enter 'X': > fil<-readLines(con="stdin", 1)
x
> cat(fil, "\n")
x
> while (!input=='X' | !input=='x'){
+ inputfile=input
+ input<- readline("Please enter the full path for your files, if you have no more files to add enter 'X': ")
+ }
Error: object 'input' not found
Execution halted
I am not quite sure how to fix the problem, but I am pretty sure that it is probably a simple problem.
Any suggestions?
Thanks!

when you first run the script input doesnt exist. Assign
input<-c()
say before your while statement or put
inputfile=input
below input<- readline....

I'm not exactly sure what the underlying problem is for your issue. It may be that you're inputting the directory path incorrectly.
Here's a solution I've used a few times. It makes it much easier for the user. Basically, your code will not require user input, all it requires is that you have a certain naming convention for your files.
setwd("Your/Working/Directory") #This doesn't change
filecontents <- 1
i <- 1
while (filecontents != 0) {
mydata.csv <- try(read.csv(paste("CSV_file_",i,".csv", sep = ""), header = FALSE), silent = TRUE)
if (typeof(mydata.csv) != "list") { #checks to see if the imported data is a list
filecontents <- 0
}
else {
assign(paste('dataset',i, sep=''), mydata)
#Whatever operations you want to do on the files.
i <- i + 1
}
}
As you can see, the naming convention for the files is CSV_file_n where n is any number of input files (i took this code out of one of my programs, in which I load csv's). One of the problems I kept having was Error messages popping up when my code looked for a file that wasn't there. With this loop, those messages won't arise. If it assigns the contents of a non-existant file to mydata.csv, it merely checks to see if mydata.csv is a list. If it is, it continues operating. If not, it stops. If you're worried about differentiating between your data from different files within the code, just insert any relevant information about the file in a constant location within the file itself. For example, in my csv's, My 3rd column always contained the name of the image from which I gathered the information contained in the rest of the csv.
Hope this helps you a bit, even though I see you've already got a solution :-). It's really just an option if you want your program to be more autonomous.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

User independent data import - r

Related

Create multiple directories based on files found

If statement for directory starting with specific character

How to create a new output file in R if a file with that name already exists?

R - Connect Scripts via Pipes

R: While loop input

Categories

Resources