I have written a R script for binning on the specific parameters of several .csv files in the same folder. I used the smbinning package. When I execute the script, it produces detailed results. I do not need all of them. I want to take a specific part of the results and write into a .csv file automatically. Can someone tell me how can I do this? My R script, details results, and wanted parts of result is as follows
My R script is as follows:
library(smbinning)
files <- list.files(pattern = "0.csv")
cutpoint <- rep(0,length(files))
for(i in 1:length(files)){
data <- read.csv(files[i],header=T)
df.train <- data.frame(data)
df.train_amp <-rbind(df.train)
cutpoint[i] <- smbinning(df=df.train_amp, y="cvflg",x="dwell")
}
result <- cbind(files,cutpoint)
write.csv(result,"result_dwell.csv")
You can use View(result) to see if the variable contains exactly what your require. Else there is something wrong in your logic.
There is function sink in R which writes the output of a program to a file.
https://stat.ethz.ch/R-manual/R-devel/library/base/html/sink.html
Related
I am using the purrr:walk to read multiple excel files and it failed. I have 3 questions:
(1) I used the function list.files to read the excel file list in one folder. But the returned values also included the subfolders. I tried set value for the parameters recursive= and include.dirs=, but it didn't work.
setwd(file_path)
files<-as_tibble(list.files(file_path,recursive=F,include.dirs=F)) %>%
filter(str_detect(value,".xlsx"))
files
(2) When I used the following piece of code, it can run without any error or warning message, but there is no returned data.
###read the excel data
file_read <- function(value1) {
print(value1)
file1<-read_excel(value1,sheet=1)
}
walk(files$value,file_read)
When I used the following, it worked. Not sure why.
test<-read_excel(files$value,sheet=1)
(3) In Q2, actually I want to create file1 to file6, suppose there are 6 excel files. How can I dynamically assign the dataset name?
list.files has pattern argument where you can specify what kind of files you are looking for. This will help you avoid filter(str_detect(value,".xlsx")) step. Also list.files only returns the files that are included in the main directory (file_path) and not it's subdirectory unless you specify recursive = TRUE.
library(readxl)
setwd(file_path)
files <- list.files(pattern = '\\.xlsx')
In the function you need to return the object.
file_read <- function(value1) {
data <- read_excel(value1,sheet=1)
return(data)
}
Now you can use map/lapply to read the files.
result <- purrr::map(files,file_read)
I am trying to deal with extracting a subset from multiple .grb2 files in the same file path, and write them in a csv. I am able to do it for one (or a few) by using the following set of commands:
GRIB <- brick("tmp2m.1989102800.time.grb2")
GRIB <- as.array(GRIB)
readGDAL("tmp2m.1989102800.time.grb2")
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,"tmp1.csv")
The above set of commands extract, in csv, temperature values for specific latitude "51" and longitude "27", as well as for a specific time range "c(261:1232)".
Now I have hundreds of these files (with different file names, of course) in the same directory and I want to do the same for all. As you know, better than me, I cannot do this to one by one, changing the file name each time.
I have struggled a lot with this, but so far I did not manage to do it. Since I am new in R, and my knowledge is limited, I would very much appreciate any possible help with this.
The simplest way would be to use a normal for loop:
path <- "your file path here"
input.file.names <- dir(path, pattern =".grb2")
output.file.names <- paste0(tools::file_path_sans_ext(file.names),".csv")
for(i in 1:length(file.names)){
GRIB <- brick(input.file.names[i])
GRIB <- as.array(GRIB)
readGDAL(input.file.names[i]) # edited line
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,output.file.names[i])
}
You could of course create the body of the for loop into a function and then use the standard lapply or the map function from purrr.
Note that this code will print out different CSV files. If you want to append the data to a single file then you should check out write.table
I would like to run two R functions from the library korPus on a bunch of txt files. First, I want to tokenize or treetag these texts and then to run MTLD on the outputs and save results in a single document. I'm trying to do this using lapply or for loop. The basic syntax for both commands is straightforward:
tokenized_1 <- tokenize("txt", lang="en")
MLTD(tokenized_1)
I've tried the following
files <- list.files(path="path/to/dir", pattern="*.txt", full.names=T, recursive=FALSE)
tokenized<-(lapply(files, tokenize, lang='en'))
But, this merges all results into a one file whereas I'd like to go through all files separately. In addition, it creates an invalid class for MTLD function.
I am familiar with for loop from shell scripting and I thought this would do what I want but
for (i in files){tokenize(i, lang='en')}
does nothing. I'm aware that this is a very naive and simplistic approach but I'm not a very experienced R user.
Use lapply
filenames <- list.files(pattern="*.txt")
text.tagged <- lapply(filenames, function(x) treetag(x, treetagger="manual", lang="en",
TT.options=list(path=filepath, preset="en")))
text.tagged[1]
I have a bunch of CSV files and I would like to perform the same analysis (in R) on the data within each file. Firstly, I assume each file must be read into R (as opposed to running a function on the CSV and providing output, like a sed script).
What is the best way to input numerous CSV files to R, in order to perform the analysis and then output separate results for each input?
Thanks (btw I'm a complete R newbie)
You could go for Sean's option, but it's going to lead to several problems:
You'll end up with a lot of unrelated objects in the environment, with the same name as the file they belong to. This is a problem because...
For loops can be pretty slow, and because you've got this big pile of unrelated objects, you're going to have to rely on for loops over the filenames for each subsequent piece of analysis - otherwise, how the heck are you going to remember what the objects are named so that you can call them?
Calling objects by pasting their names in as strings - which you'll have to do, because, again, your only record of what the object is called is in this list of strings - is a real pain. Have you ever tried to call an object when you can't write its name in the code? I have, and it's horrifying.
A better way of doing it might be with lapply().
# List files
filelist <- list.files(pattern = "*.csv")
# Now we use lapply to perform a set of operations
# on each entry in the list of filenames.
to_dispose_of <- lapply(filelist, function(x) {
# Read in the file specified by 'x' - an entry in filelist
data.df <- read.csv(x, skip = 1, header = TRUE)
# Store the filename, minus .csv. This will be important later.
filename <- substr(x = x, start = 1, stop = (nchar(x)-4))
# Your analysis work goes here. You only have to write it out once
# to perform it on each individual file.
...
# Eventually you'll end up with a data frame or a vector of analysis
# to write out. Great! Since you've kept the value of x around,
# you can do that trivially
write.table(x = data_to_output,
file = paste0(filename, "_analysis.csv"),
sep = ",")
})
And done.
You can try the following codes by putting all csv files in the same directory.
names = list.files(pattern="*.csv") %csv file names
for(i in 1:length(names)){ assign(names[i],read.csv(names[i],skip=1, header=TRUE))}
Hope this helps !
In R, I would like to read in data from a file, then do a bunch of stuff, then write out data to another file. I can do that. But I'd like to have the two files have similar names automatically.
e.g. if I create a file params1.R I can read it in with
source("c:\\personal\\consults\\ElwinWu\\params1.R")
then do a lot of stuff
then write out a resulting table with write.table and a filename similar to above, except with output1 instead of params1.
But I will be doing this with many different params files, and I can foresee making careless mistakes of not changing the output file to match the params file. Is there a way to automate this?
That is, set the number for output to match the number for params?
thanks
Peter
If your source file always contains "params" which you want to change to "output" then you can easilly do this with gsub:
source(file <- "c:\\personal\\consults\\ElwinWu\\params1.R")
### some stuff
write.table(youroutput, gsub("params","output",file) )
# Will write in "c:\\personal\\consults\\ElwinWu\\output1.R"
Edit:
Or to get .txt as filetype:
write.table(youroutput, gsub(".R",".txt",gsub("params","output",file)))
# Will output in c:\\personal\\consults\\ElwinWu\\output1.txt"
Edit2:
And a loop for 20 param files then would be:
n <- 20 # number of files
for (i in 1:n)
{
source(file <- paste("c:\\personal\\consults\\ElwinWu\\params",i,".R",sep=""))
### some stuff
write(youroutput, gsub(".R",".txt",gsub("params","output",file)))
}
If the idea is just to make sure that all the outputs go in the same directory as the input then try this:
source(file <- "c:\\personal\\consults\\ElwinWu\\params1.R")
old.dir <- setwd(dirname(file))
write.table(...whatever..., file = "output1.dat")
write.table(...whatever..., file = "output2.dat")
setwd(old.dir)
If you don't need to preserve the initial directory you can omit the last line.