How to quickly export multiple files from RStudio

I'd like to know how I can export subsets of a data frame in R in an automated way.
I am currently using this manual method, where I retype the 'a' and 'file_name' values for every file I want to save:
data <- MS[grepl('a', MS$name),]
write.xlsx(data, 'file_path/file_name')
Any help would be very much appreciated.

I would try something like this:
lijst <- c('a','b','c') # list of the values you type for 'a'
for (a in lijst) {
  filename <- paste0('file_path/', a, '.xlsx')
  data <- MS[grepl(a, MS$name), ]
  write.xlsx(data, filename)
}
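For completeness, the same idea works without an explicit loop by using lapply(). This is only a sketch: it uses a small mock MS data frame and write.csv() into a temporary directory so it runs anywhere; swap in your real MS data, write.xlsx(), and output folder.

```r
# Mock data frame standing in for MS; replace with your own.
MS <- data.frame(name = c("a1", "a2", "b1", "c1"), value = 1:4)

values <- c("a", "b", "c")  # the values you would type for 'a'
paths <- vapply(values, function(v) {
  subset_df <- MS[grepl(v, MS$name), ]            # subset rows matching v
  path <- file.path(tempdir(), paste0(v, ".csv")) # build one path per value
  write.csv(subset_df, path, row.names = FALSE)   # write.xlsx() in real use
  path
}, character(1))

file.exists(paths)  # one file per value
```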


Binding rows of multiple data frames into one data frame in R

I have a vector of file paths called dfs, and I want to read each of those files into a data frame and bind them all together into one huge data frame, so I did something like this:
for (df in dfs) {
  clean_df <- bind_rows(as.data.table(read.delim(df, header = T, sep = "|")))
  return(clean_df)
}
but only the last item in the dataframe is being returned. How do I fix this?
I'm not sure about your file format, so I'll use the common .csv format as an example. Replace the a * i part, which just generates mockup data here, with actually reading your different files.
files <- list()
for (i in 1:10) {
  a <- read.csv('test.csv', header = FALSE)
  a <- a * i  # mockup data; replace with reading your different files
  files[[i]] <- a
}
full_frame = data.frame(data.table::rbindlist(files))
The problem is that read.delim() reads only one file at a time, so the solution is to use a function like lapply() to read in each file in your vector of paths.
Here's an example:
library(tidyverse)
df <- c("file1.txt","file2.txt")
all.files <- lapply(df,function(i){read.delim(i, header=T, sep="|")})
clean_df <- bind_rows(all.files)
(clean_df)
Note that you don't need the function return(); wrapping clean_df in parentheses prompts R to print the variable.
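A two-line demonstration of that printing behaviour:

```r
# A plain assignment is silent; wrapping it in parentheses
# makes R print the assigned value.
x <- 1:3    # prints nothing
(y <- 1:3)  # prints: [1] 1 2 3
```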

Convert a list into multiple data frame by list column

I have imported an Excel file with multiple worksheets; the result is a list.
names(mysheets)
#[1] "test_sheet1" "test_sheet2"
test_sheet1 and test_sheet2 contain different matrices.
I want to turn each worksheet into an individual data frame.
If I do it manually, the code looks like this:
s_1 <- data.frame(mysheets[1])
s_2 <- data.frame(mysheets[2])
I tried to write a function to do it, because I have many Excel files and each file has multiple worksheets:
p_fun <- function(y) {
  for (s_i in 1:2) {
    for (i in 1:2) {
      s_i <- data.frame(y[i])
      return(s_i)
    }
  }
}
It didn't work correctly. I'd appreciate it if anyone can help.
You could use mget to get the objects and then convert them to data frames:
list_df <- lapply(mget(names(mysheets)), data.frame)
If you want them as separate dataframes, we can do
names(list_df) <- paste0('s_', seq_along(list_df))
list2env(list_df, .GlobalEnv)
We can use assign() if we are doing this in a for loop:
for(i in seq_along(mysheets)) assign(paste0("s", i), data.frame(mysheets[i]))

How to store data from read.table to variable array

I have data files something like
class1 class2 ....
1 1 ....
2 1
If I try to read data file like this
var <- read.table("file path", sep="\t",header=TRUE)
It works correctly, so I can access to the data using 'var' variable.
But if I try to read the data in a for loop over a list of files, like this:
var <- c()
for (file in list.files(path = "inputDir")) {
  i <- i + 1
  var[i] <- read.table("file path", sep = "\t", header = TRUE)
}
I get only the first column of each file instead of the full data.
Do I have to make separate variables like var1, var2, ...?
Can't I use var[i]?
With
var <- c()
you create a (numerical) vector. I guess the imported data gets coerced to that format too, which is why you only see 'one column'.
What you want is a list:
var <- list()
Make sure to index it with double brackets afterwards, like so:
var[[i]] = ...
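A minimal illustration of why the double brackets matter:

```r
# `[` on a list returns a sub-list; `[[` returns the element itself.
var <- list()
var[[1]] <- data.frame(a = 1:2)
class(var[1])    # "list"       -- still wrapped in a list
class(var[[1]])  # "data.frame" -- the stored data frame
```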
You should use a list for this kind of task; a data frame can only store columns with the same number of rows.
var <- list()
i <- 1
for (file in list.files(path = "inputDir")) {
  # build the full path from the loop variable rather than a fixed string
  var[[as.character(i)]] <- read.table(file.path("inputDir", file),
                                       sep = "\t", header = TRUE)
  i <- i + 1
}
I hope this helps. I'm not sure this code runs exactly as written, so you may need to debug it from the error messages.
If you really can't get it working, post some sample files so everyone can debug it for you.

lapply r to one column of a csv file

I have a folder with several hundred csv files. I want to use lapply to calculate the mean of one column within each csv file and save that value into a new csv file that would have two columns: Column 1 would be the name of the original file. Column 2 would be the mean value for the chosen field from the original file. Here's what I have so far:
setwd("C:/~~~~")
list.files()
filenames <- list.files()
read_csv <- lapply(filenames, read.csv, header = TRUE)
dataset <- lapply(filenames[1], mean)
write.csv(dataset, file = "Expected_Value.csv")
Which gives the error message:
Warning message: In mean.default("2pt.csv"[[1L]], ...) : argument is not numeric or logical: returning NA
So I think I have (at least) two problems that I cannot figure out.
First, why doesn't R recognize that column 1 is numeric? I double- and triple-checked the csv files and I'm sure this column is numeric.
Second, how do I get the output file to contain the two columns described above? I haven't gotten far with the second part yet;
I wanted to get the first part working first. Any help is appreciated.
I didn't use lapply but have done something similar. Hope this helps!
n_files <- 2 ## modify as per need
## create empty data frame
df <- NULL
## directory from which all files are to be read
directory <- "C:/mydir/"
## read all csv file names from the directory
x <- list.files(directory, pattern = "csv")
xpath <- paste0(directory, x)
## for loop to read each file and save the metric and file name
for (i in seq_len(n_files)) {
  file <- read.csv(xpath[i], header = TRUE)
  first_col <- file[, 1]
  d <- data.frame(mean = mean(first_col), filename = x[i])
  df <- rbind(df, d)
}
## write all output to csv
write.csv(df, file = "C:/mydir/final.csv")
The output CSV looks like this:
mean filename
1999.000661 hist_03082015.csv
1999.035121 hist_03092015.csv
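For reference, here is a sketch of the lapply/sapply route the question was originally aiming for. The two mock files written to a temp directory are placeholders so the example is self-contained; point `dir` at your own folder and pick the column you need.

```r
# Create two small mock CSV files; replace `dir` with your own folder.
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:4), file.path(dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(x = 5:8), file.path(dir, "f2.csv"), row.names = FALSE)

filenames <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
# mean of the first column of each file
means <- sapply(filenames, function(f) mean(read.csv(f)[, 1]))
result <- data.frame(filename = basename(filenames), mean = means,
                     row.names = NULL)
result  # f1.csv -> 2.5, f2.csv -> 6.5
write.csv(result, file.path(dir, "Expected_Value.csv"), row.names = FALSE)
```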
Thanks for the two answers. After much review, it turns out that there was a much easier way to accomplish my goal. The csv files I had were originally one file; I had split them into multiple files by location, thinking that was necessary to calculate the mean for each type. Clearly, that was a mistake. I went back to the original file and used aggregate. Code:
setwd("C:/~~")
allshots <- read.csv("All_Shots.csv", header=TRUE)
EV <- aggregate(allshots$points, list(Location = allshots$Loc), mean)
write.csv(EV, file= "EV_location.csv")
This was a simple solution. Thanks again for the answers. I'll need to get better at lapply for future projects, so they were not a waste of time.

How to not overwrite file in R

I am trying to copy and paste tables from R into Excel. Consider the following code from a previous question:
data <- list.files(path = getwd())
n <- length(data)
for (i in 1:n) {
  data1 <- read.csv(data[i])
  outline <- data1[, 2]
  outline <- as.data.frame(table(outline))
  print(outline)  # this prints all n tables
  name <- paste0(i, "X.csv")
  write.csv(outline, name)
}
This code writes each table into separate Excel files (i.e. "1X.csv", "2X.csv", etc..). Is there any way of "shifting" each table down some rows instead of rewriting the previous table each time? I have also tried this code:
output <- as.data.frame(output)
wb = loadWorkbook("X.xlsx", create=TRUE)
createSheet(wb, name = "output")
writeWorksheet(wb,output,sheet="output",startRow=1,startCol=1)
writeNamedRegion(wb,output,name="output")
saveWorkbook(wb)
But this does not copy the dataframes exactly into Excel.
I think, as mentioned in the comments, the way to go is to first merge the data frames in R and then write them into one output file:
# get vector of filenames
filenames <- list.files(path=getwd())
# for each filename: load file and create outline
outlines <- lapply(filenames, function(filename) {
  data <- read.csv(filename)
  outline <- data[, 2]
  outline <- as.data.frame(table(outline))
  outline
})
# merge all outlines into one data frame (by appending them row-wise)
outlines.merged <- do.call(rbind, outlines)
# save merged data frame
write.csv(outlines.merged, "all.csv")
Despite what Microsoft would like you to believe, .csv files are not Excel files; they are a common file type that can be read by Excel and many other programs.
The best approach depends on what you really want to do. Do you want all the tables to end up in a single worksheet in Excel? If so, you could write everything to a single file using write.table() with append = TRUE (note that write.csv() itself ignores the append argument), or use a connection that you keep open so each new table is appended. You may want to use cat() to put a couple of newlines before each new table.
Your second attempt looks like it uses the XLConnect package (but you don't say, so it could be something else). I would think this is the best approach; how is the result different from what you are expecting?
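A minimal sketch of the single-file approach described above, using two mock frequency tables in place of the real per-file outlines. The open connection means each write.table() call continues where the last one stopped, and cat() supplies a blank line and a label before each table:

```r
# Two mock frequency tables standing in for the real outlines.
tables <- list(first  = data.frame(outline = c("a", "b"), Freq = c(2L, 1L)),
               second = data.frame(outline = "c", Freq = 3L))

out <- file.path(tempdir(), "all_tables.csv")
con <- file(out, open = "w")        # keep one connection open for all tables
for (nm in names(tables)) {
  cat("\n", nm, "\n", file = con, sep = "")        # blank line + table label
  write.table(tables[[nm]], con, sep = ",", row.names = FALSE)
}
close(con)
```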
