R: Importing different Excel files and renaming columns in a loop - r

I am struggling to write code that imports different files (selecting and renaming columns) in a loop.
In detail, I am looking for an efficient way to create a loop that does the same as follows for several .xlsx files.
library(readxl)
# col_names is passed by name so the intent is explicit
Data <- read_excel("File.xlsx", sheet = "results", range = cell_cols("B:D"),
                   col_names = c("Col1", "Col2", "Col3"))
I have made several attempts, but none of them worked; can anyone suggest a solution?
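One way to approach this, as a minimal sketch rather than a definitive answer, is to wrap the single-file call in a function and apply it to every .xlsx file in a folder with lapply(). The function name import_all and its arguments are hypothetical; it assumes every file has a sheet called "results" with the data in columns B:D, as in the question.

```r
# Hypothetical helper: read columns B:D of the "results" sheet from every
# .xlsx file in `dir` and return a list of tables named after the files.
import_all <- function(dir = ".", sheet = "results", cols = "B:D",
                       col_names = c("Col1", "Col2", "Col3")) {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  tables <- lapply(files, function(path) {
    # Because col_names is supplied, readxl treats the first row in the
    # range as data; a header row in the file would show up as row 1.
    readxl::read_excel(path, sheet = sheet,
                       range = readxl::cell_cols(cols),
                       col_names = col_names)
  })
  # Name each list element after its source file (without the extension)
  names(tables) <- tools::file_path_sans_ext(basename(files))
  tables
}
```

Calling import_all("path/to/folder") would then give one table per file, and since all tables share the same column names they could be stacked afterwards with do.call(rbind, ...).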

Related

Cannot re-import .csv file after transposing and exporting

I'm working with a huge file to do some data analysis in R. This file is a .csv. I can import it just fine. However, after transposing all the rows and columns using data.frame(t(data)), I export it and cannot re-import this data.
This is the code I am using:
write.csv(transposed_data, file = "transposed_data.csv", row.names = FALSE, quote = FALSE)
When I transpose the rows and columns, does something happen to the data that is causing these issues? When using read.csv, the transposed data simply will not open.
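The question does not show an error message, so the following is only a sketch of one possible cause. t() converts a data frame to a matrix, so mixed columns all become character, and with quote = FALSE a value that contains a comma is written as two fields, which misaligns that row on re-import. The toy data and column names below are made up for illustration.

```r
# Toy data with a comma inside a field; the column names are made up.
df <- data.frame(id = c("a", "b"),
                 label = c("x, y", "z"),
                 value = c(1.5, 2),
                 stringsAsFactors = FALSE)

# t() goes through as.matrix(), so mixed columns all become character.
transposed <- data.frame(t(df), stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv")
# Keep the default quote = TRUE, and keep the row names,
# which now hold the original column names.
write.csv(transposed, path, row.names = TRUE)

# Read the first column back in as row names.
back <- read.csv(path, row.names = 1, stringsAsFactors = FALSE)
```

With quoting left on, the field "x, y" survives the round trip, and the original column names come back as row names.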

Writing dataframe to .txt file in R, without   added

I have merged two tables together and I want to write them to a .txt file. I have managed to do this, however when I open the .txt file in excel the symbols   have been added to some values. How do I stop this from happening? I have used the following code:
ICU_PPS <- merge(ICU, PPS, by=c("Study","Lane","Isolate_ID","Sample_Number","MALDI_ID", "WGS","Source"),all=TRUE)
write.table(ICU_PPS,"ICUPPS2.txt", sep="\t", row.names = FALSE)
An example of some values in a column that I get:
100_1#175
100_1#176
100_1#177
100_1#179
100_1#18 
100_1#19 
100_1#20 
What I want to achieve:
100_1#175
100_1#176
100_1#177
100_1#179
100_1#18
100_1#19
100_1#20
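A possible fix, assuming (this is not confirmed by the question) that the stray characters come from non-breaking or ordinary trailing spaces in the values, is to strip them before writing. The helper names below are hypothetical.

```r
# Hypothetical helper: remove non-breaking spaces (U+00A0) and
# ordinary trailing whitespace from a character vector.
strip_stray_space <- function(x) {
  x <- gsub("\u00a0", "", x, fixed = TRUE)  # drop non-breaking spaces anywhere
  trimws(x, which = "right")                # drop ordinary trailing whitespace
}

# Hypothetical helper: apply the cleaning to every character column.
clean_character_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], strip_stray_space)
  df
}

ids <- c("100_1#175", "100_1#18\u00a0", "100_1#19 ")
clean <- strip_stray_space(ids)
```

The cleaned data frame, for example clean_character_columns(ICU_PPS), could then be passed to write.table() as before. Factor columns would need converting with as.character() first.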

Creating a loop for creating multiple sheets from multiple Excel files in R

I have multiple Excel files with data. I want to split the data in each Excel file into multiple sheets within that particular file. I have already managed to do that with the following code:
library(openxlsx)  # package names are case-sensitive
data <- read.xlsx(file.choose())
# One data frame per value of the Assigned column
splitdata <- split(data, data$Assigned)
splitdata
workbook <- createWorkbook()
# Add one worksheet per group and write that group's rows to it
Map(function(data, name) {
  addWorksheet(workbook, name)
  writeDataTable(workbook, name, data)
}, splitdata, names(splitdata))
saveWorkbook(workbook, file = "WorkbookWithMultipleSheets.xlsx", overwrite = TRUE)
However, I have more than 50 Excel files that need to be split into multiple sheets with the code above. Is there any way to create a loop so that I won't have to repeat this code for each Excel file?
Any help is appreciated! Thank you!
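A minimal sketch of such a loop, assuming every file has an Assigned column and that the results should go to new files next to the originals. The function names and the "_split" suffix are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: split one workbook by a column and save the
# pieces as separate sheets in a new file next to the original.
split_into_sheets <- function(path, by = "Assigned") {
  data <- openxlsx::read.xlsx(path)
  pieces <- split(data, data[[by]])
  wb <- openxlsx::createWorkbook()
  for (name in names(pieces)) {
    openxlsx::addWorksheet(wb, name)
    openxlsx::writeDataTable(wb, name, pieces[[name]])
  }
  out <- file.path(dirname(path),
                   paste0(tools::file_path_sans_ext(basename(path)), "_split.xlsx"))
  openxlsx::saveWorkbook(wb, out, overwrite = TRUE)
  out
}

# Hypothetical helper: apply the function above to every .xlsx file in a
# folder, skipping files that an earlier run already produced.
split_all <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.xlsx$", full.names = TRUE)
  files <- files[!grepl("_split\\.xlsx$", files)]
  vapply(files, split_into_sheets, character(1))
}
```

split_all("path/to/folder") returns the paths of the workbooks it wrote. Writing to a new file instead of overwriting the input is a deliberate choice here, so a failed run cannot damage the source data.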

Specify column classes when reading in data via lapply(FileList, read.xls)

My question is about how to specify the class for various columns when reading in data that come from many files. More specifically, I am uploading 1000s of .xlsx files at a time and converting them to .csv files using the read.xls() function in the gdata package.
My approach is as follows:
Myfiles <- list.files() # lists all files in working directory (which contains data files)
library(gdata)
Mylist <- lapply(Myfiles, read.xls, header = TRUE,
                 perl = "C:/Users/A/PERL/perl/bin/perl.exe",
                 sheet = 1,
                 method = "csv",
                 skip = 1,
                 as.is = 1)
I apologize for not providing a workable example. I'm not sure how to do so for this problem.
All the .xlsx files have identical headers and set-up, but the classes of corresponding columns in the data frames within Mylist are not all the same. Is there a way to specify the classes within the lapply() approach I am using? I know you can extend functions of read.table() to read.xls() but I haven't figured out how to specify the column classes properly within the lapply call.
It's all in Gabor's comment, but to put this one to bed:
lapply(Myfiles, read.xls, colClasses = c("character", "numeric", "factor"), header=T)
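The reason this works is that lapply() forwards any extra arguments to the function it calls. The sketch below shows the same pass-through with read.csv() standing in for read.xls(), so that it runs without gdata or Perl; the file names and columns are made up.

```r
# Two small CSV files whose columns would be guessed differently
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
write.csv(data.frame(id = c("001", "002"), score = c(1.5, 2), group = c("a", "b")),
          file.path(demo_dir, "one.csv"), row.names = FALSE)
write.csv(data.frame(id = c("003", "004"), score = c(3L, 4L), group = c("b", "b")),
          file.path(demo_dir, "two.csv"), row.names = FALSE)

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# A named colClasses vector is matched to columns by name
classes <- c(id = "character", score = "numeric", group = "factor")

# Extra arguments after the function are forwarded to every read.csv() call
tables <- lapply(files, read.csv, colClasses = classes)

# Check that the column classes now agree across all files
column_classes <- lapply(tables, function(d) vapply(d, class, character(1)))
```

Without colClasses, the id column would be read as integers and lose its leading zeros, and score would be numeric in one file but integer in the other.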

Join two dataframes before exporting as .csv files

I am working on a large questionnaire - and I produce summary frequency tables for different questions (e.g. df1 and df2).
a<-c(1:5)
b<-c(4,3,2,1,1)
Percent<-c(40,30,20,10,10)
df1<-data.frame(a,b,Percent)
c<-c(1,1,5,2,1)
Percent<-c(10,10,50,20,10)
df2<-data.frame(a,c,Percent)
rm(a,b,c,Percent)
I normally export the dataframes as csv files using the following command:
write.csv(df1, file = "df1.csv")
However, as my questionnaire has many questions and therefore dataframes, I was wondering if there is a way in R to combine different dataframes (say with a line separating them), and export these to a csv (and then ultimately open them in Excel)? When I open Excel, I therefore will have just one file with all my question dataframes in, one below the other. This one csv file would be so much easier than having individual files which I have to open in turn to view the results.
Many thanks in advance.
If your end goal is an Excel spreadsheet, I'd look into some of the tools available in R for writing an xls file directly. Personally, I use the XLConnect package, but there is also xlsx, as well as several write.xls functions floating around in various packages.
I happen to like XLConnect because it allows for some handy vectorization in situations just like this:
require(XLConnect)
#Put your data frames in a single list
# I added two more copies for illustration
dfs <- list(df1,df2,df1,df2)
#Create the xls file and a sheet
# Note that XLConnect doesn't seem to do tilde expansion!
wb <- loadWorkbook("/Users/jorane/Desktop/so.xls",create = TRUE)
createSheet(wb,"Survey")
#Starting row for each data frame
# Note the +1 to get a gap between each
n <- length(dfs)
rows <- cumsum(c(1,sapply(dfs[1:(n-1)],nrow) + 1))
#Write the file
writeWorksheet(wb,dfs,"Survey",startRow = rows,startCol = 1,header = FALSE)
#If you don't call saveWorkbook, nothing will happen
saveWorkbook(wb)
I specified header = FALSE since otherwise it will write the column header for each data frame. But adding a single row at the top in the xls file at the end isn't much additional work.
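If a plain CSV file is enough, a base-R alternative (a sketch, not part of the XLConnect approach above) is to append each data frame to the same file with a blank line in between. The function name write_stacked_csv is hypothetical.

```r
# Hypothetical helper: write several data frames to one CSV file,
# one below the other, separated by a blank line.
write_stacked_csv <- function(dfs, path) {
  if (file.exists(path)) file.remove(path)
  for (i in seq_along(dfs)) {
    # Blank separator line before every table except the first
    if (i > 1) cat("\n", file = path, append = TRUE)
    # write.table() warns when appending column names; that is intended here
    suppressWarnings(
      write.table(dfs[[i]], path, sep = ",", qmethod = "double",
                  row.names = FALSE, col.names = TRUE, append = i > 1)
    )
  }
  invisible(path)
}

df1 <- data.frame(a = 1:5, b = c(4, 3, 2, 1, 1), Percent = c(40, 30, 20, 10, 10))
df2 <- data.frame(a = 1:5, c = c(1, 1, 5, 2, 1), Percent = c(10, 10, 50, 20, 10))

out_file <- tempfile(fileext = ".csv")
write_stacked_csv(list(df1, df2), out_file)
stacked_lines <- readLines(out_file)
```

Each table keeps its own header row, so the file opens in Excel as a column of separate tables. It is no longer a regular CSV that read.csv() can read back in one call.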
As James commented, you could use
merge(df1, df2, by="a")
but that would combine the data horizontally. If you want to combine them vertically you could use rbind:
rbind(df1, df2, df3,...)
(Note: the column names need to match for rbind to work).
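A short sketch of the difference, using the data frames from the question. Because df1 and df2 have different column names (b versus c), they are given common names before rbind(); the names question and count are hypothetical.

```r
df1 <- data.frame(a = 1:5, b = c(4, 3, 2, 1, 1), Percent = c(40, 30, 20, 10, 10))
df2 <- data.frame(a = 1:5, c = c(1, 1, 5, 2, 1), Percent = c(10, 10, 50, 20, 10))

# Horizontal: one row per value of a, duplicated names get .x / .y suffixes
wide <- merge(df1, df2, by = "a")

# Vertical: rename to common columns first, then stack
long <- rbind(
  data.frame(question = "b", a = df1$a, count = df1$b, Percent = df1$Percent),
  data.frame(question = "c", a = df2$a, count = df2$c, Percent = df2$Percent)
)
```

The added question column records which data frame each row came from, which would otherwise be lost after stacking.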
