I have a folder containing Excel files. Some of them have multiple worksheets, and I want to know how to use a special case (maybe a for loop) to identify which files need their worksheets read, select only certain tabs from those, and join all of the Excel files together at the end. These tabs would also need to skip 40 lines, which I assume means skip = 40. When I type the code that I have, it looks like a giant mess.
library(readxl)
library(dplyr)

files <- list.files(path = "/Users/Desktop/folder2", pattern = "\\.xlsx$", full.names = TRUE)
files_join <- lapply(files, read_excel) %>%
  bind_rows()
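One way to handle that special case: a minimal sketch, assuming the multi-sheet files are recognized by containing one of the tabs you want ("Summary" and "Detail" here are placeholder names, not from the question):
library(readxl)
library(dplyr)

# placeholder tab names -- replace with the tabs you actually need
wanted <- c("Summary", "Detail")

read_one <- function(f) {
  keep <- intersect(excel_sheets(f), wanted)
  if (length(keep) == 0) {
    read_excel(f)  # ordinary file: read its single sheet as-is
  } else {
    # multi-sheet file: read only the wanted tabs, skipping the first 40 lines
    bind_rows(lapply(keep, function(s) read_excel(f, sheet = s, skip = 40)))
  }
}

files <- list.files(path = "/Users/Desktop/folder2", pattern = "\\.xlsx$", full.names = TRUE)
files_join <- bind_rows(lapply(files, read_one))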
In my folder I have 45 different Excel files, each having only one sheet and a different data structure. I want to import them into R and then export them back into one single Excel file. Up to this point I can manage with the help of Google. However, I want the sheet names of the individual sheets in the exported file to be what they were in the original files. Below is what my code looks like so far, and I can't seem to figure out how to change it (this code works fine, just not the way I want).
setwd("C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\")
data.files <- list.files(pattern = "*.xlsx")
# read the files
files <- lapply(data.files, function(x) read.xlsx2(x, sheetIndex = 1))
for (i in 1:length(data.files)){
# loading the workbook or excel file
wbook <- loadWorkbook(data.files[i])
#extracting sheets from the individual workbooks
sheet <- getSheets(wbook)
for (j in 1:length(sheet)){
assign(paste("global.", i,j, sep = ""),
read.xlsx2(data.files[i], sheetIndex=j,
as.data.frame=TRUE, header=TRUE))
}
}
for (i in 1:length(data.files)) {
if(i==1)
write.xlsx(files[[i]], file="global-data.xlsx",
sheetName = paste("global",i))
else
write.xlsx(files[[i]], file="global-data.xlsx",
sheetName = paste("global",i), append=TRUE)
}
Currently it sets the sheet names in the exported final file to global 1, global 2, global 3, etc., while what I want is for the sheet names to be XYZ, ABC, DEF, etc., exactly as they were when I imported the files into RStudio.
EDIT
To make it clearer: I have a folder with 45 files inside, all of them with different structures. My goal is not to create a single data frame.
For example, my first Excel file is named "SLIDE1", and its sheet name is also "SLIDE1". In the same way, the 45th file has a sheet named "SLIDE45".
After importing all 45 of them, I want to export them back as one single Excel file with 45 different worksheets, where the worksheets are named "SLIDE1", "SLIDE2", ..., "SLIDE45", and so on.
This should work:
library(tidyverse)

files <- list.files("C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\",
                    pattern = "\\.xlsx$",
                    full.names = TRUE)

files_list <- map(files, rio::import) |>
  setNames(tools::file_path_sans_ext(basename(files)))

rio::export(files_list, "C:\\Users\\I0510906\\OneDrive - Sanofi\\Desktop\\Insulin\\IQVIA Files\\RAuto\\PERFB\\INSULIN\\insulin.xlsx")
map iterates over the vector files, applies the same function (in this case import) to each file, and returns a list with one element per iteration (essentially the same as lapply).
rio::import can handle most file types automatically and is easier to use than loadWorkbook.
setNames sets the names of the list elements.
rio::export likewise can handle most file types and uses sensible default values for export. When given a named list of data.frames, it writes each element into its own sheet, using the list names as the sheet names.
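As a quick check of the naming step, for a hypothetical file path:
tools::file_path_sans_ext(basename("C:/some/dir/SLIDE1.xlsx"))
#> [1] "SLIDE1"
so each list element, and therefore each exported sheet, carries the original file name.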
Folder 1 and Folder 2 are full of .rds files. How would I go about merging all files in both folders into 1 .rds file?
What I have so far
mergedat <- do.call('rbind', lapply(list.files("File/Path/To/Folder/1/", full.names = TRUE), readRDS))
However, I don't know how to add the second file path, and even then the code above does not seem to be working.
The .rds files are all set up exactly the same as far as the number of columns and column headers go, but the information in them is obviously different. I also just figured out that my code was not actually reading the files.
Any suggestions?
You can do something like this twice, each time for a different path:
path <- "./files"
files <- list.files(path = path,
full.names = TRUE,
all.files = FALSE)
files <- files[!file.info(files)$isdir]
data <- lapply(files,
function(x) {
readRDS(x)
})
You end up with two data objects, each a list whose elements are data frames corresponding to what is in the RDS files. If all those files are the same in terms of structure, you can use dplyr::bind_rows() to concatenate all the data frames into one combined data frame.
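To cover both folders in one pass and write the single .rds file the question asks for, a minimal sketch (the folder paths are the placeholders from the question):
# list.files() accepts a vector of paths, so one call covers both folders
paths <- c("File/Path/To/Folder/1/", "File/Path/To/Folder/2/")
files <- list.files(path = paths, pattern = "\\.rds$", full.names = TRUE)

mergedat <- dplyr::bind_rows(lapply(files, readRDS))
saveRDS(mergedat, "merged.rds")  # write the combined data back out as one .rds file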
I am new to R. I have multiple Excel files in different folders that I would like to combine into a single file. All the files contain the same headings in row 2, except some have an extra heading (Hierarchy). I would like to extract and combine only the spreadsheets that do not have the Hierarchy heading.
Is there an easy way of doing this in R?
Thank you so much for your help!
This is to get you started, but after the import you need to filter out the wrong sheets; I can't do that part, because the condition is not clear enough (a guess at the filter is sketched after the code below).
library(readxl)
library(data.table)

# all Excel files found in the working directory
paths <- list.files(path = ".", pattern = "\\.xlsx$", full.names = TRUE)

# import the first sheet of every Excel file
import.xlsx <- lapply(paths, function(p) data.table(read_xlsx(path = p)))

# import all sheets of every Excel workbook (one list of data frames per file)
sheet_list <- lapply(paths, function(p)
  lapply(excel_sheets(p), read_excel, path = p))
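If the extra heading really is a column literally named "Hierarchy" in row 2, a possible filter could look like this, continuing from the paths vector above (only a sketch; the skip = 1 and the column name are assumptions about the layout, so adjust them to the real files):
# keep only workbooks whose row-2 headings do not include "Hierarchy"
keep <- Filter(function(p) {
  hdr <- names(read_xlsx(p, skip = 1, n_max = 0))  # read the headings only
  !"Hierarchy" %in% hdr
}, paths)

# combine the remaining files, taking the headings from row 2
combined <- rbindlist(lapply(keep, function(p) read_xlsx(p, skip = 1)))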
I tried to write multiple csv files with the same number of columns and rows into multiple data frames, which can be accessed in a way like this:
file[1] #Outputs the whole content of the first csv file
file[2] #Outputs the whole content of the second csv file
and so on...
I have already saved everything into one data frame, but the necessary values can't be accessed in such a way:
files = list.files(pattern="*.csv")
myfiles = do.call(rbind, lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))
myfiles is one big data frame, but I want to access the files in the way I explained above.
I am using RStudio 0.9, and my working directory is where all the files are located. The csv files are named like this:
"001.csv"
"002.csv" "003.csv"...
Thank you in advance
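For the access pattern described above, a minimal sketch is to keep the data frames in a list instead of rbind-ing them together:
files = list.files(pattern = "\\.csv$")
myfiles = lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE))

myfiles[[1]]  # whole content of the first csv file ("001.csv")
myfiles[[2]]  # whole content of the second csv file ("002.csv")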
Using this script, I have created a specific folder for each csv file and then saved all of my further analysis results in that folder. The folder and the csv file share the same name. The csv files are stored in the main/master directory.
Now, I have created a csv file in each of these folders which contains a list of all the fitted values.
I would now like to do the following:
Set the working directory to the particular filename
Read fitted values file
Add a row/column stating the name of the site/ unique ID
Add it to the master file, which is stored in the main directory, with a title specifying the site name/filename. It can be stacked by rows or by columns; it doesn't really matter.
Come to the main directory to pick the next file
Repeat the loop
Using merge(), rbind(), or cbind() combines all the data under one column name. I want to keep all the sites separate for comparison at a later stage.
This is what I'm using at the moment and I'm lost on how to proceed further.
setwd( "path") # main directory
path <-"path" # need this for convenience while switching back to main directory
# import all files and create a character type array
files <- list.files(path=path, pattern="*.csv")
for(i in seq(1, length(files), by = 1)){
fileName <- read.csv(files[i]) # repeat to set the required working directory
base <- strsplit(files[i], ".csv")[[1]] # getting the filename
setwd(file.path(path, base)) # setting the working directory to the same filename
master <- read.csv(paste(base,"_fiited_values curve.csv"))
# read the fitted value csv file for the site and store it in a list
}
I want to construct a for loop to build one master file from the files in the different directories. I do not want to merge everything under one set of column names.
For example, if I have 50 similar csv files and each has two columns of data, I would like one csv file that accommodates all of them in their original format, rather than appending to the existing rows/columns. I would then have 100 columns of data.
Please tell me what further information I can provide.
For reading a group of files from a number of different directories, with pathnames patha, pathb, and pathc:
paths = c('patha', 'pathb', 'pathc')
files = unlist(sapply(paths, function(path) list.files(path, pattern = "\\.csv$", full.names = TRUE)))
listContainingAllFiles = lapply(files, read.csv)
If you want to be really quick about it, you can grab fread from data.table:
library(data.table)
listContainingAllFiles = lapply(files, fread)
Either way this will give you a list of all objects, kept separate. If you want to join them together vertically/horizontally, then:
do.call(rbind, listContainingAllFiles)
do.call(cbind, listContainingAllFiles)
EDIT: NOTE, the latter makes no sense unless your rows actually correspond across files. It makes far more sense to just create a field tracking which location the data came from.
If you want to include the names of the files as the method of determining sample location (I don't see where you're getting this info from in your example), then you want to do this as you read in the files, so:
listContainingAllFiles = lapply(files,
                                function(file) data.frame(filename = file,
                                                          read.csv(file)))
Then later you can split that column to get your details (assuming, of course, you have a standard naming convention).
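For instance, a minimal sketch of that split, assuming hypothetical filenames like "siteA_2020.csv" where the part before the underscore identifies the site:
combined = do.call(rbind, listContainingAllFiles)
# strip the directory and extension, then keep the part before the underscore as the site
base = tools::file_path_sans_ext(basename(as.character(combined$filename)))
combined$site = sapply(strsplit(base, "_"), `[`, 1)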