I have an issue that really bugs me: I've been trying to convert my work to RStudio projects (.Rproj) lately, because I would like to make my data and scripts available at some point. But with one of them, I get an error that, I think, should not occur. Here is the tiny piece of code that gives me so much trouble; the .Rproj is available at https://github.com/fredlm/mockup.
library(readxl)
list <- list.files(path = "data", pattern = "file.*.xls") #List excel files
#Aggregate all excel files
df <- lapply(list, read_excel)
for (i in 1:length(df)) {
  df[[i]] <- cbind(df[[i]], list[i]) # tag each row with its source file
}
df <- do.call("rbind", df)
It gives me the following error right after "df <- lapply(list, read_excel)":
Error in read_fun(path = path, sheet = sheet, limits = limits, shim = shim, :
  path[1]="file_1.xls": No such file or directory
Do you know why? When I do it old school, i.e. using 'setwd' before creating 'list', everything works just fine. So it looks like lapply does not know where to look for the file when used in an .Rproj, which seems very odd...
What did I miss?
Thanks :)
Thanks to a non-stackoverflower, a solution was found. It's silly, but 'list' was missing the directory, so lapply couldn't find the files. The following works just fine:
list <- paste("data/", list.files(path = "data", pattern = "file.*.xls"), sep = "") #List excel files with their directory
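For what it's worth, list.files can build those paths itself via full.names = TRUE; a minimal sketch of the same fix:

library(readxl)

# full.names = TRUE returns "data/file_1.xls" instead of "file_1.xls",
# so read_excel finds the files regardless of the working directory
files <- list.files(path = "data", pattern = "file.*\\.xls", full.names = TRUE)
df <- lapply(files, read_excel)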
I'm working on a bibliometric analysis in R that requires me to work with data obtained from the Web of Science database. I import my data into R using the following code:
file1 <- "data1.txt"
data1 <- convert2df(file = file1, dbsource = "isi", format = "plaintext")
I have about 35 text files that I need to repeat this code for. I would like to do so using loops, something I don't have much experience with. I tried something like this but it did not work:
list_of_items <- c("file1", "file2")
dataset <- vector(mode = "numeric", length = length(list_of_items))
for (i in list_of_items) {
  dataset[i] <- convert2df(file = list_of_items[i], dbsource = "isi", format = "plaintext")
  print(dataset)
}
I get the following error:
Error in file(con, "r") : invalid 'description' argument
I'm not very familiar with using loops but I need to finish this work. Any help would be appreciated!
R wants to open file1, but you only have file1.txt. The filenames in the list are incorrect.
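A minimal sketch of the fix, assuming the files are literally named file1.txt through file35.txt (the question also shows data1.txt, so adjust the names or use list.files to match the real ones), and collecting into a list because convert2df returns a data frame rather than a number:

library(bibliometrix)

list_of_items <- paste0("file", 1:35, ".txt") # or: list.files(pattern = "\\.txt$")
dataset <- vector(mode = "list", length = length(list_of_items))
for (i in seq_along(list_of_items)) {
  dataset[[i]] <- convert2df(file = list_of_items[i], dbsource = "isi", format = "plaintext")
}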
I once had that problem as well; maybe this solution works for you too.
Put all the text files in one folder and read the whole folder, which might be easier.
FOLDER_PATH <- "C:\\your\\path\\here" # paste it from the Explorer bar (Windows); be sure to replace \ with \\
file_paths <- list.files(path = FOLDER_PATH,
                         full.names = TRUE) # full paths, so you do not depend on the working dir
# please put only txt files in the folder, or read the docs on list.files
# using lapply you do not need a for loop, but this is optional
dataset <- lapply(file_paths, convert2df, dbsource = "isi", format = "plaintext")
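If you then want one combined data frame rather than a list (assuming all the files share the same columns), one more line does it:

dataset_all <- do.call(rbind, dataset)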
I'm getting this error when trying to import CSVs using this code:
some.df = csv_to_disk.frame(list.files("some/path"))
Error in split_every_nlines(name_in = normalizePath(file, mustWork = TRUE), :
  Expecting a single string value: [type=character; extent=3].
I found a temporary workaround: a for loop that iterated through each of the files, after which I rbind-ed all the disk.frames together.
I pulled the code from the ingesting data doc.
This seems to be an error triggered by the bigreadr package. I wonder if you have a way to reproduce the chunks.
Or maybe try a different chunk reader,
csv_to_disk.frame(..., chunk_reader = "data.table")
Also, if all else fails (since CSV reading is hard), reading them in a loop and then appending would work as well; see the sketch at the end of this answer.
Perhaps you need to specify to only read CSVs? Like
list.files("some/path", pattern = "\\.csv$", full.names = TRUE)
Otherwise, it normally works,
library(disk.frame)
setup_disk.frame()

# write ten copies of nycflights13::flights to a temp dir as test CSVs
tmp = tempdir()
sapply(1:10, function(x) {
  data.table::fwrite(nycflights13::flights, file.path(tmp, sprintf("tmp%s.csv", x)))
})

some.df = csv_to_disk.frame(list.files(tmp, pattern = "\\.csv$", full.names = TRUE))
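And for the loop-then-append fallback mentioned above, a rough sketch (assuming disk.frame's add_chunk for the append step; adapt the path and pattern to your files):

library(disk.frame)
setup_disk.frame()

files <- list.files("some/path", pattern = "\\.csv$", full.names = TRUE)

# start a disk.frame from the first file, then append the remaining CSVs chunk by chunk
some.df <- csv_to_disk.frame(files[1])
for (f in files[-1]) {
  some.df <- add_chunk(some.df, data.table::fread(f))
}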
I have an output directory from dbcans with each sample's output in a subdirectory. I need to loop over each subdirectory and read into R a file called overview.csv.
for (subdir in list.dirs(recursive = FALSE)){
  data = read.csv(file.path(~\\subdir, "overview.csv"))
}
I am unsure how to deal with the changing file path in read.csv for each subdir. Any help would be appreciated.
Up front, the ~\\subdir (not as a string) is obviously problematic. Since subdir is already a string, using file.path is correct, but with just the variable: file.path(subdir, "overview.csv"). If you are concerned about relative versus absolute paths, you can always normalize them with normalizePath(list.dirs()), though this does not really change things here.
A few things to consider.
1. Constantly reassigning to the same variable doesn't help, so you need to assign to an element of a list or something similar (e.g., lapply, below). (I also think data as a variable name is problematic: while it works just fine "now", if you ever run part of your script without assigning to data, you will be referencing the base function instead, resulting in possibly confusing errors such as Error in data$a : object of type 'closure' is not subsettable. Since a closure is really just a function with its enclosing namespace/environment, this is just saying "you tried to subset a function".)
2. I think both pattern= and full.names= might be useful if you switch from list.dirs to list.files, such as
datalist <- list()
# I hope recursion doesn't go too deep here
filelist <- list.files(pattern = "overview.csv", full.names = TRUE, recursive = TRUE)
for (ind in seq_along(filelist)) {
datalist[[ind]] <- read.csv(filelist[ind])
}
# perhaps combine into one frame
data1 <- do.call(rbind, datalist)
3. Reading in lots of files and doing the same thing to all of them suggests lapply. This is a little more compact version of number 2:
filelist <- list.files(pattern = "overview.csv", recursive = TRUE, full.names = TRUE)
datalist <- lapply(filelist, read.csv)
data1 <- do.call(rbind, datalist)
Note: if you really only need precisely one level of subdirs, you can work around that with:
filelist <- list.files(list.dirs(somepath, recursive = FALSE),
pattern = "overview.csv", full.names = TRUE)
or you can limit to no more than some depth, perhaps with list.dirs.depth.n from https://stackoverflow.com/a/48300309.
I think it should be this; paste0 needs a path separator between the directory and the file name (or just use file.path):
for (subdir in list.dirs(recursive = FALSE)) {
  data = read.csv(file.path(subdir, "overview.csv"))
}
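Note that this still overwrites data on every pass through the loop; to keep all the frames, collect them into a list as in the answer above.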
I am trying to loop through all the subfolders of my wd, list their names, open 'data.csv' in each of them and extract the second and last value from that csv file.
The df would look like this :
Name_folder_1 2nd value Last value
Name_folder_2 2nd value Last value
Name_folder_3 2nd value Last value
For now, I managed to list the subfolders and each of the files (thanks to this thread: read multiple text files from multiple folders) but I struggle to implement (what I'm guessing should be) a nested loop to read and extract data from the csv files.
parent.folder <- "C:/Users/Desktop/test"
setwd(parent.folder)
sub.folders1 <- list.dirs(parent.folder, recursive = FALSE)
r.scripts <- file.path(sub.folders1)
files.v <- list()
for (j in seq_along(r.scripts)) {
  files.v[j] <- dir(r.scripts[j], "data$")
}
Any hints would be greatly appreciated!
EDIT:
I'm trying the solution detailed below, but there must be something I'm missing, as it runs smoothly but does not produce anything. It might be something very silly; I'm new to R and the learning curve is making me dizzy :p
lapply(files, function(f) {
dat <- fread(f) # faster
dat2 <- c(basename(dirname(f)), head(dat$time, 1), tail(dat$time, 1))
write.csv(dat2, file = "test.csv")
})
Not easy to reproduce but here is my suggestion:
library(data.table)
files <- list.files("PARENTDIR", full.names = T, recursive = T, pattern = ".*.csv")
lapply(files, function(f) {
dat <- fread(f) # faster
# Do whatever, get the subfolder name for example
basename(dirname(f))
})
You can simply look recursively for all CSV files in your parent directory and still recover their corresponding parent folder.
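To get all the way to the three-column table in the question (folder name, 2nd value, last value), here is a sketch of one way to finish; the time column and the parent path are taken from the question and its edit, and writing a single summary file after the loop avoids overwriting test.csv on every iteration:

library(data.table)

files <- list.files("C:/Users/Desktop/test", full.names = TRUE,
                    recursive = TRUE, pattern = "^data\\.csv$")
rows <- lapply(files, function(f) {
  dat <- fread(f)
  data.table(folder = basename(dirname(f)), # subfolder name
             second = dat$time[2],          # 2nd value
             last   = dat$time[nrow(dat)])  # last value
})
result <- rbindlist(rows)
write.csv(result, "summary.csv", row.names = FALSE) # written once, after the loop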
I have a short loop I used to create several .csv files. The loop works and the files are created correctly, and I can open them in Microsoft Excel. Everything looks great. But when I try to read these files back into R in another script, R doesn't recognize them.
Do you need to turn off some sort of device inside the loop, as you would if you were creating several .png files?
Here is the loop (it works). For reference, dtlm is a large data frame with several columns including "diag", "county", "date", and "Freq".
single = c("492", "493", "427", "428", "786")
for (q in 1:length(single)) {
  xx = xtabs(~ date + county, data = dtlm, subset = dtlm$diag == single[q])
  xy = as.data.frame(xx)
  write.csv(xy, paste(single[q], ".csv", sep = ""))
}
Now here is an example of a command with which R can't find the file:
dt <- read.csv("C:/Users/myname/Desktop/FreqTables/492.csv")
So weird! I have also tried read.table and that didn't work either, and I didn't find anything helpful in ?read.csv. Any suggestions would be greatly appreciated!
This is how I normally do it:
## store the path in some object, here 'dir'
dir <- "[path to the folder where you have your data]"
## then pick up the file names from 'dir'; change the filter as needed
fnames <- list.files(path = dir, pattern = "\\.csv$")
## read the data into a named list
dfn = list()
for (string in fnames) {
  dfn[[string]] = read.csv(file.path(dir, string))
}
You can probably do it in fewer lines, but this works for me.
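For instance, a sketch of the same idea in fewer lines (same assumptions about dir and the file names):

fnames <- list.files(path = dir, pattern = "\\.csv$")
dfn <- setNames(lapply(file.path(dir, fnames), read.csv), fnames)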
If you would like to merge the files together, you can use something like this; note that I carry the file names over from the csv files:
## remove the .csv suffix
names(dfn) <- sub("\\.csv$", "", names(dfn))
## merge the data frames together (traditional); the first frame seeds DF,
## and suffixes must be a length-2 vector in merge()
DF <- dfn[[1]]
for (nm in names(dfn)[-1]) {
  DF <- merge(DF, dfn[[nm]], by = "ID", all.x = TRUE,
              suffixes = c("", paste0(":", nm)))
}
Let me know if this works for you.
Best,
Eric