Using a variable to fetch a document in R

Let's say there are some documents I want to fetch, named in a very clear pattern, for example EIS_Chip14_1_pre. Everything is constant except the two numbers in the name, which range over 6:18 and 1:4 respectively. I'm using a for loop:
How do I include both "i" and "n" in the name of the storage vector so I don't overwrite it?
dat <- vector(mode = "list")
i <- numeric(13)
n <- numeric(4)
for(i in 6:18){
for(n in 1:4){
path <- paste0("C:/.../downloads/EIS_Chip",i,"_", n, "_pre.dat")
dat_(i)[[n]] <- read.csv(file = path)
}
}

This isn't necessarily the best solution, but it should do the trick for you.
At the top of your outer loop, initialize a variable (tmp here) with an empty list. Inside your inner loop, store the results in elements of that list using [[. When the inner loop finishes, assign the value of tmp to a dynamically named variable using assign() and paste0.
for(i in 6:18){
  tmp <- list()  # fresh list for each chip number i
  for(n in 1:4){
    path <- paste0("C:/.../downloads/EIS_Chip",i,"_", n, "_pre.dat")
    tmp[[n]] <- read.csv(file = path)  # store file n as element n of tmp
  }
  assign(paste0("dat_",i),tmp)  # creates dat_6, dat_7, ..., dat_18
}
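If you would rather avoid assign() altogether, a nested list can hold every file under a single object keyed by both numbers. This is a minimal sketch, not the answerer's code: read_stub() is a hypothetical stand-in for read.csv(file = path) so the example runs without the files, and the "Chip6"-style keys are an illustrative naming choice.

```r
# Hypothetical stand-in for read.csv(file = path); swap in the real reader.
read_stub <- function(path) data.frame(file = path, stringsAsFactors = FALSE)

dat <- list()
for (i in 6:18) {
  chip <- paste0("Chip", i)            # outer key built from i, e.g. "Chip14"
  dat[[chip]] <- vector("list", 4)     # one slot per value of n
  for (n in 1:4) {
    path <- paste0("C:/.../downloads/EIS_Chip", i, "_", n, "_pre.dat")
    dat[[chip]][[n]] <- read_stub(path)  # inner index is n
  }
}

# The file EIS_Chip14_1_pre is then dat[["Chip14"]][[1]]
```

Nothing is ever overwritten because each (i, n) pair has its own slot, and the whole collection stays in one object you can loop over later.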
A better approach, though, could be to use expand.grid to get every combination of i and n, then write a function that reads in a file using those values, and finally use something like purrr::pmap to iterate over the combinations. This returns a list with 52 elements (13 × 4), one for each file you loaded.
read_in_file = function(i,n){
path = paste0("C:/.../downloads/EIS_Chip",i,"_", n, "_pre.dat")
return(read.csv(file=path))
}
combinations = expand.grid(i = 6:18, n = 1:4)
dat = purrr::pmap(combinations,read_in_file)
This works because expand.grid returns a data.frame, which is a special kind of list. Because its columns are named i and n, matching the function's arguments, purrr::pmap passes each row's values to the corresponding arguments of the function.
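The same idea works in base R with Map(), which calls a function element-wise over several vectors. This sketch assumes you also want each result named after its file; the read_in_file() body below is a hypothetical stand-in that returns a tiny data frame instead of reading from disk.

```r
# Hypothetical stand-in for read_in_file(): returns a small data frame
# instead of calling read.csv(), so the sketch runs without the files.
read_in_file <- function(i, n) {
  path <- paste0("C:/.../downloads/EIS_Chip", i, "_", n, "_pre.dat")
  data.frame(chip = i, run = n, path = path, stringsAsFactors = FALSE)
}

combinations <- expand.grid(i = 6:18, n = 1:4)

# Map() calls read_in_file(i = combinations$i[k], n = combinations$n[k]) for every row k
dat <- Map(read_in_file, i = combinations$i, n = combinations$n)

# Name each element after its file so results can be looked up by name
names(dat) <- sprintf("EIS_Chip%d_%d_pre", combinations$i, combinations$n)

dat[["EIS_Chip14_1_pre"]]
```

Naming the list this way gives you the "both i and n in the name" behaviour from the question without creating separate variables.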

Related

Collecting a list of files that don't have any data to add to the data frame

I'm trying to load multiple Excel files into one data frame. Some of the files don't have any data in the sheet I'm looking for, so I want code that collects the files that do have data but also tells me which files were left out because they had no data. My code does tell me which ones have no data if I simply put print(i) in the 'no' branch of my ifelse statement. However, as soon as I try to do anything other than printing, it seems to just ignore me! It's infuriating. How can I collect the names of the files that haven't contributed to the combined data frame?
this works fine:
library(readxl)
files <- list.files(path="./sfiles", pattern = "*.xls", full.names = T)
alldiasendcgmlist <- lapply(files,function(i){
ifelse(nrow(i)==NULL, NULL, i$name<-i)
x= read_excel(i,sheet=2,skip=4)
ifelse(nrow(x)>1, x$ID <- i, print(i))
x
})
but as soon as I try to collect these file names in a vector instead of printing them, the vector stays empty:
library(readxl)
files <- list.files(path="./sfiles", pattern = "*.xls", full.names = T)
vectornodata <- character(0)
alldiasendcgmlist <- lapply(files,function(i){
ifelse(nrow(i)==NULL, NULL, i$name<-i)
x= read_excel(i,sheet=2,skip=4)
ifelse(nrow(x)>1, x$ID <- i, nodata <- append(vectornodata, i))
x
})
Help!
Consider building your list of data frames without any logical conditions, then run Filter afterwards to separate empty and non-empty elements. Negate (another higher-order function) is used to return the opposite (i.e., the elements without length).
# BUILD NAMED LIST OF EMPTY AND NON-EMPTY DFs
alldiasendcgmlist <- lapply(files, function(i) read_excel(i,sheet=2,skip=4))
alldiasendcgmlist <- setNames(alldiasendcgmlist, gsub("\\.xls$", "", basename(files)))
# EXTRACT ACTUAL DFs FROM FULL LIST
full_xlfiles <- Filter(length, alldiasendcgmlist)
# EXTRACT NULL ELEMENTS FROM FULL LIST
empty_xlfiles <- Filter(function(df) Negate(length)(df), alldiasendcgmlist)
empty_xlfiles <- names(empty_xlfiles)
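One caveat worth checking: a sheet with headers but no data rows may come back from read_excel() as a tibble with zero rows rather than NULL, and length() of a data frame counts its columns, so such a sheet would pass Filter(length, ...). Filtering on the number of rows catches both cases. In this sketch the hand-built list is a hypothetical stand-in for the named list produced by lapply(files, read_excel, ...).

```r
# Stand-in for the named list from lapply(files, read_excel, ...):
# one sheet with data, one with headers only, one completely empty.
alldiasendcgmlist <- list(
  file_a = data.frame(time = 1:3, glucose = c(5.1, 5.4, 6.0)),
  file_b = data.frame(time = integer(0), glucose = numeric(0)),
  file_c = data.frame()
)

# NROW() is 0 for zero-row data frames and for NULL
has_rows <- function(df) NROW(df) > 0

full_xlfiles  <- Filter(has_rows, alldiasendcgmlist)                 # sheets with data
empty_xlfiles <- names(Filter(Negate(has_rows), alldiasendcgmlist))  # names of the rest
```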
I will use my own example to show what one can do. Simply make your function return a list and then use one of its elements to filter out empty data frames:
input <- c(1,2,3,4,5,6)
func <- function(x){
  list("square" = x^2, "even" = x %% 2 == 0)
}
res <- lapply(input, func) # returns a list of lists
even_numbers <- input[sapply(res, function(x) x[["even"]])] # use the "even" element to filter
In your case, you can use a boolean vector to identify files that are empty.
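Applied to the files, the function can return both the data and an "empty" flag, and the flags form the logical vector. This also sidesteps the problem in the question: nodata <- append(vectornodata, i) inside the function only creates a local variable, so the vectornodata outside never changes. In this sketch read_sheet() is a hypothetical stand-in for read_excel(path, sheet = 2, skip = 4), and the file names are made up.

```r
# Hypothetical stand-in for read_excel(path, sheet = 2, skip = 4):
# returns an empty data frame for "empty.xls" and a small one otherwise.
read_sheet <- function(path) {
  if (grepl("empty", path)) data.frame() else data.frame(value = 1:2)
}

files <- c("a.xls", "empty.xls", "b.xls")

# Return the data together with a flag, like the square/even example
res <- lapply(files, function(f) {
  x <- read_sheet(f)
  list(data = x, empty = NROW(x) == 0)
})

is_empty     <- vapply(res, function(r) r[["empty"]], logical(1))
vectornodata <- files[is_empty]                                # files that contributed nothing
alldata      <- lapply(res[!is_empty], function(r) r[["data"]])  # data frames to combine
```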

Usage of iteration element within name pattern

I would like to iterate with a for loop through a list, applying the following function to every list element:
new_x = do.call("rbind",mget(ls(pattern = "^x.*")))
where x is the name pattern shared by the parts of a data frame.
How do I iterate through a list where the list element i is the name pattern for my function?
The goal would be to get something like this:
for (i in filenames){
i = do.call("rbind",mget(ls(pattern = "^i.*")))
}
So my question is basically how to use i within a name pattern, so that I can use the loop to rbind together separate parts of a data frame: xpart1, xpart2, xpart3 into x; ypart1, ypart2, ypart3 into y; and so on.
Thank you in advance!
If we are using a for loop, one option is:
v1 <- ls(pattern = "^x.*")
lst1 <- vector('list', length(v1))
for(i in seq_along(v1)){
lst1[[i]] <- get(v1[i])
}
do.call(rbind, lst1)
Or, if we need to use i to create the pattern, we can use paste0:
lst1 <- vector('list', length(filenames))
names(lst1) <- filenames
for(i in filenames){
lst1[[i]] <- get(ls(pattern = paste0("^", i)))  # "^" anchors the match to the start of the name
}
do.call(rbind, lst1)
NOTE: get returns the value of a single object, whereas mget returns several objects in a list. In the for loop, we assume that each pattern matches exactly one object, so get is all that is needed.
Based on the OP's clarification, we can also use mget
xs <- paste0("xpart", 1:100)
ys <- paste0("ypart", 1:100)
xsdat <- do.call(rbind, mget(xs))
ysdat <- do.call(rbind, mget(ys))
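To combine each prefix's parts separately (x from xpart*, y from ypart*) without hard-coding the part count, one option is to loop over the prefixes and build the pattern from each one. A caveat: ls() returns names sorted alphabetically, so xpart10 would come before xpart2; the sketch reorders by the trailing number. The example objects and the "part" naming convention are assumptions based on the question.

```r
# Example parts standing in for your xpart1, xpart2, ... objects
xpart1  <- data.frame(v = 1)
xpart2  <- data.frame(v = 2)
xpart10 <- data.frame(v = 10)
ypart1  <- data.frame(v = 101)
ypart2  <- data.frame(v = 102)

env <- environment()   # environment holding the parts (the global env at top level)
prefixes <- c("x", "y")

combined <- lapply(prefixes, function(p) {
  stem <- paste0("^", p, "part")
  nms  <- ls(envir = env, pattern = paste0(stem, "[0-9]+$"))
  # reorder numerically: alphabetical order puts "xpart10" before "xpart2"
  nms  <- nms[order(as.integer(sub(stem, "", nms)))]
  do.call(rbind, mget(nms, envir = env))
})
names(combined) <- prefixes
```

If you really want standalone objects called x and y afterwards, list2env(combined, envir = env) creates them from the list.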

How to read multiple data sets with specific prefix pattern into global environment?

I want to read a bunch of data sets (e.g. *.dta) with a specific prefix and an increasing number into the global environment, and combine them in a list. (In this particular case they all have the same dimensions.)
Traditionally I code:
library(foreign) # for reading *.dta files
df_1 <- read.dta("df_1.dta")
df_2 <- read.dta("df_2.dta")
...
df_n <- read.dta("df_n.dta") # note: consider 'n' being an arbitrary defined integer
df_lst <- mget(ls(pattern = "^df_[0-9]+$")) # combine dfs into list
Now I want to accomplish this in one brief step.
I attempted this loop, which doesn't work, most likely because the variable is inside quotation marks:
# initialize list
df_lst <- list()
# read and combine dfs into list
i <- 0
while(i < n) {
i = i + 1
df_[i] = read.dta("df_[i].dta")
c(df_lst, df[i])
}
Moreover, I'd prefer a function to a loop.
How can I reach my goal?
Try using rio:
rio::import_list(dir(pattern = "^df_[0-9]+\\.dta$"))
This will return a list of the data frames.
(Generally speaking, there's no need to import data files into the global environment before putting them into a list.)
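Without rio, the same list can be built with foreign::read.dta and lapply, reading straight into a named list and never touching the global environment. This is a sketch, not the answerer's method; read_dta_list() and its path and prefix arguments are hypothetical, and it sorts by the numeric suffix so df_10 lands after df_2.

```r
library(foreign)  # read.dta()

# Read every <prefix><number>.dta file in `path` into a named list, in numeric order.
read_dta_list <- function(path = ".", prefix = "df_") {
  files <- list.files(path, pattern = paste0("^", prefix, "[0-9]+\\.dta$"),
                      full.names = TRUE)
  nums  <- as.integer(sub(paste0("^", prefix, "([0-9]+)\\.dta$"), "\\1",
                          basename(files)))
  files <- files[order(nums)]
  setNames(lapply(files, read.dta), sub("\\.dta$", "", basename(files)))
}

df_lst <- read_dta_list()  # e.g. list(df_1 = ..., df_2 = ..., ...)
```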
Full disclosure: I am the maintainer of rio.
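For the loop, use paste0 to build the file name, and store each result in the list:
library(foreign)  # for read.dta()
n <- 3  # number of files; set this to match your data
# initialize list
df_lst <- list()
# read and combine dfs into list
i <- 0
while(i < n) {
  i <- i + 1
  df_lst[[i]] <- read.dta(paste0("df_", i, ".dta"))  # reads df_1.dta, df_2.dta, ...
}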
Also make sure n is defined (I assume you did, but it isn't defined in the question).
cheers
Fer
Using assign() and do.call("list",...), you can do this with a function:
# list of filenames matching pattern
fnames <- list.files(pattern = "^df_[0-9]+\\.dta$")
# function to read, assign to global env, and return data
dtafx <- function(i){
  df <- foreign::read.dta(fnames[i])
  assign(sub("\\.dta$", "", fnames[i]), df, envir = .GlobalEnv)
  return(df)
}
# apply function to filenames, combining dfs into list
df_lst <- do.call("list", sapply(seq_along(fnames), dtafx, simplify = FALSE))

Add element to a component of a list in R dynamically

How can I add an element to a component of a list in R dynamically?
For example, suppose I have a list called "mylist" with two components, one called "number" and the other called "numberPowerTwo", as follows, and I need to add one element to each component of mylist in each loop iteration.
N <- 10
mylist <- list(number = c(), numberPowerTwo = c())
for (i in 1:N){
n <- i
n2 <- i * i
mylist[i]$number <- n
mylist[i]$numberPowerTwo <- n2
}
I don't know whether this code works, because I also don't know how to write the result to a file (using write or write.table).
Try code below:
N <- 10
mylist <- list(number = c(), numberPowerTwo = c())
for (i in 1:N){
n <- i
n2 <- i * i
mylist[['number']][i] <- n
mylist$numberPowerTwo[i] <- n2
}
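Since every element depends only on i, both components can also be built in one vectorized step; and if you keep the loop, preallocating the vectors avoids growing them on each iteration. A minimal sketch of both variants (mylist2 is just an illustrative name):

```r
N <- 10
i <- seq_len(N)

# Vectorized: build both components at once
mylist <- list(number = i, numberPowerTwo = i^2)

# Loop with preallocated components
mylist2 <- list(number = numeric(N), numberPowerTwo = numeric(N))
for (k in seq_len(N)) {
  mylist2$number[k] <- k
  mylist2$numberPowerTwo[k] <- k * k
}
```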
How to write it to a file depends on the format you want. Since the result is a list, I would recommend RData:
save(mylist, file = 'mylist.RData')
then you can simply load('mylist.RData').
Or you may want an Excel file: if you have the xlsx package, you can write each element of mylist to its own sheet in a single file. Otherwise, you can just call write.csv for each element of mylist.
# one file with multiple sheets
library(xlsx)  # provides write.xlsx()
for (i in seq_along(mylist)) {
  write.xlsx(mylist[[i]], 'mylist.xlsx', append = TRUE, row.names = FALSE,
             sheetName = names(mylist)[i])
}
# separate files for each element
for (i in seq_along(mylist)) {
  write.csv(mylist[[i]], paste0(names(mylist)[i], '.csv'))
}
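Because both components of this particular list have the same length, another option is to turn the list into a data frame (one column per component) and write a single CSV. A sketch; the tempdir() location is only there so the example runs anywhere.

```r
N <- 10
mylist <- list(number = 1:N, numberPowerTwo = (1:N)^2)

# Equal-length components become columns of one table
out_file <- file.path(tempdir(), "mylist.csv")
write.csv(as.data.frame(mylist), out_file, row.names = FALSE)

back <- read.csv(out_file)  # read it back to check the round trip
```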

How to create a vector or list of tables in R

I'm trying to scrape a bunch of tables from a website. I would like to be able to store them all in one or more variables - basically for easy access.
The code below is what I have so far. I'm using the XML package, which I've found works well on a single table, but I can't get it to work for more than one table.
i <- 1
N <- 3
DSFL1<- 'http://website/results/2012_aussies_thu/results/'
DSFL2 <- '.html'
SportHTML <- vector(length=N)
vectorOfTables <- vector(length=N)
for ( i in i:N) {
DSVL <- i
SportHTML[i] <- paste(DSFL1,DSVL,DSFL2, sep="")
Sport.table <- readHTMLTable(SportHTML[i], header=T, which=3,stringsAsFactors=F)
vectorOfTables[1] <- Sport.table
i <- i + 1
}
Any help would be appreciated.
Because your tables are objects of length > 1 (possibly of differing lengths), they must go into a list. So you should do:
vectorOfTables <- vector(mode = "list", length = N)
and when you assign inside the loop, do:
vectorOfTables[[i]] <- Sport.table
However, you can avoid a for loop and create a list using lapply:
SportHTML <- paste0(DSFL1, 1:N, DSFL2)
ListOfTables <- lapply(SportHTML, readHTMLTable, header = TRUE,
which = 3, stringsAsFactors = FALSE)
As you can see, this is also a lot more concise.
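Once the tables are in a list, naming the elements makes each one easy to reach, and tables that share columns can be stacked into one data frame. In this sketch fake_read_table() is a hypothetical offline stand-in for readHTMLTable(url, header = TRUE, which = 3, stringsAsFactors = FALSE), and using the page file name as the key is just one choice.

```r
N <- 3
DSFL1 <- 'http://website/results/2012_aussies_thu/results/'
DSFL2 <- '.html'
SportHTML <- paste0(DSFL1, 1:N, DSFL2)

# Hypothetical stand-in for readHTMLTable(); returns a small table per URL
fake_read_table <- function(url) {
  data.frame(place = 1:2, url = url, stringsAsFactors = FALSE)
}

ListOfTables <- lapply(SportHTML, fake_read_table)
names(ListOfTables) <- basename(SportHTML)   # "1.html", "2.html", "3.html"

ListOfTables[["2.html"]]                     # one table by name
allTables <- do.call(rbind, ListOfTables)    # stack tables with identical columns
```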
