R: How to store a list within a list?

I am trying to parse data from tables at baseball-reference.com. I want to do so for multiple teams and multiple years. The code below is used to capture each team season link.
library(XML)
#Will use for loop to fill in the rest of the link
link_base <- "http://www.baseball-reference.com/teams/"
#List of teams
teams <- c("CHC", "STL")
#Year
season <- 2000:2002
#End of link
end_link <- "-schedule-scores.shtml"
links <- list()
for(i in 1:length(teams)){
links[[i]] <- character(length(season))
for(j in 1:length(season)){
links[[i]][j] <- paste0(link_base, teams[i], "/", season[j], end_link)
}
}
This results in:
> links
[[1]]
[1] "http://www.baseball-reference.com/teams/CHC/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/CHC/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/CHC/2002-schedule-scores.shtml"
[[2]]
[1] "http://www.baseball-reference.com/teams/STL/2000-schedule-scores.shtml"
[2] "http://www.baseball-reference.com/teams/STL/2001-schedule-scores.shtml"
[3] "http://www.baseball-reference.com/teams/STL/2002-schedule-scores.shtml"
Now, for each element in the list, I would like to use the readHTMLTable function so that I can parse information. I have tried doing so:
a <- list()
for(i in 1:length(teams)){
a[[i]] <- NaN*seq(length(teams))
for(j in 1:length(season)){
a[[i]][j] <- readHTMLTable(links[[i]][j])
}
}
The readHTMLTable returns a list of length 6:
x <- readHTMLTable(links[[1]][1])
> length(x)
[1] 6
I would like the 1st element of list a to store the output of the readHTMLTable function for the "CHC" links, and the 2nd element to store the output for the "STL" links. Thus, list a would consist of 2 elements, each consisting of 3 lists of 6 elements.

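I think this works: a nested lapply, where the outer call iterates over the teams and the inner call over each team's links.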
lst <- lapply(links, function(l) lapply(l, function(x) readHTMLTable(x)))
length(lst)
# [1] 2
lengths(lst)
# [1] 3 3
The first sublist holds the CHC tables, the second the STL tables.
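Here is a minimal sketch of the same idea that also names the sublists, so results can be looked up by team and season. To keep it runnable offline, fake_read is a hypothetical stand-in for readHTMLTable (it just returns a list of 6 elements); swap in the real function for actual scraping.

```r
link_base <- "http://www.baseball-reference.com/teams/"
end_link <- "-schedule-scores.shtml"
teams <- c("CHC", "STL")
season <- 2000:2002

# Hypothetical stand-in for readHTMLTable(): returns a list of 6 elements
fake_read <- function(url) {
  setNames(as.list(rep(url, 6)), paste0("table", 1:6))
}

# Outer lapply over teams, inner lapply over seasons; setNames() makes
# both levels of the result addressable by name
tables <- lapply(setNames(teams, teams), function(team) {
  lapply(setNames(season, season), function(yr) {
    fake_read(paste0(link_base, team, "/", yr, end_link))
  })
})

names(tables)
lengths(tables)
length(tables[["CHC"]][["2001"]])
```

With names in place, tables[["STL"]][["2002"]] reads more clearly than tables[[2]][[3]].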

Related

Can't add error cases to list R using tryCatch()

I am trying to call an API that throttles data use for requests that are too large. I have broken up the data to respect all the terms of use. What I want to do is call the API on what I think are reasonable chunks, however if they throw an error, get the offending chunk back and then break this up further. I can generate a list of viable data, but not the list of failed requests. Here is a mock up of what I am trying:
#create the data
a <- 1
b <- 2
c <- 'three'
d <- 4
ls <- list(a, b, c, d)
#create a function that will throw one error
add <- function(x){
output <- x + 1
return(output)
}
#Call a for loop that will populate two lists - one of viable calls and one of error cases
new_ls <- list()
errors_ls <- list()
for (i in 1:length(ls)) {
tryCatch({new_ls <- append(new_ls, list(add(ls[[i]])))}
, error = function(e) {errors_ls <- append(errors_ls, list(ls[[i]]))})
}
print(new_ls)
print(errors_ls)
Gives:
> print(new_ls)
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 5
> print(errors_ls)
list()
Notably errors_ls is empty. What I was expecting is:
[[1]]
[1] "three"
I appreciate that I should be doing this with apply. The API call is however really messy (I also artificially limit the frequency of calls, so speed isn't an issue), so I find it easier to iterate over the API calls in a for loop. I have tried following the documentation on tryCatch, including playing with the structure of the tryCatch({}) syntax, based on other posts on this, but I can't get it right.
There are a couple of ways to get the output. In the OP's code, errors_ls is assigned inside the error handler function's own environment, so the object 'errors_ls' in the global environment is never updated. We can use <<- instead of <- to make the change:
new_ls <- list()
errors_ls <- list()
for (i in 1:length(ls)) {
tryCatch({new_ls <- append(new_ls, list(add(ls[[i]])))}
, error = function(e) {
errors_ls <<- append(errors_ls, list(ls[[i]]))})
}
-checking
> new_ls
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 5
> errors_ls
[[1]]
[1] "three"
Another option is to have tryCatch return a value and do the assignment outside the handler, in the loop body:
new_ls <- list()
errors_ls <- list()
for (i in 1:length(ls)) {
tmp <- tryCatch({list(add(ls[[i]]))}
, error = function(e) {return(list(ls[[i]]))})
if(is.numeric(unlist(tmp)))
new_ls <- append(new_ls, tmp)
else errors_ls <- append(errors_ls, tmp)
}
-checking
> new_ls
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 5
> errors_ls
[[1]]
[1] "three"
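A variant of the second approach, sketched under the assumption that tagging each result is acceptable: instead of inferring failure from the type of the returned value (which would misclassify a failed input that happens to be numeric), the handler returns an explicit ok flag. The names inputs, results and ok are illustrative only.

```r
inputs <- list(1, 2, "three", 4)
add <- function(x) x + 1

# Each call returns a small record: ok = TRUE with the result,
# or ok = FALSE with the offending input
results <- lapply(inputs, function(x) {
  tryCatch(
    list(ok = TRUE, value = add(x)),
    error = function(e) list(ok = FALSE, value = x)
  )
})

# Split the records by the flag
ok <- vapply(results, function(r) r$ok, logical(1))
new_ls <- lapply(results[ok], function(r) r$value)
errors_ls <- lapply(results[!ok], function(r) r$value)

print(new_ls)
print(errors_ls)
```

The failed chunks in errors_ls can then be split further and retried in the same way.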

Nested list assignment R

I have a list of the following type
categories = list(
c("Women","Clothing", "Jeans"),
c("Women","Clothing", "Sweaters"),
c("Men","Accessories", "Belts"),
c("Women", "Accessories", "Jewelry" ))
I want to parse this list and create a list of lists to export in JSON and it should have the following structure:
Women = {
  Clothing = {
    Jeans = {},
    Sweaters = {}
  },
  Accessories = {
    Jewelry = {}
  }
},
Men = {
  Accessories = {
    Belts = {}
  }
}
So it should go over each element of the list (each a character vector) and check whether that element already exists in the final list; if it doesn't, it should be appended at the proper level. For example, if Clothing is the second element after Women, it should be appended to the Women list of the final list. Or if Sweaters is the third element after Women and Clothing, it should be appended to the Clothing list inside the Women list of the final list.
If the element already exists at the given level, it should not be appended; instead, processing should move on to the next element in the character vector.
In the character vectors of the input list, the first element is always level 1, the second level 2, the third level 3, and so on.
It should be done recursively. I tried a few times, but I have no idea how to assign into a nested list; specifically, I need to do nested assignments.
I made the data into a matrix, transposed, then a dataframe:
x <- data.frame(t(vapply(categories, identity, character(3))), stringsAsFactors = F)
Then split, and lapply. You could do this recursively if you have more than 3 levels:
lapply(split(x, x$X1), function(df) {
lapply(split(df, df$X2), function(df) {
lapply(split(df, df$X3), function(x) list())
})
})
If you are looking for a recursive solution, then the following may help you. The first version outputs the full path as a string at each leaf:
## construct a data frame from list
df <- data.frame(matrix(unlist(categories),nrow = length(categories),byrow = T),stringsAsFactors = F)
## recursion function that makes nested list
f <- function(df, k=1) {
if (k == ncol(df)) return(lapply(split(df,df[,k]), toString)) ##
return(lapply(split(df,df[,k]), function(df) f(df, k+1)))
}
The nested list output looks as below
> f(df)
$Men
$Men$Accessories
$Men$Accessories$Belts
[1] "Men, Accessories, Belts"
$Women
$Women$Accessories
$Women$Accessories$Jewelry
[1] "Women, Accessories, Jewelry"
$Women$Clothing
$Women$Clothing$Jeans
[1] "Women, Clothing, Jeans"
$Women$Clothing$Sweaters
[1] "Women, Clothing, Sweaters"
To output empty lists at the leaves instead:
f <- function(df, k=1) {
if (k == ncol(df)) return(lapply(split(df,df[,k]), function(v) list()))
return(lapply(split(df,df[,k]), function(df) f(df, k+1)))
}
which gives:
> f(df)
$Men
$Men$Accessories
$Men$Accessories$Belts
list()
$Women
$Women$Accessories
$Women$Accessories$Jewelry
list()
$Women$Clothing
$Women$Clothing$Jeans
list()
$Women$Clothing$Sweaters
list()
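Since the question asks specifically about nested assignment, here is a sketch that builds the tree directly from the character vectors, without going through a data frame. add_path is a hypothetical helper name; it relies on `[[` returning NULL for a missing name in a list, and on `[[<-` storing an empty list as an element.

```r
categories <- list(
  c("Women", "Clothing", "Jeans"),
  c("Women", "Clothing", "Sweaters"),
  c("Men", "Accessories", "Belts"),
  c("Women", "Accessories", "Jewelry"))

# Hypothetical helper: insert one path into the tree, creating levels
# that do not exist yet and reusing those that do
add_path <- function(tree, path) {
  if (length(path) == 0) return(tree)
  key <- path[1]
  child <- tree[[key]]                 # NULL if this level is missing
  if (is.null(child)) child <- list()
  tree[[key]] <- add_path(child, path[-1])
  tree
}

# Fold every path into an initially empty tree
tree <- Reduce(add_path, categories, list())
str(tree)
```

Because each path is handled independently, the vectors may have different lengths, and the levels appear in the order in which they are first seen.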

trying to get a proper names(list) output

I'm trying to split a 2 level deep list of characters into a 1 level list using a suffix.
More precisely, I have a list of genes, each containing 6 lists of probes corresponding to 6 bins. The architecture looks like :
feat_indexed_probes_bin$HSPB6$bin1
[1] "cg14513218" "cg22891287" "cg20713852" "cg04719839" "cg27580050" "cg18139462" "cg02956481" "cg26608795" "cg15660498" "cg25654926" "cg04878216"
I'm trying to get a list "bins_indexed_probes" with the following architecture :
bins_indexed_probes$HSPB6_bin6 containing the same probes so I can pass it to my map-reducing function.
I tried many solutions such as melt(), a for loop, etc., but I can't figure out how to perform a doubly nested loop (over genes and over bins) and get a list output with only 1 level of depth.
For the moment, my func to do so is the following :
create_map <- function(indexes = feat_indexed_probes_bin, binlist = c("bin1", "bin2", "bin3", "bin4", "bin5", "bin6"), genes = features) {
map <- list()
ret <- lapply(binlist, function(bin) {
lapply(rownames(features), function(gene) {
map[[paste(gene, "_", bin, sep = "")]] <- feat_indexed_probes_bin[[gene]][[bin]]
tmp_names <<- paste(gene, "_", bin, sep = "")
return(map)
})
names(map) <- tmp_names
rm(tmp_names)
})
return(ret)
}
it returns:
[[6]][[374]]
GDF10_bin6
"cg13565300"
[[6]][[375]]
NULL
[[6]][[376]]
[[6]][[376]]$HNF1B_bin6
[1] "cg03433642" "cg09679923" "cg17652435" "cg03348978" "cg02435495" "cg02701059" "cg05110178" "cg11862993" "cg09463047"
[[6]][[377]]
[[6]][[377]]$GPIHBP1_bin6
[1] "cg01953797" "cg00152340"
instead, I would expect something like
$GPIHBP1_bin1
"cg...." "cg...."
...
$GPIHBP1_bin6
"someotherprobe"
$someothergene_bin1
"probe" "probe"
...
I hope I'm being clear. Since this is my first time asking a question, I apologise in advance if I didn't follow the Stack Overflow protocol.
Thank you for reading.
Consider a nested lapply with the extract operator, [[, and setNames, all wrapped in do.call with c to bind the returned elements together.
bins_indexed_probes <- do.call(c,
lapply(1:6, function(i)
setNames(lapply(feat_indexed_probes_bin, `[[`, i),
paste0(names(feat_indexed_probes_bin), "_bin", i))
)
)
# RE-ORDER ELEMENTS BY NAME
bins_indexed_probes <- bins_indexed_probes[sort(names(bins_indexed_probes))]
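As an alternative sketch, unlist() with recursive = FALSE flattens exactly one level and joins the names with a dot, which can then be turned into the "_" suffix. The toy data below is made up for illustration (two genes, two bins), and the pattern assumes the bin names look like bin1, bin2, and so on.

```r
# Made-up miniature of the structure described in the question
feat_indexed_probes_bin <- list(
  HSPB6 = list(bin1 = c("cg01", "cg02"), bin2 = "cg03"),
  GDF10 = list(bin1 = "cg04", bin2 = character(0))
)

# Flatten one level only: names become "HSPB6.bin1", "HSPB6.bin2", ...
bins_indexed_probes <- unlist(feat_indexed_probes_bin, recursive = FALSE)

# Replace only the dot in front of the trailing bin name, so gene names
# that themselves contain dots are left alone
names(bins_indexed_probes) <- sub("\\.(bin[0-9]+)$", "_\\1",
                                  names(bins_indexed_probes))

names(bins_indexed_probes)
```

This keeps the elements grouped by gene, so no re-ordering step is needed afterwards.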

how to add value to existing variable from inside a loop?

I want to add a computed value to an existing vector from within a loop, where the vector is looked up by name inside the loop. That is, I'm looking for a function similar to assign() that lets me add values to an existing variable rather than create a new one.
example:
say I have 3 variables:
sp=3
for(i in 1:sp){
name<-paste("sp",i,sep="")
assign(name,rnorm(5))
}
and now I want to access the last value in each of the variables, double it and add the result to the vector:
for(i in 1:sp){
name<-paste("sp",i,sep="")
name[6]<-name[5]*2
}
the problem here is that "name" is a string; how can R identify it as a variable name and access it?
What you are asking for is something like this:
get(name)
In your code it would look like this:
v <- 1:10
var <- "v"
tmp <- get(var) # look up the vector by its name
tmp[6] <- tmp[5]*2 # modify the copy
assign(var, tmp) # write the copy back under the same name
v
# [1] 1 2 3 4 5 10 7 8 9 10
Does that help you in any way?
However, I agree with the other answer, that lists and the lapply/sapply-functions are better suited!
This is how you can do this with a list:
sp=3
mylist <- vector(mode = "list", length = sp) #initialize a list
names(mylist) <- paste0("sp",seq_len(sp)) #set the names
for(i in 1:sp){
mylist[[i]] <- rnorm(5)
}
for(i in 1:sp){
mylist[[i]] <- c(mylist[[i]], mylist[[i]][5] * 2)
}
mylist
#$sp1
#[1] 0.6974563 0.7714190 1.1980534 0.6011610 -1.5884306 -3.1768611
#
#$sp2
#[1] -0.2276942 0.2982770 0.5504381 -0.2096708 -1.9199551 -3.8399102
#
#$sp3
#[1] 0.235280995 0.276813498 0.002567075 -0.774551774 0.766898045 1.533796089
You can then access the list elements as described in help("["), i.e., mylist$sp1, mylist[["sp1"]], etc.
Of course, this is still very inefficient code and it could be improved a lot. E.g., since all three variables are of same type and length, they really should be combined into a matrix, which could be filled with one call to rnorm and which would also allow doing the second operation with vectorized operations.
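The matrix version mentioned above could look roughly like this. It is only a sketch: the name m and the use of set.seed for reproducibility are my additions, and one column per variable is an assumed layout.

```r
set.seed(1)  # only so the example is reproducible
sp <- 3

# One call to rnorm fills all three variables; each column is one of them
m <- matrix(rnorm(5 * sp), nrow = 5,
            dimnames = list(NULL, paste0("sp", seq_len(sp))))

# Vectorised second step: double row 5 of every column, append as row 6
m <- rbind(m, m[5, ] * 2)

dim(m)
m[, "sp1"]
```

A single column is then accessed as m[, "sp1"], much like mylist$sp1 in the list version.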
@Roland is absolutely right and you absolutely should use a list for this type of problem. It's cleaner and easier to work with. Here's another way of working with what you have (it can be easily generalised):
sp <- replicate(3, rnorm(5), simplify=FALSE)
names(sp) <- paste0("sp", 1:3)
sp
#$sp1
#[1] -0.3723205 1.2199743 0.1226524 0.7287469 -0.8670466
#
#$sp2
#[1] -0.5458811 -0.3276503 -1.3031100 1.3064743 -0.7533023
#
#$sp3
#[1] 1.2683564 0.9419726 -0.5925012 -1.2034788 -0.6613149
newsp <- lapply(sp, function(x){x[6] <- x[5]*2; x})
newsp
#$sp1
#[1] -0.3723205 1.2199743 0.1226524 0.7287469 -0.8670466 -1.7340933
#
#$sp2
#[1] -0.5458811 -0.3276503 -1.3031100 1.3064743 -0.7533023 -1.5066046
#
#$sp3
#[1] 1.2683564 0.9419726 -0.5925012 -1.2034788 -0.6613149 -1.3226297
EDIT: If you are truly, sincerely dedicated to doing this despite being recommended otherwise, you can do it this way:
for(i in 1:sp){
name<-paste("sp",i,sep="")
assign(name, `[<-`(get(name), 6, `[`(get(name), 5) * 2))
}

Adding data frames as list elements (using for loop)

I have in my environment a series of data frames called EOG. There is one for each year between 2006 and 2012. Like, EOG2006, EOG2007...EOG2012. I would like to add them as elements of a list.
First, I am trying to know if this is possible. I read the official R guide and a couple of R programming manuals but I didn't find explicit examples about that.
Second, I would like to do this using a for loop. Unfortunately, the code I used to do the job is wrong and I am going crazy to fix it.
for (j in 2006:2012){
z<-j
sEOG<-paste("EOG", z, sep="")
dEOG<-get(paste("EOG", z, sep=""))
lsEOG<-list()
lsEOG[[sEOG]]<-dEOG
}
This returns a list with one single element. Where is the mistake?
You keep reinitializing the list inside the loop. You need to move lsEOG<-list() outside the for loop.
lsEOG<-list()
for (j in 2006:2012){
z <- j
sEOG <- paste("EOG", z, sep="")
dEOG <- get(paste("EOG", z, sep=""))
lsEOG[[sEOG]] <-dEOG
}
Also, you can use j directly in the paste functions:
sEOG <- paste("EOG", j, sep="")
I had the same question, but felt that the OP's initial code was a bit opaque for R beginners. So, here is perhaps a bit clearer example of how to create data frames in a loop and add them to a list which I just now figured out by playing around in the R shell:
> dfList <- list() ## create empty list
>
> for ( i in 1:5 ) {
+ x <- rnorm( 4 )
+ y <- sin( x )
+ dfList[[i]] <- data.frame( x, y ) ## create and add new data frame
+ }
>
> length( dfList ) ## 5 data frames in list
[1] 5
>
> dfList[[1]] ## print 1st data frame
x y
1 -0.3782376 -0.3692832
2 -1.3581489 -0.9774756
3 1.2175467 0.9382535
4 -0.7544750 -0.6849062
>
> dfList[[2]] ## print 2nd data frame
x y
1 -0.1211670 -0.1208707
2 -1.5318212 -0.9992406
3 0.8790863 0.7701564
4 1.4014124 0.9856888
>
> dfList[[2]][4,2] ## in 2nd data frame, print element in row 4 column 2
[1] 0.9856888
>
For R beginners like me, note that double brackets are required to access the ith data frame. Basically, dfList[[i]] extracts the element itself (here a data frame), while dfList[i] returns a list of length one that contains it.
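A tiny illustration of the difference between the two kinds of brackets on a list (the names lst, sublist and element are made up for this example):

```r
lst <- list(a = 1:3, b = "x")

sublist <- lst["a"]    # single brackets: still a list, of length 1
element <- lst[["a"]]  # double brackets: the integer vector itself

class(sublist)
class(element)
```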
If the data frames are saved as objects, you can find them with apropos() and then store them in the list with a loop:
list.EOG <- apropos("^EOG", ignore.case=FALSE) #Find the objects (case sensitive); the pattern is a regular expression, anchored so it does not also match lsEOG
lsEOG <- list() #Create the empty list to fill
for (j in seq_along(list.EOG)){
lsEOG[[j]] <- get(list.EOG[j]) #Add each data frame as one element of the list
}
To add the name of each one to the list you can use:
names(lsEOG) <- list.EOG
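If the object names are known in advance, mget() can do the lookup and the naming in one step. This is a sketch with two made-up data frames standing in for the real EOG2006 ... EOG2012; envir = environment() is passed explicitly so the lookup happens where the objects were created.

```r
# Made-up stand-ins for the real yearly data frames
EOG2006 <- data.frame(x = 1:2)
EOG2007 <- data.frame(x = 3:4)

# mget() takes a character vector of names and returns a named list
lsEOG <- mget(paste0("EOG", 2006:2007), envir = environment())

names(lsEOG)
lsEOG[["EOG2007"]]
```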

Resources