Change specific component of list in R

I am trying to change a specific component, L[[2]], of a list L in R. Unfortunately, the other component, L[[1]], changes as well. Below is a minimal working example:
# initialize list L:
L <- matrix(list( matrix(0,1,2) ), 2, 1)
# show that L[[1]] = c(0,0):
print(L[[1]][1,])
#>[1] 0 0
# only change L[[2]] into c(1,1):
L[[2]][1,] <- 1
# however L[[1]] has changed too to c(1,1):
print(L[[1]][1,])
#>[1] 1 1
(Maybe this is a basic question as I am not an expert in R.)
In response to Akrun's comment:
The change in L[[1]] occurs when I run the complete code in one go in the RStudio editor. Somehow, the change in L[[1]] does not occur when I run the four commands at the command line one at a time. This seems very strange to me.

There are multiple ways to tackle this. The structure is too convoluted to modify the way we would a regular list: it is a list with a dim attribute set by matrix(), and its elements are themselves matrices.
1) The list is created within a matrix, and it is a list of matrices. So we can subset the element of the matrix first, then extract the list component and assign 1 to it:
L[2][[1]][] <- 1
print(L[[1]][1,])
#[1] 0 0
2) Another option is to create a temporary list, assign the values in that list, and then update the matrix/list with the changed list:
l1 <- lapply(L, I) # I() keeps each element "as is", giving a plain list
l1[[2]][] <- 1
L[] <- l1
print(L[[1]][1,])
#[1] 0 0
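One way to sidestep the surprise altogether, offered only as a sketch and not as part of the answer above: build a plain list first and add the dimensions afterwards. It assumes that replicate(..., simplify = FALSE) is an acceptable way to create the two elements.

```r
# Plain list holding two separate 1x2 zero matrices
L <- replicate(2, matrix(0, 1, 2), simplify = FALSE)

# Change only the second element
L[[2]][1, ] <- 1

# Optional: restore the 2x1 list-matrix shape used in the question
dim(L) <- c(2, 1)

print(L[[1]][1, ])
#> [1] 0 0
print(L[[2]][1, ])
#> [1] 1 1
```

Because the elements are modified while L is still an ordinary list, the usual L[[i]] assignment applies before any dim attribute is involved.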

Related

How to get access to "str_match_all" results in R?

Just used "str_match_all" as follows:
a <- str_match_all(dd, '\\d+(\\w+)')
and obtained the following:
#[[1]]
# [,1] [,2]
#[1,] "12hours" "hours"
#[2,] "23days" "days"
How can I access each string?
I have tried a[1][,1] to access the first column for example but I get an error saying the number of dimensions is not correct.
If I understand your problem correctly, you are having trouble accessing each individual element.
Remember that your output is a list and the element in that list is a matrix. Therefore, to access an individual element, you first select the list element you are interested in, then the row, and then the column.
a[[1]][1,2]
So in your case, this accesses the first element of your list (it looks like you only have one), then the 1st row and the 2nd column, so it gives you "hours".
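The difference between single and double brackets can be seen in a small sketch. The list below is a hand-built stand-in for the str_match_all() result shown in the question, so no package is needed to run it.

```r
# Hand-built stand-in for the str_match_all() result from the question
a <- list(matrix(c("12hours", "23days", "hours", "days"), nrow = 2))

class(a[1])      # "list": single brackets keep the list wrapper
first <- a[[1]]  # double brackets extract the matrix itself

first[1, 2]      # row 1, column 2
#> [1] "hours"
first[, 1]       # the whole first column
#> [1] "12hours" "23days"
```

This is why a[1][,1] fails: a[1] is still a list, which has no rows or columns to index.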
If, however, you are more used to working with data frames, which I assume is your end goal, I would approach this programmatically as follows.
Taking an example from the str_match_all() documentation:
# Creating a reproducible example
library(stringr)
strings <- c("Home: 219 733 8965. Work: 229-293-8753 ",
             "banana pear apple", "595 794 7569 / 387 287 6718")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
a <- str_match_all(strings, phone)
Your goal is to convert the matrix into a data frame, which you do as follows:
as.data.frame(a[[1]])
For future reference, let's say your output has more than one element, as is the case in this example. You could then approach the solution like so:
library(dplyr)
library(purrr)
# Make a function that accepts one element of your list.
# It repeats the step above, then passes the result through dplyr::bind_rows()
output_to_df <- function(x){
  a <- as.data.frame(x)
  bind_rows(a)
}
# Then use map_df() to apply the function to every element of the list,
# no matter how many elements it contains, and row-bind the results
str_output <- map_df(a, output_to_df)
You can now reuse your output_to_df() function as many times as you need.
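If you would rather not depend on dplyr and purrr, the same idea can be sketched in base R. This swaps in do.call(rbind, ...) for map_df(); the matches list below is a hypothetical stand-in for a str_match_all() result with two elements.

```r
# Hypothetical stand-in for a str_match_all() result with two list elements
matches <- list(
  matrix(c("12hours", "23days", "hours", "days"), nrow = 2),
  matrix(c("7weeks", "weeks"), nrow = 1)
)

# Convert each matrix to a data frame, then stack the data frames row-wise
combined <- do.call(rbind, lapply(matches, as.data.frame))

dim(combined)
#> [1] 3 2
```

The columns get the default names V1 and V2, since the matrices carry no column names.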

Problem deleting elements with 2 values in R list

I am trying to format a list so that there is one word per value (I imported it from a very poor quality CSV and can't do much to improve the CSV). I am currently trying to make every element hold only one value; however, the code I am using does not do this, although I am not getting any error messages.
Here is the code I am currently using:
Terms <- list() # placeholder for the real data: 9020 elements with lengths 1, 2, and 3
for (x in 1:length(Terms)){
  if (Terms[[x]] %>% is.list()){
    term <- Terms[[x]]
    length(term) <- 1
    Terms[[x]] <- term
  }
} # should return a list of the same size, but only with elements of length 1
Any help figuring out what I could use to make it so that I can delete any second variables would be appreciated.
An option would be to create a logical condition with lengths and then use that for subsetting the list
lst2 <- lst1[lengths(lst1) == 1]
If the intention is to get only the first element
lst2 <- lapply(lst1, `[`, 1)
NOTE: This assumes the list elements are vectors.
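A small sketch of both options on made-up data (lst1 below is a hypothetical three-element list, not the asker's Terms):

```r
# Hypothetical list with elements of length 1, 2, and 3
lst1 <- list("apple", c("pear", "plum"), c("fig", "kiwi", "lime"))

# Option 1: keep only the elements that already have length 1
only_singles <- lst1[lengths(lst1) == 1]

# Option 2: keep every element, but truncate each to its first value
first_values <- lapply(lst1, `[`, 1)

length(only_singles)
#> [1] 1
unlist(first_values)
#> [1] "apple" "pear"  "fig"
```

The first option shrinks the list, while the second returns a list of the same size, which is what the comment in the question asks for.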

for loop to print dimensions of files in a list

temp <- list.files(pattern = '*.xlsx')
list <- lapply(temp,read_excel)
The above list has 12 files. I want to check the dimensions of each of those 12 files, i.e. number of rows and columns in each file.
for(i in length(list)){
print(dim(list[[i]]))
}
[1] 12533 49
The above for loop only gives me the output for the last file, whereas I need it to give me the dimensions of each of the 12 files.
Could someone please let me know what needs to be done.
You need to change your for loop to
for(i in 1:length(list)){
print(dim(list[[i]]))
}
since 1:length(list) creates a sequence of numbers to loop over, whereas length(list) gives only the length of the list (12 in your case) and does not generate the sequence.
Moreover, you don't even need a loop to do this. You could just use lapply, which gives you the dimensions of each element of the list.
lapply(list, dim)
For example,
list_df <- list(mtcars, iris)
lapply(list_df, dim)
#[[1]]
#[1] 32 11
#[[2]]
#[1] 150 5
On a side note, it is not good practice to name your list list, since list is a base R function.
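The difference between the two loop headers can be checked directly. The sketch below records which indices each loop visits; it uses seq_along() rather than 1:length(), which is a substitution on my part that also behaves sensibly for an empty list.

```r
dfs <- list(mtcars, iris)

# Looping over length(dfs) visits a single value: the length itself
visited_wrong <- c()
for (i in length(dfs)) visited_wrong <- c(visited_wrong, i)

# Looping over seq_along(dfs) visits every index
visited_right <- c()
for (i in seq_along(dfs)) visited_right <- c(visited_right, i)

visited_wrong
#> [1] 2
visited_right
#> [1] 1 2

# One row of dimensions per data frame, without an explicit loop
dims <- t(sapply(dfs, dim))
```

Here dims is a matrix with one row per data frame, which can be easier to scan than the list returned by lapply when there are many files.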

Turn Multiple Uneven Nested Lists Into A DataFrame in R

I am trying to get to grips with R and as an experiment I thought that I would try to play around with some cricket data. In its rawest format it is a yaml file, which I used the yaml R package to turn into an R object.
However, I now have a number of nested lists of uneven length that I want to try and turn into a data frame in R. I have tried a few methods such as writing some loops to parse the data and some of the functions in the tidyr package. However, I can't seem to get it to work nicely.
I wondered if people knew of the best way to tackle this? Replicating the data structure would be difficult here, because the complexity comes from the multiple nested lists and the unevenness of their length (which would make for a very long code block). However, you can find the raw yaml data here: http://cricsheet.org/downloads/ (I was using the ODI internationals).
Thanks in advance!
Update
I have tried this:
1) Using tidyr - separate
d <- unnest(balls)
Name <- c("Batsman","Bowler","NonStriker","RunsBatsman","RunsExtras","RunsTotal","WicketFielder","WicketKind","PlayerOut")
a <- separate(d, x, Name, sep = ",",extra = "drop")
This uses the tidyr package to return a single-column data frame, which I then try to separate. However, the problem is that extra variables sometimes appear in the middle of some rows and not others, thereby throwing off the separation.
2) Creating vectors
ballsVector <- unlist(balls[[2]],use.names = FALSE)
names_vector <- c("Batsman","Bowler","NonStriker","RunsBatsman","RunsExtras","RunsTotal")
names(ballsVector) <- c(names_vector)
ballsMatrix <- matrix(ballsVector, nrow = 1, byrow = TRUE)
colnames(ballsMatrix) <- names_vector
The problem here is that the resulting vectors are uneven in length and therefore can't be combined into a data frame. It also suffers from the issue that there are sporadic variables in the middle of the dataset (as above).
Caveat: not a complete answer; an attempt to arrange the innings data.
plyr::rbind.fill may offer a solution to binding rows with a different number of columns.
I don't use tidyr, but below is some rough code to get the innings data into a data.frame. You could then loop this through all the yaml files in the directory.
# Download and unzip data
download.file("http://cricsheet.org/downloads/odis.zip", temp<- tempfile())
tmp <- unzip(temp)
# Create lists - use first game
library(yaml)
raw_dat <- yaml.load_file(tmp[[2]])
#names(raw_dat)
# Function to process list into dataframe
p_fun <- function(X) {
team = X[[1]][["team"]]
# function to process each list subelement that represents each throw
fn <- function(...) {
tmp = unlist(...)
tmp = data.frame(ball=gsub("[^0-9]", "", names(tmp))[1], t(tmp))
colnames(tmp) = gsub("[0-9]", "", colnames(tmp))
tmp
}
# loop over all throws
lst = lapply(X[[1]][["deliveries"]], fn )
cbind(team, plyr::rbind.fill(lst))
}
# Loop over each innings
dat <- plyr::rbind.fill(lapply(raw_dat$innings, p_fun))
Some explanation
The list structure and subsetting it. To get an idea of the structure of the list use
str(raw_dat) # but this gives a really long list of data
You can truncate this, to make it a bit more useful
str(raw_dat, 3)
length(raw_dat)
So there are three main list elements - meta, info, and innings. You can also see this with
names(raw_dat)
To access the meta data, you can use
raw_dat$meta
#or using `[[1]]` to access the first element of the list (see ?'[[')
raw_dat[[1]]
#and get sub-elements by either
raw_dat$meta$data_version
raw_dat[[1]][[1]] # you can also use the names of the list elements, e.g. [["data_version"]]
The main data is in the innings element.
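These access styles are interchangeable, which can be confirmed on a toy nested list. The names below mirror the answer, but the values (and the revision field) are made up for illustration and are not taken from the cricsheet data.

```r
# Toy nested list; names mirror the answer, values are made up
toy <- list(
  meta = list(data_version = 1, revision = 2),
  info = list(city = "Somewhere")
)

by_name     <- toy$meta$data_version            # $ with names
by_position <- toy[[1]][[1]]                    # [[ ]] with positions
by_string   <- toy[["meta"]][["data_version"]]  # [[ ]] with name strings

identical(by_name, by_position) && identical(by_name, by_string)
#> [1] TRUE
```

Access by name is usually the safer choice, since it does not break if the order of the elements changes between files.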
str(raw_dat$innings, 3)
Look at the names in the list element
lapply(raw_dat$innings, names)
lapply(raw_dat$innings[[1]], names)
There are two list elements, each with sub-elements. You can access these as
raw_dat$innings[[1]][[1]][["team"]] # raw_dat$innings[[1]][["1st innings"]][["team"]]
raw_dat$innings[[2]][[1]][["team"]] # raw_dat$innings[[2]][["2nd innings"]][["team"]]
The above function parsed the deliveries data in raw_dat$innings. To see what it does, work through it from the inside.
Use one record to see how it works
(note the lapply, with p_fun, looped over raw_dat$innings[[1]] and raw_dat$innings[[2]] ; so this is the outer loop, and the lapply, with fn, loops through the deliveries, within an innings ; the inner loop)
X <- raw_dat$innings[[1]]
tmp <- X[[1]][["deliveries"]][[1]]
tmp
#create a named vector
tmp <- unlist(tmp)
tmp
# 0.1.batsman 0.1.bowler 0.1.non_striker 0.1.runs.batsman 0.1.runs.extras 0.1.runs.total
# "IR Bell" "DW Steyn" "MJ Prior" "0" "0" "0"
To use rbind.fill, the elements to bind together need to be data.frames. We also want to remove the leading numbers / deliveries from the names, as otherwise we will have lots of uniquely named columns.
# this regex removes all non-numeric characters from the string
# you could then split this number into over and delivery
gsub("[^0-9]", "", names(tmp))
# this regex removes all numeric characters from the string -
# allowing consistent names across all the balls / deliveries
# (if I were better at regex I would have also removed the leading dots)
gsub("[0-9]", "", names(tmp))
So for the first delivery in the first innings we have
tmp = data.frame(ball=gsub("[^0-9]", "", names(tmp))[1], t(tmp))
colnames(tmp) = gsub("[0-9]", "", colnames(tmp))
tmp
# ball X..batsman X..bowler X..non_striker X..runs.batsman X..runs.extras X..runs.total
# 1 01 IR Bell DW Steyn MJ Prior 0 0 0
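The leading dots mentioned in the comment above can be removed by cleaning the names before the data.frame is built. The following is only a sketch on a hand-built delivery: the nesting of the toy list is an assumption based on the names shown in the output, and the anchored pattern "^[0-9.]+" is my substitution for the answer's "[0-9]".

```r
# Toy delivery; names follow the answer, the nesting is an assumption
delivery <- list("0.1" = list(
  batsman = "IR Bell", bowler = "DW Steyn", non_striker = "MJ Prior",
  runs = list(batsman = 0, extras = 0, total = 0)
))

# Flatten to a named character vector, e.g. "0.1.runs.total"
flat <- unlist(delivery)

# Keep only the digits of the first name to identify the ball
ball <- gsub("[^0-9]", "", names(flat))[1]

# Strip the leading run of digits and dots, leaving e.g. "runs.total"
clean_names <- sub("^[0-9.]+", "", names(flat))

ball
#> [1] "01"
clean_names
#> [1] "batsman"      "bowler"       "non_striker"  "runs.batsman" "runs.extras"  "runs.total"
```

Anchoring the pattern at the start of the string means digits elsewhere in a name would be left alone.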
To see how the lapply works, use the first three deliveries (you will need to run the function fn in your workspace)
lst = lapply(X[[1]][["deliveries"]][1:3], fn )
lst
# [[1]]
# ball X..batsman X..bowler X..non_striker X..runs.batsman X..runs.extras X..runs.total
# 1 01 IR Bell DW Steyn MJ Prior 0 0 0
#
# [[2]]
# ball X..batsman X..bowler X..non_striker X..runs.batsman X..runs.extras X..runs.total
# 1 02 IR Bell DW Steyn MJ Prior 0 0 0
#
# [[3]]
# ball X..batsman X..bowler X..non_striker X..runs.batsman X..runs.extras X..runs.total
# 1 03 IR Bell DW Steyn MJ Prior 3 0 3
So we end up with a list element for every delivery within an innings. We then use rbind.fill to create one data.frame.
If I was going to try and parse every yaml file I would use a loop.
Use the first three records as an example, and also add the match date.
tmp <- unzip(temp)[2:4]
all_raw_dat <- vector("list", length=length(tmp))
for(i in seq_along(tmp)) {
d = yaml.load_file(tmp[i])
all_raw_dat[[i]] <- cbind(date=d$info$date, plyr::rbind.fill(lapply(d$innings, p_fun)))
}
Then use rbind.fill on all_raw_dat to combine the matches into one data.frame.
Q1. from comments
A small example with rbind.fill
a <- data.frame(x=1, y=2)
b <- data.frame(x=2, z=1)
rbind(a,b) # error, as names don't match
plyr::rbind.fill(a, b)
rbind.fill doesn't go back and add/update rows with the extra columns where needed (a still doesn't have column z). Think of it as creating an empty data frame with the number of columns equal to the number of unique columns found in the list of data frames: unique(c(names(a), names(b))). The values are then filled in each row where possible, and left missing (NA) otherwise.
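That mental model can be written out in base R. The helper below, fill_bind(), is a hypothetical name of my own and only a sketch of the idea described above; it is not how plyr implements rbind.fill.

```r
# Hypothetical helper: pad every data frame to the union of all column
# names with NA, then stack the padded data frames row-wise
fill_bind <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- NA
    d[all_cols]  # same column order for every data frame
  })
  do.call(rbind, padded)
}

a <- data.frame(x = 1, y = 2)
b <- data.frame(x = 2, z = 1)
res <- fill_bind(list(a, b))

names(res)
#> [1] "x" "y" "z"
```

In res, the row that came from a has NA in column z and the row that came from b has NA in column y, while a and b themselves are left unchanged.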

How to create a list of variables using a for loop in R

I have a list of 100 variables, let's say v1 to v100.
I want to create a list that holds each of these variables in a separate column. What I have done is:
s=list()
for (i in 1:100){
name=paste("v",i,sep="")
s[name]=vi
}
Now the problem is how to make R treat vi as name for the variable that will be stored in list. On running the above written code the console is showing the error
Error in s[name] = vi :
cannot coerce type 'closure' to vector of type 'list'
You can use mget to create a list based on multiple objects.
Here's an example:
v1 <- 1:3
v2 <- 4:6
mget(paste0("v", 1:2))
# $v1
# [1] 1 2 3
#
# $v2
# [1] 4 5 6
The correct names are assigned automatically.
Ok, let's try to make things reproducible first.
# I generate a list of 100 random normal vectors with 50 elements each
s <- list()
for(i in 1:100) s[[i]] <- rnorm(50)
# Creating the names vector(!) does not need any loops.
nms <- paste("v",1:100,sep="")
# if you already have the list like you say, you're done.
names(s) <- nms
If you have your variables v1, v2, v3, etc. you can use:
s=list()
for (i in 1:100){
name=paste("v",i,sep="")
s[[name]]=get(name) #write to the list with [[ ]] operator
}
The get function returns the value of the variable with the name equal to the string passed as argument (if the variable exists).
Please note that, as usual in R, this copies the value; it does not create a reference to the variable. So if you later change the value of a variable, it does not change in the list (and vice versa).
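The copy behaviour is easy to verify with a single variable (a minimal sketch; v1 and s are just illustrative names):

```r
v1 <- 1:3

# Store the current value of v1 in a list under its own name
s <- list()
s[["v1"]] <- get("v1")

# Changing the variable afterwards does not touch the list
v1[1] <- 99L

s$v1
#> [1] 1 2 3
v1
#> [1] 99  2  3
```

If the list needs to reflect later changes, it has to be rebuilt after the variables are modified.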
