Extract nested list element where element names are similar - r

I have a nested list, returned by a function, in which the top-level element names are repeated in the element names further down:
$`1`
$`1`$`1`
[1] 0 0 0 0 0 0 0 1 0
$`1`$`2`
[1] 0 0 0 0 0 0 0 0 0
$`2`
$`2`$`1`
[1] 0 0 0 1 1 0 0 0 0
$`2`$`2`
[1] 0 1 0 0 0 1 0 0 0
Is there a way to use an apply function (or anything else) to extract those vectors where the element and sub-element names match, e.g. $`1`$`1` and $`2`$`2`? I have a huge list (4000 elements, each with 4000 sub-elements), so efficiency is a must.
Alternatively, I have found a way out of this mess using melt(), but it is too resource-intensive for the size of my data. If anyone knows how to replicate its effect (a data frame with three columns: one for the element name, one for the sub-element name and one for the vector), that would also work.
Regards and thanks :)

This is a way to get a list of the vectors you want:
lapply(names(dat), function(x) dat[[x]][[x]])
In a data frame:
do.call("rbind",
        lapply(names(dat),
               function(x) data.frame(element = x,
                                      subelement = x,
                                      values = dat[[x]][[x]])
        )
)
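As a minimal, self-contained sketch of the first approach: the small `dat` below is made-up example data, and `matched` is just an illustrative name. Wrapping the names in setNames() is an optional addition so that the result keeps the element names.

```r
# Made-up example data with the same shape as in the question
dat <- list(
  `1` = list(`1` = c(0, 0, 1), `2` = c(0, 0, 0)),
  `2` = list(`1` = c(1, 1, 0), `2` = c(0, 1, 0))
)

# Look up, for every top-level name, the sub-element with the same name.
# setNames() makes lapply() return a list named by element.
matched <- lapply(setNames(names(dat), names(dat)),
                  function(x) dat[[x]][[x]])

matched[["2"]]
```

This only visits one sub-element per top-level element, so the work grows with the number of elements rather than with the total number of sub-elements.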

You can unlist them without recursion to remove the top level list structure, and then use regex-assisted subsetting on the names of this result.
l <- list(`1`=list(`1`=rpois(6,1),`2`=rep(0,6)),`2`=list(`1`=rep(0,6),`2`=rpois(6,1)))
l2 <- unlist(l, recursive = FALSE)
# anchor the pattern so that names such as "1.10" or "11.1" are not matched by accident
l2[grepl("^([0-9]+)[.]\\1$", names(l2))]
$`1.1`
[1] 2 0 2 4 1 0
$`2.2`
[1] 0 0 0 2 1 0
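With thousands of elements the names will have several digits, so it may be worth checking how such a pattern behaves on them. The sketch below compares an unanchored and an anchored form of the pattern on a few made-up names (`nms` is illustrative, not from the original data).

```r
# Made-up names of the kind unlist(l, recursive = FALSE) would produce
nms <- c("1.1", "1.10", "11.1", "2.2", "12.12")

# Unanchored: also matches "1.10" and "11.1", because "1.1" occurs inside them
unanchored <- grepl("([0-9]+)[.]\\1", nms)

# Anchored: the whole name must be <number>.<same number>
anchored <- grepl("^([0-9]+)[.]\\1$", nms)

nms[anchored]
```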

Related

Sample random column in dataframe

I have the following data:
model$data
[[1]]
     Category1 Category2 Category3 Category4
3555         1         0         0         0
6447         1         0         0         0
5523         1         0         1         0
7550         1         0         1         0
6330         1         0         1         0
2451         1         0         0         0
4308         1         0         1         0
8917         0         0         0         0
4780         1         0         1         0
6802         1         0         1         0
2021         1         0         0         0
5792         1         0         1         0
5475         1         0         1         0
4198         1         0         0         0
223          1         0         1         0
4811         1         0         1         0
678          1         0         1         0
I am trying to use this call to pick one of the column names at random:
sample(colnames(model$data), 1)
But I receive the following error message:
Error in sample.int(length(x), size, replace, prob) :
invalid first argument
Is there a way to avoid that error?
Notice this?
model$data
[[1]]
The [[1]] means that model$data is a list, whose first component is a data frame. To do anything with it, you need to pass model$data[[1]] to your code, not model$data.
sample(colnames(model$data[[1]]), 1)
This seems to be a near-duplicate of "Random rows in dataframes in R" and should probably be closed as a duplicate. But for completeness, adapting that answer to sample column indices is trivial:
You don't need to generate a vector of column names, only their indices. Keep it simple.
Sample your column indices from 1:ncol(df) instead of 1:nrow(df).
Then put those column indices on the right-hand side of the comma in df[, ...]:
df[, sample(ncol(df), 1)]
The 1 is because you apparently want a sample of size 1.
One minor complication is that your data frame is model$data[[1]], since model$data looks like a list with one element that is a data frame, rather than a plain data frame. So first, assign df <- model$data[[1]].
Finally, if you really want the sampled column name(s) as well as their indices:
samp_col_idxs <- sample(ncol(df), 1)
samp_col_names <- colnames(df)[samp_col_idxs]
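Putting the pieces together, here is a minimal sketch with a small made-up stand-in for `model` (the real object comes from the asker's code and is not shown in full). It assumes, as the printed `[[1]]` suggests, that `model$data` is a list whose first element is the data frame.

```r
# Made-up stand-in: a list whose $data component is a list holding one data frame
model <- list(data = list(data.frame(
  Category1 = c(1, 1, 0),
  Category2 = c(0, 0, 0),
  Category3 = c(0, 1, 0)
)))

# colnames() of a plain list is NULL, which is why sample() fails on model$data
is.null(colnames(model$data))

df <- model$data[[1]]            # extract the data frame first
idx <- sample(ncol(df), 1)       # random column index
col_name <- colnames(df)[idx]    # its name
col_values <- df[[idx]]          # the column itself, as a vector
```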

transition matrix force ncol to equal nrows

I have created a transition matrix as a 'from cluster' (rows) by 'to cluster' (columns) frequency table. Think Markov chain.
Assume I have 5 'from' clusters but only 3 'to' clusters; then I get a 5*3 transition matrix. How do I force it to be a 5*5 transition matrix? Effectively, how do I show the all-zero columns?
I'm after an elegant solution, as this will be applied to a much larger problem involving hundreds of clusters. I am quite unfamiliar with R matrices, and I don't know of an elegant way to force the number of columns to equal the number of rows and fill in zeros where there is no match, other than a for loop, which my hunch says is not the best solution.
Example code:
# example data
cluster_before <- c(1,2,3,4,5)
cluster_after <- c(1,2,4,4,1)
# Table output
table(cluster_before,cluster_after)
# ncol does not = nrows. I want to rectify that
# I want output to look like this:
what_I_want <- matrix(
  c(1,0,0,0,0,
    0,1,0,0,0,
    0,0,0,1,0,
    0,0,0,1,0,
    1,0,0,0,0),
  byrow=TRUE,ncol=5
)
# Possible solution. But for loop can't be best solution?
empty_mat <- matrix(0,ncol=5,nrow=5)
matrix_to_update <- empty_mat
for (i in 1:length(cluster_before)) {
  val_before <- cluster_before[i]
  val_after <- cluster_after[i]
  matrix_to_update[val_before,val_after] <- matrix_to_update[val_before,val_after]+1
}
matrix_to_update
# What's the more elegant solution?
Thanks in advance for your help. It's much appreciated.
Make them factors and then table:
levs <- union(cluster_before, cluster_after)
table(factor(cluster_before,levs), factor(cluster_after,levs))
# 1 2 3 4 5
# 1 1 0 0 0 0
# 2 0 1 0 0 0
# 3 0 0 0 1 0
# 4 0 0 0 1 0
# 5 1 0 0 0 0
Another solution is to use matrix indices:
what_I_want <- matrix(0,ncol=5,nrow=5)
what_I_want[cbind(cluster_before,cluster_after)] <- 1
print(what_I_want)
## [,1] [,2] [,3] [,4] [,5]
##[1,] 1 0 0 0 0
##[2,] 0 1 0 0 0
##[3,] 0 0 0 1 0
##[4,] 0 0 0 1 0
##[5,] 1 0 0 0 0
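The second line sets the elements corresponding to the row (cluster_before) and column (cluster_after) indices to 1. Note that this only records whether a transition occurred: if the same (before, after) pair appears more than once, the entry is still 1 rather than a count.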
Hope this helps.
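If actual frequencies are needed with the matrix-index idea, one possible sketch is to convert each (row, column) pair into a column-major linear index and count those with tabulate(). This assumes the cluster labels are the integers 1..n; the extra sixth observation and the names `n`, `linear_idx` and `counts` are illustrative additions, not part of the original question.

```r
# Example data with one repeated transition (3 -> 4 occurs twice)
cluster_before <- c(1, 2, 3, 4, 5, 3)
cluster_after  <- c(1, 2, 4, 4, 1, 4)

n <- max(cluster_before, cluster_after)

# Column-major linear index of cell (row, col) in an n x n matrix
linear_idx <- (cluster_after - 1) * n + cluster_before

# Count how often each cell is hit and reshape into an n x n matrix
counts <- matrix(tabulate(linear_idx, nbins = n * n), nrow = n, ncol = n)
counts
```

For labels that are not consecutive integers, the factor-and-table approach above is the simpler route.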

Creating a loop in R which also changes the column name

I am attempting to loop a command over a list of species (fish_list). While I've found plenty of examples, I haven't found one that also changes the column name as part of the loop. I have figured out how to get the desired result for an individual species (lines 10-13 of the code below), but in the actual dataset I have ~500 species, and I'd prefer not to repeat this command 500+ times. Is there a way to substitute the values from a list where it says variable?
fishdata$variable <- ifelse(fishdata$Species == "variable", fishdata$Number, 0)
I know how to do this in ArcGIS, but I am trying to expand my horizons and learn R. This is also my first post, so please excuse any mistakes.
Thank you for any help you can provide.
fishdata <-c()
fishdata$Site <-c(1,1,1,2,2,2)
fishdata$Species <- c("one_fish", "two_fish", "two_fish", "red_fish", "blue_fish", "blue_fish")
fishdata$Number <- c(1,1,1,1,1,1)
fishdata$one_fish <-0
fishdata$two_fish <-0
fishdata$red_fish <-0
fishdata$blue_fish <-0
fish_list <- c("one_fish","two_fish", "red_fish", "blue_fish")
fishdata$one_fish <- ifelse(fishdata$Species=="one_fish",fishdata$Number,0)
fishdata$two_fish <- ifelse(fishdata$Species=="two_fish",fishdata$Number,0)
fishdata$red_fish <- ifelse(fishdata$Species=="red_fish",fishdata$Number,0)
fishdata$blue_fish <- ifelse(fishdata$Species=="blue_fish",fishdata$Number,0)
You can use sapply to iterate over the species names. Iterating over fish_list (or unique(fishdata$Species)) rather than fishdata$Species gives one column per species instead of one per row:
sapply(fish_list, function(i) ifelse(fishdata$Species == i, fishdata$Number, 0))
#      one_fish two_fish red_fish blue_fish
# [1,]        1        0        0         0
# [2,]        0        1        0         0
# [3,]        0        1        0         0
# [4,]        0        0        1         0
# [5,]        0        0        0         1
# [6,]        0        0        0         1
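$ is just shorthand for the [[ ]] operator with a fixed name, so these two do the same thing here:
a$x
a[["x"]]
Because [[ ]] also accepts a variable holding the name, you can do this for a single species:
fishdata[[species]] <- ifelse(fishdata$Species == species, fishdata$Number, 0)
and wrap it in a loop over all species names:
for (species in fish_list) {
  fishdata[[species]] <- ifelse(fishdata$Species == species, fishdata$Number, 0)
}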
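A self-contained sketch of the loop, using the example values from the question. Building `fishdata` with data.frame() is an assumption made here for convenience (the question builds it up from c(), which yields a plain list); the [[ ]] assignment works in both cases.

```r
# Example data from the question, stored as a data frame (an assumption)
fishdata <- data.frame(
  Site    = c(1, 1, 1, 2, 2, 2),
  Species = c("one_fish", "two_fish", "two_fish",
              "red_fish", "blue_fish", "blue_fish"),
  Number  = c(1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# One new column per distinct species; no need to pre-create the columns
for (species in unique(fishdata$Species)) {
  fishdata[[species]] <- ifelse(fishdata$Species == species, fishdata$Number, 0)
}

fishdata
```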

R How to convert vectors of number with different length into vectors of binary with fixed length

How can I convert this
1,2,5,6,9
1,2
3,11
into this:
1,1,0,0,1,1,0,0,1,0,0
1,1,0,0,0,0,0,0,0,0,0
0,0,1,0,0,0,0,0,0,0,1
I thought I could read my data by inserting NA wherever an index does not exist,
then replace each NA with zero and each non-NA value with one.
But I don't know how to do that; I searched for similar code and didn't find any.
You can do:
lapply(z,tabulate,nbins=max(unlist(z)))
[[1]]
[1] 1 1 0 0 1 1 0 0 1 0 0
[[2]]
[1] 1 1 0 0 0 0 0 0 0 0 0
[[3]]
[1] 0 0 1 0 0 0 0 0 0 0 1
where z is a list of vectors:
z <- list(c(1,2,5,6,9),c(1,2),c(3,11))
I'm not sure what your original numbers are stored as, but here's a solution assuming it's a list of vectors:
nums <- list(
  c(1,2,5,6,9),
  c(1,2),
  c(3,11)
)
maxn <- max(unlist(nums))
lapply(nums, function(x) {
  binary <- numeric(maxn)
  binary[x] <- 1
  binary
})
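If a single matrix is more convenient than a list of vectors, the same idea can be sketched with matrix indexing. This assumes the input is a list of positive integer positions, as in the answers above; `binary` is an illustrative name.

```r
nums <- list(c(1, 2, 5, 6, 9), c(1, 2), c(3, 11))
maxn <- max(unlist(nums))

# One row per input vector, one column per possible position
binary <- matrix(0L, nrow = length(nums), ncol = maxn)

# Build (row, column) pairs: the row is the vector's position in the list,
# the column is each value it contains
rows <- rep(seq_along(nums), lengths(nums))
binary[cbind(rows, unlist(nums))] <- 1L

binary
```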

R concatenating two factors

This is making me feel dumb, but I am trying to produce a single vector/df/list/etc (anything but a matrix) concatenating two factors. Here's the scenario. I have a 100k line dataset. I used the top half to predict the bottom half and vice versa using knn. So now I have 2 objects created by knn predict().
> head(pred11)
[1] 0 0 0 0 0 0
Levels: 0 1
> head(pred12)
[1] 0 1 1 0 0 0
Levels: 0 1
> class(pred11)
[1] "factor"
> class(pred12)
[1] "factor"
Here's where my problem starts:
> pred13 <- rbind(pred11, pred12)
> class(pred13)
[1] "matrix"
There are 2 problems. First, it changes the 0's and 1's to 1's and 2's; second, it seems to create a huge matrix that eats all my memory. I've tried messing with as.numeric(), data.frame(), etc., but can't get it to just combine the two 50k factors into one 100k factor. Any suggestions?
@James presented one way, I'll chip in with another (shorter):
set.seed(42)
x1 <- factor(sample(0:1,10,replace=TRUE))
x2 <- factor(sample(0:1,10,replace=TRUE))
unlist(list(x1,x2))
# [1] 1 1 0 1 1 1 1 0 1 1 0 1 1 0 0 1 1 0 0 1
#Levels: 0 1
...This might seem a bit like magic, but unlist has special support for factors for this particular purpose! All elements in the list must be factors for this to work.
rbind will create a 2 x 50000 matrix in your case, which isn't what you want. c is the correct function to combine 2 vectors into a single longer vector. When you use rbind on factors (or c in R versions before 4.1.0, where c gained a factor method), it will use the underlying integer codes that map to the levels. In general you need to combine as character before re-creating the factor:
x1 <- factor(sample(0:1,10,replace=TRUE))
x2 <- factor(sample(0:1,10,replace=TRUE))
factor(c(as.character(x1),as.character(x2)))
[1] 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 1 1 0 0 0
Levels: 0 1
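To illustrate why going through character is the safe route, here is a small sketch with two made-up factors whose level sets differ (`x1`, `x2` and `combined` are illustrative values, not the asker's predictions). The integer codes of the two factors do not refer to the same labels, so combining the codes directly would mislabel values.

```r
x1 <- factor(c("0", "0", "1"))   # levels "0" and "1"
x2 <- factor(c("1", "1", "1"))   # only level "1"

# The integer codes disagree: code 1 means "0" in x1 but "1" in x2
as.integer(x1)
as.integer(x2)

# Converting to character first keeps the labels, then the factor is rebuilt
combined <- factor(c(as.character(x1), as.character(x2)))
combined
```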
