avoiding the side-effects of c()


In particular, I mean the note in ?c about attributes being removed:
> x <- 1
> y <- as.integer(1)
> str(x);str(y)
num 1
int 1
> identical(x, y)
[1] FALSE
> str(c(x, y))
num [1:2] 1 1
> tmp <- c(x, y)
> identical(tmp[1], tmp[2])
[1] TRUE
Another example (but not as relevant)
> tmp <- c(1, 3, 2)
> sort(tmp)
[1] 1 2 3
> tmp <- factor(tmp, levels = ordered(tmp))
> sort(tmp)
[1] 1 3 2
Levels: 1 3 2
> sort(rep(tmp, 2))
[1] 1 1 3 3 2 2
Levels: 1 3 2
> tmp1 <- c(tmp, tmp)
> sort(tmp1)
[1] 1 1 2 2 3 3
I ask because I have a function that takes ... and combines many objects (as in tmp <- c(...)), then runs identical() on each pair. It currently (correctly) says 1 and as.integer(1) are identical, which is not what I want.

Atomic vectors cannot store elements of different types. If you combine objects of different types using c, all of them are coerced to the most general type (character > numeric > integer > logical).
If you want to store objects of different modes, you can use lists. Here is an illustration:
Atomic vectors:
x <- 1
y <- 1L
str(x); str(y)
# num 1
# int 1
str(c(x, y))
# num [1:2] 1 1
Combine both values in a list:
z <- list(x, y)
str(z)
# List of 2
# $ : num 1
# $ : int 1
identical(z[[1]], z[[2]])
# [1] FALSE
Store objects in a one-element list and combine them using c:
xList <- list(x)
yList <- list(y)
zList <- c(xList, yList)
str(zList)
# List of 2
# $ : num 1
# $ : int 1
identical(zList[[1]], zList[[2]])
# [1] FALSE
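For the function described in the question, the same idea can be applied to ... directly. The following is only a sketch: all_pairs_identical is a hypothetical name, and it assumes at least two arguments are passed. It collects the arguments with list(...) instead of c(...), so no coercion happens before identical() is called.

```r
# Hypothetical helper: compare every pair of arguments without coercing them.
all_pairs_identical <- function(...) {
  args <- list(...)                    # list() keeps each argument's own type
  pairs <- combn(seq_along(args), 2)   # column j holds the two indices of pair j
  res <- apply(pairs, 2, function(idx) {
    identical(args[[idx[1]]], args[[idx[2]]])
  })
  names(res) <- apply(pairs, 2, paste, collapse = "-")
  res
}

all_pairs_identical(1, 1L, 1)
#   1-2   1-3   2-3
# FALSE  TRUE FALSE
```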

Related

Replacing in list inside list in R

I have a list of lists like the following:
x <- list(x = list(a = 1:10, b = 10:20), y = 4, z = list(a = 1, b = 2))
str(x)
List of 3
$ x:List of 2
..$ a: int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ b: int [1:11] 10 11 12 13 14 15 16 17 18 19 ...
$ y: num 4
$ z:List of 2
..$ a: num 1
..$ b: num 2
How can I replace values in the component "a" inside the list "x" (x$x$a), for example replacing the 1 with 100?
My real data is very large, so I cannot do it one by one, and the unlist function is not a solution for me because I lose information.
Any ideas??

Operate on all a subcomponents
For the list x shown in the question we can check whether each component is a list with an a component and if so then replace 1 in the a component with 100.
f <- function(z) { if (is.list(z) && "a" %in% names(z)) z$a[z$a == 1] <- 100; z }
lapply(x, f)
Just x component
1) If you only want to perform the replacement in the x component of x then x2 is the result.
x2 <- x
x2$x$a[x2$x$a == 1] <- 100
2) Another possibility for operating on just the x component is to use rrapply.
library(rrapply)
cond <- function(z, .xparents) identical(.xparents, c("x", "a"))
rrapply(x, cond, function(z) replace(z, z == 1, 100))
3) And another possibility is to use modifyList
modifyList(x, list(x = list(a = replace(x$x$a, x$x$a ==1, 100))))
4) within is another option.
within(x, x$a[x$a == 1] <- 100 )
4a) or iterating within:
within(x, x <- within(x, a[a == 1] <- 100) )
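The lapply(x, f) approach above only looks at the top-level components of x. If the real data nests lists more deeply, a recursive version may be closer to what is needed. This is a sketch under that assumption; replace_in_named and its arguments are hypothetical names, not from any package.

```r
# Hypothetical helper: replace `old` with `new` in every non-list component
# called `name`, at any depth of a nested list.
replace_in_named <- function(lst, name = "a", old = 1, new = 100) {
  for (i in seq_along(lst)) {
    if (is.list(lst[[i]])) {
      # descend into sub-lists
      lst[[i]] <- replace_in_named(lst[[i]], name, old, new)
    } else if (identical(names(lst)[i], name)) {
      # leaf with the requested name: replace matching values
      lst[[i]][lst[[i]] == old] <- new
    }
  }
  lst
}

x <- list(x = list(a = 1:10, b = 10:20), y = 4, z = list(a = 1, b = 2))
x3 <- replace_in_named(x)
x3$x$a
# [1] 100   2   3   4   5   6   7   8   9  10
x3$z$a
# [1] 100
```

The original x is left untouched; only the returned copy x3 carries the replacements.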

Here is a trick using relist + unlist
> v <- unlist(x)
> relist(replace(v, grepl("\\.a\\d?", names(v)) & v == 1, 100), x)
$x
$x$a
[1] 100 2 3 4 5 6 7 8 9 10
$x$b
[1] 10 11 12 13 14 15 16 17 18 19 20
$y
[1] 4
$z
$z$a
[1] 100
$z$b
[1] 2

Adding a new column to a list of data frames and then 'unlist' with names intact?

I have a number of dfs to which I want to add a column.
For the sake of a minimal reproducible example, these dfs are called df_1, df_2, df_3...
for (i in 1:10) {
assign(paste("df_",i,sep = ""),data.frame(x = rep(1,10), y = rep(2,10)))
}
I want to add another column z to each of these dfs.
z <- rep("hello",10)
How can I accomplish this?
Using lapply I have been able to do this
q <- list()
for (i in 1:10) {
q[[i]] <- assign(paste("df_",i,sep = ""),data.frame(x = rep(1,10), y = rep(2,10)))
}
z <- rep("hello",10)
q <- lapply(q, cbind,z)
This adds the required column, however, I don't know how to preserve the names. How can I still have df_1, df_2, etc but each with a new column z?
Thanks in advance

Using `[<-`().
q <- lapply(q,`[<-`, 'z', value=rep("hello", 10))
Gives
str(q)
# List of 10
# $ :'data.frame': 10 obs. of 3 variables:
# ..$ x: num [1:10] 1 1 1 1 1 1 1 1 1 1
# ..$ y: num [1:10] 2 2 2 2 2 2 2 2 2 2
# ..$ z: chr [1:10] "hello" "hello" "hello" "hello" ...
# $ :'data.frame': 10 obs. of 3 variables:
# ..$ x: num [1:10] 1 1 1 1 1 1 1 1 1 1
# ..$ y: num [1:10] 2 2 2 2 2 2 2 2 2 2
# ..$ z: chr [1:10] "hello" "hello" "hello" "hello" ...
# ...
This works because `[<-`(df_1, 'z', value=z) is the functional form of df_1['z'] <- z, which for a data frame has the same effect as df_1[['z']] <- z. (The call dispatches to the `[<-.data.frame` method in base.)
Note: You might get q a little cheaper using replicate:
n <- 3
q <- replicate(n, data.frame(x=rep(1, 3), y=rep(2, 3)), simplify=FALSE) |>
setNames(paste0('df_', 1:n))
q
# $df_1
# x y
# 1 1 2
# 2 1 2
# 3 1 2
#
# $df_2
# x y
# 1 1 2
# 2 1 2
# 3 1 2
#
# $df_3
# x y
# 1 1 2
# 2 1 2
# 3 1 2
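To make the `[<-` call less opaque, here is a small check on a single data frame. It is only an illustration; df, a and b are throwaway names. It compares the functional form used inside lapply with the usual replacement syntax.

```r
df <- data.frame(x = rep(1, 3), y = rep(2, 3))
z <- rep("hello", 3)

a <- `[<-`(df, "z", value = z)   # functional form: returns the modified data frame
b <- df
b["z"] <- z                      # the usual replacement syntax on a copy

identical(a, b)
# [1] TRUE
names(a)
# [1] "x" "y" "z"
```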

Alternatively, you can slightly adjust your own list-method such that the names of the data frames are also stored:
q <- list()
for (i in 1:10) {
q[[paste0('df_', i)]] <- data.frame(x = rep(1,10), y = rep(2,10))
}
z <- rep("hello",10)
q <- lapply(q, cbind,z)
Edit: using list2env mentioned by @jay.sf, the dfs are returned to the global environment.
list2env(q , .GlobalEnv)

Functionally the same as @jay.sf's answer and slightly more verbose, but perhaps easier to understand because it uses transform().
# create dataframes
for (i in 1:10) {
assign(paste("df_",i,sep = ""),data.frame(x = rep(1,10), y = rep(2,10)))
}
# store dataframes into a list (only objects starting with df_)
df_list <- mget(ls(pattern="^df_"))
# add new column to each dataframe
lapply(df_list, \(x) transform(x, z = rep("hello", 10)))
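Note that the lapply call above only prints its result. To get df_1, df_2, ... back with the new column, the result has to be assigned and written back. Here is a sketch of the full round trip. It uses a scratch environment (env, a name chosen for this example) instead of .GlobalEnv so that nothing in the workspace is overwritten; with envir = .GlobalEnv it would update the original objects.

```r
env <- new.env()   # scratch environment standing in for .GlobalEnv

# create the data frames
for (i in 1:3) {
  assign(paste0("df_", i), data.frame(x = rep(1, 10), y = rep(2, 10)), envir = env)
}

# collect them into a named list, add the column, and write them back
df_list <- mget(ls(env, pattern = "^df_"), envir = env)
df_list <- lapply(df_list, function(d) transform(d, z = "hello"))
invisible(list2env(df_list, envir = env))

names(env$df_1)
# [1] "x" "y" "z"
```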

How to increment vector in r by a fixed value and generate histogram of each iteration

I'm looking to increment each value in a vector by 1 until a set value is reached, saving each iteration as its own vector. Later iterations should not include values past the set value. For instance, say the set value is 3 and consider the vector A <- c(1,1,2). Then the desired outcome is:
Outcome:
1 1 2
2 2 3
3 3
Then I want to store each line as a vector so I can plot a histogram of each one, including the original vector:
hist(c(1,1,2))
hist(c(2,2,3))
hist(c(3,3))
Potential code:
for (i in 1:length(A)) {
A[i] <- A + 1
}

# given values
A <- c(1, 1, 2)
value <- 3
# incrementations
out_lst <- lapply(A, function(x) x:value)  # count up from each element to the set value
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 1 2 3
#
# [[3]]
# [1] 2 3
# histograms
hist_lst <- list()
max_len <- max(sapply(out_lst, function(x) length(x)))
for(l in 1:max_len) {
hist_lst[[l]] <- sapply(out_lst, function(x) x[l])
}
hist_lst
# [[1]]
# [1] 1 1 2
#
# [[2]]
# [1] 2 2 3
#
# [[3]]
# [1] 3 3 NA
par(mfrow = c(1, length(hist_lst)))
invisible(lapply(hist_lst, hist))

You can use a while loop:
funfun=function(vec,max){
y=list()
i=1
while(length(vec)!=0){
y[[i]]=vec
vec=vec+1
vec=`attributes<-`(na.omit(replace(vec,vec>max,NA)),NULL)
i=i+1
}
y
}
funfun(c(1,1,2),3)
[[1]]
[1] 1 1 2
[[2]]
[1] 2 2 3
[[3]]
[1] 3 3
You can now do
sapply(funfun(c(1,1,2),3),hist)
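The same loop can be written with plain logical subsetting instead of replace/na.omit. This is a sketch only; increment_until is a hypothetical name. Like funfun, it assumes the starting vector has no values above the limit.

```r
increment_until <- function(vec, max) {
  out <- list()
  while (length(vec) > 0) {
    out[[length(out) + 1]] <- vec   # store the current state
    vec <- vec + 1                  # increment every value
    vec <- vec[vec <= max]          # drop values that passed the limit
  }
  out
}

steps <- increment_until(c(1, 1, 2), 3)
steps
# [[1]]
# [1] 1 1 2
#
# [[2]]
# [1] 2 2 3
#
# [[3]]
# [1] 3 3
```

Each element of steps can then be passed to hist, for example with invisible(lapply(steps, hist)).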

R - Collapse into vector same member of a list

I have a list with the same structure for every member, as follows:
config <- NULL
config[["secA"]] <- NULL
config[["secA"]]$VAL <- 0
config[["secA"]]$ARR <- c(1,2,3,4,5)
config[["secA"]]$DF <- data.frame(matrix(c(1,5,3,8),2,2))
config[["secB"]] <- NULL
config[["secB"]]$VAL <- 1
config[["secB"]]$ARR <- c(1,3,2,4,9)
config[["secB"]]$DF <- data.frame(matrix(c(2,6,1,9),2,2))
config[["secC"]] <- NULL
config[["secC"]]$VAL <- 5
config[["secC"]]$ARR <- c(4,2,1,5,8)
config[["secC"]]$DF <- data.frame(matrix(c(4,2,1,7),2,2))
I need to obtain 3 vectors, VAL, ARR and DF, each holding the concatenated elements of the corresponding member, such as:
# VAL: 0,1,5
# ARR: 1,2,3,4,5,1,3,2,4,9,4,2,1,5,8
# DF: 1,5,3,8,2,6,1,9,4,2,1,7
Looking at similar situations, I have the feeling I need to use a combination of do.call and cbind or lapply, but I have no clue how. Any suggestions?

config <- NULL
config[["secA"]] <- NULL
config[["secA"]]$VAL <- 0
config[["secA"]]$ARR <- c(1,2,3,4,5)
config[["secA"]]$DF <- data.frame(matrix(c(1,5,3,8),2,2))
config[["secB"]] <- NULL
config[["secB"]]$VAL <- 1
config[["secB"]]$ARR <- c(1,3,2,4,9)
config[["secB"]]$DF <- data.frame(matrix(c(2,6,1,9),2,2))
config[["secC"]] <- NULL
config[["secC"]]$VAL <- 5
config[["secC"]]$ARR <- c(4,2,1,5,8)
config[["secC"]]$DF <- data.frame(matrix(c(4,2,1,7),2,2))
sapply(names(config[[1]]), function(x)
unname(unlist(sapply(config, `[`, x))), USE.NAMES = TRUE)
# $VAL
# [1] 0 1 5
#
# $ARR
# [1] 1 2 3 4 5 1 3 2 4 9 4 2 1 5 8
#
# $DF
# [1] 1 5 3 8 2 6 1 9 4 2 1 7
Or you can use the clist function defined at the end of this answer.
Unfortunately there were no other answers.
(l <- Reduce(clist, config))
# $VAL
# [1] 0 1 5
#
# $ARR
# [1] 1 2 3 4 5 1 3 2 4 9 4 2 1 5 8
#
# $DF
# X1 X2 X1 X2 X1 X2
# 1 1 3 2 1 4 1
# 2 5 8 6 9 2 7
It merges data frames and matrices, so you need to unlist to get the vector you want
l$DF <- unname(unlist(l$DF))
l
# $VAL
# [1] 0 1 5
#
# $ARR
# [1] 1 2 3 4 5 1 3 2 4 9 4 2 1 5 8
#
# $DF
# [1] 1 5 3 8 2 6 1 9 4 2 1 7
Function
clist <- function (x, y) {
islist <- function(x) inherits(x, 'list')
'%||%' <- function(a, b) if (!is.null(a)) a else b
get_fun <- function(x, y)
switch(class(x %||% y)[1L],  # first class only: a matrix has class c("matrix", "array") in R >= 4.0
matrix = cbind,
data.frame = function(x, y)
do.call('cbind.data.frame', Filter(Negate(is.null), list(x, y))),
factor = function(...) unlist(list(...)), c)
stopifnot(islist(x), islist(y))
nn <- names(rapply(c(x, y), names, how = 'list'))
if (is.null(nn) || any(!nzchar(nn)))
stop('All non-NULL list elements should have unique names', domain = NA)
nn <- unique(c(names(x), names(y)))
z <- setNames(vector('list', length(nn)), nn)
for (ii in nn)
z[[ii]] <- if (islist(x[[ii]]) && islist(y[[ii]]))
Recall(x[[ii]], y[[ii]]) else
(get_fun(x[[ii]], y[[ii]]))(x[[ii]], y[[ii]])
z
}

Another approach, with slightly less code.
un_config <- unlist(config)
un_configNAM <- names(un_config)
vecNAM <- c("VAL", "ARR", "DF")
for(n in vecNAM){
assign(n, un_config[grepl(n, un_configNAM)])
}
This will return 3 vectors as the OP requested. However, it is generally more advantageous to store the results in a list, as rawr suggests. You can of course adapt the above code so that the results are stored in a list.
l <- rep(list(NA), length(vecNAM))
i = 1
for(n in vecNAM){
l[[i]] <- un_config[grepl(n, un_configNAM)]
i = i +1
}
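One caveat with the grepl approach: it matches on the flattened names, so it could pick up unintended elements if one field name is contained in another (say VAL and VALUE). A sketch that selects each field by its exact name instead, assuming every section has the same fields as the first one (fields and collapsed are names chosen for this example):

```r
config <- list(
  secA = list(VAL = 0, ARR = c(1, 2, 3, 4, 5), DF = data.frame(matrix(c(1, 5, 3, 8), 2, 2))),
  secB = list(VAL = 1, ARR = c(1, 3, 2, 4, 9), DF = data.frame(matrix(c(2, 6, 1, 9), 2, 2))),
  secC = list(VAL = 5, ARR = c(4, 2, 1, 5, 8), DF = data.frame(matrix(c(4, 2, 1, 7), 2, 2)))
)

fields <- names(config[[1]])
collapsed <- lapply(setNames(fields, fields), function(f) {
  # pull field f out of every section, flatten it, and concatenate
  unlist(lapply(config, function(sec) unlist(sec[[f]])), use.names = FALSE)
})

collapsed$VAL
# [1] 0 1 5
collapsed$DF
# [1] 1 5 3 8 2 6 1 9 4 2 1 7
```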

Intersect all possible combinations of list elements

I have a list of vectors:
> l <- list(A=c("one", "two", "three", "four"), B=c("one", "two"), C=c("two", "four", "five", "six"), D=c("six", "seven"))
> l
$A
[1] "one" "two" "three" "four"
$B
[1] "one" "two"
$C
[1] "two" "four" "five" "six"
$D
[1] "six" "seven"
I would like to calculate the length of the overlap between all possible pairwise combinations of the list elements, i.e. (the format of the result doesn't matter):
AintB 2
AintC 2
AintD 0
BintC 1
BintD 0
CintD 1
I know combn(x, 2) can be used to get a matrix of all possible pairwise combinations in a vector and that length(intersect(a, b)) would give me the length of the overlap of two vectors, but I can't think of a way to put the two things together.
Any help is much appreciated! Thanks.

If I understand correctly, you can look at crossprod and stack:
crossprod(table(stack(l)))
# ind
# ind A B C D
# A 4 2 2 0
# B 2 2 1 0
# C 2 1 4 1
# D 0 0 1 2
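Why this works: table(stack(l)) is a value-by-element count matrix, and crossprod multiplies its transpose by itself, so each cell sums the products of counts over all values. The sketch below spells that out and shows one assumption worth checking: the cells equal intersection sizes only when no vector contains repeated values. l_dup is a made-up example for that case.

```r
l <- list(A = c("one", "two", "three", "four"), B = c("one", "two"),
          C = c("two", "four", "five", "six"), D = c("six", "seven"))

M <- table(stack(l))   # rows: distinct values, columns: list names, cells: counts
X <- crossprod(M)      # t(M) %*% M
X["A", "C"] == length(intersect(l$A, l$C))
# [1] TRUE

# With a repeated value, the product of counts overstates the overlap
l_dup <- list(A = c("one", "one"), B = "one")
crossprod(table(stack(l_dup)))["A", "B"]
# [1] 2

# Deduplicating each vector first restores the intersection size
crossprod(table(stack(lapply(l_dup, unique))))["A", "B"]
# [1] 1
```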
You can extend the idea if you want a data.frame of just the relevant values as follows:
Write a spiffy function
listIntersect <- function(inList) {
X <- crossprod(table(stack(inList)))
X[lower.tri(X)] <- NA
diag(X) <- NA
out <- na.omit(data.frame(as.table(X)))
out[order(out$ind), ]
}
Apply it
listIntersect(l)
# ind ind.1 Freq
# 5 A B 2
# 9 A C 2
# 13 A D 0
# 10 B C 1
# 14 B D 0
# 15 C D 1
Performance seems pretty decent.
Expand the list:
L <- unlist(replicate(100, l, FALSE), recursive=FALSE)
names(L) <- make.unique(names(L))
Set up some functions to test:
fun1 <- function(l) listIntersect(l)
fun2 <- function(l) apply( combn( l , 2 ) , 2 , function(x) length( intersect( unlist( x[1]) , unlist(x[2]) ) ) )
fun3 <- function(l) {
m1 <- combn(names(l),2)
val <- sapply(split(m1, col(m1)),function(x) {x1 <- l[[x[1]]]; x2 <- l[[x[2]]]; length(intersect(x1, x2))})
Ind <- apply(m1,2,paste,collapse="int")
data.frame(Ind, val, stringsAsFactors=FALSE)
}
Check out the timings:
system.time(F1 <- fun1(L))
# user system elapsed
# 0.33 0.00 0.33
system.time(F2 <- fun2(L))
# user system elapsed
# 4.32 0.00 4.31
system.time(F3 <- fun3(L))
# user system elapsed
# 6.33 0.00 6.33
Everyone seems to be sorting the result differently, but the numbers match:
table(F1$Freq)
#
# 0 1 2 4
# 20000 20000 29900 9900
table(F2)
# F2
# 0 1 2 4
# 20000 20000 29900 9900
table(F3$val)
#
# 0 1 2 4
# 20000 20000 29900 9900

combn works with list structures as well, you just need a little unlist'ing of the result to use intersect...
# Get the combinations of names of list elements
nms <- combn( names(l) , 2 , FUN = paste0 , collapse = "" , simplify = FALSE )
# Make the combinations of list elements
ll <- combn( l , 2 , simplify = FALSE )
# Intersect the list elements
out <- lapply( ll , function(x) length( intersect( x[[1]] , x[[2]] ) ) )
# Output with names
setNames( out , nms )
#$AB
#[1] 2
#$AC
#[1] 2
#$AD
#[1] 0
#$BC
#[1] 1
#$BD
#[1] 0
#$CD
#[1] 1
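The steps above can also be folded into combn itself through its FUN argument, which receives each pair as a two-element list. A compact sketch of that variant (counts and nms are names chosen here; the "int" separator follows the format in the question):

```r
l <- list(A = c("one", "two", "three", "four"), B = c("one", "two"),
          C = c("two", "four", "five", "six"), D = c("six", "seven"))

# FUN receives each pair as a list of length 2
counts <- combn(l, 2, FUN = function(p) length(intersect(p[[1]], p[[2]])))
nms <- combn(names(l), 2, FUN = paste, collapse = "int")

counts <- setNames(as.vector(counts), as.vector(nms))
counts
# AintB AintC AintD BintC BintD CintD
#     2     2     0     1     0     1
```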

Try:
m1 <- combn(names(l),2)
val <- sapply(split(m1, col(m1)),function(x) {x1 <- l[[x[1]]]; x2 <- l[[x[2]]]; length(intersect(x1, x2))})
Ind <- apply(m1,2,paste,collapse="int")
data.frame(Ind, val, stringsAsFactors=FALSE)
# Ind val
# 1 AintB 2
# 2 AintC 2
# 3 AintD 0
# 4 BintC 1
# 5 BintD 0
# 6 CintD 1