I have a list of lists like the following:
x <- list(x = list(a = 1:10, b = 10:20), y = 4, z = list(a = 1, b = 2))
str(x)
List of 3
$ x:List of 2
..$ a: int [1:10] 1 2 3 4 5 6 7 8 9 10
..$ b: int [1:11] 10 11 12 13 14 15 16 17 18 19 ...
$ y: num 4
$ z:List of 2
..$ a: num 1
..$ b: num 2
How can I replace values in the element "a" inside the sublist "x" (x$x$a), for example replacing the 1 with 100?
My real data is very large, so I cannot do it element by element, and unlist is not a solution for me because I lose information.
Any ideas?
Operate on all a subcomponents
For the list x shown in the question we can check whether each component is a list with an a component and if so then replace 1 in the a component with 100.
f <- function(z) { if (is.list(z) && "a" %in% names(z)) z$a[z$a == 1] <- 100; z }
lapply(x, f)
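The lapply call above only looks one level down. If the real data nests deeper, a recursive variant is one option. This is a minimal sketch, assuming every element named a that should be changed is numeric; replace_in_a, old and new are hypothetical names, not part of the original answer.

```r
x <- list(x = list(a = 1:10, b = 10:20), y = 4, z = list(a = 1, b = 2))

# Walk the list recursively; in every numeric element named "a",
# replace values equal to `old` with `new`.
replace_in_a <- function(node, old = 1, new = 100) {
  if (!is.list(node)) return(node)
  nms <- names(node)
  for (i in seq_along(node)) {
    if (is.list(node[[i]])) {
      node[[i]] <- replace_in_a(node[[i]], old, new)
    } else if (!is.null(nms) && identical(nms[i], "a") && is.numeric(node[[i]])) {
      node[[i]][node[[i]] == old] <- new
    }
  }
  node
}

res <- replace_in_a(x)
res$x$a
# [1] 100   2   3   4   5   6   7   8   9  10
```

Everything that is not named a, such as y and the b elements, is returned unchanged.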
Just x component
1) If you only want to perform the replacement in the x component of x then x2 is the result.
x2 <- x
x2$x$a[x2$x$a == 1] <- 100
2) Another possibility for operating on just the x component is to use rrapply.
library(rrapply)
cond <- function(z, .xparents) identical(.xparents, c("x", "a"))
rrapply(x, cond, function(z) replace(z, z == 1, 100))
3) And another possibility is to use modifyList
modifyList(x, list(x = list(a = replace(x$x$a, x$x$a ==1, 100))))
4) within is another option.
within(x, x$a[x$a == 1] <- 100 )
4a) or iterating within:
within(x, x <- within(x, a[a == 1] <- 100) )
Here is a trick using relist + unlist:
> v <- unlist(x)
> relist(replace(v, grepl("\\.a\\d?", names(v)) & v == 1, 100), x)
$x
$x$a
[1] 100 2 3 4 5 6 7 8 9 10
$x$b
[1] 10 11 12 13 14 15 16 17 18 19 20
$y
[1] 4
$z
$z$a
[1] 100
$z$b
[1] 2
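One thing to keep in mind with the unlist/relist route: unlist coerces everything to a single type, so relist hands back values of that common type. The sketch below illustrates this on the question's data. The anchored pattern "\\.a\\d*$" is my own slight tightening of the regex above, not the original answer's.

```r
x <- list(x = list(a = 1:10, b = 10:20), y = 4, z = list(a = 1, b = 2))

v <- unlist(x)
head(names(v), 3)
# [1] "x.a1" "x.a2" "x.a3"

# match names ending in ".a" optionally followed by an index
is_a <- grepl("\\.a\\d*$", names(v))
res <- relist(replace(v, is_a & v == 1, 100), x)

typeof(x$x$b)    # integer in the original list
typeof(res$x$b)  # double after the round trip through unlist
```

With purely numeric data this is usually harmless, but a list that also holds character elements would come back as all character.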
I have a number of dfs to which I want to add a column.
For the sake of a minimal reproducible example, these dfs are called df_1, df_2, df_3...
for (i in 1:10) {
assign(paste("df_",i,sep = ""),data.frame(x = rep(1,10), y = rep(2,10)))
}
I want to add another column z to each of these dfs.
z <- rep("hello",10)
How can I accomplish this?
Using lapply I have been able to do this
q <- list()
for (i in 1:10) {
q[[i]] <- assign(paste("df_",i,sep = ""),data.frame(x = rep(1,10), y = rep(2,10)))
}
z <- rep("hello",10)
q <- lapply(q, cbind,z)
This adds the required column, but I don't know how to preserve the names. How can I still have df_1, df_2, etc., each with a new column z?
Thanks in advance
Using `[<-`().
q <- lapply(q,`[<-`, 'z', value=rep("hello", 10))
Gives
str(q)
# List of 10
# $ :'data.frame': 10 obs. of 3 variables:
# ..$ x: num [1:10] 1 1 1 1 1 1 1 1 1 1
# ..$ y: num [1:10] 2 2 2 2 2 2 2 2 2 2
# ..$ z: chr [1:10] "hello" "hello" "hello" "hello" ...
# $ :'data.frame': 10 obs. of 3 variables:
# ..$ x: num [1:10] 1 1 1 1 1 1 1 1 1 1
# ..$ y: num [1:10] 2 2 2 2 2 2 2 2 2 2
# ..$ z: chr [1:10] "hello" "hello" "hello" "hello" ...
# ...
This works because `[<-`(df_1, 'z', value=z) is equivalent to df_1['z'] <- z. (Under the hood, the call dispatches to the method base:::`[<-.data.frame`.)
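To make that equivalence concrete, here is a small check on a single data frame. It is only an illustration; a and b are hypothetical names.

```r
df_1 <- data.frame(x = rep(1, 10), y = rep(2, 10))
z <- rep("hello", 10)

# functional form: returns a modified copy, df_1 itself is untouched
a <- `[<-`(df_1, "z", value = z)

# usual replacement syntax on a copy
b <- df_1
b["z"] <- z

identical(a, b)
# [1] TRUE
names(df_1)
# [1] "x" "y"
```

Because the functional form returns the modified data frame instead of changing it in place, it can be passed straight to lapply.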
Note: You might get q a little cheaper using replicate:
n <- 3
q <- replicate(n, data.frame(x=rep(1, 3), y=rep(2, 3)), simplify=FALSE) |>
setNames(paste0('df_', 1:n))
q
# $df_1
# x y
# 1 1 2
# 2 1 2
# 3 1 2
#
# $df_2
# x y
# 1 1 2
# 2 1 2
# 3 1 2
#
# $df_3
# x y
# 1 1 2
# 2 1 2
# 3 1 2
Alternatively, you can slightly adjust your own list-method such that the names of the data frames are also stored:
q <- list()
for (i in 1:10) {
q[[paste0('df_', i)]] <- data.frame(x = rep(1,10), y = rep(2,10))
}
z <- rep("hello",10)
q <- lapply(q, cbind,z)
Edit: using list2env, mentioned by @jay.sf, the dfs are returned to the global environment.
list2env(q , .GlobalEnv)
Functionally the same as @jay.sf's answer and slightly more verbose, but perhaps easier to understand because it uses transform().
# create dataframes
for (i in 1:10) {
assign(paste("df_",i,sep = ""),data.frame(x = rep(1,10), y = rep(2,10)))
}
# store dataframes into a list (only objects starting with df_)
df_list <- mget(ls(pattern="^df_"))
# add new column to each dataframe
lapply(df_list, \(x) transform(x, z = rep("hello", 10)))
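Note that lapply returns a new list and leaves df_1, df_2, ... as they were. To end up with the original names pointing at the modified data frames, the result has to be written back, for example with list2env. The sketch below does the whole round trip. It uses a private environment env (an assumption made for illustration) so that nothing in your workspace is overwritten; with .GlobalEnv in its place it acts on the global environment.

```r
# private environment standing in for the global environment
env <- new.env()
for (i in 1:3) {
  assign(paste0("df_", i), data.frame(x = rep(1, 10), y = rep(2, 10)), envir = env)
}

# collect the data frames into a named list
df_list <- mget(ls(envir = env, pattern = "^df_"), envir = env)

# add the column; "hello" is recycled to the number of rows
df_list <- lapply(df_list, function(d) transform(d, z = "hello"))

# write the modified data frames back under their original names
invisible(list2env(df_list, envir = env))

names(env$df_1)
# [1] "x" "y" "z"
```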
I have a list with 1000 list objects. Each of them has 20 list elements, var0001:var0020 (the elements do not necessarily have the same length; for example, mylist[[]]$var0001 has length 1, but mylist[[]]$var0012 has length 1000).
I need a function that counts the number of list objects inside my list whose element var0002 equals 1, for example.
I tried things just as:
sum(mylist[[]]$var0002 == 1)
but that didn't work.
I managed to do it with some rather clumsy code:
dum <- list()
j <- 1
for(i in 1:length(mylist)){
if(mylist[[i]]$var0002 %in% 1){
dum[[j]] <- mylist[[i]]
j <- j + 1
}
}
so I would like to improve it, maybe without the looping... I am pretty sure there is a way to do it in about 1 or 2 lines.
Here's an answer with purrr.
library(purrr)
## dummy data
myList <- list(list(var1 = 1, var2 = 1), list(var1 = 0, var2 = 2), list(var1 = 1, var2 = 1))
myList %>%
purrr::map(~ .[["var2"]] == 1) %>%
unlist() %>%
sum()
Just give a minimal reproducible example.
In a tidyverse style,
mylist <-
list(
list01= list(
var01 = 1:3
,var02 = 1:5
,var03 = 1:7
)
,list02= list(
var01 = 1:9
,var02 = 1:11
,var03 = 1:13
)
)
str(mylist)
#> List of 2
#> $ list01:List of 3
#> ..$ var01: int [1:3] 1 2 3
#> ..$ var02: int [1:5] 1 2 3 4 5
#> ..$ var03: int [1:7] 1 2 3 4 5 6 7
#> $ list02:List of 3
#> ..$ var01: int [1:9] 1 2 3 4 5 6 7 8 9
#> ..$ var02: int [1:11] 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ var03: int [1:13] 1 2 3 4 5 6 7 8 9 10 ...
library(purrr)
mylist %>%
transpose() %>%
.$var02 %>%
unlist %>%
table()
#> .
#> 1 2 3 4 5 6 7 8 9 10 11
#> 2 2 2 2 2 1 1 1 1 1 1
Created on 2018-11-09 by the reprex package (v0.2.1)
The unlist function flattens all of the objects in a list into a single vector. You can then use a loop or sapply to count the number of times a value occurs within each inner list.
sapply(seq_along(mylist), function(a) sum(unlist(mylist[[a]]) == 1))
This should return a vector of length 1000 that tells you, for each list object, how many values across var0001 - var0020 are equal to 1.
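For the count that was actually asked for (how many list objects have var0002 equal to 1), a base R sketch without an explicit loop could look like the following. It assumes var0002 has length 1 in every object; the three-element mylist is made-up dummy data, and hit and n_hits are hypothetical names.

```r
# dummy data with the same shape as described in the question
mylist <- list(
  list(var0001 = 1, var0002 = 1, var0003 = 1:5),
  list(var0001 = 0, var0002 = 2, var0003 = 1:3),
  list(var0001 = 1, var0002 = 1, var0003 = 7:9)
)

# one TRUE/FALSE per list object; isTRUE() also guards against NULL or NA
hit <- vapply(mylist, function(el) isTRUE(el$var0002 == 1), logical(1))

n_hits <- sum(hit)   # number of matching list objects
dum <- mylist[hit]   # the matching objects themselves, as in the loop

n_hits
# [1] 2
```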
First, let me simplify my question. I want to extract certain ranges from a numeric vector, for example extracting 3 ranges from 1:20 at the same time:
1 < x < 5
8 < x < 12
17 < x < 20
Therefore, the expected output is 2, 3, 4, 9, 10, 11, 18, 19.
I tried to use the function findInterval() with its arguments rightmost.closed and left.open to do that, but no combination of arguments achieves the goal.
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
x[findInterval(x, v) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19
x[findInterval(x, v, rightmost.closed = T) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19 20
x[findInterval(x, v, left.open = T) %% 2 == 1]
# [1] 2 3 4 5 9 10 11 12 18 19 20
By the way, the conditions can also be a matrix like that :
[,1] [,2]
[1,] 1 5
[2,] 8 12
[3,] 17 20
I don't want to use for loop if it's not necessary.
I am grateful for any help.
I'd probably do it using purrr::map2 or Map, passing your lower and upper bounds as arguments and filtering your data with a custom function. Note that this returns a list with one vector per range:
library(purrr)
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)
map2(
lower_bounds, upper_bounds, function(lower, upper) {
x[x > lower & x < upper]
}
)
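The same idea works in base R with Map, and unlist flattens the per-range results into the single vector the question expects. A minimal sketch, where res is a hypothetical name:

```r
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)

# one vector of matches per (lower, upper) pair, then flatten
res <- unlist(Map(function(lower, upper) x[x > lower & x < upper],
                  lower_bounds, upper_bounds))
res
# [1]  2  3  4  9 10 11 18 19
```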
You may use data.table::inrange and its incbounds argument. Assuming the ranges are in a matrix 'm', as shown in your question:
m <- matrix(v, ncol = 2, byrow = TRUE)
x[data.table::inrange(x, m[ , 1], m[ , 2], incbounds = FALSE)]
# [1] 2 3 4 9 10 11 18 19
You were on the right path, and left.open indeed helps, but rightmost.closed only concerns the last interval rather than the right "side" of each interval. Hence, we need to call findInterval twice, once with the defaults and once with left.open = TRUE, and combine the results. As you figured out yourself, a neat way to do that is
x[findInterval(x, v) %% 2 == 1 & findInterval(x, v, left.open = TRUE) %% 2 == 1]
# [1] 2 3 4 9 10 11 18 19
Clearly there are alternatives. E.g.,
fun <- function(x, v)
if(length(v) > 1) v[1] < x & x < v[2] | fun(x, v[-1:-2]) else FALSE
x[fun(x, v)]
# [1] 2 3 4 9 10 11 18 19
I found an easy way just with sapply() :
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
(v.df <- as.data.frame(matrix(v, 3, 2, byrow = T)))
# V1 V2
# 1 1 5
# 2 8 12
# 3 17 20
y <- sapply(x, function(x){
ind <- (x > v.df$V1 & x < v.df$V2)
if(any(ind)) x else NA
})
y[!is.na(y)]
# [1] 2 3 4 9 10 11 18 19
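The per-element sapply can also be replaced by a fully vectorized comparison with outer, which fits the matrix form of the ranges mentioned in the question. This is a sketch of that alternative, not the approach above; m, inside and res are hypothetical names.

```r
x <- 1:20
m <- matrix(c(1, 5, 8, 12, 17, 20), ncol = 2, byrow = TRUE)

# inside[i, j] is TRUE when x[i] lies strictly within range j
inside <- outer(x, m[, 1], ">") & outer(x, m[, 2], "<")

# keep the values that fall into at least one range
res <- x[rowSums(inside) > 0]
res
# [1]  2  3  4  9 10 11 18 19
```

The logical matrix has length(x) * nrow(m) entries, so for very long vectors combined with many ranges the findInterval approach may be lighter on memory.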
I need help defining a function that creates a vector in a database where, for each row, the function looks at another column in that database, searches for that value in a designated column of a separate database, creates a subset of that second database consisting of all matching rows, sums a separate column of that new subset, and returns that value to the corresponding row of the new column in the original database.
In other words, I have a data frame that looks something like this:
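ID <- rep(c('a', 'b', 'c', 'd', 'e'), 4)
M <- 1:20
# stringsAsFactors = TRUE reproduces the factor ID column shown below in any R version
df <- data.frame(ID, M, stringsAsFactors = TRUE)
df$M <- as.numeric(df$M)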
> df
ID M
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 a 6
7 b 7
8 c 8
9 d 9
10 e 10
11 a 11
12 b 12
13 c 13
14 d 14
15 e 15
16 a 16
17 b 17
18 c 18
19 d 19
20 e 20
> str(df)
'data.frame': 20 obs. of 2 variables:
$ ID: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5 1 2 3 4 5 ...
$ M : num 1 2 3 4 5 6 7 8 9 10 ...
I would like to create a new data frame, Z, such that Z <- data.frame(cbind(X, Y)) where:
X <- as.character(unique(df$ID))
> X
[1] "a" "b" "c" "d" "e"
and Y is a vector of the sum of all a's, sum of all b's, sum of all c's, etc...
So, Y should be equal to c(34, 38, 42, 46, 50) and my final result should be:
> Z
X Y
1 a 34
2 b 38
3 c 42
4 d 46
5 e 50
> str(Z)
'data.frame': 5 obs. of 2 variables:
$ X: chr "a" "b" "c" "d" ...
$ Y: num 34 38 42 46 50
To do this, I've tried first turning X into a data frame (is it easier to work with as a data table?):
> Z <- data.frame(X)
> Z
X
1 a
2 b
3 c
4 d
5 e
> str(Z)
'data.frame': 5 obs. of 1 variable:
$ X: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
and then defining Y as Z$Y <- sum(df[df$ID == Z$X, 2]) but I don't get unique values:
> Z
X Y
1 a 210
2 b 210
3 c 210
4 d 210
5 e 210
I've also tried defining the function f1() like so:
f1 <- function(v, w, x, y, z){sum(v[v$w == x$y, z])}
but that gets me:
> f1(df, 'ID', Z, 'X', 'M')
[1] 0
I have found a function from another post on this forum that does something similar:
f1 <- function(df, cols, match_with, to_x = 50){
df[cols] <- lapply(df[cols], function(i)
ifelse(grepl(to_x, match_with, fixed = TRUE), 'MID',
i))
return(df)
}
This looks for the value "50" in the match_with column and returns the value "MID" to that row of the column designated by cols, provided both columns in the same designated data base df. So, I would need to replace to_x = 50 with something that, instead of looking for the fixed value "50," looks for whatever value is in the column Z$X and, instead of returning the fixed value "MID," returns the sum of the values df[df$ID == Z$X, df$M]. I've attempted these changes myself by writing variations of the following:
f1 <- function(df, cols, match_with, to_x = df[ , 1], x){
df[cols] <- lapply(df[cols], function(i)
ifelse(grepl(to_x, match_with, fixed = TRUE), sum(x),
i))
return(df)
}
but, so far, none of my variations have produced the desired results. This one gave me:
> f1(Z, df, cols = c('Y'), match_with = df$ID, x = df$M)
X Y
1 a 210
2 b 210
3 c 210
4 d 210
5 e 210
Warning messages:
1: In grepl(to_x, match_with, fixed = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
2: In `[<-.data.frame`(`*tmp*`, cols, value = list(Y = c(210, 210, :
replacement element 1 has 20 rows to replace 5 rows
It seems to be summing the entirety of df$M instead of the subsets where df$ID == Z$X. In other variations it seemed to have problems referencing a column in a second data frame.
I am somewhat new to R and have almost no experience writing user-defined functions (as you probably could tell by this question). Any help would be very much appreciated!
Never mind, y'all, I think I got it!
> f1 <- function(col1, col2, df2, to_add){
+ lapply(col1, function(i){
+ df2$x <- grepl(i, col2, fixed = TRUE)
+ df3 <- df2[df2$x == TRUE, to_add]
+ sum(df3, na.rm = TRUE)
+ })}
> Z$Y <- f1(Z$X, df$ID, df, c('M'))
> Z
X Y
1 a 34
2 b 38
3 c 42
4 d 46
5 e 50
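For reference, grouped sums like this are built into base R, so a custom function is not strictly needed. The sketch below rebuilds the example data and uses aggregate and tapply. It is an alternative to the f1 approach above, not a rewrite of it. Two differences are worth knowing: f1 returns a list (because of lapply), whereas these return plain numeric values, and grepl with fixed = TRUE matches substrings, whereas grouping by ID matches whole values.

```r
df <- data.frame(ID = rep(c("a", "b", "c", "d", "e"), 4), M = 1:20)

# one row per ID with the sum of M
Z <- aggregate(M ~ ID, data = df, FUN = sum)
names(Z) <- c("X", "Y")
Z$Y
# [1] 34 38 42 46 50

# the same sums as a named vector
sums <- tapply(df$M, df$ID, sum)
```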
In particular, the removing attributes note in ?c
> x <- 1
> y <- as.integer(1)
> str(x);str(y)
num 1
int 1
> identical(x, y)
[1] FALSE
> str(c(x, y))
num [1:2] 1 1
> tmp <- c(x, y)
> identical(tmp[1], tmp[2])
[1] TRUE
Another example (but not as relevant)
> tmp <- c(1, 3, 2)
> sort(tmp)
[1] 1 2 3
> tmp <- factor(tmp, levels = ordered(tmp))
> sort(tmp)
[1] 1 3 2
Levels: 1 3 2
> sort(rep(tmp, 2))
[1] 1 1 3 3 2 2
Levels: 1 3 2
> tmp1 <- c(tmp, tmp)
> sort(tmp1)
[1] 1 1 2 2 3 3
I ask because I have a function that takes ... and combines 11ty-many objects (as in tmp <- c(...)) and performs identical on each pair, and it currently (correctly) says 1 and as.integer(1) are identical, which is not what I want.
Atomic vectors cannot store elements of different types. If you combine objects of different types using c, all are coerced to the most general type (character > double > integer > logical).
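A quick way to see the coercion order is to inspect typeof on a few mixed combinations:

```r
t1 <- typeof(c(TRUE, 1L))  # logical + integer   -> "integer"
t2 <- typeof(c(1L, 1))     # integer + double    -> "double"
t3 <- typeof(c(1, "a"))    # double  + character -> "character"
c(t1, t2, t3)
# [1] "integer"   "double"    "character"
```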
If you want to store objects of different modes, you can use lists. Here is an illustration:
Atomic vectors:
x <- 1
y <- 1L
str(x); str(y)
# num 1
# int 1
str(c(x, y))
# num [1:2] 1 1
Combine both values in a list:
z <- list(x, y)
str(z)
# List of 2
# $ : num 1
# $ : int 1
identical(z[[1]], z[[2]])
# [1] FALSE
Store objects in a one-element list and combine them using c:
xList <- list(x)
yList <- list(y)
zList <- c(xList, yList)
str(zList)
# List of 2
# $ : num 1
# $ : int 1
identical(zList[[1]], zList[[2]])
# [1] FALSE
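Applied to the function described in the question, this suggests collecting the arguments with list(...) instead of c(...). The following is a minimal sketch under that assumption. pairwise_identical is a hypothetical name, and it compares each argument with the next one, which may or may not be the pairing the real function needs.

```r
pairwise_identical <- function(...) {
  args <- list(...)  # list() keeps each argument's own type
  if (length(args) < 2) return(logical(0))
  vapply(seq_len(length(args) - 1),
         function(i) identical(args[[i]], args[[i + 1]]),
         logical(1))
}

r1 <- pairwise_identical(1, 1L, 1L)
r1
# [1] FALSE  TRUE
r2 <- pairwise_identical(1, 1)
r2
# [1] TRUE
```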