How to drop element from named list without assignment? - r

Suppose I have a named list like
somelist <- list(a = 1, b = 5, c = 3)
I know that I can drop somelist$b, say, by assigning NULL to it:
somelist$b <- NULL
I suppose this is fine for interactive work, but not so much for programmatic work, because it forces the creation of otherwise superfluous variables.
For example, suppose that foo(42) evaluates to a list similar to somelist above, and that I want to pass the list resulting from dropping the b element from foo(42) to some other function bar. In this case, applying the method shown above would require the following:
superfluous.variable <- foo(42)
superfluous.variable$b <- NULL
bar(superfluous.variable)
rm(superfluous.variable)
I'm looking for a way to pass to bar the modified results from foo that does not require these superfluous assignments. Ideally, the four lines above would collapse to a single line such as:
bar(drop.item.from.list(foo(42), item.to.drop = "b"))
Does R already have something like the hypothetical drop.item.from.list function above?

You can do that removal on the fly with replace()
replace(somelist, "b", NULL)
# $a
# [1] 1
#
# $c
# [1] 3
It works for multiple variables as well ...
replace(somelist, c("a", "b"), NULL)
# $c
# [1] 3
So just wrap that in bar() and the original list remains intact.
Note: I am not exactly sure what you are doing with foo(42) but you state that the resulting list takes a similar structure, so this should be fine for that.
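A minimal sketch of the one-line call, assuming stand-in definitions for foo and bar (both are hypothetical here, since the question does not define them):

```r
# Hypothetical stand-ins for the question's foo() and bar().
foo <- function(n) list(a = 1, b = 5, c = n)
bar <- function(x) names(x)

somelist <- list(a = 1, b = 5, c = 3)

# replace() returns a modified copy, so no intermediate variable is needed.
result <- bar(replace(foo(42), "b", NULL))

# Applying it to somelist leaves somelist itself untouched.
dropped <- replace(somelist, "b", NULL)
```

Here result holds the names that bar received, and somelist still has all three elements after the call.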

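We can use setdiff on the names and subset the list by the names that remain:
bar(foo(42)[setdiff(names(somelist), "b")])
This assumes that foo(42) has the same names as somelist. The same subsetting applied to somelist itself gives: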
somelist[setdiff(names(somelist), "b")]
#$a
#[1] 1
#$c
#[1] 3
We can also use this to drop multiple elements:
somelist[setdiff(names(somelist), c("a", "b"))]
#$c
#[1] 3
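The setdiff idea can be wrapped in a small helper that takes the names from its own argument, so it does not depend on a separate list such as somelist. This is only a sketch: drop_items is a hypothetical name, not a base R function, and foo is a stand-in for the question's function.

```r
# Hypothetical helper (not part of base R): drop elements by name.
drop_items <- function(x, items) x[setdiff(names(x), items)]

# Stand-in for the question's foo(), assumed to return a named list.
foo <- function(n) list(a = 1, b = 5, c = n)

# foo(42) is evaluated once and never assigned to a variable.
kept <- drop_items(foo(42), "b")
```

With a helper like this, the call from the question becomes bar(drop_items(foo(42), "b")).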

Related

Split dataframe columns into vectors in R

I have a dataframe as such:
Number <- c(1,2,3)
Number2 <- c(10,12,14)
Letter <- c("A","B","C")
df <- data.frame(Number,Number2,Letter)
I would like to split the df into its respective three columns, each one becoming a vector with the respective column name. In essence, the output should look exactly like the original three input vectors in the above example.
I have tried the split function and also using for loop, but without success.
Any ideas? Thank you.
We can use unclass, because a data.frame is a list with additional attributes. Unclassing removes the data.frame class:
unclass(df)
Or another option is asplit with MARGIN specified as 2
asplit(df, 2)
NOTE: Both of them return a named list. If we intend to create new objects in the global env, use list2env (not recommended though)
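One difference between the two options is worth checking before relying on asplit. As far as I understand it, asplit converts a data frame to a matrix first, so columns of mixed type end up sharing a single type, while unclass keeps each column as it is. A small sketch to check this on the example data:

```r
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"))

# unclass() keeps each column's own type.
type_unclass <- typeof(unclass(df)[[1]])

# asplit() goes through as.matrix(), so the numeric column is coerced
# to match the character column.
type_asplit <- typeof(asplit(df, 2)[[1]])
```

If the original types matter, unclass (or as.list) is probably the safer choice.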
We can use c or as.list
> c(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
> as.list(df)
$Number
[1] 1 2 3
$Number2
[1] 10 12 14
$Letter
[1] "A" "B" "C"
Assuming you are trying to create these as vectors in the global environment, use list2env:
df <- data.frame(Number = c(1, 2, 3),
Number2 = c(10, 12, 14),
Letter = c("A", "B", "C"))
list2env(df, .GlobalEnv)
## <environment: R_GlobalEnv>
ls()
## [1] "df" "Letter" "Number" "Number2"
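If writing into the global environment feels too invasive, list2env can also fill a separate environment. A minimal sketch, where cols is just an illustrative name:

```r
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"))

# Put the columns into a fresh environment instead of .GlobalEnv.
cols <- list2env(df, envir = new.env())

col_names <- ls(cols)   # names of the objects that were created
number2 <- cols$Number2 # access a single column as a vector
```

This keeps the workspace clean while still giving one object per column.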
list2env is clearly the easiest way, but if you want to do it with a for loop it can also be achieved.
The "tricky" part is to make a new vector based on the column names inside the for loop. If you just write
names(df[i]) <- input
a vector will not be created.
A workaround is to use paste to create a string with the new vector name and what should be in it, then use eval(parse(text = ...)) to evaluate this expression.
Maybe not the most elegant solution, but seems to work.
for (i in colnames(df)){
vector_name <- names(df[i])
expression_to_be_evaluated <- paste(vector_name, "<- df[[i]]")
eval(parse(text=expression_to_be_evaluated))
}
> Letter
[1] A B C
Levels: A B C
> Number
[1] 1 2 3
> Number2
[1] 10 12 14
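The loop can also be written with assign(), which creates a variable from a name held in a string and avoids building code with paste. This is a sketch of an alternative, not the answer's method; the target environment is an assumption made here so the example does not touch the global environment.

```r
df <- data.frame(Number = c(1, 2, 3),
                 Number2 = c(10, 12, 14),
                 Letter = c("A", "B", "C"))

# Assumption: collect the vectors in a separate environment.
# Use envir = .GlobalEnv to mirror the loop in the answer instead.
target <- new.env()

for (nm in colnames(df)) {
  # assign() binds the column to a variable named after the column.
  assign(nm, df[[nm]], envir = target)
}

created <- ls(target)
```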

How to use grep to search for pattern matches within a list of data frames using a second list of character vectors in R

I have two lists in R. One is a list of data frames with rows that contain strings (List 1). The other is a list (of the same length) of characters (List 2). I would like to go through the lists in a parallel fashion taking the character string from List 2 and searching for it to get its position (using grep) in the data frame at the corresponding element in List 1. Here is a toy example to show what my lists look like:
List1 <- list(data.frame(a = c("other","other","dog")),
data.frame(a = c("cat","other","other")),
data.frame(a = c("other","other","bird")))
List2 <- list("a" = c("dog|xxx|xxx"),
"a" = c("cat|xxx|xxx"),
"a" = c("bird|xxx|xxx"))
The output I would like to get would be a list of the position in each data frame in List 1 of the pattern match i.e. in this example the positions would be 3, 1 & 3. So the list would be:
[[1]]
[1] 3
[[2]]
[1] 1
[[3]]
[1] 3
I cannot seem to figure out how to do this.
I tried lapply:
NewList1 <- lapply(1:length(List1),
function(x) grep(List2[[x]]))
But that does not work. I also tried purrr:map2:
NewList2<-map2(List2, List1, grep(List2$A, List1))
This also does not work. I would be very grateful for any suggestions as to how to fix this. Many thanks to anyone willing to wade in!
Try Map + unlist
> Map(grep, List2, unlist(List1, recursive = FALSE))
$a
[1] 3
$a
[1] 1
$a
[1] 3
Using Map you can do -
Map(function(x, y) grep(y, x$a), List1, List2)
#[[1]]
#[1] 3
#[[2]]
#[1] 1
#[[3]]
#[1] 3
The map2 attempt was close, but you need to refer to the two lists as .x and .y inside the function.
purrr::map2(List2, List1, ~grep(.x, .y$a))
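For completeness, the lapply attempt from the question can be repaired in the same spirit: grep needs both the pattern and the column to search. A sketch using the toy data from the question:

```r
List1 <- list(data.frame(a = c("other", "other", "dog")),
              data.frame(a = c("cat", "other", "other")),
              data.frame(a = c("other", "other", "bird")))
List2 <- list(a = "dog|xxx|xxx",
              a = "cat|xxx|xxx",
              a = "bird|xxx|xxx")

# Walk both lists by position: pattern from List2, column a from List1.
positions <- lapply(seq_along(List1),
                    function(i) grep(List2[[i]], List1[[i]]$a))
```

Because the index comes from seq_along, the result is an unnamed list, matching the output requested in the question.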

Handling NULL after if() inside apply

I have a situation where apply returns a list with many NULL entries. The code I use is quite long, so I reproduce the problem with a simple example.
# Generate data.
df <- data.frame(a= c(1,2, NA, 6),
b= c(1, 7, 3, 7))
# Return only columns that have NAs.
my_list <- apply(df, 2, function(col_i){
if(any(is.na(col_i))){
return(col_i)
}})
Running this gives us
my_list
$a
[1] 1 2 NA 6
$b
NULL
My problem is that I get many NULL entries, so I cannot work with the results. How can I (a) stop apply from returning the NULL entries or (b) discard all NULL entries in my_list?
So the expected output is
my_list
$a
[1] 1 2 NA 6
Again, the actual code I use is more complex than that. So please do not suggest using something like df[ , !complete.cases(t(df)), drop= FALSE] which also returns columns that contain any missings. My question is not about how to get columns with any missings but how to handle NULL entries in apply. I want to keep the if part inside apply.
Every function has to return something, and an if without an else returns NULL when the condition is FALSE. So you can't avoid the NULL entries, but you can remove them afterwards:
Filter(length, apply(df, 2, function(col_i) if(any(is.na(col_i))) return(col_i)))
#$a
#[1] 1 2 NA 6
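Filter(length, ...) drops every zero-length entry, not only NULL. If the real code can legitimately return empty vectors that should be kept, a stricter variant is to test for NULL explicitly. A sketch on the example data:

```r
df <- data.frame(a = c(1, 2, NA, 6),
                 b = c(1, 7, 3, 7))

# Same apply() call as in the question, with the if kept inside.
my_list <- apply(df, 2, function(col_i) {
  if (any(is.na(col_i))) {
    return(col_i)
  }
})

# Keep only the entries that are not NULL.
kept <- Filter(Negate(is.null), my_list)
```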

ifelse conditional assignment of tibbles [duplicate]

I've found R's ifelse statements to be pretty handy from time to time. For example:
ifelse(TRUE,1,2)
# [1] 1
ifelse(FALSE,1,2)
# [1] 2
But I'm somewhat confused by the following behavior.
ifelse(TRUE,c(1,2),c(3,4))
# [1] 1
ifelse(FALSE,c(1,2),c(3,4))
# [1] 3
Is this a design choice that's above my paygrade?
The documentation for ifelse states:
ifelse returns a value with the same
shape as test which is filled with
elements selected from either yes or
no depending on whether the element
of test is TRUE or FALSE.
Since you are passing test values of length 1, you are getting results of length 1. If you pass longer test vectors, you will get longer results:
> ifelse(c(TRUE, FALSE), c(1, 2), c(3, 4))
[1] 1 4
So ifelse is intended for the specific purpose of testing a vector of booleans and returning a vector of the same length, filled with elements taken from the (vector) yes and no arguments.
It is a common confusion, because of the function's name, to use this when really you want just a normal if () {} else {} construction instead.
I bet you want a simple if statement instead of ifelse - in R, if isn't just a control-flow structure, it can return a value:
> if(TRUE) c(1,2) else c(3,4)
[1] 1 2
> if(FALSE) c(1,2) else c(3,4)
[1] 3 4
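The difference is easy to see by running both forms with the same length-one condition. A small sketch, where flag is just an illustrative variable:

```r
flag <- TRUE

# ifelse() returns something shaped like the test, here length 1.
from_ifelse <- ifelse(flag, c(1, 2), c(3, 4))

# if/else returns whichever branch was chosen, whole.
from_if <- if (flag) c(1, 2) else c(3, 4)
```

So from_ifelse contains only the first element of c(1, 2), while from_if contains the full vector.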
Note that you can circumvent the problem if you assign the result inside the ifelse:
ifelse(TRUE, a <- c(1,2), a <- c(3,4))
a
# [1] 1 2
ifelse(FALSE, a <- c(1,2), a <- c(3,4))
a
# [1] 3 4
use `if`, e.g.
> `if`(T,1:3,2:4)
[1] 1 2 3
yeah, I think ifelse() is really designed for when you have a big long vector of tests and want to map each to one of two options. For example, I often do colors for plot() in this way:
plot(x,y, col = ifelse(x>2, 'red', 'blue'))
If you had a big long vector of tests but wanted pairs for outputs, you could use sapply() or plyr's llply() or something, perhaps.
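A rough sketch of that last idea in base R, using lapply so that each test maps to a whole pair (the tests vector is made up for illustration):

```r
tests <- c(TRUE, FALSE, TRUE)

# One if/else per test; each result is a full pair, collected in a list.
pairs <- lapply(tests, function(t) if (t) c(1, 2) else c(3, 4))
```

sapply would work too and would simplify the pairs into a matrix with one column per test.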
Sometimes the user just needs a switch statement instead of an ifelse. In that case:
condition <- TRUE
switch(2-condition, c(1, 2), c(3, 4))
#### [1] 1 2
(which is another syntax option of Ken Williams's answer)
Here is an approach similar to that suggested by Cath, but it can work with existing pre-assigned vectors
It is based around using the get() like so:
a <- c(1,2)
b <- c(3,4)
get(ifelse(TRUE, "a", "b"))
# [1] 1 2
In your case, using if_else from dplyr would have been helpful: if_else is more strict than ifelse, and throws an error for your case:
library(dplyr)
if_else(TRUE,c(1,2),c(3,4))
#> `true` must be length 1 (length of `condition`), not 2
Found on everydropr:
ifelse(rep(TRUE, length(c(1,2))), c(1,2),c(3,4))
#>[1] 1 2
You can replicate the condition to the length of the desired output, so that ifelse returns a result of that length.
