handling sequential tasks with purrr


I would like to take a list of objects and build a single object out of all of them. The actual use case is to combine multiple Seurat objects into a single object. Currently I use a for loop, but I was curious whether I could use purrr::map. To make the problem simpler, let's just concatenate one element of each list. Try not to get too cute with the result, because the true problem is more difficult (a more complex function).
w1 = list(a="first",b="This")
w2 = list(a="second",b="is")
w3 = list(a="third",b="the")
w4 = list(a="fourth",b="desired results")
The desired result would be "This is the desired results".
list(w1,w2,w3,w4) %>% map(paste,.$b," ")
gives
[[1]] [1] "This "
[[2]] [1] "is "
[[3]] [1] "the "
[[4]] [1] "desired results "
I would like to save the result of the previous iteration and pass it as a parameter to the function in the next iteration.
Essentially, I would like to replace the following line with a functional.
y=NULL;for (x in list(w1,w2,w3,w4)){ y=ifelse(is.null(y),x$b,paste0(y," ",x$b))}
#y
#"This is the desired results"

library(purrr)
list(w1, w2, w3, w4) %>%
accumulate(~paste(.x, .y[2][[1]]), .init = '') %>%
tail(1) %>%
substr(2, nchar(.))
# [1] "This is the desired results"

With do.call and lapply in Base R:
do.call(paste, lapply(list(w1,w2,w3,w4), `[[`, "b"))
# [1] "This is the desired results"

I would recommend this approach using purrr:
list(w1,w2,w3,w4) %>%
map_chr("b") %>%
paste(collapse=" ")
We can pass a string to map() to return just that named element, and since we are expecting only character values, we can use map_chr to get just a vector of character values rather than a list. Finally just pipe that to paste(collapse=) to turn it into just one string.
But more generally if you want to collapse incrementally, you can use reduce.
list(w1, w2, w3, w4) %>%
map_chr("b") %>%
reduce(~paste(.x, .y))
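Since the real problem involves merging whole objects rather than strings, the reduce idea can be sketched on the full list elements. This is only an illustration under assumptions: combine_two is a hypothetical stand-in for whatever two-argument merge function the real use case needs, and it is not taken from the question.

```r
library(purrr)

w1 <- list(a = "first",  b = "This")
w2 <- list(a = "second", b = "is")
w3 <- list(a = "third",  b = "the")
w4 <- list(a = "fourth", b = "desired results")

# Hypothetical combiner (an assumption for illustration): takes the
# accumulated object and the next object, and returns one merged object.
combine_two <- function(acc, x) {
  list(a = c(acc$a, x$a), b = paste(acc$b, x$b))
}

# reduce() calls combine_two(acc, x) for each element in turn, carrying
# the result of the previous step forward as `acc`.
combined <- reduce(list(w1, w2, w3, w4), combine_two)
combined$b
```

The same pattern works in base R as Reduce(combine_two, list(w1, w2, w3, w4)); in both cases the first element serves as the starting value unless an initial value is supplied.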

Related

How to randomly reshuffle letters in words

I am trying to make a word scrambler in R. So I have put some words in a collection and tried to use strsplit() to split the letters of each word in the collection.
But I don't understand how to jumble the letters of a word and merge them back into one word in R. Does anyone know how I can solve this?
This is what I have done so far:
[Image: screenshot of the attempted code, not available]

Once you've split the words, you can use sample() to rescramble the letters, and then paste0() with collapse="", to concatenate back into a 'word'
lapply(words, function(x) paste0(sample(strsplit(x, split="")[[1]]), collapse=""))

You can use the stringi package if you want:
> stringi::stri_rand_shuffle(c("hello", "goodbye"))
[1] "oellh" "deoygob"

Here's a one-liner:
lapply(lapply(strsplit(strings, ""), sample), paste0, collapse = "")
[[1]]
[1] "elfi"
[[2]]
[1] "vleo"
[[3]]
[1] "rmsyyet"
Use unlist to get rid of the list:
unlist(lapply(lapply(strsplit(strings, ""), sample), paste0, collapse = ""))
Data:
strings <- c("life", "love", "mystery")

You can use the sample function for this.
Here is an example of doing it for a single word. You can use this within your for loop:
yourword <- "hello"
# strsplit() returns a list with one character vector in it.
# We only want to work with the vector, not the list, so we extract the first
# (and only) element with "[[1]]"
jumble <- strsplit(yourword, "")[[1]]
jumble <- sample(jumble,                 # sample random elements from jumble
                 size = length(jumble),  # as many times as the length of jumble,
                                         # ergo all letters
                 replace = FALSE         # do not sample an element multiple times
)
restored <- paste0(jumble,
                   collapse = ""         # join the letters with no separator
)
As the answer from langtang suggests, you can use the apply family for this, which is more efficient. But maybe this answer helps you understand what R is actually doing here.
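To connect the step-by-step version with the apply family, the single-word steps can be wrapped in a small function and applied to every word. This is a minimal sketch: scramble_word is a hypothetical helper name, and the words vector is example data borrowed from another answer.

```r
# Hypothetical helper: scramble the letters of one word.
scramble_word <- function(word) {
  letters_in_word <- strsplit(word, "")[[1]]      # split into single characters
  paste0(sample(letters_in_word), collapse = "")  # shuffle and glue back together
}

words <- c("life", "love", "mystery")  # example data (assumption)

set.seed(42)  # only so that repeated runs give the same shuffle
# vapply() guarantees one character string per input word.
scrambled <- vapply(words, scramble_word, character(1), USE.NAMES = FALSE)
scrambled
```

vapply is used here instead of lapply so the result is a plain character vector rather than a list, which avoids the extra unlist step.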

keep duplicates using `make_clean_names` in R janitor package

I am trying to clean a character column using the make_clean_names function from the janitor package in R. I need to keep the duplicates in this case and not add a numeric suffix to them. Is this possible? My code is like this:
x <- c(' x y z', 'xyz', 'x123x', 'xy()','xyz','xyz')
janitor::make_clean_names(x)
[1] "x_y_z" "xyz" "x123x" "xy" "xyz_2" "xyz_3"
janitor::make_clean_names(x, unique_sep = '.')
[1] "x_y_z" "xyz" "x123x" "xy" "xyz.1" "xyz.2"
janitor::make_clean_names(x, unique_sep = NULL)
[1] "x_y_z" "xyz" "x123x" "xy" "xyz_2" "xyz_3"
Using unique_sep = NULL doesn't seem to work. Is there any other way to keep the duplicated values as they are?
Desired Output:
[1] "x_y_z" "xyz" "x123x" "xy" "xyz" "xyz"
I know how to use regular expressions to do this. Just searching for a shortcut.
PS: I know this function is created to clean names of a data.frame, I am trying to apply this to a different use case. This functionality might help a lot in cleaning character columns.

You can use sapply to go through the vector elements one by one and thus avoid adding numeric suffixes to duplicates:
sapply(x, janitor::make_clean_names, USE.NAMES = FALSE)
[1] "x_y_z" "xyz" "x123x" "xy" "xyz" "xyz"

Unfortunately no, it's not possible. If you look at the code for make_clean_names you'll see it ends with this:
# Handle duplicated names - they mess up dplyr pipelines. This appends the
# column number to repeated instances of duplicate variable names.
while (any(duplicated(cased_names))) {
  dupe_count <-
    vapply(
      seq_along(cased_names), function(i) {
        sum(cased_names[i] == cased_names[1:i])
      },
      1L
    )
  cased_names[dupe_count > 1] <-
    paste(
      cased_names[dupe_count > 1],
      dupe_count[dupe_count > 1],
      sep = "_"
    )
}
I think you're on the right track passing the unique_sep argument through to the underlying function that make_clean_names uses, snakecase::to_any_case. But that while loop, recently introduced to ensure there are never duplicated names resulting from make_clean_names, will always deduplicate at the end.
You could try writing your own function based on the first part of make_clean_names, leaving out the loop, or you could perhaps use snakecase::to_any_case directly.

How to drop substring from variable names?

I have the following names of variables:
vars <- c("var1.caps(12, For]","var2(5,For]","var3.tree.(15, For]","var4.caps")
I need to clean these names in order to get the following result:
clean_vars <- c("var1.caps","var2","var3.tree.","var4.caps")
So, basically I would like to drop (..].
Is there any automated way to do it in R?
I was trying to adapt str_replace(vars, pattern, ""), but not sure how to make pattern flexible because it could have different values between ( and ].

gsub("\\(.*\\]","",vars)
[1] "var1.caps" "var2" "var3.tree." "var4.caps"

Using stringr and purrr:
stringr::str_split(vars, "\\(") %>% purrr::map(., 1) %>% unlist()
[1] "var1.caps" "var2" "var3.tree." "var4.caps"

Another option of using gsub
> gsub("(?<=)\\(.*\\]","\\1",vars,perl = T)
[1] "var1.caps" "var2" "var3.tree." "var4.caps"

Match the first ( in the string (\\( in the regex) and everything that comes after it (.+), then replace the match with nothing (""):
sub("\\(.+", "", vars)
# [1] "var1.caps" "var2" "var3.tree." "var4.caps"
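The patterns above remove everything from the first ( onward. If a name could contain other parentheses that should be kept, a stricter variant is to remove only a (...] group at the very end of the string. This is a sketch of that alternative, not something the original answers proposed:

```r
vars <- c("var1.caps(12, For]", "var2(5,For]", "var3.tree.(15, For]", "var4.caps")

# \\(     a literal opening parenthesis
# [^]]*   any run of characters that are not "]"
# \\]$    a literal "]" at the end of the string
cleaned <- sub("\\([^]]*\\]$", "", vars)
cleaned
```

Names without a trailing (...] group, such as "var4.caps", are returned unchanged because the pattern does not match them.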

R subsetting list "incorrect number of dimensions"

I am working with some text in a list. The text is separated by CR/LF, so I split the string on that. Then I have to clean up the list to make it usable.
library(tidyverse)
my_list <-("abc\r\ndef\r\nghi\r\njkl\r\n")
# The str_split gives me a list that has an empty element at the end. Why?
split_list <- str_split(my_list, "\r\n")
[[1]]
[1] "abc" "def" "ghi" "jkl" ""
I need to remove the first two elements and then sort in reverse order:
split_list %>%
split_list[[1]][-1:-2] %>%
sort(split_list, decreasing = TRUE)
But it fails with: Error in .[split_list[[1]], -1:-2] : incorrect number of dimensions
I've read so many discussions of subsetting but they all seem more complicated than my example. I clearly don't understand this yet. Thank you for your suggestions!

You could do:
library(magrittr)
split_list %>% .[[1]] %>% tail(-2) %>% sort(decreasing = TRUE)
#[1] "jkl" "ghi" ""

Here's a way of using "[[" and "[" inside the tidyverse framework. They are both functions, so you need to wrap their names in backticks when they are used in this manner. (Your error arises from referring to the data object twice; you should not need to refer to split_list more than once.) The pipe implicitly passes the leading data object along as it gets progressively modified by the sequence of functions, so the functions behave somewhat like infix functions in base R:
split_list %>%
`[[`(1) %>% # pulls the first element from split_list
`[`(-1:-2) %>% # both extraction functions are called by their back-ticked names
sort(decreasing = TRUE)
[1] "jkl" "ghi" ""
It's actually quite similar to the arrangement you could use in the base R use of these functions which are also infix:
sort( split_list
[[ 1]]
[ (-1:-2)],
decreasing = TRUE)
[1] "jkl" "ghi" ""

If you are only working on one vector such that str_split only ever returns a list with one element containing the split vector, you could wrap your str_split() inside the unlist() function to obtain the vector of split elements directly. It could look something like this:
sort(unlist(str_split(my_list, "\r\n"))[-c(1:2)], decreasing = TRUE)
Above I also subset the unlisted vector to remove the first two elements and then wrap the entire expression inside the sort() function with decreasing = TRUE.
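On the question of why the trailing empty element appears: the input ends with the "\r\n" delimiter, and str_split keeps the empty piece that follows the last delimiter. Base R's strsplit drops that final empty piece, so one possible sketch that avoids the empty string altogether is:

```r
my_list <- "abc\r\ndef\r\nghi\r\njkl\r\n"

# strsplit() does not return a trailing "" when the string ends with
# the delimiter; fixed = TRUE treats "\r\n" as a literal string.
parts <- strsplit(my_list, "\r\n", fixed = TRUE)[[1]]

# Drop the first two elements, then sort in reverse order.
result <- sort(parts[-(1:2)], decreasing = TRUE)
result
```

If you prefer to stay with str_split, filtering the vector with something like x[x != ""] before sorting should have the same effect.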

R List with sub-lists: Extract all elements that match a rule into array

I have an R list of objects which are themselves lists of various types. I want the "cost" value for all objects whose category is "internal". What's a good way of achieving this?
If I had a data frame I'd have done something like
my_dataframe$cost[my_dataframe$category == "internal"]
What's the analogous idiom for a list?
mylist<-list(list(category="internal",cost=2),
list(category="bar",cost=3),list(category="internal",cost=4),
list(category='foo',age=56))
Here I'd want to get c(2,4). Subsetting like this does not work:
mylist[mylist$category == "internal"]
I can do part of this by:
temp<-sapply(mylist,FUN = function(x) x$category=="internal")
mylist[temp]
[[1]]
[[1]]$category
[1] "internal"
[[1]]$cost
[1] 2
[[2]]
[[2]]$category
[1] "internal"
[[2]]$cost
[1] 4
But how do I get just the costs so that I can (say) sum them up etc.? I tried this but does not help much:
unlist(mylist[temp])
category cost category cost
"internal" "2" "internal" "4"
Is there a neat, compact idiom for doing what I want?

The idiom you are looking for is
sapply(mylist, "[[", "cost")
which returns a list containing the extracted value for each element where it exists, and NULL where it does not.
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] 4
[[4]]
NULL
If you just want the sum of the costs for the "internal" category, you can do the following (using the temp index from the question), which works on a vector of costs.
sum(sapply(mylist[temp], "[[", "cost"))
And if you want a list of the same result you can do
sapply(mylist,function(x) x[x$category == "internal"]$cost)
One of the beautiful but challenging things about R is that there are so many ways to express the same idea.
You might note from the other answers that you can interchange sapply and lapply here, since lists are just heterogeneous vectors. The following will also return 6.
do.call("sum",lapply(mylist, function(x) x[x[["category"]] == "internal"]$cost))

Yet another attempt, this time using ?Filter and a custom function to do the necessary selecting:
sum(sapply(Filter(function(x) x$category=="internal", mylist), `[[`, "cost"))
#[1] 6

Could try something like this. For all sublists, if the category is "internal", get the cost, otherwise return NULL which will be ignored when you unlist the result:
sum(unlist(lapply(mylist, function(x) if(x$category == "internal") x$cost)))
# [1] 6
A safer way is to also check if category exists in the sublist by checking the length of category:
sum(unlist(lapply(mylist, function(x) if(length(x$category) && x$category == "internal") x$cost)))
# [1] 6
This will avoid raising an error if the sublist doesn't contain the category field.
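The same defensive idea can be wrapped in a small reusable function. This is a base R sketch under assumptions: costs_for is a hypothetical helper name, and identical() is used so that a missing category simply compares as FALSE instead of raising an error.

```r
mylist <- list(
  list(category = "internal", cost = 2),
  list(category = "bar", cost = 3),
  list(category = "internal", cost = 4),
  list(category = "foo", age = 56)
)

# Hypothetical helper: return the costs of all items in a given category.
costs_for <- function(items, wanted) {
  # identical() is FALSE (not an error) when x$category is NULL
  keep <- vapply(items, function(x) identical(x$category, wanted), logical(1))
  # items without a cost field contribute NULL, which unlist() drops
  costs <- lapply(items[keep], function(x) x$cost)
  as.numeric(unlist(costs))
}

internal_costs <- costs_for(mylist, "internal")
total <- sum(internal_costs)
total
```

Returning the numeric vector rather than the sum keeps the helper usable for other summaries, such as mean or max.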

I approached your question with the rlist package. This method is similar to the purrr package method @alistaire mentioned.
library(rlist); library(dplyr)
mylist %>%
list.filter(category=="internal") %>%
list.mapv(cost) %>% sum()
# list.mapv returns each member of a list by an expression to a vector.

The purrr package has some nice utilities for manipulating lists. Here, keep lets you specify a predicate function that returns a Boolean for whether to keep a list element:
library(purrr)
mylist %>%
keep(~.x[['category']] == 'internal') %>%
# now select the `cost` element of each, and simplify to numeric
map_dbl('cost') %>%
sum()
## [1] 6
The predicate structure with ~ and .x is a shorthand equivalent to
function(x){x[['category']] == 'internal'}

Here's a dplyr option:
library(dplyr)
bind_rows(mylist) %>%
filter(category == 'internal') %>%
summarize(total = sum(cost))
# A tibble: 1 x 1
total
<dbl>
1 6
