I would like to use the accumulate function with two input vectors and the reduce2 function. The documentation for accumulate implies that two input vectors can be given and that accumulate can work with reduce2. However, I am having trouble.
Here is an example, inspired by the documentation from reduce2.
This is the example from reduce2
> paste2 <- function(x, y, sep = ".") paste(x, y, sep = sep)
> letters[1:4] %>% reduce2(.y=c("-", ".", "-"), paste2)
[1] "a-b.c-d"
Here are several attempts to use accumulate similarly to reduce2. None of them properly iterates through both letters[1:4] and c("-",".","-").
> letters[1:4] %>% accumulate(.y=c("-", ".", "-"),paste2)
Error in .f(x, y, ...) : unused argument (.y = c("-", ".", "-"))
> letters[1:4] %>% accumulate(c("-", ".", "-"),paste2)
[[1]]
[1] "a"
[[2]]
NULL
> letters[1:4] %>% accumulate(sep=c("-", ".", "-"),paste2)
[1] "a" "a-b" "a-b-c" "a-b-c-d"
How would I use accumulate to see the intermediate results given by the reduce2 example?
Is it possible that this is an oversight, and the documentation is simply out of date or a bit misleading? I could not get accumulate to accept a three-argument function either, and I'm surprised there is no error in your last example, though I guess it would have to be paste that throws it. The fact that the text for .f is exactly the same for accumulate as for reduce makes me think that this functionality just isn't present in accumulate. Additionally, the source seems to show (unless I misread it) that reduce and reduce2 have their own implementations, whereas accumulate relies on base::Reduce. It might be worth a GitHub issue.
Here's my best shot at producing the output you wanted. It basically involves calling reduce2 multiple times with the right subset of the input list and the secondary input vector to paste2, which doesn't feel very neat or tidy. This might just not be a particularly neat or tidy problem. Note the use of the {} to override the default %>% behaviour of placing the pipe LHS as the first argument, and the different indexing on .x and .y inside reduce2 (we want to keep .y one element shorter than .x).
paste2 <- function(x, y, sep = ".") paste(x, y, sep = sep)
library(purrr)
letters[1:4] %>%
{map_chr(
.x = 2:length(.),
.f = function(index) reduce2(
.x = .[1:index],
.y = c("-", ".", "-")[1:(index - 1)],
.f = paste2
)
)}
#> [1] "a-b" "a-b.c" "a-b.c-d"
Created on 2018-05-11 by the reprex package (v0.2.0).
A few months after this post, accumulate2 was introduced, which gives the results the OP was after:
library(purrr)
paste2 <- function(x, y, sep = ".") paste(x, y, sep = sep)
accumulate2(letters[1:4], c("-", ".", "-"), paste2)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "a-b"
#>
#> [[3]]
#> [1] "a-b.c"
#>
#> [[4]]
#> [1] "a-b.c-d"
With this trick you can feed any number of extra vectors to accumulate, without even needing accumulate2: iterate over the indices and look up the elements of each vector inside the function.
library(tidyverse)
x <- letters[1:4]
y <- c('-', '.', '-')
accumulate(seq_along(x[-1]), .init = x[1], ~paste(.x, x[.y+1], sep = y[.y]))
#> [1] "a" "a-b" "a-b.c" "a-b.c-d"
# OR
accumulate(seq_along(y), .init = x[1], ~paste(.x, x[.y+1], sep = y[.y]))
#> [1] "a" "a-b" "a-b.c" "a-b.c-d"
Created on 2022-02-21 by the reprex package (v2.0.1)
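For comparison, the same index-based idea can be sketched in base R with Reduce and accumulate = TRUE. This is only a minimal sketch, assuming the x and y vectors from the answer above; the name steps is made up for illustration.

```r
x <- letters[1:4]
y <- c("-", ".", "-")

# Iterate over the positions of y; at step i, append x[i + 1] using y[i] as separator.
# init = x[1] seeds the accumulator, accumulate = TRUE keeps every intermediate value.
steps <- Reduce(
  function(acc, i) paste(acc, x[i + 1], sep = y[i]),
  seq_along(y),
  init = x[1],
  accumulate = TRUE
)

# unlist() is a no-op if Reduce already simplified the result to a character vector
steps <- unlist(steps)
print(steps)
```

Because the function only ever sees an index, any number of parallel vectors can be looked up inside it, which is the same design choice as in the purrr version.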
I have two lists as follows:
List1 <- c("X", "Y","Z")
List2 <- c("Enable", "Status", "Quality")
I am expecting something like this:
X_Enable, X_Status,X_Quality,Y_Enable, Y_Status,Y_Quality, Z_Enable, Z_Status,Z_Quality.
Any recommendation will be helpful for me. Thank you.
Here is a way to do it:
Data
List1 <- c("X", "Y","Z")
List2 <- c("Enable", "Status", "Quality")
Code
paste(rep(List1,each = length(List2)),List2,sep = "_")
Output
[1] "X_Enable" "X_Status" "X_Quality" "Y_Enable" "Y_Status" "Y_Quality" "Z_Enable" "Z_Status" "Z_Quality"
We may use outer
c(outer(List1, List2, FUN = function(x, y) paste(x, y, sep = "_")))
We can use interaction like below
> levels(interaction(List1, List2, sep = "_"))
[1] "X_Enable" "Y_Enable" "Z_Enable" "X_Quality" "Y_Quality" "Z_Quality"
[7] "X_Status" "Y_Status" "Z_Status"
or expand.grid
> do.call(paste, c(expand.grid(List1, List2), sep = "_"))
[1] "X_Enable" "Y_Enable" "Z_Enable" "X_Status" "Y_Status" "Z_Status"
[7] "X_Quality" "Y_Quality" "Z_Quality"
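Note that the answers above return the same nine strings in different orders. A small sketch of how the outer result can be reordered to match the order asked for in the question (all of List1[1] first, then List1[2], and so on); the names grid, by_second and by_first are made up for illustration.

```r
List1 <- c("X", "Y", "Z")
List2 <- c("Enable", "Status", "Quality")

# 3 x 3 character matrix: rows follow List1, columns follow List2
grid <- outer(List1, List2, paste, sep = "_")

# c() reads the matrix column by column, so List1 varies fastest (X_Enable, Y_Enable, ...)
by_second <- c(grid)

# Transposing first makes List2 vary fastest (X_Enable, X_Status, X_Quality, Y_Enable, ...)
by_first <- c(t(grid))

print(by_second)
print(by_first)
```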
I have this text file:
l=c("ced","nad")
h=c("SAF","EYR")
res=cbind(l,h)
and this list of files:
dirf<- list.files ("path", "*.txt", full.names = TRUE)
example of files
ced_SAF_jkh_2020.txt
ced_EYR_jkh_2001.txt
nad_SAF_jkh_200.txt
nad_EYR_jkh_200.txt
I want to grep the files whose names contain both words from the same row of the two columns, so the files I need are:
ced_SAF_jkh_2020.txt
nad_EYR_jkh_200.txt
You can construct the name from the matrix and use that, i.e.
do.call(paste, c(data.frame(res), sep = '_'))
#[1] "ced_SAF" "nad_EYR"
To grep them, you can do:
ptrn <- do.call(paste, c(data.frame(res), sep = '_'))
grep(paste(ptrn, collapse = '|'), x, value = TRUE)
#[1] "ced_SAF_jkh_2020.txt" "nad_EYR_jkh_200.txt"
where x is:
dput(x)
c("ced_SAF_jkh_2020.txt", "ced_EYR_jkh_2001.txt", "nad_SAF_jkh_200.txt",
"nad_EYR_jkh_200.txt")
Another possible solution, based on tidyverse:
library(tidyverse)
l=c("ced","nad")
h=c("SAF","EYR")
res=cbind(l,h)
df <- data.frame(
files = c("ced_SAF_jkh_2020.txt","ced_EYR_jkh_2001.txt","nad_SAF_jkh_200.txt",
"nad_EYR_jkh_200.txt")
)
df %>%
filter((str_detect(files, res[1,1]) & str_detect(files, res[1,2])) |
(str_detect(files, res[2,1]) & str_detect(files, res[2,2])))
#> files
#> 1 ced_SAF_jkh_2020.txt
#> 2 nad_EYR_jkh_200.txt
Or, more compactly, with purrr::map2_dfr:
library(tidyverse)
map2_dfr(res[,1], res[,2],
~ filter(df, (str_detect(files, .x) & str_detect(files, .y))))
#> files
#> 1 ced_SAF_jkh_2020.txt
#> 2 nad_EYR_jkh_200.txt
You can use sprintf() + paste(collapse = '|') to make the expected syntax of regular expression and pass it to list.files() directly:
regex <- paste(sprintf("%s_%s", l, h), collapse = '|')
# [1] "ced_SAF|nad_EYR"
list.files("path_to_file", regex, full.names = TRUE)
Then all the file names which match the regular expression will be returned.
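One caveat: an unanchored alternation such as "ced_SAF|nad_EYR" matches anywhere in the file name. The sketch below tries both an unanchored and an anchored pattern against a throwaway directory. The directory, the extra file xced_SAF_old.txt and the names loose and strict are assumptions made for illustration and are not part of the question.

```r
l <- c("ced", "nad")
h <- c("SAF", "EYR")

# Throwaway directory holding the example files plus one hypothetical extra file
demo_dir <- tempfile("grep_demo_")
dir.create(demo_dir)
files <- c("ced_SAF_jkh_2020.txt", "ced_EYR_jkh_2001.txt",
           "nad_SAF_jkh_200.txt", "nad_EYR_jkh_200.txt",
           "xced_SAF_old.txt")  # hypothetical, only here to show the difference
invisible(file.create(file.path(demo_dir, files)))

pairs  <- sprintf("%s_%s", l, h)           # "ced_SAF" "nad_EYR"
loose  <- paste(pairs, collapse = "|")     # matches anywhere in the name
strict <- paste0("^(", loose, ")_")        # must start the name and be followed by "_"

loose_hits  <- list.files(demo_dir, pattern = loose)
strict_hits <- list.files(demo_dir, pattern = strict)

print(loose_hits)
print(strict_hits)

unlink(demo_dir, recursive = TRUE)  # clean up the temporary directory
```

list.files applies pattern to the file names only, so the ^ anchor still works when full.names = TRUE is requested. When grepping over full paths with grep instead, apply the pattern to basename(x).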
I am trying to find an easy way (preferably a one-liner) to combine and concatenate unique combinations of elements in a character vector into a new vector of strings.
I also want to be able to include any lines of text in the new vector before, in between or after the inserted vector combinations. Combinations should not be repeated in reverse order (e.g. 'x1_x2' but not 'x2_x1'), nor should an element be combined with itself (not 'x1_x1').
Does a quick solution to this exist?
Example of equivalent code for the desired outcome:
vec <- paste0("X", 1:5)
# The underscore signifies any arbitrary line of text
c(
paste0("_", vec[1], "_", vec[2:5], "_"),
paste0("_", vec[2], "_", vec[3:5], "_"),
paste0("_", vec[3], "_", vec[4:5], "_"),
paste0("_", vec[4], "_", vec[5], "_")
)
'_X1_X2_' '_X1_X3_' '_X1_X4_' '_X1_X5_' '_X2_X3_' '_X2_X4_' '_X2_X5_'
'_X3_X4_' '_X3_X5_' '_X4_X5_'
Try combn
> sprintf("_%s_", combn(vec, 2, paste0, collapse = "_"))
[1] "_X1_X2_" "_X1_X3_" "_X1_X4_" "_X1_X5_" "_X2_X3_" "_X2_X4_" "_X2_X5_"
[8] "_X3_X4_" "_X3_X5_" "_X4_X5_"
> paste0("_", combn(vec, 2, paste0, collapse = "_"), "_")
[1] "_X1_X2_" "_X1_X3_" "_X1_X4_" "_X1_X5_" "_X2_X3_" "_X2_X4_" "_X2_X5_"
[8] "_X3_X4_" "_X3_X5_" "_X4_X5_"
You could use
apply(combn(vec, 2), 2, \(x) paste(x, collapse = "_"))
#> [1] "X1_X2" "X1_X3" "X1_X4" "X1_X5" "X2_X3" "X2_X4" "X2_X5" "X3_X4" "X3_X5" "X4_X5"
Here is tidyverse version using crossing:
library(tidyverse)
crossing(x=vec, y=vec) %>%
mutate(new = paste0("_",x,"_",y,"_")) %>%
group_by(x) %>%
filter(row_number()!= 1:unique(parse_number(x))) %>%
pull(new)
[1] "_X1_X2_" "_X1_X3_" "_X1_X4_" "_X1_X5_" "_X2_X3_"
[6] "_X2_X4_" "_X2_X5_" "_X3_X4_" "_X3_X5_" "_X4_X5_"
Here is another option using arrangements::combinations:
paste0("_",
apply(arrangements::combinations(x = vec, k = 2), 1, paste, collapse = "_"),
"_")
#[1] "_X1_X2_" "_X1_X3_" "_X1_X4_" "_X1_X5_" "_X2_X3_" "_X2_X4_" "_X2_X5_" "_X3_X4_" "_X3_X5_" "_X4_X5_"
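If arbitrary text is needed before, between and after the two elements, the combn approach can be wrapped in a small helper. This is a sketch only; combine_pairs and its prefix, sep and suffix arguments are hypothetical names, not an existing function.

```r
# Hypothetical helper: every unordered pair of distinct elements of v,
# glued together with arbitrary text before, between and after.
combine_pairs <- function(v, prefix = "_", sep = "_", suffix = "_") {
  pairs <- combn(v, 2)  # 2-row matrix, one column per combination
  paste0(prefix, pairs[1, ], sep, pairs[2, ], suffix)
}

vec <- paste0("X", 1:5)
res <- combine_pairs(vec)
print(res)

# Different surrounding text, e.g. to build formula-like strings
print(combine_pairs(vec[1:3], prefix = "y ~ ", sep = " * ", suffix = ""))
```

combn never repeats a pair in reverse order and never pairs an element with itself, so no filtering step is needed. It does require v to have at least two elements.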
I have a dataset with proteins accession numbers (DataGranulomeTidy). I have written a function (extractInfo) in r to scrape some information of those proteins from the ncbi website. The function works as expected when I run it in a short "for" loop.
DataGranulomeTidy <- tibble(GIaccessionNumber = c("29436380", "4504165", "17318569"))
extractInfo <- function(GInumber){
tempPage <- readLines(paste("https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=", GInumber, "&db=protein&report=genpept&conwithfeat=on&withparts=on&show-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000", sep = ""), skipNul = TRUE)
tempPage <- base::paste(tempPage, collapse = "")
Accession <- str_extract(tempPage, "(?<=ACCESSION).{3,20}(?=VERSION)")
Symbol <- str_extract(tempPage, "(?<=gene=\").{1,20}(?=\")")
GeneID <- str_extract(tempPage, "(?<=gov/gene/).{1,20}(?=\">)")
out <- paste(Symbol, Accession, GeneID, sep = "---")
return(out)
}
for(n in 1:3){
print(extractInfo(GInumber = DataGranulomeTidy$GIaccessionNumber[n]))
}
[1] "MYH9--- AAH49849---4627"
[1] "GSN--- NP_000168---2934"
[1] "KRT1--- NP_006112---3848"
When I use the same function in a dplyr pipe, it doesn't work and I can't figure out why.
> DataGranulomeTidy %>% mutate(NewVar = extractInfo(.$GIaccessionNumber))
Error in file(con, "r") : invalid 'description' argument
At this point I could make things work without using the "pipe" operator by using the "for" operator but I would like so much to understand why the function does not work in the dplyr pipe.
The cause is that your user-defined function can't handle a vector. You can wrap it with Vectorize:
vectorized_extractInfo <- Vectorize(extractInfo, "GInumber")
DataGranulomeTidy %>%
mutate(NewVar = vectorized_extractInfo(GIaccessionNumber))
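To see why this matters: inside mutate the function receives the whole column at once, so paste builds three URLs and readLines is handed all of them instead of a single one. The sketch below reproduces the situation offline; describe_one is a hypothetical stand-in for extractInfo that, like it, only makes sense for one id at a time.

```r
# Hypothetical stand-in for extractInfo: valid for exactly one id per call
describe_one <- function(id) {
  stopifnot(length(id) == 1)
  paste0("record-", id)
}

ids <- c("29436380", "4504165", "17318569")

# Passing the whole vector fails, just like extractInfo(.$GIaccessionNumber)
whole_vector <- tryCatch(describe_one(ids), error = function(e) "error")

# Calling it once per element works; Vectorize builds such a wrapper for you
per_element <- vapply(ids, describe_one, character(1), USE.NAMES = FALSE)
vectorized  <- unname(Vectorize(describe_one)(ids))

print(whole_vector)
print(per_element)
```

A for loop works for the same reason: it passes one element per call.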
As @cuttlefish44 already pointed out, the problem is that your function is not vectorized. My approach uses purrr::map_chr. Another option would be to use dplyr::rowwise:
library(tidyverse)
DataGranulomeTidy <- tibble(GIaccessionNumber = c("29436380", "4504165", "17318569"))
extractInfo <- function(GInumber){
tempPage <- readLines(paste("https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=", GInumber, "&db=protein&report=genpept&conwithfeat=on&withparts=on&show-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000", sep = ""), skipNul = TRUE)
tempPage <- base::paste(tempPage, collapse = "")
Accession <- str_extract(tempPage, "(?<=ACCESSION).{3,20}(?=VERSION)")
Symbol <- str_extract(tempPage, "(?<=gene=\").{1,20}(?=\")")
GeneID <- str_extract(tempPage, "(?<=gov/gene/).{1,20}(?=\">)")
out <- paste(Symbol, Accession, GeneID, sep = "---")
return(out)
}
DataGranulomeTidy %>% mutate(NewVar = map_chr(GIaccessionNumber, extractInfo))
#> # A tibble: 3 x 2
#> GIaccessionNumber NewVar
#> <chr> <chr>
#> 1 29436380 MYH9--- AAH49849---4627
#> 2 4504165 GSN--- NP_000168---2934
#> 3 17318569 KRT1--- NP_006112---3848
Created on 2020-04-17 by the reprex package (v0.3.0)
There is a rentrez package for NCBI queries, for example:
library(rentrez)
protein <- entrez_summary("protein", id = 29436380)
protein$caption
# [1] "AAH49849"
links <- entrez_link(dbfrom = "protein", id = 29436380, db = "gene")
links$links$protein_gene
# [1] "4627"
gene <- entrez_summary("gene", id = links$links$protein_gene)
gene$name
# [1] "MYH9"
Wrap this up into a function, then we don't need to mess about with regex.
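A rough sketch of such a wrapper, using only the rentrez calls shown above. The function name protein_info is hypothetical, the sketch assumes each protein links to exactly one gene, and it needs the rentrez package and network access when called.

```r
# Hypothetical wrapper around the rentrez calls above.
# Assumes the protein id links to exactly one gene record.
protein_info <- function(gi) {
  protein <- rentrez::entrez_summary(db = "protein", id = gi)
  links   <- rentrez::entrez_link(dbfrom = "protein", id = gi, db = "gene")
  gene_id <- links$links$protein_gene
  gene    <- rentrez::entrez_summary(db = "gene", id = gene_id)
  # Same "Symbol---Accession---GeneID" layout as the original extractInfo
  paste(gene$name, protein$caption, gene_id, sep = "---")
}

# Usage (requires network access), one id per call:
# protein_info(29436380)
```

Like extractInfo, this handles one id per call, so it would still need map_chr or Vectorize inside mutate.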
I am trying to rename several variables in a chain:
df_foo = data_frame(
a.a = 1:10,
"b...b" = 1:10,
"cc..d" = 1:10
)
df_foo %>%
rename_(
.dots = setNames(
names(.),
gsub("[[:punct:]]", "", names(.)))
)
This works fine, but when there is a space in the name of one of the variables:
df_foo = data_frame(
a.a = 1:10,
"b...b" = 1:10,
"c c..d" = 1:10
)
df_foo %>%
rename_(
.dots = setNames(
names(.),
gsub("[[:punct:]]", "", names(.)))
)
I get this error:
Error in parse(text = x) : <text>:1:3: unexpected symbol
1: c c..d
^
I am not sure where this stems from since when I run gsub directly:
setNames(
names(df_foo),
gsub("[[:punct:]]", "", names(df_foo)))
I do not get an error. Not sure what is going on here.
This is now raised as issue #2391 on the dplyr GH issues page.
In general: I strongly suggest you never use variable names with spaces. They are a pain and will often cause more trouble than they are worth.
Here is the cause of this error.
rename_ dispatches to dplyr:::rename_.data.frame. First line of that function is:
dots <- lazyeval::all_dots(.dots, ...)
That lazyeval function will then call lazyeval::as.lazy_dots, which uses lazyeval::as.lazy, which itself uses lazyeval:::as.lazy.character, which calls lazy_(parse(text = x)[[1]], env). Now, parse() expects a valid R expression as its text argument:
text: character vector. The text to parse. Elements are treated as if they were lines of a file. (from help("parse"))
This is why rename_ doesn't seem to like character vectors with spaces and we get "Error in parse(text = x)":
lazyeval:::as.lazy(names(df_foo)[2])
<lazy>
expr: b...b
env: <environment: base>
lazyeval:::as.lazy(names(df_foo)[3])
Error in parse(text = x) : <text>:1:3: unexpected symbol
1: c c..d
^
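The same behaviour can be reproduced with base R alone, without lazyeval. In this minimal sketch the names ok, bad and sym are made up for illustration.

```r
# "b...b" is a syntactically valid name, so parse() returns a symbol
ok <- parse(text = "b...b")[[1]]

# "c c..d" is two tokens separated by a space, so parse() fails
bad <- tryCatch(parse(text = "c c..d"),
                error = function(e) conditionMessage(e))

# as.name() does no parsing, so it accepts a name that contains a space
sym <- as.name("c c..d")

print(ok)
print(bad)
print(sym)
```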
I'm not aware of a solution, other than just using base R for this simple renaming.
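A minimal sketch of that base R renaming, using a plain data.frame with check.names = FALSE so that the awkward names from the question are kept as-is:

```r
# check.names = FALSE keeps "b...b" and "c c..d" exactly as written
df_foo <- data.frame(
  a.a = 1:10,
  "b...b" = 1:10,
  "c c..d" = 1:10,
  check.names = FALSE
)

# Assign the cleaned names directly; nothing is parsed, so the space is harmless
names(df_foo) <- gsub("[[:punct:]]", "", names(df_foo))
print(names(df_foo))
```

The space survives because it is not punctuation. To drop it as well, the character class could be extended to "[[:punct:][:space:]]".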