Here is my sample:
a = c("a","b","c")
b = c("1","2","3")
I need to concatenate a and b automatically. The result should be "a 1","a 2","a 3","b 1","b 2","b 3","c 1","c 2","c 3".
For now, I am using the paste function:
paste(a[1],b[1])
I need an automatic way to do this. Besides writing a loop, is there any easier way to achieve this?
c(outer(a, b, paste))
# [1] "a 1" "b 1" "c 1" "a 2" "b 2" "c 2" "a 3" "b 3" "c 3"
Other options are:
paste(rep(a, each = length(b)), b)
or:
with(expand.grid(b, a), paste(Var2, Var1))
You can do:
c(sapply(a, function(x) {paste(x,b)}))
[1] "a 1" "a 2" "a 3" "b 1" "b 2" "b 3" "c 1" "c 2" "c 3"
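Note that c(outer(a, b, paste)) reads the matrix column by column, so a varies fastest, whereas the question asks for b to vary fastest. A minimal sketch of one way to get the requested order from outer(), by transposing before flattening (by_column and by_row are illustrative names only):

```r
a <- c("a", "b", "c")
b <- c("1", "2", "3")

# outer() builds a length(a) x length(b) matrix with entry [i, j] = paste(a[i], b[j]).
# c() flattens it column by column, so the element of a changes fastest.
by_column <- c(outer(a, b, paste))

# Transposing first makes c() walk the original matrix row by row,
# so the element of b changes fastest, as requested in the question.
by_row <- c(t(outer(a, b, paste)))
print(by_row)
```

Both vectors contain the same nine strings; only the order differs.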
I have a very long string (~1000 words) and I would like to split it into two-word phrases.
I have this:
string <- "A B C D E F"
and I would like this:
"A B"
"B C"
"C D"
"D E"
"E F"
The long string has already been cleaned and stemmed, and stop-words have been removed.
I tried to use str_split, but I think it needs a separator, which is awkward here: I don't want to separate A from B, only "A B" from "C D", "B C" from "D E", etc.
Split on space, then paste with shift:
s <- unlist(strsplit(string, " ", fixed = TRUE))
sl <- length(s)
paste(s[-sl], s[-1])  # drop the last word, then the first; also safe when there is only one word
# [1] "A B" "B C" "C D" "D E" "E F"
tmp <- strsplit(string, " ")[[1]]
tmp
# [1] "A" "B" "C" "D" "E" "F"
sapply(seq_along(tmp)[-1], function(z) paste(tmp[z-1:0], collapse = " "))
# [1] "A B" "B C" "C D" "D E" "E F"
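Both base R approaches above generalise beyond pairs. As a rough sketch, here is a small helper (word_ngrams is a made-up name, not from any package) that returns every run of n consecutive words and returns an empty vector when the input has fewer than n words:

```r
# Hypothetical helper: all runs of n consecutive words, each joined by a space.
word_ngrams <- function(words, n = 2) {
  # Fewer than n words means there is no complete n-gram.
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

words <- strsplit("A B C D E F", " ", fixed = TRUE)[[1]]
bigrams <- word_ngrams(words, 2)
trigrams <- word_ngrams(words, 3)
print(bigrams)
print(trigrams)
```

For a string of ~1000 words this is still just one pass over the start positions, so it should be fast enough in practice.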
If you already use some text mining package (as cleaned, stemmed and removed stop-words would suggest), there's most likely something to generate n-grams (and not just bigrams). For example quanteda::tokens_ngrams() or tidytext::unnest_ngrams():
string <- "A B C D E F"
quanteda::tokens_ngrams(quanteda::tokens(string), concatenator = " ")
#> Tokens consisting of 1 document.
#> text1 :
#> [1] "A B" "B C" "C D" "D E" "E F"
# unnest_ngrams() lowercases the text by default (to_lower = TRUE)
data.frame(s = string) |>
  tidytext::unnest_ngrams(input = "s", output = "bigrams", n = 2)
#> bigrams
#> 1 a b
#> 2 b c
#> 3 c d
#> 4 d e
#> 5 e f
Created on 2023-01-31 with reprex v2.0.2
Another option is a regex with a lookahead.
string <- "A B C D E F"
. <- gregexpr("\\S+\\s+(?=(\\S+))", string, perl=TRUE)[[1]]
attr(.,"match.length") <- attr(.,"match.length") + attr(., "capture.length")
regmatches(string, list(.))[[1]]
#[1] "A B" "B C" "C D" "D E" "E F"
I have a vector
x <- c("a b c", "d e")
and split its entries with
str_split(x, " ")
I want all permutations of each split entry, so the result should be
c("a b c", "b c a", "c a b", "a c b", "b a c", "c b a", "d e", "e d")
I tried to use the function
permutations(n, r, v=1:n, set=TRUE, repeats.allowed=FALSE)
After the str_split step, you can use combinat::permn to create all possible permutations of each split string and paste them together.
result <- unlist(sapply(strsplit(x, " "), function(x)
  combinat::permn(x, paste0, collapse = " ")))
result
#[1] "a b c" "a c b" "c a b" "c b a" "b c a" "b a c" "d e" "e d"
You can also try pracma::perms:
unlist(
  Map(
    function(v) do.call(paste, as.data.frame(pracma::perms(v))),
    strsplit(x, " ")
  )
)
which gives
[1] "c b a" "c a b" "b c a" "b a c" "a b c" "a c b" "e d" "d e"
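If you would rather not add a package, the permutations can also be generated in base R. This is only a sketch: all_perms is a hypothetical recursive helper written for this example, and since n elements have n! orderings it is only practical for short entries.

```r
# Hypothetical helper: every ordering of the elements of v, as a list of vectors.
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    # Put v[i] first, then append every ordering of the remaining elements.
    for (rest in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

x <- c("a b c", "d e")
result <- unlist(lapply(strsplit(x, " ", fixed = TRUE), function(words) {
  # Collapse each permutation back into a single space-separated string.
  vapply(all_perms(words), paste, character(1), collapse = " ")
}))
print(result)
```

The order of the permutations differs from both package-based answers, but the set of strings is the same.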
I want to loop over combinations created by combn().
Input:
"a" "b" "c" "d"
Desired Output:
[1] "a" "b" "c" "d"
[1] "a and b" "a and c" "a and d" "b and c" "b and d" "c and d"
[1] "a and b and c" "a and b and d" "a and c and d" "b and c and d"
[1] "a and b and c and d"
What I tried:
classes <- letters[1:4]
cl <- lapply(1:length(classes), combn, x = classes)
apply(cl[[1]], 2, paste, collapse = " and ")
apply(cl[[2]], 2, paste, collapse = " and ")
apply(cl[[3]], 2, paste, collapse = " and ")
apply(cl[[4]], 2, paste, collapse = " and ")
Basically, my question is: what is the best way to loop over the last part, apply(cl[[NR]], 2, paste, collapse = " and ")?
I thought about lapply, but then I would have to pass FUN twice, and it seems odd to combine lapply and apply in one call. A for loop is possible, but maybe there is a more efficient way.
If the question is better suited for Code Review, I am happy to migrate it.
You can iterate over the length of your vector and use the function argument of combn() to collapse the output using paste():
vec <- letters[1:4]
lapply(seq_along(vec), function(x) combn(vec, x, FUN = paste, collapse = " and "))
[[1]]
[1] "a" "b" "c" "d"
[[2]]
[1] "a and b" "a and c" "a and d" "b and c" "b and d" "c and d"
[[3]]
[1] "a and b and c" "a and b and d" "a and c and d" "b and c and d"
[[4]]
[1] "a and b and c and d"
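If one flat vector is more convenient than a list grouped by size, the same call can be wrapped in unlist(). A small sketch, assuming the labels themselves are all you need (by_size and all_labels are illustrative names):

```r
vec <- letters[1:4]

# One list element per combination size, exactly as in the answer above.
by_size <- lapply(seq_along(vec), function(k) {
  combn(vec, k, FUN = paste, collapse = " and ")
})

# Flatten into a single character vector, sizes 1 to 4 in order.
all_labels <- unlist(by_size)
print(length(all_labels))  # 2^4 - 1 = 15 non-empty subsets
```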
I'm trying to split my string into multiple rows. String looks like this:
x <- c("C 10.1 C 12.4","C 12", "C 45.5 C 10")
Code snippet:
strsplit(x, "//s")[[3]]
Result:
"C 45.5 C 10"
Expected Output: Split string into multiple rows like this:
"C 10.1"
"C 12.4"
"C 12"
"C 45.5"
"C 10"
The question is how to split the string.
Clue: each new piece starts after a space followed by a letter, which is "C" in our case. Does anyone know how to do it?
You may use
unlist(strsplit(x, "(?<=\\d)\\s+(?=C)", perl=TRUE))
Output:
[1] "C 10.1" "C 12.4" "C 12" "C 45.5" "C 10"
The (?<=\\d)\\s+(?=C) regex matches one or more whitespace characters (\\s+) that are immediately preceded by a digit ((?<=\\d)) and immediately followed by C ((?=C)).
If C can be any uppercase ASCII letter, replace C with [A-Z].
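Instead of splitting on the gaps, the pieces can also be matched directly with gregexpr() and regmatches(). This is a sketch under the assumption that every piece is one uppercase letter, one space, and a number made of digits and dots; the pattern would need adjusting for other formats.

```r
x <- c("C 10.1 C 12.4", "C 12", "C 45.5 C 10")

# Assumed token shape: uppercase letter, single space, then digits and dots.
pattern <- "[A-Z] [0-9.]+"

# gregexpr() finds every match in each string; regmatches() extracts them
# as a list with one element per input string, which unlist() flattens.
pieces <- unlist(regmatches(x, gregexpr(pattern, x)))
print(pieces)
```

Matching avoids lookarounds, so it does not need perl = TRUE.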
A somewhat more complicated expression, but easier on the regex side:
unlist(
  sapply(
    strsplit(x, " ?C"),
    function(x) {
      paste0("C", x[nzchar(x)])
    }
  )
)
"C 10.1" "C 12.4" "C 12" "C 45.5" "C 10"
I have a data frame that I need to convert to a character vector. Here is an example of the data frame structure:
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts)
I need this structure:
[1] "TEXT 1" "TEXT 2" "TEXT 3"
I already tried as.character(), but it does not work: it collapses all the rows into a single string.
You can transpose and concatenate, i.e.
c(t(data))
#[1] "TEXT 1" "TEXT 2" "TEXT 3"
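The reason as.character(data) gives a single string is that a data frame is a list of columns, so the whole column is deparsed into one element. A short sketch of the difference, and of pulling the column out directly instead of transposing (whole and column are illustrative names; the as.character() around the column is only there in case it was created as a factor, which was the default before R 4.0):

```r
texts <- c("TEXT 1", "TEXT 2", "TEXT 3")
data <- data.frame(texts)

# Applied to the whole data frame: one string per column, not per row.
whole <- as.character(data)
print(length(whole))  # 1, because there is a single column

# Applied to the column itself: one string per row.
column <- as.character(data[["texts"]])
print(column)
```

With several columns, c(t(data)) interleaves them row by row, while extracting a column keeps only that column.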