Pasting together two vectors in R elementwise [duplicate] - r

I am trying to build a web scraper that'll go through a number of websites and extract some information. In order to do this, I'm going to need to make a list of the last couple of characters in the url, so I can map the information into a data frame. To do this, I've found out that the end of the url should have a number between 1 and 10 to identify a department, and a given character to identify a unit. Therefore I've made two vectors with the needed units and department IDs:
departments <- c(1, 2, 9, 10)
units <- c("A", "B", "C", "F", "I", "O", "V")
To go through each website, I need a list that combines these two vectors in each possible way, like for instance "1A", "1B", "1C", "1F", "1I", "1O", "1V", "2A" and so forth.
I've tried different solutions, but they do not return what I expected, like for instance:
> depUn <- as.list(paste(departments, units, sep = ""))
> depUn
[[1]]
[1] "1A"
[[2]]
[1] "2B"
[[3]]
[1] "9C"
[[4]]
[1] "10F"
[[5]]
[1] "1I"
[[6]]
[1] "2O"
[[7]]
[1] "9V"
Does someone have any good insight to how I could solve this?
EDIT
I've already tried the expand.grid option, and while it was successful in putting the elements after one another in a list, it failed to put them together in one string. Can someone help me accomplish this? Here's an excerpt of the code and results I tried back then:
> dfDepUn <- expand.grid(departments, units)
> DepUn <- lapply(apply(dfDepUn, 1, identity), unlist)
> DepUn
[[1]]
Var1 Var2
"1" "A"
[[2]]
Var1 Var2
"2" "A"
[[3]]
Var1 Var2
"9" "A"
[[4]]
Var1 Var2
"10" "A"

Is this what you need?
df <- expand.grid(departments, units)
as.list(paste0(df$Var1, df$Var2))
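Note that expand.grid varies its first argument fastest, so the call above yields "1A", "2A", "9A", "10A", "1B" and so on. Here is a minimal sketch, assuming the department-major order from the question ("1A", "1B", ...) is what is wanted. The names combos and combos_outer are only illustrative:

```r
departments <- c(1, 2, 9, 10)
units <- c("A", "B", "C", "F", "I", "O", "V")

# Repeat each department once per unit; units is recycled to match.
combos <- paste0(rep(departments, each = length(units)), units)

# Same result via outer(): rows are departments, columns are units,
# so transposing before flattening reads the matrix row by row.
combos_outer <- as.vector(t(outer(departments, units, paste0)))

head(combos, 8)
```

Wrap the result in as.list() if a list rather than a character vector is needed.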

Related

R: How to count length of intervals between specific word/symbol in a vector?

I have a vector that contains series of texts and numbers, like:
t <- c("A", 1:3, "A", 1:4, "A", 1:3)
t
#> [1] "A" "1" "2" "3" "A" "1" "2" "3" "4" "A" "1" "2" "3"
Created on 2022-08-06 by the reprex package (v2.0.1)
That is, the actual data is taken from a pdf, with the data frame collapsed into a single column vector, and the wrap length is uneven for some reason (probably because of the cell merging).
To process this data efficiently, I want to know the length from "A" to next "A" or end. In this example the answer would be 3, 4, 3 (Edit: sorry for a simple mistake, it would be 4, 5, 4).
I have tried many different methods but can't find one that works. Does anyone know of a better way?
An alternative using rle (run-length encoding)
with(rle(t == "A"), subset(lengths, !values))
#> [1] 3 4 3
You want the number of elements
(1) between adjacent "A"s;
(2) from the last "A" (excluding it) to the end.
We can use either of the following:
diff(c(which(t == "A"), length(t) + 1)) - 1
#[1] 3 4 3
diff(which(c(t, "A") == "A")) - 1
#[1] 3 4 3
Essentially we pad an "A" at the end to turn (2) into (1). If the last element of t happens to be an "A", the last value in the result will be 0.
Extension:
If you further want to know the number of elements from the beginning to the first "A" (excluding it), we can pad a leading "A":
diff(c(0, which(t == "A"), length(t) + 1)) - 1
#[1] 0 3 4 3
diff(which(c("A", t, "A") == "A")) - 1
#[1] 0 3 4 3
Here, the first value is 0, because the first element of t happens to be an "A".
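If the counts should include the "A" itself (the 4, 5, 4 mentioned in the question's edit), one possible sketch labels each block with cumsum and counts the block sizes. It assumes the vector starts with an "A", and it uses x instead of t to avoid masking base::t:

```r
x <- c("A", 1:3, "A", 1:4, "A", 1:3)

# cumsum() increases by one at every "A", so it labels the block
# that each element belongs to.
block <- cumsum(x == "A")

# Size of each block, counting the leading "A".
inclusive <- unname(lengths(split(x, block)))

# Size of each block without the "A".
exclusive <- inclusive - 1

inclusive
exclusive
```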

Why multiple option in vunion function of the vecsets package does not work for character vectors?

When I run the code:
library(vecsets)
p <- c("a","b")
q <- c( "a")
vunion(p,q, multiple = TRUE)
I get the result:
[1] "a" "b"
But I expect the result to be
vunion(p,q, multiple = TRUE)
[1] "a" "b" "a"
I also do not understand the result provided in the example of the vecsets package. The example shows:
x <- c(1:5,3,3,3,2,NA,NA)
y <- c(2:5,4,3,NA)
vunion(x,y,multiple=TRUE)
[1] 2 3 3 4 5 NA 1 3 3 2 NA 4
But if we check
length(x)+length(y); length(vunion(x,y))
[1] 18
[1] 12
we get different lengths, but I think they should be the same. Note, for example, 5 appears only once.
What's going on here? Can someone explain?
I think the vecsets package documentation describes this behavior quite well:
The base::union function removes duplicates per algebraic set theory. vunion does not, and so returns as many duplicate elements as are in either input vector (not the sum of their inputs.) In short, vunion is the same as vintersect(x,y) + vsetdiff(x,y) + vsetdiff(y,x).
It's true that you have to read carefully, though. I've emphasized the important part. The issue is not with character versus numeric vectors, but rather whether elements are repeated within the same vector or not. Consider p1 versus p2 in the following example. The result from vunion will have as many a's as either p or q, so we expect 1 "a" in the first part and two a's in the second part; both times we expect only 1 "b":
library(vecsets)
q <- c("a", "b")
p1 <- c("a", "b")
vunion(p1, q, multiple = TRUE)
[1] "a" "b"
p2 <- c("a", "a", "b")
vunion(p2, q, multiple = TRUE)
[1] "a" "b" "a"
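To make the counting rule concrete, here is a rough base R sketch of the same idea: each value is kept as many times as it occurs in whichever input contains more of it. The helper multiset_union is hypothetical, not part of vecsets. It assumes atomic inputs without NA values, and its element order may differ from what vunion returns:

```r
# Hypothetical helper, not part of vecsets: multiset union in base R.
# Assumes x and y are atomic vectors of the same type without NA values.
multiset_union <- function(x, y) {
  vals <- union(x, y)
  # Count each distinct value in both inputs, aligned on the same levels.
  nx <- as.vector(table(factor(x, levels = vals)))
  ny <- as.vector(table(factor(y, levels = vals)))
  # Keep each value as many times as it appears in the fuller input.
  rep(vals, times = pmax(nx, ny))
}

multiset_union(c("a", "b"), "a")               # one "a", one "b"
multiset_union(c("a", "a", "b"), c("a", "b"))  # two "a", one "b"

# If every element of both inputs should be kept, plain concatenation does it.
c(c("a", "b"), "a")
```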

R sort multiple (linked) list

I have run into the following problem: I have two lists with the same number of elements, and they are linked.
For example, one list stores the ID numbers of the students, while the other stores the exam marks of these students.
I want to sort the exam marks from smallest to largest, but I do not want to lose the one-to-one link between the student IDs and their marks. How can I do so in R?
Let's assume a list like this:
a <- list(id = c("c", "a", "b"), score = c(2, 9, 12))
> a
$id
[1] "c" "a" "b"
$score
[1] 2 9 12
Then you can sort it using lapply...
> lapply(a, function(x)x[order(a$id)])
$id
[1] "a" "b" "c"
$score
[1] 9 12 2
> lapply(a, function(x)x[order(a$score, decreasing=TRUE)])
$id
[1] "b" "a" "c"
$score
[1] 12 9 2
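The same idea works when the IDs and marks live in two separate vectors: compute the ordering once and apply it to both. This is a minimal sketch with made-up data, and the names ids, marks and students are assumptions for illustration:

```r
ids   <- c("s3", "s1", "s2")
marks <- c(12, 2, 9)

# order() returns the permutation that sorts marks ascending;
# applying it to both vectors keeps the pairs aligned.
ord <- order(marks)
sorted_ids   <- ids[ord]
sorted_marks <- marks[ord]

# Alternatively, keep the pairs together in a data frame and sort its rows.
students <- data.frame(id = ids, mark = marks, stringsAsFactors = FALSE)
students_sorted <- students[order(students$mark), ]
```

Keeping linked values in one data frame tends to be less error-prone than sorting parallel vectors by hand.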

Random sequence from fixed ensemble that contains at least one of each character

I am trying to generate a random sequence from a fixed number of characters that contains at least one of each character.
For example having the ensemble
m = letters[1:3]
I would like to create a sequence of N = 10 elements that contain at least one of each m characters, like
a
a
a
a
b
c
c
c
c
a
I tried with sample(m,N,replace=T) but in this way also a sequence like
a
a
a
a
a
c
c
c
c
a
can be generated that does not contain b.
f <- function(x, n){
  # Include every element of x once, fill the rest at random, then shuffle.
  sample(c(x, sample(x, n-length(x), replace=TRUE)))
}
f(letters[1:3], 5)
# [1] "a" "c" "a" "b" "a"
f(letters[1:3], 5)
# [1] "a" "a" "b" "b" "c"
f(letters[1:3], 5)
# [1] "a" "a" "b" "c" "a"
f(letters[1:3], 5)
# [1] "b" "c" "b" "c" "a"
Josh O'Brien's answer is a good way to do it but doesn't provide much input checking. Since I had already written mine, I might as well present it. It's pretty much the same thing, but it only considers unique items and checks that n is large enough to guarantee at least one of each.
at_least_one_samp <- function(n, input){
  # Only consider unique items.
  items <- unique(input)
  unique_items_count <- length(items)
  if(unique_items_count > n){
    stop("n is too small to include at least one of each unique item")
  }
  # Get values for vector - force each item in at least once
  # then randomly select values to get the remaining.
  vals <- c(items, sample(items, n - unique_items_count, replace = TRUE))
  # Now shuffle them
  sample(vals)
}
m <- c("a", "b", "c")
at_least_one_samp(10, m)
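Forcing one copy of each item and filling the rest at random does not in general make every valid sequence equally likely. If that matters, one alternative is to redraw until all items are present. This is a sketch, not a replacement for the answers above. The helper sample_until_complete is hypothetical and assumes character items with n at least as large as the number of unique items:

```r
# Hypothetical helper: redraw until every item appears at least once.
# Assumes n >= length(unique(items)); otherwise the loop could never end.
sample_until_complete <- function(items, n) {
  items <- unique(items)
  stopifnot(n >= length(items))
  repeat {
    draw <- sample(items, n, replace = TRUE)
    # Accept the draw only if no item is missing.
    if (all(items %in% draw)) return(draw)
  }
}

set.seed(1)  # only to make the illustration reproducible
res <- sample_until_complete(letters[1:3], 10)
res
```

When n is close to the number of unique items, many draws are rejected, so the forced-inclusion approach is faster in that regime.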

R: are there built-in functions to sort lists?

In R I have produced the following list L:
>L
[[1]]
[1] "A" "B" "C"
[[2]]
[1] "D"
[[3]]
[1] NULL
I would like to turn the list L into a data frame df like
>df
df[,1] df[,2]
"A" 1
"B" 1
"C" 1
"D" 2
where the 2nd column gives the position in the list L of the corresponding element in column 1.
My question is: are there built-in R functions that can do this manipulation quickly? I can do it using "brute force", but my solution does not scale well to much bigger lists.
I thank you all!
You'll get a warning because of your NULL value, but you can use stack if you give your list items names:
L <- list(c("A", "B", "C"), "D", NULL)
stack(setNames(L, seq_along(L)))
# values ind
# 1 A 1
# 2 B 1
# 3 C 1
# 4 D 2
# Warning message:
# In stack.default(setNames(L, seq_along(L))) :
# non-vector elements will be ignored
If the warning displeases you, you can, of course, run stack on the non-NULL elements, but do it after you name your list elements so that the "ind" column reflects the correct value.
I'll show in 2 steps just for clarity:
names(L) <- seq_along(L)
stack(L[!sapply(L, is.null)])
Similarly, if you've gotten rid of the NULL list elements, you can use melt from "reshape2". You don't gain anything in brevity, and I'm not sure that you gain anything in efficiency either, but I thought I'd share it as an option.
library(reshape2)
names(L) <- seq_along(L)
melt(L[!sapply(L, is.null)])
Ananda's answer is seemingly better than this, but I'll put it up anyway:
> cbind(unlist(L), rep(1:length(L), sapply(L, length)))
[,1] [,2]
[1,] "A" "1"
[2,] "B" "1"
[3,] "C" "1"
[4,] "D" "2"
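The cbind call above returns a character matrix, so the positions become strings. Here is a minimal sketch, assuming a data frame with an integer position column is preferred. The column names value and pos are illustrative:

```r
L <- list(c("A", "B", "C"), "D", NULL)

df <- data.frame(
  # unlist() drops the NULL element and flattens the rest.
  value = unlist(L),
  # lengths() is 0 for NULL, so position 3 is repeated zero times.
  pos = rep(seq_along(L), times = lengths(L)),
  stringsAsFactors = FALSE
)
df
```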
