Match vectors in sequence - r

I have 2 vectors.
x=c("a", "b", "c", "d", "a", "b", "c")
y=structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Names = c("a", "e", "b",
"c", "d", "a", "b", "c"))
I would like to match a to a, b to b, and so on, in sequence, so that x[2] matches y[3] rather than y[7], and x[5] matches y[6] rather than y[1].
lapply(x, function(z) grep(z, names(y), fixed=T))
gives:
[[1]]
[1] 1 6
[[2]]
[1] 3 7
[[3]]
[1] 4 8
[[4]]
[1] 5
[[5]]
[1] 1 6
[[6]]
[1] 3 7
[[7]]
[1] 4 8
which matches all instances. How do I get this sequence:
1 3 4 5 6 7 8
So that elements in x can be mapped to the corresponding values in y accordingly?

You are actually looking for pmatch
pmatch(x,names(y))
[1] 1 3 4 5 6 7 8
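A minimal sketch of why this works, assuming the x and y from the question: with the default duplicates.ok = FALSE, pmatch() uses each entry of the table at most once, so repeated values in x walk through the repeated names in order. One caveat worth checking against your own data is that pmatch() also accepts unique partial matches, which match() does not.

```r
x <- c("a", "b", "c", "d", "a", "b", "c")
y <- c(a = 1, e = 2, b = 3, c = 4, d = 5, a = 6, b = 7, c = 8)

# duplicates.ok = FALSE (the default): a name that has been matched is not reused
idx <- pmatch(x, names(y))
vals <- y[idx]   # values of y in the order requested by x

# Caveat: a unique partial match also counts as a match for pmatch()
partial <- pmatch("ap", c("apple", "banana"))
exact   <- match("ap", c("apple", "banana"))
```

If the names may be prefixes of one another, an approach based on exact matching may be the safer choice.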

You can change the names attributes according to the number of times each element appeared and then subset y:
x2 <- paste0(x, ave(x, x, FUN=seq_along))
#[1] "a1" "b1" "c1" "d1" "a2" "b2" "c2"
names(y) <- paste0(names(y), ave(names(y), names(y), FUN=seq_along))
y[x2]
#a1 b1 c1 d1 a2 b2 c2
# 1 3 4 5 6 7 8
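The same idea can be sketched without renaming y: tag every value with its running occurrence count on both sides and let match() compare the tags exactly. The helper name occurrence_key is hypothetical, and the sketch assumes that the names never contain a tab character, which is used as the separator.

```r
x <- c("a", "b", "c", "d", "a", "b", "c")
y <- c(a = 1, e = 2, b = 3, c = 4, d = 5, a = 6, b = 7, c = 8)

# Hypothetical helper: "a" becomes "a\t1" the first time it appears, "a\t2" the second, ...
occurrence_key <- function(v) {
  paste(v, ave(seq_along(v), v, FUN = seq_along), sep = "\t")
}

# Exact matching on (value, occurrence) pairs; names(y) is left untouched
idx <- match(occurrence_key(x), occurrence_key(names(y)))
```

Here idx holds the positions in y, so y[idx] returns the mapped values.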

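Another option using Reduce. Note that it feeds the previously matched value of y back in as a position, so it relies on the values of y coinciding with their positions (as they do here) and on each match lying after the previous one: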
Reduce(function(v, k) y[-seq_len(v)][k],
x=x[-1L],
init=y[x[1L]],
accumulate=TRUE)

Well, I did it with a for-loop
#Initialise the vector with length same as x.
answer <- numeric(length(x))
for (i in seq_along(x)) {
#match the ith element of x with that of names in y.
answer[i] <- match(x[i], names(y))
#Replace the name of the matched element with an empty string so that the
#next time you encounter it you get the next index.
names(y)[answer[i]] <- ""
}
answer
#[1] 1 3 4 5 6 7 8

Another possibility:
l <- lapply(x, grep, x = names(y), fixed = TRUE)
i <- as.integer(ave(x, x, FUN = seq_along))
mapply(`[`, l, i)
which gives:
[1] 1 3 4 5 6 7 8

Similar to Ronak's solution, but it does not persist changes to y:
yFoo<-names(y)
sapply(x,function(u){res<-match(u,yFoo);yFoo[res]<<-"foo";return(res)})
Result
#a b c d a b c
#1 3 4 5 6 7 8

Related

R - identify sequences in a vector

Suppose I have a vector ab containing A's and B's. I want to identify sequences and create a vector v with length(ab) that indicates the sequence length at the beginning and end of a given sequence and NA otherwise.
I have however the restriction that another vector x with 0/1 will indicate that a sequence ends.
So for example:
rep("A", 6)
"A" "A" "A" "A" "A" "A"
x <- c(0,0,1,0,0,0)
0 0 1 0 0 0
should give
v <- c(3, NA, 3, 3, NA, 3)
An example could be the following:
ab <- c(rep("A", 5), "B", rep("A", 3))
"A" "A" "A" "A" "A" "B" "A" "A" "A"
x <- c(rep(0,3),1,0,1,rep(0,3))
0 0 0 1 0 1 0 0 0
Here the output should be:
4 NA NA 4 1 1 3 NA 3
(without the restriction it would be)
5 NA NA NA 5 1 3 NA 3
So far, my code without the restriction looks like this:
ab <- c(rep("A", 5), "B", rep("A", 3))
x <- c(rep(0,3),1,0,1,rep(0,3))
cng <- ab[-1L] != ab[-length(ab)] # is there a change in A and B w.r.t the previous value?
idx <- which(cng) # where do the changes take place?
idx <- c(idx,length(ab)) # include the last value
seq_length <- diff(c(0, idx)) # how long are the sequences?
# create v
v <- rep(NA, length(ab))
v[idx] <- seq_length # sequence end
v[idx-(seq_length-1)] <- seq_length # sequence start
v
Does anyone have an idea how I can implement the restriction? (Since my vector has 2 million observations, I also wonder whether there is a more efficient way than my approach.)
I would appreciate any comments. Many thanks in advance!
You may do something like this (note that this example uses slightly longer vectors than the question):
x <- c(rep(0,3),1,rep(0,2),1,rep(0,3))
ab <- c(rep("A", 5), "B", rep("A", 4))
#creating result of lengths
res <- as.numeric(ave(ab, rev(cumsum(rev(x))), FUN = function(z){with(rle(z), rep(lengths, lengths))}))
> res
[1] 4 4 4 4 1 1 1 3 3 3
#creating intermediate NAs
replace(res, with(rle(res), setdiff(seq_along(res), c(length(res) + 1 - cumsum(rev(lengths)),
cumsum(lengths),
which(res == 1)))), NA)
[1] 4 NA NA 4 1 1 1 3 NA 3
As per edited scenario
x <- c(rep(0,3),1,rep(0,2),1,rep(0,3))
ab <- c(rep("A", 5), "B", rep("A", 4))
ab[3] <- 'B'
as.numeric(ave(ab, rev(cumsum(rev(x))), FUN = function(z){with(rle(z), rep(lengths, lengths))}))
[1] 2 2 1 1 1 1 1 3 3 3
ab
[1] "A" "A" "B" "A" "A" "B" "A" "A" "A" "A"
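As a rough alternative sketch, the restriction can also be folded into the asker's own vectorised approach: a run is treated as ending wherever ab changes, wherever x is 1, or at the last element. The helper name mark_runs is hypothetical, and the sketch assumes non-empty vectors of equal length without NAs.

```r
# Hypothetical helper, not taken from the posts above
mark_runs <- function(ab, x) {
  n <- length(ab)
  # Position i closes a run if ab changes right after i, if x[i] == 1,
  # or if i is the last element
  is_end <- c(ab[-1L] != ab[-n] | x[-n] == 1, TRUE)
  ends   <- which(is_end)
  len    <- diff(c(0L, ends))   # run lengths
  starts <- ends - len + 1L     # first position of each run
  v <- rep(NA_integer_, n)
  v[ends]   <- len
  v[starts] <- len
  v
}

v1 <- mark_runs(rep("A", 6), c(0, 0, 1, 0, 0, 0))
v2 <- mark_runs(c(rep("A", 5), "B", rep("A", 3)),
                c(rep(0, 3), 1, 0, 1, rep(0, 3)))
```

Everything here is a single pass of vectorised operations, so it should scale to vectors with millions of elements, although that has not been benchmarked here.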

Naming objects from functions

I am a beginner in R. I have a vast data set and I am trying to write a loop to save time.
I have something like:
a <- c ('exam12', 'example22', 'e33')
b <- list (c (2,4,5,6), c (10,4,8,6), c (25, 3, 7, 30))
And I would like to use the strings of a as the name of objects for other values, obtaining, in my environment, something like:
exam <- c (2,4,5,6)
example <- c (10,4,8,6)
e <- c (25, 3, 7, 30)
I tried the following:
for (i in seq_along (a)) {
for (j in seq_along (b)) {
str_sub (a [i], start = 1, end = -1) <- b [j]
}
}
But I was not successful. I appreciate any help.
You can use list2env:
a <- c ('exam12', 'example22', 'e33')
b <- list (c (2,4,5,6), c (10,4,8,6), c (25, 3, 7, 30))
a
# [1] "exam12" "example22" "e33"
b
# [[1]]
# [1] 2 4 5 6
#
# [[2]]
# [1] 10 4 8 6
#
# [[3]]
# [1] 25 3 7 30
ls()
# [1] "a" "b"
list2env(setNames(b, sub("\\d+$", "", a)), .GlobalEnv)
# <environment: R_GlobalEnv>
ls()
# [1] "a" "b" "e" "exam" "example"
exam
# [1] 2 4 5 6
For reference, you could also do this with assign, for example:
for (i in seq_along(a)) {
assign(sub("\\d+$", "", a[i]), b[[i]])
}
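If the objects do not strictly have to live in the global environment, a named list (or a dedicated environment) may be easier to loop over afterwards. A minimal sketch, assuming the same a and b as above; the names vals and env are illustrative only:

```r
a <- c("exam12", "example22", "e33")
b <- list(c(2, 4, 5, 6), c(10, 4, 8, 6), c(25, 3, 7, 30))

# Strip the trailing digits and use the result as list names
vals <- setNames(b, sub("\\d+$", "", a))
exam_values <- vals[["exam"]]

# The same objects in their own environment instead of .GlobalEnv
env <- list2env(vals, envir = new.env())
e_values <- env[["e"]]
```

Keeping the values together this way means later steps can use lapply(vals, ...) rather than looking objects up by name.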

Applying a function across nested list

Say, I have the following list
raw <- list(list(1:2, 2:3, 3:4), list(4:5, 5:6, 6:7), list(7:8, 8:9, 9:10))
I would like to find the mean of the corresponding entries of the out-most list. The expected output would be something like
[[1]]
[1] 4 5
[[2]]
[1] 5 6
[[3]]
[1] 6 7
This is because the mean of 1:2, 4:5, and 7:8 would be 4:5.
I have been experimenting with stuff like lapply(raw, function(x) lapply(x, mean)), but apparently it doesn't return the desired output.
This is pretty ugly, but we can use mapply to iterate over the lists in parallel; to do so we need to expand the outer list into separate arguments via do.call:
do.call("mapply", c(function(...) rowMeans(data.frame(...)), raw, SIMPLIFY=FALSE))
You can make this prettier using the purrr package
purrr::pmap(raw, ~rowMeans(data.frame(...)))
Option 1:
n = length(raw[[1]])
lapply(1:n, function(i){
d = do.call(rbind, lapply(seq_along(raw), function(j){
raw[[j]][[i]]
}))
apply(d, 2, mean)
})
#[[1]]
#[1] 4 5
#[[2]]
#[1] 5 6
#[[3]]
#[1] 6 7
Option 2:
aggregate(. ~ ind, do.call(rbind, lapply(raw, function(x)
data.frame(cbind(do.call(rbind, x), ind = seq_along(x))))), mean)
# ind V1 V2
#1 1 4 5
#2 2 5 6
#3 3 6 7
You could put the thing into an array and take the cell medians (I suppose you want these instead of means).
A <- array(matrix(unlist(raw), 2, byrow=FALSE), dim=c(2, 3, 3))
v.mds <- t(apply(A, 1:2, median))
lapply(1:3, function(x) v.mds[x, ])
# [[1]]
# [1] 4 5
#
# [[2]]
# [1] 5 6
#
# [[3]]
# [1] 6 7
Generalized like so:
A <- array(matrix(unlist(raw), length(el(el(raw))), byrow=FALSE),
dim=c(length(el(el(raw))), el(lengths(raw)), length(raw)))
v.mds <- t(apply(A, 1:2, median))
lapply(1:nrow(v.mds), function(x) v.mds[x, ])
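For the mean specifically, another base R sketch is to add the inner lists element by element and divide by the number of outer entries. It assumes every inner list has the same length, the corresponding vectors have the same length, and there are no NAs.

```r
raw <- list(list(1:2, 2:3, 3:4), list(4:5, 5:6, 6:7), list(7:8, 8:9, 9:10))

# Element-wise sum of corresponding vectors across the outer list
sums <- Reduce(function(a, b) Map(`+`, a, b), raw)

# Divide each summed vector by the number of outer entries to get the means
means <- lapply(sums, `/`, length(raw))
```

This avoids building intermediate data frames or arrays, at the cost of only working for statistics that can be expressed as a running sum.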

print rearranged dataframe to screen with loops or `purrr::map()`

I have data in a form like this (reproducible code below):
#> y x char
#> 1 1 1 a
#> 2 1 2 b
#> 3 1 3 c
#> 4 2 1 d
#> 5 2 2 e
#> 6 2 3 f
#> 7 3 1 g
#> 8 3 2 h
#> 9 3 3 i
df <- data.frame(stringsAsFactors=FALSE,
y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
char = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)
df
Is there an easy way to print to the screen using y values as the y axis, and x values as the x axis and the third column (char) as the values? A solution with map() would be great.
So my desired output would look like
abc
def
ghi
I started trying to loop through y and x, with a view to using purrr::map(), but I haven't gotten very far.
if (df$y==1 & df$x==1){
print(df$char)
}
That's what tidyr::spread() is for (in current tidyr releases it is superseded by pivot_wider()):
tidyr::spread(df, x, char)
You can also convert your data.frame into a matrix:
a <- matrix(
data = df$char,
nrow = length(unique(df$y)),
ncol = length(unique(df$x)),
dimnames = list(unique(df$y), unique(df$x)),
byrow = TRUE
)
it will be:
1 2 3
1 "a" "b" "c"
2 "d" "e" "f"
3 "g" "h" "i"
To concatenate each row into a single string, as you wish:
for (r in 1:nrow(a)) {
print(paste(a[r, ], collapse = ''))
}
[1] "abc"
[1] "def"
[1] "ghi"
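To print the rows without the [1] prefix and quotes, one possible sketch is to sort by y and x, paste the characters within each y, and write the result with cat(). It assumes every (y, x) pair occurs exactly once; purrr::map_chr() could play the role of vapply() here.

```r
df <- data.frame(
  y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  char = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  stringsAsFactors = FALSE
)

# Make sure the characters are in row-major order before pasting
ord <- df[order(df$y, df$x), ]

# One string per value of y
rows <- vapply(split(ord$char, ord$y), paste, character(1), collapse = "")

# One line per row, no quotes or index prefix
cat(rows, sep = "\n")
```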

Recursively set dimnames on a list of matrices

On a list of matrices, I'd like to set only the colnames and leave the rownames as NULL. The matrices all have different dimensions. Unlike this example, the names are specific to each matrix.
provideDimnames gets me in the ballpark, but I'm having trouble telling it to ignore the NULL row names, and only set the column names. Here are my attempts.
> L <- list(matrix(1:6, 2), matrix(1:20, 5))
> dimnm <- list(list(NULL, letters[1:3]), list(NULL, letters[1:4]))
> lapply(L, provideDimnames, base = dimnm)
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, provideDimnames, base = list(dimnm))
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, provideDimnames, base = list(letters))
# [[1]]
# a b c
# a 1 3 5
# b 2 4 6
#
# [[2]]
# a b c d
# a 1 6 11 16
# b 2 7 12 17
# c 3 8 13 18
# d 4 9 14 19
# e 5 10 15 20
Almost, but I want [n,] for the row names. The desired result is:
> dimnames(L[[1]]) <- list(NULL, letters[1:3])
> dimnames(L[[2]]) <- list(NULL, letters[1:4])
> L
# [[1]]
# a b c
# [1,] 1 3 5
# [2,] 2 4 6
#
# [[2]]
# a b c d
# [1,] 1 6 11 16
# [2,] 2 7 12 17
# [3,] 3 8 13 18
# [4,] 4 9 14 19
# [5,] 5 10 15 20
> lapply(L, provideDimnames, base = list(NULL, letters))
# Error in make.unique(base[[ii]][1L + (ss%%M[ii])], sep = sep) :
# 'names' must be a character vector
> lapply(L, `colnames<-`, , letters)
# Error in FUN(X[[1L]], ...) :
# unused argument (c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
# "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"))
Is there a way to do this with provideDimnames()? setNames() wouldn't accept a list for the dim-names either.
How about something like this?
L <- list(matrix(1:6, 2), matrix(1:20, 5))
nms <- list(letters[1:3], letters[23:26])
mapply(function(X,Y) {colnames(X) <-Y; X}, L, nms, SIMPLIFY = FALSE)
[[1]]
a b c
[1,] 1 3 5
[2,] 2 4 6
[[2]]
w x y z
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
You can do this relatively easily, but you are complicating it by trying to set both dimnames when really you just want to change the column names. I would go about it this way:
## different dimnames; list of only the colnames
dimnm <- list(letters[1:3], letters[1:4])
## function to lapply which does the change
cnames <- function(i, lmat, names) {
colnames(lmat[[i]]) <- names[[i]]
lmat[[i]]
}
## do the change
L2 <- lapply(seq_along(L), cnames, lmat = L, names = dimnm)
L2
Gives us:
> L2
[[1]]
a b c
[1,] 1 3 5
[2,] 2 4 6
[[2]]
a b c d
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
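A shorter sketch along the same lines, assuming a list nms that holds one character vector of column names per matrix: Map() walks the matrices and the names in parallel, so the replacement function `colnames<-` can be applied directly, which is what the lapply() attempt in the question could not do.

```r
L <- list(matrix(1:6, 2), matrix(1:20, 5))
nms <- list(letters[1:3], letters[1:4])

# Guard: each name vector must have one entry per column
stopifnot(lengths(nms) == vapply(L, ncol, integer(1)))

# Pairs L[[i]] with nms[[i]]; the row names stay NULL
L2 <- Map(`colnames<-`, L, nms)
```

Map() always returns a list, so the result keeps its shape even when the matrices happen to share the same dimensions.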
