Merging two consecutive values - r

I have a vector of different values, and I would like to merge and add two values together if a 5 is followed by a 3.
Input:
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
Expected output:
1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
So as you can see, each 5 that is followed by a 3 has been merged with that 3 and the pair replaced by their sum, 8. I'm sure there is a simple function that will do this in a matter of seconds; I just wasn't able to find it.
Thanks in advance!

vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
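# flag each 5 that is immediately followed by a 3
# (padded with FALSE so the index is as long as the vector and is never recycled)
fives <- c(head(vector, -1) == 5 & tail(vector, -1) == 3, FALSE)
# add three to fives
vector[fives] <- vector[fives] + 3
# remove threes, i.e. the elements right after a flagged 5
vector <- vector[c(TRUE, !head(fives, -1))]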
vector
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8

Here is one possibility:
x <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
A <- rbind(x[-length(x)], x[-1])
id <- which( colSums( abs(A - c(5, 3)) ) == 0 )
x[rbind(id, id + 1L)] <- c(8, NA)
na.omit(x)
This solution is meant to be easy to extend to general cases. (It may not best meet the OP's needs, but I did it as an exercise.)
In general, if you want to match a chunk xc in a vector x, we can do:
A <- t(embed(x, length(xc)))
id <- which(colSums(abs(A - rev(xc))) == 0)
Now id gives you the starting index of the matching chunk in x.
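The general recipe above can be wrapped in a small function. This is only a sketch: the name find_chunk is hypothetical (not from the answer), and it compares elements with != instead of summing absolute differences, so it assumes x and xc contain no NA values.

```r
# Hypothetical helper: start positions of every occurrence of chunk `xc` in `x`.
find_chunk <- function(x, xc) {
  k <- length(xc)
  if (k == 0 || k > length(x)) return(integer(0))
  # embed() returns one window per row, with each window reversed,
  # so every row is compared against rev(xc)
  windows <- embed(x, k)
  target <- rep(rev(xc), each = nrow(windows))
  which(rowSums(windows != target) == 0)
}

x <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
find_chunk(x, c(5, 3))
# [1]  7 18
```

Both positions hold a 5 that is followed by a 3, which matches the id computed in the two-element case.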

vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
# indices of 5s followed by 3s, in reverse order so that removals do not shift the remaining indices
idx <- rev(which(head(vector, -1) == 5 & tail(vector, -1) == 3))
for (i in idx) {
  vector <- vector[-(i + 1)] # remove the three
  vector[i] <- 8 # replace the five with an eight
}
vector
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8

Related

Finding Values Present in Two or More Unequal-Length Vectors

I have the following two numeric vectors:
A <- c(1, 3, 5, 7, 9)
B <- c(2, 3, 4, 5, 6, 10, 12, 13)
I want to generate a new vector C that contains the values that are present in both A and B (not the positions at which these values are found). The result should be:
C <- c(3, 5)
I also want to generate a vector D containing the values present in A but not present in B and a vector E containing the values present in B but not A.
D <- c(1, 7, 9)
E <- c(2, 4, 6, 10, 12, 13)
What is the best way to do this using base R? Thanks!
You can use the base R function intersect().
In addition, generally speaking, I wouldn't use C as a variable name, as it is really close to c(), which might cause you problems.
A <- c(1, 3, 5, 7, 9)
B <- c(2, 3, 4, 5, 6, 10, 12, 13)
Inter <- intersect(A, B)
Inter
[1] 3 5
For the opposite of intersect():
# taken from here: https://www.r-bloggers.com/outersect-the-opposite-of-rs-intersect-function/
outersect <- function(x, y) {
  sort(c(setdiff(x, y),
         setdiff(y, x)))
}
outersect(A, B)
[1] 1 2 4 6 7 9 10 12 13
A <- c(1, 3, 5, 7, 9)
B <- c(2, 3, 4, 5, 6, 10, 12, 13)
D <- A[!A %in% B] # values in A but not in B
E <- B[!B %in% A] # values in B but not in A
Which yields
> D
[1] 1 7 9
> E
[1] 2 4 6 10 12 13
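For completeness, all three requested vectors can be built with the base R set functions. A minimal sketch; the names both, only_A and only_B are chosen here for illustration and are not from the question.

```r
A <- c(1, 3, 5, 7, 9)
B <- c(2, 3, 4, 5, 6, 10, 12, 13)

both   <- intersect(A, B) # values present in A and in B
only_A <- setdiff(A, B)   # values present in A but not in B
only_B <- setdiff(B, A)   # values present in B but not in A

both
# [1] 3 5
only_A
# [1] 1 7 9
only_B
# [1]  2  4  6 10 12 13
```

One difference from the %in% approach: setdiff() and intersect() drop duplicated values, whereas A[!A %in% B] keeps every matching element of A, repeats included.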

Group matching numbers in random order in R

I'm working on a Monte-Carlo simulation type problem and need to generate a vector of repeated random numbers, with the matching numbers grouped together, but in random order.
It's easier to explain with an example. If I had:
1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12
I would like it sorted as:
7, 7, 7, 3, 3, 3, 12, 12, 12, 1, 1, 1 (or with the groups of matching numbers in any order but ascending/descending).
The reason I need the random order is because my MC simulation is for 2 variables, so if both are in order they won't vary independently.
I've got as far as:
sort(rep(runif(50,1,10),10), decreasing = FALSE)
Which generates 50 random numbers between 1 and 10, repeats each 10 times, then sorts the 50 groups of 10 matching random numbers in ascending order (or it could easily be descending order if I changed "FALSE" to "TRUE"). I just can't figure out the last step of getting 50 groups of 10 matching numbers in random order. Can anyone help?
Here is one option with split
unlist(sample(split(v1, v1)), use.names = FALSE)
#[1] 3 3 3 1 1 1 12 12 12 7 7 7
Or another option is match with unique
v1[order(match(v1, sample(unique(v1))))]
data
v1 <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
An option could be as:
v <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
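# split by the values themselves; split(v, unique(v)) only works when v is an exact repetition of its unique values
lst <- split(v, v)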
sapply(sample(seq(length(lst)),length(lst)), function(i)lst[[i]])
#     [,1] [,2] [,3] [,4]
#[1,]    3   12    7    1
#[2,]    3   12    7    1
#[3,]    3   12    7    1
#OR for having just a vector
as.vector(sapply(sample(seq(length(lst)),length(lst)), function(i)lst[[i]]))
#[1] 3 3 3 12 12 12 7 7 7 1 1 1
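If the values come straight from runif(), as in the question, no reshuffling may be needed at all: the draws are already in random order, so repeating each one in place keeps the groups together. A sketch under that assumption; set.seed() is there only to make the example reproducible.

```r
set.seed(42) # only for reproducibility of this sketch

values <- runif(50, 1, 10)         # 50 random numbers, already in random order
grouped <- rep(values, each = 10)  # each value repeated 10 times, groups kept together

length(grouped)
# [1] 500
```

The each argument is what differs from the question's rep(runif(50, 1, 10), 10), which repeats the whole sequence instead of each element.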

R Replicating until length is met

Say we have the following:
a=c( 1, 9, 5, 7, 8, 11)
length(a) ## 6
and I want to obtain:
a_desired=c( 1, 1, 9, 9, 5, 5, 7, 7, 8, 11)
length(a_desired) ## 10
Basically it stops replicating when it reaches the desired length, in this case 10.
If the desired length is 14,
a_desired=c( 1, 1, 1, 9, 9, 9, 5, 5, 7, 7, 8, 8, 11, 11)
Does anyone have a suggestion on how to obtain this, or perhaps a link to something similar asked before? (I'm not too sure what keyword to look for.)
You could write your own function to do something like this
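extend_to <- function(x, len) {
  stopifnot(len > 0)
  # every element is repeated at least `times` times
  times <- len %/% length(x)
  each <- rep(times, length(x))
  # the first `more` elements get one extra copy
  more <- len - sum(each)
  if (more > 0) {
    each[1:more] <- each[1:more] + 1
  }
  rep(x, each)
}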
a <- c( 1, 9, 5, 7, 8, 11)
extend_to(a, 6)
# [1] 1 9 5 7 8 11
extend_to(a, 10)
# [1] 1 1 9 9 5 5 7 7 8 11
extend_to(a, 14)
# [1] 1 1 1 9 9 9 5 5 7 7 8 8 11 11
extend_to(a, 2)
# [1] 1 9
We use rep() to repeat each element a certain number of times.
So if your sequence is currently of length M and you want length N > M, then you have these possibilities:
M < N <= 2M: double the first (N-M) items, keep the rest single
2M < N <= 3M: triple the first (N-2M) items, double the rest
3M < N <= 4M: quadruple the first (N-3M) items, triple the rest.
and so on.
So first, divide the target length by the current length and take the floor: every item is repeated at least that many times. Then give one extra copy to each of the first (N mod M) items.
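a <- c(1, 9, 5, 7, 8, 11)
m <- length(a)
n <- 10 # desired new length
r <- n %% m # number of leading items that get one extra copy
# seq_len() keeps this correct when r is 0 (a[1:0] would wrongly select a[1])
new_a <- c(
  rep(a[seq_len(r)], each = n %/% m + 1),
  rep(a[r + seq_len(m - r)], each = n %/% m))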
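The same rule can also be written in one line by working on positions instead of counts. This is an alternative sketch, not the method from either answer; extend_by_index is a hypothetical name. Recycling the positions 1..M up to length N and sorting them gives every position its base number of copies, plus one extra for the first (N mod M) positions.

```r
# Hypothetical helper: recycle the positions of x to length n, then sort them
# so that copies of the same element end up next to each other.
extend_by_index <- function(x, n) {
  x[sort(rep_len(seq_along(x), n))]
}

a <- c(1, 9, 5, 7, 8, 11)
extend_by_index(a, 10)
# [1]  1  1  9  9  5  5  7  7  8 11
extend_by_index(a, 14)
# [1]  1  1  1  9  9  9  5  5  7  7  8  8 11 11
```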

Compare frequencies of samples in r

I would like to compare the frequency of samples from two different observations. The problem is that the first doesn't contain the whole range of numbers of the second. How could I combine these without writing a for loop sorting them based on the x values returned by count?
Here's a MWE for clarification:
library(plyr)
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)
a.count <- count(a)
b.count <- count(b)
My desired result should look something like this:
  freq.a freq.b
1             1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2
If you put your data in long format (one row per observation, with a variable for which sample it is from), then you can just make a contingency table:
library(magrittr) # provides %>%
data.frame(v = a, s = 'a') %>% rbind(data.frame(v = b, s = 'b')) %>%
  xtabs(formula = ~ v + s)
Produces:
   s
v    a  b
  1  0  1
  2  1  1
  3  2  3
  4 10  2
  5 13  2
  6  4  7
  7  3  2
df <- merge(a.count, b.count, by ='x', all=TRUE)[2:3]
names(df) <- c('freq.a', 'freq.b')
df
  freq.a freq.b
1     NA      1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2
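A base R variant that needs neither plyr nor a pipe is to tabulate both samples against the same set of levels, so that a value missing from one sample is counted as 0 instead of being dropped. A sketch only; the names lv and freq are chosen here for illustration.

```r
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# every value that occurs in either sample, in ascending order
lv <- sort(union(a, b))

# table() on a factor with fixed levels reports 0 for levels that never occur
freq <- data.frame(
  freq.a = as.vector(table(factor(a, levels = lv))),
  freq.b = as.vector(table(factor(b, levels = lv))),
  row.names = as.character(lv)
)
freq
```

Unlike the merge() result, the missing value 1 in a shows up as 0 and not as NA.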

How to divide group in R

If I have a series like this:
s = {9, 4, 6, 5, 3, 10, 5, 3, 5}
I want to divide it into groups at each occurrence of the number 5, so that in the end I get:
s1 = {9, 4, 6, 5}
s2 = {5, 3, 10, 5}
s3 = {5, 3, 5}
I have already tried
cut(ss,ss==5)
What am I supposed to do? Which function can I use?
Here's an approach to generate a list containing the three vectors:
# the original vector
s <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
# an index vector
idx <- unique(c(1, which(s == 5), length(s)))
# create a list
mylist <- lapply(seq(length(idx) - 1), function(i) s[idx[i]:idx[i+1]])
mylist
# [[1]]
# [1] 9 4 6 5
# [[2]]
# [1] 5 3 10 5
# [[3]]
# [1] 5 3 5
You can access the list elements with [[, e.g., mylist[[1]] for the first vector.
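The same idea can be packaged as a function. This is a sketch, with split_at as a hypothetical name; it pairs each cut point with the next one through Map(), which also returns an empty list (instead of failing) when the vector has a single element and therefore no pair of cut points.

```r
# Hypothetical helper: split `s` into overlapping pieces that start and end
# at each occurrence of `value` (plus the first and last element).
split_at <- function(s, value) {
  idx <- unique(c(1, which(s == value), length(s)))
  # pair every cut point with the following one
  unname(Map(function(from, to) s[from:to], head(idx, -1), tail(idx, -1)))
}

s <- c(9, 4, 6, 5, 3, 10, 5, 3, 5)
pieces <- split_at(s, 5)
pieces[[2]]
# [1]  5  3 10  5
```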
