Updating empirical cumulative function - r

I have the following problem:
given a stream of observations, find the number of observations that are less than or equal to the most recent observation. For example, if the streaming observations are
8, 1, 10, 3, 9, 7, 4, 5, 6, 2
then we have the following updates
Observations - 8: there is 1 observation less than or equal to 8
Observations - 8, 1: there is 1 observation less than or equal to 1
Observations - 8, 1, 10: there are 3 observations less than or equal to 10
...
As a result, one would obtain the values
1, 1, 3, 2, 4, 3, 3, 4, 5, 2
The solution should be very fast, as I am working with a huge dataset.

Use a for loop, but in the reverse direction. I haven't tested it, but I think it is faster.
xx <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)
res <- vector('integer', length = length(xx))
for (i in rev(seq_along(xx))) {
  res[i] <- sum(xx[i] >= xx)  # count values <= the last remaining one
  xx <- xx[-i]                # then drop it and move one step back
}
res
# [1] 1 1 3 2 4 3 3 4 5 2

You can use sapply:
vec <- c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2)
sapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]))
# [1] 1 1 3 2 4 3 3 4 5 2
Since performance is important, you can also use vapply. It might be faster (untested):
vapply(seq_along(vec), function(x) sum(vec[seq(x)] <= vec[x]), integer(1))
# [1] 1 1 3 2 4 3 3 4 5 2

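I couldn't leave well enough alone, so I created a kludgemonster:
carl <- function(vec) {
  vlen <- length(vec)
  newct <- vector('integer', length = vlen)
  for (j in seq_len(vlen)) {
    # every position from j onward whose value is >= vec[j] gains one count
    wins <- which(vec[j:vlen] >= vec[j]) + j - 1
    newct[wins] <- newct[wins] + 1L
  }
  newct  # return the counts (a bare for loop returns NULL)
}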
It appears to work, but...
Rgames> set.seed(20)
Rgames> vec<-runif(2000)
Rgames> microbenchmark(carl(vec),agstudy(vec),times=10)
Unit: milliseconds
         expr      min       lq   median       uq      max neval
    carl(vec) 86.75314 87.55323 88.16816 88.80831 89.65117    10
 agstudy(vec) 70.26213 70.83771 71.06158 71.72247 71.93800    10
Still not quite as good as agstudy's code. Maybe someone can tighten up my loop?
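If the quadratic loops above are still too slow for a huge dataset, one possible alternative (not taken from the answers above, and untested for speed) is a binary indexed tree, also known as a Fenwick tree, which needs O(n log n) work instead of O(n^2). The sketch below is only an illustration: the name running_rank is made up, and it assumes x contains no NA values.

```r
# Sketch: running count of observations <= the current one,
# using a Fenwick (binary indexed) tree over the ranks of x.
running_rank <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  r <- match(x, sort(unique(x)))   # dense ranks 1..k (assumes no NA)
  k <- max(r)
  tree <- integer(k)
  res <- integer(n)
  for (i in seq_len(n)) {
    # insert the current observation: add 1 at position r[i]
    j <- r[i]
    while (j <= k) {
      tree[j] <- tree[j] + 1L
      j <- j + bitwAnd(j, -j)      # jump to the next node covering j
    }
    # prefix sum: how many inserted values have rank <= r[i]
    j <- r[i]
    s <- 0L
    while (j > 0L) {
      s <- s + tree[j]
      j <- j - bitwAnd(j, -j)      # strip the lowest set bit
    }
    res[i] <- s
  }
  res
}

running_rank(c(8, 1, 10, 3, 9, 7, 4, 5, 6, 2))
# [1] 1 1 3 2 4 3 3 4 5 2
```

Because the loop is still interpreted R, the gain would only show up for long vectors; porting the same two while loops to compiled code would be the natural next step if this turns out to be the bottleneck.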

Related

average calculation for many data in R

I have a file to which the results are saved:
4
4
4
4
5
4
4
5
6
4
4
5
5
6
4
I would like to calculate the average for each group of 5 consecutive items.
Unfortunately, I only managed to calculate the average over all values.
The values are saved in the file wynik2.txt:
wynik_epidemii <- read.table(file="wynik2.txt")
wynik_epidemii<- mean(as.numeric(unlist(wynik_epidemii)))
You can use tapply, defining the grouping factor with a cumsum trick.
meanN <- function(x, n = 5){
f <- cumsum(rep(c(1, rep(0, n - 1)), length.out = length(x)))
tapply(x, f, mean)
}
meanN(x)
# 1 2 3
#4.2 4.6 4.8
DATA.
x <-
c(4, 4, 4, 4, 5, 4, 4, 5, 6, 4, 4, 5, 5, 6, 4)
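To see what the cumsum trick produces, here is a small sketch that prints the grouping factor and compares it with an equivalent, arguably more readable, construction based on ceiling(). The variable names f and g are only for illustration.

```r
x <- c(4, 4, 4, 4, 5, 4, 4, 5, 6, 4, 4, 5, 5, 6, 4)
n <- 5

# cumsum trick: a 1 at the start of every block of n, then a running total
f <- cumsum(rep(c(1, rep(0, n - 1)), length.out = length(x)))
f
# [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

# same grouping from the position of each element
g <- ceiling(seq_along(x) / n)
stopifnot(all(f == g))

group_means <- tapply(x, g, mean)
group_means
#   1   2   3
# 4.2 4.6 4.8
```

Both versions also cope with a last group that has fewer than n items; that group is simply averaged over whatever it contains.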

Count occurrence of multiple numbers in vector one by one

I have two vectors
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
I want to know how many times each element in b occurs in a. So the result should be
c(3, 3, 2, 1, 2, 0)
All methods I found, like match(), ==, %in% etc., are not suited for entire vectors. I know I can use a loop over all elements in b,
counts <- integer(length(b))
for (i in seq_along(b)) {
  counts[i] <- sum(a == b[i], na.rm = TRUE)
}
but this is used often and takes too long. That's why I'm looking for a vectorized way, or a way to use apply().
You can do this using factor and table
table(factor(a, unique(b)))
#
#1 2 3 4 5 6
#3 3 2 1 2 0
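Since you mentioned match, here is a possibility without an sapply loop (thanks to @thelatemail). match() returns positions in b, so the factor levels are the positions, labelled with the values of b:
table(factor(match(a, b), levels = seq_along(b), labels = b))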
#
#1 2 3 4 5 6
#3 3 2 1 2 0
Here is a base R option, using sapply with which:
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)
sapply(b, function(x) length(which(a == x)))
[1] 3 3 2 1 2 0
Here is a vectorised method
x = expand.grid(b,a)
rowSums( matrix(x$Var1 == x$Var2, nrow = length(b)))
# [1] 3 3 2 1 2 0
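The same all-pairs comparison can be written a little more directly with outer(), which is part of base R. This is only a sketch of the idea; like the expand.grid version, it builds a length(b) by length(a) matrix, so it may use a lot of memory for long vectors.

```r
a <- c(1, 5, 2, 1, 2, 3, 3, 4, 5, 1, 2)
b <- c(1, 2, 3, 4, 5, 6)

# outer() compares every element of b (rows) with every element of a (columns)
counts <- rowSums(outer(b, a, "=="))
counts
# [1] 3 3 2 1 2 0
```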

Group matching numbers in random order in R

I'm working on a Monte-Carlo simulation type problem and need to generate a vector of repeated random numbers, with the matching numbers grouped together, but in random order.
It's easier to explain with an example. If I had:
1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12
I would like it sorted as:
7, 7, 7, 3, 3, 3, 12, 12, 12, 1, 1, 1 (or with the groups of matching numbers in any order but ascending/descending).
The reason I need the random order is because my MC simulation is for 2 variables, so if both are in order they won't vary independently.
I've got as far as:
sort(rep(runif(50,1,10),10), decreasing = FALSE)
Which generates 50 random numbers between 1 and 10, repeats each 10 times, then sorts the 50 groups of 10 matching random numbers in ascending order (or it could easily be descending order if I changed "FALSE" to "TRUE"). I just can't figure out the last step of getting 50 groups of 10 matching numbers in random order. Can anyone help?
Here is one option with split
unlist(sample(split(v1, v1)), use.names = FALSE)
#[1] 3 3 3 1 1 1 12 12 12 7 7 7
Or another option is match with unique
v1[order(match(v1, sample(unique(v1))))]
data
v1 <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
An option could be as:
v <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
lst <- split(v, v)  # split by value; unique(v) as the grouping only works if v repeats in a fixed cycle
sapply(sample(seq(length(lst)),length(lst)), function(i)lst[[i]])
# [,1] [,2] [,3] [,4]
#[1,] 3 12 7 1
#[2,] 3 12 7 1
#[3,] 3 12 7 1
#OR for having just a vector
as.vector(sapply(sample(seq(length(lst)),length(lst)), function(i)lst[[i]]))
#[1] 3 3 3 12 12 12 7 7 7 1 1 1
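For the simulation described in the question, it may be enough to skip the sort entirely: runif() already returns its values in random order, so repeating each one with each = 10 keeps the groups together without ordering them. The sketch below shows that, plus a small helper for reshuffling the groups of an existing vector. The name shuffle_groups is made up for illustration, and set.seed() is there only to make the example reproducible.

```r
set.seed(1)                        # only for reproducibility of the sketch
vals <- runif(50, 1, 10)           # 50 random values, already in random order
sim  <- rep(vals, each = 10)       # 50 groups of 10 matching numbers, unsorted

# hypothetical helper: shuffle whole groups of an existing vector
shuffle_groups <- function(v) {
  u <- unique(v)
  u <- u[sample.int(length(u))]    # avoids sample()'s special case for a single number
  v[order(match(v, u))]
}

v <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
out <- shuffle_groups(v)
```

The result out contains the same values as v, with equal values adjacent and the groups in a random order.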

R Replicating until length is met

Say we have the following:
a=c( 1, 9, 5, 7, 8, 11)
length(a) ## 6
and I want to obtain:
a_desired=c( 1, 1, 9, 9, 5, 5, 7, 7, 8, 11)
length(a_desired) ## 10
Basically it stops replicating when it reaches the desired length, in this case 10.
If the desired length is 14,
a_desired=c( 1, 1, 1, 9, 9, 9, 5, 5, 7, 7, 8, 8, 11, 11)
Does anyone have a suggestion on how to obtain this, or perhaps a link to something similar asked before? (I'm not too sure what keywords to look for.)
You could write your own function to do something like this
extend_to <- function(x, len) {
stopifnot(len>0)
times = len %/% length(x)
each <- rep(times, length(x))
more <- len-sum(each)
if (more>0) {
each[1:more] <- each[1:more]+1
}
rep(x, each)
}
a <- c( 1, 9, 5, 7, 8, 11)
extend_to(a, 6)
# [1] 1 9 5 7 8 11
extend_to(a, 10)
# [1] 1 1 9 9 5 5 7 7 8 11
extend_to(a, 14)
# [1] 1 1 1 9 9 9 5 5 7 7 8 8 11 11
extend_to(a, 2)
# [1] 1 9
We use the rep() to repeat each element a certain number of times.
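As a small illustration of that last point: when the second argument of rep() is a vector of the same length as x, each element is repeated its own number of times. The counts below are written out by hand for the length-10 case.

```r
a <- c(1, 9, 5, 7, 8, 11)
counts <- c(2, 2, 2, 2, 1, 1)   # per-element repeat counts, summing to 10
res <- rep(a, counts)
res
# [1]  1  1  9  9  5  5  7  7  8 11
```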
So if your sequence is currently of length M and you want length N > M, then you have these possibilities:
N <= 2M: double the first (N-M) items
2M < N <= 3M: triple the first (N-2M) items, double the rest
3M < N <= 4M: quadruple the first (N-3M) items, triple the rest.
and so on.
So first, divide the target length by the current length, take the floor, and repeat each element that many times. Then add one extra copy of each of the first (remainder) items.
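a <- c(1, 9, 5, 7, 8, 11)
m <- length(a)
n <- 10  # desired new length
r <- n %% m  # number of leading items that get one extra copy
# seq_len() avoids the a[1:0] trap when n is an exact multiple of m
new_a <- c(
  rep(a[seq_len(r)], each = n %/% m + 1),
  rep(a[r + seq_len(m - r)], each = n %/% m))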
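Another way to get the same result, offered only as a sketch, is to recycle the index vector to the target length and sort it: the first (remainder) indices then appear once more than the rest. The name extend_by_index is made up, and the function assumes x is non-empty.

```r
# hypothetical helper: recycle the indices to length len, then sort them
extend_by_index <- function(x, len) {
  idx <- sort(rep_len(seq_along(x), len))
  x[idx]
}

a <- c(1, 9, 5, 7, 8, 11)
extend_by_index(a, 10)
# [1]  1  1  9  9  5  5  7  7  8 11
extend_by_index(a, 14)
# [1]  1  1  1  9  9  9  5  5  7  7  8  8 11 11
```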

Merging two consecutive values

I have a vector of different values, and I would like to merge and add two values together if a 5 is followed by a 3.
Input:
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
Expected output:
1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
So as you can see, the two occurrences of a three following a 5 have been added together to show 8. I'm sure there is a simple function that will do this in a matter of seconds, I just wasn't able to find it.
Thanks in advance!
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
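# get indices where 5 is followed by 3 (padded with FALSE to the full length,
# so that a shorter logical index is not silently recycled)
fives <- c(head(vector, -1) == 5 & tail(vector, -1) == 3, FALSE)
# add three to fives
vector[fives] <- vector[fives] + 3
# remove the threes that follow them
vector <- vector[!c(FALSE, head(fives, -1))]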
vector
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
Here is one possibility:
x <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
A <- rbind(x[-length(x)], x[-1])
id <- which( colSums( abs(A - c(5, 3)) ) == 0 )
x[rbind(id, id + 1L)] <- c(8, NA)
na.omit(x)
This solution is meant to be easy to extend to general cases. (It may not best meet the OP's needs; I did it as an exercise.)
In general, if you want to match a chunk xc in a vector x, you can do:
A <- t(embed(x, length(xc)))
id <- which(colSums(abs(A - rev(xc))) == 0)
Now id gives you the starting index of the matching chunk in x.
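Wrapped into a function, the general chunk match might look like the sketch below. The name find_chunk is made up for illustration, and like the snippet above it assumes numeric input without NA values.

```r
# hypothetical helper: starting indices of every occurrence of chunk xc in x
find_chunk <- function(x, xc) {
  k <- length(xc)
  if (k == 0L || length(x) < k) return(integer(0))
  A <- t(embed(x, k))              # column i holds x[i + k - 1], ..., x[i]
  which(colSums(abs(A - rev(xc))) == 0)
}

x <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
id <- find_chunk(x, c(5, 3))
id
# [1]  7 18

# merge each matched pair into a single 8
merged <- x
if (length(id) > 0) {
  merged[id] <- 8
  merged <- merged[-(id + 1)]
}
merged
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
```

The length check before the negative index matters: with no matches, x[-integer(0)] would return an empty vector instead of leaving x unchanged.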
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
temp = rev(which(head(vector, -1) == 5 & vector[-1] == 3)) # find indexes of 5s followed by 3s (equal lengths, so no recycling warning)
for (i in temp){
  vector = vector[-(i+1)] # remove threes
  vector[i] = 8 # replace fives with eights
}
vector
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
