Compare frequencies of samples in R

I would like to compare the frequency of samples from two different observations. The problem is that the first doesn't contain the whole range of numbers of the second. How could I combine these without writing a for loop sorting them based on the x values returned by count?
Here's a MWE for clarification:
library(plyr)
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)
a.count <- count(a)
b.count <- count(b)
My desired result should look something like this (row labels are the observed values):
  freq.a freq.b
1             1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2

If you put your data in long format (one row per observation, with a variable for which sample it is from), then you can just make a contingency table:
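library(magrittr)  # provides %>% (also available via dplyr)
data.frame(v = a, s = 'a') %>% rbind(data.frame(v = b, s = 'b')) %>%
  xtabs(formula = ~ v + s)  # the piped data frame is matched to xtabs's data argument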
Produces:
   s
v    a  b
  1  0  1
  2  1  1
  3  2  3
  4 10  2
  5 13  2
  6  4  7
  7  3  2

df <- merge(a.count, b.count, by ='x', all=TRUE)[2:3]
names(df) <- c('freq.a', 'freq.b')
df
  freq.a freq.b
1     NA      1
2      1      1
3      2      3
4     10      2
5     13      2
6      4      7
7      3      2
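
A base R alternative, sketched here under the assumption that a zero count (rather than NA) is acceptable for values missing from one sample: tabulate both vectors against a shared set of levels. The names lv and res are illustrative only and do not come from the answers above.

```r
a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# shared levels: every value seen in either sample
lv <- sort(union(a, b))

# table() on a factor reports a count for every level, including zeros
res <- data.frame(
  x      = lv,
  freq.a = as.vector(table(factor(a, levels = lv))),
  freq.b = as.vector(table(factor(b, levels = lv)))
)
res
```

This needs no extra packages. If NA is preferred over 0 for absent values, the merge approach with all=TRUE keeps that distinction.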

Related

Vector of repeated index values

I have a vector of the following form:
a <- c(4, 6, 3, 6, 1)
I want to build a vector in which each index of a is repeated as many times as the value stored at that index.
For example, the first index has value 4, so there should be 4 ones, followed by 6 twos, followed by 3 threes, and so on.
The resulting vector should have the following form:
b <- c(1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5)
Thanks in advance.
We can use rep:
a <- c(4, 6, 3, 6, 1)
rep(seq_along(a), a)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5
We can use sequence
cumsum(sequence(a) == 1)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 3 4 4 4 4 4 4 5
Or using uncount
library(dplyr)
library(tidyr)
tibble(a) %>%
mutate(rn = row_number()) %>%
uncount(a)
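
As a quick sanity check, and as a sketch of one edge case the question does not mention (a count of zero, which is an assumption here), the rep result can be inverted with tabulate, and the rep and cumsum(sequence()) variants can be compared. The names back, a0, r0 and c0 are illustrative.

```r
a <- c(4, 6, 3, 6, 1)
b <- rep(seq_along(a), a)

# tabulate() counts how often each index occurs, which recovers a
back <- tabulate(b, nbins = length(a))

# hypothetical input containing a zero count
a0 <- c(2, 0, 1)

# rep() skips index 2 entirely
r0 <- rep(seq_along(a0), a0)        # 1 1 3

# the cumsum(sequence()) variant numbers the runs consecutively instead
c0 <- cumsum(sequence(a0) == 1)     # 1 1 2
```

So if zeros can occur and the original index matters, rep(seq_along(a), a) is the safer choice of the two.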

average calculation for many data in R

I have a file to which the results are saved:
4
4
4
4
5
4
4
5
6
4
4
5
5
6
4
I would like to calculate the average for each group of 5 consecutive items.
Unfortunately, I have only managed to calculate the average over all values.
The values are saved in the file wynik2.txt.
wynik_epidemii <- read.table(file="wynik2.txt")
wynik_epidemii<- mean(as.numeric(unlist(wynik_epidemii)))
You can use tapply, defining the grouping factor with a cumsum trick.
meanN <- function(x, n = 5){
  # a 1 at the start of every block of n values; cumsum turns these into group ids
  f <- cumsum(rep(c(1, rep(0, n - 1)), length.out = length(x)))
  tapply(x, f, mean)
}
meanN(x)
# 1 2 3
#4.2 4.6 4.8
Data:
x <- c(4, 4, 4, 4, 5, 4, 4, 5, 6, 4, 4, 5, 5, 6, 4)
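
The grouping factor can also be built directly from the positions, which some readers may find easier to follow. This is a minimal sketch; the names grp, group_means and col_means are illustrative, and the comment about reading the file assumes one number per line.

```r
# in practice x might be read from the file, e.g. x <- scan("wynik2.txt")
# (assumption: the file holds one number per line)
x <- c(4, 4, 4, 4, 5, 4, 4, 5, 6, 4, 4, 5, 5, 6, 4)
n <- 5

# positions 1..5 -> group 1, 6..10 -> group 2, and so on
grp <- ceiling(seq_along(x) / n)
group_means <- tapply(x, grp, mean)

# alternative when length(x) is an exact multiple of n:
# fill a matrix column by column and average each column
col_means <- colMeans(matrix(x, nrow = n))
```

Both give 4.2, 4.6 and 4.8 for this data. The tapply version also handles a final incomplete group, while the matrix version assumes the length divides evenly.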

Group matching numbers in random order in R

I'm working on a Monte-Carlo simulation type problem and need to generate a vector of repeated random numbers, with the matching numbers grouped together, but in random order.
It's easier to explain with an example. If I had:
1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12
I would like it sorted as:
7, 7, 7, 3, 3, 3, 12, 12, 12, 1, 1, 1 (or with the groups of matching numbers in any order but ascending/descending).
The reason I need the random order is because my MC simulation is for 2 variables, so if both are in order they won't vary independently.
I've got as far as:
sort(rep(runif(50,1,10),10), decreasing = FALSE)
Which generates 50 random numbers between 1 and 10, repeats each 10 times, then sorts the 50 groups of 10 matching random numbers in ascending order (or it could easily be descending order if I changed "FALSE" to "TRUE"). I just can't figure out the last step of getting 50 groups of 10 matching numbers in random order. Can anyone help?
Here is one option with split
unlist(sample(split(v1, v1)), use.names = FALSE)
#[1] 3 3 3 1 1 1 12 12 12 7 7 7
Or another option is match with unique
v1[order(match(v1, sample(unique(v1))))]
data
v1 <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
An option could be as:
v <- c(1, 3, 7, 12, 1, 3, 7, 12, 1, 3, 7, 12)
lst <- split(v, v) # group equal values; split(v, unique(v)) only works when v repeats in a fixed cycle
sapply(sample(seq(length(lst)),length(lst)), function(i)lst[[i]])
# [,1] [,2] [,3] [,4]
#[1,] 3 12 7 1
#[2,] 3 12 7 1
#[3,] 3 12 7 1
#OR for having just a vector
as.vector(sapply(sample(seq(length(lst)),length(lst)), function(i)lst[[i]]))
#[1] 3 3 3 12 12 12 7 7 7 1 1 1
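
For the simulation described in the question, it may be possible to avoid sorting altogether: runif already returns its values in random order, so repeating each value in place keeps the groups together without ordering them. A minimal sketch, assuming the 50 values and 10 repeats from the question; set.seed is only there to make the example reproducible.

```r
set.seed(1)  # reproducibility of this sketch only

vals <- runif(50, 1, 10)

# each = 10 repeats every value 10 times in a row, keeping the random order
out <- rep(vals, each = 10)

# if the distinct values arrive already sorted, shuffle them first
sorted_vals <- sort(vals)
out2 <- rep(sample(sorted_vals), each = 10)
```

The difference from the code in the question is each = 10: rep(x, 10) repeats the whole vector, whereas rep(x, each = 10) repeats every element consecutively.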

Merging two consecutive values

I have a vector of different values, and I would like to merge and add two values together if a 5 is followed by a 3.
Input:
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
Expected output:
1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
So as you can see, the two occurrences of a three following a 5 have been added together to show 8. I'm sure there is a simple function that will do this in a matter of seconds, I just wasn't able to find it.
Thanks in advance!
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
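# flag each 5 that is immediately followed by a 3
# (padded with FALSE so the flag has the same length as vector and is not recycled)
fives <- c(head(vector, -1) == 5 & tail(vector, -1) == 3, FALSE)
# add three to fives
vector[fives] <- vector[fives] + 3
# remove the threes that directly follow a flagged five
vector <- vector[!c(FALSE, head(fives, -1))]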
vector
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
Here is one possibility:
x <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
A <- rbind(x[-length(x)], x[-1])
id <- which( colSums( abs(A - c(5, 3)) ) == 0 )
x[rbind(id, id + 1L)] <- c(8, NA)
na.omit(x)
This solution was proposed to make it easier to extend to general cases. (It may not best meet the OP's need, but I did it as an exercise.)
In general, if you want to match a chunk xc in a vector x, you can do:
A <- t(embed(x, length(xc)))
id <- which(colSums(abs(A - rev(xc))) == 0)
Now id gives you the starting index of the matching chunk in x.
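
The general recipe above can be wrapped in a small helper. This is a sketch for numeric vectors only; find_chunk is a hypothetical name, and it assumes length(xc) is no larger than length(x), since embed would otherwise raise an error.

```r
# hypothetical helper: starting indices of every occurrence of xc in x
find_chunk <- function(x, xc) {
  # each column of A holds one window of x, in reversed order
  A <- t(embed(x, length(xc)))
  # a window matches when it differs from rev(xc) nowhere
  which(colSums(abs(A - rev(xc))) == 0)
}

x <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)

pos_53  <- find_chunk(x, c(5, 3))     # 7 18
pos_445 <- find_chunk(x, c(4, 4, 5))  # 12
```

The rev(xc) is needed because embed stores each window with the most recent element first.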
vector <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
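# find indexes of 5s followed by 3s, in reverse order so that earlier indexes stay valid
# (vector[-1] is padded with NA so both sides have the same length)
temp <- rev(which(vector == 5 & c(vector[-1], NA) == 3))
for (i in temp) {
  vector <- vector[-(i + 1)] # remove threes
  vector[i] <- 8 # replace fives with eights
}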
vector
# [1] 1 2 7 4 3 8 8 2 6 9 4 4 5 6 2 6 8
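
For reuse, the vectorised idea can be packaged as a function and checked on a few edge cases. This is only a sketch: merge_pair and its arguments are hypothetical names, and it assumes first and second are different values so that matches cannot overlap.

```r
# hypothetical helper: replace every `first` directly followed by `second`
# with their sum, dropping the `second`
merge_pair <- function(x, first = 5, second = 3) {
  n <- length(x)
  if (n < 2) return(x)
  # TRUE at each `first` that is followed by `second`; padded to length n
  hit <- c(x[-n] == first & x[-1] == second, FALSE)
  x[hit] <- first + second
  # drop the element right after each hit
  x[!c(FALSE, hit[-n])]
}

v <- c(1, 2, 7, 4, 3, 8, 5, 3, 2, 6, 9, 4, 4, 5, 6, 2, 6, 5, 3)
merged <- merge_pair(v)

# edge cases: a match at the very start, and no match at all
start_case <- merge_pair(c(5, 3, 1))  # 8 1
none_case  <- merge_pair(c(1, 2))     # 1 2
```

Using logical masks rather than negative indices avoids the pitfall that x[-integer(0)] returns an empty vector when nothing matches.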

Group sequence of integers

I have a bunch of observations
x = c(1, 2, 4, 1, 6, 7, 11, 11, 12, 13, 14)
that I want to turn into the group:
y = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
I.e., I want the first 5 integers (1 to 5) to constitute one group, the next 5 integers (6 to 10) to constitute the next group, and so on.
Is there a straightforward way to accomplish this without a loop?
Clarification: I need to programmatically create the groups from the input vector (x)
We can use %/% to create the group (subtracting 1 first so that 5, 10, ... stay in the lower group)
(x - 1) %/% 5 + 1
#[1] 1 1 1 1 2 2 3 3 3 3 3
You can use ceiling to create groups
ceiling(x/5)
# [1] 1 1 1 1 2 2 3 3 3 3 3
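
The example vector contains no multiples of 5, so it does not show how a formula behaves at the group boundaries. A small check with boundary values, using illustrative names, makes the difference visible:

```r
# boundary values: 5 and 10 should close groups 1 and 2
x <- c(1, 5, 6, 10, 11)

g_ceiling <- ceiling(x / 5)        # 1 1 2 2 3
g_intdiv  <- (x - 1) %/% 5 + 1     # 1 1 2 2 3

# integer division without the shift moves 5 and 10 into the next group
g_naive   <- x %/% 5 + 1           # 1 2 2 3 3
```

With groups defined as 1 to 5, 6 to 10, and so on, the first two expressions agree with the definition, while the unshifted integer division is off by one at every multiple of 5.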

Resources