Counting occurrences for each element without any loop - r

I have the following vector V1 <- c(1, 2, 3, 1, 4, 5, 5, 2, 1).
I would like to have a vector V2 of the same size that for each element indicates how often the corresponding element in V1 occurs, i.e. V2 should be c(3, 2, 1, 3, 1, 2, 2, 2, 3).
I know a multitude of ways to obtain this result, however all of them include using a loop, e.g.
V2 <- c()
for(k in 1:length(V1)){
V2[k] <- length(which(V1 == V1[k]))
}
My question is: Is there a more elegant way of doing it, i.e. without using a loop?

Summing up the solutions provided
# @bouncyball
ave(V1, V1, FUN = length)
# [1] 3 2 1 3 1 2 2 2 3
# @AntoniosK
table(V1)[V1]
# V1
# 1 2 3 1 4 5 5 2 1
# 3 2 1 3 1 2 2 2 3
# @Roman
library(tidyverse)
# as.tibble() is deprecated; build the tibble directly with a column named value
tibble(value = V1) %>% group_by(value) %>% mutate(V2 = n())
# A tibble: 9 x 2
# Groups: value [5]
# value V2
# <dbl> <int>
# 1 1 3
# 2 2 2
# 3 3 1
# 4 1 3
# 5 4 1
# 6 5 2
# 7 5 2
# 8 2 2
# 9 1 3
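One caveat about table(V1)[V1], offered as a hedged note rather than a correction of the answer: the index V1 is used positionally, which happens to work here because the values are exactly the integers 1 to 5. A minimal sketch, assuming a hypothetical vector V3 whose values are not 1..n, indexes the table by name instead:

```r
# V3 is a hypothetical example vector, not from the question
V3 <- c(10, 20, 10)
tab <- table(V3)

# Positional indexing asks for the 10th and 20th cells of a 2-cell table
by_position <- as.vector(tab[V3])

# Indexing by name looks up the cells labelled "10" and "20"
by_name <- as.vector(tab[as.character(V3)])

by_name
# [1] 2 1 2
```

The ave(V1, V1, FUN = length) solution does not have this limitation, since it groups by value rather than by position.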

Related

Divide data in to chunks with multiple values in each chunk in R

I have a data frame with observations spanning three years, with a column df$week that indicates the week of each observation. (The week count of the second year continues from the count of the first, so the data contains 207 weeks.)
I would like to divide the data into longer time periods, in a column df$period that groups all observations from several weeks.
If a period were three weeks long and the data included 13 observations over six weeks, the idea would be to divide
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
into
periods <- list(c(1, 1, 1, 2, 2, 3, 3), c(4, 5, 5, 6, 6, 6))
periods
[[1]]
[1] 1 1 1 2 2 3 3
[[2]]
[1] 4 5 5 6 6 6
To look something like
> df
week period
1 1 1
2 1 1
3 1 1
4 2 1
5 2 1
6 3 1
7 3 1
8 4 2
9 5 2
10 5 2
11 6 2
12 6 2
13 6 2
>
The data contains over 13k rows, so I would need some sort of map in the style of
mapPeriod <- function(df, fun) {
out <- vector("list", length(df))  # "vector_of_weeks" is not a valid mode; "list" is assumed here
for (i in seq_along(df)) {
out[i] <- fun(df[[i]])
}
out
}
I just don't know what to put in fun to divide the weeks into the chosen sequences of periods. Can the function rep be of assistance here? How?
I would be very grateful for all input and suggestions.
split(weeks, f = (weeks - 1) %/% 3)
$`0`
[1] 1 1 1 2 2 3 3
$`1`
[1] 4 5 5 6 6 6
from comments below
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
df <- data.frame(weeks)
library(data.table)
df$period <- data.table::rleid((weeks - 1) %/% 3)
# weeks period
# 1 1 1
# 2 1 1
# 3 1 1
# 4 2 1
# 5 2 1
# 6 3 1
# 7 3 1
# 8 4 2
# 9 5 2
# 10 5 2
# 11 6 2
# 12 6 2
# 13 6 2
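The key step in both answers is the integer division (weeks - 1) %/% 3, which maps weeks 1 to 3 to 0, weeks 4 to 6 to 1, and so on. As a minimal sketch, assuming the period length is stored in a hypothetical variable period_length, the period number can also be computed in base R without data.table:

```r
weeks <- c(1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6)
df <- data.frame(week = weeks)

period_length <- 3  # hypothetical name; number of weeks per period

# Shift to 0-based weeks, integer-divide, then shift back to 1-based periods
df$period <- (df$week - 1) %/% period_length + 1

df$period
# [1] 1 1 1 1 1 1 1 2 2 2 2 2 2
```

Unlike rleid(), this numbering leaves a gap when a whole period has no observations, which may or may not be what you want.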

How to split the data 1 1 2 2 3 3 to 1 2 3 1 2 3 in R? [duplicate]

I want to convert a vector:
1 1 2 2 3 3
to
1 2 3 1 2 3
How to do it? Many thanks.
You can use a matrix to lay out the original vector by rows and then convert it back to a vector to get the desired result.
v = c(1,1,2,2,3,3)
v2 = as.vector(matrix(v, nrow = length(unique(v)), byrow = TRUE))
v2
[1] 1 2 3 1 2 3
The length(unique(v)) is there to generalize how many rows the matrix should have and not hardcode a 3.
Another example:
v = c(1,1,1,2,2,2,3,3,3,4,4,4)
v2 = as.vector(matrix(v, nrow = length(unique(v)), byrow = TRUE))
v2
[1] 1 2 3 4 1 2 3 4 1 2 3 4
We can use rbind/split
c(do.call(rbind, split(v1, v1)))
#[1] 1 2 3 1 2 3
Or, if the elements are replicated an unequal number of times, order by the rowid
library(data.table)
v1[order(rowid(v1))]
#[1] 1 2 3 1 2 3
Or with base R
v1[order(ave(v1, v1, FUN = seq_along))]
#[1] 1 2 3 1 2 3
data
v1 <- c(1, 1, 2, 2, 3, 3)
vec <- c(1, 1, 2, 2, 3, 3)
rep(unique(vec), 2)
[1] 1 2 3 1 2 3
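The rep(unique(vec), 2) answer assumes every value appears exactly twice. As a short sketch of the base R ave approach on a vector with unequal counts (v3 is a hypothetical illustration, not from the question):

```r
v3 <- c(1, 1, 1, 2, 3, 3)

# Number each occurrence within its own value: 1 2 3 1 1 2
occurrence <- ave(v3, v3, FUN = seq_along)

# Ordering by occurrence number puts all first occurrences first,
# then all second occurrences, and so on; ties keep their original order
result <- v3[order(occurrence)]
result
# [1] 1 2 3 1 3 1
```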

Count frequency of each element in vector

I'm looking for a way to count the frequency of each element in a vector.
ex <- c(2,2,2,3,4,5)
Desired outcome:
[1] 3 3 3 1 1 1
Is there a simple command for this?
rep(table(ex), table(ex))
# 2 2 2 3 4 5
# 3 3 3 1 1 1
If you don't want the labels you can wrap in as.vector()
as.vector(rep(table(ex), table(ex)))
# [1] 3 3 3 1 1 1
I'll add (because it seems related somehow) that if you only wanted consecutive values, you could use rle instead of table:
ex2 = c(2, 2, 2, 3, 4, 2, 2, 3, 4, 4)
rep(rle(ex2)$lengths, rle(ex2)$lengths)
# [1] 3 3 3 1 1 2 2 1 2 2
As pointed out in comments, for a large vector calculating a table can be expensive, so doing it only once is more efficient:
tab = table(ex)
rep(tab, tab)
# 2 2 2 3 4 5
# 3 3 3 1 1 1
You can use
ex <- c(2,2,2,3,4,5)
outcome <- ave(ex, ex, FUN = length)
This is what thelatemail suggested. Also similar to the answer at this question
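Note that rep(table(ex), table(ex)) lines up with ex only when equal values are adjacent and already sorted, as they are in the example. A small sketch on a hypothetical unsorted vector ex3 suggests that ave keeps the counts aligned with the original positions:

```r
ex3 <- c(2, 3, 2, 5, 2, 4)  # hypothetical unsorted input

tab <- table(ex3)
from_table <- as.vector(rep(tab, tab))   # counts in sorted-value order
from_ave <- ave(ex3, ex3, FUN = length)  # counts in original order

from_table
# [1] 3 3 3 1 1 1
from_ave
# [1] 3 1 3 1 3 1
```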

Add a column for counting unique tuples in the data frame [duplicate]

Suppose I have the following data frame:
userID <- c(1, 1, 3, 5, 3, 5)
A <- c(2, 3, 2, 1, 2, 1)
B <- c(2, 3, 1, 0, 1, 0)
df <- data.frame(userID, A, B)
df
# userID A B
# 1 1 2 2
# 2 1 3 3
# 3 3 2 1
# 4 5 1 0
# 5 3 2 1
# 6 5 1 0
I would like to create a data frame with the same columns but with an added final column that counts up the number of unique tuples / combinations of the other columns. The output should look like the following:
userID A B count
1 2 2 1
1 3 3 1
3 2 1 2
5 1 0 2
The meaning is that the tuple / combination of (1, 2, 2) occurs with count=1, while the tuple of (3, 2, 1) occurs twice so has count=2. I would prefer not to use any external packages.
1) aggregate
ag <- aggregate(count ~ ., cbind(count = 1, df), length)
ag[do.call("order", ag), ] # sort the rows
giving:
userID A B count
3 1 2 2 1
4 1 3 3 1
2 3 2 1 2
1 5 1 0 2
The last line of code which sorts the rows could be omitted if the order of the rows is unimportant.
The remaining solutions use the indicated packages:
2) sqldf
library(sqldf)
Names <- toString(names(df))
fn$sqldf("select *, count(*) count from df group by $Names order by $Names")
giving:
userID A B count
1 1 2 2 1
2 1 3 3 1
3 3 2 1 2
4 5 1 0 2
The order by clause could be omitted if the order is unimportant.
3) dplyr
library(dplyr)
# regroup() has been removed from dplyr; group_by(across(everything())) groups by all columns
df %>% group_by(across(everything())) %>% summarise(count = n())
giving:
Source: local data frame [4 x 4]
Groups: userID, A
userID A B count
1 1 2 2 1
2 1 3 3 1
3 3 2 1 2
4 5 1 0 2
4) data.table
library(data.table)
data.table(df)[, list(count = .N), by = names(df)]
giving:
userID A B count
1: 1 2 2 1
2: 1 3 3 1
3: 3 2 1 2
4: 5 1 0 2
Here's a fairly straightforward way (ave to the rescue!):
unique(cbind(df,
count = ave(rep(1, nrow(df)),
do.call(paste, df),
FUN = length)))
# userID A B count
# 1 1 2 2 1
# 2 1 3 3 1
# 3 3 2 1 2
# 4 5 1 0 2
Here's a variation of the above:
unique(within(df, {
counter <- rep(1, nrow(df))
count <- ave(counter, df, FUN = length)
rm(counter)
}))
# userID A B count
# 1 1 2 2 1
# 2 1 3 3 1
# 3 3 2 1 2
# 4 5 1 0 2
userID <- c(1, 1, 3, 5, 3, 5)
A <- c(2, 3, 2, 1, 2, 1)
B <- c(2, 3, 1, 0, 1, 0)
df <- data.frame(userID, A, B)
Make a quick factor of the tuples:
df$AB <- as.factor(paste(df$userID,df$A,df$B, sep=""))
No external packages are needed: take advantage of summary(), store the result as a data frame, then merge the counts onto the original data:
df2 <- as.data.frame(summary(df$AB))
df2 <- data.frame(x=row.names(df2), y=df2[1])
names(df2) <- c("AB", "count")
df <- merge(df, df2, by="AB", all.x=TRUE)
df$AB <- NULL
Almost final output, just has dupes:
df
userID A B count
1 1 2 2 1
2 1 3 3 1
3 3 2 1 2
4 3 2 1 2
5 5 1 0 2
6 5 1 0 2
Lastly, clean up dupes:
df <- df[!duplicated(df), ]
Here you go:
df
userID A B count
1 1 2 2 1
2 1 3 3 1
3 3 2 1 2
5 5 1 0 2
It has been a while since I did this without SQL or plyr. If you can use dplyr or another package later on, do so. Bioconductor has a lot of great sequencing packages if it starts to get more complex.
Hope this helps.
This should do the trick, even if it is a little bit ugly:
vec <- table(apply(df,1,paste,collapse=""))
df2 <- data.frame(do.call(rbind,strsplit(names(vec),"")))
names(df2) <- names(df)
df2$count <- vec
# userID A B count
#1 1 2 2 1
#2 1 3 3 1
#3 3 2 1 2
#4 5 1 0 2
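Several answers above build a key by pasting the columns together with no separator. That works for the single-digit data in the question, but different tuples can collapse into the same key once values have more than one digit. A minimal sketch, assuming a hypothetical data frame df_demo and "_" as a separator that does not occur in the data:

```r
# Hypothetical data where two different tuples collide when pasted with sep = ""
df_demo <- data.frame(userID = c(1, 11), A = c(12, 2), B = c(3, 3))

key_nosep <- do.call(paste, c(df_demo, sep = ""))   # both rows become "1123"
key_sep   <- do.call(paste, c(df_demo, sep = "_"))  # "1_12_3" and "11_2_3"

# Count rows per tuple using the unambiguous key
df_demo$count <- ave(rep(1, nrow(df_demo)), key_sep, FUN = length)
df_demo$count
# [1] 1 1
```

The aggregate and data.table solutions group on the columns themselves, so they avoid this issue altogether.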

Cumulative count of each value [duplicate]

I want to create a cumulative counter of the number of times each value appears.
e.g. say I have the column:
id
1
2
3
2
2
1
2
3
This would become:
id count
1 1
2 1
3 1
2 2
2 3
1 2
2 4
3 2
etc...
The ave function computes a function by group.
> id <- c(1,2,3,2,2,1,2,3)
> data.frame(id,count=ave(id==id, id, FUN=cumsum))
id count
1 1 1
2 2 1
3 3 1
4 2 2
5 2 3
6 1 2
7 2 4
8 3 2
I use id==id to create a vector of all TRUE values, which get converted to numeric when passed to cumsum. You could replace id==id with rep(1,length(id)).
Here is a way to get the counts:
id <- c(1,2,3,2,2,1,2,3)
sapply(1:length(id),function(i)sum(id[i]==id[1:i]))
Which gives you:
[1] 1 1 1 2 3 2 4 2
The dplyr way:
library(dplyr)
foo <- data.frame(id=c(1, 2, 3, 2, 2, 1, 2, 3))
foo <- foo %>% group_by(id) %>% mutate(count=row_number())
foo
# A tibble: 8 x 2
# Groups: id [3]
id count
<dbl> <int>
1 1 1
2 2 1
3 3 1
4 2 2
5 2 3
6 1 2
7 2 4
8 3 2
That ends up grouped by id. If you want it not grouped, add %>% ungroup().
For completeness, adding a data.table way:
library(data.table)
DT <- data.table(id = c(1, 2, 3, 2, 2, 1, 2, 3))
DT[, count := seq(.N), by = id][]
Output:
id count
1: 1 1
2: 2 1
3: 3 1
4: 2 2
5: 2 3
6: 1 2
7: 2 4
8: 3 2
The dataframe I had was too large and the accepted answer kept crashing. This worked for me:
library(plyr)
df$ones <- 1
df <- ddply(df, .(id), transform, cumulative_count = cumsum(ones))
df$ones <- NULL
Function to get the cumulative count of any array, including a non-numeric array:
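cumcount <- function(x){
  cumcount <- numeric(length(x))
  names(cumcount) <- x
  # seq_along() avoids iterating over 1:0 when x is empty
  for(i in seq_along(x)){
    # count how many of the first i elements equal the i-th one
    cumcount[i] <- sum(x[1:i]==x[i])
  }
  return(cumcount)
}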
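The loop above compares each element with all earlier ones, so its cost grows quadratically with the length of the input. As a hedged alternative sketch, the ave idiom from the first answer also handles non-numeric input, because the grouping vector may be a character vector (x here is a hypothetical example):

```r
x <- c("a", "b", "a", "c", "b", "a")

# seq_along(x) supplies a numeric vector for ave to fill; x defines the groups
counts <- ave(seq_along(x), x, FUN = seq_along)
counts
# [1] 1 1 2 1 2 3
```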
