Add observation number by group in R [duplicate]

This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
This is a silly question but I am new to R and it would make my life so much easier if I could figure out how to do this!
So here is some sample data:
data <- read.table(text = "Category Y
A 5.1
A 3.14
A 1.79
A 3.21
A 5.57
B 3.68
B 4.56
B 3.32
B 4.98
B 5.82
",header = TRUE)
I want to add a column that counts the number of observations within a group. Here is what I want it to look like:
Category Y OBS
A 5.1 1
A 3.14 2
A 1.79 3
A 3.21 4
A 5.57 5
B 3.68 1
B 4.56 2
B 3.32 3
B 4.98 4
B 5.82 5
I have tried:
data <- data %>% group_by(Category) %>% mutate(count = c(1:length(Category)))
which just creates another column numbered from 1 to 10, and
data <- data %>% group_by(Category) %>% add_tally()
which just creates another column of all 5s.

Base R:
data$OBS <- ave(seq_len(nrow(data)), data$Category, FUN = seq_along)
data
# Category Y OBS
# 1 A 5.10 1
# 2 A 3.14 2
# 3 A 1.79 3
# 4 A 3.21 4
# 5 A 5.57 5
# 6 B 3.68 1
# 7 B 4.56 2
# 8 B 3.32 3
# 9 B 4.98 4
# 10 B 5.82 5
BTW: one can use any of the frame's columns as the first argument, including ave(data$Category, data$Category, FUN=seq_along), but ave chooses its output class based on the input class, so a character column as the first argument will return character strings:
ave(data$Category, data$Category, FUN = seq_along)
# [1] "1" "2" "3" "4" "5" "1" "2" "3" "4" "5"
While not heinous, it needs to be an intentional choice. Since it appears that you want an integer in that column, I chose the simplest integer-in, integer-out approach. The first argument could equally have been rep(1L, nrow(data)) or anything else that is both integer and the same length as the number of rows in the frame, since seq_along (the function I chose) only looks at its length.
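If you do want to start from the character Category column anyway, a small sketch: wrapping ave()'s character output in as.integer() recovers the integer sequence (reusing the question's data):

```r
# Reconstruct the question's data; stringsAsFactors = FALSE keeps
# Category as character on all R versions.
data <- read.table(text = "Category Y
A 5.1
A 3.14
A 1.79
A 3.21
A 5.57
B 3.68
B 4.56
B 3.32
B 4.98
B 5.82
", header = TRUE, stringsAsFactors = FALSE)

# ave() returns character here because its input is character;
# as.integer() converts the per-group counters back.
data$OBS <- as.integer(ave(data$Category, data$Category, FUN = seq_along))
data$OBS
# [1] 1 2 3 4 5 1 2 3 4 5
```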

library(data.table)
setDT(data)[, OBS := seq_len(.N), by = .(Category)]
data
Category Y OBS
1: A 5.10 1
2: A 3.14 2
3: A 1.79 3
4: A 3.21 4
5: A 5.57 5
6: B 3.68 1
7: B 4.56 2
8: B 3.32 3
9: B 4.98 4
10: B 5.82 5

library(dplyr)
data %>% group_by(Category) %>% mutate(Obs = row_number())
# A tibble: 10 x 3
# Groups: Category [2]
Category Y Obs
<chr> <dbl> <int>
1 A 5.1 1
2 A 3.14 2
3 A 1.79 3
4 A 3.21 4
5 A 5.57 5
6 B 3.68 1
7 B 4.56 2
8 B 3.32 3
9 B 4.98 4
10 B 5.82 5
Or, using ave() on the Category column itself (note that OBS will be character here, since ave matches its output class to its input):
data$OBS <- ave(data$Category, data$Category, FUN = seq_along)
data
Category Y OBS
1 A 5.10 1
2 A 3.14 2
3 A 1.79 3
4 A 3.21 4
5 A 5.57 5
6 B 3.68 1
7 B 4.56 2
8 B 3.32 3
9 B 4.98 4
10 B 5.82 5

Another base R option:
category <- c(rep('A',5),rep('B',5))
sequence <- sequence(rle(as.character(category))$lengths)
data <- data.frame(category=category,sequence=sequence)
head(data,10)
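The same trick can be applied to the question's original frame without rebuilding it; a sketch, with the caveat that rle() numbers consecutive runs, so this only matches the grouped numbering when each Category appears in one contiguous block:

```r
# Reconstruct the question's data
data <- read.table(text = "Category Y
A 5.1
A 3.14
A 1.79
A 3.21
A 5.57
B 3.68
B 4.56
B 3.32
B 4.98
B 5.82
", header = TRUE)

# sequence() of the run lengths restarts the counter at each new run
data$OBS <- sequence(rle(as.character(data$Category))$lengths)
data$OBS
# [1] 1 2 3 4 5 1 2 3 4 5
```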

Related

Fill value of a numeric with the entry of another row based on name

Good day,
I have a dataframe that looks like this;
Community Value Num
<chr> <dbl> <dbl>
1 A 3.54 3
2 A 4.56 3
3 A 2.22 3
4 B 0 NA
5 B 0.76 NA
6 C 1.2 5
I am hoping to fill the Num observation of Community B with the Num observation of Community A while keeping the names as is.
If this is all you need to do, you can try using an if_else() statement. If your greater problem is more complex, you might need to take another approach.
library(dplyr)
df %>%
mutate(Num = if_else(Community == "B", Num[Community == "A"][1], Num))
# Community Value Num
# 1 A 3.54 3
# 2 A 4.56 3
# 3 A 2.22 3
# 4 B 0.00 3
# 5 B 0.76 3
# 6 C 1.20 5
Data:
df <- read.table(textConnection("Community Value Num
A 3.54 3
A 4.56 3
A 2.22 3
B 0 NA
B 0.76 NA
C 1.2 5"), header = TRUE)
If the NA values will always be immediately below the positive values with which you want to replace them, you can use fill:
library(tidyr)
df %>%
fill(Num, .direction = "down")
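As a quick check, a self-contained sketch: fill() simply carries the last non-NA value downward, which happens to reproduce the if_else() result for this sample data:

```r
library(tidyr)

df <- read.table(textConnection("Community Value Num
A 3.54 3
A 4.56 3
A 2.22 3
B 0 NA
B 0.76 NA
C 1.2 5"), header = TRUE)

# Each NA in Num takes the most recent non-NA value above it
filled <- fill(df, Num, .direction = "down")
filled$Num
# [1] 3 3 3 3 3 5
```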

create new var by product of preceding result per group id in dplyr

I have the following data, and what I need is to create a new var new, obtained as the running product of the preceding row values of var z within each group id. E.g. for id x = 1, the first values of column new are 0.9, 0.9*0.1, and 0.9*0.1*0.5.
data <- data.frame(x=c(1,1,1,1,2,2,3,3,3,4,4,4,4),
y=c(4,2,2,6,5,6,6,7,8,2,1,6,5),
z=c(0.9,0.1,0.5,0.12,0.6,1.2,2.1,0.9,0.4,0.8,0.45,1.3,0.85))
desired outcome
x y z new
1 1 4 0.90 0.9000
2 1 2 0.10 0.0900
3 1 2 0.50 0.0450
4 1 6 0.12 0.0054
5 2 5 0.60 0.6000
6 2 6 1.20 0.7200
7 3 6 2.10 2.1000
8 3 7 0.90 1.8900
9 3 8 0.40 0.7560
10 4 2 0.80 0.8000
11 4 1 0.45 0.3600
12 4 6 1.30 0.4680
13 4 5 0.85 0.3978
We can use cumprod from base R:
library(dplyr)
data %>%
group_by(x) %>%
mutate(new = cumprod(z)) %>%
ungroup
Or with base R
data$new <- with(data, ave(z, x, FUN = cumprod))

mutate rnorm with multiple column output

I would like to create random variables using rnorm() with the mean and sd specified in separate columns in a tibble
n <- 2
ti <- tibble(Name = letters[1:10], mean = 1:10, sd = 1:10)
How do I use mutate to add n columns to the tibble with output from rnorm(n, mean, sd) for every row?
(I know I can do this in base R, but am curious to learn how this works using dplyr)
One dplyr and tidyr option could be:
ti %>%
rowwise() %>%
mutate(col_rnorm = list(setNames(rnorm(n, mean, sd), c("col1", "col2")))) %>%
unnest_wider(col_rnorm)
Name mean sd col1 col2
<chr> <int> <int> <dbl> <dbl>
1 a 1 1 -1.73 1.18
2 b 2 2 3.86 0.0943
3 c 3 3 3.54 -0.502
4 d 4 4 3.21 -3.90
5 e 5 5 3.61 9.48
6 f 6 6 7.07 16.1
7 g 7 7 17.4 5.95
8 h 8 8 5.32 13.6
9 i 9 9 19.2 19.8
10 j 10 10 9.67 11.3
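For comparison, a base R sketch of the same idea: draw n values per row with mapply() and bind them on as new columns (the col1/col2 names are just illustrative, matching the dplyr output above):

```r
set.seed(123)  # for reproducibility; the draws themselves are random
n <- 2
ti <- data.frame(Name = letters[1:10], mean = 1:10, sd = 1:10)

# mapply() calls rnorm once per row, giving an n x 10 matrix;
# t() flips it to one row of draws per input row.
draws <- t(mapply(function(m, s) rnorm(n, m, s), ti$mean, ti$sd))
colnames(draws) <- paste0("col", seq_len(n))
out <- cbind(ti, draws)
dim(out)
# [1] 10  5
```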

How can I leave columns untouched with aggregate in R

I have a large dataframe with experiments with different parameters. Each combination of parameters have several executions:
PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof1 2.90 1 4 10 1
prof1 3.02 1 4 10 1
prof1 1.52 1 4 10 2
prof1 1.60 1 4 10 2
...
I am using aggregate to obtain the best time for each combination of profile & nthreads:
data_aggregated <- aggregate(data$TIME,
by = list(PROFILE = data$PROFILE,
NTHREADS = data$NTHREADS),
FUN = min)
That returns a new dataframe like this:
PROFILE NTHREADS TIME
prof1 1 1.52
prof1 2 0.9
prof2 1 1.41
prof2 2 0.88
...
What I want is to obtain the values of PARAM1, PARAM2, PARAM3 for the aggregated row in each case (the one with minimum time). For now, I look in the first dataframe for the row where PROFILE, TIME and NTHREADS equal the ones in the second dataframe, but maybe there is an easier way?
Alternatively, with dplyr:
library(dplyr)
dat <- dat %>%
group_by(PROFILE, NTHREADS) %>%
filter(TIME == min(TIME))
Finally, I've done it following the comment by Ronak Shah. If both dataframes share column names & values (because of aggregating with min instead of mean), the simplest solution is:
data_aggr <- merge(data_aggr, data)
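With recent dplyr (slice_min() appeared in 1.0.0), another way to keep the whole best row per group, PARAM columns included, without a separate merge; a sketch using the question's sample rows:

```r
library(dplyr)

data <- read.table(text = "PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof1 2.90 1 4 10 1
prof1 3.02 1 4 10 1
prof1 1.52 1 4 10 2
prof1 1.60 1 4 10 2", header = TRUE)

# slice_min keeps the row(s) with the smallest TIME per group;
# with_ties = FALSE guarantees exactly one row per group
best <- data %>%
  group_by(PROFILE, NTHREADS) %>%
  slice_min(TIME, n = 1, with_ties = FALSE) %>%
  ungroup()
best$TIME
# [1] 1.52
```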
Consider ave, the method to aggregate across different levels of factors. You can pass multiple groupings as separate arguments:
data <- read.table(text="PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof2 2.90 2 4 10 1
prof1 3.02 1 4 10 1
prof2 1.52 2 4 10 2
prof1 1.60 1 4 10 2", header=TRUE)
data$min_TIME <- ave(data$TIME, data$PROFILE, data$NTHREADS, FUN=min)
data
# PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3 min_TIME
# 1 prof1 3.01 1 4 10 1 1.60
# 2 prof2 2.90 2 4 10 1 1.52
# 3 prof1 3.02 1 4 10 1 1.60
# 4 prof2 1.52 2 4 10 2 1.52
# 5 prof1 1.60 1 4 10 2 1.60

calculate multiple columns mean in R and generate a new table

I have a data set in .csv. It contains multiple columns for example.
Group Wk1 WK2 WK3 WK4 WK5 WK6
A 1 2 3 4 5 6
B 7 8 9 1 2 3
C 4 5 6 7 8 9
D 1 2 3 4 5 6
Then I want the means of several column groups: Wk1 & WK2 together, WK3 alone, WK4 & WK5 together, and WK6 alone.
How can I do that?
The result may like
Group 1 2 3 4
mean 3.75 5.25 4.5 6
And how can I save it into a new table?
Thanks in advance.
You can melt your data.frame, create your groups using some basic indexing, and use aggregate:
library(reshape2)
X <- melt(mydf, id.vars="Group")
Match <- c(Wk1 = 1, WK2 = 1, WK3 = 2, WK4 = 3, WK5 = 3, WK6 = 4)
aggregate(value ~ Match[X$variable], X, mean)
# Match[X$variable] value
# 1 1 3.75
# 2 2 5.25
# 3 3 4.50
# 4 4 6.00
tapply is also an appropriate candidate here:
tapply(X$value, Match[X$variable], mean)
# 1 2 3 4
# 3.75 5.25 4.50 6.00
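The same result without reshaping, as a base R sketch: split.default() splits a data frame's columns (rather than its rows) by the same Match grouping, and each block's values are averaged together (column names follow the sample table):

```r
mydf <- read.table(text = "Group Wk1 WK2 WK3 WK4 WK5 WK6
A 1 2 3 4 5 6
B 7 8 9 1 2 3
C 4 5 6 7 8 9
D 1 2 3 4 5 6", header = TRUE)

Match <- c(Wk1 = 1, WK2 = 1, WK3 = 2, WK4 = 3, WK5 = 3, WK6 = 4)

# split.default() groups the week columns by Match; mean() over
# each block's matrix averages all of that block's values at once
means <- sapply(split.default(mydf[-1], Match[names(mydf[-1])]),
                function(block) mean(as.matrix(block)))
means
#    1    2    3    4
# 3.75 5.25 4.50 6.00
```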
