I have a large dataframe of experiments with different parameters. Each combination of parameters has several executions:
PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof1 2.90 1 4 10 1
prof1 3.02 1 4 10 1
prof1 1.52 1 4 10 2
prof1 1.60 1 4 10 2
...
I am using aggregate to obtain the best time for each combination of profile & nthreads:
data_aggregated <- aggregate(data$TIME,
by = list(PROFILE = data$PROFILE,
NTHREADS = data$NTHREADS),
FUN = min)
That returns a new dataframe like this (with the aggregated column, which aggregate names x by default, renamed back to TIME):
PROFILE NTHREADS TIME
prof1 1 1.52
prof1 2 0.9
prof2 1 1.41
prof2 2 0.88
...
What I want is to obtain the values of PARAM1, PARAM2 and PARAM3 for each aggregated row (the one with the minimum time). For now, I look up the row in the first dataframe where PROFILE, TIME and NTHREADS equal the values in the second dataframe, but maybe there is an easier way?
Alternatively, with dplyr:
library(dplyr)
dat <- dat %>%
group_by(PROFILE, NTHREADS) %>%
filter(TIME == min(TIME))
In the end I did it following the comment by Ronak Shah. As long as both dataframes share column names and values (which holds here because the aggregation was MIN rather than MEAN, so the aggregated TIME values appear verbatim in the original data), the simplest solution is:
data_aggr <- merge(data_aggr, data)
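For reference, here is a minimal self-contained sketch of that workflow on a toy subset of the data shown above, using the formula interface of aggregate (which keeps the TIME column name so merge can match on it):

```r
# Toy subset of the data shown above
data <- read.table(text = "PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof1 2.90 1 4 10 1
prof1 1.52 1 4 10 2
prof1 1.60 1 4 10 2", header = TRUE)

# Per-group minima; the formula interface keeps the column name TIME
data_aggr <- aggregate(TIME ~ PROFILE + NTHREADS, data = data, FUN = min)

# merge() joins on all shared column names (PROFILE, NTHREADS, TIME),
# so only the minimum-time rows survive, PARAM columns included
result <- merge(data_aggr, data)
result
#>   PROFILE NTHREADS TIME PARAM1 PARAM2 PARAM3
#> 1   prof1        1 1.52      4     10      2
```

Note that if several rows tie for the group minimum, merge will keep all of them.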
Consider ave, base R's function for computing group-wise summaries across levels of factors. Multiple groupings can be passed as separate arguments:
data <- read.table(text="PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof2 2.90 2 4 10 1
prof1 3.02 1 4 10 1
prof2 1.52 2 4 10 2
prof1 1.60 1 4 10 2", header=TRUE)
data$min_TIME <- ave(data$TIME, data$PROFILE, data$NTHREADS, FUN=min)
data
# PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3 min_TIME
# 1 prof1 3.01 1 4 10 1 1.60
# 2 prof2 2.90 2 4 10 1 1.52
# 3 prof1 3.02 1 4 10 1 1.60
# 4 prof2 1.52 2 4 10 2 1.52
# 5 prof1 1.60 1 4 10 2 1.60
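Building on the min_TIME column above, keeping only the rows that achieve the per-group minimum (and thus their PARAM columns) is a one-line subset. A self-contained sketch:

```r
data <- read.table(text = "PROFILE TIME NTHREADS PARAM1 PARAM2 PARAM3
prof1 3.01 1 4 10 1
prof2 2.90 2 4 10 1
prof1 3.02 1 4 10 1
prof2 1.52 2 4 10 2
prof1 1.60 1 4 10 2", header = TRUE)
data$min_TIME <- ave(data$TIME, data$PROFILE, data$NTHREADS, FUN = min)

# Rows whose TIME equals their group minimum carry the winning parameters
best <- data[data$TIME == data$min_TIME, ]
best
```

The == comparison is safe here because min returns one of the original TIME values unchanged; as with merge, ties within a group keep all tied rows.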
Good day,
I have a dataframe that looks like this:
Community Value Num
<chr> <dbl> <dbl>
1 A 3.54 3
2 A 4.56 3
3 A 2.22 3
4 B 0 NA
5 B 0.76 NA
6 C 1.2 5
I am hoping to fill the Num observation of Community B with the Num observation of Community A while keeping the names as is.
If this is all you need to do, you can try using an if_else() statement. If your greater problem is more complex, you might need to take another approach.
library(dplyr)
df %>%
mutate(Num = if_else(Community == "B", Num[Community == "A"][1], Num))
# Community Value Num
# 1 A 3.54 3
# 2 A 4.56 3
# 3 A 2.22 3
# 4 B 0.00 3
# 5 B 0.76 3
# 6 C 1.20 5
Data:
df <- read.table(textConnection("Community Value Num
A 3.54 3
A 4.56 3
A 2.22 3
B 0 NA
B 0.76 NA
C 1.2 5"), header = TRUE)
If the NA values will always be immediately below the positive values with which you want to replace them, you can use fill:
library(tidyr)
df %>%
fill(Num, .direction = "down")
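If tidyr is not available, the same last-observation-carried-forward idea can be sketched in base R. The helper name locf is my own, and this sketch assumes the first value of the column is not NA:

```r
df <- read.table(textConnection("Community Value Num
A 3.54 3
A 4.56 3
A 2.22 3
B 0 NA
B 0.76 NA
C 1.2 5"), header = TRUE)

# Carry the last non-NA value forward (assumes no leading NA)
locf <- function(x) {
  idx <- cumsum(!is.na(x))   # index of the last non-NA value seen so far
  x[!is.na(x)][idx]          # look that value up for every position
}
df$Num <- locf(df$Num)
df$Num
#> [1] 3 3 3 3 3 5
```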
I have the following data, and I need to create a new variable, new, obtained as the running product of the values of column z within each group id x. E.g., the first values of column new for id x=1 are 0.9, 0.9*0.1, and 0.9*0.1*0.5.
data <- data.frame(x=c(1,1,1,1,2,2,3,3,3,4,4,4,4),
y=c(4,2,2,6,5,6,6,7,8,2,1,6,5),
z=c(0.9,0.1,0.5,0.12,0.6,1.2,2.1,0.9,0.4,0.8,0.45,1.3,0.85))
Desired outcome:
x y z new
1 1 4 0.90 0.9000
2 1 2 0.10 0.0900
3 1 2 0.50 0.0450
4 1 6 0.12 0.0054
5 2 5 0.60 0.6000
6 2 6 1.20 0.7200
7 3 6 2.10 2.1000
8 3 7 0.90 1.8900
9 3 8 0.40 0.7560
10 4 2 0.80 0.8000
11 4 1 0.45 0.3600
12 4 6 1.30 0.4680
13 4 5 0.85 0.3978
We can use cumprod from base R inside a grouped mutate:
library(dplyr)
data %>%
group_by(x) %>%
mutate(new = cumprod(z)) %>%
ungroup
Or with base R
data$new <- with(data, ave(z, x, FUN = cumprod))
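As a quick sanity check, the ave one-liner reproduces the desired new column exactly (within floating-point tolerance):

```r
data <- data.frame(x = c(1,1,1,1,2,2,3,3,3,4,4,4,4),
                   y = c(4,2,2,6,5,6,6,7,8,2,1,6,5),
                   z = c(0.9,0.1,0.5,0.12,0.6,1.2,2.1,0.9,0.4,0.8,0.45,1.3,0.85))
# ave() applies cumprod within each level of x, preserving row order
data$new <- with(data, ave(z, x, FUN = cumprod))
all.equal(data$new,
          c(0.9, 0.09, 0.045, 0.0054, 0.6, 0.72, 2.1, 1.89,
            0.756, 0.8, 0.36, 0.468, 0.3978))
#> TRUE
```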
This question already has answers here: Numbering rows within groups in a data frame (10 answers). Closed 1 year ago.
This is a silly question but I am new to R and it would make my life so much easier if I could figure out how to do this!
So here is some sample data
data <- read.table(text = "Category Y
A 5.1
A 3.14
A 1.79
A 3.21
A 5.57
B 3.68
B 4.56
B 3.32
B 4.98
B 5.82
",header = TRUE)
I want to add a column that counts the number of observations within a group. Here is what I want it to look like:
Category Y OBS
A 5.1 1
A 3.14 2
A 1.79 3
A 3.21 4
A 5.57 5
B 3.68 1
B 4.56 2
B 3.32 3
B 4.98 4
B 5.82 5
I have tried:
data <- data %>% group_by(Category) %>% mutate(count = c(1:length(Category)))
which just creates another column numbered from 1 to 10, and
data <- data %>% group_by(Category) %>% add_tally()
which just creates another column of all 5s
Base R:
data$OBS <- ave(seq_len(nrow(data)), data$Category, FUN = seq_along)
data
# Category Y OBS
# 1 A 5.10 1
# 2 A 3.14 2
# 3 A 1.79 3
# 4 A 3.21 4
# 5 A 5.57 5
# 6 B 3.68 1
# 7 B 4.56 2
# 8 B 3.32 3
# 9 B 4.98 4
# 10 B 5.82 5
BTW: one can use any of the frame's columns as the first argument, including ave(data$Category, data$Category, FUN=seq_along), but ave chooses its output class based on the input class, so using a string as the first argument will result in a return of strings:
ave(data$Category, data$Category, FUN = seq_along)
# [1] "1" "2" "3" "4" "5" "1" "2" "3" "4" "5"
While not heinous, it needs to be an intentional choice. Since it appears that you wanted an integer in that column, I chose the simplest integer-in, integer-out approach. It could also have used rep(1L,nrow(data)) or anything that is both integer and the same length as the number of rows in the frame, since seq_along (the function I chose) won't otherwise care.
library(data.table)
setDT(data)[, OBS := seq_len(.N), by = .(Category)]
data
Category Y OBS
1: A 5.10 1
2: A 3.14 2
3: A 1.79 3
4: A 3.21 4
5: A 5.57 5
6: B 3.68 1
7: B 4.56 2
8: B 3.32 3
9: B 4.98 4
10: B 5.82 5
library(dplyr)
data %>% group_by(Category) %>% mutate(Obs = row_number())
# A tibble: 10 x 3
# Groups: Category [2]
Category Y Obs
<chr> <dbl> <int>
1 A 5.1 1
2 A 3.14 2
3 A 1.79 3
4 A 3.21 4
5 A 5.57 5
6 B 3.68 1
7 B 4.56 2
8 B 3.32 3
9 B 4.98 4
10 B 5.82 5
OR
data$OBS <- ave(data$Category, data$Category, FUN = seq_along)
data
Category Y OBS
1 A 5.10 1
2 A 3.14 2
3 A 1.79 3
4 A 3.21 4
5 A 5.57 5
6 B 3.68 1
7 B 4.56 2
8 B 3.32 3
9 B 4.98 4
10 B 5.82 5
Another base R approach:
category <- c(rep('A',5),rep('B',5))
sequence <- sequence(rle(as.character(category))$lengths)
data <- data.frame(category=category,sequence=sequence)
head(data,10)
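One caveat worth noting: rle() counts consecutive runs rather than groups, so this matches a per-group counter only when the data are already sorted by category. A quick sketch:

```r
category <- c(rep('A', 5), rep('B', 5))
obs <- sequence(rle(as.character(category))$lengths)
obs
#> [1] 1 2 3 4 5 1 2 3 4 5

# With interleaved groups, the counter restarts at every run boundary
sequence(rle(c("A", "B", "A"))$lengths)
#> [1] 1 1 1
```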
I have some data on organism survival as a function of time. The data are constructed from the averages of many replicates at each time point, which can yield a forward time step with an increase in survival. Occasionally this results in a survivorship greater than 1, which is impossible. How can I conditionally change values greater than 1 to the preceding value in the same column?
Here's what the data looks like:
>df
Generation Treatment time lx
1 0 1 0 1
2 0 1 2 1
3 0 1 4 0.970
4 0 1 6 0.952
5 0 1 8 0.924
6 0 1 10 0.913
7 0 1 12 0.895
8 0 1 14 0.729
9 0 2 0 1
10 0 2 2 1
I've tried mutating the column of interest as such, which still yields values above 1:
df1 <- df %>%
group_by(Generation, Treatment) %>%
mutate(lx_diag = as.numeric(lx/lag(lx, default = first(lx)))) %>% #calculate running survival
mutate(lx_diag = if_else(lx_diag > 1.000000, lag(lx_diag), lx_diag)) #substitute values >1 with previous value
>df1
Generation Treatment time lx lx_diag
1 12 1 0 1 1
2 12 1 2 1 1
3 12 1 4 1 1
4 12 1 6 0.996 0.996
5 12 1 8 0.988 0.992
6 12 1 10 0.956 0.968
7 12 1 12 0.884 0.925
8 12 1 14 0.72 0.814
9 12 1 15 0.729 1.01
10 12 1 19 0.76 1.04
I expect the results to look something like:
>df1
Generation Treatment time lx lx_diag
1 12 1 0 1 1
2 12 1 2 1 1
3 12 1 4 1 1
4 12 1 6 0.996 0.996
5 12 1 8 0.988 0.992
6 12 1 10 0.956 0.968
7 12 1 12 0.884 0.925
8 12 1 14 0.72 0.814
9 12 1 15 0.729 0.814
10 12 1 19 0.76 0.814
I know you can conditionally change the values to a specific value (i.e. ifelse with no else), but I haven't found any solutions that can conditionally change a value in a column to the value in the previous row. Any help is appreciated.
EDIT: I realized that mutate and if_else are vectorized. Instead of replacing values sequentially from first to last, as I had expected, the commands replace all values at the same time, so in a run of consecutive values >1 some will be left behind. Thus, if you just run the command:
SurvTot1$lx_diag <- if_else(SurvTot1$lx_diag > 1, lag(SurvTot1$lx_diag), SurvTot1$lx_diag)
over again until none remain, you can get rid of the values >1. It is not the most elegant solution, but it works.
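The "run it again" idea can be wrapped in a loop so it stops by itself. A base-R sketch on a made-up vector (no dplyr needed):

```r
lx <- c(1, 1, 0.996, 1.01, 1.04, 0.9)
repeat {
  bad <- which(lx > 1)
  bad <- bad[bad > 1]        # the first element has no predecessor to copy
  if (length(bad) == 0) break
  lx[bad] <- lx[bad - 1]     # each pass fixes one more element of every run
}
lx
#> [1] 1.000 1.000 0.996 0.996 0.996 0.900
```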
This looks like a very ugly solution to me, but I couldn't think of anything else:
df = data.frame(
"Generation" = rep(12,10),
"Treatent" = rep(1,10),
"Time" = c(seq(0,14,by=2),15,19),
"lx_diag" = c(1,1,1,0.996,0.992,0.968,0.925,0.814,1.04,1.04)
)
update_lag = function(x){
k <<- k+1
x
}
k=1
df %>%
mutate(
lx_diag2 = ifelse(lx_diag <=1,update_lag(lx_diag),lag(lx_diag,n=k))
)
Using the data from #Fino, here is my vectorized solution using base R
vals.to.replace <- which(df$lx_diag > 1)
vals.to.substitute <- sapply(vals.to.replace, function(x) tail( df$lx_diag[which(df$lx_diag[1:x] <= 1)], 1) )
df$lx_diag[vals.to.replace] = vals.to.substitute
df
Generation Treatent Time lx_diag
1 12 1 0 1.000
2 12 1 2 1.000
3 12 1 4 1.000
4 12 1 6 0.996
5 12 1 8 0.992
6 12 1 10 0.968
7 12 1 12 0.925
8 12 1 14 0.814
9 12 1 15 0.814
10 12 1 19 0.814
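Another single-pass base R option is Reduce() with accumulate = TRUE, which carries the last valid value through a run of impossible ones. This sketch uses the same lx_diag values as above and, like the sapply answer, assumes the first value is valid:

```r
lx <- c(1, 1, 1, 0.996, 0.992, 0.968, 0.925, 0.814, 1.04, 1.04)

# At each step keep the current value if it is valid (<= 1),
# otherwise repeat the previous (already fixed) value
lx_fixed <- Reduce(function(prev, cur) if (cur > 1) prev else cur,
                   lx, accumulate = TRUE)
unlist(lx_fixed)   # the trailing 1.04 values become 0.814
```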
I have the following dataframe
set.seed(1000)
data <- data.frame(date = sort(rep(Sys.Date()-1:3, 5)),
hour = rep(0:4, 3),
values = round(rexp(15),2))
date hour values
1 2016-04-25 0 1.00
2 2016-04-25 1 0.52
3 2016-04-25 2 2.44
4 2016-04-25 3 2.16
5 2016-04-25 4 0.48
6 2016-04-26 0 0.17
7 2016-04-26 1 1.56
8 2016-04-26 2 0.51
9 2016-04-26 3 0.96
10 2016-04-26 4 0.05
11 2016-04-27 0 0.75
12 2016-04-27 1 1.69
13 2016-04-27 2 0.61
14 2016-04-27 3 0.85
15 2016-04-27 4 2.23
I want to sum the numbers in the values column for the hours from 2 to 1, as a closed interval. However, hour 2 belongs to one date, while hour 1 belongs to the next date.
I want a final dataframe like
date sumvalue
2016-04-26 6.81
2016-04-27 3.96
Does anyone know an elegant way to do this? I need to apply it to a huge dataframe.
Kind regards
Here is one way to get the expected output
library(data.table)
setDT(data)[, {Un1 <- unique(date)
i1 <- which(hour==2 & date %in% Un1[-length(Un1)])
i2 <- which(hour==1 & date %in% Un1[-1])
v1 <- unlist(Map(function(x,y) sum(values[seq(x,y)]),
i1, i2))
list(date = Un1[-1], sumvalue= v1)}]
# date sumvalue
#1: 2016-04-26 6.81
#2: 2016-04-27 3.96
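A base R alternative (no data.table): label each row with the date its 2-to-1 window ends on, aggregate, and drop the two incomplete edge windows. The window column name is my own, and the values are typed in from the example output above so the sums can be checked:

```r
data <- data.frame(
  date = rep(as.Date(c("2016-04-25", "2016-04-26", "2016-04-27")), each = 5),
  hour = rep(0:4, 3),
  values = c(1.00, 0.52, 2.44, 2.16, 0.48,
             0.17, 1.56, 0.51, 0.96, 0.05,
             0.75, 1.69, 0.61, 0.85, 2.23))

# Hours >= 2 belong to the window ending on the *next* date,
# hours 0-1 to the window ending on their own date
data$window <- data$date + as.integer(data$hour >= 2)

res <- aggregate(values ~ window, data = data, FUN = sum)

# The first and last windows are truncated by the edges of the data
res <- res[res$window > min(res$window) & res$window < max(res$window), ]
res
#>       window values
#> 2 2016-04-26   6.81
#> 3 2016-04-27   3.96
```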