I have concentration-time data for many individuals. I want to find the Cmax (maximum concentration) and Tmax (the time at Cmax) for each individual, and keep the results in R by adding new "Cmax" and "Tmax" columns to the original dataset.
The data frame looks like this:
#df <-
ID TIME CONC
1 0 0
1 1 10
1 2 15
1 5 12
2 1 5
2 2 10
2 5 20
2 6 10
And so on. I started with something to find Cmax for an individual, but it's not getting me anywhere. Any help in fixing the code, or an easier way of finding both Cmax and Tmax, is highly appreciated!
Cmax=function(df) {
n = length(df$CONC)
c_temp=0 # this is a temporary counter
c_max=0
for(i in 2:n){
if(df$CONC[i] > df$CONC[i-1]) {
c_temp= c_temp+1
if(c_temp > c_max) c_max=c_temp # check
}
}
return(c_max)
}
Try
library(dplyr)
df %>%
group_by(ID) %>%
mutate(Cmax= max(CONC), Tmax=TIME[which.max(CONC)])
# ID TIME CONC Cmax Tmax
#1 1 0 0 15 2
#2 1 1 10 15 2
#3 1 2 15 15 2
#4 1 5 12 15 2
#5 2 1 5 20 5
#6 2 2 10 20 5
#7 2 5 20 20 5
#8 2 6 10 20 5
Or using data.table
library(data.table)
setDT(df)[, c("Cmax", "Tmax") := list(max(CONC),
TIME[which.max(CONC)]), by=ID]
Or using split from base R
unsplit(lapply(split(df, df$ID), function(x)
within(x, {Cmax <- max(CONC)
Tmax <- TIME[which.max(CONC)] })),
df$ID)
# ID TIME CONC Tmax Cmax
#1 1 0 0 2 15
#2 1 1 10 2 15
#3 1 2 15 2 15
#4 1 5 12 2 15
#5 2 1 5 5 20
#6 2 2 10 5 20
#7 2 5 20 5 20
#8 2 6 10 5 20
data
df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), TIME = c(0L,
1L, 2L, 5L, 1L, 2L, 5L, 6L), CONC = c(0L, 10L, 15L, 12L, 5L,
10L, 20L, 10L)), .Names = c("ID", "TIME", "CONC"), class = "data.frame",
row.names = c(NA, -8L))
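For a version with no package dependencies, the same per-ID logic can also be written with ave(). This is only a minimal sketch, assuming the df shown above. Note that which.max() returns the position of the first maximum, so if an individual reaches Cmax at several time points, Tmax is the earliest of them.

```r
# Same example data as above
df <- data.frame(
  ID   = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  TIME = c(0L, 1L, 2L, 5L, 1L, 2L, 5L, 6L),
  CONC = c(0L, 10L, 15L, 12L, 5L, 10L, 20L, 10L)
)

# Cmax: largest concentration within each ID, repeated on every row of that ID
df$Cmax <- ave(df$CONC, df$ID, FUN = max)

# Tmax: group the row indices by ID, then look up TIME at the first maximum of CONC
df$Tmax <- ave(seq_len(nrow(df)), df$ID,
               FUN = function(i) df$TIME[i][which.max(df$CONC[i])])

df
```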
I am trying to identify the top 15% of scores for each watershed but retain the polygon ID when I print the results.
# here's a small example dataset (called "data"):
polygon watershed score
1 1 61
2 1 81
3 1 16
4 2 18
5 2 12
6 3 78
7 3 81
8 3 20
9 3 97
10 3 95
# I obtain the top 15% using this method:
top15 <- (data %>% select(watershed, score) %>%
group_by(watershed) %>%
arrange(watershed, desc(score)) %>%
filter(score > quantile(score, 0.15)))
# results look like this:
watershed score
<int> <int>
1 1 81
2 1 61
3 2 18
4 3 97
5 3 95
6 3 81
7 3 78
How can I include the column "polygon" when I print the results?
Thanks so much for the help!
In your statement you selected only watershed and score but excluded polygon. So remove the select statement and you should get what you want. Additionally the arrange doesn't add value so I removed it:
library(dplyr)
mdat <- structure(list(polygon = 1:10,
watershed = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
score = c(61L, 81L, 16L, 18L, 12L, 78L, 81L, 20L, 97L, 95L)),
class = "data.frame", row.names = c(NA, -10L))
mdat %>%
group_by(watershed) %>%
filter(score > quantile(score, 0.15))
# # A tibble: 7 x 3
# # Groups: watershed [3]
# polygon watershed score
# <int> <int> <int>
# 1 1 1 61
# 2 2 1 81
# 3 4 2 18
# 4 6 3 78
# 5 7 3 81
# 6 9 3 97
# 7 10 3 95
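One thing worth checking: score > quantile(score, 0.15) keeps everything above the 15th percentile, which is roughly the top 85% of each watershed. If "top 15%" is meant literally, the cutoff would be the 0.85 quantile instead. The following is a hedged sketch in base R (so it runs without dplyr); the names cutoff and top15 are only illustrative.

```r
# Same example data as above
mdat <- data.frame(
  polygon   = 1:10,
  watershed = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
  score     = c(61, 81, 16, 18, 12, 78, 81, 20, 97, 95)
)

# Per-watershed 85th percentile, repeated on every row of that watershed
cutoff <- ave(mdat$score, mdat$watershed,
              FUN = function(x) quantile(x, 0.85, names = FALSE))

# Keep only rows whose score lies above their own watershed's cutoff;
# all columns, including polygon, are retained
top15 <- mdat[mdat$score > cutoff, ]
top15
```

With groups this small the result is a single polygon per watershed, so the choice of quantile makes a large difference to what is returned.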
I’m trying to figure out how to append a column that identifies whether a difference of 10 exists between different IDs for a given day using the column named reading.
**Day ID Reading**
19-Jan 1 10
19-Jan 1 10
19-Jan 1 10
19-Jan 1 20
19-Jan 2 20
19-Jan 2 20
19-Jan 2 20
19-Jan 2 20
20-Jan 1 10
21-Jan 1 10
22-Jan 1 10
23-Jan 1 10
24-Jan 1 20
25-Jan 2 20
25-Jan 2 20
25-Jan 2 20
25-Jan 2 10
I would like:
**Day ID Reading Difference**
19-Jan 1 10 Y
19-Jan 1 10 Y
19-Jan 1 10 Y
19-Jan 1 20 Y
19-Jan 2 20 N
19-Jan 2 20 N
19-Jan 2 20 N
19-Jan 2 20 N
20-Jan 1 10 N
21-Jan 1 10 N
22-Jan 1 10 N
23-Jan 1 10 N
24-Jan 1 20 N
25-Jan 2 20 Y
25-Jan 2 20 Y
25-Jan 2 20 Y
25-Jan 2 10 Y
What you could do is to check whether the difference of the range is equal to or greater than 10 for each group.
dat$Diff <- with(dat, ave(Reading, Day, ID, FUN = function(x) diff(range(x)) >= 10))
dat
# Day ID Reading Diff
#1 19-Jan 1 10 1
#2 19-Jan 1 10 1
#3 19-Jan 1 10 1
#4 19-Jan 1 20 1
#5 19-Jan 2 20 0
#6 19-Jan 2 20 0
#7 19-Jan 2 20 0
#8 19-Jan 2 20 0
#9 20-Jan 1 10 0
#10 21-Jan 1 10 0
#11 22-Jan 1 10 0
#12 23-Jan 1 10 0
#13 24-Jan 1 20 0
#14 25-Jan 2 20 1
#15 25-Jan 2 20 1
#16 25-Jan 2 20 1
#17 25-Jan 2 10 1
data
dat <- structure(list(Day = c("19-Jan", "19-Jan", "19-Jan", "19-Jan",
"19-Jan", "19-Jan", "19-Jan", "19-Jan", "20-Jan", "21-Jan", "22-Jan",
"23-Jan", "24-Jan", "25-Jan", "25-Jan", "25-Jan", "25-Jan"),
ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), Reading = c(10L, 10L, 10L, 20L, 20L, 20L,
20L, 20L, 10L, 10L, 10L, 10L, 20L, 20L, 20L, 20L, 10L)), .Names = c("Day",
"ID", "Reading"), class = "data.frame", row.names = c(NA, -17L
))
We can use data.table
library(data.table)
setDT(df1)[, Difference := abs(Reduce(`-`, as.list(range(Reading)))) >= 10,
.(ID, Day)]
df1
# Day ID Reading Difference
# 1: 19-Jan 1 10 TRUE
# 2: 19-Jan 1 10 TRUE
# 3: 19-Jan 1 10 TRUE
# 4: 19-Jan 1 20 TRUE
# 5: 19-Jan 2 20 FALSE
# 6: 19-Jan 2 20 FALSE
# 7: 19-Jan 2 20 FALSE
# 8: 19-Jan 2 20 FALSE
# 9: 20-Jan 1 10 FALSE
#10: 21-Jan 1 10 FALSE
#11: 22-Jan 1 10 FALSE
#12: 23-Jan 1 10 FALSE
#13: 24-Jan 1 20 FALSE
#14: 25-Jan 2 20 TRUE
#15: 25-Jan 2 20 TRUE
#16: 25-Jan 2 20 TRUE
#17: 25-Jan 2 10 TRUE
data
df1 <- structure(list(Day = c("19-Jan", "19-Jan", "19-Jan", "19-Jan",
"19-Jan", "19-Jan", "19-Jan", "19-Jan", "20-Jan", "21-Jan", "22-Jan",
"23-Jan", "24-Jan", "25-Jan", "25-Jan", "25-Jan", "25-Jan"),
ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L), Reading = c(10L, 10L, 10L, 20L, 20L, 20L,
20L, 20L, 10L, 10L, 10L, 10L, 20L, 20L, 20L, 20L, 10L)),
class = "data.frame", row.names = c(NA, -17L))
Using tidyverse you could do something like
library(tidyverse)
your_data %>%
group_by(Day, ID) %>%
mutate(difference = (max(Reading) - min(Reading)) >= 10)
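The approaches above return 0/1 or TRUE/FALSE, while the desired output uses "Y" and "N". A minimal base R sketch of that last step, assuming the same data as dat above; the helper name spread is only illustrative. Grouping on a pasted Day/ID key avoids the empty Day-by-ID combinations that ave() would otherwise pass to the function.

```r
# Same example data as above
dat <- data.frame(
  Day = c(rep("19-Jan", 8), "20-Jan", "21-Jan", "22-Jan", "23-Jan", "24-Jan",
          rep("25-Jan", 4)),
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  Reading = c(10L, 10L, 10L, 20L, 20L, 20L, 20L, 20L, 10L, 10L, 10L, 10L, 20L,
              20L, 20L, 20L, 10L)
)

# Range of Reading (max minus min) within each Day/ID combination
spread <- ave(dat$Reading, paste(dat$Day, dat$ID),
              FUN = function(x) max(x) - min(x))

# Label each row "Y" when its group spans at least 10, otherwise "N"
dat$Difference <- ifelse(spread >= 10, "Y", "N")
dat
```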
Currently the data-frame looks something like this:
Scenario Month A B C
1 1 -0.593186301 1.045550808 -0.593816304
1 2 0.178626141 2.043084432 0.111370583
1 3 1.205779717 -0.324083723 -1.397716949
2 1 0.933615199 0.052647056 -0.656486153
2 2 1.647291688 -1.065793671 0.799040546
2 3 1.613663101 -1.955567231 -1.817457972
3 1 -0.621991775 1.634069402 -1.404981646
3 2 -1.899326887 -0.836322394 -1.826351541
3 3 0.164235141 -1.160701812 1.238246459
I'd like to add a row above each row where Month = 1, as below. I know dplyr has an add_row function, but I'd like to add rows based on a condition. Any help is hugely appreciated.
Scenario Month A B C
0
1 1 -0.593186301 1.045550808 -0.593816304
1 2 0.178626141 2.043084432 0.111370583
1 3 1.205779717 -0.324083723 -1.397716949
0
2 1 0.933615199 0.052647056 -0.656486153
2 2 1.647291688 -1.065793671 0.799040546
2 3 1.613663101 -1.955567231 -1.817457972
0
3 1 -0.621991775 1.634069402 -1.404981646
3 2 -1.899326887 -0.836322394 -1.826351541
3 3 0.164235141 -1.160701812 1.238246459
A solution using tidyverse.
library(tidyverse)
dat2 <- dat %>%
split(f = .$Scenario) %>%
map_dfr(~bind_rows(tibble(Scenario = 0), .x))
dat2
# # A tibble: 12 x 5
# Scenario Month A B C
# <dbl> <int> <dbl> <dbl> <dbl>
# 1 0 NA NA NA NA
# 2 1 1 -0.593 1.05 -0.594
# 3 1 2 0.179 2.04 0.111
# 4 1 3 1.21 -0.324 -1.40
# 5 0 NA NA NA NA
# 6 2 1 0.934 0.0526 -0.656
# 7 2 2 1.65 -1.07 0.799
# 8 2 3 1.61 -1.96 -1.82
# 9 0 NA NA NA NA
# 10 3 1 -0.622 1.63 -1.40
# 11 3 2 -1.90 -0.836 -1.83
# 12 3 3 0.164 -1.16 1.24
DATA
dat <- read.table(text = "Scenario Month A B C
1 1 -0.593186301 1.045550808 -0.593816304
1 2 0.178626141 2.043084432 0.111370583
1 3 1.205779717 -0.324083723 -1.397716949
2 1 0.933615199 0.052647056 -0.656486153
2 2 1.647291688 -1.065793671 0.799040546
2 3 1.613663101 -1.955567231 -1.817457972
3 1 -0.621991775 1.634069402 -1.404981646
3 2 -1.899326887 -0.836322394 -1.826351541
3 3 0.164235141 -1.160701812 1.238246459 ",
header = TRUE)
Somehow add_row doesn't take multiple values to its .before parameter.
One way is to split the dataframe wherever Month = 1 and then for each dataframe add a row using add_row above Month = 1.
library(tidyverse)
map_df(split(df, cumsum(df$Month == 1)),
~ add_row(., Scenario = 0, .before = which(.$Month == 1)))
# Scenario Month A B C
#1 0 NA NA NA NA
#2 1 1 -0.5931863 1.04555081 -0.5938163
#3 1 2 0.1786261 2.04308443 0.1113706
#4 1 3 1.2057797 -0.32408372 -1.3977169
#5 0 NA NA NA NA
#6 2 1 0.9336152 0.05264706 -0.6564862
#7 2 2 1.6472917 -1.06579367 0.7990405
#8 2 3 1.6136631 -1.95556723 -1.8174580
#9 0 NA NA NA NA
#10 3 1 -0.6219918 1.63406940 -1.4049816
#11 3 2 -1.8993269 -0.83632239 -1.8263515
#12 3 3 0.1642351 -1.16070181 1.2382465
Here is one option with data.table
library(data.table)
setDT(df1)[, .SD[c(.N+1, seq_len(.N))], Scenario][
!duplicated(Scenario), Scenario := 0][]
# Scenario Month A B C
# 1: 0 NA NA NA NA
# 2: 1 1 -0.5931863 1.04555081 -0.5938163
# 3: 1 2 0.1786261 2.04308443 0.1113706
# 4: 1 3 1.2057797 -0.32408372 -1.3977169
# 5: 0 NA NA NA NA
# 6: 2 1 0.9336152 0.05264706 -0.6564862
# 7: 2 2 1.6472917 -1.06579367 0.7990405
# 8: 2 3 1.6136631 -1.95556723 -1.8174580
# 9: 0 NA NA NA NA
#10: 3 1 -0.6219918 1.63406940 -1.4049816
#11: 3 2 -1.8993269 -0.83632239 -1.8263515
#12: 3 3 0.1642351 -1.16070181 1.2382465
Or as @chinsoon12 mentioned in the comments
setDT(df1)[, rbindlist(.(.(Scenario=0L), c(.(Scenario=rep(Scenario, .N)),
.SD)), use.names=TRUE, fill=TRUE), by=.(Scenario)][, -1L]
data
df1 <- structure(list(Scenario = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), Month = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), A = c(-0.593186301,
0.178626141, 1.205779717, 0.933615199, 1.647291688, 1.613663101,
-0.621991775, -1.899326887, 0.164235141), B = c(1.045550808,
2.043084432, -0.324083723, 0.052647056, -1.065793671, -1.955567231,
1.634069402, -0.836322394, -1.160701812), C = c(-0.593816304,
0.111370583, -1.397716949, -0.656486153, 0.799040546, -1.817457972,
-1.404981646, -1.826351541, 1.238246459)), class = "data.frame",
row.names = c(NA,
-9L))
Here's a simple way (without loops) using base R -
df1 <- df[rep(1:nrow(df), (df$Month == 1)+1), ]
df1[duplicated(df1, fromLast = TRUE), ] <- NA
df1$Scenario[is.na(df1$Scenario)] <- 0
df1
Scenario Month A B C
1 0 NA NA NA NA
1.1 1 1 -0.5931863 1.04555081 -0.5938163
2 1 2 0.1786261 2.04308443 0.1113706
3 1 3 1.2057797 -0.32408372 -1.3977169
4 0 NA NA NA NA
4.1 2 1 0.9336152 0.05264706 -0.6564862
5 2 2 1.6472917 -1.06579367 0.7990405
6 2 3 1.6136631 -1.95556723 -1.8174580
7 0 NA NA NA NA
7.1 3 1 -0.6219918 1.63406940 -1.4049816
8 3 2 -1.8993269 -0.83632239 -1.8263515
9 3 3 0.1642351 -1.16070181 1.2382465
Data -
df <- structure(list(Scenario = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), Month = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), A = c(-0.593186301,
0.178626141, 1.205779717, 0.933615199, 1.647291688, 1.613663101,
-0.621991775, -1.899326887, 0.164235141), B = c(1.045550808,
2.043084432, -0.324083723, 0.052647056, -1.065793671, -1.955567231,
1.634069402, -0.836322394, -1.160701812), C = c(-0.593816304,
0.111370583, -1.397716949, -0.656486153, 0.799040546, -1.817457972,
-1.404981646, -1.826351541, 1.238246459)), class = "data.frame", row.names = c(NA,
-9L))
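The rep() indexing in the base R answer can be hard to read at first. The following small sketch, using a made-up month vector rather than the full data frame, shows what the index looks like and which positions end up blanked out.

```r
# Illustrative Month column for two scenarios of three months each
month <- c(1L, 2L, 3L, 1L, 2L, 3L)

# (month == 1) + 1 is 2 where Month is 1 and 1 elsewhere,
# so rows with Month == 1 are repeated twice
times <- (month == 1) + 1
idx <- rep(seq_along(month), times)
idx
# 1 1 2 3 4 4 5 6

# fromLast = TRUE flags the first copy of each repeated row;
# those are the positions that get overwritten with NA
blank <- which(duplicated(idx, fromLast = TRUE))
blank
# 1 5
```

Because the first copy is the one that is blanked, the placeholder row always lands directly above the original Month = 1 row.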
I have odometer readings for different cars, with the sample data below. I want to reset the odometer value within each ID so that I can measure the distance each car has traveled.
Sample data
ID ODometer
1 2132
1 2133
1 2134
1 2135
1 2136
1 2137
2 1123
2 1124
2 1125
Expected output:
ID Odometer
1 1
1 2
1 3
1 4
1 5
1 6
2 1
2 2
2 3
We can use row_number() after grouping by 'ID'
library(dplyr)
df1 %>%
group_by(ID) %>%
mutate(Odometer = row_number())
# A tibble: 9 x 3
# Groups: ID [2]
# ID ODometer Odometer
# <int> <int> <int>
#1 1 2132 1
#2 1 2133 2
#3 1 2134 3
#4 1 2135 4
#5 1 2136 5
#6 1 2137 6
#7 2 1123 1
#8 2 1124 2
#9 2 1125 3
data
df1 <- structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
ODometer = c(2132L,
2133L, 2134L, 2135L, 2136L, 2137L, 1123L, 1124L, 1125L)),
class = "data.frame", row.names = c(NA, -9L))
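Note that row_number() counts rows, which matches the expected output only because the sample readings increase by exactly 1 per row. If the goal is distance rather than a row count, subtracting each car's first reading may be closer to what is wanted. A hedged base R sketch, assuming df1 as above; the column names Row and Distance are only illustrative.

```r
# Same example data as above
df1 <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  ODometer = c(2132L, 2133L, 2134L, 2135L, 2136L, 2137L, 1123L, 1124L, 1125L)
)

# Row counter within each ID (base R equivalent of the row_number() approach)
df1$Row <- ave(seq_len(nrow(df1)), df1$ID, FUN = seq_along)

# Reading relative to the first reading of each ID, starting at 1
df1$Distance <- ave(df1$ODometer, df1$ID, FUN = function(x) x - x[1] + 1L)

df1
```

On this sample both columns are identical; they would differ as soon as consecutive readings are more than 1 apart.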
I'm trying to get a data frame (just.samples.with.shoulder.values, say) containing only the samples that have non-NA Shoulders values. I've tried to accomplish this using the complete.cases function, but I imagine that I'm doing something wrong syntactically below:
data <- structure(list(Sample = 1:14, Head = c(1L, 0L, NA, 1L, 1L, 1L,
0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L), Shoulders = c(13L, 14L, NA,
18L, 10L, 24L, 53L, NA, 86L, 9L, 65L, 87L, 54L, 36L), Knees = c(1L,
1L, NA, 1L, 1L, 2L, 3L, 2L, 1L, NA, 2L, 3L, 4L, 3L), Toes = c(324L,
5L, NA, NA, 5L, 67L, 785L, 42562L, 554L, 456L, 7L, NA, 54L, NA
)), .Names = c("Sample", "Head", "Shoulders", "Knees", "Toes"
), class = "data.frame", row.names = c(NA, -14L))
just.samples.with.shoulder.values <- data[complete.cases(data[,"Shoulders"])]
print(just.samples.with.shoulder.values)
I would also be interested to know whether some other route (using subset(), say) is a wiser idea. Thanks so much for the help!
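You can use complete.cases here too. It returns a logical vector with one element per row, which you then use to subset the rows by Shoulders. Your attempt is missing the comma that puts that vector in the row position of [ ]: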
data[complete.cases(data$Shoulders), ]
# Sample Head Shoulders Knees Toes
# 1 1 1 13 1 324
# 2 2 0 14 1 5
# 4 4 1 18 1 NA
# 5 5 1 10 1 5
# 6 6 1 24 2 67
# 7 7 0 53 3 785
# 9 9 1 86 1 554
# 10 10 1 9 NA 456
# 11 11 1 65 2 7
# 12 12 1 87 3 NA
# 13 13 0 54 4 54
# 14 14 1 36 3 NA
You could try using is.na:
data[!is.na(data["Shoulders"]),]
Sample Head Shoulders Knees Toes
1 1 1 13 1 324
2 2 0 14 1 5
4 4 1 18 1 NA
5 5 1 10 1 5
6 6 1 24 2 67
7 7 0 53 3 785
9 9 1 86 1 554
10 10 1 9 NA 456
11 11 1 65 2 7
12 12 1 87 3 NA
13 13 0 54 4 54
14 14 1 36 3 NA
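There is a subtle difference between using is.na and complete.cases.
is.na tests each value individually, whereas complete.cases tests whole rows when it is given several columns. The objective here is only to filter on one variable, not to remove every missing value: NAs in the other columns could belong to legitimate data points, so apply either function to the Shoulders column alone.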
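To make the comparison concrete, here is a short sketch on the data from the question (the object names by_cc, by_na, by_subset and all_complete are only illustrative). On a single column the two functions select the same rows; calling complete.cases on the whole data frame is what drops rows with an NA anywhere. It also shows the subset() route asked about in the question.

```r
# Same example data as in the question
data <- data.frame(
  Sample = 1:14,
  Head = c(1L, 0L, NA, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L),
  Shoulders = c(13L, 14L, NA, 18L, 10L, 24L, 53L, NA, 86L, 9L, 65L, 87L, 54L, 36L),
  Knees = c(1L, 1L, NA, 1L, 1L, 2L, 3L, 2L, 1L, NA, 2L, 3L, 4L, 3L),
  Toes = c(324L, 5L, NA, NA, 5L, 67L, 785L, 42562L, 554L, 456L, 7L, NA, 54L, NA)
)

# Filtering on Shoulders only: both forms keep the same 12 rows
by_cc <- data[complete.cases(data$Shoulders), ]
by_na <- data[!is.na(data$Shoulders), ]
identical(by_cc, by_na)

# subset() evaluates the condition inside the data frame
by_subset <- subset(data, !is.na(Shoulders))

# complete.cases on the whole data frame drops a row if ANY column is NA
all_complete <- data[complete.cases(data), ]
nrow(by_cc)         # rows with a Shoulders value
nrow(all_complete)  # rows with no NA at all
```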