I have a data frame, similar to the one below (see dput), recording responses of a variable to a treatment over time:
df <- structure(list( time = c(0, 0, 0, 0, 0, 0, 14, 14, 14, 14, 14, 14, 33, 33, 33, 33, 33, 33, 90, 90, 90, 90, 90, 90),
trt = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L),
.Label = c("1", "2"), class = "factor"),
A1 = c(6.301, 5.426, 5.6021, NA, NA, NA, 6.1663, 6.426, 6.8239, 2.301, 4.7047, 2.301, 5.8062, 4.97, 4.97, 2.301, 2.301, 2.301, 2.301, 2.301, 2.301, 2.301, 2.301, 2.301),
B1 = c(5.727, 5.727, 5.4472, NA, NA, NA, 6.6021, 7.028, 7.1249, 3.028, 3.1663, 3.6021, 5.727, 5.2711, 5.2389, 3.3554, 3.9031, 4.2389, 3.727, 3.6021, 3.6021, 3.8239, 3.727, 3.426)),
row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"))
which looks like this:
time trt A1 B1
<dbl> <fct> <dbl> <dbl>
1 0 2 6.30 5.73
2 0 2 5.43 5.73
3 0 2 5.60 5.45
4 0 1 NA NA
5 0 1 NA NA
6 0 1 NA NA
7 14 2 6.17 6.60
8 14 2 6.43 7.03
9 14 2 6.82 7.12
10 14 1 2.30 3.03
In our experiments, we don’t always record values for all treatments at time == 0. I want to replace any missing values (NA) when (and only when) time == 0 with the mean of the trt ‘2’ group at time == 0. So the NAs in A1 should all become 5.78, and those in B1 should become 5.63.
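For reference, those target values are just the time-zero means of the trt ‘2’ group, which can be checked directly against the dput above:
mean(df$A1[df$trt == "2" & df$time == 0]) # 5.776367, printed as 5.78
mean(df$B1[df$trt == "2" & df$time == 0]) # 5.633733, printed as 5.63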
Using answers from here and here, as well as some others, I have been able to come up with the following:
df %>%
mutate_if(is.numeric, funs(if_else(is.na(.),if_else(time == 0, 0, .), .)))
This replaces NA at time == 0 with 0 (this is useful for some of my variables where there is no data in any of the treatments at time == 0, but it's not what I'm after here). I also tried this:
df %>%
mutate_if(is.numeric, funs(if_else(is.na(.),if_else(time == 0, mean(., na.rm = TRUE), .), .)))
This is closer to what I want, but is averaging the values from the whole column/variable. Can I make it average only those values from treatment ‘2’ when time == 0?
I think I would just use indexing in base R for this:
within(df, {A1[is.na(A1) & time == 0] <- mean(A1[trt == "2" & time == 0])
B1[is.na(B1) & time == 0] <- mean(B1[trt == "2" & time == 0])})
#> # A tibble: 24 x 4
#> time trt A1 B1
#> <dbl> <fct> <dbl> <dbl>
#> 1 0 2 6.30 5.73
#> 2 0 2 5.43 5.73
#> 3 0 2 5.60 5.45
#> 4 0 1 5.78 5.63
#> 5 0 1 5.78 5.63
#> 6 0 1 5.78 5.63
#> 7 14 2 6.17 6.60
#> 8 14 2 6.43 7.03
#> 9 14 2 6.82 7.12
#> 10 14 1 2.30 3.03
#> # ... with 14 more rows
Created on 2020-05-15 by the reprex package (v0.3.0)
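If there are many response columns, the same indexing idea generalizes with a loop (a sketch, assuming every column other than time and trt should be treated the same way):
cols <- setdiff(names(df), c("time", "trt"))
for (col in cols) {
  # baseline mean of the trt "2" group at time 0 for this column
  base_mean <- mean(df[[col]][df$trt == "2" & df$time == 0], na.rm = TRUE)
  df[[col]][is.na(df[[col]]) & df$time == 0] <- base_mean
}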
If we add group_by(time), we can recode the missing values to the time-specific means for the observations where time == 0, as follows.
library(dplyr)
df %>% group_by(time) %>%
mutate(A1 = if_else(is.na(A1) & time == 0,mean(A1,na.rm=TRUE),A1),
B1 = if_else(is.na(B1) & time == 0,mean(B1,na.rm=TRUE),B1))
...and the output:
# A tibble: 24 x 4
# Groups: time [4]
time trt A1 B1
<dbl> <fct> <dbl> <dbl>
1 0 2 6.30 5.73
2 0 2 5.43 5.73
3 0 2 5.60 5.45
4 0 1 5.78 5.63
5 0 1 5.78 5.63
6 0 1 5.78 5.63
7 14 2 6.17 6.60
8 14 2 6.43 7.03
9 14 2 6.82 7.12
10 14 1 2.30 3.03
# ... with 14 more rows
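Note that grouping by time alone works here only because trt "1" is entirely NA at time == 0, so the group mean equals the trt "2" mean. If trt "1" had some recorded values at time 0, a sketch restricting the mean to trt "2" would be:
df %>% group_by(time) %>%
  mutate(A1 = if_else(is.na(A1) & time == 0, mean(A1[trt == "2"], na.rm = TRUE), A1),
         B1 = if_else(is.na(B1) & time == 0, mean(B1[trt == "2"], na.rm = TRUE), B1))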
UPDATE: general solution across multiple columns
Per the comments in my answer, here is a solution that uses the development version of dplyr to access the new across() function.
devtools::install_github("tidyverse/dplyr") # needed for across()
# get all columns except time and trt
theColumns <- colnames(df)[!(colnames(df) %in% c("time","trt"))]
df %>% group_by(time) %>%
mutate(across(theColumns,~if_else(is.na(.) & time == 0,mean(.,na.rm=TRUE),.)))
...and the output:
# Groups: time [4]
time trt A1 B1
<dbl> <fct> <dbl> <dbl>
1 0 2 6.30 5.73
2 0 2 5.43 5.73
3 0 2 5.60 5.45
4 0 1 5.78 5.63
5 0 1 5.78 5.63
6 0 1 5.78 5.63
7 14 2 6.17 6.60
8 14 2 6.43 7.03
9 14 2 6.82 7.12
10 14 1 2.30 3.03
# … with 14 more rows
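With released dplyr (>= 1.0.0) the same code works; wrapping the character vector in all_of() is the recommended tidyselect form and avoids the warning that newer versions emit for bare character vectors:
library(dplyr)
df %>% group_by(time) %>%
  mutate(across(all_of(theColumns), ~if_else(is.na(.) & time == 0, mean(., na.rm = TRUE), .)))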
As I was unable to access the development version of dplyr to use the new across() function, I combined elements of both answers above to give the result I wanted:
df %>%
mutate_if(is.numeric, funs(if_else(is.na(.) & time == 0, mean(.[trt == "2" & time == 0]), .)))
It looks like across() is intended to replace the _if functions in the long run (see here), but this solution works in the meantime.
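For anyone on a dplyr version where funs() is already deprecated but across() is not yet available, mutate_if() also accepts a formula-style lambda; a drop-in sketch of the same call:
df %>%
  mutate_if(is.numeric, ~if_else(is.na(.) & time == 0, mean(.[trt == "2" & time == 0]), .))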
Related
I have a toy dataset as follows:
df <- structure(list(id = 1:11, price = c(40.59, 70.42, 1.8, 1.98,
65.02, 2.23, 54.79, 54.7, 3.32, 1.77, 3.5), month_pct = structure(c(11L,
10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 1L, 2L), .Label = c("-19.91%",
"-8.55%", "1.22%", "1.39%", "1.41%", "1.83%", "2.02%", "2.59%",
"2.86%", "6.58%", "8.53%"), class = "factor"), year_pct = structure(c(4L,
9L, 5L, 3L, 10L, 1L, 11L, 8L, 6L, 7L, 2L), .Label = c("-10.44%",
"-19.91%", "-2.46%", "-35.26%", "-4.26%", "-5.95%", "-6.35%",
"-6.91%", "-7.95%", "1.51%", "1.54%"), class = "factor")), class = "data.frame", row.names = c(NA,
-11L))
Out:
id price month_pct year_pct
0 1 40.59 8.53% -35.26%
1 2 70.42 6.58% -7.95%
2 3 1.80 2.86% -4.26%
3 4 1.98 2.59% -2.46%
4 5 65.02 2.02% 1.51%
5 6 2.23 1.83% -10.44%
6 7 54.79 1.41% 1.54%
7 8 54.70 1.39% -6.91%
8 9 3.32 1.22% -5.95%
9 10 1.77 -19.91% -6.35%
10 11 3.50 -8.55% -19.91%
Now I want to count the number and percentage of positive, zero, and negative values in the columns month_pct and year_pct. How could I do that in R?
Thanks.
One dplyr and tidyr possibility could be:
library(dplyr)
library(tidyr)

df %>%
pivot_longer(-c(1:2)) %>%
group_by(name,
value_sign = factor(sign(as.numeric(sub("%", "", value))),
levels = -1:1,
labels = c("negative", "zero", "positive")),
.drop = FALSE) %>%
count() %>%
group_by(name) %>%
mutate(prop = n/sum(n)*100)
name value_sign n prop
<chr> <fct> <int> <dbl>
1 month_pct negative 2 18.2
2 month_pct zero 0 0
3 month_pct positive 9 81.8
4 year_pct negative 9 81.8
5 year_pct zero 0 0
6 year_pct positive 2 18.2
Here's a base R approach using regex:
sts <- data.frame(
sign = c("positive", "zero", "negative"),
month_number = c(length(which(grepl("^\\d", df$month_pct))),
length(which(df$month_pct==0)),
length(which(grepl("^-", df$month_pct)))),
month_percent = c(length(which(grepl("^\\d", df$month_pct)))/length(df$month_pct)*100,
length(which(df$month_pct==0))/length(df$month_pct)*100,
length(which(grepl("^-", df$month_pct)))/length(df$month_pct)*100),
year_number = c(length(which(grepl("^\\d", df$year_pct))),
length(which(df$year_pct==0)),
length(which(grepl("^-", df$year_pct)))),
year_percent = c(length(which(grepl("^\\d", df$year_pct)))/length(df$year_pct)*100,
length(which(df$year_pct==0))/length(df$year_pct)*100,
length(which(grepl("^-", df$year_pct)))/length(df$year_pct)*100)
)
Result:
sts
sign month_number month_percent year_number year_percent
1 positive 9 81.81818 2 18.18182
2 zero 0 0.00000 0 0.00000
3 negative 2 18.18182 9 81.81818
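The same counts can also be had more compactly by converting to numeric and tabulating the sign; a base R sketch (the sgn() helper is a hypothetical name, just for illustration):
sgn <- function(x) factor(sign(as.numeric(sub("%", "", x))),
                          levels = -1:1, labels = c("negative", "zero", "positive"))
table(sgn(df$month_pct))                  # counts
prop.table(table(sgn(df$year_pct))) * 100 # percentages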
Using dplyr 1.0.0, here is one way:
library(dplyr)
df %>%
summarise(across(c(month_pct, year_pct),
~table(factor(sign(readr::parse_number(as.character(.))),
levels = -1:1)))) %>%
mutate(sign = c('negative', 'zero', 'positive'), .before = month_pct) %>%
rename_at(-1, ~sub('pct', 'n', .)) %>%
mutate(across(-1, list(pct = ~./sum(.) * 100)))
# sign month_n year_n month_n_pct year_n_pct
#1 negative 2 9 18.2 81.8
#2 zero 0 0 0.0 0.0
#3 positive 9 2 81.8 18.2
I have the same toy dataset as above.
How could I filter the maximum and minimum values of month_pct and year_pct, then show the corresponding id and price for those values in R?
The expected output could be a table like this, or another form at your convenience:
max_min type pct id price
0 max month_pct 1.13% 7 1.79
1 min month_pct -2.63% 1 1.85
2 max year_pct 0.83% 2 2.42
3 min year_pct -16.06% 9 2.30
Thanks.
You can get the data in long format, convert the factor values to numeric using parse_number, and for each column name select the max and min rows.
library(dplyr)
df %>%
tidyr::pivot_longer(cols = c(month_pct, year_pct)) %>%
mutate(value = readr::parse_number(as.character(value))) %>%
group_by(name) %>%
slice(which.min(value), which.max(value)) %>%
mutate(max_min = c('min', 'max'), .before = 'id')
# max_min id price name value
# <chr> <int> <dbl> <chr> <dbl>
#1 min 10 1.77 month_pct -19.9
#2 max 1 40.6 month_pct 8.53
#3 min 1 40.6 year_pct -35.3
#4 max 7 54.8 year_pct 1.54
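For comparison, a base R sketch of the same lookup for a single column:
v <- as.numeric(sub("%", "", df$month_pct))
df[c(which.min(v), which.max(v)), c("id", "price", "month_pct")]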
I am looking for a way to change my code so that it sorts the data into quintiles instead of the top 5 and bottom 5. My current code looks like this:
CombData <- CombData %>%
group_by(Date) %>%
mutate(
R=min_rank(Value),
E_P = case_when(
R < 6 ~ "5w",
R > max(R, na.rm =TRUE) - 5 ~ "5b",
TRUE ~ NA_character_)
) %>%
ungroup() %>%
arrange(Date, E_P)
My dataset is quite large, so I will just provide sample data. The real data are more complex, and the code should therefore allow for varying lengths of the Date column and for multiple missing values (NAs):
df <- data.frame(Date = c(rep("2010-01-31", 16), rep("2010-02-28", 14)), Value = rep(c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA, NA, NA, NA, NA, 15), 2))
Afterward, I would also like to test the minimum size of the quintiles, i.e. how many data points at minimum end up in each quintile across the entire dataset.
The expected output would look like this:
structure(list(Date = structure(c(14640, 14640, 14640, 14640,
14640, 14640, 14640, 14640, 14640, 14640, 14640, 14640, 14640,
14640, 14640, 14640, 14668, 14668, 14668, 14668, 14668, 14668,
14668, 14668, 14668, 14668, 14668, 14668, 14668, 14668), class = "Date"),
Value = c(1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 15, NA, NA, NA, NA,
NA, 2, 3, 4, 5, 6, 7, 8, 9, 15, NA, NA, NA, NA, NA), R = c(1L,
1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, NA, NA, NA, NA,
NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, NA, NA, NA, NA, NA
), S_P = c("Worst", "Worst", "Worst", NA, NA, NA, NA, "Best",
"Best", "Best", NA, NA, NA, NA, NA, NA, "Worst", "Worst", NA, NA,
NA, NA, NA, "Best", "Best", NA, NA, NA, NA, NA)), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
Probably, you could use something like this with quantile():
library(dplyr)
out <- CombData %>%
group_by(Date) %>%
mutate(S_P = case_when(Value <= quantile(Value, 0.2, na.rm = TRUE) ~ 'Worst',
Value >= quantile(Value, 0.8, na.rm = TRUE) ~ 'Best'))
You could change the value of quantile according to your preference.
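If you instead want an explicit quintile label for every row rather than just the two extremes, ntile() is one sketch of that (NA values stay NA):
CombData %>%
  group_by(Date) %>%
  mutate(quintile = ntile(Value, 5)) %>%
  ungroup()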
To get the minimum number of "Best" and "Worst" rows we can do:
out %>%
count(Date, S_P) %>%
na.omit() %>%
ungroup() %>%
select(-Date) %>%
group_by(S_P) %>%
top_n(-1, n)
# S_P n
# <chr> <int>
#1 Best 2
#2 Worst 2
If I understand you correctly, you want to rank your 'Value' column and mark ranks below the 20% quantile as "worst" and those above the 80% quantile as "best". After that you want a table.
You could use ave for both the ranking and the quantile identification. The quantile function yields three groups that you can identify with findInterval, code as a factor variable, and label at will. I'm not sure, though, which ranks should be included in the quantiles; I therefore put the E_P coding in two separate columns for comparison purposes.
dat2 <- within(dat, {
R <- ave(Value, Date, FUN=function(x) rank(x, na.last="keep"))
E_P <- ave(R, Date, FUN=function(x) {
findInterval(x, quantile(R, c(.2, .8), na.rm=TRUE))
})
E_P.fac <- factor(E_P, labels=c("worst", NA, "best"))
})
dat2 <- dat2[order(dat2$Date, dat2$E_P), ] ## order by date and E_P
Yields:
dat2
# Date Value E_P.fac E_P R
# 1 2010-01-31 1 worst 0 1.5
# 16 2010-01-31 1 worst 0 1.5
# 2 2010-01-31 2 <NA> 1 3.0
# 3 2010-01-31 3 <NA> 1 4.0
# 4 2010-01-31 4 <NA> 1 5.0
# 5 2010-01-31 5 <NA> 1 6.0
# 6 2010-01-31 6 <NA> 1 7.0
# 7 2010-01-31 7 <NA> 1 8.0
# 8 2010-01-31 8 best 2 9.0
# 9 2010-01-31 9 best 2 10.0
# 15 2010-01-31 15 best 2 11.0
# 10 2010-01-31 NA <NA> NA NA
# 11 2010-01-31 NA <NA> NA NA
# 12 2010-01-31 NA <NA> NA NA
# 13 2010-01-31 NA <NA> NA NA
# 14 2010-01-31 NA <NA> NA NA
# 17 2010-02-28 2 worst 0 1.0
# 18 2010-02-28 3 worst 0 2.0
# 19 2010-02-28 4 <NA> 1 3.0
# 20 2010-02-28 5 <NA> 1 4.0
# 21 2010-02-28 6 <NA> 1 5.0
# 22 2010-02-28 7 <NA> 1 6.0
# 23 2010-02-28 8 <NA> 1 7.0
# 24 2010-02-28 9 <NA> 1 8.0
# 30 2010-02-28 15 best 2 9.0
# 25 2010-02-28 NA <NA> NA NA
# 26 2010-02-28 NA <NA> NA NA
# 27 2010-02-28 NA <NA> NA NA
# 28 2010-02-28 NA <NA> NA NA
# 29 2010-02-28 NA <NA> NA NA
When I check the 20% and 80% quantiles of the rank column, the boundaries appear to be right.
quantile(dat2$R, c(.2, .8), na.rm=TRUE)
# 20% 80%
# 2.8 8.2
After that you could just make a table to get the numbers of each category.
with(dat2, table(Date, E_P.fac))
# E_P.fac
# Date worst <NA> best
# 2010-01-31 2 6 3
# 2010-02-28 2 6 1
Data
dat <- structure(list(Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("2010-01-31", "2010-02-28"
), class = "factor"), Value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA,
NA, NA, NA, NA, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9, NA, NA, NA, NA,
NA, 15)), row.names = c(NA, -30L), class = "data.frame")
I have a little bit of a tricky question. Here is my data:
df <- structure(list(seconds = c(689, 689.25, 689.5, 689.75, 690, 690.25, 690.5, 690.75, 691, 691.25, 691.5, 691.75, 692, 692.25, 692.5), threat = c(NA, NA, NA, NA, NA, NA, 1L, 1L, 0L, 0L, 1L, NA, NA, 1L, 1L), bins = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L)), .Names = c("seconds", "threat", "bins"), class = "data.frame", row.names = c(NA, -15L))
seconds threat bins
1 689.00 NA 1
2 689.25 NA 1
3 689.50 NA 1
4 689.75 NA 1
5 690.00 NA 1
6 690.25 NA 2
7 690.50 1 2
8 690.75 1 2
9 691.00 0 2
10 691.25 0 2
11 691.50 1 3
12 691.75 NA 3
13 692.00 NA 3
14 692.25 1 3
15 692.50 1 3
Within each bin, I am trying to calculate the amount of time spent in each type of "threat" in the threat column. So I need to calculate a time difference every time the threat value changes, within each bin. Here is an example of what I am hoping to achieve:
bin threat seconds
1 NA 1.25
1 1 0.00
1 0 0.00
2 NA 0.25
2 1 0.50
2 0 0.50
3 NA 0.50
3 1 0.75
3 0 0.00
Here's a tidyverse solution:
library(dplyr)
library(tidyr) # for complete()

df %>% arrange(seconds) %>%
mutate(duration = lead(seconds) - seconds) %>%
complete(bins, threat, fill = list(duration = 0)) %>%
group_by(bins, threat) %>%
summarize(seconds = sum(duration, na.rm = TRUE))
# A tibble: 9 x 3
# Groups: bins [?]
# bins threat seconds
# <int> <int> <dbl>
# 1 1 0 0
# 2 1 1 0
# 3 1 NA 1.25
# 4 2 0 0.5
# 5 2 1 0.5
# 6 2 NA 0.25
# 7 3 0 0
# 8 3 1 0.5
# 9 3 NA 0.5
You may omit complete(bins, threat, fill = list(duration = 0)) if adding rows where seconds is 0 is not necessary.
So, first we arrange the data to be safe. Then we define a new variable duration as the time until the next observation, which captures how long each threat value lasts. Next we add new rows with duration == 0 for those (bins, threat) combinations that are not yet present. Lastly we group by bins and threat and sum up the durations.
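For comparison, a base R cross-tab of the same durations (a sketch; addNA() keeps the NA threat as its own group, and combinations that never occur show as NA rather than 0):
df$duration <- c(diff(df$seconds), NA)
with(df, tapply(duration, list(bins = bins, threat = addNA(threat)), sum, na.rm = TRUE))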
I want to calculate the mean for several columns and thus create a new column for the mean using dplyr and without melting + merging.
> head(growth2)
CODE_COUNTRY CODE_PLOT IV12_ha_yr IV23_ha_yr IV34_ha_yr IV14_ha_yr IV24_ha_yr IV13_ha_yr
1 1 6 4.10 6.97 NA NA NA 4.58
2 1 17 9.88 8.75 NA NA NA 8.25
3 1 30 NA NA NA NA NA NA
4 1 37 15.43 15.07 11.89 10.00 12.09 14.33
5 1 41 20.21 15.01 14.72 11.31 13.27 17.09
6 1 46 12.64 14.36 13.65 9.07 12.47 12.36
I need a new column within the dataset with the mean of all the IV columns.
I tried this:
growth2 %>%
group_by(CODE_COUNTRY, CODE_PLOT) %>%
summarise(IVmean=mean(IV12_ha_yr:IV13_ha_yr, na.rm=TRUE))
This returned several errors depending on the example used, such as:
Error in NA_real_:NA_real_ : NA/NaN argument
or
Error in if (trim > 0 && n) { : missing value where TRUE/FALSE needed
You don't need to group, just select() and then mutate():
library(dplyr)
mutate(df, IVMean = rowMeans(select(df, starts_with("IV")), na.rm = TRUE))
Use . to refer to the data within the pipe:
library(dplyr)
df %>% mutate(IVMean = rowMeans(select(., starts_with("IV")), na.rm = TRUE))
Here is a dplyr solution using c_across which is designed for row-wise aggregations. This makes it easy to refer to columns by name, type or position and to apply any function to the selected columns.
library("tidyverse")
df <-
tibble::tribble(
~CODE_COUNTRY, ~CODE_PLOT, ~IV12_ha_yr, ~IV23_ha_yr, ~IV34_ha_yr, ~IV14_ha_yr, ~IV24_ha_yr, ~IV13_ha_yr,
1L, 6L, 4.1, 6.97, NA, NA, NA, 4.58,
1L, 17L, 9.88, 8.75, NA, NA, NA, 8.25,
1L, 30L, NA, NA, NA, NA, NA, NA,
1L, 37L, 15.43, 15.07, 11.89, 10, 12.09, 14.33,
1L, 41L, 20.21, 15.01, 14.72, 11.31, 13.27, 17.09,
1L, 46L, 12.64, 14.36, 13.65, 9.07, 12.47, 12.36
)
df %>%
rowwise() %>%
mutate(
IV_mean = mean(c_across(starts_with("IV")), na.rm = TRUE),
IV_sd = sd(c_across(starts_with("IV")), na.rm = TRUE)
)
#> # A tibble: 6 × 10
#> # Rowwise:
#> CODE_COUNTRY CODE_PLOT IV12_ha_yr IV23_ha_yr IV34_ha_yr IV14_ha_yr IV24_ha_yr
#> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 6 4.1 6.97 NA NA NA
#> 2 1 17 9.88 8.75 NA NA NA
#> 3 1 30 NA NA NA NA NA
#> 4 1 37 15.4 15.1 11.9 10 12.1
#> 5 1 41 20.2 15.0 14.7 11.3 13.3
#> 6 1 46 12.6 14.4 13.6 9.07 12.5
#> # … with 3 more variables: IV13_ha_yr <dbl>, IV_mean <dbl>, IV_sd <dbl>
Created on 2022-06-25 by the reprex package (v2.0.1)
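A usage note: for a plain mean, rowMeans() over the selected columns is usually much faster than rowwise() + c_across(), since it is vectorised; a sketch of the equivalent call:
df %>%
  mutate(IV_mean = rowMeans(across(starts_with("IV")), na.rm = TRUE))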
I tried to comment on Rick Scriven's answer but don't have the experience points for it. Anyway, wanted to contribute. His answer said to do this:
library(dplyr)
mutate(df, IVMean = rowMeans(select(df, starts_with("IV")), na.rm = TRUE))
That works, but what if not all of the columns start with "IV", as in my case? It turns out that select does not want a logical vector, so you can't use AND or OR; for example, you cannot say starts_with('X') | starts_with('Y'). You have to build a numeric vector of column positions instead. Here is how it is done:
mutate(df, IVMean = rowMeans(select(df, c(starts_with("IV"), starts_with("IX"))), na.rm = TRUE))
You can do it as follows.
Your data:
data<- structure(list(CODE_COUNTRY = c(1L, 1L, 1L, 1L, 1L, 1L), CODE_PLOT = c(6L,
17L, 30L, 37L, 41L, 46L), IV12_ha_yr = c(4.1, 9.88, NA, 15.43,
20.21, 12.64), IV23_ha_yr = c(6.97, 8.75, NA, 15.07, 15.01, 14.36
), IV34_ha_yr = c(NA, NA, NA, 11.89, 14.72, 13.65), IV14_ha_yr = c(NA,
NA, NA, 10, 11.31, 9.07), IV24_ha_yr = c(NA, NA, NA, 12.09, 13.27,
12.47), IV13_ha_yr = c(4.58, 8.25, NA, 14.33, 17.09, 12.36)), .Names = c("CODE_COUNTRY",
"CODE_PLOT", "IV12_ha_yr", "IV23_ha_yr", "IV34_ha_yr", "IV14_ha_yr",
"IV24_ha_yr", "IV13_ha_yr"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
mydata <- cbind(data,IVMean=apply(data[,3:8],1,mean, na.rm=TRUE))
You can also do this:
mydata <- cbind(data,IVMean=rowMeans(data[3:8], na.rm=TRUE))