Calculate medians of rows in a grouped dataframe - r

I have a dataframe containing multiple entries per week. It looks like this:
Week t_10 t_15 t_18 t_20 t_25 t_30
1 51.4 37.8 25.6 19.7 11.9 5.6
2 51.9 37.8 25.8 20.4 12.3 6.2
2 52.4 38.5 26.2 20.5 12.3 6.1
3 52.2 38.6 26.1 20.4 12.4 5.9
4 52.2 38.3 26.1 20.2 12.1 5.9
4 52.7 38.4 25.8 20.0 12.1 5.9
4 51.1 37.8 25.7 20.0 12.2 6.0
4 51.9 38.0 26.0 19.8 12.0 5.8
The weeks have different numbers of entries, ranging from one to four per week.
I want to calculate the median of each variable (t_10 through t_30) for each week and store the result in a new dataframe. NA cells have already been omitted from the original dataframe. I have tried different approaches with the ddply function of the plyr package, but to no avail so far.

We could use summarise_at for multiple columns
library(dplyr)
colsToKeep <- c("t_10", "t_30")
df1 %>%
group_by(Week) %>%
summarise_at(vars(colsToKeep), median)
# A tibble: 4 x 3
# Week t_10 t_30
# <int> <dbl> <dbl>
#1 1 51.40 5.60
#2 2 52.15 6.15
#3 3 52.20 5.90
#4 4 52.05 5.90
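The question asks for the medians of all columns from t_10 through t_30, not only two of them. A minimal sketch of that, assuming dplyr 1.0.0 or later (where across() is available) and rebuilding the question's data by hand under the name df1; weekly_medians is just an illustrative name:

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(
  Week = c(1, 2, 2, 3, 4, 4, 4, 4),
  t_10 = c(51.4, 51.9, 52.4, 52.2, 52.2, 52.7, 51.1, 51.9),
  t_15 = c(37.8, 37.8, 38.5, 38.6, 38.3, 38.4, 37.8, 38.0),
  t_18 = c(25.6, 25.8, 26.2, 26.1, 26.1, 25.8, 25.7, 26.0),
  t_20 = c(19.7, 20.4, 20.5, 20.4, 20.2, 20.0, 20.0, 19.8),
  t_25 = c(11.9, 12.3, 12.3, 12.4, 12.1, 12.1, 12.2, 12.0),
  t_30 = c(5.6, 6.2, 6.1, 5.9, 5.9, 5.9, 6.0, 5.8)
)

# One row per week, one median per t_* column
weekly_medians <- df1 %>%
  group_by(Week) %>%
  summarise(across(t_10:t_30, median))

weekly_medians
```

The range t_10:t_30 relies on the columns being adjacent in the data frame; starts_with("t_") would be an alternative selection if they are not.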

Specify variables to keep in colsToKeep and store input table in d
library(tidyverse)
colsToKeep <- c("t_10", "t_30")
gather(d, variable, value, -Week) %>%
filter(variable %in% colsToKeep) %>%
group_by(Week, variable) %>%
summarise(median = median(value))
# A tibble: 8 x 3
# Groups: Week [4]
# Week variable median
# <int> <chr> <dbl>
# 1 1 t_10 51.40
# 2 1 t_30 5.60
# 3 2 t_10 52.15
# 4 2 t_30 6.15
# 5 3 t_10 52.20
# 6 3 t_30 5.90
# 7 4 t_10 52.05
# 8 4 t_30 5.90

You can also use the aggregate function:
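# The grouping variable goes on the right-hand side of the formula;
# "." stands for all remaining columns
newdf <- aggregate(. ~ Week, data = df, FUN = median)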

Related

Merging 2 dataframes (when columns are different)

I am trying to merge 2 data frames.
The main dataset, df1, contains numerical data in wide format - each row represents a date, each column contains the value for that date in a given city.
df2 contains metadata for each city: latitude, longitude, and elevation.
What I wish to do is add the metadata for each city to df1, but I was unsuccessful in doing so as the data frames don't match up in structure.
df1
Date Machrihanish High_Wycombe Camborne Dun_Fell Plymouth
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 20200101 8.5 6.9 9.6 3.3 9.9
2 20200102 11.7 9.1 11.2 5 10.9
3 20200103 9.1 9.9 11.2 5.1 11.1
4 20200104 9.2 8.1 9.4 2.2 9.4
5 20200105 11.7 7.6 9 4.3 9.3
6 20200106 10.8 8 11.6 3.7 10.6
7 20200107 14.7 11.7 12 6.7 11.5
8 20200108 11.2 11.8 11.6 6.2 11.3
9 20200109 7 12 11.6 -0.2 11.5
10 20200110 9.3 7.4 10 0 10.1
df2
Location Longitude Latitude Elevation
<chr> <dbl> <dbl> <dbl>
1 Machrihanish -5.70 55.4 10
2 High_Wycombe -0.807 51.7 204
3 Camborne -5.33 50.2 87
4 Dun_Fell -2.45 54.7 847
5 Plymouth -4.12 50.4 50
Here is a solution that tidies the data to long format by location and day, and merges the lat / long information.
Using data provided in the original post, we read it into two data frames.
tempText <- "rowId Date Machrihanish High_Wycombe Camborne Dun_Fell Plymouth
1 20200101 8.5 6.9 9.6 3.3 9.9
2 20200102 11.7 9.1 11.2 5 10.9
3 20200103 9.1 9.9 11.2 5.1 11.1
4 20200104 9.2 8.1 9.4 2.2 9.4
5 20200105 11.7 7.6 9 4.3 9.3
6 20200106 10.8 8 11.6 3.7 10.6
7 20200107 14.7 11.7 12 6.7 11.5
8 20200108 11.2 11.8 11.6 6.2 11.3
9 20200109 7 12 11.6 -0.2 11.5
10 20200110 9.3 7.4 10 0 10.1"
library(tidyr)
library(dplyr)
temps <- read.table(text = tempText,header = TRUE)
latLongs <-"rowId Location Longitude Latitude Elevation
1 Machrihanish -5.70 55.4 10
2 High_Wycombe -0.807 51.7 204
3 Camborne -5.33 50.2 87
4 Dun_Fell -2.45 54.7 847
5 Plymouth -4.12 50.4 50"
latLongs <- read.table(text = latLongs,header = TRUE)
Next, we use tidyr::pivot_longer() to generate long format data, and then merge it with the lat long data via dplyr::full_join(). Note that we set the name of the column where the wide format column names are stored with names_to = "Location" so that full_join() uses Location to join the two data frames.
temps %>%
select(-rowId) %>%
pivot_longer(.,Machrihanish:Plymouth,names_to = "Location", values_to="MaxTemp") %>%
full_join(.,latLongs) %>% select(-rowId) -> joinedData
head(joinedData)
...and the first few rows of the joined output look like this:
> head(joinedData)
# A tibble: 6 × 6
Date Location MaxTemp Longitude Latitude Elevation
<int> <chr> <dbl> <dbl> <dbl> <int>
1 20200101 Machrihanish 8.5 -5.7 55.4 10
2 20200101 High_Wycombe 6.9 -0.807 51.7 204
3 20200101 Camborne 9.6 -5.33 50.2 87
4 20200101 Dun_Fell 3.3 -2.45 54.7 847
5 20200101 Plymouth 9.9 -4.12 50.4 50
6 20200102 Machrihanish 11.7 -5.7 55.4 10
>
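The same reshape-then-join idea can be tried on a two-city subset. This is only a sketch: it spells out the join key with by = "Location" instead of relying on the automatic match, and uses left_join() as a swapped-in alternative to full_join() so that only cities present in the temperature data are kept. The names meta and joined are illustrative.

```r
library(dplyr)
library(tidyr)

# Two dates and two cities taken from the question's data
temps <- data.frame(
  Date = c(20200101, 20200102),
  Camborne = c(9.6, 11.2),
  Plymouth = c(9.9, 10.9)
)
meta <- data.frame(
  Location = c("Camborne", "Plymouth"),
  Longitude = c(-5.33, -4.12),
  Latitude = c(50.2, 50.4),
  Elevation = c(87, 50)
)

# Wide to long (one row per date and city), then attach the metadata by city name
joined <- temps %>%
  pivot_longer(-Date, names_to = "Location", values_to = "MaxTemp") %>%
  left_join(meta, by = "Location")

joined
```

Naming the key explicitly avoids the "Joining with by = ..." message and guards against accidental matches on other shared column names such as rowId.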

Repeat mean values based in a custom ordination

I have a custom ordering of classes, from smallest to largest: "Class_0_1","Class_1_3","Class_3_9", "Class_9_25","Class_25_50","Class_50"
library(dplyr)
my.ds <- read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/test_ants.csv")
my.ds$ClassType <- cut(my.ds$AT,breaks=c(-Inf,1,2.9,8.9,24.9,49.9,Inf),
right=FALSE,labels=c("Class_0_1","Class_1_3","Class_3_9",
"Class_9_25","Class_25_50","Class_50"))
my.ds%>% group_by(nest,ClassType)%>% summarize(avg=mean(AT))
# A tibble: 14 x 3
# Groups: nest [7]
nest ClassType avg
<int> <fct> <dbl>
1 2 Class_9_25 19.0
2 3 Class_0_1 0.776
3 3 Class_9_25 12.4
4 3 Class_25_50 29.4
5 4 Class_1_3 2.42
6 4 Class_9_25 17.0
7 7 Class_9_25 18.2
8 7 Class_25_50 33.1
9 10 Class_3_9 5.22
10 10 Class_9_25 13.6
11 10 Class_25_50 38.9
12 10 Class_50 110.
13 1066 Class_0_1 0.111
14 1067 Class_0_1 0.436
Within each nest, I'd like every ClassType that is absent to repeat the mean of the preceding class. The desired output for nest 3, for example, is:
nest ClassType avg
<int> <fct> <dbl>
...
3 Class_0_1 0.776
3 Class_1_3 0.776
3 Class_3_9 0.776
3 Class_9_25 12.4
3 Class_25_50 29.4
...
You could try complete() and fill() from tidyr:
library(tidyr)
my.ds %>%
group_by(nest,ClassType)%>%
summarize(avg=mean(AT)) %>%
complete(ClassType, fill = list(avg = NA)) %>%
fill(avg, .direction = "downup")
nest ClassType avg
<int> <fct> <dbl>
1 2 Class_0_1 19.0
2 2 Class_1_3 19.0
3 2 Class_3_9 19.0
4 2 Class_9_25 19.0
5 2 Class_25_50 19.0
6 2 Class_50 19.0
7 3 Class_0_1 0.776
8 3 Class_1_3 0.776
9 3 Class_3_9 0.776
10 3 Class_9_25 12.4
# … with 32 more rows
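Since the data above comes from an external CSV, here is a small self-contained sketch of the same complete() and fill() pattern on made-up toy data (the toy values and the three-level factor are assumptions for illustration only). It shows that complete() adds the factor levels missing within each nest, and that "downup" first carries the previous value forward and only then fills leading gaps from below.

```r
library(dplyr)
library(tidyr)

# Toy data: nest 1 lacks "mid"; nest 2 has only "mid"
toy <- data.frame(
  nest = c(1, 1, 1, 2),
  ClassType = factor(c("low", "high", "high", "mid"),
                     levels = c("low", "mid", "high")),
  AT = c(0.5, 10, 20, 3)
)

filled <- toy %>%
  group_by(nest, ClassType) %>%
  summarize(avg = mean(AT), .groups = "drop_last") %>%  # still grouped by nest
  complete(ClassType) %>%                # add the missing factor levels per nest
  fill(avg, .direction = "downup") %>%   # previous value first, then next value
  ungroup()

filled
```

In nest 1 the missing "mid" takes the value of "low"; in nest 2, where no earlier value exists for "low", it is filled upward from "mid".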

Add -0.5 to a value below 0 and add 0.5 to value above 0 in r

This may be a strange question. I have a dataframe like the one below:
Station Mean_length Diff
1 AMEL 28.1 -2.91
2 AMRU 21.1 -9.90
3 BALG 31.0 0
4 BORK 30.1 -0.921
5 BUSU 22.6 -8.38
6 CADZ 28.5 2.46
7 DOLL 27.9 -3.07
8 EGMO 28.3 -2.69
9 EIER 30.8 0.233
10 FANO 23.1 -7.89
Now from column "Diff" I want to get a new column and I want to add -0.5 to a value below 0 and add 0.5 to value above 0.
So I get a new dataframe like this:
Station Mean_length Diff Diff05
1 AMEL 28.1 -2.91 -3.41 (-0.5)
2 AMRU 21.1 -9.90 -10.4 (-0.5)
3 BALG 31.0 0 0.5 (+0.5)
4 BORK 30.1 -0.921 -1.421 (-0.5)
5 BUSU 22.6 -8.38 -8.88 (-0.5)
6 CADZ 28.5 2.46 2.96 (+0.5)
7 DOLL 27.9 -3.07 -3.57 (-0.5)
8 EGMO 28.3 -2.69 -3.19 (-0.5)
9 EIER 30.8 0.233 0.733 (+0.5)
10 FANO 23.1 -7.89 -8.39 (-0.5)
How can I tackle this? Is it possible with dplyr, perhaps with the 'ifelse' function, by recognizing values that have a '-' in front of them?
Thank you in advance!
Another way (note that sign(0) is 0, so a Diff of exactly 0 is left unchanged):
df$Diff05 <- df$Diff + 0.5 * sign(df$Diff)
Station Mean_length Diff Diff05
1 AMEL 28.1 -2.910 -3.410
2 AMRU 21.1 -9.900 -10.400
3 BALG 31.0 0.000 0.000
4 BORK 30.1 -0.921 -1.421
5 BUSU 22.6 -8.380 -8.880
6 CADZ 28.5 2.460 2.960
7 DOLL 27.9 -3.070 -3.570
8 EGMO 28.3 -2.690 -3.190
9 EIER 30.8 0.233 0.733
10 FANO 23.1 -7.890 -8.390
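You could also use df$Diff + (df$Diff >= 0) - 0.5, which maps a Diff of 0 to 0.5 as in the expected output (with > instead of >=, 0 would become -0.5).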
Does this work:
library(dplyr)
df %>% mutate(Diff05 = if_else(Diff < 0, Diff - 0.5, Diff + 0.5))
# A tibble: 10 x 4
station Mean_length Diff Diff05
<chr> <dbl> <dbl> <dbl>
1 AMEL 28.1 -2.91 -3.41
2 AMRU 21.1 -9.9 -10.4
3 BALG 31 0 0.5
4 BORK 30.1 -0.921 -1.42
5 BUSU 22.6 -8.38 -8.88
6 CADZ 28.5 2.46 2.96
7 DOLL 27.9 -3.07 -3.57
8 EGMO 28.3 -2.69 -3.19
9 EIER 30.8 0.233 0.733
10 FANO 23.1 -7.89 -8.39
The logical way
df$Diff05 <- ifelse(test = df$Diff < 0, yes = df$Diff - 0.5, no = df$Diff + 0.5)
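The approaches above agree for negative and positive values and differ only when Diff is exactly 0. A short sketch on three values taken from the question makes that visible (the variable names via_sign and via_ifelse are illustrative):

```r
# One negative value, zero, and one positive value from the question
Diff <- c(-2.91, 0, 2.46)

# sign() returns -1, 0 or 1, so zero stays zero
via_sign <- Diff + 0.5 * sign(Diff)

# The test Diff < 0 is FALSE for zero, so zero is shifted up by 0.5
via_ifelse <- ifelse(Diff < 0, Diff - 0.5, Diff + 0.5)

via_sign    # -3.41, 0, 2.96
via_ifelse  # -3.41, 0.5, 2.96
```

Which behaviour is wanted depends on how a difference of exactly zero should be treated; the expected output in the question shifts it to 0.5.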

Apply row-wise transformation in R so that total percentage for each row will be 100%

I have a data frame like this:
df <- structure(list(groups = c("group1", "group2", "group3", "group4", "group5"),
A = c(28.6, 26.7, 29.1, 23.1, 1.0),
B = c(24.5, 22.3, 23.9, 20.2, 1.5),
C = c(12.1, 11.2, 12.1, 11.7, 1.5),
D = c(9.4, 7.0, 9.0, 8.7, 1.1)),
class = "data.frame",
row.names = c("1", "2", "3", "4", "5"))
groups A B C D
1 group1 28.6 24.5 12.1 9.4
2 group2 26.7 22.3 11.2 7.0
3 group3 29.1 23.9 12.1 9.0
4 group4 23.1 20.2 11.7 8.7
5 group5 1.0 1.5 1.5 1.1
The values in the dataframe are percentages. I would like to rescale each row so that its total is 100%. The output would look similar to this (I calculated the expected output by hand, so it may not be as accurate as a computed result):
groups A B C D
1 group1 38.3 32.8 16.2 12.6
2 group2 39.7 33.1 16.7 10.4
3 group3 39.7 32.6 16.4 11.3
4 group4 36.3 31.5 18.9 13.3
How should I do it? Thank you!
You can use proportions to get percentages.
proportions(as.matrix(df[1:4,-1]), 1) * 100
# A B C D
#1 38.33780 32.84182 16.21984 12.60054
#2 39.73214 33.18452 16.66667 10.41667
#3 39.27126 32.25371 16.32928 12.14575
#4 36.26374 31.71115 18.36735 13.65777
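If you want to do this in the dplyr context:
library(dplyr)
df %>%
rowwise() %>%
# row total across all numeric columns
mutate(sm = sum(c_across(-groups))) %>%
# divide each value by its row total and convert to percent
mutate(across(A:D, function(x) x / sm) * 100) %>%
select(-sm)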
## A tibble: 5 x 5
## Rowwise:
# groups A B C D
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 group1 38.3 32.8 16.2 12.6
#2 group2 39.7 33.2 16.7 10.4
#3 group3 39.3 32.3 16.3 12.1
#4 group4 36.3 31.7 18.4 13.7
#5 group5 19.6 29.4 29.4 21.6
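As a cross-check, the same rescaling can be sketched in base R with rowSums(), shown here on the first two rows of the question's data. The names small, num_cols and scaled are illustrative, not part of the original answers.

```r
# First two rows of the question's data
small <- data.frame(
  groups = c("group1", "group2"),
  A = c(28.6, 26.7),
  B = c(24.5, 22.3),
  C = c(12.1, 11.2),
  D = c(9.4, 7.0)
)

num_cols <- c("A", "B", "C", "D")

# Dividing a data frame by a vector of length nrow() works down each column,
# so every value is divided by the total of its own row
scaled <- small
scaled[num_cols] <- small[num_cols] / rowSums(small[num_cols]) * 100

scaled
rowSums(scaled[num_cols])  # each row now sums to 100
```

Keeping the groups column out of the arithmetic through num_cols avoids the need to convert to a matrix first.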

Computation of the average of n observations in a data set using R

How do I compute the mean of the last 5 observations per class in a data set? The first column is the class (Plot) and the second column is the measured variable (Weight).
Plot Weight
1 12.5
1 14.5
1 15.8
1 16.1
1 18.9
1 21.2
1 23.4
1 25.7
2 13.1
2 15.0
2 15.8
2 16.3
2 17.4
2 18.6
2 22.6
2 24.1
2 25.6
3 11.5
3 12.2
3 13.9
3 14.7
3 18.9
3 20.5
3 21.6
3 22.6
3 24.1
3 25.8
We select the last 5 observations for each 'Plot' and get the mean
library(dplyr)
df1 %>%
group_by(Plot) %>%
summarise(MeanWt = mean(tail(Weight, 5)))
Or with data.table
library(data.table)
setDT(df1)[, .(MeanWt = mean(tail(Weight, 5))), by = Plot]
Or using base R
aggregate(cbind(MeanWt = Weight) ~ Plot, data = df1, FUN = function(x) mean(tail(x, 5)))
Here is a step-by-step solution that uses no packages.
You can of course make the code shorter with a for loop or an apply function.
Hope you find it useful.
#Collecting your data
values <- scan()
1 12.5 1 14.5 1 15.8 1 16.1 1 18.9 1 21.2 1 23.4 1 25.7 2 13.1 2 15.0 2 15.8
2 16.3 2 17.4 2 18.6 2 22.6 2 24.1 2 25.6 3 11.5 3 12.2 3 13.9 3 14.7 3 18.9
3 20.5 3 21.6 3 22.6 3 24.1 3 25.8
data_w <- matrix(values, ncol=2, byrow = T)
#Naming your cols
colnames(data_w) <- c("Plot", "Weight")
dt_w <- as.data.frame(data_w)
#Mean of the 5 last observations by class:
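#Weights that belong to Plot 1
weights1 <- dt_w$Weight[dt_w$Plot == 1]
#Number of observations in Plot 1
size1 <- length(weights1)
#Start index of the last 5 values (size1 - 4, so index1:size1 spans exactly 5)
index1 <- size1 - 4
#Way to compute the mean
mean1 <- mean(weights1[index1:size1])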
#mean of the last 5 observations of class 1
mean1
The process is the same for classes 2 and 3.
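To avoid repeating the steps for every class, the same calculation can be sketched with tapply() in base R. The data frame is rebuilt from the question's table under the name df1; means is an illustrative name.

```r
# Data from the question: 8, 9 and 10 observations for Plots 1, 2 and 3
df1 <- data.frame(
  Plot = rep(1:3, times = c(8, 9, 10)),
  Weight = c(
    12.5, 14.5, 15.8, 16.1, 18.9, 21.2, 23.4, 25.7,
    13.1, 15.0, 15.8, 16.3, 17.4, 18.6, 22.6, 24.1, 25.6,
    11.5, 12.2, 13.9, 14.7, 18.9, 20.5, 21.6, 22.6, 24.1, 25.8
  )
)

# Split Weight by Plot, keep the last 5 values of each group, average them
means <- tapply(df1$Weight, df1$Plot, function(w) mean(tail(w, 5)))

means  # 21.06, 21.66 and 22.92 for Plots 1, 2 and 3
```

tail(w, 5) returns all values when a group has fewer than five observations, so short groups do not cause an error.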
