Aggregate top 3 values of a numerical column - r

I have a column of species abundance
Pct_Cov Species Site Plot
1 2.25 AMLA AC 1
2 4.75 BECA4 AC 1
3 9.50 BEPA AC 1
4 7.00 BEPO AC 1
5 9.25 PIRU AC 1
6 2.25 PIRI AC 1
tail
tail(st.ov)
Pct_Cov Species Site Plot
612207 8.0 QUGA ZI 527
612208 1.0 RHAR4 ZI 527
612209 0.5 ARTR2 ZI 527
612210 1.0 POFE ZI 527
612211 3.0 VICIA ZI 527
612212 0.5 ARLU ZI 527
There are a LOT of plots here, 12438 to be exact. Each plot has a different mix of species. I'm trying to write a function that creates a new column holding the ratio of the abundance of the dominant species to the abundance of the subordinate species.
"Dominant" would be the sum of the top 1/4 of the species in each plot. So if a plot had 16 species, it would be the sum of the abundance of the 4 most abundant species.
I'm having a hard time going about this and was wondering if anyone had any tips. It would also be helpful to know what those species are, but that seems to be tricky.
Thanks!

Here's another tidyverse option. Since your data only has 6 rows for each of two Plots, I'll go with the "top 2" and "all but top 2", instead of your "4". It's easily modified.
library(dplyr)
dat %>%
group_by(Plot) %>%
# rank in descending order so that ranks 1:2 are the two highest Pct_Cov values
mutate(R = dense_rank(desc(Pct_Cov))) %>%
summarize(Ratio = sum(Pct_Cov[R %in% 1:2]) / sum(Pct_Cov[! R %in% 1:2]))
# # A tibble: 2 x 2
# Plot Ratio
# <int> <dbl>
# 1 1 1.14
# 2 527 3.67
This does not protect against plots with few unique species. For that, one might add some row-counting logic:
dat %>%
group_by(Plot) %>%
# rank in descending order so that ranks 1:2 are the two highest Pct_Cov values
mutate(R = dense_rank(desc(Pct_Cov))) %>%
summarize(Ratio = if (n() > (2+2)) sum(Pct_Cov[R %in% 1:2]) / sum(Pct_Cov[! R %in% 1:2]) else NA_real_)
If you get an NA, that means that that Plot had too few unique species.
Also, it doesn't acknowledge the possibility of 3 (my "2" plus one) having the same Pct_Cov, which sounds unlikely but would be a corner-case that will skew the math.
Data
dat <- structure(list(Pct_Cov = c(2.25, 4.75, 9.5, 7, 9.25, 2.25, 8, 1, 0.5, 1, 3, 0.5), Species = c("AMLA", "BECA4", "BEPA", "BEPO", "PIRU", "PIRI", "QUGA", "RHAR4", "ARTR2", "POFE", "VICIA", "ARLU"), Site = c("AC", "AC", "AC", "AC", "AC", "AC", "ZI", "ZI", "ZI", "ZI", "ZI", "ZI"), Plot = c(1L, 1L, 1L, 1L, 1L, 1L, 527L, 527L, 527L, 527L, 527L, 527L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "612207", "612208", "612209", "612210", "612211", "612212"))
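For the rule stated in the question (the top quarter of species in each plot), and to see which species count as dominant, a minimal sketch follows. It assumes ties are broken by row order and that the quarter is rounded up with ceiling(); the `dominant` flag and the `top_quarter` name are illustrative and not part of the answer above. The sample data is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same 12 sample rows as in the Data block above, without the Site column
dat <- data.frame(
  Pct_Cov = c(2.25, 4.75, 9.5, 7, 9.25, 2.25, 8, 1, 0.5, 1, 3, 0.5),
  Species = c("AMLA", "BECA4", "BEPA", "BEPO", "PIRU", "PIRI",
              "QUGA", "RHAR4", "ARTR2", "POFE", "VICIA", "ARLU"),
  Plot = rep(c(1L, 527L), each = 6)
)

top_quarter <- dat %>%
  group_by(Plot) %>%
  # sort each plot from most to least abundant
  arrange(desc(Pct_Cov), .by_group = TRUE) %>%
  # assumption: "top 1/4" means the first ceiling(n / 4) rows of the plot
  mutate(dominant = row_number() <= ceiling(n() / 4)) %>%
  summarise(
    dominant_species = toString(Species[dominant]),
    Ratio = sum(Pct_Cov[dominant]) / sum(Pct_Cov[!dominant])
  )

top_quarter
```

With 6 species per plot, ceiling(6 / 4) gives 2 dominant species. Swapping ceiling() for floor() or round() changes how plots with a species count not divisible by 4 are handled.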

We could use count to get the frequency of each 'Plot' and 'Species' combination, arrange by 'Plot' and descending 'n', then, grouped by 'Plot', create the ratio by dividing the sum of the first 3 'n' values by the sum of the rest, and join the result back to the original data.
library(dplyr)
out <- df1 %>%
count(Plot, Species) %>%
arrange(Plot, desc(n)) %>%
group_by(Plot) %>%
mutate(ratio = sum(n[1:3])/sum(n[-(1:3)])) %>%
right_join(df1, by = c("Plot", "Species"))

Related

Summary means using dplyr

I'm trying to produce a table with summary totals and means across the whole dataset, and then by sub-category (f_grp), and show this by site.
I can use the group_by function to group by, which works well for reporting total_count and Mean_per_litre, but I would then like the same values for each category, as shown in f_grp.
|Site |total_count |Mean_per_litre
|1 |66 |3.33333333
|2 |77 |4.27777778
|3 |65 |3.38541667
|4 |154 |8.85057471
etc
I've tried group_by for both site and f_grp but this isn't quite right
|site |f_grp |total_count |mean_per_litre
|1 |1c |3 |1.666667
|1 |1d |15 |4.166667
|1 |2a |1 |1.666667
|1 |2b |47 |11.190476
This isn't quite right, as it's not easy to read and I've now lost the original total columns I had in the first table (sorry about the tables, can't get them to work here).
dat$site=as.factor(dat$site)
dat$count=as.numeric(dat$count)
dat$f_grp=as.factor(dat$f_grp)
# totals across all f_grp
tabl1 <- dat %>%
group_by(site) %>%
summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl1
# totals FG 1b
tabl2 <- dat %>%
group_by(site) %>%
filter(f_grp== '1b') %>%
summarise ('1b_total_count' = sum(count))
tabl2
### BUT - this doesn't give a correct mean, as it only shows the mean of '1b' when only '1b' is present. I need a mean over the entire dataset at that site.
# table showing totals across whole dataset
tabl7 <- dat %>%
summarise (total_count = sum(count, na.rm = TRUE), Total_mean_per_litre = mean(count_l_site, na.rm = TRUE))
tabl7
# table with means for each site by fg
table6 <- dat %>%
group_by(site, f_grp) %>%
summarise (total_count = sum(count), mean_per_litre = mean(count_l_site, na.rm = TRUE))
table6
Ideally I need a way to extract the f-grp categories, put them as column headings, and then summarise means by site for those categories. But filtering the data and then joining multiple tables, gives incorrect means (as not mean of whole dataset, but a subset of that category, ie: when f_grp value is present only).
Many thanks to all who have read this far :)
> dput(head(dat))
structure(list(X = 1:6, site = structure(c(1L, 10L, 11L, 12L,
13L, 14L), levels = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18"), class = "factor"),
count = c(0, 0, 0, 0, 0, 0), f_grp = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), levels = c("1b", "1c", "1d", "2a", "2b"), class = "factor"),
count_l_site = c(0, 0, 0, 0, 0, 0)), row.names = c(NA, 6L
), class = "data.frame")
Updated:
Following advice here from Jon, and using the mtcars data (which worked as expected) I've tried the same method using my own data.
I can almost produce what's needed, but the totals are coming through as a row when they are needed as a column.
tabl1 <- dat %>%
group_by(site) %>%
summarise (total_count = sum(count), Mean_per_litre = mean(count_l_site)) %>%
mutate(fg = "total")
tabl1
tabl2_fg <- dat %>%
group_by(site, f_grp = as.character(f_grp)) %>%
summarize(total_count = sum(count), Mean_per_litre = mean(count_l_site))
tabl2_fg
tabl4 <-
bind_rows(tabl1, tabl2_fg) %>%
arrange(site, f_grp) %>%
tidyr::pivot_wider(names_from = f_grp, values_from = c(Mean_per_litre, total_count), names_vary = "slowest")
tabl4
Output as follows
Next steps:
move the circled outputs and put them at the beginning of the table
remove every other line
result - left with a simple table rows = sites; columns: total count; total mean; then columns for each fg count & mean: eg 1c count; 1c mean; 1d count; 1d mean.
Something like this?
library(dplyr)
avg_gear <- mtcars %>%
group_by(gear) %>%
summarize(avg_mpg = mean(mpg), n = n()) %>%
mutate(cyl = "total")
avg_gear_cyl <- mtcars %>%
group_by(gear,cyl = as.character(cyl)) %>%
summarize(avg_mpg = mean(mpg), n = n())
bind_rows(avg_gear, avg_gear_cyl) %>%
arrange(gear, cyl)
# A tibble: 11 × 4
gear avg_mpg n cyl
<dbl> <dbl> <int> <chr>
1 3 21.5 1 4
2 3 19.8 2 6
3 3 15.0 12 8
4 3 16.1 15 total
5 4 26.9 8 4
6 4 19.8 4 6
7 4 24.5 12 total
8 5 28.2 2 4
9 5 19.7 1 6
10 5 15.4 2 8
11 5 21.4 5 total
Or if you want categories as columns:
bind_rows(avg_gear, avg_gear_cyl) %>%
arrange(gear, desc(cyl)) %>%
tidyr::pivot_wider(names_from = cyl, values_from = c(avg_mpg, n), names_vary = "slowest")
# A tibble: 3 × 9
gear avg_mpg_total n_total avg_mpg_8 n_8 avg_mpg_6 n_6 avg_mpg_4 n_4
<dbl> <dbl> <int> <dbl> <int> <dbl> <int> <dbl> <int>
1 3 16.1 15 15.0 12 19.8 2 21.5 1
2 4 24.5 12 NA NA 19.8 4 26.9 8
3 5 21.4 5 15.4 2 19.7 1 28.2 2
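If the totals should always sit in the first columns, one alternative is to summarise the totals and the per-category values separately and join them by the grouping key. This is a sketch on mtcars under that assumption; the `totals`, `by_cyl` and `wide` names are illustrative.

```r
library(dplyr)
library(tidyr)

# One row per gear with the overall summary
totals <- mtcars %>%
  group_by(gear) %>%
  summarise(avg_mpg_total = mean(mpg), n_total = n())

# One row per gear, with a pair of columns for each cyl category
by_cyl <- mtcars %>%
  group_by(gear, cyl) %>%
  summarise(avg_mpg = mean(mpg), n = n(), .groups = "drop") %>%
  pivot_wider(names_from = cyl, values_from = c(avg_mpg, n),
              names_vary = "slowest")

# Totals first, then the category columns
wide <- left_join(totals, by_cyl, by = "gear")
wide
```

Because the totals are computed from the ungrouped-by-category data, they are means over every row for that gear, not means of the category means.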

Making a table that contains Mean and SD of a Dataset

I am using this dataset: http://www.openintro.org/stat/data/cdc.R
to create a table from a subset that only contains the means and standard deviations of male participants. The table should look like this:
Mean Standard Deviation
Age: 44.27 16.715
Height: 70.25 3.009219
Weight: 189.3 36.55036
Desired Weight: 178.6 26.25121
I created a subset for males and females with this code:
mdata <- subset(cdc, cdc$gender == ("m"))
fdata <- subset(cdc, cdc$gender == ("f"))
How should I create a table that only contains means and SDs of age, height, weight, and desired weight using these subsets?
The data frame you provided sucked up all the memory on my laptop, and it's not needed to provide that much data to solve your problem. Here's a dplyr/tidyr solution to create a summary table grouped by categories, using the starwars dataset available with dplyr:
library(dplyr)
library(tidyr)
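starwars |>
group_by(sex) |>
summarise(across(
where(is.numeric),
# na.rm goes inside anonymous functions; passing it through across(...) is deprecated since dplyr 1.1.0
.fns = list(Mean = \(x) mean(x, na.rm = TRUE), SD = \(x) sd(x, na.rm = TRUE)),
.names = "{col}__{fn}"
)) |>
pivot_longer(-sex, names_to = c("var", ".value"), names_sep = "__")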
# A tibble: 15 × 4
sex var Mean SD
<chr> <chr> <dbl> <dbl>
1 female height 169. 15.3
2 female mass 54.7 8.59
3 female birth_year 47.2 15.0
4 hermaphroditic height 175 NA
5 hermaphroditic mass 1358 NA
6 hermaphroditic birth_year 600 NA
7 male height 179. 36.0
8 male mass 81.0 28.2
9 male birth_year 85.5 157.
10 none height 131. 49.1
11 none mass 69.8 51.0
12 none birth_year 53.3 51.6
13 NA height 181. 2.89
14 NA mass 48 NA
15 NA birth_year 62 NA
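For the exact layout the question asks for (one row per variable, males only), a base R sketch could look like the following. It uses a six-row stand-in with the same column names as cdc, so the numbers will not match the full dataset; `cdc_small` and `male_tab` are illustrative names.

```r
# Six-row stand-in for cdc (assumption: same column names as the real data)
cdc_small <- data.frame(
  gender   = factor(c("m", "f", "m", "f", "f", "m"), levels = c("m", "f")),
  age      = c(73L, 23L, 35L, 57L, 81L, 83L),
  height   = c(69, 66, 73, 65, 67, 69),
  weight   = c(224L, 215L, 200L, 216L, 165L, 170L),
  wtdesire = c(224L, 140L, 185L, 150L, 165L, 165L)
)

vars  <- c("age", "height", "weight", "wtdesire")
mdata <- subset(cdc_small, gender == "m", select = vars)

# sapply returns a 2 x 4 matrix (Mean/SD by variable); t() puts variables in rows
male_tab <- t(sapply(mdata, function(x) c(Mean = mean(x), SD = sd(x))))
male_tab
```

Replacing `cdc_small` with the full `cdc` data frame should give the table for all male participants.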
Just make a data frame of colMeans and column sd. Note, that you may also select columns.
fdata <- subset(cdc, gender == "f", select=c("age", "height", "weight", "wtdesire"))
data.frame(mean=colMeans(fdata), sd=apply(fdata, 2, sd))
# mean sd
# age 45.79772 17.584420
# height 64.36775 2.787304
# weight 151.66619 34.297519
# wtdesire 133.51500 18.963014
You can also use by to do it simultaneously for both groups, it's basically a combination of split and lapply. (To avoid apply when calculating column SDs, you could also use sd=matrixStats::colSds(as.matrix(fdata)) which is considerably faster.)
res <- by(cdc[c("age", "height", "weight", "wtdesire")], cdc$gender, \(x) {
data.frame(mean=colMeans(x), sd=matrixStats::colSds(as.matrix(x)))
})
res
# cdc$gender: m
# mean sd
# age 44.27307 16.719940
# height 70.25165 3.009219
# weight 189.32271 36.550355
# wtdesire 178.61657 26.251215
# ------------------------------------------------------------------------------------------
# cdc$gender: f
# mean sd
# age 45.79772 17.584420
# height 64.36775 2.787304
# weight 151.66619 34.297519
# wtdesire 133.51500 18.963014
To extract only one of the data frames in the list-like object use e.g. res$m.
Usually we use aggregate for this, which you also might consider:
aggregate(cbind(age, height, weight, wtdesire) ~ gender, cdc, \(x) c(mean=mean(x), sd=sd(x))) |>
do.call(what=data.frame)
# gender age.mean age.sd height.mean height.sd weight.mean weight.sd wtdesire.mean wtdesire.sd
# 1 m 44.27307 16.71994 70.251646 3.009219 189.32271 36.55036 178.61657 26.25121
# 2 f 45.79772 17.58442 64.367750 2.787304 151.66619 34.29752 133.51500 18.96301
The pipe into do.call(what=data.frame) is only needed to get rid of matrix columns, which is useful if you aim to process the data further.
Note: R >= 4.1 used.
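To see what the matrix columns look like before and after flattening, here is a small sketch on a six-row stand-in for cdc (the `cdc_small`, `agg` and `flat` names are illustrative):

```r
# Six-row stand-in with two of the cdc variables
cdc_small <- data.frame(
  gender = factor(c("m", "f", "m", "f", "f", "m"), levels = c("m", "f")),
  age    = c(73L, 23L, 35L, 57L, 81L, 83L),
  height = c(69, 66, 73, 65, 67, 69)
)

agg <- aggregate(cbind(age, height) ~ gender, cdc_small,
                 function(x) c(mean = mean(x), sd = sd(x)))

ncol(agg)           # gender plus one matrix column per variable
is.matrix(agg$age)  # each summary column is itself a two-column matrix

# Flatten the matrix columns into ordinary columns
flat <- do.call(data.frame, agg)
names(flat)
```

After flattening, each statistic is an ordinary column named variable.statistic, which is easier to pass on to other functions.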
Data:
source('https://www.openintro.org/stat/data/cdc.R')
or
cdc <- structure(list(genhlth = structure(c(3L, 3L, 1L, 5L, 3L, 3L), levels = c("excellent",
"very good", "good", "fair", "poor"), class = "factor"), exerany = c(0,
1, 0, 0, 1, 1), hlthplan = c(1, 1, 1, 1, 1, 1), smoke100 = c(1,
0, 0, 0, 0, 1), height = c(69, 66, 73, 65, 67, 69), weight = c(224L,
215L, 200L, 216L, 165L, 170L), wtdesire = c(224L, 140L, 185L,
150L, 165L, 165L), age = c(73L, 23L, 35L, 57L, 81L, 83L), gender = structure(c(1L,
2L, 1L, 2L, 2L, 1L), levels = c("m", "f"), class = "factor")), row.names = c("19995",
"19996", "19997", "19998", "19999", "20000"), class = "data.frame")

r transfer values from one dataset to another by ID

I have two datasets , the first dataset is like this
ID Weight State
1 12.34 NA
2 11.23 IA
2 13.12 IN
3 12.67 MA
4 10.89 NA
5 14.12 NA
The second dataset is a lookup table for state values by ID
ID State
1 WY
2 IA
3 MA
4 OR
4 CA
5 FL
As you can see there are two different state values for ID 4, which is normal.
What I want to do is replace the NAs in dataset1 State column with State values from dataset 2. Expected dataset
ID Weight State
1 12.34 WY
2 11.23 IA
2 13.12 IN
3 12.67 MA
4 10.89 OR,CA
5 14.12 FL
Since ID 4 has two state values in dataset2 , these two values are collapsed and separated by , and used to replace the NA in dataset1. Any suggestion on accomplishing this is much appreciated. Thanks in advance.
Collapse df2 value and join it with df1 by 'ID'. Use coalesce to use non-NA value from the two state columns.
library(dplyr)
df1 %>%
left_join(df2 %>%
group_by(ID) %>%
summarise(State = toString(State)), by = 'ID') %>%
mutate(State = coalesce(State.x, State.y)) %>%
select(-State.x, -State.y)
# ID Weight State
#1 1 12.3 WY
#2 2 11.2 IA
#3 2 13.1 IN
#4 3 12.7 MA
#5 4 10.9 OR, CA
#6 5 14.1 FL
In base R with merge and transform.
merge(df1, aggregate(State~ID, df2, toString), by = 'ID') |>
transform(State = ifelse(is.na(State.x), State.y, State.x))
Tidyverse way:
library(tidyverse)
df1 %>%
left_join(df2 %>%
group_by(ID) %>%
summarise(State = toString(State)) %>%
ungroup(), by = 'ID') %>%
transmute(ID, Weight, State = coalesce(State.x, State.y))
Base R alternative:
na_idx <- which(is.na(df1$State))
df1$State[na_idx] <- with(
aggregate(State ~ ID, df2, toString),
State[match(df1$ID, ID)]
)[na_idx]
Data:
df1 <- structure(list(ID = c(1L, 2L, 2L, 3L, 4L, 5L), Weight = c(12.34,
11.23, 13.12, 12.67, 10.89, 14.12), State = c(NA, "IA", "IN",
"MA", NA, NA)), row.names = c(NA, -6L), class = "data.frame")
df2 <- structure(list(ID = c(1L, 2L, 3L, 4L, 4L, 5L), State = c("WY",
"IA", "MA", "OR", "CA", "FL")), class = "data.frame", row.names = c(NA,
-6L))
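The answers above use toString, which separates values with a comma and a space ("OR, CA"). If the result must match the expected output exactly ("OR,CA"), a sketch with paste(collapse = ",") is shown below. The `lookup` and `filled` names and the `_lookup` suffix are illustrative choices.

```r
library(dplyr)

df1 <- data.frame(
  ID     = c(1L, 2L, 2L, 3L, 4L, 5L),
  Weight = c(12.34, 11.23, 13.12, 12.67, 10.89, 14.12),
  State  = c(NA, "IA", "IN", "MA", NA, NA)
)
df2 <- data.frame(
  ID    = c(1L, 2L, 3L, 4L, 4L, 5L),
  State = c("WY", "IA", "MA", "OR", "CA", "FL")
)

# One row per ID, with multiple states collapsed without a space
lookup <- df2 %>%
  group_by(ID) %>%
  summarise(State = paste(State, collapse = ","))

filled <- df1 %>%
  left_join(lookup, by = "ID", suffix = c("", "_lookup")) %>%
  # keep the existing State where present, otherwise take the lookup value
  mutate(State = coalesce(State, State_lookup), State_lookup = NULL)

filled
```

Existing non-NA values are left alone, so ID 2 keeps both "IA" and "IN" rather than being overwritten by the lookup.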

Calculating a rate of change between min and max years per subgroup

I am relatively new to R and sorry if the question was already asked but I obviously either can't understand the answers or can't find the right key words!
Here is my problem : I have a dataset that looks like that:
Name Year Corg
1 Bois 17 2001 1.7
2 Bois 17 2007 2.1
3 Bois 17 2014 1.9
4 8-Toume 2000 1.7
5 8-Toume 2015 1.4
6 7-Richelien 2 2004 1.1
7 7-Richelien 2 2017 1.5
8 7-Richelien 2 2019 1.2
9 Communaux 2003 1.4
10 Communaux 2016 3.8
11 Communaux 2019 2.4
12 Cocandes 2000 1.7
13 Cocandes 2014 2.1
As you can see, I sometimes have two or three rows of results per Name (in theory there could be 4, 5 or more).
For each Name, I would like to calculate the annual Corg rate of change between the lowest year and the highest year.
More specifically, I would like to compute:
(Corg_of_highest_year/Corg_of_lowest_year)^(1/(highest_year-lowest_year))-1
Could you explain how you would obtain a summary dataset that looks like this:
Name Length_in_years Corg_rate
Bois 17 13 0.9%
8-Toume 15 -1.3%
etc.
We can do the calculation using group_by in dplyr
library(dplyr)
df %>%
group_by(Name) %>%
summarise(Length = diff(range(Year)),
Corg_rate = ((Corg[which.max(Year)]/Corg[which.min(Year)]) ^
(1/Length) - 1) * 100)
# A tibble: 5 x 3
# Name Length Corg_rate
# <fct> <int> <dbl>
#1 7-Richelien2 15 0.582
#2 8-Toume 15 -1.29
#3 Bois17 13 0.859
#4 Cocandes 14 1.52
#5 Communaux 16 3.43
To perform the analysis with most recent year and the year with minimum 5 years of difference
df %>%
group_by(Name) %>%
summarise(Length = max(Year) - max(Year[Year <= max(Year) - 5]),
Corg_rate = (Corg[which.max(Year)]/Corg[Year == max(Year[Year <= (max(Year) - 5)])]) ^ (1/Length) - 1,
Corg_rate = Corg_rate * 100)
# Name Length Corg_rate
# <fct> <int> <dbl>
#1 7-Richelien2 15 0.582
#2 8-Toume 15 -1.29
#3 Bois17 7 -1.42
#4 Cocandes 14 1.52
#5 Communaux 16 3.43
data
df <- structure(list(Name = structure(c(3L, 3L, 3L, 2L, 2L, 1L, 1L,
1L, 5L, 5L, 5L, 4L, 4L), .Label = c("7-Richelien2", "8-Toume",
"Bois17", "Cocandes", "Communaux"), class = "factor"), Year = c(2001L,
2007L, 2014L, 2000L, 2015L, 2004L, 2017L, 2019L, 2003L, 2016L,
2019L, 2000L, 2014L), Corg = c(1.7, 2.1, 1.9, 1.7, 1.4, 1.1,
1.5, 1.2, 1.4, 3.8, 2.4, 1.7, 2.1)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13"))
By first creating an indicator of whether the year is the max or the min within each Name group, and then spreading the Corg column into MAX_corg (Corg of the max year) and MIN_corg (Corg of the min year), we can easily calculate the rate of change. The exponent uses the number of years between the two measurements.
library(dplyr)
library(tidyr)
my_df %>%
group_by(Name) %>%
mutate(
Length = max(Year) - min(Year), # years between first and last measurement
year_max_min = ifelse(Year == max(Year), "MAX_corg", # new column denoting the max and min
ifelse(Year == min(Year), "MIN_corg",
NA
)
)
) %>%
filter(!(is.na(year_max_min))) %>% # removing the in-between years
ungroup() %>%
select(Name, Length, year_max_min, Corg) %>% # keep only what the spread needs
spread(year_max_min, Corg) %>% # spread the indicator into two columns: MAX_corg and MIN_corg
mutate(
rate_of_change = (MAX_corg / MIN_corg)^(1 / Length) - 1 # annual rate of change
)
Use dplyr group_by(Name) and then calculate your value. Here is an example
library(dplyr)
data %>%
group_by(Name) %>%
summarise(Length = max(Year)-min(Year), Corg_End = sum(Corg[Year==max(Year)]), Corg_Start = sum(Corg[Year==min(Year)]))
This shows you the logic of grouping, i.e. after group_by(Name), max(Year) gives the highest year per Name instead of overall. With this logic, calculating the change rate should be easy, but I won't attempt it for lack of reproducible data.
Here is a solution using data.table:
library(data.table)
df = data.table(df)
mat = df[, .(
Rate = 100*((Corg[which.max(Year)] / Corg[which.min(Year)])^(1/diff(range(Year))) - 1)
), by = Name]
> mat
Name Rate
1: Bois17 0.8592524
2: 8-Toume -1.2860324
3: 7-Richelien2 0.5817615
4: Communaux 3.4261123
5: Cocandes 1.5207989
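As a check on the numbers above, the Bois17 rate can be worked by hand from the question's data (Corg of 1.7 in 2001 and 1.9 in 2014). The variable names here are illustrative.

```r
corg_first <- 1.7          # Corg in the lowest year (2001)
corg_last  <- 1.9          # Corg in the highest year (2014)
years      <- 2014 - 2001  # 13 years between the two measurements

# Annual (compound) rate of change, in percent
rate_pct <- ((corg_last / corg_first)^(1 / years) - 1) * 100
round(rate_pct, 3)
```

This reproduces the Bois17 value in the outputs above (about 0.859 percent per year), which is the 0.9% the question expects.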

Creating a time series plot and converting numeric data to date

I want to create a plot of temperature over time at 2 sites. I have temperature data recorded every 10 minutes from February to April, and I need daily cycles of hourly temperature averages to plot.
I calculated the mean temperature for each hour of each day and tried to create a plot with geom_point and geom_line in different ways.
data <- read.xlsx("temperatura.xlsx", 1)
data <- data %>% mutate(month = as.factor(month), day = as.factor(day), h = as.factor(h), min = as.factor(min))
head (data)
month day h min t.site1 t.site2
2 1 0 0 15.485 16.773
2 1 0 10 15.509 16.773
2 1 0 20 15.557 16.773
2 1 0 30 15.557 16.773
2 1 0 40 15.605 16.773
2 1 0 50 15.605 16.773
str(data)
'data.frame': 12816 obs. of 6 variables:
$ month : Factor w/ 3 levels "2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
$ day : Factor w/ 31 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ h : Factor w/ 24 levels "0","1","2","3",..: 1 1 1 1 1 1 2 2 2 2 ...
$ min : Factor w/ 6 levels "0","10","20",..: 1 2 3 4 5 6 1 2 3 4 ...
$ t.site1: num 15.5 15.5 15.6 15.6 15.6 ...
$ t.site2: num 16.8 16.8 16.8 16.8 16.8 ...
hour <- group_by(data, month, day, h)
mean.h.site1 <- summarize(hour, mean.h.site1 = mean(t.site1))
t1 <- ggplot (data = mean.h.site1, aes(x=h, y=mean.h.site1)) +
geom_line()
t2 <- ggplot(data = mean.h.site1, aes(x=h, y=mean.h.site1, group = month))+
geom_line() +
geom_point()
t3 <- ggplot (data = mean.h.site1, aes(x=day, y=mean.h.site1, group=1))+
geom_point()
I expected the output to show the variability of temperature over time for each site, but the actual output shows temperature variability during each day.
It's interesting that your data is showing month, day and hour as factor. Is it possible that there are some character values somewhere in that column when you read the data? It's very unusual to see numbers stored as factor in that fashion.
I'll do 4 things:
Convert factors to numbers
Convert numbers to dates
Convert a wide table to a long one, and finally
plot the temps against a real date
# Load packages and data
library(data.table) # for overall fast data processing
library(lubridate) # for dates wrangling
library(ggplot2) # plotting
dt <- fread("month day h min t.site1 t.site2
2 1 0 0 15.485 16.773
2 1 0 10 15.509 16.773
2 1 0 20 15.557 16.773
2 1 0 30 15.557 16.773
2 1 0 40 15.605 16.773
2 1 0 50 15.605 16.773")
# Convert factors to numbers (I actually didn't run this because I just created the data.table, but it seems you'll need to do it):
dt[, names(dt)[1:4] := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols = 1:4]
# Create proper dates. We'll consider all dates occurring in 2019.
dt[, date := ymd_hm(paste0("2019/", month, "/", day, " ", h, ":", min))]
# convert wide data to long one
dt2 <- melt(dt[, .(date, t.site1, t.site2)], id.vars = "date")
# plot the data
ggplot(dt2, aes(x = date, y = value, color = variable))+geom_point()+geom_path()
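The factor-to-number step goes through as.character for a reason: calling as.numeric directly on a factor returns the internal level codes, not the values. A small illustration with a made-up factor:

```r
# A factor whose labels look like numbers (levels sort as "0", "10", "20")
f <- factor(c("10", "20", "0"))

codes  <- as.numeric(f)                # level codes, not the values
values <- as.numeric(as.character(f))  # the numbers the labels represent

codes
values
```

Here `codes` holds the positions of each label in the sorted level set, while `values` holds 10, 20 and 0.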
You could paste the time columns together and convert them as.POSIXct.
As @PavoDive already pointed out, we'll need numeric time columns. Check the code that produced the data, or transform to numeric with d[1:4] <- Map(function(x) as.numeric(as.character(x)), d[1:4]).
Now paste the rows with apply, convert with as.POSIXct, and cbind the result to the remaining columns. The sprintf step first zero-pads every value to two digits so that the pasted string has a fixed width.
d2 <- cbind(time=as.POSIXct(apply(sapply(d[1:4], sprintf, fmt="%02d"), 1, paste, collapse=""),
format="%m%d%H%M"),
d[5:6])
Plots nicely, here in base R:
with(d2, plot(time, t.site1, ylim=c(15, 17), xaxt="n",
xlab="time", ylab="value", type="b", col="red",
main="Time series"))
with(d2, lines(time, t.site2, type="b", col="green"))
mtext(strftime(d2$time, "%H:%M"), 1, 1, at=d2$time) # strftime gives the desired formatting
legend("bottomright", names(d2)[2:3], col=c("red", "green"), lty=rep(1, 2))
Data
d <- structure(list(month = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "2", class = "factor"),
day = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "1", class = "factor"),
h = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "0", class = "factor"),
min = structure(1:6, .Label = c("0", "10", "20", "30", "40",
"50"), class = "factor"), t.site1 = c(15.485, 15.509, 15.557,
15.557, 15.605, 15.605), t.site2 = c(16.773, 16.773, 16.773,
16.773, 16.773, 16.773)), row.names = c(NA, -6L), class = "data.frame")
I'm assuming that you needed the actual output showing temperature variability by hour for each day in the same plot?
EDITED:
I have updated the code to generate a day's worth of data and to generate the chart.
library(tidyverse)
library(lubridate)
df <- tibble(month = rep(2, 144),
day = rep(1, 144),
h = rep(0:23, each = 6),
min = rep((0:5)*10, 24),
t.site1 = rnorm(n = 144, mean = 15.501, sd = 0.552),
t.site2 = rnorm(n = 144, mean = 16.501, sd = 0.532))
df %>%
group_by(month, day, h) %>%
summarise(mean_t_site1 = mean(t.site1), mean_t_site2 = mean(t.site2)) %>%
mutate(date = ymd_h(paste0("2019-",month,"-",day," ",h))) %>%
ungroup() %>%
select(mean_t_site1:date) %>%
gather(key = "site", value = "mean_temperature", -date) %>%
ggplot(aes(x = date, y = mean_temperature, colour = site)) +
geom_line()
Could you verify if this is the output you need?
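If "daily cycles of hourly averages" means one average per hour of the day pooled across all days, a sketch could group by hour only. The data below is a deterministic stand-in (two days, temperature rising with the hour), and `demo` and `hourly_cycle` are illustrative names; with the real data the columns would need to be numeric first.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in data: two days of 10-minute readings at two sites
demo <- expand.grid(min = seq(0, 50, by = 10), h = 0:23, day = 1:2)
demo$month   <- 2
demo$t.site1 <- 15 + demo$h / 10
demo$t.site2 <- 16 + demo$h / 10

# Average every reading that falls in the same hour of the day, across days
hourly_cycle <- demo %>%
  group_by(h) %>%
  summarise(across(c(t.site1, t.site2), mean), .groups = "drop") %>%
  pivot_longer(-h, names_to = "site", values_to = "mean_temp")

# One line per site over the 24-hour cycle
p <- ggplot(hourly_cycle, aes(x = h, y = mean_temp, colour = site)) +
  geom_line() +
  labs(x = "hour of day", y = "mean temperature")
p
```

Adding month to group_by() and faceting by it would give one daily cycle per month instead of a single pooled cycle.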
