R: add a column for the day of the highest value

I have the following code to select days (24 h) on which both the maximum and the minimum temperature are high (at or above their respective 90th percentiles). The code also calculates the length of each event and the highest mean temperature recorded during it.
library(data.table)
setDT(df)
df[, hotday := +(MAX >= quantile(MAX, .90, na.rm = TRUE, type = 6) &
                 MIN >= quantile(MIN, .90, na.rm = TRUE, type = 6))
   ][, hw.length := with(rle(hotday), rep(lengths, lengths))
   ][hotday == 0, hw.length := 0
   ][!!hotday, Highest_Mean := max(MEAN), rleid(hw.length)][]
The result of the code looks like this:
> head(df)
YEAR MONTH DAY Date MEAN MAX MIN D hotday hw.length Highest_Mean
1: 1991 5 14 5/14/1991 32.2 41.0 23.6 17.4 1 3 34.9
2: 1991 5 15 5/15/1991 34.9 43.3 26.0 17.3 1 3 34.9
3: 1991 5 16 5/16/1991 31.4 39.2 23.6 15.6 1 3 34.9
4: 1994 5 27 5/27/1994 30.7 41.0 23.0 18.0 1 2 30.7
5: 1994 5 28 5/28/1994 30.6 39.4 23.4 16.0 1 2 30.7
The first event lasted 3 days and its highest mean temperature was 34.9, but the code does not tell on which day of the event (first, second or third) that value was recorded.
How can I add a column that gives this information together with the event length (a single, non-duplicated value per event)? Something like this:
YEAR MONTH DAY Date MEAN MAX MIN D hotday hw.length Highest_Mean mean.day.max.length
1: 1991 5 14 5/14/1991 32.2 41.0 23.6 17.4 1 3 34.9
2: 1991 5 15 5/15/1991 34.9 43.3 26.0 17.3 1 3 34.9 2-3
3: 1991 5 16 5/16/1991 31.4 39.2 23.6 15.6 1 3 34.9

You would be better off adding a unique identifying code for each heat-wave event and indexing by that, but this solution will work (as long as no two events have exactly the same length and highest mean temperature):
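# rows where MEAN equals the event's highest mean (NA rows are dropped by which())
hottest_day <- which(df$MEAN == df$Highest_Mean)
df$mean.day.max.length <- ""
for (i in hottest_day) {
  # all rows of the event containing row i, matched by length and highest mean
  event_rows <- df[which(df$hw.length == df$hw.length[i] & df$Highest_Mean == df$Highest_Mean[i]), ]
  # position of the hottest day within the event, then "-", then the event length
  df$mean.day.max.length[i] <- paste0(which(event_rows$MEAN == df$Highest_Mean[i]), "-", df$hw.length[i])
}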
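Following the first suggestion in that answer, here is a minimal base R sketch that gives every run of hot or non-hot days its own id, so the lookup no longer relies on event lengths and maxima being unique. It assumes one row per day in chronological order. The toy data and the helper columns `event`, `day.in.event` and `peak.day` are hypothetical; only `MEAN`, `hotday`, `hw.length`, `Highest_Mean` and `mean.day.max.length` come from the question.

```r
# Toy data (hypothetical): daily MEAN plus the 0/1 hotday flag from the question's code
df <- data.frame(
  MEAN   = c(25.0, 32.2, 34.9, 31.4, 26.0, 30.7, 30.6, 24.0),
  hotday = c(0, 1, 1, 1, 0, 1, 1, 0)
)

# A new event id starts every time hotday switches between 0 and 1
df$event <- cumsum(c(1, diff(df$hotday) != 0))

# Event length on hot days, 0 on other days (same meaning as hw.length)
df$hw.length <- ave(df$hotday, df$event, FUN = length) * df$hotday

# Position of each row inside its event: 1, 2, 3, ...
df$day.in.event <- ave(seq_len(nrow(df)), df$event, FUN = seq_along)

# Position of the first maximum of MEAN inside each event (which.max keeps one day on ties)
df$peak.day <- ave(df$MEAN, df$event, FUN = which.max)

# Highest mean per hot event, NA elsewhere
df$Highest_Mean <- ifelse(df$hotday == 1, ave(df$MEAN, df$event, FUN = max), NA)

# Label only the hottest day, e.g. "2-3" = day 2 of a 3-day event
df$mean.day.max.length <- ifelse(
  df$hotday == 1 & df$day.in.event == df$peak.day,
  paste0(df$peak.day, "-", df$hw.length),
  ""
)
df
```

Because the event id comes from the hot/non-hot switches themselves, two separate events with the same length and the same highest mean still get different ids.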


Problem with return calculations using "tq_mutate" function in R

I am trying to calculate stock returns over different time periods for a very large dataset.
I noticed some inconsistencies between the tq_mutate calculations and my own checks:
library(tidyquant)
A_stock_prices <- tq_get("A",
get = "stock.prices",
from = "2000-01-01",
to = "2004-12-31")
print(A_stock_prices[A_stock_prices$date>"2000-12-31",])
# A tibble: 1,003 x 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 2001-01-02 38.5 38.5 35.1 36.4 2261684 **31.0**
2 A 2001-01-03 35.1 40.4 34.0 40.1 4502678 34.2
3 A 2001-01-04 40.7 42.7 39.6 41.7 4398388 35.4
4 A 2001-01-05 41.0 41.7 38.3 39.4 3277052 33.5
5 A 2001-01-08 38.8 39.9 37.4 38.1 2273288 32.4
6 A 2001-01-09 38.3 39.3 37.1 37.9 2474180 32.3
...
1 A 2001-12-21 19.7 20.2 19.7 20.0 3732520 17.0
2 A 2001-12-24 20.4 20.5 20.1 20.4 1246177 17.3
3 A 2001-12-26 20.5 20.7 20.1 20.1 2467051 17.1
4 A 2001-12-27 20.0 20.7 20.0 20.6 1909948 17.5
5 A 2001-12-28 20.7 20.9 20.4 20.7 1600430 17.6
6 A 2001-12-31 20.5 20.8 20.4 20.4 2142016 **17.3**
A_stock_prices %>%
  tq_transmute(select = adjusted,
               mutate_fun = periodReturn,
               period = "yearly") %>%
  ungroup()
# A tibble: 5 x 2
date yearly.returns
<date> <dbl>
1 2000-12-29 -0.240
2 2001-12-31 -0.479
3 2002-12-31 -0.370
4 2003-12-31 0.628
5 2004-12-30 -0.176
According to this calculation, the yearly return for 2001 is -0.479.
But when I calculate the yearly return myself (the adjusted price at the end of the period divided by the adjusted price at the beginning of the period, minus 1), I get a different result:
A_stock_prices[A_stock_prices$date=="2001-12-31",]$adjusted/
A_stock_prices[A_stock_prices$date=="2001-01-02",]$adjusted-1
# -0.439
Same issue persists with other time periods (e.g., monthly or weekly calculations).
What am I missing?
Update: The very strange thing is that if I change the time in the tq_get, to 2001:
A_stock_prices <- tq_get("A",
get = "stock.prices",
from = "2001-01-01",
to = "2004-01-01")
I get the correct result for 2001 (but not for the other years).
Not sure how your dataset is built but what's the first date for the 2001 group? Your manual attempt has it as January 2nd, 2001. If there's data present for January 1st, what's that result?
If that's not it, I'd recommend posting your data, just so we can see how it's structured.
Eventually I figured it out:
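The return is calculated from the "day before" the requested period. tq_get() only downloads the prices; periodReturn() (called here through tq_transmute()) measures each period's return from the last price of the previous period.
I.e., the yearly return for 2022 is calculated from 31/12/2021 to 31/12/2022 (rather than from 01/01/2022).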
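To make the difference concrete, here is a small base R sketch with made-up prices (not real data for ticker A). It contrasts a previous-period-close convention, which is how I understand periodReturn() to compute period returns (with the first period, having no predecessor, falling back to its own first price), against the first-to-last-price check from the question. All variable names are hypothetical.

```r
# Made-up adjusted prices: first and last trading day of two years
prices <- data.frame(
  date     = as.Date(c("2000-01-03", "2000-12-29", "2001-01-02", "2001-12-31")),
  adjusted = c(40, 32, 31, 17.3)
)
yr <- format(prices$date, "%Y")

# First and last adjusted price of each calendar year (data assumed sorted by date)
first_price <- sapply(split(prices$adjusted, yr), function(p) p[1])
last_price  <- sapply(split(prices$adjusted, yr), function(p) p[length(p)])

# Convention 1: base = last price of the previous year;
# the first year has no previous year, so it uses its own first price
base <- c(first_price[1], head(last_price, -1))
ret_prev_close <- last_price / base - 1

# Convention 2 (the manual check in the question): first to last price within the year
ret_within_year <- last_price / first_price - 1

round(rbind(ret_prev_close, ret_within_year), 3)
```

Both conventions agree for the first year in the data and differ afterwards, which matches the observation that starting the download in 2001 "fixes" 2001 but not the later years.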

Loop to sum weekly rolling average

I am new to coding. I have a data set of daily stream flow averages over 20 years. Following is an example:
DATE FLOW
1 10/1/2001 88.2
2 10/2/2001 77.6
3 10/3/2001 68.4
4 10/4/2001 61.5
5 10/5/2001 55.3
6 10/6/2001 52.5
7 10/7/2001 49.7
8 10/8/2001 46.7
9 10/9/2001 43.3
10 10/10/2001 41.3
11 10/11/2001 39.3
12 10/12/2001 37.7
13 10/13/2001 35.8
14 10/14/2001 34.1
15 10/15/2001 39.8
I need to create a loop summing the previous 6 days as well as the current day (rolling weekly average), and print it to an array for the designated water year. I have already created an aggregate function to separate yearly average daily means into their designated water years.
# Separating dates into specific water years
wtr_yr <- function(dates, start_month = 9) {
  # Convert dates into POSIXlt
  POSIDATE <- as.POSIXlt(dates)
  # Year offset (POSIXlt months are 0-based)
  offset <- ifelse(POSIDATE$mon >= start_month - 1, 1, 0)
  # Water year
  POSIDATE$year + 1900 + offset
}
# Aggregating by water year to take the mean
# (DATE is assumed to be stored as month/day/year text, as in the sample above)
adj.year <- wtr_yr(as.Date(data_set$DATE, format = "%m/%d/%Y"))
mean.FLOW <- aggregate(data_set$FLOW, list(adj.year), mean)
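For the rolling part of the question, a loop is not strictly needed. This base R sketch uses the 15 sample days above and computes a trailing 7-day sum and mean (the current day plus the previous 6) with stats::filter(), then checks the sums against an explicit loop. The column names roll7.sum and roll7.mean are hypothetical. stats::filter() is called with its namespace because dplyr's filter() masks it when dplyr is loaded.

```r
# The 15 sample days from the question
flow <- data.frame(
  DATE = seq(as.Date("2001-10-01"), by = "day", length.out = 15),
  FLOW = c(88.2, 77.6, 68.4, 61.5, 55.3, 52.5, 49.7, 46.7,
           43.3, 41.3, 39.3, 37.7, 35.8, 34.1, 39.8)
)

# Trailing 7-day window: sides = 1 uses the current value and the 6 before it;
# the first 6 days have no full window and become NA
flow$roll7.sum  <- as.numeric(stats::filter(flow$FLOW, rep(1, 7), sides = 1))
flow$roll7.mean <- flow$roll7.sum / 7

# The same sums with an explicit loop, for comparison
roll_loop <- rep(NA_real_, nrow(flow))
for (i in 7:nrow(flow)) {
  roll_loop[i] <- sum(flow$FLOW[(i - 6):i])
}
flow
```

If the window should restart at each water year, the same filter could be applied per group, for example with ave() over the water-year vector.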
It seems that it can be done much more easily.
But first I need to prepare a bit more data.
library(tidyverse)
library(lubridate)
df = tibble(
DATE = seq(mdy("1/1/2010"), mdy("12/31/2022"), 1),
FLOW = rnorm(length(DATE), 40, 10)
)
output
# A tibble: 4,748 x 2
DATE FLOW
<date> <dbl>
1 2010-01-01 34.4
2 2010-01-02 37.7
3 2010-01-03 55.6
4 2010-01-04 40.7
5 2010-01-05 41.3
6 2010-01-06 57.2
7 2010-01-07 44.6
8 2010-01-08 27.3
9 2010-01-09 33.1
10 2010-01-10 35.5
# ... with 4,738 more rows
Now let's do the aggregation by year and week number
df %>%
group_by(year(DATE), week(DATE)) %>%
summarise(mean = mean(FLOW))
output
# A tibble: 689 x 3
# Groups: year(DATE) [13]
`year(DATE)` `week(DATE)` mean
<dbl> <dbl> <dbl>
1 2010 1 44.5
2 2010 2 39.6
3 2010 3 38.5
4 2010 4 35.3
5 2010 5 44.1
6 2010 6 39.4
7 2010 7 41.3
8 2010 8 43.9
9 2010 9 38.5
10 2010 10 42.4
# ... with 679 more rows
Note that with week(), the first week starts on January 1st. To number weeks according to the ISO 8601 standard, use isoweek() instead; alternatively, epiweek() numbers weeks the way the US CDC does.
df %>%
group_by(year(DATE), isoweek(DATE)) %>%
summarise(mean = mean(FLOW))
output
# A tibble: 681 x 3
# Groups: year(DATE) [13]
`year(DATE)` `isoweek(DATE)` mean
<dbl> <dbl> <dbl>
1 2010 1 40.0
2 2010 2 45.5
3 2010 3 33.2
4 2010 4 38.9
5 2010 5 45.0
6 2010 6 40.7
7 2010 7 38.5
8 2010 8 42.5
9 2010 9 37.1
10 2010 10 42.4
# ... with 671 more rows
To better understand how these functions differ, run the code below:
df %>%
mutate(
w1 = week(DATE),
w2 = isoweek(DATE),
w3 = epiweek(DATE)
)
output
# A tibble: 4,748 x 5
DATE FLOW w1 w2 w3
<date> <dbl> <dbl> <dbl> <dbl>
1 2010-01-01 34.4 1 53 52
2 2010-01-02 37.7 1 53 52
3 2010-01-03 55.6 1 53 1
4 2010-01-04 40.7 1 1 1
5 2010-01-05 41.3 1 1 1
6 2010-01-06 57.2 1 1 1
7 2010-01-07 44.6 1 1 1
8 2010-01-08 27.3 2 1 1
9 2010-01-09 33.1 2 1 1
10 2010-01-10 35.5 2 1 2
# ... with 4,738 more rows
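One caveat about the isoweek() grouping: the output above shows that 2010-01-01 and 2010-01-02 fall in ISO week 53, which belongs to ISO year 2009, so grouping by year(DATE) together with isoweek(DATE) assigns days near New Year to the wrong year's week. lubridate's isoyear() (and epiyear() for epiweek()) gives the matching year. The base R sketch below shows the same effect using the format() codes %G (ISO week-based year) and %V (ISO week number), on hypothetical dates around the 2009/2010 boundary with placeholder values.

```r
# Hypothetical dates spanning the 2009/2010 boundary, with placeholder values 1..10
dates <- seq(as.Date("2009-12-28"), as.Date("2010-01-06"), by = "day")
flow  <- seq_along(dates)

cal_year <- format(dates, "%Y")  # calendar year
iso_year <- format(dates, "%G")  # ISO 8601 week-based year
iso_week <- format(dates, "%V")  # ISO 8601 week number

# Calendar year + ISO week: ISO week 53 of 2009 is split into two groups
mixed <- aggregate(flow, by = list(year = cal_year, week = iso_week), FUN = mean)

# ISO year + ISO week: each ISO week stays in one group
iso <- aggregate(flow, by = list(year = iso_year, week = iso_week), FUN = mean)
mixed
iso
```

With lubridate, the corresponding grouping would be group_by(isoyear(DATE), isoweek(DATE)).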

Time-series data visualization

I have a pretty large data frame in R stored in long form. It contains body temperature data collected from 40 different individuals at 10-second intervals over 16 days. The individuals have been exposed to two conditions (Cond1 and Cond2). It essentially looks like this:
ID Cond1 Cond2 Day ToD Temp
1 A B 1 18.0 37.1
1 A B 1 18.3 37.2
1 A B 2 18.6 37.5
2 B A 1 18.0 37.0
2 B A 1 18.3 36.9
2 B A 2 18.6 36.9
3 A A 1 18.0 36.8
3 A A 1 18.3 36.7
3 A A 2 18.6 36.7
...
I want to create four separate line plots, one for each combination of conditions (AB, BA, AA, BB), showing mean temperature over time (days 1-16).
P.S. ToD stands for time of day; I'm not sure whether I need it to create the plot.
So far I have tried to define the dataset as time series by doing
ts <- ts(data=dataset$Temp, start=1, end=16, frequency=8640)
plot(ts)
This returns a plot of Temp, but I can't figure out how to define condition values for breaking up the data.
Edit:
Essentially I want a line plot of temperature over time, but one for each group separately and using mean Temp values. My example plot shows just one individual in one condition; I want one that shows the mean for all individuals in the same condition.
You can use summarise and group_by to group the data by condition and then plot it. Is this what you're looking for?
library(dplyr)
library(ggplot2)
## I created a data frame df that looks like this:
#   ID Cond1 Cond2 Day  ToD Temp
# 1  1     A     B   1 18.0 37.1
# 2  1     A     B   1 18.3 37.2
# 3  1     A     B   2 18.6 37.5
# 4  2     B     A   1 18.0 37.0
# 5  2     B     A   1 18.3 36.9
# 6  2     B     A   2 18.6 36.9
# 7  3     A     A   1 18.0 36.8
# 8  3     A     A   1 18.3 36.7
# 9  3     A     A   2 18.6 36.7
df$Cond <- paste0(df$Cond1, df$Cond2)
d <- summarise(group_by(df, Cond, Day), t = mean(Temp))
ggplot(d, aes(Day, t, color = Cond)) + geom_line()
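Since the question asks for four separate plots rather than one plot with four colored lines, here is a base R sketch that draws one panel per condition combination. It uses only the sample rows from the question (so only three of the four combinations appear, each with two days) and builds the Cond column as in the answer. With ggplot2, adding + facet_wrap(~ Cond) to the answer's plot is the corresponding approach.

```r
# Sample rows from the question (ToD is not needed for a per-day mean)
df <- data.frame(
  ID    = rep(1:3, each = 3),
  Cond1 = rep(c("A", "B", "A"), each = 3),
  Cond2 = rep(c("B", "A", "A"), each = 3),
  Day   = rep(c(1, 1, 2), times = 3),
  Temp  = c(37.1, 37.2, 37.5, 37.0, 36.9, 36.9, 36.8, 36.7, 36.7)
)
df$Cond <- paste0(df$Cond1, df$Cond2)

# Mean temperature per condition combination and day, across individuals
d <- aggregate(Temp ~ Cond + Day, data = df, FUN = mean)

# One line plot per combination in a 2 x 2 grid, sharing the y range for comparison
old_par <- par(mfrow = c(2, 2))
for (cond in sort(unique(d$Cond))) {
  sub <- d[d$Cond == cond, ]
  sub <- sub[order(sub$Day), ]
  plot(sub$Day, sub$Temp, type = "l", main = cond, ylim = range(d$Temp),
       xlab = "Day", ylab = "Mean Temp")
}
par(old_par)
```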

How to subset consecutive rows if they meet a condition

I am using R to analyze a number of time series (1951-2013) containing daily values of Max and Min temperatures. The data has the following structure:
YEAR MONTH DAY MAX MIN
1985 1 1 22.8 9.4
1985 1 2 28.6 11.7
1985 1 3 24.7 12.2
1985 1 4 17.2 8.0
1985 1 5 17.9 7.6
1985 1 6 17.7 8.1
I need to find the frequency of heat waves based on this definition: A period of three or more consecutive days with a daily maximum and minimum temperature exceeding the 90th percentile of the maximum and minimum temperatures for all days in the studied period.
Basically, I want to subset those consecutive days (three or more) when the Max and Min temp exceed a threshold value. The output would be something like this:
YEAR MONTH DAY MAX MIN
1989 7 18 45.0 23.5
1989 7 19 44.2 26.1
1989 7 20 44.7 24.4
1989 7 21 44.6 29.5
1989 7 24 44.4 31.6
1989 7 25 44.2 26.7
1989 7 26 44.5 25.0
1989 7 28 44.8 26.0
1989 7 29 44.8 24.6
1989 8 19 45.0 24.3
1989 8 20 44.8 26.0
1989 8 21 44.4 24.0
1989 8 22 45.2 25.0
I have tried the following to subset my full dataset to just the days that exceed the 90th percentile temperature:
HW<- subset(Mydata, Mydata$MAX >= (quantile(Mydata$MAX,.9)) &
Mydata$MIN >= (quantile(Mydata$MIN,.9)))
However, I got stuck on how to subset only the consecutive days that meet the condition.
An approach with data.table, slightly different from @jlhoward's approach (using the same data):
library(data.table)
setDT(df)
df[, hotday := +(MAX>=44.5 & MIN>=24.5)
][, hw.length := with(rle(hotday), rep(lengths,lengths))
][hotday == 0, hw.length := 0]
This produces a data.table with a heat-wave length variable (hw.length) instead of a TRUE/FALSE variable for a specific heat-wave length:
> df
YEAR MONTH DAY MAX MIN hotday hw.length
1: 1989 7 18 45.0 23.5 0 0
2: 1989 7 19 44.2 26.1 0 0
3: 1989 7 20 44.7 24.4 0 0
4: 1989 7 21 44.6 29.5 1 1
5: 1989 7 22 44.4 31.6 0 0
6: 1989 7 23 44.2 26.7 0 0
7: 1989 7 24 44.5 25.0 1 3
8: 1989 7 25 44.8 26.0 1 3
9: 1989 7 26 44.8 24.6 1 3
10: 1989 7 27 45.0 24.3 0 0
11: 1989 7 28 44.8 26.0 1 1
12: 1989 7 29 44.4 24.0 0 0
13: 1989 7 30 45.2 25.0 1 1
I may be missing something here but I don't see the point of subsetting beforehand. If you have data for every day, in chronological order, you can use run length encoding (see the docs on the rle(...) function).
In this example we create an artificial data set and define "heat wave" as MAX >= 44.5 and MIN >= 24.5. Then:
# example data set
df <- data.frame(YEAR=1989, MONTH=7, DAY=18:30,
MAX=c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
MIN=c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))
r <- with(with(df, rle(MAX>=44.5 & MIN>=24.5)),rep(lengths,lengths))
df$heat.wave <- with(df,MAX>=44.5&MIN>=24.5) & (r>2)
df
# YEAR MONTH DAY MAX MIN heat.wave
# 1 1989 7 18 45.0 23.5 FALSE
# 2 1989 7 19 44.2 26.1 FALSE
# 3 1989 7 20 44.7 24.4 FALSE
# 4 1989 7 21 44.6 29.5 FALSE
# 5 1989 7 22 44.4 31.6 FALSE
# 6 1989 7 23 44.2 26.7 FALSE
# 7 1989 7 24 44.5 25.0 TRUE
# 8 1989 7 25 44.8 26.0 TRUE
# 9 1989 7 26 44.8 24.6 TRUE
# 10 1989 7 27 45.0 24.3 FALSE
# 11 1989 7 28 44.8 26.0 FALSE
# 12 1989 7 29 44.4 24.0 FALSE
# 13 1989 7 30 45.2 25.0 FALSE
This creates a column, heat.wave, which is TRUE if the day is part of a heat wave. If you need to extract only the heat-wave days, use
df[df$heat.wave,]
# YEAR MONTH DAY MAX MIN heat.wave
# 7 1989 7 24 44.5 25.0 TRUE
# 8 1989 7 25 44.8 26.0 TRUE
# 9 1989 7 26 44.8 24.6 TRUE
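The question ultimately asks for the frequency of heat waves. Building on the rle() idea above (and its assumption of one row per day in chronological order), a minimal sketch counts the qualifying runs overall and per YEAR. The names runs, n_heat_waves and waves_per_year are hypothetical, and a run that crosses a year boundary is counted in the year it starts.

```r
# Same toy data as in the answer above
df <- data.frame(YEAR = 1989, MONTH = 7, DAY = 18:30,
  MAX = c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
  MIN = c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))

hot  <- with(df, MAX >= 44.5 & MIN >= 24.5)
runs <- rle(hot)

# Number of heat waves = runs of TRUE lasting 3 or more days
n_heat_waves <- sum(runs$values & runs$lengths >= 3)

# Per-year counts: tag each row with its run, then count qualifying runs by their first day
run_id    <- rep(seq_along(runs$lengths), runs$lengths)
is_wave   <- rep(runs$values & runs$lengths >= 3, runs$lengths)
first_day <- !duplicated(run_id)
waves_per_year <- tapply(is_wave & first_day, df$YEAR, sum)
waves_per_year
```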
Your question really boils down to finding groupings of 3+ consecutive days in your subsetted dataset, removing all remaining data.
Let's consider an example where we would want to keep some rows and remove others:
dat <- data.frame(year = 1989, month=c(6, 7, 7, 7, 7, 7, 8, 8, 8, 10, 10), day=c(12, 11, 12, 13, 14, 21, 5, 6, 7, 12, 13))
dat
# year month day
# 1 1989 6 12
# 2 1989 7 11
# 3 1989 7 12
# 4 1989 7 13
# 5 1989 7 14
# 6 1989 7 21
# 7 1989 8 5
# 8 1989 8 6
# 9 1989 8 7
# 10 1989 10 12
# 11 1989 10 13
I've excluded the temperature data, because I'm assuming we've already subsetted to just the days that exceed the 90th percentile using the code from your question.
In this dataset there is a 4-day heat wave in July and a three-day heat wave in August. The first step would be to convert the data to date objects and compute the number of days between consecutive observations (I assume the data is already ordered by day here):
dates <- as.Date(paste(dat$year, dat$month, dat$day, sep="-"))
(dd <- as.numeric(difftime(tail(dates, -1), head(dates, -1), units="days")))
# [1] 29 1 1 1 7 15 1 1 66 1
We're close, because now we can see the time periods where there were multiple date gaps of 1 day -- these are the ones we want to grab. We can use the rle function to analyze runs of the number 1, keeping only the runs of length 2 or more:
(valid.gap <- with(rle(dd == 1), rep(values & lengths >= 2, lengths)))
# [1] FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
Finally, we can subset the dataset to just the days that were on either side of a 1-day date gap that is part of a heat wave:
dat[c(FALSE, valid.gap) | c(valid.gap, FALSE),]
# year month day
# 2 1989 7 11
# 3 1989 7 12
# 4 1989 7 13
# 5 1989 7 14
# 7 1989 8 5
# 8 1989 8 6
# 9 1989 8 7
A simple approach, not fully vectorized:
# play data
year <- c("1960")
month <- c(rep(1,30), rep(2,30), rep(3,30))
day <- rep(1:30,3)
maxT <- round(runif(90, 20, 22),1)
minT <- round(runif(90, 10, 12),1)
df <- data.frame(year, month, day, maxT, minT)
# target and tricky data...
df[1:3, 4] <- 30
df[1:4, 5] <- 14
df[10:13, 4] <- 30
df[10:11, 5] <- 14
# limits
df$maxTope <- df$maxT - quantile(df$maxT,0.9)
df$minTope <- df$minT - quantile(df$minT,0.9)
# define heat day
df$heat <- ifelse(df$maxTope > 0 & df$minTope >0, 1, 0)
# count consecutive heat days (running length of the current streak)
df$count <- 0
df$count[1] <- ifelse(df$heat[1] == 1, 1, 0)
for (i in 2:nrow(df)) {
  df$count[i] <- ifelse(df$heat[i] == 1, df$count[i - 1] + 1, 0)
}
# select heat-wave days from the 3rd consecutive day onward ($count shows the streak length so far)
df[which(df$count >= 3), ]
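The loop above computes a running streak length. The same numbers can be obtained without a loop, a swapped-in base R technique: every 0 starts a new block (cumsum(heat == 0)), and a cumulative sum of the 0/1 flags within each block gives the streak. Sketched here on a hypothetical heat vector and checked against the loop:

```r
# Hypothetical 0/1 heat-day flags
heat <- c(1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1)

# Loop version, as in the answer above
count_loop <- numeric(length(heat))
count_loop[1] <- heat[1]
for (i in 2:length(heat)) {
  count_loop[i] <- if (heat[i] == 1) count_loop[i - 1] + 1 else 0
}

# Vectorized version: each 0 opens a new block; cumulative sum within each block
count_vec <- ave(heat, cumsum(heat == 0), FUN = cumsum)
count_vec
```

With the answer's data this would read df$count <- ave(df$heat, cumsum(df$heat == 0), FUN = cumsum).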
Here's a quick little solution:
is_High_Temp <- Mydata$MAX >= quantile(Mydata$MAX, .9) &
                Mydata$MIN >= quantile(Mydata$MIN, .9)
# the tricky part: a new series starts on day 1 and wherever the value changes from the previous day
start_of_a_series <- c(TRUE, is_High_Temp[-1] != is_High_Temp[-length(is_High_Temp)])
series_number <- cumsum(start_of_a_series)
series_length <- ave(series_number, series_number, FUN = length)
is_heat_wave <- series_length >= 3 & is_High_Temp
A solution with dplyr, also using rle():
library(dplyr)
cond <- expr(MAX >= 44.5 & MIN >= 24.5)
df %>%
mutate(heatwave =
rep(rle(!!cond)$values & rle(!!cond)$lengths >= 3,
rle(!!cond)$lengths)) %>%
filter(heatwave)
#> YEAR MONTH DAY MAX MIN heatwave
#> 1 1989 7 24 44.5 25.0 TRUE
#> 2 1989 7 25 44.8 26.0 TRUE
#> 3 1989 7 26 44.8 24.6 TRUE
Created on 2020-05-16 by the reprex package (v0.3.0)
data
#devtools::install_github("alistaire47/read.so")
df <- read.so::read.so("YEAR MONTH DAY MAX MIN
1989 7 18 45.0 23.5
1989 7 19 44.2 26.1
1989 7 20 44.7 24.4
1989 7 21 44.6 29.5
1989 7 24 44.4 31.6
1989 7 25 44.2 26.7
1989 7 26 44.5 25.0
1989 7 28 44.8 26.0
1989 7 29 44.8 24.6
1989 8 19 45.0 24.3
1989 8 20 44.8 26.0
1989 8 21 44.4 24.0
1989 8 22 45.2 25.0")

Reshaping a data frame with more than one measure variable

I'm using a data frame similar to this one:
df<-data.frame(student=c(rep(1,5),rep(2,5)), month=c(1:5,1:5),
quiz1p1=seq(20,20.9,0.1),quiz1p2=seq(30,30.9,0.1),
quiz2p1=seq(80,80.9,0.1),quiz2p2=seq(90,90.9,0.1))
print(df)
student month quiz1p1 quiz1p2 quiz2p1 quiz2p2
1 1 1 20.0 30.0 80.0 90.0
2 1 2 20.1 30.1 80.1 90.1
3 1 3 20.2 30.2 80.2 90.2
4 1 4 20.3 30.3 80.3 90.3
5 1 5 20.4 30.4 80.4 90.4
6 2 1 20.5 30.5 80.5 90.5
7 2 2 20.6 30.6 80.6 90.6
8 2 3 20.7 30.7 80.7 90.7
9 2 4 20.8 30.8 80.8 90.8
10 2 5 20.9 30.9 80.9 90.9
The data describe grades received by students over five months, in two quizzes divided into two parts each.
I need to get the two quizzes into separate rows, so that each student in each month has two rows (one for each quiz) and two columns (one for each part of the quiz).
When I melt the table:
dfL <- melt.data.frame(df, c("student", "month"))
I get the two parts of each quiz in separate rows too.
dcast(dfL, student + month ~ variable)
of course gets me right back where I started, and I can't find a way to cast the table into the required form.
Is there a way to make the melt command work with something like:
melt.data.frame(df, measure.var1=c("quiz1p1","quiz2p1"),
measure.var2=c("quiz1p2","quiz2p2"))
Here's how you could do this with reshape(), from base R:
df2 <- reshape(df, direction="long",
idvar = 1:2, varying = list(c(3,5), c(4,6)),
v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
## Checking the output
rbind(head(df2, 3), tail(df2, 3))
# student month time p1 p2
# 1.1.quiz1 1 1 quiz1 20.0 30.0
# 1.2.quiz1 1 2 quiz1 20.1 30.1
# 1.3.quiz1 1 3 quiz1 20.2 30.2
# 2.3.quiz2 2 3 quiz2 80.7 90.7
# 2.4.quiz2 2 4 quiz2 80.8 90.8
# 2.5.quiz2 2 5 quiz2 80.9 90.9
You can also use column names (instead of column numbers) for idvar and varying. It's more verbose, but seems like better practice to me:
## The same operation as above, using just column *names*
df2 <- reshape(df, direction="long", idvar=c("student", "month"),
varying = list(c("quiz1p1", "quiz2p1"),
c("quiz1p2", "quiz2p2")),
v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
I think this does what you want:
#Break variable into two columns, one for the quiz and one for the part of the quiz
dfL <- transform(dfL, quiz = substr(variable, 1,5),
part = substr(variable, 6,7))
#Adjust your dcast call:
dcast(dfL, student + month + quiz ~ part)
#-----
student month quiz p1 p2
1 1 1 quiz1 20.0 30.0
2 1 1 quiz2 80.0 90.0
3 1 2 quiz1 20.1 30.1
...
18 2 4 quiz2 80.8 90.8
19 2 5 quiz1 20.9 30.9
20 2 5 quiz2 80.9 90.9
There was a very similar question asked about half a year ago, in which I wrote the following function:
melt.wide = function(data, id.vars, new.names) {
require(reshape2)
require(stringr)
data.melt = melt(data, id.vars=id.vars)
new.vars = data.frame(do.call(
rbind, str_extract_all(data.melt$variable, "[0-9]+")))
names(new.vars) = new.names
cbind(data.melt, new.vars)
}
You can use the function to "melt" your data as follows:
dfL <-melt.wide(df, id.vars=1:2, new.names=c("Quiz", "Part"))
head(dfL)
# student month variable value Quiz Part
# 1 1 1 quiz1p1 20.0 1 1
# 2 1 2 quiz1p1 20.1 1 1
# 3 1 3 quiz1p1 20.2 1 1
# 4 1 4 quiz1p1 20.3 1 1
# 5 1 5 quiz1p1 20.4 1 1
# 6 2 1 quiz1p1 20.5 1 1
tail(dfL)
# student month variable value Quiz Part
# 35 1 5 quiz2p2 90.4 2 2
# 36 2 1 quiz2p2 90.5 2 2
# 37 2 2 quiz2p2 90.6 2 2
# 38 2 3 quiz2p2 90.7 2 2
# 39 2 4 quiz2p2 90.8 2 2
# 40 2 5 quiz2p2 90.9 2 2
Once the data are in this form, you can much more easily use dcast() to get whatever form you desire. For example
head(dcast(dfL, student + month + Quiz ~ Part))
# student month Quiz 1 2
# 1 1 1 1 20.0 30.0
# 2 1 1 2 80.0 90.0
# 3 1 2 1 20.1 30.1
# 4 1 2 2 80.1 90.1
# 5 1 3 1 20.2 30.2
# 6 1 3 2 80.2 90.2
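The core of melt.wide() is pulling the digits out of names such as quiz2p1. If stringr is not available, base R's regmatches() with gregexpr() performs the same extraction. A minimal sketch on the four measure names (if variable is a factor, as melt() returns it, convert it with as.character() first):

```r
variable <- c("quiz1p1", "quiz1p2", "quiz2p1", "quiz2p2")

# Every run of digits in each name, e.g. "quiz2p1" -> c("2", "1")
digits <- regmatches(variable, gregexpr("[0-9]+", variable))

# One row per name, one column per digit run
new.vars <- data.frame(do.call(rbind, digits))
names(new.vars) <- c("Quiz", "Part")
new.vars
```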
