Calculating yearly return from daily return data - r

I have imported daily return data for ADSK via a downloaded Yahoo Finance .csv file:
ADSKcsv <- read.csv("ADSK.csv", header = TRUE)
read.csv() returns a data frame, which I confirmed with:
class(ADSKcsv)
I have selected the two relevant columns I want to work with and tried to take the mean of all daily values for each year, but I do not know how to do this. My attempt:
aggregate(Close ~ Date, ADSKcsv, mean)
The above code yields a mean for each individual date rather than for each year. My objective is to calculate YoY return from this data: first converting the daily data to yearly values, then using the yearly values to calculate year-over-year returns. I'd appreciate any help.

May I suggest an easier approach?
library(tidyquant)

ADSK_yearly_returns_tbl <- tq_get("ADSK") %>%
  tq_transmute(select     = close,
               mutate_fun = periodReturn,
               period     = "yearly")

ADSK_yearly_returns_tbl
If you run the above code, it downloads the historical prices for the symbol of interest (ADSK in this case) and then calculates the yearly return. An added bonus of this workflow is that you can swap in any other symbol without manually downloading and reading in a file. Plus, it saves you the extra step of averaging the daily values yourself.
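For instance, to make the symbol swappable you could wrap the pipeline in a small helper (a minimal sketch; the get_yearly_returns name and the MSFT ticker are just illustrative):

get_yearly_returns <- function(symbol) {
  # download prices and collapse them to one return per year
  tq_get(symbol) %>%
    tq_transmute(select     = close,
                 mutate_fun = periodReturn,
                 period     = "yearly")
}

get_yearly_returns("MSFT")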

You can extract the year value from the date and then do aggregate. Note that format() needs a Date (or POSIXct) column; since read.csv() imports the Yahoo Date column as character, convert it with as.Date() first.
This can be done in base R:
aggregate(Close ~ year, transform(ADSKcsv, year = format(as.Date(Date), '%Y')), mean)
Or with dplyr:
library(dplyr)

ADSKcsv %>%
  group_by(year = format(as.Date(Date), '%Y')) %>%
  # Or using lubridate's year function:
  # group_by(year = lubridate::year(as.Date(Date))) %>%
  summarise(Close = mean(Close))
Or with data.table:
library(data.table)

setDT(ADSKcsv)[, .(Close = mean(Close)), by = format(as.Date(Date), '%Y')]
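Since the ultimate goal was year-over-year return, you can take the yearly aggregate one step further with dplyr::lag() (a minimal sketch building on the dplyr version above; the mean_close and yoy_return column names are just illustrative):

library(dplyr)

ADSKcsv %>%
  group_by(year = format(as.Date(Date), '%Y')) %>%
  summarise(mean_close = mean(Close)) %>%
  # YoY return: this year's mean close relative to last year's
  mutate(yoy_return = mean_close / lag(mean_close) - 1)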

Related

Applying a function using elements within a list take 2

I attempted this question yesterday (Applying a function using elements within a list), but my reprex produced the wrong data structure and unfortunately the suggestions didn't work for my actual dataset.
I have what is hopefully a simple functional programming question. I have a list of locations with an average temperature and amplitude for each day (180 days in my actual dataset). I want to iterate through these locations and create a sine curve of 24 points, using a custom-made function that takes the average temperature and amplitude from each day within the list. Below is my new reprex.
library(tibble)
library(REdaS)     # degrees to radians
library(tidyverse)

sinefunc <- function(Amplitude, Average){
  hour <- seq(0, 23, 1)
  temperature <- vector("double", length = 24)
  for(i in seq_along(hour)){
    temperature[i] <- (Amplitude * sin(deg2rad(180 * (hour[i] / 24))) + Average) +
      Amplitude * sin(deg2rad(180 * hour[i] / 12))
  }
  temperature
}
data <- tibble(Location  = c(rep("London", 6), rep("Glasgow", 6), rep("Dublin", 6)),
               Day       = rep(seq(1, 6, 1), 3),
               Average   = runif(18, 0, 20),
               Amplitude = runif(18, 0, 15)) %>%
  nest_by(Location)
Using purrr and map_dfr I get the error Error in .x$Average : $ operator is invalid for atomic vectors:
df <- data %>%
  map_dfr(~sinefunc(.x$Average, .x$Amplitude))
Using lapply I get the error Error in x[, "Amplitude"] : incorrect number of dimensions:
data <- lapply(data, function(x){
  sinefunc(Amplitude = x[,"Amplitude"], Average = x[,"Average"])
})
My goal is to have 24 hourly data points for each day and location.
Any further help would be much appreciated.
Stuart
Maybe you are looking for this? You get a data frame back with 24 data points for each day and location, e.g. London-Day1, Dublin-Day1, etc.
library(dplyr)
library(purrr)

data <- tibble(Location  = c(rep("London", 6), rep("Glasgow", 6), rep("Dublin", 6)),
               Day       = rep(seq(1, 6, 1), 3),
               Average   = runif(18, 0, 20),
               Amplitude = runif(18, 0, 15))

# build a name for each Location/Day group, e.g. "London_1"
group_name <- data %>%
  group_by(Location, Day) %>%
  group_keys() %>%
  mutate(group_name = stringr::str_c(Location, "_", Day)) %>%
  pull(group_name)

data %>%
  # split into a list of one-row tibbles, one per Location/Day
  group_split(Location, Day) %>%
  # name each list element
  setNames(group_name) %>%
  # apply the function to each element and row-bind the results;
  # wrapping the output in a tibble gives map_dfr named columns to
  # bind (a bare numeric vector would fail), naming the arguments
  # avoids silently swapping Amplitude and Average, and .id keeps
  # the group label in the result
  map_dfr(~ tibble(hour        = 0:23,
                   temperature = sinefunc(Amplitude = .x$Amplitude,
                                          Average   = .x$Average)),
          .id = "group")

Average time from as.POSIXct ignoring dates using circular statistics

I am looking to create a simple custom function I can integrate into a dplyr pipe workflow.
I want to calculate the mean of a time of day, excluding dates. So, for instance, given a sequence of POSIXct values, I want to extract the time and calculate the average. One added complication is that time is circular: 00:00:00 and 23:00:00 are very close to each other in time terms, but not arithmetically, so I can't just use something like mean(time_vector) to calculate the average.
I have seen the psych package, which has a function called circadian.mean for calculating a circular average. However, it only takes decimal hours, so there is some juggling I have had to do before getting a valid output. For instance:
library(tidyverse)
library(lubridate)
library(psych)

df <- data.frame(datetime = as.POSIXct(c("2019-07-14 23:00:17",
                                         "2019-07-14 23:40:20",
                                         "2019-07-14 00:12:45",
                                         "2019-07-14 00:17:19"), tz = "UTC"))
decimal_hours_vector <- df %>%
  mutate(hours   = hour(datetime)) %>%                   # extract hours
  mutate(minutes = minute(datetime)) %>%                 # extract minutes
  mutate(seconds = second(datetime)) %>%                 # extract seconds
  mutate(dec_min = (minutes / 60 * 100) / 100) %>%       # minutes as a decimal hour
  mutate(dec_sec = (seconds / 60 / 60 * 100) / 100) %>%  # seconds as a decimal hour
  rowwise() %>%
  mutate(dec_hour = sum(hours, dec_min, dec_sec)) %>%    # sum the three time columns rowwise
  ungroup() %>%
  pull(dec_hour)  # extract dec_hour as a vector to use in circadian.mean

# calculating average time with psych
average_time <- circadian.mean(decimal_hours_vector)
So, I think the above works, but it is very cumbersome. Also, I still haven't worked out whether it makes sense to convert the final output back to hh:mm:ss or leave it in decimal format, nor whether everything is working as it should.
Rather than the laborious process above, is there a better or more sensible way to do this?
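One more direct route (a minimal sketch, not from the original thread) is to compute the circular mean by hand: map each time of day onto the unit circle, average the sine and cosine components, and convert the resulting angle back to hours with atan2(). This assumes lubridate for the time components:

library(lubridate)

circular_mean_hour <- function(x) {
  # time of day in decimal hours, then as an angle in radians
  dec_hour <- hour(x) + minute(x) / 60 + second(x) / 3600
  theta    <- 2 * pi * dec_hour / 24
  # circular mean: atan2 of the averaged sine/cosine components,
  # wrapped back into [0, 24) decimal hours
  (atan2(mean(sin(theta)), mean(cos(theta))) / (2 * pi) * 24) %% 24
}

circular_mean_hour(df$datetime)  # decimal hours, comparable to circadian.mean()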

Daily mean of hourly data per parameter

I have the following data frame that shows hourly simulated heavy metal concentrations for two parameters:
Date <- c("2013-01-01 02:00:00", "2013-01-01 03:00:00", "2013-01-02 02:00:00", "2013-01-02 03:00:00",
          "2013-01-01 02:00:00", "2013-01-01 03:00:00", "2013-01-02 02:00:00", "2013-01-02 03:00:00")
Parameter <- c("Par1", "Par1", "Par1", "Par1", "Par2", "Par2", "Par2", "Par2")
sim1 <- c(1, 4, 3, 2, 6, 5, 3, 5)
sim2 <- c(3, 2, 3, 1, 8, 2, 7, 3)
obs <- data.frame(Date, Parameter, sim1, sim2)
obs$Date <- as.POSIXct(obs$Date)
I need the daily mean for each parameter. Any ideas? I tried aggregate() but I couldn't figure out how to group by parameter and date.
We can convert the 'Date' to Date class with as.Date, use that in the group_by along with 'Parameter' and get the mean of the rest of the columns with summarise_all
library(tidyverse)

obs %>%
  group_by(Daily = as.Date(Date), Parameter) %>%
  summarise_all(mean)
Or using aggregate from base R
aggregate(.~ Date + Parameter, transform(obs, Date = as.Date(Date)), mean)
Or using by
by(obs[3:4], list(obs$Parameter, as.Date(obs$Date)), FUN = colMeans)
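As a side note, summarise_all() is superseded by across() in dplyr 1.0 and later, so the tidyverse version could equivalently be written as (a minimal sketch):

library(dplyr)

obs %>%
  group_by(Daily = as.Date(Date), Parameter) %>%
  summarise(across(c(sim1, sim2), mean), .groups = "drop")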

Find the Mean of a Value by Year (not the whole date)

I'll preface this by saying I'm very much a self taught beginner with R.
I have a very large data set looking at biological data.
I want to find the average of the variable "shoot.density" split by year, but my date data is entered as "%d/%m/%y". This means the normal way I would do this splits by each individual date rather than by year only, e.g.:
tapply(df$Shoot.Density, list(df$Date), mean)
Any help would be much appreciated. I am also happy to paste in a section of my data, but I'm not sure how.
If your data is in Date class, you can use format to transform your date column into a year variable:
tapply(df$Shoot.Density, list(format(df$Date, '%Y')), mean)
If it is a character string in the format %d/%m/%y, you can use substr to pull out the two-digit year (characters 7-8):
tapply(df$Shoot.Density, list(substr(df$Date, 7, 8)), mean)
You can also do this with dplyr:
library(dplyr)

df %>%
  group_by(years = format(Date, '%Y')) %>%
  summarise(means = mean(Shoot.Density))
Another way to do this is with the year function of the data.table package:
library(data.table)
setDT(df)[, mean(Shoot.Density), by = year(Date)]
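Whichever approach you use, the cleanest long-term fix is to convert the column to Date class once, after which any of the year-based groupings above work directly (a minimal sketch, assuming the dates really are %d/%m/%y strings):

# convert the character dates once, up front
df$Date <- as.Date(df$Date, format = '%d/%m/%y')

# then group by the full four-digit year
tapply(df$Shoot.Density, format(df$Date, '%Y'), mean)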

ggvis: plotting data in multiple series

Here is what I have:
A data frame which contains a date field, and a number of summary statistics.
Here's what I want:
I want a chart that allows me to compare the time series week over week, to see how the performance of the process this week compares to the previous one, for example.
What I have done so far:
## Get the week day name to display
summaryData$WeekDay <- format(summaryData$Date, format = '%A')
## Get the week number to differentiate the weeks
summaryData$Week <- format(summaryData$Date, format = '%V')

summaryData %>%
  ggvis(x = ~WeekDay, y = ~Referrers) %>%
  layer_lines(stroke = ~Week)
I expected this to create a chart with multiple coloured lines, each one representing a week in my data set, but it does not do what I expect.
Try looking at reshape2 to convert your data with a factor variable for each week, or shift the data with a dplyr::lag() call.
A general way of plotting multiple columns in ggvis is to use the following format:
summaryData %>%
  ggvis() %>%
  layer_lines(x = ~WeekDay, y = ~Referrers) %>%
  layer_lines(x = ~WeekDay, y = ~Other)
I hope this helps
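For the original week-over-week intent, ggvis can also draw one line per group if you group the data and map the week to stroke (a minimal sketch, not from the original answer; it assumes summaryData has the WeekDay and Week columns created above):

summaryData %>%
  ggvis(x = ~WeekDay, y = ~Referrers, stroke = ~Week) %>%
  group_by(Week) %>%
  layer_lines()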
