I'd like to aggregate time-series data into weekly data, but doing so changes the class of the time variable from "Date" to "character", so it loses all the useful features of a date.
This is quite annoying, especially when I need to plot the data and play with breaks and labels.
Here is a short example of what I'm facing:
# Storing some random daily data
require(plyr)
require(dplyr)
df = data.frame(date = seq.Date(from = as.Date('2013-01-01'),
to = as.Date('2014-12-31'),
by = 'day'),
data = rnorm(365*2))
# Aggregating the daily data into weekly data
wdf = df %>%
mutate(week = strftime(date, format = '%Y-%U')) %>%
group_by(week) %>%
summarise(wdata = max(data))
Unfortunately, the variable week is now of class "character", not "Date". Is there any way to keep the Date class for values in the %Y-%U format used above?
Thanks in advance!
EB
Use the awesome lubridate package. Its floor_date function rounds a date down to the start of a given time unit (including weeks, as you want), and the result is still a Date.
library(lubridate)
wdf = df %>%
mutate(week = floor_date(date, unit = 'week')) %>%
group_by(week) %>%
summarise(wdata = max(data))
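As a quick check, here is a minimal self-contained sketch of the same pipeline showing that week keeps its Date class. The seed is my own addition, only so the random numbers are reproducible.

```r
library(dplyr)
library(lubridate)

set.seed(1)  # assumption: seed added only for reproducibility

# Two years of random daily data, as in the question
df <- data.frame(
  date = seq.Date(as.Date("2013-01-01"), as.Date("2014-12-31"), by = "day"),
  data = rnorm(365 * 2)
)

# Round every date down to the first day of its week, then aggregate
wdf <- df %>%
  mutate(week = floor_date(date, unit = "week")) %>%
  group_by(week) %>%
  summarise(wdata = max(data))

class(wdf$week)  # the grouping column is still a Date
head(wdf)
```

Because week is a Date, ggplot2's scale_x_date() (with date_breaks and date_labels) can be used on it directly. Weeks start on Sunday by default; floor_date() has a week_start argument if you would rather start on Monday.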
Related
In R, is there a way to use group_by to convert daily data to yearly data by taking last() of all variables in one go, rather than calling summarise on each variable?
(adf <- data.frame(
year=rep(1:3,each=3),
month=rep(1:3,times=3),
var1= letters[1:9],
var2= -9:-1
))
#solution
library(tidyverse)
group_by(adf,
year) |>
summarise(across(.cols = everything(),
.fns = last))
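The same idea applied to actual daily data might look like the sketch below. The data frame daily and its column var1 are made up for illustration; the rows are sorted by date first so that last() really picks the final day of each year.

```r
library(dplyr)

# Hypothetical daily data covering two full years
daily <- data.frame(
  date = seq(as.Date("2020-01-01"), as.Date("2021-12-31"), by = "day")
)
daily$var1 <- seq_len(nrow(daily))

yearly <- daily %>%
  arrange(date) %>%                                     # make sure "last" means latest
  group_by(year = as.integer(format(date, "%Y"))) %>%   # one group per calendar year
  summarise(across(everything(), last))                 # last value of every other column

yearly
```

across(everything(), ...) skips the grouping column, so year is kept as the key and every remaining column is reduced to its last value.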
I am trying to use ggplot to draw the data contained in the following data frame:
df <- data.frame( dress_id = c(1,2,3,4,5),
29/8/2013 = c(2000,150,6,1000,900),
31/8/2013 = c(2000,200,7,1100,1000),
2/9/2013 = c(2400,600,7,1350,1300),
4/9/2013 = c(2600,600,7,1500,1400),
style = c("Sexy", "Casual","vintage","Brief","cute"))
I want the x-axis to show the dates (29/8/2013 ... 4/9/2013), the y-axis to show the sales price on each date, and the lines to be distinguished by style.
Is this possible using ggplot?
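Here are the details to zx8754's answer.
First, note that I put an X in front of the date columns, because column names in R should not start with a number. data.frame() also replaces the slashes with dots (check.names = TRUE is the default), so the columns end up named X29.8.2013 and so on.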
df <- data.frame( dress_id = c(1,2,3,4,5),
"X29/8/2013" = c(2000,150,6,1000,900),
"X31/8/2013" = c(2000,200,7,1100,1000),
"X2/9/2013" = c(2400,600,7,1350,1300),
"X4/9/2013" = c(2600,600,7,1500,1400),
style = c("Sexy", "Casual","vintage","Brief","cute"))
Next, I load the tidyverse package, which contains functions to work with data.frames and also includes ggplot2:
library(tidyverse)
Finally, I transform your data from wide to long with the gather function. As a result, there is now a date column in your data.frame which contains all the dates and a value column which contains the sales prices. The date strings are then parsed with as.Date, using a format that matches the dotted column names.
df %>%
gather(date, value, -dress_id, -style) %>%
mutate(date = as.Date(date, format = c("X%d.%m.%Y"))) %>%
ggplot(aes(x = date, y = value, colour = style)) +
geom_line()
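For what it's worth, gather has since been superseded in tidyr by pivot_longer. Below is a hedged sketch of the same reshaping with pivot_longer. It also uses check.names = FALSE, which is my own variation (not part of the answer above), so the original column names survive and no X prefix is needed.

```r
library(dplyr)
library(tidyr)

# check.names = FALSE keeps the date-like column names exactly as written
df <- data.frame(dress_id = c(1, 2, 3, 4, 5),
                 `29/8/2013` = c(2000, 150, 6, 1000, 900),
                 `31/8/2013` = c(2000, 200, 7, 1100, 1000),
                 `2/9/2013` = c(2400, 600, 7, 1350, 1300),
                 `4/9/2013` = c(2600, 600, 7, 1500, 1400),
                 style = c("Sexy", "Casual", "vintage", "Brief", "cute"),
                 check.names = FALSE)

# Wide to long: one row per dress and date
long <- df %>%
  pivot_longer(cols = -c(dress_id, style),
               names_to = "date", values_to = "value") %>%
  mutate(date = as.Date(date, format = "%d/%m/%Y"))

long
```

The resulting long data frame can be passed to the same ggplot(aes(x = date, y = value, colour = style)) + geom_line() call.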
I have imported daily return data for ADSK via a downloaded Yahoo finance .csv file.
ADSKcsv <- read.csv("ADSK.csv", TRUE)
I have converted the .csv file to a data frame
class(ADSKcsv)
I have selected the two relevant columns that I want to work with and sought to take the mean of all daily returns for each year. I do not know how to do this.
aggregate(Close~Date, ADSK, mean)
The above code yields a mean calculation for each date. My objective is to calculate YoY return from this data, first converting daily returns to yearly returns, then using yearly returns to calculate year-over-year returns. I'd appreciate any help.
May I suggest an easier approach?
library(tidyquant)
ADSK_yearly_returns_tbl <- tq_get("ADSK") %>%
tq_transmute(select = close,
mutate_fun = periodReturn,
period = "yearly")
ADSK_yearly_returns_tbl
If you run the above code, it will download the historical prices for a symbol of interest (ADSK in this case) and then calculate the yearly return. An added bonus of this workflow is that you can swap in any other symbol without manually downloading and reading in a file. Plus, it saves you the extra step of calculating the average daily return.
You can extract the year from the date and then aggregate.
In base R:
# as.Date() is needed because read.csv() reads Date as text (assumes Yahoo's YYYY-MM-DD format)
aggregate(Close~year, transform(ADSKcsv, year = format(as.Date(Date), '%Y')), mean)
Using dplyr:
library(dplyr)
ADSKcsv %>%
# as.Date() converts the text column read by read.csv() into a Date
group_by(year = format(as.Date(Date), '%Y')) %>%
#Or using lubridate's year function
#group_by(year = lubridate::year(as.Date(Date))) %>%
summarise(Close = mean(Close))
Or using data.table:
library(data.table)
# group by the year extracted from the (converted) Date column
setDT(ADSKcsv)[, .(Close = mean(Close)), by = .(year = format(as.Date(Date), '%Y'))]
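The question also asks for year-over-year returns, which the snippets above stop short of. One possible sketch: take one closing price per year and compare it with the previous year's using lag(). The prices data frame is toy data invented for illustration, and using the year-end close (rather than the yearly mean) is my assumption.

```r
library(dplyr)

# Toy prices, invented for illustration only
prices <- data.frame(
  Date  = c("2018-06-01", "2018-12-31", "2019-06-03",
            "2019-12-31", "2020-06-01", "2020-12-31"),
  Close = c(100, 110, 120, 132, 150, 165)
)

yearly <- prices %>%
  mutate(Date = as.Date(Date)) %>%                     # text to Date
  group_by(year = format(Date, "%Y")) %>%
  summarise(Close = last(Close, order_by = Date)) %>%  # year-end close (assumption)
  mutate(yoy = Close / lag(Close) - 1)                 # change versus previous year

yearly
```

The first year has no predecessor, so its yoy is NA. Swapping last(...) for mean(Close) gives the change in the yearly average instead.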
I am trying to create two frequency tables, one that is daily, and one that is hourly. I am able to get the daily values fairly easily.
C<-Data
C$Data<-format(C$Data, "%m/%d/%Y")
Freq_Day<- C %>% group_by(Data) %>% summarise(frequency = n())
However when I try to get the hourly frequency by doing the following
B<-Data
B$Data<-format(B$Data,"%m/%d/%Y %H:%M")
Freq_HRLY<-B %>% group_by(Data) %>% summarise(frequency = n())
This omits hours that do not occur in the data set, so it returns fewer rows than (number of days) * 24. How can I get a column of dates in one-hour increments with their corresponding frequency, so that an hour with no occurrences in "Data" simply has a value of 0?
One way is to use tidyr::complete on the already computed Freq_HRLY to fill in the missing hours, by creating an hourly sequence between the min and max of Data.
library(dplyr)
Freq_HRLY %>%
ungroup() %>%
mutate(Data = as.POSIXct(Data, format = "%m/%d/%Y %H:%M")) %>%
tidyr::complete(Data = seq(min(Data), max(Data), by = "1 hour"),
fill = list(frequency = 0))
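One caveat: the "%H:%M" format in the question keeps the minutes, so the groups there are per minute rather than per hour. A sketch that rounds each timestamp down to the hour before counting could look like this. The events data frame is made up, and using lubridate::floor_date is my suggestion rather than part of the answer above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up timestamps: two events in hour 00, none in 01 and 02, one in 03
events <- data.frame(
  Data = as.POSIXct(c("2021-03-01 00:15:00",
                      "2021-03-01 00:45:00",
                      "2021-03-01 03:05:00"), tz = "UTC")
)

hourly <- events %>%
  count(hour = floor_date(Data, "hour"), name = "frequency") %>%  # events per hour
  complete(hour = seq(min(hour), max(hour), by = "1 hour"),       # add the missing hours
           fill = list(frequency = 0L))                           # and give them a count of 0

hourly
```

Keeping the column as POSIXct (instead of formatting it to text first) is what makes the hourly seq() possible.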
I am using the code below to group by month and then sum or count. However, the SLARespond column seems to hold the sum for the whole data set, not for each month.
Is there any way to fix this?
Also, instead of the sum function, can I count the rows where SLAIncident$IsSlaRespondByViolated == 1?
I appreciate any help!
SLAIncident <- SLAIncident %>%
mutate(month = format(SLAIncident$CreatedDateLocal, "%m"), year = format(SLAIncident$CreatedDateLocal, "%Y")) %>%
group_by(year, month) %>%
summarise(SLARespond = sum(SLAIncident$IsSlaRespondByViolated))
If you could provide a small sample of the dataset to illustrate your example, that would be great. The total is the same in every row because SLAIncident$IsSlaRespondByViolated inside summarise() refers to the whole column and ignores the grouping; use the bare column name instead. I would also make sure that your months/years are characters or factors so that dplyr can group by them. For the second part of the question, an ifelse wrapped in sum counts the rows where the flag equals 1. I am using your code here to convert the dates into month and year, but I recommend lubridate.
SLAIncident <- SLAIncident %>%
mutate(month = as.character(format(CreatedDateLocal, "%m")),
year = as.character(format(CreatedDateLocal, "%Y"))) %>%
group_by(year, month) %>%
# bare column names are evaluated within each year/month group
summarise(SLARespond = sum(IsSlaRespondByViolated),
sla_1 = sum(ifelse(IsSlaRespondByViolated == 1, 1, 0)))
Also, as hinted at in the comments, these column names are really long and could use some tidying.
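To make the difference concrete, here is a small sketch on invented data (the toy data frame is hypothetical; only the column names are borrowed from the question). It computes the sum both ways, side by side.

```r
library(dplyr)

# Invented example: 2 incidents in January, 3 in February
toy <- data.frame(
  CreatedDateLocal = as.Date(c("2020-01-05", "2020-01-20",
                               "2020-02-03", "2020-02-10", "2020-02-25")),
  IsSlaRespondByViolated = c(1, 0, 1, 1, 0)
)

by_month <- toy %>%
  mutate(year  = format(CreatedDateLocal, "%Y"),
         month = format(CreatedDateLocal, "%m")) %>%
  group_by(year, month) %>%
  summarise(
    wrong      = sum(toy$IsSlaRespondByViolated),  # whole column, ignores the groups
    right      = sum(IsSlaRespondByViolated),      # evaluated per group
    n_violated = sum(IsSlaRespondByViolated == 1), # count of rows where the flag is 1
    .groups = "drop"
  )

by_month
```

The wrong column repeats the overall total in every row, while right and n_violated change per month. Summing the logical IsSlaRespondByViolated == 1 is an equivalent, slightly shorter alternative to the ifelse version.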