A few questions have come close to what I am looking for, but I can't find one that gets it right on.
I have sales data for several products for each day over a 6-year period. I summed the data by week, starting January 1, 2008. During the period 1/1/08-12/30/13, there were 313 weeks, so I just created dataframes for each product that contained columns for week numbers 1-313 and the weekly sales for each respective week.
I am plotting them with ggplot2, adding trendlines, etc.
The x-axis obviously uses the week number for its values, but I would prefer that it use the actual start date of each week (January 1, 2008, a Tuesday; January 8, 2008; and so on up to December 24, 2013).
What is the best way to do this? How can I convert weeks 1-313 into their respective Start of Week dates? Or, is there a way to override the axis values on the plot itself?
To convert your week numbers to dates try something like this
weeks <- 1:313
start.date <- as.Date("2007/12/31")
y <- start.date + (weeks - 1)*7
head(y)
"2007-12-31" "2008-01-07" "2008-01-14" "2008-01-21" "2008-01-28" "2008-02-04"
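If the weeks should start on January 1, 2008, as the question describes, the same arithmetic can be anchored on that date instead. A minimal sketch in base R; week_start is a hypothetical helper name used only for illustration:

```r
# Hypothetical helper: map a 1-based week number to the date that week starts on.
# first_day defaults to the start of week 1 as stated in the question.
week_start <- function(week, first_day = as.Date("2008-01-01")) {
  first_day + (week - 1) * 7
}

starts <- week_start(1:313)

# First two and last week start dates
format(starts[c(1, 2, 313)])
```

The resulting Date vector can be stored as a column next to the week numbers and used as the x aesthetic in place of the week number.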
Use package:lubridate?
Sample data (which you should have provided):
> df = data.frame(wid=1:10,z=runif(10))
> head(df)
wid z
1 1 0.2071595
2 2 0.4313403
3 3 0.7063967
4 4 0.2245014
5 5 0.2004542
6 6 0.1231366
Assuming your data are consecutive, with no gaps:
> require(lubridate)
> df$week=mdy("Jan 1 2008") + weeks(0:(nrow(df)-1))
> head(df)
wid z week
1 1 0.2071595 2008-01-01
2 2 0.4313403 2008-01-08
3 3 0.7063967 2008-01-15
4 4 0.2245014 2008-01-22
5 5 0.2004542 2008-01-29
6 6 0.1231366 2008-02-05
Then plot for nice labels:
> require(ggplot2)
> ggplot(df,aes(x=week,y=z))+geom_line()
I have manually separated my dataset (discrete_8) into 2 separate datasets (data & data2). 'data' contains the data from the current year (2021), whereas 'data2' contains data from previous years. Of course, this is based on the current year (2021), but I want to automate the line of code so that when 2022 comes, I will not have to edit the script to change 2021 to 2022. Should I use Sys.Date() to get the most recent year? How would I go about incorporating Sys.Date() to partition the dataset?
Here is my code so far, where I partition the dataset:
data <- discrete_8 %>% filter(PS_DATE >= as.POSIXct("2021-01-01"))#current year
data2 <- discrete_8 %>% filter(PS_DATE < as.POSIXct("2021-01-01"))#past years
Here is what discrete_8 looks like:
X PS_DATE PS_NAME Control.Parameters.Cell.Return.Flow.Rate Control.Parameters.Harvest.Flow.Rate Control.Parameters.Microsparger.Total.Gas.Flow.Rate
1 0 2014-02-06 123 NA NA 1
2 1 2014-02-07 124 NA NA 1
3 2 2014-02-08 125 NA NA 1
4 3 2014-02-09 126 1.5 NA 1
5 4 2014-02-10 127 1.5 NA 1
6 5 2014-02-11 128 1.5 NA 1
There is a somewhat tedious bug still present in that trunc(Sys.Date(), "year") does not give you Jan 01 of the current year -- it does in R-devel.
But you can build yourself a helper such as this:
> firstDay <- function() { d <- Sys.Date(); d - as.POSIXlt(d)$yday }
> firstDay()
[1] "2021-01-01"
and you can use that to compare. (Also, in the code you posted, as.Date() is simpler as you ignore hours/minutes/seconds here.)
One option is the lubridate::floor_date() function:
lubridate::floor_date(Sys.Date(), unit = "years")
[1] "2021-01-01"
I use substr(Sys.Date(),1,4) to get the current year. In your code you can replace as.POSIXct("2021-01-01") with
as.POSIXct(paste0(substr(Sys.Date(),1,4),"-01-01"))
This will give January 1 of the current year in your datetime format.
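Putting the pieces together, here is a rough base R sketch of the partition itself. The toy data frame is made up to stand in for discrete_8, and it assumes PS_DATE is stored as a Date; if it is POSIXct, wrap year_start in as.POSIXct() as in the question's code.

```r
# Toy stand-in for discrete_8 (made-up values, for illustration only)
toy <- data.frame(
  PS_DATE = as.Date(c("2014-02-06", "2019-07-01",
                      format(Sys.Date(), "%Y-01-01"),  # January 1 of the current year
                      format(Sys.Date()))),            # today
  PS_NAME = 123:126
)

# January 1 of whatever year the script is run in
year_start <- as.Date(format(Sys.Date(), "%Y-01-01"))

current <- toy[toy$PS_DATE >= year_start, ]  # current year
past    <- toy[toy$PS_DATE <  year_start, ]  # earlier years
```

Because year_start is recomputed on every run, nothing needs editing when the year changes.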
Apologies if this is a repeat question, I searched and could not find the specific answer I am looking for.
I have a data frame where one column is a 16-digit code, and there are a number of other columns. Here is a simplified example:
code = c("1109619910224003", "1157919910102001", "1539820070315001", "1563120190907002")
year = c(1991, 1991, 2007, 2019)
month = c(02, 01, 03, 09)
dat = as.data.frame(cbind(code,year,month))
> dat
code year month
1 1109619910224003 1991 2
2 1157919910102001 1991 1
3 1539820070315001 2007 3
4 1563120190907002 2019 9
As you can see, the code contains year, month, and day information. I already have columns for year and month in my dataframe, but I need to also create a day column, which would be 24, 02, 15, and 07 in this example. The date is always in the format yyyymmdd and begins as the 6th digit in the code. So I essentially need to extract the 12th and 13th digits from each code to create my day column.
I then need to create another column for day of year from the date information, so I end up with the following:
day = c(24, 02, 15, 07)
dayofyear = c(55, 2, 74, 250)
dat2 = as.data.frame(cbind(code,year,month,day,dayofyear))
> dat2
code year month day dayofyear
1 1109619910224003 1991 2 24 55
2 1157919910102001 1991 1 2 2
3 1539820070315001 2007 3 15 74
4 1563120190907002 2019 9 7 250
Any suggestions? Thanks!
You can leverage the Date data type in R to accomplish all of these tasks. First we will parse out the date portion of the code (characters 6 to 13), and convert them to Date format using readr::parse_date(). Once the date is converted, we can simply access all of the values you want rather than calculating them ourselves.
library(tidyverse)
out <- dat %>%
  mutate(
    date=readr::parse_date(substr(code, 6, 13), format="%Y%m%d"),
    day=format(date, "%d"),
    month=format(date, "%m"),
    year=format(date, "%Y"),
    day.of.year=format(date, "%j")
  )
(I'm using tidyverse syntax here because I find it quicker for these types of problems)
Once we create these columns, we can look at the updated data.frame out:
code year month date day day.of.year
1 1109619910224003 1991 02 1991-02-24 24 055
2 1157919910102001 1991 01 1991-01-02 02 002
3 1539820070315001 2007 03 2007-03-15 15 074
4 1563120190907002 2019 09 2019-09-07 07 250
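Edit: note that the output for all the new columns is character. We can tell this without using str() because of the leading zeros in the new columns. To get rid of them, convert just those columns to integer, for example with out <- out %>% mutate(across(c(year, month, day, day.of.year), as.integer)), or append that mutate call to the end of the existing pipeline. Avoid mutate_all(as.integer) here: it would also try to convert code, whose 16-digit values are too large for an integer and would become NA, and it would turn the date column into plain numbers.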
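For comparison, the same extraction can be sketched in base R without any packages. This assumes, as stated in the question, that the date always occupies characters 6 to 13 of the code in yyyymmdd form:

```r
code <- c("1109619910224003", "1157919910102001",
          "1539820070315001", "1563120190907002")

# Characters 6-13 hold the date as yyyymmdd
date <- as.Date(substr(code, 6, 13), format = "%Y%m%d")

# Convert to integer right away so there are no leading zeros
day       <- as.integer(format(date, "%d"))
dayofyear <- as.integer(format(date, "%j"))

dat2 <- data.frame(code, date, day, dayofyear)
dat2
```

Building the data frame with data.frame() rather than as.data.frame(cbind(...)) keeps each column's type; cbind() on mixed vectors turns everything into character.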
I have a data set with the variable 'months' from 1 to 12, but need to change them to the month names, i.e. "1" needs to be January and so on. What's the easiest way to do this?
R has a built-in vector called month.name. For your purpose, you could do something like the following:
# Some dummy data
set.seed(1)
df <- data.frame(
month = sample(1:12, size = 10)
)
# Now use your integer month to subset month.name
df$month2 <- month.name[df$month] # Also has month.abb
df
month month2
1 9 September
2 4 April
3 7 July
4 1 January
5 2 February
6 5 May
7 3 March
8 8 August
9 6 June
10 11 November
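One thing to watch for: plain character month names sort alphabetically (April before January) in tables and plots. A small sketch of one way around that, assuming you want calendar order, is to store the names as a factor whose levels are month.name:

```r
months <- c(9, 4, 1, 12)

# Factor with levels in calendar order rather than alphabetical order
month_f <- factor(month.name[months], levels = month.name)

sorted_names <- as.character(sort(month_f))
sorted_names

# month.abb works the same way for three-letter abbreviations
short_names <- month.abb[months]
short_names
```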
I have daily OHLC data for US stocks. I would like to derive a weekly time series from it and compute the SMA and EMA. To do that, I need to create one weekly time series from the maximum high per week and another from the minimum low per week. After that, I would compute their SMA and EMA and assign the results to every day of the following week (one period forward). So, first problem first: how do I get the weekly series from the daily data using R (any package)? Better still, can you show me an algorithm for it in any language, preferably Golang? I can rewrite it in Golang if needed.
Date        High  Low  Week(High)  Week(Low)  WkSMAHigh 2DP  WkSMALow 2DP
                                              (one period forward)
Dec 24 Fri  6     3    8           3          5.5            1.5
Dec 23 Thu  7     5                           5.5            1.5
Dec 22 Wed  8     5                           5.5            1.5
Dec 21 Tue  4     4                           5.5            1.5
Assume Holiday (Dec 20)
Dec 17 Fri  4     3    6           2          None
Dec 16 Thu  4     3
Dec 15 Wed  5     2
Dec 14 Tue  6     4
Dec 13 Mon  6     4
Dec 10 Fri  5     1    5           1          None
Dec 9 Thu   4     3
Dec 8 Wed   3     2
Assume Holiday (Dec 6 & 7)
I'd start by generating a column which specifies which week it is.
You could use the lubridate package to do this, which would require converting your dates to the Date type. It has a function called week, which returns the number of complete seven-day periods that have passed since January 1, plus one. However, I don't know whether this data spans several years, and I think there's a simpler way to do this.
The example I'll give below will simply do it by creating a column which just repeats an integer 7 times up to the length of your data frame.
Pretend your data frame is called ohlcData.
# Create a sequence 7 at a time all the way up to the end of the data frame
# I limit the sequence to the length nrow(ohlcData) so the rounding error
# doesn't make the vectors uneven lengths
ohlcData$Week <- rep(seq(1, ceiling(nrow(ohlcData)/7)), each = 7)[1:nrow(ohlcData)]
With that created we can then go ahead and use the plyr package, which has a really useful function called ddply. This function applies a function to columns of data grouped by another column of data. In this case we will apply the max and min functions to your data based on its grouping by our new column Week.
library(plyr)
weekMax <- ddply(ohlcData[,c("Week", "High")], "Week", numcolwise(max))
weekMin <- ddply(ohlcData[,c("Week", "Low")], "Week", numcolwise(min))
That will then give you the min and max of each week. The dataframe returned for both weekMax and weekMin will have 2 columns, Week and the value. Combine these however you see fit. Perhaps weekExtreme <- cbind(weekMax, weekMin[,2]). If you want to be able to marry up date ranges to the week numbers it will just be every 7th date starting with whatever your first date was.
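Note that repeating an integer 7 times assumes seven rows per week, while the sample in the question has at most five trading days per week and some holidays. A possible alternative, sketched here in base R, groups by calendar week with cut() on the dates. The year 2021 is my assumption (it is a year in which Dec 24 falls on a Friday), and the helper names sma2 and lag1 are made up for illustration:

```r
# Sample rows from the question; the year 2021 is assumed
ohlc <- data.frame(
  Date = as.Date(c("2021-12-08", "2021-12-09", "2021-12-10",
                   "2021-12-13", "2021-12-14", "2021-12-15", "2021-12-16", "2021-12-17",
                   "2021-12-21", "2021-12-22", "2021-12-23", "2021-12-24")),
  High = c(3, 4, 5,  6, 6, 5, 4, 4,  4, 8, 7, 6),
  Low  = c(2, 3, 1,  4, 4, 2, 3, 3,  4, 5, 5, 3)
)

# Calendar week of each row, labelled by the Monday that starts it
week <- cut(ohlc$Date, "week")

weekly <- data.frame(
  WeekStart = as.Date(levels(week)),
  High = as.vector(tapply(ohlc$High, week, max)),  # maximum high per week
  Low  = as.vector(tapply(ohlc$Low,  week, min))   # minimum low per week
)

# 2-period simple moving average, then shifted one period forward
sma2 <- function(x) as.vector(stats::filter(x, rep(1 / 2, 2), sides = 1))
lag1 <- function(x) c(NA, head(x, -1))
weekly$SMAHigh <- lag1(sma2(weekly$High))
weekly$SMALow  <- lag1(sma2(weekly$Low))

# Assign the weekly values back to every day of that week
ohlc <- cbind(ohlc, weekly[as.integer(week), c("SMAHigh", "SMALow")])
weekly
```

On the sample rows this reproduces the figures in the question's table: weekly highs of 5, 6 and 8, weekly lows of 1, 2 and 3, and forward-shifted averages of 5.5 and 1.5 for the last week. A week with no rows at all would show up as NA.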
I have a dataframe with a column Date in which the observations range from 1974-10-01 to 2014-09-30. I would like to create a new column ("Day") in the dataframe which specifies the number of days since the first day of the time period (i.e. 1974-10-01).
I already have the code and it worked perfectly for a really similar dataframe but I do not know why with this 2nd dataframe it does not work.
1) The code is the following:
library(lubridate)
ref_date <- dmy("01-10-1974")
df$Day <- as.numeric(difftime(df$Date, ref_date))
2) The first rows of my dataframe are:
Code Area Date Height
1 2001 551.4 1975-04-01 120.209
2 2001 551.4 1976-01-06 158.699
3 2001 551.4 1977-01-21 128.289
4 2001 551.4 1978-02-23 198.254
5 2001 551.4 1979-07-31 131.811
[....]
3) What I obtain with my code (1) is the following:
Code Area Date Day Height
1 2001 551.4 1975-04-01 15724800 120.209
2 2001 551.4 1976-01-06 39916800 158.699
3 2001 551.4 1977-01-21 72835200 128.289
4 2001 551.4 1978-02-23 107222400 198.254
5 2001 551.4 1979-07-31 152409600 131.811
[....]
I spent more than 2 hours wondering why without any clue.
Any suggestion?
Another option is to set the units explicitly:
difftime(df$Date, ref_date, units = "days")
Are you looking for something like the example below?
df <- data.frame(Date = c("1975-04-01"))
> df
Date
1 1975-04-01
df$new_col <- as.Date(as.character(df$Date), format="%Y-%m-%d") - as.Date(as.character("1974-10-01"), format="%Y-%m-%d")
> df
Date new_col
1 1975-04-01 182 days
Your code seems to work as long as the Date is a character column.
library(lubridate)
ref_date <- dmy("01-10-1974")
df<- data.frame(Code=2001, Area=551.4, Date=c("1975-04-01","1976-01-06","1977-01-21","1978-02-23","1979-07-31"), Height=c(120.209, 158.699, 128.289, 198.254, 131.811))
df$Day <- as.numeric(difftime(df$Date, ref_date))
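The values shown in the question are exactly the expected day counts multiplied by 86400 (for example 15724800 / 86400 = 182), which suggests the difference came back in seconds rather than days. A small sketch, assuming the dates can be converted with as.Date(), that fixes the units explicitly so the result does not depend on how the column is stored:

```r
ref_date <- as.Date("1974-10-01")
dates <- as.Date(c("1975-04-01", "1976-01-06", "1977-01-21"))

# Ask for days explicitly instead of relying on the automatic choice of units
day <- as.numeric(difftime(dates, ref_date, units = "days"))
day

# The same differences expressed in seconds, for comparison with the question's output
secs <- as.numeric(difftime(as.POSIXct(dates), as.POSIXct(ref_date), units = "secs"))
secs
```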