I am trying to recode my time variable in my dataset. Currently, my dataset reflects data for all of December and I would like to re-code the dates so that there is a variable that includes week1, week2, week3, and week4.
My date is formatted as
december$DATE <- as.Date(december$DATE, "%m/%d/%Y")
This is my current attempt at re-coding, but to no avail:
december$week <- cut (december$DATE,
breaks = c(-Inf, 12/08/2016, 12/15/2016, 12/22/2016, Inf),
labels=c("W1", "W2", "W3", "W4"))
The traditional way of recoding continuous into categorical is not applicable in this case. Any suggestions?
The ISOWeek package allows you to convert Date-objects to ISO 8601 weeks. For example,
ISOweek::ISOweek('2016-12-08')
yields
[1] "2015-W53"
This could do it too:
december$week <- cut (december$DATE,
breaks = as.Date(c("2016-12-01", "2016-12-08", "2016-12-15", "2016-12-22", "2016-12-31")),
labels=c("W1", "W2", "W3", "W4"))
You could consider adding a column of week numbers based on the date e.g.
require(plyr)
december$week <- format(december$date, format="%U")
Use ?strptime to see that %U gives week of the year (00–53) starting on Sunday. %V starts on a Monday etc.
Related
I have a dataset, df with a column containing dates in yyyy format (ex: 2018). I’m trying to make a time series graph, and therefore need to convert them to a date format.
I initially tried, df$year <- as.Date(df$year) but was told I needed to specify an origin.
I then tried to convert to a character, then a date format:
df$year <- as.character(df$year)
df$year <- as.Date(df$year, format = “%Y”)
This seems to have worked, however when it changed the all the years to yyyy-mm-dd format, and set the month and day to April 5th, today. For example 2018 becomes 2018-04-05.
Does anyone have an idea for how to fix this? I would like it to start on January 1, not the day I am performing the conversion. I also tried strptime(as.character(beer_states$year), “%Y”) with the same result.
Any help would be very much appreciated. Thanks!
Add an arbitrary date and month before converting to date.
df$Date <- as.Date(paste(df$year, 1, 1), '%Y %m %d')
We can use as.yearmon
library(zoo)
df$Date <- as.Date(as.yearmon(df$year, '-01'), '%Y-%m'))
I have a date frame df that simply looks like this:
month values
2012M01 99904
2012M02 99616
2012M03 99530
2012M04 99500
2012M05 99380
2012M06 99103
2013M01 98533
2013M02 97600
2013M03 96431
2013M04 95369
2013M05 94527
2013M06 93783
with month that was written in form of "M01", "M02"... and so on.
Now I want to convert this column to date format, is there a way to do it in R with lubridate?
I also want to select columns that contain one certain month from each year, like only March columns from all these years, what is the best way to do it?
The short answer is that dates require a year, month and day, so you cannot convert directly to a date format. You have 2 options.
Option 1: convert to a year-month format using zoo::as.yearmon.
library(zoo)
df$yearmon <- as.yearmon(df$month, "%YM%m")
# you can get e.g. month from that
months(df$yearmon[1])
# [1] "January"
Option 2: convert to a date by assuming that the day is always the first day of the month.
df$date <- as.Date(paste(df$month, "01", sep = "-"), "%YM%m-%d")
For selection (and I think you mean select rows, not columns), you already have everything you need. For example, to select only March 2013:
library(dplyr)
df %>% filter(month == "2013M03")
Something like this will get it:
raw <- "2012M01"
dt <- strptime(raw,format = "%YM%m")
dt will be in a Posix format. The strptime function will assign a '1' as the default day of month to make it a complete date.
I have a data frame DF which has a column Month as a character string using the full English name of the month, and a column Year as numeric:
Year Month {several xi}
2016 April {numeric}
I need to plot several of the xi as a time series. What is the most efficient way to sort this data frame from the earliest month (January 2015) to the present? My attempts to convert "month" into a date-classed object using as.Date are not working as I'd like; they keep coming back sorted alphabetically.
Apologies if this is a noob question, but by sheer bad luck I have not had to work with date-class objects very often in my R career, so I'm not sure which of the various similar questions I am seeing can help me.
I concur with Gregor's suggestion of using the zoo package. I think it is good practice to combine dates into one variable. If you ever need to extract information about only the year or month you can use the lubridate package. Here is a simple example of how to use zoo.
library(zoo)
#Toy Data Set
d <- data.frame( Month = c("March", "April", "May", "March"), Year = c("2008", "1998", "1997", "1999"), stringsAsFactors = FALSE)
#Generating Yearmon
d$my <- as.yearmon(paste(d$Month, d$Year))
#Ordering the data
d <- d[order(d$my), ]
Make sure that the month and year variables in your data frame are not factors. They must respectively be of a character and numeric/integer class.
One note, if you plan to use ggplot instead of plot then you'll need to use scale_x_yearmon().
Finally, you mention that you had trouble with as.Date. As Gregor notes, this is because as.Date expects a format which contains a day, month and year. Therefore in your case you can insert an arbitrary day to use as.Date. For example, as.Date(paste(d$Month, 1, d$Year), "%B %d %Y"). For a complete list of the different date formats read this.
I have DF$Date in the as.Date format "yyyy-mm-dd" as shown below. Is there an easy way to get these grouped by month in ggplot?
Date
2015-07-30
2015-08-01
2015-08-02
2015-08-06
2015-08-11
2015-08-12
I've added a column DF$Month as "year Monthname" (e.g. April 2015.)
I'm doing this by DF$Month<-strftime(DF$Date,format="%B %Y")
Is there a quick way to factor the month/years so that they are ordinal?
I used a workaround by formatting using:
DF$Month<-strftime(DF$Date,format="%Y-%m") so that the larger numbers are first and subsequently the month number.
This gives the output, which is sortable:
DF$Month
"2015-07"
"2015-08"
This output allows me to get this grouping:
http://imgur.com/df1FI3s
When using this plot:
MonthlyActivity<-ggplot(DF,aes(x=Month, y=TotalSteps))+
geom_boxplot()
MonthlyActivity
Any alternatives so I can use the full month name and still be in the correct time order?
There are probably other solutions, but here is one with full month names as a factor. As you already found out, you need a x variable to group by. We can then treat it as a 'order a factor' problem instead of a date-scale problem.
#first, generate some data
dat <- data.frame(date=sample(seq(as.Date("01012015",format="%d%m%Y"),
as.Date("01082015", format="%d%m%Y"),by=1),1000,T),
value=rnorm(1000))
We find the minimum and maximum month, and do some date-arithmetic to allow for all start-days (so that february doesn't get skipped when the minimum date is on the 29th/30th/31st). I used lubridate for this.
library(lubridate)
min_month = min(dat$date)-day(min(dat$date))+1
max_month = max(dat$date)-day(max(dat$date))+1
We generate a grouping variable. It is a factor with labels like 'January 2015, March 2015'. However, we force the order by creating a sequence (by month) from min date to max date and formatting it in the same way.
dat$group <- factor(format(dat$date, "%B %Y"),
levels=format(seq(min_month, max_month,by="month"),
"%B %Y"))
This forces the ordering on the axis:
Try adding
scale_x_discrete(limits = month.abb)
so your code would be
MonthlyActivity<-ggplot(DF,aes(x=Month, y=TotalSteps))+ geom_boxplot()+scale_x_discrete(limits = month.abb)
you will need library(dplyr)
I have this .txt file:
http://pastebin.com/raw.php?i=0fdswDxF
First column (Date) shows date in month/day
So 0601 is the 1st of June
When I load this into R and I show the data, it removes the first 0 in the data.
So when loaded it looks like:
601
602
etc
For 1st of June, 2nd of June
For the months 10,11,12, it remains unchanged.
How do I change it back to 0601 etc.?
What I am trying to do is to change these days into the day of the year, for instance,
1st of January (0101) would be 1, and 31st of December would be 365.
There is no leap year to be considered.
I have the code to change this, if my data was shown as 0601 etc, but not as 601 etc.
copperNew$Date = as.numeric(as.POSIXct(strptime(paste0("2013",copperNew$Date), format="%Y%m%d")) -
as.POSIXct("2012-12-31"), units = "days")
Where Date of course is from the file linked above.
Please ask if you do not consider the description to be good enough.
You can use colClasses in the read.table function, then convert to POSIXlt and extract the year date. You are over complicating the process.
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF", header=TRUE,
colClasses=c("character", "integer", rep("numeric", 3)))
tmp <- as.POSIXlt( copperNew$Date, format='%m%d' )
copperNew$Yday <- tmp$yday
The as.POSIXct function is able to parse a string without a year (assumes the current year) and computes the day of the year for you.
d<-as.Date("0201", format = "%m%d")
strftime(d, format="%j")
#[1] "032"
First you parse your string and obtain Date object which represents your date (notice that it will add current year, so if you want to count days for some specific year add it to your string: as.Date("1988-0201", format = "%Y-%m%d")).
Function strftime will convert your Date to POSIXlt object and return day of year. If you want the result to be a numeric value, you can do it like this: as.numeric(strftime(d, format = "%j"))(Thanks Gavin Simpson)
Convert it to POSIXlt using a year that is not a leap-year, then access the yday element and add 1 (because yday is 0 on January 1st).
strptime(paste0("2011","0201"),"%Y%m%d")$yday+1
# [1] 32
From start-to-finish:
x <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
colClasses=c("character",rep("numeric",5)), header=TRUE)
x$Date <- strptime(paste0("2011",x$Date),"%Y%m%d")$yday+1
In which language?
If it's something like C#, Java or Javascript, I'd follow these steps:
1-) parse a pair of integers from that column;
2-) create a datetime variable whose day and month are taken from the integers from step one. Set the year to some fixed value, or to the current year.
3-) create another datetime variable, whose date is the 1st of February of the same year as the one in step 2.
The number of the day is the difference in days between the datetime variables, + 1 day.
This one worked for me:
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
header=TRUE, sep=" ", colClasses=c("character",
"integer",
rep("numeric", 3)))
copperNew$diff = difftime(as.POSIXct(strptime(paste0("2013",dat$Date),
format="%Y%m%d", tz="GMT")),
as.POSIXct("2012-12-31", tz="GMT"), units="days")
I had to specify the timezone (tz argument in as.POSIXct), otherwise I got two different timezones for the vectors I am subtracting and therefore non-integer days.