I am using R, and I need to set up a loop (I think) where I extract the month from the date and assign a season. I would like to assign winter to months 12, 1, 2; spring to 3, 4, 5; summer to 6, 7, 8; and fall to 9, 10, 11. I have a subset of the data below. I am awful with loops and couldn't figure it out. Also, for the date, I wasn't sure how packages like lubridate would work.
"","UT_TDS_ID_2011.Monitoring.Location.ID","UT_TDS_ID_2011.Activity.Start.Date","UT_TDS_ID_2011.Value","UT_TDS_ID_2011.Season"
"1",4930585,"7/28/2010 0:00",196,""
"2",4933115,"4/21/2011 0:00",402,""
"3",4933115,"7/23/2010 0:00",506,""
"4",4933115,"6/14/2011 0:00",204,""
"8",4933115,"12/3/2010 0:00",556,""
"9",4933157,"11/18/2010 0:00",318,""
"10",4933157,"11/6/2010 0:00",328,""
"11",4933157,"7/23/2010 0:00",290,""
"12",4933157,"6/14/2011 0:00",250,""
Regarding the subject/title of the question, it's actually possible to do this without extracting the month. The first two solutions below do not extract the month. There is also a third solution which does extract the month, but only to increment it.
1) as.yearqtr/as.yearmon Convert the dates to year/month and add one month (1/12). Then the calendar quarters correspond to the seasons so convert to year/quarter, yq, and label the quarters as shown:
library(zoo)
yq <- as.yearqtr(as.yearmon(DF$dates, "%m/%d/%Y") + 1/12)
DF$Season <- factor(format(yq, "%q"), levels = 1:4,
labels = c("winter", "spring", "summer", "fall"))
giving:
dates Season
1 7/28/2010 summer
2 4/21/2011 spring
3 7/23/2010 summer
4 6/14/2011 summer
5 12/3/2010 winter
6 11/18/2010 fall
7 11/6/2010 fall
8 7/23/2010 summer
9 6/14/2011 summer
1a) A variation of this is to use chron's quarters which produces a factor so that levels=1:4 does not have to be specified. To use chron replace the last line in (1) with:
library(chron)
DF$Season <- factor(quarters(as.chron(yq)),
labels = c("winter", "spring", "summer", "fall"))
chron could also be used in conjunction with the remaining solutions.
2) cut. This solution only uses the base of R. First convert the dates to the first of the month using cut and add 32 to get a date in the next month, d. The quarters corresponding to d are the seasons, so compute the quarters using quarters and construct the labels in the same fashion as in the first solution:
d <- as.Date(cut(as.Date(DF$dates, "%m/%d/%Y"), "month")) + 32
DF$Season <- factor(quarters(d), levels = c("Q1", "Q2", "Q3", "Q4"),
labels = c("winter", "spring", "summer", "fall"))
giving the same answer.
3) POSIXlt This solution also uses only the base of R:
p <- as.POSIXlt(as.Date(DF$dates, "%m/%d/%Y"))
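p$mday <- 1          # POSIXlt keeps the day of the month in mday
p$mon <- p$mon + 1   # POSIXlt keeps the month (0-11) in mon
# as.Date() normalizes an out-of-range month (December + 1 becomes January)
DF$Season <- factor(quarters(as.Date(p)), levels = c("Q1", "Q2", "Q3", "Q4"),
labels = c("winter", "spring", "summer", "fall"))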
Note 1: We could optionally omit levels= in all these solutions if we knew that every season appears.
Note 2: We used this data frame:
DF <- data.frame(dates = c('7/28/2010', '4/21/2011', '7/23/2010',
'6/14/2011', '12/3/2010', '11/18/2010', '11/6/2010', '7/23/2010',
'6/14/2011'))
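The idea shared by the solutions above (shift the date forward by one month, then take the calendar quarter) can also be sketched as plain integer arithmetic on the month number. This is only an illustration of the reasoning, not one of the original solutions; the `month` vector is typed in by hand from the sample dates.

```r
# Month numbers of the sample dates (7/28/2010, 4/21/2011, ...), entered by hand
month <- c(7, 4, 7, 6, 12, 11, 11, 7, 6)

# month %% 12 maps December to 0, so 12, 1, 2 fall into the same bucket of three
season_index <- (month %% 12) %/% 3 + 1

season <- factor(season_index, levels = 1:4,
                 labels = c("winter", "spring", "summer", "fall"))
season
```

Keeping `levels = 1:4` means the factor has all four seasons even when one of them does not occur in the data, as mentioned in Note 1.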
Using only base R, you can convert the "datetime" column to "Date" class (as.Date(..)), extract the month (format(..., '%m')), and change the character value to numeric (as.numeric()). Create an "indx" vector whose values are the season names and whose names are the month numbers "12" and "1" to "11" (setNames(..)), and use it to look up the corresponding "Season" for each element of the "months" vector.
months <- as.numeric(format(as.Date(df$datetime, '%m/%d/%Y'), '%m'))
indx <- setNames( rep(c('winter', 'spring', 'summer',
'fall'),each=3), c(12,1:11))
df$Season <- unname(indx[as.character(months)])
df
# datetime Season
#1 7/28/2010 0:00 summer
#2 4/21/2011 0:00 spring
#3 7/23/2010 0:00 summer
#4 6/14/2011 0:00 summer
#5 12/3/2010 0:00 winter
#6 11/18/2010 0:00 fall
#7 11/6/2010 0:00 fall
#8 7/23/2010 0:00 summer
#9 6/14/2011 0:00 summer
Or, as @Roland mentioned in the comments, you can use strptime to convert the "datetime" to "POSIXlt" and extract the month ($mon, which is zero-based, hence the +1):
months <- strptime(df$datetime, format='%m/%d/%Y %H:%M')$mon +1
and use the same method as above
data
df <- data.frame(datetime = c('7/28/2010 0:00', '4/21/2011 0:00',
'7/23/2010 0:00', '6/14/2011 0:00', '12/3/2010 0:00', '11/18/2010 0:00',
'11/6/2010 0:00', '7/23/2010 0:00', '6/14/2011 0:00'),stringsAsFactors=FALSE)
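As a rough sketch, the same lookup can be applied directly to the column names from the question. The three CSV rows below are copied from the question's sample; the object names `csv`, `tds` and `lookup` are made up for this example.

```r
# Three rows copied from the question's CSV sample
csv <- '"","UT_TDS_ID_2011.Monitoring.Location.ID","UT_TDS_ID_2011.Activity.Start.Date","UT_TDS_ID_2011.Value","UT_TDS_ID_2011.Season"
"1",4930585,"7/28/2010 0:00",196,""
"2",4933115,"4/21/2011 0:00",402,""
"8",4933115,"12/3/2010 0:00",556,""'
tds <- read.csv(text = csv, row.names = 1, stringsAsFactors = FALSE)

# Month number (1-12); as.Date() ignores the trailing " 0:00"
mon <- as.POSIXlt(as.Date(tds$UT_TDS_ID_2011.Activity.Start.Date,
                          format = "%m/%d/%Y"))$mon + 1

# Named lookup vector: names are month numbers, values are seasons
lookup <- setNames(rep(c("winter", "spring", "summer", "fall"), each = 3),
                   c(12, 1:11))
tds$UT_TDS_ID_2011.Season <- unname(lookup[as.character(mon)])
tds
```

No loop is needed because indexing the lookup vector is vectorized over all rows.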
I have a data frame that gives me each unique individual's age in years (to 2 decimal places) on the date of an event:
id eventDate ageatEvent
1 10-Jun-90 44.07
2 15-Feb-91 30.45
3 20-Dec-93 59.43
4 13-Nov-93 45.84
5 26-Jul-95 25.94
6 10-Mar-99 21.97
7 20-Jun-05 32.28
8 31-Jan-96 48.82
Using R, I would like to calculate each individual's date of birth (as precise as is possible given the data).
I have tried using lubridate but it is unclear what I should be converting the numeric 'age' column into, in order to subtract from the POSIXct instant eventDate.
Many thanks in advance for any assistance.
You can do that with lubridate package.
dmy converts characters to dates using the order day-month-year. dyears converts a numeric to a duration in years.
The difference gives the result.
library(tidyverse)
library(lubridate)
df <- tribble(
~id, ~eventDate, ~ageatEvent,
1, "10-Jun-90", 44.07,
2, "15-Feb-91", 30.45,
3, "20-Dec-93", 59.43)
df <- df %>%
mutate(eventDate = dmy(eventDate), ageatEvent = dyears(ageatEvent)) %>%
mutate(dateOfBirth = eventDate - ageatEvent)
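For comparison, here is a minimal base R sketch of the same subtraction. It assumes that one year of age corresponds to 365.25 days on average, which is my assumption and not something stated in the question, so the result is only accurate to within a few days. The `events` data frame is a hypothetical copy of the first three rows.

```r
# Hypothetical copy of the first rows of the question's data
events <- data.frame(
  id = 1:3,
  eventDate = c("10-Jun-90", "15-Feb-91", "20-Dec-93"),
  ageatEvent = c(44.07, 30.45, 59.43),
  stringsAsFactors = FALSE
)

# Split "10-Jun-90" into day, month abbreviation and two-digit year;
# matching against month.abb avoids relying on the locale for "%b"
parts <- do.call(rbind, strsplit(events$eventDate, "-"))
iso <- sprintf("%s-%02d-%s", parts[, 3], match(parts[, 2], month.abb), parts[, 1])
events$eventDate <- as.Date(iso, format = "%y-%m-%d")

# Assumption: one year of age is 365.25 days on average
events$dateOfBirth <- events$eventDate - round(events$ageatEvent * 365.25)
events
```

Since the age is only given to two decimal places (about 3.65 days), the date of birth cannot be recovered more precisely than that anyway.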
I'm trying to subset a data frame in a for loop to create a smaller data.frame. This is my data.frame:
day rain in mm temperature in °C season
1 201 20 summer
2 156 18 summer
3 56 -4 winter
4 98 15 spring
I want to extract a data.frame for each season (with all columns). Here is my code:
for (season in seasons){
a<- weather[which(weather$season %in% season[1]) , ,drop=TRUE]
...
}
Unfortunately, the subsetting doesn't work. When I use
a<- weather[which(weather$season %in% "summer") , ,drop=TRUE] it works perfectly. This also does not work properly:
season <- "summer"
a<- weather[which(weather$season %in% season[1]) , ,drop=TRUE]
Does anyone see the problem with my code? Thank you.
It works with dplyr.
library(dplyr)
mydf <- data.frame(day = c(1,2,3,4),
rain = c(201,156,56,98),
temperature = c(20,18,-4,15),
season = c("summer", "summer", "winter", "spring"))
seasons <- c("spring", "summer", "autumn", "winter")
for (sea in seasons) {
a <- dplyr::filter(mydf, season == sea)
print(a)
}
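If one data frame per season is all that is needed, a base R sketch with split() avoids the explicit filtering. This is an alternative to the dplyr loop above, not a diagnosis of the original code; `by_season` is a name chosen for this example.

```r
mydf <- data.frame(day = 1:4,
                   rain = c(201, 156, 56, 98),
                   temperature = c(20, 18, -4, 15),
                   season = c("summer", "summer", "winter", "spring"))

# split() returns a named list with one data frame per season present in the data
by_season <- split(mydf, mydf$season)

for (sea in names(by_season)) {
  cat("Season:", sea, "\n")
  print(by_season[[sea]])
}
```

Seasons that never occur (here "autumn") simply do not appear in the list, so the loop never sees an empty data frame.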
I have a data frame
> df
Age year sex
12 80210 F
13 9123 M
I want to convert the year value 80210 to 26june1982. How can I do this so that the new data frame contains year in day-month-year format, converted from Julian days?
You can convert Julian dates to dates using as.Date and specifying the appropriate origin:
as.Date(8210, origin=as.Date("1960-01-01"))
#[1] "1982-06-24"
However, 80210 needs an origin pretty long ago.
You should subtract the origin from the year column.
as.Date(c(80210,9123)-80210,origin='1982-06-26')
[1] "1982-06-26" "1787-11-08"
There are some options for doing this job in the R package date.
See for example on page 4, the function date.mmddyy, which says:
Given a vector of Julian dates, this returns them in the form “10/11/89”, “28/7/54”, etc.
Try this code:
age = c(12,13)
year = c(8210,9123)
sex = c("F","M")
df = data.frame(age, year, sex) # no cbind(), which would coerce every column to character
library(date)
date = date.mmddyy(year, sep = "/")
df2 = transform(df,year=date) #hint provided by jilber
df2
age year sex
1 12 6/24/82 F
2 13 12/23/84 M
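The same conversion can be sketched in base R without the date package, applied to the whole column. It assumes, as in the answers above, that the values are day counts from 1960-01-01 and that the first value is 8210 rather than 80210; the column names `date` and `date_dmy` are made up for this example.

```r
df <- data.frame(age = c(12, 13), year = c(8210, 9123), sex = c("F", "M"))

# Assumption: day counts are relative to 1960-01-01
df$date <- as.Date(df$year, origin = "1960-01-01")

# Day-month-year text representation of the same dates
df$date_dmy <- format(df$date, "%d-%m-%Y")
df
```

Keeping the Date column alongside the formatted text is usually convenient, because the text version can no longer be sorted or subtracted as a date.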
I struggle mightily with dates in R and could do this pretty easily in SPSS, but I would love to stay within R for my project.
I have a date column in my data frame and want to remove the year completely in order to leave the month and day. Here is a peek at my original data.
> head(ds$date)
[1] "2003-10-09" "2003-10-11" "2003-10-13" "2003-10-15" "2003-10-18" "2003-10-20"
> class((ds$date))
[1] "Date"
I "want" it to be.
> head(ds$date)
[1] "10-09" "10-11" "10-13" "10-15" "10-18" "10-20"
> class((ds$date))
[1] "Date"
If possible, I would love to set the first date to be October 1st instead of January 1st.
Any help you can provide will be greatly appreciated.
EDIT: I felt like I should add some context. I want to plot an NHL player's performance over the course of a season which starts in October and ends in April. To add to this, I would like to facet the plots by each season which is a separate column in my data frame. Because I want to compare cumulative performance over the course of the season, I believe that I need to remove the year portion, but maybe I don't; as I indicated, I struggle with dates in R. What I am looking to accomplish is a plot that compares cumulative performance over relative dates by season and have the x-axis start in October and end in April.
> d = as.Date("2003-10-09", format="%Y-%m-%d")
> format(d, "%m-%d")
[1] "10-09"
Is this what you are looking for?
library(ggplot2)
library(reshape2) ## provides melt(), used below
## make up data for two seasons a and b
a = as.Date("2010/10/1")
b = as.Date("2011/10/1")
a.date <- seq(a, by='1 week', length=28)
b.date <- seq(b, by='1 week', length=28)
## make up some score data
a.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
b.score <- abs(trunc(rnorm(28, mean = 10, sd = 5)))
## create a data frame
df <- data.frame(a.date, b.date, a.score, b.score)
df
## Since I am using ggplot I better create a "long formated" data frame
df.molt <- melt(df, measure.vars = c("a.score", "b.score"))
levels(df.molt$variable) <- c("First season", "Second season")
df.molt
Then, I am using ggplot2 for plotting the data:
## plot it
ggplot(aes(y = value, x = a.date), data = df.molt) + geom_point() +
geom_line() + facet_wrap(~variable, ncol = 1) +
scale_x_date("Date", date_labels = "%m-%d") ## date_labels replaces the old format argument
If you want to modify the x-axis (e.g., display format), then you'll probably be interested in scale_x_date.
You have to remember Date is a numeric format, representing the number of days passed since the "origin" of the internal date counting :
> str(Date)
Class 'Date' num [1:10] 14245 14360 14475 14590 14705 ...
This is the same as in Excel, if you want a reference. Hence the solution with format is perfectly valid.
Now if you want to set the first date of a year as October 1st, you can construct some year index like this :
redefine.year <- function(x,start="10-1"){
year <- as.numeric(strftime(x,"%Y"))
yearstart <- as.Date(paste(year,start,sep="-"))
year + (x >= yearstart) - min(year) + 1
}
Testing code :
Start <- as.Date("2009-1-1")
Stop <- as.Date("2011-11-1")
Date <- seq(Start,Stop,length.out=10)
data.frame( Date=as.character(Date),
year=redefine.year(Date))
gives
Date year
1 2009-01-01 1
2 2009-04-25 1
3 2009-08-18 1
4 2009-12-11 2
5 2010-04-05 2
6 2010-07-29 2
7 2010-11-21 3
8 2011-03-16 3
9 2011-07-09 3
10 2011-11-01 4
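To get an x-axis that runs from October to April, one option is to plot the number of days since the season started instead of the calendar date. The helper below is a hypothetical sketch along those lines (the name `season_day` and the `start_month` argument are made up here); it assumes every season starts on the first day of `start_month`.

```r
# Days elapsed since the most recent 1st of start_month (default: October 1st)
season_day <- function(x, start_month = 10) {
  lt <- as.POSIXlt(x)
  year <- lt$year + 1900
  # Dates before start_month belong to the season that began the previous year
  start_year <- ifelse(lt$mon + 1 >= start_month, year, year - 1)
  start <- as.Date(sprintf("%d-%02d-01", start_year, start_month))
  as.integer(x - start)
}

d <- as.Date(c("2003-10-01", "2003-10-09", "2004-01-01"))
data.frame(date = d, day_of_season = season_day(d))
```

Using this offset as the x aesthetic and faceting by the season column would line up all seasons on the same relative axis.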
I need to convert date (m/d/y format) into 3 separate columns on which I hope to run an algorithm (I'm trying to convert my dates into Julian Day Numbers). I saw this suggestion for another user for separating data out into multiple columns using Oracle. I'm using R and am thoroughly stuck about how to code this appropriately. Would A1, A2, ... represent my new column headings, and what would the format difference be with the "update set" section?
update <tablename> set A1 = substr(ORIG, 1, 4),
A2 = substr(ORIG, 5, 6),
A3 = substr(ORIG, 11, 6),
A4 = substr(ORIG, 17, 5);
I'm trying hard to improve my skills in R but cannot figure this one...any help is much appreciated. Thanks in advance... :)
I use the format() method for Date objects to pull apart dates in R. Using Dirk's datetext, here is how I would go about breaking up a date into its constituent parts:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
datetxt <- as.Date(datetxt)
df <- data.frame(date = datetxt,
year = as.numeric(format(datetxt, format = "%Y")),
month = as.numeric(format(datetxt, format = "%m")),
day = as.numeric(format(datetxt, format = "%d")))
Which gives:
> df
date year month day
1 2010-01-02 2010 1 2
2 2010-02-03 2010 2 3
3 2010-09-10 2010 9 10
Note what several others have said; you can get the Julian dates without splitting out the various date components. I added this answer to show how you could do the breaking apart if you needed it for something else.
Given a text variable x, like this:
> x
[1] "10/3/2001"
then:
> as.Date(x,"%m/%d/%Y")
[1] "2001-10-03"
converts it to a date object. Then, if you need it:
> julian(as.Date(x,"%m/%d/%Y"))
[1] 11598
attr(,"origin")
[1] "1970-01-01"
gives you a Julian date (relative to 1970-01-01).
Don't try the substring thing...
See help(as.Date) for more.
Quick ones:
Julian date converters already exist in base R, see eg help(julian).
One approach may be to parse the date as a POSIXlt and to then read off the components. Other date / time classes and packages will work too but there is something to be said for base R.
Parsing dates as string is almost always a bad approach.
Here is an example:
datetxt <- c("2010-01-02", "2010-02-03", "2010-09-10")
dates <- as.Date(datetxt) ## you could examine these as well
plt <- as.POSIXlt(dates) ## now as POSIXlt types
plt[["year"]] + 1900 ## years are with offset 1900
#[1] 2010 2010 2010
plt[["mon"]] + 1 ## and months are on the 0 .. 11 interval
#[1] 1 2 9
plt[["mday"]]
#[1] 2 3 10
df <- data.frame(year=plt[["year"]] + 1900,
month=plt[["mon"]] + 1, day=plt[["mday"]])
df
# year month day
#1 2010 1 2
#2 2010 2 3
#3 2010 9 10
And of course
julian(dates)
#[1] 14611 14643 14862
#attr(,"origin")
#[1] "1970-01-01"
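Since the question starts from m/d/y text rather than ISO dates, here is a short sketch that combines the steps above for that input format. The sample strings in `x` and the data frame name `parts` are made up for illustration.

```r
# Hypothetical m/d/y input strings
x <- c("10/3/2001", "1/2/2010", "12/31/1999")

d <- as.Date(x, format = "%m/%d/%Y")   # parse once, as a proper Date
lt <- as.POSIXlt(d)                    # then read off the components

parts <- data.frame(date = d,
                    year = lt$year + 1900,   # years are stored with offset 1900
                    month = lt$mon + 1,      # months are stored as 0 .. 11
                    day = lt$mday,
                    julian = as.numeric(julian(d)))  # days since 1970-01-01
parts
```

The julian column is computed from the Date itself, so the year, month and day columns are only needed if something else uses them.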
To convert a date (m/d/y format) into 3 separate columns, consider the df:
df <- data.frame(date = c("01-02-18", "02-20-18", "03-23-18"))
df
date
1 01-02-18
2 02-20-18
3 03-23-18
Convert to date format
df$date <- as.Date(df$date, format="%m-%d-%y")
df
date
1 2018-01-02
2 2018-02-20
3 2018-03-23
To get three separate columns with year, month and day,
library(lubridate)
df$year <- year(ymd(df$date))
df$month <- month(ymd(df$date))
df$day <- day(ymd(df$date))
df
date year month day
1 2018-01-02 2018 1 2
2 2018-02-20 2018 2 20
3 2018-03-23 2018 3 23
Hope this helps.
Hi Gavin: another way [using your idea] is:
The data-frame we will use is oilstocks which contains a variety of variables related to the changes over time of the oil and gas stocks.
The variables are:
colnames(stocks)
"bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC"
"emMN" "emMN.1" "chdate" "chV" "cbO" "chC" "chMN" "chMX"
One of the first things to do is change the emdate field, which is an integer vector, into a date vector.
realdate<-as.Date(emdate,format="%m/%d/%Y")
Next we want to split emdate column into three separate columns representing month, day and year using the idea supplied by you.
> dfdate <- data.frame(date=realdate)
year=as.numeric (format(realdate,"%Y"))
month=as.numeric (format(realdate,"%m"))
day=as.numeric (format(realdate,"%d"))
ls() will include the individual vectors, day, month, year and dfdate.
Now merge the dfdate, day, month, year into the original data-frame [stocks].
ostocks<-cbind(dfdate,day,month,year,stocks)
colnames(ostocks)
"date" "day" "month" "year" "bpV" "bpO" "bpC" "bpMN" "bpMX" "emdate" "emV" "emO" "emC" "emMN" "emMX" "chdate" "chV"
"cbO" "chC" "chMN" "chMX"
Similar results and I also have date, day, month, year as separate vectors outside of the df.