R Split series into month - r

I have the data below with a date and a count. Please help me transform it so that the months are columns and each row holds the data for one year.
Date count
=================
2011-01-01 10578
2011-02-01 9330
2011-03-01 10686
2011-04-01 10260
2011-05-01 10032
2011-06-01 9762
2011-07-01 10308
2011-08-01 9966
2011-09-01 10146
2011-10-01 10218
2011-11-01 8826
2011-12-01 9504
to
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
------------------------------------------------------------------------------
2011 10578 9330 10686 10260 10032 9762 10308 9966 10146 10218 8826 9504
2012 ....

This is a perfect task for ts in base R. Suppose your data.frame is x; then using ts will produce the output you want. Note that start and end take a (year, period) pair, so only two numbers are needed.
> ts(x$count, start=c(2011,1), end=c(2011,12), frequency=12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011 10578 9330 10686 10260 10032 9762 10308 9966 10146 10218 8826 9504
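The call above covers a single year. As a minimal sketch of how the same idea extends to several years, the example below uses a hypothetical two-year data frame whose counts are placeholders (1 to 24), not real data. It also shows one way to get a plain matrix with upper-case month names, assuming the series covers complete years.

```r
# Hypothetical two-year data frame; the counts are placeholders, not real data.
x <- data.frame(
  Date  = seq(as.Date("2011-01-01"), by = "month", length.out = 24),
  count = 1:24
)

# Omitting `end` lets ts() infer it from the length of the data.
s <- ts(x$count, start = c(2011, 1), frequency = 12)
print(s)  # printed with one row per year and one column per month

# A plain matrix with upper-case month names, if a ts object is not wanted.
# This assumes every year in the data is complete (12 values per year).
m <- matrix(s, ncol = 12, byrow = TRUE,
            dimnames = list(unique(format(x$Date, "%Y")), toupper(month.abb)))
print(m)
```

The matrix form is convenient when the result has to be written out or merged with other tables, while the ts object keeps the time-series attributes.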

If your data is in x try something like this:
library(reshape2)
res <- dcast(transform(x, month = format(Date, format="%m"),
year = format(Date, "%Y")),
year ~ month, value.var="count")
rownames(res) <- res$year
res <- res[,-1]
names(res) <- toupper(month.abb[as.numeric(names(res))])
res
This assumes that x$Date is already a date. If not, you will need to first convert it to a date:
x$Date <- as.Date(x$Date)
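If adding reshape2 is not an option, a similar table can be sketched in base R with tapply. This is only an illustration under assumptions: the 2012 row below is a made-up value added to show how missing months appear, and the variable name wide is hypothetical.

```r
# Small example: two real rows from the question plus one made-up 2012 value.
x <- data.frame(
  Date  = as.Date(c("2011-01-01", "2011-02-01", "2012-01-01")),
  count = c(10578, 9330, 100)  # the 100 is a placeholder, not real data
)

# Cross-tabulate counts by year (rows) and month number (columns).
wide <- tapply(x$count,
               list(year = format(x$Date, "%Y"), month = format(x$Date, "%m")),
               sum)

# Replace "01", "02", ... with upper-case month abbreviations.
colnames(wide) <- toupper(month.abb[as.integer(colnames(wide))])
print(wide)
```

Months that do not occur for a given year come out as NA, which makes gaps in the data visible.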

Related

Is it possible to convert year-week date format to the first day of the week?

I have a Year-Week format date. Is it possible to convert it to the first day of the week i.e. 201553 is 2015-12-28 and 201601 is 2016-01-04.
I found here how to do it; however, it does not work correctly on my dates. Could you help me do it without the ISOweek package?
date<-c(201553L, 201601L, 201602L, 201603L, 201604L, 201605L, 201606L,
201607L, 201608L, 201609L)
as.POSIXct(paste(date, "0"),format="%Y%u %w")
Here's a way,
date<-data.frame(first = c(201553L, 201601L, 201602L, 201603L, 201604L, 201605L, 201606L,
201607L, 201608L, 201609L))
First separate the week and year from integer,
library(stringr)
library(dplyr)
date = date %>% mutate(week = str_sub(date$first,5,6))
date = date %>% mutate(year = str_sub(date$first,1,4))
Then use the aweek package to find the date,
library(aweek)
date = date %>% mutate(actual_date = get_date(week = date$week, year = date$year))
first week year actual_date
1 201553 53 2015 2015-12-28
2 201601 01 2016 2016-01-04
3 201602 02 2016 2016-01-11
4 201603 03 2016 2016-01-18
5 201604 04 2016 2016-01-25
6 201605 05 2016 2016-02-01
7 201606 06 2016 2016-02-08
8 201607 07 2016 2016-02-15
9 201608 08 2016 2016-02-22
10 201609 09 2016 2016-02-29
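For readers who prefer to avoid extra packages, the same dates can be sketched in base R. The helper below (iso_week_start is a hypothetical name) assumes the input follows the ISO 8601 convention, in which week 1 is the week containing 4 January and weeks start on Monday.

```r
# Hypothetical helper: Monday of an ISO year-week given as an integer YYYYWW.
iso_week_start <- function(yearweek) {
  year <- yearweek %/% 100
  week <- yearweek %% 100
  # By the ISO 8601 definition, 4 January always falls in week 1.
  jan4 <- as.Date(paste0(year, "-01-04"))
  # %u gives the weekday as 1 (Monday) to 7 (Sunday).
  wday <- as.integer(format(jan4, "%u"))
  # Step back to the Monday of week 1, then forward by whole weeks.
  jan4 - (wday - 1) + (week - 1) * 7
}

date <- c(201553L, 201601L, 201602L, 201609L)
iso_week_start(date)
```

The function does not validate the week number, so an input such as week 54 would silently roll over into the following year.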

Calculating first and last day of month from a yearmon object

I have a simple df with a column of dates in yearmon class:
df <- structure(list(year_mon = structure(c(2015.58333333333, 2015.66666666667,
2015.75, 2015.83333333333, 2015.91666666667, 2016, 2016.08333333333,
2016.16666666667, 2016.25, 2016.33333333333), class = "yearmon")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L))
I'd like a simple way, preferably using base R, lubridate or xts / zoo to calculate the first and last days of each month.
I've seen other packages that do this, but I'd like to stick with the aforementioned if possible.
We can use
library(dplyr)
library(lubridate)
library(zoo)
df %>%
mutate(firstday = day(year_mon), last = day(as.Date(year_mon, frac = 1)))
Using base R, you could convert the yearmon object to date using as.Date which would give you the first day of the month. For the last day, we could increment the date by a month (1/12) and subtract 1 day from it.
df$first_day <- as.Date(df$year_mon)
df$last_day <- as.Date(df$year_mon + 1/12) - 1
df
# year_mon first_day last_day
# <S3: yearmon> <date> <date>
# 1 Aug 2015 2015-08-01 2015-08-31
# 2 Sep 2015 2015-09-01 2015-09-30
# 3 Oct 2015 2015-10-01 2015-10-31
# 4 Nov 2015 2015-11-01 2015-11-30
# 5 Dec 2015 2015-12-01 2015-12-31
# 6 Jan 2016 2016-01-01 2016-01-31
# 7 Feb 2016 2016-02-01 2016-02-29
# 8 Mar 2016 2016-03-01 2016-03-31
# 9 Apr 2016 2016-04-01 2016-04-30
#10 May 2016 2016-05-01 2016-05-31
Use as.Date.yearmon from zoo as shown. frac specifies the fractional amount through the month to use so that 0 is beginning of the month and 1 is the end.
The default value of frac is 0.
You must already be using zoo if you are using yearmon (since that is where the yearmon methods are defined) so this does not involve using any additional packages beyond what you are already using.
If you are using dplyr, optionally replace transform with mutate.
transform(df, first = as.Date(year_mon), last = as.Date(year_mon, frac = 1))
gives:
year_mon first last
1 Aug 2015 2015-08-01 2015-08-31
2 Sep 2015 2015-09-01 2015-09-30
3 Oct 2015 2015-10-01 2015-10-31
4 Nov 2015 2015-11-01 2015-11-30
5 Dec 2015 2015-12-01 2015-12-31
6 Jan 2016 2016-01-01 2016-01-31
7 Feb 2016 2016-02-01 2016-02-29
8 Mar 2016 2016-03-01 2016-03-31
9 Apr 2016 2016-04-01 2016-04-30
10 May 2016 2016-05-01 2016-05-31

Convert character to Date (Thu Jun 14 *** 2018-05-14) in r [duplicate]

This question already has an answer here:
R convert string date (e.g. "October 1, 2014") to Date format
(1 answer)
Closed 4 years ago.
I have a dataframe about World Cup matches that includes date, location, match_name, etc.
In this dataframe I want to convert the date column to a date in the format "2018-05-06".
Here is my file:
date match_name price
1 Thu Jun 14 Russia v Saudi Arabia €453.92
2 Fri Jun 15 Egypt v Uruguay €90.00
3 Tue Jun 19 Russia v Egypt €297.45
4 Wed Jun 20 Uruguay v Saudi Arabia €95.00
and here is my expectation;
date match_name price
1 2018-05-14 Russia v Saudi Arabia €453.92
2 2018-05-15 Egypt v Uruguay €90.00
3 2018-05-19 Russia v Egypt €297.45
4 2018-05-20 Uruguay v Saudi Arabia €95.00
This sure is not the easiest way to do it, but I just wanted you to have a quick answer.
library(stringr)
library(dplyr)
Data=data.frame(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20"),match_name=c("a","b","c","d"),price=c(1,2,3,4))
Data$date=as.character(Data$date)
# pattern for the day number
regexp <- "[[:digit:]]+"
# pattern for the month abbreviation: "Jan|Feb|...|Dec"
monthregexp <- paste(month.abb, collapse="|")
Data=mutate(Data,datenum=str_extract(date, regexp))
# match() is vectorised, so it replaces a chain of if/else tests,
# which only works on a single value and fails on a whole column
Data=mutate(Data,monthnum=sprintf("%02d", match(str_extract(date, monthregexp), month.abb)))
Data=mutate(Data,monthname=str_extract(date, monthregexp))
mutate(Data,Final_Date=paste0("2018-",monthnum,"-",datenum))
Resulting in
date match_name price datenum monthnum monthname Final_Date
1 Thu Jun 14 a 1 14 06 Jun 2018-06-14
2 Fri Jun 15 b 2 15 06 Jun 2018-06-15
3 Tue Jun 19 c 3 19 06 Jun 2018-06-19
4 Wed Jun 20 d 4 20 06 Jun 2018-06-20
OK, let's say you have this data.frame:
myDF <-as.data.frame(x=list(date=c("Thu Jun 14","Fri Jun 15","Tue Jun 19","Wed Jun 20")))
Which constructs the following data.frame:
date
1 Thu Jun 14
2 Fri Jun 15
3 Tue Jun 19
4 Wed Jun 20
Assuming that each game is in 2018:
#for handling month abbreviations in English:
Sys.setlocale("LC_TIME", "en_US.UTF-8")
myDF$date <- as.Date(paste0(substr(myDF$date,5,10),", 2018"),format="%b %d, %Y")
The resulting myDF:
date
1 2018-06-14
2 2018-06-15
3 2018-06-19
4 2018-06-20
You can change 2018 to any year you like where necessary.
To convert the variable "date" to a format like '2018-06-14', you can define the following function:
conv_date <- function(var, year){
var <- as.Date(paste0(var, " ", year), '%a %b %d %Y')
return(var)
}
where:
var - variable in your data table (i.e 'date')
year - the year you need
Example:
yours_df$date <- conv_date(yours_df$date, 2018)
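The answers that rely on %a and %b depend on the session locale providing English day and month names. As a sketch of a locale-independent alternative, the helper below (parse_match_date is a hypothetical name) looks the month up in the built-in month.abb constant. It assumes every input has the form "Thu Jun 14" and that the year is supplied separately.

```r
# Hypothetical helper: turn strings like "Thu Jun 14" into Dates for a given year.
parse_match_date <- function(x, year) {
  parts <- strsplit(x, " ", fixed = TRUE)
  # Second token is the month abbreviation, third token is the day.
  month <- match(vapply(parts, `[`, "", 2), month.abb)
  day   <- as.integer(vapply(parts, `[`, "", 3))
  # Build an ISO date string and let as.Date() parse it.
  as.Date(sprintf("%d-%02d-%02d", as.integer(year), month, day))
}

parse_match_date(c("Thu Jun 14", "Fri Jun 15", "Tue Jun 19", "Wed Jun 20"), 2018)
```

The weekday token is ignored, so the function does not check that the weekday agrees with the chosen year.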

Date formatting MMM-YYYY

I have a dataset with dates in following format:
Initial:
Jan-2015 Apr-2013 Jun-2014 Jan-2015 Jan-2016 Jan-2015 Jan-2016 Jan-2015 Apr-2012 Nov-2012 Jun-2013 Sep-2013
Final:
Feb-2014 Jan-2013 Sep-2014 Apr-2013 Sep-2014 Mar-2013 Aug-2012 Apr-2012 Oct-2012 Oct-2013 Jun-2014 Oct-2013
I would like to perform these steps:
Create dummy variables for month and year
Subtract one set of dates from the other to find the duration (final - initial) in months
How can I do these in R?
You could use as.yearmon from the zoo package for this.
library(zoo)
12 * (as.yearmon("Jan-2015", "%b-%Y") - as.yearmon("Feb-2014", "%b-%Y"))
# result
# [1] 11
To expand on @neilfws's answer, you can use the month and year functions from the lubridate package to create your dummy variables with the month and year in your data frame.
Here is the code:
library(lubridate)
library(zoo)
df <- data.frame(Initial = c("Jan-2015", "Apr-2013", "Jun-2014", "Jan-2015", "Jan-2016", "Jan-2015",
"Jan-2016", "Jan-2015", "Apr-2012", "Nov-2012", "Jun-2013", "Sep-2013"),
Final = c("Feb-2014", "Jan-2013", "Sep-2014", "Apr-2013", "Sep-2014", "Mar-2013",
"Aug-2012", "Apr-2012", "Oct-2012", "Oct-2013", "Jun-2014", "Oct-2013"))
df$Initial <- as.character(df$Initial)
df$Final <- as.character(df$Final)
df$Initial <- as.yearmon(df$Initial, "%b-%Y")
df$Final <- as.yearmon(df$Final, "%b-%Y")
df$month_initial <- month(df$Initial)
df$year_intial <- year(df$Initial)
df$month_final <- month(df$Final)
df$year_final <- year(df$Final)
df$Difference <- 12*(df$Initial-df$Final)
And here is the final data.frame:
> head(df)
Initial Final month_initial year_intial month_final year_final Difference
1 Jan 2015 Feb 2014 1 2015 2 2014 11
2 Apr 2013 Jan 2013 4 2013 1 2013 3
3 Jun 2014 Sep 2014 6 2014 9 2014 -3
4 Jan 2015 Apr 2013 1 2015 4 2013 21
5 Jan 2016 Sep 2014 1 2016 9 2014 16
6 Jan 2015 Mar 2013 1 2015 3 2013 22
Hope this helps!
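The duration can also be sketched in base R without zoo or lubridate. The helper below (months_between is a hypothetical name) assumes strings of the exact form "Jan-2015" and returns final minus initial, as the question asks, so its sign is the opposite of the Difference column above, which is computed as Initial minus Final.

```r
# Hypothetical helper: whole months from `initial` to `final` ("Mon-YYYY" strings).
months_between <- function(initial, final) {
  # Convert "Jan-2015" to a running month count, independent of the locale.
  to_index <- function(x) {
    month <- match(substr(x, 1, 3), month.abb)
    year  <- as.integer(substr(x, 5, 8))
    12L * year + month
  }
  to_index(final) - to_index(initial)
}

months_between(c("Jan-2015", "Jun-2014"), c("Feb-2014", "Sep-2014"))
```

Working with integer month counts avoids the small floating-point residue that can appear when multiplying a yearmon difference by 12.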

How do I use a column from a dataframe in R to index another dataframe?

I have a dataframe StateList with 2 columns, STATE and Month. I have another data frame, StateTemp, with the average temperature of each US state for each month. I am trying to create a third column, StateList$Temp, which gets the temperature from StateTemp by using the values of StateList$STATE and StateList$Month as an index into StateTemp. Please see below for reference. Any help is greatly appreciated.
head(StateList)
STATE Month
1 FL Jan
3 MD Jan
4 MD Jan
5 WI Jan
6 UT Jan
12 NY Jan
Second object:
head(StateTemp)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
AL 44.29 48.04 55.47 61.99 69.89 76.75 79.87 79.04 73.88 63.08 54.10 46.85
AZ 42.27 46.24 51.03 57.63 66.01 75.51 80.19 78.50 72.52 61.61 49.64 42.51
AR 38.48 43.76 51.96 60.36 68.62 76.40 80.57 79.26 72.26 61.47 50.32 41.59
CA 45.14 48.51 51.76 56.50 63.11 70.18 75.32 74.62 69.97 61.56 51.17 44.98
CO 23.71 28.34 35.57 43.06 52.50 62.15 67.60 65.75 57.72 46.64 33.51 25.20
CT 25.96 28.43 36.94 47.07 57.77 66.29 71.52 69.77 61.68 50.60 41.43 31.13
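Try this. It relies on the fact that a two-column matrix, passed as the single argument to the "[" function, indexes a dimensioned object by (row, column) pairs. Note that STATE and Month need to be character vectors for this; if they are factors, cbind() replaces them with their integer codes and the lookup goes by position instead of by name.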
> StateList$Temp <- StateTemp[ with( StateList, cbind( STATE, Month) ) ]
> StateList
STATE Month Temp
1 FL Jan 44.29
3 MD Jan 42.27
4 MD Jan 42.27
5 WI Jan 23.71
6 UT Jan 45.14
12 NY Jan 38.48
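To make the matrix-indexing idea concrete, here is a small self-contained sketch. It uses only the AL and AZ values for Jan and Feb from the question's table; the rows of StateList are made up for illustration, and the columns are deliberately created as factors to show the conversion step.

```r
# Lookup table: two states and two months taken from the question's data.
StateTemp <- data.frame(Jan = c(44.29, 42.27), Feb = c(48.04, 46.24),
                        row.names = c("AL", "AZ"))

# Made-up rows to look up; stored as factors on purpose.
StateList <- data.frame(STATE = c("AZ", "AL", "AZ"),
                        Month = c("Feb", "Jan", "Jan"),
                        stringsAsFactors = TRUE)

# Convert to character so the lookup goes by dimnames, not by factor codes.
idx <- cbind(as.character(StateList$STATE), as.character(StateList$Month))

# Each row of idx is a (row name, column name) pair into the matrix.
StateList$Temp <- as.matrix(StateTemp)[idx]
print(StateList)
```

Because the lookup is a single vectorised call, it should scale to a long StateList far better than a row-by-row loop.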
You can just reshape your StateTemp to get what you want (in this example using dplyr & tidyr):
StateTemp <- read.table(text=" Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
AL 44.29 48.04 55.47 61.99 69.89 76.75 79.87 79.04 73.88 63.08 54.10 46.85
AZ 42.27 46.24 51.03 57.63 66.01 75.51 80.19 78.50 72.52 61.61 49.64 42.51
AR 38.48 43.76 51.96 60.36 68.62 76.40 80.57 79.26 72.26 61.47 50.32 41.59
CA 45.14 48.51 51.76 56.50 63.11 70.18 75.32 74.62 69.97 61.56 51.17 44.98
CO 23.71 28.34 35.57 43.06 52.50 62.15 67.60 65.75 57.72 46.64 33.51 25.20
CT 25.96 28.43 36.94 47.07 57.77 66.29 71.52 69.77 61.68 50.60 41.43 31.13", header=TRUE)
library(tidyr)
library(dplyr)
StateTemp %>%
add_rownames(var="State") %>%
gather(Month, Temp, -State)
## Source: local data frame [72 x 3]
##
## State Month Temp
## 1 AL Jan 44.29
## 2 AZ Jan 42.27
## 3 AR Jan 38.48
## 4 CA Jan 45.14
## 5 CO Jan 23.71
## 6 CT Jan 25.96
## 7 AL Feb 48.04
## 8 AZ Feb 46.24
## 9 AR Feb 43.76
## 10 CA Feb 48.51
## .. ... ... ...
If you like a more "traditional" approach:
# state list that fits the temperature data
StateList <- data.frame( STATE = c( "AL", "CT", "CA", "AZ", "CO", "AR" ),
Month = c( "Jan", "Feb", "Mar", "Jan", "Jan", "Feb" ),
stringsAsFactors = FALSE )
# create column for temperature values
StateList$Temp <- 0
# fill it row by row
for( i in 1 : length( StateList$STATE ) )
{
s <- StateList[ i, 1 ] # get state name
m <- StateList[ i, 2 ] # get month name
# find in matrix:
StateList$Temp[ i ] <- StateTemp[ rownames( StateTemp ) == s,
colnames( StateTemp ) == m ]
}
# I guess this is what you want to see:
StateList
STATE Month Temp
1 AL Jan 44.29
2 CT Feb 28.43
3 CA Mar 51.76
4 AZ Jan 42.27
5 CO Jan 23.71
6 AR Feb 43.76
Thank you everyone for your responses. BondedDust, that was awesome. Vaetchen, your solution is great too. After I posted, I managed to get some code with a for loop working, as below. BondedDust's solution is much more elegant than mine; I need to get better with the [ function. hrbrmstr, I should have expressed it more clearly: I was not reshaping StateTemp but adding a third column to a two-column dataframe, StateList, with 150K rows. StateTemp is basically a lookup table to populate it. As usual, there seem to be over a hundred ways to skin a cat in R.
for (i in 1:nrow(StateList)) {
StateList$Temp[i] <- StateTemp[StateList$STATE[i], StateList$Month[i]]
}
