I have a dataframe with this kind of column data.
1906-02-20
1906-02-21
I want to create separate columns with years, months and days.
Intended output:
1906 02 20
1906 02 21
I have used things like strptime and lubridate before, but I am not able to figure this out.
format(as.Date('2014-12-31'),'%Y %m %d')
(obviously you want to replace as.Date('2014-12-31') with your date vector).
format() converts dates to strings using the format string provided. For the individual year, month and day values, you want:
myData$year <- format(as.Date('2014-12-31'),'%Y')
#> "2014"
myData$month <- format(as.Date('2014-12-31'),'%m')
#> "12"
myData$day <- format(as.Date('2014-12-31'),'%d')
#> "31"
I often refer to this page when I need to look up the meaning of the format strings.
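As a minimal sketch of applying this to a whole column rather than a single literal date: the data frame myData and its column name date are assumptions made up for illustration, using the dates from the question.

```r
# Hypothetical data frame; the column name `date` is an assumption
myData <- data.frame(date = c("1906-02-20", "1906-02-21"),
                     stringsAsFactors = FALSE)

# Parse once, then pull out each component as a string
dates <- as.Date(myData$date)
myData$year  <- format(dates, "%Y")
myData$month <- format(dates, "%m")
myData$day   <- format(dates, "%d")
myData
```

The new columns are character vectors; wrap the format() calls in as.integer() if numbers are needed.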
Look at this:
X <- data.frame()
X <- rbind(strsplit(x = "1906-02-20", split = "-")[[1]])
colnames(X) <- c("year", "month", "day")
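The snippet above splits a single string. A possible sketch of the same idea for a whole vector of dates, assuming every element has the year-month-day layout (the vector name dates is hypothetical):

```r
dates <- c("1906-02-20", "1906-02-21")

# strsplit() returns one character vector per date; bind them row-wise
parts <- do.call(rbind, strsplit(dates, split = "-", fixed = TRUE))

X <- as.data.frame(parts, stringsAsFactors = FALSE)
colnames(X) <- c("year", "month", "day")
X
```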
Try the separate function in the tidyr package
library(tidyr)
separate(df, "V1", into = c("year", "month", "day"), sep = "-")
# year month day
# 1 1906 02 20
# 2 1906 02 21
Data
df <- read.table(text = "
1906-02-20
1906-02-21
")
Related
I am having a data frame like this
df <- data.frame(
'Week' = c(27,28,29),
'date' = c("2019-W (01-Jul)","2019-W (08-Jul)","2019-W (15-Jul)"))
I need to append Week column after W in date column
expecteddf <- data.frame(
'Week' = c(27,28,29),
'date' = c("2019-W27 (01-Jul)","2019-W28 (08-Jul)","2019-W29 (15-Jul)"))
How can I achieve this in R?
Thanks in advance!!
You can use paste0 with a combination of sub, i.e.
paste0(sub(' .*', '', df$date), df$Week, sub('.* ', ' ', df$date))
#[1] "2019-W27 (01-Jul)" "2019-W28 (08-Jul)" "2019-W29 (15-Jul)"
In base R, you could also use regmatches() with regexpr(). See Darren's answer for an explanation of the pattern (?<=W).
regmatches(df$date, regexpr("(?<=W)", df$date, perl = TRUE)) <- df$Week
df
Week date
1 27 2019-W27 (01-Jul)
2 28 2019-W28 (08-Jul)
3 29 2019-W29 (15-Jul)
A base solution with sub(..., perl = T):
within(df, date <- Vectorize(sub)("(?<=W)", Week, date, perl = T))
Note:
"(?<=W)" is a lookbehind; it matches the empty position immediately after "W".
The pattern and replacement arguments of sub() are not vectorized, so Vectorize() or mapply() is needed here.
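For comparison, here is a rough sketch of the mapply() route mentioned in the note, using the data from the question. The helper name insert_week is made up for illustration.

```r
df <- data.frame(
  Week = c(27, 28, 29),
  date = c("2019-W (01-Jul)", "2019-W (08-Jul)", "2019-W (15-Jul)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: insert one week number right after "W" in one string
insert_week <- function(d, w) sub("(?<=W)", as.character(w), d, perl = TRUE)

# mapply() pairs each date with its own week number
df$date <- mapply(insert_week, df$date, df$Week, USE.NAMES = FALSE)
df
```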
The corresponding str_replace() version is vectorized over the replacement:
library(dplyr)
library(stringr)
df %>%
mutate(date = str_replace(date, "(?<=W)", as.character(Week)))
Output
# Week date
# 1 27 2019-W27 (01-Jul)
# 2 28 2019-W28 (08-Jul)
# 3 29 2019-W29 (15-Jul)
With stringr::str_replace, the replacement can be vectorized:
library(stringr)
df$date = str_replace(df$date, "W", paste0("W", df$Week))
df
# Week date
# 1 27 2019-W27 (01-Jul)
# 2 28 2019-W28 (08-Jul)
# 3 29 2019-W29 (15-Jul)
Alternately, we could take a date formatting approach. Converting your date column to an actual Date class (df$Date, below), we can then convert the actual Date to your desired format (or any other).
df$Date = as.Date(df$date, format = "%Y-W (%d-%b)")
df$result = format(df$Date, format = "%Y-W%V (%d-%b)")
df
# Week date Date result
# 1 27 2019-W (01-Jul) 2019-07-01 2019-W27 (01-Jul)
# 2 28 2019-W (08-Jul) 2019-07-08 2019-W28 (08-Jul)
# 3 29 2019-W (15-Jul) 2019-07-15 2019-W29 (15-Jul)
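Note that "%b" is parsed according to the current locale, so the as.Date() call above may return NA on a non-English system. Separately, before relying on "%V" it may be worth checking that the ISO week it computes agrees with the existing Week column. A small sketch of that check, with the dates typed in directly to sidestep the locale issue (the column name iso_week is an assumption):

```r
df <- data.frame(
  Week = c(27, 28, 29),
  Date = as.Date(c("2019-07-01", "2019-07-08", "2019-07-15"))
)

# %V is the ISO 8601 week number; compare it with the supplied Week column
df$iso_week <- as.integer(format(df$Date, "%V"))
all_match <- all(df$iso_week == df$Week)
all_match
```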
You can use str_c inside mutate:
library(tidyverse)
df %>%
mutate(date = str_c(str_sub(date,1,6),
Week,
str_sub(date,7)))
A base R option using:
gsub + Vectorize
expecteddf <- within(df,date <- Vectorize(gsub)("W",paste0("W",Week),date))
gsub + mapply
expecteddf <- within(
df,
date <- mapply(function(x, p) gsub("(.*W)(\\s.*)", sprintf("\\1%s\\2", p), x), date, Week)
)
I would like to retain my current date column in year-month format as date. It currently gets converted to chr format. I have tried as_datetime but it coerces all values to NA.
The format I am looking for is: "2017-01"
library(lubridate)
df<- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df$Date <- as_datetime(df$Date)
df$Date <- ymd(df$Date)
df$Date <- strftime(df$Date,format="%Y-%m")
Thanks in advance!
lubridate only handles dates, and dates have days. However, as alistaire mentions, you can floor them to the month if you want to work monthly:
library(tidyverse)
df_month <-
df %>%
mutate(Date = floor_date(as_date(Date), "month"))
If you e.g. want to aggregate by month, just group_by() and summarize().
df_month %>%
group_by(Date) %>%
summarize(N = sum(N)) %>%
ungroup()
#> # A tibble: 4 x 2
#> Date N
#> <date> <dbl>
#> 1 2017-01-01 59
#> 2 2018-01-01 20
#> 3 2018-02-01 33
#> 4 2018-03-01 45
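The same floor-then-aggregate idea can be sketched in base R, keeping a real Date column for computation and adding a "2017-01"-style string only for display. The column names Month and Label are assumptions for illustration.

```r
df <- data.frame(
  Date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04",
                   "2018-01-01", "2018-01-02", "2018-02-01", "2018-03-02")),
  N = c(24, 10, 13, 12, 10, 10, 33, 45)
)

# Floor each date to the first day of its month; the result is still a Date
df$Month <- as.Date(format(df$Date, "%Y-%m-01"))

# Aggregate on the Date column, then format only when a label is needed
monthly <- aggregate(N ~ Month, data = df, FUN = sum)
monthly$Label <- format(monthly$Month, "%Y-%m")
monthly
```

Label is a character column, so sorting and date arithmetic should keep using Month.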
You can solve this with the zoo::as.yearmon() function. The solution follows:
library(tidyquant)
library(magrittr)
library(dplyr)
df <- data.frame(Date=c("2017-01-01","2017-01-02","2017-01-03","2017-01-04",
"2018-01-01","2018-01-02","2018-02-01","2018-03-02"),
N=c(24,10,13,12,10,10,33,45))
df %<>% mutate(Date = zoo::as.yearmon(Date))
You can use the cut() function with breaks = "month" to map every date to the first day of its month, so all dates within the same month get the same value in the newly created column.
This is useful for grouping the other variables in your data frame by month (essentially what you are trying to do). cut() creates a factor, but this can be converted back to a date, so you can still have the date class in your data frame.
You just can't get rid of the day in a date (because then it is not a date). Afterwards you can create a nice format for axes or tables. For example:
true_date <-
as.POSIXlt(
c(
"2017-01-01",
"2017-01-02",
"2017-01-03",
"2017-01-04",
"2018-01-01",
"2018-01-02",
"2018-02-01",
"2018-03-02"
),
format = "%F"
)
df <-
data.frame(
Date = cut(true_date, breaks = "month"),
N = c(24, 10, 13, 12, 10, 10, 33, 45)
)
## here df$Date is a 'factor'. You could use substr to create a formatted column
df$formated_date <- substr(df$Date, start = 1, stop = 7)
## and you can convert back to date class. format = "%F", is ISO 8601 standard date format
df$true_date <- strptime(x = as.character(df$Date), format = "%F")
str(df)
I would like to remove incomplete months from my data frame, even if part of the month has data.
Example data frame:
date <- seq.Date(as.Date("2016-01-15"),as.Date("2016-09-19"),by="day")
data <- seq(1:249)
df <- data.frame(date,data)
What I would like:
date2 <- seq.Date(as.Date("2016-02-01"),as.Date("2016-08-31"),by="day")
data2 <- seq(from = 18, to = 230)
df2 <- data.frame(date2,data2)
If I interpreted your question correctly, you want to be able to select the months that have a complete number of days, removing those that don't.
The following uses dplyr v0.7.0:
library(dplyr)
df <- df %>%
mutate(mo = months(date)) # add month (mo)
complete_mo <- df %>%
count(mo) %>% #count number of days in month (n)
filter(n >= 28) %>% #rule of thumb definition of a `complete month`
pull(mo)
df_complete_mo <- df %>%
filter(mo %in% complete_mo) %>% # here is where you select the complete months
select(-mo) #remove mo, to keep your original df
Then df_complete_mo yields your dataset with just complete months.
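If the 28-day rule of thumb is too loose, one possible refinement is to compare the number of distinct dates observed in each month with the actual length of that month. This is a base R sketch, not the method above; all intermediate names (ym, first_day, days_in_month, days_observed, df_complete) are made up for illustration. It keys on year and month together, so the same month in different years is not pooled.

```r
date <- seq.Date(as.Date("2016-01-15"), as.Date("2016-09-19"), by = "day")
df <- data.frame(date = date, data = seq_along(date))

# Year-month key for every row
ym <- format(df$date, "%Y-%m")

# Calendar length of each row's month:
# first day of the next month minus first day of this month
first_day  <- as.Date(paste0(ym, "-01"))
next_first <- as.Date(format(first_day + 31, "%Y-%m-01"))
days_in_month <- as.integer(next_first - first_day)

# Number of distinct dates actually present in each row's month
days_observed <- as.integer(
  tapply(as.integer(df$date), ym, function(d) length(unique(d)))[ym]
)

# Keep only rows that belong to fully covered months
df_complete <- df[days_observed == days_in_month, ]
range(df_complete$date)
nrow(df_complete)
```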
You could join a complete set of dates for each month to your data frame and then filter out months with any missing values.
library(tidyverse)
library(lubridate)
df.filtered = data.frame(date=seq(min(df$date)-31,max(df$date)+31,by="day")) %>%
left_join(df) %>%
group_by(month=month(date)) %>% # Add a month column and group by it
filter(!any(is.na(data))) %>% # Remove months with any missing data
ungroup %>%
select(-month) # Remove the month column
# A tibble: 213 x 2
date data
<date> <int>
1 2016-02-01 18
2 2016-02-02 19
3 2016-02-03 20
4 2016-02-04 21
5 2016-02-05 22
6 2016-02-06 23
7 2016-02-07 24
8 2016-02-08 25
9 2016-02-09 26
10 2016-02-10 27
# ... with 203 more rows
In base R, you could do the following.
# get start and end dates of months that are beyond the sample
dateRange <- as.Date(format(range(df$date) + c(-32, 32), c("%Y-%m-2", "%Y-%m-1"))) - 1
The second argument of format() is a vector that separately formats the min and the max dates. We subtract 1 from these dates to get the first day of a month and the last day of a month. This returns
dateRange
[1] "2015-12-01" "2016-09-30"
Now, use which.max to select the first date that matches and which with tail to select the last day that matches monthly sequences in order to figure out the starting and stopping rows of your data.frame.
startRow <- which.max(df$date %in% seq(dateRange[1], dateRange[2], by="month"))
stopRow <- tail(which(df$date %in% (seq(dateRange[1], dateRange[2], by="month")-1)), 1)
Now, subset your data.frame
dfNew <- df[startRow:stopRow,]
range(dfNew$date)
[1] "2016-02-01" "2016-08-31"
nrow(dfNew)
[1] 213
I have a data.frame that looks like this:
> df1
Date Name Surname Amount
2015-07-24 John Smith 200
I want to extract all the information from the Date into new columns, so I can get to this:
> df2
Date Year Month Day Day_w Name Surname Amount
2015-07-24 2015 7 24 Friday John Smith 200
So now I'd like to have Year, Month, Day and Day of the Week. How can I do that? When I try to first make the variable a date using as.Date the data.frame gets messed up and the Date all become NA (and no new columns). Thanks for your help!
Here's a simple and efficient solution using the devel version of data.table and its new tstrsplit function which will perform the splitting operation only once and also update your data set in place.
library(data.table)
# wday() is wrapped in list() so it stays a single column when df1 has more than one row
setDT(df1)[, c("Year", "Month", "Day", "Day_w") :=
c(tstrsplit(Date, "-", type.convert = TRUE), list(wday(Date)))]
df1
# Date Name Surname Amount Year Month Day Day_w
# 1: 2015-07-24 John Smith 200 2015 7 24 6
Note that I've used a numeric representation of the week days because there is an efficient built in wday function for that in the data.table package, but you can easily tweak it if you really need to using format(as.Date(Date), format = "%A") instead.
In order to install the devel version use the following
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
Maybe this helps:
df2 <- df1
dates <- strptime(as.character(df1$Date),format="%Y-%m-%d")
df2$Year <- format(dates, "%Y")
df2$Month <- format(dates, "%m")
df2$Day <- format(dates, "%d")
df2$Day_w <- format(dates, "%A") # %A is the full weekday name (e.g. Friday); %a is the abbreviation
Afterwards you can rearrange the order of columns in df2 as you desire.
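A sketch of a variant that produces numeric Month and Day values and puts the columns in the order asked for in the question. It assumes the Date column holds text in year-month-day form; the explicit as.character() and format arguments guard against the NA problem that can appear when the column is a factor or uses a different layout. weekdays() returns names in the current locale.

```r
df1 <- data.frame(Date = "2015-07-24", Name = "John", Surname = "Smith",
                  Amount = 200, stringsAsFactors = FALSE)

# Parse explicitly so factor or character input is handled the same way
d <- as.Date(as.character(df1$Date), format = "%Y-%m-%d")

# Build the new data frame with the columns already in the desired order
df2 <- data.frame(
  Date  = d,
  Year  = as.integer(format(d, "%Y")),
  Month = as.integer(format(d, "%m")),
  Day   = as.integer(format(d, "%d")),
  Day_w = weekdays(d),                     # locale-dependent weekday name
  df1[c("Name", "Surname", "Amount")],
  stringsAsFactors = FALSE
)
df2
```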
I don't often have to work with dates in R, but I imagine this is fairly easy. I have a column that represents a date in a dataframe. I simply want to create a new dataframe that summarizes a 2nd column by Month/Year using the date. What is the best approach?
I want a second dataframe so I can feed it to a plot.
Any help you can provide will be greatly appreciated!
EDIT: For reference:
> str(temp)
'data.frame': 215746 obs. of 2 variables:
$ date : POSIXct, format: "2011-02-01" "2011-02-01" "2011-02-01" ...
$ amount: num 1.67 83.55 24.4 21.99 98.88 ...
> head(temp)
date amount
1 2011-02-01 1.670
2 2011-02-01 83.550
3 2011-02-01 24.400
4 2011-02-01 21.990
5 2011-02-03 98.882
6 2011-02-03 24.900
I'd do it with lubridate and plyr, rounding dates down to the nearest month to make them easier to plot:
library(lubridate)
df <- data.frame(
date = today() + days(1:300),
x = runif(300)
)
df$my <- floor_date(df$date, "month")
library(plyr)
ddply(df, "my", summarise, x = mean(x))
There is probably a more elegant solution, but splitting into months and years with strftime() and then aggregate()ing should do it. Then reassemble the date for plotting.
x <- as.POSIXct(c("2011-02-01", "2011-02-01", "2011-02-01"))
mo <- strftime(x, "%m")
yr <- strftime(x, "%Y")
amt <- runif(3)
dd <- data.frame(mo, yr, amt)
dd.agg <- aggregate(amt ~ mo + yr, dd, FUN = sum)
dd.agg$date <- as.POSIXct(paste(dd.agg$yr, dd.agg$mo, "01", sep = "-"))
A bit late to the game, but another option would be using data.table:
library(data.table)
setDT(temp)[, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = month(date))]
# or if you want to apply the 'mean' function to several columns:
# setDT(temp)[, lapply(.SD, mean), by=.(year(date), month(date))]
this gives:
yr mon mn_amt
1: 2011 2 42.610
2: 2011 3 23.195
3: 2011 4 61.891
If you want names instead of numbers for the months, you can use:
setDT(temp)[, date := as.IDate(date)
][, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = months(date))]
this gives:
yr mon mn_amt
1: 2011 februari 42.610
2: 2011 maart 23.195
3: 2011 april 61.891
As you see this will give the month names in your system language (which is Dutch in my case).
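If the month names should not depend on the system language, one option is to index R's built-in month.name constant (English names) with the month number. A minimal base R sketch, with the variable names chosen for illustration:

```r
d <- as.Date(c("2011-02-01", "2011-03-03", "2011-04-05"))

# Month as a number, independent of locale
mon_num <- as.integer(format(d, "%m"))

# month.name is a built-in vector of English month names
mon_name <- month.name[mon_num]
mon_name
```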
Or using a combination of lubridate and dplyr:
library(dplyr)
library(lubridate)
temp %>%
group_by(yr = year(date), mon = month(date)) %>%
summarise(mn_amt = mean(amount))
Used data:
# example data (modified the OP's data a bit)
temp <- structure(list(date = structure(1:6, .Label = c("2011-02-01", "2011-02-02", "2011-03-03", "2011-03-04", "2011-04-05", "2011-04-06"), class = "factor"),
amount = c(1.67, 83.55, 24.4, 21.99, 98.882, 24.9)),
.Names = c("date", "amount"), class = c("data.frame"), row.names = c(NA, -6L))
You can do it as:
short.date = strftime(temp$date, "%Y/%m")
aggr.stat = aggregate(temp$amount ~ short.date, FUN = sum)
Just use xts package for this.
library(xts)
ts <- xts(temp$amount, as.Date(temp$date, "%Y-%m-%d"))
# convert daily data
ts_m = apply.monthly(ts, FUN)
ts_y = apply.yearly(ts, FUN)
ts_q = apply.quarterly(ts, FUN)
where FUN is the function you want to aggregate the data with (for example sum).
Here's a dplyr option:
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
mutate(ym = format(date, '%Y-%m')) %>%
group_by(ym) %>%
summarize(ym_mean = mean(x))
I have a function monyr that I use for this kind of stuff:
monyr <- function(x)
{
x <- as.POSIXlt(x)
x$mday <- 1
as.Date(x)
}
n <- as.Date(1:500, "1970-01-01")
nn <- monyr(n)
You can change the as.Date at the end to as.POSIXct to match the date format in your data. Summarising by month is then just a matter of using aggregate/by/etc.
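As a rough usage sketch, monyr() can be combined with aggregate() like this. The small temp data frame is made up for illustration, loosely following the question's layout, and the column name month is an assumption.

```r
monyr <- function(x)
{
  x <- as.POSIXlt(x)
  x$mday <- 1
  as.Date(x)
}

# Hypothetical sample resembling the question's data
temp <- data.frame(
  date = as.POSIXct(c("2011-02-01", "2011-02-03", "2011-03-04"), tz = "UTC"),
  amount = c(1.67, 98.882, 24.9)
)

# Collapse each date to the first of its month, then sum within months
temp$month <- monyr(temp$date)
monthly <- aggregate(amount ~ month, data = temp, FUN = sum)
monthly
```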
One more solution:
rowsum(temp$amount, format(temp$date,"%Y-%m"))
For plot you could use barplot:
barplot(t(rowsum(temp$amount, format(temp$date,"%Y-%m"))), las=2)
Also, given that your time series seem to be in xts format, you can aggregate your daily time series to a monthly time series using the mean function like this:
d2m <- function(x) {
aggregate(x, format(as.Date(zoo::index(x)), "%Y-%m"), FUN=mean)
}