I just want to convert date from "dbY hms" to "y-m-d".
This is my data:
Time X10cm X20cm X30cm X40cm X50cm X60cm X70cm X80cm X90cm
1 05 Apr 2019 09:46:13 20.70675 26.20419 23.66370 18.04151 3.507654 5.644918 3.947458 0.926415 1.304021
2 11 Apr 2019 08:36:32 18.45716 25.76273 23.82202 18.59679 3.829793 6.639636 4.313009 1.002555 1.440603
3 19 Apr 2019 09:24:16 17.22486 24.03394 21.70397 16.95699 3.507654 6.827912 4.417910 1.046471 1.574125
4 26 Apr 2019 12:14:05 16.60325 22.16044 19.54150 15.73587 3.237127 6.169124 3.987147 1.002555 1.397690
5 10 May 2019 07:40:20 19.68528 22.46739 19.20813 14.97607 3.184556 5.620616 3.888364 0.959796 1.484311
6 17 May 2019 12:07:31 16.82389 23.13976 20.70675 16.86820 3.470846 5.500014 3.714201 0.985313 1.429803
I have tried many things, for example formatting the column as a character string and then converting it to a date:
data2 = data1 %>%
mutate(Time1 = format(Time, format = "%Y-%m-%d %H:%M:%S")) %>%
mutate_at("Time1",as.Date,format="%Y-%m-%d")
but it didn't work.
You may use -
x <- '05 Apr 2019 09:46:13'
as.Date(x, '%d %b %Y')
#[1] "2019-04-05"
For the entire column of the data frame:
data1$Time <- as.Date(data1$Time, '%d %b %Y')
If you want to use dplyr and lubridate:
library(dplyr)
library(lubridate)
data1 <- data1 %>% mutate(Time = as.Date(dmy_hms(Time)))
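As a small sketch of why a "%Y-%m-%d" pattern cannot parse these strings while "%d %b %Y" can (the two sample values are copied from the question; this assumes an English locale, since %b matches locale-specific month abbreviations):

```r
# Two values copied from the question's Time column
time_chr <- c("05 Apr 2019 09:46:13", "10 May 2019 07:40:20")

# The pattern has to describe the text as it is written;
# "%Y-%m-%d" does not match "05 Apr 2019 ...", so the result is NA
mismatch <- as.Date(time_chr, format = "%Y-%m-%d")

# "%d %b %Y" matches the leading date part; the trailing time is ignored
parsed <- as.Date(time_chr, format = "%d %b %Y")

print(mismatch)
print(parsed)

# Once the values are Dates, format() only controls how they are displayed
print(format(parsed, "%Y/%m/%d"))
```

The point of the sketch is that the format argument of as.Date() describes the input text, not the desired output.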
I would like to format my date variable to %d %b %Y (e.g. 05 May 2020). However, once it has been formatted, it becomes a character variable and sorting the variable from the earliest date to the latest date would not be possible (e.g. 05 May 2020 is sorted before 26 Apr 2020).
Data:
df <- structure(list(Date = structure(c(1588204800, 1587945600, 1588464000, 1588032000,
1588291200, 1588377600, 1588118400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = "data.frame", row.names = c(NA, -7L))
# > df
# Date
# 1 2020-04-30
# 2 2020-04-27
# 3 2020-05-03
# 4 2020-04-28
# 5 2020-05-01
# 6 2020-05-02
# 7 2020-04-29
Here is what sorting a formatted date variable looks like:
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(Date)
# Date
# 1 01 May 2020
# 2 02 May 2020
# 3 03 May 2020
# 4 27 Apr 2020
# 5 28 Apr 2020
# 6 29 Apr 2020
# 7 30 Apr 2020
So, this is what I have done, which works, but I would like to know if this is really correct or if there are alternatives to solve this.
df %>%
mutate(Date = factor(Date, labels = format(sort(unique(Date)), "%d %b %Y"), ordered = TRUE)) %>%
arrange(Date)
# Date
# 1 27 Apr 2020
# 2 28 Apr 2020
# 3 29 Apr 2020
# 4 30 Apr 2020
# 5 01 May 2020
# 6 02 May 2020
# 7 03 May 2020
Edit:
Actually, the reason I want to format and arrange it is so that I have direct access to more readable date formats when building a dashboard for my users.
When it comes to ggplot(), even after you arrange and then mutate with format, the facetted plots always appear in sorted character order. Example below:
df %>%
arrange(Date) %>%
mutate(n = 1:n(),
Date = format(Date, "%d %b %Y")) %>%
ggplot() +
geom_bar(aes(x = n)) +
facet_wrap(~Date)
If you want to use dates in plots, the main idea is to set the factor levels in the order in which you want to show the data. Arrange the dates first, then assign factor levels in order of appearance.
library(dplyr)
library(ggplot2)
df %>%
arrange(Date) %>%
mutate(n = row_number(),
Date = format(Date, "%d %b %Y"),
Date = factor(Date, levels = unique(Date))) %>%
ggplot() + geom_bar(aes(x = n)) + facet_wrap(~Date)
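The effect of levels = unique(Date) can be checked without plotting. The following is a minimal base R sketch using a subset of the question's dates; the variable names are only illustrative, and an English locale is assumed for the month abbreviations:

```r
# Unsorted dates, a subset of the question's data
dates <- as.Date(c("2020-04-30", "2020-04-27", "2020-05-03"))

# Sort while the values are still Dates, then format for display
labels <- format(sort(dates), "%d %b %Y")

# unique() keeps first-appearance order, so the levels are chronological
date_factor <- factor(labels, levels = unique(labels))
print(levels(date_factor))

# Without explicit levels, factor() sorts the labels as text
print(levels(factor(labels)))
```

facet_wrap() lays out panels in level order, which is why fixing the levels fixes the panel order.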
My original solution is below, but the better solution is so simple it hurts a little that I didn't spot it immediately - do your arrange() before your mutate() - at that point it is a date-type variable so will sort the way you want it to:
df %>%
arrange(Date) %>%
mutate(Date = format(Date, "%d %b %Y"))
Giving:
Date
1 27 Apr 2020
2 28 Apr 2020
3 29 Apr 2020
4 30 Apr 2020
5 01 May 2020
6 02 May 2020
7 03 May 2020
Alternatively, you could add an as.Date(..., format = "%d %b %Y") to your arrange():
df %>%
mutate(Date = format(Date, "%d %b %Y")) %>%
arrange(as.Date(Date, format = "%d %b %Y"))
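The same idea, sorting the formatted text by the dates it represents, can be sketched in base R with order(). The labels below are made up for illustration, and an English locale is assumed:

```r
# Formatted labels in arbitrary order
labels <- c("01 May 2020", "27 Apr 2020", "03 May 2020", "30 Apr 2020")

# Parse the labels back to Dates only to compute the ordering
chronological <- labels[order(as.Date(labels, format = "%d %b %Y"))]
print(chronological)
```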
Personally, I prefer the tidyverse solution for dates - lubridate. Here:
library(lubridate)
df %>%
mutate(Date = ymd(Date)) %>%
arrange(Date)
In short, you can parse your dates by combining d for day, m for month and y for year. You can add time, too. For example,
ymd_hms("20150102 12:23:01")
As the example shows, we do not have to bother about the separator. If you have access this is a nice paper on that package. Otherwise, there are many tutorials out there on lubridate.
I have a column with date formatted as MM-DD-YYYY, in the Date format.
I want to add 2 columns one which only contains YYYY and the other only contains MM.
How do I do this?
Once again base R gives you all you need, and you should not do this with sub-strings.
Here we first create a data.frame with a proper Date column. If your date is in text format, parse it first with as.Date() or my anytime::anydate() (which does not need formats).
Then given the date creating year and month is simple:
R> df <- data.frame(date=Sys.Date()+seq(1,by=30,len=10))
R> df[, "year"] <- format(df[,"date"], "%Y")
R> df[, "month"] <- format(df[,"date"], "%m")
R> df
date year month
1 2017-12-29 2017 12
2 2018-01-28 2018 01
3 2018-02-27 2018 02
4 2018-03-29 2018 03
5 2018-04-28 2018 04
6 2018-05-28 2018 05
7 2018-06-27 2018 06
8 2018-07-27 2018 07
9 2018-08-26 2018 08
10 2018-09-25 2018 09
R>
If you want year or month as integers, you can wrap as.integer() around the format() call.
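A short sketch of the whole path, assuming the dates start out as MM-DD-YYYY text (the two sample strings are illustrative):

```r
# Text dates in MM-DD-YYYY form
date_chr <- c("05-21-2017", "06-25-2015")

# Parse to Date first; the format describes the input text
df <- data.frame(date = as.Date(date_chr, format = "%m-%d-%Y"))

# format() returns character, so wrap it in as.integer() for numeric columns
df$year  <- as.integer(format(df$date, "%Y"))
df$month <- as.integer(format(df$date, "%m"))
print(df)
```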
A base R option would be to remove the substring with sub and then read with read.table
df1[c('month', 'year')] <- read.table(text=sub("-\\d{2}-", ",", df1$date), sep=",")
Or using tidyverse
library(tidyverse)
separate(df1, date, into = c('month', 'day', 'year')) %>%
select(-day)
Note: it may be better to convert to datetime class instead of using the string formatting.
library(lubridate)
df1 %>%
mutate(date = mdy(date), month = month(date), year = year(date))
data
df1 <- data.frame(date = c("05-21-2017", "06-25-2015"))
Consider code:
library('zoo')
data <- c(1, 2, 4, 6)
dates <- c("2016-11-01", "2016-12-01", "2017-02-01", "2017-04-01");
z1 <- zoo(data, as.yearmon(dates))
z2 <- na.approx(z1)
Variable z2 looks like this:
nov 2016 dec 2016 feb 2017 apr 2017
1 2 4 6
But I need z2 to be similar to this:
nov 2016 dec 2016 jan 2017 feb 2017 mar 2017 apr 2017
1 2 3 4 5 6
I just need to approximate values for months where value is missing. Thanks for any hints.
With the new as.zoo argument, calendar, in zoo 1.8 (which defaults to TRUE so we don't have to specify it) we can just convert the input to "ts" and then back to "zoo" again applying na.approx after that:
na.approx(as.zoo(as.ts(z2)))
## Nov 2016 Dec 2016 Jan 2017 Feb 2017 Mar 2017 Apr 2017
## 1 2 3 4 5 6
With prior versions of zoo we can do the same but manually convert the index back to "yearmon":
na.approx(aggregate(as.zoo(as.ts(z2)), as.yearmon, c))
magrittr
Using zoo with magrittr these can be expressed as the following pipelines, respectively:
library(magrittr)
z2 %>% as.ts %>% as.zoo %>% na.approx
z2 %>% as.ts %>% as.zoo %>% aggregate(as.yearmon, c) %>% na.approx
One way using just na.approx and base R:
#add your data and dates together
df <- data.frame(data, dates = as.Date(dates))
#create all dates using seq
new_dates <- data.frame(dates = seq(as.Date(dates[1]), as.Date(dates[4]), by = 'month'))
#merge the two and then na.approx
new_df <- merge(new_dates, df, by = 'dates', all.x = TRUE)
na.approx(new_df$data)
Out:
[1] 1 2 3 4 5 6
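To see what the interpolation step does, here is a rough sketch that swaps in stats::approx() from base R instead of zoo's na.approx(). The NA positions mirror the merged data above. Like na.approx() on a plain vector, it interpolates by position, not by the actual number of days between dates:

```r
# Monthly values after the merge: January and March are missing
values <- c(1, 2, NA, 4, NA, 6)
position <- seq_along(values)
known <- !is.na(values)

# Linear interpolation between the known points, evaluated at every position
filled <- approx(x = position[known], y = values[known], xout = position)$y
print(filled)
```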
I have a df with dates formatted in the following way.
Date Year
<chr> <dbl>
Sunday, Jul 27 2008
Tuesday, Jul 29 2008
Wednesday, July 31 (1) 2008
Wednesday, July 31 (2) 2008
Is there a simple way to achieve the following format of columns and values? I'd also like to remove the (1) and (2) notations on the two July 31 dates.
Date Year Month Day Day_of_Week
2008-07-27 2008 07 27 Sunday
With base R, you can do:
dat <- data.frame(
Date = c("Sunday, Jul 27" ,"Tuesday, Jul 29", "Wednesday, July 31", "Wednesday, July 31"),
Year = rep(2008, 4),
stringsAsFactors = FALSE
)
dts <- as.POSIXlt(paste(dat$Year, dat$Date), format = "%Y %A, %B %d")
POSIXlt stores the date/time as a list of components. To see them, try unclass(dts[1]).
From here it can be rather academic:
dat$Month = 1 + dts$mon # months are 0-based in POSIXlt
dat$Day = dts$mday
dat$Day_of_Week = weekdays(dts)
dat
# Date Year Month Day Day_of_Week
# 1 Sunday, Jul 27 2008 7 27 Sunday
# 2 Tuesday, Jul 29 2008 7 29 Tuesday
# 3 Wednesday, July 31 2008 7 31 Thursday
# 4 Wednesday, July 31 2008 7 31 Thursday
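The offsets in the POSIXlt components are easy to get wrong, so here is a minimal sketch for a single date. The time zone is fixed to UTC only to keep the example deterministic:

```r
lt <- as.POSIXlt("2008-07-31", tz = "UTC")

print(lt$mon)          # months are counted from 0, so July is 6
print(lt$mday)         # day of the month, counted from 1
print(lt$year + 1900)  # years are stored as an offset from 1900
print(lt$wday)         # day of the week, 0 = Sunday
```

This also shows why the Day_of_Week column above reads Thursday: it is computed from the parsed date, not taken from the weekday name in the text.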
library(dplyr)
library(lubridate)
dat = data_frame(date = c('Sunday, Jul 27', 'Tuesday, Jul 29',
'Wednesday, July 31 (1)', 'Wednesday, July 31 (2)'), year = rep(2008, 4))
dat %>%
mutate(date = gsub("\\s*\\([^\\)]+\\)","",as.character(date)),
date = parse_date_time(date,'A, b! d ')) -> dat1
year(dat1$date) <- dat1$year
# A tibble: 4 × 2
date year
<dttm> <dbl>
1 2008-07-27 2008
2 2008-07-29 2008
3 2008-07-31 2008
4 2008-07-31 2008
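The gsub() pattern used above can be tried on its own. This sketch applies it to two of the sample strings to show that only the parenthesised counter and the whitespace before it are removed:

```r
x <- c("Wednesday, July 31 (1)", "Sunday, Jul 27")

# \\s*      optional whitespace before the bracket
# \\( ... \\)  a literal pair of parentheses
# [^\\)]+   one or more characters that are not a closing parenthesis
cleaned <- gsub("\\s*\\([^\\)]+\\)", "", x)
print(cleaned)
```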
I have a data frame like,
2015-01-30 1 Fri
2015-01-30 2 Sat
2015-02-01 3 Sun
2015-02-02 1 Mon
2015-02-03 1 Tue
2015-02-04 1 Wed
2015-02-05 1 Thu
2015-02-06 1 Fri
2015-02-07 1 Sat
2015-02-08 1 Sun
I want to aggregate it to the weekly level such that every week starts on Monday and ends on Sunday. So, in the aggregated data for the above, the first week should end on 2015-02-01.
The output for the data above should look something like this:
firstweek 6
secondweek 7
I tried this,
data <- as.xts(data$value,order.by=as.Date(data$interval))
weekly <- apply.weekly(data,sum)
But here in the final result, every week is starting from Sunday.
This should work. I've called the dataframe m and named the columns, possibly differently from yours.
library(dplyr) # install.packages("dplyr")
colnames(m) = c("Date", "count","Day")
start = as.Date("2015-01-26")
m$Week <- floor(unclass(as.Date(m$Date) - as.Date(start)) / 7) + 1
m$Week = as.numeric(m$Week)
m %>% group_by(Week) %>% summarise(count = sum(count))
The dplyr package is great for data manipulation; the week-number calculation, however, is just a rough hack.
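The week-number arithmetic can be checked in isolation. This is a minimal sketch with a few of the question's dates; it relies on the chosen start date being a Monday on or before the first observation:

```r
start <- as.Date("2015-01-26")  # a Monday
d <- as.Date(c("2015-01-30", "2015-02-01", "2015-02-02", "2015-02-08"))

# Whole days since the starting Monday, split into blocks of seven
week <- as.integer(d - start) %/% 7L + 1L
print(week)
```

Sunday 2015-02-01 stays in week 1 and Monday 2015-02-02 opens week 2, which is the Monday-to-Sunday grouping the question asks for.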
Convert to date and use the %W format to get a week number...
df <- read.csv(textConnection("2015-01-30, 1, Fri,
2015-01-30, 2, Sat,
2015-02-01, 3, Sun,
2015-02-02, 1, Mon,
2015-02-03, 1, Tue,
2015-02-04, 1, Wed,
2015-02-05, 1, Thu,
2015-02-06, 1, Fri,
2015-02-07, 1, Sat,
2015-02-08, 1, Sun"), header=F, stringsAsFactors=F)
names(df) <- c("date", "something", "day")
df$date <- as.Date(df$date, format="%Y-%m-%d")
df$week <- format(df$date, "%W")
aggregate(df$something, list(df$week), sum)
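For reference, %W numbers weeks within a year with Monday as the first day of the week. The sketch below applies it to the question's data (dates copied as given there) and sums per week with tapply(); the vector names are illustrative:

```r
dates <- as.Date(c("2015-01-30", "2015-01-30", "2015-02-01", "2015-02-02",
                   "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
                   "2015-02-07", "2015-02-08"))
values <- c(1, 2, 3, 1, 1, 1, 1, 1, 1, 1)

# Week of the year, Monday as first day of the week
week <- format(dates, "%W")

# Sum the values within each week
weekly <- tapply(values, week, sum)
print(weekly)
```

One caveat is that %W restarts at the beginning of each year, so a week that straddles New Year is split into two groups.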
With dplyr and lubridate this is really easy thanks to the function isoweek:
library(dplyr)
library(lubridate)
my.df <- read.table(header=FALSE, text=
'2015-01-30 1 Fri
2015-01-30 2 Sat
2015-02-01 3 Sun
2015-02-02 1 Mon
2015-02-03 1 Tue
2015-02-04 1 Wed
2015-02-05 1 Thu
2015-02-06 1 Fri
2015-02-07 1 Sat
2015-02-08 1 Sun')
my.df %>% mutate(week = isoweek(V1)) %>% group_by(week) %>% summarise(sum(V2))
or a bit shorter
my.df %>% group_by(isoweek(V1)) %>% summarise(sum(V2))
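If lubridate is not available, base R's format() offers an ISO 8601 week number through %V. This is a sketch under the assumption that the platform supports %V and %G on output, which is documented for R but may vary on older systems:

```r
d <- as.Date(c("2015-02-01", "2015-02-02"))

# ISO 8601 week: weeks start on Monday, week 1 contains the first Thursday
iso_week <- format(d, "%V")
print(iso_week)

# %W also starts weeks on Monday but counts from the first Monday of the year
monday_week <- format(d, "%W")
print(monday_week)

# Near New Year, pair the ISO week with the week-based year (%G), not %Y
print(format(as.Date("2014-12-29"), "%G-%V"))
```

Both schemes put the boundary between Sunday and Monday, so they produce the same groups here; only the labels differ.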