I have date data formatted in an odd way that I would like to clean up in R.
The dates are in the format "d-Mon-yy hh:mm:ss AM", for example "1-Feb-05 12:00:00 AM". The day and time are useless to me; however, I would like to be able to use the month and year while also converting them to date-time format.
I cannot figure out how to do this.
Here is a way to do it with handy lubridate parsers and extractors. First convert the string into a datetime and then extract the month and the year:
library(tidyverse)
library(lubridate)
tibble(datetime = "1-Feb-05 12:00:00 AM") %>%
  mutate(
    datetime = dmy_hms(datetime),
    year = year(datetime),
    month = month(datetime)
  )
#> # A tibble: 1 x 3
#> datetime year month
#> <dttm> <dbl> <dbl>
#> 1 2005-02-01 00:00:00 2005 2
Created on 2018-05-09 by the reprex package (v0.2.0).
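If the goal is a single date column that keeps only the month and year, one option is to floor each parsed value to the first day of its month. This is a minimal sketch, assuming English month abbreviations as in the question; the second input value is made up for illustration.

```r
library(lubridate)

# The first value comes from the question; the second is a made-up example
x <- c("1-Feb-05 12:00:00 AM", "17-Mar-06 3:45:10 PM")

# Parse, drop the time of day, then snap each date to the first of its month
month_start <- floor_date(as_date(dmy_hms(x)), unit = "month")

# A "year-month" label that can be used for display or grouping
ym_label <- format(month_start, "%Y-%m")
```

The result is still a Date, so it sorts and plots correctly, while the label column is plain text.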
I am looking for a way to add a new column to a data frame of monthly data. The column should hold the year-on-year change for each record that has a corresponding record for the same month a year earlier.
So, my code as of now is:
library(blsAPI)
cpi <- blsAPI("CUUR0000SA0",2,TRUE)
cpi$value <- as.numeric(cpi$value)
cpi$date <- as.Date(
paste0("1 ",cpi$periodName," ",cpi$year),
format = "%d %B %Y")
cpi <- cpi[order(cpi$date),]
I would like to add a new column with the YoY change for the cpi$value column.
Something like:
df <- data.frame(Date = c("2021-01-16", "2017-05-09"))
# Note: + 365 lands one day early when the span includes 29 February
df |> dplyr::mutate(new = as.Date(Date) + 365)
#> Date new
#> 1 2021-01-16 2022-01-16
#> 2 2017-05-09 2018-05-09
library(lubridate)
df |>
dplyr::mutate(new = as_date(Date) %m+% years(1))
#> Date new
#> 1 2021-01-16 2022-01-16
#> 2 2017-05-09 2018-05-09
Created on 2022-02-11 by the reprex package (v2.0.1)
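Shifting dates by one year can be turned into a year-on-year change with a self-join: move each record forward a year, join it back on the date, and compare the two values. The sketch below uses a made-up `cpi_demo` table in place of the real `cpi` data frame, and the column names `value_prev` and `yoy` are assumptions for illustration.

```r
library(dplyr)
library(lubridate)

# Made-up stand-in for the cpi data frame: 14 consecutive monthly records
cpi_demo <- tibble(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 14),
  value = 100 + 0:13
)

# Each value, re-dated to the month one year later
prev <- cpi_demo %>%
  transmute(date = date %m+% years(1), value_prev = value)

# Join last year's value onto each record and compute the percent change;
# records without a match a year earlier get NA
cpi_yoy <- cpi_demo %>%
  left_join(prev, by = "date") %>%
  mutate(yoy = (value - value_prev) / value_prev * 100)
```

Unlike `lag(value, 12)`, the join does not assume that every month is present in the data.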
I am trying to convert a column in my dataset that contains week numbers into weekly Dates. I was trying to use the lubridate package but could not find a solution. The dataset looks like the one below:
df <- tibble(week = c("202009", "202010", "202011","202012", "202013", "202014"),
Revenue = c(4543, 6764, 2324, 5674, 2232, 2323))
So I would like to create a Date column in a weekly format, e.g. (2020-03-07, 2020-03-14).
Would anyone know how to convert these week numbers into weekly dates?
Maybe there is a more automated way, but try something like this. I think this gets the right days; I looked at a 2020 calendar and counted. But if something is off, it's a matter of playing with the (week - 1) * 7 - 1 component to return what you want.
This just grabs the first day of the year, adds x weeks worth of days, and then uses ceiling_date() to find the next Sunday.
library(dplyr)
library(tidyr)
library(lubridate)
df %>%
  separate(week, c("year", "week"), sep = 4, convert = TRUE) %>%
  mutate(date = ceiling_date(ymd(paste(year, "01", "01", sep = "-")) +
                               (week - 1) * 7 - 1, "week", week_start = 7))
# # A tibble: 6 x 4
# year week Revenue date
# <int> <int> <dbl> <date>
# 1 2020 9 4543 2020-03-01
# 2 2020 10 6764 2020-03-08
# 3 2020 11 2324 2020-03-15
# 4 2020 12 5674 2020-03-22
# 5 2020 13 2232 2020-03-29
# 6 2020 14 2323 2020-04-05
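The calendar counting can also be tied to a fixed rule. The sketch below assumes the week numbers follow ISO 8601 (week 1 is the week containing 4 January, and weeks start on Monday), which may not match how the data were produced. `iso_week_end` is a hypothetical helper name.

```r
library(lubridate)

# Hypothetical helper: the Sunday that closes ISO week `week` of `year`
iso_week_end <- function(year, week) {
  jan4 <- ymd(paste(year, "01", "04", sep = "-"))
  # Monday of ISO week 1, i.e. the week that contains 4 January
  week1_monday <- floor_date(jan4, "week", week_start = 1)
  week1_monday + (week - 1) * 7 + 6
}

week_id <- c("202009", "202010", "202011")
week_dates <- iso_week_end(as.integer(substr(week_id, 1, 4)),
                           as.integer(substr(week_id, 5, 6)))
```

For the 2020 weeks shown here this lands on the same Sundays as the table above. Dropping the `+ 6` would return the Monday that opens each week instead.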
I need to get the row of the first and last day of each month in a big data frame, so that I can apply operations that accurately cover each month using a for loop. Unfortunately, the data frame is not very homogeneous. Here is a reproducible example to work with:
dataframe <- data.frame(Date = c(
  seq.Date(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day"),
  seq.Date(as.Date("2020-02-01"), as.Date("2020-02-28"), by = "day"),
  seq.Date(as.Date("2020-03-02"), as.Date("2020-03-31"), by = "day")
))
We can create a grouping column by converting to yearmon and then get the first and last date in each group:
library(zoo)
library(dplyr)
dataframe %>%
  group_by(yearMon = as.yearmon(Date)) %>%
  summarise(First = first(Date), Last = last(Date))
# A tibble: 3 x 3
# yearMon First Last
#* <yearmon> <date> <date>
#1 Jan 2020 2020-01-01 2020-01-31
#2 Feb 2020 2020-02-01 2020-02-28
#3 Mar 2020 2020-03-02 2020-03-31
If it is the first and last calendar day of each month that is needed, irrespective of the data:
library(lubridate)
dataframe %>%
group_by(yearMon = as.yearmon(Date)) %>%
summarise(First = floor_date(first(Date), 'month'),
Last = ceiling_date(last(Date), 'month')-1)
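Since the question asks for row positions to drive a for loop, the same grouping idea can return row numbers rather than dates. This is a sketch under the assumption that "row" means the position in the original data frame; the column names `row`, `ym`, `first_row` and `last_row` are made up for illustration.

```r
library(dplyr)

dataframe <- data.frame(Date = c(
  seq.Date(as.Date("2020-01-01"), as.Date("2020-01-31"), by = "day"),
  seq.Date(as.Date("2020-02-01"), as.Date("2020-02-28"), by = "day"),
  seq.Date(as.Date("2020-03-02"), as.Date("2020-03-31"), by = "day")
))

month_rows <- dataframe %>%
  mutate(row = row_number(),            # position in the original data
         ym  = format(Date, "%Y-%m")) %>%
  group_by(ym) %>%
  # rows holding the earliest and latest date present in each month
  summarise(first_row = row[which.min(Date)],
            last_row  = row[which.max(Date)])
```

A loop can then slice each month with `dataframe[month_rows$first_row[i]:month_rows$last_row[i], ]`, provided the rows are sorted by date.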
I have an excel dataset in which there are dates and time points as follows:
record_id date_E1 time_E1 date_E2 time_E2 ...
1 2019/8/24 09:00:00 2019/8/25 18:00:00
I would like to construct a variable which contains the number of hours past the first time and date, (09:00 a.m 2019/8/24). When I read the excel file with
read_excel("C:/visit.xlsx")
the time_E1 .. appears as 0.3750000 0.7736111 0.4131944 0.4131944,
and the date appears as 43640 43640 43641 43642, in R. I use visit_dates<-as.Date(as.numeric(visit_date_L$Day), origin = "1899-12-30")
to convert dates to 2019-8-24 and .. but do not know how to convert time of the day and convert to the hours past the first time point. What I expect is a vector like: 0, 42, ... hours past first time point.
I have used the following code:
as.POSIXct(visit_times, format = " %H-%M", origin = "09:00:00"),
but it returns a NULL vector. After that I could use the following code to transpose and combine date and time data:
visit_time <- subset(MY_visit, select = c(record_id, time_E1, ...))
visit_date <- subset(MY_visit, select = c(record_id, date_E1, ...))
visit_time_L <- melt(visit_time, id.vars=c("record_id"))
visit_date_L <- melt(visit_date, id.vars=c("record_id"))
names(visit_time_L)[names(visit_time_L)=="value"] <- "time"
names(visit_date_L)[names(visit_date_L)=="value"] <- "Day"
visit_all <- cbind(visit_time_L, visit_date_L)
Any ideas how I can solve this problem?
Here is an approach that you can try. I have dates and times stored in an Excel file. Read it in and keep the columns as character. Convert the dates to their proper format, as you did. Convert the time-of-day fractions to numeric and multiply by 24 to get hours. Paste the dates and hours together and parse them as date-times, then take the difference between the two in hours. Asking difftime() for units = "hours" avoids relying on the automatically chosen unit, which would be days here because the gap exceeds one day. Note that ymd_h() reads a whole-hour value only, so a fractional hour such as 9.5 will not be parsed as 09:30.
library(dplyr); library(readxl); library(lubridate)
df <- read_excel('Book1.xlsx', col_types = c('text'))
# A tibble: 1 x 4
date1 time1 date2 time2
<chr> <chr> <chr> <chr>
1 43466 0.375 43467 0.41666666666666669
df %>%
  mutate_at(c('date1', 'date2'), ~ as.Date(as.numeric(.), origin = '1899-12-30')) %>%
  mutate_at(c('time1', 'time2'), ~ as.numeric(.) * 24) %>%
  mutate(t1 = ymd_h(paste(date1, time1)),
         t2 = ymd_h(paste(date2, time2)),
         # request hours explicitly instead of scaling the default unit
         diff = as.numeric(difftime(t2, t1, units = "hours")))
# A tibble: 1 x 7
date1 time1 date2 time2 t1 t2 diff
<date> <dbl> <date> <dbl> <dttm> <dttm> <dbl>
1 2019-01-01 9 2019-01-02 10 2019-01-01 09:00:00 2019-01-02 10:00:00 25
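When the times do not fall on whole hours, another option is to skip the text parsing altogether. An Excel serial date plus its day fraction is a count of days since 1899-12-30, so it can be converted to seconds in one step. The sketch below uses the serial values quoted in the question; the variable names are made up, and UTC is assumed so that daylight saving does not shift the result.

```r
# Serial dates and day fractions as quoted in the question
dates <- c(43640, 43640, 43641, 43642)
times <- c(0.3750000, 0.7736111, 0.4131944, 0.4131944)

# Days since the Excel origin, including the time of day
serial <- dates + times

# Convert days to seconds and anchor at the Excel origin (UTC is an assumption)
visit_dt <- as.POSIXct(serial * 86400, origin = "1899-12-30", tz = "UTC")

# Hours elapsed since the first visit
hours_since_first <- as.numeric(difftime(visit_dt, visit_dt[1], units = "hours"))
```

The first element is 0 by construction, and the rest are fractional hours, so times such as 18:34 need no rounding.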
I used the following R code to create a POSIXct date time field from a separate date and time field both in character format using lubridate and dplyr.
library(dplyr)
library(lubridate)
c_cycle_work <- tibble(
StartDate = c("1/28/2011", "2/26/2011", "4/2/2011", "4/11/2011"),
StartTime = c("10:58", "6:02", "6:00", "9:47")
)
c_cycle_work %>%
mutate(start_dt = paste0(StartDate, StartTime, sep = " ", collapse = NULL)) %>%
mutate(start_dt = mdy_hms(start_dt))
# 1 1/28/2011 10:58 2020-01-28 11:10:58
# 2 2/26/2011 6:02 2020-02-26 11:06:02
# 3 4/2/2011 6:00 2020-04-02 11:06:00
# 4 4/11/2011 9:47 2020-04-11 11:09:47
The start_dt field I created is in Y m d format even though I used mdy_hms based on the data. Also, all years have been changed to 2020.
Went over this several times, used paste vs. paste0, etc. but still stumped.
Your problem is paste0(), which doesn't have a sep= argument. So when you paste the date and time you get 1/28/201110:58, and it splits that into 1/28/20/11/10/58, though it seemed to work differently with my version, lubridate_1.6.0. Also, you were using "hms" but your times didn't have seconds. This should work with your data:
c_cycle_work %>%
mutate(start_dt = paste(StartDate, StartTime, sep=" ")) %>%
mutate(start_dt = mdy_hm(start_dt))
# StartDate StartTime start_dt
# <chr> <chr> <dttm>
# 1 1/28/2011 10:58 2011-01-28 10:58:00
# 2 2/26/2011 6:02 2011-02-26 06:02:00
# 3 4/2/2011 6:00 2011-04-02 06:00:00
# 4 4/11/2011 9:47 2011-04-11 09:47:00
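The difference between the two functions is easy to check on a single value. In this small sketch the variable names are made up; it shows that paste0() treats sep = " " as one more string to append rather than as a separator.

```r
library(lubridate)

# paste0() has no sep argument, so " " is simply appended as another piece
bad <- paste0("1/28/2011", "10:58", sep = " ")

# paste() places sep between the pieces
good <- paste("1/28/2011", "10:58", sep = " ")

# With the separator in place, the month-day-year hour-minute parser applies
parsed <- mdy_hm(good)
```

Here `bad` is "1/28/201110:58 " with a trailing space, while `good` is "1/28/2011 10:58".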