Recently I stumbled over a problem: my date variable has not been recorded uniformly.
I have a data frame similar to the one shown below:
Variable1 <- c(10,20,30,40,50)
Variable2 <- c("a", "b", "c", "d", "d")
Date <- c("today 10:45", "yesterday 3:10", "28 october 2018 5:32", "28 october 2018 8:32", "27 october 2018 5:32")
df <- data.frame(Variable1, Variable2, Date)
df
For my purposes I need to extract only the date. I would therefore like to create a new variable based on "Date".
The Date variable should only contain the date. The hour is irrelevant for my purpose and can be ignored.
My goal is to get the following data frame:
Variable1 <- c(10,20,30,40,50)
Variable2 <- c("a", "b", "c", "d", "d")
Date <- c("31 october 2018", "30 october 2018", "28 october 2018", "28 october 2018", "27 october 2018")
df2 <- data.frame(Variable1, Variable2, Date)
df2
Preferably the values for Date should also be in the correct format (date).
Thank you already in advance.
library(magrittr) # provides the %>% pipe used below
df$NewDate[grepl("today", df$Date)] <- Sys.Date() # Convert "today" to a date
df$NewDate[grepl("yesterday", df$Date)] <- Sys.Date() - 1 # Convert "yesterday" to a date
df$NewDate[is.na(df$NewDate)] <- df$Date[is.na(df$NewDate)] %>% as.Date(format = "%d %b %Y") # Convert explicit dates to Date
class(df$NewDate) <- "Date" # The assignments above store plain numbers; restore the Date class
df
Variable1 Variable2 Date NewDate
1 10 a today 10:45 2018-10-31
2 20 b yesterday 3:10 2018-10-30
3 30 c 28 october 2018 5:32 2018-10-28
4 40 d 28 october 2018 8:32 2018-10-28
5 50 d 27 october 2018 5:32 2018-10-27
tolower( # not strictly necessary, but for consistency
gsub("yesterday", format(Sys.Date()-1, "%d %B %Y"), # convert *day to dates
gsub("today", format(Sys.Date(), "%d %B %Y"),
gsub("\\s*[0-9:]*$", "", # remove the times
c("today 10:45", "yesterday 3:10", "28 october 2018 5:32", "28 october 2018 8:32", "27 october 2018 5:32")))))
# [1] "31 october 2018" "30 october 2018" "28 october 2018" "28 october 2018" "27 october 2018"
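The question also asks for the result to be of class Date rather than character. A minimal sketch of one way to get there in base R is below. The helper name `normalise_dates` and its `ref` argument are hypothetical (they are not from the answers above); `ref` stands in for "today" so the result is reproducible, and English month names are assumed, matched against `month.name` so the session locale does not matter.

```r
# Hypothetical helper, not part of the answers above.
# `ref` stands in for "today" so the result is reproducible.
normalise_dates <- function(x, ref = Sys.Date()) {
  x <- tolower(trimws(x))
  out <- rep(as.Date(NA), length(x))
  out[grepl("^today", x)] <- ref
  out[grepl("^yesterday", x)] <- ref - 1
  explicit <- grepl("^[0-9]", x)               # e.g. "28 october 2018 5:32"
  parts <- strsplit(x[explicit], "\\s+")
  day <- as.integer(vapply(parts, `[`, "", 1))
  mon <- match(vapply(parts, `[`, "", 2), tolower(month.name))
  yr  <- as.integer(vapply(parts, `[`, "", 3))
  # Anything that cannot be matched ends up as NA instead of raising an error
  out[explicit] <- as.Date(sprintf("%04d-%02d-%02d", yr, mon, day),
                           format = "%Y-%m-%d")
  out
}

Date <- c("today 10:45", "yesterday 3:10", "28 october 2018 5:32",
          "28 october 2018 8:32", "27 october 2018 5:32")
NewDate <- normalise_dates(Date, ref = as.Date("2018-10-31"))
NewDate
# [1] "2018-10-31" "2018-10-30" "2018-10-28" "2018-10-28" "2018-10-27"
```

In real use you would leave `ref` at its default, or pass the date on which the data were recorded, since "today" only makes sense relative to that day.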
Another solution, using indices.
Date <- c("today 10:45", "yesterday 3:10", "28 october 2018 5:32", "28 october 2018 8:32", "27 october 2018 5:32")
Date <- sub("today", Sys.Date(), Date)
Date <- sub("yesterday", Sys.Date() - 1, Date)
i <- grep("[[:alpha:]]", Date)
Date[i] <- format(as.POSIXct(Date[i], format = "%d %B %Y %H:%M"), format = "%d %B %Y")
Date[-i] <- format(as.POSIXct(Date[-i]), format = "%d %B %Y")
Date
#[1] "31 October 2018" "30 October 2018" "28 October 2018"
#[4] "28 October 2018" "27 October 2018"
Then I noticed that the solution by user r2evans converts everything to lowercase. So, if necessary, finish with
Date <- tolower(Date)
Related
I have columns like these:
year period period2 Sales
2015 201504 April 2015 10000
2015 201505 May 2015 11000
2018 201803 March 2018 12000
I want to convert the period or period2 column to a date type, for later use in time series analysis.
Data:
tibble::tibble(
year = c(2015,2015,2018),
period = c(201504, 201505,201803 ),
period2 = c("April 2015", "May 2015", "March 2018"),
Sales = c(10000,11000,12000)
)
Using the lubridate package, you can transform them into date variables:
df <- tibble::tibble(
year = c(2015,2015,2018),
period = c(201504, 201505,201803 ),
period2 = c("April 2015", "May 2015", "March 2018"),
Sales = c(10000,11000,12000)
)
library(dplyr)
df %>%
mutate(period = lubridate::ym(period),
period2 = lubridate::my(period2))
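If you would rather not add a dependency, the numeric period column can also be converted in base R. This is only a sketch under the assumption that every value has the form YYYYMM; the day is arbitrarily set to the first of the month, and the name `period_date` is just for illustration.

```r
period <- c(201504, 201505, 201803)

# Append "01" as an arbitrary day, then parse with a purely numeric format,
# which does not depend on the locale's month names
period_date <- as.Date(paste0(period, "01"), format = "%Y%m%d")
period_date
# [1] "2015-04-01" "2015-05-01" "2018-03-01"
```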
I have a data frame of 1968 observations and am trying to parse the date column from strings into dates. Something like this:
df$date <- c("january 2020","january 2020","january 2020","february 2020","february 2020","february 2020","march 2020","march 2020","march 2020","april 2020","april 2020","april 2020","May 2020","May 2020","May 2020","june 2020","june 2020","june 2020")
I am using lubridate package:
date <- my(df$date)
This gives an "857 failed to parse" warning and returns a vector like this:
[1] NA NA NA NA NA NA 2020-03-01 2020-03-01 2020-03-01 NA NA NA NA NA NA 2020-06-01 2020-06-01 2020-06-01 2020-06-01
Although I want the dates in this format (ymd), I would like all observations to be parsed. I have also tried:
date <- as.Date(df$date)
date <- my(df$date, format = "%B %Y")
but these return NA for all observations. What is happening?
thank you
as.Date(paste(1, df$date), '%d %B %Y')
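One possible cause (an assumption, the question does not say) is the session locale: `%B` is matched against the month names of the current locale, so English names can fail to parse elsewhere. Below is a sketch that avoids the locale entirely by matching against the built-in `month.name` vector. The function name `month_year_to_date` is hypothetical, and the input is assumed to look like "month year" with English month names in any case.

```r
# Hypothetical helper: parse "month year" strings without relying on the locale
month_year_to_date <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  mon <- match(tolower(vapply(parts, `[`, "", 1)), tolower(month.name))
  yr  <- as.integer(vapply(parts, `[`, "", 2))
  # Day is arbitrarily set to 1; unmatched months give NA rather than an error
  as.Date(sprintf("%04d-%02d-01", yr, mon), format = "%Y-%m-%d")
}

parsed <- month_year_to_date(c("january 2020", "february 2020", "May 2020", "june 2020"))
parsed
# [1] "2020-01-01" "2020-02-01" "2020-05-01" "2020-06-01"
```

Checking `Sys.getlocale("LC_TIME")` is a quick way to see whether the locale is a plausible culprit in your own session.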
my from the lubridate package should work like this:
library(dplyr)
library(lubridate)
df %>%
mutate(my_date = my(my_date)) # the column in the data below is called my_date
1 2020-01-01
2 2020-01-01
3 2020-01-01
4 2020-02-01
5 2020-02-01
6 2020-02-01
7 2020-03-01
8 2020-03-01
9 2020-03-01
10 2020-04-01
11 2020-04-01
12 2020-04-01
13 2020-05-01
14 2020-05-01
15 2020-05-01
16 2020-06-01
17 2020-06-01
18 2020-06-01
OR:
We could use parse_date_time from lubridate:
format(lubridate::parse_date_time(df$my_date, orders = c("m/Y")), "%m-%Y")
[1] "01-2020" "01-2020" "01-2020" "02-2020" "02-2020" "02-2020" "03-2020"
[8] "03-2020" "03-2020" "04-2020" "04-2020" "04-2020" "05-2020" "05-2020"
[15] "05-2020" "06-2020" "06-2020" "06-2020"
data:
df <- structure(list(my_date = c("january 2020", "january 2020", "january 2020",
"february 2020", "february 2020", "february 2020", "march 2020",
"march 2020", "march 2020", "april 2020", "april 2020", "april 2020",
"May 2020", "May 2020", "May 2020", "june 2020", "june 2020",
"june 2020")), class = "data.frame", row.names = c(NA, -18L))
I have a data frame
date df_discharge_cfs green_discharge_cfs north_discharge_cfs
1 December 2018 2520.1394 171.69667 338.81082
2 November 2018 3475.1023 239.00738 422.19063
3 October 2018 1863.4778 121.91720 200.94455
4 April 2019 3244.5356 260.38507 543.34792
5 August 2019 335.5074 14.95659 29.29938
6 February 2019 1631.3048 94.35956 198.19885
7 January 2019 1767.6266 132.69408 247.54493
8 July 2019 496.9439 26.37159 57.50114
9 June 2019 1097.2101 64.17292 143.40153
10 March 2019 1081.8046 80.32419 167.57954
11 May 2019 1507.8582 100.81569 236.58269
12 November 2019 2842.3542 284.72917 586.75000
13 October 2019 544.3002 34.67999 83.58193
14 September 2019 295.7200 11.37943 26.25823
and I want to change the "date" column into a 12-2018, 11-2018, etc. format like this:
date df_discharge_cfs green_discharge_cfs north_discharge_cfs
1 12-2018 2520.1394 171.69667 338.81082
2 11-2018 3475.1023 239.00738 422.19063
3 10-2018 1863.4778 121.91720 200.94455
4 04-2019 3244.5356 260.38507 543.34792
5 08-2019 335.5074 14.95659 29.29938
6 02-2019 1631.3048 94.35956 198.19885
7 01-2019 1767.6266 132.69408 247.54493
8 07-2019 496.9439 26.37159 57.50114
9 06-2019 1097.2101 64.17292 143.40153
10 03-2019 1081.8046 80.32419 167.57954
11 05-2019 1507.8582 100.81569 236.58269
12 11-2019 2842.3542 284.72917 586.75000
13 10-2019 544.3002 34.67999 83.58193
14 09-2019 295.7200 11.37943 26.25823
Currently the "date" column is in "character" format. How can I change this to a date or POSIXct format and so that it looks like it does above? Thanks.
We can use as.yearmon to convert to yearmon class and then change the format
library(zoo)
df1$date <- format(as.yearmon(df1$date, "%B %Y"), "%m-%Y")
df1$date
#[1] "12-2018" "11-2018" "10-2018" "04-2019" "08-2019" "02-2019" "01-2019" "07-2019" "06-2019" "03-2019" "05-2019" "11-2019" "10-2019"
#[14] "09-2019"
data
df1 <- structure(list(date = c("December 2018", "November 2018", "October 2018",
"April 2019", "August 2019", "February 2019", "January 2019",
"July 2019", "June 2019", "March 2019", "May 2019", "November 2019",
"October 2019", "September 2019"), df_discharge_cfs = c(2520.1394,
3475.1023, 1863.4778, 3244.5356, 335.5074, 1631.3048, 1767.6266,
496.9439, 1097.2101, 1081.8046, 1507.8582, 2842.3542, 544.3002,
295.72), green_discharge_cfs = c(171.69667, 239.00738, 121.9172,
260.38507, 14.95659, 94.35956, 132.69408, 26.37159, 64.17292,
80.32419, 100.81569, 284.72917, 34.67999, 11.37943),
north_discharge_cfs = c(338.81082,
422.19063, 200.94455, 543.34792, 29.29938, 198.19885, 247.54493,
57.50114, 143.40153, 167.57954, 236.58269, 586.75, 83.58193,
26.25823)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14"))
In base R, we can paste on an arbitrary day, convert to a Date object and then format it:
format(as.Date(paste0('1', df$date), '%d %B %Y'), '%m-%Y')
Another option uses regex and the built-in vector month.name (note that the month is not zero-padded here, e.g. "4-2019"):
with(df, paste(match(sub('\\s\\d+', '', date), month.name),
sub('.*\\s+', '', df$date), sep = '-'))
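The regex approach returns the month number without a leading zero, while the requested format is "04-2019". A small sketch of how `sprintf` could add the padding; the vector `date` here is a shortened stand-in for the question's column.

```r
date <- c("December 2018", "April 2019", "September 2019")

mon <- match(sub("\\s+\\d+$", "", date), month.name)  # month number, 1 to 12
yr  <- sub("^.*\\s+", "", date)                        # trailing year as text

# %02d pads the month to two digits
out <- sprintf("%02d-%s", mon, yr)
out
# [1] "12-2018" "04-2019" "09-2019"
```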
I want to convert this table :
Person startDate endDate
Person1 2018-12-31 2019-03-30
Person2 2018-12-31 2019-01-30
Person3 2019-02-01 2019-05-30
df1 <- data.frame(Person = paste0("Person", 1:3),
startDate = as.Date(c("31 12 2018", "31 12 2018", "01 02 2019"), format = "%d %m %Y"),
endDate = as.Date(c("30 03 2019", "30 01 2019", "30 05 2019"), format = "%d %m %Y"),
stringsAsFactors = FALSE)
to that table with R
In brief: convert the time periods between the start date and end date into a vertical table of months.
Thanks!
Here is one way
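library(dplyr)
library(purrr)
library(lubridate)
library(tidyr) # for unnest()
df1 %>%
  transmute(Person, Date = map2(startDate, endDate, ~
    # the columns are already Dates; start from the first of the month so
    # that month-end start dates (e.g. 31 December) do not skip short months
    seq(floor_date(.x, "month"), .y, by = "1 month") %>%
      format("%b %Y"))) %>%
  unnest(Date)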
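For comparison, here is a base R sketch of the same idea. The target table is not shown in the question, so the output layout (one row per person and month, with a column named `Month`) is an assumption, as is the "%Y-%m" label format, chosen here because it does not depend on the locale. Only two of the three people are used to keep the example short.

```r
df1 <- data.frame(Person = c("Person1", "Person2"),
                  startDate = as.Date(c("2018-12-31", "2018-12-31")),
                  endDate = as.Date(c("2019-03-30", "2019-01-30")),
                  stringsAsFactors = FALSE)

long <- do.call(rbind, lapply(seq_len(nrow(df1)), function(i) {
  # Snap the start to the first of its month before stepping month by month
  first <- as.Date(format(df1$startDate[i], "%Y-%m-01"))
  months <- seq(first, df1$endDate[i], by = "month")
  data.frame(Person = df1$Person[i], Month = format(months, "%Y-%m"),
             stringsAsFactors = FALSE)
}))
long
```

Indexing by row with `seq_len(nrow(df1))` is deliberate: iterating over a Date column directly with `lapply` or `Map` drops the Date class.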
My data has this format:
DF <- data.frame(ids = c("uniqueid1", "uniqueid1", "uniqueid1", "uniqueid2", "uniqueid2", "uniqueid2", "uniqueid2", "uniqueid3", "uniqueid3", "uniqueid3", "uniqueid4", "uniqueid4", "uniqueid4"), stock_year = c("April 2014", "March 2012", "April 2014", "January 2017", "January 2016", "January 2015", "January 2014", "November 2011", "November 2011", "December 2009", "August 2001", "July 2000", "May 1999"))
ids stock_year
1 uniqueid1 April 2014
2 uniqueid1 March 2012
3 uniqueid1 April 2014
4 uniqueid2 January 2017
5 uniqueid2 January 2016
6 uniqueid2 January 2015
7 uniqueid2 January 2014
8 uniqueid3 November 2011
9 uniqueid3 November 2011
10 uniqueid3 December 2009
11 uniqueid4 August 2001
12 uniqueid4 July 2000
13 uniqueid4 May 1999
How can I remove all rows belonging to any id that has the same stock_year value more than once?
An example output of expected results is this:
DF <- data.frame(ids = c("uniqueid2", "uniqueid2", "uniqueid2", "uniqueid2", "uniqueid4", "uniqueid4", "uniqueid4"), stock_year = c("January 2017", "January 2016", "January 2015", "January 2014", "August 2001", "July 2000", "May 1999"))
ids stock_year
1 uniqueid2 January 2017
2 uniqueid2 January 2016
3 uniqueid2 January 2015
4 uniqueid2 January 2014
5 uniqueid4 August 2001
6 uniqueid4 July 2000
7 uniqueid4 May 1999
We can group by 'ids' and check for duplicates to filter those 'ids' having no duplicates
library(dplyr)
DF %>%
group_by(ids) %>%
filter(!anyDuplicated(stock_year))
Or using ave from base R
DF[with(DF, ave(as.character(stock_year), ids, FUN = anyDuplicated) == 0), ] # keep ids where anyDuplicated() is 0, i.e. no repeats
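Another base R sketch, in case it reads more clearly: first collect the ids that have a repeated (ids, stock_year) pair, then drop every row belonging to those ids. The names `dup_ids` and `res` are just for illustration, and the data is a shortened version of the question's DF.

```r
DF <- data.frame(
  ids = c("uniqueid1", "uniqueid1", "uniqueid1", "uniqueid2", "uniqueid2",
          "uniqueid3", "uniqueid3", "uniqueid4"),
  stock_year = c("April 2014", "March 2012", "April 2014", "January 2017",
                 "January 2016", "November 2011", "November 2011", "May 1999"),
  stringsAsFactors = FALSE
)

# ids with at least one repeated (ids, stock_year) combination
dup_ids <- unique(DF$ids[duplicated(DF[c("ids", "stock_year")])])

# keep only rows whose id never repeats a stock_year
res <- DF[!DF$ids %in% dup_ids, ]
res
```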