Convert to date format in R

I have been trying to convert a date from factor format to date format but I've been facing errors every time. The data is of the format
Mon/Yr
201701
201602
201506
Currently the values are of type factor. I want to convert them to dates. I've used the following code, but I keep getting NA values:
as.character(x$`Mon/Yr`)
as.POSIXct(x$`Mon/Yr`, format = '%y%m')
Output: [1] NA NA NA
I've followed example solutions from many posts but I'm not able to fix it. Can you please suggest a fix for this?

library(lubridate)
library(dplyr)
df <- data.frame("Mon/Yr" = c(201701, 201602, 201506))
df
  Mon.Yr
1 201701
2 201602
3 201506
df2 <- df %>%
  dplyr::mutate(Mon.Yr = lubridate::parse_date_time(Mon.Yr, '%Y%m'),
                Mon.Yr = base::format(Mon.Yr, "%Y-%m"))
df2
   Mon.Yr
1 2017-01
2 2016-02
3 2015-06
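For a base R route that returns an actual Date rather than a "YYYY-MM" string, one option is to append a day before parsing. The attempt in the question likely returned NA for two reasons: %y expects a two-digit year (these values need %Y), and a date cannot be built from a year and a month alone. A minimal sketch, assuming the first day of the month is an acceptable placeholder; the variable names ym and dates are illustrative:

```r
# Factor column as described in the question
x <- data.frame(`Mon/Yr` = factor(c("201701", "201602", "201506")),
                check.names = FALSE)

# Convert factor -> character first, then append "01" as a placeholder day
ym <- as.character(x$`Mon/Yr`)
dates <- as.Date(paste0(ym, "01"), format = "%Y%m%d")

class(dates)            # "Date"
format(dates, "%Y-%m")  # year-month label, for display only
```

Keeping the column as Date and formatting it only for display preserves sorting and date arithmetic.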

Related

Is there a possibility to lag values of a data frame in r indexed by time?

My question concerns lagging data in R where R should be aware of the time index. I hope this has not already been asked in another thread. Let's consider a simple setup:
df <- data.frame(date=as.Date(c("1990-01-01","1990-02-01","1990-01-15","1990-03-01","1990-05-01","1990-07-01","1993-01-02")), value=1:7)
This code should generate a table like
date       value
1990-01-01 1
1990-02-01 2
1990-01-15 3
1990-03-01 4
1990-05-01 5
1990-07-01 6
My aim is to lag "value" by, for example, one month. When I compute the lagged value for "1990-05-01", the lagged date would be 1990-04-01, which is not present in the data, so that row should get an NA. When I use the standard lag function, R is not aware of the time index and simply uses the value 4 from 1990-03-01, which is not what I want. Does anyone have an idea what I could do here?
Thanks in advance! :)
All the best,
Leon
You can use %m-% from lubridate to shift each date back by one month and look up the value at that date with match, like below:
library(lubridate)
transform(
df,
value_lag = value[match(date %m-% months(1), date)]
)
which gives
date value value_lag
1 1990-01-01 1 NA
2 1990-02-01 2 1
3 1990-01-15 3 NA
4 1990-03-01 4 2
5 1990-05-01 5 NA
6 1990-07-01 6 NA
7 1993-01-02 7 NA
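The same lookup can be sketched without lubridate by shifting the month field of a POSIXlt copy of the dates. This is an illustrative alternative, not the method from the answer above, and it should behave differently at month ends: as far as I know, base R normalizes an invalid date such as February 31 forward into March, whereas %m-% rolls back to the last valid day of the month. The helper name prev is an assumption made for this example.

```r
df <- data.frame(
  date = as.Date(c("1990-01-01", "1990-02-01", "1990-01-15", "1990-03-01",
                   "1990-05-01", "1990-07-01", "1993-01-02")),
  value = 1:7
)

# Shift every date back one calendar month via the POSIXlt month field
prev <- as.POSIXlt(df$date)
prev$mon <- prev$mon - 1
prev <- as.Date(prev)

# Look up the value stored at the shifted date; NA when that date is absent
df$value_lag <- df$value[match(prev, df$date)]
df
```

Because match works on the dates themselves, the rows do not need to be sorted for the lookup to be correct.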
For an example with multiple columns, let's consider:
df <- data.frame(date=as.Date(c("1990-01-01","1990-02-01","1990-01-15","1990-03-01","1990-05-01","1990-07-01","1993-01-02")), value=1:7, value2=7:13)
I recently found the following solution myself:
library(dplyr)
library(lubridate)
lags <- 1  # number of months to lag by
df %>%
  as_tibble() %>%
  mutate(across(2:ncol(df), .fns = function(x) {x[match(date %m-% months(lags), date)]}, .names = "{.col}_lag"))
Thanks to your code, @ThomasisCoding. :)

Change NA to blank in date time in R

This might be a simple question, but I have tried a few things and none of them work.
I have a large data frame with date/time columns. An example of my data frame is:
Index FixTime1 FixTime2
1 2017-05-06 10:11:03 NA
2 NA 2017-05-07 11:03:03
I want to replace all NAs in the data frame with "" (blank). I have tried:
df[is.na(df)]<-""
but this gives the error:
Error in as.POSIXlt.character(value) :
character string is not in a standard unambiguous format
Again, this is probably very simple to fix, but I can't find how to do it while keeping each of these columns in date/time format.
We can use replace (note that this converts every column to character):
df[] <- replace(as.matrix(df), is.na(df), "")
df
# Index FixTime1 FixTime2
#1 1 2017-05-06 10:11:03
#2 2 2017-05-07 11:03:03
Here is a possible solution on a toy dataset; adapt this code to your needs:
df<-data.frame(date=c("01/01/2017",NA,"01/02/2017"))
df
date
1 01/01/2017
2 <NA>
3 01/02/2017
Convert from factor to character, and then replace the NA:
df$date <- as.character(df$date)
df[is.na(df$date),]<-""
df
date
1 01/01/2017
2
3 01/02/2017
In your specific example, this could be fine:
df_2<-data.frame(Index=c(1,2),
                 FixTime1=c("2017-05-06 10:11:03",NA),
                 FixTime2=c(NA,"2017-05-07 11:03:03"))
df_2<-data.frame(lapply(df_2, as.character), stringsAsFactors=FALSE)
df_2[is.na(df_2$FixTime1),"FixTime1"]<-""
df_2[is.na(df_2$FixTime2),"FixTime2"]<-""
df_2
Index FixTime1 FixTime2
1 1 2017-05-06 10:11:03
2 2 2017-05-07 11:03:03
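A date-time column cannot hold "" as a value, which is why the assignment in the question fails: R tries to parse the empty string as a date-time. One way to reconcile the two goals is to keep the original columns as POSIXct for computation and build a separate character copy for display or export. A minimal sketch, assuming the columns are POSIXct; blank_na_times and df_display are hypothetical names introduced for this example:

```r
df <- data.frame(
  Index = 1:2,
  FixTime1 = as.POSIXct(c("2017-05-06 10:11:03", NA), tz = "UTC"),
  FixTime2 = as.POSIXct(c(NA, "2017-05-07 11:03:03"), tz = "UTC")
)

# Format one date-time vector as text, showing "" where the value is missing
blank_na_times <- function(x) {
  s <- format(x, "%Y-%m-%d %H:%M:%S")
  s[is.na(x)] <- ""
  s
}

# Apply the formatting only to the date-time columns; df itself is untouched
is_time <- vapply(df, inherits, logical(1), what = "POSIXct")
df_display <- df
df_display[is_time] <- lapply(df[is_time], blank_na_times)
df_display
```

Here df keeps its date-time classes, while df_display is suitable for printing or for write.csv.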

Filter a data frame by two time series

Hi, I am new to R and would like to know if there is a simple way to filter data over multiple date ranges.
I have data with dates from 07/03/2003 to 31/12/2016.
I need to split/filter the data by multiple date ranges, as per below.
Dates required in the new data frame:
07/03/2003 to 06/03/2005
and
01/01/2013 to 31/12/2016
i.e. the new data frame should not include dates from 07/03/2005 to 31/12/2012.
Let's take the following data.frame with dates:
df <- data.frame( date = c(ymd("2017-02-02"),ymd("2016-02-02"),ymd("2014-02-01"),ymd("2012-01-01")))
date
1 2017-02-02
2 2016-02-02
3 2014-02-01
4 2012-01-01
I can filter this for a range of dates using lubridate::ymd, dplyr::filter and dplyr::between:
df1 <- filter(df, between(date, ymd("2017-01-01"), ymd("2017-03-01")))
date
1 2017-02-02
Or:
df2 <- filter(df, between(date, ymd("2013-01-01"), ymd("2014-04-01")))
date
1 2014-02-01
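To keep both ranges from the question in a single step, the two range tests can be combined with a logical OR. A minimal base R sketch; the sample dates are made up for illustration, and in_first, in_second and kept are hypothetical names:

```r
# Made-up sample covering both wanted ranges and the excluded gap
df <- data.frame(date = as.Date(c("2003-03-07", "2004-06-15", "2005-03-07",
                                  "2010-01-01", "2013-01-01", "2016-12-31")))

# One logical vector per wanted range (both endpoints inclusive)
in_first  <- df$date >= as.Date("2003-03-07") & df$date <= as.Date("2005-03-06")
in_second <- df$date >= as.Date("2013-01-01") & df$date <= as.Date("2016-12-31")

# Keep rows that fall in either range; drop = FALSE keeps a data frame
kept <- df[in_first | in_second, , drop = FALSE]
kept
```

The same condition should carry over to dplyr by writing the two between() calls inside filter() and joining them with |.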
I would go with lubridate. In particular:
library(data.table)
library(lubridate)
set.seed(555)#in order to be reproducible
N <- 1000#number of pseudonumbers to be generated
date1<-dmy("07-03-2003")
date2<-dmy("06-03-2005")
date3<-dmy("01-01-2013")
date4<-dmy("31-12-2016")
Creating data table with two columns (dates and numbers):
my_dt<-data.table(date_sample=sample(seq(date1, date4, by="day"), N), numeric_sample=sample(N, replace = FALSE))
> head(my_dt)
date_sample numeric_sample
1: 2007-04-11 2
2: 2006-04-20 71
3: 2007-12-20 46
4: 2016-05-23 78
5: 2011-10-07 5
6: 2003-09-10 47
Let's impose some cuts:
forbidden_dates<-interval(date2+1,date3-1)#create interval that dates should not fall in.
> forbidden_dates
[1] 2005-03-07 UTC--2012-12-31 UTC
test_date1<-dmy("08-03-2003")#should not fall in above range
test_date2<-dmy("08-03-2005")#should fall in above range
Therefore:
test_date1 %within% forbidden_dates
[1] FALSE
test_date2 %within% forbidden_dates
[1] TRUE
A good way of visualizing the cut is to plot the data before and after applying it:
plot(my_dt)  # before
my_dt<-my_dt[!(date_sample %within% forbidden_dates)]  # applying the temporal cut
plot(my_dt)  # after

Reshape data using colsplit in R

I tried to reshape my data using this code, but I get NA values.
require(reshape2)
dates=data.frame(dates=seq(as.Date("1988-01-01"),as.Date("2011-12-31"),by="day"))
first=dates[,1]
dates1=cbind(dates[,1],colsplit(first,pattern="\\-",names=c("Year","Month","Day")))###split by y/m/day
head(dates1)
dates[, 1] Year Month Day
1 1988-01-01 6574 NA NA
2 1988-01-02 6575 NA NA
3 1988-01-03 6576 NA NA
4 1988-01-04 6577 NA NA
5 1988-01-05 6578 NA NA
6 1988-01-06 6579 NA NA
We can use cSplit from splitstackshape to split the 'dates' column by the delimiter -.
library(splitstackshape)
cSplit(dates, 'dates', '-', drop=FALSE)
Or extract to create additional columns
library(tidyr)
extract(dates, dates, into=c('Year', 'Month', 'Day'),
'([^-]+)-([^-]+)-([^-]+)', remove=FALSE)
Or another option from tidyr (suggested by @Ananda Mahto)
separate(dates, dates, into = c("Year", "Month", "Day"), remove=FALSE)
Or using read.table from base R. We specify the sep and the column names, and cbind with the original column.
cbind(dates[1],read.table(text=as.character(dates$dates),
sep='-', col.names=c('Year', 'Month', 'Day')))
By using reshape2_1.4.1, I could reproduce the error
head(cbind(dates[,1],colsplit(first,pattern="-",
names=c("Year","Month","Day"))),2)
# dates[, 1] Year Month Day
#1 1988-01-01 6574 NA NA
#2 1988-01-02 6575 NA NA
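The value 6574 in the Year column is the number of days from 1970-01-01 to 1988-01-01, which suggests the Date vector was reduced to its underlying number before the split, leaving no "-" to split on. Since the column is already a Date, the parts can also be pulled out with format, with no string splitting at all. A minimal base R sketch on a shortened date range:

```r
dates <- data.frame(dates = seq(as.Date("1988-01-01"), as.Date("1988-01-06"),
                                by = "day"))

# Extract each component with format(), then convert the text to integer
dates1 <- data.frame(
  dates = dates$dates,
  Year  = as.integer(format(dates$dates, "%Y")),
  Month = as.integer(format(dates$dates, "%m")),
  Day   = as.integer(format(dates$dates, "%d"))
)
head(dates1)
```

Converting the column with as.character before splitting is another way to make sure a splitter sees text of the form "1988-01-01".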

How to add dates not in a data frame with a count of 0?

I have data for some dates with counts; the other dates, where the event I'm counting didn't occur, do not appear in this data set. In order to do some analysis, I'd like to create a data frame that includes those missing dates but with a count of 0. Here is what some data might look like:
mydates <- c("2013-10-01", "2013-10-04", "2013-10-05", "2013-10-08")
mycounts <- c(2,4,3,1)
df <- data.frame(mydates,mycounts)
I know how to create a vector with all the dates:
alldates <- seq.Date(as.Date("2013-10-01"), as.Date("2013-10-08"), "days")
What I want to do is check whether each item in alldates exists in df$mydates; if it does, then use the corresponding count from the data frame in a new vector and if it doesn't, use 0 as the count in a new vector. But I'm not having much luck. For example, this
mycount <- ifelse(alldates %in% df$mydates, df$mycounts, 0)
gives me a vector but an inaccurate one.
Thanks for any help!
mydates <- c("2013-10-01", "2013-10-04", "2013-10-05", "2013-10-08")
mycounts <- c(2,4,3,1)
# Store the dates as Date so they match the type of the full date sequence
df <- data.frame(mydates = as.Date(mydates), mycounts)
alldates <- data.frame(
  mydates = seq.Date(as.Date("2013-10-01"), as.Date("2013-10-08"), "days")
)
out <- merge(
  alldates,
  df,
  all = TRUE
)
out$mycounts[is.na(out$mycounts)] <- 0  # dates without an event get a count of 0
out
Output -
     mydates mycounts
1 2013-10-01        2
2 2013-10-02        0
3 2013-10-03        0
4 2013-10-04        4
5 2013-10-05        3
6 2013-10-06        0
7 2013-10-07        0
8 2013-10-08        1
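The ifelse attempt in the question goes wrong because ifelse picks elements of df$mycounts by position, recycling the four counts across the eight dates, rather than by matching date. A minimal base R sketch of the lookup the question describes, using match to find each date's position; idx, allcounts and result are illustrative names, and both date vectors are assumed to be of class Date:

```r
mydates  <- as.Date(c("2013-10-01", "2013-10-04", "2013-10-05", "2013-10-08"))
mycounts <- c(2, 4, 3, 1)
alldates <- seq(as.Date("2013-10-01"), as.Date("2013-10-08"), by = "day")

# Position of each calendar date within the observed dates (NA if unobserved)
idx <- match(alldates, mydates)

# Use the observed count where there is a match, otherwise 0
allcounts <- ifelse(is.na(idx), 0, mycounts[idx])

result <- data.frame(mydates = alldates, mycounts = allcounts)
result
```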
