I want to generate a data frame containing dates based on the date that I choose at the beginning as ReportDate:
ReportDate <- as.Date("2014-01-01")
date <- data.frame(matrix(nrow=60, ncol=1))
for(i in 1:60){
date[i,1] = as.Date(ReportDate+i-1, origin="%Y-%m-%d")
}
But it gives me numeric values as output, not dates. Please tell me how I can solve this problem.
You can just add integers to Date class objects directly:
ReportDate <- as.Date("2014-01-01")
DateDf <- data.frame(
date=ReportDate+(0:59))
##
> head(DateDf)
date
1 2014-01-01
2 2014-01-02
3 2014-01-03
4 2014-01-04
5 2014-01-05
6 2014-01-06
> str(DateDf)
'data.frame': 60 obs. of 1 variable:
$ date: Date, format: "2014-01-01" "2014-01-02" "2014-01-03" ...
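The loop in the question most likely returns numbers because each Date is assigned into an existing non-Date column, which keeps only the underlying day count and drops the class. As a sketch of an equivalent alternative to adding 0:59, seq() can build the same sequence. The names dates_seq and DateDf2 are made up for this illustration.

```r
ReportDate <- as.Date("2014-01-01")

# seq() on a Date returns a Date vector; length.out fixes the number of days
dates_seq <- seq(ReportDate, by = "day", length.out = 60)

# Building the data frame from a Date vector keeps the Date class
DateDf2 <- data.frame(date = dates_seq)

# Same dates as adding the integer offsets 0:59
all(dates_seq == ReportDate + 0:59)
class(DateDf2$date)
```

seq() is handy when the step is not one day, for example by = "week" or by = "month".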
I am trying to read dates from different Excel files, and each of them has the dates stored in a different format (character or date). As a result, the date column of each file is read either as a character string such as "28/02/2020" or as the number Excel uses internally for dates, such as "452344" (number of days since 1900).
files1 = list.files(pattern="*.xlsx")
df = lapply(files1, read_excel,col_types = "text")
df = do.call(rbind, df)
How can I make R read the character form "28/02/2020" and not the numeric form "452344"?
For multiple date formats in one column, I suggest using lubridate::parse_date_time() (or any other date converter that returns NA for a format it cannot parse instead of stopping with an error).
I assume your df should look something like this:
# A tibble: 6 x 2
id date
<chr> <chr>
1 1 43889
2 2 43889
3 3 43889
4 1 28/02/2020
5 2 28/02/2020
6 3 28/02/2020
Then you should use this code:
library(lubridate)
df <- as.data.frame(df)
df$date2 <- parse_date_time(x = df$date, orders = "d m y") #converts rows like "28/02/2020" to date
df[is.na(df$date2),"date2"] <- as.Date(as.numeric(df[is.na(df$date2),"date"]), origin = "1899-12-30") #converts rows like "43889"
R output:
id date date2
1 1 43889 2020-02-28
2 2 43889 2020-02-28
3 3 43889 2020-02-28
4 1 28/02/2020 2020-02-28
5 2 28/02/2020 2020-02-28
6 3 28/02/2020 2020-02-28
str(df)
'data.frame': 6 obs. of 3 variables:
$ id : chr "1" "2" "3" "1" ...
$ date : chr "43889" "43889" "43889" "28/02/2020" ...
$ date2: POSIXct, format: "2020-02-28" "2020-02-28" "2020-02-28" "2020-02-28" ...
I know it is not the nicest solution, but it should work for you as well.
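The same two-step idea can also be sketched in base R without lubridate. This is only an illustration under the assumption that every entry is either an all-digit Excel serial number or a dd/mm/yyyy string. The helper name parse_mixed_excel_dates is hypothetical, and the result is a Date column rather than POSIXct.

```r
# Hypothetical helper: convert a character vector that mixes Excel serial
# numbers ("43889") and day/month/year strings ("28/02/2020") to Date.
parse_mixed_excel_dates <- function(x) {
  out <- rep(as.Date(NA), length(x))

  # Entries made only of digits are treated as Excel serial day counts
  is_serial <- grepl("^[0-9]+$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")

  # Everything else is parsed as dd/mm/yyyy; unparseable strings stay NA
  out[!is_serial] <- as.Date(x[!is_serial], format = "%d/%m/%Y")
  out
}

parse_mixed_excel_dates(c("43889", "28/02/2020", "not a date"))
# [1] "2020-02-28" "2020-02-28" NA
```

Checking the pattern first, instead of relying on a failed parse, makes it explicit which rows were treated as serial numbers.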
I have some dates that I am trying to convert to numbers and then back to the original dates.
Date
1990-12-31
1991-12-31
1992-12-31
1993-12-31
1994-06-30
1994-12-31
I tried,
as.numeric(DF[1:6])
[1] 1 2 3 5 7 8
as.Date(as.numeric(DF[1:6]), "1990-12-31")
[1] "1991-01-01" "1991-01-02" "1991-01-03" "1991-01-05" "1991-01-07" "1991-01-08"
I notice that the intervals between the dates are wrong. What should I do to get the original dates back?
If what you have is a data frame with a column of class factor, as shown reproducibly in the Note at the end, then we do not want to apply as.numeric to it, since that just gives the underlying integer codes of the factor levels, which are not meaningful as dates. Rather, this gives Date class:
d <- as.Date(DF$Date)
d
## [1] "1990-12-31" "1991-12-31" "1992-12-31" "1993-12-31" "1994-06-30"
## [6] "1994-12-31"
and this gives the number of days since the UNIX Epoch:
no <- as.numeric(d)
no
## [1] 7669 8034 8400 8765 8946 9130
and this turns that back to Date class:
as.Date(no, "1970-01-01")
## [1] "1990-12-31" "1991-12-31" "1992-12-31" "1993-12-31" "1994-06-30"
## [6] "1994-12-31"
Note
Lines <- "
Date
1990-12-31
1991-12-31
1992-12-31
1993-12-31
1994-06-30
1994-12-31 "
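# stringsAsFactors = TRUE makes Date a factor column in every R version (the default changed in R 4.0.0)
DF <- read.table(text = Lines, header = TRUE, stringsAsFactors = TRUE)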
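To make the factor pitfall and the round trip concrete, here is a small self-contained sketch. The vector f is built by hand for the illustration and stands in for a factor column such as DF$Date.

```r
# A factor of date strings, built explicitly for this example
f <- factor(c("1990-12-31", "1991-12-31", "1992-12-31"))

as.numeric(f)                    # level codes 1 2 3, not dates

d <- as.Date(as.character(f))    # go through character to get Date class
n <- as.numeric(d)               # days since 1970-01-01
back <- as.Date(n, origin = "1970-01-01")

all(back == d)                   # TRUE: the round trip recovers the dates
```

The origin only matters on the way back: it has to be the same epoch the numbers were counted from, which is 1970-01-01 for R's Date class.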
Hi, I am new to R and would like to know if there is a simple way to filter data over multiple date ranges.
I have data with dates from 07/03/2003 to 31/12/2016.
I need to split/filter the data into multiple time periods, as shown below.
Dates required in the new data frame:
07/03/2003 to 06/03/2005
and
01/01/2013 to 31/12/2016
i.e. the new data frame should not include dates from 07/03/2005 to 31/12/2012.
Let's take the following data.frame with dates:
library(lubridate) # ymd()
library(dplyr) # filter(), between()
df <- data.frame( date = c(ymd("2017-02-02"),ymd("2016-02-02"),ymd("2014-02-01"),ymd("2012-01-01")))
date
1 2017-02-02
2 2016-02-02
3 2014-02-01
4 2012-01-01
I can filter this for a range of dates using lubridate::ymd, dplyr::filter and dplyr::between:
df1 <- filter(df, between(date, ymd("2017-01-01"), ymd("2017-03-01")))
date
1 2017-02-02
Or:
df2 <- filter(df, between(date, ymd("2013-01-01"), ymd("2014-04-01")))
date
1 2014-02-01
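For the two windows asked about in the question, the range tests can be combined with a logical OR. Below is a minimal base R sketch on made-up data (one row per day across the full span); dat, in_first, in_second and kept are illustrative names. With dplyr the same idea would be two between() calls joined by | inside filter().

```r
# Made-up example data: one row per day across the span in the question
dat <- data.frame(
  date = seq(as.Date("2003-03-07"), as.Date("2016-12-31"), by = "day")
)

# One logical vector per window that should be kept
in_first  <- dat$date >= as.Date("2003-03-07") & dat$date <= as.Date("2005-03-06")
in_second <- dat$date >= as.Date("2013-01-01") & dat$date <= as.Date("2016-12-31")

# Keep a row if it falls in either window
kept <- dat[in_first | in_second, , drop = FALSE]

range(kept$date)
# No kept date lies in the excluded gap
any(kept$date > as.Date("2005-03-06") & kept$date < as.Date("2013-01-01"))
```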
I would go with lubridate. In particular:
library(data.table)
library(lubridate)
set.seed(555)#in order to be reproducible
N <- 1000#number of pseudonumbers to be generated
date1<-dmy("07-03-2003")
date2<-dmy("06-03-2005")
date3<-dmy("01-01-2013")
date4<-dmy("31-12-2016")
Creating data table with two columns (dates and numbers):
my_dt<-data.table(date_sample=sample(seq(date1, date4, by="day"), N),numeric_sample=sample(N,replace = FALSE))
> head(my_dt)
date_sample numeric_sample
1: 2007-04-11 2
2: 2006-04-20 71
3: 2007-12-20 46
4: 2016-05-23 78
5: 2011-10-07 5
6: 2003-09-10 47
Let's impose some cuts:
forbidden_dates<-interval(date2+1,date3-1)#create interval that dates should not fall in.
> forbidden_dates
[1] 2005-03-07 UTC--2012-12-31 UTC
test_date1<-dmy("08-03-2003")#should not fall in above range
test_date2<-dmy("08-03-2005")#should fall in above range
Therefore:
test_date1 %within% forbidden_dates
[1] FALSE
test_date2 %within% forbidden_dates
[1] TRUE
A good way of visualizing the cut:
before
>plot(my_dt)
my_dt<-my_dt[!(date_sample %within% forbidden_dates)]#applying the temporal cut
after
>plot(my_dt)
I am trying to read data from a file that holds dates and times, and wrote the following code to concatenate the columns Date and Time into one column named DateTime:
df <-read.csv("file", header=TRUE)
df = data.frame(DateTime=as.POSIXct(paste(df$Date, df$Time)), df)
The problem is that the output holds only the Date and not the Time.
I also tried to change the format of the data with df$Date <- as.Date(df$Date , "%y/%m/%d") but the output is NA.
Please advise.
The file sample is here:
Date,Time
2011/12/22,02:00:00
2011/12/22,02:01:00
2011/12/22,02:02:00
2011/12/22,02:03:00
2011/12/22,02:04:00
2011/12/22,02:05:00
2011/12/22,02:06:00
2011/12/22,02:07:00
2011/12/22,02:08:00
2011/12/22,02:09:00
2011/12/22,02:10:00
2011/12/22,02:11:00
2011/12/22,02:12:00
2011/12/22,02:13:00
2011/12/22,02:14:00
2011/12/22,02:15:00
2011/12/22,02:16:00
2011/12/22,02:17:00
2011/12/22,02:18:00
2011/12/22,02:19:00
2011/12/22,02:20:00
Try
df$datetime <- as.POSIXct(paste(df$Date, df$Time), format="%Y/%m/%d %H:%M:%S")
df$Date <- as.Date(df$Date, "%Y/%m/%d")
head(df,3)
#        Date     Time            datetime
#1 2011-12-22 02:00:00 2011-12-22 02:00:00
#2 2011-12-22 02:01:00 2011-12-22 02:01:00
#3 2011-12-22 02:02:00 2011-12-22 02:02:00
str(df)
#'data.frame': 21 obs. of 3 variables:
#$ Date : Date, format: "2011-12-22" "2011-12-22" ...
#$ Time : chr "02:00:00" "02:01:00" "02:02:00" "02:03:00" ...
#$ datetime: POSIXct, format: "2011-12-22 02:00:00" "2011-12-22 02:01:00" ...
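The NA in the question comes from the format string: %y expects a two-digit year, while the file has four-digit years, which need %Y. A small sketch on two rows copied from the sample file shows the difference. The name df_demo and the choice of tz = "UTC" are assumptions made for this example.

```r
# Two rows copied from the sample file
df_demo <- data.frame(Date = c("2011/12/22", "2011/12/22"),
                      Time = c("02:00:00", "02:01:00"))

# %y expects a two-digit year, so the four-digit year fails to parse
as.Date(df_demo$Date, "%y/%m/%d")   # NA NA

# %Y expects a four-digit year and parses as intended
as.Date(df_demo$Date, "%Y/%m/%d")

# tz = "UTC" is an assumption; use the time zone the data was recorded in
dt <- as.POSIXct(paste(df_demo$Date, df_demo$Time),
                 format = "%Y/%m/%d %H:%M:%S", tz = "UTC")
format(dt, "%Y-%m-%d %H:%M:%S")
```

Passing format explicitly avoids depending on which input formats as.POSIXct tries by default.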
I have data for some dates with counts; the other dates, where the event I'm counting didn't occur, do not appear in this data set. In order to do some analysis, I'd like to create a data frame that includes those missing dates but with a count of 0. Here is what some data might look like:
mydates <- c("2013-10-01", "2013-10-04", "2013-10-05", "2013-10-08")
mycounts <- c(2,4,3,1)
df <- data.frame(mydates,mycounts)
I know how to create a vector with all the dates:
alldates <- seq.Date(as.Date("2013-10-01"), as.Date("2013-10-08"), "days")
What I want to do is check whether each item in alldates exists in df$mydates; if it does, then use the corresponding count from the data frame in a new vector and if it doesn't, use 0 as the count in a new vector. But I'm not having much luck. For example, this
mycount <- ifelse(alldates %in% df$mydates, df$mycounts, 0)
gives me a vector but an inaccurate one.
Thanks for any help!
mydates <- c("2013-10-01", "2013-10-04", "2013-10-05", "2013-10-08")
mycounts <- c(2,4,3,1)
# Convert mydates to Date class so the merge key has the same type in both data frames
df <- data.frame(mydates = as.Date(mydates), mycounts)
alldates <- data.frame(
mydates = seq.Date(as.Date("2013-10-01"), as.Date("2013-10-08"), "days")
)
# Keep every row of alldates; dates missing from df get NA counts
res <- merge(
alldates,
df,
all.x = TRUE
)
# Replace the NA counts of the missing dates with 0
res$mycounts[is.na(res$mycounts)] <- 0
res
Output -
mydates mycounts
1 2013-10-01 2
2 2013-10-02 0
3 2013-10-03 0
4 2013-10-04 4
5 2013-10-05 3
6 2013-10-06 0
7 2013-10-07 0
8 2013-10-08 1
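The ifelse attempt in the question can also be repaired directly. It is inaccurate because df$mycounts has only four elements and is recycled by position, so counts are picked by row number rather than by date. A sketch using match(), with idx as an illustrative name, looks up each calendar date in the observed dates instead:

```r
mydates  <- c("2013-10-01", "2013-10-04", "2013-10-05", "2013-10-08")
mycounts <- c(2, 4, 3, 1)
alldates <- seq(as.Date("2013-10-01"), as.Date("2013-10-08"), by = "day")

# Position of each calendar date within the observed dates (NA when absent)
idx <- match(alldates, as.Date(mydates))

# Take the matching count where there is one, otherwise 0
mycount <- ifelse(is.na(idx), 0, mycounts[idx])
mycount
# [1] 2 0 0 4 3 0 0 1
```

Converting mydates with as.Date() before matching keeps both sides of the comparison in the same class.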