Select dates between ranges in R

In my dataset, I have a date variable in this format: "2020-01-01"
This variable is stored with class "Date".
This code works:
dataset[which(dataset$date_variable > 2020-01-01),]
This code also works:
dataset[which(dataset$date_variable > 2020-01-19),]
But when I combine the two conditions I get no output:
dataset[which(dataset$date_variable > 2020-01-01 & dataset$date_variable < 2020-01-19),]
# produces an empty result
How can I correct this code? How do I subset rows within a date range in R? Should I perhaps convert the variable to another type?

Unquoted, 2018-01-25 means 2018 minus 1 minus 25, i.e. the number 1992. Surround the dates with quotes, since Date objects can be compared to character representations. Using the reproducible input in the Note at the end, we have the following.
x[x > "2018-01-24" & x < "2018-01-26"]
## [1] "2018-01-25"
Note
x <- structure(c(17556, 17555, 17554), class = "Date")
x
## [1] "2018-01-25" "2018-01-24" "2018-01-23"
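To tie this back to the question, here is a minimal sketch using a made-up data frame. The names dataset and date_variable come from the question; the rows and the id column are invented for illustration. A Date is stored as a number of days since 1970-01-01 (around 18262 for early 2020), so comparing it with 2018 or 2000 is always TRUE for ">" and always FALSE for "<", which is why each single condition seemed to work and the combined one returned nothing.

```r
# Hypothetical data standing in for the asker's `dataset`
dataset <- data.frame(
  id = 1:4,
  date_variable = as.Date(c("2019-12-30", "2020-01-05", "2020-01-18", "2020-02-01"))
)

# Unquoted "dates" are plain arithmetic
2020-01-01   # 2018
2020-01-19   # 2000

# Explicit Date bounds (quoted strings would also work, as noted above)
lower <- as.Date("2020-01-01")
upper <- as.Date("2020-01-19")

in_range <- dataset$date_variable > lower & dataset$date_variable < upper
result <- dataset[in_range, ]
result   # keeps the rows with id 2 and 3
```

Use >= and <= instead if the end points should be included.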

Related

data frame with mixed date format

I would like to convert all the mixed date formats into a single format, for example d-m-y.
Here is the data frame:
x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
I have tried the code below, but it gives NAs:
newdateformat <- as.Date(x$Birthdate,
format = "%m%d%y", origin = "2020-6-25")
newdateformat
Then I tried parse_date_time, but it also gives NAs, which means it failed to parse:
require(lubridate)
parse_date_time(my_data$Birthdate, orders = c("ymd", "mdy"))
[1] NA NA "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"
I also couldn't find out what format the first date in the data frame, "36085.0", is in.
I did find this code, but I still don't understand what the number means or what "origin" means:
dates <- c(30829, 38540)
betterDates <- as.Date(dates,
origin = "1899-12-30")
P.S. I'm quite new to R, so I'd appreciate a simple explanation. Thank you!
You should parse each format separately. For each format, select the relevant rows with a regular expression and transform only those rows, then move on to the next format. I'll give the answer with data.table instead of data.frame because I've forgotten how to use data.frame.
library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
"Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table
# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]
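# handle dates like "36085.0": spreadsheet serial numbers, i.e. days since an origin
# (the 1904 date system is assumed here; for the 1900 system use origin='1899-12-30')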
# see https://learn.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]
# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]
# result
> x
Name Birthdate Birthdate1
1: A 36085.0 2002-10-18
2: B 2001-sep-12 2001-09-12
3: C Feb-18-2005 2005-02-18
4: D 05/27/84 1984-05-27
5: E 2020-6-25 2020-06-25
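Since the question also asked what the number and the "origin" mean, here is a small sketch. A serial number such as 36085 is a count of days, and origin is the date that counts as day 0. Which origin applies depends on the spreadsheet the data came from, so both values below are only candidates; the last line shows one way to get the d-m-y text the question asked for.

```r
serial <- as.numeric("36085.0")

# 1900 date system: the origin used in the asker's snippet
d1900 <- as.Date(serial, origin = "1899-12-30")
d1900   # "1998-10-17"

# 1904 date system: the origin used in the answer above
d1904 <- as.Date(serial, origin = "1904-01-01")
d1904   # "2002-10-18"

# A Date always prints as yyyy-mm-dd; format() produces d-m-y text
dmy_text <- format(d1900, "%d-%m-%Y")
dmy_text   # "17-10-1998"
```

The same format() call should work on the Birthdate1 column once it has been parsed, though the result is character, not Date.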

as.Date function gives different result in a for loop

I have a slight problem: my as.Date call gives a different result when I put it in a for loop. I'm looking in a folder with subfolders (one per date) that contain images. I build date_list to organize all the dates (for plotting at a later stage). The Julian day counts from the first of January of the year, and because I have 4 years of data, the year must be flexible.
# Set up list with 4 columns and counter Q. jan is used to set all dates to the first of january
date_list <- outer(1:52, 1:4)
q = 1
jan <- "-01-01"
for (scene in folders){
year <- as.numeric(substr(scene, start=10, stop=13))
day <- as.numeric(substr(scene, start=14, stop=16))
datum <- paste(year, day, sep='_')
date_list[q, 1] <- datum
date_list[q, 2] <- year
date_list[q, 3] <- day
date_list[q, 4] <- as.Date(day, origin = as.Date(paste(year,jan, sep="")))
q = q+1
}
Output final row:
[52,] "2016_267" "2016" "267" "17068"
What am I missing in date_list[q, 4] that keeps my integer from being converted to a date?
Running the following line on its own does work, but because of the large number of scenes and folders I'd like to automate this:
as.Date(day, origin = as.Date(paste(year,jan, sep="")))
Thank you for your time!
Well, I assume this would answer your first question:
date_list[q, 4] <- as.character(as.Date(datum,format="%Y_%j"))
as.Date accepts a format argument (the %Y and %j codes are documented in ?strptime). %j is the day of the year (001 to 366), often called the Julian day; this is a little easier to read than using origin and multiple paste calls.
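One thing worth checking, as a small sketch with the value from the question's final row: the two approaches do not give the same date. With %j, day 1 is January 1; with origin, the number is added to January 1, so day 267 lands one day later. Which of the two matches the folder naming is an assumption you would need to confirm for your data.

```r
datum <- "2016_267"

# %j treats 267 as the 267th day of the year
via_format <- as.Date(datum, format = "%Y_%j")
via_format   # "2016-09-23"

# origin treats 267 as an offset added to January 1
via_origin <- as.Date(267, origin = as.Date("2016-01-01"))
via_origin   # "2016-09-24"

as.numeric(via_origin - via_format)   # 1
```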
Your problem is actually linked to what a Date object is:
> dput(as.Date("2016-01-10"))
structure(16810, class = "Date")
When stored in a matrix (your date_list), it is coerced to character without any special treatment, like this:
> d<-as.Date("2016-01-10")
> class(d)<-"character"
> d
[1] "16810"
Hence you get only the number of days since 1970-01-01. When you ask for the character representation with as.character, you get the correct value because the Date class has an as.character method, which first formats the date in human-readable form before returning a character value.
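The coercion can be reproduced in a few lines. This is only a sketch with a small character matrix standing in for date_list: a matrix holds a single type and no per-cell class, so assigning a Date into it keeps the underlying number and drops the class.

```r
d <- as.Date("2016-01-10")

# A 1 x 2 character matrix standing in for one row of date_list
m <- matrix(NA_character_, nrow = 1, ncol = 2)

m[1, 1] <- d                 # class is dropped, only the day count survives
m[1, 2] <- as.character(d)   # formatted before it goes into the matrix

m[1, 1]   # "16810"
m[1, 2]   # "2016-01-10"

# A data.frame column, by contrast, keeps the Date class
df <- data.frame(date = d)
class(df$date)   # "Date"
```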
Now, if I've understood your problem correctly, I would go this way:
First create a function to work on one string:
name_to_list <- function(name) {
  # characters 10 to 16 of the scene name hold the year and the day of year, e.g. "2016267"
  dpart <- substr(name, start=10, stop=16)
  date <- as.POSIXlt(dpart, format="%Y%j")
  # $year counts from 1900 and $yday counts from 0, hence the +1900 and +1
  c("datum"=paste(date$year+1900,date$yday+1,sep="_"), "year"=date$year+1900, "julian_day"=date$yday+1, "date"=as.character(date) )
}
This function just takes your substring and converts it to the POSIXlt class, which gives us the Julian day, year and date in one pass. As the year is stored as an integer offset from 1900 (it can be negative), we have to add 1900 when storing the year in the fields. Likewise, yday counts from 0, so we add 1 to get back the day number used in the folder name.
Then, if your folders variable is a vector of strings:
lapply(folders,name_to_list)
which for folders=c("LC81730382016267LGN00","LC81730382016287LGN00","LC81730382016167LGN00") gives:
[[1]]
datum year julian_day date
"2016_267" "2016" "267" "2016-09-23"
[[2]]
datum year julian_day date
"2016_287" "2016" "287" "2016-10-13"
[[3]]
datum year julian_day date
"2016_167" "2016" "167" "2016-06-15"
Do you mean to output your day as 3 numbers? Should it not be 2 numbers?
day <- as.numeric(substr(scene, start=15, stop=16))
or
day <- as.numeric(substr(scene, start=14, stop=15))
That could at least be part of the issue. Providing an example of what typical values of "scene" are would be helpful here.
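On the digit count: a day of year runs from 001 to 366, so three characters (positions 14 to 16) are needed. Below is a rough vectorised sketch that avoids the loop and the matrix altogether. It assumes scene names shaped like the examples given earlier in this thread (year at characters 10 to 13, day of year at 14 to 16); the name date_df is made up for illustration.

```r
folders <- c("LC81730382016267LGN00",
             "LC81730382016287LGN00",
             "LC81730382016167LGN00")

year_txt <- substr(folders, 10, 13)
day_txt  <- substr(folders, 14, 16)

# A data.frame can hold character, integer and Date columns side by side
date_df <- data.frame(
  datum      = paste(year_txt, day_txt, sep = "_"),
  year       = as.integer(year_txt),
  julian_day = as.integer(day_txt),
  date       = as.Date(paste0(year_txt, day_txt), format = "%Y%j")
)

date_df
class(date_df$date)   # "Date"
```

Because the date column keeps its class, it can be passed straight to plotting functions later on.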

Does R use 0-indexing for dates?

R uses the date "1970-01-01" as an origin. Does it make an exception from its typical 1-indexing to index dates with 0-indexing?
> x <- as.Date("1970-01-01")
> y <- as.Date("1970-01-02")
> unclass(x)
[1] 0
> unclass(y)
[1] 1
No. This is not an indexing thing. "Dates are represented as the number of days since 1970-01-01" (From the ?Date help page). Also note
unclass(as.Date("1969-12-31")) == -1
So it's not an index, it's an offset from a reference date (the origin, or epoch). There's no underlying vector here.
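A short sketch of the difference between the stored offset and actual indexing (the variable names are just for illustration):

```r
epoch <- as.Date("1970-01-01")

# The stored value is a signed count of days from the origin
as.numeric(epoch)                   # 0
as.numeric(as.Date("1969-12-31"))   # -1

# Arithmetic works on that count
next_day <- epoch + 1
next_day                            # "1970-01-02"
as.numeric(next_day - epoch)        # 1

# Indexing a vector of Dates is still 1-based, like any other R vector
dates <- c(epoch, next_day)
dates[1]                            # "1970-01-01"
```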

R use of '\' in a string

I am experimenting with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why is the third date being interpreted, and what sort of interpretation is taking place? I also got this output when I left out the as.character function.
Thanks
Echoing the comments, make sure to escape backslashes in strings.
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
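As for what interpretation took place: inside a string literal, a backslash followed by up to three octal digits is an escape for the character with that code. So \1 becomes the character with code 1 (9 is not an octal digit, so it stays a plain 9) and \76 becomes the character with code 62, which is ">". The sketch below checks this, and also shows the raw-string syntax, which assumes R 4.0.0 or later.

```r
s <- "04\19\76"        # what was typed in the question
utf8ToInt(s)           # 48 52 1 57 62
nchar(s)               # 5, not 8

escaped <- "04\\19\\76"
nchar(escaped)         # 8: each \\ is a single backslash
writeLines(escaped)    # 04\19\76

# Raw strings (R >= 4.0.0) avoid the doubling
raw_form <- r"(04\19\76)"
identical(escaped, raw_form)   # TRUE
```

print() shows the doubled backslashes because it displays the string as you would have to type it; writeLines() or cat() show the actual characters.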
Now that you've got the dates stored, there are actually a lot of built-in functions you can use with them. Dates even have their own class! To use them, convert with as.Date. Since you're using nonstandard date formats, you have to tell R how you've formatted them.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
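One more thing to watch with %y: according to ?strptime, two-digit years 00 to 68 are read as 20xx and 69 to 99 as 19xx. The sketch below shows the cutoff, plus one possible repair for values that should lie in the past. fix_century is a hypothetical helper written for this example, not a base R function.

```r
as.Date("05/27/68", "%m/%d/%y")   # "2068-05-27"
as.Date("05/27/69", "%m/%d/%y")   # "1969-05-27"

# Hypothetical helper: move dates after `cutoff` back by 100 years
fix_century <- function(d, cutoff = Sys.Date()) {
  lt <- as.POSIXlt(d)
  future <- !is.na(d) & d > cutoff
  lt$year[future] <- lt$year[future] - 100
  as.Date(lt)
}

parsed <- as.Date(c("05/27/68", "05/27/69"), "%m/%d/%y")
fixed <- fix_century(parsed, cutoff = as.Date("2024-01-01"))
fixed   # "1968-05-27" "1969-05-27"
```

Whether shifting by a century is right depends on the data, so treat the cutoff as an assumption to check.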

Convert data from csv file into "xts" object

I have CSV files in which the Date column is in the following format:
25-Aug-2004
I want to read it as an "xts" object so as to use the function "periodReturn" in quantmod package.
Can I use the following file for the function?
Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30
Please guide me on this.
Unfortunately I can't speak for the ts part, but this is how you can convert your dates to a proper format that can be read by other functions as dates (or time).
You can import your data into a data.frame as usual (for example with read.csv). Then you can convert your Date column to the POSIXlt (POSIXt) class using the strptime function.
nibha <- "25-Aug-2004" # this should be your imported column
lct <- Sys.getlocale("LC_TIME"); Sys.setlocale("LC_TIME", "C") # temporarily change locale to C if you happen to get NAs
strptime(nibha, format = "%d-%b-%Y")
Sys.setlocale("LC_TIME", lct) #revert back to your locale
Try this. We get rid of the nuisance columns and specify the format of the time index, then convert to xts and apply the dailyReturn function:
Lines <- "Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30"
library(quantmod) # this also pulls in xts & zoo
z <- read.zoo(textConnection(Lines), format = "%d-%b-%Y",
colClasses = rep(c(NA, "NULL", NA), c(1, 2, 5)))
x <- as.xts(z)
dailyReturn(x)
Of course, textConnection(Lines) is just to keep the example self contained and in reality would be replaced with something like "myfile.dat".
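If you would rather go through a plain data.frame first, the two answers can be combined along these lines. This is only a sketch: it reuses the sample rows from the question, reads them with read.table (a real file would more likely go through read.csv), and only builds the xts object if the xts package is installed.

```r
Lines <- "Symbol Series Date Prev.Close Open.Price High.Price Low.Price
1 XXX EQ 25-Aug-2004 850.00 1198.70 1198.70 979.00
2 XXX EQ 26-Aug-2004 987.95 992.00 997.00 975.30"

# The header has one field fewer than the data rows,
# so the first column is taken as row names
dat <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)

# %b (abbreviated month name) depends on the locale, so switch to C while parsing
old_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
dat$Date <- as.Date(dat$Date, format = "%d-%b-%Y")
Sys.setlocale("LC_TIME", old_locale)

# xts wants a numeric matrix plus a time index
prices <- as.matrix(dat[, c("Prev.Close", "Open.Price", "High.Price", "Low.Price")])

if (requireNamespace("xts", quietly = TRUE)) {
  x <- xts::xts(prices, order.by = dat$Date)
  print(x)
}
```

Note that as.Date is used here rather than strptime, since a Date index is usually enough for daily data.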

Resources