ifelse function problem: defining the correct condition [duplicate]

This question already has answers here:
Converting year and month ("yyyy-mm" format) to a date?
(9 answers)
Parse dates in format dmy together with dmY using parse_date_time
(2 answers)
Closed 2 years ago.
I could not find an answer to the following question using the search function:
Why does the given ifelse() condition not work the way I intend?
I have a dataset that wrongly used an open-text field for a date, so people filled in the date in a variety of ways. I have gotten close to something usable, but my next step, turning every mm/yy entry from before the year 2000 into a mm/19yy entry with ifelse(), does not give me the correct result:
library(stringr)  # str_length(), str_sub()
library(stringi)  # stri_sub<-()
Dates <- c("10/19", "04/2019", "O5/1992", "03/92")
ifelse(str_length(Dates)==5 & str_sub(Dates,4,5)>20, stri_sub(Dates, 4, 3) <- 19, Dates)
The result looks like this:
[1] "10/1919" "04/192019" "O5/191992" "19"
While I would want it to look like this:
[1] "10/19" "04/2019" "O5/1992" "03/1992"
Any help is highly appreciated!
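The call misbehaves because `stri_sub(Dates, 4, 3) <- 19` is an assignment: when ifelse() evaluates it as the `yes` argument, it inserts "19" at position 4 of every element of Dates as a side effect and itself returns only 19. The test `str_sub(Dates, 4, 5) > 20` also compares text rather than numbers. A minimal base-R sketch (not taken from the answers), assuming every two-digit year above 20 belongs to the 1900s, passes plain values to ifelse() instead:

```r
Dates <- c("10/19", "04/2019", "O5/1992", "03/92")

# TRUE for "mm/yy" entries: two characters, a slash, two digits
two_digit <- grepl("^[^/]{2}/[0-9]{2}$", Dates)

# Numeric year part (everything after the slash)
yy <- as.integer(sub("^.*/", "", Dates))

# yes/no are ordinary vectors; nothing is assigned inside ifelse()
fixed <- ifelse(two_digit & yy > 20, sub("/", "/19", Dates, fixed = TRUE), Dates)
fixed
```

With the sample data this yields "10/19" "04/2019" "O5/1992" "03/1992".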

This does not give the expected output you have shown, but I think it is better to turn the dates into standard dates so that they are easier to use.
Dates <- c("10/19", "04/2019", "O5/1992", "03/92")
new_Date <- as.Date(lubridate::parse_date_time(paste0('1/', Dates), c('dmY', 'dmy')))
new_Date
#[1] "2019-10-01" "2019-04-01" "1992-05-01" "1992-03-01"
Then you can format these dates the way you want:
format(new_Date, '%Y-%m')
#[1] "2019-10" "2019-04" "1992-05" "1992-03"
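A base-R sketch without lubridate (an alternative, not the answer's code): strptime(), which as.Date() uses for character input, recycles a vector of formats along the input, so each entry can get the format matching its length. Note that %y maps 00-68 to 20xx and 69-99 to 19xx, which differs slightly from the question's cut-off at 20, and that "O5/1992" becomes NA because of the letter O:

```r
Dates <- c("10/19", "04/2019", "O5/1992", "03/92")

# Prepend an arbitrary day so each string names a single date
with_day <- paste0("01/", Dates)

# Pick a two-digit (%y) or four-digit (%Y) year format per element
fmt <- ifelse(nchar(Dates) == 5, "%d/%m/%y", "%d/%m/%Y")

new_Date <- as.Date(with_day, format = fmt)
format(new_Date, "%Y-%m")
```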

Rather than do this in a single expression, I recommend splitting it apart for readability:
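library(stringr)
library(purrr)
dates = c("10/19", "04/2019", "O5/1992", "03/92")
parts = str_split(dates, '/')
# Convert the year and month parts to integers ("O5" gives NA with a warning)
year = as.integer(map_chr(parts, `[[`, 2L))
months = as.integer(map_chr(parts, `[[`, 1L))
# Only two-digit years above 20 become 19yy; %02d keeps the leading zero of the month
result = ifelse(
  str_length(dates) == 5L & year > 20 & year < 100,
  sprintf('%02d/19%02d', months, year),
  dates
)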
This code also handles data type conversions explicitly, which makes it more expressive and helps in finding errors: for instance, your third date accidentally uses O (capital o) instead of 0, which I only noticed because my code complains about the invalid conversion.
Fundamentally I also agree with Ronak’s answer: the output you seem to want is inconsistent and should generally be avoided in favour of a uniform format, which incidentally leads to much simpler code, as Ronak’s answer shows.

Related

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame that is a time series of meteorological measurements with monthly resolution from 1961 to 2018. I am interested in the variable that measures the monthly average temperature, since I need the multi-annual average temperature for the summers.
To do this I must filter the "DateVariable" column on the fifth and sixth digits, which encode the month.
The values in the time column are formatted like
"19610701", so I need the 07 (July) after 1961.
I started by coding for a single month for other purposes, so I have not tried anything worth mentioning. I guess that grepl() could do the job, but I do not know how the matching operator works.
So I started with this code, which works:
library(data.table)  # provides %like%; Df must be a data.table
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am expecting code like this:
summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**" | ...]
so that all entries with month digits from 06 to 09 are saved in the new data frame summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is converting the variable with as.Date() into the format "month" (class character).
Now I need to select the months from June to September.
A horrible way to get the result I want is to take several subsets and rbind them afterwards:
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, as in "Using multiple criteria in subset function and logical operators":
Veg_Seas <- subset(Df, subset = MonthVar %in% c("Mai", "Juni", "Juli", "August", "September"))
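The digit-based filter the question first asks about can be sketched in base R without converting to dates, assuming DateVariable holds "YYYYMMDD" values; the small data frame below is hypothetical. substr() picks characters 5-6, and the grepl() line shows the equivalent regular expression:

```r
# Hypothetical stand-in for the real Df
Df <- data.frame(
  DateVariable = c("19610101", "19610601", "19610701", "19610901", "19611001"),
  temp = c(-1.2, 15.3, 17.8, 13.1, 8.4),
  stringsAsFactors = FALSE
)

# Characters 5-6 of "YYYYMMDD" are the month
month_digits <- substr(as.character(Df$DateVariable), 5, 6)
summermonths <- Df[month_digits %in% c("06", "07", "08", "09"), ]

# Same filter as a regular expression: four digits, then 06 to 09
summermonths_re <- Df[grepl("^[0-9]{4}0[6-9]", Df$DateVariable), ]
summermonths
```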
You can convert your date variable to the Date class and take the month:
library(lubridate)  # provides month()
allmonths <- month(as.Date(Df$DateVariable, format = "%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format = "%Y%m%d"))
Then you can check whether it is a summer month:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
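month() here comes from an add-on package (lubridate and data.table both provide one). A base-R sketch of the same filter, with hypothetical numeric data, shows why the as.character() step matters: as.Date() does not apply `format` to a plain number. as.POSIXlt()$mon is 0-based, hence the + 1:

```r
# Hypothetical data; DateVariable stored as numbers
Df <- data.frame(
  DateVariable = c(19610601, 19610701, 19611001, 19620701, 19620801),
  temp = c(15.3, 17.8, 8.4, 18.2, 17.5)
)

# Convert to character first, then parse the "YYYYMMDD" strings
dates <- as.Date(as.character(Df$DateVariable), format = "%Y%m%d")

# POSIXlt months run 0-11, so add 1 to get 1-12
allmonths <- as.POSIXlt(dates)$mon + 1L

summersmonth <- Df[allmonths %in% 6:9, ]
mean(summersmonth$temp)  # summer mean over the sample rows
```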
We can use anydate() from the anytime package to convert to the Date class and then extract the month:
library(anytime)
month(anydate(as.character(Df$DateVariable)))

Converting an integer date [duplicate]

This question already has answers here:
How do I convert date to number of days in R
(7 answers)
Closed 4 years ago.
My question doesn't have to do with my own dataset, but since I'm new to R, I wanted to make sure I knew how to work with dates, so I'm searching up the different ways to manipulate and compare dates in R.
I recently read an answer to a question about converting a date into an integer date using the as.numeric() function. Here is the accepted answer: https://stackoverflow.com/a/8215581/10864249
So from that answer, my understanding is that the date was converted into seconds.
Why would anyone want to use the as.numeric() function if we're going to only get seconds?
Can we then convert the integer date into a smaller integer, like the number of days, just by dividing by 365.25, or even into months by dividing by 12? I assume it would be easier to compare dates that way, rather than in seconds.
Thanks!!
Coercing a Date object to numeric gives you the number of days since 1970-01-01 (it is POSIXct date-times that count seconds):
my_date = as.Date('2015-01-01')
my_date
#[1] "2015-01-01"
class(my_date)
# [1] "Date"
as.numeric(my_date)
# [1] 16436
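To connect this with the seconds mentioned in the question: a POSIXct date-time counts seconds since 1970-01-01 00:00 UTC, so dividing by 86400 (seconds per day), not 365.25, gives days. A small base-R sketch; for comparing dates, subtracting them or using difftime() with explicit units avoids manual arithmetic:

```r
d  <- as.Date("2015-01-01")
dt <- as.POSIXct("2015-01-01 00:00:00", tz = "UTC")

as.numeric(d)            # 16436 days since 1970-01-01
as.numeric(dt)           # 1420070400 seconds since 1970-01-01 00:00 UTC
as.numeric(dt) / 86400   # 16436 again: seconds converted to days

# Days between two dates, with the unit stated explicitly
as.numeric(difftime(as.Date("2015-03-01"), d, units = "days"))  # 59
```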

%b-%Y date conversion gives NA [duplicate]

This question already has answers here:
Converting year and month ("yyyy-mm" format) to a date?
(9 answers)
Closed 4 years ago.
I am trying to convert character strings to Dates in R. These are examples of the character strings:
"Aug-1973" "Aug-1974" "Aug-1975" "Aug-1976" "Aug-1977"
I run the following line on date strings similar to the ones above:
exportsDF$Date <- as.Date(as.character(exportsDF$Date), format = "%b-%Y")
This returns NAs for all values. The step where I convert the dates column to characters returns the correct values. Any ideas why the as.Date() command is not working? There are no NAs or missing values in the data. Every value has a "%b-%Y" format.
Any help is appreciated!
The date format needs a day as well, so you could add an arbitrary day of the month. Here, I've chosen the first day:
dates <- c("Aug-1973", "Aug-1974", "Aug-1975", "Aug-1976", "Aug-1977")
res <- as.Date(paste0("01-", dates), format = "%d-%b-%Y")
print(res)
#[1] "1973-08-01" "1974-08-01" "1975-08-01" "1976-08-01" "1977-08-01"
The reason is that the underlying Date type is a number counting the days since a reference day, specifically 1970-01-01, so a month and year alone do not identify a single day. See ?Date.
The Date object res can now be displayed as you please via
format(res, "%B-%Y")
#[1] "August-1973" "August-1974" "August-1975" "August-1976" "August-1977"
or similar.
The month(res) function and its cousins are also helpful. See ?month.
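One further pitfall: %b matches month abbreviations in the current locale, so this can still return NA on a system with a non-English locale. A locale-independent base-R sketch (an alternative, not part of the answer) looks the abbreviation up in R's built-in month.abb constant instead:

```r
dates <- c("Aug-1973", "Aug-1974", "Aug-1975", "Aug-1976", "Aug-1977")

# month.abb is the English "Jan".."Dec" regardless of locale
mon <- match(substr(dates, 1, 3), month.abb)
yr  <- as.integer(substr(dates, 5, 8))

# Build ISO "YYYY-MM-01" strings, which as.Date() parses by default
res <- as.Date(sprintf("%04d-%02d-01", yr, mon))
res
```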

How to convert in both directions between year,month,day and dates in R?

How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because there may be a performance hit, and partly because I worry about regionalization issues: some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISOdate() provides the direction year,month,day -> date-time, although it first converts the numbers to a string, so I would prefer a way that doesn't go via a string.
I couldn't find anything that goes the other way, from date-times to numerical values. I would prefer not to need strsplit() or similar.
Edit: to be clear, what I have is a data frame that looks like this:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion about what 'time' (in hours) means, here is what I ultimately did, using information from "How to find the difference between two dates in hours in R?":
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I haven't done the backwards direction yet, but I imagine something like:
create a difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to the difftime object
use one of the solutions below to get the year, month, day, hour back
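The backwards direction outlined in these steps can be sketched in base R: add the hour offsets (converted to seconds) to the same ISOdate(2004,1,1,0) origin, then read the fields from a POSIXlt object, which needs no string handling. The third row is hypothetical, added to show a day rollover:

```r
origin <- ISOdate(2004, 1, 1, 0)  # 2004-01-01 00:00:00 GMT

lh <- data.frame(
  year = 2004, month = 1, day = c(1, 1, 2), hour = c(1, 2, 0),
  somevalue = c(1515353, 3513535, 42)  # third row hypothetical
)

# Forwards: hours since the origin, as in the question
lh$time <- as.numeric(difftime(ISOdate(lh$year, lh$month, lh$day, lh$hour),
                               origin, units = "hours"))

# Backwards: POSIXct arithmetic is in seconds, so multiply hours by 3600
back <- as.POSIXlt(origin + lh$time * 3600, tz = "GMT")
restored <- data.frame(
  year = back$year + 1900,  # POSIXlt years count from 1900
  month = back$mon + 1,     # POSIXlt months are 0-based
  day = back$mday,
  hour = back$hour
)
restored
```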
I suppose in the future I could ask about the exact problem I'm trying to solve; I was trying to factor my specific problem into generic, reusable questions, but maybe that was a mistake?
Because dates can arrive from files, databases, etc. in so many forms and, as you mention, can be written in different orders or with different separators, representing an input date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings, and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
ISOdate(year, month, day)
}
toNumerics <- function(Date) {
stopifnot(inherits(Date, c("Date", "POSIXt")))
day <- as.numeric(strftime(Date, format = "%d"))
month <- as.numeric(strftime(Date, format = "%m"))
year <- as.numeric(strftime(Date, format = "%Y"))
list(year = year, month = month, day = day)
}
I forgo a single call to strftime() and subsequent splitting on a separator character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))
Found one solution for going from date to year,month,day.
Let's say we have a date-time object, which we'll create here using ISOdate():
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can extract what one wants, for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, $mon is 0-based, and $mday is 1-based (as per the POSIX standard).

Can data.table functions manipulate date and time columns on the fly?

I have started using data.table. Indeed it is very fast and has quite nice syntax, but I am having trouble with dates. I like to use lubridate: many of my data sets have dates or dates and times, and I have used lubridate to manipulate them. Lubridate stores the instant as a POSIXct object. I have seen answers here that create new variables, for instance just to get the year (e.g. 2005). I do not like that. Sometimes I analyze by year, other times by quarter, by month, or by durations. I would like to do something simple such as this:
mydatatable[,length(medical.record.number),by=year(date.of.service)]
That should give me the number of patient encounters per year, but the by argument is not working:
Error in names(byval) = as.character(bysuborig) :
'names' attribute [2] must be the same length as the vector [1]
Can you please point me to vignettes where data.table is used with dates and where manipulations and categorizations of those dates are done on the fly?
This uses one of the examples from the help(IDateTime) page. It shows that you can change the syntax of the by= argument to a character value such as "wday = wday(idate)", or (after @Matthew Dowle's comment below) you can try the functional form you were using. (I have not been able to get that to work myself, but I did get the preferred form by=list(wday=wday(idate)) to work.) Note that the key creation assumes an IDateTime class, since there is no idate or itime variable; those are attributes of the class.
library(data.table)
datetime <- seq(as.POSIXct("2001-01-01"), as.POSIXct("2001-01-03"), by = "5 hour")
(af <- data.table(IDateTime(datetime), a = rep(1:2, 5), key = "a,idate,itime"))
af[, length(a), by = "wday = wday(idate)"]
wday V1
[1,] 2 4
[2,] 3 5
[3,] 4 1
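For the original question, a sketch of the named-list form on a hypothetical table, assuming a reasonably current data.table, whose year(), quarter(), and month() helpers work on IDate and POSIXct columns, so the grouping variable is computed on the fly without adding a column:

```r
library(data.table)

# Hypothetical encounters table
visits <- data.table(
  medical.record.number = 1:5,
  date.of.service = as.IDate(c("2004-03-01", "2004-07-15", "2005-01-10",
                               "2005-02-20", "2005-11-30"))
)

# Group by an expression, named via list(); nothing is added to visits
by_year    <- visits[, length(medical.record.number), by = list(year = year(date.of.service))]
by_quarter <- visits[, .N, by = list(quarter = quarter(date.of.service))]
by_year
```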
