I have a dataframe column that contains a mixture of date formats, for example 30/06/2020, 07/2020 and 2020. I would like to convert the four-digit numbers into dates (e.g. 2020 -> XX/XX/2020). I have different years, not just 2020, so I would prefer, if possible, a generic expression.
A supplementary question:
When I read the data from an Excel file, I get five-digit numbers instead of dates. From what I have read, these numbers are the days elapsed since 1900. Hence, the actual column contains five-digit numbers, four-digit numbers that represent the year, and the other date formats.
I have dealt with that issue, but not in an optimal way. Is there a generic way to deal with all these formats together? Sorry for the long post.
K
Thank you all for your ideas. You are right, I need to be more specific next time. I focused on solving the problem and, to be honest, I believe I did it.
Regarding the data, a simple illustration might be the following:
date
08/2003
12/06/2002
38054
2004
...
...
...
First, I found which elements of the dataframe column (RHO_DataBase$date) are expressed as a year (e.g. 2003) and converted them to a date (e.g. 15/05/2003):
#Step 1
counter1 <- which( (!is.na(as.numeric(RHO_DataBase$date))) & (as.numeric(RHO_DataBase$date)<2030) )
for (i in counter1) {
RHO_DataBase$date[i] <- paste ("15/05/",sep="",RHO_DataBase$date[i])
}
Then, I found which elements are expressed as numeric values (days since 30/12/1899) and converted their format to day/month/year:
#Step 2
counter2 <- which(!is.na(as.numeric(RHO_DataBase$date)))
for (i in counter2) {
RHO_DataBase$date[i] <- format(as.Date(as.numeric(RHO_DataBase$date[i]), origin = "1899-12-30"),'%d/%m/%Y')
}
Then, I found the elements of the column that are expressed in the remaining format, in this case only month/year, and changed them to day/month/year using paste:
# Step 3:
counter3<-which(is.na(as.Date( RHO_DataBase$date, "%d/%m/%Y") ) )
for (i in counter3) {
RHO_DataBase$date[i] <- paste ("01/",sep="",RHO_DataBase$date[i])
}
Cheers,
K
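The three steps above can also be sketched as one vectorised function that classifies each value by its pattern and parses it accordingly. This is only a sketch under assumptions: parse_mixed_dates is a hypothetical helper name, the placeholder day/month choices (15/05 for bare years, 01 for month/year) are copied from the post, and only the four formats shown in the illustration are recognised; anything else becomes NA.

```r
# Hypothetical helper (not from the original post): parse a character column
# that mixes "dd/mm/yyyy", "mm/yyyy", "yyyy" and five-digit Excel serial days.
parse_mixed_dates <- function(x, origin = "1899-12-30") {
  x <- trimws(as.character(x))
  out <- rep(as.Date(NA), length(x))

  is_year   <- grepl("^[0-9]{4}$", x)                       # e.g. "2004"
  is_serial <- grepl("^[0-9]{5}$", x)                       # e.g. "38054"
  is_my     <- grepl("^[0-9]{1,2}/[0-9]{4}$", x)            # e.g. "08/2003"
  is_dmy    <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x) # e.g. "12/06/2002"

  # Bare years get the placeholder 15 May, month/year gets day 1 (as in the post)
  out[is_year]   <- as.Date(paste0("15/05/", x[is_year]), format = "%d/%m/%Y")
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = origin)
  out[is_my]     <- as.Date(paste0("01/", x[is_my]), format = "%d/%m/%Y")
  out[is_dmy]    <- as.Date(x[is_dmy], format = "%d/%m/%Y")
  out
}

# Values taken from the illustration above
dates <- parse_mixed_dates(c("08/2003", "12/06/2002", "38054", "2004"))
format(dates, "%d/%m/%Y")
```

Matching on the pattern rather than on as.numeric() avoids the coercion warnings and makes the order of the steps irrelevant.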
Related
I'm very new to R, so this might seem straightforward. But I have a data frame with an original date column that has values that look like this: 4-02-91, 5-29-93 (i.e. m-d-y). I am trying to separate this column into 3, where months, days, and years are separate. Then I need to combine them again to this format 19910402, 19930529 - I need it this way in order to compare it to another dataset with similar dates.
Here is what I've been trying to do:
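library(lubridate)  # provides year(), month() and day() used below
# Make DATE an actual date column (%y, not %Y, because the years have two digits)
dataframe$DATE <- as.Date(dataframe$DATE, format="%m-%d-%y")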
# This changes the original date column into something that looks like this: 1991-04-02, 1993-05-29
# Separate DATE into multiple columns
dataframe$year <- year(dataframe$DATE)
dataframe$month <- month(dataframe$DATE)
dataframe$day <- day(dataframe$DATE)
# Combine dates again to get string
dataframe$raster_date<-paste(dataframe$year, dataframe$month, dataframe$day, sep = "")
The last step looks great except where the months or days are single digits. It's coming out as 199142 and 1993529 instead of 19910402 and 19930529. How do I insert zeros when the month and day values are 1 digit?
Here, we can use sprintf instead of paste. The year, month and day functions from lubridate return numeric values, and numbers do not keep leading zeros, so paste drops them. With sprintf we can pad each component with zeros to a fixed width:
sprintf("%04d%02d%02d", dataframe$year, dataframe$month, dataframe$day)
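As an alternative sketch, if the column is already of class Date, base R's format() can produce the compact string directly, without splitting into year, month and day first. The two sample values are the ones from the question; the variable names are only illustrative.

```r
# Parse the m-d-y strings from the question (two-digit year, hence %y)
dates <- as.Date(c("4-02-91", "5-29-93"), format = "%m-%d-%y")

# %Y%m%d gives a zero-padded yyyymmdd string
raster_date <- format(dates, "%Y%m%d")
raster_date
```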
I have a data frame which is a time series of meteorological measurement with monthly resolution from 1961 till 2018. I am interested in the variable that measures the monthly average temperature since I need the multi-annual average temperature for the summers.
To do this I must extract the fifth and sixth digits, which are the month, from the "DateVariable" column.
The values in the time column are formatted like this:
"19610701". So I need the 07 (July) after 1961.
I started coding for one month for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the work, but I do not know how the "matching" operator works.
So I started with this code that works.
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am expecting a code like this
summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**" | ...]
So that all entries with month digit from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is to convert the variable to a date with as.Date and derive a month column of class character.
Now I need to select the months from June to September.
A horrible way to get the result I wanted is to do several subsets and an rbind afterwards.
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, as described in "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to Date class and take the month (month() is not in base R; lubridate, for example, provides it):
library(lubridate)
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether it is a summermonth:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
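Since the question asks specifically for the fifth and sixth digits, a base R sketch without any date conversion is also possible. The data frame below is made up for illustration (the temp column and its values are assumptions, not the asker's data).

```r
# Made-up sample in the yyyymmdd layout described in the question
Df <- data.frame(
  DateVariable = c("19610101", "19610701", "19620915", "19631201"),
  temp = c(-1.2, 18.4, 14.1, 0.3),
  stringsAsFactors = FALSE
)

# Characters 5 and 6 hold the month; as.character() also covers factor or numeric columns
month_num <- as.integer(substr(as.character(Df$DateVariable), 5, 6))

# Keep the rows whose month is June to September
summermonths <- Df[month_num %in% 6:9, ]
summermonths
```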
We can use anydate from anytime to convert to Date class and then extract the month with month() (from lubridate, for example):
library(anytime)
library(lubridate)
month(anydate(as.character(Df$DateVariable)))
I have a date variable that includes month and year where the month is a single digit (e.g. this month/year is '62017', but October is '102017'). I need the final format to be '6/1/2017'. I have tried using as.Date to convert but it will not work as %m requires two digits.
My workaround is to add a leading zero to dates that did not start with '102' (October), '112' (November), or '122' (December). I also have a few NA that I have to ignore. Code:
index <- substr(ll$Maturity.Date,1,3) != 102 & substr(ll$Maturity.Date,1,3) != 112 & substr(ll$Maturity.Date,1,3) != 122 & !is.na(ll$Maturity.Date)
ll$Maturity.Date[index] <- paste0(0,ll$Maturity.Date[index])
From here, I can convert to other formats as needed. However, I want to know if there is a better way to do this aside from hard coding as this code will break when using historical data in the 90's or data in the next century, both of which are future possibilities.
It is probably easiest to use sprintf to pad the 0s. Here is one solution:
sprintf("%06.0f", as.numeric(temp))
[1] "062017" "102017"
Then combine this with paste0 to add the day (1) and as.Date to get
as.Date(paste0(sprintf("%06.0f", as.numeric(temp)),"-1"), "%m%Y-%d")
[1] "2017-06-01" "2017-10-01"
data
temp <- c("62017", "102017")
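Another possible sketch avoids padding altogether: a regular expression can split the value into a one- or two-digit month and a four-digit year, which gives the requested '6/1/2017' form directly and leaves NA untouched. It assumes every non-missing value is a month followed by a four-digit year; the vector name x is only illustrative.

```r
x <- c("62017", "102017", NA)

# Last four digits are the year, whatever precedes them is the month
mdy <- sub("^([0-9]{1,2})([0-9]{4})$", "\\1/1/\\2", x)
mdy

# Convert to Date if needed; %m accepts a single-digit month when it is delimited
as.Date(mdy, format = "%m/%d/%Y")
```

Because nothing is hard-coded on the first three characters, this should keep working for years in other centuries.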
I am trying to convert a data.frame containing amounts of time in the format hours:minutes.
I found this post useful and like the simple approach of using the POSIXlt type:
R: Convert hours:minutes:seconds
However, each column represents a month's worth of days, so the columns are uneven. When I try the code below, following several other SO posts, I get zeros in the one column with fewer row values.
The code is below. Note that when run, you get all zeros for feb, which has fewer data values in its rows.
rDF <- data.frame(jan=c("9:59","10:02","10:04"),
feb=c("9:59","10:02",""),
mar=c("9:59","10:02","10:04"),stringsAsFactors = FALSE)
for (i in 1:3) {
Res <- as.POSIXlt(paste(Sys.Date(), rDF[,i]))
rDF[,i] <- Res$hour + Res$min/60
}
Thank you for any suggestions to fix this issue. I'm open to a more efficient approach as well.
Best,
Leah
You could try using the package lubridate. Here we convert your data column by column to hour-minute format (using hm), then extract the hours and add the minutes divided by 60:
library(lubridate)
rDF[] <- lapply(rDF, function(x){hm(x)$hour + hm(x)$minute/60})
jan feb mar
1 9.983333 9.983333 9.983333
2 10.033333 10.033333 10.033333
3 10.066667 NA 10.066667
This could easily be achieved with package lubridate's hm:
library(lubridate)
temp<-lapply(rDF,hm)
NewDF<-data.frame(jan=temp[[1]],feb=temp[[2]],mar=temp[[3]])
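If an extra package is not wanted, a base R sketch can split each value on the colon and treat anything that is not hours:minutes (such as the empty string in feb) as NA. The function name to_decimal_hours is hypothetical; the data frame is the one from the question.

```r
rDF <- data.frame(jan = c("9:59", "10:02", "10:04"),
                  feb = c("9:59", "10:02", ""),
                  mar = c("9:59", "10:02", "10:04"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: "h:mm" strings to decimal hours, NA for anything else
to_decimal_hours <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)   # empty or malformed entry
    as.numeric(p[1]) + as.numeric(p[2]) / 60
  }, numeric(1))
}

# Assigning into rDF[] keeps the data.frame structure
rDF[] <- lapply(rDF, to_decimal_hours)
rDF
```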
How to convert between year,month,day and dates in R?
I know one can do this via strings, but I would prefer to avoid converting to strings, partly because there may be a performance hit, and partly because I worry about regionalization issues, where some of the world uses "year-month-day" and some uses "year-day-month".
It looks like ISOdate provides the direction year,month,day -> DateTime, although it first converts the numbers to a string, so if there is a way that doesn't go via a string then I would prefer it.
I couldn't find anything that goes the other way, from datetimes to numerical values? I would prefer not needing to use strsplit or things like that.
Edit: just to be clear, what I have is, a data frame which looks like:
year month day hour somevalue
2004 1 1 1 1515353
2004 1 1 2 3513535
....
I want to be able to freely convert to this format:
time(hour units) somevalue
1 1515353
2 3513535
....
... and also be able to go back again.
Edit: to clear up some confusion on what 'time' (hour units) means, ultimately what I did was, and using information from How to find the difference between two dates in hours in R?:
forwards direction:
lh$time <- as.numeric( difftime(ISOdate(lh$year,lh$month,lh$day,lh$hour), ISOdate(2004,1,1,0), units="hours"))
lh$year <- NULL; lh$month <- NULL; lh$day <- NULL; lh$hour <- NULL
backwards direction:
... well, I didn't do backwards yet, but I imagine something like:
create difftime object out of lh$time (somehow...)
add ISOdate(2004,1,1,0) to difftime object
use one of the solution below to get the year,month,day, hour back
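The backwards direction outlined above might look roughly like the following sketch. It assumes the same origin as the forwards code, ISOdate(2004,1,1,0), and a small made-up lh with the two rows from the example; the component extraction uses format(), but any other method of splitting a date-time would do.

```r
origin <- ISOdate(2004, 1, 1, 0)   # same reference point as the forwards direction
lh <- data.frame(time = c(1, 2), somevalue = c(1515353, 3513535))

# Turn the hour offsets into a difftime and add it to the origin
stamps <- origin + as.difftime(lh$time, units = "hours")

# Split the resulting date-times back into numeric components (GMT, like ISOdate)
lh$year  <- as.numeric(format(stamps, "%Y"))
lh$month <- as.numeric(format(stamps, "%m"))
lh$day   <- as.numeric(format(stamps, "%d"))
lh$hour  <- as.numeric(format(stamps, "%H"))
lh
```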
I suppose in the future, I could ask the exact problem I'm trying to solve, but I was trying to factorize my specific problem into generic reusable questions, but maybe that was a mistake?
Because there are so many ways in which a date can be passed in from files, databases etc and for the reason you mention of just being written in different orders or with different separators, representing the inputted date as a character string is a convenient and useful solution. R doesn't hold the actual dates as strings and you don't need to process them as strings to work with them.
Internally R is using the operating system to do these things in a standard way. You don't need to manipulate strings at all - just perhaps convert some things from character to their numerical equivalent. For example, it is quite easy to wrap up both operations (forwards and backwards) in simple functions you can deploy.
toDate <- function(year, month, day) {
    ISOdate(year, month, day)
}
toNumerics <- function(Date) {
    stopifnot(inherits(Date, c("Date", "POSIXt")))
    day <- as.numeric(strftime(Date, format = "%d"))
    month <- as.numeric(strftime(Date, format = "%m"))
    year <- as.numeric(strftime(Date, format = "%Y"))
    list(year = year, month = month, day = day)
}
I forgo a single call to strptime() and subsequent splitting on a separator character because you don't like that kind of manipulation.
> toDate(2004, 12, 21)
[1] "2004-12-21 12:00:00 GMT"
> toNumerics(toDate(2004, 12, 21))
$year
[1] 2004
$month
[1] 12
$day
[1] 21
Internally R's datetime code works well and is well tested and robust if a bit complex in places because of timezone issues etc. I find the idiom used in toNumerics() more intuitive than having a date time as a list and remembering which elements are 0-based. Building on the functionality provided would seem easier than trying to avoid string conversions etc.
I'm a bit late to the party, but one other way to convert from integers to date is the lubridate::make_date function. See the example below from R for Data Science:
library(lubridate)
library(nycflights13)
library(tidyverse)
a <- flights %>%
mutate(date = make_date(year, month, day))
Found one solution for going from date to year,month,day.
Let's say we have a date object, that we'll create here using ISOdate:
somedate <- ISOdate(2004,12,21)
Then, we can get the numerical components of this as follows:
unclass(as.POSIXlt(somedate))
Gives:
$sec
[1] 0
$min
[1] 0
$hour
[1] 12
$mday
[1] 21
$mon
[1] 11
$year
[1] 104
Then one can get what one wants for example:
unclass(as.POSIXlt(somedate))$mon
Note that $year is [actual year] - 1900, month is 0-based, mday is 1-based (as per the POSIX standard)
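To make those offsets concrete, here is a small sketch that wraps the POSIXlt fields and undoes them. The function name date_parts is hypothetical.

```r
# Hypothetical helper: calendar components from a date-time via POSIXlt
date_parts <- function(x) {
  lt <- as.POSIXlt(x)
  list(year  = lt$year + 1900,  # $year counts from 1900
       month = lt$mon + 1,      # $mon is 0-based
       day   = lt$mday,         # $mday is already 1-based
       hour  = lt$hour)
}

somedate <- ISOdate(2004, 12, 21)
parts <- date_parts(somedate)
str(parts)
```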