Converting hours:minutes with uneven column lengths - zeros - R

I am trying to convert a data.frame holding amounts of time in the format hours:minutes to decimal hours.
I found this post useful and like the simple approach of using the POSIXlt class:
R: Convert hours:minutes:seconds
However, each column represents a month's worth of days, so the columns are uneven. When I try the code below, following several other SO posts, I get zeros in the one column with fewer row values.
The code is below. Note that when run, you get all zeros for feb, which has fewer data values in its rows.
rDF <- data.frame(jan=c("9:59","10:02","10:04"),
feb=c("9:59","10:02",""),
mar=c("9:59","10:02","10:04"),stringsAsFactors = FALSE)
for (i in 1:3) {
Res <- as.POSIXlt(paste(Sys.Date(), rDF[,i]))
rDF[,i] <- Res$hour + Res$min/60
}
Thank you for any suggestions to fix this issue. I'm open to a more efficient approach as well.
Best,
Leah

You could try the lubridate package. Here we convert your data column by column to hour-minute periods (using hm), then extract the hours and add the minutes divided by 60. The empty string in feb becomes NA rather than zero:
library(lubridate)
rDF[] <- lapply(rDF, function(x){hm(x)$hour + hm(x)$minute/60})
        jan       feb       mar
1  9.983333  9.983333  9.983333
2 10.033333 10.033333 10.033333
3 10.066667        NA 10.066667

This could easily be achieved with package lubridate's hm:
library(lubridate)
temp<-lapply(rDF,hm)
NewDF<-data.frame(jan=temp[[1]],feb=temp[[2]],mar=temp[[3]])
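A base R alternative, sketched under the assumption that every non-empty cell has the form H:MM. The zeros in the original loop most likely arise because as.POSIXlt settles on one format for the whole vector, and the entry with no time only fits a date-only format. Passing an explicit format to strptime avoids that guess and turns the empty cell into NA. The helper name to_decimal_hours is made up for this example.

```r
rDF <- data.frame(jan = c("9:59", "10:02", "10:04"),
                  feb = c("9:59", "10:02", ""),
                  mar = c("9:59", "10:02", "10:04"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: parse "H:MM" strings and return decimal hours.
# Strings that do not match the format (such as "") come back as NA.
to_decimal_hours <- function(x) {
  p <- strptime(x, format = "%H:%M", tz = "UTC")
  p$hour + p$min / 60
}

# Assigning into rDF[] keeps the data.frame structure and column names
rDF[] <- lapply(rDF, to_decimal_hours)
print(rDF)
```

Because each column is parsed with a fixed format, one short column no longer affects how the other values are read.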

Related

Find four-digit numbers and convert them to calendar dates in R

I have a dataframe column that contains a mixture of date formats, for example 30/06/2020, 07/2020 and 2020. I would like to convert the four-digit numbers into a date (e.g. 2020 -> XX/XX/2020). I have different years, not just 2020, so I would prefer a generic expression if possible.
A supplementary question:
When I read the data from an Excel file, I get five-digit numbers instead of dates. From what I have read, these numbers are the days elapsed since 1900. Hence, the actual column contains five-digit numbers, four-digit numbers that represent the year, and the other dates.
I have dealt with that issue, but not in an optimal way. Is there a generic way to handle all these formats together? Sorry for the long post.
K
Thank you all for your ideas. You are right, I need to be more specific next time. I focused on solving the problem and, to be honest, I believe I did it.
Regarding the data, a simple illustration might be the following:
date
08/2003
12/06/2002
38054
2004
...
...
...
First, I found which elements of the dataframe column (RHO_DataBase$date) are expressed as a year (e.g. 2003) and converted them to a date (e.g. 15/05/2003):
#Step 1
counter1 <- which( (!is.na(as.numeric(RHO_DataBase$date))) & (as.numeric(RHO_DataBase$date)<2030) )
for (i in counter1) {
RHO_DataBase$date[i] <- paste ("15/05/",sep="",RHO_DataBase$date[i])
}
Then, I found which elements are expressed as numeric values (days since 30/12/1899) and converted them to the day/month/year format:
#Step 2
counter2 <- which(!is.na(as.numeric(RHO_DataBase$date)))
for (i in counter2) {
RHO_DataBase$date[i] <- format(as.Date(as.numeric(RHO_DataBase$date[i]), origin = "1899-12-30"),'%d/%m/%Y')
}
Then, I found the elements of the column that are in the remaining format, in this case only month/year, and changed them to day/month/year using paste:
# Step 3:
counter3<-which(is.na(as.Date( RHO_DataBase$date, "%d/%m/%Y") ) )
for (i in counter3) {
RHO_DataBase$date[i] <- paste ("01/",sep="",RHO_DataBase$date[i])
}
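The three steps can also be written without loops. The following is a minimal sketch, assuming the column holds only the four shapes seen in the illustration (year, Excel serial number, month/year, day/month/year). The function name normalise_date, the sample vector x, and the regular expressions are assumptions made for this example; the 15/05 and 01 placeholders are taken from the steps above.

```r
# Hypothetical sample mirroring the illustration above
x <- c("08/2003", "12/06/2002", "38054", "2004")

normalise_date <- function(x) {
  out <- rep(as.Date(NA), length(x))

  # Classify each entry by its shape
  is_year   <- grepl("^[0-9]{4}$", x)                        # e.g. 2004
  is_serial <- grepl("^[0-9]{5}$", x)                        # e.g. 38054
  is_my     <- grepl("^[0-9]{1,2}/[0-9]{4}$", x)             # e.g. 08/2003
  is_dmy    <- grepl("^[0-9]{1,2}/[0-9]{1,2}/[0-9]{4}$", x)  # e.g. 12/06/2002

  # Year only: use 15 May of that year as a placeholder day and month
  out[is_year] <- as.Date(paste0("15/05/", x[is_year]), format = "%d/%m/%Y")
  # Excel serial number: days since 1899-12-30
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")
  # Month/year: use the first day of the month
  out[is_my] <- as.Date(paste0("01/", x[is_my]), format = "%d/%m/%Y")
  # Already day/month/year
  out[is_dmy] <- as.Date(x[is_dmy], format = "%d/%m/%Y")

  out  # entries matching none of the shapes stay NA
}

result <- format(normalise_date(x), "%d/%m/%Y")
print(result)
```

Classifying by pattern first means the order of the steps no longer matters, and anything unexpected is left as NA instead of being silently converted.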
Cheers,
K

How to convert date format to total number of days?

I'm trying to convert a yyyy-mm-dd date column in a data frame to the total number of days since some reference date, to use in my survival function.
I've already tried as_date() and grepl(), but I can't seem to get it to work since there are either too many NA values in my data frame or I'm doing something wrong.
Ref.date <- ymd("1941-08-24")
Date.MI <- ymd("Date.MI")
Day <- as.numeric(difftime(Date.MI, Ref.date))
I expect just the total number of days since 1941-08-24.
How do I solve the problem?
difftime() gives you the option to specify the units for the resulting output. So maybe try something like this
as.numeric(difftime(as.POSIXct("1941-08-25"), as.POSIXct("1941-08-24"), units = c("days")))
The way to solve it:
as.numeric(difftime(as.POSIXct(Date.MI[[1]]), as.POSIXct("1941-08-24"), units = c("days")))
The square brackets were needed because Date.MI[[1]] extracts the first column.
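For dates without a time component, plain Date arithmetic gives the same result. A minimal sketch, assuming the dates are yyyy-mm-dd strings; the vector dates below is made-up sample data standing in for the real column:

```r
Ref.date <- as.Date("1941-08-24")

# Hypothetical input; NA entries simply stay NA in the result
dates <- c("1941-08-25", "1942-08-24", NA)

# Subtracting two Date objects yields a difftime in days
days <- as.numeric(as.Date(dates) - Ref.date)
print(days)
```

Missing dates propagate as NA, so they can be filtered out afterwards rather than breaking the conversion.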

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame which is a time series of meteorological measurements with monthly resolution from 1961 to 2018. I am interested in the variable that measures the monthly average temperature, since I need the multi-annual average temperature for the summers.
To do this I must extract from the "DateVariable" column the fifth and sixth digits, which are the month.
The values in the time column are formatted like this:
"19610701". So I need the 07 (July) after 1961.
I started by coding for a single month for other purposes, so I have not tried anything worth mentioning. I guess that grepl could do the job, but I do not know how the "matching" operator works.
So I started with this code, which works:
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am hoping for code like this:
summermonths <- Df[DateVariable %like% "**06**" | DateVariable %like% "**07**" ...]
so that all entries with month digits from 06 to 09 are saved in the new dataframe summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is to convert the variable with as.Date and format it as the month name (class character).
Now I need to select the months from June to September.
A horrible way to get the result I wanted is to do several subsets and an rbind afterwards:
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidy way.
Update 2 - Solution
Here is the tidy way, as described in "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to Date class and take the month (month() is available in lubridate and in data.table):
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check whether it is a summer month:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate from anytime to convert to Date class and then extract the month
library(anytime)
month(anydate(as.character(Df$DateVariable)))
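If only the month digits are needed, they can also be cut out of the string directly, which matches the "fifth and sixth digit" idea from the question. A minimal base R sketch, assuming every value has the form yyyymmdd; the small Df below and its temp column are made-up sample data:

```r
# Hypothetical sample data in the yyyymmdd layout described in the question
Df <- data.frame(DateVariable = c("19610101", "19610601", "19610701", "19611001"),
                 temp = c(-1.2, 16.4, 18.9, 9.3),
                 stringsAsFactors = FALSE)

# Characters 5 and 6 hold the month; as.character() guards against factors
month_digits <- substr(as.character(Df$DateVariable), 5, 6)

# Keep the rows for June to September
summermonths <- Df[month_digits %in% c("06", "07", "08", "09"), ]
print(summermonths)
```

This avoids any date parsing, at the cost of trusting that the strings are well formed.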

Converting variables to dates in R

There is a matrix with two columns: years and months
dates.m
1492 April
1492 August
1492 October
How to convert these two variables into a date format variable (for example mm/yyyy)? Thanks.
How about this:
dates.m<-data.frame(dates.m,stringsAsFactors=F)
dfl=split(dates.m,1:nrow(dates.m))
dates.m$data=do.call(rbind,lapply(dfl,function(rn)
paste(as.Date(paste(paste(rn,collapse = "/"),"01",sep="/"),"%Y/%b/%d"),sep="")))
dates.m$data
[,1]
1 "1492-04-01"
2 "1492-08-01"
3 "1492-10-01"
When it comes to dates I love working with the lubridate package. Here is example code to solve this if you have one column containing the date (assuming it is ordered year-month-day; otherwise change the order of the letters in the function name):
require(lubridate)
df$date<-ymd(df$dates.m)
or if you have them in separate columns:
require(lubridate)
require(stringr)
df$date<-ymd(str_c(as.character(df$Year),as.character(df$Month),
as.character(df$Day),sep="-"))
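A base R sketch that avoids locale-dependent month parsing, assuming the month names are spelled out in English as in the question. It looks each name up in the built-in month.name constant; the column names year and month given to the matrix are assumptions made for this example.

```r
# The matrix from the question, with hypothetical column names
dates.m <- cbind(year  = c("1492", "1492", "1492"),
                 month = c("April", "August", "October"))

# Position of each month name in month.name gives the month number (1 to 12)
month_num <- match(dates.m[, "month"], month.name)

# mm/yyyy as text
mm_yyyy <- sprintf("%02d/%s", month_num, dates.m[, "year"])

# A real Date object needs a day, so the first of the month is used here
first_of_month <- as.Date(sprintf("%s-%02d-01", dates.m[, "year"], month_num))

print(mm_yyyy)
print(first_of_month)
```

A month name that is misspelled or not in English yields NA from match(), which makes bad input easy to spot.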

Data aggregation loop in R

I am facing a problem concerning aggregating my data to daily data.
I have a data frame where NAs have been removed (a link to a picture of the data is given below). Data has been collected 3 times a day, but sometimes, due to NAs, there are just 1 or 2 entries per day; on some days data is missing completely.
I am now interested in calculating the daily mean of "dist": this means summing up the data of "dist" of one day and dividing it by number of entries per day (so 3 if there is no data missing that day). I would like to do this via a loop.
How can I do this with a loop? The problem is that sometimes I have 3 entries per day and sometimes just 2 or even 1. I would like to tell R that for every day, it should sum up "dist" and divide it by the number of entries that are available for every day.
I just have no idea how to formulate a for loop for this purpose. I would really appreciate if you could give me any advice on that problem. Thanks for your efforts and kind regards,
Jan
Data frame: http://www.pic-upload.de/view-11435581/Data_loop.jpg.html
Edit: I used aggregate and tapply as suggested, however, the mean value of the data was not really calculated:
Group.1 x
1 2006-10-06 12:00:00 636.5395
2 2006-10-06 20:00:00 859.0109
3 2006-10-07 04:00:00 301.8548
4 2006-10-07 12:00:00 649.3357
5 2006-10-07 20:00:00 944.8272
6 2006-10-08 04:00:00 136.7393
7 2006-10-08 12:00:00 360.9560
8 2006-10-08 20:00:00 NaN
The code used was:
dates<-Dis_sub$date
distance<-Dis_sub$dist
aggregate(distance,list(dates),mean,na.rm=TRUE)
tapply(distance,dates,mean,na.rm=TRUE)
Don't use a loop. Use R. Some example data :
dates <- rep(seq(as.Date("2001-01-05"),
as.Date("2001-01-20"),
by="day"),
each=3)
values <- rep(1:16,each=3)
values[c(4,5,6,10,14,15,30)] <- NA
and any of :
aggregate(values,list(dates),mean,na.rm=TRUE)
tapply(values,dates,mean,na.rm=TRUE)
gives you what you want. See also ?aggregate and ?tapply.
If you want a data frame back, you can look at the package plyr:
Data <- data.frame(dates, values)
require(plyr)
# summarise computes the mean of values within each group of dates
ddply(Data, "dates", summarise, mean_value = mean(values, na.rm = TRUE))
Keep in mind that ddply does not fully support the Date format (yet).
Look at the data.table package, especially if your data is huge. Here is some code that calculates the mean of dist by day.
library(data.table)
dt <- data.table(Data)  # Data: your data frame with columns date and dist
dt[, list(avg_dist = mean(dist, na.rm = TRUE)), by = 'date']
It looks like your main problem is that your date field has times attached. The first thing you need to do is create a column that has just the date using something like
Dis_sub$date_only <- as.Date(Dis_sub$date)
Then using Joris Meys' solution (which is the right way to do it) should work.
However, if for some reason you really want to use a loop, you could try something like
newFrame <- data.frame()
# loop over the date-only values as character so each day prints readably
for (d in unique(as.character(Dis_sub$date_only))) {
  meanDist <- mean(Dis_sub$dist[as.character(Dis_sub$date_only) == d], na.rm = TRUE)
  newFrame <- rbind(newFrame, data.frame(date = d, meanDist = meanDist))
}
But keep in mind that this will be slow and memory-inefficient.
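Putting the two pieces together (strip the time, then aggregate), a minimal self-contained sketch might look like this. The four rows of Dis_sub are made-up sample values, and the UTC time zone is an assumption; use whatever zone the data was recorded in.

```r
# Hypothetical sample: timestamps with times attached, one missing distance
Dis_sub <- data.frame(
  date = as.POSIXct(c("2006-10-06 12:00:00", "2006-10-06 20:00:00",
                      "2006-10-07 04:00:00", "2006-10-07 12:00:00"),
                    tz = "UTC"),
  dist = c(600, 800, 300, NA)
)

# Drop the time of day so that all entries of one day share the same key
Dis_sub$date_only <- as.Date(Dis_sub$date, tz = "UTC")

# The formula interface drops rows with NA in dist before averaging,
# so each mean is taken over the entries actually available that day
daily <- aggregate(dist ~ date_only, data = Dis_sub, FUN = mean)
print(daily)
```

Each day's mean is divided by the number of available entries automatically, whether that is 3, 2 or 1.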

Resources