How to convert dates to the total number of days since a reference date? (R)

I'm trying to convert yyyy-mm-dd dates in a data frame to the total number of days since a reference date, to use in my survival function.
I've already tried as_date() and grepl(), but I can't get it to work: either there are too many NA values in my data frame or I'm doing something wrong.
library(lubridate)
Ref.date <- ymd("1941-08-24")
Date.MI <- ymd("Date.MI")
Day <- as.numeric(difftime(Date.MI, Ref.date))
I expect just the total number of days since 1941-08-24.
How do I solve the problem?

difftime() lets you specify the units of the result through its units argument, so try something like this:
as.numeric(difftime(as.POSIXct("1941-08-25", tz = "UTC"), as.POSIXct("1941-08-24", tz = "UTC"), units = "days"))  # 1

The way to solve it:
as.numeric(difftime(as.POSIXct(Date.MI[[1]], tz = "UTC"), as.POSIXct("1941-08-24", tz = "UTC"), units = "days"))
The double square brackets were needed because Date.MI[[1]] extracts the first column of the data frame as a vector.
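A minimal base R sketch of the same idea, assuming Date.MI is a character column of yyyy-mm-dd strings (the data frame df and its values are made up for illustration). Using Date objects rather than POSIXct keeps the result in whole days, and missing dates simply stay NA. Note also that ymd("Date.MI") in the question parses the literal text "Date.MI" rather than the column, which on its own yields NA.

```r
# Hypothetical data frame standing in for the real data; Date.MI is assumed
# to be a character column of "yyyy-mm-dd" strings, possibly with NAs.
df <- data.frame(Date.MI = c("1941-08-25", "1950-01-01", NA),
                 stringsAsFactors = FALSE)

ref_date <- as.Date("1941-08-24")

# Date arithmetic works in whole days, so there are no daylight-saving or
# time-zone offsets; NA inputs propagate to NA outputs.
df$Day <- as.numeric(difftime(as.Date(df$Date.MI), ref_date, units = "days"))
df$Day
```

The resulting Day column can be passed straight to the survival function as the time variable.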

Related

Data frames and datetimes [duplicate]

This question already has answers here: Extracting time from POSIXct (7 answers). Closed 8 months ago.
I have a dataset I'm working with, and I'm trying to change the format of my time column. The current format looks like this, for example: “2022-05-23 23:06:58”. I'm trying to change it so that it shows only the time of day and drops the date.
Other info: I want to make this change within my data frame, not just on a few individual values. I need to change over 100,000 rows, so I need a function or solution that works on the whole column (tidyverse, lubridate, format(), etc.). Thank you.
Edit: One thing I may not have made clear: I want to keep the exact time and nothing else, so '23:48:07' is what I'm looking for, not just the hour. I need it so I can eventually compute the time elapsed between two columns.
Try this.
For the first question, here is code to convert a date-time to the time of day:
your_time <- format(as.POSIXct(your_time), format = "%H:%M:%S")
# for "2022-05-23 23:06:58" this gives "23:06:58"
Since you want to apply it to a large dataset, use this:
library(dplyr)
large_df <- large_df %>%
  mutate(Time = format(as.POSIXct(Datetime), format = "%H:%M:%S"))
Here large_df is your dataset with over 100,000 records, and Datetime is its date-time column.
mutate() adds a new column, Time, holding the result.
Is the time as a string OK? If so, you can use substr() to extract the hours and minutes, like so:
time <- c("2022-05-23 23:02:58", "2022-05-23 13:52:58", "2022-05-23 03:31:58", "2022-05-23 09:09:58")
n <- nchar(time)
hour <- substr(time, n - 7, n - 3)  # "23:02" "13:52" "03:31" "09:09"; use n - 7 to n to keep the seconds
Alternatively, data.table's hour() works directly on your 100,000-row time column, although it returns only the hour:
library(data.table)
hour("2022-05-23 23:06:58") # 23
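Since the end goal is the time elapsed between two columns, it may be safer to subtract the full date-times and only format the time of day for display. A base R sketch, assuming two hypothetical character columns start and end; tz = "UTC" is an assumption chosen to avoid daylight-saving shifts:

```r
# Hypothetical data: two date-time columns stored as character strings.
df <- data.frame(
  start = c("2022-05-23 23:06:58", "2022-05-23 13:52:58"),
  end   = c("2022-05-24 00:10:05", "2022-05-23 14:00:00"),
  stringsAsFactors = FALSE
)

# Parse both columns once (vectorised over all rows).
start_ct <- as.POSIXct(df$start, tz = "UTC")
end_ct   <- as.POSIXct(df$end,   tz = "UTC")

# Time of day only, as "HH:MM:SS" text.
df$start_time <- format(start_ct, "%H:%M:%S")

# Elapsed time in minutes. Subtracting full date-times stays correct
# when an interval crosses midnight, which time-only strings would not.
df$elapsed_min <- as.numeric(difftime(end_ct, start_ct, units = "mins"))
df
```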

My data does not convert to time series in R

My data contains several measurements per day. It is stored in a CSV file and looks like this:
[Screenshot: the raw data]
The V1 column is of factor type, so I'm adding an extra date-time column: vd$Vdate <- as_datetime(vd$V1)
[Screenshot: the data with the new Vdate column]
Then I try to convert vd into a time series: vd.ts <- ts(vd, frequency = 365)
But then the dates are gone:
[Screenshot: the resulting time series, without dates]
I just can't see what I'm doing wrong! Could someone help me, please?
Your dates are gone because ts() does not keep a date column: build the ts object from your measurement variables (V1, ..., V7), leaving out the date field, and ts() will construct the time index itself from the start and frequency you give it.
Also, it looks like you have hourly data, so you need to give a frequency that matches your sampling interval, not 365. Judging from what you posted, your frequency looks a bit irregular, so I recommend working out the correct frequency first. For example, hourly data covering every day of the year has a frequency of 365.25*24 (the 0.25 accounts for leap years).
The following is just an example, and it still won't work properly with what I can see (it's a limited view of your dataset, so I'm not 100% sure):
# Build a univariate ts object
vd.ts <- ts(vd$V1, frequency = 365, start = c(2019, 4))
# Check that it is structured correctly
print(vd.ts, calendar = TRUE)
Finally, my time series works properly. I used
library(zoo)
ts <- zoo(measurements, date_times)
and found that date_times had to be converted with as_datetime(), since otherwise they were of character type. The measurements are converted to a data.frame.
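To illustrate how ts() handles frequency, here is a small base R sketch with made-up hourly data; the values, start = c(1, 1) and the 2019 timestamps are assumptions for illustration only. With one day as the time unit, hourly observations have frequency 24 (with one year as the unit it would be 365.25*24, as noted above).

```r
# Hypothetical hourly measurements covering two days (48 values).
set.seed(1)
values <- round(rnorm(48, mean = 10), 2)

# One time unit = one day, so hourly data has frequency 24.
# ts() stores no date column: the index is rebuilt from start and frequency.
hourly <- ts(values, start = c(1, 1), frequency = 24)
tsp(hourly)  # start, end and frequency of the index

# If the real timestamps matter, keep them in a separate POSIXct vector
# (or pair them with the values in a zoo object, as the asker did).
stamps <- seq(as.POSIXct("2019-01-01 00:00", tz = "UTC"), by = "hour", length.out = 48)
range(stamps)
```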

Convert date in vector to Excel date number (R)

I have a vector of date values:
dates=c("43018","43343","42272","06/27/17","01/10/18","10/11/18")
This is a mixture of actual dates and Excel serial date numbers (i.e. the number of days since January 1, 1900). I want to convert all of these values to Excel serial numbers, so the output would look like this:
dates
[1] "43018" "43343" "42272" "42913" "43110" "43384"
My goal is to subtract these values from another vector of the same length, whose date values are all identical, to get the age of each observation.
Can anyone help point me in the right direction? Thank you!
Figured it out: use the janitor package and its excel_numeric_to_date() function.
Badda bing badda boom.
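For a base R route that produces exactly the requested serial numbers, here is a sketch assuming that all-digit entries are already Excel serials and everything else is mm/dd/yy text. It uses 1899-12-30 as the origin, which lines up with Excel serials for dates after February 1900 (Excel counts a nonexistent 1900-02-29).

```r
dates <- c("43018", "43343", "42272", "06/27/17", "01/10/18", "10/11/18")

# Origin that matches Excel serial numbers for dates after 1900-02-28.
excel_origin <- as.Date("1899-12-30")

# Assumption: entries made only of digits are already Excel serials;
# all other entries are mm/dd/yy strings.
is_serial <- grepl("^[0-9]+$", dates)

serial <- numeric(length(dates))
serial[is_serial]  <- as.numeric(dates[is_serial])
serial[!is_serial] <- as.numeric(as.Date(dates[!is_serial], format = "%m/%d/%y") - excel_origin)

as.character(serial)
```

Because every value is then a day count on the same scale, subtracting the other vector's serials gives ages in days directly.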

Changing a string of factors to time

I am trying to take a column of my data that is a factor and convert it to time in the format
hours:minutes:seconds:milliseconds
I tried:
start.times <- as.POSIXct(as.character(start.times), format="%H:%M:%OS")
but it returned values with today's date and without the milliseconds, which is not what I want.
I also installed the chron package and ran:
library(chron)
start.times <- times(start.times)
but this just returned NAs.
Please help!
My data is about the start and end times of dolphin vocalizations, and I am trying to find the mean whistle duration and the inter-whistle interval. Anyway, I don't really know how to get my data into the format I need. Thank you!
Assuming you have a factor that looks like this:
start.time <- c("0:13:45.9", "3:09:44.9")
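then what you wrote should work once the last colon in your data is a period, because %OS reads seconds together with their decimal fraction:
as.POSIXct(start.time, format = "%H:%M:%OS")
# fractional seconds are stored; use options(digits.secs = 1) to see them when printing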
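For whistle durations and inter-whistle intervals you may not need clock times at all, which also avoids today's date being attached. A base R sketch that converts the strings to numeric seconds, assuming the last field holds milliseconds (e.g. "0:13:45:900") and that every entry has all four fields; the start.times and end.times values are hypothetical:

```r
# Hypothetical start/end times in "h:mm:ss:mmm" form (last field = milliseconds).
start.times <- factor(c("0:13:45:900", "0:13:50:250", "0:14:02:000"))
end.times   <- factor(c("0:13:47:100", "0:13:51:000", "0:14:03:500"))

# Convert a factor/character vector of "h:m:s:ms" strings to seconds.
to_seconds <- function(x) {
  fields <- unlist(strsplit(as.character(x), ":", fixed = TRUE))
  parts  <- matrix(as.numeric(fields), ncol = 4, byrow = TRUE)
  parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3] + parts[, 4] / 1000
}

start_s <- to_seconds(start.times)
end_s   <- to_seconds(end.times)

duration <- end_s - start_s                      # length of each whistle (s)
interval <- start_s[-1] - end_s[-length(end_s)]  # gap between consecutive whistles (s)

mean(duration)
mean(interval)
```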

Subset a dataframe based on numerical values of a string inside a variable

I have a data frame that is a time series of meteorological measurements at monthly resolution from 1961 to 2018. I am interested in the variable holding the monthly mean temperature, since I need the multi-annual average temperature for the summers.
To do this, I need to filter on the fifth and sixth digits of the "DateVariable" column, which encode the month.
The values in the date column are formatted like
"19610701", so I need the 07 (July) after 1961.
So far I have only coded this for a single month, for other purposes, so I haven't tried anything worth mentioning. I guess grepl() could do the job, but I don't know how its pattern matching works.
I started with this code, which works:
summersmonth <- Df[DateVariable %like% "19610101" | DateVariable %like% "19610201"]
I am expecting code like this:
summermonths <- Df[DateVariable %like% "06" | DateVariable %like% "07" | ...]
so that all entries with month digits 06 to 09 are saved in the new data frame summermonths.
Thanks in advance for any reply or feedback regarding my question.
Update
Thanks to your answers I got the first part, which is converting the variable with as.Date() and formatting it as the month name (class character).
Now I need to select the months from June to September.
A clumsy way to get the result I want is to take several subsets and rbind() them afterwards:
Sommer1<-subset(Df, MonthVar == "Mai")
Sommer2<-subset(Df, MonthVar == "Juli")
Sommer3<-subset(Df, MonthVar == "September")
SummerTotal<-rbind(Sommer1,Sommer2,Sommer3)
I would be very glad to see this written in a tidier way.
Update 2 - Solution
Here is the tidy way, following the question "Using multiple criteria in subset function and logical operators":
Veg_Seas<-subset(Df, subset = MonthVar %in% c("Mai","Juni","Juli","August","September"))
You can convert your date variable to Date class and take the month:
library(lubridate)  # for month(); data.table::month() works too
allmonths <- month(as.Date(Df$DateVariable, format="%Y%m%d"))
Note that if your column was originally imported as a factor, you need to convert it to character first:
allmonths <- month(as.Date(as.character(Df$DateVariable), format="%Y%m%d"))
Then you can check which rows are summer months:
summersmonth <- Df[allmonths %in% 6:9, ]
Example:
as.Date("20190702", format="%Y%m%d")
[1] "2019-07-02"
month(as.Date("20190702", format="%Y%m%d"))
[1] 7
We can use anydate() from the anytime package to convert to Date class and then extract the month:
library(anytime)
month(anydate(as.character(Df$DateVariable)))
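The original question asked about reading the month straight from the fifth and sixth characters; in base R that needs neither a regex nor a date conversion. A sketch with a small hypothetical Df (the Temp column name and all values are made up), which also computes the multi-annual summer mean in one step instead of several subset() and rbind() calls:

```r
# Hypothetical monthly data: DateVariable as "YYYYMMDD" text, Temp = monthly mean.
Df <- data.frame(
  DateVariable = c("19610501", "19610601", "19610701", "19610801",
                   "19610901", "19611001", "19620601", "19620701"),
  Temp = c(12.1, 15.3, 17.8, 17.2, 13.9, 9.4, 14.8, 18.4),
  stringsAsFactors = FALSE
)

# Characters 5-6 hold the month; as.integer() turns "07" into 7.
month_num <- as.integer(substr(as.character(Df$DateVariable), 5, 6))

# Keep June (6) to September (9) in a single subset.
summermonths <- Df[month_num %in% 6:9, ]

# Multi-annual summer average of the monthly means.
mean(summermonths$Temp)
```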
