I want the slope from a couple of columns that look like this:
date time
7/8/2014 23.4917166
7/9/2014 28.69671107
7/10/2014 27.3262166
7/11/2014 30.25426663
7/12/2014 29.8345944
7/13/2014 27.7473055
7/14/2014 29.8657722
7/15/2014 29.2622055
The problem is, lm() doesn't seem to play ball with date in a mm/dd/yyyy format. If I make the date data numeric like so:
date time
1 23.4917166
2 28.69671107
3 27.3262166
4 30.25426663
5 29.8345944
6 27.7473055
7 29.8657722
8 29.2622055
and run something like
timetest <- read.table("clipboard", sep="\t", header=T)
test <- lm(time ~ date, data=timetest)
coefficients(test)[2]
I get:
date
0.5605038
So how should I go about transforming the mm/dd/yyyy date format into something numeric? Is there a function to cast them as unix time?
If you first convert the date field to a Date, specifying the format used (MM/DD/YYYY corresponds to %m/%d/%Y), then lm does the numeric conversion for you:
timetest$new_date <- as.Date(timetest$date, format = "%m/%d/%Y")
So, the regression looks like
test <- lm(time ~ new_date, data = timetest)
coefficients(test)[2]
and gives
new_date
0.5605038
Note that as.numeric turns the date into the number of days since 1970-01-01
as.numeric(timetest$new_date[1])
[1] 16259
and
difftime(timetest$new_date[1], as.Date("1970-01-01"))
Time difference of 16259 days
You can also use predict to obtain predicted values for new dates given in the format of the original field:
predict(test, data.frame(new_date =
seq.Date(as.Date("7/16/2014", format = "%m/%d/%Y"),
as.Date("7/20/2014", format = "%m/%d/%Y"), by = 1)))
that returns
1 2 3 4 5
30.83212 31.39262 31.95312 32.51363 33.07413
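As a self-contained check of the approach above, here is a minimal sketch that rebuilds the question's data by hand instead of reading it from the clipboard (the data frame construction and the day_index column are my own additions, not part of the original answer). It should show that regressing on the Date column gives the same slope as regressing on 1..8, because consecutive dates are exactly one day apart.

```r
# Rebuild the question's data by hand (assumption: stands in for read.table("clipboard"))
timetest <- data.frame(
  date = c("7/8/2014", "7/9/2014", "7/10/2014", "7/11/2014",
           "7/12/2014", "7/13/2014", "7/14/2014", "7/15/2014"),
  time = c(23.4917166, 28.69671107, 27.3262166, 30.25426663,
           29.8345944, 27.7473055, 29.8657722, 29.2622055),
  stringsAsFactors = FALSE
)

# Parse the m/d/Y strings into Date objects
timetest$new_date <- as.Date(timetest$date, format = "%m/%d/%Y")

# Hypothetical helper column: the hand-made 1..8 index from the question
timetest$day_index <- seq_len(nrow(timetest))

# Slope against real dates (units: "time" per day)
slope_date <- unname(coef(lm(time ~ new_date, data = timetest))[2])

# Slope against the manual index
slope_index <- unname(coef(lm(time ~ day_index, data = timetest))[2])

print(c(slope_date = slope_date, slope_index = slope_index))
```

Both slopes are in units of "time per day"; only the intercept differs, since the Date version counts days from 1970-01-01 rather than from 1.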
For some reason the as.POSIXct() wasn't working, so I went with:
timetest <- read.table("clipboard", sep="\t", header=T)
timetest$date <- as.numeric(as.Date(timetest$date, "%m/%d/%Y"))
test <- lm(time ~ date, data=timetest)
coefficients(test)[2]
Here the new second line simply overwrites the original m/d/y data with numeric values (days since 1970-01-01). Unix time is not necessary for this process.
If you want to convert your dates into Unix time, you can use something like
unix_time<-as.numeric(as.POSIXct(date, format="%m/%d/%Y"))
but the values will end with a lot of zeros, so I'm not sure how useful they'll be in a regression.
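To make the relationship between the two numeric encodings concrete, here is a small sketch. It assumes the dates are interpreted in UTC (the tz = "UTC" argument is my addition; without it, as.POSIXct uses the local time zone and the values shift by the UTC offset). Under that assumption Unix time is just the day count multiplied by 86400, so a regression slope per second would be the slope per day divided by 86400.

```r
dates <- c("7/8/2014", "7/9/2014", "7/10/2014")

# Seconds since 1970-01-01 00:00:00 UTC (assumption: midnight UTC for each date)
unix_time <- as.numeric(as.POSIXct(dates, format = "%m/%d/%Y", tz = "UTC"))

# Days since 1970-01-01
days <- as.numeric(as.Date(dates, format = "%m/%d/%Y"))

# The two encodings differ only by the factor 86400 (seconds per day)
print(unix_time / days)
```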
Related
I want to merge two data frames in R using the date as the primary key. While trying to change the date-time format in one of the data frames to date only, I'm getting NA in the date column. Below is the date-time format which I want to change to mm dd yy only.
4/12/2016 0:00
This is the code chunk I used.
sleep_day <- sleep_day %>%
rename(date = sleepday) %>%
mutate(date = as.Date(date,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
I am expecting the date column to change from date and time to date alone, i.e. from mm dd yy 00:00 to mm dd yy. The result I got in the date column is NA.
Your format is not correct:
test <- "4/12/2016 0:00"
as.Date(test,format ="%m/%d/%Y %H" , tz=Sys.timezone())
will work. Look at ?strptime.
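For illustration, a base R sketch of why the original format string returns NA and what a fully matching one looks like. The input "4/12/2016 0:00" has a 24-hour hour and minutes, but no seconds and no AM/PM marker, so a format that demands %I (hours 1 to 12), %S and %p cannot match. The variable names here are only for the example.

```r
test <- "4/12/2016 0:00"

# Format from the question: asks for 12-hour clock, seconds and AM/PM,
# none of which are present in the input, so parsing fails
bad <- as.Date(test, format = "%m/%d/%Y %I:%M:%S %p")

# Format that matches the input exactly: 24-hour hour and minutes
good <- as.Date(test, format = "%m/%d/%Y %H:%M")

print(bad)   # NA
print(good)  # "2016-04-12"
```

The shorter "%m/%d/%Y %H" also works because strptime ignores trailing characters once the format is satisfied; spelling out %H:%M just makes the intent explicit.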
As a piece of advice, prefer working with the lubridate library, which has easy-to-use functions that parse a lot of different formats:
library(lubridate)
mdy_hm(test)
"2016-04-12 UTC"
I have got a data frame with different columns.
One column is called "TIMESTAMP". When I click on the column, I see it is of type character; its content is shown below:
TIMESTAMP Price
2003-06-20 09:19:00 5.25
2003-06-20 09:21:00 5.34
2003-06-20 09:22:00 5.43
2003-06-20 09:23:00 5.32
I'd like to convert the whole "TIMESTAMP" column to POSIXct.
The reason is that I afterwards want to add the missing minutes to the column: as you can see, between rows 1 and 2 the timestamp 09:20:00 is missing. I want to add the missing minutes from 09:00:00 to 17:30:00, with the correct date too, of course.
Let's call the dataframe data.
I tried as.POSIXct(data$TIMESTAMP, format="%Y-%m-%d %H:%M:%S"), but I'm unsure if it was successful, because the data in the data frame didn't change.
Is there also a hint how to add the missing timestamps after getting the correct format?
Thanks for your help!
What you tried is correct as long as you assign the result back to the column of your data frame. This is what you should do:
> data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP, format="%Y-%m-%d %H:%M:%S")
After that, the TIMESTAMP column will have the desired class:
> class(data$TIMESTAMP)
[1] "POSIXct" "POSIXt"
To complete your data frame with the missing rows, you can first build a new data.frame with all the expected times and then merge it with your initial data. Below I'm using min and max to find the range of date-times, then seq.POSIXt by minute to generate the full set of date-times. The merge will then carry over the already existing price values from your initial data frame, leaving NA where a minute was missing:
> data_full <- data.frame(TIMESTAMP = seq.POSIXt(from=min(data$TIMESTAMP), to=max(data$TIMESTAMP), by='min'))
> data_complete <- merge(data_full, data, all.x = T)
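Here is the same idea as a self-contained sketch built from the four rows shown in the question. The tz = "UTC" argument is an assumption I added so the result does not depend on the local time zone; adjust it to whatever zone the timestamps were recorded in.

```r
# The four sample rows from the question
data <- data.frame(
  TIMESTAMP = c("2003-06-20 09:19:00", "2003-06-20 09:21:00",
                "2003-06-20 09:22:00", "2003-06-20 09:23:00"),
  Price = c(5.25, 5.34, 5.43, 5.32),
  stringsAsFactors = FALSE
)

# Convert the character column and assign the result back
data$TIMESTAMP <- as.POSIXct(data$TIMESTAMP,
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Every minute between the first and last observation
data_full <- data.frame(
  TIMESTAMP = seq(min(data$TIMESTAMP), max(data$TIMESTAMP), by = "min")
)

# Left join: keep all minutes, fill Price with NA where no row existed
data_complete <- merge(data_full, data, by = "TIMESTAMP", all.x = TRUE)
print(data_complete)
```

To cover a fixed trading window such as 09:00:00 to 17:30:00 instead of the observed range, the from and to arguments of seq could be replaced by explicit POSIXct values for that day.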
Currently I am attempting to convert dates in the YYYYMMDD format to separate columns for year, month, and day. I know that using the as.Date function I can convert YYYYMMDD to YYYY-MM-DD, and work from there, however R is misinterpreting the dates and I'm not sure what to do. The function is converting the values into dates, but not correctly.
For example: R is converting '19030106' to '2019-03-01', when it should be '1903-01-06'. I'm not sure how to fix this, but this is the code I am using.
library(lubridate)
PrecipAll$Date <- as.Date(as.character(PrecipAll$YYYYMMDD), format = "%y%m%d")
YYYYMMDD is currently numeric, and I needed to include as.character in order for it to output a date at all, but if there are better solutions please help.
Additionally, if you have any tips on separating the corrected dates into separate Year, Month, and Date columns that would be greatly appreciated.
With {lubridate}, try ymd() to parse the YYYYMMDD variable, regardless of whether it is in numeric or character form. Also use {lubridate}'s year, month, and day functions to get those components as numeric values.
library(dplyr)      # provides mutate()
library(lubridate)
PrecipAll <- data.frame(YYYYMMDD = c(19030106, 19100207, 20001130))
mutate(.data = PrecipAll,
date = lubridate::ymd(YYYYMMDD),
year = year(date),
month_n = month(date),
day_n = day(date))
YYYYMMDD date year month_n day_n
1 19030106 1903-01-06 1903 1 6
2 19100207 1910-02-07 1910 2 7
3 20001130 2000-11-30 2000 11 30
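For comparison, a base R sketch of the same fix without {lubridate}. The original call failed because %y means a two-digit year: "19030106" was read as year "19" (2019), month "03", day "01", and the trailing "06" was ignored. Using %Y (four-digit year) parses it as intended. The column names year, month and day below are just my choice for the example.

```r
PrecipAll <- data.frame(YYYYMMDD = c(19030106, 19100207, 20001130))

# %Y = four-digit year (the question used %y, a two-digit year)
PrecipAll$Date <- as.Date(as.character(PrecipAll$YYYYMMDD), format = "%Y%m%d")

# Extract the components as integers with format()
PrecipAll$year  <- as.integer(format(PrecipAll$Date, "%Y"))
PrecipAll$month <- as.integer(format(PrecipAll$Date, "%m"))
PrecipAll$day   <- as.integer(format(PrecipAll$Date, "%d"))

print(PrecipAll)
```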
I couldn't find a solution to my problem with the POSIXct format. I have monthly data. This is a fragment of my code:
Data <- as.POSIXct(as.character(czerwiec$Data), format = "%Y-%m-%d %H:%M:%S")
get.rows <- Data >= as.POSIXct(as.character("2013-06-03 00:00:01")) & Data <= as.POSIXct(as.character("2013-06-09 23:59:59"))
czerwiec <- czerwiec[get.rows,]
Data <- Data[get.rows]
I chose one whole week of June, from the 3rd to the 9th, and wanted to compute the sum of column X (czerwiec$X) for every hour. As you can see, I could narrow down the time range by hand, but it would be silly to do it like this:
get.rows <- Data >= as.POSIXct(as.character("2013-06-03 00:00:01")) &
Data <= as.POSIXct(as.character("2013-06-03 00:59:59"))
then
get.rows <- Data >= as.POSIXct(as.character("2013-06-04 00:00:01")) &
Data <= as.POSIXct(as.character("2013-06-04 00:59:59"))
At the end of these operations I can compute the sum for each hour, and so on.
Do you have any idea how I can select all rows with dates from 2013-06-03 to 2013-06-09 and times from 00:00:01 to 00:59:59?
About the data frame "czerwiec": it has three columns, the first called "ID", the second "Price" and the third "Data" (meaning Date).
Thanks for the help :)
This might help. I've used the lubridate package, which doesn't really do anything you can't do in base R, but it makes handling dates much easier
# Set up Data as a string vector
Data <- c("2013-06-01 05:05:05", "2013-06-06 05:05:05", "2013-06-06 08:10:05", "2013-07-07 05:05:05")
require(lubridate)
# Set up the data frame with fake data. This makes a reproducible example
set.seed(4) #For reproducibility, always set the seed when using random numbers
# Create a data frame with Data and price
czerwiec <- data.frame(price=runif(4))
# Use lubridate to turn the Data string into a vector of POSIXct objects
czerwiec$Data <- ymd_hms(Data)
# Determine the 'yearday' -i.e. yearday of Jan 1 is 1; yearday of Dec 31 is 365 (or 366 in a leap year)
czerwiec$yday <- yday(czerwiec$Data)
# in.range is TRUE if the date is in the desired date range
czerwiec$in.range <- czerwiec$yday >= yday(ymd("2013-06-03")) &
  czerwiec$yday <= yday(ymd("2013-06-09"))
# Pick out the dates that have the range that you want
selected_dates <- subset(czerwiec, in.range==TRUE)
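The answer above selects the week; the question also asked for sums per hour. A minimal base R sketch of that step follows, under some assumptions: the values to sum are in a column called Price (the question mentions both X and Price), the sample rows are made up for illustration, and timestamps are treated as UTC. It labels each row with its hour and lets aggregate do the grouping, so no per-hour filter has to be written by hand.

```r
# Made-up sample data (assumption: column names follow the question's description)
czerwiec <- data.frame(
  Price = c(1, 2, 3, 4),
  Data  = as.POSIXct(c("2013-06-03 00:10:00", "2013-06-03 00:50:00",
                       "2013-06-03 01:05:00", "2013-06-10 00:30:00"),
                     tz = "UTC")
)

# Keep only rows from 2013-06-03 up to (but not including) 2013-06-10
in_week <- czerwiec$Data >= as.POSIXct("2013-06-03 00:00:00", tz = "UTC") &
           czerwiec$Data <  as.POSIXct("2013-06-10 00:00:00", tz = "UTC")
week <- czerwiec[in_week, ]

# Label every row with the hour it falls in, e.g. "2013-06-03 00:00"
week$hour <- format(week$Data, "%Y-%m-%d %H:00")

# Sum Price within each hour
hourly <- aggregate(Price ~ hour, data = week, FUN = sum)
print(hourly)
```

To pool the same clock hour across all seven days instead (for example every 00:00 to 00:59 of the week together), the label could be format(week$Data, "%H") rather than the full date and hour.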
I have one question: how can I convert a date and time in the format 20110711201023 to a number of hours? This is the output of software I use for image analysis, and I can't change it. It is very important to be able to define the starting date and time.
Format: 2011 year, 07 month, 11 day, 20 hour, 10 minute, 23 second.
Example:
Starting Data and Time - 20110709201023,
First Data and Time - 20110711214020
Result = 49,5h.
I have 10000 values in this format, so I don't want to do this manually.
I would be very grateful for any advice.
Best is to first make it a real R time object using strptime:
time_obj = strptime("20110711201023", format = "%Y%m%d%H%M%S")
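If you do this with both the start and the end date, you can simply say:
difftime(end_time, start_time, units = "hours")
to get the difference in hours. (A plain end_time - start_time also works, but it picks the units automatically, so the result may be reported in days rather than in seconds or hours.) To convert a whole list of these time strings, simply do: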
time_vector = strptime(dat$time_string, format = "%Y%m%d%H%M%S")
where dat is the data.frame with the data, and time_string the column containing the time strings. Note that strptime works also on a vector (it is vectorized). You can also make the new time vector part of dat:
dat$time = strptime(dat$time_string, format = "%Y%m%d%H%M%S")
or more elegantly (at least if you hate $ as much as me :)):
dat = within(dat, { time = strptime(time_string, format = "%Y%m%d%H%M%S") })
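Putting the pieces together for the example in the question, here is a sketch that computes hours elapsed since a fixed start for a whole column. The second time string, the hours column name and tz = "UTC" are my own assumptions for illustration; as.POSIXct is wrapped around strptime because POSIXct is the more convenient class to store in a data frame.

```r
# Sample data: the first value is from the question, the second is made up
dat <- data.frame(
  time_string = c("20110711214020", "20110712201023"),
  stringsAsFactors = FALSE
)

# Fixed starting date and time from the question
start_time <- as.POSIXct(strptime("20110709201023",
                                  format = "%Y%m%d%H%M%S", tz = "UTC"))

# Parse the whole column at once (strptime is vectorized)
dat$time <- as.POSIXct(strptime(dat$time_string,
                                format = "%Y%m%d%H%M%S", tz = "UTC"))

# Elapsed time since the start, explicitly in hours, as plain numbers
dat$hours <- as.numeric(difftime(dat$time, start_time, units = "hours"))
print(dat)
```

The first row comes out at roughly 49.5 hours, matching the expected result in the question (48 hours plus 1 h 29 min 57 s).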