Low efficiency when I loop in dataframe R - r

I use the following code to map month names to numbers, and I find it much slower than other data frame computations that don't use a for loop.
Sys.time()
head(df[,4])
for (i in 1:nrow(df)){
df$monthnum[i]<-match(tolower(as.character(df[i,4])), tolower(month.name))
}
Sys.time()
I got output like this:
> Sys.time()
[1] "2016-03-07 19:20:53 CST"
> dim(df)
[1] 229464 6
> head(df[,4])
[1] January January January January January January
Levels: April August December February January July June March May November October September
> for (i in 1:nrow(df)){
+ df$monthnum[i]<-match(tolower(as.character(df[i,4])), tolower(month.name))
+ }
> Sys.time()
[1] "2016-03-07 19:23:23 CST"
Can anyone explain the logic of a for loop over a data frame? Any information will be appreciated.

Use the sapply function.
First, create your function:
my_function = function(my_month){
match(tolower(as.character(my_month)), tolower(month.name))
}
Then apply it to the month column with sapply and store the result:
df$monthnum <- sapply(df[,4],my_function)
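Since match and tolower are already vectorized, the whole column can probably be converted in one call with no loop and no sapply. The sketch below assumes, as in the question, that the month names sit in the fourth column; the small data frame is made up for illustration.

```r
# Toy data frame (hypothetical); column 4 holds month names as a factor
df <- data.frame(
  id    = 1:4,
  a     = 0,
  b     = 0,
  month = factor(c("January", "March", "december", "May"))
)

# One vectorized call: convert the whole column, then look up every value at once
df$monthnum <- match(tolower(as.character(df[[4]])), tolower(month.name))

df$monthnum
# [1]  1  3 12  5
```

The row-by-row loop is slow mainly because each assignment into the data frame carries overhead that the single vectorized assignment avoids.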

Related

How to convert ARIMA forecast output to Date

I have got the output from the ARIMA forecast. But I don't know how to convert the first column to date. The output is the prediction for the last days of 2021 and I wanted to get the date or a day number of 2021. Is it possible?
Following is the first column output which I need to convert:
2021.9589 is the output for 17th Dec.
2021.9616 is the output for 18th Dec.
2021.9644 is the output for 19th Dec.
2021.9671 ... and so on....
2021.9699
2021.9726
2021.9753
2021.9781
2021.9808
2021.9836
2021.9863
2021.989
2021.9918
2021.9945
2021.9973
2022.0000
2022.0027
2022.0055
I would use date_decimal from lubridate. It works like this:
> require(lubridate)
> ( d <- date_decimal(2021.9589) )
[1] "2021-12-16 23:57:50 UTC"
And then use month and day functions:
> month(d)
[1] 12
> day(d)
[1] 16
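Note that the result above falls a couple of minutes before midnight on the 16th, while the question expects the 17th, so some rounding to the nearest day seems to be needed. A base R sketch of that rounding follows; it assumes the decimal part is the fraction of the year elapsed since 1 January, and decimal_year_to_date is a hypothetical helper name.

```r
# Convert decimal years to calendar dates, rounding to the nearest whole day.
decimal_year_to_date <- function(x) {
  yr <- floor(x)
  start <- as.Date(paste0(yr, "-01-01"))
  next_start <- as.Date(paste0(yr + 1, "-01-01"))
  days_in_year <- as.numeric(next_start - start)  # 365, or 366 in a leap year
  start + round((x - yr) * days_in_year)
}

decimal_year_to_date(c(2021.9589, 2021.9616, 2022.0000))
# [1] "2021-12-17" "2021-12-18" "2022-01-01"
```

The day of the year is then available with format(d, "%j") if a day number is preferred over a date.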

How to split Monday, July 1, 2019 12:00:00:000 AM

I have read, studied, and tested, but I'm just not getting it. Here is my data frame:
MyDate TEMP1 TEMP2
Monday, July 1, 2019 12:00:00:000 AM 90.0 1586
Monday, July 1, 2019 12:01:00:000 AM 88.6 1581
Monday, July 1, 2019 12:02:00:000 AM 89.4 1591
Monday, July 1, 2019 12:03:00:000 AM 90.5 1586
I need to compare it to a second data frame:
Date Time A.B.Flow A.B.Batch.Volume
7/1/2019 14:47:46 1.0 2.0
7/9/2019 14:47:48 3.0 5.0
7/11/2019 14:47:52 0.0 2.0
7/17/2019 14:48:52 3.8 4.0
7/24/2019 14:49:52 0.0 3.1
I just have to combine the two data frames where the dates, hours, and minutes match. The seconds do not have to match.
So far I have gleaned that I need to convert the first Column MyDate into separate Dates and Times. I've been unable to come up with a strsplit command that actually does this.
My first attempt just gives each element in quotes:
newdate <- strsplit(testdate$MyDate, "\\s+ ")[[3]]
My second attempt is better, but "2019" is gone:
newdate <- strsplit(testdate$MyDate, "2019")
It looks like this:
[[1]]
[1] "Monday, July 1, " "12:00:00:000 AM"
[[2]]
[1] "Monday, July 1, " "12:01:00:000 AM"
[[3]]
[1] "Monday, July 1, " "12:02:00:000 AM"
[[4]]
[1] "Monday, July 1, " "12:03:00:000 AM"
Please tell me what I am doing wrong. I would love some input as to whether I am barking up the wrong tree.
I've tried a few other things using anytime and lubridate, but I keep coming back to this combined date and time with the day written out as my nemesis.
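You could get rid of the day name (Monday, ...) in your MyDate field by splitting on ',', removing the first element, then combining the rest and converting to POSIXct. The millisecond field (:000) is dropped first so that the 12-hour time and the AM/PM marker can be parsed with %I and %p; with %H, 12:00 AM would be read as noon.
Assuming your first dataframe is called df:
dt <- strsplit(df$MyDate, ',')
df$MyDate2 <- sapply(dt, function(x) trimws(paste0(x[-1], collapse = ',')))
df$MyDate2 <- sub(':[0-9]{3} ', ' ', df$MyDate2)
df$MyDate2 <- as.POSIXct(df$MyDate2, format = '%B %d, %Y %I:%M:%S %p')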
And since you are not interested in the seconds part of the timestamps, you can do:
df$MyDate2 <- format(df$MyDate2, '%Y-%m-%d %H:%M')
You should similarly convert the Date/Time fields of your second dataframe df2, creating a MyDate2 field there with the seconds part removed as above.
Now you can merge the two dataframes on the MyDate2 column.
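The whole procedure, including the second data frame, might look like the sketch below. The two small data frames are toy stand-ins for the ones in the question (the 2:47 PM row is invented so that one row matches), the column name key is an assumption, and the month name and AM/PM parsing assume an English locale.

```r
# Toy versions of the two data frames (values partly invented for illustration)
df <- data.frame(
  MyDate = c("Monday, July 1, 2019 12:00:00:000 AM",
             "Monday, July 1, 2019 2:47:00:000 PM"),
  TEMP1  = c(90.0, 88.6),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Date     = c("7/1/2019", "7/9/2019"),
  Time     = c("14:47:46", "14:47:48"),
  A.B.Flow = c(1.0, 3.0),
  stringsAsFactors = FALSE
)

# First data frame: drop the weekday and the millisecond field, then parse
clean <- sub("^[^,]*, ", "", df$MyDate)     # "July 1, 2019 12:00:00:000 AM"
clean <- sub(":[0-9]{3} ", " ", clean)      # "July 1, 2019 12:00:00 AM"
t1 <- as.POSIXct(clean, format = "%B %d, %Y %I:%M:%S %p", tz = "UTC")

# Second data frame: paste date and time together, then parse (24-hour clock)
t2 <- as.POSIXct(paste(df2$Date, df2$Time),
                 format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Build a key truncated to the minute in both frames and merge on it
df$key  <- format(t1, "%Y-%m-%d %H:%M")
df2$key <- format(t2, "%Y-%m-%d %H:%M")
merged  <- merge(df, df2, by = "key")
```

Here merged keeps only the rows whose date, hour, and minute agree; passing all = TRUE to merge would keep the unmatched rows as well.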
This might give you a hint.
Since you have times, you shouldn't use as.Date but rather as.POSIXct, imho.
library(stringr)  # provides str_remove_all() and word()
x=c("Monday, July 1, 2019 12:00:00:000 AM 90.0 1586")
Months=c("January","February","March","April","May","June","July","August","September","October","November","December")
GetDate=function(x){
x=str_remove_all(x,",") # get rid of the commas
mo=which(Months==word(x,2))
day=word(x,3)
year=word(x,4)
time=word(x,5) # note: the AM/PM marker (word 6) is not used here
as.POSIXct(paste(paste(year,mo,day,sep="-"),time))
}
GetDate(x)

converting multiple date formats into one in r

I am working with messy excel file with multiple date formats
2016-10-17T12:38:41Z
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC
I want to convert all of the above to yyyy-mm-dd format. I am using the following code for the conversion, but a lot of values come out as NA.
as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))
How can I convert all of them together? I have read other threads on similar cases, but nothing seems to work for mine.
Please help
If I add 'dmy' to the list then at least all of the cases in your example are successfully parsed:
z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
"10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC")
library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"
Your big problem will be the third and fourth elements: are these actually meant to be 'ymd' and 'dmy' respectively? I'm not sure how any logic will let you auto-detect these differences ... out of context, "15 Sep 2010" and "10 September 2015" both seem perfectly reasonable possibilities ...
For what it's worth I also tried the new anytime package - it only handled the first and last element.
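The ambiguity in the third and fourth elements can be seen directly in base R: the same string parses to two different, equally valid dates depending on which field is taken as the two-digit year. This is only an illustration of the problem, not a way to resolve it, and it assumes an English locale for the month abbreviation.

```r
s <- "10-Sep-15"

# Read as day-month-year
as_dmy <- as.Date(s, format = "%d-%b-%y")
# Read as year-month-day
as_ymd <- as.Date(s, format = "%y-%b-%d")

as_dmy
# [1] "2015-09-10"
as_ymd
# [1] "2010-09-15"
```

Only knowledge of where the data came from can tell which reading is the intended one.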
Removing the times first makes it possible to specify only three alternatives in orders to parse the sample data in the question. This interprets 10-Sep-15 and 13-Oct-09 as dmy but if you want them interpreted as ymd then uncomment the commented out line:
library(lubridate)
orders <- c("dmy", "mdy", "ymd")
# orders <- c("ymd", "dmy", "mdy")
# x is the input vector given in the Note at the end of this answer
as.Date(parse_date_time(gsub("..:..:..", " ", x), orders = orders))
giving:
[1] "2016-10-17" "2016-10-17" "2015-09-10" "2009-10-13" "2016-10-18"
or if the commented out line is uncommented then:
[1] "2016-10-17" "2016-10-17" "2010-09-15" "2013-10-09" "2016-10-18"
Note: The input is:
x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15",
"13-Oct-09", "18-Oct-2016 05:42:26 UTC")

Obtaining last Friday's date

I can get today's date:
Sys.Date( )
But how do I get last Friday's date?
I tried:
library(xts)
date1 <- Sys.Date( )
to.weekly(date1 )
But this gives an error.
I think this should work:
library(lubridate)
Sys.Date() - wday(Sys.Date() + 1)
Try this:
library(zoo)
lastfri(Sys.Date())
where lastfri is the same as the one-line function nextfri in the zoo quickref vignette, except that ceiling is replaced with floor. Note that lastfri is vectorized, i.e. it can take a vector of input dates and produces a vector of output dates. For example,
library(zoo)
Sys.Date()
## 2015-03-10
lastfri(Sys.Date() + 0:6)
## [1] "2015-03-06" "2015-03-06" "2015-03-06" "2015-03-13" "2015-03-13"
## [6] "2015-03-13" "2015-03-13"
Thus last Friday was March 6th, and we keep getting March 6th until the day advances to the next Friday, at which point the last Friday is March 13th.
Aside: Next Friday is Friday the 13th.
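Because lastfri is not defined in the answer itself, here is a base R sketch of the same floor idea. It is a reconstruction under the assumption that counting whole weeks from a known Friday (1970-01-02) is what the vignette function amounts to; it is not copied from zoo.

```r
# Most recent Friday on or before each date in x (vectorized)
lastfri <- function(x) {
  x <- as.Date(x)
  fri0 <- as.Date("1970-01-02")                 # a known Friday
  fri0 + 7 * floor(as.numeric(x - fri0) / 7)    # whole weeks since that Friday
}

lastfri(as.Date("2015-03-10") + 0:6)
# [1] "2015-03-06" "2015-03-06" "2015-03-06" "2015-03-13" "2015-03-13"
# [6] "2015-03-13" "2015-03-13"
```

Note that a Friday maps to itself here, so "last Friday" means "on or before" in this version.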
Here is a function that finds the last date for any day of the week:
getlastdate <- function(day) {
library(lubridate)
dates <- seq((Sys.Date()-7), (Sys.Date()-1), by="days")
dates[wday(dates, label=T)==day]
}
getlastdate("Mon")
# "2015-03-09"
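Enter the day of the week in the abbreviated format that wday(label = TRUE) returns. In the lubridate version used here that was:
Sun Mon Tues Wed Thurs Fri Sat
Newer lubridate versions use Tue and Thu instead, so check the labels in your installation.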
Last Friday was 4 days ago, thus:
Sys.Date()-4
> Sys.Date()-4
[1] "2015-03-06"
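OR, whatever day of the week today is, using base R (the modulo keeps the offset between 1 and 7, so a Saturday gives yesterday rather than the Friday eight days ago):
Sys.Date() - ((as.POSIXlt(Sys.Date())$wday + 1) %% 7 + 1)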
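The same base R idea can be generalized to any target weekday. The sketch below is one possible formulation; last_weekday and its arguments are hypothetical names, and weekdays are numbered as in POSIXlt (0 = Sunday through 6 = Saturday). It returns the most recent target day strictly before the given date.

```r
# Most recent `target` weekday strictly before `date`
# target: 0 = Sunday, 1 = Monday, ..., 5 = Friday, 6 = Saturday
last_weekday <- function(date, target) {
  date <- as.Date(date)
  wd <- as.POSIXlt(date)$wday
  offset <- (wd - target - 1) %% 7 + 1   # always between 1 and 7 days back
  date - offset
}

last_weekday("2015-03-10", 5)   # Tuesday  -> previous Friday
# [1] "2015-03-06"
last_weekday("2015-03-14", 5)   # Saturday -> the day before
# [1] "2015-03-13"
last_weekday("2015-03-13", 5)   # Friday   -> the Friday a week earlier
# [1] "2015-03-06"
```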

Format Date (Year-Month) in R [duplicate]

Is it possible to format the following numbers as Year-Month?
I have entries as follows:
1402
1401
1312
Meaning February 2014, January 2014, and December 2013.
I tried:
date <- 1402
date <- as.Date(as.character(date), format = "%y%m")
But I get an NA as an output.
The zoo package has a "yearmon" class that directly handles year/month objects:
library(zoo)
nums <- c(1402, 1401, 1312)
ym <- as.yearmon(as.character(nums), "%y%m")
giving:
> ym
[1] "Feb 2014" "Jan 2014" "Dec 2013"
You need to include a day number; otherwise it is impossible to know which day of the month you have in mind. Consider:
> strptime('011402', format = "%d%y%m")
[1] "2014-02-01"
as.Date requires a full date, with day specified. Since you don't include a day it doesn't know what to do.
You could add any day and it should work like this
date <- 140201
date <- as.Date(as.character(date), format="%y%m%d")
You could use the lubridate package to work with dates a little more easily.
> library(lubridate)
> month(ymd(as.character(140201)), label=TRUE, abbr=FALSE)
[1] February
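If the goal is simply a "Year-Month" string, the add-a-day trick can be combined with format in base R. This is a minimal sketch that assumes every entry is a four-digit yymm number and that the first of the month is an acceptable placeholder day.

```r
nums <- c(1402, 1401, 1312)

# Pad to four digits, append a placeholder day "01", and parse as a full date
dates <- as.Date(sprintf("%04d01", as.integer(nums)), format = "%y%m%d")

# Format back as Year-Month
ym <- format(dates, "%Y-%m")
ym
# [1] "2014-02" "2014-01" "2013-12"
```

The result is a character vector; keep dates if you need to sort or do date arithmetic.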
