Find first Tuesday of Month - r

I am trying to write a function which takes a vector of dates as an input and returns a vector of dates -- where the output is the date of the first Tuesday of the month which matches the input date.
So 2012-11-19 --> 2012-11-06, etc.
I have had some success with a single date, but have not been able to generalise to the vector case. Could someone please help?
This is what I have so far:
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
if (firstOf$wday > 2)
{
adjDays <- (9 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
else {
adjDays <- (2 - firstOf$wday)
firstTues <- as.POSIXlt(as.POSIXct(firstOf) + 60*60*24*adjDays)
}
return(firstTues)
}
This works for a single date, firstTuesday(Sys.Date()), but yields junk for vectors of dates (because if is not a vectorised control operator, I think).
I got around my limited understanding by using indexing. The following code seems to do the trick.
firstTuesday <- function(tt){
ct <- as.POSIXct(tt)
lt <- as.POSIXlt(tt)
firstOf <- as.POSIXlt(ct - 60*60*24* (lt$mday - 1))
firstTue <- as.POSIXct(firstOf)
idx <- firstOf$wday > 2
firstTue[idx] <- as.POSIXct(firstOf[idx]) + 60*60*24*(9 - firstOf$wday[idx])
firstTue[!idx] <- as.POSIXct(firstOf[!idx]) + 60*60*24*(2 - firstOf$wday[!idx])
return(firstTue)
}

This uses lubridate and makes the logic a little simpler. Given a vector of dates the second function will return a vector of characters, similar to your input. You can change things around to suit your needs.
library(lubridate)
getTuesday = function(x) {
date = ymd(x)
first = floor_date(date,"month")
dow = sapply(seq(0,6),function(x) wday(first+days(x)))
firstTuesday = first + days(which(dow==3)-1)
return(firstTuesday)
}
getMultipleTuesdays = function(y) {
tmp = lapply(y, getTuesday)
tmp = lapply(tmp, as.character)
return(unlist(tmp))
}
Edit
Sample input/output
getMultipleTuesdays(c("2012-11-19","2012-11-19","2011-01-15"))
[1] "2012-11-06" "2012-11-06" "2011-01-04"

Here's a simple solution using base functions:
firstDayOfMonth <- function(dates, day="Mon", abbreviate=TRUE) {
# first 7 days of month
s <- lapply(as.Date(format(dates,"%Y-%m-01")), seq, by="day", length.out=7)
# the requested weekday within those 7 days
d <- lapply(s, function(d) d[weekdays(d,abbreviate)==day])
# unlist() would drop the Date class, so use do.call(c, ...) instead
do.call(c, d)
}
Well, maybe the do.call at the end isn't so simple... but it's a handy piece of knowledge. :)
R> d <- as.Date(c("2012-11-19","2012-11-19","2011-01-15"))
R> firstDayOfMonth(d, "Tuesday", FALSE)
[1] "2012-11-06" "2012-11-06" "2011-01-04"
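For comparison, here is a minimal vectorised sketch in base R that avoids both if and the lapply loop. It assumes the input can be coerced with as.Date(); the function name firstTuesdayDate is made up for this illustration. The idea is to take the first of the month and step forward by (2 - wday) %% 7 days, where wday is 0 for Sunday and 2 for Tuesday.

```r
# Hypothetical helper: first Tuesday of the month for each input date
firstTuesdayDate <- function(dates) {
  dates <- as.Date(dates)
  # first day of each month
  first <- as.Date(format(dates, "%Y-%m-01"))
  # weekday of that first day: 0 = Sunday, ..., 2 = Tuesday
  wd <- as.POSIXlt(first)$wday
  # number of days (0 to 6) to move forward to reach a Tuesday
  first + (2 - wd) %% 7
}

firstTuesdayDate(c("2012-11-19", "2011-01-15"))
# [1] "2012-11-06" "2011-01-04"
```

Because every step is vectorised, no indexing on a logical condition is needed; replacing the 2 with another weekday number should give the first occurrence of that weekday.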

Related

R Switch first part and second part of string at punctuation

I have a data.frame with columns:
names(data) = c("newid","Player.WR","data_col.WR","Trend.WR","Player.QB","data_col.QB","Trend.QB","Player.RB","data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE" )
However, I need to flip the first and second portions of each name at the period so it looks like this:
names(data) = c("newid", "WR.Player", "WR.data_col", "WR.Trend", "QB.Player", "QB.data_col", "QB.Trend", "RB.Player", "RB.data_col", "RB.Trend", "TE.Player", "TE.data_col", "TE.Trend")
My initial thought was to try to do a strsplit and then somehow do an lapply statement to reorder, but I wasn't sure how to make the lapply work.
Thanks!
With a vector of names v, you could also try:
v <- c("newid","Player.WR","data_col.WR","Trend.WR",
"Player.QB","data_col.QB","Trend.QB","Player.RB",
"data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE")
gsub(
'(.*)\\.(.*)',
'\\2\\.\\1',
v
)
Output:
[1] "newid" "WR.Player" "WR.data_col" "WR.Trend" "QB.Player" "QB.data_col" "QB.Trend" "RB.Player"
[9] "RB.data_col" "RB.Trend" "TE.Player" "TE.data_col" "TE.Trend"
And to directly assign it to names:
names(data) <- gsub('(.*)\\.(.*)', '\\2\\.\\1', v)
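As a small variation, the same swap can be applied to names(data) itself rather than to a separate vector. This is only a sketch: the three-column data frame is invented for illustration, and the pattern assumes each name contains at most one period.

```r
# Toy data frame standing in for the real one (assumed for illustration)
data <- data.frame(newid = 1, Player.WR = "a", data_col.WR = 2)

# Swap the two parts around a single period; names without a period do not match
# the pattern and are returned unchanged
names(data) <- sub("^([^.]+)\\.([^.]+)$", "\\2.\\1", names(data))

names(data)
# [1] "newid"       "WR.Player"   "WR.data_col"
```

Using [^.]+ instead of .* makes it explicit that neither part may itself contain a period.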
I would suggest the following approach, which uses a function to exchange the positions of the values, together with lapply():
#Data
vec <- c("newid","Player.WR","data_col.WR","Trend.WR",
"Player.QB","data_col.QB","Trend.QB","Player.RB",
"data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE" )
#Split
L <- lapply(vec,strsplit,split='\\.')
#Format function
myfun <- function(x)
{
y <- x[[1]]
#if check
if(length(y)!=1)
{
z <- paste0(y[c(2,1)],collapse = '.')
} else
{
z <- y
}
return(z)
}
#Apply
L2 <- lapply(L,FUN = myfun)
#Bind
do.call(c,L2)
Output:
[1] "newid" "WR.Player" "WR.data_col" "WR.Trend" "QB.Player" "QB.data_col" "QB.Trend"
[8] "RB.Player" "RB.data_col" "RB.Trend" "TE.Player" "TE.data_col" "TE.Trend"
The last output can be saved in a new vector, e.g. vecnamesnew <- do.call(c,L2)
Arg0naut91's answer is quite concise, and I would recommend using Arg0naut91's approach. However, for the sake of providing a (somewhat) concise solution using strsplit and lapply with (perhaps) a bit more readability for those unfamiliar with gsub syntax, I submit the following:
names<-c("newid","Player.WR","data_col.WR","Trend.WR",
"Player.QB","data_col.QB","Trend.QB","Player.RB",
"data_col.RB","Trend.RB","Player.TE","data_col.TE","Trend.TE" )
newnames<-unlist(lapply(names,function(x) paste(rev(unlist(strsplit(x,split="\\."),use.names=FALSE)),collapse=".")))
print(newnames)
which yields
[1] "newid" "WR.Player" "WR.data_col" "WR.Trend" "QB.Player" "QB.data_col" "QB.Trend"
[8] "RB.Player" "RB.data_col" "RB.Trend" "TE.Player" "TE.data_col" "TE.Trend"
as output.

How to take value from one column and store it in newly created column using function call

Firstly, sorry if this is a stupid question. I am learning R and really don't have much experience.
I have the following function in R, which takes a value and returns a value.
dec2binSingle <- function(decimal) {
print(decimal)
binaryValue <- ""
index <- 0
decimal <- as.numeric(decimal)
while(decimal != 0) {
print(decimal)
temp <- as.numeric(decimal) %% 2
if (temp == 1) {
binaryValue <- paste("1", binaryValue, sep="", collapse = NULL)
decimal <- decimal - 1
} else {
binaryValue <- paste("0", binaryValue, sep="", collapse = NULL)
}
index <- index + 1
decimal <- decimal / 2
}
return(binaryValue)
}
The function is converting decimal number into binary equivalent.
When I try to call the function, the function completes without any error, but when I try to see the data, the following error appears:
Error in View : 'names' attribute [200] must be the same length as the vector [1]
This is how the function is being called:
test_function <- function(value1) {return(dec2binSingle(as.numeric(unlist(value1))))}
data_example$tv <- with(data_example, test_function(data_example[which(colnames(data_example) == "numbers")]))
Any help is appreciated... thanks
EDIT:
I called the function for single value and it works as expected.
> dec2binSingle(23)
[1] "10111"
>
I hope this is what you wanted to achieve with your code.
#sample data
df <- data.frame(char1=c("abc","def","xyz"), num1=c(1,34,12), num2=c(34,20,8))
df
#function to convert decimal into binary
bin_func <- function(x) {gsub("^0+","",paste(rev(as.numeric(intToBits(x))), collapse=""))}
#find which columns are numeric
num_col <- sapply(df,is.numeric)
df1 <- as.data.frame(lapply(df[,num_col], FUN = function(x) {sapply(x, FUN = bin_func)}))
names(df1) <- paste(names(df1),"_converted",sep="")
#final dataframe having original as well as converted columns
df <- cbind(df,df1)
df
Please don't forget to let us know if it helped :)
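If the goal is to keep a hand-written, single-value converter and fill one new column, one option is to apply it element by element with vapply(). The sketch below is an illustration under assumptions: dec2bin is a hypothetical, simplified stand-in for dec2binSingle that handles non-negative whole numbers only, and the three-row data_example is made up.

```r
# Hypothetical scalar converter: non-negative whole number -> binary string
dec2bin <- function(n) {
  n <- as.numeric(n)
  if (n == 0) return("0")
  bits <- character(0)
  while (n > 0) {
    bits <- c(n %% 2, bits)  # prepend the lowest remaining bit
    n <- n %/% 2             # integer division drops that bit
  }
  paste(bits, collapse = "")
}

# Made-up data with a column called "numbers", as in the question
data_example <- data.frame(numbers = c(23, 5, 0))

# Call the scalar function once per row; character(1) enforces one string per value
data_example$tv <- vapply(data_example$numbers, dec2bin, character(1))

data_example$tv
# [1] "10111" "101"   "0"
```

The point of vapply() here is that the function still only ever sees one number at a time, while the result has one element per row of the data frame.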

Perform test, get result from another column for the same row

I'm creating a function, but I need some help with best practices.
Active.Test <- function(date) {
date <- rep(date,length(df$Start.Date))
active <- rep(0,length(df$Start.Date))
active[date > df$Start.Date & date < df$End.Date] <- 1
active[df$Start.Date == df$End.Date ] <- df$Active.Time
return (active)
}
I basically want to check if a date (which is passed to the function) is between the start and end date in my data frame. If it is, assign a 1. If the start and end dates are equal, get the result from the same row in Active.Time column. Everything else has a default value of 0.
This returns an error as it's retrieving a vector which is of a different size for the second test.
I can re-write the above as:
Active.Test <- function(date) {
date <- rep(date,length(df$Start.Date))
active <- rep(0,length(df$Start.Date))
active[date > df$Start.Date & date < df$End.Date] <- 1
active[df$Start.Date == df$End.Date] <- df$Active.Time[df$Start.Date == df$End.Date]
return (active)
}
This gets the correct elements from the Active.Time column, but it doesn't seem an elegant way to write it. I'm also guessing it's slower, as I'm performing the same check twice.
Could you please help me re-write this using best practices?
EDIT: Here's some code to create a few rows of data and then test the function by checking whether the start and end dates encompass 25/05/2016.
#Create a data frame
df <- data.frame(End.Date = as.Date(c("1/05/2016","28/05/2016", "25/05/2016"), format = "%d/%m/%Y"), Start.Date = as.Date(c("20/04/2016 11:00","20/05/2016 23:00", "25/05/2016 10:00"), format = "%d/%m/%Y" ), Active.Time = as.numeric(c(0.5,0.4,0.8)))
#Test the function
df$new <- Active.Test(as.Date("25/05/2016", format = "%d/%m/%Y"))
Thanks
# Using the data.table approach
library(data.table)
# Make data table instead of data.frame (you can also do as.data.table(df) to get a data.table)
my_dt <- data.table(Start.Date=as.Date(c("20/04/2016 11:00","20/05/2016 23:00", "25/05/2016 10:00"), format = "%d/%m/%Y" ),
End.Date=as.Date(c("1/05/2016","28/05/2016", "25/05/2016"), format = "%d/%m/%Y"),
Active.Time = as.numeric(c(0.5,0.4,0.8))
)
setkey(my_dt)
# Sample date to test
datte <- as.Date("25/05/2016", format = "%d/%m/%Y")
# Create function with conditions and result to return
Active.Test <- function(datte, Start.Date, End.Date, Active.Time) {
if(datte > Start.Date && datte < End.Date){
return(1)
}
else if(Start.Date==End.Date){
return(Active.Time)
}
else{return(0)}
}
# Test function
my_dt[, res:=Active.Test(datte, Start.Date, End.Date, Active.Time), by=1:nrow(my_dt)]
See data.table vignette for more on data.table. Also, in your function above, note the warning you get when you run df$new <- Active.Test(as.Date("25/05/2016", format = "%d/%m/%Y"))!
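A vectorised base R alternative is also possible. The sketch below is one way to do it, not the only one: it computes the "same day" check once and lets ifelse() choose between Active.Time, 1 and 0 for every row. The name active_test and its argument names are made up for illustration, and the dates are written in ISO format for brevity.

```r
# Same three rows as in the question, with dates given in ISO format
df <- data.frame(
  Start.Date  = as.Date(c("2016-04-20", "2016-05-20", "2016-05-25")),
  End.Date    = as.Date(c("2016-05-01", "2016-05-28", "2016-05-25")),
  Active.Time = c(0.5, 0.4, 0.8)
)

# Hypothetical vectorised version: all arguments except `date` are whole columns
active_test <- function(date, start, end, active_time) {
  same_day <- start == end                      # computed once, reused below
  ifelse(same_day, active_time,                 # equal dates: take Active.Time
         ifelse(date > start & date < end, 1,   # strictly inside the interval
                0))                             # everything else
}

df$new <- active_test(as.Date("2016-05-25"), df$Start.Date, df$End.Date, df$Active.Time)
df$new
# [1] 0.0 1.0 0.8
```

Passing the columns as arguments, instead of reading df from the enclosing environment, also makes the function reusable on other data frames.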

Adding a new method to data.table

I work a lot with time series. Most of my manipulations are done via data.table, but often I have to check data called by specific time, and for that I use xts method:
> timedata['2014-01-02/2014-01-03']
My data.table data is basically an exact copy of the xts object, with a first column, index, containing the time:
> dt_timedata <- data.table(index=index(timedata), coredata(timedata))
At some point the data became far too large, so keeping both or converting between them all the time is not a good option (it never really was), so I am thinking about writing the same method for data.table. However, I couldn't find any reasonably easy examples of modifying a generic method.
Is what I want even possible, and if so, where could I read about it?
PS Even though I can obviously use something like
> from <- '2014-01-02'
> to <- '2014-01-03'
> period <- as.POSIXct(c(from, to))
> dt_timedata[index %between% period]
it is far less intuitive and beautiful, so I would rather write a new method.
Edit1 (example by request)
require(xts)
require(data.table)
days <- as.POSIXct(c('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04'), origin='1970-01-01')
timedata <- xts(1:4, days)
dt_timedata <- data.table(index=index(timedata), coredata(timedata))
What I can do in xts:
> timedata['2014-01-01/2014-01-02']
[,1]
2014-01-01 1
2014-01-02 2
I want the exact same for [.data.table.
Edit2 (to illustrate what I do)
'[.data.table' = function(x, i, ...) {
if (!missing('i')) {
if (all(class(i) == "character")) {
# do some weird stuff
return(x[ *some indices subset I just created* ])
}
}
data.table:::'[.data.table'(x, i, ...)
}
So basically if i is character and suits my format (checks happen in weird stuff section) I return something and function never goes to the last command. Otherwise nothing happens and I just call
data.table:::'[.data.table'(x, i, ...)
And the thing is, this breaks expressions like
ind <- as.POSIXct('2014-01-01', origin='1970-01-01')
dt_timedata[index==ind]
Here's a trivial example for you to try:
require(data.table)
days <- as.POSIXct(c('2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04'), origin='1970-01-01')
dt_timedata <- data.table(index=days, value=1:4)
ind <- as.POSIXct('2014-01-01', origin='1970-01-01')
# now it works
dt_timedata[index==ind]
# new trivial [.data.table
'[.data.table' = function(x, I, ...) {
data.table:::`[.data.table`(x, I, ...)
}
# and now it doesn't work
dt_timedata[index==ind]
Modifying the method to add your own custom behaviour is very simple:
`[.data.table` = function(...) {
print("I'm doing smth custom")
data.table:::`[.data.table`(...)
}
dt = data.table(a = 1:5)
dt[, sum(a)]
#[1] "I'm doing smth custom"
#[1] 15
So just process the first argument however you like and feed it to the standard function.
Here's an example to handle your edit:
`[.data.table` = function(...) {
if (try(class(..2), silent = TRUE) == 'character')
print("boo")
else
data.table:::`[.data.table`(...)
}
dt = data.table(a = 1:10)
dt[a == 4]
# a
#1: 4
dt['sdf']
#[1] "boo"
#[1] "boo"
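The "weird stuff" step from the question, turning a string such as '2014-01-01/2014-01-02' into a pair of time bounds, could be sketched as a small helper. This is an assumption-laden illustration: parse_range is a made-up name, it uses base R only, it fixes the time zone to UTC, and it treats each bound as midnight of that day, which may differ from how xts interprets the end of a range. A plain data.frame stands in for the data.table.

```r
# Hypothetical helper: "from/to" string -> length-2 POSIXct vector
parse_range <- function(s, tz = "UTC") {
  parts <- strsplit(s, "/", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 2)
  as.POSIXct(parts, tz = tz)
}

# Stand-in data (a data.frame here; the question uses a data.table)
days <- as.POSIXct(c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"), tz = "UTC")
timedata <- data.frame(index = days, value = 1:4)

rng <- parse_range("2014-01-01/2014-01-02")
res <- timedata[timedata$index >= rng[1] & timedata$index <= rng[2], ]
res$value
# [1] 1 2
```

With a data.table, the same bounds could presumably be passed to the %between% expression already shown in the question, either directly or from inside a custom `[.data.table` wrapper.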

Is there a way to convert mm:ss.00 to seconds.00?

I've got some performance time data in mm:ss.00 format (e.g. 02:15.45 or 00:34.58). R is recognizing the variable as a factor, but I'd like to convert each performance time to just seconds (e.g. 02:15.45 to 135.45). I've searched for an answer but can't seem to find a way to make it work.
Thanks in advance.
Using lubridate package (part of tidyverse):
library(lubridate)
period_to_seconds(hms("12:12:54"))
Here's one I've used for a number of years. It's vectorized, too.
toSeconds <- function(x){
if (!is.character(x)) stop("x must be a character string of the form H:M:S")
if (length(x)<=0)return(x)
unlist(
lapply(x,
function(i){
i <- as.numeric(strsplit(i,':',fixed=TRUE)[[1]])
if (length(i) == 3)
i[1]*3600 + i[2]*60 + i[3]
else if (length(i) == 2)
i[1]*60 + i[2]
else if (length(i) == 1)
i[1]
}
)
)
}
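And the reverse (preserves fractional seconds to the number of digits requested):
secondsToString <- function(x,digits=2){
  unlist(
    lapply(x,
      function(i){
        i <- round(i, digits)
        # fractional seconds, as a whole number of 10^-digits units
        # (floor, not round, so that e.g. 34.58 gives 58 and not -42)
        fs <- as.integer(round((i - floor(i))*(10^digits)))
        fmt <- ''
        if (i >= 3600)
          fmt <- '%H:%M:%S'
        else if (i >= 60)
          fmt <- '%M:%S'
        else
          fmt <- '%OS'
        i <- format(as.POSIXct(strptime("0:0:0","%H:%M:%S")) + i, format=fmt)
        if (fs > 0)
          # zero-pad so that 5 hundredths prints as ".05", then drop trailing zeros
          sub('[0]+$','',paste(i,formatC(fs,width=digits,flag="0"),sep='.'))
        else
          i
      }
    )
  )
}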
Look into strptime. Specifically
t = "02:15.45"
(as.numeric(as.POSIXct(strptime(t, format = "%M:%OS"))) -
as.numeric(as.POSIXct(strptime("0", format = "%S"))))
This will work, but is possibly a little awkward (doing it this way mostly because of POSIXct's annoying automatic unit conversion...)
library(lubridate)
df$variable<- ms(as.character(df$variable))
df$variable<- as.numeric(df$variable)
Making it a one-liner works as well. Works like a charm for me.
I hope this helps.
I am not that comfortable with R, so I don't know whether a built-in function is available, but I have worked out this code.
mmss_to_ss <- function (string)
{
mmss <- strsplit (string, ":", fixed = TRUE)
mm <- as.numeric (mmss[[1]][1])
ss <- as.numeric (mmss[[1]][2])
return (mm * 60 + ss)
}
This accepts a single time string in mm:ss format and returns the value in seconds. The code can easily be modified to convert from hh:mm:ss to seconds as well.
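Since the data in the question is a factor column with many values, a vectorised variant may be more convenient. The following is a minimal sketch in base R; the name mmss_to_seconds is made up, and it assumes every entry looks like ss, mm:ss or hh:mm:ss with an optional decimal part on the seconds.

```r
# Hypothetical vectorised converter for "mm:ss.00"-style strings or factors
mmss_to_seconds <- function(x) {
  x <- as.character(x)                      # factor levels -> their labels
  parts <- strsplit(x, ":", fixed = TRUE)   # one character vector per entry
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # weights are powers of 60 counted from the right: ..., 3600, 60, 1
    sum(p * 60^((length(p) - 1):0))
  }, numeric(1))
}

times <- factor(c("02:15.45", "00:34.58"))
mmss_to_seconds(times)
# [1] 135.45  34.58
```

Converting with as.character() first matters for factors, because as.numeric() on a factor returns the internal level codes rather than the displayed values.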
You can do this easily with the lubridate package. If you use the format "h:m:s" you can convert the variable to a lubridate object with
hms("12:12:54")
And then convert it to seconds with
seconds(hms("12:12:54"))
Here is a link to the lubridate article in JSS
http://www.jstatsoft.org/v40/i03/paper

Resources