Using R to Compare Dates - r

I've got two csv files.
One file lists when and why an employee leaves.
EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
119549,Sales,Retirement,09/30/2013
2629053,Sales,Termination,09/30/2013
120395,Sales,Retirement,11/01/2013
122450,Sales,Transfer,11/30/2013
123962,Sales,Transfer,11/30/2013
1041054,Sales,Resignation,12/01/2013
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014
Another file is a lookup table that shows the start and end dates of every fiscal quarter:
FYFQ,Start,End
FY2014FQ1,10/1/2013,12/31/2013
FY2014FQ2,1/1/2014,3/31/2014
FY2014FQ3,4/1/2014,6/30/2014
FY2014FQ4,7/1/2014,9/30/2014
FY2015FQ1,10/1/2014,12/31/2014
FY2015FQ2,1/1/2015,3/31/2015
I'd like R to find which FYFQ each Separation_Date falls in and write it into the FYFQ column (the fifth column) of the data.
Input:
Separations.csv:
>EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
>990962,Sales,Retirement,12/14/2013
>135396,Sales,Retirement,01/11/2014
FiscalQuarterDates.csv:
>FYFQ,Start,End
>FY2013FQ4,7/1/2013,9/30/2013
>FY2014FQ1,10/1/2013,12/31/2013
>FY2014FQ2,1/1/2014,3/31/2014
Desired Output:
Output.csv:
>EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
>990962,Sales,Retirement,12/14/2013,FY2014FQ1
>135396,Sales,Retirement,01/11/2014,FY2014FQ2
I'm assuming there's some function that would iterate through the FiscalQuarterDates.csv and evaluate if each separation date was in a FYFQ, but I'm not sure.
Any thoughts on the best way to do this?
This is what worked.
#read in the csv; keep Separation_Date as text so the format string below is applied when parsing
separations <- read.csv(file="Separations_DummyData.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
#Use the zoo package (I installed it) to convert Separation_Date to a year-quarter and then shift it forward by one quarter (+ 1/4), because the fiscal year starts in October. Then construct the variable as FYyyyyFQq.
library(zoo)
separations$FYFQ <- format(as.yearqtr(separations$Separation_Date, "%m/%d/%Y") + 1/4, "FY%YFQ%q")
#Write out this to CSV in working directory.
write.csv(separations, file = "sepscomplete.csv", row.names = FALSE)

You really don't need a second dataframe. A simple nested ifelse on the month will solve this:
# Separation_Date is text in mm/dd/yyyy form, so the year is characters 7-10 and the month is characters 1-2
yr<-with(firstdf,as.numeric(substr(Separation_Date,7,10)))
mth<-with(firstdf,as.numeric(substr(Separation_Date,1,2)))
firstdf$FYFQ<-with(firstdf,
ifelse(mth<=3,paste0("FY",yr,"FQ2"),
ifelse(mth>3 & mth<=6,paste0("FY",yr,"FQ3"),
ifelse(mth>6 & mth<=9,paste0("FY",yr,"FQ4"),
paste0("FY",yr+1,"FQ1") # October-December belong to the next fiscal year
))))

Convert each date to "yearqtr" class (from the zoo package) and add 1/4 to shift it to the next calendar quarter. Then write it out using write.csv:
library(zoo)
DF$FYFQ <- format(as.yearqtr(DF$Separation_Date, "%m/%d/%Y") + 1/4, "FY%YFQ%q")
giving:
> write.csv(DF, file = stdout(), row.names = FALSE)
"EmployeeID","Department","Separation_Type","Separation_Date","FYFQ"
990962,"Sales","Retirement","12/14/2013","FY2014FQ1"
135396,"Sales","Retirement","01/11/2014","FY2014FQ2"
Note:
1) If FYFQ need not be exactly in the format shown then it could be simplified to just:
DF$FYFQ <- as.yearqtr(DF$Separation_Date, "%m/%d/%Y") + 1/4
2) The second input file listed in the question is not used.
3) We used this for the input data:
Lines <- "EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014"
DF <- read.csv(text = Lines)
4) Fixed so that it produces shifted calendar quarters.
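Neither answer above uses the lookup table from the question. If the fiscal quarters did not line up with shifted calendar quarters, the table itself could be searched. The following is only a minimal base R sketch of that idea, using the sample rows from the question; it assumes the quarters do not overlap, and the names `q_start`, `q_end`, `idx` and `in_range` are made up for illustration.

```r
# Sample data copied from the question (dates kept as text)
separations <- data.frame(
  EmployeeID = c(990962, 135396),
  Separation_Date = c("12/14/2013", "01/11/2014"),
  stringsAsFactors = FALSE
)
quarters <- data.frame(
  FYFQ = c("FY2013FQ4", "FY2014FQ1", "FY2014FQ2"),
  Start = c("7/1/2013", "10/1/2013", "1/1/2014"),
  End = c("9/30/2013", "12/31/2013", "3/31/2014"),
  stringsAsFactors = FALSE
)

sep_date <- as.Date(separations$Separation_Date, format = "%m/%d/%Y")
q_start <- as.Date(quarters$Start, format = "%m/%d/%Y")
q_end <- as.Date(quarters$End, format = "%m/%d/%Y")

# findInterval() needs the breakpoints in increasing order
ord <- order(q_start)
quarters <- quarters[ord, ]
q_start <- q_start[ord]
q_end <- q_end[ord]

# For each separation date, find the last quarter that starts on or before it
idx <- findInterval(as.numeric(sep_date), as.numeric(q_start))
idx[idx == 0] <- NA  # date is earlier than every quarter in the table

# Accept the match only if the date is also on or before that quarter's end
in_range <- !is.na(idx) & sep_date <= q_end[idx]
separations$FYFQ <- ifelse(in_range, quarters$FYFQ[idx], NA_character_)
separations
```

Dates that fall outside every quarter in the table come back as NA rather than being assigned to the nearest quarter.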

The text of this answer was just a copy of another answer so it has been moved to the question.

Related

How to create new row at every 49th column from the dataframe in .log file (in R)?

I am trying to import a .log file using read.table in R.
Below is a sample (excerpt of test.log):
20410088416;5268;1;5251;1;253;3;2;-8.101;25.00;3.250;1;32.00;55.00;59;0;0.100;0.000;0.000;2.216;-9.315;25.00;3.250;1;30.00;30.00;50;0;0;0;-192.633;-35.912;-8.026;-194.842;-35.729;-9.264;0;;42;1;0,0...
The full test.log is available here (https://www.dropbox.com/s/pki7wkwdtxy2gcc/test.log?dl=0)
log<- read.table("test.log", sep=";")
The resulting log shows 1 obs. of 10560 variables. The desired output would be 220 obs. of 48 columns.
How can I read the file so that every 49th column starts a new row in R?
I tried the reshape function and other methods, but they did not work. I hope there is a more efficient way to solve this problem.
Thanks in advance for your help.
Edited:
Note that there are commas (,) within the dataset; I treat those values as strings within a single column. I'd like the output to look like this (see image below):
target output
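Not the most optimal, but try this (it assumes the number of columns is an exact multiple of 48):
n_cols <- 48 # fields per record
starts <- seq(from = 1, to = ncol(log), by = n_cols)
rows <- vector("list", length(starts))
for(i in seq_along(starts)){
chunk <- log[1, starts[i]:(starts[i] + n_cols - 1)]
names(chunk) <- paste0("V", seq_len(n_cols)) # same names so rbind can stack the pieces
rows[[i]] <- chunk
}
y <- do.call(rbind, rows)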
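A loop-free variant of the same idea is to flatten the single row and refill it as a matrix, row by row. This is only a sketch: `log_wide` is a simulated stand-in for the one-row data frame returned by read.table, with 3 records instead of 220. Note that unlist() coerces everything to one common type, so with mixed numeric and text fields every value becomes character.

```r
n_cols <- 48  # fields per record, as stated in the question

# Simulated stand-in for the one-row result of read.table(): 3 records x 48 fields
log_wide <- as.data.frame(matrix(seq_len(3 * n_cols), nrow = 1))

# The reshape only makes sense if the fields divide evenly into records
stopifnot(ncol(log_wide) %% n_cols == 0)

# Flatten the row, then fill a matrix row by row with 48 values per row
records <- matrix(unlist(log_wide[1, ], use.names = FALSE),
                  ncol = n_cols, byrow = TRUE)
records <- as.data.frame(records)
dim(records)
```

If the real file ends each record with a trailing separator, the column count will not be a multiple of 48 and the check above will stop early, which is usually preferable to a silently misaligned table.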

Using an R function output within another function

I am adapting an open-source code from GitHub (https://github.com/PeterDSteinberg/RSWMM) to calibrate EPA-SWMM. The code is written in two parts -- RSWMM.r contains the functions needed to look up/ replace SWMM values, read in the calibration files etc. runRSWMM.r is a wrapper containing the optimization code that calls in the relevant RSWMM functions. I need to edit the RSWMM code to read in a .txt file and have written the following code to do so.
getLIDtimeseriesFromTxt <- function(TxtFile, dateformat = "%m/%d/%y %H:%M"){
#RW (05/12/17): This function added to read in the appropriate CSV file
#containing the LID outputs
library(chron)
LID_data <- read.table(TxtFile, skip=9)
times <- LID_data[,1]
startTime <- as.chron("6/8/2006 1:00", format="%m/%d/%Y %H:%M", tz="GMT") #M/D/YY H:MM
simData <<- {}
simData$times <<- (times/24)+startTime
simData$obs <<- LID_data[,9] #the ninth column is ponding depth
return(simData)
}
getCalDataFromCSV<-function(CSVFile,dateFormat="%m/%d/%y %H:%M"){
temp=read.csv(file=CSVFile, header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE, comment.char="",stringsAsFactors = FALSE)
calData<<-{}
#RW (05/12/17): This line of code edited to match the date format pulled in from the simulation file.
calData$times<<-as.chron(temp[,1], format=dateFormat,tz="GMT")
calData$obs<<-temp[,2]
return(calData)
}
interpCalDataToSWMMTimes<-function(){
mergedData <- merge(simData, calData, by="times")
return(mergedData)
#first column of mergedData are times as chrons,
#second column is simulated data
#third column is observed data
}
When I run the full code from runRSWMM, I get an error that "simData" is not found, which means that I cannot use merge. However, this doesn't appear to be a problem with the variable "calData", which I can see in my global environment in RStudio. What is the difference in the way the two variables are being output? How can I fix this error?
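Both functions create their result with `<<-`, so each global only exists after its function has actually been called. One plausible cause (an assumption, since the calling code in runRSWMM.r is not shown) is that getLIDtimeseriesFromTxt is never called before interpCalDataToSWMMTimes runs. A way to avoid depending on call order is to return the data and pass it in explicitly. The sketch below uses made-up function names and toy values, and base R POSIXct times instead of chron so that it runs on its own.

```r
# Hypothetical replacements that return their result instead of writing globals
get_sim_data <- function(hours_since_start, depth, start_time) {
  data.frame(times = start_time + hours_since_start * 3600, sim = depth)
}

get_cal_data <- function(timestamps, obs, date_format = "%m/%d/%y %H:%M") {
  data.frame(times = as.POSIXct(timestamps, format = date_format, tz = "GMT"),
             obs = obs)
}

# The merge step receives both inputs as arguments, so nothing has to exist globally
merge_sim_and_cal <- function(sim_data, cal_data) {
  merge(sim_data, cal_data, by = "times")
}

# Toy values for illustration only
start_time <- as.POSIXct("2006-06-08 01:00", tz = "GMT")
sim_data <- get_sim_data(c(0, 1, 2), c(0.10, 0.25, 0.40), start_time)
cal_data <- get_cal_data(c("06/08/06 02:00", "06/08/06 03:00"), c(0.2, 0.5))
merged <- merge_sim_and_cal(sim_data, cal_data)
merged
```

With this shape, a missing input shows up as a missing argument at the call site instead of an "object not found" error deep inside merge.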

Replace a string with the date value from the row above

I have a dataset with about 200 \N entries, and I'd like to replace each \N with the date/day value from the row above. For example, for row 641, I want to change the date to 10-Nov-14 and the day to Mon.
If this can be done in R, does the format of the date matter? Currently, these dates are stored as factors.
Easy in Excel. Replace all \N with nothing, select the two relevant columns, then choose HOME > Editing > Find & Select > Go To Special... > Blanks. Type =, press ↑, then press Ctrl+Enter.
Then copy the range again and use HOME > Clipboard > Paste > Paste Special... > Values > OK to paste over the top.
If that's an Excel file that you are importing into R, then you need to understand how R handles backslash characters (which is what the screenshot shows): the backslash is used to "escape" other characters. See ?Quotes. Once the data is in R, the columns will probably all be factors.
If the dataframe is named 'dat' then this should work to really make true missing values:
is.na( dat) <- dat == "\\N" # need to escape the escape character.
Then use na.locf from package zoo:
library(zoo) # lots of useful methods in zoo.
dat$date <- na.locf(dat$date)
dat$day_of_week <- na.locf(dat$day_of_week)
These methods should work with any class of column, and these would not be R Date-classed variables until you made the conversion.
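If installing zoo is not an option, the same fill-down can be sketched in base R. The helper name `fill_down` and the four sample rows are made up for illustration, and the columns are assumed to be character rather than factor.

```r
# Toy data; "\\N" in R source is the two-character string \N
dat <- data.frame(date = c("10-Nov-14", "\\N", "\\N", "11-Nov-14"),
                  day = c("Mon", "\\N", "\\N", "Tue"),
                  stringsAsFactors = FALSE)

# Turn the \N markers into real missing values
dat[dat == "\\N"] <- NA

# Carry the last non-missing value forward (leading NAs stay NA)
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the most recent non-missing value
  c(NA, x[!is.na(x)])[idx + 1]    # idx 0 maps to the NA placed in front
}

dat$date <- fill_down(dat$date)
dat$day <- fill_down(dat$day)
dat
```

Because the index comes from a running count of non-missing values, runs of several consecutive \N rows are filled in a single pass.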
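It can be solved easily in R with the following code:
ListNa <- grep("\\N", data$date, fixed = TRUE)
ListPrevRow <- ListNa-1
data[ListNa,c("date", "day")] <- data[ListPrevRow,c("date", "day")]
Where:
"data" is the data table
"date" and "day" are the columns to be replaced.
Note that this copies from the row directly above, so if two \N rows are consecutive the second one is still \N afterwards; rerun the lines until none remain.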
This can be done rapidly in Excel with a quick use of Find and FindNext. Assuming you want to replace all of the \N on the ActiveSheet, this code will terminate once it has replaced them all. Offset(-1) gets the value one row up.
Sub ReplaceWithValueAbove()
Dim rng_search As Range
Set rng_search = ActiveSheet.UsedRange.Find("\N")
While Not rng_search Is Nothing
'set to row above
rng_search = rng_search.Offset(-1)
'find the next one
Set rng_search = ActiveSheet.UsedRange.FindNext()
Wend
End Sub

Create a time vector from an excel import

I am working with data from csv files that will all look the same so I am hoping to come up with a code that can be easily applied to all of them.
However, sadly enough I am failing at step one :-(.
The csv files have the date and time saved in one column, so when I import them with read.csv that column gets read as a chr. How can I most easily convert this into a date that I then can use for plotting and analysis?
Here is what I tried:
load the data --> will save the date and time as chr under mydata$Date.Time (e.g. 1/1/15 0:00)
mydata<-read.csv(file.choose(), stringsAsFactors = FALSE,
strip.white = TRUE,
na.strings = c("NA",""), skip=16,
header=TRUE)
separate the Date.Time into Date and Time:
new <- do.call( rbind , strsplit( as.character( mydata$Date.Time ) , " " ) )
add these two back to the df mydata:
mydata <- cbind( mydata , Date = new[,1] , Time = new[,2] )
convert Date into a date format via as.Date:
mydata$Date <- as.Date(new[,1], format="%m/%d/%y")
So this works fine for the date however I am stuck with the time, I tried this:
mydata$Time <- format(as.POSIXct(new[,2], format="%H:%M"))
this gives me the following error:
Error in as.POSIXlt.character(x, tz, ...) :
character string is not in a standard unambiguous format
I wonder if there is a smarter way of doing this? Reading in time and date seems to be one of the substantial tasks that I would like to understand. Is there a way of R directly recognizing the date and time from the csv? Or is it generally smarter to generate a time vector by its own, if so how would I do that?
Thanks so much for your help.
Sandra
If you want to use time only, consider using the chron package:
library(chron)
mytime <- times("21:19:37")
or in your case
times(new[,2])
assuming that that's a character vector.
I tried the chron approach but it wouldn't work for me :-(.
So what I ended up doing is just creating a time vector for the period that I am loading the data in for:
date <-seq(as.POSIXct("2015/1/1 00:00"), as.POSIXct("2015/1/31 23:00"), "hours")
and then adding it back to the df.
Not what I wanted but it will work until I find the ultimate solution :-)
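For what it's worth, the combined column can usually be parsed in one step without splitting it first. The sketch below assumes the strings are month/day/two-digit-year followed by hour:minute, as in the "1/1/15 0:00" example, and that UTC is an acceptable time zone; both are assumptions about the real files, and the three sample values are invented.

```r
# Stand-in for the imported data; the real column comes from read.csv()
mydata <- data.frame(Date.Time = c("1/1/15 0:00", "1/1/15 1:00", "1/31/15 23:00"),
                     stringsAsFactors = FALSE)

# Parse date and time together; a fixed tz avoids daylight-saving surprises
mydata$DateTime <- as.POSIXct(mydata$Date.Time,
                              format = "%m/%d/%y %H:%M", tz = "UTC")

# Derive separate pieces only if they are needed for grouping or labels
mydata$Date <- as.Date(mydata$DateTime)
mydata$Time <- format(mydata$DateTime, "%H:%M")
str(mydata)
```

The DateTime column can be used directly on the x axis of a plot, and rows that fail to parse show up as NA, which makes format mismatches easy to spot.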

How to parse complex date/time string into zoo object?

I'm trying to convert the following date/time string into a zoo object:
2004:071:15:23:41.87250
2004:103:15:24:15.35931
year:doy:hour:minute:second
The date/time string is stored in a dataframe without headers. What's the best way to go about this in R?
Cheers!
Edit based on answer by Gavin:
# read in time series from CSV file; each entry as described above
timeSeriesDates <- read.csv("timeseriesdates.csv", header = FALSE, sep = ",")
# convert to format that can be used as a zoo object
timeSeriesDatesZ <- as.POSIXct(timeSeriesDates$V1, format = "%Y:%j:%H:%M:%S")
Read the data in to R in the usual way. You will have something like the following:
dats <- data.frame(times = c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931"))
dats
These can be converted to one of the POSIXt classes via:
dats <- transform(dats, times = as.POSIXct(times, format = "%Y:%j:%H:%M:%S"))
or
dats$times <- as.POSIXct(dats$times, format = "%Y:%j:%H:%M:%S")
which can then be used in a zoo object. See ?strftime for details on the placeholders used in the format argument; essentially %j is the day of the year placeholder.
To do the zoo bit, we would do, using some dummy data for the actual time series
ts <- rnorm(2) ## dummy data
require(zoo) ## load zoo
tsZoo <- zoo(ts, dats$times)
the last line gives:
> tsZoo
2004:071:15:23:41.87250 2004:103:15:24:15.35931
0.3503648 -0.2336064
Two things to note with fractional seconds: i) the exact fraction you have may not be representable in floating point arithmetic, and ii) R may not show the full fractional seconds, depending on the value of the digits.secs option. See ?options for more on this particular option and how to change it.
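As a small sketch of the fractional-seconds point: the %S placeholder reads whole seconds only, while %OS also reads the fraction on input. The GMT time zone here is an assumption, since the question does not say which zone the timestamps are in.

```r
x <- c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931")

# %j is day of year; %OS reads seconds including the fractional part
tt <- as.POSIXct(x, format = "%Y:%j:%H:%M:%OS", tz = "GMT")

# Default printing hides the fraction unless digits.secs is set
print(tt)

# %OSn in format() shows n decimal places (truncated, not rounded)
format(tt, "%Y-%m-%d %H:%M:%OS3")

# The fraction is stored even when it is not displayed
as.numeric(tt) %% 60
```

Day 071 of 2004 is 11 March and day 103 is 12 April, because 2004 is a leap year.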
Here's a commented example for the second string:
R> s <- "2004:103:15:24:15.35931"
R> # split on the ":" and convert the result to a numeric vector
R> n <- as.numeric(strsplit(s, ":")[[1]])
R> # Use the year, hour, minute, second to create a POSIXct object
R> # for the first of the year; then add the day of year minus one (as seconds),
R> # since day 1 is January 1 itself. A fixed tz avoids a daylight-saving shift.
R> ISOdatetime(n[1], 1, 1, n[3], n[4], n[5], tz = "GMT")+(n[2]-1)*60*60*24
[1] "2004-04-12 15:24:15 GMT"

Resources