Using an R function output within another function - r

I am adapting open-source code from GitHub (https://github.com/PeterDSteinberg/RSWMM) to calibrate EPA-SWMM. The code is written in two parts: RSWMM.r contains the functions needed to look up/replace SWMM values, read in the calibration files, etc., and runRSWMM.r is a wrapper containing the optimization code that calls the relevant RSWMM functions. I need to edit the RSWMM code to read in a .txt file and have written the following code to do so.
getLIDtimeseriesFromTxt <- function(TxtFile, dateformat = "%m/%d/%y %H:%M"){
  # RW (05/12/17): This function was added to read in the .txt file
  # containing the LID outputs
  library(chron)
  LID_data <- read.table(TxtFile, skip = 9)
  times <- LID_data[, 1]
  startTime <- as.chron("6/8/2006 1:00", format = "%m/%d/%Y %H:%M", tz = "GMT") # M/D/YYYY H:MM
  simData <<- {}
  simData$times <<- (times / 24) + startTime
  simData$obs <<- LID_data[, 9] # the ninth column is ponding depth
  return(simData)
}
getCalDataFromCSV <- function(CSVFile, dateFormat = "%m/%d/%y %H:%M"){
  temp <- read.csv(file = CSVFile, header = TRUE, sep = ",", quote = "\"", dec = ".",
                   fill = TRUE, comment.char = "", stringsAsFactors = FALSE)
  calData <<- {}
  # RW (05/12/17): This line of code edited to match the date format pulled in from the simulation file.
  calData$times <<- as.chron(temp[, 1], format = dateFormat, tz = "GMT")
  calData$obs <<- temp[, 2]
  return(calData)
}
interpCalDataToSWMMTimes <- function(){
  mergedData <- merge(simData, calData, by = "times")
  # first column of mergedData holds times as chrons,
  # second column is simulated data,
  # third column is observed data
  return(mergedData)
}
When I run the full code from runRSWMM, I get the error "object 'simData' not found", which implies that I cannot use merge. However, this doesn't appear to be a problem with the variable calData, which I can see in my global environment in RStudio. What is the difference in the way the two variables are being output, and how can I fix this error?
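For reference, here is a minimal, self-contained sketch of the merge step I am attempting, with made-up values standing in for the real simulated and observed series:

```r
# Minimal sketch: merge() joins two data frames on the shared "times" column.
simData <- data.frame(times = c(1, 2, 3, 4), sim = c(0.10, 0.20, 0.30, 0.40))
calData <- data.frame(times = c(2, 4, 6), obs = c(0.25, 0.45, 0.65))
mergedData <- merge(simData, calData, by = "times")  # keeps rows for times 2 and 4 only
```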


Data transpose function in R not working properly

I am using R for some work but I'm having difficulty transposing data.
My data is in rows and the columns are different variables. When using the function phyDat, the author indicates that the data must be transposed, because imported data is stored in columns.
So I use the following code to finish this process:
# Read the file from local disk in CSV format (this format can be generated with Excel's Save As).
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
origin <- t(origin)
events <- phyDat(origin, type="USER", levels=c(0,1))
When I check the data shown in RStudio, it is transposed, but the result is not. So I went back and modified the code as follows:
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
events <- phyDat(origin, type="USER", levels=c(0,1))
This time the data does not appear transposed, and the result is consistent with that.
My current workaround is to transpose the data in the CSV file before importing it into R. Is there something I can do to fix this problem?
I had the same problem and I solved it by doing an extra step as follows:
# Read the file from local disk in CSV format (this format can be generated with Excel's Save As).
origin <- read.csv(file.choose(),header = TRUE, row.names = 1)
origin <- as.data.frame(t(origin))
events <- phyDat(origin, type="USER", levels=c(0,1))
Maybe it is too late, but I hope it can help other users with the same problem.
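To see why the extra step matters: t() always returns a matrix, even when its input is a data frame, and phyDat treats a matrix differently from a data frame. A small self-contained check with made-up 0/1 data:

```r
# t() returns a matrix even when given a data frame;
# as.data.frame() converts the transposed result back to a data frame.
origin <- data.frame(site1 = c(0, 1, 1), site2 = c(1, 0, 1),
                     row.names = c("char1", "char2", "char3"))
is.matrix(t(origin))                     # TRUE
is.data.frame(as.data.frame(t(origin)))  # TRUE
```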

Using R to Compare Dates

I've got two csv files.
One file lists when and why an employee leaves.
EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
119549,Sales,Retirement,09/30/2013
2629053,Sales,Termination,09/30/2013
120395,Sales,Retirement,11/01/2013
122450,Sales,Transfer,11/30/2013
123962,Sales,Transfer,11/30/2013
1041054,Sales,Resignation,12/01/2013
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014
Another file is a lookup table that shows the start and end dates of every fiscal quarter:
FYFQ,Start,End
FY2014FQ1,10/1/2013,12/31/2013
FY2014FQ2,1/1/2014,3/31/2014
FY2014FQ3,4/1/2014,6/30/2014
FY2014FQ4,7/1/2014,9/30/2014
FY2015FQ1,10/1/2014,12/31/2014
FY2015FQ2,1/1/2015,3/31/2015
I'd like R to find which FYFQ each Separation_Date occurred in and fill it into the empty FYFQ column of the data.
Input:
Separations.csv:
EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014
FiscalQuarterDates.csv:
FYFQ,Start,End
FY2013FQ4,7/1/2013,9/30/2013
FY2014FQ1,10/1/2013,12/31/2013
FY2014FQ2,1/1/2014,3/31/2014
Desired Output:
Output.csv:
EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
990962,Sales,Retirement,12/14/2013,FY2014FQ1
135396,Sales,Retirement,01/11/2014,FY2014FQ2
I'm assuming there's some function that would iterate through FiscalQuarterDates.csv and evaluate whether each separation date falls within a FYFQ, but I'm not sure.
Any thoughts on the best way to do this?
This is what worked.
# Read in the CSV and declare the 4th column a Date
separations <- read.csv(file = "Separations_DummyData.csv", header = TRUE, sep = ",",
                        colClasses = c(NA, NA, NA, "Date"))
# Use the zoo package (I installed it) to convert Separation_Date to yearqtr class
# and shift it forward by one quarter, then construct the variable as FYyyyyFQq.
library(zoo)
separations$FYFQ <- format(as.yearqtr(separations$Separation_Date, "%m/%d/%Y") + 1/4, "FY%YFQ%q")
#Write out this to CSV in working directory.
write.csv(separations, file = "sepscomplete.csv", row.names = FALSE)
You really don't need a second data frame; a simple function will solve this:
yr <- with(firstdf, as.numeric(substr(Separation_Date, 7, 10)))
mth <- with(firstdf, as.numeric(substr(Separation_Date, 1, 2)))
firstdf$FYFQ <- with(firstdf,
  ifelse(mth <= 3, paste0("FY", yr, "FQ2"),
  ifelse(mth > 3 & mth <= 6, paste0("FY", yr, "FQ3"),
  ifelse(mth > 6 & mth <= 9, paste0("FY", yr, "FQ4"),
  paste0("FY", yr + 1, "FQ1")
))))
Convert each date to "yearqtr" class (from the zoo package) and add 1/4 to shift it to the next calendar quarter. Then write it out using write.csv:
library(zoo)
DF$FYFQ <- format(as.yearqtr(DF$Separation_Date, "%m/%d/%Y") + 1/4, "FY%YFQ%q")
giving:
> write.csv(DF, file = stdout(), row.names = FALSE)
"EmployeeID","Department","Separation_Type","Separation_Date","FYFQ"
990962,"Sales","Retirement","12/14/2013","FY2014FQ1"
135396,"Sales","Retirement","01/11/2014","FY2014FQ2"
Note:
1) If FYFQ need not be exactly in the format shown then it could be simplified to just:
DF$FYFQ <- as.yearqtr(DF$Separation_Date, "%m/%d/%Y") + 1/4
2) The second input file listed in the question is not used.
3) We used this for the input data:
Lines <- "EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014"
DF <- read.csv(text = Lines)
4) Fixed so that it produces shifted calendar quarters.
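As an aside, if you did want to drive the classification from the FiscalQuarterDates.csv lookup table, one possible sketch uses findInterval() on the quarter start dates; the two data frames below are in-memory stand-ins for the two files:

```r
# Sketch: classify each Separation_Date by the quarter whose Start it falls on or after.
# These data frames stand in for Separations.csv and FiscalQuarterDates.csv.
sep <- data.frame(Separation_Date = as.Date(c("2013-12-14", "2014-01-11")))
fq  <- data.frame(FYFQ  = c("FY2014FQ1", "FY2014FQ2"),
                  Start = as.Date(c("2013-10-01", "2014-01-01")),
                  stringsAsFactors = FALSE)
# findInterval() returns, for each date, the index of the last Start <= that date.
sep$FYFQ <- fq$FYFQ[findInterval(sep$Separation_Date, fq$Start)]
```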

R Programming: Difficulty removing NAs from frame when using lapply

Full disclosure: I am taking a Data Science course on Coursera. For this particular question, we need to calculate the mean of some pollutant data that is being read in from multiple files.
The main function I need help with also references a couple other functions that I wrote in the script. For brevity, I'm just going to list them and their purpose:
boundIDs: I use this to bound the input so that out-of-range inputs won't be accepted (the range is 1:332, so if someone enters 1:400 this changes the range to 1:332)
pollutantToCode: converts the pollutant string entered to that pollutant's column number in the data file
fullFilePath: creates the file name and appends it to the full file path. So if someone requests the file for ID 1 in directory "curse/your/sudden/but/inevitable/betrayal/", the function returns "curse/your/sudden/but/inevitable/betrayal/001.csv" to be added to the file list vector.
After all that, the main function I'm working with is:
pollutantmean <- function(directory = "", pollutant, id = 1:332){
  id <- boundIDs(id)
  pollutant <- pollutantToCode(pollutant)
  numberOfIds <- length(id)
  fileList <- character(numberOfIds)
  for (i in 1:numberOfIds){
    if (id[i] > 332){
      next
    }
    fileList[i] <- fullFilePath(directory, id[i])
  }
  data <- lapply(fileList, read.csv)
  print(data[[1]][[pollutant]])
}
Right now, I'm intentionally printing only the first frame of data to see what my output looks like. To remove the NAs I've tried using:
data <- lapply(fileList, read.csv)
data <- data[!is.na(data)]
But the NAs remained, so then I tried computing the mean directly and using the na.rm parameter:
print(mean(data[[1]][[pollutant]], na.rm = TRUE))
But the mean was still "NA". Then I tried na.omit:
data <- lapply(fileList, na.omit(read.csv))
...and unfortunately the problem persisted.
Can someone please help? :-/
(PS: Right now I'm just focusing on the first frame of whatever is read in, i.e. data[[1]], since I figure if I can't get it for the first frame there's no point in iterating over the rest.)
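For reference, here is a self-contained stand-in for what I am attempting, with made-up frames in place of the files read by read.csv. (Note that lapply expects a function, so the cleaning step is wrapped in an anonymous function rather than an evaluated call.)

```r
# Made-up data frames stand in for the per-monitor CSV files.
frames <- list(data.frame(sulfate = c(1.2, NA, 3.4)),
               data.frame(sulfate = c(NA, 5.6)))
# lapply needs a function, not na.omit(read.csv): wrap the call.
cleaned <- lapply(frames, function(df) na.omit(df))
mean(cleaned[[1]]$sulfate)  # 2.3 once the NA row is dropped
```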

Creating a zoo object from a csv file (with a few inconsistencies) with R

I am trying to create a zoo object in R from the following csv file:
http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv
The problem seems to be that there are a few minor inconsistencies in the period from 2/27/2006 to 3/20/2006 (some extra commas and an "x") that lead to problems.
I am looking for a method that reads the complete csv file into R automatically. There is a new data point every business day, and with manual preprocessing you would have to re-edit the file by hand every day.
I am not sure if these are the only problems with this file but I am running out of ideas how to create a zoo object out of this time series. I think that with some more knowledge of R it should be possible.
Use colClasses to tell it that there are 4 fields and use fill so it knows to fill them if they are missing on any row. Ignore the warning:
library(zoo)
URL <- "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv"
z <- read.zoo(URL, sep = ",", header = TRUE, format = "%m/%d/%Y", skip = 1,
fill = TRUE, colClasses = rep(NA, 4))
It is a good idea to separate the cleaning and analysis steps. Since you mention that your dataset changes often, this cleaning must be automatic. Here is a solution for autocleaning.
#Read in the data without parsing it
lines <- readLines("Skewdailyprices.csv")
#The bad lines have more than two fields
n_fields <- count.fields(
"Skewdailyprices.csv",
sep = ",",
skip = 1
)
#View the dubious lines
lines[n_fields != 2]
#Fix them
library(stringr) #can use gsub from base R if you prefer
lines <- str_replace(lines, ",,x?$", "")
#Write back out to file
writeLines(lines[-1], "Skewdailyprices_cleaned.csv")
#Read in the clean version
sdp <- read.zoo(
"Skewdailyprices_cleaned.csv",
format = "%m/%d/%Y",
header = TRUE,
sep = ","
)
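The cleanup regex can be checked in isolation; the lines below are made-up stand-ins for the malformed rows, using base R's gsub:

```r
# Strip a trailing ",," plus an optional "x" from each malformed line.
lines <- c("2/27/2006,117.44,,x", "2/28/2006,117.80,,", "3/21/2006,118.02")
cleaned <- gsub(",,x?$", "", lines)
```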

data.frame object to xts object conversion in R

I'd like to convert my csv files into xts objects as efficiently as possible. I seem to be stuck, though, with having to first apply the read.zoo method to create a zoo object before being able to convert it to an xts object.
gold <- read.zoo("GOLD.CSV", sep=",", format="%m/%d/%Y", header=TRUE)
Gold <- as.xts(gold, order.by = index(gold), frequency = NULL)
Is this the most efficient way of converting my initial GOLD.CSV file into an R xts object?
If it is a file, you need to read it.
So use read.zoo() as you do -- but then convert right away:
gold <- as.xts(read.zoo("GOLD.CSV", sep=",", format="%m/%d/%Y", header=TRUE))
Ok?
You can write your own read.xts function. We would call it a wrapper function and it should go something along the lines of
read.xts <- function(x, format = "%m/%d/%Y", header = TRUE, sep = ",") {
  result <- as.xts(read.zoo(x, sep = sep, format = format, header = header))
  return(result)
}
read.xts(file.choose()) # select your file
Notice the arguments in function(). They are passed to the body of the function (the code between the curly braces). If function() arguments have values, those are their defaults. If you supply new values (e.g. function(x = "my.file.csv", sep = "\t")), they will overwrite the defaults. The last line shows how you can use your new function. Feel free to extend this function with the rest of the read.zoo arguments. Should you have any specific question on how to do it, don't be shy, just ask. :)
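For instance, one way to pick up all remaining read.zoo arguments at once is the ... mechanism; this is a sketch assuming zoo and xts are installed:

```r
library(zoo)
library(xts)

# Variant sketch: forward any extra arguments straight through to read.zoo().
read.xts <- function(x, format = "%m/%d/%Y", header = TRUE, sep = ",", ...) {
  as.xts(read.zoo(x, sep = sep, format = format, header = header, ...))
}

# Example with an in-memory connection instead of a file:
z <- read.xts(textConnection("Date,Close\n01/03/2011,100.5\n01/04/2011,101.2"))
```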
I use a few little gems like that in my daily work. I've created a file called workhorse.R and I source it (e.g. source("d:/workspace/workhorse.R")) whenever I need any of the little functions.
