data.frame object to xts object conversion in R

I'd like to convert my csv files into xts objects as efficiently as possible. I seem to be stuck, though, with having to first apply the read.zoo method to create a zoo object before being able to convert it to an xts object.
gold <- read.zoo("GOLD.CSV", sep=",", format="%m/%d/%Y", header=TRUE)
Gold <- as.xts(gold, order.by=index(gold), frequency=NULL)
Is this the most efficient way of converting my initial GOLD.CSV file into an R xts object?

If it is a file, you need to read it.
So use read.zoo() as you do -- but then convert right away:
gold <- as.xts(read.zoo("GOLD.CSV", sep=",", format="%m/%d/%Y", header=TRUE))
Ok?

You can write your own read.xts function. We would call it a wrapper function, and it would go something along these lines:
read.xts <- function(x, format = "%m/%d/%Y", header = TRUE, sep = ",") {
  result <- as.xts(read.zoo(x, sep = sep, format = format, header = header))
  return(result)
}
read.xts(file.choose()) # select your file
Notice the arguments in function(). They are passed to the body of the function (the code between curly braces). If the function() arguments have values, those are their defaults. If you assign new values (e.g. function(x = "my.file.csv", sep = "\t")), they will overwrite the defaults. The last line shows how you can use your new function. Feel free to extend this function with the rest of the read.zoo arguments. Should you have any specific question on how to do it, don't be shy and just ask. :)
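If you later want to pass other read.zoo arguments without naming each one in the wrapper, a "..." pass-through is one way to extend it (just a sketch; the file names below are placeholders):
read.xts <- function(x, format = "%m/%d/%Y", header = TRUE, sep = ",", ...) {
  # "..." is handed on to read.zoo (and from there to read.table)
  as.xts(read.zoo(x, sep = sep, format = format, header = header, ...))
}
read.xts("GOLD.CSV")                   # uses the defaults
read.xts("GOLD.CSV", na.strings = ".") # extra read.table argument passed through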
I use a few little gems like that in my daily work. I've created a file called workhorse.R and I load it (e.g. source("d:/workspace/workhorse.R")) whenever I need any of the little functions.

Related

Using an R function output within another function

I am adapting an open-source code from GitHub (https://github.com/PeterDSteinberg/RSWMM) to calibrate EPA-SWMM. The code is written in two parts: RSWMM.r contains the functions needed to look up/replace SWMM values, read in the calibration files, etc. runRSWMM.r is a wrapper containing the optimization code that calls the relevant RSWMM functions. I need to edit the RSWMM code to read in a .txt file and have written the following code to do so.
getLIDtimeseriesFromTxt <- function(TxtFile, dateformat = "%m/%d/%y %H:%M"){
  # RW (05/12/17): This function added to read in the appropriate CSV file
  # containing the LID outputs
  library(chron)
  LID_data <- read.table(TxtFile, skip = 9)
  times <- LID_data[, 1]
  startTime <- as.chron("6/8/2006 1:00", format = "%m/%d/%Y %H:%M", tz = "GMT") # M/D/YY H:MM
  simData <<- {}
  simData$times <<- (times / 24) + startTime
  simData$obs <<- LID_data[, 9] # the ninth column is ponding depth
  return(simData)
}
getCalDataFromCSV <- function(CSVFile, dateFormat = "%m/%d/%y %H:%M"){
  temp <- read.csv(file = CSVFile, header = TRUE, sep = ",", quote = "\"", dec = ".",
                   fill = TRUE, comment.char = "", stringsAsFactors = FALSE)
  calData <<- {}
  # RW (05/12/17): This line of code edited to match the date format pulled in from the simulation file.
  calData$times <<- as.chron(temp[, 1], format = dateFormat, tz = "GMT")
  calData$obs <<- temp[, 2]
  return(calData)
}
interpCalDataToSWMMTimes <- function(){
  # first column of mergedData holds the times as chrons,
  # second column is the simulated data,
  # third column is the observed data
  mergedData <- merge(simData, calData, by = "times")
  return(mergedData)
}
When I run the full code from runRSWMM, I get the error that "simData" is not found, which means I cannot use merge. However, this doesn't appear to be a problem for the variable "calData", which I can see in my global environment in RStudio. What is the difference in the way the two variables are being output? How can I fix this error?
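As a general illustration of the mechanism (not the RSWMM code itself): an object assigned with <<- only appears in the global environment once the function doing the superassignment has actually been called, so one thing worth checking is whether getLIDtimeseriesFromTxt is ever invoked before interpCalDataToSWMMTimes runs.
# Toy example only: "f" is a stand-in, not part of RSWMM
f <- function() {
  simData <<- list(times = 1:3)
  invisible(simData)
}
exists("simData")  # FALSE -- nothing has been created yet
f()
exists("simData")  # TRUE -- the call created the global object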

R: Loop through a list of csv files to alter date format and create XTS objects

I'm trying to import a folder of csv files and ultimately convert them to xts objects in R.
Each individual csv is of the format:
Date Open High Low Close Volume
18-Jun-99 2.35 2.35 2.35 2.35 34000
21-Jun-99 2.35 2.35 2.35 2.35 57317
22-Jun-99 2.35 2.35 2.35 2.35 7000
The issue here is the date; however, a function within lubridate converts this easily. For an individual csv my process is as follows:
require(xts)
CAR.csv <- read.csv("CAR.csv", header = TRUE)
require(lubridate)
CAR.csv$Date <- dmy(CAR.csv$Date)
CAR.csv <- read.zoo(CAR.csv)
CAR.csv <- as.xts(CAR.csv)
However I need to do this for many hundred files so I'd like to be able to loop through them all. I'm stuck at this point now:
setwd("C:/Users/Administrator/Desktop/data")
library(xts)
temp = list.files(pattern="*.csv")
for (i in 1:length(temp)) assign(temp[i], read.csv(temp[i], header = TRUE))
I don't really know how to apply the dmy function to only the date column within a loop and I would love any assistance that could point me in the right direction.
A previous version of my loop for csv files with the correct date format was this:
setwd("C:/Users/Administrator/Desktop/data")
library(xts)
temp = list.files(pattern="*.csv")
toDate <- function(x) as.Date(x, origin = "2005-01-01")
for (i in 1:length(temp)) assign(temp[i], as.xts(read.zoo((temp[i]), header = TRUE, sep = ",", FUN = toDate)))
In terms of a fully reproducible example, here is a sample folder of csv's if required; however, I suspect this is straightforward for most competent R users.
I would certainly love some advice.
Many thanks
setwd("C:/Users/Administrator/Desktop/data")
library(xts)
library(lubridate)
load_file <- function(file_name) {
  csv_file <- read.csv(file_name, header = TRUE)
  csv_file$Date.Time <- dmy(csv_file$Date.Time)
  csv_file <- read.zoo(csv_file)
  csv_file <- as.xts(csv_file)
  csv_file
}
list_of_files = list.files(pattern="*.csv")
data <- lapply(X = list_of_files, FUN = load_file)
The code works by defining a function that, when given the name of a file in the working directory, reads it and then performs the required transformations on that one file. Note that in your example data the date column is called Date.Time, so I have changed the code to reflect this.
Instead of using a loop, I have applied the function to each individual filename in the list of filenames, using base lapply(...). The output of this operation is a list containing the transformed data you're after. To access each data object, use data[[1]], etc.
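If you would rather look results up by file name than by position, the list can be named after the input files (a small optional addition; CAR.csv is just the example file mentioned in the question):
names(data) <- list_of_files
data[["CAR.csv"]]  # assumes a file with this name exists in the folder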
Just change your for-loop to what you would do with a single file:
for (i in 1:length(temp)){
  assign("new.tmp", read.csv(temp[i], header = TRUE))
  new.tmp$Date <- dmy(new.tmp$Date)
  new.tmp <- read.zoo(new.tmp)
  assign(temp[i], as.xts(new.tmp))
}
This might cost a little bit of time, since you copy the whole object once more in each loop iteration, but I think this is the simplest solution.
In general, I prefer to initialize a list before the loop, read and process the files, and then store them in the list.
The main advantages of such an approach are:
Keeping your environment clean
Ability to use lapply to do the same processing on all the loaded files
Ability to extract/process a single file by simply indexing it, either by file name or by position
Code Sample:
paths.allFiles <- list.files(pattern = "*.csv") # Equivalent to "temp"
processedCSVs <- list()
for (path.oneFile in paths.allFiles) { # hint: you can access the file names directly without indexing
  # toDate is the helper function defined in the question above
  csv <- as.xts(read.zoo(path.oneFile, header = TRUE, sep = ",", FUN = toDate))
  processedCSVs[[path.oneFile]] <- csv # "[[" stores the whole xts object under the file name
}
lapply(processedCSVs, nrow) # Returns all the nrows of all files
nrow(processedCSVs[[1]]) # Returns the nrows of the indexed file only

Time Series - plot.ts() and multiple graphs

I've seen several threads on the error I have
cannot plot more than 10 series as "multiple"
But none really explain (1) what's going on and (2) how to get around it if you have multiple graphs.
I have 12 different files.
Each file is 1 row of ~240-250 data points. This is time-series data. The value range changes from file to file.
I want to make a graph that has them all in one single figure. So something like par(mfrow=c(4,3)).
However, when I use my code, it gives me the above error.
for(cand in cands)
{
  par(mfrow=c(4,3))
  for(type in types)
  {
    ## Construct the file name
    curFile = paste(folder, cand, base, type, close, sep="")
    ## Read in the file
    ts = read.delim(curFile, sep="\t", stringsAsFactors=FALSE, header=FALSE,
                    row.names=NULL, fill=TRUE, quote="", comment.char="")
    plot.ts(ts)
  }
}
First, don't call your time series object "ts". It's like calling your dog "dog". "ts" gets used in the system, and this can lead to confusion.
Have a look at the structure of your "ts" from reading the file. From your description, is the file a single row with 240+ columns? If so, that'll be a problem too.
read.delim() is expecting a column-oriented data file, not row-oriented. You'll need to transpose it if this is the case. Something like:
my.ts <- t(
  read.delim(curFile, sep = "\t", stringsAsFactors = FALSE,
             header = FALSE, row.names = NULL,
             fill = TRUE, quote = "", comment.char = "")
)
my.ts <- ts(my.ts)
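To get the 4 x 3 grid the question asks for (and to avoid the "cannot plot more than 10 series" error that comes from plotting everything as one multivariate series), each file can be flattened and plotted in its own panel. A rough sketch, with my.files standing in for the paths built with paste() in the question:
par(mfrow = c(4, 3))
for (curFile in my.files) {
  row.values <- read.delim(curFile, sep = "\t", header = FALSE)   # one row, ~240 columns
  one.series <- ts(unlist(row.values, use.names = FALSE))         # flatten to a single series
  plot(one.series, main = basename(curFile), ylab = "value")
}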

Load csv into R as xts, or comparable to enable time series analysis

I am still learning R, and I get very confused when using various data types, classes, etc. I have run into this issue of "Dates" not being in the right format for xts countless times now, and each time I find a fix after searching long and hard through (what I consider) complicated solutions.
I am looking for a way to load a CSV into R and convert the date as it is loaded, every time I load a csv. 99% of my files contain Date as the first column, in the format 01-31-1900 (xts wants YYYY-mm-dd).
Right now I have the following:
FedYieldCurve <- read.csv("Yield Curve.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
FedYieldCurve$Date <- format(as.Date(FedYieldCurve$Date), "%Y/%m/%d")
and I am getting: Error in charToDate(x) :
character string is not in a standard unambiguous format
The format argument must be inside as.Date. Try this (if the dates in the files are stored in the 01-31-1900 format):
as.Date(FedYieldCurve$Date,format="%m-%d-%Y")
When you try to coerce a string to a Date object you have to specify the format of the string via the format argument of the as.Date call. You get the error you reported when you try to coerce a string whose format differs from the standard YYYY-mm-dd.
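Putting it together with the code from the question, the parsed dates can then be handed to the xts() constructor; a minimal sketch, assuming Date is the first column and the remaining columns are numeric:
library(xts)
FedYieldCurve <- read.csv("Yield Curve.csv", header = TRUE, sep = ",",
                          stringsAsFactors = FALSE)
FedYieldCurve$Date <- as.Date(FedYieldCurve$Date, format = "%m-%d-%Y")
# as.matrix assumes the non-date columns are numeric yields
FedYieldCurve.xts <- xts(as.matrix(FedYieldCurve[, -1]), order.by = FedYieldCurve$Date)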
Provide a few lines of the file when asking questions like this. In the absence of that, we have supplied some data below in a self-contained example.
Use read.zoo from the zoo package (which xts loads) specifying the format. (Replace the read.zoo line with the commented line to read from a file.)
Lines <- "Date,Value
01-31-1900,3"
library(xts)
# z <- read.zoo("myfile.csv", header = TRUE, sep = ",", format = "%m-%d-%Y")
z <- read.zoo(text = Lines, header = TRUE, sep = ",", format = "%m-%d-%Y")
x <- as.xts(z)
See ?read.zoo and Reading Data in zoo.

Creating a zoo object from a csv file (with a few inconsistencies) with R

I am trying to create a zoo object in R from the following csv file:
http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv
There seem to be a few minor inconsistencies in the period from 2/27/2006 to 3/20/2006 (some extra commas and an "x") that lead to problems.
I am looking for a method that reads the complete csv file into R automatically. There is a new data point every business day, and with manual preprocessing you would have to re-edit the file by hand every day.
I am not sure if these are the only problems with this file, but I am running out of ideas on how to create a zoo object out of this time series. I think that with some more knowledge of R it should be possible.
Use colClasses to tell it that there are 4 fields and use fill so it knows to fill them if they are missing on any row. Ignore the warning:
library(zoo)
URL <- "http://www.cboe.com/publish/scheduledtask/mktdata/datahouse/Skewdailyprices.csv"
z <- read.zoo(URL, sep = ",", header = TRUE, format = "%m/%d/%Y", skip = 1,
              fill = TRUE, colClasses = rep(NA, 4))
It is a good idea to separate the cleaning and analysis steps. Since you mention that your dataset changes often, this cleaning must be automatic. Here is a solution for autocleaning.
# Read in the data without parsing it
lines <- readLines("Skewdailyprices.csv")
# The bad lines have more than two fields; count.fields skips the title line,
# so drop it from "lines" as well when inspecting, to keep the indices aligned
n_fields <- count.fields(
  "Skewdailyprices.csv",
  sep = ",",
  skip = 1
)
# View the dubious lines
lines[-1][n_fields != 2]
# Fix them
library(stringr) # can use gsub from base R if you prefer
lines <- str_replace(lines, ",,x?$", "")
# Write back out to file, dropping the title line
writeLines(lines[-1], "Skewdailyprices_cleaned.csv")
# Read in the clean version
sdp <- read.zoo(
  "Skewdailyprices_cleaned.csv",
  format = "%m/%d/%Y",
  header = TRUE,
  sep = ","
)
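Since the rest of this page is about xts: once the cleaned zoo series is read in, it converts directly, e.g.
library(xts)
sdp.xts <- as.xts(sdp)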
