Using zoo package - r

Please help!
I have a .csv file with 4 columns: Date, VBLTX, FMAGX and SBUX. The latter three columns are adjusted closing prices of some stocks, and the Date column holds the months from Jan-1998 to Dec-2009. Here are the first couple of rows:
Date      | VBLTX | FMAGX | SBUX
1/01/1998 | 4.36  | 44.38 | 4.3
1/02/1998 | 4.34  | 47.74 | 4.66
1/03/1998 | 4.35  | 47.74 | 5.33
I am trying to read this into R as a zoo object that should look like this:
         | VBLTX | FMAGX | SBUX
Jan 1998 | 4.36  | 44.38 | 4.3
Feb 1998 | 4.34  | 47.74 | 4.66
Mar 1998 | 4.35  | 47.74 | 5.33
I have no idea how to make this work. I am currently using this line of code:
all_prices <- read.zoo("all_prices.csv", FUN = identity)
And this produces this zoo series:
       | V2   | V3    | V4
Apr-00 | 4.63 | 73.15 | 7.12
Apr-01 | 5.22 | 63.05 | 9.11
Apr-02 | 5.71 | 53.88 | 10.74
It appears to have sorted the csv file alphabetically rather than by date. Also, if I scroll through the zoo series, there is a row containing the column names from the csv file.
Any help would be appreciated
Thanks!

If you have "no idea" how to use a command, read its help file carefully -- in this case ?read.zoo. There is also a vignette that comes with zoo devoted entirely to read.zoo examples: vignette("zoo-read"). Reviewing ?yearmon would be useful here as well.
Assuming that the input file is as shown reproducibly in the Note at the end, and NOT as shown in the question, it should NOT have a .csv extension since it is not a comma-separated file; however, ignoring that, we have the following.
header = TRUE says the first line is a header, FUN = as.yearmon says we want to convert the first column to a yearmon class time index and format specifies its format (using the percent codes defined in ?strptime).
library(zoo)
read.zoo("all_prices.csv", header = TRUE, FUN = as.yearmon, format = "%d/%m/%Y")
giving:
VBLTX FMAGX SBUX
Jan 1998 4.36 44.38 4.30
Feb 1998 4.34 47.74 4.66
Mar 1998 4.35 47.74 5.33
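If the file really were comma-separated, matching its .csv extension, the same call should work with a sep argument added, since read.zoo passes extra arguments through to read.table. A minimal sketch under that assumption:
library(zoo)
read.zoo("all_prices.csv", header = TRUE, sep = ",", FUN = as.yearmon, format = "%d/%m/%Y")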
Note
Lines <- "
Date VBLTX FMAGX SBUX
1/01/1998 4.36 44.38 4.3
1/02/1998 4.34 47.74 4.66
1/03/1998 4.35 47.74 5.33
"
cat(Lines, file = "all_prices.csv")

Related

How to create specific columns out of text in r

Here is an example I hope you can help me with: given that the input is a line from a txt file, I want to transform it into a table (see output) and save it as a csv or tsv file.
I have tried with separate functions but could not get it right.
Input
"PR7 - Autres produits d'exploitation 6.9 371 667 1 389"
Desired output
Variable                             | note | 2020 | 2019 | 2018
PR7 - Autres produits d'exploitation | 6.9  | 371  | 667  | 1389
I'm assuming that this badly delimited data-set is the only place where you can read your data.
For the purpose of this answer, I created an example file (which I called PR.txt) that contains only the following two lines.
PR6 - Blabla 10 156 3920 245
PR7 - Autres produits d'exploitation 6.9 371 667 1389
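For reproducibility, that example file can be created in the same spirit as the Note in the first answer (a small sketch; the file name PR.txt matches the text above):
writeLines(c(
  "PR6 - Blabla 10 156 3920 245",
  "PR7 - Autres produits d'exploitation 6.9 371 667 1389"),
  "PR.txt")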
First I create a function to parse each line of this dataset. I'm assuming here that the original file does not contain the names of the columns. In reality this is probably not the case, but the function could easily be adapted to take a first "header" line into account.
readBadlyDelimitedData <- function(x) {
  # Read the line into a one-row data frame
  dat <- read.table(text = x)
  # Get the type of each column
  whatIsIt <- sapply(dat, typeof)
  # Combine the columns that are of type "character"
  variable <- paste(dat[whatIsIt == "character"], collapse = " ")
  # Put everything in a data frame
  res <- data.frame(
    variable = variable,
    dat[, whatIsIt != "character"])
  # Change the names
  names(res)[-1] <- c("note", "Year2021", "Year2020", "Year2019")
  return(res)
}
Note that I do not give the yearly columns purely "numeric" names, because giving rows or columns purely numerical names is not good practice in R.
Once I have this function, I can (l)apply it to each line of the data by combining it with readLines, and collapse all the lines with an rbind.
out <- do.call("rbind", lapply(readLines("PR.txt"), readBadlyDelimitedData))
out
                              variable note Year2021 Year2020 Year2019
1                         PR6 - Blabla 10.0      156     3920      245
2 PR7 - Autres produits d'exploitation  6.9      371      667     1389
Finally, I save the result with write.csv:
write.csv(out, file = "correctlyDelimitedFile.csv")
If you can get your hands on the Excel file, a simple gdata::read.xls or openxlsx::read.xlsx would be enough to read the data.
I wish I knew how to make the script simpler... maybe a tidyr magic person would have a more elegant solution?
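For what it's worth, here is one rough attempt: a sketch using tidyr::extract with a regular expression (untested beyond the two example lines above; it assumes the last four whitespace-separated fields are always numeric):
library(tibble)
library(tidyr)
tibble(raw = readLines("PR.txt")) %>%
  extract(
    raw,
    into = c("variable", "note", "Year2021", "Year2020", "Year2019"),
    regex = "^(.+?)\\s+([0-9.]+)\\s+([0-9.]+)\\s+([0-9.]+)\\s+([0-9.]+)$",
    convert = TRUE
  )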

R: from a number of .csv to a single time series in xts

I have 100+ csv files in the current directory, all with the same characteristics. Some examples:
ABC.csv
,close,high,low,open,time,volumefrom,volumeto,timestamp
0,0.05,0.05,0.05,0.05,1405555200,100.0,5.0,2014-07-17 02:00:00
1,0.032,0.05,0.032,0.05,1405641600,500.0,16.0,2014-07-18 02:00:00
2,0.042,0.05,0.026,0.032,1405728000,12600.0,599.6,2014-07-19 02:00:00
...
1265,0.6334,0.6627,0.6054,0.6266,1514851200,6101389.25,3862059.89,2018-01-02 01:00:00
XYZ.csv
,close,high,low,open,time,volumefrom,volumeto,timestamp
0,0.0003616,0.0003616,0.0003616,0.0003616,1412640000,11.21,0.004054,2014-10-07 02:00:00
...
1183,0.0003614,0.0003614,0.0003614,0.0003614,1514851200,0.0,0.0,2018-01-02 01:00:00
The idea is to build a time series dataset in xts in R so that I can use the PerformanceAnalytics and quantmod libraries. Something like this:
## ABC XYZ ... ... JKL
## 2006-01-03 NaN 20.94342
## 2006-01-04 NaN 21.04486
## 2006-01-05 9.728111 21.06047
## 2006-01-06 9.979226 20.99804
## 2006-01-09 9.946529 20.95903
## 2006-01-10 10.575626 21.06827
## ...
Any idea? I can provide my trials if required.
A solution using base R
If you know that your files are formatted the same way then you can merge them. Below is what I would have done.
Get a list of files (this assumes that all the .csv files in the working directory are the ones you actually need):
vcfl <- list.files(pattern = "\\.csv$")
Use lapply() to open all files and store them as data frames:
lsdf <- lapply(vcfl, read.csv)
Merge them. Here I use the column high, but you can apply the same code to any variable (a loop-free variant is sketched after this step):
out_high <- lsdf[[1]][, c("timestamp", "high")]
for (i in 2:length(vcfl)) {
  out_high <- merge(out_high, lsdf[[i]][, c("timestamp", "high")], by = "timestamp")
}
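As hinted above, the loop can be avoided; a sketch of the same merge using base Reduce (same columns and by key as the loop version):
high_list <- lapply(lsdf, function(d) d[, c("timestamp", "high")])
out_high <- Reduce(function(x, y) merge(x, y, by = "timestamp"), high_list)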
Rename the price columns using the vector of file names:
names(out_high)[-1] <- gsub(vcfl, pattern = "\\.csv$", replacement = "")
You can now use as.xts() from the xts package: https://cran.r-project.org/web/packages/xts/xts.pdf
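A minimal sketch of that last step (it assumes the timestamp strings parse with as.POSIXct, as they appear in the files shown in the question):
library(xts)
out_xts <- xts(out_high[, -1],
               order.by = as.POSIXct(out_high$timestamp, tz = "UTC"))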
I guess there is an alternative solution using tidyverse, somebody else?
Hope this helps.

openxlsx: read.xlsx throws an error if the sheet name contains the "&" character

Create an .xlsx file with three sheets named: "Test 1", "S&P500 TR" and "SP500 TR". Put some random content in each sheet and save it as "Book1.xlsx".
Run:
> a <- getSheetNames("Book1.xlsx")
> a
[1] "Test 1" "S&P500 TR" "SP500 TR"
Now try:
> read.xlsx("Book1.xlsx", a[2])
Error in read.xlsx.default("Book1.xlsx", a[2]) :
Cannot find sheet named "S&P500 TR"
First, check whether typing the name "S&P500 TR" directly instead of using a[2] changes anything.
Alternatively, you can use the readxl package for importing:
library(readxl)
X1 <- read_excel("C:/1.xls", sheet = "S&P500 TR")
This is a spreadsheet that I had, and this is the result after it is imported:
head(X1)
# A tibble: 6 × 4
# Year Month Community ` Average Daily`
# <dbl> <chr> <chr> <dbl>
# 1 2016 Jan Arlington 5.35
# 2 2016 Jan Ashland 1.26
# 3 2016 Jan Bedford 2.62
# 4 2016 Jan Belmont 3.03
# 5 2016 Jan Boston 84.89
# 6 2016 Jan Braintree 8.16
I ran into the same problem, but found a workaround. First load the workbook using loadWorkbook(). Then rename the problematic sheet to avoid the ampersand. To fix the code in your example:
wb <- loadWorkbook("Book1.xlsx")
renameWorksheet(wb, "S&P500 TR", "NEW NAME")
output <- read.xlsx(wb, "NEW NAME")
Hope this helps!
First load the workbook, then use the which and grepl functions to return the number of the sheet whose name contains the target string (which can include the '&' character when done this way). This seems to work quite well in an application I am currently working on.
An (incomplete) example is given below that should be easily modified to your context. In my case 'i' is a file name (looping over many files). The "toy" code is here:
wb <- loadWorkbook(file = i)
which(grepl("CAPEX & Depreciation", names(wb)))
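A hedged completion of that toy code (it assumes, as above, that i holds a file name; read.xlsx accepts a sheet index as well as a sheet name):
library(openxlsx)
wb <- loadWorkbook(file = i)
sheet_idx <- which(grepl("CAPEX & Depreciation", names(wb)))
dat <- read.xlsx(wb, sheet = sheet_idx)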

xts subset quarterly data

I want to subset a range of quarterly data held inside an xts object.
I see the documentation says "xts provides facilities for indexing based on any of the current time-based classes. These include yearqtr"
However, I have tried the following, which produce a range of data but not the dates I request.
a = as.xts(ts(rnorm(20), start=c(1980,1), freq=4))
a["1983"] # Returns 1983Q2 - 1984Q1 ?
a["1983-01/"] # Begins in 1983Q2 ?
a["1981-01/1983-03"] # Returns 1981Q2 - 1983Q2 ?
a[as.yearqtr("1981 Q2")] # Correct
a[as.yearqtr("1981 Q1")/as.yearqtr("1983 Q3")] # Does not work
Looks like a timezone issue. The xts index is always a POSIXct object, even if the index class is something else. Like a Date classed index, the yearqtr (and yearmon) classed index should have the timezone set to "UTC".
> a <- as.xts(ts(rnorm(20), start=c(1980,1), freq=4), tzone="UTC")
> a["1983"]
[,1]
1983 Q1 1.4877302
1983 Q2 -0.4594768
1983 Q3 -0.1906189
1983 Q4 -1.1518943
Warning message:
timezone of object (UTC) is different than current timezone ().
You can safely ignore the warning. If it really bothers you, you can set your R session's timezone to "UTC" via:
> Sys.setenv(TZ="UTC")
> a <- as.xts(ts(rnorm(20), start=c(1980,1), freq=4))
> a["1983"]
[,1]
1983 Q1  1.84636890
1983 Q2 -0.06872544
1983 Q3 -2.29822631
1983 Q4 -1.46025131
This will never work:
a[as.yearqtr("1981 Q1")/as.yearqtr("1983 Q3")] # Does not work
It looks like you're trying to do something like: a["1981 Q1/1983 Q3"], which isn't supported because "YYYY Qq" is not an ISO8601 format.
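If you need a range of quarters by yearqtr values, one workaround (a sketch, assuming the UTC setup above so the index is yearqtr-classed) is to compare against the index directly:
a[index(a) >= as.yearqtr("1981 Q1") & index(a) <= as.yearqtr("1983 Q3")]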

How do I change the index in a csv file to a proper time format?

I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index is from 1:1700
But I need to specify a begin date and end date this way:
The start period is, let's say, 25th January 2009,
and the last (1700th) value corresponds to 14th May 2013.
So far I've gotten this close:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks!
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis is weird, as shown in the screenshot.
Something like this?
as.Date("25-01-2009", format = "%d-%m-%Y") + (seq(1:1700) - 1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
  myDate = seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700),
  myPrice = runif(1700)
)
plot(df)
R stores Date-classed objects as the integer offset from "1970-01-01", but the as.Date.numeric function needs an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You didn't need to add the extension .numeric, since R would automatically seek out that function if you used the generic stem, as.Date, with an integer argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
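Tying the pieces together with zoo (a sketch; it assumes dseries[, 1] holds the 1700 prices and, like the answers above, that the observations are consecutive calendar days):
library(zoo)
dates <- seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700)
prices <- zoo(dseries[, 1], order.by = dates)
plot(prices)  # the x axis now shows proper dates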
