Basically, I want to capture data using getSymbols (quantmod), write the file(s) to disk, and read them back in with another script. I would like to use xts objects if possible. I cannot seem to make this work. Here is what I have done (and many variations thereof):
getSymbols("VNQ", from = as.Date("2015-12-01"), to = as.Date("2015-12-15"))
this.tkr <- get("VNQ")
head(this.tkr)
VNQ.Open VNQ.High VNQ.Low VNQ.Close VNQ.Volume VNQ.Adjusted
2015-12-01 79.50 80.52 79.42 80.49 3847300 79.38125
2015-12-02 80.26 80.37 78.73 78.85 5713500 77.76385
2015-12-03 78.73 78.85 77.40 77.61 4737300 76.54093
2015-12-04 77.68 79.29 77.65 79.09 3434100 78.00054
2015-12-07 78.96 79.19 78.52 78.87 4195100 77.78357
2015-12-08 78.44 79.09 78.36 78.80 3638600 77.71454
class(this.tkr)
[1] "xts" "zoo"
write.zoo(this.tkr, "Data/TestZoo")
## then in some other script ....
new.tkr <- read.table("Data/TestZoo", stringsAsFactors = FALSE)
class(new.tkr)
[1] "data.frame"
head(new.tkr)
V1 V2 V3 V4 V5 V6 V7
1 Index VNQ.Open VNQ.High VNQ.Low VNQ.Close VNQ.Volume VNQ.Adjusted
2 2015-12-01 79.5 80.519997 79.419998 80.489998 3847300 79.381254
3 2015-12-02 80.260002 80.370003 78.730003 78.849998 5713500 77.763845
4 2015-12-03 78.730003 78.849998 77.400002 77.610001 4737300 76.540928
5 2015-12-04 77.68 79.290001 77.650002 79.089996 3434100 78.000537
6 2015-12-07 78.959999 79.190002 78.519997 78.870003 4195100 77.783574
## attempt to convert this to an xts object ...
new.tkr <- new.tkr[2:nrow(new.tkr), ] #delete first row of text captions
new.xts <- xts(new.tkr[, 2:ncol(new.tkr)], as.Date(new.tkr$V1))
head(new.xts)
V2 V3 V4 V5 V6 V7
2015-12-01 "79.5" "80.519997" "79.419998" "80.489998" "3847300" "79.381254"
2015-12-02 "80.260002" "80.370003" "78.730003" "78.849998" "5713500" "77.763845"
2015-12-03 "78.730003" "78.849998" "77.400002" "77.610001" "4737300" "76.540928"
2015-12-04 "77.68" "79.290001" "77.650002" "79.089996" "3434100" "78.000537"
2015-12-07 "78.959999" "79.190002" "78.519997" "78.870003" "4195100" "77.783574"
2015-12-08 "78.440002" "79.089996" "78.360001" "78.800003" "3638600" "77.714538"
Why does the xts conversion insist on making the columns of mode "character"? When I look at str(new.xts) the columns are all factors. Where am I jumping the track?
To preserve as much metadata as possible, save it as an R data file:
saveRDS(this.tkr, file = '~/Desktop/data.Rds')
df2 <- readRDS('~/Desktop/data.Rds')
That way,
> class(df2)
[1] "xts" "zoo"
The downside of this approach is that your data is less portable if you need to share it with people using tools other than R, but that doesn't sound like an issue in this case.
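A minimal sketch of that round trip, assuming the xts package is installed. The toy object stands in for the getSymbols() result so the example runs offline, and the temporary file path is only a placeholder:

```r
library(xts)

# Toy stand-in for the downloaded data (two rows, two columns from the question's output)
x <- xts(
  matrix(c(79.50, 80.26, 80.52, 80.37), ncol = 2,
         dimnames = list(NULL, c("VNQ.Open", "VNQ.High"))),
  order.by = as.Date(c("2015-12-01", "2015-12-02"))
)

rds_path <- tempfile(fileext = ".rds")  # placeholder location; any writable path works
saveRDS(x, rds_path)

# In the second script: the class and the numeric columns come back unchanged
y <- readRDS(rds_path)
class(y)
all.equal(coredata(x), coredata(y))
```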
This will write a zoo object in text form (portably) and read it back:
library(quantmod)
this.tkr <- getSymbols("VNQ", from = as.Date("2015-12-01"), to = as.Date("2015-12-15"),
auto.assign = FALSE, return.class = "zoo")
write.zoo(this.tkr, "TestZoo")
zz <- read.zoo("TestZoo", header = TRUE)
identical(this.tkr, zz)
## [1] TRUE
If you have an xts object convert it to zoo first like this:
library(quantmod)
this.tkr <- getSymbols("VNQ", from = as.Date("2015-12-01"), to = as.Date("2015-12-15"),
auto.assign = FALSE)
z <- as.zoo(this.tkr)
write.zoo(z, "TestZoo")
zz <- read.zoo("TestZoo", header = TRUE)
identical(z, zz)
## [1] TRUE
x <- as.xts(zz)
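As for why the original attempt produced character columns: read.table() was called without header = TRUE, so the caption row was read as data and every column became character. A small sketch of the difference, assuming xts (which attaches zoo) is installed and using a toy object instead of a download:

```r
library(xts)  # also attaches zoo, which provides write.zoo()

x <- xts(
  matrix(c(79.50, 80.26, 80.52, 80.37), ncol = 2,
         dimnames = list(NULL, c("VNQ.Open", "VNQ.High"))),
  order.by = as.Date(c("2015-12-01", "2015-12-02"))
)
path <- tempfile()  # placeholder file name
write.zoo(x, path)

# Without header = TRUE the caption row is treated as data: all columns are character
bad <- read.table(path, stringsAsFactors = FALSE)
sapply(bad, class)

# With header = TRUE the price columns are numeric and convert cleanly
fixed <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
new.xts <- xts(fixed[, -1], order.by = as.Date(fixed$Index))
is.numeric(coredata(new.xts))
```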
quantmod newbie here,
My end goal is to have a CSV file including monthly stock prices. I've downloaded the data with getSymbols using this code:
Symbols <- c("DIS", "TSLA","ATVI", "MSFT", "FB", "ABT","AAPL","AMZN",
"BAC","NFLX","ADBE","WMT","SRE","T","MS")
Data <- new.env()
getSymbols(c("^GSPC",Symbols),from="2015-01-01",to="2020-12-01"
,periodicity="monthly",
env=Data)
The code above works fine. Now I need to create a data frame that only includes the adjusted prices for all the symbols, with a date column of course.
Any help, please? :)
Desired output would be something similar to this
Another straightforward way to get your monthly data:
tickers <- c('AMZN','FB','GOOG','AAPL')
getSymbols(tickers,periodicity="monthly")
monthlyPrices <- do.call("merge.xts", c(lapply(mget(tickers), "[", , 6), all = FALSE))
head(monthlyPrices, 3)
AMZN.Adjusted FB.Adjusted GOOG.Adjusted AAPL.Adjusted
2012-06-01 228.35 31.10 288.9519 17.96558
2012-07-01 233.30 21.71 315.3032 18.78880
2012-08-01 248.27 18.06 341.2658 20.46477
Note that the logical argument all = FALSE is the equivalent of an inner join: you only get rows for dates on which all of your stocks have prices. all = TRUE fills data that is not available with NAs (outer join).
To write the file you can use:
write.zoo(monthlyPrices,file = 'filename.csv',sep=',',quote=FALSE)
First get your data from the environment:
require(quantmod)
# your code
dat <- mget(ls(Data), envir = Data)
Then draw the data from the Objects:
newdat <- as.data.frame(sapply( names(dat), function(x) coredata(dat[[x]])[,1] ))
Note that this takes the Opening values (see the [,1] in coredata(dat[[x]])[,1]). The objects have more columns (use [,6] for the adjusted prices), e.g.:
names(dat[["AAPL"]])
[1] "AAPL.Open" "AAPL.High" "AAPL.Low" "AAPL.Close"
[5] "AAPL.Volume" "AAPL.Adjusted"
Last, get the dates (assumes symmetric dates for all symbols):
rownames(newdat) <- index(dat[["AAPL"]])
# OR, more universal, by extracting from the complete list:
rownames(newdat) <-
as.data.frame( sapply( names(dat), function(x) as.character(index(dat[[x]])) ) )[,1]
head(newdat, 3)
AAPL ABT ADBE AMZN ATVI BAC DIS FB GSPC MS
2015-01-01 27.8475 45.25 72.70 312.58 20.24 17.99 94.91 78.58 2058.90 39.05
2015-02-01 29.5125 44.93 70.44 350.05 20.90 15.27 91.30 76.11 1996.67 33.96
2015-03-01 32.3125 47.34 79.14 380.85 23.32 15.79 104.35 79.00 2105.23 35.64
MSFT NFLX SRE T TSLA WMT
2015-01-01 46.66 49.15143 111.78 33.59 44.574 86.27
2015-02-01 40.59 62.84286 112.38 33.31 40.794 84.79
2015-03-01 43.67 67.71429 108.20 34.56 40.540 83.93
Writing the csv:
write.csv(newdat, "file.csv")
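Since the question asks for adjusted prices with a date column, here is a hedged sketch of that variant. It assumes xts is installed; the symbols AAA and BBB, their values, and the helper make_sym() are made up so the example runs without a download:

```r
library(xts)

dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Hypothetical helper: builds a small xts object shaped like getSymbols() output
make_sym <- function(sym, adj) {
  x <- xts(cbind(adj + 1, adj), order.by = dates)
  colnames(x) <- paste0(sym, c(".Close", ".Adjusted"))
  x
}

Data <- new.env()
assign("AAA", make_sym("AAA", c(10, 11, 12)), envir = Data)
assign("BBB", make_sym("BBB", c(20, 21, 22)), envir = Data)

# Keep only the *.Adjusted column of every object in the environment
adj_list <- lapply(mget(ls(Data), envir = Data),
                   function(x) x[, grep("\\.Adjusted$", colnames(x))])
adjusted <- do.call(merge, adj_list)
colnames(adjusted) <- names(adj_list)

# Move the index into an ordinary Date column and write the CSV
out <- data.frame(Date = index(adjusted), coredata(adjusted))
csv_path <- tempfile(fileext = ".csv")  # placeholder file name
write.csv(out, csv_path, row.names = FALSE)
```

Selecting the column by name rather than by position avoids depending on the column order.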
I have a data frame with records from the month of October 2017. Column 6 has the dates as a character vector.
This is what it looks like:
> october2017[1:6,1:6]
V1 V2 V3 V4 V5 V6
1 89108060 IN0000005 P2 RK1 CA1-R 10/1/2017
2 10503818 IN0000014 P2 RK1 CA31 10/2/2017
3 89108152 765000054 P2 RK1 CA31 10/3/2017
4 89108152 765000197 P2 RK1 CA31 10/4/2017
5 89108206 200000162 P2 RK1 CA31 10/5/2017
6 89108206 100001098 P2 RK1 CA31 10/6/2017
> class(october2017$V6)
[1] "character"
The actual data frame is much larger than this. What I want to do is create a new column to denote the day of the week that matches each date and add it to the data frame. If the date is "10/1/2017" I want the new column denoting the day of the week to show "Sunday" in that row.
This is what I want the data frame to look like:
> october2017[1:6,1:7]
V1 V2 V3 V4 V5 V6 V7
1 89108060 IN0000005 P2 RK1 CA1-R 10/1/2017 Sunday
2 10503818 IN0000014 P2 RK1 CA31 10/2/2017 Monday
3 89108152 765000054 P2 RK1 CA31 10/3/2017 Tuesday
4 89108152 765000197 P2 RK1 CA31 10/4/2017 Wednesday
5 89108206 200000162 P2 RK1 CA31 10/5/2017 Thursday
6 89108206 100001098 P2 RK1 CA31 10/6/2017 Friday
This is what I tried:
newcol = weekdays(as.Date(october2017$v6, format="%m/%d/%Y"))
october2017 = cbind(october2017,newcol, stringsAsFactors=FALSE)
This is the error message I get when I try to run the first line of this code:
Error in as.Date.default(october2017$v6, format = "%m/%d/%Y") :
do not know how to convert 'october2017$v6' to class “Date”
Can anyone help me understand why this is happening?
as.Date is a function that uses S3 method dispatch. That is, there are actually several functions:
methods("as.Date")
# [1] as.Date.character as.Date.date as.Date.dates as.Date.default
# [5] as.Date.factor as.Date.numeric as.Date.POSIXct as.Date.POSIXlt
# see '?methods' for accessing help and source code
When you call as.Date(x), R looks at the class of the first argument and uses the appropriate S3 method. If none is found and a .default function exists, then that will be used as a "last resort".
If you look at the source for each of the methods, you will only find the string "do not know how to convert" in as.Date.default:
as.Date.default
# function (x, ...)
# {
# if (inherits(x, "Date"))
# return(x)
# if (is.logical(x) && all(is.na(x)))
# return(structure(as.numeric(x), class = "Date"))
# stop(gettextf("do not know how to convert '%s' to class %s",
# deparse(substitute(x)), dQuote("Date")), domain = NA)
# }
If it were one of the known classes (character, date, dates, factor, numeric, POSIXct, or POSIXlt), then R would have run the specific method instead (none of which includes that error string). The default method itself also rules out Date and all-NA logical input. This suggests that your $v6 column is a different class. Without an MWE, the exact class is speculation.
I suggest you find the actual class of your data
class(dataFrame$v6)
and figure out how to convert it to one of the known versions.
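To make the dispatch rule concrete, here is a toy generic. The name describe() and its methods are invented for illustration and are not part of base R:

```r
# A made-up S3 generic with one class-specific method and a default
describe <- function(x, ...) UseMethod("describe")
describe.character <- function(x, ...) "character method"
describe.default <- function(x, ...) "default method (last resort)"

describe("10/1/2017")  # class "character": describe.character runs
describe(NULL)         # class "NULL" has no method: describe.default runs
```

as.Date() follows the same pattern, which is why an unexpected class ends up in as.Date.default.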
Edit
Furthermore, note that R is case-sensitive. Your MWE uses lower-case v6 but your column names are upper-case. How about just
october2017$V7 <- weekdays(as.Date(october2017$V6, format="%m/%d/%Y"))
When you look at october2017$v6 (lower-case), it returns NULL, which triggers the .default method of as.Date.
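A small reproduction of the case-sensitivity point, using a two-row stand-in for the real data frame (only the V6 column is recreated here):

```r
# Stand-in for the question's data: only the date column is needed
october2017 <- data.frame(V6 = c("10/1/2017", "10/2/2017"),
                          stringsAsFactors = FALSE)

is.null(october2017$v6)  # TRUE: no column is called lower-case v6

# With the correct upper-case name the conversion works
d <- as.Date(october2017$V6, format = "%m/%d/%Y")
october2017$V7 <- weekdays(d)  # day names follow the session's locale
october2017
```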
This is several years after the fact, but I wanted to add the alternative method I used to get around this error.
First I pulled the character vector containing the dates into a new vector, then converted it to a 1-column data frame containing a Date class vector. From there I could overwrite the original column in my data.
dates <- data$date.column
dates <- data.frame(as.Date(dates, format = "%Y-%m-%d"))
data$date.column <- dates
I'm not entirely sure why it's necessary to wrap the as.Date() call in data.frame() in the middle step, but I found that it failed without it.
Hope it helps!
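For comparison, in a case like the one sketched below the data.frame() wrapper should not be needed, as long as the column name is spelled exactly and the format string matches the text. The data frame dat and its columns are made up for this example:

```r
# Hypothetical data with ISO-formatted date strings
dat <- data.frame(date.column = c("2017-10-01", "2017-10-02"),
                  value = c(1, 2),
                  stringsAsFactors = FALSE)

# Direct assignment: the column becomes a Date vector in place
dat$date.column <- as.Date(dat$date.column, format = "%Y-%m-%d")
class(dat$date.column)
```

If this fails on real data, checking names(dat) and class(dat$date.column) first is a reasonable next step.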
I am trying to use Import Dataset in R Studio to read ratings.dat from movielens.
Basically it has this format:
1::1::5::978824268
1::1022::5::978300055
1::1028::5::978301777
1::1029::5::978302205
1::1035::5::978301753
So I need to replace :: with :, ', whitespace, or similar. I use Notepad++, which loads the file quite fast (compared to Notepad) and can view very big files easily. However, when I do the replacement, it shows some strange characters:
"LF"
From some research here, I learned that it is \n (line feed, or line break). But I do not know why these are not shown when the file is loaded and only appear after I do the replacement. And when I load the file into R Studio, it is still detected as "LF", not as a line break, which causes an error when reading the data.
What is the solution for that? Thank you!
PS: I know there is Python code for converting this, but I don't want to use it. Is there any other way?
Try this:
url <- "http://files.grouplens.org/datasets/movielens/ml-10m.zip"
## this part is agonizingly slow
tf <- tempfile()
download.file(url,tf, mode="wb") # download archived movielens data
files <- unzip(tf, exdir=tempdir()) # unzips and returns a vector of file names
ratings <- readLines(files[grepl("ratings.dat$",files)]) # read rating.dat file
ratings <- gsub("::", "\t", ratings)
# this part is much faster
library(data.table)
ratings <- fread(paste(ratings, collapse="\n"), sep="\t")
# Read 10000054 rows and 4 (of 4) columns from 0.219 GB file in 00:00:07
head(ratings)
# V1 V2 V3 V4
# 1: 1 122 5 838985046
# 2: 1 185 5 838983525
# 3: 1 231 5 838983392
# 4: 1 292 5 838983421
# 5: 1 316 5 838983392
# 6: 1 329 5 838983392
Alternatively (this uses the download code from jlhoward, who also updated his answer to stop using built-in functions and switch to data.table while I wrote this, but mine's still faster/more efficient :-)
library(data.table)
# i try not to use variable names that stomp on function names in base
URL <- "http://files.grouplens.org/datasets/movielens/ml-10m.zip"
# this will be "ml-10m.zip"
fil <- basename(URL)
# this will download to getwd() since you prbly want easy access to
# the files after the machinations. the nice thing about this is
# that it won't re-download the file and waste bandwidth
if (!file.exists(fil)) download.file(URL, fil)
# this will create the "ml-10M100K" dir in getwd(). if using
# R 3.2+ you can do a dir.exists() test to avoid re-doing the unzip
# (which is useful for large archives or archives compressed with a
# more CPU-intensive algorithm)
unzip(fil)
# fast read and slicing of the input
# fread will only split on a single delimiter so the initial fread
# will create a few blank columns. the [] expression filters those
# out. the "with=FALSE" is part of the data.table inanity
mov <- fread("ml-10M100K/ratings.dat", sep=":")[, c(1,3,5,7), with=FALSE]
# saner column names, set efficiently via data.table::setnames
setnames(mov, c("user_id", "movie_id", "rating", "timestamp"))
mov
## user_id movie_id rating timestamp
## 1: 1 122 5 838985046
## 2: 1 185 5 838983525
## 3: 1 231 5 838983392
## 4: 1 292 5 838983421
## 5: 1 316 5 838983392
## ---
## 10000050: 71567 2107 1 912580553
## 10000051: 71567 2126 2 912649143
## 10000052: 71567 2294 5 912577968
## 10000053: 71567 2338 2 912578016
## 10000054: 71567 2384 2 912578173
It's quite a bit faster than built-in functions.
Small improvement to @hrbrmstr's answer:
mov <- fread("ml-10M100K/ratings.dat", sep=":", select=c(1,3,5,7))
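For a file small enough that speed does not matter, the same idea can be sketched in base R without data.table: swap the two-character delimiter for a tab, then parse. The two sample lines are copied from the question; the column names are assumptions based on what the fields appear to hold:

```r
# Two sample records in the "::"-delimited format from the question
raw_lines <- c("1::1::5::978824268",
               "1::1022::5::978300055")

# fixed = TRUE treats "::" as a literal string, not a regular expression
tabbed <- gsub("::", "\t", raw_lines, fixed = TRUE)

ratings <- read.table(text = tabbed, sep = "\t",
                      col.names = c("user_id", "movie_id", "rating", "timestamp"))

# The last field looks like Unix epoch seconds, so it can be converted if needed
ratings$when <- as.POSIXct(ratings$timestamp, origin = "1970-01-01", tz = "UTC")
ratings
```

For the real file, readLines("ratings.dat") would replace raw_lines, though the fread approaches above will be much faster on 10 million rows.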
I am trying to read in a CSV file and change it to XTS format. However, I am running into an issue: the CSV has the date and time fields in separate columns.
2012.10.30,20:00,1.29610,1.29639,1.29607,1.29619,295
2012.10.30,20:15,1.29622,1.29639,1.29587,1.29589,569
2012.10.30,20:30,1.29590,1.29605,1.29545,1.29574,451
2012.10.30,20:45,1.29576,1.29657,1.29576,1.29643,522
2012.10.30,21:00,1.29643,1.29645,1.29581,1.29621,526
2012.10.30,21:15,1.29621,1.29644,1.29599,1.29642,330
I am trying to pull it in with
euXTS <- as.xts(read.zoo(file="EURUSD15.csv", sep=",", format="%Y.%m.%d", header=FALSE))
But it gives me this warning message, so I think I somehow have to attach the time stamp, but I am not sure of the best way to do that.
Warning message:
In zoo(rval3, ix) :
Some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
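It is better to use read.zoo to read your time series directly into a zoo object, which is easily coerced to an xts one. Passing index=1:2 tells read.zoo to build the index from both the date and the time column: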
library(xts)
ts.z <- read.zoo(text='2012.10.30,20:00,1.29610,1.29639,1.29607,1.29619,295
2012.10.30,20:15,1.29622,1.29639,1.29587,1.29589,569
2012.10.30,20:30,1.29590,1.29605,1.29545,1.29574,451
2012.10.30,20:45,1.29576,1.29657,1.29576,1.29643,522
2012.10.30,21:00,1.29643,1.29645,1.29581,1.29621,526
2012.10.30,21:15,1.29621,1.29644,1.29599,1.29642,330',
sep=',',index=1:2,tz='',format="%Y.%m.%d %H:%M")
as.xts(ts.z)
V3 V4 V5 V6 V7
2012-10-30 20:00:00 1.29610 1.29639 1.29607 1.29619 295
2012-10-30 20:15:00 1.29622 1.29639 1.29587 1.29589 569
2012-10-30 20:30:00 1.29590 1.29605 1.29545 1.29574 451
2012-10-30 20:45:00 1.29576 1.29657 1.29576 1.29643 522
2012-10-30 21:00:00 1.29643 1.29645 1.29581 1.29621 526
2012-10-30 21:15:00 1.29621 1.29644 1.29599 1.29642 330
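The warning in the question comes from indexing by date alone, which gives every 15-minute bar of a day the same index value. An alternative sketch that builds the timestamp by hand, assuming xts is installed; the column names and the UTC time zone are assumptions, since the file has no header:

```r
library(xts)

txt <- "2012.10.30,20:00,1.29610,1.29639,1.29607,1.29619,295
2012.10.30,20:15,1.29622,1.29639,1.29587,1.29589,569"

raw <- read.csv(text = txt, header = FALSE, stringsAsFactors = FALSE)

# Paste date and time together so every row gets a unique timestamp
stamp <- as.POSIXct(paste(raw$V1, raw$V2),
                    format = "%Y.%m.%d %H:%M", tz = "UTC")

prices <- as.matrix(raw[, 3:7])
colnames(prices) <- c("Open", "High", "Low", "Close", "Volume")  # assumed labels

euXTS <- xts(prices, order.by = stamp)
euXTS
```

For the real file, read.csv("EURUSD15.csv", header = FALSE) would replace the text = argument.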
> titletool <- read.csv("TotalCSVData.csv", header=FALSE, sep=",")
> class(titletool)
[1] "data.frame"
> titletool[1,1]
[1] Experiment name : CONTROL DB AD_1
> t <- titletool[1,1]
> t
[1] Experiment name : CONTROL DB AD_1
> class(t)
[1] "character"
Now I want to create an object (vector) with the name "Experiment name : CONTROL DB AD_1", or even better, if possible, CONTROL DB AD_1.
Thank you
Use assign:
varname <- "Experiment name : CONTROL DB AD_1"
assign(varname, 3.14158)
get("Experiment name : CONTROL DB AD_1")
[1] 3.14158
And you can use a regular expression and sub or gsub to remove some text from a string:
cleanVarname <- sub("Experiment name : ", "", varname)
assign(cleanVarname, 42)
get("CONTROL DB AD_1")
[1] 42
But let me warn you this is an unusual thing to do.
Here be dragons.
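One way to stay clear of the dragons is to keep the values in a named list and use the cleaned-up string as the element name. A minimal sketch; the list name results and the stored values are made up:

```r
label <- "Experiment name : CONTROL DB AD_1"

# Strip the fixed prefix to get a shorter key
key <- sub("^Experiment name : ", "", label)

# Store the data under that key instead of creating a free-standing variable
results <- list()
results[[key]] <- c(1.5, 2.5, 3.5)

names(results)
results[["CONTROL DB AD_1"]]
```

List names may contain spaces, so nothing has to be turned into a syntactically valid variable name, and the whole collection can be passed to lapply().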
If I understand correctly, you have a bunch of CSV files, each with multiple experiments in them, named in the pattern "Experiment ...". You now want to read each of these "experiments" into R in an efficient way.
Here's a not-so-pretty (but not-so-ugly either) function that might get you started in the right direction.
What the function basically does is read in the CSV, identify the line numbers where each new experiment starts, grabs the names of the experiments, then does a loop to fill in a list with the separate data frames. It doesn't really bother making "R-friendly" names though, and I've decided to leave the output in a list, because as Andrie pointed out, "R has great tools for working with lists."
read.funkyfile = function(funkyfile, expression, ...) {
  temp = readLines(funkyfile)
  # line numbers where each experiment starts, plus a sentinel past the end
  temp.loc = grep(expression, temp)
  temp.loc = c(temp.loc, length(temp)+1)
  # experiment names with punctuation stripped
  temp.nam = gsub("[[:punct:]]", "",
                  grep(expression, temp, value=TRUE))
  temp.out = vector("list")
  for (i in seq_along(temp.nam)) {
    temp.out[[i]] = read.csv(textConnection(
      temp[seq(from = temp.loc[i]+1,
               to = temp.loc[i+1]-1)]),
      ...)
    names(temp.out)[i] = temp.nam[i]
  }
  temp.out
}
Here is an example CSV file. Copy and paste it into a text editor and save it as "funkyfile1.csv" in the current working directory. (Or, read it in from Dropbox: http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv)
"Experiment Name: Here Be",,
1,2,3
4,5,6
7,8,9
"Experiment Name: The Dragons",,
10,11,12
13,14,15
16,17,18
Here is a second CSV. Again, copy-paste and save it as "funkyfile2.csv" in your current working directory. (Or, read it in from Dropbox: http://dl.dropbox.com/u/2556524/testing/funkyfile2.csv)
"Promises: I vow to",,
"H1","H2","H3"
19,20,21
22,23,24
25,26,27
"Promises: Slay the dragon",,
"H1","H2","H3"
28,29,30
31,32,33
34,35,36
Notice that funkyfile1 has no column names, while funkyfile2 does. That's what the ... argument in the function is for: to specify header=TRUE or header=FALSE. Also the "expression" identifying each new set of data is "Promises" in funkyfile2.
Now, use the function:
read.funkyfile("funkyfile1.csv", "Experiment", header=FALSE)
# read.funkyfile("http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv",
# "Experiment", header=FALSE) # Uncomment to load remotely
# $`Experiment Name Here Be`
# V1 V2 V3
# 1 1 2 3
# 2 4 5 6
# 3 7 8 9
#
# $`Experiment Name The Dragons`
# V1 V2 V3
# 1 10 11 12
# 2 13 14 15
# 3 16 17 18
read.funkyfile("funkyfile2.csv", "Promises", header=TRUE)
# read.funkyfile("http://dl.dropbox.com/u/2556524/testing/funkyfile2.csv",
# "Promises", header=TRUE) # Uncomment to load remotely
# $`Promises I vow to`
# H1 H2 H3
# 1 19 20 21
# 2 22 23 24
# 3 25 26 27
#
# $`Promises Slay the dragon`
# H1 H2 H3
# 1 28 29 30
# 2 31 32 33
# 3 34 35 36
Go get those dragons.
Update
If your data are all in the same format, you can use the lapply solution mentioned by Andrie along with this function. Just make a list of the CSVs that you want to load, as below. Note that the files all need to use the same "expression" and other arguments the way the function is currently written....
temp = list("http://dl.dropbox.com/u/2556524/testing/funkyfile1.csv",
"http://dl.dropbox.com/u/2556524/testing/funkyfile3.csv")
lapply(temp, read.funkyfile, "Experiment", header=FALSE)
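The same "find the marker lines, then cut the file into blocks" idea can also be sketched without an explicit loop, using cumsum() and split(). This is an alternative formulation rather than the function above, and it assumes every marker line is followed by at least one data line:

```r
# Inline stand-in for funkyfile1.csv
lines <- c('"Experiment Name: Here Be",,',
           "1,2,3",
           "4,5,6",
           '"Experiment Name: The Dragons",,',
           "10,11,12")

is_marker <- grepl("Experiment", lines)
block_id <- cumsum(is_marker)  # 1 for the first experiment, 2 for the second, ...

# Group the data lines by the experiment they belong to
blocks <- split(lines[!is_marker], block_id[!is_marker])
names(blocks) <- gsub("[[:punct:]]", "", lines[is_marker])

# Parse each block of lines as its own small CSV
tables <- lapply(blocks, function(b) read.csv(text = b, header = FALSE))
names(tables)
```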