quantmod newbie here.
My end goal is to have a CSV file of monthly stock prices. I've downloaded the data with getSymbols using this code:
Symbols <- c("DIS", "TSLA","ATVI", "MSFT", "FB", "ABT","AAPL","AMZN",
"BAC","NFLX","ADBE","WMT","SRE","T","MS")
Data <- new.env()
getSymbols(c("^GSPC",Symbols),from="2015-01-01",to="2020-12-01"
,periodicity="monthly",
env=Data)
The code above works fine. Now I need to create a data frame that only includes the adjusted prices for all the symbols, with a date column of course.
Any help, please? :)
Desired output: a data frame with a date column and one adjusted-price column per symbol.
Another straightforward way to get your monthly data:
tickers <- c('AMZN','FB','GOOG','AAPL')
getSymbols(tickers,periodicity="monthly")
head(do.call("merge.xts",c(lapply(mget(tickers),"[",,6),all=FALSE)),3)
AMZN.Adjusted FB.Adjusted GOOG.Adjusted AAPL.Adjusted
2012-06-01 228.35 31.10 288.9519 17.96558
2012-07-01 233.30 21.71 315.3032 18.78880
2012-08-01 248.27 18.06 341.2658 20.46477
Note that the logical argument all = FALSE is the equivalent of an inner join: you only get rows where all of your stocks have prices. all = TRUE fills data which is not available with NAs (outer join).
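The join behaviour is easy to see on toy data (a hypothetical two-series sketch, no download needed):

```r
library(xts)
# Two small series whose dates only partially overlap
a <- xts(1:3, order.by = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")))
b <- xts(4:5, order.by = as.Date(c("2020-02-01", "2020-03-01")))
merge.xts(a, b, all = FALSE)  # inner join: only the two shared dates survive
merge.xts(a, b, all = TRUE)   # outer join: 2020-01-01 kept, padded with NA in b
```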
To write the file (where monthlyPrices is the merged xts object from above) you can use:
write.zoo(monthlyPrices, file = 'filename.csv', sep = ',', quote = FALSE)
First get your data from the environment:
require(quantmod)
# your code
dat <- mget(ls(Data), env=Data)
Then extract the data from the objects:
newdat <- as.data.frame(sapply( names(dat), function(x) coredata(dat[[x]])[,1] ))
Note that this takes the opening values (see coredata(dat[[x]])[,1]); the objects contain more columns, e.g.:
names(dat[["AAPL"]])
[1] "AAPL.Open" "AAPL.High" "AAPL.Low" "AAPL.Close"
[5] "AAPL.Volume" "AAPL.Adjusted"
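Since the original question asked for adjusted prices, a small variation of the line above should do it: column 6 holds the Adjusted series, and quantmod's Ad() selects it by name (a sketch against the dat list built above):

```r
# Take column 6 (Adjusted) instead of column 1 (Open)
newdat <- as.data.frame(sapply(names(dat), function(x) coredata(dat[[x]])[, 6]))
# Equivalently, let quantmod pick the column by its ".Adjusted" suffix
newdat <- as.data.frame(sapply(names(dat), function(x) coredata(Ad(dat[[x]]))))
```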
Last, get the dates (assumes identical dates for all symbols):
rownames(newdat) <- index(dat[["AAPL"]])
# OR, more universal, by extracting from the complete list:
rownames(newdat) <-
as.data.frame( sapply( names(dat), function(x) as.character(index(dat[[x]])) ) )[,1]
head(newdat, 3)
AAPL ABT ADBE AMZN ATVI BAC DIS FB GSPC MS
2015-01-01 27.8475 45.25 72.70 312.58 20.24 17.99 94.91 78.58 2058.90 39.05
2015-02-01 29.5125 44.93 70.44 350.05 20.90 15.27 91.30 76.11 1996.67 33.96
2015-03-01 32.3125 47.34 79.14 380.85 23.32 15.79 104.35 79.00 2105.23 35.64
MSFT NFLX SRE T TSLA WMT
2015-01-01 46.66 49.15143 111.78 33.59 44.574 86.27
2015-02-01 40.59 62.84286 112.38 33.31 40.794 84.79
2015-03-01 43.67 67.71429 108.20 34.56 40.540 83.93
Writing the csv:
write.csv(newdat, "file.csv")
I am downloading some data using the R package tseries:
require('tseries')
tickers<- c('JPM','AAPL','MSFT','FB','GE');
prices <- matrix(NA, nrow = 40, ncol = 5)
startdate <- '2015-02-02'
enddate <- '2015-03-30' # 40 trading days in this range
for(i in 1:5){
prices[,i]<-get.hist.quote(
instrument=tickers[i],
start=startdate,
end=enddate,
quote='AdjClose',
provider='yahoo')
}
colnames(prices)<-c('JPM','AAPL','MSFT','FB','GE');
I want to construct a matrix holding the adjusted close prices together with the date information, but I don't know how to access the zoo date index. When I construct a zoo object using get.hist.quote() I can view the dates, but when I save the values into the matrix the date column is missing.
Here Map applied to get.hist.quote will create a zoo object for each ticker. Then we use zoo's multiway merge.zoo to merge them all together creating a final zoo object prices:
prices <- do.call(merge,
Map(get.hist.quote, tickers,
start=startdate,
end=enddate,
quote='AdjClose',
provider='yahoo')
)
I would probably keep all the series in a zoo object. This can be done as in the following code, thereby also avoiding your for-loop. You can always convert this object to a matrix with as.matrix() afterwards.
prices <-lapply(tickers, get.hist.quote, start=startdate, end=enddate, quote='AdjClose')
prices <- Reduce(cbind, prices)
names(prices) <- tickers
prices <- as.matrix(prices)
head(prices)
JPM AAPL MSFT FB GE
2015-02-02 55.10 118.16 40.99 74.99 23.99
2015-02-03 56.35 118.18 41.31 75.40 24.25
2015-02-04 56.01 119.09 41.54 75.63 23.94
2015-02-05 56.40 119.94 42.15 75.61 24.28
2015-02-06 57.51 118.93 42.11 74.47 24.30
2015-02-09 57.44 119.72 42.06 74.44 24.42
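If the goal is a CSV that keeps the dates, it may be easiest to export the zoo object before the as.matrix() step, since write.zoo turns the index into the first column (the file name here is just an example):

```r
# Export with the date index as the first CSV column
write.zoo(prices, file = "prices.csv", sep = ",")
```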
I am trying to read in a CSV file and change it to xts format. However, I am running into an issue: the CSV has the date and time fields in separate columns.
2012.10.30,20:00,1.29610,1.29639,1.29607,1.29619,295
2012.10.30,20:15,1.29622,1.29639,1.29587,1.29589,569
2012.10.30,20:30,1.29590,1.29605,1.29545,1.29574,451
2012.10.30,20:45,1.29576,1.29657,1.29576,1.29643,522
2012.10.30,21:00,1.29643,1.29645,1.29581,1.29621,526
2012.10.30,21:15,1.29621,1.29644,1.29599,1.29642,330
I am trying to pull it in with
euXTS <- as.xts(read.zoo(file="EURUSD15.csv", sep=",", format="%Y.%m.%d", header=FALSE))
But it gives me this warning message, so I think I somehow have to attach the time stamp, but I am not sure of the best way to do that.
Warning message:
In zoo(rval3, ix) :
Some methods for “zoo” objects do not work if the index entries in ‘order.by’ are not unique
It is better to use read.zoo to read your time series directly into a zoo object, which is easily coerced to an xts one:
library(xts)
ts.z <- read.zoo(text='2012.10.30,20:00,1.29610,1.29639,1.29607,1.29619,295
2012.10.30,20:15,1.29622,1.29639,1.29587,1.29589,569
2012.10.30,20:30,1.29590,1.29605,1.29545,1.29574,451
2012.10.30,20:45,1.29576,1.29657,1.29576,1.29643,522
2012.10.30,21:00,1.29643,1.29645,1.29581,1.29621,526
2012.10.30,21:15,1.29621,1.29644,1.29599,1.29642,330',
sep=',',index=1:2,tz='',format="%Y.%m.%d %H:%M")
as.xts(ts.z)
V3 V4 V5 V6 V7
2012-10-30 20:00:00 1.29610 1.29639 1.29607 1.29619 295
2012-10-30 20:15:00 1.29622 1.29639 1.29587 1.29589 569
2012-10-30 20:30:00 1.29590 1.29605 1.29545 1.29574 451
2012-10-30 20:45:00 1.29576 1.29657 1.29576 1.29643 522
2012-10-30 21:00:00 1.29643 1.29645 1.29581 1.29621 526
2012-10-30 21:15:00 1.29621 1.29644 1.29599 1.29642 330
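The same call should work reading from the actual file instead of inline text; the key parts are index = 1:2, which glues the two columns together into one timestamp, and the matching format string (header = FALSE assumed, as in the sample rows):

```r
library(xts)
# Read date and time columns (1 and 2) as a single POSIXct index
ts.z <- read.zoo(file = "EURUSD15.csv", sep = ",", header = FALSE,
                 index = 1:2, tz = "", format = "%Y.%m.%d %H:%M")
euXTS <- as.xts(ts.z)
```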
I'm using the quantmod package. I've got a vector of tickers like this:
c("AAPL","GOOG","IBM","GS","AMZN","GE")
and I want to create a function to calculate the EBIT margin of a stock (= operating income / total revenue). So for a given stock, I use the following piece of code, which only works for GE (provided a ".f" is added at the end of the ticker):
require(quantmod)
getFinancials("GE",period="A")
ebit.margin <- function(stock.ticker.f){
return(stock.ticker.f$IS$A["Operating Income",]/stock.ticker.f$IS$A["Total Revenue",])
}
ebit.margin("GE")
I would like to generalize this function in order to then use the apply function. There are several difficulties:
When applying the quantmod::getFinancials function to a ticker, the financial statements of the stock are saved in the default environment. viewFinancials then has to be used to get and print the financial statements. I need a way to access the financial statements directly inside the function.
The function's argument is a string like "GE.f", but it would be more convenient to enter the ticker directly ("GE"). I've tried to use paste0 and gsub to build a string like "GE.f", but it doesn't work because the string "GE.f" doesn't belong to the financials class.
To sum up, I'm a bit lost...
It's easier if you use auto.assign=FALSE
s <- c("AAPL","GOOG","IBM","GS","AMZN","GE")
fin <- lapply(s, getFinancials, auto.assign=FALSE)
names(fin) <- s
lapply(fin, function(x) x$IS$A["Operating Income", ] / x$IS$A["Total Revenue",])
#$AAPL
#2012-09-29 2011-09-24 2010-09-25 2009-09-26
# 0.3529596 0.3121507 0.2818704 0.2736278
#
#$GOOG
#2012-12-31 2011-12-31 2010-12-31 2009-12-31
# 0.2543099 0.3068724 0.3540466 0.3514585
#
#$IBM
#2012-12-31 2011-12-31 2010-12-31 2009-12-31
# 0.2095745 0.1964439 0.1974867 0.1776439
#
#$GS
#2012-12-31 2011-12-31 2010-12-31 2009-12-31
#0.2689852 0.1676678 0.2804621 0.3837401
#
#$AMZN
#2012-12-31 2011-12-31 2010-12-31 2009-12-31
#0.01106510 0.01792957 0.04110630 0.04606471
#
#$GE
#2012-12-31 2011-12-31 2010-12-31 2009-12-31
#0.11811969 0.13753327 0.09415548 0.06387029
Another option is to load your tickers into a new environment.
tickers <- new.env()
s <- c("AAPL","GOOG","IBM","GS","AMZN","GE")
lapply(s, getFinancials,env=tickers)
sapply(ls(envir=tickers),
function(x) {x <- get(x, envir=tickers) ## fetch the object by its name
x$IS$A["Operating Income", ] / x$IS$A["Total Revenue",]})
AAPL.f AMZN.f GE.f GOOG.f GS.f IBM.f
2012-09-29 0.3529596 0.01106510 0.11811969 0.2543099 0.2689852 0.2095745
2011-09-24 0.3121507 0.01792957 0.13753327 0.3068724 0.1676678 0.1964439
2010-09-25 0.2818704 0.04110630 0.09415548 0.3540466 0.2804621 0.1974867
2009-09-26 0.2736278 0.04606471 0.06387029 0.3514585 0.3837401 0.1776439
EDIT
No need to use ls and get: just use the handy eapply (thanks @GSee), which applies FUN to the named values from an environment and returns the results as a list:
eapply(tickers, function(x)
x$IS$A["Operating Income", ] / x$IS$A["Total Revenue",])
R beginner with what seems to be a pretty simple problem:
I have a number of email logs that I have read into R in the format:
>log1
Date Time From To
1 2000-01-01 00:00:00 bob#mail.com test1#mail.com
2 2000-01-02 01:00:00 carolyn #mail.com test2#mail.com
3 2000-01-03 02:00:00 chris#mail.com test3#mail.com
4 2000-01-04 03:00:00 chris #mail.com test4#mail.com
5 2000-01-05 04:00:00 alan#mail.com test5#mail.com
6 2000-01-06 05:00:00 alan.#mail.com test6#mail.com
I need to change log1$From and log1$To to a global unique numeric identifier, such that when I read in other logs later any given email address will receive the same identifier as previous logs.
I have tried:
id <- as.numeric(as.character(log1[,3]))
id <- as.numeric(levels(log1[,3]))
id <- charToRaw(log1[,4])
Would some kind soul please help me out – Thanks!
Apologies, I should probably have included this:
Date=c( "01/01/2000" ,"02/01/2000" ,"03/01/2000", "04/01/2000" ,"05/01/2000" ,"06/01/2000","07/01/2000","08/01/2000",
"09/01/2000","10/01/2000","11/01/2000", "12/01/2000" ,"13/01/2000", "14/01/2000", "15/01/2000","16/01/2000"
,"17/01/2000","18/01/2000","19/01/2000","20/01/2000","01/01/2000","02/01/2000")
Time=c("00:00:00","01:00:00","02:00:00", "03:00:00" ,"04:00:00" ,"05:00:00", "06:00:00" ,"07:00:00", "08:00:00", "09:00:00" ,"10:00:00",
"11:00:00", "12:00:00","13:00:00", "14:00:00","15:00:00","16:00:00","17:00:00","18:00:00","19:00:00","00:00:00" ,"00:00:00")
From=c("bob.shults#mail.com","carolyn.green#mail.com","chris.long#mail.com","christi.nicolay#mail.com","alan.aronowitz#mail.com","alan.comnes#mail.com",
"dab#sprintmail.com","ana.correa#mail.com","andrew.fastow#mail.com","elena.kapralova#mail.com","bob.shults#mail.com","carolyn.green#mail.com",
"chris.long#mail.com","christi.nicolay#mail.com","alan.aronowitz#mail.com","alan.comnes#mail.com","dab#sprintmail.com","ana.correa#mail.com",
"andrew.fastow#mail.com","elena.kapralova#mail.com","bob.shults#mail.com","bob.shults#mail.com")
To=c("ana.correa#mail.com","test2#mail.com","test3#mail.com","test4#mail.com","test5#mail.com","test6#mail.com","test7#mail.com",
"test8#mail.com","test9#mail.com","test10#mail.com","test11#mail.com","test12#mail.com","test13#mail.com","test14#mail.com",
"test15#mail.com","test16#mail.com","test17#mail.com","test18#mail.com","test19#mail.com","test20#mail.com","ana.correa#mail.com","ana.correa#mail.com")
log<-data.frame(Date=Date,Time=Time,From=From,To=To)
Attempt at using MD5 to generate globally unique identifiers: Note how the identifier for ana.correa#mail.com is a correct match within ID_to but is not within ID_from
ID_to<-data.frame()
ID_from<-data.frame()
for (i in 1:nrow(log)){
to<-as.numeric(paste('0x', substr(rep(hmac('secret',log[i,4], algo='md5'), 2), c(1, 9, 17, 25), c(8, 16, 24, 32)),sep=""))
(ID_to<-rbind(ID_to,to))
from<-as.numeric(paste('0x', substr(rep(hmac('secret',log[i,3], algo='md5'), 2), c(1, 9, 17, 25),c(8, 16, 24, 32)),sep=""))
(ID_from<-rbind(ID_from,from))
}
ID_to[,3]<-paste(ID_to[,1],ID_to[,2], sep="")
ID_from[,3]<-paste(ID_from[,1],ID_from[,2], sep="")
edgelist<-data.frame(ID_from[,3],log[,3],ID_to[,3],log[,4],log[,1],log[,2])
print(edgelist)
ID_from...3. log...3. ID_to...3. log...4. log...1. log...2.
27488842661591306920 bob.shults#mail.com 18727221862165338513 ana.correa#mail.com 01/01/2000 00:00:00
38124472891255273775 carolyn.green#mail.com 1251903296725454474 test2#mail.com 02/01/2000 01:00:00
29070047663451376630 chris.long#mail.com 17074276751156451031 test3#mail.com 03/01/2000 02:00:00
8261398433828474582 christi.nicolay#mail.com 1563683670909194033 test4#mail.com 04/01/2000 03:00:00
18727221862165338513 alan.aronowitz#mail.com 26735368323826533112 test5#mail.com 05/01/2000 04:00:00
5680838251168988404 alan.comnes#mail.com 2923605896229594830 test6#mail.com 06/01/2000 05:00:00
2351312285811012730 dab#sprintmail.com 17171333544033270402 test7#mail.com 07/01/2000 06:00:00
328278708432069254 ana.correa#mail.com 33840664403556851587 test8#mail.com 08/01/2000 07:00:00
1127901879852039037 andrew.fastow#mail.com 1973548136161209824 test9#mail.com 09/01/2000 08:00:00
7349515121496417787 elena.kapralova#mail.com 5680838251168988404 test10#mail.com 10/01/2000 09:00:00
27488842661591306920 bob.shults#mail.com 328278708432069254 test11#mail.com 11/01/2000 10:00:00
38124472891255273775 carolyn.green#mail.com 1127901879852039037 test12#mail.com 12/01/2000 11:00:00
29070047663451376630 chris.long#mail.com 27488842661591306920 test13#mail.com 13/01/2000 12:00:00
8261398433828474582 christi.nicolay#mail.com 38124472891255273775 test14#mail.com 14/01/2000 13:00:00
18727221862165338513 alan.aronowitz#mail.com 29070047663451376630 test15#mail.com 15/01/2000 14:00:00
5680838251168988404 alan.comnes#mail.com 8261398433828474582 test16#mail.com 16/01/2000 15:00:00
2351312285811012730 dab#sprintmail.com 2351312285811012730 test17#mail.com 17/01/2000 16:00:00
328278708432069254 ana.correa#mail.com 7349515121496417787 test18#mail.com 18/01/2000 17:00:00
1127901879852039037 andrew.fastow#mail.com 41762759923562968495 test19#mail.com 19/01/2000 18:00:00
7349515121496417787 elena.kapralova#mail.com 24894056753582090007 test20#mail.com 20/01/2000 19:00:00
27488842661591306920 bob.shults#mail.com 18727221862165338513 ana.correa#mail.com 01/01/2000 00:00:00
27488842661591306920 bob.shults#mail.com 18727221862165338513 ana.correa#mail.com 02/01/2000 00:00:00
Attempt at levels/factor method:
Getting an error:
log <- union(levels(log[,3]), levels(log[,4]))
>Error in emails[, 3] : incorrect number of dimensions
You can use MD5 to generate globally unique identifiers since it has a very low probability of collisions, but since its output is 128-bit you need a few numbers to represent it (four integers in 32-bit R, two integers in 64-bit R). This should be easy to deal with using short numeric vectors, though.
Here is how you can generate such a vector of four integers for an email address (or any other string for that matter):
library(digest)
email <- 'test1#gmail'
as.numeric(paste('0x', substr(rep(hmac('secret56f8a7', email, algo='md5'), 4), c(1, 9, 17, 25), c(8, 16, 24, 32)), sep=''))
You could use algo='crc32' and obtain just one integer, but this isn't recommended since collisions are far more likely with CRC.
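A quick sanity check of the idea: hashing is deterministic, so the same address always maps to the same four-integer ID (this just wraps the expression above in a helper, reusing the same example secret):

```r
library(digest)
# Deterministic: identical input -> identical ID vector
id <- function(email) {
  h <- hmac('secret56f8a7', email, algo = 'md5')
  as.numeric(paste('0x', substr(rep(h, 4), c(1, 9, 17, 25), c(8, 16, 24, 32)), sep = ''))
}
identical(id('ana.correa#mail.com'), id('ana.correa#mail.com'))  # TRUE
```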
You need to create a unique ID for every email in your logs. One way would be to calculate the CRC checksum of every email and use that as an identifier, but it will be a very long number. Or you could implement a hashmap in R and make the email the key of the hashmap.
I think this will do what you want, and it's efficient, and you can do it using only base packages...
Procedure:
1. Convert both columns to factors.
2. Union the factor levels, in exactly the same way, so that each email has a unique ID in the factor levels.
3. Change the entries in each column to the number corresponding to their factor level. As a result, we can identify the times when "test1#gmail.com" sent and received emails by simply looking up "1" in both columns.
log1$From <- as.factor(log1$From)
log1$To <- as.factor(log1$To)
emails <- union(levels(log1$From), levels(log1$To))
levels(log1$From) <- emails
levels(log1$To) <- emails
log1$From <- as.numeric(log1$From)
log1$To <- as.numeric(log1$To)
It will probably be a good idea to keep a record of the original email addresses, as I have done here. Then if you were interested in, say, which emails test1#gmail.com sent:
log1[log1$From == which(emails == "test1#gmail.com"), ]
should do the trick! You can write a procedure to make that look much cleaner as well...
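To make the IDs global across several logs (the factor approach above only numbers addresses within one log), one simple sketch is to keep a master vector of addresses and match() each new log against it; master here is a hypothetical variable you would save between runs:

```r
# Keep a running master list of addresses; position in it = global ID
master <- character(0)
assign_ids <- function(addresses, master) {
  master <- c(master, setdiff(addresses, master))  # append unseen addresses
  list(ids = match(addresses, master), master = master)
}
res <- assign_ids(as.character(log1$From), master)
log1$FromID <- res$ids
master <- res$master   # carry forward (e.g. via saveRDS) for the next log
```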