log return calculation from matrix - r

when I have a dataframe named "Historical_Stock_Prices_R" like this
Date1 MSFT AAPL GOOGL
25-01-05 21.03 4.87 88.56
26-01-05 21.02 4.89 94.62
27-01-05 21.10 4.91 94.04
28-01-05 21.16 5.00 95.17
I use the following formulas to get a lsit of monthly max and monthly mean log return from daily price data file
return<- cbind.data.frame(date=Historical_Stock_Prices_R$Date1[2:nrow(Historical_Stock_Prices_R)],apply(Historical_Stock_Prices_R[,2:4],2,function(x) log(x[-1]/x[-length(x)])*100))
return$Date <- as.Date(return$date,format="%d-%m-%y")
RMax <- aggregate(return[,-1],
by=list(Month=format(return$Date,"%y-%m")),
FUN=max)
RMean <- aggregate(return[,-1],
by=list(Month=format(return$Date,"%y-%m")),
FUN=mean)
But now I have a matrix (not a dataframe) named "df" like this
AAPL.High ABT.High ABBV.High ACN.High ADBE.High
07-01-02 NA NA NA NA NA
03-01-07 12.37 24.74 NA 37 41.32
04-01-07 12.28 25.12 NA 37.23 41
05-01-07 12.31 25 NA 36.99 40.9
Now how can I calculate same monthly mean and monthly max using similar kind of code?

Related

Reshaping data for panel regression from Datastream

I have downloaded data from Datastream in form one variable per sheet.
Current data view - One variable: Price
What I want to do it to convert each sheet (each variable) into panel format so that I can use plm() or export data to Stata (I am kind of new to R), so that it looks like
Click to view - What I expect to have
One conundrum is that I have >500 companies and manually writting the names (or codes) in the R code is very burdensome
I would really appreciate if you could sketch a basic code and not just refer to reshape function in R.
P.S. Sorry for posting this question if it was already answered.
Your current data set is in wide format and you need it in long format and melt function from reshape package will do very well
The primary key for melt function is date since it is the same for all companies
I have assumed a test dataset for the below demo:
#Save Price, volume, market value, shares, etc into individual CSV files
#Rename first column as "date" and Remove rows 2 and 3 since you do not need them
#Demo for price data
price_data = read.csv("path_to_price_csv_file",header=TRUE,stringsAsFactors=FALSE,na.strings="NA")
test_DF = price_data
require(reshape2)
require(PerformanceAnalytics)
data(managers)
test_DF = data.frame(date=as.Date(index(managers),format="%Y-%m-%d"),managers,row.names=NULL,stringsAsFactors=FALSE)
#This data is similar in format as your price data
head(test_DF)
# date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC.LS.EQ SP500.TR US.10Y.TR US.3m.TR
# 1 1996-01-31 0.0074 NA 0.0349 0.0222 NA NA NA 0.0340 0.00380 0.00456
# 2 1996-02-29 0.0193 NA 0.0351 0.0195 NA NA NA 0.0093 -0.03532 0.00398
# 3 1996-03-31 0.0155 NA 0.0258 -0.0098 NA NA NA 0.0096 -0.01057 0.00371
# 4 1996-04-30 -0.0091 NA 0.0449 0.0236 NA NA NA 0.0147 -0.01739 0.00428
# 5 1996-05-31 0.0076 NA 0.0353 0.0028 NA NA NA 0.0258 -0.00543 0.00443
# 6 1996-06-30 -0.0039 NA -0.0303 -0.0019 NA NA NA 0.0038 0.01507 0.00412
#test_data = test_DF #replace price, volume , shares dataset here
#dateColumnName = "date" #name of your date column
#columnOfInterest1 = "manager" #for you this will be "Name"
#columnOfInterest2 = "return" #this will vary according to your input data, price, volume, shares etc.
Custom_Melt_DataFrame = function(test_data = test_DF ,dateColumnName = "date", columnOfInterest1 = "manager",columnOfInterest2 = "return") {
molten_DF = melt(test_data,dateColumnName,stringsAsFactors=FALSE)
colnames(molten_DF) = c(dateColumnName,columnOfInterest1,columnOfInterest2)
#format as character
molten_DF[,columnOfInterest1] = as.character(molten_DF[,columnOfInterest1])
#assign index
molten_DF$index = rep(1:(ncol(test_data)-1),each=nrow(test_data))
#reorder columns
molten_DF = molten_DF[,c("index",columnOfInterest1,dateColumnName,columnOfInterest2)]
return(molten_DF)
}
custom_data = Custom_Melt_DataFrame (test_data = test_DF ,dateColumnName = "date", columnOfInterest1 = "manager",columnOfInterest2 = "return")
head(custom_data,10)
# index manager date return
# 1 1 HAM1 1996-01-31 0.0074
# 2 1 HAM1 1996-02-29 0.0193
# 3 1 HAM1 1996-03-31 0.0155
# 4 1 HAM1 1996-04-30 -0.0091
# 5 1 HAM1 1996-05-31 0.0076
# 6 1 HAM1 1996-06-30 -0.0039
# 7 1 HAM1 1996-07-31 -0.0231
# 8 1 HAM1 1996-08-31 0.0395
# 9 1 HAM1 1996-09-30 0.0147
# 10 1 HAM1 1996-10-31 0.0288
tail(custom_data,10)
# index manager date return
# 1311 10 US.3m.TR 2006-03-31 0.00385
# 1312 10 US.3m.TR 2006-04-30 0.00366
# 1313 10 US.3m.TR 2006-05-31 0.00404
# 1314 10 US.3m.TR 2006-06-30 0.00384
# 1315 10 US.3m.TR 2006-07-31 0.00423
# 1316 10 US.3m.TR 2006-08-31 0.00441
# 1317 10 US.3m.TR 2006-09-30 0.00456
# 1318 10 US.3m.TR 2006-10-31 0.00381
# 1319 10 US.3m.TR 2006-11-30 0.00430
# 1320 10 US.3m.TR 2006-12-31 0.00441

Lag of the value, take the previous value logic

Here is what I try to achieve on whatiwant column:
df1 <- data.frame(value = c(99.99,99.98,99.97,99.96,99.95,99.94,
99.93,99.92,99.91,99.9,99.9,99.9),
new_value = c(NA,NA,99.98,NA,99.97,NA,
NA,NA,NA,NA,NA,NA),
whatiswant = c(99.99,99.96,99.98,99.95,99.97,99.94,
99.93,99.92,99.91,99.9,99.9,99.9))
To explain it with words whatiswant should have the value of new_value and for those not having the new_value, it should take the next lowest value available.
I think it is kind of a lag stuff. Here is the data.frame:
value new_value whatiswant
1 99.99 NA 99.99
2 99.98 NA 99.96
3 99.97 99.98 99.98
4 99.96 NA 99.95
5 99.95 99.97 99.97
6 99.94 NA 99.94
7 99.93 NA 99.93
8 99.92 NA 99.92
9 99.91 NA 99.91
10 99.90 NA 99.90
11 99.90 NA 99.90
12 99.90 NA 99.90
EDIT: Logic explained in following steps:
Step 1. if new_value is not NA then col3 takes the value. So the 3rd and
5th row are sorted.
Step 2. 1st row col3 takes the value of col1, as col2 is NA.
Step 3. 2nd row col3 takes the value of col1-row4, as 2nd and 3nd
row values for col1 is already used in Step 1.
Step 4. 4th row col3 takes the value of col1-row5, as all above rows
from col1 are taken in previous steps.
Step 5. The rest of the rows6-12 in col3 take the same value from
col1-rows6-12 as col2 is NA and non of the numbers col1-row6-12 are
used in previous steps.
In form of a function, each step in comment, ask if it's unclear:
t1 <- function(df) {
df[,'whatiswant'] <- df[,'new_value'] # step 1, use value of new_value
sapply(1:nrow(df),function(row) { # loop on each row
x <- df[row,] # take the row, just to use a single var instead later
ret <- unlist(x['whatiswant']) # initial value
if(is.na(ret)) { # If empty
if (x['value'] %in% df$whatiswant) { # test if corresponding value is already present
ret <- df$value[!df$value %in% df$whatiswant][1] # If yes take the first value not present
} else {
ret <- unlist(x['value']) # if not take this value
}
}
if(is.na(ret)) ret <- min(df$value) # No value left, take the min
df$whatiswant[row] <<- ret # update the df from outside sapply so the next presence test is ok.
})
return(df) # return the updated df
}
Output:
>df1[,3] <- NA # Set last column to NA
> res <- t1(df1)
> res
value new_value whatiswant
1 99.99 NA 99.99
2 99.98 NA 99.96
3 99.97 99.98 99.98
4 99.96 NA 99.95
5 99.95 99.97 99.97
6 99.94 NA 99.94
7 99.93 NA 99.93
8 99.92 NA 99.92
9 99.91 NA 99.91
10 99.90 NA 99.90
11 99.90 NA 99.90
12 99.90 NA 99.90

How to fetch 3-years historical price serie from Oanda with R?

I would like to process Bitcoin price in R but I'm unable to download time serie from Yahoo and Google.
From Yahoo the BTCUSD historical time serie is missing and the Google doesn't recognize the URL formated by getSymbols when symbol is "CURRENCY:EURUSD". I know R expect the ":" to be a list so I applied a workaround I found in Stakeoverflow to turn CURRENCY:EURUSD in CURRENCY.EURUSD but still Google cannot process the request.
Download from Oanda works like a charm but request cannot exceed 500 days. I try this workaround to bypass the limitation but it fails to populate correctly the prices object in which I have others symbols :
for some reason BTCUSD prices are missing for 2012 and part of 2013
also there are symbols from symbols's list that get NA with the wo.
tail(prices) (with the loop bellow)
UUP FXB FXE FXF FXY SLV GLD BTC
2014-08-31 NA NA NA NA NA NA NA 506.809
2014-09-30 22.87 159.33 124.48 102.26 88.80 16.35 116.21 375.386
2014-10-31 23.09 157.20 123.49 101.45 86.65 15.50 112.66 341.852
2014-11-30 NA NA NA NA NA NA NA 378.690
2014-12-31 23.97 153.06 119.14 98.16 81.21 15.06 113.58 312.642
2015-01-24 NA NA NA NA NA NA NA 229.813
Extract of print(prices) (with the loop bellow)
2013-06-28 22.56 150.17 128.93 103.92 98.63 18.97 119.11 NA
2013-07-31 22.09 150.12 131.74 105.99 99.93 19.14 127.96 NA
2013-08-30 22.19 152.93 130.84 105.45 99.63 22.60 134.62 NA
2013-09-30 21.63 159.70 133.85 108.44 99.47 20.90 128.18 133.794
2013-10-31 21.63 158.10 134.29 108.03 99.38 21.10 127.74 203.849
2013-11-30 NA NA NA NA NA NA NA 1084.800
2013-12-31 21.52 163.30 135.99 109.82 92.76 18.71 116.12 758.526
2014-01-31 21.83 161.95 133.29 108.00 95.58 18.45 120.09 812.097
tail(prices) (without the loop bellow)
UUP FXB FXE FXF FXY SLV GLD
2014-08-29 22.02 163.23 129.54 106.42 93.61 18.71 123.86
2014-09-30 22.87 159.33 124.48 102.26 88.80 16.35 116.21
2014-10-31 23.09 157.20 123.49 101.45 86.65 15.50 112.66
2014-11-28 23.47 153.46 122.46 101.00 82.01 14.83 112.11
2014-12-31 23.97 153.06 119.14 98.16 81.21 15.06 113.58
2015-01-23 25.21 147.23 110.33 110.95 82.57 17.51 124.23
What is wrong with this code ? Tx !
require(quantmod)
require(PerformanceAnalytics)
symbols <- c(
"UUP",
"FXB",
"FXE",
"FXF",
"FXY",
"SLV",
"GLD"
)
getSymbols(symbols, from="2004-01-01")
prices <- list()
for(i in 1:length(symbols)) {
prices[[i]] <- Cl(get(symbols[i]))
}
BTC <- list()
for(i in 1:2) {
BTC[[1]] <- getFX("BTC/USD",
from = Sys.Date() -499 * (i + 1),
to = Sys.Date() - 499 * i,
env = parent.frame(),
auto.assign = FALSE)
}
BTC[[1]] <- getFX("BTC/USD",
from = Sys.Date() -499,
to = Sys.Date(),
env = parent.frame(),
auto.assign = FALSE)
prices[[length(symbols)+1]] <- BTC[[1]]
prices <- do.call(cbind, prices)
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))
ep <- endpoints(prices, "months")
prices <- prices[ep,]
prices <- prices["1997-03::"]
Your for loop isn't using i, and then after the for loop you're overwriting the results (the list was of length 1 because BTC[[1]] was hardcoded)
Try this
btc <- do.call(rbind, lapply(0:2, function(i) {
getFX("BTC/USD",
from = Sys.Date() -499 * (i + 1),
to = Sys.Date() - 499 * i,
env=NULL)
}))
prices <- do.call(cbind, c(prices, list(btc)))
Edit: Here's a more complete example
library(quantmod)
# Use tryCatch() in case we try to get data too far in the past that
# Oanda doesn't provide. Return NULL if there is an error, and Filter
# to only include data that has at least 1 row.
btc <- do.call(rbind, Filter(NROW, lapply(0:5, function(i) {
tryCatch(getFX("BTC/USD",
from = Sys.Date() -499 * (i + 1),
to = Sys.Date() - 499 * i,
env=NULL), error=function(e) NULL)
})))
symbols <- c(
"UUP",
"FXB",
"FXE",
"FXF",
"FXY",
"SLV",
"GLD"
)
e <- new.env()
getSymbols(symbols, from=start(btc), env=e)
prices <- do.call(cbind, c(eapply(e, Cl)[symbols], list(btc)))
colnames(prices) <- gsub("\\.[A-z]*", "", colnames(prices))
head(na.locf(prices)[endpoints(prices, "months")])
# UUP FXB FXE FXF FXY SLV GLD BTC
#2010-07-31 23.74 156.15 129.88 95.38 114.60 17.58 115.49 0.06386
#2010-08-31 24.12 152.60 126.25 97.80 117.83 18.93 122.08 0.06441
#2010-09-30 22.84 156.33 135.81 101.00 118.57 21.31 127.91 0.06194
#2010-10-31 22.37 159.45 138.69 100.81 122.93 24.17 132.62 0.18530
#2010-11-30 23.50 154.72 129.30 98.87 118.16 27.44 135.42 0.27380
#2010-12-31 22.71 155.77 133.09 106.25 121.75 30.18 138.72 0.29190

Calling a list of tickers in quantmod using R

I want to get some data from a list of Chinese stocks using quantmod.
The list is like below:
002705.SZ -- 002730.SZ (in this sequence, there are some tickers matched with Null stock, for example, there is no stock called 002720.SZ)
300357.SZ -- 300402.SZ
603188.SS
603609.SS
603288.SS
603306.SS
603369.SS
I want to write a loop to run all these stocks to get the data from each of them and save them into one data frame.
This should get you started.
library(quantmod)
library(stringr) # for str_pad
stocks <- paste(str_pad(2705:2730,width=6,side="left",pad="0"),"SZ",sep=".")
get.stock <- function(s) {
s <- try(Cl(getSymbols(s,auto.assign=FALSE)),silent=T)
if (class(s)=="xts") return(s)
return (NULL)
}
result <- do.call(cbind,lapply(stocks,get.stock))
head(result)
# X002705.SZ.Close X002706.SZ.Close X002707.SZ.Close X002708.SZ.Close X002709.SZ.Close X002711.SZ.Close X002712.SZ.Close X002713.SZ.Close
# 2014-01-21 15.25 27.79 NA 17.26 NA NA NA NA
# 2014-01-22 14.28 28.41 NA 16.56 NA NA NA NA
# 2014-01-23 13.65 27.78 33.62 15.95 19.83 NA 36.58 NA
# 2014-01-24 15.02 30.56 36.98 17.55 21.81 NA 40.24 NA
# 2014-01-27 14.43 31.26 40.68 18.70 23.99 26.34 44.26 NA
# 2014-01-28 14.18 30.01 44.75 17.66 25.57 28.97 48.69 NA
This takes advantage of the fact that getSymbols(...) returns either an xts object, or a character string with an error message if the fetch fails.
Note that cbind(...) for xts objects aligns according to the index, so it acts like merge(...).
This produces an xts object, not a data frame. To convert this to a data.frame, use:
result.df <- data.frame(date=index(result),result)

selecting certain rows from R data frame

I have this huge data frame that has servernames, Date, CPU, memory as the headers. There are multiple servers names. I would like to be able to select certain server name order by the date column and create time serious graphs
this is a small subset of the data frame:
Hostname Date 5 60 61 CPUAVG CPUAVG+Sev CPUMaximum MemoryAVG
1 server1 2012-01-29 01:00:00 23.79 NA NA 2.33 0.72 2.33 23.76
2 server1 2012-01-29 02:00:00 23.91 NA NA 2.86 2.38 2.86 23.82
3 server1 2012-01-29 03:00:00 25.65 NA NA 6.25 9.59 6.25 24.85
4 server2 2012-01-29 04:00:00 26.30 NA NA 18.41 31.09 18.41 25.87
5 server3 2012-01-29 05:00:00 24.33 NA NA 1.92 0.42 1.92 24.24
6 server3 2012-01-29 06:00:00 24.40 NA NA 2.65 1.79 2.65 24.31
Checkout the 'subset' command.
thisServer <- subset (servers, Hostname="server1")
Then to order the rows
thisServerSorted <- thisServer[order(thisServer$Date),]
Then you can plot from there.
#convert Date to a date field (if needed)
library(lubridate)
servers$Date <- ymd_hms(servers$Date)
#select the servers you need
SelectedServers <- subset(servers, Hostname %in% c("server1", "server3"))
library(ggplot2)
#no need for sorting with ggplot2
ggplot(SelectedServers, aes(x = Date, y = CPUAVG, colour = Hostname)) + geom_line()
ggplot(SelectedServers, aes(x = Date, y = CPUAVG)) + geom_line() + facet_wrap(~Hostname)

Resources