Using a variable to add a data frame column in R

I am trying to achieve the following
stocks <- c('AXP', 'VZ', 'V')
library('quantmod')
getSymbols(stocks)
The above command creates three objects named AXP, VZ, and V
prices <- data.frame(stringsAsFactors=FALSE)
Here I am trying to create a column named after each ticker (e.g. AXP).
The following should add 3 columns to the frame, named AXP, VZ, and V, with data from
AXP$AXP.Adjusted, VZ$VZ.Adjusted, V$V.Adjusted
for (ticker in stocks)
{
prices$ticker <- ticker$ticker.Adjusted
}
How do I achieve this? R gives an error like this when I try this
Error in ticker$ticker.Adjusted :
$ operator is invalid for atomic vectors
Any ideas?
Thanks in advance

Here is a simpler way to do this
do.call('cbind', lapply(mget(stocks), function(d) d[,6]))
Explanation:
1. mget(stocks) gets the three data frames as a list.
2. lapply extracts the 6th column, which contains the variable of interest.
3. do.call passes the list from (2) to cbind, which binds them together as columns.
NOTE: This solution does not take care of the different number of columns in the data frames.

I did not understand your question before; now I think I understand what you want.
What you wrote does not work because the object ticker is a character string, and $ does not evaluate the name on either side of it. If you want to get the object named by that string, you have to evaluate the parsed text.
Try this:
for (ticker in stocks){
prices <- cbind(prices, eval(parse(text=ticker))[,paste0(ticker, ".", "Adjusted")])
}
This will give you:
An ‘xts’ object on 2007-01-03/2014-01-28 containing:
Data: num [1:1780, 1:4] 53.4 53 52.3 52.8 52.5 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "AXP.Adjusted" "AXP.Adjusted.1" "VZ.Adjusted" "V.Adjusted"
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2014-01-29 01:06:51"
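As a rough sketch of the same idea without eval(parse()), get() can look up an object by name and [[ ]] can assign a column whose name is held in a variable. The two small data frames below are made-up stand-ins for what getSymbols() would create, so that the example runs without a download; the numbers are not real prices.

```r
# Toy stand-ins for the objects that getSymbols() would create (made-up numbers).
AXP <- data.frame(AXP.Close = c(10, 11, 12), AXP.Adjusted = c(9.5, 10.5, 11.5))
VZ  <- data.frame(VZ.Close  = c(20, 21, 22), VZ.Adjusted  = c(19.5, 20.5, 21.5))
stocks <- c("AXP", "VZ")

# Start from a frame with the right number of rows but no columns.
prices <- data.frame(row.names = seq_len(nrow(AXP)))

for (ticker in stocks) {
  obj <- get(ticker)                   # look up the object named by the string
  col <- paste0(ticker, ".Adjusted")   # build the column name, e.g. "AXP.Adjusted"
  prices[[ticker]] <- obj[, col]       # [[ ]] accepts a variable; $ takes its name literally
}

print(prices)
```

This only works as written when every object has the same number of rows, which is true of the toy data here but not necessarily of real downloads.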

One problem you're going to have is that the three downloads have different numbers of rows, so binding them all into a single data frame will fail.
The code below uses the last 1000 rows of each file (most recent), and does not use loops.
stocks <- c('AXP', 'VZ', 'V')
library('quantmod')
getSymbols(stocks)
prices=do.call(data.frame,
lapply(stocks,
function(s)tail(get(s)[,paste0(s,".Adjusted")],1000)))
colnames(prices)=stocks
head(prices)
# AXP VZ V
# 2010-02-08 34.70 21.72 80.58
# 2010-02-09 35.40 22.01 80.79
# 2010-02-10 35.60 22.10 81.27
# 2010-02-11 36.11 22.23 82.73
# 2010-02-12 36.23 22.15 82.38
# 2010-02-16 37.37 22.34 83.45
Working from the inside out, s is the ticker (so, e.g., "AXP"); get(s) returns the object with that name, so AXP; get(s)[,paste0(s,".Adjusted")] is equivalent to AXP[,"AXP.Adjusted"]; tail(...,1000) returns the last 1000 rows of .... So when s="AXP", the function returns the last 1000 rows of AXP$AXP.Adjusted.
lapply(...) applies that function to each element in stocks.
do.call(data.frame,...) invokes the data.frame function with the list of columns returned by lapply(...).
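The same inside-out logic can be tried offline. This is only a sketch: the two data frames are made-up stand-ins with deliberately different row counts, and n is a hypothetical name for the number of trailing rows to keep.

```r
# Toy stand-ins with different row counts (made-up numbers).
AXP <- data.frame(AXP.Adjusted = c(1, 2, 3, 4, 5))
VZ  <- data.frame(VZ.Adjusted  = c(10, 20, 30))
stocks <- c("AXP", "VZ")

n <- 3  # keep only the last n rows of each object so the columns line up

# For each ticker: fetch the object, pick its ".Adjusted" column, keep the tail.
cols <- lapply(stocks, function(s) tail(get(s)[, paste0(s, ".Adjusted")], n))

# Call data.frame() with the list elements as its arguments, then name the columns.
prices <- do.call(data.frame, cols)
colnames(prices) <- stocks

print(prices)
```

Trimming to a common length is what lets data.frame() accept the columns; without tail() the call would stop because the two columns have 5 and 3 rows.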

Related

how to get same result without for loop in r?

I am looking for another way to achieve the same result because the for statement is too slow.
I have the following data frame.
'data.frame': 50000 obs. of 2 variables:
$ user_id: chr "user1#test.com" "user2#test.com" ......
$ result : logi NA NA ......
Function f takes a user ID and returns a specific result.
f <- function(user_id){
......
return(json_result)
}
The result I want is as follows.
'data.frame': 50000 obs. of 2 variables:
$ user_id: chr "user1#test.com" "user2#test.com" ......
$ result : chr "{....}" "{....}" ......
I am running a loop like the code below, but the speed is too slow.
for (t in df$user_id) {
print(t)
df$result[df$user_id==t] <- f(t)
}
It takes about 3 seconds per user, and 3*50000 seconds to get a total of 50,000 users.
Is there any other way to get results faster?
You're looking for the lapply function:
df$result <- lapply(df$user_id, f)
Alternatively, you can use purrr's map functions.
library(tidyverse)
purrr::map(df$user_id, f)
This will output a list where each element is the output of the function call. Depending on the output of your function, you could use a map variant to output a vector of some type. You can read about this in the docs: https://purrr.tidyverse.org/reference/map.html
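If you want a plain character column rather than a list, base R's vapply is one option. A minimal sketch, assuming f returns exactly one string per user; the f below is a hypothetical stand-in, not your real function:

```r
# Hypothetical stand-in for f(): returns one JSON-like string per user ID.
f <- function(user_id) sprintf('{"user":"%s"}', user_id)

df <- data.frame(user_id = c("user1#test.com", "user2#test.com"),
                 result = NA,
                 stringsAsFactors = FALSE)

# vapply checks that every call returns a single character value,
# and gives back a character vector instead of a list.
df$result <- vapply(df$user_id, f, character(1), USE.NAMES = FALSE)

str(df)
```

Note that this only removes the per-iteration subsetting and printing overhead. If f itself takes about 3 seconds per call, the total time will still be dominated by f.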

Generate an xts of numerics from .csv with some characters/"#N/A"

I read in a CSV file (exported from Excel, with a header row) and examine the result with str(returns.xts). The following code generates character values within the xts.
file <- "~/GCS/returns_Q216.csv"
returns_Q216_ <- read.csv(file=file)
returns <- read.zoo(data.frame(returns_Q216_), FUN = as.Date, format='%d/%m/%Y')
returns.xts <- as.xts(returns)
What is the best way to convert the xts contents to numeric from character whilst preserving xts (and date column)?
> str(returns)
An ‘xts’ object on 2007-01-31/2015-05-31 containing:
Data: chr [1:101, 1:18] "-0.002535663" "-0.001687755" "0.032882512" "0.024199512" "0.027812955" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:18] "UK.EQUITY" "EUR.EQUITY" "NA.EQUITY" "ASIA.EQUITY" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
> returns[8,9]
PROPERTY
2007-08-31 "-4.25063E-05"
When I try as.numeric(returns.xts) I get a plain numeric vector, without the dates as the row index.
> str(as.numeric(returns))
num [1:1818] -0.00254 -0.00169 0.03288 0.0242 0.02781 ...
You should use the na.strings argument to read.csv (which can be passed via read.zoo), as I said in my answer to your previous question.
file <- "~/GCS/returns_Q216.csv"
returns <- read.zoo(file, FUN=as.Date, format='%d/%m/%Y', na.strings="#N/A")
returns.xts <- as.xts(returns)
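To see what na.strings does on its own, here is a small base-R sketch using read.csv on inline text. The two rows are made up for illustration (only the column names and the "#N/A" marker come from the question), and it stops short of building the zoo/xts object.

```r
# Made-up two-row sample; "#N/A" is the Excel error marker from the question.
csv_text <- "Date,UK.EQUITY,PROPERTY
31/01/2007,-0.002535663,#N/A
28/02/2007,-0.001687755,-4.25063E-05"

# Without na.strings the PROPERTY column would be read as character.
returns_df <- read.csv(text = csv_text, na.strings = "#N/A")

# With it, "#N/A" becomes NA and the column stays numeric.
str(returns_df)

# The data part is now a numeric matrix, which is what xts needs underneath.
returns_mat <- as.matrix(returns_df[, -1])
```

Because an xts object stores its data in a single matrix, one stray string in any column is enough to turn every value into character, which is why the marker has to be handled at read time.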

Sorting xts data to look like panel data in R

I need to use the 'PerformanceAnalytics' package in R, which requires converting the data to xts. The data can be downloaded from this link: https://drive.google.com/file/d/0B8usDJAPeV85elBmWXFwaXB4WUE/edit?usp=sharing . I created an xts object using the following commands:
data<-read.csv('monthly.csv')
dataxts <- xts(data[,-1],order.by=as.Date(data$datadate,format="%d/%m/%Y"))
But after doing this, it loses the panel data structure. I tried to sort the xts data to get it back into panel data form but failed.
Can anyone please help me reorganize the xts data to look like panel data? I need to sort it by firm id (gvkey) and date (datadate).
xts objects are sorted by time index only. They cannot be sorted by anything else.
I would encourage you to split your data.frame into a list, by gvkey. Then convert each list element to xts and remove the columns that do not vary across time, storing them as xtsAttributes. You might also want to consider using the yearmon class, since you're dealing with monthly data.
You will have to determine how you want to encode non-numeric, time-varying values, since you cannot mix types in xts objects.
Data <- read.csv('monthly.csv', nrow=1000, as.is=TRUE)
DataList <- split(Data, Data$gvkey)
xtsList <- lapply(DataList, function(x) {
attrCol <- c("iid","tic","cusip","conm","exchg","secstat","tpci",
"cik","fic","conml","costat","idbflag","dldte")
numCol <- c("ajexm","ajpm","cshtrm","prccm","prchm","prclm",
"trfm", "trt1m", "rawpm", "rawxm", "cmth", "cshom", "cyear")
toEncode <- c("isalrt","curcdm")
y <- xts(x[,numCol], as.Date(x$datadate,format="%d/%m/%Y"))
xtsAttributes(y) <- as.list(x[1,attrCol])
y
})
Each list element is now an xts object, and is much more compact, since you do not repeat completely redundant data. And you can easily run analysis on each gvkey via lapply and friends.
> str(xtsList[["1004"]])
An ‘xts’ object on 1983-01-31/2012-12-31 containing:
Data: num [1:360, 1:13] 3.38 3.38 3.38 3.38 3.38 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:13] "ajexm" "ajpm" "cshtrm" "prccm" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 13
$ iid : int 1
$ tic : chr "AIR"
$ cusip : int 361105
$ conm : chr "AAR CORP"
$ exchg : int 11
$ secstat: chr "A"
$ tpci : chr "0"
$ cik : int 1750
$ fic : chr "USA"
$ conml : chr "AAR Corp"
$ costat : chr "A"
$ idbflag: chr "D"
$ dldte : chr ""
And you can access the attributes via xtsAttributes:
> xtsAttributes(xtsList[["1004"]])$fic
[1] "USA"
> xtsAttributes(xtsList[["1004"]])$tic
[1] "AIR"
An efficient way to achieve this goal is to convert the panel data (long format) into wide format using the 'reshape2' package. After performing the estimations, convert it back to long (panel) format. Here is an example:
library(foreign)
library(reshape2)
dd <- read.dta("DDA.dta") # DDA.dta is Stata data; keep only date, id and variable of interest (i.e. three columns in total)
wdd <- dcast(dd, datadate ~ gvkey) # gvkey is the id
require(PerformanceAnalytics)
wddxts <- xts(wdd[,-1], order.by=as.Date(wdd$datadate, format="%Y-%m-%d"))
ssd60A <- rollapply(wddxts, width=60, SemiDeviation, by.column=TRUE, fill=NA) # e.g. of rolling window calculation
ssd60A.df <- as.data.frame(ssd60A) # convert xts to data frame
ssd60A.df$datadate <- rownames(ssd60A.df) # insert time index
lssd60A.df <- melt(ssd60A.df, id.vars=c('datadate'), variable.name='gvkey') # convert back to panel format
write.dta(lssd60A.df, "ssd60A.dta", convert.factors="string") # export as Stata file
Then simply merge it with the master database to perform some regression.

Sorting after aggregating in R

I first used aggregate to get the mean of one column in a data frame, per another column:
meanDemVoteHouseState <- aggregate(congress$X2012.House.Dem.vote,
by = list(state = congress$state),
FUN = mean)
I then wanted to print this in order. First I looked at the new data frame
str(meanDemVoteHouseState)
and got
'data.frame': 50 obs. of 2 variables:
$ state: chr "AK" "AL" "AR" "AZ" ...
$ x : num 0.29 0.34 0.29 0.462 0.566 ...
Apparently, the new variable is now called "x".
But when I tried to sort on that:
meanDemVoteHouseState[order(x),]
I got an error "object 'x' not found".
I tried a number of other things, but nothing worked.
What am I missing ?
You want
meanDemVoteHouseState[order(meanDemVoteHouseState[,"x"]),]
If you do it in two steps it becomes clearer
myind <- order(meanDemVoteHouseState[,"x"]) # need 'x' fully qualified
meanDemVoteHouseState[myind, ]
Or use things like with() ...
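Here is a small self-contained sketch of both spellings. The data frame agg is a made-up stand-in for the aggregate() result, with invented values:

```r
# Made-up stand-in for the aggregate() result.
agg <- data.frame(state = c("AK", "AL", "AR"),
                  x = c(0.29, 0.34, 0.25),
                  stringsAsFactors = FALSE)

# order(x) alone fails because x lives inside agg, not in the workspace.
# Either qualify the column explicitly...
sorted_a <- agg[order(agg$x), ]

# ...or let with() evaluate order(x) inside the data frame.
sorted_b <- agg[with(agg, order(x)), ]

print(sorted_a)
```

Both give the same rows in the same order; with() is mostly a convenience when the data frame name is long.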
It would probably be easier to just do
meanDemVoteHouseState <- aggregate(X2012.House.Dem.vote ~ state,
data = congress, FUN = mean)
Which would maintain the variable name (such as it is). You'd still need to sort, say with
ord <- with(meanDemVoteHouseState, order(X2012.House.Dem.vote))
meanDemVoteHouseState <- meanDemVoteHouseState[ord, ]
And at this point you may want to choose some shorter names for variables and objects.

R from character to numeric

I have this csv file (fm.file):
Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246
And so on.
I run these commands:
fm.data <- as.xts(read.zoo(file=fm.file,format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
And I get the following:
[1] TRUE
How do I get fm.data to be numeric without losing its date index? I want to perform some statistical operations that require the data to be numeric.
I was puzzled by two things: it didn't seem that 'read.zoo' should give you a character matrix, and it didn't seem that changing its class would affect the index values, since the data type should be separate from the indices. So I tried to replicate the problem and got a different result:
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246"
require(xts)
fm.data <- as.xts(read.zoo(file=textConnection(txt),format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
#[1] FALSE
str(fm.data)
#-------------
An ‘xts’ object from 2011-02-28 to 2011-03-04 containing:
Data: num [1:5, 1:2] 14.6 14.6 14.6 14.6 14.6 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "FM1" "FM2"
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
List of 2
$ tclass: chr [1:2] "POSIXct" "POSIXt"
$ tzone : chr ""
zoo- and xts-objects have their data in a matrix accessed with coredata and their indices are a separate set of attributes.
I think the problem is that you have some dirty data in your csv file. In other words, the FM1 or FM2 column contains a character, somewhere, that stops it being interpreted as a numeric column. When that happens, xts (which is a matrix underneath) will force the whole thing to character type.
Here is one way to use R to find suspicious data:
s <- readLines(fm.file)
# s is now a vector of character strings, one entry per line
s <- s[-1] # Chop off the header row
all(grepl('^[-0-9,./]*$',s,perl=TRUE)) # TRUE means all your data is clean ("/" is allowed for the dates)
s[ !grepl('^[-0-9,./]*$',s,perl=TRUE) ]
which( !grepl('^[-0-9,./]*$',s,perl=TRUE) ) + 1
The second-to-last line prints out all the csv rows that contain characters you did not expect. The last line tells you which rows in the file they are (+1 because we removed the header row).
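To try the check without touching your file, here is a sketch on a few made-up lines, one of which has a deliberate "n/a" planted in it. The character class in this sketch includes "/" so that dd/mm/yyyy dates are accepted; values in scientific notation (with an "E") would still be flagged, so widen the class if your data has them.

```r
# Made-up file contents; the third line has a deliberately bad value.
csv_lines <- c("Date,FM1,FM2",
               "28/02/2011,14.571611,11.469457",
               "01/03/2011,14.572203,n/a",
               "02/03/2011,14.574798,11.487183")

body <- csv_lines[-1]                    # drop the header row

# TRUE for rows made only of digits, minus signs, commas, dots and slashes.
clean <- grepl("^[-0-9,./]*$", body)

bad_rows <- body[!clean]                 # the offending rows themselves
bad_line_numbers <- which(!clean) + 1    # +1 to count the header line again

print(bad_rows)
print(bad_line_numbers)
```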
Why not simply use read.csv and then convert the first column to a Date object using as.Date?
> x <- read.csv(fm.file, header=T)
> x$Date <- as.Date(x$Date, format="%d/%m/%Y")
> x
Date FM1 FM2
1 2011-02-28 14.57161 11.46946
2 2011-03-01 14.57220 11.45751
3 2011-03-02 14.57480 11.48718
4 2011-03-03 14.57556 11.48780
5 2011-03-04 14.57686 11.49025
