My code:
library(quantmod)
library(tseries)
library(ggplot2)
companies = c("IOC.BO", "BPCL.BO", "ONGC.BO", "HINDPETRO.BO", "GAIL.BO")
stocks = list()
for(i in 1:5){
stocks[[i]] = getSymbols(companies[i], auto.assign = FALSE)
}
stocks is a list of data frames. Now I'm trying to bind together the adjusted columns of all the data frames stored in stocks, but to do that I think I need to remove the row names (please tell me if there's a better way to do this):
for(i in 1:5)
rownames(stocks[[i]])<- NULL
but the resulting dataframes still have their row names, could someone please tell me where I'm going wrong?
P.S. My end goal is to have a data frame containing only the adjusted columns of the data frames in the list stocks, for which I did this:
adjusted=data.frame()
for(i in 1:5)
coln=stocks[[1]][,6]
adjusted=cbind(adjusted,coln)
adjusted
but this returns adjusted as a list.
Row names
Regarding row names: after running the code in the question,
rownames(stocks[[1]])
## NULL
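so the objects in stocks do not have row names afterwards. getSymbols returns xts objects, which keep their dates in a time index (see index()) rather than in row names; the dates you see when printing come from that index.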
Adjusted series
To create a time series of adjusted values use Ad as shown below.
Adjusted <- do.call("merge", lapply(stocks, Ad))
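do.call("merge", ...) works here because merge() on xts objects accepts any number of series in one call and aligns them by date. If you have already converted everything to data frames, base merge() only joins two at a time, so the equivalent is to fold it over the list with Reduce(). A minimal sketch, assuming each data frame has a date column; the tickers and prices are made up:

```r
# Hypothetical adjusted-close tables, already converted to data frames
dates <- as.Date("2020-01-01") + 0:2
ioc  <- data.frame(date = dates, IOC.Adjusted = c(100, 101, 102))
bpcl <- data.frame(date = dates, BPCL.Adjusted = c(200, 201, 202))
ongc <- data.frame(date = dates[2:3], ONGC.Adjusted = c(300, 301))  # first day missing

# Base merge() joins two data frames at a time, so fold it over the list;
# all = TRUE keeps dates missing from some tables (those cells become NA)
adjusted_df <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE),
                      list(ioc, bpcl, ongc))
adjusted_df
```

Joining on the date column, rather than dropping row names and using cbind(), keeps the prices aligned even when the series do not cover exactly the same days.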
Putting it all together
Note that we don't really need any of the row-name processing; the following is sufficient. The second-to-last line is optional, as its only purpose is to make the column names nicer. The last line converts the xts object Adjusted to a data frame and may not be needed either, since you may find working with an xts object more convenient than working with data frames.
library(quantmod)
library(ggplot2)
companies <- c("IOC.BO", "BPCL.BO", "ONGC.BO", "HINDPETRO.BO", "GAIL.BO")
stocks <- lapply(companies, getSymbols, auto.assign = FALSE)
Adjusted <- do.call("merge", lapply(stocks, Ad))
names(Adjusted) <- sub(".BO.Adjusted", "", names(Adjusted), fixed = TRUE)
adjustedDF <- fortify(Adjusted)
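By default sub() treats its pattern as a regular expression, so each "." in ".BO.Adjusted" matches any character. That happens to give the intended result for these tickers, but passing fixed = TRUE makes the match literal. A small sketch of the difference; the string "XBOYAdjusted" is a made-up counterexample:

```r
cols <- c("IOC.BO.Adjusted", "HINDPETRO.BO.Adjusted")

# Literal match: strips exactly ".BO.Adjusted"
literal <- sub(".BO.Adjusted", "", cols, fixed = TRUE)

# As a regex, "." is a wildcard, so this unrelated name is wiped out entirely
regex_hit <- sub(".BO.Adjusted", "", "XBOYAdjusted")

# With fixed = TRUE the same name is left alone
literal_hit <- sub(".BO.Adjusted", "", "XBOYAdjusted", fixed = TRUE)
```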
Related
I wish to store some xts objects as data frames within a list in R.
The xts objects are stock price data collected using the tidyquant package. I need to convert these objects to data frames and store them in a list. I have one additional requirement: I only want to retain the index column and the closing price column for each stock.
I have tried using dplyr syntax to select the columns of interest but my code fails to select column indexes greater than 2
Error: Can't subset columns that don't exist.
x Locations 3 and 4 don't exist.
i There are only 2 columns.
This is the code I am using, but I am struggling to understand why I can't select the closing price from my 'fortified' data frames:
pacman::p_load(tidyquant,tidyverse,prophet)
tickers = c("AAPL","AMZN")
getSymbols(tickers,
from = '2015-01-01',
to = today(),
warnings = FALSE,
auto.assign = TRUE)
dfList <- list()
for (i in tickers) {
dfList[[i]] <- fortify.zoo(i) %>%
select(c(1,5))
}
When I convert an individual xts object to a data frame using fortify.zoo I can select the columns of interest, but not when I loop through them.
fortify.zoo(AAPL) %>% select(c(1,5)) %>% head(n = 10)
Can anyone help me understand where I am falling down in my understanding on this issue please?
getSymbols can put the stock data into an environment, stocks, and Cl will extract the close column; fortify.zoo then turns each result into a data frame with an Index column. Replace Cl with Ad if you want the adjusted close. Then iterate through the names in the environment. Finally, leave the data in the environment stocks or optionally convert it to a list L. No packages other than quantmod and the packages it pulls in are used. There is also the question of whether you need to convert the data to data frames at all; you could just leave it as xts.
library(quantmod)
tickers = c("AAPL","AMZN")
stocks <- new.env()
getSymbols(tickers, env = stocks, from = '2015-01-01')
for(nm in ls(stocks)) stocks[[nm]] <- fortify.zoo(Cl(stocks[[nm]]))
L <- as.list(stocks) # optional
Another possibility if you do want a list is to replace the last two lines with an eapply:
L <- eapply(stocks, function(x) fortify.zoo(Cl(x)))
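The environment workflow can be tried without downloading anything. In this sketch the two objects are hypothetical plain data frames rather than the xts objects getSymbols would create, and keep_close is a hypothetical helper that plays the role of fortify.zoo(Cl(x)) by keeping the Index column and the Close column:

```r
# Hypothetical stand-ins for what getSymbols(tickers, env = stocks) stores
stocks <- new.env()
stocks$AAPL <- data.frame(Index = as.Date("2015-01-02") + 0:1,
                          AAPL.Open = c(1.0, 2.0), AAPL.Close = c(1.5, 2.5))
stocks$AMZN <- data.frame(Index = as.Date("2015-01-02") + 0:1,
                          AMZN.Open = c(3.0, 4.0), AMZN.Close = c(3.5, 4.5))

# Keep the Index column plus whichever column name ends in "Close"
keep_close <- function(x) x[, c(1, grep("Close$", names(x)))]

# eapply() applies the function to every object in the environment
# and returns a named list
L <- eapply(stocks, keep_close)
L <- L[order(names(L))]  # eapply() does not promise any particular order
```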
It is better to initialize a list of fixed length and name it with the tickers. The OP's code loops over the tickers directly, so each i is the ticker name, which is a string, not the data.
dfList <- vector('list', length(tickers))
names(dfList) <- tickers
Because i here is a string naming the object ("AAPL" or "AMZN"), we can use get to return the value of that object from the global environment:
for (i in tickers) {
dfList[[i]] <- fortify.zoo(get(i)) %>%
select(c(1,5))
}
# check the dimensions
sapply(dfList, dim)
# AAPL AMZN
#[1,] 1507 1507
#[2,] 2 2
Another approach is to use mget, which returns all of those objects in a named list:
library(purrr)
library(dplyr)
dfList2 <- mget(tickers) %>%
map(~ fortify.zoo(.x) %>%
select(1, 5))
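The difference between a ticker string and the object it names is easy to see with small stand-ins. A base R sketch: the data frames are hypothetical, and plain column subsetting replaces fortify.zoo() and select() so nothing has to be downloaded:

```r
# Hypothetical objects standing in for what getSymbols(auto.assign = TRUE) creates
AAPL <- data.frame(Index = 1:3, Open = c(9, 10, 11), Close = c(10, 11, 12))
AMZN <- data.frame(Index = 1:3, Open = c(19, 20, 21), Close = c(20, 21, 22))
tickers <- c("AAPL", "AMZN")

# Looping over tickers yields strings such as "AAPL", not the data itself,
# so get() is needed to look up the object with that name
dfList <- vector("list", length(tickers))
names(dfList) <- tickers
for (i in tickers) dfList[[i]] <- get(i)[, c(1, 3)]

# mget() performs the lookup for all names at once and returns a named list
dfList2 <- lapply(mget(tickers), function(x) x[, c(1, 3)])
```

Both routes produce the same named list; mget() simply avoids the explicit loop.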
I am doing some data analysis where I have my datasets in a folder and I use a for loop to go through all the datasets and (1) Plot a graph (2) Calculate some values from the graph and store them in a dataframe which is then appended to a list. The idea is to have graphs for each dataset and also a list having this summary dataframe for each dataset for analysis later.
With every dataset the for loop iterates through I have a variable specifying the current dataset in the loop. This variable is used to label and save the graph and to label and append the dataframe to a list. I am able to do the graph bit alright but I am not able to add the dataframe to the list in the for loop. My code is as follows:
# Create empty list for adding things to from each loop
parameters <- list()
# Begin the loop
for (file in filesVector) {
# Extract keywords from name of file to be used later
splitname <- strsplit(file, '4-')
splitname <- unlist(splitname)
secondhalf <- splitname[2]
splitsecondhalf <- strsplit(secondhalf, '\\.')
splitsecondhalf <- unlist(splitsecondhalf)
title <- splitsecondhalf[1]
# Extract values as a dataframe and assign to varying name
assign(paste(title, 'blanks', sep= '-'),data_drc_merge[data_drc_merge$ID ==
"B", ])
# Add to list
parameters <- c(parameters, paste(title, 'blanks', sep= '-'))
}
But instead of the data frame itself, what gets added to the list is the current value of the name variable (just the string).
Any ideas how to fix this?
You could use [[ together with paste0 to build the name under which the data frame is stored in your list:
list_of_df = list()
for(i in files){
# do analysis...
list_of_df[[paste0("name_", i)]] = current_df  # current_df: the data frame your analysis produced
}
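A self-contained sketch of that pattern applied to the question's loop. The file names and the stand-in analysis are hypothetical; the name-building follows the question's strsplit() on '4-'. The last line shows why c(parameters, name) stored only the string:

```r
files <- c("run-4-alpha.csv", "run-4-beta.csv")  # hypothetical file names

parameters <- list()
for (file in files) {
  # As in the question: keep the part after "4-" and drop the extension
  title <- sub("\\.csv$", "", strsplit(file, "4-")[[1]][2])
  # Stand-in for the real analysis result
  current_df <- data.frame(file = file, n_chars = nchar(file))
  # [[ ]] stores the data frame itself under the constructed name
  parameters[[paste(title, "blanks", sep = "-")]] <- current_df
}

# By contrast, c() with a string appends the string, not a data frame
wrong <- c(list(), paste("alpha", "blanks", sep = "-"))
```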
Hi there, I have searched the internet for what is wrong, but na.omit() is not removing the rows with NA. Could you please help me?
library(TTR)
library(quantmod)
library(doParallel) #this library is for parallel core processing
StartDate = "2010-01-01"
EndDate = "2020-03-20"
myStock <- c("AMZN")
getSymbols(myStock, src="yahoo", from=StartDate, to=EndDate)
gdat <-coredata(AMZN$AMZN.Close) # Create a 2-d array of all the data. Or...
Data <- data.frame(date=index(AMZN), coredata(AMZN)) # Create a data frame with the data and (optionally) maintain the date as an index
Data$rsi22 <- data.frame(RSI(Cl(Data), n=22))
Data$rsi44 <- data.frame(RSI(Cl(Data), n=44))
colnames(Data)
DatanoNA <- na.omit(Data) #remove rows with NAs
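I think you're looking for the complete.cases() function. Note that na.omit() does drop rows containing NA from an ordinary data frame; the problem here is that rsi22 and rsi44 are data frames nested inside Data, and na.omit() skips columns that are not atomic vectors, so the NAs inside them are never seen.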
Also, your data frame construction is a little wonky (see below for more explanation). Try this:
Data <- data.frame(date=index(AMZN), coredata(AMZN),
                   rsi22=as.numeric(RSI(Cl(AMZN), n=22)),
                   rsi44=as.numeric(RSI(Cl(AMZN), n=44)))
nrow(Data)
nrow(Data[complete.cases(Data),])
Normally every column of a data frame is a vector, and the output of RSI() is a single series that fits naturally into one vector column. When you say
Data$rsi22 <- data.frame(RSI(Cl(Data), n=22))
what you're doing is wrapping the results in a data frame and then embedding that inside another data frame (Data). This is legal in R, but it is unusual and confuses many of the standard data-processing functions, including na.omit().
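A small base R reproduction of the problem, with made-up values: na.omit() drops the NA row from a flat data frame, but not when the NA sits inside a column that is itself a data frame.

```r
# Miniature of the question's Data: one ordinary column plus an RSI-like
# column whose first value is NA (as RSI() produces during its warm-up period)
nested <- data.frame(Close = c(10, 11, 12))
nested$rsi <- data.frame(rsi = c(NA, 55, 60))  # a data frame stored as a column

flat <- data.frame(Close = c(10, 11, 12), rsi = c(NA, 55, 60))

n_nested <- nrow(na.omit(nested))  # 3: the nested column is not atomic, so it is skipped
n_flat   <- nrow(na.omit(flat))    # 2: the NA row is dropped
cc_flat  <- complete.cases(flat)   # FALSE TRUE TRUE
```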
You could try complete.cases:
DatanoNA <- Data[complete.cases(Data),]
I have written a function to strip down a data frame to contain only the columns I want to plot. I now want to iterate a list of data frames through that function, so that each individual data frame only contains the info relevant to my plot.
Here is the function:
clean_data <- function(show_df){
show_data <- show_df[,c(1:2,7)]
colnames(show_data) <- c("Week", "WeeklyGross", "AvgTicketPrice")
#turns WeeklyGross into Numeric values
show_data$WeeklyGross <- gsub('[^a-zA-Z0-9.]', '', show_data$WeeklyGross)
show_data$WeeklyGross <- as.numeric(show_data$WeeklyGross)
#turns AvgTicketPrice into Numeric values
show_data$AvgTicketPrice <- gsub('[^a-zA-Z0-9.]', '', show_data$AvgTicketPrice)
show_data$AvgTicketPrice <- as.numeric(show_data$AvgTicketPrice)
show_data
}
And here is my code when I attempt to iterate the list of my data frames through the function:
df.list <- list(atw_df, cly_df, gent_df, kin_df,
mo_df,on_df, van_df, war_df)
new_list <- list()
for (i in seq(df.list)){
new_list <- clean_data(i)
}
I know that my loop is missing something, but I cannot figure out what. I want to store each data frame from that list in its revised format as a variable so that I can use them to plot the information.
EDIT: made some code changes, I am now receiving an incorrect number of dimensions error in show_df[, c(1:2, 7)]
EDIT2: more changes made to the for loop, still receiving same error message.
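Your loop passes i, an integer position produced by seq(df.list), to clean_data instead of the data frame itself, which is why show_df[, c(1:2, 7)] reports an incorrect number of dimensions; it also overwrites new_list on every iteration. Once you have your function and your list, simply do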
new_list <- lapply(df.list, clean_data)
This calls clean_data once for each data frame in df.list and returns a list of the cleaned data frames.
Thus your entire "loop" becomes
df.list <- list(atw_df, cly_df, gent_df, kin_df,
mo_df,on_df, van_df, war_df)
new_list <- lapply(df.list, clean_data)
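To see the difference between looping over positions and looping over the data frames themselves, here is a sketch with two small hypothetical data frames and a simplified hypothetical cleaning function (clean_demo), not the question's real data. The lapply() call and the corrected loop give the same result:

```r
# Simplified, hypothetical version of clean_data(): keep two columns,
# rename them, and turn "$1,000"-style strings into numbers
clean_demo <- function(df) {
  out <- df[, c(1, 2)]
  colnames(out) <- c("Week", "WeeklyGross")
  out$WeeklyGross <- as.numeric(gsub("[^0-9.]", "", out$WeeklyGross))
  out
}

df.list <- list(
  atw = data.frame(w = c("2019-01-06", "2019-01-13"), g = c("$1,000", "$2,500"),
                   notes = c("a", "b"), stringsAsFactors = FALSE),
  cly = data.frame(w = "2019-01-06", g = "$750", notes = "c",
                   stringsAsFactors = FALSE)
)

# lapply() hands each data frame (not its position) to the function
new_list <- lapply(df.list, clean_demo)

# Equivalent loop: index into the list with [[i]] and store each result
new_list2 <- vector("list", length(df.list))
for (i in seq_along(df.list)) new_list2[[i]] <- clean_demo(df.list[[i]])
names(new_list2) <- names(df.list)
```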
Update: My NOAA GHCN-Daily weather station data functions have since been cleaned and merged into the rnoaa package, available on CRAN or here: https://github.com/ropensci/rnoaa
I'm designing an R function to calculate statistics across a data set made up of multiple data frames. In short, I want to pull data frames by class based on a reference data frame containing their names. I then want to apply statistical functions to the values of the listed metrics for each given day. In effect, I want to call and then overlay a list of data frames to calculate functions on a vector of values for every unique date and metric where values are not NA.
The data frames are iteratively read into the workspace from file based on a class variable, using the by() function. After importing the files for a given class, I want to rbind() the data frames for that class and each user-defined metric within a range of years. I then want to apply a set of user-provided statistical functions to each metric within a class for each combination of year, month, and day (i.e., the mean [function] low temperature [metric] on July 1st, 1990 [date] reported across all locations [data frames] within a given region [class]). I want the end result to be new data frames containing values for every date within a region and year range, for each metric and statistical function applied. I am very close to this result with aggregate(), but it currently outputs NA and NaN for most functions other than the mean temperature. Any advice would be much appreciated! Here is my code thus far:
# Example parameters
w <- c("mean","sd","scale") # Statistical functions to apply
x <- "C:/Data/" # Folder location of CSV files
y <- c("MaxTemp","AvgTemp","MinTemp") # Metrics to subset the data
z <- c(1970:2000) # Year range to subset the data
CSVstnClass <- data.frame(CSVstations,CSVclasses)
by(CSVstnClass, CSVstnClass[,2], function(a){ # Station list by class
suppressWarnings(assign(paste(a[,2]),paste(a[,1]),envir=.GlobalEnv))
apply(a, 1, function(b){ # Data frame list, row-wise
classData <- data.frame()
sapply(y, function(d){ # Element list
CSV_DF <- read.csv(paste(x,b[2],"/",b[1],".csv",sep="")) # Read in CSV files as data frames
CSV_DF1 <- CSV_DF[!is.na("Value")]
CSV_DF2 <- CSV_DF1[which(CSV_DF1$Year %in% z & CSV_DF1$Element == d),]
assign(paste(b[2],"_",d,sep=""),CSV_DF2,envir=.GlobalEnv)
if(nrow(CSV_DF2) > 0){ # Remove empty data frames
classData <<- rbind(classData,CSV_DF2) # Bind all data frames by row for a class and element
assign(paste(b[2],"_",d,"_bound",sep=""),classData,envir=.GlobalEnv)
sapply(w, function(g){ # Function list
# Aggregate results of bound data frame for each unique date
dataFunc <- aggregate(Value~Year+Month+Day+Element,data=classData,FUN=g,na.action=na.pass)
assign(paste(b[2],"_",d,"_",g,sep=""),dataFunc,envir=.GlobalEnv)
})
}
})
})
})
I think I am pretty close, but I am not sure if rbind() is performing properly, nor why the aggregate() function is outputting NA and NaN for so many metrics. I was concerned that the data frames were not being bound together or that missing values were not being handled well by some of the statistical functions. Thank you in advance for any advice you can offer.
Cheers,
Adam
You've tackled this problem in a way that makes it very hard to debug. I'd recommend switching things around so you can more easily check each step. (Using informative variable names also helps!) The code is unlikely to work as is, but it should be much easier to work iteratively, checking that each step has succeeded before continuing to the next.
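# full.names = TRUE returns usable paths; recursive = TRUE looks inside the class subfolders
paths <- dir("C:/Data/", pattern = "\\.csv$", full.names = TRUE, recursive = TRUE)
# Read in CSV files as data frames
raw <- lapply(paths, read.csv, stringsAsFactors = FALSE)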
# Extract needed rows
filter_metrics <- c("MaxTemp", "AvgTemp", "MinTemp")
filter_years <- 1970:2000
filtered <- lapply(raw, subset,
!is.na(Value) & Year %in% filter_years & Element %in% filter_metrics)
# Drop any empty data frames
rows <- vapply(filtered, nrow, integer(1))
filtered <- filtered[rows > 0]
# Compute aggregates
my_aggregate <- function(df, fun) {
aggregate(Value ~ Year + Month + Day + Element, data = df, FUN = fun,
na.action = na.pass)
}
means <- lapply(filtered, my_aggregate, mean)
sds <- lapply(filtered, my_aggregate, sd)
scales <- lapply(filtered, my_aggregate, scale)
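One likely source of the NA and NaN values (an assumption, since the data isn't shown): if each date has only one observation in the data being aggregated, sd() returns NA and scale() returns NaN. The per-file aggregation above has exactly this shape, so to summarise across locations the stations of a class need to be bound together first, for example with do.call(rbind, ...). A sketch with two hypothetical stations:

```r
# Two hypothetical stations in one class, one MinTemp reading per day each
st1 <- data.frame(Year = 1990, Month = 7, Day = 1:2, Element = "MinTemp",
                  Value = c(10, 12), stringsAsFactors = FALSE)
st2 <- data.frame(Year = 1990, Month = 7, Day = 1:2, Element = "MinTemp",
                  Value = c(14, 16), stringsAsFactors = FALSE)

# One station at a time: each date has a single value, so sd() gives NA
per_station <- aggregate(Value ~ Year + Month + Day + Element, data = st1, FUN = sd)

# Binding the stations first gives every date one value per station
combined <- do.call(rbind, list(st1, st2))
across_stations <- aggregate(Value ~ Year + Month + Day + Element,
                             data = combined, FUN = sd)
```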