Looping difficulty with data in R

I can convert one column into a climdexInput object using the following code:
library(PCICt)
library(climdex.pcic)
tmax.dates <- as.PCICt(do.call(paste, t[, c("year", "days")]),
                       format = "%Y %j", cal = "gregorian")
tmin.dates <- as.PCICt(do.call(paste, t[, c("year", "days")]),
                       format = "%Y %j", cal = "gregorian")
prec.dates <- as.PCICt(do.call(paste, t[, c("year", "days")]),
                       format = "%Y %j", cal = "gregorian")
## Load the data in.
ci <- climdexInput.raw(tmax = ntuobs[, 1],
                       tmin = ntuobs[, 1],
                       prec = ntuobs[, 1],
                       tmax.dates, tmin.dates, prec.dates,
                       base.range = c(2000, 2010))
## Create a timeseries of monthly maximum 5-day consecutive precipitation.
However, ntuobs has 100 columns, and I want to apply the function to all 100 columns and store the results in 100 corresponding objects.
I tried applying a for loop:
for (i in 1:100) {
  ci[, i] <- climdexInput.raw(tmax = ntuobs[, i],
                              tmin = ntuobs[, i],
                              prec = ntuobs[, i],
                              tmax.dates, tmin.dates, prec.dates,
                              base.range = c(2000, 2010))
}
but this gives an error:
Error in ci[, i] <- climdexInput.raw(tmax = changi[, i], tmin = ntuobs[, :
  object of type 'S4' is not subsettable
Please suggest ways to tackle this problem; any approach using the apply family of functions is welcome.
Thanks
Example dataset:
library(PCICt)
library(climdex.pcic)
## Create a climdexInput object from some data already loaded in and ready to go.
## Parse the dates into PCICt.
tmax.dates <- as.PCICt(do.call(paste, ec.1018935.tmax[, c("year", "jday")]),
                       format = "%Y %j", cal = "gregorian")
tmin.dates <- as.PCICt(do.call(paste, ec.1018935.tmin[, c("year", "jday")]),
                       format = "%Y %j", cal = "gregorian")
prec.dates <- as.PCICt(do.call(paste, ec.1018935.prec[, c("year", "jday")]),
                       format = "%Y %j", cal = "gregorian")
## Load the data in.
ci <- climdexInput.raw(ec.1018935.tmax$MAX_TEMP,
                       ec.1018935.tmin$MIN_TEMP,
                       ec.1018935.prec$ONE_DAY_PRECIPITATION,
                       tmax.dates, tmin.dates, prec.dates,
                       base.range = c(1971, 2000))
## Create a timeseries of annual SDII values.
sdii <- climdex.sdii(ci)
But this is for a single column, whereas my data has 100 columns (an ensemble).
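One way around the S4 issue (a sketch, assuming ntuobs is a matrix or data frame with one ensemble member per column and the date vectors are built as above; the names ci.list, sdii.list and sdii.mat are illustrative): because an S4 object cannot be stored in the columns of a matrix, keep the climdexInput objects in a list built with lapply and compute each index per list element.
library(PCICt)
library(climdex.pcic)
## One climdexInput object per ensemble member, stored in a list.
ci.list <- lapply(seq_len(ncol(ntuobs)), function(i) {
  climdexInput.raw(tmax = ntuobs[, i],
                   tmin = ntuobs[, i],
                   prec = ntuobs[, i],
                   tmax.dates, tmin.dates, prec.dates,
                   base.range = c(2000, 2010))
})
## Indices can then be computed per member, e.g. annual SDII for each column:
sdii.list <- lapply(ci.list, climdex.sdii)
sdii.mat  <- do.call(cbind, sdii.list)  # one column per ensemble member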

Related

addTA - Error in naCheck(x, n) : Series contains non-leading NAs

I recently tried to create my own technical indicator: a simple golden-cross indicator (50-day EMA minus 200-day EMA) to be added to my chartSeries chart. This worked fine with the code below at first, but after the quantmod package was updated it started giving me an error.
Code (the stock data is downloaded through the getSymbols function in quantmod):
# 50-day EMA minus 200-day EMA technical indicator, Price and Volume
newEMA <- function(x) {
  removeNA(EMA(p[, 6], n = 50) - EMA(p[, 6], n = 200))
}
emaTA <- newTA(newEMA)
emaTA(col='lightgoldenrod3', 'Price')
Then it gives me this error message:
Error in naCheck(x, n) : Series contains non-leading NAs
Does anyone know how to remove these non-leading NAs?
You can use na.omit, and there is no need to convert to an xts object, because that is already the default.
library(quantmod)
getSymbols("VELO.CO")
p <- na.omit(VELO.CO)
newEMA <- function(x) {
EMA(p[,6], n = 20) - (EMA(p[,6], n = 50))
}
emaTA <- newTA(newEMA)
barChart(VELO.CO)
emaTA(col = "lightgoldenrod3", "Price")
I'm not familiar with the quantmod package, but I played around with your code and I think I found a working solution:
library("quantmod")
getSymbols("VELO.CO")
p <- as.xts(c(VELO.CO))
# remove incomplete cases
vec <- which(!complete.cases(p)) # rows 2305 2398
p2 <- p[-vec, ]
newEMA <- function(x) {
EMA(p2[, 6], n = 20) - (EMA(p2[, 6], n = 50))
}
emaTA <- newTA(newEMA)
barChart(VELO.CO)
emaTA(col = "lightgoldenrod3", "Price")
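For completeness, here is a small base-R check (a sketch, with illustrative names such as non_leading) that locates the problem rows in the raw series before any cleaning; it only inspects the adjusted-close column that the indicator uses.
library(quantmod)
getSymbols("VELO.CO")
p <- VELO.CO                          # raw, un-cleaned series
prices <- p[, 6]
# Leading NAs (before the first real observation) are tolerated by TTR,
# but NAs in the middle of the series are what trigger the naCheck error.
first_ok    <- min(which(!is.na(prices)))
na_idx      <- which(is.na(prices))
non_leading <- na_idx[na_idx > first_ok]
index(prices)[non_leading]            # dates of the rows that need cleaning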

User Based Recommendation in R

I am trying to do user-based recommendation in R using the recommenderlab package, but I always get zero (no) predictions out of the model.
My code is:
library("recommenderlab")
# Loading to pre-computed affinity data
movie_data<-read.csv("D:/course/Colaborative filtering/data/UUCF Assignment Spreadsheet_user_row.csv")
movie_data[is.na(movie_data)] <- 0
rownames(movie_data) <- movie_data$X
movie_data$X <- NULL
# Convert it as a matrix
R<-as.matrix(movie_data)
# Convert R into realRatingMatrix data structure
# realRatingMatrix is a recommenderlab sparse-matrix like data-structure
r <- as(R, "realRatingMatrix")
r
rec=Recommender(r[1:nrow(r)],method="UBCF", param=list(normalize = "Z-score",method="Cosine",nn=5, minRating=1))
recom <- predict(rec, r["1648"], n=5)
recom
as(recom, "list")
Every time, the output I get is:
as(recom, "list")
$`1648`
character(0)
I am using user-row data from this link:
https://drive.google.com/file/d/0BxANCLmMqAyIQ0ZWSy1KNUI4RWc/view
In that data, column A contains the user id and all the other columns are the ratings for each movie.
Thanks.
The line of code movie_data[is.na(movie_data)] <- 0 is the source of the error. For a realRatingMatrix (unlike a binaryRatingMatrix), movies that are not rated by a user are expected to be NA values, not zeros. For example, the following code gives the correct predictions:
library("recommenderlab")
movie_data<-read.csv("UUCF Assignment Spreadsheet_user_row.csv")
rownames(movie_data) <- movie_data$X
movie_data$X <- NULL
R<-as.matrix(movie_data)
r <- as(R, "realRatingMatrix")
rec=Recommender(r,method="UBCF", param=list(normalize = "Z-score",method="Cosine",nn=5, minRating=1))
recom <- predict(rec, r["1648"], n=5)
as(recom, "list")
# [[1]]
# [1] "X13..Forrest.Gump..1994." "X550..Fight.Club..1999."
# [3] "X77..Memento..2000." "X122..The.Lord.of.the.Rings..The.Return.of.the.King..2003."
# [5] "X1572..Die.Hard..With.a.Vengeance..1995."
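A quick sanity check (a sketch, reusing r as built above without the zero replacement): with genuine NAs each user has only a handful of stored ratings, so there is plenty left to recommend; after replacing NAs with 0 every movie counts as rated, which is why predict() in the question had nothing left to return.
rowCounts(r["1648"])  # number of movies user 1648 has actually rated
ncol(r)               # total number of movies in the matrix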

How to estimate static yield curve with 'termstrc' package in R?

I am trying to estimate the static yield curve for Brazil using the termstrc package in R. I am using the function estim_nss.couponbonds with 0% coupon rates and $0 cash flows, except for the last one, which is $1000 (the face value at maturity); as far as I know this is the function to use, because estim_nss.zeroyields only calculates the dynamic curve. The problem is that I receive the following error message:
"Error in (pos_cf[i] + 1):pos_cf[i + 1] : NA/NaN argument In addition: Warning message: In max(n_of_cf) : no non-missing arguments to max; returning -Inf "
I've tried to trace the problem using trace(estim_nss.couponbonds, edit=T), but I cannot find where pos_cf[i]+1 is calculated. Based on the name, I figured it could come from the postpro_bond function and used trace(postpro_bond, edit=T), but I couldn't find the calculation there either. I believe "cf" stands for cash flow, so there could be some problem in the calculation of the cash flows. I used create_cashflows_matrix to test this theory, but it works fine, so I am not sure the problem is in the cash flows.
The code is:
#Creating the 'couponbond' class
ISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021','ltn_2023')) #Bond's identification
MATURITYDATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
ISSUEDATE <- as.Date(c(41288,41666,42395, 42073, 42395), origin='1899-12-30') #Dates are in system's format
COUPONRATE <- rep(0,5) #Coupon rates are 0 because these are zero-coupon bonds
PRICE <- c(969.32, 867.77, 782.48, 628.43, 501.95) #Prices seen 'TODAY'
ACCRUED <- rep(0.1,5) #There is no accrued interest in the brazilian bond's market
#Creating the cashflows sublist
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019', 'ltn_2021', 'ltn_2023')) #Bond's identification
CF <- c(1000,1000,1000,1000,1000)# The face-values
DATE <- as.Date(c(42736, 43101, 43466, 44197, 44927), origin='1899-12-30') #Dates are in system's format
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil)
class(mybonds) <- "couponbonds"
#Estimating the zero-yield curve
ns_res <-estim_nss.couponbonds(mybonds, 'brasil' ,method = "ns")
#Testing the hypothesis that the error comes from the cashflow matrix
cf_p <- create_cashflows_matrix(mybonds[[1]], include_price = T)
m_p <- create_maturities_matrix(mybonds[[1]], include_price = T)
b <- bond_yields(cf_p,m_p)
Note that I am aware of this question, which reports the same problem; however, it concerns the dynamic curve, and there is no useful answer there.
Your code has two problems: (1) it does not name the first list (this is the direct cause of the error, but if you fix only that, another error occurs), and (2) in the cashflows sublist, at least one ISIN level needs more than one entry.
# ...
CFISIN <- as.character(c('ltn_2017','ltn_2018', 'ltn_2019',
'ltn_2021', 'ltn_2023', 'ltn_2023')) # added a 6th element
CF <- c(1000,1000,1000,1000,1000, 1000) # added a 6th
DATE <- as.Date(c(42736,43101,43466,44197,44927, 44928), origin='1899-12-30') # added a 6th
CASHFLOWS <- list(CFISIN,CF,DATE)
names(CASHFLOWS) <- c("ISIN","CF","DATE")
TODAY <- as.Date(42646, origin='1899-12-30')
brasil <- list(ISIN,MATURITYDATE,ISSUEDATE,
COUPONRATE,PRICE,ACCRUED,CASHFLOWS,TODAY)
names(brasil) <- c("ISIN","MATURITYDATE","ISSUEDATE","COUPONRATE",
"PRICE","ACCRUED","CASHFLOWS","TODAY")
mybonds <- list(brasil = brasil) # named the list
class(mybonds) <- "couponbonds"
ns_res <-estim_nss.couponbonds(mybonds, 'brasil', method = "ns")
Note: the error came from these lines
bonddata <- bonddata[group] # prepro_bond()'s 1st line (the direct reason).
# cf <- lapply(bonddata, create_cashflows_matrix) # the additional error
create_cashflows_matrix(mybonds[[1]], include_price = F) # don't run
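To see why the missing name is the direct cause (a minimal base-R sketch, independent of termstrc; brasil_stub is just a placeholder object): estim_nss.couponbonds subsets the bond data by group name, and subsetting an unnamed list by name silently yields a NULL element, which then turns into the NA/NaN error downstream.
brasil_stub <- list(PRICE = c(969.32, 867.77))  # stand-in for the real bond data
unnamed <- list(brasil_stub)
unnamed["brasil"]   # $<NA> NULL -- name lookup on an unnamed list finds nothing
named <- list(brasil = brasil_stub)
named["brasil"]     # returns the element as expected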

SMA using R & TTR Package

Afternoon! I'm just starting out with R and learning about data frames, packages, etc. I've read a lot of the messages here but couldn't find an answer.
I have a table I'm accessing with R that has the following fields:
[Symbol],[Date],[Open],[High],[Low],[Close],[Volume]
And I'm calculating SMAs on the close prices:
library(RODBC)
library(TTR)
qry   <- "SELECT * FROM [dbo].[Stock_Data]"
conn  <- odbcDriverConnect(connectionString)
dfSMA <- sqlQuery(conn, qry)
sma20 <- SMA(dfSMA$Close, n = 20)
dfSMA["SMA20"] <- sma20
When I look at the output, it appears to be calculating the SMA without any regard for the symbol. I haven't tried to replicate the calculation, but I suspect it is just using a 20-row moving window regardless of date or symbol.
How do I restrict the calculation to a given symbol?
Any help is appreciated - just need to be pointed in the right direction.
Thanks
You're far more likely to get answers if you provide reproducible examples. First, let's replicate your data:
library(quantmod)
library(stringr)
symbols <- c("GS", "MS")
getSymbols(symbols)
# Create example data:
dGS <- data.frame("Symbol" = "GS", "Date" = index(GS), coredata(OHLCV(GS)))
names(dGS) <- str_replace(names(dGS), "GS\\.", "")
dMS <- data.frame("Symbol" = "MS", "Date" = index(MS), coredata(OHLCV(MS)))
names(dMS) <- str_replace(names(dMS), "MS\\.", "")
dfSMA <- rbind(dGS, dMS)
> head(dfSMA)
Symbol Date Open High Low Close Volume Adjusted
1 GS 2007-01-03 200.60 203.32 197.82 200.72 6494900 178.6391
2 GS 2007-01-04 200.22 200.67 198.07 198.85 6460200 176.9748
3 GS 2007-01-05 198.43 200.00 197.90 199.05 5892900 177.1528
4 GS 2007-01-08 199.05 203.95 198.10 203.73 7851000 181.3180
5 GS 2007-01-09 203.54 204.90 202.00 204.08 7147100 181.6295
6 GS 2007-01-10 203.40 208.44 201.50 208.11 8025700 185.2161
What you want to do is subset your long data object and then apply the technical indicators to each symbol in isolation. Here is one approach to guide you toward achieving your desired result.
You could do this using a list, building the indicators on xts objects for each symbol rather than on a data.frame as in your example (you can apply the TTR functions to columns of a data.frame, but it is ugly; working with xts objects is much more natural). This is a template for how you could do it. The final output l.data should be intuitive to work with: keep each symbol in a separate container (an element of the list) rather than combining all the symbols into one data.frame, which isn't easy to work with.
make_xts_from_long_df <- function(x) {
  # Subset the rows for the symbol you want
  res <- dfSMA[dfSMA$Symbol == x, ]
  # Create an xts object, which makes merging technical indicators easy
  x_res <- xts(OHLCV(res), order.by = res$Date)
  merge(x_res, SMA(Cl(x_res), n = 20))
}
l.data <- setNames(lapply(symbols, make_xts_from_long_df), symbols)
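As a usage note, here is how you might pull a single symbol out of l.data, plus a hedged data.frame alternative (a sketch; the SMA20 column name is illustrative, and it assumes rows are ordered by date within each symbol and every symbol has at least 20 rows):
# Look at one symbol's merged series, e.g. Goldman Sachs:
head(l.data[["GS"]])
# Base-R alternative that stays in the long data.frame, computing the SMA
# within each symbol group:
dfSMA$SMA20 <- ave(dfSMA$Close, dfSMA$Symbol,
                   FUN = function(z) as.numeric(SMA(z, n = 20)))
head(dfSMA[dfSMA$Symbol == "MS", c("Symbol", "Date", "Close", "SMA20")])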

R subsetting by date range

Seems simple enough, and I've been through all the similar questions and applied them all... I'm either getting nothing or everything.
I'm trying to look at water temperatures (WTEMP) for a specific date range (SAMPLE_DATE) of 2007-06-01 to 2007-09-30 from a data frame called allconmon.
Here is my code so far:
bydate<-subset(allconmon, allconmon$SAMPLE_DATE > as.Date("2007-06-01") & allconmon$SAMPLE_DATE < as.Date("2007-09-30"))
I've also tried this, but I get errors:
bydate2<- as.xts(allconmon$WTEMP,order.by=allconmon$SAMPLE_DATE)
bydate2['2007-06-01/2007-09-30']
Error in xts(x, order.by = order.by, frequency = frequency, .CLASS = "double", :
order.by requires an appropriate time-based object
Not sure what I'm doing wrong here... it seems to work for other people.
I would highly recommend using the zoo package in R when dealing with time series data.
The operation you mention is exactly what the window() function in zoo does.
Here is the example from ?window:
Examples
window(presidents, 1960, c(1969,4)) # values in the 1960's
window(presidents, deltat = 1) # All Qtr1s
window(presidents, start = c(1945,3), deltat = 1) # All Qtr3s
window(presidents, 1944, c(1979,2), extend = TRUE)
pres <- window(presidents, 1945, c(1949,4)) # values in the 1940's
window(pres, 1945.25, 1945.50) <- c(60, 70)
window(pres, 1944, 1944.75) <- 0 # will generate a warning
window(pres, c(1945,4), c(1949,4), frequency = 1) <- 85:89
pres
Here is a list of papers from JSS demonstrating the usage of the zoo package and how to reshape your data, which I found very inspiring.
I figured it out, on multiple levels. First off, I didn't notice that R did something funky with my sample date label when I uploaded it from a text file... probably my fault.
Here is a small sample of the data set; it's 5,573,301 observations of 30 variables.
Notice the funky symbol in front of SAMPLE_DATE... not sure why R did that...
ï..SAMPLE_DATE SampleTime STATION SONDE Layer TOTAL_DEPTH TOTAL_DEPTH_A BATT BATT_A WTEMP WTEMP_A SPCOND SPCOND_A SALINITY SALINITY_A DO_SAT DO_SAT_A
However, here is what I did (I changed the name to x, as allconmon was a bit excessive):
x <- read.csv(file = "C:/Users/Desktop/cmon2001-08.txt",quote = "",header = TRUE,sep = "\t", na.strings = c("","NULL"))
library(chron)
x$month <- months(as.Date(x$ï..SAMPLE_DATE, "%Y-%m-%d"))
x$year <- substr(as.character(x$ï..SAMPLE_DATE), 1, 4)
y <- x[x$month == 'June' | x$month == 'July' | x$month == 'August' | x$month == 'September' ,]
So now I was able to subset all my data by those four months, and then later by year, station, and water temperature.
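For what it's worth, the funky ï.. prefix is the UTF-8 byte-order mark from the text file, and the original xts attempt most likely failed because SAMPLE_DATE was still a character/factor column rather than a Date. A sketch of a cleaner route, assuming the same file and layout as above:
# Read the file with the BOM stripped, so the column is SAMPLE_DATE
# rather than ï..SAMPLE_DATE.
x <- read.csv("C:/Users/Desktop/cmon2001-08.txt", sep = "\t", quote = "",
              na.strings = c("", "NULL"), fileEncoding = "UTF-8-BOM")
# Convert the column to Date once; then both subset() and xts work.
x$SAMPLE_DATE <- as.Date(x$SAMPLE_DATE, "%Y-%m-%d")
bydate <- subset(x, SAMPLE_DATE >= as.Date("2007-06-01") &
                    SAMPLE_DATE <= as.Date("2007-09-30"))
library(xts)
bydate2 <- xts(x$WTEMP, order.by = x$SAMPLE_DATE)
bydate2["2007-06-01/2007-09-30"]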
