Using R to Grab Multiple Quandl Futures Curves

I am trying to grab settlement values from 12 different futures curves on Quandl and then assign them into a single xts object. I am currently getting the error
"Error: object of type 'closure' is not subsettable"
and only one column. Ideally I want twelve columns named CL1, CL2, and so on.
for (i in 1:12) {
  data1 = Quandl(paste("CHRIS/CME_CL", i, sep = ""), start_date = "2017-01-01", type = "xts")
  if (i == 1) { CL <- cbind(data1$Date, data1$Settle) }
  if (i > 1)  { CL = cbind(CL, data$Settle) }
}

data1 = data.frame(matrix(ncol = 12, nrow = 279))
for (i in 1:12) {
  data1[, i] = Quandl(paste("CHRIS/CME_CL", i, sep = ""), start_date = "2017-01-01")$Settle
}
Can you verify whether this works? If it does, you should be able to convert the data.frame object to a time-series (xts) object.
I got this error after working on it a bit.
Error: { "quandl_error": { "code": "QELx01", "message": "You have
exceeded the anonymous user limit of 50 calls per day. To make more
calls today, please register for a free Quandl account and then
include your API key with your requests." } }
Hope it works.
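For reference, a minimal sketch of the whole workflow, assuming a registered (free) API key so the 50-call anonymous limit no longer applies; the key string below is a placeholder:
library(Quandl)
library(xts)

Quandl.api_key("YOUR_API_KEY")  # placeholder; a registered key lifts the 50-call/day limit

# With type = "xts" each call returns an object indexed by date, so no Date column is needed
curves <- lapply(1:12, function(i) {
  Quandl(paste0("CHRIS/CME_CL", i), start_date = "2017-01-01", type = "xts")$Settle
})
CL <- do.call(cbind, curves)  # cbind on xts merges on the shared date index
colnames(CL) <- paste0("CL", 1:12)
Each column then holds the settlement series for one contract depth.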

Related

How to use tryCatch to skip errors in data downloading in R

I am trying to download data from the USGS website using the dataRetrieval package of R.
For that purpose, I have written a function in R called getstreamflow, which works fine when I run it, for example:
siteNumber <- c("094985005","09498501","09489500","09489499","09498502")
Streamflow = getstreamflow(siteNumber)
The output of the function is a list of data frames.
The function runs when there is no issue downloading the data, but for some stations I get the following error:
Request failed [404]. Retrying in 1.1 seconds...
Request failed [404]. Retrying in 3.3 seconds...
For: https://waterservices.usgs.gov/nwis/site/?siteOutput=Expanded&format=rdb&site=0946666666
To avoid that the function stops when encounters an error, I am trying to use tryCatch as in the following code:
Streamflow = tryCatch(
  expr = {
    getstreamflow(siteNumber)
  },
  error = function(e) {
    message(paste(siteNumber, " there was an error"))
  }
)
I want the function to skip a station and go on to the next one when it encounters an error. Currently the output I get is shown below, which is obviously wrong because it reports an error for every station:
094985005 there was an error09498501 there was an error09489500 there was an error09489499 there was an error09498502 there was an error09511300 there was an error09498400 there was an error09498500 there was an error09489700 there was an error09500500 there was an error09489082 there was an error09510200 there was an error09489100 there was an error09490500 there was an error09510180 there was an error09494000 there was an error09490000 there was an error09489086 there was an error09489089 there was an error09489200 there was an error09489078 there was an error09510170 there was an error09493500 there was an error09493000 there was an error09498503 there was an error09497500 there was an error09510000 there was an error09509502 there was an error09509500 there was an error09492400 there was an error09492500 there was an error09497980 there was an error09497850 there was an error09492000 there was an error09497800 there was an error09510150 there was an error09499500 there was an error... <truncated>
What am I doing wrong with tryCatch?
Answer
You wrote the tryCatch outside of getstreamflow. Hence, if one site fails, getstreamflow returns an error and nothing else. You should either supply one site at a time, or put the tryCatch inside getstreamflow.
Example
x <- 1:5
fun <- function(x) {
  for (i in x) if (i == 5) stop("ERROR")
  return(x^2)
}
tryCatch(fun(x), error = function(e) paste0("wrong", x))
This returns:
[1] "wrong1" "wrong2" "wrong3" "wrong4" "wrong5"
Multiple arguments
You indicated that you have both siteNumber and datatype to iterate over.
Using Map, we can define a function that takes two inputs:
Map(function(x, y) tryCatch(fun(x, y),
                            error = function(e) message(paste(x, " there was an error"))),
    x = siteNumber,
    y = datatype)
Using a for-loop, we can just iterate over them:
Streamflow <- vector(mode = "list", length = length(siteNumber))
for (i in seq_along(siteNumber)) {
  Streamflow[[i]] <- tryCatch(getstreamflow(siteNumber[i], datatype),
                              error = function(e) message(paste(siteNumber[i], " there was an error")))
}
Or, as suggested, just modify getstreamflow.
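A sketch of what that last option might look like, assuming getstreamflow loops over its sites internally; readNWISdv and the parameter code are illustrative stand-ins, not taken from the original function:
library(dataRetrieval)

getstreamflow <- function(siteNumber) {
  lapply(siteNumber, function(site) {
    tryCatch(
      readNWISdv(site, parameterCd = "00060"),  # 00060 = daily mean discharge
      error = function(e) {
        message(site, ": there was an error, skipping")
        NULL  # keep one slot per site so indices still line up
      }
    )
  })
}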

How to write out multiple files in R?

I am a newbie R user. I have a question about writing out multiple files with different names. Let's say that my data has the following structure:
IV_HAR_m1<-matrix(rnorm(1:100), ncol=30, nrow = 2000)
DV_HAR_m1<-matrix(rnorm(1:100), ncol=10, nrow = 2000)
I am trying to estimate multiple LASSO regressions. At the beginning I stored the iterations in one object called Dinamic_beta. This object was stored in a single file, which was updated with the required information on each iteration of my code.
To do this I was using stew, which belongs to the pomp package, but the whole process takes 5 or 6 days and I am worried about a power outage or a failure of my computer.
Now I want to save each iteration's environment in its own .rda file. I do not know how to do that, but the code I am using is the following:
library(glmnet)
library(Matrix)
library(pomp)

space <- 7  # the number of files I want to create
Dinamic_betas <- array(NA, c(10, 31, (nrow(IV_HAR_m1) - space)))
dimnames(Dinamic_betas) <- list(NULL, NULL, NULL)
set.seed(12345)
stew(  # stew saves the environment to a single file
  file = "Dinamic_LASSO_RD", {  # the name required by stew for creating one file with all information
    for (i in 1:dim(Dinamic_betas)[3]) {
      tryCatch(  # print messages
        expr = {
          cv_dinamic <- cv.glmnet(IV_HAR_m1[i:(space + i - 1), ],
                                  DV_HAR_m1[i:(space + i - 1), ],
                                  alpha = 1, family = "mgaussian",
                                  thresh = 1e-08, maxit = 10^9)
          LASSO_estimation_dinamic <- glmnet(IV_HAR_m1[i:(space + i - 1), ],
                                             DV_HAR_m1[i:(space + i - 1), ],
                                             alpha = 1, lambda = cv_dinamic$lambda.min,
                                             family = "mgaussian")
          coefs <- as.matrix(do.call(cbind, coef(LASSO_estimation_dinamic)))
          Dinamic_betas[, , i] <- t(coefs)
        },
        error = function(e) {
          message("Caught an error!")
          print(e)
        },
        warning = function(w) {
          message("Caught a warning!")
          print(w)
        },
        finally = {
          message("All done, quitting.")
        }
      )
      if (i %% 400 == 0) print(i)
    }
  }
)
If someone can suggest another package that stores the outputs in different files, I will be grateful.
Try adding this just before the close of your loop:
save.image(paste0("Results_iteration_",i,".RData"))
This should save your entire workspace to disk on every iteration. You can then use load() to restore the workspace from any iteration. Let me know if this works.
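If writing the entire workspace every iteration proves too slow, a lighter alternative (a sketch using base R's saveRDS; the file name pattern and the every-400 cadence are illustrative) is to checkpoint only the results array inside the loop:
if (i %% 400 == 0) {
  # Checkpoint just the coefficient array rather than the whole workspace
  saveRDS(Dinamic_betas, file = paste0("Dinamic_betas_upto_", i, ".rds"))
}
After a crash, readRDS() on the newest checkpoint recovers everything computed up to that point.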

Calling R function in Vertica: Failure in UDx RPC call ... Exception in processPartitionForR: [0 (non-NA) cases]

I have created a transform UDF in R which runs a linear regression over a table partitioned by the Id of some entity. I tested it in the console and it worked flawlessly; the results made sense and all was good. However, in the practical setting (code run by the server, not by me manually) with the same data I always see the same error:
ERROR 3399: Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [remove_temperature_correlation] at [/scratch_a/release/24506/vbuild/vertica/OSS/UDxFence/RInterface.cpp:1387], error code: 0, message: Exception in processPartitionForR: [0 (non-NA) cases]
Here is the actual function:
require('splines')

timefy <- function(time) {
  time = as.POSIXct(time, tz = 'utc', origin = '1970-01-01T00:00:00Z')
  return(time)
}

remove_temperature_correlation <- function(data, params = list()) {
  names(data) = c('Time', 'Value', 'Temperature')
  # Check params
  df = params[['df']]
  if (is.null(df))
    df = 4
  degree = params[['degree']]
  if (is.null(degree))
    degree = 1
  # Convert Vertica timestamps to R's POSIXct format
  data$ct = timefy(data$Time)
  # Fit model
  formula = Value ~ bs(Temperature, df = df, degree = degree)
  fitmodel = lm(formula, data = data)
  data$NormalizedValue = residuals(fitmodel)
  return(data[c('Time', 'Value', 'Temperature', 'NormalizedValue')])
}

remove_temperature_correlation_parameters <- function() {
  num_params = 2
  param = data.frame(datatype = rep(NA, num_params),
                     length = rep(NA, num_params),
                     scale = rep(NA, num_params),
                     name = rep(NA, num_params))
  param[1, 1] = 'int'
  param[1, 4] = 'df'
  param[2, 1] = 'int'
  param[2, 4] = 'degree'
  return(param)
}

remove_temperature_correlation_return_type <- function(x, param) {
  num_params = 4
  param = data.frame(datatype = rep(NA, num_params),
                     length = rep(NA, num_params),
                     scale = rep(NA, num_params),
                     name = rep(NA, num_params))
  param[1, 1] = 'timestamptz'
  param[1, 4] = 'Time'
  param[2, 1] = 'float'
  param[2, 4] = 'Value'
  param[3, 1] = 'float'
  param[3, 4] = 'Temperature'
  param[4, 1] = 'float'
  param[4, 4] = 'NormalizedValue'
  return(param)
}

remove_temperature_correlation_factory <- function() {
  list(name = remove_temperature_correlation,
       udxtype = c('transform'),
       # time, value, temperature
       intype = c('timestamptz', 'float', 'float'),
       outtype = c('any'),
       outtypecallback = remove_temperature_correlation_return_type,
       parametertypecallback = remove_temperature_correlation_parameters,
       volatility = c('stable'),
       strict = c('called_on_null_input'))
}
I tried to simulate a situation that causes a similar error in a test environment and found that supplying the function with just a single row raises the same error. I'm very new to Vertica and its UDFs, so it would really help to get some ideas on how to debug this further and what could be the cause. Some googling and conversations led me to believe that the cause lies in the way the data is partitioned (maybe some empty partitions arriving at the UDF?).
Here is how I call the UDF:
create local temporary table TEMP_table
on commit preserve rows
as
SELECT
    Id,
    remove_temperature_correlation(Time, Value, Temperature USING PARAMETERS df = 4, degree = 1)
        OVER (partition by Id order by Time)
FROM temp_input;
The temp_input table is simply storing the corresponding data in multiple rows.
What could be the way to solve it? Any ideas on how to find out where exactly this error happens and how to handle it?
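The "[0 (non-NA) cases]" text is the error R's lm() raises when no complete (non-NA) rows remain to fit, so a single-row or otherwise degenerate partition is a plausible trigger. A sketch of guarding the fit inside the UDF (the minimum-row threshold below is an illustrative assumption, not part of the original code):
remove_temperature_correlation <- function(data, params = list()) {
  names(data) = c('Time', 'Value', 'Temperature')
  df = params[['df']]
  if (is.null(df)) df = 4
  degree = params[['degree']]
  if (is.null(degree)) degree = 1
  data$ct = timefy(data$Time)
  data$NormalizedValue = NA_real_
  ok = complete.cases(data$Value, data$Temperature)
  # bs() with df degrees of freedom needs more than df usable observations
  if (sum(ok) > df + 1) {
    fitmodel = lm(Value ~ bs(Temperature, df = df, degree = degree), data = data[ok, ])
    data$NormalizedValue[ok] = residuals(fitmodel)
  }
  return(data[c('Time', 'Value', 'Temperature', 'NormalizedValue')])
}
Partitions too small to fit then come back with NormalizedValue set to NA instead of aborting the whole query.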

Unknown error on Facebook API through R.

I'm trying to download all the posts from a Facebook page through Rfacebook, but when the page has a high number of posts (over 400 or so), the script stops, returning the error
"Error in callAPI(url = url, token = token) : An unknown error has occurred." at the line where I call getPage.
library(Rfacebook)
library(stringr)
load("fb_oauth")
token=fb_oauth
page<-getPage("bicocca", token, n = 100000, since = NULL, until = NULL, feed = TRUE)
noSpaceMsg<-str_replace_all(page$message, "[\r\n]" , "")
output<-as.data.frame(cbind(page$from_name,page$id, noSpaceMsg, page$created_time, page$type, page$link, page$likes_count, page$comments_count, page$shares_count))
colnames(output)<-c("username","msgid", "message", "created_time", "type", "link", "likes", "comments", "shares")
write.csv(output, "bicocca.csv", row.names=FALSE)
Where is the problem? How can I fix it?
It seems to be a problem with the API, not with the R package. When I try to do the query in the Graph API Explorer, I get an error too. No idea why.
One way around this is to query month by month, wrapping the getPage function in a try command:
page <- 'bicocca'
dates <- seq(as.Date("2010/10/01"), as.Date("2015/04/20"), by = "month")
n <- length(dates) - 1
df <- list()
for (i in 1:n) {
  cat(as.character(dates[i]), " ")
  try(df[[i]] <- getPage(page, token, since = dates[i], until = dates[i + 1]))
  cat("\n")
}
df <- do.call(rbind, df)
df <- do.call(rbind, df)
This will not give you all the posts, but probably most of them.
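One caveat: months where try() fails are simply left as NULL entries in df, and do.call(rbind, df) drops them silently. Run before the final rbind, a quick check (illustrative) shows which months were skipped:
failed_months <- dates[which(sapply(df, is.null))]
failed_months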

Handling internet connection in R

I'm trying to download several stocks from Google, but every time the connection drops, R stops the loop. How can I handle this problem?
stocks <- c(
'MSFT',
'GOOG',
...
)
for (symbol in stocks) {
  stock_price <- getSymbols(symbol, src = 'google', from = startDate, to = endDate,
                            auto.assign = FALSE)
  prices[, j] <- stock_price[, 1]
  j <- j + 1
}
From the R manual quandmod.pdf for quantmod:
"If auto.assign=FALSE or env=NULL (as of 0.4-0) the data will be returned from the call, and will require the user to assign the results himself. Note that only one symbol at a time may be requested when auto assignment is disabled."
You are trying to request more than one ticker symbol at a time with the auto.assign parameter set to FALSE, and this is not allowed. However, you should be able to obtain all your symbols at once by adapting the following code:
data <- new.env()
getSymbols(stocks, src = 'google', from = startDate, to = endDate, env = data, auto.assign = TRUE)
plot(data$MSFT)
Pay careful attention to the R manual for getSymbols:
"Data is fetched through one of the available getSymbols methods and saved in the env specified - the .GlobalEnv by default."
