R quantmod data merge regression error

This R code throws an error, namely
Error in .xts(e, .index(e1), .indexCLASS = indexClass(e1),
.indexFORMAT = indexFormat(e1), : index length must match number
of observations
Code:
library('quantmod')
library('foreach')
JNK <- getSymbols('JNK', from='2010-01-01',auto.assign=FALSE)[,6]
GSPC <- getSymbols('^GSPC', from='2010-01-01',auto.assign=FALSE)[,6]
JNK <- diff(log(JNK))
GSPC <- diff(log(GSPC))
Data <- na.omit(merge(JNK,GSPC, all=FALSE))
m <- lm(JNK ~ GSPC, data=Data)
plot(m)
Could anyone help me figure out what I'm doing wrong?

The actual column names of Data are JNK.Adjusted and GSPC.Adjusted. Hence, you should specify the complete names in the lm call:
m <- lm(JNK.Adjusted ~ GSPC.Adjusted, data=Data)
plot(m)
Otherwise, lm does not find columns named JNK and GSPC in Data and falls back to the xts objects of the same names in your workspace, which is what later trips up the plot function.
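A quick way to catch this kind of mismatch before fitting is to compare the variables in the formula with the column names of the data. The sketch below uses simulated returns in place of the downloaded prices (the values and the 50-row size are made up for illustration); only the column names mirror the ones above.

```r
# Simulated stand-in for the merged return series (values are made up)
set.seed(1)
Data <- data.frame(JNK.Adjusted  = rnorm(50, sd = 0.01),
                   GSPC.Adjusted = rnorm(50, sd = 0.01))

# Variables named in each formula that are NOT columns of Data
missing_short <- setdiff(all.vars(JNK ~ GSPC), colnames(Data))
missing_full  <- setdiff(all.vars(JNK.Adjusted ~ GSPC.Adjusted), colnames(Data))

print(missing_short)  # the short names are absent from Data
print(missing_full)   # empty: every variable is a column of Data

# With the full names, every variable is taken from Data itself
m <- lm(JNK.Adjusted ~ GSPC.Adjusted, data = Data)
```

If the first check returns anything, lm will look for those names outside the data argument, which is rarely what you want.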

Related

R: Problem with raster prediction from a linear model

I am using the function raster::predict to extract the prediction part of a linear model as a raster but I am getting this error:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : object is not a matrix
In addition: Warning message:
'newdata' had 622 rows but variables found have 91 rows
My data set is a RasterStack of two satellite images (same CRS and data type). I have found this question but I couldn't solve my problem.
Here is the code and the data:
library(raster)
ntl = raster ("path/ntl.tif")
vals_ntl <- as.data.frame(values(ntl))
ntl_coords = as.data.frame(xyFromCell(ntl, 1:ncell(ntl)))
combine <- as.data.frame(cbind(ntl_coords,vals_ntl))
ebbi = raster ("path/ebbi.tif")
ebbi <- resample(ebbi, ntl, method = "bilinear")
vals_ebbi <- as.data.frame(values(ebbi))
s = stack(ntl, ebbi)
block.data <- as.data.frame(cbind(combine, vals_ebbi))
names(block.data)[3] <- "ntl"
names(block.data)[4] <- "ebbi"
block.data <- na.omit(block.data)
model <- lm(formula = ntl ~ ebbi, data = block.data)
#predict to a raster
r1 <- raster::predict(s, model, progress = 'text', na.rm = T)
plot(r1)
writeRaster(r1, filename = "path/lm_predict.tif")
The data can be downloaded from here (I don't know if by sharing a smaller dataset the problem would still exist so I decided to share the full dataset which is quite big when using the dput command to copy-paste it)
You are correct that dput is generally not very useful for spatial data; and that you should avoid using it. However, in most cases, there is no need to share data as you can create example data with code, or with data that ships with R, like in most examples in the help files and questions on this site. Saying that "I don't know if by sharing a smaller dataset the problem would still exist" suggests that the first thing you should do is to find out.
If you have a SpatRaster x that you want to reproduce, you can start with as.character(x), which is what I did to get the below.
library(terra)
ntl <- rast(ncols=48, nrows=91, nlyrs=1, xmin=582360, xmax=604440, ymin=1005560, ymax=1047420, names=c('avg_rad'), crs='EPSG:7767')
ebbi <- rast(ncols=48, nrows=91, nlyrs=1, xmin=582360, xmax=604440, ymin=1005560, ymax=1047420, names=c('B6_median'), crs='EPSG:7767')
values(ntl) <- sample(100, ncell(ntl), replace=TRUE)
values(ebbi) <- runif(ncell(ebbi))
Combine and set the names. For larger datasets you could take a regular sample with spatSample(x, size, method="regular") instead of using all cell values.
x <- c(ntl, ebbi)
names(x) <- c("ntl", "ebbi")
Fit the model. You can do that in two steps
v <- as.data.frame(x, na.rm=TRUE)
model <- lm(ntl ~ ebbi, data=v)
Or in one step
model <- lm(ntl ~ ebbi, data=x)
And now predict (set a filename if you want to save the raster to disk).
p <- predict(x$ebbi, model, filename="")
It is important that the first (SpatRaster) argument to predict has names that match the names in the model. So in this case you can use x$ebbi or x[[2]], but if you use ebbi you get a mysterious error message
p <- predict(ebbi, model)
#Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : object is not a matrix
#In addition: Warning message:
#'newdata' had 48 rows but variables found have 91 rows
unless you first do
names(ebbi) <- "ebbi"
p <- predict(ebbi, model)
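The same name-matching rule can be seen with plain data frames, without any raster package. The following is a minimal base R sketch, not the terra code itself: the data frames good and bad and their sizes are hypothetical, and B6_median is reused only to mimic a layer whose name differs from the model term.

```r
set.seed(42)
# Hypothetical training data with the names used in the model
v <- data.frame(ntl = sample(100, 200, replace = TRUE), ebbi = runif(200))
model <- lm(ntl ~ ebbi, data = v)

# Names match the model term: one prediction per row of newdata
good <- data.frame(ebbi = runif(10))
p_good <- predict(model, newdata = good)

# Names do not match: predict() cannot find 'ebbi' in newdata
bad <- data.frame(B6_median = runif(10))
p_bad <- tryCatch(predict(model, newdata = bad),
                  error = function(e) conditionMessage(e))

# A pre-flight check: predictors of the model that newdata does not provide
missing_vars <- setdiff(all.vars(delete.response(terms(model))), names(bad))
print(missing_vars)
```

When a predictor is missing from the new data, R looks for an object of that name elsewhere in the workspace. That is presumably why the raster case reports a row-count mismatch rather than a plain "not found" error, since an object called ebbi does exist there.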
Alternatively, using the raster package, the solution is:
library(raster)
ntl = raster ("path/ntl.tif")
ebbi = raster ("path/ebbi.tif")
ebbi <- resample(ebbi, ntl, method = "bilinear")
s = stack(ntl, ebbi)
names(s) = c('ntl', 'ebbi') # important step in order to run the predict function successfully
block.data = data.frame(na.omit(values(s)))
names(block.data) <- c('ntl', 'ebbi')
model <- lm(formula = ntl ~ ebbi, data = block.data)
#predict to a raster
r1 <- raster::predict(s, model, progress = 'text', na.rm = TRUE)
plot(r1)
writeRaster(r1, filename = "path/lm_predict.tif")
I found the answer based on this post.

In R, `Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1)` but there are no Infs, no NaNs, no `char`s, etc

I am trying to use the lqmm package in R and receiving the error Error in f(arg, ...) : NA/NaN/Inf in foreign function call (arg 1). I can successfully use it for a version of my data in which a variable called cluster_name is averaged over.
I've tried to verify that there are no NaNs or infinite values in my dataset this way:
na_data = mydata
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # yields a dataframe with no observations
is.na(na_data) <- sapply(na_data, is.infinite)
new_DF <- na_data[rowSums(is.na(mydata)) > 0,] # still a dataframe with no observations
There are no variables in my dataframe that are type char -- every such variable has been converted to a factor.
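Note that the second check above still tests mydata rather than na_data, so it cannot detect the infinite values that were just recoded. As a hedged alternative, the sketch below counts non-finite values (NA, NaN, Inf, -Inf) per numeric column in one pass; the small data frame is a made-up stand-in for mydata, not the real dataset.

```r
# Made-up stand-in for mydata with one Inf, one NaN and a factor column
mydata <- data.frame(a = c(1, 2, Inf),
                     b = c(0.5, NaN, 3),
                     g = factor(c("x", "y", "z")))

# Restrict the check to numeric columns; is.finite() is FALSE for NA, NaN and +/-Inf
num_cols <- vapply(mydata, is.numeric, logical(1))
bad_counts <- vapply(mydata[num_cols],
                     function(col) sum(!is.finite(col)), integer(1))

# Row numbers that contain at least one non-finite numeric value
bad_rows <- which(rowSums(!sapply(mydata[num_cols], is.finite)) > 0)

print(bad_counts)
print(bad_rows)
```

If both results are empty on the real data, the input itself is clean and the non-finite value is more likely produced during fitting.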
When I run my model
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = begin_data, tau=.5, na.action=na.exclude)
on the first 12,528 lines of my dataset, the model works fine. Line 12,529 looks totally normal.
Similarly, if I run tail(mydata, 11943) I get a dataframe that runs without error, but tail(mydata, 11944) gives me a dataframe that generates the error. I can also run a subset from 9990:21825 without error, but extending the dataframe on either side generates the error. The whole dataframe is 29450 observations, and thus this middle slice contains the supposedly problematic observations. I tried making a smaller version of my dataset that contained just the borders of problems, and some observations around them, and I can see that 3/4 cases involve the same subject (7645), but I don't know what to make of that. I don't see how to make this reproducible without providing the whole dataframe (in case you were wondering, the small dataset doesn't cause any error). So here is the csv file I used.
Here is the function that gets the dataframe ready for analysis:
prep_data_set <- function(data_file, brain_var = 'beta', beh_var = 'accuracy') {
  data = read.csv(data_file)
  data$subject <- factor(data$subject)
  data$type <- factor(data$type)
  data$type <- relevel(data$type, ref = "S")
  data$taught <- factor(data$taught)
  data <- subset(data, data$run_num < 13)
  data$run = factor(data$run_num)
  brain_mean <- mean(data[[brain_var]])
  brain_sd <- sd(data[[brain_var]])
  beh_mean <- mean(data[[beh_var]])
  beh_sd <- sd(data[[beh_var]])
  data <- subset(data, data$cluster_name != "")
  data$cluster_name <- factor(data$cluster_name)
  data$mean_centered_brain <- data[[brain_var]]
  data$std_brain <- data$mean_centered_brain/brain_sd
  data$mean_centered_beh <- data[[beh_var]]
  data$std_beh <- data$mean_centered_beh/beh_sd
  return(data)
}
I run
mydata = prep_data_set(file.path(resdir, 'robust0005', 'pos_rel_con__all_clusters.csv'))
m1 = lqmm(std_brain ~ std_beh*type*taught, random = ~1, group=subject, data = mydata, tau=.5, na.action=na.exclude)
to generate the error.
By comparison
regular_model = lmer(std_brain ~ type*taught*std_beh + (1|subject/run) +
(1|subject:cluster_name), data = mydata)
runs fine.
I hope there is something interesting and generalizable in this question; I know it's kind of annoying to post to Stack Overflow with some idiosyncratic problem in a ~30000 line dataset.

Fixing missing data- how to transform table into ts object that works with KalmanRun?

I'm working with data from SteamCharts on a game- Warframe (https://steamdb.info/app/230410/graphs/)
Edit- The data is a .csv downloadable near "Steam charts for every day"
I'm modelling this timeseries data, but the package I'm using requires no missing values. To resolve this, I'm using arima to predict the missing values (instructions from link reproduced below)
https://stats.stackexchange.com/questions/104565/how-to-use-auto-arima-to-impute-missing-values
require(forecast)
# sample series
x0 <- x <- log(AirPassengers)
y <- x
# set some missing values
x[c(10,60:71,100,130)] <- NA
# fit model
fit <- auto.arima(x)
# Kalman filter
kr <- KalmanRun(x, fit$model)
# impute missing values Z %*% alpha at each missing observation
id.na <- which(is.na(x))
for (i in id.na)
y[i] <- fit$model$Z %*% kr$states[i,]
# alternative to the explicit loop above
sapply(id.na, FUN = function(x, Z, alpha) Z %*% alpha[x,],
Z = fit$model$Z, alpha = kr$states)
So far, I've managed to convert the date strings to Date objects in my data frame:
library(dplyr)  # provides %>% and select()
df <- read.csv(file="chart.csv", header=TRUE, sep=",")
df = df %>% select(DateTime, Players)
df[['DateTime']] <- as.Date(strptime(df[['DateTime']], format='%Y-%m-%d %H:%M:%S'))
Get an xts object of my data (I believe arima only works with ts though)
df = xts(df$Players, df$DateTime)
df = ts(df)
The arima model fits, but when I try to use the KalmanRun, I get the following error:
Error in KalmanRun(x, fit$model) : invalid argument type
I believe there's an issue in how I'm converting it to a timeseries object, but don't know how to resolve it. Any help would be greatly appreciated. Thanks!
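One thing that may be worth checking, as an assumption rather than a confirmed diagnosis, is the storage mode of the series: player counts read from a CSV are typically integers, and KalmanRun may reject a series that is not stored as double. The sketch below uses an integer copy of AirPassengers as a hypothetical stand-in for the player counts, uses stats::arima with an arbitrarily chosen seasonal order instead of auto.arima, and then follows the imputation steps quoted above.

```r
# Hypothetical stand-in for the player counts: an integer-valued monthly series
players <- ts(as.integer(AirPassengers), start = c(1949, 1), frequency = 12)
print(is.integer(players))

# Convert to double while keeping the ts attributes
x <- players
storage.mode(x) <- "double"

# Knock out some observations to imitate the gaps
x[c(10, 60:71, 100, 130)] <- NA

# Fit a fixed seasonal ARIMA (an arbitrary choice for this sketch), then run the filter
fit <- arima(x, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
kr <- KalmanRun(x, fit$model)

# Impute each gap as Z %*% state, as in the linked instructions
y <- x
id.na <- which(is.na(x))
for (i in id.na) y[i] <- fit$model$Z %*% kr$states[i, ]
```

Applied to the data frame in the question, the equivalent step would be to wrap the Players column in as.numeric() before building the ts object.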

Problems with raster prediction from linear model in r

I'm having problems with predicting a raster using a linear model.
First, I create my model from the data in my polygons.
# create model
poly <- st_read("polygon.shp")
df <- na.omit(poly)
df <- df[df$gdp > 0 & df$ntl2 > 0 & df$pop2 > 0,]
x <- log(df$ntl2)
y <- log(df$gdp*df$pop2)
c <- df$iso
d <- data.frame(x,y,c)
m <- lm(y~x+c,data=d)
Then I want to use raster::predict to estimate an output raster:
# raster data
iso <- raster("iso.tif")
viirs <- raster("viirs.tif")
x <- log(viirs)
c <- iso
## predict with models
s <- stack(x,c)
predicted <- raster::predict(x,model=m)
However, I get the following response:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
object is not a matrix
I don't know what the problem is or how to fix it. My current thoughts are that it has something to do with the factors/country codes:
My model includes country codes, because I would like to include country fixed effects. Maybe there is a problem with including these. However, even when I exclude the country codes from the model and the entire data frame, I still get the same error message.
Furthermore, my model is based on regional values from the whole world, while the prediction datasets only cover the extent of Turkey. Maybe this is the problem?
And here is the data:
https://drive.google.com/open?id=16cy7CJFrxQCTLhx-hXDNHJz8ej3vTEED
Perhaps it works if you do it like this:
iso <- raster("iso.tif")
viirs <- raster("viirs.tif")
s <- stack(log(viirs), iso)
names(s) <- c("x", "c")
predicted <- raster::predict(s, model=m)
It won't work if the values in df$iso and iso.tif don't match (is one a factor, and the other numeric?).
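The type mismatch mentioned above can be reproduced with plain data frames. This is a minimal base R sketch under made-up data: the country codes, the numeric code 792 and the coefficients are all hypothetical and only illustrate a model fitted with a factor and then given numeric values for the same variable.

```r
set.seed(1)
# Hypothetical training data with a country factor
d <- data.frame(x = rnorm(30),
                c = factor(rep(c("TUR", "GRC", "BGR"), 10)))
d$y <- 1 + 2 * d$x + as.integer(d$c) + rnorm(30, sd = 0.1)
m <- lm(y ~ x + c, data = d)

# New data where 'c' is a factor with the same levels: prediction works
new_ok <- data.frame(x = c(0, 1),
                     c = factor(c("TUR", "TUR"), levels = levels(d$c)))
p <- predict(m, new_ok)

# New data where 'c' holds numeric codes instead: prediction is refused
new_bad <- data.frame(x = c(0, 1), c = c(792, 792))
res <- tryCatch(predict(m, new_bad),
                error   = function(e) "error",
                warning = function(w) "warning")
print(res)
```

For a raster of numeric country codes, this suggests the codes would need to be mapped to the same factor levels that the model was fitted with.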

Error in panel regression in case of different independent variable r

I am trying to run Fama Macbeth regression by the following code:
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~max_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
It is working when I regress the data using the independent variable named 'max_1'. However when I change it and use another independent variable named 'ivol_1' the result is showing an error. The code is
require(foreign)
require(plm)
require(lmtest)
fpmg <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms" ))
Fama<-fpmg
coeftest(Fama)
the error message is like this:
Error in pmg(return ~ ivol_1, df_all_11, index = c("yearmonth", "firms")) :
Insufficient number of time periods
or sometimes the error is like this
Error in model.frame.default(terms(formula, lhs = lhs, rhs = rhs, data = data, :
object is not a matrix
For your convenience, I am sharing my data with you. The data link is
data frame
I am wondering why this is happening in case of the different variable in the same data frame. I would be grateful if you can solve this problem.
This problem can be solved with the mice function:
library(mice)
library(dplyr)
require(foreign)
require(plm)
require(lmtest)
df_all_11<-read.csv("df_all_11.csv.part",sep = ",",header = TRUE,stringsAsFactors = FALSE)
x<-data.frame(ivol_1=df_all_11$ivol_1,month=df_all_11$Month)
imputed_Data <- mice(x, m=3, maxit =5, method = 'pmm', seed = 500)
completeData <- complete(imputed_Data, 3)
df_all_11<-mutate(df_all_11,ivol_1=completeData$ivol_1)
fpmg2 <- pmg(return~ivol_1,df_all_11, index=c("yearmonth","firms"))
coeftest(fpmg2)
The problem occurs because the variable ivol_1 has many NA values, so you should impute them first and then run the pmg function.
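Before imputing, it can help to see how many values are actually missing and how many usable observations remain in each time period. The sketch below uses a small made-up panel (4 periods, 3 firms) in place of df_all_11; only the column names are taken from the question.

```r
# Made-up panel standing in for df_all_11
df <- data.frame(
  yearmonth = rep(1:4, each = 3),
  firms     = rep(c("A", "B", "C"), 4),
  max_1     = seq(0.1, 1.2, by = 0.1),
  ivol_1    = c(NA, NA, 0.2, NA, NA, NA, 0.1, 0.3, NA, 0.4, 0.5, 0.6)
)

# Share of missing values per regressor
na_share <- colMeans(is.na(df[c("max_1", "ivol_1")]))

# Number of non-missing ivol_1 observations in each period
usable_per_period <- tapply(complete.cases(df["ivol_1"]), df$yearmonth, sum)

print(na_share)
print(usable_per_period)
```

Periods with few or no usable rows are the ones a cross-sectional regression per period cannot estimate, which would be consistent with the "Insufficient number of time periods" message.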

Resources