PLM is not recognizing my id variable name - r

I'm doing a regression analysis with fixed effects using plm() from the plm package. I selected the twoways effect to account for both time and individual effects. However, after running the code below I keep receiving this message:
Error in pdata.frame(data, index) :
variable id does not exist (individual index)
Here is the code:
pdata <- DATABASE[,c(2:4,13:21)]
pdata$id <- group_indices(pdata,ISO3.p,Productcode)
coutnin <- dcast.data.table(pdata,ISO3.p+Productcode~.,value.var = "id")
setcolorder(pdata,neworder=c("id","Year"))
pdata <- pdata.frame(pdata,index=c("id","Year"))
reg <- plm(pdata,diff(TV,1) ~ diff(RERcp,1)+diff(GDPR.p,1)-diff(GDPR.r,1), effect="twoways", model="within", index = c("id","Year"))
Note that the structure of pdata shows multiple levels in the id variable, which is numeric; I initially tried a character variable but received the same outcome:
Classes ‘data.table’ and 'data.frame': 1211800 obs. of 13 variables:
$ id : int 4835 6050 13158 15247 17164 18401 19564 23553 24895 27541 ...
$ Year : int 1996 1996 1996 1996 1996 1996 1996 1996 1996 1996 ...
$ Productcode: chr "101" "101" "101" "101" ...
$ ISO3.p : Factor w/ 171 levels "ABW","AFG","AGO",..: 8 9 20 22 27 28 29 34 37 40 ...
$ e : num 0.245 -0.238 1.624 0.693 0.31 ...
$ RERcp : num -0.14073 -0.16277 1.01262 0.03908 -0.00243 ...
$ RERpp : num -0.1712 NA NA NA -0.0952 ...
$ RER_GVC : num -3.44 NaN NA NA NaN ...
$ GDPR.p : num 27.5 26.6 23.5 20.3 27.8 ...
$ GDPR.r : num 30.4 30.4 30.4 30.4 30.4 ...
$ GVCPos : num 0.141 0.141 0.141 0.141 0.141 ...
$ GVCPar : num 0.436 0.436 0.436 0.436 0.436 ...
$ TV : num 17.1 17.1 17.1 17.1 17.1 ...
- attr(*, ".internal.selfref")=<externalptr>
When I convert the data.table into a pdata.frame I do not receive any warning; the error appears only after I run the plm function. Running View(table(index(pdata), useNA = "ifany")) shows no value larger than 1, so I assume there are no duplicate observations in my data.

Put the data argument in the second position of the plm() call; the first argument is the formula. Since pdata has already been converted to a pdata.frame, you can also leave out the index argument, i.e. try this:
reg <- plm(diff(TV,1) ~ diff(RERcp,1)+diff(GDPR.p,1)-diff(GDPR.r,1), data = pdata, effect = "twoways", model = "within")
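As a sanity check that the argument order is what matters (formula first, then data), here is a minimal sketch using the Grunfeld panel that ships with plm rather than your data:
library(plm)
data("Grunfeld", package = "plm")                 # built-in firm/year panel
gp <- pdata.frame(Grunfeld, index = c("firm", "year"))
reg <- plm(inv ~ value + capital, data = gp,      # formula first, data second
           effect = "twoways", model = "within")
summary(reg)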

Related

Error when running boxcox on response variable

I'm using the following code to try to transform my response variable for regression; it seems to need a log transformation.
bc = boxCox(auto.tf.lm)
lambda.mpg = bc$x[which.max(bc$y)]
auto.tf.bc <- with(auto_mpg, data.frame(log(mpg), as.character(cylinders), displacement**.2, log(as.numeric(horsepower)), log(weight), log(acceleration), model_year))
auto.tf.bc.lm <- lm(log(mpg) ~ ., data = auto.tf.bc)
View(auto.tf.bc)
I am receiving this error though.
Error in Math.data.frame(mpg) :
non-numeric variable(s) in data frame: manufacturer, model, trans, drv, fl, class
I'm not sure how to resolve this. The data is in a data frame, not a CSV file.
Here's the output from str(auto.tf.bc). Sorry for such bad question formatting.
'data.frame': 392 obs. of 7 variables:
$ log.mpg. : num 2.89 2.71 2.89 2.77 2.83 ...
$ as.character.cylinders.: chr "8" "8" "8" "8" ...
$ displacement.0.2 : num 3.14 3.23 3.17 3.14 3.13 ...
$ log.horsepower. : num 4.87 5.11 5.01 5.01 4.94 ...
$ log.weight. : num 8.16 8.21 8.14 8.14 8.15 ...
$ log.acceleration. : num 2.48 2.44 2.4 2.48 2.35 ...
$ model_year : num 70 70 70 70 70 70 70 70 70 70 ...
Removing the cylinders column doesn't change anything.
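One observation, not from the original thread: the variables named in the error (manufacturer, model, trans, drv, fl, class) are exactly the columns of ggplot2's mpg dataset, which suggests that log(mpg) inside with(auto_mpg, ...) resolved to that data frame rather than to an mpg column of auto_mpg. A minimal sketch that reproduces an error of the same shape:
library(ggplot2)   # provides the example data frame `mpg`
try(log(mpg))      # dispatches to Math.data.frame and fails, because mpg here is a
                   # whole data frame with character columns, not a numeric vector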

PLSR in R with "pls" package

I'm trying to fit a PLSR model, but I'm doing something wrong. Below you can see how I created the data frame and its structure.
reflektance <- read_excel("data/reflektance.xlsx", na = "NA")
reflektance <- dput(reflektance)
pH <- read_excel("data/rijen2016.xls", na = "NA")
pH <- na.omit(pH)
pH <- dput(pH)
reflektance<-aggregate(reflektance[, 2:753], list(reflektance$Vzorek), mean)
colnames(reflektance)[colnames(reflektance)=='Group.1']<-'Vzorek'
datapH <- merge(pH, reflektance, by="Vzorek")
datasetpH <- data.frame(pH=datapH[,2], ref=I(as.matrix(datapH[, 3:754], 22, 752)))
The problem arises when calling plsr(), which produces this error:
ph1<-plsr(pH ~ ref, ncomp = 5, data=datasetpH)
Error in pls::mvr(ref ~ pH, ncomp = 5, data = datasetpH, method = "kernelpls") :
Invalid number of components, ncomp
dput(reflektance):
https://jpst.it/RyyS
Here is the structure of the table datapH:
'data.frame': 22 obs. of 754 variables:
$ Vzorek: chr "5 - P01" "5 - P02" "5 - P03" "5 - R1 - A1" ...
$ pH/H2O: num 6.96 6.62 7.02 5.62 5.97 6.12 5.64 5.81 5.61 5.47 ...
$ 325 : num 0.017 0.0266 0.0191 0.0241 0.016 ...
$ 326 : num 0.021 0.0263 0.0154 0.0264 0.0179 ...
$ 327 : num 0.0223 0.0238 0.0147 0.028 0.0198 ...
...
And here is the structure of the table datasetpH:
'data.frame': 22 obs. of 2 variables:
$ pH : num 6.96 6.62 7.02 5.62 5.97 6.12 5.64 5.81 5.61 5.47 ...
$ ref: AsIs [1:22, 1:752] 0.016983.... 0.026556.... 0.019059.... 0.024097.... 0.016000.... ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "325" "326" "327" "328" ...
Do you have any advice or a solution? Thank you.
The problem seems to come from one of your columns containing only NA's.
The last line of the output of names(df) gives:
[745] "1068" "1069" "1070" "1071" "1072" "1073" "1074" "1075" NA
Using your data plus some randomly generated values for pH (which isn't in the reflektance data frame, called df here):
test=data.frame(pH=rnorm(23,5,2), ref=I(as.matrix(df[, 2:752], 22, 751)))
pls::plsr(pH ~ ref, data=test)
Error in matrix(0, ncol = ncomp, nrow = npred) :
invalid 'ncol' value (< 0)
Note that the indexing is a bit different from yours. I didn't have the second column in df (the one that contains pH in yours).
If I remove the last column, which contains only NA's:
test = data.frame(pH = rnorm(23, 5, 2), ref = I(as.matrix(df[, 2:751])))
pls::plsr(pH ~ ref, data=test)
Partial least squares regression , fitted with the kernel algorithm.
Call:
plsr(formula = pH ~ ref, data = test)
Let me know if that fixes it.
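A more general guard, sketched here rather than taken from the original answer: drop every column that is entirely NA before building the matrix, so the indices do not have to be adjusted by hand. The column positions below assume df is laid out as above (Vzorek in column 1, reflectance bands after it):
df_clean <- df[, colSums(!is.na(df)) > 0, drop = FALSE]   # keep columns with at least one value
test <- data.frame(pH  = rnorm(nrow(df_clean), 5, 2),     # random pH, as in the example above
                   ref = I(as.matrix(df_clean[, -1])))    # everything except Vzorek
pls::plsr(pH ~ ref, data = test)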

Analysis of PCA

I'm using the rela package to check whether I can use PCA on my data.
paf.neur2 <- paf(neur2)
summary(paf.neur2)
# [1] "Your dataset is not a numeric object."
I want to see the KMO (Kaiser-Meyer-Olkin measure of sampling adequacy). How can I do that?
Output of str(neur2)
'data.frame': 1457 obs. of 66 variables:
$ userid : int 200 387 458 649 931 991 1044 1075 1347 1360 ...
$ funct : num 3.73 3.79 3.54 3.04 3.81 ...
$ pronoun: num 2.26 2.55 2.49 1.98 2.71 ...
.
.
.
$ time : num 1.68 1.87 1.51 1.03 1.74 ...
$ work : num 0.7419 0.2311 -0.1985 -1.6094 -0.0619 ...
$ achieve: num 0.174 0.2469 0.1823 -0.478 -0.0513 ...
$ leisure: num 0.2852 0.0296 0.0583 -0.3567 -0.0408 ...
$ home : num -0.844 -0.58 -0.844 -2.207 -1.079 ...
.
All variables are numeric.
According to ?paf, the object argument must be "a numeric dataset (usually a coerced matrix from a prior data frame)".
So you need to turn your data.frame neur2 into a matrix: as.matrix(neur2).
Here is a reproduction of your problem using the Seatbelts dataset:
library(rela)
Belts <- Seatbelts[,1:7]
class(Belts)
# [1] "mts" "ts" "matrix"
Belts <- as.data.frame(Belts)
class(Belts)
# [1] "data.frame"
paf.belt <- paf(Belts)
[1] "Your dataset is not a numeric object."
Belts <- as.matrix(Belts)
class(Belts)
# [1] "matrix"
paf.belt <- paf(Belts) # Works
Here are two options that will compute the KMO for you:
kmo_DIY <- function(df){
  library(corpcor)                            # for cor2pcor()
  csq <- cor(df)^2                            # squared correlations
  csumsq <- (sum(csq) - dim(csq)[1]) / 2      # sum of squared off-diagonal correlations
  pcsq <- cor2pcor(cor(df))^2                 # squared partial correlations
  pcsumsq <- (sum(pcsq) - dim(pcsq)[1]) / 2   # sum of squared off-diagonal partial correlations
  kmo <- csumsq / (csumsq + pcsumsq)          # Kaiser-Meyer-Olkin measure
  return(kmo)
}
or use the function KMO() from the psych package.
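A short usage sketch for both routes, assuming neur2 is the data frame from the question (the userid column is dropped, since an identifier should not enter the adequacy measure):
neur2.mat <- as.matrix(neur2[, -1])   # drop userid; numeric matrix as paf() expects
kmo_DIY(neur2.mat)                    # overall KMO from the DIY function above
library(psych)
KMO(neur2.mat)                        # overall MSA plus a per-item MSA for each variable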

read.table returns extra stuff on last column

I am trying to read the table from the following URL:
url <- 'http://faculty.chicagobooth.edu/ruey.tsay/teaching/introTS/m-ge3dx-4011.txt'
da <- read.table(url, header = TRUE, fill=FALSE, strip.white=TRUE)
I can look at the data using head:
> head(da)
date ge vw ew sp
1 19400131 -0.061920 -0.024020 -0.019978 -0.035228
2 19400229 -0.009901 0.013664 0.029733 0.006639
3 19400330 0.049333 0.018939 0.026168 0.009893
4 19400430 -0.041667 0.001196 0.013115 -0.004898
5 19400531 -0.197324 -0.220314 -0.269754 -0.239541
6 19400629 0.061667 0.066664 0.066550 0.076591
This works fine for the first four columns; for example, I can look at the column ew:
> head(da$ew)
[1] -0.019978 0.029733 0.026168 0.013115 -0.269754 0.066550
but when I try to access the last one, I get extra output that is not in the txt file.
> head(da$sp)
[1] -0.035228 0.006639 0.009893 -0.004898 -0.239541 0.076591
859 Levels: -0.000060 -0.000143 -0.000180 -0.000320 -0.000659 -0.000815 ... 0.163047
How do I get rid of the extra output? Thanks!
This is the printed representation of a factor.
> str(da)
'data.frame': 861 obs. of 5 variables:
$ date: int 19400131 19400229 19400330 19400430 19400531 19400629 19400731 19400831 19400930 19401031 ...
$ ge : num -0.0619 -0.0099 0.0493 -0.0417 -0.1973 ...
$ vw : num -0.024 0.0137 0.0189 0.0012 -0.2203 ...
$ ew : num -0.02 0.0297 0.0262 0.0131 -0.2698 ...
$ sp : Factor w/ 859 levels "-0.000060","-0.000143",..: 226 411 445 42 353 828 613 585 441 684 ...
Row 58 has a dot instead of a number. That single non-numeric entry is enough for R to read the whole column in as a factor. Once you change the dot to NA (or fix the value) in the source file, the data will read in fine.
Another option is to repair the column after the data has been read in by converting it to character and then to numeric. The following statement coerces the "." to NA (with a warning about NAs introduced by coercion):
da$sp <- as.numeric(as.character(da$sp))
> str(da)
'data.frame': 861 obs. of 5 variables:
$ date: int 19400131 19400229 19400330 19400430 19400531 19400629 19400731 19400831 19400930 19401031 ...
$ ge : num -0.0619 -0.0099 0.0493 -0.0417 -0.1973 ...
$ vw : num -0.024 0.0137 0.0189 0.0012 -0.2203 ...
$ ew : num -0.02 0.0297 0.0262 0.0131 -0.2698 ...
$ sp : num -0.03523 0.00664 0.00989 -0.0049 -0.23954 ...
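A third route, sketched here rather than taken from the original answer: declare the dot as a missing-value marker when reading, so the column arrives as numeric in the first place.
da <- read.table(url, header = TRUE, na.strings = ".", strip.white = TRUE)
str(da$sp)   # num, with an NA where the dot was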

am I using the wrong data type with predict.nnet() in R [closed]

My lack of understanding of R has brought my work to a halt, so I'm seeking your help. I'm looking to build a neural network from some time series data and then build a prediction using separate data and the model returned by the trained neural network.
I created an xts containing the dependent variable nxtCl (a one-day forward closing stock price) and the independent variables (a set of corresponding prices and technical indicators).
I split the xts in two: one set of training data and one for testing/prediction, miData.train and miData.test respectively. I then converted these two xts objects into scaled data frames.
miData.train <- scale(as.data.frame(miData.train))
miDate.test <- scale(as.data.frame(miData.test))
Using the package nnet I am able to build a neural network from the training data:
nn <- nnet(nxtCl ~ .,data=miData.train,linout=T,size=10,decay=0.001,maxit=10000)
The str() output for the returned nnet object is:
> str(nn)
List of 18
$ n : num [1:3] 11 10 1
$ nunits : int 23
$ nconn : num [1:24] 0 0 0 0 0 0 0 0 0 0 ...
$ conn : num [1:131] 0 1 2 3 4 5 6 7 8 9 ...
$ nsunits : num 22
$ decay : num 0.001
$ entropy : logi FALSE
$ softmax : logi FALSE
$ censored : logi FALSE
$ value : num 4.64
$ wts : num [1:131] 2.73 -1.64 1.1 2.41 1.36 ...
$ convergence : int 0
$ fitted.values: num [1:901, 1] -0.465 -0.501 -0.46 -0.431 -0.485 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:901] "2005-07-15" "2005-07-18" "2005-07-19" "2005-07-20" ...
.. ..$ : NULL
$ residuals : num [1:901, 1] -0.0265 0.0487 0.0326 -0.0384 0.0632 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:901] "2005-07-15" "2005-07-18" "2005-07-19" "2005-07-20" ...
.. ..$ : NULL
$ call : language nnet.formula(formula = nxtCl ~ ., data = miData.train, inout = T, size = 10, decay = 0.001, maxit = 10000)
$ terms : language nxtCl ~ Op + Hi + Lo + Cl + vul + smaten + smafif + smath + vol + rsi + dvi
$ coefnames : chr [1:11] "Op" "Hi" "Lo" "Cl" ...
$ xlevels : Named list()
- attr(*, "class")= chr [1:2] "nnet.formula" "nnet"
I then try to run the prediction using this model nn and the data I kept separate, miData.test, with the following call:
preds <- predict(object=nn, miData.test)
and I get the following error:
Error in terms.default(object, data = data) :
no terms component nor attribute
Running terms.default on miData.test, I see that my data frame does not have a terms component or attribute:
terms.default(miData.test)
Error in terms.default(miData.test) : no terms component nor attribute
but is this why the prediction will not run?
miData.test has names that match the terms of nn:
> nn$terms
nxtCl ~ Op + Hi + Lo + Cl + vul + smaten + smafif + smath + vol +
rsi + dvi
> names(miData.test)
[1] "Op" "Hi" "Lo" "Cl" "vul" "smaten" "smafif" "smath" "vol" "rsi" "dvi" "nxtCl"
And, in terms of structure, the data is exactly the same as the data used to build nn in the first place. I tried adding my own named attributes to miData.test, matching the terms of nn, but that did not work. The str() of miData.test returns:
> str(miData.test)
'data.frame': 400 obs. of 12 variables:
$ Op : num 82.2 83.5 80.2 79.8 79.8 ...
$ Hi : num 83.8 84.2 83 79.9 80.2 ...
$ Lo : num 81 82.7 79.2 78.3 78 ...
$ Cl : num 83.7 82.8 79.2 79 78.2 ...
$ vul : num 4.69e+08 2.94e+08 4.79e+08 3.63e+08 3.17e+08 ...
$ smaten: num 84.1 84.1 83.8 83.3 82.8 ...
$ smafif: num 86.9 86.8 86.7 86.6 86.4 ...
$ smath : num 111 111 111 110 110 ...
$ vol : num 0.335 0.341 0.401 0.402 0.382 ...
$ rsi : num 45.7 43.6 36.6 36.3 34.7 ...
$ dvi : num 0.00968 0.00306 -0.01575 -0.01189 -0.00623 ...
$ nxtCl : num 82.8 79.2 79 78.2 77.4 ...
Any help or insight in getting predict() to work in this instance would be greatly appreciated. Thanks.
Here's some reproducible code. In putting this together, I have 'removed' the error. Unfortunately, although it now works, I am none the wiser as to what was causing the problem before:
require(quantstrat)
require(PerformanceAnalytics)
require(nnet)
initDate <- "2004-09-30"
endDate <- "2010-09-30"
symbols <- c("SPY")
getSymbols(symbols, from=initDate, to=endDate, index.class=c("POSIXt","POSIXct"))
rsi <- RSI(Cl(SPY))
smaTen <- SMA(Cl(SPY))
smaFif <- SMA(Cl(SPY),n=50)
nxtCl <- lag(Cl(SPY),-1)
tmp <- SPY[,-5]
tmp <- tmp[,-5]
miData <- merge(tmp,rsi,smaTen,smaFif,nxtCl)
names(miData) <- c("Op","Hi","Lo","Cl","rsi","smaTen","smaFif","nxtCl")
miData <- miData[50:1512]
scaled.miData <- scale(miData)
miData.train <- as.data.frame(scaled.miData[1:1000, ])     # rows 1-1000 for training
miData.test <- as.data.frame(scaled.miData[1001:1463, ])   # remaining rows for testing
nn <- nnet(nxtCl ~ .,data=miData.train,linout=T,size=10,decay=0.001,maxit=10000)
preds <- predict(object=nn, miData.test)
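For anyone who hits the original "no terms component nor attribute" error, a quick sanity check, sketched here with the nn and miData.test objects from above, is to confirm that every predictor the fitted model expects appears by name in the new data before calling predict():
stopifnot(all(nn$coefnames %in% names(miData.test)))   # nn$coefnames lists the trained predictors
preds <- predict(nn, newdata = miData.test)
head(preds)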
