Super noobish prolog arithmetic - math

I'm very new to Prolog. In a program I want to state that A1 = X1 + X2 + X3 + X4 and that this sum equals 16. I also want to state that X1 must equal 1. Note that X1 to Xn are a permutation of the numbers 1-9.
I wrote the code the following way
A1 is (X1 is 1) + X2 + X3 + X4, A1 =:= 16.
From this I get the error Arithmetic: `_7206 is 6' is not a function
It's the same if I replace the 'is' in (X1 is 1) with = or =:=.
P.S. I would prefer an answer that keeps the same formatting.

Related

How to count the # of correctly selected models by a feature selection method in its output structured as a list or df in R

I have run several optimal variable/model selection methods from machine/statistical learning on the same folder of 58,000 randomly generated synthetic datasets (CSV files, all of the same size) in order to compare which method most often selects the true underlying model. All of the scripts and many of the datasets can be found in my GitHub repository for this research project.
I already have the output/results I need. Each dataset's file name has the form n1-n2-n3-n4, where n1 ranges from 0 to 1, n2 from 3 to 15, n3 from 1 to 9, and n4 from 1 to 500. The data frame with the results looks like this:
> str(BM1_models)
'data.frame': 58000 obs. of 1 variable:
$ V1: chr "0-3-1-1; X1, X2, X3" "0-3-1-2; X1, X2, X3" "0-3-1-3; X1, X2, X3" "0-3-1-4; X1, X2, X3" ...
> head(BM1_models, n = 4)
V1
1 0-3-1-1; X1, X2, X3
2 0-3-1-2; X1, X2, X3
3 0-3-1-3; X1, X2, X3
4 0-3-1-4; X1, X2, X3
> tail(BM1_models, n = 4)
V1
57997 1-15-9-497; X2, X3, X4, X9, X10, X11, X13, X14
57998 1-15-9-498; X2, X3, X5, X6, X8, X9, X10, X11, X12, X15
57999 1-15-9-499; X3, X4, X5, X6, X8, X10, X11, X12, X15
58000 1-15-9-500; X2, X4, X6, X7, X8, X10, X11
Here is how to tell whether the variable selection method (in this case LASSO) is right for a given dataset: if n2 for that dataset is 3, the independent variables selected should be X1, X2, X3; if it is 4, the underlying structural model is X1, X2, X3, X4; and so on up to 15 (I explain what n1, n3, and n4 signify in the p.s. sections at the bottom). So I think I need something like a count inside a conditional, all within an lapply call, but I don't know exactly how.
p.s. Part1
The datasets & scripts are also available in this GitHub repository of mine which should be far easier to navigate than the first one I linked to.
p.s. Part2
To clarify, if n2 = 5 for a given dataset and the model chosen was X1, X2, X4, X5 (known as an omitted-variable model) or X1, X2, X3, X4, X5, X8, X9, etc. (known as an extraneous-variable model), it is not correct. Only a model that includes exactly the variables X1 through Xn2 should be counted; every other result should not be.
p.s. Part3
n1 indicates the amount of multicollinearity between factors in the true (underlying) structural regression equation, n3 indicates the error variance, and n4 indicates which of the 500 random variations out of all possible randomly generated datasets for each set of the other 3 parameters it is (this is a Monte Carlo Simulation).
If I get this right, the idea is to check if the second part of a string of the form 'X1, X2, ..., Xn' equals what should be expected based on the first part of that same string. I think the easiest way is to write a function that makes the comparison for any single string, then sapply it over the string vector:
# testing df, only first (good) and last (bad) entry
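df = data.frame(V1 = c('0-3-1-1; X1, X2, X3', '1-15-9-500; X2, X4, X6, X7, X8, X10, X11'), stringsAsFactors = FALSE)  # keep V1 as character; strsplit() fails on factors (default before R 4.0)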
good_model = function (str) {
str = unlist(strsplit(str, '; '))
desc = str[1]
pred = str[2]
n_2 = unlist(strsplit(desc, '-'))[2]
expt = paste0('X', 1:as.integer(n_2), collapse = ', ')
identical(pred, expt)
}
df$good = sapply(df$V1, good_model)
df
# V1 good
# 0-3-1-1; X1, X2, X3 TRUE
# 1-15-9-500; X2, X4, X6, X7, X8, X10, X11 FALSE
Note: I assumed the character after ; in the original strings is a space; if it is a <tab>, the first call to strsplit() should be updated accordingly.
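Since the question asks for a count, the logical vector produced by this check can simply be summed, and tapply() gives the count per value of n2. A minimal sketch, assuming the strings look exactly like the ones shown above; the four-row BM1_models here is a made-up stand-in for the real 58,000-row data frame, and vapply()/tapply() are my choice rather than part of the answer:

```r
# Hypothetical stand-in for BM1_models; the real data frame has 58,000 rows
BM1_models <- data.frame(
  V1 = c("0-3-1-1; X1, X2, X3",
         "0-5-1-1; X1, X2, X4, X5",
         "0-5-1-2; X1, X2, X3, X4, X5",
         "1-15-9-500; X2, X4, X6, X7, X8, X10, X11"),
  stringsAsFactors = FALSE
)

# Same idea as good_model() above: TRUE only for exactly X1, ..., Xn2
good_model <- function(str) {
  parts <- unlist(strsplit(str, "; "))
  n_2 <- as.integer(unlist(strsplit(parts[1], "-"))[2])
  identical(parts[2], paste0("X", seq_len(n_2), collapse = ", "))
}

correct <- vapply(BM1_models$V1, good_model, logical(1), USE.NAMES = FALSE)

# Overall number of correctly selected models
n_correct <- sum(correct)

# Number of correct models for each value of n2 (second field of the name)
n2 <- as.integer(sub("^[^-]+-([0-9]+)-.*$", "\\1", BM1_models$V1))
per_n2 <- tapply(correct, n2, sum)
per_n2
```

On the real data, the same three steps apply to BM1_models$V1 unchanged; dividing per_n2 by the number of datasets per n2 value would turn the counts into hit rates.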

Error Message in Francis Huang's Article on Multilevel Confirmatory Factor Analysis

I am running a multilevel confirmatory factor analysis in R based on the paper by Francis Huang. I am working with the syntax and the example dataset from the article.
Here is the Article: http://web.missouri.edu/~huangf/data/mcfa/MCFAinRHUANG.pdf
Watch out! The web address given in the paper for downloading the data and the R function is obsolete. The web addresses in the syntax below are correct.
# load lavaan package
library(lavaan)
# that's an R function
source('http://web.missouri.edu/huangf/data/mcfa/mcfa.R')
# dataset containing only the grouping variable and the variables of interest
raw <- read.csv("http://web.missouri.edu/huangf/data/mcfa/raw.csv")
# the function is applied to the dataset, specifying that sid (school id) is the clustering variable
x <- mcfa.input("sid", raw)
STEP 1
# one-factor model and two-factor model with results
onefactor <- 'f1 =~ x1 + x2 + x3 + x4 + x5 + x6'
twofactor <- 'f1 =~ x1 + x2 + x4; f2 =~ x3 + x5 + x6'
results1 <- cfa(onefactor, sample.cov = x$pw.cov, sample.nobs = x$n - x$G)
summary(results1, fit.measures = TRUE, standardized = TRUE)
results2 <- cfa(twofactor, sample.cov = x$pw.cov, sample.nobs = x$n - x$G)
summary(results2, fit.measures = TRUE, standardized = TRUE)
STEP 2
A null model is specified in which both the SPW and SB matrices are used in a multigroup setup, applying the factor structure defined in step 1 to both matrices with all parameters constrained to be equal. We do not really have two groups, but the multigroup setup is used to analyze the 'within group' and 'between group' matrices simultaneously. In lavaan, multiple input covariance matrices and the sample sizes for each are stored in a list object:
combined.cov <- list(within = x$pw.cov, between = x$b.cov)
combined.n <- list(within = x$n - x$G, between = x$G)
The first object in each list refers to group one and the second to group two. We create two new objects (i.e., combined.cov and combined.n) that contain the two covariance matrices (i.e., SPW and SB) and the sample size for each (n - G and G, respectively).
Next, a model imposing the equality constraints must be specified. In this step, the model specification expands quite a bit. In lavaan, equality constraints are imposed for a particular variable by writing c(a,a)*variable, where c() is the concatenate function and a is a label assigned by the user to indicate that loading a for group one is set equal to loading a in group two. Identical labels instruct lavaan to use the same estimates in both groups, in other words, to impose equality constraints. To specify equal factor loadings for both factors in the within and between models, we write: f1 =~ x1 + c(a,a)*x2 + c(b,b)*x4; f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6. The loadings for x1 and x3 are automatically fixed to 1, so they do not need to be specified.
nullmodel <- '
+ f1 =~ x1 + c(a,a)*x2 + c(b,b)*x4
+ f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6
+ x1 ~~ c(e,e)*x1
+ x2 ~~ c(f,f)*x2
+ x3 ~~ c(g,g)*x3
+ x4 ~~ c(h,h)*x4
+ x5 ~~ c(i,i)*x5
+ x6 ~~ c(j,j)*x6
+ f1 ~~ c(k,k)*f1
+ f2 ~~ c(l,l)*f2
+ f1 ~~ c(m,m)*f2
+ '
results3 <- cfa(nullmodel, sample.cov = combined.cov,
+ sample.nobs = combined.n)
That's the error message:
Error: Unexpected '=' in:
"results3 <- cfa(nullmodel, sample.cov = combined.cov, + sample.nobs ="
With my data I am receiving the same error message.
Any ideas? Help is appreciated.
Best regards Konstantin
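A likely cause, judging from the pasted code: the + at the start of the continuation lines are R's console prompts, copied along with the syntax. In the cfa() call, R then reads the last argument as `+ sample.nobs = combined.n`, which is not a valid named argument, hence "unexpected '='". The sketch below checks this with base R's parse() only, so it needs neither lavaan nor the data; parses() is a hypothetical helper introduced for illustration.

```r
# Hypothetical helper: TRUE if `code` is syntactically valid R, FALSE otherwise
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# The call as pasted, including the console's "+" continuation prompt
pasted <- "results3 <- cfa(nullmodel, sample.cov = combined.cov,
+ sample.nobs = combined.n)"

# Remove a leading "+ " from every line
cleaned <- gsub("(^|\n)\\+ ", "\\1", pasted)

parses(pasted)   # FALSE: the last argument becomes `+ sample.nobs = combined.n`
parses(cleaned)  # TRUE

# The model string also needs the prompts removed
nullmodel <- '
  f1 =~ x1 + c(a,a)*x2 + c(b,b)*x4
  f2 =~ x3 + c(c,c)*x5 + c(d,d)*x6
  x1 ~~ c(e,e)*x1
  x2 ~~ c(f,f)*x2
  x3 ~~ c(g,g)*x3
  x4 ~~ c(h,h)*x4
  x5 ~~ c(i,i)*x5
  x6 ~~ c(j,j)*x6
  f1 ~~ c(k,k)*f1
  f2 ~~ c(l,l)*f2
  f1 ~~ c(m,m)*f2
'

# With lavaan loaded and combined.cov / combined.n created as above:
# results3 <- cfa(nullmodel, sample.cov = combined.cov, sample.nobs = combined.n)
```

When typing the code directly, the + should simply be left out; R adds it on its own when a command continues over several lines.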

Mathematical functions in lavaan's lavTestWald function

I'm using the lavaan package in R and want to use lavaan::lavTestWald to test the fit of a model under linear constraints. This test is part of a loop, so there are a large number of models where I want to test these constraints.
As part of the test, I want to set the absolute value of two quantities to be equal. Is there a way to do this? I know about R's abs() function but haven't been able to figure out how to incorporate abs() into lavaan::lavTestWald.
Here's a reproducible example.
library(lavaan)
HS.model <- ' visual =~ x1 + a*x2 + b*x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9'
fit <- cfa(HS.model, data = HolzingerSwineford1939)
lavTestWald(fit, "abs(a) == abs(b)") # I want something like this
I don't know why, but the reason that your code is not working as expected seems related to abs(). This worked for me:
lavTestWald(fit, "sqrt(a^2) == sqrt(b^2)")
Note that you could also define new parameters in your model statement and test those, but I don't know if that makes any difference in your situation.
HS.model <- ' visual =~ x1 + a*x2 + b*x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9
# new1 := abs(a) - abs(b)
new2 := sqrt(a^2) - sqrt(b^2)
'
fit <- cfa(HS.model, data = HolzingerSwineford1939)
lavTestWald(fit, "new2 == 0")
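One possible explanation for why abs() fails, offered as an assumption about lavaan's internals rather than something confirmed in this thread: if lavTestWald() differentiates the constraint function numerically with the complex-step method, abs() breaks it, because abs() of a complex number returns its real modulus and discards the imaginary perturbation, whereas sqrt(a^2) keeps it. The base R sketch below shows this effect with a hypothetical complex_step() helper; it does not call lavaan.

```r
# Hypothetical helper: complex-step derivative, f'(x) ~ Im(f(x + i*h)) / h
complex_step <- function(f, x, h = 1e-20) {
  Im(f(complex(real = x, imaginary = h))) / h
}

f_abs  <- function(p) abs(p)     # abs() of a complex number is its real modulus
f_sqrt <- function(p) sqrt(p^2)  # equals abs(p) for real p, but stays complex

complex_step(f_abs, 0.7)    # 0: the imaginary part, and with it the slope, is lost
complex_step(f_sqrt, 0.7)   # 1 (up to rounding), the slope of |p| at p = 0.7
complex_step(f_sqrt, -0.7)  # -1 (up to rounding), the slope of |p| at p = -0.7
```

If this assumption holds, any constraint written with abs() would get a zero Jacobian, while sqrt(a^2) gives the correct one, which would match the behaviour reported above.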

Remove perfectly multicollinear variables from data frame

I have a data frame in which some variables contain the same information:
x1 = runif(1000)
x2 = runif(1000)
x3 = x1 + x2
x4 = runif(1000)
x5 = runif(1000)*0.00000001 +x4
x6 = x5 + x3
x = data.frame(x1, x2, x3, x4, x5, x6)
Next, I want to drop all variables that are perfectly multicollinear, e.g. columns x3 and x6 (there might also be other combinations).
In Stata this is fairly easy: _rmcoll varlist
How is this efficiently done in R?
EDIT:
Note that the ultimate goal is to compute the Mahalanobis distance between observations. For this I need to drop redundant variables. As far as I can foresee, for this application it would not matter whether I drop x1, x2 or x3.
I don't know of a built-in convenience function, but QR decomposition will do it.
We need the data frame to be a matrix:
X <- as.matrix(x)
Use a slightly lower than default tolerance to keep the slightly-non-multicollinear column:
qr.X <- qr(X, tol=1e-9, LAPACK = FALSE)
(rnkX <- qr.X$rank) ## 4 (number of non-collinear columns)
(keep <- qr.X$pivot[seq_len(rnkX)])
## 1 2 4 5
X2 <- X[,keep]
This strictly answers your question; you might also be able to use singular value decomposition (svd()) to implement Mahalanobis distances directly on this type of data ...
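Since the stated goal is Mahalanobis distances, here is a minimal sketch that continues the QR approach with base R's mahalanobis(). Two assumptions: set.seed() is added only for reproducibility, and it uses qr()'s default tolerance (1e-7) rather than 1e-9, so the near-duplicate x5 is expected to be discarded along with x3 and x6; keeping x5 would leave cov() nearly singular, because x4 and x5 differ only by noise of order 1e-8.

```r
set.seed(1)  # added for reproducibility only
x1 <- runif(1000)
x2 <- runif(1000)
x3 <- x1 + x2
x4 <- runif(1000)
x5 <- runif(1000) * 0.00000001 + x4
x6 <- x5 + x3
X <- as.matrix(data.frame(x1, x2, x3, x4, x5, x6))

# Pivoted QR with the default tolerance: columns judged linearly dependent
# on earlier ones are pivoted to positions after `rank`
qr.X <- qr(X)
keep <- qr.X$pivot[seq_len(qr.X$rank)]
X2 <- X[, keep, drop = FALSE]
colnames(X2)  # columns retained

# Squared Mahalanobis distance of every row from the column means
d2 <- mahalanobis(X2, center = colMeans(X2), cov = cov(X2))
summary(d2)
```

A quick sanity check: with the sample covariance, the squared distances always average (n - 1) * p / n, where p is the number of retained columns.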
For completeness I post the quick-and-dirty solution I was using until now. I actually think it does not perform that badly compared to other methods.
x1 = runif(1000)
x2 = runif(1000)
x3 = x1 + x2
x4 = runif(1000)
x5 = runif(1000)*0.00000001 +x4
x6 = x5 + x3
x = data.frame(x1, x2, x3, x4, x5, x6)
const = rep(1,1000)
a<-lm(const ~ ., data=x)
names(a$coefficients[!is.na(a$coefficients)])[c(-1)]

R add1 function, scope argument to reference all variables

When using the add1 function to consider new variables, I would like to reference all variables (either in some data frame or in the global environment), but I cannot figure out how to use the scope argument to do this.
I am aware I can use it like this
X = data.frame(replicate(4,rnorm(20))) ; y = rnorm(20)
lm1 = lm(y ~ 1)
out = add1(lm1, scope= ~X$X1 + X$X2 + X$X3)
but I want to avoid manually writing in every variable.
As I have seen in other questions, I know the . symbol will not work but I am not sure why. It stands for what is already there, so if I do
x1 = rnorm(20) ; x2 = rnorm(20) ; x3 = rnorm(20) ; x4 = rnorm(20) ; y = rnorm(20)
out = add1(lm1, scope= ~ . )
it does not use what is already in the global environment.
I know the documentation says that scope must be "a formula giving the terms to be considered", but that is usually where . can be used to reference all variables.
Thanks in advance.
Also note that I have read Chapter 7 of MASS and these related threads:
scope from add1()-command in R
http://tolstoy.newcastle.edu.au/R/help/02b/3588.html
This is an even simpler answer, which I found after browsing this question
http://r.789695.n4.nabble.com/glm-formula-vs-character-td2543061.html
x1 = rnorm(100)
x2 = rnorm(100)
x3 = rnorm(100)
y = rnorm(100)
BaseReg = lm(y ~ 1)
newdf = data.frame(x1,x2,x3)
out = add1(BaseReg, names(newdf))
It is baffling that such a simple way to get this was not stated in the documentation for add1.
As the help page for add1 says, the formula ~. means "what's already there". Using as.formula is not any simpler for a small number of names, but this approach can be used in a function or script. (Generally one would expect to put the X's and y in the same data frame.)
YX <- cbind(y, X)
as.formula(paste("~", paste(names(YX)[-c(1,5)], collapse = "+")))
#~X1 + X2 + X3
form <- as.formula(paste("~", paste(names(YX)[-c(1,5)], collapse = "+")))
add1(lm1, form)
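Base R's reformulate() builds the same kind of one-sided formula from a character vector of names, without pasting strings together. A minimal sketch, assuming the response and all candidate predictors sit in one data frame (dat and scope_all are hypothetical names):

```r
set.seed(1)  # for reproducibility only
X <- data.frame(replicate(4, rnorm(20)))  # columns X1..X4
y <- rnorm(20)
dat <- cbind(y, X)

lm1 <- lm(y ~ 1, data = dat)

# One-sided formula listing every column except the response
scope_all <- reformulate(setdiff(names(dat), "y"))
scope_all  # ~X1 + X2 + X3 + X4

out <- add1(lm1, scope = scope_all)
out
```

Excluding the response by name with setdiff() avoids relying on column positions such as -c(1,5).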
You appear to have stumbled across a more efficient strategy. If using a data object with column names "y" "X1" "X2" "X3" "X4":
> formula(YX)
y ~ X1 + X2 + X3 + X4
> formula(YX)[-2]
~X1 + X2 + X3 + X4
> as.list(formula(YX))
[[1]]
`~`
[[2]]
y
[[3]]
X1 + X2 + X3 + X4
> names(YX)
[1] "y" "X1" "X2" "X3" "X4"
You can see that a formula object has as its first element the formula-defining tilde, which is really an R function. The second element is the LHS expression and the third element is the RHS expression.
Here is something I found that works:
X = data.frame(replicate(4,rnorm(20)))
lm1 = lm(X1 ~ 1 ,data=X)
add1(lm1, scope=formula(X)[-2])
Granted, I have no idea why this is the case:
formula(X)[-2]
# ~X2 + X3 + X4
I just found it by accident. Other things like formula(X)[-1] and formula(X)[-3] also return other things which are equally bizarre to me.
