semPLS package: back-calculating influence in R

I'm doing a PLS regression using the semPLS package in R and was wondering about something.
This is the example from ?sempls:
library(semPLS)
data(ECSImobi)
ecsi <- sempls(model=ECSImobi, data=mobi, wscheme="pathWeighting")
ecsi
We see that IMAG1-5 are connected to the latent variable Image, each with an outer loading between 0.57 and 0.77. Image in turn is connected to the latent variable Expectation with a path (beta) coefficient of 0.505.
Now my question is:
Is it possible to "back-calculate" the influence of 0.505 to each IMAG1-5-variable?
Looking at the model specification and the formula, you could simply do 0.505/0.77 and so on. But that doesn't make much sense, because it would imply that the higher the correlation between an IMAG variable and Image, the lower its influence on Expectation.

I'm really not sure about this, but since there is no answer here, I'll give it a shot.
If I understand correctly, what you want is a kind of effect size.
For exogenous latent variables in the structural model, this can be calculated as Cohen's effect size f² (Cohen, 1988, pp. 410-413).
If you want the effect of an exogenous variable on an endogenous variable, you estimate the model twice: once with that exogenous variable included and once without it. This gives you two R² values for the endogenous variable, which you plug into Cohen's formula: f² = (R²_included - R²_excluded) / (1 - R²_included).
Effect sizes of 0.02, 0.15 and 0.35 are considered weak, moderate and strong, respectively.
The R Code for the effect size of "Image" on "Loyalty" would look like this:
library(semPLS)
data(ECSImobi)
ecsi <- sempls(model=ECSImobi, data=mobi, wscheme="pathWeighting")
ecsi
#estimate model without LV "Image"
excl_image_ecsi <- sempls(model=removeLVs(ECSImobi,"Image"), data=mobi, wscheme="pathWeighting")
excl_image_ecsi
#rSquared of "Loyalty" with exogenous variable "Image"
rSquared(ecsi)[7]
#rSquared of "Loyalty" without exogenous variable "Image"
rSquared(excl_image_ecsi)[6]
#calculate effect size of "Image" on "Loyalty"
(fSquared <- (rSquared(ecsi)[7]-rSquared(excl_image_ecsi)[6])/(1-rSquared(ecsi)[7]))
Effect size of "Image" on "Loyalty" is 0.03 , which can be considered as weak.
So here are some ideas for calculating the effect size of an indicator, based on Cohen's procedure above.
First estimate the model without the MV "IMAG1"
excl_IMAG1_ecsi <- sempls(model=removeMVs(ECSImobi,"IMAG1"), data=mobi, wscheme="pathWeighting")
Then calculate the f² of "IMAG1" on "Expectation"
#get rSquared of "Expectation"
rSquared(ecsi)[2]
#get new rSquared of "Expectation"
rSquared(excl_IMAG1_ecsi)[2]
#effect size of "IMAG1" on Expectation
(fSquared <- (rSquared(ecsi)[2]-rSquared(excl_IMAG1_ecsi)[2])/(1-rSquared(ecsi)[2]))
The effect size is 0.00868009. But because we are removing an MV rather than an LV, the previous rules of thumb (0.02 weak, 0.15 moderate, 0.35 strong) are probably not appropriate here, and I can't think of new ones at the moment.
This seems to be a valid transfer of Cohen's idea, at least from my point of view, but I think that's not quite what you wanted, so let's get crazy now.
#get path coefficient Image -> Expectation
ecsi$coefficients[25,2]
#get new path coefficient Image -> Expectation
excl_IMAG1_ecsi$coefficients[24,2]
#effect size of "IMAG1" on path coefficient Image -> Expectation
(fPath <- (ecsi$coefficients[25,2]-excl_IMAG1_ecsi$coefficients[24,2])/(1-excl_IMAG1_ecsi$coefficients[24,2]))
Here I tried to transfer Cohen's formula to path coefficients instead of R² values, so we can see the effect of "IMAG1" on the path coefficient Image -> Expectation, which changes from 0.5049139 to 0.4984685.
The fPath formula is very likely flawed, considering that path coefficients can also be negative, but I don't have the time to think that through, and I also suspect it's still not what you wanted in the first place.
So now I'm taking you literally:
"Now my question is: Is it possible to "back-calculate" the influence
of 0.505 to each IMAG1-5-variable?"
My first thought was: there can't be an influence of a path coefficient on an MV.
My second thought was: you mean the influence of the LV "Expectation" on each MV IMAG1-5, which is indeed represented by the path coefficient (0.505).
My third thought was: there is no influence of the LV "Expectation" on the MVs IMAG1-5, because "Image" is the exogenous and "Expectation" the endogenous variable. That means "Image" influences "Expectation"; the arrow goes from "Image" to "Expectation".
Now I was curious, deleted the LV "Expectation" and calculated the model:
excl_exp_ecsi <- sempls(model=removeLVs(ECSImobi,"Expectation"), data=mobi, wscheme="pathWeighting")
excl_exp_ecsi
Now let's compare the outer loadings of Image, with and without "Expectation" in the model:
                   with Expectation   without Expectation
Image -> IMAG1              0.745                0.747
Image -> IMAG2              0.599                0.586
Image -> IMAG3              0.576                0.575
Image -> IMAG4              0.769                0.773
Image -> IMAG5              0.744                0.750
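If you prefer to pull the comparison out programmatically rather than from the printed output, something along these lines should work (a sketch; it assumes the five lam rows for Image are the first five rows of $coefficients in both fits, as in the output above):
cbind(with_Expectation    = ecsi$coefficients[1:5, 2],
      without_Expectation = excl_exp_ecsi$coefficients[1:5, 2])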
In a visualisation of the ECSI model you can see that "Image" is an exogenous, i.e. independent, latent variable.
But there is still a slight change in the outer loadings when the "Expectation" variable is deleted, as the comparison above shows. I can't tell you exactly why that is, because I don't know the algorithm well enough, but I hope I could clarify some things for you or other readers and not make it worse :).
Please note that I did not research this topic in the literature; it just represents my idea of how these things could be calculated. There could well be an established way to do this, and my approach could be dead wrong.

Related

PCL Correspondence Grouping Recognize Transformation Details

I have followed this tutorial in an effort to estimate the pose of an object in a scene:
http://pointclouds.org/documentation/tutorials/correspondence_grouping.php
I have successfully got the tutorial working for both the sample point clouds and my own point clouds (after adjusting some parameters).
The person who wrote the tutorial had the following to say:
The recognize method returns a vector of Eigen::Matrix4f representing a transformation (rotation + translation) for each instance of the model found in the scene (obtained via Absolute Orientation)
I get this transformation matrix, but I do not understand what the values are in reference to.
For example, the same tutorial page has the following output:
        |  0.969  -0.120   0.217 |
    R = |  0.117   0.993   0.026 |
        | -0.218  -0.000   0.976 |
    t = < -0.159, 0.212, -0.042 >
While I understand that these values represent the rotation and translation of each model found in the scene - what are these values using as a reference point and how can they be used?
For example, if I wanted to place another model on top of the identified model, is it possible to use those values to localize the identified model? Or if I had a robot that wanted to interact with the model, could these values be used by the robot to infer where the object is in the scene?
The correspondence grouping method takes two inputs, a scene and a model. My current assumption is that the algorithm says: "I found this model in the scene; what transformation do I need to apply to the model to make it align with the scene?" Since my model was extracted directly from the scene, the algorithm finds it and only a very small transformation needs to be applied.
Could anyone provide some insight into these values?
OK, I believe I have verified my theory. I took a second image of the scene with the model moved to the left by about 1 meter. I extracted that model and used it as an input to the correspondence grouping. The resulting translation moves the object along the X axis by significantly more than for the original object. This confirms to me that the printed transformation matrix is the transformation from the supplied model to the instance of the object recognized in the scene.
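In other words, the matrix maps model coordinates into scene coordinates. A quick numeric check of the algebra (sketched in R with the values printed above; the choice of model point is arbitrary):
R_mat <- matrix(c( 0.969, -0.120, 0.217,
                   0.117,  0.993, 0.026,
                  -0.218, -0.000, 0.976), nrow = 3, byrow = TRUE)
t_vec   <- c(-0.159, 0.212, -0.042)
p_model <- c(0, 0, 0)                             # a point in the model's coordinate frame
p_scene <- as.vector(R_mat %*% p_model) + t_vec   # the same point expressed in the scene frame
p_scene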

Why has the author used the following matrices for the following standardisation?

Can somebody tell me why the author has used the following code in their normalisation?
The first line appears fine to me; they have standardised the training set with the following formula:
(x - mean(x)) / std(x)
However, in the second and third lines (validation and test) they have used the training mean (trainme) and training standard deviation (trainstd). Should they not have used the validation mean (validationme) and validation standard deviation (validationstd), along with the test mean and test standard deviation?
You can also view the page from the book at the following link (page 173)
What the authors are doing is reasonable, and it's what is conventionally done. The idea is that the same normalization is applied to all inputs. This essentially allocates some new parameters (an offset and a scale) and estimates them from the training data. In that scheme, if the value 100 is input, then the normalized value is (100 - offset)/scale, no matter where (training, testing, whatever) that 100 came from.
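A minimal sketch of that scheme in R, reusing the names from the question (trainme, trainstd); the train_x / validation_x / test_x vectors are hypothetical placeholders:
trainme  <- mean(train_x)   # offset estimated from the training data only
trainstd <- sd(train_x)     # scale estimated from the training data only

train_scaled      <- (train_x      - trainme) / trainstd
validation_scaled <- (validation_x - trainme) / trainstd   # same offset and scale reused
test_scaled       <- (test_x       - trainme) / trainstd   # no test statistics involved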
I guess one can also make an argument that the offset and scale should be context dependent in the sense that if you are given a set of data and for some reason the offset and scale are very different from the original training data, maybe what's important is how big each value is relative to the others in the same data set. E.g. maybe you should treat 200 the same as 100, if the scale is twice as big in the data set containing 200.
Whether that data-dependent scaling is reasonable would have to be decided case by case. I don't remember ever having seen it, but it's plausible that it could be the right thing to do in some cases.
By the way, you'll get more interest in general statistical questions at stats.stackexchange.com and/or datascience.stackexchange.com.

Transformation with box-cox in R

I have a vector like x = c(7.41, 7.32, 7.14, 6.46, 7.36, 7.23, 7.16, 7.28). I did a Shapiro test (shapiro.test) and got a p-value of 0.003391826, which means it is not normally distributed, so I want to transform it with Box-Cox (or something better, apart from log and square root) towards normality.
This is the command I tried: boxcox_x = boxcox(x~1, lambda = seq(2,3,1/10), plotit = TRUE, eps = 1/50, xlab = expression(lambda), ylab = "log-Likelihood"). After this I saw in the diagram, for example, lambda = -2.
Then I wrote lambda.max = boxcox_x$x[which.max(boxcox_ph$y)], and the lambda value from this code was completely different from what I could see in the diagram.
Then I wrote x_new = bcPower(x, lambda.max, jacobian.adjusted = FALSE), because I thought this would give me my new vector, which should be normally distributed, but the result was completely different from what I expected.
Can anybody explain this in a simple way? (I am a newcomer.)
Thank you
Getting a good approximation of the distribution is a bit of an art that depends on the context.
A bigger problem is your small sample size, which can lead to unreliable p-value estimates and an unreliable representation of the data by any distribution.
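That said, the mechanics of your code can be tidied up. Here is a sketch of the usual workflow, using the functions you already call (MASS::boxcox, car::bcPower, shapiro.test), assuming the decimal commas in your vector are decimal points, and using a wider lambda grid so the maximum is not forced onto the edge of seq(2, 3, 1/10):
library(MASS)   # boxcox()
library(car)    # bcPower()

x <- c(7.41, 7.32, 7.14, 6.46, 7.36, 7.23, 7.16, 7.28)

bc <- boxcox(x ~ 1, lambda = seq(-5, 5, 1/10))   # profile the log-likelihood over lambda
(lambda.max <- bc$x[which.max(bc$y)])            # lambda at the peak of that curve

x_new <- bcPower(x, lambda.max)                  # the transformed vector
shapiro.test(x_new)                              # re-check normality (n = 8 is very small)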

Cost function of Convolutional Neural Network not serving intended purpose

So I have built a CNN and now I am trying to get the training of my network to work effectively, despite my lack of a formal education on the topic. I have decided to use stochastic gradient descent with a standard mean squared error cost function. As stated in the title, the problem seems to lie in the cost function.
When I use a couple of training examples, I calculate the mean squared error for each, take the mean of those, and use that as the full error. There are two output neurons, one for face and one for not-a-face; whichever is higher is the class that is yielded. Essentially, if a training example yields the wrong classification, I calculate the error, with the desired value being the value of the class that was yielded.
Example:
Input an image of a face--->>>
Face: 500
Not face: 1000
So in this case, the network says that the image isn't a face, when in fact it is. The error comes out to:
500 - 1000 = -500
(-500)^2 = 250000 <<-- error
(correct me if i'm doing anything wrong)
As you can see the desired value is set to the value of the incorrect class that was selected.
Now this is all good (from what I can tell), but here is my issue:
As I perform backprop on the network multiple times, the mean cost over the entire training set falls to 0, but this is only because all of the weights in the network are going to 0, so all class outputs always become 0.
After training:
Input not face->
Face: 0
Not face: 0
--note that if the classes are the same, the first one is selected
(0 - 0)^2 = 0 <<-- error
So the error is being minimized to 0 (which is good I guess), but obviously not the way we want.
So my question is this:
How do I minimize the gap between the classes when the classification is wrong, but also get the network to overshoot the incorrect class so that the correct class is yielded?
//example
Had this: (for input of face)
Face: 100
Not face: 200
Got this:
Face: 0
Not face: 0
Want this: (or something similar)
Face: 300
Not face: 100
I hope this question wasn't too vague...
But any help would be much appreciated!!!
The way you're computing the error doesn't correspond to the standard 'mean squared error'. But, even if you were to fix it, it makes more sense to use a different type of outputs and error that are specifically designed for classification problems.
One option is to use a single output unit with a sigmoid activation function. This neuron would output the probability that the input image is a face. The probability that it's not a face is given by 1 minus this value. This approach will work for binary classification problems. Softmax outputs are another option. You'd have two output units: the first outputs the probability that the input image is a face, and the second outputs the probability that it's not a face. This approach will also work for multi-class problems, with one output unit for each class.
In either case, use the cross entropy loss (also called log loss). Here, you have a target value (face or no face), which is the true class of the input image. The error is the negative log probability that the network assigns to the target.
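A minimal numeric sketch in R, reusing the made-up scores from the question; softmax turns the raw outputs into probabilities, and cross entropy penalizes a confident wrong answer heavily:
scores <- c(face = 500, not_face = 1000)                          # raw outputs for a face image
p <- exp(scores - max(scores)) / sum(exp(scores - max(scores)))   # softmax (numerically stable)
target <- "face"                                                  # true class of this example
loss <- -log(p[target])                                           # cross-entropy / log loss
loss                                                              # huge, because the net is confidently wrong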
Most neural nets that perform classification work this way. You can find many good tutorials here, and read this online book.

Mathematical library to compare similarities in graphs of data for a high-level language (e.g. JavaScript)?

I'm looking for something that I guess is rather sophisticated and might not exist publicly, but hopefully it does.
I basically have a database with lots of items which all have values (y) that correspond to other values (x). Eg. one of these items might look like:
x | 1 | 2 | 3 | 4 | 5
y | 12 | 14 | 16 | 8 | 6
This is just a random example. Now, there are thousands of these items, all with their own set of x and y values. The spacing between one x and the next is not fixed and may differ for every item.
What I'm looking for is a library where I can plug in all these sets of Xs and Ys and ask it for things like the most common item (sets of x and y that follow a comparable curve / progression), and the ability to check whether a certain set is at least x% comparable with another set.
By comparable I mean the slope of the curve if you were to draw a graph of the data. So not the actual static values, but rather the detection of events, such as a sharp increase followed by a slow decrease, etc.
Due to my limited experience with mathematics I'm not quite sure what the thing I'm looking for is called, and thus have trouble explaining what I need. Hopefully I gave enough pointers for someone to point me in the right direction.
I'm mostly interested in a library for JavaScript, but if there is no such thing, any library would help; maybe I can try to port what I need.
About Markov Cluster(ing) again, of which I happen to be the author, and your application: you mention you are interested in trend similarity between objects. This is typically computed using Pearson correlation. If you use the mcl implementation from http://micans.org/mcl/, you'll also obtain the program 'mcxarray'. This can be used to compute Pearson correlations between e.g. rows in a table, so it might be useful to you. It is able to handle missing data in a simplistic way: it just computes correlations on those indices for which values are available for both rows. If you have further questions I am happy to answer them, with the caveat that I usually like to cc replies to the mcl mailing list so that they are archived and available for future reference.
What you're looking for is an implementation of Markov clustering. It is often used for finding groups of similar sequences. Porting it to JavaScript, well... if you're really serious about this analysis, drop JavaScript as soon as possible and move on to R. JavaScript is not meant for this kind of calculation and is far too slow for it. R is a statistical package with a lot already implemented. It is also designed specifically for fast matrix calculations, and most of the language is vectorised (meaning you don't need for-loops to apply a function over a vector of values; it happens automatically).
For the Markov clustering, check http://www.micans.org/mcl/
An example of an implementation: http://www.orthomcl.org/cgi-bin/OrthoMclWeb.cgi
You also need to define a "distance" between your sets. As you are interested in the events rather than the values, you could give every item an extra attribute: a vector with the differences y[i] - y[i-1] (in R: diff(y)). The distance between two items can then be calculated as the sum of squared differences between y1[i] and y2[i].
This allows you to construct a distance matrix of your items, and on that you can call the mcl algorithm. Unless you work on Linux, you'll have to port that one.
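A small sketch of that distance in R (the data are made up, and it assumes every item has the same number of points):
items <- rbind(a = c(12, 14, 16, 8, 6),
               b = c(11, 13, 15, 9, 5),
               c = c(3, 2, 10, 12, 14))
slopes <- t(apply(items, 1, diff))      # y[i] - y[i-1] for each item
d2 <- as.matrix(dist(slopes))^2         # sum of squared differences of the slope vectors
d2                                      # distance matrix to feed into a clustering algorithm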
What you want to do is ANOVA, or ANalysis Of VAriance. If you run the numbers through an ANOVA test, it will give you information about the dataset that will help you compare one set to another. I was unable to locate a JavaScript library that performs ANOVA, but there are plenty of programs that are capable of it. Excel can perform ANOVA via a plugin, and R is a free stats package that can also perform it.
Hope this helps.
Something simple is the following (assuming all the graphs have 5 points and x = 1,2,3,4,5 always):
Take u1 = the first point of y, i.e. y1
Take u2 = y2 - y1
...
Take u5 = y5 - y4
Now consider the vector u as a point in 5-dimensional space. You can use simple clustering algorithms, like k-means.
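A quick sketch of that in R (the y rows are made up and the number of clusters is arbitrary):
ys <- rbind(c(12, 14, 16, 8, 6),
            c(11, 13, 15, 9, 5),
            c(3, 2, 10, 12, 14),
            c(4, 3, 11, 13, 15))
u <- cbind(ys[, 1], t(apply(ys, 1, diff)))   # u1 = y1, u2..u5 = successive differences
kmeans(u, centers = 2)$cluster               # cluster label for each item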
EDIT: You should not aim for something too complicated as long as you stick with JavaScript. If you want to go with Java, I can suggest something based on PCA (requiring singular value decomposition, which is too complicated to implement efficiently in JS).
Basically, it goes like this: take, as previously, a (possibly large) linear representation of the data, perhaps differences of components of x and of y, or absolute values. For instance you could take
u = (x1, x2 - x1, ..., x5 - x4, y1, y2 - y1, ..., y5 - y4)
You compute the vector u for each sample. Call ui the vector u for the ith sample. Now, form the matrix
M_{ij} = dot product of ui and uj
and compute its SVD. The N most significant singular values (i.e. those above some "similarity threshold") give you N clusters.
The corresponding columns of the matrix U in the SVD give you an orthonormal family B_k, k = 1..N. The squared ith component of B_k gives you the probability that the ith sample belongs to cluster k.
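A literal sketch of that recipe in R (toy data using only the y part of u; the cut-off for "significant" is arbitrary):
ys <- rbind(c(12, 14, 16, 8, 6),
            c(11, 13, 15, 9, 5),
            c(3, 2, 10, 12, 14))
u <- cbind(ys[, 1], t(apply(ys, 1, diff)))        # (y1, y2 - y1, ..., y5 - y4) per sample
M <- u %*% t(u)                                   # M[i, j] = dot product of u_i and u_j
s <- svd(M)
N <- sum(s$d > max(s$d) / 10)                     # "significant" singular values
membership <- s$u[, seq_len(N), drop = FALSE]^2   # squared components of the B_k columns
membership                                        # row i, column k: weight of sample i in cluster k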
If it is OK to use Java, you really should have a look at Weka. It is possible to access all features via Java code. Maybe you'll find a Markov clustering there, but if not, they have a lot of other clustering algorithms and it's really easy to use.
