Accessing class values in R's poLCA - r

I am trying my hand at learning Latent Class Analysis while also learning R. I'm using the poLCA package and am having a bit of trouble accessing the values stored in the fitted object. I can run the sample code just fine:
ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
ds = within(ds, (cesdcut = ifelse(cesd>20, 1, 0)))
library(poLCA)
res2 = poLCA(cbind(homeless=homeless+1,
cesdcut=cesdcut+1, satreat=satreat+1,
linkstatus=linkstatus+1) ~ 1,
maxiter=50000, nclass=3,
nrep=10, data=ds)
But to make this more useful, I'd like to access the values stored in the object that poLCA returns, like so:
attr(res2, 'Nobs')
attr(res2, 'maxiter')
but they both come up as NULL. I expect Nobs to be 453 (determined by the function) and maxiter to be 50000 (dictated by my input value).
I'm sure I'm just being naive, but I could use any help available. Thanks a lot!

Welcome to R. You've got the model-fitting syntax right, in that you can get a model out (I don't know how latent class analysis works, so I can't speak to the statistical validity of your result). However, you've mixed up the different ways in which R can store information pertaining to a model.
poLCA returns an object of class poLCA, which is
a list containing the following elements:
(. . .)
Nobs: number of fully observed cases (less than or equal to N).
maxiter: maximum number of iterations through which the estimation algorithm was set to run.
Since it's a list, you can extract individual elements from your model object using the $ operator:
res2$Nobs # number of observations
res2$maxiter # maximum iterations
In some cases, there might be extractor functions to get this information without having to do low-level indexing. For example, many model-fitting functions will have a fitted method, which pulls out the vector of fitted values on the training data; and similarly residuals pulls out the vector of residuals. You should check whether there are such extractor functions provided by the poLCA package and use them if possible; that way, you're not making assumptions about the structure of the model object that might be broken in the future.
This is distinct from getting the attributes of an object, which is what you use attr for. Attributes in R are what you might call metadata: they contain R-specific information about an object itself, rather than information about whatever it is the object relates to. Examples of common attributes include class (the class of an object), dim (the dimensions of an array or matrix), names (names of individual elements of a vector/list/array) and so on.
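To make the distinction concrete, here is a minimal sketch that does not need poLCA installed. The object toy_fit is a hypothetical stand-in (a plain list with a class attribute) whose two values are copied from the question; a real poLCA fit has many more elements. The lm fit at the end only illustrates the extractor-function idea on a base R model.

```r
# Hypothetical stand-in for a fitted model: a list with a class attribute.
toy_fit <- structure(list(Nobs = 453, maxiter = 50000), class = "toy_model")

# List elements are reached with $ or [[ ]].
toy_fit$Nobs
toy_fit[["maxiter"]]

# attr() looks at metadata only, so asking for an element name gives NULL.
attr(toy_fit, "Nobs")

# The attributes that do exist here are the element names and the class.
names(attributes(toy_fit))

# names() and str() list what can be extracted from an unfamiliar object.
names(toy_fit)
str(unclass(toy_fit))

# Extractor functions, shown on a base R lm fit: fitted() returns the same
# values as the low-level element, without relying on the object's layout.
lm_fit <- lm(mpg ~ wt, data = mtcars)
same_values <- all.equal(fitted(lm_fit), lm_fit$fitted.values)
```

Running names(res2) or str(res2) on the real model object is a quick way to see which elements are available before reaching for $.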

Related

Estimation to plot person-item map not feasible because items "have no 0-responses" in data matrix

I am trying to create a person-item map that orders the questions from a dataset by difficulty. I am using the eRm package, and the output should look like the following:
[person-item map] (https://hansjoerg.me/post/2018-04-23-rasch-in-r-tutorial_files/figure-html/unnamed-chunk-3-1.png)
One of the steps before running the function that outputs the map is fitting the data set to obtain the object that the plotting function uses to create the actual map, but I get an error when creating that object.
I have already reviewed some documentation, which might be useful if you want some extra information:
[Tutorial] https://hansjoerg.me/2018/04/23/rasch-in-r-tutorial/#plots
[Plotting function] https://rdrr.io/rforge/eRm/man/plotPImap.html
[Documentation] https://eeecon.uibk.ac.at/psychoco/2010/slides/Hatzinger.pdf
Now, this is the code that I am using. First, I install and load the respective libraries and the data:
> library(eRm)
> library(ltm)
Loading required package: MASS
Loading required package: msm
Loading required package: polycor
> library(difR)
Then I fit the PCM to generate the object of class Rm, and this is where the error appears:
*The PCM function is specific to polytomous data; if I use a different one, the output says that I am not using a dichotomous dataset.
> res <- PCM(my.data)
>Warning:
The following items have no 0-responses:
AUT_10_04 AUN_07_01 AUN_07_02 AUN_09_01 AUN_10_01 AUT_11_01 AUT_17_01
AUT_20_03 CRE_05_02 CRE_07_04 CRE_10_01 CRE_16_02 EFEC_03_07 EFEC_05
EFEC_09_02 EFEC_16_03 EVA_02_01 EVA_07_01 EVA_12_02 EVA_15_06 FLX_04_01
... [rest of items]
>Responses are shifted such that lowest
category is 0.
Warning:
The following items do not have responses on
each category:
EFEC_03_07 LC_07_03 LC_11_05
Estimation may not be feasible. Please check
data matrix
I should clarify that every item in the dataset ranges from 1 to 5; it is a polytomous Likert dataset.
Finally, I try to use the plot function, but it produces no output; R just keeps running indefinitely without returning.
>plotPImap(res, sorted=TRUE)
I would like to add the description of that particular function and the arguments:
>PCM(X, W, se = TRUE, sum0 = TRUE, etaStart)
#X
Input data matrix or data frame with item responses (starting from 0);
rows represent individuals, columns represent items. Missing values are
inserted as NA.
#W
Design matrix for the PCM. If omitted, the function will compute W
automatically.
#se
If TRUE, the standard errors are computed.
#sum0
If TRUE, the parameters are normed to sum-0 by specifying an appropriate
W.
If FALSE, the first parameter is restricted to 0.
#etaStart
A vector of starting values for the eta parameters can be specified. If
missing, the 0-vector is used.
I do not understand why the scores need to begin at 0. I think that is what the message is trying to say, but I don't quite understand the output.
I would highly appreciate any hint that you can provide.
Feel free to ask for any information that could help solve this issue.
The problem is not caused by the items having no 0-responses. As the warning itself says, the model corrects this automatically by shifting the responses so that the lowest category is 0. (You'll notice that the PI-map that you linked to is centered on zero. Also, I believe the map you linked to is of dichotomous data; for polytomous data the PI-map should include the scale categories.)
Without being able to see your data, it is impossible to know the exact cause though.
It may be that the model is not converging, which may be what this warning was alluding to: "Estimation may not be feasible. Please check data matrix". You could check by entering res at the prompt. If the model was able to converge, you should see something like:
Conditional log-likelihood: -2.23709
Number of iterations: 27
Number of parameters: 8
...
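The two warnings can be reproduced by hand on a small made-up data set. The sketch below uses only base R; the data frame toy_items and its column names are invented for illustration, and subtracting 1 is just one way to move a 1-to-5 scale so that the lowest category is 0, not necessarily what PCM does internally.

```r
# Invented 1-to-5 Likert responses (rows = persons, columns = items).
toy_items <- data.frame(
  item_a = c(1, 2, 3, 4, 5, 3),
  item_b = c(2, 3, 4, 5, 5, 2),  # nobody chose category 1
  item_c = c(1, 1, 2, 3, 5, 5)   # nobody chose category 4
)

# Move the scale from 1..5 to 0..4.
shifted <- toy_items - 1

# Items that still have no 0-response after the shift.
no_zero <- names(shifted)[sapply(shifted, function(x) !any(x == 0, na.rm = TRUE))]

# Count responses per category (0..4) for every item.
counts <- sapply(shifted, function(x) table(factor(x, levels = 0:4)))

# Items with at least one category that was never used.
incomplete <- names(which(colSums(counts == 0) > 0))
```

Items listed in incomplete are the kind flagged by the second warning; with real data they are worth inspecting before fitting.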
Does your data contain answers with decimal numbers? I ran into the same error and solved it by using the dplyr::dense_rank() function:
df_ranked <- sapply(df_decimal_data, dplyr::dense_rank)
Worked.
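For readers without dplyr, a dense ranking can be sketched in base R. The helper dense_rank_base is a hypothetical name, and it is only meant to mirror the idea (consecutive integer ranks, with ties sharing a rank and no gaps); it has not been checked against dplyr for edge cases such as missing values.

```r
# Dense rank: position of each value among the sorted unique values.
dense_rank_base <- function(x) match(x, sort(unique(x)))

scores <- c(1.5, 2.25, 1.5, 4, 2.25)
ranks <- dense_rank_base(scores)  # 1 2 1 3 2

# Subtract 1 if the categories should start at 0.
zero_based <- ranks - 1L
```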

package "fdapace" (R) - How to access the principal components of the functional principal component analysis

After applying the FPCA() function of the "fdapace" package to a dataset, the function returns an FPCA object with various values and fields. Unfortunately, I don't know which of those fields contain the principal components, or how to access or plot them. I know that there is documentation for the package, but as a beginner it doesn't really help me (no criticism intended). You can find the documentation here: fdapace.pdf
The estimated functional principal component (FPC) scores are saved in xiEst in the result list, a matrix in which each row holds the FPC scores for one subject in the data. You can make whatever plots you want with this information. See the following for an example.
res = FPCA(Ly, Lt)
res$xiEst # This is the matrix containing the FPC estimates.
Plotting the first eigenfunction (res is the object returned by FPCA() above):
workGrid = res$workGrid   # grid on which the eigenfunctions are evaluated
phi1 = res$phi[, 1]       # first eigenfunction
plot(workGrid, phi1)
Plotting the mean function:
mu = res$mu
workGrid = res$workGrid
plot(workGrid, mu)

Properly define the members and invariants of a class in R

We have an R package for a certain purpose. The basic data structure is a correlation function, which is a real- or complex-valued function for a smallish number T (about 100) of time slices. We have multiple measurements (N) of it, so at its core it is an N×T matrix. But then there are more things that it can become:
One can bootstrap it with R samples such that it becomes an R×T matrix. However we want to keep the original data, so there is a field for the R×T matrix and another for the N×T matrix.
It can be symmetrized which will cut T in half and also alter various other functions that work with those objects.
Also it can be shifted which takes the difference between consecutive elements and therefore drops one time slice. The first column in the matrix then corresponds to t = 1 and not t = 0 any more, which becomes important in fits to the data.
Correlation functions may or may not have an imaginary part; if present, it is stored as a second real matrix.
When doing non-linear operations with the data, we do that once with the average of the original data and each bootstrap sample. If the result is another correlation function, that object will not have “original data” but only the average.
So basically we have a class that can have various fields and only the average of the original data is really common.
To make things worse, there is no formal documentation for the possible members and the invariants associated with them. Coming from C++, where a concise class definition allows me to do encapsulation, the S3 class system in R seems like an invitation for inconsistencies.
This surfaced a few times when some function took such a correlation function as an argument and expected some field to be present while it was not. The code is riddled with lines that just add another field to the class when performing an operation.
Long story short: Is there some automatically enforceable way in the S3 class system to have an exhaustive list of all the fields that a class can have? Right now I only see the possibility of documenting the fields (in English) in the constructor function and hoping that nobody missed a line where fields were added.
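S3 has no built-in field declarations, so any such check has to be written by hand. One possible convention is a constructor paired with a validator that compares an object against an explicit list of allowed fields. The sketch below is only an illustration of that idea: the class name cf and the field names (cf, cf_boot, icf) are hypothetical and not taken from the package in question.

```r
# Hypothetical exhaustive list of fields a "cf" object may carry.
cf_allowed_fields <- c("cf", "cf_boot", "icf")

# Validator: rejects unknown fields and checks a few simple invariants.
validate_cf <- function(obj) {
  stopifnot(inherits(obj, "cf"))
  unknown <- setdiff(names(obj), cf_allowed_fields)
  if (length(unknown) > 0) {
    stop("unknown field(s) in cf object: ", paste(unknown, collapse = ", "))
  }
  if (!is.matrix(obj$cf)) {
    stop("field 'cf' must be a matrix")
  }
  if (!is.null(obj$cf_boot) && ncol(obj$cf_boot) != ncol(obj$cf)) {
    stop("'cf_boot' must have as many columns as 'cf'")
  }
  obj
}

# Constructor: the only place where objects of the class are created.
new_cf <- function(cf, ...) {
  validate_cf(structure(list(cf = cf, ...), class = "cf"))
}

ok <- new_cf(matrix(rnorm(12), nrow = 3))

# A misspelled or undeclared field is caught instead of silently added.
bad <- tryCatch(
  new_cf(matrix(0, 2, 2), typo_field = 1),
  error = function(e) conditionMessage(e)
)
```

This is not automatic enforcement: it only helps if every function that modifies such an object calls validate_cf() before returning it, for example from a unit test that runs each exported function.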

How to use RWeka classifiers function attribute "options"?

In RWeka classifiers, there is an argument "options" in the classifier's function call, e.g. Bagging(formula, data, subset, na.action, control = Weka_control(), options = NULL). Could someone please give an example (some sample R code) of how to define these options?
I would be interested in passing on some options (such as the number of iterations and size of each bag) to Bagging meta learner of RWeka. Thanks in advance!
You can get at the features that you mentioned, but not through options.
First, what does options do? According to the help page ?Bagging:
Argument options allows further customization. Currently, options model and instances (or partial matches for these) are used: if set to TRUE, the model frame or the corresponding Weka instances, respectively, are included in the fitted model object, possibly speeding up subsequent computations on the object. By default, neither is included.
So options simply stores more information in the returned result. To get at the features that you want, you need to use control. You will need to construct the value for control using the function Weka_control. Without some help, it is hard to know how to use that, but luckily, help is available through WOW, the Weka Option Wizard. Because there are many options, the output is long. I am going to truncate it to just the part about the features that you mentioned: the number of iterations and the size of each bag. But do look at what else is available.
WOW(Bagging)
-P Size of each bag, as a percentage of the training set size. (default 100)
-I <num>
Number of iterations. (current value 10)
Number of arguments: 1.
Repeating: I have truncated the output to show just these two options.
Example: Iris data
Suppose that I wanted to use bagging with the iris data with the bag size being 90% of the data (instead of the default 100%) and with 20 iterations (instead of the default 10). First, I would build the Weka_control, then include that in my call to Bagging.
library(RWeka)
WC = Weka_control(P=90, I=20)  # bag size 90% of the training set, 20 iterations
BagOfIrises = Bagging(Species ~ ., data=iris, control=WC)
I hope that this helps.

Exclude specific tensors being updated by optimizer in TensorFlow

I have two graphs that I want to train independently, which means I have two different optimizers, but at the same time one of them uses the tensor values of the other graph. As a result, I need to be able to stop specific tensors from being updated while training one of the graphs. I have assigned two different name scopes to my tensors and am using this code to control which tensors each optimizer updates:
mentor_training_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentor")
train_op_mentor = mnist.training(loss_mentor, FLAGS.learning_rate, mentor_training_vars)
mentee_training_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "mentee")
train_op_mentee = mnist.training(loss_mentee, FLAGS.learning_rate, mentee_training_vars)
The variable list is used as shown below, in the training function of the mnist module:
def training(loss, learning_rate, var_list):
    # Add a scalar summary for the snapshot loss.
    tf.summary.scalar('loss', loss)
    # Create the gradient descent optimizer with the given learning rate.
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # Create a variable to track the global step.
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Use the optimizer to apply the gradients that minimize the loss
    # (and also increment the global step counter) as a single training step.
    train_op = optimizer.minimize(loss, global_step=global_step, var_list=var_list)
    return train_op
I'm using the var_list argument of the optimizer's minimize method to control which variables the optimizer updates.
Right now I'm unsure whether I have done this correctly, and whether there is any way to check that an optimizer updates only part of a graph.
I would appreciate it if anyone could help me with this issue.
Thanks!
I have had a similar problem and used the same approach as you, i.e. via the var_list argument of the optimizer. I then checked whether the variables not intended for training stayed the same using:
the_var_np = sess.run(tf.get_default_graph().get_tensor_by_name('the_var:0'))
assert np.equal(the_var_np, pretrained_weights['the_var']).all()
pretrained_weights is a dictionary returned by np.load('some_file.npz') which I used to store the pre-trained weights to disk.
Just in case you need that as well, here is how you can override a tensor with a given value:
value = pretrained_weights['the_var']
variable = tf.get_default_graph().get_tensor_by_name('the_var:0')
sess.run(tf.assign(variable, value))
