This question is related to an earlier one where I accepted an answer too early, as it doesn't solve what I actually needed.
The data looks more like this:
m4 <- read.table(header=T, text='
model1 model2 model3 Output Model
0.13 0.113 0.18 0.4 m4
0.157 0.11 0.21 0.50 m4
0.058 0.03 0.18 0.46 m4 ')
m3 <- read.table(header=T, text='
model1 model2 model3 Output Model
0.13 0.113 0.18 0.4 m3
0.157 0.11 0.21 0.50 m3
0.058 0.03 0.18 0.46 m3 ')
m2 <- read.table(header=T, text='
model1 model2 model3 Output Model
0.200 0.099 NA NA m2
0.356 0.25 NA NA m2 ')
m1 <- read.table(header=T, text='
model1 model2 model3 Output Model
0.200 0.099 0.3 0.9 m1
0.35 0.252 0.4 0.9 m1 ')
models <- list(m4=m4, m3=m3, m2=m2, m1=m1)
EDIT 1:
Desired result with unsplit:
model1 model2 model3 Output Model
0.200 0.099 0.3 0.9 m1
0.35 0.252 0.4 0.9 m1
0.13 0.113 0.18 0.4 m4
0.157 0.11 0.21 0.50 m4
0.058 0.03 0.18 0.46 m4
The desired solution must use unsplit. That means: a grouping of (4, 4) selects 2 rows from the 4th list entry; likewise (1, 1, 1) selects 3 rows from the 1st list entry.
EDIT 2: Can someone point me to where I can read more about unsplit? I cannot find anything, even in books.
EDIT 3: Now suppose I have this helper to provide the indexing for extraction from the list:
mat <- matrix(1, 5, 4)
mat[1, 1:3] <- c(0.66, 0.33, 0.33)
mat[2, 1:3] <- c(0.66, 0.33, 0.33)
extract <- apply(mat, 1, which.max)
This is supposed to work:
unsplit(models, extract)
unsplit doesn't do what you think it does. To extract the 1st and 4th models, you just need your usual square bracket indexing.
models[c("m1", "m4")]
or
models[c(1, 4)]
You could use rbind and just access the elements with [:
do.call(rbind, models[c("m1", "m4")])
model1 model2 model3 Output Model
m1.1 0.200 0.099 0.30 0.90 m1
m1.2 0.350 0.252 0.40 0.90 m1
m4.1 0.130 0.113 0.18 0.40 m4
m4.2 0.157 0.110 0.21 0.50 m4
m4.3 0.058 0.030 0.18 0.46 m4
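Tying this back to EDIT 3: unsplit itself expects the factor that split originally produced, so it cannot drive this extraction. But the extract vector already encodes the answer: unique() keeps list positions in order of first appearance, and rbind stacks the corresponding entries. A self-contained sketch, with trimmed one-column data frames standing in for the full ones above:

```r
# trimmed stand-ins for the m1..m4 data frames above
models <- list(
  m4 = data.frame(model1 = c(0.13, 0.157, 0.058), Model = "m4"),
  m3 = data.frame(model1 = c(0.13, 0.157, 0.058), Model = "m3"),
  m2 = data.frame(model1 = c(0.200, 0.356), Model = "m2"),
  m1 = data.frame(model1 = c(0.200, 0.35), Model = "m1")
)

extract <- c(4, 4, 1, 1, 1)  # as produced by apply(mat, 1, which.max)

# unique() keeps first-appearance order: positions 4 (m1) then 1 (m4)
res <- do.call(rbind, models[unique(extract)])
res$Model
#> [1] "m1" "m1" "m4" "m4" "m4"
```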
I conducted a factor analysis and want to create the latent concepts (postmaterialism and materialism) from the correlated variables (see the fa output below). Later on I want to merge the data set I used for the fa with another data set, so I kept the ID variable in order to use it later as a key variable. Now my problem is that I need to exclude the ID variable's factor loadings, because otherwise they will distort each individual's score on the latent concepts. I tried different commands, such as:
!("ID"), with = FALSE, - ("ID"), with = FALSE, setdiff(names(expl_fa2),("ID")), with = FALSE
but nothing worked.
This is my code for the latent variables:
data_fa_1 <- data_fa_1 %>% mutate(postmat = expl_fa2$scores[,1], mat = expl_fa2$scores[,2])
And this is the output from the factor analysis:
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR2 h2 u2 com
import_of_new_ideas 0.48 0.06 0.233 0.77 1.0
import_of_safety 0.06 0.61 0.375 0.63 1.0
import_of_trying_things 0.66 0.03 0.435 0.57 1.0
import_of_obedience 0.01 0.49 0.240 0.76 1.0
import_of_modesty 0.01 0.44 0.197 0.80 1.0
import_of_good_time 0.62 0.01 0.382 0.62 1.0
import_of_freedom 0.43 0.16 0.208 0.79 1.3
import_of_strong_gov 0.15 0.57 0.350 0.65 1.1
import_of_adventures 0.64 -0.15 0.427 0.57 1.1
import_of_well_behav 0.03 0.64 0.412 0.59 1.0
import_of_traditions 0.03 0.50 0.253 0.75 1.0
import_of_fun 0.67 0.03 0.449 0.55 1.0
ID 0.07 0.04 0.007 0.99 1.7
Can anyone help me with the command I need in order to exclude the ID variable's factor loadings (see the fa output) from the creation of the latent variables "postmat" and "mat"?
Not sure if this is really your question, but assuming you just want to remove the first column from a data.table, here is an example data.table and three ways you could exclude the ID column:
library(data.table)

DT <- data.table(
  ID = LETTERS[1:10],
  matrix(rnorm(50), nrow = 10, dimnames = list(NULL, paste0("col", 1:5)))
)
DT[, -1]
DT[, -"ID"]
DT[, setdiff(colnames(DT), "ID"), with=FALSE]
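If the goal is the factor analysis itself, the same removal can be applied before fitting, so ID never contributes any loadings in the first place. A sketch, assuming psych::fa() produced the output in the question; the item names and the 2-factor call here are made up for illustration:

```r
library(data.table)
library(psych)

# toy survey data; ID is a key variable, not an item
DT <- data.table(
  ID = 1:100,
  matrix(sample(1:5, 600, replace = TRUE), nrow = 100,
         dimnames = list(NULL, paste0("item", 1:6)))
)

# drop ID before fitting, so the scores contain no ID contribution
expl_fa2 <- fa(DT[, -"ID"], nfactors = 2)
dim(expl_fa2$scores)  # 100 rows of scores, 2 factors, ID excluded
```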
I have survey data collected with the same questionnaire in different languages. I would like to write elegant dplyr/tidyverse code that computes the reliability for each language using psych::alpha. Let's imagine that the data frame (df) looks like this:
I want to calculate item and scale reliability for Q_1:Q_6, for each group indicated by the group_var variable, and the code I wrote looks like this:
require(tidyverse)
require(psych)
require(broom)
df %>%
  select(group_var, Q_1:Q_6) %>%
  as.data.frame() %>%
  group_by(group_var) %>%
  do(tidy(psych::alpha(c(Q_1:Q_6))))
but when I run the code, I got an error message:
Error in psych::alpha(c(Q_1:Q_6)) :
object 'Q_1' not found
What is wrong with the code?
Thanks in advance.
I don't think tidy works on psych::alpha(). Using an example:
r4 <- sim.congeneric()
tidy(alpha(r4))
Error: No tidy method for objects of class psych
So tidy is out of the question. The best thing you can do is wrap the results up in a list within a tibble:
library(dplyr)
library(tidyr)
library(purrr)
library(psych)
library(broom)
df = data.frame(group_var=sample(LETTERS[1:6],100,replace=TRUE),
matrix(sample(0:3,900,replace=TRUE),nrow=100))
colnames(df)[-1] = c(paste0("Q_",1:6), paste0("V_", 23:25))
res = df %>%
select(group_var, Q_1:Q_6) %>%
nest(data=Q_1:Q_6) %>%
mutate(alpha = map(data,
~alpha(.x,keys=c("Q_1","Q_2","Q_3","Q_4","Q_5","Q_6"))
))
res$alpha[[1]]
Reliability analysis
Call: alpha(x = .x, keys = c("Q_1", "Q_2", "Q_3", "Q_4", "Q_5", "Q_6"))
raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
-0.37 -0.3 0.13 -0.04 -0.23 0.6 1.6 0.36 0.039
lower alpha upper 95% confidence boundaries
-1.54 -0.37 0.81
Reliability if an item is dropped:
raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
Q_1- -0.38 -0.38221 -0.143 -0.05854 -0.27652 0.61 0.028 -0.080
Q_2- -0.21 -0.19042 0.173 -0.03305 -0.15996 0.54 0.048 0.066
Q_3- -0.38 -0.26988 0.096 -0.04439 -0.21252 0.61 0.053 0.046
Q_4- -0.54 -0.41760 -0.064 -0.06261 -0.29458 0.68 0.045 -0.016
Q_5- -0.35 -0.26006 0.154 -0.04305 -0.20639 0.60 0.058 0.059
Q_6- 0.03 -0.00088 0.107 -0.00018 -0.00088 0.42 0.024 -0.016
Item statistics
n raw.r std.r r.cor r.drop mean sd
Q_1- 13 0.42 0.45 0.552 -0.062 0.77 1.01
Q_2- 13 0.38 0.33 -0.073 -0.162 1.85 1.14
Q_3- 13 0.39 0.38 0.083 -0.058 1.92 0.95
Q_4- 13 0.45 0.47 0.416 0.050 1.62 0.87
Q_5- 13 0.33 0.38 -0.039 -0.073 2.08 0.86
Q_6- 13 0.21 0.18 -0.137 -0.309 1.38 1.12
Non missing response frequency for each item
0 1 2 3 miss
Q_1 0.08 0.15 0.23 0.54 0
Q_2 0.38 0.23 0.23 0.15 0
Q_3 0.31 0.38 0.23 0.08 0
Q_4 0.15 0.38 0.38 0.08 0
Q_5 0.38 0.31 0.31 0.00 0
Q_6 0.15 0.38 0.15 0.31 0
A quick check suggests tidystats might be able to do it, but I ran the example code and it doesn't seem to work, so you can try it for yourself.
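To get tidy numbers back out of that list column, one option is to pull scalar summaries from each alpha object with map_dbl; the $total slot holding raw_alpha and std.alpha is part of psych's alpha return value. A self-contained sketch rebuilding the example data:

```r
library(dplyr)
library(tidyr)
library(purrr)
library(psych)

# rebuild the toy data: one grouping variable, six items
df <- data.frame(group_var = sample(LETTERS[1:6], 100, replace = TRUE),
                 matrix(sample(0:3, 600, replace = TRUE), nrow = 100))
colnames(df)[-1] <- paste0("Q_", 1:6)

# one psych alpha object per group, stored in a list column
res <- df %>%
  nest(data = Q_1:Q_6) %>%
  mutate(alpha = map(data, ~ alpha(.x)))

# extract scalar reliability summaries from each object
res %>%
  mutate(raw_alpha = map_dbl(alpha, ~ .x$total$raw_alpha),
         std_alpha = map_dbl(alpha, ~ .x$total$std.alpha)) %>%
  select(group_var, raw_alpha, std_alpha)
```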
The psych::print.psych() function produces beautiful output for the factor analysis objects produced by psych::fa(). I would like to obtain the table that follows the text "Standardized loadings (pattern matrix) based upon correlation matrix" as a data frame without cutting and pasting.
library(psych)
my.fa <- fa(Harman74.cor$cov, 4)
my.fa #Equivalent to print.psych(my.fa)
Yields the following (I'm showing the first four items here):
Factor Analysis using method = minres
Call: fa(r = Harman74.cor$cov, nfactors = 4)
Standardized loadings (pattern matrix) based upon correlation matrix
MR1 MR3 MR2 MR4 h2 u2 com
VisualPerception 0.04 0.69 0.04 0.06 0.55 0.45 1.0
Cubes 0.05 0.46 -0.02 0.01 0.23 0.77 1.0
PaperFormBoard 0.09 0.54 -0.15 0.06 0.34 0.66 1.2
Flags 0.18 0.52 -0.04 -0.02 0.35 0.65 1.2
I tried examining the source code for print.psych (Using View(print.psych) in RStudio), but could only find a section for printing standardized loadings for 'Factor analysis by Groups'.
The my.fa$weights are not standardized, and the table is missing the h2, u2, and com columns. If they can be standardized, the following code could work:
library(data.table)
library(psych)
my.fa <- fa(Harman74.cor$cov,4)
my.fa.table <- data.table(dimnames(Harman74.cor$cov)[[1]],
my.fa$weights, my.fa$communalities, my.fa$uniquenesses, my.fa$complexity)
setnames(my.fa.table, old = c("V1", "V3", "V4", "V5"),
new = c("item", "h2", "u2", "com"))
Printing my.fa.table gives the following (I show the first four lines), which indicates $weights is incorrect:
item MR1 MR3 MR2 MR4 h2 u2 com
1: VisualPerception -0.021000973 0.28028576 0.006002429 -0.001855021 0.5501829 0.4498201 1.028593
2: Cubes -0.003545975 0.11022570 -0.009545919 -0.012565221 0.2298420 0.7701563 1.033828
3: PaperFormBoard 0.028562047 0.13244895 -0.019162262 0.014448449 0.3384722 0.6615293 1.224154
4: Flags 0.009187032 0.14430196 -0.025374834 -0.033737089 0.3497962 0.6502043 1.246102
Replacing $weights with $loadings gives the following error message:
Error in as.data.frame.default(x, ...) :
cannot coerce class ‘"loadings"’ to a data.frame
Update:
Adding [,] fixed the class issue:
library(data.table)
library(psych)
my.fa <- fa(Harman74.cor$cov,4)
my.fa.table <- data.table(dimnames(Harman74.cor$cov)[[1]],
my.fa$loadings[,], my.fa$communalities, my.fa$uniquenesses, my.fa$complexity)
setnames(my.fa.table, old = c("V1", "V3", "V4", "V5"),
new = c("item", "h2", "u2", "com"))
my.fa.table
item MR1 MR3 MR2 MR4 h2 u2 com
1: VisualPerception 0.04224875 0.686002901 0.041831185 0.05624303 0.5501829 0.4498201 1.028593
2: Cubes 0.05309628 0.455343417 -0.022143990 0.01372376 0.2298420 0.7701563 1.033828
3: PaperFormBoard 0.08733001 0.543848733 -0.147686005 0.05523805 0.3384722 0.6615293 1.224154
4: Flags 0.17641395 0.517235582 -0.038878915 -0.02229273 0.3497962 0.6502043 1.246102
I would still be happy to get an answer that does this more elegantly or explains why this isn't built in.
It is not built in because each person wants something slightly different. As you discovered, you can create a table by combining four objects from fa: the loadings, the communalities, the uniqueness, and the complexity.
df <- data.frame(unclass(f$loadings), h2 = f$communalities,
                 u2 = f$uniquenesses, com = f$complexity)
round(df, 2)
so, for the Thurstone correlation matrix:
f <- fa(Thurstone,3)
df <- data.frame(unclass(f$loadings), h2 = f$communalities,
                 u2 = f$uniquenesses, com = f$complexity)
round(df, 2)
Produces
MR1 MR2 MR3 h2 u2 com
Sentences 0.90 -0.03 0.04 0.82 0.18 1.01
Vocabulary 0.89 0.06 -0.03 0.84 0.16 1.01
Sent.Completion 0.84 0.03 0.00 0.74 0.26 1.00
First.Letters 0.00 0.85 0.00 0.73 0.27 1.00
Four.Letter.Words -0.02 0.75 0.10 0.63 0.37 1.04
Suffixes 0.18 0.63 -0.08 0.50 0.50 1.20
Letter.Series 0.03 -0.01 0.84 0.73 0.27 1.00
Pedigrees 0.38 -0.05 0.46 0.51 0.49 1.96
Letter.Group -0.06 0.21 0.63 0.52 0.48 1.25
Or, you can try fa2latex for nice LaTeX-based formatting.
fa2latex(f)
which produces a LaTeX table in quasi-APA style.
I have two questions.
> data<-read.table("UC.txt",header=TRUE, sep="\t")
> data$tot<-data$P1+data$P2+data$P3+data$P4
> head(data, 5)
geno P1 P2 P3 P4 tot
1 G1 0.015 0.007 0.026 0.951 0.999
2 G2 0.008 0.006 0.015 0.970 0.999
3 G3 0.009 0.006 0.017 0.968 1.000
4 G4 0.011 0.007 0.017 0.965 1.000
5 G5 0.013 0.005 0.021 0.961 1.000
Question #1: sometimes the number of columns varies, so how can I sum from column 2 to the last column? Something like data[2]:data[n].
library("plyr")
> VD<-function(P4, tot){
if(tot > 1) {return(P4-0.01)}
if(tot < 1) {return(P4+0.01)}
if(tot == 1) {return(P4)}
}
> minu<-ddply(data, 'geno', summarize, Result=VD(P4, tot))
> v <- data$geno==minu$geno
> data[v, "P4"] <- minu[v, "Result"]
> data <- subset(data, select = -tot)
> data$tot<-data$P1+data$P2+data$P3+data$P4
> head(data, 5)
geno P1 P2 P3 P4 tot
1 G1 0.02 0.01 0.03 0.94 1
2 G2 0.01 0.01 0.02 0.96 1
3 G3 0.01 0.01 0.02 0.96 1
4 G4 0.01 0.01 0.02 0.96 1
5 G5 0.01 0.01 0.02 0.96 1
Question #2: here, I need to round off 'tot' to 1 by adjusting P1 to P4.
Conditions:
1) I should adjust the maximum among P1 to P4.
2) The adjustment value may differ, like 0.01, 0.001, 0.0001 (it is based on 1 - tot).
How to do this?
Thanks in advance
For question #1, to sum all columns except the first one:
data$tot <- rowSums(data[, -1])
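Question #2 isn't covered above. One sketch of it, assuming (as in the example) the proportion columns are P1:P4: add each row's residual 1 - tot to that row's largest proportion, which handles residuals of any magnitude (0.01, 0.001, ...) automatically.

```r
# example rows mimicking the question's data
data <- data.frame(geno = c("G1", "G2"),
                   P1 = c(0.015, 0.008), P2 = c(0.007, 0.006),
                   P3 = c(0.026, 0.015), P4 = c(0.951, 0.970))
data$tot <- rowSums(data[, -1])

# matrix indexing: one (row, column-of-maximum) pair per row
props <- as.matrix(data[, c("P1", "P2", "P3", "P4")])
idx <- cbind(seq_len(nrow(props)), max.col(props))
props[idx] <- props[idx] + (1 - data$tot)

data[, c("P1", "P2", "P3", "P4")] <- props
data$tot <- rowSums(data[, c("P1", "P2", "P3", "P4")])
data$tot  # every row now sums to 1 (up to floating point)
```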
I have a matrix of 8 rows and 12 columns, in which I randomly distributed 10 different treatments with 9 replicates each, plus a final treatment with only 6 replicates. The code might be redundant, but it was the first thing that came to mind and it worked. I just wanted a scheme I could follow easily in the lab afterwards, to avoid mistakes:
library(ggplot2)
library(RColorBrewer)
library(reshape2)
library(scales)
replicates <- rep(1:11, c(rep(9, 10), 6)); replicates
dimna <- list(c("A","B","C","D","E","F","G","H"), seq(1, 12, 1))
plate <- array(sample(replicates), dim = c(8, 12), dimnames = dimna); plate
platec <- melt(plate); platec
guide <- ggplot(platec, aes(Var2, Var1, fill = factor(value))) +
  geom_tile() +
  geom_text(aes(label = value)) +
  ylim(rev(levels(platec$Var1))) +
  theme_bw() +
  theme(panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank(),
        axis.text.x = element_text(size = 10),
        axis.title.y = element_blank(),
        axis.text.y = element_text(size = 12)) +
  scale_fill_brewer(name = "", palette = "Spectral") +
  scale_x_continuous("", labels = seq(1, 12, 1), breaks = seq(1, 12, 1))
guide
However, now imagine that I take measurements of the randomized matrix multiple times. For the data processing I need to identify the treatments and replicates in the matrix. I can have the data at the end in a columnwise fashion:
A1 A2 A3 A4 A5 A6 A7 A8
0.12 0.2 0.124 0.14 0.4 0.18 0.46 0.47
0.13 0.21 0.6 0 0 0.58 0.4 0.2
0.15 0.248 0.58 0.4 0.2 0.248 0.2 0.18
0.18 0.46 0.47 0.3 0.21 0.2 0.21 0.58
0.1784 0.14 0.95 0.7 0.248 0.21 0.248 0.248
...
Or rowwise fashion:
A1 0.12 0.13 0.15 0.18 0.1784
A2 0.2 0.21 0.248 0.46 0.14
A3 0.124 0.6 0.58 0.47 0.95
A4 0.14 0 0.4 0.3 0.7
A5 0.4 0 0.2 0.21 0.248
A6 0.18 0.58 0.248 0.2 0.21
A7 0.46 0.4 0.2 0.21 0.248
A8 0.47 0.2 0.18 0.58 0.248
...
Is there a way in R to relate the random matrix to the data I have collected? I have no clue how to begin. I'm sorry for not including an attempt, but I honestly don't know where to start.
I think I know what you're asking... let me know if this doesn't make sense.
You need to have a design dataframe first - let's make a dummy plate:
Wells <- paste0(rep(LETTERS[1:8],each=12), rep(1:12, times = 8))
design <- data.frame(Wells, ID = sample(letters[1:10], 96, replace = TRUE))
Then when you get your result, assuming it's in a dataframe (your 'rowwise fashion?'), you can merge them together:
#dummy result data
result <- data.frame(Wells, measure = rnorm(96, 0.5))
result_whole <- merge(design, result)
head(result_whole)
# Wells ID measure
#1 A1 j -0.4408472
#2 A10 d -0.5852285
#3 A11 d 1.0379943
#4 A12 e 0.6917493
#5 A2 g 0.8126982
#6 A3 b 2.0218953
If you keep your designs neatly, this is very straightforward. You can then label the results (measure in this case) however you want to keep track of it all.
I hope that addresses your problem...
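If the measurements come back laid out as the plate itself (the columnwise/rowwise tables in the question) rather than as a Wells/measure data frame, a sketch of getting them into mergeable form with reshape2 (which the question already loads) is:

```r
library(reshape2)

# dummy 8 x 12 plate of measurements, rows A-H, columns 1-12
plate_meas <- matrix(rnorm(96, 0.5), nrow = 8,
                     dimnames = list(LETTERS[1:8], 1:12))

# melt to long form and rebuild the well names used in the design
result <- melt(plate_meas, varnames = c("row", "col"),
               value.name = "measure")
result$Wells <- paste0(result$row, result$col)

# now merge(design, result[, c("Wells", "measure")]) works as above
head(result)
```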