How to create a series of indexed matrices with R?

I would like to create a for-loop in order to generate several tables called S1, S2...
For the moment, this is what I have:
S1<-int[int$IdVar == "X1",1:n]
S2<-int[int$IdVar == "X2",1:n]
S3<-int[int$IdVar == "X3",1:n]
S4<-int[int$IdVar == "X4",1:n]
S5<-int[int$IdVar == "X5",1:n]
S6<-int[int$IdVar == "X6",1:n]
S7<-int[int$IdVar == "X7",1:n]
But the IdVar variable can have more or fewer levels, so I have to add or remove lines, which is not very efficient!
Could you help me please to find the best way to create my loop ?
I hope I have made this sufficiently clear.
Thank you very much for your help,

We could loop over the values "X1" to "X7" using lapply and subset the 'int' dataset, then change the names of the list elements to S1 through S7.
lst <- setNames(lapply(paste0('X', 1:7), function(x)
int[int$IdVar==x, 1:n]), paste0("S", 1:7))
It would be better to keep the datasets within the list rather than creating individual objects in the global environment. If you want to do this, then list2env can be used.
list2env(lst, envir=.GlobalEnv)
Another option is using assign with a for loop.
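A minimal sketch of the assign approach, assuming a toy 'int' data frame (the data and the value of n below are made up for illustration). Looping over the values that IdVar actually contains means no lines have to be added or removed by hand when the levels change:

```r
# Toy data standing in for the asker's 'int' (hypothetical values)
set.seed(1)
int <- data.frame(val1 = runif(20), val2 = runif(20),
                  IdVar = rep(paste0("X", 1:4), each = 5),
                  stringsAsFactors = FALSE)
n <- 2  # number of leading columns to keep, as in int[..., 1:n]

ids <- unique(int$IdVar)
for (id in ids) {
  # "X3" becomes the object name "S3"
  assign(sub("^X", "S", id), int[int$IdVar == id, 1:n])
}
```

The list-based version above is usually easier to work with afterwards, since the tables can be processed together with lapply.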

A sillier but nonetheless effective solution:
Get R to automatically write a script to assign the variables and then source that script.
for(s in 1:7){
write(paste0("S", s, "<-int[int$IdVar == \"X", s, "\",1:n]"), "tmp.R", append = TRUE)
}
source("tmp.R")
file.remove("tmp.R")
Such an approach I also find useful for referring back to variables that you've created in an automated manner (as you are doing). You can use assign to create variables, but not to refer back to them afterwards. (If anyone knows another way to do that I'd be interested).
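For referring back to variables created with assign, base R provides get() (one object, looked up by name) and mget() (several objects, returned as a named list). A minimal sketch with made-up values:

```r
for (s in 1:3) {
  assign(paste0("S", s), s * 10)   # creates S1, S2, S3 (illustrative values)
}

one <- get("S2")                    # look up a single object by its name
vals <- mget(paste0("S", 1:3))      # look up several; returns a named list
```

Here vals can be passed straight to lapply, which avoids writing and sourcing a temporary script.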

Consider using a list instead of creating global variables. 'split' will do what you want, and if you don't like the names "X1", you can change them to "S1" easily. This will also handle all the values in 'IdVar' without you having to specify them:
> n <- 100
> int <- data.frame(IdVar = sample(paste0("X", 1:7), n, TRUE)
+ , val = runif(n)
+ , stringsAsFactors = FALSE
+ )
> # split into a list
> S <- split(int, int$IdVar)
>
> str(S)
List of 7
$ X1:'data.frame': 13 obs. of 2 variables:
..$ IdVar: chr [1:13] "X1" "X1" "X1" "X1" ...
..$ val : num [1:13] 0.104 0.515 0.135 0.501 0.94 ...
$ X2:'data.frame': 18 obs. of 2 variables:
..$ IdVar: chr [1:18] "X2" "X2" "X2" "X2" ...
..$ val : num [1:18] 0.7697 0.6108 0.9354 0.2199 0.0235 ...
$ X3:'data.frame': 16 obs. of 2 variables:
..$ IdVar: chr [1:16] "X3" "X3" "X3" "X3" ...
..$ val : num [1:16] 0.758 0.347 0.48 0.781 0.157 ...
$ X4:'data.frame': 11 obs. of 2 variables:
..$ IdVar: chr [1:11] "X4" "X4" "X4" "X4" ...
..$ val : num [1:11] 0.658 0.247 0.515 0.731 0.114 ...
$ X5:'data.frame': 15 obs. of 2 variables:
..$ IdVar: chr [1:15] "X5" "X5" "X5" "X5" ...
..$ val : num [1:15] 0.502 0.71 0.394 0.738 0.147 ...
$ X6:'data.frame': 14 obs. of 2 variables:
..$ IdVar: chr [1:14] "X6" "X6" "X6" "X6" ...
..$ val : num [1:14] 0.687 0.625 0.705 0.468 0.382 ...
$ X7:'data.frame': 13 obs. of 2 variables:
..$ IdVar: chr [1:13] "X7" "X7" "X7" "X7" ...
..$ val : num [1:13] 0.903 0.466 0.558 0.799 0.527 ...
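Renaming the list elements afterwards could look like the following sketch, which assumes every value of IdVar starts with "X" (the seed and data are only for illustration):

```r
set.seed(1)
int <- data.frame(IdVar = sample(paste0("X", 1:7), 100, TRUE),
                  val = runif(100),
                  stringsAsFactors = FALSE)

S <- split(int, int$IdVar)
names(S) <- sub("^X", "S", names(S))  # "X1" -> "S1", and so on

first <- S[["S1"]]                     # one table, accessed by name
```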

Related

R function write.xlsx only convert 1 data

I have 4 columns and 34 rows of data. I tried to export it to Excel in xlsx format using write.xlsx, but the resulting Excel file shows only one value.
library(openxlsx)
data = scale(DATA2)
write.xlsx(data, "outpu2t.xlsx");
This is my data
and this is the output
The key consideration here is that the output of the scale() function is an object of type matrix(), whereas write.xlsx() requires an input of type data.frame(). The following code creates a data frame, uses scale() to scale it, and prints the structure to show that the data frame has been converted to a matrix().
df <- data.frame(matrix(runif(4 * 34),ncol=4))
str(df)
> df <- data.frame(matrix(runif(4 * 34),ncol=4))
> str(df)
'data.frame': 34 obs. of 4 variables:
$ X1: num 0.438 0.134 0.671 0.392 0.613 ...
$ X2: num 0.9 0.793 0.668 0.351 0.275 ...
$ X3: num 0.201 0.892 0.74 0.788 0.14 ...
$ X4: num 0.996 0.619 0.492 0.904 0.615 ...
scaledData <- scale(df)
str(scaledData)
> scaledData <- scale(df)
> str(scaledData)
num [1:34, 1:4] -0.174 -1.386 0.752 -0.36 0.521 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "X1" "X2" "X3" "X4"
- attr(*, "scaled:center")= Named num [1:4] 0.482 0.591 0.508 0.471
..- attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
- attr(*, "scaled:scale")= Named num [1:4] 0.251 0.206 0.306 0.29
..- attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
We can solve the problem by casting the output of scale() with data.frame().
The following code generates a 34 x 4 matrix, scales it, and casts it to a data.frame() as part of the write.xlsx() call.
aMatrix <- matrix(runif(4 * 34),ncol=4)
library(openxlsx)
write.xlsx(data.frame(scale(aMatrix)),"./data/aSpreadsheet.xlsx")
The resulting spreadsheet looks like this when viewed in Microsoft Excel.
Note that writexl::write_xlsx() will also fail when passed an input of type matrix(), so this is not a tidyverse vs. openxlsx problem.
library(writexl)
b <- scale(aMatrix)
write_xlsx(b,"./data/aSpreadsheetWritexl.xlsx")
...generates the following error:
> write_xlsx(b,"./data/aSpreadsheetWritexl.xlsx")
Error in write_xlsx(b, "./data/aSpreadsheetWritexl.xlsx") :
Argument x must be a data frame or list of data frames
I am cognisant that you asked this question pointing to the package {openxlsx}. I used this a while back as well and ran into multiple problems. Being biased and leaning towards the {tidyverse} family, there is a cool package that comes from that part of the R/RStudio ecosystem: {writexl}.
If not yet installed: install.packages("writexl")
Then run the following without pain ... it does not require installing other fancy stuff, dependencies, etc.:
library(writexl)
# create a reproducible data set of 34 rows
my_data <- iris[1:34,]
# write-out my_data to the data subfolder in the project - configure as appropriate for your environment
write_xlsx(x = my_data, path = "./data/my_data.xlsx")
This gives you the following without problems:
The solution is to convert matrix data into a data frame.
data <- as.data.frame(data)
Then,
write.xlsx(data, "outpu2t.xlsx")
will work as expected.
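A quick way to check the conversion before writing the file is sketched below. DATA2 here is a random stand-in for the asker's data, and the output path is a placeholder in a temporary directory:

```r
set.seed(1)
DATA2 <- data.frame(matrix(runif(4 * 34), ncol = 4))  # stand-in for the real data

scaled <- scale(DATA2)
is.matrix(scaled)              # scale() returns a matrix, not a data frame

out <- as.data.frame(scaled)   # convert back before exporting
is.data.frame(out)
dim(out)

# Write only if openxlsx is installed; the file name is a placeholder
if (requireNamespace("openxlsx", quietly = TRUE)) {
  openxlsx::write.xlsx(out, file.path(tempdir(), "scaled.xlsx"))
}
```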

Using apply over two lists of different lengths

This question is related to my earlier question found here: https://stackoverflow.com/questions/33089532/r-accounting-for-a-factor-with-this-logistic-regression-function-replace-lappl
I realize that I didn't do a good job of asking the first question, so here is a simpler analog with actual data:
My data looks something like this:
# data look like this, but with a variable number of "y" columns
wk <- rep(1:50, 2)
X <- rnorm(100, 1)
y1 <- rnorm(100, 1)
y2 <- rnorm(100, 1)
df1 <- data.frame(wk, X, y1, y2)
df1$hyst<-ifelse(df1$wk>=5 & df1$wk<32, "R", "F")
Y<-df1[, -which(colnames(df1) %in% c("wk"))] #this step makes more sense with my actual data since I have a bunch of columns to remove
l1<-length(Y)-1
lst1<-lapply(2:l1,function(x){colnames(Y[x])})
dflst<-c("Y",'Y[Y$hyst=="R",]','Y[Y$hyst=="F",]')
I want to run a model over all Y columns for the full data set (all data) and for two subsets, when the factor hyst=="R" and when hyst=="F".
To do this, I have nested two lapply functions, which sort of works, but I think it essentially doubles my results and is causing me all sorts of list headaches.
Here is the nested lapply code:
lms <- lapply(dflst, function(z){
lapply(lst1, function(y) {
form <- paste0(y, " ~ X")
lm(form, data=eval(parse(text=z)))
})
})
How can I replace or modify the nested lapply function to obtain a model run for each Y column for each data set (all, "R", and "F")?
Construct your list of data frames like this:
DFlst <- c(list(full=Y), split(Y, Y$hyst))
str(DFlst)
List of 3
$ full:'data.frame': 100 obs. of 4 variables:
..$ X : num [1:100] 1.792 3.192 0.367 1.632 1.388 ...
..$ y1 : num [1:100] 3.354 1.189 1.99 0.639 0.1 ...
..$ y2 : num [1:100] 0.864 2.415 0.437 1.069 1.368 ...
..$ hyst: chr [1:100] "F" "F" "F" "F" ...
$ F :'data.frame': 46 obs. of 4 variables:
..$ X : num [1:46] 1.792 3.192 0.367 1.632 0.707 ...
..$ y1 : num [1:46] 3.354 1.189 1.99 0.639 0.894 ...
..$ y2 : num [1:46] 0.864 2.415 0.437 1.069 1.213 ...
..$ hyst: chr [1:46] "F" "F" "F" "F" ...
$ R :'data.frame': 54 obs. of 4 variables:
..$ X : num [1:54] 1.388 2.296 0.409 1.494 0.943 ...
..$ y1 : num [1:54] 0.1002 0.6425 -0.0918 1.199 0.8767 ...
..$ y2 : num [1:54] 1.368 1.122 0.402 -0.237 1.518 ...
..$ hyst: chr [1:54] "R" "R" "R" "R" ...
Do some regressions:
res <- lapply(DFlst, function(DF) {
cols = grep("^y[0-9]+$",names(DF),value=TRUE)
lapply(setNames(cols,cols),
function(y) lm(paste(y,"~X"), data=DF))
})
str(res, list.len=2, give.attr=FALSE)
List of 3
$ full:List of 2
..$ y1:List of 12
.. ..$ coefficients : Named num [1:2] 0.903 0.111
.. ..$ residuals : Named num [1:100] 2.2509 -0.0698 1.046 -0.4464 -0.9578 ...
.. .. [list output truncated]
..$ y2:List of 12
.. ..$ coefficients : Named num [1:2] 1.423 -0.166
.. ..$ residuals : Named num [1:100] -0.2623 1.5213 -0.9253 -0.0837 0.1751 ...
.. .. [list output truncated]
$ F :List of 2
..$ y1:List of 12
.. ..$ coefficients : Named num [1:2] 0.9289 0.0769
.. ..$ residuals : Named num [1:46] 2.2871 0.0146 1.0332 -0.4157 -0.0889 ...
.. .. [list output truncated]
..$ y2:List of 12
.. ..$ coefficients : Named num [1:2] 1.4177 -0.0789
.. ..$ residuals : Named num [1:46] -0.413 1.25 -0.952 -0.22 -0.149 ...
.. .. [list output truncated]
[list output truncated]
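Once the models sit in a named nested list, pulling results out is a matter of indexing by name. The sketch below rebuilds a small version of the example (the data and the hyst assignment are simulated for illustration) and collects the coefficients per data set:

```r
set.seed(1)
Y <- data.frame(X = rnorm(100, 1), y1 = rnorm(100, 1), y2 = rnorm(100, 1),
                hyst = rep(c("R", "F"), each = 50),
                stringsAsFactors = FALSE)

DFlst <- c(list(full = Y), split(Y, Y$hyst))
res <- lapply(DFlst, function(DF) {
  cols <- grep("^y[0-9]+$", names(DF), value = TRUE)
  lapply(setNames(cols, cols), function(y) lm(paste(y, "~ X"), data = DF))
})

# One coefficient matrix per data set: rows are terms, columns are responses
coefs <- lapply(res, function(models) sapply(models, coef))
coefs$full

single <- res$R$y2   # a single model: subset "R", response y2
```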

Rename subtitles of a list using a loop

I have to rename the elements of a list of matrices called l1. Each Name(n) object holds the new name as a character string. Here is my code:
names(l1)[1] <- Name1
names(l1)[2] <- Name2
names(l1)[3] <- Name3
names(l1)[4] <- Name4
## ...
names(l1)[43] <- Name43
As you can see, I have 43 sublists. Is there a way to do that using an automated loop like for (i in 1:43) or something? I tried to write a loop, but I am a beginner and it is very hard for now.
Edit: I would like to rename the elements of my list without having to type 43 lines manually. Here are the first three elements of my list:
str(l1)
List of 43
$ XXX : num [1:640, 1:3] -0.83 -0.925 -0.623 -0.191 0.155 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "EV_BICYCLE" "HW_DISTANCE" "NO_ASSETS"
$ XXX : num [1:640, 1:2] -0.159 0.485 -0.686 -0.245 -3.361 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "HOME_OWN" "METRO_DISTANCE"
$ XXX : num [1:640, 1:3] -0.79 1.15 0.224 0.388 -1.571 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "BICYCLE" "HOME_OWN_SC" "POP_SC"
That is to say, I would like to replace the 43 XXX by Name1, Name2 ... to Name43
Try
names(l1) <- unlist(mget(ls(pattern="^Nom_F")))
str(l1, list.len=2)
#List of 3
# $ Accessibility : int [1:5, 1:5] 10 10 3 9 7 6 8 2 7 8 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:5] "A" "B" "C" "D" ...
# $ Access : int [1:5, 1:5] 6 4 10 5 9 8 9 4 7 1 ...
#..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:5] "A" "B" "C" "D" ...
Instead of creating separate objects, you could create a vector of real titles. For example
v1 <- LETTERS[1:3]
names(l1) <- v1
data
set.seed(42)
l1 <- setNames(lapply(1:3, function(x)
matrix(sample(1:10, 5*5, replace=TRUE), ncol=5,
dimnames=list(NULL, LETTERS[1:5]))), rep('XXX',3))
Nom_F1 <- "Accessibility"
Nom_F2 <- "Access"
Nom_F3 <- "Poverty_and_SC"
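One caveat with the ls() approach: ls() returns names in sorted order, so with more than nine objects a name ending in 10 sorts before the one ending in 2. A sketch that requests the objects in numeric order instead, using a hypothetical list of 12 elements and hypothetical Name1 to Name12 objects:

```r
# Hypothetical list of 12 matrices and 12 name objects, for illustration
l1 <- setNames(lapply(1:12, function(i) matrix(i, 2, 2)), rep("XXX", 12))
for (i in 1:12) assign(paste0("Name", i), paste0("Title_", i))

# Build the object names in numeric order rather than relying on ls()
new_names <- unlist(mget(paste0("Name", seq_along(l1))), use.names = FALSE)
names(l1) <- new_names
```

With 43 elements the same line works unchanged, since seq_along(l1) adapts to the length of the list.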

Adding principal components as variables to a data frame

I am working with a dataset of 10000 data points and 100 variables in R. Unfortunately the variables I have do not describe the data in a good way. I carried out a PCA using prcomp() and the first 3 PCs seem to account for most of the variability of the data. As far as I understand, a principal component is a combination of different variables; therefore it has a certain value corresponding to each data point and can be considered as a new variable. Would I be able to add these principal components as 3 new variables to my data? I would need them for further analysis.
A reproducible dataset:
set.seed(144)
x <- data.frame(matrix(rnorm(2^10*12), ncol=12))
y <- prcomp(formula = ~., data=x, center = TRUE, scale = TRUE, na.action = na.omit)
PC scores are stored in the element x of the prcomp() result.
str(y)
List of 6
$ sdev : num [1:12] 1.08 1.06 1.05 1.04 1.03 ...
$ rotation: num [1:12, 1:12] -0.0175 -0.1312 0.3284 -0.4134 0.2341 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:12] "X1" "X2" "X3" "X4" ...
.. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
$ center : Named num [1:12] 0.02741 -0.01692 -0.03228 -0.03303 0.00122 ...
..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
$ scale : Named num [1:12] 0.998 1.057 1.019 1.007 0.993 ...
..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
$ x : num [1:1024, 1:12] 1.023 -1.213 0.167 -0.118 -0.186 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:1024] "1" "2" "3" "4" ...
.. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
$ call : language prcomp(formula = ~., data = x, na.action = na.omit, center = TRUE, scale = TRUE)
- attr(*, "class")= chr "prcomp"
You can get them with y$x and then choose the columns you need.
x.new<-cbind(x,y$x[,1:3])
str(x.new)
'data.frame': 1024 obs. of 15 variables:
$ X1 : num 1.14 2.38 0.684 1.785 0.313 ...
$ X2 : num -0.689 0.446 -0.72 -3.511 0.36 ...
$ X3 : num 0.722 0.816 0.295 -0.48 0.566 ...
$ X4 : num 1.629 0.738 0.85 1.057 0.116 ...
$ X5 : num -0.737 -0.827 0.65 -0.496 -1.045 ...
$ X6 : num 0.347 0.056 -0.606 1.077 0.257 ...
$ X7 : num -0.773 1.042 2.149 -0.599 0.516 ...
$ X8 : num 2.05511 0.4772 0.18614 0.02585 0.00619 ...
$ X9 : num -0.0462 1.3784 -0.2489 0.1625 0.6137 ...
$ X10: num -0.709 0.755 0.463 -0.594 -1.228 ...
$ X11: num -1.233 -0.376 -2.646 1.094 0.207 ...
$ X12: num -0.44 -2.049 0.315 0.157 2.245 ...
$ PC1: num 1.023 -1.213 0.167 -0.118 -0.186 ...
$ PC2: num 1.2408 0.6077 1.1885 3.0789 0.0797 ...
$ PC3: num -0.776 -1.41 0.977 -1.343 0.987 ...
Didzis Elferts's response only works if your data, x, has no NAs. Here's how you can add the components if your data does have NAs.
library(tidyverse)
components <- as.data.frame(y$x) %>% rownames_to_column("id")
x <- x %>% rownames_to_column("id") %>% left_join(components, by = "id")
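The same alignment can be sketched in base R by matching on row names. The example below injects two missing values into the reproducible dataset for illustration, relies on prcomp() keeping the original row names for the rows it retained, and fills an all-NA block with the scores:

```r
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))
x[c(3, 10), 1] <- NA   # inject two missing values for illustration

y <- prcomp(~ ., data = x, center = TRUE, scale. = TRUE, na.action = na.omit)

# Start with an all-NA block, then fill the rows that prcomp() kept
scores <- matrix(NA_real_, nrow = nrow(x), ncol = 3,
                 dimnames = list(rownames(x), colnames(y$x)[1:3]))
scores[rownames(y$x), ] <- y$x[, 1:3]

x.new <- cbind(x, scores)
```

Rows that were dropped by na.omit keep NA in the PC columns, so x.new has the same number of rows as x.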

extract the correlation matrix for the factors in the psych package's fa.poly function

I'm working from caracal's great example conducting a factor analysis on dichotomous data and I'm now struggling to extract the factors from the object produced by the psych package's fa.poly function.
Can anyone help me extract the factors from the fa.poly object (and look at the correlation)?
Please see caracal's example for the working example.
In this example you create an object with:
faPCdirect <- fa.poly(XdiNum, nfactors=2, rotate="varimax") # polychoric FA
so somewhere in faPCdirect there is what you want. I recommend using str() to inspect the structure of faPCdirect:
> str(faPCdirect)
List of 5
$ fa :List of 34
..$ residual : num [1:6, 1:6] 4.79e-01 7.78e-02 -2.97e-0...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
.. .. ..$ : chr [1:6] "X1" "X2" "X3" "X4" ...
..$ dof : num 4
..$ fit
...skip stuff....
..$ BIC : num 4.11
..$ r.scores : num [1:2, 1:2] 1 0.0508 0.0508 1
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "MR2" "MR1"
.. .. ..$ : chr [1:2] "MR2" "MR1"
..$ R2 : Named num [1:2] 0.709 0.989
.. ..- attr(*, "names")= chr [1:2] "MR2" "MR1"
..$ valid : num [1:2] 0.819 0.987
..$ score.cor : num [1:2, 1:2] 1 0.212 0.212 1
So this says that this object is a list of five, with the first element called fa and that contains an element called score.cor that is a 2x2 matrix. I think what you want is the off diagonal.
> faPCdirect$fa$score.cor
[,1] [,2]
[1,] 1.0000000 0.2117457
[2,] 0.2117457 1.0000000
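Pulling out the off-diagonal value can be sketched as follows. The matrix below is a stand-in built from the values printed above, so the example runs without the psych package or the original data:

```r
# Stand-in for faPCdirect$fa$score.cor, using the values printed above
score.cor <- matrix(c(1, 0.2117457, 0.2117457, 1), nrow = 2)

r12 <- score.cor[1, 2]                       # the single off-diagonal entry
off_diag <- score.cor[upper.tri(score.cor)]  # generalises to more factors
```

With more than two factors, upper.tri() returns each pairwise correlation once.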
