I am trying to create (vector) objects in R without specifying their names in advance. For example, if I have a list of length 3, I want to create the objects p1 to p3, and if I have a list of length 10, the objects p1 to p10 have to be created. The length should be arbitrary and not determined a priori.
Thanks for your help!
I guess the proper way of doing that is to start from an empty list, p = list(), and then assign to p[[i]] with i as big as you wish, without having specified any length beforehand.
Then, once your list is filled, you can name its elements: names(p) = paste0("p", seq_along(p))
Finally, if you want all the p1, p2, ... variables to be directly accessible, you can add attach(p)
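A minimal sketch of the list-based approach described above. The object `inputs` is a hypothetical stand-in for whatever data you start from; it is not part of the original question.

```r
# Hypothetical input: a list of arbitrary length
inputs <- list(rnorm(10), rnorm(20), 1:3)

# Start from an empty list and fill it without declaring its length
p <- list()
for (i in seq_along(inputs)) {
  p[[i]] <- inputs[[i]]
}

# Name the elements p1, p2, ..., pn
names(p) <- paste0("p", seq_along(p))

# Access elements by name without creating global variables
p$p3
p[["p2"]]
with(p, length(p1))
```

Using with(p, ...) instead of attach(p) keeps the names local to one expression, which avoids masking other objects in the workspace.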
This is kind of a hack, but you can do the following:
short_list <- list(rnorm(10),rnorm(20),1:3)
long_list <- c(short_list,short_list )
paste0("p",seq_along(short_list))
mapply(assign, paste0("p",seq_along(short_list)), short_list, MoreArgs = list(envir = .GlobalEnv))
result:
> p3
[1] 1 2 3
you can do the same with long_list
I don't see a statistical model for which you would need this. Better to start working with lists like short_list or with data.frames directly.
PS: If you just want to use it for glm, you probably want to learn about formulas in R.
glm(y~., data=your_data) takes all columns in your data frame that are not named y as regressors. Maybe this helps.
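As a small illustration of the `y ~ .` formula, here is a sketch on simulated data. The column names p1 to p3 and the data itself are made up for the example.

```r
set.seed(1)

# Simulated data frame: y is the response, every other column is a regressor
your_data <- data.frame(
  y  = rnorm(20),
  p1 = rnorm(20),
  p2 = rnorm(20),
  p3 = rnorm(20)
)

# The dot expands to all columns of your_data except y
fit <- glm(y ~ ., data = your_data)
names(coef(fit))
```

Adding or removing columns in your_data changes the model without touching the formula.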
assign (and maybe also attach) are often a sign that you have not yet arrived at an "R-ish" version of the code.
Considering that you need this for modeling: if your $p_1, \dots, p_n$ are of the same type, you can put them into a matrix (inside a column of a data.frame; for modeling they need to be of the same length anyway):
df$matrix <- p.matrix
If you directly create the data.frame, you need to make sure the matrix is not expanded to data.frame columns:
df <- data.frame (matrix = I (matrix), ...)
Then glm (y ~ matrix, ...) will work.
For examples of this technique see e.g. packages pls or hyperSpec or the pls paper in the Journal of Statistical Software.
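A minimal sketch of the matrix-in-a-column technique, using simulated data. The names p.matrix and df follow the answer above; the dimensions and values are assumptions chosen only for illustration.

```r
set.seed(1)

# Three regressors of equal length stored as the columns of one matrix
p.matrix <- matrix(rnorm(60), nrow = 20, ncol = 3)
y <- rnorm(20)

# I() keeps the matrix as a single data.frame column instead of expanding it
df <- data.frame(y = y, matrix = I(p.matrix))
ncol(df)

# The whole matrix enters the model as one term with one coefficient per column
fit <- glm(y ~ matrix, data = df)
length(coef(fit))
```

The number of regressors is then determined by the number of matrix columns, so the formula stays the same for any n.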
I am venturing into the world of creating an R S3 class for the first time. My basic object is just going to be a data frame or tibble with certain columns identified by the user, so that my class-specific functions know how to find what they need. The constructor is also going to add a few columns computed from the others, and impose an ordering based on particular columns and parameter values.
I am guessing that there is canonical code for this, but I am not sure where to find it. My thought was just to have a series of attributes that each contain the column name(s) of the appropriate column(s), but it would be nice if I could supply the alternative of names or numbers. I don't need fancy name creation features because I am starting with data frames that should already have them, but I do need to be able to access each column in my object by either its actual name or its attribute name.
I am not at all confident that I have the basic idea of how to do this down properly. For example, I am unsure if there is any advantage to having each column name or group of names be its own attribute, versus having one attribute object consisting of a list of named char vectors of column names. I am a little fuzzy on the structure R imposes on multiple attributes, actually. But I am hoping to make this a package, so I want to do it right.
Anyone have a similar class handy that they would recommend as a model? Or a pointer to a well-implemented base class of similar structure would also do the job (if implemented exclusively in R code).
Here is my basic idea of how I am doing the constructor:
distr <- function(X, inc, comp, AdultEq="sqrt", ..., major=NULL, minor=NULL,
wt){
attr(as.tbl(X), "class") <- "distr"
attr(X, "income") <- inc
attr(X, "incomeComponents") <- comp
attr(X, "adultEquiv") <- AdultEq
attr(X, "majorGroup") <- major
attr(X, "minorGroup") <- minor
attr(X, "weight") <- wt
# etc.
# adjust income and components for household composition
X <- mutate(X, adjInc = X[, income] / if(is.function(adultEquiv)) {
adultEquiv(...)} else {equivLst[[adultEquiv]](...)},
adjIncComp <- X[, incomeComponents] / if(is.function(adultEquiv)) {
adultEquiv(...)} else {equivLst[[adultEquiv]](...)})
X <- arrange(X, c(majorGroup, adjInc))
X <- group_by(majorGroup)
X <- mutate(X, cdf <- cumsum(weight/sum(weight)) )
# etc.
}
Then I will have methods for weighted sums and quantiles, conditional means, summary statistics, a print method, and so forth.
I tried to create a new tag for R's s3-classes, but I guess I don't have enough rep yet.
With what you're describing, you could consider creating a new S4 class. S4 is a stricter version of S3, but it makes sense with your data structure. So, instead of attributes, you could use slots. This means you can verify the object [and each column], but it also means you'd give up data frame properties for something that's more like a general list. You could then set the generics (show/print, plot, summary, etc) for that class. Hadley's book is okay on S4 classes. I also found the R manual to be very useful.
http://adv-r.had.co.nz/S4.html
https://cran.r-project.org/doc/manuals/r-release/R-ints.html
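To make the slot idea concrete, here is a minimal sketch of such an S4 class. The class name Distr, the slot names, and the validity rule are all hypothetical choices for illustration, not the asker's final design.

```r
library(methods)

# Hypothetical class: the data plus the names of the columns that matter
setClass(
  "Distr",
  slots = c(data = "data.frame", income = "character", weight = "character"),
  validity = function(object) {
    needed <- c(object@income, object@weight)
    absent <- setdiff(needed, names(object@data))
    if (length(absent) > 0) {
      paste("columns not found in data:", paste(absent, collapse = ", "))
    } else {
      TRUE
    }
  }
)

# A show method controls how objects of the class are printed
setMethod("show", "Distr", function(object) {
  cat("<Distr>", nrow(object@data), "rows; income column:", object@income, "\n")
})

d <- new("Distr",
         data = data.frame(inc = c(10, 20, 30), w = c(1, 1, 2)),
         income = "inc", weight = "w")
d
```

Because the validity function runs when the object is created, a misspelled column name fails at construction time rather than inside a later method.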
I am trying to set up a function that will run lm() on a model derived from a user defined matrix in R.
The modelMatrix would be set up by the user, but would be expected to be structured as follows:
  source target
1 "PVC"  "AA"
2 "Aro"  "AA"
3 "PVC"  "Aro"
This matrix serves to allow the user to define the dependent(target) and independent(source) variables in the lm and refers to the column names of another valuesMatrix:
              PVC         Aro           AA
[1,] -2.677875504  0.76141471  0.006114699
[2,]  0.330537781 -0.18462039 -0.265710261
[3,]  0.609826160 -0.62470233  0.715474554
I need to take this modelMatrix and generate a relevant number of lms. Eg in this case:
lm(AA ~ PVC + Aro)
and
lm(Aro ~ PVC)
I tried this, but it would seem that as the user changes the model matrix, it becomes erroneous and I need to explicitly specify each independent variable according to the modelMatrix.
```
lm(as.formula(paste(unique(modelMatrix[,"target"])[1], "~ .",sep="")),
   data=data.frame(valuesMatrix))
```
Do I need to set up 2 loops (1 nested) to fetch the source and target strings and paste them into the formula, or am I overlooking some detail? Very confused.
Ideally I would like the user to be able to change the modelMatrix to include 1 or many lms and one or many independent variables for each lm. I would really appreciate your assistance as I am really hitting a wall here.
Thanks.
For your specific example this code should work -
source <- c("PVC","Aro","PVC")
target <- c("AA","AA","Aro")
modelMatrix <- data.frame(source = source, target = target)
valuesMatrix <- as.matrix(rbind(c(-2.677875504,0.76141471,0.006114699), c(0.330537781,-0.18462039,-0.265710261),
c(0.609826160,-0.62470233,0.715474554)))
colnames(valuesMatrix) <- c("PVC","Aro","AA")
unique.target <- as.character(unique(modelMatrix$target))
lm.models <- lapply(unique.target, function(x) {lm(as.formula(paste(x,"~ .", sep = "")),
data = data.frame(valuesMatrix[,colnames(valuesMatrix) %in%
c(x,as.character(modelMatrix$source[which(modelMatrix$target==x)]))]))})
You could use a for loop instead, but for loops can be too expensive, especially if your modelMatrix gets really large. It might not look as appealing, but the lapply function is well suited to this type of job. The only other trick was to keep only the columns required to perform each lm.
You can also pull out the results of each lm using:
lm.models[[1]] and lm.models[[2]]
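An alternative sketch that builds each formula explicitly with base R's reformulate instead of subsetting columns and using `~ .`. The data are the ones from the question; the helper names sources_by_target, formulas, and models are made up for this example.

```r
modelMatrix <- data.frame(
  source = c("PVC", "Aro", "PVC"),
  target = c("AA", "AA", "Aro"),
  stringsAsFactors = FALSE
)

valuesMatrix <- rbind(
  c(-2.677875504,  0.76141471,  0.006114699),
  c( 0.330537781, -0.18462039, -0.265710261),
  c( 0.609826160, -0.62470233,  0.715474554)
)
colnames(valuesMatrix) <- c("PVC", "Aro", "AA")

# Group the source variables by the target they belong to
sources_by_target <- split(modelMatrix$source, modelMatrix$target)

# One formula per target, e.g. AA ~ PVC + Aro
formulas <- lapply(names(sources_by_target), function(tg) {
  reformulate(sources_by_target[[tg]], response = tg)
})
names(formulas) <- names(sources_by_target)

# Fit one lm per formula; the result is a list named by target
models <- lapply(formulas, lm, data = as.data.frame(valuesMatrix))
formulas
```

With this layout the fitted models can be retrieved by target name, for example models$AA.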
I'm trying to write a function I can apply to a string vector or list instead of writing a loop. My goal is to run a regression for different endogenous variables and save the resulting tables. Since experienced R users tell us we should learn the apply functions, I want to give it a try. Here is my attempt:
Broken Example:
library(ExtremeBounds)
Data <- data.frame(var1=rbinom(30,1,0.2),var2=rbinom(30,1,0.2),var3=rnorm(30),var4=rnorm(30),var5=rnorm(30),var6=rnorm(30))
spec1 <- list(y=c("var1"),freevars=("var3"),doubtvars=c("var4","var5"))
spec2 <- list(y=c("var2"),freevars=("var4"),doubtvars=c("var3","var5","var6"))
specs <- c("spec1","spec2")
myfunction <- function(x){
eba <- eba(data=Data, y=x$y,
free=x$freevars,
doubtful=x$doubtvars,
reg.fun=glm, k=1, vif=7, draws=50, se.fun = se.robust, weights = "lri", family = binomial(logit))
output <- eba$bounds
output <- output[,-(3:7)]
}
lapply(specs,myfunction)
Which gives me an error that makes me guess that R does not understand when x should be "spec1" or "spec2". Also, I don't quite understand what lapply would try to collect here. Could you provide me with some best practice/hints how to communicate such things to R?
error: Error in x$y : $ operator is invalid for atomic vectors
Working example:
Here is a working example for spec1 without using apply that shows what I'm trying to do. I want to loop this example through 7 specs but I'm trying to get away from loops. The output does not have to be saved as a csv, a list of all outputs or any other collection would be great!
eba <- eba(data=Data, y=spec1$y,
free=spec1$freevars,
doubtful=spec1$doubtvars,
reg.fun=glm, k=1, vif=7, draws=50, se.fun = se.robust, weights = "lri", family = binomial(logit))
output <- eba$bounds
output <- output[,-(3:7)]
write.csv(output, "./Results/eba_pmr.csv")
Following the comments of @user20650, the solution is quite simple:
In the lapply command, use lapply(mget(specs),myfunction). mget looks up the objects whose names are stored in specs and returns them as a named list, so myfunction receives the lists themselves instead of their names.
Alternatively, one could define specs as a list: specs <- list(spec1,spec2), but that has the downside that the lapply command will return an unnamed list in which the different specifications are only numbered. The first version keeps the names of the specifications (spec1 and spec2), which makes working with the resulting list much easier.
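A small sketch of the mget idiom that runs without ExtremeBounds. The function describe_spec is a hypothetical stand-in for myfunction; it only pastes the specification into a string so the result is easy to inspect.

```r
spec1 <- list(y = "var1", freevars = "var3", doubtvars = c("var4", "var5"))
spec2 <- list(y = "var2", freevars = "var4", doubtvars = c("var3", "var5", "var6"))
specs <- c("spec1", "spec2")

# Hypothetical stand-in for myfunction: x is a list, so x$y works
describe_spec <- function(x) {
  paste(x$y, "~", paste(c(x$freevars, x$doubtvars), collapse = " + "))
}

# mget turns the character vector of names into a named list of objects
res <- lapply(mget(specs, envir = environment()), describe_spec)
res
```

The explicit envir = environment() is an assumption added here so the lookup also works when the code runs inside a function; at the top level plain mget(specs) behaves the same way.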
I've been working on a project for a little bit for a homework assignment and I've been stuck on a logistical problem for a while now.
What I have at the moment is a list that returns 10000 values in the format:
[[10000]]
X-squared
0.1867083
(This is the 10000th value of the list)
What I really would like is to just have the chi-squared value alone so I can do things like create a histogram of the values.
Is there any way I can do this? I'm fine with repeating the test from the start if necessary.
My current code is:
nsims = 10000
for (i in 1:nsims) {cancer.cells <- c(rep("M",24),rep("B",13))
malig[i] <- sum(sample(cancer.cells,21)=="M")}
benign = 21 - malig
rbenign = 13 - benign
rmalig = 24 - malig
for (i in 1:nsims) {test = cbind(c(rbenign[i],benign[i]),c(rmalig[i],malig[i]))
cancerchi[i] = chisq.test(test,correct=FALSE) }
It gives me all I need, I just cannot perform follow-up analysis on it such as creating a histogram.
Thanks for taking the time to read this!
I'll provide an answer at the suggestion of @Dr. Mike.
hist requires a numeric vector as input. The reason that hist(cancerchi) will not work is that cancerchi is a list, not a numeric vector.
There are several ways to convert cancerchi from a list into a format that hist can work with. Here are 3 ways, the simplest being unlist:
hist(unlist(cancerchi))
Note that if you do not reassign cancerchi it will still be a list and cannot be passed directly to hist.
# i.e
class(cancerchi)
hist(cancerchi) # will still give you an error
If you reassign, it can be another type of object:
(class(cancerchi2 <- unlist(cancerchi)))
(class(cancerchi3 <- as.data.frame(unlist(cancerchi))))
# using the ldply function in the plyr package
library(plyr)
(class(cancerchi4 <- ldply(cancerchi)))
These new objects can be passed to hist (for the data frames, select the column):
hist(cancerchi2)
hist(cancerchi3[,1]) # specify column because cancerchi3 is a data frame, not a vector
hist(cancerchi4[,1]) # specify column because cancerchi4 is a data frame, not a vector
A little extra information: other useful commands for looking at your objects include str and attributes.
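If repeating the simulation is acceptable, the statistic can also be stored as a plain number from the start. The following is a sketch under that assumption; the helper chi_stat is a made-up name, and nsims is reduced to 1000 only to keep the example quick.

```r
set.seed(42)
nsims <- 1000
cancer.cells <- c(rep("M", 24), rep("B", 13))

# Number of malignant cells in each simulated sample of 21
malig <- replicate(nsims, sum(sample(cancer.cells, 21) == "M"))
benign <- 21 - malig

# Return only the chi-squared statistic as an unnamed number
chi_stat <- function(m, b) {
  tab <- cbind(c(13 - b, b), c(24 - m, m))
  unname(chisq.test(tab, correct = FALSE)$statistic)
}

# mapply simplifies the results to a numeric vector, which hist accepts
cancerchi <- mapply(chi_stat, malig, benign)
hist(cancerchi)
```

Extracting $statistic inside the loop avoids building a list in the first place, so no conversion is needed afterwards.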
I'm using the library poLCA. To use the main command of the library one has to create a formula as follows:
f <- cbind(V1,V2,V3)~1
After this a command is invoked:
poLCA(f,data0,...)
V1, V2, V3 are the names of variables in the dataset data0. I'm running a simulation and I need to change the formula several times. Sometimes it has 3 variables, sometimes 4, sometimes more.
If I try something like:
f <- cbind(get(names(data0)[1]),get(names(data0)[2]),get(names(data0)[3]))~1
it works fine. But then I have to know in advance how many variables I will use. I would like to define an arbitrary vector
vars0 <- c(1,5,17,21)
and then create the formula as follows
f<- cbind(get(names(data0)[var0]))
Unfortunately I get an error. I suspect the answer may involve some form of apply, but I still don't understand very well how these functions work. Thanks in advance for any help.
Using data from the examples in ?poLCA this (possibly hackish) idiom seems to work:
library(poLCA)
vec <- c(1,3,4)
M4 <- poLCA(do.call(cbind,values[,vec])~1,values,nclass = 1)
Edit
As Hadley points out in the comments, we're making this a bit more complicated than we need. In this case values is a data frame, not a matrix, so this:
M1 <- poLCA(values[,c(1,2,4)]~1,values,nclass = 1)
generates an error, but this:
M1 <- poLCA(as.matrix(values[,c(1,2,4)])~1,values,nclass = 1)
works fine. So you can just subset the columns as long as you wrap it in as.matrix.
@DWin mentioned building the formula with paste and as.formula. I thought I'd show you what that would look like using the election dataset.
library("poLCA")
data(election)
vec <- c(1,3,4)
f <- as.formula(paste("cbind(",paste(names(election)[vec],collapse=","),")~1",sep=""))
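The same idea can be wrapped in a small helper so the column selection can change on every simulation run. This is a sketch: make_cbind_formula and the toy data0 are hypothetical, and the example does not call poLCA itself.

```r
# Build cbind(V1, V3, ...) ~ 1 from column positions or column names
make_cbind_formula <- function(data, cols) {
  vars <- if (is.numeric(cols)) names(data)[cols] else cols
  as.formula(
    paste0("cbind(", paste(vars, collapse = ", "), ") ~ 1"),
    env = parent.frame()
  )
}

# Toy data frame standing in for data0
data0 <- data.frame(V1 = 1:3, V2 = 1:3, V3 = 1:3, V4 = 1:3)

vars0 <- c(1, 3, 4)
f <- make_cbind_formula(data0, vars0)
f
```

The resulting f could then be passed as the first argument wherever the hand-written cbind(V1,V2,V3)~1 formula was used.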