Loop over non-standard variable names in R

I have a dataframe (df) with variables that look similar to vector-variables:
myvariable[1], myvariable[2] , myvariable[3] , etc.
However, if I want to refer to them, R automatically puts backticks around them:
df$`myvariable[1]`
I want to use those variables within a for-loop, and hence, want to change the number within the brackets automatically. Does anyone know how to do this?
PS: This question is different from other questions insofar as R doesn't see my variables as vector variables but rather as single variables that look the same. Hence, the []-part of my variables is seen as only some kind of string and not as a subsetting operator.
PS2: dput(head(zTT$subjects[, c("myvariable[1]","myvariable[3]","myvariable[4]")],4))
structure(list(`myvariable[1]` = c(2, 4, 2, 9), `myvariable[3]` = c(1,
1, 2, 3), `myvariable[4]` = c(2, 4, 2, 7)), .Names = c("myvariable[1]",
"myvariable[3]", "myvariable[4]"), row.names = c(NA, 4L), class = "data.frame")

As akrun has suggested, you can use [[. The code below uses your own data frame to construct the string which corresponds to the list names.
temp <- structure(list(`myvariable[1]` = c(2, 4, 2, 9),
`myvariable[3]` = c(1, 1,2, 3),
`myvariable[4]` = c(2, 4, 2, 7)),
.Names = c("myvariable[1]", "myvariable[3]",
"myvariable[4]"), row.names = c(NA, 4L),
class = "data.frame")
for (i in c(1, 3, 4)) {
myVar <- paste0("myvariable[", i, "]")
print(temp[[myVar]])
}
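If the bracket numbers are not known in advance, the matching column names can be picked up from the data frame itself instead of being built with paste0. This is only a sketch: the pattern assumes the columns are named exactly myvariable[<digits>], and the data frame is a copy of the example above.

```r
# Copy of the example data from the question
temp <- data.frame(
  `myvariable[1]` = c(2, 4, 2, 9),
  `myvariable[3]` = c(1, 1, 2, 3),
  `myvariable[4]` = c(2, 4, 2, 7),
  check.names = FALSE  # keep the bracketed names as they are
)

# Find every column whose name looks like myvariable[<number>]
cols <- grep("^myvariable\\[[0-9]+\\]$", names(temp), value = TRUE)

# Loop over the names; [[ takes the name as a plain string
col_means <- numeric(0)
for (nm in cols) {
  col_means[nm] <- mean(temp[[nm]])
}
print(col_means)
```

Because [[ treats the name as an ordinary string, no backticks are needed inside the loop.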

Related

Execute ANOVA & Follow-Up Tests on multiple columns

I'm attempting to perform a Hinkin and Tracey content validation on potential scale items and have the following dataset (sample) with 74 unique columns:
cleandata <- structure(list(Condition = c("RS", "AS", "BGPS", "APCS", "OP", "TS"
), alt_energy = c(2, 5, 3, 3, 2, 2), animal_product = c(5, 3,
4, 4, 3, 1), deforest = c(5, 1, 4, 1, 2, 1)), row.names = c(NA,
6L), class = c("tbl_df", "tbl", "data.frame"))
Only one column (Condition) is the "dimension," and I want to see if the mean responses for the remaining 73 columns significantly differ between conditions. Basically, I want to see if the scale item successfully reflects only 1 dimension.
I want to run an anova and Tukey HSD on all the columns at once to get everything in one neat output:
test <- aov(as.matrix(cleandata[, -1]) ~ as.factor(Condition), data = cleandata)
summary(test, effect.size = "both", detailed = TRUE, observed = NULL)
But I'm unable to run the HSD follow-up
TukeyHSD(test)
Getting the following error: Error in model.tables.aov(x, "means") :
'model.tables' is not implemented for multiple responses
Is there any way to loop the HSD so I get one clean output of ANOVA results and follow-up pairwise comparisons?
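One possible approach, sketched here under assumptions, is to fit a separate one-way aov per item column and call TukeyHSD on each fit, collecting everything in a named list. The toy data below is made up for illustration (the six-row sample has only one observation per condition, which leaves no residual degrees of freedom), and the names toy, items and results are hypothetical.

```r
# Made-up data with several observations per condition (illustration only)
toy <- data.frame(
  Condition  = factor(rep(c("RS", "AS", "OP"), each = 4)),
  alt_energy = c(2, 3, 2, 4, 5, 4, 5, 3, 1, 2, 2, 1),
  deforest   = c(5, 4, 5, 3, 1, 2, 1, 2, 3, 3, 4, 2)
)

# Every column except the grouping variable is treated as an item
items <- setdiff(names(toy), "Condition")

# One ANOVA and one Tukey HSD per item, stored under the item name
results <- lapply(setNames(items, items), function(item) {
  fit <- aov(reformulate("Condition", response = item), data = toy)
  list(anova = summary(fit), tukey = TukeyHSD(fit))
})

results$alt_energy$anova
results$alt_energy$tukey
```

Running 73 separate tests this way does not adjust for multiple comparisons across items, so a correction may be worth considering.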

comparing other data frames and selecting rows in for loop in R

I have a few complicated conditions in my algorithm and I can not handle them. I need help.
I have 4 data frames including index numbers and outputs like
u1 = data.frame(index = c(1,3,12,21,24), output = c(1,0,1,0,1))
u2 = data.frame(index = c(2,5,16,1), output = c(0,1,1,0))
u3 = data.frame(index = c(1,5,7,16), output = c(0,1,0,0))
u4 = data.frame(index = c(21,24,8), output = c(0,0,1))
I am trying to write a for(i in 1:4) loop in R (a solution without a for loop is also welcome). In step i, the loop checks the index numbers of the other data frames (for instance, in step i = 1, it checks u2, u3 and u4). If an index number appears at least 2 times in the other data frames, I take it and then check the outputs for that index. If the outputs are the same, I select that index number. Here is an example on the data frames above for i = 1:
Check the indexes of u2, u3, and u4. Index number 5 is repeated 2 times and the outputs are the same (1), so I select it. Index number 1 is repeated 2 times and the outputs are the same, so I select it. Index number 16 is repeated 2 times but the outputs are different, so I do not select it.
i =2;
check the indexes of u1, u3, and u4...
Thanks.
Perhaps this helps: get the 'u1', 'u2', ... objects in a list ('lst1'), then loop over the names of the list. For each name, subset the list without that element (setdiff), rbind the rest, and get the table of index by output. Create a logical vector from the frequency count in the '1' column (output equal to 1 more than once), extract the names based on that, and convert the named list to a two-column data.frame (stack).
lst1 <- mget(ls(pattern = '^u\\d+$'))
stack(lapply(setNames(names(lst1), names(lst1)),
function(u) names(which(table(do.call(rbind, lst1[setdiff(names(lst1),
u)]))[,'1'] > 1))))[2:1]
ind values
1 u1 5
2 u4 5
data
u1 <- structure(list(index = c(1, 3, 12, 21, 24), output = c(1, 0,
1, 0, 1)), class = "data.frame", row.names = c(NA, -5L))
u2 <- structure(list(index = c(2, 5, 16, 1), output = c(0, 1, 1, 0)), class = "data.frame", row.names = c(NA,
-4L))
u3 <- structure(list(index = c(1, 5, 7, 16), output = c(0, 1, 0, 0)), class = "data.frame", row.names = c(NA,
-4L))
u4 <- structure(list(index = c(21, 24, 8), output = c(0, 0, 1)), class = "data.frame", row.names = c(NA,
-3L))
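The rule as worded in the question (an index that appears at least twice in the other data frames, with all of its outputs equal, whether 0 or 1) could also be sketched in base R as follows. The helper name select_for and the list name frames are made up for this example.

```r
u1 <- data.frame(index = c(1, 3, 12, 21, 24), output = c(1, 0, 1, 0, 1))
u2 <- data.frame(index = c(2, 5, 16, 1), output = c(0, 1, 1, 0))
u3 <- data.frame(index = c(1, 5, 7, 16), output = c(0, 1, 0, 0))
u4 <- data.frame(index = c(21, 24, 8), output = c(0, 0, 1))
frames <- list(u1 = u1, u2 = u2, u3 = u3, u4 = u4)

# Hypothetical helper: indexes selected when `name` is the frame left out
select_for <- function(name) {
  # Stack all frames except the current one
  others <- do.call(rbind, frames[setdiff(names(frames), name)])
  # Group the outputs by index number
  by_index <- split(others$output, others$index)
  # Keep an index if it occurs at least twice and all outputs agree
  keep <- vapply(by_index,
                 function(o) length(o) >= 2 && length(unique(o)) == 1,
                 logical(1))
  as.numeric(names(by_index)[keep])
}

selected <- lapply(setNames(names(frames), names(frames)), select_for)
selected$u1  # indexes 1 and 5, as in the worked example for i = 1
```

Unlike the table-based answer, this version also keeps indexes whose repeated outputs are all 0, which is why index 1 shows up for u1.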

Correlation between variables under the for loop

I have an issue that is shown below. I tried to solve it but was not successful. I have a dataframe df1. I need to make a table of correlations between the variables within a for loop, because I do not want the code to be long and complicated.
df1 <- structure(list(a = c(1, 2, 3, 4, 5), b = c(3, 5, 7, 4, 3), c = c(3,
6, 8, 1, 2), d = c(5, 3, 1, 3, 5)), class = "data.frame", row.names =
c(NA, -5L))
I tried with the below code using 2 for loops
fv <- as.data.frame(combn(names(df1),2,paste, collapse="&"))
colnames(fv) <- "ColA"
fv$ColB <- sapply(strsplit(fv$ColA,"\\&"),'[',1)
fv$ColC <- sapply(strsplit(fv$ColA,"\\&"),'[',2)
asd <- list()
for (i in fv$ColB) {
for (j in fv$ColC) {
asd[i,j] <- as.data.frame(cor(df1[,i],df1[,j]))}}
May I know what I am doing wrong?
We can apply cor directly on the data.frame and convert to 'long' format with melt. As the values in the lower triangular part mirror those in the upper triangular part, either one of them can be set to NA before doing the melt
library(reshape2)
# correlation matrix of all columns
out <- cor(df1)
# blank out the diagonal and the mirrored lower triangle
out[lower.tri(out, diag = TRUE)] <- NA
melt(out, na.rm = TRUE)
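For comparison, the same table of pairwise correlations can be sketched in base R without reshape2, staying closer to the loop idea in the question. The names pairs and res are made up for this example.

```r
df1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(3, 5, 7, 4, 3),
                  c = c(3, 6, 8, 1, 2), d = c(5, 3, 1, 3, 5))

# Each column of `pairs` holds one unordered pair of variable names
pairs <- combn(names(df1), 2)

# One correlation per pair, collected into a long data frame
res <- data.frame(
  var1 = pairs[1, ],
  var2 = pairs[2, ],
  cor  = apply(pairs, 2, function(p) cor(df1[[p[1]]], df1[[p[2]]]))
)
res
```

With four variables this gives six rows, one per pair, so no nested loop or list indexing is needed.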

Rename factors in a spineplot with R

Is it possible to rename factors in a spineplot? The names of my factors are too long, so they overlap.
Thanks for your advice!
Reading the help for spineplot, it is clear that you can pass the parameters yaxlabels and xaxlabels to control the vectors for annotation of the axes.
One useful function is abbreviate which will shorten character strings.
Combining this information with the spineplot example gives:
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
levels = c(1, 2, 3),
labels = c("none", "some", "marked"))
spineplot(improved ~ treatment, yaxlabels=abbreviate(levels(improved), 2))
Not all of the plot functions in R have this type of parameter. For a more general solution, it might be necessary to rename the factors before passing to a plot function. You can access and modify factor names using the levels function:
levels(treatment) <- abbreviate(levels(treatment), 5)
plot(improved ~ treatment)
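If the automatic abbreviations are hard to read, the levels can also be replaced with hand-picked short labels. A small sketch, where the lookup vector short and its labels are made up for this example:

```r
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))

# Hypothetical lookup table: old level name -> short label
short <- c(placebo = "plc", treated = "trt")

# Look up each existing level and assign the short labels in the same order
levels(treatment) <- unname(short[levels(treatment)])
levels(treatment)
table(treatment)
```

The underlying data is unchanged; only the labels used on the axes differ.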

Rearrange data for ANOVA

I haven't quite got my head around R and how to rearrange data. I have an old SPSS data file that needs rearranging so I can conduct an ANOVA in R.
My current data file has this format:
ONE <- matrix(c(1, 2, 777.75, 609.30, 700.50, 623.45, 701.50, 629.95, 820.06, 651.95,"nofear","nofear"), nr=2,dimnames=list(c("1", "2"), c("SUBJECT","AAYY", "BBYY", "AAZZ", "BBZZ", "XX")))
And I need to rearrange it to this:
TWO <- matrix(c(1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 777.75, 701.5, 700.5, 820.06, 609.3, 629.95, 623.95, 651.95), nr=8, dimnames=list(c("1", "1", "1", "1", "2", "2", "2", "2"), c("SUBJECT","AA", "ZZ", "XX", "RT")))
I am sure that there is an easy way of doing it, rather than hand coding. Thanks for the consideration.
This should do it. You can tweak it a bit, but this is the idea:
library(reshape)
THREE <- melt(as.data.frame(ONE),id=c("SUBJECT","XX"))
THREE$AA <- grepl("AA",THREE$variable)
THREE$ZZ <- grepl("ZZ",THREE$variable)
THREE$variable <- NULL
# cleanup
THREE$XX <- as.factor(THREE$XX)
THREE$AA <- as.numeric(THREE$AA)
THREE$ZZ <- as.numeric(THREE$ZZ)
The reshape package and reshape() both help with this kind of thing, but in this simple case, where you have to generate the variables anyway, hand coding is pretty easy. Just take advantage of automatic recycling in R.
TWO <- data.frame(SUBJECT = rep(1:2,each = 4),
AA = rep(1:0, each = 2),
ZZ = 0:1,
XX = 1,
# columns reordered so RT lines up with the AA/ZZ coding above
RT = as.numeric(t(ONE[, c("AAYY", "AAZZ", "BBYY", "BBZZ")])))
That gives the TWO you asked for but it doesn't generalize to a larger ONE easily. I think this makes more sense
n <- nrow(ONE)
TWO <- data.frame(SUBJECT = rep(ONE$SUBJECT, 4),
AB = rep(1:0, each = n),
YZ = rep(0:1, each = 2*n),
fear = ONE$XX,
RT = unlist(ONE[,2:5]))
This latter one gives more representative variable names, and handles the likely case that your data is actually much bigger, with XX (fear) varying and more subjects. Also, given that you're reading it in from an SPSS data file, ONE is actually a data frame with numeric columns and character columns stored as factors. The reshaping was only this part of the code...
TWO <- data.frame(SUBJECT = rep(ONE$SUBJECT, 4),
fear = ONE$XX,
RT = unlist(ONE[,2:5]))
You could add in other variables afterward.
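To try the data frame version end to end, ONE can be rebuilt as a data frame by hand. This is a sketch that assumes the SPSS import would give numeric RT columns and a character or factor XX column; the values are the ones from the question.

```r
# ONE rebuilt as a data frame (assumed shape of the SPSS import)
ONE <- data.frame(
  SUBJECT = 1:2,
  AAYY = c(777.75, 609.30), BBYY = c(700.50, 623.45),
  AAZZ = c(701.50, 629.95), BBZZ = c(820.06, 651.95),
  XX = c("nofear", "nofear")
)

n <- nrow(ONE)
TWO <- data.frame(
  SUBJECT = rep(ONE$SUBJECT, 4),     # subjects repeat once per RT column
  AB = rep(1:0, each = n),           # 1 for AA columns, 0 for BB columns
  YZ = rep(0:1, each = 2 * n),       # 0 for YY columns, 1 for ZZ columns
  fear = ONE$XX,                     # recycled to every row
  RT = unlist(ONE[, 2:5])            # stacks AAYY, BBYY, AAZZ, BBZZ in order
)
TWO
```

Shorter vectors are recycled to the full eight rows, which is why AB and fear do not need an explicit rep over all columns.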
