Convert all variables into ordered factors - r

I am using the semTools package to carry out EFA using categorical data. The efaUnrotate() function requires variables as ordered factors.
I am trying to convert all of my variables, which are already factors, into ordered factors using a simple line of code, but it does not seem to work. Can anyone explain why?
My data:
test <- structure(list(fp_weightloss = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("0", "1"), class = "factor"), fp_gripstrength = structure(c(1L,
2L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor"),
fp_walktime = structure(c(2L, 1L, 2L, 2L, 1L, 1L), .Label = c("0",
"1"), class = "factor"), fp_metmins = structure(c(2L, 1L,
1L, 1L, 2L, 1L), .Label = c("0", "1"), class = "factor")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
My code:
test_ord <- as.data.frame(sapply(test, as.ordered))
sapply(test_ord, class)
Results in no change:
fp_weightloss fp_gripstrength fp_walktime fp_metmins
"factor" "factor" "factor" "factor"
When I would expect:
class(as.ordered(test$fp_weightloss))
[1] "ordered" "factor"

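The problem is sapply: best avoid it entirely, since its implicit conversions often invisibly mess with data, and they do here. sapply simplifies the list of ordered factors into a character matrix, which discards the ordering, and as.data.frame then turns each column back into a plain factor (or into a character vector when stringsAsFactors = FALSE, the default since R 4.0.0). Use lapply instead, which returns a list and leaves each column's class intact: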
test_ord <- as.data.frame(lapply(test, as.ordered))
In general I prefer using vapply since it handles non-list return values, but getting vapply to work with S3 classes doesn’t seem possible.
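A minimal sketch of the difference, using a small stand-in data frame rather than the question's full data (the values below are made up for illustration):

```r
# Stand-in data: two unordered factor columns (values are illustrative only).
test <- data.frame(
  fp_weightloss   = factor(c("0", "0", "1")),
  fp_gripstrength = factor(c("0", "1", "0"))
)

# sapply() simplifies its result into a matrix, so the columns are no
# longer factors by the time as.data.frame() sees them.
simplified <- sapply(test, as.ordered)
is.matrix(simplified)
is.factor(simplified)

# lapply() returns a list, so each column keeps its "ordered" class.
test_ord <- as.data.frame(lapply(test, as.ordered))
sapply(test_ord, is.ordered)

# Assigning into test[] replaces the columns but keeps the container
# (useful if the original object is a tibble).
test[] <- lapply(test, as.ordered)
```

The `test[] <-` form is a design choice: it avoids the round trip through as.data.frame, so the object keeps whatever class it had before.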

Related

Arules in R - values that I exclude keep returning

I am applying the apriori algorithm in R with the database structured as follows (shown with dput()):
structure(list(Firm.s.global.reorganization = structure(c(1L,
2L, 1L, 2L, 2L), .Label = c("no", "yes"), class = "factor"),
Delivery.time = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), Automation.of.production.process = structure(c(2L,
1L, 2L, 1L, 1L), .Label = c("no", "yes"), class = "factor"),
Poor.quality.of.offshored.production = structure(c(1L, 1L,
1L, 1L, 1L), .Label = c("no", "yes"), class = "factor"),
Made.in.effect = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), Proximity.to.customers = structure(c(1L,
1L, 1L, 1L, 1L), .Label = c("no", "yes"), class = "factor")), row.names = c(NA,
5L), class = "data.frame")
I only want rules whose items all have the value "yes", so I use the following code:
rules7 <- apriori(data4, parameter = list(support = 0.05,confidence = 0.5, maxlen=5), appearance=list(rhs=c("Firm.s.global.reorganization=yes"),
lhs=c("Delivery.time=yes",
"Automation.of.production.process=yes",
"Poor.quality.of.offshored.production=yes",
"Made.in.effect=yes",
"Proximity.to.customers=yes",
"Implementation.of.strategies.based.on.product.process.innovation=yes",
"Untapped.production.capacity=yes",
"Know.how.in.the.home.country=yes",
"Change.in.total.costs.of.sourcing=yes",
"Logistics.costs=yes",
"Need.for.greater.organizational.flexibility=yes",
"Economic.crisis=yes",
"Improve.customer.service=yes",
"Labour.costs..gap.reduction=yes",
"Government.support.to.relocation=yes",
"Proximity.to.suppliers=yes",
"Loyalty.to.the.home.country=yes"),default="lhs"))
But the results I keep receiving include:
lhs rhs support confidence coverage lift count
[1] {Made.in.effect=no,
Untapped.production.capacity=no,
Economic.crisis=yes} => {Firm.s.global.reorganization=yes} 0.02521008 1.0000000 0.02521008 3.838710 6
even though I explicitly used "Made.in.effect=yes" in my code to avoid the "no's".
How can I make sure I only receive "yes" results on both lhs and rhs?
Thanks!
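I have since fixed it.
In case someone struggles with this in the future: with default="lhs", every item that is not named in the appearance list (including the "=no" items) may still appear on the lhs.
Change the default so that only the listed items are used:
default="none"))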
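As a hedged sketch of the fix, the lhs labels can also be generated from the column names instead of being typed out. The data frame below is a made-up stand-in for data4, and the names rhs_item and lhs_items are introduced here only for illustration; the apriori call is skipped when arules is not installed.

```r
# Stand-in for data4: every column is a factor with levels "no"/"yes"
# (values are illustrative only).
data4 <- data.frame(
  Firm.s.global.reorganization = factor(c("no", "yes", "no", "yes", "yes"),
                                        levels = c("no", "yes")),
  Delivery.time   = factor(c("yes", "yes", "no", "yes", "no"),
                           levels = c("no", "yes")),
  Economic.crisis = factor(c("no", "yes", "no", "yes", "yes"),
                           levels = c("no", "yes"))
)

rhs_item <- "Firm.s.global.reorganization=yes"

# Build "<column>=yes" labels for every column except the rhs column.
lhs_items <- paste0(setdiff(names(data4), "Firm.s.global.reorganization"), "=yes")

# Only run the mining step when arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  rules <- arules::apriori(
    data4,
    parameter  = list(support = 0.05, confidence = 0.5, maxlen = 5),
    # default = "none": items not listed in lhs/rhs cannot appear in any rule.
    appearance = list(rhs = rhs_item, lhs = lhs_items, default = "none"),
    control    = list(verbose = FALSE)
  )
  arules::inspect(rules)
}
```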

R How to tell a (t-test) function the needed column in an indirect way?

This is the data with the two columns 'weight' and 'group':
genderweight <- structure(list(weight = c(95.0626365041014, 65.9189881179415,
64.1289176345525, 66.1688823533661, 81.6245374434498, 85.1845386418439,
81.0348729928744, 92.161156464954, 86.3842380662202, 64.8582493776221,
62.3256566394621, 85.0980797936812, 80.0399859200671, 83.3698935236987,
62.8710960018134, 77.0097819307823, 62.9067362884316, 62.8505200797307,
62.2199243419118, 86.2430806667288, 83.8522826935738, 59.3086045947413,
82.578094058482, 62.9779809883867), group = structure(c(2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
1L, 2L, 2L, 1L, 2L, 1L), levels = c("F", "M"), class = "factor")), row.names = c(NA,
-24L), class = c("tbl_df", "tbl", "data.frame"))
Package and library needed:
install.packages("rstatix")
library(rstatix)
I would like to use a placeholder in the following function:
t_test(genderweight, weight ~ group, detailed = TRUE)
My placeholder could be named i, for example, and afterwards I would like to run:
i <- "weight"
t_test(genderweight, i ~ group, detailed = TRUE)
Or alternatively, i could be a number, e.g. i = 1 and then I would like to run:
t_test(genderweight,genderweight[,i] ~ group, detailed = TRUE)
For both ways, I get an error message of the following type:
Error in `vec_as_location2_result()`:
! Can't extract columns that don't exist.
✖ Column `genderweight[, 1]` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
Is there a way to tell the function in an indirect way which column you want for the t-test?
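One possible approach, sketched under assumptions: build the formula from the placeholder with base R's reformulate() and pass the resulting formula object to the test function. The data below is a made-up stand-in with the same column names, and the example runs base R's t.test(); that rstatix's t_test() accepts the same formula object is an assumption noted in the final comment.

```r
# Stand-in data with the question's column names (values are illustrative only).
genderweight <- data.frame(
  weight = c(62.1, 64.3, 59.8, 81.0, 85.2, 83.4),
  group  = factor(c("F", "F", "F", "M", "M", "M"))
)

# Placeholder given as a column name.
i <- "weight"
f <- reformulate("group", response = i)   # builds the formula weight ~ group

# Placeholder given as a column position: look the name up first.
j <- 1
f_by_index <- reformulate("group", response = names(genderweight)[j])

# Base R's t.test() accepts the formula object directly.
res <- t.test(f, data = genderweight)

# Assumption: rstatix::t_test() takes the formula object the same way, e.g.
# t_test(genderweight, f, detailed = TRUE)
```

The point of the design is that the formula is constructed as an object first, so the function never sees the literal symbol i or the expression genderweight[, i].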

how to remove duplicated strings and merge all columns strings in one?

I have data that looks like the following df:
df<- structure(list(V1 = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("DNAJC11;FGOTG",
"MAPK14", "PPIB", "RBX1", "USP14"), class = "factor"), V2 = structure(c(4L,
3L, 2L, 1L, 1L), .Label = c("", "DNAJC9", "MAPK14", "USP14"), class = "factor"),
V3 = structure(c(3L, 2L, 4L, 5L, 1L), .Label = c("", "DNAJC11;FGOTG",
"GCLC", "GSR", "STIP1"), class = "factor")), .Names = c("V1",
"V2", "V3"), class = "data.frame", row.names = c(NA, -5L))
I want to merge all columns into one and then keep the unique ones
for example the output should look like this
USP14
DNAJC11;FGOTG
MAPK14
PPIB
RBX1
DNAJC9
GCLC
GSR
STIP1
I tried to use the melt function but could not figure out how to do this. Any comment is appreciated. Thanks
unique(as.vector(as.matrix(df)))
To remove the empty strings:
vec <- unique(as.vector(as.matrix(df)))
vec[vec != ""]
or, courtesy of @rawr:
Filter(nzchar, unique(as.vector(as.matrix(df))))
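A small worked sketch of the same idea on a reduced, made-up data frame, to show the order in which the values come out:

```r
# Reduced stand-in for df: factor columns, some entries empty.
df <- data.frame(
  V1 = c("USP14", "MAPK14", "RBX1"),
  V2 = c("USP14", "DNAJC9", ""),
  V3 = c("GCLC", "GSR", ""),
  stringsAsFactors = TRUE
)

# as.matrix() turns the factor columns into a character matrix, and
# as.vector() flattens it column by column.
vec <- unique(as.vector(as.matrix(df)))

# Logical indexing drops "" and also behaves correctly when no "" is present.
result <- vec[nzchar(vec)]
result
```

The values appear in column order (all of V1, then the new values from V2, then V3), which matches the layout of the desired output in the question.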

Error in r.squaredGLMM()

I am constructing GLMMs (using glmer() of "lme4" R package) and sometimes I get an error when estimating R2 values (using r.squaredGLMM() from "MuMIn" package).
The model I am trying to fit is similar to this one:
library(lme4)
lmA <- glmer(x~y+(1|w)+(1|w/k), data = data1, family = binomial(link="logit"))
Then, to estimate R2, I use:
library(MuMIn)
r.squaredGLMM(lmA)
And I get this:
The result is correct only if all data used by the model has not changed since model was fitted.
Error in .rsqGLMM(fam = family(x), varFx = var(fxpred), varRe = varRe, :
'names' attribute [2] must be the same length as the vector [0]
Do you have any idea why this error appears? For instance, if I use only a single random factor (in this case, (1|w)), this error does not appear.
Here is my dataset:
data1 <-
structure(list(w = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L,
1L, 2L, 1L), .Label = c("CA", "CB"), class = "factor"), k = structure(c(4L,
4L, 3L, 3L, 3L, 4L, 1L, 3L, 2L, 3L, 2L), .Label = c("CAF01-CAM01",
"CAM01", "CBF01-CBM01", "CBM01"), class = "factor"), x = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L), y = c(-0.034973549,
0.671720643, 4.557044729, 5.347170897, 2.634240583, -0.555740207,
4.118277809, 2.599825716, 0.95853864, 4.327804344, 0.057331718
)), .Names = c("w", "k", "x", "y"), class = "data.frame", row.names = c(NA,
-11L))
Any thoughts?
This was a bug in MuMIn that has been fixed in versions >= 1.15.8 (soon on CRAN, currently on R-Forge).
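To check whether an installed copy already includes the fix, the installed version can be compared against the version number given above. This is only a sketch; fixed_in is a name introduced here for illustration.

```r
# Version in which the answer says the bug was fixed.
fixed_in <- package_version("1.15.8")

if (requireNamespace("MuMIn", quietly = TRUE)) {
  installed <- utils::packageVersion("MuMIn")
  if (installed < fixed_in) {
    message("MuMIn ", as.character(installed), " predates the fix; consider updating.")
  } else {
    message("MuMIn ", as.character(installed), " should include the fix.")
  }
} else {
  message("MuMIn is not installed.")
}
```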

change the names for certain columns in a data frame

If I want to change the names of the columns from the second one to the last, why does my command not work?
fredTable <- structure(list(Symbol = structure(c(3L, 1L, 4L, 2L, 5L), .Label = c("CASACBM027SBOG",
"FRPACBW027SBOG", "TLAACBM027SBOG", "TOTBKCR", "USNIM"), class = "factor"),
Name = structure(1:5, .Label = c("bankAssets", "bankCash",
"bankCredWk", "bankFFRRPWk", "bankIntMargQtr"), class = "factor"),
Category = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Banks", class = "factor"),
Country = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "USA", class = "factor"),
Lead = structure(c(1L, 1L, 3L, 3L, 2L), .Label = c("Monthly",
"Quarterly", "Weekly"), class = "factor"), Freq = structure(c(2L,
1L, 3L, 3L, 4L), .Label = c("1947-01-01", "1973-01-01", "1973-01-03",
"1984-01-01"), class = "factor"), Start = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "Current", class = "factor"), End = c(TRUE,
TRUE, TRUE, TRUE, FALSE), SeasAdj = c(FALSE, FALSE, FALSE,
FALSE, TRUE), Percent = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Fed", class = "factor"),
Source = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "Res", class = "factor"),
Series = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Level",
"Ratio"), class = "factor")), .Names = c("Symbol", "Name",
"Category", "Country", "Lead", "Freq", "Start", "End", "SeasAdj",
"Percent", "Source", "Series"), row.names = c("1", "2", "3",
"4", "5"), class = "data.frame")
Then, to change the column names from the second column to the end, I use the following commands, but they do not work:
names(fredTable[,-1]) = paste("case", 1:ncol(fredTable[,-1]), sep = "")
or
names(fredTable)[,-1] = paste("case", 1:ncol(fredTable)[,-1], sep = "")
In general, how can one change the names of specific columns (for example, columns 2 to the end, or 2 to 7) and set them to any names one likes?
Replace specific column names by subsetting on the outside of the names function, not inside it as in your first attempt:
names(fredTable)[-1] <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
Explanation
If we save the new names in a vector newnames we can investigate what is going on under the hood with replacement functions.
#These are the names that will replace the old names
newnames <- paste("case", 1:ncol(fredTable[,-1]), sep = "")
We should always replace specific column names with the format:
#The right way to replace the second name only
names(df)[2] <- "newvalue"
#The wrong way
names(df[2]) <- "newvalue"
The problem is that the wrong form renames a temporary copy of the subset, and that renamed copy never replaces the names of the original data frame. In the correct form, the vector of names is modified and then assigned back to the data frame in a single replacement.
The right way [Internal]
We can expand the function call with:
#We enter this:
names(fredTable)[-1] <- newnames
#This is what R evaluates internally
fredTable <- `names<-`(fredTable, `[<-`(names(fredTable), -1, newnames))
The wrong way [Internal]
The internals of replacement the wrong way look like this:
#Wrong way
names(fredTable[-1]) <- newnames
#Wrong way Internal
fredTable <- `[<-`(fredTable, -1, value = `names<-`(fredTable[-1], newnames))
Notice that `names<-` is applied only to the temporary subset fredTable[-1]. That renamed copy is then written back with `[<-`, which replaces the column contents but keeps the existing column names of fredTable, so the names never change.
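The two forms can be compared side by side on a small made-up data frame (the names wrong and right are used here only for illustration):

```r
# Small illustrative data frame.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Wrong way: runs without error, but the data frame's own names are unchanged.
wrong <- df
names(wrong[-1]) <- c("case1", "case2")
names(wrong)

# Right way: subset the names vector outside names().
right <- df
names(right)[-1] <- c("case1", "case2")
names(right)

# The same pattern works for any range of columns, e.g. columns 2 to 3.
ranged <- df
names(ranged)[2:3] <- paste0("col", 2:3)
names(ranged)
```

Because the wrong form raises no error or warning, checking names() afterwards is the only way to notice that nothing changed.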

Resources