I'm using the rela package to check whether PCA is appropriate for my data.
paf.neur2 <- paf(neur2)
summary(paf.neur2)
# [1] "Your dataset is not a numeric object."
I want to see the KMO (Kaiser-Meyer-Olkin) measure of sampling adequacy. How can I do that?
Output of str(neur2)
'data.frame': 1457 obs. of 66 variables:
$ userid : int 200 387 458 649 931 991 1044 1075 1347 1360 ...
$ funct : num 3.73 3.79 3.54 3.04 3.81 ...
$ pronoun: num 2.26 2.55 2.49 1.98 2.71 ...
.
.
.
$ time : num 1.68 1.87 1.51 1.03 1.74 ...
$ work : num 0.7419 0.2311 -0.1985 -1.6094 -0.0619 ...
$ achieve: num 0.174 0.2469 0.1823 -0.478 -0.0513 ...
$ leisure: num 0.2852 0.0296 0.0583 -0.3567 -0.0408 ...
$ home : num -0.844 -0.58 -0.844 -2.207 -1.079 ...
.
Variables are all numeric.
According to ?paf, the object argument must be a numeric dataset (usually a coerced matrix from a prior data frame).
So you need to convert your data.frame neur2 into a matrix: as.matrix(neur2).
Here is a reproduction of your problem using the Seatbelts dataset:
library(rela)
Belts <- Seatbelts[,1:7]
class(Belts)
# [1] "mts" "ts" "matrix"
Belts <- as.data.frame(Belts)
class(Belts)
# [1] "data.frame"
paf.belt <- paf(Belts)
# [1] "Your dataset is not a numeric object."
Belts <- as.matrix(Belts)
class(Belts)
# [1] "matrix"
paf.belt <- paf(Belts) # Works
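The reason the conversion matters can be seen without rela at all. A data frame is a list of columns, so is.numeric() returns FALSE for it even when every column is numeric, while the matrix built from it is numeric. The sketch below uses only base R; that paf() performs a check of this kind is an assumption based on the message it prints, not something taken from its source.

```r
# A data frame of purely numeric columns is still not a "numeric object"
belts_df <- as.data.frame(Seatbelts[, 1:7])
is.numeric(belts_df)    # FALSE: a data frame is a list of columns

# Coercing to a matrix gives a single numeric object
belts_mat <- as.matrix(belts_df)
is.numeric(belts_mat)   # TRUE
```

For neur2 it is probably worth dropping the userid column before the conversion, since an identifier is not a variable to be analysed.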
Two options for computing the KMO:
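library(corpcor)  # provides cor2pcor()
kmo_DIY <- function(df){
  r <- cor(df)
  # sum of squared correlations over the off-diagonal pairs
  csq <- r^2
  csumsq <- (sum(csq) - dim(csq)[1]) / 2
  # same sum for the squared partial correlations
  pcsq <- cor2pcor(r)^2
  pcsumsq <- (sum(pcsq) - dim(pcsq)[1]) / 2
  kmo <- csumsq / (csumsq + pcsumsq)
  return(kmo)
}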
or the KMO() function from the psych package.
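If installing an extra package is not an option, the same overall statistic can be sketched in base R by taking the partial correlations from the inverse of the correlation matrix. The function name kmo_base is made up for this illustration, and the sketch assumes the correlation matrix is invertible.

```r
# Base-R sketch of the overall KMO statistic (kmo_base is a hypothetical name)
kmo_base <- function(x) {
  r <- cor(x)
  inv <- solve(r)  # assumes r is non-singular
  # partial correlation of i and j given all other variables
  p <- -inv / sqrt(outer(diag(inv), diag(inv)))
  off <- upper.tri(r)  # each off-diagonal pair counted once
  sum(r[off]^2) / (sum(r[off]^2) + sum(p[off]^2))
}

kmo_value <- kmo_base(as.matrix(Seatbelts[, 1:7]))
kmo_value
```

The result always lies between 0 and 1; values closer to 1 indicate that partial correlations are small relative to the raw correlations.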
I'm using the following code to try to transform my response variable for regression. It seems to need a log transformation.
bc = boxCox(auto.tf.lm)
lambda.mpg = bc$x[which.max(bc$y)]
auto.tf.bc <- with(auto_mpg, data.frame(log(mpg), as.character(cylinders), displacement**.2, log(as.numeric(horsepower)), log(weight), log(acceleration), model_year))
auto.tf.bc.lm <- lm(log(mpg) ~ ., data = auto.tf.bc)
view(auto.tf.bc)
I am receiving this error though.
Error in Math.data.frame(mpg) :
non-numeric variable(s) in data frame: manufacturer, model, trans, drv, fl, class
Not sure how to resolve this. The data is in a data frame, not csv.
Here's the output from str(auto.tf.bc). Sorry for such bad question formatting.
'data.frame': 392 obs. of 7 variables:
$ log.mpg. : num 2.89 2.71 2.89 2.77 2.83 ...
$ as.character.cylinders.: chr "8" "8" "8" "8" ...
$ displacement.0.2 : num 3.14 3.23 3.17 3.14 3.13 ...
$ log.horsepower. : num 4.87 5.11 5.01 5.01 4.94 ...
$ log.weight. : num 8.16 8.21 8.14 8.14 8.15 ...
$ log.acceleration. : num 2.48 2.44 2.4 2.48 2.35 ...
$ model_year : num 70 70 70 70 70 70 70 70 70 70 ...
Removing the cylinders column doesn't change anything.
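One possible reading of the error, offered as a guess rather than a confirmed diagnosis: the columns it lists (manufacturer, model, trans, drv, fl, class) are those of the mpg data set shipped with ggplot2. Because the transformed data frame has a column called log.mpg. and none called mpg, log(mpg) in the formula would be looked up outside the data and land on that data frame. The sketch below reproduces the mechanism with made-up numbers and a stand-in object, then shows a version with explicit column names.

```r
# Made-up stand-in for the original data
auto <- data.frame(mpg = c(18, 15, 36, 22, 27),
                   weight = c(3504, 3693, 1800, 2800, 2100))

# Unnamed expressions get deparsed names: "log.mpg." and "log.weight."
tf <- with(auto, data.frame(log(mpg), log(weight)))
names(tf)

# Stand-in for ggplot2::mpg: a data frame named mpg outside the model data
mpg <- data.frame(model = c("a4", "a6"), hwy = c(29, 24))

# tf has no column called mpg, so log() is applied to the data frame above
res <- try(lm(log(mpg) ~ ., data = tf), silent = TRUE)
inherits(res, "try-error")   # TRUE

# Naming the columns and using the already transformed response avoids this
tf2 <- with(auto, data.frame(log_mpg = log(mpg), log_weight = log(weight)))
fit <- lm(log_mpg ~ ., data = tf2)
coef(fit)
```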
I'm doing a fixed-effects regression analysis using plm() from the plm package. I have selected the twoways effect to account for both time and individual effects. However, after running the code below I keep receiving this message:
Error in pdata.frame(data, index) :
variable id does not exist (individual index)
Here is the code:
pdata <- DATABASE[,c(2:4,13:21)]
pdata$id <- group_indices(pdata,ISO3.p,Productcode)
coutnin <- dcast.data.table(pdata,ISO3.p+Productcode~.,value.var = "id")
setcolorder(pdata,neworder=c("id","Year"))
pdata <- pdata.frame(pdata,index=c("id","Year"))
reg <- plm(pdata,diff(TV,1) ~ diff(RERcp,1)+diff(GDPR.p,1)-diff(GDPR.r,1), effect="twoways", model="within", index = c("id","Year"))
Please note that the structure of pdata shows multiple levels in the id variable, which is stored as an integer. I initially tried a string variable instead, but got the same result:
Classes ‘data.table’ and 'data.frame': 1211800 obs. of 13 variables:
$ id : int 4835 6050 13158 15247 17164 18401 19564 23553 24895 27541 ...
$ Year : int 1996 1996 1996 1996 1996 1996 1996 1996 1996 1996 ...
$ Productcode: chr "101" "101" "101" "101" ...
$ ISO3.p : Factor w/ 171 levels "ABW","AFG","AGO",..: 8 9 20 22 27 28 29 34 37 40 ...
$ e : num 0.245 -0.238 1.624 0.693 0.31 ...
$ RERcp : num -0.14073 -0.16277 1.01262 0.03908 -0.00243 ...
$ RERpp : num -0.1712 NA NA NA -0.0952 ...
$ RER_GVC : num -3.44 NaN NA NA NaN ...
$ GDPR.p : num 27.5 26.6 23.5 20.3 27.8 ...
$ GDPR.r : num 30.4 30.4 30.4 30.4 30.4 ...
$ GVCPos : num 0.141 0.141 0.141 0.141 0.141 ...
$ GVCPar : num 0.436 0.436 0.436 0.436 0.436 ...
$ TV : num 17.1 17.1 17.1 17.1 17.1 ...
- attr(*, ".internal.selfref")=<externalptr>
When I convert the data.table into a pdata.frame I do not receive any warning; the error appears only when I run plm. Running View(table(index(pdata), useNA = "ifany")) shows no value larger than 1, so I assume there are no duplicate observations in my data.
Put the formula first and pass the data as the second argument (or by name) in the plm call. If pdata has already been converted to a pdata.frame, leave out the index argument as well, i.e., try this:
reg <- plm(diff(TV,1) ~ diff(RERcp,1)+diff(GDPR.p,1)-diff(GDPR.r,1), data = pdata, effect = "twoways", model = "within")
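The same call pattern can be tried on the Grunfeld data bundled with plm, which avoids any dependence on DATABASE. This is only a sketch of the argument order (formula first, data by name, no index once the data is a pdata.frame); the model itself is arbitrary and the block is skipped when plm is not installed.

```r
# Sketch only: runs when the plm package is installed
if (requireNamespace("plm", quietly = TRUE)) {
  data("Grunfeld", package = "plm")
  # declare the panel structure once, here
  pGrun <- plm::pdata.frame(Grunfeld, index = c("firm", "year"))
  # formula first, data by name, no index argument needed any more
  fit <- plm::plm(inv ~ value + capital, data = pGrun,
                  effect = "twoways", model = "within")
  print(coef(fit))
}
```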
I'm trying to fit a PLSR model, but I'm doing something wrong. Below you can see how I created the data frame, and its structure.
reflektance <- read_excel("data/reflektance.xlsx", na = "NA")
reflektance <- dput(reflektance)
pH <- read_excel("data/rijen2016.xls", na = "NA")
pH <- na.omit(pH)
pH <- dput(pH)
reflektance<-aggregate(reflektance[, 2:753], list(reflektance$Vzorek), mean)
colnames(reflektance)[colnames(reflektance)=='Group.1']<-'Vzorek'
datapH <- merge(pH, reflektance, by="Vzorek")
datasetpH <- data.frame(pH=datapH[,2], ref=I(as.matrix(datapH[, 3:754], 22, 752)))
The problem is with plsr, which fails with this error:
ph1<-plsr(pH ~ ref, ncomp = 5, data=datasetpH)
Error in pls::mvr(ref ~ pH, ncomp = 5, data = datasetpH, method = "kernelpls") :
Invalid number of components, ncomp
dput(reflectance):
https://jpst.it/RyyS
Here is the structure of the table datapH:
'data.frame': 22 obs. of 754 variables:
$ Vzorek: chr "5 - P01" "5 - P02" "5 - P03" "5 - R1 - A1" ...
$ pH/H2O: num 6.96 6.62 7.02 5.62 5.97 6.12 5.64 5.81 5.61 5.47 ...
$ 325 : num 0.017 0.0266 0.0191 0.0241 0.016 ...
$ 326 : num 0.021 0.0263 0.0154 0.0264 0.0179 ...
$ 327 : num 0.0223 0.0238 0.0147 0.028 0.0198 ...
...
And here is the structure of the table datasetpH:
'data.frame': 22 obs. of 2 variables:
$ pH : num 6.96 6.62 7.02 5.62 5.97 6.12 5.64 5.81 5.61 5.47 ...
$ ref: AsIs [1:22, 1:752] 0.016983.... 0.026556.... 0.019059.... 0.024097.... 0.016000.... ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "325" "326" "327" "328" ...
Do you have any advice or a solution? Thank you.
The problem seems to come from one of your columns containing only NAs.
The last line of the output of names(df) gives:
[745] "1068" "1069" "1070" "1071" "1072" "1073" "1074" "1075" NA
Using your data + some randomly generated values for pH (which isn't in the reflektance dataframe, named df here):
test=data.frame(pH=rnorm(23,5,2), ref=I(as.matrix(df[, 2:753], 22, 752)))
pls::plsr(pH ~ ref, data=test)
Error in matrix(0, ncol = ncomp, nrow = npred) :
invalid 'ncol' value (< 0)
Note that the indexing is a bit different from yours, because I didn't have the second column in df (the one that contains pH in yours).
If I remove the last column, which contains the NAs:
test=data.frame(pH=rnorm(23,5,2), ref=I(as.matrix(df[, 2:752], 22, 751)))
pls::plsr(pH ~ ref, data=test)
Partial least squares regression , fitted with the kernel algorithm.
Call:
plsr(formula = pH ~ ref, data = test)
Let me know if that fixes it.
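A quick way to find such columns before fitting is to count the NAs per column and drop any column that is entirely NA. The matrix below is synthetic and only mimics the situation described above (a trailing all-NA column with an NA name); the object names are made up for the illustration.

```r
# Synthetic stand-in for the reflectance matrix: 22 samples, 5 "wavelengths"
set.seed(1)
ref <- matrix(runif(22 * 5), nrow = 22)
colnames(ref) <- c("325", "326", "327", "328", NA)
ref[, 5] <- NA   # trailing column that holds nothing but NA

# Flag columns in which every value is missing
all_na <- colSums(is.na(ref)) == nrow(ref)
which(all_na)

# Keep only the usable predictors
ref_clean <- ref[, !all_na, drop = FALSE]
dim(ref_clean)
```

The cleaned matrix can then be wrapped in I() and passed to plsr as before.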
I have a data frame:
rawDataLogged
I have a function:
doForRow <- function(row) {
transpose <- t(row);
transpose <- transpose[like(row.names(transpose), "H.M")]
frame <- data.frame(transpose)
frame$BR <- c(1,1,2,2)
frame$TR <- c(1,2,1,2)
colnames(frame)[1] <- "Log2Ratio"
frame$Log2Ratio <- as.numeric(levels(frame$Log2Ratio))[frame$Log2Ratio]
summ <- summary(aov(Log2Ratio ~ BR + Error(TR), data=frame))
summ[[2]][[1]]["BR",]$'Pr(>F)'
}
If I execute my function with a row from my data frame, I get a result:
> doForRow(rawDataLogged[5,])
[1] 0.4973168
However if I try to use 'apply' to get the results for all my rows, it does not work:
tmp <- apply(rawDataLogged, 1, doForRow)
Error in `$<-.data.frame`(`*tmp*`, "BR", value = c(1, 1, 2, 2)) :
replacement has 4 rows, data has 0
When I place a breakpoint in my own function, I see that 'row' is empty, as in nothing seems to be getting passed into my function by apply.
Any ideas why this could be happening? I've spent hours trying to solve this myself, perhaps a loop would be easiest instead of an apply family function. I'm at a loss as to why my function is called without any row data.
I have placed an R data file containing the 'rawDataLogged' object at this url: Link which could be used for debugging. Example data created using dput: Link
Here is a dump from str to show the structure of my data frame:
'data.frame': 1262 obs. of 15 variables:
$ Protein.IDs : Factor w/ 1262 levels "sp|A0AVT1|UBA6_HUMAN;tr|H0Y8S8|H0Y8S8_HUMAN",..: 654 190 894 196 834 268 474 1221 366 973 ...
$ Majority.protein.IDs : Factor w/ 1262 levels "sp|A0AVT1|UBA6_HUMAN",..: 654 190 894 196 834 268 474 1221 366 973 ...
$ Ratio.M.L.normalized.X1.1: num -0.27 -0.707 0.244 -0.728 -2.025 ...
$ Ratio.H.L.normalized.X1.1: num 0.0036 0.0588 -0.0886 0.1561 -0.0843 ...
$ Ratio.H.M.normalized.X1.1: num 0.339 0.66 -0.211 0.477 1.926 ...
$ Ratio.M.L.normalized.X1.2: num -0.132 -0.661 0.283 -1.045 -1.223 ...
$ Ratio.H.L.normalized.X1.2: num -0.07779 0.10273 -0.00251 -0.09755 0.18929 ...
$ Ratio.H.M.normalized.X1.2: num 0.0793 0.7718 -0.2657 0.9651 1.3532 ...
$ Ratio.M.L.normalized.X3.1: num -3.55 -2.08 -1.99 -1.98 -1.85 ...
$ Ratio.H.L.normalized.X3.1: num 0.1336 0.0777 -0.1014 -0.3478 -0.0259 ...
$ Ratio.H.M.normalized.X3.1: num -0.187 2.259 1.852 1.511 1.928 ...
$ Ratio.M.L.normalized.X3.2: num 0.106 -2.118 -1.864 -2.364 -1.847 ...
$ Ratio.H.L.normalized.X3.2: num 0.0141 0.0746 -0.0315 -0.1772 -0.0936 ...
$ Ratio.H.M.normalized.X3.2: num -0.143 2.248 1.842 2.279 1.758 ...
$ id : int 1369 564 2170 577 1966 700 1050 1357 855 2482 ...
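A likely explanation, given the two factor columns in the structure above, is that apply() first converts the data frame to a matrix. With factors present the matrix is character, and each row reaches the function as a plain vector rather than a one-row data frame, so t(row) no longer carries the column names as row names. The sketch below shows this on a tiny made-up data frame and a row-wise alternative that keeps each row as a data frame; mean_hm is a hypothetical stand-in for doForRow.

```r
# Tiny made-up data frame with one factor and two numeric columns
df <- data.frame(id = factor(c("a", "b")),
                 Ratio.H.M.1 = c(0.3, 0.6),
                 Ratio.H.M.2 = c(0.1, 0.7))

# What apply() hands to the function: a character vector per row
classes <- apply(df, 1, function(row) class(row))
classes

# Hypothetical stand-in for doForRow: expects a one-row data frame
mean_hm <- function(row) {
  hm <- row[, grepl("H.M", names(row), fixed = TRUE), drop = FALSE]
  mean(unlist(hm))
}

# Index the rows explicitly so each call receives df[i, ]
res <- vapply(seq_len(nrow(df)), function(i) mean_hm(df[i, ]), numeric(1))
res
```

The same pattern, vapply over seq_len(nrow(rawDataLogged)), should let doForRow receive rows in the form it was tested with.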
I have a data frame in R that I have read in from a csv file. How can I append a string ("EA") to the end of all column names? I have figured out code that works for a single column, but for some reason my loop does not return renamed fields.
Here is the dataframe:
> str(mydataframe)
'data.frame': 8368 obs. of 4 variables:
$ gene: Factor w/ 8368 levels "A1BG","A1CF",..: 6949 4379 7111 4691 2331 4914 506 4985 7109 2072 ...
$ p : num 1.23e-09 1.05e-07 1.20e-07 2.53e-07 6.67e-07 ...
$ beta: num 2.86 2.52 2.51 1.72 2.34 ...
$ se : num 0.471 0.474 0.474 0.334 0.471 ...
Here is the code:
for(i in names(mydataframe)){
i_renamed <- paste(i, "EA", sep=".")
mydataframe$i_renamed <- mydataframe$i
mydataframe$i <- NULL
}
...but afterwards the object is still the same
> str(mydataframe)
'data.frame': 8368 obs. of 4 variables:
$ gene: Factor w/ 8368 levels "A1BG","A1CF",..: 6949 4379 7111 4691 2331 4914 506 4985 7109 2072 ...
$ p : num 1.23e-09 1.05e-07 1.20e-07 2.53e-07 6.67e-07 ...
$ beta: num 2.86 2.52 2.51 1.72 2.34 ...
$ se : num 0.471 0.474 0.474 0.334 0.471 ...
The desired result is a field "gene.EA" that is identical to the original "gene" field, and likewise for all other columns.
Thank you
You don't need a loop for this.
names(mydataframe) <- paste0(names(mydataframe), '.EA')
Or explicitly, you could do:
mydataframe <- setNames(mydataframe, paste0(names(mydataframe), '.EA'))
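As for why the loop leaves the data frame unchanged: mydataframe$i looks for a column literally called i and does not substitute the value of the loop variable, so it returns NULL and nothing is assigned. The sketch below shows this on a small made-up data frame, together with the [[ ]] form that does use the variable, before applying the one-line rename.

```r
# Small made-up data frame
mydataframe <- data.frame(gene = c("A1BG", "A1CF"), p = c(1.23e-09, 1.05e-07))

i <- "gene"
# $ takes the name literally: there is no column called "i"
dollar_result <- mydataframe$i
is.null(dollar_result)                          # TRUE

# [[ ]] evaluates the variable and returns the gene column
bracket_ok <- identical(mydataframe[[i]], mydataframe$gene)
bracket_ok                                      # TRUE

# The vectorised rename needs neither
names(mydataframe) <- paste0(names(mydataframe), ".EA")
names(mydataframe)
```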