t-tests on different groups by iteration in R

I have a group of patient scores such as:
P1 <- c(7.81,6.93,7.11)
P2 <- c(8.61,7.95,8.11)
P3 <- c(8.41,7.65,7.01)
....etc
I have a big group of healthy people scores such as:
HC <- c(5.22,4.87,6.93,5.27,6.01,4.55,.....etc)
I have listed the names of patients in a vector:
patients <- c('P1','P2','P3',....etc)
I am trying to perform t-tests for each of the patient scores against the healthy control group. I have written:
for (i in patients){t.test(patients[i],HC)}
I was expecting R to print the result of a load of t-tests to the console but it tells me:
Error in t.test.default(patients[i], HC) :
not enough 'x' observations
In addition: Warning message:
In mean.default(x) : argument is not numeric or logical: returning NA
I just need to get some p-values for the data. I think this may be a simple syntax problem, but I don't work much with R and can't seem to find a quick answer. Any help would be great.

Use a list for patients containing the actual vectors, rather than the names of the vectors:
> patients <- list(P1, P2, P3)
> for (i in patients){print(t.test(i,HC)$p.value)}
[1] 0.005015573
[1] 0.0002672035
[1] 0.00899473

Try this: for (i in patients){print(t.test(get(i),HC))}
The problem is that i is cycling through your patients vector and holds a character string on each pass. R doesn't know what to do with the character 'P1'. get tells R to look in the environment for an object called 'P1'. The print call is needed because results are not auto-printed inside a for loop.
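Both answers can be combined by keeping the score vectors in a named list and collecting the p-values with sapply. This is only a sketch: the P1 to P3 values come from the question, but the HC vector here is a made-up stand-in because the real control data is elided in the question.

```r
# Patient scores from the question
P1 <- c(7.81, 6.93, 7.11)
P2 <- c(8.61, 7.95, 8.11)
P3 <- c(8.41, 7.65, 7.01)
# Hypothetical control scores (the original HC vector is truncated)
HC <- c(5.22, 4.87, 6.93, 5.27, 6.01, 4.55)

# A named list keeps each vector together with its label
patients <- list(P1 = P1, P2 = P2, P3 = P3)

# One Welch t-test per patient against the controls; keep only the p-value
p_values <- sapply(patients, function(x) t.test(x, HC)$p.value)
print(p_values)
```

The result is a named numeric vector, so each p-value stays attached to the patient it belongs to.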

Related

How to remove outliers for variable versus another variable in imported dataset in R?

I was asked to make a boxplot of the variable SAW for the two surgical intervention types defined by HSW; the dataset is named mydata. I was then asked to check whether there are any outliers in the boxplot. I found some, but I can't remove them. I tried several approaches and all of them failed.
Could you please help me with this issue?
This is my boxplot:
boxplot(mydata$SAW~mydata$HSW,main="SAW for two surgical")
no_outliers <- subset(mydata, mydata$SAW > (Q1 - 1.5*IQR) & mydata$HSW < (Q3 + 1.5*IQR))
This was my last attempt, but it gave me an error:
Error in surgery$SAW : $ operator is invalid for atomic vectors
One way would be to use the boxplot object itself:
old <- boxplot(disp~am,mtcars)
# old$out has the outlier values stored
# filter the df using those values
new <- mtcars[!mtcars$disp %in% old$out,]
## new boxplot without outliers
boxplot(disp~am,new)
# see also rstatix::identify_outliers()
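Note that filtering on old$out drops every row whose value matches an outlier value, whichever group it belongs to. A per-group version of the 1.5 * IQR rule from the question could look like the sketch below. It uses mtcars as in the answer above (disp standing in for SAW, am for HSW), and is_outlier is a hypothetical helper. It relies on quantile(), whose quartiles can differ slightly from the hinges boxplot() uses, so the flagged points may not match the plot exactly.

```r
# Hypothetical helper: TRUE for values outside the 1.5 * IQR fences
is_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# ave() applies the helper within each group and returns one value per row
flag <- as.logical(ave(mtcars$disp, mtcars$am, FUN = is_outlier))

# Keep only the rows that are not flagged within their own group
no_outliers <- mtcars[!flag, ]
boxplot(disp ~ am, data = no_outliers)
```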

Loop through a character vector to use in a function

I am conducting a method comparison study, comparing measurements from two different systems. My dataset has a large number of columns with variables containing measurements from one of the two systems.
aX and bX are both measures of X, but from system a and b. I have about 80 pairs of variables like this.
A simplified version of my data looks like this:
set.seed(1)
df <- data.frame(
ID = as.factor(rep(1:2, each=10)),
aX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
bX = rep(1:10+rnorm(10,mean=1,sd=0.5),2),
aY = rep(1:10+rnorm(10,mean=1,sd=0.5), 2),
bY = rep(1:10-rnorm(10,mean=1,sd=0.5),2))
head(df)
ID aX bX aY bY
1 1 1.686773 2.755891 2.459489 -0.6793398
2 1 3.091822 3.194922 3.391068 1.0513939
3 1 3.582186 3.689380 4.037282 1.8061642
4 1 5.797640 3.892650 4.005324 3.0269025
5 1 6.164754 6.562465 6.309913 4.6885298
6 1 6.589766 6.977533 6.971936 5.2074973
I am trying to loop through the elements of a character vector, and use the elements to point to columns in the dataframe. But I keep getting error messages when I try to call functions with variable names generated in the loop.
For simplicity, I have changed the loop to include a linear model as this produces the same type of error as I have in my original script.
#This line is only included to show that
#the formula used in the loop works when
#called directly with the "real" column names
(broom::glance(lm(aX~bX, data = df)))$r.squared
[1] 0.9405218
#Now I try the loop
varlist <- c("X", "Y")
for(i in 1:length(varlist)){
aVAR <- paste0("a", varlist[i])
bVAR <- paste0("b", varlist[i])
#aVAR and bVAR appear to hold names identical to column names in the df dataframe
print(c(aVAR, bVAR))
#Try the formula with the loop variable names
print((broom::glance(lm(aVAR~bVAR, data = df)))$r.squared)
}
The error messages I get when calling the functions from inside the loop vary according to the function I am calling. The common denominator for all the errors is that they occur when I try to use the character vector (varlist) to pick out specific columns.
Example of error messages:
rmcorr(ID, aVAR, bVAR, df)
Error in rmcorr(ID, aVAR, bVAR, df) :
'Measure 1' and 'Measure 2' must be numeric
or
broom::glance(lm(aVAR~bVAR, data = df))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
In addition: Warning message:
In storage.mode(v) <- "double" : NAs introduced by coercion
Can you help me understand what goes wrong in the loop? Or suggest and show another way to accomplish what I am trying to do?
Variables aren't evaluated in formulas (the things with ~).
You can type
bert ~ ernie
and not get an error even if variables named bert and ernie do not exist. A formula stores relationships between symbols/names and does not attempt to evaluate them. Also note we are not using quotes here. Variable names (or symbols) are not interchangeable with character values (i.e. aX is very different from "aX").
So when putting together a formula from string values, I suggest you use the reformulate() function. It takes a vector of names for the right-hand side and an optional value for the left-hand side. So you would create the same formula with
reformulate("ernie", "bert")
# bert ~ ernie
And you can use that with your lm
lm(reformulate(bVAR, aVAR), data = df)
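Put together with the example data from the question, the whole loop might look like the sketch below. It uses summary() instead of broom::glance() only to avoid the extra package, and sapply instead of for so the R-squared values are collected in a named vector. Both are choices made here, not part of the answer above.

```r
# Example data from the question
set.seed(1)
df <- data.frame(
  ID = as.factor(rep(1:2, each = 10)),
  aX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bX = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  aY = rep(1:10 + rnorm(10, mean = 1, sd = 0.5), 2),
  bY = rep(1:10 - rnorm(10, mean = 1, sd = 0.5), 2))

varlist <- c("X", "Y")

# Build aX ~ bX, aY ~ bY, ... from strings and fit one model per pair
r2 <- sapply(varlist, function(v) {
  f <- reformulate(paste0("b", v), response = paste0("a", v))
  summary(lm(f, data = df))$r.squared
})
print(r2)
```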
I'm too lazy to search for a duplicate on how to construct formulas programmatically, so here is a solution:
varlist <- c("X", "Y")
for(i in 1:length(varlist)){
#make these symbols:
aVAR <- as.symbol(paste0("a", varlist[i]))
bVAR <- as.symbol(paste0("b", varlist[i]))
#aVAR and bVAR are now symbols matching column names in the df dataframe
print(c(aVAR, bVAR))
#Try the formula with the loop variable names
#construct the call to `lm` with `bquote` and `eval` the expression
print((broom::glance(eval(bquote(lm(.(aVAR) ~ .(bVAR), data = df)))))$r.squared)
}

apply using values of each line in a data.frame as parameters in R

I have 1 data.frame as follows, each line is a different Stock data :
Teste=data.frame(matrix(runif(25), nrow=5, ncol=5))
colnames(Teste) <- c("AVG_VOLUME","AVG_RETURN","VOL","PRICE","AVG_XX")
AVG_VOLUME AVG_RETURN VOL PRICE AVG_XX
1 0.7028197 0.9264265 0.2169411 0.80897110 0.3047671
2 0.7154557 0.3314615 0.4839466 0.63529520 0.5633933
3 0.4038030 0.4347487 0.3441471 0.07028743 0.7704912
4 0.5392530 0.6414982 0.4482528 0.11087518 0.3512511
5 0.8720084 0.9615865 0.8081017 0.45781973 0.0137508
What I want to do is apply the function GBM from package sde (https://cran.r-project.org/web/packages/sde/sde.pdf) using the columns AVG_RETURN, VOL and PRICE as arguments, for every row of the data.frame.
Something like this :
Result <- apply(Teste,1,function(x) {
GBM(x[,"PRICE"],x[,"AVG_RETURN"],x[,"VOL"],1,252)
})
So I want Result to be a data.frame that holds a GBM run for each stock in the Teste data.frame.
How can I get this result?
The answer to the narrow question of why you are getting errors is that apply passes each row to the function as a vector rather than as a data frame, so removing the commas in the arguments to "[" will get you a result.
Result <- apply(Teste,1,function(x) {
GBM(x["PRICE"],x["AVG_RETURN"],x["VOL"],1,252)
})
If you need it to be a dataframe where each stock would be a column, and the input datastructure has meaningful stock names, then I suggest using:
dfRes <- setNames( data.frame(Result), rownames(Teste) )
I think the only way this could be meaningful in a risk analysis context is if many more simulation runs than these single instances are assembled in some higher level context.
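To see the shape of what apply returns here without installing sde, the sketch below swaps GBM for a hypothetical stand-in, gbm_path, that simulates a single geometric Brownian motion path in base R. It is not the sde implementation and is only meant to show the row-as-vector indexing and the resulting matrix layout.

```r
# Hypothetical stand-in for sde::GBM: one simulated path with N steps on [0, Tmax]
gbm_path <- function(x0, r, sigma, Tmax = 1, N = 252) {
  dt <- Tmax / N
  steps <- (r - sigma^2 / 2) * dt + sigma * sqrt(dt) * rnorm(N)
  x0 * exp(c(0, cumsum(steps)))  # N + 1 values, starting at x0
}

set.seed(42)
Teste <- data.frame(matrix(runif(25), nrow = 5, ncol = 5))
colnames(Teste) <- c("AVG_VOLUME", "AVG_RETURN", "VOL", "PRICE", "AVG_XX")

# Inside apply(), x is a named numeric vector, so x["PRICE"] works and x[, "PRICE"] does not
Result <- apply(Teste, 1, function(x) {
  gbm_path(x["PRICE"], x["AVG_RETURN"], x["VOL"], 1, 252)
})

# One column per stock, one row per time step
dfRes <- setNames(data.frame(Result), rownames(Teste))
dim(dfRes)
```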

R: Non-numeric argument to mathematical function

I have written a user-defined function:
epf <- function(z,x,noise=std_noise){
z_dims <- length(z)
std_noise <- 0.5*matrix(1,1,z_dims)
std_noise <- as.data.frame(std_noise)
obs_prob <- dnorm(z,x[1:z_dims],noise)
error <- prod(cbind(1,obs_prob))
return(error)
}
This function is called in a for-loop in another function:
w <- matrix(0,N,1)
for (i in 1:N){
w[i] <- epf(z,p[i,],R_noise)
}
where z is a 2-dimensional vector, N=1000, p is a dataframe of 1000 observations and 4 variables and R_noise is a dataframe of 1 observation and 4 variables.
Here I get the error: "Non-numeric argument to mathematical function", for the line obs_prob <- dnorm(z,x[1:z_dims],noise)
Can anyone help me with finding the error?
I have looked through questions similar to mine, but I still can't find the error in my code.
Edit:
Added definition of N
dnorm(as.matrix(z), x[1:z_dims], noise) may work better.
And more broadly speaking, a data frame with one row and two columns may be better expressed as a vector. Data frames look like matrices and as you put it 'two-dimensional vectors', but they are different in important aspects.
The same error may be occurring because you are feeding dnorm a second data frame in its last argument noise by passing R_noise.
Also, consider that p[i, ] has four values. It is subsetted in the obs_prob line with x[1:z_dims]. In this case, z_dims will equal 2 since length(z) is 2. So you are evaluating dnorm(data.frame(z), p[1, ][1:2], data.frame(R_noise)).
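One way to act on these points is to turn the data frame rows into plain numeric vectors before they reach dnorm. The sketch below uses made-up values for z, p and R_noise, with a smaller N than in the question. It also truncates noise to the first z_dims entries, which is an assumption about what was intended rather than something stated in the question.

```r
set.seed(1)
N <- 5                                            # smaller than the question's 1000
z <- c(0.2, -0.1)                                 # hypothetical observation
p <- as.data.frame(matrix(rnorm(N * 4), N, 4))    # hypothetical particles
R_noise <- as.data.frame(matrix(0.5, 1, 4))       # hypothetical noise levels

epf <- function(z, x, noise) {
  z_dims <- length(z)
  # unlist() turns a one-row data frame into a plain numeric vector
  x <- as.numeric(unlist(x))[1:z_dims]
  noise <- as.numeric(unlist(noise))[1:z_dims]    # assumption: first z_dims entries
  prod(dnorm(z, x, noise))
}

# One weight per row of p
w <- vapply(seq_len(N), function(i) epf(z, p[i, ], R_noise), numeric(1))
print(w)
```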

Unable to Convert Chi-Squared Values into a Numeric Column in R

I've been working on a project for a little bit for a homework assignment and I've been stuck on a logistical problem for a while now.
What I have at the moment is a list that returns 10000 values in the format:
[[10000]]
X-squared
0.1867083
(This is the 10000th value of the list)
What I really would like is to just have the chi-squared value alone so I can do things like create a histogram of the values.
Is there any way I can do this? I'm fine with repeating the test from the start if necessary.
My current code is:
nsims = 10000
for (i in 1:nsims) {cancer.cells <- c(rep("M",24),rep("B",13))
malig[i] <- sum(sample(cancer.cells,21)=="M")}
benign = 21 - malig
rbenign = 13 - benign
rmalig = 24 - malig
for (i in 1:nsims) {test = cbind(c(rbenign[i],benign[i]),c(rmalig[i],malig[i]))
cancerchi[i] = chisq.test(test,correct=FALSE) }
It gives me all I need, I just cannot perform follow-up analysis on it such as creating a histogram.
Thanks for taking the time to read this!
I'll provide an answer at the suggestion of @Dr. Mike.
hist requires a vector as input. The reason that hist(cancerchi) will not work is because cancerchi is a list, not a vector.
There are several ways to convert cancerchi from a list into a format that hist can work with. Here are 3 ways. The most direct is to unlist it into a numeric vector:
hist(unlist(cancerchi))
Note that if you do not reassign cancerchi it will still be a list and cannot be passed directly to hist.
# i.e
class(cancerchi)
hist(cancerchi) # will still give you an error
If you reassign, it can be another type of object:
(class(cancerchi2 <- unlist(cancerchi)))
(class(cancerchi3 <- as.data.frame(unlist(cancerchi))))
# using the ldply function in the plyr package
library(plyr)
(class(cancerchi4 <- ldply(cancerchi)))
These new objects can be passed to hist (the data frames one column at a time):
hist(cancerchi2)
hist(cancerchi3[,1]) # specify column because cancerchi3 is a data frame, not a vector
hist(cancerchi4[,1]) # specify column because cancerchi4 is a data frame, not a vector
A little extra information: other useful commands for looking at your objects include str and attributes.
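Since the question allows rerunning the test from the start, another option is to store only the statistic in a numeric vector in the first place, so no conversion is needed afterwards. The sketch below follows the simulation in the question, with a smaller nsims to keep it quick. The use of replicate and vapply is a choice made here, not part of the answer above.

```r
set.seed(1)
nsims <- 1000   # the question uses 10000
cancer.cells <- c(rep("M", 24), rep("B", 13))

# Number of malignant cells in each random sample of 21
malig <- replicate(nsims, sum(sample(cancer.cells, 21) == "M"))
benign <- 21 - malig
rbenign <- 13 - benign
rmalig <- 24 - malig

# Keep only the X-squared statistic from each test, as a plain number
cancerchi <- vapply(seq_len(nsims), function(i) {
  tab <- cbind(c(rbenign[i], benign[i]), c(rmalig[i], malig[i]))
  unname(chisq.test(tab, correct = FALSE)$statistic)
}, numeric(1))

hist(cancerchi)
```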
