Dataset summation in R

I have a dataset in R with two columns labelled x and y each with over 1000 values. I need to find sum((xi^2-xbar^2)(yi-ybar))/sum((xi-xbar)^4) for a linear regression problem. All I can think to use is:
sum(((data$x)^2-mean(data$x)^2)(data$y-mean(data$y)))/sum((data$x-mean(data$x))^4)
But this just gives me Error: attempt to apply non-function. I haven't got a clue how to correct this. Any help would be much appreciated.

Question: How do you figure out what the problem is in an expression that is visually overwhelming?
Answer: take it apart piece by piece.
df <- data.frame(x = rnorm(10), y = rnorm(10))
df$x^2
# works fine
df$x^2 - mean(x)^2
# works fine **SEE NOTE**
sum(df$x^2 - mean(x)^2)
# works fine
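# sum(df$x^2 - mean(x)^2)(df$y - mean(... oh, I see
You're trying to multiply by putting two parenthesized expressions next to each other. R has no implicit multiplication: it reads (a)(b) as an attempt to call the value of (a) as a function with argument b, which is exactly what "attempt to apply non-function" means. Use * explicitly.
NOTE: No, it doesn't quite work fine. mean(x) refers to whatever object named x exists in your environment, not df$x, so on a second pass you may discover that your values aren't correct. That isn't what throws the error, though, as long as an x object already exists in your environment (and has no NA values).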
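For reference, here is a sketch of the full calculation with an explicit *, run on simulated data. The data frame below is a stand-in I made up, not the asker's dataset; the tryCatch() line reproduces the original error in miniature.

```r
# Simulated stand-in for the asker's `data` (assumption: numeric columns x and y)
set.seed(1)
data <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Juxtaposed parentheses are parsed as a function call, not as multiplication
msg <- tryCatch((1)(2), error = function(e) conditionMessage(e))
print(msg)

xbar <- mean(data$x)
ybar <- mean(data$y)

# Numerator and denominator of the requested ratio, with an explicit *
num   <- sum((data$x^2 - xbar^2) * (data$y - ybar))
den   <- sum((data$x - xbar)^4)
ratio <- num / den
print(ratio)
```

Splitting the expression into num and den also makes it much easier to take apart piece by piece when something goes wrong.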

I think this is related to the parentheses and how you refer to the x and y variables in data. Try the following:
sum(((data$x)^2-(mean(data$x))^2)*(data$y-mean(data$y)))/sum((data$x-mean(data$x))^4)

Related

Having a problem getting a fraction result

I was trying to solve a basic matrix problem.
I used:
A<- matrix(c(2,7,5,7), 2,2)
b<- c(8,12)
solve(A,b, fractions = TRUE)
However, my results only come out as decimals. How can I get the results as fractions?
I also want to plot the equations above.
I used:
plotEqn(A,b)
However, it tells me the function can't be found. Can I have some advice, please?
Thank you very much!!!
For your first question,
MASS::fractions(solve(A,b))
gives {4/21, 32/21}. Note that you aren't always guaranteed the exact answer: R does floating-point calculation (unlike, e.g., Mathematica), and fractions() only finds a rational approximation to the floating-point result.
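Because fractions() approximates a floating-point result, it can be worth checking the solution by substituting it back into the system. A sketch using only base R and MASS (which ships with R): A %*% x should reproduce b up to rounding error.

```r
library(MASS)  # provides fractions()

A <- matrix(c(2, 7, 5, 7), 2, 2)
b <- c(8, 12)

x  <- solve(A, b)    # ordinary double-precision solution
fr <- fractions(x)   # rational approximation of each entry
print(fr)

# Substitute the solution back: the residual should be essentially zero
resid <- A %*% x - b
print(max(abs(resid)))
```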
For your second question, it looks like the plotEqn() function is in the matlib package: if you have that package installed, then either load it first (with library("matlib")) or call matlib::plotEqn(A,b); either should work.
On closer inspection it looks like you want matlib::Solve() for the first question (note that R is case-sensitive, so solve and Solve are different):
library(matlib)
Solve(A,b, fraction=TRUE)
## x1 = 4/21
## x2 = 32/21

How to counter the 'non-numeric matrix extent' error in R?

I'm trying to generate a data frame of simulated values from the Student's t distribution using the standard stochastic equation. The function I use is as follows:
matgen<-function(means,chi,covariancematrix)
{
cols<-ncol(means);
normals<-mvrnorm(n=500,mu=means,Sigma = covariancematrix);
invgammas<-rigamma(n=500,alpha=chi/2,beta=chi/2);
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=500));
i<-1;
while(i<=500)
{
gen[i,]<-t(means)+normals[i,]*sqrt(invgammas[i]);
i<=i+1;
}
return(gen);
}
In case it's not clear, I'm trying to create an empty data frame with cols columns and 500 rows to hold the values. The values are numeric, of course, and R tells me that on the 9th line:
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=500));
There's an error: 'non-numeric matrix extent'.
I remember using as.data.frame() to convert matrices into data frames in the past, and it worked quite smoothly. Even with numbers. I have been out of touch for a while, though, and can't seem to recollect or find online a solution to this problem. I tried is.numeric(), as.numeric(), 0s instead of NA there, but nothing works.
As Roland pointed out, one problem is that cols doesn't seem to be numeric. Check whether means is a data frame or matrix, e.g. with str(means). If it is, your code should not produce the error 'non-numeric matrix extent'; if means is a plain vector, ncol(means) returns NULL, and that NULL is what matrix() complains about.
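A minimal reproduction of where the message comes from, using a made-up vector in place of means:

```r
means_vec <- c(1, 2, 3)   # hypothetical location vector (a plain vector, not a matrix)

print(ncol(means_vec))    # NULL: a plain vector has no column dimension

# Passing that NULL on as ncol reproduces the error from the question
msg <- tryCatch(matrix(data = NA, ncol = ncol(means_vec), nrow = 5),
                error = function(e) conditionMessage(e))
print(msg)

# length() returns the number of elements, which is a usable column count here
gen <- as.data.frame(matrix(data = NA, ncol = length(means_vec), nrow = 5))
print(dim(gen))
```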
You also have some other issues in your code. I created a simplified example and pointed out the bugs I found as comments in the code:
library(MASS)
library(LearnBayes)
means <- cbind(c(1,2,3),c(4,5,6))
chi <- 10
matgen<-function(means,chi,covariancematrix)
{
cols <- ncol(means) # if means is a dataframe or matrix, this should work
normals <- rnorm(n=20,mean=100,sd=10) # changed example for simplification
# normals<-mvrnorm(n=20,mu=means,Sigma = covariancematrix)
# input to mu of mvrnorm should be a vector, see ?mvrnorm; but for a plain vector ncol(means) returns NULL, so use length(means) to count its elements
invgammas<-rigamma(n=20,a=chi/2,b=chi/2) # changed alpha= to a and beta= to b
gen<-as.data.frame(matrix(data=NA,ncol=cols,nrow=20))
i<-1
while(i<=20)
{
gen[i,]<-t(means)+normals[i]*sqrt(invgammas[i]) # changed normals[i,] to normals [i], because it is a vector
i<-i+1 # changed <= to <-
}
return(gen)
}
matgen(means,chi,covariancematrix)
I hope this helps.
P.S. You don't need ";" at the end of every line in R
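Putting those fixes together, here is a hedged, vectorised sketch of what matgen() seems to be aiming for: a multivariate Student's t draw built as a normal scale mixture, X = mu + Z * sqrt(W), with Z ~ N(0, Sigma) and W ~ InvGamma(chi/2, chi/2). Two details are my assumptions rather than the question's code: the normals are drawn with mean zero (the original adds the means twice, once through mvrnorm() and again in the loop), and the inverse-gamma draws use 1/rgamma() instead of LearnBayes::rigamma() to avoid the extra dependency. The name rmvt_sketch is hypothetical.

```r
library(MASS)  # for mvrnorm()

# Hypothetical helper: n draws from a multivariate t with location vector `means`,
# scale matrix `sigma` and `chi` degrees of freedom, returned as a data frame
rmvt_sketch <- function(n, means, chi, sigma) {
  p <- length(means)                                   # number of columns
  z <- mvrnorm(n = n, mu = rep(0, p), Sigma = sigma)   # centred normals, n x p
  w <- 1 / rgamma(n, shape = chi / 2, rate = chi / 2)  # inverse-gamma mixing weights
  # z * sqrt(w) scales row i by sqrt(w[i]); sweep() then adds means[j] to column j
  x <- sweep(z * sqrt(w), 2, means, "+")
  colnames(x) <- paste0("V", seq_len(p))
  as.data.frame(x)
}

set.seed(42)
sim <- rmvt_sketch(500, means = c(1, 2), chi = 10, sigma = diag(2))
head(sim)
```

Working on whole matrices also removes the while loop, and with it the i<=i+1 infinite-loop bug.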

Missing function(x) in defined function

I understand that in the following
aa <- sapply(c("BMI","KOL"),function(x) as.formula(paste('Surv(BL_AGE,CVD_AGE,INCIDENT_CVD) ~', paste(colnames(s)[c(21,259,330,380)], collapse='+'))))
I am missing x,
but I really don't understand how and where to insert it correctly.
Thank you for any help.
Making this an answer instead of a comment due to amount of text.
If I understand you correctly, you're trying to iterate over a list of variables, which you want to add (each in turn) to a set of independent variables in a survival model. The issue in the code you gave is that x never appears in the body of your function, so every iteration builds the same formula. There are several ways to give x its place.
The first one is very similar to what you're doing, and creates the formulas. I demonstrate this using the 'cancer' dataset:
library(survival)
data(cancer)
myvars <- c("meal.cal","wt.loss")
a1 <- sapply(myvars,function(x){
as.formula(sprintf("Surv(time, status)~age+sex+%s",x))
}
)
#then we can fit our models
lapply(a1,function(x){coxph(formula=x,data=cancer)})
In my opinion, this is a bit convoluted and can be done in one step:
models <- lapply(myvars, function(x){
form <- as.formula(sprintf("Surv(time, status)~age+sex+%s",x))
fit <- coxph(formula=form, data=cancer)
return(fit)
})
Using the code you started with, we can simply add x to the vector of independent variables. However, this is not very readable code, and I'm always a bit nervous about feeding column indices to models; you might be safer using variable names instead.
aa <- sapply(c("BMI","KOL"),function(x) as.formula(paste('Surv(BL_AGE,CVD_AGE,INCIDENT_CVD) ~', paste(c(x,colnames(s)[c(21,259,330,380)]), collapse='+'))))
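Another option, which avoids pasting and parsing strings altogether, is stats::reformulate(). A sketch using the lung data from survival (the same NCCTG lung cancer data, with the same columns); passing the left-hand side as the quoted call quote(Surv(time, status)) is my choice here.

```r
library(survival)  # coxph(), Surv() and the lung data

myvars <- c("meal.cal", "wt.loss")

# One formula per candidate variable: fixed covariates age + sex, plus x
forms <- lapply(myvars, function(x) {
  reformulate(c("age", "sex", x), response = quote(Surv(time, status)))
})
names(forms) <- myvars

# Fit one Cox model per formula
fits <- lapply(forms, function(f) coxph(f, data = lung))
print(forms$wt.loss)
```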

How to do box plots on a range of variables

I have a data matrix with approximately one hundred variables and I want to do box plots of these variables. Doing them one by one is possible, but tedious. The code I use for my box plots is:
boxplot(myVar ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T,las=2, ylab='Counts', at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
I started doing them one by one, but realized there must be better options. As far as I can tell, the boxplot call takes only one variable at a time (I may be wrong), so I am looking for a way to get it done in one go. A for loop? I would also like to print the name of the current variable (i.e. the column name) on each plot in order to keep them apart.
Appreciate suggestions.
Thank you.
jd
Why not try the following:
data(something)
panel.bxp <- function(x, ...)
{
usr <- par("usr"); on.exit(par(usr = usr))
par(usr = c(0, 2, usr[3:4]))
boxplot(x, add=TRUE)
}
Then, to run the function, you can try something like the following:
pairs(something, diag.panel = panel.bxp, text.panel = function(...){})
EDIT: There is also a nice link to an article here on R-bloggers which you might want to have a look at.
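A self-contained version of the pairs() idea, assuming the built-in iris measurements as stand-in data and a temporary PDF as output (both my choices, not from the answer). Leaving text.panel at its default keeps each variable's name on the diagonal, which also covers the wish to label each plot.

```r
# Diagonal panel for pairs(): draw a boxplot of each variable in its own cell
panel.bxp <- function(x, ...) {
  usr <- par("usr"); on.exit(par(usr = usr))  # restore the user coordinates afterwards
  par(usr = c(0, 2, usr[3:4]))                 # x range 0..2, so the box sits at x = 1
  boxplot(x, add = TRUE, axes = FALSE)         # pairs() draws its own axes
}

out <- file.path(tempdir(), "pairs_boxplots.pdf")
pdf(out)
pairs(iris[1:4], diag.panel = panel.bxp)  # default text.panel keeps the variable names
invisible(dev.off())
```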
Being very new to R, I've tried to follow my 'old' thinking - making a for-loop. Here is what I came up with. Probably very primitive, and therefore, I'd appreciate comments/suggestions. Anyway: the loop:
for (i in 1:ncol(final)) {
#print(i)
c <- colnames(final)[i]
#print(c)
b <- final[,i]
#b <- t(b)
#dim(b)
#print(b)
exp <- data.frame(Group,Trt,Time,b)
#dim(exp)
#print(exp)
boxplot(b ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T, las=2, ylab='Counts',main=c, at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
}
The loop runs through the data matrix 'final' (48 rows x 67 columns). It picks up the column header, c, which is used as the main title in the boxplot call, and the data column, b. It then sets up the experiment using the Group, Trt, and Time factors established outside the loop, and calls boxplot.
This seems to do what I want. Oddly, RStudio does not keep more than about 25 plots in the Plots pane, so I have to run this loop in a couple of rounds.
Anyway, sorry for answering my own question. Better solutions are greatly appreciated, since I suspect my way is pretty amateurish.
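One way around the Plots-pane limit is to send the loop's output to a file device such as pdf(), which writes one page per variable. A sketch with made-up stand-ins for final, Group, Trt and Time (48 rows in a 2 x 2 x 4 design, so the 16 at positions match); the file name is arbitrary, and the loop variables avoid the names c and exp, which shadow base functions.

```r
# Hypothetical stand-ins for the asker's objects: 48 samples, 3 variables
set.seed(1)
Group <- factor(rep(c("A", "B"), times = 24))
Trt   <- factor(rep(c("ctl", "trt"), each = 2, times = 12))
Time  <- factor(rep(c("t1", "t2", "t3", "t4"), each = 12))
final <- matrix(rpois(48 * 3, lambda = 20), nrow = 48,
                dimnames = list(NULL, c("var1", "var2", "var3")))

out <- file.path(tempdir(), "boxplots.pdf")
pdf(out)  # every plot becomes a page in this file
for (v in colnames(final)) {
  d <- data.frame(Group, Trt, Time, value = final[, v])
  boxplot(value ~ Group * Trt * Time, data = d, col = c("red", "blue"),
          frame.plot = TRUE, las = 2, ylab = "Counts", main = v,
          at = c(1:4, 6:9, 11:14, 16:19))  # 16 groups, with gaps between Time levels
}
invisible(dev.off())
```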

Creating formulas in R involving an arbitrary number of variables

I'm using the library poLCA. To use the main command of the library one has to create a formula as follows:
f <- cbind(V1,V2,V3)~1
After this a command is invoked:
poLCA(f,data0,...)
V1, V2, V3 are the names of variables in the dataset data0. I'm running a simulation and I need to change the formula several times. Sometimes it has 3 variables, sometimes 4, sometimes more.
If I try something like:
f <- cbind(get(names(data0)[1]),get(names(data0)[2]),get(names(data0)[3]))~1
it works fine. But then I have to know in advance how many variables I will use. I would like to define an arbitrary vector
vars0 <- c(1,5,17,21)
and then create the formula as follows
f<- cbind(get(names(data0)[var0]))
Unfortunately I get an error. I suspect the answer may involve some form of apply, but I still don't understand very well how these functions work. Thanks in advance for any help.
Using data from the examples in ?poLCA this (possibly hackish) idiom seems to work:
library(poLCA)
data(values)
vec <- c(1,3,4)
M4 <- poLCA(do.call(cbind,values[,vec])~1,values,nclass = 1)
Edit
As Hadley points out in the comments, we're making this a bit more complicated than we need. In this case values is a data frame, not a matrix, so this:
M1 <- poLCA(values[,c(1,2,4)]~1,values,nclass = 1)
generates an error, but this:
M1 <- poLCA(as.matrix(values[,c(1,2,4)])~1,values,nclass = 1)
works fine. So you can just subset the columns as long as you wrap it in as.matrix.
@DWin mentioned building the formula with paste and as.formula. I thought I'd show you what that would look like using the election dataset.
library("poLCA")
data(election)
vec <- c(1,3,4)
f <- as.formula(paste("cbind(",paste(names(election)[vec],collapse=","),")~1",sep=""))
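The same construction works for any index vector. A base-R sketch (the data0 frame below is a made-up stand-in) showing the paste()/as.formula() route next to building the call directly from symbols, which skips string parsing. Either resulting formula can then be passed to poLCA() as in the question.

```r
# Hypothetical stand-in for the asker's data0
data0 <- data.frame(V1 = 1:3, V2 = 1:3, V3 = 1:3, V4 = 1:3, V5 = 1:3)
vars0 <- c(1, 3, 4)
nms   <- names(data0)[vars0]   # "V1" "V3" "V4"

# 1) Paste the formula text and parse it with as.formula()
f_paste <- as.formula(paste0("cbind(", paste(nms, collapse = ","), ") ~ 1"))

# 2) Build cbind(V1, V3, V4) ~ 1 directly from symbols, with no string parsing
lhs    <- as.call(c(as.name("cbind"), lapply(nms, as.name)))
f_call <- eval(call("~", lhs, 1))

print(f_paste)
print(f_call)
```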
