Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
How do I get the VIF value without the GVIF and GVIF^(1/(2*Df)) columns? I have tried vif(model) and need just the VIF value, but the output I get is GVIF.
If you want the VIF value from a regression model, the simplest solution is using the car package:
library(car)
vif(model)
This returns the VIF value for each predictor:
gdp       labour_participation  m_per1000f  time_prison
1.100277  1.457567              1.667722    1.247356
If you want to calculate the VIF value manually (harder, but it needs no extra library), you can regress one predictor on the others and compute 1/(1 - R^2). This also lets you verify that the result above is correct:
vif_lp <- 1/(1-(summary(lm(labour_participation ~ gdp + m_per1000f + time_prison, crime))$r.squared))
vif_lp
# returns [1] 1.457567
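The manual calculation can be repeated for every predictor in a loop. Below is a minimal sketch in base R. It uses the built-in mtcars data and the predictors disp, hp and wt purely as stand-ins (both are assumptions, since the crime data is not shown). As far as I know, car::vif() switches to GVIF output when the model contains a term with more than one degree of freedom, such as a factor; for terms with one degree of freedom, GVIF is the same as the ordinary VIF computed here.

```r
# Illustrative predictors from the built-in mtcars data (an assumption,
# standing in for the predictors of your own model)
predictors <- c("disp", "hp", "wt")

manual_vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Auxiliary regression: predictor p on all remaining predictors
  aux <- lm(reformulate(others, response = p), data = mtcars)
  # VIF = 1 / (1 - R^2) of the auxiliary regression
  1 / (1 - summary(aux)$r.squared)
})

manual_vif
```

The result is a named numeric vector with one VIF per predictor, which should agree with vif() for a model that contains only these numeric terms.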
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed last month.
I have a question regarding the correlation coefficient.
Why does the coefficient come out as NA when both variables are numeric? Thanks
When I test different independent variables against a dependent variable, I get NA as the result on several occasions. This happens even when both the dependent and the independent variable are numeric.
There are likely two possible reasons:
One of the variables is constant.
There are NAs in your data. If so:
In R there are two functions that compute the Pearson correlation. Let's see an example.
Data
x <- rnorm(10)
y <- x;y[1] <- NA
There is the cor function
cor(x,y)
which returns NA by default. But if you change the use argument:
cor(x,y,use = "na.or.complete")
it returns 1. Another way is to use the function cor.test, which ignores missing values by default:
cor.test(x,y)
But since it is a test function, the output is a list object. If you only want the coefficient, you can extract it with:
cor.test(x,y)$estimate
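Both causes can be reproduced in a few lines. This is a small sketch with made-up data; the variable name const and the seed are only for illustration.

```r
set.seed(1)
x <- rnorm(10)
y <- x
y[1] <- NA            # one missing value
const <- rep(5, 10)   # a variable that never changes

# Cause 1: a constant variable has zero standard deviation,
# so cor() returns NA (with a warning, suppressed here)
r_const <- suppressWarnings(cor(x, const))

# Cause 2: missing values; default is NA, complete pairs give a number
r_default  <- cor(x, y)
r_complete <- cor(x, y, use = "complete.obs")

# Quick checks you can run on your own data
sd(const) == 0   # TRUE means the variable is constant
anyNA(y)         # TRUE means there are missing values
```

Checking sd() and anyNA() on each variable first usually tells you which of the two cases you are in.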
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
So I have data on CpG sites, and a column which defines their chromosomal position (e.g. 10000).
How would I change these values so that I get a range that depends on the original value? For example, 10000 would become +/- 500 (9500 - 10500).
I'm going to be using the same parameters for each variable regardless of its value.
I have tried
df$upstream <- df$value - 500
df$downstream <- df$value + 500
Which returns the upper and lower values I need, but how do I get this 'range' into a single column (e.g. such that I can search for it in genomebrowser)?
I have worked with this kind of dataset. To do this, I create new columns in the data frame (as mentioned in the comment):
df$upstream = df$position - 500
df$downstream = df$position + 500
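To get the range into a single column, the two bounds can be pasted together as text. The sketch below is only one way to do it: the example data, the chr column and the "chr:start-end" layout are assumptions, so adjust the format to whatever your genome browser expects.

```r
# Hypothetical example data; the column names chr and position are assumptions
df <- data.frame(chr = c("chr1", "chr2"), position = c(10000, 25000))

df$upstream   <- df$position - 500
df$downstream <- df$position + 500

# Build one text column such as "chr1:9500-10500".
# sprintf with %.0f avoids scientific notation (e.g. "1e+05") for round numbers.
df$range <- sprintf("%s:%.0f-%.0f", df$chr, df$upstream, df$downstream)

df
```

If you only need the numbers, dropping the chr part gives strings like "9500-10500".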
Hope it helped
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
Given a table, I need to use apply() to find the correlation between each of the 8 variables in the state.x77 matrix and the Population variable. state.x77 is a built-in matrix with 8 columns.
The instructions require me to first create a function called cor_var and then use apply(). Here is my input:
cor_var=function(v1,v2=state.x77[,"Income"]){cor(v1,v2)}
apply(mat,2,cor_var,v2=state.x77[,"Population"])
Here v2 is the extra optional argument passed through the ... argument of apply(), so this should work, but it returns Error in cor(v1, v2) : incompatible dimensions. Any help on where I went wrong would be appreciated. I have to use both cor_var and apply(), by the way; I can't use lapply or mapply.
You can use :
apply(state.x77,2,function(x) cor(x, state.x77[,"Income"]))
#Population Income Illiteracy Life Exp Murder HS Grad Frost Area
# 0.2082276 1.0000000 -0.4370752 0.3402553 -0.2300776 0.6199323 0.2262822 0.3633154
We can use the following, assuming mat is the state.x77 matrix itself:
apply(mat,2,cor_var,v2=state.x77[,"Population"])
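Here is a complete version that runs on the built-in data. Since mat is not shown in the question, one possible cause of the error (an assumption on my part) is that mat does not have the same number of rows as state.x77, so v1 and v2 end up with different lengths. Passing state.x77 directly avoids that.

```r
# cor_var as defined in the question
cor_var <- function(v1, v2 = state.x77[, "Income"]) {
  cor(v1, v2)
}

# MARGIN = 2 applies cor_var to each column; v2 is forwarded through `...`
pop_cor <- apply(state.x77, 2, cor_var, v2 = state.x77[, "Population"])

round(pop_cor, 3)
```

The result is a named vector with one correlation per column; the Population entry is 1 because that column is correlated with itself.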
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
set.seed(1234)
dataPartition <- sample(2,nrow(data),replace=TRUE,prob=c(0.7,0.3))
trainData <- data[dataPartition ==1,]
testData <- [dataPartition ==2,]
It partitions your data into two groups.
sample(2,nrow(data),replace=TRUE,prob=c(0.7,0.3))
This draws a vector with one element per row of your data. Each element is 1 with probability 0.7 or 2 with probability 0.3.
trainData <- data[dataPartition ==1,]
testData <- data[dataPartition ==2,] ## Fixed the brackets
This just divides your data into two parts, presumably so that you can fit a model on one part and validate it on the other.
Here is a more detailed answer on why you would divide your data into train and test sets:
https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set
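To see the split in action, here is a small runnable sketch. It uses the built-in iris data as a stand-in for data, which is an assumption since the original data is not shown.

```r
set.seed(1234)

data <- iris  # stand-in data set (assumption)

# One label per row: 1 (train) with probability 0.7, 2 (test) with probability 0.3
dataPartition <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))

trainData <- data[dataPartition == 1, ]
testData  <- data[dataPartition == 2, ]

# Group sizes are random, so the split is only approximately 70/30
table(dataPartition)
```

Because each row is assigned independently, the proportions vary from run to run unless the seed is fixed.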
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I am trying to make a histogram of grades. Here are my variables.
> grade <- factor(c("A","A","A","B","A","A","A","A","B","A","C","B","B","B"))
> numberBook <- c(53,42,40,40,39,34,34,30,28,24,22,21,20,16)
But when I plot it, I get an error message.
> hist(numberBook~grade)
Error in hist.default(numberBook ~ grade) : 'x' must be numeric
What can I do?
I'm not sure why you've got multiple letters so I've guessed that you want a total of all the A, B and Cs. This may not be quite right. I've recreated your data like this using rep and summing the counts of grades (could be wrong)
data <- c(rep("A",(53+42+40+39+34+34+30+24)), rep("B",(40+28+21+20+16)), rep("C",22))
Then I can plot the data using barplot:
barplot(prop.table(table(data)))
Barplot is probably what you want here.
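The original error comes from hist() itself: it expects a numeric vector and does not accept a formula like numberBook ~ grade. If the goal is the total per grade, the sums can be computed with tapply() instead of typing them by hand. This is only a sketch of one reading of the question; the axis labels are my own guesses.

```r
grade <- factor(c("A","A","A","B","A","A","A","A","B","A","C","B","B","B"))
numberBook <- c(53,42,40,40,39,34,34,30,28,24,22,21,20,16)

# Total number of books for each grade
totals <- tapply(numberBook, grade, sum)
totals

# One bar per grade
barplot(totals, xlab = "Grade", ylab = "Total books")

# Alternatively, show the spread of numberBook within each grade;
# boxplot() does accept a formula
boxplot(numberBook ~ grade)
```

If you really want histograms, call hist() separately on the numeric values of each grade, for example hist(numberBook[grade == "A"]).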