Partial correlation in R without correlating everything - r

I need help figuring out how to write the R code for a partial correlation; I'm still fairly new to R. I have a dataset with 22 columns of interval data. I'm trying to run a partial correlation of columns 19:22 with columns 2:18, controlling for (partialling out) column 1. I've used the following code:
par.r=partial.r(dataset, c(2:22), c(1))
The problem is that this gives me everything correlated together, which isn't what I'm looking for. If this were a standard correlation, I would use the code below:
normal.correlation = corr.test(dataset[,c(2:18)], dataset[,c(19:22)], method="spearman")
My question is how do I run a partial correlation with my variables without correlating everything? Thanks for any help you can provide.

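Is it because of a timing issue that you don't want to correlate everything? If not, you could use the following to extract the relevant part of the (full) partial correlation matrix. Rows 1:17 of the result correspond to columns 2:18 of the dataset, and columns 18:21 of the result correspond to columns 19:22:
par.r=partial.r(dataset, c(2:22), c(1))
par.r[1:17,18:21]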
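
For readers who want to see what is being computed, here is a minimal base-R sketch of the same idea. It assumes Pearson correlations and no missing values, and it relies on the fact that a partial correlation equals the correlation between the residuals left after regressing each variable on the control. The simulated `dataset` and the names `z`, `res` and `par.block` are stand-ins for illustration, not the asker's data.

```r
# Simulated stand-in for the asker's 22-column dataset (assumption)
set.seed(1)
dataset <- as.data.frame(matrix(rnorm(50 * 22), ncol = 22))

# Control variable: column 1
z <- dataset[[1]]

# Residuals of columns 2:22 after regressing each one on the control
res <- sapply(dataset[2:22], function(v) resid(lm(v ~ z)))

# Correlate only the two blocks of interest:
# residual columns 1:17 are dataset columns 2:18,
# residual columns 18:21 are dataset columns 19:22
par.block <- cor(res[, 1:17], res[, 18:21])
dim(par.block)
```

This computes only the 17 x 4 block rather than the full 21 x 21 matrix. Note that the question's ordinary correlation used Spearman, whereas this sketch is Pearson-based.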

Related

How to add weights to prcomp() to do PCA analysis and subsequently crossvalidate the model?

I have a dataset, df, stored as a .csv file, with one column holding weights, as shown below:
Outcome,Heat,Mobility,Time,weights
Good,125,0.2,9,2
Neutral,250,0.5,10,2
Bad,12,1.6,1,3
Good,162,0.1,9,1
Good,150,0.3,9,1
Bad,8,5.2,2,4
Neutral,330,0.2,12,3
Neutral,200,0.6,8,1
Bad,50,12,4,3
Good,130,0.9,10,4
I usually begin a PCA by using prcomp(df[,2:4]), but there doesn't seem to be any option to add the weights.
I tried prcomp(df[,2:4],scale. =as.numeric(unlist(df[5]))), but that gave errors stating that the number of columns provided was not suitable. Is there a way to add the weight associated with each row here?
Also, how do I go about cross-validating the model I generate here using the "leave-one-out" approach?

Creating a data frame in R

I am using some simple forecasting methods, such as Naive, for my project. I am using accuracy(naive_train, test) to measure the accuracy of these methods. Right now I have this as my output of the meanf method.
However, I want to create a data frame like this to compare the different methods.
How can I make a data frame like this and add an extra column on the right indicating which method each row belongs to? Thank you!

Not losing observations when faced with missing data

I have a dataset to which I've fitted a linear model, and I've tried to use the step function on this linear model. I get an error message saying "number of rows in use has changed: remove missing values?".
I noticed that a few of the observations (not many) in my dataset had NA values for one variable. I've seen similar questions which suggest using na.omit(), but when I do this I lose the observations. I want to keep the observations however, because they contain useful information for the other variables. Is there a way to use step and avoid losing the observations?
You can call the nobs function to check that the number of observations is unchanged. Its use.fallback argument controls whether it falls back to guessing that number when it cannot be determined directly; it does not fill in missing values. The R documentation, however, recommends removing the rows with missing values before running step.
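
As a minimal sketch of that recommendation, assuming the built-in airquality data as a stand-in (it has missing values in Ozone and Solar.R), the rows can be restricted to complete cases for the candidate variables only, so that every model step compares is fitted to the same rows. The names `vars`, `complete`, `full` and `selected` are illustrative.

```r
# Built-in example data with NAs in some columns (stand-in, not the asker's data)
data(airquality)

# Only the variables that can enter the model decide which rows are dropped
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
complete <- airquality[complete.cases(airquality[, vars]), ]

# Fit the full model on the filtered data, then run stepwise selection
full <- lm(Ozone ~ Solar.R + Wind + Temp, data = complete)
selected <- step(full, trace = 0)

# The number of observations is the same before and after selection
nobs(full)
nobs(selected)
```

Rows are still lost here, but only those missing a value in a variable the model could actually use; NAs in unrelated columns do not cost any observations.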
I would discourage you from simply omitting the missing values if they are indeed really missing. You can use multiple imputation via the Amelia package to impute the data so that you have a full dataset.
See here: https://cran.r-project.org/web/packages/Amelia/Amelia.pdf
I would also recommend reviewing the book "Statistical Analysis with Missing Data" by R.J.A. Little and D.B. Rubin.

Applying univariate coxph function to multiple covariates (columns) at once

First, I gathered from this link (Applying a function to multiple columns) that using the "function" function would perhaps do what I'm looking for. However, I have not been able to make the leap from thinking about it in the way presented to making it actually work in my situation (or really even knowing where to start). I'm a beginner in R, so I apologize in advance if this is a really "newb" question.
My data is a data frame that consists of an event variable (tumor recurrence) and a time variable (follow-up time/time to recurrence), as well as recurrence risk factors (t-stage, tumor size, age at dx, etc.). Some risk factors are categorical and some are continuous. I have been running my univariate analysis by hand, one factor at a time, like this example: univariateageatdx<-coxph(survobj~agedx), and then collecting the data. This gets very tedious for multiple factors and for a few different recurrence types.
I figured there must be a way to write one line of code containing the coxph equation, apply it to all of my variables of interest, and get back the univariate analysis results for each factor. I tried using cbind to bind variables (i.e. x<-cbind("agedx","tumor size")) and then running coxph(recurrencesurvobj~x), but this of course just did the multivariate analysis on these variables and didn't split them out as true univariate analyses.
I also tried the following code based on a similar problem that I found on a different site, but it gave the error shown and I don't know quite what to make of it. Is this on the right track?
f <- as.formula(paste('regionalsurvobj ~', paste(colnames(nodcistradmasvssubcutmasR)[6:9], collapse='+')))
I then ran it as coxph(f).
This gave me the results of a multivariate Cox analysis.
Thanks!
**Edit: I just fixed the error. I suppose I needed to use the column numbers, not the names. The changes are reflected in the code above. However, it still runs the selected variables as a multivariate analysis and not as the true univariate analysis...
If you want to go the formula route (which, in your case with multiple outcomes and multiple variables, might be the most practical way to go about it), you need to create one formula per model you want to fit. I've split the steps up a bit here (making formulas, making models and extracting data). They can of course be combined, but keeping them separate allows you to inspect all your models.
#example using transplant data from survival package
library(survival)
#make new event-variable: death or no death
#to have dichot outcome
transplant$death <- transplant$event=="death"
#making formulas (one per covariate)
univ_formulas <- sapply(c("age","sex","abo"), function(x) as.formula(paste('Surv(futime,death)~',x)))
#making a list of models
univ_models <- lapply(univ_formulas, function(x){coxph(x,data=transplant)})
#extract data (here I've gone for HR and confint)
univ_results <- lapply(univ_models, function(x){exp(cbind(coef(x),confint(x)))})
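
Since the question asks for a single result covering every factor, here is one possible sketch that collects the per-covariate fits into one data frame. It reuses the transplant example and assumes the column names that summary() of a coxph fit returns by default ("exp(coef)", "lower .95", "upper .95", "Pr(>|z|)"). The names `covariates` and `univ_table` are illustrative.

```r
library(survival)

# Same dichotomous outcome as in the transplant example
transplant$death <- transplant$event == "death"
covariates <- c("age", "sex", "abo")

# Fit one univariate Cox model per covariate and stack the summaries
univ_table <- do.call(rbind, lapply(covariates, function(v) {
  fit <- coxph(as.formula(paste("Surv(futime, death) ~", v)), data = transplant)
  s <- summary(fit)
  data.frame(
    covariate = v,                         # variable the model was fitted on
    term      = rownames(s$conf.int),      # coefficient (one per factor level)
    HR        = s$conf.int[, "exp(coef)"],
    lower     = s$conf.int[, "lower .95"],
    upper     = s$conf.int[, "upper .95"],
    p         = s$coefficients[, "Pr(>|z|)"],
    row.names = NULL
  )
}))

univ_table
```

Categorical covariates contribute one row per non-reference level, so the table can have more rows than there are covariates.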

Princomp error in R : covariance matrix is not non-negative definite

I have this script, which does a simple PCA on a number of variables and at the end attaches two coordinates and two other columns (presence, NZ_Field) to the output file. I have done this many times before, but now it's giving me this error:
covariance matrix is not non-negative definite
I understand that it means there are negative eigenvalues. I looked at similar posts, which suggest using na.omit, but it didn't work.
I have uploaded the "biodata.Rdata" file here:
https://www.dropbox.com/s/1ex2z72lilxe16l/biodata.rdata?dl=0
I am pretty sure it is not because of missing values in the data, because I have used the same data with different "presence" and "NZ_Field" columns.
Any help is highly appreciated.
load("biodata.rdata")
#save data separately
coords=biodata[,1:2]
biovars=biodata[,3:21]
presence=biodata[,22]
NZ_Field=biodata[,23]
#Do PCA
bpc=princomp(biovars ,cor=TRUE)
#re-attach data with auxiliary data..coordinates, presence and NZ location data
PCresults=cbind(coords, bpc$scores[,1:3], presence, NZ_Field)
write.table(PCresults,file= "hlb_pca_all.txt", sep= ",",row.names=FALSE)
This does appear to be an issue with missing data, and there are a few ways to deal with it. One way is to do listwise deletion on the data manually before running the PCA, which in your case would be:
biovars<-biovars[complete.cases(biovars),]
The other option is to use another package. Specifically, psych seems to work well here: you can use principal(biovars), and while the output is a bit different, it does work, using pairwise deletion. So basically it comes down to whether you want pairwise or listwise deletion.
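
One thing to watch with listwise deletion: if rows are dropped from biovars alone, coords, presence and NZ_Field keep their original length, and the later cbind fails because the row counts differ. A minimal sketch of filtering everything with the same row mask follows. It uses a small simulated stand-in for biodata (two coordinate columns, three variables, one auxiliary column), so the column positions and the names `keep` and `clean` are assumptions for illustration.

```r
# Simulated stand-in for biodata: cols 1:2 coordinates, 3:5 variables, 6 auxiliary
set.seed(42)
biodata <- as.data.frame(matrix(rnorm(40 * 6), ncol = 6))
biodata[c(3, 17), 4] <- NA   # inject two missing values

# One row mask, based only on the PCA variables, applied to the whole data frame
keep  <- complete.cases(biodata[, 3:5])
clean <- biodata[keep, ]

# PCA on the complete rows
bpc <- princomp(clean[, 3:5], cor = TRUE)

# Coordinates, scores and auxiliary data now have matching row counts
PCresults <- cbind(clean[, 1:2], bpc$scores[, 1:2], clean[, 6, drop = FALSE])
nrow(PCresults)
```

Applying the mask to the full data frame before splitting it keeps every piece aligned, whichever columns are later attached to the scores.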

Resources