I am trying to run a survival analysis on a set of data I have collected. In this data frame (m3), each row is a patient and each column is a mutation I have identified. I have made a binary data table to indicate whether each patient is positive or negative for each mutation. I can run survdiff for each column (mutation) individually, but I have hundreds and want to loop through them. I have written the following code, but I don't think it is correct (nothing is output).
for (i in m3[,2:256]) {
  survdiff(Surv(m3$Overall.Survival, m3$Status) ~ i, data = m3)
}
Once I gather this data, I want to make a table with one row per mutation (column) and the p-value from the corresponding survdiff object in a column.
I'm not sure why the for loop produces no output, and even less sure how to generate the new data frame. I believe I would be subsetting it.
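One likely reason for the missing output is that R does not auto-print the value of an expression inside a for loop, so each survdiff result is computed and then discarded. A minimal sketch of one way to collect the p-values instead, assuming the survival package is installed: the toy m3 below and the column names mutA, mutB, and mutC are hypothetical stand-ins for the real data, and the p-value is computed from the chi-square statistic with one degree of freedom fewer than the number of groups.

```r
library(survival)

# Hypothetical toy data standing in for the real m3
set.seed(1)
n <- 60
m3 <- data.frame(
  Overall.Survival = rexp(n, rate = 0.1),
  Status = rbinom(n, 1, 0.7),
  mutA = rbinom(n, 1, 0.5),
  mutB = rbinom(n, 1, 0.4),
  mutC = rbinom(n, 1, 0.3)
)

# Names of the mutation columns (m3[, 2:256] in the question)
mutation_cols <- names(m3)[3:5]

# Run one log-rank test per mutation and keep only its p-value
p_values <- sapply(mutation_cols, function(col) {
  f <- as.formula(paste0("Surv(Overall.Survival, Status) ~ `", col, "`"))
  fit <- survdiff(f, data = m3)
  pchisq(fit$chisq, df = length(fit$n) - 1, lower.tail = FALSE)
})

# One row per mutation, with its p-value as a column
results <- data.frame(mutation = mutation_cols, p_value = unname(p_values))
print(results)
```

Looping over column names rather than over the columns themselves keeps the mutation label available for the results table. A column in which every patient has the same value would make survdiff stop with an error, so such columns would need to be filtered out first.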
I have performed an operation using the mclust package on a nonmissing data frame. The nonmissing data frame was created using the dplyr package by using the select function. As such, row.names appears as a vector in the data frame passed to the mclust function.
I next have extracted some critical values (the case 'classification') from this function as:
class<-functionobject$classification
Thus, the numeric list of classification values is associated with row.names.
When I attempt to append this list of values to a new data frame of the same length (the same cases) without row.names, it seems I lose important ordering. I know this because when I compare classification groups on other variables in the new data frame, the results do not match the values obtained from the mclust function using those same variables.
The reason I cannot simply append to the nonmissing data frame (with row.names) used in the mclust function is that I require other variables from the data set that were not used in the function and that needed to be merged on ID variables as:
NEW_DF=merge(mclust_DF, other_DF, by=c("X1", "X2"))
So I end up with a data frame of the same length, but one that no longer has row.names, to which I want to append the classification values from the mclust function described above. Although no errors are thrown when I use:
FINAL_DF<- cbind.data.frame(NEW_DF, class)
the data are off, as inspection of group (class) means on relevant variables shows they do NOT equal those from the mclust function (which they should, as it is the same core input data).
I realize I am missing something obvious here, but I have not found an answer despite an exhaustive search of the archives. What is the correct way to go about this rather tedious wrangling?
FWIW: a simple, though perhaps still inefficient, solution was to bind the saved classification data from the mclust function to the nonmissing data frame BEFORE merging with the additional validation variables. When the merge occurs, the 'row.names' vector induced by dplyr in the select step is lost and the cases are re-sorted.
This solution dawned on me when I realized that the mclust function was based on the nonmissing data frame (created using dplyr), and thus the resulting data objects followed the case ordering of the input data (by row.names).
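The bind-before-merge approach can be sketched as follows. This is only an illustration under stated assumptions: the three-row data frames and the classification vector are hypothetical stand-ins (no mclust call is made), and it relies on merge() sorting its result by the key columns by default, which is why values bound on afterwards with cbind no longer line up.

```r
# Hypothetical stand-in for the data frame passed to mclust (note the row order)
mclust_DF <- data.frame(
  X1 = c(3, 1, 2),
  X2 = c("c", "a", "b"),
  score = c(0.3, 0.1, 0.2)
)

# Hypothetical stand-in for functionobject$classification,
# given in the same row order as mclust_DF
classification <- c(2L, 1L, 1L)

# Hypothetical extra variables keyed on the same ID columns
other_DF <- data.frame(
  X1 = c(1, 2, 3),
  X2 = c("a", "b", "c"),
  extra = c(10, 20, 30)
)

# Attach the classification while the rows still match the mclust input ...
mclust_DF$class <- classification

# ... then merge; each class value travels with its own case
FINAL_DF <- merge(mclust_DF, other_DF, by = c("X1", "X2"))
print(FINAL_DF)
```

Because the class column is part of mclust_DF before the merge, any re-sorting that merge() performs moves each classification together with its IDs.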
I am looking at some code that prepares a dataframe for several prediction models to be tested later on. The general idea is to predict NormSec based on all the other columns.
Not sure what predict(dummies,newdata=data) does in this case.
I know that predict is used to predict based on an already trained fit. Why is it used in this case? The code works, just trying to understand it.
data<-read.csv(file="datatable.csv")
attach(data)
#selecting the useful columns from data table:
data<-data.frame(NormSec, Rivalry, Stars, NormFB, SeasonPart, FootballSeason, LeBron, Weekend, LastSeasonWins,
Holiday, BigGame, OverUnders, DaysSinceLast, DaysUntilNext, Weekday, Monthday, NewArena)
dummies <- dummyVars(NormSec~., data = data)
attach(dummies)
#Here is the function I don't get:
dataDescr<-predict(dummies,newdata=data)
dataDescr<-data.frame(dataDescr)
attach(dataDescr)
dummies is a dummy-variable object, and dataDescr (the output of predict()) is the original data frame without the NormSec column.
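For illustration, here is a small sketch of what predict() does with a dummyVars object, assuming the caret package is installed. The toy data frame is hypothetical and borrows only three column names from the question. In this usage predict() is not forecasting anything: it dispatches to caret's method for dummyVars objects, which applies the stored encoding to newdata, dropping the outcome and expanding each factor into one indicator column per level.

```r
library(caret)

# Hypothetical toy data; the column names mirror a few from the question
toy <- data.frame(
  NormSec = c(1.2, 0.8, 1.5, 0.9),
  Stars   = c(3, 1, 4, 2),
  Weekday = factor(c("Mon", "Tue", "Wed", "Mon"))
)

# Record how each predictor should be encoded (NormSec is the outcome)
dummies <- dummyVars(NormSec ~ ., data = toy)

# Apply that encoding: numeric columns pass through,
# the factor becomes one 0/1 column per level
dataDescr <- data.frame(predict(dummies, newdata = toy))
print(dataDescr)
```

If every predictor in the original data is already numeric, the result would look like the input minus the outcome column, which matches the observation in the question.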
I am attempting to run a chi-square analysis on the data frame (called "habitat.re") below. However, I'm having difficulty: I've gotten it to read the data, but it's giving the wrong results. When I prompt it with $expected, it returns 18 different columns when there should be 3 (one for each site).
All the tutorials I've been able to find have the data as a table; however, I've not been able to convert it correctly myself.
The chisq.test function is intended to work with two variables, or columns in this case. If you want to compare all three of your columns, then I suspect you would want to compare 1-2, 2-3, and 1-3, e.g.
chisq.test(x=habitat.re$Gidgee, y=habitat.re$`Ian's Place`)
chisq.test(x=habitat.re$`Ian's Place`, y=habitat.re$`Saw Mulga`)
chisq.test(x=habitat.re$Gidgee, y=habitat.re$`Saw Mulga`)
Actually, just typing in the above should print much useful information directly to the R console, something like this:
data: habitat.re$Gidgee and habitat.re$`Ian's Place`
X-squared = 5.5569, df = 1, p-value = 0.01841
A sufficiently low p-value might indicate that the two columns are in fact dependent.
Pearson's Chi-Squared Test requires a data frame to be made into a matrix table containing only the variables you need as numerical values. N.B. my data frame is called "habitat.re"
habitat.df<-data.matrix(habitat.re, rownames.force = NA)# convert to matrix table
habitat.df<- habitat.df[,-c(1,2,3)] # delete first 3 columns
rownames(habitat.df) <- habitat.re$COMMON.NAME #pull names from original
chisq.test(habitat.df) #do chisquare test
chisq.test(habitat.df)$expected #return predicted values
[Images of the habitat.re and habitat.df data frames]
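A minimal sketch of the matrix approach on made-up numbers, to show the shape of the result. The counts and species names are hypothetical; only the three site names come from the question. Passing a single numeric matrix to chisq.test treats it as a contingency table, so $expected has one column per site.

```r
# Hypothetical counts: rows are species, columns are the three sites
habitat_counts <- matrix(
  c(30, 25, 20,
    15, 30, 25,
    40, 20, 35),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("species_a", "species_b", "species_c"),
    c("Gidgee", "Ian's Place", "Saw Mulga")
  )
)

# A numeric matrix is treated as a contingency table
result <- chisq.test(habitat_counts)

print(result$p.value)
print(result$expected)  # same dimensions as the input: one column per site
```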
I have a data frame in which the element at index (i,j) is the sample mean obtained under method j and during time period i. I have another data frame which contains the standard errors of the corresponding sample means. I can convert either data frame to LaTeX table using the stargazer package but what I would like to do is to create a single LaTeX table using both data frames such that the element at index (i,j) of this table should contain the corresponding sample mean right above the standard error of this mean in parentheses. Obviously, I want to avoid manual data entry as much as possible.
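One possible way to build such a table without manual entry is to interleave the two data frames into a single character matrix and write the LaTeX rows from it. The sketch below uses base R only; the two small data frames, the method names A and B, and the period labels P1 and P2 are hypothetical, and two decimal places is an arbitrary choice.

```r
# Hypothetical inputs: rows are time periods, columns are methods
means <- data.frame(A = c(1.234, 2.345), B = c(3.456, 4.567),
                    row.names = c("P1", "P2"))
ses <- data.frame(A = c(0.11, 0.22), B = c(0.33, 0.44),
                  row.names = c("P1", "P2"))

# Format both as text; standard errors go in parentheses
mean_txt <- format(round(as.matrix(means), 2), nsmall = 2)
se_txt <- paste0("(", format(round(as.matrix(ses), 2), nsmall = 2), ")")
dim(se_txt) <- dim(mean_txt)  # paste0 drops the matrix shape, so restore it

# Interleave: odd rows hold the means, even rows the standard errors
combined <- matrix("", nrow = 2 * nrow(mean_txt), ncol = ncol(mean_txt))
combined[seq(1, nrow(combined), by = 2), ] <- mean_txt
combined[seq(2, nrow(combined), by = 2), ] <- se_txt
colnames(combined) <- colnames(means)
row_labels <- as.vector(rbind(rownames(means), ""))

# Assemble a plain LaTeX tabular from the interleaved matrix
latex <- c(
  paste0("\\begin{tabular}{l", strrep("c", ncol(combined)), "}"),
  paste0(" & ", paste(colnames(combined), collapse = " & "), " \\\\"),
  paste0(row_labels, " & ", apply(combined, 1, paste, collapse = " & "), " \\\\"),
  "\\end{tabular}"
)
cat(latex, sep = "\n")
```

The interleaved character matrix could also be handed to a table-printing package instead of being written out by hand, provided that package is told to print the matrix as-is rather than summarise it.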
I am new to R and trying to use wilcox.test on my data: I have a 36021 x 246 data frame with probe IDs as rownames, and the last row is a label that indicates which group each sample belongs to ("control" for the first 140 and "treated" for the last 106).
I would greatly appreciate knowing how to define the two groups when I perform the test....I am unable to find much information on the "formula" argument online except that -
"formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups."
If someone could explain what lhs~rhs means and how to define this formula I would really appreciate it.
Thanks!
R typically assumes that each row is a case and the columns are associated variables. If the cases from both your samples occur in the same data frame, one column would be an indicator variable for sample membership. Let's call it IndSample. The Wilcoxon is a univariate test, so you would have another column containing the response values you are testing on. Let's call it Y. You then write
wilcox.test(Y ~ IndSample, data=MyData, .....)
and the rest of your parameters for the test: is it two-sided? Do you want an exact statistic? (Probably not, in your case.)
It looks to me as if your data is on its side. That's problematic with a data frame, since you can't just pull out a row from a data frame, the way you would with a matrix.
You need to grab the last row and turn it into a factor, something like
Group <- factor(unlist(MyData[lastrow,]))
Then pull out the row that contains your response:
Y <- as.numeric(unlist(MyData[ResponseRow,]))
Then do the Wilcoxon test.
However, I am not sure that I have properly understood your situation. That seems to be a very large data matrix for a modest Wilcoxon test.
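If the intent is one test per probe, the steps above might be sketched like this. Everything in the example is hypothetical: a small random matrix stands in for the 36021 x 246 data, and the group labels are kept in a separate factor rather than in the last row of the data frame.

```r
set.seed(42)

# Hypothetical expression matrix: 3 probes (rows) by 10 samples (columns)
expr <- matrix(
  rnorm(3 * 10), nrow = 3,
  dimnames = list(c("probe1", "probe2", "probe3"), paste0("s", 1:10))
)

# Group label for each sample (column); 6 control then 4 treated
group <- factor(rep(c("control", "treated"), times = c(6, 4)))

# In lhs ~ rhs, lhs is the numeric response and rhs is the two-level factor;
# here the test is run once per probe
p_values <- apply(expr, 1, function(y) wilcox.test(y ~ group)$p.value)
print(p_values)
```

Keeping the labels out of the numeric data avoids the whole data frame being read as text, which is what tends to happen when a row of labels is mixed in with the measurements.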