Sort matrix keep colnames - r

I ran a regression and stored all the estimated coefficients in beta.form. R tells me it is a matrix with dimensions 1 x 161. I would like to sort the coefficients in increasing order.
My code:
beta.form <- reg.form$coefficients
beta.ranked <- sort(beta.form, decreasing=FALSE)
My problem: I would like to keep the names of the stocks, but beta.ranked contains only the sorted values. That is a good start, but I need to know which value belongs to which stock.
If anyone could tell me how to sort while keeping colnames, I would appreciate it a lot!

Ok I solved it!
First I converted the output from the regression into a data.frame and then sorted it. The names and values are presented together.
The mistake I made was that I coerced the data into a vector and the structure of 1x161 was lost.
Solved!
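For anyone who wants to keep the 1 x N matrix rather than convert to a data.frame, here is a minimal sketch of an alternative: reorder the columns with order() and keep the matrix shape with drop = FALSE. The stock names and values below are made up for illustration; they are not from the original regression.

```r
# Hypothetical 1 x 3 coefficient matrix with stock names as column names
beta.form <- matrix(c(0.8, -0.2, 1.5), nrow = 1,
                    dimnames = list("beta", c("AAPL", "MSFT", "IBM")))

# order() returns the column positions in increasing order of the values;
# drop = FALSE keeps the result a 1 x N matrix, so the colnames travel along
beta.ranked <- beta.form[, order(beta.form[1, ]), drop = FALSE]

# Alternatively, extract the row as a named vector first; sort() keeps names
beta.sorted.vec <- sort(beta.form[1, ], decreasing = FALSE)
```

Calling sort() directly on the matrix drops the dimnames, which is why only the values came back in the question.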

Related

How to assign a vector to dataframe when vector has fewer rows than dataframe?

I am trying to calculate stock returns on the basis of prices, but since returns will only be calculable from t=2 and so on, I'm one replacement short from being able to assign the returns vector to the original dataframe.
I have assigned NA values to the dframe$returns to begin with, but this hasn't helped. I need a way to come around this issue.
Here's what I have so far:
dframe$returns <- NA
n <- nrow(dframe)
dframe$returns <- (dframe[2:n,adj.price]-dframe[1:(n-1),adj.price])/dframe[1:(n-1),adj.price]
I've tried Google, and I only find people creating return vectors, which is not the real problem, as my issue is how to deal with assigning a vector to a dataframe, where the vector is shorter than the dataframe. Anyone here that can help me?
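One possible way around the length mismatch, sketched here under the assumption that the price column is literally named adj.price: pad the return vector with a leading NA so it has one entry per row. The prices below are invented for the example.

```r
# Hypothetical price data; the column name adj.price is assumed
dframe <- data.frame(adj.price = c(100, 110, 99, 108.9))

n <- nrow(dframe)
p <- dframe$adj.price

# Simple returns exist only from the second row on, so prepend one NA
dframe$returns <- c(NA, (p[2:n] - p[1:(n - 1)]) / p[1:(n - 1)])
```

An equivalent alternative is to assign into a slice of a pre-filled column, e.g. dframe$returns[2:n] <- ..., after setting dframe$returns <- NA.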

how to make groups of variables from a data frame in R?

Dear friends, I would appreciate it if someone could help me with a question in R.
I have a data frame with 8 variables, let's say (v1,v2,...,v8). I would like to produce groups of datasets based on all possible combinations of these variables. That is, with a set of 8 variables I can produce 2^8-1=255 subsets of variables like {v1},{v2},...,{v8}, {v1,v2},....,{v1,v2,v3},....,{v1,v2,...,v8}
My goal is to compute a specific statistic for each of these groupings and then compare which subset produces a better statistic. My problem is how to produce these combinations.
thanks in advance
You need the function combn. It creates all the combinations of a vector that you provide it. For instance, in your example:
names(yourdataframe) <- c("V1","V2","V3","V4","V5","V6","V7","V8")
varnames <- names(yourdataframe)
combn(x = varnames,m = 3)
This gives you all combinations of V1-V8 taken 3 at a time.
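To get every non-empty subset rather than only those of size 3, one option is to call combn once per subset size and collect the results in a single list. This is just a sketch; the name all_subsets is made up here.

```r
varnames <- paste0("V", 1:8)

# For each size m = 1..8, combn(..., simplify = FALSE) returns a list of
# character vectors; unlist(recursive = FALSE) flattens one level only
all_subsets <- unlist(
  lapply(seq_along(varnames),
         function(m) combn(varnames, m, simplify = FALSE)),
  recursive = FALSE
)

length(all_subsets)  # 2^8 - 1 = 255 non-empty subsets
```

Each element can then be used to subset the data frame, e.g. yourdataframe[, all_subsets[[10]], drop = FALSE].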
I'll use data.table instead of data.frame;
I'll include an extraneous variable for robustness.
This will get you your subsetted data frames:
library(data.table)
nn<-8L
dt<-setnames(as.data.table(cbind(1:100,matrix(rnorm(100*nn),ncol=nn))),
             c("id",paste0("V",1:nn)))
#should be a smarter (read: more easily generalized) way to produce this,
# but it's eluding me for now...
#basically, this generates the indices to include when subsetting
x<-cbind(rep(c(0,1),each=128),
rep(rep(c(0,1),each=64),2),
rep(rep(c(0,1),each=32),4),
rep(rep(c(0,1),each=16),8),
rep(rep(c(0,1),each=8),16),
rep(rep(c(0,1),each=4),32),
rep(rep(c(0,1),each=2),64),
rep(c(0,1),128)) *
t(matrix(rep(1:nn),2^nn,nrow=nn))
#now get the correct column names for each subset
# by subscripting the nonzero elements
incl<-lapply(1:(2^nn),function(y){paste0("V",1:nn)[x[y,][x[y,]!=0]]})
#now subset the data.table for each subset
ans<-lapply(1:(2^nn),function(y){dt[,incl[[y]],with=FALSE]})
You said you wanted some statistics from each subset, in which case it may be more useful to instead specify the last line as:
ans2<-lapply(1:(2^nn),function(y){unlist(dt[,incl[[y]],with=FALSE])})
#exclude the first element, which is NULL (the empty subset)
means<-lapply(2:(2^nn),function(y){mean(ans2[[y]])})

How to adapt wilcox.test to my data in R?

I am new to R and trying to use wilcox.test on my data: I have a 36021 x 246 data frame with probe IDs as row names, and the last row is a label which indicates which group the samples belong to - "control" for the first 140 and "treated" for the last 106.
I would greatly appreciate knowing how to define the two groups when I perform the test. I am unable to find much information on the "formula" argument online except that -
"formula
a formula of the form lhs ~ rhs where lhs is a numeric variable giving the data values and rhs a factor with two levels giving the corresponding groups."
If someone could explain what lhs~rhs means and how to define this formula I would really appreciate it.
Thanks!
R typically assumes that each row is a case and the columns are associated variables. If the cases from both your samples occur in the same data frame, one column would be an indicator variable for sample membership. Let's call it IndSample. The Wilcoxon is a univariate test, so you would have another column containing the response values you are testing on. Let's call it Y. You then write
wilcox.test(Y ~ IndSample, data=MyData, .....)
and the rest of your parameters for the test: is it two-sided? Do you want an exact statistic? (Probably not, in your case.)
It looks to me as if your data is on its side. That's problematic with a data frame, since you can't just pull out a row from a data frame, the way you would with a matrix.
You need to grab the last row and turn it into a factor - something like
IndSample <- factor(unlist(MyData[lastrow,]))
Then pull out the row that contains your response:
Y <- as.numeric(as.character(unlist(MyData[ResponseRow,])))
Then run the Wilcoxon test on Y and IndSample.
However, I am not sure that I have properly understood your situation. That seems to be a very large data matrix for a modest wilcoxon test.
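To make the steps above concrete, here is a small self-contained sketch on invented data laid out "on its side": 3 probes by 10 samples, with the group label stored as the last row. The probe names, sample names and group sizes are all made up; the real data would have 36021 probes and 246 samples.

```r
set.seed(1)

# Hypothetical expression values: rows are probes, columns are samples
expr <- matrix(rnorm(3 * 10), nrow = 3,
               dimnames = list(paste0("probe", 1:3), paste0("s", 1:10)))
labels <- rep(c("control", "treated"), times = c(6, 4))

# Appending the label row turns every column into character
MyData <- as.data.frame(rbind(expr, label = labels), stringsAsFactors = FALSE)

lastrow <- nrow(MyData)
group <- factor(unlist(MyData[lastrow, ]))   # rhs: factor with two levels

# One test per probe: lhs is the numeric response, rhs the grouping factor
pvals <- sapply(rownames(MyData)[-lastrow], function(p) {
  y <- as.numeric(unlist(MyData[p, ]))
  wilcox.test(y ~ group)$p.value
})
```

In lhs ~ rhs, the left-hand side is the vector of measurements and the right-hand side says which group each measurement belongs to. With tens of thousands of probes, the resulting p-values would normally need a multiple-testing adjustment such as p.adjust().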

How do I match fitted(gamm4.model) values with DF despite NAs?

I'm trying to make use of the fitted values from a gamm4 model and need them to match up with the right rows in the dataframe I'm working with.
Here's the model I run:
gam.outcome <- gamm4(formula = outcome ~ male + s(gpa),
random = ~ (1|school),
data=avr, na.action="na.exclude")
With an lmer object the "na.exclude" option leaves NAs in the fitted values so that a fitted(lmer.output) call returns a vector the same length and order as the dataframe. But in gamm4 I've tried fitted(gam.outcome$gam) and fitted(gam.outcome$mer) but don't know how to deal with the results of either. The latter omits all NA, despite the "na.exclude" option. The former includes twice as many NA values as lmer which should be a clue of some kind, but I'm too thick to get it. All I know is that either way the vector doesn't line up with the original data.
I imagine there is more than one way to solve my problem. I greatly appreciate help improving or tagging my question as well as answering it. Thanks!
Approximately (untested):
myfitted <- numeric(nrow(avr))
myfitted[!complete.cases(avr)] <- NA
myfitted[complete.cases(avr)] <- fitted(gam.outcome$mer)
Or (also untested)
avrframe <- model.frame(outcome~male+gpa+school,data=avr,na.action=na.exclude)
napredict(attr(avrframe,"na.action"),fitted(gam.outcome$mer))
The first solution assumes that all of the NA values in avr are either in the columns you are interested in, or are in the same rows as NA values in the columns you are interested in. The second attempts to figure this out automatically.
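The padding idea in the second solution can be checked on a toy example. The sketch below swaps in a plain lm fit instead of gamm4 (an assumption made only so the example runs with base R), and the data frame is invented.

```r
# Toy data with NAs in different rows of the response and the predictor
df <- data.frame(y = c(1, 2, NA, 4, 5), x = c(1, NA, 3, 4, 5))

# na.exclude records which rows were dropped in the "na.action" attribute
mf <- model.frame(y ~ x, data = df, na.action = na.exclude)

# A fit whose fitted values omit the NA rows, like fitted(gam.outcome$mer)
fit <- lm(y ~ x, data = df, na.action = na.omit)
short <- fitted(fit)          # length 3: rows 1, 4, 5 only

# napredict() re-inserts NA at the excluded positions
padded <- napredict(attr(mf, "na.action"), short)
```

The result has one entry per row of df, so it can be assigned directly as a new column.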

least square means for dataset with missing data

I am writing for some help in R.
I am doing a simple RCBD analysis using the following script to compare genotypes (Name) for the trait "X".
library(stats)
data_1=read.table(file="test.txt", header=TRUE)
result_X =aov(X~Block+Name, data=data_1)
sink("result_X.txt")
summary(result_X)
sink()
My data has missing values ("NA"). So, after calculating the LSD, I would like to compare the genotypes in descending order. I do not think the simple averages are good, since some 'blocks' for some 'names' are missing.
So, the question is: what is the script to print out the 'least square means', which I think are better to compare with the LSD than simple averages?
Thank you for your help,
Oswald
Consider the function na.omit, which can filter out rows of data which contain NA.
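Note that with R's default settings, aov and lm already drop rows containing NA, so filtering alone will not give least square means. As a rough sketch of one way to get them in base R for an additive Block + Name model: fit the model, predict every Block x Name cell (including the missing ones), and average the predictions over blocks for each name. The data below are simulated, and the genotype names G1 to G4 are made up.

```r
set.seed(42)

# Simulated RCBD: 3 blocks x 4 genotypes, two observations missing
data_1 <- expand.grid(Block = factor(1:3),
                      Name = factor(c("G1", "G2", "G3", "G4")))
data_1$X <- 10 + as.integer(data_1$Name) + as.integer(data_1$Block) +
  rnorm(nrow(data_1), sd = 0.1)
data_1$X[c(2, 7)] <- NA

# Same model as the aov call; rows with NA are dropped by default
fit <- lm(X ~ Block + Name, data = data_1)

# Predict every Block x Name cell, then average over blocks per genotype
grid <- expand.grid(Block = levels(data_1$Block),
                    Name = levels(data_1$Name))
grid$pred <- predict(fit, newdata = grid)
lsmeans <- tapply(grid$pred, grid$Name, mean)

sort(lsmeans, decreasing = TRUE)
```

Because each genotype is averaged over the same set of blocks, a genotype that happens to be missing from a high or low block is not penalized or favored the way it would be with raw averages.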
