I have a data.frame in R with columns that also have column names.
I have another data.frame with 0s and -1s that controls which columns to use from the first data.frame in a subsequent analysis.
I now ran into an issue that I cannot wrap my head around.
First of all, the "offending" line of code is:
covar.data<-covar.data[,!onoff]
FYI I have confirmed both covar.data and onoff are data.frames.
When I run this with onoff selecting 2 or more columns, everything is fine, and the resulting covar.data is still a data.frame - and this is important, because I need to use the column names in the rest of my analysis.
However, if I have onoff selecting only 1 column, covar.data turns into a matrix!! This is a problem, because the column name also disappears!
I tried
covar.data<-as.data.frame(covar.data[,!onoff])
and
covar.data<-as.data.frame(covar.data[,!onoff], col.names=TRUE)
but that didn't make a difference in the disappearance of the column name.
I don't understand why R decides to turn the data.frame into a matrix (only for the times I am left with one column), and I cannot figure out how to preserve the data.frame PLUS the column names.
If you select a single column of a data.frame, R assumes you want to extract that data as a vector rather than returning another data.frame (and in most cases this is exactly the behavior you want). But if you do want to keep that single column as a data.frame, then you should do
covar.data[, !onoff, drop=FALSE]
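A minimal sketch of the difference, using made-up data. The column names and values are hypothetical, and onoff is assumed here to be a plain numeric vector of 0s and -1s rather than a data.frame, to keep the example short:

```r
# Hypothetical stand-in for the asker's data
covar.data <- data.frame(age = c(30, 40),
                         weight = c(70, 80),
                         height = c(170, 180))

# Assumption: 0 means "use this column", so !onoff is TRUE only for the first one
onoff <- c(0, -1, -1)

# Default behaviour: a single selected column is dropped to a bare vector
as_vector <- covar.data[, !onoff]

# drop = FALSE keeps the data.frame and therefore the column name
as_df <- covar.data[, !onoff, drop = FALSE]

# List-style indexing (no comma) also returns a data.frame
as_df_list_style <- covar.data[!onoff]
```

Writing FALSE in full rather than F is a little safer, since F is an ordinary variable that can be reassigned.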
I'm in a very basic class that introduces R for genetic purposes. I'm encountering a rather peculiar problem in trying to follow the instructions given. Here is what I have along with the instructor's notes:
MangrovesRaw<-read.csv("C:/Users/esteb/Documents/PopGen/MangrovesSites.csv")
#i'm going to make a new dataframe now, with one column more than the mangrovesraw dataframe but the same number of rows.
View(MangrovesRaw)
Mangroves<-data.frame(matrix(nrow = 528, ncol = 23))
#next I want you to name the first column of Mangroves "pop"
colnames(Mangroves)<-c(col1="pop")
#i'm now assigning all values of that column to be 1
Mangroves$pop<-1
#assign the rest of the columns (2 to 23) to the entirety of the MangrovesRaw dataframe
#then change the names to match the mangroves raw names
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
I'm not really sure how to assign columns that haven't been named using the $ as we have in the past. A friend suggested I first run
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw
#X338 is the name of the first column from MangrovesRaw
But while this does transfer the data from MangrovesRaw, it comes at the cost of having my column names messed up with X338. added to every subsequent column. In an attempt to modify this I found the following "fix"
colnames(Mangroves)[2:23]<-colnames(MangrovesRaw)
Mangroves$X338<-MangrovesRaw[,2]
#Mangroves$X338<-MangrovesRaw[,2:22]
#MangrovesRaw has 22 columns in total
While this transferred all the data I needed for the X338 column, it didn't transfer any data for the remaining 21 columns. The commented-out line just results in the same problem of having X338. show up in all my column names.
What am I doing wrong?
There are a few ways to solve this problem. It may be that your instructor wants it done a certain way, but here's one simple solution: just cbind() the Mangroves$pop column with the real data. Then the data and column names are already added. Name the first argument, otherwise the new column ends up called Mangroves$pop instead of pop.
Mangroves <- cbind(pop = Mangroves$pop, MangrovesRaw)
Here's another way:
Mangroves[, 2:23] <- MangrovesRaw
colnames(Mangroves)[2:23] <- colnames(MangrovesRaw)
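Both approaches can be tried on a small stand-in for the real file. The sketch below uses a hypothetical 3-row, 2-column data frame called raw in place of MangrovesRaw (the real one has 528 rows and 22 columns), and names the cbind() argument so the new column is called pop:

```r
# Hypothetical miniature of MangrovesRaw
raw <- data.frame(X338 = 1:3, X339 = 4:6)

# Way 1: bind a constant pop column in front of the raw data
via_cbind <- cbind(pop = 1, raw)

# Way 2: pre-allocate an empty frame, then fill and rename columns 2 onward
via_assign <- data.frame(matrix(nrow = 3, ncol = 3))
colnames(via_assign)[1] <- "pop"
via_assign$pop <- 1
via_assign[, 2:3] <- raw                       # copies values, keeps old names
colnames(via_assign)[2:3] <- colnames(raw)     # so the names are set separately
```

The second way needs the extra colnames() line because assigning into existing columns replaces their values but not their names.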
My question is about renaming multiple column names at once.
I have a dataframe called 'growth' with 46 columns.
Columns 2:46 are all named as dates, but all of the dates have an X in front of them, e.g. 'X1981'.
Naturally I want to remove the X from all of the column names.
I cannot understand why the following is not working:
colnames(growth[ ,2:length(growth)]) <- substring(colnames(growth[ ,2:length(growth)]),2)
Please help me with some insights.
Never mind, I changed the instruction to...
names(growth)[2:46] <- substring(names(growth)[2:46],2)
...and now it works. Clearly it had something to do with how I was subsetting the columns.
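A likely explanation, sketched on a hypothetical three-column version of growth: in the first form the renaming is applied to an extracted copy of columns 2 onward, and when that copy is written back into growth only the values are replaced, so the existing column names stay as they were. Assigning into names(growth) changes the names on the data frame itself.

```r
# Hypothetical miniature of the 'growth' data frame
growth <- data.frame(id = 1:2, X1981 = c(0.1, 0.2), X1982 = c(0.3, 0.4))

# Original attempt: runs without error, but the names of growth do not change
colnames(growth[, 2:3]) <- substring(colnames(growth[, 2:3]), 2)
names_after_first <- names(growth)

# Working version: index the names vector, not the data frame
names(growth)[2:3] <- substring(names(growth)[2:3], 2)
names_after_second <- names(growth)
```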
I am reading a txt file into R and have several columns that should be numeric, but everything is interpreted as character. Now I would like to convert only a few columns within that matrix (I converted it to a matrix in a first step) to numeric. So far I have only managed to extract single columns, and that way I lose the matrix type:
data <- as.numeric(data[,1])
Now, I've found similar questions here but none of the answers worked in the way that it conserved the type matrix.
For example, I've tried to store the affected columns in a vector and then perform the action on that vector with lapply
cols<- c("a","b","d")
data<- as.matrix(lapply(cols, as.numeric))
But this gives me only empty fields, and of course it only shows the columns I selected and not the rest of the matrix. I also got the error message
NAs introduced by coercion
As a last step I tried the following, but I ended up having a list and not a matrix anymore
data[1:25] <- as.matrix(lapply(data[1:25], as.numeric))
What I would like to have, is a matrix where several columns (not just 1:25 as in my example above but rather, say, columns 1,3 and 6) are converted to numeric and the rest stays the same.
Does someone have an answer and maybe even an explanation for why the things I've tried didn't work?
I'm relatively new to R, and I can't figure out how to split the list that I'm working with. I have
B<-tapply(newdata$lf.d1, newdata$year, mean)
But I want to concatenate the mean values onto another matrix without the year values. How would I go about doing this?
The result of tapply with a single grouping factor will be a one-dimensional R array whose names are the group labels (here, the years). There is only a single column (actually not even that, because it is a one-dimensional array and only has a single dimension unless you coerce it with as.matrix). If you want to remove the names, then use the unname function.
unname(B)
unname(as.matrix(B))
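As a small illustration with made-up numbers (the values in newdata and the matrix called other are hypothetical), the names can be stripped and the means bound onto an existing matrix like this:

```r
# Hypothetical data in the shape the question describes
newdata <- data.frame(year = c(2001, 2001, 2002, 2002),
                      lf.d1 = c(1, 3, 5, 7))

B <- tapply(newdata$lf.d1, newdata$year, mean)   # named by year

bare <- unname(B)        # same values, year labels removed
means <- as.vector(B)    # plain numeric vector, also without names

# Hypothetical existing matrix with one row per year
other <- matrix(1:4, nrow = 2)
combined <- cbind(other, means)
```

as.vector() is used for the cbind() step so that a plain vector, rather than a one-dimensional array, is attached as the new column.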
I have a simple problem. I have a data frame with 121 columns. Columns 9:121 need to be numeric, but when imported into R, they are a mixture of numerics, integers and factors. Columns 1:8 need to remain characters.
I’ve seen some people use loops, and others use apply(). What do you think is the most elegant way of doing this?
Thanks very much,
Paul M
Try the following... The apply function allows you to loop over either rows, cols, or both, of a dataframe and apply any function, so to make sure all your columns from 9:121 are numeric, you can do the following:
table[,9:121] <- apply(table[,9:121],2, function(x) as.numeric(as.character(x)))
table[,1:8] <- apply(table[,1:8], 2, as.character)
Where table is the dataframe you read into R.
Briefly I specify in the apply function the table I want to loop over - in this case the subset of your table we want to make changes to, then we specify the number 2 to indicate columns, and finally give the name of the as.numeric or as.character functions. The assignment operator then replaces the old values in your table with the new ones of correct format.
-EDIT: Just changed the first line, as I recalled that if you convert from a factor to a number, what you get is the integer code of the factor level and not the number you think you are getting. So factors first need to be converted to characters, then numbers, which we can do just by wrapping as.character inside as.numeric.
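The factor pitfall is easy to see on a tiny example. The sketch below uses a made-up factor and a hypothetical three-column data frame df, and uses lapply() as an alternative to apply() so that each column is converted on its own:

```r
f <- factor(c("10", "20", "10"))

codes  <- as.numeric(f)                 # internal level codes, not the values
values <- as.numeric(as.character(f))   # the numbers the labels spell out

# Hypothetical data frame: one character column, then a factor and an integer
df <- data.frame(id = c("a", "b", "c"),
                 x = factor(c("1.5", "2.5", "1.5")),
                 y = c(1L, 2L, 3L),
                 stringsAsFactors = FALSE)

# Convert columns 2:3 to numeric, leaving column 1 untouched
df[2:3] <- lapply(df[2:3], function(col) as.numeric(as.character(col)))
```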
When you read in the table, use stringsAsFactors=FALSE; then there will not be any factors.
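A quick way to see the effect, reading from an inline string instead of a file (the CSV content here is made up):

```r
# Hypothetical two-row CSV supplied as text rather than a file path
csv_text <- "name,score\nalice,10\nbob,20"

with_factors    <- read.csv(text = csv_text, stringsAsFactors = TRUE)
without_factors <- read.csv(text = csv_text, stringsAsFactors = FALSE)
```

From R 4.0.0 onward, stringsAsFactors defaults to FALSE, so the argument mainly matters on older versions or when factors are wanted explicitly.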