I can't get the following code to coerce selected columns of a data frame to numeric:
more <- read.csv("more.csv", colClasses = "character")
test <- list(more[, 10], more[, 11], more[, 12])
test2 <- lapply(test, as.numeric)
But
is.numeric(more[,10])
[1] FALSE
When I use the following code instead, it works:
more[,10]<-as.numeric(more[,10])
is.numeric(more[,10])
[1] TRUE
I can't spot the error in the first approach.
Because my work doesn't always use the same column positions, I would personally use a combination of josliber's approach and the original proposal: create a vector of the names of the columns you want to coerce, and use lapply to modify only those columns.
test <- c("test1", "test2", "test3")
more[test] <- lapply(more[test], as.numeric)
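A minimal sketch of the name-based approach, assuming a small hypothetical data frame read entirely as character (the column names id, test1, test2 and test3 are made up for illustration). The columns are deliberately reordered first to show that selecting by name does not depend on column positions:

```r
# Hypothetical toy data: every column is character, as with colClasses = "character".
more <- data.frame(
  id    = c("a", "b", "c"),
  test1 = c("1", "2", "3"),
  test2 = c("4.5", "5.5", "6.5"),
  test3 = c("7", "8", "9"),
  stringsAsFactors = FALSE
)

# Shuffle the column order: name-based selection is unaffected.
more <- more[c("test3", "id", "test1", "test2")]

# Names of the columns to coerce (hypothetical).
test <- c("test1", "test2", "test3")

# more[test] is a data frame (a list of columns); lapply returns a list,
# and assigning it back into more[test] replaces only those columns.
more[test] <- lapply(more[test], as.numeric)

str(more)
```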
Cheers
When you use test2 <- lapply(test, as.numeric), it builds a new list (called test2) with all the elements converted to numeric. This changes neither test nor the data frame more.
You could convert columns 10, 11, and 12 of your data frame to numeric with something like:
# as.numeric() cannot coerce a multi-column data frame, so apply it column by column
more[, 10:12] <- lapply(more[, 10:12], as.numeric)
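To make the difference concrete, here is a small sketch assuming a hypothetical 3-row, 12-column character data frame standing in for more. It shows that the lapply result lives in a separate list until it is assigned back into the data frame:

```r
# Hypothetical stand-in for `more`: 12 character columns named V1..V12.
more <- as.data.frame(matrix(as.character(1:36), nrow = 3),
                      stringsAsFactors = FALSE)

# lapply() builds a brand-new list; `more` itself is untouched.
test2 <- lapply(more[, 10:12], as.numeric)
is.numeric(more[, 10])   # FALSE

# Assigning the converted list back into the same columns changes `more`.
more[, 10:12] <- lapply(more[, 10:12], as.numeric)
is.numeric(more[, 10])   # TRUE
```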
You can also try as.data.frame if as.numeric is not working.
Assuming your data frame is x and the column name is "value" :
x["value"] <- as.numeric(x[,"value"])
My example won't work if you drop the "," in x[,"value"]: x["value"] is a one-column data frame, not a vector. I like this method for its simplicity.
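A short sketch of why the comma matters, assuming a hypothetical one-column character data frame x. The error from coercing the data frame is captured with tryCatch so the script keeps running; the exact wording of that error message varies between R versions:

```r
x <- data.frame(value = c("1.5", "2", "3"), stringsAsFactors = FALSE)

class(x["value"])    # "data.frame": single-bracket indexing keeps the data frame
class(x[, "value"])  # "character": [ , ] on one column drops to a vector

# as.numeric() needs an atomic vector, so coercing x["value"] fails.
res <- tryCatch(as.numeric(x["value"]),
                error = function(e) conditionMessage(e))
print(res)

# The working form: coerce the vector, then assign it back as the column.
x["value"] <- as.numeric(x[, "value"])
```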
Related
I've imported a data frame from a CSV file:
dat3 <- read.csv(file.choose(), as.is = TRUE)
It contains names and values. My problem is that when I try to replace a value in the data frame, e.g.
dat3[3,6] <- 12
it just assumes that "12" is a text string and not a value, which prevents me from using that number in mathematical operations. I'd like to be able to replace some numbers in the data frame and use them in mathematical operations.
When I try adding 1 to dat3[3,6] I get: "Error in dat3[3, 6] + 1 : non-numeric argument to binary operator".
I've tried:
lapply(dat3[3,6], as.numeric)
dat3[3,6]<-as.numeric(12)
But it doesn't work. I have no problems, though, using the numbers that were already imported into the data frame. This only happens for numbers that I replace.
Yes!
I've found the answer!
It is:
dat3[, 3:6] <- sapply(dat3[, 3:6], as.numeric)
to convert the columns to numbers.
Thank you all!
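One plausible explanation, offered as an assumption since the original CSV isn't shown: with as.is = TRUE, any column containing text (for example a placeholder such as "n/a") stays a character column, and assigning a number into a character column stores it as the string "12". The sketch below uses a hypothetical two-column dat3 to show this, then converts the whole column as in the answer above:

```r
# Hypothetical stand-in for dat3: "n/a" keeps the score column character.
dat3 <- data.frame(
  name  = c("x", "y", "z"),
  score = c("1", "n/a", "3"),
  stringsAsFactors = FALSE
)

# Assigning a number into a character column coerces it to the string "12".
dat3[3, 2] <- 12
class(dat3[3, 2])   # "character"

# Converting the column itself makes arithmetic possible;
# "n/a" becomes NA (the coercion warning is suppressed here).
dat3[, 2] <- suppressWarnings(as.numeric(dat3[, 2]))
dat3[3, 2] + 1      # 13
```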
I know how to convert one factor of a dataframe to numeric:
rds$fcv12afa3num <- as.numeric(levels(rds$fcv12afa3))[rds$fcv12afa3]
My two questions:
How can I convert all data frame columns simultaneously if the data frame consists only of factors?
How can I convert several factors simultaneously, based on a pattern in the column names?
I have many NAs, if that matters.
Thanks for your answer, Christian
Without example data, I can't give a completely exact answer, but this should get you started.
factorVars <- names(YourData)[vapply(YourData, is.factor, logical(1))]
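# as.numeric(f) alone would return the level codes, so convert via the levels
YourData[, factorVars] <- lapply(YourData[, factorVars, drop = FALSE],
                                 function(f) as.numeric(levels(f))[f])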
Some notes:
Use drop = FALSE to handle the case where there is only one factor in your data frame.
If you call lapply on the data frame directly instead of assigning the result back into it, you get a list object in return. You'd have to run that list through as.data.frame to get your data frame back.
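A sketch covering both questions, assuming hypothetical factor columns whose labels are numbers (with some NAs). The helper name factor_to_num and the "^fcv" pattern are made up for illustration; the conversion itself uses the as.numeric(levels(f))[f] idiom from the question:

```r
# Hypothetical data: numeric-looking factor labels plus NAs.
rds <- data.frame(
  fcv12afa3 = factor(c("10", "20", NA, "10")),
  fcv12afa4 = factor(c("5", NA, "7", "7")),
  other     = factor(c("1", "2", "3", "4"))
)

# as.numeric(f) would give the internal codes (1, 2, ...); indexing the
# numeric levels by the factor recovers the labelled values. NA stays NA.
factor_to_num <- function(f) as.numeric(levels(f))[f]

# 1) Every factor column at once.
all_num <- rds
is_fac <- vapply(all_num, is.factor, logical(1))
all_num[is_fac] <- lapply(all_num[is_fac], factor_to_num)

# 2) Only the columns whose names match a pattern (here: starting with "fcv").
sel <- grepl("^fcv", names(rds))
rds[sel] <- lapply(rds[sel], factor_to_num)
str(rds)
```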
I have a data frame (my_df) with columns named after individual county numbers. I melted/cast the data from a much larger set to get to this point. The first column is year, and it holds the years 1970-2011. The next 3010 columns are counties. However, I'd like to rename the county columns to "county_" + county number.
This code executes in R, but for whatever reason it doesn't update the column names. They remain just the numbers... any help?
new_col_names = paste0("county_",colnames(my_df[,2:ncol(my_df)]))
colnames(my_df[,2:ncol(my_df)]) = new_col_names
The problem is the subsetting within the colnames call.
Try names(my_df) <- c(names(my_df)[1], new_col_names) instead.
Note: names and colnames are interchangeable for data.frame objects.
EDIT: alternate approach suggested by flodel, subsetting outside the function call:
names(my_df)[-1] <- new_col_names
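A minimal sketch of the working approach, assuming a hypothetical my_df with a year column and two county-number columns. In R, colnames(my_df[, 2:ncol(my_df)]) <- ... renames a temporary copy of the subset and then writes only its values back into columns 2 onward, so my_df's own names never change; assigning into a subset of names(my_df) edits the names vector itself:

```r
# Hypothetical stand-in: year plus two county-number columns.
my_df <- data.frame(year = 1970:1972, `1001` = 1:3, `1003` = 4:6,
                    check.names = FALSE)

# Modify the names vector directly, skipping the first (year) column.
names(my_df)[-1] <- paste0("county_", names(my_df)[-1])
names(my_df)   # "year" "county_1001" "county_1003"
```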
colnames() is meant for a matrix (or matrix-like object); for a data.frame, simply try names().
Example:
my_df <- data.frame(a=c(1,2,3,4,5), b=rnorm(5), c=rnorm(5), d=rnorm(5))
new_col_names <- paste0("county_", names(my_df)[-1])
names(my_df) <- c(names(my_df)[1], new_col_names)
I made the following example code to give you an idea of my real dataset. I have 2 datasets, a factor variable List and a logical variable ok.
df1 <- c("a","b","c","d","e","f","g")
df2 <- c("a","d","e")
List <- factor(as.integer(df1 %in% df2))
ok <- c(TRUE,FALSE, FALSE,FALSE,TRUE,FALSE,TRUE)
The List and ok variables both have a length of 7. I want to remove all the elements of List for which ok is TRUE. For example, the first, fifth and seventh elements need to be removed from List.
Can anyone help me with this?
Thanks
Easier than you think.
List[!ok]
Perhaps List[!ok]? BTW, you don't need as.logical, as the vector ok is already stored internally as logical.
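Running the question's own example data through the suggested one-liner: !ok flips the logical vector, so logical indexing keeps the positions where ok is FALSE (2, 3, 4 and 6) and drops positions 1, 5 and 7:

```r
df1 <- c("a", "b", "c", "d", "e", "f", "g")
df2 <- c("a", "d", "e")
List <- factor(as.integer(df1 %in% df2))
ok <- c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Keep only the elements where ok is FALSE.
kept <- List[!ok]
kept   # values 0 0 1 0, levels 0 1
```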
I am having trouble turning my data.frame into a matrix format. Because I wanted to change my data.frame with mostly factor variables into a numeric matrix, I used the following code
UN2010frame <- data.matrix(lapply(UN2010, as.numeric))
However, when I checked the mode of UN2010frame, it still showed up as a list. Because the code I want to run (Ordrating) does not accept data in a list format, I used UN2010matrix <- unlist(UN2010frame) to unlist my matrix. When I did this, my first row (which was formerly a row with column names) turned into NAs. This was a problem for me because when I tried to run an ordinal IRT model using this data set, I got the following error message:
> Error in 1:nrow(Y) : argument of length 0
I think it is because all the values in my first row are now gone.
If you could help me on any front, it would be deeply appreciated.
Thank you very much!
Haillie
First, the correct use of data.matrix is:
data.matrix(UN2010)
as it converts to numeric automatically. The lapply in your code is the first source of the error you get: you passed a list to data.matrix, not a data frame, so it returns a matrix of mode "list" rather than a numeric matrix.
Second, unlist returns a vector, not a matrix. So I'm pretty sure you won't find a "first row with NA", as you have a vector. That might explain part of your confusion.
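A small sketch of both points, assuming a hypothetical two-column data frame df. When data.matrix receives something other than a data frame it falls back to as.matrix, which turns a plain list into a one-column matrix whose mode is still "list":

```r
df <- data.frame(a = factor(c("x", "y")), b = c(1.5, 2.5))

# Passing a plain list: the result is a list-mode matrix, not a numeric one.
m_list <- data.matrix(lapply(df, as.numeric))
mode(m_list)     # "list"

# Passing the data frame itself gives a numeric matrix
# (factors become their integer codes).
m_num <- data.matrix(df)
mode(m_num)      # "numeric"
dim(m_num)       # 2 2

# unlist() flattens everything into one vector with no rows or columns.
v <- unlist(lapply(df, as.numeric))
is.null(dim(v))  # TRUE
```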
You probably have a character column somewhere. Converting it to numeric gives NAs. If you don't want this, exclude such columns from further analysis. One possibility is to use colwise() from the plyr package to convert only the factors:
colwise(as.numeric,is.factor)(UN2010)
This returns a data frame with only the factors, which can easily be converted with data.matrix() or as.matrix(). Alternatively, you can use the base solution:
id <- sapply(UN2010,is.character)
sapply(UN2010[!id],as.numeric)
which will return a matrix with all non-character columns converted to numeric. If you really want to keep the data frame with all original columns, you can do:
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id],as.numeric)
Toy example code :
UN2010 <- data.frame(
F1 = factor(rep(letters[1:3],10)),
F2 = factor(rep(letters[5:10],5)),
Char = rep(letters[11:16],each=5),
Num = 1:30,
stringsAsFactors=FALSE
)
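Putting the base solution together with the toy data above as one runnable script (the toy data frame is repeated so the block stands alone). Factor columns become their integer level codes, and the character column is left out of the conversion:

```r
UN2010 <- data.frame(
  F1 = factor(rep(letters[1:3], 10)),
  F2 = factor(rep(letters[5:10], 5)),
  Char = rep(letters[11:16], each = 5),
  Num = 1:30,
  stringsAsFactors = FALSE
)

# Flag the character columns so they are excluded from the conversion.
id <- sapply(UN2010, is.character)

# Numeric matrix of the non-character columns (F1, F2, Num).
m <- sapply(UN2010[!id], as.numeric)

# Alternatively, keep every original column and convert only the non-character ones.
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id], as.numeric)
str(UN2010frame)
```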
Try as.data.frame instead of data.matrix.