I've imported a data frame from a csv-file
dat3 <- read.csv(file.choose(),as.is = TRUE)
It contains names and values. My problem is that when I try to replace a value in the data frame, e.g.
dat3[3,6]<-12
then R treats "12" as a text string rather than a number, which prevents me from using it in mathematical operations. I'd like to be able to replace some numbers in the data frame and use them in mathematical operations.
When I try adding 1 to dat3[3,6] I get: "Error in dat3[3, 6] + 1 : non-numeric argument to binary operator".
I've tried:
lapply(dat3[3,6], as.numeric)
dat3[3,6]<-as.numeric(12)
But it doesn't work. I have no problems, though, using the numbers that were already in the imported data frame. This only happens with numbers that I replace.
Yes!
I've found the answer!
It is:
dat3[, c(3:6)] <- sapply(dat3[, c(3:6)], as.numeric)
to convert the columns to numbers.
Thank you all!
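A minimal sketch of what is probably going on, assuming the affected column was imported as character (for example because it contains some non-numeric text). The data frame below is made up for illustration; its column names and values are not from the original file.

```r
# Hypothetical stand-in for the imported data; 'value' is a character column.
dat3 <- data.frame(name = c("a", "b", "c"),
                   value = c("1", "2", "x"),
                   stringsAsFactors = FALSE)

# Assigning a number into a character column coerces it to the string "12".
dat3[3, 2] <- 12
class_before <- class(dat3$value)

# Convert the whole column; afterwards arithmetic on single cells works.
dat3$value <- as.numeric(dat3$value)
class_after <- class(dat3$value)
result <- dat3[3, 2] + 1
```

Running sapply(dat3, class) or str(dat3) right after the import shows which columns were read as character.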
Related
I am simply trying to create a dataframe.
I read in data by doing:
>example <- read.csv(choose.files(), header=TRUE, sep=";")
The data contains 2 columns with 8736 rows plus a header.
I then simply want to combine this with a column of another data frame that has the same number of rows (!) by doing:
>data_frame <- as.data.frame(example$x, example$y, otherdata$z)
It produces the following error
Warning message:
In as.data.frame.numeric(example$x, example$y, otherdata$z) :
'row.names' is not a character vector of length 8736 -- omitting it. Will be an error!
I have never had this problem before. It seems so easy to tackle, but I can't figure it out at the moment.
Overview
As long as nrow(example) equals length(otherdata$z), use cbind.data.frame() to combine the columns into one data frame. An advantage of cbind.data.frame() is that there is no need to reference the individual columns of example when binding them with otherdata$z.
# create a new data frame that adds the 'z' field from another source
df_example <- cbind.data.frame(example, otherdata$z)
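For context, the warning in the question arises because the second positional argument of as.data.frame() is row.names, so example$y is not treated as a column. Below is a small sketch with made-up data (the values and the extra column w are assumptions for illustration) showing two ways to build the combined data frame.

```r
# Hypothetical data standing in for the imported file and the other source.
example <- data.frame(x = 1:4, y = c(2.5, 3.5, 4.5, 5.5))
otherdata <- data.frame(z = c("a", "b", "c", "d"), w = 4:1)

# Both sources must have the same number of rows.
stopifnot(nrow(example) == length(otherdata$z))

# Option 1: name each column explicitly with data.frame().
df_named <- data.frame(x = example$x, y = example$y, z = otherdata$z)

# Option 2: bind the whole data frame to the extra column, naming it 'z'.
df_bound <- cbind.data.frame(example, z = otherdata$z)
```

Naming the bound column (z = otherdata$z) avoids ending up with a column literally called "otherdata$z".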
I wrote a function that creates (or at least tries to create) a data frame of numeric values. I need to retrieve these numeric values later on in the function. For that purpose, I explicitly converted all values in the data frame to numeric, using
as.numeric()
Later on in my function, when I extract the elements from the data frame, using
mydataframe[1,2]
I get an error "non-numeric argument to binary operator". I don't really understand what is non-numeric in my data frame.
If I ask for class and mode of the values in the data frame, they are both "numeric", storage mode is "double". Can anyone enlighten me? Where do I go wrong?
By the way, I can extract elements without error, if I use
as.numeric(mydataframe[1,2])
But I need to extract quite a lot of elements, so I prefer all elements of my data frame being numeric.
My code:
mydata <- by(data, data[,index], function(data) {
  # myfunction, including a for-loop, creates a vector of numbers (subvar1)
  var1 <- as.numeric(sum(subvar1) / n)
  var2 <- as.numeric(mean(data[,value]))
  var3 <- nrow(data)
  var3 <- as.numeric(var3)
  list(var1=var1, var2=var2, var3=var3)})
mydataframe <- data.matrix(do.call(rbind, mydata))
Thanks in advance!
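One likely cause, sketched under the assumption that the problem comes from returning list(...) for each group: rbind-ing lists produces a matrix whose cells are list elements, so a single cell is a list of length one rather than a number. The rows below are made-up values standing in for the by() output.

```r
# Hypothetical per-group results, shaped like the list(...) returned above.
rows <- list(list(var1 = 1.5, var2 = 2, var3 = 3),
             list(var1 = 2.5, var2 = 4, var3 = 5))

# rbind on lists gives a list-matrix: each cell is itself a list element.
m_list <- do.call(rbind, rows)
cell_type <- typeof(m_list[1, 2])

# Flattening each row to a numeric vector first gives a numeric matrix.
m_num <- do.call(rbind, lapply(rows, unlist))
result <- m_num[1, 2] + 1
```

Returning c(var1 = var1, var2 = var2, var3 = var3) instead of list(...) inside the function would have the same effect as the unlist step.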
I have a big data frame (22k rows, 400 columns) which is generated using read.csv from a csv file. It appears that every column is a factor and all the row values are the levels of this factor.
I now want to do some analysis (like PCA), but I can't work with the data unless it is a matrix. Even when I convert it with as.matrix, all I get is
> prcomp(as.matrix(my_data))
Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric
Is there a way of transforming this data frame with factors to a simple big matrix?
I am new to R, so forgive all the (maybe terrible) mistakes.
Thanks
You can do it that way:
df<-data.frame(a=as.factor(c(1,2,3)), b=as.factor(c(2,3,4)))
m<-apply(apply(df, 1, as.character), 1, as.numeric)
apply applies a function over the rows or columns of the given data.frame. It is important to convert to character first; otherwise as.numeric returns the internal integer codes of the factor rather than the values you see.
To add column names, do this:
m<-m[-1,] # removes the first 'empty' row
colnames(m)<-c("a", "b") # replace the right hand side with your desired column names, e.g. the first row of your data.frame
One more tip: you probably read the data.frame from a file. If you set the parameter header=TRUE, the first row is used for the column names of the data.frame instead of being read as a data row.
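To make the factor pitfall concrete, here is a small sketch with made-up values. It uses sapply over the columns as an alternative to the nested apply calls, which keeps the column names; this is one possible approach, not necessarily the one the answer intended.

```r
# Hypothetical data frame in which every column is a factor.
df <- data.frame(a = factor(c(10, 20, 30)),
                 b = factor(c(2.5, 3.5, 4.5)))

# as.numeric on a factor returns the internal level codes ...
codes <- as.numeric(df$a)

# ... while going through as.character recovers the displayed values.
values <- as.numeric(as.character(df$a))

# Convert every column this way; sapply returns a numeric matrix
# whose column names are taken from the data frame.
m <- sapply(df, function(col) as.numeric(as.character(col)))
```

The resulting matrix m can then be passed to functions such as prcomp() that require numeric input.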
I can't get the following code to coerce selected columns of a data frame to numeric:
more<-read.csv("more.csv",colClasses="character")
test<-list(more[,10],more[,11],more[,12])
test2<-lapply(test,as.numeric)
But
is.numeric(more[,10])
[1] FALSE
When I use the following code it works:
more[,10]<-as.numeric(more[,10])
is.numeric(more[,10])
[1] TRUE
I can't make out the error in the first approach.
Because my work doesn't always necessarily use the same column locations, I personally would use a combination of josliber's approach and the original proposal. Create a vector of the column names you want to coerce, and use lapply to modify only those columns.
test <- c("test1", "test2", "test3")
more[test] = lapply(more[test], as.numeric)
Cheers
When you use test2 <- lapply(test, as.numeric), it is building a new list (called test2) with all the elements converted to be numeric. This does not change test and also does not change the data frame more.
You could convert columns 10, 11, and 12 to numeric in your data frame with something like:
more[,10:12] <- lapply(more[,10:12], as.numeric)
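A short sketch of the difference, using a made-up three-column data frame in place of more.csv (the column names id, x and y are assumptions). Converting a copy leaves the data frame untouched; assigning the result of lapply back into the selected columns changes it.

```r
# Hypothetical stand-in for the file read with colClasses = "character".
more <- data.frame(id = c("a", "b", "c"),
                   x = c("1", "2", "3"),
                   y = c("4.5", "5.5", "6.5"),
                   stringsAsFactors = FALSE)

# Converting copies of the columns builds a new list; 'more' is unchanged.
test2 <- lapply(list(more$x, more$y), as.numeric)
unchanged <- is.character(more$x)

# Assigning the converted columns back modifies the data frame itself.
cols <- c("x", "y")
more[cols] <- lapply(more[cols], as.numeric)
```

Note that as.numeric() has to be applied column by column here; calling it directly on a multi-column data frame fails because a data frame is a list.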
You can also use as.data.frame if as.numeric is not working.
Assuming your data frame is x and the column name is "value" :
x["value"] <- as.numeric(x[,"value"])
My example won't work if you omit the "," in x[,"value"], because x["value"] returns a one-column data frame rather than a vector. I like this method for its simplicity.
I am having trouble turning my data.frame into a matrix format. Because I wanted to change my data.frame with mostly factor variables into a numeric matrix, I used the following code
UN2010frame <- data.matrix(lapply(UN2010, as.numeric))
However, when I checked the mode of UN2010frame, it still showed up as a list. Because the code I want to run (Ordrating) does not accept data in list format, I used UN2010matrix <- unlist(UN2010frame) to unlist my matrix. When I did this, my first row (which was formerly a row with column names) turned into NAs. This was a problem for me because when I tried to run an ordinal IRT model using this data set, I got the following error message:
> Error in 1:nrow(Y) : argument of length 0
I think it is because all the values in my first row are now gone.
If you could help me on any front, it would be deeply appreciated.
Thank you very much!
Haillie
First, the correct use of data.matrix is :
data.matrix(UN2010)
as it converts automatically to numeric. The lapply in your code is the first source of the error you get. You passed a list to the data.matrix function, not a data frame, so it returns a list of matrices and not a matrix.
Second, unlist returns a vector, not a matrix. So I'm pretty sure you won't find a "first row with NA", since you have a vector, which might explain part of your confusion.
You probably have a character column somewhere. Converting this to numeric gives NA. If you don't want this, then exclude such columns from further analysis. One possibility is to use colwise() from the plyr package to convert only the factors:
colwise(as.numeric,is.factor)(UN2010)
This returns a data frame with only the factor columns, which can easily be converted with data.matrix() or as.matrix(). Alternatively, you can use the base solution:
id <- sapply(UN2010,is.character)
sapply(UN2010[!id],as.numeric)
which will return a matrix with all non-character columns converted to numeric. If you really want to keep the data frame with all original columns, you can do:
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id],as.numeric)
Toy example code :
UN2010 <- data.frame(
F1 = factor(rep(letters[1:3],10)),
F2 = factor(rep(letters[5:10],5)),
Char = rep(letters[11:16],each=5),
Num = 1:30,
stringsAsFactors=FALSE
)
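Putting the pieces together, here is a runnable sketch that applies the base solution to the toy data above. The name UN2010matrix is reused from the question purely for illustration.

```r
# Toy data: two factor columns, one character column, one integer column.
UN2010 <- data.frame(
  F1 = factor(rep(letters[1:3], 10)),
  F2 = factor(rep(letters[5:10], 5)),
  Char = rep(letters[11:16], each = 5),
  Num = 1:30,
  stringsAsFactors = FALSE
)

# Flag the character columns so they can be left out of the conversion.
id <- sapply(UN2010, is.character)

# Numeric matrix built from all non-character columns.
UN2010matrix <- sapply(UN2010[!id], as.numeric)

# Alternatively, keep every original column and convert in place.
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id], as.numeric)
```

Keep in mind that as.numeric on the letter-valued factors F1 and F2 yields their level codes, which is what is wanted for ordinal data but not for factors that merely wrap numbers.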
Try as.data.frame instead of data.matrix.