I am trying to add a column for totals to a dataframe using R and am getting this error:
Error in rowSums(EurostatCrime2017[, 7:10]) : 'x' must be numeric.
Here is my code:
EurostatCrime2017$All_Theft <- rowSums(EurostatCrime2017[,7:10])
This is most likely a type issue. Checking the column types with str
str(EurostatCrime2017[,7:10])
will show whether any of the columns are not numeric or integer.
One option is to convert the columns to numeric:
EurostatCrime2017[,7:10] <- lapply(EurostatCrime2017[,7:10], function(x)
as.numeric(as.character(x)))
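Here, as.character is applied first in case the columns are factors; calling as.numeric directly on a factor would return the integer level codes rather than the values.
After the conversion, rowSums should work.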
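The conversion step can be tried on a small example first. This is only a sketch: the data frame crime, its column names, and the ":" missing-value marker are made up for illustration and are not taken from the asker's file.

```r
# Toy stand-in for EurostatCrime2017 (all values are invented).
crime <- data.frame(
  Country  = c("A", "B", "C"),
  Theft    = factor(c("10", "20", ":")),  # ":" mimics a non-numeric missing marker
  Burglary = factor(c("1", "2", "3")),
  Robbery  = c(5, 6, 7),
  stringsAsFactors = FALSE
)

theft_cols <- 2:4

# factor -> character -> numeric; strings that are not numbers become NA.
crime[theft_cols] <- lapply(crime[theft_cols], function(x) {
  suppressWarnings(as.numeric(as.character(x)))
})

# na.rm = TRUE skips the NA values when summing each row.
crime$All_Theft <- rowSums(crime[theft_cols], na.rm = TRUE)
print(crime)
```

If rowSums still fails after the conversion, running str on the selected columns again should show which one is not numeric.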
I tried these options and it doesn't seem to be working. Here is a link to the document I am working on.
https://drive.google.com/open?id=193JI7z41xvpDh88MWrKp52I3HiQ76LFb
I am trying to remove two rows from my dataset with this simple line of code:
my_data_screen <- my_data [-influential]
However, I get the error message Error: Can't negate columns that don't exist.
(The "influential" variable simply contains two row numbers, which are the result of calculating outliers from my sample.)
Even when I try to do something as simple as targeting a specific row (i.e. my_data [37]), I get the same error message.
Why is R interpreting my command as targeting columns, rather than rows?
With no comma inside the brackets, R treats the index as a column selection, so your code asks it to drop columns rather than rows.
As @ThomasIsCoding suggests, you should use:
my_data_screen <- my_data[-influential,]
The index before the comma refers to rows. If you want to delete columns instead, put the index after the comma:
my_data_screen <- my_data[,-influential]
In summary, the position of the comma tells R whether you want to delete rows or columns.
If you have my_data as data.frame, then you should use
my_data[37, ]
since my_data[37] is indexing my_data in terms of columns by default.
For more on indexing, please read https://rspatial.org/intr/4-indexing.html
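The difference between the three forms can be checked on a small base data.frame. This is a minimal sketch; the contents of my_data and influential below are invented for illustration.

```r
# Hypothetical data standing in for my_data; 'influential' holds row numbers.
my_data <- data.frame(
  id    = 1:5,
  score = c(2.1, 9.9, 3.3, 8.7, 4.0),
  group = letters[1:5],
  flag  = c(TRUE, FALSE, TRUE, FALSE, TRUE)
)
influential <- c(2, 4)

rows_dropped <- my_data[-influential, ]  # index before the comma: drops rows 2 and 4
cols_dropped <- my_data[, -influential]  # index after the comma: drops columns 2 and 4
one_column   <- my_data[2]               # no comma: list-style indexing, selects column 2

print(dim(rows_dropped))
print(names(cols_dropped))
print(names(one_column))
```

With a tibble the no-comma form behaves the same way, which is why a row number larger than the number of columns leads to an error about columns that don't exist.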
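If you are familiar with the tidyverse, you can use:
The filter() function to remove rows by condition: filter(!(some_column %in% specified_values)), where some_column is a placeholder for a column in your data; to remove rows by position, slice(-influential) is the closer equivalent
The select() function to remove columns: select(-influential)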
If I have a vector of column names from a data frame, how can I check whether they are all numeric? If there is a non-numeric variable, how can I identify it?
I’ve tried the first part but couldn’t move on to the second until I had solved it.
When I try the following, I keep getting FALSE:
all(df[,numeric_cols] %>% is.numeric())
Is there a one-liner that I can put inside an if condition, and a way to find the column that is not numeric?
You can check whether all columns have class "numeric" (note that integer columns have class "integer" and will not match) with
all(sapply(df[,numeric_cols], class) == "numeric")
To identify the non-numeric columns, one way would be:
names(Filter(function(x) !is.numeric(x), df[,numeric_cols]))
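The original attempt returns FALSE because is.numeric is applied to the data frame as a whole, and a data frame is never numeric. A possible variant that tests each column with is.numeric, and therefore also accepts integer columns, is sketched below; df and numeric_cols are toy stand-ins for the asker's objects.

```r
# Toy data: 'a' is double, 'b' is integer, 'c' is character.
df <- data.frame(a = c(1.5, 2.5), b = 1:2, c = c("x", "y"),
                 stringsAsFactors = FALSE)
numeric_cols <- c("a", "b", "c")

# One logical per column; vapply guarantees a logical vector back.
is_num <- vapply(df[numeric_cols], is.numeric, logical(1))

all_numeric <- all(is_num)             # usable directly inside if ()
non_numeric <- names(is_num)[!is_num]  # names of the offending columns

print(all_numeric)
print(non_numeric)
```

Using df[numeric_cols] rather than df[, numeric_cols] keeps the result a data frame even when numeric_cols has length one.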
I am simply trying to create a dataframe.
I read in data by doing:
>example <- read.csv(choose.files(), header=TRUE, sep=";")
The data contains 2 columns with 8736 rows plus a header.
I then simply want to combine this with the column of a dataframe with the same amount of rows (!) by doing:
>data_frame <- as.data.frame(example$x, example$y, otherdata$z)
It produces the following warning:
Warning message:
In as.data.frame.numeric(example$x, example$y, otherdata$z) :
'row.names' is not a character vector of length 8736 -- omitting it. Will be an error!
I have never had this problem before. It seems so easy to tackle, but I can't work it out at the moment.
Overview
As long as nrow(example) equals length(otherdata$z), use cbind.data.frame to combine the columns into one data frame. An advantage of cbind.data.frame() is that there is no need to call the individual columns within example when binding them with otherdata$z.
# create a new data frame that adds the 'z' field from another source
df_example <- cbind.data.frame(example, otherdata$z)
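The warning in the question arises because as.data.frame() converts a single object, and its second argument is row.names, so example$y was taken as row names. A minimal sketch of two alternatives follows; the small example and otherdata objects are made up to keep it self-contained.

```r
# Toy stand-ins for the asker's objects.
example   <- data.frame(x = 1:3, y = c(10, 20, 30))
otherdata <- data.frame(z = c("a", "b", "c"))

# Guard against mismatched lengths before combining.
stopifnot(nrow(example) == length(otherdata$z))

# Option 1: build the data frame column by column with data.frame().
combined_a <- data.frame(x = example$x, y = example$y, z = otherdata$z)

# Option 2: bind the extra column on, naming it explicitly so the
# result gets a column called 'z' rather than 'otherdata$z'.
combined_b <- cbind(example, z = otherdata$z)

print(names(combined_a))
print(names(combined_b))
```

Naming the bound column is optional, but it avoids a column name that contains a dollar sign.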
I have a data frame consisting of five character variables which represent specific bacteria. I then have thousands of observations of each variable that all begin with the letter K, e.g.
x <- c("K0001","K0001","K0003","K0006")
y <- c("K0001","K0001","K0002","K0003")
z <- c("K0001","K0002","K0007","K0008")
r <- c("K0001","K0001","K0001","K0001")
o <- c("K0003","K0009","K0009","K0009")
I need to identify unique observations in the first column that don't appear in any of the remaining four columns. I have tried the approach suggested here which I think would work if I could create individual vectors using select ...
How to tell what is in one vector and not another?
but when I try to create a vector for analysis using the code ...
x <- select(data$x)
I get the error
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "character"
I have tried to mutate the vectors using as.factor and as.numeric but neither of these approaches work as the first gives an equivalent error as above, and as.numeric returns NAs.
Thanks in advance
The reference that you cited recommends using setdiff. The only thing you need to do to apply that solution is to combine the other four columns into one vector, so that it can be treated as a set. You can do that with unlist:
setdiff(data$x, unlist(data[,2:5]))
[1] "K0006"
I am having trouble turning my data.frame into a matrix format. Because I wanted to change my data.frame with mostly factor variables into a numeric matrix, I used the following code
UN2010frame <- data.matrix(lapply(UN2010, as.numeric))
However, when I checked the mode of UN2010frame, it still showed up as a list. Because the code I want to run (Ordrating) does not accept data in a list format, I used UN2010matrix <- unlist(UN2010frame) to unlist my matrix. When I did this, my first row (which was formerly a row with column names) turned into NAs. This was a problem for me because when I tried to run an ordinal IRT model using this data set, I got the following error message.
> Error in 1:nrow(Y) : argument of length 0
I think it is because all the values in my first row are now gone.
If you could help me on any front, it would be deeply appreciated.
Thank you very much!
Haillie
First, the correct use of data.matrix is :
data.matrix(UN2010)
as it converts automatically to numeric. The lapply in your code is the first source for the error you get. You put a list in the data.matrix function, not a dataframe. So it returns a list of matrices, and not a matrix.
Second, unlist returns a vector, not a matrix. So you won't actually find a "first row with NA", because what you have is a vector, which might explain part of your confusion.
You probably have a character column somewhere. Converting this to numeric gives NA. If you don't want this, then exclude them from the further analysis. One possibility is to use colwise() from the plyr package to convert only the factors:
colwise(as.numeric,is.factor)(UN2010)
This returns a dataframe with only the factors, which can easily be converted with data.matrix() or as.matrix(). Alternatively, you can use the base solution:
id <- sapply(UN2010,is.character)
sapply(UN2010[!id],as.numeric)
which will return a matrix with all non-character columns converted to numeric. If you really want to keep the dataframe with all original columns, you can do:
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id],as.numeric)
Toy example code :
UN2010 <- data.frame(
F1 = factor(rep(letters[1:3],10)),
F2 = factor(rep(letters[5:10],5)),
Char = rep(letters[11:16],each=5),
Num = 1:30,
stringsAsFactors=FALSE
)
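Putting the pieces together on the toy data, a short sketch of both routes looks like this. The names m and UN2010frame are only illustrative, and the toy data frame is repeated so the block runs on its own.

```r
# Same toy data as above, repeated so this block is self-contained.
UN2010 <- data.frame(
  F1 = factor(rep(letters[1:3], 10)),
  F2 = factor(rep(letters[5:10], 5)),
  Char = rep(letters[11:16], each = 5),
  Num = 1:30,
  stringsAsFactors = FALSE
)

# Flag the character columns so they can be left out of the conversion.
id <- sapply(UN2010, is.character)

# Route 1: a numeric matrix from the non-character columns;
# data.matrix() replaces factors by their integer level codes.
m <- data.matrix(UN2010[!id])

# Route 2: keep a data frame with all columns, converting only the
# non-character ones to numeric.
UN2010frame <- UN2010
UN2010frame[!id] <- lapply(UN2010[!id], as.numeric)

print(dim(m))
str(UN2010frame)
```

Note that the numbers obtained from factors are level codes, so they are only meaningful if the factor levels are in the intended order.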
Try as.data.frame instead of data.matrix.