I was given a dataset and asked to perform binning based on the values of a particular column. The column is stored as a factor; when I try converting it to numeric, I either get NAs introduced by coercion or get the factor's integer codes instead of the data shown in the table.
data$imdbVotes <- as.numeric(as.character(data$imdbVotes))
When I tried this code I got the warning:
Warning message:
NAs introduced by coercion
This is the table provided and I have to perform binning based on IMDB votes.
Hi, nice meeting you outside of Edwisor. What you are doing is perfectly right. There must be some NAs in the file.
For example, if you try tail(data,7) you will see that the value of imdbVotes for the movie Venky is missing. Now we have two options: either get the data for this item, or keep it as NA.
In an ideal scenario when the data is critical, I would extract the data again so that there are no missing values. In this case, I am going to leave it as NA, so it doesn't mess with the calculations.
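A minimal sketch of the whole path from factor to bins, on made-up data. The vote counts, the "N/A" placeholder, the break points and the bin labels are all assumptions for illustration and are not taken from the original table.

```r
# Hypothetical vote counts stored as a factor, with one non-numeric entry.
votes <- factor(c("1200", "35000", "N/A", "870", "152000"))

# Convert via character so we get the printed values, not the factor codes.
# The coercion warning is expected here, so it is silenced on purpose.
num <- suppressWarnings(as.numeric(as.character(votes)))

# Show which original entries could not be converted.
bad <- as.character(votes)[is.na(num)]

# Bin the numeric values; NA stays NA in the result.
bins <- cut(num,
            breaks = c(0, 1000, 10000, 100000, Inf),
            labels = c("low", "medium", "high", "very high"))
```

Checking `bad` before binning shows whether the NAs come from genuinely missing values or from formatting that could be cleaned up first.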
This is the original dataset "hk".
select(hk,date,new_cases)
"new_cases" is the third column in the dataset. Strangely, this call gave the error "'new_cases' not found".
a <- select(hk,new_cases)
Then I tried selecting a single variable, but that failed too: the new dataset was the same as the original dataset after selecting.
select(hk,date)
When picking the "date" variable, it returned NAs.
I'm stuck, as I can't figure out where the problems are.
This is very similar to the following question: R SVM return NA for predictions with missing data
However, the response suggested there does not work (at least for me). Therefore I would like to be more general and try a different approach (or adjust the one proposed there). I can predict using my svm model on the complete.cases() of my data frame. However, it is very important for me to have NA values for all rows with missing data.
My theoretical approach would be the following: predict on the complete.cases() of my data frame, find the indices of the complete cases, and then cbind the column of predictions back to my data.frame(), adding NA for every row whose index is not among the complete cases. In essence, I need to create a data frame column by combining two vectors: one of predictions, the other of NA values (based on known indices). However, I have not managed to write the few lines of code for doing that.
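The approach described above could be sketched as follows. To keep the example runnable in base R, lm() stands in for the svm model; that substitution, the toy data and the column names are assumptions, but the indexing step is the same for any model that has a predict() method.

```r
# Toy data (hypothetical): predictor x has missing values in rows 3 and 6.
df <- data.frame(x = c(1, 2, NA, 4, 5, NA),
                 y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.0))

# Stand-in model; in the question this would be the fitted svm.
fit <- lm(y ~ x, data = df[complete.cases(df), ])

# Logical index of rows whose predictors are all present.
ok <- complete.cases(df[, "x", drop = FALSE])

# Start with a column of NA, then fill in predictions only where possible.
df$pred <- NA_real_
df$pred[ok] <- predict(fit, newdata = df[ok, , drop = FALSE])
```

Because the NA column is created first and only the `ok` positions are overwritten, the predictions stay aligned with their rows and no cbind is needed.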
Got a data set with a bunch of rows (1-285) and columns (x__1-x__70). Trying to select data from rows 12 and 13, starting from column x__5 to x__70.
I can select individual cells WPP[12,"x__5"], full columns WPP[,"x__5"], full rows WPP[12,], full row ranges WPP[12:13,], but can't do column ranges.
I'd like WPP[12:13,"X__5":"x__70"] but I get this:
"Error in "X__5":"x__70" : NA/NaN argument
In addition: Warning messages:
1: In check_names_df(j, x) : NAs introduced by coercion
2: In check_names_df(j, x) : NAs introduced by coercion"
I also encountered this problem while converting some SAS code to R. Unfortunately, R's [ indexing has no mechanism for selecting a range of columns by name.
Hence, it's important that you do it using the available set of tools. In R, you can subset columns by name (you have to give all column names explicitly) or by position index.
Below are the two solutions:
WPP[12:13, paste0("X__", 5:70)] # explicit column names
requiredColIndex <- which(names(WPP) %in% paste0("X__", 5:70))
WPP[12:13, requiredColIndex] # using the index of columns
You can also use the subset function, as pointed out in the comments.
rowIndex <- 1:nrow(mtcars) %in% 12:15
subset(mtcars, rowIndex, select=wt:gear)
But subsetting rows in subset is a bit tricky, as one can observe in the code above. However, you can always use something like the following:
subset(mtcars[12:15, ], select=wt:gear)
which will give the same result.
However, if you are subsetting rows the data-frame way, then it is easier to subset the columns the same way.
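Another option, if the column names do not follow a pattern that paste0 can generate, is to look up the positions of the first and last column with match() and build the range from those. This is only a sketch on a small made-up data frame; the name WPP and the "X__" column names are reused from the question purely for illustration.

```r
# Small stand-in for WPP (hypothetical): 3 rows, columns X__1 to X__8.
WPP <- as.data.frame(matrix(seq_len(3 * 8), nrow = 3))
names(WPP) <- paste0("X__", 1:8)

# Turn the two end-point names into integer positions.
first <- match("X__5", names(WPP))
last  <- match("X__8", names(WPP))

# An integer range works with [ where a name range does not.
res <- WPP[2:3, first:last]
```

match() returns NA when a name is not found, so a typo or a wrong letter case in a column name shows up immediately.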
I am reading a txt file into R and have several columns that should be numeric, but everything is interpreted as character. Now I would like to convert only a few columns within that matrix (I converted it to a matrix in a first step) to numeric. So far I have only managed to extract single columns, and that way I lose the matrix type:
data <- as.numeric(data[,1])
Now, I've found similar questions here but none of the answers worked in the way that it conserved the type matrix.
For example, I've tried to store the affected columns in a vector and then perform the action on that vector with lapply
cols<- c("a","b","d")
data<- as.matrix(lapply(cols, as.numeric))
But this gives me only empty fields, and of course it only shows the columns I selected and not the rest of the matrix. I also got the warning
NAs introduced by coercion
As a last step I tried the following, but I ended up having a list and not a matrix anymore
data[1:25] <- as.matrix(lapply(data[1:25], as.numeric))
What I would like to have, is a matrix where several columns (not just 1:25 as in my example above but rather, say, columns 1,3 and 6) are converted to numeric and the rest stays the same.
Does someone have an answer and maybe even an explanation for why the things I've tried didn't work?
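One point that may explain the attempts above: a matrix in R holds a single type, so as long as any column is character, every column is character. Mixed column types need a data frame. A minimal sketch on made-up data (the values and the choice of columns "a" and "d" are assumptions for illustration):

```r
# Hypothetical character matrix with columns a, b and d.
m <- matrix(c("1", "2", "x", "y", "3.5", "4.5"),
            nrow = 2,
            dimnames = list(NULL, c("a", "b", "d")))

# A data frame can hold a different type in each column.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert only the chosen columns; the rest stay character.
cols <- c("a", "d")
df[cols] <- lapply(df[cols], as.numeric)
```

In the earlier attempt, lapply(cols, as.numeric) was applied to the column names themselves rather than to the columns, which is why it produced only NAs.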
I am very new to R and I am struggling to understand how to omit NA values in a specific way.
I have a large dataframe with several columns (up to 40) and rows (up to 200ish). I want to use data from one of the columns to do simple stats (wilcox.test, boxplot, etc): one column will have a continuous variable (V1), while the other has a binary variable (V2; 0 or 1), which divides 2 groups. I want to do this for the continuous variable using different V2 binary variables, which are unrelated. I organized this data in Excel, saved it as CSV and am using R Studio.
All these columns have interspersed NA values, and when I use na.omit, it removes every single row where an NA value is present, which takes away an awful lot of data. Is there any simple solution to do this? I have seen some answers to similar topics, but none seems quite exactly what I need to do.
Many thanks for any answer. Again, I am a baby-level newbie to R and may have overlooked something in other topics!
If I understand correctly, you want to apply a function to one pair of columns at a time:
wilcox.test(V1,V2)
wilcox.test(V1,V3)...
where the columns have no missing values. I would do something like this:
## use complete.cases to keep only the rows where both columns
## of the selected pair are non-missing
apply_clean <-
function(x,y){
ok <- complete.cases(x, y)
## use the argument y here rather than reaching for dat$V1 directly
wilcox.test(x[ok],y[ok])
}
## apply this function to all columns except the continuous column V1
lapply(subset(dat,select=-V1),apply_clean,y=dat$V1)
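Since the question describes V2 as a 0/1 variable that divides the rows into two groups, a variant that compares V1 between the two groups may be closer to what is wanted. This is only a sketch under that reading; the data frame dat, its values and the helper name test_by_group are made up for illustration.

```r
# Hypothetical data: one continuous column and two unrelated 0/1 grouping columns.
dat <- data.frame(
  V1 = c(1.2, 3.4, NA, 2.2, 5.1, 4.8, 0.9, 6.3),
  V2 = c(0, 1, 0, 0, 1, 1, NA, 1),
  V3 = c(1, NA, 0, 0, 1, 0, 1, 1)
)

# Drop only the rows that are incomplete for this particular pair,
# then compare the continuous values between group 0 and group 1.
test_by_group <- function(group, value) {
  ok <- complete.cases(group, value)
  v <- value[ok]
  g <- group[ok]
  wilcox.test(v[g == 0], v[g == 1])
}

# Run the test once per grouping column and collect the p-values.
results  <- lapply(dat[c("V2", "V3")], test_by_group, value = dat$V1)
p_values <- sapply(results, function(r) r$p.value)
```

Each grouping column loses only its own incomplete rows, so an NA in V3 does not remove a row from the V2 comparison.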
You can manipulate the data.frame to omit based on any rules you like. For example:
dirty.frame <- data.frame(col1 = c(1,2,3,4,5,6,7,NA,9,10), col2 = c(10, 9, 8, 7,6,5,4,3,2,1))
cleaned.frame <- dirty.frame[!is.na(dirty.frame$col1),]
This code uses is.na() to test whether each value in a specific column is NA. The ! means "not", so only the rows where col1 is not NA are kept.