Converting Data Type from data.table package in R - r

this might be a dumb/obvious question but unfortunately I haven't had much luck finding information about it online so I thought I'd ask it here. Basically, I'm working with the data.table package in R and I have imported a data set into R where, in a particular column, the values can be both numeric values and character values (and even blank/empty values), and I want to be able to obtain a value from that column and use it for calculations.
The thing about the data.table package though is that when you import a file using the fread() function it automatically sets all values in that file as a character data type, so this can cause a few issues since this means that all numbers are automatically character types as well. I have worked around this slightly by using the as.numeric() function so that if a value obtained from that column is a number then it can be easily converted to numeric type and used in calculations. However, since the column also contains other characters (specifically, it can also have \N or N as values) and since it can also contain blank/empty values, then this means the as.numeric() function will show up with an error. For example, I initially wrote an IF loop to detect whether a column cell had a character value or a numeric value as follows:
if( as.numeric(..{Reference to column cell from file here}...) == NA ) {
x <- 0
}
(where x is just some variable), but it did not work and instead gave the output:
Error in if ((as.numeric(.... :
missing value where TRUE/FALSE needed
In addition: Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
(I should note that is.numeric() also did not work since all values in a data.table data set are automatically character values so this function always gives FALSE regardless of it's actual data type).
So clearly I need a better function or method to work around this. Is there a function capable of reading a 'character' value from a column and being able to detect whether that value is truly a numeric type or character type (or even neither, in the case of an empty cell)? Thanks in advance

Related

Converting chr to numeric and still not able to take mean

I am working with a dataframe from NYC opendata. On the information page it claims that a column, ACRES, is numeric, but when I download it is chr. I've tried the following:
parks$ACRES <- as.numeric(as.character(parks$ACRES))
which turned the column info type into dbl, but I was unable to take the mean, so I tried:
parks$ACRES <- as.integer(as.numeric(parks$ACRES))
I've also tried sapply() and I get an error message with NAs introduced by coercion. I tried convert() to but R didn't recognize it though it is supposed to be part of dplyr.
Either way I get NA as a result for the mean.
I've tried taking the mean a few different ways:
mean(parks[["ACRES"]])
mean(parks$ACRES)
Which also didn't work? Is it the dataframe? I'm wondering since it is from the government there are limits?
I'd appreciate any help.
You have NAs in your data. Either they were there before you converted or some of the data can't be converted to numeric directly (do you have comma separators for the 1000s in your input? Those need to be removed before converting to numeric).
Identifying why you have NAs and fixing if necessary is the first step you'll need to do. If the NAs are valid then what you want to do is to add the na.rm = TRUE parameter to the mean function which ignores NAs while calculating the mean.
Check to see how ACRES is being loaded in (i.e., what data type is it?). If it's being loaded in as a factor, you will have trouble changing a factor to a numerical value. The way to solve this is to use the 'stringsAsFactors = FALSE' argument in your read.csv or whatever function you're using to read in the data.

How to select column of dataframe using numbers

At the moment I am selecting columns the usual way:
`df$column1`
But I want to loop through the columns of my dataframe into a plot function, and when I use the method:
`df[,1]`
I get the error:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
So what would be the best way to select columns of the dataframe so that I can easily loop through the columns and avoid this error.
You can access dataframe columns using double square brackets e.g. df[[1]] prints the first column. For more info on why/how/etc, see Dave Tang's blog post: https://davetang.org/muse/2013/08/16/double-square-brackets-in-r

Convert characters in an existing dataframe into rownames in R (2018)

I tried the solution given here.
I'm having an identical problem, where my dataframe uses numbers instead of the first column as row names.
When I use the solution from 2012, I get the error:
Error in row.names<-.data.frame(*tmp*, value = value) : invalid 'row.names' length
In addition:
Warning message: Setting row names on a tibble is deprecated.
I saw there is another comment on the original post with the same error, so I think maybe it's an issue with a newer version of R? I would like to know why this error comes up and how I can set my rownames in a dataframe using a column of characters/strings. The end goal is to be able to use these rownames to label the points in a graph (specifically, a prcomp autoplot, if that's important).
I've tried other ways to set my rownames, including loading up the dataframe with just the last two columns while setting the rownames from another dataset, and trying to label plots after the fact with rownames from elsewhere. No matter what, I get the error that my row names are of the wrong length, even though the number of rows is identical.

Error in col2rgb(d) : invalid color name in tweenr

I'm getting this error a lot in using tweenr in RStudio on mac but I'm unable to replicate it using dummy dataset. My dataset is a list of data frames with I want to apply tween_states. Works fine on dummy data, but always return Error in col2rgb(d) : invalid color name and recognise my first character column as a 'color' whenever I use real data.
Hard to be sure, but I think you are passing too many columns to the tweenr function.
The data you send to the tweenr function should be trimmed column wise to only contain the columns used as argument names and one additional column of values that will be tweened
Getting the same issue! I fixed it by making sure the first column only has numbers, no strings. For whatever reason the first column is interpreted as colors if it contains strings. I didn't need to trim any columns down as the other poster suggested.

R: Error in .Primitive, non-numeric argument to binary operator

I did some reading on similar SO questions, but couldn't figure out how to resolve my error.
I have written the following string of code:
points[paste0(score.avail,"_pts")] <-
Map('*', points[score.avail], mget(paste0(score.avail,'_m')) )
Essentially, I have a list of columns in the 'points' data frame, defined by 'score.avail'. I am multiplying each of the columns by a respective constant, defined as the paste0(score.avail, '_m') expression. It appends new fields based on the multiplication, given by paste0(score.avail, "_pts") expression.
I have used this function before in a similar setup with no issues. However, I am now getting the following error:
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
non-numeric argument to binary operator
I'm pretty sure R is telling me that one of the fields I'm trying to multiply is not numeric. However, I have checked all my fields, and they are numeric. I have even tried running a line as.numeric(score.avail) but that doesn't help. I also ran the following to remove NA's in the fields (before the Map function above).
for(col in score.avail){
points[is.na(get(col)) & (data.source == "average" |
data.source == "averageWeighted"), (col) := 0]}
The thing that stumps me is that this expression has worked with no issues before.
Update
I did some more digging by separating out each component of my original function. I'm getting odd output when running points[score.avail]. Previously when I ran this, it would return just the columns for all of my rows. Now, however, I'm getting none of the rows in my original data frame -- rather, it is imputing the column names in the 'score.avail' list as rows and filling in NA's everywhere (this is clearly the source of my problem).
I think this is because I'm using the object I'm pointing to is a data.table with keyvars set. Previously with this function, I had been pointing to a data frame.
Off to try a few more things.
Another Update
I was able to solve my problem by copying the 'points' object using as.data.frame(). However, I will leave the question open to see if anyone knows how to reset the data table key vars so that the function I specified above will work.
I was able to solve my problem by copying the 'points' object using as.data.frame(). Apparently classifying the object as a data.table was causing my headaches.

Resources