How can I convert a list into a dataframe? - r

I wanted to use the left join function on these two datasets.
Let me call the upper one "a":
[screenshot of dataset a]
and the lower one "b":
[screenshot of dataset b]
But it didn't work, and I found that the common variable (column) used to join them has incompatible types. I want to join by "link_id", but in a it is integer while in b it is character.
Both are data frames, as you can see in the pictures, so I thought this would be easy to solve by running as.integer(b). But that returned the error 'list' object cannot be coerced to type 'integer', even though str() shows the object is a data frame...
I also tried as.integer(b$link_id), but that returned the warning NAs introduced by coercion, even though there are no NA values in the data frame. When I ignore this warning and run left_join(a, b), it still fails because of the incompatible types of the key variable...
So what should I do now? I think my best option is finding a way to convert the list to a data frame.

Convert b$link_id to numeric and then use merge:
b$link_id <- as.numeric(b$link_id)
merge(a, b, by = "link_id")
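The NAs-introduced-by-coercion warning the asker saw usually means some values are not cleanly numeric (stray whitespace, text markers, etc.). A minimal sketch of how to find the offending values before converting, using a made-up data frame standing in for b:

```r
# Hypothetical stand-in for b; one value has trailing whitespace and one
# is the literal text "x" rather than a number
b <- data.frame(link_id = c("1001", "1002 ", "x", "1003"),
                stringsAsFactors = FALSE)

# Trim whitespace first, then see which values would still become NA
trimmed <- trimws(b$link_id)
bad <- trimmed[is.na(suppressWarnings(as.numeric(trimmed)))]
print(bad)  # the values that trigger "NAs introduced by coercion"

b$link_id <- as.numeric(trimmed)
```

Inspecting `bad` shows exactly which rows need cleaning; once only genuine numbers remain, the conversion and the join proceed without NAs.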

Related

How to select column of dataframe using numbers

At the moment I am selecting columns the usual way:
`df$column1`
But I want to loop through the columns of my dataframe and pass each one to a plot function, and when I use the method:
`df[,1]`
I get the error:
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous.
Error in is.finite(x) : default method not implemented for type 'list'
So what would be the best way to select columns of the dataframe so that I can easily loop through the columns and avoid this error?
You can access dataframe columns using double square brackets e.g. df[[1]] prints the first column. For more info on why/how/etc, see Dave Tang's blog post: https://davetang.org/muse/2013/08/16/double-square-brackets-in-r
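A minimal sketch of the advice above: `[[` always extracts a column as a plain vector, which is what plotting and summary functions expect inside a loop (with tibbles, `df[, 1]` keeps the data-frame class, which is what triggers the error shown):

```r
df <- data.frame(x = 1:3, y = c(4.5, 5.5, 6.5))

# Loop over columns by position; df[[i]] is a vector, never a data frame
means <- numeric(length(df))
for (i in seq_along(df)) {
  means[i] <- mean(df[[i]])
}
print(means)
```

The same works by name, e.g. `df[["x"]]`, which is handy when looping over `names(df)`.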

Error in col2rgb(d) : invalid color name in tweenr

I'm getting this error a lot when using tweenr in RStudio on a Mac, but I'm unable to replicate it with a dummy dataset. My dataset is a list of data frames to which I want to apply tween_states. It works fine on dummy data, but whenever I use real data it always returns Error in col2rgb(d) : invalid color name and treats my first character column as a 'color'.
Hard to be sure, but I think you are passing too many columns to the tweenr function.
The data you send to the tweenr function should be trimmed column-wise to contain only the columns used as argument names plus one additional column of values to be tweened.
Getting the same issue! I fixed it by making sure the first column only has numbers, no strings. For whatever reason the first column is interpreted as colors if it contains strings. I didn't need to trim any columns down as the other poster suggested.
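A hypothetical sketch of the trimming advice above; the data frame and column names are made up. Here "label" stands in for a character column that tweenr might otherwise try to interpret as a colour:

```r
# One state of the animation; "label" is an extra character column that
# should not be handed to tweenr
state1 <- data.frame(label = c("a", "b"),
                     x = c(1, 2),
                     y = c(3, 4),
                     stringsAsFactors = FALSE)

keep <- c("x", "y")               # only the columns to be tweened
state1_trimmed <- state1[, keep]  # drop the character column before tweening
```

Each data frame in the list would be trimmed the same way before being passed to tween_states.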

Converting Data Type from data.table package in R

This might be a dumb/obvious question, but unfortunately I haven't had much luck finding information about it online, so I thought I'd ask it here. I'm working with the data.table package in R and have imported a data set in which a particular column can hold numeric values, character values, and even blank/empty values, and I want to obtain a value from that column and use it for calculations.
The thing about the data.table package, though, is that when I import the file with fread() it sets every value in that column to the character type, so all the numbers are character values as well. I have partly worked around this with as.numeric(), so that if a value obtained from that column is a number it can be converted to numeric and used in calculations. However, since the column can also contain other characters (specifically \N or N) and blank/empty values, as.numeric() produces an error for those. For example, I initially wrote an if statement to detect whether a column cell holds a character or a numeric value:
if (as.numeric(..{Reference to column cell from file here}...) == NA) {
  x <- 0
}
(where x is just some variable), but it did not work and instead gave the output:
Error in if ((as.numeric(.... :
missing value where TRUE/FALSE needed
In addition: Warning message:
In eval(expr, envir, enclos) : NAs introduced by coercion
(I should note that is.numeric() also did not work, since all values in the data set are stored as character, so it always returns FALSE regardless of a value's actual content.)
So clearly I need a better method. Is there a function that can read a 'character' value from a column and detect whether that value is truly numeric or character (or neither, in the case of an empty cell)? Thanks in advance.
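One common approach (a sketch, not from the thread): attempt the conversion on the whole vector and test which results are NA. Blanks and markers such as "\N" simply fail the conversion, so no special-casing is needed:

```r
# Values as fread might deliver them: numbers, markers, and blanks,
# all stored as character
vals <- c("3.14", "42", "\\N", "N", "")

# TRUE exactly where the text is genuinely numeric; suppressWarnings
# silences the "NAs introduced by coercion" message for the rest
is_num <- suppressWarnings(!is.na(as.numeric(vals)))
print(is_num)  # TRUE TRUE FALSE FALSE FALSE
```

This also avoids the `== NA` comparison from the question, which always yields NA (hence the "missing value where TRUE/FALSE needed" error); use is.na() instead when a single TRUE/FALSE is required.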

R: Error in .Primitive, non-numeric argument to binary operator

I did some reading on similar SO questions, but couldn't figure out how to resolve my error.
I have written the following string of code:
points[paste0(score.avail,"_pts")] <-
Map('*', points[score.avail], mget(paste0(score.avail,'_m')) )
Essentially, I have a list of columns in the 'points' data frame, defined by 'score.avail'. I am multiplying each of those columns by a respective constant, given by the paste0(score.avail, '_m') expression, and appending the results as new fields named by the paste0(score.avail, "_pts") expression.
I have used this function before in a similar setup with no issues. However, I am now getting the following error:
Error in .Primitive("*")(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
non-numeric argument to binary operator
I'm pretty sure R is telling me that one of the fields I'm trying to multiply is not numeric. However, I have checked all my fields and they are numeric. I even tried running as.numeric(score.avail), but that doesn't help. I also ran the following to replace NAs in the fields (before the Map call above):
for(col in score.avail){
points[is.na(get(col)) & (data.source == "average" |
data.source == "averageWeighted"), (col) := 0]}
The thing that stumps me is that this expression has worked with no issues before.
Update
I did some more digging by separating out each component of my original function. I'm getting odd output when running points[score.avail]. Previously this returned just those columns for all of my rows. Now, however, I get none of the rows of my original data frame; instead the column names in the 'score.avail' list are treated as row values and NAs are filled in everywhere (this is clearly the source of my problem).
I think this is because the object I'm pointing to is a data.table with key variables set. Previously this function pointed to a data frame.
Off to try a few more things.
Another Update
I was able to solve my problem by copying the 'points' object using as.data.frame(). However, I will leave the question open to see if anyone knows how to reset the data table key vars so that the function I specified above will work.
I was able to solve my problem by copying the 'points' object using as.data.frame(). Apparently classifying the object as a data.table was causing my headaches.
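A base-R sketch of the pattern from the question on a plain data.frame, where `points[score.avail]` really does select columns (on a keyed data.table the same expression is treated as a row subset, which matches the asker's diagnosis). The column and multiplier names here are hypothetical:

```r
# Plain data.frame: bracket indexing with a character vector selects columns
points <- data.frame(score1 = c(1, 2), score2 = c(3, 4))
score.avail <- c("score1", "score2")
score1_m <- 10    # hypothetical per-column multipliers
score2_m <- 100

# Multiply each column by its constant and append the *_pts fields
points[paste0(score.avail, "_pts")] <-
  Map(`*`, points[score.avail], mget(paste0(score.avail, "_m")))
```

With a data.table, the equivalent column selection would be `points[, ..score.avail]` (or `with = FALSE`), which is why converting with as.data.frame() made the original expression work again.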

How can I specify only some colClasses in sqldf file.format?

I have some CSV files with problematic columns for sqldf, causing some numeric columns to be classed as character. How can I just specify the classes for those columns, and not every column? There are many columns, and I don't necessarily want to have to specify the class for all of them.
Much of the data in these problem columns is zeros, so sqldf reads them as integer when they are really numeric (real). Note that read.csv assigns the classes correctly.
I'm not clever enough to generate a suitable data set that has the right properties (first 50 values zero, then a value of say 1.45 in 51st row), but here's an example call to load the data:
df <- read.csv.sql("data.dat", sql="select * from file",
file.format=list(colClasses=c("attr4"="numeric")))
which returns this error:
Error in sqldf(sql, envir = p, file.format = file.format, dbname = dbname, :
formal argument "file.format" matched by multiple actual arguments
Can I somehow use another read.table call to work out the data types?
Can I read all columns in as character and then convert some to numeric? A small number are genuinely character, and it would be easier to specify those than all of the numeric columns. I have come up with this ugly partial solution, but it still fails on the final line with the same error message:
df.head <- read.csv("data.dat", nrows=10)
classes <- lapply(df.head, class) # also fails to get classes correct
classes <- replace(classes, classes=="integer", "numeric")
df <- read.csv.sql("data.dat", sql="select * from file",
file.format=list(colClasses=classes))
Take a closer look at the documentation for read.csv.sql, specifically at the argument nrows:
nrows: Number of rows used to determine column types. It defaults to 50. Using -1 causes it to use all rows for determining column types.
Another thing you'll note from looking at the documentation for read.csv.sql and sqldf is that there is no colClasses parameter. If you read the file.format documentation in sqldf, you'll see that parameters in the file.format list are not passed to read.table but rather to sqliteImportFile, which has no understanding of R's data types. If you don't like modifying the nrows parameter, you could read the entire dataframe as character type and then use whatever method you like to figure out which column should be which class. You're always going to have the problem of not knowing whether an integer column is integer or numeric until you read the entire column, however. Also, if the speed issue is really killing you here, you may want to consider moving away from CSVs.
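A sketch of the read-as-character-then-convert approach suggested above; the data frame and column names are hypothetical. Since only a few columns are genuinely character, it is easiest to name those and convert everything else:

```r
# Stand-in for a frame read entirely as character
df <- data.frame(id    = c("a", "b", "c"),
                 attr4 = c("0", "0", "1.45"),
                 stringsAsFactors = FALSE)

char_cols <- "id"                           # the few truly character columns
num_cols  <- setdiff(names(df), char_cols)  # everything else becomes numeric
df[num_cols] <- lapply(df[num_cols], as.numeric)
```

This sidesteps the integer-vs-numeric guessing problem entirely, since the target class is stated explicitly rather than inferred from the first rows.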
