R error: "duplicate 'row.names' are not allowed"

I got the error when I wanted to set the first column as the row names:
dt <- fread('../data/data_logTMP.csv', header = T)
rownames(dt) <- dt$GENE
I used duplicated() to check the values:
> which(duplicated(dt$GENE) == TRUE)
[1] 20209 21919
Therefore, I compared these values:
> dt$GENE[20209] == dt$GENE[21919]
[1] FALSE
> dt$GENE[20209]
[1] "1-Mar"
> dt$GENE[21919]
[1] "2-Mar"
Why were these two values recognized as duplicated? And how can I fix this problem?

Because you are reading the file with fread, the object dt is a data.table by default. By design, data.table does not support row names. You therefore need to pass an additional argument to fread, as shown below, so that the object you read is a plain data.frame rather than a data.table.
data.table::fread(input = "file name", sep = ",", header = TRUE, data.table = FALSE)
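On the question of why the two values were flagged: duplicated() marks every element that repeats an earlier element, so rows 20209 and 21919 are each repeats of some earlier row rather than duplicates of each other. A minimal base R sketch, assuming a small made-up vector of gene labels (the values and the value column are hypothetical), shows how to locate all occurrences and, if that suits your data, make the labels unique before using them as row names:

```r
# Hypothetical gene labels: positions 4 and 5 repeat positions 1 and 2.
genes <- c("1-Mar", "2-Mar", "TP53", "1-Mar", "2-Mar")

# duplicated() flags only the later repeats, not the first occurrences.
dup_idx <- which(duplicated(genes))

# All positions that share a label with a flagged element.
all_idx <- which(genes %in% genes[dup_idx])

# One possible fix (an assumption about what you want): append suffixes
# so that every label is unique, then use the result as row names.
df <- data.frame(GENE = genes, value = seq_along(genes))
rownames(df) <- make.unique(genes)
```

Whether renaming is appropriate depends on the data; inspecting the rows in all_idx first may show that the repeated labels should be merged or corrected instead.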


R base::options with variables

I have noticed some behaviour in R's base::options() that I am unable to fully understand.
the following is fine:
> vals_vector
[1] "temp" "hum" "co2" "voc" "pm1" "pm2_5" "pm10"
> options("hum" = TRUE)
> if (getOption("hum")) {
+ print("stuff")
+ }
[1] "stuff"
And this is also fine:
> options(TEMP_ENABLE = "temp" %in% vals_vector)
> getOption("TEMP_ENABLE")
[1] TRUE
However the following does not work.
> options(as.character(vals_vector[1]) = TRUE)
Error: unexpected '=' in "options(as.character(vals_vector[1]) ="
> as.character(vals_vector[1])
[1] "temp"
> "temp"
[1] "temp"
This makes no sense to me. As you can see, I have evaluated the argument and it is exactly the same in both cases; in one I have just used the variable. My intention was to use a loop to set an option for each variable present in a data set. Why doesn't this work as expected?
You can't use function calls as stand-ins for argument names in R. This is nothing to do with options, it's just how the R parser works. Take the following example, using the function data.frame.
data.frame(A_B = 2)
#>   A_B
#> 1   2
Suppose we wanted to generate the name A_B programmatically:
paste("A", "B", sep = "_")
#> [1] "A_B"
Looks good. But if we try to use this function call with the intention that its output is interpreted as an argument name, the parser will simply tell us we have a syntax error:
data.frame(paste("A", "B", sep = "_") = 2)
#> Error: unexpected '=' in "data.frame(paste("A", "B", sep = "_") ="
There are ways round this - with most base R functions we would create a named list programmatically and pass that as an argument list using do.call:
mylist <- list(TRUE)
names(mylist) <- vals_vector[1]
do.call(options, mylist)
getOption("temp")
#> [1] TRUE
However, if you read the docs for options, it says
Options can also be passed by giving a single unnamed argument which is a named list.
So a more concise idiom would be:
options(setNames(list(TRUE), vals_vector[1]))
getOption("temp")
#> [1] TRUE
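For the original goal of setting one option per variable, the same named-list idiom extends to the whole vector without an explicit loop. This is a minimal sketch that reuses vals_vector from the question; the names opts and enabled are introduced here purely for illustration:

```r
# Example vector taken from the question.
vals_vector <- c("temp", "hum", "co2", "voc", "pm1", "pm2_5", "pm10")

# Build a named list with one TRUE per name, then pass it to options()
# as a single unnamed argument.
opts <- setNames(rep(list(TRUE), length(vals_vector)), vals_vector)
options(opts)

# Read the values back to check that each option was set.
enabled <- vapply(vals_vector, function(nm) isTRUE(getOption(nm)), logical(1))
```

Here enabled is a named logical vector with one entry per option name.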

'Error in !`*tmp*` : invalid argument type' error when using !! operator in R

I am trying to programmatically initialize some variables in R so that the variable name would be the evaluated content of the string.
Just this code:
library(dplyr)
v <- 'sum.of.ranfx'
new_v = sym(v)
!!new_v <- vector(mode = "list", length = 122)
fails with
Error in !`*tmp*` : invalid argument type
Google gives me no hits for this exact error. Here is an example accepted and upvoted SO answer whose syntax example I think I am following. Can you tell me what I'm doing wrong?
You may use assign -
v <- 'sum.of.ranfx'
assign(v, vector(mode = "list", length = 122))
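To check the result, get() looks an object up by the same string. The sketch below also shows a named list as an alternative; that alternative is a design suggestion rather than something from the question, and the name results is hypothetical:

```r
v <- "sum.of.ranfx"

# Create a variable whose name is the string stored in v.
assign(v, vector(mode = "list", length = 122))

# get() retrieves the object by the same string.
n <- length(get(v))

# Alternative: keep the objects in a named list instead of creating
# variables dynamically, and index it with the string.
results <- list()
results[[v]] <- vector(mode = "list", length = 122)
```

A named list tends to be easier to loop over and to pass between functions than variables created with assign.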
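A bare !! is not accepted on the left-hand side of an assignment, so you would have to wrap the name in backticks to make it parse. Note that this creates a variable literally named !!new, not one named after the string stored in v: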
`!!new`= vector(mode = "list", length = 122)
I think it's because ! is used to negate a TRUE or a FALSE, for example:
> !TRUE
[1] FALSE
> !!TRUE
[1] TRUE
So when you try to create a variable without marking the ! as part of a name, R tries to apply this negation instead.
Was this understandable?
We may use list2env
v <- 'sum.of.ranfx'
list2env(setNames(list(vector(mode = "list", length = 122)), v), .GlobalEnv)
-checking the object
> head(sum.of.ranfx)
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
For curious future readers, here is what I was doing wrong: attaching the wrong package. I needed library(rlang) in addition to library(dplyr), and I was confused by the fact that one piece of example code on the internet was explicitly showing dplyr being attached, and not rlang, but that's not a feature of the SO answer I linked in the question. When rlang is attached it runs without error.

R - How to determine if every value in column of dataframe is zero?

I have a dataframe and want to determine for a given column if every value in the column is equal to zero.
This is the code I have:
z <- read.zoo(sub, sep = ",", header = TRUE, index = 1:2, tz = "", format = "%Y-%m-%d %H:%M:%S")
if(all.equal(z$C_duration, 0))
C_dur_acf = NA
But I am getting an error:
Error in if (all.equal(z$C_duration, 0)) { :
argument is not interpretable as logical
The code should return a boolean value (TRUE/FALSE) if the entire column is all zeros.
Use all builtin: all(z$C_duration == 0)
Here is an example using the built-in iris dataset and lapply together with all, which lets you test whether all elements of the object you pass to it satisfy a logical condition.
Note that in this case the "object" is a column of the data frame. The code with lapply does the same for every column.
lapply(iris[-5], function(x) all(x == 0))
$Sepal.Length
[1] FALSE
$Sepal.Width
[1] FALSE
$Petal.Length
[1] FALSE
$Petal.Width
[1] FALSE
To use all.equal:
if (isTRUE(all.equal(z$C_duration, rep(0, length(z$C_duration))))) {
C_dur_acf = NA
}
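In essence, all.equal does an element-wise comparison and returns either TRUE or a character string describing the differences, never FALSE. The if statement fails because all.equal(z$C_duration, 0) returns "Numeric: lengths (##, 1) differ", which if cannot interpret as a logical.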
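The difference between the exact and the tolerant test can be seen on plain numeric vectors. This is a minimal sketch with made-up data; x and y are hypothetical stand-ins for the column in the question:

```r
x <- c(0, 0, 0)        # hypothetical all-zero column
y <- c(0, 1e-12, 0)    # hypothetical column with tiny non-zero noise

# A length mismatch makes all.equal() return a description, not FALSE.
res <- all.equal(x, 0)
is_message <- is.character(res)

# Exact test: TRUE only if every value is exactly zero.
exact_x <- all(x == 0)
exact_y <- all(y == 0)

# Tolerant test: isTRUE() turns the "TRUE or message" result into a
# plain logical that is safe to use inside if().
near_y <- isTRUE(all.equal(y, rep(0, length(y))))
```

If the column can contain missing values, all(x == 0) returns NA unless a non-zero value is also present; all(x == 0, na.rm = TRUE) ignores them.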
HTH!

Data.frames in R: name autocompletion?

Sorry if this is trivial. I am seeing the following behaviour in R:
> myDF <- data.frame(Score=5, scoreScaled=1)
> myDF$score ## forgot that the Score variable was capitalized
[1] 1
Expected result: returns NULL (even better: throws error).
I have searched for this, but was unable to find any discussion of this behaviour. Is anyone able to provide any references on this, the rationale on why this is done and if there is any way to prevent this? In general I would love a version of R that is a little stricter with its variables, but it seems that will never happen...
The $ operator needs only a unique prefix of a column name to index a data frame. For example:
> d <- data.frame(score=1, scotch=2)
> d$sco
NULL
> d$scor
[1] 1
A way of avoiding this behavior is to use the [[]] operator, which will behave like so:
> d <- data.frame(score=1, scotch=2)
> d[['scor']]
NULL
> d[['score']]
[1] 1
I hope that was helpful.
Cheers!
Using [,""] instead of $ will throw an error in case the name is not found.
myDF$score
#[1] 1
myDF[,"score"]
#Error in `[.data.frame`(myDF, , "score") : undefined columns selected
myDF[,"Score"]
#[1] 5
myDF[,"score", drop=TRUE] #More explicit and will also work with tibble::as_tibble
#Error in `[.data.frame`(myDF, , "score", drop = TRUE) :
# undefined columns selected
myDF[,"Score", drop=TRUE]
#[1] 5
as.data.frame(myDF)[,"score"] #Will work also with tibble::as_tibble and data.table::as.data.table
#Error in `[.data.frame`(as.data.frame(myDF), , "score") :
# undefined columns selected
as.data.frame(myDF)[,"Score"]
#[1] 5
unlist(myDF[,"score"], use.names = FALSE) #Will work also with tibble::as_tibble and data.table::as.data.table
#Error in `[.data.frame`(myDF, , "score") : undefined columns selected
unlist(myDF[,"Score"], use.names = FALSE)
#[1] 5
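The approaches above can be combined into a small sketch. The helper get_col is hypothetical and written here only for illustration; warnPartialMatchDollar is a base R option that, as far as I know, makes $ warn when it falls back to partial matching:

```r
myDF <- data.frame(Score = 5, scoreScaled = 1)

# $ falls back to partial matching: "score" matches "scoreScaled".
partial <- myDF$score

# [[ ]] matches exactly by default, so a misspelled name gives NULL.
exact <- myDF[["score"]]

# Hypothetical helper that stops with an error on unknown columns.
get_col <- function(df, name) {
  if (!name %in% names(df)) stop("no column named '", name, "'")
  df[[name]]
}

# Ask R to warn whenever $ uses partial matching, then restore the
# previous setting.
old <- options(warnPartialMatchDollar = TRUE)
options(old)
```

Setting the option once at the top of a script or in .Rprofile gives a warning for every partial match without changing existing code.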

Replace NAs in a ffdf object

I'm working with an ffdf object which has NAs in some of the columns. The NAs are the result of a left outer merge using merge.ffdf. I would like to replace the NAs with 0s, but I am not managing to do it.
Here is the code I am running:
library(ffbase)
deals <- merge(deals,rk,by.x=c("DEALID","STICHTAG"),by.y=c("ID","STICHTAG"),all.x=TRUE)
attributes(deals)
$names
[1] "virtual" "physical" "row.names"
$class
[1] "ffdf"
vmode(deals$CREDIT_R)
[1] "double"
idx <- ffwhich(deals,is.na(CREDIT_R)) # CREDIT_R is one of the columns with NAs
deals.strom[idx,"CREDIT_R"]<-0
error in `[<-.ffdf`(`*tmp*`, idx, "CREDIT_R", value = 0) :
ff/ffdf-iness of value and selected columns don't match
Any idea what I am doing wrong? In general I would like to learn more about replacing methods for class ff and ffdf. Any suggestion where I can find some examples about the topic?
The manual of package ff indicates a function called ffindexset.
idx <- is.na(deals$CREDIT_R) ## This uses is.na.ff_vector from ffbase
idx <- ffwhich(idx, idx == TRUE) ## Is part of ffbase
deals$CREDIT_R <- ffindexset(x=deals$CREDIT_R, index=idx, value=ff(0, length=length(idx), vmode = "double")) ## Is part of ff
deals$CREDIT_R[idx] <- ff(0, length=length(idx), vmode = "double") ## this one will probably also work
Also have a look at ?Extract.ff
