Select odd rows from a specific column in a dataframe - r

I have a large df with a specific numeric column named Amount.
df = data.frame(Amount = c(as.numeric(1:14)), stringsAsFactors = FALSE)
I want to select odd rows. So far, I have tried the syntax below, but I always get these error messages:
df$Amount[c(FALSE, TRUE),]
Error in df$Amount[c(FALSE, TRUE), ] : incorrect number of dimensions
seq_len(ncol(df$Amount)) %% 2
Error in seq_len(ncol(df$Amount)) :
argument must be coercible to non-negative integer
In addition: Warning message:
In seq_len(ncol(df$Amount)) :
first element used of 'length.out' argument
odd = seq(1,14,1)
df$Amount[odd,1]
Error in P20$Journal.Amount[even, 1] : incorrect number of dimensions
P20$Journal.Amount[seq(2,length(14), 2),]
Error in seq.default(2, length(14), 2) : wrong sign in 'by' argument
My question is: Is there a way I can do this directly? I tried with the solutions of questions previously posted but so far, I keep having these error messages.
Base R preferably.

Row/column indexing ([i, j]) only applies to objects that have a dim attribute. A vector doesn't have one, which is why df$Amount[i, ] fails with "incorrect number of dimensions".
is.vector(df$Amount)
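If we extract the vector, then just use the row index. A logical index is recycled along the vector, so c(TRUE, FALSE) keeps elements 1, 3, 5, ... (the odd rows), while c(FALSE, TRUE) would keep the even ones.
df$Amount[c(TRUE, FALSE)]
If we want to subset the rows of the dataset,
df[c(TRUE, FALSE), 'Amount', drop = FALSE]
In the code above, we specify the row index (i), the column index or column name (j), and drop. As described in ?Extract, when a single column is selected from a data.frame, drop defaults to TRUE and the result is coerced to a vector. Specifying drop = FALSE keeps the dimensions, so the result stays a data.frame.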
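As a minimal sketch of the same idea with explicit positions, the odd-numbered rows (1, 3, 5, ...) can also be picked with seq(). The names odd_idx, odd_vec and odd_df are hypothetical and only used for illustration.

```r
# Toy data mirroring the question
df <- data.frame(Amount = as.numeric(1:14))

# Explicit odd row positions: 1, 3, 5, ...
odd_idx <- seq(1, nrow(df), by = 2)

# Indexing the extracted vector needs a single index
odd_vec <- df$Amount[odd_idx]

# Indexing the data.frame needs [i, j]; drop = FALSE keeps it a data.frame
odd_df <- df[odd_idx, "Amount", drop = FALSE]

# The recycled logical index selects the same elements
identical(odd_vec, df$Amount[c(TRUE, FALSE)])
```

Using nrow(df) instead of a hard-coded 14 keeps the index valid if the number of rows changes.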

Related

can't attach row and col names into a dataframe in r

I have a dataframe matrix with 31053 obs and 4909 variables. I have two separate dataframes, barcodes.tsv and featurescorrected with 4909 and 31053 rows, respectively, which are the row and col names of this dataframe. I am trying to attach them with the following
barcodes.tsv <- t(barcodes.tsv)
row.names(matrix) = featurescorrected
col.names(matrix) = barcodes.tsv
But I get these two errors
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
> col.names(matrix) = barcodes.tsv
Error in col.names(matrix) = barcodes.tsv :
could not find function "col.names<-"
I don't understand how the length is not correct, as it has the exact same value as my dataframe. I also don't get why the col.names function is not found, as far as I know this is not from a package or anything like it
What am I doing wrong?
As @AdroMine said in the comments, the function you need is colnames, not col.names. You can use this code:
barcodes.tsv <- t(barcodes.tsv)
row.names(matrix) = featurescorrected
colnames(matrix) = barcodes.tsv
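The answer above does not address the "invalid 'row.names' length" error. One likely cause, assuming featurescorrected is a one-column data frame, is that the length of a data frame is its number of columns (here 1), not its number of rows. A hedged sketch with small hypothetical stand-ins (mat_df, features, barcodes) for the objects in the question:

```r
# Hypothetical small stand-ins for the objects in the question
mat_df   <- data.frame(matrix(0, nrow = 3, ncol = 2))
features <- data.frame(V1 = c("geneA", "geneB", "geneC"))  # one column, 3 rows
barcodes <- data.frame(V1 = c("cell1", "cell2"))           # one column, 2 rows

# A data.frame's length is its number of columns, so this is 1, not 3
length(features)

# Pull the names out as plain character vectors before assigning them
rownames(mat_df) <- as.character(features[[1]])
colnames(mat_df) <- as.character(barcodes[[1]])
```

If the names really are stored in the first column, extracting that column also makes the t() step unnecessary.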

Subset data.table based on value in column of type list

So I have this case currently of a data.table with one column of type list.
This list can contain different values, NULL among other possible values.
I tried to subset the data.table to keep only rows for which this column has the value NULL.
Behold... my attempts below (for the example I named the column "ColofTypeList"):
DT[is.null(ColofTypeList)]
It returns me an Empty data.table.
Then I tried:
DT[ColofTypeList == NULL]
It returns the following error (I expected an error):
Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), :
RHS of == is length 0 which is not 1 or nrow (96). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.
(To be precise: my original data.table contains 96 rows, which is why the error message says
which is not 1 or nrow (96).
The number of rows is not the point.)
Then I tried this:
DT[ColofTypeList == list(NULL)]
It returns the following error:
Error: comparison of these types is not implemented
I also tried to give a list of the same length as the column, and got this same last error.
So my question is simple: What is the correct data.table way to subset the rows for which elements of this "ColofTypeList" are NULL ?
EDIT: here is a reproducible example
DT<-data.table(Random_stuff=c(1:9),ColofTypeList=rep(list(NULL,"hello",NULL),3))
Have fun!
If it is a list, we can loop through its elements and apply is.null to each one, which returns a logical vector
DT[unlist(lapply(ColofTypeList, is.null))]
# ColofTypeList anotherCol
#1: 3
Or another option is lengths
DT[lengths(ColofTypeList)==0]
data
DT <- data.table(ColofTypeList = list(0, 1:5, NULL, NA), anotherCol = 1:4)
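A slightly stricter variant of the first approach, shown here only as a sketch, uses vapply(), which checks that is.null returns exactly one logical value per list element. It reuses the example data above and assumes the data.table package is installed; null_rows is a hypothetical name.

```r
library(data.table)

DT <- data.table(ColofTypeList = list(0, 1:5, NULL, NA), anotherCol = 1:4)

# vapply() returns a logical vector with one value per row of the list column
null_rows <- DT[vapply(ColofTypeList, is.null, logical(1))]

# Only the row whose list element is NULL remains
null_rows$anotherCol
```

Note that the NA element is not NULL, so that row is not selected.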
I have found another way that is also quite nice:
DT[lapply(ColofTypeList, is.null)==TRUE]
It is also important to mention that using isTRUE() doesn't work here: it returns a single TRUE or FALSE for its whole argument rather than one value per row.

How to properly apply RowMeans()? "X is not numeric" error

I have two columns within OtherIncludedClean, and I would like to add another column of OtherIncludedClean$Mean; however, my efforts are in vain.
I have tried:
OtherIncludedClean$mean <- rowMeans(OtherIncludedClean, na.rm = FALSE, dims = 1)
But, the above reports the error:
"Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
'x' must be numeric"
I have also attempted:
OtherIncludedClean$mean <- apply(OtherIncludedClean, 1, function(x) { mean(x, na.rm=TRUE) })
Which reports this error:
"1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA"
For all 141 rows.
Any and all help appreciated. Thank you .
My columns are "X__1" and "X__2"
When we get the error "'x' must be numeric", it is best to check the column types. An easy option is
str(OtherIncludedClean)
If the columns turn out to be character or factor rather than numeric/integer, we need to convert them to numeric. (This assumes most values in a column are numeric and that one or two non-numeric elements changed the column's type.)
Convert with as.numeric. For a single character column, use as.numeric(data$columnname); for a factor column, convert to character first:
as.numeric(as.character(data$columnname))
Here, we need to change all the columns to numeric (assuming they are character class). For that, loop through the columns with lapply and assign the output back to the dataset
OtherIncludedClean[] <- lapply(OtherIncludedClean, as.numeric)
and then apply the rowMeans
If only a subset of the columns is character, then we only need to loop through those columns
i1 <- !sapply(OtherIncludedClean, is.numeric)
OtherIncludedClean[i1] <- lapply(OtherIncludedClean[i1], as.numeric)
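Putting the steps together, here is a minimal sketch on made-up data. The values are hypothetical; only the column names X__1 and X__2 come from the question. Restricting rowMeans to the two original columns is a choice made for this example.

```r
# Hypothetical data: numbers stored as text, using the question's column names
OtherIncludedClean <- data.frame(
  X__1 = c("1", "2", "3"),
  X__2 = c("4", "5", NA),
  stringsAsFactors = FALSE
)

# Convert only the non-numeric columns; as.character() also covers factor columns
i1 <- !sapply(OtherIncludedClean, is.numeric)
OtherIncludedClean[i1] <- lapply(
  OtherIncludedClean[i1],
  function(x) as.numeric(as.character(x))
)

# Row means over the two original columns, ignoring missing values
OtherIncludedClean$mean <- rowMeans(
  OtherIncludedClean[c("X__1", "X__2")],
  na.rm = TRUE
)
```

Selecting the columns by name keeps the new mean column out of the calculation if the line is run again.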

Removing rows in list (with ff)

I have a data set where I want to remove every row in which Dataset$a does not have the value "Right". Dataset$a is a list with three different objects: "Right", "Wrong1" and "Wrong2". I tried to do this by using the code:
Dataset$a <- subset.ffdf(Dataset, a == "Right")
But I get the error
Error in if (any(B < 1)) stop("B too small") : missing value where TRUE/FALSE needed
In addition: Warning message:
In bbatch(n, as.integer(BATCHBYTES/theobytes)) :
NAs introduced by coercion to integer range
What should I do instead?
The warning that is returned has something to do with working with a large data set.
see here
Before filtering, check whether column a is a factor or a character vector using:
str(Dataset$a)
If it is in character format, this should work (== keeps the rows where a is "Right" and drops the rest)
finalDf <- Dataset[Dataset$a == "Right", ]
Or you can use dplyr like so:
library(dplyr)
newData <- Dataset %>% dplyr::filter(a == "Right")

How to transform data table columns, indexed by position, by reference?

I have a data.table that houses several columns of factors. I'd like to convert 2 columns originally read as factors to their original numeric values. Here's what I've tried:
data[, c(4,5):=c(as.numeric(as.character(4)), as.numeric(as.character(5))), with=FALSE]
This gives me the following warnings:
Warning messages:
1: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Supplied 2 items to be assigned to 7 items of column 'Bentley (R)' (recycled leaving remainder of 1 items).
2: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Supplied 2 items to be assigned to 7 items of column 'Sparks (D)' (recycled leaving remainder of 1 items).
3: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
4: In `[.data.table`(data, , `:=`(c(4, 5), c(as.numeric(as.character(4)), :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
Also, I can tell the conversion has not succeeded because the 4th and 5th columns are still factors after this code has run.
As an alternative, I tried this code, which won't run at all:
data[, ':=' (4=c(as.numeric(as.character(4)), 5 = as.numeric(as.character(5)))), with=FALSE]
Finally, I tried referencing the column names via colnames:
data[ , (colnames(data)[4]) := as.numeric(as.character(colnames(data)[4]))]
This runs but results in a column of NAs as well as the following warnings:
Warning messages:
1: In eval(expr, envir, enclos) : NAs introduced by coercion
2: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.
3: In `[.data.table`(data, , `:=`((colnames(data)[4]), as.numeric(as.character(colnames(data)[4])))) :
RHS contains -2147483648 which is outside the levels range ([1,6]) of column 1, NAs generated
I need to do this by position and not by column name, since the column name will depend on the URL. What's the proper way to transform columns by position using data.table?
I also have a related query, which is how to transform numbered columns relative to other numbered columns. For example, if I want to set the 3rd column to be equal to 45 minus the value of the 3rd column plus the value of the 4th column, how would I do that? Is there some way to distinguish between a real # vs a column number? I know something like this is not the way to go:
dt[ , .(4) = 45 - .(3) + .(4), with = FALSE]
So then how can this be done?
If you want to assign by reference and position, you need to get the column names to assign to as a character vector or the column numbers as an integer vector and use .SDcols (at least in data.table 1.9.4).
First a reproducible example:
library(data.table)
DT <- data.table(iris)
DT[, c("Sepal.Length", "Petal.Length") := list(factor(Sepal.Length), factor(Petal.Length))]
str(DT)
Now let's convert the columns:
DT[, names(DT)[c(1, 3)] := lapply(.SD, function(x) as.numeric(as.character(x))),
.SDcols = c(1, 3)]
str(DT)
Alternatively:
DT[, c(1,3) := lapply(.SD, function(x) as.numeric(as.character(x))), .SDcols=c(1,3)]
str(DT)
Note that := expects a vector of column names or positions on the left side and a list on the right side.
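The follow-up question (setting the 3rd column to 45 minus the 3rd column plus the 4th) is not covered by the answer above. One possible approach, offered as a sketch rather than as the answerer's method, is data.table's set(), which takes the target column as an integer position, combined with [[ ]] to read columns by position. A bare 45 is then simply a number. The names old3 and old4 are hypothetical and exist only to check the result.

```r
library(data.table)

DT <- data.table(iris)

# Keep copies of the input columns so the result can be checked afterwards
old3 <- DT[[3L]]
old4 <- DT[[4L]]

# j = 3L is a column position; DT[[3L]] and DT[[4L]] read columns by position;
# 45 is an ordinary numeric constant. set() modifies DT by reference.
set(DT, j = 3L, value = 45 - DT[[3L]] + DT[[4L]])

all.equal(DT[[3L]], 45 - old3 + old4)
```

Because columns are only ever referred to through j or [[ ]], there is no ambiguity between a column number and a literal number.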
