can't attach row and col names into a dataframe in r

I have a data frame named matrix with 31053 observations and 4909 variables. I have two separate data frames, barcodes.tsv and featurescorrected, with 4909 and 31053 rows respectively, which hold the column and row names of this data frame. I am trying to attach them with the following:
barcodes.tsv <- t(barcodes.tsv)
row.names(matrix) = featurescorrected
col.names(matrix) = barcodes.tsv
But I get these two errors
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
> col.names(matrix) = barcodes.tsv
Error in col.names(matrix) = barcodes.tsv :
could not find function "col.names<-"
I don't understand how the length can be wrong, since featurescorrected has exactly as many rows as my data frame. I also don't get why the col.names function is not found; as far as I know it does not come from a package.
What am I doing wrong?

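As @AdroMine said in the comments, your col.names function needs to be colnames. The row.names error most likely arises because featurescorrected is a data frame, whose length is its number of columns (1) rather than 31053, so pass the column that holds the names instead of the whole data frame (assuming the names are in the first column). You can use this code:
barcodes.tsv <- t(barcodes.tsv)
row.names(matrix) = featurescorrected[[1]]
colnames(matrix) = barcodes.tsv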
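
A minimal sketch of the same idea on toy data. The objects m, features and barcodes are hypothetical stand-ins for matrix, featurescorrected and barcodes.tsv, and the sketch assumes the names sit in the first column of each one-column data frame:

```r
# Toy stand-ins (hypothetical): 3 rows x 2 columns instead of 31053 x 4909
m <- as.data.frame(matrix(1:6, nrow = 3, ncol = 2))
features <- data.frame(V1 = c("geneA", "geneB", "geneC"), stringsAsFactors = FALSE)
barcodes <- data.frame(V1 = c("cell1", "cell2"), stringsAsFactors = FALSE)

# length() of a data frame is its number of columns, not its number of rows,
# which is why assigning the whole data frame as row names fails
n_as_seen_by_row_names <- length(features)

# Extract the column as a plain character vector before assigning
rownames(m) <- features[[1]]
colnames(m) <- barcodes[[1]]
```

Extracting with [[1]] also makes the t() call unnecessary, since a plain vector has no orientation.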

Related

rowsum of last column of dataframe

I have a function below to calculate a summary. I want to calculate the sum of the last row (the last row can have many columns and can also contain NA). Is there a solution for this?
dataa<-data.frame(
aa = c("q","r","y","v","g","y","d","s","n","k","y","d","s","t","n","u","l","h","x","c","q","r","y","v","g","y","d","s","n","k","y","d","s","t","n","u","l","h","x","c"),
col1=c(1,2,3,2,1,2,3,4,4,4,5,3,4,2,1,2,5,3,2,1,2,4,2,1,3,2,1,2,3,1,2,2,4,4,4,1,2,5,3,5),
col2=c(2,1,1,7,4,1,2,7,5,7,2,6,2,2,6,3,4,3,2,5,7,5,6,4,4,6,5,6,4,1,7,3,2,7,7,2,3,7,2,4)
)
df <- database %>% select(!!var1,!!var2)
tab1 <- expss::cro_cpct(df[[1]],df[[2]])
Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables

Since your first column contains the string #Total cases, sum will throw an error. Excluding the first column will work. Also, adding na.rm = TRUE will ignore NAs:
sum(tab1[nrow(tab1), -1], na.rm = TRUE)
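
The same pattern can be tried without expss on a plain data frame. This is only a sketch: tab is a hypothetical table whose first column holds labels and whose last row is a totals row, roughly imitating the shape of the cross-tabulation above:

```r
# Hypothetical table: first column is text, remaining columns are numeric
tab <- data.frame(
  row_labels = c("q", "r", "#Total cases"),
  a = c(10, 20, 30),
  b = c(5, NA, 12),
  c = c(1, 2, NA),
  stringsAsFactors = FALSE
)

# Take the last row, drop the label column, and ignore missing values
last_row_total <- sum(tab[nrow(tab), -1], na.rm = TRUE)
```

Keeping the label column in the selection would reproduce the "only defined on a data frame with all numeric variables" error.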

Subset data.table based on value in column of type list

So I have this case currently of a data.table with one column of type list.
This list can contain different values, NULL among other possible values.
I tried to subset the data.table to keep only rows for which this column has the value NULL.
Behold... my attempts below (for the example I named the column "ColofTypeList"):
DT[is.null(ColofTypeList)]
It returns me an Empty data.table.
Then I tried:
DT[ColofTypeList == NULL]
It returns the following error (I expected an error):
Error in .prepareFastSubset(isub = isub, x = x, enclos = parent.frame(), :
RHS of == is length 0 which is not 1 or nrow (96). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.
(A clarification: my original data.table contains 96 rows, which is why the error message says "which is not 1 or nrow (96)". The number of rows is not the point.)
Then I tried this:
DT[ColofTypeList == list(NULL)]
It returns the following error:
Error: comparison of these types is not implemented
I also tried supplying a list of the same length as the column, and got the same error.
So my question is simple: what is the correct data.table way to subset the rows for which the elements of this "ColofTypeList" are NULL?
EDIT: here is a reproducible example
DT<-data.table(Random_stuff=c(1:9),ColofTypeList=rep(list(NULL,"hello",NULL),3))
Have fun!

If it is a list, we can loop through the list and apply the is.null to return a logical vector
DT[unlist(lapply(ColofTypeList, is.null))]
# ColofTypeList anotherCol
#1: 3
Or another option is lengths
DT[lengths(ColofTypeList)==0]
data
DT <- data.table(ColofTypeList = list(0, 1:5, NULL, NA), anotherCol = 1:4)

I have found another way that is also quite nice:
DT[lapply(ColofTypeList, is.null)==TRUE]
It is also important to mention that using isTRUE() doesn't work, because it returns a single TRUE or FALSE for the whole object rather than one value per row.
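
To see why the first attempt returns nothing, it may help to look at the element-wise tests on a plain list, without data.table. A small base R sketch, where col is a hypothetical stand-in for the list column:

```r
# Hypothetical stand-in for the list column
col <- list(NULL, "hello", NULL, NA)

# is.null() on the whole list tests the list itself, so it is a single FALSE
whole_list_is_null <- is.null(col)

# Element-wise test: one logical per element, suitable as a row filter
is_null_elem <- vapply(col, is.null, logical(1))

# lengths() gives the same answer here, but it would also flag
# zero-length vectors such as character(0)
is_empty <- lengths(col) == 0
```

vapply is used instead of unlist(lapply(...)) only because it guarantees a logical vector of the right length; either form can be placed inside DT[...].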

Select odd rows from a specific column in a dataframe

I have a large df with a specific numeric column named Amount.
df = data.frame(Amount = c(as.numeric(1:14)), stringsAsFactors = FALSE)
I want to select the odd rows. So far I have tried the syntax below, but I always get these error messages:
df$Amount[c(FALSE, TRUE),]
Error in df$Amount[c(FALSE, TRUE), ] : incorrect number of dimensions
seq_len(ncol(df$Amount)) %% 2
Error in seq_len(ncol(df$Amount)) :
argument must be coercible to non-negative integer
In addition: Warning message:
In seq_len(ncol(df$Amount)) :
first element used of 'length.out' argument
odd = seq(1,14,1)
df$Amount[odd,1]
Error in P20$Journal.Amount[even, 1] : incorrect number of dimensions
P20$Journal.Amount[seq(2,length(14), 2),]
Error in seq.default(2, length(14), 2) : wrong sign in 'by' argument
My question is: is there a way I can do this directly? I tried the solutions from previously posted questions, but I keep getting these error messages.
Base R preferably.

The row/column index form x[i, j] only applies to objects that have a dim attribute. df$Amount is a plain vector, which has none.
is.vector(df$Amount)
If we extract the vector, then just use a single index. Note that c(TRUE, FALSE) is recycled to keep the odd positions (1, 3, 5, ...), whereas c(FALSE, TRUE) keeps the even ones.
df$Amount[c(TRUE, FALSE)]
If we want to subset the rows of the dataset,
df[c(TRUE, FALSE), 'Amount', drop = FALSE]
In the above code, we specify the row index (i), the column index or column name (j), and drop (see ?Extract). When a single column is selected from a data.frame, drop defaults to TRUE and the result collapses to a vector, so we need drop = FALSE to keep the dimensions.
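
A short sketch using the df from the question, showing the vector form, the data frame form, and an explicit index as a third option (the variable names are only illustrative):

```r
df <- data.frame(Amount = as.numeric(1:14), stringsAsFactors = FALSE)

# Logical index recycled along the vector: keeps positions 1, 3, 5, ...
odd_values <- df$Amount[c(TRUE, FALSE)]

# Same rows, but keeping the result as a one-column data frame
odd_rows <- df[c(TRUE, FALSE), "Amount", drop = FALSE]

# Equivalent explicit index, built from the number of rows
odd_by_index <- df$Amount[seq(1, nrow(df), by = 2)]
```

Swapping the logical index to c(FALSE, TRUE) would return the even rows instead.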

Trying to figure out how to return the mean value of each column in a data frame using a list

I have a data frame which shows the average life expectancy of each country from 1800 to 2018. The columns are labeled XYear, for example X2000. I made a function which returns the mean value of a selected column. Here's the part I'm struggling with: the assignment asks me to create a list which has the mean value of every column in the data frame, using the aforementioned function.
I tried making a list element which would select all rows and columns except for the first ones (selecting them with [-1,-1]).
life_exp <- read.csv("data/life_expectancy_years.csv", stringsAsFactors = FALSE)
Write a function get_col_mean() that takes in a column name and a data frame and returns the mean of that column. Make sure to properly handle NA values
get_col_mean <- function(col_name, data_frame_name) {
return(mean(data_frame_name[, col_name], na.rm = TRUE))
}
Create a list col_means that has the mean value of each column in the data frame (except the Country column). You should use your function above.
I tried this:
column_means = get_col_mean(life_exp$life_exp[, -1], life_exp)
But I got this error message:
In mean.default(data_frame_name[, col_name], na.rm = TRUE) :
argument is not numeric or logical: returning NA

I believe you are misusing the $ operator. It is used to grab a single column by name, and it returns NULL when the column does not exist.
#data frame
z <- data.frame(l = c(1,2,3,4), y = c(4,3,2,3), c = c(1,'',3,4))
z$l
[1] 1 2 3 4
z$z
NULL
#numeric (note that I am providing the column name as a string)
get_col_mean("l", z)
#output
[1] 2.5
#this is the same as putting NULL in
get_col_mean(z$z, z)
#your presumed error
[1] NA
Warning message:
In mean.default(data_frame_name[, col_name], na.rm = TRUE) :
argument is not numeric or logical: returning NA
If you are looking to apply this to each column, a for loop or the apply family of functions is likely what you are looking for.
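
One way this could look with lapply, sketched on a small made-up data frame in place of the CSV (the values and the year_cols name are hypothetical; get_col_mean is the function from the question):

```r
# Made-up stand-in for life_expectancy_years.csv
life_exp <- data.frame(
  Country = c("A", "B", "C"),
  X1800 = c(30, 32, NA),
  X1801 = c(31, 33, 35),
  stringsAsFactors = FALSE
)

get_col_mean <- function(col_name, data_frame_name) {
  return(mean(data_frame_name[, col_name], na.rm = TRUE))
}

# Column names as strings, skipping the Country column
year_cols <- names(life_exp)[-1]

# Call get_col_mean once per column name; the result is a list
col_means <- lapply(year_cols, get_col_mean, data_frame_name = life_exp)
names(col_means) <- year_cols
```

Passing the column names as strings, rather than the columns themselves, matches how get_col_mean indexes the data frame.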

For Loop to convert string to list

I have a column in a data frame which contains string values. I want to convert these values to lists of characters. When I try to execute the following code:
library(tidyverse)
col <- c("a,b,c,d","e,f,h")
df <- data_frame(col)
for (i in 1:length(df$col)) {
df$col[[i]] <- as.vector(unlist(strsplit(df$col[[i]],",")),mode ="list")
}
I get this error message:
Error in df$col[[i]] <- as.vector(unlist(strsplit(df$col[[i]], ",")), : more elements supplied than there are to replace
Is there a way to convert all the values in the column to lists?
Thanks

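If I understand your question correctly, no loop is needed: strsplit() is vectorized and already returns a list with one character vector per input string, so this will do the trick:
df$col <- strsplit(df$col, ",")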
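
A base R sketch of a loop-free approach, assuming a plain data.frame rather than a tibble (parts and as_lists are illustrative names). strsplit() works on the whole column at once and returns one character vector per row, which can be stored as a list column:

```r
df <- data.frame(col = c("a,b,c,d", "e,f,h"), stringsAsFactors = FALSE)

# One character vector per input string; fixed = TRUE treats "," literally
parts <- strsplit(df$col, ",", fixed = TRUE)

# Store the pieces as a list column (one element per row)
df$col <- parts

# If each row should be a list of single characters rather than a
# character vector, convert each element separately
as_lists <- lapply(parts, as.list)
```

The original loop fails because df$col is a character vector, and a single element of a character vector cannot hold several values.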
