data.table subsetting in i with column number [duplicate] - r

This question already has answers here:
data.table in r : subset using column index
(2 answers)
Closed 5 years ago.
Is it possible to subset a data.table in i, referencing the column not by its name (e.g. by number/position)?
Example:
library(data.table)
dt <- data.table(A=1:18, Name=c('A','B','C'))
dt2 <- data.table(A=2:19, Username=c('A','B','C'))
#stuff happens and eventually I end up with either dt or dt2 copied to a final dt, e.g.
final <- dt
#depending on which is there, I want to get only "A"s
final[Name=='A']      # works only when final came from dt
final[Username=='A']  # works only when final came from dt2
But I want a way to subset both data.tables with the same call despite the different column names. One potential solution is to set the key of each table to Name or Username respectively and then subset with final['A'], but I am wondering if there is another way.
I can't change the column names because they are going into a table in a shiny app.
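The keyed approach mentioned above can be made independent of the column name by choosing the key column by position. A minimal sketch, assuming the column to filter on is always the second one (as in dt and dt2); setkeyv() takes the key as a character vector, so names(final)[2] can be passed directly. Note that setting a key sorts the table by reference.

```r
library(data.table)

# Assumption: the column to filter on is always in position 2
final <- data.table(A = 2:19, Username = c("A", "B", "C"))

# setkeyv() accepts a character vector, so the key name can be looked up by position
setkeyv(final, names(final)[2])

# With a single character key, final["A"] joins on that key column
only_a <- final["A"]
```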

If this is based on position, extract the column by its numeric index with [[, compare it with the value to get a logical vector, and use that vector to subset the rows:
final[final[[2]]=="A"]
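To make the same call work whichever table ends up in final, the positional subset can be wrapped in a small function. This is a sketch; subset_by_position() is a hypothetical helper name, and it assumes the column to test sits at the same position in every table.

```r
library(data.table)

dt  <- data.table(A = 1:18, Name     = c("A", "B", "C"))
dt2 <- data.table(A = 2:19, Username = c("A", "B", "C"))

# Hypothetical helper: keep the rows whose column at position `pos` equals `value`.
# Caveat: i is evaluated inside the table, so a column literally named
# x, pos or value would shadow these arguments.
subset_by_position <- function(x, pos, value) {
  x[x[[pos]] == value]
}

res1 <- subset_by_position(dt, 2, "A")   # filters on Name
res2 <- subset_by_position(dt2, 2, "A")  # filters on Username
```

Because [[ returns a plain vector, the comparison yields the logical vector that i expects, regardless of what the column is called.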

Related

dplyr filter-brand new to R, how do I filter out multiple values from a single column [duplicate]

This question already has answers here:
Opposite of %in%: exclude rows with values specified in a vector
(13 answers)
Closed last year.
How can I filter out (exclude) rows based on a single column called "record"? I would like to exclude the record values 1, 2, 3, 6, 8, 10, 15 and 16. The dataset is named "sample". Sorry for the simple question; I am brand new to R.
sample data set below
The dplyr package from the tidyverse is very helpful for these types of problems.
library(dplyr)
# keep only the rows whose record value is NOT in the exclusion vector
sample_filtered <- sample %>%
filter(!(record %in% c(1,2,3,6,8,10,15,16)))
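A self-contained version of the same filter, with hypothetical toy data (sample_df) in place of the asker's data set, which is not shown here, plus the equivalent base-R subset for comparison:

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's "sample" data set
sample_df <- data.frame(record = 1:20, value = letters[1:20])

to_exclude <- c(1, 2, 3, 6, 8, 10, 15, 16)

# dplyr: keep rows whose record is NOT in to_exclude
kept_dplyr <- sample_df %>% filter(!(record %in% to_exclude))

# Base R: the same logical vector used as a row index
kept_base <- sample_df[!(sample_df$record %in% to_exclude), ]
```

One difference from chaining != comparisons: NA %in% c(1, 2) is FALSE, so !(record %in% ...) keeps rows where record is NA, whereas filter(record != 1) would drop them.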

creating, directly, data.tables with column names from variables, and using variables for column names with := [duplicate]

This question already has answers here:
Select / assign to data.table when variable names are stored in a character vector
(6 answers)
Closed 3 years ago.
The only way I know so far takes two steps: creating the columns with dummy names and then using setnames(). I would like to do it in one step; there is probably some parameter or option for this, but I have not been able to find it.
library(data.table)
# the awkward way I have found so far
col_names <- c("one", "two","three")
dt <- data.table()
# add columns with dummy names...
setnames(dt, col_names )
I am also interested in a way to use a variable with :=, something like
colNameVar <- "dummy_values"
DT[ , colNameVar := 1:10]  # creates a column literally named "colNameVar", not "dummy_values"
To me this question does not seem to be a duplicate of "Select / assign to data.table when variable names are stored in a character vector": here I am asking about creating a data.table (note the word "creating" in the title).
That is quite different from working with a data.table that already exists, which is the subject of the question marked as the duplicate. For that case there are clearly documented ways, but they do not work in the case I am asking about here.
PS. Note the similar question pointed out in a comment by @Ronak Shah: Create empty data frame with column names by assigning a string vector?
For the first question, I'm not absolutely sure, but you may want to check whether fread helps in creating an empty data.table with named columns.
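If fread does not fit, another way to set the names in a single step is to build a named list of empty columns and convert it. A minimal sketch, not taken from the answer; the numeric() column type is an assumption, so swap in character(), integer(), etc. as needed:

```r
library(data.table)

col_names <- c("one", "two", "three")

# One zero-length numeric column per name, named in the same expression,
# then converted to a (0-row) data.table
dt <- as.data.table(setNames(rep(list(numeric()), length(col_names)), col_names))
```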
As for the second question, try
DT[, c(nameOfCols) := 10]
where nameOfCols is the vector of names of the columns you want to create or modify. See ?data.table
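A runnable sketch of the second question. DT here is a hypothetical 10-row table. Wrapping the left-hand side in parentheses, (colNameVar) := ..., is the commonly recommended way to make data.table treat it as a variable holding a name; the c(nameOfCols) := ... form from the answer has the same effect and also covers several names at once.

```r
library(data.table)

# Hypothetical 10-row table so that 1:10 fits the row count
DT <- data.table(id = 1:10)
colNameVar <- "dummy_values"

# Parentheses: the LHS is evaluated, so the new column is named "dummy_values"
DT[, (colNameVar) := 1:10]

# c() on the LHS also forces evaluation; the single RHS value is recycled to each column
nameOfCols <- c("x", "y")
DT[, c(nameOfCols) := 10]
```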

Loop Through Column Names with Similar Structure [duplicate]

This question already has answers here:
How to extract columns with same name but different identifiers in R
(3 answers)
Closed 3 years ago.
I have a very large dataset. A small subset of its columns share the same base name followed by a numeric index (unlike the post "How to extract columns with same name but different identifiers in R", where the identifier is a string). For example:
Q_1_1, Q_1_2, Q_1_3, ...
I am looking for a way to either loop through just those columns using the indices or to subset them all at once.
I have tried to use paste() to build their column names, but have had no luck. See the sample code below:
Define Dataframe
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5))
Define the Column Name Using Paste
cn <- as.symbol(paste("Q_1_",1, sep=""))
cn
df$cn
df$Q_1_1
I want df$cn to return the same thing as df$Q_1_1, but df$cn returns NULL.
If you are just trying to subset your data frame by column name, you could use dplyr to select all your indexed columns at once, with a regular expression that matches every column name following a certain pattern:
library(dplyr)
df = data.frame("Q_1_1" = rep(1,5),"Q_1_2" = rep(2,5),"Q_1_3" = rep(3,5), "A_1" = rep(4,5))
newdf <- df %>%
dplyr::select(matches("Q_[0-9]_[0-9]"))
Each [0-9] in the regex matches a single digit, so the pattern picks up Q_1_1, Q_1_2 and Q_1_3 but not A_1. Depending on which variables you're trying to match, you might have to change the regular expression.
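Because matches() is not anchored, Q_[0-9]_[0-9] also picks up a name like Q_1_10 (the text Q_1_1 occurs inside it). A small sketch with a hypothetical extra column Q_1_10 shows the difference; adding ^ and $ restricts the match to whole names, and [0-9]+ allows multi-digit indices:

```r
library(dplyr)

# Hypothetical data: an extra Q_1_10 column is added to show the matching difference
df <- data.frame("Q_1_1" = rep(1, 5), "Q_1_2" = rep(2, 5), "Q_1_3" = rep(3, 5),
                 "Q_1_10" = rep(10, 5), "A_1" = rep(4, 5))

# Unanchored pattern from the answer: also selects Q_1_10
loose <- df %>% select(matches("Q_[0-9]_[0-9]"))

# Anchored, single-digit indices only: Q_1_1, Q_1_2, Q_1_3
single_digit <- df %>% select(matches("^Q_[0-9]_[0-9]$"))

# Anchored, any number of digits: all four Q_ columns
any_digits <- df %>% select(matches("^Q_[0-9]+_[0-9]+$"))
```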
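The problem with your attempt is that $ does not evaluate what follows it: df$cn looks for a column literally named "cn", which does not exist, so it returns NULL. A column name stored in a variable has to be passed as a character string to [[ instead, e.g. df[[paste("Q_1_", 1, sep="")]].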
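A base-R sketch of both things the question asks for, looping over the columns by index and subsetting them all at once, using [[ with names built by paste0(). The index range 1:3 matches the example data frame; in a real data set it would have to come from the data.

```r
df <- data.frame("Q_1_1" = rep(1, 5), "Q_1_2" = rep(2, 5), "Q_1_3" = rep(3, 5))

# [[ accepts a character string, so a name built with paste0() works
cn <- paste0("Q_1_", 1)
first_col <- df[[cn]]

# Loop over the indexed columns, building each name from its index
col_sums <- numeric(0)
for (i in 1:3) {
  name <- paste0("Q_1_", i)
  col_sums[name] <- sum(df[[name]])
}

# Or take all of them at once with an anchored regex on names(df)
q_cols <- df[, grepl("^Q_1_[0-9]+$", names(df)), drop = FALSE]
```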
I hope this helps!

Delete multiple columns by reference using reverse selection in data.table [duplicate]

This question already has an answer here:
How do I subset column variables in DF1 based on the important variables I got in DF2?
(1 answer)
Closed 5 years ago.
I want to delete, by reference, the columns that are not in a list.
library("data.table")
df <- data.frame("ID"=1:10,"A"=1:10,"B"=1:10,"C"=1:10,"D"=1:10)
setDT(df,key="ID")
list_to_keep <- c("ID","A","B","C")
df[,!names(df)%in%list_to_keep,with=FALSE]
gives me a selection of the columns that I want to delete, but when I do:
df <- data.frame("ID"=1:10,"A"=1:10,"B"=1:10,"C"=1:10,"D"=1:10)
setDT(df,key="ID")
list_to_keep <- c("ID","A","B","C")
df[,!names(df)%in%list_to_keep:=NULL,with=FALSE]
I get the error "LHS of := isn't column names ('character') or positions ('integer' or 'numeric')". What is the correct way of doing this?
We can use setdiff() to get the names of the columns that are not in list_to_keep and assign (:=) them to NULL:
df[, setdiff(names(df), list_to_keep) := NULL]
As @rosscova mentioned, which() on the logical vector gives the positions of those columns, and the columns at those positions can then be assigned to NULL:
df[, which(!names(df)%in%list_to_keep):=NULL]
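A self-contained check of both approaches on the question's data. Two separate tables are used (copy() is needed because := modifies by reference, so a plain assignment would leave both names pointing at the same table), and data.table(..., key = "ID") stands in for the data.frame() plus setDT() step:

```r
library(data.table)

dt_a <- data.table(ID = 1:10, A = 1:10, B = 1:10, C = 1:10, D = 1:10, key = "ID")
dt_b <- copy(dt_a)
list_to_keep <- c("ID", "A", "B", "C")

# setdiff() yields a character vector of names, a valid LHS for :=
dt_a[, setdiff(names(dt_a), list_to_keep) := NULL]

# which() turns the logical vector into integer positions, also a valid LHS
dt_b[, which(!names(dt_b) %in% list_to_keep) := NULL]
```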
LHS of := is "A character vector of column names (or numeric positions) or a variable that evaluates as such."
!names(df)%in%list_to_keep is a logical vector.
So,
df[,names(df)[!names(df)%in%list_to_keep]:=NULL]
will work.

Passing columns in a variable (dynamically) when aggregating in data.table [duplicate]

This question already has answers here:
In R data.table, how do I pass variable parameters to an expression?
(1 answer)
Select / assign to data.table when variable names are stored in a character vector
(6 answers)
How to use data.table within functions and loops?
(2 answers)
Closed 5 years ago.
I need to aggregate a data.table and create a table with counts, means and other statistics for several variables. The format of the output table should always be the same, but I need to aggregate by various groupings. How can I define the output columns and aggregate statistics once and reuse them for different by= choices?
# Create data.table
library(data.table)
DT <- data.table(iris)
# This works, but is long and needs to be updated in multiple
# places whenever I update the output format
DT[,list(theCount=.N,
meanSepalWidth=mean(Sepal.Width),
meanPetalWidth=mean(Petal.Width)),
by=Species]
# This does not work: list() is evaluated immediately, outside DT,
# where the columns don't exist. How could I achieve what I'm trying to do here?
col.list <- list(theCount=.N,
meanSepalWidth=mean(Sepal.Width),
meanPetalWidth=mean(Petal.Width))
DT[,col.list, by=Species]
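One common way to define the j expression once is to store it unevaluated with quote() and evaluate it inside each call with eval(), so that .N and the column names are resolved within DT for every group. This is a sketch rather than an answer from the page; col.expr is a hypothetical name, and the second grouping (bigPetal) is made up for illustration.

```r
library(data.table)
DT <- data.table(iris)

# Store the j expression unevaluated; nothing is computed here
col.expr <- quote(list(theCount = .N,
                       meanSepalWidth = mean(Sepal.Width),
                       meanPetalWidth = mean(Petal.Width)))

# eval() inside j evaluates the stored expression within DT, per group
by_species <- DT[, eval(col.expr), by = Species]

# The same output format reused with a different grouping
by_petal_group <- DT[, eval(col.expr), by = .(bigPetal = Petal.Width > 1)]
```

Updating the output format then means editing col.expr only, and every by= variant picks up the change.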
