Using an element from a table in selecting columns/rows in R

I've been working on a process to create all possible combinations of unique integers for lengths 1:n. I found the nCr function (the combn function in the combinat package) to be useful here.
Once all unique occurrences are iterated, they are appended to a consolidation table that contains any possible length+combination of the digits 1:n. A subset of the final table's relevant column (one record) looks like this (column is named String and the subset table f1):
c(1,3,4,5,9,10)
I need to select these columns from a secondary data source (df) one at a time (I am going to loop through this table), so my logic was to use this code:
df[,f1$String]
However, I get a message that says that undefined columns are selected, but if I copy and paste the contents of the cell such as:
df[,c(1, 3, 4, 5, 9, 10)]
it works fine ... I've tried all I can think of at this point; if anyone has some insight it would be greatly appreciated.
Code to reproduce is:
library(combinat)
library(data.table)
library(plyr)
rm(list=ls())
NCols=10
NRows=10
myMat<-matrix(runif(NCols*NRows), ncol=NCols)
XVars <- as.data.frame(myMat)
colnames(XVars) <- c("a","b","c","d","e","f","g","h","i","j")
x1 <- as.data.frame(colnames(XVars[1:ncol(XVars)]))
colnames(x1) <- "Independent.Variable"
setDT(x1)[, Index := .GRP, by = "Independent.Variable"]
colClasses = c("character", "numeric", "numeric")
col.names = c("String", "r!", "n!")
Combination <- read.table(text = "", colClasses = colClasses, col.names = col.names)
for(i in 1:nrow(x1)){
x2<- as.data.frame(combn(nrow(x1),i))
for (i in 1:ncol(x2)){
x3 <- paste("c(",paste(x2[1:nrow(x2),i], collapse = ", "), ")", sep="")
x3 <- as.data.frame(x3)
colnames(x3) <- "String"
x3 <- mutate(x3, "r!" = nrow(x2))
x3 <- mutate(x3, "n!" = nrow(x1))
Combination <- rbind(Combination, x3)
}
}
setDT(Combination)[, Index := .GRP, by = c("String", "r!", "n!")]
f1 <- Combination[717,]
f1$String <- as.character(f1$String)
## reference to data frame
myMat[,(f1$String)]
## pasted element
myMat[, c(1, 3, 4, 5, 9, 10)]

f1$String is the string "c(1, 3, 4, 5, 9, 10)". When you use myMat[,(f1$String)], R will look for the column with name "c(1, 3, 4, 5, 9, 10)". To get column numbers 1,3,4,5,9,10, you have to parse the string to an R expression and evaluate it first:
myMat[,eval(parse(text=f1$String))]

As @user3794498 noticed, f1$String is a character string (you set it with as.character()), so you cannot use it directly to select the columns you want.
You can change the way you define f1, or extract the column numbers from f1$String. Something like this should also work (load stringr first):
myMat[, f1$String %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric]
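A minimal base R sketch of the two approaches described in the answers above, assuming a small stand-in matrix m and a stand-in string s (both hypothetical, not the asker's data). It parses the stored string once with eval(parse()) and once with a regular expression, and checks that both give the same column indices.

```r
# Hypothetical stand-ins for f1$String and myMat
s <- "c(1, 3, 4, 5, 9, 10)"
m <- matrix(seq_len(30), nrow = 3)   # 3 rows, 10 columns

# Approach 1: parse the string as R code and evaluate it
idx_eval <- eval(parse(text = s))

# Approach 2: pull the digits out with a regular expression (no code is evaluated)
idx_regex <- as.integer(regmatches(s, gregexpr("[0-9]+", s))[[1]])

# Both approaches select the same columns
sub_eval  <- m[, idx_eval]
sub_regex <- m[, idx_regex]
```

The regular expression route avoids evaluating arbitrary text as code, which may matter if the strings ever come from an untrusted source. Storing the indices in a list column instead of a string would avoid the parsing step entirely.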

Related

Loop over several dataframes to do several actions in R

I have several dataframes (dataframe_1, dataframe_2...) that I want to loop in order to execute the same functions over all the dataframes. These functions are:
Select specific columns:
dataframe_1 <- dataframe_1[, c("Column_1", "Column_2")]
Rename the columns:
dataframe_1 <- rename(dataframe_1, New_Name_for_Column_1 = Column_1)
Create new columns. For example, by using the ifelse() function:
dataframe_1$Column_3 <- ifelse(dataframe_1$Column_1 == 5, 1, 0)
I have tested the code on some dataframes individually without errors.
However, if I execute the following loop:
list_dataframes = list(dataframe_1, dataframe_2)
for (dataframe in 1:length(list_dataframes)){
dataframe <- dataframe[, c("Column_1", "Column_2")]
dataframe <- rename(dataframe, New_Name_for_Column_1 = Column_1)
dataframe$Column_3 <- ifelse(dataframe$Column_1 == 5, 1, 0)
}
The following error arises:
Error in dataframe[, c("Column_1", "Column_2", :
incorrect number of dimensions
(All dataframes have the same column names.)
Any idea?
Thanks!
You are not iterating over the list of dataframes, but rather over a sequence 1:length(list_dataframes). Consider the following for illustration:
a = list("a", "b")
for (i in a){print(i)}
for (i in 1:length(a)){print(i)}
In your code, you need to explicitly access the list elements like this:
list_dataframes = list(dataframe_1, dataframe_2)
for (df_number in 1:length(list_dataframes)){
list_dataframes[[df_number]] <- list_dataframes[[df_number]][, c("Column_1", "Column_2")]
list_dataframes[[df_number]] <- rename(list_dataframes[[df_number]], New_Name_for_Column_1 = Column_1)
# Column_1 was renamed on the previous line, so refer to it by its new name
list_dataframes[[df_number]]$Column_3 <- ifelse(list_dataframes[[df_number]]$New_Name_for_Column_1 == 5, 1, 0)
}
The code for (dataframe in 1:length(list_dataframes)) iterates over the vector of numbers c(1, 2), storing one value at a time in a variable named dataframe. This iteration variable is a length-one numeric vector with no dimensions, which is why you cannot subset it with dataframe[, c("Column_1", "Column_2")]. Do this instead: list_dataframes[[dataframe]][, c("Column_1", "Column_2")]
You could try to iterate over dataframes using purrr::map_dfr(), e.g.
list_dataframes = list(dataframe_1, dataframe_2)
library(dplyr)
library(purrr)
list_dataframes %>%
map_dfr(~.x %>%
select(Column_1, Column_2) %>%
rename(New_Name_for_Column_1 = Column_1) %>%
mutate(Column_3 = ifelse(New_Name_for_Column_1 == 5, 1, 0))) # Column_1 no longer exists after rename()
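As a base R alternative to the answers above, the per-dataframe steps can be wrapped in a function and applied with lapply(). This is only a sketch: the example data and the helper name process_one are hypothetical, and the rename is done last so that Column_1 is still available when Column_3 is computed.

```r
# Hypothetical example data with the column names used in the question
dataframe_1 <- data.frame(Column_1 = c(5, 2), Column_2 = c("a", "b"), Extra = 1:2)
dataframe_2 <- data.frame(Column_1 = c(1, 5), Column_2 = c("c", "d"), Extra = 3:4)
list_dataframes <- list(dataframe_1, dataframe_2)

# process_one is a hypothetical helper applying the three steps to one data frame
process_one <- function(d) {
  d <- d[, c("Column_1", "Column_2")]            # select columns
  d$Column_3 <- ifelse(d$Column_1 == 5, 1, 0)    # create the new column
  names(d)[names(d) == "Column_1"] <- "New_Name_for_Column_1"  # rename last
  d
}

# lapply() passes each data frame (not its index) to the function
list_dataframes <- lapply(list_dataframes, process_one)
```

Unlike the for loop over indices, lapply() hands each list element to the function directly, so there is no need for list_dataframes[[i]] bookkeeping.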

Add a Column created Within a Function to a dataframe in R

I have searched and tried multiple previously asked questions that might be similar to my question, but none worked.
I have a dataframe in R called df2, a column called df2$col. I created a function to take the df, the df$col, and two parameters that are names for two new columns I want created and worked on within the function. After the function finishes running, I want a return df with the two new columns included. I get the two columns back indeed, but they are named after the placeholders in the function shell. See below:
df2 = data.frame(col = c(1, 3, 4, 5),
col1 = c(9, 6, 8, 3),
col2 = c(8, 2, 8, 4))
The function I created takes col and transforms it; it should return the transformed col as well as the two newly created columns:
no_way <- function(df, df_col_name, df_col_flagH, df_col_flagL) {
lo_perc <- 2
hi_perc <- 6
df$df_col_flagH <- as.factor(ifelse(df_col_name<lo_perc, 1, 0))
df$df_col_flagL <- as.factor(ifelse(df_col_name>hi_perc, 1, 0))
df_col_name <- df_col_name + 1.4
df_col_name <- df_col_name * .12
return(df)
}
When I call the function as no_way(df2, col, df$new_col, df$new_col2), I expect a df with the columns col, col1, col2, new_col, new_col2. Instead, the first three are right but the last two are named after the function's parameters, so I get col, col1, col2, df_col_flagH, df_col_flagL. I essentially want the function to return the df with the new column names I supply when calling it. Please help.
I don't see what your function is trying to do, but this might point you in the right direction:
no_way <- function(df = df2, df_col_name = "col", df_col_flagH = "col1", df_col_flagL = "col2") {
lo_perc <- 2
hi_perc <- 6
df[[df_col_flagH]] <- as.factor(ifelse(df[[df_col_name]] < lo_perc, 1, 0)) # as.factor?
df[[df_col_flagL]] <- as.factor(ifelse(df[[df_col_name]] > hi_perc, 1, 0))
df[[df_col_name]] <- (df[[df_col_name]] + 1.4) * 0.12 # Do in one step
return(df)
}
I needed to call the function with the new column names as strings instead:
no_way(mball, 'TEAM_BATTING_H', 'hi_TBH', 'lo_TBH')
Additionally, I had to use brackets around the target column in my function.
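The difference between the two answers above and the original function comes down to how $ and [[ treat a name. A short sketch, using a hypothetical variable new_name, shows that $ takes the name literally while [[ uses the string stored in the variable:

```r
df2 <- data.frame(col = c(1, 3, 4, 5))
new_name <- "flag"   # hypothetical variable holding the desired column name

# `$` uses the literal text after it, so this creates a column called "new_name"
df2$new_name <- df2$col > 2

# `[[` evaluates its argument, so this creates a column called "flag"
df2[[new_name]] <- df2$col > 2

column_names <- names(df2)
```

This is why the function has to receive the column names as strings and use df[[...]] internally.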

How can I name a value by calling a character value?

I wish to give names to the values in a vector. I know how to do that, but in this case I have many names and many values, both within vectors within lists, and typing them by hand would be suicide.
This method:
> values <- c('jessica' = 1, 'jones' = 2)
> values
jessica   jones
      1       2
obviously works. However, this method:
> names <- c('jessica', 'jones')
> values <- c(names[1] = 1, names[2] = 2)
Error: unexpected '=' in "values <- c(names[1] ="
Well... I cannot understand why R refuses to read these as pure characters to assign them as names.
I realize I can create values and names separately and then assign names as names(values) but again, my actual case is far more complex. But really I would just like to know why this particular issue occurs.
EDIT I: The ACTUAL data I have is a list of vectors, each is a different combination of amounts of ingredients, and then a giant vector of ingredient names. I cannot just set the name vector as names, because the individual names need to be placed by hand.
EDIT II: Example of my data structure.
ingredients <- c('ing1', 'ing2', 'ing3', 'ing4') # this vector is much longer in reality
amounts <- list(c('ing1' = 1, 'ing2' = 2, 'ing4' = 3),
c('ing2' = 2, 'ing3' = 3),
c('ing1' = 12, 'ing2' = 4, 'ing3' = 3),
c('ing1' = 1, 'ing2' = 1, 'ing3' = 2, 'ing4' = 5))
# this list too is much longer
I could type each numeric value's name individually as presented, but there are many more, and so I tried instead to input the likes of:
c(ingredients[1] = 1, ingredients[2] = 2, ingredients[4] = 3)
But this throws an error:
Error: unexpected '=' in "amounts <- list(c(ingredients[1] ="
We can use setNames
setNames(1:2, names)
Another option is deframe if we have a two column dataset
library(tibble)
tibble(names, val = 1:2) %>%
deframe
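On the "why": as far as I understand R's grammar, the name on the left of = inside a call such as c(...) must be a bare symbol or a string literal, so an expression like names[1] is rejected at parse time, before anything is evaluated. A sketch of how setNames() could be applied to the ingredients data, assuming the amounts and the positions of their names are available as two parallel lists (value_sets and index_sets are hypothetical names):

```r
ingredients <- c("ing1", "ing2", "ing3", "ing4")

# One vector: take the names by position from the ingredients vector
amounts_1 <- setNames(c(1, 2, 3), ingredients[c(1, 2, 4)])

# Many vectors: hypothetical parallel lists of values and name positions
value_sets <- list(c(1, 2, 3), c(2, 3))
index_sets <- list(c(1, 2, 4), c(2, 3))
amounts <- Map(function(v, i) setNames(v, ingredients[i]), value_sets, index_sets)
```

Because the names are looked up by index at run time, nothing has to be typed by hand beyond the positions themselves.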

Trying to compare two dataframes, and writing a logical result to a new dataframe in R

I have an R dataframe that contains 18 columns. I would like to write a function that compares column 1 to column 2 and writes a logical result (T or F) to a new column depending on whether both columns contain the same value. This part is not too hard for me. However, I would like to repeat this process for each following pair of columns, writing T/F to a new column each time.
In other words: values col 1 == values col 2, write T/F to a new column; values col 3 == values col 4, write T/F to a new column (or write the results to a new dataframe); and so on.
I have been trying to do this with the purrr package, and use the pmap/map function, but I know I am making a mistake and missing some important part.
This function should work if I understand your problem correctly.
df <-
  data.frame(a = c(18, 6, 2, 0),
             b = c(0, 6, 2, 18),
             c = c(1, 5, 6, 8),
             d = c(3, 5, 9, 2))
compare_columns <-
  function(x){
    n_columns <- ncol(x)
    odd_columns <- 2*1:(n_columns/2) - 1
    even_columns <- 2*1:(n_columns/2)
    comparisons_list <-
      lapply(seq_len(n_columns/2),
             function(y){
               # Compare within the argument x, not the global df
               x[, odd_columns[y]] == x[, even_columns[y]]
             })
    comparisons_df <-
      as.data.frame(comparisons_list,
                    col.names = paste0("column", odd_columns, "_column", even_columns))
    return(cbind(x, comparisons_df))
  }
compare_columns(df)
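A more compact variant of the same pairwise comparison, sketched with mapply() on the example data from the answer above. The data frame is named scores here only to keep the example self-contained; the variable names odd, even, and same are hypothetical.

```r
scores <- data.frame(a = c(18, 6, 2, 0),
                     b = c(0, 6, 2, 18),
                     c = c(1, 5, 6, 8),
                     d = c(3, 5, 9, 2))

odd  <- seq(1, ncol(scores), by = 2)   # 1, 3, ...
even <- seq(2, ncol(scores), by = 2)   # 2, 4, ...

# Compare column odd[k] with column even[k] for every pair k;
# the result is a logical matrix with one column per pair
same <- mapply(function(i, j) scores[[i]] == scores[[j]], odd, even)
colnames(same) <- paste0("column", odd, "_column", even)

result <- cbind(scores, same)
```

This assumes an even number of columns, as in the question's 18-column case.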

Using list's elements in loops in r (example: setDT)

I have multiple data frames and I want to perform the same action on all of them, for example transforming them all into data.tables (this is just an example; I want to apply other functions too).
A simple example can be (df1=df2=df3, without loss of generality here)
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df3 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
My approach was: (i) to create a list of the data frames (list.df), (ii) to create a list of how they should be called afterwards (list.dt) and (iii) to loop into those two lists:
list.df:
list.df<-vector('list',3)
for(j in 1:3){
name <- paste('df',j,sep='')
list.df[j] <- name
}
list.dt
list.dt<-vector('list',3)
for(j in 1:3){
name <- paste('dt',j,sep='')
list.dt[j] <- name
}
Loop (to make all data frames into data tables):
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(list.df[i]))
}
I am definitely doing something wrong as the result of this are three data tables with 1 variable, 1 observation (exactly the name list.df[i]).
I've tried to unlist the list.df thinking r would recognize that as an entire data frame and not only as a string:
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(unlist(list.df[i])))
}
But I get the error message:
Error in setDT(unlist(list.df[i])) :
Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'
Any suggestions?
You can just put all the data into one dataframe. Then, if you want to iterate through dataframes, use dplyr::do or, preferably, other dplyr functions
library(dplyr)
data =
list(df1 = df1, df2 = df2, df3 = df3) %>%
bind_rows(.id = "source") %>%
group_by(source)
Change your last snippet to this:
for(i in 1:3){
name <- list.dt[i]
assign(unlist(name), setDT(get(list.df[[i]])))
}
# Alternative to using lists
list.df <- paste0("df", 1:3)
# For loop that works with the length of the input 'list'/vector
# Creates the 'dt' objects on the fly
for(i in seq_along(list.df)){
assign(paste0("dt", i), setDT(get(list.df[i])))
}
Using data.table (which deserves far more advertising):
a) If you need all your data.frames converted to data.tables, then as was already suggested in the comments by @A5C1D2H2I1M1N2O1R2T1, iterate over your data.frames with setDT
library(data.table)
lapply(mget(paste0("df", 1:3)), setDT)
# or, if you wish to type them one by one:
lapply(list(df1, df2, df3), setDT)
class(df1) # check if coercion took place
# [1] "data.table" "data.frame"
b) If you need to bind your data.frames by rows, then use data.table::rbindlist
data <- rbindlist(mget(paste0("df", 1:3)), idcol = TRUE)
# or, if you wish to type them one by one:
data <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = TRUE)
Side note: If you like chaining/piping with the magrittr package (which you see almost always in combination with dplyr syntax), then it goes like:
library(data.table)
library(magrittr)
# for a)
mget(paste0("df", 1:3)) %>% lapply(setDT)
# for b)
data <- mget(paste0("df", 1:3)) %>% rbindlist(idcol = TRUE)
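The root cause in the question is that list.df holds the names of the data frames as strings, not the data frames themselves. A base R sketch of the difference, using two small hypothetical data frames, shows how get() and mget() turn names into objects:

```r
df1 <- data.frame(var1 = 1:2)
df2 <- data.frame(var1 = 3:4)

name_only <- "df1"                               # just a character string
is_frame_before <- is.data.frame(name_only)      # the string is not a data frame
is_frame_after  <- is.data.frame(get(name_only)) # get() looks the object up by name

# mget() looks up several names at once and returns a named list of objects
frames <- mget(c("df1", "df2"))
```

Once the objects are in a named list like frames, functions such as lapply() can be applied to them directly without assign().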
