Apply a function to a data frame column in R

I need to create a function that returns the vector of unique values in a column of a data frame. The inputs should be the data frame and the column name.
This is what I did:
Val_Uniques <- function(df, col) {
return(unique(df$col))
}
Val_Uniques(mytable, city)
The result is NULL. How can I fix it?
I also want to add a tryCatch block that prints the warning message "the column does not exist" if the column name is wrong.
Thank you in advance.

You are probably looking for deparse(substitute(col)) here: it converts the unquoted column name you pass (city) into the string "city", which can then be used with [[ ]]. The data frame argument needs no such treatment, since df already holds the data frame. For the error case, a simple if check with stop() is enough.
Val_Uniques <- function(df, col) {
  col <- deparse(substitute(col))  # unquoted name -> string, e.g. city -> "city"
  if (!(col %in% names(df)))
    stop("the column does not exist")
  return(unique(df[[col]]))
}
Test
> Val_Uniques(mytable, city)
[1] A D B C E
Levels: A B C D E
> Val_Uniques(mytable, foo)
Error in Val_Uniques(mytable, foo) : the column does not exist
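Data (stringsAsFactors=TRUE keeps city a factor, matching the output above, on R 4.0 and later, where the default is FALSE)
mytable <- data.frame(city=LETTERS[c(1, 4, 4, 2, 3, 2, 5, 4)],
var=c(1, 3, 22, 4, 5, 8, 7, 9), stringsAsFactors=TRUE)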
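The question also asked for a tryCatch block that warns instead of stopping. Here is a minimal sketch, assuming a warning plus a NULL return is the behavior you want. Val_Uniques_warn is a hypothetical name, not part of the answer above. The sketch relies on [.data.frame raising an "undefined columns selected" error for a missing column and turns that error into the requested warning:

```r
# Val_Uniques_warn is a hypothetical variant that warns instead of stopping
Val_Uniques_warn <- function(df, col) {
  # Turn the unquoted column name into a string, e.g. city -> "city"
  col <- deparse(substitute(col))
  tryCatch(
    # `[.data.frame` raises "undefined columns selected" for a missing name
    unique(df[, col]),
    error = function(e) {
      warning("the column does not exist", call. = FALSE)
      NULL
    }
  )
}

mytable <- data.frame(city = c("A", "D", "D", "B"), var = 1:4,
                      stringsAsFactors = FALSE)
Val_Uniques_warn(mytable, city)  # "A" "D" "B"
Val_Uniques_warn(mytable, foo)   # NULL, with a warning
```

The error handler catches every error, so any other failure would also be reported as a missing column. Checking col %in% names(df) first, as in the answer above, is more precise.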

Try this one:
df <- data.frame(id = c("A", "B", "C", "C"),
val = c(1,2,3,3), stringsAsFactors = FALSE)
Val_Uniques <- function(df, col) {
return(unique(df[, col]))
}
Val_Uniques(df, "id")
[1] "A" "B" "C"
This link helps with passing column names to functions: Pass a data.frame column name to a function
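The reason the original function returns NULL is that $ does not evaluate what follows it: df$col looks for a column literally named "col". By contrast, [[ ]] (and [, ] as used above) evaluates its argument. A small illustration with made-up data:

```r
df <- data.frame(city = c("A", "B", "A"), stringsAsFactors = FALSE)
col <- "city"

# `$` takes the name literally: there is no column called "col", so this is NULL
df$col
# `[[` evaluates its argument, so this uses the value of col, i.e. "city"
df[[col]]
unique(df[[col]])  # "A" "B"
```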

Related

Add a Column created Within a Function to a dataframe in R

I have searched for and tried answers to several previously asked questions that might be similar to mine, but none worked.
I have a data frame in R called df2 with a column df2$col. I wrote a function that takes the data frame, the column, and two parameters that name two new columns to be created and filled within the function. After the function runs, I want it to return the data frame with the two new columns included. I do get the two columns back, but they are named after the placeholder arguments in the function definition. See below:
df2 = data.frame(col = c(1, 3, 4, 5),
col1 = c(9, 6, 8, 3),
col2 = c(8, 2, 8, 4))
The function I created takes col and does something to it; it should return the transformed col as well as the two newly created columns:
no_way <- function(df, df_col_name, df_col_flagH, df_col_flagL) {
lo_perc <- 2
hi_perc <- 6
df$df_col_flagH <- as.factor(ifelse(df_col_name<lo_perc, 1, 0))
df$df_col_flagL <- as.factor(ifelse(df_col_name>hi_perc, 1, 0))
df_col_name <- df_col_name + 1.4
df_col_name <- df_col_name * .12
return(df)
}
When I call the function as no_way(df2, col, df$new_col, df$new_col2), instead of getting a data frame with col, col1, col2, new_col, new_col2, I get the first three right, but the last two carry the parameter names, i.e. col, col1, col2, df_col_flagH, df_col_flagL. I want the function to return the data frame with the new column names I give it when calling it. Please help.
I don't see what your function is trying to do, but this might point you in the right direction:
no_way <- function(df = df2, df_col_name = "col", df_col_flagH = "col1", df_col_flagL = "col2") {
lo_perc <- 2
hi_perc <- 6
df[[df_col_flagH]] <- as.factor(ifelse(df[[df_col_name]] < lo_perc, 1, 0)) # as.factor?
df[[df_col_flagL]] <- as.factor(ifelse(df[[df_col_name]] > hi_perc, 1, 0))
df[[df_col_name]] <- (df[[df_col_name]] + 1.4) * 0.12 # Do in one step
return(df)
}
I needed to call the function with the new column names as strings instead:
no_way(mball, 'TEAM_BATTING_H', 'hi_TBH', 'lo_TBH')
Additionally, I had to use double brackets ([[ ]]) around the target column in my function.
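To check this against the question's own data, here is a sketch that runs the answer's [[ ]] version on df2, passing the new names from the question as strings. The default arguments are dropped so every name has to be supplied explicitly; that is a choice made for this example, not part of the answer:

```r
df2 <- data.frame(col = c(1, 3, 4, 5),
                  col1 = c(9, 6, 8, 3),
                  col2 = c(8, 2, 8, 4))

no_way <- function(df, df_col_name, df_col_flagH, df_col_flagL) {
  lo_perc <- 2
  hi_perc <- 6
  # [[ ]] evaluates its argument, so the values of the arguments
  # ("new_col", "new_col2") become the column names
  df[[df_col_flagH]] <- as.factor(ifelse(df[[df_col_name]] < lo_perc, 1, 0))
  df[[df_col_flagL]] <- as.factor(ifelse(df[[df_col_name]] > hi_perc, 1, 0))
  # Transform the target column in place
  df[[df_col_name]] <- (df[[df_col_name]] + 1.4) * 0.12
  df
}

out <- no_way(df2, "col", "new_col", "new_col2")
names(out)  # "col" "col1" "col2" "new_col" "new_col2"
```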

Setting colnames of several data frames based on a list variable

I have a list of several data frames that are built the same way. I would like to rename the first column of each data frame to the name of the data frame itself, with some text appended. From several different answers I gathered that lapply over a list would be the best way to go.
Example data:
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(A = 1, B = 2, C = 3)
dfList <- list(df1,df2)
col1 <- names(dfList)
df<-lapply(dfList, function(x) {
names(x)[1:2] <- c(col1[1:length(col1)]"appended text","Col2","Col3");x
})
The problem seems to be referencing the correct entry of col1 for each data frame within my code.
Any ideas on how I should express this correctly? Thanks a lot!
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(A = 1, B = 2, C = 3)
dfList <- list(df1 = df1, df2 = df2)  # a named list, so names(dfList) is c("df1", "df2")
for (i in seq_along(dfList))
  names(dfList[[i]])[1] <- names(dfList)[i]  # first column takes its data frame's name
dfList
Here is one option with the tidyverse:
library(tidyverse)
map(dfList, ~ .x %>%
rename(Aappended_text = A))
If the column is identified by its index, create a function:
fName <- function(lst, new_name, index){
  map(lst, function(d)
    rename_with(d, ~ paste0(.x, new_name), all_of(index)))  # append new_name to the column(s) at index
}
fName(dfList, "appended_text", 1)
I'm not sure I'm understanding your question completely, but is this what you're after?
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(A = 1, B = 2, C = 3)
dfList <- list(df1,df2)
df <- lapply(dfList, function(x) {
colnames(x) <- c(paste0(colnames(x)[1], "appended text"), colnames(x)[2:length(colnames(x))])
return(x)
})
Output:
> df
[[1]]
Aappended text B C
1 1 2 3
[[2]]
Aappended text B C
1 1 2 3
You can simply use lapply
lapply(dfList, function(x) {
names(x)[1L] <- "some text"
x
})
But if you want to rename using the names of the data frames in your list, you first need to name the elements, e.g. dfList <- list(df1 = df1, df2 = df2). Also, lapply(dfList, ...) passes only the elements, not their names, to the function, so you need to lapply over the indexes instead, for example:
lapply(seq_along(dfList), function(i) {
names(dfList[[i]])[1L] <- names(dfList[i])
dfList[[i]]
})
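Another base R option, sketched here, avoids indexing by using Map(), which passes each data frame together with its name and keeps the list names in the result. The "_appended_text" suffix is only a placeholder for whatever text you want to append:

```r
df1 <- data.frame(A = 1, B = 2, C = 3)
df2 <- data.frame(A = 1, B = 2, C = 3)
dfList <- list(df1 = df1, df2 = df2)

# Map() calls the function with each data frame and its name in parallel;
# the result keeps the names of the first argument (df1, df2)
renamed <- Map(function(x, nm) {
  names(x)[1] <- paste0(nm, "_appended_text")
  x
}, dfList, names(dfList))

lapply(renamed, names)
```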

Using an element from a table in selecting columns/rows in R

I've been working on a process to create all possible combinations of unique integers for lengths 1:n. I found the nCr function (the combn function in the combinat package) useful here.
Once all unique combinations are generated, they are appended to a consolidation table that holds every combination, of every length, of the integers 1:n. One record of the relevant column of the final table looks like this (the column is named String and the subset table is f1):
c(1,3,4,5,9,10)
I need to select these columns from a secondary data source (df) one at a time (I am going to loop through this table), so my logic was to use this code:
df[,f1$String]
However, I get an "undefined columns selected" error, whereas if I copy and paste the contents of the cell, as in:
df[,c(1, 3, 4, 5, 9, 10)]
it works fine. I've tried everything I can think of at this point; any insight would be greatly appreciated.
Code to reproduce is:
library(combinat)
library(data.table)
library(plyr)
rm(list=ls())
NCols=10
NRows=10
myMat<-matrix(runif(NCols*NRows), ncol=NCols)
XVars <- as.data.frame(myMat)
colnames(XVars) <- c("a","b","c","d","e","f","g","h","i","j")
x1 <- as.data.frame(colnames(XVars[1:ncol(XVars)]))
colnames(x1) <- "Independent.Variable"
setDT(x1)[, Index := .GRP, by = "Independent.Variable"]
colClasses = c("character", "numeric", "numeric")
col.names = c("String", "r!", "n!")
Combination <- read.table(text = "", colClasses = colClasses, col.names = col.names)
for(i in 1:nrow(x1)){
x2<- as.data.frame(combn(nrow(x1),i))
for (i in 1:ncol(x2)){
x3 <- paste("c(",paste(x2[1:nrow(x2),i], collapse = ", "), ")", sep="")
x3 <- as.data.frame(x3)
colnames(x3) <- "String"
x3 <- mutate(x3, "r!" = nrow(x2))
x3 <- mutate(x3, "n!" = nrow(x1))
Combination <- rbind(Combination, x3)
}
}
setDT(Combination)[, Index := .GRP, by = c("String", "r!", "n!")]
f1 <- Combination[717,]
f1$String <- as.character(f1$String)
## reference to data frame
myMat[,(f1$String)]
## pasted element
myMat[, c(1, 3, 4, 5, 9, 10)]
f1$String is the string "c(1, 3, 4, 5, 9, 10)". When you use myMat[,(f1$String)], R looks for a column named "c(1, 3, 4, 5, 9, 10)". To get columns 1, 3, 4, 5, 9 and 10, you have to parse the string into an R expression and evaluate it first:
myMat[,eval(parse(text=f1$String))]
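If you'd rather not evaluate arbitrary strings as code, here is a base R sketch that only extracts the digits. It assumes the string always has the form c(1, 3, ...) with nonnegative integers:

```r
s <- "c(1, 3, 4, 5, 9, 10)"

# Drop everything except digits and commas ("1,3,4,5,9,10"), then split on commas
idx <- as.integer(strsplit(gsub("[^0-9,]", "", s), ",")[[1]])
idx  # 1 3 4 5 9 10

set.seed(1)
myMat <- matrix(runif(100), ncol = 10)
sub <- myMat[, idx]  # same as myMat[, c(1, 3, 4, 5, 9, 10)]
dim(sub)             # 10 6
```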
As @user3794498 noticed, you stored f1$String with as.character(), so you cannot use it directly to select the columns you want.
You can either change the way you define f1 or extract the column numbers from f1$String. Something like this should also work (load stringr first): myMat[, f1$String %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric].

Using list's elements in loops in r (example: setDT)

I have multiple data frames and I want to perform the same action on all of them, for example converting them all into data.tables (this is just an example; I want to apply other functions too).
A simple example (df1 = df2 = df3, without loss of generality):
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df3 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 =c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
My approach was to (i) create a list of the data frame names (list.df), (ii) create a list of the names they should have afterwards (list.dt), and (iii) loop over those two lists:
list.df:
list.df<-vector('list',3)
for(j in 1:3){
name <- paste('df',j,sep='')
list.df[j] <- name
}
list.dt:
list.dt<-vector('list',3)
for(j in 1:3){
name <- paste('dt',j,sep='')
list.dt[j] <- name
}
Loop (to make all data frames into data tables):
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(list.df[i]))
}
I am definitely doing something wrong, as the result is three data tables with 1 variable and 1 observation (exactly the name list.df[i]).
I've tried to unlist list.df, thinking R would recognize it as an entire data frame and not only as a string:
for(i in 1:3){
name<-list.dt[i]
assign(unlist(name), setDT(unlist(list.df[i])))
}
But I get the error message:
Error in setDT(unlist(list.df[i])) :
Argument 'x' to 'setDT' should be a 'list', 'data.frame' or 'data.table'
Any suggestions?
You can just put all the data into one data frame. Then, if you want to iterate through the original data frames, use dplyr::do or, preferably, other dplyr functions:
library(dplyr)
data <- list(df1 = df1, df2 = df2, df3 = df3) %>%
  bind_rows(.id = "source") %>%
  group_by(source)
Change your last snippet to this:
for(i in 1:3){
name <- list.dt[i]
assign(unlist(name), setDT(get(list.df[[i]])))
}
# Alternative to using lists
list.df <- paste0("df", 1:3)
# For loop that works with the length of the input 'list'/vector
# Creates the 'dt' objects on the fly
for(i in seq_along(list.df)){
assign(paste0("dt", i), setDT(get(list.df[i])))
}
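To see why the original loop produced 1 x 1 tables: list.df[i] is a one-element list containing the string "df1", not the data frame, so setDT() turns that list into a table whose single cell is the text "df1". get() instead looks up the object with that name. A base R illustration:

```r
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var3 = c(10, 8, 15, 7, 9))
list.df <- list("df1")

class(list.df[1])         # "list": a one-element list holding the string "df1"
class(list.df[[1]])       # "character": only the name
class(get(list.df[[1]]))  # "data.frame": get() fetches the object called "df1"
```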
Using data.table (which deserves far more advertising):
a) If you need all your data.frames converted to data.tables, then, as already suggested in the comments by @A5C1D2H2I1M1N2O1R2T1, iterate over your data.frames with setDT:
library(data.table)
lapply(mget(paste0("df", 1:3)), setDT)
# or, if you wish to type them one by one:
lapply(list(df1, df2, df3), setDT)
class(df1) # check if coercion took place
# [1] "data.table" "data.frame"
b) If you need to bind your data.frames by rows, then use data.table::rbindlist
data <- rbindlist(mget(paste0("df", 1:3)), idcol = TRUE)
# or, if you wish to type them one by one:
data <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = TRUE)
Side note: if you like chaining/piping with the magrittr package (which you almost always see combined with dplyr syntax), it goes like this:
library(data.table)
library(magrittr)
# for a)
mget(paste0("df", 1:3)) %>% lapply(setDT)
# for b)
data <- mget(paste0("df", 1:3)) %>% rbindlist(idcol = TRUE)
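If the goal is to apply several functions (not just setDT) to each data frame, another option is to keep them in a named list and skip creating df1/dt1-style variables altogether. Here is a base R sketch; the var4 column is a made-up example transformation:

```r
df1 <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = c(1, 2, 2, 1, 2), var3 = c(10, 8, 15, 7, 9))
df2 <- df1
df3 <- df1

# A named list keeps the data frames together; names(dfs) is c("df1", "df2", "df3")
dfs <- list(df1 = df1, df2 = df2, df3 = df3)

# Any function can be applied to every element; here a made-up derived column
dfs <- lapply(dfs, function(d) transform(d, var4 = var1 * var3))

# Further steps chain the same way, e.g. a per-data-frame summary
sapply(dfs, function(d) mean(d$var4))
```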

Note and recode duplicates

I have a dataframe that's similar to what's below:
num <- c(1, 2, 3, 4)
name <- c("A", "B", "C", "A")
df <- cbind(num, name)
I'm looking to essentially turn this into:
num <- c(1, 2, 3, 4)
name <- c("A1", "B", "C", "A2")
df <- cbind(num, name)
How would I do this automatically, since my actual data is much larger?
Puginablanket,
See below for two solutions, one using the plyr package and the other using base R's by and do.call functions.
eg <- data.frame(num = c(1, 2, 3, 4, 5),
name = c("A", "B", "C", "A", "B"),
stringsAsFactors = FALSE)
do.call(rbind, by(eg, eg$name, function(x) {
x$name2 <- paste0(x$name, 1:nrow(x))
x
}))
plyr::ddply(eg, "name", function(x) {
x$name2 <- paste0(x$name, 1:nrow(x))
x
})
Depending on your application, it might make sense to create a separate column which tracks this duplication (so that you're not using string parsing at a later step to pull it back apart).
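Here is a sketch of that idea in base R. ave() numbers each occurrence within its name group while keeping the original row order (by() and ddply() above reorder the rows by name), and the count per name decides whether a suffix is needed. For the question's data this gives A1, B, C, A2. The column names occ and n_name are hypothetical:

```r
df <- data.frame(num = c(1, 2, 3, 4), name = c("A", "B", "C", "A"),
                 stringsAsFactors = FALSE)

# Occurrence number of each name, in original row order: 1, 1, 1, 2
df$occ <- ave(seq_along(df$name), df$name, FUN = seq_along)
# How many times each name appears: 2, 1, 1, 2
df$n_name <- ave(seq_along(df$name), df$name, FUN = length)
# Only names that occur more than once get a numeric suffix
df$name2 <- ifelse(df$n_name > 1, paste0(df$name, df$occ), df$name)
df
```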
It might be worth considering the built-in make.unique(), although it doesn't do exactly what the OP wants: it doesn't label the first occurrence of a duplicated value (so that it can be run several times in succession). A little extra trickiness is also needed if name is a factor, which data.frame() produced by default before R 4.0:
df <- data.frame(num = c(1, 2, 3, 4),
name = c("A", "B", "C", "A"))
df <- transform(df, name=factor(make.unique(
as.character(name),sep="")))
## num name
## 1 1 A
## 2 2 B
## 3 3 C
## 4 4 A1
I converted your matrix to a dataframe
df <- data.frame(num, name)
# Get duplicated names
ext <- as.numeric(ave(as.character(df$name), df$name,
                      FUN = function(x) cumsum(duplicated(x)) + 1))
nms <- df$name[ext > 1]
# Add into data
df$newname <- ifelse(df$name %in% nms, paste0(df$name, ext), as.character(df$name))
Here's a one-line solution, assuming you really do have a data.frame rather than a matrix (a matrix is what your cbind() command returns):
df <- data.frame(num=1:4, name=c('A','B','C','A'))
# as.character() works whether name is a factor or a character vector
transform(df, name=paste0(name, ave(as.character(name), name, FUN=function(x) if (length(x) > 1) seq_along(x) else '')))
## num name
## 1 1 A1
## 2 2 B
## 3 3 C
## 4 4 A2
