Substitute digits with strings contained in a reference dataframe - r

I have a df_a looking like:
df_a <- tibble::tribble(
~id, ~string,
115088, "1-3-5-13",
678326, "1-9-13-3",
105616, "1-3-5-13"
)
Each id is associated with the string column, that stores strings composed by digits separated by "-".
I have a reference dataframe for which each id_string is associated with a string of text.
id <- tibble::tribble(
~name, ~id_string,
"aaa", 1,
"bbb", 3,
"ccc", 5,
"ddd", 13,
"eee", 9,
"fff", 8,
"ggg", 6
)
I would like to substitute the digits in the string column in df_a with the text stored in the reference dataframe id.
The result should be:
df_output <- tibble::tribble(
~id, ~string,
115088, "aaa-bbb-ccc-ddd",
678326, "aaa-eee-ddd-bbb",
105616, "aaa-bbb-ccc-ddd"
)

Yeah, you've got a pretty nasty one here. This is the type of thing I would write a dedicated C++ method for and call from R, because as I see it the problem has asymmetries.
I wrote an iterative loop for you. It may work, I'm not sure, but even if it does, once your data is over 200K rows it will become a problem and might take a long time to finish.
library(magrittr) # provides %>%
# split each string into its numeric ids
temp = strsplit(df_a$string, "-") %>% lapply(function(x) as.numeric(x))
actual.List = list()
for(i in seq_along(temp)){
  temp.List = character(length(temp[[i]]))
  for (j in seq_along(temp[[i]])){
    # look up the name whose id_string matches the j-th id of row i
    temp.List[j] = id$name[match(temp[[i]][j], id$id_string)]
  }
  # glue the names back together in their original order
  actual.List[[i]] = paste(temp.List, collapse = '-')
}
desired.Output = data.frame(id = df_a$id, string = unlist(actual.List))
#cleanup
rm(temp,temp.List,actual.List)
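If the loop turns out to be too slow on larger data, a vectorised lookup may be enough before reaching for C++. The following is a minimal sketch, assuming every digit in string has a matching id_string in the reference table; the names ref and lookup are introduced here for illustration, and plain data frames stand in for the tibbles.

```r
# Example data from the question, as plain data frames
df_a <- data.frame(
  id = c(115088, 678326, 105616),
  string = c("1-3-5-13", "1-9-13-3", "1-3-5-13"),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  name = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg"),
  id_string = c(1, 3, 5, 13, 9, 8, 6),
  stringsAsFactors = FALSE
)

# Named vector: names are the ids (as text), values are the replacement names
lookup <- setNames(ref$name, ref$id_string)

# Split each string on "-", look every piece up by name, and paste it back
df_a$string <- vapply(
  strsplit(df_a$string, "-", fixed = TRUE),
  function(parts) paste(lookup[parts], collapse = "-"),
  character(1)
)
df_a
```

An id that is missing from the reference table would come out as the text "NA" here, so it may be worth checking for that first.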

Related

Automatically create data frames based on factor levels of a column

I have some fake case data with a manager id, type, and location. I'd like to automatically create data frames with the average number of cases a manager has at a given location.
# create fake data
manager_id <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3)
type <- c("A", "A", "B", "B", "B", "A", "A", "A", "C", "A", "B", "B", "C", "C", "C")
location <- c("Beach", "Beach", "Beach", "Beach", "Beach", "City", "City", "City", "Farm", "Farm", "Farm", "Farm", "Farm", "Farm", "City")
manager_id <- data.frame(manager_id)
type <- data.frame(type)
location <- data.frame(location)
df <- cbind(manager_id, type, location)
After creating fake data, I created a function that finds this average. The function works.
avgs_function <- function(dat){
dat1 <- dat %>% group_by(manager_id) %>% summarise(total = n())
total <- mean(dat1$total)
total <- round(total, 0)
total
}
I then loop through each location, create data frames using the avgs_function, and store them in a list. Then I call the data frames into my global environment. Something is going wrong here that I can't figure out. The weird thing is that it was working fine yesterday.
df_list <- unique(df$location) %>%
set_names() %>%
map(~avgs_function(df))
names(df_list) <- paste0(names(df_list), "_avg")
list2env(df_list, envir = .GlobalEnv)
Right now, the code is giving these values:
Beach_avg = 5
City_avg = 5
Farm_avg = 5
I would like:
Beach_avg = 5
City_avg = 2
Farm_avg = 3
I believe the issue is happening with the purrr package. Any help would be greatly appreciated!
I don't think you need purrr at all (just dplyr): this gets your desired output
result <-(df
%>% count(manager_id, location)
%>% group_by(location)
%>% summarise(across(n, mean))
)
This leaves the location names without the _avg suffix; you could add mutate(across(location, ~ paste0(.x, "_avg"))) (or something with glue) if you wanted it.
It also doesn't create the separate variables you wanted. You can obviously add more stuff, e.g. with(result, setNames(as.list(n), location)) %>% list2env(envir = .GlobalEnv), but in general workflows that populate your global workspace with a bunch of different named variables are a bad idea; collections like this can usually be handled better by keeping them inside a list/data frame/tibble.
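As for why the original loop printed 5 for every location: the lambda ~avgs_function(df) never uses the location it is iterating over, so each call sees the whole data frame (15 cases across 3 managers, mean 5). In the purrr version, passing df[df$location == .x, ] instead of df should give the per-location values. The same idea in base R, as a rough sketch that keeps the results in one named vector (avg_by_location is a name made up here):

```r
# Fake data from the question, built in one step
df <- data.frame(
  manager_id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  type = c("A", "A", "B", "B", "B", "A", "A", "A", "C", "A", "B", "B", "C", "C", "C"),
  location = c("Beach", "Beach", "Beach", "Beach", "Beach", "City", "City", "City",
               "Farm", "Farm", "Farm", "Farm", "Farm", "Farm", "City"),
  stringsAsFactors = FALSE
)

# Average number of cases per manager within whatever subset is passed in
avgs_function <- function(dat) {
  cases_per_manager <- as.vector(table(dat$manager_id))
  round(mean(cases_per_manager), 0)
}

# split() hands each location's rows to the function separately
avg_by_location <- sapply(split(df, df$location), avgs_function)
names(avg_by_location) <- paste0(names(avg_by_location), "_avg")
avg_by_location
```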

How to replace empty cells of a particular column in a list with a character in R

I have a list with around 100 data frames and want to replace the empty cells in a particular column (named Event) of all the data frames in the list with a character. I first tried the following code,
lapply(my_list, function(x) replace(x,is.na(x),"No_Event"))
The above code replaces every NA with "No_Event". But I want to replace only the empty cells in a specific column. I am also not sure how to represent blank cells in the code; " " doesn't work.
Then I tried,
lapply(my_list, function(x) transform(x, Event = chartr(" ", 'No_Event', Event)))
I understand that the above code replaces a particular letter with the specified character, but I am not sure how to replace the empty/blank cells of the specific column with a character. I also tried some other code, which produced errors. Apologies if the question is very basic or my approach is wrong.
Thanks
Here is a reproducible example (R Version 4.1.0)
library(tidyverse)
my_list <- list(
data.frame(a = 1:5, Event = c(6, "", "", 9, ""), c = 11:15),
data.frame(a = 1:5, Event = c("", "", "", 8, 9), c = 16:20)
)
lapply(my_list, FUN = \(x) {
x |> mutate(Event = case_when(Event == "" ~ "No event", TRUE ~ Event))
})
For earlier R versions:
lapply(my_list, FUN = function(x) {
x %>% mutate(Event = case_when(Event == "" ~ "No event", TRUE ~ Event))
})
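If the column may hold NA or whitespace-only cells as well as empty strings, a base R variant can cover all three. This is a sketch, assuming Event is a character column; fill_event and its label argument are hypothetical names.

```r
my_list <- list(
  data.frame(a = 1:5, Event = c(6, "", "", 9, ""), c = 11:15,
             stringsAsFactors = FALSE),
  data.frame(a = 1:3, Event = c(NA, " ", "x"), c = 16:18,
             stringsAsFactors = FALSE)
)

# Replace NA, empty, and whitespace-only cells in the Event column only
fill_event <- function(x, label = "No_Event") {
  blank <- is.na(x$Event) | trimws(x$Event) == ""
  x$Event[blank] <- label
  x
}

result <- lapply(my_list, fill_event)
result
```

The other columns are left untouched, since only x$Event is ever indexed.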

Apply a function to a data frame in R

I need to create a function that returns the vector of unique values of a column of a data frame. As input, I should pass the data frame and the column name.
This is what I did:
Val_Uniques <- function(df, col) {
return(unique(df$col))
}
Val_Uniques(mytable, city)
The result is NULL. How can I fix it, please?
I want to add a tryCatch block and print a warning message "the column does not exist" if the name of the column is wrong.
Thank you in advance
I'm sure you're looking for deparse(substitute(x)) and get() here. The former converts the supplied names into strings; the latter retrieves the object with that name. For the exception we could simply use an if expression.
Val_Uniques <- function(df, col) {
df <- deparse(substitute(df))
df <- get(df)
col <- deparse(substitute(col))
if(!(col %in% names(df)))
stop("the column does not exist")
return(unique(df[[col]]))
}
Test
> Val_Uniques(mytable, city)
[1] A D B C E
Levels: A B C D E
> Val_Uniques(mytable, foo)
Error in Val_Uniques(mytable, foo) : the column does not exist
Data
mytable <- data.frame(city=LETTERS[c(1, 4, 4, 2, 3, 2, 5, 4)],
var=c(1, 3, 22, 4, 5, 8, 7, 9))
Try this one:
df <- data.frame(id = c("A", "B", "C", "C"),
val = c(1,2,3,3), stringsAsFactors = FALSE)
Val_Uniques <- function(df, col) {
return(unique(df[, col]))
}
Val_Uniques(df, "id")
[1] "A" "B" "C"
This link helps with passing column names to functions: Pass a data.frame column name to a function
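The original df$col returns NULL because $ treats col as a literal column name instead of evaluating the argument; [[ with a string avoids that. For the tryCatch and warning the question asks about, one possible sketch, assuming the column name is passed as a string (val_uniques_safe is a hypothetical name):

```r
mytable <- data.frame(city = LETTERS[c(1, 4, 4, 2, 3, 2, 5, 4)],
                      var = c(1, 3, 22, 4, 5, 8, 7, 9),
                      stringsAsFactors = FALSE)

val_uniques_safe <- function(df, col) {
  tryCatch({
    # raise an error ourselves when the column is missing
    if (!col %in% names(df)) stop("the column does not exist")
    unique(df[[col]])
  }, error = function(e) {
    # downgrade the error to a warning and return NULL
    warning(conditionMessage(e), call. = FALSE)
    NULL
  })
}

val_uniques_safe(mytable, "city")
```

Calling val_uniques_safe(mytable, "foo") should then issue the warning and return NULL instead of stopping the script.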

Using an element from a table in selecting columns/rows in R

I've been working on a process to create all possible combinations of unique integers for lengths 1:n. I found the nCr function (the combn function in the combinat package) to be useful here.
Once all unique occurrences are iterated, they are appended to a consolidation table that contains any possible length+combination of the digits 1:n. A subset of the final table's relevant column (one record) looks like this (column is named String and the subset table f1):
c(1,3,4,5,9,10)
I need to select these columns from a secondary data source (df) one at a time (I am going to loop through this table), so my logic was to use this code:
df[,f1$String]
However, I get a message that says that undefined columns are selected, but if I copy and paste the contents of the cell such as:
df[,c(1, 3, 4, 5, 9, 10)]
it works fine ... I've tried all I can think of at this point; if anyone has some insight it would be greatly appreciated.
Code to reproduce is:
library(combinat)
library(data.table)
library(plyr)
rm(list=ls())
NCols=10
NRows=10
myMat<-matrix(runif(NCols*NRows), ncol=NCols)
XVars <- as.data.frame(myMat)
colnames(XVars) <- c("a","b","c","d","e","f","g","h","i","j")
x1 <- as.data.frame(colnames(XVars[1:ncol(XVars)]))
colnames(x1) <- "Independent.Variable"
setDT(x1)[, Index := .GRP, by = "Independent.Variable"]
colClasses = c("character", "numeric", "numeric")
col.names = c("String", "r!", "n!")
Combination <- read.table(text = "", colClasses = colClasses, col.names = col.names)
for(i in 1:nrow(x1)){
x2<- as.data.frame(combn(nrow(x1),i))
for (j in 1:ncol(x2)){
x3 <- paste("c(",paste(x2[1:nrow(x2),j], collapse = ", "), ")", sep="")
x3 <- as.data.frame(x3)
colnames(x3) <- "String"
x3 <- mutate(x3, "r!" = nrow(x2))
x3 <- mutate(x3, "n!" = nrow(x1))
Combination <- rbind(Combination, x3)
}
}
setDT(Combination)[, Index := .GRP, by = c("String", "r!", "n!")]
f1 <- Combination[717,]
f1$String <- as.character(f1$String)
## reference to data frame
myMat[,(f1$String)]
## pasted element
myMat[, c(1, 3, 4, 5, 9, 10)]
f1$String is the string "c(1, 3, 4, 5, 9, 10)". When you use myMat[,(f1$String)], R will look for the column with name "c(1, 3, 4, 5, 9, 10)". To get column numbers 1,3,4,5,9,10, you have to parse the string to an R expression and evaluate it first:
myMat[,eval(parse(text=f1$String))]
As @user3794498 noticed, f1$String is set with as.character(), so you cannot use it directly to get the columns you want.
You can change the way you define f1, or extract the column numbers from f1$String. Something like this should also work (load stringr first): myMat[, f1$String %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric].
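The same extraction can be done in base R without eval(parse()) or stringr. A small sketch, assuming the string only ever contains the column numbers; col_string and cols are names introduced here:

```r
set.seed(1)
myMat <- matrix(runif(100), ncol = 10)

# The stored text, as it appears in f1$String
col_string <- "c(1, 3, 4, 5, 9, 10)"

# Pull out every run of digits and convert to integer column indices
cols <- as.integer(regmatches(col_string, gregexpr("[0-9]+", col_string))[[1]])

selected <- myMat[, cols]
dim(selected)
```

Storing the indices as a list column of integer vectors instead of text would avoid the parsing step altogether.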

R- apply a function to a multiple data frames, data frame names to be pulled from another dataframe

I have seen many posts on this topic but sadly could not find what I am trying to achieve. I have three dataframes (data1,data2,data3) as below:
first_name=c("xxx", "yyy", "zzz")
address=c("ca", "sa", "la")
data1=data.frame(first_name,address)
first_name=c("aaa", "bbb", "ccc")
address=c("ca", "sa", "la")
data2=data.frame(first_name,address)
first_name=c("abc", "bab", "cac")
address=c("ca", "sa", "la")
data3=data.frame(first_name,address)
I need to apply a function to the above 3 dataframes (not only these 3; there can be many more, so it has to be dynamic). The data frames are present in my environment. Also, there is a metadata dataframe which contains the names of these dataframes along with other columns.
name = c(2, 3, 5)
entity = c("aa", "bb", "cc")
tablename = c("data1", "data2", "data3")
metadata = data.frame(name, entity, tablename)
From the above metadata, I need to extract the values of tablename column, which are basically dataframes that are already present in my environment. I need to apply a function to these dataframes and generate different outputs for each.
Here is my function(example):
myfunction <- function(data) {
if(class(data)[1] != 'data.frame' && class(data)[1] != 'data.table') {
stop('Invalid input: data should be either data.frame or data.table')
} else {
variable <- names(data)
class <- sapply(data, class)
contents <- data.frame(variable,
class)
return(contents)
}
}
But when I try to use "list" and "lapply", it does not work.
tablelist <- list(sqldf("select tablename from metadata group by 1"))
lapply(tablelist,myfunction)
The above code applies my function to the tablename column itself instead of to the data frames named in the tablename column.
Any help would be much appreciated.
This will do it:
data1 <- data.frame(first_name=c("xxx", "yyy", "zzz"), address=c("ca", "sa", "la"))
data2 <- data.frame(first_name=c("aaa", "bbb", "ccc"), address=c("ca", "sa", "la"))
data3 <- data.frame(first_name=c("abc", "bab", "cac"), address=c("ca", "sa", "la"))
metadata <- data.frame(name = c(2, 3, 5), entity = c("aa", "bb", "cc"), tablename = c("data1", "data2", "data3") )
myfunction <- function(data) {
if(class(data)[1] != 'data.frame' && class(data)[1] != 'data.table') {
stop('Invalid input: data should be either data.frame or data.table')
} else {
variable <- names(data)
class <- sapply(data, class)
contents <- data.frame(variable,
class)
return(contents)
}
}
lapply(as.character(metadata$tablename), function(dfname) myfunction(get(dfname)))
as.character() is needed because data.frame() had stringsAsFactors = TRUE as its default before R 4.0.0.
Or you can set stringsAsFactors = FALSE:
metadata <- data.frame(name = c(2, 3, 5), entity = c("aa", "bb", "cc"),
tablename = c("data1", "data2", "data3") , stringsAsFactors = FALSE)
lapply(metadata$tablename, function(dfname) myfunction(get(dfname)))
You can put the result in a list: L <- lapply(...). You can look at one element of the list with, e.g., L[[1]]. You can also name the elements of the list to access them by name:
names(L) <- as.character(metadata$tablename)
L[["data1"]] # or
L$data1
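Since get() is called once per name, mget() may be a convenient alternative: it fetches several objects at once and returns them as a list already named after the tables. A sketch under the assumption that every tablename exists in the calling environment; describe_columns is a simplified stand-in for myfunction:

```r
data1 <- data.frame(first_name = c("xxx", "yyy", "zzz"), address = c("ca", "sa", "la"),
                    stringsAsFactors = FALSE)
data2 <- data.frame(first_name = c("aaa", "bbb", "ccc"), address = c("ca", "sa", "la"),
                    stringsAsFactors = FALSE)
data3 <- data.frame(first_name = c("abc", "bab", "cac"), address = c("ca", "sa", "la"),
                    stringsAsFactors = FALSE)
metadata <- data.frame(name = c(2, 3, 5), entity = c("aa", "bb", "cc"),
                       tablename = c("data1", "data2", "data3"),
                       stringsAsFactors = FALSE)

# Simplified stand-in for myfunction: one row per column with its first class
describe_columns <- function(data) {
  if (!inherits(data, "data.frame")) {
    stop("Invalid input: data should be either data.frame or data.table")
  }
  data.frame(variable = names(data),
             class = unname(vapply(data, function(col) class(col)[1], character(1))),
             stringsAsFactors = FALSE)
}

# mget() looks up all names at once; the list keeps the table names
tables <- mget(as.character(metadata$tablename))
L <- lapply(tables, describe_columns)
L$data1
```

Because the list is named from the start, the separate names(L) <- ... step is not needed with this approach.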

Resources