Create new data frames from a master database in R

I have a database of different notifiable diseases.
I want to extract a data frame for each disease in that database so that I can generate an automated report from a template in R Markdown.
I created a function for creating the dataframe
NMC <- is master database
The database lists all conditions reported
I created a list of those conditions
conditions <- list(unique(NMC$Condition))
I then created a function to create a new dataframe based on the condition
newdf <- function(data, var){
var <- data %>% filter(data$Condition %in% paste0(var))
var
}
Now I want to run my function to create a number of new dataframes from the master database. I thought of doing a for loop:
for (df in conditions){
df <- newdf(NMC, "df")
}
Which runs but doesn't give me anything.
So I found split(), but this hasn't perfectly solved my problem as I still need to type out all the conditions to get each df to apply to the r template.
NMC <- split(NMC, factor(NMC$Condition), drop= FALSE)
# then to get a specific df (which is laborious)
rubella <- NMC$congenitalrubellasyndrome
# How can I get the data frames per condition into my environment, or access them easily, maybe with a %>% function?
My end goal is to then apply an R template to each data frame so that I have a standard epicurve and descriptive stats for each disease.
Thanks

> df <- data.frame(a = rep(letters[1:10], each = 3), x = 1:30)
> for (i in df$a) {
+ assign(i, df[df$a == i, ])
+ }
> ls()
[1] "a" "b" "c" "d" "df" "e" "f" "g" "h" "i" "j"
> a
a x
1 a 1
2 a 2
3 a 3
But see my comment above.
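One caveat about the loop above: because the loop variable is itself named i, the data frame assigned to "i" is overwritten on the next iteration, so after the loop i holds the string "j" rather than a data frame. A rough alternative, assuming the aim is only to process every condition without typing its name, is to keep the output of split() as a named list and iterate over its names. The sample data and the Cases column below are invented for illustration; only the Condition column name comes from the question.

```r
# Hypothetical stand-in for the NMC master data; the values are made up.
NMC <- data.frame(
  Condition = c("measles", "rubella", "measles", "pertussis"),
  Cases = c(3, 1, 2, 5),
  stringsAsFactors = FALSE
)

# One data frame per condition, kept together in a named list.
by_condition <- split(NMC, NMC$Condition)

# Iterate over the names instead of typing each condition out by hand.
case_totals <- vapply(
  names(by_condition),
  function(cond) sum(by_condition[[cond]]$Cases),
  numeric(1)
)
case_totals
```

The function body is where a per-disease report could be produced, for example by passing by_condition[[cond]] to rmarkdown::render() through its params argument, assuming the template is written as a parameterised report.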

Related

R function that selects certain columns from a dataframe

I am trying to figure out how to write a function in R that can select specific columns from a dataframe (df) for subsetting:
Essentially I have df with columns or colnames : count_A.x, count_B.x, count_C.x, count_A.y, count_B.y, count_C.y.
I would ideally like a function where I can select both "count_A.x" and "count_A.y" columns by simply specifying count_A in function argument.
I tried the following:
e.g. pull_columns2 <- function(df,count_char){
df_subset<- df%>% select(,c(count_char.x, count_char.y))
}
Unfortunately, when I run the code above, i.e. pull_columns2(df, count_A), R rightly reports that the column count_char.x does not exist; it does not substitute count_A for count_char:
pull_columns2(df, count_A)
We can use
pull_columns2 <- function(df,count_char){
df_subset<- df %>% select(contains(count_char))
df_subset
}
#> then use it as follows
df %>% pull_columns2("count_A")
Try
select_func = function(df, pattern){
return(df[colnames(df)[which(grepl(pattern, colnames(df)))]])
}
df = data.frame("aaa" = 1:10, "aab" = 1:10, "bb" = 1:10, "ca" = 1:10)
select_func(df,"b")
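Both answers match the pattern anywhere in the column name, so "b" also picks up "aab". If only names that begin with the given text should match, a base R sketch could use startsWith(). The helper name select_prefix and the sample columns are made up for illustration.

```r
# Hypothetical helper: keep only the columns whose names start with `prefix`.
select_prefix <- function(df, prefix) {
  df[startsWith(colnames(df), prefix)]
}

df <- data.frame(count_A.x = 1:3, count_B.x = 4:6,
                 count_A.y = 7:9, count_B.y = 10:12)

selected <- select_prefix(df, "count_A")
colnames(selected)
# [1] "count_A.x" "count_A.y"
```

Because startsWith() compares literal text, characters such as "." in the prefix are not treated as regular-expression wildcards.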

set dataframe name inside a function using lapply

Let's say, for the sake of the example, that I have a list of departments. Each of them is stored in a separate table named after the department ("departmentName"), so I created a list this way.
depts <- c("financial","sales",.....)
and then iterate to get members this way creating a function:
get.employees <- function(tablename) {
con <- DBI::dbConnect(connectiondata....)
query <- glue::glue("select name,position,area from {tablename}")
assign(tablename,
dplyr::tbl(conn, sql(query)) %>% collect())
}
lapply(depts,get.employees)
It works, but it returns a list of data frames whose elements have no names, which is not what I was expecting.
I need every dataframe named as the department name.
1) Simplifying the example to use get.employees and depts in the Note at the end we can use Map instead of lapply:
L <- Map(get.employees, depts)
names(L)
## [1] "finance" "sales"
2) This also works:
L2 <- sapply(depts, get.employees, simplify = FALSE)
names(L2)
## [1] "finance" "sales"
Note
Simplified example:
get.employees <- function(x) BOD
depts <- c("finance", "sales")
You can also try-
> ls <- mapply(get.employees, depts,SIMPLIFY = F)
> names(ls)
[1] "finance" "sales"
Note: the input data was taken from the answer provided by @G. Grothendieck
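The reason lapply returns an unnamed list is that it only carries over names the input already has, and a plain character vector has none; Map, sapply and mapply instead fall back to the character values themselves. Another option, sketched here with a made-up stand-in for get.employees, is to add the names explicitly with setNames().

```r
# Hypothetical stand-in for the real database query.
get.employees <- function(x) data.frame(dept = x, stringsAsFactors = FALSE)
depts <- c("finance", "sales")

# lapply alone gives no names because depts has none.
is.null(names(lapply(depts, get.employees)))

# Attach the department names explicitly.
L3 <- setNames(lapply(depts, get.employees), depts)
names(L3)
# [1] "finance" "sales"
```

Individual tables are then available as L3$finance or L3[["sales"]].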

How can I erase duplicate data from my dataframe

My code so far looks like this. I have been trying to eliminate the letters that appear in both a new and an old vector; the letters represent emails. I have tried the unique and distinct functions, but they keep one copy of each duplicated value, whereas I need to remove every copy. This is the vector I would like as a result:
c(b,c,e,f,t,r,w,u,p,q)
new <- c("a","b","c","d","e","f","t")
old <- c("r","w","u","a","d","p","q")
num <- c(1:7)
df_new <- data.frame(num, new)
df_old <- data.frame(num, old)
df_new <- transmute(df_new, num, emails = new)
df_old <- transmute(df_old, num, emails = old)
all_emails <- merge(df_new, df_old, all = TRUE)
From what you show, you are complicating things unnecessarily by putting them in a data frame. Try this:
new <- c("a","b","c","d","e","f","t")
old <- c("r","w","u","a","d","p","q")
x = c(new, old)
result = x[!duplicated(x) & !duplicated(x, fromLast = TRUE)]
result
# [1] "b" "c" "e" "f" "t" "r" "w" "u" "p" "q"
Another method, if both your vectors are individually unique and you just need to drop everything that is in both new and old:
result = setdiff(union(new, old), intersect(new, old))
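A third variant, given as a sketch rather than a recommendation, counts how often each value occurs with table() and keeps the values seen exactly once. Like the duplicated() approach, it also drops values repeated within a single vector. The names new_emails and old_emails are used here only to avoid masking the base function new().

```r
new_emails <- c("a", "b", "c", "d", "e", "f", "t")
old_emails <- c("r", "w", "u", "a", "d", "p", "q")

x <- c(new_emails, old_emails)

# Count the occurrences of each value, then look up the count for every element.
counts <- table(x)
result <- x[as.vector(counts[x]) == 1]
result
# [1] "b" "c" "e" "f" "t" "r" "w" "u" "p" "q"
```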

r- transforming columns, calling them by $name, using a loop

I imported a tibble from a text file. Many numeric columns were imported as "chr", I guess because they contain "," instead of "." as the decimal separator.
My goal is to write a loop which runs through the names of desired columns, replaces "," with "." and converts columns into "num".
Little example:
data <- data.frame("A1" =c("2,1","2,1","2,1"), "A2" =c("1,3","1,3","1,3"),
stringsAsFactors = F) %>% as.tibble() #example data
colname <- c("A1", "A2") #creating variable for loop
for(i in colname) {
nam <- paste0("data$", i)
assign(nam, as.numeric(gsub(",",".", eval(parse(text = paste0("data$",i))))) )
}
Instead of overwriting the existing column, R creates a new variable:
data$A1 # that's the existing column as part of the tibble
[1] "2,1" "2,1" "2,1"
`data$A1` # that's just a new variable; mind the backticks
[1] 2.1 2.1 2.1
I also tried to assign (<-) the new numeric values via eval, but that does not work either.
eval(parse(text = paste0("data$", i))) <- as.numeric(
gsub(",",".", eval(parse(text = paste0("data$",i)))))
Error: target of assignment expands to non-language object
Any suggestions on how to transform? I have the same issue with other columns that I want to aggregate to a new variable. This variable should also be part of the existing tibble. I could do it by hand. This would take lots of time and probably produce many mistakes.
Thanks a lot!
Sam
As you are already working with the tidyverse, you can use dplyr::mutate_at and the colname variable you have already defined.
data %>%
mutate_at(.vars = colname,
.funs = function(x) { as.numeric(gsub(",", ".", x)) })
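The original loop fails because assign() creates a new variable whose name is literally "data$A1" instead of modifying the column. If a loop is still preferred, a base R sketch can index the columns with [[ ]], which accepts the column name as a string. A plain data frame is used below to keep the example free of package dependencies; the same indexing should work on a tibble.

```r
data <- data.frame(A1 = c("2,1", "2,1", "2,1"),
                   A2 = c("1,3", "1,3", "1,3"),
                   stringsAsFactors = FALSE)
colname <- c("A1", "A2")

for (i in colname) {
  # data[[i]] reads and writes the column named by the string in i.
  data[[i]] <- as.numeric(gsub(",", ".", data[[i]], fixed = TRUE))
}

str(data)
```

Note that the mutate_at() call above returns a new tibble, so its result has to be assigned back (data <- data %>% ...) for the change to persist. If the file is read with readr, setting locale = readr::locale(decimal_mark = ",") at import time may avoid the conversion altogether.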

Removing values, stored in character vector, from a list

I want to remove certain values from a list to create a refined list. I have all of the values I want to remove stored in a character vector named remove. The values in remove correspond to the first column of the list. I've run the following code:
refined_list = list
for (i in length(list)){
if (refined_list[i,1] %in% remove){
refined_list = refined_list[-i,]
}
else{
refined_list = refined_list
}
}
Only the initialization of refined_list seems to register. There are no errors, but refined_list is identical to list. It's a mystery to me.
It doesn't seem like you're actually talking about lists, since a list cannot be subset as you are proposing (i.e. list[, 1]). But if you're looking for a solution for a data.frame, here's one:
# Set up some test data
dd <- data.frame(letters = letters[1:10], stringsAsFactors = FALSE)
remove <- letters[c(1, 4, 6)]
# Shed values that are in remove
dd[!(dd[, 1] %in% remove), 1, drop = FALSE]
#> a one-column data frame with the rows "b" "c" "e" "g" "h" "i" "j"
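As for why the original loop appears to do nothing: for (i in length(list)) iterates over a single value, not over 1 to length(list), and for a data frame length() is the number of columns, not rows. So at most one row is ever checked. Deleting rows inside a loop also shifts the remaining indices, which is another reason to prefer a single logical filter. The sketch below assumes the object really is a data frame; the name to_remove is used instead of remove to avoid masking base::remove().

```r
dd <- data.frame(letters = letters[1:10], stringsAsFactors = FALSE)
to_remove <- c("a", "d", "f")

length(dd)         # 1: the number of columns, so the loop body runs once
seq_len(nrow(dd))  # 1 to 10: what a row-wise loop would need to visit

# Build the filter once, then subset; drop = FALSE keeps the data frame shape.
keep <- !(dd$letters %in% to_remove)
refined <- dd[keep, , drop = FALSE]
refined$letters
# [1] "b" "c" "e" "g" "h" "i" "j"
```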
