Simple loop with subset and variable name assignment - r

I am actually learning R and I don't understand why this simple assignment does not works. I would like to subset by year using the filter function of the dplyr package. After several tentatives, here are a reproducible example using the gapminder dataset.
I could use the subset function, lapply, or even anonymous function to solve this problem, but here, I just want to understand why this specific code is not working.
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
paste0("gapminder", i) <- print(gapminder %>%
filter(year == i))
}
With or without print, same problem

It's because your assignment is to a function (paste0).
If you remove that part it prints each filtered dataframe:
library(gapminder)
library(dplyr)
for (i in unique(gapminder$year)) {
print(gapminder %>% filter(year == i))
}
You could assign each to a list, like so:
my_list <- list()
library(gapminder)
library(dplyr)
for (i in seq_along(unique(gapminder$year))) {
year_filter <- unique(gapminder$year)[i] # each iteration we get another year
my_list[[i]] <- gapminder %>% filter(year == year_filter)
cat(paste0("gapminder", year_filter, " ")) # use cat if you want to print at each iteration
}
paste0 just concatenates vectors after converting to character.

Use assign function to store the output.
for (i in unique(gapminder$year))
{
assign(paste0("gapminder", i),print(gapminder %>%filter(year == i)))
}
If you want to get the specific output, use get function.
out_i = get(paste0("gapminder", i))

Related

How to drop columns that meet a certain pattern over a list of dataframes

I'm trying to drop columns that have a suffix .1 - indicating that this is a repeated column name. This needs to act over a list of dataframe
I have written a function:
drop_duplicated_columns <- function (df) {
lapply(df, function(x) {
x <- x %>% select(-contains(".1"))
x
})
return(df)
}
However it is not working. Any ideas why?
One tidy way to solve this problem would be to first create a function that works for one data.frame and then map this function to a list
library(tidyverse)
drop_duplicated_columns <- function(df) {
df %>%
select(-contains(".1"))
}
Or even better
drop_duplicated_columns <- . %>%
select(-contains(".1"))
Usage in pipes, combine it with a map
list_dfs <- list(mtcars,mtcars)
list_dfs %>%
map(drop_duplicated_columns)
If you just need one function you can create a new pipe using the functioning code that you tested before
drop_duplicated_columns_list <- . %>%
map(drop_duplicated_columns)
list_dfs %>%
drop_duplicated_columns_list()

Problem with mutate keyword and functions in R

I got a problem with the use of MUTATE, please check the next code block.
output1 <- mytibble %>%
mutate(newfield = FND(mytibble$ndoc))
output1
Where FND function is a FILTER applied to a large file (5GB):
FND <- function(n){
result <- LARGETIBBLE %>% filter(LARGETIBBLE$id == n)
return(paste(unique(result$somefield),collapse=" "))
}
I want to execute FND function for each row of output1 tibble, but it just executes one time.
Never use $ in dplyr pipes, very rarely they are used. You can change your FND function to :
library(dplyr)
FND <- function(n){
LARGETIBBLE %>% filter(id == n) %>% pull(somefield) %>%
unique %>% paste(collapse = " ")
}
Now apply this function to every ndoc value in mytibble.
mytibble %>% mutate(newfield = purrr::map_chr(ndoc, FND))
You can also use sapply :
mytibble$newfield <- sapply(mytibble$ndoc, FND)
FND(mytibble$ndoc) is more suitable for data frames. When you use functions such as mutate on a tibble, there is no need to specify the name of the tibble, only that of the column. The symbols %>% are already making sure that only data from the tibble is used. Thus your example would be:
output1 <- mytibble %>%
mutate(newfield = FND(ndoc))
FND <- function(n){
result <- LARGETIBBLE %>% filter(id == n)
return(paste(unique(result$somefield),collapse=" "))
}
This would be theoretically, however I do not know if your function FND will work, maybe try it and if not, give some practical example with data and what you are trying to achieve.

How do I make a for loop with the filter function?

I'm having a problem with using the filter() function inside a for loop, it doesn't filter the data frame and instead creates an i value. The code is below:
library(tidyverse)
library(magrittr)
library(dplyr)
funcexrds <- readRDS("C:/Users/chlav/Dropbox/Antidumping/Data/ano_pais_imp/funcex.rds")
funcexrds <- funcexrds %>% arrange(desc_cnae, pais)
View(funcexrds)
funcexpais_lista <- funcexrds %>% select(pais) %>% as.list()
funcexcnae_lista <- funcexrds %>% select(desc_cnae) %>% as.list()
subset1 <- filter(funcexrds, pais == "África do Sul", desc_cnae == "Abate de reses, exceto suínos")
for (i in 1:length(unique(funcexpais_lista))) {
funcexrds_t <- filter(funcexrds, pais == "i")
}
As you can see if you reproduce the code, subset1 returns the filtered dataset as you expect, but the for loop doesn't
I agree with #Clemsang. If you're trying to get the for loop to pull out whatever relevant information is at Pais == 1, Pais == 2, etc. putting i outside of quotes effectively shows the for loop where to put the number you indicated in
for (i in 1:length(unique(funcexpais_lista)))
Also just some housekeeping to keep in mind, since tidyverse already contains the dplyr and magrittr functions, you should only need to load tidyverse before starting your code!

Getting Looped Output into an Appended Object

So I am trying to make a basic sensitivity analysis script. The outputs come out as I want via the print I added to the end of the script. Issue is that I would like a tibble or object that has all the outputs appended together that I can export as a csv or xlsx.
I created two functions, sens_analysis which runs all the code, and multiply_across which multiplies across each possible percentage across each possible column of your table. You need multiply_across to run the sens_analysis.
I would normally like a title but instead I just added an indicator column instead that I can sort by.
I made everything with mtcars so it should be easy to replicate, the issue is that I just have a huge print at the end; not an object that I can manipulate or pull from for other analysis.
I have been trying the rbind, bind_row, appending rows in a variety of ways.
Or building a new object. As you can see in the code at line (18) I make something called output that I have tried to populate, which hasn't gone well.
rm(list = ls())
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(magrittr)
library(xtable)
data<-mtcars
percent<-c(.05,.1,.15)
goods<-c("hp","gear","wt")
weight<-c(6,7,8)
disagg<-"cyl"
func<-median
sens_analysis<-function(data=data, goods=goods, weight=weight, disagg=disagg, precent=percent, func=func){
output<-NULL%>%
as.tibble()
basket<-(rbind(goods,weight))
percent<-c(0,percent,(percent*-1))
percent_to_1<-percent+1
data_select<-data%>%
dplyr::select(c(goods,disagg))%>%
group_by_at(disagg)%>%
summarise_at(.vars = goods ,.funs = func)%>%
as_tibble()
data_select_weight<-purrr::map2(data_select[,-1], as.numeric(basket[2,]),function(var, weight){
var*weight
})%>% as_tibble %>%
add_column(data_select[,1], .before = 1)
colnames(data_select_weight)[1]<-disagg
multiply_across(data_select_weight,percent_to_1)
return(output)
#output2<-rbind(output2,output)
}
############################
multiply_across<-function(data=data_select_weight,list=percent_to_1){
varlist<-names(data[,-1])
for(i in varlist){
df1 = data[,i]
for(j in list){
df<-data
df[,i]<-round(df1*j,2)
df<-mutate(df, total = round(rowSums(df[,-1]),2))%>%
mutate(type=paste0(i," BY ",(as.numeric(j)-1)*100,"% OVER ",disagg))%>%
print(df)
#output<-bind_rows(output,df)
#output<-bind_rows(output,df)
#output[[j]]<-df[[j]]
}
}
}
##############################################################################################
sens_analysis(data,goods,weight,disagg,percent,func)
The expected result if you just run the code straight-up should just be a bunch of printed tibbles, that arent in an object. But ideally, for future analysis on the data or easy of use, a table of the outputs appended together would be best.
So I figured it out and will add my answer here in case someone else hits this issues.
I created a list within loops and then binded those lists together.
Just focus on the binding rows outside the right for-loop.
multiply_across<-function(data=data_select_weight,
list=percent_to_1){
varlist <- colnames(data[, -1])
output_list <- list()
for (i in varlist) {
df1 <- data[,i]
for (j in list) {
name <- paste0(i, " BY ", (as.numeric(j)-1)*100, "% OVER ", disagg)
df <- as_tibble(data)
df[,i] <- round(df1*j, 2)
df <- mutate(df, total = round(rowSums(df[,-1]),2))%>%
mutate(type = paste0(i, " BY ", (as.numeric(j)-1)*100, "% OVER ", disagg))
df<-df[,c(6,1,2,3,4,5)]
output_list[[paste0(i," BY ",(as.numeric(j)-1)*100)]] <- (assign(paste0(i," BY ",(as.numeric(j)-1)*100,"% OVER ",disagg),df))
}
}
bind_rows(lapply(output_list,
as.data.frame.list,
stringsAsFactors=F))
}

Dplyr conditional select and mutate

I have working code which excludes columns based on a parameter and mutates certain columns based on other parameters. There is this SO question Can dplyr package be used for conditional mutating? but it does not address conditional select
Is there a way to have pure dplyr code without the if statements?
Working R Code:
# Loading
diamonds_tbl <- diamonds
head(diamonds_tbl)
# parameters
initialColumnDrop <- c('x','y','z')
forceCategoricalColumns <- c('carat','cut', 'color')
forceNumericalColumns <- c('')
# Main Code
if(length(which(colnames(diamonds_tbl) %in% initialColumnDrop))>=1){
diamonds_tbl_clean <- diamonds_tbl %>%
select(-one_of(initialColumnDrop)) #Drop specific columns in columnDrop
}
if(length(which(colnames(diamonds_tbl_clean) %in% forceCategoricalColumns))>=1){
diamonds_tbl_clean <- diamonds_tbl_clean %>%
mutate_at(forceCategoricalColumns,funs(as.character)) #Force columns to be categorical
}
if(length(which(colnames(diamonds_tbl_clean) %in% forceNumericalColumns))>=1){
diamonds_tbl_clean <- diamonds_tbl_clean %>%
mutate_at(forceNumericalColumns,funs(as.numeric)) #Force columns to be numeric
}
I don't really understand the desire for a "pure dplyr" solution, but you can make any problem easier with helper functions. For example you could write a a function to run a transformation only if certain columns are found
run_if_cols_match <- function(data, cols, expr) {
if (any(names(data) %in% cols)) {
expr(data)
} else {
data
}
}
Then you could use that in a pipe
diamonds_tbl_clean <- diamonds_tbl %>%
run_if_cols_match(initialColumnDrop,
. %>% select(-one_of(initialColumnDrop))) %>%
run_if_cols_match(forceCategoricalColumns,
. %>% mutate_at(forceCategoricalColumns,funs(as.character))) %>%
run_if_cols_match(forceNumericalColumns,
. %>% mutate_at(forceNumericalColumns,funs(as.numeric)))
which would do the same thing as your code. Here just just conditionally run different anonymous pipes.

Resources