Create a dataframe looping a function's results - r

So this is a simplification of my problem.
I have a dataframe like this:
df <- data.frame(name=c("lucas","julio","jack","juan"),number=c(1,15,100,22))
And I have a function that creates new values for every name, like this:
var_number <- function(x) {
  example <- df %>%
    filter(name %in% unique(df$name)[x]) %>%
    select(-name) %>%
    mutate(value1=number/2^5, value2=number^5)
  (example)
}
var_number(1)
0.03125 1
Now I have two new values for every name and I would like to create a loop to save each result in a new dataframe.
I know how to solve this particular problem, but I need a general solution that allows me to save the results of all functions into a dataframe.
I'm looking for an automatic way to do something like this:
result<- bind_rows(var_number(1),var_number(2),var_number(3),var_number(4))
I would have to apply var_number around 1000 times, and the length would change with every test I do.
Is there any way I can do something like this? I was thinking about doing it with "for", but I'm not really sure how to do it. I have just started with R and I am a total newbie.

This answers my problem:
library(tidyverse) # contains purrr library
# an arbitrary function that always outputs a dataframe
# with a consistent number of columns, in this case 3
myfunc <- function(x){
  data.frame(a=x*2,
             b=x^2,
             c=log2(x))
}
# iterate over 1:10 as inputs to myfunc, and
# combine the results rowwise into a df
purrr::map_dfr(1:10,
               ~myfunc(.))
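
Applied to the data from the question, the same pattern might look like the sketch below. It assumes a variant of var_number() that takes the data frame as an explicit argument (that signature is not in the original), and it uses lapply() with dplyr::bind_rows() rather than map_dfr(), since map_dfr() is marked as superseded in recent purrr releases.

```r
library(dplyr)

df <- data.frame(name = c("lucas", "julio", "jack", "juan"),
                 number = c(1, 15, 100, 22))

# Assumed variant of var_number(): the data frame is passed in explicitly
var_number <- function(data, i) {
  data %>%
    filter(name %in% unique(data$name)[i]) %>%
    select(-name) %>%
    mutate(value1 = number / 2^5, value2 = number^5)
}

# Loop over every unique name, however many there are,
# and stack the one-row results into a single data frame
n_names <- length(unique(df$name))
result <- bind_rows(lapply(seq_len(n_names), function(i) var_number(df, i)))
```

Because the loop length is taken from the data, nothing needs to change when the number of names changes between tests.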

Why do you want to apply the var_number function to each name, create a new dataframe for each, and then combine all of them together?
Do it only once, on the whole dataframe.
library(dplyr)
df1 <- df %>%
  mutate(value1=number/2^5,value2=number^5) %>%
  select(-name)
If you want to do it only for specific names, you can filter them first before applying the above.
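
A minimal sketch of that filter-first variant, assuming a hypothetical vector `wanted` holding the names of interest:

```r
library(dplyr)

df <- data.frame(name = c("lucas", "julio", "jack", "juan"),
                 number = c(1, 15, 100, 22))

# Hypothetical selection of names; not part of the original question
wanted <- c("lucas", "jack")

df_subset <- df %>%
  filter(name %in% wanted) %>%                       # keep only the chosen names
  mutate(value1 = number / 2^5, value2 = number^5) %>%
  select(-name)
```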

Related

Trying to iterate over a list and append dataframes of weighted means in dplyr

I am trying to create a table which provides the weighted means of a list of variables by categories of another list of variables. I want to iterate over the second list of variables with each iteration appending the dataframe to the previous dataframe. I think this is supposed to involve imap_dfr from purrr but I can't quite get the code right. I want to use tidyverse for my code.
I'll use the illinois dataset from the pollster package for my example.
require(pollster)
# rv and voter are dummy variables that I want to recode to 1 and 0
# so that I can get the percent of people who are 1s in each variable.
# Here I recode them.
voter_vars <- c("rv", "voter")
df2 <- illinois %>%
  mutate_at(voter_vars, ~ recode(.x, "1" = 0, "2" = 1)) %>%
  mutate_at(voter_vars, ~ as.numeric(.x))
So those are the variables I want as the columns in my table. To get the weighted means for these two variables I write a function
news_summary <- function(var1){
  var1 <- ensym(var1)
  df3 <- df2 %>%
    group_by(!!var1) %>%
    summarise_at(vars(voter_vars),
                 funs(weighted.mean(., weight, na.rm=TRUE)))
  return(df3)
}
This creates a data frame output if I run it for one variable in the dataset
news_summary(educ6)
But what I want to do is run it for three variables in the dataset, rowbinding each output to the previous output so I have a table with all of the weighted means together.
demographic_vars <- c("educ6", "raceethnic", "maritalstatus")
However, I don't quite understand how to put this into imap_dfr (which I think is what I am supposed to use to do this) to make it work. I tried this based on code I found elsewhere. But it doesn't work.
purrr::imap_dfr(demographic_vars ~ news_summary(!!.x))
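
One way this could be approached is to pass the grouping variable as a string and look it up with the `.data` pronoun, then row-bind the per-variable tables. The sketch below is not the asker's code: it uses the built-in mtcars data as a stand-in for the illinois dataset (with `am` and `vs` as the 0/1 variables and `wt` as the weight), and the function name `weighted_summary` and its arguments are made up for illustration.

```r
library(dplyr)

# Weighted means of `value_vars` within each level of one grouping variable.
# All column names are passed as strings (assumed interface).
weighted_summary <- function(data, group_var, value_vars, weight_var) {
  data %>%
    mutate(level = as.character(.data[[group_var]]),
           w_tmp = .data[[weight_var]]) %>%
    group_by(level) %>%
    summarise(across(all_of(value_vars),
                     ~ weighted.mean(.x, w_tmp, na.rm = TRUE)),
              .groups = "drop") %>%
    mutate(variable = group_var) %>%
    select(variable, level, everything())
}

demographic_vars <- c("cyl", "gear")   # stand-ins for educ6, raceethnic, ...
value_vars <- c("am", "vs")            # stand-ins for rv, voter

# One table per grouping variable, stacked into a single data frame
result <- bind_rows(lapply(demographic_vars, function(g) {
  weighted_summary(mtcars, g, value_vars, "wt")
}))
```

Converting the grouping levels to character and storing them in a shared `level` column lets tables for differently typed grouping variables be stacked without type conflicts.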

Developing Functions to Make New Dataframes in R

I am trying to develop a function that will take data, see if it matches a value in a category (e.g., 'Accident'), and if so, build a new dataframe using the following code.
cat.df <- function(i) {
sdb.i <- sdb %>%
filter(Category == i) %>%
group_by(Year) %>%
summarise(count = n()}
The name of the dataframe should be sdb.i, where i is the name of the category (e.g., 'Accident'). Unfortunately, I cannot get it to work. I'm notoriously bad with functions and would love some help.
It's not entirely clear what you are after so I am making a guess.
First of all, your function cat.df is missing a closing bracket, so it would not run.
I think it is good practice to pass all objects as parameters to a function. In my example I use the iris dataset, so I pass it explicitly to the function.
You cannot change the name of a data frame in the way you describe. I offer two alternatives: if the number of categories is limited, you can create a separately named object for each; if you have many categories, it is best to combine all result objects into a list.
library(dplyr)
data(iris)
cat.df <- function(data, i) {
  data <- data %>%
    filter(Species== i) %>%
    group_by(Petal.Width) %>%
    summarise(count = n())
  return(data)
}
result.setosa <- cat.df(iris, "setosa") # unique name
Species <- sort(unique(iris$Species))
results_list <- lapply(Species, function(x) cat.df(iris, x)) # combine all df's into a list
names(results_list) <- Species # name the list elements
You can then get the list elements as e.g. results_list$setosa or results_list[[1]].

Applying dplyr's tally over large amount of columns to create codebook

I have a dataframe of 100+ variables and I would like to create a codebook to see the frequencies of each variable (and ideally output this to Excel). Right now, I'm using the following code:
freq_fun <- function(var){
  var <- enquo(var)
  frequencies <- raw %>% group_by(group, !!var) %>% tally()
  return(frequencies)
}
I added in the return in the hopes that looping by column names would at least show me the output but this was unsuccessful.
At this point, my plan is to do the following:
for(i in colnames(rawxl[,9:107])){
  assign(paste0(i,"freq"), freq_queue(!!i))
}
output each dataframe to a csv and then copy and paste into one excel doc. This is undesirable for obvious reasons, but I can't see a clear way around it. What is a better way to do this?
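
One possible alternative is to build a single long table instead of one object per column, and write that table once. The sketch below assumes column names are passed as strings; it uses mtcars as stand-in data (with `am` playing the role of `group`), and `freq_table` is a made-up helper name. It writes a CSV, which Excel can open; writing a native .xlsx file would need an additional package and is not shown.

```r
library(dplyr)

# Frequencies of one variable within each level of a grouping column.
# Values are converted to character so tables for different columns can be stacked.
freq_table <- function(data, group_var, var) {
  data %>%
    count(group = .data[[group_var]],
          value = as.character(.data[[var]])) %>%
    mutate(variable = var) %>%
    select(variable, group, value, n)
}

vars <- c("cyl", "gear", "carb")   # stand-in for the 100+ column names
codebook <- bind_rows(lapply(vars, function(v) freq_table(mtcars, "am", v)))

# A single file instead of one CSV per variable
out_file <- file.path(tempdir(), "codebook.csv")
write.csv(codebook, out_file, row.names = FALSE)
```

The `variable` column records which original column each block of rows came from, so the combined table can be filtered or split again later.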

returning a list from a user function using group_by in R

I have a data.frame, I would like to group the data by one of the columns and then apply a function, which operates on the remaining columns of the data. The function returns a list of mixed objects.
If I was just returning one value from the group I know that I could use something like:
df %>% group_by(Column_1) %>% summarise(my_function)
I also know that I could perform operations on a list using lapply, which will happily return a list. I'm just not sure how to combine these two pieces of knowledge to achieve my desired result.
Example code added; userFunction and the data are representative, but should give a good enough idea of what I want.
userFunction <- function(carData){
  return(list(
    a = carData$am * carData$carb,
    b = plot(carData$disp ~ carData$carb),
    c = mean(carData$drat)
  ))
}
mtcars %>%
  group_by(cyl) %>%
  summarise(userFunction)
I'd like to get back a list whose length is the number of levels in the column I group by. Each element of the list should contain a, b and c.
This seems to work as I want.
this <- by(mtcars, mtcars$am, userFunction)
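
A plain named list can also be built with split() and lapply(), which may be easier to index than the object returned by by(). The sketch below groups by `cyl`, as in the question, and leaves out the plot element because plot() returns NULL rather than a reusable object; this trimmed-down userFunction is an assumption made for illustration.

```r
# Trimmed-down version of userFunction (plot element omitted)
userFunction <- function(carData) {
  list(
    a = carData$am * carData$carb,
    c = mean(carData$drat)
  )
}

# split() yields one data frame per level of cyl;
# lapply() applies the function to each and keeps the level names
per_group <- lapply(split(mtcars, mtcars$cyl), userFunction)

names(per_group)     # one element per level of cyl
per_group[["4"]]$c   # mean drat for the 4-cylinder cars
```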

pass grouped dataframe to own function in dplyr

I am trying to switch from plyr to dplyr. However, I still can't figure out how to call my own functions in a chained dplyr call.
I have a data frame with a factorised ID variable and an order variable. I want to split the frame by the ID, order it by the order variable and add a sequence in a new column.
My plyr function looks like this:
f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
data <- ddply(data, .(ID_variable), f)
In dplyr I thought this should look something like this:
f <- function(x) cbind(x[order(x$order_variable), ], Experience = 0:(nrow(x)-1))
data <- data %>% group_by(ID_variable) %>% f
Can anyone tell me how to modify my dplyr call to successfully pass my own function and get the same functionality my plyr function provides?
EDIT: If I use the dplyr formula as described here, it DOES pass an object to f. However, while plyr seems to pass a number of different tables (split by the ID variable), dplyr does not pass one table per group but the ENTIRE table (as some kind of dplyr object where groups are annotated), thus when I cbind the Experience variable it appends a counter from 0 to the length of the entire table instead of the single groups.
I have found a way to get the same functionality in dplyr using this approach:
data <- data %>%
  group_by(ID_variable) %>%
  arrange(ID_variable,order_variable) %>%
  mutate(Experience = 0:(n()-1))
However, I would still be keen to learn how to pass grouped variables split into different tables to own functions in dplyr.
For those who get here from Google: let's say you wrote your own print function.
printFunction <- function(dat) print(dat)
df <- data.frame(a = 1:6, b = 1:2)
As was asked here,
df %>%
  group_by(b) %>%
  printFunction(.)
prints the entire data frame. To get dplyr to print one table per group, use do:
df %>%
  group_by(b) %>%
  do(printFunction(.))
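
In recent dplyr versions do() is marked as superseded, and group_modify() is one alternative that hands each group's rows to a user function as a separate table. The sketch below applies it to the Experience counter from the question; the sample data and the helper name `add_experience` are made up for illustration.

```r
library(dplyr)

# Made-up sample data using the column names from the question
dat <- data.frame(
  ID_variable    = c("a", "a", "a", "b", "b"),
  order_variable = c(3, 1, 2, 2, 1)
)

# Called once per group: `x` holds that group's rows (without the grouping
# column), `key` is a one-row table identifying the group
add_experience <- function(x, key) {
  x <- x[order(x$order_variable), , drop = FALSE]
  x$Experience <- seq_len(nrow(x)) - 1
  x
}

result <- dat %>%
  group_by(ID_variable) %>%
  group_modify(add_experience) %>%
  ungroup()
```

Because the function receives one group at a time, the counter restarts at 0 for every ID, which matches the behaviour of the plyr version.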
