Is there an explanation for this R function merge() error? - r

I am trying to use the R merge function to combine two data.frames, but keep getting the following error:
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
I am not sure what this error means or how to resolve it.
My code thus far is the following:
movies <- read_csv("movies.csv")
firsts = vector(length = nrow(movies))
for (i in 1:nrow(movies)) {
firsts[i] = movies$director[i] %>% str_split(" ", n = 2) %>% unlist %>% .[1]
}
movies$firsts = firsts
movies <- movies[-c(137, 147, 211, 312, 428, 439, 481, 555, 602, 830, 850, 1045, 1080, 1082, 1085, 1096, 1255, 1258, 1286, 1293, 1318, 1382, 1441, 1456, 1494, 1509, 1703, 1719, 1735, 1944, 1968, 1974, 1977, 2098, 2197, 2409, 2516, 2546, 2722, 2751, 2988, 3191,
3227, 3270, 3283, 3285, 3286, 3292, 3413, 3423, 3470, 3480, 3511, 3676, 3698, 3826, 3915, 3923, 3954, 4165, 4381, 4385, 4390, 4397, 4573, 4711, 4729, 4774, 4813, 4967, 4974, 5018, 5056, 5258, 5331, 5405, 5450, 5469, 5481, 4573, 5708, 5715, 5786, 5886, 5888, 5933, 5934, 6052, 6091, 6201, 6234, 6236, 6511, 6544, 6551, 6562, 6803, 4052, 4121, 4326),]
movies <- movies[-c(4521,5846),]
g <- gender_df(movies, name_col = "firsts", year_col = "year", method = c("ssa"))
merge(movies, g, by = c("firsts", "name"), all = FALSE)

I think you are passing an invalid value to the by argument: every column named in by must exist in both data frames. The documentation says:
By default the data frames are merged on the columns with names they
both have, but separate specifications of the columns can be given by
by.x and by.y. The rows in the two data frames that match on the
specified columns are extracted, and joined together. If there is more
than one match, all possible matches contribute one row each. For the
precise meaning of ‘match’, see match.
In your case, the key column is called firsts in movies and name in g, so try the following:
merge(x = movies,y = g, by.x = "firsts", by.y = "name", all = FALSE)
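As a minimal sketch of the difference, the toy data frames below have the key stored under different names, as in the question. They are made-up stand-ins (movies_toy and g_toy are hypothetical), not the real movies data or actual gender_df() output.

```r
# Hypothetical stand-ins for `movies` and `g`; only the column names matter here.
movies_toy <- data.frame(firsts = c("Ann", "Bob", "Cy"),
                         title  = c("A", "B", "C"),
                         stringsAsFactors = FALSE)
g_toy <- data.frame(name   = c("Ann", "Bob"),
                    gender = c("female", "male"),
                    stringsAsFactors = FALSE)

# by = c("firsts", "name") asks for BOTH columns in BOTH data frames, so it fails.
# tryCatch() captures the error message instead of stopping the script.
res_by <- tryCatch(
  merge(movies_toy, g_toy, by = c("firsts", "name")),
  error = function(e) conditionMessage(e)
)
print(res_by)

# by.x / by.y name the key column separately for each side.
merged <- merge(movies_toy, g_toy, by.x = "firsts", by.y = "name", all = FALSE)
print(merged)
```

With all = FALSE only the rows whose key appears in both data frames are kept, and the key column keeps the name given in by.x.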

Related

R ggplot issue with multi line plot [duplicate]

I would like to plot each column of a dataframe to a separate layer in ggplot2.
Building the plot layer by layer works well:
df<-data.frame(x1=c(1:5),y1=c(2.0,5.4,7.1,4.6,5.0),y2=c(0.4,9.4,2.9,5.4,1.1),y3=c(2.4,6.6,8.1,5.6,6.3))
ggplot(data=df,aes(df[,1]))+geom_line(aes(y=df[,2]))+geom_line(aes(y=df[,3]))
Is there a way to plot all available columns at once by using a single function?
I tried to do it this way but it does not work:
plotAllLayers<-function(df){
  p<-ggplot(data=df,aes(df[,1]))
  for(i in seq(2:ncol(df))){
    p<-p+geom_line(aes(y=df[,i]))
  }
  return(p)
}
plotAllLayers(df)
One approach is to reshape your data frame from wide format to long format using the function melt() from the reshape2 package. The new data frame will contain the x1 values, a column variable that records which original column each value came from, and a column value that holds all the original y values.
Now you can plot all the data with a single ggplot() and geom_line() call, and use variable to give each line, for example, a separate color.
library(reshape2)
df.long<-melt(df,id.vars="x1")
head(df.long)
  x1 variable value
1  1       y1   2.0
2  2       y1   5.4
3  3       y1   7.1
4  4       y1   4.6
5  5       y1   5.0
6  1       y2   0.4
ggplot(df.long,aes(x1,value,color=variable))+geom_line()
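The reshape2 package has since been retired by its author in favour of tidyr, so a rough equivalent with tidyr::pivot_longer() might look like the sketch below. The names df_long, variable, and value are chosen here only to mirror the melt() output above.

```r
library(tidyr)
library(ggplot2)

df <- data.frame(x1 = 1:5,
                 y1 = c(2.0, 5.4, 7.1, 4.6, 5.0),
                 y2 = c(0.4, 9.4, 2.9, 5.4, 1.1),
                 y3 = c(2.4, 6.6, 8.1, 5.6, 6.3))

# Stack every column except x1 into name/value pairs.
df_long <- pivot_longer(df, cols = -x1,
                        names_to = "variable", values_to = "value")

# One geom_line() call draws one line per level of `variable`.
p <- ggplot(df_long, aes(x1, value, colour = variable)) + geom_line()
p
```

The rows come out in a different order than with melt(), but the plot should be the same because ggplot2 groups the lines by variable.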
If you really want to use a for() loop (not the best way), then you should iterate over names(df)[-1] instead of seq(). This gives a vector of the column names (except the first column). Then, inside geom_line(), use aes_string(y=i) to select each column by its name.
plotAllLayers <- function(df) {
  p <- ggplot(data = df, aes(df[,1]))
  for (i in names(df)[-1]) {
    p <- p + geom_line(aes_string(y = i))
  }
  return(p)
}
plotAllLayers(df)
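To see one reason the loop in the question does not do what was intended, it may help to print what seq(2:ncol(df)) actually returns. The sketch below uses only base R; the names bad_idx, good_idx, and cols are illustrative.

```r
df <- data.frame(x1 = 1:5,
                 y1 = c(2.0, 5.4, 7.1, 4.6, 5.0),
                 y2 = c(0.4, 9.4, 2.9, 5.4, 1.1),
                 y3 = c(2.4, 6.6, 8.1, 5.6, 6.3))

# seq() applied to a vector of length > 1 behaves like seq_along():
# it counts the elements of 2:4 instead of returning them.
bad_idx <- seq(2:ncol(df))
print(bad_idx)    # 1 2 3

# The intended column positions.
good_idx <- 2:ncol(df)
print(good_idx)   # 2 3 4

# The column names used by the loop in the answer.
cols <- names(df)[-1]
print(cols)       # "y1" "y2" "y3"
```

With the indices 1, 2, 3 the loop starts at the x column and never reaches the last y column.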
I tried the melt method on a large messy dataset and wished for a faster, cleaner method. This for loop uses eval() to build the desired plot.
fields <- names(df_normal) # index, var1, var2, var3, ...
p <- ggplot( aes(x=index), data = df_normal)
for (i in 2:length(fields)) {
loop_input = paste("geom_smooth(aes(y=",fields[i],",color='",fields[i],"'))", sep="")
p <- p + eval(parse(text=loop_input))
}
p <- p + guides(color = guide_legend(title = ""))
p
This ran a lot faster than a large melted dataset when I tested.
I also tried the for loop with aes_string(y=fields[i], color=fields[i]) method, but couldn't get the colors to be differentiated.
For the OP's situation, I think pivot_longer is best. But today I had a situation that did not seem amenable to pivoting, so I used the following code to create layers programmatically. I did not need to use eval().
data_tibble <- tibble(my_var = c(650, 1040, 1060, 1150, 1180, 1220, 1280, 1430, 1440, 1440, 1470, 1470, 1480, 1490, 1520, 1550, 1560, 1560, 1600, 1600, 1610, 1630, 1660, 1740, 1780, 1800, 1810, 1820, 1830, 1870, 1910, 1910, 1930, 1940, 1940, 1940, 1980, 1990, 2000, 2060, 2080, 2080, 2090, 2100, 2120, 2140, 2160, 2240, 2260, 2320, 2430, 2440, 2540, 2550, 2560, 2570, 2610, 2660, 2680, 2700, 2700, 2720, 2730, 2790, 2820, 2880, 2910, 2970, 2970, 3030, 3050, 3060, 3080, 3120, 3160, 3200, 3280, 3290, 3310, 3320, 3340, 3350, 3400, 3430, 3540, 3550, 3580, 3580, 3620, 3640, 3650, 3710, 3820, 3820, 3870, 3980, 4060, 4070, 4160, 4170, 4170, 4220, 4300, 4320, 4350, 4390, 4430, 4450, 4500, 4650, 4650, 5080, 5160, 5160, 5460, 5490, 5670, 5680, 5760, 5960, 5980, 6060, 6120, 6190, 6480, 6760, 7750, 8390, 9560))
# This is a normal histogram
plot <- data_tibble %>%
ggplot() +
geom_histogram(aes(x=my_var, y = ..density..))
# We prepare layers to add
stat_layers <- tibble(distribution = c("lognormal", "gamma", "normal"),
fun = c(dlnorm, dgamma, dnorm),
colour = c("red", "green", "yellow")) %>%
mutate(args = map(distribution, MASS::fitdistr, x=data_tibble$my_var)) %>%
mutate(args = map(args, ~as.list(.$estimate))) %>%
select(-distribution) %>%
pmap(stat_function)
# Final Plot
plot + stat_layers
The idea is that you organize a tibble with the arguments that you want to plug into a geom/stat function. Each row should correspond to a + layer that you want to add to the ggplot. Then use pmap. This creates a list of layers that you can simply add to your plot.
Reshaping your data so you don't need the loop is the best option. Otherwise, with newer versions of ggplot2, you can use the .data pronoun inside aes(). You can do:
plotAllLayers <- function(df) {
  p <- ggplot(data = df, aes(df[,1]))
  for (i in names(df)[2:ncol(df)]) {
    p <- p + geom_line(aes(y = .data[[i]]))
  }
  return(p)
}
plotAllLayers(df)
We use the .data pronoun to get at the data passed to the ggplot object, and we iterate over the column names because .data can only be subset by name (.data[["y1"]] or .data$y1), not by numeric index.
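A variation on the same idea, sketched under the assumption that a list of layers is acceptable: build the layers with lapply() and add the whole list to the plot, mapping colour to the column name so the lines can be told apart. The function name plot_all_layers and the variable col_name are illustrative, not taken from the answers above.

```r
library(ggplot2)

df <- data.frame(x1 = 1:5,
                 y1 = c(2.0, 5.4, 7.1, 4.6, 5.0),
                 y2 = c(0.4, 9.4, 2.9, 5.4, 1.1),
                 y3 = c(2.4, 6.6, 8.1, 5.6, 6.3))

plot_all_layers <- function(df) {
  x_name <- names(df)[1]
  # One geom_line() per remaining column. Each call of the anonymous
  # function has its own `col_name`, so every layer keeps its own column.
  layers <- lapply(names(df)[-1], function(col_name) {
    geom_line(aes(y = .data[[col_name]], colour = col_name))
  })
  # A list of layers can be added to a plot with a single `+`.
  ggplot(df, aes(x = .data[[x_name]])) +
    layers +
    labs(y = "value", colour = "variable")
}

p <- plot_all_layers(df)
p
```

Because aes() does not evaluate its arguments until the plot is built, giving each layer its own function call avoids any surprises from a loop variable that has changed by then.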

Replacing values in df using index - why not working?

I am using the function provided in here: Replacing values in df using index and here: How to repeat the Grubbs test and flag the outliers
# Function to detect outliers with Grubbs test in a vector
grubbs.flag <- function(vector) {
  outliers <- NULL
  test <- vector
  grubbs.result <- grubbs.test(test)
  pv <- grubbs.result$p.value
  # throw an error if there are too few values for the Grubb's test
  if (length(test) < 3 ) stop("Grubb's test requires > 2 input values")
  na.vect <- test
  while(pv < 0.05) {
    outliers <- c(outliers,as.numeric(strsplit(grubbs.result$alternative," ")[[1]][3]))
    test <- vector[!vector %in% outliers]
    # stop if all but two values are flagged as outliers
    if (length(test) < 3 ) {
      warning("All but two values flagged as outliers")
      break
    }
    grubbs.result <- grubbs.test(test)
    pv <- grubbs.result$p.value
    idx.outlier <- which(vector %in% outliers)
    na.vect <- replace(vector, idx.outlier, NA)
  }
  return(na.vect)
}
It works perfectly on the example data provided there. But when I try to run it on my data frame, it seems that the loop never ends. Does anyone know why that is?
My data:
test <- structure(list(Abs_18 = c(0.04359, 0.05682, 0.05002, 0.04997,
0.03433, 0.060055, 0.0447, 0.0499, 0.04509, 0.04875, 0.04052,
0.062785, 0.07602, 0.05072, 0.04253, 0.05595, 0.02888, 0.077018,
0.05416, 0.04966, 0.0476, 0.04252, 0.03891, 0.065207, 0.02675,
0.05892, 0.03523, 0.04546, 0.02696, 0.024995, 0.02469, 0.0442,
0.04504, 0.04421, 0.04683, 0.08017, -0.065334, 0.04914, 0.04086,
0.05341, 0.02706, 0.065362, 0.01571, 0.01021, 0.04802, 0.04807,
0.02735, 0.062755), FL_18 = c(3618, 3526, 3543, 5323, 5050, 767,
3641, 3418, 3353, 4179, 4864, 760, 3693, 3408, 3309, 5057, 4686,
748, 3693, 3349, 3240, 3934, 4876, 741, 2394, 3477, 3417, 4254,
4899, 755, 2375, 3486, 3370, 4516, 4838, 772, 817, 3449, 3361,
3945, 4856, 802, 2293, 2529, 3410, 4460, 5175, 813), Abs_25 = c(0.04261,
0.05332, 0.04966, 0.0482, 0.03355, 0.059344, 0.04572, 0.04967,
0.04275, 0.04989, 0.02745, 0.059196, 0.04649, 0.05517, 0.04181,
0.06214, 0.02749, 0.074719, 0.05264, 0.044, 0.04486, 0.03999,
0.0331, 0.058829, 0.03119, 0.05943, 0.03781, 0.04003, 0.02383,
0.069582, 0.02868, 0.04943, 0.04566, 0.0422, 0.03265, 0.067265,
-0.067674, 0.05038, 0.03828, 0.03854, 0.02671, 0.071176, 0.01602,
0.01055, 0.03961, 0.04729, 0.03009, 0.06377), FL_25 = c(2714,
2656, 2625, 3856, 3642, 606, 2759, 2580, 2498, 3276, 3495, 596,
2808, 2590, 2482, 3759, 3365, 586, 2838, 2548, 2433, 2864, 3557,
591, 1878, 2664, 2588, 3081, 3603, 602, 1820, 2672, 2576, 3154,
3589, 617, 572, 2661, 2575, 2918, 3601, 635, 1739, 1924, 2650,
3260, 3866, 655)), .Names = c("Abs_18", "FL_18", "Abs_25", "FL_25"
), row.names = c(NA, -48L), class = "data.frame")
I am using:
apply(test,2,grubbs.flag)

use if() to use select() within a dplyr pipe chain

Read these two posts already:
can dplyr package be used for conditional mutating?
R Conditional evaluation when using the pipe operator %>%
I'm using Shiny input$selector and if the user has selected a particular value, I want my dataframe to be different than otherwise.
Here's a chain:
filtered_funnel <- reactive({
lastmonth_funnel %>%
filter(input$channel == "All" | Channel == input$channel) %>%
filter(input$promo == "All" | Promo == input$promo) %>%
## HERE IS WHERE I'M STRUGGLING
{if(input$promo != "none") select(., c("Channel", "Promo", "ShippingDetails", "Checkout", "Transactions"))} %>%
gather(Funnel, Sessions, -Channel, -Promo) %>%
group_by(Channel, Promo, Funnel) %>%
summarise(Sessions = sum(Sessions))
})
If the user input does not equal "none" I would like to select variables "Channel", "Promo", "ShippingDetails", "Checkout" and "Transactions".
I tried a few variations of the problem line above but kept getting errors:
When I tried this within the pipe chain
{if(input$promo != "none") select(., c("Channel", "Promo", "ShippingDetails", "Checkout", "Transactions"))} %>%
I received this error:
Warning: Error in : All select() inputs must resolve to integer column
positions. The following do not:
* c("Channel", "Promo", "ShippingDetails", "Checkout", "Transactions")
I also tried:
{if(input$promo != "none") select(., c(Channel, Promo, ShippingDetails, Checkout:Transactions))} %>%
This actually runs till I select "none" in the input, in which case I get
Error in : is.character(x) is not TRUE
I got the same error when I tried this:
{ifelse(input$promo != "none", select(., c(Channel, Promo, ShippingDetails, Checkout:Transactions)), .)} %>%
How can I nest in a dplyr pipe chain a select statement that says if input$promo != "none" then select Channel, Promo, ShippingDetails, Checkout:Transactions from the passed object in the pipe?
-- Here's dput of the randomly generated data--
> dput(lastmonth_funnel)
structure(list(Channel = c("Facebook", "Youtube", "SEM", "Organic",
"Direct", "Email", "Facebook", "Youtube", "SEM", "Organic", "Direct",
"Email", "Facebook", "Youtube", "SEM", "Organic", "Direct", "Email",
"Facebook", "Youtube", "SEM", "Organic", "Direct", "Email", "Facebook",
"Youtube", "SEM", "Organic", "Direct", "Email"), Promo = c("none",
"none", "none", "none", "none", "none", "banannas", "banannas",
"banannas", "banannas", "banannas", "banannas", "carrots", "carrots",
"carrots", "carrots", "carrots", "carrots", "pears", "pears",
"pears", "pears", "pears", "pears", "apples", "apples", "apples",
"apples", "apples", "apples"), Sessions = c(6587, 3015, 6316,
11219, 8117, 6473, 12464, 14032, 14318, 17535, 16219, 7838, 10685,
12040, 19907, 13694, 6187, 16784, 21425, 18890, 24891, 16251,
16977, 25206, 28573, 18704, 29178, 22069, 39687, 53734), AddToCart = c(279,
4955, 5636, 8991, 15530, 18374, 9431, 5980, 4852, 5412, 4114,
1782, 370, 3208, 6311, 9760, 7428, 6792, 3500, 5446, 1507, 783,
2032, 833, 397, 2760, 5784, 9810, 13274, 14470), Registrations = c(194,
3210, 3573, 6067, 10305, 12653, 6564, 3874, 3076, 3652, 2730,
1227, 257, 2078, 4001, 6586, 4929, 4677, 2436, 3528, 955, 528,
1348, 573, 276, 1788, 3667, 6620, 8808, 9964), ShippingDetails = c(134,
2235, 2593, 4266, 7408, 9244, 4557, 2698, 2232, 2568, 1962, 896,
178, 1447, 2904, 4631, 3543, 3417, 1691, 2457, 693, 371, 969,
418, 191, 1245, 2661, 4655, 6332, 7280), Checkout = c(90, 1436,
1792, 2864, 4672, 5666, 3078, 1734, 1543, 1724, 1237, 549, 120,
930, 2007, 3109, 2234, 2094, 1142, 1579, 479, 249, 611, 256,
129, 800, 1839, 3125, 3993, 4462), Transactions = c(59, 937,
1192, 1819, 2602, 2926, 2039, 1132, 1026, 1095, 689, 283, 79,
607, 1335, 1975, 1244, 1081, 756, 1031, 318, 158, 340, 132, 85,
522, 1223, 1985, 2224, 2304)), class = "data.frame", row.names = c(NA,
-30L), .Names = c("Channel", "Promo", "Sessions", "AddToCart",
"Registrations", "ShippingDetails", "Checkout", "Transactions"
))
You need to make sure that the statement between { and } returns a data.frame regardless of the condition. So you need an else:
cond <- FALSE
mtcars %>%
  group_by(cyl) %>%
  { if (cond) filter(., am == 1) else . } %>%
  summarise(m = mean(wt))
Works fine with TRUE or FALSE.
(Also note that a simple example like this makes the question much easier to grasp.)
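Applied to the select() case from the question, the same pattern might look like the sketch below. The data frame funnel_toy, the helper maybe_select(), and the argument promo_choice (standing in for input$promo) are all hypothetical. all_of() is used so that a character vector of column names is accepted by select().

```r
library(dplyr)

# Toy stand-in for lastmonth_funnel.
funnel_toy <- data.frame(Channel      = c("Email", "SEM"),
                         Promo        = c("pears", "pears"),
                         Sessions     = c(10, 20),
                         Checkout     = c(3, 4),
                         Transactions = c(1, 2),
                         stringsAsFactors = FALSE)
keep_cols <- c("Channel", "Promo", "Checkout", "Transactions")

# `promo_choice` plays the role of input$promo in the Shiny app.
maybe_select <- function(data, promo_choice) {
  data %>%
    # The braces must return a data frame on both branches,
    # hence the explicit `else .`.
    { if (promo_choice != "none") select(., all_of(keep_cols)) else . }
}

narrow <- maybe_select(funnel_toy, "pears")  # only keep_cols
wide   <- maybe_select(funnel_toy, "none")   # all columns, unchanged
print(names(narrow))
print(names(wide))
```

Without the else branch, the if expression returns NULL when the condition is FALSE, and the next step in the pipe then fails.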
