Dynamic row index object not found in user-defined function R - r

I am trying to pass an integer ("i") to a function in which "i" is used as a row index for a data frame. However, doing this...
user_definedFUN <- function (i){
...
result <- df[i, "col_name"]
...
}
x <- user_definedFUN(1)
...yields the following error:
Error in `[.data.frame`(df, i, "col_name") :
object 'i' not found
I'm certain this is a simple issue of how I am referencing "i" within the brackets (even if not simple enough for me to find a solution); however, I have provided additional details below if necessary.
The data.frame:
gen_name <- c("Boomers","Gen X","Millenials","Gen Z")
gen_years <- c("1946 to 1964","1965 to 1980","1981 to 1996", "1997 to 2011")
gen_xmin <- c(11, 9, 5, 2)
gen_xmax <- c(15, 11, 8, 5)
GEN_G.labels <- data.frame(gen_name, gen_years, gen_xmin, gen_xmax)
The data.frame contains information for four generations that will be used to plot rectangles as layers on a ggplot bar chart of populations by age.
The rectangles will be created by the following function, which will be called from a loop and is given the row index for a specific generation (1 = "Boomers", 2 = "Gen X", etc.).
genlabelsFUN <- function(i){
# return a geom_rect()
rv <- geom_rect(aes(
xmin = GEN_G.labels[i, "gen_xmin"],
xmax = GEN_G.labels[i, "gen_xmax"],
ymin = 1000,
ymax = 1100)
, fill = "red")
return(rv)
}
ggplot(...snip...) +
...snip... +
genlabelsFUN(1)
The function works if a static index value is used. For example, 'GEN_G.labels[1, "gen_xmin"]' instead of 'GEN_G.labels[i, "gen_xmin"]' places a red rectangle between 11 and 15 on the x-axis at 1,000 on the y-axis with a height of 100. However, the function is pointless without the dynamic aspect of "i".
The following image shows the output when using a static index value (Note: I'm using a different y-axis scale in my example above for simplicity). The final code will loop through each row of GEN_G.labels and run genlabelsFUN() to create a similar rectangle for each generation.
Thanks
EDIT:
Full ggplot
scaleFUN <- function(x) formatC( x / 1000, format = "f", big.mark = ",", digits = 0) #format as thousands with comma
ggplot(data = GEN_G.data_frame, aes(x = range, y = persons)) +
geom_bar(stat = "identity") +
theme_classic() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_continuous(
name = "Persons (thousands)",
labels = scaleFUN) +
genlabelsFUN(1)
EDIT 2:
Reproducible example (functioning based on MrFlick comment below)
GEN_G.dataframe <- data.frame(
range = c(1:21),
persons = abs(rnorm(21))*50)
GEN_G.labelsx <- data.frame(
gen_name = c("Group A","Group B","Group C","Group D"),
gen_xmin = c(11, 9, 5, 2),
gen_xmax = c(15, 11, 9, 5))
GEN_G.labelsx$gen_name <- factor(
GEN_G.labelsx$gen_name,
levels = GEN_G.labelsx$gen_name)
ggplot() +
geom_bar(data=GEN_G.dataframe,aes(x=range, y=persons),stat="identity") +
theme_classic() +
theme(
axis.text.x = element_text(angle = 90, hjust = 1)) +
geom_rect(aes(
xmin = gen_xmin,
xmax = gen_xmax,
ymin = 175,
ymax = 180,
fill = gen_name),
data = GEN_G.labelsx)
Output from Edit 2 example.

You can't use variables like i inside an aes(). Symbols inside the aes() are not evaluated until the plot is actually drawn. There's no way for R to properly capture the environment where i is defined that way, so i may be out of scope, or its value may have changed, by the time the plot is drawn.
However, I don't really think a loop/function is even necessary. You should just be able to do
geom_rect(aes(xmin=gen_xmin, xmax=gen_xmax), ymin=1000, ymax=1100, data=GEN_G.labels)
to use a different data.frame for that layer. Then all the boxes are drawn at once without a loop.
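As a minimal sketch of both routes: the first option below is the single-layer approach, and the second keeps a per-row function but builds it on annotate(), which evaluates its arguments immediately instead of deferring them. The `bars` data frame is made-up stand-in data, and `inherit.aes = FALSE` is my own addition (an assumption, so that the rectangle layer does not try to find `range` and `persons` in GEN_G.labels).

```r
library(ggplot2)

# Label data from the question (only the columns the rectangles need)
GEN_G.labels <- data.frame(
  gen_name = c("Boomers", "Gen X", "Millenials", "Gen Z"),
  gen_xmin = c(11, 9, 5, 2),
  gen_xmax = c(15, 11, 8, 5)
)

# Stand-in bar data (hypothetical values, for illustration only)
bars <- data.frame(range = 1:21, persons = seq(100, 2100, by = 100))

base_plot <- ggplot(bars, aes(x = range, y = persons)) + geom_col()

# Option 1: one layer draws all four rectangles from the label data frame
p_layer <- base_plot +
  geom_rect(aes(xmin = gen_xmin, xmax = gen_xmax),
            ymin = 1000, ymax = 1100, fill = "red",
            data = GEN_G.labels, inherit.aes = FALSE)

# Option 2: a per-row function built on annotate(), whose arguments are
# evaluated right away rather than when the plot is drawn
genlabelsFUN <- function(i) {
  annotate("rect",
           xmin = GEN_G.labels[i, "gen_xmin"],
           xmax = GEN_G.labels[i, "gen_xmax"],
           ymin = 1000, ymax = 1100, fill = "red")
}

p_loop <- base_plot
for (i in seq_len(nrow(GEN_G.labels))) {
  p_loop <- p_loop + genlabelsFUN(i)
}
```

Option 1 is usually the simpler of the two, because it also lets you map fill to gen_name and get a legend for free.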

Related

How to add tick marks on a plot that is not from plot() in R

I use a R package, SetMethods, to get the fsQCA results of panel data. In the package, it uses cluster.plot() function to generate a plot.
However, I am having a hard time getting the x-axis of the graph to show the number of units as tick marks. For example, I want it to show 10, 20, 30, ..., 140 on the x-axis so I can see how many units have a consistency score lower than a certain point.
Is there any method to add tick marks on a plot that is not generated by plot() function? Thanks in advance.
Here I use the dataset in the package as an example.
install.packages("SetMethods")
library(SetMethods)
data("PAYF")
PS <- minimize(data = PAYF,
outcome = "HL",
conditions = c("HE","GG","AH","HI","HW"),
incl.cut = 0.9,
n.cut = 2,
include = "?",
details = TRUE,
show.cases = TRUE)
PS
# Perform cluster diagnostics:
CB <- cluster(data = PAYF,
results = PS,
outcome = "HL",
unit_id = "COUNTRY",
cluster_id = "REGION",
necessity=FALSE,
wicons = FALSE)
CB
# Plot pooled, between, and within consistencies:
cluster.plot(cluster.res = CB,
labs = TRUE,
size = 8,
angle = 6,
wicons = TRUE)
Finally, I get a graph as follows.
If you look inside the cluster.plot function definition (in RStudio press F2 while the pointer is on it) you will see that it uses ggplot2 under the hood. However, it doesn't return ggplot2 objects; it just prints them one after another. Because of this it's not really possible to modify the output afterwards in any convenient manner.
But you can always copy the function code and rewrite it for your own need. The part that prints the final plot in your case is
CTw <- list()
ticklabw = unique(as.character(cluster.res$unit_ids))
xtickw <- seq(1, length(ticklabw), by = 1)
if (class(cluster.res) == "clusterminimize") {
for (i in 1:length(cluster.res$output)) {
CTw[[i]] <- cluster.res$output[[i]]$WICONS
dtw <- data.frame(x = xtickw, y = CTw[[i]])
dtw <- dtw[order(dtw$y), ]
dtw$xr <- reorder(dtw$x, 1 - dtw$y)
pw <- ggplot(dtw, aes(y = dtw[, 2], x = dtw[,
3])) + geom_point() + ylim(0, 1) + theme_classic(base_size = 16) +
geom_hline(yintercept = cluster.res$output[[i]]$POCOS) +
labs(title = names(cluster.res$output[i]),
x = "Units", y = "Consistency") + theme(axis.text.x = element_blank())
suppressWarnings(print(pw))
}
}
You can modify the ggplot2 construction part to something like this (packages ggplot2 and dplyr need to be loaded):
pw <-
dtw %>%
mutate(x_ind = as.numeric(xr)) %>%
ggplot(aes(x_ind, y)) +
geom_point() +
ylim(0, 1) +
theme_classic(base_size = 16) +
geom_hline(yintercept = cluster.res$output[[i]]$POCOS) +
scale_x_continuous(breaks = seq(from = 0, to = 140, by = 10)) +
labs(title = names(cluster.res$output[i]),
x = "Units", y = "Consistency")
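To see why as.numeric(xr) gives usable tick positions, here is a small self-contained sketch that does not need SetMethods. The 140 random consistency scores are invented stand-ins for cluster.res$output[[i]]$WICONS, and n_units is a name I made up; the reorder and as.numeric steps are the same as in the code above, written in base R instead of dplyr.

```r
library(ggplot2)

set.seed(42)
n_units <- 140   # hypothetical number of units

# Stand-in for the within-unit consistency scores
dtw <- data.frame(x = seq_len(n_units), y = runif(n_units))

# Factor whose levels are ordered from highest to lowest consistency
dtw$xr <- reorder(dtw$x, 1 - dtw$y)

# The integer code of each level is the unit's rank (1 = highest consistency),
# which gives a continuous x-axis that can carry numeric breaks
dtw$x_ind <- as.numeric(dtw$xr)

pw <- ggplot(dtw, aes(x_ind, y)) +
  geom_point() +
  ylim(0, 1) +
  theme_classic(base_size = 16) +
  scale_x_continuous(breaks = seq(from = 0, to = n_units, by = 10)) +
  labs(x = "Units", y = "Consistency")
```

Because x_ind is a rank, reading the x position where the points drop below a given consistency tells you how many units lie above that value.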

Dropping data outside valid range when using geom_ma in scatterplot

I have four categories that I am plotting here using ggplot. I would like to add a moving average using geom_ma, but I have too few of the green dots to get a good moving average (I would prefer a period of at least 20). How can I keep the scatterplot as is and only add an MA for the purple and blue dots, which have enough points for a 20-period moving average?
Example:
ggplot(data, aes(x, y, color=Str)) + geom_point(stat="identity") + geom_ma(ma_fun = SMA, n = 20, linetype=1, size=1, na.rm=TRUE)
I get the error: "Warning message:
Computation failed in stat_sma():
n = 20 is outside valid range: [1, 10]"
This is a great example of why it helps to provide a minimal reproducible example. You have provided the code that produced the error, but there is nothing wrong with the code on its own: it will only cause this error with certain inputs. Given suitable data, your code is fine.
Let's make a dummy data frame with the same name and column names as your data frame. We will make data for the first 330 days of 2020, and we will have 4 groups in Str, so a total of 1320 rows:
library(tidyquant)
library(ggplot2)
set.seed(1)
data <- data.frame(x = rep(seq(as.Date("2020-01-01"),
by = "day", length.out = 330), 4),
y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
Str = rep(c("A", "B", "C", "D"), each = 330))
Now if we use your exact plotting code, we can see that the plot is fine:
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
But if one or more of our Str groups has fewer than 20 measurements, then we get your error. Let's remove most of the Str == "A" and Str == "B" cases, and repeat the plot:
data <- data[c(1:20 * 33, 661:1320),]
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE)
#> Warning: Computation failed in `stat_sma()`:
#> n = 20 is outside valid range: [1, 10]
We get your exact warning, and the MA lines disappear from all the groups. Clearly we cannot get a 20-measurement moving average if we only have 10 data points, so geom_ma just gives up.
The fix here is to use the data = argument in geom_ma to filter out any groups with fewer than 20 data points:
ggplot(data, aes(x, y, color = Str)) +
geom_point(stat="identity") +
geom_ma(ma_fun = SMA, n = 20, linetype = 1, size = 1, na.rm = TRUE,
data = data[data$Str %in% names(table(data$Str)[table(data$Str) >= 20]),])
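The filtering step can also be pulled out of the plotting call, which makes it easier to check which groups survive. This is only a sketch in base R; the names n_ma, group_sizes, keep and ma_data are mine, and the data frame is rebuilt here in the same way as above so the snippet runs on its own.

```r
set.seed(1)

# Reduced data as in the example above: 10 rows each for A and B,
# 330 rows each for C and D
data <- data.frame(
  x = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 330), 4),
  y = as.vector(replicate(4, 1000 * cumsum(rnorm(330)))),
  Str = rep(c("A", "B", "C", "D"), each = 330)
)
data <- data[c(1:20 * 33, 661:1320), ]

n_ma <- 20                                        # moving-average window
group_sizes <- table(data$Str)                    # observations per group
keep <- names(group_sizes)[group_sizes >= n_ma]   # groups with enough points
ma_data <- data[data$Str %in% keep, ]             # rows passed to the MA layer
```

You would then pass data = ma_data and n = n_ma to geom_ma, while geom_point keeps using the full data frame.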

Keep ggplot secondary axis scale fixed

I'm making a ggplot with a secondary axis using the sec_axis() function but am having trouble retaining the correct scale.
Below is a reproducible example
# load package
library(ggplot2)
# produce dummy data
data = data.frame(week = 1:5,
count = c(45, 67, 21, 34, 50),
rate = c(3, 6, 2, 5, 3))
# calculate scale (and save as an object called 'scale')
scale = max(data$count)/10
# produce ggplot
p = ggplot(data, aes(x = week)) +
geom_bar(aes(y = count), stat = "identity") +
geom_point(aes(y = rate*scale)) +
scale_y_continuous(sec.axis = sec_axis(~./scale, name = "% positive",
breaks = seq(0, 10, 2)))
# look at ggplot - all looks good
p
# change the value of the scale object
scale = 2
# look at ggplot - you can see the scale has now changed
p
In reality I am producing a series of ggplots within a loop, and within each iteration of the loop the 'scale' object changes.
Question
How do I ensure the scale of my secondary y-axis remains fixed? (even if the value of the 'scale' object changes)
EDIT
I wanted to keep the example as simple as possible (see example above) but on request I'll add an example which includes a loop
# load package
library(ggplot2)
# produce dummy data
data = data.frame(group = c(rep("A", 5), rep("B", 5)),
week = rep(1:5, 2),
count = c(45, 67, 21, 34, 50,
120, 200, 167, 148, 111),
rate = c(3, 6, 2, 5, 3,
15, 17, 20, 11, 12))
# define the groups I want to loop over
group = c("A", "B")
# initialize an empty list (to store figures)
fig_list = list()
for(i in seq_along(group)){
# subset the data
data.sub = data[data$group == group[i], ]
# calculate scale (and save as an object called 'scale')
scale = max(data.sub$count)/20
# produce the plot
p = ggplot(data.sub, aes(x = week)) +
geom_bar(aes(y = count), stat = "identity") +
geom_point(aes(y = rate*scale), size = 4, colour = "dark red") +
scale_y_continuous(sec.axis = sec_axis(~./scale, name = "% positive",
breaks = seq(0, 20, 5))) +
ggtitle(paste("Plot", group[i]))
# print the plot
print(p)
# store the plot in a list
fig_list[[group[i]]] = p
}
I get the following figures when printing within the loop (everything looks good)
However... if I call the figure for group A from the list I created you can see the secondary y-axis scale is now incorrect (it has used the scale created for group B)
fig_list[["A"]]
Thanks for your edit, this makes things clearer. Your problem stems from the way R evaluates objects. The plot stored in fig_list is not an image but a specification of how the plot should be generated. It is only rendered when it is printed (for example by typing fig_list["A"] and hitting enter). Because the value of scale changes on every iteration of the loop, a plot that is rendered later uses whatever value scale holds at that moment, which is the value from the last iteration.
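The same lookup rule can be shown with plain functions, without ggplot2. This is only an illustration; to_rate_global, make_to_rate and to_rate_fixed are hypothetical names. A function, like the formula ~ . / scale, stores the name scale rather than its value, and resolves it each time it runs.

```r
# A function does not store the value of `scale`; it looks the name up
# each time it is evaluated.
scale <- 10
to_rate_global <- function(y) y / scale

# Creating the function inside another function gives it its own `scale`
make_to_rate <- function(scale) {
  force(scale)            # fix the value now rather than on first use
  function(y) y / scale
}
to_rate_fixed <- make_to_rate(10)

scale <- 2                # e.g. the next loop iteration overwrites it

res_global <- to_rate_global(20)   # divides by the current global value, 2
res_fixed  <- to_rate_fixed(20)    # still divides by 10
```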
An easy solution is to wrap your code for plotting in a function and use lapply:
make_plot <- function(df) {
scale = max(df$count)/20
ggplot(df, aes(x = week)) +
geom_bar(aes(y = count), stat = "identity") +
geom_point(aes(y = rate*scale), size = 4, colour = "dark red") +
scale_y_continuous(sec.axis = sec_axis(~./scale, name = "% positive",
breaks = seq(0, 20, 5))) +
ggtitle(paste("Plot", unique(df$group)))
}
grouped_data <- split(data, data$group)
fig_list <- lapply(grouped_data, make_plot)
Now when you call the first plot, it is evaluated correctly.
fig_list["A"]
#> $A
This still works when you happen to have an object scale with a bogus value in your environment, since R looks up scale within the function call, and not in the global environment.
Created on 2018-09-02 by the reprex package (v0.2.0).

R/ggplot2: Collapse or remove segment of y-axis from scatter-plot

I'm trying to make a scatter plot in R with ggplot2, where the middle of the y-axis is collapsed or removed, because there is no data there. I did it in photoshop below, but is there a way to create a similar plot with ggplot?
This is the data with a continuous scale:
But I'm trying to make something like this:
Here is the code:
ggplot(data=distance_data) +
geom_point(
aes(
x = mdistance,
y = maxZ,
shape = factor(subj),
color = factor(side),
size = (cSA)
)
) +
scale_size_continuous(range = c(4, 10)) +
theme(
axis.text.x = element_text(colour = "black", size = 15),
axis.text.y = element_text(colour = "black", size = 15),
axis.title.x = element_text(colour = "black", size= 20, vjust = 0),
axis.title.y = element_text(colour = "black", size= 20),
legend.position = "none"
) +
ylab("Z-score") +
xlab("Distance")
You could do this by defining a coordinate transformation. A standard example is logarithmic coordinates, which can be achieved in ggplot by using scale_y_log10().
But you can also define custom transformation functions by supplying the trans argument to scale_y_continuous() (and similarly for scale_x_continuous()). To this end, you use the function trans_new() from the scales package. It takes as arguments the transformation function and its inverse.
I discuss first a special solution for the OP's example and then also show how this can be generalised.
OP's example
The OP wants to shrink the interval between -2 and 2. The following defines a function (and its inverse) that shrinks this interval by a factor of 4:
library(scales)
trans <- function(x) {
ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x/4))
}
inv <- function(x) {
ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x*4))
}
my_trans <- trans_new("my_trans", trans, inv)
This defines the transformation. To see it in action, I define some sample data:
x_val <- 0:250
y_val <- c(-6:-2, 2:6)
set.seed(1234)
data <- data.frame(x = sample(x_val, 30, replace = TRUE),
y = sample(y_val, 30, replace = TRUE))
I first plot it without transformation:
p <- ggplot(data, aes(x, y)) + geom_point()
p + scale_y_continuous(breaks = seq(-6, 6, by = 2))
Now I use scale_y_continuous() with the transformation:
p + scale_y_continuous(trans = my_trans,
breaks = seq(-6, 6, by = 2))
If you want another transformation, you have to change the definition of trans() and inv() and run trans_new() again. You have to make sure that inv() is indeed the inverse of trans(). I checked this as follows:
x <- runif(100, -100, 100)
identical(x, trans(inv(x)))
## [1] TRUE
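A slightly more forgiving variant of that check, as a sketch: identical() demands exact equality, which floating-point arithmetic does not always deliver for arbitrary inputs, so all.equal() with its default tolerance may be the safer test. The grid below deliberately includes the break points -2 and 2, and the last line checks that the transformation is strictly increasing, which an axis transformation needs to be. The names round_trip_ok and increasing_ok are mine.

```r
# Same definitions as above, repeated so the snippet runs on its own
trans <- function(x) ifelse(x > 2, x - 1.5, ifelse(x < -2, x + 1.5, x / 4))
inv   <- function(x) ifelse(x > 0.5, x + 1.5, ifelse(x < -0.5, x - 1.5, x * 4))

# Grid that includes the break points -2 and 2
x <- c(seq(-6, 6, by = 0.25), -2, 2)

# Round trip in both directions, compared with a numerical tolerance
round_trip_ok <- isTRUE(all.equal(x, inv(trans(x)))) &&
  isTRUE(all.equal(x, trans(inv(x))))

# The transformed values must keep the original ordering
increasing_ok <- all(diff(trans(sort(unique(x)))) > 0)
```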
General solution
The function below defines a transformation where you can choose the lower and upper end of the region to be squished, as well as the factor to be used. It directly returns the trans object that can be used inside scale_y_continuous:
library(scales)
squish_trans <- function(from, to, factor) {
trans <- function(x) {
if (any(is.na(x))) return(x)
# get indices for the relevant regions
isq <- x > from & x < to
ito <- x >= to
# apply transformation
x[isq] <- from + (x[isq] - from)/factor
x[ito] <- from + (to - from)/factor + (x[ito] - to)
return(x)
}
inv <- function(x) {
if (any(is.na(x))) return(x)
# get indices for the relevant regions
isq <- x > from & x < from + (to - from)/factor
ito <- x >= from + (to - from)/factor
# apply transformation
x[isq] <- from + (x[isq] - from) * factor
x[ito] <- to + (x[ito] - (from + (to - from)/factor))
return(x)
}
# return the transformation
return(trans_new("squished", trans, inv))
}
The first line in trans() and inv() handles the case when the transformation is called with x = c(NA, NA). (It seems that this did not happen with the version of ggplot2 I used when I originally wrote this answer. Unfortunately, I don't know with which version this started.)
This function can now be used to conveniently redo the plot from the first section:
p + scale_y_continuous(trans = squish_trans(-2, 2, 4),
breaks = seq(-6, 6, by = 2))
The following example shows that you can squish the scale at an arbitrary position and that this also works for other geoms than points:
df <- data.frame(class = LETTERS[1:4],
val = c(1, 2, 101, 102))
ggplot(df, aes(x = class, y = val)) + geom_bar(stat = "identity") +
scale_y_continuous(trans = squish_trans(3, 100, 50),
breaks = c(0, 1, 2, 3, 50, 100, 101, 102))
Let me close by stressing what others have already mentioned in the comments: this kind of plot could be misleading and should be used with care!

How do you label a horizontal line when the x axis is categorical?

There is a worked example that shows how to label a straight line in R using ggplot2. Please look at example 5 - "Recreate the following plot of flight volume by longitude".
How would you code this if the x axis were categorical instead of continuous? How would one write the part of the syntax in geom_text that is currently
data = data.frame(x = - 119, y = 0)
I created a line
+ geom_text(aes(x,y, label = "seronegative"),
data = data.frame(x = 1, y = 20),
size = 4, hjust = 0, vjust = 0, angle = 0)
and I tried several options
data = data.frame(x = 1, y = 20)
data = data.frame(x = factor(1), y = 20)
#where gard is the name of one of the categories
data = data.frame(x = "gard", y = 20)
...but I get the error
invalid argument to unary operator
It's not entirely clear to me what you're trying to do, since you say you try to create a line, and then your code uses geom_text. Assuming that you'd like to place a vertical line, with a text label oriented vertically on that line, using a categorical x variable, here's a simple example:
dat <- data.frame(x = letters[1:5],y = 1:5)
txt <- data.frame(x = 1.5, y = 1, lab = "label")
ggplot(dat,aes(x = x, y = y)) +
geom_point() +
geom_vline(xintercept = 1.5) +
geom_text(data = txt,aes(label = lab),angle = 90, hjust = 0, vjust = 0)
which on my machine produces this output:
Note that I put the text labels in a separate data frame, outside the ggplot call. That is not strictly necessary, but I prefer it as I find that it avoids confusion.
Using an x value of 1.5 for the text label works here, as would setting it to "a" if you wanted it directly on the plotted x values.
The error you're describing suggests to me a simple syntax error somewhere in your code (which you haven't completely provided). Perhaps this example will help you to spot it.
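In case a horizontal line is what you are after (as the title suggests), the same idea carries over. The sketch below is only a guess at the intended plot: the y values and the cutoff of 20 are invented, and the label text is taken from your geom_text call. On a discrete x scale, the label can be anchored at a category name such as "a".

```r
library(ggplot2)

# Made-up data with a categorical x variable
dat <- data.frame(x = letters[1:5], y = c(5, 12, 25, 18, 30))

p <- ggplot(dat, aes(x = x, y = y)) +
  geom_point() +
  geom_hline(yintercept = 20) +
  # Anchor the label at category "a", just above the line
  annotate("text", x = "a", y = 20, label = "seronegative",
           hjust = 0, vjust = -0.5, size = 4)
```

A numeric position such as x = 1 (the first category) or x = 1.5 (halfway between the first two) should work as well.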

Resources