Setting limits to a discrete data based x-axis - r

I am struggling to make a graph using ggplot2.
Below is the code I use, along with the output I get:
library(ggplot2)
## Defining Dataframe
Dati <- data.frame(Correction = c("0%", "+5%", "+10%", "+15%"),
Vix = c(65700, 48000, 45500, 37800))
## Create factors
Dati$Correction <- as.factor(Dati$Correction)
Dati$Correction <- factor(Dati$Correction, levels = c("0%", "+5%", "+10%", "+15%"))
## Defining graph
Graph <- ggplot(data=Dati, aes(x=Correction, y=Vix)) + geom_point(color = "#e60000", shape = 1, size = 3.5) +
geom_smooth(aes(as.numeric(Correction), Vix), level=0.75, span = 1, color = "#e60000", method=lm) +
xlab("xlab") + ylab("y lab") + labs(color='') +
guides(color = FALSE, size = FALSE)
Graph
I would like to set the axis limits, perhaps with scale_x_discrete(), so as to remove the empty space before and after the line, but I have not managed to do it:
Is this possible? I would also like it to work when the x-axis holds text values.
Thank you in advance for any help.

Set the expand argument in scale_x_discrete(). It controls how much padding is added at each end of the discrete x-axis, so you can adjust it to what you want. You can use this code:
## Defining graph
Graph <- ggplot(data=Dati, aes(x=Correction, y=Vix)) + geom_point(color = "#e60000", shape = 1, size = 3.5) +
geom_smooth(aes(as.numeric(Correction), Vix), level=0.75, span = 1, color = "#e60000", method=lm) +
xlab("xlab") + ylab("y lab") + labs(color='') +
guides(color = FALSE, size = FALSE) +
scale_x_discrete(expand=c(0.05, 0))
Graph
Output:
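As a rough sketch of how expand behaves on a discrete axis: the vector form c(0.05, 0) is read as a multiplicative padding followed by an additive one, and expansion() (available in ggplot2 3.3.0 and later, which is assumed here) spells the same thing out by name. The value 0.1 below is an arbitrary choice for illustration and is not taken from the answer above.

```r
library(ggplot2)

# Same data as in the question, with the factor levels fixed in display order
Dati <- data.frame(
  Correction = factor(c("0%", "+5%", "+10%", "+15%"),
                      levels = c("0%", "+5%", "+10%", "+15%")),
  Vix = c(65700, 48000, 45500, 37800)
)

# Named equivalent of expand = c(mult, add): no multiplicative padding,
# 0.1 discrete units added at each end (assumed value, tune to taste)
x_expand <- expansion(mult = 0, add = 0.1)

Graph <- ggplot(Dati, aes(x = Correction, y = Vix)) +
  geom_point(color = "#e60000", shape = 1, size = 3.5) +
  geom_smooth(aes(x = as.numeric(Correction)), method = "lm", formula = y ~ x,
              level = 0.75, color = "#e60000") +
  scale_x_discrete(expand = x_expand)
Graph
```

Setting both values to 0 removes the padding entirely, at the cost of cutting the first and last points in half at the panel edge.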

The x axis is a continuous variable, so you need to use scale_x_continuous(). To remove the padding on the scale, set scale_x_continuous(expand = c(0, 0))

Related

Use free_y scale on first axis and fixed on second + facet_grid + ggplot2

Is there any method to set scale = 'free_y' on the left hand (first) axis in ggplot2 and use a fixed axis on the right hand (second) axis?
I have a dataset where I need to use free scales for one variable and fixed for another but represent both on the same plot. To do so I'm trying to add a second, fixed, y-axis to my data. The problem is I cannot find any method to set a fixed scale for the 2nd axis and have that reflected in the facet grid.
This is the code I have so far to create the graph -
#plot weekly seizure date
p <- ggplot(dfspw_all, aes(x=WkYr, y=Seizures, group = 1)) + geom_line() +
xlab("Week Under Observation") + ggtitle("Average Seizures per Week - To Date") +
geom_line(data = dfsl_all, aes(x =WkYr, y = Sleep), color = 'green') +
scale_y_continuous(
# Features of the first axis
name = "Seizures",
# Add a second axis and specify its features
sec.axis = sec_axis(~.[0:20], name="Sleep")
)
p + facet_grid(vars(Name), scales = "free_y") +
theme(axis.ticks.x=element_blank(),axis.text.x = element_blank())
This is what it is producing (some details omitted from code for simplicity) -
What I need is for the scale on the left to remain "free" and the scale on the right to range from 0-24.
Secondary axes are implemented in ggplot2 as a decoration that is a transformation of the primary axis, so I don't know an elegant way to do this, since it would require the secondary axis formula to be aware of different scaling factors for each facet.
Here's a hacky approach where I scale each secondary series to its respective primary series, and then add some manual annotations for the secondary series. Another way might be to make the plots separately for each facet and use patchwork to combine them.
Given some fake data where the facets have different ranges for the primary series but the same range for the secondary series:
library(tidyverse)
fake <- tibble(facet = rep(1:3, each = 10),
x = rep(1:10, times = 3),
y_prim = (1+sin(x))*facet/2,
y_sec = (1 + sin(x*3))/2)
ggplot(fake, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec), color = "green") +
facet_wrap(~facet, ncol = 1)
...we could scale each secondary series to its primary series, and add custom annotations for that secondary series:
fake2 <- fake %>%
group_by(facet) %>%
mutate(y_sec_scaled = y_sec/max(y_sec) * (max(y_prim))) %>%
ungroup()
fake2_labels <- fake %>%
group_by(facet) %>%
summarize(max_prim = max(y_prim), baseline = 0, x_val = 10.5)
ggplot(fake2, aes(x, y_prim)) +
geom_line() +
geom_line(aes(y= y_sec_scaled), color = "green") +
facet_wrap(~facet, ncol = 1, scales = "free_y") +
geom_text(data = fake2_labels, aes(x = x_val, y = max_prim, label = "100%"),
hjust = 0, color = "green") +
geom_text(data = fake2_labels, aes(x = x_val, y = baseline, label = "0%"),
hjust = 0, color = "green") +
coord_cartesian(xlim = c(0, 10), clip = "off") +
theme(plot.margin = unit(c(1,3,1,1), "lines"))
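The patchwork alternative mentioned above might look roughly like the sketch below. It is not the answerer's code: the helper plot_one(), the 0-24 secondary range, and the rescaled fake data are all assumptions made for illustration. Each facet becomes its own plot with its own scaling factor k, so the left axis stays free while the right axis always spans 0 to 24.

```r
library(ggplot2)
library(patchwork)

# Fake data in the spirit of the example above; the secondary series is
# rescaled to a hypothetical 0-24 range (e.g. hours)
fake <- data.frame(facet = rep(1:3, each = 10), x = rep(1:10, times = 3))
fake$y_prim <- (1 + sin(fake$x)) * fake$facet / 2
fake$y_sec  <- (1 + sin(fake$x * 3)) / 2 * 24

# Hypothetical helper: one plot per facet, with a facet-specific factor k
# that maps the secondary range 0..sec_max onto the primary range 0..max
plot_one <- function(d, sec_max = 24) {
  k <- max(d$y_prim) / sec_max
  ggplot(d, aes(x, y_prim)) +
    geom_line() +
    geom_line(aes(y = y_sec * k), color = "green") +
    scale_y_continuous(
      name = "Primary",
      limits = c(0, max(d$y_prim)),
      sec.axis = sec_axis(~ . / k, name = "Secondary",
                          breaks = seq(0, sec_max, by = 6))
    ) +
    ggtitle(paste("facet", d$facet[1]))
}

plots <- lapply(split(fake, fake$facet), plot_one)
combined <- wrap_plots(plots, ncol = 1)
combined
```

The trade-off is that the panels are no longer true facets, so shared strips and a shared x-axis have to be handled by hand.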

How to plot filled points and confidence ellipses with the same color using ggplot in R?

I would like to plot a graph from a Discriminant Function Analysis in which points must have a black border and be filled with specific colors and confidence ellipses must be the same color as the points are filled. Using the following code, I get almost the graph I want, except that points do not have a black border:
library(ggplot2)
library(ggord)
library(MASS)
data("iris")
set.seed(123)
linear <- lda(Species~., iris)
linear
dfaplot <- ggord(linear, iris$Species, labcol = "transparent", arrow = NULL, poly = FALSE, ylim = c(-11, 11), xlim = c(-11, 11))
dfaplot +
scale_shape_manual(values = c(16,15,17)) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 1
I could put a black border on the points by using the following code, but then confidence ellipses turn black.
dfaplot +
scale_shape_manual(values = c(21,22,24)) +
scale_color_manual(values = c("black","black","black")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
theme(legend.position = "none")
PLOT 2
I would like to keep the ellipses as in the first graph, but the points as in the second one. However, I am unable to figure out how to do this. If anyone has suggestions, I would be very grateful. I am using the "ggord" package because I learned how to run the analysis with it, but a solution that uses only ggplot would be fine too.
This roughly replicates what ggord does. Looking at the package source, ggord implements the ellipses differently than below, hence the small differences. If that matters, you can review the source and adapt the code. The default point shapes have no fill, so we switch to shapes that do (21, 22 and 24), map fill to species, and set color = 'black' in geom_point() for the border. The full code (including projecting the original data) is below.
set.seed(123)
linear <- lda(Species~., iris)
linear
# Get point x, y coordinates
df <- data.frame(predict(linear, iris[, 1:4]))
df$species <- iris$Species
# Get explained variance for each axis
var_exp <- 100 * linear$svd ^ 2 / sum(linear$svd ^ 2)
ggplot(data = df,
aes(x = x.LD1,
y = x.LD2)) +
geom_point(aes(fill = species,
shape = species),
color = "black",
size = 4) +
stat_ellipse(aes(color = species),
level = 0.95) +
ylim(c(-11, 11)) +
xlim(c(-11, 11)) +
ylab(paste("LD2 (",
round(var_exp[2], 2),
"%)")) +
xlab(paste("LD1 (",
round(var_exp[1], 2),
"%)")) +
scale_color_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_fill_manual(values = c("#00FF00","#FF00FF","#0000FF")) +
scale_shape_manual(values = c(21, 22, 24)) +
coord_fixed(1) +
theme_bw() +
theme(
legend.position = "none"
)
To plot arrows, you can grab the scaling from the LDA output and plot it with geom_segment. I played with the colors/alpha so they were visible in the plot below.
scaling <- data.frame(linear$scaling)
...
geom_segment(data = scaling,
aes(x = 0,
y = 0,
xend = LD1,
yend = LD2),
arrow = arrow(),
color = "black") +
geom_text(data = scaling,
aes(x = ifelse(LD1 <= 0.1, LD1 - 2, LD1 + 2),
y = ifelse(LD2 <= 0.1, LD2 - 1, LD2 + 1)),
label = rownames(scaling),
color = "black") +
...

Set the width and gap in geom_bar in a large dataset with a lot of unique values

I have the dataframe below:
res<-sample.int(2187, 2187)
freq<-floor(runif(2187, 95,105))
t<-data.frame(res,freq)
and I'm trying to create a bar chart based on it. Although I use the width and color arguments, I still cannot create space between the bars, and the bars appear black instead of the selected fill.
library(ggplot2)
require(scales)
ggplot(t,width=0.1)+
geom_bar(aes(x=res,y=freq ,fill = (t$res==101)),
color = "black",stat = "identity") +
scale_fill_manual(values=c("darkblue", "lightblue"), guide = F) +
theme_classic(base_size = 16)+ theme(legend.position = "none")+
scale_x_discrete(breaks = seq(80, 115, 5))+ scale_y_continuous(labels = comma)
Note that this code works nicely for a dataset with far fewer unique values, like:
fac<-factor(rep(c(80,85,100,100.5,100.7,101,101.5,110,105),2000000))
res<-data.frame(fac)
new<-data.frame(table(res))
require(scales)
ggplot(new,width=0.1)+
geom_bar(aes(x=res,y=Freq ,fill = (new$res==101)),
color = "black",stat = "identity") +
scale_fill_manual(values=c("darkblue", "lightblue"), guide = F) +
theme_classic(base_size = 16)+ theme(legend.position = "none")+
scale_x_discrete(breaks = seq(80, 115, 5))+ scale_y_continuous(labels = comma)
Maybe I am completely wrong, but if I understand correctly, the OP wants to reproduce the second chart from scratch using a sample of random numbers instead of already tabulated counts.
To create a histogram / bar chart, we only need a vector of random numbers (wrapped in a data.frame for ggplot) and can let geom_bar() do the counting. In addition, a particular bar will be highlighted.
By using floor(), the random numbers are already binned but are still treated as continuous by ggplot(). Therefore, they need to be turned into a factor.
# create data
set.seed(123L) # ensure random data are reproducible
t <- data.frame(res = floor(runif(2187, 95, 105)))
library(ggplot2)
ggplot(t) +
aes(x = as.factor(res), fill = res == 101) +
geom_bar() +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
Edit: geom_histogram()
There is an alternative approach using geom_histogram().
geom_histogram() does all steps in one go: the binning (no need to use floor()) as well as the counting and plotting:
set.seed(123L) # ensure random data are reproducible
t2 <- data.frame(res = runif(2187, 95,105)) # floor() omitted here
ggplot(t2) +
aes(x = res, fill = floor(res) == 101) +
geom_histogram(breaks = seq(95, 105, 1), closed = "left") +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
Here, the breaks parameter was used to specify the bin boundaries explicitly. Alternatively, the number of bins or the width of the bins can be specified. This gives flexibility to play around with the parameters.
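As a sketch of the bin-width variant mentioned above: binwidth together with boundary should give the same unit-wide bins as the explicit breaks. The variable names p_bw and bin_counts are made up for this illustration, and guide = "none" is used in place of guide = FALSE on the assumption of a recent ggplot2 version.

```r
library(ggplot2)

set.seed(123L) # ensure random data are reproducible
t2 <- data.frame(res = runif(2187, 95, 105))

# Bins of width 1 aligned on 95, closed on the left like the breaks version
p_bw <- ggplot(t2) +
  aes(x = res, fill = floor(res) == 101) +
  geom_histogram(binwidth = 1, boundary = 95, closed = "left") +
  theme_classic(base_size = 16) +
  scale_fill_manual(values = c("darkblue", "lightblue"), guide = "none") +
  xlab("res") +
  ylab("freq")
p_bw

# Counts per bin and fill group as computed by the histogram stat
bin_counts <- layer_data(p_bw)$count
```

Every observation lands in exactly one bin, so the counts add up to the number of rows in t2.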
Edit 2
The OP has asked about the case where the random numbers are uniformly distributed between 100 and 1015. With an adjustment to the sequence of breaks,
set.seed(123L) # ensure random data are reproducible
t3 <- data.frame(res = runif(2187, 100, 1015))
ggplot(t3) +
aes(x = res, fill = floor(res) == 101) +
geom_histogram(breaks = seq(100, 1015, 1), closed = "left") +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
returns
This chart contains over 900 bars, one for each bin of width 1, and not all of them are visible, depending on the screen resolution, as already explained by Jon Spring.
Therefore, it might be more suitable to reduce the number of bins, e.g., to 100 bins:
ggplot(t3) +
aes(x = res, fill = floor(res) == 101) +
geom_histogram(bins = 100L) +
theme_classic(base_size = 16) +
scale_fill_manual(values = c("darkblue", "lightblue"), guide = FALSE) +
xlab("res") +
ylab("freq")
Please note that 101 is still highlighted in the lower left corner.
Edit -- added alternate solutions at bottom.
If you have over 2,000 bars, and each one has a black outline 1 pixel wide on each side, that'll take something on the order of 6,000 horizontal pixels (ignoring anti-aliasing) to see one with a different fill. Most screens have much lower resolution than that.
If you must use bars, and must show every value, one option would be to drop the outline with color = NA and set width = 1 (as an argument in the geom_col/geom_bar call) so there's no distracting blank space between bars. Even then, the different color at res == 101 is only visible at certain resolutions. (That might vary with device settings and anti-aliasing.)
ggplot(t)+
geom_col(aes(x=res,y=freq , fill = (res==101)),
color = NA, width = 1) +
scale_fill_manual(values=c("darkblue", "lightblue"), guide = F) +
theme_classic(base_size = 16) +
scale_x_continuous(breaks = c(500*0:4, 101))
If you must show all 2000 points, but want to highlight one, it might make sense to use a different geom that spreads the data out to use more of the available space.
For instance, we might use geom_point or geom_jitter to plot all the coordinates in 2d space. Here, I highlight the element with res == 101. I use arrange to make sure the special dot gets plotted last so that it doesn't get occluded.
library(dplyr)
ggplot(t %>% arrange(res == 101),
aes(x = res, y = freq,
fill = res == 101,
size = res == 101)) +
geom_jitter(shape = 21, stroke = 0.1)
Or we might plot the data as a line, highlighting the special dot on its own:
ggplot(t, aes(res, freq)) +
geom_line(color = "gray70") +
geom_point(data = subset(t, res == 101)) +
expand_limits(y=0)

Transforming the y-axis without changing raw data in ggplot2

I have a question about how to transform the y-axis in ggplot2. My plot now has two lines and a scatter plot. For the scatter plot, I am very interested in the area around zero. Is there a possible way to enlarge the space between 0% and 5% and narrow the space between 20% and 30%?
I have tried to use coord_trans(y = "log10") to transform into a log form. But in this case, I have a lot of negative values, so if I want to use sqrt or log, the negative values will be removed. Do you have any suggestions?
Example of data points:
df1 = data.frame(y = runif(200,min = -1, max = 1))
df1 = data.frame( x= seq(1:200), y = df1[order(abs(df1$y)),])
ggplot(df1) +
geom_point(colour = "black",aes(x,y) ,size = 0.1)
I want to have more space between 0% and 5 % and less space between 5% and 30%.
I have tried to use trans_new() to transform the axes.
eps <- 1e-8
tn <- trans_new("logpeps",
function(x) (x+eps)^(3),
function(y) ((y)^(1/3) ),
domain=c(- Inf, Inf)
)
ggplot(df1)+ geom_point(colour = "black",aes(x,y) ,size = 0.1) +
# xlab("Observations sorted by PD in v3.1") + ylab("Absolute PD difference ") +
# ggtitle("Absolute PD for RiskCalc v4.0 relative to v3.1") +
scale_x_continuous(breaks = seq(0, round(rownum/1000)*1000, by = round(rownum/100)*10)) +
scale_y_continuous(limits = c(-yrange,yrange),breaks = c(-breaksY,breaksY),
sec.axis = sec_axis(~.,breaks = c(-breaksY[2:length(breaksY)],breaksY), labels = scales:: percent
)) +
# geom_line(data = df, aes(x,y[,3], colour = "blue"),size = 1) +
# geom_line(data = ds,aes(xval, yval,colour = "red"),size = 1) +
coord_trans(y = tn) +
scale_color_discrete(name = element_blank())
But it compresses the plot toward the center, which is the opposite of what I want. I then tried y = y^3, but it throws an
ERROR: zero_range(range)
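Try a cube root transform on the y values. Because R returns NaN for a negative number raised to 1/3, keep the sign separately:
aes(y = sign(yVariable) * abs(yVariable)^(1/3))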
or use trans_new() to define a new transformation (such as cube root, with pleasing breaks and labels).
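The trans_new() route could be sketched as follows. The names signed_cbrt, signed_cube and cbrt_trans, the chosen breaks, and the regenerated example data are assumptions for illustration only. The transform is a signed cube root, which stretches the region around zero and accepts negative values; its inverse is the plain cube.

```r
library(ggplot2)
library(scales)

# Signed cube root: defined for negative values, unlike x^(1/3) in R
signed_cbrt <- function(x) sign(x) * abs(x)^(1/3)
# Inverse of the signed cube root
signed_cube <- function(y) y^3

cbrt_trans <- trans_new("signed_cbrt",
                        transform = signed_cbrt,
                        inverse = signed_cube)

# Example data in the spirit of the question
set.seed(1)
df1 <- data.frame(x = 1:200, y = sort(runif(200, min = -1, max = 1)))

# Hand-picked breaks (an assumption) so the stretched region gets labels
p_cbrt <- ggplot(df1, aes(x, y)) +
  geom_point(colour = "black", size = 0.1) +
  scale_y_continuous(trans = cbrt_trans,
                     breaks = c(-1, -0.3, -0.05, 0, 0.05, 0.3, 1),
                     labels = label_percent())
p_cbrt
```

Recent ggplot2 releases prefer the argument name transform over trans and may print a deprecation message here; the axis still shows the raw values, only the spacing changes.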
A couple thoughts:
You can remove the empty edges of the plot like so:
scale_y_continuous(expand = c(0,0))
If you want to try the log transformation, just do:
scale_y_log10()
If you want to focus the window:
scale_y_continuous(limits=c(-.15,.15), expand=c(0,0))
Also consider adding theme_bw() for a cleaner look

ggplot specific thick line

How can I plot one line thicker than the other? I tried using geom_line(size=X), but this increases the thickness of both lines. Let's say I would like to increase the thickness of the first column; how would I approach this?
a <- (cbind(rnorm(100),rnorm(100))) #nav[,1:10]
sa <- stack(as.data.frame(a))
sa$x <- rep(seq_len(nrow(a)), ncol(a))
require("ggplot2")
p<-qplot(x, values, data = sa, group = ind, colour = ind, geom = "line")
p + theme(legend.position = "none")+ylab("Millions")+xlab("Age")+
geom_line( size = 1.5)
You need to map line thickness to the variable:
p + geom_line(aes(size = ind))
To control the thickness use scale_size_manual():
p + geom_line(aes(size = ind)) +
scale_size_manual(values = c(0.1, 1))
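For newer ggplot2 versions, a self-contained variant might look like the sketch below. It assumes ggplot2 3.4.0 or later, where line thickness is mapped through the linewidth aesthetic rather than size, and it uses ggplot() instead of qplot(). The specific widths 1.5 and 0.5 and the name p_lw are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(42)
a <- cbind(rnorm(100), rnorm(100))
sa <- stack(as.data.frame(a))           # columns: values, ind (V1 / V2)
sa$x <- rep(seq_len(nrow(a)), ncol(a))

# Map thickness to the series, then assign one width per factor level
p_lw <- ggplot(sa, aes(x, values, colour = ind, linewidth = ind)) +
  geom_line() +
  scale_linewidth_manual(values = c(V1 = 1.5, V2 = 0.5)) +
  theme(legend.position = "none") +
  ylab("Millions") +
  xlab("Age")
p_lw
```

Naming the values by factor level makes it explicit which series gets the thicker line.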
