I am fairly new to R and ggplotting. I'm trying to line plot the total of TRUE observations across time, but the counts seem to be capped at 1.00.
dd <- data.frame(x = c(1,1,1,2,2,3,3),
y = c(TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE))
ggplot(dd,aes(x,as.numeric(y)))+
geom_line()
count(as.numeric(y)) does not work, can you help me?
It is best to compute the summary statistics before plotting them:
library(ggplot2)
library(dplyr)
# summarise the TRUE count for each x value
graph_data <- dd %>%
group_by(x) %>%
summarise(count_y = sum(y))
# plot the data using geom_line
ggplot(data = graph_data) +
geom_line(aes(x, count_y)) +
# set the y scale so that the y-axis starts at 0
scale_y_continuous(limits = c(0, NA), expand = c(0, 0))
Created on 2021-05-22 by the reprex package (v2.0.0)
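If you'd rather not pre-aggregate, a minimal sketch of an alternative is to let ggplot2 do the summing with stat_summary. The counts looked capped at 1 because as.numeric(TRUE) is 1 and geom_line plots each raw row rather than a total. This reuses the dd data frame from the question; p is just a name chosen here.

```r
library(ggplot2)

# Same data as in the question
dd <- data.frame(x = c(1, 1, 1, 2, 2, 3, 3),
                 y = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE))

# Sum the 0/1 values within each x before drawing the line
p <- ggplot(dd, aes(x, as.numeric(y))) +
  stat_summary(fun = sum, geom = "line") +
  labs(y = "count of TRUE")
p
```

The fun argument assumes ggplot2 3.3.0 or later.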
There seems to be quite a bit of information on plotting NMDS outputs (i.e. NMDS1 vs NMDS2) using ggplot2; however, I cannot find a way to plot the vegan::stressplot() (Shepard plot) using ggplot2.
Is there a way to produce a ggplot2 version of a metaMDS output?
Reproducible code
library(vegan)
set.seed(2)
community_matrix = matrix(
sample(1:100,300,replace=T),nrow=10,
dimnames=list(paste("community",1:10,sep=""),paste("sp",1:30,sep="")))
example_NMDS=metaMDS(community_matrix, k=2)
stressplot(example_NMDS)
Created on 2021-09-17 by the reprex package (v2.0.1)
Here's a workaround to draw a very similar plot using ggplot2. The trick is to inspect the structure of the object returned by stressplot(example_NMDS) and extract the data stored in it. I used the tidyverse package, which loads ggplot2 along with other packages such as tidyr, which provides the pivot_longer function.
library(vegan)
library(tidyverse)
# Store the stressplot result once (every call also redraws the base plot)
sp <- stressplot(example_NMDS)
# Inspect its structure: a list with x, y and yf components
str(sp)
# Create a tibble that contains the data from stressplot
df <- tibble(x = sp$x,
y = sp$y,
yf = sp$yf) %>%
# Change data to long format
pivot_longer(cols = c(y, yf),
names_to = "var")
# Create plot
df %>%
ggplot(aes(x = x,
y = value)) +
# Add points just for y values
geom_point(data = df %>%
filter(var == "y")) +
# Add line just for yf values
geom_step(data = df %>%
filter(var == "yf"),
col = "red",
direction = "vh") +
# Change axis labels
labs(x = "Observed Dissimilarity", y = "Ordination Distance") +
# Add bw theme
theme_bw()
I am plotting a 3-way interaction effect using my own data, but my code draws too many shapes (continuously) along the lines.
How can I keep the points only at the ends of the lines, instead of what is shown in the figure attached above?
I would greatly appreciate any help.
g1=ggplot(mygrid,aes(x=control,y=pred,color=factor(nknowledge),
lty=factor(nknowledge),shape=factor(nknowledge)))+
geom_line(size=1.5)+
geom_point(size=2.5)+
labs(x="control", y="attitudes",lty = "inc level")+
scale_linetype_manual("know level",breaks=1:3,values=c("longdash", "dotted","solid"),label=c("M-SD","M","M+SD"))+
scale_color_manual("know level",breaks=1:3,values=c("red", "blue","grey"),label=c("M-SD","M","M+SD"))+
scale_shape_manual("know level",breaks=1:3,values=c(6,5,4),label=c("M-SD","M","M+SD"))+
theme_classic()
This can be achieved with a second dataset that keeps only the endpoints of each group, e.g. using group_by and range, and passing that filtered dataset as data to geom_point.
Using some random example data, try this:
set.seed(42)
mygrid <- data.frame(
control = runif(30, 1, 7),
pred = runif(30, 1, 3),
nknowledge = sample(1:3, 30, replace = TRUE)
)
library(ggplot2)
library(dplyr)
mygrid_pt <- mygrid %>%
group_by(nknowledge) %>%
filter(control %in% range(control))
ggplot(mygrid,aes(x=control,y=pred,color=factor(nknowledge),
lty=factor(nknowledge),shape=factor(nknowledge)))+
geom_line(size=1.5)+
geom_point(data = mygrid_pt, size=2.5)+
labs(x="control", y="attitudes",lty = "inc level")+
scale_linetype_manual("know level",breaks=1:3,values=c("longdash", "dotted","solid"),label=c("M-SD","M","M+SD"))+
scale_color_manual("know level",breaks=1:3,values=c("red", "blue","grey"),label=c("M-SD","M","M+SD"))+
scale_shape_manual("know level",breaks=1:3,values=c(6,5,4),label=c("M-SD","M","M+SD"))+
theme_classic()
If you use geom_point, you'll get a point for every row in your data frame. If you want points plotted only at the ends of your lines, create a filtered data frame containing just those points.
library(ggplot2); library(dplyr)
g1 <- ggplot()+
geom_line(data = mtcars,
mapping = aes(x=hp,y=mpg,color=factor(cyl),lty=factor(cyl)),
size=1.5)+
geom_point(data = mtcars %>% group_by(cyl) %>% filter(hp == max(hp) | hp == min(hp)),
mapping = aes(x=hp,y=mpg,color=factor(cyl),shape=factor(cyl)),
size=2.5)
g1
Created on 2021-01-28 by the reprex package (v0.3.0)
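One caveat with both filters above: if several rows in a group share the minimum or maximum x value, all of them are kept, so you can end up with more than two points per line. Here is a hedged sketch of one way to force exactly one point per end, assuming dplyr 1.0.0 or later for slice_min() and slice_max(). The names by_cyl and ends are made up for this example.

```r
library(dplyr)
library(ggplot2)

by_cyl <- mtcars %>% group_by(cyl)

# One row per group at each end of the x range; with_ties = FALSE
# keeps a single row even when several rows share the extreme value
ends <- bind_rows(
  slice_min(by_cyl, hp, n = 1, with_ties = FALSE),
  slice_max(by_cyl, hp, n = 1, with_ties = FALSE)
) %>%
  ungroup()

ggplot(mtcars, aes(hp, mpg, color = factor(cyl))) +
  geom_line() +
  geom_point(data = ends, aes(shape = factor(cyl)), size = 2.5)
```

With interaction-plot data where each line has distinct x values at its ends, the simpler filter shown above is enough.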
I have a faceted ggplot where each facet is a day of the month. I want to replace days 4 through 29 with space and three dots (ellipsis) to remind my reader about the omitted days. I see that theme() allows customization, but I do not see any options that would allow me to insert space and three dots between day 3 and day 30 facets.
Should I resort to Photoshop or MS Paint?
library(tidyverse)
tibble(day = rep(1:30, 5), value = runif(5*30)) %>%
filter(day %in% c(1:3, 30)) %>%
ggplot(aes(x = value)) +
geom_histogram() +
facet_grid(day ~ .)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Created on 2019-11-13 by the reprex package (v0.3.0)
Here's an option: make day a factor with an extra '...' level, and keep that empty level when faceting:
library(ggplot2)
library(dplyr)
tibble(day = rep(1:30, 5), value = runif(5*30)) %>%
filter(day %in% c(1:3, 30)) %>%
mutate(day = factor(day, levels = c('1','2','3','...','30')))%>% #new
ggplot(aes(x = value)) +
geom_histogram() +
facet_grid(day ~ ., drop = FALSE) # added drop = FALSE to keep the empty '...' level
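To see why this works, here is a small sketch (day_f is a hypothetical name). Converting to a factor with an explicit levels vector adds '...' as a level that no row uses, and keeping unused levels in facet_grid means a panel is drawn for it as well.

```r
day <- c(1, 2, 3, 30)
day_f <- factor(day, levels = c("1", "2", "3", "...", "30"))

levels(day_f)  # includes the placeholder level "..."
table(day_f)   # the "..." level has no observations
```

Note that the '...' panel is still drawn as an empty panel with its own strip label, so it takes up the same space as the other facets.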
I have a continuous scale including some values which codify different categories of missing (for example 998,999), and I want to make a plot excluding these numeric missing values.
Since these values are close together, I can use xlim each time, but because it fixes the domain of the plot, I have to change the limits for each case.
So I am asking for a solution. I can think of two possibilities.
Is it possible to set non-binding limits on the x-values? I mean, if I give 990 as the maximum limit but the largest value that appears is 100, the plot should show an x-range up to approximately 100, not 990 as xlim does.
Is there an opposite of xlim, i.e. a function such that the range given by the limits (or a discrete set of values) is excluded from the x-axis?
Thanks in advance.
I think the simplest way is to exclude these values in the plot, either before or during the ggplot call.
MWE
library(tidyverse)
# Create data containing out-of-range placeholder values
mtcars2 <- mtcars
mtcars2[5:15, 'mpg'] <- 998
# Full plot
mtcars2 %>% ggplot() +
geom_point(aes(x = mpg, y = disp))
Filtering before plot
mtcars2 %>%
filter(mpg < 250) %>%
ggplot() +
geom_point(aes(x = mpg, y = disp))
Filtering during plot
mtcars2 %>%
ggplot() +
geom_point(aes(x = mpg, y = disp), data = . %>% filter(mpg < 250))
I would filter those missing values from the original dataset:
library(dplyr)
df <- data.frame(cat = rep(LETTERS[1:4], 3),
values = sample(10, 12, replace = TRUE)
)
# Add missing values
df$values[c(1,5,10)] <- 999
df$values[c(2,7)] <- 998
invalid_values <- c(998, 999)
library(ggplot2)
df %>%
filter(!values %in% invalid_values) %>%
ggplot() +
geom_point(aes(cat, values))
Alternatively, if that's not possible for some reason, you can define a scale transformation:
df %>%
ggplot() +
geom_point(aes(cat, values)) +
scale_y_continuous(trans = scales::trans_new('remove_invalid',
transform = function(d) if_else(d %in% invalid_values, NA_real_, d),
inverse = function(d) {if_else(is.na(d), 999, d)}
)
)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 5 rows containing missing values (geom_point).
Created on 2018-05-09 by the reprex package (v0.2.0).
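Another way to look at the first possibility in the question: once the coded values are turned into NA, they no longer influence the axis range, so no xlim is needed at all. A minimal sketch with made-up numbers (values and cleaned are illustrative names):

```r
invalid_values <- c(998, 999)
values <- c(3, 999, 7, 998, 10)

# Replace the missing-value codes with NA
cleaned <- replace(values, values %in% invalid_values, NA)

range(values)                 # dominated by the codes
range(cleaned, na.rm = TRUE)  # only the real data
```

ggplot2 drops NA rows from geom_point with a warning; passing na.rm = TRUE to the geom silences it.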
I've got discrete data which I have presented in ranges, for example:
        Marks Freq cumFreq
1 (37.9,43.1]    4       4
2 (43.1,48.2]   16      20
3 (48.2,53.3]   76      96
I need to plot the CMF for this data. I know that there is
plot(ecdf(x))
but I don't know what to add to it to get what I need.
Here are a few options:
library(ggplot2)
library(scales)
library(dplyr)
## Fake data
set.seed(2)
dat = data.frame(score=c(rnorm(130,40,10), rnorm(130,80,5)))
Here's how to plot the ECDF if you have the raw data:
# Base graphics
plot(ecdf(dat$score))
# ggplot2
ggplot(dat, aes(score)) +
stat_ecdf(aes(group=1), geom="step")
Here's one way to plot the ECDF if you have only summary data:
First, let's group the data into bins, similar to what you have in your question. We use the cut function to create the bins and then add a new pct column holding each bin's fraction of the total number of scores. We use the dplyr chaining operator (%>%) to do it all in one "chain" of functions.
dat.binned = dat %>% count(Marks=cut(score,seq(0,100,5))) %>%
mutate(pct = n/sum(n))
Now we can plot it. cumsum(pct) calculates the cumulative percentages (like cumFreq in your question, but as a fraction of the total). geom_step creates a step plot from these cumulative percentages.
ggplot(dat.binned, aes(Marks, cumsum(pct))) +
geom_step(aes(group=1)) +
scale_y_continuous(labels=percent_format())
What about this:
library(ggplot2)
library(scales)
library(dplyr)
set.seed(2)
dat = data.frame(score = c(rnorm(130,40,10), rnorm(130,80,5)))
dat.binned = dat %>% count(Marks = cut(score,seq(0,100,5))) %>%
mutate(pct = n/sum(n))
ggplot(data = dat.binned, mapping = aes(Marks, cumsum(pct))) +
geom_line(aes(group = 1)) +
geom_point(data = dat.binned, size = 0.1, color = "blue") +
labs(x = "Frequency(Hz)", y = "Axis") +
scale_y_continuous(labels = percent_format())
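If you want to start directly from a frequency table like the one in the question, rather than from raw scores, a sketch could look like this. It uses only the three rows shown in the question, so the proportions are relative to those rows. binned and cumPct are names chosen here.

```r
library(ggplot2)

# Frequency table copied from the rows shown in the question
binned <- data.frame(
  Marks = factor(c("(37.9,43.1]", "(43.1,48.2]", "(48.2,53.3]")),
  Freq  = c(4, 16, 76)
)

# Cumulative counts and cumulative share of the total
binned$cumFreq <- cumsum(binned$Freq)
binned$cumPct  <- binned$cumFreq / sum(binned$Freq)

ggplot(binned, aes(Marks, cumPct, group = 1)) +
  geom_step() +
  geom_point()
```

The group = 1 is needed because Marks is a factor; without it ggplot2 treats each bin as its own group and has nothing to connect.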