I have the following data:
unigrams Freq
1 the 236133
2 to 154296
3 and 128165
4 a 127434
5 i 124599
6 of 103380
7 in 81985
8 you 69504
9 is 65243
10 for 62425
11 it 60298
12 that 58605
13 on 45935
14 my 45424
15 with 38270
16 this 34799
17 was 33009
18 be 32725
19 have 31728
20 at 30255
and this set of data:
bigrams Freq
1 of the 20707
2 in the 19443
3 for the 11090
4 to the 10939
5 on the 10280
6 to be 9555
7 at the 7184
8 i have 6408
9 and the 6387
10 i was 6143
11 is a 6114
12 and i 5993
13 i am 5843
14 in a 5770
15 it was 5644
16 for a 5343
17 if you 5326
18 it is 5196
19 with the 5092
20 have a 4936
I would like to place two qplots together side-by-side, ncol = 2. I tried the gridExtra library, but it is generating errors that I can't seem to figure out how to correct. Any ideas on how to do this, please?
library(gridExtra)
# The 20 most frequent unigrams in the dataset
ugrams <- as.data.frame(unigrams)
graph.data <- ugrams[order(ugrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
p1 <- qplot(unigrams,Freq, data=graph.data,fill=unigrams,geom=c("histogram"))
# The 20 most frequent bigrams in the dataset
bgrams <- as.data.frame(bigrams)
graph.data <- bgrams[order(bgrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
p2 <- qplot(bigrams,Freq, data=graph.data,fill=bigrams,geom=c("histogram"))
grid.arrange(p1,p2,ncol=2)
This is the error that is generated:
<error/rlang_error>
stat_bin() can only have an x or y aesthetic.
Backtrace:
1. (function (x, ...) ...
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2:::by_layer(function(l, d) l$compute_statistic(d, layout))
6. ggplot2:::f(l = layers[[i]], d = data[[i]])
7. l$compute_statistic(d, layout)
8. ggplot2:::f(..., self = self)
9. self$stat$setup_params(data, self$stat_params)
10. ggplot2:::f(...)
I would like to have the graphs resemble this one:
Which was accomplished by the following code:
# The 20 most frequent quadgrams in the dataset
qgrams <- as.data.frame(quadgrams)
graph.data <- qgrams[order(qgrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
ggplot(data=graph.data, aes(x=quadgrams, y=Freq, fill=quadgrams)) + geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 40, hjust = 1))
Is that possible?
Edited for your shift from histograms to bar plots. Assuming that graph.data is actually your ugrams dataset, the working single plot is
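A minimal sketch of such a single bar plot, assuming ugrams has the columns unigrams and Freq shown in the question (the small data frame below is only a stand-in for the first few rows). The error in the question arises because a histogram bins a single variable, so stat_bin() accepts either x or y but not both; here the counts are already in Freq, so a bar geom that takes its heights from the data is the better fit.

```r
library(ggplot2)

# Assumed stand-in for the top rows of the unigram table from the question
ugrams <- data.frame(
  unigrams = c("the", "to", "and", "a", "i"),
  Freq     = c(236133, 154296, 128165, 127434, 124599)
)

# Keep the bars in descending-frequency order rather than alphabetical order
ugrams$unigrams <- factor(ugrams$unigrams, levels = ugrams$unigrams)

# Both x and y are supplied, so use geom_col() (heights taken from Freq)
# instead of a histogram
p1 <- ggplot(ugrams, aes(x = unigrams, y = Freq, fill = unigrams)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))
p1
```

Two plots built this way could then be placed next to each other with gridExtra::grid.arrange(p1, p2, ncol = 2), which is what the question originally attempted.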
Putting them side-by-side can be done with facets:
library(dplyr)
library(ggplot2)
bind_rows(
  unigrams = select(ugrams, grams = unigrams, Freq),
  bigrams = select(bgrams, grams = bigrams, Freq),
  .id = "id") %>%
  arrange(-Freq) %>%
  mutate(
    id = factor(id, levels = c("unigrams", "bigrams")),
    grams = factor(grams, levels = grams)
  ) %>%
  ggplot(aes(x = grams, y = Freq, fill = grams)) +
  facet_wrap(~ id, ncol = 2, scales = "free_x") +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))
(Obviously, these are "too small" to hold all of the legend, but that depends on where you are using it. I wonder if the legend shouldn't be included, since it is somewhat redundant with the x-axis labels.)
The bigram bars are harder to read because they are dwarfed by the unigram counts in the other panel. While it does bias the plot (it might be natural to compare the vertical levels of one panel with those of the other), you can alleviate that by freeing both the "x" (already free) and "y" axes with scales = "free":
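A minimal sketch of that change, using a few assumed rows from each table in the question rather than the full data (grams_df and p_free are illustrative names). The only difference from the faceted plot above is scales = "free", which gives each panel its own y-axis as well as its own x-axis.

```r
library(ggplot2)

# Assumed stand-in data: three rows from each table in the question
grams_df <- data.frame(
  id    = rep(c("unigrams", "bigrams"), each = 3),
  grams = c("the", "to", "and", "of the", "in the", "for the"),
  Freq  = c(236133, 154296, 128165, 20707, 19443, 11090)
)

# Fix the panel order and keep bars in descending-frequency order
grams_df$id    <- factor(grams_df$id, levels = c("unigrams", "bigrams"))
grams_df$grams <- factor(grams_df$grams, levels = grams_df$grams)

# scales = "free" frees both axes, so each panel gets its own y range
p_free <- ggplot(grams_df, aes(x = grams, y = Freq, fill = grams)) +
  facet_wrap(~ id, ncol = 2, scales = "free") +
  geom_col() +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))
p_free
```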
I have many graphics, each with two time series plotted on it.
That is to say, I have one plot of y_1 and y_2 against a common set of dates.
For each plot, I would like to present the correlation on the plot between each pair of series. That is to say I would like to compute: cor(y_1,y_2) and include the resulting number on each plot.
This is surprisingly difficult to do in a principled way in ggplot2. I've found no simple way to do it using stat_cor so far.
I have already looked at other functions recommended for this task, but they are all designed for reporting the correlation of y_1 and y_2 in situations in which y_1 is plotted against y_2, rather than both y_1 and y_2 being plotted against time.
I would prefer a ggplot2-ish way to do this but I'm open to using any graphics software within R. Here is code for a minimal working example and what I have tried.
library(reprex); library(ggplot2); library(ggpubr)
n <- 6;
Q=sample(18:30, n, replace=TRUE)
# make sample data
dat <- data.frame(id=1:n,
date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
group=rep(LETTERS[1:2], n/2),
quantity= Q,
price= 100 - 2*Q + rnorm(n))
dat
#> id date group quantity price
#> 1 1 2020-12-26 A 19 63.02628
#> 2 2 2020-12-27 B 26 49.66597
#> 3 3 2020-12-28 A 27 44.98031
#> 4 4 2020-12-29 B 24 51.11224
#> 5 5 2020-12-30 A 29 41.11129
#> 6 6 2020-12-31 B 28 43.04494
tseriesplot <- ggplot(dat, aes(x = date)) + ggtitle("Oil: Daily Quantity and Price") +
geom_line(aes(y = Q, color = "Quantity (thousands of barrels)")) +
geom_line(aes(y = price, color = "Price"))
tseriesplot
# naive attempt fails
tseriesplot + stat_cor(data = dat, aes(x=quantity, y=price),method="pearson")
#> Error: Invalid input: date_trans works with objects of class Date only
Created on 2021-01-05 by the reprex package (v0.3.0)
I thought this would be a good question because it is similar to more complex questions elsewhere, e.g. https://stat.ethz.ch/pipermail/r-help/2020-July/467805.html but much more basic.
1) annotate Create the text txt you want to plot and then use annotate:
txt <- with(dat, sprintf("cor: %.2f", cor(quantity, price)))
tseriesplot +
annotate("text", label = txt, x = min(dat$date), y = max(dat$quantity, dat$price),
hjust = -0.1)
2) grid.text Another approach is to use grid graphics which allows one to specify the location independently of the data. Using txt from above:
library(grid)
tseriesplot
grid.text(txt, 0.1, 0.9)
3a) zoo This would also work:
library(zoo)
z <- read.zoo(dat[c("date", "price", "quantity")])
txt <- sprintf("cor: %.2f", cor(z)[2])
autoplot(z, facet = NULL) +
annotate("text", label = txt, x = start(z), y = max(z), hjust = -0.1)
3b) scale
or you could scale the variables as that does not affect the correlation:
z <- scale(z)
autoplot(z, facet = NULL) +
annotate("text", label = txt, x = start(z), y = max(z), hjust = -0.1)
Discussion
Overall, putting together parts of the different solutions, this seems the most compact:
library(zoo)
library(grid)
z <- read.zoo(dat[c("date", "price", "quantity")])
autoplot(z, facet = NULL)
grid.text(sprintf("cor: %.2f", cor(z)[2]), 0.1, 0.9)
Instead of trying to figure out how to do this with ggpubr::stat_cor you could simply compute the correlation coefficient and add it as an annotation to your plot using e.g. annotate:
library(ggplot2)
library(ggpubr)
set.seed(42)
n <- 6;
Q=sample(18:30, n, replace=TRUE)
# make sample data
dat <- data.frame(id=1:n,
date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
group=rep(LETTERS[1:2], n/2),
quantity= Q,
price= 100 - 2*Q + rnorm(n))
dat
#> id date group quantity price
#> 1 1 2020-12-26 A 18 64.63286
#> 2 2 2020-12-27 B 22 56.40427
#> 3 3 2020-12-28 A 18 63.89388
#> 4 4 2020-12-29 B 26 49.51152
#> 5 5 2020-12-30 A 27 45.90534
#> 6 6 2020-12-31 B 21 60.01842
tseriesplot <- ggplot(dat, aes(x = date)) + ggtitle("Oil: Daily Quantity and Price") +
geom_line(aes(y = quantity, color = "Quantity (thousands of barrels)")) +
geom_line(aes(y = price, color = "Price"))
tseriesplot +
annotate("text",
x = min(dat$date),
y = 70,
label = paste0("r = ", scales::number(cor(dat$quantity, dat$price, method = "pearson"), accuracy = .01)),
hjust = 0)
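Since the question mentions many such plots, the same idea can be wrapped in a small function. This is only a sketch: plot_with_cor() is a hypothetical helper, not part of ggplot2 or ggpubr, and the data frame below is made-up data in the shape of the question's example.

```r
library(ggplot2)

# Hypothetical helper: plots two series against a date column and annotates
# the plot with their Pearson correlation
plot_with_cor <- function(data, date_col, y1, y2) {
  r   <- cor(data[[y1]], data[[y2]], method = "pearson")
  txt <- sprintf("cor: %.2f", r)
  ggplot(data, aes(x = .data[[date_col]])) +
    geom_line(aes(y = .data[[y1]], colour = y1)) +
    geom_line(aes(y = .data[[y2]], colour = y2)) +
    # Place the label at the top-left corner of the plotted data
    annotate("text", label = txt,
             x = min(data[[date_col]]),
             y = max(data[[y1]], data[[y2]]),
             hjust = 0, vjust = 1) +
    labs(x = date_col, y = NULL, colour = NULL)
}

# Made-up example data, same layout as in the question
dat <- data.frame(
  date     = seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
  quantity = c(19, 26, 27, 24, 29, 28),
  price    = c(63, 50, 45, 51, 41, 43)
)
p <- plot_with_cor(dat, "date", "quantity", "price")
p
```

The correlation is computed outside ggplot2 and passed in as plain text, so the date scale on the x-axis never sees the quantity or price values.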
I'm building a dynamic flexdashboard with plotly and I was wondering if there was a way to dynamically resize my dashboard. For example, I have created plots of subjects being tested over time. When I shrink the page down, what I'd like is for it to dynamically adjust to a time-series plot of the average for the group at each test day.
My data looks like this:
library(flexdashboard)
library(knitr)
library(tidyverse)
library(plotly)
subject <- rep(c("A", "B", "C"), each = 8)
testDay <- rep(1:8, times = 3)
variable1 <- rnorm(n = length(subject), mean = 30, sd = 10)
variable2 <- rnorm(n = length(subject), mean = 15, sd = 3)
df <- data.frame(subject, testDay, variable1, variable2)
subject testDay variable1 variable2
1 A 1 21.816831 8.575000
2 A 2 14.947327 17.387903
3 A 3 18.014435 16.734653
4 A 4 33.100524 11.381793
5 A 5 37.105911 13.862776
6 A 6 32.181317 10.722458
7 A 7 41.107293 9.176348
8 A 8 36.674051 17.114815
9 B 1 33.710838 17.508234
10 B 2 23.788428 13.903532
11 B 3 42.846120 17.032208
12 B 4 9.785957 15.275293
13 B 5 32.551619 21.172497
14 B 6 36.912465 18.694263
15 B 7 40.061797 13.759541
16 B 8 41.094825 15.472144
17 C 1 27.663408 17.949291
18 C 2 31.263966 11.546486
19 C 3 39.734050 19.831854
20 C 4 25.461309 19.239821
21 C 5 22.128139 10.837672
22 C 6 31.234339 16.976004
23 C 7 46.273664 19.255745
24 C 8 27.057218 21.086204
My plotly code looks like this (a graph of each subject over time):
Dynamic Chart
===========================
Row
-----------------------------------------------------------------------
```{r}
p1 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, color = subject, group = 1)) +
geom_line() +
theme_bw() +
ggtitle("Variable 1")
ggplotly(p1)
```
```{r}
p2 <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, color = subject, group = 1)) +
geom_line() +
theme_bw() +
ggtitle("Variable 2")
ggplotly(p2)
```
Is there a way that when I shrink the website down these plots can dynamically change to a group average plot, like this:
p1_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable1, group = 1)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw() +
ggtitle("Variable 1 Avg")
ggplotly(p1_avg)
p2_avg <- df %>%
ggplot(aes(x = as.factor(testDay), y = variable2, group = 1)) +
stat_summary(fun = "mean", geom = "line") +
theme_bw() +
ggtitle("Variable 2 Avg")
ggplotly(p2_avg)
You can wrap your plotly object in plotly's renderPlotly() function so that it resizes dynamically with the page. See an example of how I used the function in this blog post:
https://medium.com/analytics-vidhya/shiny-dashboards-with-flexdashboard-e66aaafac1f2
I have a data frame in R. The first column is the date. The remaining columns hold the data for each category.
Date View1 View2 View3 View4 View5 View6
1 2010-05-17 13 10 13 10 13 10
2 2010-05-18 11 11 13 10 13 10
3 2010-05-19 4 12 13 10 13 10
4 2010-05-20 2 10 13 10 13 10
5 2010-05-21 23 16 13 10 13 10
6 2010-05-22 26 15 13 10 13 10
How can I draw a time plot with one line per column, i.e. one line for View1, one line for View2, one line for View3 and so on? The x-axis is Date. Is there a function in ggplot that can achieve this easily?
I searched other posts, see a solution below, but it gives me nothing on the plot.
mydf %>% gather(key,value, View1, View2, View3, View4, View5, View6 ) %>% ggplot(aes(x=Date, y=value, colour=key))
I also tried the commands below.
test_data_long1 <- melt(mydf, id="Date")
ggplot(data=test_data_long1,
aes(x=date, y=value, colour=variable)) +
geom_line()
It gives me an error.
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: All columns in a tibble must be 1d or 2d objects:
* Column `x` is function
You could just rewrite the data frame for plotting:
dates <- c("2010-05-17", "2010-05-18", "2010-05-19", "2010-05-20", "2010-05-21")
df<- data.frame(date = dates, view1 = sample(10:15, 5, replace = TRUE),
view2 = sample(10:15, 5, replace = TRUE),
view3 = sample(10:15, 5, replace = TRUE),
view4 = sample(10:15, 5, replace = TRUE))
df$date <- as.Date(df$date)
toPlot1 <- df[,c(1,2)]
toPlot1[,3] <- "view1"
names(toPlot1) <- c("date", "n", "view")
toPlot2 <- df[,c(1,5)]
toPlot2[,3] <- "view4"
names(toPlot2) <- c("date", "n", "view")
toPlot <- dplyr::bind_rows(toPlot1, toPlot2)
The graph would be the following:
ggplot(toPlot, aes(date, n, linetype = view)) + geom_line()
Or, using the reshape2 package you could just melt the data:
library(reshape2)
meltedDf <- melt(df , id.vars = 'date', variable.name = 'series')
ggplot(meltedDf, aes(date, value, linetype = series)) + geom_line()
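Both attempts in the question seem to fail for small reasons: the gather() pipeline has no geom layer, so nothing is drawn, and the melt() version maps x = date although the column is called Date, so R picks up the base function date() instead (hence "Column `x` is function"). A minimal sketch in base R plus ggplot2, assuming mydf looks like the data shown in the question (only two of the View columns are reproduced here):

```r
library(ggplot2)

# Assumed stand-in for the question's data frame (first two View columns)
mydf <- data.frame(
  Date  = as.Date("2010-05-17") + 0:5,
  View1 = c(13, 11, 4, 2, 23, 26),
  View2 = c(10, 11, 12, 10, 16, 15)
)

# Reshape from wide to long: one row per Date/column combination
views <- setdiff(names(mydf), "Date")
long <- data.frame(
  Date  = rep(mydf$Date, times = length(views)),
  key   = rep(views, each = nrow(mydf)),
  value = unlist(mydf[views], use.names = FALSE)
)

# Note the capital D in Date, and the geom_line() layer that draws the lines
p_views <- ggplot(long, aes(x = Date, y = value, colour = key)) +
  geom_line()
p_views
```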
In the df below, I want to reorder the bars from highest to lowest in each facet.
I tried
df <- df %>% tidyr::gather("var", "value", 2:4)
ggplot(df, aes (x = reorder(id, -value), y = value, fill = id))+
geom_bar(stat="identity")+facet_wrap(~var, ncol =3)
It gave me
It didn't order the bars from highest to lowest in each facet.
I figured out another way to get what I want. I had to plot each variable at a time, then combine all plots using grid.arrange()
# I got this function from @eipi10's answer
#http://stackoverflow.com/questions/38637261/perfectly-align-several-plots/38640937#38640937
#Function to extract legend
# https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot) {
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
p1 <- ggplot(df[df$var == "A", ], aes(x = reorder(id, -value), y = value, fill = id)) +
  geom_bar(stat = "identity") + facet_wrap(~var, ncol = 3)
fin_legend <- g_legend(p1)
# guides(fill = "none") replaces the deprecated guides(fill = FALSE)
p1 <- p1 + guides(fill = "none")
p2 <- ggplot(df[df$var == "B", ], aes(x = reorder(id, -value), y = value, fill = id)) +
  geom_bar(stat = "identity") + facet_wrap(~var, ncol = 3) + guides(fill = "none")
p3 <- ggplot(df[df$var == "C", ], aes(x = reorder(id, -value), y = value, fill = id)) +
  geom_bar(stat = "identity") + facet_wrap(~var, ncol = 3) + guides(fill = "none")
gridExtra::grid.arrange(p1, p2, p3, fin_legend, ncol = 4, widths = c(1.5, 1.5, 1.5, 0.5))
The result is what I want
I wonder if there is a straightforward way to order the bars from highest to lowest in all facets without having to plot each variable separately and then combine the plots. Any suggestions will be much appreciated.
DATA
df <- read.table(text = c("
id A B C
site1 10 15 20
site2 20 10 30
site3 30 20 25
site4 40 35 40
site5 50 30 35"), header = T)
The approach below uses a specially prepared variable for the x-axis with facet_wrap() but uses the labels parameter to scale_x_discrete() to display the correct x-axis labels:
Prepare data
I'm more fluent in data.table, so this is used here. Feel free to use whatever package you prefer for data manipulation.
Edit: Removed second dummy variable, only ord is required
library(data.table)
# reshape from wide to long
molten <- melt(setDT(df), id.vars = "id")
# create dummy var which reflects order when sorted alphabetically
molten[, ord := sprintf("%02i", frank(molten, variable, -value, ties.method = "first"))]
molten
# id variable value ord
# 1: site1 A 10 05
# 2: site2 A 20 04
# 3: site3 A 30 03
# 4: site4 A 40 02
# 5: site5 A 50 01
# 6: site1 B 15 09
# 7: site2 B 10 10
# 8: site3 B 20 08
# 9: site4 B 35 06
#10: site5 B 30 07
#11: site1 C 20 15
#12: site2 C 30 13
#13: site3 C 25 14
#14: site4 C 40 11
#15: site5 C 35 12
Create plot
library(ggplot2)
# `ord` is plotted on x-axis instead of `id`
ggplot(molten, aes(x = ord, y = value, fill = id)) +
# geom_col() is replacement for geom_bar(stat = "identity")
geom_col() +
# independent x-axis scale in each facet,
# drop absent factor levels (not the case here)
facet_wrap(~ variable, scales = "free_x", drop = TRUE) +
# use named character vector to replace x-axis labels
scale_x_discrete(labels = molten[, setNames(as.character(id), ord)]) +
# replace x-axis title
xlab("id")
Data
df <- read.table(text = "
id A B C
site1 10 15 20
site2 20 10 30
site3 30 20 25
site4 40 35 40
site5 50 30 35", header = T)
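For readers who prefer to avoid data.table, the ord helper column can also be built in base R. This is a sketch under the same assumptions as above: long and x_labels are illustrative names, and the result should match the ord column shown for molten.

```r
df <- read.table(text = "
id A B C
site1 10 15 20
site2 20 10 30
site3 30 20 25
site4 40 35 40
site5 50 30 35", header = TRUE)

# Reshape from wide to long without extra packages
vars <- c("A", "B", "C")
long <- data.frame(
  id       = rep(df$id, times = length(vars)),
  variable = rep(vars, each = nrow(df)),
  value    = unlist(df[vars], use.names = FALSE)
)

# Sort by facet, then by descending value; ties keep their original order
long <- long[order(long$variable, -long$value), ]

# Zero-padded position, so alphabetical order equals the wanted bar order
long$ord <- sprintf("%02d", seq_len(nrow(long)))

# Named vector that can be passed to scale_x_discrete(labels = ...)
x_labels <- setNames(as.character(long$id), long$ord)
```

With a plain data frame, the plotting code stays the same except that the labels argument becomes scale_x_discrete(labels = x_labels).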
If you're willing to lose the X axis labels, you can do this by using the actual y values as the x aesthetic, then dropping unused factor levels in each facet:
ggplot(df, aes (x = factor(-value), y = value, fill = id))+
geom_bar(stat="identity", na.rm = TRUE)+
facet_wrap(~var, ncol =3, scales = "free_x", drop = TRUE) +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Result:
The loss of x-axis labels is probably not too bad here as you still have the colours to go on (and the x-axis is confusing anyway since it's not consistent across facets).
I have the following generated data frame called Raw_Data:
Time Velocity Type
1 10 1 a
2 20 2 a
3 30 3 a
4 40 4 a
5 50 5 a
6 10 2 b
7 20 4 b
8 30 6 b
9 40 8 b
10 50 9 b
11 10 3 c
12 20 6 c
13 30 9 c
14 40 11 c
15 50 13 c
I plotted this data with ggplot2:
ggplot(Raw_Data, aes(x=Time, y=Velocity))+geom_point() + facet_grid(Type ~.)
I have the objects: Regression_a, Regression_b, Regression_c. These are the linear regression equations for each plot. Each plot should display the corresponding equation.
Using annotate displays the same equation on every plot:
annotate("text", x = 1.78, y = 5, label = Regression_a, color="black", size = 5, parse=FALSE)
I tried to overcome the issue with the following code:
Regression_a_eq <- data.frame(x = 1.78, y = 1,label = Regression_a,
Type = "a")
p <- x + geom_text(data = Raw_Data,label = Regression_a)
This did not solve the problem. Every plot still showed Regression_a, rather than only plot a.
You can put the expressions as character values in a new data frame that has the same unique Type values as your data, and add them with geom_text:
regrDF <- data.frame(Type = c('a','b','c'), lbl = c('Regression_a', 'Regression_b', 'Regression_c'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type ~.)
which gives:
You can replace the text values in regrDF$lbl with the appropriate expressions.
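As one way to fill regrDF$lbl, the labels can be computed from per-facet fits. This is a sketch only: it assumes each Regression_* object stands for a simple linear fit of Velocity on Time, which the question does not state, and the label format is an arbitrary choice.

```r
library(ggplot2)

# Data as shown in the question
Raw_Data <- data.frame(
  Time     = rep(seq(10, 50, by = 10), times = 3),
  Velocity = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 9, 3, 6, 9, 11, 13),
  Type     = rep(c("a", "b", "c"), each = 5)
)

# One label per facet: fit Velocity ~ Time within each Type
regrDF <- do.call(rbind, lapply(split(Raw_Data, Raw_Data$Type), function(d) {
  # Round the coefficients; "+ 0" turns a negative zero into a plain 0
  b <- round(coef(lm(Velocity ~ Time, data = d)), 2) + 0
  data.frame(Type = d$Type[1],
             lbl  = sprintf("y = %.2f + %.2f x", b[[1]], b[[2]]))
}))

# Because regrDF has a Type column, each label lands in its own facet
p_eq <- ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
  geom_point() +
  geom_text(data = regrDF, aes(x = 10, y = 12, label = lbl), hjust = 0) +
  facet_grid(Type ~ .)
p_eq
```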
A supplement to the accepted answer, for the case where there are facets in both the horizontal and vertical directions (this assumes Raw_Data has two facetting columns, Type1 and Type2):
regrDF <- data.frame(Type1 = c('a','a','b','b'),
Type2 = c('c','d','c','d'),
lbl = c('Regression_ac', 'Regression_ad', 'Regression_bc', 'Regression_bd'))
ggplot(Raw_Data, aes(x = Time, y = Velocity)) +
geom_point() +
geom_text(data = regrDF, aes(x = 10, y = 10, label = lbl), hjust = 0) +
facet_grid(Type1 ~ Type2)
The answer is good but still imperfect as I do not know how to incorporate math expressions and newline simultaneously (Adding a newline in a substitute() expression).