I have a data frame in R. The first column is the date; the remaining columns hold the data for each category.
Date View1 View2 View3 View4 View5 View6
1 2010-05-17 13 10 13 10 13 10
2 2010-05-18 11 11 13 10 13 10
3 2010-05-19 4 12 13 10 13 10
4 2010-05-20 2 10 13 10 13 10
5 2010-05-21 23 16 13 10 13 10
6 2010-05-22 26 15 13 10 13 10
How can I plot a time series with multiple lines, one per column (i.e. one line for View1, one for View2, one for View3, and so on), with Date on the x-axis? Is there a ggplot function that can do this easily?
I searched other posts and found the solution below, but it gives me nothing on the plot.
mydf %>% gather(key,value, View1, View2, View3, View4, View5, View6 ) %>% ggplot(aes(x=Date, y=value, colour=key))
I also tried the commands below.
test_data_long1 <- melt(mydf, id="Date")
ggplot(data=test_data_long1,
aes(x=date, y=value, colour=variable)) +
geom_line()
It gives me an error.
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error: All columns in a tibble must be 1d or 2d objects:
* Column `x` is function
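Both attempts are close to working. Here is a minimal sketch of the fixes, assuming the tidyr and ggplot2 packages are installed and rebuilding mydf from the table above. The gather() pipeline only lacked a geom layer, because ggplot() on its own draws an empty panel. The melt() attempt mapped x = date, which R resolves to base R's date() function because the column is named Date. Mapping x = Date fixes the melt() version the same way.

```r
library(ggplot2)
library(tidyr)

# mydf rebuilt from the printed table in the question
mydf <- data.frame(
  Date  = as.Date(c("2010-05-17", "2010-05-18", "2010-05-19",
                    "2010-05-20", "2010-05-21", "2010-05-22")),
  View1 = c(13, 11, 4, 2, 23, 26),
  View2 = c(10, 11, 12, 10, 16, 15),
  View3 = 13, View4 = 10, View5 = 13, View6 = 10
)

# Lower-case `date` is base R's date() function, not a column of mydf,
# which is why ggplot2 reports that "Column `x` is function"
is.function(date)  # TRUE

# Wide -> long: one row per (Date, key) pair
long <- gather(mydf, key, value, View1:View6)

# The missing piece in the first attempt: a geom layer
p <- ggplot(long, aes(x = Date, y = value, colour = key)) +
  geom_line()
p
```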
You could just rewrite the data frame for plotting:
library(dplyr)
library(ggplot2)
dates <- c("2010-05-17", "2010-05-18", "2010-05-19", "2010-05-20", "2010-05-21")
df<- data.frame(date = dates, view1 = sample(10:15, 5, replace = TRUE),
view2 = sample(10:15, 5, replace = TRUE),
view3 = sample(10:15, 5, replace = TRUE),
view4 = sample(10:15, 5, replace = TRUE))
df$date <- as.Date(df$date)
toPlot1 <- df[,c(1,2)]
toPlot1[,3] <- "view1"
names(toPlot1) <- c("date", "n", "view")
toPlot2 <- df[,c(1,5)]
toPlot2[,3] <- "view4"
names(toPlot2) <- c("date", "n", "view")
toPlot<-bind_rows(toPlot1, toPlot2)
The plot can then be drawn with:
ggplot(toPlot, aes(date, n, linetype = view)) + geom_line()
Or, using the reshape2 package you could just melt the data:
library(reshape2)
meltedDf <- melt(df , id.vars = 'date', variable.name = 'series')
ggplot(meltedDf, aes(date, value, linetype = series)) + geom_line()
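The manual toPlot construction (view1 and view4 only) can also be written as a single reshape. The following sketch assumes tidyr 1.0 or later, where pivot_longer() supersedes gather() and melt(). The data are simulated as in the answer, so the values are random.

```r
library(ggplot2)
library(tidyr)

dates <- as.Date(c("2010-05-17", "2010-05-18", "2010-05-19",
                   "2010-05-20", "2010-05-21"))
df <- data.frame(date  = dates,
                 view1 = sample(10:15, 5, replace = TRUE),
                 view2 = sample(10:15, 5, replace = TRUE),
                 view3 = sample(10:15, 5, replace = TRUE),
                 view4 = sample(10:15, 5, replace = TRUE))

# Keep only the series to plot, then stack them into (view, n) pairs
toPlot <- pivot_longer(df[c("date", "view1", "view4")], -date,
                       names_to = "view", values_to = "n")

p <- ggplot(toPlot, aes(date, n, linetype = view)) + geom_line()
p
```

Passing the whole data frame instead (pivot_longer(df, -date, ...)) gives one line per view column, which matches the melt() version.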
I have many graphics with two times series plotted on them.
That is to say, I have one plot of y_1 and y_2 against a common set of dates.
For each plot, I would like to show the correlation between the pair of series on the plot. That is to say, I would like to compute cor(y_1, y_2) and include the resulting number on each plot.
This is surprisingly difficult to do in a principled way in ggplot2. I've found no simple way to do it using stat_cor so far.
I have already looked at other functions recommended for this task, but they are all designed for reporting the correlation of y_1 and y_2 when y_1 is plotted against y_2, rather than when both y_1 and y_2 are plotted against time.
I would prefer a ggplot2-ish way to do this but I'm open to using any graphics software within R. Here is code for a minimal working example and what I have tried.
library(reprex); library(ggplot2); library(ggpubr)
n <- 6;
Q=sample(18:30, n, replace=TRUE)
# make sample data
dat <- data.frame(id=1:n,
date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
group=rep(LETTERS[1:2], n/2),
quantity= Q,
price= 100 - 2*Q + rnorm(n))
dat
#> id date group quantity price
#> 1 1 2020-12-26 A 19 63.02628
#> 2 2 2020-12-27 B 26 49.66597
#> 3 3 2020-12-28 A 27 44.98031
#> 4 4 2020-12-29 B 24 51.11224
#> 5 5 2020-12-30 A 29 41.11129
#> 6 6 2020-12-31 B 28 43.04494
tseriesplot <- ggplot(dat, aes(x = date)) + ggtitle("Oil: Daily Quantity and Price") +
geom_line(aes(y = Q, color = "Quantity (thousands of barrels)")) +
geom_line(aes(y = price, color = "Price"))
tseriesplot
# naive attempt fails
tseriesplot + stat_cor(data = dat, aes(x=quantity, y=price),method="pearson")
#> Error: Invalid input: date_trans works with objects of class Date only
Created on 2021-01-05 by the reprex package (v0.3.0)
I thought this would be a good question because it is similar to more complex questions elsewhere, e.g. https://stat.ethz.ch/pipermail/r-help/2020-July/467805.html but much more basic.
1) annotate
Create the text txt you want to plot and then use annotate:
txt <- with(dat, sprintf("cor: %.2f", cor(quantity, price)))
tseriesplot +
annotate("text", label = txt, x = min(dat$date), y = max(dat$quantity, dat$price),
hjust = -0.1)
2) grid.text
Another approach is to use grid graphics, which allow one to specify the location independently of the data. Using txt from above:
library(grid)
tseriesplot
grid.text(txt, 0.1, 0.9)
3a) zoo
This would also work:
library(zoo)
z <- read.zoo(dat[c("date", "price", "quantity")])
txt <- sprintf("cor: %.2f", cor(z)[2])
autoplot(z, facet = NULL) +
annotate("text", label = txt, x = start(z), y = max(z), hjust = -0.1)
3b) scale
Or you could scale the variables, as that does not affect the correlation:
z <- scale(z)
autoplot(z, facet = NULL) +
annotate("text", label = txt, x = start(z), y = max(z), hjust = -0.1)
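The claim that scaling leaves the correlation unchanged is easy to check. Pearson's r is invariant to centring and to multiplying by a positive constant, which is all scale() does. Here is a base R sketch on data generated the way the question generates it, with a fixed seed added:

```r
set.seed(42)
quantity <- sample(18:30, 6, replace = TRUE)
price <- 100 - 2 * quantity + rnorm(6)
m <- cbind(price = price, quantity = quantity)

# Off-diagonal element of the 2 x 2 correlation matrix
r_raw <- cor(m)[2]
# scale() centres each column and divides it by its standard deviation
r_scaled <- cor(scale(m))[2]

all.equal(r_raw, r_scaled)  # TRUE
```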
Discussion
Overall, combining parts of the solutions above, this seems the most compact:
library(zoo)
library(grid)
z <- read.zoo(dat[c("date", "price", "quantity")])
autoplot(z, facet = NULL)
grid.text(sprintf("cor: %.2f", cor(z)[2]), 0.1, 0.9)
Instead of trying to figure out how to do this with ggpubr::stat_cor you could simply compute the correlation coefficient and add it as an annotation to your plot using e.g. annotate:
library(ggplot2)
library(ggpubr)
set.seed(42)
n <- 6;
Q=sample(18:30, n, replace=TRUE)
# make sample data
dat <- data.frame(id=1:n,
date=seq.Date(as.Date("2020-12-26"), as.Date("2020-12-31"), "day"),
group=rep(LETTERS[1:2], n/2),
quantity= Q,
price= 100 - 2*Q + rnorm(n))
dat
#> id date group quantity price
#> 1 1 2020-12-26 A 18 64.63286
#> 2 2 2020-12-27 B 22 56.40427
#> 3 3 2020-12-28 A 18 63.89388
#> 4 4 2020-12-29 B 26 49.51152
#> 5 5 2020-12-30 A 27 45.90534
#> 6 6 2020-12-31 B 21 60.01842
tseriesplot <- ggplot(dat, aes(x = date)) + ggtitle("Oil: Daily Quantity and Price") +
geom_line(aes(y = quantity, color = "Quantity (thousands of barrels)")) +
geom_line(aes(y = price, color = "Price"))
tseriesplot +
annotate("text",
x = min(dat$date),
y = 70,
label = paste0("r = ", scales::number(cor(dat$quantity, dat$price, method = "pearson"), accuracy = .01)),
hjust = 0)
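stat_cor() labels both the coefficient and its p-value. The sketch below builds a similar label by hand with cor.test() from base R's stats package. It places the label at the top-left of the data range instead of at a hard-coded y = 70. The data are regenerated as in the answer.

```r
library(ggplot2)

set.seed(42)
n <- 6
Q <- sample(18:30, n, replace = TRUE)
dat <- data.frame(date = seq.Date(as.Date("2020-12-26"), by = "day", length.out = n),
                  quantity = Q,
                  price = 100 - 2 * Q + rnorm(n))

# cor.test() returns both the Pearson estimate and its p-value
ct <- cor.test(dat$quantity, dat$price, method = "pearson")
txt <- sprintf("r = %.2f, p = %.3g", unname(ct$estimate), ct$p.value)

p <- ggplot(dat, aes(x = date)) +
  ggtitle("Oil: Daily Quantity and Price") +
  geom_line(aes(y = quantity, colour = "Quantity (thousands of barrels)")) +
  geom_line(aes(y = price, colour = "Price")) +
  # top-left corner of the data range rather than a fixed y value
  annotate("text", x = min(dat$date), y = max(dat$quantity, dat$price),
           label = txt, hjust = 0)
p
```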
I have the following data:
unigrams Freq
1 the 236133
2 to 154296
3 and 128165
4 a 127434
5 i 124599
6 of 103380
7 in 81985
8 you 69504
9 is 65243
10 for 62425
11 it 60298
12 that 58605
13 on 45935
14 my 45424
15 with 38270
16 this 34799
17 was 33009
18 be 32725
19 have 31728
20 at 30255
and this set of data:
bigrams Freq
1 of the 20707
2 in the 19443
3 for the 11090
4 to the 10939
5 on the 10280
6 to be 9555
7 at the 7184
8 i have 6408
9 and the 6387
10 i was 6143
11 is a 6114
12 and i 5993
13 i am 5843
14 in a 5770
15 it was 5644
16 for a 5343
17 if you 5326
18 it is 5196
19 with the 5092
20 have a 4936
I would like to place two qplots together side-by-side, ncol = 2. I tried the gridExtra library, but it is generating errors that I can't seem to figure out how to correct. Any ideas on how to do this, please?
library(gridExtra)
# The 20 most frequent unigrams in the dataset
ugrams <- as.data.frame(unigrams)
graph.data <- ugrams[order(ugrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
p1 <- qplot(unigrams,Freq, data=graph.data,fill=unigrams,geom=c("histogram"))
# The 20 most frequent bigrams in the dataset
bgrams <- as.data.frame(bigrams)
graph.data <- bgrams[order(bgrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
p2 <- qplot(bigrams,Freq, data=graph.data,fill=bigrams,geom=c("histogram"))
grid.arrange(p1,p2,ncol=2)
This is the error that is generated:
<error/rlang_error>
stat_bin() can only have an x or y aesthetic.
Backtrace:
1. (function (x, ...) ...
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2:::by_layer(function(l, d) l$compute_statistic(d, layout))
6. ggplot2:::f(l = layers[[i]], d = data[[i]])
7. l$compute_statistic(d, layout)
8. ggplot2:::f(..., self = self)
9. self$stat$setup_params(data, self$stat_params)
10. ggplot2:::f(...)
I would like the graphs to resemble my quadgram plot, which was produced by the following code:
# The 20 most frequent quadgrams in the dataset
qgrams <- as.data.frame(quadgrams)
graph.data <- qgrams[order(qgrams$Freq, decreasing = T), ]
graph.data <- graph.data[1:20, ]
ggplot(data=graph.data, aes(x=quadgrams, y=Freq, fill=quadgrams)) + geom_bar(stat="identity") +
theme(axis.text.x = element_text(angle = 40, hjust = 1))
Is that possible?
Edited for your shift from histograms to bar plots. Assuming that graph.data is actually your ugrams dataset, a single plot works once geom_bar(stat = "identity") replaces the histogram geom, as in your quadgram code.
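Here is a minimal sketch of that single plot. It uses the first five rows of the question's unigram table and mirrors the quadgram code. Fixing the bar order with factor() is an addition so that the bars keep their sorted order on the x-axis.

```r
library(ggplot2)

# First five rows of the unigram table from the question
ugrams <- data.frame(unigrams = c("the", "to", "and", "a", "i"),
                     Freq = c(236133, 154296, 128165, 127434, 124599))

# Sort by frequency and keep that order on the x-axis
graph.data <- ugrams[order(ugrams$Freq, decreasing = TRUE), ]
graph.data$unigrams <- factor(graph.data$unigrams, levels = graph.data$unigrams)

# stat = "identity" uses Freq as the bar height instead of counting rows
p1 <- ggplot(graph.data, aes(x = unigrams, y = Freq, fill = unigrams)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1))
p1
```

With p2 built the same way from the bigram table, the original grid.arrange(p1, p2, ncol = 2) call should work, since the error came from stat_bin() and not from gridExtra.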
Putting them side-by-side can be done with facets:
library(dplyr)
library(ggplot2)
dplyr::bind_rows(
unigrams = select(ugrams, grams = unigrams, Freq),
bigrams = select(bigrams, grams = bigrams, Freq),
.id = "id") %>%
arrange(-Freq) %>%
mutate(
id = factor(id, levels = c("unigrams", "bigrams")),
grams = factor(grams, levels = grams)
) %>%
ggplot(aes(x = grams, y = Freq, fill = grams)) +
facet_wrap(~ id, ncol = 2, scales = "free_x") +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 40, hjust = 1))
(Obviously, these are "too small" to hold all of the legend, but that depends on where you are using it. I wonder if the legend shouldn't be included, since it is somewhat redundant with the x-axis labels.)
The bigram bars on the right are hard to see because they are dwarfed by the unigrams on the left. You can alleviate that by freeing both the "x" (already free) and "y" axes with scales = "free", although this does bias the plot (it might be natural to compare the vertical levels of the panel on the left with those on the right).
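Here is a sketch of the scales = "free" version. It uses base R instead of dplyr for the data preparation and the first five rows of each table from the question. It also drops the legend (legend.position = "none"), an optional choice prompted by the remark above that the legend is redundant with the x-axis labels.

```r
library(ggplot2)

# First five rows of each table from the question
ugrams <- data.frame(grams = c("the", "to", "and", "a", "i"),
                     Freq = c(236133, 154296, 128165, 127434, 124599))
bgrams <- data.frame(grams = c("of the", "in the", "for the", "to the", "on the"),
                     Freq = c(20707, 19443, 11090, 10939, 10280))

# Stack both tables with an id column naming the source table
both <- rbind(data.frame(id = "unigrams", ugrams),
              data.frame(id = "bigrams", bgrams))
both <- both[order(-both$Freq), ]
both$id <- factor(both$id, levels = c("unigrams", "bigrams"))
both$grams <- factor(both$grams, levels = both$grams)

# scales = "free" frees the y-axis as well as the x-axis in each panel
p <- ggplot(both, aes(x = grams, y = Freq, fill = grams)) +
  facet_wrap(~ id, ncol = 2, scales = "free") +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 40, hjust = 1),
        legend.position = "none")
p
```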
I have around 20 variables which are coming from 4 different sources. I want to visualize for each variable how the data across sources varies using ggplot.
I was thinking a line chart would be a good option. My x-axis could be each response, and 4 lines for the sources would show me how the data change across these 4 sources. I can use region as a split variable to visualize by region.
My data looks like something below (I have provided only 2 variables for simplicity):
library(data.table)
set.seed(1200)
ID <- seq(1001,1100)
region <- sample(1:10,100,replace = T)
Var1_source1 <- sample(1:100,100,replace = T)
Var1_source2 <- sample(1:100,100,replace = T)
Var1_source3 <- sample(1:100,100,replace = T)
Var1_source4 <- sample(1:100,100,replace = T)
Var2_source1 <- sample(1:100,100,replace = T)
Var2_source2 <- sample(1:100,100,replace = T)
Var2_source3 <- sample(1:100,100,replace = T)
Var2_source4 <- sample(1:100,100,replace = T)
df1 <- as.data.table(data.frame(ID,
region,
Var1_source1,
Var1_source2,
Var1_source3,
Var1_source4,
Var2_source1,
Var2_source2,
Var2_source3,
Var2_source4))
I feel this is a unique requirement, as I do not have anything specific to plot on my x-axis.
I am not entirely sure what you are hoping the plot will look like from your description, but the first step for any ggplot is getting the data into a long format.
library(tidyverse)
df2 <- gather(df1, group, value, - c(ID, region)) %>%
separate(group, c("Var", "Source"))
head(df2)
ID region Var Source value
1 1001 2 Var1 source1 92
2 1002 4 Var1 source1 44
3 1003 5 Var1 source1 15
4 1004 6 Var1 source1 42
5 1005 5 Var1 source1 39
6 1006 6 Var1 source1 48
We now have a column which we can use within the ggplot. I am not entirely sure what you want plotting but this is an example:
ggplot(df2, aes(x = region, y = value, colour = Source)) +
stat_summary(fun = mean, geom = "line")
Or we can use a facet to split between the two variables:
ggplot(df2, aes(x = region, y = value, colour = Source)) +
stat_summary(fun = mean, geom = "line") +
facet_grid(Var~.)
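With tidyr 1.0 or later, the gather() plus separate() steps can be combined: pivot_longer() splits the column names on "_" as it reshapes. The sketch below rebuilds df1 as a plain data frame, using the question's seed and the same sequence of sample() calls written as a loop. It uses fun = mean, the name that replaced fun.y in ggplot2 3.3.0.

```r
library(ggplot2)
library(tidyr)

set.seed(1200)
# Var1_source1 ... Var1_source4, Var2_source1 ... Var2_source4
cols <- paste(rep(c("Var1", "Var2"), each = 4), paste0("source", 1:4), sep = "_")
df1 <- data.frame(ID = seq(1001, 1100),
                  region = sample(1:10, 100, replace = TRUE))
for (nm in cols) df1[[nm]] <- sample(1:100, 100, replace = TRUE)

# Reshape and split "Var1_source1" into Var = "Var1", Source = "source1" in one step
df2 <- pivot_longer(df1, -c(ID, region),
                    names_to = c("Var", "Source"), names_sep = "_",
                    values_to = "value")

# Mean value per region for each source, one panel per variable
p <- ggplot(df2, aes(x = region, y = value, colour = Source)) +
  stat_summary(fun = mean, geom = "line") +
  facet_grid(Var ~ .)
p
```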
This question already has answers here:
ggplot bar plot with facet-dependent order of categories
In the df below, I want to reorder the bars from highest to lowest in each facet.
I tried
df <- df %>% tidyr::gather("var", "value", 2:4)
ggplot(df, aes (x = reorder(id, -value), y = value, fill = id))+
geom_bar(stat="identity")+facet_wrap(~var, ncol =3)
It did not order the bars from highest to lowest in each facet.
I figured out another way to get what I want. I had to plot each variable at a time, then combine all plots using grid.arrange()
# I got this function from @eipi10's answer
#http://stackoverflow.com/questions/38637261/perfectly-align-several-plots/38640937#38640937
#Function to extract legend
# https://github.com/hadley/ggplot2/wiki/Share-a-legend-between-two-ggplot2-graphs
g_legend<-function(a.gplot) {
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)
}
p1 <- ggplot(df[df$var== "A", ], aes (x = reorder(id, -value), y = value, fill = id))+
geom_bar(stat="identity") + facet_wrap(~var, ncol =3)
fin_legend <- g_legend(p1)
p1 <- p1 + guides(fill= F)
p2 <- ggplot(df[df$var== "B", ], aes (x = reorder(id, -value), y = value, fill = id))+
geom_bar(stat="identity") + facet_wrap(~var, ncol =3)+guides(fill=FALSE)
p3 <- ggplot(df[df$var== "C", ], aes (x = reorder(id, -value), y = value, fill = id))+
geom_bar(stat="identity") + facet_wrap(~var, ncol =3)+guides(fill=FALSE)
grid.arrange(p1, p2, p3, fin_legend, ncol =4, widths = c(1.5, 1.5, 1.5, 0.5))
The result is what I want
I wonder if there is a straightforward way to order the bars from highest to lowest in all facets without having to plot each variable separately and then combine the plots. Any suggestions will be much appreciated.
DATA
df <- read.table(text = c("
id A B C
site1 10 15 20
site2 20 10 30
site3 30 20 25
site4 40 35 40
site5 50 30 35"), header = T)
The approach below uses a specially prepared variable for the x-axis with facet_wrap() but uses the labels parameter to scale_x_discrete() to display the correct x-axis labels:
Prepare data
I'm more fluent in data.table, so it is used here. Feel free to use whatever package you prefer for data manipulation.
Edit: Removed the second dummy variable; only ord is required.
library(data.table)
# reshape from wide to long
molten <- melt(setDT(df), id.vars = "id")
# create dummy var which reflects order when sorted alphabetically
molten[, ord := sprintf("%02i", frank(molten, variable, -value, ties.method = "first"))]
molten
# id variable value ord
# 1: site1 A 10 05
# 2: site2 A 20 04
# 3: site3 A 30 03
# 4: site4 A 40 02
# 5: site5 A 50 01
# 6: site1 B 15 09
# 7: site2 B 10 10
# 8: site3 B 20 08
# 9: site4 B 35 06
#10: site5 B 30 07
#11: site1 C 20 15
#12: site2 C 30 13
#13: site3 C 25 14
#14: site4 C 40 11
#15: site5 C 35 12
Create plot
library(ggplot2)
# `ord` is plotted on x-axis instead of `id`
ggplot(molten, aes(x = ord, y = value, fill = id)) +
# geom_col() is replacement for geom_bar(stat = "identity")
geom_col() +
# independent x-axis scale in each facet,
# drop absent factor levels (not the case here)
facet_wrap(~ variable, scales = "free_x", drop = TRUE) +
# use named character vector to replace x-axis labels
scale_x_discrete(labels = molten[, setNames(as.character(id), ord)]) +
# replace x-axis title
xlab("id")
Data
df <- read.table(text = "
id A B C
site1 10 15 20
site2 20 10 30
site3 30 20 25
site4 40 35 40
site5 50 30 35", header = T)
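The same idea can also be sketched without data.table: put a unique per-facet key on the x-axis, then strip the labels back to the id. This version uses base R's stack() for the reshape. The key format "<facet>___<id>" and the three-underscore separator are arbitrary choices for this sketch. tidytext's reorder_within() and scale_x_reordered() package up the same trick.

```r
library(ggplot2)

df <- read.table(text = "
id A B C
site1 10 15 20
site2 20 10 30
site3 30 20 25
site4 40 35 40
site5 50 30 35", header = TRUE)

# Wide -> long with base R: stack() returns columns `values` and `ind`
long <- data.frame(id = rep(df$id, 3), stack(df[c("A", "B", "C")]))
names(long) <- c("id", "value", "var")

# Key that is unique within each facet
long$key <- paste(long$var, long$id, sep = "___")
# Levels ordered by facet, then by descending value
ord <- order(long$var, -long$value)
long$key <- factor(long$key, levels = long$key[ord])

p <- ggplot(long, aes(x = key, y = value, fill = id)) +
  geom_col() +
  facet_wrap(~ var, scales = "free_x") +
  # strip the "<facet>___" prefix so only the site id is displayed
  scale_x_discrete(labels = function(x) sub("^.*___", "", x)) +
  xlab("id")
p
```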
If you're willing to lose the x-axis labels, you can do this by using the actual y values as the x aesthetic, then dropping unused factor levels in each facet:
ggplot(df, aes (x = factor(-value), y = value, fill = id))+
geom_bar(stat="identity", na.rm = TRUE)+
facet_wrap(~var, ncol =3, scales = "free_x", drop = TRUE) +
theme(
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
The loss of x-axis labels is probably not too bad here as you still have the colours to go on (and the x-axis is confusing anyway since it's not consistent across facets).
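One caveat of using the values themselves as the x aesthetic is that ids which tie within a facet collapse onto a single x position. Below is a base R sketch with a hypothetical tie (the numbers are made up for illustration) and the rank-based fix, which is essentially what the ord column in the data.table answer does:

```r
# Hypothetical values for one facet, with a tie: two sites both at 20
value <- c(15, 10, 20, 35, 20)

# factor(-value) creates one level per distinct value, so the tied bars share
# one x slot (and geom_bar() would stack them there)
nlevels(factor(-value))  # 4

# A within-facet rank with ties broken by position gives every bar its own slot
rank(-value, ties.method = "first")  # 4 5 2 1 3
```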
I am new to R and need help plotting my data.
My data looks like this
ID run1Seek run2Seek run3Seek
1 12 23 28
2 10 27 0
3 23 19 0
4 22 24 0
5 21 26 0
6 11 26 0
I need to plot the ID values on the x-axis and the run1Seek, run2Seek, and run3Seek values on the y-axis.
Try this:
library(ggplot2)
# Random data
mat <- matrix(sample(1:100, size = 1000, replace = T), ncol = 2)
colnames(mat) <- c("Run1Seek", "Run2Seek")
# Make data frame
ds <- data.frame(ID = 1:500, mat)
# Melt to long format
ds <- reshape2::melt(ds, "ID")
# Look at data
head(ds)
# Plot
ggplot(ds, aes(x = ID, y = value, fill = variable)) + geom_bar(stat = "identity")
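Applied to the six rows shown in the question, the same reshape-then-plot pattern could look like the sketch below. It uses base R's stack() instead of reshape2::melt(). It also uses position = "dodge", which is an assumption about the desired look: the runs sit side by side for each ID instead of being stacked as in the answer above.

```r
library(ggplot2)

# The six rows shown in the question
runs <- data.frame(ID = 1:6,
                   run1Seek = c(12, 10, 23, 22, 21, 11),
                   run2Seek = c(23, 27, 19, 24, 26, 26),
                   run3Seek = c(28, 0, 0, 0, 0, 0))

# Long format: one row per (ID, run) pair
long <- data.frame(ID = rep(runs$ID, 3),
                   stack(runs[c("run1Seek", "run2Seek", "run3Seek")]))
names(long) <- c("ID", "value", "variable")

# position = "dodge" places the runs side by side instead of stacking them
p <- ggplot(long, aes(x = factor(ID), y = value, fill = variable)) +
  geom_col(position = "dodge") +
  xlab("ID")
p
```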