My data is:
$ Age : int 20 25 30 35 40 45 50 55 60
$ Test.Positive : int 1 0 1 1 2 2 0 1 0
$ Test.Negative : int 0 1 3 2 4 1 3 1 1
I can create an individual dot plot for each group as follows:
library(ggplot2)
YM_R = rep(Age, YM)
df1 <- as.data.frame(YM_R)
YP_R = rep(Age, YP)
df2 <- as.data.frame(YP_R)
# refer to the column by name inside aes(); df1$YM_R there is fragile
gm <- ggplot(df1) +
geom_dotplot(aes(x = YM_R, y = "Y-"), color = 'green', fill = 'green', binwidth = 2)
gp <- ggplot(df2) +
geom_dotplot(aes(x = YP_R, y = "Y+"), color = 'red', fill = 'red', binwidth = 2)
But I don't know how to combine them. A sample of what I want is in the attached image. Any pointers are appreciated.
Instead of thinking about "combining" plots, I suggest you look into "faceting" them.
Using an example from ?geom_dotplot:
library(ggplot2)
ggplot(mtcars, aes(mpg)) +
geom_dotplot(method="histodot", binwidth=1.5)
By adding a single call to facet_grid (there's facet_wrap as well), we can break them out:
ggplot(mtcars, aes(mpg)) +
geom_dotplot(method="histodot", binwidth=1.5) +
facet_grid(cyl ~ .)
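Applied to the data in the question, a minimal sketch looks like the following. It assumes that "Y+" corresponds to Test.Positive and "Y-" to Test.Negative, and the names `ages`, `long` and `result` are illustrative. Each count is expanded into one row per observation, both groups are stacked into a single long data frame, and the `result` column then drives the facets and the colours:

```r
library(ggplot2)

# Data from the question
ages <- data.frame(
  Age           = c(20L, 25L, 30L, 35L, 40L, 45L, 50L, 55L, 60L),
  Test.Positive = c(1L, 0L, 1L, 1L, 2L, 2L, 0L, 1L, 0L),
  Test.Negative = c(0L, 1L, 3L, 2L, 4L, 1L, 3L, 1L, 1L)
)

# One row per test result: repeat each Age by its count, then stack both groups
long <- data.frame(
  Age    = c(rep(ages$Age, ages$Test.Positive), rep(ages$Age, ages$Test.Negative)),
  result = rep(c("Y+", "Y-"),
               times = c(sum(ages$Test.Positive), sum(ages$Test.Negative)))
)

p <- ggplot(long, aes(x = Age, fill = result, colour = result)) +
  geom_dotplot(binwidth = 2) +
  # keep the colours used in the question: Y+ red, Y- green
  scale_fill_manual(values = c("Y+" = "red", "Y-" = "green")) +
  scale_colour_manual(values = c("Y+" = "red", "Y-" = "green")) +
  # one row of panels per group, sharing the same x axis
  facet_grid(result ~ .)
print(p)
```

Because both groups live in one data frame, they share one x scale, which is what makes the panels line up.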
I have an existing ggplot with geom_col and some observations from a data frame. The data frame looks something like this:
over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0
The geom_col represents the runs data column and now I want to represent the wickets column using geom_point in a way that the number of points represents the wickets.
I want my graph to look something like this:
As far as I know, we'll need to transform your data to have one row per point. This method requires dplyr 1.0.0 or later, which allows summarize() to return more than one row per group. (From dplyr 1.1.0, returning multiple rows from summarize() is deprecated with a warning, and reframe() is the recommended replacement.)
You can adjust the vertical spacing of the wickets by multiplying seq(wickets), though with your sample data a spacing of 1 unit looks pretty good to me.
library(dplyr)
library(ggplot2)
# one row per wicket, stacked just above the bar: runs + 1, runs + 2, ...
wicket_data = dd %>%
filter(wickets > 0) %>%
group_by(over) %>%
summarize(wicket_y = runs + seq(wickets))
ggplot(dd, aes(x = over)) +
geom_col(aes(y = runs), fill = "#A6C6FF") +
geom_point(data = wicket_data, aes(y = wicket_y), color = "firebrick4") +
theme_bw()
Using this sample data:
dd = read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)
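If upgrading dplyr is not an option, the same one-row-per-wicket table can be built in base R. This is a minimal sketch reusing the same sample data; the name `wicket_data_base` is illustrative. `rep()` repeats each over once per wicket (overs with 0 wickets drop out automatically), and `sequence()` numbers the wickets within each over:

```r
library(ggplot2)

dd <- read.table(text = "over runs wickets
1 12 0
2 8 0
3 9 2
4 3 1
5 6 0", header = TRUE)

wicket_data_base <- data.frame(
  # one entry per wicket; overs with 0 wickets contribute no rows
  over     = rep(dd$over, dd$wickets),
  # stack the points just above each bar: runs + 1, runs + 2, ...
  wicket_y = rep(dd$runs, dd$wickets) + sequence(dd$wickets)
)

p <- ggplot(dd, aes(x = over)) +
  geom_col(aes(y = runs), fill = "#A6C6FF") +
  geom_point(data = wicket_data_base, aes(y = wicket_y), color = "firebrick4") +
  theme_bw()
print(p)
```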
I am using facet_grid to generate neat presentations of my data.
Basically, my data frame has four columns: idx, density, marker, case.
There are 5 cases, each case has 5 markers, each marker has multiple idx values, and each idx has one density.
The data is uploaded here:
data frame link
I tried to use facet_grid to achieve my goal; however, I got a really messed-up graph:
The x- and y-axes are messed up. The code is:
library(ggplot2)
library(cowplot)
plot.density <-
ggplot(df_densityWindow, aes(x = idx, y = density)) +
geom_col() +
facet_grid(marker ~ case, scales = 'free') +
background_grid(major = 'y', minor = "none") + # add thin horizontal lines
panel_border() # and a border around each panel
plot(plot.density)
EDIT:
I have re-uploaded the file; it should work now:
download file here
All 4 columns have been read as factors. This stems from however you loaded the data into R. Take a look at:
df <- readRDS('df.rds')
str(df)
'data.frame': 52565 obs. of 4 variables:
$ idx : Factor w/ 4712 levels "1","10","100",..: 1 1112 2223 3334 3546 3657 3768 3879 3990 2 ...
$ density: Factor w/ 250 levels "1022.22222222222",..: 205 205 204 203 202 201 199 198 197 197 ...
$ marker : Factor w/ 5 levels "CD3","CD4","CD8",..: 1 1 1 1 1 1 1 1 1 1 ...
$ case : Factor w/ 5 levels "Case_1","Case_2",..: 1 1 1 1 1 1 1 1 1 1 ...
Good news is that you can fix it with:
df$idx <- as.integer(as.character(df$idx))
df$density <- as.numeric(as.character(df$density))
Still, you should look into how you are loading the data, to avoid this in the future.
As another trick, try the code above without the as.character() calls and compare the results.
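The difference that trick reveals is the classic factor pitfall: `as.integer()` on a factor returns the internal level codes, not the numbers that are printed. A small sketch with a made-up factor `f`:

```r
# A factor whose labels look like numbers; its levels are sorted as text: "1", "10", "2"
f <- factor(c("10", "2", "1"))

codes  <- as.integer(f)                # internal level codes, not the labels
values <- as.integer(as.character(f))  # the numbers the labels actually show
# Equivalent idiom that converts each level only once (useful for long vectors)
values2 <- as.integer(levels(f))[f]

print(codes)   # 2 3 1
print(values)  # 10 2 1
```

With 4712 idx levels, as in the str() output above, skipping as.character() would silently plot the level codes instead of the real idx values.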
As already explained by MrGumble, the idx and density variables are of type factor but should be plotted as numeric.
The type.convert() function does the data conversion in one go:
library(ggplot2)
library(cowplot)
ggplot(type.convert(df_densityWindow), aes(x = idx, y = density)) +
geom_col() +
facet_grid(marker ~ case, scales = 'free') +
background_grid(major = 'y', minor = "none") + # add thin horizontal lines
panel_border() # and a border around each panel
I have a dataset as CSV with three columns:
timestamp (e.g. 2018/12/15)
keyword (e.g. "hello")
count (e.g. 7)
I want one plot in which all points of the same keyword are connected by a line, with timestamp on the x-axis and count on the y-axis. Each keyword should have its own line color, and each line should be labeled with its keyword.
The CSV has only ~30,000 rows and R runs on a dedicated machine, so performance can be ignored.
I tried various approaches with matplot and ggplot from this forum, but couldn't get them to work with my own data.
What is the easiest solution to do this in R?
Thanks!
EDIT:
I tried customizing Roman's code as follows:
csvdata <- read.csv("c:/mydataset.csv", header=TRUE, sep=",")
time <- csvdata$timestamp
count <- csvdata$count
keyword <- csvdata$keyword
time <- rep(time)
xy <- data.frame(time, word = c(keyword), count, lambda = 5)
library(ggplot2)
ggplot(xy, aes(x = time, y = count, color = keyword)) +
theme_bw() +
scale_color_brewer(palette = "Set1") + # choose appropriate palette
geom_line()
This creates the correct canvas, but no points or lines appear in it...
DATA:
head(csvdata)
keyword count timestamp
1 non-distinct-word 3 2018/08/09
2 non-distinct-word 2 2018/08/10
3 non-distinct-word 3 2018/08/11
str(csvdata)
'data.frame': 121 obs. of 3 variables:
$ keyword : Factor w/ 10 levels "non-distinct-word",..: 5 5 5 5 5 5 5 5 5 5 ...
$ count : int 3 2 3 1 6 6 2 3 2 1 ...
$ timestamp: Factor w/ 103 levels "2018/08/09","2018/08/10",..: 1 2 3 4 5 6 7 8 9 10 ...
Something like this?
# Generate some data. This is the part the poster of the question normally provides.
today <- as.Date(Sys.time())
time <- rep(seq.Date(from = today, to = today + 30, by = "day"), each = 2)
xy <- data.frame(time, word = c("hello", "world"), count = rpois(length(time), lambda = 5))
library(ggplot2)
ggplot(xy, aes(x = time, y = count, color = word)) +
theme_bw() +
scale_color_brewer(palette = "Set1") + # choose appropriate palette
geom_line()
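For the CSV in the question, the likely reason nothing is drawn is that `timestamp` was read as a factor. ggplot2 then treats the x axis as discrete and puts each date in its own group, so every group has a single point and geom_line() has nothing to connect. Mapping the colour to the `keyword` column of the data frame is also more robust than mapping it to a free-standing vector. A minimal sketch, assuming dates in the `%Y/%m/%d` form shown in `head(csvdata)`. The small `csvdata` below, including its second keyword `other-word` and its counts, is a hypothetical stand-in for the real file, and `last_points` is an illustrative name:

```r
library(ggplot2)

# Hypothetical stand-in for read.csv("c:/mydataset.csv"); timestamp is a
# factor here, as in the str(csvdata) output in the question.
csvdata <- data.frame(
  keyword   = factor(rep(c("non-distinct-word", "other-word"), each = 3)),
  count     = c(3L, 2L, 3L, 5L, 1L, 4L),
  timestamp = factor(rep(c("2018/08/09", "2018/08/10", "2018/08/11"), times = 2))
)

# Parse the factor labels into Date values so the x axis becomes continuous
csvdata$date <- as.Date(as.character(csvdata$timestamp), format = "%Y/%m/%d")

# Row index of the latest date for each keyword, used to place the labels
last_idx <- unlist(lapply(split(seq_len(nrow(csvdata)), csvdata$keyword),
                          function(i) i[which.max(as.numeric(csvdata$date[i]))]))
last_points <- csvdata[last_idx, ]

p <- ggplot(csvdata, aes(x = date, y = count, color = keyword)) +
  geom_line() +
  geom_point() +
  # label each line with its keyword at its right-hand end
  geom_text(data = last_points, aes(label = keyword), hjust = -0.1, show.legend = FALSE) +
  theme_bw()
print(p)
```

Labels near the right edge can be clipped; widening the x range, for example with `scale_x_date(expand = expansion(mult = c(0.05, 0.25)))`, leaves room for them.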
This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 2 years ago.
I was trying (unsuccessfully) to show a legend in my ggplot2 graph, which combines several point and line layers. My data frame df and code are as follows:
Individuals Mod.2 Mod.1 Mod.3
1 2 -0.013473145 0.010859793 -0.08914021
2 3 -0.011109863 0.009503278 -0.09049672
3 4 -0.006465788 0.011304668 -0.08869533
4 5 0.010536718 0.009110458 -0.09088954
5 6 0.015501212 0.005929766 -0.09407023
6 7 0.014565584 0.005530390 -0.09446961
7 8 -0.009712516 0.012234843 -0.08776516
8 9 -0.011282278 0.006569570 -0.09343043
9 10 -0.011330579 0.003505439 -0.09649456
str(df)
'data.frame': 9 obs. of 4 variables:
$ Individuals : num 2 3 4 5 6 7 8 9 10
$ Mod.2 : num -0.01347 -0.01111 -0.00647 0.01054 0.0155 ...
$ Mod.1 : num 0.01086 0.0095 0.0113 0.00911 0.00593 ...
$ Mod.3 : num -0.0891 -0.0905 -0.0887 -0.0909 -0.0941 ...
ggplot(df, aes(x = Individuals)) +
geom_point(aes(y = Mod.2), colour="red") + geom_line(aes(y = Mod.2), colour="red") +
geom_point(aes(y = Mod.1), colour="lightgreen") + geom_line(aes(y = Mod.1), colour="lightgreen") +
geom_point(aes(y = Mod.3), colour="darkgreen") + geom_line(aes(y = Mod.3), colour="darkgreen") +
labs(title = "Modules", x = "Number of individuals", y = "Mode")
I looked at the following Stack Overflow threads, as well as Google searches:
Merging ggplot2 legend
ggplot2 legend not showing
`ggplot2` legend not showing label for added series
ggplot2 legend for geom_area/geom_ribbon not showing
ggplot and R: Two variables over time
ggplot legend not showing up in lift chart
Why ggplot2 legend not show in the graph
ggplot legend not showing up in lift chart.
This one was created 4 days ago
This made me realize that making legends appear is a recurring issue, despite the fact that legends usually appear automatically.
My first question is: what causes a legend not to appear when using ggplot? The second is how to fix each of these causes. One cause appears to be related to multiple layers and the use of aes(), but I suspect there are others.
colour = XYZ should be inside aes(), not outside:
geom_point(aes(x, y, colour = XYZ))   #------> legend (XYZ is a variable)
geom_point(aes(x, y), colour = "red") #------> no legend (fixed colour)
Hope it helps; it took me a long time to figure out.
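To keep one layer per column, as in the question, and still get a legend, a common idiom is to map each layer's colour to a constant label inside aes() and then choose the actual colours with scale_colour_manual(). A minimal sketch using the first four rows of the question's df; the label strings are simply the column names, and the legend title "Module" is an illustrative choice:

```r
library(ggplot2)

# First four rows of the data frame shown in the question
df <- data.frame(
  Individuals = c(2, 3, 4, 5),
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788, 0.010536718),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668, 0.009110458),
  Mod.3 = c(-0.08914021, -0.09049672, -0.08869533, -0.09088954)
)

p <- ggplot(df, aes(x = Individuals)) +
  # colour is mapped (inside aes) to a fixed label, so ggplot2 builds a legend
  geom_point(aes(y = Mod.2, colour = "Mod.2")) + geom_line(aes(y = Mod.2, colour = "Mod.2")) +
  geom_point(aes(y = Mod.1, colour = "Mod.1")) + geom_line(aes(y = Mod.1, colour = "Mod.1")) +
  geom_point(aes(y = Mod.3, colour = "Mod.3")) + geom_line(aes(y = Mod.3, colour = "Mod.3")) +
  # translate each label into the colour used in the question
  scale_colour_manual(values = c(Mod.2 = "red", Mod.1 = "lightgreen", Mod.3 = "darkgreen")) +
  labs(title = "Modules", x = "Number of individuals", y = "Mode", colour = "Module")
print(p)
```

Reshaping to long format, as in the next answer, scales better once there are more than a handful of series.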
You are going about the setting of colour in completely the wrong way. You have set colour to a constant character value in multiple layers, rather than mapping it to the value of a variable in a single layer.
This is largely because your data is not "tidy" (see the following)
head(df)
x a b c
1 1 -0.71149883 2.0886033 0.3468103
2 2 -0.71122304 -2.0777620 -1.0694651
3 3 -0.27155800 0.7772972 0.6080115
4 4 -0.82038851 -1.9212633 -0.8742432
5 5 -0.71397683 1.5796136 -0.1019847
6 6 -0.02283531 -1.2957267 -0.7817367
Instead, you should reshape your data first:
df <- data.frame(x=1:10, a=rnorm(10), b=rnorm(10), c=rnorm(10))
mdf <- reshape2::melt(df, id.var = "x")
This produces a more suitable format:
head(mdf)
x variable value
1 1 a -0.71149883
2 2 a -0.71122304
3 3 a -0.27155800
4 4 a -0.82038851
5 5 a -0.71397683
6 6 a -0.02283531
This will make it much easier to use with ggplot2 in the intended way, where colour is mapped to the value of a variable:
ggplot(mdf, aes(x = x, y = value, colour = variable)) +
geom_point() +
geom_line()
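reshape2 is retired, and its maintainers recommend tidyr instead; the tidyr counterpart of melt() here is pivot_longer(). A sketch applying it to the first three rows of the question's df; `long_df` and `module` are illustrative names:

```r
library(ggplot2)
library(tidyr)

# First three rows of the data frame from the question (wide format)
df <- data.frame(
  Individuals = c(2, 3, 4),
  Mod.2 = c(-0.013473145, -0.011109863, -0.006465788),
  Mod.1 = c(0.010859793, 0.009503278, 0.011304668),
  Mod.3 = c(-0.08914021, -0.09049672, -0.08869533)
)

# Long format: one row per (Individuals, module) pair
long_df <- pivot_longer(df, cols = -Individuals, names_to = "module", values_to = "value")

# colour is mapped to a variable, so the legend appears automatically
p <- ggplot(long_df, aes(x = Individuals, y = value, colour = module)) +
  geom_point() +
  geom_line() +
  labs(title = "Modules", x = "Number of individuals", y = "Mode")
print(p)
```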
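# build a wide data frame with three random "modules"
ind = 1:10
my.df <- data.frame(ind, sample(-5:5, 10, replace = TRUE),
sample(-5:5, 10, replace = TRUE), sample(-5:5, 10, replace = TRUE))
# stack the three value columns into one long data frame with a label column
df <- data.frame(rep(ind, 3), c(my.df[,2], my.df[,3], my.df[,4]),
c(rep("mod.1", 10), rep("mod.2", 10), rep("mod.3", 10)))
colnames(df) <- c("ind", "value", "mod")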
Your data frame should look something like this:
ind value mod
1 5 mod.1
2 -5 mod.1
3 3 mod.1
4 2 mod.1
5 -2 mod.1
6 5 mod.1
Then all you have to do is:
ggplot(df, aes(x = ind, y = value, shape = mod, color = mod)) +
geom_line() + geom_point()
I had a similar problem with the title. I found a way to show it: you can add a layer using
ggtitle("Name of the title that you want to show")
Example:
library(ggplot2)
ggplot(data=mtcars,
mapping = aes(x=hp,
fill = factor(vs)))+
geom_histogram(bins = 9,
position = 'identity',
alpha = 0.8, show.legend = TRUE)+
labs(title = 'Horse power',
fill = 'Engine (vs)',
x = 'HP',
y = 'count',
subtitle = 'A',
caption = 'B')+
# same effect as labs(title = ...); the later call wins
ggtitle("Horse power")
Hi Stack Overflow community,
I have a dataset:
conc branch length stage factor
1 1000 3 573.5 e14 NRG4
2 1000 7 425.5 e14 NRG4
3608 1000 44 5032.0 P10 NRG4
3609 1000 0 0.0 P10 NRG4
FYI
> str(dframe1)
'data.frame': 3940 obs. of 5 variables:
$ conc : Factor w/ 6 levels "0","1","10","100",..: 6 6 6 6 6 6 6 6 6 6 ...
$ branch: int 3 7 5 0 1 0 0 4 1 1 ...
$ length: num 574 426 204 0 481 ...
$ stage : Factor w/ 8 levels "e14","e16","e18",..: 1 1 1 1 1 1 1 1 1 1 ...
$ factor: Factor w/ 2 levels "","NRG4": 2 2 2 2 2 2 2 2 2 2 ...
I would like to create faceted line graphs, plotting the mean +/- standard error of the mean.
I have tried experimenting and adapting ggplot code from others (here and on the web).
I have successfully used scripts that make bar graphs this way:
library(ggplot2)
errbar.ggplot.facets <- ggplot(dframe1, aes(x = conc, y = length))
### function to calculate the standard error of the mean
se <- function(x) sd(x)/sqrt(length(x))
### function to be applied to each panel/facet
my.fun <- function(x) {
data.frame(ymin = mean(x) - se(x),
ymax = mean(x) + se(x),
y = mean(x))}
g.err.f <- errbar.ggplot.facets +
# fun = replaces the deprecated fun.y = (ggplot2 >= 3.3.0)
# clrs.hcl() is not defined in this snippet
stat_summary(fun = mean, geom = "bar",
fill = clrs.hcl(48)) +
stat_summary(fun.data = my.fun, geom = "linerange") +
facet_wrap(~ stage) +
theme_bw()
print(g.err.f)
Source: http://teachpress.environmentalinformatics-marburg.de/2013/07/creating-publication-quality-graphs-in-r-7/
In fact, I have created faceted line graphs with this script:
ggplot(data=dframe1, aes(x=conc, y = length, group = stage)) +
geom_line() + facet_wrap(~stage)
image: postimg.org/image/ebpdc0sb7
However, that used a transformed dataset containing only the means (with the SEM in another column), and I don't know how to add the SEM to the plot.
Given the complexity (for me) of the bar graph + error line script above, I have not yet been able to combine these into what I need.
In this case, colour is not important.
P.S. I apologise for the long thread (and perhaps the overkill on some details). This is my first online R question, so not sure of correct etiquette. Thank you all in advance for being so helpful!
Darian
In case your data frame has columns for the mean and the SE, you could do something like this:
library("dplyr")
library("ggplot2")
# standard error of the mean (same helper as in the question)
se <- function(x) sd(x) / sqrt(length(x))
# Create a dummy data frame with columns for the mean and SE
df <- mtcars %>%
group_by(gear, cyl) %>%
summarise(mean_mpg = mean(mpg), se_mpg = se(mpg))
ggplot(df, aes(x = gear, y = mean_mpg)) +
geom_bar(stat = "identity") +
geom_errorbar(aes(ymin = mean_mpg - se_mpg, ymax = mean_mpg + se_mpg)) +
facet_wrap(~cyl)
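Since the question asks for line graphs, stat_summary() can also compute the means and SEM bars directly from the raw data, with no pre-computed table. This is a sketch with mtcars standing in for dframe1: gear plays the role of conc, cyl of stage, and mpg of length. mean_se() is ggplot2's built-in helper returning the mean plus or minus one standard error, and the `fun =` argument needs ggplot2 3.3.0 or later (older versions used `fun.y =`):

```r
library(ggplot2)

# mtcars stands in for dframe1: gear ~ conc (x axis), cyl ~ stage (facets), mpg ~ length
p <- ggplot(mtcars, aes(x = factor(gear), y = mpg, group = 1)) +
  # mean at each x position, joined by a line within each panel
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  # mean +/- one standard error of the mean
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  facet_wrap(~ cyl) +
  labs(x = "gear", y = "mean mpg (+/- SEM)") +
  theme_bw()
print(p)
```

For dframe1 the same layers would apply with `aes(x = conc, y = length, group = 1)` and `facet_wrap(~ stage)`. Combinations with a single observation (in mtcars, for example, 4 cylinders with 3 gears) have no defined SEM, so ggplot2 drops those error bars with a warning.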