add custom legend in ggplot2 - r

I'm working with ggplot2 to generate some geom_line plots. I've already generated them from another data frame, which is not important here except that it contains the same id values as the following data frame.
I have this data frame called df:
id X Y total
1 3214 6786 10000
2 4530 5470 10000
3 2567 7433 10000
4 1267 8733 10000
5 2456 7544 10000
6 6532 3468 10000
7 5642 4358 10000
What I want to do is create a custom legend that shows, for a specific id, the percentage of X and Y on the geom_line plot with the same id. So basically, for each geom_line plot (e.g. id=1), draw the percentages for that id in the plot.
I've tried to use geom_text, but the problem is that it prints everything on one line, so I cannot read any of it.
How can this be done?
EDIT
The olddf data frame looks something like this:
id pos X Y Z
1
1.....
1
2
3
4
3 ......
.
.
This is the code that I've tried:
for(i in df$id)
{
test = subset(olddf, id==i)
mdata <- melt(test, id=c("pos","id"))
pl = ggplot() + geom_line(data=mdata, aes(x=pos, y=value, color=variable)) + geom_text(data=df, aes(x=6000, y=0.1, label=(X*total)/100), size=5)
}

The answer (as discussed in chat) is quite straightforward:
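Change geom_text(data = df, ...) to geom_text(data = df[df$id == i, ], ...), so that each plot draws only the label for its own id. With the full df, the labels of all rows are drawn at the same position (x = 6000, y = 0.1), which is why the text is unreadable.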
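The fix can be sketched end to end as follows. This is only an illustration under assumptions: the real olddf is not shown in the question, so the olddf below is a made-up stand-in, and the label assumes that "percentage" means X / total * 100 (and likewise for Y). The plots are collected in a list so that each iteration does not overwrite the previous one.

```r
library(ggplot2)
library(reshape2)

# Summary table from the question (first three rows only).
df <- data.frame(id = 1:3,
                 X = c(3214, 4530, 2567),
                 Y = c(6786, 5470, 7433),
                 total = 10000)

# Hypothetical stand-in for olddf; the real values are not given in the question.
set.seed(1)
olddf <- data.frame(id = rep(1:3, each = 5),
                    pos = rep(seq(1000, 9000, by = 2000), times = 3),
                    X = runif(15), Y = runif(15), Z = runif(15))

plots <- list()
for (i in df$id) {
  mdata <- melt(subset(olddf, id == i), id.vars = c("pos", "id"))
  lab <- df[df$id == i, ]                 # one-row subset for the current id
  lab$txt <- sprintf("X: %.1f%%, Y: %.1f%%",
                     100 * lab$X / lab$total, 100 * lab$Y / lab$total)
  plots[[as.character(i)]] <- ggplot() +
    geom_line(data = mdata, aes(x = pos, y = value, color = variable)) +
    geom_text(data = lab, aes(x = 6000, y = 0.1, label = txt), size = 5)
}
```

Printing plots[["1"]] then shows the lines for id 1 with a single label for that id.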

Related

Overlapped data with messed up axes using facet_grid in R

I am using facet grid to generate neat presentations of my data.
Basically, my data frame has four columns:
idx, density, marker, case.
There are 5 cases; each case has 5 markers, each marker has multiple idx values, and each idx has one density value.
The data is uploaded here:
data frame link
I tried to use facet_grid to achieve my goal; however, I obtained a really messed up graph:
The x-axis and y-axis are messed up. The code is:
library(ggplot2)
library(cowplot)
plot.density <-
ggplot(df_densityWindow, aes(x = idx, y = density)) +
geom_col() +
facet_grid(marker ~ case, scales = 'free') +
background_grid(major = 'y', minor = "none") + # add thin horizontal lines
panel_border() # and a border around each panel
plot(plot.density)
EDIT:
I re-uploaded the file; it should work now:
download file here
All 4 columns have been read as factors. This is caused by how you loaded the data into R. Take a look at:
df <- readRDS('df.rds')
str(df)
'data.frame': 52565 obs. of 4 variables:
$ idx : Factor w/ 4712 levels "1","10","100",..: 1 1112 2223 3334 3546 3657 3768 3879 3990 2 ...
$ density: Factor w/ 250 levels "1022.22222222222",..: 205 205 204 203 202 201 199 198 197 197 ...
$ marker : Factor w/ 5 levels "CD3","CD4","CD8",..: 1 1 1 1 1 1 1 1 1 1 ...
$ case : Factor w/ 5 levels "Case_1","Case_2",..: 1 1 1 1 1 1 1 1 1 1 ...
Good news is that you can fix it with:
df$idx <- as.integer(as.character(df$idx))
df$density <- as.numeric(as.character(df$density))
You should still look into how you are loading the data, to avoid this problem in the future.
As another trick, try the above code without using the as.character calls, and compare the differences.
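The difference can be seen on a small factor. The vector below is a toy example, not taken from the uploaded data; it only illustrates why the as.character() step matters.

```r
# Toy factor whose levels are numbers stored as text.
f <- factor(c("10", "2", "100"))

levels(f)                      # levels are sorted as text: "10" "100" "2"
as.integer(f)                  # internal level codes, not the values
as.integer(as.character(f))    # the actual values: 10, 2, 100
```

Without as.character(), the conversion returns the position of each value in the sorted level list, which is what scrambles the axes.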
As already explained by MrGumble, the idx and density variables are of type factor but should be plotted as numeric.
The type.convert() function does the data conversion in one go:
library(ggplot2)
library(cowplot)
ggplot(type.convert(df_densityWindow, as.is = TRUE), aes(x = idx, y = density)) + # as.is = TRUE avoids the warning issued by R >= 4.0
geom_col() +
facet_grid(marker ~ case, scales = 'free') +
background_grid(major = 'y', minor = "none") + # add thin horizontal lines
panel_border() # and a border around each panel
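What type.convert() does to factor columns can be checked on a small data frame. The toy data below is made up for illustration and only mimics the column types reported by str() above.

```r
# Toy data frame in which every column was read as a factor.
toy <- data.frame(idx = factor(c("1", "10", "100")),
                  density = factor(c("1022.2", "3.5", "40")),
                  marker = factor(c("CD3", "CD4", "CD8")))

# as.is = TRUE keeps text columns as character instead of turning them back into factors.
conv <- type.convert(toy, as.is = TRUE)
str(conv)
```

Here idx should come back as integer, density as numeric, and marker as character, which is enough for facet_grid() to work as intended.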

Trying to make a bar chart with each categorical column as a different color

I found a cool Wes Anderson palette package, but I am failing to actually use it. The variable I am looking at (Q1) has options 1 and 2. There is also an NA in the set which is getting plotted, and I would like to remove it as well.
library(readxl)
library(tidyverse)
library(wesanderson)
RA_Survey <- read_excel("file extension")
ggplot(data = RA_Survey, mapping = aes(x = Q1)) +
geom_bar() + scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest"))
The plot I'm getting is working but without the color. Any ideas?
There are several issues which need to be addressed.
Using the Wes Anderson palette
As already mentioned by Mako, the fill aesthetic was missing from the call to aes().
Furthermore, the OP reports an error message saying Palette not found. The wesanderson package contains a list of available palettes:
names(wesanderson::wes_palettes)
[1] "BottleRocket1" "BottleRocket2" "Rushmore1" "Rushmore" "Royal1" "Royal2" "Zissou1"
[8] "Darjeeling1" "Darjeeling2" "Chevalier1" "FantasticFox1" "Moonrise1" "Moonrise2" "Moonrise3"
[15] "Cavalcanti1" "GrandBudapest1" "GrandBudapest2" "IsleofDogs1" "IsleofDogs2"
There is no palette called "GrandBudapest" as requested in OP's code. Instead, we have to choose between "GrandBudapest1" and "GrandBudapest2".
Also, the help file help("wes_palette") lists the available palettes.
Here is a working example which uses the dummy data created in the Data section below:
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1"))
Removing NA
The OP has asked to remove the NAs from the set. There are two options:
Tell ggplot() to remove the NAs.
Remove the NAs from the data by filtering.
We can tell ggplot() to remove NAs when plotting the x axis:
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey, aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1")) +
scale_x_discrete(na.translate = FALSE)
Note, this produces a warning message Removed 3 rows containing non-finite values (stat_count). To get rid of the message, we can use geom_bar(na.rm = TRUE).
The other option removes the NAs from the data by filtering:
library(dplyr)
library(ggplot2)
library(wesanderson)
ggplot(RA_Survey %>% filter(!is.na(Q1)), aes(x = Q1, fill = Q1)) +
geom_bar() +
scale_fill_manual(values=wes_palette(n=2, name="GrandBudapest1"))
which creates exactly the same chart.
Data
As the OP has not provided a sample dataset, we need to create our own:
library(dplyr)
set.seed(123L)
RA_Survey <- tibble(Q1 = sample(c("1", "2", NA), 20, TRUE, c(3, 6, 1))) # tibble() replaces the deprecated data_frame()
RA_Survey
# A tibble: 20 x 1
Q1
<chr>
1 2
2 1
3 2
4 1
5 NA
6 2
7 2
8 1
9 2
10 2
11 NA
12 2
13 1
14 2
15 2
16 1
17 2
18 2
19 2
20 NA

R Plot Bar graph transposed dataframe

I'm trying to plot the following data frame as a bar plot, where the values for the filteredprovince column are listed in a separate column (n).
ggplot and other plotting tools usually work on horizontal data frames, and after several searches I have not found a way to plot this "transposed" version of the data frame.
Bars should be grouped by cluster, and within each cluster I would like to plot each filteredprovince according to its value in the n column.
Thank you for your support.
d <- read.table(text=
" cluster PROVINCIA n filteredprovince
1 1 08 765 08
2 1 28 665 28
3 1 41 440 41
4 1 11 437 11
5 1 46 276 46
6 1 18 229 18
7 1 35 181 other
8 1 29 170 other
9 1 33 165 other
10 1 38 153 other ", header=TRUE,stringsAsFactors = FALSE)
UPDATE
Thanks to the suggestion in the comments, I almost achieved the desired format:
ggplot(tab_s, aes(x = cluster, y = n, fill = factor(filteredprovince))) + geom_col()
Is there any way to show percentages instead of frequencies on the y axis?
If I understand correctly, you're trying to use geom_bar(), which gives you problems because it wants to compute a sort of histogram (it counts rows), but you have already done this kind of summary.
(If you had provided the code you have tried so far, I would not have to guess.)
In that case you can use geom_col() instead.
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) + geom_col()
Alternatively, you can change the default stat of geom_bar() from "count" to "identity"
ggplot(d, aes(x = filteredprovince, y = n, fill = factor(PROVINCIA))) +
geom_bar(stat = "identity")
See this SO question for what a stat is
EDIT: Update in response to OP's update:
To display percentages, you will have to modify the data itself.
Just divide n by the sum of all n and multiply by 100.
d$percentage <- d$n / sum(d$n) * 100
ggplot(d, aes(x = cluster, y = percentage, fill = factor(filteredprovince))) + geom_col()
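The code above divides by the sum over all rows, which matches the sample data because it contains a single cluster. If the full data has several clusters and each bar should add up to 100%, the share can be computed within each cluster instead. A minimal sketch, assuming that is the goal; the rows for cluster 2 are made up for illustration.

```r
library(ggplot2)

# Toy data with two clusters; the cluster 2 values are hypothetical.
d2 <- data.frame(cluster = c(1, 1, 1, 2, 2),
                 filteredprovince = c("08", "28", "other", "08", "other"),
                 n = c(765, 665, 170, 300, 100))

# Share of each row within its own cluster, in percent.
d2$percentage <- 100 * d2$n / ave(d2$n, d2$cluster, FUN = sum)

p <- ggplot(d2, aes(x = factor(cluster), y = percentage,
                    fill = filteredprovince)) +
  geom_col()
```

ave() returns the per-cluster total repeated for every row, so the division stays vectorised and needs no extra package.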
I'm not sure I perfectly understand, but if the problem is the orientation of your dataframe, you can transpose it with t(data) where data is your dataframe.

ggplot2 adding custom legend when plotting two lines from subset of columns

I've looked all over stack and other sites to fix my code but can't see what's wrong. I am trying to plot 2 lines on the same graph on ggplot that are portions of 2 different columns. For example, I have a column of length 8 of which the first four rows are M (male) and the last four rows are F (female). I have two columns of data and one column for condition (factor).
ModelMF <- data.frame(ProbGender, ProbCond, ProbMF, Act_pct)
where:
ProbGender ProbCond ProbMF Act_pct
M 0 .75 .71
M 10 .67 .69
M 20 .61 .54
M 30 .81 .77
F 0 .88 .82
F 10 .73 .71
F 20 .67 .71
F 30 .60 .63
I have tried the following but I keep getting errors (see below):
ggplot(data = ModelMF, aes(x = ProbCond)) + geom_line(data =
ModelMF[ModelMF$ProbGender=="M",], aes(y=ProbMF), color = 'col1') +
geom_point(data = ModelMF[ModelMF$ProbGender=="M",], aes(y = ProbMF)) +
geom_line(data = ModelMF[ModelMF$ProbGender=="M",], aes(y=Act_pct), color =
'col2') + geom_point(data = ModelMF[ModelMF$ProbGender=="M",], aes(y =
Act_pct)) + scale_color_manual(values = c('col1' = 'darkblue', 'col2' ='lightblue'))
Preferably I would like to be able to create a custom legend that lets me map the colors as I've attempted to do using scale_color_manual, but I get the following error:
Error in grDevices::col2rgb(colour, TRUE) : invalid color name 'col1'
I'm not sure if it is due to the fact that I'm subsetting data within the df or something else I'm just missing? Also if I add the female lines I assume I can simply follow the same procedure?
Thanks in advance.
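The error message points at color = 'col1' being set outside aes(): there, ggplot2 treats the string as a literal colour name, and no colour called 'col1' exists. A minimal sketch of one way around it, assuming the goal is one legend entry per column: put the string inside aes(), where it becomes a legend key that scale_color_manual() can map to a real colour. The data frame is rebuilt from the table above, and the legend labels are illustrative choices.

```r
library(ggplot2)

# Data from the question.
ModelMF <- data.frame(
  ProbGender = rep(c("M", "F"), each = 4),
  ProbCond = rep(c(0, 10, 20, 30), times = 2),
  ProbMF = c(.75, .67, .61, .81, .88, .73, .67, .60),
  Act_pct = c(.71, .69, .54, .77, .82, .71, .71, .63)
)
males <- ModelMF[ModelMF$ProbGender == "M", ]

# Strings inside aes() are legend keys, not colour names.
p <- ggplot(males, aes(x = ProbCond)) +
  geom_line(aes(y = ProbMF, color = "col1")) +
  geom_point(aes(y = ProbMF, color = "col1")) +
  geom_line(aes(y = Act_pct, color = "col2")) +
  geom_point(aes(y = Act_pct, color = "col2")) +
  scale_color_manual(name = NULL,
                     breaks = c("col1", "col2"),
                     labels = c("ProbMF", "Act_pct"),
                     values = c(col1 = "darkblue", col2 = "lightblue"))
```

The female lines could be added the same way with two more keys, or by reshaping the data to long format and mapping colour to a column.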

Map with geom_bin2d overlay with additional stat info

I am trying to reproduce something similar to this map using ggplot2:
This is what I've done so far:
load("mapdata.Rdata")
> ls() #2 datasets: "depth" for basemap (geom_contour) and "data" is use to construct geom_bin2d
[1] "data" "depth"
> head(data)
latitude longitude GRcounts
740 67.20000 -57.83333 0
741 67.11667 -57.80000 0
742 67.10000 -57.93333 1
743 67.06667 -57.80000 0
751 67.15000 -58.15000 0
762 67.18333 -58.15000 0
ggplot(data=data,aes(x =longitude, y =latitude)) +
theme_bw() +
stat_bin2d(binwidth = c(0.5, 0.5)) +
geom_contour(data=depth,aes(lon, lat, z=dn),colour = "black", bins=5) +
xlim(c(-67,-56)) + ylim(c(65,71))
Which gives me this map:
The last step is to overlay circles on my geom_bin2d layer, with size proportional to the sum of the counts (GRcounts) within each bin.
Any tips on how to do so, preferably in ggplot, would be much appreciated.
follow-up question: alignment mismatch between stat_bin2d and stat_summary2d when using facet_wrap
When I run the following code on the diamonds data set, there is no apparent problem. However, if I run the same code on my data, I get misalignment problems. Any thoughts on what may cause this?
p<-ggplot(diamonds,aes(x =carat, y =price,colour=cut))+
stat_summary2d(fun=sum,aes(z=depth,group=cut),bins=10)
p+facet_wrap(~cut)
df <- ggplot_build(p)$data[[1]]
summary(df)##now 5 groups, 1 panel
df$x<-with(df,(xmin+xmax)/2)
df$y<-with(df,(ymin+ymax)/2)
plot1<-ggplot(diamonds,aes(carat, price))+ stat_bin2d(bins=10)
plot1+geom_point(data=df,aes(x,y,size=value,group=group),color="red",shape=1)+facet_wrap(~group)
This is my Rcode and plot:
p<-ggplot(dat,aes(x =longitude, y =latitude,colour=SizeClass))+
stat_summary2d(fun=sum,aes(z=GRcounts,group=SizeClass),bins=10)
p+facet_wrap(~SizeClass)
df <- ggplot_build(p)$data[[1]]
summary(df)##now 4 groups, 1 panel
df$x<-with(df,(xmin+xmax)/2)
df$y<-with(df,(ymin+ymax)/2)
plot1<-ggplot(dat,aes(longitude, latitude))+ stat_bin2d(bins=10)
plot1+geom_point(data=df,aes(x,y,size=value,group=group),color="red",shape=1)+facet_wrap(~group)
> head(dat[c(7,8,14,21)])###mydata
latitude longitude GRcounts SizeClass
742 67.10000 -57.93333 1 (100,150)
784 67.21667 -57.95000 1 (100,150)
756 67.11667 -57.80000 1 (<100)
1233 68.80000 -59.55000 2 (100,150)
1266 68.68333 -59.60000 2 (100,150)
1288 68.66667 -59.65000 1 (100,150)
My data set can be downloaded here: data
As your dataset doesn't work on my computer, I will use the diamonds dataset as an example.
Make a new plot of your data with stat_summary2d(), set z= to the variable you want to sum (in your case GRcounts), and provide fun=sum to sum those values. Store the plot as an object.
p<-ggplot(diamonds,aes(carat,price))+stat_summary2d(fun=sum,aes(z=depth))
Use the function ggplot_build() to get the data used for the plot. The coordinates of the rectangles are in columns xmin, xmax, ymin and ymax, and the sums are in column value.
df <- ggplot_build(p)$data[[1]]
head(df)
fill xbin ybin value ymax ymin yint xmax xmin xint PANEL group
1 #55B1F7 [0.2,0.36] [326,943] 641318.2 942.5667 326.0000 1 0.3603333 0.2000000 1 1 1
2 #1A3955 [0.2,0.36] (943,1.56e+03] 75585.5 1559.1333 942.5667 2 0.3603333 0.2000000 1 1 1
3 #132B43 [0.2,0.36] (1.56e+03,2.18e+03] 415.8 2175.7000 1559.1333 3 0.3603333 0.2000000 1 1 1
4 #132B43 [0.2,0.36] (2.18e+03,2.79e+03] 304.4 2792.2667 2175.7000 4 0.3603333 0.2000000 1 1 1
5 #244D71 (0.36,0.521] [326,943] 179486.8 942.5667 326.0000 1 0.5206667 0.3603333 2 1 1
6 #2D5F8A (0.36,0.521] (943,1.56e+03] 271688.9 1559.1333 942.5667 2 0.5206667 0.3603333 2 1 1
For the points calculate x and y positions as mean of xmin,xmax and ymin,ymax.
df$x<-with(df,(xmin+xmax)/2)
df$y<-with(df,(ymin+ymax)/2)
Use this new data frame to add points to your original plot with stat_bin2d().
ggplot(diamonds,aes(carat,price))+stat_bin2d()+
geom_point(data=df,aes(x=x,y=y,size=value),color="red",shape=1)
UPDATE - solution with facetting
To use facet_wrap() and combine stat_bin2d() and points you should use some workaround as there seems to be some problem.
First, create two plots - one for sums with stat_summary2d() and one for counts with stat_bin2d(). Both plots should be faceted.
plot1 <- ggplot(dat,aes(x =longitude, y =latitude))+
stat_summary2d(fun=sum,aes(z=GRcounts),bins=10)+facet_wrap(~SizeClass)
plot2 <- ggplot(dat,aes(longitude, latitude))+ stat_bin2d(bins=10)+
facet_wrap(~SizeClass)
Now extract the data from both plots using ggplot_build() and store them as objects. For the sums data frame (df1), calculate the x and y coordinates as in the example above.
df1 <- ggplot_build(plot1)$data[[1]]
df1$x<-with(df1,(xmin+xmax)/2)
df1$y<-with(df1,(ymin+ymax)/2)
df2<-ggplot_build(plot2)$data[[1]]
Now plot your data using those new data frames: df1 for points and df2 for rectangles. With geom_rect() you will get rectangles whose fill depends on count. For faceting, use the column PANEL.
ggplot()+geom_rect(data=df2,aes(xmin=xmin,xmax=xmax,
ymin=ymin,ymax=ymax,fill=count))+
geom_point(data=df1,aes(x=x,y=y,size=value),shape=1,color="red")+
facet_wrap(~PANEL)

Resources