I have a data frame mydf, shown below. I want to draw a stacked 3D bar plot (like the sample plot below) in which the A, C, G and T columns are represented by the bars (the years in the sample plot). The length column ranges from 18 to 34 for each file.name, and I want it on the Gas Type axis of the sample plot. Finally, I want the file.name column on the Country axis. How can I plot this in R? Thanks for your help.
mydf <- structure(list(file.name = structure(c(1L, 1L, 2L, 2L, 4L, 4L
), .Label = c("merged_read_counts", "DCL1_VF_1_GAGTGG_L008_R1_001",
"DCL1_VF_2_GGTAGC_L008_R1_001", "DCL2_VF_1_ACTGAT_L008_R1_001",
"DCL2_VF_2_ATGAGC_L008_R1_001", "DCLd_SSHADV1_1_AGTTCC_L008_R1_001",
"DCLd_SSHADV1_2_ATGTCA_L008_R1_001", "DCLd_SSHADV1_3_CCGTCC_L008_R1_001",
"DCLd_SSHV2L_1_GTAGAG_L008_R1_001", "DCLd_SSHV2L_2_GTCCGC_L008_R1_001",
"DCLd_SSHV2L_3_GTGAAA_L008_R1_001", "DCLd_VF_1_GTGGCC_L008_R1_001",
"DCLd_VF_2_GTTTCG_L008_R1_001", "DCLd_VF_3_CGTACG_L008_R1_001",
"WT_SSHADV1_1_GGCTAC_L008_R1_001", "WT_SSHADV1_2_CTTGTA_L008_R1_001",
"WT_SSHADV1_3_AGTCAA_L008_R1_001", "WT_SSHV2L_1_GCCAAT_L008_R1_001",
"WT_SSHV2L_2_CAGATC_L008_R1_001", "WT_SSHV2L_3_ACTTGA_L008_R1_001",
"WT_SSHV2L_4_GATCAG_L008_R1_001", "WT_SSHV2L_5_TAGCTT_L008_R1_001",
"WT_VF_1_ATCACG_L008_R1_001", "WT_VF_2_CGATGT_L008_R1_001", "WT_VF_3_TTAGGC_L008_R1_001",
"WT_VF_4_TGACCA_L008_R1_001", "WT_VF_5_ACAGTG_L008_R1_001"), class = "factor"),
length = c(18L, 19L, 18L, 19L, 18L, 19L), A = c(2294436L,
2588528L, 52104L, 47190L, 103378L, 59269L), C = c(1501040L,
2838174L, 35888L, 93922L, 38132L, 31912L), G = c(2106623L,
1714702L, 80765L, 64930L, 129040L, 161517L), T = c(5065628L,
7462881L, 62174L, 87905L, 274783L, 110125L)), .Names = c("file.name",
"length", "A", "C", "G", "T"), row.names = c(1L, 2L, 18L, 19L,
58L, 59L), class = "data.frame")
I don't think 3D bar plots are available in ggplot2 (see ggplot2 3D Bar Plot).
I would suggest grouped, stacked barplots instead. Something like:
library(ggplot2)
library(reshape2)
df <- melt(mydf,id.vars=c("file.name","length"))
ggplot(df, aes(x = file.name, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'stack') + facet_grid(~ length)
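To see what the melt step does, here is a minimal sketch on a made-up two-row data frame (toy_wide and its values are hypothetical, not taken from mydf). melt() keeps the id columns and stacks the A, C, G, T columns into a variable/value pair, which is the long layout that fill = variable expects.

```r
library(reshape2)

# Hypothetical wide data: one row per (file.name, length) combination
toy_wide <- data.frame(
  file.name = c("sample1", "sample1"),
  length    = c(18L, 19L),
  A = c(10L, 20L), C = c(1L, 2L), G = c(5L, 6L), T = c(7L, 8L)
)

# Long format: one row per (file.name, length, nucleotide)
toy_long <- melt(toy_wide, id.vars = c("file.name", "length"))
print(toy_long)
```

Each original row contributes four rows to the long table, one per nucleotide column.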
I'm trying to visualize how a neural network separates simple two-dimensional points into two classes. I use geom_point to show the training points and geom_raster to show how the neural network partitions the 2D space. Here are the functions and some of the data points plotted.
library(tidyverse)
library(neuralnet)
data2 <- structure(list(X1 = c(152, 178, 19, 101, 145, 184), x = c(32.4083268723916,
84.5016641449183, 114.483315175202, 51.914560098842, 79.6402378017537,
82.6861507166177), y = c(18.339864264708, 83.42093185056, 63.2843023451388,
55.7215069333086, 42.6517407153766, 86.5805756277405), label = structure(c(2L,
1L, 1L, 2L, 2L, 1L), .Label = c("1", "2"), class = "factor")), row.names = c(152L,
178L, 19L, 101L, 145L, 184L), class = "data.frame")
nn.model <- neuralnet(label~x+y, data2, hidden=4, linear.output=FALSE)
background <- expand_grid(x=seq(-40,120,0.1), y=seq(0,100,0.1))
background$label <- predict(nn.model, background) %>% apply(1, which.max)
ggplot()+geom_raster(data=background, aes(x, y, fill=label))+geom_point(data=data2, aes(x, y, color=label))+scale_color_manual(values=c("white","red"))
In the original dataset, the points lie in x range (-40, 120) and y range (0, 100); therefore the background expands accordingly. This approach, of course, takes some time because R will need to have the neural network predict some 1600 x 1000 points and then render them on the geom_raster layer.
My question: is there a way to optimize this, or to do it differently in ggplot2 (or in another package, if this problem is solved well there)? The current approach is brute force, since it rasters the entire background with geom_raster.
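For a sense of scale, the number of grid points can be counted without building the grid. This is only a sketch: the ranges and the 0.1 step come from the question, while the coarser step of 1 is an assumption added for comparison, not something the question proposes.

```r
# Grid ranges taken from the question; the coarser step of 1 is an assumption
x_range <- c(-40, 120)
y_range <- c(0, 100)

grid_points <- function(step) {
  # Number of cells = (points along x) * (points along y)
  length(seq(x_range[1], x_range[2], by = step)) *
    length(seq(y_range[1], y_range[2], by = step))
}

n_fine   <- grid_points(0.1)  # step used in the question
n_coarse <- grid_points(1)    # hypothetical coarser step

cat("step 0.1:", n_fine, "points\n")
cat("step 1:  ", n_coarse, "points\n")
```

Whether a coarser raster is acceptable depends on how smooth the class boundary needs to look at the final plot size.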
Data: height was recorded daily.
I want to plot the height of my plants (Plant A1 to Z50)
in individual plots, and I want to highlight the current year.
So I made a subset for each plant and a subset for the current year (2018).
Now I need a plot with the full record and the highlighted data from 2018.
dput(Plant)
structure(list(Name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("Plant A1", "Plant B1", "Plant C1"), class = "factor"),
Date = structure(c(1L, 4L, 5L, 7L, 1L, 4L, 6L, 1L, 2L, 3L
), .Label = c(" 2001-01-01", " 2001-01-02", " 2001-01-03",
" 2002-01-01", " 2002-02-01", " 2019-01-01", " 2019-12-31"
), class = "factor"), Height_cm = c(91, 106.1, 107.4, 145.9,
169.1, 192.1, 217.4, 139.8, 140.3, 140.3)), .Names = c("Name",
"Date", "Height_cm"), class = "data.frame", row.names = c(NA,
-10L))
library(dplyr)
library(ggplot2)
# Date is stored as a factor with leading spaces, so convert it first
Plant$Date <- as.Date(trimws(as.character(Plant$Date)))
Plant_A1 <- filter(Plant, Name == "Plant A1")
Current_Year <- 2018
Plant_A1_Subset <- filter(Plant_A1, format(Date, '%Y') == Current_Year)
ggplot(data=Plant_A1, aes(x=Date, y=Height_cm)) +
geom_point() +
geom_smooth(method="loess", level=0.95, span=1/2, color="red") +
labs(x="Data", y="Height cm")
Now I don't know how to add my new subset for 2018 (Plant_A1_Subset) to this graph.
As noted, this question has a duplicate with an answer in this question.
That said, here's likely the most common way of handling your problem.
In ggplot2, later layers inherit any aesthetics passed to aes() in the initial ggplot(aes(...)) call. The plot therefore uses these mappings in every subsequent layer unless you override them manually. We can solve your problem by adding an extra aesthetic in the aes() of geom_point. Below I've illustrated a simple way to achieve what you might be looking for.
Specify the aes argument in individual calls
The first method is likely the most intuitive. aes controls which variables are mapped to the plotted properties. So if you want to colour certain points, one way is to give geom_point its own aes, separate from the one used by geom_smooth.
library(ggplot2)
library(lubridate) # for the year() function
current_year <- 2018
# Date is a factor in the dput above, so convert it before calling year()
Plant_A1$Date <- as.Date(trimws(as.character(Plant_A1$Date)))
ggplot(data = Plant_A1, aes(x = Date, y = Height_cm)) +
# Note: colour is mapped inside geom_point only
geom_point(aes(col = ifelse(year(Date) == current_year, "Yes", "No"))) +
geom_smooth(method="loess", level=0.95,
span=1/2, color="red") +
labs(x="Data", y="Height cm",
col = "Current year?") # Legend title for colour
Note that I have relied on how aes looks up names: it checks the column names of data and, if it finds a match, uses that column as the variable. So there is no need to write data$....
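An alternative sketch, assuming the highlighted year is already available as a separate data frame: a layer can take its own data argument while still inheriting the x and y mappings from ggplot(). The data frames below (toy_all, toy_2018) are invented for illustration and are not the Plant data from the question.

```r
library(ggplot2)

# Hypothetical heights spanning two years
toy_all <- data.frame(
  Date      = as.Date(c("2017-06-01", "2017-12-01", "2018-03-01", "2018-09-01")),
  Height_cm = c(90, 100, 110, 125)
)

# Subset for the year to highlight
toy_2018 <- toy_all[format(toy_all$Date, "%Y") == "2018", ]

p <- ggplot(toy_all, aes(x = Date, y = Height_cm)) +
  geom_point() +                                         # all records
  geom_point(data = toy_2018, colour = "red", size = 3)  # 2018 only, drawn on top
print(p)
```

Because colour is set outside aes() here, it is a fixed property of the second layer and produces no legend entry.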
I have an unusual problem when using geom_errorbar in ggplot2.
The error bars are not within range but that is of no concern here.
My problem is that geom_errorbar is plotting the confidence intervals for the same data differently depending on what other data is plotted with it.
The code below filters the data only passing data where Audio1 is equal to "300SW" OR "3500MFL" in the uncommented SE and AggBar.
SE<-c(0.0861829641865964, 0.0296894376485468, 0.0323219002250762,
0.0937013798013447)
AggBar <- structure(list(Report = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("One Flash", "Two Flashes"), class = "factor"),
Visual = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("one",
"two"), class = "factor"), Audio = c("300SW", "300SW", "300SW",
"300SW", "3500MFL3500CL", "3500MFL3500CL", "3500MFL3500CL",
"3500MFL3500CL"), Prob = c(0.938828282828283, 0.0611717171717172,
0.754141414141414, 0.245858585858586, 0.534484848484848,
0.465515151515151, 0.0830909090909091, 0.916909090909091)), .Names = c("Report",
"Visual", "Audio", "Prob"), row.names = c(NA, -8L), class = "data.frame")
#SE<-c(0.0310069159026252, 0.113219880555153, 0.0861829641865964, 0.0296894376485468)
#AggBar <- structure(list(Report = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
#2L), .Label = c("One Flash", "Two Flashes"), class = "factor"),
#Visual = structure(c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("one",
#"two"), class = "factor"), Audio = c("300MFL300CL", "300MFL300CL",
#"300MFL300CL", "300MFL300CL", "300SW", "300SW", "300SW",
#"300SW"), Prob = c(0.562242424242424, 0.437757575757576,
#0.0921010101010101, 0.90789898989899, 0.938828282828283,
#0.0611717171717172, 0.754141414141414, 0.245858585858586)), .Names = c("Report",
#"Visual", "Audio", "Prob"), row.names = c(NA, -8L), class = "data.frame")
prob.bar = ggplot(AggBar, aes(x = Report, y = Prob, fill = Report)) + theme_bw() #+ facet_grid(Audio~Visual)
prob.bar + #This changes all panels' colour
geom_bar(position=position_dodge(.9), stat="identity", colour="black", width=0.8)+
theme(legend.position = "none") + labs(x="Report", y="Probability of Report", title = expression("Visual Condition")) + scale_fill_grey() +
scale_fill_grey(start=.4) +
scale_y_continuous(limits = c(0, 1), breaks = (seq(0,1,by = .25)))+
facet_grid(Audio ~ Visual)+
geom_errorbar(aes(ymin=Prob-SE, ymax=Prob+SE),
width=.1, # Width of the error bars
position=position_dodge(.09))
This results in the following output:
The Audio1 variables are seen on the rightmost vertical labels.
However, if I filter so that it only passes rows where Audio1 is equal to "300SW" OR "300MFL" (the commented SE and AggBar), the error bars for "300SW" change:
The Audio1 variables are seen on the rightmost vertical labels with "300SW" on the bottom this time.
This change is the incorrect one because when I plot just the Audio1 "300SW" the error bars match the original plot.
I have tried plotting the Audio1 "300SW" with other variables not presented here and it is only when presenting with "300MFL" that this change occurs.
If you look at the SE variable contents you will see that there is no change in the values therein for "300SW" in both versions of the code. Yet the outputs differ.
I cannot fathom what is happening here. Any ideas or suggestions are welcome.
Thanks very much for your time.
@Antonios K below has highlighted that when "300SW" is on top of the grid the error bars are correctly drawn. I'm guessing that the error bars are being incorrectly matched to the bars, although I don't know why this is the case.
The problem is that SE is not stored inside the data frame: it's just floating around in the global environment. When the data is facetted (which involves rearranging the order), it no longer lines up with the correct records. Fix the problem by storing SE in the data frame:
AggBar$SE <- c(0.0310069159026252, 0.113219880555153, 0.0861829641865964, 0.0296894376485468)
ggplot(AggBar, aes(Report, Prob)) +
geom_bar(stat = "identity", fill = "grey50") +
geom_errorbar(aes(ymin = Prob - SE, ymax = Prob + SE), width = 0.4) +
facet_grid(Audio ~ Visual)
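The misalignment can be reproduced without ggplot2. In this sketch (toy data, hypothetical names), a free-standing vector keeps its original order when the data frame rows are rearranged, while a column travels with its row. The row reordering here merely stands in for whatever rearrangement happens during facetting.

```r
# Hypothetical data: four bars with one standard error each
toy <- data.frame(id = c("a", "b", "c", "d"),
                  prob = c(0.9, 0.1, 0.7, 0.3))
se_global <- c(0.01, 0.02, 0.03, 0.04)  # lives outside the data frame
toy$se <- se_global                      # same values, stored as a column

# Rearranging the rows stands in for the regrouping done when facetting
rearranged <- toy[order(toy$id, decreasing = TRUE), ]

lower_global <- rearranged$prob - se_global      # pairs each row with the wrong SE
lower_column <- rearranged$prob - rearranged$se  # SE stays with its own row

print(data.frame(id = rearranged$id, lower_global, lower_column))
```

The two lower bounds disagree for every row, even though the SE values themselves never changed.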
The bit of code that plots the error bars is :
geom_errorbar(aes(ymin=Prob-SE, ymax=Prob+SE),
width=.1, # Width of the error bars
position=position_dodge(.09))
So, I guess it's something there.
As you said, the SE variable is the same in both cases, but what you plot there is Prob-SE and Prob+SE. If you compute AggBar$Prob-SE and AggBar$Prob+SE, you'll get different values for 300SW in each case.
It might have to do with the order of your Audio1 values. Did the other cases that worked also have 300SW in the top row of the plots?
Try
sort(unique(DataRearrange$Audio1) )
[1] "300MFL" "300SW" "3500MFL"
Combining first two will give you 300SW on the bottom part of the plots.
Combining last two will give you 300SW on the top part.
So, to check this assumption, in your second case when you combine 300MFL and 300SW try to replace 300SW with 1_300SW (so that 300SW will be plotted on top) and see what happens. Just do :
DataRearrange$Audio1[DataRearrange$Audio1=="300SW"] = "1_300SW"
# Below is the alternative coupling..
ErrorBarsDF <- DataRearrange[(DataRearrange$Audio1=="1_300SW" | DataRearrange$Audio1=="300MFL"), c("correct","Visual1", "Audio1", "Audio2","correct_response", "response", "subject_nr")]
DataRearrange <- DataRearrange[(DataRearrange$Audio1=="1_300SW" | DataRearrange$Audio1=="300MFL"), c("correct","Visual1", "Audio1", "Audio2","correct_response", "response", "subject_nr")]
I want to rank the variables in my dataset in descending order of the number of plants used. I tried ranking them in the .csv file and then importing it into R, but even then the plot was not drawn in the required order. Here is my dataset:
df <- structure(list(Lepidoptera.Family = structure(c(3L, 2L, 5L, 1L, 4L, 6L),
.Label = c("Hesperiidae", "Lycaenidae", "Nymphalidae", "Papilionidae", "Pieridae","Riodinidae"), class = "factor"),
LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)),
.Names = c("Lepidoptera.Family", "LHP.Families"),
class = "data.frame", row.names = c(NA, -6L))
library(ggplot2)
library(reshape2)
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+
geom_bar(stat="identity")+
coord_flip()+facet_grid(Type~.)
How do I rank them in descending order? Also, I want to combine 3 plots into one. How can I go about it?
The reason this is happening is that ggplot plots x variables that are factors in the order of their underlying levels (recall that factors are stored as integers under the covers). If you want to graph them in a different order, you should change the order of the levels before plotting:
gg$Lepidoptera.Family<-with(gg,
factor(Lepidoptera.Family,
levels=Lepidoptera.Family[order(LHP.Families)]))
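A minimal sketch of the same idea on invented data (toy and its values are hypothetical): it shows the default alphabetical levels, the integer codes behind them, and one way to get a descending order with base R's reorder().

```r
# Hypothetical counts per family (not the data from the question)
toy <- data.frame(
  family = factor(c("a", "b", "c")),
  n      = c(5, 1, 20)
)

levels(toy$family)      # alphabetical by default
as.integer(toy$family)  # the integer codes stored under the labels

# reorder() sorts the levels by a summary of n; negating n gives descending order
toy$family_desc <- reorder(toy$family, -toy$n)
levels(toy$family_desc)
```

With coord_flip(), the first level is drawn at the bottom of the axis, so which direction reads as descending depends on the orientation of the plot.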
The trick is to reorder the levels of the Lepidoptera.Family factor, which by default is alphabetical:
df = within(df, {
# reorder() returns the factor with its levels sorted by LHP.Families
Lepidoptera.Family <- reorder(Lepidoptera.Family, LHP.Families)
})
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+ geom_bar(stat="identity")+ coord_flip()+facet_grid(Type~.)
I am building a quantile-quantile plot out of a variable called x from a data frame called df in the working example provided below. I would like to label the points with the name variable of my df dataset.
Is it possible to do this in ggplot2 without resorting to the painful solution (coding the theoretical distribution by hand and then plotting it against the empirical one)?
Edit: it happens that yes, thanks to a user who posted and then deleted his answer. See the comments after Arun's answer below. Thanks to Didzis for his otherwise clever solution with ggbuild.
# MWE
df <- structure(list(name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 9L,
10L, 6L, 12L, 13L, 14L, 15L, 16L, 17L, 19L, 18L, 20L, 21L, 22L,
8L, 23L, 11L, 24L), .Label = c("AUS", "AUT", "BEL", "CAN", "CYP",
"DEU", "DNK", "ESP", "FIN", "FRA", "GBR", "GRC", "IRL", "ITA",
"JPN", "MLT", "NLD", "NOR", "NZL", "PRT", "SVK", "SVN", "SWE",
"USA"), class = "factor"), x = c(-0.739390016757746, 0.358177826874146,
1.10474523846099, -0.250589535389937, -0.423112615445571, -0.862144579740376,
0.823039669834058, 0.079521521937704, 1.08173649722493, -2.03962942823921,
1.05571087029737, 0.187147291278723, -0.144770773941437, 0.957990771847331,
-0.0546549555439176, -2.70142550075757, -0.391588386498849, -0.23855544527369,
-0.242781575907386, -0.176765072121165, 0.105155860923456, 2.69031085872414,
-0.158320176671995, -0.564560815972446)), .Names = c("name",
"x"), row.names = c(NA, -24L), class = "data.frame")
library(ggplot2)
qplot(sample = x, data = df) + geom_abline(linetype = "dotted") + theme_bw()
# ... using names instead of points would allow to spot the outliers
I am working on an adaptation of this gist, and will consider sending other questions to CrossValidated if I have questions about the regression diagnostics, which might be of interest to CV users.
You can save your original QQ plot as an object (using ggplot() and stat_qq() instead of qplot()):
g<-ggplot(df, aes(sample = x)) + stat_qq()
Then you can extract the data used for plotting with ggplot_build(). They are stored in the element data[[1]]. Save those data as a new data frame.
df.new<-ggplot_build(g)$data[[1]]
head(df.new)
x y sample theoretical PANEL group
1 -2.0368341 -2.7014255 -2.7014255 -2.0368341 1 1
2 -1.5341205 -2.0396294 -2.0396294 -1.5341205 1 1
3 -1.2581616 -0.8621446 -0.8621446 -1.2581616 1 1
4 -1.0544725 -0.7393900 -0.7393900 -1.0544725 1 1
5 -0.8871466 -0.5645608 -0.5645608 -0.8871466 1 1
6 -0.7415940 -0.4231126 -0.4231126 -0.7415940 1 1
Now you can add the names of the observations to the new data frame. It is important to use order(), because the data in the new data frame are sorted.
df.new$name<-df$name[order(df$x)]
Now plot the new data frame as usual, providing geom_text() instead of geom_point().
ggplot(df.new,aes(theoretical,sample,label=name))+geom_text()+
geom_abline(linetype = "dotted") + theme_bw()
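To make the role of order() concrete, here is a small base R sketch on invented data. It assumes, consistent with the output shown above, that the theoretical column is qnorm(ppoints(n)) and that the sample column is the sorted input; the labels then have to be permuted the same way as the values.

```r
# Hypothetical sample with labels (not the data from the question)
x    <- c(0.5, -1.2, 2.0)
name <- c("A", "B", "C")

qq <- data.frame(
  theoretical = qnorm(ppoints(length(x))),  # normal quantiles at evenly spread probabilities
  sample      = sort(x),                    # observed values, smallest first
  name        = name[order(x)],             # labels permuted the same way as the values
  stringsAsFactors = FALSE
)
print(qq)
```

Without the order() step, the labels would stay in their original row order and would be attached to the wrong points.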
The points are too close together. I would do something like this:
df <- df[with(df, order(x)), ]
df$t <- quantile(rnorm(1000), seq(0, 100, length.out = nrow(df))/100)
p <- ggplot(data = df, aes(x=t, y=x)) + geom_point(aes(colour=df$name))
This gives:
If you insist on having labels inside the plot, then, you could try something like:
df <- df[with(df, order(x)), ]
df$t <- quantile(rnorm(1000), seq(0, 100, length.out = nrow(df))/100)
p <- ggplot(data = df, aes(x=t, y=x)) + geom_point(aes(colour=df$name))
p <- p + geom_text(aes(x=t-0.05, y=x-0.15, label=df$name, size=1, colour=df$name))
p
You can play around with the x and y coordinates and if you want you can always remove the colour aesthetics.
@Arun has a good solution in the comment above, but this works with R 4.0.3 (geom_text_repel() comes from the ggrepel package):
library(ggplot2)
library(ggrepel)
ggplot(data = df, aes(sample = x)) + geom_qq() + geom_text_repel(label=df$name[order(df$x)], stat="qq") + stat_qq_line()
Basically the same thing, with addition of stat_qq_line() and [order(df$x)] as part of the label. If you don't include the order function then your labels will be all out of order and very misleading.
Here's hoping this saves someone else some hours of their life.