I want to rank the variables in my dataset in descending order of the number of plants used. I tried ranking them in the .csv file and then importing it into R, but even then the plot was not ranked in the required order. Here is my dataset:
df <- structure(list(Lepidoptera.Family = structure(c(3L, 2L, 5L, 1L, 4L, 6L),
.Label = c("Hesperiidae", "Lycaenidae", "Nymphalidae", "Papilionidae", "Pieridae","Riodinidae"), class = "factor"),
LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)),
.Names = c("Lepidoptera.Family", "LHP.Families"),
class = "data.frame", row.names = c(NA, -6L))
library(ggplot2)
library(reshape2)
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+
geom_bar(stat="identity")+
coord_flip()+facet_grid(Type~.)
How do I rank them in descending order? Also, I want to combine 3 plots into one. How can I go about it?
The reason this is happening is that ggplot plots x variables that are factors in the order of their levels (recall that factors are stored as integer codes under the hood), not in the order of the rows. If you want to plot them in a different order, you should change the order of the levels before plotting:
gg$Lepidoptera.Family<-with(gg,
factor(Lepidoptera.Family,
levels=Lepidoptera.Family[order(LHP.Families)]))
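To make the point about integer codes concrete, here is a minimal base R sketch. The toy factor `f` and the level order chosen for `f2` are assumptions for illustration, not part of the question's data.

```r
# Hypothetical toy factor; levels default to alphabetical order
f <- factor(c("Nymphalidae", "Lycaenidae", "Pieridae"))
levels(f)       # the level order ggplot would use on the axis
as.integer(f)   # integer codes that index into levels(f)

# Re-declare the levels in the order the bars should appear;
# the labels stay the same, only the codes change
f2 <- factor(f, levels = c("Pieridae", "Lycaenidae", "Nymphalidae"))
as.integer(f2)
```

Sorting the rows of the data frame (or the .csv) leaves `levels()` untouched, which is why the plot order did not change.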
The trick is to reorder the levels of the Lepidoptera.Family factor, which by default are alphabetical:
df <- within(df, {
Lepidoptera.Family <- reorder(Lepidoptera.Family, LHP.Families)
})
gg <- melt(df,id="Lepidoptera.Family", value.name="LHP.Families", variable.name="Type")
ggplot(gg, aes(x=Lepidoptera.Family, y=LHP.Families, fill=Type))+ geom_bar(stat="identity")+ coord_flip()+facet_grid(Type~.)
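As a possible shortcut, the reordering can also be done inside aes() without touching the data frame. The sketch below rebuilds the question's six rows by hand and uses geom_col(), which is an assumption on my part (it is equivalent to geom_bar(stat = "identity") in recent ggplot2 versions).

```r
library(ggplot2)

# Values copied from the question's dataset
df <- data.frame(
  Lepidoptera.Family = c("Nymphalidae", "Lycaenidae", "Pieridae",
                         "Hesperiidae", "Papilionidae", "Riodinidae"),
  LHP.Families = c(55L, 55L, 15L, 14L, 13L, 1L)
)

# reorder() sorts the levels by LHP.Families in ascending order
ranked <- reorder(df$Lepidoptera.Family, df$LHP.Families)
levels(ranked)

# After coord_flip() the family with the largest value is drawn at the top
p <- ggplot(df, aes(x = reorder(Lepidoptera.Family, LHP.Families),
                    y = LHP.Families)) +
  geom_col() +
  coord_flip() +
  labs(x = "Lepidoptera.Family")
p
```

For the opposite direction, reorder(Lepidoptera.Family, -LHP.Families) should reverse the ranking.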
I am analyzing monthly observations of water input (rainfall) and output (evaporation) at a given location.
I need to plot time series of both rainfall and evaporation, shading the area between the data points with varying colors according to which line is above the other.
This is what I have:
library(ggplot2)
library(reshape2)
dat1 <- structure(list(month = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L,
12L), value = c(226.638505697305, 186.533910906497, 141.106702957603,
93.4474376969313, 134.58903301495, 77.6436398653559, 77.301864710113,
69.7349071531699, 109.208227499776, 165.197186758555, 156.057081859175,
168.342059689587, 136.34266772667, 119.741309096806, 120.395245911241,
98.1418096019397, 72.4585192294772, 59.6209861948614, 69.6993145911677,
97.1585171469416, 118.357052089691, 132.74037278737, 139.141233379528,
146.583047731729), var = c("rainfall", "rainfall", "rainfall",
"rainfall", "rainfall", "rainfall", "rainfall", "rainfall", "rainfall",
"rainfall", "rainfall", "rainfall", "evaporation", "evaporation",
"evaporation", "evaporation", "evaporation", "evaporation", "evaporation",
"evaporation", "evaporation", "evaporation", "evaporation", "evaporation"
)), row.names = c(NA, -24L), class = "data.frame")
ggplot(dat1, aes(x=month,y=value, colour=var)) +
geom_line() +
scale_color_manual(values=c("firebrick1", "dodgerblue")) +
theme_bw(base_size=18)
which yields the following graph (with little edits to show what I'm trying to achieve):
My initial attempt to fill the areas between the lines was based on this SO answer:
dat2 <- data.frame(month=1:12,
rainfall=dat1[dat1$var=="rainfall",]$value,
evaporation=dat1[dat1$var=="evaporation",]$value)
dat2 <- cbind(dat2, min_line=pmin(dat2[,2],dat2[,3]) )
dat2 <- melt(dat2, id.vars=c("month","min_line"), variable.name="var", value.name="value")
ggplot(data=dat2, aes(x=month, fill=var)) +
geom_ribbon(aes(ymax=value, ymin=min_line)) +
scale_fill_manual(values=c(rainfall="dodgerblue", evaporation="firebrick1"))
However, it's not quite what I need.
How can I achieve the desired result?
The reason you're getting the wrong shading is probably that the data is a bit on the coarse side: the lines cross between two monthly observations, so the ribbon has no point to switch colour at. My advice would be to interpolate the data first, assuming dat1 is as in your example.
library(ggplot2)
# From long data to wide data
dat2 <- tidyr::pivot_wider(dat1, values_from = value, names_from = var)
# Setup interpolated data (tibble because we can then reference column x)
dat3 <- tibble::tibble(
x = seq(min(dat2$month), max(dat2$month), length.out = 1000),
rainfall = with(dat2, approx(month, rainfall, xout = x)$y),
evaporation = with(dat2, approx(month, evaporation, xout = x)$y)
)
Then, we need to find a way to identify groups, and here is a helper function for that. Group IDs are based on the runs in run length encoding.
# Make function to identify groups
rle_id <- function(x) {
x <- rle(x)
rep.int(seq_along(x$lengths), x$lengths)
}
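To see what the helper returns, here is a small check on a made-up sign vector (the vector `s` is a hypothetical example, not taken from the rainfall data). Each run of identical values gets its own ID, so a ribbon that turns positive again later is treated as a separate group.

```r
# Same helper as above, repeated so the snippet runs on its own
rle_id <- function(x) {
  x <- rle(x)
  rep.int(seq_along(x$lengths), x$lengths)
}

# Hypothetical signs of (rainfall - evaporation) along x
s <- c(1, 1, -1, -1, -1, 1)
ids <- rle_id(s)
ids
```

Without these IDs, the two positive stretches would share one group and geom_ribbon would connect them across the negative stretch.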
And now we can plot it.
ggplot(dat3, aes(x)) +
geom_ribbon(aes(ymin = pmin(evaporation, rainfall),
ymax = pmax(evaporation, rainfall),
group = rle_id(sign(rainfall - evaporation)),
fill = as.factor(sign(rainfall - evaporation))))
Created on 2021-02-14 by the reprex package (v1.0.0)
Hi, I am relatively new to R. I am struggling with what seems like it should be a relatively simple task: I am trying to make a frequency histogram using ggplot2 from a subset of data from a longer dataframe.
Here is an example of the data structure, as in the attached picture:
https://i.stack.imgur.com/HIwQv.png
The data is from a survey where 0 means not selected and 1 means selected. The values are numeric in the original dataset. I want a histogram of the frequency with which each variable was selected, with the column variables on the x-axis and frequency counts on the y-axis. I have various subsets like this within a dataframe and I would like each subset to have its own graph.
I first subset the columns of interest
new_dataset <- subset(df, select = c(WAB_R, WAB_B, BDAE, PNT))
When I checked the class, it was data.frame and no longer numeric.
I tried to use as.numeric to convert it back to numeric, but with no luck.
I could use some guidance in how to structure the data to then obtain a histogram.
Thanks Carla
Maybe try this approach using tidyverse functions. You have to reshape the data to long format, selecting the desired variables. Here is the code, using ggplot2 for the final plot:
library(tidyverse)
#Code 1
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=value))+
geom_histogram(stat = 'count',aes(fill=name),
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')
Output:
Or this:
#Code 2
df %>% select(c(WAB_R, WAB_B, BDAE, PNT)) %>%
pivot_longer(everything()) %>%
ggplot(aes(x=factor(value)))+
geom_histogram(stat = 'count',aes(fill=name),
position = position_dodge2(0.9,preserve = 'single'))+
labs(fill='Variable')+xlab('value')
Output:
Some data used:
#Data
df <- structure(list(ID = 1:4, WAB_R = c(0L, 1L, 0L, 1L), WAB_B = c(0L,
1L, 0L, 0L), BDAE = c(0L, 0L, 0L, 1L), PNT = c(0L, 0L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -4L))
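Since the question asks for the variables on the x-axis and the selection counts on the y-axis, another option might be to count the 1s per column first and plot those counts directly. This is only a sketch under that reading of the question; the names `vars`, `counts` and `freq` are mine.

```r
library(ggplot2)

# Same sample data as above, rebuilt so the snippet runs on its own
df <- data.frame(ID = 1:4,
                 WAB_R = c(0L, 1L, 0L, 1L),
                 WAB_B = c(0L, 1L, 0L, 0L),
                 BDAE  = c(0L, 0L, 0L, 1L),
                 PNT   = c(0L, 0L, 0L, 0L))

vars <- c("WAB_R", "WAB_B", "BDAE", "PNT")

# With 0/1 coding, the column sum is the number of times each item was selected
counts <- colSums(df[vars])

# Keep the original column order on the x-axis
freq <- data.frame(variable = factor(names(counts), levels = vars),
                   n = as.vector(counts))

p <- ggplot(freq, aes(x = variable, y = n)) +
  geom_col() +
  labs(x = "Variable", y = "Times selected")
p
```

Changing `vars` to another set of column names would give the plot for a different subset.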
This question already has answers here:
geom_smooth on a subset of data
Data: height was recorded daily.
I want to plot the height of my plants (Plant A1 - Z50)
in single plots, and I want to highlight the current year.
So I made a subset of each plant and a subset for the current year (2018).
Now I need a plot with the total record and the highlighted data from 2018.
dput(Plant)
structure(list(Name = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L), .Label = c("Plant A1", "Plant B1", "Plant C1"), class = "factor"),
Date = structure(c(1L, 4L, 5L, 7L, 1L, 4L, 6L, 1L, 2L, 3L
), .Label = c(" 2001-01-01", " 2001-01-02", " 2001-01-03",
" 2002-01-01", " 2002-02-01", " 2019-01-01", " 2019-12-31"
), class = "factor"), Height_cm = c(91, 106.1, 107.4, 145.9,
169.1, 192.1, 217.4, 139.8, 140.3, 140.3)), .Names = c("Name",
"Date", "Height_cm"), class = "data.frame", row.names = c(NA,
-10L))
library(dplyr)
library(ggplot2)
# Date is stored as a factor with leading spaces, so convert it first
Plant$Date <- as.Date(trimws(as.character(Plant$Date)))
Plant_A1 <- filter(Plant, Name == "Plant A1")
Current_Year <- 2018
Plant_A1_Subset <- filter(Plant_A1, format(Date, '%Y') == Current_Year)
ggplot(data=Plant_A1, aes(x=Date, y=Height_cm)) +
geom_point() +
geom_smooth(method="loess", level=0.95, span=1/2, color="red") +
labs(x="Date", y="Height cm")
Now I don't know how to put my new subset for 2018 (Plant_A1_Subset) into this graph.
As noted, this question has a duplicate with an answer in this question.
That said here's likely the most common way of handling your problem.
In ggplot2, every layer inherits the aesthetics passed to aes() in the ggplot() call, unless the layer overrides them. We can therefore solve your problem by adding an extra aesthetic in the aes() of geom_point. Below I've illustrated a simple way to achieve what you might be looking for.
Specify the aes argument in individual calls
The first method is likely the most intuitive. aes controls which variables are mapped to which plot properties. So if you want to colour certain points, one way is to give geom_point its own aes, separate from geom_smooth.
library(ggplot2)
library(lubridate) #for month(), year(), day() functions
current_year <- 2018
# Assumes Date has been converted with as.Date(); the height column is Height_cm
ggplot(data = Plant_A1, aes(x = Date, y = Height_cm)) +
#Note here, colour set in geom_point
geom_point(aes(col = ifelse(year(Date) == current_year, "Yes", "No"))) +
geom_smooth(method="loess", level=0.95,
span=1/2, color="red") +
labs(x="Date", y="Height cm",
col = "Current year?") #Specify legend title for colour
Note that I have relied on the inheritance of aes here. Simply put, aes looks up names within data and, if it finds them, uses them as variables. So there is no need to write data$... inside aes.
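If you would rather reuse a ready-made subset, as the question describes, a layer can also be given its own data argument. The sketch below uses made-up records (`plant_a1` and its values are hypothetical, not the asker's data) and draws the 2018 points on top of the full record.

```r
library(ggplot2)

# Hypothetical records for one plant; dates and heights are invented
plant_a1 <- data.frame(
  Date = as.Date(c("2016-06-01", "2017-06-01", "2018-03-01",
                   "2018-09-01", "2019-06-01")),
  Height_cm = c(91, 106.1, 107.4, 120.3, 145.9)
)

current_year <- 2018

# Rows whose year matches the current year
plant_a1_subset <- plant_a1[format(plant_a1$Date, "%Y") ==
                              as.character(current_year), ]

p <- ggplot(plant_a1, aes(x = Date, y = Height_cm)) +
  geom_point() +
  # This layer overrides the data but inherits x and y from ggplot()
  geom_point(data = plant_a1_subset, colour = "red", size = 3) +
  labs(x = "Date", y = "Height cm")
p
```

The same data argument works for geom_smooth, so a smoother could be fitted to the subset only.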
I have a dataframe mydf as shown below. I want to plot a stacked 3D plot (as in the sample plot below) where the A, C, G, T columns are represented by bars (years in the sample plot). The length column ranges from 18 to 34 for each file.name, and I want it on the Gas Type axis as in the example plot. Finally, I want the file.name column on the country axis as in the sample plot below. How can I get this plotted in R? Thanks for your help.
mydf <- structure(list(file.name = structure(c(1L, 1L, 2L, 2L, 4L, 4L
), .Label = c("merged_read_counts", "DCL1_VF_1_GAGTGG_L008_R1_001",
"DCL1_VF_2_GGTAGC_L008_R1_001", "DCL2_VF_1_ACTGAT_L008_R1_001",
"DCL2_VF_2_ATGAGC_L008_R1_001", "DCLd_SSHADV1_1_AGTTCC_L008_R1_001",
"DCLd_SSHADV1_2_ATGTCA_L008_R1_001", "DCLd_SSHADV1_3_CCGTCC_L008_R1_001",
"DCLd_SSHV2L_1_GTAGAG_L008_R1_001", "DCLd_SSHV2L_2_GTCCGC_L008_R1_001",
"DCLd_SSHV2L_3_GTGAAA_L008_R1_001", "DCLd_VF_1_GTGGCC_L008_R1_001",
"DCLd_VF_2_GTTTCG_L008_R1_001", "DCLd_VF_3_CGTACG_L008_R1_001",
"WT_SSHADV1_1_GGCTAC_L008_R1_001", "WT_SSHADV1_2_CTTGTA_L008_R1_001",
"WT_SSHADV1_3_AGTCAA_L008_R1_001", "WT_SSHV2L_1_GCCAAT_L008_R1_001",
"WT_SSHV2L_2_CAGATC_L008_R1_001", "WT_SSHV2L_3_ACTTGA_L008_R1_001",
"WT_SSHV2L_4_GATCAG_L008_R1_001", "WT_SSHV2L_5_TAGCTT_L008_R1_001",
"WT_VF_1_ATCACG_L008_R1_001", "WT_VF_2_CGATGT_L008_R1_001", "WT_VF_3_TTAGGC_L008_R1_001",
"WT_VF_4_TGACCA_L008_R1_001", "WT_VF_5_ACAGTG_L008_R1_001"), class = "factor"),
length = c(18L, 19L, 18L, 19L, 18L, 19L), A = c(2294436L,
2588528L, 52104L, 47190L, 103378L, 59269L), C = c(1501040L,
2838174L, 35888L, 93922L, 38132L, 31912L), G = c(2106623L,
1714702L, 80765L, 64930L, 129040L, 161517L), T = c(5065628L,
7462881L, 62174L, 87905L, 274783L, 110125L)), .Names = c("file.name",
"length", "A", "C", "G", "T"), row.names = c(1L, 2L, 18L, 19L,
58L, 59L), class = "data.frame")
I don't think 3D bar plots are available in ggplot2 (see the question "ggplot2 3D Bar Plot").
I would suggest grouped, stacked barplots instead. Something like:
library(ggplot2)
library(reshape2)
df <- melt(mydf,id.vars=c("file.name","length"))
ggplot(df, aes(x = file.name, y = value, fill = variable)) +
geom_bar(stat = 'identity', position = 'stack') + facet_grid(~ length)
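To show what melt() hands to ggplot here, this is a small sketch on two rows copied from the question's data (the name `mydf_small` is mine). Each of the A, C, G, T columns becomes a block of rows, with the column name in `variable` and the count in `value`.

```r
library(reshape2)

# First two rows of the question's data, rebuilt by hand
mydf_small <- data.frame(
  file.name = c("merged_read_counts", "merged_read_counts"),
  length = c(18L, 19L),
  A = c(2294436L, 2588528L),
  C = c(1501040L, 2838174L),
  G = c(2106623L, 1714702L),
  T = c(5065628L, 7462881L)
)

# Wide to long: one row per file.name / length / nucleotide combination
long <- melt(mydf_small, id.vars = c("file.name", "length"))
long
```

That long layout is what lets fill = variable stack the four nucleotides within each bar.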
I am building a quantile-quantile plot out of a variable called x from a data frame called df in the working example provided below. I would like to label the points with the name variable of my df dataset.
Is it possible to do this in ggplot2 without resorting to the painful solution (coding the theoretical distribution by hand and then plotting it against the empirical one)?
Edit: it happens that yes, thanks to a user who posted and then deleted his answer. See the comments after Arun's answer below. Thanks to Didzis for his otherwise clever solution with ggbuild.
# MWE
df <- structure(list(name = structure(c(1L, 2L, 3L, 4L, 5L, 7L, 9L,
10L, 6L, 12L, 13L, 14L, 15L, 16L, 17L, 19L, 18L, 20L, 21L, 22L,
8L, 23L, 11L, 24L), .Label = c("AUS", "AUT", "BEL", "CAN", "CYP",
"DEU", "DNK", "ESP", "FIN", "FRA", "GBR", "GRC", "IRL", "ITA",
"JPN", "MLT", "NLD", "NOR", "NZL", "PRT", "SVK", "SVN", "SWE",
"USA"), class = "factor"), x = c(-0.739390016757746, 0.358177826874146,
1.10474523846099, -0.250589535389937, -0.423112615445571, -0.862144579740376,
0.823039669834058, 0.079521521937704, 1.08173649722493, -2.03962942823921,
1.05571087029737, 0.187147291278723, -0.144770773941437, 0.957990771847331,
-0.0546549555439176, -2.70142550075757, -0.391588386498849, -0.23855544527369,
-0.242781575907386, -0.176765072121165, 0.105155860923456, 2.69031085872414,
-0.158320176671995, -0.564560815972446)), .Names = c("name",
"x"), row.names = c(NA, -24L), class = "data.frame")
library(ggplot2)
qplot(sample = x, data = df) + geom_abline(linetype = "dotted") + theme_bw()
# ... using names instead of points would allow to spot the outliers
I am working on an adaptation of this gist, and will consider sending other questions to CrossValidated if I have questions about the regression diagnostics, which might be of interest to CV users.
You can save your original QQ plot as an object (using ggplot() and stat_qq() instead of qplot()):
g<-ggplot(df, aes(sample = x)) + stat_qq()
Then, with ggplot_build(), you can extract the data used for plotting. They are stored in element data[[1]]. Save those data as a new data frame.
df.new<-ggplot_build(g)$data[[1]]
head(df.new)
x y sample theoretical PANEL group
1 -2.0368341 -2.7014255 -2.7014255 -2.0368341 1 1
2 -1.5341205 -2.0396294 -2.0396294 -1.5341205 1 1
3 -1.2581616 -0.8621446 -0.8621446 -1.2581616 1 1
4 -1.0544725 -0.7393900 -0.7393900 -1.0544725 1 1
5 -0.8871466 -0.5645608 -0.5645608 -0.8871466 1 1
6 -0.7415940 -0.4231126 -0.4231126 -0.7415940 1 1
Now you can add the names of the observations to the new data frame. It is important to use order(), because the data in the new data frame are sorted by sample value.
df.new$name<-df$name[order(df$x)]
Now plot the new data frame as usual, using geom_text() instead of geom_point().
ggplot(df.new,aes(theoretical,sample,label=name))+geom_text()+
geom_abline(linetype = "dotted") + theme_bw()
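For comparison, the theoretical quantiles can also be computed by hand in one line. As far as I know stat_qq() uses qnorm(ppoints(n)) for the default normal distribution, which agrees with the first theoretical value in the output above, but treat that as an assumption. The data frame `d` below is made up for illustration.

```r
library(ggplot2)

# Hypothetical labelled sample; not the data from the question
set.seed(1)
d <- data.frame(name = LETTERS[1:10], x = rnorm(10))

# Sort by the sample values so the labels stay attached to their points
d <- d[order(d$x), ]

# Normal quantiles at the plotting positions
d$theoretical <- qnorm(ppoints(nrow(d)))

p <- ggplot(d, aes(theoretical, x, label = name)) +
  geom_text() +
  geom_abline(linetype = "dotted") +
  theme_bw()
p
```

Sorting the whole data frame, rather than only x, is what keeps each name on the right point.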
The points are too close together for labels. I would do something like this:
df <- df[with(df, order(x)), ]
df$t <- quantile(rnorm(1000), seq(0, 100, length.out = nrow(df))/100)
p <- ggplot(data = df, aes(x=t, y=x)) + geom_point(aes(colour=name))
This gives:
If you insist on having labels inside the plot, then, you could try something like:
df <- df[with(df, order(x)), ]
df$t <- quantile(rnorm(1000), seq(0, 100, length.out = nrow(df))/100)
p <- ggplot(data = df, aes(x=t, y=x)) + geom_point(aes(colour=name))
p <- p + geom_text(aes(x=t-0.05, y=x-0.15, label=name, size=1, colour=name))
p
You can play around with the x and y coordinates and if you want you can always remove the colour aesthetics.
@Arun has a good solution in the comment above, but this works with R 4.0.3:
library(ggrepel) # provides geom_text_repel()
ggplot(data = df, aes(sample = x)) + geom_qq() + geom_text_repel(label=df$name[order(df$x)], stat="qq") + stat_qq_line()
Basically the same thing, with addition of stat_qq_line() and [order(df$x)] as part of the label. If you don't include the order function then your labels will be all out of order and very misleading.
Here's hoping this saves someone else some hours of their life.