I have a binary data frame in which each row holds the data for one user (size of the data frame: 90 rows * 65 cols). The last column contains the label for each user (4 labels: Excellent, Good, bad, fail).
My question is: how can I plot only one density curve per label? That is, my final plot should have only 4 curves (one curve for each label).
Thanks
I think that this question can be found here (Multiple Groups in geom_density() plot) so my answer is almost exactly the same.
The only difference is that I used mtcars with an extra column :
library(ggplot2)
test <- head(mtcars)
addcol <- c("great", "good", "bad", "great", "bad", "good")
test <- cbind(test, addcol)
ggplot() +
geom_density(data = test, aes(x = wt, group = addcol, color = addcol), adjust=2) +
xlab("wt") +
ylab("Density")
I have data saved in multiple datasets, each consisting of four variables. Imagine something like a data.table dt consisting of the variables Country, Male/Female, Birthyear, Weighted Average Income. I would like to create a graph where you see only one country's weighted average income by birthyear and split by male/female. I've used the facet_grid() function to get a grid of graphs for all countries as below.
ggplot() +
geom_line(data = dt,
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
However, I've tried isolating the graphs for just one country, but the below code doesn't seem to work. How can I subset the data correctly?
ggplot() +
geom_line(data = dt[Country == 'Germany'],
aes(x = Birthyear,
y = Weighted Average Income,
colour = 'Weighted Average Income'))+
facet_grid(Country ~ Male/Female)
For your specific case the problem is that you are not quoting Male/Female and Weighted Average Income. These names contain spaces or a slash, so they must be wrapped in backticks to be read as column names. Also, your data and basic aesthetics should likely be part of ggplot() rather than geom_line(). Putting them in geom_line() isolates them to that single layer, and you would have to repeat them in every layer of your plot if you were to add, for example, geom_smooth().
So to fix your problem you could do
library(tidyverse)
# Backticks quote the non-syntactic column names; inside aes(), !!sym("x") also works
plot <- ggplot(data = dt[Country == 'Germany'],
               aes(x = Birthyear,
                   y = `Weighted Average Income`,
                   col = `Weighted Average Income`)) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`) # the facet formula needs backticks as well
plot
Now ggplot2 actually has a (lesser known) builtin functionality for changing your data, so if you wanted to compare this to the plot with all of your countries included you could do:
plot %+% dt # `%+%` is used to change the data used by one or more layers. See help("+.gg")
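To make the quoting point concrete, here is a minimal, self-contained sketch. The `toy` data frame and its values are made up for illustration and only mimic the column names from the question; it is not the asker's data. It shows backticks and, as one possible alternative, the `.data` pronoun for the same mapping.

```r
library(ggplot2)

# Hypothetical toy data; check.names = FALSE keeps the spaces and the slash in the names
toy <- data.frame(
  Country = rep(c("Germany", "France"), each = 4),
  `Male/Female` = rep(c("Male", "Female"), times = 4),
  Birthyear = rep(c(1980, 1980, 1990, 1990), times = 2),
  `Weighted Average Income` = c(30, 28, 35, 33, 29, 27, 34, 31),
  check.names = FALSE
)
germany <- toy[toy$Country == "Germany", ]

# Backticks quote the non-syntactic names in aes() and in the facet formula
p <- ggplot(germany, aes(x = Birthyear, y = `Weighted Average Income`)) +
  geom_line() +
  facet_grid(Country ~ `Male/Female`)

# Same mapping with the .data pronoun, useful when the name is held in a string
p2 <- ggplot(germany, aes(x = Birthyear, y = .data[["Weighted Average Income"]])) +
  geom_line() +
  facet_grid(rows = vars(Country), cols = vars(`Male/Female`))
```

Both plots should show one row of panels for Germany with one column per value of `Male/Female`.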
I tried to plot the distribution of my test and train data set in a histogram and found something curious:
Background:
I have a test set with 50 rows and a training set with 100 rows each with the same column structure.
I'd normally plot the data like that:
plot2 <- ggplot(data=Donald_1) +
geom_histogram(aes_string(x = "Alter", y = "..count..", fill = "Group"),
bins=20, alpha=0.7)
which results in the right histogram shown below. I then wondered how it could be that test has a higher count than training as the test set is only 50 rows instead of 100. And it seems as if the test bars show the sum of the test and training bars of the left plot.
Then I tried:
plot1 <- ggplot() +
geom_histogram(data=Donald_1 %>% filter(Group == "Training"),
aes_string(x="Alter", y="..count..", fill = "Group"),
bins=20, alpha=0.7) +
geom_histogram(data=Donald_1 %>% filter(Group == "Test"),
aes_string(x="Alter", y="..count..", fill="Group"),
bins=20, alpha=0.7)
which results in the left plot shown below, and that result makes more sense to me.
I now wonder, why the first attempt doesn't result in the same plot as the second attempt. Am I missing something obvious here?
In your data frame, the column "Group" holds both values, Training and Test.
ggplot understands that you are drawing one histogram with two groups and, because geom_histogram() uses position = "stack" by default, it stacks the bars of the two groups on top of each other.
Your second plot draws two distinct histograms on the same grid, each starting at zero, and the transparency (alpha) lets you see both where they overlap.
You may also prefer this version, which places the bars side by side:
plot3 <- ggplot(data=Donald_1) +
geom_histogram(aes_string(x = "Alter", y = "..count..", fill = "Group"),
bins=20, alpha=0.7, position="dodge")
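The difference between the two plots in the question can be reproduced with simulated data. This is only a sketch: the data frame `d` is a hypothetical stand-in for Donald_1 (100 training rows, 50 test rows). geom_histogram() stacks groups by default, while position = "identity" overlays them in a single layer, which matches the two-layer plot.

```r
library(ggplot2)

set.seed(1)
# Hypothetical stand-in for Donald_1: 100 training rows and 50 test rows
d <- data.frame(
  Alter = c(rnorm(100, mean = 40, sd = 10), rnorm(50, mean = 40, sd = 10)),
  Group = rep(c("Training", "Test"), times = c(100, 50))
)

# Default position = "stack": the bars of one group sit on top of the other
p_stack <- ggplot(d, aes(x = Alter, fill = Group)) +
  geom_histogram(bins = 20, alpha = 0.7)

# position = "identity": every group starts at zero and the bars overlap
p_identity <- ggplot(d, aes(x = Alter, fill = Group)) +
  geom_histogram(bins = 20, alpha = 0.7, position = "identity")

# The computed bar tops show the difference without looking at the plots
max(layer_data(p_stack)$ymax)
max(layer_data(p_identity)$ymax)
```

In the stacked version the tallest bar is at least as high as in the overlaid version, because it is the sum of both groups in that bin.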
I am currently working with a big biological dataset with many data points. The head() function in R gives me the following column names:
intensity - Sample - Acession - Study - Dx
Intensity is the only data that is numeric. The others are character.
First, I unfactored all the data into the data frame unfactordata. Next, I want to make a scatterplot of a specific subset of the data, with a geom_smooth line between the groups. I use the following code:
scatplotprot <- function(name){
proteinname <- subset(unfactordata, Acession == name)
p <- ggplot(data = proteinname, aes(x = Dx, y = intensity, color = Study)) +
geom_point() +
geom_smooth(method = 'lm', aes(group = Dx))
return(p)
}
This does give me a scatterplot with all the intensity values for the 2 groups (Dx), coloured by the Study each data point originates from. However, it does not show a line between the two groups (Dx). Depending on which Acession I call, I expect to see between 3 and 8 lines.
I hope someone can help me clear up this hopefully small problem.
Warmest,
Patrick
I have 28 variables, M1, M2, ..., M28, for which I compute a certain statistic x.
df = data.frame(model = paste("M", 1:28, sep = ""), x = runif(28, 1, 1.05))
levels = seq(0.8, 1.2, 0.05)
I would like to plot this data as follows:
Each circle (contour) represents a level of that statistic "x". The three blue lines simply represent three different scenarios.
The dataframe included in this example represents one scenario. The blue line would simply join the values of all the models M1 to M28 for that specific scenario.
Is there any tool in R that allows for such a plot? I tried contour() from library(MASS) but the contours are not drawn as perfect circles.
Any help would be appreciated. Thanks!
Here is a ggplot solution:
library(ggplot2)
ggplot(data=df, aes(x=model, y=x, group=1)) +
geom_line() + coord_polar() +
scale_y_continuous(limits=range(levels), breaks=levels, labels=levels)
Note this is a little confusing because of the names in your data frame. x is really the y variable here, and model the real x, so the graph scale label seems odd.
EDIT: I had to set your factor levels for model in the data frame so they plot in the correct order.
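The factor-level step mentioned in the EDIT is not shown in the answer, so here is one way it could look. This is a sketch under the assumption that the models should appear in numeric order M1 to M28; the set.seed() call is only added to make the example reproducible.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(model = paste("M", 1:28, sep = ""), x = runif(28, 1, 1.05))
levels <- seq(0.8, 1.2, 0.05)

# Character values are ordered alphabetically on the axis ("M1", "M10", "M11", ...),
# so fix the order explicitly with factor levels
df$model <- factor(df$model, levels = paste("M", 1:28, sep = ""))

p <- ggplot(data = df, aes(x = model, y = x, group = 1)) +
  geom_line() + coord_polar() +
  scale_y_continuous(limits = range(levels), breaks = levels, labels = levels)
```

With the levels set this way, the models are laid out around the circle as M1, M2, ..., M28 instead of in alphabetical order.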
I have a population and a sample of that population. I've made a few plots comparing them using ggplot2 and its faceting option, but it occurred to me that having the sample in its own facet will distort the population plots (however slightly). Is there a way to facet the plots so that all records are in the population plot, and just the sampled records in the second plot?
Matt,
If I understood your question properly - you want to have a faceted plot where one panel contains all of your data, and the subsequent facets contain only a subset of that first plot?
There's probably a cleaner way to do this, but you can create a new data.frame object with the appropriate faceting variable that corresponds to each subset. Consider:
library(ggplot2)
df <- data.frame(x = rnorm(100), y = rnorm(100), sub = sample(letters[1:5], 100, TRUE))
df2 <- rbind(
cbind(df, faceter = "Whole Sample")
, cbind(df[df$sub == "a" ,], faceter = "Subset A")
#other subsets go here...
)
qplot(x,y, data = df2) + facet_wrap(~ faceter)
Let me know if I've misunderstood your question.
-Chase