Have an assignment where we need to provide one-dimensional graphs for EDA but the sample code given answers most of the requirements already (simple scatter and box plots and a histogram) so I am trying to "spice it up" a little by creating some more interesting graphs. Only need a couple.
The data set is the twin IQ data across several studies/authors, and I wanted to do a back-to-back histogram of the twins separated by author. So far I can do an overlay of authors, or the back-to-back of the twins, using ggplot, but I am stuck when trying to separate these into either 4 graphs or overlaid back-to-backs.
The code I was using for the overlay was ggplot with either geom_density or geom_histogram and the code for the back-to-back came from R-Bloggers and I used the first snippet:
ggplot(df, aes(IQ)) + geom_histogram(aes(x = x1, y = ..density..), fill = "blue") + geom_histogram( aes(x = x2, y = -..density..), fill = "green")
What I am looking for is a way to combine these two techniques, or a way to get ggplot to split the graphs up by factor in much the same way as plot/lattice does when you do, for example:
bwplot(y~x1.x2|Author, data=df)
The snippet that I am using to achieve separate plots includes facet_grid() such that the final code is:
ggplot(df, aes(y)) + facet_grid(~Author) + geom_histogram(aes(x = x1, y = ..density..), fill = "green") + geom_histogram(aes(x = x2, y = -..density..), fill = "blue")
I wasn't previously aware of the facet_grid() function of ggplot so thank you very much to MLavoie and Brandon Bertelsen.
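A minimal self-contained sketch of that final approach, assuming a hypothetical data frame `twins` with columns `x1` and `x2` (the two twins' IQ scores) and `Author`. The values are simulated and are not the twin IQ data set. It uses `after_stat(density)`, which newer ggplot2 versions (3.3.0 and later) prefer over `..density..`, and `facet_wrap()` to get a 2 x 2 layout.

```r
library(ggplot2)

# Simulated stand-in for the twin IQ data (hypothetical values)
set.seed(1)
twins <- data.frame(
  Author = rep(c("A", "B", "C", "D"), each = 50),
  x1 = rnorm(200, mean = 100, sd = 15),
  x2 = rnorm(200, mean = 100, sd = 15)
)

# Twin 1 is drawn above the axis, twin 2 is mirrored below it,
# and facet_wrap() gives one panel per author
p_twins <- ggplot(twins) +
  geom_histogram(aes(x = x1, y = after_stat(density)), bins = 20, fill = "green") +
  geom_histogram(aes(x = x2, y = -after_stat(density)), bins = 20, fill = "blue") +
  facet_wrap(~Author) +
  labs(x = "IQ", y = "density")

p_twins
```

Because the density is computed within each panel, the heights are comparable across authors even if the studies differ in size.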
I'm very new to R and I have to deal with quite big datasets from previous work. In those earlier studies the person worked in Excel, and I have to adapt everything to R.
Especially, I did 2 simple linear regressions. To simplify, the first one represents Y as a function of X from one dataframe, let's call it My_Data_1, and the second one is Y' as a function of X, with Y' a variable in the dataframe My_Data_2. In other words, X,Y and Y' come from different dataframes (among many other variables).
I'd like to compare the two regressions by plotting them on a single graph using ggplot2.
However, I don't know how to proceed because the dataframe is specified in the ggplot() call, such as:
ggplot(data = My_Data_1, aes(x=X, y=Y)) +
geom_point() + etc...
I tried to put only x in the ggplot() and to put Y and Y' in geom_point() but it doesn't solve anything: Y' is unknown in this case because only one dataframe is imported in ggplot.
I didn't find a solution. One option would be to create a new table, but I'd like to know if there is another way to do it.
I hope this was clear enough... Thanks in advance for your help!
You can use multiple layers inside one plot. For example, you could use two different geom_smooth() layers. Each layer can be built from different data.
For example your first layer could look like this:
geom_smooth(data= df1, aes(x=x, y=y),method ="lm", se = FALSE, color = "blue")
And your second layer could look like this:
geom_smooth(data= df2, aes(x=x, y=y),method ="lm", se = FALSE, color = "red")
Same goes for your geom_point() layer.
In the end you can piece it together with +.
ggplot()+
geom_smooth(data= df1, aes(x=x, y=y),method ="lm", se = FALSE, color = "blue") +
geom_smooth(data= df2, aes(x=x, y=y),method ="lm", se = FALSE, color = "red") +
labs(title= "Two Regression Lines", x = "x Value", y = "Y Value")
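Putting the pieces together as a complete sketch, with both points and fitted lines drawn from two separate data frames. Here `df1` and `df2` are simulated stand-ins (an assumption for illustration) for `My_Data_1` and `My_Data_2`, and the legend labels are hypothetical.

```r
library(ggplot2)

# Simulated stand-ins for the two data frames (hypothetical values)
set.seed(42)
df1 <- data.frame(x = 1:30, y = 2 + 0.5 * (1:30) + rnorm(30))
df2 <- data.frame(x = 1:30, y = 5 + 0.2 * (1:30) + rnorm(30))

# x and y are mapped once; each layer brings its own data.
# Mapping colour to a constant string inside aes() produces a legend.
p_reg <- ggplot(mapping = aes(x = x, y = y)) +
  geom_point(data = df1, aes(colour = "Y (My_Data_1)")) +
  geom_smooth(data = df1, aes(colour = "Y (My_Data_1)"),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_point(data = df2, aes(colour = "Y' (My_Data_2)")) +
  geom_smooth(data = df2, aes(colour = "Y' (My_Data_2)"),
              method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Two Regression Lines", x = "X", y = "Response", colour = NULL)

p_reg
```

No merged table is needed; the only requirement is that both data frames use the same column names for the mapped aesthetics, or that each layer supplies its own `aes()`.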
I am currently writing a theoretical article where no data is used and unfortunately I must say that I find ggplot hard to use in such applications for showing theoretical examples. I've been using ggplot for years on real, empirical data and there I liked ggplot very much. However, consider my current case. I am trying to plot two exponential functions together on a graph. One function is 0.5^x and the other one is 0.8^x. In order to produce a ggplot graph, I have to do the following:
x <- 1:20
a <- 0.5^x
b <- 0.8^x
data.frame(x, a, b) %>%
pivot_longer(c(a, b)) %>%
ggplot(aes(x = x, y = value, color = name, group = name))+
geom_line()
This doesn't correspond at all to how I think about creating such a graph, mainly because of having to convert the data to long format in order to group it.
In my head, I am creating two simple, but distinct curves on the same canvas. So I should be able to use something like:
qplot(x, 0.5^x, geom = "line")+
qplot(x, 0.8^x, geom = "line")
However, that doesn't work because
Can't add `qplot(x, 0.8^x, geom = "line")` to a ggplot object.
Any help with how to create such a simple graph without reshaping the data would be appreciated, thanks.
Using geom_function you could do:
library(ggplot2)
ggplot() +
geom_function(fun = ~ 0.5^.x, mapping = aes(color = "a")) +
geom_function(fun = ~ 0.8^.x, mapping = aes(color = "b")) +
xlim(1, 20)
Created on 2022-05-08 by the reprex package (v2.0.1)
Maybe something like this. It is possible to keep the data in wide format, but generally it is better to convert it to long format:
library(ggplot2)
x <- 1:20  # same x as in the question
ggplot()+
geom_line(aes(x, 0.5^x, color="red"))+
geom_line(aes(x, 0.8^x, color = "blue"))+
scale_color_identity()
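If the curves are already columns of a wide data frame, another possible sketch is to add one `geom_line()` per column and map colour to a constant label, so that a legend is still produced. The data frame name `curves` and the legend labels are assumptions for illustration.

```r
library(ggplot2)

# Wide format: one column per curve, no reshaping
curves <- data.frame(x = 1:20)
curves$a <- 0.5^curves$x
curves$b <- 0.8^curves$x

# One layer per column; the constant strings become legend entries
p_curves <- ggplot(curves, aes(x = x)) +
  geom_line(aes(y = a, colour = "0.5^x")) +
  geom_line(aes(y = b, colour = "0.8^x")) +
  labs(y = "value", colour = "function")

p_curves
```

This stays manageable for two or three curves; with many columns the long format scales better because a single layer handles all of them.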
Hi I am trying to code for a scatter plot for three variables in R:
Race= [0,1]
YOI= [90,92,94]
ASB_mean = [1.56, 1.59, 1.74]
Antisocial <- read.csv(file = 'Antisocial.csv')
Table_1 <- ddply(Antisocial, "YOI", summarise, ASB_mean = mean(ASB))
Table_1
Race <- unique(Antisocial$Race)
Race
ggplot(data = Table_1, aes(x = YOI, y = ASB_mean, group_by(Race))) +
geom_point(colour = "Black", size = 2) + geom_line(data = Table_1, aes(YOI,
ASB_mean), colour = "orange", size = 1)
Image of plot: https://drive.google.com/file/d/1E-ePt9DZJaEr49m8fguHVS0thlVIodu9/view?usp=sharing
Data file: https://drive.google.com/file/d/1UeVTJ1M_eKQDNtvyUHRB77VDpSF1ASli/view?usp=sharing
Can someone help me understand where I am making a mistake? I want to plot mean ASB vs YOI grouped by Race. Thanks.
I am not sure what your desired output is. If I understood your question correctly, I think you want something like this.
library(tidyverse)  # dplyr (%>%, group_by, summarise), ggplot2, forcats (as_factor)
g_Antisocial <- Antisocial %>%
group_by(Race) %>%
summarise(ASB = mean(ASB),
YOI = mean(YOI))
Antisocial %>%
ggplot(aes(x = YOI, y = ASB, color = as_factor(Race), shape = as_factor(Race))) +
geom_point(alpha = .4) +
geom_point(data = g_Antisocial, size = 4) +
theme_bw() +
guides(color = guide_legend("Race"), shape = guide_legend("Race"))
and this is the output:
@Maninder: there are a few things you need to look at.
First of all: The grammar of graphics of ggplot() works with layers. You can add layers with different data (frames) for the different geoms you want to plot.
The reason your code is not working is that you mix the layer calls and do not clearly specify which data go into the scatter layer and which into the line layer.
(I) Use ggplot() + geom_point() for a scatter plot
The ultimate first layer is: ggplot(). Think of this as your drawing canvas.
You then speak about adding a scatter plot layer, but you actually do not do it.
For example:
# plotting Antisocial data set
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race)))
will plot your Antisocial data set using the scatter layer, i.e. geom_point().
Note that I converted Race to a factor to get a categorical colour scheme; otherwise you might end up with a continuous palette.
(II) line plot
In analogy to above, you would get for the line plot the following:
# plotting Table_1
ggplot() +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean))
I'll skip showing the plot of the line.
(III) combining different layers
# putting both together
ggplot() +
geom_point(data = Antisocial, aes(x = YOI, y = ASB, colour = as.factor(Race))) +
geom_line(data = Table_1, aes(x = YOI, y = ASB_mean)) +
## this is to set the legend title and have a nice(r) name in your colour legend
labs(colour = "Race")
This yields:
That should explain how ggplot layering works. Keep an eye on the datasets and geoms that you want to use. Before working with inheritance in aes, I recommend keeping the data= and aes() calls inside the geom_xxxx. This avoids confusion.
You may want to explore with geom_jitter() instead of geom_point() to get a bit of a better presentation of your dataset. The "few" points plotted are the result of many datapoints in the same position (and overplotted).
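The overplotting mentioned above can be reduced with a sketch like the following. Since the real file is not reproduced here, `asb_demo` is a simulated stand-in (an assumption for illustration) with the same column names as `Antisocial`; the jitter widths and the alpha value are arbitrary choices.

```r
library(ggplot2)

# Simulated stand-in for the Antisocial data (hypothetical values)
set.seed(123)
asb_demo <- data.frame(
  YOI  = sample(c(90, 92, 94), 300, replace = TRUE),
  ASB  = sample(0:6, 300, replace = TRUE),
  Race = sample(c(0, 1), 300, replace = TRUE)
)

# geom_jitter() adds a small random offset so coinciding points become visible
p_jitter <- ggplot() +
  geom_jitter(data = asb_demo,
              aes(x = YOI, y = ASB, colour = as.factor(Race)),
              width = 0.3, height = 0.2, alpha = 0.5) +
  labs(colour = "Race")

p_jitter
```

Keeping `width` well below the spacing between YOI values (2 here) keeps the three year groups visually separate.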
Moving away from plotting to your question "I want to plot mean ASB vs YOI grouped by Race."
I know too little about your research to fully comprehend what you mean with that.
I take it that the mean ASB you calculated over the whole population is your reference (aka your Table_1), and you would like to see how the Race groups feature vs this population mean.
One option is to group your race data points and show them as boxplots for each YOI.
This might be what you want. The boxplot gives you the median and quartiles, and you can compare this per group against the calculated ASB mean.
For presentation purposes, I highlighted the line by increasing its size and linetype. You can play around with the colours, etc. to give you the aesthetics you aim for.
Please note that for the grouped boxplot you also have to treat your integer variable YOI as categorical, so I coerced it into a factor. Boxplots use fill for the body (colour only sets the outline). In this setup you also need to supply a group value to geom_line() (I just assigned it to 1, but that is arbitrary; in other contexts you can assign another variable here).
ggplot() +
geom_boxplot(data = Antisocial, aes(x = as.factor(YOI), y = ASB, fill = as.factor(Race))) +
geom_line(data = Table_1, aes(x = as.factor(YOI), y = ASB_mean, group = 1)
, size = 2, linetype = "dashed") +
labs(x = "YOI", fill = "Race")
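If "mean ASB vs YOI grouped by Race" is meant literally, another reading is one mean per YOI/Race combination, drawn as one line per race. The following is only a sketch of that interpretation; `asb_demo` is a simulated stand-in (an assumption for illustration) with the same column names as `Antisocial`.

```r
library(ggplot2)

# Simulated stand-in for the Antisocial data (hypothetical values)
set.seed(123)
asb_demo <- data.frame(
  YOI  = sample(c(90, 92, 94), 300, replace = TRUE),
  ASB  = sample(0:6, 300, replace = TRUE),
  Race = sample(c(0, 1), 300, replace = TRUE)
)

# Mean ASB for every YOI/Race combination (base R, no extra packages)
race_means <- aggregate(ASB ~ YOI + Race, data = asb_demo, FUN = mean)

# group = Race draws a separate line for each race
p_means <- ggplot(race_means,
                  aes(x = YOI, y = ASB, colour = as.factor(Race), group = Race)) +
  geom_line() +
  geom_point(size = 3) +
  labs(y = "mean ASB", colour = "Race")

p_means
```

The key difference from the original attempt is that the summary is computed by both YOI and Race before plotting, and the grouping is expressed through `group`/`colour` inside `aes()`.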
Hope this gets you going!
I'm currently working on a very simple data.frame, containing three columns:
x contains x-coordinates of a set of points,
y contains y-coordinates of the set of points, and
weight contains a value associated to each point;
Now, working in ggplot2 I seem to be able to plot contour levels for these data, but i can't manage to find a way to fill the plot according to the variable weight. Here's the code that I used:
ggplot(df, aes(x,y, fill=weight)) +
geom_density_2d() +
coord_fixed(ratio = 1)
You can see that there's no filling whatsoever, sadly.
I've been trying for three days now, and I'm starting to get depressed.
Specifying fill=weight and/or color = weight in the general ggplot call, resulted in nothing. I've tried to use different geoms (tile, raster, polygon...), still nothing. Tried to specify the aes directly into the geom layer, also didn't work.
Tried to convert the object as a ppp but ggplot can't handle them, and also using base-R plotting didn't work. I have honestly no idea of what's wrong!
I'm attaching the first 10 points' data, which is spaced on an irregular grid:
x = c(-0.13397460,-0.31698730,-0.13397460,0.13397460,-0.28867513,-0.13397460,-0.31698730,-0.13397460,-0.28867513,-0.26794919)
y = c(-0.5000000,-0.6830127,-0.5000000,-0.2320508,-0.6547005,-0.5000000,-0.6830127,-0.5000000,-0.6547005,0.0000000)
weight = c(4.799250e-01,5.500250e-01,4.799250e-01,-2.130287e+12,5.798250e-01,4.799250e-01,5.500250e-01,4.799250e-01,5.798250e-01,6.618956e-01)
Any advice? The desired output was shown in a linked image.
Thank you in advance.
From your description geom_density doesn't sound right.
You could try geom_raster:
ggplot(df, aes(x,y, fill = weight)) +
geom_raster() +
coord_fixed(ratio = 1) +
scale_fill_gradientn(colours = rev(rainbow(7))) # colourmap
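Here is a second-best option using fill = ..level.. with stat_density_2d(). Note that this fills by the estimated density of the points, not by weight. There is a good explanation on ..level.. here.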
# load libraries
library(ggplot2)
library(RColorBrewer)
library(ggthemes)
# build your data.frame
df <- data.frame(x=x, y=y, weight=weight)
# build color Palette
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")), space="Lab")
# Plot
ggplot(df, aes(x,y, fill=..level..) ) +
stat_density_2d( bins=11, geom = "polygon") +
scale_fill_gradientn(colours = myPalette(11)) +
theme_minimal() +
coord_fixed(ratio = 1)
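If the goal is to fill by `weight` itself rather than by point density, one possible sketch is `stat_summary_2d()`, which bins the x/y plane and fills each bin with a summary (here the mean) of the `z` aesthetic. The data frame `pts` below is simulated (an assumption for illustration, not the asker's data), and the bin count is an arbitrary choice.

```r
library(ggplot2)

# Simulated irregularly spaced points with a hypothetical smooth weight surface
set.seed(7)
pts <- data.frame(x = runif(500, -1, 1), y = runif(500, -1, 1))
pts$weight <- exp(-(pts$x^2 + pts$y^2))

# Each rectangular bin is filled with the mean weight of the points inside it
p_weight <- ggplot(pts, aes(x = x, y = y, z = weight)) +
  stat_summary_2d(fun = mean, bins = 15) +
  scale_fill_gradientn(colours = rev(rainbow(7))) +
  coord_fixed(ratio = 1) +
  labs(fill = "mean weight")

p_weight
```

Because the points are binned, this does not require a regular grid; fewer bins give a smoother but coarser picture, and empty bins are simply left blank.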
I'm struggling with facet_wrap in R. It should be simple however the facet variable is not being picked up? Here is what I'm running:
plot = ggplot(data = item.household.descr.count, mapping = aes(x=item.household.descr.count$freq, y = item.household.descr.count$descr, color = item.household.descr.count$age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
I colored the faceting variable to try to help illustrate what is going on. The plot should have only one color in each facet instead of what you see here. Does anyone know what is going on?
This error is caused by the fact that you are using $ and the data frame name to refer to your variables inside aes(). With ggplot() you should use only variable names in aes(), because the data frame is already named in data=.
plot = ggplot(data = item.household.descr.count,
mapping = aes(x=freq, y = descr, color = age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
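Here is an example using the diamonds dataset. The first call uses $ inside aes() and facets incorrectly; the second uses bare column names and facets correctly.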
diamonds2<-diamonds[sample(nrow(diamonds),1000),]
ggplot(diamonds2,aes(diamonds2$carat,diamonds2$price,color=diamonds2$color))+geom_point()+
facet_wrap(~color)
ggplot(diamonds2,aes(carat,price,color=color))+geom_point()+
facet_wrap(~color)