How to create a graph for the Boston dataset in R

My data is the Boston Housing dataset, and I want to produce a graphic that looks like this:
The code for the plot is in Python (unfortunately I only know R, not Python). Link to the code: kaggle.com/prasadperera/the-boston-housing-dataset
But instead of y='medv' I need y='crim' plotted against all the other variables, in order to find predictors that have an interesting association with crime.
I have tried to do this in R; my code is:
I'd really appreciate any help. Thanks!

One way to do something similar in ggplot is with faceting. Here's what that might look like:
library(ggplot2)
library(dplyr)
library(tidyr)
data(Boston, package = "MASS")
Boston %>%
  select(crim, lstat, indus, nox, ptratio, rm, tax, dis, age) %>%
  # reshape to long format: one row per (crim, predictor name, predictor value)
  gather(obs, val, -crim) %>%
  ggplot(aes(val, crim, color = obs, fill = obs)) +
  geom_smooth(method = "lm", se = TRUE) +
  geom_point() +
  facet_wrap(~obs, scales = "free_x") +
  # guide = "none" hides the legends (guide = FALSE is deprecated in recent ggplot2)
  scale_color_discrete(guide = "none") +
  scale_fill_discrete(guide = "none")
The key is to reshape the data into long format so you can draw a single plot and facet it, rather than drawing many plots and arranging them.
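Since the goal is to find predictors associated with crime, a numeric summary can complement the faceted plot. The following is a minimal sketch, not part of the original answer: it computes the Pearson correlation of crim with every other column of Boston and sorts by absolute value. The names crim_cor and crim_cor_sorted are just illustrative, and Pearson correlation only captures linear association, so the plots are still worth reading.

```r
data(Boston, package = "MASS")

# correlation of every column with crim (all Boston columns are numeric)
crim_cor <- cor(Boston)[, "crim"]

# drop crim's correlation with itself
crim_cor <- crim_cor[names(crim_cor) != "crim"]

# strongest associations first, regardless of sign
crim_cor_sorted <- crim_cor[order(abs(crim_cor), decreasing = TRUE)]
round(crim_cor_sorted, 2)
```

The variables at the top of this list are reasonable candidates to include in the select() call above.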

Related

Two regression lines in one plot?

I have two cor.test results that I would like to visualize in one plot using geom_smooth. So far I have working code for one regression line, but I don't know how to add a second regression line to the same plot. This is my code so far:
cor(opg6wave1$godimportant, opg6wave1$aj, use = 'complete.obs')  # -0.309117
cor(opg6wave6$godimportant, opg6wave6$aj, use = 'complete.obs')  # -0.4321519
ggplot(opg6wave1, aes(x = godimportant, y = aj)) +
  geom_smooth() +
  labs(title = "Religion og abort over tid", x = 'Religiøsitet', y = 'Holdning til abort') +
  theme_classic()
Thank y'all:)
I don't have access to your dataset (you might want to share it), so I'm using the diamonds dataset that ships with ggplot2, which is loaded as part of the tidyverse. When you put the dataset in the ggplot(...) call, every geom_... layer inherits it. Here you want different data for each regression line, so leave ggplot() empty and pass the data to each geom_smooth() separately.
library(tidyverse)
ggplot() +
  geom_smooth(data = diamonds %>% filter(color == "E"),
              mapping = aes(x = depth, y = price)) +
  geom_smooth(data = diamonds %>% filter(color == "J"),
              mapping = aes(x = depth, y = price)) +
  theme_classic()
The same plot with a linear-model smooth:
ggplot() +
  geom_smooth(data = diamonds %>% filter(color == "E"),
              mapping = aes(x = depth, y = price),
              method = lm) +
  geom_smooth(data = diamonds %>% filter(color == "J"),
              mapping = aes(x = depth, y = price),
              method = lm) +
  theme_classic()
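If the two lines need to be told apart, another option is to stack the two data frames with a label column and map that label to color. This is a sketch that goes beyond the answer above: the diamonds subsets stand in for the two survey waves, and the names wave1, wave6, both and wave are placeholders.

```r
library(dplyr)
library(ggplot2)

# stand-ins for the two data frames (e.g. opg6wave1 and opg6wave6)
wave1 <- diamonds %>% filter(color == "E")
wave6 <- diamonds %>% filter(color == "J")

# stack them; .id adds a column recording which data frame each row came from
both <- bind_rows(wave1 = wave1, wave6 = wave6, .id = "wave")

p <- ggplot(both, aes(x = depth, y = price, color = wave)) +
  geom_smooth(method = "lm") +  # one regression line per wave
  theme_classic()
p
```

With a single data frame the legend is generated automatically, which is harder to get when each layer has its own data.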

Scatterplot using ggplot

I need to create a scatterplot of count vs. depth of 12 species using ggplot.
This is what I have so far:
library(ggplot2)
ggplot(data = ReefFish, mapping = aes(count, depth))
However, how do I use geom_point(), geom_smooth(), and facet_wrap() to add a smoother and include just the 12 species I want from the data (ReefFish)? I believe what I have right now includes all species in the data.
Here is an example of part of my data:
Since I don't have access to the ReefFish data set, here's an example using the built-in mpg data set about cars. To make it work with your data set, just edit this code to replace manufacturers with species.
Filter the data
First we filter the data so that it only includes the species/manufacturers we're interested in.
# load our packages
library(ggplot2)
library(magrittr)
library(dplyr)
# set up a character vector of the manufacturers we're interested in
manufacturers <- c("audi", "nissan", "toyota")
# filter our data set to only include the manufacturers we care about
mpg_filtered <- mpg %>%
  filter(manufacturer %in% manufacturers)
Plot the data
Now we plot. Your code was just about there! You just needed to add the plot elements you wanted, like so:
mpg_filtered %>%
  ggplot(mapping = aes(x = cty,
                       y = hwy)) +
  geom_point() +
  geom_smooth() +
  facet_wrap(~manufacturer)
Hope that helps, and let me know if you have any issues.

How can I use column labels as Y axis in ggplot?

Hello,
I have a dataset structured as shown in the link above. I am extremely new to R, and this is probably super easy to do, but I cannot figure out how to plot this dataset using ggplot...
Could anyone guide me and give me some hints?
I basically want to color the lines according to socioeconomic level and show each year's value...
You need to reshape your data to long format before plotting it with ggplot.
library(reshape)
library(dplyr)
library(ggplot2)
df_long <- melt(df)  # reshape the data frame to a long format
df_long %>%
  ggplot(aes(x = variable, y = value, group = group, color = group)) +
  geom_line()
Note: You will get better answers if you post your code with a reproducible dataset.
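Because the original data is not shown, here is a minimal sketch with a made-up data frame: one row per socioeconomic level and one column per year. The column names (level, 2018, 2019, 2020) and all values are invented for illustration. It uses tidyr::pivot_longer(), a newer alternative to melt(), to gather the year columns into a single year column.

```r
library(tidyr)
library(ggplot2)

# hypothetical wide data: one column per year (values invented)
df <- data.frame(
  level = c("low", "middle", "high"),
  `2018` = c(10, 20, 30),
  `2019` = c(12, 21, 33),
  `2020` = c(15, 25, 31),
  check.names = FALSE  # keep the year column names as they are
)

# wide -> long: the year column names become values of a `year` column
df_long <- pivot_longer(df, cols = -level, names_to = "year", values_to = "value")
df_long$year <- as.integer(df_long$year)

# one line per socioeconomic level, colored by level
p <- ggplot(df_long, aes(x = year, y = value, group = level, color = level)) +
  geom_line()
p
```

Converting year to a number puts it on a continuous axis; leaving it as character would also work with the group aesthetic shown here.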

How to make all lines in count plot start from the control group

I am trying to make a count plot from RNA-seq data for individual genes. I am only interested in the comparisons between each treatment and the control group, and my data are paired, so I'm trying to show this. I have managed to make the graph on the left (Counts of single gene) by using the plotCounts function of DESeq2 and then modifying the graph a bit. The code is the following:
data <- plotCounts(dds, gene = "GB41122", intgroup = c("Treatment", "Home", "Behaviour"), returnData = TRUE)
data <- ggplot(data, aes(x = Treatment, y = count, shape = Behaviour, color = Home, group = Home)) +
  scale_y_log10() +
  geom_point() + geom_line()
How could this be modified so that the graph looks like the one to the right?
Also, how can I reorder the treatment levels so that I have ctr to the left, then CO1 and CO2 to the right?
Thank you!
Andrea
I don't know how to change the lines, but to reorder the treatment levels, try adding this to your code:
+ scale_x_discrete(limits = c("Ctr", "CO1", "CO2"))
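An alternative sketch for the reordering is to set the factor levels in the data itself, so that every plot built from it uses that order. It assumes the treatment labels are exactly "Ctr", "CO1" and "CO2". The toy data frame below is invented; with real data the same factor() call would be applied to the Treatment column returned by plotCounts.

```r
library(ggplot2)

# invented stand-in for the data frame returned by plotCounts(..., returnData = TRUE)
toy <- data.frame(
  Treatment = c("CO1", "Ctr", "CO2", "CO1", "Ctr", "CO2"),
  Home      = c("A", "A", "A", "B", "B", "B"),
  count     = c(120, 80, 200, 90, 60, 150)
)

# put the control first; discrete axes follow the factor level order
toy$Treatment <- factor(toy$Treatment, levels = c("Ctr", "CO1", "CO2"))

p <- ggplot(toy, aes(x = Treatment, y = count, color = Home, group = Home)) +
  scale_y_log10() +
  geom_point() +
  geom_line()
p
```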

Plotting multiple individual correlation plots using ggplot

This is going to be a very basic and naive question, but as my R and programming skills are very limited, I have no idea how to solve it. I would really appreciate it if you guys could help me with this.
I want to plot multiple correlation plots comparing a fixed x-axis (Sepal.Length, in the example below) with each column of my dataset as the y-axis (Sepal.Width, Petal.Length and Petal.Width). I suspect I might need to use apply, but I don't know how to build it into a function.
Right now I am able to do it manually one by one, but that is not helpful at all. Below, I am sharing the piece of code I would like to apply to every column in my dataset.
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_smooth(aes(group = 1), method = lm) +
  geom_point(size = 4, shape = 20, alpha = 0.6) +
  theme(legend.position = "none") +
  # label position and correlation use the plotted x and y variables
  annotate(x = min(iris$Sepal.Length), y = min(iris$Sepal.Width), hjust = .2,
           label = paste("R = ", round(cor(iris$Sepal.Length, iris$Sepal.Width), 2)),
           geom = "text", size = 4)
After generating all the plots, my idea is to plot them side by side using grid.arrange from the gridExtra package.
Are you looking for something like this?
library(tidyr)
library(dplyr)
library(ggplot2)
iris %>%
  select(-Species) %>%
  # long format: one row per (Sepal.Length, y column name, y value)
  gather(YCol, YValue, -Sepal.Length) %>%
  ggplot(aes(x = Sepal.Length, y = YValue)) +
  geom_point() +
  facet_grid(YCol ~ .)
This uses the same y-axis for every panel; if you do not want that, add scales = "free_y" to facet_grid().
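The question also asked for a correlation label on each plot. One way to get that with facets, sketched here as an extension rather than as part of the answer above, is to compute one R value per panel in a small summary data frame and pass it to geom_text(). The names iris_long, r_labels and label are illustrative, and pivot_longer() is used in place of gather().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# long format, keeping Species so the points can still be colored
iris_long <- iris %>%
  pivot_longer(cols = c(Sepal.Width, Petal.Length, Petal.Width),
               names_to = "YCol", values_to = "YValue")

# one correlation with Sepal.Length per panel
r_labels <- iris_long %>%
  group_by(YCol) %>%
  summarise(label = paste("R =", round(cor(Sepal.Length, YValue), 2)),
            .groups = "drop")

p <- ggplot(iris_long, aes(x = Sepal.Length, y = YValue)) +
  geom_point(aes(color = Species), size = 4, shape = 20, alpha = 0.6) +
  geom_smooth(method = "lm") +
  # place each label in the top-left corner of its own panel
  geom_text(data = r_labels, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5, inherit.aes = FALSE) +
  facet_wrap(~YCol, scales = "free_y") +
  theme(legend.position = "none")
p
```

Because the labels live in their own data frame with a YCol column, the faceting places each one in the matching panel, and no grid.arrange step is needed.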
