I have uploaded a data frame and done a quick plot of all variables using:
df %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram()
Reference: https://drsimonj.svbtle.com/quick-plot-of-all-variables
I have split this data frame into two data frames based on a binary variable (in my case, Smoker/Non-smoker) in one of the columns. I would like to perform the same quick plot of all variables, but with overlaid, differently coloured histograms for each of the new data frames (to see if they differ significantly).
I found the following:
Overlaying two ggplot facet_wrap histograms
But it only does the facet_wrap over a single variable. Is there a way to do this by filtering the gathered data frame by the binary value, something like:
df %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(subset(df,Smoker==1), fill = "Red", alpha=0.3) +
geom_histogram(subset(df,Smoker==2), fill = "Blue", alpha=0.3)
Idea would be to overlay the following:
df_s %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(fill = "Red", alpha=0.3)
df_ns %>%
keep(is.numeric) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free") +
geom_histogram(fill = "Blue", alpha=0.3)
I could do this with a loop but would like to do it with the df key-value pairs if possible.
df %>%
keep(is.numeric) %>% # keeps Smoker only if it is numeric; if Smoker is a factor, drop this line or select the columns another way
tidyr::gather(key, value, -Smoker) %>% # preserve Smoker and use it to colour
ggplot(aes(value, fill = factor(Smoker))) + # fill must be discrete for scale_fill_manual()
facet_wrap(~ key, scales = "free") +
geom_histogram(alpha = 0.30, position = "identity") + # overlay the groups instead of stacking them
scale_fill_manual(values = c("red","blue"))
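To make the idea above concrete, here is a minimal self-contained sketch. The data frame `toy` and its columns `age` and `bmi` are made up for illustration (only the 1/2 coding of `Smoker` comes from the question), and `pivot_longer()` is swapped in for `gather()`, which does the same reshaping here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical data: Smoker is coded 1/2 as in the question; age and bmi are invented.
set.seed(1)
toy <- data.frame(
  Smoker = rep(c(1, 2), each = 100),
  age    = rnorm(200, mean = 45, sd = 10),
  bmi    = rnorm(200, mean = 26, sd = 4)
)

# Reshape to long format, keeping Smoker as a discrete grouping column.
toy_long <- toy %>%
  mutate(Smoker = factor(Smoker)) %>%
  pivot_longer(-Smoker, names_to = "key", values_to = "value")

# One facet per variable, with the two groups overlaid rather than stacked.
p <- ggplot(toy_long, aes(value, fill = Smoker)) +
  geom_histogram(alpha = 0.3, position = "identity", bins = 30) +
  facet_wrap(~ key, scales = "free") +
  scale_fill_manual(values = c("1" = "red", "2" = "blue"))
p
```

Naming the colour vector by factor level ties each colour to a group explicitly, so the mapping does not depend on the order of the levels.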
This is a question referring to Plotting multiple columns against one column in ggplot2. Unfortunately I can't comment there so I need to ask it as a new question. Thanks to rnorouzian for asking the question and neilfws for answering.
I'm trying to layer all facets on one scatter plot (with geom_point) with a legend stating the name of the column.
The answer by neilfws contains this code, output was shown in the question:
library(tidyverse)
data <- read.csv('https://raw.githubusercontent.com/rnorouzian/e/master/vp_cond.csv')
data %>%
pivot_longer(cols = 1:12) %>%
mutate(name = factor(name, levels = paste0("X", 1:12))) %>%
ggplot(aes(x, value)) +
geom_line() +
facet_wrap(~name) +
theme_bw()
I tried this and got a good output but without a legend. Where can I specify "name" being used as info in the legend, also using different colors?
data %>%
pivot_longer(cols = 1:12) %>%
mutate(name = factor(name, levels = paste0("X", 1:12))) %>%
ggplot(aes(x, value)) +
geom_point() +
theme_bw()
Simply map name to the colour aesthetic. To do this, add colour = name inside aes:
data %>%
pivot_longer(cols = 1:12) %>%
mutate(name = factor(name, levels = paste0("X", 1:12))) %>%
ggplot(aes(x, value, colour = name)) +
geom_line() +
theme_bw()
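Since the question asked for points rather than lines, the same mapping can be sketched with geom_point. The data frame `toy` below is a hypothetical stand-in for the CSV (three made-up columns X1 to X3 plus x), used so the example runs without a download.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the CSV: three value columns and one x column.
set.seed(42)
toy <- data.frame(
  X1 = rnorm(20),
  X2 = rnorm(20, mean = 1),
  X3 = rnorm(20, mean = 2),
  x  = 1:20
)

# Long format: one row per (x, column name) pair.
toy_long <- toy %>%
  pivot_longer(cols = X1:X3) %>%
  mutate(name = factor(name, levels = paste0("X", 1:3)))

# Mapping name to colour produces the legend automatically.
p <- ggplot(toy_long, aes(x, value, colour = name)) +
  geom_point() +
  labs(colour = "Column") +
  theme_bw()
p
```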
I am using ggplot to create numerous dot charts (using geom_point and facet_wrap) for two variables with multiple categories. Several categories have only a few points, which I cannot use.
Is there a way to set a minimum number of observations for each facet (i.e. only show plots with 10 or more observations)?
by_AB <- group_by(df, A, B)
by_AB%>%
ggplot(aes(X,Y)) +
geom_point() +
facet_wrap(A~B, scales="free") +
geom_smooth(se = FALSE) +
theme_bw()
It is best to remove the small groups from your data before plotting. This is very easy if you do:
df %>%
group_by(A, B) %>%
filter(n() > 1) %>% # use n() >= 10 to keep only groups with 10 or more observations
ggplot(aes(X,Y)) +
geom_point() +
facet_wrap(A~B, scales="free") +
geom_smooth(se = FALSE) +
theme_bw()
Obviously, we don't have your data, so here is an example using the built-in mtcars data set. Suppose we want to plot mpg against wt, but facet by carb:
library(tidyverse)
mtcars %>%
ggplot(aes(wt, mpg)) +
geom_point() +
facet_wrap(.~carb)
Two of our facets look out of place because they only have a single point. We can simply filter these groups out en route to ggplot:
mtcars %>%
group_by(carb) %>%
filter(n() > 1) %>%
ggplot(aes(wt, mpg)) +
geom_point() +
facet_wrap(.~carb)
Created on 2022-08-25 with reprex v2.0.2
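The question asked for a threshold of 10 observations rather than 2. A minimal sketch of that variant on mtcars follows; the variable name `min_n` is just an illustrative choice.

```r
library(dplyr)
library(ggplot2)

min_n <- 10  # illustrative threshold: keep facets with at least this many rows

# Drop every carb group that has fewer than min_n observations.
kept <- mtcars %>%
  group_by(carb) %>%
  filter(n() >= min_n) %>%
  ungroup()

p <- ggplot(kept, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~ carb)
p
```

With mtcars this leaves only the carb groups 2 and 4, which have 10 cars each.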
I'm new to using R so please bear with me as my code might not look the best. So I want to combine these two line graphs together since right now I have written code for each item that I am analyzing. This is the dataset I am using: https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-01/readme.md I used the "Arable_Land" dataset!
##USA Arable Land
plot_arable_land_USA <- arable_land %>%
filter(Code == "USA") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_USA)
##Canada Arable Land
plot_arable_land_CAN <- arable_land %>%
filter(Code == "CAN") %>%
select(c(Year, Code, `Arable land needed to produce a fixed quantity of crops ((1.0 = 1961))`)) %>%
pivot_longer(-c(Year, Code)) %>%
ggplot(aes(x = Year, y = value,color=name,group=name)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')
ggplotly(plot_arable_land_CAN)
Ideally, I would like one graph showing both: one line (in purple) for the USA and another line (in brown) for Canada.
Thank you!
Try this. It is good practice to reshape the data to long format, as you did. In your case you can add filter() to choose the desired countries, then reshape to long and build the plot. The key is mapping both color and group to Code in order to obtain one line per country. You can set the colors using scale_color_manual(), and I have left the facet in place to get the title. Here is the code:
library(plotly)
library(tidyverse)
#Code
plot_arable_land_CAN <- arable_land %>% select(-Entity) %>%
filter(Code %in% c('USA','CAN')) %>%
pivot_longer(-c(Code,Year)) %>%
ggplot(aes(x = Year, y = value,color=Code,group=Code)) +
geom_line() +
facet_wrap(.~name,scales = 'free_y') +
theme_light() +
theme(legend.position = 'none')+
scale_color_manual(values = c('brown','purple'))
#Transform
ggplotly(plot_arable_land_CAN)
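An unnamed colour vector is matched to the values of Code in sorted order (CAN first, then USA). A small sketch with a named vector makes the country-to-colour mapping explicit; the data frame `toy` and its values are invented purely for illustration and are not taken from the arable land dataset.

```r
library(ggplot2)

# Invented values, only to show the mapping; not real arable land figures.
toy <- data.frame(
  Code  = rep(c("USA", "CAN"), each = 5),
  Year  = rep(2000:2004, times = 2),
  value = c(1.00, 0.98, 0.95, 0.93, 0.90,
            1.00, 0.99, 0.97, 0.96, 0.94)
)

# Naming the vector ties each colour to a country regardless of ordering.
country_colours <- c(USA = "purple", CAN = "brown")

p <- ggplot(toy, aes(Year, value, colour = Code, group = Code)) +
  geom_line() +
  scale_colour_manual(values = country_colours) +
  theme_light()
p
```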
I am missing some basics in R.
How do I make a plot for each column in a data frame?
I have tried making plots for each column separately. I was wondering if there was an easier way?
library(dplyr)
library(ggplot2)
data(economics)
#scatter plots
ggplot(economics,aes(x=pop,y=pce))+
geom_point()
ggplot(economics,aes(x=pop,y=psavert))+
geom_point()
ggplot(economics,aes(x=pop,y=uempmed))+
geom_point()
ggplot(economics,aes(x=pop,y=unemploy))+
geom_point()
#boxplots
ggplot(economics,aes(y=pce))+
geom_boxplot()
ggplot(economics,aes(y=pop))+
geom_boxplot()
ggplot(economics,aes(y=psavert))+
geom_boxplot()
ggplot(economics,aes(y=uempmed))+
geom_boxplot()
ggplot(economics,aes(y=unemploy))+
geom_boxplot()
All I'm looking for is one 2*2 grid of box plots and one 2*2 grid of scatter plots with ggplot2. I understand there is facet_grid, which I have failed to understand how to implement. (I believe this can be achieved easily with par(mfrow()) and base R plots.) I saw somewhere else a suggestion about widening the data, which I didn't understand.
In cases like this the solution is almost always to reshape the data from wide to long format.
economics %>%
select(-date) %>%
tidyr::gather(variable, value, -pop) %>%
ggplot(aes(x = pop, y = value)) +
geom_point(size = 0.5) +
facet_wrap(~ variable, scales = "free_y")
economics %>%
tidyr::gather(variable, value, -date) %>%
ggplot(aes(y = value)) +
geom_boxplot() +
facet_wrap(~ variable, scales = "free_y")
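The same wide-to-long reshape can be sketched with tidyr::pivot_longer(), the newer interface that supersedes gather(). The object names `econ_long` and `p_scatter` are illustrative, and `ncol = 2` is added here to request the 2*2 layout asked for in the question.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Long format: one row per (pop, variable) pair, with date dropped.
econ_long <- economics %>%
  select(-date) %>%
  pivot_longer(-pop, names_to = "variable", values_to = "value")

# Four remaining variables, so ncol = 2 gives a 2*2 grid of scatter plots.
p_scatter <- ggplot(econ_long, aes(x = pop, y = value)) +
  geom_point(size = 0.5) +
  facet_wrap(~ variable, scales = "free_y", ncol = 2)
p_scatter
```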
I've seen a lot of people use facets to visualize data. I want to be able to run this on every column in my dataset and then have it grouped by some categorical value within each individual plot.
I've seen others use gather() to plot histogram or densities. I can do that ok, but I guess I fundamentally misunderstand how to use this technique.
I want to be able to do just what I have below - but when I have it grouped by a category. For example, histogram of every column but stacked by the value color. Or dual density plots of every column with these two lines of different colors.
I'd like this - but instead of clarity it is every single column like this...
library(tidyverse)
# what I want but clarity should be replaced with every column except FILL
ggplot(diamonds, aes(x = price, fill = color)) +
geom_histogram(position = 'stack') +
facet_wrap(clarity~.)
# it would look exactly like this, except it would have the fill value by a group.
gathered_data = gather(diamonds %>% select_if(is.numeric))
ggplot(gathered_data , aes(value)) +
geom_histogram() +
theme_classic() +
facet_wrap(~key, scales='free')
tidyr::gather needs four pieces:
1) data (in this case diamonds, passed through the pipe into the first parameter of gather below)
2) key
3) value
4) names of the columns that will be converted to key / value pairs.
gathered_data <- diamonds %>%
gather(key, value,
select_if(diamonds, is.numeric) %>% names())
It's not entirely clear what you are looking for. A picture of your expected output would have been much more illuminating than a description (not all of us are native English speakers...), but perhaps something like this?
diamonds %>%
rename(group = color) %>% # change this line to use another categorical
# column as the grouping variable
group_by(group) %>% # select grouping variable + all numeric variables
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>% # gather all numeric variables
ggplot(aes(x = value, fill = group)) +
geom_histogram(position = "stack") +
theme_classic() +
facet_wrap(~ key, scales = 'free')
# alternate example using geom density
diamonds %>%
rename(group = cut) %>%
group_by(group) %>%
select_if(is.numeric) %>%
ungroup() %>%
tidyr::gather(key, value, -group) %>%
ggplot(aes(x = value, color = group)) +
geom_density() +
theme_classic() +
facet_wrap(~ key, scales = 'free')
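As a possible shortening of the pipelines above, the rename/group_by/select_if/ungroup steps can be collapsed into a single select() call with where(is.numeric), followed by pivot_longer(). This is a sketch assuming dplyr 1.0 or later; `diamonds_long` is an illustrative name.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Keep the grouping column (renamed to group) plus every numeric column,
# then gather the numeric columns into key/value pairs.
diamonds_long <- diamonds %>%
  select(group = cut, where(is.numeric)) %>%
  pivot_longer(-group, names_to = "key", values_to = "value")

# One density curve per group within each facet.
p <- ggplot(diamonds_long, aes(x = value, color = group)) +
  geom_density() +
  theme_classic() +
  facet_wrap(~ key, scales = "free")
p
```

Swapping `cut` for another categorical column such as `color` or `clarity` changes the grouping without touching the rest of the pipeline.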