I am trying to run simulations in R using the tidyverse. This code works, but doesn't scale well to more than a few variables.
Any thoughts on how to improve this? I've tried purrr but I didn't find any success.
The example below draws 5 values from a normal distribution and repeats this 3 times. How could I repeat it n times instead of 3?
n = 5
x=1:n
y1 = rnorm(n)
y2 = rnorm(n)
y3 = rnorm(n)
# put data into tibble
df <- tibble(x=x, y1=y1, y2=y2, y3=y3)
# Tidy data -- go from wide to long
df <- pivot_longer(df, cols=starts_with('y'))
# Make plot
ggplot(df, aes(x=x, y=value, group=name, color=name))+
geom_line()
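If you need to repeat the draw an arbitrary number of times, use replicate() and name the resulting list before converting it to a tibble: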
library(dplyr)
library(tidyr)
library(purrr)   # provides set_names()
library(stringr)
library(ggplot2)
n <- 5     # values per draw
rpl <- 3   # number of replicates
replicate(rpl, rnorm(n), simplify = FALSE) %>%
set_names(str_c('y', seq_along(.))) %>%
as_tibble() %>%
mutate(x = row_number()) %>%
pivot_longer(cols = starts_with('y')) %>%
ggplot(aes(x=x, y=value, group=name, color=name))+
geom_line()
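As an alternative, here is a minimal sketch that builds the data in long format directly, so the pivot_longer() step is not needed. It swaps replicate() for tidyr::expand_grid(), so it is not the approach shown above. The names n_obs, n_rep, sim, and p are made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

n_obs <- 5   # values per series (hypothetical name)
n_rep <- 3   # number of series; change this to scale up (hypothetical name)

set.seed(42)  # only so the draws are reproducible

# One row per (series, x) combination, then one normal draw per row
sim <- expand_grid(name = paste0("y", seq_len(n_rep)), x = seq_len(n_obs)) %>%
  mutate(value = rnorm(n()))

# Build the plot; call print(p) to display it
p <- ggplot(sim, aes(x = x, y = value, group = name, color = name)) +
  geom_line()
```

Changing n_rep is the only edit needed to draw more series.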
I'm trying to draw a bar plot with geom_bar() in ggplot, with one variable on the x-axis, its count on the y-axis, and a third variable as the fill legend. Inside each fill segment I want to show that segment's percentage of the bar's total count.
I'm using the built-in mtcars dataset as a reprex.
library(ggplot2)
ggplot(data = mtcars, aes(x=cyl, y= ..count..)) +
geom_bar(aes(fill = factor(gear)))
I would like to show these percentages inside each fill segment, formatted neatly if possible.
prop.table(table(eight$gear))
3 5
0.8571429 0.1428571
prop.table(table(six$gear))
3 4 5
0.2857143 0.5714286 0.1428571
prop.table(table(four$gear))
3 4 5
0.09090909 0.72727273 0.18181818
The documentation for this is listed here
But I reached the point where I can't use y for the labels, since it comes from a special ggplot stat variable (..count..).
I'm sorry I can't upload images, since I'm a newbie having no sufficient reputation to post one, but running that mtcars code will generate what I'm asking for.
I'd probably format the % labels separately and then add them to the chart as a second data frame.
Formatting using dplyr:
library(dplyr)
legendLabels <- mtcars %>%
group_by(cyl, gear) %>%
summarise(count = n()) %>%
arrange(cyl, desc(gear)) %>%
mutate(percent = round(count/sum(count)*100,2),
yPos = cumsum(count))
And the updated chart:
ggplot(data = mtcars, aes(x=cyl, y= ..count..)) +
geom_bar(aes(fill = factor(gear))) +
geom_text(data = legendLabels, aes(x=cyl, y=yPos-0.5, label=paste0(percent,"%")))
Giving this chart:
Hopefully this is what you're looking for.
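If you would rather have the labels centered in each segment instead of computing yPos by hand, one option is to plot the pre-summarised counts with geom_col() and let position_stack(vjust = 0.5) place the text. This is only a sketch under that assumption, not the method above. The names gear_share and p are made up for the example.

```r
library(dplyr)
library(ggplot2)

# Count cars per cyl/gear and express each count as a share of its cyl bar
gear_share <- mtcars %>%
  count(cyl, gear, name = "count") %>%
  group_by(cyl) %>%
  mutate(percent = round(100 * count / sum(count), 1)) %>%
  ungroup()

# fill is set globally, so geom_text() stacks its labels in the same groups
# as the bars; vjust = 0.5 puts each label in the middle of its segment
p <- ggplot(gear_share, aes(x = factor(cyl), y = count, fill = factor(gear))) +
  geom_col() +
  geom_text(aes(label = paste0(percent, "%")),
            position = position_stack(vjust = 0.5))
```

Call print(p) to display the chart.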
I want to do a scatter (xy) plot of variables in a melted data frame as shown below.
df
class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00
I have tried this, but it plots 4 points. How can I plot x against y, with one point per class?
library(ggplot2)
ggplot(df, aes(x=mean, y=mean, group=var, colour=class)) +
geom_point( size=5, shape=21, fill="white")
As Heroka pointed out, the data needs to be in wide format, with x and y in separate columns. If the data was read in as shown, you can convert it as follows.
## you don't need this since you already have df
text = "class var mean
0 x 4.25
0 y 6.25
1 x 2.00
1 y 11.00"
df = read.delim(textConnection(text),header=TRUE,strip.white=TRUE,
stringsAsFactors = FALSE, sep = " ")
## use this library to switch from long-wide
library(reshape2)
df2 = dcast(df, class ~ var, value.var = "mean")
library(ggplot2)
ggplot(df2, aes(x=x, y=y, colour=class)) +
geom_point( size=5, shape=21, fill="white")
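The same long-to-wide reshape can also be sketched with tidyr::pivot_wider() in place of reshape2::dcast(). The data frame below retypes the values from the question, and df_wide and p are names made up for this example. Wrapping class in factor() is my own choice so that the two classes get discrete colours.

```r
library(tidyr)
library(ggplot2)

# Same values as the melted data frame in the question
df <- data.frame(
  class = c(0, 0, 1, 1),
  var   = c("x", "y", "x", "y"),
  mean  = c(4.25, 6.25, 2.00, 11.00)
)

# Spread var into columns x and y, giving one row per class
df_wide <- pivot_wider(df, names_from = var, values_from = mean)

# One point per class; call print(p) to display it
p <- ggplot(df_wide, aes(x = x, y = y, colour = factor(class))) +
  geom_point(size = 5, shape = 21, fill = "white")
```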
I'm using ggplot with facet_wrap to generate 3 side-by-side plots with linear models. In addition, I have another dimension (let's call it "z") I'd like to visualize by varying the size of the points on the plots.
Currently, the plots I generate keep the size of the points on the same scale across all 3 facets. I would instead like to scale the point sizes by facet - that way, one can quickly tell which point contains the highest "z" value for each facet.
Is there any way to do this without creating 3 separate plots? I've included a sample of my data and the code I used below:
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
The method below replaces z with its z-score within its facet:
library(dplyr)
library(ggplot2)
library(magrittr)   # provides %<>%
library(scales)
x <- c(0.03,1.32,2.61,3.90,5.20,6.48,7.77,0.75,2.04,3.33,4.62,5.91,7.20,8.49,0.41,1.70,3.00,4.28,5.57,6.86,8.15)
y <- c(650,526,382,110,72,209,60,559,296,76,48,64,20,22,50,102,176,21,20,25,5)
z <- c(391174,244856,836435,46282,40351,27118,17411,26232,59162,9737,1917,20575,1484,450,12071,13689,133326,1662,711,728,412)
facet <- c("A","A","A","A","A","A","A","B","B","B","B","B","B","B","C","C","C","C","C","C","C")
df <- data.frame(x,y,z,facet)
df %<>%
group_by(facet) %>%
mutate(z = as.numeric(scale(z))) # z-score within group; as.numeric() drops the matrix that scale() returns
ggplot(df, aes(x=x, y=y, group = facet)) +
geom_point(aes(size=z)) +
geom_smooth(method="lm") +
facet_wrap(~facet )
Try to rescale size for each facet to take values in (0,1]:
df %>%
group_by(facet) %>%
mutate(newz = z/max(z)) %>%
ggplot(., aes(x=x, y=y)) +
geom_point(aes(size=newz)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
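A variant of the same idea is min-max rescaling, so that the smallest z in each facet maps to 0 and the largest to 1. The sketch below uses scales::rescale() on a small made-up data frame (toy and toy_scaled are illustrative names, not the data from the question) to show the per-group step on its own.

```r
library(dplyr)
library(scales)

# Made-up data: two facets whose z values are on very different scales
toy <- data.frame(
  facet = rep(c("A", "B"), each = 3),
  z     = c(10, 20, 40, 1000, 3000, 5000)
)

# Rescale z to [0, 1] separately within each facet
toy_scaled <- toy %>%
  group_by(facet) %>%
  mutate(newz = rescale(z, to = c(0, 1))) %>%
  ungroup()
```

Mapping size to newz then makes the largest point in every facet the same size, whatever the original scale of z.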
I would just compute the mean of df$z within each df$facet and size each point by its difference from that mean:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$pointsize<- df$z - df$meanwithinfacet
Now each point's size is its deviation from the mean of its facet:
> head(df,10)
facet x y z meanwithinfacet pointsize
1 A 0.03 650 391174 229089.57 162084.429
2 A 1.32 526 244856 229089.57 15766.429
3 A 2.61 382 836435 229089.57 607345.429
4 A 3.90 110 46282 229089.57 -182807.571
5 A 5.20 72 40351 229089.57 -188738.571
6 A 6.48 209 27118 229089.57 -201971.571
7 A 7.77 60 17411 229089.57 -211678.571
8 B 0.75 559 26232 17079.57 9152.429
9 B 2.04 296 59162 17079.57 42082.429
and plot
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=pointsize)) +
geom_smooth(method="lm") +
facet_wrap(~facet)
Looks like this, not sure about the legend though.
Instead of the raw difference from the mean, you could also use the number of standard deviations a given z lies from its facet mean:
AverageFacet <- df %>% group_by(facet) %>% summarize(meanwithinfacet= mean(z, na.rm=TRUE), sdwithinfacet= sd(z, na.rm=TRUE))
df <- merge(df, AverageFacet)
df$absoluteDiff<- df$z - df$meanwithinfacet
df$SDfromMean <- df$absoluteDiff / df$sdwithinfacet
ggplot(df, aes(x=x, y=y)) +
geom_point(aes(size=SDfromMean)) +
geom_smooth(method="lm") +
facet_wrap(~facet)