Data: 110 numeric columns (x1: x100).
I am trying to plot all the columns using the following code:
x2=melt(x1)
ggplot(x2,aes(x = value)) +
facet_wrap(~variable,scales = "free_x") +
geom_histogram(aes=(density))
+title(sub = s, line = 5.5)
s<-summary(x1)
I also want to append summary data to each column's plot for some analysis. Is there an alternative way to do this?
Thanks.
To my knowledge, you cannot add subtitles to the subplots of a facet_wrap. You could change the facet title to accommodate what you want, though. Here is an example with min, max, mean and median:
data = data.frame(x1=rnorm(100),
x2=rnorm(100),
x3=rnorm(100),
x4=rnorm(100),
x5=rnorm(100),
x6=rnorm(100))
library(dplyr)
library(tidyr)
library(ggplot2)
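x <- data %>%
  gather(variable, value) %>%
  group_by(variable) %>%
  mutate(mean = mean(value),
         median = median(value),
         min = min(value),
         max = max(value),
         # facet label: variable name plus its summary statistics
         lab = paste(first(variable),
                     "\nMin:", round(min, 2), "Mean:", round(mean, 2),
                     "Median:", round(median, 2), "Max:", round(max, 2)))
ggplot(x, aes(x = value)) +
  facet_wrap(~lab, scales = "free_x") +
  # plot densities rather than counts (after_stat() needs ggplot2 >= 3.3.0)
  geom_histogram(aes(y = after_stat(density))) +
  # vertical line at each variable's mean
  geom_vline(aes(xintercept = mean))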
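If a long strip title is not wanted, one possible alternative is to print the statistics inside each panel instead. The sketch below is not part of the answer above. It assumes ggplot2 >= 3.3.0 (for after_stat()) and tidyr >= 1.0.0 (for pivot_longer()), and the names long, stats and label are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
data <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))

# Long format: one row per (variable, value) pair
long <- pivot_longer(data, everything(),
                     names_to = "variable", values_to = "value")

# One row of summary text per variable (hypothetical helper table)
stats <- long %>%
  group_by(variable) %>%
  summarise(label = sprintf("Mean: %.2f\nMedian: %.2f",
                            mean(value), median(value)),
            .groups = "drop")

p <- ggplot(long, aes(x = value)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  # -Inf / Inf pin the text to the top-left corner of every panel
  geom_text(data = stats, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.2, inherit.aes = FALSE) +
  facet_wrap(~variable, scales = "free_x")
p
```

Because stats has the same variable column as the plotted data, each label ends up in its matching facet.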
Related
I have a dataframe of Lots, Time, Value with the same structure as the sample data below.
df <- tibble(Lot = c(rep(123,4),rep(265,5),rep(132,3),rep(455,4)),
time = c(seq(4), seq(5), seq(3), seq(4)), Value = runif(16))
I'd like to split the dataframe by every N Lots and plot them. The Lots are different sizes so I can't subset the data by every n rows!
I've been using an approach like this but it's not scalable for a large dataset.
df %>% filter(Lot %in% c(123, 265)) %>% ggplot(., aes(x = time, y = Value)) +
geom_point() + stat_smooth()
How can I do this?
Create a lot number column, then build one plot for every n unique lot values.
This gives you a list of plots.
library(tidyverse)
lot_n <- 2
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
group_split(group) %>%
map(~ggplot(.x, aes(x = time, y = Value)) +
geom_point() + stat_smooth()) -> list_plots
list_plots
Individual plots can be accessed via list_plots[[1]], list_plots[[2]] etc.
You can also plot the data with facets.
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
ggplot(aes(x = time, y = Value)) +
geom_point() + stat_smooth() +
facet_wrap(~group, scales = 'free')
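To see what the two mutate() expressions above do, here is a small base R sketch that applies them to the Lot vector from the question; nothing beyond the question's sample data is assumed.

```r
Lot <- c(rep(123, 4), rep(265, 5), rep(132, 3), rep(455, 4))
lot_n <- 2

# Position of each lot among the unique lots, in order of first appearance
Lot_number <- match(Lot, unique(Lot))
# 1 1 1 1 2 2 2 2 2 3 3 3 4 4 4 4

# Every lot_n consecutive lot numbers share one group id
group <- ceiling(Lot_number / lot_n)
# 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2

# Lots 123 and 265 fall in group 1, lots 132 and 455 in group 2
table(Lot, group)
```

Because the grouping is based on lot order rather than row position, lots of different sizes are handled without counting rows.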
Consider the column "disp" in mtcars. I am trying to divide disp into intervals so that I can count the number of observations in each interval. After doing this, I want to plot the results with ggplot's geom_line.
This is what I have tried:
library (tidyverse)
library (ggplot2)
a1 <- mtcars %>% arrange(desc(disp)) %>%
mutate(counts = cut_interval(disp, length = 5)) %>% group_by(counts) %>% mutate(nn = n())
a2 <- a1 %>% select(counts,nn) %>% unique()
ggplot(a2, aes(counts, nn)) +
geom_point(shape = 16, size = 1, show.legend = FALSE) +
theme_bw()
I get the intervals I need in a2. I can use it to draw a scatterplot, but the x-axis has no proper scale. Is there any way to use these intervals to get a continuous scale and draw a line plot of counts vs nn?
mtcars %>% ggplot(aes(x = disp)) + geom_histogram(binwidth = 1) + theme_bw()
Thanks so much Rui Barradas! I just needed a count plot so no need of doing extra stuff.
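If a line rather than bars is wanted, as the question originally asked, geom_freqpoly() bins a continuous variable and joins the bin counts with a line on a continuous x-axis. This is only a minimal sketch: binwidth = 5 mirrors the length = 5 used in the question and is an assumption about the desired interval width.

```r
library(ggplot2)

# Frequency polygon: counts per 5-unit interval of disp, drawn as a line
p <- ggplot(mtcars, aes(x = disp)) +
  geom_freqpoly(binwidth = 5) +
  theme_bw()
p
```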
I need help on setting the individual x-axis limits on different facets as described below.
A programmatical approach is preferred since I will apply the same template to different data sets.
first two facets will have the same x-axis limits (to have comparable bars)
the last facet's (performance) limits will be between 0 and 1, since it is calculated as a percentage
I have seen this and some other related questions but couldn't apply it to my data.
Thanks in advance.
df <-
data.frame(
call_reason = c("a","b","c","d"),
all_records = c(100,200,300,400),
problematic_records = c(80,60,100,80))
df <- df %>% mutate(performance = round(problematic_records/all_records, 2))
df
call_reason all_records problematic_records performance
          a         100                  80        0.80
          b         200                  60        0.30
          c         300                 100        0.33
          d         400                  80        0.20
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance'))) %>%
ggplot(aes(x=call_reason, y=value)) +
geom_bar(stat="identity") +
coord_flip() +
facet_grid(. ~ facet_group)
So here is one way to go about it with facet_grid(scales = "free_x"), in combination with a geom_blank(). Consider df to be your df at the moment before piping it into ggplot.
ggplot(df, aes(x=call_reason, y=value)) +
# geom_col is equivalent to geom_bar(stat = "identity")
geom_col() +
# geom_blank includes data for position scale training, but is not rendered
geom_blank(data = data.frame(
# value for first two facets is max, last facet is 1
value = c(rep(max(df$value), 2), 1),
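# dummy category: the first value works whether call_reason is character or
# factor (levels() returns NULL for character columns, the default in R >= 4.0)
call_reason = df$call_reason[1],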
# distribute over facets
facet_group = levels(df$facet_group)
)) +
coord_flip() +
# scales are set to "free_x" to have them vary independently
# it doesn't really, since we've set a geom_blank
facet_grid(. ~ facet_group, scales = "free_x")
As long as your column names remain the same, this should work.
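For convenience, the same idea can be put together as one self-contained script. This is a sketch rather than the answer's exact code: it uses pivot_longer() (tidyr >= 1.0.0) instead of gather(), and the names long, limits and facet_levels are introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

df <- data.frame(
  call_reason = c("a", "b", "c", "d"),
  all_records = c(100, 200, 300, 400),
  problematic_records = c(80, 60, 100, 80)
) %>%
  mutate(performance = round(problematic_records / all_records, 2))

facet_levels <- c("all_records", "problematic_records", "performance")

long <- df %>%
  pivot_longer(-call_reason, names_to = "facet_group", values_to = "value") %>%
  mutate(facet_group = factor(facet_group, levels = facet_levels))

# One invisible row per facet carrying the upper limit wanted for that facet:
# the shared maximum for the two record counts, 1 for the proportion
limits <- data.frame(
  call_reason = long$call_reason[1],
  facet_group = factor(facet_levels, levels = facet_levels),
  value = c(rep(max(long$value[long$facet_group != "performance"]), 2), 1)
)

p <- ggplot(long, aes(x = call_reason, y = value)) +
  geom_col() +
  geom_blank(data = limits) +   # trains the scales but draws nothing
  coord_flip() +
  facet_grid(. ~ facet_group, scales = "free_x")
p
```

Since limits is computed from the data, the same template can be reused on another data set with the same column names.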
EDIT:
To reorder the call_reason variable, you could add the following in your pipe that goes into ggplot:
df %>%
gather(key = facet_group, value = value, -call_reason) %>%
mutate(facet_group = factor(facet_group,
levels=c('all_records','problematic_records','performance')),
# In particular the following bit:
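# unique() is used instead of levels() so this also works when call_reason is a character column
call_reason = factor(call_reason, levels = unique(call_reason)[order(value[facet_group == "performance"])]))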
I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y)) %>%
ggplot(aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
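The reason the original attempt shows a single point is that grp$p1_x extracts the whole column and ignores the grouping, so mean() returns one overall value. A small sketch using only the four sample rows from the question (the variable names overall and per_id are made up here):

```r
library(dplyr)

data <- data.frame(
  gid   = 90467,
  bh    = c("R", "L", "R", "L"),
  p1_id = c(543333, 408045, 543333, 408045),
  p1_x  = c(80.184, 74.086, 57.629, 58.589),
  p1_y  = c(98.824, 90.923, 103.797, 95.937)
)

grp <- group_by(data, p1_id)

# `$` drops the grouping: a single mean over all four rows
overall <- mean(grp$p1_x)

# summarise() respects the grouping: one row of means per p1_id
per_id <- summarise(grp, mean_p1x = mean(p1_x), mean_p1y = mean(p1_y))
per_id
```

Here overall is one number, while per_id has one row per distinct p1_id, which is what geom_point() needs to draw one point per id.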
Here is a snapshot of data:
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales, restaurant_change_labor, restaurant_change_POS)
I want two bars for each of the columns indicating the change. One graph for each of the columns.
I tried:
ggplot(aes(x = rest_change$restaurant_change_sales), data = rest_change) + geom_bar()
This is not giving the result the way I want. Please help!!
So ... something like:
library(ggplot2)
library(dplyr)
library(tidyr)
restaurant_change_sales = c(3330.443, 3122.534)
restaurant_change_labor = c(696.592, 624.841)
restaurant_change_POS = c(155.48, 139.27)
rest_change = data.frame(restaurant_change_sales,
restaurant_change_labor,
restaurant_change_POS)
cbind(rest_change,
change = c("Before", "After")) %>%
gather(key,value,-change) %>%
ggplot(aes(x = change,
y = value)) +
geom_bar(stat="identity") +
facet_grid(~key)
Which will produce one panel per column, each with a "Before" and an "After" bar.
Edit:
To be extra fancy, e.g. to make the x-axis labels run from "Before" to "After", you can add scale_x_discrete(limits = c("Before", "After")) to the end of the ggplot call.
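Put together with the code from this answer, the chain with the extra scale might look like the sketch below. Nothing new is assumed beyond storing the plot in a variable p.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

rest_change <- data.frame(
  restaurant_change_sales = c(3330.443, 3122.534),
  restaurant_change_labor = c(696.592, 624.841),
  restaurant_change_POS   = c(155.48, 139.27)
)

p <- cbind(rest_change, change = c("Before", "After")) %>%
  gather(key, value, -change) %>%
  ggplot(aes(x = change, y = value)) +
  geom_bar(stat = "identity") +
  facet_grid(~key) +
  # explicit limits set the left-to-right order of the discrete x-axis
  scale_x_discrete(limits = c("Before", "After"))
p
```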
Your data are not formatted properly to work well with ggplot2, or really any of the plotting packages in R. So we'll fix your data up first, and then use ggplot2 to plot it.
library(tidyr)
library(dplyr)
library(ggplot2)
# We need to differentiate between the values in the rows for them to make sense.
rest_change$category <- c('first val', 'second val')
# Now we use tidyr to reshape the data to the format that ggplot2 expects.
rc2 <- rest_change %>% gather(variable, value, -category)
rc2
# Now we can plot it.
# The category that we added goes along the x-axis, the values go along the y-axis.
# We want a bar chart and the value column contains absolute values, so no summation
# necessary, hence we use 'identity'.
# facet_grid() gives three miniplots within the image for each of the variables.
ggplot(rc2, aes(x = category, y = value)) +
geom_bar(stat='identity') +
facet_grid(~variable)
You have to melt your data:
library(reshape2) # or library(data.table)
library(ggplot2)
# keep a row index so each original row gets its own bar
rest_change$rowN <- 1:nrow(rest_change)
rest_change <- melt(rest_change, id.vars = "rowN")
ggplot(rest_change, aes(x = rowN, y = value)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ variable)