How to distinguish individuals from groups in ggplot2 - r

I have panel data that I want to visualize using ggplot2 such that each individual gets its own line and its color reflects the group that it is a part of. For example:
require(ggplot2)
set.seed(123)
frame <- data.frame(id = 1:6, month1 = sample(0:1, 6, replace = TRUE), month2 = sample(0:1, 6, replace = TRUE), month3 = sample(0:1, 6, replace = TRUE), group1 = rep(0:1, 3), group2 = rep(1:0, 3))
frame2 <- reshape(data = frame, direction = "long", idvar = "id", timevar = "time", varying = list(2:4))
ggplot(frame2, aes(x = time, y = month1, group = id, colour = id)) + geom_smooth()
In this plot, I would like each member of group1 to be red and each member of group2 to be blue, with each individual getting its own line. Any idea how to do this? Thanks.

You were close. You might consider jittering the lines as well if in your real application the Y axis variable is discrete.
ggplot(frame2, aes(x = time, y = month1, group = as.factor(id),
colour = as.factor(group2))) + geom_smooth()
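If you want the exact red/blue mapping and straight per-individual lines rather than smoothers, the following is a minimal sketch of one way to do it. It assumes that "member of group1" means group1 == 1 (and therefore group2 == 0), renames the reshaped value column to value via v.names, and uses a small vertical jitter; the colour assignments and the jitter height are my own choices, not something from the question.

```r
library(ggplot2)

set.seed(123)
frame <- data.frame(id = 1:6,
                    month1 = sample(0:1, 6, replace = TRUE),
                    month2 = sample(0:1, 6, replace = TRUE),
                    month3 = sample(0:1, 6, replace = TRUE),
                    group1 = rep(0:1, 3),
                    group2 = rep(1:0, 3))

# Wide to long; v.names gives the stacked column a neutral name
long <- reshape(frame, direction = "long", idvar = "id", timevar = "time",
                varying = list(2:4), v.names = "value")

# group = id draws one line per individual; colour follows group membership.
# Assumption: group1 == 1 is shown in red, group1 == 0 (i.e. group2) in blue.
p <- ggplot(long, aes(x = time, y = value, group = id,
                      colour = factor(group1))) +
  geom_line(position = position_jitter(width = 0, height = 0.05)) +
  scale_colour_manual(values = c("0" = "blue", "1" = "red"), name = "group1")
p
```

The key point is the same as in the answer above: group controls which rows are joined into a line, while colour can be mapped to a different, coarser variable.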

Related

Issue on boxplot in R language

How can I split each of these four boxplots into two, one for males and one for females, so that the pulse-meter readings are shown by sex?
islands = read.csv('Data.csv')
boxplot(islands$Pulse.meter.First..0m, islands$Pulse.meter.25m, islands$Pulse.meter.Second..0m, islands$Pulse.meter.25m.1)
Things like
boxplot(islands$Pulse.meter.25m ~ islands$Sex)
can distinguish them, but this does not work for all four at the same time. I would like a single boxplot that shows all four measurements split by sex.
Here is an example using random data, since you haven't provided data to download. The key is to first transform the data from the 'wide' format you currently have, with one column per measurement, to a 'long' format, where all values are in the same column with an additional label column. Then the interaction function can be used to create an interaction between the pulse meter type and sex.
# example data with random values
islands <- data.frame(Sex = rep(c('Male', 'Female'), 15),
Pulse.meter.First..0m = rnorm(30, mean = 2),
Pulse.meter.25m = rnorm(30, mean = 1),
Pulse.meter.Second..0m = rnorm(30, mean = 3),
Pulse.meter.25m.1 = rnorm(30, mean = 4))
# reshape from wide to long
islands_long <- reshape(islands,
direction = "long",
varying = 2:5,
v.names = "value",
times = names(islands)[2:5],
timevar = 'measurement')
# plot the boxplot; 'cex.axis' decreases the font size so all the x-axis labels are visible
boxplot(value ~ interaction(Sex, measurement), data = islands_long, pars=list(cex.axis=0.5))
This generates:
library(ggplot2)
library(dplyr)
library(tidyverse)
df <- data.frame(
Gender = sample(c("Male", "Female"), 20, replace = TRUE),
Pulse.meter.First..0m = sample(10:60, 20, replace = FALSE),
Pulse.meter.25m = sample(30:60, 20, replace = FALSE),
Pulse.meter.Second..0m = sample(30:60, 20, replace = FALSE),
Pulse.meter.25m.1 = sample(10:60, 20, replace = FALSE)
)
df <- df %>%
group_by(Gender) %>%
pivot_longer(cols = Pulse.meter.First..0m:Pulse.meter.25m.1, names_to = "Pulse_meter", values_to = "Count") %>%
unite("Groups", Gender:Pulse_meter)
df$Groups <- factor(df$Groups, levels=c("Female_Pulse.meter.First..0m", "Male_Pulse.meter.First..0m",
"Female_Pulse.meter.25m","Male_Pulse.meter.25m",
"Female_Pulse.meter.Second..0m","Male_Pulse.meter.Second..0m",
"Female_Pulse.meter.25m.1","Male_Pulse.meter.25m.1"))
ggplot(data = df, aes(x= Groups, y = Count)) +
geom_boxplot() +
scale_x_discrete(labels=c("(F,0m)","(M,0m)","(F,25m)","(M,25m)", "(F,second_0m)", "(M,second_0m)",
"(F,25m.1)","(M,25m.1)")) +
labs(y="Counts") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
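As a possible alternative to building the combined factor levels by hand, here is a sketch that maps the measurement to x and the sex to fill, so that geom_boxplot places the male and female boxes side by side automatically. The data are random and hypothetical, and the column names simply mirror the ones used above.

```r
library(ggplot2)
library(tidyr)

set.seed(1)
# Hypothetical example data with the same column layout as above
df <- data.frame(
  Gender = rep(c("Male", "Female"), 10),
  Pulse.meter.First..0m = sample(10:60, 20),
  Pulse.meter.25m = sample(30:60, 20),
  Pulse.meter.Second..0m = sample(30:60, 20),
  Pulse.meter.25m.1 = sample(10:60, 20)
)

# One row per (person, measurement)
df_long <- pivot_longer(df, cols = -Gender,
                        names_to = "Pulse_meter", values_to = "Count")

# Keep the measurements in their original column order on the x axis
df_long$Pulse_meter <- factor(df_long$Pulse_meter, levels = names(df)[-1])

# fill = Gender makes ggplot2 dodge one box per sex within each measurement
p <- ggplot(df_long, aes(x = Pulse_meter, y = Count, fill = Gender)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```

This gives a legend for sex instead of encoding it in the axis labels, which may or may not be what you want.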

How to make and customize sections within the bars of a barplot created with ggplot2?

I have a data frame structured like data created here:
set.seed(123)
data <- data.frame(Loc = paste("Loc", seq(1:20), sep = ""),
A = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
B = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
C = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5))
)
data$D <- 100-(data[,2]+data[,3]+data[,4])
data$total <- sample(c(10:20), replace = T, length(data[,1]))
Here, Loc is a grouping variable with 20 levels. Each Loc represents a location from which samples were taken (the actual "samples" are not here). A, B, C, and D represent clusters that observations were assigned to. The values in columns A, B, C, and D represent the percentage of observations from each Loc that were assigned to each cluster. The total column represents the total number of observations that were taken from each Loc. For instance, there were 14 observations for Loc1; 25% of those observations were assigned to cluster B, and 75% were assigned to cluster D.
I have made a bar plot that shows Loc on the x-axis and total on the y-axis. Assuming each cluster will be given a unique "color", I am trying to color the bars in such a way that for a given Loc the colors will represent the percentage of observations that were assigned to each cluster. For instance, say cluster B is yellow and cluster D is blue, then the bar for Loc1 will be 25% yellow and 75% blue.
I have tried several variants of this:
library(tidyverse)
data%>%
pivot_longer(-c(Loc,total), names_to= "Group", values_to = "val")%>%
ggplot(., aes(x=Loc, y=total, col = Group))+
geom_bar(stat = "identity", aes(fill = val))+
geom_text(aes(label = total))
Which produces this:
Which is close, but not what I want. How can I make this kind of plot? If possible, I would also like to move the value for total to the top of each bar, and place the percentage associated with each color in the center of that cluster's section within each bar.
Try this. I added a variable with the numbers by group.
library(ggplot2)
library(dplyr)
library(tidyr)
set.seed(123)
data <- data.frame(Loc = paste("Loc", seq(1:20), sep = ""),
A = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
B = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5)),
C = sample(c(0,15,20,25,40),size = 20,replace = T, prob = c(45,25,15,10,5))
)
data$D <- 100-(data[,2]+data[,3]+data[,4])
data$total <- sample(c(10:20), replace = T, length(data[,1]))
data1 <- data %>%
pivot_longer(-c(Loc,total), names_to= "Group", values_to = "val") %>%
# Number per Group
mutate(val1 = val * total / 100)
data1 %>%
# Map val1 on y, Group on fill
ggplot(., aes(x=Loc, y=val1, fill = Group))+
geom_bar(stat = "identity")+
# Make label only for the first group. Here: A
geom_text(aes(y = total, label = ifelse(Group == "A", total, "")), nudge_y = 1, size = 3) +
# Add percentages
geom_text(aes(y = val1,
label = ifelse(val > 0, scales::percent(val, scale = 1, accuracy = 1), "")),
position = position_stack(vjust = .5), size = 3)
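To see why val1 = val * total / 100 is the right height for each segment, here is a small worked check on toy data. The data frame toy is hypothetical; its first row follows the Loc1 example from the question (14 observations, 25% in B, 75% in D), and the second row is made up.

```r
library(tidyr)

# Hypothetical two-location example; percentages in each row sum to 100
toy <- data.frame(Loc = c("Loc1", "Loc2"),
                  A = c(0, 20), B = c(25, 15), C = c(0, 40), D = c(75, 25),
                  total = c(14, 10))

toy_long <- pivot_longer(toy, -c(Loc, total),
                         names_to = "Group", values_to = "val")

# Convert each percentage into a number of observations
toy_long$val1 <- toy_long$val * toy_long$total / 100

# The stacked segments of each bar add back up to that bar's total
heights <- tapply(toy_long$val1, toy_long$Loc, sum)
heights
```

For Loc1 the segments are 3.5 (B) and 10.5 (D), which stack to 14, so the bar height equals total while the fill proportions follow the percentages.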
You can try this:
library(reshape2)
library(tidyverse)
#Format Loc
data$Loc <- factor(data$Loc,levels = paste0('Loc',1:dim(data)[1]),ordered = T)
#Melt
df <- melt(data,id.vars = c('Loc','total'))
#Create label
df$Label <- ifelse(df$value==0,NA,paste0(df$value,'%'))
#Plot
ggplot(df,aes(x=Loc,y=value,color=variable,group=variable,label=Label,fill=variable))+
geom_bar(stat='identity')+
geom_text(position=position_stack(vjust=0.5),color='black')+
geom_text(inherit.aes = FALSE, data = data,
aes(x = Loc, y = 100, label = total), vjust = -0.25)

ggplot missing plot with x-axis factor

The following works fine:
my_df <- data.frame(x_val = 1:10, y_val = sample(1:20,10),
labels = sample(c("a", "b"), 10, replace = T))
ggplot(data = my_df, aes(x = x_val, y = y_val)) + geom_line()
but if I change x_val to a factor, I get a blank plot and a message:
my_df <- data.frame(x_val = 1:10, y_val = sample(1:20,10),
labels = sample(c("a", "b"), 10, replace = T))
my_df$x_val <- as.factor(my_df$x_val)
ggplot(data = my_df, aes(x = x_val, y = y_val)) + geom_line()
message:
geom_path: Each group consists of only one observation. Do you
need to adjust the group aesthetic?
I can obviously drop the factor conversion, but I need it in order to replace the labels of the x axis with scale_x_discrete(breaks = 1:10,labels= my_df$labels). I borrowed that approach from a linked answer.
Any thoughts?
Can you just leave x_val as numeric and use scale_x_continuous(breaks = 1:10,labels= my_df$labels) instead?
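If you do need to keep x_val as a factor, another option that should work is to follow the hint in the message and set the group aesthetic yourself. With a discrete x, ggplot2 puts every level in its own group by default, so each group has a single point and no line can be drawn. The sketch below uses group = 1 to put all rows in one group; the seed is my addition for reproducibility.

```r
library(ggplot2)

set.seed(1)
my_df <- data.frame(x_val = factor(1:10),
                    y_val = sample(1:20, 10),
                    labels = sample(c("a", "b"), 10, replace = TRUE))

# group = 1 tells geom_line to connect all points as a single series
p <- ggplot(my_df, aes(x = x_val, y = y_val, group = 1)) +
  geom_line() +
  scale_x_discrete(labels = my_df$labels)
p
```

Here the labels vector has one entry per factor level, in level order, so each tick gets the label from the corresponding row.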

How to control legend with many groups

I have a plot like this:
Which was created with this code:
# Make data:
set.seed(42)
n <- 1000
df <- data.frame(values = sample(0:5, size = n, replace = T, prob = c(9/10, rep(0.0167,5))),
group = rep(1:100, each = 10),
fill2 = rep(rnorm(10), each = 100),
year = rep(2001:2010, times = 100)
)
df$values <- ifelse(df$year %in% 2001:2007 == T, 0, df$values)
# Plot
require(ggplot2)
p <- ggplot(data = df, aes(x = year, y = values, colour = as.factor(group))) + geom_line()
p
Since there are so many groups, the legend is really not helpful.
Ideally I would like just two elements in the legend, one for group = 1 and one for all the other groups (they should all have the same color). Is there a way to force this?
You can define a new variable that has only two values, but still plot lines according to their original group:
ggplot(data = df, aes(x = year, y = values, group = group,
colour = ifelse(group == 1, "1", "!1"))) +
geom_line() +
scale_colour_brewer("groups", palette="Set1")
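If you prefer readable legend labels and explicit colours, the same idea can be sketched with a separate column and scale_colour_manual. The column name highlight, the label text, and the colours are my own choices for illustration, and the data are a simplified version of the example above.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(values = sample(0:5, 1000, replace = TRUE),
                 group = rep(1:100, each = 10),
                 year = rep(2001:2010, times = 100))

# Two-level variable used only for colour and the legend
df$highlight <- factor(ifelse(df$group == 1, "group 1", "other groups"),
                       levels = c("other groups", "group 1"))

# group = group keeps one line per original group
p <- ggplot(df, aes(x = year, y = values, group = group, colour = highlight)) +
  geom_line() +
  scale_colour_manual(values = c("other groups" = "grey70", "group 1" = "red"),
                      name = "groups")
p
```

The legend then has exactly two entries, while the plot still contains 100 separate lines.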

Unfilled area in ggplot geom_area

I'm trying to do a plot with ggplot2 and geom_area. The fill is set by a variable. For some reason, only the 'outer' groups are filled. I can't figure out how to get the inner regions filled as well.
The same problem seems to occur here but no answer is given on how to solve it.
Below is a minimal example of the code I'm using and the resulting plot:
I'm using R 3.3 and ggplot2_2.1.0
Any help would be appreciated.
df <- data.frame(month = seq(from = as.Date("2016-01-01"), to = as.Date("2016-12-31"), by = "month"),
type = c(rep("past", times = 5), "current", rep("future", times = 6)),
amount = c(seq(from = 100, to = 1200, by = 100)))
df$type <- factor(df$type, levels = c("past", "current", "future"))
ggplot(data = df, aes(x = month, y = amount, fill = type)) +
geom_area()
I added 2 points in time around the "current" value in order to produce an area. The problem is that no area can be drawn from only one point.
library(ggplot2)
df <- data.frame(month = seq(from = as.Date("2016-01-01"), to = as.Date("2016-12-31"), by = "month"),
type = c(rep("past", times = 5), "current", rep("future", times = 6)),
amount = c(seq(from = 100, to = 1200, by = 100)))
df <- rbind(df[1:5, ],
data.frame(month = as.Date(c("2016-05-15", "2016-06-15")),
type = c("current", "current"),
amount = c(550, 650)),
df[7:12, ])
df$type <- factor(df$type, levels = c("past", "current", "future"))
ggplot(data = df, aes(x = month, y = amount, fill = type)) +
geom_area()
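A quick way to spot this kind of problem before plotting is to count the points in each fill group. The sketch below uses the original data from the question; the names points_per_type and too_few are just illustrative.

```r
df <- data.frame(month = seq(from = as.Date("2016-01-01"),
                             to = as.Date("2016-12-31"), by = "month"),
                 type = c(rep("past", times = 5), "current",
                          rep("future", times = 6)),
                 amount = seq(from = 100, to = 1200, by = 100))
df$type <- factor(df$type, levels = c("past", "current", "future"))

# Number of observations available to draw each area
points_per_type <- table(df$type)
points_per_type

# Groups with fewer than two points cannot span any width on the x axis
too_few <- names(points_per_type)[points_per_type < 2]
too_few
```

In this data only "current" has a single point, which matches the group that stays unfilled.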
