Splitting a dataframe by every n unique values of a variable - r

I have a dataframe of Lots, Time, Value with the same structure as the sample data below.
df <- tibble(Lot = c(rep(123,4),rep(265,5),rep(132,3),rep(455,4)),
time = c(seq(4), seq(5), seq(3), seq(4)), Value = runif(16))
I'd like to split the dataframe by every N Lots and plot them. The Lots are different sizes so I can't subset the data by every n rows!
I've been using an approach like this but it's not scalable for a large dataset.
df %>% filter(Lot %in% c(123, 265)) %>% ggplot(., aes(x = time, y = Value)) +
geom_point() + stat_smooth()
How can I do this?

Create a lot-number column, assign every n unique lots to a group, and build one plot per group.
This gives you a list of plots.
library(tidyverse)
lot_n <- 2
list_plots <- df %>%
  mutate(Lot_number = match(Lot, unique(Lot)),
         group = ceiling(Lot_number/lot_n)) %>%
  group_split(group) %>%
  map(~ggplot(.x, aes(x = time, y = Value)) +
        geom_point() + stat_smooth())
list_plots
Individual plots can be accessed via list_plots[[1]], list_plots[[2]] etc.
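To see how the grouping works without plotting anything, the two mutate() steps can be reproduced in base R on the Lot column alone. This is only a minimal sketch; the object names lots, lot_number and group are illustrative and not part of the answer above.

```r
# Same Lot column as in the sample data
lots <- c(rep(123, 4), rep(265, 5), rep(132, 3), rep(455, 4))
lot_n <- 2

# Position of each lot among the unique lots, in order of first appearance
lot_number <- match(lots, unique(lots))

# Every lot_n consecutive lot numbers share one group id
group <- ceiling(lot_number / lot_n)

# One row per lot, showing which group it was assigned to
unique(data.frame(lot = lots, lot_number = lot_number, group = group))
```

With lot_n <- 2, lots 123 and 265 end up in group 1 and lots 132 and 455 in group 2, however many rows each lot has.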
You can also plot the data with facets.
df %>%
mutate(Lot_number = match(Lot, unique(Lot)),
group = ceiling(Lot_number/lot_n)) %>%
ggplot(aes(x = time, y = Value)) +
geom_point() + stat_smooth() +
facet_wrap(~group, scales = 'free')

Related

Converting a factor into a string column in a dataset

I am trying to plot the Id column against some other variables, which I have managed to do with geom_col. However, in the resulting plot R treats the "Id" column as a factor or a number, so I am not getting the result I am looking for. Here is the graph:
How can I convert the column into a string so that the plot shows all 33 users who participated in the survey? Here is my code:
activity_distance <-
merged_activity_calories %>%
group_by(Id) %>%
summarise(
mean_activity_distance= mean(VeryActiveDistance),
mean_ma_distance= mean(ModeratelyActiveDistance),
mean_la_distance= mean(LightActiveDistance),
mean_sa_distance= mean(SedentaryActiveDistance)
)
ggplot(data= activity_distance) +
geom_col(mapping= aes(x=Id , y= mean_activity_distance))
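You can convert Id into a factor so that ggplot2 treats it as a discrete variable.
library(ggplot2)
df <- data.frame(id = c(1232121321321321,123213213213,123213213213213),
y = c(123,234,22.4))
# Numeric id: continuous x-axis, bars are placed at their numeric positions
ggplot(df) +
geom_col(mapping = aes(x = (id), y = y))
# Factor id: discrete x-axis, one bar per id
ggplot(df) +
geom_col(mapping = aes(x = factor(id), y = y))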
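Since the question asks for a string column, it may help to see that as.character() does the conversion as well. The sketch below uses short, made-up ids (not the survey data), and the column names id_chr and id_fct are illustrative.

```r
# Hypothetical ids, kept short so the conversion is easy to inspect
df <- data.frame(id = c(1001, 1002, 1003),
                 y  = c(123, 234, 22.4))

df$id_chr <- as.character(df$id)  # character column
df$id_fct <- factor(df$id)        # factor column, one level per distinct id

# Inspect the resulting column types
str(df)
```

Either column can be mapped to x in aes(); ggplot2 treats both character and factor variables as discrete.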

Method of ordering groups in ggplot line plot

I have created a plot with the following code:
df %>%
mutate(vars = factor(vars, levels = reord)) %>%
ggplot(aes(x = EI1, y = vars, group = groups)) +
geom_line(aes(color=groups)) +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
The result is:
While the ei1_other group is in descending order on x, the ei1_gun points are ordered by variables. I would like both groups to follow the same order, such that ei1_gun and ei1_other both start at Drugs and then descend in order of the variables, rather than descending by order of x values.
The issue is that the order by which geom_line connects the points is determined by the value on the x-axis. To solve this issue simply swap x and y and make use of coord_flip.
As no sample dataset was provided I use an example dataset based on mtcars to illustrate the issue and the solution. In my example data make is your vars, value your EI1 and name your groups:
library(ggplot2)
library(dplyr)
library(tidyr)
library(forcats)
example_data <- mtcars %>%
mutate(make = row.names(.)) %>%
select(make, hp, mpg) %>%
mutate(make = fct_reorder(make, hp)) %>%
pivot_longer(-make)
Mapping value on x and make on y results in an unordered line plot as in your example. The reason is that the order in which the points get connected is determined by value:
example_data %>%
ggplot(aes(x = value, y = make, color = name, group = name)) +
geom_line() +
geom_point() +
xlab("EI1 (Expected Influence with Neighbor)") +
ylab("Variables")
In contrast, swapping x and y, i.e. mapping make on x and value on y, and using coord_flip gives a nicely ordered line plot, because the order in which the points get connected is now determined by make (of course we also have to swap xlab and ylab):
example_data %>%
ggplot(aes(x = make, y = value, color = name, group = name)) +
geom_line() +
geom_point() +
coord_flip() +
ylab("EI1 (Expected Influence with Neighbor)") +
xlab("Variables")
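As an alternative to coord_flip, and not part of the answer above, geom_line() can be asked to connect points along the y-axis. This is a sketch that assumes ggplot2 3.3.0 or later, where geom_line() has an orientation argument; the data frame example_long is made up for illustration.

```r
library(ggplot2)

# Small made-up data: two series measured on the same ordered categories
example_long <- data.frame(
  make  = factor(rep(c("a", "b", "c", "d"), times = 2),
                 levels = c("a", "b", "c", "d")),
  name  = rep(c("hp", "mpg"), each = 4),
  value = c(4, 1, 3, 2, 10, 30, 20, 40)
)

# orientation = "y" makes geom_line() connect the points in order of y (make)
# instead of in order of x (value); assumes ggplot2 >= 3.3.0
p <- ggplot(example_long,
            aes(x = value, y = make, color = name, group = name)) +
  geom_line(orientation = "y") +
  geom_point()
p
```

This keeps the original x/y mapping and axis labels, so nothing has to be swapped back.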

Plots for two variables within a group above each other with ggplot2

I want to use ggplot2 to plot two variables for multiple individuals (4 in the example below). For every individual, the graphs for the two variables should be placed one above the other.
Example data:
da = data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), day = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4), var1= c(3,4,2,1,2,2,2,3,4,4,5,3,2,1,2,3), var2 = c(1,1,1,2,2,2,1,2,2,1,2,1,1,1,1,2))
I can do the plots for the two variables separately:
da %>% ggplot(aes(x= day, y = var1)) + geom_line()+ facet_wrap(~id, nrow = 2)
da %>% ggplot(aes(x= day, y = var2)) + geom_line()+ facet_wrap(~id, nrow = 2)
I get two separate plots:
But what I want is this (...I moved the plots with Paint to show you what I need):
Try pivoting to longer:
library(tidyverse)
da %>%
pivot_longer(var1:var2) %>%
ggplot(aes(x = day, y = value)) + geom_line() + facet_grid(name ~ id)
I would suggest an approach using patchwork, which lets you arrange your plots as you like. The solution from @arg0naut91 is a great way to tackle the issue, but if you want to stack the plots without faceting on the variable you can use the following code:
library(ggplot2)
library(tidyverse)
library(patchwork)
#Data
da = data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
day = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4),
var1= c(3,4,2,1,2,2,2,3,4,4,5,3,2,1,2,3),
var2 = c(1,1,1,2,2,2,1,2,2,1,2,1,1,1,1,2))
#Plots
G1 <- da %>% ggplot(aes(x= day, y = var1)) + geom_line()+ facet_wrap(~id, nrow = 1)
G2 <- da %>% ggplot(aes(x= day, y = var2)) + geom_line()+ facet_wrap(~id, nrow = 1)
#Bind plots: stack G1 above G2 with the / operator
G1/G2
#Equivalent layout using wrap_plots()
wrap_plots(G1,G2,ncol = 1)
Output:

Using ggplot to group in two different ways

I have data that looks kinda like this:
id = rep(1:33,3)
year = rep(1:3,33)
group = sample(c(1:3),99, replace=T)
test_result = sample(c(TRUE,FALSE), size=99, replace = T)
df = data.frame(id, year, group, test_result)
df$year = as.factor(year)
df$group = as.factor(group)
My goal is to visualize it so that I can see how group number and year relate to test_result.
df %>%
group_by(id,year) %>%
summarize(x=sum(test_result)) %>%
ggplot() +
geom_histogram(aes(fill = year,
x = x),
binwidth = 1,
position='dodge') +
theme_minimal()
gets me almost all the way there. What I want is to be able to add something like facet_wrap(group~.) to the end of this to show how these change by group but obviously group is not part of the aggregated dataframe.
Right now my best solution is just to show multiple plots like
df %>% filter(group==1) %>% # Replace group number here
group_by(id,year) %>%
summarize(x=sum(test_result)) %>%
ggplot() +
geom_histogram(aes(fill = year,
x = x),
binwidth = 1,
position='dodge') +
theme_minimal()
but I'd love to figure out how to put them all in one figure and I'm wondering if maybe the way to do that is to put more of the grouping logic into ggplot?
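One possible approach, offered only as a sketch, is to keep group in the group_by() call so that it survives the summarise step and can then be used for faceting. Note the assumption: the counts are then computed per id, year and group rather than per id and year, which may or may not be what is wanted. The names counts and p, and the set.seed() call, are additions for illustration.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # only so the random example data is reproducible
df <- data.frame(
  id          = rep(1:33, 3),
  year        = factor(rep(1:3, 33)),
  group       = factor(sample(1:3, 99, replace = TRUE)),
  test_result = sample(c(TRUE, FALSE), size = 99, replace = TRUE)
)

# Keeping group in group_by() retains it as a column after summarise()
counts <- df %>%
  group_by(id, year, group) %>%
  summarise(x = sum(test_result), .groups = "drop")

# One panel per group, bars dodged by year within each panel
p <- ggplot(counts, aes(x = x, fill = year)) +
  geom_histogram(binwidth = 1, position = "dodge") +
  facet_wrap(~group) +
  theme_minimal()
p
```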

R: using ggplot2 with a group_by data set

I can't quite figure this out. A CSV of 200+ rows assigned to data like so:
gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937
Trying to group by p1_id and plot the mean values for p1_x and p1_y:
grp <- data %>% group_by(p1_id)
Trying to plot geom_point objects like so:
geom_point(aes(mean(grp$p1_x), mean(grp$p1_y), color=grp$p1_id))
But that isn't showing unique plot points per distinct p1_id values.
What's the missing step here?
Why not calculate the mean first?
library(dplyr)
grp <- data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y))
Then plot:
library(ggplot2)
ggplot(grp, aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
Edit: As per @eipi10, you can also pipe directly into ggplot:
data %>%
group_by(p1_id) %>%
summarise(mean_p1x = mean(p1_x),
mean_p1y = mean(p1_y)) %>%
ggplot(aes(x = mean_p1x, y = mean_p1y)) +
geom_point(aes(color = as.factor(p1_id)))
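To make the difference concrete, here is a small sketch that runs the summarise step on just the four sample rows from the question (the real file has 200+ rows, so these numbers are illustrative only). It also shows that mean() applied to a whole column returns a single value, no matter how the data were grouped.

```r
library(dplyr)

# The four sample rows from the question, read from inline text
data <- read.csv(text = "gid,bh,p1_id,p1_x,p1_y
90467,R,543333,80.184,98.824
90467,L,408045,74.086,90.923
90467,R,543333,57.629,103.797
90467,L,408045,58.589,95.937")

# A single overall mean: the same x value would be used for every p1_id
overall_x <- mean(data$p1_x)

# One row of means per p1_id, which is what the plot needs
grp <- data %>%
  group_by(p1_id) %>%
  summarise(mean_p1x = mean(p1_x),
            mean_p1y = mean(p1_y))
grp
```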
