Issue on boxplot in R language


How can I split each of these four boxplots into two, one showing the pulse meter values of males and one those of females?
islands = read.csv('Data.csv')
boxplot(islands$Pulse.meter.First..0m, islands$Pulse.meter.25m, islands$Pulse.meter.Second..0m, islands$Pulse.meter.25m.1)
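Something like
boxplot(islands$Pulse.meter.25m ~ islands$Sex)
splits a single measurement by sex, but it does not work for all four measurements at the same time.
I would like one plot in which each of the four measurements has a separate box for males and for females.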

Here is an example using random data, since you haven't provided data to download. The key is to first transform the data from the 'wide' format you currently have, with one column per measurement, to a 'long' format, where all values are in the same column with an additional label column. The interaction() function can then be used to cross the pulse meter type with sex.
# example data with random values
islands <- data.frame(Sex = rep(c('Male', 'Female'), 15),
Pulse.meter.First..0m = rnorm(30, mean = 2),
Pulse.meter.25m = rnorm(30, mean = 1),
Pulse.meter.Second..0m = rnorm(30, mean = 3),
Pulse.meter.25m.1 = rnorm(30, mean = 4))
# reshape from wide to long
islands_long <- reshape(islands,
direction = "long",
varying = 2:5,
v.names = "value",
times = names(islands)[2:5],
timevar = 'measurement')
# plot the boxplot; 'cex.axis' decreases the font size so all the x-axis labels are visible
boxplot(value ~ interaction(Sex, measurement), data = islands_long, pars=list(cex.axis=0.5))
This generates a plot with eight boxes, one for each combination of Sex and measurement.
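To see what interaction() contributes, here is a minimal base R sketch that builds the combined grouping factor on its own. The vectors sex and measurement are made-up illustration values, not the asker's data.

```r
# Hypothetical toy vectors, for illustration only
sex <- c("Male", "Female", "Male", "Female")
measurement <- c("first", "first", "second", "second")

# interaction() crosses the two variables; every combination becomes one level,
# and boxplot() draws one box per level
groups <- interaction(sex, measurement)
print(levels(groups))
```

The first variable varies fastest in the level order, which is why the female and male boxes of the same measurement end up next to each other.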

# tidyverse attaches ggplot2, dplyr and tidyr, which are all used below
library(tidyverse)
df <- data.frame(
Gender = sample(c("Male", "Female"), 20, replace = TRUE),
Pulse.meter.First..0m = sample(10:60, 20, replace = FALSE),
Pulse.meter.25m = sample(30:60, 20, replace = FALSE),
Pulse.meter.Second..0m = sample(30:60, 20, replace = FALSE),
Pulse.meter.25m.1 = sample(10:60, 20, replace = FALSE)
)
df <- df %>%
group_by(Gender) %>%
pivot_longer(cols = Pulse.meter.First..0m:Pulse.meter.25m.1, names_to = "Pulse_meter", values_to = "Count") %>%
unite("Groups", Gender:Pulse_meter)
df$Groups <- factor(df$Groups, levels=c("Female_Pulse.meter.First..0m", "Male_Pulse.meter.First..0m",
"Female_Pulse.meter.25m","Male_Pulse.meter.25m",
"Female_Pulse.meter.Second..0m","Male_Pulse.meter.Second..0m",
"Female_Pulse.meter.25m.1","Male_Pulse.meter.25m.1"))
ggplot(data = df, aes(x= Groups, y = Count)) +
geom_boxplot() +
scale_x_discrete(labels=c("(F,0m)","(M,0m)","(F,25m)","(M,25m)", "(F,second_0m)", "(M,second_0m)",
"(F,25m.1)","(M,25m.1)")) +
labs(y="Counts") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))

Related

R Plotting three timeseries in two facet_grids in ggplot

Is it possible to plot three timeseries in only two grids using ggplot and facet_grid()?
# Create some fake data
stock1 = cumprod(1+c(0, rnorm(99, 0, .05)))
stock2 = cumprod(1+c(0, rnorm(99, 0, .075)))
indicator = sample(1:50, 100, replace = TRUE)
date_seq = seq.Date(as.Date("2023-01-01"), length.out = 100, by = 1)
df = data.frame(date = date_seq, stock1 = stock1, stock2 = stock2, indicator = indicator)
Now I would like to see an upper graph with the two stocks and one lower graph with the indicator using facet_grid().
The only result I get is a three-grid plot
grid_df = pivot_longer(df, c(stock1, stock2, indicator), names_to = "underlying", values_to = "values")
ggplot(grid_df, aes(x = date, y = values, colour = underlying)) +
geom_line() +
facet_grid(vars(underlying), scales = "free")
I don't know how to group the two stocks so that they appear in one panel.
Thanks for help!

You could add an extra column to the long-format data that labels stock1 and stock2 with the single string "stock" and leaves the indicator as it is, using ifelse(), and then facet on that column with facet_grid() like this:
library(ggplot2)
library(dplyr)
library(tidyr)
grid_df = pivot_longer(df, c(stock1, stock2, indicator), names_to = "underlying", values_to = "values") %>%
mutate(grids = ifelse(underlying == "indicator", "indicator", "stock"))
ggplot(grid_df, aes(x = date, y = values, colour = underlying)) +
geom_line() +
facet_grid(vars(grids), scales = "free")
Created on 2023-02-19 with reprex v2.0.2
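As a quick check of the grouping step in isolation, this small base R sketch applies the same ifelse() rule to a made-up vector of series names and tabulates which series lands in which panel.

```r
# Made-up series names, two observations each
underlying <- rep(c("stock1", "stock2", "indicator"), each = 2)

# Same rule as in the answer: everything except the indicator shares one panel label
grids <- ifelse(underlying == "indicator", "indicator", "stock")

# Cross-tabulate panel label against series name
print(table(grids, underlying))
```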

Is it possible to change a line from a ggplot/geom_line plot depending on what month the datapoint corresponds to?

I would like to know whether a geom_line can be drawn in red for September to February and in blue for the rest of the months.

Yes, it's possible. The easiest way is to create a vector of colors with the same length as the number of rows in your data frame and pass it to the col argument of geom_line().
Here is an example:
library(dplyr, warn.conflicts = FALSE)
library(ggplot2)
library(lubridate, warn.conflicts = FALSE)
# create some dates and values
df <- tibble(
date = today() + seq(from = -365, to = 0, by = 30),
value = runif(length(date), min = 300, max = 800)
)
df %>%
ggplot(aes(x = date, y = value)) +
geom_line(
col = ifelse(month(df$date) %>% between(3, 8), "blue", "red")
) +
geom_point()
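The color vector can also be built and inspected without any packages. The sketch below uses base R only and four made-up dates to confirm that March to August map to blue and September to February to red.

```r
# Made-up dates, one inside and one outside each boundary
dates <- as.Date(c("2023-01-15", "2023-03-15", "2023-08-15", "2023-09-15"))

# Extract the month number (1-12) with base R instead of lubridate::month()
m <- as.integer(format(dates, "%m"))

# Months 3 to 8 are blue, everything else (September to February) is red
cols <- ifelse(m >= 3 & m <= 8, "blue", "red")
print(data.frame(date = dates, month = m, col = cols))
```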

How to plot line graph of normalized differences from binned data with ggplot?

I have several sets of data for which I calculate binned normalized differences. I want to plot the results in a single line plot using ggplot, with the lines for the different paired differences distinguished by color and line type.
I am stuck on taking the values computed for the bins (these would be the y-axis values) and plotting them against the bins on the x-axis.
Below is the code I use for importing the data and calculating the normalized differences.
# Read data from column 3 as data table for different number of rows
# you could use replicate here for test
# dat1 <- data.frame(replicate(1,sample(25:50,10000,rep=TRUE)))
# dat2 <- data.frame(replicate(1,sample(25:50,9500,rep=TRUE)))
dat1 <- fread("/dir01/a/dat01.txt", header = FALSE, data.table=FALSE, select=c(3))
dat2 <- fread("/dir02/c/dat02.txt", header = FALSE, data.table=FALSE, select=c(3))
# Change column names
colnames(dat1) <- c("Dat1")
colnames(dat2) <- c("Dat2")
# Perhaps there is a better way to compute the following as all-in-one? I have broken these down step by step.
# 1) Sum for each bin
bin1 = cut(dat1$Dat1, breaks = seq(25, 50, by = 2))
sum1 = tapply(dat1$Dat1, bin1, sum)
bin2 = cut(dat2$Dat2, breaks = seq(25, 50, by = 2))
sum2 = tapply(dat2$Dat2, bin2, sum)
# 2) Total sum of all bins
sumt1 = sum(sum1)
sumt2 = sum(sum2)
# 3) Divide each bin by total sum of all bins
sumn1 = lapply(sum1, `/`, sumt1)
sumn2 = lapply(sum2, `/`, sumt2)
# 4) Convert to data frame as I'm not sure how to difference otherwise
df_sumn1 = data.frame(sumn1)
df_sumn2 = data.frame(sumn2)
# 5) Difference between the two as percentage
dbin = (df_sumn1 - df_sumn2)*100
How can I plot those results using ggplot() and geom_line()?
I want
dbin values on the x-axis ranging from 25-50
different colors and line types for the lines
Here is what I tried:
p1 <- ggplot(dbin, aes(x = ?, color=Data, linetype=Data)) +
geom_line() +
scale_linetype_manual(values=c("solid")) +
scale_x_continuous(limits = c(25, 50)) +
scale_color_manual(values = c("#000000"))
dput(dbin) outputs:
structure(list(X.25.27. = -0.0729132928804117, X.27.29. = -0.119044772581772,
X.29.31. = 0.316016473225017, X.31.33. = -0.292812782147632,
X.33.35. = 0.0776336591308158, X.35.37. = 0.0205584754637611,
X.37.39. = -0.300768421159599, X.39.41. = -0.403235174844081,
X.41.43. = 0.392510458816457, X.43.45. = 0.686758883448307,
X.45.47. = -0.25387105113263, X.47.49. = -0.0508324553382303), class = "data.frame", row.names = c(NA,
-1L))
Edit
The final piece of code that works, using only the dbin and plots multiple dbins:
dat1 <- data.frame(a = replicate(1,sample(25:50,10000,rep=TRUE, prob = 25:0/100)))
dat2 <- data.frame(a = replicate(1,sample(25:50,9500,rep=TRUE, prob = 0:25/100)))
dat3 <- data.frame(a = replicate(1,sample(25:50,9500,rep=TRUE, prob = 12:37/100)))
dat4 <- data.frame(a = replicate(1,sample(25:50,9500,rep=TRUE, prob = 37:12/100)))
calc_bin_props <- function(data) {
as_tibble(data) %>%
mutate(bin = cut(a, breaks = seq(25, 50, by = 2))) %>%
group_by(bin) %>%
summarise(sum = sum(a), .groups = "drop") %>%
filter(!is.na(bin)) %>%
ungroup() %>%
mutate(sum = sum / sum(sum))
}
diff_data <-
full_join(
calc_bin_props(data = dat1),
calc_bin_props(dat2),
by = "bin") %>%
separate(bin, c("trsh", "bin", "trshb", "trshc")) %>%
mutate(dbinA = (sum.x - sum.y) * 100) %>%
select(-starts_with("trsh"))
diff_data2 <-
full_join(
calc_bin_props(data = dat3),
calc_bin_props(dat4),
by = "bin") %>%
separate(bin, c("trsh", "bin", "trshb", "trshc")) %>%
mutate(dbinB = (sum.x - sum.y) * 100) %>%
select(-starts_with("trsh"))
# Combine two differences, and remove sum.x and sum.y
full_data <- cbind(diff_data, diff_data2[,4])
full_data <- full_data[,-c(2:3)]
# Melt the data to plot more than 1 variable on a plot
m <- melt(full_data, id.vars="bin")
theme_update(plot.title = element_text(hjust = 0.5))
ggplot(m, aes(as.numeric(bin), value, col=variable, linetype = variable)) +
geom_line() +
scale_linetype_manual(values=c("solid", "longdash")) +
scale_color_manual(values = c("black", "black"))
dev.off()

library(tidyverse)
Creating example data as shown in the question, but passing different probabilities to the two sample() calls to create a visible difference
between the two sets of randomized data.
dat1 <- data.frame(a = replicate(1,sample(25:50,10000,rep=TRUE, prob = 25:0/100))) %>% as_tibble()
dat2 <- data.frame(a = replicate(1,sample(25:50,9500,rep=TRUE, prob = 0:25/100))) %>% as_tibble()
Using dplyr we can handle this within data.frames (tibbles) without
the need to switch to other datatypes.
Let’s define a function that can be applied to both datasets to get
the preprocessing done.
We use base::cut() to create
a new column that pairs each value with its bin. We then group the data
by bin, calculate the sum for each bin and finally divide the bin sums
by the total sum.
calc_bin_props <- function(data) {
as_tibble(data) %>%
mutate(bin = cut(a, breaks = seq(25, 50, by = 2), labels = seq(25, 48, by = 2))) %>%
group_by(bin) %>%
summarise(sum = sum(a), .groups = "drop") %>%
filter(!is.na(bin)) %>%
ungroup() %>%
mutate(sum = sum / sum(sum))
}
Now we call calc_bin_props() on both datasets and join them by bin.
This gives us a dataframe with the columns bin, sum.x and sum.y.
The latter two correspond to the bin sums derived from dat1 and
dat2. With the mutate() line we calculate the differences between the
two columns.
diff_data <-
full_join(
calc_bin_props(data = dat1),
calc_bin_props(dat2),
by = "bin") %>%
mutate(dbin = (sum.x - sum.y),
bin = as.numeric(as.character(bin))) %>%
select(-starts_with("trsh"))
Before we feed the data into ggplot() we convert it to the long
format using pivot_longer(). This allows us to instruct ggplot() to
plot the results for sum.x, sum.y and dbin as separate lines.
diff_data %>%
pivot_longer(-bin) %>%
ggplot(aes(as.numeric(bin), value, color = name, linetype = name)) +
geom_line() +
scale_linetype_manual(values=c("longdash", "solid", "solid")) +
scale_color_manual(values = c("black", "purple", "green"))
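For comparison, the same normalisation can be sketched in base R in a few lines, which also speaks to the "all-in-one" remark in the question. The helper name bin_props and the toy vectors x1 and x2 are made up for illustration, and empty bins are treated as zero here, which is an assumption rather than something the question specifies.

```r
# Share of the total sum that falls into each bin (base R only)
bin_props <- function(x, breaks) {
  bins <- cut(x, breaks = breaks)   # assign each value to a bin
  sums <- tapply(x, bins, sum)      # sum per bin, NA for empty bins
  sums[is.na(sums)] <- 0            # assumption: empty bins count as zero
  sums / sum(sums)                  # normalise so the shares add up to 1
}

breaks <- c(25, 27, 29, 31)
x1 <- c(26, 26, 28, 30)  # made-up values
x2 <- c(26, 28, 28, 30)  # made-up values

# Difference of the normalised bin sums, expressed as a percentage
dbin <- (bin_props(x1, breaks) - bin_props(x2, breaks)) * 100
print(round(dbin, 2))
```

Because both sets of shares sum to 1, the differences always sum to zero, which is a handy sanity check on real data.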

A simple plot for many curves with different colors

I have the following data frame which contains 4 columns of data in addition to the vector of labels c.
Time <-c(1:4)
d<-data.frame(Time,
x1= rpois(n = 4, lambda = 10),
x2= runif(n = 4, min = 1, max = 10),
x3= rpois(n = 4, lambda = 5),
x4= runif(n = 4, min = 1, max = 5),
c=c(1,1,2,3))
I would like to use ggplot to plot the 4 curves x1, ..., x4 on top of each other, with each curve colored according to its label. So curves x1 and x2 get the same color since they have the same label, whereas curves x3 and x4 get different colors.
I did the following
d %>% pivot_longer(-c(Time,x1,x2,x3,x4))%>%
rename(class=value) %>% select(-name) %>%
pivot_longer(-c(Time,class)) %>%
mutate(Label=ifelse(Time==max(Time,na.rm = T),name,NA),
Label=ifelse(duplicated(Label),NA,Label)) %>%
ggplot(aes(x=Time,y=value,color=factor(class),group=name))+
geom_line()+
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))+
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE,show.legend = F,color='black')
but I don't get the plot I need: the resulting curves are not colored according to the label. I want x1 and x2 in red, x3 in blue and x4 in green.
To add: I would like to obtain the same plot above in the following general case, where I can't add the vector c to the data frame as length(c) is not equal to length(x1)=...=length(x4)
Time <-c(1:5)
d<-data.frame(Time,
x1= rpois(n = 5, lambda = 10),
x2= runif(n = 5, min = 1, max = 10),
x3= rpois(n = 5, lambda = 5),
x4= runif(n = 5, min = 1, max = 5))
and c=c(1,1,2,3)

As you point out in your comments, it is only possible to put the vector of colors as a column in the original data.frame because it happens to be square, but this is a dangerous way to store the information because the colors really belong to the columns rather than the rows. It's better to assign the colors separately and then join into the long format data by variable name prior to plotting.
Below is an example of how I'd do this with your data.
First, prepare the data without the color mapping for each variable (that is added in the next step):
# load necessary packages
library(tidyverse)
library(ggrepel)
# set seed to make simulated data reproducible
set.seed(1)
# simulate data
Time <-c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5))
Next, make a separate data.frame that maps the color grouping to the variable names. At some point you'll want to make this a factor (i.e. discrete rather than continuous) to map it to color so I just do it here but it can be done later in the ggplot call if you prefer. Per your request, this solution easily scales with your dataset without needing to manually set each level, but it requires that your vector of color mappings is in the same order and the same length as the variable names in d unless you have some other way to establish that relationship.
# create separate df with color groupings for variable in d
color_grouping <- data.frame(var = names(d)[-1],
color_group = factor(c(1, 1, 2, 3)))
Then you pivot_longer and do a join to merge the color mapping with the data for plotting.
# pivot d to long and merge in color codes
d_long <- d %>%
pivot_longer(cols = -Time, names_to = "var", values_to = "value") %>%
left_join(color_grouping, by = "var")
# inspect final table prior to plotting to confirm color mappings
head(d_long, 4)
# # A tibble: 4 x 4
# Time var value color_group
# <int> <chr> <dbl> <fct>
# 1 1 x1 8 1
# 2 1 x2 1.56 1
# 3 1 x3 4 2
# 4 1 x4 4.97 3
Finally, generate line plot where color is mapped to the color_group variable. To ensure you get one line per original variable you also need to set group = var. For more info on this check the documentation on grouping.
# plot data adding labels for each line
p <- d_long %>%
ggplot(aes(x = Time, y = value, group = var, color = color_group)) +
geom_line() +
labs(color='class') +
scale_color_manual(values=c('red','blue','green')) +
geom_label_repel(aes(label = var),
data = d_long %>% slice_max(order_by = Time, n = 1),
nudge_x = 1.5,
na.rm = TRUE,
show.legend = F,
color='black')
p
This produces a plot with one line per variable, colored by color_group.
In your comment you suggested wanting to separate out and stack the plots. I'm not sure I fully understood, but one way to accomplish this is with faceting.
For example if you wanted to facet out separate panels by color_group, you could add this line to the plot above:
p + facet_grid(rows = "color_group")
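This gives one panel per color group, stacked vertically.
Note that when the faceting variable is passed this way it must be given as a quoted string; rows = vars(color_group) is the unquoted alternative.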
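As a lightweight alternative to the join, the variable-to-group mapping can be kept in a named vector and looked up by name. This is a sketch of that idea in base R, not the approach used above; color_lookup and vars are hypothetical names chosen for illustration.

```r
# Hypothetical lookup: variable name -> color group
color_lookup <- c(x1 = 1, x2 = 1, x3 = 2, x4 = 3)

# The 'var' column as it would look in long format for two time points
vars <- c("x1", "x2", "x3", "x4", "x1", "x2", "x3", "x4")

# Indexing by name repeats each group code as often as its variable occurs
color_group <- factor(unname(color_lookup[vars]))
print(data.frame(var = vars, color_group = color_group))
```

Like the join, this works for any number of rows because the mapping is tied to the variable names rather than to row positions.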

You were on the right path, but you need a little bit of a different structure to use ggplot:
# delete old color column
d$c <- NULL
# reshape df
plot.d <- reshape2::melt(d, id.vars = c("Time"))
# create new, correct color column
plot.d$c <- NA
plot.d$c[plot.d$variable == "x1"] <- 1
plot.d$c[plot.d$variable == "x2"] <- 1
plot.d$c[plot.d$variable == "x3"] <- 2
plot.d$c[plot.d$variable == "x4"] <- 3
# plot
ggplot(plot.d, aes(x=Time, y=value, color=as.factor(c), group = variable))+
geom_line() +
labs(color='class')+
scale_color_manual(values=c('red','blue','green'))
Note that I omitted the labels for brevity, but you can add them back in using the same logic.

Here is a solution for how I understood your question.
The data frame is brought into long format, and the variable c is rebuilt with mutate() and case_when() using the number codes you provided.
I have set a seed for better reproducibility.
library(tidyverse)
library(ggrepel)
set.seed(1)
# YOUR DATA
Time <- c(1:4)
d <- data.frame(Time,
x1 = rpois(n = 4, lambda = 10),
x2 = runif(n = 4, min = 1, max = 10),
x3 = rpois(n = 4, lambda = 5),
x4 = runif(n = 4, min = 1, max = 5),
c = c(1, 1, 2, 3)
)
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(c = as.factor(case_when( # replace consistently
name == "x1" | name == "x2" ~ 1, # according to YOUR DATA
name == "x3" ~ 2,
name == "x4" ~ 3
))) %>%
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = c, group = name)) +
geom_line() +
labs(color = "class") +
scale_color_manual(values = c("red", "blue", "green")) + # YOUR CHOICE
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)
ADDED
You could leave the c out and color according to name.
The color code was necessary because you wanted 2 names with the same color. If that is not needed, the following code can do it.
d %>%
pivot_longer(cols = x1:x4) %>% # make it long
mutate(
Label = ifelse(Time == max(Time, na.rm = T), name, NA),
Label = ifelse(duplicated(Label), NA, Label)
) %>%
ggplot(aes(x = Time, y = value, color = name, group = name)) +
geom_line() +
geom_label_repel(aes(label = Label),
nudge_x = 1.5,
na.rm = TRUE, show.legend = F, color = "black"
)

Multigroup frequency with ggplot

I'm trying to replicate this histogram in R.
Here is how to mock my dataset:
dft <- data.frame(
menutype = sample(c(1,2,4,5,6,8,12), 120, replace = T),
Belief = sample(c(0,1), 120, replace = T),
Choice = sample(c(0,1), 120, replace = T)
)
Here is my code :
library(ggplot2)
library(dplyr)
library(tidyr)
library(MASS)
# build the factors from the mock data frame dft defined above
df <- data.frame(
menutype = factor(dft$menutype, labels = c("GUILT" , "SSB0", "SSB1", "FLEX0", "FLEX1", "STD", "FLEX01"),
levels = c(1,2,4,5,6,8,12)),
Belief = factor(dft$Belief, levels = c(1), labels= c("Believe Learn")), #Interested only in this condition
Choice = factor(dft$Choice, levels = c(1), labels= c("Learn")) #Same here
)
df1 <- rbind(na.omit(df %>%
count(Belief, menutype) %>%
group_by(menutype) %>%
mutate(prop = n / sum(n))),
na.omit(df %>%
count(Choice, menutype) %>%
group_by(menutype) %>%
mutate(prop = n / sum(n))))
test <- paste(df1$Belief[1:6],paste(df1$Choice[7:13]))
test[1:6] <- paste(df1$Belief[1:6])
test[7:13] <- paste(df1$Choice[7:13])
df1$combine <- paste(test)
ggplot(data = df1, aes(menutype, prop, fill = combine)) +
labs(title = "Classification based on rank ordering\n", x = "", y = "Fraction of subjects", fill = "\n") +
geom_bar(stat = "identity", position = "dodge")+
theme_bw() +
theme(legend.position="bottom", plot.title = element_text(hjust = 0.5)) #Centering of the main title+
#geom_text(aes(label="ok"), vjust=-0.3, size=3.5)+
The problem is that it is more or less working and I almost get the graph that I want, but it is a workaround and there are still some errors. For example, I get the same value for STD twice (0.10), while it should be 0 and 0.10 as in the original graph.
Ideally I would like to have two different data frames, one with menutype and Belief and the other with menutype and Choice, then, as I did, compute the proportion of a specific level of each of those variables within each menutype, and finally plot them as bars, much like the graph in the original study. Additionally, I'd like to have the proportions as fractions above each bar, but that is optional.
Could someone help me on this matter? I'm really struggling to get it working.
Thanks in advance!
EDIT: I think the issue is with the fill =. I would like to specify for each bar the variable I want (e.g, fill = df2$Belief & df2$Choice) but I don't know how to proceed.

library(tidyverse)
set.seed(10)
# example data frame
df <- data.frame(
menutype = sample(c(1,2,4,5,6,8,12), 120, replace = T),
Belief = sample(c(0,1), 120, replace = T),
Choice = sample(c(0,1), 120, replace = T)
)
# calculate all metrics based on all variables you want to plot in a tidy way
df_plot = df %>%
group_by(Choice) %>%
count(menutype, Belief) %>%
mutate(prop = n / sum(n),
prop_text = paste0(n, "/", sum(n))) %>%
ungroup()
# barplots using one variable and split plots using another variable
df_plot %>%
mutate(Belief = factor(Belief),
menutype = factor(menutype)) %>%
ggplot(aes(menutype, prop, fill = Belief))+
geom_col(position = "dodge")+
facet_wrap(~Choice, ncol=1)+
geom_text(aes(label=prop_text), position = position_dodge(1), vjust = -0.5)+
ylim(0,0.2)
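The question mentions a menu type whose bar should be 0 but is not. One possible cause, stated here as an assumption, is that counting only the rows that exist drops combinations with no observations. A base R sketch with made-up data shows how taking the mean of an indicator per group keeps such groups with an explicit 0.

```r
# Made-up data: menu type 8 has no row with Belief == 1
toy <- data.frame(
  menutype = c(1, 1, 2, 2, 8, 8),
  Belief   = c(1, 0, 1, 1, 0, 0)
)

# The mean of a logical vector is the share of TRUE values,
# so groups without any TRUE get 0 instead of disappearing
prop_belief <- tapply(toy$Belief == 1, toy$menutype, mean)
print(prop_belief)
```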
