Line plot with factor variables in R

How can I make R draw a line between two observations when the x variable is a factor?
I have two 'time' points, early and late, coded as categorical:
plotdata <- structure(list(
x = structure(1:2, .Label = c("early", "late"), class = "factor"),
y = 1:2
),
.Names = c("x", "y"), row.names = c(NA, -2L), class = "data.frame"
)
I only get a kind of bar plot:
plot(plotdata)
I also tried coding the variables as 0 and 1, but then I get a continuous axis.

Let's say your data is
d <- structure(list(x = structure(1:2, .Label = c("early", "late"), class = "factor"),
y = 1:2), .Names = c("x", "y"), row.names = c(NA, -2L), class = "data.frame")
d
#       x y
# 1 early 1
# 2  late 2
With base R
plot(as.numeric(d$x), d$y, type = "l", xaxt = "n")
axis(1, labels = as.character(d$x), at = as.numeric(d$x))
With ggplot2
library(ggplot2)
ggplot(d, aes(x = x, y = y)) + geom_line(aes(group = 1))
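The base R version works because as.numeric() on a factor returns its integer level codes, which gives plot() a numeric x axis to draw a line on; axis() then puts the level names back as tick labels. In the ggplot2 version, group = 1 tells geom_line() to treat all rows as one series instead of one series per factor level. A minimal sketch of the base R idea, rebuilding the same two-row data with data.frame() (the names codes and labs are chosen here for illustration):

```r
# Same data as above, built with data.frame() for readability
d <- data.frame(x = factor(c("early", "late")), y = 1:2)

# A factor is stored as integer codes plus a vector of level labels
codes <- as.numeric(d$x)   # 1 2
labs  <- levels(d$x)       # "early" "late"

# Draw the line on the numeric codes, suppress the default x axis,
# then label the ticks with the factor levels
plot(codes, d$y, type = "l", xaxt = "n", xlab = "x", ylab = "y")
axis(1, at = seq_along(labs), labels = labs)
```

Using levels(d$x) for the labels, rather than as.character(d$x), keeps one tick per level even when the data has several rows per level.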

Related

How to create two stacked bar charts next to each other using ggplot

I want to recreate the below chart:

I have the below 2 dataframes:
lc2 <- structure(list(group = 1:3, sumpct = c(13, 32, 54)), class = "data.frame", row.names = c(NA,
-3L))
Note: this is for the "likelihood to click" bar (see image), where "extremely/somewhat likely" is
13%, neutral is 32%, and "extremely/somewhat unlikely" is 54%.
and
le2 <- structure(list(e = 1:3, t = c(13, 38, 48)), class = "data.frame", row.names = c(NA,
-3L))
Note: similarly, the code above is for the "likelihood to enroll" bar.
But I want to create this:
lc2 <- structure(list(group = 1:3, sumpct = c(13, 32, 54)),
class = "data.frame", row.names = c(NA, -3L))
le2 <- structure(list(e = 1:3, t = c(13, 38, 48)),
class = "data.frame", row.names = c(NA, -3L))
lc2$type <- "click"
le2$type <- "enroll"
colnames(lc2) <- c("group", "pct", "type")
colnames(le2) <- c("group", "pct", "type")
library(data.table)
library(ggplot2)
dt <- rbindlist(list(lc2, le2))
dt[, group := as.factor(group)]
ggplot(dt, aes(x = type, y = pct, fill = group)) +
geom_bar(stat = "identity") +
geom_text(aes(label=scales::percent(pct/100)), position = position_stack(vjust = .5))+
theme_classic() +
coord_flip()
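The key step in the code above is reshaping both tables into one long table with shared column names and a type column, so that ggplot can draw one bar per type and stack it by group. For illustration only, the same reshaping can be sketched in base R without data.table (the variable name long is just a label chosen here):

```r
lc2 <- data.frame(group = 1:3, sumpct = c(13, 32, 54))
le2 <- data.frame(e = 1:3, t = c(13, 38, 48))

# Give both tables the same column names and tag each with its bar
names(lc2) <- c("group", "pct")
names(le2) <- c("group", "pct")
lc2$type <- "click"
le2$type <- "enroll"

# Stack them into one long table; group becomes a factor for the fill scale
long <- rbind(lc2, le2)
long$group <- factor(long$group)
str(long)
```

The resulting long data frame can be passed to the same ggplot() call in place of dt.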

Using segment labels in ggplot with ggrepel with smooth segments

This is my dataframe:
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
And this is my ggplot (my question is about the annotation for the Brazil label):
ggplot(data = df)+
geom_point(aes(x = year, y = n)) +
geom_text_repel(aes(x = year, y = n, label = team),
size = 3, color = 'black',
seed = 10,
nudge_x = -.029,
nudge_y = 35,
segment.size = .65,
segment.curvature = -1,
segment.angle = 178.975,
segment.ncp = 1)+
coord_flip()
So, I have a segment divided into two parts, and both parts have small breaks in them. How can I avoid them?
I already tried using segment.ncp and changing nudge_x or nudge_y, but it's not working.
Any help?
Not really sure what is going on here. This is the best I could generate by experimenting with variations to the input values for segment... arguments.
There is some guidance at: https://ggrepel.slowkow.com/articles/examples.html which has an example with shorter leader lines, maybe that's an approach you could use.
df<-structure(list(year = c(1984, 1984), team = c("Australia", "Brazil"
), continent = c("Oceania", "Americas"), medal = structure(c(3L,
3L), .Label = c("Bronze", "Silver", "Gold"), class = "factor"),
n = c(84L, 12L)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
library(ggplot2)
library(ggrepel)
ggplot(data = df)+
geom_point(aes(x = year, y = n)) +
geom_text_repel(aes(x = year, y = n, label = team),
size = 3, color = 'black',
seed = 1,
nudge_x = -0.029,
nudge_y = 35,
segment.size = 0.5,
segment.curvature = -0.0000002,
segment.angle = 1,
segment.ncp = 1000)+
coord_flip()
Created on 2021-08-26 by the reprex package (v2.0.0)

Create new column with percentages in data frame

I have the following dataframe:
dput(df1)
structure(list(month = c(1, 1, 2, 2, 3, 4), transaction_type = c("AAA",
"BBB", "BBB", "CCC",
"DDD", "AAA"), max_wt_per_month = c(54.9,
51.6833333333333, 52.3333333333333, 49.4666666666667, 49.85,
48.5833333333333), min_wt_per_month = c(0, 0, 0, 0, 0, 0), avg_wt_per_month = c(8.41701333107861,
7.65211141060198, 6.44184012508551, 7.74798927613941, 7.4360566888844,
7.50611319574734), prop = c(Inf, Inf, Inf, Inf, Inf, Inf)), .Names = c("month",
"transaction_type", "max_wt_per_month", "min_wt_per_month", "avg_wt_per_month",
"prop"), row.names = c(NA, -6L), class = c("grouped_df", "tbl_df",
"tbl", "data.frame"), vars = list(month), drop = TRUE, indices = list(
0:5), group_sizes = 6L, biggest_group_size = 6L, labels = structure(list(
month = 1), row.names = c(NA, -1L), class = "data.frame", vars = list(
month), drop = TRUE, .Names = "month"))
I want to create column prop that would contain the percentage of maximum waiting time with respect to each month. If I run this code, then I get Inf values in most of the rows... (especially it is evident in the real dataset):
my_fun=function(vec){
100*as.numeric(vec[3]) /
sum(with(data_merged_transactions, ifelse(month == vec[1], max_wt_per_month, 0))) }
data_merged_transactions$prop=apply(data_merged_transactions , 1 , my_fun)
I then finally need to create the filled area chart so that each area would be a percentage out of 100%:
ggplot(data_merged_transactions, aes(x=month, y=prop, fill=transaction_type)) +
geom_area(alpha=0.6 , size=1, colour="black")
Why do I get Inf if the sum is not equal to 0?
Moreover, is it possible to create a filled area chart with months as factors (Jan, Feb, etc.), not numbers? I tried replacing the month IDs with month names, but then I got very thin bars instead of a filled area.
Is this what you were looking for?
library(tidyverse)
df1_tidy <- df1 %>%
group_by(month) %>%
summarise(SUM = sum(max_wt_per_month)) %>%
full_join(df1) %>%
mutate(prop = max_wt_per_month / SUM)
ggplot(data = df1_tidy,
aes(x = month,
y = prop,
fill = transaction_type)) +
geom_area(alpha = 0.6,
size = 1,
colour = "black") +
scale_x_continuous(breaks = 1:4, labels = c("Jan", "Feb", "Mar", "Apr"))
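One likely cause of the Inf values in the question, offered as an explanation rather than something confirmed on the real data: apply() converts the data frame to a matrix first, and because transaction_type is character, every column becomes a string. Numeric columns are passed through format(), so with months 1 to 12 the single-digit months are padded (" 1"), the comparison month == vec[1] never matches, the sum is 0, and the division gives Inf. The sketch below uses a small made-up data frame (toy, not the asker's data) to show the padding, then computes the percentage per month with base R's ave() as an alternative to the join above.

```r
# Made-up data for illustration only
toy <- data.frame(
  month = c(1, 1, 12, 12),
  transaction_type = c("AAA", "BBB", "AAA", "CCC"),
  max_wt = c(30, 10, 20, 20),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(toy): a character matrix with padded numbers
m <- as.matrix(toy)
m[1, "month"]               # " 1", with a leading space
toy$month == m[1, "month"]  # all FALSE, so the per-month sum would be 0

# Vectorised alternative: divide by the per-month total computed by ave()
toy$prop <- 100 * toy$max_wt / ave(toy$max_wt, toy$month, FUN = sum)
toy$prop                    # 75 25 50 50
```

Because ave() returns a vector the same length as its input, no join or row-wise function is needed, and the percentages within each month sum to 100.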

Single error bar for stacked graph equalling 100

I have a stacked bar graph that shows the differences in classes between skeleton and tissue. The total of the two will always be 100 and their standard errors are the same. As such, the top error bar is superfluous and adds confusion.
Is there a way to show the standard error only for the bottom group? The question "Single error bar on stacked bar plot ggplot" shows how to get a single bar for the top of the stack, but that isn't quite what I need. Thanks.
Code:
library(reshape2)
library(Rmisc)
library(ggplot2)
melt <- melt(file, id=c("TREATMENT", "Species"),
value.name="Amount", variable.name = "Class")
x1 <- summarySE(melt, measurevar = "Amount",
groupvars = c("Species", "TREATMENT", "Class"), na.rm=TRUE)
x2 <- within(x1,lit2 <- ave(Amount, Class, Species, FUN = cumsum))
p10 <- ggplot(x2, aes(y = Amount, x = Class, fill = TREATMENT)) +
geom_bar(stat = "identity", colour = "black") +
geom_errorbar(aes(ymin = lit2-se, ymax = lit2+se), size = .5, width = .25)
p10
Data:
structure(list(TREATMENT = c("SKELETON", "SKELETON", "SKELETON",
"SKELETON", "TISSUE", "TISSUE", "TISSUE", "TISSUE"), Species = c("A",
"A", "A", "A", "A", "A", "A", "A"), `1` = c(42.1958615095789,
73.6083881998577, 62.1025409404354, 21.5264243794993, 57.8041384904211,
26.3916118001423, 37.8974590595646, 78.4735756205007), `2` = c(46.9398719372755,
89.6865089817669, 55.9907366318623, 18.1145895471236, 53.0601280627245,
10.3134910182331, 44.0092633681377, 81.8854104528764), `3` = c(55.4637732254405,
75.0933095632366, 20, 18.402199079204, 44.5362267745594, 24.9066904367634,
80, 81.597800920796)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), .Names = c("TREATMENT", "Species",
"1", "2", "3"))
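One possible approach, given only as a hedged sketch: pass geom_errorbar() its own data argument containing just the rows for the bottom group, and centre the bar on that group's own mean, since the top edge of the bottom segment sits at its Amount. The summary values below are invented for illustration (they are not computed from the data above), and treating TISSUE as the bottom segment is an assumption about the stacking order; swap it if your plot stacks the other way.

```r
library(ggplot2)

# Illustrative summary table in the shape summarySE() returns (values made up)
x2 <- data.frame(
  Class     = factor(rep(1:3, times = 2)),
  TREATMENT = rep(c("SKELETON", "TISSUE"), each = 3),
  Amount    = c(50, 52, 42, 50, 48, 58),
  se        = c(11, 15, 14, 11, 15, 14)
)

# Keep only the group assumed to be drawn at the bottom of each stack
bottom <- subset(x2, TREATMENT == "TISSUE")

p <- ggplot(x2, aes(x = Class, y = Amount, fill = TREATMENT)) +
  geom_col(colour = "black") +
  # Only the bottom group gets an error bar, centred on its own mean
  geom_errorbar(data = bottom,
                aes(ymin = Amount - se, ymax = Amount + se),
                width = 0.25)
p
```

With this layout the cumulative column (lit2 in the question) is not needed, because the only error bar drawn starts from the baseline group.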

How to search for string patterns in another string and include a separator?

My data is structured as follows:
dput(head(CharacterAnalysis,5))
structure(list(Character = c("A", "a", "B", "b", "C"),
Descriptor = c("Jog", "Change Direction", "Shuffle", "Walk", "Stop")),
.Names = c("Character", "Descriptor"),
row.names = c(NA, 5L), class = "data.frame")
I wish to lookup the Character and relevant Descriptor in the following data frame, but am unsure how to do so:
dput(head(StringAnalysis,3))
structure(list(MovementString = c("ACb", "aAaB", "BbCa")),
.Names = c("MovementString"),
row.names = c(NA, 3L), class = "data.frame")
My expected outcome/ data frame would be:
dput(head(Output,3))
structure(list(MovementString = c("ACb", "aAaB", "BbCa"),
MovementPerformed = c("Jog/ Stop/ Walk", "Change Direction/ Jog/ Change Direction/ Shuffle", "Shuffle/ Walk/ Stop/ Change Direction")),
.Names = c("MovementString", "MovementPerformed"),
row.names = c(NA, 3L), class = "data.frame")
I would like a forward slash (/) or similar to separate each Descriptor, as it signals a new movement. Any advice on how to do this, please? My data frame CharacterAnalysis is over 1 million rows long, so I do not wish to have to search for each MovementString separately!
Thank you.
CharacterAnalysis <-
structure(list(Character = c("A", "a", "B", "b", "C"),
Descriptor = c("Jog", "Change Direction", "Shuffle", "Walk", "Stop")),
.Names = c("Character", "Descriptor"),
row.names = c(NA, 5L), class = "data.frame")
Output <-
structure(list(MovementString = c("ACb", "aAaB", "BbCa"),
MovementPerformed = c("Jog/ Stop/ Walk", "Change Direction/ Jog/ Change Direction/ Shuffle", "Shuffle/ Walk/ Stop/ Change Direction")),
.Names = c("MovementString", "MovementPerformed"),
row.names = c(NA, 3L), class = "data.frame")
# A simple approach based on names
# Build the lookup table just once
m <- CharacterAnalysis$Descriptor
names(m) <- CharacterAnalysis$Character
# Build the MovementPerformed column
Output$MovementPerformed <-
sapply(strsplit(Output$MovementString,""),
FUN = function(x) paste(m[x], collapse = "/ "))
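To see the lookup at work, the sketch below applies the same named-vector idea to the StringAnalysis data frame from the question (rebuilt here with data.frame()), using vapply() so the result is guaranteed to be a character vector. The name lookup is chosen here for illustration.

```r
CharacterAnalysis <- data.frame(
  Character  = c("A", "a", "B", "b", "C"),
  Descriptor = c("Jog", "Change Direction", "Shuffle", "Walk", "Stop"),
  stringsAsFactors = FALSE
)
StringAnalysis <- data.frame(
  MovementString = c("ACb", "aAaB", "BbCa"),
  stringsAsFactors = FALSE
)

# Named character vector: names are the codes, values the descriptors
lookup <- setNames(CharacterAnalysis$Descriptor, CharacterAnalysis$Character)

# Split each string into single characters, look them up, and re-join
StringAnalysis$MovementPerformed <- vapply(
  strsplit(StringAnalysis$MovementString, ""),
  function(chars) paste(lookup[chars], collapse = "/ "),
  character(1)
)
StringAnalysis$MovementPerformed
# "Jog/ Stop/ Walk"
# "Change Direction/ Jog/ Change Direction/ Shuffle"
# "Shuffle/ Walk/ Stop/ Change Direction"
```

Indexing by name is case-sensitive, so "A" and "a" map to different descriptors, and the lookup vector is built once rather than once per string.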
