R: Creating more space between bars in geom_col

I'm creating a bar chart in R for values of a certain case over time, using geom_col. The chart contains values per week for a period of about a year and a half.
My problem with my current plot is that the bars are pretty close together. Especially in a PDF format, this creates a problem, since zoomed out it looks more like a histogram. You really have to zoom in drastically to see that the plot consists of individual bars per week. See below.
So what I've tried to do is increase the space between the bars, using position = position_dodge(width = 2). However, so far I see no changes. Why doesn't it take the position dodge? Is it because the x scale is based on dates?
Below is the head() of my df and a basic version of the code for the plot I'm trying to make.
structure(list(Land = c("India", "India", "India", "India", "India",
"India"), Date = structure(c(18498, 18491, 18484, 18477, 18470,
18463), class = "Date"), SUMU = c(88L, 142L, 96L, 101L, 112L,
128L), ChangeAVG = c("Other", "Other", "Other", "Other", "Other",
"Other")), row.names = c(NA, -6L), groups = structure(list(Land = "India",
.rows = structure(list(1:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1L, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
ggplot(India, aes(Date, SUMU, fill=ChangeAVG))+ theme_light() + geom_col(position = position_dodge(width=10))
Examples of plot view in PDF normally and with zoom at 200%
Thanks!

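The problem is that you are passing the width argument to position_dodge(). Dodging only shifts bars that share the same x position, and here each week has a single bar, so nothing moves. Set width on geom_col() itself instead; on a Date axis it is measured in days, so a smaller value gives narrower bars with more space between them: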
ggplot(India, aes(Date, SUMU, fill=ChangeAVG)) +
theme_light() +
geom_col(width=1.5)
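As a minimal sketch of the same idea on made-up data (the `weekly` data frame and its values are hypothetical stand-ins for the asker's `India` data, and the width of 3 is just an illustrative choice), weekly bars can be given an explicit width in days:

```r
library(ggplot2)

# Hypothetical weekly data standing in for the asker's `India` data frame
weekly <- data.frame(
  Date = seq(as.Date("2020-01-06"), by = "week", length.out = 30),
  SUMU = rep(c(88, 142, 96, 101, 112, 128), times = 5)
)

# On a Date axis one unit is one day, so width = 3 draws bars
# three days wide and leaves a four-day gap between weekly bars.
p <- ggplot(weekly, aes(Date, SUMU)) +
  theme_light() +
  geom_col(width = 3)

print(p)
```

Lowering `width` further widens the gaps; which value reads best in a PDF depends on the page size and the number of weeks shown.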

Related

Using ggalluvial with nodes holding different values

My data is a set of activities completed by persons. The sequence of activities a person takes varies. The data below show the activities for each step (Step1, Step2, etc). I'd like an alluvial plot that labels the activities at each step (each a different node 1, 2, 3...) What is the best approach? Here's what I have so far:
df<-structure(list(acts_activity_id = c("9928131", "445661", "686203", "687868", "688564"), Step1 = c("Unable to Reach", "Unable to Reach",
"Search Correspondence", "Unable to Reach", "Unable to Reach"), Step2 = c("Match Request", NA, "Connection Made", NA, "Match Request"
), Step3 = c("Support Group Request", NA, "Connection Contact Attempt", NA, "Support Group Request"),Step4 = c("Information Provided",
NA, "Not Available to Support", NA, "Information Provided"),
Step5 = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), class = c("grouped_df", "tbl_df", "tbl",
"data.frame"),
row.names = c(NA, -5L),
groups = structure(list(acts_activity_id = c("9928131", "445661", "686203", "687868", "688564"), .rows = structure(list(1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), .drop = TRUE))
df %>%
ggplot(
aes(
axis1=Step1, #each step has different values; individuals go thru different sequence of steps
axis2=Step2, axis3=Step3, axis4=Step4, axis5=Step5 ))+
geom_flow()+
geom_stratum()+
labs(title="Activity Sequence")
If you have your data in this order (each column is a set of different activities), then use ggsankey:
library(dplyr)
library(ggplot2)
library(ggsankey)

df$acts_activity_id <- NULL
# Reshape the wide Step columns into the long x/next_x/node/next_node layout
x <- df %>% ggsankey::make_long(Step1, Step2, Step3, Step4, Step5)
ggplot(x, aes(x = x, next_x = next_x,
              node = node, next_node = next_node,
              fill = factor(node), label = node)) +
  geom_sankey(flow.alpha = 0.6, node.color = "gray30") +
  geom_sankey_label(size = 3, color = "white", fill = "gray40") +
  scale_fill_viridis_d() +
  theme_sankey(base_size = 18) +
  labs(x = NULL) +
  theme(legend.position = "none",
        plot.title = element_text(hjust = .5))

geom_bar(position="dodge") and errorbar not dodging properly

So I've got a small issue. Here's my code:
ggplot(GFAPdata_numb, aes(x=Level, y=Pos.Area, fill=Statut))+
geom_bar(stat="identity", color="black", position = "dodge")+
geom_errorbar(aes(ymin=lower, ymax=higher), width=.2, position=position_dodge(.9))
For some reason I don't understand, my plot looks like this: weird dodge
The dodge seems to have worked somehow, but it looks like a "ghost" of the data is still stacked and interfering with my error bars.
Do you have any idea what's causing that?
Edit: I was asked to provide some data with dput, so here it is (first time using this function, so I'm not sure I did it right):
> dput(head(GFAPdata_numb))
structure(list(Agneau = c(1L, 1L, 1L, 1L, 2L, 2L), Statut = c("terme",
"terme", "terme", "terme", "terme", "terme"), Area = c(6.53,
6.53, 6.53, 6.53, 4.93, 4.93), Level = c("Weak", "Pos", "Strong",
"Neg", "Weak", "Pos"), Values = c(6744015L, 5076648L, 787615L,
13099676L, 5356151L, 3978924L), Positivity = c(0.262331844844596,
0.197473824638087, 0.0306370160768142, 0.509557314440504, 0.275961978086681,
0.205003880155091), Pos.Area = c(0.0401733299915154, 0.0302410144928157,
0.00469173293672499, 0.0780332793936453, 0.0559760604638299,
0.041582937151134), moyenne = c(0.0382848392036753, 0.0382848392036753,
0.0382848392036753, 0.0382848392036753, 0.050709939148073, 0.050709939148073
), ecart.type = c(0.0304231534615388, 0.0304231534615388, 0.0304231534615388,
0.0304231534615388, 0.0391149666608345, 0.0391149666608345),
SEM = c(0.0152115767307694, 0.0152115767307694, 0.0152115767307694,
0.0152115767307694, 0.0195574833304173, 0.0195574833304173
), lower = c(0.00847014881136729, 0.00847014881136729, 0.00847014881136729,
0.00847014881136729, 0.0123772718204552, 0.0123772718204552
), higher = c(0.0680995295959834, 0.0680995295959834, 0.0680995295959834,
0.0680995295959834, 0.0890426064756909, 0.0890426064756909
)), row.names = c(NA, -6L), groups = structure(list(Agneau = 1:2,
.rows = structure(list(1:4, 5:6), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
OK, I think I managed to fix the issue myself after a couple of days away from it to refresh my brain.
I was using "Pos.Area" as my y-value (that is, the value for each of my samples) instead of the mean of Pos.Area to create my plot. I guess that's why my error bars were so wild: I had one error bar for each individual value of Pos.Area.
Once I changed that, the plot was much better.
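The fix described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the asker's actual code: the `raw` data frame, its values, and the second Statut level "premature" are hypothetical, and the interval is assumed to be mean ± 1.96 × SEM. The point is to reduce the data to one row per Level and Statut before plotting, so each bar gets exactly one error bar.

```r
library(ggplot2)

# Hypothetical raw measurements: three samples per Level x Statut combination
raw <- data.frame(
  Level    = rep(c("Neg", "Weak", "Pos", "Strong"), each = 6),
  Statut   = rep(rep(c("terme", "premature"), each = 3), times = 4),
  Pos.Area = (1:24) / 100
)

# One row per Level x Statut: mean and standard error of the mean
means <- aggregate(Pos.Area ~ Level + Statut, data = raw, FUN = mean)
sems  <- aggregate(Pos.Area ~ Level + Statut, data = raw,
                   FUN = function(v) sd(v) / sqrt(length(v)))
names(means)[3] <- "moyenne"
names(sems)[3]  <- "SEM"
summ <- merge(means, sems, by = c("Level", "Statut"))

# Assumed interval: mean +/- 1.96 * SEM
summ$lower  <- summ$moyenne - 1.96 * summ$SEM
summ$higher <- summ$moyenne + 1.96 * summ$SEM

# Bars and error bars share the same dodge width so they line up
p <- ggplot(summ, aes(x = Level, y = moyenne, fill = Statut)) +
  geom_col(color = "black", position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = lower, ymax = higher),
                width = 0.2, position = position_dodge(0.9))

print(p)
```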

How to subset based on multiple variables in one column

Good afternoon,
I am trying to filter a data frame that includes 94 different items and how much they were sold in different hours, by the top 15 items sold overall, from the data below:
structure(list(Time = c("07", "07", "07", "07", "07", "08"),
Item = c("Bread", "Coffee", "Medialuna", "Pastry", "Toast",
"Afternoon with the baker"), Transactions = c(2L, 13L, 6L,
2L, 1L, 3L)), row.names = c(NA, -6L), groups = structure(list(
Time = c("07", "08"), .rows = structure(list(1:5, 6L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
I already know the 15 most popular items by name ("Coffee", "Tea", "Bread", etc.), and I have tried to subset the data frame using the code below:
SalesPerTimePerItem <- subset(SalesPerTimePerItem,
Item == c("Coffee",
"Tea",
"Bread",
"Cake",
"Pastry",
"Sandwich",
"Medialuna",
"Hot chocolate",
"Cookies",
"Brownie",
"Farm House",
"Muffin",
"Alfajores",
"Juice",
"Soup"))
But I have received this error:
In Item == c("Coffee", "Tea", "Bread", "Cake", "Pastry", "Sandwich", :
longer object length is not a multiple of shorter object length
I have also tried another method below:
SalesPTPI <- SalesPTPI[SalesPTPI$Item %in% c("Coffee",
"Tea",
"Bread",
"Cake",
"Pastry",
"Sandwich",
"Medialuna",
"Hot chocolate",
"Cookies",
"Brownie",
"Farm House",
"Muffin",
"Alfajores",
"Juice",
"Soup")]
But got the error:
Error: Must subset columns with a valid subscript vector.
i Logical subscripts must match the size of the indexed input.
x Input has size 3 but subscript `i` has size 631.
My aim, with this data, is to create a barplot like the one I have linked below:
but with different objects:
(x = Time, y = Transactions, fill = Item)
How can I keep only the 'Coffee', etc., rows from the data frame included at the top?
Your first example is wrong because == compares element by element, recycling the shorter vector along the longer one; it does not test whether each value appears anywhere in the vector. That recycling is also what triggers the "longer object length" warning.
To fix it, replace == with %in%.
Your second example is wrong because df[df$var %in% vals] selects columns, not rows.
To fix it, add a comma after the expression: df[df$var %in% vals, ]
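A small base R sketch of both points, using a plain data.frame built from the rows shown in the question and a shortened, hypothetical list of three items in place of the full top 15:

```r
# The six rows from the question, as a plain data.frame
sales <- data.frame(
  Time = c("07", "07", "07", "07", "07", "08"),
  Item = c("Bread", "Coffee", "Medialuna", "Pastry", "Toast",
           "Afternoon with the baker"),
  Transactions = c(2L, 13L, 6L, 2L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Hypothetical short list standing in for the 15 item names
top_items <- c("Coffee", "Bread", "Pastry")

# == recycles top_items along Item and compares position by position,
# so a row only matches if the item happens to sit at the "right" index.
by_equals <- sales$Item == top_items

# %in% asks, for each row, whether Item appears anywhere in top_items.
by_in <- sales$Item %in% top_items

# The trailing comma makes the logical vector index rows, not columns.
top_sales <- sales[by_in, ]

# Equivalent with subset()
top_sales_subset <- subset(sales, Item %in% top_items)
```

Here `by_equals` is FALSE for every row even though three of the items are in the list, while `by_in` picks out the Bread, Coffee and Pastry rows.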

make subway graph include 102 topics in ggplot2 r

This is a followup from subway-style graph for word frequency across three datasets in ggplot2
I used the code in the answer to that question, but am struggling with how best to adjust the graph so that it fits 100 unique dict entries without completely messing up the dict word labels on the margins.
I have tested out different amounts of words to feed into the subway graph, and found that it cannot contain more than 25 words.
I have data:
structure(list(dict = c("apple", "apple", "apple",
"mandarin", "mandarin", "mandarin", "orange", "orange", "orange", "pear"),
name = c("freq_ongov", "freq_onindiv", "freq_onmedia", "freq_ongov",
"freq_onindiv", "freq_onmedia", "freq_ongov", "freq_onindiv",
"freq_onmedia", "freq_ongov"), value = c(0, 87, 63, 0, 44,
20, 3, 27, 25, 0), rank = c(26, 85, 70, 26, 61, 42.5, 86,
47, 48, 26)), row.names = c(NA, -10L), groups = structure(list(
name = c("freq_ongov", "freq_onindiv", "freq_onmedia"), .rows = structure(list(
c(1L, 4L, 7L, 10L), c(2L, 5L, 8L), c(3L, 6L, 9L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 3L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
But there are 100 rows within this data that I want to include in the following code:
leftlabels <- df$dict[df$name == "freq_ongov"]
leftlabels <- leftlabels[order(df$rank[df$name == "freq_ongov"])]
rightlabels <- df$dict[df$name == "freq_onmedia"]
rightlabels <- rightlabels[order(df$rank[df$name == "freq_onmedia"])]
ggplot(df, aes(name, rank, color = dict, group = dict)) +
geom_line(size = 4) +
geom_point(shape = 21, fill = "white", size = 4) +
scale_y_continuous(breaks = seq(max(df$rank)), labels = leftlabels,
sec.axis = sec_axis(~., breaks = seq(max(df$rank)),
labels = rightlabels)) +
scale_x_discrete(expand = c(0.01, 0)) +
guides(color = guide_none()) +
coord_cartesian(clip = "off") +
theme(axis.ticks.length.y = unit(0, "points"))
I tried changing the y.int and width of the y axis to fit in 100 words, but that only makes the y-axis longer, without changing the spacing between each word label on the y-axis, so all the words get squeezed together. Any suggestions?

igraph arrow.mode seems to have no effect

I have a network with some directed and some undirected edges. I'm trying to use igraph to plot it using the arrow.mode parameter, but the graph is always showing arrows with default parameters. Here's an example
Here are some data:
spearRhoP_lagged4 <- structure(list(Var1 = c("ARISA_538.9", "ARISA_538.9", "ARISA_666.4",
"ARISA_686.9", "ARISA_538.9", "ARISA_594.1"), Var2 = c("ARISA_666.4",
"ARISA_686.9", "ARISA_686.9", "ARISA_666.4", "ARISA_561.8", "ARISA_561.8"
), rho = c(0.280885191364122, 0.415365287156247, 0.614493076574831,
0.312630564055403, 0.295296877306726, 0.381890811408216), p = c(0.00206314544835896,
2.9098006351119e-06, 1.35005674822095e-13, 0.000567475872663549,
0.00116911931220592, 1.98010880043619e-05), delay = c(0, 0, 0,
1, 0, 0), fdr = c(0.0135393920048557, 7.97032347878478e-05, 2.83511917126399e-11,
0.00503534929264839, 0.00898225813036257, 0.000366902513022),
arrow = c("-", "-", "-", ">", "-", "-")), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L), groups = structure(list(
Var1 = c("ARISA_538.9", "ARISA_538.9", "ARISA_538.9", "ARISA_594.1",
"ARISA_666.4", "ARISA_686.9"), Var2 = c("ARISA_561.8", "ARISA_666.4",
"ARISA_686.9", "ARISA_561.8", "ARISA_686.9", "ARISA_666.4"
), .rows = list(5L, 1L, 2L, 6L, 3L, 4L)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE))
Then I build the graph
LaggedSpearGraph <- graph_from_data_frame(spearRhoP_lagged4)
Lastly I plot the graph, telling it that I want the arrow direction to be specified by the parameter arrow
plot(LaggedSpearGraph,
vertex.size=2,
arrow.mode = E(LaggedSpearGraph)$arrow)
I get an output that looks like this.
But what I want is a network where there is only one edge with an arrow on it.
Any suggestions?
The parameter name needs the edge. prefix, i.e. edge.arrow.mode rather than arrow.mode:
LaggedSpearGraph <- graph_from_data_frame(spearRhoP_lagged4, directed = TRUE)
plot(LaggedSpearGraph,
     vertex.size = 10,
     edge.arrow.mode = E(LaggedSpearGraph)$arrow)
See here:
https://github.com/igraph/igraph/issues/954
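For a self-contained illustration, here is a minimal sketch on a made-up edge list (the `edges` data frame and vertex names are hypothetical). Columns after the first two become edge attributes, so `arrow` can be passed straight to edge.arrow.mode, where ">" requests a forward arrow and "-" requests none:

```r
library(igraph)

# Hypothetical edge list; the first two columns are the endpoints,
# and the remaining column is stored as an edge attribute.
edges <- data.frame(
  Var1  = c("A", "A", "B", "C"),
  Var2  = c("B", "C", "C", "B"),
  arrow = c("-", "-", "-", ">"),
  stringsAsFactors = FALSE
)

g <- graph_from_data_frame(edges, directed = TRUE)

# Only the C -> B edge should be drawn with an arrowhead
plot(g,
     vertex.size = 10,
     edge.arrow.mode = E(g)$arrow)
```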
