Plot classified by categories with column-names (R) - r

I have a data frame with the following structure:
D1A1 D1A2 D1A3 D1B1 D1B2 D1B3 D2A1 D2A2 D2A3 D2B1 D2B2 D2B3
10 12 15 40 39 27 11 13 14 33 31 32
The actual data frame is larger (40 observations / columns). I would like to create some kind of plot that shows all the numerical information together, with the data grouped by column classification (D1A, D1B, D2A, D2B), as follows:
D1A1+D1A2+D1A3 || D1B1+D1B2+D1B3 || D2A1+D2A2+D2A3 || D2B1+D2B2+D2B3
As I feel extremely lost, any suggestion would be appreciated.

We can split the data frame by the column-name prefix (the name with its trailing digits removed), loop over the resulting list to get the rowSums of each group, and pass the result to barplot:
out <- sapply(split.default(df1, sub("\\d+$", "", names(df1))),
              rowSums, na.rm = TRUE)
barplot(out)
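To make the split step easier to follow, here is a minimal sketch that runs it piece by piece on the single-row df1 from the question. The intermediate names `key` and `groups` are introduced only for illustration.

```r
# df1 as described in the question (a single observation).
df1 <- data.frame(D1A1 = 10L, D1A2 = 12L, D1A3 = 15L,
                  D1B1 = 40L, D1B2 = 39L, D1B3 = 27L,
                  D2A1 = 11L, D2A2 = 13L, D2A3 = 14L,
                  D2B1 = 33L, D2B2 = 31L, D2B3 = 32L)

# Strip the trailing digits so that D1A1, D1A2 and D1A3 all map to "D1A".
key <- sub("\\d+$", "", names(df1))

# split.default() treats the data frame as a list of columns
# and groups those columns by key.
groups <- split.default(df1, key)

# One row sum per group; with a single row this is one total per group.
out <- sapply(groups, rowSums, na.rm = TRUE)
print(unname(out))
```

Each element of `groups` is itself a small data frame holding the three columns of one category, which is why rowSums can be applied to it directly.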
If there are more rows and we want to plot them with the tidyverse, we can reshape into 'long' format with pivot_longer, making use of the pattern in the column names, i.e. capturing the part of each column name without the digits at the end. This creates 4 columns. Then we use summarise with across to get the sum of each column, reshape once more, and draw a bar plot with geom_col:
library(dplyr)
library(tidyr)
library(ggplot2)
df2 %>%
  pivot_longer(cols = everything(), names_to = ".value",
               names_pattern = "(.*)\\d+$") %>%
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_col()
If we are interested in the spread of the data, a boxplot can help. Here we don't summarise, and we use geom_boxplot instead of geom_col:
df2 %>%
  pivot_longer(cols = everything(), names_to = ".value",
               names_pattern = "(.*)\\d+$") %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_boxplot()
Data:
df1 <- structure(list(D1A1 = 10L, D1A2 = 12L, D1A3 = 15L, D1B1 = 40L,
D1B2 = 39L, D1B3 = 27L, D2A1 = 11L, D2A2 = 13L, D2A3 = 14L,
D2B1 = 33L, D2B2 = 31L, D2B3 = 32L), class = "data.frame", row.names = c(NA,
-1L))
df2 <- structure(list(D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = 15:14,
D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L
), D2A1 = 11:10, D2A2 = c(13L, 15L), D2A3 = c(14L, 17L),
D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L
)), class = "data.frame", row.names = c(NA, -2L))
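The first pivot_longer call is the least obvious step in the tidyverse version, so the following sketch runs it on its own and inspects the result. It assumes the tidyr package is installed; `long` is a name chosen here for illustration.

```r
library(tidyr)

# df2 as given in the data section (two observations).
df2 <- data.frame(D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = c(15L, 14L),
                  D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L),
                  D2A1 = c(11L, 10L), D2A2 = c(13L, 15L), D2A3 = c(14L, 17L),
                  D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L))

# ".value" tells pivot_longer() to use the captured prefix as the new
# column name, so D1A1, D1A2 and D1A3 are stacked into one column D1A.
long <- pivot_longer(df2, cols = everything(), names_to = ".value",
                     names_pattern = "(.*)\\d+$")

print(names(long))
print(dim(long))
```

Each original row contributes three rows to `long` (one per trailing digit), which is why the later summarise or boxplot step sees every value of a category in a single column.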


Grouped bar chart in R for multiple filter and select

Following is my dataset:
I want to plot a grouped bar graph. I am able to plot the following two graphs separately, but I want both results in the same graph.
coul <- brewer.pal(3, "Set2")
# Bar graph for passed courses
result_pass <-data %>% filter(Result=='Pass') %>% summarize(c1_tot=sum(course1),
c2_tot = sum(course2), c3_tot = sum(course3) )
col_sum <- colSums(result_pass[,1:3])
barplot(colSums(result_pass[,1:3]), xlab = "Courses", ylab = "Total Marks", col = coul, ylim=range(pretty(c(0, col_sum))), main = "Passed courses ")
# Bar graph for Failed courses
result_fail <-data %>% filter(Result=='Fail') %>% summarize(c1_tot=sum(course1),
c2_tot = sum(course2), c3_tot = sum(course3) )
col_sum <- colSums(result_fail[,1:3])
barplot(colSums(result_fail[,1:3]), xlab = "Courses", ylab = "Total Marks", col = coul, ylim=range(pretty(c(0, col_sum))), main = "Failed courses ")
Any suggestions on how I can merge the two plots above into one grouped bar graph for passed and failed courses?
It's probably easier than you think. Pass the data directly to aggregate with the formula . ~ Result, where . means all other columns. Dropping the first column with [-1] and coercing with as.matrix (because barplot expects a matrix) yields exactly the format barplot needs.
This is the basic code:
barplot(as.matrix(aggregate(. ~ Result, data, sum)[-1]), beside=TRUE)
And here with some visual enhancements:
barplot(as.matrix(aggregate(. ~ Result, data, sum)[-1]), beside=TRUE, ylim=c(0, 70),
col=hcl.colors(2, palette='viridis'), legend.text=sort(unique(data$Result)),
names.arg=names(data)[-1], main='Here could be your title',
args.legend=list(x='topleft', cex=.9))
data <- structure(list(Result = c("pass", "pass", "Fail", "Fail", "pass",
"Fail"), course1 = c(15L, 12L, 9L, 3L, 14L, 5L), course2 = c(17L,
14L, 13L, 2L, 11L, 0L), course3 = c(18L, 19L, 3L, 0L, 20L, 7L
)), class = "data.frame", row.names = c(NA, -6L))
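To show what barplot actually receives, here is a small sketch that builds the matrix step by step from the data above. The names `agg` and `m` and the added row names are only for illustration.

```r
data <- data.frame(
  Result  = c("pass", "pass", "Fail", "Fail", "pass", "Fail"),
  course1 = c(15L, 12L, 9L, 3L, 14L, 5L),
  course2 = c(17L, 14L, 13L, 2L, 11L, 0L),
  course3 = c(18L, 19L, 3L, 0L, 20L, 7L)
)

# One row per Result level, one summed column per course.
agg <- aggregate(. ~ Result, data, sum)

# Drop the grouping column and convert to a matrix: each column becomes
# one group on the x axis, each row one bar within that group.
m <- as.matrix(agg[-1])
rownames(m) <- agg$Result  # label the rows so the matrix is easier to read
print(m)
```

With beside=TRUE, barplot draws the rows of each column next to each other, which gives one Fail bar and one pass bar per course.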

Plot multiple geom_line and geom_smooth objects in one plot

I have somewhat messy looking dataframes, like this one:
# A tibble: 3 x 9
# Groups: Sequ [1]
Sequ Speaker Utterance A_intpl A_dur B_intpl B_dur C_intpl C_dur
<int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2 ID16.A cool >wha… 31.44786152… 10.5,17,1… 32.86993284… 9.5,16,17… 58.3368399… 14,17,17…
2 2 NA (0.228) 32.75735987… 15.5,17,1… 30.83469006… 14.5,16.9… 26.0386462… 3,17,16,…
3 2 ID16.B u:m Tenne… 32.05752604… 4.5,17,16… 29.95825107… 3.5,16,17… 55.9298614… 8,17,17,…
I want to plot the *_intpl values for each speaker (A, B, or C) for each of the three Utterances in a single chart both as line charts and as trend lines.
I'm just half successful doing this:
df0 %>%
pivot_longer(cols = contains("_"),
names_to = c("Event_by", ".value"),
names_pattern = "^(.*)_([^_]+$)") %>%
separate_rows(c(intpl, dur), sep = ",", convert = TRUE) %>%
mutate(Time = cumsum(dur)) %>%
mutate(Utterance = paste0(sub(".*(.)$", "\\1",Speaker), ": ", Utterance),
Utterance = factor(Utterance, levels = unique(Utterance))) %>%
ggplot(aes(x = Time, y = log2(intpl),
group = Event_by,
colour = Event_by)) +
geom_line() +
geom_smooth(method = 'lm', color = "red", formula = y~x) +
facet_wrap(~ Utterance, ncol = 1, scales= "free_x")
Half successful because the line plots and trend lines are side-by-side, as if in three columns, whereas they should be in rows, one below the other - how can that be achieved?
Reproducible data:
structure(list(Sequ = c(2L, 2L, 2L), Speaker = c("ID16.A", NA,
"ID16.B"), Utterance = c("cool >what part?<", "(0.228)", "u:m Tennessee="
), A_intpl = c("31.4478615210995,31.5797510648522,31.7143985369445,31.651083739602,31.5806035086034,36.8956763912703,36.2882129597292,35.2124499461012,34.1366869324732,34.1366869324732,32.1927035724058,30.2487202123383,28.3047368522709,26.3607534922035,30.5278334848495,30.5919390424853,30.8898529369568,31.578968913188,31.9011198738002,32.1543265113196,31.9708002079533,31.966536408565,31.8762658607759,31.8994741472105,31.4215913971938,32.1510578328563,31.7863350712876,32.4685052625667,31.7422271490296,32.3286054977263,31.9998974949481,32.5177992323864,32.4727499785435,32.9310888953766,32.7592010033585,33.2231711877427,33.1593949301066,33.2432973964816,33.2569729073414,33.492144800249,33.317650964723,33.4835787832119,33.2377190454279,32.9200836384356,32.9684568771567,32.6400987016883,27.5447101464944,29.3948945479171,35.3449171857603,33.5932932239592,31.8416692621581,30.0900453003569,32.7850431084597,32.7589003618266,32.8365550655013,32.386716057622,32.8420792704881,32.6909995562489,32.6269434402016,32.7370944106334,32.7529759209752,32.6528826975113,32.3663573764448,32.7326853004792,32.6930038462418,32.8975978772676,33.1752899475416,33.2034433355001,33.0667431432803,32.6322933080614,33.2503168843178,32.7573598713719",
), A_dur = c("10.5,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,0.5",
"15.5,17,17,16,17,17,16,17,17,16,17,17,16,12.5", "4.5,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,5.5"
), B_intpl = c("32.8699328424689,32.8154348109057,32.5454364786882,32.408257038977,32.5304564519672,32.3270203236281,31.9233218634346,32.0166346064182,31.7360745988363,31.7546527359571,31.8603220354065,31.6520061326962,31.5603191463274,31.3357561466519,31.0976090032219,31.1405090978825,31.1697180784961,31.0863999545386,31.3126984044729,30.580776446803,30.7137016246273,31.0801914571091,31.2343922096768,31.2749857511594,31.3488604642844,30.9327390960718,31.0750482778561,31.1849119826023,31.4180114886183,31.5284273181104,31.147361398529,31.1128597713973,31.5551385744611,31.7479939892741,31.5890352680344,31.5470790538009,31.5427330200078,31.3901913024084,31.5423214446953,31.4814325586741,31.4937336232021,31.3483738841556,31.2516462059018,31.2233881922543,31.2572951780583,31.0087226975291,31.1197589042273,31.053748381687,30.8202174718598,30.845143129195,30.8727194789634,30.4231467151428,30.7254093759809,30.2757746547116,30.6047530953025,29.6835591414008,28.257421076205,29.4634886416064,29.183064807185,28.6935506287734,29.3989017421637,30.8936090542518,30.6884831327852,30.805770713392,30.6938909098627,30.8317757801268,30.8509115577427,30.6836198471168,30.7979978629801,31.0260101704105,30.6248844591805,30.8346900656087",
), B_dur = c("9.5,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,17,16.9999999999854,16.0000000000146,17,17,16,17,17,16,17,17,16,17.0000000000146,16.9999999999854,16,2.5",
), C_intpl = c("58.3368399069697,58.249224089011,59.5198368051218,58.8722012497097,58.4418996252205,58.5849059154389,59.2752163985494,52.8407480422202,51.6276603912397,48.0255346632529,44.753541512539,41.4815483618252,38.2095552111114,34.9375620603975,31.6655689096837,28.3935757589698,25.121582608256,19.4712933827274,22.0108873782783,24.5504813738291,24.8441573376901,24.6902151101703,24.4029572181118,24.9753161974674,24.8664406826514,24.8486668451201,25.1137001504163,25.1142578332509,25.4902077628339,25.4075561268027,25.6622548410237,61.2421678149908,25.1600975771354,25.6667198263373,25.442560744158,25.8736383423437,25.5859074180431,24.7860400673889,24.4337707697216,24.3214953242744,23.915753514736,23.7363185577661,23.7186569801299,23.4313514771952,23.5730151254578,62.5124513171595,23.3260531660862,23.4498217326665,23.2145314844252,57.5586745434594,63.4646233226955,23.0706406704345,23.3318690599491,62.044649715831,62.2720656330432,22.2532276715887,62.7059140614625,22.9511208849958,22.5603175709988,23.3456453893988,63.2523901625561,60.6655429980934,60.2358824325868,59.957910796633,57.3999702562457,54.8277282980263,43.0269305132552,31.2261327284841,19.425334943713,22.7319906068577,26.0386462700023",
), C_dur = c("14,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,14",
"3,17,16,17,17,16,17,17,16,17,17,16,17,17,8", "8,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,17,16,17,2"
)), row.names = c(NA, -3L), groups = structure(list(Sequ = 2L,
.rows = structure(list(1:3), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
One possible solution uses the grid.arrange() function from the gridExtra package (together with grid).
I've turned each subset of your data into its own chart and then combined the charts into one arranged figure.
df1 = df0 %>%
pivot_longer(cols = contains("_"),
names_to = c("Event_by", ".value"),
names_pattern = "^(.*)_([^_]+$)") %>%
separate_rows(c(intpl, dur), sep = ",", convert = TRUE) %>%
mutate(Time = cumsum(dur)) %>%
mutate(Utterance = paste0(sub(".*(.)$", "\\1",Speaker), ": ", Utterance),
Utterance = factor(Utterance, levels = unique(Utterance)))
Store the chart objects in the environment:
for (i in unique(df1$Event_by)){
  for (j in levels(df1$Utterance)){
    assign(x = paste0(i,j), value = ggplot(data = df1[df1$Event_by == i & df1$Utterance == j,], aes(x = Time, y = log2(intpl))) +
      geom_line() +
      geom_smooth(method = 'lm', color = "red", formula = y~x))
  }
}
Create the gridded chart:
library(gridExtra)
library(grid)
grid.arrange(
  `AA: cool >what part?<`,
  `AB: u:m Tennessee=`,
  `ANA: (0.228)`,
  `BA: cool >what part?<`,
  `BB: u:m Tennessee=`,
  `BNA: (0.228)`,
  `CA: cool >what part?<`,
  `CB: u:m Tennessee=`,
  `CNA: (0.228)`,
  nrow = 3)
That said, I think there should be a better solution for this.
You can also explore the articles below on arranging plots:
Moreover, no theming has been added to my solution.
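One possible tidier variant, offered only as a sketch, keeps the plots in a list instead of creating one variable per plot with assign(). It assumes ggplot2 and gridExtra are installed and uses a made-up `toy` data frame with the same column names as df1, because the full data is not reproduced here.

```r
library(ggplot2)
library(gridExtra)

# Toy stand-in for df1 (hypothetical values, same column names as above).
set.seed(1)
toy <- expand.grid(Time = 1:10, Event_by = c("A", "B", "C"),
                   Utterance = c("u1", "u2", "u3"), stringsAsFactors = FALSE)
toy$intpl <- runif(nrow(toy), 20, 60)

# One data subset per Utterance/Event_by combination.
pieces <- split(toy, list(toy$Utterance, toy$Event_by), drop = TRUE)

# Build the plots in a list rather than assigning separate variables.
plots <- lapply(pieces, function(d) {
  ggplot(d, aes(x = Time, y = log2(intpl))) +
    geom_line() +
    geom_smooth(method = "lm", color = "red", formula = y ~ x) +
    ggtitle(paste(d$Event_by[1], d$Utterance[1]))
})

# arrangeGrob() builds the layout without drawing it;
# grid.arrange(grobs = plots, nrow = 3) would draw the same layout.
layout <- arrangeGrob(grobs = plots, nrow = 3)
```

Because the list is ordered with Utterance varying fastest, nrow = 3 places one speaker per row, matching the order used in the grid.arrange() call above.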

Grouped barplots in R using csv

I have a 3 column csv file like this
I want to make a barplot in R with the first column's values on the x axis and the second and third columns' values as grouped bars for the corresponding x. I hope that is clear. Can someone please help me with this? My data is huge, so I have to import the csv file and can't type in all the data. I found related posts, but none addressed exactly this.
Thank you
Use the following code:
library(tidyr)
library(ggplot2)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
  ggplot(aes(x, value, fill = y)) + geom_col(position = "dodge")
Data:
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
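Since the question starts from a CSV file, here is a base-R sketch of the import step followed by a grouped barplot. This is an alternative to the ggplot2 code above, not the answer's own method. The column names x, y1 and y2 and the file name "data.csv" are assumptions; inline text stands in for the file so the example is self-contained.

```r
# Inline text stands in for the CSV file (hypothetical contents);
# with a real file use read.csv("data.csv") instead.
csv_text <- "x,y1,y2
100,50,10
200,10,20
300,15,5"
df <- read.csv(text = csv_text)

# barplot() wants one row per series and one column per x value,
# so take the two value columns and transpose them.
m <- t(as.matrix(df[c("y1", "y2")]))

barplot(m, beside = TRUE, names.arg = df$x,
        legend.text = rownames(m), xlab = "x", ylab = "value")
```

read.csv handles files of any length, so the same code should apply to a large file as long as it has the same three columns.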

R: filter %in% range not filtering values with decimals

I have a dataset e:
structure(list(num = c(23L, 23L, 23L), code = structure(1:3, .Label = c("A",
"B", "C"), class = "factor"), ranking = c(140.5, 140.5,
2662), bottom = c(-0.0207357225475016, -0.0146710913954366,
-0.019899240924872), previous = c(0.00312288516116536,
0.00207118230618904, -0.00191931365721628), mean_of_all = c(-0.000222419352160109,
-0.00107348087538642, -0.00202343390338765)), row.names = c(NA,
-3L), class = "data.frame")
The following code:
winner_filtered <- e %>%
  group_by(code) %>%
  filter(ranking %in% (winner_lower:winner_upper))
is not filtering the two values of 140.5 as expected.
Any guesses? Thanks.
The column 'ranking' is numeric, and winner_lower:winner_upper only generates values in steps of 1 starting from winner_lower, so a value such as 140.5 will generally not be an exact member of that sequence. Filter with the comparison operators (>=, <=) instead, or use the convenient wrapper between:
e %>%
group_by(code) %>%
filter(between(ranking, winner_lower, winner_upper))
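The difference can be seen in base R without any packages. In this sketch the bounds 100 and 200 are hypothetical, since the question does not show winner_lower and winner_upper.

```r
ranking <- c(140.5, 140.5, 2662)

# Hypothetical bounds; the question does not show the real values.
winner_lower <- 100
winner_upper <- 200

# The sequence contains only 100, 101, ..., 200, so 140.5 is not a member.
in_seq <- ranking %in% (winner_lower:winner_upper)

# A range comparison, which is what between() computes,
# keeps both 140.5 values and drops 2662.
in_range <- ranking >= winner_lower & ranking <= winner_upper

print(in_seq)
print(in_range)
```

%in% tests for exact membership in a set of values, while the comparison tests whether a value lies inside an interval, which is what the question is after.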

Creating new column in one dataframe based on column from another dataframe

I have a dataframe as follows:
dput(head(modellingdata, n = 5))
structure(list(X = 1:5, heading = c(2, 0.5, 2, 1.5, 2), StartFrame = c(27L,
28L, 24L, 31L, 35L), StartYaw = c(0.0719580421911421, 0.0595571128205128,
0.0645337707459207, 0.0717132524475524, 0.066818187062937), FirstSteeringTime = c(0.433389999999999,
0.449999999999989, 0.383199999999988, 0.499899999999997, 0.566800000000001
), pNum = c(1L, 1L, 1L, 1L, 1L), EarlyResponses = c(FALSE, FALSE,
FALSE, FALSE, FALSE), PeakFrame = c(33L, 34L, 32L, 38L, 46L),
PeakYaw = c(0.201025641025641, 0.140734297249417, 0.187890472913753,
0.154032698135198, 0.23129368951049), PeakSteeringTime = c(0.533459999999998,
0.550099999999986, 0.516700000000014, 0.616600000000005,
0.750100000000003), heading_radians = c(0.0349065850398866,
0.00872664625997165, 0.0349065850398866, 0.0261799387799149,
0.0349065850398866), error_rate = c(2.86537083478438, 11.459301348013,
2.86537083478438, 3.82015500141104, 2.86537083478438), error_growth = c(0.34899496702501,
0.0872653549837393, 0.34899496702501, 0.261769483078731,
0.34899496702501)), row.names = c(NA, 5L), class = "data.frame")
Each row of my df is a trial. Overall, I have 3037 rows (trials). pNum denotes the participant number - I have 19 participants overall.
I also have a dataframe of intercepts for each participant:
dput(head(heading_intercept, n = 19))
c(0.432448612242496, 0.446371667203615, 0.420854119185846, 0.366763485495426,
0.355619586392715, 0.381658477093055, 0.512552445721875, 0.317210665852951,
0.358345666677048, 0.421441965798511, 0.477135103908373, 0.325512003640487,
0.5542144068862, 0.454182438162137, 0.333993738757344, 0.424179318544432,
0.272486598058728, 0.37014581658542, 0.397112817663261)
What I want to do is create a new column "intercept" in my modellingdata dataframe. If pNum is 1, I want to select the first intercept in heading_intercept and input that value for every row where pNum is 1. When pNum is 2, I want to input the second intercept value into every row where pNum is 2. And so on...
I have tried this:
for (i in c(1:19)){
  if (modellingdata$pNum == i){
    modellingdata$intercept <- c(heading_intercept[i])
  }
}
However this just inputs the first heading_intercept value for every row and every pNum. Does anybody have any ideas? Any help is appreciated!
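if is not vectorised: it expects a single TRUE/FALSE, so it cannot test a whole column at once. Instead, index the intercepts directly by pNum:
modellingdata$intercept <- heading_intercept[modellingdata$pNum]
Or with minimum modification of your current loop:
modellingdata$intercept <- 0L
for (i in c(1:19)){
  rows <- modellingdata$pNum == i
  if (any(rows)) {
    modellingdata$intercept[rows] <- heading_intercept[i]
  }
}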
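As a quick check of the indexing approach, here is a sketch on miniature, hypothetical versions of the two objects (five trials, three participants, with the first three intercepts from the question).

```r
# Hypothetical miniature versions of the objects from the question.
modellingdata <- data.frame(pNum = c(1L, 1L, 2L, 3L, 2L))
heading_intercept <- c(0.432448612242496, 0.446371667203615, 0.420854119185846)

# pNum acts as a position index into the intercept vector,
# so each row picks up the intercept of its participant.
modellingdata$intercept <- heading_intercept[modellingdata$pNum]
print(modellingdata)
```

This relies on pNum running from 1 to the number of intercepts with no gaps, which matches the description in the question.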
