I have the following data, which contains values for 7 combinations (rows) and 12 methods (columns):
structure(list(Beams = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 2L
), .Label = c("1 – 2", "1 – 2 – 3 – 4", "1 – 3", "1 – 4", "2 – 3",
"2 – 4", "3 – 4"), class = "factor"), Slope...No.weight = c(75L,
65L, 45L, 30L, 95L, 70L, 75L), Slope...W1 = c(85L, 70L, 65L,
55L, 90L, 85L, 75L), Slope...W2 = c(80L, 65L, 65L, 50L, 90L,
90L, 75L), Slope...W3 = c(80L, 75L, 75L, 65L, 90L, 95L, 80L),
Average.Time...No.Weight = c(75L, 65L, 45L, 30L, 95L, 70L,
70L), Average.Time...W1 = c(70L, 60L, 75L, 60L, 75L, 75L,
80L), Average.Time...W2 = c(65L, 40L, 65L, 50L, 75L, 85L,
70L), Average.Time...W3 = c(65L, 40L, 80L, 75L, 65L, 85L,
80L), Momentum...No.weight = c(80L, 60L, 45L, 30L, 95L, 70L,
75L), Momentum...W1 = c(85L, 75L, 60L, 55L, 95L, 90L, 80L
), Momentum...W2 = c(80L, 65L, 70L, 50L, 90L, 90L, 85L),
Momentum...W3 = c(85L, 75L, 75L, 55L, 90L, 95L, 80L)), .Names = c("Beams",
"Slope...No.weight", "Slope...W1", "Slope...W2", "Slope...W3",
"Average.Time...No.Weight", "Average.Time...W1", "Average.Time...W2",
"Average.Time...W3", "Momentum...No.weight", "Momentum...W1",
"Momentum...W2", "Momentum...W3"), class = "data.frame", row.names = c(NA,
-7L))
I would like to get a barplot like the one below:
I've tried the following:
library(RColorBrewer)
dat<-read.csv("phaser-p13-30dBm-100ms.csv")
names <- c("1-2","1-3","1-4","2-3","2-4","3-4","1-2-3-4")
barx <-
barplot(as.integer(dat2[,2:13]),
beside=TRUE,
col=brewer.pal(12,"Set3"),
names.arg=names,
ylim=c(0,100),
xlab='Combination of beams',
ylab='Correct detection [%]')
box()
par(xpd=TRUE)
legend("top", c("Slope - No weight","Slope - W1","Slope - W2","Slope - W3","Average Time - No weight","Average Time - W1","Average Time - W2","Average Time - W3","Momentum - No weight","Momentum - W1","Momentum - W2","Momentum - W3"), fill = brewer.pal(12,"Set3"),horiz = T)
but I got this error:
Error in barplot.default(as.integer(dat2[, 2:13]), beside = TRUE, col = brewer.pal(12, :
incorrect number of names
Could you find the error?
I've named your data frame df here and used three packages, so this is not a base R solution. Given your dataset's format, this is the easiest way (IMO) to do it:
library(dplyr)
library(tidyr)
library(ggplot2)
df %>% # dataframe
gather(variable, value, -Beams) %>% # convert to long format excluding beams column
ggplot(aes(x=Beams, y=value, fill=variable)) + # plot the bar plot
geom_bar(stat='identity', position='dodge')
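If you want to use base graphics rather than ggplot2, this should get you started:
df <- as.matrix(dat[, -1])           # numeric matrix: 7 combinations x 12 methods
rownames(df) <- dat[, 1]             # label the rows with the beam combinations
barplot(df, beside = TRUE, las = 2)  # one group per method (column), one bar per combination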
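On the original error: whatever dat2 holds, as.integer() turns the heights into a plain vector, so beside = TRUE has no groups to work with and barplot() expects one label per bar; the 7 labels in names.arg do not match that bar count, hence "incorrect number of names". A minimal base R sketch of the grouping in the question (one group per beam combination, 12 bars per group) passes a matrix instead, transposed so that columns are combinations and rows are methods. The data are copied from the question's dput(); hcl.colors(12, "Set 3") is a base R stand-in for brewer.pal(12, "Set3"), and the legend placement is a guess you will probably want to adjust.

```r
# Data copied from the question's dput(): 7 beam combinations x 12 methods
dat <- data.frame(
  Beams = c("1-2", "1-3", "1-4", "2-3", "2-4", "3-4", "1-2-3-4"),
  Slope...No.weight        = c(75, 65, 45, 30, 95, 70, 75),
  Slope...W1               = c(85, 70, 65, 55, 90, 85, 75),
  Slope...W2               = c(80, 65, 65, 50, 90, 90, 75),
  Slope...W3               = c(80, 75, 75, 65, 90, 95, 80),
  Average.Time...No.Weight = c(75, 65, 45, 30, 95, 70, 70),
  Average.Time...W1        = c(70, 60, 75, 60, 75, 75, 80),
  Average.Time...W2        = c(65, 40, 65, 50, 75, 85, 70),
  Average.Time...W3        = c(65, 40, 80, 75, 65, 85, 80),
  Momentum...No.weight     = c(80, 60, 45, 30, 95, 70, 75),
  Momentum...W1            = c(85, 75, 60, 55, 95, 90, 80),
  Momentum...W2            = c(80, 65, 70, 50, 90, 90, 85),
  Momentum...W3            = c(85, 75, 75, 55, 90, 95, 80)
)

# With a matrix and beside = TRUE, barplot() draws one group per column and one
# bar per row, so transpose: rows = 12 methods, columns = 7 combinations
heights <- t(as.matrix(dat[, -1]))
colnames(heights) <- as.character(dat$Beams)  # used as the group labels

cols <- hcl.colors(nrow(heights), "Set 3")    # stand-in for brewer.pal(12, "Set3")

barx <- barplot(heights,
                beside = TRUE,
                col = cols,
                ylim = c(0, 100),
                xlab = "Combination of beams",
                ylab = "Correct detection [%]")
box()
legend("topleft", legend = rownames(heights), fill = cols,
       cex = 0.6, ncol = 3, bty = "n")
```

Because the matrix already carries column names, names.arg can be left out; barplot() returns a 12 x 7 matrix of bar midpoints (barx) that you can use to place extra labels.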
Use the ggplot2 package and make sure that your data is tidy and ordered?
Something like ggplot(dataframe, aes(fill = some_factor)) + geom_col(aes(x = some_variable, y = some_other_variable))
A more explicit statement of how your data maps onto the image would be useful.
I have the following data frame, and I want to plot a histogram for each column:
structure(list(ACTB = c(11.7087918, 13.1847403, 8.767737, 12.2949669,
12.399929, 12.130683, 9.816222, 10.700336, 11.862543, 12.479818,
12.48152, 11.798277, 12.0932696, 11.014992, 12.3496682, 11.9810211,
11.946094, 12.1517049, 11.6794028, 12.4895911, 12.787039, 12.2927522,
12.746232, 12.4428358, 11.6382198, 11.6833202, 12.3320067, 12.390378,
12.5550587, 11.597384, 11.7608624, 12.018702, 11.9211984, 11.7143178,
11.800693, 12.7543979, 12.7028472, 11.6509804, 11.5112258, 12.36468,
12.0704304, 12.5876125, 12.2929857, 11.764464, 12.3740263, 12.275172,
11.5247418, 11.9290723, 11.100383, 12.5631062, 10.647334, 12.265323,
11.457643, 12.194339, 11.468173, 12.355388, 12.3233796, 12.200504,
11.716417, 12.430028, 11.3201558, 11.43911, 12.9782049, 11.139062,
11.181185, 10.123614, 11.963833, 10.919224, 11.873896, 11.800616,
12.2159602, 11.6360763, 11.6204291, 11.5500821, 12.6783682, 11.918854,
11.8701782, 10.98058, 11.6254916, 12.1558646, 11.533709, 12.0096358,
12.2830638, 11.772724, 11.8853726, 12.041823, 12.623814, 12.3134903,
11.6714245, 12.1333082, 12.4747336, 11.5326378, 12.6222532, 10.922728,
10.9492515, 11.3410073, 12.3005053), ATP5F1 = c(8.3731175, 8.3995189,
8.871088, 8.4389342, 8.529104, 9.004405, 8.883721, 8.70097, 8.24411,
8.393635, 8.76813, 8.756177, 8.4418168, 7.986864, 8.4840108,
8.6523954, 8.5645576, 8.2452877, 8.2440872, 8.7155973, 9.028364,
8.3578703, 9.007441, 7.8892308, 9.0255621, 8.3165712, 8.3400111,
8.061171, 8.5216917, 8.337517, 8.2341439, 8.810458, 8.8794988,
8.4657149, 8.311901, 8.131606, 8.5865282, 9.0900416, 8.8407707,
7.437107, 8.3982759, 8.7610335, 8.3624475, 8.353429, 8.3630127,
8.555639, 8.6435841, 8.9587154, 8.517079, 8.9597121, 8.111514,
8.99767, 8.266991, 8.106218, 8.518875, 8.445485, 8.6409752, 8.662025,
8.697312, 8.071819, 8.3113401, 8.709276, 8.9154896, 8.138148,
6.866765, 9.391611, 8.448086, 8.29189, 8.541953, 8.801044, 8.3088083,
8.288688, 8.8357729, 8.4731257, 8.7321095, 8.383259, 8.4729561,
5.551528, 8.526436, 8.4548827, 8.242625, 8.9862422, 8.5688994,
8.848029, 8.2656363, 8.434976, 8.8023704, 8.6692361, 8.4333198,
8.2926568, 8.2141276, 8.3246346, 7.7262395, 8.0797336, 8.7005427,
8.7695946, 8.1262312), DDX5 = c(11.3122241, 11.7042284, 8.866042,
12.0376754, 12.417701, 11.479431, 10.078783, 9.043405, 11.216074,
11.846906, 11.161803, 8.713301, 11.0790887, 11.685125, 11.9599302,
12.4036502, 11.9778411, 11.9900709, 11.6069971, 11.2651929, 11.455536,
12.3741866, 11.558182, 11.498146, 12.5073231, 11.4546523, 11.8465482,
11.51445, 11.721283, 12.340818, 11.5388553, 11.920725, 11.7067172,
11.6207138, 11.638226, 11.1407525, 11.5832407, 11.981909, 11.7684202,
12.435987, 11.5253382, 10.9882446, 12.1789747, 11.956257, 12.5427815,
12.007658, 11.6360041, 12.2520109, 11.858959, 12.4740761, 6.927855,
11.117424, 7.749824, 11.518817, 11.322855, 11.74096, 11.768474,
11.497009, 11.912888, 11.570506, 11.8167398, 11.912566, 11.2631437,
11.328946, 11.072161, 12.807216, 12.127281, 12.125497, 11.524622,
11.20101, 11.5451414, 12.0747211, 11.5716524, 11.7223929, 11.8529683,
11.868865, 11.8998228, 9.859857, 12.1404707, 11.9166386, 12.613162,
12.9062351, 11.6691732, 11.984726, 11.727059, 11.421816, 11.9506736,
12.2447547, 11.8167228, 11.9021356, 12.5527606, 12.6511506, 11.8550833,
11.382018, 11.8314198, 11.8394352, 11.8128198), EEF1G = c(12.622405,
11.2945857, 8.610078, 13.1323891, 12.702769, 12.319703, 10.181874,
8.615338, 11.526551, 12.106198, 11.602801, 9.137166, 13.0991666,
13.049641, 12.2938678, 11.7442632, 12.7866184, 12.6753617, 12.9552413,
12.0861518, 13.136434, 12.64865, 13.298616, 11.8531038, 12.7791485,
13.4150478, 11.636058, 12.013313, 11.8785493, 12.771945, 12.5351321,
13.147321, 11.6760014, 12.2604174, 11.802344, 12.23351, 12.1175728,
12.7360727, 12.5730595, 11.13, 11.7737462, 11.9774565, 11.8927844,
12.17392, 12.441605, 12.221691, 12.4866463, 12.5645763, 12.070268,
12.1801377, 8.80704, 12.288168, 8.298831, 12.234659, 11.832415,
12.474423, 12.4440819, 11.888544, 11.625162, 12.161204, 12.2707656,
12.941017, 12.3491325, 12.978561, 11.833124, 11.782119, 12.273029,
12.462202, 12.538127, 12.236135, 12.2884941, 12.4195123, 12.5274317,
12.3917089, 11.912339, 12.439751, 12.0962051, 10.912737, 11.999598,
12.3776528, 11.348448, 12.4151316, 11.5389366, 11.328957, 12.4397802,
12.238454, 12.0192408, 12.2290439, 12.8381542, 11.1834666, 12.0636739,
12.4752125, 12.7681644, 12.1747129, 12.7343662, 12.3493937, 11.7971488
)), class = "data.frame", row.names = c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L,
20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L,
33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L,
46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L,
59L, 60L, 61L, 62L, 63L, 64L, 66L, 67L, 68L, 69L, 70L, 71L, 72L,
73L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L,
87L, 88L, 89L, 90L, 91L, 92L, 93L, 97L, 98L, 99L, 100L, 102L,
103L))
I want to create a grid of histograms, one per column; the list of columns is:
HK_GENES = c(
"ACTB", "ATP5F1", "DDX5", "EEF1G"
)
Is there a way of doing it with ggplot2?
I tried the following, without success:
ggplot(data=df_hk_genes, aes_string(x=HK_GENES)) +
geom_histogram(bins=15) +
facet_wrap(HK_GENES, nrow = 5, scale = "free_x")
In Python I could create a subfigure for each histogram and iterate over them.
I have around 20 columns in my original data frame, and I want to avoid repeating the same block for each column.
You can reshape the data and facet over the groups.
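library(reshape2)
library(dplyr)
library(ggplot2)
melt(df_hk_genes[HK_GENES]) %>%   # long format: one row per (variable, value) pair
  ggplot(aes(x = value)) +
  facet_wrap(~ variable, nrow = 5, scales = "free_x") +   # one panel per column
  geom_histogram(bins = 15)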
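If you would rather stay close to the Python "one subfigure per column" approach, base graphics can do the same with a panel grid from par(mfrow = ...) and a loop over HK_GENES. A minimal sketch; to keep it short, df_hk_genes here holds only the first 8 rows of the question's data, and the 2 x 2 layout is an assumption you would change (e.g. par(mfrow = c(5, 4))) for about 20 columns. Note that hist() treats a single number in breaks as a suggestion and picks "pretty" breakpoints near it.

```r
# First 8 rows of the question's data, just to keep the example short
df_hk_genes <- data.frame(
  ACTB   = c(11.7087918, 13.1847403, 8.767737, 12.2949669,
             12.399929, 12.130683, 9.816222, 10.700336),
  ATP5F1 = c(8.3731175, 8.3995189, 8.871088, 8.4389342,
             8.529104, 9.004405, 8.883721, 8.70097),
  DDX5   = c(11.3122241, 11.7042284, 8.866042, 12.0376754,
             12.417701, 11.479431, 10.078783, 9.043405),
  EEF1G  = c(12.622405, 11.2945857, 8.610078, 13.1323891,
             12.702769, 12.319703, 10.181874, 8.615338)
)
HK_GENES <- c("ACTB", "ATP5F1", "DDX5", "EEF1G")

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid of panels, filled row by row
hists <- lapply(HK_GENES, function(gene) {
  # one histogram per column; the column name becomes the panel title
  hist(df_hk_genes[[gene]], breaks = 15, main = gene, xlab = "value")
})
par(old_par)                     # restore the previous layout
names(hists) <- HK_GENES         # keep the histogram objects for later inspection
```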
I have been working on this for a while now, but I can't seem to figure it out. I'm looking for a solution that can calculate the difference between col1 and col2 and store it in colA, then calculate the difference between col2 and col3 and store it in colB, and so on. I have about 70 rows and 42 of these columns, so it's not something I want to do by hand (although at this point I am almost desperate enough).
Note also that some of the cells in the rows are empty (NA). An emergency solution would be to fill these with zeroes, but I'd rather not.
Also, the data frame I use is a tibble, but I'm not so attached to that that I can't convert it to a plain data frame.
My data looks like this:
testdata
As you can see, the columns have annoyingly long names that I also didn't know how to change :). I usually refer to them by column number, which is 77:119. I hope this is complete enough. Sorry for the noob-ness and possibly unclear explanation; this is my first question here and I'm not that crafty in R!
Finally, to create the 'user/intermittant_answers/n_length' columns I used the following loop, so I thought it would be possible to reuse it for the calculations I need now.
library(stringr)  # for str_length()
# loop through PARTS of testdata to create the _length columns
for(i in names(testdata[34:76]))
testdata[[paste(i, 'length', sep="_")]] <- str_length(testdata[[i]])
Then I tried something similar which I found here: FOR loop to calculate difference on dates in R
for (j in 2:length(testdata$`user/intermittant_answers/42_length`))
  testdata$lag[j] <- as.numeric(difftime(testdata$`user/intermittant_answers/42_length`[j], testdata$`user/intermittant_answers/42_length`[j-1], units=c("difference")), units = "days")
Error in as.POSIXct.numeric(time1) : 'origin' must be supplied
I figured this was because I'm not working with anything time-related, but I don't know of (or know how to find) another 'diff'-type function that isn't tied to matrices, like the one in the matrixStats package.
I hope someone can push me in the right direction!
Thank you!!
EDIT: @Ben, thank you for responding! If I had known about this function I would've used it way sooner :'). I tried to keep some NA values in the sample below so they are represented. Also, some people suggested using a double loop, but I haven't managed to figure that out. I hope this helps!
> dput(testdata[1:10, 95:105])
structure(list(`user/intermittant_answers/18_length` = c(NA,
24L, 34L, 33L, NA, NA, 16L, NA, 25L, 28L), `user/intermittant_answers/19_length` = c(NA,
38L, 68L, 34L, NA, 11L, 20L, 12L, 47L, 52L), `user/intermittant_answers/20_length` = c(NA,
59L, 81L, 42L, 2L, 33L, 20L, 26L, 96L, 78L), `user/intermittant_answers/21_length` = c(6L,
90L, 116L, 42L, 14L, 41L, 20L, NA, 127L, 113L), `user/intermittant_answers/22_length` = c(17L,
115L, 131L, 65L, 20L, 70L, 37L, 11L, 170L, 130L), `user/intermittant_answers/23_length` = c(40L,
138L, 188L, 65L, 38L, 113L, 22L, 24L, 200L, 136L), `user/intermittant_answers/24_length` = c(66L,
155L, 210L, 99L, 49L, 133L, 41L, 49L, 242L, 185L), `user/intermittant_answers/25_length` = c(66L,
158L, 233L, 99L, 65L, 156L, 67L, 70L, 296L, 224L), `user/intermittant_answers/26_length` = c(84L,
201L, 250L, 113L, 84L, 164L, 67L, 78L, 334L, 224L), `user/intermittant_answers/27_length` = c(89L,
237L, 285L, 130L, 97L, 167L, 84L, 86L, 412L, 232L), `user/intermittant_answers/28_length` = c(116L,
284L, 315L, 130L, 97L, 184L, 97L, 108L, 445L, 247L)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
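A minimal base R sketch for the consecutive-column differences, assuming the _length columns are contiguous and in the right order: subtract a data frame of columns 1..n-1 from a data frame of columns 2..n. The arithmetic is vectorised over whole columns, so no loop (or double loop) is needed, and a missing value simply gives NA in the result rather than requiring zeroes. The sign convention (later column minus earlier column) and the _diff suffix are assumptions; flip or rename as needed. For brevity, testdata holds only the first three length columns of the dput() above.

```r
# First three length columns from the dput() above (10 rows, NAs kept)
testdata <- data.frame(
  `user/intermittant_answers/18_length` = c(NA, 24L, 34L, 33L, NA, NA, 16L, NA, 25L, 28L),
  `user/intermittant_answers/19_length` = c(NA, 38L, 68L, 34L, NA, 11L, 20L, 12L, 47L, 52L),
  `user/intermittant_answers/20_length` = c(NA, 59L, 81L, 42L, 2L, 33L, 20L, 26L, 96L, 78L),
  check.names = FALSE   # keep the original column names with slashes
)

# In the full data this would be something like names(testdata)[77:119]
len_cols <- names(testdata)

prev <- testdata[len_cols[-length(len_cols)]]  # columns 1 .. n-1
nxt  <- testdata[len_cols[-1]]                 # columns 2 .. n
diffs <- nxt - prev                            # element-wise; NA in either gives NA

# Hypothetical naming scheme: "<later column>_diff"
names(diffs) <- paste0(len_cols[-1], "_diff")
testdata[names(diffs)] <- diffs                # add the new columns (works for tibbles too)
```

The same idea works for a single column pair with base::diff() on a row, but subtracting shifted column sets avoids looping over rows and columns entirely.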
I am trying to mimic some figures from journal papers. Here is an example from Schlenker and Roberts (2009).
I'd like to add a similar histogram to my own plot (see below). Is it possible to do this with ggplot2? Thanks.
See the dput() output below: rh is the x-axis variable and yhat1 is the y-axis variable.
> dput(df.m[,c('rh','yhat1')])
structure(list(rh = c(11L, 13L, 15L, 16L, 17L, 18L, 19L, 20L,
21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L,
34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L,
47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L,
60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L,
73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L,
86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L,
99L, 100L), yhat1 = c(0.0097784, 0.111762325, 0.0887123966666667,
0.24714677, 0.079887235, 0.162714825, 0.24789043, 0.107558165,
0.182885584545455, 0.136690964444444, 0.159203683333333, 0.5156053805,
0.587034213636364, 0.233377613, 0.31531245, 0.4778449572, 0.212574774137931,
0.2274105676, 0.253733041707317, 0.560999839354839, 0.224892959444444,
0.392268151304348, 0.351498776603774, 0.366547010727273, 0.35013903469697,
0.382026272372881, 0.510611202461538, 0.391176294871795, 0.423356474328358,
0.380316089137931, 0.459821489651163, 0.388949226593407, 0.506833284166667,
0.459263999259259, 0.558535709906542, 0.745323656071429, 0.60167464606383,
0.72210854266129, 0.695203745656566, 0.638265557105263, 0.52373110503876,
0.611695133046875, 0.963833986386555, 0.803060819275362, 0.837984669112426,
0.7931166204, 0.870764136976744, 1.21005393820225, 0.862845527777778,
1.028402381125, 1.2077895633526, 1.01176334204082, 1.08139833964706,
0.90346288, 1.05871937863014, 1.27788244930233, 1.16250975336634,
1.1450916525, 1.4412301412, 1.21264826238281, 1.35417930411504,
1.18588206727273, 1.40277204710084, 1.33194569259259, 1.18413544210084,
1.22718163528571, 1.33992107226667, 1.44770425268156, 1.43974964777778,
1.26656031551351, 1.58998655363636, 1.29994566024272, 1.46398530493902,
1.26061274530055, 1.30718501225275, 1.20523443567901, 1.23789593428571,
1.34433582230769, 1.36438752851852, 1.5915544857037, 1.10979387898438,
1.31898147708661, 1.426120105, 1.52075980155738, 1.40629729460177,
0.9048366681, 1.2973945580531, 1.37696154192982)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -88L))
Hopefully this can get you started:
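library(ggplot2)
breaks <- 20
# Tallest bin of a 20-interval cut() of rh, plus 1. geom_histogram() bins the data
# slightly differently, so this only approximates the tallest bar.
maxcount <- max(table(cut(df.m$rh, breaks = breaks))) + 1
ggplot(data = df.m, aes(x = rh)) +
  # the curve is scaled by 10 and shifted up by maxcount onto the count axis
  stat_smooth(formula = y ~ x, aes(y = yhat1 * 10 + maxcount), method = "loess") +
  scale_y_continuous(breaks = c(0, 5), "Exposure (Days)",
                     # the secondary axis undoes that transform to show yhat1 units
                     sec.axis = sec_axis(~ (. - maxcount) / 10,
                                         "Log of Daily Confirmed Case Counts")) +
  geom_histogram(bins = breaks, color = "black", fill = "green") +
  # a single dashed reference line at rh = 85 (annotate avoids one segment per row)
  annotate("segment", x = 85, xend = 85, y = maxcount, yend = Inf,
           colour = "red", linetype = "dashed") +
  labs(x = "Relative Humidity Percentage") + theme_classic() +
  theme(axis.line.y.left = element_line(color = "green"),
        axis.title.y.left = element_text(hjust = 0.05, color = "green"))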
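The two y-axes in the answer above are linked by an affine map: the fitted curve is drawn at yhat1 * 10 + maxcount on the count scale, and sec_axis() applies the inverse, (. - maxcount) / 10, so the right-hand axis reads in yhat1 units. The factor 10 just stretches the curve to a height comparable to the bar counts. A small base R check of that round trip, using the rh values from the question's dput() (11, 13, then 15 through 100); the helper names to_primary and to_secondary are hypothetical.

```r
rh <- c(11L, 13L, 15:100)   # the 88 rh values from the question's dput()

breaks <- 20
# Offset used in the answer: tallest cut() bin plus 1
maxcount <- max(table(cut(rh, breaks = breaks))) + 1

# Hypothetical helpers mirroring the answer's aes() and sec_axis() formulas
to_primary   <- function(y) y * 10 + maxcount    # yhat1 -> position on the count axis
to_secondary <- function(p) (p - maxcount) / 10  # count-axis position -> yhat1 units

yhat_examples <- c(0, 0.5, 1.5)
positions <- to_primary(yhat_examples)
recovered <- to_secondary(positions)
print(data.frame(yhat_examples, positions, recovered))
```

Any other monotone pair of functions would work as long as the sec_axis() formula is the exact inverse of the transform used in aes().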
I am trying to add a scatterplot and a barplot to the same plot area with ggplot. The scatterplot should show the average of variable '1' for each value of variable '2' in one dataset, and the bar should show the average value of '1' in my control dataset.
My data looks like this:
> dput(lapply(ubbs6, head))
list(structure(c(96L, 96L, 100L, 88L, 93L, 100L, 61L, 61L, 70L,
40L, 58L, 70L, 7807L, 7357L, 7695L, 6400L, 6009L, 7735L), .Dim = c(6L,
3L), .Dimnames = list(NULL, c("1", "2", "3"))), structure(c(99L,
96L, 100L, 96L, 96L, 96L, 66L, 67L, 70L, 63L, 57L, 62L, 7178L,
6028L, 6124L, 6082L, 6873L, 5629L, 31L, 27L, 60L, 42L, 12L, 18L
), .Dim = c(6L, 4L), .Dimnames = list(NULL, c("1", "2",
"3", "4"))), structure(c(99L, 95L, 95L, 100L, 96L, 95L, 69L,
58L, 56L, 70L, 61L, 65L, 6067L, 6331L, 6247L, 5988L, 7538L, 6162L,
50L, 36L, 67L, 10L, 55L, 70L), .Dim = c(6L, 4L), .Dimnames = list(
NULL, c("1", "2", "3", "4"))))
Example of what I've tried so far:
aggregate(ubbs6[[2]][,'1'], list(ubbs6[[2]][,'2']), mean)
m162 <- aggregate(ubbs6[[2]][,'1'], list(ubbs6[[2]][,'2']), mean)
m163 <- aggregate(ubbs6[[3]][,'1'], list(ubbs6[[3]][,'2']), mean)
m161 <- mean(ubbs6[[1]][,'1'])
ggplot(m162, aes_(x = m162[,'Group.1'], y = m162[,'x']))+
geom_point()+
geom_smooth(method = 'lm', formula = 'y ~ sqrt (x)')
I would like to do two things:
add a bar for one x,y value of my control set (ubbs6[[1]])
put this into an lapply() structure so I can do it for 11 similar datasets
Any help would be greatly appreciated!
EDIT: edited out specific details that aren't needed for others to understand the code.
Saving your data in d, and using your original column names (FPAR is column '1' and age is column '2'), you can try:
ggplot(as.data.frame(d[[2]]),aes(age, FPAR) ) +
coord_cartesian(ylim = c(90,100)) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ sqrt (x)') +
geom_col(data=data.frame(x=max(as.data.frame(d[[2]])$age),
y=mean(as.data.frame(d[[1]])$FPAR)),
aes(x,y), inherit.aes = FALSE)
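You have to use coord_cartesian() to set the y-limits (setting limits on the y scale instead would drop the bar, because it starts at 0, outside those limits) and inherit.aes = FALSE; otherwise the bar is not drawn correctly.
If you want to combine your second and third data frames in one plot, you can try: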
library(tidyverse)
d %>%
.[2:3] %>%
map(as.data.frame) %>%
bind_rows(.id = "id") %>%
mutate(max = max(age),
Mean = mean(d[[1]][, "FPAR"])) %>%   # mean of FPAR in the control set, not just its first element
ggplot(aes(age, FPAR, color=id)) +
geom_point() +
geom_smooth(method = 'lm', formula = 'y ~ sqrt (x)', se=FALSE) +
geom_col(data = . %>% distinct(max, Mean),
aes(max, Mean), inherit.aes = FALSE)
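The answer does not cover the second request (repeating the plot for all 11 datasets with lapply()). A minimal base R sketch using the question's column names ('1' = response, '2' = grouping variable): compute the control mean once, build the per-group means for every other dataset with lapply() + aggregate(), then draw one panel per dataset. It uses only the head() rows from the dput(), places the control bar with rect() at an arbitrary position to the right of the points, and uses base graphics only to stay dependency-free; the same lapply() pattern works with the ggplot2 code above if the function returns the plot object instead.

```r
# head() of each matrix from the question's dput()
ubbs6 <- list(
  matrix(c(96, 96, 100, 88, 93, 100,
           61, 61, 70, 40, 58, 70,
           7807, 7357, 7695, 6400, 6009, 7735),
         ncol = 3, dimnames = list(NULL, c("1", "2", "3"))),
  matrix(c(99, 96, 100, 96, 96, 96,
           66, 67, 70, 63, 57, 62,
           7178, 6028, 6124, 6082, 6873, 5629,
           31, 27, 60, 42, 12, 18),
         ncol = 4, dimnames = list(NULL, c("1", "2", "3", "4"))),
  matrix(c(99, 95, 95, 100, 96, 95,
           69, 58, 56, 70, 61, 65,
           6067, 6331, 6247, 5988, 7538, 6162,
           50, 36, 67, 10, 55, 70),
         ncol = 4, dimnames = list(NULL, c("1", "2", "3", "4")))
)

control_mean <- mean(ubbs6[[1]][, "1"])   # height of the control bar

# Mean of column "1" for each value of column "2", one data frame per dataset
summaries <- lapply(ubbs6[-1], function(m) {
  aggregate(m[, "1"], by = list(group = m[, "2"]), FUN = mean)
})

# One panel: points, a y ~ sqrt(x) fit, and the control bar to the right
plot_one <- function(s, control_mean, main) {
  bar_x <- max(s$group) + 2                        # hypothetical bar position
  ylim <- range(s$x, control_mean) + c(-5, 1)      # zoom in, like coord_cartesian()
  plot(s$group, s$x, pch = 19, main = main,
       xlim = c(min(s$group), bar_x + 1), ylim = ylim,
       xlab = "2", ylab = "mean of 1")
  fit <- lm(x ~ sqrt(group), data = s)
  curve(predict(fit, newdata = data.frame(group = x)),
        from = min(s$group), to = max(s$group), add = TRUE)
  rect(bar_x - 0.5, 0, bar_x + 0.5, control_mean, col = "grey")  # clipped at the plot edge
}

old_par <- par(mfrow = c(1, length(summaries)))
invisible(lapply(seq_along(summaries), function(i) {
  plot_one(summaries[[i]], control_mean, main = paste("dataset", i + 1))
}))
par(old_par)
```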
I want to create a barplot; my data is in a CSV file in the following format:
0,22
40,50
80,62
120,70
160,62
200,49
240,52
280,64
320,57
360,50
400,47
440,52
480,73
520,70
560,68
600,71
640,69
680,61
720,59
760,59
800,62
840,62
880,62
920,72
960,81
1000,89
1040,86
1080,76
1120,80
1160,95
The value before the comma should be the position on the x-axis, and the value after the comma the height of the bar at that position. I can do this in Excel, but the data is large.
The graph I want would look like this.
I have tried the following but I think it sums the data in each row.
data <- as.matrix(read.csv(file="data.csv",sep=",",header=FALSE))
barplot(data)
barplot(x$V2, names.arg = seq_len(nrow(x)), cex.names = .6)
Two things. First, if you pass the whole matrix as the height argument of barplot, each column becomes one stacked bar, so the values are effectively summed. Instead, pass only the column of heights:
dput(dat)
structure(c(0L, 40L, 80L, 120L, 160L, 200L, 240L, 280L, 320L,
360L, 400L, 440L, 480L, 520L, 560L, 600L, 640L, 680L, 720L, 760L,
800L, 840L, 880L, 920L, 960L, 1000L, 1040L, 1080L, 1120L, 1160L,
22L, 50L, 62L, 70L, 62L, 49L, 52L, 64L, 57L, 50L, 47L, 52L, 73L,
70L, 68L, 71L, 69L, 61L, 59L, 59L, 62L, 62L, 62L, 72L, 81L, 89L,
86L, 76L, 80L, 95L), .Dim = c(30L, 2L), .Dimnames = list(NULL,
c("V1", "V2")))
barplot(height=dat[,2])
Second, supply names.arg to barplot to get the labels:
barplot(height=dat[,2], names.arg=dat[,1])
A side note: it's best to avoid naming variables after built-in R functions. data (see ?data) is probably the most commonly overwritten one! I regularly use dat instead.
Using your method of getting the data into R:
myData <- read.csv(file = "data.csv", sep = ",", header = FALSE)
To make sure the bars are ordered by the values in the first column (although this is not strictly what you asked for in your question):
myData2 <- myData[order(myData[, 1]), ]
barplot(myData2[, 2], names.arg = myData2[, 1])
For tweaking the graph, I recommend spending some time reading ?barplot and ?par.
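barplot() places bars in evenly spaced slots and uses the first column only as labels. That is fine here because the positions are evenly spaced (every 40 units), but if the x values were irregular, a sketch with plot(type = "h") would put each bar at its actual x coordinate. The data are the 30 rows from the dput() above; lwd = 8 and lend = 1 (square line ends) are cosmetic choices that make the lines look like bars.

```r
# The 30 rows from the question: x position (V1) and bar height (V2)
dat <- cbind(V1 = seq(0, 1160, by = 40),
             V2 = c(22, 50, 62, 70, 62, 49, 52, 64, 57, 50, 47, 52, 73,
                    70, 68, 71, 69, 61, 59, 59, 62, 62, 62, 72, 81, 89,
                    86, 76, 80, 95))

# type = "h" draws a vertical line from 0 up to each y value at its real x position;
# lend = 1 gives square ("butt") line ends so the thick lines look like bars
plot(dat[, "V1"], dat[, "V2"], type = "h", lwd = 8, lend = 1,
     ylim = c(0, max(dat[, "V2"])), xlab = "position", ylab = "height")
```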