How to control the line thickness in geom_line() in R

I would like to control the thickness of the lines in my plot, but I ran into some difficulty. Adding size=0.06 or size=2 in geom_line() does not change the line thickness. It also adds a strange legend to the output. How should I fix that?
The code I used for plotting is:
ggplot(data =df)+
geom_line(aes(x = ADY, y = AVAL, color = PARAMCD, yaxs="d", xaxs="d", size=0.06))+
geom_point(aes(x = ADY, y = AVAL))+
scale_color_discrete(breaks=c("SYSBP", "DIABP", "PULSE"),name = "Vital signs", labels = c("Systolic BP", "Diastolic BP", "Pulse"))+
scale_colour_manual(values=c(DIABP="#512d69",SYSBP="#007254",PULSE="#fd9300"))
The output for size=0.06 and size =2 are:
Could someone give me some guidance on this? I don't want size to be shown in the legend, and I would like to control the thickness of the lines. Thanks.
The sample data can be built with this code:
df<- structure(list(ADY = c(-6, -6, -6, 1, 1, 1, 8, 8, 8, 15, 15,
15, 22, 22, 22, 29, 29, 29, 43, 43, 43, 57, 57, 57, 64, 87, 87,
87, 101, 101, 101), AVAL = c(66, 67, 127, 70, 58, 136, 68, 74,
140, 145, 74, 58, 75, 72, 149, 82, 66, 143, 86, 60, 159, 64,
87, 136, NA, 73, 58, 135, 141, 74, 74), PARAMCD = structure(c(3L,
1L, 2L, 1L, 3L, 2L, 3L, 1L, 2L, 2L, 1L, 3L, 1L, 3L, 2L, 1L, 3L,
2L, 1L, 3L, 2L, 3L, 1L, 2L, NA, 1L, 3L, 2L, 2L, 1L, 3L), .Label = c("DIABP",
"SYSBP", "PULSE"), class = "factor")), row.names = c(NA, -31L
), class = "data.frame")

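size should be outside aes() in your case. Inside aes(), size = 0.06 is treated as a variable to be mapped onto a size scale, so every line gets the same scaled thickness and a legend entry for size is added. Outside aes(), it is a constant applied directly to the lines. (In ggplot2 3.4.0 and later, the argument for line thickness is linewidth; size still works for lines but is deprecated.)
You can see the difference between size = 0.06 and size = 2: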
library(ggplot2)
ggplot(data =df)+
geom_line(aes(x = ADY, y = AVAL, color = PARAMCD), size=0.06) + # yaxs/xaxs are base-graphics parameters, not ggplot2 aesthetics
geom_point(aes(x = ADY, y = AVAL))+
scale_colour_manual(values=c(DIABP="#512d69",SYSBP="#007254",PULSE="#fd9300"))
ggplot(data =df)+
geom_line(aes(x = ADY, y = AVAL, color = PARAMCD), size=2) + # yaxs/xaxs are base-graphics parameters, not ggplot2 aesthetics
geom_point(aes(x = ADY, y = AVAL))+
scale_colour_manual(values=c(DIABP="#512d69",SYSBP="#007254",PULSE="#fd9300"))
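To make the constant-versus-mapped distinction concrete, here is a minimal sketch on toy data (the values in toy are hypothetical, not the asker's df). It assumes ggplot2 3.4.0 or later, where the line thickness argument is called linewidth; on older versions size = 2 would be used in the same position.

```r
library(ggplot2)

# Toy data standing in for df (hypothetical values)
toy <- data.frame(
  ADY     = rep(c(1, 8, 15), times = 2),
  AVAL    = c(70, 74, 72, 136, 140, 145),
  PARAMCD = rep(c("DIABP", "SYSBP"), each = 3)
)

# Colour is mapped to a variable (inside aes), thickness is a constant (outside aes)
p <- ggplot(toy, aes(x = ADY, y = AVAL, colour = PARAMCD)) +
  geom_line(linewidth = 2)

# The constant is stored as a layer parameter, not as a mapping,
# so no scale and no legend are created for it
p$layers[[1]]$aes_params
```

Printing p should show two thick lines and only the colour legend.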


Determine the difference between the medians of two groups with 95% CI in R (not the median of the differences)

I have data on a change in a continuous outcome and the baseline score of this outcome of 49 patients. Further I've made a factor variable dividing patients in a low baseline score (Q1) or a high baseline score (Q2) based on the median baseline score. This data looks as follows:
library(boot)
mydata <-
structure(
list(
ID=c(4, 13, 20, 24, 30, 34, 37, 38, 48, 49, 51, 52, 54, 58, 75, 80, 81, 82, 83, 84, 92, 95, 103, 104, 115,
117, 125, 127, 138, 141, 153, 160, 172, 180, 185, 197, 198, 202, 205, 213, 221, 253, 255, 258, 262,
271, 277, 279, 320),
change_continuous_outcome = c(694, 52, 1500, 195, 53, 54, -500, 2, -21, 394, -10, -38, 43, 1500,
-500, -11, 8, 149, 0, 473, 8, 797, 313, 9, 263, 1219, 68, 216,
75, 0, 95, 698, -1, 750, 168, 251, -381, 19, 70, 0, 182, 4, -28,
36, 37, 18, 3, 928, -4),
baseline_continuous_outcome = c(2646.8, 3112.4, 10661.6, 5706.7, 81.5, 3730.4, 196.1, 83.9, 177.3, 1976.7,
3196.8, 2007.5, 63.2, 7594.5, 3261.8, 155.2, 57.2, 11189.7, 0,
2800.8, 13.9, 3484.5, 3528.1, 3636.6, 9.1, 5681.4, 67.9, 205.4, 138.4,
3141.1, 138.5, 3795.9, 152.7, 7349.1, 2123.4, 122, 5935.8, 100.7,
2023.4, 4095.4, 2636.1, 11.9, 2241.1, 198.2, 186, 20.2, 97.7, 6709.8, 169.5),
q2vsq1_baseline_cont_outcome = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L,
1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L,
1L, 1L, 2L, 1L ), .Label = c("0", "1"), class = "factor")),
row.names = c(NA, -49L),
class = c("tbl_df", "tbl", "data.frame"))
I've performed a Wilcoxon rank sum test to compare the change_continuous_outcome variable between patients with a low baseline score and patients with a high baseline score:
wilcox.test(mydata$change_continuous_outcome ~ mydata$q2vsq1_baseline_cont_outcome)
Wilcoxon rank sum test with continuity correction
data: mydata$change_continuous_outcome by mydata$q2vsq1_baseline_cont_outcome
W = 201.5, p-value = 0.04995
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(x = c(53, -500, 2, -21, 394, 43, -11, 8, :
cannot compute exact p-value with ties
Now I am interested in computing the difference between the medians of the two groups, including a 95% confidence interval. I want to use the boot function to do this. Its statistic function takes two arguments: the data and an index vector. So I need to write a function that indexes my data and calculates the difference in medians between the groups. Borrowing something I found elsewhere (https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/) I made:
med.diff <- function(d, i) {
mydata <- d[i,]
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="2"]) -
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="1"])
}
boot_result <- boot(data=mydata, statistic=med.diff, R=1000)
median(boot_result$t)
boot.ci(boot_result, type = "perc")
However this returns NA results. Is there something faulty in my formula? Or is the problem elsewhere?
Thanks in advance!
From what I can tell, the NA comes from this line:
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="2"])
which is NA. The grouping column is a factor whose underlying integer codes 1 and 2 carry the labels "0" and "1". Comparing it with "2" matches no rows, so the subset is empty, and median() of an empty vector is NA. If you change your function to:
med.diff <- function(d, i) {
mydata <- d[i,]
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="1"]) -
median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="0"])
}
and rerun boot(), you get (exact values vary between runs because the bootstrap resamples at random):
median(boot_result$t)
> 143
boot.ci(boot_result, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot.ci(boot.out = boot_result, type = "perc")
Intervals :
Level Percentile
95% ( -1.0, 579.4 )
Calculations and Intervals on Original Scale
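The label mismatch described above can be reproduced in a few lines of base R. This is a small sketch on made-up values (grp and value are hypothetical, not the asker's data), showing why matching a label that does not exist yields NA rather than an error.

```r
# A factor whose labels are "0" and "1", like the grouping column in the question
grp   <- factor(c("0", "1", "1", "0"))
value <- c(10, 200, 300, 20)

levels(grp)                  # the only labels that == can match

# No level is called "2", so the comparison is FALSE everywhere
empty <- value[grp == "2"]
length(empty)                # 0
median(empty)                # NA: the median of an empty vector

# Matching the labels that actually exist gives a usable difference
diff_med <- median(value[grp == "1"]) - median(value[grp == "0"])
diff_med                     # 235
```

Checking levels() on the grouping column before writing the statistic function is a cheap way to catch this.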

ggplot2: How to change the width of boxplots according to another variable?

I created a boxplot showing the dispersal distance $dist of some species $spe, and I would like the width of the boxes to be proportional to the regeneration density of these species. I used varwidth and the weight aesthetic as shown below, but the result is still not correct: the width still depends on the number of observations and not only on the regeneration density.
(For the density, I calculated a proportion for each species, so it ranges from 10 to 100. It is stored in the column data_dist2$prop2.)
p <- ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) +
coord_flip() +
geom_boxplot(varwidth = TRUE, alpha=0.3, aes(weight=data_dist2$prop2), fill='grey10')
Would you have any idea how to make the boxplot exactly proportional to my prop2 column?
Reproducible example:
data_dist2 <- structure(list(spe = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("Abies concolor", "Picea abies", "Sequoia semp."
), class = "factor"), dist = c(0, 0, 3, 3, 4, 4, 25, 46, 59,
113, 113, 9, 12, 12, 12, 15, 22, 22, 22, 22, 35, 35, 36, 49,
85, 85, 90, 5, 5, 1, 1, 8, 13, 48, 48, 52, 52, 52, 65, 89), prop2 = c(92.17,
92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 92.17, 92.17,
92.17, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9,
10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 10.9, 100, 100, 100, 100,
100, 100, 100, 100, 100, 100, 100, 100, 100)), row.names = c(NA,
-40L), class = "data.frame")
The weight aesthetic doesn't seem to be designed exactly for this, but you can hack it a bit. First note that the weight given to each group is the sum of the weights of its observations, so if the species have different numbers of observations you may need to divide prop2 by the number of observations in the group. (I can't tell from your example whether this applies.)
Then note that the width is proportional to the square root of the weight, so square the weight to reverse that:
p <- ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) +
coord_flip() +
geom_boxplot(varwidth = TRUE, alpha=0.3, aes(weight=prop2^2), fill='grey10') # refer to the column directly inside aes(), not via data_dist2$
Miff beat me to it, but here's my answer anyway. As Miff said, you can weight the width by your prop2.
ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) +
geom_boxplot(aes(weight = prop2),
varwidth = TRUE,
fill='grey10', alpha=0.3) +
coord_flip()
But geom_boxplot() implicitly takes the sample size into account. So you need to divide that away in your weights. Here's how you can do it with data.table.
library(data.table)
setDT(data_dist2) # convert to data.table
data_dist2[, weight := prop2 / .N, by = spe] # Divide prop2 by sample size for each species
ggplot(data_dist2, aes(x = reorder(spe, prop2), y = dist)) +
geom_boxplot(aes(weight = weight), # note weight = weight, not weight = prop2
varwidth = TRUE,
fill='grey10', alpha=0.3) +
coord_flip()
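The two adjustments in the answers above (squaring, and dividing by the group size) can be combined and checked numerically without plotting. The sketch below assumes, as the answers state, that the box width under varwidth scales with the square root of the summed weights per group; it uses a stand-in data frame d with the same group sizes and prop2 values as the example, and the species names a, b, c are placeholders.

```r
# Stand-in for data_dist2: three species with 11, 16 and 13 observations
d <- data.frame(
  spe   = rep(c("a", "b", "c"), times = c(11, 16, 13)),
  prop2 = rep(c(92.17, 10.9, 100), times = c(11, 16, 13))
)

# Number of observations in each row's group
n_per_group <- ave(d$prop2, d$spe, FUN = length)

# Square prop2 (to undo the square root) and divide by the group size
# (to undo the summation over observations)
d$weight <- d$prop2^2 / n_per_group

# Under the stated assumption, this is what the width would be proportional to
width_proxy <- sqrt(tapply(d$weight, d$spe, sum))
width_proxy
```

If the assumption holds for the installed ggplot2 version, passing aes(weight = weight) with this column should give widths in the ratio of prop2, independent of sample size.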

Is it possible to add different linetypes based on group to the reg.line in ggscatter?

I am creating a scatter plot between a dependent variable FA and subject age (in yrs) that has 2 group conditions (Exercise and Rest).
I am having issues setting the regression lines so that I can have the regression line for the Exercise group be solid and the Rest group be dotted.
It seems as if ggscatter won't allow me to provide 2 different linetypes in add.params = list(linetype=c("solid", "dotted"))?
I have looked at the source code and using
add.params = list(linetype="dotted") changes the linetype for both Ex and Rest group. But when trying add.params = list(linetype=c("solid", "dotted")) I get the error
Error: Aesthetics must be either length 1 or the same as the data (160): linetype
diffusion_data <-
structure(list(FA_full_cov = c(0.153232, 0.164497, 0.111886,
0.14139, 0.130546, 0.18607, 0.181865, 0.139148, 0.178903, 0.136147,
0.140427, 0.143346, 0.140975, 0.148248, 0.128336, 0.147552, 0.126607,
0.127531, 0.153574, 0.124305, 0.168183, 0.146543, 0.135313, 0.139777,
0.148862, 0.154091, 0.131398, 0.145124, 0.136015, 0.128609, 0.159028,
0.158221, 0.124092, 0.139492, 0.142623, 0.195182, 0.229651, 0.144567,
0.169234, 0.181687, 0.136057, 0.14369, 0.143988, 0.152487, 0.109607,
0.139264, 0.139382, 0.13402, 0.159948, 0.141635, 0.177908, 0.133823,
0.196866, 0.204928, 0.15321, 0.150005, 0.126811, 0.158618, 0.135901,
0.147437), age = c(63, 57, 75, 75, 72, 58, 60, 63, 56, 58, 65,
81, 65, 65, 77, 74, 74, 67, 55, 56, 79, 59, 64, 71, 60, 63, 70,
68, 74, 68, 63, 57, 75, 75, 72, 58, 60, 63, 56, 58, 65, 81, 65,
65, 77, 74, 74, 67, 55, 56, 79, 59, 64, 71, 60, 63, 70, 68, 74,
68), Conditions = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("Ex", "Rest"), class = "factor")), row.names =
c(NA, 60L), class = "data.frame")
library(ggpubr)
a = ggscatter(diffusion_data, y="FA_full_cov", x="age", color = "Conditions", palette = c("black", "grey39"), shape = "Conditions", add = "reg.line", add.params = list(linetype=c("solid", "dotted")), conf.int = TRUE, cor.coef = TRUE, cor.method = "pearson", cor.coef.size = 5, cor.coef.coord = c(70,.25)) +
ggtitle("Hippocampal FA with Respect to Age") + xlab("Age (years)") + ylab("FA") + theme(plot.title = element_text(hjust = 0.5, size = 30)) +
theme(axis.text.x = element_text(size = 20)) +
theme(axis.text.y = element_text(size = 15)) +
theme(axis.title.y = element_text(size = 20)) +
theme(axis.title.x = element_text(size = 20)) +
theme(legend.text = element_text(size=15)) +
scale_shape_manual(values = c(16,1))
ggpar(a, ylim = c(.05,.25))
a
FA_full_cov and age are continuous variables, and Conditions is a factor with 2 levels (Ex and Rest).
Image of graph when just using add.params = list(linetype="dotted"):
Below is the code I used to recreate the graph with ggplot instead of ggscatter. There I was able to change the linetype and color by group with geom_smooth():
b = ggplot(diffusion_data, aes(age,FA_full_cov, shape=Conditions, color =
Conditions)) + geom_point(size=2.5)+ scale_color_manual(values = c("black",
"grey39"))+
geom_smooth(aes(linetype = Conditions, fill=Conditions), method = "lm", formula =
y~x, color ="black") +
scale_fill_manual(values = c("black", "grey39")) +
scale_shape_manual(values = c(16,1)) +
ggtitle("Hippocampal FA with Respect to Age") + xlab("Age (years)") + ylab("FA") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5, size = 30),
axis.text.x = element_text(size = 20),
axis.text.y = element_text(size = 15),
axis.title.y = element_text(size = 20),
axis.title.x = element_text(size = 20),
legend.text = element_text(size=15),
legend.text.align = 0,
legend.position = "top"
)
Hmm, I think what you want to achieve is pretty difficult with ggpubr::ggscatter() (btw, you should ideally add the library() call).
But it's very easy with ggplot2!
If you are open to using 'simple ggplot' and not some crazy package which is built on top of it, then here is a solution:
library(ggplot2)
ggplot(diffusion_data, aes(age, FA_full_cov, color = Conditions))+
geom_point() + # draw the points
geom_smooth(aes(linetype = Conditions)) + # draw the regression curve/ line.
# For regression lines, specify method = 'lm'
scale_color_brewer(palette = 'Greys') # just for the sake of it
#> `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Created on 2019-07-24 by the reprex package (v0.2.1)
You can easily change the linetype using scale_linetype_manual
Note that I have removed big parts of your plot code because most of it wasn't necessary for the actual problem.
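As a sketch of the scale_linetype_manual() suggestion, the following uses simulated data (toy is hypothetical and only mimics the column names of diffusion_data) and fits straight regression lines with method = "lm", one solid and one dotted.

```r
library(ggplot2)

set.seed(1)
# Simulated stand-in for diffusion_data (values are made up)
toy <- data.frame(
  age        = rep(55:74, times = 2),
  FA         = rnorm(40, mean = 0.15, sd = 0.02),
  Conditions = factor(rep(c("Ex", "Rest"), each = 20))
)

p <- ggplot(toy, aes(age, FA, colour = Conditions)) +
  geom_point() +
  # one linear fit per group, with linetype mapped to the group
  geom_smooth(aes(linetype = Conditions), method = "lm", formula = y ~ x) +
  # named vector: each factor level gets an explicit linetype
  scale_linetype_manual(values = c(Ex = "solid", Rest = "dotted"))
```

Naming the elements of values ties each linetype to a level explicitly, so the assignment does not depend on the order of the factor levels.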
In ggscatter try:
add.params = list(linetype = "Conditions")

Setting range and breaks on scale on ggplot2

Using a sample dataframe:
df <- structure(list(SITCD = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("GSO/TO", "IKOF", "JL",
"MES", "SSD", "USSD"), class = "factor"), Code = structure(c(27L,
21L, 3L, 25L, 26L, 20L, 2L, 28L, 230L, 16L, 4L, 10L, 15L, 1L), .Label = c("AAR-2107",
"AAR-643", "AAR-644", "AAR-995", "HAR-2956", "HAR-2957", "I-430",
"I-431", "I-432", "I-9490", "I-9491", "K-1461", "K-1740", "K-1915",
"K-2034", "K-2096", "K-2385", "K-2386", "K-2387", "K-3112", "K-3220",
"K-3224", "Lu-1095", "Lu-1103", "LU-3282", "LU-3283", "LU-3284",
"LU-3400", "Lu-487", "Lu-489,90", "Lu-491,92", "Lu-528", "Lu-529",
"Lu-530", "Lu-531", "Lu-585", "Lu-586", "Lu-608", "Lu-646", "Lu-647",
"Lu-648", "Lu-711", "Lu-714", "Lu-766", "Lu-768", "Lu-790", "Lu-792",
"Lu-793", "Lu-826", "Lu-827", "Lu-828", "Lu-829", "Lu-830", "Lu-831",
"Lu584", "M-1611", "M-1612", "M-1613", "M-1614", "M-1615", "M-1616",
"M-1617", "M-1618", "M-1619", "M-1620", "M-1621", "M-1622", "M-1623",
"M-1624", "OS-49305", "OS-49306", "OS-49308", "OS-49309", "OS-49311",
"OS-49312", "OS-49313", "OS-49314", "OS-49315", "OS-49384", "OS-49385",
"OS-49386", "OS-49387", "OS-49403", "OS-49414", "OS-49437", "OS-49440",
"OS-49441", "OS-49442", "OS-49493", "OS-49496", "OS-49499", "OS-49502",
"OS-49506", "OS-49515", "OS-49516", "OS-49517", "OS-49518", "OS-49519",
"OS-49520", "OS-49555", "OS-49558", "OS-49562", "OS-49565", "OS-49578",
"OS-49580", "OS-49581", "OS-49582", "OS-49583", "OS-49584", "OS-49605",
"OS-49606", "OS-49607", "OS-51568", "OS-51716", "OS-51759", "OS-51760",
"OS-51765", "OS-51766", "OS-51767", "OS-51769", "OS-51770", "OS-51774",
"OS-51775", "OS-51776", "OS-51845", "OS-51846", "OS-51847", "OS-51874",
"OS-51875", "OS-51882", "OS-51883", "OS-51884", "OS-51885", "OS-52112",
"OS-52956", "OS-52957", "OS-52962", "OS-52963", "OS-52964", "OS-52966",
"OS-52967", "OS-52968", "OS-52969", "OS-52970", "OS-54002", "OS-54004",
"OS-54005", "OS-54006", "OS-54007", "OS-54008", "OS-54009", "OS-54045",
"OS-54046", "OS-54048", "OS-54073", "OS-54074", "OS-54075", "OS-54076",
"OS-54077", "OS-54892", "OS-55609", "OS-55610", "OS-55611", "OS-55612",
"OS-55613", "OS-55614", "OS-55724", "OS-55725", "OS-55728", "OS-55729",
"OS-55730", "OS-55731", "OS-55732", "OS-55733", "OS-55734", "OS-55735",
"OS-55736", "OS-55737", "OS-58249", "OS-58250", "OS-58324", "OS-58325",
"OS-58326", "OS-58327", "OS-58509", "OS-58606", "OS-58607", "OS-58609",
"OS-58673", "OS-58674", "OS-58701", "OS-58702", "OS-58703", "OS-58704",
"OS-58705", "OS-58732", "OS-58735", "OS-59579", "OS-62849", "OS-62850",
"OS-62851", "OS-62852", "OS-62855", "OS-62985", "OS-62986", "OS-62992",
"OS-62994", "OS-64754", "OS-64755", "OS-64756", "OS-64759", "OS-64760",
"OS-64762", "OS-64764", "OS-64765", "OS-64766", "OS-64843", "OS-64844",
"OS-64845", "OS-64849", "OS-65398", "OS-65399", "OS-65401", "OS-65405",
"OS-65406", "OS-65435", "OS-65436", "OS-65437", "OS-65438", "T-10382",
"Unknown", "W-1381", "Y596", "Y599", "Y600", "Y602", "Y702",
"Y703", "Y704", "Y708", "Y711", "Y712", "Y713", "Y714", "Y716",
"Y717", "Y876", "Y878", "Y879", "Y882", "Y883", "Y884"), class = "factor"),
Type = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 1L, 1L), .Label = c("Above", "At", "Below"), class = "factor"),
RSL = c(5, 8, 17.5, 19, 27, 30, 30, 33, 35, 40, 40, 50, 53,
70), RSL_error = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5,
2), Age = c(8183.5, 9221.5, 10424.5, 10069, 9092, 10465.5,
9204.5, 10531.5, 9844.5, 10073.5, 9905, 9907.5, 11660, 10698.5
), age_error = c(232.5, 295.5, 519.5, 371, 323, 377.5, 336.5,
324.5, 318.5, 408.5, 327, 380.5, 463, 394.5), x_min_error = c(7951L,
8926L, 9905L, 9698L, 8769L, 10088L, 8868L, 10207L, 9526L,
9665L, 9578L, 9527L, 11197L, 10304L), x_max_error = c(8416L,
9517L, 10944L, 10440L, 9415L, 10843L, 9541L, 10856L, 10163L,
10482L, 10232L, 10288L, 12123L, 11093L), y_min_error = c(3,
6, 15.5, 17, 25, 28, 28, 31, 33, 38, 38, 48, 48, 68), y_max_error = c(7,
10, 19.5, 21, 29, 32, 32, 35, 37, 42, 42, 52, 58, 72)), .Names = c("SITCD",
"Code", "Type", "RSL", "RSL_error", "Age", "age_error", "x_min_error",
"x_max_error", "y_min_error", "y_max_error"), row.names = c(NA,
14L), class = "data.frame")
I wish to draw a graph using the following code:
g <- ggplot (df, aes(x=Age, y=RSL, shape = Type)) +
geom_point() +
scale_shape_manual(values=c(1,15,5)) + #makes open circle/triangle
theme(axis.line=element_line(colour = "black", size = 0.5, linetype = "solid")) + # adds solid black x and y axis
geom_errorbar(aes(ymin=y_min_error, ymax=y_max_error,width=0,)) + # y error bar
geom_errorbarh(aes(xmin=x_min_error, xmax=x_max_error,height=0,)) +
theme_classic() +
theme_bw()+ #Black outline around the graph
xlim(0, 14000) +#Set axis limits
ylim(0, 120) +
#scale_x_continuous(breaks=seq(0,14000,2000))+
#scale_y_continuous(breaks=seq(0,120,20))+
theme(legend.position="bottom")
g
I was wondering why I am having difficulty setting the axes scale. I am trying to use the scale_x_continuous(breaks=seq(...) code which wasn't working. I then read elsewhere that I had to set the limits of the scales which I did with xlim/ylim but I can't use this with the scale_x_continuous code as I get the error message:
Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
Does anyone have any ideas?
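That message is informational, not an error: xlim() and ylim() are shorthand for adding a continuous scale with limits, so a later scale_x_continuous() replaces the scale they created. Set the breaks and the limits in a single call instead, i.e. replace xlim(0, 14000) with scale_x_continuous(breaks = seq(0, 14000, 2000), limits = c(0, 14000)), and do the same for y.
Tidier code:
library(ggplot2)
ggplot(df, aes(Age, RSL, shape = Type)) +
geom_point() +
geom_errorbarh(aes(xmin = x_min_error,
xmax = x_max_error,
height = 0)) +
geom_errorbar(aes(ymin = y_min_error,
ymax = y_max_error,
width = 0)) +
scale_shape_manual(values = c(1, 15, 5)) +
scale_y_continuous(breaks = seq(0, 120, 20),
limits = c(0, 120)) +
scale_x_continuous(breaks = seq(0, 14000, 2000),
limits = c(0, 14000))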
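Breaks are just a numeric vector, so it can help to inspect them in the console before passing them to a scale. The short sketch below (variable names are illustrative) compares a sequence that starts at 0 with one that starts at 1; the starting value decides whether the tick labels land on round numbers.

```r
# Breaks every 2000 units, starting at the lower limit
round_breaks <- seq(0, 14000, by = 2000)
round_breaks
# 0 2000 4000 6000 8000 10000 12000 14000

# Starting at 1 shifts every break off the round values
shifted_breaks <- seq(1, 15000, by = 1000)
head(shifted_breaks, 3)
# 1 1001 2001

# Breaks outside the limits are not drawn, so it is worth checking the range
all(round_breaks >= 0 & round_breaks <= 14000)
```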

How to use this date of tall array in R ggplot2 Date x-axis order?

I am trying to work out how to convert the string dates in a tall (long-format) data frame to Date and order the x-axis of the ggplot by them with scale_x_date.
Pseudocode motivated by Henrik's proposal
Convert the string data with as.Date, maybe something like the following in ggplot's parameter x = ...
as.Date(Time.data, format = "%d.%m.%Y")
Apply scale_x_date in ggplot with date_breaks = "2 days"
Code with dummy data data3
library("ggplot2")
# For RStudio
options(device="pdf") # https://stackoverflow.com/questions/6535927/how-do-i-prevent-rplots-pdf-from-being-generated
filename.pdf <- paste0(getwd(), "/", "Rplots", ".pdf", sep = "")
pdf(file=filename.pdf)
# Dummy data
data3 <- structure(list(Time.data = c("16.7.2017", "15.7.2017",
"14.7.2017", "13.7.2017", "12.7.2017", "11.7.2017", "9.7.2017",
"7.7.2017", "6.7.2017", "5.7.2017", "4.7.2017", "3.7.2017", "2.7.2017",
"1.7.2017", "30.6.2017", "29.6.2017", "28.6.2017", "16.7.2017",
"15.7.2017", "14.7.2017", "13.7.2017", "12.7.2017", "11.7.2017",
"9.7.2017", "7.7.2017", "6.7.2017", "5.7.2017", "4.7.2017", "3.7.2017",
"2.7.2017", "1.7.2017", "30.6.2017", "29.6.2017", "28.6.2017",
"16.7.2017", "15.7.2017", "14.7.2017", "13.7.2017", "12.7.2017",
"11.7.2017", "9.7.2017", "7.7.2017", "6.7.2017", "5.7.2017",
"4.7.2017", "3.7.2017", "2.7.2017", "1.7.2017", "30.6.2017",
"29.6.2017", "28.6.2017"), variable = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L
), .Label = c("ave_max", "ave", "lepo"), class = "factor"),
value = c(69, 75, 83, 97, 101, 73, 77, 78, 98, 79, 78, 95,
70, 81, 78, 71, 72, 58, 59, 59, 58, 54, 56, 60, 60, 62, 58,
56, 63, 58, 58, 63, 58, 56, 48, 51, 51, 48, 48, 48, 52, 53,
52, 49, 48, 53, 50, 50, 54, 46, 47)), row.names = c(NA, -51L
), .Names = c("Time.data", "variable", "value"), class = "data.frame")
#Relevant part of the code based on Henrik's proposal,
#rejected timestamp approach which output has wrongly shown x-axis label in Fig. 1
p <- ggplot(data3, aes(x = as.Date(Time.data, format = "%d.%m.%Y"), y = value, fill = variable)) +
geom_bar(stat='identity') +
theme(axis.text.x = element_text(angle = 90, hjust=1),
text = element_text(size=10)) +
scale_x_discrete("Date") +
scale_x_date(date_breaks = "2 days", date_labels = "%d.%m.%Y")
print(p)
dev.off()
Output which I do not understand
Scale for 'x' is already present. Adding another scale for 'x', which will replace the existing scale.
Fig. 1 Output based on Henrik's proposal
Expected output: as such but with correct x-label there on the x-axis
OS: Debian 9
R: 3.4.0
RStudio: 1.0.143
Other sources: Date format for subset of ticks on time axis, scale_datetime shifts x axis, Time series plot gets offset by 2 hours if scale_x_datetime is used
You have specified two different scales for the x axis, a discrete scale and a continuous date scale, presumably in an attempt to rename the label on the x axis. The second scale replaces the first, which is what the message reports. To set the label, xlab() can be used:
library(ggplot2)
ggplot(data3, aes(x = as.Date(Time.data, format = "%d.%m.%Y"), y = value, fill = variable)) +
# use new geom_col() instead of geom_bar(stat = "identity")
# see http://ggplot2.tidyverse.org/articles/releases/ggplot2-2.2.0.html#stacking-bars
geom_col() +
theme(axis.text.x = element_text(angle = 90, hjust=1),
text = element_text(size=10)) +
# specify label for x axis
xlab("Time.date") +
scale_x_date(date_breaks = "2 days", date_labels = "%d.%m.%Y")
Alternatively, you can use the name parameter to scale_x_date():
ggplot(data3, aes(x = as.Date(Time.data, format = "%d.%m.%Y"), y = value, fill = variable)) +
geom_col() +
theme(axis.text.x = element_text(angle = 90, hjust=1),
text = element_text(size=10)) +
scale_x_date(name = "Time.date", date_breaks = "2 days", date_labels = "%d.%m.%Y")
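The reason for converting to Date before plotting can be seen without ggplot2. The sketch below takes three of the date strings from data3 and shows that sorting the raw strings gives a different order from sorting the parsed dates.

```r
# Three of the date strings from the question, in day.month.year form
x <- c("16.7.2017", "2.7.2017", "30.6.2017")

# Parse with the matching format string
d <- as.Date(x, format = "%d.%m.%Y")

sort(x)   # character order: "16.7.2017" "2.7.2017" "30.6.2017"
sort(d)   # chronological order: 2017-06-30, 2017-07-02, 2017-07-16

# Difference in days between the first and last date
as.numeric(diff(range(d)))
```

A date scale needs this chronological information; with plain strings the axis would be treated as discrete categories.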
Addendum: Saving plots
If the intention is to save just one plot in a file you can add a call to ggsave() after the call to ggplot(), i.e.,
ggplot(...
ggsave("Rplots.pdf")
instead of
options(device="pdf") # https://stackoverflow.com/questions/6535927/how-do-i-prevent-rplots-pdf-from-being-generated
filename.pdf <- paste0(getwd(), "/", "Rplots", ".pdf", sep = "")
pdf(file=filename.pdf)
p <- ggplot(...
print(p)
dev.off()
According to help("ggsave")
ggsave() is a convenient function for saving a plot. It defaults to
saving the last plot that you displayed, using the size of the current
graphics device. It also guesses the type of graphics device from the
extension.
Another issue is the creation of the file path. paste0() has no sep argument, so sep = "" is simply pasted on as one more (empty) string. Instead of
filename.pdf <- paste0(getwd(), "/", "Rplots", ".pdf", sep = "")
it is better to use
filename.pdf <- file.path(getwd(), "Rplots.pdf")
which constructs the path to a file from components in a platform-independent way.
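A quick console check makes the difference visible. In this sketch the directory name out is a placeholder standing in for getwd(); the last line shows what happens when a non-empty sep is passed to paste0().

```r
# paste0() treats sep as just another piece of text to append
p1 <- paste0("out", "/", "Rplots", ".pdf", sep = "")
p1

# file.path() joins the components with the path separator
p2 <- file.path("out", "Rplots.pdf")
p2

identical(p1, p2)            # TRUE here, because the extra piece is empty

# With a non-empty sep the surprise becomes visible
paste0("a", "b", sep = "-")  # "ab-"
```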
