I am trying to make a barplot in R with the data frame and code below. However, when I do this, the Year column also gets drawn as a set of bars in my graph. Is there a way to stop this from happening but still keep my graph sorted by year?
barplot(t(as.matrix(Number_Letter_Year_DF)), beside=TRUE,
xlab="Year", ylab="Number",
names.arg=c("2016","2017", "2018"),
legend= c("A", "B","C","D","E","F"), args.legend = list(title="Letter", x="topright", cex=.7))
abline(h=0)
Year A B C D E F
2016 2547.150 2001.075 2493.925 1123.450 1876.625 1718.175
2017 2829.025 1808.025 2681.850 2633.425 3005.525 2542.550
2018 1776.175 1538.900 1614.675 845.225 1155.500 1029.325
We can remove the first column (Year) and use it as the row names instead:
barplot(t(`row.names<-`(as.matrix(Number_Letter_Year_DF[-1]), Number_Letter_Year_DF$Year)), beside=TRUE,
xlab="Year", ylab="Number",
names.arg=c("2016","2017", "2018"),
legend= c("A", "B","C","D","E","F"), args.legend = list(title="Letter", x="topright", cex=.7))
abline(h=0)
Data:
Number_Letter_Year_DF <- structure(list(Year = 2016:2018, A = c(2547.15, 2829.025, 1776.175
), B = c(2001.075, 1808.025, 1538.9), C = c(2493.925, 2681.85,
1614.675), D = c(1123.45, 2633.425, 845.225), E = c(1876.625,
3005.525, 1155.5), F = c(1718.175, 2542.55, 1029.325)), class = "data.frame",
row.names = c(NA,
-3L))
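The `row.names<-` one-liner above can also be split into separate steps, which may be easier to read. A minimal sketch, assuming the data frame from the dput() output; `m` and `mids` are just illustrative names, and `legend.text` is the full name of the argument that `legend =` partially matches in the code above:

```r
# Data from the question
Number_Letter_Year_DF <- structure(list(Year = 2016:2018,
  A = c(2547.15, 2829.025, 1776.175), B = c(2001.075, 1808.025, 1538.9),
  C = c(2493.925, 2681.85, 1614.675), D = c(1123.45, 2633.425, 845.225),
  E = c(1876.625, 3005.525, 1155.5), F = c(1718.175, 2542.55, 1029.325)),
  class = "data.frame", row.names = c(NA, -3L))

# Drop the Year column so it is not drawn as bars ...
m <- as.matrix(Number_Letter_Year_DF[, -1])
# ... and keep it as row names, which become the group labels after t()
rownames(m) <- Number_Letter_Year_DF$Year

# t(m): one column (group of bars) per year, one bar per letter
mids <- barplot(t(m), beside = TRUE,
                xlab = "Year", ylab = "Number",
                legend.text = colnames(m),
                args.legend = list(title = "Letter", x = "topright", cex = 0.7))
abline(h = 0)
```

Because the years are now row names, `names.arg` is no longer needed: barplot() takes the group labels from the column names of `t(m)`.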
I have a large, tidy dataset formatted like this:
year Risk_score
2019 a
2019 b
2019 c
2020 d
2020 e
2020 f
2021 g
2021 h
2021 i
where the letters stand for different values.
Whenever I try to visualize the data with geom_boxplot, it shows me either a big diagonal line or a straight line:
senderoriskyy %>% ggplot(aes(x = factor(Risk_score), y = year)) + geom_boxplot()
I've tried converting the variables to factors and to numeric, but I keep getting the same result.
Output from running:
dput(head(senderoriskyy, 10))
structure(list(year = c("Risk_Score_in_2020", "Risk_Score_in_2021",
"Risk_Score_in_2020", "Risk_Score_in_2021", "Risk_Score_in_2019",
"Risk_Score_in_2020", "Risk_Score_in_2021", "Risk_Score_in_2019",
"Risk_Score_in_2020", "Risk_Score_in_2021"), Risk_score = c(`0.33040000000000003` = 0.3304,
`0.30687999999999999` = 0.30688, `2.9` = 2.9, `0.46500000000000002` = 0.465,
`1.16256` = 1.16256, `0.32256000000000001` = 0.32256, `0.27776000000000001` = 0.27776,
`0.19488` = 0.19488, `26.905999999999999` = 26.906, `23.581` = 23.581
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))
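In the code above the two aesthetics appear to be swapped: geom_boxplot() draws one box per distinct x value, so `x = factor(Risk_score)` produces a separate, degenerate box for almost every row, while the character column `year` ends up on the y axis. A minimal sketch, assuming the goal is one box per year: map `year` to x and the numeric `Risk_score` to y. The data are the ten rows from the dput() output (the names on `Risk_score` are dropped); removing the "Risk_Score_in_" prefix is only a cosmetic step.

```r
library(ggplot2)

# The ten rows from dput(head(senderoriskyy, 10))
senderoriskyy <- data.frame(
  year = c("Risk_Score_in_2020", "Risk_Score_in_2021", "Risk_Score_in_2020",
           "Risk_Score_in_2021", "Risk_Score_in_2019", "Risk_Score_in_2020",
           "Risk_Score_in_2021", "Risk_Score_in_2019", "Risk_Score_in_2020",
           "Risk_Score_in_2021"),
  Risk_score = c(0.3304, 0.30688, 2.9, 0.465, 1.16256, 0.32256, 0.27776,
                 0.19488, 26.906, 23.581)
)

# Optional: keep only the year so the x-axis labels are short
senderoriskyy$year <- factor(sub("Risk_Score_in_", "", senderoriskyy$year))

# Grouping variable on x, numeric variable on y: one box per year
p <- ggplot(senderoriskyy, aes(x = year, y = Risk_score)) +
  geom_boxplot()
```

Printing `p` then shows three boxes, one for each year.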
I have 3 dataframes:
> head(ps_data)
mass value
1 1197.106 0.0003046761
2 1197.312 0.0002792939
3 1197.518 0.0002545125
4 1197.724 0.0002304614
5 1197.930 0.0002072700
6 1198.136 0.0001850678
> head(enf_data)
mass value
1 1252.358 0.0001400532
2 1252.560 0.0001380179
3 1252.761 0.0001360147
4 1252.963 0.0001336038
5 1253.165 0.0001310146
6 1253.367 0.0001278587
> head(uti_data)
mass value
1 1209.999 9.404051e-05
2 1210.204 9.176861e-05
3 1210.409 8.892953e-05
4 1210.614 8.613961e-05
5 1210.819 8.299913e-05
6 1211.024 8.038693e-05
I need to plot something like a 3D line chart, where the z axis is the "value" column, the y axis is the "mass" column, and the x axis has one position per data frame.
I tried to plot this using the plotly package, but I'm not getting it right.
How can I do it?
EDIT: dput as requested.
structure(list(mass = c(1197.10568602095, 1197.31161534199, 1197.51756246145,
1197.72352737934, 1197.92951009569, 1198.1355106105), value = c(0.000304676093184434,
0.000279293920415841, 0.000254512541389108, 0.000230461422005283,
0.000207270028165387, 0.000185067825770437), group = c("PS",
"PS", "PS", "PS", "PS", "PS")), row.names = c(NA, 6L), class = "data.frame")
structure(list(mass = c(1252.3578527531, 1252.55956147119, 1252.76128739414,
1252.96303052216, 1253.16479085545, 1253.3665683942), value = c(0.000140053215421452,
0.000138017894050617, 0.00013601474884925, 0.000133603848925069,
0.000131014621271734, 0.000127858739055662), group = c("ENF",
"ENF", "ENF", "ENF", "ENF", "ENF")), row.names = c(NA, 6L), class = "data.frame")
structure(list(mass = c(1209.99938731277, 1210.20436650703, 1210.40936335465,
1210.61437785568, 1210.81941001019, 1211.02445981824), value = c(9.40405108642129e-05,
9.17686135352109e-05, 8.89295335433793e-05, 8.61396097238083e-05,
8.29991287322805e-05, 8.03869281229029e-05), group = c("UTI",
"UTI", "UTI", "UTI", "UTI", "UTI")), row.names = c(NA, 6L), class = "data.frame")
EDIT 2:
Got some progress using plotly:
ps_data["group"] <- "PS"
enf_data["group"] <- "ENF"
uti_data["group"] <- "UTI"
all_data <- rbind(ps_data,enf_data,uti_data)
all_long <- melt(all_data, id.vars=c("mass","group","value"))
fig <- plot_ly(all_long, x = ~group, y = ~mass, z = ~value, type = 'scatter3d', mode = 'lines',
opacity = 1, line = list(width = 6, color = ~group, reverscale = FALSE))
fig
But some strange lines appeared along the x axis and the colors are not right.
EDIT 3:
I managed to get a fairly good plot.
My data looks like this:
> head(all_data)
mass value group
1 1197.106 0.0003046761 PS
2 1197.312 0.0002792939 PS
3 1197.518 0.0002545125 PS
4 1197.724 0.0002304614 PS
5 1197.930 0.0002072700 PS
6 1198.136 0.0001850678 PS
The data frame is huge, with three groups (PS, ENF, UTI).
I can't fit all of it here, so I've included the head to show its structure.
With this data I used:
p3 <- plot_ly(all_data, x = ~group, y = ~mass, z = ~value, split = ~group, type = 'scatter3d', mode = 'lines',
line = list(width = 4))
Now I'm just trying to find a reliable way to save it as a TIFF file and to change the axis titles.
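For the axis titles: in a 3D plotly chart the axes are configured through the `scene` element of layout(), not through top-level `xaxis`/`yaxis`. A minimal sketch, assuming the column layout of `all_data` shown above; the data are only the first three rows of each data frame from the question, and the title strings ("Sample", "Mass", "Value") are placeholders. It covers only the axis titles, not the TIFF export.

```r
library(plotly)

# A few rows from each data frame in the question (stand-ins for the full data)
ps_data  <- data.frame(mass  = c(1197.106, 1197.312, 1197.518),
                       value = c(3.046761e-04, 2.792939e-04, 2.545125e-04))
enf_data <- data.frame(mass  = c(1252.358, 1252.560, 1252.761),
                       value = c(1.400532e-04, 1.380179e-04, 1.360147e-04))
uti_data <- data.frame(mass  = c(1209.999, 1210.204, 1210.409),
                       value = c(9.404051e-05, 9.176861e-05, 8.892953e-05))

# Label each data frame and stack them; no melt() step is needed
ps_data$group  <- "PS"
enf_data$group <- "ENF"
uti_data$group <- "UTI"
all_data <- rbind(ps_data, enf_data, uti_data)

# split = ~group draws a separate line (and legend entry) per data frame
p3 <- plot_ly(all_data, x = ~group, y = ~mass, z = ~value, split = ~group,
              type = "scatter3d", mode = "lines", line = list(width = 4))

# 3D axes live under `scene`; the title strings are placeholders
p3 <- layout(p3, scene = list(xaxis = list(title = "Sample"),
                              yaxis = list(title = "Mass"),
                              zaxis = list(title = "Value")))
```

Printing `p3` displays the chart with the new axis titles.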
I would like to plot each row as a separate line in the same plot, to illustrate how the average develops over time for 3 groups: All, Men and Women. However, one of the lines is not showing up, and the legend is not filled with the row names.
I'd be glad of a solution, either with matplot or with ggplot.
Thank you!
Code:
matplot(t(Market_Work), type = 'l', xaxt = 'n', xlab = "Time Period", ylab = "Average", main ="Market Work")
legend("right", legend = seq_len(nrow(Market_Work)), fill=seq_len(nrow(Market_Work)))
axis(1, at = 1:6, colnames(Market_Work))
Data:
2003-2005 2006-2008 2009-2010 2011-2013 2014-2016 2017-2018
All 31.48489 32.53664 30.41938 30.53870 31.15550 31.77960
Men 37.38654 38.16698 35.10247 35.65543 36.54855 36.72496
Women 31.48489 32.53664 30.41938 30.53870 31.15550 31.77960
> dput(Market_Work)
structure(list(`2003-2005` = c(31.4848853173555, 37.3865421137,
31.4848853173555), `2006-2008` = c(32.5366433161048, 38.1669798351148,
32.5366433161048), `2009-2010` = c(30.4193794808191, 35.1024661973137,
30.4193794808191), `2011-2013` = c(30.5387012166381, 35.6554329405739,
30.5387012166381), `2014-2016` = c(31.1555032381292, 36.5485451138792,
31.1555032381292), `2017-2018` = c(31.7795953402235, 36.7249638612854,
31.7795953402235)), row.names = c("All", "Men", "Women"), class = "data.frame")
Here is an example with ggplot2. I changed some of your data, because two rows (All and Women) were identical in your original data, so their lines coincide and only two lines are visible.
library(tidyverse)
df <- structure(list(`2003-2005` = c(31.4848853173555, 37.3865421137,
30.4848853173555), `2006-2008` = c(32.5366433161048, 38.1669798351148,
30.5366433161048), `2009-2010` = c(30.4193794808191, 35.1024661973137,
33.4193794808191), `2011-2013` = c(30.5387012166381, 35.6554329405739,
33.5387012166381), `2014-2016` = c(31.1555032381292, 36.5485451138792,
30.1555032381292), `2017-2018` = c(31.7795953402235, 36.7249638612854,
30.7795953402235)), row.names = c("All", "Men", "Women"), class = "data.frame")
df2 <- as.data.frame(t(df))
df2$Year <- rownames(df2)
df2 %>% pivot_longer(c(All, Men, Women), names_to = "Category") %>%
ggplot(aes(x = Year, y = value)) + geom_line(aes(group = Category, color = Category))
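Since the question also asks about matplot: the legend in the original code shows numbers because it is given `seq_len(nrow(Market_Work))` instead of the row names, and one line seems to be missing because the All and Women rows are identical, so their lines coincide. A minimal base-R sketch using the question's original data; `lty = 1`, `lwd = 2` and the colour vector are illustrative choices:

```r
# Data from the question (note: the All and Women rows are identical)
Market_Work <- structure(list(
  `2003-2005` = c(31.4848853173555, 37.3865421137, 31.4848853173555),
  `2006-2008` = c(32.5366433161048, 38.1669798351148, 32.5366433161048),
  `2009-2010` = c(30.4193794808191, 35.1024661973137, 30.4193794808191),
  `2011-2013` = c(30.5387012166381, 35.6554329405739, 30.5387012166381),
  `2014-2016` = c(31.1555032381292, 36.5485451138792, 31.1555032381292),
  `2017-2018` = c(31.7795953402235, 36.7249638612854, 31.7795953402235)),
  row.names = c("All", "Men", "Women"), class = "data.frame")

m <- t(as.matrix(Market_Work))   # 6 periods x 3 groups: one column per line
cols <- seq_len(ncol(m))         # one colour per group

matplot(m, type = "l", lty = 1, lwd = 2, col = cols, xaxt = "n",
        xlab = "Time Period", ylab = "Average", main = "Market Work")
axis(1, at = seq_len(nrow(m)), labels = rownames(m))
# Legend labels come from the group names, with matching colours and line type
legend("right", legend = colnames(m), col = cols, lty = 1, lwd = 2)
```

With the real data the Women line is still drawn over the All line; it only becomes distinguishable once the two rows differ.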
These are the first 10 rows of my data frame:
head(test.data,10)
# A tibble: 10 x 5
date o2.permeg co2.ppm apo o2.spike
<time> <dbl> <dbl> <dbl> <chr>
1 2015-01-01 00:00:00 -685.09 413.023 -354.1816 N
2 2015-01-01 00:02:00 -695.10 412.894 -364.8690 N
3 2015-01-01 00:04:00 -687.84 412.979 -357.1627 N
4 2015-01-01 00:06:00 -683.23 412.866 -353.1460 N
5 2015-01-01 00:08:00 -683.28 412.755 -353.7788 N
6 2015-01-01 00:10:00 -685.40 412.647 -356.4659 N
7 2015-01-01 00:12:00 -687.80 412.659 -358.8029 N
8 2015-01-01 00:14:00 -662.79 412.665 NA Y
9 2015-01-01 00:16:00 -684.17 412.762 -354.6321 N
10 2015-01-01 00:18:00 -680.37 412.720 -351.0526 N
As you can see, the last column, o2.spike, contains the characters N and Y. N means that the data point is not a spike, and Y means that it is a spike. In this sample there's only 1 Y, but in the full data frame there are many, at random positions.
I want to plot all the data points in one plot, with those marked Y in a different colour.
For your information, this is the code I currently use to plot everything. The first 3 variables are plotted in red, green, and blue, and I want the "Y" rows to be plotted in, for example, pink.
library(openair)
test.data$yr_day <- format(as.Date(test.data$date), "%Y-%m-%d")
dir.create(daily) # where "daily" is the path of the folder I want to save the plots into
for (d in unique(test.data$yr_day)) {
mypath <- file.path(daily, paste(name, d, ".png", sep = "" ))
png(filename = mypath, width = 963, height = 690)
timePlot(subset(test.data, yr_day == d),
plot.type = "p",
pollutant = c("co2.ppm", "o2.permeg", "apo"),
y.relation = "free",
date.pad = TRUE,
pch = c(19,19,19),
cex = 0.2,
xlab = paste("Time of day in hours on", d),
ylab = "CO2, O2, and APO concentrations",
name.pol = c("CO2 (ppm)", "O2 (per meg)", "APO (per meg)"),
date.breaks = 24,
date.format = "%H:%M"
)
dev.off()
}
An example plot (containing all the spikes with the same colour as the non-spike ones) is as follows:
So how do I plot the spikes in a different colour from the others? Thank you very much!
Edit:
As Sebastian asked, I have added this (not sure how you will be able to extract the data from it):
dput(head(test.data,20))
structure(list(date = structure(c(1420070400, 1420070520, 1420070640,
1420070760, 1420070880, 1420071000, 1420071120, 1420071240, 1420071360,
1420071480, 1420071600, 1420071720, 1420071840, 1420071960, 1420072080,
1420072200, 1420072320, 1420072440, 1420072560, 1420072680), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), o2.permeg = c(-685.09, -695.1, -687.84,
-683.23, -683.28, -685.4, -687.8, -662.79, -684.17, -680.37,
-684.66, -686.13, -683.27, -680.77, -682.16, -692.54, NA, NA,
NA, NA), co2.ppm = c(413.023, 412.894, 412.979, 412.866, 412.755,
412.647, 412.659, 412.665, 412.762, 412.72, 412.692, 412.71,
412.757, 412.838, 412.922, 413.019, NA, NA, NA, NA), apo = c(-354.181646778043,
-364.868973747017, -357.162673031026, -353.145990453461, -353.778806682578,
-356.465871121718, -358.802863961814, NA, -354.632052505966,
-351.052577565632, -355.489594272076, -356.86508353222, -353.75830548926,
-350.833007159904, -351.781957040573, -361.652649164678, NA,
NA, NA, NA), o2.spike = c("N", "N", "N", "N", "N", "N", "N",
"Y", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N", "N"
)), .Names = c("date", "o2.permeg", "co2.ppm", "apo", "o2.spike"
), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"
))
Unfortunately, without having data, it's not easy to answer the question.
A ggplot2 solution could be:
library(ggplot2)
g1 <- ggplot(data=test.data, aes(x=date, y=o2.permeg, col=o2.spike)) + geom_point()
g1
Mapping a column of the data frame to the "col" aesthetic in aes() gives each distinct value in that column its own colour.
It even creates a legend showing which colour belongs to which value.
I tried this with another data frame (iris, which ships with R) and it worked; I hope it will be helpful.
Edit:
To show the three variables in separate, stacked plots, you can create 3 plots with ggplot and then use the function plot_grid() provided by the "cowplot" package.
library(cowplot)
g1 <- ggplot(data=test.data, aes(x=date, y=o2.permeg, col=o2.spike)) + geom_point()
g2 <- ggplot(data=test.data, aes(x=date, y=co2.ppm, col=o2.spike)) + geom_point()
g3 <- ggplot(data=test.data, aes(x=date, y=apo, col=o2.spike)) + geom_point()
plot_grid(g1, g2, g3, nrow=3, ncol=1)
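To get the specific colours asked for (for example pink for spikes), the colour mapping can be fixed with scale_colour_manual(). A minimal sketch, assuming the first ten rows shown in the question; instead of three separate plots it reshapes the three variables into long format with tidyr::pivot_longer() and uses facet_wrap(scales = "free_y"), which plays a role similar to `y.relation = "free"` in timePlot. Note that this colours points by spike status rather than by variable; "black" and "pink" are illustrative choices.

```r
library(ggplot2)
library(tidyr)

# First ten rows of test.data from the question (2-minute time steps, GMT)
test.data <- data.frame(
  date = as.POSIXct("2015-01-01 00:00:00", tz = "GMT") +
    seq(0, by = 120, length.out = 10),
  o2.permeg = c(-685.09, -695.10, -687.84, -683.23, -683.28,
                -685.40, -687.80, -662.79, -684.17, -680.37),
  co2.ppm = c(413.023, 412.894, 412.979, 412.866, 412.755,
              412.647, 412.659, 412.665, 412.762, 412.720),
  apo = c(-354.1816, -364.8690, -357.1627, -353.1460, -353.7788,
          -356.4659, -358.8029, NA, -354.6321, -351.0526),
  o2.spike = c("N", "N", "N", "N", "N", "N", "N", "Y", "N", "N")
)

# One row per (time, variable) pair; o2.spike is carried along for colouring
long <- pivot_longer(test.data, c(co2.ppm, o2.permeg, apo),
                     names_to = "variable", values_to = "value")

g <- ggplot(long, aes(x = date, y = value, colour = o2.spike)) +
  geom_point(size = 0.8) +
  facet_wrap(~ variable, ncol = 1, scales = "free_y") +   # one panel per variable
  scale_colour_manual(values = c(N = "black", Y = "pink"))
```

When printed, ggplot2 drops the missing `apo` value for the spike row with a warning; the spike still appears in pink in the other two panels.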
I'm using the R package TraMineR. I would like to plot frequent event sequences with the seqpcplot command. I coded the states in the alphabet so that they sort alphabetically in the order I want, so when I define the sequences with the seqdef command without specifying the labels and states options, I obtain the following output:
[>] state coding:
[alphabet] [label] [long label]
1 a.sin a.sin a.sin
2 b.co0 b.co0 b.co0
3 c.co1 c.co1 c.co1
4 d.co2+ d.co2+ d.co2+
5 e.ma0 e.ma0 e.ma0
6 f.ma1 f.ma1 f.ma1
7 g.ma2+ g.ma2+ g.ma2+
8 h.sin0 h.sin0 h.sin0
9 i.lp1 i.lp1 i.lp1
10 l.lp2+ l.lp2+ l.lp2+
11 m.lp1_18 m.lp1_18 m.lp1_18
12 n.lp2_18 n.lp2_18 n.lp2_18
I then convert the state-sequence object into an event-sequence object with seqecreate. When plotting the event sequences with seqpcplot, I obtain a very nice graph in which the states on the y-axis are ordered alphabetically according to the alphabet.
However, I would like to use longer labels in the graphs, so I specified the labels and states options in the seqdef command as
lab<-c("single", "cohabNOchildren","cohab1child","cohab2+children","marrNOchildren","marr1child","marr2+children","singleNOchildren","loneMother1child","loneMother2+children","loneMother1child_over18","loneMother2+children_over18")
obtaining:
[>] state coding:
[alphabet] [label] [long label]
1 a.sin single single
2 b.co0 cohabNOchildren cohabNOchildren
3 c.co1 cohab1child cohab1child
4 d.co2+ cohab2+children cohab2+children
5 e.ma0 marrNOchildren marrNOchildren
6 f.ma1 marr1child marr1child
7 g.ma2+ marr2+children marr2+children
8 h.sin0 singleNOchildren singleNOchildren
9 i.lp1 loneMother1child loneMother1child
10 l.lp2+ loneMother2+children loneMother2+children
11 m.lp1_18 loneMother1child_over18 loneMother1child_over18
12 n.lp2_18 loneMother2+children_over18 loneMother2+children_over18
As before, I then computed the event sequences and plotted them with seqpcplot:
seqpcplot(example.seqe,
filter = list(type = "function",
value = "cumfreq",
level = 0.8),
order.align = "last",
ltype = "non-embeddable",
cex = 1.5, lwd = .9,
lcourse = "downwards")
This time, the states on the y-axis were ordered alphabetically by their labels rather than following the order of the alphabet, which is not what I wanted.
Is there a way to keep the order given by the alphabet when plotting with seqpcplot, when the labels and states options are specified and sort differently from the alphabet?
Thanks.
I agree with the other answer (use the alphabet argument of seqpcplot). As a supplement, here are a few possible solutions:
Using seqecreate and the alphabet argument in seqpcplot:
library(TraMineR)
dat <- data.frame(id = factor(c(1, 1, 1)),
timestamp = c(0, 20, 22),
event = factor(c("A", "B", "C")))
dat.seqe <- seqecreate(dat)
seqpcplot(dat.seqe, alphabet = c("C", "A", "B"))
Using seqecreate only:
dat <- data.frame(id = factor(c(1, 1, 1)),
timestamp = c(0, 20, 22),
event = factor(c("A", "B", "C"),levels = c("C", "A", "B")))
dat.seqe <- seqecreate(dat)
seqpcplot(dat.seqe)
Using seqdef (here the original categories differ from the labels to be shown on the y-axis):
dat <- data.frame(id = factor(1),
ev.0 = factor("AA", levels = c("CC", "AA", "BB")),
ev.20 = factor("BB", levels = c("CC", "AA", "BB")),
ev.22 = factor("CC", levels = c("CC", "AA", "BB")))
dat.seq <- seqdef(dat, var = 2:4, alphabet = c("CC", "AA", "BB"),
states = c("C", "A", "B"))
seqpcplot(dat.seq)
The last solution may be the one you're looking for. Hope it helps.
The alphabet argument of the seqpcplot function is there to control that order. Something like
seqpcplot(example.seqe,
alphabet = lab,
filter = list(type = "function",
value = "cumfreq",
level = 0.8),
order.align = "last",
ltype = "non-embeddable",
cex = 1.5, lwd = .9,
lcourse = "downwards")
should give you the expected plot.