Plotting different rows as different lines in R with matplot

I would like to plot different rows as different lines in the same plot to illustrate the average development of 3 groups: All, Men and Women. However, one of the lines is not drawn, and the legend does not show the row names.
I'd be glad for a solution, either in matplot or in ggplot.
Thank you!
Code:
matplot(t(Market_Work), type = 'l', xaxt = 'n', xlab = "Time Period", ylab = "Average", main ="Market Work")
legend("right", legend = seq_len(nrow(Market_Work)), fill=seq_len(nrow(Market_Work)))
axis(1, at = 1:6, colnames(Market_Work))
Data:
2003-2005 2006-2008 2009-2010 2011-2013 2014-2016 2017-2018
All 31.48489 32.53664 30.41938 30.53870 31.15550 31.77960
Men 37.38654 38.16698 35.10247 35.65543 36.54855 36.72496
Women 31.48489 32.53664 30.41938 30.53870 31.15550 31.77960
> dput(Market_Work)
structure(list(`2003-2005` = c(31.4848853173555, 37.3865421137,
31.4848853173555), `2006-2008` = c(32.5366433161048, 38.1669798351148,
32.5366433161048), `2009-2010` = c(30.4193794808191, 35.1024661973137,
30.4193794808191), `2011-2013` = c(30.5387012166381, 35.6554329405739,
30.5387012166381), `2014-2016` = c(31.1555032381292, 36.5485451138792,
31.1555032381292), `2017-2018` = c(31.7795953402235, 36.7249638612854,
31.7795953402235)), row.names = c("All", "Men", "Women"), class = "data.frame")

Here is an example with ggplot2. I changed some of your data, as two rows (All and Women) were identical in your original data, so their lines would be drawn exactly on top of each other.
library(tidyverse)
df <- structure(list(`2003-2005` = c(31.4848853173555, 37.3865421137,
30.4848853173555), `2006-2008` = c(32.5366433161048, 38.1669798351148,
30.5366433161048), `2009-2010` = c(30.4193794808191, 35.1024661973137,
33.4193794808191), `2011-2013` = c(30.5387012166381, 35.6554329405739,
33.5387012166381), `2014-2016` = c(31.1555032381292, 36.5485451138792,
30.1555032381292), `2017-2018` = c(31.7795953402235, 36.7249638612854,
30.7795953402235)), row.names = c("All", "Men", "Women"), class = "data.frame")
df2 <- as.data.frame(t(df))
df2$Year <- rownames(df2)
df2%>% pivot_longer( c(All,Men,Women), names_to = "Category") %>%
ggplot(aes(x = Year, y = value)) + geom_line(aes(group = Category, color = Category))
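For a base-graphics route, here is a minimal matplot sketch. It assumes the rows really differ: the Women values below are invented for illustration, only three of the six periods are used, and the other values are rounded from the question. The legend takes its labels from rownames() and reuses the same col and lty vectors passed to matplot, so legend entries and lines should match.

```r
# Rounded values from the question; the Women row is made up here
# so that it no longer coincides with the All row.
Market_Work <- data.frame(
  `2003-2005` = c(31.48, 37.39, 26.10),
  `2006-2008` = c(32.54, 38.17, 27.40),
  `2009-2010` = c(30.42, 35.10, 26.20),
  row.names = c("All", "Men", "Women"),
  check.names = FALSE
)

n <- nrow(Market_Work)

# matplot draws one line per column, so transpose to get one line per group
matplot(t(Market_Work), type = "l", col = seq_len(n), lty = seq_len(n),
        xaxt = "n", xlab = "Time Period", ylab = "Average", main = "Market Work")
axis(1, at = seq_len(ncol(Market_Work)), labels = colnames(Market_Work))

# Use the row names as legend labels, with the same colours and line types
legend("right", legend = rownames(Market_Work),
       col = seq_len(n), lty = seq_len(n))
```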

Related

How to edit the labels of a facet_wrap/grid if there are two variables?

In ggplot I have faceted by two variables (tau and z) but can only change the label of the first:
df<-data.frame(x=runif(1e3),y=runif(1e3),tau=rep(c("A","aBc"),each=500),z=rep(c("DDD","EEE"),each=500))
tauNames <- c(
`A` = "10% load",
`aBc` = "40% load"
)
df%>%
ggplot(aes(x=x,y=y))+
geom_point(alpha=0.4)+
xlab(label = "Time[s]")+
ylab(label = "Dose")+
facet_grid(tau~z,labeller = as_labeller(tauNames))+
ggpubr::theme_pubclean()
As you can see, I can change one of the labels but not both. Any thoughts are much appreciated.
In the documentation of ?as_labeller you can find in the examples how you get the labels for multiple faceting variables.
library(tidyverse)
df<-data.frame(x=runif(1e3),y=runif(1e3),tau=rep(c("A","aBc"),each=500),z=rep(c("DDD","EEE"),each=500))
tauNames <- c(
`A` = "10% load",
`aBc` = "40% load"
)
df%>%
ggplot(aes(x=x,y=y))+
geom_point(alpha=0.4)+
xlab(label = "Time[s]")+
ylab(label = "Dose")+
facet_grid(tau~z,labeller = labeller(tau = tauNames,
z = c("DDD" = "D", "EEE" = "E")))+
ggpubr::theme_pubclean()

Create a multiline plot from a dataset with time on one axis and genes on the other

I have a dataset with mean gene counts for each decade as shown below:
structure(list(decade_0 = c(92.500989948184, 2788.27384875413,
28.6937227408861, 1988.03831525414, 1476.83143096418), decade_1 = c(83.4606306426572,
537.725421951383, 10.2747132062782, 235.380422949258, 685.043600629146
), decade_2 = c(188.414375201462, 2091.84249935145, 17.080858894829,
649.55107199935, 1805.3484565514), decade_3 = c(43.3316024314987,
141.64396529835, 2.77851259926935, 94.7748265692319, 413.248354335235
), decade_4 = c(54.4891626582901, 451.076574268175, 12.4298374245007,
346.102609621018, 769.215535857077), decade_5 = c(85.5621750431284,
131.822699578988, 13.3130607062134, 151.002200923853, 387.727911723968
), decade_6 = c(112.860998806804, 4844.59668489898, 19.7317645111144,
2084.76584309876, 766.375852567831), decade_7 = c(73.2198969730458,
566.042952305845, 3.2457873699886, 311.853982701609, 768.801733767044
), decade_8 = c(91.8161648275608, 115.161700090147, 10.7289451320065,
181.747670625714, 549.21661120626), decade_9 = c(123.31045087146,
648.23694540667, 17.7690326882018, 430.301803845829, 677.187054208271
)), row.names = c("ANK1", "NTN4", "PTPRH", "JAG1", "PLAT"), class = "data.frame")
I would like to plot a line graph with the changes in counts over time for each of >30 genes as shown here in excel.
To do this with ggplot I have to convert it to col1: decade, col2: gene, col3: counts.
My question is, either how to convert my table into this ggplot friendly table, or if there is a better way to produce the plot with a different tool?
Thanks!
One possibility: transpose your data frame, convert rownames to columns, then gather ("make long"). Plotting is then easy.
library(tidyverse)
mydat <- structure(list(decade_0 = c(92.500989948184, 2788.27384875413,
28.6937227408861, 1988.03831525414, 1476.83143096418), decade_1 = c(83.4606306426572,
537.725421951383, 10.2747132062782, 235.380422949258, 685.043600629146
), decade_2 = c(188.414375201462, 2091.84249935145, 17.080858894829,
649.55107199935, 1805.3484565514), decade_3 = c(43.3316024314987,
141.64396529835, 2.77851259926935, 94.7748265692319, 413.248354335235
), decade_4 = c(54.4891626582901, 451.076574268175, 12.4298374245007,
346.102609621018, 769.215535857077), decade_5 = c(85.5621750431284,
131.822699578988, 13.3130607062134, 151.002200923853, 387.727911723968
), decade_6 = c(112.860998806804, 4844.59668489898, 19.7317645111144,
2084.76584309876, 766.375852567831), decade_7 = c(73.2198969730458,
566.042952305845, 3.2457873699886, 311.853982701609, 768.801733767044
), decade_8 = c(91.8161648275608, 115.161700090147, 10.7289451320065,
181.747670625714, 549.21661120626), decade_9 = c(123.31045087146,
648.23694540667, 17.7690326882018, 430.301803845829, 677.187054208271
)), row.names = c("ANK1", "NTN4", "PTPRH", "JAG1", "PLAT"), class = "data.frame")
newdat <- mydat %>% t() %>% as.data.frame() %>% tibble::rownames_to_column('decade') %>%
pivot_longer(-decade, names_to = 'gene', values_to = 'count')
ggplot(newdat) + geom_line(aes(decade, count, color = gene, group = gene))
Created on 2020-02-14 by the reprex package (v0.3.0)
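One possible refinement, sketched under the assumption that the period labels always have the form decade_<number>: converting them to integers gives a numeric x axis. This matters once there are more than ten periods, because a label such as decade_10 sorts before decade_2 as text. The small newdat below is a stand-in for the long table built above, and its decade_10 values are invented.

```r
# Stand-in for the long table; the decade_10 counts are made up
newdat <- data.frame(
  decade = rep(c("decade_0", "decade_2", "decade_10"), times = 2),
  gene   = rep(c("ANK1", "NTN4"), each = 3),
  count  = c(92.5, 188.4, 150.0, 2788.3, 2091.8, 600.0)
)

# Strip the prefix and convert to integer so the axis orders numerically
newdat$decade_num <- as.integer(sub("^decade_", "", newdat$decade))
newdat <- newdat[order(newdat$gene, newdat$decade_num), ]

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(newdat) +
    ggplot2::geom_line(ggplot2::aes(decade_num, count, color = gene))
  print(p)
}
```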

HeatMap: how to cluster only the rows and keep order of the heatmap's column labels as same as in the df?

I want to plot a heatmap and cluster only the rows (i.e. the genes in tydf1).
I also want to keep the heatmap's column labels in the same order as in the data frame (i.e. tydf1).
Sample data
df1 <- structure(list(Gene = c("AA", "PQ", "XY", "UBQ"), X_T0_R1 = c(1.46559502, 0.220140568, 0.304127515, 1.098842127), X_T0_R2 = c(1.087642983, 0.237500819, 0.319844338, 1.256624804), X_T0_R3 = c(1.424945196, 0.21066267, 0.256496284, 1.467120048), X_T1_R1 = c(1.289943948, 0.207778662, 0.277942721, 1.238400358), X_T1_R2 = c(1.376535013, 0.488774258, 0.362562315, 0.671502431), X_T1_R3 = c(1.833390311, 0.182798731, 0.332856558, 1.448757569), X_T2_R1 = c(1.450753714, 0.247576125, 0.274415259, 1.035410946), X_T2_R2 = c(1.3094609, 0.390028842, 0.352460646, 0.946426593), X_T2_R3 = c(0.5953716, 1.007079177, 1.912258811, 0.827119776), X_T3_R1 = c(0.7906009, 0.730242116, 1.235644748, 0.832287694), X_T3_R2 = c(1.215333041, 1.012914813, 1.086362205, 1.00918082), X_T3_R3 = c(1.069312467, 0.780421013, 1.002313082, 1.031761442), Y_T0_R1 = c(0.053317766, 3.316414959, 3.617213894, 0.788193798), Y_T0_R2 = c(0.506623748, 3.599442788, 1.734075583, 1.179462912), Y_T0_R3 = c(0.713670106, 2.516735845, 1.236204882, 1.075393433), Y_T1_R1 = c(0.740998252, 1.444496448, 1.077023349, 0.869258744), Y_T1_R2 = c(0.648231834, 0.097957459, 0.791438659, 0.428805547), Y_T1_R3 = c(0.780499252, 0.187840968, 0.820430227, 0.51636582), Y_T2_R1 = c(0.35344654, 1.190274584, 0.401845911, 1.223534348), Y_T2_R2 = c(0.220223951, 1.367784148, 0.362815405, 1.102117612), Y_T2_R3 = c(0.432856978, 1.403057729, 0.10802472, 1.304233845), Y_T3_R1 = c(0.234963735, 1.232129062, 0.072433381, 1.203096462), Y_T3_R2 = c(0.353770497, 0.885122768, 0.011662112, 1.188149743), Y_T3_R3 = c(0.396091395, 1.333921747, 0.192594116, 1.838029829), Z_T0_R1 = c(0.398000559, 1.286528398, 0.129147097, 1.452769794), Z_T0_R2 = c(0.384759325, 1.122251177, 0.119475721, 1.385513609), Z_T0_R3 = c(1.582230097, 0.697419716, 2.406671502, 0.477415567), Z_T1_R1 = c(1.136843842, 0.804552001, 2.13213228, 0.989075996), Z_T1_R2 = c(1.275683837, 1.227821594, 0.31900326, 0.835941568), Z_T1_R3 = c(0.963349308, 0.968589683, 1.706670339, 0.807060135), 
Z_T2_R1 = c(3.765036263, 0.477443352, 1.712841882, 0.469173869), Z_T2_R2 = c(1.901023385, 0.832736132, 2.223429427, 0.593558769), Z_T2_R3 = c(1.407713024, 0.911920317, 2.011259223, 0.692553388), Z_T3_R1 = c(0.988333629, 1.095130142, 1.648598854, 0.629915612), Z_T3_R2 = c(0.618606729, 0.497458337, 0.549147265, 1.249492088), Z_T3_R3 = c(0.429823986, 0.471389536, 0.977124788, 1.136635484)), row.names = c(NA, -4L ), class = c("data.table", "data.frame"))
Scripts used
library(dplyr)
library(stringr)
library(tidyr)
gdf1 <- gather(df1, "group", "Expression", -Gene)
gdf1$tgroup <- apply(str_split_fixed(gdf1$group, "_", 3)[, c(1, 2)],
1, paste, collapse ="_")
library(dplyr)
tydf1 <- gdf1 %>%
group_by(Gene, tgroup) %>%
summarize(expression_mean = mean(Expression)) %>%
spread(., tgroup, expression_mean)
#1 heatmap script is being used
library(tidyverse)
tydf1 <- tydf1 %>%
as.data.frame() %>%
column_to_rownames(var=colnames(tydf1)[1])
library(gplots)
library(vegan)
randup.m <- as.matrix(tydf1)
scaleRYG <- colorRampPalette(c("red","yellow","darkgreen"),
space = "rgb")(30)
data.dist <- vegdist(randup.m, method = "euclidean")
row.clus <- hclust(data.dist, "aver")
heatmap.2(randup.m, Rowv = as.dendrogram(row.clus),
dendrogram = "row", col = scaleRYG, margins = c(7,10),
density.info = "none", trace = "none", lhei = c(2,6),
colsep = 1:3, sepcolor = "black", sepwidth = c(0.001,0.0001),
xlab = "Identifier", ylab = "Rows")
#2 heatmap script is being used
df2 <- as.matrix(tydf1[, -1])
heatmap(df2)
Also, I want to add a color key.
It is still unclear to me what the desired output is. Some notes:
You don't need vegdist() to compute the distance matrix for your hclust() call: all(vegdist(randup.m, method = "euclidean") == dist(randup.m)) returns TRUE, so base dist() gives the same result;
Specifying Colv = FALSE in your heatmap.2() call will prevent reordering of the columns (the default is TRUE);
Maybe it is better to scale your data by row (see the commented-out scale = "row" line in the call below);
Your call of heatmap.2() already returns the heatmap with a color key.
So, summing up: your first script is only missing the Colv = FALSE argument, and after a little adjustment it looks like this:
heatmap.2(randup.m,
Rowv = as.dendrogram(row.clus),
Colv = FALSE,
dendrogram = "row",
#scale = "row",
col = scaleRYG,
density.info = "none",
trace = "none",
srtCol = -45,
adjCol = c(.1, .5),
xlab = "Identifier",
ylab = "Rows"
)
However I am still not sure - is it what you need?
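For the second script, which uses base heatmap(), the same idea can be sketched as follows. In stats::heatmap, Colv = NA suppresses the column dendrogram and the column reordering. The matrix below is random stand-in data with made-up column names, not the poster's tydf1.

```r
set.seed(1)

# Random stand-in for the gene-by-group matrix of means
m <- matrix(runif(16), nrow = 4,
            dimnames = list(c("AA", "PQ", "XY", "UBQ"),
                            c("X_T0", "X_T1", "Y_T0", "Y_T1")))

# Cluster the rows only, with Euclidean distance and average linkage
row.clus <- hclust(dist(m), method = "average")

# Colv = NA keeps the columns in their original order
hm <- heatmap(m, Rowv = as.dendrogram(row.clus), Colv = NA, scale = "none")

# heatmap() invisibly returns the row and column permutations it used
hm$colInd
```

Base heatmap() does not draw a color key, which is one reason to stay with heatmap.2() if the key is needed.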

How to plot multiple curves and color them as group using R ggplot

I have a data frame like this.
ID read1 read2 read3 read4 class
1 5820350 0.3791915 0.3747022 0.3729779 0.3724259 1
2 5820364 0.3758676 0.3711775 0.3695976 0.3693112 2
3 5820378 0.3885081 0.3823900 0.3804273 0.3797707 2
4 5820392 0.3779945 0.3729582 0.3714910 0.3709072 1
5 5820425 0.2954782 0.2971604 0.2973882 0.2973216 3
6 5820426 0.3376101 0.3368173 0.3360203 0.3359517 3
Each row represents one sample with four values,and the last column is the classification of this sample. I want to visualize each sample curve and set the class as the color.
I tried to reshape the data frame, but I then lost the class feature which I need.
Could you please give me some hint or show me how to do that in R?
Thanks in advance.
You are going to want to tidy your data first (shown below with tidyr::gather). Then, when you plot, you will want to set your group = ID and color = factor(class) (for discrete colors):
library(tidyr)
library(ggplot2)
df <- structure(list(ID = c(5820350L, 5820364L, 5820378L, 5820392L, 5820425L, 5820426L),
read1 = c(0.3791915, 0.3758676, 0.3885081, 0.3779945, 0.2954782, 0.3376101),
read2 = c(0.3747022, 0.3711775, 0.38239, 0.3729582, 0.2971604, 0.3368173),
read3 = c(0.3729779, 0.3695976, 0.3804273, 0.371491, 0.2973882, 0.3360203),
read4 = c(0.3724259, 0.3693112, 0.3797707, 0.3709072, 0.2973216, 0.3359517),
class = c(1L, 2L, 2L, 1L, 3L, 3L)),
.Names = c("ID", "read1", "read2", "read3", "read4", "class"),
class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
df <- gather(df, reading, value, -c(ID, class))
ggplot(df, aes(x = reading, y = value, color = factor(class))) +
geom_line(aes(group = ID))
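If losing the class column during reshaping was the sticking point, a base R sketch with reshape() shows the same idea without extra packages. Columns not listed in varying (here ID and class) are carried along into the long table. The data are a three-row, two-reading subset of the question's table, and the column names reading and value are chosen to mirror the tidyr call above.

```r
# Subset of the question's data: three samples, two readings each
df <- data.frame(
  ID    = c(5820350L, 5820364L, 5820425L),
  read1 = c(0.3791915, 0.3758676, 0.2954782),
  read2 = c(0.3747022, 0.3711775, 0.2971604),
  class = c(1L, 2L, 3L)
)

# Wide to long: one row per sample and reading; ID and class are kept
long <- reshape(df, direction = "long",
                varying = c("read1", "read2"), v.names = "value",
                timevar = "reading", times = c("read1", "read2"),
                idvar = "ID")

long
```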
Here's a function that may do what you want:
PlotMultiCurve = function(x, classes, cols = NULL, colSet = "Set1", ...) {
if(!is.factor(classes)) classes = as.factor(classes)
nClasses = length(levels(classes))
if(is.null(cols)) cols = brewer.pal(nClasses, colSet)
plot(1:ncol(x), x[1,], col = cols[classes[1]], type = "l",
ylim = range(x), xaxt = "n", ...)
axis(1, 1:ncol(x), 1:ncol(x))
for(i in 2:nrow(x)) {
par(new = T)
plot(1:ncol(x), x[i,], col = cols[classes[i]], type = "l",
ylim = range(x), axes = F, xlab = "", ylab = "")
}
}
It chooses colors automatically from the RColorBrewer package unless you provide the colors. I copied your data directly into a text file and then ran the following:
# Prepare data
require(RColorBrewer)
myData = read.table("Data.2016-05-03.txt")
x = myData[,2:5]
classes = as.factor(myData$class)
# Plot into PNG file
png("Plot.2016-05-03.png", width = 1000, height = 1000, res = 300)
par(cex = 0.8)
PlotMultiCurve(x = x, classes = classes, xlab = "Read", ylab = "Response")
dev.off()
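A shorter base-graphics variant, offered only as a sketch: matplot() draws every column of a matrix as its own line, so transposing the readings removes the need for the loop and par(new = TRUE). The three colors are arbitrary choices, not taken from RColorBrewer, and the matrix holds the question's six samples.

```r
# Readings from the question, one row per sample
x <- matrix(c(0.3791915, 0.3747022, 0.3729779, 0.3724259,
              0.3758676, 0.3711775, 0.3695976, 0.3693112,
              0.3885081, 0.3823900, 0.3804273, 0.3797707,
              0.3779945, 0.3729582, 0.3714910, 0.3709072,
              0.2954782, 0.2971604, 0.2973882, 0.2973216,
              0.3376101, 0.3368173, 0.3360203, 0.3359517),
            ncol = 4, byrow = TRUE)
classes <- factor(c(1, 2, 2, 1, 3, 3))

# One arbitrary color per class; indexing by the factor uses its level codes
cols <- c("red", "blue", "darkgreen")

# After transposing, each sample is a column and therefore one line
matplot(t(x), type = "l", lty = 1, col = cols[classes],
        xaxt = "n", xlab = "Read", ylab = "Response")
axis(1, at = seq_len(ncol(x)))
legend("right", legend = levels(classes), col = cols, lty = 1, title = "class")
```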

time series plot in R

My data looks something like this:
There are 10,000 rows, each representing a city and all months since 1998-01 to 2013-9:
RegionName| State| Metro| CountyName| 1998-01| 1998-02| 1998-03
New York| NY| New York| Queens| 1.3414| 1.344| 1.3514
Los Angeles| CA| Los Angeles| Los Angeles| 12.8841| 12.5466| 12.2737
Philadelphia| PA| Philadelphia| Philadelphia| 1.626| 0.5639| 0.2414
Phoenix| AZ| Phoenix| Maricopa| 2.7046| 2.5525| 2.3472
I want to be able to do a plot for all months since 1998 for any city or more than one city.
I tried this but I get an error. I am not sure if I am even attempting this right. Any help will be appreciated. Thank you.
forecl <- ts(forecl, start=c(1998, 1), end=c(2013, 9), frequency=12)
plot(forecl)
Error in plots(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"
You might try
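require(reshape)
require(ggplot2)
# Reshape to long format: one row per city and month
forecl <- melt(forecl, id.vars = c("region","state","city"), variable_name = "month")
# Month labels such as "1998-01" have no day, so append one before converting
forecl$month <- as.Date(paste0(forecl$month, "-01"))
ggplot(forecl, aes(x = month, y = value, color = city)) + geom_line()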
To add to @JLLagrange's answer, you might want to pass city through facet_grid() if there are too many cities and the colors are hard to distinguish.
ggplot(forecl, aes(x = month, y = value, color = city, group = city)) +
geom_line() +
facet_grid( ~ city)
Could you provide an example of your data, e.g. dput(head(forecl)), before converting to a time-series object? The problem might also be with the ts object.
In any case, I think there are two problems.
First, the data are in wide format. I'm not sure about your column names, since they should start with a letter, but in any case, the general idea would be to do something like this:
test <- structure(list(
city = structure(1:2, .Label = c("New York", "Philly"),
class = "factor"), state = structure(1:2, .Label = c("NY",
"PA"), class = "factor"), a2005.1 = c(1, 1), a2005.2 = c(2, 5
)), .Names = c("city", "state", "a2005.1", "a2005.2"), row.names = c(NA,
-2L), class = "data.frame")
test.long <- reshape(test, varying=c(3:4), direction="long")
Second, I think you are trying to plot too many cities at the same time. Try:
plot(forecl[, 1])
or
plot(forecl[, 1:5])
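The same idea can be sketched directly from the wide table: keep only the month columns, transpose so that each city becomes one series, and build the ts object from that numeric matrix. The snippet assumes the month columns are named like 1998-01 (as in the question's sample) and uses two of the sample rows. Passing plot.type = "single" draws all selected series in one panel, which avoids the 10-series limit of the default "multiple" layout.

```r
# Two rows from the question's sample, three months only
forecl <- data.frame(
  RegionName = c("New York", "Phoenix"),
  `1998-01` = c(1.3414, 2.7046),
  `1998-02` = c(1.3440, 2.5525),
  `1998-03` = c(1.3514, 2.3472),
  check.names = FALSE
)

# Pick out the month columns by their name pattern (assumed YYYY-MM)
month_cols <- grep("^[0-9]{4}-[0-9]{2}$", names(forecl), value = TRUE)

# Transpose: rows become months, columns become cities
m <- t(as.matrix(forecl[, month_cols]))
colnames(m) <- forecl$RegionName

city_ts <- ts(m, start = c(1998, 1), frequency = 12)

# All selected cities in a single panel
plot(city_ts[, c("New York", "Phoenix")], plot.type = "single",
     col = 1:2, ylab = "Value")
legend("right", legend = c("New York", "Phoenix"), col = 1:2, lty = 1)
```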
