How to customize the tooltip of ggplotly? - r

I have tried to follow different answers here but none worked. I went through the plotly official documentation and came up with following:
Data
Following is a sample of the data set:
> dput(head(df))
structure(list(ID = c(-1, -1, -1, -1, -1, -1), spacing.ft = c(0,
0, 0, 0, 0, 0), gap.s = c(0, 0, 0, 0, 0, 0), frspacing.ft = c(0,
0, 0, 0, 0, 0), TTC = c(0, 0, 0, 0, 0, 0), LV.vel.fps = c(0,
0, 0, 0, 0, 0), x = c(0, 0, 0, 0, 0, 0), y = c(0, 0, 0, 0, 0,
0), z = c(0, 0, 0, 0, 0, 0), frames = 29373:29378, df16 = c(6L,
6L, 6L, 6L, 6L, 6L), ADO.name = structure(c(NA_integer_, NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_), .Label = c("BlueT5",
"ghtTFrei10", "ilT6Carg", "owT8Yell", "CargoT4", "MoveT12", "RaceT11",
"RedT1", "SemiT3", "StarT7", "WhiteT2", "artTWalm9"), class = "factor"),
speed.fps.ED = c(33.25, 33.4, 33.55, 33.7, 33.84, 33.99),
deltaV.fps = c(33.25, 33.4, 33.55, 33.7, 33.84, 33.99)), .Names = c("ID",
"spacing.ft", "gap.s", "frspacing.ft", "TTC", "LV.vel.fps", "x",
"y", "z", "frames", "df16", "ADO.name", "speed.fps.ED", "deltaV.fps"
), row.names = c(NA, 6L), class = "data.frame")
What I want to do:
I want to customize the tooltip to add speed, speed.fps.ED. I tried following:
library(ggplot2)
library(plotly)
mt.plot <- ggplot() +
geom_point(data = df,
mapping = aes(x = deltaV.fps, y = frspacing.ft, color = ADO.name))
# Build the ggplot:
p <- plotly_build(mt.plot)
# Change the tooltip:
p$data[[1]]$text <- paste("ED.speed = ", df$speed.fps.ED)
p$filename <- 'test'
r <- plotly_POST(p)
knit_print.plotly(r, options=list())
You can see the resulting plot here: Plot.
Problem
The problem is that the third element in the tooltip is displayed only for 1 ADO.name i.e. BlueT5. I want it to be visible for all ADO.names. What is the problem here?

You can add speed.fps.ED to the ggplot aesthetic, as in:
geom_point(data = df,
aes(x = deltaV.fps, y = frspacing.ft, color = ADO.name, label = speed.fps.ED))
See also: how to choose variable to display in tooltip when using ggplotly?

Related

Is there a way to combine mosaic plots using different data sets so that they are outputted next to each other?

I am trying to combine 20 mosaic plots onto one output. par(mfrow=...) is not working. I would like the rows to have 3 plots in each row.
Here is code for just 4 of the plots that I am using:
library(vcd)
library(vcdExtra)
library(MASS)
All <- matrix(c(599,250,39,24, 157,238,89,40, 26,51,51,45, 26,26,30,57), 4, 4)
dimnames(All) <- list("2002" =c("Never","Light","Moderate","Heavy"), "2014" =c("Never","Light","Moderate","Heavy"))
All <- as.table(All)
Poor <- matrix(c(184,57,7,6, 51,43,12,6, 9,10,6,6, 9,5,9,11), 4, 4)
dimnames(Poor) <- list("2002"=c("Never","Light","Moderate","Heavy"), "2014" =c("Never","Light","Moderate","Heavy"))
Poor <- as.table(Poor)
NonPoor <- matrix(c(376,180,30,18, 94,192,77,34, 12,40,43,39, 15,19,21,41), 4, 4)
dimnames(NonPoor) <- list("2002"=c("Never","Light","Moderate","Heavy"), "2014" =c("Never","Light","Moderate","Heavy"))
NonPoor <- as.table(NonPoor)
Black <- matrix(c(239,82,7,6, 54,56,15,9, 8,12,9,5, 12,11,8,8), 4, 4)
dimnames(Black) <- list("2002"=c("Never","Light","Moderate","Heavy"), "2014" =c("Never","Light","Moderate","Heavy"))
Black <- as.table(Black)
mosaic(All, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5))
mosaic(Poor, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5))
mosaic(NonPoor, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5))
mosaic(Black, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5))
I did something similar earlier with bar plots and it worked just using the par(mfrow=...). Thank you in advance!
Its a little bit tricky with VCD graphics
try this approach
library(gridExtra)
fig1<-grid.grabExpr(mosaic(All, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5)))
fig2<-grid.grabExpr(mosaic(Poor, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5)))
fig3<-grid.grabExpr(mosaic(NonPoor, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5)))
fig4<-grid.grabExpr(mosaic(Black, shade = TRUE, rot_labels = c(top = 90, left=0),just_labels = c("left","right","right","right"),offset_labels = c(-.4, 0, 0, 0),offset_varnames = c(1.5, 0, 0, 1.5)))
grid.arrange(fig1,fig2,fig3,fig4)

mean for subsets using survey

I have two variables "c" and "q" in a data.frame. "c" is a number between zero and one (a level of poverty) and "q" indicate if the household (or subject) is poor with 1 or non-poverty with zero.
How can I calculate the mean of "c" only of the poor households (q=1).
What I need
Important detail: I have a database for a coutry and I want this result for regions.
I am using the svyby like this:
svyby( ~q , ~region , design = base2015_pos , na.rm=TRUE, svytotal)
so in that way the R give me the number of poor by region and I don't need this now. I need the mean of a subset (see image above) by region.
structure(list(domicilio = c(11000015001, 11000015003, 11000015004), agua = c(0, 0, 6), ind_agua = c(0, 0, 1), esgoto = c(1, 1, 6), ind_cond_sanitaria = c(1, 1, 1), lixo = c(0, 0, 0), ind_lixo = c(0, 0, 0), luz = c(0, 0, 0), ind_iluminacao = c(0, 0, 0), ativos = c(0, 0, 0), ind_ativos = c(0, 0, 0), emprego = c(0, 0, 0), ind_emprego = c(0, 0, 0), renda = c(0, 0, 0), ind_renda = c(0, 0, 0), casa = c(1, 1, 0), ind_riqueza = c(1, 1, 0), anos = c(0, 0, 0), ind_estudo = c(0, 0, 0), ler = c(0, 0, 0), ind_alfabetizado = c(0, 0, 0), peso = c(270, 270, 270), sexo = c(0, 1, 1), uf = c("11", "11", "11"), v4609 = c("001772940", "001772940", "001772940"), v4617 = c(110001, 110001, 110001), v4618 = c(1, 1, 1), pre_wgt = c(200, 200, 200), one = c(1L,
1L, 1L), region = c("1", "1", "1"), c = c(0.2, 0.2, 0.2), q = c(0, 0, 0)), .Names = c("domicilio", "agua", "ind_agua", "esgoto", "ind_cond_sanitaria", "lixo", "ind_lixo", "luz", "ind_iluminacao","ativos", "ind_ativos", "emprego", "ind_emprego", "renda", "ind_renda", "casa", "ind_riqueza", "anos", "ind_estudo", "ler", "ind_alfabetizado","peso", "sexo", "uf", "v4609", "v4617", "v4618", "pre_wgt", "one", "region", "c", "q"), row.names = c(NA, 3L), class = "data.frame")
# complex sample survey design
library(survey)
# your data.frame
x <- structure(list(domicilio = c(11000015001, 11000015003, 11000015004), agua = c(0, 0, 6), ind_agua = c(0, 0, 1), esgoto = c(1, 1, 6), ind_cond_sanitaria = c(1, 1, 1), lixo = c(0, 0, 0), ind_lixo = c(0, 0, 0), luz = c(0, 0, 0), ind_iluminacao = c(0, 0, 0), ativos = c(0, 0, 0), ind_ativos = c(0, 0, 0), emprego = c(0, 0, 0), ind_emprego = c(0, 0, 0), renda = c(0, 0, 0), ind_renda = c(0, 0, 0), casa = c(1, 1, 0), ind_riqueza = c(1, 1, 0), anos = c(0, 0, 0), ind_estudo = c(0, 0, 0), ler = c(0, 0, 0), ind_alfabetizado = c(0, 0, 0), peso = c(270, 270, 270), sexo = c(0, 1, 1), uf = c("11", "11", "11"), v4609 = c("001772940", "001772940", "001772940"), v4617 = c(110001, 110001, 110001), v4618 = c(1, 1, 1), pre_wgt = c(200, 200, 200), one = c(1L,
1L, 1L), region = c("1", "1", "1"), c = c(0.2, 0.2, 0.2), q = c(0, 0, 0)), .Names = c("domicilio", "agua", "ind_agua", "esgoto", "ind_cond_sanitaria", "lixo", "ind_lixo", "luz", "ind_iluminacao","ativos", "ind_ativos", "emprego", "ind_emprego", "renda", "ind_renda", "casa", "ind_riqueza", "anos", "ind_estudo", "ler", "ind_alfabetizado","peso", "sexo", "uf", "v4609", "v4617", "v4618", "pre_wgt", "one", "region", "c", "q"), row.names = c(NA, 3L), class = "data.frame")
# your survey.design (this is not the correct svydesign statement, please follow the directions specific to your data set)
y <- svydesign( ~ 1 , data = x , weights = ~ pre_wgt )
# your desired subset
z <- subset( y , q == 1 )
# your desired mean
svyby( ~ c , ~ region , z , svymean )
aggregate(df$c, by=list(df$q), FUN=mean)
Here's another possibility. To illustrate, create a dataset per your parameters:
set.seed(787)
dat.a <-runif(n=10,min=0,max=1)
dat.b <-rbinom(n=10, size=1, prob=0.5)
dat.1 <-data.frame(matrix(c(dat.a, dat.b), ncol=2, nrow=10))
colnames(dat.1) <-c("c","q")
dat.1
c q
1 0.35326234 1
2 0.45277055 0
3 0.29505270 0
4 0.78723105 1
5 0.95915348 1
6 0.17505284 0
7 0.79693672 0
8 0.01648420 1
9 0.02706417 0
10 0.93996311 1
Now subset by extracting all rows that match q=1 and compute mean for column c in resulting output:
dat.1.subset <-dat.1[dat.1$q==1,]
mean(dat.1.subset$c)
[1] 0.6112188

How can I plot ggplot2 columns in a loop?

I have a dataframe with lat, long and dozens of other columns. Basically, I want to write a loop where X=Long,Y=Lat remains constant, but the color changes in every loop. The color is basically the other columns, one for every plot. How can I do this?
library(maps)
library(ggplot2)
library(RColorBrewer)
library(reshape)
usamap <- ggplot2::map_data("state")
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
simplefun<-function(colname){
ggplot()+
geom_polygon( data=usamap, aes(x=long, y=lat, group=group),colour="black",fill="white")+
geom_point(data=stat,aes_string(x=stat$longitude,y=stat$latitude,color=colname))+
scale_colour_gradientn(name="name",colours = myPalette(10))+
xlab('Longitude')+
ylab('Latitude')+
coord_map(projection = "mercator")+
theme_bw()+
theme(line = element_blank())+
theme(legend.position = c(.93,.20),panel.grid.major = element_line(colour = "#808080"))+
ggsave(paste0(colname,".png"),width=10, height=8,dpi=300)
}
colname<-names(stat[4:16])
lapply(colname,simplefun)
dput(droplevels(stat))
structure(list(siteId = structure(1:16, .Label = c("US1NYAB0001",
"US1NYAB0006", "US1NYAB0010", "US1NYAB0021", "US1NYAB0023", "US1NYAB0028",
"US1NYAB0032", "US1NYAL0002", "US1NYBM0004", "US1NYBM0007", "US1NYBM0011",
"US1NYBM0014", "US1NYBM0021", "US1NYBM0024", "US1NYBM0032", "US1NYBM0034"
), class = "factor"), latitude = c(42.667, 42.7198, 42.5455,
42.6918, 42.6602, 42.7243, 42.5754, 42.2705, 42.0296, 42.0493,
42.0735, 42.3084, 42.0099, 42.1098, 42.1415, 42.0826), longitude = c(-74.0509,
-73.9304, -74.1475, -73.8311, -73.8103, -73.757, -73.7995, -77.9419,
-76.0213, -76.0288, -75.9296, -75.9569, -75.5142, -75.8858, -75.889,
-75.9912), no = c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L), min_obs = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), min_mod = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0), avg_obs = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.15,
0, 0, 0, 0, 0, 0), avg_mod = c(3136.8388671875, 2997.28173828125,
3258.61840820312, 2970.74340820312, 2992.9765625, 0, 3075.54443359375,
2701.03662109375, 2974.23413085938, 2967.5029296875, 3004.57861328125,
2965.07470703125, 3260.25463867188, 3028.55590820312, 2981.8876953125,
0), max_obs = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0,
0), max_mod = c(6273.677734375, 5994.5634765625, 6517.23681640625,
5941.48681640625, 5985.953125, 0, 6151.0888671875, 5402.0732421875,
5948.46826171875, 5935.005859375, 6009.1572265625, 5930.1494140625,
6520.50927734375, 6057.11181640625, 5963.775390625, 0), mean_bias = c(0,
0, 0, 0, 0, NaN, 0, 0, 0, 5.05475490855863e-05, 0, 0, 0, 0, 0,
NaN), corr_coef = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, 1, NA,
NA, NA, NA, NA, NA), additive_bias = c(-6273.677734375, -5994.5634765625,
-6517.23681640625, -5941.48681640625, -5985.953125, 0, -6151.0888671875,
-5402.0732421875, -5948.46826171875, -5934.705859375, -6009.1572265625,
-5930.1494140625, -6520.50927734375, -6057.11181640625, -5963.775390625,
0), mean_error = c(-3136.8388671875, -2997.28173828125, -3258.61840820312,
-2970.74340820312, -2992.9765625, 0, -3075.54443359375, -2701.03662109375,
-2974.23413085938, -2967.3529296875, -3004.57861328125, -2965.07470703125,
-3260.25463867188, -3028.55590820312, -2981.8876953125, 0), mean_abs_error = c(3136.8388671875,
2997.28173828125, 3258.61840820312, 2970.74340820312, 2992.9765625,
0, 3075.54443359375, 2701.03662109375, 2974.23413085938, 2967.3529296875,
3004.57861328125, 2965.07470703125, 3260.25463867188, 3028.55590820312,
2981.8876953125, 0), rmse = c(4436.16006895562, 4238.79648453055,
4608.38234747949, 4201.26561821133, 4232.7080465523, 0, 4349.47664966936,
3819.84262201718, 4206.20224553428, 4196.4707575116, 4249.11582411849,
4193.24886413303, 4610.69632679956, 4283.02483978603, 4217.02602018439,
0)), .Names = c("siteId", "latitude", "longitude", "no", "min_obs",
"min_mod", "avg_obs", "avg_mod", "max_obs", "max_mod", "mean_bias",
"corr_coef", "additive_bias", "mean_error", "mean_abs_error",
"rmse"), row.names = c(NA, -16L), class = "data.frame")
I had the same problem and I solve it like this:
Let's assume ggplotdata in my code is like your dataframe (with more than two columns) in the second post. (dput(droplevels(stat))?)
library(reshape) # package for melting data
shapes <- 1:ncol(ggplot_data) # number of shapes
ggplot_data <- melt(ggplot_data, id = "X1") # melt data together
p1 <- ggplot(ggplot_data, aes(X1,value))
p1 <- p1 +aes(shape = factor(variable))+ # different shapes
geom_point(aes(colour = factor(variable)))+ # different colors
scale_shape_manual(labels=colname, values = shapes)+ # same for the legend
scale_color_manual(labels=colname, values = mypalette) # same for legend

Using aggregate on list in R

I have a list (lst3, subset below) and would like to do some calculations on it, e.g.:
lst4 <-lapply(lst3, function(x) aggregate(x[,5:ncol(x)], x[c(4)], FUN = mean)) #column means
lst5<-lapply(lst4,function(x) apply(x[,-c(1)],1,mean)) # get row mean
However, I am unable to get row mean without ignoring "Site".
I would like my final list to look like this:
lst5<-
[[1]]
Site x
G116 1.864233
[[2]]
Site x
GG16 2.064567
The essence is that the final list should have the above structure so that I can write my data to working directory using:
lapply(lst5,function(x)write.table(x,file=paste(getwd(),"summer",paste0(unique(x$Site),".csv"),
sep="/"),row.names=FALSE,quote=FALSE)) ### create a folder called "summer" and write files to directory###
Thanks,
AZ.
list(structure(list(Year = c(2005L, 2005L, 2005L), Month = c(8L,
8L, 8L), Day = 29:31, Site = structure(c(1L, 1L, 1L), .Label = "G116", class = "factor"),
Sim001 = c(8.4, 17.72, 6.03), Sim002 = c(0.27, 0, 0), Sim003 = c(2.83,
0.14, 0.1), Sim004 = c(0, 0, 0), Sim005 = c(0, 0.77, 0.28
), Sim006 = c(0, 0, 0), Sim007 = c(0, 0, 0), Sim008 = c(10.94,
4.77, 0), Sim009 = c(0, 0, 0), Sim010 = c(3.43, 2.74, 0.65
), Sim011 = c(0.36, 0, 2.75), Sim012 = c(26.91, 0, 2.16),
Sim013 = c(0.88, 1.33, 0.87), Sim014 = c(0, 0.86, 9.42),
Sim015 = c(0, 0.17, 1.15), Sim016 = c(0, 0, 0), Sim017 = c(0.13,
0, 0), Sim018 = c(0, 0, 6.72), Sim019 = c(8.45, 12.99, 23.72
), Sim020 = c(1.76, 0, 0), Sim021 = c(0, 0, 2.34), Sim022 = c(0,
0, 0), Sim023 = c(1.2, 0, 0.26), Sim024 = c(0.85, 0, 0),
Sim025 = c(0, 0, 0), Sim026 = c(2.05, 0.76, 5.03), Sim027 = c(0.78,
0, 0), Sim028 = c(1.2, 0, 0), Sim029 = c(22, 0.19, 0), Sim030 = c(0.12,
0, 0), Sim031 = c(3.1, 13.67, 0), Sim032 = c(0, 0, 17.88),
Sim033 = c(0, 0, 0), Sim034 = c(1.11, 0, 0), Sim035 = c(1.17,
1.41, 23.35), Sim036 = c(0, 0.48, 1.71), Sim037 = c(1.51,
11.1, 7.98), Sim038 = c(0, 0, 0), Sim039 = c(0, 0, 5.46),
Sim040 = c(5.21, 0, 0), Sim041 = c(0.1, 0.11, 0), Sim042 = c(0,
0.15, 5.23), Sim043 = c(0, 0, 0), Sim044 = c(0, 0.1, 0),
Sim045 = c(0, 0, 0), Sim046 = c(0, 0, 0), Sim047 = c(0, 0,
0.11), Sim048 = c(0, 0, 0), Sim049 = c(0, 0, 4.05), Sim050 = c(0,
0, 0), Sim051 = c(0, 0.12, 0), Sim052 = c(0.24, 2.58, 0),
Sim053 = c(3.63, 0, 0.17), Sim054 = c(10.94, 2.69, 0), Sim055 = c(0,
0, 0), Sim056 = c(0.24, 0.44, 8.27), Sim057 = c(0, 0, 0),
Sim058 = c(0, 0, 3.75), Sim059 = c(0.19, 11.06, 0), Sim060 = c(0,
0, 1.65), Sim061 = c(0, 4.95, 0), Sim062 = c(0.15, 0, 4.73
), Sim063 = c(2.99, 0.12, 1.28), Sim064 = c(0, 0, 0), Sim065 = c(0,
0, 0), Sim066 = c(0, 0, 0), Sim067 = c(0.11, 0.62, 0.56),
Sim068 = c(2.84, 0, 0), Sim069 = c(0, 0, 0), Sim070 = c(17.91,
0.11, 4.78), Sim071 = c(0, 0, 1.68), Sim072 = c(0, 0, 1.38
), Sim073 = c(1.68, 0, 0), Sim074 = c(0.53, 0, 2.87), Sim075 = c(0,
0, 0), Sim076 = c(2.58, 0.27, 0.11), Sim077 = c(0, 0, 0),
Sim078 = c(9.07, 3.13, 8.62), Sim079 = c(0.98, 0, 2.38),
Sim080 = c(3.4, 0, 0), Sim081 = c(0, 0, 4.57), Sim082 = c(1.87,
2.86, 0), Sim083 = c(21.76, 2.24, 0), Sim084 = c(0.45, 4.03,
0.39), Sim085 = c(0, 0, 0), Sim086 = c(0, 0, 0), Sim087 = c(0,
0, 17.12), Sim088 = c(5.05, 0, 0), Sim089 = c(0, 0, 1.4),
Sim090 = c(0.1, 0, 0), Sim091 = c(1.96, 0, 1.38), Sim092 = c(0,
0, 0), Sim093 = c(0, 0, 0), Sim094 = c(0, 0, 1.81), Sim095 = c(2.72,
7.16, 1.7), Sim096 = c(6.37, 0, 0), Sim097 = c(0, 1.12, 25.7
), Sim098 = c(0, 0, 0), Sim099 = c(0, 0, 0), Sim100 = c(6.77,
10.87, 2.6)), .Names = c("Year", "Month", "Day", "Site",
"Sim001", "Sim002", "Sim003", "Sim004", "Sim005", "Sim006", "Sim007",
"Sim008", "Sim009", "Sim010", "Sim011", "Sim012", "Sim013", "Sim014",
"Sim015", "Sim016", "Sim017", "Sim018", "Sim019", "Sim020", "Sim021",
"Sim022", "Sim023", "Sim024", "Sim025", "Sim026", "Sim027", "Sim028",
"Sim029", "Sim030", "Sim031", "Sim032", "Sim033", "Sim034", "Sim035",
"Sim036", "Sim037", "Sim038", "Sim039", "Sim040", "Sim041", "Sim042",
"Sim043", "Sim044", "Sim045", "Sim046", "Sim047", "Sim048", "Sim049",
"Sim050", "Sim051", "Sim052", "Sim053", "Sim054", "Sim055", "Sim056",
"Sim057", "Sim058", "Sim059", "Sim060", "Sim061", "Sim062", "Sim063",
"Sim064", "Sim065", "Sim066", "Sim067", "Sim068", "Sim069", "Sim070",
"Sim071", "Sim072", "Sim073", "Sim074", "Sim075", "Sim076", "Sim077",
"Sim078", "Sim079", "Sim080", "Sim081", "Sim082", "Sim083", "Sim084",
"Sim085", "Sim086", "Sim087", "Sim088", "Sim089", "Sim090", "Sim091",
"Sim092", "Sim093", "Sim094", "Sim095", "Sim096", "Sim097", "Sim098",
"Sim099", "Sim100"), row.names = 15947:15949, class = "data.frame"),
structure(list(Year = c(2005L, 2005L, 2005L), Month = c(8L,
8L, 8L), Day = 29:31, Site = structure(c(1L, 1L, 1L), .Label = "GG16", class = "factor"),
Sim001 = c(18.36, 0.33, 0.14), Sim002 = c(0, 10.92, 0
), Sim003 = c(0, 0, 0), Sim004 = c(0, 0, 1.7), Sim005 = c(0,
0, 0), Sim006 = c(0.91, 4.24, 0), Sim007 = c(0, 0, 0.22
), Sim008 = c(0.63, 2.9, 2.24), Sim009 = c(0, 0, 0),
Sim010 = c(0, 0, 6.91), Sim011 = c(0, 3.28, 10.18), Sim012 = c(8.39,
14.58, 45.62), Sim013 = c(2.87, 0.53, 0.11), Sim014 = c(9.15,
21.1, 0.66), Sim015 = c(0, 1.75, 2.2), Sim016 = c(0,
7.86, 0), Sim017 = c(0, 0, 0), Sim018 = c(0, 0, 0), Sim019 = c(0,
0, 0), Sim020 = c(0.39, 0, 0), Sim021 = c(0.13, 0, 1.05
), Sim022 = c(0, 0, 10.91), Sim023 = c(0.23, 0, 0), Sim024 = c(0.12,
0.83, 5.35), Sim025 = c(0, 0, 0), Sim026 = c(7.75, 0,
4.82), Sim027 = c(20.04, 0, 0), Sim028 = c(12.41, 0,
5.3), Sim029 = c(0, 0, 0), Sim030 = c(0, 0, 0), Sim031 = c(0,
8.06, 0), Sim032 = c(0, 0, 0), Sim033 = c(0, 0, 0), Sim034 = c(0.1,
0, 3.34), Sim035 = c(0, 4.34, 3.53), Sim036 = c(2.89,
0.27, 0), Sim037 = c(0, 0, 0), Sim038 = c(0, 0, 0), Sim039 = c(0,
0.11, 0), Sim040 = c(9.83, 1.55, 9.09), Sim041 = c(3.6,
0, 0), Sim042 = c(0, 0, 1.37), Sim043 = c(0, 0, 0), Sim044 = c(0,
0, 0), Sim045 = c(0, 0, 0), Sim046 = c(0, 0, 0), Sim047 = c(0,
20.52, 0.65), Sim048 = c(1.77, 0.67, 0), Sim049 = c(0,
0, 0), Sim050 = c(0, 0, 0), Sim051 = c(0, 4.9, 0), Sim052 = c(0.71,
11.34, 0), Sim053 = c(3.46, 2.59, 1.5), Sim054 = c(0,
23.63, 0), Sim055 = c(0, 16.48, 4.99), Sim056 = c(0,
0, 0), Sim057 = c(0, 0, 0), Sim058 = c(0, 0, 0), Sim059 = c(0,
0, 0), Sim060 = c(16.87, 0, 0), Sim061 = c(0, 3.43, 0
), Sim062 = c(0.45, 0, 0), Sim063 = c(0, 11.14, 7.22),
Sim064 = c(0, 0, 0), Sim065 = c(0, 0, 0), Sim066 = c(0,
16.08, 1.87), Sim067 = c(0, 0, 0), Sim068 = c(5.16, 0.88,
0.1), Sim069 = c(0, 0, 3.91), Sim070 = c(0, 0, 0), Sim071 = c(0.17,
0, 5.22), Sim072 = c(0, 0, 6.95), Sim073 = c(0, 0, 0),
Sim074 = c(0.14, 0, 0), Sim075 = c(0, 0, 0), Sim076 = c(0,
9.62, 0), Sim077 = c(0, 0, 0), Sim078 = c(1.65, 0, 0),
Sim079 = c(0.23, 8.41, 0.28), Sim080 = c(0.78, 0, 0),
Sim081 = c(0, 0, 0), Sim082 = c(0.11, 2.75, 0), Sim083 = c(0.26,
7.34, 5.92), Sim084 = c(0, 0, 4.27), Sim085 = c(0, 0,
0), Sim086 = c(0, 0, 0.1), Sim087 = c(27.18, 0.72, 28.29
), Sim088 = c(0, 0, 4.2), Sim089 = c(0, 9.37, 6.59),
Sim090 = c(0.21, 2.57, 0), Sim091 = c(0.45, 0, 0), Sim092 = c(0,
4.97, 0), Sim093 = c(1.43, 0, 0), Sim094 = c(0, 0, 2.15
), Sim095 = c(6, 0, 1.63), Sim096 = c(7.21, 0, 0), Sim097 = c(0,
0.39, 1.92), Sim098 = c(0, 0, 0), Sim099 = c(4.38, 0,
0), Sim100 = c(0, 0, 0)), .Names = c("Year", "Month",
"Day", "Site", "Sim001", "Sim002", "Sim003", "Sim004", "Sim005",
"Sim006", "Sim007", "Sim008", "Sim009", "Sim010", "Sim011",
"Sim012", "Sim013", "Sim014", "Sim015", "Sim016", "Sim017",
"Sim018", "Sim019", "Sim020", "Sim021", "Sim022", "Sim023",
"Sim024", "Sim025", "Sim026", "Sim027", "Sim028", "Sim029",
"Sim030", "Sim031", "Sim032", "Sim033", "Sim034", "Sim035",
"Sim036", "Sim037", "Sim038", "Sim039", "Sim040", "Sim041",
"Sim042", "Sim043", "Sim044", "Sim045", "Sim046", "Sim047",
"Sim048", "Sim049", "Sim050", "Sim051", "Sim052", "Sim053",
"Sim054", "Sim055", "Sim056", "Sim057", "Sim058", "Sim059",
"Sim060", "Sim061", "Sim062", "Sim063", "Sim064", "Sim065",
"Sim066", "Sim067", "Sim068", "Sim069", "Sim070", "Sim071",
"Sim072", "Sim073", "Sim074", "Sim075", "Sim076", "Sim077",
"Sim078", "Sim079", "Sim080", "Sim081", "Sim082", "Sim083",
"Sim084", "Sim085", "Sim086", "Sim087", "Sim088", "Sim089",
"Sim090", "Sim091", "Sim092", "Sim093", "Sim094", "Sim095",
"Sim096", "Sim097", "Sim098", "Sim099", "Sim100"), row.names = 15947:15949, class = "data.frame"))
You can go from lst3 directly to lst5 without the intermediate aggregate step:
lapply(lst3, function(df){
data.frame(Site = df$Site[1], x = mean(unlist(df[-c(1:4)])))
})
#[[1]]
# Site x
#1 G116 1.864233
#
#[[2]]
# Site x
#1 GG16 2.064567
Since you're calculating the mean of all columns except the first 4 columns and over all the rows of the other columns, it's quite easy to unlist the data, creating a single vector, and then using standard mean on it. Also, by skipping the lst4 step, this most likely be noticeably faster.
Or, as commented by Richard, a variation could be:
lapply(lst3, function(df){
data.frame(Site = df$Site[1], x = mean(colMeans(df[-c(1:4)])))
})
Benchmark:
library(microbenchmark)
microbenchmark(
f1 = {lapply(lst3, function(df){
data.frame(Site = df$Site[1], x = mean(unlist(df[-c(1:4)])))
})},
f2 = {lapply(lst3, function(df){
data.frame(Site = df$Site[1], x = mean(colMeans(df[-c(1:4)])))
})},
unit = "relative"
)
Unit: relative
expr min lq median uq max neval
f1 1.00000 1.000000 1.000000 1.000000 1.000000 100
f2 2.91545 2.937272 2.927799 2.894704 3.486007 100
Here's another option for your consideration:
library(reshape2)
x <- melt(lst3)
aggregate(value ~ Site, x[grepl("^Sim.*", x$variable),], FUN = mean)
# Site value
#1 G116 1.864233
#2 GG16 2.064567
Or the same concept but using dplyr:
library(dplyr)
filter(x, grepl("^Sim.*", variable)) %>% group_by(Site) %>% summarise(x = mean(value))
#Source: local data frame [2 x 2]
#
# Site x
#1 G116 1.864233
#2 GG16 2.064567
Of course, this could also be done using data.table, for example like this (there are probably several even slightly more efficient ways to do this in data.table):
library(data.table)
setDT(x)[grepl("^Sim.*", variable), list(x = mean(value)), by = Site]
# Site x
#1: G116 1.864233
#2: GG16 2.064567

How to make the graph easier to read ? Editing plot

I would like to ask you for the suggestions how I can edit my plot function to make my graph more clear ?
Here I show you the code which I use for plotting:
# open the pdf file
pdf(file='LSF1_PWD_GWD.pdf')
a <- c('LSF1', 'PWD', 'GWD')
rowsToPlot<-c(1066,2269,109)
matplot(as.matrix(t(tbl_alles[rowsToPlot,])),type=rep("l", length(rowsToPlot)), col=rainbow(length(rowsToPlot)),xlab = 'Fraction Size', ylab = 'Intensity')
legend('topright',a,lty=1, bty='n', cex=.75, col = rainbow(length(rowsToPlot)))
# close the pdf file
dev.off()
and that's how the graph looks like:
It's just a basic plot because I have no idea how to edit it. The arrow indicates three lines on one position which you can't see because they overlap... and that's the most important part of this graph for me. Maybe I shouldn't use dotted line ? How to change it ?
Data:
tbl_alles <-
structure(list("10" = c(0, 0, 0, 0, 0, 0),
"20" = c(0, 0, 0, 0, 0, 0),
"52.5" = c(0, 0, 0, 0, 0, 0),
"81" = c(0, 0, 1, 0, 0, 0),
"110" = c(0, 0, 0, 0, 0, 0),
"140.5" = c(0, 0, 0, 0, 0, 0),
"189" = c(0, 0, 0, 0, 0, 0),
"222.5" = c(0, 0, 0, 0, 0, 0 ),
"278" = c(0, 0, 0, 0, 0, 0),
"340" = c(0, 0, 0, 0, 0, 0),
"397" = c(0, 1, 0, 0, 0, 0),
"453.5" = c(0, 0.66069369, 0, 0, 0, 1),
"529" = c(0, 0.521435654, 0, 0, 1, 0),
"580" = c(0, 0.437291195, 0, 0, 1, 0),
"630.5" = c(0, 0.52204783, 0, 0, 0, 0),
"683.5" = c(0, 0.52429838, 0, 0, 0, 0),
"735.5" = c(1, 0.3768651, 0, 1, 0, 0),
"784" = c(0, 0, 0, 0, 0, 0),
"832" = c(0, 0, 0, 0, 0, 0),
"882.5" = c(0, 0, 0, 0, 0, 0),
"926.5" = c(0, 0, 0, 0, 0, 0),
"973" = c(0, 0, 0, 0, 0, 0),
"1108" = c(0, 0, 0, 0, 0, 0),
"1200" = c(0, 0, 0, 0, 0, 0)),
.Names = c("10", "20", "52.5", "81",
"110", "140.5","189", "222.5",
"278", "340", "397", "453.5",
"529", "580", "630.5", "683.5",
"735.5", "784", "832", "882.5",
"926.5", "973", "1108", "1200"),
row.names = c("at1g01050.1", "at1g01080.1",
"at1g01090.1","at1g01220.1",
"at1g01420.1", "at1g01470.1"),
class = "data.frame")
RowsToPlot:
> dput(tbl_alles[rowsToPlot,])
structure(list(`10` = c(0, 0, 0), `20` = c(0, 0, 0), `52.5` = c(0,
0, 0), `81` = c(0, 0, 0), `110` = c(0, 0, 0), `140.5` = c(0,
0, 0), `189` = c(0, 0, 0), `222.5` = c(0, 0, 0), `278` = c(0,
0, 0), `340` = c(0, 0, 0), `397` = c(0, 0, 0), `453.5` = c(0,
0, 0), `529` = c(0, 0, 0), `580` = c(0, 0, 0), `630.5` = c(0,
0, 0), `683.5` = c(0, 0, 0.57073483), `735.5` = c(0, 1, 0.85691826
), `784` = c(0, 0, 0.90706982), `832` = c(1, 1, 1), `882.5` = c(0,
0, 0), `926.5` = c(0, 0, 0), `973` = c(0, 0, 0), `1108` = c(0,
0, 0), `1200` = c(0, 0, 0)), .Names = c("10", "20", "52.5", "81",
"110", "140.5", "189", "222.5", "278", "340", "397", "453.5",
"529", "580", "630.5", "683.5", "735.5", "784", "832", "882.5",
"926.5", "973", "1108", "1200"), row.names = c("at3g01510.1",
"at5g26570.1", "at1g10760.1"), class = "data.frame")
Okay, here's a way to distinguish the lines clearly, while keeping everything on one plot. I use non solid linetypes and different sizes to 'make room' for the overlayed lines.
library(reshape2)
library(ggplot2)
dat <- as.data.frame(as.matrix(t(tbl_alles)))
dat$x <- as.numeric(row.names(dat))
ggplot(melt(dat, id.vars='x'), aes(x=x, y=value, group=variable)) +
geom_line(aes(color=variable, linetype=variable, size=variable)) +
scale_linetype_manual(values=c('solid', 'dotted', 'dashed')) +
scale_size_manual(values=c(1,3,1)) +
scale_color_manual(values=c('black', 'red', 'white')) +
theme(axis.text = element_text(color='black'),
panel.background = element_rect('grey'),
legend.key = element_rect('grey'),
panel.grid = element_blank()) +
labs(title='This is not a pretty chart, but you can make out the lines')
I took as a starting point your data from the dput you pasted above:
tbl_alles <- structure(list(`10` = c(0, 0, 0), `20` = c(0, 0, 0), `52.5` = c(0, 0, 0), `81` = c(0, 0, 0), `110` = c(0, 0, 0), `140.5` = c(0, 0, 0), `189` = c(0, 0, 0), `222.5` = c(0, 0, 0), `278` = c(0, 0, 0), `340` = c(0, 0, 0), `397` = c(0, 0, 0), `453.5` = c(0, 0, 0), `529` = c(0, 0, 0), `580` = c(0, 0, 0), `630.5` = c(0, 0, 0), `683.5` = c(0, 0, 0.57073483), `735.5` = c(0, 1, 0.85691826), `784` = c(0, 0, 0.90706982), `832` = c(1, 1, 1), `882.5` = c(0, 0, 0), `926.5` = c(0, 0, 0), `973` = c(0, 0, 0), `1108` = c(0, 0, 0), `1200` = c(0, 0, 0)), .Names = c("10", "20", "52.5", "81", "110", "140.5", "189", "222.5", "278", "340", "397", "453.5", "529", "580", "630.5", "683.5", "735.5", "784", "832", "882.5", "926.5", "973", "1108", "1200"), row.names = c("at3g01510.1", "at5g26570.1", "at1g10760.1"), class = "data.frame")
This is most certainly not what you need, but perhaps it can give you another idea.
X=structure(list(`10` = c(0, 0, 0), `20` = c(0, 0, 0), `52.5` = c(0,
0, 0), `81` = c(0, 0, 0), `110` = c(0, 0, 0), `140.5` = c(0,
0, 0), `189` = c(0, 0, 0), `222.5` = c(0, 0, 0), `278` = c(0,
0, 0), `340` = c(0, 0, 0), `397` = c(0, 0, 0), `453.5` = c(0,
0, 0), `529` = c(0, 0, 0), `580` = c(0, 0, 0), `630.5` = c(0,
0, 0), `683.5` = c(0, 0, 0.57073483), `735.5` = c(0, 1, 0.85691826
), `784` = c(0, 0, 0.90706982), `832` = c(1, 1, 1), `882.5` = c(0,
0, 0), `926.5` = c(0, 0, 0), `973` = c(0, 0, 0), `1108` = c(0,
0, 0), `1200` = c(0, 0, 0)), .Names = c("10", "20", "52.5", "81",
"110", "140.5", "189", "222.5", "278", "340", "397", "453.5",
"529", "580", "630.5", "683.5", "735.5", "784", "832", "882.5",
"926.5", "973", "1108", "1200"), row.names = c("at3g01510.1",
"at5g26570.1", "at1g10760.1"), class = "data.frame");
library(ggplot2)
library(reshape2)
library(data.table)
X.dt<-as.data.table(t(X))
X.dt[,X:=1:dim(X.dt)[1]]
X.dt<-melt(X.dt, id='X')
ggplot(X.dt,aes(X, value,group=variable,color=variable))+
geom_line()+
facet_wrap(~variable, nrow=3)+
guides(color=FALSE)+labs(x="X",y="Intensity")
Since you have a discrete number of x values, I suggest using a barplot instead. This will make the categories easier to distinguish and highlight the aspect you are most interested in.
First put the data in long format
dat <- structure(list(`10` = c(0, 0, 0), `20` = c(0, 0, 0), `52.5` = c(0, 0, 0),
`81` = c(0, 0, 0), `110` = c(0, 0, 0), `140.5` = c(0, 0, 0),
`189` = c(0, 0, 0), `222.5` = c(0, 0, 0), `278` = c(0, 0, 0),
`340` = c(0, 0, 0), `397` = c(0, 0, 0), `453.5` = c(0, 0, 0),
`529` = c(0, 0, 0), `580` = c(0, 0, 0), `630.5` = c(0, 0, 0),
`683.5` = c(0, 0, 0.57073483), `735.5` = c(0, 1, 0.85691826),
`784` = c(0, 0, 0.90706982), `832` = c(1, 1, 1),
`882.5` = c(0, 0, 0), `926.5` = c(0, 0, 0), `973` = c(0, 0, 0),
`1108` = c(0, 0, 0), `1200` = c(0, 0, 0)),
.Names = c("10", "20", "52.5", "81", "110", "140.5", "189",
"222.5", "278", "340", "397", "453.5", "529", "580",
"630.5", "683.5", "735.5", "784", "832", "882.5",
"926.5", "973", "1108", "1200"),
row.names = c("at3g01510.1", "at5g26570.1", "at1g10760.1"),
class = "data.frame")
library(tidyr)
dat$rowname <- rownames(dat)
ggdat <- gather(dat, key = "colname", value = "Intensity", -rowname)
Then create the barplot using ggplot2
library(RColorBrewer)
library(ggplot2)
colors <- brewer.pal(nrow(dat), "Dark2")
ggplot(data = ggdat, aes(x = colname, y = Intensity, fill = rowname)) +
geom_bar(aes(color = rowname), stat = "identity",
position = position_dodge(), width = 0.75) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
legend.position = "bottom")
The code could be used for more than 3 rows, although the bars will get harder to distinguish with more categories. If this is a problem, you could consider dropping/binning x values, or perhaps splitting the plot into two:
ggdat$group <- factor(ggdat$colname %in% colnames(dat)[1:12],
levels = c(TRUE, FALSE), labels = c("Low x", "High x"))
ggplot(data = ggdat, aes(x = colname, y = Intensity, fill = rowname)) +
geom_bar(aes(color = rowname), stat = "identity",
position = position_dodge(), width = 0.75) +
scale_fill_manual(values = colors) +
scale_color_manual(values = colors) +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
legend.position = "bottom") +
facet_wrap(~ group, ncol = 1, scales = "free_x")
How many records does the dataset have? It seems you are dealing with an overplotting issue. Follow #Nikos method to tidy the data.
Use size and alpha to change the size and transparency of the line.
ggplot(data = X.dt, aes(x = X, y = value, group = variable, color = variable)) +
geom_line(data = X.dt, aes(x = X, y = value, group = variable, color = variable),
size = 3, alpha = .25)
The color of the line changes as they overlap. However this will only work for smaller datasets. My only other suggestion is to overlay geom_line() with geom_point() that will plot points over the lines. You can use position = position_jitter() to slightly augment the position of the points, that way if they overlap you can see where they overlap.
ggplot(data = X.dt, aes(x = X, y = value, group = variable, color = variable)) +
geom_point(position = position_jitter(w = 0.001, h = 0.02), size = 3, alpha = .5) +
geom_line(data = X.dt, aes(x = X, y = value, group = variable, color = variable), size = 1, alpha = .25)
You can try to play with the line types but this can become really difficult if you have too much lines to see : is 3 the maximum you'll have ? Else, you may consider another way to draw your data.
Here is an example with your data, when I plot it, I can see the 3 lines :
matplot(as.matrix(t(tbl_alles[rowsToPlot,])),type="l",lwd=2,lty=c("solid","48","36"), col=rainbow(length(rowsToPlot)),xlab = 'Fraction Size', ylab = 'Intensity')
legend('topright',c('LSF1', 'PWD', 'GWD'),lty=c("solid","48","36"),lwd=2, bty='n', cex=.75, col = rainbow(length(rowsToPlot)))
the 3 line types :
solid: this is the default type, as you already know...
48: first 4 units of line then a blank of 8 units
36: first 3 units of line then a blank of 6 units.
I also changed the width of the line with lwd=2.
There is another parameter to play with : transparency.
If (keeping the different lty) you change the colors to c("#FF000030","#0000FF50","#00FF0080") for example, it will be easier to see every lines (the two last characters of each hexadecimal code specify the transparency).
If you use transparency, then you can even specify a unique color and ovelapping lines will appear darker : for example, col=#00000044".

Resources