R - d3heatmap - implement breaks - r

I am trying to plot a heatmap using the d3heatmap package.
Unfortunately, I have not been successful yet in implementing certain breaks using the option breaks=... as in heatmap or heatmap.2.
This yields just funny results, I am not even sure whether I am doing something wrong or whether the function just ignores breaks.
For example, I tried:
breaks = c(seq(-10, -2), seq(-2, -1.65), seq(-1.65, 1.65), seq(1.65, 2), seq(2, 10)
and
breaks = c(-10, -2, -1.65, 1.65, 2, 10)
with
colors = c("red", "yellow", "green", "yellow", "red")
but nothing seems to work properly.
Any suggestions?
Here's the dput of my data:
> dput(mat)
structure(c(-0.04, NA, 0.59, NA, 0.675, 0.96, 1.09, 0.445, NA,
0.545, NA, NA, 0.09, -1.11, NA, 0.99, 0.13, 0.215, 1.425, 0,
NA, 0.69, 0.805, NA, 0.69, 1.22, NA, 0.3, NA, 0.025, NA, 0.075,
0.36, -0.94, NA, -0.31, 0.26, 1.02, -1.19, NA, NA, -0.77, NA,
-1.48, 1.05, 0.48, NA, NA, NA, 1.49, -1.285, NA, 0.76, 1.14,
-0.62, NA, NA, NA, 0.95, NA, NA, -0.12, 0.49, NA, 2.31, NA, -0.33,
0.85, NA, -1.7, -1.63, NA, -1.12, 0.135, -0.18, NA, -0.245, NA,
-0.2, -0.2, 0.23, -0.11, NA, 0.3, -0.81, 0.04, 0.18, -0.7, 0.53,
0.44, -0.49, 0.28, 0.26, 0.06, 0.265, 0.21, 0.06, -0.175, 0.365,
0.255, 1.25, -0.35, 0.16, 0.125, 0.825, 0.08, 0.02, -0.02, 0.99,
0.79, -0.23, 0.06, NA, 0.36, -0.64, -0.195, 1.19, -0.29, 0.915,
NA, NA, NA, NA, 0.2, 0.1, NA, 0.04, 0.33, NA, 1.46, 2.36, NA,
-0.92, 1.295, NA, NA, 0.8, NA, 1.09, 1.45, 5.42, NA, NA, NA,
1.69, 3.43, NA, 0.55), .Dim = c(37L, 4L), .Dimnames = list(c("AT",
"BE", "BG", "CEE", "CH", "CN", "CZ", "DE", "DK", "EA", "EE",
"EMU", "ES", "EU", "FI", "FR", "GB", "GR", "HR", "HU", "IE",
"IT", "JP", "LU", "NL", "PL", "PT", "RO", "RS", "RU", "SE", "SI",
"SK", "TR", "UA", "UK", "US"), c("Credit Risk", "Funding and liquidity Risk",
"Macro Risk", "Market Risk")))
And the code I am running:
d3heatmap(abs(mat),
dendrogram = "none",
breaks = c(0,1.65,2,10),
col = c("green", "yellow", "red"),
na.rm = TRUE)
The same function using heatmap.2 works perfectly, though.

The function d3heatmap simply does not have a 'breaks' argument. If it gets passed in as an argument it is silently ignored. (See ?d3heatmap.)
The heatmap.2 function in the gplots package on the other hand does have a "breaks" argument. That explains the difference in behaviour.
Luckily, it is still possible to get the desired behaviour by passing an appropriate 'colors' function to d3heatmap. It works as follows.
First the example data:
mat <- structure(c(-0.04, NA, 0.59, NA, 0.675, 0.96, 1.09, 0.445, NA,
0.545, NA, NA, 0.09, -1.11, NA, 0.99, 0.13, 0.215, 1.425, 0,
NA, 0.69, 0.805, NA, 0.69, 1.22, NA, 0.3, NA, 0.025, NA, 0.075,
0.36, -0.94, NA, -0.31, 0.26, 1.02, -1.19, NA, NA, -0.77, NA,
-1.48, 1.05, 0.48, NA, NA, NA, 1.49, -1.285, NA, 0.76, 1.14,
-0.62, NA, NA, NA, 0.95, NA, NA, -0.12, 0.49, NA, 2.31, NA, -0.33,
0.85, NA, -1.7, -1.63, NA, -1.12, 0.135, -0.18, NA, -0.245, NA,
-0.2, -0.2, 0.23, -0.11, NA, 0.3, -0.81, 0.04, 0.18, -0.7, 0.53,
0.44, -0.49, 0.28, 0.26, 0.06, 0.265, 0.21, 0.06, -0.175, 0.365,
0.255, 1.25, -0.35, 0.16, 0.125, 0.825, 0.08, 0.02, -0.02, 0.99,
0.79, -0.23, 0.06, NA, 0.36, -0.64, -0.195, 1.19, -0.29, 0.915,
NA, NA, NA, NA, 0.2, 0.1, NA, 0.04, 0.33, NA, 1.46, 2.36, NA,
-0.92, 1.295, NA, NA, 0.8, NA, 1.09, 1.45, 5.42, NA, NA, NA,
1.69, 3.43, NA, 0.55), .Dim = c(37L, 4L),
.Dimnames = list(c("AT", "BE", "BG", "CEE", "CH", "CN", "CZ", "DE", "DK", "EA", "EE", "EMU", "ES", "EU", "FI", "FR", "GB", "GR", "HR", "HU", "IE", "IT", "JP", "LU", "NL", "PL", "PT", "RO", "RS", "RU", "SE", "SI", "SK", "TR", "UA", "UK", "US"), c("Credit Risk", "Funding and liquidity Risk", "Macro Risk", "Market Risk")))
Suppose we want the following three color bins: blue for values < 0, green for values >= 0 but < 2, and red for values >= 2. We then define the corresponding ordered list of colors.
palette <- c("blue", "green", "red")
We also define the boundary values of the color bins. These values must include the domain boundaries.
mi <- min(mat, na.rm = TRUE)
ma <- max(mat, na.rm = TRUE)
breaks <- c(mi, 0, 2, ma)
We can now define a color interpolation function which maps a value in [0,1] onto a color, respecting our color bins. The 'scales' package comes to help here.
install.package('scales') # if needed
library(scales)
colorFunc <- col_bin(palette, bins = rescale(breaks))
The breaks originally defined in the domain of our data needed to be rescaled to [0,1]. The 'rescale' function in the 'scales' package handled that.
Small detail: the low boundary of a bin is included in the bin, but the high boundary is excluded. So the value 0 will be green, anything between 0 and 2 will be green too, but 2 will be red.
We can now plot the heat map.
d3heatmap(mat, dendrogram = "none", colors = colorFunc, na.rm = TRUE)
The result looks like this:

Related

Error in parse(text = elt) when using the tab_spanner in a gt table R

I would like to create a gt table in R. However, I am facing some issues with the tab spanner When I try to use:
tab_spanner(
label = 'If you help me you will be my hero',
columns = names(table2a)[3:8]
)
I really do not understand why since this should be a quite easy thing to do. I have added other elements such as footnotes, source notes and other elements quite smoothly but I am stuck with the tab spanner.
Here is the code:
load(paste0(wd ,"/Tables/Table2a.Rdata"))
gt(table2a) %>%
tab_spanner(
label = 'If you help me you will be my hero',
columns = names(table2a)[3:8]
)
and here is the data if you want to try yourself:
> dput(table2a)
structure(list(Indicator = c("Real GDP growth", NA, "Private consumption",
NA, "Business investment", NA, "Housing investment", NA, "Net exports - contribution to GDP growth",
NA, "Employment growth", NA, "Compensation per employee", NA),
Type = c("Latest Obs.", "(B)MPE", "Latest Obs.", "(B)MPE",
"Latest Obs.", "(B)MPE", "Latest Obs.", "(B)MPE", "Latest Obs.",
"(B)MPE", "Latest Obs.", "(B)MPE", "Latest Obs.", "(B)MPE"
), `2022 Q3` = c(0.8, 0.68, 1.02, 0.83, 2.14, 1.63, -0.77,
-0.29, -395.89, -0.08, 0.35, 0.25, 0.77, 0.38), `2022 Q4` = c(0.32,
0.13, 0.87, 0.31, 7.73, -0.16, -0.69, -0.27, 99.5, 0.12,
0.29, 0.11, 1.09, 1.09), `2023 Q1` = c(NA, -0.06, NA, -0.11,
NA, -0.38, NA, -0.15, NA, 0.05, NA, 0.01, NA, 1.26), `2023 Q2` = c(NA,
0.02, NA, -0.09, NA, 0.04, NA, -0.3, NA, 0.12, NA, 0.03,
NA, 1.46), `2023 Q3` = c(NA, 0.43, NA, 0.36, NA, 0.89, NA,
-0.21, NA, 0.18, NA, 0.01, NA, 1.23), `Annual rate of change` = c(NA,
3.14, NA, 3.59, NA, 3.35, NA, 2.79, NA, 0.04, NA, 1.98, NA,
3.97), CF = c(NA, 3.2, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), SPF = c(NA, 3.01, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA)), row.names = c(NA, -14L), class = "data.frame")
This is the error I get:
Error in parse(text = elt) : <text>:1:7: unexpected symbol
1: Using an
^
As the error suggests the problem should be related to the names of the columns.
I have read similar issues at the following links 1 and 2 but I am still lost.
Below a screenshot of it

Not getting the correct output with this R script

I have once again thrown myself into learning R. However, I'm not sure if my data is formatted wrong or if I'm missing a key point.
The vision is to compare all samples against each other over time. However, nailing the code has proved difficult. I can't seem to get time on the x-axis and the samples to match and overlap. I have looked at what feels like 100 videos and web pages. Still can't work this in.
Script:
Data2 <- Data3 %>%
gather( key = "test", value = "value", c(-Name))
Data2 %>%
ggplot() +
geom_point(aes(x=value, y=test)) +
ylab("Film type") +
theme(legend.position="none") +
xlab("Time")
Name = c("2% No wash No cure 20gm", "3 % no wash no cure 20 gm", "4 % no wash no cure 20 gm", "2 % no cure just wash 20 gm", "3 % no cure just wash 20gm", "4 % no cure just wash 20 gm", "3 % cure + wash 20 gm", "4%cure+wash 20gm")
Data:
structure(list(Name = c(0, 15, 30, 45, 60, 75, 90, 105, 120,
135, 150, 165, 180), `2% No wash No cure 20gm` = c(0.0499999999999998,
0.0800000000000001, 0.13, 0.23, 0.56, 0.61, 0.54, 0.54, NA, NA,
NA, NA, NA), `3 % no wash no cure 20 gm` = c(0.0200000000000005,
0.04, 0.0700000000000003, 0.350000000000001, 0.42, 0.36, 0.36,
0.350000000000001, NA, NA, NA, NA, NA), `4 % no wash no cure 20 gm` = c(0.0499999999999998,
0.0899999999999999, 0.12, 0.18, 0.655, 0.649999999999999, 0.62,
0.62, NA, NA, NA, NA, NA), `2 % no cure just wash 20 gm` = c(0.04,
0.0699999999999994, 0.0899999999999999, 0.13, 0.44, 0.64, 0.62,
0.739999999999999, NA, NA, NA, NA, NA), `3 % no cure just wash 20gm` = c(0.04,
0.0999999999999996, 0.0800000000000001, 0.0999999999999996, 0.23,
0.6, 0.919999999999999, 1.42, 1.51, 1.64, NA, NA, NA), `4 % no cure just wash 20 gm` = c(0.0499999999999998,
0.0899999999999999, 0.0999999999999996, 0.12, 0.13, 0.13, 0.2,
0.37, 0.62, 0.86, 1.05, 1.23, 0.899999999999999), `3 % cure + wash 20 gm` = c(0.11,
0.16, 0.17, 0.18, 0.19, 0.2, 0.37, 0.819999999999999, 1.34, 1.62,
1.62, 2.02, 1.53), `4%cure+wash 20gm` = c(0.0600000000000005,
0.11, 0.14, 0.16, 0.17, 0.19, 0.26, 0.680000000000001, 0.87,
1.02, 1.12, 1.29, 1.12)), row.names = c(NA, -13L), class = c("tbl_df",
"tbl", "data.frame"))
I'm not sure about the meaning of your features, but did you think about something like this?
Data2 %>%
ggplot(aes(x = Name, y = value)) +
geom_point(aes(col = test), alpha = 0.5, position = "jitter")

Coefficient plot - Increase gap between rows and alternative background colors in rows

I have created this coefficient plot. However, I cannot increase the gap between rows. I also like to add an alternative background colour of row (like row-wise grey then white then grey ) to make it easier for the reader to read the plot. Would you please support improving its visualization?
I used the following code to create this plot.
mydf <- data.frame(
SubgroupH=c('Age',NA,NA,NA,NA,'Marital or Union Status',NA,NA, NA, 'Place of Residence', NA, NA, 'Education', NA, NA, NA, NA,'Occupation', NA, NA, NA, NA, 'Wealth', NA, NA, NA, NA, NA, 'Reading newspaper or magazine', NA, NA, NA, 'Frequency of watching television', NA, NA, NA, 'Frequency of listening radio', NA, NA, NA ),
Subgroup=c(NA,'15-19','20-29','30-39','40-49', NA, 'Currently Married or Union', 'Never Married or Union','Formally Married or Union', NA, 'Rural', 'Urban', NA, 'Higher', 'Secondary', 'Primary', 'No eduction', NA, 'Not working', 'Professional/technical/managerial/services', 'Agriculture', 'Skilled/unskilled manual', NA, 'Poorest', 'Poorer', 'Middle','Richer', 'Richest', NA, 'Not at all', 'Less than once a week', 'At least once a week', NA, 'Not at all', 'Less than once a week', 'At least once a week', NA, 'Not at all', 'Less than once a week', 'At least once a week'),
AdjustedOR=c(NA,1,'2.76 (2.49-3.07)','3.68 (3.26-4.15)','4.61 (4.06-5.23)',NA,1,'1.03 (0.94-1.13)', '1.26 (1.04-1.54)', NA, 1,'1.12 (1.04-1.21)', NA, 1, '1.42 (1.30-1.56)', '2.09 (1.85-2.36)', '2.20 (1.93-2.49)', NA, 1, '1.39 (1.25-1.54)', '1.37 (1.24-1.51)', '1.55 (1.39-1.72)', NA, 1, '0.91 (0.84-0.99)', '0.77 (0.71-0.84)', '0.72 (0.65-0.79)', '0.61 (0.54-0.69)', NA, 1, '1 (0.91-1.10)', '0.92 (0.85-0.998)', NA, 1, '1.19 (1.07-1.31)', '1.29 (1.19-1.40)', NA, 1, '1.19 (1.09-1.30)', '1.13 (1.05-1.21)'),
OddsRatio=c(NA,1,2.76,3.68,4.61, NA,1,1.03, 1.26, NA, 1,1.12, NA, 1, 1.42, 2.09, 2.20, NA, 1, 1.39, 1.37, 1.55, NA, 1, 0.91, 0.77, 0.78, 0.61 , NA, 1, 1,0.92, NA, 1,1.19,1.29, NA, 1, 1.19, 1.13),
ORLower=c(NA,NA,2.49,3.26,4.06,NA,NA,0.94, 1.04, NA, NA,1.04, NA, NA, 1.30,1.85, 1.93, NA, NA,1.25, 1.24, 1.39, NA, NA, 0.84, 0.71, 0.65, 0.54, NA, NA, 0.91, 0.85, NA, NA, 1.07, 1.19, NA, NA,1.09, 1.05),
ORUpper=c(NA,NA,3.07,4.15,5.23,NA,NA,1.13, 1.54, NA, NA,1.21, NA, NA, 1.56, 2.36, 2.49, NA, NA, 1.54, 1.51,1.72, NA, NA, 0.99, 0.84, 0.79, 0.69, NA, NA, 1.10, 0.998, NA, NA, 1.31, 1.40, NA, NA, 1.30,1.21),
Pvalue=c(NA,NA,'< 0.001','< 0.001','< 0.001', NA,NA, 0.518, 0.021, NA, NA, 0.003, NA, NA, '< 0.001', '< 0.001', '< 0.001', NA, NA, '< 0.001', '< 0.001', '< 0.001', NA, NA, 0.029, '< 0.001','< 0.001','< 0.001', NA, NA, 0.993, 0.045, NA, NA, '< 0.001','< 0.001',NA, NA, '< 0.001', 0.002),
stringsAsFactors=FALSE )
#png('temp.png', width=8, height=4, units='in', res=400)
rowseq <- seq(nrow(mydf),1)
par(mai=c(0.7,0,0,0))
plot(mydf$OddsRatio, rowseq, pch=15,
xlim=c(-0.8,6.2), ylim=c(0,42),
xlab='', ylab='', yaxt='n', xaxt='n',
bty='n')
axis(1, seq(0.5, 5,by=0.5), cex.axis=1)
segments(1,-1,1,40.20, lty=3, )
segments(mydf$ORLower, rowseq, mydf$ORUpper, rowseq)
mtext('Adjusted Odds Ratio (95% CI)', 1, line=2, at=1.2, cex=1, font=2)
text(-1,42, "Factors", cex=1.4, font=2, pos=4)
t1h <- ifelse(!is.na(mydf$SubgroupH), mydf$SubgroupH, '')
text(-1,rowseq, t1h, cex=1.3, pos=4, font=2)
t1 <- ifelse(!is.na(mydf$Subgroup), mydf$Subgroup, '')
text(-0.98,rowseq, t1, cex=1.3, pos=4)
text(4.6,42, "Adjusted Odds Ratio (95% CI)", cex=1.4, font=2, pos=4)
t2 <- ifelse(!is.na(mydf$AdjustedOR), format(mydf$AdjustedOR,big.mark=","), '')
text(6, rowseq, t2, cex=1.3, pos=2)
text(6,42, "P-value", cex=1.4, font=2, pos=4)
t4 <- ifelse(!is.na(mydf$Pvalue), mydf$Pvalue, '')
text(6,rowseq, t4, cex=1.3, pos=4)
You could play with flexible and different cex and adjust with the png parameters. This looks already better. For line-by-line gray shading we can simply use abline with modulo 2.
cex11 <- 1
cex12 <- 1.2
cex42 <- cex41 <- cex23 <- cex22 <- cex21 <- 1.3
png('temp.png', width=23, height=12, units='in', res=400)
par(mai=c(0.7, 0, 0, 0))
rowseq <- seq(nrow(mydf), 1)
plot(mydf$OddsRatio, rowseq, xlim=c(-0.8, 6.2), ylim=c(0, 42),
xlab='', ylab='', yaxt='n', xaxt='n', bty='n')
abline(h=rowseq[rowseq %% 2 != 0], lwd=25, col='grey90') ## grey shading
points(mydf$OddsRatio, rowseq, pch=15)
axis(1, seq(0.5, 5, by=0.5), cex.axis=cex11)
segments(1, -1, 1, 40.20, lty=3, )
segments(mydf$ORLower, rowseq, mydf$ORUpper, rowseq)
mtext('Adjusted Odds Ratio (95% CI)', 1, line=2, at=1.2, cex=cex12, font=2)
text(-1, 42, "Factors", cex=cex21, font=2, pos=4)
t1h <- ifelse(!is.na(mydf$SubgroupH), mydf$SubgroupH, '')
text(-1, rowseq, t1h, cex=cex22, pos=4, font=2)
t1 <- ifelse(!is.na(mydf$Subgroup), mydf$Subgroup, '')
text(-0.98, rowseq, t1, cex=cex23, pos=4)
text(4.6, 42, "Adjusted Odds Ratio (95% CI)", cex=cex41, font=2, pos=4)
t2 <- ifelse(!is.na(mydf$AdjustedOR), format(mydf$AdjustedOR, big.mark=", "), '')
text(6, rowseq, t2, cex=cex42, pos=2)
text(6, 42, "P-value", cex=cex41, font=2, pos=4)
t4 <- ifelse(!is.na(mydf$Pvalue), mydf$Pvalue, '')
text(6, rowseq, t4, cex=cex41, pos=4)
dev.off()
However,
it might be more convenient to expand the margins and use mtext instead of text. The code parts could also be better organized to avoid confusion. For the names of text parameters, use numbers according to their plot margin which are numbered according to their quadrant (1=bottom, 2=left, 3=top, 4=right). So try this:
## parameters
rowseq <- rev(seq_len(dim(mydf)[1]))
rg <- range(mydf[c('ORLower', 'ORUpper')], na.rm=TRUE)
t2h <- ifelse(!is.na(mydf$SubgroupH), mydf$SubgroupH, '')
t2 <- ifelse(!is.na(mydf$Subgroup), mydf$Subgroup, '')
t4or <- ifelse(!is.na(mydf$AdjustedOR), format(mydf$AdjustedOR, big.mark=", "), '')
t4p <- ifelse(!is.na(mydf$Pvalue), mydf$Pvalue, '')
cexh1 <- 1.3
cexh2 <- 1.2
cext <- 1.1
## plot
png('temp.png', width=18, height=12, units='in', res=400)
op <- par(mar=c(5, 18.5, 4, 15)+.1)
plot(mydf$OddsRatio, rowseq, type='n', xlim=rg, axes=FALSE, xlab='', ylab='')
## content
abline(h=rowseq[rowseq %% 2 == 0], lwd=20, col='grey95', xpd=TRUE) ## grey shades
points(mydf$OddsRatio, rowseq, pch=15)
segments(1, 0, 1, max(rowseq)*1.025, lty=3)
segments(mydf$ORLower, rowseq, mydf$ORUpper, rowseq)
## margin 1
axis(1, seq(0.5, 5, by=0.5), cex.axis=cex11)
mtext('Adjusted Odds Ratio (95% CI)', 1, line=2.5, at=1.2, cex=cexh1, font=2)
## margin 2
mtext("Factors", 2, 17.5, at=max(rowseq)*1.03, las=2, adj=0, font=2, cex=cexh1)
mtext(t2h, 2, 17.5, at=rowseq, las=2, adj=0, font=2, cex=cexh2)
mtext(t2, 2, 17, at=rowseq, las=2, adj=0, cex=cext)
## margin 4
mtext("Adjusted Odds Ratio (95% CI)", 4, 7, at=max(rowseq)*1.03, las=2, adj=1,
font=2, cex=cexh1)
mtext(t4or, 4, 7, at=rowseq, las=2, adj=1, cex=cext)
mtext("P-value", 4, 12, at=max(rowseq)*1.03, las=2, adj=1, font=2, cex=cexh1)
mtext(t4p, 4, 12, at=rowseq, las=2, adj=1, cex=cext)
par(op)
dev.off()
Data:
mydf <- structure(list(SubgroupH = c("Age", NA, NA, NA, NA, "Marital or Union Status",
NA, NA, NA, "Place of Residence", NA, NA, "Education", NA, NA,
NA, NA, "Occupation", NA, NA, NA, NA, "Wealth", NA, NA, NA, NA,
NA, "Reading newspaper or magazine", NA, NA, NA, "Frequency of watching television",
NA, NA, NA, "Frequency of listening radio", NA, NA, NA), Subgroup = c(NA,
"15-19", "20-29", "30-39", "40-49", NA, "Currently Married or Union",
"Never Married or Union", "Formally Married or Union", NA, "Rural",
"Urban", NA, "Higher", "Secondary", "Primary", "No eduction",
NA, "Not working", "Professional/technical/managerial/services",
"Agriculture", "Skilled/unskilled manual", NA, "Poorest", "Poorer",
"Middle", "Richer", "Richest", NA, "Not at all", "Less than once a week",
"At least once a week", NA, "Not at all", "Less than once a week",
"At least once a week", NA, "Not at all", "Less than once a week",
"At least once a week"), AdjustedOR = c(NA, "1", "2.76 (2.49-3.07)",
"3.68 (3.26-4.15)", "4.61 (4.06-5.23)", NA, "1", "1.03 (0.94-1.13)",
"1.26 (1.04-1.54)", NA, "1", "1.12 (1.04-1.21)", NA, "1", "1.42 (1.30-1.56)",
"2.09 (1.85-2.36)", "2.20 (1.93-2.49)", NA, "1", "1.39 (1.25-1.54)",
"1.37 (1.24-1.51)", "1.55 (1.39-1.72)", NA, "1", "0.91 (0.84-0.99)",
"0.77 (0.71-0.84)", "0.72 (0.65-0.79)", "0.61 (0.54-0.69)", NA,
"1", "1 (0.91-1.10)", "0.92 (0.85-0.998)", NA, "1", "1.19 (1.07-1.31)",
"1.29 (1.19-1.40)", NA, "1", "1.19 (1.09-1.30)", "1.13 (1.05-1.21)"
), OddsRatio = c(NA, 1, 2.76, 3.68, 4.61, NA, 1, 1.03, 1.26,
NA, 1, 1.12, NA, 1, 1.42, 2.09, 2.2, NA, 1, 1.39, 1.37, 1.55,
NA, 1, 0.91, 0.77, 0.78, 0.61, NA, 1, 1, 0.92, NA, 1, 1.19, 1.29,
NA, 1, 1.19, 1.13), ORLower = c(NA, NA, 2.49, 3.26, 4.06, NA,
NA, 0.94, 1.04, NA, NA, 1.04, NA, NA, 1.3, 1.85, 1.93, NA, NA,
1.25, 1.24, 1.39, NA, NA, 0.84, 0.71, 0.65, 0.54, NA, NA, 0.91,
0.85, NA, NA, 1.07, 1.19, NA, NA, 1.09, 1.05), ORUpper = c(NA,
NA, 3.07, 4.15, 5.23, NA, NA, 1.13, 1.54, NA, NA, 1.21, NA, NA,
1.56, 2.36, 2.49, NA, NA, 1.54, 1.51, 1.72, NA, NA, 0.99, 0.84,
0.79, 0.69, NA, NA, 1.1, 0.998, NA, NA, 1.31, 1.4, NA, NA, 1.3,
1.21), Pvalue = c(NA, NA, "< 0.001", "< 0.001", "< 0.001", NA,
NA, "0.518", "0.021", NA, NA, "0.003", NA, NA, "< 0.001", "< 0.001",
"< 0.001", NA, NA, "< 0.001", "< 0.001", "< 0.001", NA, NA, "0.029",
"< 0.001", "< 0.001", "< 0.001", NA, NA, "0.993", "0.045", NA,
NA, "< 0.001", "< 0.001", NA, NA, "< 0.001", "0.002")), class = "data.frame", row.names = c(NA,
-40L))

Removing NAs from ggplot x-axis in ggplot2

I would like to get rid off the whole NA block (highlighted here ).
I tried na.ommit and na.rm = TRUE unsuccesfully.
Here is the code I used :
library(readxl)
data <- read_excel("Documents/TFB/xlsx_geochimie/solfatara_maj.xlsx")
View(data)
data <- gather(data,FeO:`Fe2O3(T)`,key = "Element",value="Pourcentage")
library(ggplot2)
level_order <- factor(data$Element,levels = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O"))
ggplot(data=data,mapping=aes(x=level_order,y=data$Pourcentage,colour=data$Ech)+geom_point()+geom_line(group=data$Ech) +scale_y_log10()
And here is my original file
https://drive.google.com/file/d/1bZi7fPWebbpodD1LFScoEcWt5Bs-cqhb/view?usp=sharing
If I run your code and look at data that goes into ggplot:
table(data$Element)
Al2O3 CaO Fe2O3 Fe2O3(T) FeO K2O LOI LOI2 MgO MnO
12 12 12 12 12 12 12 12 12 12
Na2O P2O5 SiO2 SO4 TiO2 Total Total 2 Total N Total S
12 12 12 12 12 12 12 12 12
You have included Total into the melted data frame.. which is not intended I guess. Hence when you do factor on these, and these "Total.." are not included in the levels, they become NA.
So we can do it from scratch:
data <- read_excel("solfatara_maj.xlsx")
The data:
structure(list(Ech = c("AGN 1A", "AGN 2A", "AGN 3B", "SOL 4B",
"SOL 8Ag", "SOL 8Ab", "SOL 16A", "SOL 16B", "SOL 16C", "SOL 22 A",
"SOL 22D", "SOL 25B"), FeO = c(0.2, 0.8, 1.7, 0.3, 1.7, NA, 0.2,
NA, 0.1, 0.7, 1.3, 2), `Total S` = c(5.96, 45.3, 0.22, 17.3,
NA, NA, NA, NA, NA, NA, 2.37, 0.36), SO4 = c(NA, 6.72, NA, 4.08,
0.06, 0.16, 42.2, 35.2, 37.8, 0.32, 6.57, NA), `Total N` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, 15.2, NA, NA), SiO2 = c(50.2,
31.05, 56.47, 62.14, 61.36, 75.66, 8.41, 21.74, 17.44, 13.52,
19.62, 56.35), Al2O3 = c(15.53, 7.7, 17.56, 4.44, 17.75, 10.92,
31.92, 26.38, 27.66, 0.64, 3.85, 17.28), Fe2O3 = c(0.49, 0.63,
2.06, NA, 1.76, 0.11, 0.64, 0.88, 1.71, NA, 1.32, 2.67), MnO = c(0.01,
0.01, 0.13, 0.01, 0.09, 0.01, 0.01, 0.01, 0.01, 0.005, 0.04,
0.12), MgO = c(0.06, 0.07, 0.88, 0.03, 0.97, 0.05, 0.04, 0.07,
0.03, 0.02, 1.85, 1.63), CaO = c(0.2, 0.09, 3.34, 0.09, 2.58,
0.57, 0.2, 0.26, 0.15, 0.06, 35.66, 4.79), Na2O = c(0.15, 0.14,
3.23, 0.13, 3.18, 2.04, 0.68, 0.68, 0.55, 0.05, 0.45, 3.11),
K2O = c(4.39, 1.98, 8, 1.26, 8.59, 5.94, 8.2, 6.97, 8.04,
0.2, 0.89, 7.65), TiO2 = c(0.42, 0.27, 0.46, 0.79, 0.55,
0.16, 0.09, 0.22, 0.16, 0.222, 0.34, 0.53), P2O5 = c(0.11,
0.09, 0.18, 0.08, 0.07, 0.07, 0.85, 0.68, 0.62, NA, 0.14,
0.28), LOI = c(27.77, 57.06, 6.13, 29.03, 1.38, 4.92, 42.58,
37.58, 38.76, NA, 26.99, 3.92), LOI2 = c(27.79, 57.15, 6.32,
29.06, 1.57, 4.93, 42.6, 37.59, 38.77, 0.08, 27.13, 4.15),
Total = c(99.52, 99.88, 100.2, 98.25, 99.99, 100.5, 93.81,
95.57, 95.23, 15.25, 92.45, 100.3), `Total 2` = c(99.54,
99.96, 100.3, 98.28, 100.2, 100.6, 93.83, 95.58, 95.24, 15.33,
92.59, 100.6), `Fe2O3(T)` = c(0.71, 1.52, 3.95, 0.27, 3.65,
0.22, 0.87, 0.99, 1.82, 0.61, 2.76, 4.9)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"))
First we set the plotting level like you did:
plotlvls = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O")
Then we select only these columns, and also Ech, note I use pivot_longer() because gather() will supposedly be deprecated, and then we do the factoring too:
plotdf = data %>% select(c(plotlvls,"Ech")) %>%
pivot_longer(-Ech,names_to = "Element",values_to = "Pourcentage") %>%
mutate(Element=factor(Element,levels=toplot))
Finally we plot, and there are no NAs:
ggplot(data=plotdf,mapping=aes(x=Element,y=Pourcentage,colour=Ech))+
geom_point()+geom_line(aes(group=Ech)) +scale_y_log10()
1.Create reproducible minimal data
data <- data.frame(Element = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O",NA),
Pourcentage = 1:10,
Ech = c("AGN 1A", "SOL 16"))
2.Set factor levels for variable 'Element'
data$Element <- factor(data$Element,levels = c("SiO2","TiO2","Al2O3","Fe2O3","FeO","MgO","CaO","Na2O","K2O"))
3.Remove rows containing NA in the variable 'Element'
data <- data[!is.na(data$Element), ]
4.Plot data using ggplot2 (ggplot2 syntax uses NSE (non standard evaluation), which means you dont't have to pass the variable names as strings or using the $ notation):
ggplot(data=data,aes(x=Element,y=Pourcentage,colour=Ech)) +
geom_point() +
geom_line(aes(group=Ech)) +
scale_y_log10()

R: Applying function to dataframe columns within list of dataframes (lapply within lapply)

I'm trying to apply a function to the columns of several dataframes in a list.
This seems like it should be a simple thing to do, but in other similar examples on SO, the answers get complicated really quickly based on what each OP is trying to accomplish.
Example 1
Example 2
Example 3
Example 4
Here is an excerpt from my data using dput.
l <- structure(list(rent = structure(c(NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 0.619348507, 0.606042704, 0.69948953,
0.70404742, 0.715846766, 0.752410211, 0.747648989, 0.731315218,
0.724050859, 0.731388758, 0.737695122, 0.723440202, 0.672426913,
0.62184407, 0.579224699, 0.543929258, 0.499494354, 0.482955432,
0.458398376, 0.518930096, 0.50932028, 0.518930096, 0.525756186,
0.513746556, 0.503713341, 0.491045528, 0.488167562, 0.476132099,
0.47692961, 0.477979792, 0.495832538, 0.511954619, 0.53, 0.54,
0.58, 0.6, 0.59, 0.58, 0.66, 0.64, 0.7, 0.77, 0.8, 0.68, 0.69,
0.74, 0.69, 0.7, 0.7, 0.658823529, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, 2.094773258, 2.101037984, 2.136784642,
2.195558442, 2.304390876, 2.428667765, 2.433074558, 2.435306322,
2.603913629, 2.437131683, 2.400160785, 2.35669125, 2.327183577,
2.283133019, 2.246607636, 2.212371694, 2.162791522, 2.115152942,
2.067514362, 2.038931214, 2.000820351, 1.985438936, 1.938178806,
1.9108904, 1.888022635, 1.843136376, 1.836852022, 1.82947565,
1.825745961, 1.821102267, 1.834815456, 1.84978194, 1.84, 1.78,
1.8, 1.88, 1.91, 1.89, 1.94, 1.91, 1.97, 1.89, 2, 2.02, 1.93,
1.95, 1.95, 1.95, 1.979104478, 1.98880597, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 1.599590974, 1.526601389,
1.573368264, 1.669687245, 1.653456053, 1.944681138, 2.020075584,
2.242033637, 2.159612762, 2.025100889, 2.137469624, 2.082928045,
2.029291987, 1.902079515, 1.842787645, 1.737607978, 1.644962316,
1.644962316, 1.58757991, 1.511070035, 1.482378831, 1.482378831,
1.411623672, 1.358613285, 1.327873365, 1.346018731, 1.374365901,
1.337657385, 1.332574633, 1.329043862, 1.315019275, 1.295720841,
1.26, 1.23, 1.27, 1.22, 1.1, 1.02, 1.09, 1.03, 1, 0.94, 0.96,
0.99, 1.01, 0.99, 1, 1.02, 1.02761194, 1.035223881), .Dim = c(65L,
3L), .Dimnames = list(NULL, c("IndustrialRent", "OfficeRent",
"RetailRent")), .Tsp = c(2002, 2018, 4), class = c("mts", "ts",
"matrix")), jobs = structure(c(118700, 116900, 116300, 118400,
118500, 118400, 117100, 116900, 115400, 115000, 113800, 112500,
109200, 109000, 108900, 110000, 109300, 109700, 110700, 111300,
110800, 111200, 110500, 110300, 107700, 107500, 108100, 108900,
110300, 111700, 112000, 113500, 114000, 114200, 114800, 115100,
113100, 113300, 113100, 114300, 115300, 116400, 116600, 118100,
118400, 119500, 120100, 120000, 119000, 119900, 120300, 121400,
122200, 122600, 123700, 125600, 126100, 127100, 127200, 127900,
128100, 128400, 128700, 129300, 131200, 131800, 131900, 133400,
134200, 136000, 136900, 137900, 134700, 134800, 134700, 135700,
135700, 137300, 138500, 139500, 140300, 140200, 140800, 141600,
141400, 141900, 144000, 144000, 144500, 146000, 145900, 146900,
148500, 150000, 151100, 154300, 150500, 151800, 153000, 154600,
211500, 210000, 209800, 212500, 212100, 212900, 210800, 212700,
210300, 214300, 214400, 213300, 215400, 213800, 215000, 216700,
216400, 217200, 216100, 216200, 218500, 220700, 219800, 219800,
223000, 222900, 221900, 224200, 225200, 225900, 222000, 226700,
227400, 231100, 230800, 228300, 231400, 231100, 233600, 234800,
235700, 235500, 231800, 234300, 233400, 237500, 238400, 236800,
238900, 239800, 241100, 242700, 244700, 245400, 240300, 244900,
246100, 252400, 252100, 251100, 254000, 253800, 255600, 258700,
259300, 258200, 257400, 261800, 261800, 268200, 269400, 269200,
268200, 268000, 269800, 273800, 273400, 275300, 274600, 275900,
279100, 282700, 282100, 279700, 283000, 281300, 284900, 283700,
286300, 287500, 285300, 286300, 287500, 289400, 287700, 288000,
287300, 289200, 288600, 290100, 91600, 90100, 90200, 90600, 91100,
91600, 92000, 92500, 92200, 93500, 96800, 97600, 92700, 91300,
91500, 92200, 92700, 93000, 93400, 94000, 94200, 95900, 100100,
100600, 96500, 94800, 95500, 96000, 96400, 96800, 96500, 96700,
97200, 98100, 103600, 103000, 98000, 96300, 96400, 96700, 97900,
98400, 98900, 99400, 99700, 101000, 105800, 106800, 101200, 99800,
100200, 100500, 101100, 101500, 101600, 102500, 103000, 105400,
110000, 110900, 105200, 104000, 104700, 105400, 105800, 106000,
105900, 106600, 106400, 108100, 112100, 112100, 105400, 104600,
105000, 105600, 106000, 106100, 106300, 107000, 107100, 108800,
113500, 113200, 107700, 106300, 106100, 106200, 107000, 107000,
106900, 107300, 107600, 108400, 114500, 114700, 109400, 107000,
108000, 108000), .Dim = c(100L, 3L), .Dimnames = list(NULL, c("IndustrialUsingJobs",
"OfficeUsingJobs", "RetailUsingJobs")), .Tsp = c(2010, 2018.25,
12), class = c("mts", "ts", "matrix"))), .Names = c("rent", "jobs"
))
I want to apply a function to each column of each data frame in my list. The most straight forward way to do this that I came up with is:
library(seasonal) #christoph sax's x-13-seats seasonal adjustment implementation
lapply(l, function(x) lapply(x, seas))
This example includes no options. The output for each column is a list describing the seasonal adjustment.
The problem with this code is that it seems quite slow relative to what I expected. Are there more efficient and/or simpler ways of accomplishing this? Thanks for your input.

Resources