Histogram for diagonal axis in scatterplot - r

I have a 5 x 5 scatterplot matrix that I created using ggplot. I made histograms for X and Y axis, but I needed an additional histogram for the diagonals of the matrix as well.
Edited for data
data <- structure(c(5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4,
3, 5, 4, 5, 2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4,
4, 4, 3, 3, 5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1,
3, 4, 5, 3, 2, 4, 3, 4, 1, 4, 3, 5, 2, 3, 3, 4, 5, 5, 5, 4, 3,
1, 1, 4, 2, 5, 4, 4, 1, 5, 3, 4, 2, 4, 3, 4, 4, 5, 4, 5, 1, 4,
5, 5, 5, 3, 4, 4, 2, 4, 4, 4, 5, 4, 5, 4, 5, 1, 4, 3, 5, 4, 5,
2, 3, 3, 3, 4, 2, 5, 2, 4, 3, 3, 3, 3, 5, 4, 3, 4, 4, 4, 3, 3,
5, 3, 1, 3, 4, 5, 5, 3, 2, 4, 5, 4, 4, 5, 3, 5, 1, 3, 3, 5, 2,
1, 1, 4, 5, 4, 5, 1, 1, 5, 4, 5, 3, 1, 3, 5, 5, 5, 5, 2, 1, 1,
1, 2, 3, 5, 1, 2, 5, 3, 5, 4, 5, 2, 2, 5, 2, 3, 5), .Dim = c(101L,
2L))
Here is the code
library(ggplot2)
library(gridExtra)
data <- as.data.frame(data)
x <- data$V2
y <- data$V1
xhist <- qplot(x, geom="histogram", binwidth = 0.5)
yhist <- qplot(y, geom="histogram", binwidth = 0.5) + coord_flip()
none <- ggplot()+geom_point(aes(1,1), colour="white") +
theme(axis.ticks=element_blank(), panel.background=element_blank(),
axis.text.x=element_blank(), axis.text.y=element_blank(),
axis.title.x=element_blank(), axis.title.y=element_blank())
g1 <- ggplot(data, aes(x,y)) +
geom_point(size = 1, position = position_jitter(w=0.3, h=0.3))
grid.arrange(yhist, g1, none, xhist, ncol=2, nrow=2, widths=c(1, 4), heights=c(4,1))
Is there a way to directly plot z-axis histogram from this data alone? What I want is to remove the panel of 'none', and instead place a histogram for data points across the diagonal.

Related

r restructure/ stack wide 'boxy' data into one long table

I have a table that is downloaded from Excel. The structure looks like this:
excel_table <- tribble(
~hour, ~day, ~value_1, ~value_2, ~value_3, ~value_4, ~day, ~value_1, ~value_2, ~value_3, ~value_4, ~day, ~value_1, ~value_2, ~value_3, ~value_4,
"10am", "11-03-2021", 2, 3, 4, 5, "11-10-2021", 2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"11am", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"12pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"1pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"2pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"3pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"4pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"5pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"6pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"7pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5,
"8pm", "11-03-2021", 2, 3, 4, 5, "11-10-2021"2, 3, 4, 5, "11-17-2021", 2, 3, 4, 5
)
This is the output that I am looking for:
excel_table <- tribble(
~hour, ~day, ~value_1, ~value_2, ~value_3, ~value_4,
"10am", "11-03-2021", 2, 3, 4, 5,
"11am", "11-03-2021", 2, 3, 4, 5,
"12pm", "11-03-2021", 2, 3, 4, 5,
"1pm", "11-03-2021", 2, 3, 4, 5,
"2pm", "11-03-2021", 2, 3, 4, 5,
"3pm", "11-03-2021", 2, 3, 4, 5,
"4pm", "11-03-2021", 2, 3, 4, 5,
"5pm", "11-03-2021", 2, 3, 4, 5,
"6pm", "11-03-2021", 2, 3, 4, 5,
"7pm", "11-03-2021", 2, 3, 4, 5,
"8pm", "11-03-2021", 2, 3, 4, 5,
"10am", "11-10-2021", 2, 3, 4, 5,
"11am", "11-10-2021", 2, 3, 4, 5,
"12pm", "11-10-2021", 2, 3, 4, 5,
"1pm", "11-10-2021", 2, 3, 4, 5,
"2pm", "11-10-2021", 2, 3, 4, 5,
"3pm", "11-10-2021", 2, 3, 4, 5,
"4pm", "11-10-2021", 2, 3, 4, 5,
"5pm", "11-10-2021", 2, 3, 4, 5,
"6pm", "11-10-2021", 2, 3, 4, 5,
"7pm", "11-10-2021", 2, 3, 4, 5,
"8pm", "11-10-2021", 2, 3, 4, 5,
"10am", "11-17-2021", 2, 3, 4, 5,
"11am", "11-17-2021", 2, 3, 4, 5,
"12pm", "11-17-2021", 2, 3, 4, 5,
"1pm", "11-17-2021", 2, 3, 4, 5,
"2pm", "11-17-2021", 2, 3, 4, 5,
"3pm", "11-17-2021", 2, 3, 4, 5,
"4pm", "11-17-2021", 2, 3, 4, 5,
"5pm", "11-17-2021", 2, 3, 4, 5,
"6pm", "11-17-2021", 2, 3, 4, 5,
"7pm", "11-17-2021", 2, 3, 4, 5,
"8pm", "11-17-2021", 2, 3, 4, 5,
)
My first attempts were using tidyr::gather or tidyr::pivot_longer but that didn't get a good result and I won't reproduce that attempt here because it wasn't the right approach. Then it occurred to me that I could just cut off the columns into new dataframes and then use rbind() or dplyr::bind_rows() to stack the rows on top of each other where the columns match. So I started down that road but it's not such a good road to go down because I was timing myself and it would take way too long. The table I'm working with has more than the three dates; it has many years worth of data.
Is there a solution where I can restructure this data? I'm looking to preserve the first six columns and then stack the next five columns on the bottom of the rows, and then the next five on the bottom of that (and also I'm hoping to repeat the first column that says 'hour' all the way down)

How can I get the initial communalities for an exploratory factor analysis in R?

I would like to obtain the initial communalities for an exploratory factor analysis in R
(that is, the R squared of each item when predicted by the other items included in the analysis).
Is there a way to do this with either jmv::efa or psych::fa ?
I only see the uniqueness, which informs me of the communalities AFTER factor extraction (1-uniqueness)...
Thank you for your consideration : )
As you note, the initial communalities in a factor analysis are the squared multiple correlations (SMC) of each variable by the remaining variables. Using the built-in attitude dataset as an example they are easily calculated without additional packages via:
1 - 1 / diag(solve(cor(attitude)))
rating complaints privileges learning raises critical advance
0.7326020 0.7700868 0.3831176 0.6194561 0.6770498 0.1881465 0.5186447
The psych package includes the smc() function for convenience:
psych::smc(attitude)
rating complaints privileges learning raises critical advance
0.7326020 0.7700868 0.3831176 0.6194561 0.6770498 0.1881465 0.5186447
Dataset
Here is the dput for the data I am using, hereafter called hwk:
hwk <- structure(list(V1 = structure(c(4, 4, 2, 2, 2, 2, 2, 2, 4, 4,
2, 3, 2, 3, 4, 2, 2, 2, 3, 3, 2, 3, 1, 3, 3, 3, 3, 4, 1, 2, 4,
1, 2, 3, 2, 3, 1, 1, 2, 2, 4, 3, 2, 1, 2, 3, 3, 4, 3, 3, 2, 3,
1, 4, 3, 2, 3, 4, 1, 3, 3, 3, 2, 2, 1, 2, 3, 4, 4, 2, 4, 3, 2,
3, 3, 3, 3, 2, 4, 3, 3, 3, 2, 2, 3, 4, 2, 4, 4, 2, 2, 3, 3), format.spss = "F8.0"),
V2 = structure(c(4, 4, 3, 4, 3, 4, 3, 2, 4, 1, 3, 3, 3, 4,
3, 3, 2, 3, 4, 3, 1, 4, 2, 3, 4, 2, 4, 3, 3, 2, 3, 2, 3,
3, 4, 3, 3, 3, 3, 3, 3, 2, 4, 2, 2, 2, 4, 3, 4, 4, 2, 4,
2, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 3, 3, 4, 4, 4, 4, 4,
3, 4, 3, 3, 3, 4, 2, 4, 3, 4, 3, 3, 2, 3, 3, 4, 3, 4, 3,
4, 4, 3), format.spss = "F8.0"), V3 = structure(c(4, 4, 4,
4, 4, 4, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), format.spss = "F8.0"),
V4 = structure(c(4, 4, 3, 4, 3, 4, 2, 1, 3, 2, 3, 1, 4, 4,
2, 3, 2, 2, 2, 4, 1, 2, 2, 2, 3, 2, 3, 2, 2, 1, 3, 1, 1,
2, 4, 1, 1, 2, 3, 2, 2, 1, 1, 1, 3, 2, 4, 3, 3, 3, 3, 3,
3, 4, 3, 1, 4, 3, 4, 3, 2, 3, 2, 1, 4, 1, 4, 1, 2, 4, 4,
4, 3, 3, 3, 2, 2, 1, 4, 3, 2, 3, 2, 1, 3, 4, 1, 2, 4, 3,
4, 2, 2), format.spss = "F8.0"), V5 = structure(c(3, 3, 3,
4, 3, 4, 3, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 3, 2,
2, 2, 2, 4, 2, 3, 2, 3, 4, 1, 4, 2, 3, 3, 2, 2, 3, 2, 2,
3, 3, 2, 3, 3, 3, 2, 2, 2, 3, 2, 3, 3, 2, 2, 3, 3, 2, 3,
2, 2, 3, 3, 3, 2, 3, 3, 3, 4, 3, 2, 3, 3, 3, 3, 3, 3, 4,
3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 4, 3, 3), format.spss = "F8.0"),
V6 = structure(c(4, 4, 3, 4, 3, 4, 4, 1, 3, 3, 3, 3, 2, 3,
4, 2, 4, 3, 3, 3, 3, 4, 4, 3, 3, 3, 4, 4, 4, 3, 4, 4, 3,
3, 3, 4, 2, 2, 3, 3, 3, 4, 2, 4, 3, 4, 4, 4, 3, 4, 2, 4,
3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 3, 1, 4, 4, 4, 4, 4, 4,
4, 3, 4, 4, 4, 4, 2, 4, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3,
4, 4, 4), format.spss = "F8.0"), V7 = structure(c(4, 4, 2,
4, 2, 4, 4, 3, 3, 3, 2, 2, 4, 4, 3, 3, 1, 4, 3, 3, 1, 2,
4, 3, 4, 2, 4, 4, 3, 3, 2, 2, 3, 2, 4, 3, 3, 3, 3, 3, 3,
1, 4, 3, 2, 2, 4, 3, 4, 4, 2, 4, 2, 3, 4, 3, 3, 3, 4, 3,
4, 4, 3, 4, 4, 3, 4, 4, 4, 4, 3, 4, 4, 4, 3, 3, 4, 3, 4,
3, 3, 3, 3, 2, 2, 4, 4, 4, 4, 2, 4, 4, 3), format.spss = "F8.0"),
V8 = structure(c(4, 4, 2, 1, 2, 1, 1, 1, 3, 3, 2, 3, 2, 3,
4, 2, 2, 2, 3, 3, 2, 3, 1, 3, 3, 3, 3, 4, 1, 2, 4, 1, 2,
3, 2, 3, 1, 1, 2, 2, 3, 1, 1, 1, 2, 3, 3, 4, 3, 3, 2, 3,
1, 3, 4, 2, 3, 4, 1, 3, 3, 3, 2, 2, 1, 2, 3, 4, 4, 2, 4,
3, 4, 4, 4, 4, 3, 2, 4, 3, 3, 3, 2, 2, 3, 4, 2, 4, 4, 2,
1, 3, 4), format.spss = "F8.0"), V9 = structure(c(4, 4, 4,
4, 4, 4, 4, 4, 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 2, 3, 4, 4,
4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 4, 3, 2, 4, 3, 4,
4, 4, 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 4, 3, 4, 3, 2, 4,
3, 3, 4, 4, 4, 3, 4, 4, 4, 4, 4, 3, 4, 3, 4, 3, 4, 4, 4,
4, 3, 4, 4, 4, 4, 4, 3, 2, 4, 4, 4, 4, 4), format.spss = "F8.0"),
V10 = structure(c(4, 4, 2, 4, 2, 4, 3, 2, 3, 3, 3, 2, 4,
4, 2, 2, 1, 3, 4, 4, 1, 4, 2, 3, 3, 2, 4, 3, 2, 3, 3, 1,
3, 2, 4, 3, 2, 3, 3, 3, 3, 1, 2, 4, 2, 3, 4, 4, 3, 3, 2,
4, 2, 4, 3, 3, 4, 3, 4, 3, 4, 4, 4, 1, 4, 3, 3, 4, 3, 4,
4, 3, 3, 3, 3, 3, 4, 1, 4, 3, 3, 3, 3, 2, 3, 4, 4, 2, 4,
2, 4, 4, 3), format.spss = "F8.0"), V11 = structure(c(3,
3, 1, 4, 1, 4, 1, 1, 1, 1, 2, 1, 1, 1, 3, 2, 2, 2, 2, 1,
2, 3, 1, 2, 3, 3, 2, 1, 2, 2, 2, 3, 2, 2, 3, 2, 1, 2, 2,
1, 1, 4, 3, 1, 3, 2, 3, 1, 2, 1, 2, 1, 2, 2, 1, 2, 2, 3,
2, 2, 2, 2, 2, 2, 1, 1, 1, 3, 3, 4, 2, 1, 2, 2, 3, 3, 3,
3, 4, 3, 2, 3, 3, 2, 2, 2, 2, 1, 3, 1, 4, 1, 3), format.spss = "F8.0"),
V12 = structure(c(4, 4, 3, 2, 3, 2, 3, 1, 3, 3, 3, 3, 2,
3, 3, 2, 4, 3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 3, 4, 3, 4, 4,
3, 3, 3, 4, 2, 2, 3, 3, 3, 4, 2, 4, 3, 4, 4, 4, 3, 4, 2,
4, 3, 3, 3, 3, 4, 3, 3, 2, 2, 1, 1, 3, 1, 4, 4, 4, 4, 4,
4, 4, 3, 3, 2, 2, 2, 2, 4, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
3, 2, 3, 4), format.spss = "F8.0")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -93L))
EFA
I did some research after my initial answer and it appears that there is a package for this called EFA Tools. There is a function called EFA that allows you to specify that you want the initial communalities. First, run the library and the EFA itself below:
# Load EFA Tools library:
library(EFAtools)
# Run EFA:
hwkfa <- EFA(hwk,
n_factors = 3,
start_method = "psych",
method = "PAF",
rotation = "promax",
init_comm = "smc", # selected initial communalities
type = "SPSS")
Obtaining initial communalities:
Then from there you can simply select the initial communalities by using the following code:
hwkfa$h2_init
Which gives you the following vector of output:
V1 V2 V3 V4 V5 V6 V7
0.8034001 0.5583605 0.5487691 0.3255253 0.5685402 0.4643686 0.5227481
V8 V9 V10 V11 V12
0.8050573 0.3474202 0.5564858 0.3496354 0.3783390
I ran the same thing in SPSS and got matching values:

Select and compare two elements from two dataframes using R

I want to calculate the shortest path between two proteins using two dataframes. For example, I want to calculate the shortest path of first from the first list and the first from the seconds list, the first from the first list and the second from the second list, etc.
structure(list(LAS1L = c("FKBP4", "RBM6", "UPF1", "SLC25A5",
"DHX33", "ELAC2", "CCDC124", "RPS20", "CSDE1", "AKAP8L", "UTP18",
"PTBP1", "DCN", "MATR3", "SAMD4A", "AQR", "STRAP", "SEC63", "BCLAF1",
"TFB1M", "GRN", "ZCCHC8", "NSUN2", "SKIV2L2", "STAU2", "CTNNA1",
"YTHDC2", "POLR2B", "TPR", "MAP4", "NOP16", "FAM120A", "R3HDM1",
"PTCD2", "RRP12", "MRTO4", "THRAP3", "NOP58", "USP36", "MLL3",
"PUM2", "MRPL43", "ZFR", "RC3H2", "ZC3H11A", "PARP12", "ALDH18A1",
"CSDA", "CCAR1")), class = "data.frame", row.names = c(NA, -49L
))
structure(list(GNL3L = c("FMR1", "FRAXA", "UBA1", "CSTF2", "MECP2",
"PHF6", "RBM10", "GSPT2", "SLC25A5", "EIF1AX", "NKRF", "RPS4X",
"RBMX2", "HTATSF1", "LAS1L", "MBNL3", "HUWE1", "RPL10", "RPL15",
"RBMX", "NONO", "RPGR", "UPF3B", "RBM3", "HNRNPH2", "UTP14A",
"DKC1", "MEX3C", "DDX3X", "FLNA", "FAM120C")), class = "data.frame", row.names = c(NA,
-31L))
So far, I just come out with this.
sp<-shortest_path[protein1[,1],protein2[,1]]
dput for shortest_path:
structure(c(0, 4, 6, 4, 4, 4, 4, 3, 3, 3, 5, 3, 5, 3, 3, 3, 4,
3, 3, 3, 4, 0, 5, 4, 4, 4, 4, 3, 3, 3, 5, 3, 5, 3, 3, 3, 4, 3,
3, 3, 6, 5, 0, 6, 4, 6, 5, 5, 5, 5, 7, 5, 6, 5, 5, 3, 6, 5, 5,
5, 4, 4, 6, 0, 3, 3, 3, 3, 3, 3, 4, 3, 5, 3, 3, 3, 4, 3, 3, 3,
4, 4, 4, 3, 0, 4, 3, 3, 3, 3, 4, 3, 5, 3, 3, 3, 4, 3, 3, 3, 4,
4, 6, 3, 4, 0, 3, 3, 3, 3, 5, 3, 5, 3, 3, 3, 4, 3, 3, 3, 4, 4,
5, 3, 3, 3, 0, 3, 3, 2, 3, 3, 5, 3, 3, 3, 4, 3, 3, 3, 3, 3, 5,
3, 3, 3, 3, 0, 2, 2, 4, 2, 4, 2, 2, 2, 3, 2, 2, 2, 3, 3, 5, 3,
3, 3, 3, 2, 0, 2, 4, 2, 4, 2, 2, 2, 3, 2, 2, 2, 3, 3, 5, 3, 3,
3, 2, 2, 2, 0, 2, 2, 4, 2, 2, 2, 3, 2, 2, 2, 5, 5, 7, 4, 4, 5,
3, 4, 4, 2, 0, 4, 6, 4, 4, 4, 5, 4, 4, 4, 3, 3, 5, 3, 3, 3, 3,
2, 2, 2, 4, 0, 4, 2, 2, 2, 3, 2, 2, 2, 5, 5, 6, 5, 5, 5, 5, 4,
4, 4, 6, 4, 0, 4, 4, 4, 5, 4, 4, 4, 3, 3, 5, 3, 3, 3, 3, 2, 2,
2, 4, 2, 4, 0, 2, 2, 3, 2, 2, 2, 3, 3, 5, 3, 3, 3, 3, 2, 2, 2,
4, 2, 4, 2, 0, 2, 3, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 4,
2, 4, 2, 2, 0, 3, 2, 2, 2, 4, 4, 6, 4, 4, 4, 4, 3, 3, 3, 5, 3,
5, 3, 3, 3, 0, 3, 3, 3, 3, 3, 5, 3, 3, 3, 3, 2, 2, 2, 4, 2, 4,
2, 2, 2, 3, 0, 1, 2, 3, 3, 5, 3, 3, 3, 3, 2, 2, 2, 4, 2, 4, 2,
2, 2, 3, 1, 0, 2, 3, 3, 5, 3, 3, 3, 3, 2, 2, 2, 4, 2, 4, 2, 2,
2, 3, 2, 2, 0), .Dim = c(20L, 20L), .Dimnames = list(c("1810055G02Rik",
"2810046L04Rik", "4922501C03Rik", "4930572J05Rik", "9830001H06Rik",
"A1CF", "A2M", "AAGAB", "AATF", "ABCA1", "ABCA13", "ABCA2", "ABCA4",
"ABCB1", "ABCB7", "ABCC2", "ABCC8", "ABCD1", "ABCD3", "ABCD4"
), c("1810055G02Rik", "2810046L04Rik", "4922501C03Rik", "4930572J05Rik",
"9830001H06Rik", "A1CF", "A2M", "AAGAB", "AATF", "ABCA1", "ABCA13",
"ABCA2", "ABCA4", "ABCB1", "ABCB7", "ABCC2", "ABCC8", "ABCD1",
"ABCD3", "ABCD4")))
Thanks in advance!
Maybe you can try the code below
outer(protein1$LAS1L, protein2$GNL3L, FUN = function(x, y) shortest_path[x, y])

Plots are not stored in list during loop

I have a problem similar to what is found here. I have a loop which runs through some modelling for different pairs of variables. Probably should not have used loops to go through them, but right now that is too late. Then I want to create a plot for each run. At first nothing showed before looking at that post. Looking at the post and implementing the best answer i could at least print the plots, but they still were not stored. The idea is to generate the plots, and then use grid.arrange to plot them together. Could someone show how to fix it? Here is some random data and the loop from example:
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3)
data2 <- data.frame(col1,col2,col3)
data2[,1:3] <- lapply(data2[,1:3], as.factor)
colnames(data2)<- c("A","B","C")
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
print(p1)
})
}
I tried to change print to return, but to no avail. I get the plots printed in the View window in Rstudio, but the plots are not stored at all.
You can use the following code -
library(ggplot2)
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
myplots[[i]] <- ggplot(data2, aes(x = .data[[colnames(data2)[i]]])) +
geom_histogram(fill = "lightgreen")
}
However, using lapply would be easier.
myplots <- lapply(names(data2), function(x)
ggplot(data2, aes(x = .data[[x]])) + geom_histogram(fill = "lightgreen"))
Plot the list of plots with grid.arrange.
gridExtra::grid.arrange(grobs = myplots)
data
A <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3)
B <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3)
C <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3)
data2 <- data.frame(A,B,C)
Does this work for you?, With patchwork and purrr::reduce we can club these graphs to stack(horizontal or vertical) with each other. You can also use slashes(/) instead of plus(+) in reduce to make it appended vertically instead of horizontally. If you want to plot histogram you should have continuous data , In case you do want to plot counts for discrete data you should try geom_bar. If you do want to check for geom_bar then you need to convert the columns into factors. I am not so sure what plot you want to carry out, I am assuming that you have continuous data and you want to carry out histogram here. Please let me know if it doesn't work in your scenario.
library(tidyverse)
library(patchwork)
data2 <- data.frame(col1, col2, col3) ## No conversion of factors
nm <- names(data2)
g1 <- reduce(map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count')), `+`)
print(g1)
Or with slashes:
g2 <- reduce(map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count')), `/`)
print(g2)
Or if you want to have for loops then probably you can do this as well, you already have intialised myplots so not adding it here:
for (i in seq_along(data2)) {
myplots[[i]] <-
ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(colnames(data2)[i])
}
Explanation:
Now you can use reduce with your myplots to arrange them, Note here myplots should be containing your 3 plots :
reduce(myplots, `+`)
for arranging it.
The map2 and reduce is similar solution, with map2 you are getting 3 plots saved into a list, so 3 objects are returned from below code:
plots <- map2(data2,nm, ~ggplot(data2,aes(x =.x )) + geom_histogram(fill = "yellow4") + labs(x=.y, y = 'count'))
To add them (arrange) them all you have to do is to use patchwork like below:
plots[[1]] + plots[[2]] + plots[[3]], but then its quite cumbersome, so we use reduce to make it happen like below:
reduce(plots, `+`)
Also like I mentioned earlier you can use slash instead of plus to make the arrangement vertical than horizontal. with plot_layout option in patchwork, you can create more flexible plots. You can check here .
with gridExtra : gridExtra::grid.arrange(grobs = (myplots)), again instead of myplots, it can be any list that contain ggplot objects.

Math Symbols within for loop of GGplots in R

I'm currently trying to develop a similar result as this link. I have a significant number of columns and several different labels for the x-axis.
col1 <- c(2, 4, 1, 2, 5, 1, 2, 0, 1, 4, 4, 3, 5, 2, 4, 3, 3, 6, 5, 3, 6, 4, 3, 4, 4, 3, 4,
2, 4, 3, 3, 5, 3, 5, 5, 0, 0, 3, 3, 6, 5, 4, 4, 1, 3, 3, 2, 0, 5, 3, 6, 6, 2, 3,
3, 1, 5, 3, 4, 6)
col2 <- c(2, 4, 4, 0, 4, 4, 4, 4, 1, 4, 4, 3, 5, 0, 4, 5, 3, 6, 5, 3, 6, 4, 4, 2, 4, 4, 4,
1, 1, 2, 2, 3, 3, 5, 0, 3, 4, 2, 4, 5, 5, 4, 4, 2, 3, 5, 2, 6, 5, 2, 4, 6, 3, 3,
3, 1, 4, 3, 5, 4)
col3 <- c(2, 5, 4, 1, 4, 2, 3, 0, 1, 3, 4, 2, 5, 1, 4, 3, 4, 6, 3, 4, 6, 4, 1, 3, 5, 4, 3,
2, 1, 3, 2, 2, 2, 4, 0, 1, 4, 4, 3, 5, 3, 2, 5, 2, 3, 3, 4, 2, 4, 2, 4, 5, 1, 3,
3, 3, 4, 3, 5, 4)
col4 <- c(2, 5, 2, 1, 4, 1, 3, 4, 1, 3, 5, 2, 4, 3, 5, 3, 4, 6, 3, 4, 6, 4, 3, 2, 5, 5, 4,
2, 3, 2, 2, 3, 3, 4, 0, 1, 4, 3, 3, 5, 4, 4, 4, 3, 3, 5, 4, 3, 5, 3, 6, 6, 4, 2,
3, 3, 4, 4, 4, 6)
data2 <- data.frame(col1,col2,col3,col4)
data2[,1:4] <- lapply(data2[,1:4], as.factor)
colnames(data2)<- c("A","B","C", "D")
> x.axis.list
[[1]]
expression(beta[paste(1, ",", 1L)])
[[2]]
expression(beta[paste(1, ",", 2L)])
[[3]]
expression(beta[paste(1, ",", 3L)])
[[4]]
expression(beta[paste(1, ",", 4L)])
myplots <- vector('list', ncol(data2))
for (i in seq_along(data2)) {
message(i)
myplots[[i]] <- local({
i <- i
p1 <- ggplot(data2, aes(x = data2[[i]])) +
geom_histogram(fill = "lightgreen") +
xlab(x.axis.list[[i]])
print(p1)
})
}
In the past, I've been able to do something similar to this where I can just put x.axis.list[[i]] in my loop and change the symbols. However, I continue to get the term expression on the axis. So the symbol for Beta is correct as well as the subscript but the word "expression" remains. I'm not sure exactly what I'm doing wrong, for a moment, I was able to produce a plot without "expression" but it has since stayed in the ggplot.
I want to be able to produce this plot, or one with the title on the y-axis without the word "expression".
My image currently looks . I'm not worried about this example data and the result of the plot, I'm wondering how to get rid of "expression" so only the math symbol shows.
Thanks in advance.
You can do:
for (i in seq_along(data2)) {
df <- data2[i]
names(df)[1] <- "x"
myplots[[i]] <- local({
p1 <- ggplot(df, aes(x = x)) +
geom_bar(fill = "lightgreen", stat = "count") +
xlab(x.axis.list[[i]])
})
}
And we can show all the plots together:
library(patchwork)
(myplots[[1]] + myplots[[2]]) / (myplots[[3]] + myplots[[4]])
Note I created the expression list like this:
x.axis.list <- lapply(1:4, function(i){
parse(text = paste0("beta[paste(1, \",\", ", i, ")]"))
})

Resources