multiple columns plot with correlation value in ggplot2 - r

Hi, I have a data frame df, shown below.
I would like to make a facet plot that shows the relationship between columns A & B, A & C, A & D, B & C, and C & D, overlaying a regression line and Pearson's correlation coefficient on each panel.
I have tried to build the facets but could not figure out exactly how.
Any help would be appreciated. I could not find an existing answer on SO for plotting pairs of columns against each other.
df<- read.table(text =c("A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
"), header = T)
df
pl<-ggplot(data=df) + geom_point(aes(x=A,y=B,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=C,size=10)) +
geom_point(aes(x=A,y=D,size=10)) +
geom_smooth(method = "lm", se=FALSE, color="black")
pl
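One possible approach, offered as a sketch rather than an accepted answer: stack the requested column pairs into long format, compute Pearson's r per pair, and facet by pair. The names pairs, long, and cors are illustrative and not from the question; formula = y ~ x only makes the geom_smooth default explicit.

```r
library(ggplot2)

df <- read.table(text = "A B C D
0.451 0.333 0.034 0.173
0.491 0.27 0.033 0.207
0.389 0.249 0.084 0.271
0.425 0.819 0.077 0.281
0.457 0.429 0.053 0.386
0.436 0.524 0.049 0.249
0.423 0.27 0.093 0.279
0.463 0.315 0.019 0.204
", header = TRUE)

# Column pairs named in the question
pairs <- list(c("A", "B"), c("A", "C"), c("A", "D"), c("B", "C"), c("C", "D"))

# Long format: one row per observation per pair
long <- do.call(rbind, lapply(pairs, function(p) {
  data.frame(pair = paste(p[1], "vs", p[2]), x = df[[p[1]]], y = df[[p[2]]])
}))

# Pearson correlation per pair, formatted as a panel label
cors <- do.call(rbind, lapply(split(long, long$pair), function(d) {
  data.frame(pair = d$pair[1], label = sprintf("r = %.2f", cor(d$x, d$y)))
}))

pl <- ggplot(long, aes(x, y)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  # -Inf / Inf pin the label to the top-left corner of each panel
  geom_text(data = cors, aes(x = -Inf, y = Inf, label = label),
            hjust = -0.1, vjust = 1.5) +
  facet_wrap(~ pair, scales = "free")
pl
```

Free scales are used because the columns have quite different ranges; drop scales = "free" if a common axis is preferred.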

Related

Non linear regression for exponential decay model in R

I have the following problem:
I asked 5 people (i=1, ..., 5) to forecast next period's return of 3 different stocks. This gives me the following data:
S_11_i_c <-read.table(text = "
i c_1 c_2 c_3
1 0.150 0.70 0.190
2 0.155 0.70 0.200
3 0.150 0.75 0.195
4 0.160 0.80 0.190
5 0.150 0.75 0.180
",header = T)
In words, in period t=10 participant i=1 expects the return of stock c_1 to be 0.15 in period t=11.
The forecasts are based on past returns of the stocks. These are the following:
S_t_c <-read.table(text = "
time S_c_1 S_c_2 S_c_3
1 0.020 0.015 0.040
2 0.045 0.030 0.050
3 0.060 0.045 0.060
4 0.075 0.060 0.060
5 0.090 0.070 0.060
6 0.105 0.070 0.090
7 0.120 0.070 0.120
8 0.125 0.070 0.140
9 0.130 0.070 0.160
10 0.145 0.070 0.180
",header = T)
In words, stock c=1 had a return of 0.145 in period 10.
So, the variables in table S_11_i_c are the dependent variables.
The variables in table S_t_c are the independent variables.
The model I want to estimate is the following:
My problem with coding this is as follows:
I only know how to express the sum in the model with the help of a loop, as in:
Sum_S_t_c <- data.frame(
s = 1:9,
c_1 = rnorm(9),
c_2 = rnorm(9),
c_3 = rnorm(9)
)
Sum_S_t_c = 0
for (c in 2:4) {
for (s in 0:9) {
Sum_S_t_c[s,c] <- Sum_S_t_c + S_t_c[10-s, c]
Sum_S_t_c = Sum_S_t_c[s,c]
}
}
However, loops within a regression are not possible. So, my other solution would be to rewrite the sum to
However, as my actual problem has a much larger n, this isn't really working for me.
Any ideas?
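As a partial sketch only: the model equation is not shown here, so this does not fit the regression. It assumes the loop above is meant to accumulate past returns backward from period 10, and shows that such running sums can be built without a loop using cumsum() and rev(). The name back_sums is illustrative.

```r
S_t_c <- read.table(text = "
time S_c_1 S_c_2 S_c_3
1 0.020 0.015 0.040
2 0.045 0.030 0.050
3 0.060 0.045 0.060
4 0.075 0.060 0.060
5 0.090 0.070 0.060
6 0.105 0.070 0.090
7 0.120 0.070 0.120
8 0.125 0.070 0.140
9 0.130 0.070 0.160
10 0.145 0.070 0.180
", header = TRUE)

# For each stock column, row k holds S[10] + S[9] + ... + S[10 - (k - 1)],
# i.e. the running sum counted backward from the latest period.
back_sums <- as.data.frame(lapply(S_t_c[-1], function(x) cumsum(rev(x))))
back_sums$s <- 0:9  # number of periods looked back beyond period 10
back_sums
```

Because the result is an ordinary data frame computed up front, its columns can be passed to a model-fitting function as regressors, whatever the final model turns out to be.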

Two types of variables in a single Heatmap (using R)

I have to plot data from immunized animals in a way that visualizes possible correlations with protection. As background, when we vaccinate an animal it produces antibodies, which may or may not be linked to protection. We immunized bovines with 9 different proteins and measured antibody titers, which go up to 1.5 (optical density, O.D.). We also measured tick load, which goes up to 5000. Each animal has different titers for each protein and a different tick load. Some proteins may be more important for protection than others, and we think that a heatmap could illustrate this.
TL;DR: Plot a heatmap with one variable (Ticks) that goes from 6 up to 5000, and another variable (Prot1 to Prot9) that goes up to 1.5.
A sample of my data:
Animal Group Ticks Prot1 Prot2 Prot3 Prot4 Prot5 Prot6 Prot7 Prot8 Prot9
G1-54-102 control 3030 0.734 0.402 0.620 0.455 0.674 0.550 0.654 0.508 0.618
G1-130-102 control 5469 0.765 0.440 0.647 0.354 0.528 0.525 0.542 0.481 0.658
G1-133-102 control 2070 0.367 0.326 0.386 0.219 0.301 0.231 0.339 0.247 0.291
G3-153-102 vaccinated 150 0.890 0.524 0.928 0.403 0.919 0.593 0.901 0.379 0.647
G3-200-102 vaccinated 97 1.370 0.957 1.183 0.658 1.103 0.981 1.051 0.534 1.144
G3-807-102 vaccinated 606 0.975 0.706 1.058 0.626 1.135 0.967 0.938 0.428 1.035
I have little knowledge of R, but I'm really excited to learn more about it. So feel free to post whatever code you want and I will try my best to understand it.
Thank you in advance.
Luiz
Here is an option to use the ggplot2 package to create a heatmap. You will need to convert your data frame from wide format to long format. It is also important to convert the Ticks column from numeric to factor if the numbers are discrete.
library(tidyverse)
library(viridis)
dat2 <- dat %>%
gather(Prot, Value, starts_with("Prot"))
ggplot(dat2, aes(x = factor(Ticks), y = Prot, fill = Value)) +
geom_tile() +
scale_fill_viridis()
DATA
dat <- read.table(text = "Animal Group Ticks Prot1 Prot2 Prot3 Prot4 Prot5 Prot6 Prot7 Prot8 Prot9
'G1-54-102' control 3030 0.734 0.402 0.620 0.455 0.674 0.550 0.654 0.508 0.618
'G1-130-102' control 5469 0.765 0.440 0.647 0.354 0.528 0.525 0.542 0.481 0.658
'G1-133-102' control 2070 0.367 0.326 0.386 0.219 0.301 0.231 0.339 0.247 0.291
'G3-153-102' vaccinated 150 0.890 0.524 0.928 0.403 0.919 0.593 0.901 0.379 0.647
'G3-200-102' vaccinated 97 1.370 0.957 1.183 0.658 1.103 0.981 1.051 0.534 1.144
'G3-807-102' vaccinated 606 0.975 0.706 1.058 0.626 1.135 0.967 0.938 0.428 1.035",
header = TRUE, stringsAsFactors = FALSE)
In the newest version of ggplot2 / the tidyverse, you don't even need to load the viridis package explicitly. The scale is included via scale_fill_viridis_c(). Exciting times!
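A variation that puts Ticks and the protein titers on one fill scale, sketched under the assumption that min-max rescaling each column to [0, 1] is acceptable for this comparison. That choice is not from the answer above, and rescale01, measures, scaled, and long are illustrative names.

```r
library(ggplot2)

dat <- read.table(text = "Animal Group Ticks Prot1 Prot2 Prot3 Prot4 Prot5 Prot6 Prot7 Prot8 Prot9
'G1-54-102' control 3030 0.734 0.402 0.620 0.455 0.674 0.550 0.654 0.508 0.618
'G1-130-102' control 5469 0.765 0.440 0.647 0.354 0.528 0.525 0.542 0.481 0.658
'G1-133-102' control 2070 0.367 0.326 0.386 0.219 0.301 0.231 0.339 0.247 0.291
'G3-153-102' vaccinated 150 0.890 0.524 0.928 0.403 0.919 0.593 0.901 0.379 0.647
'G3-200-102' vaccinated 97 1.370 0.957 1.183 0.658 1.103 0.981 1.051 0.534 1.144
'G3-807-102' vaccinated 606 0.975 0.706 1.058 0.626 1.135 0.967 0.938 0.428 1.035",
header = TRUE, stringsAsFactors = FALSE)

# Every numeric column: Ticks plus Prot1..Prot9
measures <- setdiff(names(dat), c("Animal", "Group"))

# Min-max rescale each column separately so all share the range [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- as.data.frame(lapply(dat[measures], rescale01))

# Long format: one row per animal per variable
long <- data.frame(
  Animal   = rep(dat$Animal, times = length(measures)),
  Variable = factor(rep(measures, each = nrow(dat)), levels = measures),
  Scaled   = unlist(scaled, use.names = FALSE)
)

hm <- ggplot(long, aes(x = Variable, y = Animal, fill = Scaled)) +
  geom_tile() +
  scale_fill_viridis_c()
hm
```

With this layout each row is an animal, so a low tick load next to high titers for a given protein shows up as contrasting colors in the same row. The rescaling discards absolute units, so the legend reads as relative rank within each column rather than O.D. or tick counts.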

Marginal densities (or bar plots) on facets in ggplot2

My problem is the following. I have the table below
0 1-5 6-10 11-15 16-20 21-26 27-29
a 0.019 0.300 0.296 0.211 0.117 0.042 0.014
b 0.058 0.448 0.308 0.120 0.042 0.019 0.005
c 0.026 0.277 0.316 0.187 0.105 0.068 0.020
d 0.054 0.297 0.378 0.108 0.108 0.041 0.014
e 0.004 0.252 0.358 0.216 0.102 0.053 0.015
f 0.032 0.097 0.312 0.280 0.161 0.065 0.054
g 0.113 0.500 0.233 0.094 0.043 0.014 0.003
h 0.328 0.460 0.129 0.050 0.020 0.010 0.003
representing some marginal frequencies (by row) for each subgroup of my data (a to h).
My dataset is actually in the long format (very long, counting more than 100 thousand entries), with the first 6 rows as you see below:
RX_SUMM_SURG_PRIM_SITE Nodes.Examined.Class
1 Wedge Resection 1-5
2 Segmental Resection 1-5
3 Lobectomy w/mediastinal LNdissection 6-10
4 Lobectomy w/mediastinal LNdissection 6-10
5 Lobectomy w/mediastinal LNdissection 1-5
6 Lobectomy w/mediastinal LNdissection 11-15
When I plot a barplot by group (the table above is simply the cross tabulation of these two covariates with the row marginal probabilities taken) here's what happens:
The code I have for this is
ggplot(data.ln.red, aes(x=Nodes.Examined.Class))+geom_bar(aes(x=Nodes.Examined.Class, group=RX_SUMM_SURG_PRIM_SITE))+
facet_grid(RX_SUMM_SURG_PRIM_SITE~.)
Actually I would be very happy only with the marginal frequencies (i.e. the ones in the table) on each y-axis of the facets of the plot (instead of the counts).
Can anybody help me with this?
Thanks for all your help!
EM
geom_bar calculates both counts and proportions of observations. You can access the calculated proportions with either ..prop.. (the old way) or after_stat(prop) (ggplot2 3.3.0 and later; ggplot2 3.0.0 used stat(prop)). Use this as your y aesthetic.
You can also drop the aes you have in geom_bar, as it just repeats what ggplot and facet_grid already cover.
It looks like your counts/proportions are going to vary widely between groups, so I'm adding free y-scaling to the faceting.
Here's an example of a similar plot with the iris data, which you can use as a model for your own code:
library(tidyverse)
ggplot(iris, aes(x = Sepal.Length, y = after_stat(prop))) +
geom_bar() +
facet_grid(Species ~ ., scales = "free_y")
Created on 2018-04-06 by the reprex package (v0.2.0).
Edit: the calculated prop variable is proportions within each group, not proportions across all groups, so it works differently when x is a factor. For categorical x, prop treats x as the group; to override this, include group = 0 or some other dummy value in your aes. Sorry I missed that the first time!
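To make the note about categorical x concrete, here is a small sketch using the question's column names on made-up rows (the ten observations below are hypothetical, not the asker's data). It assumes ggplot2 3.3.0 or later for after_stat(). The base R prop.table() call computes the same row-wise marginal frequencies for comparison.

```r
library(ggplot2)

# Hypothetical miniature of the long-format data described in the question
data.ln.red <- data.frame(
  RX_SUMM_SURG_PRIM_SITE = rep(c("Wedge Resection", "Lobectomy"), times = c(4, 6)),
  Nodes.Examined.Class = factor(
    c("1-5", "1-5", "1-5", "6-10",
      "1-5", "6-10", "6-10", "6-10", "11-15", "11-15"),
    levels = c("1-5", "6-10", "11-15")
  )
)

# Row-wise marginal frequencies, as in the table from the question
marg <- prop.table(
  table(data.ln.red$RX_SUMM_SURG_PRIM_SITE, data.ln.red$Nodes.Examined.Class),
  margin = 1
)
marg

# group = 1 makes prop the share of each class within a facet panel,
# instead of 1 for every bar
p <- ggplot(data.ln.red,
            aes(x = Nodes.Examined.Class, y = after_stat(prop), group = 1)) +
  geom_bar() +
  facet_grid(RX_SUMM_SURG_PRIM_SITE ~ .)
p
```

Since the statistic is computed separately for each panel, the bar heights in each facet should match the corresponding row of marg.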

Removing outliers from facet_wrap boxplots in ggplot

How can I change the y axis to exclude outliers (not just hide them but scale the y axis so as not to include them) for geom_boxplot with multiple individual boxplots using facet_wrap? An example of my dataset is:
Pop. grp1 grp2 grp3 grp4 grp5 grp6 grp7 grp8
a 0.00652 1.27 0.169 0.859 0.388 0.521 3.58 0.0912
a 0.0133 0.136 0.154 0.167 0.845 0.159 0.561 0.108
a 0.0270 1.60 0.119 0.515 0.0386 0.0145 0.884 0.0155
b 0.00846 0.331 0.100 0.897 0.330 2.52 0.663 0.0338
b 0.0154 0.0997 0.122 0.0873 0.905 0.136 0.413 0.139
b 0.0353 0.536 0.171 0.471 0.0280 0.00608 0.414 0.00973
where I'd like to make a boxplot for each column showing populations a and b.
I've melted the data by population and then used geom_boxplot + facet_wrap but some outliers are so far above the whiskers that the boxes themselves barely show. The code I've used is:
library(reshape2)
library(ggplot2)
wc.m <- melt(w_c_diff_ab, id.vars="Pop.")
p.wc <- ggplot(data = wc.m, aes(x=variable, y=value)) + geom_boxplot(aes(fill=Pop.))
p.wc + facet_wrap( ~ variable, scales="free") + scale_fill_manual(values=c("skyblue", "violetred1"))
but I'm struggling to remove outliers as I'm not sure how to calculate limits for the y axes on a per-boxplot basis.
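One way to get per-panel limits without computing them by hand, sketched on simulated data (the wc.m below is a hypothetical stand-in for the melted data): precompute the whisker and box positions with boxplot.stats() and draw them with stat = "identity". Because the outlier points are never passed to ggplot, the free y scales only span the whiskers. Note that boxplot.stats() uses Tukey hinges, which can differ slightly from the quartiles geom_boxplot computes itself.

```r
library(ggplot2)

# Hypothetical long-format data standing in for the melted wc.m
set.seed(1)
wc.m <- data.frame(
  Pop. = rep(c("a", "b"), each = 20, times = 2),
  variable = rep(c("grp1", "grp2"), each = 40),
  value = rexp(80)
)

# Box statistics per population and variable
box_stats <- do.call(rbind, lapply(
  split(wc.m, list(wc.m$Pop., wc.m$variable), drop = TRUE),
  function(d) {
    # stats: lower whisker, lower hinge, median, upper hinge, upper whisker
    s <- boxplot.stats(d$value)$stats
    data.frame(Pop. = d$Pop.[1], variable = d$variable[1],
               ymin = s[1], lower = s[2], middle = s[3],
               upper = s[4], ymax = s[5])
  }
))

p <- ggplot(box_stats, aes(x = variable, fill = Pop.)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity") +
  facet_wrap(~ variable, scales = "free") +
  scale_fill_manual(values = c("skyblue", "violetred1"))
p
```

If the outliers only need to be hidden rather than excluded from the axis range, geom_boxplot(outlier.shape = NA) is simpler, but the axis limits then still include them.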

How to set the level above which to display factor loadings from factanal() in R?

I was performing factor analysis with data state.x77, which is in R by default. After running the analysis, I inspected the factor loadings.
> output = factanal(state.x77, factors=3, rotation="promax")
> ld = output$loadings
> ld
Loadings:
Factor1 Factor2 Factor3
Population 0.161 0.239 -0.316
Income -0.149 0.681
Illiteracy 0.446 -0.284 -0.393
Life Exp -0.924 0.172 -0.221
Murder 0.917 0.103 -0.129
HS Grad -0.414 0.731
Frost 0.107 1.046
Area 0.387 0.585 0.101
Factor1 Factor2 Factor3
SS loadings 2.274 1.519 1.424
Proportion Var 0.284 0.190 0.178
Cumulative Var 0.284 0.474 0.652
It looks like R hides all loadings with an absolute value below 0.1 by default. Is there a way to set this cutoff by hand, say to 0.3 instead of 0.1?
Try this:
print(output$loadings, cutoff = 0.3)
See ?print.loadings for the details.
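For completeness, a self-contained version of the call above, plus an alternative for when the thresholded values are needed as data rather than printed output. The matrix copy m is an illustrative addition, not part of the answer.

```r
# Same model as in the question; state.x77 ships with base R
output <- factanal(state.x77, factors = 3, rotation = "promax")
ld <- output$loadings

# Printing only: loadings with absolute value below 0.3 are left blank
print(ld, cutoff = 0.3)

# As data: drop the "loadings" class and set small entries to NA
m <- unclass(ld)
m[abs(m) < 0.3] <- NA
round(m, 3)
```

The cutoff argument affects display only; the loadings object itself is unchanged.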
