Show different data in the top and bottom halves of a circlize plot - r

I have two data frames with different numbers of rows and columns, and I'd like to show both of them in a single circos plot with circlize.
My data looks like this:
df1=data.frame(replicate(7,sample(-200:200,200,rep=TRUE))/100)
df2=data.frame(replicate(2,sample(-200:200,200,rep=TRUE))/100)
#head(df1)
X1 X2 X3 X4 X5 X6 X7
1 -0.03 0.63 -0.33 0.73 -1.37 -1.39 1.96
2 -1.81 -1.24 -1.63 1.58 0.13 1.39 -0.76
3 0.02 -2.00 -1.93 -1.35 1.06 -0.58 -0.77
4 -1.11 -1.38 -0.66 -0.40 1.69 -0.47 -1.55
5 0.98 0.06 0.00 -0.35 1.97 1.74 0.72
6 1.51 -1.68 -0.44 -1.74 0.15 0.26 0.36
#head(df2)
X1 X2
1 0.16 -0.81
2 -1.38 -0.16
3 -0.22 -0.74
4 0.73 -0.82
5 0.58 -1.87
6 -0.63 1.50
I want to build a single circos plot where the top is showing df1 and bottom is showing df2, but I can only show individual dfs. For instance, this is how I show df1:
library(circlize)
col_fun1 = colorRamp2(c(min(df1), 0, max(df1)), c("blue", "white", "red"))
circos.heatmap(df1, col = col_fun1, cluster = TRUE, track.height = 0.2, rownames.side = "outside", rownames.cex = 0.6)
circos.clear()
How can I show df1 only in the top half and df2 only in the bottom half?

Related

loop a function in r and output the result to .csv file

Subject var1 var2 var3 var4 var5
1 0.2 0.78 7.21 0.5 0.47
1 0.52 1.8 11.77 -0.27 -0.22
1 0.22 0.84 7.32 0.35 0.36
2 0.38 1.38 10.05 -0.25 -0.2
2 0.56 1.99 13.76 -0.44 -0.38
3 0.35 1.19 7.23 -0.16 -0.06
4 0.09 0.36 4.01 0.55 0.51
4 0.29 1.08 9.48 -0.57 -0.54
4 0.27 1.03 9.42 -0.19 -0.21
4 0.25 0.9 7.06 0.12 0.12
5 0.18 0.65 5.22 0.41 0.42
5 0.15 0.57 5.72 0.01 0.01
6 0.26 0.94 7.38 -0.17 -0.13
6 0.14 0.54 5.13 0.16 0.17
6 0.22 0.84 6.97 -0.66 -0.58
6 0.18 0.66 5.79 0.23 0.25
# The above is the sample data (dat11).
# The following code calculates the p-value (p.z) for the
# variable pair var2 and var3 using lmer().
library(lme4)
fit1 <- lmer(var2 ~ var3 + (1|Subject), data = dat11)
summary(fit1)
coefs <- data.frame(coef(summary(fit1)))
# use normal distribution to approximate p-value
coefs$p.z <- 2 * (1 - pnorm(abs(coefs$t.value)))
round(coefs,6)
# the following is the result
Estimate Std.Error t.value p.z
(Intercept) -0.280424 0.110277 -2.542913 0.010993
var3 0.163764 0.013189 12.417034 0.000000
The real data contains 65 variables (var1, var2, ..., var65). I would like to use the code above to get this result for all possible pairs of the 65 variables, e.g. var1 ~ var2, var1 ~ var3, ..., var1 ~ var65; var2 ~ var3, var2 ~ var4, ..., var2 ~ var65; var3 ~ var4, and so on. There will be about 2000 pairs (choose(65, 2) = 2080). Can somebody help me with the loop code and with writing the results to a .csv file? Thank you.
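One possible way to loop over the pairs is sketched below. It is not from the original post: the simulated dat11, the helper name fit_pair, the output column names, and the file name lmer_pairs.csv are all assumptions made for illustration. It builds every unordered pair with combn(), fits the same lmer() model as in the question for each pair, keeps only the predictor's row of the coefficient table, and writes everything to one CSV file.

```r
library(lme4)

# Stand-in data with the same layout as dat11 in the question (values are simulated)
set.seed(1)
dat11 <- data.frame(Subject = rep(1:6, each = 4),
                    var1 = rnorm(24), var2 = rnorm(24),
                    var3 = rnorm(24), var4 = rnorm(24))

vars  <- setdiff(names(dat11), "Subject")
pairs <- combn(vars, 2, simplify = FALSE)  # var1~var2, var1~var3, ..., each pair once

# Hypothetical helper: fit response ~ predictor + (1 | Subject)
# and return the predictor's row of the coefficient table
fit_pair <- function(p, data) {
  f   <- reformulate(c(p[2], "(1 | Subject)"), response = p[1])
  fit <- lmer(f, data = data)
  co  <- coef(summary(fit))[p[2], ]
  data.frame(response  = p[1],
             predictor = p[2],
             estimate  = co[["Estimate"]],
             std_error = co[["Std. Error"]],
             t_value   = co[["t value"]],
             # normal approximation to the p-value, as in the question
             p_z       = 2 * (1 - pnorm(abs(co[["t value"]]))))
}

results <- do.call(rbind, lapply(pairs, fit_pair, data = dat11))
write.csv(results, "lmer_pairs.csv", row.names = FALSE)
```

With the real data, dat11 would be the full data frame and the simulated block would be dropped. Purely random data like this stand-in may trigger singular-fit messages from lmer(), which do not stop the loop.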

putting multiple plots in one panel

I am trying to plot a scatter plot in R using ggscatter function from ggpubr package. I am showing you a subset of my data.frame
tracking_id gene_short_name B1 B2 C1 C2
ENSG00000000003.14 TSPAN6 1.2 1.16 1.22 1.26
ENSG00000000419.12 DPM1 1.87 1.87 1.68 1.83
ENSG00000000457.13 SCYL3 0.59 0.63 0.82 0.69
ENSG00000000460.16 C1orf112 0.87 0.99 0.97 0.83
ENSG00000001036.13 FUCA2 1.59 1.59 1.4 1.39
ENSG00000001084.10 GCLC 1.43 1.55 1.46 1.32
ENSG00000001167.14 NFYA 1.2 1.3 1.39 1.21
ENSG00000001460.17 STPG1 0.43 0.46 0.34 0.76
ENSG00000001461.16 NIPAL3 0.72 0.84 0.78 0.74
I want to make scatter plots of B1 vs B1, B1 vs B2, B1 vs C1 and B1 vs C2.
I used the following command
library(ggpubr)
df <- read.table(file = "transformation.txt", header = TRUE, sep = "\t")
lapply(3:6, function(X) ggscatter(df, x = "B1", y = colnames(df)[X], add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson", add.params = list(color = "blue")))
I get 4 individual plots. I want to have all 4 plots in 1 plot. How can I do this?
Thanks
Do you perhaps mean something like this?
library(GGally)
ggpairs(df[, -(1:2)])
GGally is a very nice R package offering a lot of customisation options for its plotting routines.
Sample data
df <- read.table(text =
"tracking_id gene_short_name B1 B2 C1 C2
ENSG00000000003.14 TSPAN6 1.2 1.16 1.22 1.26
ENSG00000000419.12 DPM1 1.87 1.87 1.68 1.83
ENSG00000000457.13 SCYL3 0.59 0.63 0.82 0.69
ENSG00000000460.16 C1orf112 0.87 0.99 0.97 0.83
ENSG00000001036.13 FUCA2 1.59 1.59 1.4 1.39
ENSG00000001084.10 GCLC 1.43 1.55 1.46 1.32
ENSG00000001167.14 NFYA 1.2 1.3 1.39 1.21
ENSG00000001460.17 STPG1 0.43 0.46 0.34 0.76
ENSG00000001461.16 NIPAL3 0.72 0.84 0.78 0.74", header = T)
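If the aim is to keep the four ggscatter() plots from the question rather than switch to a pairs plot, one option is ggarrange() from the same ggpubr package. The following is only a sketch under that assumption: the variable names plots and combined are made up here, and only a few rows of the sample data are used.

```r
library(ggpubr)

# A few rows of the sample data from the question
df <- read.table(text =
"tracking_id gene_short_name B1 B2 C1 C2
ENSG00000000003.14 TSPAN6 1.2 1.16 1.22 1.26
ENSG00000000419.12 DPM1 1.87 1.87 1.68 1.83
ENSG00000000457.13 SCYL3 0.59 0.63 0.82 0.69
ENSG00000000460.16 C1orf112 0.87 0.99 0.97 0.83
ENSG00000001036.13 FUCA2 1.59 1.59 1.4 1.39", header = TRUE)

# One scatter plot per y column, all against B1 (same call as in the question)
plots <- lapply(c("B1", "B2", "C1", "C2"), function(y)
  ggscatter(df, x = "B1", y = y, add = "reg.line", conf.int = TRUE,
            cor.coef = TRUE, cor.method = "pearson",
            add.params = list(color = "blue")))

# Arrange the list of plots on a 2 x 2 grid
combined <- ggarrange(plotlist = plots, ncol = 2, nrow = 2)
# print(combined) draws the grid on the active graphics device
```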

Combining multiple functions into one plot (ggplot)

I have a (25x6) matrix containing the following observations (class: dataframe):
Mkt.RF SMB HML RMW CMA WML
-3.86 1.37 1.14 1.47 -2.35 0.05
1.10 -0.95 -1.60 1.17 -0.33 -2.96
2.44 -1.79 0.39 1.14 -2.31 -1.55
9.10 2.48 0.01 -1.43 -0.12 -7.61
-2.37 2.90 -0.84 0.84 -1.22 1.81
0.54 0.09 0.48 0.30 0.32 0.03
0.72 -0.48 0.40 0.20 -0.12 0.87
-6.09 1.57 1.04 1.05 0.43 1.13
3.43 -1.63 -0.55 1.45 -0.63 3.35
-1.35 0.32 -0.59 1.57 -0.80 3.43
2.90 0.52 0.00 -0.26 0.39 1.56
1.35 -0.22 -1.42 -1.58 0.19 2.25
-5.10 0.77 -1.34 1.21 -0.35 1.06
6.26 -1.91 -2.70 1.89 -1.94 3.01
-2.21 4.04 3.00 -0.07 1.09 0.38
-1.93 2.50 1.88 0.53 1.13 1.26
-5.48 1.04 2.45 0.79 0.61 0.90
-0.11 -1.34 2.59 3.32 2.21 0.10
4.13 0.15 0.66 -1.51 1.13 -0.18
-3.72 0.76 0.92 0.87 0.42 2.96
-0.64 -2.35 -1.31 0.27 0.55 0.94
2.52 -2.70 -1.71 -0.16 0.86 -3.55
-1.41 -0.20 -0.96 0.47 -0.25 2.56
-3.08 -0.45 -0.35 0.23 -2.21 1.55
1.78 -0.19 -1.64 -0.10 -1.17 0.69
I wish to produce two plots: (1) a probability density function, and (2) a cumulative distribution function in ggplot. I would like to have a function for each column, hence there should be 6 pdfs and 6 cdfs. I have produced the following:
setwd("~/Desktop")
library(ggplot2)
library(plyr)
library(reshape2)
D <- read.table(file = "MyData.csv", header = TRUE, sep = ";", dec = ",")
factors <- D[, 2:7]
ggplot(factors, aes(Mkt.RF)) + geom_density() + labs(x = "Return", y = "Distribution", title = "PDF") +
xlim(-20, 20) + theme(plot.title = element_text(hjust = 0.5))
With this I can produce a plot with a single function (one column of data), but I am having trouble combining all six functions into one plot, so that I can replicate something similar to this:
[Image: PDF functions example]
Thank you in advance!
You can try
library(tidyverse)
df %>%
bind_rows(df, .id="gr") %>%
gather(key, value, -gr) %>%
ggplot() +
geom_density(data = . %>% filter(gr == 1), aes(value, color = key), size=1.1) +
stat_ecdf(data = . %>% filter(gr == 2), aes(value, color = key), size=1.1) +
facet_wrap(~gr, labeller = labeller(gr=c("1" = "PD", "2" = "CD")))
The single plots can be created using
df %>%
gather(key, value) %>%
ggplot(aes(value, color=key)) +
geom_density()
Here is a reproducible example using mtcars, plotting all the distributions on top of each other
library(tidyverse)
mtcars %>%
gather(Variable, Value) %>%
ggplot(aes(x=Value, color=Variable)) +
geom_density(alpha=0)
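The matching CDF plot for the mtcars example could be sketched as follows. This is an illustration rather than part of the original answers: it assumes a tidyr version that provides pivot_longer() (which supersedes gather()), and the names long and p_cdf are made up here.

```r
library(ggplot2)
library(tidyr)

# Reshape to long format: one row per (Variable, Value) pair
long <- pivot_longer(mtcars, cols = everything(),
                     names_to = "Variable", values_to = "Value")

# Empirical cumulative distribution function, one curve per column
p_cdf <- ggplot(long, aes(x = Value, colour = Variable)) +
  stat_ecdf() +
  labs(x = "Value", y = "Cumulative probability", title = "CDF")
# print(p_cdf) draws the plot
```

Swapping stat_ecdf() for geom_density() gives the PDF version on the same long-format data.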

group_by prevents correct formatting with pandoc

Also posted as an issue on GitHub.
After using group_by, I cannot output the table with pandoc correctly using the digits= or round= parameters.
Take the group_by out of the chain and pandoc displays the table just fine. Add the group_by in and the number of decimal places of the floating point numbers is way too big to display.
# test dataframe
dat <- data.frame(matrix(rnorm(10 * 10), 10))
group <- rbinom(10,20,.1)
df1 <- cbind(group, dat)
library(pander)
pander(df1, digits = 2, keep.line.breaks = TRUE, split.table = Inf,
caption = "Not Grouped, correct format")
library(dplyr)
df2 <- df1 %>%
group_by(group)
pander(df2, digits = 2, keep.line.breaks = TRUE, split.table = Inf,
caption = "Grouped, incorrect format")
Is there a way around this?
As a workaround, you can convert the object df2 (of class tbl_df) to a data.frame object.
pander(as.data.frame(df2), digits = 2, keep.line.breaks = TRUE, split.table = Inf)
The result:
-----------------------------------------------------------------------
group X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
------- ------ ----- ----- ----- ------ ----- ----- ------ ------ -----
0 -0.55 -0.13 -0.71 -1.3 -0.096 0.49 0.73 -0.53 0.17 -0.44
2 -1.5 1.4 -2.1 0.96 -0.2 -0.36 0.33 0.2 0.67 -0.27
1 -2.3 -0.98 -1.5 1.1 0.87 -0.54 1.2 -0.24 0.31 -0.76
1 0.24 0.086 -0.78 0.39 -0.17 -0.2 -1.5 -1.1 -1.3 -0.72
0 0.2 -1.2 0.27 2.1 0.73 1.8 -0.12 -0.45 0.07 -0.29
1 0.022 0.084 -0.41 0.32 -0.023 0.38 0.57 -0.16 0.0011 -0.76
2 0.99 0.7 -0.32 -0.25 -0.17 -0.68 -0.59 0.29 0.77 -0.12
3 -1.3 -1.6 -0.14 0.49 0.61 1.2 0.14 -0.087 -1.2 -0.95
0 -0.073 -0.86 2 -0.87 0.51 -1.3 -0.94 0.022 0.6 0.68
3 1.8 -0.81 -0.4 0.72 2.1 0.19 0.086 1.7 0.19 -0.49
-----------------------------------------------------------------------

A better way to plot lots of lines (in ggplot perhaps)?

Using R 3.0.2, I have a dataframe that looks like
head()
0 5 10 15 30 60 120 180 240
YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
Where row.names() are variable names, names() are measurement times, and the values are measurements. It's several thousand rows deep. Let's call it tmp.
I want to do a sanity check of plotting every variable as time versus value as a line-plot on one plot. What's a better way to do it than naively plotting each line with plot() and lines():
timez <- names(tmp)
plot(x=timez, y=tmp[1,], type="l", ylim=c(-5,5))
for (i in 2:length(tmp[,1])) {
lines(x=timez,y=tmp[i,])
}
The above crude answer is good enough, but I'm looking for a way to do this right. I had a concussion recently, so sorry if I'm missing something obvious. I've been doing that a lot.
Could it be something with transposing the data.frame so it's each timepoint observed across several thousand variables? Or melt()-ing the data.frame in some meaningful way? Is there some way of handling it in ggplot using aggregate()s of data.frames or something? This isn't the right way to do this, is it?
At a loss.
I personally prefer ggplot2 for all of my plotting needs. Assuming I've understood you correctly, you can put the data in long format with reshape2 and then use ggplot2 to plot all of your lines on the same plot:
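library(reshape2)
# df is assumed to have a "var" column with the variable names and time columns named X0, X5, ...
df2 <- melt(df, id.vars = "var")
names(df2) <- c("var", "time", "value")
# strip the leading "X" from the column names to recover numeric times
df2$time <- as.numeric(substring(df2$time, 2))
library(ggplot2)
ggplot(df2, aes(x = time, y = value, colour = var)) + geom_line()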
You can simply use matplot as follows
DF
## 0 5 10 15 30 60 120 180 240
## YKL134C 0.08 -0.03 -0.74 -0.92 -0.80 -0.56 -0.54 -0.42 -0.48
## YMR056C -0.33 -0.26 -0.56 -0.58 -0.97 -1.47 -1.31 -1.53 -1.55
## YBR085W 0.55 3.33 4.11 3.47 2.16 2.19 2.01 2.09 1.55
## YJR155W -0.44 -0.92 -0.27 0.75 0.28 0.45 0.45 0.38 0.51
## YNL331C 0.42 0.01 -0.05 0.23 0.19 0.43 0.73 0.95 0.86
## YOL165C -0.49 -0.46 -0.25 0.03 -0.26 -0.16 -0.12 -0.37 -0.34
matplot(t(DF), type = "l", xaxt = "n", ylab = "")
axis(side = 1, at = seq_along(names(DF)), labels = names(DF))
xaxt = "n" suppresses plotting of the x-axis annotations. The axis function lets you specify details for any axis; here it is used to set the labels of the x axis.
It should produce one line per variable, with the measurement times as x-axis labels.
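The matplot call above spaces the timepoints evenly along the x axis. If the real time spacing matters, a variant that passes the numeric times as x might look like the sketch below. It assumes the column names are plain numbers such as "0" and "5" (as shown in the question), and it rebuilds only two rows of the data as a stand-in for tmp.

```r
# Two rows from the question, rebuilt as a small stand-in for tmp
tmp <- data.frame(
  rbind(YKL134C = c(0.08, -0.03, -0.74, -0.92, -0.80, -0.56, -0.54, -0.42, -0.48),
        YBR085W = c(0.55, 3.33, 4.11, 3.47, 2.16, 2.19, 2.01, 2.09, 1.55)))
names(tmp) <- as.character(c(0, 5, 10, 15, 30, 60, 120, 180, 240))

# Column names hold the measurement times
times <- as.numeric(names(tmp))

# One line per row of tmp, drawn against the actual times
matplot(x = times, y = t(tmp), type = "l", lty = 1, col = "black",
        xlab = "time", ylab = "value")
```

With several thousand rows, a single colour and line type as used here tends to be easier to read than matplot's default cycling through colours.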
