I have a data frame with several columns. The relevant three are chr, pos and ratio. I want to use ddply to apply ksmooth per chr (chromosome), but I keep getting a wrong data frame with lots of NA values. Here is my reproducible data frame:
d=data.frame(chr=c(rep.int(1,24),rep.int(2,15),rep.int(3,30),rep.int(4,20),rep.int(5,11)),
             pos=c(sort(sample(1:1000, size = 24, replace = FALSE),decreasing = FALSE),
                   sort(sample(1:1000, size = 15, replace = FALSE),decreasing = FALSE),
                   sort(sample(1:1000, size = 30, replace = FALSE),decreasing = FALSE),
                   sort(sample(1:1000, size = 20, replace = FALSE),decreasing = FALSE),
                   sort(sample(1:1000, size = 11, replace = FALSE),decreasing = FALSE)),
             ratio=seq(1:100))
and the ddply call:
library(plyr)
f <- ddply(d, .(chr),
           function(e) {
             as.data.frame(ksmooth(e$pos,e$ratio,"normal",bandwidth=10))
           })
Obviously I'm doing something wrong.
Thanks for the help,
Guy
This has nothing to do with plyr::ddply. The issue is with ksmooth. You want:
ksmooth(e$pos, e$ratio, "normal", bandwidth=10, x.points = e$pos)
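Read ?ksmooth for what x.points means. If it is not supplied, ksmooth evaluates the fit at n.points equally spaced points across the range of x instead of at your observed positions. With a bandwidth as small as 10, many of those grid points have no observation within reach of the kernel, so the fit there is NA. This is the source of all your trouble.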
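A minimal sketch of the difference, using made-up positions rather than the data from the question (the vectors x and y below are assumptions for illustration only). Without x.points the fit comes back on a 100-point grid, and grid points far from any observation are NA; with x.points = x there is one fitted value per observation.

```r
# Hypothetical, widely spaced positions (not the data from the question)
x <- c(10, 50, 200, 400, 410, 700, 990)
y <- seq_along(x)

# Default: evaluated on n.points = max(100, length(x)) equally spaced points
default_fit <- ksmooth(x, y, "normal", bandwidth = 10)

# Evaluated only at the observed positions
at_data_fit <- ksmooth(x, y, "normal", bandwidth = 10, x.points = x)

length(default_fit$x)       # size of the evaluation grid
sum(is.na(default_fit$y))   # grid points with no data near them
anyNA(at_data_fit$y)        # fitted values at the observed positions
length(at_data_fit$y)       # one value per observation
```

Inside the ddply call this corresponds to passing x.points = e$pos, as shown in the answer.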
For a very large table, I would like to automatically have a barplot for each column, which is then also saved as a png.
The title can simply be the column name, and the bar labels correspond to the values in the column, so no individual editing of the barplots is necessary.
I have already created barplots by hand and experimented with the "lapply" command without success.
Here is the barplot code:
png(file="a_x.png", width=600, height=400)
barplot(table(Example$a_x), main = "a_x")
dev.off()
You can do a simple for loop and paste0 for the file name:
Data
df <- data.frame(a_x = sample(c("Yes","No"), 100, prob = c(0.10,0.90), replace = TRUE),
b_x = sample(c("Yes","No"), 100, prob = c(0.10,0.90), replace = TRUE),
c_x = sample(c("Yes","No"), 100, prob = c(0.10,0.90), replace = TRUE))
Code
for(i in colnames(df)){
  png(file = paste0(i,".png"), width = 600, height = 400)
  barplot(table(df[i]), main = i)
  dev.off()
}
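Since the question mentions trying lapply, here is a sketch of the same idea written as a function applied over the column names. The helper name save_barplot, the small example data frame and the use of tempdir() as output folder are assumptions made for illustration; they are not part of the original answer.

```r
# Small stand-in data frame (hypothetical values)
df <- data.frame(
  a_x = c("Yes", "No", "No", "No"),
  b_x = c("No", "No", "Yes", "No")
)

out_dir <- tempdir()  # assumption: write to a temporary folder

# Hypothetical helper: draw one barplot for a column and save it as PNG
save_barplot <- function(col) {
  path <- file.path(out_dir, paste0(col, ".png"))
  png(filename = path, width = 600, height = 400)
  on.exit(dev.off())                      # close the device even on error
  barplot(table(df[[col]]), main = col)   # column name doubles as title
  path
}

# One file per column; vapply returns the paths that were written
paths <- vapply(names(df), save_barplot, character(1))
paths
```

The for loop and this version do the same work; the function form mainly makes it easier to reuse the plotting step or to collect the file names.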
I am trying to draw a Venn diagram with four logical variables. I have tried many different R packages, but each of them gave me some problems. So far, I have achieved the best result with the ggvenn package. The problem, however, is that it shows the percentages of the intersections relative to the observations included in the diagram rather than all observations in the data.
Below is an example Venn diagram and its code to illustrate the problem. So my question is: is there a way to display the percentages relative to the total number of observations in the data? For instance, in the diagram below the intersection ABCD consists of 45 observations, so the correct proportion would be 4.5% (i.e. 45/1000) instead of 4.7%.
I would really appreciate it if someone could help me out with this.
library(ggvenn)
a <- sample(x = c(TRUE, TRUE, FALSE), size = 1000, replace = TRUE)
b <- sample(x = c(TRUE,FALSE, FALSE), size = 1000, replace = TRUE)
c <- sample(x = c(TRUE, FALSE, FALSE, FALSE), size = 1000, replace = TRUE)
d <- sample(x = c(TRUE, TRUE, TRUE, FALSE), size = 1000, replace = TRUE)
df <- tibble(values = c(1:1000), A=a, B=b, C=c, D=d)
ggvenn(df,
fill_color = c("black", "grey70", "grey80", "grey90"),
show_percentage = TRUE,
digits = 1,
text_size = 2.5)
A quick fix for this problem would be to use the latest GitHub version of {ggvenn}:
remove.packages("ggvenn")
devtools::install_github("yanlinlin82/ggvenn")
Then, by default, there will also be a percentage for the observations that are outside A/B/C/D. This gives you the percentages you're looking for.
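To check the numbers independently of the plot, the two denominators described in the question can be computed in base R. This is only a sketch: the seed is an assumption, so the counts will differ from the 45 observations mentioned in the question.

```r
set.seed(42)  # assumption: any seed works, the counts will differ
n <- 1000
a <- sample(c(TRUE, TRUE, FALSE), size = n, replace = TRUE)
b <- sample(c(TRUE, FALSE, FALSE), size = n, replace = TRUE)
c <- sample(c(TRUE, FALSE, FALSE, FALSE), size = n, replace = TRUE)
d <- sample(c(TRUE, TRUE, TRUE, FALSE), size = n, replace = TRUE)

in_all <- a & b & c & d   # observations in the ABCD intersection
in_any <- a | b | c | d   # observations that appear anywhere in the diagram

# Percentage relative to all observations in the data
pct_of_all <- 100 * sum(in_all) / n

# Percentage relative to the observations shown in the diagram
pct_of_shown <- 100 * sum(in_all) / sum(in_any)

round(c(pct_of_all = pct_of_all, pct_of_shown = pct_of_shown), 1)
```

The two values differ whenever some rows are FALSE in all four variables, which is the gap between 4.5% and 4.7% in the question.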
I have data that look like this:
Gene    HBEC-KT-01   HBEC-KT-02   HBEC-KT-03   HBEC-KT-04   HBEC-KT-05   Primarycells-02  Primarycells-03  Primarycells-04  Primarycells-05
BPIFB1  15726000000  15294000000  15294000000  14741000000  22427000000  87308000000      2.00E+11         1.04E+11         1.51E+11
LCN2    18040000000  26444000000  28869000000  30337000000  10966000000  62388000000      54007000000      56797000000      38414000000
C3      2.52E+11     2.26E+11     1.80E+11     1.80E+11     1.78E+11     46480000000      1.16E+11         69398000000      78766000000
MUC5AC  15647000     8353200      12617000     12221000     29908000     40893000000      79830000000      28130000000      69147000000
MUC5B   965190000    693910000    779970000    716110000    1479700000   38979000000      90175000000      41764000000      50535000000
ANXA2   14705000000  18721000000  21592000000  18904000000  22657000000  28163000000      24282000000      21708000000      16528000000
I want to make a heatmap like the following using R. I am following a paper, which states: "Heat maps were generated with the ‘pheatmap’ package [76], where correlation clustering distance row was applied". Here is their heatmap.
I want a heatmap like this one and am trying to make it in R by following tutorials, but I am new to the language.
Here is my code:
df <- read.delim("R.txt", header=T, row.names="Gene")
df_matrix <- data.matrix(df)
pheatmap(df_matrix,
main = "Heatmap of Extracellular Genes",
color = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(10),
cluster_cols = FALSE,
show_rownames = F,
fontsize_col = 10,
cellwidth = 40,
)
This is what I get.
When I try clustering, I get this error:
pheatmap(
mat = df_matrix,
scale = "row",
cluster_column = F,
show_rownames = TRUE,
drop_levels = TRUE,
fontsize = 5,
clustering_method = "complete",
main = "Hierarchical Cluster Analysis"
)
Error in hclust(d, method = method) :
NA/NaN/Inf in foreign function call (arg 10)
Can someone help me with the code?
You can normalize the data with scale to achieve a more uniform coloring. Here, the mean expression is set to 0 for each sample. Genes expressed below average get a negative z-score:
library(tidyverse)
library(pheatmap)
data <- tribble(
~Gene, ~`HBEC-KT-01`, ~`HBEC-KT-02`, ~`HBEC-KT-03`, ~`HBEC-KT-04`, ~`HBEC-KT-05`, ~`Primarycells-03`, ~`Primarycells-04`, ~`Primarycells-05`,
"BPIFB1", 1.5726e+10, 1.5294e+10, 1.5294e+10, 1.4741e+10, 2.2427e+10, 2e+11, 1.04e+11, 1.51e+11,
"LCN2", 1.804e+10, 2.6444e+10, 2.8869e+10, 3.0337e+10, 1.0966e+10, 5.4007e+10, 5.6797e+10, 3.8414e+10,
"C3", 2.52e+11, 2.26e+11, 1.8e+11, 1.8e+11, 1.78e+11, 1.16e+11, 6.9398e+10, 7.8766e+10,
"MUC5AC", 15647000, 8353200, 12617000, 12221000, 29908000, 7.983e+10, 2.813e+10, 6.9147e+10,
"MUC5B", 965190000, 693910000, 779970000, 716110000, 1479700000, 9.0175e+10, 4.1764e+10, 5.0535e+10,
"ANXA2", 1.4705e+10, 1.8721e+10, 2.1592e+10, 1.8904e+10, 2.2657e+10, 2.4282e+10, 2.1708e+10, 1.6528e+10
)
data %>%
  mutate(across(where(is.numeric), scale)) %>%
  column_to_rownames("Gene") %>%
  pheatmap(
    scale = "row",
    cluster_cols = FALSE,  # keep the samples in their original order
    show_rownames = FALSE,
    show_colnames = TRUE,
    treeheight_col = 0,
    drop_levels = TRUE,
    fontsize = 5,
    clustering_method = "complete",
    main = "Hierarchical Cluster Analysis (z-score)"
  )
Created on 2021-09-26 by the reprex package (v2.0.1)
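The paper quoted in the question mentions a correlation distance for the rows. As a rough base R sketch of what that involves (not the paper's or the answerer's exact method), the rows can be z-scored, turned into a 1 - correlation distance and passed to hclust. The matrix below is a small subset of the values from the question, and the finiteness check is there because hclust stops with the NA/NaN/Inf error when the scaled matrix contains such values, for example from a row with zero variance.

```r
# Subset of the data from the question (four genes, four samples)
m <- rbind(
  BPIFB1 = c(1.5726e10, 1.5294e10, 2e11,      1.04e11),
  C3     = c(2.52e11,   2.26e11,   1.16e11,   6.9398e10),
  MUC5AC = c(15647000,  8353200,   7.983e10,  2.813e10),
  ANXA2  = c(1.4705e10, 1.8721e10, 2.4282e10, 2.1708e10)
)
colnames(m) <- c("HBEC-KT-01", "HBEC-KT-02",
                 "Primarycells-03", "Primarycells-04")

# Row-wise z-scores: scale() works on columns, so transpose twice
z <- t(scale(t(m)))

# Non-finite values here would make hclust fail
stopifnot(all(is.finite(z)))

# Correlation distance between genes, then complete-linkage clustering
d  <- as.dist(1 - cor(t(z)))
hc <- hclust(d, method = "complete")

hc$labels[hc$order]  # genes in dendrogram order
```

In pheatmap itself the corresponding setting is clustering_distance_rows = "correlation".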
I am using the following code to generate a stars() plot, one of the visualization techniques for multivariate data.
library(randomNames)
set.seed(3)
Name = randomNames(50, which.names = 'first')
height = sample(160:180, 50, replace = TRUE)
weight = sample(45:85, 50, replace = TRUE)
tumour_size = runif(50, 0,1)
df = data.frame(Name, height, weight, tumour_size, rnorm(50, 10,3))
stars(df,labels = Name)
But I get output like this:
How can I align the names exactly below the stars?
Use option flip.labels=FALSE.
stars(df, labels = Name, flip.labels = FALSE)
I want to write a set of randomly generated numbers to a text file in a fixed-width format. But for some reason, write.fwf only wrote the 1st column correctly; all other columns got one extra character. How can I fix it? Thanks!
set.seed(1899)
library(sensitivity)
library(randtoolbox)
par_lower <- c( 0.12, 0.13, 0.038, 0.017)
par_upper <- c(12.00, 13.00, 3.800, 1.700)
sample_size <- 5
lim_para8 <- c(par_lower[1], par_upper[1])
lim_para9 <- c(par_lower[2], par_upper[2])
lim_parb8 <- c(par_lower[3], par_upper[3])
lim_parb9 <- c(par_lower[4], par_upper[4])
par_rand <- parameterSets(par.ranges = list(lim_para8, lim_para9,
lim_parb8, lim_parb9),
samples = sample_size, method = "sobol")
par_rand
# write to file
library(gdata)
file2write <- paste("par.txt", sep = "")
write.fwf(par_rand, file = file2write, width = c(10, 10, 10, 10), colnames = FALSE)
The results:
6.060 6.56500 1.91900 0.858500
9.030 3.34750 2.85950 0.437750
3.090 9.78250 0.97850 1.279250
4.575 4.95625 2.38925 0.227375
10.515 11.39125 0.50825 1.068875
If I change the call to
write.fwf(par_rand, file = file2write, width = c(10, 9, 9, 9),
colnames = FALSE, quote = FALSE, rownames = FALSE)
I get this error:
Error in write.fwf(par_rand, file = file2write, width = c(10, 9, 9, 9), :
'width' (9) was too small for columns: V4
'width' should be at least (10)
Please try the code below; it works for me. I tested several formats and all of them worked. Both calls produce a fixed-format file with four columns of width 10.
The extra character comes from write.fwf's own sep argument, which defaults to a single space and is inserted between the columns. The sep in the definition of file2write only affects how paste() builds the file name, so it has no effect on the output of write.fwf.
write.fwf(par_rand, file = "par2.txt", width = c(10, 10, 10, 10), colnames = FALSE, sep = "")
write.fwf(par_rand, file = file2write, width = c(10, 10, 10, 10), colnames = FALSE, sep = "")
The following produces the same output but with one column of width 10 and three of width 9, as I think you wanted:
write.fwf(par_rand, file = "par3.txt", width = c(10, 9, 9, 9), colnames = FALSE, sep = "")
Please let me know whether this is what you wanted.
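The effect of the separator on the line width can be reproduced in base R without gdata. This sketch only mimics the layout with formatC and paste, it does not call write.fwf; the data frame holds the first two rows of the output shown in the question, and the file name par_fixed.txt is an assumption.

```r
# First two rows of the values shown in the question
vals <- data.frame(
  V1 = c(6.06, 9.03),
  V2 = c(6.565, 3.3475),
  V3 = c(1.919, 2.8595),
  V4 = c(0.8585, 0.43775)
)

# Format every column to exactly 10 characters, right-aligned
cols <- lapply(vals, function(v) formatC(v, format = "f", digits = 5, width = 10))

# Join the columns with a single space and with no separator
with_space <- do.call(paste, c(unname(cols), sep = " "))
no_sep     <- do.call(paste, c(unname(cols), sep = ""))

nchar(with_space)  # 4 * 10 characters plus 3 separators
nchar(no_sep)      # 4 * 10 characters

# Write the separator-free lines to a temporary file
writeLines(no_sep, file.path(tempdir(), "par_fixed.txt"))
```

With a one-character separator every column after the first starts one position later than its nominal width suggests, which matches the extra character described in the question.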