No layers in plot (R) - r

I have a table that looks like the one below:
chr1 500 15 0.502 na
chr1 1000 21 0.641 0.019704955
chr1 1500 21 0.621 0.016777844
chr1 2000 22 0.534 na
chr1 2500 35 0.698 0.028712731
chr2 4500 2 0.371 na
chr2 5000 3 0.342 na
chr4 5500 1 0.068 na
chr4 6000 0 0.000 na
chr4 6500 0 0.000 na
chr5 7000 2 0.079 na
chr5 7500 12 0.440 na
From this table, I would like to generate multiple plots, one for each chr, where the x and y axes are columns 2 and 5.
Based on a response to another question, I tried this:
require(ggplot2)
require(plyr)
Y <- read.table("integ.pi")
names(Y) <- c("Chr","Window","SNPs","covfra","pi")
chrs <- levels(Y[,"Chr"])
c <- lapply(chrs, function(chr) {
ggplot(Y[Y[, "Chr"]==chr,], aes(x=as.factor(Window), y=pi))
})
lapply(c)
But I get an error:
"Error: No layers in plot".
How should I go about this? Any ideas?
Thanks.
Cheers,

Just a simple example to see how to use the commands:
library(ggplot2)
dt = data.frame(Chr = c("c1","c1","c1","c2","c2","c2","c3","c3","c3"),
x = c(1,2,3,4,5,6,7,8,9),
y = c(2,4,5,2,3,4,6,6,7))
ggplot(dt, aes(x,y, col=Chr)) +
geom_point(size = 3) +
geom_line() +
facet_grid(. ~ Chr) # remove this to have all lines in same plot
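The error itself arises because ggplot() on its own only sets up the data and the aesthetics; nothing is drawn until a geom layer is added. A minimal sketch of one plot per chromosome, closer to the original lapply() attempt, might look like the following. It assumes the file is read with na.strings = "na" so that pi becomes numeric, and it uses unique() rather than levels(), because read.table() no longer creates factors by default in R >= 4.0 (so levels() would return NULL). The inline text is only a stand-in for integ.pi.

```r
library(ggplot2)

# A few rows of the table from the question, standing in for "integ.pi"
Y <- read.table(text = "
chr1 500 15 0.502 na
chr1 1000 21 0.641 0.019704955
chr1 1500 21 0.621 0.016777844
chr1 2500 35 0.698 0.028712731
chr2 4500 2 0.371 na
chr4 5500 1 0.068 na
", col.names = c("Chr", "Window", "SNPs", "covfra", "pi"),
   na.strings = "na")

chrs <- unique(as.character(Y$Chr))

# One ggplot object per chromosome; geom_point() is the layer that was missing
plots <- lapply(chrs, function(chr) {
  ggplot(Y[Y$Chr == chr, ], aes(x = Window, y = pi)) +
    geom_point() +
    ggtitle(chr)
})
names(plots) <- chrs

# print(plots[["chr1"]])  # draws one plot; rows with NA in pi are dropped with a warning
```

Each element of plots can then be printed or passed to ggsave() individually.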


Finding overlaps between 2 ranges and their overlapped region lengths?

I need to find the length of the overlapping regions on the same chromosome between two groups (gp1 and gp2). (Similar questions on Stack Overflow differ from my aim, because I want the overlapping region itself, not a TRUE/FALSE answer.)
For example:
gp1:
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
gp2:
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
I'm looking for a way to compare these two groups and get results like this:
id1 id2 chr overlapped_length
1 1 chr1 10
3 2 chr3 50
Should point you in the right direction:
Load libraries
# install.packages("BiocManager")
# BiocManager::install("GenomicRanges")
library(GenomicRanges)
library(IRanges)
Generate data
gp1 <- read.table(text =
"
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
", header = TRUE)
gp2 <- read.table(text =
"
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
", header = TRUE)
Calculate ranges
gr1 <- GenomicRanges::makeGRangesFromDataFrame(
gp1,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
gr2 <- GenomicRanges::makeGRangesFromDataFrame(
gp2,
seqnames.field = "chr",
start.field = "start",
end.field = "end"
)
Calculate overlaps
hits <- findOverlaps(gr1, gr2)
p <- Pairs(gr1, gr2, hits = hits)
i <- pintersect(p)
Result
> as.data.frame(i)
seqnames start end width strand hit
1 chr1 590 600 11 * TRUE
2 chr3 550 600 51 * TRUE
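Note that the widths above (11 and 51) are one larger than the lengths in the question (10 and 50): GRanges treats start and end as inclusive positions, so width is end - start + 1, whereas the expected output corresponds to end - start. As a cross-check, and assuming the end - start convention is what is wanted, a base R sketch without Bioconductor could look like this. merge() pairs every gp1 row with every gp2 row on the same chromosome, so it is only practical for small tables.

```r
gp1 <- read.table(text = "
chr start end id1
chr1 580 600 1
chr1 900 970 2
chr3 400 600 3
chr2 100 700 4
", header = TRUE)

gp2 <- read.table(text = "
chr start end id2
chr1 590 864 1
chr3 550 670 2
chr2 897 1987 3
", header = TRUE)

# All gp1/gp2 pairs on the same chromosome
pairs <- merge(gp1, gp2, by = "chr", suffixes = c(".1", ".2"))

# Overlap = smaller end minus larger start; non-positive means no overlap
pairs$overlapped_length <- pmin(pairs$end.1, pairs$end.2) -
  pmax(pairs$start.1, pairs$start.2)

res <- pairs[pairs$overlapped_length > 0,
             c("id1", "id2", "chr", "overlapped_length")]
rownames(res) <- NULL
res
#   id1 id2  chr overlapped_length
# 1   1   1 chr1                10
# 2   3   2 chr3                50
```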

Plotting genomic data using RCircos package

I am trying to use the RCircos package in R to visualize links between genomic positions. I am unfamiliar with this package and have been using the package documentation available from the CRAN repository from 2016.
I have attempted to format my data according to the package requirements. Here is what it looks like:
> head(pts3)
Chromosome ChromStart ChromEnd Chromosome.1 ChromStart.1 ChromEnd.1
1 chr1 33 34 chr1 216 217
2 chr1 33 34 chr1 789 790
3 chr1 33 34 chr1 1716 1717
4 chr1 33 34 chr1 1902 1903
5 chr1 33 34 chr2 2538 2539
6 chr1 33 34 chr2 4278 4279
Ultimately, I would like to produce a plot with tracks from ChromStart to ChromStart.1 and each gene labeled along the outside of the plot. I thought the script would look something like:
RCircos.Set.Core.Components(cyto.info = pts3,
chr.exclude = NULL,
tracks.inside = 1,
tracks.outside = 2)
RCircos.Set.Plot.Area()
RCircos.Chromosome.Ideogram.Plot()
RCircos.Link.Plot(link.data = pts3,
track.num = 3,
by.chromosome = FALSE)
It appears that to do so, I must first initialize with the RCircos.Set.Core.Components() function which requires positional information for each gene to pass to RCircos.Chromosome.Ideogram.Plot(). So, I created a second data frame containing the required information to pass to the function and this is the error that I get:
> head(genes)
Chromosome ChromStart ChromEnd GeneName Band Stain
1 chr1 0 2342 PB2 NA NA
2 chr2 2343 4683 PB1 NA NA
3 chr3 4684 6917 PA NA NA
4 chr4 6918 8710 HA NA NA
5 chr5 8711 10276 NP NA NA
6 chr6 10277 11735 NA NA NA
> RCircos.Set.Core.Components(cyto.info = genes,
+ chr.exclude = NULL,
+ tracks.inside = 1,
+ tracks.outside = 2)
Error in RCircos.Validate.Cyto.Info(cyto.info, chr.exclude) :
Cytoband start should be 0.
I don't actually have data for the Band or Stain columns and don't understand what they are for, but adding data to those columns (such as 1:8 or chr1, chr2, etc.) does not resolve the problem. Based on a recommendation from another forum, I also tried to reset the plot parameters for RCircos using the following functions, but it did not resolve the error:
core.chrom <- data.frame("Chromosome" = c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8"),
"ChromStart" = c(0, 2343, 4684, 6918, 8711, 10277, 11736, 12763),
"ChromEnd" = c(2342, 4683, 6917, 8710, 10276, 11735, 12762, 13666),
"startLoc" = c(0, 2343, 4684, 6918, 8711, 10277, 11736, 12763),
"endLoc" = c(2342, 4683, 6917, 8710, 10276, 11735, 12762, 13666),
"Band" = NA,
"Stain" = NA)
RCircos.Reset.Plot.Ideogram(chrom.ideo = core.chrom)
Any advice would be deeply appreciated!
I'm not sure whether you figured this out or moved on. I had the same problem and resolved it by resetting the start position of each chromosome to 0, rather than continuing from the end of the previous chromosome. For you it would be:
Chromosome ChromStart ChromEnd GeneName Band Stain
1 chr1 0 2342 PB2 NA NA
2 chr2 0 2340 PB1 NA NA
3 chr3 0 2233 PA NA NA
...etc
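That conversion can be sketched in base R as follows, using the cumulative coordinates from the question's core.chrom table: each segment's new end is its old end minus its old start, and every start becomes 0. Whether RCircos then accepts NA in Band and Stain is not confirmed here; the sketch only covers the coordinate shift.

```r
# Cumulative coordinates as given in the question
genes <- data.frame(
  Chromosome = paste0("chr", 1:8),
  ChromStart = c(0, 2343, 4684, 6918, 8711, 10277, 11736, 12763),
  ChromEnd   = c(2342, 4683, 6917, 8710, 10276, 11735, 12762, 13666)
)

# Shift every segment so that it starts at 0
genes$ChromEnd   <- genes$ChromEnd - genes$ChromStart
genes$ChromStart <- 0

head(genes, 3)
#   Chromosome ChromStart ChromEnd
# 1       chr1          0     2342
# 2       chr2          0     2340
# 3       chr3          0     2233
```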

How to get the output from the function in a column of a dataframe - R

Hi I have a simple function:
same_picking <- function(cena){
data_model2$price_model2 <- 0.6 + cena * data_model2$item_SKU + 0.4
}
I would like the output to be written to a column of a data.frame.
Currently, because the function has not yet written anything, the column is still filled with NAs, but I would like the values in that column to be rewritten after every run of the function.
count_code sifra item_SKU price_model2
281 0421 2 NA
683 0499 5 NA
903 0654 3 NA
7390 0942 3 NA
2778 0796 5 NA
2778 0796 7 NA
7066 0907 83 NA
281 0421 2 NA
I have tried the commands data.frame and within, but they got me nowhere.
I would appreciate the help.
Andraz
Solution:
same_picking <- function(cena){
data_model2$price_model2 <<- 0.6 + cena * data_model2$item_SKU + 0.4
}
The <<- operator assigns to data_model2 in the enclosing (here, global) environment instead of modifying a local copy inside the function. Very clean :)
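For context, the original version leaves the column as NA because an assignment with <- inside a function only changes a copy of data_model2 that is local to that call. A small sketch on a made-up three-row data frame (the values are illustrative, not the poster's data) shows the global object staying untouched:

```r
# Illustrative stand-in for the poster's data_model2
data_model2 <- data.frame(item_SKU = c(2, 5, 3), price_model2 = NA)

same_picking <- function(cena) {
  # Modifies a local copy only; the global data_model2 is not changed
  data_model2$price_model2 <- 0.6 + cena * data_model2$item_SKU + 0.4
}

same_picking(1)
anyNA(data_model2$price_model2)
# [1] TRUE
```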
The simplest way would be to return the df from function:
df <- read.table(
text = "count_code sifra item_SKU price_model2
281 0421 2 NA
683 0499 5 NA
903 0654 3 NA
7390 0942 3 NA
2778 0796 5 NA
2778 0796 7 NA
7066 0907 83 NA
281 0421 2 NA",
header = TRUE)
head(df, 2)
# count_code sifra item_SKU price_model2
# 1 281 421 2 NA
# 2 683 499 5 NA
# 1st ---------------------------------------------------------------------
same_picking_1 <- function(df, cena){
df$price_model2 <- 0.6 + cena * df$item_SKU + 0.4
return(df)
}
df2 <- same_picking_1(df, 1)
head(df2, 2)
# count_code sifra item_SKU price_model2
# 1 281 421 2 3
# 2 683 499 5 6
Other options, data.table and dplyr:
same_picking_2 <- function(cena, item_SKU){
0.6 + cena * item_SKU + 0.4 # use the item_SKU argument, not the global df
}
# data.table --------------------------------------------------------------
library(data.table)
dt <- data.table(df)
dt[, price_model2 := same_picking_2(1, item_SKU)]
head(dt, 2)
# count_code sifra item_SKU price_model2
# 1: 281 421 2 3
# 2: 683 499 5 6
# dplyr -------------------------------------------------------------------
library(dplyr)
df3 <- df %>% mutate(price_model2 = same_picking_2(1, item_SKU))
head(df3, 2)
# count_code sifra item_SKU price_model2
# 1 281 421 2 3
# 2 683 499 5 6
Edit after OP comment:
You can also wrap data.table solution into a function
# data.table --------------------------------------------------------------
library(data.table)
same_picking_2_int <- function(cena, item_SKU){
0.6 + cena * item_SKU + 0.4 # use the item_SKU argument, not the global df
}
same_picking_2 <- function(dt, cena){
dt[, price_model2 := same_picking_2_int(cena, item_SKU)]
}
# Use update by reference
dt <- data.table(df)
head(dt, 2)
same_picking_2(dt, 1)
head(dt, 2)
# Slightly more readable, the same output, also utilizes the update by reference of data.table (see tracemem())
dt <- data.table(df)
tracemem(dt)
head(dt, 2)
dt <- same_picking_2(dt, 1)
head(dt, 2)

Replace a certain number with a phrase in R

I have a data.frame that looks like
SNP CHR BP A1 A2 EFF SE P NGT
rs882982 1 100066094 a g -.0179 .006 .002797 28 .486
rs7518025 2 100066515 t c .0198 .0059 .0007438 26 .47
rs6678322 3 100069554 a t .0187 .0059 .001498 25 .452
rs61784986 14 100074953 t c -.0182 .0058 .001748 26 .469
rs7554246 21 100075695 t c -.0167 .006 .004932 26 .455
rs12121193 16 100075777 a t -.0183 .0058 .001652 26 .471
rs835016 3 100078102 t c .02 .0065 .001979 28 .196
And I would like to add the letters "chr" before the numbers in the CHR column. My desired output is:
SNP CHR BP A1 A2 EFF SE P NGT
rs882982 chr1 100066094 a g -.0179 .006 .002797 28 .486
rs7518025 chr2 100066515 t c .0198 .0059 .0007438 26 .47
rs6678322 chr3 100069554 a t .0187 .0059 .001498 25 .452
rs61784986 chr14 100074953 t c -.0182 .0058 .001748 26 .469
rs7554246 chr21 100075695 t c -.0167 .006 .004932 26 .455
rs12121193 chr16 100075777 a t -.0183 .0058 .001652 26 .471
rs835016 chr3 100078102 t c .02 .0065 .001979 28 .196
Should I use grep somehow, or what are some useful R commands?
Just use paste():
df$CHR <- as.character(df$CHR) # in case it is a factor column
df$CHR <- paste("chr", df$CHR, sep="")
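paste0() is shorthand for paste(..., sep = ""). A small sketch on a three-row data frame built from values in the CHR column above; add_chr is a hypothetical helper name, and it skips values that already carry the prefix so the line can be re-run safely:

```r
df <- data.frame(SNP = c("rs882982", "rs7518025", "rs61784986"),
                 CHR = c(1, 2, 14))

# Hypothetical helper: prefix "chr" unless it is already there
add_chr <- function(x) {
  x <- as.character(x)
  ifelse(startsWith(x, "chr"), x, paste0("chr", x))
}

df$CHR <- add_chr(df$CHR)
df$CHR <- add_chr(df$CHR)  # a second call changes nothing
df$CHR
# [1] "chr1"  "chr2"  "chr14"
```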

R: sum vector by vector of conditions

I am trying to obtain a vector that contains, for each condition, the sum of the elements that satisfy it.
values = runif(5000)
bin = seq(0, 0.9, by = 0.1)
sum(values < bin)
I expected sum() to return 10 values: for each element of bin, the sum of the elements of values that satisfy the "<" condition.
However, it returns only one value.
How can I achieve the result without using a while loop?
I understand this to mean that you want, for each value in bin, the number of elements in values that are less than that value. So I think you want vapply() here:
vapply(bin, function(x) sum(values < x), 1L)
# [1] 0 497 1025 1501 1981 2461 2955 3446 3981 4526
If you want a little table for reference, you could add names
v <- vapply(bin, function(x) sum(values < x), 1L)
setNames(v, bin)
# 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
# 0 497 1025 1501 1981 2461 2955 3446 3981 4526
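The original sum(values < bin) returns a single number because values < bin compares the two vectors element by element, recycling the 10 bin values along the 5000 elements of values, and sum() then collapses the whole logical vector. An alternative sketch that avoids the explicit function is outer(), which builds the full 5000 x 10 comparison matrix, followed by colSums():

```r
set.seed(1)
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)

# Row i, column j is TRUE when values[i] < bin[j]
counts <- colSums(outer(values, bin, "<"))

# Same counts as the vapply() version
identical(as.integer(counts),
          vapply(bin, function(x) sum(values < x), 1L))
# [1] TRUE
```

The matrix costs length(values) * length(bin) logical cells, so vapply() remains the lighter option for long vectors.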
I personally prefer data.table over tapply or vapply, and findInterval over cut.
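set.seed(1)
values <- runif(5000) # regenerate after seeding so the run is reproducible
bin <- seq(0, 0.9, by = 0.1)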
library(data.table)
dt <- data.table(values, groups=findInterval(values, bin))
setkey(dt, groups)
dt[,.(n=.N, v=sum(values)), groups][,list(cumsum(n), cumsum(v)),]
# V1 V2
# 1: 537 26.43445
# 2: 1041 101.55686
# 3: 1537 226.12625
# 4: 2059 410.41487
# 5: 2564 637.18782
# 6: 3050 904.65876
# 7: 3473 1180.53342
# 8: 3951 1540.18559
# 9: 4464 1976.23067
#10: 5000 2485.44920
cbind(vapply(bin, function(x) sum(values < x), 1L)[-1],
cumsum(tapply( values, cut(values, bin), sum)))
# [,1] [,2]
#(0,0.1] 537 26.43445
#(0.1,0.2] 1041 101.55686
#(0.2,0.3] 1537 226.12625
#(0.3,0.4] 2059 410.41487
#(0.4,0.5] 2564 637.18782
#(0.5,0.6] 3050 904.65876
#(0.6,0.7] 3473 1180.53342
#(0.7,0.8] 3951 1540.18559
#(0.8,0.9] 4464 1976.23067
Using tapply with a cut()-constructed INDEX vector seems to deliver:
tapply( values, cut(values, bin), sum)
(0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6]
25.43052 71.06897 129.99698 167.56887 222.74620 277.16395
(0.6,0.7] (0.7,0.8] (0.8,0.9]
332.18292 368.49341 435.01104
Although I'm guessing you would want the cut-vector to extend to 1.0:
bin = seq(0, 1, by = 0.1)
tapply( values, cut(values, bin), sum)
(0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6]
25.48087 69.87902 129.37348 169.46013 224.81064 282.22455
(0.6,0.7] (0.7,0.8] (0.8,0.9] (0.9,1]
335.43991 371.60885 425.66550 463.37312
I see that I understood the question differently than Richard. If you wanted his result you can use cumsum on my result.
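Using dplyr (df is not defined in the original snippet; judging by the output, it holds values plus a groups column built with cut()):
set.seed(1)
library(dplyr)
values <- runif(5000)
bin <- seq(0, 0.9, by = 0.1)
df <- data.frame(values, groups = cut(values, bin))
df %>% group_by(groups) %>%
summarise(count = n(), sum = sum(values)) %>%
mutate(cumcount= cumsum(count), cumsum = cumsum(sum))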
Output:
groups count sum cumcount cumsum
1 (0,0.1] 537 26.43445 537 26.43445
2 (0.1,0.2] 504 75.12241 1041 101.55686
3 (0.2,0.3] 496 124.56939 1537 226.12625
4 (0.3,0.4] 522 184.28862 2059 410.41487
5 (0.4,0.5] 505 226.77295 2564 637.18782
6 (0.5,0.6] 486 267.47094 3050 904.65876
7 (0.6,0.7] 423 275.87466 3473 1180.53342
8 (0.7,0.8] 478 359.65217 3951 1540.18559
9 (0.8,0.9] 513 436.04508 4464 1976.23067
10 NA 536 509.21853 5000 2485.44920

Resources