Highlight a few points in a plot - r

I am trying to plot the following data with a few positions (points) highlighted:
plot(b$pos, b$log_p, col=ifelse(b$pos == c(14824849, 13920386, 14837470), 90, 100), pch=19, xlab='Chromosome 21 position', ylab='-log10(p)')
The resulting plot shows only one point highlighted in red, with the following warning message:
In b$pos == c(14824849, 13920386, 14837470) : longer object length is not a multiple of shorter object length

OK, the issue is most likely the condition in your ifelse(). If you evaluate the condition (b$pos == c(14824849, 13920386, 14837470)) on its own, outside ifelse(), you will get a warning along the lines of:
longer object length is not a multiple of shorter object length
If you change the condition to:
b$pos %in% c(14824849,13920386,14837470)
You will get a vector of TRUE/FALSE values indicating whether each entry in b$pos is present anywhere in c(14824849, 13920386, 14837470), rather than comparing each entry of b$pos with a single, recycled element of that vector.
x = c(49, 7, 66, 51, 43, 70, 35, 53, 6, 29)
y = c(10, 98, 44, 31, 37, 14, 64, 84, 4, 34)
x %in% c(6, 7)
[1] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
plot(x, y, col=ifelse(x %in% c(6, 7), 'red', 'blue'))
This dataset has 10 x values. If you were to write this:
plot(x, y, col=ifelse(x == c(1, 7), 'red', 'blue'))
it would run without a warning, because 10 is a multiple of 2, but it would not do what you want: the x values would be compared against 1 and 7 alternately, e.g.:
49 == 1?
7 == 7?
66 == 1?
51 == 7?
... and so on.
The warning was telling you that the length of your comparison vector (3) does not divide evenly into the length of b$pos, so each entry of b$pos was compared with only one of the three positions.
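The difference between == and %in% is easy to see on a small made-up vector. The positions below are hypothetical stand-ins for b$pos, since the real data is not shown: == recycles the three targets along pos and, with this ordering, matches nothing, while %in% flags every position that appears anywhere in the target list.

```r
# Hypothetical positions standing in for b$pos (the real data is not shown)
pos <- c(13920386, 14824849, 15000000, 14837470, 14000000)
targets <- c(14824849, 13920386, 14837470)

# `==` compares element-wise, recycling `targets` (R warns: 5 is not a multiple of 3)
eq <- suppressWarnings(pos == targets)
eq
# [1] FALSE FALSE FALSE FALSE FALSE

# `%in%` checks membership: is each element of pos anywhere in targets?
hit <- pos %in% targets
hit
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Colour vector that could be passed to plot(..., col = cols)
cols <- ifelse(hit, "red", "grey50")
```

The same cols vector works with the pch=19 plot from the question; only the condition inside ifelse() changes.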

Within the tidyverse, using ggplot2, you can try:
library(tidyverse)
tibble(x = c(49, 7, 66, 51, 43, 70, 35, 53, 6, 29),
y = c(10, 98, 44, 31, 37, 14, 64, 84, 4, 34),
gr=x %in% c(6, 7)) %>%
ggplot(aes(x,y, col=gr)) +
geom_point(size=2) +
ggalt::geom_encircle(data= . %>% filter(gr), color="green", s_shape=0) +
theme_bw()
The ggalt::geom_encircle() function draws an enclosing shape around your points of interest.

Related

R, creating a knight's tour plot from a matrix indicating the path

I need to create a knight's tour plot from a matrix such as this one:
Mat = matrix(c(1, 38, 55, 34, 3, 36, 19, 22,
54, 47, 2, 37, 20, 23, 4, 17,
39, 56, 33, 46, 35, 18, 21, 10,
48, 53, 40, 57, 24, 11, 16, 5,
59, 32, 45, 52, 41, 26, 9, 12,
44, 49, 58, 25, 62, 15, 6, 27,
31, 60, 51, 42, 29, 8, 13, 64,
50, 43, 30, 61, 14, 63, 28, 7), nrow=8, ncol=8, byrow=TRUE)
The numbers indicate the order in which the knight visits the squares.
I have a lot of results like this, for chessboards up to 75x75 in size, but I have no way of presenting them in a readable way. I found out that R, given the matrix, is capable of creating a plot like this:
link (this one is 50x50 in size)
So for the matrix above, lines are drawn between consecutive numbers, 1 - 2 - 3 - 4 - 5 - ... - 64, eventually producing a path like the one in the link, but for an 8x8 chessboard instead of 50x50.
However, I have very limited time to learn R well enough to accomplish this, and I am desperate for any kind of direction. How hard would it be to write R code that transforms any such matrix into a plot like this? Or is it something trivial? Any code samples would be a blessing.
You can use geom_path(), as described here: ggplot2 line plot order
To do so, you need to convert the matrix into a tibble with one row per square.
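library(tidyverse)  # loads tibble, dplyr and ggplot2

n <- nrow(Mat)  # board size, so the same code works for any n x n tour
coords <- tibble(col = rep(1:n, n),
                 row = rep(1:n, each = n))
coords %>%
  # Matrices are stored column-major, so Mat[row, col] is Mat[n * (col - 1) + row]
  mutate(order = Mat[n * (col - 1) + row]) %>%
  arrange(order) %>%  # geom_path() connects points in the order of the data
  ggplot(aes(x = col, y = row)) +
  geom_path() +
  geom_text(aes(y = row + 0.25, label = order)) +
  coord_equal() # Ensures a square board.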
You can subtract 0.5 from the col and row positions to give the plot a more natural chessboard feel.
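If you would rather avoid the tidyverse, the same path can be extracted in base R. This is a sketch of an alternative, not part of the answer above, and the helper names tour_path(), is_knight_tour() and plot_tour() are hypothetical. order() on the matrix returns the linear indices sorted by move number, arrayInd() turns them into (row, column) pairs, and a small check confirms that every step is a legal knight move before plotting.

```r
# Hypothetical helper: ordered (row, col) coordinates of a tour matrix
tour_path <- function(m) {
  idx <- order(m)                 # linear indices sorted by move number
  rc  <- arrayInd(idx, dim(m))    # convert linear indices to (row, col)
  data.frame(move = m[idx], row = rc[, 1], col = rc[, 2])
}

# Hypothetical check: is every consecutive step a knight move (1x2 or 2x1)?
is_knight_tour <- function(m) {
  p  <- tour_path(m)
  dr <- abs(diff(p$row))
  dc <- abs(diff(p$col))
  all((dr == 1 & dc == 2) | (dr == 2 & dc == 1))
}

# Base-graphics plot of the path, with move numbers drawn just above each square
plot_tour <- function(m) {
  p <- tour_path(m)
  plot(p$col, p$row, type = "l", asp = 1,
       xlab = "column", ylab = "row")
  text(p$col, p$row + 0.25, labels = p$move, cex = 0.7)
}
```

With Mat defined as in the question, is_knight_tour(Mat) should return TRUE and plot_tour(Mat) draws the path. For a 75x75 board the text() labels will likely be too dense, so you may want to drop them.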

Collision detection and proximity in R

I have x and y values for points (on a grid with discrete steps). I want to find the points that are in the same position as, or within a certain range of, another point. I tried the functions match(), duplicated() and which(), as well as for loops and various if statements, but got stuck.
As an example:
x <- c(23, 45, 98, 23, 12)
y <- c(15, 90, 10, 15, 70)
[1] and [4] would 'collide' in this case.
x <- c(24, 45, 98, 23, 12)
y <- c(14, 90, 10, 15, 70)
range<-1
[1] and [4] would again 'collide' in this case.
Either the indices or the values of the points will do; however, I need one result per collision.
This is brute force, but it should work well as long as x and y are not massive.
x <- c(24, 45, 98, 23, 12)
y <- c(14, 90, 10, 15, 70)
range <- 2
temp = as.matrix(dist(cbind(x, y)))
diag(temp) = Inf
unique(t(apply(which(temp < range, arr.ind = TRUE), 1, sort)))
# [,1] [,2]
#4 1 4
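The answer above uses Euclidean distance with a strict <, which is why it needs range <- 2 for the second example (points 1 and 4 are about 1.41 apart). If "within range" on a discrete grid should instead mean that both |dx| and |dy| are at most range, a variant is to use dist(method = "maximum") and keep only the upper triangle, so each pair is reported once without the sort()/unique() step. This is a sketch under that assumption; find_collisions() is a hypothetical name.

```r
# Hypothetical helper: report each colliding pair of points once.
# "Colliding" here means |dx| <= range and |dy| <= range (Chebyshev distance),
# which is an assumption about what "within a certain range" means on a grid.
find_collisions <- function(x, y, range = 0) {
  d <- as.matrix(dist(cbind(x, y), method = "maximum"))
  # upper.tri() drops the diagonal and the mirrored (j, i) copy of each pair
  hits <- which(d <= range & upper.tri(d), arr.ind = TRUE)
  res <- data.frame(i = unname(hits[, 1]), j = unname(hits[, 2]))
  res <- res[order(res$i, res$j), , drop = FALSE]
  rownames(res) <- NULL
  res
}

find_collisions(c(23, 45, 98, 23, 12), c(15, 90, 10, 15, 70))
find_collisions(c(24, 45, 98, 23, 12), c(14, 90, 10, 15, 70), range = 1)
```

Both calls should report the single pair i = 1, j = 4, matching the collisions described in the question.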

Counting the number of events in a large data set in R

Hello, I have a large data set, part of which might look something like this:
Seconds <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24)
B <- c(1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1)
C <- c(50, 60, 62, 65, 80, 60, 68, 66, 60, 69, 70, 89)
mydata <- data.frame(Seconds, B, C)
I am stuck analysing this type of data. Getting straight to the problem, I need the number of times C < 80 continuously for more than 6 seconds, and for more than 10 seconds.
In this case:
N6 (C < 80 for more than 6 seconds) = 4
N10 (C < 80 for more than 10 seconds) = 1
I hope this makes sense! Any help is appreciated :)
We can do
with(mydata, sum(C<80 & Seconds>=6 & B!=0))
#[1] 4
It could also be done with data.table:
library(data.table)
setDT(mydata)[Seconds>=6 & B!=0, sum(C<80), rleid(B)]
I would like to suggest this modest dplyr-based solution:
# Libs
library(dplyr)
library(magrittr)  # for the %<>% assignment pipe
# Summary
mydata %<>%
  mutate(criteria = Seconds >= 6 & C < 80) %>%  # already TRUE/FALSE, no ifelse() needed
  group_by(criteria) %>%
  tally()
Preview
> head(mydata)
Source: local data frame [2 x 2]
criteria n
(lgl) (int)
1 FALSE 4
2 TRUE 8
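The answers above count individual rows rather than uninterrupted stretches. A sketch of one possible reading of "continuously", using base R's rle(): treat each row as covering one sampling interval (2 seconds here, assumed constant), find each uninterrupted run where C < 80, and count the runs longer than a threshold. Under this reading the example data contains two runs, of 8 and 12 seconds, so it gives 2 for the 6-second threshold and 1 for the 10-second threshold; it does not reproduce the N6 = 4 quoted in the question, whose exact definition is not spelled out. count_long_runs() is a hypothetical name.

```r
# Hypothetical helper: count uninterrupted runs where `cond` is TRUE
# and that last longer than `min_dur` seconds.
# Assumption: samples are evenly spaced and each row covers one step.
count_long_runs <- function(seconds, cond, min_dur) {
  step <- median(diff(seconds))            # sampling interval (2 s in the example)
  r <- rle(cond)                           # lengths of consecutive TRUE/FALSE stretches
  durations <- r$lengths[r$values] * step  # duration of each TRUE run
  sum(durations > min_dur)
}

Seconds <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24)
C <- c(50, 60, 62, 65, 80, 60, 68, 66, 60, 69, 70, 89)
count_long_runs(Seconds, C < 80, 6)   # 2
count_long_runs(Seconds, C < 80, 10)  # 1
```

If your data has gaps or irregular timestamps, the duration of each run would need to come from the timestamps themselves rather than from a fixed step.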

Using sort() and cut() to segment numeric vector into groups

Here is the problem I'm attempting to solve:
Create a function that inputs a vector of numeric grades (from 0 to 100) and outputs a vector of letter grades. Group grades A-D not by fixed cut-offs (e.g. A = scores 90 to 100), but by using a curve where 40% receive A's, 30% B's, 20% C's, 10% D's.
Here is what I've written thus far. It returns an error (see bottom). What is wrong with how I am tackling this?
letter.grade <- function(grades){
num.a <- .4*length(grades)
num.b <- .3*length(grades)
num.c <- .2*length(grades)
num.d <- .1*length(grades)
sort.grades <- sort(grades, decreasing = TRUE)
cut(grades,
breaks = c(sort.grades[0:num.a],sort.grades[num.a+1:num.b],
sort.grades[num.b+1:num.c],
sort.grades[num.c+1:num.d]),
labels = c("A", "B", "C", "D")
)
}
letter.grade(c(60, 39, 58, 36, 41, 44, 89, 17, 47, 63))
Error message:
Error in cut.default(grades, breaks = c(sort.grades[0:num.a],
sort.grades[num.a + : 'breaks' are not unique
Thanks!
Try this:
letter.grade <- function(grades){
  n <- length(grades)
  sort.grades <- sort(grades, decreasing = FALSE)
  # Positions (in ascending order) of the upper edge of each band:
  # bottom 10% get D, next 20% C, next 30% B, top 40% A
  num.d <- .1*n
  num.c <- .3*n
  num.b <- .6*n
  cut(grades,
      breaks = c(0, sort.grades[num.d], sort.grades[num.c],
                 sort.grades[num.b], 100),
      labels = c("D", "C", "B", "A"),
      include.lowest = TRUE  # keep a grade of exactly 0 in the D band
  )
}
letter.grade(c(60, 39, 58, 36, 41, 44, 89, 17, 47, 63))
Note in particular that because you want four categories, you must specify five (not four) breaks (including the lower and upper limits), in the same way that if you want to lay 100m of fencing with a post every metre, you'll need 101 fence posts.
Use quantile() and cut():
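letter.grade <- function(samp){
  # Break points at the 100th, 60th, 30th, 10th and 0th percentiles
  q <- quantile(samp, c(1, 0.6, 0.3, 0.1, 0))
  res <- cut(samp, q, include.lowest = TRUE)  # cut() sorts the breaks itself
  levels(res) <- c("D", "C", "B", "A")
  return(res)
}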
letter.grade(c(60, 39, 58, 36, 41, 44, 89, 17, 47, 63))
[1] A C A C B B A D B A
Levels: D C B A
If you don't want res to be a factor, convert it with as.numeric() (which gives the level codes 1 to 4) or as.character() (which gives the letters).
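Both answers take the cut() breaks from the data, which fails with "'breaks' are not unique" when many students share a score. A rank-based variant, a different technique from the answers above and sketched here as an assumption, computes each grade's position from the top as a fraction of the class size and cuts that fraction at 0.4, 0.7 and 0.9. With ties.method = "min", tied scores share the better letter. letter.grade.rank() is a hypothetical name.

```r
# Hypothetical rank-based variant: top 40% A, next 30% B, next 20% C, last 10% D
letter.grade.rank <- function(grades) {
  # position from the top as a fraction of the class (ties share the best position)
  share <- rank(-grades, ties.method = "min") / length(grades)
  cut(share, breaks = c(0, 0.4, 0.7, 0.9, 1),
      labels = c("A", "B", "C", "D"))
}

letter.grade.rank(c(60, 39, 58, 36, 41, 44, 89, 17, 47, 63))
# [1] A C A C B B A D B A
# Levels: A B C D
```

On the example it assigns the same letters as the answers above, and a class where everyone has the same score simply gets all A's instead of an error.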

R finding relative maximum from outliers

Suppose I have a vector of numbers that I want to find a general cutoff for. For example:
x <- c(35, 2, 3, 30, 1, 4, 33, 6, 36)
In this case, I would want to extract only the subset that contains 35, 30, 33 and 36, so the cutoff would be at 30. Without hard-coding a definite cutoff, I would like my code to adapt to different vectors of numbers in order to find that cutoff.
Another example would be:
x <- c(1, 20, 42, 13, 118, 149, 130, 30, 11, 32, 120, 0.5, 0.03)
In this case, a reasonable cutoff would be around 118.
Currently I am hard-coding the cutoffs because I am dealing with simple cases; however, I would like to make this process more modular for more variable vectors.
You could use the quantile() function:
cutoff <- function(y, prob=0.7) y[y > quantile(y, prob)]
x <- c(35, 2, 3, 30, 1, 4, 33, 6, 36)
cutoff(x)
[1] 35 33 36
x <- c(1, 20, 42, 13, 118, 149, 130, 30, 11, 32, 120, 0.5, 0.03)
cutoff(x)
[1] 118 149 130 120
And you can define a different probability as desired
cutoff(x, 0.8)
[1] 149 130 120
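Note that with the default prob = 0.7 the quantile cutoff drops 30 in the first example, although the question wanted it included. An alternative heuristic, not based on quantiles, is to sort the values and split at the largest gap between neighbouring values. This is a sketch of a different technique, assuming the values of interest form one high cluster separated from the rest by a clear jump; gap_cutoff() and above_gap() are hypothetical names.

```r
# Hypothetical helper: cutoff = smallest value above the largest jump in sorted y
gap_cutoff <- function(y) {
  s <- sort(y)
  k <- which.max(diff(s))  # position just below the largest jump
  s[k + 1]                 # smallest value above the jump
}
above_gap <- function(y) y[y >= gap_cutoff(y)]

above_gap(c(35, 2, 3, 30, 1, 4, 33, 6, 36))
# [1] 35 30 33 36
above_gap(c(1, 20, 42, 13, 118, 149, 130, 30, 11, 32, 120, 0.5, 0.03))
# [1] 118 149 130 120
```

Unlike a fixed probability, this adapts to how many values sit in the top cluster, but it will pick a misleading split if the data has several jumps of similar size.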
