Represent visually a vector of index - R - r

I have a "logical" vector of N components like this:
0 0 0 0 0 1 1 1 1 0 1 0 1 0 ...
I want to show a vector/matrix where the elements are colors. Element i is one color if element i of my logical vector is 0 and another color otherwise. The goal is to represent a logical vector visually.

You can use your logical vector in a col= argument for plotting, and the logical will be coerced into numeric. So you could do
logi_vec <- sample(c(TRUE, FALSE), 20, replace = TRUE)
# [1] TRUE FALSE TRUE FALSE FALSE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE TRUE FALSE
x <- rnorm(20)
# simple plotting, the pch=16 produces a solid dot
plot(x, col=logi_vec+1, pch=16) # black vs. red
plot(x, col=logi_vec+2, pch=16) # red vs. green
plot(x, col=2*(logi_vec+1), pch=16) # red vs. blue
etc.
Note that this will work exactly the same way with a vector of 0/1 as with FALSE/TRUE.
If you want to see which colors correspond with which numbers on your machine, check out
palette()
# [1] "black" "red" "green3" "blue" "cyan" "magenta" "yellow" "gray"
So on my machine, a color of value 1 is black, 2 is red, etc. Check out ?palette to see how to change the default values.
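If you would rather not depend on the palette at all, a minimal sketch is to map the vector to color names yourself and draw it as a strip. The vector `v` holds the values shown in the question, and the two colors are arbitrary choices for illustration.

```r
# The 0/1 values shown in the question (the original vector continues past these)
v <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0)

# Map each element to an explicit color name: 1/TRUE -> "red", 0/FALSE -> "grey80"
cols <- ifelse(as.logical(v), "red", "grey80")

# One unit-height bar per element, with no gaps, so the vector reads as a color strip
barplot(rep(1, length(v)), col = cols, border = NA, space = 0,
        axes = FALSE, names.arg = seq_along(v), cex.names = 0.7)
```

Because the colors are named explicitly, the result does not change if someone has modified `palette()`.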

Not exactly sure what your expected output is, but maybe something like:
x <- structure(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0), .Dim = c(14L,
1L))
image(x, col = grey.colors(start = 1, end = 0, n = 2))
Edit: A nicer version:
z <- structure(c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0), .Dim = c(14L,
1L))
x <- 1:nrow(z)
y <- 1:ncol(z)
image(x, y, z, col = grey.colors(start = 1, end = 0, n = 2), yaxt = "n", xaxs = "r")

Related

R Manipulating List of Lists With Conditions / Joining Data

I have the following data showing 5 possible kids to invite to a party and what neighborhoods they live in.
I have a list of solutions as well (binary indicators of whether the kid is invited or not; e.g., the first solution invites Kelly, Gina, and Patty).
data <- data.frame(c("Kelly", "Andrew", "Josh", "Gina", "Patty"), c(1, 1, 0, 1, 0), c(0, 1, 1, 1, 0))
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1), c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))
I'm looking for a way to now filter the solutions in the following ways:
a) Only keep solutions where there are at least 3 kids from both neighborhood A and neighborhood B (one kid can count as one for both if they're part of both)
b) Only keep solutions that have at least 3 kids selected (i.e., sum >= 3)
I think I need to somehow join data to the solutions in solutions, but I'm a bit lost on how to manipulate everything since the solutions are stuck in lists. Basically looking for a way to add entries to every solution in the list indicating a) how many kids the solution has, b) how many kids from neighborhood A, and c) how many kids from neighborhood B. From there I'd have to somehow filter the lists to only keep the solutions that satisfy >= 3?
Thank you in advance!
I wrote a little function to check each solution and return TRUE or FALSE based on your requirements. Passing your solutions to this using sapply() will give you a logical vector, with which you can subset solutions to retain only those that met the requirements.
check_solution <- function(solution, data) {
data <- data[as.logical(solution),]
sum(data[["Neighborhood A"]]) >= 3 && sum(data[["Neighborhood B"]]) >= 3
}
### No need for the function to test whether `sum(solution) >= 3`, since
### this will *always* be true if either neighborhood sum is >= 3.
tests <- sapply(solutions, check_solution, data = data)
# FALSE FALSE FALSE FALSE FALSE
solutions[tests]
# list()
### none of the `solutions` provided actually meet criteria
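The question also asked for per-solution counts (kids in total, kids from A, kids from B). One way to get them, sketched here in base R, is to stack the solutions into a matrix and multiply by the neighborhood indicators. The names `sol_mat`, `counts` and `summary_tbl` are only illustrative.

```r
data <- data.frame(
  Kid = c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  `Neighborhood A` = c(1, 1, 0, 1, 0),
  `Neighborhood B` = c(0, 1, 1, 1, 0),
  check.names = FALSE
)
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

# One row per solution, one column per kid
sol_mat <- do.call(rbind, solutions)

# Kids per neighborhood for each solution:
# (solutions x kids) %*% (kids x neighborhoods)
counts <- sol_mat %*% as.matrix(data[, -1])

# Total kids plus the per-neighborhood counts, one row per solution
summary_tbl <- data.frame(n_kids = rowSums(sol_mat), counts, check.names = FALSE)
summary_tbl

# Criterion (b) on its own: keep solutions with at least 3 kids
solutions[summary_tbl$n_kids >= 3]
```

The same table can be filtered on the neighborhood columns to apply criterion (a).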
Edit: OP asked in the comments how to test against all neighborhoods in the data, and return TRUE if a specified number of neighborhoods have enough kids. Below is a solution using dplyr.
library(dplyr)
data <- data.frame(
c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
c(1, 1, 0, 1, 0),
c(0, 1, 1, 1, 0),
c(1, 1, 1, 0, 1),
c(0, 1, 1, 1, 1)
)
names(data) <- c("Kid", "Neighborhood A", "Neighborhood B", "Neighborhood C",
"Neighborhood D")
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))
check_solution <- function(solution,
data,
min_kids = 3,
min_neighborhoods = NULL) {
neighborhood_tests <- data %>%
filter(as.logical(solution)) %>%
summarize(across(starts_with("Neighborhood"), ~ sum(.x) >= min_kids)) %>%
as.logical()
# require all neighborhoods by default
if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
sum(neighborhood_tests) >= min_neighborhoods
}
tests1 <- sapply(solutions, check_solution, data = data)
solutions[tests1]
# list()
tests2 <- sapply(
solutions,
check_solution,
data = data,
min_kids = 2,
min_neighborhoods = 3
)
solutions[tests2]
# [[1]]
# [1] 1 0 0 1 1
#
# [[2]]
# [1] 0 1 0 1 1
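If you'd prefer to avoid the dplyr dependency, the same check can be sketched in base R with `colSums()`. The name `check_solution_base` is made up for this example; it assumes, like the dplyr version, that the neighborhood columns all start with "Neighborhood".

```r
data <- data.frame(
  Kid = c("Kelly", "Andrew", "Josh", "Gina", "Patty"),
  `Neighborhood A` = c(1, 1, 0, 1, 0),
  `Neighborhood B` = c(0, 1, 1, 1, 0),
  `Neighborhood C` = c(1, 1, 1, 0, 1),
  `Neighborhood D` = c(0, 1, 1, 1, 1),
  check.names = FALSE
)
solutions <- list(c(1, 0, 0, 1, 1), c(0, 0, 0, 1, 1), c(0, 1, 0, 1, 1),
                  c(1, 0, 1, 0, 1), c(0, 1, 0, 0, 1))

check_solution_base <- function(solution, data, min_kids = 3,
                                min_neighborhoods = NULL) {
  # Keep only the invited kids and the neighborhood indicator columns
  nb_cols <- grep("^Neighborhood", names(data), value = TRUE)
  picked <- data[as.logical(solution), nb_cols, drop = FALSE]
  # One TRUE/FALSE per neighborhood: does it have enough invited kids?
  neighborhood_tests <- colSums(picked) >= min_kids
  # Require all neighborhoods by default
  if (is.null(min_neighborhoods)) min_neighborhoods <- length(neighborhood_tests)
  sum(neighborhood_tests) >= min_neighborhoods
}

tests <- vapply(solutions, check_solution_base, logical(1),
                data = data, min_kids = 2, min_neighborhoods = 3)
solutions[tests]
```

`vapply()` is used instead of `sapply()` so that the result is guaranteed to be a logical vector.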

if_else with haven_labelled column fails because of wrong class

I have the following data:
dat <- structure(list(value = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
label = "value: This is my label",
labels = c(`No` = 0, `Yes` = 1),
class = "haven_labelled"),
group = structure(c(1, 2, 1, 1, 2, 3, 3, 1, 3, 1, 3, 3, 1, 2, 3, 2, 1, 3, 3, 1),
label = "my group",
labels = c(first = 1, second = 2, third = 3),
class = "haven_labelled")),
row.names = c(NA, -20L),
class = c("tbl_df", "tbl", "data.frame"),
label = "test.sav")
As you can see, the data uses a special class from the tidyverse's haven package: so-called labelled columns.
Now I want to recode my initial value variable such that:
if group equals 1, value should stay the same, otherwise it should be missing
I was trying the following, but getting an error:
dat_new <- dat %>%
mutate(value = if_else(group != 1, NA, value))
# Error: `false` must be a logical vector, not a `haven_labelled` object
As far as I understand, dplyr's if_else requires its true and false arguments to be of the same class, and since there is no NA equivalent for class labelled (similar to NA_real_ for doubles), the code fails. Is that right?
So, how can I recode my initial variable and preserve the labels?
I know I could change my code above and replace the if_else by R's base version ifelse. However, this deletes all labels and coerces the value column to a numeric one.
You can try dplyr::case_when for cases where group == 1. If no cases are matched, NA is returned:
dat %>% mutate(value = case_when(group == 1 ~ value))
You can create an NA value in the haven_labelled class with this ugly code:
haven::labelled(NA_real_, labels = attr(dat$value, "labels"))
I'd recommend writing a function for that, e.g.
labelled_NA <- function(value)
haven::labelled(NA_real_, labels = attr(value, "labels"))
and then the code for your mutate isn't quite so ugly:
dat_new <- dat %>%
mutate(value = if_else(group != 1, labelled_NA(value), value))
Then you get
> dat_new[1:5,]
# A tibble: 5 x 2
      value      group
  <dbl+lbl>  <dbl+lbl>
1    0 [No] 1 [first]
2   NA      2 [second]
3    0 [No] 1 [first]
4    0 [No] 1 [first]
5   NA      2 [second]
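As for why base ifelse loses the labels: it builds its result from the test vector, so attributes on the yes/no arguments are not carried over. Here is a small base R sketch using a plain numeric vector with a "labels" attribute. This is a stand-in, not a real haven_labelled object, so treat how the subassignment behaves with the actual class as an untested assumption.

```r
# Stand-in for a labelled column: plain doubles with a "labels" attribute
value <- structure(c(0, 1, 0), labels = c(No = 0, Yes = 1))
group <- c(1, 2, 1)

# ifelse() starts from the logical test, so the attribute is gone
via_ifelse <- ifelse(group != 1, NA, value)
attributes(via_ifelse)   # NULL

# Subassignment modifies the original vector in place and keeps its attributes
via_assign <- value
via_assign[group != 1] <- NA
attr(via_assign, "labels")
```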

How to color code a categorical variable in a mosaic

I am trying to display a relationship between my categorical variables. I finally got my data into what I believe is a contingency table
subs_count
##                [,1] [,2] [,3] [,4]
## carbohydrate      2    0   11    2
## cellulose        18    0   60    0
## chitin            0    4    0    4
## hemicellulose    21    3   10    0
## monosaccharide    3    0    0    0
## pectin            8    0    2    2
## starch            1    0    4    0
Where each column represents an organism. So for my plot I put in
barplot(subs_count, ylim = c(0, 100), col = predicted.substrate,
xlab = "organism", ylab = "ESTs per substrate")
But my substrates are not consistently the same color. What am I doing wrong?
Your data seems to be a matrix with row names which is close to a contingency table in R but not exactly the same. Some plotting methods have additional support for tables.
More importantly, I couldn't run your code because it is unclear what predicted.substrate is. If it were a palette with 7 colors then it should do what you intend to do (or at least what I think you intend).
I replicated your data with:
subs_count <- structure(c(2, 18, 0, 21, 3, 8, 1, 0, 0,
4, 3, 0, 0, 0, 11, 60, 0, 10, 0, 2, 4, 2, 0, 4, 0, 0, 2, 0),
.Dim = c(7L, 4L), .Dimnames = list(c("carbohydrate", "cellulose",
"chitin", "hemicellulose", "monosaccharide", "pectin", "starch"), NULL))
And then transformed them into a table by:
subs_count <- as.table(subs_count)
names(dimnames(subs_count)) <- c("EST", "Organism")
Then I used a qualitative palette from the colorspace package:
subs_pal <- colorspace::qualitative_hcl(7)
With that palette, your barplot looks reasonable:
barplot(subs_count, ylim = c(0,100), col = subs_pal,
xlab = "organism", ylab = "ESTs per substrate", legend = TRUE)
And a mosaic display (as indicated in your title) would be:
mosaicplot(t(subs_count), col = subs_pal, off = 5, las = 1, main = "")
For visualizing patterns of dependence (or rather departures from independence) a mosaic plot shaded with residuals from the independence model might be even more useful.
mosaicplot(t(subs_count), shade = TRUE, off = 5, las = 1, main = "")
More refined versions of shaded mosaic displays are available in package vcd (see doi:10.18637/jss.v017.i03).
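If you don't want to install colorspace, a rough base R equivalent is sketched below. It assumes R >= 3.6.0 for `hcl.colors()`, and the choice of the "Set 2" palette is arbitrary. The key point is the same as above: supply exactly one color per row (substrate), so each substrate gets the same color in every bar.

```r
subs_count <- matrix(
  c(2, 18, 0, 21, 3, 8, 1,
    0, 0, 4, 3, 0, 0, 0,
    11, 60, 0, 10, 0, 2, 4,
    2, 0, 4, 0, 0, 2, 0),
  nrow = 7,
  dimnames = list(c("carbohydrate", "cellulose", "chitin", "hemicellulose",
                    "monosaccharide", "pectin", "starch"), NULL)
)

# One color per substrate, named so the mapping is explicit
subs_pal <- setNames(hcl.colors(nrow(subs_count), palette = "Set 2"),
                     rownames(subs_count))

# Stacked bars: the i-th color is used for the i-th row in every column
barplot(subs_count, col = subs_pal, ylim = c(0, 100), names.arg = 1:4,
        xlab = "organism", ylab = "ESTs per substrate")
legend("topright", legend = names(subs_pal), fill = subs_pal, bty = "n")
```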

Plot correlation matrix with R in specific data range

I have used corrplot package to plot my data-pairs. But all the relationships in my data are positive.
Mydata<-read.csv("./xxxx.csv")
M <-cor(Mydata)
corrplot(M,,col=rev(brewer.pal(n=8, name="RdYlBu")))
Using ggcorr, I also can't find any solution to deal with the issue.
How to generate a user-defined colormap with the corresponding range from 0 to 1?
If you are trying to map the entire range of the colormap to only the positive correlations, you could use col = rep(rev(brewer.pal(n=8, name="RdYlBu")), 2). This repeats the color sequence, and then cl.lim = c(0,1) forces corrplot to use only the 2nd half of the sequence, mapped to the range 0 to 1.
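library(corrplot)
library(RColorBrewer)
par(xpd = TRUE) # allow drawing outside the plot region so labels are not clipped
corrplot(M, type = "upper",
col = rep(rev(brewer.pal(n = 8, name = "RdYlBu")), 2),
cl.lim = c(0, 1), # newer corrplot releases name this argument col.lim
mar = c(1, 0, 1, 0))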
Some reproducible data
set.seed(12)
x = (1:100)/100
Mydata = data.frame(a=x^runif(1, 0, 50),
b=x^runif(1, 0, 50),
c=x^runif(1, 0, 50),
d=x^runif(1, 0, 50),
e=x^runif(1, 0, 50),
f=x^runif(1, 0, 50),
g=x^runif(1, 0, 50),
h=x^runif(1, 0, 50),
i=x^runif(1, 0, 50))
M = cor(Mydata)
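To see why repeating the palette has this effect, here is a base R sketch of the idea. It assumes the supplied colors are spread evenly over [-1, 1], one bin per color, which is what the trick relies on; it is not corrplot's internal code. The palette `pal8` is a stand-in for the RColorBrewer one.

```r
# Stand-in 8-color palette (any 8 colors work for this illustration)
pal8 <- colorRampPalette(c("blue", "lightyellow", "red"))(8)
pal16 <- rep(pal8, 2)

# Assumption: colors are spread evenly over [-1, 1], one bin per color
breaks <- seq(-1, 1, length.out = length(pal16) + 1)

# Some positive correlations and the bin each one falls into
r <- c(0.05, 0.5, 0.95)
idx <- findInterval(r, breaks, rightmost.closed = TRUE)
idx

# Every positive value lands in bins 9 to 16, i.e. in the second copy of pal8
pal16[idx]
```

So the first copy of the palette is only ever used for negative values, and the full 8-color sequence ends up covering 0 to 1.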

Converting in R cartesian coordinates to barycentric ones

I have three reference vectors
a ( 0, 0, 1 )
b ( 0, 1, 0 )
c ( 1, 0, 0 )
and will have measurements such as
x( 0, 0.5, 0.3 )
which I want to plot in a 2D figure as a triangle, whose vertices would correspond to a, b and c.
In Matlab there is a straightforward function to do that:
http://fr.mathworks.com/help/matlab/ref/triangulation.cartesiantobarycentric.html?s_tid=gn_loc_drop
Does anyone know an equivalent in R, or should I implement the maths?
Sure, you can go back and forth between cartesian and barycentric.
Bary to Cart:
library(geometry)
## Define the reference triangle by its three vertices (one per row)
X <- rbind(
c( 0, 0, 1 ),
c( 0, 1, 0 ),
c( 1, 0, 0 ))
## Barycentric coordinates of points (one point per row)
beta <- rbind(c( 0, 0.5, 0.3 ),
c(0.1, 0.8, 0.1),
c(0.1, 0.8, 0.1))
## Plot triangle and points
trimesh(rbind(1:3), X)
text(X[,1], X[,2], 1:3) # Label vertices
P <- bary2cart(X, beta)
Cart to Bary:
## Define simplex in 2D (i.e. a triangle)
X <- rbind(c(0, 0),
c(0, 1),
c(1, 0))
## Cartesian coordinates of points
P <- rbind(c(0.5, 0.5),
c(0.1, 0.8))
## Plot triangle and points
trimesh(rbind(1:3), X)
text(X[,1], X[,2], 1:3) # Label vertices
points(P)
cart2bary(X, P)
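If you would rather implement the maths yourself, the conversion for a triangle is a small linear system: the weights must reproduce the point as a weighted sum of the vertices, and must sum to 1. A minimal base R sketch follows; the helper names `cart_to_bary` and `bary_to_cart` are made up for this example and are not part of the geometry package.

```r
# Triangle vertices, one per row (same triangle as in the Cart to Bary example)
X <- rbind(c(0, 0),
           c(0, 1),
           c(1, 0))

# Points in Cartesian coordinates, one per row
P <- rbind(c(0.5, 0.5),
           c(0.1, 0.8))

# Hypothetical helper: solve sum(beta_i * vertex_i) = p together with sum(beta_i) = 1
cart_to_bary <- function(X, P) {
  A <- rbind(t(X), 1)   # 3 x 3 system matrix: vertex coordinates plus a row of ones
  B <- rbind(t(P), 1)   # one right-hand side column per point
  t(solve(A, B))        # one row of barycentric coordinates per point
}

# Hypothetical helper: the reverse direction is a weighted sum of the vertices
bary_to_cart <- function(X, beta) beta %*% X

beta <- cart_to_bary(X, P)
beta
bary_to_cart(X, beta)   # recovers P
```

Note that barycentric weights have to sum to 1, so a measurement like (0, 0.5, 0.3) would need to be divided by its sum before being treated as barycentric coordinates.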
