How can I match coordinates with their associated values in R?

I have 250 points that I generated within the rectangle (-6, 6) x (-4, 4). If the points are within a certain region they are blue, and if they are outside of it they are red.
Here is the code I used; I defined the region with three squares:
library(sf)
border <- matrix(c(
-6, -4,
-6, 4,
6, 4,
6, -4,
-6, -4
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
# sample random points
rand_points <- st_sample(border, size = 250)
squares1 <- matrix(c(
-4, 0,
-4, 3,
-1, 3,
-1, 0,
-4, 0
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
squares2 <- matrix(c(
-2, -4,
-2, -1,
1, -1,
1, -4,
-2, -4
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
squares3 <- matrix(c(
2, -2,
2, 1,
5, 1,
5, -2,
2, -2
), ncol = 2, byrow = TRUE) |>
sfheaders::sfc_polygon()
squares <- c(squares1, squares2, squares3)
red_vals <- st_difference(rand_points, squares)
blue_vals <- st_intersection(rand_points, squares)
plot(border)
plot(red_vals, add = TRUE, col = "red")
plot(blue_vals, add = TRUE, col = "blue")
My goal is to match each point's coordinates with its expected value in a table.
In the table, the third column is for the blue points and the fourth column is for the red points. If the point at a coordinate is blue, the third column gets +1, otherwise -1; the fourth column works the other way round (+1 for red, -1 otherwise).
So far, I have obtained the coordinates of all the points.
y <- c(red_vals)
x <- c(blue_vals)
cdata <- c(x, y)
coord <- st_coordinates(cdata)
I am now stuck on how to assign x and y to their respective coordinates and record this in a data frame.
Any help is appreciated.

You could do:
# Red points: those that intersect none of the squares
red_vals <- rand_points[rowSums(st_intersects(rand_points, squares, sparse = FALSE)) == 0]
# Blue points: those inside one of the squares
blue_vals <- st_intersection(rand_points, squares)
df <- rbind(cbind(st_coordinates(red_vals), PosGroup = 1, NegGroup = -1),
            cbind(st_coordinates(blue_vals), PosGroup = -1, NegGroup = 1)) |>
  as.data.frame()
head(df)
#> X Y PosGroup NegGroup
#> X1 -5.2248158 0.03710509 1 -1
#> X2 -5.8932331 -1.41421992 1 -1
#> X3 -0.0609895 0.26541100 1 -1
#> X4 1.7345333 -3.04312404 1 -1
#> X5 -4.6801643 0.24656851 1 -1
#> X6 1.3190239 3.36491623 1 -1
The first rows are all red points, because the red coordinates were bound first.
To check that the points are classified correctly, use this data frame to draw them with ggplot2:
library(ggplot2)
df |>
ggplot() +
geom_sf(data = squares) +
geom_point(aes(X, Y, color = factor(PosGroup)), pch = 1, size = 3) +
theme_classic() +
scale_color_brewer(palette = "Set1", direction = -1)
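If it helps to see the classification without sf, here is a minimal sketch that swaps st_intersects() for plain coordinate comparisons. It only works because the three squares are axis-aligned. The `sq_bounds` table, the `in_any_square()` helper, the seed and the Blue/Red column names are hypothetical. Unlike the rbind() approach, the rows stay in the order the points were sampled, and the third and fourth columns follow the coding described in the question.

```r
set.seed(42)  # hypothetical seed, only for reproducibility
n <- 250
# Uniform random points in the rectangle [-6, 6] x [-4, 4], like `border`
pts <- data.frame(X = runif(n, -6, 6), Y = runif(n, -4, 4))

# The three squares from the question, stored as bounding boxes
sq_bounds <- data.frame(xmin = c(-4, -2, 2), xmax = c(-1, 1, 5),
                        ymin = c(0, -4, -2), ymax = c(3, -1, 1))

# Hypothetical helper: TRUE if (x, y) lies in any square (edges count as inside)
in_any_square <- function(x, y, sq) {
  any(x >= sq$xmin & x <= sq$xmax & y >= sq$ymin & y <= sq$ymax)
}
inside <- mapply(in_any_square, pts$X, pts$Y, MoreArgs = list(sq = sq_bounds))

# Third column: +1 if blue (inside a square), -1 otherwise; fourth column: the reverse
pts$Blue <- ifelse(inside, 1, -1)
pts$Red <- -pts$Blue
head(pts)
```

For regions that are not axis-aligned rectangles, the sf-based answer above remains the general tool.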

Related

When plotting a correlation matrix with pairs(), how to display long column names in many lines?

After drawing the correlation matrix in R, I read it in binary form in an external program.
However, the column names are so long that pairs() cuts off both ends of each name.
Is there a way to make column names this long appear on multiple lines?
And can I increase the font size of the column names to improve readability?
This is the sample code.
In this case, for example, I want the column
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
to look like
AAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAA
in two lines.
a <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
b <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
c <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
d <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
e <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
f <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
g <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
h <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
i <- runif(100, 0, 30) + rnorm(100, 5, 1) + 10
df <- data.frame(AAAAAAAAAAAAAAAAAAAAAAAAAAA = a,
BBBBBBBBBBBBBBBBBBBBBBBBBBB = b,
CCCCCCCCCCCCCCCCCCCCCCCCCCC = c,
DDDDDDDDDDDDDDDDDDDDDDDDDDD = d,
EEEEEEEEEEEEEEEEEEEEEEEEEEE = e,
FFFFFFFFFFFFFFFFFFFFFFFFFFF = f,
GGGGGGGGGGGGGGGGGGGGGGGGGGG = g,
HHHHHHHHHHHHHHHHHHHHHHHHHHH = h,
IIIIIIIIIIIIIIIIIIIIIIIIIII = i)
pairs(df,
      lower.panel = NULL,
      upper.panel = function(x, y) {
        points(x, y, pch = 20)
        r <- round(cor(x, y, use = "complete.obs"), digits = 2)
        txt <- paste0("R = ", r)
        # Save the user coordinates and restore them when the panel is done
        usr <- par("usr"); on.exit(par(usr = usr))
        par(usr = c(0, 1, 0, 1))
        text(1, 0.95, txt, col = "red", pos = 2, cex = 1.0)
      })
I would begin by using stringi::stri_extract_all() to split a long label into chunks of at most ten characters:
longlabel <- "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
small_chunks <- stringi::stri_extract_all(longlabel,
regex = ".{1,10}")[[1]]
Then join the chunks with newlines using paste():
betterlabel <- paste(small_chunks, collapse = "\n")
And test that it works:
hist(rnorm(100), main = betterlabel)
For your example, wrap all of this in sapply() to rewrite every column name of df:
colnames(df) <- sapply(stringi::stri_extract_all(colnames(df), regex = ".{1,10}"), paste, collapse = "\n")
This gives the desired result.
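The question also asks about font size, which the answer does not cover. pairs() has `labels` and `cex.labels` arguments, so a minimal sketch can pass wrapped labels directly instead of renaming the columns. This version swaps stringi for base R's regmatches()/gregexpr(). The helper name `wrap_label()`, the width of 10, the example data and `cex.labels = 1.5` are assumptions.

```r
# Hypothetical helper: split each string into chunks of at most `width`
# characters and join the chunks with newlines
wrap_label <- function(x, width = 10) {
  chunks <- regmatches(x, gregexpr(sprintf(".{1,%d}", width), x))
  vapply(chunks, paste, character(1), collapse = "\n")
}

set.seed(1)  # illustrative data only
df <- data.frame(AAAAAAAAAAAAAAAAAAAAAAAAAAA = runif(100),
                 BBBBBBBBBBBBBBBBBBBBBBBBBBB = runif(100),
                 CCCCCCCCCCCCCCCCCCCCCCCCCCC = runif(100))

# labels: text drawn on the diagonal; cex.labels: its size
pairs(df,
      labels = wrap_label(names(df)),
      cex.labels = 1.5,
      lower.panel = NULL)
```

Passing `labels` leaves the column names of df untouched, so code elsewhere can still refer to them.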

How to make a 3D histogram with Plotly in R?

library(plotly)
fig1 <- plot_ly(data=DATA, x=~X, name="X",
type="histogram", histnorm="probability")
fig2 <- plot_ly(data=DATA, x=~Y, name="Y",
type="histogram", histnorm="probability")
subplot(fig1, fig2)
Suppose I have a dataset, DATA, and I have already drawn 2D histograms for both variables X and Y. Now I want to plot a 3D histogram of X and Y. Does anyone know how to do this? The description of Plotly for R mentions 3D histograms, but I can't find a tutorial under either 3D Charts or Histograms. The wild guess below just gives me a rotated 2D histogram.
fig <- plot_ly(data=DATA, x=~X, y=~Y,
type="histogram",
histnorm="probability"
) %>%
layout(scene=list(xaxis=list(title="X",zeroline=TRUE),
yaxis=list(title="Y",zeroline=TRUE),
zaxis=list(title="Frequency",zeroline=TRUE)
)
)
fig
Below you can find some preliminary ideas for drawing a 3D histogram with plotly.
See this link to understand how the add_3Dbar function works.
# The matrix with frequencies from a 4 x 3 cross table
z_mtx <- cbind(c(2,4,6,5), c(1,5,9,6), c(2,4,2,3))
# Define a function to add 3D bars
add_3Dbar <- function(p, x, y, z, width = 0.4) {
  w <- width
  # A bar is a box with 8 vertices: 4 on the floor (z = 0), 4 on top (z)
  add_trace(p, type = "mesh3d",
            x = c(x-w, x-w, x+w, x+w, x-w, x-w, x+w, x+w),
            y = c(y-w, y+w, y+w, y-w, y-w, y+w, y+w, y-w),
            z = c(0, 0, 0, 0, z, z, z, z),
            # i, j, k: 0-based vertex indices of the 12 triangles, 2 per face
            i = c(7, 0, 0, 0, 4, 4, 2, 6, 4, 0, 3, 7),
            j = c(3, 4, 1, 2, 5, 6, 5, 5, 0, 1, 2, 2),
            k = c(0, 7, 2, 3, 6, 7, 1, 2, 5, 5, 7, 6),
            # One colour per face, repeated for its two triangles
            facecolor = rep(toRGB(viridisLite::inferno(6)), each = 2))
}
# Draw the 3D histogram
fig <- plot_ly()
for (k1 in 1:nrow(z_mtx)) {
for (k2 in 1:ncol(z_mtx)) {
fig <- fig %>% add_3Dbar(k1,k2,z_mtx[k1,k2])
}
}
fig
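The answer uses a hand-made z_mtx. To build one from the question's DATA, a minimal sketch is to bin X and Y with cut() and cross-tabulate the bins. The simulated DATA, the bin counts (5 and 4) and the normalisation to probabilities (meant to mirror histnorm = "probability") are assumptions. The resulting matrix can then replace z_mtx in the loop above.

```r
# Hypothetical stand-in for the asker's DATA
set.seed(123)
DATA <- data.frame(X = rnorm(500), Y = rnorm(500, mean = 2))

# Split each variable into equal-width bins (bin counts are arbitrary choices)
x_bins <- cut(DATA$X, breaks = 5)
y_bins <- cut(DATA$Y, breaks = 4)

# 2D cross table of bin counts, normalised so all cells sum to 1;
# unclass() turns the table into a plain numeric matrix
z_mtx <- unclass(prop.table(table(x_bins, y_bins)))
dim(z_mtx)
```

Since add_3Dbar() places the bars at integer positions k1 and k2, the bin labels stored in rownames(z_mtx) and colnames(z_mtx) are natural candidates for axis tick labels.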

Calculate a row-specific value based on min

My data looks like this
df <- data.frame(x = c(3, 5, 4, 4, 3, 2),
y = c(.9, .8, 1, 1.2, .5, .1))
I am trying to multiply each x value by either y or 1, whichever is smaller.
df$z <- df$x * min(df$y, 1)
The problem is that min() takes the minimum of the whole column, so every x is multiplied by 0.1.
Instead, I need x multiplied by .9, .8, 1, 1, .5, .1...
We need pmin(), which goes through each value of 'y' and returns the smaller of it and the second argument (which is recycled):
pmin(df$y, 1)
#[1] 0.9 0.8 1.0 1.0 0.5 0.1
Likewise, pmin() accepts any number of arguments (its parameter is ...):
pmin(df$y, 1, 0)
#[1] 0 0 0 0 0 0
To get the output, just multiply 'x' by the pmin() output:
df$x * pmin(df$y, 1)
which can also be written as
with(df, x * pmin(y, 1))
Alternatively, you could use ifelse():
df <- data.frame(x = c(3, 5, 4, 4, 3, 2),
y = c(.9, .8, 1, 1.2, .5, .1))
df$z = ifelse(df$y<1, df$x*df$y, df$x*1)
This compares the values row by row.
Hope it helps! :)
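To see why the original attempt fails and that both answers agree, here is a small self-contained check on the question's data; the extra column name z_ifelse exists only for the comparison.

```r
df <- data.frame(x = c(3, 5, 4, 4, 3, 2),
                 y = c(.9, .8, 1, 1.2, .5, .1))

# min() collapses everything to a single number (0.1), which is then recycled
min(df$y, 1)
# pmin() works element by element; the scalar 1 is recycled to length 6
pmin(df$y, 1)

df$z <- df$x * pmin(df$y, 1)
df$z_ifelse <- ifelse(df$y < 1, df$x * df$y, df$x)
df
```

Both columns hold 2.7, 4, 4, 4, 1.5 and 0.2; pmin() also has an na.rm argument, which ifelse() does not.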

Set X Axis Range on Plotly Frequency Graph

Consider the following graph:
library(dplyr)
library(plotly)
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 6, 8, 8, 8, 10, 10)
y <- as.data.frame(table(x))
y %>%
plot_ly(x = ~ x,
y = ~ Freq,
type = 'bar')
I would like to take this graph and produce a similar one in which the values 7 and 9 are listed with a frequency of zero. Is there a way to get frequency counts over a sequence like seq(0, 10, 1), so that 7 and 9 appear with a frequency of 0? Or is there a way to set the x axis of my plotly graph to run from 0 to 10 even though not all of those numbers are in my data?
I have tried
y %>%
plot_ly(x = ~ x,
y = ~ Freq,
type = 'bar') %>%
layout(xaxis = list(autotick = FALSE, tick0 = 0, tickd = seq(0, 10, 1)))
and also
layout(xaxis = list(autotick = FALSE, tick0 = 0, tickd = c(0, 10)))
but neither one seems to change anything.
I would like my desired output to look like this:
Note that this is just a small sample; my actual data will be much larger, so something like looping through the data and counting every number would be too slow.
A simple solution is to convert x to a factor with levels 1 to 10, so that table() also counts the values that never occur.
library(dplyr)
library(plotly)
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 6, 8, 8, 8, 10, 10)
x <- factor(x, levels=1:10)
y <- as.data.frame(table(x))
y %>%
plot_ly(x = ~ x,
y = ~ Freq,
type = 'bar')
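The question asks for a sequence starting at 0, so here is a minimal sketch with levels 0:10 that checks the zero counts. factor() and table() are vectorized, so no explicit loop over the data is needed even when it is large.

```r
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 5, 6, 8, 8, 8, 10, 10)

# Fixing the levels makes table() report every value in 0:10,
# including values that never occur (0, 7 and 9 here)
y <- as.data.frame(table(x = factor(x, levels = 0:10)))
y
```

The result can be piped into the same plot_ly() call as above. As for the layout attempts in the question, the plotly axis attribute for tick spacing is dtick, not tickd.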

Plot two categorical variables in R

I am using the command below to plot two categorical variables in R.
Gender has 2 levels and Income has 9 levels.
spineplot(main$Gender,main$Income, xlab="Gender", ylab="Income levels: 1 is lowest",xaxlabels=c("Male","Female"))
It produces a chart like the one below.
How can I plot this chart in color?
How can I show the percentage of each income level within each box? For example, income level 1 accounts for 21% of the female data. How can I show 21% inside the dark-coloured area?
Update 1: adding a reproducible example
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
spineplot(fail,gender)
I think it may be easier to do this with barplot(), since spineplot() only returns the contingency table, not the positions of the bars.
The default would be the following, but you can also set the bar widths from some other variable (note that the x-axis coordinates of the bars are returned):
par(mfrow = 1:2)
(barplot(table(gender, fail)))
# [1] 0.7 1.9
(barplot(table(gender, fail), width = table(fail)))
# [1] 10.7 26.9
With some final touches:
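tbl <- table(gender, fail)
# Proportion of each income level within each gender (columns sum to 1)
prp <- prop.table(tbl, 2L)
# y-position of the centre of each stacked segment
yat <- prp / 2 + apply(rbind(0, prp[-nrow(prp), ]), 2L, cumsum)
# Bar widths proportional to group size, mimicking a spineplot
bp <- barplot(prp, width = table(fail), axes = FALSE, col = rainbow(nrow(prp)))
axis(2L, at = yat[, 1L], labels = levels(gender), lwd = 0)
axis(4L)
# Percentage labels drawn in the background colour (col = 0)
text(rep(bp, each = nrow(prp)), yat, sprintf('%0.f%%', prp * 100), col = 0)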
Compare this with:
spineplot(fail, gender, col = rainbow(nlevels(gender)))
An alternative to the interesting solution by @rawr is:
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 2, 1, 1, 1, 1, 1,2,2,2,2),
levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(c(1:9),3))
mypalette <- colorRampPalette(c("lightblue","darkblue"))
tbl <- spineplot(fail, gender, xlab="Gender", ylab="Income levels: 1 is lowest",
xaxlabels=c("Male","Female"), col=mypalette(nlevels(gender)) )
print(tbl)
# Income levels: 1 is lowest
# Gender 1 2 3 4 5 6 7 8 9
# male 2 1 2 1 3 2 2 2 1
# female 1 2 1 2 0 1 1 1 2
print.perc <- function(k, tbl, ndigits=2, str.pct="%") {
# These lines are the same ones spineplot uses
# to compute the x-positions of the stacked bars
nx <- nrow(tbl)
off <- 0.02
xat <- c(0, cumsum(prop.table(margin.table(tbl, 1)) + off))
posx <- (xat[1L:nx] + xat[2L:(nx + 1L)] - off)/2
# Proportions by row (gender)
ptbl <- prop.table(tbl,1)
# Define labels as strings with a given format
lbl <- paste(format(round(100*ptbl[k,], ndigits), nsmall=ndigits), str.pct, sep="")
# Print labels
# cumsum(ptbl[k,])-ptbl[k,]/2 is the vector of y-positions
# for the centers of each stacked bar
text(posx[k], cumsum(ptbl[k,])-ptbl[k,]/2, lbl)
}
# Print income levels for males and females
strsPct <- c("%","%")
for (k in 1:nrow(tbl)) print.perc(k, tbl, ndigits=2, str.pct=strsPct[k])
Hope it can help you.
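Both answers draw the share of each income level within each gender. As a quick numeric check of the values being drawn, here is a minimal sketch that computes them as a table with prop.table() on the reproducible example; rounding to one decimal place is an arbitrary choice.

```r
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
                 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2),
               levels = c(1, 2), labels = c("male", "female"))
gender <- factor(rep(1:9, 3))

# Share of each income level within each gender; margin = 1 makes rows sum to 1
pct <- round(100 * prop.table(table(fail, gender), margin = 1), 1)
pct
```

In this example, income level 1 is 2 of the 16 male rows (12.5%) and 1 of the 11 female rows (about 9.1%).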
