Elongated kde2d from MASS R library - r

When I use the kde2d function from the MASS package on two points in a square (in my case 1000 x 1000 px), I get elongated Gaussians whenever the x difference between the points is very different from the y difference:
library(MASS)
library(tibble)
par(mfrow = c(2, 1))
points_1 <- tribble(
~x, ~y,
100, 800,
150, 500
) # x2-x1 = 50; y2-y1 = -300
kde_1 <- kde2d(points_1$x, points_1$y, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_1)
points_2 <- tribble(
~x, ~y,
100, 800,
650, 700
) # x2-x1 = 550; y2-y1 = -100
kde_2 <- kde2d(points_2$x, points_2$y, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_2)
How can I obtain a round kde2d for two points? I need something like this:

As the help page for kde2d says, it will use the function bandwidth.nrd to compute the bandwidth in each coordinate. You want those to be the same, so specify the h value as a scalar:
# average the two per-axis bandwidths; mean() takes a single vector
h <- mean(c(bandwidth.nrd(points_1$x), bandwidth.nrd(points_1$y)))
kde_3 <- kde2d(points_1$x, points_1$y, h = h, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_3)
which gives me
You might want a larger value for h, e.g. using max instead of mean:
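A minimal sketch of the max variant, assuming the two points from the question. The names h_x, h_y, h_max and kde_4 are introduced here for illustration only; asp = 1 is passed to image so that equal bandwidths also look circular on screen.

```r
library(MASS)

# The two points from the question
x <- c(100, 150)
y <- c(800, 500)

# Per-axis normal-reference bandwidths, which kde2d would use by default
h_x <- bandwidth.nrd(x)
h_y <- bandwidth.nrd(y)

# Use the larger of the two for both axes so the kernels are circular
h_max <- max(h_x, h_y)

# A scalar h is applied to both directions
kde_4 <- kde2d(x, y, h = h_max, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_4, asp = 1)
```

Because y is much more spread out than x here, h_max equals the y bandwidth, so the blobs are wider than with the averaged value.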

Related

how to draw a matrix image with R

I'm trying to draw a matrix image similar to this one from a known matrix. In this image each square represents the frequency of the corresponding number on the vertical axis, and a darker square means a higher frequency of that number. For example, my known matrix could be generated as
Ture <- rep(8, 100)
PA <- rep(7, 100)
ED <- sample(6:8, 100, replace = T)
ER <- rep(0, 100)
IC1 <- sample(1:2, 100, replace = T)
NE <- sample(3:4, 100, replace = T)
BCV <- sample(5:7, 100, replace = T)
Oracle <- sample(5:6, 100, replace = T)
M <- rbind(Ture, PA, ED, ER, IC1, NE, BCV, Oracle)
Thanks very much!
Further to my comment above, you can do the following
image(M, axes = F, col = rev(gray.colors(12, start = 0, end = 1)))
axis(1, at = seq(0, 1, length.out = nrow(M)), labels = rownames(M))
axis(2, at = seq(0, 1, length.out = 11), labels = seq(0, 100, length.out = 11))
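If the squares should show how often each value occurs per row, as the question describes, one possible approach is to tabulate the counts first and plot those. This is only a sketch under assumptions: it uses a reduced version of the question's matrix, a fixed seed, and the hypothetical names vals and freq.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# A reduced version of the question's matrix: one row per method, 100 draws each
M <- rbind(
  PA = rep(7, 100),
  ED = sample(6:8, 100, replace = TRUE),
  ER = rep(0, 100),
  NE = sample(3:4, 100, replace = TRUE)
)

# Count how often each value 0..8 occurs in each row
vals <- 0:8
freq <- t(apply(M, 1, function(r) tabulate(match(r, vals), nbins = length(vals))))
colnames(freq) <- vals

# Methods on the horizontal axis, values on the vertical axis,
# darker squares for higher relative frequency
image(x = seq_len(nrow(freq)), y = vals, z = freq / ncol(M),
      col = rev(gray.colors(12, start = 0, end = 1)),
      axes = FALSE, xlab = "", ylab = "value")
axis(1, at = seq_len(nrow(freq)), labels = rownames(freq))
axis(2, at = vals)
box()
```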

Specifying x values when converting approx() to data frame

I am trying to get a data frame from the output of approx(t,y, n=120) below. My intent is for the input values returned to be in increments of 0.25; for instance, 0, 0.25, 0.5, 0.75, ... so I've set n = 120.
However, the data frame I get doesn't return those input values.
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)
data.frame(approx(t, z, n = 120))
I appreciate any assistance in this matter.
There are 121, not 120, points from 0 to 30 inclusive in steps of 0.25:
length(seq(0, 30, 0.25))
## [1] 121
so use this:
approx(t, z, n = 121)
Another approach is:
approx(t, z, xout = seq(min(t), max(t), 0.25))
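A short check of the xout approach, using the data from the question. The name out is introduced here for illustration. With n = 120 the spacing would be 30/119 rather than 0.25, which is why the original call did not return the intended grid.

```r
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, 0.9478, 0.8668, 0.7438, 0.3945)

# Ask for the interpolation points explicitly instead of relying on n
out <- data.frame(approx(t, z, xout = seq(min(t), max(t), by = 0.25)))

head(out)        # x runs 0, 0.25, 0.5, ...
nrow(out)        # one row per requested x value
range(diff(out$x))  # constant spacing of 0.25
```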

How can I predict values in factorial experiments (2^k) with centre points in R?

How can I predict values in factorial experiments with centre points in R using FrF2 package with the predict function or using the broom package?
My code:
library(FrF2)
plan.person = FrF2(nfactors = 5, resolution = 5, replications = 2,
ncenter = 1, randomize = FALSE,
factor.names = list(
A = c(8, 5),
B = c(70, 30),
C = c(0.5, 0),
D = c(1000, 700),
E = c(70, 10)))
resp <- c(84.55, 66.34, -1, 69.18, 73.01, 64.52, 0.73, 47.61, 68.18, 59.87,
26, 72.57, 78.08, 73.81, 26, 59.38, 71.41, 88.64, 64.92, 4, 68.81,
80, 69.66, -1.36, 54.50, 79.24, 78.53, -1, 72.63, 89.97, 87.98,
-11, 65.68, 82.46)
newplan <- add.response(design = plan.person, response = resp)
model <- lm(newplan, use.center = T)
# summary(model)
d <- within(newplan, {
A <- as.numeric(as.character(A))
B <- as.numeric(as.character(B))
C <- as.numeric(as.character(C))
D <- as.numeric(as.character(D))
E <- as.numeric(as.character(E)) })
A = seq(5, 8, 1)
B = seq(30, 70, length.out = length(A))
C = seq(0, 0.5, length.out = length(A))
D = seq(700, 1000, length.out = length(A))
E = seq(10, 70, length.out = length(A))
data <- expand.grid(A = A, B = B,
C = C, D = D,
E = E)
data$p <- predict(model, newdata = data)
Because of the center point, the following message appears:
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
  variable lengths differ (found for 'center')
"A two-level experiment with center points can detect, but not fit, quadratic effects."
(https://www.itl.nist.gov/div898/handbook/pri/section3/pri336.htm)
That is, R can't predict these values because you need to make additional assumptions about what the curve looks like to predict points not at your design points.
Note that computationally you can get the software to work by adding a center term: the error occurs because this term is in the regression but not in the new data set. You could add one with data$center <- FALSE (because none of the points in data are at the center), but this will not do the right thing. It does not take the potential curvature into account when predicting non-central points; it simply predicts a twisted plane (that is, linear with interactions) with a single bump at the center.
Of course, it's also equivalent to just fitting the model with use.center=FALSE, as the center point doesn't affect the fit of the other points.
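The equivalence can be illustrated with a small base-R sketch. It does not use FrF2; it assumes a hypothetical coded 2^2 design with two replicates, one center point, made-up responses, and a numeric 0/1 center indicator.

```r
# Hypothetical coded 2^2 design, two replicates, plus one center point
d <- data.frame(
  A = c(-1, 1, -1, 1, -1, 1, -1, 1, 0),
  B = c(-1, -1, 1, 1, -1, -1, 1, 1, 0),
  y = c(10, 14, 11, 19, 9, 15, 12, 18, 16)   # made-up responses
)
d$center <- as.numeric(d$A == 0 & d$B == 0)  # 1 only for the center run

# Fit with a separate center term, and fit on the corner points only
fit_center  <- lm(y ~ A * B + center, data = d)
fit_corners <- lm(y ~ A * B, data = d[d$center == 0, ])

# The factorial coefficients agree between the two fits
coef(fit_center)[c("(Intercept)", "A", "B", "A:B")]
coef(fit_corners)

# Predicting a non-central point with center = 0 gives the same twisted plane
new_pts <- data.frame(A = 0.5, B = -0.5, center = 0)
p_center  <- predict(fit_center, newdata = new_pts)
p_corners <- predict(fit_corners, newdata = new_pts)
```

The center term only absorbs the residual at the center run, so neither prediction carries any information about curvature away from the center.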
If you remove the center term, you can do the following after model <- lm(newplan, use.center = T):
1. Keep the coefficients with p-values < 0.05:
library(dplyr) # provides %>%, slice, filter, mutate, select, rename
coe <- broom::tidy(model) %>%
slice(-7) %>% # remove the center term
filter(p.value < 0.05)
m_beta <- coe$estimate
2. Build a grid:
A = seq(5, 8, 0.5)
B = seq(30, 70, length.out = length(A))
exp <- expand.grid(A = A, B = B) %>%
mutate(bo = as.numeric(1)) %>%
mutate(ult = A*B) %>%
select(bo, A, B, ult) %>%
as.matrix()
3. Compute the predictions:
reg <- t(m_beta %*% t(exp))
exp <- cbind(exp, reg) %>%
as.data.frame() %>%
rename(reg = V5)
But I believe this only solves the computational problem or simplifies it. I believe linear regression should be redone as well. But with this code you can explore and see what other errors exist.

MASS packages' "fitdistr": Error when dealing with manipulated random data

Background:
Below I have generated some random beta data using R and manipulated the shape of the data a bit to arrive at what I call "Final" in my code. I then plot a histogram of "Final".
Question:
I'm wondering why I get the following error when I try to fit a "beta" distribution to the "Final" data using the "fitdistr" function from the MASS package. Any suggestions on how to avoid this error?
Error in stats::optim(x = c(0.461379379270288, 0.0694261016478062, 0.76934266883081, :
initial value in 'vmmin' is not finite
Here is my R code:
require(MASS)
## Generate some data and manipulate it
set.seed(47)
Initial = rbeta(1e5, 2, 3)
d <- density(Initial)
b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
b.5 <- b.5 / (max(b.5) / max(d$y)) # Scale down to max of original density
b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
b.6 <- b.6 / (max(b.6) / max(d$y))
# Collect maximum densities at each x to use as sample probability weights
p <- pmax(d$y, b.5, b.6)
Final <- sample(d$x, 1e4, replace = TRUE, prob = p) ## THIS IS MY FINAL DATA
hist(Final, freq = F, ylim = c(0, 2)) ## HERE IS A HISTOGRAM
m <- MASS::fitdistr(Final, "beta", ## RUN THIS TO SEE HOW THE ERROR COMES UP
start = list(shape1 = 1, shape2 = 1))
Here is the code.
It is the same as your code; I just replaced the non-positive values, which lie outside the support of the beta distribution.
library(MASS)
set.seed(47)
Initial = rbeta(1e5, 2, 3)
d <- density(Initial)
b.5 <- dbeta(seq(0, 1, length.out = length(d$y)), 50, 50)
b.5 <- b.5 / (max(b.5) / max(d$y)) # Scale down to max of original density
b.6 <- dbeta(seq(0, 1, length.out = length(d$y)), 60, 40)
b.6 <- b.6 / (max(b.6) / max(d$y))
# Collect maximum densities at each x to use as sample probability weights
p <- pmax(d$y, b.5, b.6)
Final <- sample(d$x, 1e4, replace = TRUE, prob = p) ## THIS IS MY FINAL DATA
hist(Final, freq = F, ylim = c(0, 2)) ## HERE IS A HISTOGRAM
# replace non-positive values with the smallest value > 0
Final[Final<= 0] <- min(Final[Final>0])
hist(Final, freq = F, ylim = c(0, 2))
m <- MASS::fitdistr(x = Final, densfun = "beta",
start = list(shape1 = 1, shape2 = 1))
Here are the shape parameters:
> m
     shape1       shape2
  1.99240852   2.90219720
 (0.02649853) (0.04010168)
Take note that it gives some warnings.
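To see why the original call fails, it may help to look at the log-density at the starting values. The sketch below uses a small simulated sample with one deliberately out-of-range value (x_bad, x_ok, ll_bad and fit are hypothetical names), and it drops values outside (0, 1) instead of replacing them, which is a different choice from the code above.

```r
library(MASS)
set.seed(47)

x <- rbeta(1000, 2, 3)
x_bad <- c(x, -0.01)   # one value outside the support of the beta distribution

# At the start values shape1 = shape2 = 1 the log-density of -0.01 is -Inf,
# so the initial log-likelihood passed to optim is not finite
ll_bad <- sum(dbeta(x_bad, 1, 1, log = TRUE))
ll_bad

# Keep only values strictly inside (0, 1) before fitting
x_ok <- x_bad[x_bad > 0 & x_bad < 1]

# The optimizer may still warn while exploring invalid shape values
fit <- suppressWarnings(
  fitdistr(x_ok, "beta", start = list(shape1 = 1, shape2 = 1))
)
fit
```

Whether to drop or clamp such values depends on how much of the sample falls outside the interval; with many affected points, either choice can distort the fit.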

How to create a dataframe representing a 10000 points unit square?

I have to create a data frame representing a unit square made up of 10 000 points. To achieve that, I need all the combinations of the coordinates x and y, where each one goes from 0 to 1,00. The result should be something like this:
x y
1 0,01 0,01
2 0,01 0,02
n 0,12 0,04
10000 1,00 1,00
I would be very glad if you can help me.
10 000 points are just a 100x100 square.
Here I fix the value of y and describe the 100 values of x for this possibility.
To do this:
df <- data.frame(
x = rep(seq(from = 0, to = 1, length.out = 100), times = 100),
y = rep(seq(from = 0, to = 1, length.out = 100), each = 100)
)
Using @Heroka's suggestion, for the same output:
df<- expand.grid(x = seq(from = 0, to = 1, length.out = 100),
y = seq(from = 0, to = 1, length.out = 100)
)
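Note that seq(0, 1, length.out = 100) gives steps of 1/99, not 0.01. If the coordinates should run 0.01, 0.02, ..., 1.00 as in the example output, a variant along these lines may be closer. It is a sketch: grid_vals and df_exact are hypothetical names, and y is listed first in expand.grid so that it varies fastest, matching the row order shown in the question.

```r
# Exactly 100 values: 0.01, 0.02, ..., 1.00
grid_vals <- (1:100) / 100

# expand.grid varies its first argument fastest, so put y first
# and then reorder the columns to x, y
df_exact <- expand.grid(y = grid_vals, x = grid_vals)[, c("x", "y")]

head(df_exact, 3)
nrow(df_exact)
```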
