How to create a data frame representing a unit square of 10 000 points? - r

I have to create a data frame representing a unit square made up of 10 000 points. In order to achieve that, I need all the combinations of the coordinates x and y, where each one goes from 0 to 1,00. The result should be something like this:
x y
1 0,01 0,01
2 0,01 0,02
n 0,12 0,04
10000 1,00 1,00
I would be very glad if you can help me.

10 000 points are just a 100 x 100 grid.
Here I fix each value of y in turn and list the 100 values of x that go with it.
To do this:
df <- data.frame(
  x = rep(seq(from = 0, to = 1, length.out = 100), times = 100),
  y = rep(seq(from = 0, to = 1, length.out = 100), each = 100)
)
Using @Heroka's suggestion, for the same output:
df <- expand.grid(x = seq(from = 0, to = 1, length.out = 100),
                  y = seq(from = 0, to = 1, length.out = 100))
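Note that seq(from = 0, to = 1, length.out = 100) steps by 1/99, so it starts at 0 and does not land exactly on 0.01, 0.02, and so on. If the grid shown in the question is wanted (0.01 to 1.00 in steps of 0.01, with y varying fastest), a minimal sketch could look like this. The name `steps` is only illustrative and not part of the original answer:

```r
# 100 values: 0.01, 0.02, ..., 1.00
steps <- (1:100) / 100

# expand.grid varies its first argument fastest, so y is listed first
# and the columns are then reordered to x, y as in the question.
df <- expand.grid(y = steps, x = steps)[, c("x", "y")]

nrow(df)     # 10000
head(df, 2)  # first two rows: (0.01, 0.01) and (0.01, 0.02)
```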

Related

how to draw a matrix image with R

I'm trying to draw a matrix image similar to this one from a known matrix. In the image, each square represents the frequency of the corresponding number on the vertical axis, and a darker square means a higher frequency of that number. For example, my known matrix could be generated as
Ture <- rep(8, 100)
PA <- rep(7, 100)
ED <- sample(6:8, 100, replace = TRUE)
ER <- rep(0, 100)
IC1 <- sample(1:2, 100, replace = TRUE)
NE <- sample(3:4, 100, replace = TRUE)
BCV <- sample(5:7, 100, replace = TRUE)
Oracle <- sample(5:6, 100, replace = TRUE)
M <- rbind(Ture, PA, ED, ER, IC1, NE, BCV, Oracle)
Thanks very much!
Further to my comment above, you can do the following:
image(M, axes = FALSE, col = rev(gray.colors(12, start = 0, end = 1)))
axis(1, at = seq(0, 1, length.out = nrow(M)), labels = rownames(M))
axis(2, at = seq(0, 1, length.out = 11), labels = seq(0, 100, length.out = 11))
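The call above plots the raw entries of M. If the goal is the frequency picture described in the question (one column per method, one row per value, darker meaning more frequent), one possible sketch is to tabulate each row first. This is an assumption about the intended plot, not part of the original answer; `vals` and `freq` are illustrative names, and a smaller M is rebuilt here so the block runs on its own:

```r
set.seed(1)  # only to make the sketch reproducible
M <- rbind(
  Ture = rep(8, 100),
  ED   = sample(6:8, 100, replace = TRUE),
  IC1  = sample(1:2, 100, replace = TRUE)
)
vals <- 0:8  # assumed range of values that can occur in M

# freq[i, j] = how often the value vals[i] occurs in row j of M
freq <- apply(M, 1, function(r) tabulate(r + 1, nbins = length(vals)))
rownames(freq) <- vals

# image() expects z with one row per x position, hence the transpose
image(x = seq_len(nrow(M)), y = vals, z = t(freq), axes = FALSE,
      xlab = "", ylab = "value",
      col = rev(gray.colors(12, start = 0, end = 1)))
axis(1, at = seq_len(nrow(M)), labels = rownames(M))
axis(2, at = vals)
```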

Elongated kde2d from MASS R library

When I use the kde2d function from the MASS package for two points on a square (in my case 1000 x 1000 px), I get elongated Gaussians when the x difference between the points is very different from their y difference:
library(MASS)
library(tibble)
par(mfrow = c(2, 1))
points_1 <- tribble(
~x, ~y,
100, 800,
150, 500
) # x2-x1 = 50; y2-y1 = -300
kde_1 <- kde2d(points_1$x, points_1$y, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_1)
points_2 <- tribble(
~x, ~y,
100, 800,
650, 700
) # x2-x1 = 550; y2-y1 = -100
kde_2 <- kde2d(points_2$x, points_2$y, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_2)
How can I obtain a round kde2d density for two points?
As the help page for kde2d says, it will use the function bandwidth.nrd to compute the bandwidth in each coordinate. You want those to be the same, so specify the h value as a scalar:
# mean() needs a single vector; a second positional argument would be taken as trim
h <- mean(c(bandwidth.nrd(points_1$x), bandwidth.nrd(points_1$y)))
kde_3 <- kde2d(points_1$x, points_1$y, h = h, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_3)
which gives me the same bandwidth in both coordinates (the resulting image is not reproduced here).
You might want a larger value for h, e.g. using max instead of mean.
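The max variant can be sketched as follows. This is a hedged illustration rather than the answerer's exact code: plain vectors `x` and `y` stand in for points_1 so the block only needs MASS, and `kde_4` is an illustrative name:

```r
library(MASS)

# The two points from points_1 in the question
x <- c(100, 150)
y <- c(800, 500)

# One scalar bandwidth for both coordinates: the larger of the two defaults
h <- max(bandwidth.nrd(x), bandwidth.nrd(y))

kde_4 <- kde2d(x, y, h = h, n = 50, lims = c(1, 1000, 1, 1000))
image(kde_4)
```

Because h is a scalar, kde2d applies the same bandwidth to x and y, so the kernels are no longer stretched along one axis.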

Specifying x values when converting approx() to data frame

I am trying to get a data frame from the output of approx(t, z, n = 120) below. My intent is for the input values returned to be in increments of 0.25 (for instance, 0, 0.25, 0.5, 0.75, ...), so I've set n = 120.
However, the data frame I get doesn't return those input values.
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)
data.frame(approx(t, z, n = 120))
I appreciate any assistance in this matter.
There are 121, not 120, points from 0 to 30 inclusive in steps of 0.25:
length(seq(0, 30, 0.25))
## [1] 121
so use this:
approx(t, z, n = 121)
Another approach is:
approx(t, z, xout = seq(min(t), max(t), 0.25))
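Wrapping the xout variant in data.frame() gives the table asked for in the question. A small self-contained sketch, where `res` is just an illustrative name:

```r
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, 0.9478, 0.8668, 0.7438, 0.3945)

# approx() returns a list with components x and y, which
# data.frame() turns into two columns with the same names.
res <- data.frame(approx(t, z, xout = seq(min(t), max(t), by = 0.25)))

nrow(res)     # 121
head(res, 3)  # x = 0, 0.25, 0.5
```

Passing xout fixes the x values explicitly, so there is no need to work out the right n by hand.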

Achieving t random variables with each different df and ncp in R?

I'm trying to generate 5 random t variates using rt(), with each of the 5 having a particular df (respectively, from 1 to 5) and a particular ncp (respectively, seq(0, 1, l = 5)). So, 5 random t-variables each having a different df and a different ncp.
To achieve the above, I tried the below with no success. What could be the efficient R code to achieve what I described above?
vec.rt = Vectorize(function(n, df, ncp) rt(n, df, ncp), c("n", "df", "ncp"))
vec.rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
Or
mapply(FUN = rt, n = 5 , df = 1:5, ncp = seq(0, 1, l = 5))
Note that for:
rt(n = 5, df = 1:5, ncp = seq(0, 1, l = 5))
R gives the following warning:
Warning message:
In if (is.na(ncp)) { :
the condition has length > 1 and only the first element will be used
Rephrasing your question helps to find an answer: you want a sample of size 1 (n = 1) from each of 5 random variables, each having different parameters.
mapply(FUN = rt, n = 1 , df = 1:5, ncp = seq(0, 1, l = 5))
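For comparison, here is a minimal sketch that runs the mapply call next to a fully vectorised construction. The second version is not from the original answer: it relies on the definition of a noncentral t variate, (Z + ncp) / sqrt(V / df) with Z standard normal and V chi-squared with df degrees of freedom, and on rnorm and rchisq being vectorised over their parameters. The names `draws_mapply` and `draws_direct` are illustrative:

```r
set.seed(42)  # arbitrary seed, only for reproducibility
df  <- 1:5
ncp <- seq(0, 1, length.out = 5)

# One draw per (df, ncp) pair via mapply
draws_mapply <- mapply(FUN = rt, n = 1, df = df, ncp = ncp)

# Same distributions built from the definition of the noncentral t:
# rnorm(mean = ncp) supplies Z + ncp, rchisq(df = df) supplies V
draws_direct <- rnorm(5, mean = ncp) / sqrt(rchisq(5, df = df) / df)
```

Both vectors hold five values, one per (df, ncp) pair. The individual numbers differ because the two calls consume different random draws.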

Average Cells of Two or More DataFrames

So I currently have 3 data frames that I need to average each cell in, and I am at a loss of how to do this... Essentially, I need to obtain the mean of the first observation in column 1 for df1, df2, df3, and like that for every single observation.
Here is a reproducible sample data.
set.seed(789)
df1 <- data.frame(
a = runif(100, 0, 100),
b = runif(100, 0, 100),
c = runif(100, 0, 100),
d = runif(100, 0, 100))
df2 <- data.frame(
a = runif(100, 0, 100),
b = runif(100, 0, 100),
c = runif(100, 0, 100),
d = runif(100, 0, 100))
df3 <- data.frame(
a = runif(100, 0, 100),
b = runif(100, 0, 100),
c = runif(100, 0, 100),
d = runif(100, 0, 100))
I need to create a fourth data frame of dimensions 100 by 4 that is the result of averaging each cell across the first three dataframes. Any ideas are highly appreciated!
We can do this by using Reduce with + and dividing by the number of datasets. This is flexible, since any number of datasets can be kept in the list:
dfAvg <- Reduce(`+`, mget(paste0("df", 1:3)))/3
Another option is to convert to an array and then use apply, which also has the option of removing missing values (na.rm = TRUE):
apply(array(unlist(mget(paste0("df", 1:3))), c(dim(df1), 3)), 2, rowMeans, na.rm = TRUE)
As @user20650 mentioned, rowMeans can be applied directly to the array with the dims argument:
rowMeans(array(unlist(mget(paste0("df", 1:3))), c(dim(df1), 3)), dims=2)
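To avoid the hard-coded 3, the data frames can be held in a list and the divisor taken from its length. A small sketch on toy data, where the values and the names `dfs` and `arrAvg` are purely illustrative:

```r
df1 <- data.frame(a = c(1, 2), b = c(10, 20))
df2 <- data.frame(a = c(3, 4), b = c(30, 40))
df3 <- data.frame(a = c(5, 6), b = c(50, 60))

dfs <- list(df1, df2, df3)

# Cell-wise sum, divided by however many data frames are in the list
dfAvg <- Reduce(`+`, dfs) / length(dfs)

# Same result as a plain matrix, via the array approach
arrAvg <- rowMeans(array(unlist(dfs), c(dim(dfs[[1]]), length(dfs))), dims = 2)

dfAvg
#   a  b
# 1 3 30
# 2 4 40
```

The Reduce version keeps the column names and returns a data frame, while the array version returns an unnamed matrix.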
