I have a 4-dimensional array: location (3) x species (3) x Season (6) x Depth (2), i.e. a matrix like the one below repeated 12 times.
Season = 1, depth = 1
[A] [B] [C]
[a] 12 52 55
[b] 13 14 235
[c] 13 76 355
I would like to merge everything into one big matrix like:
Season = 1, depth = 1
[A] [B] [C]
[a11] 12 52 55
[b11] 13 14 235
[c11] 13 76 355
[a12] 12 52 55
[b12] 13 14 235
[c12] 13 76 355
[a21] 12 52 55
[b21] 13 14 235
[c21] 13 76 355
...
and so on. The first number would refer to one extra dimension and the second to the other one. Does that make sense? Any ideas?
Thanks a lot!! :)
This transposes the array with aperm and then makes a matrix.
location <- 3
species  <- 3
Season   <- 6
Depth    <- 2

set.seed(1)
myArr <- array(sample(1000, location * species * Season * Depth),
               dim = c(location, species, Season, Depth))

## move species to the last dimension so that it becomes the columns of the matrix
myArrPerm <- aperm(myArr, perm = c(1, 3, 4, 2))
matrix(myArrPerm, ncol = species)
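If you also want row labels like a11, b11, ... as in the question, one possible sketch is to build them in the same index order this aperm()/matrix() combination produces (location varies fastest, then Season, then Depth, so here the first digit is the Season index and the second the Depth index). myMat is just an illustrative name:

myMat <- matrix(myArrPerm, ncol = species)
rownames(myMat) <- paste0(
  rep(letters[1:location], times = Season * Depth),     # a, b, c cycle fastest
  rep(rep(1:Season, each = location), times = Depth),   # then the Season index
  rep(1:Depth, each = location * Season)                # then the Depth index
)
colnames(myMat) <- LETTERS[1:species]
head(myMat)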
I need to build an algorithm which will:
For the 116 existing observations of the 2 variables x1 and x2 (each observation plots as a single point),
Create new observations by merging the extreme points of 2 existing observations (e.g. observation 117 will have 2 extreme points, (x1_115, x2_115) and (x1_30, x2_30)). Do this for all combinations.
If, for one combination, one pair dominates the other (x1_a < x1_b AND x2_a < x2_b), only select a.
For the new set of 116 + n observations (original plus newly created), remove the dominated pairs, following the same logic as above.
Continue until we cannot create new non-dominated pairs.
I'm trying to solve this problem by creating independent functions for each operation. So far I have created the ConvexUnion function which merges extreme points (simply the union of 2 observations), but it does not take into account dominance yet.
ConvexUnion <- function(a, b) {
  output <- NULL
  for (i in 1:ncol(a)) {
    u <- unique(rbind(a[, i], b[, i]), incomparables = FALSE)
    output <- cbind(output, u)
  }
  output  # the extreme points of the newly created pair
}
a <- matrix(c(50, 70), ncol = 2)
b <- matrix(c(60, 85), ncol = 2)
v <- ConvexUnion(a, b)
TRAFO LABOR DELLV CLIENTS
1 49 15023 180119 11828
2 54 3118 212988 13465
3 31 6016 81597 4787
4 39 8909 127263 10291
5 9 1789 30095 2205
6 59 8327 190405 12045
7 95 11985 288146 16379
8 54 11309 208009 12252
9 13 3844 53631 4426
10 148 26348 459371 39831
11 17 3968 48798 3210
12 157 20131 366409 27050
13 18 4614 60366 4673
14 17 5941 49042 3950
15 77 6449 226815 12584
Here, the result for the new pair, which is the so-called convex union of a and b, would be (50,70) because a dominates b (both x1 and x2 are smaller).
How do I solve the problem?
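For the dominance step described above, a minimal sketch (illustrative names, not the asker's code) could look like the following. It assumes observations are rows of a two-column matrix and that "a dominates b" means both coordinates of a are strictly smaller, as in the (50, 70) vs (60, 85) example:

## TRUE if point p dominates point q (both coordinates strictly smaller)
dominates <- function(p, q) all(p < q)

## keep only the non-dominated rows of a two-column matrix of observations
removeDominated <- function(pts) {
  keep <- vapply(seq_len(nrow(pts)), function(i) {
    !any(vapply(seq_len(nrow(pts)),
                function(j) j != i && dominates(pts[j, ], pts[i, ]),
                logical(1)))
  }, logical(1))
  pts[keep, , drop = FALSE]
}

removeDominated(rbind(c(50, 70), c(60, 85)))  # keeps only (50, 70)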
I would like to know how to proceed with the following non-linear regression analysis, which is a simplified version of my real problem.
Five participants were asked to observe the speed of three different cars (Audis, VWs and Porsches) over a ten-second time frame. This gives me the following data set:
S_t_c <- read.table(text = "
time S_c_1 S_c_2 S_c_3
1 20 15 40
2 45 30 50
3 60 45 60
4 75 60 60
5 90 70 60
6 105 70 90
7 120 70 120
8 125 70 140
9 130 70 160
10 145 70 180
",header = T)
After observing the last 10 seconds, the 5 participants were then asked to guess how fast the car would go at t = 11. This gives me this data:
S_11_i_c <-read.table(text = "
i c_1 c_2 c_3
1 150 70 190
2 155 70 200
3 150 75 195
4 160 80 190
5 150 75 180
",header = T)
I now want to run a non-linear regression to estimate the free parameters of the following model:
The indices stand for the following:
i= participant
c=car brand
s=time
My problems are the sums, as well as the fact that I have to estimate the parameters based on three different observation sets (one for each car). I do not know how to code sums into a regression, and I struggle with the fact that my DVs depend on different time-series IVs. I would like to learn how to do this in R.
EDIT: Attempt at solving the problem.
What I managed to do so far is write w_s and Sum_S:
## intended to be the normalising sum over j = 0..9 of beta_2^j
sumBeta <- function(beta_2) {
  out <- 0
  for (j in 0:9) {
    out <- out + beta_2^j
  }
  out
}
w_s <- beta_2^s / sumBeta(beta_2)
Sum_S_t_c <- data.frame(
  s   = 1:10,
  c_1 = numeric(10),
  c_2 = numeric(10),
  c_3 = numeric(10)
)

## running sums of the observed speeds, per car, counting backwards from t = 10
for (c in 2:4) {
  running <- 0
  for (s in 1:10) {
    running <- running + S_t_c[11 - s, c]
    Sum_S_t_c[s, c] <- running
  }
}
Now, I somehow need to fit these variables into a non-linear regression. This would be my dummy code for it:
for (c in 2:4) {
  for (i in 1:5) {
    for (s in 0:9) {
      S_11_i_c ~ beta_0 + beta_1 * Sum_S_t_c[s, c] * beta_2^s / sumBeta(beta_2)
    }
  }
}
I also need to set an upper and a lower limit for beta_2, which I do not know how to do. I also wonder whether it is even possible to use a function within a regression formula.
Edit:
Should I possibly group the DVs and IVs somehow? If so, is it possible to group variables from two different data tables together?
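Since the model formula itself is not reproduced above, the following is only a sketch of one possible way to fit this with nls(). It assumes the model is S_11_ic = beta_0 + beta_1 * sum_{s=0}^{9} w_s * S_{10-s,c} with w_s = beta_2^s / sum_j beta_2^j (my reading of the attempt), and the names obs, dat and wsum are illustrative. The "port" algorithm is what allows lower and upper bounds on beta_2; starting values may need tuning.

## observed speeds: rows = time 1..10, columns = the three cars
obs <- as.matrix(S_t_c[, c("S_c_1", "S_c_2", "S_c_3")])

## one row per participant x car, response = guessed speed at t = 11
dat <- expand.grid(i = 1:5, car = 1:3)
dat$y <- as.vector(as.matrix(S_11_i_c[, c("c_1", "c_2", "c_3")]))

## weighted sum of past speeds for one car; s = 0 is the most recent observation (t = 10)
wsum <- function(beta_2, car) {
  s <- 0:9
  w <- beta_2^s / sum(beta_2^s)   # w_s = beta_2^s / sum_j beta_2^j
  sum(w * obs[10 - s, car])
}

fit <- nls(
  y ~ beta_0 + beta_1 * sapply(car, function(k) wsum(beta_2, k)),
  data      = dat,
  start     = list(beta_0 = 0, beta_1 = 1, beta_2 = 0.5),
  lower     = c(-Inf, -Inf, 0),   # bounds apply in the order of 'start',
  upper     = c( Inf,  Inf, 1),   # so only beta_2 is constrained to [0, 1]
  algorithm = "port"              # "port" is the nls algorithm that accepts bounds
)
summary(fit)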
I was interested in the memory usage of matrices in R when I noticed something strange. In a loop, I grew the number of columns of a matrix and, at each step, computed the object size like this:
x <- 10
size <- matrix(1:x, x, 2)
for (i in 1:x) {
  m <- matrix(1, 2, i)
  size[i, 2] <- object.size(m)
}
Which gives
plot(size[,1], size[,2], xlab="n columns", ylab="memory")
It seems that matrices with 2 rows and 5, 6, 7 or 8 columns use the exact same memory. How can we explain that?
To understand what's going on here, you need to know a little bit about the memory overhead associated with objects in R. Every object, even an object with no data, has 40 bytes of data associated with it:
x0 <- numeric()
object.size(x0)
# 40 bytes
This memory is used to store the type of the object (as returned by typeof()), and other metadata needed for memory management.
After ignoring this overhead, you might expect that the memory usage of a vector is proportional to the length of the vector. Let's check that out with a couple of plots:
sizes <- sapply(0:50, function(n) object.size(seq_len(n)))
plot(c(0, 50), c(0, max(sizes)), xlab = "Length", ylab = "Bytes",
type = "n")
abline(h = 40, col = "grey80")
abline(h = 40 + 128, col = "grey80")
abline(a = 40, b = 4, col = "grey90", lwd = 4)
lines(sizes, type = "s")
It looks like memory usage is roughly proportional to the length of the vector, but there is a big discontinuity at 168 bytes and small discontinuities every few steps. The big discontinuity arises because R has two storage pools for vectors: small vectors, managed by R, and big vectors, managed by the OS (this is a performance optimisation, because allocating lots of small amounts of memory is expensive). Small vectors can only be 8, 16, 32, 48, 64 or 128 bytes long, which, once we remove the 40-byte overhead, is exactly what we see:
sizes - 40
# [1] 0 8 8 16 16 32 32 32 32 48 48 48 48 64 64 64 64 128 128 128 128
# [22] 128 128 128 128 128 128 128 128 128 128 128 128 136 136 144 144 152 152 160 160 168
# [43] 168 176 176 184 184 192 192 200 200
The step from 64 to 128 causes the big step, then once we've crossed into the big vector pool, vectors are allocated in chunks of 8 bytes (memory comes in units of a certain size, and R can't ask for half a unit):
diff(sizes)
# [1] 8 0 8 0 16 0 0 0 16 0 0 0 16 0 0 0 64 0 0 0 0 0 0 0 0 0 0 0
# [29] 0 0 0 0 8 0 8 0 8 0 8 0 8 0 8 0 8 0 8 0 8 0
So how does this behaviour correspond to what you see with matrices? Well, first we need to look at the overhead associated with a matrix:
xv <- numeric()
xm <- matrix(xv)
object.size(xm)
# 200 bytes
object.size(xm) - object.size(xv)
# 160 bytes
So a matrix needs an extra 160 bytes of storage compared to a vector. Why 160 bytes? It's because a matrix has a dim attribute containing two integers, and attributes are stored in a pairlist (an older version of list()):
object.size(pairlist(dims = c(1L, 1L)))
# 160 bytes
If we re-draw the previous plot using matrices instead of vectors, and increase all constants on the y-axis by 160, you can see the discontinuity corresponds exactly to the jump from the small vector pool to the big vector pool:
msizes <- sapply(0:50, function(n) object.size(as.matrix(seq_len(n))))
plot(c(0, 50), c(160, max(msizes)), xlab = "Length", ylab = "Bytes",
type = "n")
abline(h = 40 + 160, col = "grey80")
abline(h = 40 + 160 + 128, col = "grey80")
abline(a = 40 + 160, b = 4, col = "grey90", lwd = 4)
lines(msizes, type = "s")
This seems to only happen for a very specific range of columns on the small end. Looking at matrices with 1-100 columns I see the following:
I do not see any other plateaus, even if I increase the number of columns to say, 10000:
Intrigued, I've looked a bit further, putting your code in a function:
sizes <- function(nrow, ncol) {
  size <- matrix(1:ncol, ncol, 2)
  for (i in 1:ncol) {
    m <- matrix(1, nrow, i)
    size[i, 2] <- object.size(m)
  }
  plot(size[, 1], size[, 2])
  size
}
Interestingly, we still see this plateau and straight line at low column counts if we increase the number of rows, with the plateau shrinking and moving backwards before finally settling into a straight line by the time we hit nrow = 8.
This indicates that the behaviour occurs for a very specific range of the number of cells in a matrix: 9-16.
Memory Allocation
As @Hadley pointed out in his comment, there is a similar thread on memory allocation of vectors, which comes up with the formula 40 + 8 * floor(n / 2) bytes for numeric vectors of size n.
For matrices the overhead is slightly different, and the stepping relationship doesn't hold (as seen in my plots). Instead I have come up with the formula 208 + 8 * n bytes where n is the number of cells in the matrix (nrow * ncol), except where n is between 9 and 16:
Matrix size - 208 bytes for "double" matrices, 1 row, 1-20 columns:
> sapply(1:20, function(x) { object.size(matrix(1, 1, x)) })-208
[1] 0 8 24 24 40 40 56 56 120 120 120 120 120 120 120 120 128 136 144
[20] 152
However, if we change the type of the matrix to integer or logical, we do see the stepwise behaviour in memory allocation described in the thread above:
Matrix size - 208 bytes for "integer" matrices 1 row, 1-20 columns:
> sapply(1:20, function(x) { object.size(matrix(1L, 1, x)) })-208
[1] 0 0 8 8 24 24 24 24 40 40 40 40 56 56 56 56 120 120 120
[20] 120
Similarly for "logical" matrices:
> sapply(1:20, function(x) { object.size(matrix(TRUE, 1, x)) })-208
[1] 0 0 8 8 24 24 24 24 40 40 40 40 56 56 56 56 120 120 120
[20] 120
It is surprising that we do not see the same behaviour with a matrix of type double, as it is just a "numeric" vector with a dim attribute attached (R lang specification).
The big step we see in memory allocation comes from R having two memory pools, one for small vectors and one for large vectors, and that happens to be where the jump is made. Hadley Wickham explains this in detail in his response.
Looking at numeric vectors with sizes from 1 to 20, I get this figure:
x <- 20
size <- matrix(1:x, x, 2)
for (i in 1:x) {
  m <- rep(1, i)
  size[i, 2] <- object.size(m)
}
plot(size[, 1], size[, 2])
This is a similar question to the one posted in Regression (logistic) in R: Finding x value (predictor) for a particular y value (outcome). I am trying to find the x value for a known y value (in this case 0.000001) obtained from fitting a log-normal curve to sapling densities at distances from parent trees using a genetic algorithm. The algorithm gives me the a and b parameters of the best-fit log-normal curve.
I have obtained the value of x for y = 0.00001 for other curves, such as the negative exponential, using uniroot with this code (which works well for those curves):
## calculate the x value at y = 0.000001 (predicted near-maximum recruitment distance)
testfn <- function(y, aparam, bparam) {
  ## find the value of x that satisfies y = aparam * exp(-bparam * x)
  fn <- function(x) aparam * exp(-bparam * x) - y
  uniroot(fn, lower = 0, upper = 100000000)$root
}
testfn(0.000001, aparam = a, bparam = b)
Unfortunately, the same approach with a log-normal formula does not work. I have tried to use uniroot with the lower boundary set above zero, but I get this error:
Error in uniroot(fn, lower = 1e-16, upper = 1e+18) :
f() values at end points not of opposite sign
My code and data (given below the code) are:
file="TR maire 1mbin.txt"
xydata <- read.table(file,header=TRUE,col.names=c('x','y'))
####assign best parameter values
a = 1.35577
b = 0.8941521
#####Plot model against data
par(mar=c(5,5,2,2))
xvals=seq(1,max(xydata$x),1)
plot(jitter(xydata$x), jitter(xydata$y),pch=1,xlab="distance from NCA (m)",
ylab=quote(recruit ~ density ~ (individuals ~ m^{2~~~ -1})))
col2="light grey"
plotmodel <- a* exp(-(b) * xvals)
lines(xvals,plotmodel,col=col2)
#### ATTEMPT 1
## calculate the x value at y = 0.000001 (predicted near-maximum recruitment distance)
testfn <- function(y, aparam, bparam) {
  ## log-normal curve with shape aparam and scale bparam
  fn <- function(x) exp(-(log(x / bparam) * log(x / bparam)) / (2 * aparam * aparam)) /
    (aparam * x * sqrt(2 * pi)) - y
  uniroot(fn, lower = 0.0000000000000001, upper = 1000000000000000000)$root
}
testfn(0.000001, aparam = a, bparam = b)
The data (xydata; columns: row number, x, y) are:
1 1 0.318309886
2 2 0.106103295
3 2 0.106103295
4 2 0.106103295
5 3 0.063661977
6 4 0.045472841
7 5 0.035367765
8 5 0.035367765
9 7 0.048970752
10 8 0.021220659
11 8 0.021220659
12 8 0.042441318
13 9 0.018724111
14 10 0.016753152
15 10 0.016753152
16 12 0.013839560
17 13 0.025464791
18 16 0.010268061
19 17 0.009645754
20 24 0.013545102
21 25 0.032480601
22 26 0.043689592
23 27 0.006005847
24 28 0.011574905
25 31 0.062618338
26 32 0.005052538
27 42 0.003835059
28 42 0.003835059
29 44 0.003658734
30 46 0.003497911
31 48 0.006701261
32 50 0.003215251
33 50 0.006430503
34 51 0.006303166
35 58 0.002767912
36 79 0.002027452
37 129 0.003715680
38 131 0.001219578
39 132 0.001210304
40 133 0.001201169
41 144 0.001109094
42 181 0.000881745
43 279 0.001142944
44 326 0.000488955
Or is there another way of approaching this?
I'm an ecologist and sometimes R just does not make sense!
It seems there were some errors in my R code, but the main problem is that my lower limit was too low and the log-normal curve does not extend to that value (my interpretation). The solution that works for me is:
### define the formula parameter values
a <- 1.35577
b <- 0.8941521

### define your formula (in this instance a log-normal) in the {}
fn <- function(x, a, b, y) {
  exp(-(log(x / b) * log(x / b)) / (2 * a * a)) / (a * x * sqrt(2 * pi)) - y
}

### then use uniroot()$root, supplying the known parameter values and the value
### of y that is of interest (in this case 0.000001)
uniroot(fn, c(1, 200000), a = a, b = b, y = 0.000001)$root
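As a quick check of a candidate interval (using the a, b and fn defined just above): uniroot() requires the function to change sign between the two endpoints, which is exactly what the earlier "values at end points not of opposite sign" error was complaining about.

fn(1,      a = a, b = b, y = 0.000001)  # positive: the curve is still above the target at x = 1
fn(200000, a = a, b = b, y = 0.000001)  # negative: the curve has dropped below the target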