This might sound like a school assignment, but it is not!
I have made a recursive function that returns a value from the Fibonacci sequence.
let rec FoneFive n =
    match n with
    | 1 | 2 -> 1
    | n -> FoneFive(n-1) + FoneFive(n-2)
printfn "%A" (FoneFive 6)
What is going on in this recursive function? FoneFive 6 gives 8 as it should. But why?
The way I see it: it starts with n = 6 and concludes that 6 is not 1 or 2, so it calls FoneFive(n-1) + FoneFive(n-2). (This is probably where I get it wrong, but the way I see it, this returns nothing unless n is 1 or 2. So from my point of view it will eventually narrow both calls down to n = 1 or 2 and thereby give 1 + 1, which of course is 2.)
Can someone tell me how it returns 8 ?
Calculating FoneFive(6) requires to calculate FoneFive(5) and FoneFive(4)
(as 5 and 4 are n-1 and n-2 for n=6)
Calculating FoneFive(5) requires to calculate FoneFive(4) and FoneFive(3)
(as 4 and 3 are n-1 and n-2 for n=5)
Calculating FoneFive(4) requires to calculate FoneFive(3) and FoneFive(2)
(as 3 and 2 are n-1 and n-2 for n=4)
Calculating FoneFive(3) requires to calculate FoneFive(2) and FoneFive(1)
(as 2 and 1 are n-1 and n-2 for n=3)
Both FoneFive(1) and FoneFive(2) return 1,
so FoneFive(3) = FoneFive(2) + FoneFive(1) = 1 + 1 = 2
so FoneFive(4) = FoneFive(3) + FoneFive(2) = 2 + 1 = 3
so FoneFive(5) = FoneFive(4) + FoneFive(3) = 3 + 2 = 5
so FoneFive(6) = FoneFive(5) + FoneFive(4) = 5 + 3 = 8
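If you want to watch this happen, one option is to add a trace printout to the function (a small illustrative sketch; tracedFib is just a name made up for it):
// Same function as above, with a trace line so you can see every call in the tree.
let rec tracedFib n =
    printfn "calling with n = %d" n
    match n with
    | 1 | 2 -> 1
    | n -> tracedFib (n - 1) + tracedFib (n - 2)

printfn "%A" (tracedFib 6)   // prints 15 calls in total, then 8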
Okay, so I get it now. It splits itself, so to speak, into two pieces every time n is not 1 or 2, and then each of those splits into two pieces again if it isn't 1 or 2 either.
f6 = f5 + f4
f5 + f4 = f4 + f3 + f3 + (f2=1)
f4 + f3 + f3 + (f2=1) = f3 + (f2=1) + (f2=1) + (f1=1) + (f2=1) + (f1=1) + 1
f3 + 1 + 1 + 1 + 1 + 1 + 1 = (f2=1) + (f1=1) + 1 + 1 + 1 + 1 + 1 + 1
(f2=1) + (f1=1) + 1 + 1 + 1 + 1 + 1 + 1 = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 = 8
I am working with the R programming language.
I am trying to simulate random discrete data that contains "correlations" between the variables. For example, this is what I have tried so far (I generated random continuous data with correlations, then converted all values below a certain threshold to 0 and everything else to 1):
library(mvtnorm)
n <- 11
A <- matrix(runif(n^2)*2-1, ncol=n)
s <- t(A) %*% A
my_data = MASS::mvrnorm(100, mu = c(rnorm(11,10,1)), Sigma = s)
my_data = data.frame(my_data)
colnames(my_data)[1] <- 'p1'
colnames(my_data)[2] <- 'p2'
colnames(my_data)[3] <- 'p3'
colnames(my_data)[4] <- 'p4'
colnames(my_data)[5] <- 'p5'
colnames(my_data)[6] <- 'p6'
colnames(my_data)[7] <- 'p7'
colnames(my_data)[8] <- 'p8'
colnames(my_data)[9] <- 'p9'
colnames(my_data)[10] <- 'p10'
colnames(my_data)[11] <- 'result'
my_data[my_data < 9] <- 0
my_data[my_data > 9] <- 1
p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 result
1 1 1 1 0 1 1 1 0 0 0 0
2 0 1 1 1 1 0 1 1 1 1 1
3 1 1 1 0 1 0 1 1 1 1 1
4 1 1 1 0 1 1 1 1 1 1 1
5 0 1 1 1 1 0 1 1 0 0 0
6 1 0 1 0 1 1 1 0 1 1 1
I am not sure if I have done this correctly - sure, I have simulated random discrete data, but I am not sure if I have preserved the correlation structure within the data. For instance, I would have liked there to be correlation patterns such as:
When p1 = p5 = p9 = 1 -> "results" are more likely to be 1 (i.e. take all rows where p1 = p5 = p9 = 1 and measure the percentage of 1's in the results column)
When p3 = p4 = 0 and p9 = 1 -> "results" are more likely to be 0
etc.
Is there some other way to do this?
Thanks!
If you are happy with p1 through p10 and just want to use your stated rules to generate the result column, then you can do a kind of reverse logistic regression. First of all, set up your rules to give numerical results. Here, we get +1 if p1 = p5 = p9 = 1, and -1 if p3 = 0, p4 = 0, p9 = 1:
log_odds <- with(my_data, p1 * p5 * p9 - (1 - p3) * (1 - p4) * p9)
Now we convert these to probabilities of getting a 1 in our results column:
odds <- exp(log_odds)
probs <- odds / (1 + odds)
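Just to make the numbers concrete, the three possible log-odds values map to these probabilities:
exp(c(-1, 0, 1)) / (1 + exp(c(-1, 0, 1)))
#> [1] 0.2689414 0.5000000 0.7310586
So rows matching neither rule get a 50% chance of a 1, rows matching only the first rule about 73%, and rows matching only the second rule about 27%.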
Finally, we use probs to generate a binomial sample:
my_data$result <- rbinom(nrow(my_data), size = 1, prob = probs)
We can see that overall our sample has about a 50% chance of having a 1 or 0:
table(my_data$result)
#> 0 1
#> 47 53
But the odds of a 1 are much increased when p1 = p5 = p9 = 1
table(my_data$result[with(my_data, p1 == 1 & p5 == 1 & p9 == 1)])
#> 0 1
#> 3 18
It is possible to control the background probability and the strength of the correlations by adjusting the weightings used for log_odds.
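For example (the intercept and weights below are purely illustrative choices, not values derived from the data):
# A negative intercept lowers the baseline probability of a 1;
# larger multipliers strengthen the effect of each rule on the result column.
log_odds <- with(my_data, -0.5 + 2 * p1 * p5 * p9 - 1.5 * (1 - p3) * (1 - p4) * p9)
probs <- exp(log_odds) / (1 + exp(log_odds))
my_data$result <- rbinom(nrow(my_data), size = 1, prob = probs)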
I got three equations.
1) 0+0=0
2) 1+0=1
3) 1+1=10
Using those three equations, I have to calculate the following:
i) 1+0+0+0=?
ii) 0+1+0=?
iii) 1+1+0=?
But I don't know how to start
This is obviously homework so I'm not going to give you the answers. Instead, I'll leave you a few hints.
0 + 1 + 0 = (0 + 1) + 0 = (1 + 0) + 0
1 + 1 + 0 = (1 + 1) + 0 = 10 + 0
Binary and decimal represent the same numbers, just written in different ways.
You can check your answers using the following mapping from decimal to binary.
Decimal -> Binary
0 = 0 + 0 + 0 -> 0
1 = 0 + 0 + 2^0 -> 1
2 = 0 + 2^1 + 0 -> 10
3 = 0 + 2^1 + 2^0 -> 11
4 = 2^2 + 0 + 0 -> 100
5 = 2^2 + 0 + 2^0 -> 101
6 = 2^2 + 2^1 + 0 -> 110
7 = 2^2 + 2^1 + 2^0 -> 111
and so on.
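If you want to double-check a conversion, most languages will do it for you; for example, in Python (purely as a checking aid, not part of the exercise):
print(bin(5))         # '0b101' -- matches the table row for 5
print(int("110", 2))  # 6       -- and back from binary to decimal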
Binary is just another way of writing the same old numbers. What is 1+1 in the ordinary decimal system? The usual rules still hold when you add numbers together: addition is associative and commutative:
(a+b)+c = a+(b+c)
and
a+b = b+a
How would the recursive sequence a(n) = -a(n-1) + n - 1 be solved?
I tried forward and backward iteration but haven't been able to get an explicit solution for a(n).
Your first step should be to write out a table of results (here I'll take f(0) = 7 as an example starting value):
f(n)=x
n | x
-----
0 | 7
1 | -7 (-7 + 1 - 1)
2 | 8 ( 7 + 2 - 1)
3 | -6 (-8 + 3 - 1)
4 | 9 ( 6 + 4 - 1)
5 | -5 (-9 + 5 - 1)
6 | 10 ( 5 + 6 - 1)
7 | -4 (-10 + 7 - 1)
8 | 11 ( 4 + 8 - 1)
9 | -3 (-11 + 9 - 1)
You should see a pattern emerging. Each pair of values [(0, 1), (2, 3), (4, 5), ...] has a difference of 14, starting with the pair (7, -7), and the values increment by one every two steps of n. We can generalize this:
f(0) = 7
f(n) = 7 + k - 14 * b
where k is the increment value (k goes up by 1 for every 2 that n goes up)
b is 1 when n is odd, else 0.
Now we just have to define k and b in terms of n. k shouldn't be too hard, let's see:
n | k
0 | 0
1 | 0
2 | 1
3 | 1
Does that remind you of anything? That's floor division by 2.
7 + (n // 2) - 14 * b
Now for b
n | b
0 | 0
1 | 1
2 | 0
3 | 1
That looks like mod 2 to me! Modulo is the remainder of a division problem, and is a great way to check if a number is even or odd. We're looking for the plain modulo, too, since we want b==1 when n is odd and vice versa.
f(0) = 7
f(n) = 7 + (n // 2) - 14 * (n%2)
where (//) is the floor division function
(%) is the modulo function
Now we can put that all together in a function. In Go this is:
func f(n int) int {
    return 7 + n/2 - 14*(n%2)
}
In Python it's
def f(n):
    return 7 + n//2 - 14 * (n % 2)
In Haskell we've got
f :: Int -> Int
f n = 7 + n `div` 2 - 14 * (n `mod` 2)
or, since Haskell implements recursion exceedingly well, simply...
f :: Int -> Int
f 0 = 7
f n = negate (f (n - 1)) + n - 1
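As a quick sanity check, you can compare the closed form against the recursive definition for the first few values of n (a minimal, self-contained snippet):
-- recursive definition of the sequence (note the negation from the recurrence)
f :: Int -> Int
f 0 = 7
f n = negate (f (n - 1)) + n - 1

-- closed form derived above
closed :: Int -> Int
closed n = 7 + n `div` 2 - 14 * (n `mod` 2)

main :: IO ()
main = print (all (\n -> closed n == f n) [0 .. 20])  -- prints True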
I have categorical data that I'd like to map the frequency of using a heatmap (geom_tile), much like the example below:
data("mtcars")
freq <- data.frame(xtabs(~cyl + gear, mtcars)) #count number of 4,6,8 cyl cars by gear
ggplot(freq, aes(cyl, gear)) +
geom_tile(aes(fill = Freq)) +
scale_fill_gradient(low = "white",high = "steelblue")
But I'd like to split each tile according to the proportion of significant or non-significant results (0-1 values). In this example, I would generate the same frequency count but differentiate between automatic and manual transmission (am)
freq_am <- data.frame(xtabs(~cyl + gear + am, mtcars))
print(freq_am)
#cyl gear am Freq
4 3 0 1
6 3 0 2
8 3 0 12
4 4 0 2
6 4 0 2
8 4 0 0
4 5 0 0
6 5 0 0
8 5 0 0
4 3 1 0
6 3 1 0
8 3 1 0
4 4 1 6
6 4 1 2
8 4 1 0
4 5 1 2
6 5 1 1
8 5 1 2
The resulting heatmap would have (for example) blue for values of am==0 and red for am==1. Each tile would be divided (along a diagonal?) according to the proportion of cars of that type that are automatic (am==0) or manual (am==1). The shades of blue and red would be proportionate to the count, just as the gradient already reflects.
For example:
the top left tile (4,5) would be completely light red because all of the 4-cyl, 5-gear cars (count = 2) are manual
the middle left tile (4,4) would be 1/4 blue and 3/4 red because 25% of the 4-gear, 4-cyl cars are automatic (count = 2) and 75% are manual (count = 6)
the bottom left tile (4,3) would be completely lightest blue because all of the 4-cyl, 3-gear cars (count = 1) are automatic
This is a second and hopefully complete attempt to answer the question by manipulating the frequency counts so that they become negative for am==1. The difference from the first attempt is that geom_col(position = "fill") is used instead of geom_tile() for plotting.
Note: I didn't edit the first answer because the OP has already commented on it, and I might eventually delete that first, incomplete answer.
Preparing the data
freq_am <- data.frame(xtabs(~ cyl + gear + am, mtcars))
freq_am$Freq_am <- freq_am$Freq * (-1)^as.integer(as.character(freq_am$am))
This creates a new column Freq_am in which the Freq counts are multiplied by -1 whenever am == 1 (manual). Raising -1 to the 0/1 value of am is a small trick to avoid ifelse().
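Written out with ifelse(), the same assignment would be:
freq_am$Freq_am <- ifelse(freq_am$am == "1", -freq_am$Freq, freq_am$Freq)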
Plotting
There are two possibilities to achieve the desired heatmap-like appearance.
Variant 1
p <- ggplot(freq_am, aes(x = cyl, y = Freq, fill = Freq_am)) +
  geom_col(position = "fill", width = 1) +
  scale_fill_gradient2() +
  facet_grid(gear ~ ., as.table = FALSE, switch = "y") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_discrete(expand = c(0, 0))
p
This creates a stacked bar chart of Freq vs cyl using geom_col() where the bars are stretched vertically (position = "fill") and horizontally (width = 1) to fill the plotting area. In addition, the expand = c(0, 0) parameter to the scale functions tells ggplot to not expand the axes as usual. Note that the x-axis is discrete as xtabs() has coerced cyl to factor.
facet_grid() is used to simulate a y-axis with the grid values in increasing order (as.table = FALSE). switch = "y" moves the panel strips to the left side.
scale_fill_gradient2() uses a convenient diverging colour scheme by default so that the count of cars with automatic transmission appears in blue and the count of cars with manual transmission in red.
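If you prefer to set the colours explicitly rather than rely on the defaults (the colour choices here are just examples), you can replace that scale:
# ggplot2 will note that it is replacing the existing fill scale
p <- p + scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick")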
Now, we need to remove all decorations and spaces which aren't needed for a heatmap. Finally, the y-axis label is renamed:
p + theme(panel.grid = element_blank()
, axis.ticks = element_blank()
, axis.text.y = element_blank()
, strip.background = element_blank()
, panel.spacing.y = unit(0, "pt")
) +
ylab("gear")
The downside of this approach is the lack of borders between tiles, so it is difficult to distinguish the share of counts when adjacent tiles have the same colour, as with the 6-cyl tiles for 3 and 4 gears.
Variant 2
This variant adds borders between the tiles. The width of the borders can be flexibly adjusted:
p <- ggplot(freq_am, aes(x = 1, y = Freq, fill = Freq_am)) +
  geom_col(position = "fill") +
  scale_fill_gradient2() +
  facet_grid(gear ~ cyl, as.table = FALSE, switch = "both") +
  scale_y_continuous(expand = c(0, 0)) +
  scale_x_continuous(expand = c(0, 0))
p
Here, we use facet_grid() for both directions. For each panel, Freq is plotted vs a dummy variable 1 using geom_col() as above. As the dummy variable 1 is numeric we don't need the width parameter to geom_col(). Both axes are continuous now.
Again, we need to remove some of the decorations and to rename the labels on the x and y-axes:
p + theme(panel.grid = element_blank()
, axis.ticks = element_blank()
, axis.text = element_blank()
, strip.background = element_blank()
# , panel.spacing = unit(0, "pt")
) +
xlab("cyl") + ylab("gear")
Now, we do have a heatmap with borders between the tiles. In order to remove the borders or adjust the width, you can uncomment the line with panel.spacing and change the value.
This is a first, incomplete attempt at an answer, obtained by manipulating the frequency counts so that they become negative for am==0.
Note that the question is not fully clear. ?mtcars defines am as
Transmission (0 = automatic, 1 = manual).
while the OP has defined
automatic (am==1) or manual (am==0)
which is just the other way around. In addition, the OP has requested the heatmap to show blue for values of am==1 and red for am==0.
Preparing the data
freq_am <- data.frame(xtabs(~ cyl + gear + am, mtcars))
freq_am$Freq_am <- -freq_am$Freq * (-1)^as.integer(as.character(freq_am$am))
freq_am$gear_am <- factor(paste(as.character(freq_am$gear), as.character(freq_am$am), sep = "_"))
freq_am
#freq_am
# cyl gear am Freq Freq_am gear_am
#1 4 3 0 1 -1 3_0
#2 6 3 0 2 -2 3_0
#3 8 3 0 12 -12 3_0
#4 4 4 0 2 -2 4_0
#5 6 4 0 2 -2 4_0
#6 8 4 0 0 0 4_0
#7 4 5 0 0 0 5_0
#8 6 5 0 0 0 5_0
#9 8 5 0 0 0 5_0
#10 4 3 1 0 0 3_1
#11 6 3 1 0 0 3_1
#12 8 3 1 0 0 3_1
#13 4 4 1 6 6 4_1
#14 6 4 1 2 2 4_1
#15 8 4 1 0 0 4_1
#16 4 5 1 2 2 5_1
#17 6 5 1 1 1 5_1
#18 8 5 1 2 2 5_1
Note that xtabs() has coerced am to factor:
str(freq_am$am)
# Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
To convert am back to numeric, we have to use as.integer(as.character(freq_am$am)). (You could convert the level numbers directly to the original numeric values with (as.integer(am) - 1), but that's less safe.)
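A small illustration of the difference:
am_factor <- factor(c(0, 1))
as.integer(am_factor)                # 1 2 -- the level indices
as.integer(as.character(am_factor))  # 0 1 -- the original values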
gear_am will be used as new y-axis when plotting the heatmap.
Plotting
library(ggplot2)
ggplot(freq_am, aes(cyl, gear_am, fill = Freq_am)) +
  geom_tile() +
  scale_fill_gradient2() +
  theme_minimal() +
  theme(panel.grid = element_blank())
scale_fill_gradient2() uses a convenient diverging colour scheme by default.
The tiles for gear on the y-axis have now been split up into tiles with am==0 and am==1.
"Incomplete" answer
The OP has requested that the now split-up tiles should be completely filled even if there are zero counts. This could be achieved by further manipulating freq_am. However, I find the current chart communicates the result in a clear, unambiguous way.
I have two regression models without random effects: one is OLS using lm, the other includes multiplication of coefficients and uses nls.
I wish to add individual-level random effects to both. I've managed to do this for the OLS function using the lme4 package, but haven't been able to find a way to do it for the multiplicative model.
The following code produces a dataset with similar structure to the one I am working on:
df <- data.frame(id = rep(1:1000, each=10), jit = rep(rnorm(1000, 0, 0.2), each = 10), a = sample(1:5, 10000, T), b = sample(1:5, 10000,T), c = sample(1:5, 10000, T))
df <- cbind(df, model.matrix(~ as.factor(a) + as.factor(b) + as.factor(c), data.frame(rbind(as.matrix(df), t(matrix(rep(1:5, each = 5), nrow=5)))))[1:nrow(df),2:13])
colnames(df)[6:17] <- (dim_dummies <- as.vector(outer(2:5, letters[1:3], function(x, y) paste(y, x, sep=""))))
true_vals <- list(vL2 = 0.4, vL3 = 0.5, vL4 = 0.8, vA = 0.7, vB = 1.1, vC = 0.9)
attach(df)
attach(true_vals)
df$val <-
(a2 * vA + b2*vB + c2*vC) * vL2 +
(a3 * vA + b3*vB + c3*vC) * vL3 +
(a4 * vA + b4*vB + c4*vC) * vL4 +
(a5 * vA + b5*vB + c5*vC) + runif(1, -.2, .2) + jit
detach(true_vals)
detach(df)
df[1:15, ]
id jit a b c a2 a3 a4 a5 b2 b3 b4 b5 c2 c3 c4 c5 val
1 1 -0.14295 4 4 1 0 0 1 0 0 0 1 0 0 0 0 0 1.1698
2 1 -0.14295 5 1 4 0 0 0 1 0 0 0 0 0 0 1 0 1.1498
3 1 -0.14295 5 4 4 0 0 0 1 0 0 1 0 0 0 1 0 2.0298
4 1 -0.14295 5 1 5 0 0 0 1 0 0 0 0 0 0 0 1 1.3298
5 1 -0.14295 5 4 2 0 0 0 1 0 0 1 0 1 0 0 0 1.6698
6 1 -0.14295 1 5 1 0 0 0 0 0 0 0 1 0 0 0 0 0.8298
7 1 -0.14295 3 2 5 0 1 0 0 1 0 0 0 0 0 0 1 1.4198
8 1 -0.14295 3 2 1 0 1 0 0 1 0 0 0 0 0 0 0 0.5198
9 1 -0.14295 3 2 4 0 1 0 0 1 0 0 0 0 0 1 0 1.2398
10 1 -0.14295 5 3 3 0 0 0 1 0 1 0 0 0 1 0 0 1.4298
11 2 -0.01851 4 5 3 0 0 1 0 0 0 0 1 0 1 0 0 1.9643
12 2 -0.01851 2 1 3 1 0 0 0 0 0 0 0 0 1 0 0 0.5843
13 2 -0.01851 2 1 3 1 0 0 0 0 0 0 0 0 1 0 0 0.5843
14 2 -0.01851 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 -0.1457
15 2 -0.01851 2 3 1 1 0 0 0 0 1 0 0 0 0 0 0 0.6843
...
a, b, and c represent scores on three 1:5 dimension scales. a2 through c5 are dummy variables representing levels 2:5 on the same scales. There are 10 observations per individual (id). val is a proxy for the score I wish to predict using the regression models. (The values in the actual data may not correspond to the structure here, however.)
I have two regression models without random effects. One is a regular OLS using the 12 dummy variables as predictors of val:
additive.formula <- as.formula("val ~
a2 + a3 + a4 + a5 +
b2 + b3 + b4 + b5 +
c2 + c3 + c4 + c5")
fit.additive <- lm(additive.formula, data = df)
The second assumes that the relative distances between the levels are shared across the three dimensions (a, b, c), but that the dimensions differ in scale. That leaves 6 coefficients (cA, cB, cC, cL2, cL3, cL4) plus the intercept.
multiplicative.formula <- as.formula(" val ~ intercept +
(a2 * cA + b2*cB + c2*cC) * cL2 +
(a3 * cA + b3*cB + c3*cC) * cL3 +
(a4 * cA + b4*cB + c4*cC) * cL4 +
(a5 * cA + b5*cB + c5*cC)")
multiplicative.start <- list(intercept = 0, cA = 1, cB = 1, cC = 1, cL2 = 1, cL3 = 1, cL4 = 1)
fit.multiplicative <- nls(multiplicative.formula, start=multiplicative.start, data=df, control = list(maxiter = 5000))
Since there are 10 observations per individual, we cannot expect them to be fully independent. Therefore, I wish to add a random effect at the level of individual as defined by the variable id. I've found a way to do that with the lme4 package:
require(lme4)
additive.formula.re <- as.formula("val ~ (1 | id) +
a2 + a3 + a4 + a5 +
b2 + b3 + b4 + b5 +
c2 + c3 + c4 + c5")
fit.additive.re <- lmer(additive.formula.re, data=df)
The question is whether it is possible to add random effects on the id variable using a regression model similar to the multiplicative one, perhaps with the lme4 or nlme packages? The formula should look something like
multiplicative.formula.re <- as.formula(" val ~ (1 | id) + intercept +
(a2 * cA + b2*cB + c2*cC) * cL2 +
(a3 * cA + b3*cB + c3*cC) * cL3 +
(a4 * cA + b4*cB + c4*cC) * cL4 +
(a5 * cA + b5*cB + c5*cC)")
Any suggestions?
Try nlme. This should be what you need (if I understood correctly):
library(nlme)
fit.multiplicative.nlme <- nlme(
  model = val ~ intercept +
    (a2 * cA + b2 * cB + c2 * cC) * cL2 +
    (a3 * cA + b3 * cB + c3 * cC) * cL3 +
    (a4 * cA + b4 * cB + c4 * cC) * cL4 +
    (a5 * cA + b5 * cB + c5 * cC),
  fixed  = intercept + cA + cB + cC + cL2 + cL3 + cL4 ~ 1,
  random = intercept ~ 1 | id,
  start  = unlist(multiplicative.start), data = df)
However, this didn't converge when I tried it with the non-reproducible data you provide (you should set a random seed). You could try different settings in nlmeControl.
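For example, something along these lines (the control values are purely illustrative; I haven't verified that they make this particular fit converge):
fit.multiplicative.nlme <- nlme(
  model = val ~ intercept +
    (a2 * cA + b2 * cB + c2 * cC) * cL2 +
    (a3 * cA + b3 * cB + c3 * cC) * cL3 +
    (a4 * cA + b4 * cB + c4 * cC) * cL4 +
    (a5 * cA + b5 * cB + c5 * cC),
  fixed   = intercept + cA + cB + cC + cL2 + cL3 + cL4 ~ 1,
  random  = intercept ~ 1 | id,
  start   = unlist(multiplicative.start),
  # looser convergence settings than the defaults
  control = nlmeControl(maxIter = 200, msMaxIter = 200, pnlsTol = 0.01),
  data    = df)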
The below was incorrect:
I don't see a reason for non-linear least squares. Let's revert the dummy encoding:
df$id1 <- seq_len(nrow(df))
df$a1 <- as.integer(rowSums(df[, paste0("a", 2:5)]) == 0)
df$b1 <- as.integer(rowSums(df[, paste0("b", 2:5)]) == 0)
df$c1 <- as.integer(rowSums(df[, paste0("c", 2:5)]) == 0)
library(reshape2)
DFm <- melt(df, id.vars = c("id", "jit", "a", "b", "c", "val", "id1"))
DFm <- DFm[DFm$value == 1,]
DFm$g <- paste0("fac", substr(DFm$variable, 1, 1))
DF <- dcast(DFm, ... ~ g, value.var = "variable")
fit1 <- lm(val ~ faca + facb + facc, data = DF)
#compare results:
coef(fit.multiplicative)
prod(coef(fit.multiplicative)[c("cA", "cL2")])
coef(fit1)["facaa2"]
prod(coef(fit.multiplicative)[c("cA", "cL3")])
coef(fit1)["facaa3"]
As you see, this is basically the same model (differences are due to numerical optimization within nls). And it's easy to add a random intercept to this.
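For completeness, a sketch of that last step using lme4 (with the same caveat as the rest of this section):
library(lme4)
# random intercept per individual added to the lm reformulation above
fit1.re <- lmer(val ~ faca + facb + facc + (1 | id), data = DF)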