R functions that operate on data frame columns [duplicate] - r

This question already has answers here:
Applying a function on each row of a data frame in R
(3 answers)
Closed 9 years ago.
I have the following function:
calculateAngle <- function(x, y)
{
v <- c(x, y)
a <- c(1, 0)
theta <- acos( sum(a*v) / ( sqrt(sum(a * a)) * sqrt(sum(v * v)) ) )
if(v[[2]] < 0)
{
return(-1 * theta)
}
else
{
return(theta)
}
}
It takes an x and a y value and calculates the angle between the vector (x, y) and the vector (1, 0). The function works fine in these examples:
> calculateAngle(0, 1)
[1] 1.570796
> calculateAngle(0, -1)
[1] -1.570796
> calculateAngle(0, -10)
[1] -1.570796
> calculateAngle(rnorm(1), rnorm(1))
[1] -0.2600444
But when I try to pass it the columns of a dataframe, it returns a single value when what I want is the angle for each row.
df <- data.frame(x=rnorm(10), y=rnorm(10))
df$angle <- calculateAngle(df$x, df$y)
Help is appreciated.

(Warning: this is the lazy answer because I don't feel like spending more than 5 seconds on this!)
calculateAnglev <- Vectorize(calculateAngle,c('x','y'))
> calculateAnglev(runif(2),runif(2))
[1] 0.2738694 0.8039875
Note that Vectorize() only wraps the function in a call to mapply(), so this should not be mistaken for a substitute for true vectorization, performance-wise.
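To see why the original call returns a single value: c(x, y) glues both columns into one long vector, and sum() then collapses it to one number. A minimal sketch of the explicit mapply() call that Vectorize() builds on, using a small made-up data frame (the values are illustrative only):

```r
# The question's function, unchanged in behaviour (scalar x and y only).
calculateAngle <- function(x, y) {
  v <- c(x, y)
  a <- c(1, 0)
  theta <- acos(sum(a * v) / (sqrt(sum(a * a)) * sqrt(sum(v * v))))
  if (v[[2]] < 0) -theta else theta
}

# Illustrative data, not from the original post.
df <- data.frame(x = c(0, 0, 1), y = c(1, -1, 1))

# mapply() calls calculateAngle once per (x, y) pair and collects the results.
df$angle <- mapply(calculateAngle, df$x, df$y)
df$angle  # pi/2, -pi/2, pi/4
```

This still makes one R-level function call per row, which is why it is slower than a truly vectorized formula on large data.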

Your code for theta is more complicated than it needs to be. For example, the term sqrt(sum(a * a)) is always 1, and sum(a*v) is always x. Also, sum(v*v) = x^2+y^2. Using those forms gives a version that also works for vector arguments:
calculateAngle <- function(x, y)
{
  theta <- acos( x / sqrt(x^2+y^2))
  # negate the angle only where y < 0, as in the original (sign(y) would return 0 for y == 0)
  ifelse(y < 0, -theta, theta)
}
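The same signed angle can also be obtained from base R's atan2(), which is already vectorized. This is a sketch rather than a drop-in replacement: the name angle_atan2 and the sample data are made up for illustration, and atan2(0, 0) returns 0 where the acos-based formula gives NaN.

```r
# atan2(y, x) returns the angle of the point (x, y) relative to the
# positive x axis, in the range (-pi, pi].
angle_atan2 <- function(x, y) atan2(y, x)

# Illustrative data, not from the original post.
df <- data.frame(x = c(0, 0, 1, -1), y = c(1, -1, 0, 0))
df$angle <- angle_atan2(df$x, df$y)
df$angle  # pi/2, -pi/2, 0, pi
```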


why smart rounding works differently with map/lapply than without?

I would like to round my results "smartly", so that the rounded values add up to the same total as the rounded sum of the originals.
Can someone explain to me why the result is different when I do it with map or lapply?
v <- c(
0.9472164,
71.5330771,
27.5197066)
smart.round <- function(x, digits = 0) {
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
}
### works correctly
smart.round(v)
### lapply and map give a different (wrong) result
library(purrr) # provides map()
lapply(v,smart.round)
map(v,smart.round)
(I think this is merely a comment, but I have not yet earned the right to add comments.)
lapply and purrr::map process your input element by element. In your example, lapply takes the first value of v and calls smart.round on it, then moves on to the second value of v, and so on.
In total, smart.round is called three times, each time without any knowledge of the other two values in v, so each value is simply rounded on its own and the total is no longer preserved.
I'm not entirely sure why you are trying to use lapply here. If this is part of a more complex situation, you might want to expand your question.
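The difference can be made concrete with the question's own data. A small sketch (the names whole and single are chosen here for illustration):

```r
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

v <- c(0.9472164, 71.5330771, 27.5197066)

whole  <- smart.round(v)                  # one call that sees all three values
single <- unlist(lapply(v, smart.round))  # three separate one-element calls

whole        # 1 72 27
single       # 1 72 28
sum(whole)   # 100
sum(single)  # 101
```

With a single element, smart.round has nothing to balance against, so it behaves like ordinary rounding.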
I have written my own solution. It is definitely a bit cumbersome, but it works. :) My initial goal was just to input a data frame and output the rounded data frame, with each row rounded so that its sum is preserved.
The whole example is here:
v <- data.frame(a = c(0.9472164,
71.5330771,
27.5197066),
b = c(4.6472164,
5.6330771,
27.1197066))
smart.round <- function(x, digits = 0) {
up <- 10 ^ digits
x <- x * up
y <- floor(x)
indices <- tail(order(x-y), round(sum(x)) - sum(y))
y[indices] <- y[indices] + 1
y / up
}
rounding_function <- function(input_df) {
  # Pre-allocate an output data frame with the same shape as the input
  output_df <- data.frame(matrix(ncol = ncol(input_df), nrow = nrow(input_df)))
  colnames(output_df) <- colnames(input_df)
  for (i in seq_len(nrow(input_df))) {
    # Round row i as a whole so that its sum is preserved
    a <- smart.round(as.numeric(input_df[i, ]))
    for (k in seq_len(ncol(input_df))) {
      output_df[i, k] <- a[k]
    }
  }
  return(output_df)
}
v_rounded <- rounding_function(v)
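The row-by-row loop could probably be shortened with apply(). The following is a sketch under the assumption that all columns are numeric; the name v_rounded_apply is made up here.

```r
smart.round <- function(x, digits = 0) {
  up <- 10 ^ digits
  x <- x * up
  y <- floor(x)
  indices <- tail(order(x - y), round(sum(x)) - sum(y))
  y[indices] <- y[indices] + 1
  y / up
}

v <- data.frame(a = c(0.9472164, 71.5330771, 27.5197066),
                b = c(4.6472164, 5.6330771, 27.1197066))

# apply(v, 1, f) calls f on each row and returns the results as columns,
# so t() is needed to turn them back into rows.
v_rounded_apply <- as.data.frame(t(apply(v, 1, smart.round)))
v_rounded_apply
```

Each row of the result should sum to the rounded sum of the corresponding input row.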

How to solve an equation for a given variable in R?

The equation is a <- x * t - 2 * x. I want to solve it for t.
So basically, set a = 0 and solve for t. I am new to the R packages for solving equations. I need a package that can handle complex roots, since the original equations I am working with have both real and imaginary roots. I am looking for an algebraic solution only, not a numerical one.
I tried:
a <- x * t - 2 * x
solve(a,t)
I run into an error:
Error in solve.default(a, t) : 'a' (1000 x 1) must be square
You can use Ryacas to get the solution as an expression of x:
library(Ryacas)
x <- Sym("x")
t <- Sym("t")
Solve(x*t-2*x == 0, t)
# Yacas vector:
# [1] t == 2 * x/x
As you can see, the solution is t=2 (assuming x is not zero).
Let's try a less trivial example:
Solve(x*t-2*x == 1, t)
# Yacas vector:
# [1] t == (2 * x + 1)/x
If you want to get a function which provides the solution as a function of x, you can do:
solution <- Solve(x*t-2*x == 1, t)
f <- function(x){}
body(f) <- yacas(paste0("t Where ", solution))$text
f
# function (x)
# (2 * x + 1)/x
You might be looking for optimize, which numerically searches an interval for a minimum of a function (not for a root):
a=function(x,t) x*t-2*x
optimize(a,lower=-100,upper=100,t=10)
optimize(a,lower=-100,upper=100,x=2)
If you need more help, please provide a reproducible example.
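If a numerical answer is acceptable, base R's uniroot() may be closer to the goal than optimize(), because it searches an interval for a zero of the function rather than a minimum. A minimal sketch, assuming x is fixed at 2 (an arbitrary illustrative value) and that the root lies between -100 and 100; it does not give the algebraic solution the question asks for.

```r
a <- function(x, t) x * t - 2 * x

# Solve a(x = 2, t) = 0 for t; the function must change sign on the interval.
res <- uniroot(function(t) a(x = 2, t = t), lower = -100, upper = 100)
res$root  # approximately 2
```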

How to condition a computation and then add al computation done in R?

I am experimenting with R and I can't find a way to do the following:
1- Where x == 3, multiply x by the "y" value of the same row
2- Add up all the products computed in step 1.
x <- 3426278722533992028364647392927338
y <- 7479550949037487987438746984798374
x <- as.numeric(strsplit(as.character(x), "")[[1]])
y <- as.numeric(strsplit(as.character(y), "")[[1]])
Table <- table(x,y)
Table <- data.frame(Table)
Table$Freq <- NULL
So I tried creating a function:
Calculation <- function (x,y) {
z <- if(x == 3){ x * y }
w <- sum(z)
}
x and y are the columns of the data.frame.
This produces an error which I struggle to solve...
Thanks for your time,
Kylian Pattje
2 things here:
1. Use ifelse in your function,
Calculation <- function (x,y) {
z <- ifelse(x == 3, x * y, NA)
w <- sum(z, na.rm = TRUE)
return(w)
}
2. Make sure your variables are NOT factors,
Table[] <- lapply(Table, function(i) as.numeric(as.character(i)))
Calculation(Table$x, Table$y)
#[1] 84
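The reason for switching to ifelse is that if() expects a single TRUE or FALSE, while ifelse() tests every element. Logical indexing is another way to express the same filter-then-sum step. A small sketch on made-up vectors (not the data from the question):

```r
# Illustrative data only.
x <- c(3, 1, 3, 2)
y <- c(4, 5, 6, 7)

# Variant 1: ifelse() keeps the product where x == 3 and NA elsewhere.
via_ifelse <- sum(ifelse(x == 3, x * y, NA), na.rm = TRUE)

# Variant 2: logical indexing selects the matching rows before multiplying.
keep <- x == 3
via_index <- sum(x[keep] * y[keep])

via_ifelse  # 30, i.e. 3*4 + 3*6
via_index   # 30
```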

How to sum values printed by print() in R

T <- function (p,q) {
for (x in 1:3) {
for (y in 1:3) {
sum(p*x + q*y)
print(sum(p*x + q*y))
}
}
}
sum(T(2,2))
I want the sum of 4, 6, 8, 6, 8, 10, 8, 10, 12 which is the output of the loop where p=2 and q=2 but the result is
sum(T(2,2))
[1] 4
[1] 6
[1] 8
[1] 6
[1] 8
[1] 10
[1] 8
[1] 10
[1] 12
Is there any way that I can get the sum, or collect the printed values in a vector rather than printing each value separately? The sum() function doesn't work on printed values.
Printing to the screen does exactly that; it doesn't return the value printed to the calling code. Your function needs to build a vector and return that. Here are a few different ways of doing it:
Code along the lines of what you were trying to do, but starting with an empty numeric vector and building it for each iteration:
T1 <- function(p, q) {
out <- numeric(0)
for (x in 1:3) {
for (y in 1:3) {
out <- c(out, p * x + q * y)
}
}
out
}
Preassigning the vector size and assigning to specific members of the vector:
T2 <- function(p, q) {
out <- numeric(3 ^ 2)
for (x in 1:3) {
for (y in 1:3) {
out[(x - 1) * 3 + y] <- p * x + q * y
}
}
out
}
Using one of the map functions in purrr (overkill here, but it can be extended to functions which don't easily vectorise):
library(purrr)
T3 <- function(p, q) {
map2_dbl(rep(1:3, each = 3), rep(1:3, 3), function(x, y) p * x + q * y)
}
Using vector arithmetic to return the result in one command:
T4 <- function(p, q) {
rep(1:3, each = 3) * p + rep(1:3, 3) * q
}
Using outer to achieve the same as T4 (as per @alistaire's comment):
T5 <- function(p, q) {
  # the y terms (scaled by q) go first so that c() flattens in the same order as T4
  c(outer(1:3 * q, 1:3 * p, `+`))
}
Note that the efficiency of these methods will vary as the size of the loops increases, but in general method 4 will be the most efficient, closely followed by method 5. Interestingly, for small loops method 1 seems to be faster than method 2, but method 1 becomes inefficient as the loop size increases because R has to copy the vector each time it grows.
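As a quick check of the point about printing versus returning, here is a short sketch. The name print_only is made up for illustration; T4 is the vectorised version from this answer.

```r
# A function that only prints: the value of a for loop is NULL,
# so nothing useful is handed back to sum().
print_only <- function(p, q) {
  for (x in 1:3) for (y in 1:3) print(p * x + q * y)
}
res <- print_only(2, 2)  # prints the nine values
is.null(res)             # TRUE

# A function that returns the values as a vector.
T4 <- function(p, q) {
  rep(1:3, each = 3) * p + rep(1:3, 3) * q
}
values <- T4(2, 2)
values       # 4 6 8 6 8 10 8 10 12
sum(values)  # 72
```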

R: How can I calculate large numbers in n-choose-k? [duplicate]

This question already has answers here:
How would you program Pascal's triangle in R?
(2 answers)
How to work with large numbers in R?
(1 answer)
Closed 6 years ago.
For a class assignment, I need to create a function that calculates n choose k. I did just that, and it works fine with small numbers (e.g. 6 choose 2), but I'm supposed to get it to work with 200 choose 50, where it naturally doesn't.
The answer is too large and R outputs NaN or Inf, saying:
> q5(200, 50)
[1] "NaN"
Warning message:
In factorial(n) : value out of range in 'gammafn'
I tried using logs and exponents, but it doesn't cut it.
q5 <- function (n, k) {
answer <- log(exp( factorial(n) / ( (factorial(k)) * (factorial(n - k)) )))
paste0(answer)
}
The answer to the actual question is that R cannot show numbers it cannot represent, and some of the terms in your equation are too big to represent as doubles, so it fails. However, there are approximations to the factorial that can be used. They work with logarithms, which grow much more slowly.
The most famous one, Stirling's approximation, was not accurate enough, but Ramanujan's approximation came to the rescue :)
ramanujan <- function(n){
n*log(n) - n + log(n*(1 + 4*n*(1+2*n)))/6 + log(pi)/2
}
nchoosek <- function(n,k){
factorial(n)/(factorial(k)*factorial(n-k))
}
bignchoosek <- function(n,k){
exp(ramanujan(n) - ramanujan(k) - ramanujan(n-k))
}
nchoosek(20,5)
# [1] 15504
bignchoosek(20,5)
# [1] 15504.06
bignchoosek(200,50)
# [1] 4.538584e+47
You can try this too:
q5 <- function (n, k) {
# nchoosek = (n-k+1)(n-k+2)...n / (1*2*...*k)
return(prod(sapply(1:k, function(i)(n-k+i)/(i))))
}
q5(200, 50)
#[1] 4.538584e+47
or in log domain
q5 <- function (n, k) {
# ln (nchoosek) = ln(n-k+1) + ln(n-k+2) + ...+ ln(n) - ln(1) -ln(2) - ...- ln(k)
return(exp(sum(sapply(1:k, function(i)(log(n-k+i) - log(i))))))
}
q5(200, 50)
#[1] 4.538584e+47
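For comparison with the hand-written versions, base R also provides choose(), lchoose() and lgamma(), which avoid the overflowing factorials. A class assignment may well require writing the function by hand, so treat this as a cross-check rather than the intended answer; the variable names are chosen here for illustration.

```r
n <- 200
k <- 50

direct     <- choose(n, k)        # binomial coefficient
via_log    <- exp(lchoose(n, k))  # log of the binomial coefficient, then exp
via_lgamma <- exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1))  # log(n!) = lgamma(n + 1)

direct        # about 4.538584e+47
choose(20, 5) # 15504
```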
The packages for large numbers:
Brobdingnag package for "Very large numbers in R":
https://cran.r-project.org/web/packages/Brobdingnag/index.html
Paper: https://www.researchgate.net/publication/251996764_Very_large_numbers_in_R_Introducing_package_Brobdingnag
library(Brobdingnag)
googol <- as.brob(10)^100 # googol:=10^100
googol
# [1] +exp(230.26) # exponential notation is convenient for very large numbers
gmp package for Multiple Precision Arithmetic (big integers and rationals, prime number tests, matrix computation):
https://cran.r-project.org/web/packages/gmp/index.html
This solution builds Pascal's triangle row by row, up to and including row 200:
x <- 1
print(x)
for (i in 1:200) { x <- c(0, x) + c(x, 0); print(x) }
x[51] ### 200 choose 50
## > x[51]
## [1] 4.538584e+47
(as I proposed for How would you program Pascal's triangle in R?)
If you want to speed up the code, omit the print(x) calls (output is a relatively slow operation).
To put the code in a function, we can do:
nchoosek <- function(n,k) {
x <- 1
for (i in 1:n) x <- c(0, x) + c(x, 0)
x[k+1] ### n choose k
}
nchoosek(200, 50) ### testing the function
## [1] 4.538584e+47
Here is a more refined version of my function:
nchoosek <- function(n, k) {
if (k==0) return(1)
if (k+k > n) k <- n-k
if (k==0) return(1)
x <- 1
for (i in 1:k) x <- c(0, x) + c(x, 0)
for (i in 1:(n-k)) x <- x + c(0, head(x, -1))
tail(x, 1)
}
nchoosek(200, 50) ### testing the function
## [1] 4.538584e+47
