I have a vector of numbers that I need to round according to the rules in the image below:
Consider the following examples:
0.5 -> 0.5 (no rounding)
1.2 -> 1.0
3.7 -> 4.0
18.9 -> 20.0
28.1 -> 30.0
110 -> 120
I could in theory write a series of conditional statements to achieve this, but that would be tedious and inefficient. Is there a way to achieve the desired outcome more efficiently?
Thank you
You could use the floor of the base-10 logarithm to calculate powers of 10. Then divide the vector by those powers, round, and multiply by the powers of 10 again.
tens <- 10^floor(log10(abs(x)))
round(x/tens)*tens
# [1] NaN 0.5 1.0 4.0 -4.0 20.0 30.0 100.0
Note that this won't work for zero, so you should handle that case separately.
(However, 110 -> 120 is not obvious to me.)
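For example, a minimal sketch of one way to do that case-handling (the wrapper name round_to_tens is just illustrative):
round_to_tens <- function(x) {
  tens <- 10^floor(log10(abs(x)))
  out <- round(x / tens) * tens
  out[x == 0] <- 0   # log10(0) is -Inf, so patch zeros explicitly
  out
}
round_to_tens(x)
# [1]   0.0   0.5   1.0   4.0  -4.0  20.0  30.0 100.0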
Data:
x <- c(0, .5, 1.2, 3.7, -3.7, 18.9, 28.1, 110)
This solution uses findInterval to determine which of the rounding functions should be applied to each of the vector's elements.
roundSpecial <- function(x) {
  round_funs <- list(
    no_round    = function(x) x,
    round_by_1  = function(x) round(x),
    round_to_20 = function(x) 20,
    round_by_10 = function(x) 10 * round(x / 10),
    round_by_15 = function(x) 15 * round(x / 15),
    round_by_30 = function(x) 30 * round(x / 30)
  )
  lims <- c(0, 1, 17, 20, 30, 90, Inf)
  which_fun <- findInterval(x, lims)
  sapply(seq_along(which_fun), function(i) {
    round_funs[[which_fun[i]]](x[i])
  })
}
roundSpecial(x)
#[1] 0.5 1.0 4.0 20.0 30.0 120.0
Data
x <- c(0.5, 1.2, 3.7, 18.9, 28.1, 110)
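For reference, findInterval is what selects the rounding function; with the limits above it maps each value to the index of the interval it falls into:
findInterval(x, c(0, 1, 17, 20, 30, 90, Inf))
#[1] 1 2 2 3 4 6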
Here's an example of how the group label from cut() doesn't seem accurate. The observation with x1=200 is classified in the [0,200) group of x2, which is wrong. The label can be fixed by increasing dig.lab, but I still think the default rounding should give a result for x2 with face validity. Is this a bug?
df <- data.frame(x1 = c(100, 100.5, 200, 200.5))
df$x2 <- cut(df$x1, breaks = c(0,200.1,999), right = FALSE)
df$x3 <- cut(df$x1, breaks = c(0,200.1,999), right = FALSE, dig.lab = 4)
df
# x1 x2 x3
# 1 100.0 [0,200) [0,200.1)
# 2 100.5 [0,200) [0,200.1)
# 3 200.0 [0,200) [0,200.1)
# 4 200.5 [200,999) [200.1,999)
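For reference, the factor's integer codes show which interval each observation actually landed in; 200 is in the first interval because the real break is 200.1, so it is the printed label rather than the classification that is affected:
as.integer(df$x2)
# [1] 1 1 1 2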
The title may be confusing, but I guess this has a simple solution. I have my own function and I want to apply the same function to multiple lists that each consist of two columns, but I need to do different calculations on each column separately.
As an example, mydata is:
x1 x2 y1 y2 z1 z2
1 0.0 0.0 0.0 7.8 0.0 8.6
2 8.6 0.0 0.0 7.6 1.6 1.4
3 11.2 7.8 3.4 1.2 7.6 0.0
4 8.4 7.6 21.4 10.2 23.6 0.0
5 0.0 1.2 1.8 7.0 3.2 0.0
6 0.0 10.2 1.4 0.0 0.0 0.0
mydata<-structure(list(x1 = c(0, 8.6, 11.2, 8.4, 0, 0), x2 = c(0, 0,
7.8, 7.6, 1.2, 10.2), y1 = c(0, 0, 3.4, 21.4, 1.8, 1.4), y2 = c(7.8,
7.6, 1.2, 10.2, 7, 0), z1 = c(0, 1.6, 7.6, 23.6, 3.2, 0), z2 = c(8.6,
1.4, 0, 0, 0, 0)), .Names = c("x1", "x2", "y1", "y2", "z1", "z2"
), class = "data.frame", row.names = c(NA, -6L))
And the myfun function is:
myfun <- function(x) {
  means <- sapply(list(x), function(ss) mean(ss, na.rm = TRUE))
  #my point: vars <- sapply(list(y), function(ss) var(ss, na.rm = TRUE))
  mean <- means[[1]]
  #var <- vars[[1]]
  #lists <- list(mean, var)
  #names(lists) <- c("mean", "var")
  #return(lists)
  lists <- list(mean)
  names(lists) <- c("mean")
  return(lists)
}
I used # for the parts that will be added to myfun later.
When I tried
results <- lapply(mydata, myfun)
I could apply the same function and the same calculation to each column.
As you can see, there are 2 columns (x1-x2, y1-y2, z1-z2) for each dataset (x, y, z).
What I want is:
1) Obtaining means of first columns (x1, y1, z1)
2) Obtaining variances of second columns (x2, y2, z2)
3) And as output, I want to see the results of mean1 and var1 for each dataset under the x, y and z lists, like:
x-> mean1 (mean of x1)
var1 (var of x2)
y-> mean1 (mean of y1)
var1 (var of y2)
4) Do all these in a loop with lapply or sapply or with any useful function.
Notes:
1) I did not group x1 and x2 under x, or y1 and y2 under y, because if a solution can be found for the mydata form it would be more useful for me. But if necessary, I can group them separately.
2) The myfun function currently finds the means of all 6 columns. I have indicated with # the additional parts that will be used to calculate the variances of the second columns.
Consider assigning your groups first, then iterating over them with lapply. In fact, use sapply with simplify=FALSE to get a named list.
grps <- unique(gsub("[0-9]", "", colnames(mydata)))
# [1] "x" "y" "z"
myfun <- function(grp)
  list(mean = mean(mydata[, paste0(grp, 1)]),
       variance = var(mydata[, paste0(grp, 2)]))
mean_var_list <- sapply(grps, myfun, simplify = FALSE)
mean_var_list
# $x
# $x$mean
# [1] 4.7
#
# $x$variance
# [1] 20.87467
#
# $y
# $y$mean
# [1] 4.666667
#
# $y$variance
# [1] 16.53467
#
# $z
# $z$mean
# [1] 6
#
# $z$variance
# [1] 11.85067
Or use the default, simplify=TRUE, to return a matrix.
mean_var_mat <- sapply(grps, myfun)
mean_var_mat
# x y z
# mean 4.7 4.666667 6
# variance 20.87467 16.53467 11.85067
I would start by splitting the data frame to create a list of data frames with 2 columns each.
At that point you can use lapply or map_dfr to apply the function mean_var to each element of the list.
The advantage of map_dfr is that it returns a data frame, binding the rows of the function output.
library(purrr)
my_data_l <- split.default(mydata, rep(1:3, each = 2))
mean_var <- function(x) {
  list(mean = mean(x[, 1]), var = var(x[, 2]))
}
map_dfr(my_data_l, mean_var)
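If you also want the group labels in the output, one option (a sketch; my_data_l2 is just an illustrative name) is to split on the letter prefix of the column names and pass .id to map_dfr:
# split on the letter prefix so the list elements are named x, y, z
my_data_l2 <- split.default(mydata, gsub("[0-9]", "", names(mydata)))
# .id adds a column with the list names; the mean/var values should match
# the ones shown in the previous answer
map_dfr(my_data_l2, mean_var, .id = "group")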
I am new to R and need to do pairwise comparison formulas across a set of variables. The number of elements to be compared will be dynamic, but here is a hardcoded example with 4 elements, each compared against the others:
#there are 4 choices A, B, C, D -
#they are compared against each other and comparisons are stored:
df1 <- data.frame("A" = c(80),"B" = c(20))
df2 <- data.frame("A" = c(90),"C" = c(10))
df3 <- data.frame("A" = c(95), "D" = c(5))
df4 <- data.frame("B" = c(80), "C" = c(20))
df5 <- data.frame("B" = c(90), "D" = c(10))
df6 <- data.frame("C" = c(80), "D" = c(20))
#show the different comparisons in a matrix
matrixA <- matrix(c("", df1$B[1], df2$C[1], df3$D[1],
df1$A[1], "", df4$C[1], df5$D[1],
df2$A[1], df4$B[1], "", df6$D[1],
df3$A[1], df5$B[1], df6$C[1], ""),
nrow=4,ncol = 4,byrow = TRUE)
dimnames(matrixA) = list(c("A","B","C","D"),c("A","B","C","D"))
#perform calculations on the comparisons
matrixB <- matrix(
c(1, df1$B[1]/df1$A[1], df2$C[1]/df2$A[1], df3$D[1]/df3$A[1],
df1$A[1]/df1$B[1], 1, df4$C[1]/df4$B[1], df5$D[1]/df5$B[1],
df2$A[1]/df2$C[1], df4$B[1]/df4$C[1], 1, df6$D[1]/df6$C[1],
df3$A[1]/df3$D[1], df5$B[1]/df5$D[1], df6$C[1]/df6$D[1], 1),
nrow = 4, ncol = 4, byrow = TRUE)
matrixB <- rbind(matrixB, colSums(matrixB)) #add the sum of the colums
dimnames(matrixB) = list(c("A","B","C","D","Sum"),c("A","B","C","D"))
#do some more calculations that I'll use later on
dfC <- data.frame("AB" = c(matrixB["A","A"] / matrixB["A","B"],
matrixB["B","A"] / matrixB["B","B"],
matrixB["C","A"] / matrixB["C","B"],
matrixB["D","A"] / matrixB["D","B"]),
"BC" = c(matrixB["A","B"] / matrixB["A","C"],
matrixB["B","B"] / matrixB["B","C"],
matrixB["C","B"] / matrixB["C","C"],
matrixB["D","B"] / matrixB["D","C"]
),
"CD" = c(matrixB["A","C"] / matrixB["A","D"],
matrixB["B","C"] / matrixB["B","D"],
matrixB["C","C"] / matrixB["C","D"],
matrixB["D","C"] / matrixB["D","D"]))
dfCMeans <- colMeans(dfC)
#create the normalization matrix
matrixN <- matrix(c(
matrixB["A","A"] / matrixB["Sum","A"], matrixB["A","B"] / matrixB["Sum","B"], matrixB["A","C"] / matrixB["Sum","C"], matrixB["A","D"] / matrixB["Sum","D"],
matrixB["B","A"] / matrixB["Sum","A"], matrixB["B","B"] / matrixB["Sum","B"], matrixB["B","C"] / matrixB["Sum","C"], matrixB["B","D"] / matrixB["Sum","D"],
matrixB["C","A"] / matrixB["Sum","A"], matrixB["C","B"] / matrixB["Sum","B"], matrixB["C","C"] / matrixB["Sum","C"], matrixB["C","D"] / matrixB["Sum","D"],
matrixB["D","A"] / matrixB["Sum","A"], matrixB["D","B"] / matrixB["Sum","B"], matrixB["D","C"] / matrixB["Sum","C"], matrixB["D","D"] / matrixB["Sum","D"]
), nrow = 4, ncol = 4, byrow = TRUE)
Since R is so concise, it seems like there should be a much better way to do this. I would like to know an easier way to do these kinds of calculations in R.
OK, I might be starting to piece together something here.
We start with a matrix like so:
A <- structure(
c(NA, 20, 10, 5, 80, NA, 20, 10, 90, 80, NA, 20, 95, 90, 80, NA),
.Dim = c(4, 4),
.Dimnames = list(LETTERS[1:4], LETTERS[1:4]))
A
# A B C D
# A NA 80 90 95
# B 20 NA 80 90
# C 10 20 NA 80
# D 5 10 20 NA
This matrix is the result of a pairwise comparison on a vector of length 4. We know nothing about this vector, and the only thing we know about the function used in the comparison is that it is binary and non-commutative; more precisely, f(x, y) = 100 - f(y, x), with the result in [0, 100].
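A quick check of that property on the matrix above:
# every off-diagonal pair f(x, y), f(y, x) should sum to 100
all(A + t(A) == 100, na.rm = TRUE)
# [1] TRUE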
matrixB appears to be simply the transpose of matrixA divided element-wise by matrixA:
B = Aᵀ / A (element-wise)
or, if you prefer,
B = (100 - A) / A
Potato, potahto; the two are equivalent because of the property mentioned above.
B <- (100 - A) / A
B <- t(A) / A      # equivalent to the line above, by the property noted earlier
# fill in the diagonal with 1s
diag(B) <- 1
round(B, 2)
# A B C D
# A 1 0.25 0.11 0.05
# B 4 1.00 0.25 0.11
# C 9 4.00 1.00 0.25
# D 19 9.00 4.00 1.00
The 'normalized' matrix as you call it seems to be simply each column divided by its sum.
B.norm <- t(t(B) / colSums(B))
round(B.norm, 3)
# A B C D
# A 0.030 0.018 0.021 0.037
# B 0.121 0.070 0.047 0.079
# C 0.273 0.281 0.187 0.177
# D 0.576 0.632 0.746 0.707
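Putting the two steps together, here is a minimal sketch of a reusable helper for any number of alternatives (the name pairwise_tables is just illustrative, and it assumes the same f(x, y) = 100 - f(y, x) convention):
pairwise_tables <- function(A) {
  B <- t(A) / A                       # element-wise division by the transpose
  diag(B) <- 1                        # an item compared with itself
  list(B = B, B_norm = t(t(B) / colSums(B)))
}
res <- pairwise_tables(A)
round(res$B_norm, 3)   # same matrix as above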
I have been trying to solve a constrained optimization problem in R using constrOptim() (my first time) but am struggling to set up the constraints for my problem.
The problem is pretty straightforward and I can set up the function OK, but I am a bit at a loss about how to pass the constraints in.
E.g. the problem I've defined is below (I'm going to start with N fixed at 1000, say, so I just want to solve for X; ultimately I'd like to choose both N and X to maximize profit):
So I can set up the function as:
fun <- function(x, N, a, c, s) { ## a profit function
  x1 <- x[1]
  x2 <- x[2]
  x3 <- x[3]
  a1 <- a[1]
  a2 <- a[2]
  a3 <- a[3]
  c1 <- c[1]
  c2 <- c[2]
  c3 <- c[3]
  s1 <- s[1]
  s2 <- s[2]
  s3 <- s[3]
  ((N*x1*a1*s1) - (N*x1*c1)) + ((N*x2*a2*s2) - (N*x2*c2)) + ((N*x3*a3*s3) - (N*x3*c3))
}
The constraints I need to implement are:
x1>=0.03
x1<=0.7
x2>=0.03
x2<=0.7
x3>=0.03
x3<=0.7
x1+x2+x3=1
The X values here represent buckets into which I need to optimally allocate N, so x1 = percent of N to place in bucket 1, etc., with each bucket getting at least 3% but no more than 70%.
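Written in the matrix form most LP solvers expect (the object names below are just illustrative), the constraints look like this:
A_ge <- diag(3);         b_ge <- rep(0.03, 3)   # A_ge %*% x >= b_ge  (each xi >= 0.03)
A_le <- diag(3);         b_le <- rep(0.7, 3)    # A_le %*% x <= b_le  (each xi <= 0.7)
A_eq <- matrix(1, 1, 3); b_eq <- 1              # A_eq %*% x == b_eq  (x1 + x2 + x3 = 1)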
Any help much appreciated...
e.g. here is an example I used to test that the function (defined above) does what I want:
x <-matrix(c(0.5,0.25,0.25));
a <-matrix(c(0.2,0.15,0.1));
s <-matrix(c(100,75,50));
c <-matrix(c(10,8,7));
N <- 1000;
fun(x,N,a,c,s);
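For reference, with these inputs the call above evaluates to:
# [1] 5312.5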
You can use the lpSolveAPI package.
## problem constants
a <- c(0.2, 0.15, 0.1)
s <- c(100, 75, 50)
c <- c(10, 8, 7)
N <- 1000
## Problem formulation
# x1 >= 0.03
# x1 <= 0.7
# x2 >= 0.03
# x2 <= 0.7
# x3 >= 0.03
# x1 + x2 + x3 = 1
# (the x3 <= 0.7 bound is not modelled as an explicit row here)
# objective: maximize N*((a1*s1 - c1)*x1 + (a2*s2 - c2)*x2 + (a3*s3 - c3)*x3)
library(lpSolveAPI)
my.lp <- make.lp(6, 3)
The best way to build a model in lpSolveAPI is column-wise:
#constraints by columns; the six rows are, in order:
# x1 >= 0.03, x1 <= 0.7, x2 >= 0.03, x2 <= 0.7, x3 >= 0.03, x1 + x2 + x3 = 1
set.column(my.lp, 1, c(1, 1, 0, 0, 0, 1))
set.column(my.lp, 2, c(0, 0, 1, 1, 0, 1))
set.column(my.lp, 3, c(0, 0, 0, 0, 1, 1))
# the objective function; since we need to maximize, I negate it: max(f) = -min(-f)
set.objfn(my.lp, -N * c(a[1]*s[1] - c[1], a[2]*s[2] - c[2], a[3]*s[3] - c[3]))
set.rhs(my.lp, c(rep(c(0.03,0.7),2),0.03,1))
#constraint types
set.constr.type(my.lp, c(rep(c(">=","<="), 2),">=","="))
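As an aside (not run here), lpSolveAPI can also express the box constraints as variable bounds instead of explicit rows, which would leave only the x1 + x2 + x3 = 1 row in the constraint matrix:
# alternative sketch: bounds instead of the >=/<= rows
# set.bounds(my.lp, lower = rep(0.03, 3), upper = rep(0.7, 3), columns = 1:3)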
Take a look at the model:
my.lp
Model name: 
            C1      C2     C3
Minimize  -10000  -3250   2000
R1             1      0      0  >=  0.03
R2             1      0      0  <=  0.7
R3             0      1      0  >=  0.03
R4             0      1      0  <=  0.7
R5             0      0      1  >=  0.03
R6             1      1      1   =  1
Kind         Std    Std    Std
Type        Real   Real   Real
Upper        Inf    Inf    Inf
Lower          0      0      0
solve(my.lp)
[1] 0 ## success :)
get.objective(my.lp)
[1] -7817.5
## (we minimized the negated profit, so the maximum profit is 7817.5)
get.constraints(my.lp)
[1] 0.70 0.70 0.27 0.27 0.03 1.00
## the decision variables
get.variables(my.lp)
[1] 0.70 0.27 0.03
Hi, just in case it's of use to anyone, I also found an answer, as below:
First of all, your objective function can be written a lot more concisely using vector operations:
> my_obj_coeffs <- function(N,a,c,s) N*(a*s-c)
> fun <- function(x,N,a,c,s) sum(my_obj_coeffs(N,a,c,s) * x)
You're trying to solve a linear program, so you can solve it using the simplex algorithm. There's a lightweight implementation of it in the 'boot' package.
> library(boot)
> solution <- function(obj) simplex(a = obj, A1 = diag(3), b1 = rep(0.7, 3),
+     A2 = diag(3), b2 = rep(0.03, 3), A3 = matrix(1, 1, 3), b3 = 1, maxi = TRUE)
Then for the example parameters you used, you can call that solution function:
> a <- c(0.2,0.15,0.1)
> s <- c(100,75,50)
> c <- c(10,8,7)
> N <- 1000
> solution(my_obj_coeffs(N,a,c,s))
Linear Programming Results
Call : simplex(a = obj(N, a, s, c), A1 = diag(3), b1 = rep(0.7, 3),
A2 = diag(3), b2 = rep(0.03, 3), A3 = matrix(1, 1, 3), b3 = 1,
maxi = TRUE)
Maximization Problem with Objective Function Coefficients
[,1]
[1,] 10000
[2,] 3250
[3,] -2000
attr(,"names")
[1] "x1" "x2" "x3"
Optimal solution has the following values
x1 x2 x3
0.70 0.27 0.03
The optimal value of the objective function is 7817.5.
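As a quick cross-check of that optimum:
> sum(my_obj_coeffs(N, a, c, s) * c(0.70, 0.27, 0.03))
[1] 7817.5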