I am trying to write a function that works like VLOOKUP in Excel, but that returns the row with the maximum value for each item, together with the other values in that row.
The data frames I am dealing with are given below:
dput(Book3)
structure(list(Item = c("ABA", "ABB", "ABC", "ABD", "ABE", "ABF"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
dput(Book4)
structure(list(Item = c("ABA", "ABB", "ABC", "ABD", "ABE", "ABF",
"ABA", "ABB", "ABC", "ABD", "ABE", "ABF", "ABA", "ABB", "ABC",
"ABD", "ABE", "ABF"), Max1 = c(12, 68, 27, 17, 74, 76, 78, 93,
94, 98, 46, 90, 5, 58, 67, 64, 34, 97), Additional1 = c(40, 66,
100, 33, 66, 19, 8, 70, 21, 93, 48, 34, 44, 89, 74, 20, 0, 47
), Additional2 = c(39, 31, 85, 58, 0, 2, 57, 28, 31, 32, 15,
22, 93, 41, 57, 81, 95, 46)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L))
The Expected output for this is given below:
You are looking for slice_max:
library(dplyr)
Book4 %>%
group_by(Item) %>%
slice_max(Max1)
# Item Max1 Additional1 Additional2
# 1 ABA 78 8 57
# 2 ABB 93 70 28
# 3 ABC 94 21 31
# 4 ABD 98 93 32
# 5 ABE 74 66 0
# 6 ABF 97 47 46
Using base R
subset(Book4, Max1 == ave(Max1, Item, FUN = max))
-output
# A tibble: 6 × 4
Item Max1 Additional1 Additional2
<chr> <dbl> <dbl> <dbl>
1 ABE 74 66 0
2 ABA 78 8 57
3 ABB 93 70 28
4 ABC 94 21 31
5 ABD 98 93 32
6 ABF 97 47 46
An alternative base R solution that is more resilient to floating-point precision problems (cf. Why are these numbers not equal?, https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f). It also gives you two options for what happens when the maximum value is duplicated within a group:
if you want all of them, use ties.method = "min";
if you want the first (or just one) of them, then ties.method = "first".
Book4[ave(Book4$Max1, Book4$Item, FUN = function(z) rank(-z, ties.method = "first")) == 1,]
# # A tibble: 6 x 4
# Item Max1 Additional1 Additional2
# <chr> <dbl> <dbl> <dbl>
# 1 ABE 74 66 0
# 2 ABA 78 8 57
# 3 ABB 93 70 28
# 4 ABC 94 21 31
# 5 ABD 98 93 32
# 6 ABF 97 47 46
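To make the difference between the two ties.method options concrete, here is a minimal sketch on a small made-up data frame. The names ties_df and rank_in_group are hypothetical and only used for illustration; they are not part of the question's data.

```r
# Hypothetical data: item "A" reaches its maximum (9) twice
ties_df <- data.frame(Item = c("A", "A", "A", "B"), Max1 = c(5, 9, 9, 3))

# Rank within each Item, largest value first, with a chosen ties.method
rank_in_group <- function(method) {
  ave(ties_df$Max1, ties_df$Item,
      FUN = function(z) rank(-z, ties.method = method))
}

all_max   <- ties_df[rank_in_group("min") == 1, ]    # keeps both tied rows of "A"
first_max <- ties_df[rank_in_group("first") == 1, ]  # keeps only the first tied row
```

With "min" every tied maximum gets rank 1, so all of them pass the == 1 filter; with "first" only the earliest one does.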
Using R base aggregate + max + merge
> merge(Book4, aggregate(Max1~Item, data = Book4, max), by = c("Item", "Max1"))
Item Max1 Additional1 Additional2
1 ABA 78 8 57
2 ABB 93 70 28
3 ABC 94 21 31
4 ABD 98 93 32
5 ABE 74 66 0
6 ABF 97 47 46
I am fitting an sPLS-DA model in R on 10 data frames (data from 10 study areas), stored as a list (datalist). All these data frames have the same variables, just with different values.
I use the mixOmics library to do this.
This is the head of the first study area. It relates the absence or presence of wetlands (factor variable: wetl or no wetl) to the TPI values computed at different ranges.
> head(datalist[[1]])
OID POINTID WETLAND TPI200 TPI350 TPI500 TPI700 TPI900 TPI1000 TPI2000 TPI3000 TPI4000 TPI5000 TPI2500
1 -1 1 no wetl 70 67 55 50 48 46 53 47 49 63 48
2 -1 2 no wetl 37 42 35 29 32 16 17 35 49 63 26
3 -1 3 no wetl 45 55 45 39 41 41 53 47 49 63 48
4 -1 4 no wetl 46 58 51 43 46 36 54 47 49 62 49
5 -1 5 no wetl 58 55 53 49 47 46 54 47 49 62 49
6 -1 6 no wetl 56 53 51 49 46 46 54 47 49 61 49
I have done the cross-validation step using the following code:
library(mixOmics)
for (i in 1: length(model_list))
{
myperf_plsda <- perf(model_list[[i]], validation = "Mfold", folds = 10,
progressBar = FALSE, nrepeat = 10, auc = TRUE)
save(myperf_plsda, file="performancePLSDA.RData")
}
model_list is the list obtained from the splsda function.
But now I am stuck on the next step, which is to look at the error rate (overall and per class).
For just one data frame (study area), I can use the following code:
# cross-validation error in function of nr of PCs
# can see how many PCs is best
plot(myperf_plsda, col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")
# error rate overall and per class
myperf_plsda$error.rate
myperf_plsda$error.rate.class
myperf_plsda$auc
So first, I want to plot the error as a function of the number of principal components (the plot() call above, for one study area), and I would like to have the resulting plots in a PDF.
Second, I want to know the overall error rate and the error rate per class, for which the code for one study area is shown above. The result for one study area is then for example:
overall error rate:
error rate per class:
I have tried several ways to put all this code in a for loop, or to use lapply, in order to get these results for the 10 study areas, such as:
### To see how many PCs is best ###
pdf('overallerrorrate_wetlall_small.pdf')
for (i in 1:length(myperf_plsda))
{
plot(model_list[[i]], col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")
}
dev.off()
or
for (i in 1:length(myperf_plsda))
{plot(myperf_plsda, col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal")}
or
for (i in 1:length(myperf_plsda))
{myperf_plsda[[1]]error.rate
myperf_plsda[[1]]error.rate.class
myperf_plsda[[i]]auc
}
or
lapply(myperf_plsda, `[[`, 'error.rate')
But none of these attempts work! How can I run the code for multiple elements in a list? Many thanks!
Based on your outputs, you will have to create a new list and save the results in it. Assigning to a single object myperf_plsda overwrites the result at each step of the loop. Also, most of the measures you want are lists, so I added some processing functions to turn them into data frames. I used the following dummy data:
library(mixOmics)
#Function
custom_splsda <- function(datalist, ncomp, keepX, ..., Xcols, Ycol){
Y <- datalist[[Ycol]]
X <- datalist[Xcols]
res <- splsda(X, Y, ncomp = ncomp, keepX = keepX, ...)
res
}
#Data
datalist <- list(df1 = structure(list(OID = c(-1, -1, -1, -1, -1, -1), POINTID = c(1,
2, 3, 4, 5, 6), WETLAND = c("no wetl", "no wetl", "no wetl",
"wetl", "wetl", "wetl"), TPI200 = c(70, 37, 45, 46, 58, 56),
TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55, 35, 45,
51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46),
TPI2000 = c(53, 17, 53, 54, 54, 54), TPI3000 = c(47, 35,
47, 47, 47, 47), TPI4000 = c(49, 49, 49, 49, 49, 49), TPI5000 = c(63,
63, 63, 62, 62, 61), TPI2500 = c(48, 26, 48, 49, 49, 49)), row.names = c(NA,
6L), class = "data.frame"), df2 = structure(list(OID = c(-1,
-1, -1, -1, -1, -1), POINTID = c(1, 2, 3, 4, 5, 6), WETLAND = c("no wetl",
"no wetl", "no wetl", "wetl", "wetl", "wetl"), TPI200 = c(70,
37, 45, 46, 58, 56), TPI350 = c(67, 42, 55, 58, 55, 53), TPI500 = c(55,
35, 45, 51, 53, 51), TPI700 = c(50, 29, 39, 43, 49, 49), TPI900 = c(48,
32, 41, 46, 47, 46), TPI1000 = c(46, 16, 41, 36, 46, 46), TPI2000 = c(53,
17, 53, 54, 54, 54), TPI3000 = c(47, 35, 47, 47, 47, 47), TPI4000 = c(49,
49, 49, 49, 49, 49), TPI5000 = c(63, 63, 63, 62, 62, 61), TPI2500 = c(48,
26, 48, 49, 49, 49)), row.names = c(NA, 6L), class = "data.frame"))
Now the code. I will create an empty list myperf_plsda:
#Create model_list, you must have the object created
model_list <- lapply(datalist, custom_splsda,
ncomp = 2, keepX = c(5, 5),
Xcols = 4:8, Ycol = "WETLAND")
#Create empty list
myperf_plsda <- list()
#Loop for objects and saving
for (i in 1: length(model_list))
{
myperf_plsda[[i]] <- perf(model_list[[i]], validation = "Mfold", folds = 3,
progressBar = FALSE, nrepeat = 3, auc = TRUE)
object <- myperf_plsda[[i]]
save(object,file = paste0("performancePLSDA.",i,".RData"))
}
#Process the object myperf_plsda
#First function to get elements
extract1 <- function(x)
{
#Object
error.rate <- x$error.rate
error.rate <- lapply(error.rate, as.data.frame)
#Process
O1 <- do.call(rbind,error.rate)
#Separate vars
O1$id <- rownames(O1)
rownames(O1) <- NULL
O1$id1 <- gsub("\\..*","", O1$id )
O1$id2 <- gsub(".*\\.","", O1$id )
O1$id <- NULL
return(O1)
}
#Function 2
extract2 <- function(x)
{
#Object
error.rate.class <- x$error.rate.class
names(error.rate.class) <- gsub('.','_',names(error.rate.class),fixed = T)
error.rate.class <- lapply(error.rate.class, as.data.frame)
#Process
O2 <- do.call(rbind,error.rate.class)
#Separate vars
O2$id <- rownames(O2)
rownames(O2) <- NULL
O2$id1 <- gsub("\\..*","", O2$id )
O2$id2 <- gsub(".*\\.","", O2$id )
O2$id <- NULL
return(O2)
}
#Function 3
extract3 <- function(x)
{
#Object
auc <- x$auc
#Modify for dataframe
change <- function(x)
{
y <- as.data.frame(x)
y$id1 <- rownames(y)
rownames(y)<-NULL
y$id1 <- gsub('.','_',y$id1,fixed = T)
return(y)
}
auc <- lapply(auc, change)
#Process
O3 <- do.call(rbind,auc)
#Separate vars
O3$id2 <- rownames(O3)
rownames(O3) <- NULL
O3$id2 <- gsub("\\..*","", O3$id2 )
return(O3)
}
#Apply functions and save in lists for later processing
L1 <- lapply(myperf_plsda,extract1)
L2 <- lapply(myperf_plsda,extract2)
L3 <- lapply(myperf_plsda,extract3)
#Assign the same names from model_list
names(L1) <- names(model_list)
names(L2) <- names(model_list)
names(L3) <- names(model_list)
#Bind the data
#Error rate
error.rate.df <- do.call(rbind,L1)
error.rate.df$genid <- gsub("\\..*","", rownames(error.rate.df) )
rownames(error.rate.df) <- NULL
#Error rate class
error.rate.class.df <- do.call(rbind,L2)
error.rate.class.df$genid <- gsub("\\..*","", rownames(error.rate.class.df) )
rownames(error.rate.class.df) <- NULL
#Auc
auc.df <- do.call(rbind,L3)
auc.df$genid <- gsub("\\..*","", rownames(auc.df) )
rownames(auc.df) <- NULL
With the previous code you end up with three data frames whose rows are identified according to the names of model_list. You can use the variables id1, id2 and genid to identify measures, components and datasets:
error.rate.df
max.dist centroids.dist mahalanobis.dist id1 id2 genid
1 0.2222222 0.2222222 0.2222222 overall comp1 df1
2 0.2777778 0.3888889 0.2777778 overall comp2 df1
3 0.2222222 0.2222222 0.2222222 BER comp1 df1
4 0.2777778 0.3888889 0.2777778 BER comp2 df1
5 0.2222222 0.2222222 0.2222222 overall comp1 df2
6 0.2777778 0.3333333 0.2777778 overall comp2 df2
7 0.2222222 0.2222222 0.2222222 BER comp1 df2
8 0.2777778 0.3333333 0.2777778 BER comp2 df2
error.rate.class.df
comp1 comp2 id1 id2 genid
1 0.3333333 0.3333333 max_dist no wetl df1
2 0.1111111 0.2222222 max_dist wetl df1
3 0.3333333 0.6666667 centroids_dist no wetl df1
4 0.1111111 0.1111111 centroids_dist wetl df1
5 0.3333333 0.3333333 mahalanobis_dist no wetl df1
6 0.1111111 0.2222222 mahalanobis_dist wetl df1
7 0.3333333 0.3333333 max_dist no wetl df2
8 0.1111111 0.2222222 max_dist wetl df2
9 0.3333333 0.5555556 centroids_dist no wetl df2
10 0.1111111 0.1111111 centroids_dist wetl df2
11 0.3333333 0.3333333 mahalanobis_dist no wetl df2
12 0.1111111 0.2222222 mahalanobis_dist wetl df2
auc.df
x id1 id2 genid
1 0.62966667 AUC_mean comp1 df1
2 0.06414361 AUC_sd comp1 df1
3 0.81483333 AUC_mean comp2 df1
4 0.06414361 AUC_sd comp2 df1
5 0.62966667 AUC_mean comp1 df2
6 0.06414361 AUC_sd comp1 df2
7 0.77780000 AUC_mean comp2 df2
8 0.11110000 AUC_sd comp2 df2
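The id1 and id2 columns come from the row names that do.call(rbind, ...) builds when it is given a named list of data frames. Here is a minimal sketch of that mechanism with made-up numbers; parts and combined are hypothetical names standing in for x$error.rate and O1.

```r
# Named list of small data frames, as a stand-in for x$error.rate
parts <- list(
  overall = data.frame(max.dist = c(0.2, 0.3), row.names = c("comp1", "comp2")),
  BER     = data.frame(max.dist = c(0.25, 0.35), row.names = c("comp1", "comp2"))
)

# rbind prefixes each row name with the list name: "overall.comp1", ...
combined <- do.call(rbind, parts)

# Split "name.rowname" into two columns, then drop the row names
id <- rownames(combined)
combined$id1 <- gsub("\\..*", "", id)   # text before the first dot
combined$id2 <- gsub(".*\\.", "", id)   # text after the last dot
rownames(combined) <- NULL
```

This is also why extract2 replaces the dots in names such as max.dist with underscores first: otherwise the split on the dot would cut those names apart.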
Finally, for the plots you can use the following code (I have put the name of each dataset in the x-axis label so you can identify it in the plots):
#Plot and save
#Assign names
names(myperf_plsda) <- names(model_list)
pdf('example.pdf')
for (i in 1:length(myperf_plsda))
{
plot(myperf_plsda[[i]], col = color.mixo(5:7), sd = TRUE,
legend.position = "horizontal",xlab = paste0(names(myperf_plsda)[i],' (Comp)'))
}
dev.off()
As a remark, I reduced the number of folds (and repeats) so that the code works with the small dummy data; with your real data you can set the original values again.
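Stripped of the mixOmics details, the general pattern is: one list slot per input, then lapply to pull the same element out of every result. A minimal sketch in base R, where fake_perf is a hypothetical stand-in for perf() and the data are made up:

```r
# Hypothetical stand-in for perf(): returns a list with named elements
fake_perf <- function(df) list(error.rate = mean(df$y), n = nrow(df))

inputs <- list(df1 = data.frame(y = c(0, 1, 1)),
               df2 = data.frame(y = c(0, 0, 1, 1)))

# One slot per input, so nothing is overwritten inside the loop
results <- vector("list", length(inputs))
names(results) <- names(inputs)
for (i in seq_along(inputs)) {
  results[[i]] <- fake_perf(inputs[[i]])
}

# Pull the same element out of every result
rates <- lapply(results, `[[`, "error.rate")
```

Note that the loop runs over seq_along(inputs), i.e. over the list of results, and indexes with [[i]] on both sides.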
I have a tree called mytree that looks like this:
I have it stored as a list:
mytree <- list(list(structure(list(y = c(-10, 7, 8, -7), x = c(10, 20,
25, 35), grad = c(-10.5, 6.5, 7.5, -7.5), sim_score = c(4, 4,
4, 4), value = c(-1, -1, -1, -1)), row.names = c(NA, -4L), class = "data.frame")),
list(structure(list(y = -10, x = 10, grad = -10.5, sim_score = 110.25,
value = -10.5, gain = 120.333333333333, criterion = "x < 15"), row.names = 1L, class = "data.frame"),
structure(list(y = c(7, 8, -7), x = c(20, 25, 35), grad = c(6.5,
7.5, -7.5), sim_score = c(14.0833333333333, 14.0833333333333,
14.0833333333333), value = c(2.16666666666667, 2.16666666666667,
2.16666666666667), gain = c(120.333333333333, 120.333333333333,
120.333333333333), criterion = c("x >= 15", "x >= 15",
"x >= 15")), row.names = 2:4, class = "data.frame")),
list(NULL, NULL, structure(list(y = c(7, 8), x = c(20, 25
), grad = c(6.5, 7.5), sim_score = c(98, 98), value = c(7,
7), gain = c(140.166666666667, 140.166666666667), criterion = c("x < 30",
"x < 30")), row.names = 2:3, class = "data.frame"), structure(list(
y = -7, x = 35, grad = -7.5, sim_score = 56.25, value = -7.5,
gain = 140.166666666667, criterion = "x >= 30"), row.names = 4L, class = "data.frame")),
list(NULL, NULL, NULL, NULL, structure(list(y = 7, x = 20,
grad = 6.5, sim_score = 42.25, value = 6.5, gain = 0.5,
criterion = "x < 22.5"), row.names = 2L, class = "data.frame"),
structure(list(y = 8, x = 25, grad = 7.5, sim_score = 56.25,
value = 7.5, gain = 0.5, criterion = "x >= 22.5"), row.names = 3L, class = "data.frame"),
NULL, NULL))
and it looks like this:
[[1]]
[[1]][[1]]
y x grad sim_score value
1 -10 10 -10.5 4 -1
2 7 20 6.5 4 -1
3 8 25 7.5 4 -1
4 -7 35 -7.5 4 -1
[[2]]
[[2]][[1]]
y x grad sim_score value gain criterion
1 -10 10 -10.5 110.25 -10.5 120.3333 x < 15
[[2]][[2]]
y x grad sim_score value gain criterion
2 7 20 6.5 14.08333 2.166667 120.3333 x >= 15
3 8 25 7.5 14.08333 2.166667 120.3333 x >= 15
4 -7 35 -7.5 14.08333 2.166667 120.3333 x >= 15
[[3]]
[[3]][[1]]
NULL
[[3]][[2]]
NULL
[[3]][[3]]
y x grad sim_score value gain criterion
2 7 20 6.5 98 7 140.1667 x < 30
3 8 25 7.5 98 7 140.1667 x < 30
[[3]][[4]]
y x grad sim_score value gain criterion
4 -7 35 -7.5 56.25 -7.5 140.1667 x >= 30
[[4]]
[[4]][[1]]
NULL
[[4]][[2]]
NULL
[[4]][[3]]
NULL
[[4]][[4]]
NULL
[[4]][[5]]
y x grad sim_score value gain criterion
2 7 20 6.5 42.25 6.5 0.5 x < 22.5
[[4]][[6]]
y x grad sim_score value gain criterion
3 8 25 7.5 56.25 7.5 0.5 x >= 22.5
[[4]][[7]]
NULL
[[4]][[8]]
NULL
The first index of the list, i.e. 1, 2, 3, 4, corresponds to the level, or height, of the tree. The second index corresponds to the index of the node in the given level. For example, mytree[[1]][[1]] contains the root, which has child nodes in mytree[[2]][[1]] and mytree[[2]][[2]].
Given a parent node stored in mytree[[i]][[j]], its children are stored in mytree[[i + 1]][[2 * j]] and mytree[[i + 1]][[2 * j -1]].
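The indexing rule can be written as a small helper. This is only a sketch; child_indices is a hypothetical name, not something that exists in my code yet.

```r
# For a parent at level i, position j, return where its children live
child_indices <- function(i, j) {
  list(level = i + 1, nodes = c(2 * j - 1, 2 * j))
}

kids <- child_indices(2, 2)   # children of mytree[[2]][[2]]
```

Here kids$level is 3 and kids$nodes is c(3, 4), which matches the two non-NULL entries mytree[[3]][[3]] and mytree[[3]][[4]].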
I want to write a function called eval_tree that when given a new instance x, it will check which leaf node x falls into by checking the criterion of the splits and then output the value of the leaf, which is stored under value. Here is an example of how I'd like eval_tree to work:
newdata <- data.frame(x = c(10, 20, 25, 35))
> eval_tree(tree = mytree, newdata = newdata)
[1] -10.5
[2] 6.5
[3] 7.5
[4] -7.5
Here is what I have so far. Unfortunately it's not working...and I think I may need to implement the function recursively so that it's more efficient. Can anyone point me in the right direction?
eval_tree <- function(tree, newdata){
if(length(tree) == 1){
# If tree only has a root, return value of root
return(tree[[1]][[1]]$value[1])
}else if(length(tree) > 1){
for (level in 2:length(tree)){
for(ind in 1:length(tree[[level]]))
if(eval(parse(text = tree[[level]][[ind]][["criterion"]]))){
# Criterion is true, then go to child node
# Check if there is child node
if(is.null(tree[[level + 1]][[ind * 2]]) && is.null(tree[[level + 1]][[ind * 2 - 1]])){
return(tree[[level]][[ind]]$value[1])
}else if(eval(parse(text = tree[[level + 1]][[ind * 2]][["criterion"]]))){
# Criterion is true, then go to childi node
# I think this is where recursion would be more appropriate than all these nested loops
}
}
}
}
}
You can try something like this:
index <- function(x,tree,e, i = 1, j = 1)
{
if(nrow((tree[[i]][[j]])) == 1)
{
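# evaluate the node's split rule for this x (full column names instead of partial matching)
if(eval(parse(text=tree[[i]][[j]]$criterion), list(x = x))) {
if(is.null(e$a)){
e$a <- i
e$b <- tree[[i]][[j]]$value
}
else if(e$a > i)e$b <- tree[[i]][[j]]$value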
TRUE
}
else FALSE
}
else index(x, tree, e,i + 1,2*j-1) | index(x, tree,e,i+1, 2*j)
}
pred_tree <- function( tree,newdata){
cbind(newdata,pred = sapply(newdata$x,function(x){
e <- new.env()
index(x,tree,e)
e$b
}))
}
pred_tree(mytree,data.frame(x = c(10,20,25,30,25)))
x pred
1 10 -10.5
2 20 6.5
3 25 7.5
4 30 -7.5
5 25 7.5
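An iterative alternative is to walk from the root down, using the rule that the children of node (i, j) sit at positions 2 * j - 1 and 2 * j of level i + 1. The following is a minimal sketch under that assumption. toy_tree and eval_one are hypothetical names, and toy_tree is a made-up tree in the same layout as mytree, reduced to the two columns the walk needs.

```r
# Tiny hypothetical tree: level -> node index -> data frame (or NULL)
toy_tree <- list(
  list(data.frame(value = 0)),
  list(data.frame(value = -1, criterion = "x < 15"),
       data.frame(value = 2,  criterion = "x >= 15")),
  list(NULL, NULL,
       data.frame(value = 7,  criterion = "x < 30"),
       data.frame(value = -7, criterion = "x >= 30"))
)

# Follow a single observation x from the root to its leaf
eval_one <- function(tree, x) {
  i <- 1
  j <- 1
  while (i < length(tree)) {
    kids <- c(2 * j - 1, 2 * j)
    kids <- kids[kids <= length(tree[[i + 1]])]
    hit <- NA
    for (k in kids) {
      node <- tree[[i + 1]][[k]]
      # a NULL child does not exist; otherwise test its split rule
      if (!is.null(node) &&
          eval(parse(text = as.character(node$criterion[1])), list(x = x))) {
        hit <- k
        break
      }
    }
    if (is.na(hit)) break   # no matching child: the current node is the leaf
    i <- i + 1
    j <- hit
  }
  tree[[i]][[j]]$value[1]
}

preds <- sapply(c(10, 20, 35), eval_one, tree = toy_tree)
```

On the toy tree this gives -1, 7 and -7. Because it only visits one node per level, it avoids evaluating every branch; I have only traced it by hand on the toy tree, so treat it as a starting point rather than a tested replacement.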
I have a dataframe "data" with the following structure:
structure(list(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0), height = c(165,
178, 145, 132), weight = c(65, 73, 60, 45)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
And I would like to add to this data.frame two new variables (var1, var2), which should be calculated with the two following formulas:
var1 = age*height + (4 if sex==1 OR 2 if sex==0)
var2 = height*weight + (1 if age>40 or 2 if age=<40)
I am having trouble both with adding the two variables to the data frame and with applying a function (I tried to build a function, but it seems it can only be applied to a single value and not to the values from all rows).
Can anyone help me, please?
akrun's suggestion of using Boolean arithmetic is a good one, but you could also simply write a Boolean version of your own expression, substituting multiplication for the if statements (with mild editing of the "=<" to "<="):
data <- structure(list(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0), height = c(165, 178, 145, 132), weight = c(65, 73, 60, 45)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"))
data <- within(data, {var1 = age*height + 4*(sex==1) + 2 *(sex==0);
var2 = height*weight + (age>40) + 2 *(age <= 40)})
#----
> data
age sex height weight var2 var1
1 45 1 165 65 10726 7429
2 4 0 178 73 12996 714
3 32 1 145 60 8702 4644
4 45 0 132 45 5941 5942
Since the two sets of conditions are each disjoint, the "non-qualifying" choice terms will each be 0.
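To see why the non-qualifying term drops out, here is a minimal sketch of the arithmetic on the sex column alone (bonus is a hypothetical name). A logical is coerced to 0 or 1 when multiplied, so exactly one of the two products is non-zero in each row.

```r
sex <- c(1, 0, 1, 0)

# TRUE counts as 1 and FALSE as 0 in arithmetic
term_if_1 <- 4 * (sex == 1)   # 4 where sex is 1, otherwise 0
term_if_0 <- 2 * (sex == 0)   # 2 where sex is 0, otherwise 0
bonus <- term_if_1 + term_if_0
```

bonus is then 4, 2, 4, 2, which is the amount added to age*height in var1.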
The function ifelse() is vectorised, so it applies the condition to each element of the vector.
df <- structure(list(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0), height = c(165,
178, 145, 132), weight = c(65, 73, 60, 45)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
df$var1 <- ifelse(df$sex == 1,(df$age * df$height) + 4,(df$age * df$height) + 2)
df$var2 <- ifelse(df$age > 40,(df$weight * df$height) + 1,(df$weight * df$height) + 2)
final output
> df
# A tibble: 4 x 6
age sex height weight var1 var2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 45 1 165 65 7429 10726
2 4 0 178 73 714 12996
3 32 1 145 60 4644 8702
4 45 0 132 45 5942 5941
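Since the question mentions a function that only worked on a single value, here is a minimal sketch of the same idea wrapped in a function. add_vars is a hypothetical name; the point is that ifelse() works on whole columns, whereas a plain if () only looks at one value.

```r
# Adds var1 and var2 to any data frame with age, sex, height, weight columns
add_vars <- function(df) {
  df$var1 <- df$age * df$height + ifelse(df$sex == 1, 4, 2)
  df$var2 <- df$height * df$weight + ifelse(df$age > 40, 1, 2)
  df
}

people <- data.frame(age = c(45, 4, 32, 45), sex = c(1, 0, 1, 0),
                     height = c(165, 178, 145, 132),
                     weight = c(65, 73, 60, 45))
out <- add_vars(people)
```

out then has the two extra columns for all four rows at once.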
I prefer case_when() from the dplyr package.
Your original data is:
data <-
structure(
list(age = c(45, 4, 32, 45),
sex = c(1, 0, 1, 0),
height = c(165, 178, 145, 132),
weight = c(65, 73, 60, 45)),
row.names = c(NA, -4L),
class = c("tbl_df", "tbl", "data.frame"))
The new variables are created by:
library(dplyr)
data <-
data %>% mutate(var1 = case_when(sex==1 ~ age*height + 4,
sex==0 ~ age*height + 2),
var2 = case_when(age>40 ~ height*weight + 1,
age<=40 ~ height*weight + 2)
)
The outcome is:
# A tibble: 4 x 6
age sex height weight var1 var2
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 45 1 165 65 7429 10726
2 4 0 178 73 714 12996
3 32 1 145 60 4644 8702
4 45 0 132 45 5942 5941
We convert the logical/binary value to a numeric index by adding 1 to it. For var1 that index picks 2 or 4 out of a vector; for var2 the logical plus 1 directly gives 1 or 2. The result is then used in the calculation.
library(dplyr)
data %>%
mutate(var1 = (age * height) + c(2, 4)[sex + 1],
var2 = (height * weight) + (age <= 40)+1)
# A tibble: 4 x 6
# age sex height weight var1 var2
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 45 1 165 65 7429 10726
#2 4 0 178 73 714 12996
#3 32 1 145 60 4644 8702
#4 45 0 132 45 5942 5941
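Taken on their own, the two added terms look like this. A minimal sketch with the columns as plain vectors; add1 and add2 are hypothetical names.

```r
sex <- c(1, 0, 1, 0)
age <- c(45, 4, 32, 45)

# sex 0 -> position 1 (value 2), sex 1 -> position 2 (value 4)
add1 <- c(2, 4)[sex + 1]

# FALSE + 1 = 1 (age > 40), TRUE + 1 = 2 (age <= 40)
add2 <- (age <= 40) + 1
```

The lookup form only works because sex is coded exactly as 0 and 1; other codings would need a different index.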
The problem: AF is not accumulating. And there is something wrong with the way I address the matrix elements and with the comparison...
My data
VF <- matrix(c(40, 70, 80, 35,
90, 66, 15, 46,
50, 52, 60, 80,
30, 73, 30, 40,
80, 70, 76, 69), nrow = 5, byrow = TRUE)
VA <- matrix(c(40, 25, 67, 43,
5, 26, 80, 43,
45, 35, 30, 10,
63, 13, 60, 45,
10, 19, 11, 22), nrow = 5, byrow = TRUE)
UV <- c(1, 0, 0, 1)
I try to call my function
f <- function(VF, VA, UV) {
vote_for <- VF
vote_against <- VA
user_vote <- UV
am_law <- ncol(vote_for)
am_fr <- nrow(vote_for)
AF <- 0
AFP_vec <- c(0)
for (i in 1:am_fr) {
AF <- 0
for (j in 1:am_law) {
if (user_vote[j] == 1) {
AF <- AF + vote_for[i][j]
} else {
AF <- AF + vote_against[i][j]
}
}
AFP <- AF / am_law
append(AFP_vec, AFP)
}
return(AFP_vec)
}
The result of calling
f(VF, VA, UV)
is
[1] 40 [1] NA [1] NA [1] NA [1] 90 [1] NA [1] NA [1] NA [1] 50
[1] NA [1] NA [1] NA [1] 30 [1] NA [1] NA [1] NA [1] 80 [1] NA [1] NA [1] NA [1] 0
but I want only 5 values in my AFP_vec.
Please help me. I'm a beginner and I can't understand what's wrong.
Still not entirely sure what you're after, but I think this might be along the right lines. I am happy to explain the code if required.
VF <- matrix(
c(40 , 70 , 80 ,35,
90 , 66 , 15 , 46,
50 , 52 , 60 , 80,
30 , 73 , 30 , 40,
80 , 70 , 76 , 69) , nrow = 5 , byrow = T
)
VA <- matrix(
c(40, 25, 67 , 43,
5, 26 , 80 , 43,
45 , 35 , 30 , 10 ,
63 , 13, 60 , 45,
10 , 19 , 11, 22 ), nrow = 5 , byrow = T
)
UV <- c( 1, 0, 0, 1)
mat <- (t(VF) * UV ) + (t(VA) * ! UV )
apply( mat , 2 , mean)
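A short explanation of why this works, as a sketch: t(VF) has one row per law, so multiplying by UV (length 4) recycles UV down each column and keeps the "for" votes only where the user voted 1; !UV does the same for the "against" votes. The names picked, by_recycling, chosen and by_columns below are hypothetical and only used to compare two equivalent ways of writing it.

```r
VF <- matrix(c(40, 70, 80, 35,  90, 66, 15, 46,  50, 52, 60, 80,
               30, 73, 30, 40,  80, 70, 76, 69), nrow = 5, byrow = TRUE)
VA <- matrix(c(40, 25, 67, 43,   5, 26, 80, 43,  45, 35, 30, 10,
               63, 13, 60, 45,  10, 19, 11, 22), nrow = 5, byrow = TRUE)
UV <- c(1, 0, 0, 1)

# Recycling version: rows of t(VF) are laws, so UV lines up with them
picked <- (t(VF) * UV) + (t(VA) * (!UV))
by_recycling <- colMeans(picked)

# Column-replacement version: start from VF, swap in VA where the user voted 0
chosen <- VF
chosen[, UV == 0] <- VA[, UV == 0]
by_columns <- rowMeans(chosen)
```

Both give one value per row of VF (41.75, 60.5, 48.75, 35.75, 44.75), which is the five-element result the question asks for.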
I have edited your code to correct the errors in it. This should work now, but the vectorised solution already mentioned is much better.
fun<-function(VF,VA,UV)
{
vote_for<-VF
vote_against<-VA
user_vote<-UV
am_law<-ncol(vote_for)
am_fr<-nrow(vote_for)
AF<-0
AFP_vec<-list()
for (i in 1:am_fr)
{
AF<-0
for (j in 1:am_law)
{
if (user_vote[j] == 1){ AF=AF+vote_for[i,j]}
else { AF=AF+vote_against[i,j] }
}
AFP<-AF / am_law
AFP_vec[i]=AFP
}
return(AFP_vec) }
res=fun(VF,VA,UV)
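The two things that went wrong in the original function can be reproduced in isolation. A minimal sketch on a made-up 2 x 3 matrix (m, first, second, cell and v are hypothetical names):

```r
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)

first  <- m[2][1]   # m[2] is the single value 2; [1] picks it again
second <- m[2][2]   # a length-1 vector has no 2nd element, so this is NA
cell   <- m[2, 2]   # row 2, column 2, which is 4

v <- c(0)
append(v, 5)        # the result is returned but not stored; v is unchanged
v <- append(v, 5)   # assign the result back to keep it
```

So vote_for[i][j] returns NA for every j greater than 1, and append() without assignment never changes AFP_vec; the corrected code uses [i, j] and assigns into AFP_vec[i].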