Suppose I have a data set like this:
y <- c(0,0,1,2,2,1,0,1)
a <- c(0,2,1,1,0,2,0,3)
dat <- data.frame(y, a)
I want to calculate f <- digamma(a+y) subject to the following conditions:
if(a>0 & y==0) then f
if(a==0 & y>0) then f
if (a==0 & y==0) then f = 1
How can I do it using R code?
I think this will give you what you're after post hoc:
f <- digamma(a+y)
f[a==0 & y==0] <- 1
or
f[is.nan(f)] <- 1
Both versions use logical indexing. The first selects the elements where both a and y are 0; the second selects the elements of f that are NaN (digamma(0) is NaN). The index [] picks those elements of f, and the assignment <- sets them to 1.
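As a variation on the indexing idea above, here is a minimal sketch that only calls digamma where it is defined, so no NaN (and no warning) is produced in the first place. The helper name both_zero is an assumption introduced for illustration.

```r
y <- c(0, 0, 1, 2, 2, 1, 0, 1)
a <- c(0, 2, 1, 1, 0, 2, 0, 3)

# both_zero is a hypothetical helper name: TRUE where a == 0 and y == 0
both_zero <- a == 0 & y == 0

# Start with the fallback value 1 everywhere, then overwrite the valid cases
f <- rep(1, length(a))
f[!both_zero] <- digamma(a[!both_zero] + y[!both_zero])
f
```

This gives the same f as the post hoc replacement, just without evaluating digamma(0).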
The following is my R code. I am trying to check whether the true value a = 10 is included in a data set. If it is included, I need to compute the length of that data set; otherwise I want to assign a length of 0.
Assume the value I am checking is 10:
k1 = c(1,2,3,5,6)
k2 = c(10,12,13,15,16,18)
For example, for the k1 set I want to get the length 0, whereas for k2 the length must be 6.
I am trying to use the following code to do this:
library(tidyverse)
map_lgl(k, `%in%`, x = 10) %>% length
Why is it not working for the k1 data set?
You can do this with a simple ifelse call; nothing else is required.
a <- 10
ifelse(a %in% k1, length(k1), 0)
[1] 0
You could wrap this in a function and feed the different sets in:
my_func <- function(x){
  ifelse(a %in% x, length(x), 0)
}
my_func(k2)
[1] 6
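Because the test here is a single TRUE/FALSE value, a plain if/else does the same job as ifelse. The sketch below is one way to write it; the function name len_if_present and its target argument are assumptions for illustration, not part of the original answer.

```r
k1 <- c(1, 2, 3, 5, 6)
k2 <- c(10, 12, 13, 15, 16, 18)

# len_if_present is a hypothetical name; target defaults to the value 10
len_if_present <- function(x, target = 10) {
  if (target %in% x) length(x) else 0L
}

len_k1 <- len_if_present(k1)  # 10 is not in k1
len_k2 <- len_if_present(k2)  # 10 is in k2
```

Passing the target as an argument avoids relying on a global variable a.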
If you have more k(i) vectors (100, for example) and you need to iterate over all of them, you can use a loop and store the results in a summary table.
I have never used map_lgl, but we can do it in plain base R, like this:
k1 <- c(1,2,3,5,6)
k2 <- c(10,12,13,15,16,18)
results <- data.frame()
for(i in 1:2){
  analysis <- get(paste("k",i,sep=""))
  if(10 %in% analysis){
    results[nrow(results)+1, 1] <- paste("k",i,sep="")
    results[nrow(results), 2] <- length(analysis)
  } else{
    results[nrow(results)+1, 1] <- paste("k",i,sep="")
    results[nrow(results), 2] <- 0
  }
}
Then we get a table with one row per vector: k1 with length 0 and k2 with length 6.
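If the vectors can be kept in a named list instead of separate k1, k2, ... variables, the lookup by pasted name is no longer needed. This is only a sketch under that assumption; the list name ks and the column names set and len are made up for illustration.

```r
# ks is a hypothetical named list holding the vectors to check
ks <- list(
  k1 = c(1, 2, 3, 5, 6),
  k2 = c(10, 12, 13, 15, 16, 18)
)

# One integer per vector: its length if 10 is present, otherwise 0
lens <- vapply(ks, function(x) if (10 %in% x) length(x) else 0L, integer(1))

results <- data.frame(set = names(ks), len = lens,
                      row.names = NULL, stringsAsFactors = FALSE)
results
```

Building the table in one step also avoids growing a data frame row by row inside the loop.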
I would like to evaluate different indices in a for loop.
These indices have different formulas, and they do not always need to be evaluated.
For instance, my inputs and the indices to evaluate might be:
a=1
b=2
c=5
d=8
IDX1=function(a,b) {result=a+b}
IDX2=function(c,b) {result=c+b}
IDX3=function(d,b) {result=d+b-c}
IDX4=function(a,d) {result=a+d+b+c}
The formulas don't really matter.
In a data frame I have the iteration number and the indices I need to evaluate at each loop (let's say that I have to evaluate 2 indices for each iteration):
head=c("iter","IndexA","IndexB")
r1=c(1,"IDX1","IDX2")
r2=c(2,"IDX3","IDX4")
r3=c(3,"IDX1","IDX4")
df=as.data.frame(rbind(head,r1,r2,r3))
What I would like to do is, within the loop, evaluate the respective 2 indices for each iteration, automatically calling the right formula and feeding it the right arguments:
iter 1 : IndexA=IDX1(args)=3 ; IndexB=IDX2(args)=7
iter 2 : IndexA=IDX3(args)=5 ; IndexB=IDX4(args)=16
iter 3 : IndexA=IDX1(args)=3 ; IndexB=IDX4(args)=16
Please do not answer with "just run all the functions and recall the needed result in the loop".
I'm working with big matrices and memory is indeed a problem. I need to evaluate the function within the loop to reduce memory usage.
I believe the answer is somewhere inside this discussion, but I can't get through it:
How to create an R function programmatically?
Can somebody explain to me
1. how to build a function that can be programmatically changed in a loop?
2. once I have it, how can I run the formula and get the result I want?
Thanks.
You can use a combination of the eval and parse functions to evaluate any string as code. First, you have to construct such a string. For this, you can specify your indices as character strings, for example IDX1 = "a + b". Then you can get that string by name with get("IDX1").
Try this code:
# Your preparations
a <- 1
b <- 2
c <- 5
d <- 8
IDX1 <- "a + b"
IDX2 <- "c + b"
IDX3 <- "d + b - c"
IDX4 <- "a + d + b + c"
head = c("iter", "IndexA", "IndexB")
r1 = c(1, "IDX1", "IDX2")
r2 = c(2, "IDX3", "IDX4")
r3 = c(3, "IDX1", "IDX4")
df = as.data.frame(rbind(r1, r2, r3))
colnames(df) <- head
# Loop over df with apply
result <- apply(df, 1, function(x){
  # Construct call string for further evaluation
  IndexA_call <- paste("ia <- ", get(x[2]), sep = "")
  IndexB_call <- paste("ib <- ", get(x[3]), sep = "")
  # eval each call string
  eval(parse(text = IndexA_call))
  eval(parse(text = IndexB_call))
  x[2:3] <- c(ia, ib)
  return(as.numeric(x))
})
result <- t(result)
colnames(result) <- head
print(result)
This gives:
iter IndexA IndexB
r1 1 3 7
r2 2 5 16
r3 3 3 16
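An alternative that avoids eval(parse()) is to keep the index functions in a named list and call the one you need with do.call, passing only the arguments that function declares. This is a sketch, not the answer's method: the names vals, idx, plan and call_index are assumptions, and IDX3 and IDX4 are written here with every variable they use as an explicit argument.

```r
# Inputs, stored in a named list so they can be looked up by argument name
vals <- list(a = 1, b = 2, c = 5, d = 8)

# Index functions, stored by name (hypothetical container idx)
idx <- list(
  IDX1 = function(a, b) a + b,
  IDX2 = function(c, b) c + b,
  IDX3 = function(d, b, c) d + b - c,
  IDX4 = function(a, b, c, d) a + d + b + c
)

# Call one index by name, feeding it only the arguments it declares
call_index <- function(name) {
  f <- idx[[name]]
  do.call(f, vals[names(formals(f))])
}

plan <- data.frame(iter = 1:3,
                   IndexA = c("IDX1", "IDX3", "IDX1"),
                   IndexB = c("IDX2", "IDX4", "IDX4"),
                   stringsAsFactors = FALSE)

# Only the two indices named in each row are evaluated
plan$A <- vapply(plan$IndexA, call_index, numeric(1), USE.NAMES = FALSE)
plan$B <- vapply(plan$IndexB, call_index, numeric(1), USE.NAMES = FALSE)
plan
```

Each function is evaluated only when its name appears in the current row, which matches the memory constraint described in the question.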
I am trying to build a data frame (df2) based on the following relationship: df1[i,j] = df2[i,j]^2. To do this, I need to solve a system of non-linear equations:
library(nleqslv)
df1 = data.frame(a = c(9,9), b = c(9,9))
df2 = df1
for(i in colnames(df1)){
f = function(x) {df1[i] - x^2}
xstart = c(df2[i])
df2[i] = nleqslv(xstart, f)[[1]]
}
The expected result is:
a b
1 3 3
2 3 3
But I get the following error message:
Error in nleqslv(xstart, f) :
Argument 'x' cannot be converted to numeric!
I'm not sure what causes the problem. Could you give me some advice, please?
Well, I don't know what you are trying to accomplish, but I think the function you defined has to be fixed. The following runs without the error, although it solves x - x^2 = 0 rather than your equation, so its answer is not correct.
f <- function(x) x - x^2
df1 = data.frame(a = c(9,9), b = c(9,9))
sapply(df1, function(y) nleqslv(y, f)[[1]])
You should instead use sqrt(), since it is vectorized:
sqrt(df1)
# a b
# 1 3 3
# 2 3 3
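If the point is to see a numerical solver recover the same value, base R's uniroot can solve target - x^2 = 0 for a single cell. This is a sketch that swaps in uniroot for nleqslv; the variable names target, sol and root are assumptions, and the search interval c(0, target) only brackets the root when target is at least 1.

```r
target <- 9

# Solve target - x^2 = 0 for x on [0, target]; assumes target >= 1
sol <- uniroot(function(x) target - x^2,
               interval = c(0, target), tol = 1e-9)
root <- sol$root
root
```

The result agrees with sqrt(9) up to the solver tolerance.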
I'm unclear as to why you need such a complex solution for such a simple operation (df2 <- sqrt(df1) would produce your example solution). But if you want to know what's producing that error, it comes down to how R indexes lists.
df1[1] returns a one-column data frame (which is a list), whereas df1[[1]] (double brackets) returns the column itself as a vector. The nleqslv function expects numeric vectors. So all we have to do is modify your existing code to use double brackets instead of single ones:
library(nleqslv)
df1 = data.frame(a = c(9,9), b = c(9,9))
df2 = df1
for(i in colnames(df1)){
  f = function(x) {df1[[i]] - x^2}
  xstart = c(df2[[i]])
  df2[i] = nleqslv(xstart, f)[[1]]
}
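The difference between single and double brackets can be checked directly, without nleqslv. A small sketch (the names single and double_ are assumptions for illustration):

```r
df1 <- data.frame(a = c(9, 9), b = c(9, 9))

single  <- df1["a"]    # still a data frame with one column
double_ <- df1[["a"]]  # the underlying numeric vector

class(single)
class(double_)
```

Anything that needs a plain numeric vector, such as a solver's starting value, should get the double-bracket form.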
First, create the data:
df2 <- data.frame(a=c(9,9), b=c(9,9))
df1 <- df2
Now, to solve it iteratively, here's the R code:
for(i in 1:nrow(df1)){
  for(j in 1:ncol(df1)){
    df2[i, j] <- sqrt(df1[i,j])
  }
}
df2
This will return:
  a b
1 3 3
2 3 3
You could also have used a vectorized solution (df2 <- sqrt(df1)) to achieve the same result, but the nested loop above will work if you need to solve it cell by cell with a traditional loop.
I'm writing a function in R and I want to be able to access different objects created inside the function. I've got a simple example of the problem I'm talking about (not the real code, obviously).
example <- function(a,b){
c <- a+b
d <- a*b
e <- a/b
e
}
a <- 10
b <- 20
output <- example(a,b)
str(output)
output$c
My goal is for the last line to show the value of c defined in the function. In this code the only thing saved in output is the returned value, e.
I've tried changing the local and global environments, using <<- etc. That doesn't solve the problem though. Any help would be appreciated.
We can return multiple outputs in a list and then extract the list element we need:
example <- function(a,b){
  c <- a+b
  d <- a*b
  e <- a/b
  list(c=c, d= d, e = e)
}
a <- 10
b <- 20
output <- example(a,b)[['c']]
output
#[1] 30
example(a,b)[['d']]
#[1] 200
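With a list return value, the output$c syntax from the question also works if the whole result is stored once. A minimal sketch along the same lines:

```r
example <- function(a, b) {
  # Return every intermediate value as a named list element
  list(c = a + b, d = a * b, e = a / b)
}

output <- example(10, 20)
output$c  # sum
output$d  # product
output$e  # ratio
```

Storing the list once avoids calling example() again for each element.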
I am trying to split my data set using two parameters, the fraction of missing values and "maf", and store the sub-data sets in a list. Here is what I have done (it's not working). Any help will be appreciated,
Thanks.
library(BLR)
library(missForest)
data(wheat)
X2<- prodNA(X, 0.4) ### creating missing values
dim(X2)
fd<-t(X2)
MAF<-function(geno){ ## markers are in the rows
  geno[(geno!=0) & (geno!=1) & (geno!=-1)] <- NA
  geno <- as.matrix(geno)
  ## calc_Freq for alleles
  n0 <- apply(geno==0,1,sum,na.rm=TRUE)
  n1 <- apply(geno==1,1,sum,na.rm=TRUE)
  n2 <- apply(geno==-1,1,sum,na.rm=TRUE)
  n <- n0 + n1 + n2
  ## calculate allele frequencies
  p <- ((2*n0)+n1)/(2*n)
  q <- 1 - p
  maf <- pmin(p, q)
  maf
}
frac.missing <- apply(fd,1,function(z){length(which(is.na(z)))/length(z)})
maf<-MAF(fd)
lst<-matrix()
for (i in seq(0.2,0.7,by =0.2)){
for (j in seq(0,0.2,by =0.005)){
lst=fd[(maf>j)|(frac.missing < i),]
}}
It sounds like you want the results that the split function provides.
If you have a vector "frac.missing", and "maf" is defined on the basis of the values in "fd" (with the same length as the number of rows in "fd"), then this would provide the split you are looking for:
spl.fd <- split(fd, list(maf, frac.missing) )
If you want to group the fd values based on maf and frac.missing within the bands specified by your for-loop, then the same split construct may do what your current code is failing to accomplish:
# maf is the vector computed earlier with maf <- MAF(fd)
lst <- split(fd,
             list(cut(maf, breaks = seq(0, 0.2, by = 0.005),
                      include.lowest = TRUE),
                  cut(frac.missing, breaks = seq(0.2, 0.7, by = 0.2),
                      right = TRUE, include.lowest = TRUE)))
The right argument controls which end of each interval is closed. With right = TRUE (the default) the intervals are closed on the right, (a, b]; set right = FALSE to get [a, b), which matches a strict "<" comparison against the upper break as in your loop. Another function that provides a similar facility is by.
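The effect of right is easiest to see on a value that sits exactly on a break. A small sketch with made-up numbers (x and breaks are illustrative, not taken from the wheat data):

```r
x <- c(0.5, 1, 1.5)
breaks <- c(0, 1, 2)

# right = TRUE (default): intervals (0,1] and (1,2], so x = 1 is in the first
right_closed <- cut(x, breaks, right = TRUE)

# right = FALSE: intervals [0,1) and [1,2), so x = 1 is in the second
left_closed <- cut(x, breaks, right = FALSE)

as.integer(right_closed)
as.integer(left_closed)
```

Only the boundary value changes bins; the other two stay where they were.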
The code below gives me exactly what I need:
Y<-t(GBS.binary)
nn<-colnames(Y)
fd<-Y
maf<-as.matrix(MAF(Y))
dff<-cbind(frac.missing,maf,Y)
colnames(dff)<-c("fm","maf",nn)
dff<-as.data.frame(dff)
for (i in seq(0.1,0.6,by=0.1)) {
  for (j in seq(0,0.2,by=0.005)){
    assign(paste("fm_",i,"maf_",j,sep=""),
           (subset(dff, maf>j & fm <i))[,-c(1,2)])
  }
}
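Since the original goal was to store the sub-data sets in a list, the same subsetting can fill a named list instead of creating one variable per threshold pair with assign. The sketch below uses a tiny made-up dff (three rows, one marker column m1) and a reduced threshold grid purely for illustration.

```r
# Toy stand-in for dff: fm = fraction missing, maf = minor allele frequency
dff <- data.frame(fm  = c(0.05, 0.15, 0.35),
                  maf = c(0.01, 0.10, 0.30),
                  m1  = c(1, 0, -1))

# All (i, j) threshold pairs; the real code would use the full seq() grids
grid <- expand.grid(i = c(0.2, 0.4), j = c(0, 0.05))

# One subset per pair, dropping the fm and maf columns as in the loop above
subsets <- Map(function(i, j) {
  dff[dff$maf > j & dff$fm < i, -(1:2), drop = FALSE]
}, grid$i, grid$j)
names(subsets) <- paste0("fm_", grid$i, "maf_", grid$j)

names(subsets)
```

Each element can then be retrieved by name, for example subsets[["fm_0.2maf_0"]].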