I am trying to set up a function that checks the data and then runs the appropriate function.
I have tried to move tbl1 and tbl2 into TBL.fun, but it won't run.
TBL.fun <- function (x,y){
if(length(y)==1) tbl1(x[,y])
else if(length(y)==2) tbl2(x[,y[1]],x[,y[2]])
else print("Only two columns of data, kiddo!")
}
tbl1 <- function(x){
tbl <- ftable(x)
ptbl<- round(prop.table(tbl)*100,2)
out <- tbl
out[] <- paste(tbl,"(",ptbl,"%)")
return(out)
}
tbl2 <- function(x,y){
tbl <- ftable(x,y)
ptbl<- round(prop.table(tbl)*100,2)
out <- tbl
out[] <- paste(tbl,"(",ptbl,"%)")
return(out)
}
I want TBL.fun to check the data and, based on that check, compute and print the correct table. I then combined the functions into a single function:
TBL.fun1 <- function (x,y=NULL){
if(is.vector(x)==T && is.null(y)==T) tbl1(x)
else tbl2(x,y)
tbl1 <- function(x){
tbl <- ftable(x)
ptbl<- round(prop.table(tbl)*100,2)
out <- tbl
out[] <- paste(tbl,"(",ptbl,"%)")
return(out)
}
tbl2 <- function(x,y){
tbl <- ftable(x,y)
ptbl<- round(prop.table(tbl)*100,2)
out <- tbl
out[] <- paste(tbl,"(",ptbl,"%)")
return(out)
}
}
After combining the functions, I ran dput() on the result of calling the function with a single variable.
Gender <- c("F","F","F","M","M","M")
Race <- c("Black","White","Asian","White","Black","Black")
sam_dat <- cbind(Gender,Race)
dput(TBL.fun1(sam_dat[,1]))
function (x, y)
{
tbl <- ftable(x, y)
ptbl <- round(prop.table(tbl) * 100, 2)
out <- tbl
out[] <- paste(tbl, "(", ptbl, "%)")
return(out)
}
> TBL.fun1(sam_dat[,1])
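You don't have to include all the functions in TBL.fun1; you just call them, depending on the condition. In your combined version the call happens before tbl1 and tbl2 are defined inside the function, and the last expression evaluated is the definition of tbl2, so that function object is what gets returned (which is what dput() shows).
You can also simplify the condition: is.vector and is.null already return logical values, so you don't have to test for == TRUE.
I inserted two print statements so you can see that both functions are called.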
TBL.fun1 <- function (x, y = NULL){
if (is.vector(x) && is.null(y)) {
print("used tbl1")
tbl1(x)
} else {
print("used tbl2")
tbl2(x, y)
}
}
Gender <- c("F","F","F","M","M","M")
Race <- c("Black","White","Asian","White","Black","Black")
sam_dat <- cbind(Gender,Race)
a = TBL.fun1(sam_dat[,1])
b = TBL.fun1(sam_dat[,2], sam_dat[,1])
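As a side note on why dput() printed a function in the question: an R function returns the value of the last expression it evaluates, and in the combined version that last expression was the assignment defining tbl2. Here is a minimal sketch of the same effect, using made-up names (f_bad, f_good, helper) that are not part of the original code:

```r
# f_bad mirrors the combined version: the if/else result is computed and then
# discarded, because the last expression is an assignment of a function.
f_bad <- function(x) {
  if (x > 0) "positive" else "non-positive"
  helper <- function(z) z + 1   # last expression: its value is returned
}

# f_good defines the helper first and ends with the if/else,
# so the if/else value is what the caller gets.
f_good <- function(x) {
  helper <- function(z) z + 1
  if (x > 0) "positive" else "non-positive"
}

bad_result  <- f_bad(1)    # a function object, returned invisibly
good_result <- f_good(1)   # the string "positive"

is.function(bad_result)
good_result
```

Ending the function with the if/else, or wrapping it in return(), avoids the problem.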
Can anyone tell me what's preventing this loop from running?
For each row i in column 3 of the data frame 'depth.df', the loop performs a calculation using a second data frame, 'linker.df': it multiplies the value in row i by a constant divided by a value from linker.df, which is found by matching the value in row i.
If I run the loop for a single value (let's say 50), it runs fine:
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] == 50]))
return(result)
}
}
>97,331
but if I run it to loop over each instance of i, it always returns an error:
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
return(result)
}
}
Error in result[i] <- depth.df[i, 3] * (all_SC_bins/(depth.ea.bin.all[, :
replacement has length zero
EDIT
Here is a reproducible data set provided to illustrate data structure and issue
#make some data as an example
linker.data <- sample(x=40:50, replace = FALSE)
linker.df <- data.frame(
X = linker.data
, Y = sample(x=2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
X = sample(x=9000:9999, size = 300, replace = TRUE)
, Y = sample(x=c("A","G","T","C"), size = 300, replace = TRUE)
, Z = sample(linker.data, size = 300, replace = TRUE)
)
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
return(result)
}
}
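The error arises because the denominator evaluates to integer(0) or numeric(0) on most rows: the logical test is FALSE, so the subset is empty. Your loop only compares row i of one data frame with the same row i of the other, i.e. it looks for a match between X and Z at the exact same row number. Most likely you intended to match against any row, which requires a second, nested loop with an if conditional on matches. Note also that return(result) must come after the loop, not inside it, or the function exits on the first iteration.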
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
for (j in 1:nrow(linker.df)){
if (linker.df[j,1] == depth.df[i,3]) {
result[i] <- depth.df[i,3]*(x /( linker.df[j,2]))
}
}
}
return(result)
}
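For reference, the original error can be reproduced in isolation. The sketch below uses stand-in values (denominators, no_match) and not the real data; it only shows that subsetting with FALSE yields a zero-length vector and that assigning it into a single element fails:

```r
denominators <- c(2500, 2600, 2700)   # stand-in for linker.df[, 2]

# Subsetting one value with a FALSE test gives a zero-length vector,
# which is what the original expression produced on non-matching rows.
no_match <- denominators[1][FALSE]
length(no_match)

result <- numeric(3)
msg <- tryCatch({
  result[1] <- 45 * (8971 / no_match)   # right-hand side has length zero
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

The message captured in msg is the same "replacement has length zero" reported in the question.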
Alternatively, consider merge(), a more efficient, vectorized approach that matches rows between the two data frames on their ids. The setNames() calls below rename columns to avoid duplicate headers:
mdf <- merge(setNames(linker.df, paste0(names(linker.df), "_l")),
setNames(depth.df, paste0(names(depth.df), "_d")),
by.x="X_l", by.y="Z_d")
mdf$result <- mdf$X_l * (8971 / mdf$Y_l)
For comparison, the two approaches give equivalent results:
depth.df$result <- cor.depth(depth.df)
depth.df <- with(depth.df, depth.df[order(Z),]) # ORDER BY Z
mdf <- with(mdf, mdf[order(X_l),]) # ORDER BY X_L
all.equal(depth.df$result, mdf$result)
# [1] TRUE
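Another option that keeps depth.df in its original row order is match(). This is only a sketch, and it assumes every value in the first column of linker.df is unique (as in the example data) and that every Z value has a match. The set.seed() call is added here purely for reproducibility:

```r
set.seed(1)  # assumption: added only so the example is reproducible

linker.data <- sample(x = 40:50, replace = FALSE)
linker.df <- data.frame(
  X = linker.data,
  Y = sample(x = 2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
  X = sample(x = 9000:9999, size = 300, replace = TRUE),
  Y = sample(x = c("A", "G", "T", "C"), size = 300, replace = TRUE),
  Z = sample(linker.data, size = 300, replace = TRUE)
)

# For each Z in depth.df, find the row of linker.df whose X equals it
idx <- match(depth.df$Z, linker.df$X)

# Vectorized version of: Z * (constant / matching Y)
result <- depth.df$Z * (8971 / linker.df$Y[idx])
head(result)
```

If some Z values had no match, idx would contain NA and the corresponding results would be NA, with no error raised.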
I am trying to fill a vector pred_pos with the result pred on each iteration of the for loop. However, my pred_pos vector is never filled. The my_vec object is a list of large character vectors which I don't believe needs to be reproduced for this problem as it is most likely a fundamental indexing error. I just need to know how to populate a vector from this for loop. I can't seem to work out a solution.
pred_pos <- vector("numeric" , 2)
for(i in my_vec) {
for(r in pred_pos) {
inserts <- sapply(i, function(n) { n <- cond_probs_neg[n] } )
pred <- sum(unlist(inserts) , na.rm = T) * apriori_neg
pred_pos[r] <- pred
}
}
Assuming that the rest of your code works, there is no need to explicitly state:
pred_pos <- vector("numeric" , 2)
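That creates a numeric vector of length two (both elements 0). Because the inner loop iterates over those values, r is always 0, and assigning to pred_pos[0] changes nothing. You ought to be able to write: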
pred_pos <- vector()
Now when you wish to append to the vector you can simply use:
vector[length(vector)+1] <- someData
I believe your code should work if it is adjusted:
pred_pos <- vector()
for(i in my_vec) {
inserts <- sapply(i, function(n) { n <- cond_probs_neg[n] } )
pred <- sum(unlist(inserts) , na.rm = T) * apriori_neg
pred_pos[length(pred_pos)+1] <- pred
}
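If the number of results is known in advance, preallocating and indexing with seq_along() is a common alternative to growing the vector. The data below (cond_probs_neg, apriori_neg, my_vec) are invented stand-ins, since the real objects were not shown in the question:

```r
# Hypothetical stand-ins for the objects in the question
cond_probs_neg <- c(bad = 0.4, poor = 0.3, dull = 0.2)
apriori_neg <- 0.5
my_vec <- list(c("bad", "dull"), c("poor", "unknown"))

# Preallocate one slot per element of my_vec, then fill by position
pred_pos <- numeric(length(my_vec))
for (k in seq_along(my_vec)) {
  probs <- cond_probs_neg[my_vec[[k]]]            # NA for unknown words
  pred_pos[k] <- sum(probs, na.rm = TRUE) * apriori_neg
}
pred_pos

# Same computation without an explicit loop
pred_pos2 <- vapply(
  my_vec,
  function(words) sum(cond_probs_neg[words], na.rm = TRUE) * apriori_neg,
  numeric(1)
)
```

Both versions iterate over positions or elements of my_vec, never over the values of the result vector.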
I am trying to nest functions within a function that takes in a data frame and returns a list in R, but I run into a problem right away with the error:
Error in ------frqTbl <- function(df) { : object 'frqTbl' not found
Is there some way to define a variable that's a function before the function definition? Or is the nesting incorrect?
Tested with:
data(diamonds, package = "ggplot2")
test <- diamonds[1:100,]
mstrFnct(test)
mstrFnct <- function(df){
output <- list()
frqTbl <- function(df){
fctvr <- df[sapply(df,is.factor)]
logicvr <- df[sapply(df,is.logical)]
nwDf <- data.frame(fctvr,logicvr)
if(ncol(nwDf)>0){
freq <-list()
for (i in 1:ncol(nwDf)){
freq[[i]] <- as.data.frame(table((nwDf)[,i]))
names(freq[[i]])[1]=colnames(nwDf[i])
}
return(freq)
}
else{
print("There are no categorical or logical variables in the data
frame.")
}
}
output[[length(output)+1]] <- frqTbl(df)
rSqd <- function(df){
y <- df[sapply(df,is.numeric)]
if(ncol(y)>=2){
c <- combn(colnames(y), 2)
vrPrs <- paste(c[1,], c[2,], sep = "-")
m <- cor(y, method = "pearson")
r <- m[which(lower.tri(m))]
vlus <- r^2
df2 <- data.frame(vrPrs, values)
names(df2) <- sub("^VrPrs$", "Variable Pairs",
names(df2))
names(df2) <- sub("^vlus$", "R-Square", names(df2))
format.data.frame(df2)
return(df2)
}
else{
print(paste("This Data Frame does not have two or more numerical
columns to compute the Pearson correlation coefficient(s)."))
}
}
output[[length(output)+1]] <- rSqd(df)
}
Is there some way to define a variable that's a function before the
function definition?
No. (see first code chunk)
Or is the nesting incorrect?
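No, the nesting is fine. The variable names are inconsistent: values and "^VrPrs$" in rSqd should be vlus and "^vrPrs$", and the master function needs a return(output) at the end. (see second code chunk)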
I suggest the following code to cover your example:
frqTbl <- function(df){
fctvr <- df[sapply(df,is.factor)]
logicvr <- df[sapply(df,is.logical)]
nwDf <- data.frame(fctvr,logicvr)
if(ncol(nwDf)>0){
freq <-list()
for (i in 1:ncol(nwDf)){
freq[[i]] <- as.data.frame(table((nwDf)[,i]))
names(freq[[i]])[1]=colnames(nwDf[i])
}
return(freq)
}
else{
print("There are no categorical or logical variables in the data
frame.")
}
}
rSqd <- function(df){
y <- df[sapply(df,is.numeric)]
if(ncol(y)>=2){
c <- combn(colnames(y), 2)
vrPrs <- paste(c[1,], c[2,], sep = "-")
m <- cor(y, method = "pearson")
r <- m[which(lower.tri(m))]
vlus <- r^2
df2 <- data.frame(vrPrs, vlus)
names(df2) <- sub("^vrPrs$", "Variable Pairs",
names(df2))
names(df2) <- sub("^vlus$", "R-Square", names(df2))
format.data.frame(df2)
return(df2)
}
else{
print(paste("This Data Frame does not have two or more numerical
columns to compute the Pearson correlation coefficient(s)."))
}
}
mstrFnct <- function(df){
output <- list()
output[[length(output)+1]] <- frqTbl(df)
output[[length(output)+1]] <- rSqd(df)
return(output)
}
data(diamonds, package = "ggplot2")
test <- diamonds[1:100,]
mstrFnct(test)
But you could also pack the function definitions into the master function. Like this:
mstrFnct <- function(df){
# create output list
output <- list()
# define function frqTbl()
frqTbl <- function(df){
fctvr <- df[sapply(df,is.factor)]
logicvr <- df[sapply(df,is.logical)]
nwDf <- data.frame(fctvr,logicvr)
if(ncol(nwDf)>0){
freq <-list()
for (i in 1:ncol(nwDf)){
freq[[i]] <- as.data.frame(table((nwDf)[,i]))
names(freq[[i]])[1]=colnames(nwDf[i])
}
return(freq)
}
else{
print("There are no categorical or logical variables in the data
frame.")
}
}
# call function frqTbl() and store result in list
output[[length(output)+1]] <- frqTbl(df)
# define function rSqd()
rSqd <- function(df){
y <- df[sapply(df,is.numeric)]
if(ncol(y)>=2){
c <- combn(colnames(y), 2)
vrPrs <- paste(c[1,], c[2,], sep = "-")
m <- cor(y, method = "pearson")
r <- m[which(lower.tri(m))]
vlus <- r^2
df2 <- data.frame(vrPrs, vlus)
names(df2) <- sub("^vrPrs$", "Variable Pairs",
names(df2))
names(df2) <- sub("^vlus$", "R-Square", names(df2))
format.data.frame(df2)
return(df2)
}
else{
print(paste("This Data Frame does not have two or more numerical
columns to compute the Pearson correlation coefficient(s)."))
}
}
# call function rSqd() and store result in list
output[[length(output)+1]] <- rSqd(df)
return(output)
}
data(diamonds, package = "ggplot2")
test <- diamonds[1:100,]
mstrFnct(test)
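To illustrate the ordering rule in a smaller setting: inside a function body, a nested function can only be called after the line that defines it has run. The names below (outer_fn, double_it) are invented for this sketch, and it assumes no object called double_it exists elsewhere in the session:

```r
outer_fn <- function(x) {
  # At this point double_it has not been defined yet, so the call fails;
  # tryCatch turns the error into a plain string for demonstration.
  early <- tryCatch(double_it(x), error = function(e) "not defined yet")

  # Define the nested helper
  double_it <- function(v) v * 2

  # From here on the helper can be called
  late <- double_it(x)

  list(early = early, late = late)
}

res <- outer_fn(3)
res$early
res$late
```

This is why both versions above define frqTbl() and rSqd() before the lines that call them.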
I've been working on a function in R that should return a data.frame. However, when I run it, it returns 0 rows and 0 columns. I added a print() inside the if statement, and that works, so I know the loop and the if statement are behaving as intended.
Here is the function:
predict.next.word <- function(word, ng_matrix){
if(ncol(ng_matrix)==3){
for (i in 1:100){
ngram_df <- data.frame()
first_word <- ng_matrix[,1][i]
second_word <- ng_matrix[,2][i]
count_word <- ng_matrix[,3][i]
if (word[1] == first_word && !is.na(first_word)){
matched_factor <- structure(c(second_word, count_word), .Names = c("predicted", "count"))
matched_df <- as.data.frame(as.list(matched_factor))
ngram_df <- (rbind(ngram_df, matched_df))
ngram_df <- transform(ngram_df, count = as.numeric(count))
print (ngram_df) # this works great, but not intention of function
}
}
return (ngram_df)
}
}
Here is a sample of when I call it:
test_bigram_word <- c("a")
predict.next.word(test_bigram_word, bigram_index)
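Solved it by moving the initialization of the data frame above the for loop. Previously ngram_df was reset to an empty data frame on every iteration, so earlier matches were discarded. Like this: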
predict.next.word <- function(word, ng_matrix){
if(ncol(ng_matrix)==3){
ngram_df <- data.frame()
for (i in 1:100){
first_word <- ng_matrix[,1][i]
second_word <- ng_matrix[,2][i]
count_word <- ng_matrix[,3][i]
if (word[1] == first_word && !is.na(first_word)){
matched_factor <- structure(c(second_word, count_word), .Names = c("predicted", "count"))
matched_df <- as.data.frame(as.list(matched_factor))
ngram_df <- (rbind(ngram_df, matched_df))
ngram_df <- transform(ngram_df, count = as.numeric(count))
print (ngram_df) # this works great, but not intention of function
}
}
return (ngram_df)
}
}
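The same result can probably be obtained without a loop by subsetting the matrix with a logical index. This is a sketch under assumptions: bigram_index is taken to be a three-column character matrix (first word, second word, count), the sample rows are invented, and unlike the loop above it scans all rows, not just the first 100:

```r
# Hypothetical sample data in the assumed layout
bigram_index <- cbind(
  first  = c("a", "a", "the", "a"),
  second = c("cat", "dog", "end", "bird"),
  count  = c("3", "5", "2", "1")
)

predict_next_word_vec <- function(word, ng_matrix) {
  stopifnot(ncol(ng_matrix) == 3)
  # TRUE for rows whose first word equals the query; NA rows are excluded
  hit <- !is.na(ng_matrix[, 1]) & ng_matrix[, 1] == word[1]
  data.frame(
    predicted = ng_matrix[hit, 2],
    count = as.numeric(ng_matrix[hit, 3]),
    stringsAsFactors = FALSE
  )
}

out <- predict_next_word_vec("a", bigram_index)
out
```

Because the data frame is built once from the matching rows, there is nothing to initialize or rbind() inside a loop.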
I am wondering whether a proper framework for interval manipulation and comparison exists in R.
After some search, I was only able to find the following:
- function findInterval in base Package. (but I hardly understand it)
- some answers here and there about union and intersection (notably: http://r.789695.n4.nabble.com/Union-Intersect-two-continuous-sets-td4224545.html)
Would you know of an initiative to implement a comprehensive set of tools that easily handles frequent tasks in interval manipulation, like inclusion/setdiff/union/intersection/etc. (e.g. see here for a list of functionalities)?
Or would you have advice on developing such an approach?
Below are some drafts on my side for doing so. They are surely awkward and still have some bugs, but they might illustrate what I am looking for.
preliminary aspects about the options taken
- should deal seamlessly with intervals or intervals set
- intervals are represented as 2 columns data.frames (lower boundary, higher boundary), on one row
- intervals sets are represented as 2 columns with several rows
- a third column might be needed for identification of intervals sets
UNION
interval_union <- function(df){ # for data frame
df <- interval_clean(df)
if(is.empty(df)){
return(as.data.frame(NULL))
} else {
if(is.POSIXct(df[,1])) {
dated <- TRUE
df <- colwise(as.numeric)(df)
} else {
dated <- FALSE
}
M <- as.matrix(df)
o <- order(c(M[, 1], M[, 2]))
n <- cumsum( rep(c(1, -1), each=nrow(M))[o])
startPos <- c(TRUE, n[-1]==1 & n[-length(n)]==0)
endPos <- c(FALSE, n[-1]==0 & n[-length(n)]==1)
M <- M[o]
if(dated == TRUE) {
df2 <- colwise(mkDateTime)(as.data.frame(cbind(M[startPos], M[endPos])), from.s = TRUE)
} else {
df2 <- as.data.frame(cbind(M[startPos], M[endPos]))
}
colnames(df2) <- colnames(df)
# print(df2)
return(df2)
}
}
union_1_1 <- function(test, ref){
names(ref) <- names(test)
tmp <- interval_union(as.data.frame(rbind(test, ref)))
return(tmp)
}
union_1_n <- function(test, ref){
return(union_1_1(test, ref))
}
union_n_n <- function(test, ref){
testnn <- adply(.data = test, 1, union_1_n, ref, .expand = FALSE)
return(testnn)
}
ref_interval_union <- function(df, ref){
tmp0 <- adply(df, 1, union_1_1, ref, .expand = FALSE) # set to FALSE to keep ID
return(tmp0)
}
INTERSECTION
interval_intersect <- function(df){
# adapted from : http://r.789695.n4.nabble.com/Union-Intersect-two-continuous-sets-td4224545.html
M <- as.matrix(df)
L <- max(M[, 1])
R <- min(M[, 2])
Inew <- if (L <= R) c(L, R) else c()
if (!is.empty(Inew)){
df2 <- t(as.data.frame(Inew))
colnames(df2) <- colnames(df)
rownames(df2) <- NULL
} else {
df2 <- NULL
}
return(as.data.frame(df2))
}
ref_interval_intersect <- function(df, ref){
tmpfun <- function(a, b){
names(b) <- names(a)
tmp <- interval_intersect(as.data.frame(rbind(a, b)))
return(tmp)
}
tmp0 <- adply(df, 1, tmpfun, ref, .expand = FALSE) # [,3:4]
#if(!is.empty(tmp0)) colnames(tmp0) <- colnames(df)
return(tmp0)
}
int_1_1 <- function(test, ref){
te <- as.vector(test)
re <- as.vector(ref)
names(re) <- names(te)
tmp0 <- c(max(te[1, 1], re[1, 1]), min(te[1, 2], re[1, 2]))
if(tmp0[1]>tmp0[2]) tmp0 <- NULL # inverse of a correct interval --> VOID
if(!is.empty(tmp0)){
tmp1 <- colwise(mkDateTime)(as.data.frame(t(as.data.frame(tmp0))))
colnames(tmp1) <- colnames(test)
} else {
tmp1 <- data.frame(NULL)
}
return(tmp1)
}
int_1_n <- function(test, ref){
test1 <- adply(.data = ref, 1, int_1_1, test = test, .expand = FALSE)
if(is.empty(test1)){
return(data.frame(NULL))
} else {
testn <- interval_union(test1[,2:3])
return(testn)
}
}
int_n_n <- function(test, ref){
testnn <- adply(.data = test, 1, int_1_n, ref, .expand = FALSE)
# return(testnn[,2:3]) # return interval set without index (1st column)
return(testnn) # return interval set with index (1st column) --> useful with merge to keep metadata going along with the interval description
}
int_intersect <- function(df, ref){
mycols <- colnames(df)
df$X1 <- 1:nrow(df)
test <- df[, 1:2]
tmp <- int_n_n(test, ref)
intersection <- merge(tmp, df, by = "X1", suffixes = c("", "init"))
return(intersection[,mycols])
}
EXCLUSION
excl_1_1 <- function(test, ref){
te <- as.vector(test)
re <- as.vector(ref)
names(re) <- names(te)
if(te[1] < re[1]){ # Lower Bound
if(te[2] > re[1]){ # overlap
x <- unlist(c(te[1], re[1]))
} else { # no overlap
x <- unlist(c(te[1], te[2]))
}
} else { # test > ref on lower bound side
x <- NULL
}
if(te[2] > re[2]){ # Upper Bound
if(te[1] < re[2]){ # overlap
y <- unlist(c(re[2], te[2]))
} else { # no overlap
y <- unlist(c(te[1], te[2]))
}
} else { # test < ref on upper bound side
y <- NULL
}
if(is.empty(x) & is.empty(y)){
tmp0 <- NULL
tmp1 <- tmp0
} else {
tmp0 <- as.data.frame(rbind(x, y))
colnames(tmp0) <- colnames(test)
tmp1 <- interval_union(tmp0)
}
return(tmp1)
}
excl_1_n <- function(test, ref){
testn0 <- adply(.data = ref, 1, excl_1_1, test = test, .expand=FALSE)
# loop to successively intersect the interval sets, to handle disjoint intervals (identified by X1, col1)
tmp <- range(testn0)
names(tmp) <- colnames(testn0)[2:3]
tmp <- as.data.frame(t(tmp))
for(i in unique(testn0[,1])){
tmp <- int_n_n(tmp, testn0[testn0[,1]==i, 2:3])
}
return(tmp)
}
INCLUSION
incl_1_1 <- function(test, ref){
te <- as.vector(test)
re <- as.vector(ref)
if(te[1] >= re[1] & te[2] <= re[2]){ return(TRUE) } else { return(FALSE) }
}
incl_1_n <- function(test, ref){
testn <- adply(.data = ref, 1, incl_1_1, test = test)
return(any(testn[,ncol(testn)]))
}
incl_n_n <- function(test, ref){
testnn <- aaply(.data = test, 1, incl_1_n, ref, .expand = FALSE)
names(testnn) <- NULL
return(testnn)
}
flat_incl_n_n <- function(test, ref){
ref <- interval_union(ref)
return(incl_n_n(test, ref))
}
# testing for a vector, instead of an interval set
incl_x_1 <- function(x, ref){
test <- (x>=ref[1,1] & x<ref[1,2])
return(test)
}
incl_x_n <- function(x, ref){
test <- any(x>=ref[,1] & x<ref[,2])
return(test)
}
I think you might be able to make good use of the many interval-related functions in the sets package.
Here's a small example illustrating the package's support for interval construction, intersection, set difference, union, and complementation, as well as its test for inclusion in an interval. These and many other related functions are documented on the help page for ?interval.
library(sets)
i1 <- interval(1,6)
i2 <- interval(5,10)
i3 <- interval(200,400)
i4 <- interval(202,402)
i5 <- interval_union(interval_intersection(i1,i2),
interval_symdiff(i3,i4))
i5
# [5, 6] U [200, 202) U (400, 402]
interval_complement(i5)
# [-Inf, 5) U (6, 200) U [202, 400] U (402, Inf]
interval_contains_element(i5, 5.5)
# [1] TRUE
interval_contains_element(i5, 201)
# [1] TRUE
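Regarding findInterval() from base R, which the question mentions: it does not do set operations, it only reports which bin each value falls into for a sorted vector of break points. A small illustration with made-up break points:

```r
breaks <- c(0, 10, 20, 30)           # sorted break points (invented values)
x <- c(-5, 0, 9.9, 10, 25, 30, 31)

# For each x, the index i such that breaks[i] <= x < breaks[i + 1];
# 0 means x is below the first break, length(breaks) means at or above the last.
bins <- findInterval(x, breaks)
bins
# [1] 0 1 1 2 3 4 4
```

So it is useful for locating points relative to boundaries, but union, intersection and complement still need something like the sets functions shown above.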
If your intervals are currently encoded in a two-column data.frame, you could use something like mapply() to convert them to intervals of the type used by the sets package:
df <- data.frame(lBound = c(1,5,100), uBound = c(10, 6, 200))
Ints <- with(df, mapply("interval", l=lBound, r=uBound, SIMPLIFY=FALSE))
Ints
# [[1]]
# [1, 10]
# [[2]]
# [5, 6]
# [[3]]
# [100, 200]
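If you would rather stay in base R with the two-column data.frame representation, a union of closed intervals can be sketched with sorting and cummax(). The function name merge_intervals is invented here, and the sketch assumes closed numeric intervals, at least one row, and lower bound <= upper bound on every row:

```r
merge_intervals <- function(df) {
  # Sort by lower bound
  df <- df[order(df[[1]]), , drop = FALSE]
  lo <- df[[1]]
  hi <- df[[2]]

  # Largest upper bound seen so far, row by row
  run_hi <- cummax(hi)

  # A new merged interval starts when a lower bound lies strictly
  # beyond every upper bound seen before it
  new_grp <- c(TRUE, lo[-1] > run_hi[-length(run_hi)])
  grp <- cumsum(new_grp)

  data.frame(
    lBound = as.vector(tapply(lo, grp, min)),
    uBound = as.vector(tapply(hi, grp, max))
  )
}

df <- data.frame(lBound = c(1, 5, 100), uBound = c(10, 6, 200))
merged <- merge_intervals(df)
merged
```

With the example data, [5, 6] is absorbed into [1, 10] and [100, 200] stays separate. Open or half-open bounds are not handled, which is one reason a dedicated package may be the better choice.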