Issue while executing the drop() function in R

I am trying to understand how the drop() function is used. The documentation says that a matrix or array can be the input object, but when I call the function the size of the matrix does not change. Can someone explain what it is actually for and how it works?
I am using R version 3.2.1. Code snippet:
data1 <- matrix(data=(1:10),nrow=1,ncol=1)
drop(data1)
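As a minimal sketch of what drop() is for: it deletes dimensions of an array that have extent one, so it only has a visible effect when such a dimension exists. The object names below (m, m2, a, v) are made up for illustration.

```r
# A 1 x 10 matrix: the row dimension has extent one.
m <- matrix(1:10, nrow = 1)
dim(m)                  # 1 10

# drop() removes the extent-one dimension, leaving a plain vector.
v <- drop(m)
dim(v)                  # NULL
length(v)               # 10

# With no extent-one dimension there is nothing to drop.
m2 <- matrix(1:10, nrow = 2)
identical(drop(m2), m2) # TRUE

# Arrays work the same way: only the middle dimension is removed here.
a <- array(1:6, dim = c(2, 1, 3))
dim(drop(a))            # 2 3
```

The values themselves are never changed; only the dim attribute is.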

R has factors, which are very cool (and somewhat analogous to labeled levels in Stata). Unfortunately, the factor list sticks around even if you remove some data such that no examples of a particular level still exist.
# Create some fake data
x <- as.factor(sample(head(colors()),100,replace=TRUE))
levels(x)
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!
The solution is simple: run factor() again:
x <- factor(x)
levels(x)
If you need to do this on many factors at once (as is the case with a data.frame containing several columns of factors), use drop.levels() from the gdata package (load it first with library(gdata)):
x <- x[x!="antiquewhite1"]
df <- data.frame(a=x,b=x,c=x)
df <- drop.levels(df)
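If installing gdata is not an option, base R's droplevels() does a similar job on a factor or on every factor column of a data frame. A small sketch with made-up data (the colour names and column names are only for illustration):

```r
# A factor with three levels, used for two columns of a data frame.
x <- factor(c("red", "green", "blue", "red", "green"))
df <- data.frame(a = x, b = x)

# Remove all rows with the "blue" level; the level itself survives.
df <- df[df$a != "blue", ]
levels(df$a)            # "blue" "green" "red"

# droplevels() drops unused levels from every factor column.
df <- droplevels(df)
levels(df$a)            # "green" "red"
levels(df$b)            # "green" "red"
```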

An R matrix is a two-dimensional array. R has many operators and functions that make matrix handling very convenient.
Matrix assignment:
> A <- matrix(c(3,5,7,1,9,4),nrow=3,ncol=2,byrow=TRUE)
> A
     [,1] [,2]
[1,]    3    5
[2,]    7    1
[3,]    9    4
Matrix row and column count:
> rA <- nrow(A)
> rA
[1] 3
> cA <- ncol(A)
> cA
[1] 2
The t(A) function returns the transpose of A:
> B <- t(A)
> B
     [,1] [,2] [,3]
[1,]    3    7    9
[2,]    5    1    4
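Element-wise matrix multiplication (note that * multiplies corresponding elements; the matrix product proper uses %*%):
> C <- A * A
> C
     [,1] [,2]
[1,]    9   25
[2,]   49    1
[3,]   81   16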
Matrix addition:
> C <- A + A
> C
     [,1] [,2]
[1,]    6   10
[2,]   14    2
[3,]   18    8
Matrix subtraction (-) and division (/) operations ... ...
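The arithmetic operators above all work element by element. For contrast, here is a short sketch of the matrix product in the linear-algebra sense, which uses the %*% operator; the names E and P are made up for this example.

```r
A <- matrix(c(3, 5, 7, 1, 9, 4), nrow = 3, ncol = 2, byrow = TRUE)

# Element-wise product: same shape as A, each entry squared.
E <- A * A
dim(E)                  # 3 2

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix.
# A %*% A would fail here because the dimensions do not conform.
P <- t(A) %*% A
P
#      [,1] [,2]
# [1,]  139   58
# [2,]   58   42
```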
Sometimes a matrix needs to be sorted by a specific column, which can be done with the order() function.
The following is an example CSV file:
,t1,t2,t3,t4,t5,t6,t7,t8
r1,1,0,1,0,0,1,0,2
r2,1,2,5,1,2,1,2,1
r3,0,0,9,2,1,1,0,1
r4,0,0,2,1,2,0,0,0
r5,0,2,15,1,1,0,0,0
r6,2,2,3,1,1,1,0,0
r7,2,2,3,1,1,1,0,1
The following R code reads the above file into a data frame, sorts it by its fourth column (t3; the first column holds the row labels), and then writes it to an output file:
x <- read.csv("sortmatrix.csv", header=TRUE, sep=",")
x <- x[order(x[,4]),]
write.table(x, file="tp.txt", sep=",")
The result is:
"X","t1","t2","t3","t4","t5","t6","t7","t8"
"1","r1",1,0,1,0,0,1,0,2
"4","r4",0,0,2,1,2,0,0,0
"6","r6",2,2,3,1,1,1,0,0
"7","r7",2,2,3,1,1,1,0,1
"2","r2",1,2,5,1,2,1,2,1
"3","r3",0,0,9,2,1,1,0,1
"5","r5",0,2,15,1,1,0,0,0
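The same order() idiom works on a true matrix without any file I/O. A minimal, self-contained sketch (the names m, asc and desc are illustrative only):

```r
m <- matrix(c(3, 5, 7, 1, 9, 4), nrow = 3, byrow = TRUE)

# order() returns the row permutation that sorts column 2;
# indexing the rows with it reorders the whole matrix.
asc <- m[order(m[, 2]), ]
asc
#      [,1] [,2]
# [1,]    7    1
# [2,]    9    4
# [3,]    3    5

# Descending order on the same column.
desc <- m[order(m[, 2], decreasing = TRUE), ]
```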

DROP FUNCTION supports natively compiled, scalar user-defined functions.
Removes one or more user-defined functions from the current database.
To execute DROP FUNCTION, at a minimum, a user must have ALTER permission on the schema to which the function belongs, or CONTROL permission on the function.
DROP FUNCTION will fail if there are Transact-SQL functions or views in the database that reference this function and were created by using SCHEMABINDING, or if there are computed columns, CHECK constraints, or DEFAULT constraints that reference the function.
DROP FUNCTION will fail if there are computed columns that reference this function and have been indexed.
DROP FUNCTION { [ schema_name. ] function_name } [ ,...n ]

Related

How to write an apply() function that only applies to odd-numbered columns in r matrix?

Suppose we have a "test" matrix that looks like this: (1,2,3, 4,5,6, 7,8,9, 10,11,12) generated by running test <- matrix(1:12, ncol = 4). A simple 3 x 4 (rows x columns) matrix of numbers running from 1 to 12.
Now suppose we'd like to add a value of 1 to each element in each odd-numbered matrix column, so we end up with a matrix of the following values: (2,3,4, 4,5,6, 8,9,10, 10,11,12). How would we use an apply() function to do this?
Note that this is a simplified example. In the more complete code I'm working with, the matrix dynamically expands/contracts based on user inputs, so I need an apply() function that counts the actual number of matrix columns rather than using a fixed assumption of 4 columns as in the above example. (And I'm not adding a value of 1 to the elements; I'm running the parallel minima function test[,1] <- pmin(test[,1], 5) to, say, limit each value to a max of 5.)
With my current limited understanding of the apply() family of functions, all I can so far do is apply(test, 2, function(x) {return(x+1)}) but this is adding a value of 1 to all elements in all columns rather than only the odd-numbered columns.
You may simply subset the input data frame to access only odd or even numbered columns. Consider:
test[c(TRUE, FALSE)] <- apply(test[c(TRUE, FALSE)], 2, function(x) f(x))
test[c(FALSE, TRUE)] <- apply(test[c(FALSE, TRUE)], 2, function(x) f(x))
This works because the recycling rules in R cause e.g. c(TRUE, FALSE) to be repeated as many times as needed to cover all columns in the input test data frame.
For a matrix, we need to use the drop=FALSE flag when subsetting the matrix in order to keep it in matrix form when using apply():
test <- matrix(1:12, ncol = 4)
test[,c(TRUE, FALSE)] <- apply(test[,c(TRUE, FALSE),drop=FALSE], 2, function(x) x+1)
test
     [,1] [,2] [,3] [,4]
[1,]    2    4    8   10
[2,]    3    5    9   11
[3,]    4    6   10   12
        ^         ^  ... these columns incremented by 1
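You may also use the modulo operator, %% 2, to pick out the odd-numbered columns. Keep drop = FALSE so the subset stays a matrix even when only one column is selected:
odd <- seq_len(ncol(test)) %% 2 == 1
test[, odd] <- apply(test[, odd, drop = FALSE], 2, function(x) x + 1)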
# [,1] [,2] [,3] [,4]
# [1,] 2 4 8 10
# [2,] 3 5 9 11
# [3,] 4 6 10 12
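For the use case mentioned in the question (capping values with pmin()), apply() may not be needed at all, since pmin() is already vectorised. A sketch under that assumption, using the same test matrix and a cap of 5:

```r
test <- matrix(1:12, ncol = 4)

# Logical mask for odd-numbered columns, whatever the column count is.
odd <- seq_len(ncol(test)) %% 2 == 1

# pmin() works element-wise on the selected columns in one call.
test[, odd] <- pmin(test[, odd, drop = FALSE], 5)
test
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    5   10
# [2,]    2    5    5   11
# [3,]    3    6    5   12
```

Columns 1 and 3 are capped at 5, while columns 2 and 4 are left as they were.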

Is the result of the which() function *always* ordered?

I want to be sure that the result of which(..., arr.ind = TRUE) is always ordered, specifically: arranged ascending by (col, row). I do not see such a remark in the which function documentation, although it seems to be the case based on some experiments I made. How can I check or learn whether this is the case?
Example. When I run the code below, the output is a matrix in which the results are arranged ascending by (col, row) columns.
> set.seed(1)
> vals <- rnorm(10)
> valsall <- sample(as.numeric(replicate(10, vals)))
> mat <- matrix(valsall, 10, 10)
> which(mat == max(mat), arr.ind = TRUE)
      row col
 [1,]   1   1
 [2,]   3   1
 [3,]   1   2
 [4,]   2   2
 [5,]  10   2
 [6,]   1   6
 [7,]   2   8
 [8,]   4   8
 [9,]   1   9
[10,]   6   9
Part 1:
This answers the part of your question about how to understand functions on a deeper level when the documentation is not enough, without going into the details of which() itself.
which() is not a primitive function (primitives are implemented in C); it is an ordinary function written in R, so we can see what is going on behind the scenes by printing the function itself. Note that backticks let you print functions with reserved names, e.g. +, and are therefore optional in this example. This dense R code can be extremely tiresome to read, but I've found it very educational and it does untie some mental knots every once in a while.
> print(`which`)
function (x, arr.ind = FALSE, useNames = TRUE)
{
    wh <- .Internal(which(x))
    if (arr.ind && !is.null(d <- dim(x)))
        arrayInd(wh, d, dimnames(x), useNames = useNames)
    else wh
}
<bytecode: 0x00000000058673e0>
<environment: namespace:base>
Part 2:
So after giving up on trying to understand the which and arrayInd functions in the way described above, I'm trying common sense instead. The most efficient way to check each value of a matrix or array that makes sense to me is to convert it, at some point, to a one-dimensional object. Coercing a matrix to an atomic vector, or otherwise reducing its dimensions, always concatenates the complete columns in order (column-major order), so to me it is natural that higher-level functions follow this fundamental rule as well.
> testmat <- matrix(1:10, nrow = 2, ncol = 5)
> testmat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
> as.numeric(testmat)
[1] 1 2 3 4 5 6 7 8 9 10
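The column-major argument can also be checked empirically. The sketch below is an experiment, not a proof: it uses a small made-up matrix m and confirms that the linear indices from which() come back in increasing order, that arrayInd() turns them into the same (row, col) pairs as arr.ind = TRUE, and that those pairs are therefore sorted by (col, row).

```r
m <- matrix(c(1, 0, 1, 1, 0, 1), nrow = 2)

# Linear (column-major) indices of the matching cells.
lin <- which(m == 1)
lin                     # 1 3 4 6

# The same cells as (row, col) pairs.
idx <- which(m == 1, arr.ind = TRUE)

# arr.ind = TRUE is just arrayInd() applied to the linear indices ...
same_cells <- all(arrayInd(lin, dim(m)) == idx)

# ... so increasing linear indices imply ordering by (col, row).
is_sorted <- identical(order(idx[, "col"], idx[, "row"]),
                       seq_len(nrow(idx)))
```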
I found Hadley Wickham's Advanced R an extremely valuable resource in answering your question, especially the chapters about functions and data structures.
http://adv-r.had.co.nz/

Creating a separate object for each iteration of for-loop within a function

I have a function that contains a for-loop that iterates 160 times, producing a 3-dimensional array on every iteration. I want the for-loop to save each array as a separate object with a name such as array001, array002, array003, etc., preferably without spilling them into the global workspace. Finally, I want to be able to use some of these arrays later on within the same function.
array.function <- function(df, parameter = 0) {
  for (i in 1:160) {
    # DO A LOT OF STUFF
    # SAVE OUTPUT AS array###
  }
  # DO MORE STUFF with arrays generated by for-loop above
}
Any ideas on how to save the arrays as objects with the corresponding numbers in their names? Thank you!
The comment from lmo is a great workable solution for storing the arrays and accessing them later in the function. However, if you really want them to be stored as separate variables instead of as a list, assign() and get() are your friends:
array.function <- function(df, parameter = 0) {
  # This stores the function's environment for convenience:
  env <- environment()
  for ( i in 1:160 ) { # Then for each iteration
    # make an array and store it in the function's environment
    assign(paste0('array', i), array(c(i, 2:27), dim=c(3, 3, 3)))
  }
  # Then print out a randomly selected array to make sure it worked
  print(get(paste0('array', sample(1:160, 1))))
  # I don't know what you want to do with the arrays,
  # for now just return the whole function's environment
  # (the arrays as well as 'i', 'df', 'env', and 'parameter')
  return(as.list(sapply(ls(envir=env), get, envir=env)))
}
x <- array.function(1) # the 1 is just because I kept your arguments
, , 1
     [,1] [,2] [,3]
[1,]  158    4    7
[2,]    2    5    8
[3,]    3    6    9
, , 2
     [,1] [,2] [,3]
[1,]   10   13   16
[2,]   11   14   17
[3,]   12   15   18
, , 3
     [,1] [,2] [,3]
[1,]   19   22   25
[2,]   20   23   26
[3,]   21   24   27
For each iteration of your loop, assign(paste0('array', i), value) creates a variable in the function's environment -- not the global environment, at least not by default -- whose name is the result of paste0('array', i) and whose value is value (whatever array you create). You can access it later using get(paste0('array', x)), where x is the number of the array you need.
An excellent resource for learning more about function environments is Hadley Wickham's book.
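For comparison, here is a minimal sketch of the list-based approach mentioned at the start of this answer. The function name array_function, its argument n, and the placeholder array contents are made up for illustration; sprintf() supplies the zero-padded names from the question.

```r
array_function <- function(n = 5) {
  # Pre-allocate a list with one slot per iteration.
  arrays <- vector("list", n)
  for (i in seq_len(n)) {
    # Placeholder computation: any 3-dimensional array would do.
    arrays[[i]] <- array(c(i, 2:27), dim = c(3, 3, 3))
  }
  # Zero-padded names: array001, array002, ...
  names(arrays) <- sprintf("array%03d", seq_len(n))

  # Later in the same function, look arrays up by name or position.
  first_cell <- arrays[["array003"]][1, 1, 1]
  list(arrays = arrays, first_cell = first_cell)
}

res <- array_function(5)
names(res$arrays)       # "array001" ... "array005"
res$first_cell          # 3
```

Keeping the arrays in one list means they can be looped over with lapply() and nothing has to be looked up by constructed variable names.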

R : confusion regarding LHS and RHS of assignment and order of operation

I am having some fundamental confusion with R. I have a snippet of R code.
> m <- 1:10
> m
[1] 1 2 3 4 5 6 7 8 9 10
> dim(m) <- c(2,5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
Now I am a C/Python programmer and the line dim(m) <- c(2,5) is incredibly confusing to me. I realize that it effectively changed a vector into a matrix, however looking at it I do not understand the logic/order of operation.
<- is the assignment operator in R. So to me, logically the order of operation is : assign (2,5) to the output of dim(m). Since the output of dim(m) isn't assigned to a variable, the output would be lost.
Could someone explain how I should read the line dim(m) <- c(2,5)? What is the order of operation? It seems that the order of operation using <- changes depending on the LHS and RHS of the expression.
These are special functions called Replacement Functions. I quote from Hadley's Advanced-R book:
Replacement functions act like they modify their arguments in place, and have the special name xxx<-. They typically have two arguments (x and value), although they can have more, and they must return the modified object. For example, the following function allows you to modify the second element of a vector:
`second<-` <- function(x, value) {
  x[2] <- value
  x
}
x <- 1:10
second(x) <- 5L
x
#> [1] 1 5 3 4 5 6 7 8 9 10
When R evaluates the assignment second(x) <- 5, it notices that the left hand side of the <- is not a simple name, so it looks for a function named second<- to do the replacement.
You can check the full chapter here under the Replacement Functions title.
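Applied to the line from the question, this means dim(m) <- c(2,5) can be read as a call to the function named dim<- whose result is assigned back to m. A short sketch showing that the two spellings give the same object (m_a and m_b are illustrative names):

```r
m <- 1:10

# The usual replacement-function syntax.
m_a <- m
dim(m_a) <- c(2, 5)

# The explicit form: call `dim<-` and assign its return value.
m_b <- `dim<-`(m, c(2, 5))

identical(m_a, m_b)     # TRUE
dim(m_b)                # 2 5
```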

Applying a list of functions to columns in a data frame

I have 2 variables, a and b. a is potentially very large. b is always a vector of functions that can be applied to each column in the data frame a.
a <- data.frame(col1=c(1, 2, 3), col2=c(4, 5, 6))
b <- c(as.double, function(x) {1+x})
So the result that I want is that function b[1] is applied to col1, b[2] is applied to col2, and so on for all columns. I feel lapply should be used here but documentation seems to say that it can only have one function. I could use a loop I suppose but a "vectorised" way would be nice.
This is one way to do it:
results <- mapply(function(i,j) b[[i]](a[[j]]), i=1:length(b), j=1:length(a))
It gives you:
> results
[,1] [,2]
[1,] 1 5
[2,] 2 6
[3,] 3 7
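An alternative sketch, assuming you would rather get a data frame back than a matrix: Map() pairs each function with its column directly, without going through index vectors. The name res is made up for this example, and b is written with list() here, which builds the same kind of object as c() does for functions.

```r
a <- data.frame(col1 = c(1, 2, 3), col2 = c(4, 5, 6))
b <- list(as.double, function(x) 1 + x)

# Map() walks over b and the columns of a in parallel.
res <- Map(function(f, col) f(col), b, a)

# Restore the column names and rebuild a data frame.
names(res) <- names(a)
res <- as.data.frame(res)
res
#   col1 col2
# 1    1    5
# 2    2    6
# 3    3    7
```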
