I'm new to R and am trying to translate some code from MATLAB to R, but I'm having trouble with function output. I want to create a function that returns its output into two specified variables, like this:
list[a,b]<-function(var1,var2){
a<-var1 + var2
b<-var1 - var2
return list(a,b)
}
But my code is not working, please help me to solve this problem.
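You seem to have a fundamental misunderstanding about functions in R; reading "An Introduction to R" will help. A function is assigned to a single name and returns a single object, so to return several values you wrap them in a list. Also, return is a function in R, so it must be called with parentheses: return(list(a, b)).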
myfun <- function(var1, var2){
a <- var1 + var2
b <- var1 - var2
return(list(a, b))
}
myfun(1:5, 10:6)
#[[1]]
#[1] 11 11 11 11 11
#
#[[2]]
#[1] -9 -7 -5 -3 -1
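To end up with the two results in two separate variables, as the question asks, one option is to name the list elements and extract them after the call. This is a minimal sketch; the element names sum and diff and the variable res are chosen here for illustration only.

```r
# Return both results in a named list
myfun <- function(var1, var2) {
  list(sum = var1 + var2, diff = var1 - var2)
}

res <- myfun(1:5, 10:6)

# Pull the two components out into separate variables
a <- res$sum
b <- res$diff
```

Naming the elements means the caller does not have to remember which position holds which result.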
Sometimes I read posts where people use the print() function and I don't understand why it is used. Here for example in one answer the code is
print(fitted(m))
# 1 2 3 4 5 6 7 8
# 0.3668989 0.6083009 0.4677463 0.8685777 0.8047078 0.6116263 0.5688551 0.4909217
# 9 10
# 0.5583372 0.6540281
But using fitted(m) would give the same output. I know there are situations where we need print(), for example if we want to create plots inside loops. But why is the print() function used in cases like the one above?
I guess that in many cases usage of print is just a bad/redundant habit, however print has a couple of interesting options:
Data:
x <- rnorm(5)
y <- rpois(5, exp(x))
m <- glm(y ~ x, family="poisson")
m2 <- fitted(m)
m2
# 1 2 3 4 5
# 0.8268702 1.0523189 1.9105627 1.0776197 1.1326286
digits - sets the minimum number of significant digits to print
print(m2, digits = 3) # displays like round(m2, 3) here, but m2 itself is unchanged
# 1 2 3 4 5
# 0.827 1.052 1.911 1.078 1.133
na.print - prints NA values as a specified string (very similar to the zero.print argument)
m2[1] <- NA
print(m2, na.print = "Failed")
# 1 2 3 4 5
# Failed 1.052319 1.910563 1.077620 1.132629
max - limits how many values are printed
print(m2, max = 2) # similar to head(m2, 2)
# 1 2
# NA 1.052319
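One point worth making explicit about the digits option: it only affects how the values are displayed, whereas round() returns new values. A small sketch with two of the fitted values from above typed in by hand (the names v, shown and rounded are illustrative):

```r
v <- c(0.8268702, 1.0523189)

# print() with digits changes only the displayed text; capture.output()
# is used here just to hold that text in a variable
shown <- capture.output(print(v, digits = 3))

# round() returns a new vector with the values actually changed
rounded <- round(v, 3)

# v still holds the full-precision numbers
v[1] == 0.8268702
```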
I'm guessing, as I rarely use print myself:
using print() makes it obvious which lines of your code do printing and which ones do the actual work. It might make re-reading your code later easier.
using print() explicitly might make it easier to later refactor your code into a function; you just need to change the print into a return
programmers coming from a language with strict syntax might have a strong dislike of the automatic printing feature of R
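There is also a case where print() is not redundant: R auto-prints the value of an expression only at top level, so inside a for loop a bare expression shows nothing. A minimal sketch, using capture.output() only to make the difference visible (the variable names are illustrative):

```r
x <- c(a = 1.5, b = 2.5)

# Inside a loop a bare `x` is evaluated but not printed
out_implicit <- capture.output(for (i in 1) x)

# An explicit print() produces output regardless of where it is called
out_explicit <- capture.output(for (i in 1) print(x))

length(out_implicit)  # no lines captured
length(out_explicit)  # the printed names and values
```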
I'm new to R and can't seem to get to grips with how to call a previous value of "self", in this case previous "b" b[-1].
b <- ( ( 1 / 14 ) * MyData$High + (( 13 / 14 )*b[-1]))
Obviously I need a NA somewhere in there for the first calculation, but I just couldn't figure this out on my own.
Adding example of what the sought after result should be (A=MyData$High):
A b
1 5 NA
2 10 0.7142...
3 15 3.0393...
4 20 4.6079...
1) for loop Normally one would just use a simple loop for this:
MyData <- data.frame(A = c(5, 10, 15, 20))
MyData$b <- 0
n <- nrow(MyData)
if (n > 1) for(i in 2:n) MyData$b[i] <- ( MyData$A[i] + 13 * MyData$b[i-1] )/ 14
MyData$b[1] <- NA
giving:
> MyData
A b
1 5 NA
2 10 0.7142857
3 15 1.7346939
4 20 3.0393586
2) Reduce It would also be possible to use Reduce. One first defines a function f that carries out the body of the loop and then we have Reduce invoke it repeatedly like this:
f <- function(b, A) (A + 13 * b) / 14
MyData$b <- Reduce(f, MyData$A[-1], 0, accumulate = TRUE)
MyData$b[1] <- NA
giving the same result.
This gives the appearance of being vectorized but in fact if you look at the source of Reduce it does a for loop itself.
3) filter Noting that the form of the problem is a recursive filter with coefficient 13/14 operating on A/14 (but with A[1] replaced with 0) we can write the following. Since filter returns a time series we use c(...) to convert it back to an ordinary vector. This approach actually is vectorized as the filter operation is performed in C.
MyData$b <- c(filter(replace(MyData$A, 1, 0)/14, 13/14, method = "recursive"))
MyData$b[1] <- NA
again giving the same result.
Note: All solutions assume that MyData has at least 1 row.
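As a quick check that the three approaches really agree, the following sketch runs them on the same vector and compares the results before the first element is set to NA. It works on a plain vector A rather than on MyData, and the names b_loop, b_reduce and b_filter are introduced here for illustration.

```r
A <- c(5, 10, 15, 20)

# 1) explicit loop, starting from b[1] = 0
b_loop <- numeric(length(A))
for (i in seq_along(A)[-1]) b_loop[i] <- (A[i] + 13 * b_loop[i - 1]) / 14

# 2) Reduce with an initial value of 0, keeping intermediate results
f <- function(b, A) (A + 13 * b) / 14
b_reduce <- Reduce(f, A[-1], 0, accumulate = TRUE)

# 3) recursive filter; stats:: is spelled out in case another package masks filter()
b_filter <- as.numeric(stats::filter(replace(A, 1, 0) / 14, 13 / 14, method = "recursive"))

all.equal(b_loop, b_reduce)
all.equal(b_loop, b_filter)
```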
There are a couple of ways you could do this.
The first method is a simple loop
df <- data.frame(A = seq(5, 25, 5))
df$b <- 0
for(i in 2:nrow(df)){
df$b[i] <- (1/14)*df$A[i]+(13/14)*df$b[i-1]
}
df
A b
1 5 0.0000000
2 10 0.7142857
3 15 1.7346939
4 20 3.0393586
5 25 4.6079758
This doesn't give the exact values given in the expected answer, but it's close enough that I've assumed you made a transcription mistake. Note that we have to assume that we can take the NA in df$b[1] as being zero or we get NA all the way down.
If you have heaps of data or need to do this a bunch of times, the speed could be improved by implementing the code in C++ and calling it from R.
The second method uses the R function sapply
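The form you present the problem in,
$b_i = \frac{1}{14} A_i + \frac{13}{14} b_{i-1}$,
is recursive, which makes it hard to vectorise directly. However, we can do some maths and find that, taking $b_0 = 0$, it is equivalent to
$b_n = \frac{1}{14} \sum_{k=1}^{n} \left(\frac{13}{14}\right)^{n-k} A_k$.
We can then write a function which calculates b_n and use sapply to calculate each element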
calc_b <- function(n,A){
(1/14)*sum((13/14)^(n-1:n)*A[1:n])
}
df2 <- data.frame(A = seq(10,25,5))
df2$b <- sapply(seq_along(df2$A), calc_b, df2$A)
df2
A b
1 10 0.7142857
2 15 1.7346939
3 20 3.0393586
4 25 4.6079758
Note: We need to drop the first row (where A = 5) in order for the calculation to perform correctly.
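The step from the recursion to the sum computed in calc_b can be seen by unrolling the first few terms. In this sketch A_1 is the first row that is kept (A = 10 in df2) and b_0 = 0 stands for the dropped row; that indexing is an assumption chosen to match calc_b.

```latex
\begin{align*}
b_1 &= \tfrac{1}{14} A_1 + \tfrac{13}{14} b_0 = \tfrac{1}{14} A_1 \\
b_2 &= \tfrac{1}{14} A_2 + \tfrac{13}{14} b_1
     = \tfrac{1}{14}\left( A_2 + \tfrac{13}{14} A_1 \right) \\
b_3 &= \tfrac{1}{14} A_3 + \tfrac{13}{14} b_2
     = \tfrac{1}{14}\left( A_3 + \tfrac{13}{14} A_2 + \left(\tfrac{13}{14}\right)^{2} A_1 \right) \\
b_n &= \tfrac{1}{14} \sum_{k=1}^{n} \left(\tfrac{13}{14}\right)^{n-k} A_k
\end{align*}
```

The exponent n - k is what the expression (13/14)^(n-1:n) produces in calc_b, since n-1:n is evaluated as n - (1:n).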
I have managed to aggregate data successfully using the following pattern:
newdf <- setDT(df)[, list(X=sum(x),Y=max(y)), by=Z]
However, the moment I try to do anything more complicated, although the code runs, it no longer aggregates by Z: it seems to create a dataframe with the same number of observations as the original df so I know that no grouping is actually occurring.
The custom function I would like to apply finds the n-quantile of the current group's values and then does some other stuff with it. I saw .SDcols used in another SO answer and tried something like:
customfunc <- function(dt){
q = unname(quantile(dt$column,0.25))
n = nrow(dt[dt$column <= q])
return(n/dt$someOtherColumn)
}
#fails to group anything!!! also rather slow...
newdf <- setDT(df)[, customfunc(.SD), by=Z, .SDcols=c("column", "someOtherColumn")]
Can someone please help me figure out what is wrong with the way I'm trying to use group by and custom functions? Thank you very much.
Literal example as requested:
> df <- data.frame(Z=c("abc","abc","def","abc"), column=c(1,2,3,4), someOtherColumn=c(5,6,7,8))
> df
Z column someOtherColumn
1 abc 1 5
2 abc 2 6
3 def 3 7
4 abc 4 8
> newdf <- setDT(df)[, customfunc(.SD), by=Z, .SDcols=c("column", "someOtherColumn")]
> newdf
Z V1
1: abc 0.2000000
2: abc 0.1666667
3: abc 0.1250000
4: def 0.1428571
>
As you can see, DF is not grouped. There should just be two rows, one for "abc", and another for "def" since I am trying to group by Z.
As guided by eddi's point above, the basic problem is thinking that your custom function is being called inside a loop and that 'dt$column' will mysteriously give you the 'current value at the current row'. Instead it gives you the entire column (a vector). The function is passed the entire data table, not row-wise bits of data.
So, replacing the value in the return statement with something that represents a single value works. Example:
customfunc <- function(dt){
q = unname(quantile(dt$column,0.25))
n = nrow(dt[dt$column <= q])
return(n/length(dt$someOtherColumn))
}
> df <- data.frame(Z=c("abc","abc","def","abc"), column=c(1,2,3,4), someOtherColumn=c(5,6,7,8))
> df
Z column someOtherColumn
1 abc 1 5
2 abc 2 6
3 def 3 7
4 abc 4 8
> newdf <- setDT(df)[, customfunc(.SD), by=Z, .SDcols=c("column", "someOtherColumn")]
> newdf
Z V1
1: abc 0.3333333
2: def 1.0000000
Now the data is aggregated correctly.
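The rule at work here is that data.table returns one output row per element of whatever j evaluates to within each group. A hedged sketch of the same computation written inline, without customfunc, assuming the data.table package is installed (the names per_row and per_group are illustrative):

```r
library(data.table)

dt <- data.table(Z = c("abc", "abc", "def", "abc"),
                 column = c(1, 2, 3, 4),
                 someOtherColumn = c(5, 6, 7, 8))

# j returns a vector as long as the group, so every row survives
per_row <- dt[, .(V1 = sum(column <= quantile(column, 0.25)) / someOtherColumn), by = Z]

# j returns a single number per group, so there is one row per group
per_group <- dt[, .(V1 = sum(column <= quantile(column, 0.25)) / .N), by = Z]

nrow(per_row)    # same as nrow(dt)
nrow(per_group)  # number of distinct Z values
```

Here .N is the number of rows in the current group, which plays the role of length(dt$someOtherColumn) in the corrected function.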
I would like to write a loop that creates 15 crosstables between each of the 15 variables in one data.frame (var1) and another variable (var2); the data can be downloaded here.
The code is now able to give results, but I would like to know how I can rename the variable "mytable" so that I get mytable1, mytable2, etc.
Code:
library(vcd) # for Cramer's V
var1 <- read.csv("~/example.csv", dec=",")
var2 <- sample(1:43)
i <- 1
while(i <= ncol(var1)) {
mytable[[i]] <- table(var2,var1[,i])
assocstats(mytable[[i]])
print(mytable[[i]])
i <- i + 1
}
As suggested in the comments, using names like mytable1, mytable2, etc. for a list of objects is actively discouraged when using R. Collecting all in a list is more useful and cleaner.
One way to do what you want would be this:
library(vcd) # for Cramer's V
data(mtcars)
var1 <- mtcars[ , c(2, 8:11)] ##OP's CSV no longer available
var2 <- sample(1:5, 32, TRUE)
mytable <- myassoc <- list() ##store output in a list
##a `for` loop looks simpler than `while`
for(i in seq_len(ncol(var1))){
mytable[[i]] <- table(var2, var1[ , i])
myassoc[[i]] <- assocstats(mytable[[i]])
}
So now to access "mytable2" and "myassoc2" you would simply do:
> mytable[[2]]
var2 0 1
1 4 2
2 6 6
3 1 1
4 2 3
5 5 2
> myassoc[[2]]
X^2 df P(> X^2)
Likelihood Ratio 1.7079 4 0.78928
Pearson 1.6786 4 0.79460
Phi-Coefficient : NA
Contingency Coeff.: 0.223
Cramer's V : 0.229
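If the goal of names like mytable1 and mytable2 is to tell the tables apart, a named list may serve the same purpose. The following sketch uses lapply over the columns, so each table is stored under its column name. It uses only base R, so assocstats from vcd is left out, and the set.seed call is added here just to make the random var2 reproducible.

```r
var1 <- mtcars[, c("cyl", "vs", "am", "gear", "carb")]  # same columns as mtcars[, c(2, 8:11)]
set.seed(1)
var2 <- sample(1:5, nrow(mtcars), replace = TRUE)

# lapply over a data frame iterates over its columns and keeps their names
mytable <- lapply(var1, function(col) table(var2, col))

names(mytable)
mytable[["vs"]]   # the crosstable of var2 against the vs column
```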
Puzzle for the R cognoscenti: Say we have a data-frame:
df <- data.frame( a = 1:5, b = 1:5 )
I know we can do things like
with(df, a)
to get a vector of results.
But how do I write a function that takes an expression (such as a or a > 3) and does the same thing inside? I.e. I want to write a function fn that takes a data-frame and an expression as arguments and returns the result of evaluating the expression "within" the data-frame as an environment.
Never mind that this sounds contrived (I could just use with as above); this is just a simplified version of a more complex function I am writing. I tried several variants (using eval, with, envir, substitute, local, etc.) but none of them work. For example if I define fn like so:
fn <- function(dat, expr) {
eval(expr, envir = dat)
}
I get this error:
> fn( df, a )
Error in eval(expr, envir = dat) : object 'a' not found
Clearly I am missing something subtle about environments and evaluation. Is there a way to define such a function?
The lattice package does this sort of thing in a different way. See, e.g., lattice:::xyplot.formula.
fn <- function(dat, expr) {
eval(substitute(expr), dat)
}
fn(df, a) # 1 2 3 4 5
fn(df, 2 * a + b) # 3 6 9 12 15
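Why this works: substitute(expr) returns the unevaluated expression the caller typed, and eval then looks names up in the data frame first. A small sketch of a variant that also passes the caller's environment as the third argument of eval (enclos), so the expression can mix columns with variables defined outside the data frame. The variable threshold is made up for this example.

```r
fn <- function(dat, expr) {
  # substitute() captures the expression unevaluated;
  # parent.frame() is where names not found in dat are looked up
  eval(substitute(expr), dat, parent.frame())
}

df <- data.frame(a = 1:5, b = 1:5)
threshold <- 3

fn(df, a > threshold)   # `a` comes from df, `threshold` from the calling environment
fn(df, 2 * a + b)
```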
That's because you're not passing an expression.
Try:
fn <- function(dat, expr) {
mf <- match.call() # captures the call, so mf$expr holds the unevaluated expression
eval(mf$expr, envir = dat)
}
> df <- data.frame( a = 1:5, b = 1:5 )
> fn( df, a )
[1] 1 2 3 4 5
> fn( df, a+b )
[1] 2 4 6 8 10
A quick glance at the source code of functions using this (e.g. lm) can reveal a lot more interesting things about it.
A late entry, but the data.table approach and syntax would appear to be what you are after.
This is exactly how [.data.table works with the j, i and by arguments.
If you need it in the form fn(x,expr), then you can use the following
library(data.table)
DT <- data.table(a = 1:5, b = 2:6)
`[`(x=DT, j=a)
## [1] 1 2 3 4 5
`[`(x=DT, j=a * b)
## [1] 2 6 12 20 30
I think it is easier to use in more native form
DT[,a]
## [1] 1 2 3 4 5
and so on. In the background this is using substitute and eval.
?within might also be of interest.
df <- data.frame( a = 1:5, b = 1:5 )
within(df, cx <- a > 3)
a b cx
1 1 1 FALSE
2 2 2 FALSE
3 3 3 FALSE
4 4 4 TRUE
5 5 5 TRUE