Functions without arguments - r

I'm not sure about this. Here is an example of a function which does not work:
myfunction<-function(){
mydata=read_excel("Square_data.xlsx", sheet = "Data", skip=0)
mydata$Dates=as.Date(mydata$Dates, format= "%Y-%m-%d")
mydata.ts=ts(mydata, start=2006, frequency=1)
}
The files do not load. When I execute each command line by line in R the files are loaded, so there's no problem with the commands. My question is, can I run a function such as myfunction to load the files? Thanks.

Last statement in function is an assignment: If the last executed statement in a function is an assignment, its value is returned invisibly, so nothing is displayed on the console unless you use print. If the function's result is assigned, you can print the assigned value later. For example, using the built-in BOD data frame:
> f <- function() bod <- BOD
> f() # no result printed on console because f() was not explicitly printed
> print(f()) # explicitly print
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
> X <- f() # assign and then print the assigned value
> X
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
Last statement in function is an expression producing a result: If the last statement produces a value rather than being an assignment, the result is printed on the console. For example:
> g <- function() BOD
> g()
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
Thus make sure that the last statement in your function is not an assignment if you want it to display on the console automatically.
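If you want to check this behaviour without relying on what shows up on the console, here is a minimal sketch using base R's withVisible(), which reports whether a value would be auto-printed. The functions f and g are the same ones defined above.

```r
# f ends in an assignment, g ends in a plain expression (as in the examples above)
f <- function() bod <- BOD
g <- function() BOD

# withVisible() returns the value together with a flag telling whether
# R would auto-print it at the top level
withVisible(f())$visible  # FALSE: the value is returned, but invisibly
withVisible(g())$visible  # TRUE: the value would be auto-printed

# Both functions return the same data frame
identical(f(), g())       # TRUE
```

So the assignment form does not lose the result; it only suppresses the automatic printing.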
Note 1 (sourcing code): Also note that if your code is run via a source() statement, or if the function is called by another function, the result won't print automatically on the console unless you use print.
Note 2 (two results): Regarding some comments to the question, if you want to output two results then output them in a named list. For example, this outputs a list with components named BOD and BOD2:
h <- function() list(BOD = BOD, BOD2 = 2*BOD)
h()
$BOD
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
$BOD2
Time demand
1 2 16.6
2 4 20.6
3 6 38.0
4 8 32.0
5 10 31.2
6 14 39.6
We could refer to them like this:
> H <- h()
> H$BOD
Time demand
1 1 8.3
2 2 10.3
3 3 19.0
4 4 16.0
5 5 15.6
6 7 19.8
> H$BOD2
Time demand
1 2 16.6
2 4 20.6
3 6 38.0
4 8 32.0
5 10 31.2
6 14 39.6
Note 3 (<<- operator): Regarding the comments to the question, in general the <<- operator should be avoided because it undesirably links the internals of your function to the global workspace in an invisible and therefore error-prone way. If you want to return a value, it is normally best to return it as the output of the function. There are some situations where <<- is warranted, but they are relatively uncommon.

Sure. Just give it a value to be returned:
myfunction<-function(){
mydata=read_excel("Square_data.xlsx", sheet = "Data", skip=0)
mydata$Dates=as.Date(mydata$Dates, format= "%Y-%m-%d")
ts(mydata, start=2006, frequency=1) # The last object is returned by an R function
}
So calling dat <- myfunction() will make dat the ts object that was created inside the function.
P.S.: There is also a return function in R. As a best practice, only use it if you want to return an object early, e.g. in combination with if.
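To illustrate the early-return case, here is a small sketch. The function safe_ratio and its arguments are made up for illustration and are not part of the question.

```r
# safe_ratio is a hypothetical function used only for illustration
safe_ratio <- function(x, y) {
  # Early exit: return() leaves the function immediately
  if (y == 0) {
    return(NA_real_)
  }
  # Normal case: the last evaluated expression is the return value
  x / y
}

safe_ratio(6, 3)  # 2
safe_ratio(1, 0)  # NA
```

The final x / y needs no return() because it is the last expression evaluated.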

Related

Applying a label depending on which condition is met using R

I would like to use a simple R function where the contents of a specified data frame column are read row by row, then depending on the value, a string is applied to that row in a new column.
So far, I've tried to use a combination of loops and generating individual columns which were combined later. However, I cannot seem to get the syntax right.
The input looks like this:
head(data,10)
# A tibble: 10 x 5
Patient T1Score T2Score T3Score T4Score
<dbl> <dbl> <dbl> <dbl> <dbl>
1 3 96.4 75 80.4 82.1
2 5 100 85.7 53.6 55.4
3 6 82.1 85.7 NA NA
4 7 82.1 85.7 60.7 28.6
5 8 100 76.8 64.3 57.7
6 10 46.4 57.1 NA 75
7 11 71.4 NA NA NA
8 12 98.2 92.9 85.7 82.1
9 13 78.6 89.3 37.5 42.9
10 14 89.3 100 64.3 87.5
and the function I have written looks like this:
minMax<-function(x){
#make an empty data frame for the output to go
output<-data.frame()
#making sure the rest of the commands only look at what I want them to look at in the input object
a<-x[2:5]
#here I'm gathering the columns necessary to perform the calculation
minValue<-apply(a,1,min,na.rm=T)
maxValue<-apply(a,1,max,na.rm=T)
tempdf<-as.data.frame((cbind(minValue,maxValue)))
Difference<-tempdf$maxValue-tempdf$minValue
referenceValue<-ave(Difference)
referenceValue<-referenceValue[1]
#quick aside to make the first two thirds of the output file
output<-as.data.frame((cbind(x[1],Difference)))
#Now I need to define the class based on the referenceValue, and here is where I run into trouble.
apply(output, 1, FUN =
for (i in Difference) {
ifelse(i>referenceValue,"HIGH","LOW")
}
)
output
}
I also tried...
if (i>referenceValue) {
apply(output,1,print("HIGH"))
}else(print("LOW")) {}
}
)
output
}
Regardless, both end up giving me the error message,
c("'for (i in Difference) {' is not a function, character or symbol", "' ifelse(i > referenceValue, \"HIGH\", \"LOW\")' is not a function, character or symbol", "'}' is not a function, character or symbol")
The expected output should look like:
Patient Difference Toxicity
3 21.430000 LOW
5 46.430000 HIGH
6 3.570000 LOW
7 57.140000 HIGH
8 42.310000 HIGH
10 28.570000 HIGH
11 0.000000 LOW
12 16.070000 LOW
13 51.790000 HIGH
14 35.710000 HIGH
Is there a better way for me to organize the last loop?
Since you seem to be using tibbles anyway, here's a much shorter version using dplyr and tidyr:
> d %>%
gather(key = tscore,value = score,T1Score:T4Score) %>%
group_by(Patient) %>%
summarise(Difference = max(score,na.rm = TRUE) - min(score,na.rm = TRUE)) %>%
ungroup() %>%
mutate(AvgDifference = mean(Difference),
Toxicity = if_else(Difference > mean(Difference),"HIGH","LOW"))
# A tibble: 10 x 4
Patient Difference AvgDifference Toxicity
<int> <dbl> <dbl> <chr>
1 3 21.4 30.3 LOW
2 5 46.4 30.3 HIGH
3 6 3.6 30.3 LOW
4 7 57.1 30.3 HIGH
5 8 42.3 30.3 HIGH
6 10 28.6 30.3 LOW
7 11 0 30.3 LOW
8 12 16.1 30.3 LOW
9 13 51.8 30.3 HIGH
10 14 35.7 30.3 HIGH
I think maybe your expected output might have been based on a slightly different average difference, so this output is very slightly different.
And a much simpler base R version if you prefer:
d$min <- apply(d[,2:5],1,min,na.rm = TRUE)
d$max <- apply(d[,2:5],1,max,na.rm = TRUE)
d$diff <- d$max - d$min
d$avg_diff <- mean(d$diff)
d$toxicity <- with(d,ifelse(diff > avg_diff,"HIGH","LOW"))
A few notes on your existing code:
as.data.frame((cbind(minValue,maxValue))) is not an advisable way to create data frames. This is more awkward than simply doing data.frame(minValue = minValue,maxValue = maxValue) and risks unintended coercion from cbind.
ave is for computing summaries over groups; just use mean if you have a single vector
The FUN argument in apply expects a function, not an arbitrary expression, which is what you're trying to pass at the end. The general syntax for an "anonymous" function in that context would be apply(...,FUN = function(arg) { do some stuff and return exactly the thing you want}).
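As a rough sketch of that anonymous-function syntax, here is the labelling step applied to the first three patients from the question. The data frame name scores is my own choice, and with only three rows the average difference is of course not the same as for the full data.

```r
# First three rows of the question's data, typed in by hand
scores <- data.frame(
  Patient = c(3, 5, 6),
  T1Score = c(96.4, 100, 82.1),
  T2Score = c(75, 85.7, 85.7),
  T3Score = c(80.4, 53.6, NA),
  T4Score = c(82.1, 55.4, NA)
)

# FUN is an anonymous function: it receives one row at a time
# and returns the range (max - min) of that row, ignoring NAs
Difference <- apply(scores[2:5], 1, function(row) {
  max(row, na.rm = TRUE) - min(row, na.rm = TRUE)
})

# ifelse() is vectorised, so no loop is needed for the labels
Toxicity <- ifelse(Difference > mean(Difference), "HIGH", "LOW")

output <- data.frame(Patient = scores$Patient, Difference, Toxicity)
output
```

The for loop from the original attempt is not needed because ifelse already works element by element.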

when passing df$var to a function, is it possible to get the name of 'var'?

I'm writing a function where I'd like to be able to pass in variables from a data frame as atomic vectors, like df$var (e.g., mtcars$mpg).
To keep the example very simple, say the function just returns data.frame(table(df$var)):
foo.function <- function(var) {
data.frame(table(var))
}
head(foo.function(mtcars$mpg))
#> var Freq
#> 1 10.4 2
#> 2 13.3 1
#> 3 14.3 1
#> 4 14.7 1
#> 5 15 1
#> 6 15.2 2
Notice that the name of the tabulated variable in the returned table is the internal name of the passed object (var) rather than its "original" name, which was mpg. Is it possible to retrieve mpg (just the name) from within the function (without changing or adding arguments)? I was inclined to say no, since R is just receiving a vector of values, but I suspect R may have this capacity based on what it can do with NSE.
We can use deparse/substitute to extract the column name
foo.function <- function(var) {
print(sub(".*\\$", "", deparse(substitute(var))))
data.frame(table(var))
}
head(foo.function(mtcars$mpg), 4)
#[1] "mpg"
# var Freq
#1 10.4 2
#2 13.3 1
#3 14.3 1
#4 14.7 1
If we need to change the column name
foo.function <- function(var) {
nm1 <- sub(".*\\$", "", deparse(substitute(var)))
out <- data.frame(table(var))
names(out)[1] <- nm1
out
}
head(foo.function(mtcars$mpg), 4)
# mpg Freq
#1 10.4 2
#2 13.3 1
#3 14.3 1
#4 14.7 1
As @RonakShah noted in the comments, it is better to pass the column name and the data as separate arguments. If the function is limited to a single argument and that argument always has to use $, then the above function will be able to retrieve the column name.
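For comparison, a minimal sketch of the separate-arguments variant could look like this. The function name tabulate_column and its argument names are assumptions of mine, not something from the comments.

```r
# tabulate_column is a hypothetical name; data is a data frame and
# column is the column name given as a string
tabulate_column <- function(data, column) {
  # [[ ]] extracts the column as a plain vector
  out <- data.frame(table(data[[column]]))
  # Use the supplied column name instead of the default "Var1"
  names(out)[1] <- column
  out
}

head(tabulate_column(mtcars, "mpg"), 4)
```

This avoids parsing the call with deparse/substitute, so it also works when the column name is stored in a variable.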

Pull subset of rows of dataframe based on conditions from other columns

I have a dataframe like the one below:
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
Type=c("put","call","put","call","call","put","call","put","call","put","call"),
Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
Other=sample(20,11))
Tickers Type Strike Other
1: A put 35.0 6
2: A call 37.5 5
3: A put 37.5 13
4: B call 10.0 15
5: B call 11.0 12
6: B put 11.0 4
7: B call 12.0 20
8: D put 40.0 7
9: D call 40.0 11
10: D put 42.0 10
11: D call 42.0 1
I am trying to analyze a subset of the data. The subset I would like to take is data where the ticker and strike are the same. But I also only want to grab this data if both a put and a call exists under type. With the data above for example, I would like to return the following result:
x[c(2,3,5,6,8:11),]
Tickers Type Strike Other
1: A call 37.5 5
2: A put 37.5 13
3: B call 11.0 12
4: B put 11.0 4
5: D put 40.0 7
6: D call 40.0 11
7: D put 42.0 10
8: D call 42.0 1
I'm not sure what the best way to go about doing this is. My thought process is that I should create another column vector like
x$id <- paste(x$Tickers,x$Strike,sep="_")
Then use this vector to only pull values where there are multiple ids.
x[x$id %in% x$id[duplicated(x$id)],]
Tickers Type Strike Other id
1: A call 37.5 5 A_37.5
2: A put 37.5 13 A_37.5
3: B call 11.0 12 B_11
4: B put 11.0 4 B_11
5: D put 40.0 7 D_40
6: D call 40.0 11 D_40
7: D put 42.0 10 D_42
8: D call 42.0 1 D_42
I'm not sure how efficient this is, as my actual data consists of a lot more rows.
Also, this solution does not check for the type condition of there being one put and one call.
Also, the wording of the title could be a lot better; I apologize.
EDIT::: having checked out this post Finding ALL duplicate rows, including "elements with smaller subscripts"
I could also use this solution:
x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id,fromLast=T),]
You could try something like:
x[, select := (.N >= 2 & all(c("put", "call") %in% unique(Type))), by = .(Tickers, Strike)][which(select)]
# Tickers Type Strike Other select
#1: A call 37.5 17 TRUE
#2: A put 37.5 16 TRUE
#3: B call 11.0 11 TRUE
#4: B put 11.0 20 TRUE
#5: D put 40.0 1 TRUE
#6: D call 40.0 12 TRUE
#7: D put 42.0 6 TRUE
#8: D call 42.0 2 TRUE
Another idea might be a merge:
x[x, on = .(Tickers, Strike), select := (length(Type) >= 2 & all(c("put", "call") %in% Type)),by = .EACHI][which(select)]
I'm not entirely sure how to get around the group-by operations since you want to make sure for each group they have both "call" and "put". I was thinking about using keys, but haven't been able to incorporate the "call"/"put" aspect.
An edit to your data to give a case where put and call do not both exist (I changed the very last "call" to "put"):
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
Type=c("put","call","put","call","call","put","call","put","call","put","put"),
Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
Other=sample(20,11))
Since you are using data.table, you can use the built in counter .N along with by variables to count groups and subset with that. If by counting Type you can reliably determine there is both put and call, this could work:
x[, `:=`(n = .N, types = uniqueN(Type)), by = c('Tickers', 'Strike')][n > 1 & types == 2]
The part enclosed in the first set of [] does the counting, and then the [n > 1 & types == 2] does the subsetting.
I am not a user of package data.table so this code is base R only.
agg <- aggregate(Type ~ Tickers + Strike, data = x, length)
result <- merge(x, subset(agg, Type > 1)[1:2], by = c("Tickers", "Strike"))[, c(1, 3, 2, 4)]
result
# Tickers Type Strike Other
#1: A call 37.5 17
#2: A put 37.5 7
#3: B call 11.0 14
#4: B put 11.0 20
#5: D put 40.0 15
#6: D call 40.0 2
#7: D put 42.0 8
#8: D call 42.0 1
rm(agg) # final clean up
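Note that counting rows per group does not by itself check that both a put and a call are present. A possible base R sketch that does check this uses ave() to count puts and calls within each Tickers/Strike group. It uses a plain data frame with the edited Type column from the earlier answer (last "call" changed to "put"), and Other = 1:11 is just a placeholder instead of the random sample.

```r
# Plain data frame version of the example; Other is a fixed placeholder
x <- data.frame(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","put"),
  Strike  = c(35, 37.5, 37.5, 10, 11, 11, 12, 40, 40, 42, 42),
  Other   = 1:11
)

# For every row, count the puts and calls in its Tickers/Strike group
n_put  <- ave(x$Type == "put",  x$Tickers, x$Strike, FUN = sum)
n_call <- ave(x$Type == "call", x$Tickers, x$Strike, FUN = sum)

# Keep only rows whose group contains at least one of each type
result <- x[n_put > 0 & n_call > 0, ]
result
```

With this data the D/42 group is dropped because it contains two puts and no call, which a plain duplicate count would have kept.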

R Programming Calculate Rows Average

How can I use R to calculate row means?
Sample data:
f<- data.frame(
name=c("apple","orange","banana"),
day1sales=c(2,5,4),
day1sales=c(2,8,6),
day1sales=c(2,15,24),
day1sales=c(22,51,13),
day1sales=c(5,8,7)
)
Expected Results :
The table will gain more columns over time. For example, the expected result currently averages only up to day1sales.4, but after more data is added it will extend to day1sales.6 and so on. So how can I calculate the average for each row across all of these columns?
With rowMeans:
> rowMeans(f[-1])
## [1] 6.6 17.4 10.8
You can also add a column of means to the data set:
> f$AvgSales <- rowMeans(f[-1])
> f
## name day1sales day1sales.1 day1sales.2 day1sales.3 day1sales.4 AvgSales
## 1 apple 2 2 2 22 5 6.6
## 2 orange 5 8 15 51 8 17.4
## 3 banana 4 6 24 13 7 10.8
rowMeans is the simplest way. Also the function apply will apply a function along the rows or columns of a data frame. In this case you want to apply the mean function to the rows:
f$AverageSales <- apply(f[, 2:length(f)], 1, mean)
This will add an AverageSales column to the data frame f with the value that you want. (I changed 6 to length(f) since you say you may add more columns.)
> f
## name day1sales day1sales.1 day1sales.2 day1sales.3 day1sales.4 AverageSales
##1 apple 2 2 2 22 5 6.6
##2 orange 5 8 15 51 8 17.4
##3 banana 4 6 24 13 7 10.8
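One thing to watch with positional selections such as f[-1] or 2:length(f): if the average column is already in f when you re-run the code, it gets averaged in as well. A sketch that avoids this selects the sales columns by name pattern instead. The pattern "^day1sales" is an assumption based on the sample data.

```r
# Sample data from the question; data.frame() de-duplicates the
# repeated names into day1sales, day1sales.1, ..., day1sales.4
f <- data.frame(
  name = c("apple", "orange", "banana"),
  day1sales = c(2, 5, 4),
  day1sales = c(2, 8, 6),
  day1sales = c(2, 15, 24),
  day1sales = c(22, 51, 13),
  day1sales = c(5, 8, 7)
)

# Pick every column whose name starts with "day1sales", however many exist
sales_cols <- grep("^day1sales", names(f))

# na.rm = TRUE skips missing values, in case later columns have gaps
f$AverageSales <- rowMeans(f[sales_cols], na.rm = TRUE)
f$AverageSales  # 6.6 17.4 10.8
```

Re-running the last two statements after adding more day1sales columns picks them up without touching AverageSales itself.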

Creating a series of vectors from a vector

I have a simple two vector dataframe (length=30) that looks something like this:
> mDF
Param1 w.IL.L
1 AuZgFw 0.5
2 AuZfFw 2
3 AuZgVw 74.3
4 AuZfVw 20.52
5 AuTgIL 80.9
6 AuTfIL 193.3
7 AuCgFL 0.2
8 ...
I'd like to use each of the rows to form 30 single value numeric vectors with the name of the vector taken from mDF$Param1, so that:
> AuZgFw
[1] 0.5
etc
I've tried melting and casting, but I suspect there may be an easier way?
The simplest/shortest way is to apply assign over rows:
mDF <- read.table(textConnection("
Param1 w.IL.L
1 AuZgFw 0.5
2 AuZfFw 2
3 AuZgVw 74.3
4 AuZfVw 20.52
5 AuTgIL 80.9
6 AuTfIL 193.3
7 AuCgFL 0.2
"),header=T,stringsAsFactors=F)
invisible(apply(mDF,1,function(x)assign(x[[1]],as.numeric(x[[2]]),envir = .GlobalEnv)))
This involves converting the second column of the data frame to and from a string. invisible is there only to suppress the output of apply.
EDIT: You can also use mapply to avoid coercion to/from strings:
invisible(mapply(function(x,y)assign(x,y,envir=.GlobalEnv),mDF$Param1,mDF$w.IL.L))
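If creating 30 separate objects in the global environment is not strictly required, an alternative sketch is to keep the values in a named list, or to copy them into a dedicated environment with list2env(). The names params and param_env are my own, and only the first three rows of the example are used here.

```r
# First three rows of the example data
mDF <- data.frame(
  Param1 = c("AuZgFw", "AuZfFw", "AuZgVw"),
  w.IL.L = c(0.5, 2, 74.3),
  stringsAsFactors = FALSE
)

# Named list: one single-value numeric element per row
params <- as.list(setNames(mDF$w.IL.L, mDF$Param1))
params$AuZgFw  # 0.5

# list2env() copies the list elements into an environment as variables;
# a fresh environment keeps them out of the global workspace
param_env <- list2env(params, envir = new.env())
get("AuZgVw", envir = param_env)  # 74.3
```

Passing envir = .GlobalEnv to list2env() instead would give the same result as the assign-based versions above.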

Resources