Need some help with a function in R

The question is: Create a function that takes in a numeric vector. The output should be a vector with running mean values. The i-th element of the output vector should be the mean of the values in the input vector from 1 to i.
My main problem is in the for loop, which is as follows:
x1 <- c(2,4,6,8,10)
for (i in 2: length(x1)){
ma <- sum(x1[i-1] , x1[i]) / i
print(ma)
mresult <- rbind(ma)
}
View(ma)
I know there must be something wrong in it. But I am just not sure what it is.

As you have noticed, there are more efficient ways to achieve this using existing functions and packages. But here is how you would go about fixing your loop:
x1 <- c(2,4,6,8,10)
mresult <- numeric(0) # Start with an empty numeric vector
for (i in seq_along(x1)){ # Start at 1 so the first running mean (x1[1] itself) is included
  ma <- sum(x1[1:i])/i # You were originally dividing the sum of the (i-1)th and ith values by i
  print(ma) # This is optional
  mresult <- c(mresult,ma) # Since you only have a vector, there is no need to rbind
}
View(ma) # The last computed average
View(mresult) # All averages
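For comparison, here is a minimal sketch of the vectorised version wrapped in a function, as the question asks for. The name running_mean is just an illustrative choice; cumsum() and seq_along() are base R.

```r
# Hypothetical helper name; cumsum() and seq_along() are base R.
running_mean <- function(x) {
  stopifnot(is.numeric(x))
  # cumulative sum up to position i, divided by i
  cumsum(x) / seq_along(x)
}

running_mean(c(2, 4, 6, 8, 10))
# [1] 2 3 4 5 6
```

Dividing the cumulative sum by the position avoids growing a vector inside a loop.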

Related

How do you create a function that row reduces a matrix in R?

So far I've tried the following code but it didn't work in R-studio; it just hangs there.
Am I doing something wrong? This is my first real R code project so I'd love suggestions!
new.rref <- function(M,fractions=FALSE)
{
#M is a matrix.
#Require numeric matricies.
if ((!is.matrix(M)) || (!is.numeric(M)))
stop("Sorry pal! Data not a numeric matrix.")
#Specify and differentiate between rows and columns.
r=nrow(M)
c=ncol(M)
#Now establish a continuous loop (*needed help on this one)
#According to the help documents I've read, this has to do with a
#computerized version of the Gaussian Reducing Algorithm
#While 1<r and 1<c, must set first column entries in which
#1:r < 1 equal to zero. This while loop is used to loop the
#algorithm until a specific condition is met -- in this case,
#until elements in the first column to which 1:r < 1
#are set to zero.
while((1<=r) & (1<=c))
new <- M[,1]
new[1:r < y.position] <- 0
# Now here's the fun part :)
#We need to find the maximum leading coefficient that lies
#at or below the current row.
new1 <- which.max(abs(new))
#We will assign these values to the vector "LC"
LC <- col[which]
#Now we need to allow for row exchange!
#Basically tells R that M[c(A,B),] = M[c(B,A),].
if (which > 1) { M[c(1,which),]<-A[c(which,1),] }
#Now we have to allow for the pivot, "sweep", and restoration
#of current row. I totally didn't know how to do this so I
#used and changed some code from different documentations.
#PIVOT (friends reference)
M[1,]<-M[1,]/LC
new2 <-M[1,]
#CLEAN
M <- M - outer(M[,x.position],new2)
#RESTORE
A[1,]<-new2
#Last, but certantly not least, we're going to round the matrix
#off to a certain value. I might have did this wrong.
round(M)
return(M)
print(M)
}
Edit: I added the first line, for some reason it got deleted.
Edit 2: Say you have a matrix M=matrix(c(2,3,4,7), nrow=2, ncol=2, byrow=TRUE); new.rref(M) needs to produce the reduced row echelon form of matrix M. I already did the math; new.rref(M) should be equal to matrix(c(1,0,0,1), nrow=2, ncol=2, byrow=T
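For reference, here is a minimal sketch of row reduction with partial pivoting. It is not a repair of the code as posted: the function name rref_sketch and the tolerance argument tol are assumptions made for illustration, and it only handles plain numeric matrices.

```r
# Hypothetical sketch: reduced row echelon form with partial pivoting.
rref_sketch <- function(M, tol = 1e-10) {
  stopifnot(is.matrix(M), is.numeric(M))
  pivot_row <- 1
  for (j in seq_len(ncol(M))) {
    if (pivot_row > nrow(M)) break
    # largest absolute entry in column j, at or below the current pivot row
    rows <- pivot_row:nrow(M)
    best <- rows[which.max(abs(M[rows, j]))]
    if (abs(M[best, j]) <= tol) next  # no usable pivot in this column
    # row exchange
    if (best != pivot_row) M[c(pivot_row, best), ] <- M[c(best, pivot_row), ]
    # scale the pivot row so the pivot becomes 1
    M[pivot_row, ] <- M[pivot_row, ] / M[pivot_row, j]
    # clear column j in every other row
    for (i in setdiff(seq_len(nrow(M)), pivot_row)) {
      M[i, ] <- M[i, ] - M[i, j] * M[pivot_row, ]
    }
    pivot_row <- pivot_row + 1
  }
  M
}

M <- matrix(c(2, 3, 4, 7), nrow = 2, ncol = 2, byrow = TRUE)
rref_sketch(M)
```

For the 2 x 2 example from the question this reduces M to the identity matrix, up to floating-point error.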

initialise multiple variables at once in R [duplicate]

I am using the example of calculating the length of an arc of a circle and the area under that arc, based on the radius of the circle (r) and the angle of the arc (theta). The area and the length are both based on r and theta, and you can calculate them simultaneously in Python.
In python, I can assign two values at the same time by doing this.
from math import pi
def circle_set(r, theta):
    return theta * r, .5*theta*r*r
arc_len, arc_area = circle_set(1, .5*pi)
Implementing the same structure in R gives me this.
circle_set <- function(r, theta){
return(theta * r, .5 * theta * r *r)
}
arc_len, arc_area <- circle_set(1, .5*3.14)
But returns this error.
arc_len, arc_area <- circle_set(1, .5*3.14)
Error: unexpected ',' in "arc_len,"
Is there a way to use the same structure in R?
No, you can't do that in R (at least, not in base or any packages I'm aware of).
The closest you could come would be to assign objects to different elements of a list. If you really wanted, you could then use list2env to put the list elements in an environment (e.g., the global environment), or use attach to make the list elements accessible, but I don't think you gain much from these approaches.
If you want a function to return more than one value, just put them in a list. See also r - Function returning more than one value.
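A minimal sketch of that list approach, reusing circle_set from the question. The element names arc_len and arc_area are just the names the question used, and the list2env() call at the end is optional.

```r
# Return both results in a named list.
circle_set <- function(r, theta) {
  list(arc_len = theta * r, arc_area = 0.5 * theta * r^2)
}

res <- circle_set(1, 0.5 * pi)
res$arc_len   # arc length
res$arc_area  # area of the sector

# Optionally copy the list elements into variables of the same names
# in the current environment (invisible() suppresses the printed result).
invisible(list2env(res, envir = environment()))
arc_len
```

Accessing res$arc_len directly is usually clearer than copying the elements into separate variables.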
You can assign the same value to multiple variables as below. Even here, I think the code is unusual and less clear, and that outweighs any benefit of brevity. (Though I suppose it makes it crystal clear that all of the variables have the same value... perhaps in the right context it makes sense.)
x <- y <- z <- 1
# the above is equivalent to
x <- 1
y <- 1
z <- 1
As Gregor said, there's no way to do it exactly as you said and his method is a good one, but you could also have a vector represent your two values like so:
# Function that adds one value and returns a vector of all the arguments.
plusOne <- function(vec) {
vec <- vec + 1
return(vec)
}
# Creating variables and applying the function.
x <- 1
y <- 2
z <- 3
vec <- c(x, y, z)
vec <- plusOne(vec)
So you could pack the values into a vector and have your function return a vector, which effectively updates all three values at once. Again, not exactly what you want, just a suggestion.

R programming Function (Returning a subset of Real Mean Squared)

I am new to R and am working on writing some cool functions while I learn statistics in parallel. I'm trying to make a function that will take a numeric vector, perform the "root mean squared" operations, and then return essentially the same vector with the possible outliers removed.
For example, if the vector is c(2,4,9,10,100) the resulting RMS would be about 37.
Therefore, I want the output to return the same vector with the possible outlier (in this case, 100) removed from the dataset. So the result would be 2, 4, 9, 10
I put my code below but the output isn't working. I tried it 2 different ways. Everything up to the line that says RMS final works. But below that it does not.
How can I modify this function so that it does what I want? Also, as a bonus, and this might be asking a lot but based on my coding below, any tips for a newbie on making functions would be something I'd be grateful for as well. Thanks so much!
RMS_x <- c(2,4,9,10,100)
#Root Mean Squared Function - Takes a numeric vector
RMS <- function(RMS_x){
RMS_MEAN <- mean(RMS_x)
RMS_DIFF <- (RMS_x-RMS_MEAN)
RMS_DIFF_SQ <- RMS_DIFF^2
RMS_FINAL <- sqrt(sum(RMS_DIFF_SQ)/length(RMS_x))
for(i in length(RMS_x)){
if(abs(RMS_x[i]) > RMS_FINAL){
output <- RMS_x[i]}
else {NULL} }
return(output)
}
#Root Mean Squared Function - Takes a numeric vector
RMS <- function(RMS_x){
RMS_MEAN <- mean(RMS_x)
RMS_DIFF <- (RMS_x-RMS_MEAN)
RMS_DIFF_SQ <- RMS_DIFF^2
RMS_FINAL <- sqrt(sum(RMS_DIFF_SQ)/length(RMS_x))
#output <- ifelse(abs(RMS_x) > RMS_FINAL,RMS_x, NULL)
return(RMS_FINAL)
}
Try the following in the first lines of the RMS function.
RMS <- function(RMS_x) {
bp <- boxplot(RMS_x, plot = FALSE) # pass the data RMS_x, not the function RMS
RMS_x <- RMS_x[!(RMS_x %in% bp$out)]
...
Now, you have RMS_x sans the outliers.
The boxplot function has a way of determining the outliers. Here, I am using that to remove them.
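As a self-contained sketch of the same idea, the helper below (drop_outliers is a made-up name) uses boxplot.stats(), which is the function boxplot() relies on to flag outliers, and returns the vector without them.

```r
# Hypothetical helper: drop the points that boxplot.stats() flags as outliers.
drop_outliers <- function(v) {
  flagged <- boxplot.stats(v)$out
  v[!(v %in% flagged)]
}

RMS_x <- c(2, 4, 9, 10, 100)
cleaned <- drop_outliers(RMS_x)
cleaned
# [1]  2  4  9 10
```

This returns the cleaned vector itself, which is what the question asks for, rather than a summary statistic.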
Since you are asking more specifically about R and R functions I’ll focus my response on that. There are a couple errors I'll point out then provide a few alternative solutions.
Your first function isn’t producing the output you want for two reasons:
The logic instructs the function to return a single value rather than a vector. If you're trying to fill a vector within your for loop (one without the outlier), make sure to initialize it before the loop: output <- vector() (note that my solution below does not require this). Also, the value it returns is just an element of RMS_x that is greater than the RMS, rather than a detected outlier, just FYI.
There's an error and/or typo in your for loop argument. It's minor, but it means your for loop does not loop at all, which is the opposite of what you intended. The for loop needs a vector to loop through, so the argument should be: for(i in 1:length(RMS_x))
In your code the loop jumps straight to i = 5 because that is the length of your vector (length(RMS_x) = 5). Given that the values in the RMS_x vector were already in ascending order, your code happens to give the "right" answer, but that's just because of how you initially loaded the vector. This may have been a typo in your question, and it's a difference of only 2 code characters, but it totally changes what the function looks for.
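To see the difference between the two loop headers, here is a small sketch that just records which indices each loop visits. The variable names are illustrative, and seq_along(x) is used as the safer equivalent of 1:length(x).

```r
x <- c(2, 4, 9, 10, 100)

# for (i in length(x)) iterates over the single value 5
visited_wrong <- c()
for (i in length(x)) visited_wrong <- c(visited_wrong, i)

# for (i in seq_along(x)) iterates over 1, 2, 3, 4, 5
visited_right <- c()
for (i in seq_along(x)) visited_right <- c(visited_right, i)

visited_wrong
# [1] 5
visited_right
# [1] 1 2 3 4 5
```

seq_along(x) also behaves sensibly for an empty vector, where 1:length(x) would give c(1, 0).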
Solution:
To get what you are trying to accomplish, you need to write two functions: 1.) one that defines what's considered an outlier in your data set and 2.) a second one that strips out the outliers and calculates RMS. From there, either make the functions independent or nest them to pass variables (this also goes with your bonus request, since it shows multiple ways of writing functions).
Function to identify outliers:
outlrs <- function(vec){
  Q1 <- summary(vec)["1st Qu."]
  Q3 <- summary(vec)["3rd Qu."]
  # defining outliers can get complicated depending on your sample data but
  # your data set is super simple so we'll keep it that way
  IQR <- Q3 - Q1
  lower_bound <- Q1 - 1.5*(IQR)
  upper_bound <- Q3 + 1.5*(IQR)
  bounds <- c(lower_bound, upper_bound)
  # the assign() function creates an actual object in your global environment
  # called non_outlier_range that you can access directly. It has to come
  # before return(), because nothing after return() is executed.
  assign("non_outlier_range", bounds, envir = globalenv())
  # return() just means the result is printed to the console or stored in
  # whatever variable you assign the call to
  return(bounds)
}
Now moving on to the second function, a few options here:
First Way: Input bounds argument into RMS_func()
RMS_func <- function(dat, bounds){
dat <- dat[!(dat < min(bounds)) & !(dat > max(bounds))]
dat_MEAN <- mean(dat)
dat_DIFF <- (dat-dat_MEAN)
dat_DIFF_SQ <- dat_DIFF^2
dat_FINAL <- sqrt(sum(dat_DIFF_SQ)/length(dat))
return(dat_FINAL)
}
# Call function from approach 1 - the bounds have to exist first, either via
# the assign() side effect in outlrs() or by capturing its return value:
non_outlier_range <- outlrs(RMS_x)
RMS_func(dat = RMS_x, bounds = non_outlier_range)
Second Way: Call outlrs() inside the second function
RMS_func <- function(dat){
bounds <- outlrs(vec = dat)
dat <- dat[!(dat < min(bounds)) & !(dat > max(bounds))]
dat_MEAN <- mean(dat)
dat_DIFF <- (dat-dat_MEAN)
dat_DIFF_SQ <- dat_DIFF^2
dat_FINAL <- sqrt(sum(dat_DIFF_SQ)/length(dat))
return(dat_FINAL)
}
# Call RMS_func - here the assign() in outlrs() is not needed, because the
# output exists within the function's temporary environment and is passed
# on to RMS_func
RMS_func(dat = RMS_x)
Third Way: Nest outlrs() definition within the RMS_Func - in this case you only need one nested function to accomplish your task
RMS_Func <- function(dat){
outlrs <- function(vec){
Q1 <- summary(vec)["1st Qu."]
Q3 <- summary(vec)["3rd Qu."]
#Q1 <- quantile(vec)["25%"]
#Q3 <- quantile(vec)["75%"]
IQR <- Q3 - Q1
lower_bound <- Q1 - 1.5*(IQR)
upper_bound <- Q3 + 1.5*(IQR)
bounds <- c(lower_bound, upper_bound)
return(bounds)
}
bounds <- outlrs(vec = dat)
dat <- dat[!(dat < min(bounds)) & !(dat > max(bounds))]
dat_MEAN <- mean(dat)
dat_DIFF <- (dat-dat_MEAN)
dat_DIFF_SQ <- dat_DIFF^2
dat_FINAL <- sqrt(sum(dat_DIFF_SQ)/length(dat))
return(dat_FINAL)
}
P.S. Wrote this pretty quickly - will likely re-test and edit later. Hopefully for now this helps.

How to take results of a function and apply it to function again in R?

I am aware this is a very basic question and am sorry to take up everyone's time. I created a function but would like to take those results, and apply it to the function again ( I am trying to model growth).
I don't think I want to use a loop because I need the values to come from the function. I also don't think it's apply because I need to extract the values from the function.
Here is my function
initial<-c(36.49)
second<-NULL
growth <- function(x){
second <- (131.35-(131.35 -x)*exp(-0.087))
}
second<-growth(initial)
third<-growth(second)
fourth<-growth(third)
fifth<-growth(fourth)
sixth<-growth(fifth)
seventh<-growth(sixth)
Here is how I am doing it now, but as you can see I would have to keep doing this over and over again.
You can use a loop. Just store the outputs in a vector:
# initial value
initial <- 36.49
# second <- NULL is not needed
# create a holding vector for the results
values <- numeric(7)
# store the initial value first
values[1] <- initial
# your function, returning the value directly
growth <- function(x){
  131.35-(131.35 -x)*exp(-0.087)
}
# start the loop at 2, because values[1] already holds the initial value
for(i in 2:7){
  # access the previous value using i - 1,
  # then store the result at the next index, which is i
  values[i] <- growth(values[i - 1])
}
This gives the same values as your chained calls.
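If you would rather avoid the explicit loop, base R's Reduce() with accumulate = TRUE can feed each result back into the function. This is only a sketch: the anonymous function and the variable names are illustrative, and the closed-form line is there purely as a cross-check of this particular recurrence.

```r
growth <- function(x) 131.35 - (131.35 - x) * exp(-0.087)

# Reduce() passes the previous result as `prev`; the second argument
# (the step counter) is ignored. init is the starting value.
values <- Reduce(function(prev, step) growth(prev), seq_len(6),
                 accumulate = TRUE, init = 36.49)

# Cross-check: applying growth() n times to x0 gives
# 131.35 - (131.35 - x0) * exp(-0.087 * n)
n <- 0:6
closed_form <- 131.35 - (131.35 - 36.49) * exp(-0.087 * n)
all.equal(values, closed_form)
```

values holds the initial value followed by six growth steps, so values[7] corresponds to seventh in the question.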
Something along these lines may also help:
x <- 1
try <- function(x) x <<- x+1
for(i in 1:5) try(x)

Indexing variables in R

I am normally a maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# Add one so that no M[k] is zero; otherwise 1:M[k] would run over c(1, 0)
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]]<-rep(NA,M[k])
  X[[k]]<-rep(NA,M[k])
  for(i in 1:M[k]){
    x[[k]][i]<-runif(1,min=0,max=1)
    if(x[[k]][i]>=0 && x[[k]][i]<=0.1056379){
      X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work, and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning to use data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, so that you get the same result each time.) It is much simpler (and also much faster) to do this instead:
set.seed(1234) # reset the seed so that both approaches draw the same numbers
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, the results are identical.
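The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a valid variable name in R. [ is used for indexing, so x[k] refers to element k of a vector x. Since you try to assign a vector of length larger than 1 to a single element, R complains: you get an error if x does not exist yet, and otherwise a warning that the number of items to replace is not a multiple of the replacement length, with only the first value being stored. What you probably want to use is a list, as you will see in the example below.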
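As a small illustration of the Maple-style v[1][n] indexing from the question, here is how the same thing looks with an R list. The values are made up for the example.

```r
v <- list()
v[[1]] <- c(10, 20, 30)  # first vector
v[[2]] <- c(1, 2)        # second vector, of a different length

v[[1]][2]      # second element of the first vector
# [1] 20
length(v[[2]])
# [1] 2
```

[[ extracts the vector stored at that position of the list, and the following [ indexes into that vector.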
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0,max=1)
  # x <= 0.1056379 selects the first distribution, as in your original loop
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
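A scaled-down sketch of the same lapply pattern, with a made-up vector sizes standing in for M, shows that each list element gets its own length and that a count of zero simply yields an empty vector (where a loop over 1:0 would misbehave).

```r
set.seed(1)
sizes <- c(2, 0, 3)  # illustrative stand-in for M

# one vector of uniform draws per element of sizes
draws <- lapply(sizes, function(m) runif(m))

lengths(draws)
# [1] 2 0 3
draws[[3]]  # the third vector, of length 3
```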
