I am trying to replace all values of r for which r<=10 with the value of the first observation in x (which is 1). This is just a very simplified example of what I am trying to do, so please do not question why I'm doing it in a complicated way; the full code is more complicated. The only thing I need help with is figuring out how to use the vector I created (p1) to replace r[p1], or equivalently r[c(1,2,3,4)], with x[1] (which is equal to 1). I cannot write p1 explicitly because it will be generated in a loop (not shown in code).
x=c(1,2,3)
r=c(1,3,7,10,15)
assign(paste0("p", x[1]), which(r<=10))
p1
r[paste0("p", x[1])]=x[1]
In the code above, I tried using r[paste0("p", x[1])]=x[1], but this does not replace anything: r simply gains a sixth element named "p1" with the value 1.
Instead, I would like to see this output: [1]  1  1  1  1 15
Basically, I need to figure out a way to call p1 in this code r[??]=x[1] without explicitly typing p1.
I have included the full code I am attempting below in case context is needed.
##Creates a function to generate discrete random values from a specified pmf
##n is the number of random values you wish to generate
##x is a vector of discrete values (e.g. c(1,2,3))
##pmf is the associated pmf for the discrete values (e.g. c(.3,.2,.5))
r.dscrt.pmf=function(n,x,pmf){
  set.seed(1)
  ##Generate n uniform random values from 0 to 1
  r=runif(n)
  high=0
  low=0
  for (i in 1:length(x)){
    ##High will establish the appropriate upper bound to consider
    high=high+pmf[i]
    if (i==1){
      ##Creates the variable p1 which contains the positions of all
      ##r less than or equal to the first value of pmf
      assign(paste0("p", x[i]), which(r<=pmf[i]))
    } else {
      ##Creates the variables p2, p3, p4, etc., which contain the positions of all
      ##r in the appropriate interval between low and high
      assign(paste0("p", x[i]), which(r>low & r<=high))
    }
    ##Low will establish the appropriate lower bound to consider
    low=low+pmf[i]
  }
  for (i in 1:length(x)){
    ##Loops to replace the values of r at the positions specified by
    ##p1, p2, p3, etc. with x[1], x[2], x[3], etc., respectively.
    r[paste0("p", x[i])]=x[i]
  }
  ##Returns the new r
  r
}
##Example call of the function
r.dscrt.pmf(10,c(0,1,3),c(.3,.2,.5))
get is like assign, in that it lets you refer to variables by string instead of by name:
r[get(paste0("p", x[1]))]=x[1]
But needing get is usually a red flag: the same thing can almost always be written in a clearer and safer way.
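One clearer alternative, sketched for the full function from the question: keep the position vectors in a named list (positions, a name chosen here for illustration) rather than in separate p0, p1, ... variables, so neither assign nor get is needed. The seed argument is also an assumption added for convenience; the original hard-codes set.seed(1).

```r
# List-based sketch of r.dscrt.pmf: no assign()/get() needed.
# 'positions' and the 'seed' argument are illustrative additions.
r.dscrt.pmf <- function(n, x, pmf, seed = 1) {
  set.seed(seed)
  r <- runif(n)
  high <- cumsum(pmf)           # upper bound of each interval
  low <- c(0, head(high, -1))   # lower bound of each interval
  # One element per value of x, named "p0", "p1", ... like the original variables
  positions <- vector("list", length(x))
  names(positions) <- paste0("p", x)
  for (i in seq_along(x)) {
    positions[[i]] <- which(r > low[i] & r <= high[i])
  }
  # Replace only after all positions are known, as in the original two-loop version
  for (i in seq_along(x)) {
    r[positions[[i]]] <- x[i]
  }
  r
}

r.dscrt.pmf(10, c(0, 1, 3), c(.3, .2, .5))
```

The positions can still be looked up by their old names, e.g. positions[["p1"]] inside the function, which is the list equivalent of get("p1").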
Would this suit your needs?
ifelse(r<=10, x[1], r)
[1] 1 1 1 1 15
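The same vectorised idea extends to the whole pmf. A sketch using base R's findInterval(), which returns, for each value of r, the index of the interval it falls in; with left.open = TRUE the intervals for the example pmf are (0, 0.3], (0.3, 0.5] and (0.5, 1], matching the r > low & r <= high tests in the question. The name r.dscrt.pmf.vec is hypothetical.

```r
# Vectorised sketch: map every uniform draw to its interval index in one call.
r.dscrt.pmf.vec <- function(n, x, pmf) {
  set.seed(1)                      # same seed as the question's function
  r <- runif(n)
  breaks <- c(0, cumsum(pmf))      # interval i is (breaks[i], breaks[i + 1]]
  idx <- findInterval(r, breaks, left.open = TRUE)
  x[idx]
}

r.dscrt.pmf.vec(10, c(0, 1, 3), c(.3, .2, .5))
```

If reproducing the question's exact random stream is not required, base R's sample(x, n, replace = TRUE, prob = pmf) draws from a discrete pmf directly.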
I am using the example of calculating the length of an arc of a circle and the area under that arc, based on the radius of the circle (r) and the angle of the arc (theta). The area and the length both depend on r and theta, and in Python you can calculate them simultaneously.
In Python, I can assign two values at the same time by doing this:
from math import pi
def circle_set(r, theta):
    return theta * r, .5*theta*r*r
arc_len, arc_area = circle_set(1, .5*pi)
Implementing the same structure in R gives me this.
circle_set <- function(r, theta){
return(theta * r, .5 * theta * r *r)
}
arc_len, arc_area <- circle_set(1, .5*3.14)
But it returns this error:
arc_len, arc_area <- circle_set(1, .5*3.14)
Error: unexpected ',' in "arc_len,"
Is there a way to use the same structure in R?
No, you can't do that in R (at least, not in base or any packages I'm aware of).
The closest you could come would be to assign objects to different elements of a list. If you really wanted, you could then use list2env to put the list elements in an environment (e.g., the global environment), or use attach to make the list elements accessible, but I don't think you gain much from these approaches.
If you want a function to return more than one value, just put them in a list. See also the question "r - Function returning more than one value".
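A minimal sketch of the list approach for the circle example (using R's built-in pi rather than 3.14): the function returns a named list and the caller pulls out each element. The list2env step is the optional extra mentioned above, not something you need.

```r
# Return both results in one named list
circle_set <- function(r, theta) {
  list(arc_len = theta * r, arc_area = 0.5 * theta * r * r)
}

res <- circle_set(1, 0.5 * pi)
arc_len <- res$arc_len    # pi / 2
arc_area <- res$arc_area  # pi / 4

# Optional: copy every list element into the global environment as a variable
invisible(list2env(res, envir = globalenv()))
```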
You can assign multiple variables the same value as below. Even here, I think the code is unusual and less clear, and that this outweighs any benefit of brevity. (Though I suppose it makes it crystal clear that all of the variables have the same value... perhaps in the right context it makes sense.)
x <- y <- z <- 1
# the above is equivalent to
x <- 1
y <- 1
z <- 1
As Gregor said, there's no way to do it exactly as you want, and his method is a good one, but you could also have a vector represent your two values, like so:
# Function that adds one to every element of a vector and returns the result.
plusOne <- function(vec) {
  vec <- vec + 1
  return(vec)
}
# Creating variables and applying the function.
x <- 1
y <- 2
z <- 3
vec <- c(x, y, z)
vec <- plusOne(vec)
So you could make a vector and have your function return a vector, which effectively fills 3 values at once. Again, this is not exactly what you want, just a suggestion.
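Applied to the circle example, a sketch of the vector approach: the function returns a named numeric vector and the values are read back by name. This only works when all returned values share one type; for mixed types a list is needed.

```r
# Return both results as a named numeric vector
circle_set <- function(r, theta) {
  c(arc_len = theta * r, arc_area = 0.5 * theta * r * r)
}

vals <- circle_set(1, 0.5 * pi)
arc_len <- vals[["arc_len"]]    # [[ drops the name, leaving a plain number
arc_area <- vals[["arc_area"]]
```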
I normally use Maple but am currently working with R, and I have a problem with indexing variables correctly.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element of v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula, so I should end up with 10 different vectors, each of a different length. My incorrect code looks like this:
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
library(MASS)
sims<-10
# Add one so that no M[k] is zero (R is not zero-indexed, and 1:0 would give c(1, 0))
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create lists to hold vectors of different lengths
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]]<-rep(NA,M[k])
  X[[k]]<-rep(NA,M[k])
  for(i in 1:M[k]){
    x[[k]][i]<-runif(1,min=0,max=1)
    if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
      X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This works and, I think, does what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved in R.
In R, most of the time there is no need to use a for loop to assign several values to a vector. For example, to fill a vector of length 100 with uniformly distributed random variables, you might write something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
  x1[i] <- runif(1, 0, 1)
}
(set.seed() sets the random seed so that you get the same result each time.) It is much simpler (and also much faster) to do this instead, resetting the seed first so that the two results can be compared:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a valid variable name in R. [ is used for indexing, so x[k] refers to element k of a vector x. You are trying to assign a vector of length greater than 1 to a single element, which R cannot do: if x is an atomic vector, R keeps only the first value and issues a warning, and if x does not exist yet, you get an "object not found" error. What you probably want to use is a list, as you will see in the example below.
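For the Maple comparison in the question, a short sketch: a list plays the role of v, [[ picks out one stored vector, and a second [ ] picks an element inside it, so v[[1]][n] corresponds to Maple's v[1][n]. The vectors below are made up for illustration.

```r
v <- list()
v[[1]] <- c(10, 20, 30)   # vectors of different lengths can share one list
v[[2]] <- c(5, 6)
v[[1]][2]                 # second element of the first vector
# [1] 20
lengths(v)                # length of each stored vector
# [1] 3 2
```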
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0,max=1)
  # as in the question: x <= 0.1056379 uses the first lognormal, otherwise the second
  ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: it creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores it in x. Then it returns a vector of log-normally distributed random values, where the parameters of the distribution depend on whether the corresponding value of x is at most 0.1056379.
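A quick way to check the result, sketched with a small hand-picked M (the values 3, 0 and 5 are made up, not drawn from rnegbin). calculate_X is written so that x <= 0.1056379 selects the first lognormal, as in the question's loop.

```r
calculate_X <- function(m) {
  x <- runif(m, min = 0, max = 1)
  # draws with x <= 0.1056379 come from the first lognormal, the rest from the second
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

set.seed(1)
M <- c(3, 0, 5)
X <- lapply(M, calculate_X)
lengths(X)   # one vector per element of M, each of length M[k]
X[[3]][2]    # second value in the third vector
```

A zero in M simply yields an empty vector, so no special case is needed.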
I have the following code below and what I would like to do is populate a dataframe. Each row should be returned from the custom function rX (it returns a vector with 3 numbers).
I've come up with two ways to achieve this but they both feel a bit like work arounds and I was wondering if anyone had a better way to suggest.
Method 1 loops through the iterations, storing each result in a temporary variable and then putting it in the correct place in the data frame.
The second method rbinds the data in, but I'm left with a dummy first row that has to be stripped out afterwards.
n=500
ff<-c(0.2,0.3,0.5,0.25)
rX<-function(ff){
#generate data frame to hold set selections
rands<-runif(3)
s<-rep(0,3)
for(x in 1:3){
#generate probabalities from FF
probs<-cumsum(ff/sum(ff))
#select first fracture set
s[x]<-min(which(probs>=rands[x]))
#get rid of set and recalc
s[x]
ff[s[x]]<-0
}
rx<-s
}
solutions
#way 1
df_sets<-data.frame(s1=rep(0,n),s2=rep(0,n),s3=rep(0,n))
for (i in 1:n){
a<-rX(ff)
df_sets$s1[i]<-a[1]
df_sets$s2[i]<-a[2]
df_sets$s3[i]<-a[3]
}
head(df_sets)
#way 2
df_sets<-data.frame(s1=0,s2=0,s3=0)
for (i in 1:n){
a<-rX(ff)
df_sets<-rbind(df_sets,a)
}
df_sets<-df_sets[-1,]
head(df_sets)
edit:
The point of this function is to create a number of realizations, each of which selects (without replacement) from a predetermined vector with discrete probabilities. The function rX uses a static input, as shown in the function above. It selects one of the data points by comparing a random number between 0 and 1 to the cumulative percent passing at each point. Then it removes this point, recalculates the probability function and compares again.
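Neither workaround is needed if all rows are generated first and combined once. A sketch using base R's replicate(), which calls rX(ff) n times and collects the length-3 results as the columns of a 3 x n matrix; t() turns that into n rows. The copy of rX below ends with s instead of rx<-s so the return value is explicit; otherwise it follows the question's logic. The seed is an illustrative addition.

```r
# rX from the question, with an explicit return value
rX <- function(ff) {
  rands <- runif(3)
  s <- rep(0, 3)
  for (x in 1:3) {
    # cumulative selection probabilities of the points still available
    probs <- cumsum(ff / sum(ff))
    s[x] <- min(which(probs >= rands[x]))
    ff[s[x]] <- 0   # remove the chosen point before the next draw
  }
  s
}

n <- 500
ff <- c(0.2, 0.3, 0.5, 0.25)
set.seed(1)
# 3 x n matrix from replicate(), transposed to n x 3
df_sets <- as.data.frame(t(replicate(n, rX(ff))))
names(df_sets) <- c("s1", "s2", "s3")
head(df_sets)
```

For this particular selection rule, sample(seq_along(ff), 3, prob = ff) should give the same distribution (not the same random stream), since sample() without replacement applies the probabilities sequentially to the remaining items.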
I am trying to set up a Gibbs sampler in R where I update my value at each step.
I have a function in R that I want to maximise over 2 values: my previous value and a new one.
So I know the output of the function for both values, and which one is larger. But how do I select the input that gives the maximum without doing it manually? (I need to do a lot of iterations.) Here is an idea of the code and the variables:
g0<-function(k){sample(0:1,k,replace=T)}
This generates a k-dimensional vector with entries 0 or 1, chosen uniformly at random; it is the initial starting point for my chain. If the i-th entry is 1, the i-th variable is included in the design matrix.
X1 is the design matrix.
Xg<-function(g){
Xg<-cbind(X1[,1]*g[1],X1[,2]*g[2],X1[,3]*g[3],X1[,4]*g[4],X1[,5]*g[5],X1[,6]*g[6],X1[,7]*g[7])
return(Xg[,which(!apply(Xg,2,FUN = function(x){all(x == 0)}))])
}
Xg0<-Xg(g0)
This is the reduced design matrix for g0.
c<-1:100000
mp<-function(g){
mp<-sum((1/(c*(c+1)^-((q+1)/2)))*
(t(Y)%*%Y-(c/(c+1))*t(Y)%*%Xg(g)%*%solve(t(Xg(g))%*%Xg(g))%*%t(Xg(g))%*%Y)^(-27/2))
return(mp)
}
This is my function.
Therefore if I have mp(g) and mp(g*), for 2 inputs g and g*, such that the max is mp(g*) how can I return g*?
Thanks for any help, and if you have any questions just ask. Sorry about the messy code as well; I have not used this site before.
Like this:
inputs <- list(g, g2)
outputs <- sapply(inputs, mp)
best.input <- inputs[[which.max(outputs)]]
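A runnable sketch of that pattern. Because mp() needs the question's data (Y, X1, q), toy_mp below is a hypothetical stand-in that just sums the entries; the loop flips one randomly chosen entry per step and keeps whichever candidate scores higher, following the keep-the-better-input rule described in the question. Note that [[ ]] returns the winning vector itself, whereas [ ] would return a list of length 1 containing it.

```r
# Hypothetical stand-in for mp(): any function returning a single number works
toy_mp <- function(g) sum(g)

g_prev <- c(1, 0, 1, 0)
g_new  <- c(1, 1, 1, 0)
inputs <- list(g_prev, g_new)
outputs <- sapply(inputs, toy_mp)
best.input <- inputs[[which.max(outputs)]]  # the vector with the larger score
best.input
# [1] 1 1 1 0

# Repeating the comparison over many iterations
set.seed(1)
g <- sample(0:1, 7, replace = TRUE)
for (iter in 1:100) {
  g_cand <- g
  j <- sample(length(g), 1)
  g_cand[j] <- 1 - g_cand[j]                 # flip one entry
  candidates <- list(g, g_cand)
  # which.max() returns the first maximum, so ties keep the current g
  g <- candidates[[which.max(sapply(candidates, toy_mp))]]
}
g
```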
I have a very large data set that I have binned, storing each bin (subset) in a list so that I can easily call any given subset. My problem is in calling a specific column within a subset.
For example my data (which has diameters and strengths as the columns), is broken up into 20 bins, by diameter. I manually binned the data, like so:
subset.1 <- subset(mydata, Diameter <= 0.01)
Similar commands were used to make 20 bins. Then I stored the subsets (subset.1 through subset.20) in a list:
diameter.bin<-list(subset.1, ... , subset.20)
I can successfully call each diameter bin using:
diameter.bin[x]
Now, if I only want to see the strength values for a given diameter bin, I can use the original name of the subset (the one stored in the list):
subset.x$Strength
But I cannot get this information using the list call:
diameter.bin[x]$Strength
This command returns NULL
Note that when I call any subset (either by diameter.bin[x], subset.x or even subset.x$Strength) my column headers do show up. When I use:
names(subset.1)
This returns "Diameter" and "Strength"
But when I use:
names(diameter.bin[1])
This returns NULL.
I'm assuming that the column header is part of the problem, but I'm not sure how to fix it, other than take the headers off of the original data file. I would prefer not to do this if at all possible.
The end goal is to look at the distribution of strength values for each diameter bin, so I will be doing things like drawing histograms, calculating parameters etc. I was hoping to do something along these lines to produce the histograms:
n=length(diameter.bin)
for(i in (1:n))
{
hist(diameter.bin[i]$Strength)
}
And do something similar to this to store median values for each bin in a new vector.
Any tips are greatly appreciated, as right now I'm doing it all 1 bin at a time, and I know a loop (or something similar) would really speed up my analysis.
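You need two square brackets. Single brackets on a list return a sub-list (here, a list of length 1 that contains the data frame), and that sub-list has no Strength element, so $Strength gives NULL; double brackets return the data frame itself. This is also why names(diameter.bin[1]) returns NULL: the list itself is unnamed. Here is a reproducible example demonstrating the issue: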
> diam <- data.frame(x=rnorm(5), y=rnorm(5))
>
> diam.l <- list(diam, diam)
> diam.l[1]$x
NULL
> diam.l[[1]]$x
[1] -0.5389441 -0.5155441 -1.2437108 -2.0044323 -0.6914124
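Putting this together for the histograms and medians in the question, a sketch on made-up data (mydata below is randomly generated, not the real data set). It also swaps the 20 hand-written subset() calls for cut() plus split(), which returns a named list of data frames, one per diameter bin; the bin edges are illustrative.

```r
# Made-up stand-in for mydata with Diameter and Strength columns
set.seed(42)
mydata <- data.frame(Diameter = runif(200, 0, 0.2), Strength = rnorm(200, 50, 5))

# 20 bins of width 0.01; split() gives a list of data frames named by bin
diameter.bin <- split(mydata, cut(mydata$Diameter, breaks = seq(0, 0.2, by = 0.01)))

# [[i]] extracts the i-th data frame, so $Strength works
for (i in seq_along(diameter.bin)) {
  s <- diameter.bin[[i]]$Strength
  if (length(s) > 0) hist(s, main = names(diameter.bin)[i])  # skip empty bins
}

# Median strength per bin, as a named vector
bin.medians <- sapply(diameter.bin, function(d) median(d$Strength))
```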