I'm currently trying to grasp the basics of R.
One exercise I'm trying is creating a power sequence in which each number n is raised to the power n - 1.
I'm trying to create a sequence such as:
(1,2,3,4,5) =
(1,2,9,64,625)
Is there a function for this in R?
The answer to this question is
(1:10) ^ (0:9)
I don't fully understand what your desired output is, but vectors are your friends. Almost anything you want to do here can be vectorized.
n <- 5
x <- seq_len(n)   # 1, 2, ..., n
x
y <- 0:(n-1)      # 0, 1, ..., n-1
y
z <- x^y          # element-wise power
z
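As a minimal sketch, the same idea can be wrapped in a small function. The name power_seq is made up for illustration and is not a built-in. Note that the parentheses in (1:10) ^ (0:9) matter because ^ binds more tightly than : in R.

```r
# Hypothetical helper: the k-th element is k^(k - 1), for k = 1..n
power_seq <- function(n) {
  k <- seq_len(n)   # 1, 2, ..., n (an empty vector when n is 0)
  k^(k - 1)         # element-wise power, no loop needed
}

power_seq(5)
# [1]   1   2   9  64 625
```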
I am using the example of calculating the length of an arc of a circle and the area of the corresponding sector, based on the radius of the circle (r) and the angle of the arc (theta). The area and the length both depend on r and theta, and you can calculate them simultaneously in Python.
In Python, I can assign two values at the same time by doing this.
from math import pi
def circle_set(r, theta):
    return theta * r, .5*theta*r*r
arc_len, arc_area = circle_set(1, .5*pi)
Implementing the same structure in R gives me this.
circle_set <- function(r, theta){
return(theta * r, .5 * theta * r *r)
}
arc_len, arc_area <- circle_set(1, .5*3.14)
But it returns this error.
arc_len, arc_area <- circle_set(1, .5*3.14)
Error: unexpected ',' in "arc_len,"
Is there a way to use the same structure in R?
No, you can't do that in R (at least, not in base or any packages I'm aware of).
The closest you could come would be to assign objects to different elements of a list. If you really wanted, you could then use list2env to put the list elements in an environment (e.g., the global environment), or use attach to make the list elements accessible, but I don't think you gain much from these approaches.
If you want a function to return more than one value, just put them in a list. See also r - Function returning more than one value.
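A minimal sketch of the list approach, reusing the circle_set name from the question. The element names arc_len and arc_area are chosen here for illustration, and the list2env step is optional.

```r
# Return both results in one named list
circle_set <- function(r, theta) {
  list(arc_len = theta * r, arc_area = 0.5 * theta * r^2)
}

res <- circle_set(1, 0.5 * pi)
res$arc_len    # arc length
res$arc_area   # sector area

# Optional: copy every list element into the current environment,
# which creates the variables arc_len and arc_area in one step
invisible(list2env(res, envir = environment()))
```

Keeping the values together in res is usually clearer than scattering them into separate variables, which is why the list2env step is only shown as an option.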
You can assign the same value to multiple variables as below. Even here, I think the code is unusual and less clear, and that this outweighs any benefit of brevity. (Though I suppose it makes it crystal clear that all of the variables have the same value... perhaps in the right context it makes sense.)
x <- y <- z <- 1
# the above is equivalent to
x <- 1
y <- 1
z <- 1
As Gregor said, there's no way to do it exactly as you said and his method is a good one, but you could also have a vector represent your two values like so:
# Function that adds one value and returns a vector of all the arguments.
plusOne <- function(vec) {
vec <- vec + 1
return(vec)
}
# Creating variables and applying the function.
x <- 1
y <- 2
z <- 3
vec <- c(x, y, z)
vec <- plusOne(vec)
So you could make a vector and have your function return a vector, which amounts to filling 3 values at once. Again, not exactly what you want, just a suggestion.
I am very new to R and am in need of some help. I am trying to write code for the following:
suppose x[0]=1 and
x[j]=x[j-1]+(2/x[j-1])
for j=1,2,...
Write a program to find the first 10 values, i.e. x[0],x[1],...x[9]
I believe I have to write a for() loop, but I am struggling to get the right combination. Any help you can provide would be greatly appreciated.
Here is where I'm at right now:
x=1
for(j in 1:10){
x=x[j-1]+(2/x[j-1])
print(x)
}
Yes, this is for homework. The x[0] is supposed to be x (subscript) 0. I'm unsure how to write that any other way.
Some pointers:
1) The goal should probably be to create a vector x with 10 elements
2) In R, vector indices start at 1 (instead of 0), so you have that x[1] = 1.
3) In R, a single number is in fact a vector of length 1, so you can initialize this vector by writing x <- 1
4) Since you already have the first element and the loop uses the preceding element to create the next element, the loop should start at j = 2.
5) In R, when you assign an element to a vector outside its length, R will expand the vector to the necessary length. I.e. you can write
x <- 1
x[2] <- 3.14
and end up with the vector x = c(1, 3.14)
So the setup can look like this:
x <- 1
for(j in 2:10){
#do stuff to generate the x vector
}
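Following the pointers above, one possible completion is sketched below. Preallocating with numeric(n) is a choice made here; growing x one element at a time, as in pointer 5, would also work.

```r
n <- 10
x <- numeric(n)   # preallocate; x[1] plays the role of x[0] in the exercise
x[1] <- 1
for (j in 2:n) {
  # each element is built from the one before it
  x[j] <- x[j - 1] + 2 / x[j - 1]
}
x
```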
I'm trying to understand the answer to this question using R and I'm struggling a lot.
The dataset for the R code can be found with this code
library(devtools)
install_github("genomicsclass/GSE5859Subset")
library(GSE5859Subset)
data(GSE5859Subset) ##this loads the three tables you need
Here is the question
Write a function that takes a vector of values e and a binary vector group coding two groups, and returns the p-value from a t-test: t.test( e[group==1], e[group==0])$p.value.
Now define g to code cases (1) and controls (0) like this g <- factor(sampleInfo$group)
Next use the function apply to run a t-test for each row of geneExpression and obtain the p-value. What is smallest p-value among all these t-tests?
The answer provided is
myttest <- function(e,group){
x <- e[group==1]
y <- e[group==0]
return( t.test(x,y)$p.value )
}
g <- factor(sampleInfo$group)
pvals <- apply(geneExpression,1,myttest, group=g)
min( pvals )
Which gives you the answer of 1.406803e-21.
What exactly is the input of the "e" argument of the myttest function when you run this? Is it possible to write this function as a formula like
t.test(DV ~ sampleInfo$group)
The t-test compares the gene expression values of the 24 people (which I believe are in the "geneExpression" matrix) by the group they were in, which you can find in sampleInfo's "group" column. I've run t-tests so many times in R, but for some reason I can't wrap my mind around what's going on in this code.
Your question seems to be about understanding the function apply().
For the technical description, see ?apply.
My quick explanation: the apply() line of code in your question applies the following function to each of the rows of geneExpression
myttest(e=x, group=g)
where x is a placeholder for each row.
To help make sense of it, a for loop version of that apply() line would look something like:
N <- nrow(geneExpression) #so we don't have to type this twice
pvals <- numeric(N) #empty vector to store results
# what 'apply' does (but it does it very quickly and with less typing from us)
for(i in 1:N) {
  # e is row i (one gene across all samples); the full group vector g is reused every time
  pvals[i] <- myttest(geneExpression[i,], group=g)
}
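To see what e is in practice, here is a small sketch on made-up data. The matrix expr and the labels g below are invented stand-ins for geneExpression and sampleInfo$group. It also suggests that the formula interface asked about in the question gives the same p-value for a single row.

```r
set.seed(1)
# Made-up stand-ins for geneExpression and sampleInfo$group
expr <- matrix(rnorm(3 * 8), nrow = 3)   # 3 "genes" x 8 "samples"
g    <- factor(rep(c(0, 1), each = 4))   # one label per column (sample)

myttest <- function(e, group) {
  t.test(e[group == 1], e[group == 0])$p.value
}

# apply() hands each row to myttest as `e`; `group = g` is passed along unchanged
pvals <- apply(expr, 1, myttest, group = g)

# The p-value for row 1, computed by hand and with the formula interface
e1        <- expr[1, ]
p_manual  <- myttest(e1, g)
p_formula <- t.test(e1 ~ g)$p.value
```

The formula version swaps the order of the two groups, which flips the sign of the t statistic but leaves the two-sided p-value unchanged.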
I am normally a Maple user currently working with R, and I have a problem with correctly indexing variables.
Say I want to define 2 vectors, v1 and v2, and I want to call the nth element in v1. In Maple this is easily done:
v[1]:=some vector,
and the nth element is then called by the command
v[1][n].
How can this be done in R? The actual problem is as follows:
I have a sequence M (say of length 10, indexed by k) of simulated negbin variables. For each of these simulated variables I want to construct a vector X of length M[k] with entries given by some formula. So I should end up with 10 different vectors, each of different length. My incorrect code looks like this
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
for(k in 1:sims){
x[k]<-rep(NA,M[k])
X[k]<-rep(NA,M[k])
for(i in 1:M[k]){x[k][i]<-runif(1,min=0,max=1)
if(x[k][i]>=0 & x[i]<=0.1056379){
X[k][i]<-rlnorm(1, 6.228244, 0.3565041)}
else{
X[k][i]<-rlnorm(1, 8.910837, 1.1890874)
}
}
}
The error appears to be that x[k] is not a valid name for a variable. Any way to make this work?
Thanks a lot :)
I've edited your R script slightly to get it working and make it reproducible. To do this I had to assume that eks_2016_kasko was an integer value of 10.
require(MASS)
sims<-10
# R is not zero-indexed, so add one; this also keeps every M[k] >= 1 (1:0 would count backwards)
M<-rnegbin(sims, 10*exp(-2.17173), 840.1746) + 1
# Create a list
x <- list()
X <- list()
for(k in 1:sims){
  x[[k]]<-rep(NA,M[k])
  X[[k]]<-rep(NA,M[k])
  for(i in 1:M[k]){
    x[[k]][i]<-runif(1,min=0,max=1)
    if(x[[k]][i]>=0 & x[[k]][i]<=0.1056379){
      X[[k]][i]<-rlnorm(1, 6.228244, 0.3565041)
    } else {
      X[[k]][i]<-rlnorm(1, 8.910837, 1.1890874)
    }
  }
}
This will work, and I think it is what you were trying to do, BUT it is not great R code. I strongly recommend using the lapply family instead of for loops, and learning to use data.table and parallelisation if you need things to scale. Additionally, if you want to read more about indexing and subsetting in R, Hadley Wickham has a comprehensive breakdown here.
Hope this helps!
Let me start with a few remarks and then show you how your problem can be solved using R.
In R, there is most of the time no need to use a for loop in order to assign several values to a vector. So, for example, to fill a vector of length 100 with uniformly distributed random variables, you do something like:
set.seed(1234)
x1 <- rep(NA, 100)
for (i in 1:100) {
x1[i] <- runif(1, 0, 1)
}
(set.seed() is used to set the random seed, such that you get the same result each time.) It is much simpler (and also much faster) to do this instead, after resetting the seed so that the two results can be compared:
set.seed(1234)
x2 <- runif(100, 0, 1)
identical(x1, x2)
## [1] TRUE
As you see, results are identical.
The reason that x[k]<-rep(NA,M[k]) does not work is that x[k] is indeed not a valid variable name in R. [ is used for indexing, so x[k] refers to element k of a vector x. Since you try to assign a vector of length larger than 1 to a single element, the assignment fails: R throws an error if x does not exist yet, and otherwise keeps only the first value and issues a warning. What you probably want to use is a list, as you will see in the example below.
So here comes the code that I would use instead of what you proposed in your post. Note that I am not sure that I correctly understood what you intend to do, so I will also describe below what the code does. Let me know if this fits your intentions.
# define M
library(MASS)
eks_2016_kasko <- 486689.1
sims<-10
M<-rnegbin(sims, eks_2016_kasko*exp(-2.17173), 840.1746)
# define the function that calculates X for a single value from M
calculate_X <- function(m) {
  x <- runif(m, min=0,max=1)
  # same rule as the loop in the question: x <= 0.1056379 uses the first set of parameters
  X <- ifelse(x <= 0.1056379, rlnorm(m, 6.228244, 0.3565041),
              rlnorm(m, 8.910837, 1.1890874))
  X
}
# apply that function to each element of M
X <- lapply(M, calculate_X)
As you can see, there are no loops in that solution. I'll start to explain at the end:
lapply is used to apply a function (calculate_X) to each element of a list or vector (here it is the vector M). It returns a list. So, you can get, e.g. the third of the vectors with X[[3]] (note that [[ is used to extract elements from a list). And the contents of X[[3]] will be the result of calculate_X(M[3]).
The function calculate_X() does the following: It creates a vector of m uniformly distributed random values (remember that m runs over the elements of M) and stores that in x. Then it creates a vector X that contains log normally distributed random variables. The parameters of the distribution depend on the value x.
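As a small sketch of how the resulting list is indexed, the example below uses a hypothetical M (three made-up counts instead of the rnegbin() draws) so that it runs without MASS. The threshold rule follows the loop in the question, where x <= 0.1056379 selects the first set of parameters.

```r
set.seed(42)
M <- c(3, 1, 5)   # hypothetical counts standing in for the rnegbin() draws

calculate_X <- function(m) {
  x <- runif(m)
  ifelse(x <= 0.1056379,
         rlnorm(m, 6.228244, 0.3565041),
         rlnorm(m, 8.910837, 1.1890874))
}

X <- lapply(M, calculate_X)

lengths(X)    # one length per vector, matching M
X[[3]]        # the whole third vector
X[[3]][2]     # its second element, the R counterpart of Maple's v[3][2]
```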
I am trying to solve a system of non-linear equations in R but it keeps giving me this error "number of items to replace is not a multiple of replacement length".
My code looks like this:
my_data <- Danske
D <- my_data$D
V <- my_data$V
r <- my_data$r
s <- my_data$s
fnewton <- function(x)
{
y <- numeric(2)
d1 <- (log(x[1]/D)+(r+x[2]^2/2))/x[2]
d2 <- d1-x[2]
y[1] <- V - (x[1]*pnorm(d1) - exp(-r)*D*pnorm(d2))
y[2] <- s*V - pnorm(d1)*x[2]*x[1]
y
}
xstart <- c(239241500000, 0.012396)
nleqslv(xstart, fnewton, method="Newton")
D, V, r and s are numeric[1:2508] values and I think that's where the problem comes from. If I have single 1x1 values, it solves fine; however, if I insert vectors with 2508 values, it only calculates the first x1 and x2, and then come the warnings with the message I wrote above.
Thank you for any help.
Lina
You don't really have a "system" of equations the way you've written your fnewton. May I recommend (disclaimer: I'm the author) that you take a look at the ktsolve package? You may find that it'll get you the solutions you're looking for a bit more easily. You can use your fnewton almost as written, except that you will pass a collection of named scalar variables into the function.
If you want to solve (either with nleqslv or ktsolve) for a variety of input 'starting points', then you should wrap your approach inside a loop or *apply function.
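A rough sketch of the loop approach with nleqslv follows. It makes several assumptions: the data frame below is invented (the real Danske data is not available), make_fnewton is a hypothetical helper that fixes the scalars for one row, and the starting values are only a rough guess. Whether each solve converges should be checked rather than assumed.

```r
# Hypothetical data: one row per observation; the values are invented
my_data <- data.frame(D = c(100, 80), V = c(30, 25),
                      r = c(0.05, 0.04), s = c(0.40, 0.35))

# Build the two-equation system for ONE row, so D, V, r and s are scalars
make_fnewton <- function(D, V, r, s) {
  function(x) {
    d1 <- (log(x[1] / D) + (r + x[2]^2 / 2)) / x[2]
    d2 <- d1 - x[2]
    c(V - (x[1] * pnorm(d1) - exp(-r) * D * pnorm(d2)),
      s * V - pnorm(d1) * x[2] * x[1])
  }
}

# Only run the solver if the nleqslv package is installed
if (requireNamespace("nleqslv", quietly = TRUE)) {
  sols <- lapply(seq_len(nrow(my_data)), function(i) {
    row <- my_data[i, ]
    f <- make_fnewton(row$D, row$V, row$r, row$s)
    xstart <- c(row$V + row$D, row$s * row$V / (row$V + row$D))  # rough guess
    nleqslv::nleqslv(xstart, f, method = "Newton")
  })
  # sols[[i]]$x holds the result for row i; sols[[i]]$termcd reports how the solve ended
}
```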
Too long for a comment.
Without having a copy of your data, it's impossible to verify this, but...
You are passing fnewton(...) a vector of length 2, and expecting a vector of length 2 as the return value. But in your function, d1 and d2 are set to vectors of length 2508. Then you attempt to set y[1] and y[2] to vectors of length 2508. R can't do that, so it uses the first value in the RHS and provides the warnings.
I suggest you step through your function and see what each line is doing.
Can't propose a solution because I have no idea what you are trying to accomplish.
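The behaviour described above can be reproduced in base R with a tiny made-up example, independent of the original data. The handler below just records the warning message and lets the assignment continue.

```r
y <- numeric(2)
msg <- NULL

withCallingHandlers(
  {
    y[1] <- c(10, 20, 30)   # three values assigned to a single slot
  },
  warning = function(w) {
    msg <<- conditionMessage(w)      # remember the warning text
    invokeRestart("muffleWarning")   # then carry on with the assignment
  }
)

y     # only the first value (10) was stored in y[1]
msg   # the "not a multiple of replacement length" warning
```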