I am trying to generate 100 samples of Z, where Z is the sum of 8 independent random variables, each uniformly distributed on the interval [0, 1].
I have the following code so far, but I'm not sure whether it is correct, in particular the loop.
eight <- runif(8, 0, 1) # uses runif() to generate 8 uniform(0, 1) random variables
Z_1 <- sum(eight) # calculates the sum and stores it in the variable Z_1
sample <- NA
for (i in 1:100) { # repeats the loop body once for each of the 100 samples
  eight <- runif(8, 0, 1) # draws 8 independent uniform(0, 1) random variables
  Z_1 <- sum(eight) # stores their sum in the variable Z_1
  sample[i] <- Z_1
}
Thanks
I would vectorize the whole thing. There is no real reason to run 100 iterations when you can generate all 800 observations in one call. Then just use matrix and colSums and you're done.
set.seed(123)
n <- 100
Z <- colSums(matrix(runif(8 * n), 8, n))
Z
# [1] 4.774425 4.671171 4.787691 4.804041 3.513257 2.330163 3.300135 3.568657 5.503481 2.861764 4.533283 3.353658
# [13] 4.230073 4.690739 4.364708 3.094156 4.933616 3.942834 3.712522 2.587036 3.731474 4.388749 4.484030 4.315968
# [25] 4.800758 4.252631 2.716972 5.159044 4.146881 3.244546 4.418618 4.350035 5.344182 3.176801 3.903337 2.261935
# [37] 3.646572 4.286075 3.074900 4.210506 3.172513 4.535665 4.245856 4.184848 4.532286 2.899883 4.473629 4.751224
# [49] 3.498038 3.337437 4.238989 3.349812 3.888696 4.748254 3.029479 4.246619 3.330434 3.879168 3.786216 3.839956
# [61] 3.878997 4.546531 2.863010 3.417072 4.266108 3.141875 4.960758 3.989613 4.373042 4.295742 4.459014 5.561066
# [73] 4.401990 4.121301 3.830575 3.412446 3.812347 5.591238 3.801587 4.454336 4.213343 5.222007 4.300991 2.765003
# [85] 3.293251 5.362586 2.954080 3.036312 3.655677 3.373603 5.575184 4.167740 3.904038 3.884440 2.901452 3.079311
# [97] 4.927770 3.930943 4.169907 2.922618
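As a cross-check on the vectorized approach, here is a minimal sketch that draws the same kind of sample with replicate() and compares it with theory. The name Z_rep is introduced here for illustration only. A sum of 8 independent uniform(0, 1) variables has mean 8 * 1/2 = 4 and variance 8 * 1/12, so the sample mean and variance should land near 4 and 0.67.

```r
set.seed(123)
n <- 100

# Loop-free alternative: replicate() evaluates the expression n times
Z_rep <- replicate(n, sum(runif(8)))

# Sanity checks: every sum must lie in [0, 8]
length(Z_rep)
range(Z_rep)

# Compare with the theoretical mean (4) and variance (8 / 12)
mean(Z_rep)
var(Z_rep)
```

With only 100 samples, the mean and variance will not match the theoretical values exactly.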
How can I repeat line 22 to line 26 ten times in sequence, add up the results of line 26 from the first to the tenth run, and finally get the average of the 10 consecutive sums in my R code below:
# simulate arima(1,1,0)
library(forecast)
set.seed(100)
wn <- rnorm(10, mean = 0, sd = 1)
ts <- wn[1:2]
for (i in 3:10){
ts<-arima.sim(n=10,model=list(ar=-0.7048,order=c(1,1,0)),start.innov=4.1,n.start=1,innov=wn)
}
ts <-ts[-1]
#
# write the function for RMSE
rmse <- function(x) {
m <- auto.arima(x)
acu <- accuracy(m)
acu[1, 2]
}
#
t<-length(ts)# the length of the time series
l<- 2# block size
m <- ceiling(t / l) # number of blocks
blk<-split(ts, rep(1:m, each=l, length.out = t)) # divides the series into blocks
res<-sample(blk, replace=T, 1000) # resamples the blocks
unlist<-unlist(res, use.names = F) # unlist the bootstrap series
tsunlist<-ts(unlist) # turns the bootstrap series into time series data
# use the RMSE function
RMSE <- rmse(tsunlist)
The above R code performs the following algorithm in steps:
1. Simulate ARIMA(1,1,0) time series (line 1 to line 9)
2. Split the time series into blocks of equal size of 2 (line 18 to line 21)
3. Resample each block at random 1000 times (line 22 to line 23)
4. Rearrange the resampled series into time series data (line 24)
5. Obtain the RMSE of the resampled time series (line 26)
I want to repeat steps 3 to 5 100 times, add up the 100 results produced in step 5, and take their average.
Here is one way to perform the resampling 100 times. Replace your code starting on line 22 with this.
finalresult <- vector() #initialize vector to receive result from for loop
for(i in 1:100){
res<-sample(blk, replace=T, 1000) # resamples the blocks
res.unlist<-unlist(res, use.names = F) # unlist the bootstrap series
tsunlist<-ts(res.unlist) # turns the bootstrap series into time series data
# use the RMSE function
RMSE <- rmse(tsunlist)
finalresult[i] <- RMSE # Assign RMSE value to final result vector element i
}
Once you run the loop you will find the results in the finalresult object.
finalresult
[1] 0.4695763 0.4702997 0.4727734 0.4677841 0.4633566 0.4619570 0.4645237 0.4657333 0.4734756 0.6035923 0.4676718 0.4563636
[13] 0.4653432 0.4741868 0.4638185 0.4679926 0.4652872 0.4644442 0.4673774 0.4654423 0.4574012 0.4613827 0.4689168 0.4652262
[25] 0.4621680 0.4714052 0.4544405 0.4781833 0.4640436 0.4708187 0.4562165 0.4631720 0.4638906 0.4654569 0.4691919 0.4644442
[37] 0.4635361 0.4657427 0.4682825 0.4626979 0.4636363 0.4562305 0.4582826 0.4689343 0.4648181 0.4659912 0.4597617 0.4701386
[49] 0.4678786 0.4658028 0.4633929 0.4759015 0.4719038 0.4685480 0.4639831 0.4663984 0.4631158 0.4636240 0.4677248 0.4680744
[61] 0.4633999 0.4734937 0.4545364 0.4671157 0.4656818 0.4638791 0.4676848 0.4650673 0.4710859 0.4680129 0.4641621 0.4632793
[73] 0.4664942 0.4596942 0.4643477 0.4626547 0.4684443 0.4572568 0.4695198 0.4566412 0.4650888 0.4643392 0.4603766 0.4694628
[85] 0.4579581 0.4627443 0.4729837 0.4668802 0.4723182 0.4688657 0.4684541 0.4655471 0.4687285 0.4504862 0.4599657 0.4660643
[97] 0.5093069 0.4620558 0.4655066 0.4682742
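The question also asks for the sum and the average of the repeated results. With the loop above, that would be sum(finalresult) and mean(finalresult). The sketch below shows the whole repeat-and-average pattern in base R only. To stay self-contained, it uses stand-ins that are assumptions and not part of the original code: series replaces the simulated ARIMA series, and stat_fun (here simply sd) replaces the rmse() function.

```r
set.seed(100)

# Stand-ins for illustration only (assumptions, not the original objects):
# `series` replaces the simulated ARIMA series, `stat_fun` replaces rmse()
series <- cumsum(rnorm(10))
stat_fun <- function(x) sd(x)

l <- 2                                   # block size
m <- ceiling(length(series) / l)         # number of blocks
blk <- split(series, rep(1:m, each = l, length.out = length(series)))

# Repeat resample -> unlist -> statistic 100 times
finalresult <- replicate(100, {
  res <- sample(blk, size = 1000, replace = TRUE)
  stat_fun(unlist(res, use.names = FALSE))
})

sum(finalresult)    # total of the 100 results
mean(finalresult)   # their average
```

Swapping stat_fun for the rmse() function from the question should give the averaged RMSE, at the cost of fitting auto.arima() 100 times.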
Suppose I have a data frame with 101 variables. I select one, called Y, as the dependent variable, and the remaining 100, called X_1, X_2, ..., X_{100}, as independent variables.
Now I would like to create a matrix containing the 100 independent variables. Is there a direct way to do it? For example, when I fit a linear regression model, I can just use "." as shorthand for all the other columns, i.e. lm(Y ~ ., _____)
You can use the grep function to extract the column names of the data frame that belong to the independent variables. Then you can convert those columns into a matrix. Please see the code below:
# simulation of the data frame with 100 measurements and 101 variables
n <- 100
df <- data.frame(matrix(1:(101 * n), ncol = 101)) # parentheses matter: 1:101 * n has only 101 values
names(df) <- c(paste0("X_", 1:100), "Y")
# extract matrix of Xs
m_x <- as.matrix(df[, grep("^X", names(df))])
dimnames(m_x)
Output:
[[1]]
NULL
[[2]]
[1] "X_1" "X_2" "X_3" "X_4" "X_5" "X_6" "X_7" "X_8" "X_9" "X_10" "X_11" "X_12" "X_13" "X_14" "X_15"
[16] "X_16" "X_17" "X_18" "X_19" "X_20" "X_21" "X_22" "X_23" "X_24" "X_25" "X_26" "X_27" "X_28" "X_29" "X_30"
[31] "X_31" "X_32" "X_33" "X_34" "X_35" "X_36" "X_37" "X_38" "X_39" "X_40" "X_41" "X_42" "X_43" "X_44" "X_45"
[46] "X_46" "X_47" "X_48" "X_49" "X_50" "X_51" "X_52" "X_53" "X_54" "X_55" "X_56" "X_57" "X_58" "X_59" "X_60"
[61] "X_61" "X_62" "X_63" "X_64" "X_65" "X_66" "X_67" "X_68" "X_69" "X_70" "X_71" "X_72" "X_73" "X_74" "X_75"
[76] "X_76" "X_77" "X_78" "X_79" "X_80" "X_81" "X_82" "X_83" "X_84" "X_85" "X_86" "X_87" "X_88" "X_89" "X_90"
[91] "X_91" "X_92" "X_93" "X_94" "X_95" "X_96" "X_97" "X_98" "X_99" "X_100"
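Two other ways to get the same matrix without a regular expression are sketched below, assuming the response column is literally named Y and all predictors are numeric. The names m_x1 and m_x2 are introduced here for illustration. The second option uses the same "." formula shorthand as lm(), via model.matrix() from the stats package.

```r
set.seed(1)
n <- 100
df <- data.frame(matrix(rnorm(101 * n), ncol = 101))
names(df) <- c(paste0("X_", 1:100), "Y")

# Option 1: drop the response column by name
m_x1 <- as.matrix(df[, setdiff(names(df), "Y")])

# Option 2: formula interface; "- 1" removes the intercept column
m_x2 <- model.matrix(Y ~ . - 1, data = df)

dim(m_x1)
dim(m_x2)
```

Note that model.matrix() would expand factor columns into dummy variables, so the two options only agree when every predictor is numeric.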
I would like to construct a sequence of length 50 of the following type:
X_{n+1} = 4 * X_n * (1 - X_n). For your information, this is the logistic map for r = 4. In the case of the logistic map with parameter r = 4 and an initial state in (0,1), the attractor is also the interval (0,1), and the probability measure corresponds to the beta distribution with parameters a = 0.5 and b = 0.5. (The logistic map is a polynomial mapping, or equivalently a recurrence relation, of degree 2, often cited as an archetypal example of how complex, chaotic behaviour can arise from very simple non-linear dynamical equations.) How can I do this in R?
There are some ready-to-use solutions on the net. I cite the general solution from mage's blog, where you can find a more detailed description.
logistic.map <- function(r, x, N, M){
  ## r: bifurcation parameter
  ## x: initial value
  ## N: number of iterations
  ## M: number of iteration points to be returned
  z <- numeric(N)
  z[1] <- x
  for(i in seq_len(N - 1)){
    z[i+1] <- r * z[i] * (1 - z[i])
  }
  ## Return the last M iterations
  ## (index adjusted here so that exactly M values come back, not M + 1)
  z[(N - M + 1):N]
}
For the OP's example:
logistic.map(4, 0.2, 50, 50)
This isn't really an R question, is it? More basic programming. Anyway, you probably need an accumulator and a value to process.
values <- 0.2 ## this accumulates as a vector, starting with 0.2
xn <- values ## xn gets the first value
for (it in 2:50) { ## start the loop from the second iteration
xn <- 4L*xn*(1L-xn) ## perform the sequence function
values <- c(values, xn) ## add the new value to the vector
}
values
# [1] 0.2000000000 0.6400000000 0.9216000000 0.2890137600 0.8219392261 0.5854205387 0.9708133262 0.1133392473 0.4019738493 0.9615634951 0.1478365599 0.5039236459
# [13] 0.9999384200 0.0002463048 0.0009849765 0.0039360251 0.0156821314 0.0617448085 0.2317295484 0.7121238592 0.8200138734 0.5903644834 0.9673370405 0.1263843622
# [25] 0.4416454208 0.9863789723 0.0537419811 0.2034151221 0.6481496409 0.9122067356 0.3203424285 0.8708926280 0.4497546341 0.9899016128 0.0399856390 0.1535471506
# [37] 0.5198816927 0.9984188732 0.0063145074 0.0250985376 0.0978744041 0.3531800204 0.9137755744 0.3151590962 0.8633353611 0.4719496615 0.9968527140 0.0125495222
# [49] 0.0495681269 0.1884445109
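The accumulator loop can also be written without growing a vector by hand. This is a minimal sketch using base R's Reduce() with accumulate = TRUE. The helper name step is introduced here for illustration, and its second argument exists only because Reduce() passes one element of the index vector on each call.

```r
# One step of the logistic map with r = 4; the second argument is ignored
step <- function(x, i) 4 * x * (1 - x)

# 49 steps from the initial value 0.2, keeping every intermediate value,
# which gives a sequence of length 50 including the starting point
values <- Reduce(step, seq_len(49), init = 0.2, accumulate = TRUE)

length(values)
head(values, 4)
```

Because the map is chaotic for r = 4, values far along the sequence are very sensitive to rounding, so only the first terms are easy to check by hand (0.2, 0.64, 0.9216, ...).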
I would like to generate data following a chi-square distribution with N=30 observations. However, I don't know if this is correct:
dchisq(1, df = 1:30)
# [1] 2.419707e-01 3.032653e-01 2.419707e-01 1.516327e-01 8.065691e-02
# [6] 3.790817e-02 1.613138e-02 6.318028e-03 2.304483e-03 7.897535e-04
# [11] 2.560537e-04 7.897535e-05 2.327761e-05 6.581279e-06 1.790585e-06
# [16] 4.700913e-07 1.193723e-07 2.938071e-08 7.021903e-09 1.632262e-09
# [21] 3.695738e-10 8.161308e-11 1.759875e-11 3.709686e-12 7.651632e-13
# [26] 1.545702e-13 3.060653e-14 5.945009e-15 1.133575e-15 2.123217e-16
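dchisq() returns values of the density function (here, the density at 1 for each of 1 to 30 degrees of freedom), not random draws. If you would like to generate 30 random chi-squared variables, you need to use the rchisq() function.
rchisq(n, df, ncp = 0)
So you would replace n with 30 and df with the number of degrees of freedom you require. You can read more in the documentation (?rchisq).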
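For a concrete call, here is a minimal sketch. The degrees of freedom (3) and the seed are assumed values chosen for illustration, since the question does not state them, and the name df_assumed is hypothetical.

```r
set.seed(42)

df_assumed <- 3                    # degrees of freedom: assumed for illustration
x <- rchisq(30, df = df_assumed)   # 30 random draws

length(x)
mean(x)   # should be roughly df_assumed, since the mean of a chi-square is df
```

With only 30 draws, the sample mean can differ noticeably from the theoretical mean.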
I have a vector with 100 values. All the values were generated with a random function between 0 and 1.
x <- runif(100,min=0,max=1)
x
[1] 0.84620011 0.82525410 0.31622827 0.08040362 0.12894525 0.23997187 0.57177296 0.91691368 0.65751720
[10] 0.39810175 0.60632205 0.26339035 0.93543618 0.09662383 0.35147739 0.51731042 0.29151612 0.54411769
[19] 0.73688309 0.26086586 0.37808273 0.19163366 0.62776847 0.70973345 0.31802726 0.69101574 0.50042561
[28] 0.20768256 0.23555818 0.21015820 0.18221151 0.85593725 0.12916935 0.52222127 0.62269135 0.51267707
[37] 0.60164023 0.30723904 0.81990231 0.61771762 0.02502631 0.47427724 0.21250040 0.88611710 0.88648546
[46] 0.92586513 0.57015942 0.33454379 0.03572245 0.68120369 0.48692522 0.76587764 0.55214917 0.31137200
[55] 0.47170307 0.48639510 0.68922858 0.73506033 0.23541740 0.81793240 0.17184666 0.06670039 0.55664270
[64] 0.10030533 0.94620061 0.58572228 0.53333567 0.80887841 0.55015406 0.82491114 0.81251132 0.06038019
[73] 0.10918904 0.84011824 0.33169617 0.03568364 0.07703029 0.15601158 0.31623253 0.25021777 0.77024833
[82] 0.88588620 0.49044305 0.10165930 0.55494697 0.17455070 0.94458467 0.43135868 0.99313733 0.04482747
[91] 0.53453604 0.52500493 0.35496966 0.06994880 0.11377845 0.71307042 0.35086237 0.04032254 0.23744845
[100] 0.81131033
Out of all these values in the vector, I need to find the most frequently occurring value (or something close to that). I'm new to R and have no idea how to do this. Please help.
One approach I have in mind: divide the values into ranges and find the frequency distribution. But would that be helpful?
One way to analyze the distribution of the numbers is to plot a histogram and overlay an estimated probability density.
This can be done with the ggplot2 library:
set.seed(123) # used here for reproducibility
x <- runif(100) # pseudo-random numbers between 0 and 1
library(ggplot2)
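# after_stat() needs ggplot2 >= 3.3.0; older versions use y = ..density..
p <- ggplot(as.data.frame(x), aes(x = x, y = after_stat(density))) +
  geom_histogram(fill = "lightblue", colour = "grey60", bins = 50) +
  geom_density()
p # print the plot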
The value of bins specified in geom_histogram() is the number of bars in the histogram. You may want to change this value to obtain a different representation of the distribution.
OR
You could use base R and plot a simple histogram:
hist(x)
There you can also change the bin width (see breaks), but the default might be sufficient to show the concept.
You can identify which bin in this histogram has the most entries with
h <- hist(x, plot = FALSE) # compute the bins once, without drawing the plot again
h$mids[which.max(h$counts)]
#[1] 0.45
Which in this case means that most values occur near a value of 0.45 (the middle of the bin describing the range between 0.4 and 0.5).
Hope this helps.
You can do this:
set.seed(12)
x <- runif(100,min=0,max=1)
n <- length(x)
x_cut<-cut(x, breaks = n/4)
which(table(x_cut)==max(table(x_cut)))
The result depends on the breaks value you set. This is an alternative to using a histogram if you don't need one.
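If a single number is wanted rather than a bin, one option is to take the peak of a kernel density estimate. This is a sketch using base R's density() with its default bandwidth. The name mode_est is introduced here for illustration, and the result depends on the bandwidth much as the histogram result depends on the breaks.

```r
set.seed(12)
x <- runif(100, min = 0, max = 1)

# Kernel density estimate; the location of its peak is a smooth
# estimate of where the values are most concentrated
d <- density(x)
mode_est <- d$x[which.max(d$y)]
mode_est
```

For uniformly distributed data there is no true mode, so the peak found this way mostly reflects sampling noise.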
To get exactly the most frequent value, or when using discrete data as input, you could simply create a table, sort the counts and return the name of the top entry:
values <- c("a", "a", "c", "c", "c")
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
Breaking it down:
# create a table of the values
table(values)
#> a c
#> 2 3
# sort the table descending on number of occurrences
sort(table(values), decreasing = TRUE)
#> c a
#> 3 2
# now only keep the first value
sort(table(values), decreasing = TRUE)[1]
#> c
#> 3
# so the final line:
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
If you feel like doing something fancier, create a function that does this for you:
get_mode <- function(x) {
  names(sort(table(x), decreasing = TRUE)[1]) # use the argument x, not the global `values`
}
get_mode(values)
#> [1] "c"
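Taking only the first entry of the sorted table hides ties. The sketch below is a variant that returns every value reaching the maximum count. The function name get_modes is an assumption introduced here, not something from the answer above.

```r
# Return every value that attains the maximum count
get_modes <- function(x) {
  counts <- table(x)
  names(counts)[counts == max(counts)]
}

get_modes(c("a", "a", "c", "c", "c"))
#> [1] "c"
get_modes(c("a", "a", "c", "c"))
#> [1] "a" "c"
```

As with get_mode(), the result is a character vector, because the values are read back from the names of the table.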