I just discovered R and I am trying to work with it.
Here is what I am trying to achieve:
I have a vector of numbers, x, between 50 and 100 and with a size of 250 observations.
x <- sample(seq(50, 100), 250, replace = TRUE)
Now, I want to generate another vector of numbers, y, between 0 and 100, which is the same size as vector x such that each element in y is less than or equal to its equivalent in x.
That is to say that if x[1] is 76, for example, the highest value y[1] could attain when generated is 76. But it could definitely be any other value below 76. In other words and more generally, I want vector y to be generated in such a way that y[i] <= x[i].
I hope I have made my request clear.
Thank you very much!
y <- x - 1  # trivially satisfies y[i] <= x[i], but is not random
y <- sapply(x, function(xi) runif(n = 1, max = xi))  # one uniform draw in [0, x[i]] per element
y
[1] 7.2713788 30.0008063 42.5205775 0.9271717 10.7114456 39.5199145 7.4109775
[8] 28.3464373 28.5840101 34.0654033 15.0675028 50.2836294 45.9031794 13.5931005
[15] 43.2751738 17.0560824 3.1507491 25.7619129 12.3391448 22.6203684 51.3334810
[22] 37.0481703 33.4733277 37.1304850 26.7984406 66.3844126 40.2775918 47.6379024
[29] 16.2480595 66.8358384 33.3513161 60.2673874 65.6204462 45.6951960 1.5729434
[36] 20.4850357 0.1345737 84.5334203 19.7997451 53.8025623 48.5528486 8.8992123
[43] 90.9651742 28.3584167 41.7728159 46.4790641 17.8129578 83.1906415 37.5114353
[50] 89.5685501 85.2499600
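Since runif() is vectorised over its max argument, the same idea can probably be written without sapply(). The following is a minimal sketch, not part of the original answer; the seed, the name y_int, and the integer variant are assumptions added for illustration.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
x <- sample(seq(50, 100), 250, replace = TRUE)

# continuous values: one uniform draw per element, each bounded above by x[i]
y <- runif(length(x), min = 0, max = x)

# integer values 0..x[i] (hypothetical variant, in case whole numbers are wanted)
y_int <- floor(runif(length(x), min = 0, max = x + 1))

all(y <= x)      # check the constraint for the continuous version
all(y_int <= x)  # and for the integer version
```

Both checks should return TRUE, because every draw is bounded by its own element of x.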
I have calculated the differences of my data points and received this vector:
> diff(smooth$a)/(diff(smooth$b))
[1] -0.0099976150 0.0011162606 0.0116275973 0.0247594149 0.0213592319 0.0205187495 0.0179274056 0.0207752713
[9] 0.0231903072 -0.0077549224 -0.0401528643 -0.0477294350 -0.0340842051 -0.0148157337 0.0003829642 0.0160912230
[17] 0.0311189830
Now I want to get the positions (index) where I have a change from negative to positive when the following 3 data points are also positive.
So my output would be like this:
> output
-0.0099976150 -0.0148157337
How could I do this?
One way to do this:
vec <- diff(smooth$a) / diff(smooth$b)  # the vector shown in the question
series <- paste(ifelse(vec < 0, 0, 1), collapse = '')
vec[gregexpr('0111', series)[[1]]]
#[1] -0.009997615 -0.014815734
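The first line creates a string of 0s and 1s depending on the sign of each number (0 for negative, 1 otherwise, so an exact zero counts as positive). In the second line, gregexpr finds the start position of every '0111' pattern, i.e. a negative value followed by three non-negative ones. These start positions are the indices asked for, and we use them to subset the original vector.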
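The same test can also be expressed with logical vectors instead of a string, which returns the positions directly. This is a sketch under the assumption that "positive" means strictly greater than zero; the names pos, i and idx are made up for illustration, and vec is typed in from the output shown in the question.

```r
# the 17 differences from the question
vec <- c(-0.0099976150,  0.0011162606,  0.0116275973,  0.0247594149,
          0.0213592319,  0.0205187495,  0.0179274056,  0.0207752713,
          0.0231903072, -0.0077549224, -0.0401528643, -0.0477294350,
         -0.0340842051, -0.0148157337,  0.0003829642,  0.0160912230,
          0.0311189830)

pos <- vec > 0                   # TRUE where the value is strictly positive
i   <- seq_len(length(vec) - 3)  # candidates that still have 3 successors

# keep positions that are not positive and are followed by three positives
idx <- i[!pos[i] & pos[i + 1] & pos[i + 2] & pos[i + 3]]

idx       # positions of the sign changes
vec[idx]  # the corresponding values
```

For this vector idx is 1 and 14, which matches the two values in the desired output.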
Imagine a vector z:
z <- seq(-2, 2, length.out = 20)
z
#> [1] -2.0000000 -1.7894737 -1.5789474 -1.3684211 -1.1578947 -0.9473684 -0.7368421 -0.5263158
#> [9] -0.3157895 -0.1052632 0.1052632 0.3157895 0.5263158 0.7368421 0.9473684 1.1578947
#> [17] 1.3684211 1.5789474 1.7894737 2.0000000
Then you can find the largest negative value and the element right after it:
turn_point <- which(z == max(z[z < 0]))
turn_plus_one <- c(turn_point, turn_point + 1)
z[turn_plus_one]
#> [1] -0.1052632 0.1052632
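The approach above relies on z being sorted, so that the largest negative value sits right before the first positive one. For an unsorted vector, a possible alternative is to look at where the sign increases. This is only a sketch; the name cross is an assumption, and a step from a negative value onto an exact zero is also counted as a crossing.

```r
z <- seq(-2, 2, length.out = 20)

# diff(sign(z)) is positive exactly where the sign goes up between neighbours
cross <- which(diff(sign(z)) > 0)

cross                           # index of the last value before each upward crossing
z[sort(c(cross, cross + 1))]    # the values on both sides of each crossing
```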
Suppose I have a data frame of 101 variables. I select one, called Y, as the dependent variable, and the remaining 100, called X_1, X_2, ..., X_{100}, as independent ones.
Now I would like to create a matrix containing the 100 independent variables. Is there a way to do this directly, as in a linear regression model where "." stands for all remaining variables, i.e. lm(Y ~ ., _____)?
You can use the grep function to extract the column names of the data frame that belong to the independent variables, and then convert those columns into a matrix. Please see the code below:
# simulation of the data frame with 100 measurements and 101 variables
n <- 100
df <- data.frame(matrix(1:(101 * n), ncol = 101))  # parentheses needed: 1:101 * n would give only one row
names(df) <- c(paste0("X_", 1:100), "Y")
# extract matrix of Xs
m_x <- as.matrix(df[, grep("^X", names(df))])
dimnames(m_x)
Output:
[[1]]
NULL
[[2]]
[1] "X_1" "X_2" "X_3" "X_4" "X_5" "X_6" "X_7" "X_8" "X_9" "X_10" "X_11" "X_12" "X_13" "X_14" "X_15"
[16] "X_16" "X_17" "X_18" "X_19" "X_20" "X_21" "X_22" "X_23" "X_24" "X_25" "X_26" "X_27" "X_28" "X_29" "X_30"
[31] "X_31" "X_32" "X_33" "X_34" "X_35" "X_36" "X_37" "X_38" "X_39" "X_40" "X_41" "X_42" "X_43" "X_44" "X_45"
[46] "X_46" "X_47" "X_48" "X_49" "X_50" "X_51" "X_52" "X_53" "X_54" "X_55" "X_56" "X_57" "X_58" "X_59" "X_60"
[61] "X_61" "X_62" "X_63" "X_64" "X_65" "X_66" "X_67" "X_68" "X_69" "X_70" "X_71" "X_72" "X_73" "X_74" "X_75"
[76] "X_76" "X_77" "X_78" "X_79" "X_80" "X_81" "X_82" "X_83" "X_84" "X_85" "X_86" "X_87" "X_88" "X_89" "X_90"
[91] "X_91" "X_92" "X_93" "X_94" "X_95" "X_96" "X_97" "X_98" "X_99" "X_100"
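Two other ways to get the same matrix, sketched here as possible alternatives rather than as the answer's method: dropping Y by name, and using model.matrix(), which accepts the same "." shorthand as lm(). The random data and the names m1 and m2 are assumptions made for the illustration.

```r
set.seed(1)  # assumption: simulated data, only for illustration
n  <- 100
df <- data.frame(matrix(rnorm(101 * n), ncol = 101))
names(df) <- c(paste0("X_", 1:100), "Y")

# 1) keep every column except Y
m1 <- as.matrix(df[, setdiff(names(df), "Y")])

# 2) build the design matrix from the formula and drop the intercept column
m2 <- model.matrix(Y ~ ., data = df)[, -1]

dim(m1)
dim(m2)
```

Note that model.matrix() expands factor columns into dummy variables, so the two results only coincide when all predictors are numeric, as they are here.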
I have a vector with 100 values. All the values were generated with a random function, between 0 and 1.
x <- runif(100,min=0,max=1)
x
[1] 0.84620011 0.82525410 0.31622827 0.08040362 0.12894525 0.23997187 0.57177296 0.91691368 0.65751720
[10] 0.39810175 0.60632205 0.26339035 0.93543618 0.09662383 0.35147739 0.51731042 0.29151612 0.54411769
[19] 0.73688309 0.26086586 0.37808273 0.19163366 0.62776847 0.70973345 0.31802726 0.69101574 0.50042561
[28] 0.20768256 0.23555818 0.21015820 0.18221151 0.85593725 0.12916935 0.52222127 0.62269135 0.51267707
[37] 0.60164023 0.30723904 0.81990231 0.61771762 0.02502631 0.47427724 0.21250040 0.88611710 0.88648546
[46] 0.92586513 0.57015942 0.33454379 0.03572245 0.68120369 0.48692522 0.76587764 0.55214917 0.31137200
[55] 0.47170307 0.48639510 0.68922858 0.73506033 0.23541740 0.81793240 0.17184666 0.06670039 0.55664270
[64] 0.10030533 0.94620061 0.58572228 0.53333567 0.80887841 0.55015406 0.82491114 0.81251132 0.06038019
[73] 0.10918904 0.84011824 0.33169617 0.03568364 0.07703029 0.15601158 0.31623253 0.25021777 0.77024833
[82] 0.88588620 0.49044305 0.10165930 0.55494697 0.17455070 0.94458467 0.43135868 0.99313733 0.04482747
[91] 0.53453604 0.52500493 0.35496966 0.06994880 0.11377845 0.71307042 0.35086237 0.04032254 0.23744845
[100] 0.81131033
Out of all these values in the vector, I need to find the most frequently occurring value (or something close to that). I'm new to R and have no idea how to do this. Please help?
One approach I have thought of: divide the values into ranges and find the frequency distribution. But will that be helpful?
One way to analyze the distribution of the numbers is to plot a histogram and overlay an estimated probability density curve.
This can be done with the ggplot2 library:
set.seed(123) # used here for reproducibility
x <- runif(100) # pseudo-random numbers between 0 and 1
library(ggplot2)
p <- ggplot(as.data.frame(x), aes(x = x, y = ..density..)) +
  geom_histogram(fill = "lightblue", colour = "grey60", bins = 50) +
  geom_density()
p  # print the plot
The value of bins specified in geom_histogram() is the number of bars in the histogram. You may want to change this value to obtain a different representation of the distribution.
OR
You could use base R and plot a simple histogram:
hist(x)
There you can also change the bin width (see breaks), but the default might be sufficient to show the concept.
You can identify which bin in this histogram has the most entries with
> h <- hist(x)
> h$mids[which.max(h$counts)]
#[1] 0.45
In this case, this means that most values occur near 0.45 (the midpoint of the bin covering the range between 0.4 and 0.5).
Hope this helps.
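If a single number is wanted without choosing bins, one option is to take the location of the peak of a kernel density estimate. This is a rough sketch using base R's density() with its default bandwidth; the name mode_est is an assumption, and the result depends on the bandwidth just as the histogram result depends on the bins.

```r
set.seed(123)    # same seed as the ggplot2 example above
x <- runif(100)

d <- density(x)                   # Gaussian kernel, default bandwidth
mode_est <- d$x[which.max(d$y)]   # x position where the estimated density is highest
mode_est
```

For truly uniform data there is no real mode, so the value found this way mostly reflects sampling noise.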
You can do this:
set.seed(12)
x <- runif(100,min=0,max=1)
n <- length(x)
x_cut<-cut(x, breaks = n/4)
which(table(x_cut)==max(table(x_cut)))
The result depends on the breaks value you set. This is an alternative to using a histogram if you don't need one.
To get exactly the most frequent value, or when the input is discrete data, you can simply create a table, sort it, and return the name of the entry with the highest count:
values <- c("a", "a", "c", "c", "c")
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
Breaking it down:
# create a table of the values
table(values)
#> a c
#> 2 3
# sort the table descending on number of occurrences
sort(table(values), decreasing = TRUE)
#> c a
#> 3 2
# now only keep the first value
sort(table(values), decreasing = TRUE)[1]
#> c
#> 3
# so the final line:
names(sort(table(values), decreasing = TRUE)[1])
#> [1] "c"
If you want to go one step further, wrap this in a function:
get_mode <- function(x) {
  # use the argument x here, not the global variable values
  names(sort(table(x), decreasing = TRUE)[1])
}
get_mode(values)
#> [1] "c"
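Taking only the first entry of the sorted table silently picks one value when several are tied for the highest count. A possible variant that returns all tied values is sketched below; get_modes is a hypothetical name, not something from the answer above.

```r
get_modes <- function(x) {
  counts <- table(x)                               # occurrences of each distinct value
  names(counts)[as.vector(counts) == max(counts)]  # every value that reaches the maximum
}

get_modes(c("a", "a", "c", "c", "c"))
#> [1] "c"
get_modes(c("a", "a", "c", "c"))
#> [1] "a" "c"
```

As with get_mode(), the result is a character vector, so numeric input would need as.numeric() afterwards.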
I am trying to generate 100 samples of Z, where Z is the sum of 8 independent uniformly distributed random variables on the interval [0, 1].
I have the following code so far, but I'm not sure if it, and in particular my loop, is correct:
samples <- numeric(100)       # preallocate storage (also avoids masking base::sample)
for (i in 1:100) {            # one iteration per sample
  eight <- runif(8, 0, 1)     # draw 8 independent uniform(0, 1) values
  Z_1 <- sum(eight)           # sum them to get one realisation of Z
  samples[i] <- Z_1           # store it as the i-th sample
}
Thanks
I would vectorize the whole thing. There is no real reason to run 100 iterations when you can generate all 800 observations in one call. Then arrange them in an 8 x 100 matrix, take colSums, and you are done:
set.seed(123)
n <- 100
Z <- colSums(matrix(runif(8 * n), 8, n))
Z
# [1] 4.774425 4.671171 4.787691 4.804041 3.513257 2.330163 3.300135 3.568657 5.503481 2.861764 4.533283 3.353658
# [13] 4.230073 4.690739 4.364708 3.094156 4.933616 3.942834 3.712522 2.587036 3.731474 4.388749 4.484030 4.315968
# [25] 4.800758 4.252631 2.716972 5.159044 4.146881 3.244546 4.418618 4.350035 5.344182 3.176801 3.903337 2.261935
# [37] 3.646572 4.286075 3.074900 4.210506 3.172513 4.535665 4.245856 4.184848 4.532286 2.899883 4.473629 4.751224
# [49] 3.498038 3.337437 4.238989 3.349812 3.888696 4.748254 3.029479 4.246619 3.330434 3.879168 3.786216 3.839956
# [61] 3.878997 4.546531 2.863010 3.417072 4.266108 3.141875 4.960758 3.989613 4.373042 4.295742 4.459014 5.561066
# [73] 4.401990 4.121301 3.830575 3.412446 3.812347 5.591238 3.801587 4.454336 4.213343 5.222007 4.300991 2.765003
# [85] 3.293251 5.362586 2.954080 3.036312 3.655677 3.373603 5.575184 4.167740 3.904038 3.884440 2.901452 3.079311
# [97] 4.927770 3.930943 4.169907 2.922618
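If the matrix layout feels opaque, replicate() expresses the same experiment more literally, at some cost in speed. The sketch below is an alternative formulation, not the answer's code; the expected mean and variance in the comments follow from the mean 1/2 and variance 1/12 of a single uniform(0, 1) variable.

```r
set.seed(123)

# repeat "draw 8 uniforms and sum them" 100 times
Z <- replicate(100, sum(runif(8)))

length(Z)
mean(Z)  # theoretical value: 8 * 1/2  = 4
var(Z)   # theoretical value: 8 * 1/12, about 0.667
```

The individual numbers differ from the colSums version even with the same seed only if the draws are consumed in a different order; here both fill the samples eight at a time.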
require(fracdiff)
#load your data, n is the sample size
x<-matrix(scan("data.txt", n = ), n, 1, byrow = TRUE)
x<-log(x)
x<-x-mean(x)
n<-length(x)
#select the truncation
m<-round(n^(0.7))
perdx<-px[2:n]
fn<-function(h)
{
  lambda<-freq[1:m]^(2*h-1)
  Gh=mean(perdx[1:m]*lambda)
  Rh=log(Gh)-(2*h-1)*mean(log(freq[1:m]))
  return(Rh)
}
est<-optimize(fn,c(0,1.5),tol = 0.00001)
hhat<-est$minimum
b <- hhat-0.5
b
I have this code written in R, and I want to loop over m, where m<-round(n^(0.7)), letting the power of n run from 0.3 to 0.8 (the default is 0.7) so that a different value of m is supplied to the function each time. Ultimately I want to collect the resulting values of b (one b for each power of n from 0.3 to 0.8), but so far I have been unsuccessful. Furthermore, I want to plot the different values of b against m. I'm hoping someone can suggest how to get this result. Any suggestion is highly appreciated. Thank you.
First some dummy data:
set.seed(1983)
freq <- rnorm(100, mean=10)
perdx <- rnorm(100, mean=100, sd=10)
Then your function (slightly shortened and modified: m has to become a parameter, since it changes at each iteration), the vector m, and an empty vector b (it is worth preallocating the vector to its final length):
m <- 1:100
b <- numeric(length(m))
fn <- function(h, m) {
  lambda <- freq[1:m]^(2 * h - 1)
  Gh <- mean(perdx[1:m] * lambda)
  log(Gh) - (2 * h - 1) * mean(log(freq[1:m]))
}
And finally your loop:
for (i in seq_along(m)) {
  b[i] <- optimize(fn, c(0, 1.5), tol = 0.00001, m = m[i])$minimum - 0.5
}
b
[1] 0.370809995 0.143004969 0.295652288 0.341975500 0.155323568 -0.004270843 -0.004463482 -0.005151013 -0.019702428 -0.066622938 -0.071558276 -0.051269281 -0.010162819 -0.011613268
[15] -0.043173232 -0.023702358 -0.017404588 -0.041314701 -0.041849379 -0.039659543 -0.042926431 -0.041149212 -0.050584172 -0.051101425 -0.051999900 -0.053473729 -0.007245899 -0.023556513
[29] -0.026109458 -0.035935971 -0.063366257 -0.048185532 -0.051862241 -0.051659993 -0.078318886 -0.080683266 -0.082146068 -0.088776082 -0.095815094 -0.097276217 -0.099827675 -0.090215664
[43] -0.091023273 -0.090649640 -0.091877778 -0.091318363 -0.083812376 -0.091700899 -0.086337626 -0.105456723 -0.105972890 -0.101094946 -0.101748039 -0.101323129 -0.070511638 -0.081105305
[57] -0.072667430 -0.072361640 -0.069692202 -0.067384208 -0.072985712 -0.063617816 -0.064122242 -0.067135980 -0.070663150 -0.069359528 -0.069691113 -0.084422380 -0.073379583 -0.072209507
[71] -0.069132825 -0.067681419 -0.063782326 -0.057532656 -0.063031479 -0.054001810 -0.053523184 -0.051783114 -0.053388449 -0.055742505 -0.052429781 -0.058399275 -0.059529803 -0.059389065
[85] -0.058834476 -0.043061836 -0.045186752 -0.048336234 -0.055597368 -0.065307991 -0.060903775 -0.062518358 -0.062898386 -0.059452595 -0.051983381 -0.049742105 -0.050124722 -0.049212744
[99] -0.041458672 -0.043251041
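The loop above runs over m = 1, ..., 100, whereas the question asks for m = round(n^p) with the power p going from 0.3 to 0.8, plus a plot of b against m. A minimal sketch of that variant follows. It reuses the dummy data from this answer; the step of 0.1 between powers and the names powers and m_vals are assumptions, since the question does not say how finely the range should be covered.

```r
set.seed(1983)
freq  <- rnorm(100, mean = 10)            # dummy data, as in the answer above
perdx <- rnorm(100, mean = 100, sd = 10)
n     <- length(perdx)

fn <- function(h, m) {
  lambda <- freq[1:m]^(2 * h - 1)
  Gh <- mean(perdx[1:m] * lambda)
  log(Gh) - (2 * h - 1) * mean(log(freq[1:m]))
}

powers <- seq(0.3, 0.8, by = 0.1)   # assumed grid of exponents
m_vals <- round(n^powers)           # truncation value for each exponent

# one estimate of b per truncation value
b <- sapply(m_vals, function(m) {
  optimize(fn, c(0, 1.5), tol = 0.00001, m = m)$minimum - 0.5
})

plot(m_vals, b, type = "b", xlab = "m", ylab = "b")
```

With n = 100 the truncation values come out as 4, 6, 10, 16, 25 and 40; with real data, freq and perdx would come from the periodogram computed in the question.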