Dividing components of a vector into several data points in R - r

I am trying to turn a vector of length n (say, 14) into a vector of length N (say, 90). For example, my vector is
x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
and I want to turn it into a vector of length 90 by creating 90 equally "spaced" points along it: think of x as a function sampled at 14 points. Is there any way to do that in R?

Something like this?
> x<-c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
> seq(min(x),max(x),length=90)
[1] 2.000000 2.426966 2.853933 3.280899 3.707865 4.134831 4.561798
[8] 4.988764 5.415730 5.842697 6.269663 6.696629 7.123596 7.550562
[15] 7.977528 8.404494 8.831461 9.258427 9.685393 10.112360 10.539326
[22] 10.966292 11.393258 11.820225 12.247191 12.674157 13.101124 13.528090
[29] 13.955056 14.382022 14.808989 15.235955 15.662921 16.089888 16.516854
[36] 16.943820 17.370787 17.797753 18.224719 18.651685 19.078652 19.505618
[43] 19.932584 20.359551 20.786517 21.213483 21.640449 22.067416 22.494382
[50] 22.921348 23.348315 23.775281 24.202247 24.629213 25.056180 25.483146
[57] 25.910112 26.337079 26.764045 27.191011 27.617978 28.044944 28.471910
[64] 28.898876 29.325843 29.752809 30.179775 30.606742 31.033708 31.460674
[71] 31.887640 32.314607 32.741573 33.168539 33.595506 34.022472 34.449438
[78] 34.876404 35.303371 35.730337 36.157303 36.584270 37.011236 37.438202
[85] 37.865169 38.292135 38.719101 39.146067 39.573034 40.000000
>

Try this:
# data
x <- c(5,3,7,11,12,19,40,2,22,6,10,12,12,4)
# expected new length
N <- 90
# number of points generated for each pair of neighbouring values
my.length.out <- round((N - length(x)) / (length(x) - 1)) + 1
# new data: interpolate linearly between each pair of neighbours
x1 <- unlist(
  lapply(1:(length(x) - 1), function(i)
    seq(x[i], x[i + 1], length.out = my.length.out)))
# plot the original and the expanded vector
par(mfrow = c(2, 1))
plot(x)
plot(x1)
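The segment-by-segment approach produces 13 segments of 7 points each, so 91 values with every interior point of x repeated, rather than exactly 90. A possible alternative, offered as a sketch and not as what either answer proposed, is base R's approx(), which linearly interpolates x against its index and returns exactly N equally spaced points. The name x_new is only illustrative.

```r
x <- c(5, 3, 7, 11, 12, 19, 40, 2, 22, 6, 10, 12, 12, 4)
N <- 90

# Treat x as a function of its index 1..14 and sample it at
# N equally spaced positions between 1 and 14 (linear interpolation).
x_new <- approx(x = seq_along(x), y = x, n = N)$y

length(x_new)   # number of interpolated points
head(x_new)
```

The first and last values of x_new coincide with the first and last values of x, and every interpolated value stays within the range of x.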

Related

Using R, how do I copy a tibble to an element of a list? For example, each element ff[i] should hold a tibble

Use vector() to create an empty vector called ff that is of mode “list” and length 9. Now write a for() loop to loop over the 9 files in dfiles and for each (i) read the file in to a tibble, and change the column names to x and y, and (ii) copy the tibble to an element of your list ff.
dfiles is a directory which has different files.
This is what I did.
ff <- vector(mode = "list", length = 9)
length <- length(dfiles)
for (i in 1:length) {
study <- read_csv(dfiles[i])
names(study)[1] <- "x"
names(study)[2] <- "y"
ff[i] <- c(study)
print(head(ff[i]))
}
[[1]]
[1] -0.989532202 -0.052799402 0.823610903 -0.255509103 -0.220684347
[6] 0.307726791 -0.060013253 -0.555652890 -0.138615019 1.882839792
[11] 0.873668680 -0.914597073 -1.244917622 -0.359982241 1.328774701
[16] 0.292679118 -0.701505237 0.882234568 -0.133370389 -1.120678499
[21] 0.461192454 1.524142810 0.434468298 0.192000371 -0.656243128
[26] 0.568398531 -1.070570535 -1.653149024 -0.043352768 -0.034593506
[31] 2.365055532 -1.216347308 0.170906323 0.805053094 1.050592844
[36] -0.010724485 -0.743256141 -0.065784052 1.939755992 0.482739008
[41] -2.044477073 1.423459129 0.540502661 -0.033571772 -0.017863621
[46] -0.149789720 0.256559481 -0.503866933 0.277011252 -0.931356025
[51] 0.200146875 1.106837421 0.509206114 1.033749676 -1.090868762
[56] 0.054792784 0.617250303 -1.068004868 1.565814337 -1.034808011
[61] 0.164518709 0.151832330 0.121670302 -0.210424584 0.449936787
[66] -1.031164492 -1.289364188 -0.654568638 -0.057324104 1.256747820
[71] 1.587454140 0.319481463 0.381591623 -0.243644884 0.048053084
[76] -1.404545861 0.289933729 -0.535553582 0.334678773 -0.345981339
[81] -0.661615735 -0.219111377 -0.366904911 1.094578208 0.209208082
[86] 0.432491426 -1.240853586 1.496821710 0.159370441 -0.856281403
[91] 0.309046645 0.870434030 -1.383677138 1.690106970 -0.158030705
[96] 1.121170781 0.072261319 -0.332422845 -1.834920047 -1.100172219
[101] -0.041340300 0.827852545 -1.881678654 1.375441112 1.398990464
[106] -1.143316256 0.472300562 -1.033639213 -0.125199979 0.928662739
[111] 0.868339648 -0.849174604 -0.386636454 -0.976163571 0.339543660
[116] -1.559075164 -2.629325442 1.469812282 2.273472913 -0.455033540
[121] 0.761102487 -0.007502784 1.474313800
and the following warnings:
1: In ff[i] <- c(study) :
number of items to replace is not a multiple of replacement length
2: In ff[i] <- c(study) :
I was expecting that it'll still have column names so I am not sure how to fix it and where I am going wrong.
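You need double brackets here. ff[i] <- c(study) tries to put the list of columns returned by c(study) into a single slot, so only the first column is kept and R warns about the length mismatch.
ff[[i]] <- study fixes the problem: it stores the whole tibble, column names included, in element i.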
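A minimal sketch of the corrected loop follows. To keep it self-contained it makes two assumptions that are not in the question: it writes three small throwaway CSV files to a temporary directory instead of using the nine real files, and it reads them with base read.csv() as a stand-in for readr's read_csv().

```r
# Illustrative data only: three small CSV files in a temporary directory
tmp <- tempfile("dfiles_")
dir.create(tmp)
for (k in 1:3) {
  write.csv(data.frame(a = rnorm(5), b = rnorm(5)),
            file.path(tmp, paste0("file", k, ".csv")),
            row.names = FALSE)
}
dfiles <- list.files(tmp, full.names = TRUE)

# Pre-allocate the list, then fill one element per file
ff <- vector(mode = "list", length = length(dfiles))
for (i in seq_along(dfiles)) {
  study <- read.csv(dfiles[i])        # stand-in for read_csv()
  names(study)[1:2] <- c("x", "y")    # rename the first two columns
  ff[[i]] <- study                    # [[ ]] stores the whole table in slot i
}

str(ff[[1]])
```

Using seq_along(dfiles) also avoids creating a variable called length, which would shadow the base function of the same name.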

How to add new value to existing dataset so that only the range changes but mean remains the same in R?

Hi, I'm a statistics student. My textbook covers the basic calculations but not much R code, so I would like to ask whether there is a way in R to add numbers to an existing generated set so that it has a specific mean and range.
1(a) Apply R to simulate a set of 100 numbers, with mean value of 20 and standard deviation of 2. List out the set of numbers.
> x <- rnorm(100,20,2)
> print(x)
[1] 20.59256 20.66069 12.68841 21.13575 24.09587 21.69535 20.18661 21.71236 20.92864 19.63182 22.12583 19.06238
[13] 18.73813 22.59813 17.30012 16.98957 20.74050 21.28319 19.75426 20.62065 20.20814 18.16406 22.24261 22.05673
[25] 21.27086 18.78538 21.86479 18.03242 21.00538 20.27731 22.59440 23.24389 20.20846 19.73281 19.50040 20.51712
[37] 20.16493 23.56715 21.25884 18.37542 19.84470 19.81911 16.94701 19.06637 17.74580 18.03151 19.57144 16.45314
[49] 20.89975 21.86249 17.42996 23.52514 21.17759 20.20160 18.11839 21.69716 16.93685 20.62335 20.37935 22.46131
[61] 17.78489 19.90424 17.67674 20.20571 21.60567 20.41897 20.25134 22.44366 19.06513 20.62692 24.04101 24.03634
[73] 20.15566 20.33157 20.22881 20.54014 19.49401 17.34388 19.94099 18.71450 19.24386 19.91813 18.71863 20.94027
[85] 17.55676 17.18079 24.96868 24.09565 19.87488 20.06114 19.21374 18.39874 21.01435 18.38329 20.91788 21.45158
[97] 20.43168 21.80438 20.50405 23.07149
(b) Add another 2 numbers to the set simulated in Question 1(a), such that the new set now has (same) mean of 20, but range becomes 200. List out the set of numbers.
First create reproducible data:
set.seed(42)
x <- rnorm(100,20,2)
mean(x)
# [1] 20.06503
range(x)
# [1] 14.01382 24.57329
(x2 <- mean(x) + c(-100, 100))
# [1] -79.93497 120.06503
To keep the mean the same we need to add points 100 above the mean and 100 below the mean. Fortunately these points lie beyond the original range.
mean(c(x, x2))
# [1] 20.06503
diff(range(c(x, x2)))
# [1] 200
The mean is the same and the range is now 200.
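The reason this works is that the two new points are symmetric about the mean m: with half-range h, the new mean is (n*m + (m - h) + (m + h)) / (n + 2) = m. A small sketch that wraps the same idea in a function is shown below; the function name add_two_points and its argument target_range are hypothetical, and the check inside it assumes the existing data lie within the new extremes.

```r
# Append two values, symmetric about the mean, so that the mean is
# unchanged and the range becomes target_range.
add_two_points <- function(x, target_range) {
  m <- mean(x)
  half <- target_range / 2
  # The new points only define the range if the old data lie between them
  stopifnot(min(x) >= m - half, max(x) <= m + half)
  c(x, m - half, m + half)
}

set.seed(42)
x <- rnorm(100, 20, 2)
y <- add_two_points(x, 200)

all.equal(mean(y), mean(x))   # mean is preserved
diff(range(y))                # width of the new range
```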
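As you need a range of 200, each added value should sit 100 (half the desired range) away from a centre value. Note that the code below centres the two new values on the current range width rather than on the mean, so the range becomes 200 but the mean shifts slightly (from 19.99774 to 19.846); centring them on mean(x) instead would keep the mean unchanged.
Solution in code: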
> x <- rnorm(100,20,2)
>
> x
[1] 17.84671 19.02797 23.83426 21.28975 20.35738 19.35365 22.57753 15.09991 18.18989 21.61537 20.97786 20.74412 20.95964
[14] 20.00677 13.79552 16.65435 23.48840 19.50842 25.10979 21.10134 19.15891 22.58312 23.65634 17.89358 17.98529 22.33547
[27] 20.84291 21.28044 22.37447 16.89740 19.95510 17.67625 19.64634 18.07762 21.50655 18.62182 18.59671 15.53542 12.85074
[40] 19.06638 19.90743 18.64610 20.71322 22.78706 22.33449 22.30899 17.09384 21.57055 19.88208 18.85795 18.52198 23.70028
[53] 22.91794 20.24993 20.63627 19.01672 19.34706 17.42375 21.88536 20.91214 21.16099 23.54738 21.40821 21.06485 23.95725
[66] 21.09893 16.15641 21.28983 19.27113 17.89774 23.24801 23.23136 22.67976 23.21619 20.17257 21.09512 16.83565 22.17975
[79] 20.50282 23.86079 14.97483 16.91109 18.66540 21.79649 21.01789 18.81188 19.77038 25.04698 17.69211 20.04085 17.29910
[92] 18.98335 16.37297 19.78979 18.83341 16.60093 19.41327 17.85721 22.55003 16.67850
>
> mean(x)
[1] 19.99774
>
> sd(x)
[1] 2.494173
>
> range <- range(x)[2]-range(x)[1]
>
> range
[1] 12.25905
>
> x <- c(x,range+100,range-100)
>
> mean(x)
[1] 19.846
>
> sd(x)
[1] 14.3276
>
> range <- range(x)[2]-range(x)[1]
>
> range
[1] 200
>

Split a sequence of numbers into groups of 10 digits using R

I would like for R to read in the first 10,000 digits of Pi and group every 10 digits together
e.g., I want R to read in a sequence
pi <- 3.14159265358979323846264338327950288419716939937510582097...
and would like R to give me a table where each row contains 10 digits:
3141592653
5897932384
6264338327
...
I am new to R and really don't know where to start so any help would be much appreciated!
Thank you in advance
https://rextester.com/OQRM27791
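# split the digit string into single characters
p <- strsplit("314159265358979323846264338327950288419716939937510582097", "")
digits <- p[[1]]
# group every 10 consecutive digits; the last group may be shorter
split(digits, ceiling(seq_along(digits) / 10))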
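split() returns a list whose elements are vectors of single characters. If one string per group is wanted, a possible base R variant (a sketch, using only the 57 digits quoted above; the names starts and chunks are illustrative) is to cut the string with substring():

```r
digits <- "314159265358979323846264338327950288419716939937510582097"

# Start position of each 10-character chunk: 1, 11, 21, ...
starts <- seq(1, nchar(digits), by = 10)

# Extract each chunk; the final chunk is shorter if the length
# of the string is not a multiple of 10.
chunks <- substring(digits, starts, pmin(starts + 9, nchar(digits)))
chunks
```

With this input the result has six elements: five full 10-digit strings and a final 7-digit remainder.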
Here's one way to do it. It's fully reproducible, so just cut and paste it into your R console. The vector result is the first 10,000 digits of pi, split into 1000 strings of 10 digits.
For this many digits, I have used an online source for the precalculated value of pi. This is read in using readChar and the decimal point is stripped out with gsub. The resulting string is split into individual characters and put in a 1000 * 10 matrix (filled row-wise). The rows are then pasted into strings, giving the result. I have displayed only the first 100 entries of result for clarity of presentation.
pi_url <- "https://www.pi2e.ch/blog/wp-content/uploads/2017/03/pi_dec_1m.txt"
pi_char <- gsub("\\.", "", readChar(pi_url, 1e4 + 1))
pi_mat <- matrix(strsplit(pi_char, "")[[1]], byrow = TRUE, ncol = 10)
result <- apply(pi_mat, 1, paste0, collapse = "")
head(result, 100)
#> [1] "3141592653" "5897932384" "6264338327" "9502884197" "1693993751"
#> [6] "0582097494" "4592307816" "4062862089" "9862803482" "5342117067"
#> [11] "9821480865" "1328230664" "7093844609" "5505822317" "2535940812"
#> [16] "8481117450" "2841027019" "3852110555" "9644622948" "9549303819"
#> [21] "6442881097" "5665933446" "1284756482" "3378678316" "5271201909"
#> [26] "1456485669" "2346034861" "0454326648" "2133936072" "6024914127"
#> [31] "3724587006" "6063155881" "7488152092" "0962829254" "0917153643"
#> [36] "6789259036" "0011330530" "5488204665" "2138414695" "1941511609"
#> [41] "4330572703" "6575959195" "3092186117" "3819326117" "9310511854"
#> [46] "8074462379" "9627495673" "5188575272" "4891227938" "1830119491"
#> [51] "2983367336" "2440656643" "0860213949" "4639522473" "7190702179"
#> [56] "8609437027" "7053921717" "6293176752" "3846748184" "6766940513"
#> [61] "2000568127" "1452635608" "2778577134" "2757789609" "1736371787"
#> [66] "2146844090" "1224953430" "1465495853" "7105079227" "9689258923"
#> [71] "5420199561" "1212902196" "0864034418" "1598136297" "7477130996"
#> [76] "0518707211" "3499999983" "7297804995" "1059731732" "8160963185"
#> [81] "9502445945" "5346908302" "6425223082" "5334468503" "5261931188"
#> [86] "1710100031" "3783875288" "6587533208" "3814206171" "7766914730"
#> [91] "3598253490" "4287554687" "3115956286" "3882353787" "5937519577"
#> [96] "8185778053" "2171226806" "6130019278" "7661119590" "9216420198"
Created on 2020-07-23 by the reprex package (v0.3.0)
We can use str_extract_all from the stringr package:
pi <- readLines("https://www.pi2e.ch/blog/wp-content/uploads/2017/03/pi_dec_1m.txt")
library(stringr)
t <- unlist(str_extract_all(sub("\\.","", pi), "\\d{10}"))
t[1:100]
[1] "3141592653" "5897932384" "6264338327" "9502884197" "1693993751" "0582097494" "4592307816" "4062862089"
[9] "9862803482" "5342117067" "9821480865" "1328230664" "7093844609" "5505822317" "2535940812" "8481117450"
[17] "2841027019" "3852110555" "9644622948" "9549303819" "6442881097" "5665933446" "1284756482" "3378678316"
[25] "5271201909" "1456485669" "2346034861" "0454326648" "2133936072" "6024914127" "3724587006" "6063155881"
[33] "7488152092" "0962829254" "0917153643" "6789259036" "0011330530" "5488204665" "2138414695" "1941511609"
[41] "4330572703" "6575959195" "3092186117" "3819326117" "9310511854" "8074462379" "9627495673" "5188575272"
[49] "4891227938" "1830119491" "2983367336" "2440656643" "0860213949" "4639522473" "7190702179" "8609437027"
[57] "7053921717" "6293176752" "3846748184" "6766940513" "2000568127" "1452635608" "2778577134" "2757789609"
[65] "1736371787" "2146844090" "1224953430" "1465495853" "7105079227" "9689258923" "5420199561" "1212902196"
[73] "0864034418" "1598136297" "7477130996" "0518707211" "3499999983" "7297804995" "1059731732" "8160963185"
[81] "9502445945" "5346908302" "6425223082" "5334468503" "5261931188" "1710100031" "3783875288" "6587533208"
[89] "3814206171" "7766914730" "3598253490" "4287554687" "3115956286" "3882353787" "5937519577" "8185778053"
[97] "2171226806" "6130019278" "7661119590" "9216420198"

R: Using for loop on data frame

I have a data frame, deflator.
I want to get a new data frame inflation which can be calculated by:
(deflator[i] - deflator[i-4]) / deflator[i-4] * 100
The data frame deflator has 71 numbers:
> deflator
[1] 0.9628929 0.9596746 0.9747274 0.9832532 0.9851884
[6] 0.9797770 0.9913502 1.0100561 1.0176906 1.0092516
[11] 1.0185932 1.0241043 1.0197975 1.0174097 1.0297328
[16] 1.0297071 1.0313232 1.0244618 1.0347808 1.0480411
[21] 1.0322142 1.0351968 1.0403264 1.0447121 1.0504402
[26] 1.0487097 1.0664664 1.0935239 1.0965951 1.1141851
[31] 1.1033155 1.1234482 1.1333870 1.1188136 1.1336276
[36] 1.1096461 1.1226584 1.1287245 1.1529588 1.1582911
[41] 1.1691221 1.1782178 1.1946234 1.1963453 1.1939922
[46] 1.2118189 1.2227960 1.2140535 1.2228828 1.2314258
[51] 1.2570788 1.2572214 1.2607763 1.2744415 1.2982076
[56] 1.3318808 1.3394186 1.3525902 1.3352815 1.3492751
[61] 1.3593859 1.3368135 1.3642940 1.3538567 1.3658135
[66] 1.3710932 1.3888638 1.4262185 1.4309707 1.4328823
[71] 1.4497201
This is a very tricky question for me.
I tried to do this using a for loop:
> d <- data.frame(deflator)
> for (i in 1:71) {d <-rbind(d,c(deflator ))}
I think I might be doing it wrong.
Why use data frames? This is a straightforward vector operation.
inflation = 100 * (deflator[-(1:4)] - deflator[1:67])/deflator[1:67]
I agree with @Fhnuzoag that your example suggests calculations on a numeric vector, not a data frame. Here's an additional way to do your calculations, taking advantage of the lag argument in the diff function (with indexes that match those in your question):
lagBy <- 4 # The number of indexes by which to lag
laggedDiff <- diff(deflator, lag = lagBy) # The numerator above
theDenom <- deflator[seq_len(length(deflator) - lagBy)] # The denominator above
inflation <- laggedDiff/theDenom
The first few results (as fractions; multiply by 100 for percentages) are:
head(inflation)
# [1] 0.02315470 0.02094710 0.01705379 0.02725941 0.03299085 0.03008297
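As a small check of the formula from the question, the sketch below applies it to the first eight deflator values listed above and expresses the result in percent. The variable names lag_by, current and previous are illustrative only.

```r
# First eight values of the deflator series from the question
deflator <- c(0.9628929, 0.9596746, 0.9747274, 0.9832532,
              0.9851884, 0.9797770, 0.9913502, 1.0100561)

lag_by <- 4
n <- length(deflator)

current  <- deflator[(lag_by + 1):n]   # deflator[i]     for i = 5..n
previous <- deflator[1:(n - lag_by)]   # deflator[i - 4] for i = 5..n

# Percentage change relative to the value four periods earlier
inflation <- 100 * (current - previous) / previous
round(inflation, 4)
```

The result has length(deflator) - lag_by elements, because the first four observations have no value four periods earlier to compare against.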

Range standardization (0 to 1) in R [duplicate]

Does anyone know of an R function to perform range standardization on a vector? I'm looking to transform variables to a scale between 0 and 1, while retaining rank order and the relative size of separation between values.
Just to be clear, I'm not looking to standardize variables by mean centering and scaling by the SD, as is done in the function scale().
I tried the functions mmnorm() and rangenorm() in the package 'dprep', but these don't seem to do the job.
s = sort(rexp(100))
range01 <- function(x){(x-min(x))/(max(x)-min(x))}
range01(s)
[1] 0.000000000 0.003338782 0.007572326 0.012192201 0.016055006 0.017161145
[7] 0.019949532 0.023839810 0.024421602 0.027197168 0.029889484 0.033039408
[13] 0.033783376 0.038051265 0.045183382 0.049560233 0.056941611 0.057552543
[19] 0.062674982 0.066001242 0.066420884 0.067689067 0.069247825 0.069432174
[25] 0.070136067 0.076340460 0.078709590 0.080393512 0.085591881 0.087540132
[31] 0.090517295 0.091026499 0.091251213 0.099218526 0.103236344 0.105724733
[37] 0.107495340 0.113332392 0.116103438 0.124050331 0.125596034 0.126599323
[43] 0.127154661 0.133392300 0.134258532 0.138253452 0.141933433 0.146748798
[49] 0.147490227 0.149960293 0.153126478 0.154275371 0.167701855 0.170160948
[55] 0.180313542 0.181834891 0.182554291 0.189188137 0.193807559 0.195903010
[61] 0.208902645 0.211308713 0.232942314 0.236135220 0.251950116 0.260816843
[67] 0.284090255 0.284150541 0.288498370 0.295515143 0.299408623 0.301264703
[73] 0.306817872 0.307853369 0.324882091 0.353241217 0.366800517 0.389474449
[79] 0.398838576 0.404266315 0.408936260 0.409198619 0.415165553 0.433960390
[85] 0.440690262 0.458692639 0.464027428 0.474214070 0.517224262 0.538532221
[91] 0.544911543 0.559945121 0.585390414 0.647030109 0.694095422 0.708385079
[97] 0.736486707 0.787250428 0.870874773 1.000000000
Adding ... to the arguments lets you pass na.rm = TRUE through to min() and max() if you want to omit missing values from the calculation (they will still be present in the result):
range01 <- function(x, ...){(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
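A short usage sketch of this version, with a made-up input vector v containing one missing value:

```r
range01 <- function(x, ...) {
  (x - min(x, ...)) / (max(x, ...) - min(x, ...))
}

v <- c(2, NA, 4, 6)

range01(v)                 # all NA, because min() and max() return NA
range01(v, na.rm = TRUE)   # 0.0  NA 0.5 1.0
```

One edge case to keep in mind: if all non-missing values are equal, the denominator is zero and the function returns NaN for them.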

Resources