First and foremost, whether or not you have input, thank you for taking the time to view my question.
Let me break down what I am doing, the sample dataset, and the error.
What I currently have is data for several different IDs listing the dispersion per day (see below). I want to loop through the dates and add two columns to the data: a rolling mean column and a rolling standard deviation column.
The code I have written out so far is this:
library(zoo)
library(dplyr)

Testing1 <- function(dataset, k) {
  ops <- data.frame()
  for (i in unique(dataset$Date)) {
    ops <- dataset %>%
      mutate(rolling_mean = rollmean(dataset$Dispersion, k)) %>%
      mutate(rolling_std = rollapply(dataset$Dispersion, width = k, FUN = sd))
  }
  Results <<- ops
}
However, I get the following error:
Error in mutate_impl(.data, dots) :
Column rolling_mean must be length 30 (the number of rows) or one, not 26
I am assuming the row differential is due to my specifying a 5-day window for the rolling average, meaning it won't be calculated for the first 4 rows. But how do I tell R that it's OK to insert NAs in those rows? If you have any other solution, that would work as well. Please do help.
Here's a sample of the data:
Identifier Date Dispersion
1000 2/15/2018 0.390
1000 2/16/2018 0.664
1000 2/17/2018 0.526
1000 2/18/2018 0.933
1000 2/19/2018 0.009
1000 2/20/2018 0.987
1000 2/21/2018 0.517
1000 2/22/2018 0.641
1000 2/23/2018 0.777
1000 2/24/2018 0.613
1001 2/15/2018 0.617
1001 2/16/2018 0.234
1001 2/17/2018 0.303
1001 2/18/2018 0.796
1001 2/19/2018 0.359
1001 2/20/2018 0.840
1001 2/21/2018 0.291
1001 2/22/2018 0.699
1001 2/23/2018 0.882
1001 2/24/2018 0.467
1002 2/15/2018 0.042
1002 2/16/2018 0.906
1002 2/17/2018 0.077
1002 2/18/2018 0.156
1002 2/19/2018 0.350
1002 2/20/2018 0.060
1002 2/21/2018 0.457
1002 2/22/2018 0.770
1002 2/23/2018 0.433
1002 2/24/2018 0.366
You get this error because the length of the rolling means/SDs does not match the length of Dispersion. Simply add k - 1 NAs at the beginning of your means/SDs vectors.
Below is a working example. You can modify this based on your needs.
my_function <- function(df, k) {
  df %>%
    mutate(
      rolling_mean = c(rep(NA, k - 1), rollmean(Dispersion, k)),
      rolling_std = c(rep(NA, k - 1), rollapply(Dispersion, width = k, FUN = sd))
    )
}
For example, you may want to add group_by to compute these values for each Identifier:
my_function <- function(df, k) {
  df %>%
    group_by(Identifier) %>%
    mutate(
      rolling_mean = c(rep(NA, k - 1), rollmean(Dispersion, k)),
      rolling_std = c(rep(NA, k - 1), rollapply(Dispersion, width = k, FUN = sd))
    )
}
Update, following up on @G. Grothendieck's comment:
It turns out the zoo package already has comprehensive features for NA handling, so the code above can be refactored as:
my_function <- function(df, k) {
  df %>%
    mutate(
      rolling_mean = rollmeanr(Dispersion, k, fill = NA),
      rolling_std = rollapplyr(Dispersion, width = k, FUN = sd, fill = NA)
    )
}
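A minimal usage sketch, assuming the sample data from the question is stored in a data frame called mydata (a hypothetical name) and that dplyr and zoo are loaded:

library(dplyr)
library(zoo)

Results <- my_function(mydata, k = 5)
head(Results)  # with a 5-day window, the first four rolling values are NA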
I'd take a look at tibbletime.
Assuming your data frame is named mydata and the Date column is a character: first convert the Date, then convert to a time-aware tibble:
library(dplyr)
library(tibbletime)
mydata <- mydata %>%
  mutate(Date = as.Date(Date, "%m/%d/%Y")) %>%
  as_tbl_time(index = Date)
Now you can define functions for rolling mean and sd:
mean_5 <- rollify(mean, window = 5)
sd_5 <- rollify(sd, window = 5)
mydata %>%
  mutate(rolling_mean = mean_5(Dispersion),
         rolling_std = sd_5(Dispersion))
# A time tibble: 30 x 5
# Index: Date
Identifier Date Dispersion rolling_mean rolling_std
<int> <date> <dbl> <dbl> <dbl>
1 1000 2018-02-15 0.39 NA NA
2 1000 2018-02-16 0.664 NA NA
3 1000 2018-02-17 0.526 NA NA
4 1000 2018-02-18 0.933 NA NA
5 1000 2018-02-19 0.009 0.504 0.342
6 1000 2018-02-20 0.987 0.624 0.393
7 1000 2018-02-21 0.517 0.594 0.394
8 1000 2018-02-22 0.641 0.617 0.393
9 1000 2018-02-23 0.777 0.586 0.367
10 1000 2018-02-24 0.613 0.707 0.182
# ... with 20 more rows
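Note that the rolling window above runs over the whole data set, so it spans the boundary between Identifiers. If that is not what you want, here is a sketch of the grouped version (same rollify helpers, just with group_by() added):

mydata %>%
  group_by(Identifier) %>%
  mutate(rolling_mean = mean_5(Dispersion),
         rolling_std = sd_5(Dispersion))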
Related
Suppose I am using panel data: for each individual and time, there is an observation of a numerical variable. I want to apply a function to this numerical variable but this function outputs a vector of numbers. I'd like to apply this function over the observations of each individual and store the resulting vector as columns of a new dataframe.
Example:
TICKER OFTIC CNAME ANNDATS_ACT ACTUAL
<chr> <chr> <chr> <date> <dbl>
1 0001 EPE EP ENGR CORP 2019-05-08 -0.15
2 0004 ACSF AMERICAN CAPITAL 2014-08-04 0.29
3 000R CRCM CARECOM 2018-02-27 0.32
4 000V EIGR EIGER 2018-05-11 -0.84
5 000Y RARE ULTRAGENYX 2016-02-25 -1.42
6 000Z BIOC BIOCEPT 2018-03-28 -54
7 0018 EGLT EGALET 2016-03-08 -0.28
8 001A SESN SESEN BIO 2021-03-15 -0.11
9 001C ARGS ARGOS 2017-03-16 -7
10 001J KN KNOWLES 2021-02-04 0.38
For each TICKER, I will consider the time-series implied by ACTUAL and compute the autocorrelation function. I defined the following wrapper to perform the operation:
my_acf <- function(x, lag = NULL){
  acf_vec <- acf(x, lag.max = lag, plot = FALSE, na.action = na.contiguous)$acf
  acf_vec <- as.vector(acf_vec)[-1]
  return(acf_vec)
}
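As a quick sanity check, the wrapper returns one autocorrelation per lag (lag 0 excluded); here is a sketch on a toy series (hypothetical data, not the ACTUAL series):

set.seed(1)
my_acf(rnorm(50), lag = 3)  # numeric vector of length 3: autocorrelations at lags 1-3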
If the desired maximum lag is, say, 3, I'd like to create another dataset with 4 columns: TICKER and the corresponding first 3 autocorrelations of the associated series of ACTUAL observations.
My solution was:
max_lag = 3
autocorrs <- final_sample %>%
  group_by(TICKER) %>%
  filter(!all(is.na(ACTUAL))) %>%
  summarise(rho = my_acf(ACTUAL, lag = max_lag)) %>%
  mutate(order = row_number()) %>%
  pivot_wider(id_cols = TICKER, values_from = rho, names_from = order, names_prefix = "rho_")
This indeed provides the desired output:
TICKER rho_1 rho_2 rho_3
<chr> <dbl> <dbl> <dbl>
1 0001 0.836 0.676 0.493
2 0004 0.469 -0.224 -0.366
3 000R 0.561 0.579 0.327
4 000V 0.634 0.626 0.604
5 000Y 0.370 0.396 0.117
6 000Z 0.476 0.454 0.382
7 0018 0.382 -0.0170 -0.278
8 001A 0.330 0.316 0.0944
9 001C 0.727 0.590 0.400
10 001J 0.281 -0.308 -0.0343
My question is: how can one perform this operation without pivot_wider and the manual creation of the order column? The summarise verb creates a single column that stores the autocorrelations sequentially for each TICKER. Is there a way to force summarise to create different columns for the different outputs a given function may provide when applied to, say, the ACTUAL series?
Good morning,
I am using the "epiR" package to assess test accuracy.
https://search.r-project.org/CRAN/refmans/epiR/html/epi.tests.html
library(epiR)
library(dplyr)
library(tidyr)

## Generate a data set listing test results and true disease status:
dis <- c(rep(1, times = 744), rep(0, times = 842))
tes <- c(rep(1, times = 670), rep(0, times = 74),
         rep(1, times = 202), rep(0, times = 640))
dat.df02 <- data.frame(dis, tes)

tmp.df02 <- dat.df02 %>%
  mutate(dis = factor(dis, levels = c(1,0), labels = c("Dis+","Dis-"))) %>%
  mutate(tes = factor(tes, levels = c(1,0), labels = c("Test+","Test-"))) %>%
  group_by(tes, dis) %>%
  summarise(n = n())
tmp.df02

## View the data in conventional 2 by 2 table format:
pivot_wider(tmp.df02, id_cols = c(tes), names_from = dis, values_from = n)

rval.tes02 <- epi.tests(tmp.df02, method = "exact", digits = 2,
                        conf.level = 0.95)
summary(rval.tes02)
The data type is listed as "epi.tests". I would like to export the summary statistics to a table (e.g. gtsummary or flextable).
As summary is a base R function, I am struggling to do this. Can anyone help? Thank you very much.
The epi.tests function has been edited so it writes the results out to a data frame (instead of a list). This will simplify export to gtsummary or flextable. epiR version 2.0.50 to be uploaded to CRAN shortly.
This was not quite as straightforward as I expected.
It appears that summary(), when applied to an object x of class epi.tests, simply prints x$detail. x$detail is a list of data frames with statistic names as row names. That last bit makes things slightly more complicated than they would otherwise have been.
A potential tidyverse solution is
library(tidyverse)
lapply(
  names(rval.tes02$detail),
  function(x) {
    as_tibble(rval.tes02$detail[[x]]) %>%
      add_column(statistic = x, .before = 1)
  }
) %>%
  bind_rows()
# A tibble: 18 × 4
statistic est lower upper
<chr> <dbl> <dbl> <dbl>
1 ap 0.550 0.525 0.574
2 tp 0.469 0.444 0.494
3 se 0.901 0.877 0.921
4 sp 0.760 0.730 0.789
5 diag.ac 0.826 0.806 0.844
6 diag.or 28.7 21.5 38.2
7 nndx 1.51 1.41 1.65
8 youden 0.661 0.607 0.710
9 pv.pos 0.768 0.739 0.796
10 pv.neg 0.896 0.872 0.918
11 lr.pos 3.75 3.32 4.24
12 lr.neg 0.131 0.105 0.163
13 p.rout 0.450 0.426 0.475
14 p.rin 0.550 0.525 0.574
15 p.tpdn 0.240 0.211 0.270
16 p.tndp 0.0995 0.0789 0.123
17 p.dntp 0.232 0.204 0.261
18 p.dptn 0.104 0.0823 0.128
This is a tibble containing the same information as summary(rval.tes02), which you should be able to pass on to gtsummary or flextable. Unusually, the broom package doesn't have a tidy() verb for epi.tests objects.
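For instance, a minimal sketch of the flextable hand-off, assuming the flextable package is installed and the tibble built above is assigned to a (hypothetical) object called tidy_tests:

library(tidyverse)
library(flextable)

# tidy_tests: the tibble built above, assigned to a name for reuse
tidy_tests <- lapply(
  names(rval.tes02$detail),
  function(x) {
    as_tibble(rval.tes02$detail[[x]]) %>%
      add_column(statistic = x, .before = 1)
  }
) %>%
  bind_rows()

flextable(tidy_tests)  # basic table; style further with flextable's formatting functions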
I have a large data frame (34,707,060 obs) that consists of accelerometer data for x, y, z. Data was collected at 30 Hz, meaning I have 30 rows of data for each second. See the head of my data below.
Timestamp Accelerometer.X Accelerometer.Y Accelerometer.Z
1 30/06/2021 08:00:00.000 -1.109 -1.559 1.508
2 30/06/2021 08:00:00.034 -0.688 -1.043 0.891
3 30/06/2021 08:00:00.067 -0.363 -0.531 0.555
4 30/06/2021 08:00:00.100 -0.164 -0.496 0.816
5 30/06/2021 08:00:00.134 0.063 -0.363 0.496
6 30/06/2021 08:00:00.167 -0.098 -0.992 0.227
I would like to compress this dataset to have data for every second, by calculating the mean, minimum, maximum, sum and standard deviation of every 30 rows. I would like to keep the Timestamp with date and time.
I have tried to apply the following code to my dataframe, which I copied from the answer of det on the question here:
df %>%
  group_by(group = row_number() %/% 30) %>%
  dplyr::summarize(
    Timestamp = first(Timestamp),
    X_mean = mean(Accelerometer.X),
    Y_mean = mean(Accelerometer.Y),
    Z_mean = mean(Accelerometer.Z),
    X_min = min(Accelerometer.X),
    Y_min = min(Accelerometer.Y),
    Z_min = min(Accelerometer.Z),
    X_max = max(Accelerometer.X),
    Y_max = max(Accelerometer.Y),
    Z_max = max(Accelerometer.Z),
    X_sum = sum(Accelerometer.X),
    Y_sum = sum(Accelerometer.Y),
    Z_sum = sum(Accelerometer.Z),
    X_sd = sd(Accelerometer.X),
    Y_sd = sd(Accelerometer.Y),
    Z_sd = sd(Accelerometer.Z)
  )
Unfortunately, this does not give me the result I want (see below).
# A tibble: 5 × 5
group Timestamp X_mean Y_mean Z_mean
<dbl> <chr> <dbl> <dbl> <dbl>
1 0 30/06/2021 08:00:00.000 -0.576 -0.989 0.431
2 1 30/06/2021 08:00:00.967 -0.240 -1.06 0.270
3 2 30/06/2021 08:00:01.967 -0.287 -0.821 0.390
4 3 30/06/2021 08:00:02.967 -0.364 -0.830 0.337
5 4 30/06/2021 08:00:03.967 -0.332 -0.961 -0.086
The way it looks to me now, it first calculates all the values for the first 30 rows, and then includes these calculated values as the first row of 30 in the next calculation. So rather than calculating the compressed values for rows 1:30, 31:60, 61:90, etc., it keeps applying the code to lines 1:30.
I am not sure how to adjust the code to calculate the mean, min, max, sum, sd for every 30 rows (so 1:30, 31:60 etc.). Would really appreciate some help.
You can use dmy_hms to convert your Timestamp column to a date-time object and floor_date to round down to the second. Then I'd rather use across here to compute the mean, min, max and sd:
library(lubridate)
library(dplyr)
dat %>%
  group_by(sec = floor_date(dmy_hms(Timestamp), "second")) %>%
  summarise(Timestamp = first(Timestamp),
            across(-Timestamp,
                   list(mean = mean, min = min, max = max, sd = sd),
                   .names = "{.col}_{.fn}"))
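The question also asks for the per-second sum; a sketch along the same lines (still assuming the data frame is called dat) simply adds sum to the list of functions:

library(lubridate)
library(dplyr)

dat %>%
  group_by(sec = floor_date(dmy_hms(Timestamp), "second")) %>%
  summarise(Timestamp = first(Timestamp),
            across(-Timestamp,
                   list(mean = mean, min = min, max = max, sum = sum, sd = sd),
                   .names = "{.col}_{.fn}"))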
Task
I am trying to import tables situated in a single Excel sheet into an R object as efficiently as possible (a list will be fine, as I can take the rest of the calculations from there).
Nuance
The tables are actually Excel ranges, not Excel tables, but they are structured and look like tables. Here is an example of an Excel range that should be imported as a table in R:
Ranges (in table form) are not of the same length and can be situated anywhere in the same sheet.
Reproducible Example
Here you can find a toy example (.xlsx file) to play with:
What I have tried
Here is the code I have written to import Excel tables into R. This is an inefficient method, as it requires converting all Excel ranges into tables before running this code to import them into a list in R:
library(purrr)
library(XLConnect)
wb <- loadWorkbook("example.xlsx")
tables <- map(1:100, function(x) {
  tryCatch(readTable(wb,
                     sheet = "Sheet1",
                     table = paste0("Table", x)),
           error = function(e) NA)
})
Question
Is there a better (more efficient) way of importing ranges in one Excel sheet into an R structure, taking the Excel file as given and running all computations/transformations in R? Any packages are welcome!
Thank you very much in advance.
I'm not sure whether this is the best way, but I wrote some utility functions to solve a similar problem in one of my projects. You can see those functions here.
The logic behind the splits is that whenever a row or column contains only NA, a split is created on that row or column, and this process is repeated a set number of times (the complexity argument).
Anyway, if you load all the functions I wrote, you can use the code below:
Read Data
library(tidyverse)
table_raw<- readxl::read_excel("example.xlsx",col_names = FALSE,col_types = "text")
Display data Shape
# This is a custom function I wrote
display_table_shape(table_raw)
Split data into separate data frames.
split_table <- table_raw %>%
  split_df(complexity = 2) # another custom function I wrote
After the original data frame is split, you can do more processing using a for loop or map functions.
Data Cleaning
map(split_table, function(df){
  df <- df[-1,]
  set_1row_colname(df) %>% # another function I wrote
    mutate_all(as.numeric)
})
Result
[[1]]
# A tibble: 8 x 4
aa bb cc dd
<dbl> <dbl> <dbl> <dbl>
1 0.197 0.321 0.265 0.0748
2 0.239 0.891 0.0308 0.453
3 0.300 0.779 0.780 0.213
4 0.132 0.138 0.612 0.0362
5 0.834 0.697 0.879 0.571
6 0.956 0.807 0.741 0.936
7 0.359 0.536 0.0902 0.764
8 0.403 0.315 0.593 0.840
[[2]]
# A tibble: 4 x 4
aa bb cc dd
<dbl> <dbl> <dbl> <dbl>
1 0.136 0.347 0.603 0.542
2 0.790 0.672 0.0808 0.795
3 0.589 0.338 0.837 0.00968
4 0.513 0.766 0.553 0.189
[[3]]
# A tibble: 8 x 4
aa bb cc dd
<dbl> <dbl> <dbl> <dbl>
1 0.995 0.105 0.106 0.530
2 0.372 0.306 0.190 0.609
3 0.508 0.987 0.585 0.233
4 0.0800 0.851 0.215 0.761
5 0.471 0.603 0.740 0.106
6 0.395 0.0808 0.571 0.266
7 0.908 0.739 0.245 0.141
8 0.534 0.313 0.663 0.824
[[4]]
# A tibble: 14 x 4
aa bb cc dd
<dbl> <dbl> <dbl> <dbl>
1 0.225 0.993 0.0382 0.412
2 0.280 0.202 0.823 0.664
3 0.423 0.616 0.377 0.857
4 0.289 0.298 0.0418 0.410
5 0.919 0.932 0.882 0.668
6 0.568 0.561 0.600 0.832
7 0.341 0.210 0.351 0.0863
8 0.757 0.962 0.484 0.677
9 0.275 0.0845 0.824 0.571
10 0.187 0.512 0.884 0.612
11 0.706 0.311 0.00610 0.463
12 0.906 0.411 0.215 0.377
13 0.629 0.317 0.0975 0.312
14 0.144 0.644 0.906 0.353
The functions you need to load
# utility function to get rle as a named vector
vec_rle <- function(v){
  temp <- rle(v)
  out <- temp$values
  names(out) <- temp$lengths
  return(out)
}

# utility function to map table with their columns/rows in a bigger table
make_df_index <- function(v){
  table_rle <- vec_rle(v)
  divide_points <- c(0, cumsum(names(table_rle)))
  table_index <- map2((divide_points + 1)[1:length(divide_points)-1],
                      divide_points[2:length(divide_points)],
                      ~.x:.y)
  return(table_index[table_rle])
}

# split a large table in one direction if there are blank columns or rows
split_direction <- function(df, direction = "col"){
  if(direction == "col"){
    col_has_data <- unname(map_lgl(df, ~!all(is.na(.x))))
    df_mapping <- make_df_index(col_has_data)
    out <- map(df_mapping, ~df[, .x])
  } else if(direction == "row"){
    row_has_data <- df %>%
      mutate_all(~!is.na(.x)) %>%
      as.matrix() %>%
      apply(1, any)
    df_mapping <- make_df_index(row_has_data)
    out <- map(df_mapping, ~df[.x, ])
  }
  return(out)
}
# split a large table into smaller tables if there are blank columns or rows
# if tables are still not fully separated, increase the complexity argument
split_df <- function(df, showWarning = TRUE, complexity = 1){
  if(showWarning){
    warning("Please don't use first row as column names.")
  }
  out <- split_direction(df, "col")
  for(i in 1:complexity){
    out <- out %>%
      map(~split_direction(.x, "row")) %>%
      flatten() %>%
      map(~split_direction(.x, "col")) %>%
      flatten()
  }
  return(out)
}
# display the rough shape of table in a sheet with multiple tables
display_table_shape <- function(df){
  colnames(df) <- 1:ncol(df)
  out <- df %>%
    map_df(~as.numeric(!is.na(.x))) %>%
    gather(key = "x", value = "value") %>%
    mutate(x = as.numeric(x)) %>%
    group_by(x) %>%
    mutate(y = -row_number()) %>%
    ungroup() %>%
    filter(value == 1) %>%
    ggplot(aes(x = x, y = y, fill = value)) +
    geom_tile(fill = "skyblue3") +
    scale_x_continuous(position = "top") +
    theme_void() +
    theme(legend.position = "none",
          panel.border = element_rect(colour = "black", fill = NA, size = 2))
  return(out)
}

# set first row as column names for a data frame and remove the original first row
set_1row_colname <- function(df){
  colnames(df) <- as.character(df[1,])
  out <- df[-1,]
  return(out)
}
I had a similar problem; this is how I solved it. Note that it loses some of the benefit of yusuzech's answer, in that it requires you to specify the ranges of interest. On the flip side, it may be more efficient to code and more adaptable to different situations.
# specify the ranges you want to import from the excel sheet
v_ranges <- c("A3:F54", "H3:M54", "O3:T54", "V3:AA54", "AC3:AH54")
# specify the names of the dataframes
v_names <- c("21Q3", "21Q2", "21Q1", "20Q4", "20Q3")
# specify sheet and path
v_path_file <- "my_path/my_excel_file.xlsx"
v_sheet <- "my_sheet_name"
# define the import function, with v_ranges as your ranges of interest, v_path_file as the excel workbook you want to import from, and v_sheet the sheet name of the file
f_import_excel_by_range <- function(.x) {
  janitor::clean_names(
    readxl::read_excel(v_path_file,
                       sheet = v_sheet,
                       range = .x,
                       col_names = TRUE, na = c(" ", "NA"), trim_ws = TRUE, skip = 1)
  )
}

my_file_name <-
  purrr::map(v_ranges, f_import_excel_by_range) %>%
  purrr::set_names(paste0("my_file_name_", v_names))

# extract databases to the environment
base::invisible(base::list2env(my_file_name, .GlobalEnv))
I believe this function can be improved by including the path and file as well as the sheet name in the function. If I sleuth that out, I will edit. Feedback welcome.
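For what it's worth, here is a hedged sketch of that improvement: the same function, but with the path, sheet, and range passed as arguments rather than read from the global environment (readxl::read_excel() accepts all three directly; the argument names here are illustrative):

f_import_excel_by_range <- function(range, path, sheet) {
  janitor::clean_names(
    readxl::read_excel(path,
                       sheet = sheet,
                       range = range,
                       col_names = TRUE, na = c(" ", "NA"), trim_ws = TRUE)
  )
}

# usage: extra arguments are passed through purrr::map()
my_file_name <-
  purrr::map(v_ranges, f_import_excel_by_range, path = v_path_file, sheet = v_sheet) %>%
  purrr::set_names(paste0("my_file_name_", v_names))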
I have a data set similar to the following with 1 column and 60 rows:
value
1 0.0423
2 0.0388
3 0.0386
4 0.0342
5 0.0296
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
40 0.1424
.
60 -0.0312
I want to reorder the rows so that certain conditions are met. For example, one condition could be: sum(df$value[4:7]) > 0.1000 & sum(df$value[4:7]) < 0.1100
The reordered data set could look like this, for example:
value
1 0.0423
2 0.0388
3 0.0386
4 0.1312
5 -0.0312
6 0.0276
7 0.0246
8 0.0239
9 0.0234
10 0.0214
.
.
.
60 0.0342
What I tried was using repeat and sample as in the following:
repeat{
  df1 <- as_tibble(sample(sdf$value, replace = TRUE))
  if (sum(df1$value[4:7]) > 0.1000 & sum(df1$value[4:7]) < 0.1100) break
}
Unfortunately, this method takes quite some time, and I was wondering whether there is a faster way to reorder rows based on mathematical conditions such as sum or prod.
Here's a quick implementation of the hill-climbing method I outlined in my comment. I've had to slightly reframe the desired condition as "distance of sum(x[4:7]) from 0.105" to make it continuous, although you can still use the exact condition when doing the check that all requirements are satisfied. The benefit is that you can add extra conditions to the distance function easily.
# Using same example data as Jon Spring
set.seed(42)
vs = rnorm(60, 0.05, 0.08)
get_distance = function(x) {
  distance = abs(sum(x[4:7]) - 0.105)
  # Add to the distance with further conditions if needed
  distance
}

max_attempts = 10000
best_distance = Inf
swaps_made = 0

for (step in 1:max_attempts) {
  # Copy the vector and swap two random values
  new_vs = vs
  swap_inds = sample.int(length(vs), 2, replace = FALSE)
  new_vs[swap_inds] = rev(new_vs[swap_inds])
  # Keep the new vector if the distance has improved
  new_distance = get_distance(new_vs)
  if (new_distance < best_distance) {
    vs = new_vs
    best_distance = new_distance
    swaps_made = swaps_made + 1
  }
  complete = (sum(vs[4:7]) < 0.11) & (sum(vs[4:7]) > 0.1)
  if (complete) {
    print(paste0("Solution found in ", step, " steps"))
    break
  }
}
sum(vs[4:7])
There's no real guarantee that this method will reach a solution, but I often try this kind of basic hill-climbing when I'm not sure if there's a "smart" way to approach a problem.
Here's an approach using combn from base R, and then filtering using dplyr. (I'm sure there's a way w/o it but my base-fu isn't there yet.)
With only 4 numbers from a pool of 60, there are "only" 488k different combinations (ignoring order; =60*59*58*57/4/3/2), so it's quick to brute force in about a second.
# Make a vector of 60 numbers like your example
set.seed(42)
my_nums <- rnorm(60, 0.05, 0.08);
all_combos <- combn(my_nums, 4) # Get all unique combos of 4 numbers
library(tidyverse)
combos_table <- all_combos %>%
  t() %>%
  as_tibble() %>%
  mutate(sum = V1 + V2 + V3 + V4) %>%
  filter(sum > 0.1, sum < 0.11)
> combos_table
# A tibble: 8,989 x 5
V1 V2 V3 V4 sum
<dbl> <dbl> <dbl> <dbl> <dbl>
1 0.160 0.00482 0.0791 -0.143 0.100
2 0.160 0.00482 0.101 -0.163 0.103
3 0.160 0.00482 0.0823 -0.145 0.102
4 0.160 0.00482 0.0823 -0.143 0.104
5 0.160 0.00482 -0.0611 -0.00120 0.102
6 0.160 0.00482 -0.0611 0.00129 0.105
7 0.160 0.00482 0.0277 -0.0911 0.101
8 0.160 0.00482 0.0277 -0.0874 0.105
9 0.160 0.00482 0.101 -0.163 0.103
10 0.160 0.00482 0.0273 -0.0911 0.101
# … with 8,979 more rows
This says that in this example, there are about 9000 different sets of 4 numbers from my sequence which meet the criteria. We could pick any of these and put them in positions 4-7 to meet your requirement.
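For completeness, a sketch of that final step: take the first qualifying combination and place its four values in positions 4-7 (this assumes the values in my_nums are unique, which holds for the rnorm() draw above):

chosen <- unlist(combos_table[1, c("V1", "V2", "V3", "V4")])  # one qualifying set of 4 values
remaining <- my_nums[-match(chosen, my_nums)]                 # drop those 4 from the pool
reordered <- append(remaining, chosen, after = 3)             # insert them as positions 4-7
sum(reordered[4:7])                                           # lies between 0.1 and 0.11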