Given a data frame with two columns, I'm looking to calculate a third column that contains the mean of every n rows while keeping the data frame intact.
Given the data frame
index<-1:20
V<-c(2,5,7,4,8,9,4,6,8,NA,3,4,5,6,0,4,5,7,5,3)
DF<-data.frame(index,V)
How could I create DF$mean, which would be the non-rolling mean of every 5 rows?
index V mean
1 2 5.2
2 5 5.2
3 7 5.2
4 4 5.2
5 8 5.2
6 9 6.75
7 4 6.75
8 6 6.75
9 8 6.75
10 NA 6.75
11 3 3.6
12 4 3.6
13 5 3.6
14 6 3.6
15 0 3.6
16 4 4.8
17 5 4.8
18 7 4.8
19 5 4.8
20 3 4.8
You can use colMeans and rep
DF$mean <- rep(colMeans(matrix(DF$V, nrow=5), na.rm=TRUE), each=5)
Or, alternatively, with ave:
DF$mean <- ave(DF$V,
               rep(1:(nrow(DF)/5), each=5),
               FUN=function(x){mean(x, na.rm=TRUE)})
which gives
> DF
index V mean
1 1 2 5.20
2 2 5 5.20
3 3 7 5.20
4 4 4 5.20
5 5 8 5.20
6 6 9 6.75
7 7 4 6.75
8 8 6 6.75
9 9 8 6.75
10 10 NA 6.75
11 11 3 3.60
12 12 4 3.60
13 13 5 3.60
14 14 6 3.60
15 15 0 3.60
16 16 4 4.80
17 17 5 4.80
18 18 7 4.80
19 19 5 4.80
20 20 3 4.80
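If nrow(DF) is not a multiple of 5, matrix() will recycle values (with a warning) and rep(..., each=5) will no longer line up. A sketch for that general case, using gl() to build the grouping factor without recycling:
n <- nrow(DF)
size <- 5
# gl() repeats each level `size` times and is truncated to length n,
# so the last group simply holds the leftover rows
grp <- gl(ceiling(n / size), size, length = n)
DF$mean <- ave(DF$V, grp, FUN = function(x) mean(x, na.rm = TRUE))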
I'm trying to determine how I would address this problem using R code.
Brief description of problem:
There is a minimum mass required to run an analysis of samples. Previously collected samples are often less than this mass, which means that multiple samples within an experimental treatment must be pooled to reach the minimum requirement. However, samples should be pooled as little as possible to maximize biological replicates.
For example, samples within Treatment A may have these masses: 8g, 7g, 5g, and 10g. Another Treatment, B, has samples with masses of 20g, 21g, 24g, and 29g.
If the minimum mass required for the analysis is 15g, then each sample in Treatment B can be analyzed without pooling. However, in Treatment A, samples would need to be pooled to reach this minimum.
It would be best to combine the 5g and 10g sample and the 8g and 7g sample, because this maximizes the number of possible pooled samples by keeping each pool's total mass as small as possible (i.e., if I combined the 5g and 8g and also the 10g and 7g, I would only have one possible pooled sample that meets the minimum). A sketch automating this pairing appears at the end of the answer below.
Data and R
The data is structured as in this example:
sample_id = c(1:24)
treatments = c(rep("A",8),rep("B",8),rep("C",8))
mass = round(c(runif(8,4,10),runif(8,5,13),runif(8,15,18)),1)
df = data.frame(cbind(sample_id,treatments,mass))
df$mass = as.numeric(df$mass)
df$sample_id = as.numeric(df$sample_id)
> df
sample_id treatments mass
1 1 A 8.6
2 2 A 8.9
3 3 A 7.5
4 4 A 4.5
5 5 A 7.9
6 6 A 4.5
7 7 A 7.7
8 8 A 6.6
9 9 B 5.0
10 10 B 12.0
11 11 B 7.4
12 12 B 8.4
13 13 B 12.2
14 14 B 10.0
15 15 B 6.5
16 16 B 12.1
17 17 C 15.6
18 18 C 16.5
19 19 C 16.8
20 20 C 17.5
21 21 C 15.6
22 22 C 17.6
23 23 C 18.0
24 24 C 15.8
So far my strategy has been:
library(dplyr)
# Step 1: separate out all samples that do not need to be pooled, for ease IRL
bigenough = df %>%
filter(mass >= 15)
#Keep df with all the samples that will need to be pooled
poolneeded = df %>%
filter(!(sample_id %in% bigenough$sample_id))
However, I am at a loss as to how best to pool the samples algorithmically. If anyone has any suggestions, that would be helpful. I usually use the tidyverse, if that helps...
Here is a first attempt. It is built around a function that takes one piece of a split data.frame (split by treatment) containing the data to be pooled.
In this function a new data frame, df2, is created that contains all pairwise combinations of sample_id. df2 is then left_join-ed twice with the data, the combined mass of the two samples is calculated, filtered for being >= 15, and ordered.
This function is then called by map after group_split. The result is all the possible allowed sample combinations.
library(tidyverse)
fff <- function(data) {
nn <- nrow(data)
mm <- combn(seq(data$sample_id[1], data$sample_id[nn]), 2) |> t()
df2 <- data.frame(mm) |> setNames(c("sample_id", "sample_id_2"))
ddf <- df2 |>
left_join(data) |> # nolint: object_usage_linter.
left_join(data, by = c("sample_id_2" = "sample_id", "treatments")) |>
mutate(sum = mass.x + mass.y) |> # nolint: object_usage_linter.
filter(sum >= 15) |>
arrange(sample_id, sum) # nolint: object_usage_linter.
return(ddf)
}
poolneeded |>
group_split(treatments) |>
map(fff)
#> Joining, by = "sample_id"
#> Joining, by = "sample_id"
#> [[1]]
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 1 2 A 9.4 6.2 15.6
#> 2 1 7 A 9.4 6.5 15.9
#> 3 1 4 A 9.4 6.8 16.2
#> 4 1 3 A 9.4 7.6 17.0
#> 5 1 8 A 9.4 8.9 18.3
#> 6 2 8 A 6.2 8.9 15.1
#> 7 3 8 A 7.6 8.9 16.5
#> 8 4 8 A 6.8 8.9 15.7
#> 9 7 8 A 6.5 8.9 15.4
#>
#> [[2]]
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 9 10 B 10.9 7.0 17.9
#> 2 9 14 B 10.9 7.2 18.1
#> 3 9 11 B 10.9 7.9 18.8
#> 4 9 16 B 10.9 8.5 19.4
#> 5 9 13 B 10.9 11.2 22.1
#> 6 9 12 B 10.9 11.7 22.6
#> 7 9 15 B 10.9 11.7 22.6
#> 8 10 16 B 7.0 8.5 15.5
#> 9 10 13 B 7.0 11.2 18.2
#> 10 10 12 B 7.0 11.7 18.7
#> 11 10 15 B 7.0 11.7 18.7
#> 12 11 14 B 7.9 7.2 15.1
#> 13 11 16 B 7.9 8.5 16.4
#> 14 11 13 B 7.9 11.2 19.1
#> 15 11 12 B 7.9 11.7 19.6
#> 16 11 15 B 7.9 11.7 19.6
#> 17 12 14 B 11.7 7.2 18.9
#> 18 12 16 B 11.7 8.5 20.2
#> 19 12 13 B 11.7 11.2 22.9
#> 20 12 15 B 11.7 11.7 23.4
#> 21 13 14 B 11.2 7.2 18.4
#> 22 13 16 B 11.2 8.5 19.7
#> 23 13 15 B 11.2 11.7 22.9
#> 24 14 16 B 7.2 8.5 15.7
#> 25 14 15 B 7.2 11.7 18.9
#> 26 15 16 B 11.7 8.5 20.2
Another way
This makes use of the same function fff as above, but it needs to be called with a subset of poolneeded - in the case below, the subset with treatments == "B". (The masses differ from the earlier output because the example data were regenerated with runif() and no seed was set.)
You then see a data frame of all possible allowed combinations for pooling and can choose a first pair. The remaining choices for a second pooling are then shown.
sel2 <- function(data) {
ddf <- fff(data)
cat(paste("\n", "These are your possibilities for the FIRST pooling", "\n"))
print(ddf)
ask <- askYesNo("Do You want to make first choice?")
if (ask) {
s_1 <- readline(prompt = "Enter sample 1: ")
s_2 <- readline(prompt = "Enter sample 2: ")
ddf2 <- ddf |> filter(
sample_id != s_1 & sample_id != s_2 &
sample_id_2 != s_1 & sample_id_2 != s_2 # nolint: object_usage_linter.
)
cat(paste0("\n", "These are your possibilities for the SECOND pooling", "\n"))
print(ddf2)
} else {
return()
}
}
poolneeded_b <- poolneeded |> filter(treatments == "B")
sel2(poolneeded_b)
#> Joining, by = "sample_id"
#>
#> These are your possibilities for the FIRST pooling
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 9 14 B 7.6 8.1 15.7
#> 2 9 12 B 7.6 8.7 16.3
#> 3 9 15 B 7.6 9.6 17.2
#> 4 9 10 B 7.6 10.3 17.9
#> 5 9 16 B 7.6 10.9 18.5
#> 6 9 13 B 7.6 12.3 19.9
#> 7 10 11 B 10.3 5.6 15.9
#> 8 10 14 B 10.3 8.1 18.4
#> 9 10 12 B 10.3 8.7 19.0
#> 10 10 15 B 10.3 9.6 19.9
#> 11 10 16 B 10.3 10.9 21.2
#> 12 10 13 B 10.3 12.3 22.6
#> 13 11 15 B 5.6 9.6 15.2
#> 14 11 16 B 5.6 10.9 16.5
#> 15 11 13 B 5.6 12.3 17.9
#> 16 12 14 B 8.7 8.1 16.8
#> 17 12 15 B 8.7 9.6 18.3
#> 18 12 16 B 8.7 10.9 19.6
#> 19 12 13 B 8.7 12.3 21.0
#> 20 13 14 B 12.3 8.1 20.4
#> 21 13 15 B 12.3 9.6 21.9
#> 22 13 16 B 12.3 10.9 23.2
#> 23 14 15 B 8.1 9.6 17.7
#> 24 14 16 B 8.1 10.9 19.0
#> 25 15 16 B 9.6 10.9 20.5
#>
#> Do You want to make first choice? (Yes/no/abbrechen) y
#> Enter sample 1: 9
#> Enter sample 2: 14
#>
#> These are your possibilities for the SECOND pooling
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 10 11 B 10.3 5.6 15.9
#> 2 10 12 B 10.3 8.7 19.0
#> 3 10 15 B 10.3 9.6 19.9
#> 4 10 16 B 10.3 10.9 21.2
#> 5 10 13 B 10.3 12.3 22.6
#> 6 11 15 B 5.6 9.6 15.2
#> 7 11 16 B 5.6 10.9 16.5
#> 8 11 13 B 5.6 12.3 17.9
#> 9 12 15 B 8.7 9.6 18.3
#> 10 12 16 B 8.7 10.9 19.6
#> 11 12 13 B 8.7 12.3 21.0
#> 12 13 15 B 12.3 9.6 21.9
#> 13 13 16 B 12.3 10.9 23.2
#> 14 15 16 B 9.6 10.9 20.5
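For completeness, the pairing idea from the question (combine the smallest sample with the largest so that pools just clear the minimum) can also be automated. Here is a minimal greedy sketch, assuming pools of exactly two samples; pair_samples is a hypothetical helper, not part of the answer above:
# Greedy two-pointer pass over the masses sorted ascending: if the current
# smallest + largest clear the minimum, pool them and move both pointers;
# otherwise the smallest sample cannot be pooled with anything and is skipped.
pair_samples <- function(mass, minimum = 15) {
  ord <- order(mass)
  lo <- 1
  hi <- length(ord)
  pairs <- list()
  while (lo < hi) {
    if (mass[ord[lo]] + mass[ord[hi]] >= minimum) {
      pairs[[length(pairs) + 1]] <- c(ord[lo], ord[hi])
      lo <- lo + 1
      hi <- hi - 1
    } else {
      lo <- lo + 1
    }
  }
  pairs
}
pair_samples(c(8, 7, 5, 10)) # Treatment A example: pools 5g+10g and 7g+8g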
Let's look at an example. In it, I have two observations repeated 4 times:
> data(anscombe)
> anscombe
x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.10 5.39 12.50
9 12 12 12 8 10.84 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89
If I want to count in how many of the four repetitions the first observation is greater than 10 and the second is greater than 9, I have at least two options:
First, reshape the table to long format, sum by group (in this example it is as if I had only an id), and reshape back to wide. I can do that, but it does not seem very efficient to me, and if I have too many columns, some indexed and some not, the reshaping code can get a bit cumbersome; a rough sketch of this route follows.
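For reference, a minimal sketch of that route using base reshape() (my rough attempt, splitting the x*/y* columns on their numeric suffix):
# Reshape to long, then count matches per original row id; the result
# could be merged back onto anscombe afterwards.
long <- reshape(anscombe, direction = "long",
                varying = names(anscombe), sep = "")
aggregate(cbind(new_var = x > 10 & y > 9) ~ id, data = long, FUN = sum)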
Second, I can do the following:
library(dplyr)
library(purrr)
anscombe %>%
mutate(new_var = rowSums(map_dfc(
1:4,
~ anscombe[[paste0("x",.)]] > 10 & anscombe[[paste0("y",.)]] > 9
), na.rm = T))
x1 x2 x3 x4 y1 y2 y3 y4 new_var
1 10 10 10 8 8.04 9.14 7.46 6.58 0
2 8 8 8 8 6.95 8.14 6.77 5.76 0
3 13 13 13 8 7.58 8.74 12.74 7.71 1
4 9 9 9 8 8.81 8.77 7.11 8.84 0
5 11 11 11 8 8.33 9.26 7.81 8.47 1
6 14 14 14 8 9.96 8.10 8.84 7.04 1
7 6 6 6 8 7.24 6.13 6.08 5.25 0
8 4 4 4 19 4.26 3.10 5.39 12.50 1
9 12 12 12 8 10.84 9.13 8.15 5.56 2
10 7 7 7 8 4.82 7.26 6.42 7.91 0
11 5 5 5 8 5.68 4.74 5.73 6.89 0
Great! It works. But since my real data has many more observations and conditions each time, I would like to make the line anscombe[[paste0("x",.)]] > 10 & anscombe[[paste0("y",.)]] > 9 shorter.
For example, with dplyr functions the data frame name can often be omitted. Maybe I would have to use the rlang function sym, as follows:
!!sym(paste0("x",.)) > 10 & !!sym(paste0("y",.)) > 9
I tried, but it didn't work. Maybe there is some function other than map_dfc in dplyr, purrr, or some other package that allows doing this more easily and efficiently. Do you have any ideas?
Thank you very much.
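As an aside, in more recent dplyr (>= 1.0.0, newer than the answers below assume) rowwise() plus c_across() keeps the condition short without any rlang. A sketch under that assumption:
library(dplyr)
# c_across() collects the x* (resp. y*) columns for the current row, so the
# comparison is done pairwise and sum() counts how many pairs pass both tests
anscombe %>%
  rowwise() %>%
  mutate(new_var = sum(c_across(starts_with("x")) > 10 &
                         c_across(starts_with("y")) > 9, na.rm = TRUE)) %>%
  ungroup()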
1) split/map2: Here is an option with split based on the names of the dataset. We remove the digit part at the end of the names, split the dataset into a list of data.frames, use map2 to pass the thresholds to compare against, reduce with &, and take the rowSums:
library(dplyr)
library(purrr)
library(stringr)
anscombe %>%
split.default(str_remove(names(.), "\\d+$")) %>%
map2(., c(10, 9), `>`) %>%
reduce(`&`) %>%
rowSums %>%
bind_cols(anscombe, new_var = .)
# x1 x2 x3 x4 y1 y2 y3 y4 new_var
#1 10 10 10 8 8.04 9.14 7.46 6.58 0
#2 8 8 8 8 6.95 8.14 6.77 5.76 0
#3 13 13 13 8 7.58 8.74 12.74 7.71 1
#4 9 9 9 8 8.81 8.77 7.11 8.84 0
#5 11 11 11 8 8.33 9.26 7.81 8.47 1
#6 14 14 14 8 9.96 8.10 8.84 7.04 1
#7 6 6 6 8 7.24 6.13 6.08 5.25 0
#8 4 4 4 19 4.26 3.10 5.39 12.50 1
#9 12 12 12 8 10.84 9.13 8.15 5.56 2
#10 7 7 7 8 4.82 7.26 6.42 7.91 0
#11 5 5 5 8 5.68 4.74 5.73 6.89 0
2) pivot_longer: Another option is pivot_longer from tidyr, which can take multiple sets of columns and reshape them to 'long' format:
library(dplyr)
library(tidyr) #1.0.0
library(tibble)
anscombe %>%
rownames_to_column('rn') %>%
pivot_longer(-rn, names_to = c(".value", "repl"),
             names_pattern = '(\\D+)(\\d+)') %>%
group_by(rn) %>%
summarise(new_var = sum(x > 10 & y > 9, na.rm = TRUE)) %>%
arrange(as.integer(rn)) %>%
select(-rn) %>%
bind_cols(anscombe, .)
# x1 x2 x3 x4 y1 y2 y3 y4 new_var
#1 10 10 10 8 8.04 9.14 7.46 6.58 0
#2 8 8 8 8 6.95 8.14 6.77 5.76 0
#3 13 13 13 8 7.58 8.74 12.74 7.71 1
#4 9 9 9 8 8.81 8.77 7.11 8.84 0
#5 11 11 11 8 8.33 9.26 7.81 8.47 1
#6 14 14 14 8 9.96 8.10 8.84 7.04 1
#7 6 6 6 8 7.24 6.13 6.08 5.25 0
#8 4 4 4 19 4.26 3.10 5.39 12.50 1
#9 12 12 12 8 10.84 9.13 8.15 5.56 2
#10 7 7 7 8 4.82 7.26 6.42 7.91 0
#11 5 5 5 8 5.68 4.74 5.73 6.89 0
3) base R (similar to the logic used for the first method): this makes it automatic, as we can split the data into chunks based on prefix similarity:
anscombe$new_var <- rowSums(Reduce(`&`, Map(`>`,
split.default(anscombe, sub("\\d+$", "", names(anscombe))), c(10, 9))))
4) unique substring prefix: Another option that makes use of prefix matching is to loop through the unique substring prefixes (this would be slower than split) and then apply:
rowSums(Reduce(`&`, Map(`>`, lapply(unique(sub("\\d+$", "",
names(anscombe))), function(nm)
anscombe[grep(nm, names(anscombe))]), c(10, 9))))
#[1] 0 0 1 0 1 1 0 1 2 0 0
You could try pmap from purrr to iterate over a data frame row-wise:
library(dplyr)
library(purrr)
library(stringr)
new_var <- pmap_dbl(anscombe, function(...){
row <- unlist(list(...))
x <- row[str_subset(names(row),"^x")]
y <- row[str_subset(names(row),"^y")]
sum((x > 10) & (y > 9))
})
anscombe[,"new_var"] <- new_var
> anscombe
x1 x2 x3 x4 y1 y2 y3 y4 new_var
1 10 10 10 8 8.04 9.14 7.46 6.58 0
2 8 8 8 8 6.95 8.14 6.77 5.76 0
3 13 13 13 8 7.58 8.74 12.74 7.71 1
4 9 9 9 8 8.81 8.77 7.11 8.84 0
5 11 11 11 8 8.33 9.26 7.81 8.47 1
6 14 14 14 8 9.96 8.10 8.84 7.04 1
7 6 6 6 8 7.24 6.13 6.08 5.25 0
8 4 4 4 19 4.26 3.10 5.39 12.50 1
9 12 12 12 8 10.84 9.13 8.15 5.56 2
10 7 7 7 8 4.82 7.26 6.42 7.91 0
11 5 5 5 8 5.68 4.74 5.73 6.89 0
Why not just
rowSums(anscombe[1:4] > 10 & anscombe[5:8] > 9)
# [1] 0 0 1 0 1 1 0 1 2 0 0
or
rowSums(anscombe[grep("^x", names(anscombe))] > 10 &
anscombe[grep("^y", names(anscombe))] > 9)
# [1] 0 0 1 0 1 1 0 1 2 0 0
On GitHub I was told about another option, similar to my attempts:
library(dplyr)
library(purrr)
anscombe %>%
mutate(new_var = rowSums(map_dfc(
1:4,
~ get(paste0("x",.)) > 10 & get(paste0("y",.)) > 9
), na.rm = T))
which gives me no problems regardless of the data format (whether it is Date or whatever), allows flexibility in writing the condition, shortens the script a bit, and is intuitive.
I have the data set:
Time a b
[1,] 0 5.06 9.60
[2,] 4 9.57 4.20
[3,] 8 1.78 3.90
[4,] 12 2.21 3.90
[5,] 16 4.10 5.84
[6,] 20 2.81 8.10
[7,] 24 2.70 1.18
[8,] 36 52.00 5.68
[9,] 48 NA 6.66
And I would like to reshape it to:
Time variable value
0 a 5.06
4 a 9.57
8 a 1.78
...
0 b 9.60
4 b 4.20
8 b 3.90
...
The code I am using is:
library(reshape2)
Time <- c(0,4,8,12,16,20,24,36,48)
a <- c(5.06,9.57,1.78,2.21,4.1,2.81,2.7,52,NA)
b <- c(9.6,4.2,3.9,3.9,5.84,8.1,1.18,5.68,6.66)
Mono <- cbind(Time,a,b)
mono <- melt(Mono,id="Time",na.rm=F)
Which produces:
Var1 Var2 value
1 1 Time 0.00
2 2 Time 4.00
3 3 Time 8.00
4 4 Time 12.00
5 5 Time 16.00
6 6 Time 20.00
7 7 Time 24.00
8 8 Time 36.00
9 9 Time 48.00
10 1 a 5.06
11 2 a 9.57
12 3 a 1.78
13 4 a 2.21
14 5 a 4.10
15 6 a 2.81
16 7 a 2.70
17 8 a 52.00
18 9 a NA
19 1 b 9.60
20 2 b 4.20
21 3 b 3.90
22 4 b 3.90
23 5 b 5.84
24 6 b 8.10
25 7 b 1.18
26 8 b 5.68
27 9 b 6.66
I'm sure it's a small error, but I can't figure it out. It's especially frustrating because I've used melt() without problems many times before. How can I fix the code to produce the table I'm looking for?
Thanks for your help!
Use tidyr::gather() to move from wide to long format.
> df <- data.frame(time = seq(0,20,5),
a = rnorm(5,0,1),
b = rnorm(5,0,1))
> library(tidyr)
> gather(df, variable, value, -time)
time variable value
1 0 a 1.5406529
2 5 a 1.5048055
3 10 a -1.1138529
4 15 a -0.1199039
5 20 a -1.7052608
6 0 b -1.1976938
7 5 b 0.7997127
8 10 b 1.1940454
9 15 b 0.5177981
10 20 b 0.6725264
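As a side note, the reason the original melt() call misbehaves is that cbind() returns a matrix, so melt() dispatches to its matrix method, which ignores the id argument. Coercing to a data frame first makes the original reshape2 code behave:
library(reshape2)
# melt.data.frame honours id; melt.matrix does not
mono <- melt(as.data.frame(Mono), id = "Time", na.rm = FALSE)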
This is my sample data:
date label type exdate x y z w
1 10 A 2 15 0.25 0.35 13.49
1 10 A 2 12.5 1.30 1.45 13.49
1 10 B 2 10 1.7 1.8 13.49
1 10 B 2 12.5 0.3 0.4 13.49
1 10 B 2 17.5 1.8 0.3 13.49
1 11 A 3 15 0.75 0.8 13.49
1 11 A 3 12.5 1.8 1.9 13.49
1 11 A 3 17.5 0.2 0.35 13.49
1 11 B 3 10 0.1 0.25 13.49
1 11 B 3 15 2.15 2.3 13.49
1 11 B 3 12.5 0.8 0.85 13.49
1 11 B 3 17.5 4.1 4.3 13.49
2 11 A 4 10 3.7 4 13.49
2 11 A 4 15 1 1.1 13.49
2 11 A 4 12.5 2.05 2.2 13.49
2 11 A 4 17.5 0.4 0.55 13.49
2 11 B 4 10 0.3 0.4 13.49
2 11 B 4 15 2.45 2.6 13.49
2 11 B 4 12.5 1.05 1.15 13.49
2 11 B 4 17.5 4.3 4.6 13.49
First, I will group my data set by c(date, label, exdate); within each group the variable 'type' contains both A and B. But I want the number of rows for type A and type B to be the same.
Filter conditions:
To end up with the same number of rows, the distance between x and w should be the same, or almost the same, for each matched pair of a type A row and a type B row.
For example:
type x w
A 2 3.5
A 3 3.5
A 4 3.5
B 1 3.5
B 2 3.5
# The output after filter
type x w
A 2 3.5 (pair with type_B ; x = 1)
A 3 3.5 (pair with type_B ; x = 2)
B 1 3.5
B 2 3.5
So, for the sample data above, the result I hope:
date label type exdate x y z w
1 10 A 2 15 0.25 0.35 13.49
1 10 A 2 12.5 1.30 1.45 13.49
1 10 B 2 12.5 0.3 0.4 13.49
1 10 B 2 17.5 1.8 0.3 13.49
1 11 A 3 15 0.75 0.8 13.49
1 11 A 3 12.5 1.8 1.9 13.49
1 11 A 3 17.5 0.2 0.35 13.49
1 11 B 3 15 2.15 2.3 13.49
1 11 B 3 12.5 0.8 0.85 13.49
1 11 B 3 17.5 4.1 4.3 13.49
2 11 A 4 10 3.7 4 13.49
2 11 A 4 15 1 1.1 13.49
2 11 A 4 12.5 2.05 2.2 13.49
2 11 A 4 17.5 0.4 0.55 13.49
2 11 B 4 10 0.3 0.4 13.49
2 11 B 4 15 2.45 2.6 13.49
2 11 B 4 12.5 1.05 1.15 13.49
2 11 B 4 17.5 4.3 4.6 13.49
How can I code this result? Should I insert an else-if condition inside filter()?
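One possible direction, assuming the interpretation above (rows of the larger type are matched to the nearest x of the smaller type within each group): a hypothetical greedy sketch, where match_types is a made-up helper and df holds the sample data shown above.
library(dplyr)
library(purrr)
match_types <- function(g) {
  a <- filter(g, type == "A")
  b <- filter(g, type == "B")
  if (nrow(a) >= nrow(b)) { big <- a; small <- b } else { big <- b; small <- a }
  keep <- integer(0)
  for (xs in sort(small$x)) {
    cand <- setdiff(order(abs(big$x - xs)), keep) # nearest unused row
    keep <- c(keep, cand[1])
  }
  bind_rows(small, big[sort(keep), ]) %>% arrange(type, x)
}
# The greedy match is order-dependent; sorting small$x happens to pick the
# exact 12.5-to-12.5 matches first in the sample data.
df %>%
  group_split(date, label, exdate) %>%
  map_dfr(match_types)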
Is there a way that one can set the number of digits of a full data frame to 2? Case in point, how would you set the number of digits to 2 for the following data using R?
Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes
It sounds like you're just looking for format, which has a data.frame method.
A small example:
mydf <- mydf2 <- data.frame(
Distance = c(21.4, 13.9, 23.9, 8.7, 241.8, 44.5),
Age = c(18, 17, 16, 18, 6, 17),
Height = c(3.3, 3.4, 2.9, 3.6, 0.7, 1.3),
Coning = c("Y", "Y", "Y", "N", "N", "Y"))
format(mydf, nsmall = 2)
# Distance Age Height Coning
# 1 21.40 18.00 3.30 Y
# 2 13.90 17.00 3.40 Y
# 3 23.90 16.00 2.90 Y
# 4 8.70 18.00 3.60 N
# 5 241.80 6.00 0.70 N
# 6 44.50 17.00 1.30 Y
As you should expect, if the data are integers, they won't be printed as decimals.
mydf2$Age <- as.integer(mydf2$Age)
format(mydf2, nsmall = 2)
# Distance Age Height Coning
# 1 21.40 18 3.30 Y
# 2 13.90 17 3.40 Y
# 3 23.90 16 2.90 Y
# 4 8.70 18 3.60 N
# 5 241.80 6 0.70 N
# 6 44.50 17 1.30 Y
An alternative to format is to globally set the option for the number of digits to be displayed:
options(digits=2)
This means that from that point forward all numerics will be printed with up to 2 significant digits (the default is 7).
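A quick demonstration of the effect (and of putting the default back afterwards):
old <- options(digits = 2) # save the previous setting while changing it
pi # prints 3.1
mean(c(21.4, 13.9, 23.9)) # prints 20 -- significant digits, not decimal places
options(old) # restore the default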