Setting the number of digits to a given value - r

Is there a way that one can set the number of digits of a full data frame to 2? Case in point, how would you set the number of digits to 2 for the following data using R?
Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes

It sounds like you're just looking for format, which has a data.frame method.
A small example:
mydf <- mydf2 <- data.frame(
  Distance = c(21.4, 13.9, 23.9, 8.7, 241.8, 44.5),
  Age = c(18, 17, 16, 18, 6, 17),
  Height = c(3.3, 3.4, 2.9, 3.6, 0.7, 1.3),
  Coning = c("Y", "Y", "Y", "N", "N", "Y"))
format(mydf, nsmall = 2)
# Distance Age Height Coning
# 1 21.40 18.00 3.30 Y
# 2 13.90 17.00 3.40 Y
# 3 23.90 16.00 2.90 Y
# 4 8.70 18.00 3.60 N
# 5 241.80 6.00 0.70 N
# 6 44.50 17.00 1.30 Y
As you should expect, if the data are integers, they won't be printed as decimals.
mydf2$Age <- as.integer(mydf2$Age)
format(mydf2, nsmall = 2)
# Distance Age Height Coning
# 1 21.40 18 3.30 Y
# 2 13.90 17 3.40 Y
# 3 23.90 16 2.90 Y
# 4 8.70 18 3.60 N
# 5 241.80 6 0.70 N
# 6 44.50 17 1.30 Y

An alternative to format is to globally set the option for the number of digits to be displayed:
options(digits=2)
This means that, from that point forward, all numerics will be printed with up to 2 significant digits (the default is 7). Note that digits controls significant digits, not decimal places.
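For illustration, here is a minimal sketch of what the global option changes (the printed values below are what R shows for these particular numbers; restore the default afterwards if you only want the effect temporarily):
options(digits = 2)
pi
# [1] 3.1
123.4567
# [1] 123
options(digits = 7)  # back to the default
If you want to change the stored values rather than just how they are printed, round() is the usual tool, e.g. mydf$Distance <- round(mydf$Distance, 2).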

Related

Algorithm to minimize sample pooling to reach minimum mass

I am trying to determine how to address this problem using R code.
Brief description of problem:
There is a minimum mass required to run an analysis of samples. Previously collected samples are often less than this mass, which means that multiple samples within an experimental treatment must be pooled to reach the minimum requirement. However, samples should be pooled as little as possible to maximize biological replicates.
For example, samples within Treatment A may have these masses: 8g, 7g, 5g, and 10g. Another Treatment, B, has samples with masses of 20g, 21g, 24g, and 29g.
If the minimum mass required for the analysis is 15g, then each sample in Treatment B can be analyzed without pooling. However, in Treatment A, samples would need to be pooled to reach this minimum.
It would be best to combine the 5 g and 10 g samples and the 8 g and 7 g samples, because this maximizes the number of possible pooled samples by keeping each pool's total mass as small as possible (i.e., if I combined the 5 g and 8 g samples and the 10 g and 7 g samples, I would only have one possible pooled sample that meets the minimum).
Data and R
The data is structured as this example follows:
sample_id = c(1:24)
treatments = c(rep("A",8),rep("B",8),rep("C",8))
mass = round(c(runif(8,4,10),runif(8,5,13),runif(8,15,18)),1)
df = data.frame(cbind(sample_id,treatments,mass))
df$mass = as.numeric(df$mass)
df$sample_id = as.numeric(df$sample_id)
> df
sample_id treatments mass
1 1 A 8.6
2 2 A 8.9
3 3 A 7.5
4 4 A 4.5
5 5 A 7.9
6 6 A 4.5
7 7 A 7.7
8 8 A 6.6
9 9 B 5.0
10 10 B 12.0
11 11 B 7.4
12 12 B 8.4
13 13 B 12.2
14 14 B 10.0
15 15 B 6.5
16 16 B 12.1
17 17 C 15.6
18 18 C 16.5
19 19 C 16.8
20 20 C 17.5
21 21 C 15.6
22 22 C 17.6
23 23 C 18.0
24 24 C 15.8
So far my strategy has been:
# Step 1: separate out all samples that do not need to be pooled, for ease IRL
bigenough = df %>%
  filter(mass >= 15)
# Keep a df with all the samples that will need to be pooled
poolneeded = df %>%
  filter(!(sample_id %in% bigenough$sample_id))
However, I am at a loss as to how best to pool the samples algorithmically. If anyone has any suggestions, that would be helpful. I usually use the tidyverse, if that helps...
Here is a first attempt. It consists of a function that takes one piece of a split data.frame (split by treatment) containing the data to be pooled.
Inside this function a new data frame, df2, is created that contains all pairwise combinations of sample_id. df2 is then left_join-ed twice with the data, the sum of the two samples' masses is calculated, the rows are filtered to sums >= 15, and the result is ordered.
This function is then called with map after group_split. The result is all of the allowed sample combinations.
library(tidyverse)
fff <- function(data) {
  nn <- nrow(data)
  # all pairwise combinations of the sample_ids in this treatment
  mm <- combn(seq(data$sample_id[1], data$sample_id[nn]), 2) |> t()
  df2 <- data.frame(mm) |> setNames(c("sample_id", "sample_id_2"))
  ddf <- df2 |>
    left_join(data) |>                                                    # attach mass of the first sample
    left_join(data, by = c("sample_id_2" = "sample_id", "treatments")) |> # attach mass of the second sample
    mutate(sum = mass.x + mass.y) |>
    filter(sum >= 15) |>      # keep only pairs that reach the minimum mass
    arrange(sample_id, sum)
  return(ddf)
}
poolneeded |>
  group_split(treatments) |>
  map(fff)
#> Joining, by = "sample_id"
#> Joining, by = "sample_id"
#> [[1]]
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 1 2 A 9.4 6.2 15.6
#> 2 1 7 A 9.4 6.5 15.9
#> 3 1 4 A 9.4 6.8 16.2
#> 4 1 3 A 9.4 7.6 17.0
#> 5 1 8 A 9.4 8.9 18.3
#> 6 2 8 A 6.2 8.9 15.1
#> 7 3 8 A 7.6 8.9 16.5
#> 8 4 8 A 6.8 8.9 15.7
#> 9 7 8 A 6.5 8.9 15.4
#>
#> [[2]]
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 9 10 B 10.9 7.0 17.9
#> 2 9 14 B 10.9 7.2 18.1
#> 3 9 11 B 10.9 7.9 18.8
#> 4 9 16 B 10.9 8.5 19.4
#> 5 9 13 B 10.9 11.2 22.1
#> 6 9 12 B 10.9 11.7 22.6
#> 7 9 15 B 10.9 11.7 22.6
#> 8 10 16 B 7.0 8.5 15.5
#> 9 10 13 B 7.0 11.2 18.2
#> 10 10 12 B 7.0 11.7 18.7
#> 11 10 15 B 7.0 11.7 18.7
#> 12 11 14 B 7.9 7.2 15.1
#> 13 11 16 B 7.9 8.5 16.4
#> 14 11 13 B 7.9 11.2 19.1
#> 15 11 12 B 7.9 11.7 19.6
#> 16 11 15 B 7.9 11.7 19.6
#> 17 12 14 B 11.7 7.2 18.9
#> 18 12 16 B 11.7 8.5 20.2
#> 19 12 13 B 11.7 11.2 22.9
#> 20 12 15 B 11.7 11.7 23.4
#> 21 13 14 B 11.2 7.2 18.4
#> 22 13 16 B 11.2 8.5 19.7
#> 23 13 15 B 11.2 11.7 22.9
#> 24 14 16 B 7.2 8.5 15.7
#> 25 14 15 B 7.2 11.7 18.9
#> 26 15 16 B 11.7 8.5 20.2
Another way
This makes use of the same function fff as above, but it needs to be called with a subset of poolneeded - in the example below, the subset with treatments == "B".
You then see a data frame of all allowed combinations for pooling and can choose a first pair. The remaining choices for a second pooling are then shown.
sel2 <- function(data) {
  ddf <- fff(data)
  cat(paste("\n", "These are your possibilities for the FIRST pooling", "\n"))
  print(ddf)
  ask <- askYesNo("Do You want to make first choice?")
  if (isTRUE(ask)) {  # isTRUE() also handles a cancelled dialog (NA)
    s_1 <- readline(prompt = "Enter sample 1: ")
    s_2 <- readline(prompt = "Enter sample 2: ")
    # drop every pair that involves one of the two samples just chosen
    ddf2 <- ddf |> filter(
      sample_id != s_1 & sample_id != s_2 &
        sample_id_2 != s_1 & sample_id_2 != s_2
    )
    cat(paste0("\n", "These are your possibilities for the SECOND pooling", "\n"))
    print(ddf2)
  } else {
    return()
  }
}
poolneeded_b <- poolneeded |> filter(treatments == "B")
sel2(poolneeded_b)
#> r$> sel2(poolneeded_b)
#> Joining, by = "sample_id"
#>
#> These are your possibilities for the FIRST pooling
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 9 14 B 7.6 8.1 15.7
#> 2 9 12 B 7.6 8.7 16.3
#> 3 9 15 B 7.6 9.6 17.2
#> 4 9 10 B 7.6 10.3 17.9
#> 5 9 16 B 7.6 10.9 18.5
#> 6 9 13 B 7.6 12.3 19.9
#> 7 10 11 B 10.3 5.6 15.9
#> 8 10 14 B 10.3 8.1 18.4
#> 9 10 12 B 10.3 8.7 19.0
#> 10 10 15 B 10.3 9.6 19.9
#> 11 10 16 B 10.3 10.9 21.2
#> 12 10 13 B 10.3 12.3 22.6
#> 13 11 15 B 5.6 9.6 15.2
#> 14 11 16 B 5.6 10.9 16.5
#> 15 11 13 B 5.6 12.3 17.9
#> 16 12 14 B 8.7 8.1 16.8
#> 17 12 15 B 8.7 9.6 18.3
#> 18 12 16 B 8.7 10.9 19.6
#> 19 12 13 B 8.7 12.3 21.0
#> 20 13 14 B 12.3 8.1 20.4
#> 21 13 15 B 12.3 9.6 21.9
#> 22 13 16 B 12.3 10.9 23.2
#> 23 14 15 B 8.1 9.6 17.7
#> 24 14 16 B 8.1 10.9 19.0
#> 25 15 16 B 9.6 10.9 20.5
#>
#> Do You want to make first choice? (Yes/no/abbrechen) y
#> Enter sample 1: 9
#> Enter sample 2: 14
#>
#> These are your possibilities for the SECOND pooling
#> sample_id sample_id_2 treatments mass.x mass.y sum
#> 1 10 11 B 10.3 5.6 15.9
#> 2 10 12 B 10.3 8.7 19.0
#> 3 10 15 B 10.3 9.6 19.9
#> 4 10 16 B 10.3 10.9 21.2
#> 5 10 13 B 10.3 12.3 22.6
#> 6 11 15 B 5.6 9.6 15.2
#> 7 11 16 B 5.6 10.9 16.5
#> 8 11 13 B 5.6 12.3 17.9
#> 9 12 15 B 8.7 9.6 18.3
#> 10 12 16 B 8.7 10.9 19.6
#> 11 12 13 B 8.7 12.3 21.0
#> 12 13 15 B 12.3 9.6 21.9
#> 13 13 16 B 12.3 10.9 23.2
#> 14 15 16 B 9.6 10.9 20.5
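If a fully automatic pairing is preferred over the interactive selection above, a simple greedy heuristic can be used: within each treatment, sort the samples by mass and repeatedly try to pair the lightest remaining sample with the heaviest remaining one, setting the lightest aside when even that pair misses the threshold. This is only a sketch under the assumption that pools of exactly two samples are wanted (as in the answers above); pair_greedy and min_mass are names introduced here for illustration.
library(dplyr)
library(purrr)
pair_greedy <- function(data, min_mass = 15) {
  d <- arrange(data, mass)
  i <- 1
  j <- nrow(d)
  pairs <- list()
  while (i < j) {
    if (d$mass[i] + d$mass[j] >= min_mass) {
      # lightest + heaviest reach the threshold: pool them
      pairs[[length(pairs) + 1]] <- data.frame(
        sample_id   = d$sample_id[i],
        sample_id_2 = d$sample_id[j],
        treatments  = d$treatments[i],
        sum         = d$mass[i] + d$mass[j]
      )
      i <- i + 1
      j <- j - 1
    } else {
      # even the heaviest partner is not enough: this sample cannot be pooled
      i <- i + 1
    }
  }
  bind_rows(pairs)
}
poolneeded |>
  group_split(treatments) |>
  map(pair_greedy) |>
  bind_rows()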

Replace cells in a column of dataframe with NA if the column name exceeds the value of that row in an adjacent column

I have a dataframe with multiple columns. One of the columns is N_l, which ranges from 1 to 5. I have 5 columns named e_1, e_2, e_3, e_4, and e_5. The values in the e_ columns are calculated from other columns in the dataframe. A sample of the data is provided:
> head (DATA)
N_l S OH e_1 e_2 e_3 e_4 e_5 e_sum
1 3 9 3.6 14.6 2.6 -9.4 -21.4 -33.4 -47
2 3 9 3.6 14.6 2.6 -9.4 -21.4 -33.4 -47
3 4 12 4.8 21.8 9.8 -2.2 -14.2 -26.2 -11
4 4 12 4.8 21.8 9.8 -2.2 -14.2 -26.2 -11
5 4 12 4.8 21.8 9.8 -2.2 -14.2 -26.2 -11
6 5 15 6 29 17 5 -7 -19 25
7 5 15 6 29 17 5 -7 -19 25
The e_ columns are calculated based on the other columns in the main dataframe such that:
DATA$e_1 <- (((DATA$N_b-1)*DATA$S + 2*DATA$OH)/2) - (parapet + edge.dist + truck.width/2)
DATA$e_2 <- DATA$e_1 - 2*truck.width
DATA$e_3 <- DATA$e_2 - 2*truck.width
DATA$e_4 <- DATA$e_3 - 2*truck.width
DATA$e_5 <- DATA$e_4 - 2*truck.width
DATA$e_sum <- DATA$e_1 + DATA$e_2 + DATA$e_3 + DATA$e_4 +DATA$e_5
I would like to set the columns e_1, e_2, e_3, e_4, e_5 to NA (or 0) whenever the numeric suffix of the column name (the 1 in e_1, the 2 in e_2, etc.) exceeds the value in column N_l.
For example for the example above I would like to have:
N_l S OH e_1 e_2 e_3 e_4 e_5 e_sum
1 3 9 3.6 14.6 2.6 -9.4 NA NA 7.8
2 3 9 3.6 14.6 2.6 -9.4 NA NA 7.8
3 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
4 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
5 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
6 5 15 6 29 17 5 -7 -19 25
7 5 15 6 29 17 5 -7 -19 25
Here's an approach requiring no loops that relies on multiplication:
DATA[, 4:8] <- DATA[, 4:8] *
  +(matrix(1:5, byrow = TRUE, ncol = 5, nrow = nrow(DATA)) <= DATA$N_l)
DATA$e_sum <- rowSums(DATA[, 4:8])
N_l S OH e_1 e_2 e_3 e_4 e_5 e_sum
1 3 9 3.6 14.6 2.6 -9.4 0.0 0 7.8
2 3 9 3.6 14.6 2.6 -9.4 0.0 0 7.8
3 4 12 4.8 21.8 9.8 -2.2 -14.2 0 15.2
4 4 12 4.8 21.8 9.8 -2.2 -14.2 0 15.2
5 4 12 4.8 21.8 9.8 -2.2 -14.2 0 15.2
6 5 15 6.0 29.0 17.0 5.0 -7.0 -19 25.0
7 5 15 6.0 29.0 17.0 5.0 -7.0 -19 25.0
Here is one slightly convoluted option
library(plyr)
DATA <- do.call(rbind.fill,apply(DATA,1, function(x) as.data.frame(t(x[c(1:(3+x[1]))]))))
DATA$e_sum <- rowSums(DATA[,4:8],na.rm=T)
> DATA
N_l S OH e_1 e_2 e_3 e_4 e_5 e_sum
1 3 9 3.6 14.6 2.6 -9.4 NA NA 7.8
2 3 9 3.6 14.6 2.6 -9.4 NA NA 7.8
3 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
4 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
5 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
6 5 15 6.0 29.0 17.0 5.0 -7.0 -19 25.0
7 5 15 6.0 29.0 17.0 5.0 -7.0 -19 25.0
Data:
DATA <- structure(list(N_l = c(3L, 3L, 4L, 4L, 4L, 5L, 5L), S = c(9L,
9L, 12L, 12L, 12L, 15L, 15L), OH = c(3.6, 3.6, 4.8, 4.8, 4.8,
6, 6), e_1 = c(14.6, 14.6, 21.8, 21.8, 21.8, 29, 29), e_2 = c(2.6,
2.6, 9.8, 9.8, 9.8, 17, 17), e_3 = c(-9.4, -9.4, -2.2, -2.2,
-2.2, 5, 5), e_4 = c(-21.4, -21.4, -14.2, -14.2, -14.2, -7, -7
), e_5 = c(-33.4, -33.4, -26.2, -26.2, -26.2, -19, -19), e_sum = c(-47L,
-47L, -11L, -11L, -11L, 25L, 25L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))
Using a for loop, overwriting where DATA$N_l < i, and assuming that e_1 to e_5 are in the column positions (4 to 8) where they are now:
for (i in 1:5) {DATA[DATA$N_l < i, i + 3] <- NA}
DATA$e_sum <- rowSums(DATA[4:8], na.rm=TRUE)
DATA
# N_l S OH e_1 e_2 e_3 e_4 e_5 e_sum
#1 3 9 3.6 14.6 2.6 -9.4 NA NA 7.8
#2 3 9 3.6 14.6 2.6 -9.4 NA NA 7.8
#3 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
#4 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
#5 4 12 4.8 21.8 9.8 -2.2 -14.2 NA 15.2
#6 5 15 6.0 29.0 17.0 5.0 -7.0 -19 25.0
#7 5 15 6.0 29.0 17.0 5.0 -7.0 -19 25.0
or use apply with [<- and seq_len
DATA[4:8] <- t(apply(DATA[c(1,4:8)], 1, function(x)
"[<-"(x[-1], -seq_len(x[1]), NA)))

qtgrace/xmgrace non-overlaping data sets

I'm using qtgrace for macOS, and when I plotted two data sets in qtgrace I got something like this:
Overlapping data sets
However, I would like to plot something like this:
Non-overlapping data sets
My data 1:
0 14
0.1 6
0.2 14
0.3 14
0.4 14
0.5 14
0.6 14
0.7 14
0.8 6
0.9 6
1 6
1.1 6
1.2 6
1.3 6
1.4 6
1.5 6
1.6 6
1.7 6
1.8 6
1.9 6
2 6
2.1 6
2.2 6
2.3 6
2.4 6
2.5 6
2.6 6
2.7 6
2.8 6
2.9 6
3 6
3.1 6
3.2 6
3.3 6
3.4 6
3.5 6
3.6 6
3.7 6
3.8 6
3.9 6
4 6
4.1 6
4.2 6
4.3 6
4.4 6
4.5 6
4.6 6
4.7 6
4.8 6
4.9 6
5 6
5.1 6
5.2 6
5.3 6
5.4 6
5.5 6
5.6 6
5.7 6
5.8 6
5.9 6
6 6
6.1 6
6.2 6
6.3 6
6.4 6
6.5 6
6.6 6
6.7 6
6.8 6
6.9 6
7 6
7.1 6
7.2 6
7.3 2
7.4 6
7.5 2
7.6 2
7.7 2
7.8 2
7.9 6
8 2
8.1 6
8.2 2
8.3 2
8.4 6
8.5 6
8.6 6
8.7 2
8.8 6
8.9 19
9 19
9.1 6
9.2 6
9.3 6
9.4 2
9.5 2
9.6 2
9.7 2
9.8 2
9.9 2
10 2
10.1 2
10.2 2
10.3 2
10.4 2
10.5 2
10.6 2
10.7 2
10.8 2
10.9 2
11 2
11.1 2
11.2 2
11.3 2
11.4 2
11.5 2
11.6 2
11.7 2
11.8 2
11.9 2
12 2
12.1 2
12.2 2
12.3 2
12.4 2
12.5 2
12.6 2
12.7 2
12.8 2
12.9 2
13 2
13.1 2
13.2 2
13.3 2
13.4 2
13.5 2
13.6 2
13.7 2
13.8 2
13.9 2
14 2
14.1 2
14.2 2
14.3 2
14.4 2
14.5 2
14.6 2
14.7 2
14.8 2
14.9 2
15 2
15.1 2
15.2 2
15.3 2
15.4 2
15.5 2
15.6 2
15.7 2
15.8 2
15.9 2
16 2
16.1 2
16.2 2
16.3 2
16.4 2
16.5 2
16.6 2
16.7 2
16.8 2
16.9 2
17 2
17.1 2
17.2 2
17.3 2
17.4 2
17.5 2
17.6 2
17.7 2
17.8 2
17.9 2
18 2
18.1 2
18.2 2
18.3 2
18.4 2
18.5 2
18.6 2
18.7 2
18.8 2
18.9 2
19 2
19.1 2
19.2 2
19.3 2
19.4 2
19.5 2
19.6 2
19.7 2
19.8 2
19.9 2
20 2
20.1 2
20.2 2
20.3 2
20.4 2
20.5 2
20.6 2
20.7 2
20.8 2
20.9 2
21 2
21.1 2
21.2 2
21.3 2
21.4 2
21.5 2
21.6 2
21.7 2
21.8 7
21.9 2
22 2
22.1 2
22.2 2
22.3 7
22.4 7
22.5 7
22.6 7
22.7 7
22.8 2
22.9 2
23 7
23.1 7
23.2 7
23.3 7
23.4 7
23.5 2
23.6 2
23.7 2
23.8 2
23.9 2
24 2
24.1 2
24.2 2
24.3 2
24.4 2
24.5 2
24.6 2
24.7 2
24.8 2
24.9 2
25 2
. .
. .
. .
Data 2:
0 4
0.1 4
0.2 4
0.3 4
0.4 4
0.5 4
0.6 4
0.7 4
0.8 4
0.9 4
1 2
1.1 4
1.2 4
1.3 4
1.4 4
1.5 4
1.6 4
1.7 4
1.8 4
1.9 4
2 4
2.1 4
2.2 4
2.3 4
2.4 4
2.5 4
2.6 4
2.7 4
2.8 4
2.9 4
3 4
3.1 4
3.2 4
3.3 4
3.4 4
3.5 4
3.6 4
3.7 4
3.8 4
3.9 4
4 4
4.1 4
4.2 4
4.3 4
4.4 4
4.5 4
4.6 4
4.7 4
4.8 4
4.9 4
5 4
5.1 4
5.2 4
5.3 4
5.4 4
5.5 4
5.6 4
5.7 4
5.8 4
5.9 4
6 4
6.1 4
6.2 4
6.3 4
6.4 4
6.5 4
6.6 4
6.7 4
6.8 4
6.9 4
7 4
7.1 4
7.2 4
7.3 4
7.4 4
7.5 4
7.6 4
7.7 4
7.8 4
7.9 4
8 4
8.1 4
8.2 4
8.3 4
8.4 2
8.5 4
8.6 4
8.7 4
8.8 4
8.9 4
9 4
9.1 4
9.2 4
9.3 4
9.4 4
9.5 4
9.6 4
9.7 4
9.8 4
9.9 4
10 4
10.1 4
10.2 4
10.3 4
10.4 4
10.5 2
10.6 2
10.7 4
10.8 2
10.9 2
11 2
11.1 2
11.2 4
11.3 4
11.4 2
11.5 2
11.6 2
11.7 2
11.8 2
11.9 2
12 2
12.1 2
12.2 2
12.3 2
12.4 4
12.5 4
12.6 2
12.7 2
12.8 4
12.9 2
13 2
13.1 4
13.2 4
13.3 4
13.4 4
13.5 10
13.6 2
13.7 2
13.8 2
13.9 2
14 2
14.1 2
14.2 2
14.3 10
14.4 2
14.5 2
14.6 4
14.7 2
14.8 2
14.9 4
15 2
15.1 10
15.2 2
15.3 2
15.4 2
15.5 2
15.6 2
15.7 2
15.8 2
15.9 2
16 2
16.1 2
16.2 2
16.3 2
16.4 2
16.5 2
16.6 2
16.7 2
16.8 2
16.9 2
17 2
17.1 2
17.2 2
17.3 2
17.4 2
17.5 2
17.6 2
17.7 2
17.8 2
17.9 2
18 2
18.1 2
18.2 2
18.3 2
18.4 2
18.5 2
18.6 2
18.7 2
18.8 2
18.9 2
19 2
19.1 2
19.2 2
19.3 2
19.4 2
19.5 2
19.6 2
19.7 2
19.8 2
19.9 2
20 2
20.1 2
20.2 2
20.3 2
20.4 2
20.5 2
20.6 2
20.7 2
20.8 2
20.9 2
21 2
21.1 2
21.2 2
21.3 2
21.4 2
21.5 2
21.6 2
21.7 2
21.8 2
21.9 2
22 2
22.1 2
22.2 2
22.3 2
22.4 2
22.5 2
22.6 2
22.7 2
22.8 2
22.9 2
23 2
23.1 2
23.2 2
23.3 2
23.4 2
23.5 2
23.6 2
23.7 2
23.8 2
23.9 2
24 2
24.1 2
24.2 2
24.3 2
24.4 2
24.5 2
24.6 2
24.7 2
24.8 2
24.9 2
25 2
. .
. .
. .
The data are in two separate .xvg files from a GROMACS cluster analysis. I want to plot five different sets in a way that lets me see all of the data without them overlapping.
Thank you!
I think the best approach would be to write a script that takes the original files and writes out new files with shifted y values (a sketch of such a script follows the steps below). However, since you have asked for a qtgrace/xmgrace solution, here is how to do it:
Load all of the datasets into qtgrace.
Open the "Data -> Transformations -> Evaluate expression..." dialog.
Select a dataset in the left and right columns, enter the formula y = y + 0.1 in the textbox below, and click "Apply". This shifts that dataset up by 0.1.
Select the next dataset in the same way and use the formula y = y + 0.2. Click "Apply".
Rinse and repeat for the remaining datasets, changing the shift accordingly.
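For the scripting route mentioned at the start of this answer, here is a minimal R sketch (the file names and the 0.1 offset are placeholders; it assumes plain two-column .xvg data with the usual @/# header lines):
files  <- c("data1.xvg", "data2.xvg")    # hypothetical input files
offset <- 0.1                            # vertical gap between curves
for (i in seq_along(files)) {
  txt  <- readLines(files[i])
  keep <- !grepl("^[@#]", txt)           # drop xvg header/comment lines
  d <- read.table(text = txt[keep])
  d[, 2] <- d[, 2] + (i - 1) * offset    # shift each file by a different amount
  write.table(d, sub("\\.xvg$", "_shifted.xvg", files[i]),
              row.names = FALSE, col.names = FALSE)
}
Plotting the *_shifted.xvg files then gives non-overlapping curves without touching the originals.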

The type is integer, yet it has a decimal point. Why?

When I check the type, it says "integer", but the values have decimal points. If I convert the column to numeric, the values change and the decimal points disappear.
I want to draw a histogram, so x must be numeric, but after converting to numeric all the data are wrong.
> typeof(data$fare_amount)
[1] "integer"
> data$fare_amount
[1] 5.5 6.5 8.0 13.5 5.5 9.5 7.5 8.0 16.0 8.0 5.5 7.0 8.0 5.0 9.5 23.0 5.0 6.0 17.5 12.0 8.5 13.0
[23] 6.5 4.5 52.0 14.5 7.5 4.5 9.0 10.0 15.0 11.5 6.0 12.5 7.5 8.0 6.5 7.5 31.5 10.0 10.0 10.0 4.0 8.5
[45] 24.0 8.5 5.5 14.0 11.0 4.5 9.0 7.5 22.0 8.5 24.0 36.5 15.0 10.5 9.5 17.0 4.5 6.0 6.5 11.5 16.0 6.5
[67] 7.0 20.0 13.5 30.0 8.0 11.0 6.5 11.5 6.5 37.0 5.5 12.5 8.5 58.5 13.5 8.5 9.0 6.0 6.5 9.0 38.0 4.5
[89] 10.0 9.0 44.5 11.0 12.0 4.5 14.5 8.5 32.0 9.5 4.5 6.0 6.5 6.0 31.5 52.0 10.5 12.0 5.5 24.5 7.0 5.5
[111] 16.5 5.0 5.5 6.5 3.5 11.5 13.0 6.0 14.0 3.5
42 Levels: 13.5 16.0 5.5 6.5 7.5 8.0 9.5 12.0 17.5 23.0 5.0 6.0 7.0 10.0 13.0 14.5 4.5 52.0 8.5 9.0 11.5 12.5 ... 3.5
> temp <- as.numeric(data$fare_amount)
> temp
[1] 3 4 6 1 3 7 5 6 2 6 3 13 6 11 7 10 11 12 9 8 19 15 4 17 18 16 5 17 20 14 23 21 12 22 5 6 4 5
[39] 24 14 14 14 28 19 27 19 3 26 25 17 20 5 31 19 27 32 23 29 7 30 17 12 4 21 2 4 13 33 1 34 6 25 4 21 4 35
[77] 3 22 19 36 1 19 20 12 4 20 37 17 14 20 39 25 8 17 16 19 38 7 17 12 4 12 24 18 29 8 3 40 13 3 41 11 3 4
[115] 42 21 15 12 26 42
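The "42 Levels: ..." line in the output above shows that fare_amount is actually a factor, which R stores internally as integer level codes; that is why typeof() reports "integer" and why as.numeric() returns the codes instead of the fares. A minimal sketch of the usual fix (using the column name from the question):
# convert the factor's labels, not its internal codes, back to numbers
data$fare_amount <- as.numeric(as.character(data$fare_amount))
hist(data$fare_amount)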

create boxplots of quantiles against binary endpoint

I am trying to plot boxplots of quantiles of the numeric continuous variables (x-axis) against the binary endpoints (y-axis).
My input data is:
ID endpoint var2 var3 var4 var5
1 0 62 3.1 13 10.1
2 1 150 4.1 9 11.1
3 0 18 5.1 0.6 12.1
4 0 60 6.1 3 13.1
5 0 20 7.1 1 14.1
6 1 100 8.1 19 15.1
7 0 56 9.1 2 16.1
8 1 36 10.1 5 17.1
9 0 33.2 11.1 4 18.1
10 1 200 12.1 64 19.1
Please help to do this plotting in R.
Thanks in advance.
Try melt from {reshape2} and then plot with either base R or ggplot as follows:
library(reshape2)
library(ggplot2)
data = read.table(text="ID endpoint var2 var3 var4 var5
1 0 62 3.1 13 10.1
2 1 150 4.1 9 11.1
3 0 18 5.1 0.6 12.1
4 0 60 6.1 3 13.1
5 0 20 7.1 1 14.1
6 1 100 8.1 19 15.1
7 0 56 9.1 2 16.1
8 1 36 10.1 5 17.1
9 0 33.2 11.1 4 18.1
10 1 200 12.1 64 19.1",header=TRUE)
head(data)
# Melt data
data.melt = melt(data[,2:6],id='endpoint')
head(data.melt)
# Boxplot with base
boxplot(value~endpoint*variable, data=data.melt, col=(c("red","green",'blue')))
# Boxplot with ggplot
ggplot(data.melt, aes(x=as.factor(endpoint), y=value, fill=variable)) + geom_boxplot()
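Because the variables are on quite different scales, faceting each variable into its own panel may be easier to read than distinguishing them by fill colour; here is a small variation on the ggplot call above (the free y scales are an assumption about what is wanted):
ggplot(data.melt, aes(x = as.factor(endpoint), y = value)) +
  geom_boxplot() +
  facet_wrap(~ variable, scales = "free_y")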
Hope it helps.
