Using rbind to merge data frames

I have 2 data frames A and B of dimensions 2 x 5 like this:
A = data.frame(GeneA1=c(-0.02, 1.89), GeneB2=c(0.25, 1.99), GeneB3=c(0.17, 1.87), GeneB4=c(0.3, 1.63), GeneC2=c(0.29, 1.97), row.names=c("sample 1", "sample 2"))
B = data.frame(GeneA1=c(0.52, -0.04), GeneB1=c(1.1, 0.08), GeneB3=c(0.72, 0.03), GeneB5=c(0.78, 0.06), GeneC2=c(0.78, 0.25), row.names=c("sample 1", "sample 2"))
For both A and B, the rows are samples and the columns are gene types.
I want to merge A and B using rbind, adding NAs where the gene types don't match up. I've heard there's a way to do this using setdiff, but I don't know how.

Use merge
> AB <- merge(A, B, all=TRUE)
> AB[,order(names(AB))] # to get the result ordered by colnames
Gene A1 Gene B1 Gene B2 Gene B3 Gene B4 Gene B5 Gene C2
1 -0.04 0.08 NA 0.03 NA 0.06 0.25
2 -0.02 NA 0.25 0.17 0.30 NA 0.29
3 0.52 1.10 NA 0.72 NA 0.78 0.78
4 1.89 NA 1.99 1.87 1.63 NA 1.97
Where A and B are as follows:
A <- matrix(c(-0.02, 0.25, 0.17, 0.3, 0.29,
1.89, 1.99, 1.87, 1.63, 1.97),
nrow=2, byrow=TRUE,
dimnames=list(NULL, c("Gene A1", "Gene B2",
"Gene B3",
"Gene B4", "Gene C2")))
B <- matrix(c(0.52, 1.1, 0.72, 0.78, 0.78,
-0.04, 0.08, 0.03, 0.06,0.25),
nrow=2, byrow=TRUE,
dimnames=list(NULL, c("Gene A1", "Gene B1",
"Gene B3",
"Gene B5", "Gene C2")))

You can use the function merge:
A=data.frame(A1=c(-0.02,1.89),B2=c(0.25,1.99),B3=c(0.17,1.87),B4=c(0.3,1.63),C2=c(0.29,1.97))
B=data.frame(A1=c(0.52,-0.04),B1=c(1.1,0.08),B3=c(0.72,0.03),B5=c(0.78,0.06),C2=c(0.78,0.25))
C <- merge(A, B, all = TRUE)
View(C)
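A caveat worth knowing, shown here with small made-up data frames rather than the question's: merge(A, B, all = TRUE) joins on every column name the two data frames share, so it only stacks rows when no row of A agrees with a row of B on all shared columns. If two rows do agree, they are combined into a single row:

```r
# Hypothetical toy data: A and B share the columns A1 and C2
A <- data.frame(A1 = c(1, 2), B2 = c(10, 20), C2 = c(5, 6))
B <- data.frame(A1 = c(1, 3), B1 = c(7, 8), C2 = c(5, 9))

# merge() uses the shared columns A1 and C2 as keys; the first rows of A and B
# agree on both keys, so they end up in the same output row
AB <- merge(A, B, all = TRUE)
nrow(AB)  # 3 rows, not 4
```

With real measurements an exact match on every shared column is unlikely, but it is a difference from a true row-wise stack.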

Try this:
# dummy data
A <- read.table(text="
Gene A1, Gene B2, Gene B3, Gene B4, Gene C2
-0.02, 0.25, 0.17, 0.3, 0.29
1.89, 1.99, 1.87, 1.63, 1.97",
sep=",", header=TRUE)
B <- read.table(text="
Gene A1, Gene B1, Gene B3, Gene B5, Gene C2
0.52, 1.1, 0.72, 0.78, 0.78
-0.04, 0.08, 0.03, 0.06,0.25",
sep=",", header=TRUE)
#transpose and merge
tAB <- merge(t(A),t(B),by="row.names",all=TRUE)
#keep gene names
col <- tAB[,1]
#exclude rownames, transpose
output <- t(tAB[,-1])
#update colnames
colnames(output) <- col
#output
# Gene.A1 Gene.B1 Gene.B2 Gene.B3 Gene.B4 Gene.B5 Gene.C2
#V1.x -0.02 NA 0.25 0.17 0.30 NA 0.29
#V2.x 1.89 NA 1.99 1.87 1.63 NA 1.97
#V1.y 0.52 1.10 NA 0.72 NA 0.78 0.78
#V2.y -0.04 0.08 NA 0.03 NA 0.06 0.25
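The question asks specifically about rbind and setdiff. A minimal sketch of that idea, assuming the data frames from the question (written with c() so each column holds two values): add every column that one data frame lacks as an NA column, then rbind, which for data frames matches columns by name. Unlike merge(), this always stacks rows and never joins them. The helper name rbind_fill_na is hypothetical, and the row names are reset because both inputs use "sample 1" and "sample 2":

```r
# Sample data from the question
A <- data.frame(GeneA1 = c(-0.02, 1.89), GeneB2 = c(0.25, 1.99),
                GeneB3 = c(0.17, 1.87), GeneB4 = c(0.3, 1.63),
                GeneC2 = c(0.29, 1.97), row.names = c("sample 1", "sample 2"))
B <- data.frame(GeneA1 = c(0.52, -0.04), GeneB1 = c(1.1, 0.08),
                GeneB3 = c(0.72, 0.03), GeneB5 = c(0.78, 0.06),
                GeneC2 = c(0.78, 0.25), row.names = c("sample 1", "sample 2"))

# Hypothetical helper: stack two data frames, filling missing columns with NA
rbind_fill_na <- function(x, y) {
  # columns that only y has are added to x as NA, and vice versa
  x[setdiff(names(y), names(x))] <- NA
  y[setdiff(names(x), names(y))] <- NA
  # rbind() on data frames matches columns by name, so order does not matter
  out <- rbind(x, y)
  rownames(out) <- NULL          # row names repeat between x and y
  out[, sort(names(out))]        # order columns by name
}

AB <- rbind_fill_na(A, B)
AB
```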

Related

R - transpose dataframe with multiple id columns and multiple variables [duplicate]

I am trying to use pivot_longer. However, I am not sure how to use names_sep or names_pattern to solve this.
dat <- tribble(
~group, ~BP, ~HS, ~BB, ~lowerBP, ~upperBP, ~lowerHS, ~upperHS, ~lowerBB, ~upperBB,
"1", 0.51, 0.15, 0.05, 0.16, 0.18, 0.5, 0.52, 0.14, 0.16,
"2.1", 0.67, 0.09, 0.06, 0.09, 0.11, 0.66, 0.68, 0.08, 0.1,
"2.2", 0.36, 0.13, 0.07, 0.12, 0.15, 0.34, 0.38, 0.12, 0.14,
"2.3", 0.09, 0.17, 0.09, 0.13, 0.16, 0.08, 0.11, 0.15, 0.18,
"2.4", 0.68, 0.12, 0.07, 0.12, 0.14, 0.66, 0.69, 0.11, 0.13,
"3", 0.53, 0.15, 0.06, 0.14, 0.16, 0.52, 0.53, 0.15, 0.16)
Desired output (First row from wide data)
group names values lower upper
1 BP 0.51 0.16 0.18
1 HS 0.15 0.5 0.52
1 BB 0.05 0.14 0.16

Here is a solution following a similar method to the one @Fnguyen used, but using the newer pivot_longer and pivot_wider functions:
library(dplyr)
library(tidyr)
longer<-pivot_longer(dat, cols=-1, names_pattern = "(.*)(..)$", names_to = c("limit", "name")) %>%
mutate(limit=ifelse(limit=="", "value", limit))
answer <-pivot_wider(longer, id_cols = c(group, name), names_from = limit, values_from = value, names_repair = "check_unique")
Most of the selecting, separating, mutating and renaming is taking place within the pivot function calls.
Update:
This regular expression "(.*)(..)$" means:
( ) ( ) look for two parts,
(.*) the first part can have zero or more characters,
(..) the second part must have exactly 2 characters, at the end of the string ("$").
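To check what the pattern captures, it can be applied to the column names with base R's sub(), keeping one capture group at a time. This is only an illustration of the regex; pivot_longer() performs the split itself:

```r
# Column names from the question's tribble (everything except "group")
nms <- c("BP", "HS", "BB", "lowerBP", "upperBP",
         "lowerHS", "upperHS", "lowerBB", "upperBB")

# First capture group: everything before the last two characters
limit <- sub("(.*)(..)$", "\\1", nms)
# Second capture group: exactly the last two characters
name <- sub("(.*)(..)$", "\\2", nms)

data.frame(nms, limit, name)
```

For "BP", "HS" and "BB" the first group is the empty string, which is why the answer replaces "" with "value" afterwards.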

A data.table version (I'm not sure yet how to retain the original names so that you don't need to substitute them afterwards; see https://github.com/Rdatatable/data.table/issues/2551):
library(data.table)
df <- data.table(dat)
v <- c("BP","HS","BB")
setnames(df, v, paste0("x",v) )
g <- melt(df, id.vars = "group",
measure.vars = patterns(values = "x" ,
lower = "lower",
upper = "upper"),
variable.name = "names")
g[names==1, names := "BP" ]
g[names==2, names := "HS" ]
g[names==3, names := "BB" ]
group names values lower upper
1: 1 BP 0.51 0.16 0.18
2: 2.1 BP 0.67 0.09 0.11
3: 2.2 BP 0.36 0.12 0.15
4: 2.3 BP 0.09 0.13 0.16
5: 2.4 BP 0.68 0.12 0.14
6: 3 BP 0.53 0.14 0.16
7: 1 HS 0.15 0.50 0.52
8: 2.1 HS 0.09 0.66 0.68
9: 2.2 HS 0.13 0.34 0.38
10: 2.3 HS 0.17 0.08 0.11
11: 2.4 HS 0.12 0.66 0.69
12: 3 HS 0.15 0.52 0.53
13: 1 BB 0.05 0.14 0.16
14: 2.1 BB 0.06 0.08 0.10
15: 2.2 BB 0.07 0.12 0.14
16: 2.3 BB 0.09 0.15 0.18
17: 2.4 BB 0.07 0.11 0.13
18: 3 BB 0.06 0.15 0.16

Based on your example data, this solution using dplyr and tidyr works for me:
library(dplyr)
library(tidyr) # gather(), separate(), spread()
library(tibble) # rowid_to_column()
dat %>%
gather(key, values,-group) %>%
mutate(names = gsub("lower","",gsub("upper","",key))) %>%
separate(key, into = c("key1","key2") ,"[[:upper:]]", perl=T) %>%
mutate(key1 = case_when(key1 == "" ~ "values", TRUE ~ key1)) %>%
select(group,names,key1,values) %>%
rowid_to_column() %>%
spread(key1,values) %>%
select(-rowid) %>%
group_by(group,names) %>%
summarise_all(mean,na.rm = TRUE)

I'd like to add an alternative tidyverse solution drawing on the answer provided by @Dave2e.
Like Dave2e's solution, this is a two-step procedure, but here the steps are: first rename, then reshape. Instead of reshaping the data twice, I add the prefix "values" to the columns named "BP", "HS", and "BB" using rename_with. This is necessary for getting the column names right when using the .value sentinel in the names_to argument of pivot_longer.
library(dplyr)
library(tidyr)
dat %>%
rename_with(~sub("^(BP|HS|BB)$", "values\\1", .)) %>% # add prefix "values"
pivot_longer(cols = -1,
names_pattern = "(.*)(BP|HS|BB)$",
names_to = c(".value", "names"))
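If adding packages is not an option, base R's stats::reshape() can produce the same long layout in one call. This is a sketch that assumes the data is a plain data.frame (reshape() is not designed around tibbles); the varying list pairs each output column (values, lower, upper) with its three wide columns, in the same order as times:

```r
# The question's data as a plain data.frame
dat <- data.frame(
  group   = c("1", "2.1", "2.2", "2.3", "2.4", "3"),
  BP      = c(0.51, 0.67, 0.36, 0.09, 0.68, 0.53),
  HS      = c(0.15, 0.09, 0.13, 0.17, 0.12, 0.15),
  BB      = c(0.05, 0.06, 0.07, 0.09, 0.07, 0.06),
  lowerBP = c(0.16, 0.09, 0.12, 0.13, 0.12, 0.14),
  upperBP = c(0.18, 0.11, 0.15, 0.16, 0.14, 0.16),
  lowerHS = c(0.5, 0.66, 0.34, 0.08, 0.66, 0.52),
  upperHS = c(0.52, 0.68, 0.38, 0.11, 0.69, 0.53),
  lowerBB = c(0.14, 0.08, 0.12, 0.15, 0.11, 0.15),
  upperBB = c(0.16, 0.1, 0.14, 0.18, 0.13, 0.16)
)

v <- c("BP", "HS", "BB")
long <- reshape(
  dat,
  direction = "long",
  idvar     = "group",
  timevar   = "names",
  times     = v,
  # one vector per long column; element i of each vector belongs to times[i]
  varying   = list(v, paste0("lower", v), paste0("upper", v)),
  v.names   = c("values", "lower", "upper")
)
rownames(long) <- NULL
long
```

Like the data.table version, the rows come out grouped by names (all BP rows first), and the names column keeps the original labels, so no substitution step is needed.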

Replacing zeroes with NA for values preceding non-zero

I'm new to R and have been struggling with the following for a while now so I was hoping someone would be able to help me out.
The sample data represents stock price returns (each row is a monthly period). The real data set is much bigger and is structured like the input below:
Input:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)
stock1 stock2 stock3 stock4
[1,] 0.01 0.00 0.00 0.00
[2,] -0.02 0.00 0.00 -0.02
[3,] 0.01 0.02 0.02 0.01
[4,] 0.05 0.04 0.00 0.00
[5,] 0.04 -0.03 -0.01 0.00
[6,] -0.02 0.02 0.03 -0.02
Any zeros that precede the first non-zero value for a given stock represent missing data rather than a return of zero for the period. I would like to set these values to NA, so the output I would like to achieve is the following:
Desired Output:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(NA, NA, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(NA, NA, 0.02, 0, -0.01, 0.03)
stock4 <- c(NA, -0.02, 0.01, 0, 0, -0.02)
df <- cbind(stock1,stock2,stock3,stock4)
stock1 stock2 stock3 stock4
[1,] 0.01 NA NA NA
[2,] -0.02 NA NA -0.02
[3,] 0.01 0.02 0.02 0.01
[4,] 0.05 0.04 0.00 0.00
[5,] 0.04 -0.03 -0.01 0.00
[6,] -0.02 0.02 0.03 -0.02
I've tried a few things but they only seem to work for a single vector as opposed to a data set with multiple columns. I've tried using lapply to get around this but haven't had any luck so far. The closest I've gotten is shown below.
My single vector solution:
stock1[1:min(which(stock1!=0))-1] <- NA
My multiple vector solution which does not work:
lapply(df, function(x) x[1:min(which(x!=0))-1] <- NA)
Would greatly appreciate any guidance! Thanks!

There are three issues. First, writing:
df <- cbind(stock1,stock2,stock3,stock4)
doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:
df <- data.frame(stock1,stock2,stock3,stock4)
Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected (in this case, the assignment will return a single NA, and lapply will return a list of single NAs, one per column, instead of the data frame you want).
Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero) because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
Therefore, the following will give you what you want:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)
as.data.frame(lapply(df, function(x) {
n <- min(which(x != 0)) - 1
if (n > 0)
x[1:n] <- NA
x
}))
The output is as expected:
stock1 stock2 stock3 stock4
1 0.01 NA NA NA
2 -0.02 NA NA -0.02
3 0.01 0.02 0.02 0.01
4 0.05 0.04 0.00 0.00
5 0.04 -0.03 -0.01 0.00
6 -0.02 0.02 0.03 -0.02
Update: As @Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:
as.data.frame(lapply(df, function(x) {
n <- min(which(x != 0)) - 1
x[0:n] <- NA # use 0:n instead of 1:n
x
}))
This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:
x[0:0] <- NA # same as x[0] <- NA and does nothing
x[0:1] <- NA # same as x[1] <- NA
x[0:2] <- NA # same as x[1:2] <- NA, etc.
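Another way to sidestep the 1:0 problem is seq_len(), which returns an empty sequence for 0. The sketch below (a hypothetical helper, not taken from the answers here) also uses match(TRUE, x != 0) instead of min(which(x != 0)), because min() of an empty vector returns Inf with a warning when a column contains no non-zero value at all, and the indexing code above cannot handle that. Treating an all-zero column as entirely missing is an assumption; adjust it if genuine zero returns are possible there:

```r
# Hypothetical helper: set the leading zeros of a vector to NA
leading_zeros_to_na <- function(x) {
  first <- match(TRUE, x != 0)    # position of the first non-zero value, or NA
  if (is.na(first)) {
    x[] <- NA                     # no non-zero value: treat the whole column as missing
  } else {
    x[seq_len(first - 1)] <- NA   # seq_len(0) is empty, unlike 1:0
  }
  x
}

df <- data.frame(
  stock1 = c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02),
  stock2 = c(0, 0, 0.02, 0.04, -0.03, 0.02),
  stock3 = c(0, 0, 0.02, 0, -0.01, 0.03),
  stock4 = c(0, -0.02, 0.01, 0, 0, -0.02),
  stock5 = c(0, 0, 0, 0, 0, 0)    # hypothetical all-zero column
)
df[] <- lapply(df, leading_zeros_to_na)
df
```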

This might not be the most elegant way, but I think it works:
changeValues <- function(x){
place <- min(which(diff(c(0,cumsum(x==0)))==0))-1;
x[0:place] <- NA
x
}
apply(df,2,changeValues)
EDIT: Some brief explanation of the function: cumsum(x == 0) increases by one at each position where the column contains a zero, so diff(c(0, cumsum(x == 0))) is 1 at zeros and 0 at non-zero values. The first position where it is 0 is therefore the first non-zero value, and place is the number of leading zeros before it. Only those leading zeros are set to NA (indexing with 0:place also works when there are no leading zeros, because the index 0 is ignored), so zeros further down the column are left unchanged.

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4) #the following function only works if df is actually a data.frame
df[] <- lapply(df, function(x) {ifelse(cumsum(x != 0) == 0, NA, x)})
df
stock1 stock2 stock3 stock4
1 0.01 NA NA NA
2 -0.02 NA NA -0.02
3 0.01 0.02 0.02 0.01
4 0.05 0.04 0.00 0.00
5 0.04 -0.03 -0.01 0.00
6 -0.02 0.02 0.03 -0.02
Some explanation: cumsum(x != 0) counts how many non-zero values have appeared so far in each column, so it is 0 exactly for the leading zeros. Those cells become NA; every other cell keeps its original value. The empty brackets in df[] make the result of lapply be assigned back into df, so df stays a data frame.
Also, if you don't really need df to be a data frame, this works as well:
df <- cbind(stock1,stock2,stock3,stock4)
apply(df, 2, function(x) {ifelse(cumsum(x != 0) == 0, NA, x)})

Keep NA values in their original position when reordering vector

I have a large set of data that I want to reorder in groups of twelve using the sample() function in R to generate randomised data sets with which I can carry out a permutation test. However, this data has NA values where data could not be collected, and I would like them to stay in their original positions when the data is shuffled.
Currently, NAs are shuffled randomly with all other values. For example, where example.data is a made-up example set of 12 values:
example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42)
sample(example.data, replace = F, prob = NULL)
[1] 0.64 0.83 NA 0.33 0.47 0.90 0.25 NA 0.12 0.42 1.00 NA
Whereas a suitable reordering would be:
[1] 0.64 0.83 NA 0.33 0.47 0.90 0.25 0.12 NA NA 0.42 1.00
Is there a simple way to do this?
Thank you for your help!
Update: this has been solved, but I have a follow-up question.
Extending from this, if I have a data set of length 24, how would I go about reordering the first and second sets of 12 values individually?
For example, a vector extending from the first example:
example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42, 0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98, NA, 0.05)
Where example.data[1:12] and example.data[13:24] are shuffled separately within their own respective groups.
The code I am trying to work this solution into is as follows:
shuffle.data = function(input.data,nr,ns){
simdata <- input.data
for(i in 1:nr){
start.row <- (ns*(i-1))+1
end.row <- start.row + actual.length[i] - 1
newdata = sample(input.data[start.row:end.row], size=actual.length[i], replace=F)
simdata[start.row:end.row] <- newdata
}
return(simdata)}
Where input.data is the raw input data (example.data), nr is the number of groups (2), ns is the size of each sample (12), and actual.length is a vector holding the length of each group excluding NAs (actual.length <- c(9, 8) in the example above).
Thank you again for your help!

Is this what you are looking for ?
example.data[!is.na(example.data)] <- sample(example.data[!is.na(example.data)], replace = F, prob = NULL)

We can shuffle only the non-NA elements by creating an index of their positions:
i1 <- which(!is.na(example.data))
example.data[i1] <- example.data[sample(i1)]
example.data
#[1] 0.25 0.64 NA 0.83 0.12 1.00 0.42 0.47 NA NA 0.33 0.90
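For the follow-up question, the same index idea can be applied to each block of 12 separately. A sketch, assuming consecutive blocks of size ns (the helper name shuffle_blocks is hypothetical): the number of blocks and the non-NA positions are derived from the data, so nr and actual.length are not needed. It uses sample.int(length(idx)) rather than sample(idx), because sample() given a single number n samples from 1:n instead of returning that number:

```r
# Hypothetical helper: shuffle the non-NA values within consecutive blocks
# of size ns, leaving every NA at its original position
shuffle_blocks <- function(x, ns) {
  blocks <- ceiling(seq_along(x) / ns)        # block id for each element
  for (b in unique(blocks)) {
    idx <- which(blocks == b & !is.na(x))     # non-NA positions in block b
    x[idx] <- x[idx[sample.int(length(idx))]] # permute values among those positions
  }
  x
}

example.data <- c(0.33, 0.12, NA, 0.25, 0.47, 0.83, 0.90, 0.64, NA, NA, 1.00, 0.42,
                  0.73, NA, 0.56, 0.12, 1.0, 0.47, NA, 0.62, NA, 0.98, NA, 0.05)
set.seed(1)
shuffled <- shuffle_blocks(example.data, ns = 12)
shuffled
```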

Scatterplot with categorical x-axis (and uncertainties boxes) in R

I have made some calculations on data measured on several systems of photovoltaic panels. I have 11 different photovoltaic systems, and for each of them I have 3 different numerical values.
My results are in a matrix that has 11 rows (each of them corresponding to one of the photovoltaic systems), and 3 columns (containing the 3 numerical quantities computed for each system).
Here is a minimal reproducible matrix:
monthly_LR monthly_CSD monthly_HW
solon 0.398 0.417 0.48
sanyo 0.489 0.479 0.59
atersa NA NA NA
sunpower 0.129 NA 0.19
schott_efg 0.387 0.486 0.47
BP 0.235 0.161 0.22
solarworld 1.153 1.245 1.25
schott_main 0.531 0.628 0.62
wurth 2.889 2.886 2.85
first 1.631 1.651 1.64
mhi 0.974 0.888 1.02
and the corresponding dput output so you can reproduce it:
structure(c(0.398, 0.489, NA, 0.129, 0.387, 0.235, 1.153, 0.531,
2.889, 1.631, 0.974, 0.417, 0.479, NA, NA, 0.486, 0.161, 1.245,
0.628, 2.886, 1.651, 0.888, 0.48, 0.59, NA, 0.19, 0.47, 0.22,
1.25, 0.62, 2.85, 1.64, 1.02), .Dim = c(11L, 3L), .Dimnames = list(
c("solon", "sanyo", "atersa", "sunpower", "schott_efg", "BP",
"solarworld", "schott_main", "wurth", "first", "mhi"), c("monthly_LR",
"monthly_CSD", "monthly_HW")))
I also have another matrix which contains the uncertainties associated with each value of the first matrix:
monthly_LR_uncertainty monthly_CSD_uncertainty monthly_HW_uncertainty
solon 0.14 0.09 0.07
sanyo 0.13 0.06 0.07
atersa NA 0.13 NA
sunpower 0.18 0.18 0.20
schott_efg 0.14 0.07 0.06
BP 0.14 0.14 0.15
solarworld 0.16 0.04 0.03
schott_main 0.15 0.08 0.07
wurth 0.12 0.10 0.11
first 0.08 0.09 0.10
mhi 0.08 0.07 0.08
and the corresponding dput output so you can reproduce it:
structure(c(0.14, 0.13, NA, 0.18, 0.14, 0.14, 0.16, 0.15, 0.12,
0.08, 0.08, 0.09, 0.06, 0.13, 0.18, 0.07, 0.14, 0.04, 0.08, 0.1,
0.09, 0.07, 0.07, 0.07, NA, 0.2, 0.06, 0.15, 0.03, 0.07, 0.11,
0.1, 0.08), .Dim = c(11L, 3L), .Dimnames = list(c("solon", "sanyo",
"atersa", "sunpower", "schott_efg", "BP", "solarworld", "schott_main",
"wurth", "first", "mhi"), c("monthly_LR_uncertainty", "monthly_CSD_uncertainty",
"monthly_HW_uncertainty")))
Now, here is the type of scatterplot I would like to obtain (I almost got what I wanted with boxplots, but now I'd prefer a scatterplot):
I would like the x-axis to be categorical, as it is when I make a boxplot (i.e. one category for each of the 11 rows).
And above each category on the x-axis, I would like to have 3 points corresponding to the 3 values in the corresponding row of the first matrix, with boxes indicating the uncertainty on the results.
The image below (a graph from an article by a researcher who used to work in my lab) shows exactly what I want to obtain. The 11 categories on the x-axis correspond to my 11 rows. The three different points for each category (blue, red, green) correspond to the 3 values for each category in the first matrix. And the box associated with each point corresponds to the uncertainty (given in the second matrix).

Let's say a is the table with means and b is the table with uncertainties:
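# x positions, one per system (row of a)
x = 1:nrow(a)
# horizontal offset for data of same group
offset = 0.2
# draw empty plot, with room for the offsets and for the tops of the error bars
plot(NULL, xlim=c(0.5, nrow(a)+0.5), ylim=c(0, max(a+0.5*b, na.rm=TRUE)), xaxt='n', ylab='performance', xlab='')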
# add error bars (arrows with angle=90)
arrows(x0=x, x1=x, y0 = a[,1]-0.5*b[,1], y1 = a[,1]+0.5*b[,1], angle=90, code=3, len=0.01)
arrows(x0=x-offset, x1=x-offset, y0 = a[,2]-0.5*b[,2], y1 = a[,2]+0.5*b[,2], angle=90, code=3, col=2, len=0.02)
arrows(x0=x+offset, x1=x+offset, y0 = a[,3]-0.5*b[,3], y1 = a[,3]+0.5*b[,3], angle=90, code=3, col=4, len=0.02)
# add points
points(x, a[,1], pch=1, col=1)
points(x-offset, a[,2], pch=2, col=2)
points(x+offset, a[,3], pch=3, col=4)
# axis labels
axis(1, at = 1:nrow(a), labels = rownames(a), las=3)
# add legend
legend(x='topleft', legend=colnames(a), col=c(1,2,4), pch=c(1,2,3), inset=0.02)
Also have a look at this answer for grouped boxplots.
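The question asked for boxes rather than whiskers. A base-graphics variant of the answer above can draw each uncertainty as a small rectangle with rect(), assuming (as the answer does) that each uncertainty is the full height of the interval, so the box spans value ± 0.5 × uncertainty. The helper name plot_uncertainty_boxes, the box width, the offsets and the colours are illustrative choices, not taken from the question:

```r
# Means (a) and uncertainties (b) from the question's dput() output
a <- structure(c(0.398, 0.489, NA, 0.129, 0.387, 0.235, 1.153, 0.531,
  2.889, 1.631, 0.974, 0.417, 0.479, NA, NA, 0.486, 0.161, 1.245,
  0.628, 2.886, 1.651, 0.888, 0.48, 0.59, NA, 0.19, 0.47, 0.22,
  1.25, 0.62, 2.85, 1.64, 1.02), .Dim = c(11L, 3L), .Dimnames = list(
  c("solon", "sanyo", "atersa", "sunpower", "schott_efg", "BP",
  "solarworld", "schott_main", "wurth", "first", "mhi"), c("monthly_LR",
  "monthly_CSD", "monthly_HW")))
b <- structure(c(0.14, 0.13, NA, 0.18, 0.14, 0.14, 0.16, 0.15, 0.12,
  0.08, 0.08, 0.09, 0.06, 0.13, 0.18, 0.07, 0.14, 0.04, 0.08, 0.1,
  0.09, 0.07, 0.07, 0.07, NA, 0.2, 0.06, 0.15, 0.03, 0.07, 0.11,
  0.1, 0.08), .Dim = c(11L, 3L), .Dimnames = list(c("solon", "sanyo",
  "atersa", "sunpower", "schott_efg", "BP", "solarworld", "schott_main",
  "wurth", "first", "mhi"), c("monthly_LR_uncertainty", "monthly_CSD_uncertainty",
  "monthly_HW_uncertainty")))

# Hypothetical helper: one point per value, with a box spanning
# value +/- 0.5 * uncertainty vertically and +/- half_width horizontally
plot_uncertainty_boxes <- function(a, b, offsets = c(-0.2, 0, 0.2),
                                   half_width = 0.08, cols = c(4, 2, 3)) {
  x <- seq_len(nrow(a))
  top <- max(a + 0.5 * b, na.rm = TRUE)       # highest box edge
  plot(NULL, xlim = c(0.5, nrow(a) + 0.5), ylim = c(0, top),
       xaxt = "n", xlab = "", ylab = "performance")
  for (j in seq_len(ncol(a))) {
    xj <- x + offsets[j]
    ok <- !is.na(a[, j]) & !is.na(b[, j])     # skip missing values or uncertainties
    rect(xj[ok] - half_width, a[ok, j] - 0.5 * b[ok, j],
         xj[ok] + half_width, a[ok, j] + 0.5 * b[ok, j], border = cols[j])
    points(xj, a[, j], pch = 19, col = cols[j])
  }
  axis(1, at = x, labels = rownames(a), las = 2)
  legend("topleft", legend = colnames(a), col = cols, pch = 19, inset = 0.02)
  invisible(top)
}

plot_uncertainty_boxes(a, b)
```

The helper returns the upper y-limit invisibly, which makes it easy to reuse the same scale across several plots.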
