How to combine a list and data.frame in R?

I have a list named data3, like this (from JSON file).
data3 <- list(structure(c(14, 7, 10, 4, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(16, 10, 12, 6, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(77708, 39434, 45489, 30223, 34829 ), .Names = c("0", "3", "2", "14", "7")), structure(c(9828, 6855, 7967, 5638, 6263), .Names = c("0", "3", "2", "14", "7")), structure(c(7626, 5783, 6406, 5074, 5348), .Names = c("0", "3", "2", "14", "7")), structure(c(1012, 404, 546, 251, 300), .Names = c("0", "3", "2", "14", "7")))
and it has some missing values like
data3[4]
[[1]]
0 3 2 14 7
9828 6855 7967 5638 6263
> data3[400]
[[1]]
0 3 2
44 35 38
And I have a data.frame named data1, like this:
date d1 d2 d3 d4
3 20150402 4 5693 0 NEW
4 20150402 4 5693 0 UPGRADE(OEM)
5 20150402 4 5693 0 UPGRADE(ONLINE)
...
I need to combine them like
date d1 d2 d3 d4 0 2 3 7 14
20150402 4 5693 0 NEW 77708 39434 45489 30223 34829
The problem is that not all of data3 has the same number of elements.
I have tried this:
aaa <- NULL
for (i in 1:482){
aaa <- cbind(data1[i, ],data3[[i]])
}
but it didn't work.
Maybe there is another way to do this, but I have no idea.

I cannot reproduce your data1 data.frame, so I am posting an example that uses the first 6 rows of the popular iris dataset:
> data3 <- list(structure(c(14, 7, 10, 4, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(16, 10, 12, 6, 7), .Names = c("0", "3", "2", "14", "7")), structure(c(77708, 39434, 45489, 30223, 34829 ), .Names = c("0", "3", "2", "14", "7")), structure(c(9828, 6855, 7967, 5638, 6263), .Names = c("0", "3", "2", "14", "7")), structure(c(7626, 5783, 6406, 5074, 5348), .Names = c("0", "3", "2", "14", "7")), structure(c(1012, 404, 546, 251, 300), .Names = c("0", "3", "2", "14", "7")))
> t(as.data.frame(data3)) -> x
> rownames(x) <- NULL
> x
0 3 2 14 7
[1,] 14 7 10 4 7
[2,] 16 10 12 6 7
[3,] 77708 39434 45489 30223 34829
[4,] 9828 6855 7967 5638 6263
[5,] 7626 5783 6406 5074 5348
[6,] 1012 404 546 251 300
> cbind(iris[1:6,],x)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species 0 3 2 14 7
1 5.1 3.5 1.4 0.2 setosa 14 7 10 4 7
2 4.9 3.0 1.4 0.2 setosa 16 10 12 6 7
3 4.7 3.2 1.3 0.2 setosa 77708 39434 45489 30223 34829
4 4.6 3.1 1.5 0.2 setosa 9828 6855 7967 5638 6263
5 5.0 3.6 1.4 0.2 setosa 7626 5783 6406 5074 5348
6 5.4 3.9 1.7 0.4 setosa 1012 404 546 251 300
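The example above assumes every element of data3 carries the same names. When some elements are shorter, as with data3[400] in the question, one option is to index each vector by the full set of names so that the missing entries become NA. This is only a minimal sketch: the three-element data3 below is made up for illustration, and iris again stands in for data1.

```r
# Made-up data: the third element lacks the names "14" and "7"
data3 <- list(
  c("0" = 14, "3" = 7,  "2" = 10, "14" = 4, "7" = 7),
  c("0" = 16, "3" = 10, "2" = 12, "14" = 6, "7" = 7),
  c("0" = 44, "3" = 35, "2" = 38)
)

# Every name that occurs anywhere in the list
all_names <- unique(unlist(lapply(data3, names)))

# Indexing by a name that is absent yields NA, so each row gets full length
x <- t(vapply(data3,
              function(v) unname(v[all_names]),
              numeric(length(all_names))))
colnames(x) <- all_names

# iris is only a stand-in for data1 here
combined <- cbind(iris[seq_along(data3), ], x)
combined
```

The NA cells can be replaced afterwards (for example with 0) if that suits the data better.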


Errorbars and bar plots having different positions in ggplot

I have a dataframe df
> df
id zone mean SE
1 1 1 0.9378712 0.10
2 1 2 2.4830645 0.09
3 1 3 0.7191759 0.09
4 1 4 1.3030844 0.09
5 1 5 1.2497096 0.11
6 1 6 0.7247015 0.15
7 1 7 0.1776825 0.16
8 1 8 1.4755258 0.13
9 1 9 1.0902742 0.16
10 1 10 0.2679057 0.08
11 1 12 0.7677998 0.09
12 2 1 1.2728942 0.14
13 2 2 1.3189574 0.07
14 2 3 1.0934750 0.14
15 2 4 1.3024298 0.10
16 2 5 1.3029797 0.11
17 2 6 1.0878356 0.12
18 2 7 0.5390098 0.12
19 2 8 1.2761170 0.09
20 2 9 1.1395524 0.12
21 2 10 0.6863418 0.14
22 2 12 1.1534048 0.12
23 3 1 1.2963668 0.14
24 3 2 1.3032349 0.07
25 3 3 1.1302980 0.14
26 3 4 1.3049038 0.10
27 3 5 1.3221782 0.11
28 3 6 1.0464710 0.14
29 3 7 0.4997006 0.13
30 3 8 1.2777002 0.09
31 3 9 1.1480874 0.12
32 3 10 0.6844529 0.15
33 3 12 1.1593346 0.13
34 4 1 1.2819611 0.14
35 4 2 1.4276992 0.07
36 4 3 1.1061886 0.14
37 4 4 1.3572913 0.11
38 4 5 1.3588146 0.12
39 4 6 1.1318426 0.14
40 4 7 0.5321167 0.12
41 4 8 1.3701237 0.10
42 4 9 1.1996266 0.13
43 4 10 0.6977050 0.14
44 4 12 1.2620727 0.14
As the zone column shows, there is no zone 11; after 10 comes 12.
So when I plot it, it automatically comes out like this:
axis_labels <- c("first","second","third","fourth","fifth","sixth","seventh","eighth","ninth","tenth","eleventh")
axis_labels <- setNames(axis_labels, 1:11)
ggplot(df, aes(x=factor(zone), y=mean, fill = id)) +
geom_col(position = position_dodge()) +
scale_fill_discrete(labels = c("1" = "M", "2" = "I","3" = "Mi","4"="C"))+
scale_x_discrete(labels = axis_labels) +
theme(axis.title.x = element_blank(),
axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
theme(plot.margin = unit(rep(0, 5), "pt"))+
geom_errorbar(aes(x=zone, ymin=mean-SE, ymax=mean+SE), width=0.4, position = position_dodge(.9))+
theme_bw()
So the bars drawn at the eleventh position are actually the twelfth zone in the data frame, but the error bars are drawn at the actual twelfth position. How can I solve this problem without changing the whole code?
The problem comes down to a few things:
Up front, I'll make inferences about column class: I'm fairly confident that id should be character, but I'm not certain about zone. I'll guess character for now.
You use factor(zone) in one aesthetic and zone in another; either all of them should be factor, or none, otherwise you are confusing ggplot2 (and me).
You have 12 in your zone but your labels say eleventh, not sure if that's a typo or something else.
I think the fixes are to make a "proper" factor variable.
df$zone <- as.character(df$zone) # just in case
axis_labels <- setNames(axis_labels, c(1:10,12)) # no 11s in your data, no 12s in your labels
df$zone2 <- factor(axis_labels[df$zone], levels = axis_labels)
ggplot(df, aes(x=zone2, y=mean, fill = id)) +
geom_col(position = position_dodge()) +
scale_fill_discrete(labels = c("1" = "M", "2" = "I","3" = "Mi","4"="C"))+
scale_x_discrete(labels = axis_labels) +
theme(axis.title.x = element_blank(),
axis.line.x = element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()) +
theme(plot.margin = unit(rep(0, 5), "pt"))+
geom_errorbar(aes(x=zone2, ymin=mean-SE, ymax=mean+SE), width=0.4, position = position_dodge(.9))+
theme_bw()
Data:
df <- structure(list(id = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4", "4"), zone = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "12"), mean = c(0.9378712, 2.4830645, 0.7191759, 1.3030844, 1.2497096, 0.7247015, 0.1776825, 1.4755258, 1.0902742, 0.2679057, 0.7677998, 1.2728942, 1.3189574, 1.093475, 1.3024298, 1.3029797, 1.0878356, 0.5390098, 1.276117, 1.1395524, 0.6863418, 1.1534048, 1.2963668, 1.3032349, 1.130298, 1.3049038, 1.3221782, 1.046471, 0.4997006, 1.2777002, 1.1480874, 0.6844529, 1.1593346, 1.2819611, 1.4276992, 1.1061886, 1.3572913, 1.3588146, 1.1318426, 0.5321167, 1.3701237, 1.1996266, 0.697705, 1.2620727), SE = c(0.1, 0.09, 0.09, 0.09, 0.11, 0.15, 0.16, 0.13, 0.16, 0.08, 0.09, 0.14, 0.07, 0.14, 0.1, 0.11, 0.12, 0.12, 0.09, 0.12, 0.14, 0.12, 0.14, 0.07, 0.14, 0.1, 0.11, 0.14, 0.13, 0.09, 0.12, 0.15, 0.13, 0.14, 0.07, 0.14, 0.11, 0.12, 0.14, 0.12, 0.1, 0.13, 0.14, 0.14)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41", "42", "43", "44"))
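To see why the bars and error bars drift apart in the original plot, it may help to compare the x positions each layer ends up with. This is a small base R sketch rather than part of the original code: a factor is placed at the rank of its level (so zone 12 lands at position 11), while the bare numeric zone keeps its own value (12).

```r
zone <- c(1:10, 12)

# Discrete x: a factor is drawn at the integer rank of its level
bar_pos <- as.integer(factor(zone))

# Numeric x: the number itself is used as the position
errorbar_pos <- zone

data.frame(zone, bar_pos, errorbar_pos)
```

Only the last row differs, which matches the shifted error bars described in the question.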

R select rows in dataframe by external vector as index

I have the following data and I want to subset some rows from the table if the name is in the vector l.
df <-data.frame("Names" = c("TIGIT", "ABCB1", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "ABL1", "CD2", "IL12A", "PSEN2", "CD3G", "CD28", "PSEN1", "ITGA1"),"1S" = c("5", "6", "8", "99", "5", "0", "1", "3", "15", "15", "34", "62", "54", "6", "8", "9"), "1T" = c("6", "4", "6", "9", "5", "11", "33", "7", "8", "24", "34", "62", "66", "4", "78", "44"))
rownames(df) <- df$Names
df <- df %>% select(-"Names") # df I have
l <- c("TIGIT", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "CD2", "PSEN2", "CD3G", "CD28", "PSEN1") # genes I want to select
I want to get the following table in the output.
X1S X1T
TIGIT 5 6
CD8B 8 6
CD8A 99 9
CD1C 5 5
F2RL1 0 11
LCP1 1 33
LAG3 3 7
CD2 15 24
PSEN2 62 62
CD3G 54 66
CD28 6 4
PSEN1 8 78
It is easier to filter by the gene names if you keep them as a column
instead of making them rownames.
The following changes to your code will get you the result you are looking for.
library(tidyverse)
df <-data.frame("Names" = c("TIGIT", "ABCB1", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "ABL1", "CD2", "IL12A", "PSEN2", "CD3G", "CD28", "PSEN1", "ITGA1"),"1S" = c("5", "6", "8", "99", "5", "0", "1", "3", "15", "15", "34", "62", "54", "6", "8", "9"), "1T" = c("6", "4", "6", "9", "5", "11", "33", "7", "8", "24", "34", "62", "66", "4", "78", "44"))
genes_to_select <- c("TIGIT", "CD8B", "CD8A", "CD1C", "F2RL1", "LCP1", "LAG3", "CD2", "PSEN2", "CD3G", "CD28", "PSEN1") # genes I want to select
df <-
df %>%
filter(Names %in% genes_to_select) %>%
column_to_rownames("Names") %>%
mutate(across(everything(), as.numeric)) %>%
as.matrix()
df
#> X1S X1T
#> [1,] 5 6
#> [2,] 8 6
#> [3,] 99 9
#> [4,] 5 5
#> [5,] 0 11
#> [6,] 1 33
#> [7,] 3 7
#> [8,] 15 24
#> [9,] 62 62
#> [10,] 54 66
#> [11,] 6 4
#> [12,] 8 78
We could also use slice with match, which returns the rows in the order given in l (this assumes Names is still a column):
library(dplyr)
library(tibble)
df %>%
slice(match(l, Names)) %>%
column_to_rownames('Names')
One line does the job:
df[rownames(df) %in% l,]
X1S X1T
TIGIT 5 6
CD8B 8 6
CD8A 99 9
CD1C 5 5
F2RL1 0 11
LCP1 1 33
LAG3 3 7
CD2 15 24
PSEN2 62 62
CD3G 54 66
CD28 6 4
PSEN1 8 78
Or, if Names is still a column:
df[df$Names %in% l,]
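One detail worth noting about the base R approach: %in% keeps the rows in the order of the data frame, whereas indexing by name follows the order of l. A minimal sketch with a made-up miniature of the data (the "NOT_PRESENT" entry is hypothetical and only shows how a missing name can be skipped):

```r
# Made-up miniature of the data frame in the question
df <- data.frame(X1S = c(5, 6, 8, 99),
                 X1T = c(6, 4, 6, 9),
                 row.names = c("TIGIT", "ABCB1", "CD8B", "CD8A"))
l <- c("CD8A", "TIGIT", "NOT_PRESENT")

# Logical filter: rows come back in the order they have in df
by_membership <- df[rownames(df) %in% l, ]

# Name lookup: rows come back in the order of l; intersect() drops
# names that are not row names, which would otherwise give NA rows
by_name <- df[intersect(l, rownames(df)), ]

by_membership
by_name
```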

Looping Multiple variables using R

I imported data from a CSV file and wanted to create a "Comparison table" between two prices of a company's stock in 2018:
Table2018<-data.frame("Comparison"= c("Opening Price bigger than Adjusted Closing Price",
"Opening Pricee smaller than Adjusted Closing Price","Total trading days"),
"January18","February18",
"March18","April18","May18","June18","July18"
,"August18","September18","October18",
"November18","December18",stringsAsFactors = FALSE)
I have this set of code (all comparisons):
Table2018[1,2]<-sum(January18$Opening.Price > January18$Adjusted.Closing.Price)
Table2018[1,3]<-sum(February18$Opening.Price > February18$Adjusted.Closing.Price)
Table2018[1,4]<-sum(March18$Opening.Price > March18$Adjusted.Closing.Price)
Table2018[1,5]<-sum(April18$Opening.Price > April18$Adjusted.Closing.Price)
Table2018[1,6]<-sum(May18$Opening.Price > May18$Adjusted.Closing.Price)
Table2018[1,7]<-sum(June18$Opening.Price > June18$Adjusted.Closing.Price)
Table2018[1,8]<-sum(July18$Opening.Price > July18$Adjusted.Closing.Price)
Table2018[1,9]<-sum(August18$Opening.Price > August18$Adjusted.Closing.Price)
Table2018[1,10]<-sum(September18$Opening.Price > September18$Adjusted.Closing.Price)
Table2018[1,11]<-sum(October18$Opening.Price > October18$Adjusted.Closing.Price)
Table2018[1,12]<-sum(November18$Opening.Price > November18$Adjusted.Closing.Price)
Table2018[1,13]<-sum(December18$Opening.Price > December18$Adjusted.Closing.Price)
For those who asked, this is the final code part and my poor looking table:
Total.trading.days <- c(length(January18$ן..Date),length(February18$ן..Date),length(March18$ן..Date),length(April18$ן..Date),length(May18$ן..Date),length(June18$ן..Date),length(July18$ן..Date),length(August18$ן..Date),length(September18$ן..Date),length(October18$ן..Date),length(November18$ן..Date),length(December18$ן..Date))
#Displaying finished table
for (i in 1:12) {
Table2018[3,i+1]<-Total.trading.days[i]
Table2018[2,i+1]<-Total.trading.days[i]-as.numeric(Table2018[1,i+1])
}
Table2018
Comparison
1 Opening Price bigger than Adjusted Closing Price
2 Opening Pricee smaller than Adjusted Closing Price
3 Total trading days
X.January18. X.February18. X.March18. X.April18.
1 20 17 17 13
2 1 1 1 4
3 21 18 18 17
X.May18. X.June18. X.July18. X.August18.
1 19 18 14 18
2 1 0 2 2
3 20 18 16 20
X.September18. X.October18. X.November18.
1 8 17 19
2 3 2 0
3 11 19 19
X.December18.
1 16
2 4
3 20
dput(head(Table2018))
structure(list(Comparison = c("Opening Price bigger than Adjusted Closing Price",
"Opening Pricee smaller than Adjusted Closing Price", "Total trading days"
), X.January18. = c("20", "1", "21"), X.February18. = c("17",
"1", "18"), X.March18. = c("17", "1", "18"), X.April18. = c("13",
"4", "17"), X.May18. = c("19", "1", "20"), X.June18. = c("18",
"0", "18"), X.July18. = c("14", "2", "16"), X.August18. = c("18",
"2", "20"), X.September18. = c("8", "3", "11"), X.October18. = c("17",
"2", "19"), X.November18. = c("19", "0", "19"), X.December18. = c("16",
"4", "20")), row.names = c(NA, 3L), class = "data.frame")
The main problem is that this is too much code. In the second part of the code, how can I write a nice loop? Do I need one?
Why do the table's headers come out in this format: X.month. ?
I would also love tips on how to present my table more nicely.
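One possible direction, offered as a sketch rather than a definitive answer: keep the monthly data frames in a named list and let sapply() do the repetition. The two small data frames below are made up to stand in for the imported CSV data, and the row labels are shortened. The X.January18. headers appear to come from passing the month names to data.frame() as unnamed values, so the column names are built from the quoted strings and run through make.names(); building the table from a named list avoids that.

```r
# Made-up stand-ins for the monthly data frames read from CSV
months <- list(
  January18  = data.frame(Opening.Price          = c(10, 12, 11),
                          Adjusted.Closing.Price = c(9, 13, 10)),
  February18 = data.frame(Opening.Price          = c(8, 7),
                          Adjusted.Closing.Price = c(9, 6))
)

# One count per month, without repeating the line twelve times
bigger <- sapply(months, function(m) {
  sum(m$Opening.Price > m$Adjusted.Closing.Price)
})
total <- sapply(months, nrow)

# The list names become the column headers
Table2018 <- rbind(
  "Opening bigger than adjusted close"     = bigger,
  "Opening not bigger than adjusted close" = total - bigger,
  "Total trading days"                     = total
)
Table2018
```

With twelve real months, only the list needs to grow; the rest of the code stays the same.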

Convert multiple header table to long format

I am reading in an Excel table with multiple rows of headers, which, through read.csv, creates an object like this in R.
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
R5 <- c("26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21")
R6 <- c("18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40")
R7 <- c("22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50")
R8 <- c("26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99")
table1 <- as.data.frame(rbind(R1, R2, R3, R4, R5, R6, R7, R8))
How do I now 'flatten' this so that I end up with an R table with "Min age", "Max Age", "Min Duration", "Max Duration", "Area", "Level", "Price" columns. With the "Area" column showing either "EU" or "US", the "Level" column showing either 1, 2 or 3, and then the "Price" column showing the corresponding price found in the Excel table?
I would use the gather function from tidyr if there weren't multiple header rows, but can't seem to work it with this data, any ideas?
The output should have a total of 36 rows + headers
If you skip the first row, as suggested by akrun, you will presumably end up with data that looks something like this: (with "X"s and ".1"/".2" added automatically by R)
library(tidyverse)
df <- tribble(
~Min.Age, ~Max.Age, ~Min.Duration, ~Max.Duration, ~X1.1, ~X2.1, ~X3.1, ~X1.2, ~X2.2, ~X3.2,
"18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01",
"22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05",
"26", "30", "1", "3", "0.25", "0.50", "0.80", "0.90", "1.05", "1.21",
"18", "21", "4", "5", "0.32", "0.60", "0.95", "0.99", "1.30", "1.40",
"22", "25", "4", "5", "0.40", "0.70", "1.07", "1.20", "1.40", "1.50",
"26", "30", "4", "5", "0.55", "0.80", "1.09", "1.34", "1.67", "1.99"
)
With this data, you can then use gather to collect all headers beginning with X into one column and the prices into another. You can then separate the headers into "Level" and "Area". Finally, recode Area and remove the "X" from the levels.
df %>%
gather(headers, Price, starts_with("X")) %>%
separate(headers, c("Level", "Area")) %>%
mutate(Area = if_else(Area == "1", "EU", "US"),
Level = parse_number(Level))
#> # A tibble: 36 x 7
#> Min.Age Max.Age Min.Duration Max.Duration Level Area Price
#> <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 18 21 1 3 1 EU 0.12
#> 2 22 25 1 3 1 EU 0.20
#> 3 26 30 1 3 1 EU 0.25
#> 4 18 21 4 5 1 EU 0.32
#> 5 22 25 4 5 1 EU 0.40
#> 6 26 30 4 5 1 EU 0.55
#> 7 18 21 1 3 2 EU 0.32
#> 8 22 25 1 3 2 EU 0.40
#> 9 26 30 1 3 2 EU 0.50
#> 10 18 21 4 5 2 EU 0.60
#> # ... with 26 more rows
Created on 2018-10-12 by the reprex package (v0.2.1)
P.S. You can find lots of spreadsheet munging workflows here: https://nacnudus.github.io/spreadsheet-munging-strategies/small-multiples-with-all-headers-present-for-each-multiple.html
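If the two header rows have already been read in as data, as in table1 from the question, a base R alternative is to take the area and level labels from those rows and stack one block per price column. This is a sketch under assumptions: it uses only the first two data rows of the question's table, assumes the price columns are columns 5 to 10, and the names area, level, ids and long are illustrative.

```r
R1 <- c("X", "X.1", "X.2", "X.3", "EU", "EU.1", "EU.2", "US", "US.1", "US.2")
R2 <- c("Min Age", "Max Age", "Min Duration", "Max Duration", "1", "2", "3", "1", "2", "3")
R3 <- c("18", "21", "1", "3", "0.12", "0.32", "0.67", "0.80", "0.90", "1.01")
R4 <- c("22", "25", "1", "3", "0.20", "0.40", "0.70", "0.85", "0.98", "1.05")
table1 <- as.data.frame(rbind(R1, R2, R3, R4), stringsAsFactors = FALSE)

# Header row 1 holds the area (strip the ".1"/".2" suffixes), row 2 the level
area  <- sub("\\.[0-9]+$", "", unname(unlist(table1[1, 5:10])))
level <- as.integer(unname(unlist(table1[2, 5:10])))

# Everything below the two header rows is data
body <- table1[-(1:2), ]
ids  <- body[, 1:4]
names(ids) <- unname(unlist(table1[2, 1:4]))

# One block of rows per price column, stacked on top of each other
long <- do.call(rbind, lapply(seq_along(area), function(k) {
  cbind(ids, Area = area[k], Level = level[k],
        Price = as.numeric(body[[k + 4]]))
}))
rownames(long) <- NULL
long
```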

Creating a matrix indexed by names of vectors

Basically, I have several frequency tables d1 and d2. Suppose I have:
UPDATE2: The actual structure of d1 is table. So d1 is obtained by d1 <- table(datavector), similarly for d2.
d1
Value 0 1 2 3 4 9
Freq 25 30 100 10 10 10
d2
Value 0 1 3 5 7 11 13
Freq 25 30 100 10 10 10 12
Problem: I want to produce a matrix with rows corresponding to d1 and d2 and the columns corresponding to all the distinct "Values" seen in d1 and d2. So I want to produce a matrix with rows and columns that looks like this:
[,"0"] [,"1"] [,"2"] [,"3"] [,"4"] [,"5"] [,"7"] [,"9"] [,"11"] [,"13"]
[1,] 25 30 100 10 10 0 0 10 0 0
[2,] 25 30 0 100 0 10 10 0 10 12
Notice that there are no columns 6, 8, and 10 because those values do not appear in either frequency table. Eventually, I want to pass this matrix to the function image.plot().
UPDATE 1: I think I can allow columns 6, 8 and 10 to appear in the matrix, but eventually I will have to write a for loop to eliminate columns that consist only of zero entries.
UPDATE 3: Please note that I am in fact working with 250 data vectors and hence 250 tables (each with a different length / dimension). So I am looking for an efficient solution.
UPDATE 4: Please treat the above as an abstract of what I want to achieve. The real dataset is as follows:
> dput(head(get.dist(fnn[1])))
structure(c(0.999214894571557, 0.000134589502018843, 4.48631673396142e-05,
2.24315836698071e-05, 6.72947510094213e-05, 8.97263346792284e-05,
2.24315836698071e-05, 4.48631673396142e-05, 4.48631673396142e-05,
2.24315836698071e-05, 2.24315836698071e-05, 6.72947510094213e-05,
2.24315836698071e-05, 2.24315836698071e-05, 4.48631673396142e-05,
2.24315836698071e-05, 6.72947510094213e-05, 2.24315836698071e-05
), class = "table", .Dim = 18L, .Dimnames = structure(list(d = c("0",
"1", "2", "3", "4", "5", "8", "9", "11", "12", "15", "16", "17",
"18", "20", "22", "24", "31")), .Names = "d"))
> dput(head(get.dist(fnn[2])))
structure(c(0.71161956034096, 0.199147599820547, 0.0644010767160162,
0.0147599820547331, 0.00327501121579183, 0.000807537012113055,
6.72947510094213e-05, 0.000785105428443248, 0.000179452669358457,
0.000134589502018843, 0.000112157918349035, 4.48631673396142e-05,
6.72947510094213e-05, 0.00307312696276357, 0.00107671601615074,
0.000336473755047106, 6.72947510094213e-05, 2.24315836698071e-05,
2.24315836698071e-05), class = "table", .Dim = 19L, .Dimnames = structure(list(
d = c("0", "1", "2", "3", "4", "5", "6", "9", "10", "11",
"35", "36", "37", "38", "39", "40", "41", "42", "43")), .Names = "d"))
> dput(head(get.dist(fnn[3])))
structure(c(0.747353073126963, 0.13138178555406, 0.0295423956931359,
0.0139075818752804, 0.0119560340960072, 0.0151861821444594, 0.0243382682817407,
0.00697622252131, 0.00255720053835801, 0.00161507402422611, 0.00293853746074473,
0.00116644235082997, 0.004419021982952, 0.0018842530282638, 0.000628084342754598,
0.00053835800807537, 0.000448631673396142, 0.000493494840735756,
0.000650515926424406, 0.000403768506056528, 0.000269179004037685,
0.000179452669358457, 0.000269179004037685, 0.000179452669358457,
8.97263346792284e-05, 0.000246747420367878, 4.48631673396142e-05,
4.48631673396142e-05, 4.48631673396142e-05, 2.24315836698071e-05,
2.24315836698071e-05, 4.48631673396142e-05, 2.24315836698071e-05,
2.24315836698071e-05, 2.24315836698071e-05, 2.24315836698071e-05,
2.24315836698071e-05, 2.24315836698071e-05, 2.24315836698071e-05
), class = "table", .Dim = 39L, .Dimnames = structure(list(d = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "30", "32", "33", "34", "36", "37",
"38", "43", "54", "67")), .Names = "d"))
> dput(head(get.dist(fnn[4])))
structure(c(0.217743382682817, 0.49416778824585, 0.135150291610588,
0.0331987438313145, 0.0243831314490803, 0.0431135038133692, 0.022790489008524,
0.00912965455361149, 0.00614625392552714, 0.00937640197397936,
0.00244504262000897, 0.000560789591745177, 0.000493494840735756,
0.000448631673396142, 0.000336473755047106, 0.000112157918349035,
0.000201884253028264, 4.48631673396142e-05, 4.48631673396142e-05,
2.24315836698071e-05, 2.24315836698071e-05, 4.48631673396142e-05,
2.24315836698071e-05), class = "table", .Dim = 23L, .Dimnames = structure(list(
d = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14", "15", "16", "17", "18", "19", "23",
"25", "45")), .Names = "d"))
Here is an option using Reduce that seems to work given the provided data:
# make a list including your 3 dput parts
keylist <- list(d1,d2,d3)
result <- Reduce(function(...) merge(..., by="d", all=TRUE), keylist)
result <- transform(result,row.names=d,d=NULL)
result <- t(result)
rownames(result) <- NULL
It seems to work:
> result[,c(1:2,44:45)]
0 1 54 67
[1,] 0.9992149 0.0001345895 NA NA
[2,] 0.7116196 0.1991475998 NA NA
[3,] 0.7473531 0.1313817856 2.243158e-05 2.243158e-05
I was using dataframes, but if d1 and d2 were matrices this should still work if you removed the unlist calls:
M <- matrix(0, nrow=2, ncol=12 )
colnames(M) <- as.character(0:11)
M[1 , as.character(d1[1 , 2:7]) ] <- unlist(d1[2, 2:7 ])
M
# 0 1 2 3 4 5 6 7 8 9 10 11
#[1,] 25 30 100 10 10 0 0 0 0 10 0 0
#[2,] 0 0 0 0 0 0 0 0 0 0 0 0
M[2 , as.character(d2[1 , 2:7]) ] <- unlist(d2[2, 2:7 ])
M
#-------------------
0 1 2 3 4 5 6 7 8 9 10 11
[1,] 25 30 100 10 10 0 0 0 0 10 0 0
[2,] 25 30 0 100 0 10 0 10 0 0 0 10
Converting my examples to matrices (which inherit their indexing from the matrix class):
d1a <-data.matrix(d1[,-1])
rownames(d1a) <- d1[,1]
d2a <-data.matrix(d2[,-1])
rownames(d2a) <- d2[,1]
M[1 , as.character(d1a[1 , ]) ] <-d1a[2, ]
M[2 , as.character(d2a[1 , ]) ] <-d2a[2, ]
M
#---------
0 1 2 3 4 5 6 7 8 9 10 11
[1,] 25 30 100 10 10 0 0 0 0 10 0 0
[2,] 25 30 0 100 0 10 0 10 0 0 0 10
If as thelatemail thinks (although I do not) these are one row tables then it's even easier:
M[2 , colnames(d2b) ] <-d2b
M[1 , colnames(d1b) ] <-d1b
M
0 1 2 3 4 5 6 7 8 9 10 11
[1,] 25 30 100 10 10 0 0 0 0 10 0 0
[2,] 25 30 0 100 0 10 0 10 0 0 0 10
And please, please, please, no for-loops to be used on these:
> M[ , !colSums(M==0)==2]
0 1 2 3 4 5 7 9 11
[1,] 25 30 100 10 10 0 0 10 0
[2,] 25 30 0 100 0 10 10 0 10
You don't need to remove any zero columns if you don't create any:
You can probably create dist.list this way:
dist.list= lapply(fnn, get.dist)
# 3 element example built from your example
dist.list <- list()
dist.list[[1]] <-
structure(c(0.999214894571557, 0.000134589502018843, 4.48631673396142e-05,
2.24315836698071e-05, 6.72947510094213e-05, 8.97263346792284e-05,
2.24315836698071e-05, 4.48631673396142e-05, 4.48631673396142e-05,
2.24315836698071e-05, 2.24315836698071e-05, 6.72947510094213e-05,
2.24315836698071e-05, 2.24315836698071e-05, 4.48631673396142e-05,
2.24315836698071e-05, 6.72947510094213e-05, 2.24315836698071e-05
), class = "table", .Dim = 18L, .Dimnames = structure(list(d = c("0",
"1", "2", "3", "4", "5", "8", "9", "11", "12", "15", "16", "17",
"18", "20", "22", "24", "31")), .Names = "d"))
dist.list[[2]] <-
structure(c(0.71161956034096, 0.199147599820547, 0.0644010767160162,
0.0147599820547331, 0.00327501121579183, 0.000807537012113055,
6.72947510094213e-05, 0.000785105428443248, 0.000179452669358457,
0.000134589502018843, 0.000112157918349035, 4.48631673396142e-05,
6.72947510094213e-05, 0.00307312696276357, 0.00107671601615074,
0.000336473755047106, 6.72947510094213e-05, 2.24315836698071e-05,
2.24315836698071e-05), class = "table", .Dim = 19L, .Dimnames = structure(list(
d = c("0", "1", "2", "3", "4", "5", "6", "9", "10", "11",
"35", "36", "37", "38", "39", "40", "41", "42", "43")), .Names = "d"))
dist.list[[3]] <-
structure(c(0.747353073126963, 0.13138178555406, 0.0295423956931359,
0.0139075818752804, 0.0119560340960072, 0.0151861821444594, 0.0243382682817407,
0.00697622252131, 0.00255720053835801, 0.00161507402422611, 0.00293853746074473,
0.00116644235082997, 0.004419021982952, 0.0018842530282638, 0.000628084342754598,
0.00053835800807537, 0.000448631673396142, 0.000493494840735756,
0.000650515926424406, 0.000403768506056528, 0.000269179004037685,
0.000179452669358457, 0.000269179004037685, 0.000179452669358457,
8.97263346792284e-05, 0.000246747420367878, 4.48631673396142e-05,
4.48631673396142e-05, 4.48631673396142e-05, 2.24315836698071e-05,
2.24315836698071e-05, 4.48631673396142e-05, 2.24315836698071e-05,
2.24315836698071e-05, 2.24315836698071e-05, 2.24315836698071e-05,
2.24315836698071e-05, 2.24315836698071e-05, 2.24315836698071e-05
), class = "table", .Dim = 39L, .Dimnames = structure(list(d = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23",
"24", "25", "26", "27", "28", "30", "32", "33", "34", "36", "37",
"38", "43", "54", "67")), .Names = "d"))
all.names <- lapply(dist.list, names)
uniq.names <- unique(unlist(all.names))
M <- matrix(0, nrow=length(dist.list), ncol=length(uniq.names) )
colnames(M) <- uniq.names
for (i in seq_along(dist.list) ) {
M[i, all.names[[i]] ] <- dist.list[[i]] }
M
First 15 columns
0 1 2 3 4
[1,] 0.9992149 0.0001345895 4.486317e-05 2.243158e-05 6.729475e-05
[2,] 0.7116196 0.1991475998 6.440108e-02 1.475998e-02 3.275011e-03
[3,] 0.7473531 0.1313817856 2.954240e-02 1.390758e-02 1.195603e-02
5 8 9 11 12
[1,] 8.972633e-05 2.243158e-05 4.486317e-05 4.486317e-05 2.243158e-05
[2,] 8.075370e-04 0.000000e+00 7.851054e-04 1.345895e-04 0.000000e+00
[3,] 1.518618e-02 2.557201e-03 1.615074e-03 1.166442e-03 4.419022e-03
15 16 17 18 20
[1,] 2.243158e-05 6.729475e-05 2.243158e-05 2.243158e-05 4.486317e-05
[2,] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
[3,] 5.383580e-04 4.486317e-04 4.934948e-04 6.505159e-04 2.691790e-04
# remainder excluded
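The same name-indexing idea can be tried on the small d1 and d2 from the question. In this sketch the two tables are rebuilt with table() from made-up data vectors that reproduce the stated frequencies, and the column names are sorted numerically so the result lines up with the desired layout.

```r
# Made-up data vectors that reproduce the frequencies in the question
d1 <- table(rep(c(0, 1, 2, 3, 4, 9),      times = c(25, 30, 100, 10, 10, 10)))
d2 <- table(rep(c(0, 1, 3, 5, 7, 11, 13), times = c(25, 30, 100, 10, 10, 10, 12)))
tabs <- list(d1, d2)

# All distinct values, sorted as numbers rather than as strings
vals <- sort(unique(as.numeric(unlist(lapply(tabs, names)))))

M <- matrix(0, nrow = length(tabs), ncol = length(vals),
            dimnames = list(NULL, as.character(vals)))

# Fill each row by name; values a table never saw stay 0
for (i in seq_along(tabs)) {
  M[i, names(tabs[[i]])] <- tabs[[i]]
}
M
```

Because only observed values become columns, there is nothing to remove afterwards.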
If you turn your d1 and d2 into data.tables, you can easily merge them by a common key:
library(data.table)
> d1 <- data.table(value = c(0, 1, 2, 3, 4, 9), freq = c(25, 30, 100, 10, 10, 10))
> d2 <- data.table(value = c(0, 1, 3, 5, 7, 11), freq = c(25, 30, 100, 10, 10, 10))
> setkey(d1, value)
> setkey(d2, value)
> merge(d1, d2, all = TRUE)
value freq.x freq.y
1: 0 25 25
2: 1 30 30
3: 2 100 NA
4: 3 10 100
5: 4 10 NA
6: 5 NA 10
7: 7 NA 10
8: 9 10 NA
9: 11 NA 10
You can then convert the resulting data.table to a matrix, replace NAs with 0s, etc.
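That last step might look roughly like the sketch below. To keep it free of extra packages it uses base merge() on plain data frames instead of data.table, which is a substitution made here for illustration; the column names freq.x and freq.y are the default suffixes merge() adds.

```r
d1 <- data.frame(value = c(0, 1, 2, 3, 4, 9),  freq = c(25, 30, 100, 10, 10, 10))
d2 <- data.frame(value = c(0, 1, 3, 5, 7, 11), freq = c(25, 30, 100, 10, 10, 10))

# Full outer join on value; unmatched cells become NA
merged <- merge(d1, d2, by = "value", all = TRUE)

# One row per table, one column per value, NA replaced by 0
M <- t(as.matrix(merged[, c("freq.x", "freq.y")]))
M[is.na(M)] <- 0
dimnames(M) <- list(NULL, as.character(merged$value))
M
```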
