Ordering the x-axis in an R graph

I have a data.frame that looks like:
gvs order labels
1 -2.3321916 1 Adygei
2 -1.4996229 5 Basque
3 1.7958170 15 French
4 2.5543214 19 Italian
5 -2.7758460 33 Orcadian
6 -1.9659984 39 Russian
7 2.1239768 41 Sardinian
8 -1.8515908 47 Tuscan
9 -1.5597359 6 Bedouin
10 -1.2534511 14 Druze
11 -0.1625003 31 Mozabite
12 -1.0265275 35 Palestinian
13 -0.8519079 2 Balochi
14 -2.4279528 8 Brahui
15 -3.1717421 9 Burusho
16 -0.9258497 17 Hazara
17 -1.2207974 21 Kalash
18 -1.0325107 24 Makrani
19 -3.2102686 37 Pathan
20 -0.9377928 43 Sindhi
21 -1.7657017 48 Uygurf
22 -0.5058627 10 Cambodian
23 -0.7819299 12 Dai
24 -1.4095947 13 Daur
25 2.2810477 16 Han
26 -0.9007551 18 Hezhen
27 2.6614486 20 Japanese
28 -0.9441980 23 Lahu
29 -0.7237586 29 Miao
30 -0.9452944 30 Mongola
31 -1.2035258 32 Naxi
32 -0.7703779 34 Oroqen
33 -3.0895998 42 She
34 -0.7037952 45 Tu
35 -1.9311354 46 Tujia
36 -0.5423822 49 Xibo
37 -1.6244801 50 Yakut
38 -0.9049735 51 Yi
39 -2.6491331 11 Colombian
40 2.3706977 22 Karitiana
41 -2.7590587 26 Maya
42 -0.9614190 38 Pima
43 -1.6961014 44 Surui
44 -0.8449225 28 Melanesian
45 -1.1163019 36 Papuan
46 -0.9298674 3 BantuKenya
47 -2.8859587 4 BantuSouthAfrica
48 -1.4494841 7 BiakaPygmy
49 -0.7381369 25 Mandenka
50 -0.5644325 27 MbutiPygmy
51 -0.9195156 40 San
52 2.0949378 52 Yoruba
I would like to graph the column gvs with points arranged along the x-axis in the order given by the column order, and have each point's x-axis label come from the column labels. Does anyone know how this is done? I want the graph to look like a less colorful version of the graphs in Figure 5 of this paper: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004412

Based on your comments, it looks like (1) labels doesn't correspond to gvs and order, and (2) if I sort the first two columns by order, the data frame will be ordered properly. Please let me know if this is not correct.
Sort first two columns by order, leaving third column alone:
df[,c("gvs","order")] = df[order(df$order), c("gvs","order")]
Set the ordering of labels based on the current ordering of labels in the sample data frame:
df$labels = factor(df$labels, levels=df$labels)
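This factor-levels step is what fixes the order of a discrete ggplot axis; a minimal illustration (my addition, not part of the original answer):

```r
x <- c("b", "c", "a")

# By default, factor() sorts levels alphabetically
levels(factor(x))
# [1] "a" "b" "c"

# Passing levels = x keeps the order in which the values appear,
# which is what the df$labels line above relies on
f <- factor(x, levels = x)
levels(f)
# [1] "b" "c" "a"
```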
Add a grouping variable for region. I did this by creating a new group each time the alphabetic ordering of labels went "backwards". The regions are just numbers here, but you can give them descriptive names if you want to use them:
df$group = c(0, cumsum(diff(match(substr(df$labels,1,1), LETTERS)) < 0))
Add fake p-values (since point size was based on p-value in the graph you linked to):
set.seed(595)
df$p.value = runif(nrow(df), 0, 0.5)
Plot the data, including a different color for each regional group, point size based on p-value, and black borders around points with p < 0.05. geom_line adds the regional means:
library(dplyr)
library(ggplot2)

ggplot(df, aes(labels, gvs, size=p.value, fill=factor(group))) +
  geom_line(data=df %>% group_by(group) %>% mutate(gvs=mean(gvs)),
            aes(group=group, colour=factor(group)), size=0.8, alpha=0.5) +
  geom_point(pch=21, stroke=1, aes(color=p.value<0.05)) +
  theme_bw() +
  theme(axis.text.x=element_text(angle=-90, hjust=0, vjust=0.5),
        panel.grid.major=element_blank(),
        panel.grid.minor=element_blank()) +
  scale_size_continuous(name="p values", limits=c(0, 0.5), breaks=seq(0,1,0.1), range=c(4,1)) +
  scale_color_manual(values=c(hcl(seq(15,375,length.out=8),100,65)[1:7], NA, "black")) +
  labs(x="Language", fill="Region") +
  guides(colour=FALSE,
         size=guide_legend(reverse=TRUE, override.aes=list(color=NA, fill="grey50")),
         fill=guide_legend(reverse=TRUE, override.aes=list(color=NA, size=3)))

Read data frame:
df <- data.frame(gvs = c(-2.3321916, -1.4996229, 1.795817, 2.5543214, -2.775846, -1.9659984,
2.1239768, -1.8515908, -1.5597359, -1.2534511, -0.1625003, -1.0265275,
-0.8519079, -2.4279528, -3.1717421, -0.9258497, -1.2207974, -1.0325107,
-3.2102686, -0.9377928, -1.7657017, -0.5058627, -0.7819299, -1.4095947,
2.2810477, -0.9007551, 2.6614486, -0.944198, -0.7237586, -0.9452944,
-1.2035258, -0.7703779, -3.0895998, -0.7037952, -1.9311354, -0.5423822,
-1.6244801, -0.9049735, -2.6491331, 2.3706977, -2.7590587, -0.961419,
-1.6961014, -0.8449225, -1.1163019, -0.9298674, -2.8859587, -1.4494841,
-0.7381369, -0.5644325, -0.9195156, 2.0949378),
order = c(1L, 5L, 15L, 19L, 33L, 39L, 41L, 47L, 6L, 14L, 31L, 35L, 2L,
8L, 9L, 17L, 21L, 24L, 37L, 43L, 48L, 10L, 12L, 13L, 16L, 18L,
20L, 23L, 29L, 30L, 32L, 34L, 42L, 45L, 46L, 49L, 50L, 51L, 11L,
22L, 26L, 38L, 44L, 28L, 36L, 3L, 4L, 7L, 25L, 27L, 40L, 52L),
labels = c("Adygei", "Basque", "French", "Italian", "Orcadian", "Russian",
"Sardinian", "Tuscan", "Bedouin", "Druze", "Mozabite", "Palestinian",
"Balochi", "Brahui", "Burusho", "Hazara", "Kalash", "Makrani",
"Pathan", "Sindhi", "Uygurf", "Cambodian", "Dai", "Daur", "Han",
"Hezhen", "Japanese", "Lahu", "Miao", "Mongola", "Naxi", "Oroqen",
"She", "Tu", "Tujia", "Xibo", "Yakut", "Yi", "Colombian", "Karitiana",
"Maya", "Pima", "Surui", "Melanesian", "Papuan", "BantuKenya",
"BantuSouthAfrica", "BiakaPygmy", "Mandenka", "MbutiPygmy", "San",
"Yoruba"))
Order the data:
df.ordered <- df[ order(df$order) , ]
And some simple (ugly) sample plotting which you can surely improve upon (maybe with ggplot). Note xaxt = "n" suppresses the default numeric axis so the custom labels aren't drawn on top of it:
plot(df.ordered$gvs, pch = 19, xaxt = "n")
axis(1, at = 1:52, labels = df.ordered$labels, las = 2)

Another option that doesn't rely on the sorting of the data frame is to use the limits parameter of a discrete scale (which, as a side benefit, allows you to do more arbitrary ordering when plotting).
library(ggplot2)

df <- read.csv('/path/to/file/df.csv')
xorder <- df[order(df$order), 'labels']
ggplot(df, aes(x=labels, y=gvs, size=gvs)) +
  geom_point() +
  scale_x_discrete(limits=xorder) +
  theme(axis.text.x=element_text(angle=90))
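Yet another route (my addition, not from the answers above): let reorder() derive the factor levels from the numeric order column, so neither the data frame nor the scale needs manual sorting. A sketch on a three-row toy version of df:

```r
# Toy version of the problem: three labels with a desired numeric order
df <- data.frame(gvs    = c(1.8, -2.3, -1.5),
                 order  = c(15L, 1L, 5L),
                 labels = c("French", "Adygei", "Basque"))

# reorder() sorts the factor levels by a second variable (its mean, by default),
# so a discrete ggplot x-axis will follow `order` automatically
df$labels <- reorder(factor(df$labels), df$order)

levels(df$labels)
# [1] "Adygei" "Basque" "French"
```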


Drop observations if there are inconsistent variables within same ID [duplicate]

df <- structure(list(id = c(123L, 123L, 123L, 45L, 45L, 9L, 103L, 103L,
22L, 22L, 22L), age = c(69L, 23L, 70L, 29L, 29L, 37L, 25L, 54L,
40L, 40L, 41L)), class = "data.frame", row.names = c(NA, -11L
))
id age
1 123 69
2 123 23
3 123 70
4 45 29
5 45 29
6 9 37
7 103 25
8 103 54
9 22 40
10 22 40
11 22 41
I would like to drop all observations for an id if it is associated with different values for age. How can I do that?
I would be left with:
id age
45 29
45 29
9 37
A dplyr approach:
library(dplyr)
df |>
  group_by(id) |>
  filter(n_distinct(age) == 1)
Without external packages, you could use ave():
df |>
  subset(ave(age, id, FUN = \(x) length(unique(x))) == 1)
# id age
# 4 45 29
# 5 45 29
# 6 9 37
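As a quick self-check (my addition), the base-R version reproduces the expected output from the question:

```r
df <- structure(list(id = c(123L, 123L, 123L, 45L, 45L, 9L, 103L, 103L, 22L, 22L, 22L),
                     age = c(69L, 23L, 70L, 29L, 29L, 37L, 25L, 54L, 40L, 40L, 41L)),
                class = "data.frame", row.names = c(NA, -11L))

# Keep only ids whose ages are all identical
res <- subset(df, ave(age, id, FUN = function(x) length(unique(x))) == 1)

res$id   # 45 45  9
res$age  # 29 29 37
```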

tidyverse replace NA by other data frame values under condition

I have a first, incomplete dataset data_incom, and a second one, data_to_com, containing the missing values of the first. Using mutate(UG = case_when(INSEE == "07185" ~ 6, etc)) overwrites the "UG" column. How can I replace the NA values in the first dataset with the values from the second table using the tidyverse tools?
Thank you!
data_incom <- structure(list(INSEE = c("07005", "07005", "07010", "07011",
"07011", "07012", "07019", "07025", "07026", "07032", "07033",
"07042", "07064", "07066", "07068", "07069", "07075", "07088",
"07096", "07099", "07101", "07101", "07105", "07105", "07107",
"07110", "07117", "07117", "07119", "07128", "07129", "07131",
"07144", "07153", "07154", "07159", "07161", "07161", "07168",
"07172", "07173", "07185", "07186", "07202", "07204", "07228",
"07232", "07240", "07261", "07265", "07273", "07279", "07284",
"07286", "07294", "07301", "07315", "07329", "07330", "07331",
"07338", "07338", "07347", "07187", "07265", "07334", "07262"
), UG = c(NA, NA, 2L, NA, NA, 10L, 13L, 28L, 26L, 15L, 21L, 19L,
11L, 16L, 8L, 6L, 26L, 25L, 11L, 18L, 21L, 21L, 26L, 26L, 24L,
25L, 25L, 25L, NA, 3L, 8L, 22L, 24L, NA, 28L, NA, 28L, 28L, 21L,
1L, 12L, NA, 15L, 24L, 7L, 1L, 24L, 9L, 9L, 2L, 18L, 19L, NA,
11L, 21L, 6L, NA, 24L, 18L, 28L, 8L, 8L, 3L, 24L, 2L, 20L, 24L
)), row.names = c(NA, -67L), class = "data.frame")
data_to_com <-structure(list(INSEE=c("07185", "07284", "07315", "07153", "07119", "07159", "070005"),
UG=c(6L,20L,24L,28L,26L,15L,17L)), row.names = c(NA,7L), class = "data.frame")
You can use the following solution. There are some INSEE values in the first data set that weren't present in the second data set and I just left them as NA values.
library(dplyr)
library(tidyr)
data_incom %>%
  filter(is.na(UG)) %>%
  rowwise() %>%
  mutate(UG = list(data_to_com$UG[grepl(INSEE, data_to_com$INSEE)])) %>%
  unnest(cols = c(UG)) -> data_com

data_com %>%
  bind_rows(data_incom %>%
              filter(!INSEE %in% data_com$INSEE)) %>%
  arrange(INSEE)
# A tibble: 67 x 2
INSEE UG
<chr> <int>
1 07005 NA
2 07005 NA
3 07010 2
4 07011 NA
5 07011 NA
6 07012 10
7 07019 13
8 07025 28
9 07026 26
10 07032 15
# ... with 57 more rows
You can use coalesce() in this kind of scenario:
left_join() keeps all rows from data_incom;
coalesce() then fills in the missing UG values;
.keep = 'unused' in mutate() drops the leftover UG.x and UG.y columns.
library(dplyr)
data_incom %>%
  left_join(data_to_com, by = 'INSEE') %>%
  mutate(UG = coalesce(UG.x, UG.y), .keep = 'unused')
INSEE UG
1 07005 NA
2 07005 NA
3 07010 2
4 07011 NA
5 07011 NA
6 07012 10
7 07019 13
8 07025 28
9 07026 26
10 07032 15
11 07033 21
12 07042 19
13 07064 11
14 07066 16
15 07068 8
16 07069 6
17 07075 26
18 07088 25
19 07096 11
20 07099 18
21 07101 21
22 07101 21
23 07105 26
24 07105 26
25 07107 24
26 07110 25
27 07117 25
28 07117 25
29 07119 26
30 07128 3
31 07129 8
32 07131 22
33 07144 24
34 07153 28
35 07154 28
36 07159 15
37 07161 28
38 07161 28
39 07168 21
40 07172 1
41 07173 12
42 07185 6
43 07186 15
44 07202 24
45 07204 7
46 07228 1
47 07232 24
48 07240 9
49 07261 9
50 07265 2
51 07273 18
52 07279 19
53 07284 20
54 07286 11
55 07294 21
56 07301 6
57 07315 24
58 07329 24
59 07330 18
60 07331 28
61 07338 8
62 07338 8
63 07347 3
64 07187 24
65 07265 2
66 07334 20
67 07262 24
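For comparison (my addition, base R rather than a join): match() plus ifelse() does the same NA-filling, and is essentially what the left_join is doing under the hood. A sketch on toy versions of the two tables:

```r
# Toy versions of data_incom / data_to_com (same shape, fewer rows)
data_incom  <- data.frame(INSEE = c("07005", "07010", "07185"), UG = c(NA, 2L, NA))
data_to_com <- data.frame(INSEE = c("07185", "07284"), UG = c(6L, 20L))

# match() finds, for each row of data_incom, the matching row of data_to_com (or NA);
# ifelse() then fills only the missing UG values
idx <- match(data_incom$INSEE, data_to_com$INSEE)
data_incom$UG <- ifelse(is.na(data_incom$UG), data_to_com$UG[idx], data_incom$UG)

data_incom$UG
# [1] NA  2  6
```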

Is there a way I can calculate the area of an ecdf?

I have this kind of data set
B_dong A_dong count mean count2
14 1118053 1118053 15 129.7333 15
16 1118053 1118055 18 171.8333 33
19 1118053 1118060 4 270.7500 37
13 1118053 1118052 13 374.8462 50
17 1118053 1118057 21 389.9524 71
20 1118053 1118061 9 418.0000 80
15 1118053 1118054 10 435.1000 90
1 1118053 1102054 2 3066.0000 237
44 1118053 1122060 2 3080.0000 239
2 1118053 1102059 1 4867.0000 240
test2 <- structure(list(B_dong = c(1118053L, 1118053L, 1118053L, 1118053L, 1118053L,
                                   1118053L, 1118053L, 1118053L, 1118053L, 1118053L),
                        A_dong = c(1118053L, 1118055L, 1118060L, 1118052L, 1118057L,
                                   1118061L, 1118054L, 1102054L, 1122060L, 1102059L),
                        count = c(15L, 18L, 4L, 13L, 21L, 9L, 10L, 2L, 2L, 1L),
                        mean = c(129.733333333333, 171.833333333333, 270.75, 374.846153846154,
                                 389.952380952381, 418, 435.1, 3066, 3080, 4867),
                        count2 = c(15L, 33L, 37L, 50L, 71L, 80L, 90L, 237L, 239L, 240L)),
                   row.names = c(14L, 16L, 19L, 13L, 17L, 20L, 15L, 1L, 44L, 2L),
                   class = "data.frame")
And I plotted an ecdf with this data set using
plot(ecdf(test2$mean), ylab="Fn(x)", verticals = FALSE,
col.01line = "gray70", pch = 19)
Is there a way I can calculate the area (integral) of the ecdf graph in R?
Sure. If you look at the plot, it's effectively a series of rectangles (red lines added):
segments(sort(unique(test2$mean)), 0,
sort(unique(test2$mean)), ecdf(test2$mean)(sort(unique(test2$mean))),
col = "red")
The area under each "step" is just the width times the height of the corresponding rectangle.
The widths are just the differences between each of the unique values (sorted):
diff(sort(unique(test2$mean)))
# [1] 42.10000 98.91667 104.09615 15.10623 28.04762 17.10000 2630.90000 14.00000 1787.00000
The heights are found empirically from the ecdf call's return-value function on these values:
ecdf(test2$mean)(sort(unique(test2$mean)))
# [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Since there is one fewer width than heights (the right-most is always a single point ... and we all know that the area under a single point is zero), we can discard the last height.
So we just multiply the heights by the widths and sum them up:
sum(
head(ecdf(test2$mean)(sort(unique(test2$mean))), n = -1) *
diff(sort(unique(test2$mean)))
)
# [1] 3546.678
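The same arithmetic can be wrapped in a small helper; ecdf_area is a made-up name (my addition):

```r
# Area under an ecdf between min(x) and max(x):
# left-edge ecdf height of each step times the width to the next unique value
ecdf_area <- function(x) {
  u <- sort(unique(x))
  h <- ecdf(x)(u)
  sum(head(h, -1) * diff(u))
}

ecdf_area(c(0, 1))    # one step of height 0.5 over width 1 -> 0.5
ecdf_area(c(0, 1, 2)) # 1/3 * 1 + 2/3 * 1 -> 1
```

Running ecdf_area(test2$mean) reproduces the 3546.678 computed above.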

Loops with random sampling from a matrix and distance calculation

I got a list of nodes, and I need to randomly assign 'p' hubs to 'n' clients.
I got the following data, where the first row shows:
The total number of nodes.
The requested number of hubs.
The total supply capacity for each hub.
The following lines show:
The first column the node number.
The second column the "x" coordinate.
The third the "y" coordinate.
Below I will show the raw data, adding colnames() it would look something like this:
total_nodes hubs_required total_capacity
50 5 120
node number x_coordinate y_coordinate node_demand
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
. . . .
. . . .
. . . .
50 1 58 2
The x and y values are provided so we can calculate the Euclidean distance.
nodes:
50 5 120
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
9 89 11 7
10 59 72 6
11 39 82 10
12 87 24 18
13 44 76 3
14 2 83 6
15 19 43 20
16 5 27 4
17 58 72 14
18 14 50 11
19 43 18 19
20 87 7 15
21 11 56 15
22 31 16 4
23 51 94 13
24 55 13 13
25 84 57 5
26 12 2 16
27 53 33 3
28 53 10 7
29 33 32 14
30 69 67 17
31 43 5 3
32 10 75 3
33 8 26 12
34 3 1 14
35 96 22 20
36 6 48 13
37 59 22 10
38 66 69 9
39 22 50 6
40 75 21 18
41 4 81 7
42 41 97 20
43 92 34 9
44 12 64 1
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
49 54 59 9
50 1 58 2
I extracted the information from the first line.
nodes <- as.matrix(read.table(data))
header<-colnames(nodes)
clean_header <-gsub('X','',header)
requested_hubs <- as.numeric(clean_header[2])
max_supply_capacity <- as.numeric(clean_header[3])
I need to randomly select 5 nodes, that will act as hubs
set.seed(37)
node_to_hub <-nodes[sample(nrow(nodes),requested_hubs,replace = FALSE),]
Then I need to randomly assign nodes to each hub, calculate the distances between the hub and each of those nodes, and, once max_supply_capacity (120) is exceeded, move on to the next hub and repeat the process.
After the final iteration I need to return the cumulative sum of distances for all the hubs.
I need to repeat this process 100 times and return the min() value of the cumulative sum of distances.
This is where I'm completely stuck since I'm not sure how to loop through a matrix let alone when I have to select elements randomly.
I got the following elements:
capacity <- c(numeric()) # needs to be <= to 120
distance_sum <- c(numeric())
global_hub_distance <- c(numeric())
The formula for the Euclidean distance (rounded) would be as below, but I'm not sure how I can reflect the random selection when assigning nodes.
distance <-round(sqrt(((node_to_hub[i,2]-nodes[i,2]))^2+(node_to_hub[random,3]-nodes[random,3])^2))
The idea for the loop I think I need is below, but as I mentioned before I don't know how to deal with the sample client selection, and the distance calculation of the random clients.
for(i in 1:100){
  node_to_hub
  for(i in 1:nrow(node_to_hub)){
    # Should I randomly sample the clients here???
    while(capacity < 120){
      node_demand <- nodes[**random**, 3]
      distance <- round(sqrt((node_to_hub[i,2] - nodes[i,2])^2 + (node_to_hub[**random**,3] - nodes[**random**,3])^2))
      capacity <- c(capacity, node_demand)
      distance_sum <- c(distance_sum, distance)
    }
    global_hub_distance <- c(global_hub_distance, distance_sum)
    capacity <- 0
    distance_sum <- 0
  }
  min(global_hub_distance)
}
Not EXACTLY sure what you are looking for, but this code may be able to help you. It's not extremely fast: instead of using a while to stop after hitting your total_capacity, it just does a cumsum on the full node list and finds the place where you exceed 120.
nodes <- structure(list(node_number = 1:50,
x = c(2L, 80L, 36L, 57L, 33L, 76L, 77L, 94L,
89L, 59L, 39L, 87L, 44L, 2L, 19L, 5L,
58L, 14L, 43L, 87L, 11L, 31L, 51L, 55L,
84L, 12L, 53L, 53L, 33L, 69L, 43L, 10L,
8L, 3L, 96L, 6L, 59L, 66L, 22L, 75L, 4L,
41L, 92L, 12L, 60L, 35L, 38L, 9L, 54L, 1L),
y = c(62L, 25L, 88L, 23L, 17L, 43L, 85L, 6L, 11L,
72L, 82L, 24L, 76L, 83L, 43L, 27L, 72L, 50L,
18L, 7L, 56L, 16L, 94L, 13L, 57L, 2L, 33L, 10L,
32L, 67L, 5L, 75L, 26L, 1L, 22L, 48L, 22L, 69L,
50L, 21L, 81L, 97L, 34L, 64L, 84L, 100L, 2L, 9L, 59L, 58L),
node_demand = c(3L, 14L, 1L, 14L, 19L, 2L, 14L, 6L,
7L, 6L, 10L, 18L, 3L, 6L, 20L, 4L,
14L, 11L, 19L, 15L, 15L, 4L, 13L,
13L, 5L, 16L, 3L, 7L, 14L, 17L,
3L, 3L, 12L, 14L, 20L, 13L, 10L,
9L, 6L, 18L, 7L, 20L, 9L, 1L, 8L,
5L, 1L, 7L, 9L, 2L)),
.Names = c("node_number", "x", "y", "node_demand"),
class = "data.frame", row.names = c(NA, -50L))
total_nodes = nrow(nodes)
hubs_required = 5
total_capacity = 120
iterations <- 100
track_sums <- matrix(NA, nrow = iterations, ncol = hubs_required)
colnames(track_sums) <- paste0("demand_at_hub",1:hubs_required)
And then I prefer using a function for distance, in this case A and B are 2 separate vectors with c(x,y) and c(x,y).
euc.dist <- function(A, B) round(sqrt(sum((A - B) ^ 2))) # distances
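As a quick sanity check of that helper (my addition), the classic 3-4-5 right triangle, plus the distance between nodes 1 and 2 above:

```r
# Same helper as above: rounded Euclidean distance between two c(x, y) points
euc.dist <- function(A, B) round(sqrt(sum((A - B) ^ 2)))

euc.dist(c(0, 0), c(3, 4))    # 3-4-5 triangle: 5
euc.dist(c(2, 62), c(80, 25)) # nodes 1 and 2: sqrt(78^2 + 37^2) ~ 86.3, rounded to 86
```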
The Loop:
for(i in 1:iterations){
# random hub selection
hubs <- nodes[sample(1:total_nodes, hubs_required, replace = FALSE),]
for(h in 1:hubs_required){
# sample the nodes into a random order
random_nodes <- nodes[sample(1:nrow(nodes), size = nrow(nodes), replace = FALSE),]
# cumulative sum their demand, and get which number passes 120,
# and subtract 1 to get the node before that
last <- which(cumsum(random_nodes$node_demand) > total_capacity) [1] - 1
# get sum of all distances to those nodes (1 through the last)
all_distances <- apply(random_nodes[1:last,], 1, function(rn) {
euc.dist(A = hubs[h,c("x","y")],
B = rn[c("x","y")])
})
track_sums[i,h] <- sum(all_distances)
}
}
min(rowSums(track_sums))
EDIT
as a function:
hubnode <- function(nodes, hubs_required = 5, total_capacity = 120, iterations = 10){
# initialize results matrices
track_sums <- node_count <- matrix(NA, nrow = iterations, ncol = hubs_required)
colnames(track_sums) <- paste0("demand_at_hub",1:hubs_required)
colnames(node_count) <- paste0("nodes_at_hub",1:hubs_required)
# user defined distance function (only exists within hubnode() function)
euc.dist <- function(A, B) round(sqrt(sum((A - B) ^ 2)))
for(i in 1:iterations){
# random hub selection
assigned_hubs <- sample(1:nrow(nodes), hubs_required, replace = FALSE)
hubs <- nodes[assigned_hubs,]
assigned_nodes <- NULL
for(h in 1:hubs_required){
# sample the nodes into a random order
assigned_nodes <- sample((1:nrow(nodes))[-assigned_hubs], replace = FALSE)
random_nodes <- nodes[assigned_nodes,]
# cumulative sum their demand, and get which number passes 120,
# and subtract 1 to get the node before that
last <- which(cumsum(random_nodes$node_demand) > total_capacity) [1] - 1
# if there are none
if(is.na(last)) last = nrow(random_nodes)
node_count[i,h] <- last
# get sum of all distances to those nodes (1 through the last)
all_distances <- apply(random_nodes[1:last,], 1, function(rn) {
euc.dist(A = hubs[h,c("x","y")],
B = rn[c("x","y")])
})
track_sums[i,h] <- sum(all_distances)
}
}
return(list(track_sums = track_sums, node_count = node_count))
}
output <- hubnode(nodes, iterations = 100)
node_count <- output$node_count
track_sums <- output$track_sums
plot(rowSums(node_count),
rowSums(track_sums), xlab = "Node Count", ylab = "Total Demand", main = paste("Result of", 100, "iterations"))
min(rowSums(track_sums))

Converting a list of vectors and numbers (from replicate) into a data frame

After running the replicate() function [a close relative of lapply()] on some data I ended up with an output that looks like this
myList <- structure(list(c(55L, 13L, 61L, 38L, 24L), 6.6435972422341, c(37L, 1L, 57L, 8L, 40L), 5.68336098665417, c(19L, 10L, 23L, 52L, 60L ),
5.80430476680636, c(39L, 47L, 60L, 14L, 3L), 6.67554407822367,
c(57L, 8L, 53L, 6L, 2L), 5.67149520387856, c(40L, 8L, 21L,
17L, 13L), 5.88446015238962, c(52L, 21L, 22L, 55L, 54L),
6.01685181395007, c(12L, 7L, 1L, 2L, 14L), 6.66299948053721,
c(41L, 46L, 21L, 30L, 6L), 6.67239635545512, c(46L, 31L,
11L, 44L, 32L), 6.44174324641076), .Dim = c(2L, 10L), .Dimnames = list(
c("reps", "score"), NULL))
In this case the vectors of integers are indexes that went into a function that I won't get into and the scalar-floats are scores.
I'd like a data frame that looks like
Index 1 Index 2 Index 3 Index 4 Index 5 Score
55 13 61 38 24 6.64
37 1 57 8 40 5.68
19 10 23 52 60 5.80
and so on.
Alternatively, a matrix of the indexes and an array of the values would be fine too.
Things that haven't worked for me (random.out is renamed myList above):
data.frame(t(myList)) # just gives a data frame with a column of vectors and another of scalars
cbind(t(myList)) # same as above
do.call(rbind, myList) # intersperses vectors and scalars
I realize other people have similar problems,
eg. Convert list of vectors to data frame
but I can't quite find an example with this particular kind of vectors and scalars together.
myList[1,] is a list of vectors, so you can combine them into a matrix with do.call and rbind. myList[2,] is a list of single scores, so you can combine them into a vector with unlist:
cbind(as.data.frame(do.call(rbind, myList[1,])), Score=unlist(myList[2,]))
# V1 V2 V3 V4 V5 Score
# 1 55 13 61 38 24 6.643597
# 2 37 1 57 8 40 5.683361
# 3 19 10 23 52 60 5.804305
# 4 39 47 60 14 3 6.675544
# 5 57 8 53 6 2 5.671495
# 6 40 8 21 17 13 5.884460
# 7 52 21 22 55 54 6.016852
# 8 12 7 1 2 14 6.662999
# 9 41 46 21 30 6 6.672396
# 10 46 31 11 44 32 6.441743
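A small variation on that one-liner (my addition) also produces the exact column names requested in the question; indexing by dimname (myList["reps", ]) is equivalent to myList[1, ]:

```r
myList <- structure(list(c(55L, 13L, 61L, 38L, 24L), 6.6435972422341, c(37L, 1L, 57L, 8L, 40L),
                         5.68336098665417, c(19L, 10L, 23L, 52L, 60L), 5.80430476680636,
                         c(39L, 47L, 60L, 14L, 3L), 6.67554407822367, c(57L, 8L, 53L, 6L, 2L),
                         5.67149520387856, c(40L, 8L, 21L, 17L, 13L), 5.88446015238962,
                         c(52L, 21L, 22L, 55L, 54L), 6.01685181395007, c(12L, 7L, 1L, 2L, 14L),
                         6.66299948053721, c(41L, 46L, 21L, 30L, 6L), 6.67239635545512,
                         c(46L, 31L, 11L, 44L, 32L), 6.44174324641076),
                    .Dim = c(2L, 10L), .Dimnames = list(c("reps", "score"), NULL))

# rbind the index vectors into a matrix, then name the columns as requested
out <- as.data.frame(do.call(rbind, myList["reps", ]))
names(out) <- paste("Index", 1:5)
out$Score <- unlist(myList["score", ])
out[1, ]
```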
