Related
I have less experience in R and I need help tidying my plot as it looks messy. Also, my project is to find the best minimal route from Seoul to every city and back to Seoul. It is almost like Traveling Salesman Problem (TSP) but there are some cities needed to be visited more than once as it is the only way to reach certain cities. I don't know how to do and what packages to use.
This is my code for igraph plot
library(igraph)
g1 <- graph( c("Seoul","Incheon","Seoul","Goyang","Seoul","Seongnam","Seoul",
"Bucheon","Seoul","Uijeongbu","Seoul","Gimpo",
"Seoul","Gwangmyeong", "Seoul", "Hanam","Seoul", "Guri",
"Seoul","Gwacheon","Busan","Changwon","Busan","Gimhae",
"Busan","Jeju","Busan","Yangsan","Busan","Geoje",
"Incheon","Goyang","Incheon","Bucheon","Incheon","Siheung",
"Incheon","Jeju","Incheon","Gimpo","Daegu","Gumi",
"Daegu","Gyeongsan","Daegu","Yeongcheon","Daejeon",
"Cheongju","Daejeon","Nonsan","Daejeon","Gongju",
"Daejeon","Gyeryong","Gwangju","Naju","Suwon","Yongin",
"Suwon","Seongnam","Suwon","Hwaseong","Suwon","Ansan",
"Suwon","Gunpo","Suwon","Osan","Suwon","Uiwang",
"Ulsan","Yangsan","Ulsan","Gyeongju","Ulsan","Miryang",
"Yongin","Seongnam","Yongin","Hwaseong","Yongin","Pyeongtaek",
"Yongin","Gwangju-si","Yongin","Icheon","Yongin","Anseong",
"Yongin","Uiwang","Goyang","Gimpo","Goyang","Paju","Goyang",
"Yangju","Changwon","Gimhae","Changwon","Jinju","Changwon",
"Miryang","Seongnam","Gwangju-si","Seongnam","Hanam","Seongnam",
"Uiwang","Seongnam","Gwacheon","Hwaseong","Ansan","Hwaseong",
"Pyeongtaek","Hwaseong","Gunpo","Hwaseong","Osan","Cheongju",
"Cheonan","Cheongju","Sejong","Bucheon","Siheung","Bucheon",
"Gwangmyeong","Ansan","Anyang","Ansan","Siheung","Ansan",
"Gunpo","Namyangju","Uijeongbu","Namyangju","Chuncheon",
"Namyangju","Hanam","Namyangju","Guri","Cheonan","Pyeongtaek",
"Cheonan","Sejong","Cheonan","Asan","Cheonan","Anseong",
"Jeonju","Gimje","Gimhae","Yangsan","Gimhae","Miryang",
"Pyeongtaek","Asan","Pyeongtaek","Osan","Pyeongtaek","Anseong",
"Pyeongtaek","Dangjin","Anyang","Siheung","Anyang","Gwangmyeong",
"Anyang","Gunpo","Anyang","Gwacheon","Siheung","Gwangmyeong",
"Siheung","Gunpo","Pohang","Yeongcheon","Pohang","Gyeongju",
"Jeju","Gimpo","Jeju","Mokpo","Jeju","Seogwipo","Uijeongbu",
"Yangju","Uijeongbu","Pocheon","Paju","Yangju","Gumi","Gimcheon",
"Gumi","Sangju","Gwangju-si","Hanam","Gwangju-si","Icheon",
"Gwangju-si","Yeoju","Sejong","Gongju","Wonju","Chungju",
"Wonju","Jecheon","Wonju","Yeoju","Jinju","Sacheon", "Yangsan",
"Miryang","Asan","Gongju","Iksan","Gunsan","Iksan","Nonsan",
"Iksan","Gimje","Chuncheon","Pocheon","Gyeongsan","Yeongcheon",
"Gunpo","Uiwang","Suncheon","Yeosu","Suncheon","Gwangyang",
"Gunsan","Gimje","Gyeongju","Yeongcheon","Geoje","Tongyeong",
"Osan","Anseong","Yangju","Pocheon","Yangju","Dongducheon",
"Icheon","Anseong","Icheon","Yeoju","Mokpo","Naju","Chungju",
"Jecheon","Chungju","Yeoju","Chungju","Mungyeong","Gangneung",
"Donghae","Gangneung","Sokcho","Seosan","Dangjin","Andong",
"Yeongju","Pocheon","Dongducheon","Gimcheon","Sangju","Tongyeong",
"Sacheon","Nonsan","Gongju","Nonsan","Boryeong","Nonsan",
"Gyeryong","Gongju","Boryeong","Gongju","Gyeryong","Jeongeup",
"Gimje","Yeongju","Mungyeong","Yeongju","Taebaek","Sangju",
"Mungyeong","Sokcho","Samcheok","Samcheok","Taebaek",
"Suncheon","Gwangju"), directed=F)
E(g1)$distance <- c(27, 16, 20, 19, 20, 24, 14, 20, 15, 15, 36, 18, 299, 18, 53,
25, 8, 12, 440, 18, 36, 13, 33, 33, 31, 26, 15, 20, 13, 20,
19, 18, 13, 16, 10, 33, 36, 51, 24, 31, 28, 21, 23, 27, 22,
11, 12, 24, 18, 52, 27, 11, 13, 19, 13, 14, 34, 20, 23, 38,
18, 12, 9, 12, 7, 10, 19, 53, 11, 8, 20, 27, 11, 26, 24, 18,
33, 25, 18, 15, 44, 14, 12, 4, 5, 12, 12, 37, 21, 458, 146,
27, 10, 23, 24, 21, 36, 14, 23, 36, 21, 39, 33, 26, 20, 32,
40, 20, 29, 18, 47, 24, 4, 27, 19, 22, 29, 17, 24, 18, 13,
32, 18, 37, 28, 43, 51, 33, 56, 20, 28, 12, 30, 38, 29, 47,
17, 47, 22, 26, 46, 51, 20, 10, 36,63)
plot(g1, edge.label=E(g1)$distance,
vertex.label.cex=0.6, vertex.size=4)
igraph plot
Using trick from https://or.stackexchange.com/questions/5555/tsp-with-repeated-city-visits
library(data.table)
library(purrr)
library(TSP)
library(igraph)
We need to create distance matrix based on shortest paths for each pair of vertices:
vertex_names <- names(V(g1))
N <- length(vertex_names)
dt <- map(
head(seq_along(vertex_names), -1),
~data.table(
from = vertex_names[[.x]],
to = vertex_names[(.x+1):N],
path = map(
shortest_paths(g1, vertex_names[[.x]], vertex_names[(.x+1):N])[["vpath"]],
names
)
),
) %>%
rbindlist()
then we calculate distances of shortest paths:
m <- as_adjacency_matrix(g1, type = "both", attr = "distance", sparse = FALSE)
dt[, weight := map_dbl(path, ~sum(m[embed(.x, 2)[, 2:1, drop=FALSE]]))]
now we assemble new matrix:
dt <- rbind(
dt, dt[, .(from = to, to = from, path = map(path, rev), weight = weight)]
)
new_m <- matrix(0, N, N)
rownames(new_m) <- colnames(new_m) <- vertex_names
new_m[as.matrix(dt[, .(from,to)])] <- dt[["weight"]]
on this new matrix we use some heuristic to solve TSP (for exact solution you should use method="concorde"):
res <- new_m %>%
TSP() %>%
solve_TSP(repetitions = 1000, two_opt = TRUE)
now we exchange each pair of consecutive cities with shortest path:
start_city <- "Seoul"
path_dt <- c(start_city, labels(cut_tour(res, start_city)), start_city) %>%
embed(2) %>%
.[,2:1,drop = FALSE] %>%
"colnames<-"(c("from", "to")) %>%
as.data.table()
path_dt <- dt[path_dt, on = .(from ,to)]
my_path <- c(unlist(map(path_dt[["path"]], head, -1)), start_city)
my_path is heuristic solution with distance tour_length(res)
I have created a simple minimum spanning tree and now have a data frame with columns 'from', 'to' and 'distance'.
Based on this, I found communities using the Louvain method, which I plotted. As far as I understand it, for clustering and plotting I need only the columns from and to, and the distance is not used.
How can I keep the communities I found, ideally each in a different color, but remove the box around the communities?
library(igraph)
from <- c(14, 25, 18, 19, 29, 23, 24, 36, 5, 22, 21, 29, 18, 26, 2, 45, 8, 7, 36, 42, 3, 23, 13, 13, 20, 15, 13, 7, 28, 9, 6, 37, 8, 4, 15, 27, 10, 2, 39, 1, 43, 21, 14, 4, 14, 8, 9, 40, 31, 1)
to <- c(16, 26, 27, 20, 32, 34, 35, 39, 6, 32, 35, 30, 22, 28, 45, 46, 48, 12, 38, 43, 42, 24, 27, 25, 30, 20, 50, 29, 34, 49, 40, 39, 11, 41, 46, 47, 50, 16, 46, 40, 44, 31, 17, 40, 44, 23, 33, 42, 33, 1)
distance <- c(0.3177487, 0.3908324, 0.4804059, 0.4914682, 0.5610357, 0.6061082, 0.6357532, 0.6638961, 0.7269725, 0.8136463, 0.8605391, 0.8665838, 0.8755252, 0.8908454, 0.9411793, 0.9850834, 1.0641603, 1.0721154, 1.0790506, 1.1410964, 1.1925349, 1.2115428, 1.2165045, 1.2359032, 1.2580204, 1.2725243, 1.2843610, 1.2906908, 1.3070725, 1.3397053, 1.3598817, 1.3690732, 1.3744088, 1.3972220, 1.4472312, 1.4574936, 1.4654772, 1.4689660, 1.5999424, 1.6014316, 1.6305410, 1.6450413, 1.6929959, 1.7597620, 1.8113320, 2.0380866, 3.0789517, 4.0105981, 5.1212614, 0.0000000)
mst <- cbind.data.frame(from, to, distance)
g <- graph.data.frame(mst[, 1:2], directed = FALSE)
lou <- cluster_louvain(g)
set.seed(1)
plot(lou, g, vertex.label = NA, vertex.size=5)
The blobs around the groups can be turned off like this:
plot(lou, g, vertex.label = NA, vertex.size=5, mark.groups = NULL)
Do you want this?
plot(lou, g, vertex.label = NA, vertex.size = 5, mark.border = NA)
While there are functions for saving data as a separate CSV file (write.table) or as an R-data file (save, saveRDS), I have not found a way to store or print a data frame as R code that recreates this data frame.
Background of my question is that I want to include data with a script (instead of storing it in a separate file), and am thus looking for a way to generate the specific code provided the data frame already exists. I could hack on with sed or other external tools, but I wonder whether someone knows of a built-in method in R.
Try with "dput" like so:
dput(cars)
# Returns:
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame",
row.names = c(NA, -50L))
I am trying to subset my dataset as follows
df[df$Age > 19,]
I am seeing an error , Error: x and labels must be same type
I am not sure I understand this, any suggestions are much appreciated.
=================
dput(df$Age)
c(20, 11, 10, 15, 6, 23, 45, 30, 18, 11, 15, 20, 7, 18, 19, 30,
40, 16, 14, 33, 12, 22, 12, 5, NA, 18, 30, 26, 25, 27, 12, 27,
13, 15, 32, 19, NA, 18, 13, 30, 10, 16, 47, 24, 64, 21, 9, 30,
12, 33, 16, 20, 14, 10, 19, 18, 20, 18, 10, 15, 55, 18, 50, 14,
35, 18, 21, 17, 14, 9, 25, 17, 10, 16, 12, 30, 38, 10, 27, 20,
27, 16, 30, 11, 5, 20, 30, 12, 24, 11, 7, 26, 48, 25, 20, 18,
27, 18, 28, 15, 17, 46, 30, 20, 20, 14, 35, 31, 10, 26, 13, NA,
15, 3, 30, 33, 15, 43, 19, 40, 8, 16, 8, 3, 37, 40, 58, 18, 12,
19, 14, 24, 34, 30, 23, 28, 47, 29, 21, 35, 23, 47, 11, 30, 16,
25, 30, 30, 8, 18, 20, 12, 8, 18, 30, 6, 54, 60, 18, 27, 42,
6, 42, 13, 21, 15, 17, 10, 33, 15, 16, 36, 16, 52, 4, 30, 28,
30, 14, 13, 14, NA, 15, 20, 20, 24, 27, 23, 10, 13, 22, 30, 45,
10, 23, 14, 27, 19, 12, 25, 10, 10, 14, 16, 16, 19, 18, 12, 65,
18, 35, 20, 31, NA, 21, 40, 8, 13, 25, 8, 13, 15, 19, 25, 10,
9, 24, 8, 25, 30, 38, 35, 20, 12, 15, 25, 27, 39, 8, 10, NA,
12, 50, 16, 14, 22, 12, 20, 44, 13, 8, 43, 48, 13, 21, 20, 42,
11, 20, 35, 53, 22, 17, 5, NA, 14, 10, 21, 33, 21, 69, 24, 15,
12, 8, 28, 11, 32, 25, 26, 21, 36, 12, 24, 20, 23, 14, 30, 50,
26, NA, 30, 22, 44, 22, 14, 30, 28, 10, 16, 32, 35, 40, 16, 40,
33, 23, 25, 10, 17, 10, 14, 22, 14, 25, 20, 39, 24, 52, 16, 34,
26, 23, 11, 12, 70, 59, 12, 38, 22, 13, 40, 57, 30, 7, 21, 20,
30, 12, 13, 5, 19, 35, 56, 17, 40, 48, 19, 8, 30, 21, 5, 40,
16, 22, 20, 17, 16, 30, 18, 13, 17, NA, 40, 9, 24, 26, 20, 22,
17, 44, 45, 18, 26, 50, 10, 21, 15, NA, 20, 12, 16, 54, 15, 16,
33, 22, 26, 60, 35, 11, 30, 16, 48, 16, 16, 16, 10, 14, 15, 23,
17, 18, NA, 49, 12, 7, 18, 24, 17, 14, 30, 13, 6, 51, 36, 16,
10, 43, 34, 15, 12, 15, 15, 17, 40, 58, 15, 33, 16, 48, 25, 15,
16, 5, NA, 40, 34, 10, 30, 30, 30, 15, 15, 12, 5, 10, 20, 18,
20, 16, 20, 26, 12, 14, 14, 20, 12, 30, 30, 29, 22, 19, 26, 11,
23, 40, 30, 16, 50, 20, 25, 29, 40, 44, 20, 40, 8, 16, 15, 38,
11, 27, 63, 16, NA, 47, 65, 21, 29, 30, 16, 21, 25, 16, 23, 5,
17, 22, 12, 14, 27, NA, 16, 9, 33, 11, 15, 34, 41, 30, 33, 15,
25, 40, 25, 12, 12, 17, 14)
My friend suggested me to try to solve this problem before interview, but I have no idea on how to approach it.
I need to write a code to shuffle a deck of 52 cards without using a built-in standard random function.
Update
Thanks to Yifei Wu, his answer was very helpful.
Here is a link for my github project where I executed the given algorithm
https://github.com/Dantsj16/Shuffle-Without-Random.git
Your question does not say it must be a random shuffle of 52 cards. There is such a thing as a perfect shuffle, where a riffle shuffle is done with the top card remaining on the top and every other card comes from the other half of the deck. Many magicians and card sharks can do this shuffle as desired. It is well known that eight perfect shuffles in a row of a standard 52-card deck returns the cards to their original order, if the top card remains on top for each shuffle.
Here are 8 perfect shuffles in python Note that this shuffle is done differently than an actual manual shuffle would be done, to simplify the code.
In [1]: d0=[x for x in range(1,53)] # the card deck
In [2]: print(d0)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [3]: d1=d0[::2]+d0[1::2] # a perfect shuffle
In [4]: print(d1)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52]
In [5]: d2=d1[::2]+d1[1::2]
In [6]: d3=d2[::2]+d2[1::2]
In [7]: d4=d3[::2]+d3[1::2]
In [8]: d5=d4[::2]+d4[1::2]
In [9]: d6=d5[::2]+d5[1::2]
In [10]: d7=d6[::2]+d6[1::2]
In [11]: d8=d7[::2]+d7[1::2]
In [12]: print(d8)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [13]: print(d0 == d8)
True
If you want the perfect shuffle as done by hand, use
d1=[None]*52
d1[::2]=d0[:26]
d1[1::2]=d0[26:]
This gives, for d1,
[1, 27, 2, 28, 3, 29, 4, 30, 5, 31, 6, 32, 7, 33, 8, 34, 9, 35, 10, 36, 11, 37, 12, 38, 13, 39, 14, 40, 15, 41, 16, 42, 17, 43, 18, 44, 19, 45, 20, 46, 21, 47, 22, 48, 23, 49, 24, 50, 25, 51, 26, 52]
Let me know if you really need a random shuffle. I can adapt my Borland Delphi code into python if you need it.