Inline data.frame inclusion in R script - r

While there are functions for saving data as a separate CSV file (write.table) or as an R-data file (save, saveRDS), I have not found a way to store or print a data frame as R code that recreates this data frame.
Background of my question is that I want to include data with a script (instead of storing it in a separate file), and am thus looking for a way to generate the specific code provided the data frame already exists. I could hack on with sed or other external tools, but I wonder whether someone knows of a built-in method in R.

Try with "dput" like so:
dput(cars)
# Returns:
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame",
row.names = c(NA, -50L))

Related

Remove community boxes in igraph

I have created a simple minimum spanning tree and now have a data frame with columns 'from', 'to' and 'distance'.
Based on this, I found communities using the Louvain method, which I plotted. As far as I understand it, for clustering and plotting I need only the columns from and to, and the distance is not used.
How can I keep the communities I found, ideally each in a different color, but remove the box around the communities?
library(igraph)
from <- c(14, 25, 18, 19, 29, 23, 24, 36, 5, 22, 21, 29, 18, 26, 2, 45, 8, 7, 36, 42, 3, 23, 13, 13, 20, 15, 13, 7, 28, 9, 6, 37, 8, 4, 15, 27, 10, 2, 39, 1, 43, 21, 14, 4, 14, 8, 9, 40, 31, 1)
to <- c(16, 26, 27, 20, 32, 34, 35, 39, 6, 32, 35, 30, 22, 28, 45, 46, 48, 12, 38, 43, 42, 24, 27, 25, 30, 20, 50, 29, 34, 49, 40, 39, 11, 41, 46, 47, 50, 16, 46, 40, 44, 31, 17, 40, 44, 23, 33, 42, 33, 1)
distance <- c(0.3177487, 0.3908324, 0.4804059, 0.4914682, 0.5610357, 0.6061082, 0.6357532, 0.6638961, 0.7269725, 0.8136463, 0.8605391, 0.8665838, 0.8755252, 0.8908454, 0.9411793, 0.9850834, 1.0641603, 1.0721154, 1.0790506, 1.1410964, 1.1925349, 1.2115428, 1.2165045, 1.2359032, 1.2580204, 1.2725243, 1.2843610, 1.2906908, 1.3070725, 1.3397053, 1.3598817, 1.3690732, 1.3744088, 1.3972220, 1.4472312, 1.4574936, 1.4654772, 1.4689660, 1.5999424, 1.6014316, 1.6305410, 1.6450413, 1.6929959, 1.7597620, 1.8113320, 2.0380866, 3.0789517, 4.0105981, 5.1212614, 0.0000000)
mst <- cbind.data.frame(from, to, distance)
g <- graph.data.frame(mst[, 1:2], directed = FALSE)
lou <- cluster_louvain(g)
set.seed(1)
plot(lou, g, vertex.label = NA, vertex.size=5)
The blobs around the groups can be turned off like this:
plot(lou, g, vertex.label = NA, vertex.size=5, mark.groups = NULL)
Do you want this?
plot(lou, g, vertex.label = NA, vertex.size = 5, mark.border = NA)

R rewriting a for loop

I've got a loop in my code that I would like to rewrite so running the code takes a little less time to compete. I know you allways have to avoid loops in the code but I can't think of an another way to accomplice my goal.
So I've got a dataset "df_1531" containing a lot of data that I need to cut into pieces by using subset() (if anyone knows a better way, let me know ;) ). I've got a vector with 21 variable names on which I like assign a subset of df_1531. Furthermore the script contains 22 variables with constrains (shift_XY_time).
So, this is my code now...
# list containing different slots
shift_time_list<- c(startdate, shift_1m_time, shift_1a_time, shift_1n_time,
shift_2m_time, shift_2a_time, shift_2n_time,
shift_3m_time, shift_3a_time, shift_3n_time,
shift_4m_time, shift_4a_time, shift_4n_time,
shift_5m_time, shift_5a_time, shift_5n_time,
shift_6m_time, shift_6a_time, shift_6n_time,
shift_7m_time, shift_7a_time, shift_7n_time)
# List with subset names
shift_sub_list <- c("shift_1m_sub", "shift_1a_sub", "shift_1n_sub",
"shift_2m_sub", "shift_2a_sub", "shift_2n_sub",
"shift_3m_sub", "shift_3a_sub", "shift_3n_sub",
"shift_4m_sub", "shift_4a_sub", "shift_4n_sub",
"shift_5m_sub", "shift_5a_sub", "shift_5n_sub",
"shift_6m_sub", "shift_6a_sub", "shift_6n_sub",
"shift_7m_sub", "shift_7a_sub", "shift_7n_sub")
# The actual loop that I'd like to rewrite
for (i in 1:21) {
assign(shift_sub_list[i], subset(df_1531, df_1531$'PLS FFM' >= shift_time_list[i] & df_1531$'PLS FFM' < shift_time_list[i+1]))
}
Running the loop takes approximately 6 or 7 seconds. So, if anyone knows a better/cleaner or quicker way to write my code, I desperately like to hear your suggestion/opinion.
**Reproducible example **
mydata <- cars
dput(cars)
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame", row.names = c(NA,
-50L))
dist_interval_list <- c( 0, 5, 10, 15,
20, 25, 30, 35,
40, 45, 50, 55,
60, 65, 70, 75,
80, 85, 90, 95,
100, 105, 110, 115, 120)
var_name_list <- c("var_name_1a", "var_name_1b", "var_name_1c", "var_name_1d",
"var_name_2a", "var_name_2b", "var_name_2c", "var_name_2d",
"var_name_3a", "var_name_3b", "var_name_3c", "var_name_3d",
"var_name_4a", "var_name_4b", "var_name_4c", "var_name_4d",
"var_name_5a", "var_name_5b", "var_name_5c", "var_name_5d",
"var_name_6a", "var_name_6b", "var_name_6c", "var_name_6d")
for (i in 1:24){
assign(var_name_list[i], subset(mydata,
mydata$dist >= dist_interval_list[i] &
mydata$dist < dist_interval_list[i+1]))
}
Starting with the 'reproducible' part and the information that the final aim is to summarize another column, it is possible to exploit the fact that the intervals are non-overlapping and simply use the cut function.
library(tidyverse)
mydata %>%
mutate(interval = cut(dist, breaks = dist_interval_list)) %>%
group_by(interval) %>%
summarise(sum = sum(speed))
This should be much faster and will also help you not to get lost in a messy environment full of variables (which are actually part of your data). You want to keep all your data in a single data frame as long as possible;) You probably want to follow with something like purrrlyr::invoke_rows at the final modeling step, if your function does not work with data frames.

r subset `x` and `labels` must be same type

I am trying to subset my dataset as follows
df[df$Age > 19,]
I am seeing an error , Error: x and labels must be same type
I am not sure I understand this, any suggestions are much appreciated.
=================
dput(df$Age)
c(20, 11, 10, 15, 6, 23, 45, 30, 18, 11, 15, 20, 7, 18, 19, 30,
40, 16, 14, 33, 12, 22, 12, 5, NA, 18, 30, 26, 25, 27, 12, 27,
13, 15, 32, 19, NA, 18, 13, 30, 10, 16, 47, 24, 64, 21, 9, 30,
12, 33, 16, 20, 14, 10, 19, 18, 20, 18, 10, 15, 55, 18, 50, 14,
35, 18, 21, 17, 14, 9, 25, 17, 10, 16, 12, 30, 38, 10, 27, 20,
27, 16, 30, 11, 5, 20, 30, 12, 24, 11, 7, 26, 48, 25, 20, 18,
27, 18, 28, 15, 17, 46, 30, 20, 20, 14, 35, 31, 10, 26, 13, NA,
15, 3, 30, 33, 15, 43, 19, 40, 8, 16, 8, 3, 37, 40, 58, 18, 12,
19, 14, 24, 34, 30, 23, 28, 47, 29, 21, 35, 23, 47, 11, 30, 16,
25, 30, 30, 8, 18, 20, 12, 8, 18, 30, 6, 54, 60, 18, 27, 42,
6, 42, 13, 21, 15, 17, 10, 33, 15, 16, 36, 16, 52, 4, 30, 28,
30, 14, 13, 14, NA, 15, 20, 20, 24, 27, 23, 10, 13, 22, 30, 45,
10, 23, 14, 27, 19, 12, 25, 10, 10, 14, 16, 16, 19, 18, 12, 65,
18, 35, 20, 31, NA, 21, 40, 8, 13, 25, 8, 13, 15, 19, 25, 10,
9, 24, 8, 25, 30, 38, 35, 20, 12, 15, 25, 27, 39, 8, 10, NA,
12, 50, 16, 14, 22, 12, 20, 44, 13, 8, 43, 48, 13, 21, 20, 42,
11, 20, 35, 53, 22, 17, 5, NA, 14, 10, 21, 33, 21, 69, 24, 15,
12, 8, 28, 11, 32, 25, 26, 21, 36, 12, 24, 20, 23, 14, 30, 50,
26, NA, 30, 22, 44, 22, 14, 30, 28, 10, 16, 32, 35, 40, 16, 40,
33, 23, 25, 10, 17, 10, 14, 22, 14, 25, 20, 39, 24, 52, 16, 34,
26, 23, 11, 12, 70, 59, 12, 38, 22, 13, 40, 57, 30, 7, 21, 20,
30, 12, 13, 5, 19, 35, 56, 17, 40, 48, 19, 8, 30, 21, 5, 40,
16, 22, 20, 17, 16, 30, 18, 13, 17, NA, 40, 9, 24, 26, 20, 22,
17, 44, 45, 18, 26, 50, 10, 21, 15, NA, 20, 12, 16, 54, 15, 16,
33, 22, 26, 60, 35, 11, 30, 16, 48, 16, 16, 16, 10, 14, 15, 23,
17, 18, NA, 49, 12, 7, 18, 24, 17, 14, 30, 13, 6, 51, 36, 16,
10, 43, 34, 15, 12, 15, 15, 17, 40, 58, 15, 33, 16, 48, 25, 15,
16, 5, NA, 40, 34, 10, 30, 30, 30, 15, 15, 12, 5, 10, 20, 18,
20, 16, 20, 26, 12, 14, 14, 20, 12, 30, 30, 29, 22, 19, 26, 11,
23, 40, 30, 16, 50, 20, 25, 29, 40, 44, 20, 40, 8, 16, 15, 38,
11, 27, 63, 16, NA, 47, 65, 21, 29, 30, 16, 21, 25, 16, 23, 5,
17, 22, 12, 14, 27, NA, 16, 9, 33, 11, 15, 34, 41, 30, 33, 15,
25, 40, 25, 12, 12, 17, 14)

Shuffle deck of cards without built-in random functon

My friend suggested me to try to solve this problem before interview, but I have no idea on how to approach it.
I need to write a code to shuffle a deck of 52 cards without using a built-in standard random function.
Update
Thanks to Yifei Wu, his answer was very helpful.
Here is a link for my github project where I executed the given algorithm
https://github.com/Dantsj16/Shuffle-Without-Random.git
Your question does not say it must be a random shuffle of 52 cards. There is such a thing as a perfect shuffle, where a riffle shuffle is done with the top card remaining on the top and every other card comes from the other half of the deck. Many magicians and card sharks can do this shuffle as desired. It is well known that eight perfect shuffles in a row of a standard 52-card deck returns the cards to their original order, if the top card remains on top for each shuffle.
Here are 8 perfect shuffles in python Note that this shuffle is done differently than an actual manual shuffle would be done, to simplify the code.
In [1]: d0=[x for x in range(1,53)] # the card deck
In [2]: print(d0)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [3]: d1=d0[::2]+d0[1::2] # a perfect shuffle
In [4]: print(d1)
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52]
In [5]: d2=d1[::2]+d1[1::2]
In [6]: d3=d2[::2]+d2[1::2]
In [7]: d4=d3[::2]+d3[1::2]
In [8]: d5=d4[::2]+d4[1::2]
In [9]: d6=d5[::2]+d5[1::2]
In [10]: d7=d6[::2]+d6[1::2]
In [11]: d8=d7[::2]+d7[1::2]
In [12]: print(d8)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52]
In [13]: print(d0 == d8)
True
If you want the perfect shuffle as done by hand, use
d1=[None]*52
d1[::2]=d0[:26]
d1[1::2]=d0[26:]
This gives, for d1,
[1, 27, 2, 28, 3, 29, 4, 30, 5, 31, 6, 32, 7, 33, 8, 34, 9, 35, 10, 36, 11, 37, 12, 38, 13, 39, 14, 40, 15, 41, 16, 42, 17, 43, 18, 44, 19, 45, 20, 46, 21, 47, 22, 48, 23, 49, 24, 50, 25, 51, 26, 52]
Let me know if you really need a random shuffle. I can adapt my Borland Delphi code into python if you need it.

How do you include data frame output inside warnings and errors?

How can I include array or data frame output in a message, warning or error?
By default, the output is collapsed by deparseing each column, which isn't useful. Here's an example, using the cars dataset.
message(cars)
## c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24, 24, 25)c(2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)
Print the output, recapture it using capture.output(), and collapse into a single string separated by newlines.
print_and_capture <- function(x)
{
paste(capture.output(print(x)), collapse = "\n")
}
message(print_and_capture(cars))
## speed dist
## 1 4 2
## 2 4 10
## # etc.
stop("An error was found in the cars dataset:\n", print_and_capture(cars))
## Error: An error was found in the cars dataset:
## speed dist
## 1 4 2
## 2 4 10
## # etc.
print_and_capture() is now available in assertive.base.

Resources