I am going to play the Brazilian lottery with my friends. I asked each of them to choose seven numbers, and I created a variable for each of them.
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
Then I created a vector with all of the numbers, and a second one with only the unique numbers (to avoid repeats):
list = c(carol, davi, pestana, valerio, victor, diuli, cynara)
list2 = unique(list)
Finally, I drew a sample from list2 with sample():
sample(list2, 7)
After that, I started wondering: is it possible to skip unique() and still not draw repeated numbers? With unique(), a number that several people chose has the same probability of being drawn as any other, when it should have a higher one (for instance, 7 was chosen three times).
How about this:
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
list = c(carol, davi, pestana, valerio, victor)
l <- c(unlist(list))
nums <- table(l)
probs <- nums/sum(nums)
sample(names(probs), 7, prob = probs, replace=FALSE)
#> [1] "4" "33" "44" "11" "29" "52" "8"
Created on 2022-12-14 by the reprex package (v2.0.1)
Using the prob argument, you can make some values more likely to show up than others.
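For readers who want to see the idea outside R, here is a minimal Python sketch of the same technique: draw without replacement, one number at a time, with weights proportional to how often each number was chosen. The function name weighted_sample_without_replacement is made up for this illustration, and the sequential-draw scheme is my assumption about how a weighted draw without replacement is usually done, not a transcription of R's internals.

```python
import random
from collections import Counter


def weighted_sample_without_replacement(values, k, rng=random):
    """Draw k distinct values; values that occur more often are more likely."""
    counts = Counter(values)                 # how many friends chose each number
    pool = list(counts)                      # distinct candidate numbers
    weights = [counts[v] for v in pool]      # weight = number of occurrences
    if k > len(pool):
        raise ValueError("cannot draw more distinct values than are available")
    picked = []
    for _ in range(k):
        # Pick one index with probability proportional to its weight,
        # then remove it so it cannot be drawn again.
        i = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(i))
        weights.pop(i)
    return picked


pestana = [4, 15, 29, 36, 54, 25, 7]
carol = [7, 22, 30, 35, 44, 51, 57]
davi = [8, 13, 21, 29, 37, 42, 55]
valerio = [30, 20, 33, 14, 7, 41, 54]
victor = [9, 11, 26, 33, 38, 52, 57]
all_picks = pestana + carol + davi + valerio + victor

draw = weighted_sample_without_replacement(all_picks, 7, random.Random(2022))
print(draw)
```

Each call returns seven distinct numbers; 7, which three friends chose, is three times as likely as a singly chosen number to come out on the first draw.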
I created a graph G whose node view looks like this: <0, 1, 2, ..., 100>
I then randomly removed 20 nodes, and the node view of the new graph is missing the nodes I removed. For example:
node view <0,1,3,5,6,7,9 ...100>
However, I want the new graph to have a node view like the following:
<0,1,2....80>
Is there any solution? I tried relabeling and copying the graph, but neither worked.
PS: my nodes have an attribute label equal to either 0 or 1, and I want to preserve it.
Here is one approach you can take. After removing your nodes from the graph you can relabel the remaining nodes using nx.relabel_nodes to get the node view you want. See example below:
import networkx as nx
import numpy as np
#Creating random graph
N_nodes=50
G=nx.erdos_renyi_graph(N_nodes,p=0.25)
#Removing random nodes
N_del_nodes=10
del_node_list=np.random.choice(N_nodes,size=N_del_nodes,replace=False)
G.remove_nodes_from(del_node_list)
print('Node view without relabelling:' +str(G.nodes))
#Relabelling graph
label_mapping={old:new for new,old in enumerate(G.nodes)}
G_rel=nx.relabel_nodes(G, label_mapping)
print('Node view with relabelling:' +str(G_rel.nodes))
And the output gives:
Node view without relabelling:[0, 1, 2, 5, 6, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 30, 31, 32, 33, 34, 36, 37, 38, 40, 41, 44, 45, 46, 47, 48, 49]
Node view with relabelling:[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
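Since the question also asks about keeping the label attribute, here is a small library-free sketch of what the relabelling amounts to: build an old-to-new mapping over the surviving nodes and carry each node's attribute dict along with it. The plain-dict graph representation and the helper relabel_consecutive are hypothetical stand-ins, not networkx API. My understanding is that nx.relabel_nodes with its default copy=True carries node attributes over in the same way, but that is worth checking on your own graph.

```python
import random


def relabel_consecutive(node_attrs, edges):
    """Relabel nodes as 0..n-1 (in sorted order), keeping attributes and edges."""
    mapping = {old: new for new, old in enumerate(sorted(node_attrs))}
    new_attrs = {mapping[old]: dict(attrs) for old, attrs in node_attrs.items()}
    new_edges = [(mapping[u], mapping[v]) for u, v in edges]
    return new_attrs, new_edges, mapping


rng = random.Random(0)
# Hypothetical toy graph: 101 nodes on a path, each with a 0/1 "label" attribute.
node_attrs = {n: {"label": rng.randint(0, 1)} for n in range(101)}
edges = [(n, n + 1) for n in range(100)]

# Remove 20 random nodes together with the edges that touch them.
removed = set(rng.sample(range(101), 20))
kept_attrs = {n: a for n, a in node_attrs.items() if n not in removed}
kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]

new_attrs, new_edges, mapping = relabel_consecutive(kept_attrs, kept_edges)
print(sorted(new_attrs)[:5], "...", max(new_attrs))
```

The mapping is returned as well, so the original identity of any node can still be looked up after the relabelling.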
I need to create a knight's tour plot from a matrix like this one:
Mat = matrix(c(1, 38, 55, 34, 3, 36, 19, 22,
54, 47, 2, 37, 20, 23, 4, 17,
39, 56, 33, 46, 35, 18, 21, 10,
48, 53, 40, 57, 24, 11, 16, 5,
59, 32, 45, 52, 41, 26, 9, 12,
44, 49, 58, 25, 62, 15, 6, 27,
31, 60, 51, 42, 29, 8, 13, 64,
50, 43, 30, 61, 14, 63, 28, 7), nrow=8, ncol=8, byrow=T)
The numbers indicate the order in which the knight visits the squares to form a path.
I have a lot of results like this, for chessboards up to 75 in size, but I have no way of presenting them in a readable form. I found out that R, given the matrix, is capable of creating a plot like this:
link (this one is 50x50 in size)
So, for the matrix above, lines would be drawn between consecutive numbers: 1 - 2 - 3 - 4 - 5 - ... - 64, producing a path like the one in the link, but for an 8x8 chessboard instead of 50x50.
However, I have very limited time to learn R well enough to accomplish this, and I am desperate for any kind of direction. How hard is it going to be to write R code that transforms any such matrix into a plot like that? Or is it something trivial? Any code samples would be a blessing.
You can use geom_path as described here: ggplot2 line plot order
In order to do so you need to convert the matrix into a tibble.
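library(tidyverse)  # provides tibble, the dplyr verbs, and ggplot2
# One row of coords per board square.
coords <- tibble(col = rep(1:8, 8),
                 row = rep(1:8, each = 8))
coords %>%
  # Mat is stored column-major, so this linear index picks Mat[row, col].
  mutate(order = Mat[8 * (col - 1) + row]) %>%
  arrange(order) %>%
  ggplot(aes(x = col, y = row)) +
  geom_path() +
  geom_text(aes(y = row + 0.25, label = order)) +
  coord_equal() # Ensures a square board.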
You can subtract .5 from the col and row positions to give a more natural chess board feel.
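The key step in the answer is turning the matrix of visit numbers into a list of squares sorted by visit order, which is what arrange(order) does before geom_path connects them. Here is a minimal Python sketch of that transformation on the 8x8 matrix from the question, with a check that every step really is a knight move. The helper names tour_path and is_knight_move are made up for this illustration.

```python
def tour_path(board):
    """Return the (row, col) squares of the board sorted by visit number."""
    cells = [(order, r, c)
             for r, row in enumerate(board)
             for c, order in enumerate(row)]
    return [(r, c) for _, r, c in sorted(cells)]


def is_knight_move(a, b):
    """True if squares a and b are exactly one knight move apart."""
    dr, dc = abs(a[0] - b[0]), abs(a[1] - b[1])
    return {dr, dc} == {1, 2}


board = [
    [1, 38, 55, 34, 3, 36, 19, 22],
    [54, 47, 2, 37, 20, 23, 4, 17],
    [39, 56, 33, 46, 35, 18, 21, 10],
    [48, 53, 40, 57, 24, 11, 16, 5],
    [59, 32, 45, 52, 41, 26, 9, 12],
    [44, 49, 58, 25, 62, 15, 6, 27],
    [31, 60, 51, 42, 29, 8, 13, 64],
    [50, 43, 30, 61, 14, 63, 28, 7],
]

path = tour_path(board)
valid = all(is_knight_move(a, b) for a, b in zip(path, path[1:]))
print(path[:4], "valid tour:", valid)
```

The resulting path list can be handed to any plotting library as a sequence of x/y points, and the same function works unchanged for larger boards.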
I've got a loop in my code that I would like to rewrite so that running the code takes a little less time to complete. I know you should generally avoid loops in R, but I can't think of another way to accomplish my goal.
So I've got a dataset "df_1531" containing a lot of data that I need to cut into pieces using subset() (if anyone knows a better way, let me know ;) ). I've got a vector with 21 variable names, and I'd like to assign a subset of df_1531 to each of them. Furthermore, the script contains 22 variables with constraints (shift_XY_time).
So, this is my code now...
# list containing different slots
shift_time_list<- c(startdate, shift_1m_time, shift_1a_time, shift_1n_time,
shift_2m_time, shift_2a_time, shift_2n_time,
shift_3m_time, shift_3a_time, shift_3n_time,
shift_4m_time, shift_4a_time, shift_4n_time,
shift_5m_time, shift_5a_time, shift_5n_time,
shift_6m_time, shift_6a_time, shift_6n_time,
shift_7m_time, shift_7a_time, shift_7n_time)
# List with subset names
shift_sub_list <- c("shift_1m_sub", "shift_1a_sub", "shift_1n_sub",
"shift_2m_sub", "shift_2a_sub", "shift_2n_sub",
"shift_3m_sub", "shift_3a_sub", "shift_3n_sub",
"shift_4m_sub", "shift_4a_sub", "shift_4n_sub",
"shift_5m_sub", "shift_5a_sub", "shift_5n_sub",
"shift_6m_sub", "shift_6a_sub", "shift_6n_sub",
"shift_7m_sub", "shift_7a_sub", "shift_7n_sub")
# The actual loop that I'd like to rewrite
for (i in 1:21) {
assign(shift_sub_list[i], subset(df_1531, df_1531$'PLS FFM' >= shift_time_list[i] & df_1531$'PLS FFM' < shift_time_list[i+1]))
}
Running the loop takes approximately 6 or 7 seconds. So, if anyone knows a better, cleaner, or quicker way to write my code, I would love to hear your suggestions.
**Reproducible example**
mydata <- cars
dput(cars)
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame", row.names = c(NA,
-50L))
dist_interval_list <- c( 0, 5, 10, 15,
20, 25, 30, 35,
40, 45, 50, 55,
60, 65, 70, 75,
80, 85, 90, 95,
100, 105, 110, 115, 120)
var_name_list <- c("var_name_1a", "var_name_1b", "var_name_1c", "var_name_1d",
"var_name_2a", "var_name_2b", "var_name_2c", "var_name_2d",
"var_name_3a", "var_name_3b", "var_name_3c", "var_name_3d",
"var_name_4a", "var_name_4b", "var_name_4c", "var_name_4d",
"var_name_5a", "var_name_5b", "var_name_5c", "var_name_5d",
"var_name_6a", "var_name_6b", "var_name_6c", "var_name_6d")
for (i in 1:24){
assign(var_name_list[i], subset(mydata,
mydata$dist >= dist_interval_list[i] &
mydata$dist < dist_interval_list[i+1]))
}
Starting with the 'reproducible' part and the information that the final aim is to summarize another column, it is possible to exploit the fact that the intervals are non-overlapping and simply use the cut function.
library(tidyverse)
mydata %>%
mutate(interval = cut(dist, breaks = dist_interval_list, right = FALSE)) %>%  # [a, b) intervals, as in the loop
group_by(interval) %>%
summarise(sum = sum(speed))
This should be much faster, and it will also keep you from getting lost in a messy environment full of variables (which are actually part of your data). You want to keep all your data in a single data frame for as long as possible ;) If your function does not work with data frames, you probably want to follow up with something like purrrlyr::invoke_rows at the final modeling step.
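To make the binning idea concrete, here is a small Python sketch of the same technique: assign each value to a half-open interval [a, b) with a binary search over the break points, then sum another column per interval. The function sum_by_interval is a hypothetical helper, and the rows are just a handful of (dist, speed) pairs taken from the cars data above.

```python
from bisect import bisect_right
from collections import defaultdict


def sum_by_interval(rows, breaks):
    """Sum speed per [low, high) dist interval; breaks must be sorted ascending."""
    totals = defaultdict(float)
    for dist, speed in rows:
        # Skip values outside [first break, last break), as the original loop does.
        if not (breaks[0] <= dist < breaks[-1]):
            continue
        i = bisect_right(breaks, dist) - 1   # index of the interval's lower bound
        totals[(breaks[i], breaks[i + 1])] += speed
    return dict(totals)


breaks = list(range(0, 125, 5))              # 0, 5, ..., 120
rows = [(2, 4), (10, 4), (4, 7), (22, 7), (16, 8), (10, 9), (120, 24)]

totals = sum_by_interval(rows, breaks)
for interval, total in sorted(totals.items()):
    print(interval, total)
```

One pass over the data replaces the 24 separate subsets, and the row with dist equal to 120 is dropped because the last interval is open on the right, just as in the loop's `<` comparison.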
I'm wondering how the residuals in aov() are calculated. I have been looking for hours but can't figure it out.
I use an ANOVA for repeated measurements.
Data <- data.frame(subject = factor(rep(1:10, 3)),
age = factor(c(rep(4, 10),
rep(10, 10),
rep(35, 10))),
weight = c(20, 9, 16, 14, 30, 26, 26, 27, 13, 15,
27, 18, 30, 26, 43, 48, 38, 38, 22, 47,
50, 44, 52, 46, 64, 70, 73, 57, 54, 63))
ANOVA_MW <- aov(weight ~ age +
Error(subject / age),
data = Data)
summary(ANOVA_MW)
I know that the following command returns some residuals:
round(ANOVA_MW$`subject:age`$residuals, 2)
However, I get only 20 values rather than 30, and they start at 11. This probably has something to do with the residuals of subject, but I don't know.
The result of proj(ANOVA_MW) gives me the residuals that I calculated manually (value - personal mean - group mean + overall mean).
My question is: what are the other residuals above, and why does everybody (or so it feels) use them for normality testing?
I would love some helpful input. I have already dug into the function but could not find an explanation.
Thanks.
Residual sum of squares = total sum of squares - factor sum of squares.
In your case, the factor is age.
The residuals should be normally distributed; this is one of the assumptions of ANOVA.
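As a sanity check on the manual formula from the question (value - personal mean - group mean + overall mean), here is a minimal Python sketch that computes those 30 residuals from the posted data. It also checks a sum-of-squares identity that, as far as I can tell, holds for a balanced layout like this one with one observation per subject and age: the total splits into age, subject, and residual parts. This illustrates the arithmetic only and is not a description of what aov() does internally.

```python
from statistics import mean

# Weights per age group; position s in each list belongs to subject s + 1.
weights = {
    4: [20, 9, 16, 14, 30, 26, 26, 27, 13, 15],
    10: [27, 18, 30, 26, 43, 48, 38, 38, 22, 47],
    35: [50, 44, 52, 46, 64, 70, 73, 57, 54, 63],
}
ages = list(weights)
n_subjects = 10

grand_mean = mean(w for ws in weights.values() for w in ws)
age_mean = {a: mean(ws) for a, ws in weights.items()}
subject_mean = [mean(weights[a][s] for a in ages) for s in range(n_subjects)]

# Manual residuals: value - personal mean - group mean + overall mean.
residuals = {
    a: [weights[a][s] - subject_mean[s] - age_mean[a] + grand_mean
        for s in range(n_subjects)]
    for a in ages
}

ss_total = sum((w - grand_mean) ** 2 for ws in weights.values() for w in ws)
ss_age = n_subjects * sum((m - grand_mean) ** 2 for m in age_mean.values())
ss_subject = len(ages) * sum((m - grand_mean) ** 2 for m in subject_mean)
ss_resid = sum(r ** 2 for rs in residuals.values() for r in rs)

print("residual SS:", round(ss_resid, 2))
print("total - age - subject:", round(ss_total - ss_age - ss_subject, 2))
```

If the two printed values agree, the manually computed residuals account for exactly the variation left over after the age and subject effects are removed.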