How do I make some calculations in R? [closed] - r

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm new to R and this issue is bothering me a lot. I have a weighted and directed network and I want to do the following:
I have an igraph network. I want to calculate the edge_betweenness of all edges and create a matrix with the following columns:
edgeID, node1, node2, weight, edgeBetweenness
By edgeID I mean the index of the edge in the graph. I need the index or ID because I want to use the elements of this matrix in another matrix.
So thanks for your help.

First off, please consider camille's advice on how to provide a reproducible & minimal example. For future posts, it is always good to provide some sample data for us to work with.
In response to your question, let's generate a random sample graph and assign some random weights to every edge. I'm using a fixed random seed to ensure reproducibility of random data.
library(igraph)
set.seed(2020)
ig <- make_full_graph(5)
E(ig)$weights <- sample(10, length(E(ig)), replace = TRUE)
Then we can use igraph::as_data_frame and igraph::edge_betweenness to extract an edge list (including weights) and the edge betweenness, respectively.
transform(
  as_data_frame(ig),
  edgeID = seq_len(ecount(ig)),  # index of each edge in the graph
  edgeBetweenness = edge_betweenness(ig))
# from to weights edgeID edgeBetweenness
# 1 1 2 7 1 1
# 2 1 3 6 2 1
# 3 1 4 8 3 1
# 4 1 5 1 4 1
# 5 2 3 1 5 1
# 6 2 4 4 6 1
# 7 2 5 10 7 1
# 8 3 4 6 8 1
# 9 3 5 1 9 1
# 10 4 5 8 10 1
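Note that the example above stores the weights in an edge attribute called weights, and the graph is undirected, so the betweenness values are unweighted. As far as I know, edge_betweenness only picks up weights automatically from an edge attribute named weight (or from its weights argument). Below is a minimal sketch for the weighted, directed case the question describes; the toy edge list and the names edges, g and edge_table are made up for illustration.

```r
library(igraph)

# Hypothetical toy edge list; the `weight` column becomes an edge attribute.
edges <- data.frame(
  from   = c("a", "a", "b", "c"),
  to     = c("b", "c", "c", "d"),
  weight = c(1, 5, 1, 2)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# One row per edge: index, endpoints, weight, and weighted betweenness.
edge_table <- data.frame(
  edgeID = seq_len(ecount(g)),
  igraph::as_data_frame(g, what = "edges"),
  edgeBetweenness = edge_betweenness(g, directed = TRUE)
)
edge_table
```

The edgeID column follows the order of E(g), so it can be used to index edges elsewhere, for example E(g)[edge_table$edgeID[2]].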

Related

what is this function doing? replication [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
library(dplyr)  # provides %>%, bind_rows(), mutate(), select(), group_by()

rep_sample_n <- function(tbl, size, replace = FALSE, reps = 1)
{
  rep_tbl = replicate(reps, tbl[sample(1:nrow(tbl), size, replace = replace), ],
                      simplify = FALSE) %>%
    bind_rows() %>%
    mutate(replicate = rep(1:reps, each = size)) %>%
    select(replicate, everything()) %>%
    group_by(replicate)
  return(rep_tbl)
}
Hey, can anyone help me there? What is this function doing? Is the first line setting the variables of the function? And then what is this "replicate" doing? Thanks!
This function replicates your data. Let's say we have a dataset of 10 observations. To come up with additional datasets like your current one, you can replicate it by randomly sampling from your dataset.
You can check out the Wikipedia page on statistical replication if you're more curious.
Let's take a simple data frame:
df <- data.frame(x = 1:10, y = 1:10)
df
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
If we want to take a random sample of this, we can use the function rep_sample_n, which takes two required arguments, tbl and size, and two optional arguments, replace = FALSE and reps = 1.
Here is an example of taking 4 randomly selected rows from our data.
rep_sample_n(df, 4)
# A tibble: 4 x 3
# Groups: replicate [1]
replicate x y
<int> <int> <int>
1 1 1 1
2 1 3 3
3 1 4 4
4 1 10 10
Now if we try to randomly sample 15 observations from a 10-observation dataset, it will throw an error. With replace = FALSE, each time a row is chosen it is removed from the pool for the next draw. In the example above, it chose the 1st observation first, which left only observations 2 through 10 for the second draw; it then chose the 3rd, then the 4th, then the 10th. If we set replace = TRUE, it will choose an observation from the full dataset each time.
Notice how in this example the 5th observation was chosen twice. That wouldn't happen with replace = FALSE.
rep_sample_n(df, 4, replace = TRUE)
# A tibble: 4 x 3
# Groups: replicate [1]
replicate x y
<int> <int> <int>
1 1 5 5
2 1 3 3
3 1 2 2
4 1 5 5
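The error mentioned earlier for oversized samples comes from base R's sample(), which rep_sample_n calls internally. Here is a small sketch of that behaviour on plain row indices; the variable names rows, too_many and with_replacement are only for illustration.

```r
set.seed(1)
rows <- 1:10

# Asking for 15 of 10 rows without replacement fails; capture the message.
too_many <- tryCatch(
  sample(rows, 15),
  error = function(e) conditionMessage(e)
)
too_many

# With replacement the same request succeeds, and repeats are unavoidable.
with_replacement <- sample(rows, 15, replace = TRUE)
length(with_replacement)
anyDuplicated(with_replacement) > 0
```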
Lastly and most importantly, we have the reps argument, which is really the basis for this function. It allows you to randomly sample your dataset multiple times and then combine all those samples together.
Below, we sample 4 observations from our original dataset of 10 and repeat that 5 times. This gives 5 different samples of 4 observations each, combined into one 20-observation data frame. Each of the 5 samples is tagged with a replicate number, so the replicate column shows which 4 observations belong to which sample.
rep_sample_n(df, 4, reps = 5)
# A tibble: 20 x 3
# Groups: replicate [5]
replicate x y
<int> <int> <int>
1 1 8 8
2 1 4 4
3 1 3 3
4 1 1 1
5 2 4 4
6 2 5 5
7 2 8 8
8 2 3 3
9 3 6 6
10 3 1 1
11 3 3 3
12 3 2 2
13 4 5 5
14 4 7 7
15 4 10 10
16 4 3 3
17 5 7 7
18 5 10 10
19 5 3 3
20 5 9 9
I hope this provided some clarity.
This function takes a data frame as input (and several input preferences). It takes a random sample of size rows from the table, with or without replacement as set by the replace input. It repeats that random sampling reps times.
Then, it binds all the samples together into a single data frame, adding a new column called "replicate" indicating which repetition of the sampling produced each row.
Finally, it "groups" the resulting table, preparing it for future group-wise operations with dplyr.
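The sample, bind and tag steps described above can be sketched in base R alone, which may make each stage easier to inspect. This is not the original function: rep_sample_base is a hypothetical name, and the final group_by() step is left out because base R has no equivalent.

```r
rep_sample_base <- function(tbl, size, replace = FALSE, reps = 1) {
  # Step 1: draw `reps` independent samples of `size` rows, kept as a list.
  samples <- replicate(
    reps,
    tbl[sample(nrow(tbl), size, replace = replace), , drop = FALSE],
    simplify = FALSE
  )
  # Step 2: stack the samples into one data frame.
  out <- do.call(rbind, samples)
  # Step 3: tag every row with the repetition that produced it.
  out <- cbind(replicate = rep(seq_len(reps), each = size), out)
  rownames(out) <- NULL
  out
}

set.seed(42)
df <- data.frame(x = 1:10, y = 1:10)
res <- rep_sample_base(df, size = 4, reps = 3)
res
```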
For general questions about specific functions, like "What is this "replicate" doing?", you should look at the function's help page: type ?replicate or help("replicate") to get there. It includes a description of the function and examples of how to use it. If you read the description, run the examples, and are still confused, feel free to come back with a specific question and example illustrating what you are confused by.
Similarly, for "Is the first line setting the variables of the function?", the arguments to function() are the inputs to the function. If you have basic questions about R like "How do functions work", have a look at An Introduction to R, or one of the other sources in the R Tag Wiki.

How to find the first smaller value compared to the current row in subsequent rows? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
Suppose this is the data:
data<-data.frame(number=c(4,5,3,1,0),
datetime=c(as.POSIXct("2015/06/12 12:10:25"),
as.POSIXct("2015/06/12 12:10:27"),
as.POSIXct("2015/06/12 12:10:32"),
as.POSIXct("2015/06/12 12:10:33"),
as.POSIXct("2015/06/12 12:10:35")))
number datetime
1 4 2015/06/12 12:10:25
2 5 2015/06/12 12:10:27
3 3 2015/06/12 12:10:32
4 1 2015/06/12 12:10:33
5 0 2015/06/12 12:10:35
I want to calculate the time between a row to the next smaller value. Desired output:
  number next_smaller time_between
1      4            3            7
2      5            3            5
3      3            1            1
4      1            0            2
5      0           NA           NA
Example: 3 is the first number in subsequent rows which is smaller than 4.
Any suggestion? package?
Well it's not pretty and probably not super efficient, but it seems to get the job done. Here we go ...
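newcols <- with(data, {
  lapply(seq_along(number), function(i) {
    # first later row whose value is smaller than number[i] (NA if none)
    j <- which(seq_along(number) > i & number < number[i])[1]
    c(number[j], as.numeric(difftime(datetime[j], datetime[i], units = "secs")))
  })
})
setNames(
  cbind(data[1], do.call(rbind, newcols)),
  c(names(data)[1], "nextsmallest", "timediff")
)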
# number nextsmallest timediff
# 1 4 3 7
# 2 5 3 5
# 3 3 1 1
# 4 1 0 2
# 5 0 NA NA
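For long vectors, a different technique may scale better: a single pass with a stack of indices that are still waiting for a smaller value. The sketch below is an alternative to the lapply approach, not part of the answer above; next_smaller_index is a hypothetical helper name, and the times are given as plain seconds for simplicity.

```r
next_smaller_index <- function(x) {
  n <- length(x)
  res <- rep(NA_integer_, n)
  stack <- integer(0)  # indices that have not yet met a smaller value
  for (i in seq_len(n)) {
    # x[i] resolves every waiting index whose value is larger than x[i].
    while (length(stack) > 0 && x[i] < x[stack[length(stack)]]) {
      res[stack[length(stack)]] <- i
      stack <- stack[-length(stack)]
    }
    stack <- c(stack, i)
  }
  res
}

number  <- c(4, 5, 3, 1, 0)
seconds <- c(25, 27, 32, 33, 35)  # seconds past 12:10:00 in the question's data
idx <- next_smaller_index(number)
result <- data.frame(
  number       = number,
  nextsmallest = number[idx],
  timediff     = seconds[idx] - seconds
)
result
```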
If I understand what you're trying to do, I'd suggest starting by ordering your dataframe in ascending order by 'number'. Next, add a new column using a lag function to retrieve the time value from the previous row. Finally, calculate the difference.
I could provide code later if you need it, but hopefully that will give you something to start with.

Read row from one dataframe and write it to the column of another [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I want to read a row from one dataframe and append that row of data into the column of another dataframe. The dimensions are compatible. How does one do that?
Define A & B as 3X3 frames:
A <- data.frame(c(1:3),c(4:6),c(7:9))
A is:
c.1.3. c.4.6. c.7.9.
1 1 4 7
2 2 5 8
3 3 6 9
B <- data.frame(c(13:15),c(16:18),c(19:21))
B is:
c.13.15. c.16.18. c.19.21.
1 13 16 19
2 14 17 20
3 15 18 21
I want to add the last row of B to a new column in A so that I get in A:
1 1 4 7 15
2 2 5 8 18
3 3 6 9 21
This works. Is there an easier way?
A[, 4] <- unlist(B[3, ])
A[,5] <- NULL
Why not:
i <- sample(nrow(NXM) , 1) # pick a row, .... any row
NXM[ , i] <- unlist( QXN[i, ] )
Or:
A[,4] <- t( B[3,] )
There is the potential downside that the "lowest common type denominator" for the various column types of QXN will become the column type in NXM. The dataframe situation was a bit different than the matrix situation would have been.
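The type coercion mentioned above can be seen in a small sketch. The data frames here are made up for illustration (they are not the A and B from the question): one row mixes an integer and a character value, so unlist() turns the whole row into character.

```r
# Hypothetical data: B has columns of different types.
B <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)
A <- data.frame(x = 1:2, y = 3:4)

# The third row of B holds an integer and a string; unlist() needs a single
# type for the result, so the integer is converted to character.
new_col <- unlist(B[3, ])
A$new <- unname(new_col)

class(new_col)
str(A)
```

When all columns of the source data frame share one type, as in the question's example, no such conversion happens.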

Reshape data into long format, repeating range of ids for every variable [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I want to reshape my data into a long format, but I would like to repeat the entire range of ids for each variable in my data set, even for those ids on which the variable takes no value. At the moment I can get narrow data with ids only for those variables that have a corresponding entry.
Suppose my data has 15 variables and 20 possible ids. I want to create a narrow form of this data that is 15*20 in length (the range of ids, repeated for each variable), where each repeated range of ids shows the values taken by one variable: variable1 for id1, id2, id3, etc. until the end of the range of ids is reached, then variable2 for id1, id2, id3, etc.
I am unsure of how to do this in R. I am currently using the reshape package.
You can use the rep function to repeat the ids and the variables:
v1 <- 1:5
v2 <- 1:6
rep(v1, each = 6)
# 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5
rep(v2, 5)
#1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
Yeah, this is hard to work with, but you're looking for the melt function I think...
library(reshape2)
melt(yourdata, id.vars = 'ID COLUMN')
This will return a 300 x 3 data set that looks like:
ID COLUMN variable value
1 col2 7
1 col3 8
.... .... ....
20 col14 99
20 col15 100

Finding the set of n equally distant points [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
For simplicity, if I have a vector of points which looks something like:
x = c(1,4,5,8,9)
I'm trying to find the n points which are equidistant from one another. In this case my n=3 so my ideal answer would be:
1,5,9
Since 5-1=4 and 9-5=4.
The actual vectors are much larger/complex as well as n.
Any ideas on how I can achieve this?
Thanks in advance!
This isn't the whole solution, but I think it is the start of one. First, computing the distance matrix will probably be helpful.
> x <- c(1,4,5,8,9)
> dx <- dist(x)
> dx
1 2 3 4
2 3
3 4 1
4 7 4 3
5 8 5 4 1
Second, you can identify points which are the same distance apart by sorting the distances and run-length encoding them.
> rdx <- rle(sort(dx))
> rdx
Run Length Encoding
lengths: int [1:6] 2 2 3 1 1 1
values : num [1:6] 1 3 4 5 7 8
You can then select the group of points you want and map it back to indices in the original distance matrix using the order function. Taking the third group (points separated by distance 4) as an example:
> i=3
> orderedIndex <- sum(rdx$lengths[seq_len(i - 1)])
> order(dx)[(orderedIndex+1):(orderedIndex+rdx$lengths[i])]
[1] 2 6 9
(The indices count down each column of the distance matrix, then from left to right.) So here you have identified the 4s in the distance matrix: these are the distances between the 1st/3rd, 2nd/4th, and 3rd/5th points in x. But you still have to do some more work to eliminate the 2nd and 4th points. Presumably you choose the 1st, 3rd and 5th points because they are connected?
I think you would want to process all groups of points identified by the rle function as over your chosen size, and then check for connectivity.
Consistent with comments above, here's something that might be what you want, not necessarily what you ask for. I'm sure there is a more efficient way to do this, though.
x = c(1,4,5,8,9)
x2 <- as.matrix(expand.grid(x, x))
x2 <- as.data.frame(t(apply(x2, 1, sort)))
x2 <- x2[!duplicated(x2), ]
x2 <- cbind(x2, d =abs(mapply("-", x2[,1], x2[,2])))
x2[order(x2$d), ]
# V1 V2 d
# 1 1 1 0
# 7 4 4 0
# 13 5 5 0
# 19 8 8 0
# 25 9 9 0
# 8 4 5 1
# 20 8 9 1
# 2 1 4 3
# 14 5 8 3
# 3 1 5 4
# 9 4 8 4
# 15 5 9 4
# 10 4 9 5
# 4 1 8 7
# 5 1 9 8
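Building on the pairwise distances above, here is a brute-force sketch that returns every run of n equally spaced values in x. It assumes "equidistant" means consecutive points separated by a constant step, as in the 1, 5, 9 example; find_equidistant is a hypothetical name, and the exact matching with %in% suits integer-like data but may need a tolerance for floating-point values.

```r
find_equidistant <- function(x, n) {
  x <- sort(unique(x))
  hits <- list()
  for (i in seq_along(x)) {
    for (j in seq_along(x)) {
      if (j <= i) next
      step <- x[j] - x[i]
      # Candidate run: start at x[i] and take n points spaced `step` apart.
      candidate <- x[i] + step * (0:(n - 1))
      if (all(candidate %in% x)) {
        hits[[length(hits) + 1]] <- candidate
      }
    }
  }
  hits
}

x <- c(1, 4, 5, 8, 9)
runs <- find_equidistant(x, n = 3)
runs
```

This checks every pair of points as a possible start and step, so it is quadratic in the number of distinct values; for much larger vectors a smarter search would probably be needed.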
