For simplicity, if I have a vector of points which looks something like:
x = c(1,4,5,8,9)
I'm trying to find the n points which are equidistant from one another. In this case my n=3 so my ideal answer would be:
1,5,9
Since 5-1=4 and 9-5=4.
The actual vectors are much larger and more complex, and n is larger too.
Any ideas on how I can achieve this?
Thanks in advance!
This isn't the whole solution, but I think it is the start of one. First, computing the distance matrix will probably be helpful.
> x <- c(1,4,5,8,9)
> dx <- dist(x)
> dx
  1 2 3 4
2 3
3 4 1
4 7 4 3
5 8 5 4 1
Second, you can identify points which are the same distance apart by sorting the distances and run-length encoding them.
> rdx <- rle(sort(dx))
> rdx
Run Length Encoding
lengths: int [1:6] 2 2 3 1 1 1
values : num [1:6] 1 3 4 5 7 8
You can select the group of points you want and then map back to positions in the original distance matrix using the order function. Taking the third group (pairs of points separated by distance 4) as an example:
> i <- 3
> orderedIndex <- sum(rdx$lengths[seq_len(i - 1)])  # seq_len() also handles i = 1
> order(dx)[(orderedIndex+1):(orderedIndex+rdx$lengths[i])]
[1] 2 6 9
(The indices count down each column of the distance matrix, moving from left to right.) So here you have identified the 4s in the distance matrix: these are the distances between the 1st/3rd, 2nd/4th, and 3rd/5th points in x. But you still have to do some more work to eliminate the 2nd and 4th points. Presumably you choose the 1st, 3rd and 5th points because they are connected: the pairs 1st/3rd and 3rd/5th share a point, so together they form a chain of three.
I think you would want to process every group identified by the rle function that contains at least n - 1 pairs (a chain of n points needs n - 1 equal gaps), and then check each group for connectivity.
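The connectivity check described above can be sketched directly. This is only a brute-force illustration, not the answerer's method: `find_equidistant` is a hypothetical helper name, and it assumes the values compare exactly (integers, or doubles without rounding error). For every distinct gap and every starting point it tests whether the whole chain start, start + d, ..., start + (n - 1) * d is present in x.

```r
# Hypothetical helper: all chains of n points in x with a constant gap.
find_equidistant <- function(x, n) {
  x <- sort(unique(x))
  steps <- sort(unique(as.vector(dist(x))))  # every distinct pairwise gap
  hits <- list()
  for (d in steps) {
    for (start in x) {
      candidate <- start + d * (0:(n - 1))
      # keep the chain only if every member is an actual point of x
      if (all(candidate %in% x)) {
        hits[[length(hits) + 1]] <- candidate
      }
    }
  }
  hits
}

res <- find_equidistant(c(1, 4, 5, 8, 9), 3)
res
```

For the example vector this returns a single chain, 1 5 9. The loops cost roughly (number of gaps) x (number of points) membership tests, so for very large vectors something smarter would be needed.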
Consistent with comments above, here's something that might be what you want, not necessarily what you ask for. I'm sure there is a more efficient way to do this, though.
x = c(1,4,5,8,9)
x2 <- as.matrix(expand.grid(x, x))
x2 <- as.data.frame(t(apply(x2, 1, sort)))
x2 <- x2[!duplicated(x2), ]
x2 <- cbind(x2, d =abs(mapply("-", x2[,1], x2[,2])))
x2[order(x2$d), ]
#    V1 V2 d
# 1   1  1 0
# 7   4  4 0
# 13  5  5 0
# 19  8  8 0
# 25  9  9 0
# 8   4  5 1
# 20  8  9 1
# 2   1  4 3
# 14  5  8 3
# 3   1  5 4
# 9   4  8 4
# 15  5  9 4
# 10  4  9 5
# 4   1  8 7
# 5   1  9 8
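A possibly shorter route to the same table of pairs, offered as a sketch rather than a replacement: base R's combn enumerates each unordered pair once, so no sorting or de-duplication is needed. The zero-distance self-pairs are not produced here, and the names `pairs` and `pairs_df` are just illustrative.

```r
x <- c(1, 4, 5, 8, 9)

# combn(x, 2) returns a 2 x choose(5, 2) matrix; transpose to one pair per row
pairs <- t(combn(x, 2))
pairs_df <- data.frame(V1 = pairs[, 1],
                       V2 = pairs[, 2],
                       d  = abs(pairs[, 2] - pairs[, 1]))

# sort by distance so that equal gaps sit next to each other
pairs_df[order(pairs_df$d), ]
```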
I have a vector X that contains positive numbers that I want to bin/discretize. For this vector, I want the numbers [0, 10) to show up just as they exist in the vector, but numbers [10,∞) to be 10+.
I'm using:
x <- c(0,1,3,4,2,4,2,5,43,432,34,2,34,2,342,3,4,2)
binned.x <- as.factor(ifelse(x > 10,"10+",x))
but this feels klugey to me. Does anyone know a better solution or a different approach?
How about cut:
binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))
Which yields:
# [1] 0 1 3 4 2 4 2 5 10+ 10+ 10+ 2 10+ 2 10+ 3 4 2
# Levels: 0 1 2 3 4 5 6 7 8 9 10+
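The breaks above give the intended labels because x holds whole numbers. If left-closed bins matching the [0, 10) and [10, ∞) wording are wanted, one possible variant (a sketch; `binned.left` is an illustrative name) is to set right = FALSE:

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# bins are [0,1), [1,2), ..., [9,10), [10,Inf)
binned.left <- cut(x,
                   breaks = c(0:10, Inf),
                   right  = FALSE,
                   labels = c(as.character(0:9), "10+"))
binned.left
```

With these breaks a non-integer such as 9.5 is labelled "9", so this still only reproduces values "as they exist" for integer input.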
Your question is inconsistent.
In the description, 10 belongs to the "10+" group, but in the code 10 is a separate level.
If 10 should be in the "10+" group, then your code should be
as.factor(ifelse(x >= 10,"10+",x))
In this case you could truncate data to 10 (if you don't want a factor):
pmin(x, 10)
# [1] 0 1 3 4 2 4 2 5 10 10 10 2 10 2 10 3 4 2
x[x>=10]<-"10+"
This will give you a vector of strings. You can use as.numeric(x) to convert back to numbers ("10+" becomes NA, with a coercion warning), or as.factor(x) to get your result above.
Note that this will modify the original vector itself, so you may want to copy to another vector and work on that.
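A sketch of the copy-first variant, assuming 10 itself belongs in the "10+" group: the original x is left numeric, and the level order is stated explicitly so that "10+" sorts last. The name `binned` is illustrative.

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# build the labels in a new object; x itself is not modified
binned <- factor(ifelse(x >= 10, "10+", as.character(x)),
                 levels = c(as.character(0:9), "10+"))

table(binned)
```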
I'm new to R and this issue is bothering me a lot. I have a weighted and directed network and I want to do the following:
I have an igraph network. I want to calculate the edge_betweenness of all edges and create a matrix with the following columns:
edgeID, node1, node2, weight, edgeBetweenness
By edgeID I mean the index of the edge in the graph. I need the index or ID because I want to use the elements of this matrix in another matrix.
So thanks for your help.
First off, please consider camille's advice on how to provide a reproducible & minimal example. For future posts, it is always good to provide some sample data for us to work with.
In response to your question, let's generate a random sample graph and assign some random weights to every edge. I'm using a fixed random seed to ensure reproducibility of random data.
library(igraph)
set.seed(2020)
ig <- make_full_graph(5)  # current name for graph.full()
E(ig)$weights <- sample(10, ecount(ig), replace = TRUE)
Then we can use igraph::as_data_frame and igraph::edge_betweenness to extract an edge list (including weights) and the edge betweenness, respectively.
transform(
    as_data_frame(ig),
    # number the edges with ecount(); length(ig) is not guaranteed to be the edge count
    edgeID = seq_len(ecount(ig)),
    edgeBetweenness = edge_betweenness(ig))
#    from to weights edgeID edgeBetweenness
# 1     1  2       7      1               1
# 2     1  3       6      2               1
# 3     1  4       8      3               1
# 4     1  5       1      4               1
# 5     2  3       1      5               1
# 6     2  4       4      6               1
# 7     2  5      10      7               1
# 8     3  4       6      8               1
# 9     3  5       1      9               1
# 10    4  5       8     10               1
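Note that the sample graph stores its weights in an edge attribute called weights, which edge_betweenness does not look up by itself (it defaults to an attribute named weight), so the betweenness values shown are unweighted. A minimal sketch of the layout the question asks for, on a directed graph with the weights passed explicitly, might look like this. The edge list is made-up toy data, and `edges`, `g`, `endpoints` and `result` are illustrative names.

```r
library(igraph)

# Hypothetical toy data: a small directed, weighted edge list
edges <- data.frame(
  from   = c(1, 1, 2, 3, 4),
  to     = c(2, 3, 3, 4, 1),
  weight = c(2, 5, 1, 3, 4)
)
g <- graph_from_data_frame(edges, directed = TRUE)

# ends() returns one row per edge with the ids of its two endpoints
endpoints <- ends(g, E(g), names = FALSE)

result <- cbind(
  edgeID          = seq_len(ecount(g)),
  node1           = endpoints[, 1],
  node2           = endpoints[, 2],
  weight          = E(g)$weight,
  edgeBetweenness = edge_betweenness(g, directed = TRUE, weights = E(g)$weight)
)
result
```

Because every column is numeric, cbind gives a plain numeric matrix whose row i describes edge i, which is what makes the edgeID usable as an index into other matrices.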
I have a data.frame with 2 variables and 177 observations. I would like to sum up one variable until it reaches a certain value, and then get the value of the other variable at the row where that threshold is reached. I will try to add a reproducible example. I am new here, so forgive me if I do it wrong.
> df <- data.frame(x=10:1,y=1:10)
> print(df)
    x  y
1  10  1
2   9  2
3   8  3
4   7  4
5   6  5
6   5  6
7   4  7
8   3  8
9   2  9
10  1 10
How can I sum column y until it reaches a certain value, let's say 7, and then either have it return the value of X(4), or the row number 7. I am sure it is pretty straightforward, but I seem to be drawing a blank.
Here is my solution.
df[cumsum(df$y) <= 7,]
   x y
1 10 1
2  9 2
3  8 3
The OP just asked for the relevant value of x, which would be found using:
df$x[which(cumsum(df$y) >= 10)[1]]
Also note this finds the first row where cumsum(df$y) is at least 10, whereas the other answers find the rows where it is still <= 7, which is potentially different (though not for this dataset). For the original question (pre-comment) it would need to be:
df$x[which(cumsum(df$y) > 7)[1]]
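The same idea can be wrapped so that the threshold is a parameter. This is a sketch under the assumption that "reaches" means the running total is greater than or equal to the threshold; `x_at_threshold` is a hypothetical helper name.

```r
df <- data.frame(x = 10:1, y = 1:10)

# Hypothetical helper: x at the first row where the running total of y
# reaches `threshold`; returns NA if the total never gets that far.
x_at_threshold <- function(data, threshold) {
  hit <- which(cumsum(data$y) >= threshold)[1]
  data$x[hit]
}

x_at_threshold(df, 7)    # running total is 1, 3, 6, 10: row 4, so x = 7
x_at_threshold(df, 100)  # total of y is only 55, so NA
```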
If you want to stay with base R, try this
> df$x[df$y >= 7][1]
[1] 4
> max(cumsum(df$y[df$y <= 7]))
[1] 28
Or if you need this in a matrix form:
> cbind(df$x[df$y >= 7][1], max(cumsum(df$y[df$y <= 7])))
     [,1] [,2]
[1,]    4   28
I would still look into switching to the data.table or at least the dplyr package for data manipulation.
I've looked on the internet but I haven't found the answer I'm looking for, though I'm sure it's out there...
I have a data frame, and I want to divide (or apply any other operation to) every cell of a row by the value in the second column of that same row.
So for the first row, each cell from col3 to the last column is divided by the value of col2 in that row, and so on for every row.
I have solved this with a for loop: col2 (delta) is now a vector, and col3 to the end is a data.frame (mu). The results are appended to a new data frame using rbind.
The question is: I'm pretty sure this can be done with apply, sapply or similar, but so far I have not gotten the results I'm after (not the correct ones I get with the for loop). How can I do it without a for loop?
In summary, I want to divide each mu by the delta value of its own row.
This is the for loop I've been using so far:
RA <- NULL  # start empty so the first rbind() has something to append to
for (i in seq_len(nrow(mu))) {
  RA_row <- mu[i, ] / delta[i]
  RA <- rbind(RA, RA_row)
}
  transcript       delta         mu_5        mu_15        mu_25       mu_35       mu_45        mu_55        mu_65
1    YAL001C 0.066702720 2.201787e-01 1.175731e-01 2.372506e-01 0.139281317 0.081723456 1.835414e-01 1.678318e-01
2    YAL002W 0.106000180 3.685822e-01 1.326865e-01 2.887973e-01 0.158207858 0.193476082 1.867039e-01 1.776946e-01
3    YAL003W 0.022119345 2.271518e+00 2.390637e+00 1.651997e+00 3.802739732 2.733559839 2.772454e+00 3.571712e+00
Thanks
It appears as though you want just:
mu2 <- mu[-(1:2)]/mu[[2]]
# same as mu[, -(1:2)]/mu[['delta']]
That should produce a new dataframe with the division by row. Somewhat more dangerous would be to do the division "in place".
mu[-(1:2)] <- mu[-(1:2)]/mu[[2]]
> mu <- data.frame(a=1,b=1:10, c=rnorm(10), d=rnorm(10) )
> mu
   a  b           c           d
1  1  1 -1.91435943  0.45018710
2  1  2  1.17658331 -0.01855983
3  1  3 -1.66497244 -0.31806837
4  1  4 -0.46353040 -0.92936215
5  1  5 -1.11592011 -1.48746031
6  1  6 -0.75081900 -1.07519230
7  1  7  2.08716655  1.00002880
8  1  8  0.01739562 -0.62126669
9  1  9 -1.28630053 -1.38442685
10 1 10 -1.64060553  1.86929062
> (mu2 <- mu[-(1:2)]/mu[[2]])
              c            d
1  -1.914359426  0.450187101
2   0.588291656 -0.009279916
3  -0.554990812 -0.106022792
4  -0.115882600 -0.232340537
5  -0.223184021 -0.297492062
6  -0.125136500 -0.179198716
7   0.298166649  0.142861258
8   0.002174452 -0.077658337
9  -0.142922281 -0.153825205
10 -0.164060553  0.186929062
> (mu[-(1:2)] <- mu[-(1:2)]/mu[[2]] )
> mu
   a  b            c            d
1  1  1 -1.914359426  0.450187101
2  1  2  0.588291656 -0.009279916
3  1  3 -0.554990812 -0.106022792
4  1  4 -0.115882600 -0.232340537
5  1  5 -0.223184021 -0.297492062
6  1  6 -0.125136500 -0.179198716
7  1  7  0.298166649  0.142861258
8  1  8  0.002174452 -0.077658337
9  1  9 -0.142922281 -0.153825205
10 1 10 -0.164060553  0.186929062
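If the row-wise intent should be explicit in the code, base R's sweep is one alternative. The sketch below uses made-up numbers in the question's layout (a transcript column, a delta column, then the mu columns) and checks that it agrees with the plain division shown above; `ratio` and `plain` are illustrative names.

```r
# Hypothetical toy data in the same layout as the question
mu <- data.frame(transcript = c("YAL001C", "YAL002W"),
                 delta      = c(2, 4),
                 mu_5       = c(10, 20),
                 mu_15      = c(4, 8))

# MARGIN = 1 applies delta[i] to every cell of row i
ratio <- sweep(as.matrix(mu[-(1:2)]), 1, mu$delta, "/")

# the vectorised division recycles delta down each column, giving the same result
plain <- as.matrix(mu[-(1:2)] / mu$delta)

ratio
```

The plain division is shorter; sweep mainly helps when the operation or the margin may change later.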