I have been trying to create a list of the coordinates (indices) of a vector that share the same value.
Example: for the vector Points = (2, 2, 3, 6, 3, 6, 3), the coordinates that share a value are (1, 2), (3, 5, 7) and (4, 6). So the output should be something like
[[1]]
1 2
[[2]]
3 5 7
[[3]]
4 6
Just to contextualize the need for this function: I'm trying to simulate random walks with multiple starting points. Once any chains hit each other, only the one with the smallest index is updated. The vector Points holds the positions of all the walks at some time step t. I need to check, at each time step, which chains have the same value, in order to update only the one with the smallest index. In this particular example, only chains 1, 3 and 4 would be updated at time t+1.
Maybe something like this:
X = c(2, 3, 4)
Y = c(4, 3, 5)
Z = c(8, 8, 9, 6)
W = c(3, 3, 7, 4)
V = c(9, 9, 9, 3)
findeev <- function(vi){ # takes a vector of indices of vectors to extract from `allv`
  nms = paste("v", vi, sep = "")
  ne = max(sapply(allv[vi], length)) # max number of elements
  ine = 1:ne
  # compute the sd across the vectors at each position
  rest = apply(cbind(sapply(allv[vi], function(vt) vt[ine])), 1, sd)
  # determine whether the vectors share the same value at some position (sd == 0)
  resc = which(rest == 0)
  if(length(resc)){
    ret = allv[vi]; ret$pos = resc
    names(ret)[-length(ret)] <- nms
    ret
  } else invisible()
}
allv = list(X, Y, Z, W, V)
findeev(c(3, 5)) # example
sol = list(); j = 1 # this will hold all matching groups
for(i in (length(allv)-1):2){
  # compare all combinations of i vectors using `findeev`
  res <- apply(t(combn(1:length(allv), i)), 1, findeev)
  if(is.null(res)) next # nothing found, continue
  # drop empty (NULL) entries from the list and assign the rest to sol[[j]]
  sol[[j]] <- Filter(Negate(function(x) is.null(unlist(x))), res)
  j = j + 1
}
sol
You get:
> findeev(c(3,5))
$v3
[1] 8 8 9 6
$v5
[1] 9 9 9 3
$pos
[1] 3
> sol
[[1]]
[[1]][[1]]
[[1]][[1]]$v1
[1] 2 3 4
[[1]][[1]]$v2
[1] 4 3 5
[[1]][[1]]$v4
[1] 3 3 7 4
[[1]][[1]]$pos
[1] 2
[[2]]
[[2]][[1]]
[[2]][[1]]$v1
[1] 2 3 4
[[2]][[1]]$v2
[1] 4 3 5
[[2]][[1]]$pos
[1] 2
[[2]][[2]]
[[2]][[2]]$v1
[1] 2 3 4
[[2]][[2]]$v4
[1] 3 3 7 4
[[2]][[2]]$pos
[1] 2
[[2]][[3]]
[[2]][[3]]$v2
[1] 4 3 5
[[2]][[3]]$v4
[1] 3 3 7 4
[[2]][[3]]$pos
[1] 2
[[2]][[4]]
[[2]][[4]]$v3
[1] 8 8 9 6
[[2]][[4]]$v5
[1] 9 9 9 3
[[2]][[4]]$pos
[1] 3
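For the single-vector case described in the question, base R's split() also yields the index groups directly; a minimal sketch:
Points <- c(2, 2, 3, 6, 3, 6, 3)
groups <- unname(split(seq_along(Points), Points)) # indices grouped by value: (1,2), (3,5,7), (4,6)
Filter(function(g) length(g) > 1, groups)          # keep only positions shared by more than one chain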
Related
I know this is a stupid question, but is there a function in R that is the opposite of diff, where you can add consecutive pairs in a vector? I.e., if you had the original vector 1, 2, 3, 4, 5, you would get back 3, 5, 7, 9 (1+2, 2+3, 3+4, 4+5).
You could use the filter function. Just make sure you have the right filter/kernel for the lag, i.e. for lag = 1 in diff, use filter = c(1, 1) in filter:
x <- 1:5
filter(x, c(1,1), sides = 1)
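For reference, filter here is stats::filter (not dplyr::filter); with sides = 1 each output value combines the current and previous element, so the first entry is NA:
x <- 1:5
stats::filter(x, c(1, 1), sides = 1) # values: NA 3 5 7 9 (returned as a ts object)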
Here are some possibilities:
rowSums(embed(x, 2))
## [1] 3 5 7 9
x |> embed(2) |> rowSums() # same but with pipes
## [1] 3 5 7 9
head(x, -1) + tail(x, -1)
## [1] 3 5 7 9
diff(cumsum(c(0, x)), 2)
## [1] 3 5 7 9
c(ts(x) + lag(ts(x)))
## [1] 3 5 7 9
library(zoo)
rollsum(x, 2)
## [1] 3 5 7 9
# Since this is a linear operation it has a matrix M
M <- apply(diag(length(x)), 2, rollsum, 2)
c(M %*% x)
## [1] 3 5 7 9
# Take sub or super diagonal of the outer sum
out <- outer(x, x, `+`)
out[row(out) == col(out) - 1]
## [1] 3 5 7 9
Note: all of the above use
x <- 1:5
I have two lists of the same length. I want to add the first element of the second list to the first element of the first list, and so on.
Here is my example:
# the mock data
m1 <- matrix(c(2, 3, 4, 5), nrow = 2, ncol = 2)
m2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
m3 <- matrix(c(1, 10, 6, 8, 3, 4, 5, 6), nrow = 4, ncol = 2)
m4 <- matrix(c(2, 5, 9, 11), nrow = 2, ncol = 2)
list1 <- list(
  list(x = c(m1, m4, m3), y = c(m1, m2, m3), z = c(m1, m2, m4)),
  list(x = c(m4, m2, m3), y = c(m1, m2, m4), z = c(m2, m2, m3)),
  list(x = c(m1, m2, m3), y = c(m1, m2, m3), z = c(m1, m2, m3))
)
list2 <- list(list(f = m4), list(g = m4), list(h = m2))
The code for achieving what I want:
list1[[1]][[4]]<- list2[[1]][[1]]
list1[[2]][[4]]<- list2[[2]][[1]]
list1[[3]][[4]]<- list2[[3]][[1]]
names(list1[[1]])<- c("x","y","z","f")
names(list1[[2]])<- c("x","y","z","g")
names(list1[[3]])<- c("x","y","z","h")
My question is: how can I do the same with a loop or lapply, as my actual data consists of very long lists, not just lists of length 3?
We can use Map to combine the corresponding elements of each list.
Map(c, list1, list2)
#[[1]]
#[[1]]$x
# [1] 2 3 4 5 2 5 9 11 1 10 6 8 3 4 5 6
#[[1]]$y
# [1] 2 3 4 5 1 2 3 4 5 6 1 10 6 8 3 4 5 6
#[[1]]$z
# [1] 2 3 4 5 1 2 3 4 5 6 2 5 9 11
#[[1]]$f
# [,1] [,2]
#[1,] 2 9
#[2,] 5 11
#....
which is similar to map2 from purrr
purrr::map2(list1, list2, c)
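If an explicit loop or lapply is preferred (as asked in the question), an equivalent sketch:
# loop over positions and concatenate the corresponding elements
out <- vector("list", length(list1))
for (i in seq_along(list1)) {
  out[[i]] <- c(list1[[i]], list2[[i]])
}
# or the same with lapply over the indices
out <- lapply(seq_along(list1), function(i) c(list1[[i]], list2[[i]]))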
I'm trying to create a vector whose elements add up to a specific number. For example, let's say I want to create a vector with 4 elements that must add up to 20, so its elements could be 6, 6, 4, 4 or 2, 5, 7, 6, and so on. I tried to run some lines using sample() and seq() but I could not get it to work.
Any help appreciated.
To divide 20 into 4 parts, you need three breakpoints from the 19 possible breaks between the 20 numbers. Then your parts are just the sizes of the intervals between 0, your breakpoints, and 20:
> sort(sample(19,3))
[1] 5 7 12
> diff(c(0, 5,7,12,20))
[1] 5 2 5 8
To test, let's create a big matrix of them. Each column is an instance:
> trials = sapply(1:1000, function(X){diff(c(0,sort(sample(19,3)),20))})
> trials[,1:6]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 3 1 8 13 3 2
[2,] 4 7 10 2 9 5
[3,] 2 11 1 4 3 7
[4,] 11 1 1 1 5 6
Do they all add to 20?
> all(apply(trials,2,sum)==20)
[1] TRUE
Are there any weird cases?
> range(trials)
[1] 1 17
No: there are no zeroes and nothing bigger than 17, which would be a (1, 1, 1, 17) case. You can't have an 18 without a zero.
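The breakpoint idea can be wrapped in a small helper; the function name here is just illustrative:
# n positive integer parts that sum to 'total'
rand_partition <- function(n, total) diff(c(0, sort(sample(total - 1, n - 1)), total))
rand_partition(4, 20) # e.g. 5 2 5 8 (random)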
Another approach starts from an equal split and then perturbs it:
foo = function(n, sum1){
  # divide sum1 into 'n' equal parts
  x = rep(sum1/n, n)
  # for each x, sample a value from 1 to that value minus one
  f = sapply(x, function(a) sample(1:(a-1), 1))
  # add and subtract a permutation of f so that sum(x) does not change
  x = x + sample(f)
  x = x - sample(f)
  x = floor(x)
  # absorb any rounding difference into the last element
  x[n] = x[n] - (sum(x) - sum1)
  return(x)
}
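A quick check of this version (the output is random, so only the total is guaranteed):
y <- foo(4, 20)
y      # four values (random)
sum(y) # 20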
I'm coming from Python, where I can index to the end of an array by omitting the end index (note that Python is 0-indexed):
In [1]: x = range(10)
In [2]: x
Out[2]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]: x[4:]
Out[3]: [4, 5, 6, 7, 8, 9]
In [4]: x[-6:]
Out[4]: [4, 5, 6, 7, 8, 9]
In [5]: x[-1]
Out[5]: 9
Also, negative indices count from the end of the array. In R, so far I've found that I need to do things like this:
> x <- 0:9
> x[5:length(x)]
[1] 4 5 6 7 8 9
and so on. Is there any sort of syntactic sugar for length(x)? (Perhaps something like the end keyword that MATLAB uses.)
You could use the tail function instead of indexing.
In the following, the final 6 elements of the vector x are returned; 6 is the default return length for tail (and for head, shown below), and the second argument changes that. So if you were to write tail(x, 5), for example, the final 5 elements would be returned.
> x <- 0:9
> tail(x)
# [1] 4 5 6 7 8 9
> tail(x, 5)
# [1] 5 6 7 8 9
Similarly, there is a head function for viewing the first few elements, which operates in the same manner.
> head(x)
# [1] 0 1 2 3 4 5
> head(x, 5)
# [1] 0 1 2 3 4
Logical indexing is another method. head and tail were mentioned above, but not with a negative second argument, which is also useful:
> x <- 1:10
> head(x,-3)
[1] 1 2 3 4 5 6 7
> tail(x,-3)
[1] 4 5 6 7 8 9 10
> x[x>=4]
[1] 4 5 6 7 8 9 10
Mixing negative and positive indices is not allowed.
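Note that negative indices in R mean something different than in Python: they drop elements rather than counting from the end. For the Python examples above:
x <- 0:9
x[-(1:4)]    # drop the first four elements: 4 5 6 7 8 9 (Python's x[4:])
x[length(x)] # last element: 9 (Python's x[-1])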
I have a data frame with a list of X/Y locations (>2000 rows). What I want is to select or find all the rows/locations based on a maximum distance, for example, to select from the data frame all the locations that are between 1 and 100 km from each other. Any suggestions on how to do this?
You need to somehow determine the distance between each pair of rows.
The simplest way is with a corresponding distance matrix
library(data.table)
# assuming 'thresh' is your threshold
thresh <- 10
# create some sample data
set.seed(123)
DT <- data.table(X = sample(-10:10, 5, TRUE), Y = sample(-10:10, 5, TRUE))
# create the distance matrix (createTable and distance are defined below)
distTable <- matrix(apply(createTable(DT), 1, distance), nrow = nrow(DT))
# remove the lower triangle since we have symmetry (we don't want duplicates)
distTable[lower.tri(distTable)] <- NA
# show which row pairs are at or above the threshold (use <= for pairs within a distance)
pairedRows <- which(distTable >= thresh, arr.ind = TRUE)
colnames(pairedRows) <- c("RowA", "RowB") # clean up the names
Starting with:
> DT
X Y
1: -4 -10
2: 6 1
3: -2 8
4: 8 1
5: 9 -1
We get:
> pairedRows
RowA RowB
[1,] 1 2
[2,] 1 3
[3,] 2 3
[4,] 1 4
[5,] 3 4
[6,] 1 5
[7,] 3 5
These are the two functions used for creating the distance matrix:
# pair up all of the rows
createTable <- function(DT)
  expand.grid(apply(DT, 1, list), apply(DT, 1, list))
# simple Cartesian/Pythagorean (Euclidean) distance between a pair of rows
distance <- function(CoordPair)
  sqrt(sum((CoordPair[[2]][[1]] - CoordPair[[1]][[1]])^2, na.rm = FALSE))
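For comparison, base R's dist() builds the same pairwise Euclidean distance matrix directly; a sketch, assuming DT holds only the X and Y columns:
distTable2 <- as.matrix(dist(DT))
# each pair counted once, at or above the threshold
which(distTable2 >= thresh & upper.tri(distTable2), arr.ind = TRUE)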
I'm not entirely clear on your question, but assuming you mean that you want to take each row of coordinates and find all the other rows whose coordinates fall within a certain distance:
# Create data set for example
set.seed(42)
x <- sample(-100:100, 10)
set.seed(456)
y <- sample(-100:100, 10)
coords <- data.frame(
"x" = x,
"y" = y)
# Loop through all rows
lapply(1:nrow(coords), function(i) {
dis <- sqrt(
(coords[i,"x"] - coords[, "x"])^2 + # insert your preferred
(coords[i,"y"] - coords[, "y"])^2 # distance calculation here
)
names(dis) <- 1:nrow(coords) # replace this part with an index or
# row names if you have them
dis[dis > 0 & dis <= 100] # change numbers to preferred threshold
})
[[1]]
2 6 7 9 10
25.31798 95.01579 40.01250 30.87070 73.75636
[[2]]
1 6 7 9 10
25.317978 89.022469 51.107729 9.486833 60.539243
[[3]]
5 6 8
70.71068 91.78780 94.86833
[[4]]
5 10
40.16217 99.32774
[[5]]
3 4 6 10
70.71068 40.16217 93.40771 82.49242
[[6]]
1 2 3 5 7 8 9 10
95.01579 89.02247 91.78780 93.40771 64.53681 75.66373 97.08244 34.92850
[[7]]
1 2 6 9 10
40.01250 51.10773 64.53681 60.41523 57.55867
[[8]]
3 6
94.86833 75.66373
[[9]]
1 2 6 7 10
30.870698 9.486833 97.082439 60.415230 67.119297
[[10]]
1 2 4 5 6 7 9
73.75636 60.53924 99.32774 82.49242 34.92850 57.55867 67.11930
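If you only need the row indices (rather than the distances) within the threshold, a small variation of the same loop works; a sketch:
neighbours <- lapply(seq_len(nrow(coords)), function(i) {
  dis <- sqrt((coords$x[i] - coords$x)^2 + (coords$y[i] - coords$y)^2)
  which(dis > 0 & dis <= 100) # indices of rows within 100 units of row i
})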