I have two matrices in R. The first is
j= matrix(c(1:8,1:8), nrow=2,ncol=8)
and the second:
B= matrix (c(Dav_Bou_k_med$r,Dav_Bou$r),nrow=2,ncol=8)
Both Dav_Bou_k_med$r and Dav_Bou$r are matrices with nrow=1 and ncol=8, so they look like this:
[1] 1.668 2.000 1.5 1.7 1.7 1.9 1.9 2.5
etc.
I used this plot:
plot(j,B)
but what I get is just the individual points for every 1:8 of the first matrix j (two points per x value, because B has two rows). What I want is to connect the points of each row of B in the plot, so that each row of B becomes a connected line, ideally in a different color. Is there an easy way to achieve that?
It's a little difficult to interpret exactly what you are looking for, but I imagine it's something like this?
j <- matrix(c(1:8, 1:8), nrow=2, ncol=8, byrow=TRUE)
fake_data <- sample(seq(1, 3, 0.2), 8, replace=TRUE)
more_fake_data <- sample(seq(1, 3, 0.2), 8, replace=TRUE)
B <- matrix(c(fake_data, more_fake_data), nrow=2, ncol=8, byrow=TRUE)
plot(j, B)
lines(j[1,], B[1,])
lines(j[2,], B[2,], col="green")
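As a side note, matplot can do the same thing in one call by plotting the columns of the transposed matrices against each other, so each row of B becomes one connected line (a minimal sketch, assuming the same j and B as above):
# each column of t(B), i.e. each row of B, is drawn as one connected line
matplot(t(j), t(B), type="o", pch=16, col=c("black", "green"),
        xlab="j", ylab="B")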
Good evening,
I have a dataframe with 100 rows whose columns are ID, x-coordinate, and y-coordinate:
ID    X     Y
1     0.1   0.1
2     0.2   0.2
and so on. I would now like to simulate a random walk between these 100 points, but I do not know how to index a row. I thought about something like the following:
dataframe[ID,]=dataframe[ID+1,]+sample(step,1)
However, this throws the error "unused argument (alist(id = ))".
Does somebody know how to fix this?
Many thanks in advance!
This will return the rows in randomized order:
set.seed(123) # pick an arbitrary number to make reproducible
dfrm[ sample(100), ]
If you just wanted the x and y values but not the IDs, it would be:
set.seed(567)
dfrm[ sample(100), 2:3 ]
Here's a plot of a result:
df_start <- data.frame(ID=1:100, X=runif(100), Y=runif(100))
dfrm <- df_start[ sample(100) , ]
plot(x=0:1, y=0:1, type="n") # just setting the range; plot() has no plot= argument, type="n" draws an empty frame
arrows(x0=head(dfrm$X, -1), x1=tail(dfrm$X,-1),
y0=head(dfrm$Y, -1), y1=tail(dfrm$Y,-1) )
You said you wanted a "random walk between these points". Another way to create a random walk, one that would be more Markovian, would be to use the values as increments from a starting point, with the increments centered at 0 and perhaps spanning [-1, 1]. You would then take the cumsum of the values after starting at (0, 0), as sketched below.
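A minimal sketch of that cumsum idea, using hypothetical uniform increments rather than the original dataframe:
set.seed(42)
steps_x <- runif(100, min=-1, max=1)  # increments centered at 0
steps_y <- runif(100, min=-1, max=1)
walk_x <- cumsum(c(0, steps_x))  # start the walk at (0, 0)
walk_y <- cumsum(c(0, steps_y))
plot(walk_x, walk_y, type="l")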
I would like to match points in 3-dimensional space.
Therefore, I am using the Hungarian Method described in this question: Finding the best matching pairwise points from 2 vectors
Here is my example using R:
# packages
library(rgl)
library(clue)
library(plyr)
library(fields)
set.seed(1)
a <- c(rep(2,7), 3,4,5,6,3,4,5,6,7,7,7,7,7,7) # x values
b <- c(rep(3,7),3,3,3,3, 3,3,3,3,3,3,3,3,3,3) # y values
c <- c(seq(1,7),1,1,1,1,7,7,7,7,1,2,3,4,5,6) # z values
# transform the points
set.seed(2)
a1 <- a + seq(1,length(a))
b1 <- b + 8
c1 <- c + 9
# plot the data
plot3d(a,b,c, col="red", pch=16,size=10)
plot3d(a1,b1,c1, lwd=10, col="blue", pch=16,size=10, add=TRUE)
# run the Hungarian Method
A <- cbind(a,b,c)
B <- cbind(a1,b1,c1)
distances <- rdist(A,B) # calculate Euclidean Distance between points
min.dist <- solve_LSAP(distances) # minimizing the sum of distance
min.dist.num <- as.numeric(min.dist)
# plot the minimized lines between point sets
for (ii in 1:dim(B)[1]) {
  D <- c(A[ii,1], B[min.dist.num[ii],1])
  R <- c(A[ii,2], B[min.dist.num[ii],2])
  W <- c(A[ii,3], B[min.dist.num[ii],3])
  segments3d(D, R, W, col=2, lwd=1)
}
# calculate the share of points that is matched correctly
sum(1:dim(B)[1] == min.dist.num) / dim(B)[1] * 100
The problem here is that only 5% of the points are matched correctly (see last line of the code). In my view, the main trouble is that the algorithm does not take the structure of the object (a square) into account.
Question: Is there any method that performs better for this sample data?
In my original data, the spatial structure of the points is far more complicated: I have a cloud of data, and within this cloud there are multiple subfigures.
I am seeking primarily for a solution in R, but other implementations (e.g. MATLAB, Excel, Java) are also welcome.
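One thing that may be worth trying before reaching for a fancier method (a hedged sketch, not a definitive fix): subtract each cloud's centroid before computing the distances, so the assignment is based on relative rather than absolute positions. This only removes the translation component of the transformation, so it may help only partially here, where the x coordinates are also sheared:
# centroid-align both clouds, then rerun the Hungarian method
A_centered <- scale(A, center=TRUE, scale=FALSE)
B_centered <- scale(B, center=TRUE, scale=FALSE)
distances_centered <- rdist(A_centered, B_centered)
match_centered <- as.numeric(solve_LSAP(distances_centered))
sum(1:dim(B)[1] == match_centered) / dim(B)[1] * 100  # share matched correctly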
Let's say I have the output of a monte-carlo simulation of one variable over several different iterations (think millions). For each iteration, I have the values of the variable at each point in time (ranging from t=1 to t=365).
I would like to produce the following plot:
For each point in time, t, on the x axis and for each possible value "y" in a given range, set the color of x,y to "k" where "k" is a count of how many observations are within a vicinity of distance "d" to x,y.
I know you can easily make density heatmaps for 1D data, but is there a good package for doing this on 2 dimensions? Do I have to use kriging?
Edit: The data structure is currently a matrix.
data matrix (rows = iterations, columns = day numbers):
                  [,1]     [,2]     [,3]     [,4]     ... [,365]
         [1,] 0.000213 0.001218 0.000151 0.000108 ... 0.000101
         [2,] 0.000314 0.000281 0.000117 0.000103 ... 0.000305
         [3,] 0.000314 0.000281 0.000117 0.000103 ... 0.000305
         [4,] 0.000171 0.000155 0.000141 0.000219 ... 0.000201
         ...
[100000000,] 0.000141 0.000148 0.000144 0.000226 ... 0.000188
For each "day", I want the pixels running vertically over that day to represent the probability density of the iterations' values on that day, encoded as color. The result should look like a heatmap.
Here is one solution to what I think you are after.
Generate data.
myData <- mapply(rnorm, 1000, 200, mean=seq(-50,50,0.5))
This is a matrix with 1000 rows (observations) and 201 time points; from one time point to the next, the mean of the data shifts gradually from -50 to 50, in steps of 0.5.
Get densities.
myDensities <- apply(myData, 2, density, from=-500, to=500)
This will give you a list of densities for each column. In order for them to be plottable side by side we specified the ranges (from -500 to 500) manually.
Obtain density values from the list.
Ys <- sapply(myDensities, "[", "y")
This is again a list. You need to get a matrix from that.
Get matrix from list.
img <- do.call(cbind, Ys)
This simply combines all Ys elements by column.
Plot.
filled.contour(x=1:ncol(img), y=myDensities[[1]]$x, t(img))
I use filled.contour for that, but you can look around for other 2-D plotting functions. For the y axis I used the values obtained from the densities, myDensities[[1]]$x.
And here is the result:
The shift from -50 to 50 is visible.
Not sure if this can work well with millions of time points, but plotting millions probably makes little sense anyway, since you will in any case be limited by the number of pixels. Some kind of pre-processing might be necessary.
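One possible pre-processing sketch (my own suggestion, assuming the same myData as above): bin the values per time point into counts and plot the count matrix directly, which avoids running density on huge inputs:
# bin each column (time point) into a shared set of value breaks
breaks <- seq(min(myData), max(myData), length.out=101)
counts <- apply(myData, 2, function(col) hist(col, breaks=breaks, plot=FALSE)$counts)
image(x=1:ncol(counts), y=head(breaks, -1), z=t(counts),
      xlab="time point", ylab="value")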
Another way to present data over time is to create a video.
The following uses the same matrix data as Karolis' answer above:
library(av)
myData <- mapply(rnorm, 1000, 200, mean=seq(-50,50,0.5))
# create function that includes a for loop, the output from
# each iteration of the for loop will become one frame in
# the animation.
make_plot <- function(myData){
  xrange <- range(myData)
  for(i in seq_along(myData[1,])){
    # note: density()'s bandwidth argument is called bw
    d <- density(myData[,i], bw = 45)  # returns the density data
    plot(d,
         xlim = xrange,
         ylim = c(0, 0.003),
         main = paste("Density, day:", i))
  }
}
# create video
av_capture_graphics(make_plot(myData),
output = "Density change over time.mp4",
width = 720,
height = 480,
framerate = 120)
I have generated a Pearson similarity matrix and plotted the results using pheatmap (clustered with hclust, method = "complete"). I'd like to output the ordered matrix, but in R the default seems to be to just alphabetize everything.
Here is my code:
df <- cor(t(genes), method = "pearson")
pheatmap(df, clustering_method = "complete")
head(genes)
pre early mid late end
AAC1 2.0059007 3.64679740 3.0092533 2.4936171 2.2693034
AAC3 -1.6843969 -1.62572636 -0.7654462 -1.5827481 -1.6059080
AAD10 2.6012529 2.05759631 1.3665322 1.4590833 0.3778324
AAD14 0.5047704 0.76021375 0.1825944 0.6111774 0.1174208
AAD15 7.6017557 8.52315453 7.2605744 6.9029452 5.9028824
AAD16 1.2018193 -0.03285354 0.2229450 -0.1337033 0.2198542
This is what the current output (df) looks like:
    A    B    C    D
A   1    0.5  0.25 0.1
B   0.1  1    0.1  0.5
C   0.5  0.2  1    0.2
D   0    0.1  0.7  1
How can I output the similarity matrix as ordered by hclust?
I've looked, but I haven't been able to find anything that quite accomplishes what I need. Thanks in advance for your help!
(also sorry I don't know how to properly format everything yet)
EDIT: maybe some visuals would help. My clustered pheatmap output looks like this: ordered heatmap
I can see groups of genes that behave similarly, but because there are so many it's impossible/useless to read the labels. I want to find out which genes cluster together, but I can't output the ordered matrix.
When I plot the data without clustering it looks like this: unclustered heatmap
So the output/data I can get is pretty much useless for further analysis.
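For what it's worth, one possible approach (a sketch based on pheatmap's return value, not something from the original post): pheatmap invisibly returns the row and column hclust trees, and their order components can be used to reorder the matrix; cutree can then report which genes fall in the same cluster:
res <- pheatmap(df, clustering_method="complete")
df_ordered <- df[res$tree_row$order, res$tree_col$order]  # matrix in heatmap order
clusters <- cutree(res$tree_row, k=5)  # k=5 is an arbitrary example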
I have 2 lists with X,Y coordinates of points.
List 1 contains more points than list 2.
The task is to find pairs of points such that the overall Euclidean distance is minimized.
I have working code, but I don't know if this is the best way, and I would like hints on what I can improve, either in the result (a better algorithm for finding the minimum) or in speed, because the lists have about 2000 elements each.
The round in the sample vectors is there so that some points end up with identical distances.
The rdist function generates all pairwise distances in "distances". Then the minimum of the matrix is used to link two points ("dist_min"). All distances involving these two points are replaced by NA, and the loop continues searching for the next minimum until every point of list 2 has a partner from list 1.
At the end I have added a plot for visualization.
require(fields)
set.seed(1)
x1y1.data <- matrix(round(runif(200*2),2), ncol = 2) # generate 1st set of points
x2y2.data <- matrix(round(runif(100*2),2), ncol = 2) # generate 2nd set of points
distances <- rdist(x1y1.data, x2y2.data)
dist_min <- matrix(data=NA, nrow=ncol(distances), ncol=7) # prepare result matrix with 7 columns
for(i in 1:ncol(distances)) {
  inds <- which(distances == min(distances, na.rm=TRUE), arr.ind=TRUE)
  dist_min[i,1] <- inds[1,1]  # row of point (use 1st element of inds if points have the same distance)
  dist_min[i,2] <- inds[1,2]  # column of point (use 1st element of inds if points have the same distance)
  dist_min[i,3] <- distances[inds[1,1], inds[1,2]]  # distance of the pair
  dist_min[i,4] <- x1y1.data[inds[1,1],1]  # X coordinate of 1st point
  dist_min[i,5] <- x1y1.data[inds[1,1],2]  # Y coordinate of 1st point
  dist_min[i,6] <- x2y2.data[inds[1,2],1]  # X coordinate of 2nd point
  dist_min[i,7] <- x2y2.data[inds[1,2],2]  # Y coordinate of 2nd point
  distances[inds[1,1],] <- NA  # remove row (fill with NA) where the minimum was found
  distances[,inds[1,2]] <- NA  # remove column (fill with NA) where the minimum was found
}
# plot 1st set of points
# print mean distance as measure for optimization
plot(x1y1.data,col="blue",main="mean of min_distances",sub=mean(dist_min[,3],na.rm=TRUE))
points(x2y2.data,col="red") # plot 2nd set of points
segments(dist_min[,4],dist_min[,5],dist_min[,6],dist_min[,7]) # connect pairwise according found minimal distance
This is a fundamental problem in combinatorial optimization known as the assignment problem. One approach to solving it is the Hungarian algorithm, which is implemented in the R package clue:
require(clue)
distances <- rdist(x1y1.data, x2y2.data)  # recompute: the greedy loop above filled distances with NAs
sol <- solve_LSAP(t(distances))
We can verify that it outperforms the naive solution:
mean(dist_min[,3])
# [1] 0.05696033
mean(sqrt(
(x2y2.data[,1] - x1y1.data[sol, 1])^2 +
(x2y2.data[,2] - x1y1.data[sol, 2])^2))
#[1] 0.05194625
And we can construct a similar plot to the one in your question:
plot(x1y1.data,col="blue")
points(x2y2.data,col="red")
segments(x2y2.data[,1], x2y2.data[,2], x1y1.data[sol, 1], x1y1.data[sol, 2])