I have to make a heatmap from fold-change data. The values span a very wide range (for example, a few are between 500 and 1200, but the vast majority of them are below 2). From what I have read, I gather that I have to "scale" my heatmap, but I can't understand why or how it is done. I couldn't find any clear information about it.
To rephrase my question: I don't understand why we have to "scale" the data. Is a log2 transformation not enough?
Can somebody explain? Thanks
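For what it's worth, "scaling" a heatmap usually means a per-row z-score: after the log2 transform, each row is shifted to mean 0 and divided by its standard deviation (this is what R's heatmap() does with scale="row"). The point is that the colors then show the pattern within each row, instead of being dominated by the few rows with huge fold changes. A minimal sketch in Python/numpy, with made-up fold-change values:

```python
import numpy as np

# toy fold-change matrix: one row with huge values, one with small ones
fc = np.array([[1200.0, 500.0, 800.0],
               [1.8,    0.6,   1.2]])

log_fc = np.log2(fc)  # compresses the range, but rows still sit on very different scales

# row-wise z-score: each row ends up with mean 0 and standard deviation 1
mu = log_fc.mean(axis=1, keepdims=True)
sd = log_fc.std(axis=1, keepdims=True)
z = (log_fc - mu) / sd
```

After log2 alone, the first row would still saturate the color scale and the second row would look flat; after row scaling both rows use the full color range, which is why log2 by itself is often not enough.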
I am currently trying to create a B-spline surface from a point cloud with 100,000-plus points. I can't seem to find a way to get the B-spline definition: I don't have a way to access the knot vector or the control points. I see a NURBS surface is created, but all the weights seem to be 0, so it seems a bit useless to go to NURBS. If someone could explain how to find the actual B-spline definition, I would appreciate it. I tried:
fit.m_nurbs.m_knot
fit.m_nurbs.GetSpanVector
but nothing useful comes out of either.
What I'm currently doing is:
1. Train a GNN and see which graphs are labelled wrongly compared to the ground truth.
2. Use a GNN-explainer model on the wrongly labelled instances to find the minimal sub-graph responsible for each mislabeling.
3. Use graph_edit_distance from networkx to see how much these graphs differ from one another.
4. See if I can find clusters that help explain why the GNN might label some graphs wrongly.
Does this seem reasonable?
How would I go around step 4? Would I use something like sklearn_extra.cluster.KMedoids?
All help is appreciated!
Use graph_edit_distance from networkx to see how much these graphs differ from one another.
Guessing this gives you a single number for any pair of graphs.
The question is: in what direction does this number point? How many dimensions (directions) are there? Suppose two graphs are at the same distance from a third: does that mean the two graphs are close together, forming a cluster away from the third graph?
If you have answers to the questions in the previous paragraph, then the KMeans algorithm can find clusters in as many dimensions as you have. It is fast and easy to code, and usually gives satisfactory results. https://en.wikipedia.org/wiki/K-means_clustering
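As a concrete illustration with made-up 2-D points, scikit-learn's KMeans only needs the coordinate vectors and the number of clusters. One caveat: KMeans works on actual coordinates, not on a pairwise-distance matrix, so edit distances would first have to be embedded (e.g. with MDS) or handled by a medoid-based variant.

```python
import numpy as np
from sklearn.cluster import KMeans

# two made-up blobs in 2-D, far apart
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                    rng.normal(5.0, 0.1, size=(20, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = km.labels_  # one cluster id per point
```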
So I have a pretty simple question regarding the size of my data. I am trying to calculate the distances between all pairs of points (WGS84) in a dataset of 56,000 points.
https://www.rdocumentation.org/packages/geosphere/versions/1.5-10/topics/distm
According to the documentation for distm(x, y, fun): if y is missing, it is taken to be the same as x.
mydist<-distm(coordinates(mySpatialObject), fun="distHaversine")
This led me to an error saying that y was missing, so I figured I could easily work around it:
distm(coordinates(WeedClim.plot),coordinates(WeedClim.plot), fun="distHaversine")
This causes not just RStudio but my entire computer to freeze. I have had to do a hard reset twice now and do not want to go through that again, because this is my dissertation and I am afraid of breaking something else in the project XD.
Any ideas/solutions? Is there a better function that gives me a distance matrix from a set of coordinates?
THANKS!
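A likely reason the machine freezes: a full 56,000 x 56,000 matrix of doubles is 56,000^2 x 8 bytes, roughly 25 GB, far more than typical RAM, so the OS starts swapping. One workaround is to compute the matrix in row chunks and keep only what you actually need per chunk (here, the nearest-neighbour distance). A sketch of the idea in Python/numpy, using the haversine formula with an assumed mean Earth radius; the chunking loop translates directly back to R:

```python
import numpy as np

R_EARTH_M = 6371008.8  # assumed mean Earth radius in metres

def haversine_matrix(lonlat_a, lonlat_b):
    """Pairwise great-circle distances (metres) between two (n, 2) lon/lat arrays."""
    lon1, lat1 = np.radians(lonlat_a).T
    lon2, lat2 = np.radians(lonlat_b).T
    dlon = lon2[None, :] - lon1[:, None]
    dlat = lat2[None, :] - lat1[:, None]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat1)[:, None] * np.cos(lat2)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * R_EARTH_M * np.arcsin(np.sqrt(a))

def nearest_neighbour_dist(points, chunk=1000):
    """Nearest-neighbour distance per point, holding at most chunk x n values at once."""
    n = len(points)
    out = np.empty(n)
    for start in range(0, n, chunk):
        block = haversine_matrix(points[start:start + chunk], points)
        for i in range(block.shape[0]):
            block[i, start + i] = np.inf  # mask each point's distance to itself
        out[start:start + chunk] = block.min(axis=1)
    return out
```

With chunk=1000 the peak memory is about 1000 x 56,000 x 8 bytes, under half a gigabyte, instead of 25 GB.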
I hope this has not been asked before, but I am currently in the process of analyzing some microscopy pictures in R and I am not quite sure how to tackle this.
The situation is as follows:
- I have several pictures of different targets in cells which show spots of signal
- Some pictures show the same cells, but were acquired after others and are therefore a little "off" in the x-, y- and z-directions
- Some, but by no means all, of the pictures show colocalization, i.e. spots from one picture also show up in other pictures
Coming from the spot detection software, I now have data frames for all spots in each picture (one df per picture) with the x-, y- and z-coordinates.
I am now looking for
a) a way to align these matrices of spots from the different colors; I thought that cross-correlation of the matrices might be the way to go (however, is there 3D cross-correlation in R?)
b) a way to calculate the colocalization. As these are pictures and intrinsically noisy, even colocalized spots might have slightly different coordinates. Is there a function or package in R which merges these data based on a threshold or another parameter of my choice?
Thanks a lot in advance for all your answers!!
Simon
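For (b), threshold-based merging is a nearest-neighbour lookup: for every spot in one picture, find the closest spot in the other and accept the pair if it lies within the chosen tolerance; in R, RANN::nn2 or nabor::knn do this. A sketch of the logic in Python/scipy with a k-d tree (the coordinates and the 0.5-unit tolerance are made up):

```python
import numpy as np
from scipy.spatial import cKDTree

# made-up spot coordinates (x, y, z) from two pictures of the same cells
spots_a = np.array([[1.0, 2.0, 3.0],
                    [5.0, 5.0, 5.0],
                    [9.0, 1.0, 4.0]])
spots_b = np.array([[1.1, 2.1, 2.9],    # same spot as spots_a[0], slightly off
                    [7.0, 7.0, 7.0]])   # no counterpart in spots_a

tol = 0.5  # merge threshold, in the same units as the coordinates

tree = cKDTree(spots_b)
dist, idx = tree.query(spots_a, k=1, distance_upper_bound=tol)
# keep pairs (i in A, j in B) closer than tol; unmatched queries return inf distance
pairs = [(i, j) for i, (d, j) in enumerate(zip(dist, idx)) if np.isfinite(d)]
```

For (a), a constant x/y/z shift between acquisitions could be estimated first, e.g. as the median displacement of the matched pairs, and subtracted before re-running the matching.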
I have a huge number of multidimensional data points. The points basically look like this:
[1.5,3.7,1.95,1.23] one point
[2.56,3.78,4.3,2.9] another point
... and so on.
Sometimes the number of dimensions goes up to something like 20, and the number of points in this 20-D space can reach 10 million.
I have to bin these data points considering all dimensions as "dependent", so the coordinates of a point have to move together. I have done binning in one dimension, but although I have been racking my brains to come up with an algorithm, I haven't been successful so far in the multi-dimensional case.
I couldn't find any Java examples of multi-dimensional binning either. If anybody can give me an idea of how to tackle this in Java, that would be a great help.
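One way to bin jointly across all dimensions is to compute a bin index per dimension and use the resulting index tuple as a single key, so each point lands in exactly one d-dimensional cell. With 20 dimensions a dense 20-D array is impossible (even 10 bins per axis is 10^20 cells), so a hash map from occupied cells to counts is the practical structure. A sketch in Python; the same idea maps directly to Java with a HashMap<List<Integer>, Integer>, or a packed long key if bins_per_dim is small:

```python
import numpy as np
from collections import Counter

def bin_points(points, bins_per_dim):
    """Count points per occupied d-dimensional cell.

    Returns a Counter mapping an index tuple (one bin index per dimension)
    to the number of points that fall into that cell.
    """
    pts = np.asarray(points, dtype=float)
    lo = pts.min(axis=0)
    hi = pts.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat dimensions
    idx = ((pts - lo) / span * bins_per_dim).astype(int)
    idx = np.clip(idx, 0, bins_per_dim - 1)  # points at the maximum land in the last bin
    return Counter(map(tuple, idx))

cells = bin_points([[1.5, 3.7, 1.95, 1.23],
                    [2.56, 3.78, 4.3, 2.9],
                    [1.6, 3.7, 2.0, 1.3]], bins_per_dim=4)
```

This is a single O(n * d) pass, so 10 million points in 20 dimensions is well within reach; only the occupied cells consume memory.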