R igraph: Scaling node size

I currently use the following script to create a plot for betweenness centrality:
plot(g,
     rescale = FALSE,
     edge.color = edge_color,
     edge.width = E(g)$Weight * 0.5,
     vertex.size = degree(g) * 0.5,
     main = "Degree Centrality"
)
As you can see, I currently use a simple multiplier to adjust vertex.size. As some nodes are really big and some seem too small, I would like to set a range with a minimum and maximum size. Of course, that range should consider degree(g).
Is that somehow possible?
Note: attempts with scale(degree(g), 5, 15) or similar did not work: "Error in symbols(x = coords[, 1], y = coords[, 2], bg = vertex.color, : invalid symbol parameter"

To rescale numbers x with a domain of (a, b) to a range of (c, d), you need a rescaling function like:
rescale <- function(x, a, b, c, d) { c + (x - a) / (b - a) * (d - c) }
So if you have degrees ranging from 0 to 200 and want your vertex sizes to range from 1 to 5 units, specify the vertex size with:
rescale(degree(g), 0, 200, 1, 5)
This is just a simple linear transformation; you might want something non-linear for better visuals.
You might find a ready-made rescaling function in a package (for example, rescale() in the scales package), but it's not what base R's scale() does, which is why your attempt produced an error.
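For example, a minimal sketch of applying this to your plot (the sample_gnp() graph is only a stand-in for your g, and the 5-15 size range is an arbitrary choice):
library(igraph)
g <- sample_gnp(50, 0.1)   # hypothetical graph; use your own g here
deg <- degree(g)
rescale <- function(x, a, b, c, d) { c + (x - a) / (b - a) * (d - c) }
plot(g,
     vertex.size = rescale(deg, min(deg), max(deg), 5, 15),
     main = "Degree Centrality")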

Related

How to circle variable to observed (not latent) variables in dagitty plot

How would I put a circle around certain variables in the following plot?
library(dagitty)
g = dagitty('dag{
A [pos="-1,0.5"]
W [pos="0.893,-0.422"]
X [adjusted,pos="0,-0.5"]
Y [pos="1,0.5"]
A -> Y
X -> A
X -> W
X -> Y
}')
png("mp.png", width = 500, height = 500,res=300)
plot(g)
dev.off()
In the web-based tool you can mark a variable as, e.g., latent or adjusted and it changes the color of the circle, but that is not quite what I am looking for. If it were possible to get those styles into the plot from R, that would be sufficient, although I don't really like how the variable name sits next to the circle in the web-based version. What I really want is to circle the observed variables and leave the unobserved ones uncircled.
I wrote a function which takes the points you want to circle as input, extracts the position of said points and circles them.
library(dagitty)
g = dagitty('dag{
A [pos="-1,0.5"]
W [pos="0.893,-0.422"]
X [adjusted,pos="0,-0.5"]
Y [pos="1,0.5"]
A -> Y
X -> A
X -> W
X -> Y
}')
circle_points <- function(points_to_circle, g) {
  # a few regexes to extract the points and the positions from "g";
  # can surely be optimized, made nicer and more robust, but it works for now
  fsplit <- strsplit(g[1], "\\]")[[1]]
  fsplit <- fsplit[-length(fsplit)]
  fsplit <- substr(fsplit, 1, nchar(fsplit) - 1)
  fsplit[1] <- substr(fsplit[1], 6, nchar(fsplit))
  vars <- sapply(regmatches(fsplit,
                            regexec("\\\n(.*?)\\s*\\[", fsplit)), "[", 2)
  pos <- sub(".*pos=\\\"", "", fsplit)
  # build a data frame with the extracted information
  res_df <- data.frame(vars = vars,
                       posx = sapply(strsplit(pos, ","), "[", 1),
                       posy = sapply(strsplit(pos, ","), "[", 2))
  df_to_circle <- res_df[res_df$vars %in% points_to_circle, ]
  # the y-position seems to be inverted and has to be multiplied by -1
  points(c(as.numeric(df_to_circle$posx)),
         c(as.numeric(df_to_circle$posy) * -1),
         cex = 4)
}
plot(g)
circle_points(c("A", "Y"), g)
This results in a plot with circles drawn around the chosen nodes (here A and Y).
You can of course work with the cex parameter, adding colors etc. It seems that the positioning of the circles is a bit off-centered so maybe manipulate the x and y positions in circle_points by a slim margin.
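For example, a hypothetical tweak to the points() call inside circle_points (the offset values are guesses you would tune by eye):
points(as.numeric(df_to_circle$posx) + 0.02,        # nudge right slightly
       as.numeric(df_to_circle$posy) * -1 - 0.02,   # nudge down slightly
       cex = 4, col = "red", lwd = 2)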
I did not find any way to do this in dagitty, but the bnlearn package can add a circle or other shape easily. However, I just noticed that you only want to add circles to observed traits rather than latent variables (better to mention this in your title), so my code might not be what you are looking for. I still attach it here for your reference. Alternatively, you can distinguish observed and latent traits by color, which can easily be done with bnlearn (https://www.bnlearn.com/examples/graphviz-plot/).
library(bnlearn)
tree = model2network("[X][W|X][A|X][Y|A:X]")
graphviz.plot(tree, main = "DAG structure", shape = "circle",
              layout = "circo")

In R, how do I count the number of data points on a scatter plot within a cell of custom dimensions?

Let's just say I have the following scatterplot:
set.seed(665544)
n <- 100
x <- cbind(
  x = runif(10, 0, 5) + rnorm(n, sd = 0.4),
  y = runif(10, 0, 5) + rnorm(n, sd = 0.4)
)
plot(x)
I want to divide this scatterplot into square cells of a specified size and then count how many points fall into each unique cell. This will essentially give me the local density value of that cell. What is the best way of doing this? Is there an R package that can help? Perhaps a 2D histogram method like in Matlab?
Quick clarifications:
1.) I'd like the function/method to take the following 3 arguments: dimensions of total area, dimensions of cell (OR number of cells), and the data. It would then perhaps output a matrix where each value corresponds to a cell's point count.
2.) Q: Why do you want to use this method to determine local density? Isn't this much easier:
library(dbscan)
pointdensity(x, eps = .1, type = "frequency")
A: This method calculates the local density around each point. Though easy, this definition of local density then makes it very difficult (optimization algorithms necessary) to assign new data in a way that it matches the local density distribution of the original data set.
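A minimal base-R sketch of this kind of cell count (the xlim, ylim, and cell_size values are assumptions you would replace with your own total area and cell dimensions) could bin both coordinates with cut() and cross-tabulate:
count_cells <- function(data, xlim, ylim, cell_size) {
  xbreaks <- seq(xlim[1], xlim[2], by = cell_size)
  ybreaks <- seq(ylim[1], ylim[2], by = cell_size)
  xs <- cut(data[, 1], breaks = xbreaks, include.lowest = TRUE)
  ys <- cut(data[, 2], breaks = ybreaks, include.lowest = TRUE)
  table(xs, ys)   # matrix of per-cell point counts
}
count_cells(x, xlim = c(-2, 7), ylim = c(-2, 7), cell_size = 0.5)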

How to create exponential graph

How can I make an x-axis that doubles for every increment? I want equal distances between 0, 128, 256, 512, 1024 and 2048. How can I do that?
I'm trying to plot points from a benchmark where I measured time and doubled the memory size every increment.
You can cheat and plot with a linear axis, like from 1 up to as many numbers as you desire, then change the labels when you're done. You can use the 'xtick' property to set what horizontal tick values on your graph remain and the 'xticklabel' property to change the labels to your desired values.
labels = [0 128 256 512 1024 2048]; % Provide your labels here
x = 1 : numel(labels);
y = rand(1, numel(x)); % Insert your data here
plot(x, y, 'b.'); % Plot your data
set(gca, 'xtick', x); % Change the x-axis so only the right amount of ticks remain
set(gca, 'xticklabel', labels) % Change the labels to the desired ones
I get the following graph; note that the data I'm plotting is completely random, as I don't have your data, but it demonstrates what the changed plot looks like.
For more properties that you can change on your graph, see the Axes Properties page on the Octave docs.
With apologies to Rayryeng, I'm essentially proposing the same method at heart, but I felt it was missing important info, such as how to convert the axis itself to equally spaced intervals in the first place without messing with the data. So here's a complete solution for example data X vs Y, producing the equivalent of semilogx for base 2.
Y = 1 : 10;
X = 2 .^ Y;
XTicks = log2(X);
XTickLabels = {};
for XTick = XTicks
  XTickLabels{end+1} = sprintf('2^{%d}', XTick);
end
plot(log2(X), Y);
set(gca, 'xtick', XTicks, 'xticklabel', XTickLabels);
Note that if you plan to 'superimpose' another plot on top of this, you'll have to take into account that the actual values in the X axis are essentially "1, 2, 3, ... 10", so either "log-ify" the new plot's X-axis values too, before superimposing via hold on, or plot onto another, independent set of axes entirely and place them in the same position.
Note: I have assumed that you're after a base-2 logarithmic x-axis. If you actually want the 0-128 interval to be the same as the 128-256 interval, then modify as per Rayryeng's answer, or even better, use a more appropriate graph, like a bar graph (i.e. with the powers of two used purely as descriptive labels for each column).
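For an R equivalent of the same trick (the runif() data is just a placeholder for your benchmark timings), you can suppress the default axis and relabel it:
labels <- c(0, 128, 256, 512, 1024, 2048)   # desired equally spaced tick labels
x <- seq_along(labels)                      # plot against 1, 2, ..., 6 instead
y <- runif(length(x))                       # placeholder for your timing data
plot(x, y, type = "b", xaxt = "n", xlab = "memory size", ylab = "time")
axis(1, at = x, labels = labels)            # relabel the linear axis with the real sizes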

Find scatterplot area where ~50% of points have one of 2 values

I have a data frame that has 3 values for each point in the form: (x, y, boolean). I'd like to find an area bounded by values of (x, y) where roughly half the points in the area are TRUE and half are FALSE.
I can scatterplot the data, coloring by the third value of each point, and I get a general idea, but I was wondering if there is a better way. I understand that if you take a small enough area containing only two points, one TRUE and one FALSE, then you trivially have 50/50, so I was thinking there has to be a better way of deciding what size of area to look for.
Visually I see this as drawing a square on the scatter plot and moving it around along the x and y axes, each time checking the number of TRUE and FALSE points in the area, but is there a way to determine a good size for the area based on the values?
Thanks
EDIT: G5W's answer is a step in the right direction, but based on their scatterplot, I'm looking to create a square/rectangle area in which roughly half the points are green and half are red. I understand that there are potentially infinitely many such areas, but I'm thinking there might be a good way to determine an optimal size for the area (maybe it should contain at least a certain percentage of the points, or something).
Note update below
You do not provide any sample data, so I have created some bogus data like this:
TestData = data.frame(x = c(rnorm(100, -1, 1), rnorm(100, 1, 1)),
                      y = c(rnorm(100, -1, 1), rnorm(100, 1, 1)),
                      z = rep(c(TRUE, FALSE), each = 100))
I think that what you want is how much area is taken up by each of the TRUE and FALSE points. A way to interpret that task is to find the convex hull for each group and take its area. That is, find the minimum convex polygon that contains a group. The function chull will compute the convex hull of a set of points.
plot(TestData[,1:2], pch=20, col=as.numeric(TestData$z)+2)
CH1 = chull(TestData[TestData$z,1:2])
CH2 = chull(TestData[!TestData$z,1:2])
polygon(TestData[which(TestData$z)[CH1],1:2], lty=2, col="#00FF0011")
polygon(TestData[which(!TestData$z)[CH2],1:2], lty=2, col="#FF000011")
Once you have the polygons, the polyarea function from the pracma package will compute the area. Note that it computes a "signed" area so you either need to be careful about which direction you traverse the polygon or take the absolute value of the area.
library(pracma)
abs(polyarea(TestData[which(TestData$z)[CH1], 1],
             TestData[which(TestData$z)[CH1], 2]))
[1] 16.48692
abs(polyarea(TestData[which(!TestData$z)[CH2], 1],
             TestData[which(!TestData$z)[CH2], 2]))
[1] 15.17897
Update
This is a completely different answer based on the updated question. I am leaving the old answer because the question now refers to it.
The question now gives a little more information about the data ("There are about twice as many FALSE than TRUE") so I have made an updated bogus data set to reflect that.
set.seed(2017)
TestData = data.frame(x = c(rnorm(100, -1, 1), rnorm(200, 1, 1)),
                      y = c(rnorm(100, 1, 1), rnorm(200, -1, 1)),
                      z = rep(c(TRUE, FALSE), c(100, 200)))
The problem is now to find regions where the density of TRUE and FALSE are approximately equal. The question asked for a rectangular region, but at least for this data, that will be difficult. We can get a good visualization to see why.
We can use the function kde2d from the MASS package to get the 2-dimensional density of the TRUE points and the FALSE points. If we take the difference of these two densities, we need only find the regions where the difference is near zero. Once we have this difference in density, we can visualize it with a contour plot.
library(MASS)
Grid1 = kde2d(TestData$x[TestData$z], TestData$y[TestData$z],
              lims = c(c(-3, 3), c(-3, 3)))
Grid2 = kde2d(TestData$x[!TestData$z], TestData$y[!TestData$z],
              lims = c(c(-3, 3), c(-3, 3)))
GridDiff = Grid1
GridDiff$z = Grid1$z - Grid2$z
filled.contour(GridDiff, color = terrain.colors)
In the plot it is easy to see the place where there are far more TRUE than FALSE near (-1,1), and where there are more FALSE than TRUE near (1,-1). We can also see that the places where the difference in density is near zero lie in a narrow band in the general area of the line y=x. You might be able to get a box where a region with more TRUEs is balanced by a region with more FALSEs, but the regions where the densities are roughly equal are small.
Of course, this is for my bogus data set which probably bears little relation to your real data. You could perform the same sort of analysis on your data and maybe you will be luckier with a bigger region of near equal densities.
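To turn that visual check into numbers, one rough follow-up (a sketch; the tolerance value is arbitrary and would need tuning) is to flag the grid cells whose density difference is near zero and read off their coordinates:
tol <- 0.005                                   # arbitrary tolerance, tune for your data
near_zero <- which(abs(GridDiff$z) < tol, arr.ind = TRUE)
balanced <- data.frame(x = GridDiff$x[near_zero[, 1]],
                       y = GridDiff$y[near_zero[, 2]])
head(balanced)                                 # grid locations with roughly equal densities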

Looks like a simple graphing problem

At present I have a control to which I need to add the facility to apply various acuteness (or sensitivity). The problem is best illustrated as an image:
(Graph: http://img87.imageshack.us/img87/7886/control.png)
As you can see, I have X and Y axes that both have arbitrary limits of 100; that should suffice for this explanation. At present, my control follows the red line (linear behaviour), but I would like to add the ability to use the other 3 curves (or more), i.e. if a control is more sensitive, then a setting will ignore the linear behaviour and use one of the other lines instead. The starting point will always be 0, and the end point will always be 100.
I know that an exponential is too steep, but I can't seem to figure out a way forward. Any suggestions, please?
The curves you have illustrated look a lot like gamma correction curves. The idea there is that the minimum and maximum of the range stays the same as the input, but the middle is bent like you have in your graphs (which I might note is not the circular arc which you would get from the cosine implementation).
Graphically, it looks like the familiar family of gamma-correction curves (image source: wikimedia.org).
So, with that as the inspiration, here's the math...
If your x values ranged from 0 to 1, the function is rather simple:
y = f(x, gamma) = x ^ gamma
Add an xmax value for scaling (i.e. x = 0 to 100), and the function becomes:
y = f(x, gamma) = ((x / xmax) ^ gamma) * xmax
or alternatively:
y = f(x, gamma) = (x ^ gamma) / (xmax ^ (gamma - 1))
You can take this a step further if you want to add a non-zero xmin.
When gamma is 1, the result is always perfectly linear (y = x). If gamma is less than 1, your curve bends upward; if gamma is greater than 1, your curve bends downward. The reciprocal value of gamma converts the value back to the original (x = f(y, 1/g) = f(f(x, g), 1/g)).
Just adjust the value of gamma according to your own taste and application needs. Since you're wanting to give the user multiple options for "sensitivity enhancement", you may want to give your users choices on a linear scale, say ranging from -4 (least sensitive) to 0 (no change) to 4 (most sensitive), and scale your internal gamma values with a power function. In other words, give the user choices of (-4, -3, -2, -1, 0, 1, 2, 3, 4), but translate that to gamma values of (5.06, 3.38, 2.25, 1.50, 1.00, 0.67, 0.44, 0.30, 0.20).
Coding that in C# might look something like this:
public class SensitivityAdjuster {
    public SensitivityAdjuster() { }

    public SensitivityAdjuster(int level) {
        SetSensitivityLevel(level);
    }

    private double _Gamma = 1.0;

    public void SetSensitivityLevel(int level) {
        _Gamma = Math.Pow(1.5, level);
    }

    public double Adjust(double x) {
        return (Math.Pow((x / 100), _Gamma) * 100);
    }
}
To use it, create a new SensitivityAdjuster, set the sensitivity level according to user preferences (either using the constructor or the method; -4 to 4 would probably be reasonable level values), and call Adjust(x) to get the adjusted output value. If you wanted a wider or narrower range of reasonable levels, you would reduce or increase that 1.5 value in the SetSensitivityLevel method. And of course the 100 represents your maximum x value.
I propose a simple formula, that (I believe) captures your requirement. In order to have a full "quarter circle", which is your extreme case, you would use (1-cos((x*pi)/(2*100)))*100.
What I suggest is that you take a weighted average between y=x and y=(1-cos((x*pi)/(2*100)))*100. For example, to have very close to linear (99% linear), take:
y = 0.99*x + 0.01*[(1-cos((x*pi)/(2*100)))*100]
Or more generally, say the level of linearity is L, and it's in the interval [0, 1], your formula will be:
y = L*x + (1-L)*[(1-cos((x*pi)/(2*100)))*100]
EDIT: I changed cos(x/100) to cos((x*pi)/(2*100)), because for the cos result to cover the range [1,0] its argument should be in the range [0,pi/2] and not [0,1]; sorry for the initial mistake.
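A quick R sketch of this blend (assuming an x range of 0 to 100 and a few example values of L) makes the effect easy to see:
blend <- function(x, L, xmax = 100) {
  L * x + (1 - L) * ((1 - cos((x * pi) / (2 * xmax))) * xmax)
}
x <- 0:100
plot(x, blend(x, 1), type = "l", ylab = "y")   # L = 1: purely linear
lines(x, blend(x, 0.5), lty = 2)               # half linear, half curved
lines(x, blend(x, 0), lty = 3)                 # L = 0: the full "quarter circle" curve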
You're probably looking for something like polynomial interpolation. A quadratic/cubic/quartic interpolation ought to give you the sorts of curves you show in the question. The differences between the three curves you show could probably be achieved just by adjusting the coefficients (which indirectly determine steepness).
The graph of y = x^p for x from 0 to 1 will do what you want as you vary p from 1 (which will give the red line) upwards. As p increases the curve will be 'pushed in' more and more. p doesn't have to be an integer.
(You'll have to scale to get 0 to 100 but I'm sure you can work that out)
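For completeness, the scaled version in R might look like this (p = 2 is an arbitrary choice; p = 1 reproduces the red line):
p <- 2
curve((x / 100)^p * 100, from = 0, to = 100,
      xlab = "input", ylab = "output")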
I vote for Rax Olgud's general idea, with one modification:
y = alpha * x + (1-alpha)*(f(x/100)*100)
where f(0) = 0, f(1) = 1, f(x) is superlinear, but I don't know where this "quarter circle" idea came from or why 1-cos(x) would be a good choice.
I'd suggest f(x) = x^k where k = 2, 3, 4, 5, whatever gives you the desired degree of steepness for alpha = 0. Pick a value for k as a fixed number, then vary alpha to choose your particular curve.
For problems like this, I will often get a few points from a curve and throw it through a curve fitting program. There are a bunch of them out there. Here's one with a 7-day free trial.
I've learned a lot by trying different models. Often you can get a pretty simple expression to come close to your curve.
