How to create a cumulative graph in R

Is there a cumulative graph package in R? If not, how might I create a cumulative graph in R?
For example, given the values 2, 4, 2, 2, the values should be plotted as 2, 6, 8, 10 (as in the linked d3 example), forming a stair-step pattern as in other cumulative records.

As per ?plot.default, there is a stair-steps plot type (type = "s"), which can be combined with cumsum() to give the result you want, I think:
plot(cumsum(c(2,4,2,2)), type="s")
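
A slightly fuller sketch of the same idea, for reference. The variable names `values` and `cumulative`, the axis labels and the added points are illustrative choices, not part of the question:

```r
# Raw values from the question
values <- c(2, 4, 2, 2)

# Running total of the values
cumulative <- cumsum(values)

# type = "s" draws stair steps that move horizontally first, then vertically;
# type = "S" moves vertically first
plot(seq_along(cumulative), cumulative, type = "s",
     xlab = "Observation", ylab = "Cumulative total")

# Mark the actual cumulative values on top of the steps
points(seq_along(cumulative), cumulative, pch = 16)
```

Switching to type = "S" is worth trying if the step should rise at the previous observation rather than at the current one.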

Related

umap for dictionary in Julia

I'm given a dictionary with keys (ids) and values:
> Dict{Int64, Vector{Float64}} with 122 entries:
3828 => [1, 2, 3, 4...
2672 => [6,7,5,8...
...
Now I need to apply UMAP to it. I have this code:
embedding = umap(mat, 2; n_neighbors=15, min_dist=0.001, n_epochs=200)
println(size(embedding))
Plots.scatter(embedding[1,:],embedding[2,:])
Here mat is the matrix
1, 2, 3, 4
6, 7, 5, 8
....
So I got the embedding matrix and the UMAP plot, but in the plot all points are the same color and there are no labels. How can I get points labeled with the keys of the dictionary?
Looking at UMAP.jl, the input matrix should have the shape (n_features x n_samples). If each entry in your dictionary is a sample and I’m interpreting your matrix notation correctly, it appears you have this reversed.
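You should be able to add the keys of the dictionary as annotations to the plot as follows (potentially with an optional additional offset to each coordinate). This assumes the columns of the embedding are in the same order as keys(yourdict), which holds if mat was built by iterating over the dictionary without modifying it in between: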
Plots.annotate!(
embedding[1,:] .+ x_offset,
embedding[2,:] .+ y_offset,
string.(collect(keys(yourdict)))
)
Finally, I’m not sure what variable you actually want to map to the color of the markers in the scatterplot. If it’s the integer value of the keys you should pass this to the scatter function just like above except without turning them into strings.

I can't figure out how to find the Euclidean distance within a single dataframe

Here is the data I am working with:
rdata <- data.frame(x = c(1, 2, 3, 4, 5), y=c(10, 12, 15, 19, 24))
ext <- rdist::rdist(rdata)
View(ext)
When I run this code, I receive the following table:
As you can see from my markings, the correct Euclidean distances appear along a diagonal, but the table also contains a lot of extraneous data. What can I do to put the correct distances into a single column? I want the distances between (1, 10) and (2, 12), between (2, 12) and (3, 15), and so on.
Thanks in advance.
The matrix you get is giving you the distance between all pairs of rows. It seems like you just want the distance between one row and the next row? I would recommend calculating it directly, that way you don't bother calculating the unneeded pairs. On data of any substantial size, this will make a big difference in speed and memory.
n = nrow(rdata)
result = sqrt(rowSums((rdata[-1, ] - rdata[-n, ])^2))
result
# 2 3 4 5
# 2.236068 3.162278 4.123106 5.099020
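
As a cross-check, here is a minimal sketch of the same calculation written with diff(), compared against the entries just below the diagonal of the full distance matrix. It uses base R's dist() rather than rdist::rdist, and the names `step_dist`, `full` and `sub_diag` are illustrative:

```r
rdata <- data.frame(x = c(1, 2, 3, 4, 5), y = c(10, 12, 15, 19, 24))
n <- nrow(rdata)

# diff() on a matrix works column by column, giving row i+1 minus row i
step_dist <- sqrt(rowSums(diff(as.matrix(rdata))^2))

# The same values sit just below the diagonal of the full distance matrix
full <- as.matrix(dist(rdata))
sub_diag <- full[cbind(2:n, 1:(n - 1))]

# TRUE if both approaches agree
all.equal(unname(step_dist), unname(sub_diag))
```

The diff() version never builds the n-by-n matrix, so it keeps the speed and memory advantage described above.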

Plot kernel density estimation with the kernels over the individual observations in R

To keep things short, what I want to achieve is a plot like the one on the right:
I would like to obtain a standard KDE plot with its individual kernels plotted over the observations.
The best solution would be the one that considers all the different kernel functions (e.g. rectangular, triangular etc).
After reading this answer, I managed to come up with a solution.
# Create some input data
x <- c(19, 20, 10, 17, 16, 13, 16, 10, 7, 18)
# Calculate the KDE
kde <- density(x, kernel = "gaussian", bw = bw.SJ(x) * 0.2)
# Calculate the single kernels (PDFs) that make up the KDE, one per observation,
# and scale each by 1/n so that their areas add up to the area of the KDE
A.kernel <- lapply(x, function(xi) {
  k <- density(xi, kernel = "gaussian", bw = kde$bw)
  k$y <- k$y / length(x)
  k
})
# Plot everything together on the same scale
plot(kde)
rug(x, col = 2, lwd = 2.5)
invisible(lapply(A.kernel, function(k) lines(k, col = "red")))
The result is the KDE curve with the scaled kernel of each observation drawn in red beneath it.
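
To cover the other kernel types mentioned in the question, the same steps can be wrapped in a function that takes the kernel name as a parameter. This is only a sketch: `plot_kde_with_kernels` is a hypothetical helper name, and the default bandwidth simply reuses the bw.SJ(x) * 0.2 choice from the code above.

```r
# Hypothetical helper: KDE plus the individual kernels for any density() kernel
plot_kde_with_kernels <- function(x, kernel = "gaussian", bw = bw.SJ(x) * 0.2) {
  kde <- density(x, kernel = kernel, bw = bw)

  # One kernel per observation, scaled by 1/n
  kernels <- lapply(x, function(xi) {
    k <- density(xi, kernel = kernel, bw = kde$bw)
    k$y <- k$y / length(x)
    k
  })

  plot(kde, main = paste("KDE with", kernel, "kernels"))
  rug(x, col = 2, lwd = 2.5)
  for (k in kernels) lines(k, col = "red")

  # Return the pieces invisibly so they can be inspected afterwards
  invisible(list(kde = kde, kernels = kernels))
}

x <- c(19, 20, 10, 17, 16, 13, 16, 10, 7, 18)
res <- plot_kde_with_kernels(x, kernel = "triangular")
```

Any kernel name accepted by density() can be passed, for example "rectangular" or "epanechnikov".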

R : Loop contract.vertices to calculate network measures for groups in a social network in Igraph

I am trying to calculate different network measures such as betweenness() and constraint() in my network using igraph in R. My problem is that I am not looking at individuals but at groups of individuals in my network. Therefore I have to contract the vertices before I calculate the different network measures. So far I have been able to write basic code to calculate the measures. But I have a total of about 900 groups (with up to 7 members per group) in a network of about 70,000 nodes and 250,000 edges, so I am trying to create a loop to automate the approach and make life a little bit easier.
Here is my approach for calculating constraint().
# load package
library(igraph)
# load data and create a weighted edgelist
df <- data.frame(from=c(6, 9, 10, 1, 7, 8, 8, 4, 5, 2, 5, 10), to=c(3, 4, 2, 5, 10, 1, 9, 10, 6, 9, 3, 6), weight=c(4, 2, 1, 2, 3, 3, 1, 1, 4, 5, 2, 2))
g <- graph.data.frame(df, directed =FALSE)
#import groups
groups <- "
1 5 8
2
10 7 "
subv <- read.table(text = groups, fill = TRUE, header = FALSE)
I would like to loop over the following code so that I do not have to calculate each constraint() separately, but can calculate it for all three groups in the reproducible example at once.
#create a subvector of the first group and delete all the NA entries
subv1 <- c(as.numeric(as.vector(subv[1,])))
subv1 <- subv1[!is.na(subv1)]
#save subvector as character
subv1 <- as.character(subv1)
#create subgraph with the nodes of group 1 from graph and add their 1st neighbors
g2 <- induced.subgraph(graph=g ,vids=unlist(neighborhood(graph=g ,order=1, nodes = subv1)))
#identify the igraph IDs of the nodes in the first group
match("1", V(g2)$name)
match("5", V(g2)$name)
match("8", V(g2)$name)
#create a contract vector and contract the vertices from largest to smallest using the output from match
convec1 <- c(1:(5-1), 3, 5:(vcount(g2)-1))
g3 <- contract.vertices(g2, convec1, vertex.attr.comb=toString)
convec2 <- c(1:(4-1), 3, 4:(vcount(g3)-1))
g4 <- contract.vertices(g3, convec2, vertex.attr.comb=toString)
#remove the selfloops and sum the weight attributes for the created graph
g5 <- simplify(g4, remove.loops = TRUE, edge.attr.comb=list(weight="sum"))
# calculate the constraint measure for the vertex 1, 5, 8
constraint(g5, nodes=3, weights=NULL)
So now I have the constraint measure for the first group. For the second and third groups I would have to repeat these steps. This would be feasible, but as I stated, I have 900 groups. Is there any way to loop this?
Please let me know if the given example is unclear, as I am new to R and Stack Overflow.
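
One possible way to automate the steps above is sketched below. It is not a confirmed answer to the question: `group_constraint` is a hypothetical helper, it uses the current igraph function names (graph_from_data_frame, induced_subgraph, contract) in place of the older aliases in the question, and it merges all members of a group in a single contract() call instead of contracting pair by pair.

```r
library(igraph)

df <- data.frame(from = c(6, 9, 10, 1, 7, 8, 8, 4, 5, 2, 5, 10),
                 to = c(3, 4, 2, 5, 10, 1, 9, 10, 6, 9, 3, 6),
                 weight = c(4, 2, 1, 2, 3, 3, 1, 1, 4, 5, 2, 2))
g <- graph_from_data_frame(df, directed = FALSE)

groups <- "
1 5 8
2
10 7 "
subv <- read.table(text = groups, fill = TRUE, header = FALSE)

# Hypothetical helper: constraint of one group after merging its members
group_constraint <- function(g, members) {
  members <- as.character(members[!is.na(members)])
  # Group members plus their first-order neighbours
  keep <- unique(unlist(neighborhood(g, order = 1, nodes = members)))
  sub <- induced_subgraph(g, keep)
  # Map every member to new vertex 1 and every other vertex to its own new id
  is_member <- V(sub)$name %in% members
  mapping <- integer(vcount(sub))
  mapping[is_member] <- 1L
  mapping[!is_member] <- seq_len(sum(!is_member)) + 1L
  merged <- contract(sub, mapping, vertex.attr.comb = toString)
  # Drop self-loops and sum the weights of parallel edges
  merged <- simplify(merged, remove.loops = TRUE,
                     edge.attr.comb = list(weight = "sum"))
  constraint(merged, nodes = 1)
}

# One constraint value per row (group) of subv
result <- apply(subv, 1, function(row) group_constraint(g, row))
result
```

For the first group it would be worth comparing the value with the manual result from the code above before running it on all 900 groups.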

Represent the frequency line in a histogram using freq=TRUE

I have the following piece of code in R:
w=rbeta(365,1,3,ncp=0)
hist(10*w,breaks=25,freq=TRUE,xlim=c(0,10), ylim=c(0,60))
h=seq(0,1,0.05)
So far so good.
What I want to do now is add a line representing the beta density with parameters alpha=1, beta=3 (as in the rbeta call above), scaled to frequency rather than density. The total number of draws from rbeta is 365 (the days in a year), and I multiply w by 10 because the variable I am studying can take values in [0,10] each day, following the beta distribution described above.
What do I have to do to draw this line?
In summary, the histogram is based on simulated values, and I want to show how the theoretical beta distribution would have behaved in comparison to the simulation.
If you want them to match up, you're going to want to match up the area under the curves of the histogram and the density plot. That should put them on the same scale. One way to do that would be
set.seed(15) #to make it reproducible
w <- rbeta(365, 1, 3, ncp=0)
hh <- hist(w*10, breaks=25, freq=TRUE, xlim=c(0,10), ylim=c(0,60))
ss <- sum(diff(hh$breaks)*hh$counts)
curve(dbeta(x/10, 1, 3, ncp=0)*ss/10, add=TRUE)
This overlays the theoretical curve on the histogram, on the same count scale.
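
To see why this scaling works, note that with equal-width bins the area of a count histogram is n times the bin width, so multiplying a density by that area turns it into expected counts per bin. The sketch below checks this numerically; the names `area`, `curve_counts` and `expected_counts` are illustrative, and the comparison with pbeta() is an added check rather than part of the answer above.

```r
set.seed(15)
n <- 365
w <- rbeta(n, 1, 3)

# Same binning as above, without drawing the plot
hh <- hist(w * 10, breaks = 25, plot = FALSE)
bin_width <- diff(hh$breaks)[1]

# Area of the count histogram; equals n * bin_width for equal-width bins
area <- sum(diff(hh$breaks) * hh$counts)

# Height of the rescaled curve at the bin midpoints:
# the density of 10 * w is dbeta(x / 10, 1, 3) / 10
curve_counts <- dbeta(hh$mids / 10, 1, 3) / 10 * area

# Exact expected count per bin from the beta CDF
expected_counts <- n * diff(pbeta(hh$breaks / 10, 1, 3))

round(cbind(observed = hh$counts,
            curve = curve_counts,
            expected = expected_counts), 1)
```

The curve and expected columns should be nearly identical, while the observed column scatters around them because of sampling noise.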
