Histogram of sum instead of frequency - R

I want to plot a histogram where the y-axis represents the sum of a column.
I found this example for categorical data:
R histogram that sums rather than frequency.
However, this is not what I am looking for, as it does not apply to continuous data, where I would have to define the bins.
Let's say I have x and y:
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1),
x = rpois(100, 15) * 10)
A traditional histogram will be like:
hist (mydata$x)
Now, how can I get the sum of y within each bin on the y-axis?

This is one way to solve this problem that leverages the hist() function for most of the heavy lifting, and has the advantage that the barplot of the cumulative sum of y matches the bins and dimensions of the histogram of x:
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1), x = rpois(100, 15) * 10)
mx <- mydata$x
my <- mydata$y
h <- hist(mydata$x)
breaks <- data.frame(
"beg"=h$breaks[-length(h$breaks)],
"end"=h$breaks[-1]
)
sums <- apply(breaks, MARGIN=1, FUN=function(x) { sum(my[ mx >= x[1] & mx < x[2] ]) })
h$counts <- sums
plot(h, ylab="Sum", main="Sum of y Within x Bins")
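One detail worth checking in the approach above: by default hist() uses right-closed bins, (a, b], with the lowest value included in the first bin, whereas the comparison `mx >= x[1] & mx < x[2]` treats bins as [a, b). Values that fall exactly on a break could then be summed into a different bin than the one hist() counted them in. A minimal sketch of one way to keep the two consistent, assuming the default hist() settings, is to let cut() do the binning with the same breaks:

```r
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)

# Compute the histogram without drawing it, so its breaks can be reused
h <- hist(mydata$x, plot = FALSE)

# Bin x with the same convention as hist()'s defaults:
# right-closed intervals, lowest value included in the first bin
bin <- cut(mydata$x, breaks = h$breaks, right = TRUE, include.lowest = TRUE)

# Sum y within each bin; empty bins come back as NA, so set them to 0
sums <- tapply(mydata$y, bin, sum)
sums[is.na(sums)] <- 0

# Sanity check: the bin membership agrees with the counts hist() computed
stopifnot(all(as.vector(table(bin)) == h$counts))

# Replace the counts with the sums and draw the bars
h$counts <- as.vector(sums)
plot(h, ylab = "Sum", main = "Sum of y Within x Bins")
```

Because every observation lands in exactly one bin, the bar heights add up to sum(mydata$y), which is a quick way to confirm nothing was dropped.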

Summarizing all comments, this is what I wanted to have. Thanks @Alex A.
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1), x = rpois(100, 15) * 10)
a <- aggregate(mydata$y, by=list(bin=cut(mydata$x, nclass.Sturges(mydata$x))), FUN=sum)
a$bin<- gsub (']','',as.character (a$bin))
a$bin<- gsub (',',' ',as.character (a$bin))
ab2=sapply(strsplit(as.character(a$bin), " "), "[", 2)
barplot(a$x, names.arg=ab2)

Related

Separating Parameters in Repeated Function Calls

Given a vector of values, I want to call a function once for each value:
values = 1:10
rnorm(100, mean=values, sd=1)
Here mean = values simply recycles the sequence (1,2,3,4,5,6,7,8,9,10) across the 100 draws. How can I get a matrix in which each row (or column) holds 100 observations generated from a single element of my vector?
i.e.:
rnorm(100, mean=1, sd=1)
rnorm(100, mean=2, sd=1)
rnorm(100, mean=3, sd=1)
rnorm(100, mean=4, sd=1)
# ...
It's not clear from your question, but I took it that you wanted a single matrix with 10 rows and 100 columns. That being the case you can do:
matrix(rnorm(1000, rep(1:10, each = 100)), nrow = 10, byrow = TRUE)
Or modify akrun's answer by using sapply instead of lapply
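A minimal sketch of the sapply variant mentioned above, assuming the goal is one set of 100 draws per mean. The names `draws` and `draws_by_row` are only illustrative:

```r
set.seed(42)
values <- 1:10

# Each call to rnorm() returns 100 draws; sapply() binds the results as columns
draws <- sapply(values, function(m) rnorm(100, mean = m, sd = 1))
dim(draws)         # 100 rows, 10 columns (one column per mean)

# Transpose if one row per mean is preferred
draws_by_row <- t(draws)
dim(draws_by_row)  # 10 rows, 100 columns
```

Unlike the single vectorized rnorm(1000, ...) call, this makes the pairing between each mean and its draws explicit, at the cost of ten separate calls.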
An option is lapply from base R
lapply(1:10, function(i) rnorm(100, mean = i, sd = 1))
Or Map from base R:
Map(function(i) rnorm(100, mean = i, sd = 1), 1:10)
Using map I can apply a function for each value from the vector values
library(purrr)
values = 1:10
map_dfc(
.x = values,
.f = ~rnorm(100,mean = .x,sd = 1)
)
In this case the result is a 100 x 10 tibble (data frame), with one column per element of values.

Plot graph with values of vectors

I want to visualize the elements of my vectors in a graph. I want to generate a graph with a certain x- and y-axis and then put the values of my vectors as points into the graph. I also want different colors for the values of the different vectors. How do I do that?
For example: I have 10 elements in vector A and want to put those elements into the graph. The first element of vector A has the y-value A[1] and the x-value 1. The second element of vector A has the y-value A[2] and the x-value 2. The same applies to vector B.
vec1 = 1:10
vec2 = 1:10
for(idx in 1:10){
vec1[idx] = runif(1, min=0, max=100)
vec2[idx] = runif(1, min=0, max=100)
}
plot(vec1 and vec2) // How do I do this?
dput output for vec1: c(81.9624423747882, 45.583715592511, 56.2400584807619, 8.25600677635521, 82.0227505406365, 45.6240070518106, 68.7916911672801, 94.491201499477, 22.0095717580989, 4.29550902917981)
dput output for vec2: c(29.5684755546972, 68.0154771078378, 52.2058120695874, 2.48502977192402, 91.9532125117257, 24.7736480785534, 66.5003522532061, 79.014728218317, 47.9641782585531, 20.5593338003382)
Starting with
vec1 = 1:10
vec2 = 1:10
for(idx in 1:10){
vec1[idx] = runif(1, min=0, max=100)
vec2[idx] = runif(1, min=0, max=100)
}
plot(vec1 and vec2) // How do I do this?
Try this:
plot(1:20, c(vec1, vec2), col = rep(1:2, each = 10))  # just points; color 1 for vec1, color 2 for vec2
lines(1:20, c(vec1, vec2))  # add lines
# if you wanted the same x's for both sequences the first argument could be
# rep(1:10, 2) instead of 1:20
Note: Your setup code could have been just two lines (no loop):
vec1 = runif(10, min=0, max=100)
vec2 = runif(10, min=0, max=100)
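If both vectors should share the x-values 1 to 10, as the question describes, one possible base R sketch is to bind them into a matrix and use matplot(), which plots each column against the row index. The colors and the legend placement here are arbitrary choices, not something from the question:

```r
set.seed(1)
vec1 <- runif(10, min = 0, max = 100)
vec2 <- runif(10, min = 0, max = 100)

# One column per vector; with no x given, matplot() uses the row index 1..10
m <- cbind(vec1, vec2)
matplot(m, type = "p", pch = 19, col = c("red", "blue"),
        xlab = "Index", ylab = "Value")

# Label the colors using the column names created by cbind()
legend("topright", legend = colnames(m), pch = 19, col = c("red", "blue"))
```

Changing type = "p" to type = "b" would draw connecting lines for each vector as well.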
I think the easiest is to create a data frame, which is usually what most functions expect in R:
library(tidyverse)
vec1 = 1:10
vec2 = 1:10
for(idx in 1:10){
vec1[idx] = runif(1, min=0, max=100)
vec2[idx] = runif(1, min=0, max=100)
}
df <- data.frame(order = 1:10, vec1, vec2) %>%
pivot_longer(!order, names_to = "color", values_to = "value")
plot(df$order, df$value, col = c("red","blue")[df$color %>% as.factor()])
I'm guessing that you may be looking for the facility provided by the base-plotting function arrows. This is the example from the ?arrows page:
x <- stats::runif(12); y <- stats::rnorm(12)
i <- order(x, y); x <- x[i]; y <- y[i]
plot(x,y, main = "arrows(.)" )
## draw arrows from point to point :
s <- seq(length(x)-1) # one shorter than data
arrows(x[s], y[s], x[s+1], y[s+1], col = 1:3)
If you wanted instead to plot with each vector (represented by "arrows") starting from the origin it would be:
x <- stats::runif(12); y <- stats::rnorm(12)
# ordering not needed this time
plot(x, y, main = "arrows(.)", xlim = c(0, max(x)), ylim = range(0, y))  # widen the limits so the origin is visible
## draw arrows from origin to point :
s <- length(x) # number of points (one arrow per point)
arrows(rep(0,s), rep(0,s), x, y, col = 1:3)

R: how to round weights digits in plot.nn()?

I am trying to plot my neural network and I am wondering how can I round the weights to 3 digits.
library(neuralnet)
set.seed(0)
x = matrix(rnorm(100, 0, 5), ncol=4)
y = rnorm(25, 100, 20)
data = data.frame(y, x)
nn.model = neuralnet(y~., data, linear.output=T, stepmax = 1e+06)
plot(nn.model)
I've tried mapply(round), but it does not work on the nested lists that the neuralnet model produces. Any suggestions are appreciated!
Like this:
nn.model$weights[[1]] <- lapply(nn.model$weights[[1]], function(x) round(x, 3))
plot(nn.model)
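The line above rounds the weights of the first repetition only. As a hedged sketch for nested lists in general, base R's rapply() can apply round() to every non-list element while keeping the nesting. The `weights` object below is a hypothetical stand-in with the assumed shape of nn.model$weights (a list of repetitions, each a list of weight matrices), not output from neuralnet itself:

```r
# Hypothetical stand-in for nn.model$weights: two repetitions,
# each holding one weight matrix per layer
set.seed(0)
weights <- list(
  list(matrix(rnorm(5), ncol = 1), matrix(rnorm(2), ncol = 1)),
  list(matrix(rnorm(5), ncol = 1), matrix(rnorm(2), ncol = 1))
)

# rapply() visits every non-list leaf; how = "replace" preserves the structure,
# and digits = 3 is passed through to round()
rounded <- rapply(weights, round, how = "replace", digits = 3)
rounded[[2]][[1]]
```

With a real model, the same call on nn.model$weights followed by assigning the result back would be the analogous step, under the assumption that the structure matches.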

How to associate variable values from a df to another

I have a data frame with three columns: x and y are coordinates and z is the value of the independent variable:
x.range <- c(1,10)
y.range <- c(20,50)
grid <- expand.grid(x = seq(x.range[1], x.range[2], by=0.5),
y = seq(y.range[1], y.range[2], by=0.5))
grid$z <- runif(nrow(grid),10, 70)
Now I have another data frame like this, with only x and y values:
x1 <- c(3.7,5.4,9.2)
y1 <- c(41.1,30.3,22.9)
df <- data.frame(x=x1,y=y1)
Now I want to assign to each point of data frame df the z value of the nearest point of data frame grid (the one at the shortest distance). Thanks.
This isn't the prettiest, but it works:
apply(df, 1,
function(x){
pythag <- sqrt((x[1] - grid$x)^2 +
(x[2] - grid$y)^2)
grid[which.min(pythag), "z"]
})
Simply returning the value for the nearest point using Pythagoras.
Edit
Recoding to adhere to coding standards:
pythag <- function(x, y, g){
which.min(((x - g$x)^2 + (y - g$y)^2)^0.5)
}
idx <- mapply(FUN = pythag,
x = df[["x"]],
y = df[["y"]],
MoreArgs = list(g = grid))
grid[idx,]
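To attach the matched values to df rather than just print the grid rows, a minimal sketch along the same lines follows. As an assumption for illustration, z is set to x + y here instead of random values so the result can be checked by hand; `nearest` is a hypothetical helper name:

```r
x.range <- c(1, 10)
y.range <- c(20, 50)
grid <- expand.grid(x = seq(x.range[1], x.range[2], by = 0.5),
                    y = seq(y.range[1], y.range[2], by = 0.5))
# Assumption for illustration: a deterministic z that is easy to verify
grid$z <- grid$x + grid$y

df <- data.frame(x = c(3.7, 5.4, 9.2), y = c(41.1, 30.3, 22.9))

# Row index in g of the point nearest to (x, y);
# the squared distance has the same minimizer as the distance itself
nearest <- function(x, y, g) which.min((x - g$x)^2 + (y - g$y)^2)
idx <- mapply(nearest, df$x, df$y, MoreArgs = list(g = grid))

# Attach the z value of the nearest grid point to each row of df
df$z <- grid$z[idx]
df
```

For example, the first point (3.7, 41.1) is closest to the grid node (3.5, 41.0), so it picks up that node's z value.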

R hexbin: Get count for particular bin

Suppose I create a hexbin (using the hexbin package):
h <- hexbin(df)
where df has x and y fields. For a particular value of x and y, how do I get the count of the corresponding bin?
Assuming you are using the hexbin function from library(hexbin), you can use the bin IDs to achieve what you want.
Call the function as hexbin(..., IDs = TRUE) and the result will have a slot, cID, that tells you which bin each point falls in.
Working example:
library(hexbin)
x <- c(1, 1.2, 1, 3, 5, -2 ,1, 0, 0.8)
y <- c(1, 1, 0, -1, 0, 2, -1, 1, 1)
h <- hexbin(x, y, xbins = 3,IDs = T)
#what is the cell ID of point 1?
ID1 <- h@cID[1]
#how many points fall in that cell?
sum(h@cID == ID1) #answer is 4 in this case
# Alternative: return the count of the bin whose center of mass (xcm, ycm) is closest to (x, y)
get_count <- function(x, y, h) {
my_dist <- function(x2, y2) {
return(sqrt((x - x2) ^ 2 + (y - y2) ^ 2))
}
distances <- mapply(my_dist, attr(h, 'xcm'), attr(h, 'ycm'))
return(attr(h, 'count')[which.min(distances)])
}
