I have a function:
f <- function(x,y){ return (x + y)}
I need to make a 2D plot (not 3D) with c(30:200) on both the x and y axes. Every (x, y) pair should be passed to the function, and each point should be colored according to the result, e.g. f(xi, yi) > ? and so on. How can I achieve this?
I tried:
range <- c(30:200)
ys = matrix(nrow = 171,ncol = 171 )
for (i in range){
  for (y in range){
    ys[i-29,y-29] <- f(i,y) # example: if f(i,y) < 0.5, color (i,y) red
  }
}
df <- data.frame(x= c(30:200), y= c(30:200))
Now the x and y axes are correct, but how can I plot this, since I can't just bind ys to the y axis? Using ys doesn't seem like the right way to achieve this. How should I do it?
Thanks for the help.
Here's a sample given a small matrix.
First, I'll generate the matrix ... you use whatever data you want.
m <- matrix(1:25, nr=5)
m
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 6 11 16 21
# [2,] 2 7 12 17 22
# [3,] 3 8 13 18 23
# [4,] 4 9 14 19 24
# [5,] 5 10 15 20 25
Now, convert it to the "long" format that ggplot2 prefers:
library(dplyr)
library(tidyr)
longm <- cbind(m, x = seq_len(nrow(m))) %>%
  as.data.frame() %>%
  gather(y, val, -x) %>%
  mutate(y = as.integer(gsub("\\D", "", y)))
head(longm)
# x y val
# 1 1 1 1
# 2 2 1 2
# 3 3 1 3
# 4 4 1 4
# 5 5 1 5
# 6 1 2 6
And a plot:
library(ggplot2)
ggplot(longm, aes(x, y, fill=val)) + geom_tile()
# or, equivalently, with the fill aesthetic set on the layer instead
ggplot(longm, aes(x, y)) + geom_tile(aes(fill=val))
It's notable (to me) that the top-left value in the matrix (m[1,1]) is actually the bottom-left in the heatmap. This can be adjusted with scale_y_reverse(). From here, it should be primarily aesthetics.
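Applied to the function from the question, a minimal sketch might look like the following. It assumes expand.grid() to build every (x, y) combination directly in long format, so no intermediate matrix is needed. The threshold (cutoff, set to 230 here) is a placeholder chosen only for illustration, since the question leaves the actual value open.

```r
f <- function(x, y) x + y

# every (x, y) combination on the 30:200 grid, already in long format
grid <- expand.grid(x = 30:200, y = 30:200)
grid$val <- f(grid$x, grid$y)

# hypothetical cutoff, chosen only for illustration
cutoff <- 230
grid$above <- grid$val > cutoff

# draw the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(grid, aes(x, y, fill = above)) + geom_tile()
  print(p)
}
```

Mapping fill to val instead of above would give a continuous color scale rather than two colors.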
I would like the output of the code below to express diff(x) of a series of numbers in rows, not columns. It currently calculates diff(x) of the rows of a data frame, which is what I want, so the values are correct but they are formatted in columns. This is some example code that generates diff(x) of some series of numbers:
x <- c(19, 26, 39)
y <- c(34, 47, 51)
z <- c(45,50,60)
B <- data.frame(x, y, z)
B
f1 = function(x){return(diff(x))}
apply(B,1,f1)
>[,1] [,2] [,3]
y 15 21 12
z 11 3 9
#this seems to give diff(x) as columns
#want as rows, i.e. the transpose:
15 11
21 3
12 9
Many thanks
Maybe you can try
Bout <- B[-1]- B[-ncol(B)]
such that
> Bout
y z
1 15 11
2 21 3
3 12 9
or
Bout <- t(diff(t(B)))
such that
> Bout
y z
[1,] 15 11
[2,] 21 3
[3,] 12 9
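The original apply() call can also be kept. apply(B, 1, f1) returns each row's result as a column, so transposing it with t() should give the layout asked for. A short sketch using the data from the question:

```r
B <- data.frame(x = c(19, 26, 39), y = c(34, 47, 51), z = c(45, 50, 60))

# apply() over rows stores each row's diff as a column; t() flips it back
res <- t(apply(B, 1, diff))
res
```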
I have a list and a Matrix as per below:
List Y:
$`1`
V1 V2
1 1 1
2 1 2
3 2 1
4 2 2
$`2`
V1 V2
5 5 5
6 11 2
$`3`
V1 V2
7 10 1
8 10 2
9 11 1
10 5 6
Matrix Z:
[,1][,2][,3][,4][,5][,6]
[1,] 2 1 5 5 10 1
I consider below as points1, points2 and points3 in Matrix Z respectively
points1 -(2,1)
[,1][,2]
[1,] 2 1
points2 - (5,5)
[,3][,4]
[1,] 5 5
points3 - (10,1)
[,5][,6]
[1,] 10 1
I want to calculate the sum of distances between all points in Y[[1]] and points1, all points in Y[[2]] and points2, and all points in Y[[3]] and points3 in R. How can I do this?
rowsums(|y-z|^2)
Based on the description,
Map(function(y, z) rowSums(abs(y - z[col(y)])^2),
Y, split(Z, as.numeric(gl(ncol(Z), 2, ncol(Z)))))
Try the following. It uses Map to apply a function to each pair of corresponding elements of the two lists passed to it (Z2 is defined in the code further down). Note that we cannot simply do
Map('-', Y, Z2)
because R would recycle each vector in Z2 down the columns of the data frame, not across each row.
f <- function(x, y){
  for(i in seq_len(nrow(x)))
    x[i, ] <- x[i, ] - y
  x
}
Z2 <- split(Z, rep(1:3, each = 2))
Map(f, Y, Z2)
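Neither snippet above reduces the result to a single number per group. If the goal is the sum of Euclidean distances, as the wording of the question suggests, one possible sketch follows. Y and Z are rebuilt here from the values printed in the question, and sum_dist is a hypothetical helper name, not something from the answers above.

```r
Y <- list(
  `1` = data.frame(V1 = c(1, 1, 2, 2), V2 = c(1, 2, 1, 2)),
  `2` = data.frame(V1 = c(5, 11), V2 = c(5, 2)),
  `3` = data.frame(V1 = c(10, 10, 11, 5), V2 = c(1, 2, 1, 6))
)
Z <- matrix(c(2, 1, 5, 5, 10, 1), nrow = 1)

# one reference point (two coordinates) per list element
Z2 <- split(Z, rep(1:3, each = 2))

# hypothetical helper: sum of Euclidean distances from each row of pts to ref
sum_dist <- function(pts, ref) {
  d <- sweep(as.matrix(pts), 2, ref)  # subtract ref from every row
  sum(sqrt(rowSums(d^2)))
}

res <- mapply(sum_dist, Y, Z2)
res
```

sweep() subtracts the reference point from every row, which avoids the column-wise recycling issue mentioned above.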
So for data evaluation that I am doing at the moment I want to write a matrix using a "for" loop.
Let's say I have random numbers between 0 and 100:
E <- runif(100, 0, 100)
t <- 0 #start
for(t in 0:90) {
D <- length(E[E >= t, E < (t + 10)])
t = t + 10
}
So what I want to do is write "D" into a matrix at each iteration with "t" in one column and "D" in the other.
I've heard that you should avoid loops in R, but I don't know an alternative.
Rather than using a loop, you can do this with sapply, which operates on each item in a sequence and stores the result in a vector, and then cbind to create the matrix:
E <- runif(100, 0, 100)
t <- seq(0, 90, 10)
D <- sapply(t, function(ti) {
sum(E >= ti & E < (ti + 10))
})
cbind(t, D)
#> t D
#> [1,] 0 11
#> [2,] 10 12
#> [3,] 20 14
#> [4,] 30 11
#> [5,] 40 9
#> [6,] 50 12
#> [7,] 60 7
#> [8,] 70 7
#> [9,] 80 6
#> [10,] 90 11
Note that I also used sum(E >= ti & E < (ti + 10)) rather than length(E[E >= ti & E < (ti + 10)]), as a slightly shorter way of counting the items in E that are at least ti but less than ti + 10.
It seems that you want to bin your variable into categories - this is exactly what cut does:
E <- runif(100, 0, 100)
table(cut(E, breaks = seq(0,100,10), right=FALSE))
#> [0,10) [10,20) [20,30) [30,40) [40,50) [50,60) [60,70) [70,80) [80,90)
#> 10 10 7 10 8 10 12 11 10
#>[90,100)
#> 12
If you want the bin assigned to each value rather than the counts, drop the table call; if you want the counts in "tabular" format, wrap the result in as.matrix.
Note that if you are binning for plotting purposes, both hist and ggplot2 will do it for you automatically (boundary = 0 aligns the ggplot2 bins with 0, 10, 20, ...):
hist(E, breaks = seq(0,100,10))
library("ggplot2")
ggplot(data.frame(var=E), aes(x=var)) + geom_histogram(binwidth = 10, boundary = 0)
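To tie the two answers together, here is a small sketch that checks the sapply counts against the cut/table counts and builds the two-column matrix the question asked for. The seed and the names lower, D_sapply, D_cut and result are arbitrary choices for this illustration.

```r
set.seed(1)  # arbitrary seed so the run is repeatable
E <- runif(100, 0, 100)
lower <- seq(0, 90, 10)  # lower edge of each bin

# count per bin, once with sapply and once with cut/table
D_sapply <- sapply(lower, function(lo) sum(E >= lo & E < lo + 10))
D_cut <- as.vector(table(cut(E, breaks = seq(0, 100, 10), right = FALSE)))

# both approaches should agree
stopifnot(all(D_sapply == D_cut))

# matrix with the bin start in one column and the count in the other
result <- cbind(t = lower, D = D_cut)
result
```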
add <- c( 2,3,4)
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
print(z)
}
# Result
[1] 13
[1] 15
[1] 17
R prints the results, but I want to save them in a vector, data frame, or list for further computation.
Thanks in advance.
Try something like:
add <- c(2, 3, 4)
z <- rep(0, length(add))
idx <- 1
for(i in add) {
  a <- i + 3
  b <- a + 3
  z[idx] <- a + b
  idx <- idx + 1
}
print(z)
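A variant of the same idea, sketched with seq_along() so that no separate counter is needed; numeric() preallocates the result vector to the right length:

```r
add <- c(2, 3, 4)
z <- numeric(length(add))  # preallocated result vector

for (k in seq_along(add)) {
  a <- add[k] + 3
  b <- a + 3
  z[k] <- a + b  # store the result at the matching position
}
z
```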
This is simple algebra; there is no need for a for loop at all
res <- (add + 3)*2 + 3
res
## [1] 13 15 17
Or if you want a data.frame
data.frame(a = add + 3, b = add + 6, c = (add + 3)*2 + 3)
# a b c
# 1 5 8 13
# 2 6 9 15
# 3 7 10 17
Though in general, when you are trying to do something like that, it is better to create a function, for example
myfunc <- function(x) {
a <- x + 3
b <- a + 3
z <- a + b
z
}
myfunc(add)
## [1] 13 15 17
In cases where a loop is actually needed (unlike in your example) and you want to store its results, it is better to use the *apply family for such tasks. For example, use lapply if you want a list back
res <- lapply(add, myfunc)
res
# [[1]]
# [1] 13
#
# [[2]]
# [1] 15
#
# [[3]]
# [1] 17
Or use sapply if you want a vector back
res <- sapply(add, myfunc)
res
## [1] 13 15 17
For a data.frame to keep all the info
add <- c( 2,3,4)
results <- data.frame()
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
#print(z)
results <- rbind(results, cbind(a,b,z))
}
results
a b z
1 5 8 13
2 6 9 15
3 7 10 17
If you just want z then use a vector, no need for lists
add <- c( 2,3,4)
results <- vector()
for (i in add){
a <- i +3
b <- a + 3
z <- a + b
#print(z)
results <- c(results, z)
}
results
[1] 13 15 17
It might be instructive to compare these two results with those of @dugar:
> sapply(add, function(x) c(a=x+3, b=a+3, z=a+b) )
[,1] [,2] [,3]
a 5 6 7
b 10 10 10
z 17 17 17
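That is not the intended result. Inside c(), a=x+3 only names an element of the output and does not create a variable, so b=a+3 and z=a+b pick up the a and b left in the workspace by the earlier for loop (7 and 10). This sometimes trips us up when computing with intermediate values. This next one should give a more expected result: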
> sapply(add, function(x) c(a=x+3, b=(x+3)+3, z=(x+3)+((x+3)+3)) )
[,1] [,2] [,3]
a 5 6 7
b 8 9 10
z 13 15 17
Those results are the transpose of @dugar's. Using sapply or lapply often saves you the effort of setting up an empty starting object and then incrementing counters.
> lapply(add, function(x) c(a=x+3, b=(x+3)+3, z=(x+3)+((x+3)+3)) )
[[1]]
a b z
5 8 13
[[2]]
a b z
6 9 15
[[3]]
a b z
7 10 17
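If the intermediate values are wanted as real variables, they can be assigned inside the function body instead of inside c(). A minimal sketch, with t() added so each input becomes a row:

```r
add <- c(2, 3, 4)

res <- t(sapply(add, function(x) {
  a <- x + 3   # local variables, visible to the lines below
  b <- a + 3
  c(a = a, b = b, z = a + b)
}))
res
```

Because a and b are local to the function, the result does not depend on anything left over in the workspace.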
I have a data frame with list of X/Y locations (>2000 rows). What I want is to select or find all the rows/locations based on a max distance. For example, from the data frame select all the locations that are between 1-100 km from each other. Any suggestions on how to do this?
You need to determine the distance between each pair of rows.
The simplest way is with a corresponding distance matrix:
library(data.table)
# thresh is your distance threshold
thresh <- 10
# create some sample data
set.seed(123)
DT <- data.table(X=sample(-10:10, 5, TRUE), Y=sample(-10:10, 5, TRUE))
# create the distance matrix (createTable and distance are defined further down)
distTable <- matrix(apply(createTable(DT), 1, distance), nrow=nrow(DT))
# remove the lower.triangle since we have symmetry (we don't want duplicates)
distTable[lower.tri(distTable)] <- NA
# Show which rows are above the threshold
pairedRows <- which(distTable >= thresh, arr.ind=TRUE)
colnames(pairedRows) <- c("RowA", "RowB") # clean up the names
Starting with:
> DT
X Y
1: -4 -10
2: 6 1
3: -2 8
4: 8 1
5: 9 -1
We get:
> pairedRows
RowA RowB
[1,] 1 2
[2,] 1 3
[3,] 2 3
[4,] 1 4
[5,] 3 4
[6,] 1 5
[7,] 3 5
These are the two functions used for creating the distance matrix (define them before running the code above):
# pair-up all of the rows
createTable <- function(DT)
expand.grid(apply(DT, 1, list), apply(DT, 1, list))
# simple cartesian/pythagorean distance
distance <- function(CoordPair)
sqrt(sum((CoordPair[[2]][[1]] - CoordPair[[1]][[1]])^2, na.rm=FALSE))
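As an alternative sketch, base R's dist() computes all pairwise Euclidean distances directly, without the two helper functions. The coordinates below are copied from the DT printed above. The comparison uses <= because the question asks for locations within a maximum distance, whereas the code above lists the pairs at or above the threshold.

```r
pts <- data.frame(X = c(-4, 6, -2, 8, 9), Y = c(-10, 1, 8, 1, -1))
thresh <- 10

d <- as.matrix(dist(pts))            # symmetric 5 x 5 distance matrix
d[lower.tri(d, diag = TRUE)] <- NA   # keep each pair once, drop self-distances

# pairs of rows that lie within the threshold of each other
close_pairs <- which(d <= thresh, arr.ind = TRUE)
colnames(close_pairs) <- c("RowA", "RowB")
close_pairs
```

For a few thousand rows the full distance matrix still fits comfortably in memory; for much larger data a spatial index would likely be a better fit.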
I'm not entirely clear from your question, but assuming you mean you want to take each row of coordinates and find all the other rows whose coordinates fall within a certain distance:
# Create data set for example
set.seed(42)
x <- sample(-100:100, 10)
set.seed(456)
y <- sample(-100:100, 10)
coords <- data.frame(
"x" = x,
"y" = y)
# Loop through all rows
lapply(1:nrow(coords), function(i) {
dis <- sqrt(
(coords[i,"x"] - coords[, "x"])^2 + # insert your preferred
(coords[i,"y"] - coords[, "y"])^2 # distance calculation here
)
names(dis) <- 1:nrow(coords) # replace this part with an index or
# row names if you have them
dis[dis > 0 & dis <= 100] # change numbers to preferred threshold
})
[[1]]
2 6 7 9 10
25.31798 95.01579 40.01250 30.87070 73.75636
[[2]]
1 6 7 9 10
25.317978 89.022469 51.107729 9.486833 60.539243
[[3]]
5 6 8
70.71068 91.78780 94.86833
[[4]]
5 10
40.16217 99.32774
[[5]]
3 4 6 10
70.71068 40.16217 93.40771 82.49242
[[6]]
1 2 3 5 7 8 9 10
95.01579 89.02247 91.78780 93.40771 64.53681 75.66373 97.08244 34.92850
[[7]]
1 2 6 9 10
40.01250 51.10773 64.53681 60.41523 57.55867
[[8]]
3 6
94.86833 75.66373
[[9]]
1 2 6 7 10
30.870698 9.486833 97.082439 60.415230 67.119297
[[10]]
1 2 4 5 6 7 9
73.75636 60.53924 99.32774 82.49242 34.92850 57.55867 67.11930