Interpolating but fixing a range in R

I get some extreme values in the beginning and in the end when interpolating.
In fact, the last b values should not be greater than max(b), and the first values should not be less than 0.
data example:
a<-c(1, 3, 4, 6, 8.7, 9, 10, 12, 19.3, 20)
b<-c(10, 30, 40, 60, 87, 90, 100, 120, 190, 200)
df<-data.frame(a=a, b=b)
> df
      a   b
1   1.0  10
2   3.0  30
3   4.0  40
4   6.0  60
5   8.7  87
6   9.0  90
7  10.0 100
8  12.0 120
9  19.3 190
10 20.0 200
This is the code I'm using right now:
Hmisc::approxExtrap(df$a, df$b, xout = c(0:25))

Wrap it in pmin and pmax. Note that approxExtrap returns a list, so clamp its y component:
pmin(max(df$b), pmax(min(df$b), Hmisc::approxExtrap(df$a, df$b, xout = 0:25)$y))
This keeps the results within the lower and upper bounds of b. If you want to replace the lower bound of b (currently 10) with 0, replace min(df$b) with 0.
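As a base R alternative that needs no extra package, the clamping idea can be sketched as follows. Two assumptions to note: approx() with rule = 2 does not extrapolate at all (outside the range of a it returns the value at the nearest data extreme, so x = 0 gives 10 rather than 0), and clamp() is a hypothetical helper name introduced only for this illustration.

```r
a <- c(1, 3, 4, 6, 8.7, 9, 10, 12, 19.3, 20)
b <- c(10, 30, 40, 60, 87, 90, 100, 120, 190, 200)

# rule = 2: outside range(a), return the value at the closest data extreme
y_flat <- approx(a, b, xout = 0:25, rule = 2)$y

# hypothetical helper: clamp any numeric vector to [lower, upper]
clamp <- function(v, lower, upper) pmin(upper, pmax(lower, v))

# example: values below 0 become 0, values above max(b) become max(b)
clamped <- clamp(c(-5, 50, 250), lower = 0, upper = max(b))
```

The same clamp() call can be applied to the y component of any interpolation result, including the one from Hmisc::approxExtrap.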

How to define the mapping parameter iteratively to contract vertices chains?

I have a simple graph g. It is required to smooth the graph by deleting the vertices whose degree is 2 while preserving the layout of the original graph. The same task has been solved in Mathematica.
library(igraph)
set.seed(1)
# preprocessing
g <- sample_gnp(40, 1/20)
V(g)$name <- seq(1:vcount(g))
components <- clusters(g, mode="weak")
biggest_cluster_id <- which.max(components$csize)
vert_ids <- V(g)[components$membership == biggest_cluster_id]
vert_ids
# input random graph
g <- induced_subgraph(g, vert_ids)
LO = layout.fruchterman.reingold(g)
plot(g, vertex.color = ifelse(degree(g)==2, "red", "green"), main ="g", layout = LO)
I have selected the chains of vertices with a degree of 2.
subg <- induced_subgraph(g, degree(g)==2)
subg_ids <- V(subg); subg_ids
I have read the Q&A and I manually define the mapping parameter of the contract() function.
# join nodes 3 -> 14, 15 -> 40, 13 -> 31, 29 -> 6
mapping = c(2, 3, 4, 5, 6, 7, 8, 10, 13, 3, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 6, 30, 13, 32, 33, 34, 35, 36, 38, 39, 15)
g2 <- simplify(contract(g, mapping=mapping, vertex.attr.comb=toString))
# L2 <- LO[-as.numeric(c(14, 40, 31, 6)),] # not working
plot(g2, vertex.color = ifelse(degree(g2)==2, "red", "green"), main ="g2")
Question. What is a possible way to define the mapping parameter iteratively?
Here is an option that does not use the mapping argument of contract() (so you don't need to configure the mapping manually):
g2 <- graph_from_data_frame(
  rbind(
    get.data.frame(delete.vertices(g, names(subg_ids))),
    do.call(
      rbind,
      lapply(
        decompose(subg),
        function(x) {
          nbs <- names(unlist(neighborhood(g, nodes = names(V(x))[degree(x) < 2])))
          setNames(data.frame(t(subset(nbs, !nbs %in% names(subg_ids)))), c("from", "to"))
        }
      )
    )
  ),
  directed = FALSE
)
You can then plot the smoothed graph with the original layout by running
plot(g2, main = "g2", layout = LO[match(names(V(g2)), names(V(g))), ])
This is only a partial answer, since it does not give a way to compute the contraction automatically. However, I can give some insights on the manual mapping:
Your vertices have names, so those are used for reference instead of the internal vertex number from 1 to n.
In the mapping we need to give the new IDs of the vertices after the contraction.
The original IDs are
> V(g)
+ 33/33 vertices, named, from 0af52c3:
[1] 2 3 4 5 6 7 8 10 13 14 15 16 17 18 19 20 21 22 23 25 26 27 29 30 31 32 33 34 35 36 38 39 40
The new IDs can be given as (multiple possibilities exist):
mapping <- c(6, 14, 6, 5, 6, 7, 7, 10, 31, 14, 15, 16, 17, 14, 6, 7, 31, 22, 6, 25, 26, 27, 14, 30, 31, 6, 6, 34, 35, 36, 38, 39, 15)
For better overview:
old ID: 2 3 4 5 6 7 8 10 13 14 15 16 17 18 19 20 21 22 23 25 26 27 29 30 31 32 33 34 35 36 38 39 40
new ID: 6 14 6 5 6 7 7 10 31 14 15 16 17 14 6 7 31 22 6 25 26 27 14 30 31 6 6 34 35 36 38 39 15
This results in:
g2 <- simplify(contract(g, mapping=mapping, vertex.attr.comb=toString))
plot(g2, vertex.color = ifelse(degree(g2)==2, "red", "green"), main ="g2")
To get rid of the now existing degree-0-nodes you can do:
g3 <- delete.vertices(g2, which(degree(g2) == 0))
Alternatively, and maybe even cleaner you could delete nameless nodes:
g3 <- delete.vertices(g2, which(names(V(g2)) == ""))
To keep the original layout you can do:
L3 <- LO[-which(mapping != as.numeric(names(V(g)))),]
plot(g3, layout = L3)
But the result does not look very good in this case...
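To illustrate what "iteratively" could mean here, the following is a minimal base R sketch that works on a plain edge list rather than on an igraph object. It is not igraph's contract(); the function name smooth_edges and the small demo graph are assumptions made up for this illustration. Each pass removes one degree-2 vertex and joins its two neighbours directly, until no such vertex is left.

```r
# hypothetical helper: repeatedly remove degree-2 vertices from an
# undirected edge list (data frame with columns 'from' and 'to')
smooth_edges <- function(edges) {
  edges[] <- lapply(edges, as.character)
  repeat {
    deg <- table(c(edges$from, edges$to))
    changed <- FALSE
    for (v in names(deg)[as.vector(deg) == 2]) {
      hit <- edges$from == v | edges$to == v
      ends <- setdiff(c(edges$from[hit], edges$to[hit]), v)
      # skip vertices whose two edges lead to the same neighbour
      if (length(ends) != 2) next
      # drop both edges of v and connect its two neighbours directly
      edges <- rbind(
        edges[!hit, ],
        data.frame(from = ends[1], to = ends[2], stringsAsFactors = FALSE)
      )
      changed <- TRUE
      break
    }
    if (!changed) break
  }
  edges
}

# demo graph: vertices 2 and 3 form a chain between vertices 1 and 4
demo <- data.frame(
  from = c(1, 2, 3, 4, 4, 1, 1),
  to   = c(2, 3, 4, 5, 6, 7, 8)
)
smoothed <- smooth_edges(demo)
```

In the demo, the chain 1-2-3-4 collapses into a single edge between 1 and 4. The resulting edge list could be passed to graph_from_data_frame(..., directed = FALSE) to get back an igraph object.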

R - Calculating mean of values in data frame column cycling through pre-determined intervals

Starting from a two-column data frame like the following:
value  distance (km)
10     1.2
9.5    1.7
10     4
9.5    6.5
10     7
10     7
10     7.6
8.5    11.6
9.5    11.7
8.5    14.8
9      15.2
9      15.3
8.5    17.4
8      17.8
7      21.2
4      24.6
[...]
I have to calculate several averages of the left column values while the values of the right column are between predetermined ranges of 10 km:
an average of the values between 0 and 10 km, an average of the values between 5 and 15 km, 10 and 20 km and so on.
Something like a moving average within predefined intervals with a partial overlap.
The output should be something like the following:
9.86 (the average of 0 km <= values < 10 km)
9.43 (... 5 km <= values < 15 km)
8.71 (... 10 km <= values < 20 km)
[...]
avgX (... 40 km <= values < 50 km)
I have been looking at the documentation of the various moving average implementations in R, but I am not sure which is the best or right way to achieve my goal.
Here is a base R solution, and you can try the code below:
res <- sapply(seq(0,max(df$distance),by=5),function(k) mean(subset(df, distance >= k & distance < k+10)$value))
such that
> res
[1] 9.857143 9.428571 8.714286 7.583333 5.500000
DATA
df <- structure(list(value = c(10, 9.5, 10, 9.5, 10, 10, 10, 8.5, 9.5,
8.5, 9, 9, 8.5, 8, 7, 4), distance = c(1.2, 1.7, 4, 6.5, 7, 7,
7.6, 11.6, 11.7, 14.8, 15.2, 15.3, 17.4, 17.8, 21.2, 24.6)), class = "data.frame", row.names = c(NA,
-16L))
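The same windowed mean can be sketched with vapply() and labelled windows, which makes it easier to see which interval each average belongs to. The names starts and win_means are introduced only for this illustration; the window width of 10 km and the step of 5 km come from the question.

```r
df <- data.frame(
  value = c(10, 9.5, 10, 9.5, 10, 10, 10, 8.5, 9.5,
            8.5, 9, 9, 8.5, 8, 7, 4),
  distance = c(1.2, 1.7, 4, 6.5, 7, 7, 7.6, 11.6, 11.7,
               14.8, 15.2, 15.3, 17.4, 17.8, 21.2, 24.6)
)

# window starts every 5 km; each window covers [start, start + 10)
starts <- seq(0, max(df$distance), by = 5)

win_means <- vapply(
  starts,
  function(k) mean(df$value[df$distance >= k & df$distance < k + 10]),
  numeric(1)
)
# label each mean with its interval, e.g. "0-10 km"
names(win_means) <- paste0(starts, "-", starts + 10, " km")
```

A window that contains no rows yields NaN, because mean() of an empty vector is NaN.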

Mean of the sample obtained

I have this information:
Student: 1 2 3 4 5 6 7 8 9 10
Mark: 85 62 90 85 64 72 70 59 66 70
So I did this:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(85, 62, 90, 85, 64, 72, 70, 59, 66, 70)
I wanted to take 10 samples of 4 students and then obtain the mean of each sample. I took the samples by running the following command 10 times (sample1, sample2, ..., sample10):
sample1 <- sample(x, 4, replace=FALSE, prob=NULL)
Is there any easier way to obtain the samples?
How could I obtain the mean of those samples programmatically?
I have tried the following:
meansample1 <- mean(sample1)
You can calculate the mean marks for four students with:
mean(sample(y, 4, replace = FALSE))
Then, we can replicate this task 10 times with replicate():
replicate(n = 10, mean(sample(y, 4, replace = FALSE)))
# [1] 78.75 72.25 78.00 70.25 74.25 79.25 72.25 64.25 76.50 69.00
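If the samples themselves should be kept as well as their means, one possible sketch is to let replicate() return a matrix with one sample per column and then take column means. The seed value is arbitrary and only there so the run is reproducible; samples and sample_means are names chosen for this illustration.

```r
y <- c(85, 62, 90, 85, 64, 72, 70, 59, 66, 70)

set.seed(42)  # arbitrary seed, only for reproducibility

# 4 x 10 matrix: each column holds the marks of one sample of 4 students
samples <- replicate(10, sample(y, 4))

# one mean per sample (per column)
sample_means <- colMeans(samples)
```

Sampling y directly draws marks, whereas sampling x draws student numbers; to go from sampled student numbers to marks, index with y[sample(x, 4)].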

How to replace columnar values in R between 2 numbers

I have imported a simple CSV file into a data frame. In a particular column, I would like to replace all values under 15.0 with 15.0, all values above 25.0 with 25.0, and all values between 15.0 and 25.0 with 20.0.
The below snippet works well for the first task, and switching the directional sign and replacing '15' with '25' works fine for the second task.
ff$temp[ ff$temp<15 ] <- 15
How can I accomplish replacing the values between 15 and 25 with 20?
You could use the same premise, just with a second logical operator to catch the "between" values.
For example,
> ff <- data.frame(temp = c(14, 17, 19, 24, 30))
> within(ff, temp[temp > 15 & temp < 25] <- 20)
# temp
# 1 14
# 2 20
# 3 20
# 4 20
# 5 30
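Alternatively, you could use cut() to assign each value to a bin and index the replacement values with the bin number:
set.seed(25)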
v1 <- sample(5:35, 25, replace=TRUE)
c(15, 20, 25)[cut(v1, breaks=c(-Inf, 15, 25, Inf), labels=FALSE)]
#[1] 20 25 15 25 15 25 20 15 15 15 15 20 25 20 25 15 20 25 20 25 15 20 15 15 15
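For completeness, all three replacements from the question can be sketched in one step with nested ifelse(). One assumption is made here: values exactly equal to 15 or 25 are treated as "between" and become 20, which the question does not specify.

```r
ff <- data.frame(temp = c(14, 15, 17, 24, 25, 30))

# below 15 -> 15, above 25 -> 25, everything else (15 to 25 inclusive) -> 20
ff$temp <- ifelse(ff$temp < 15, 15,
                  ifelse(ff$temp > 25, 25, 20))
```

Evaluating all conditions against the original values in a single expression avoids any dependence on the order of the assignments.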

Create a new column with multiple categories using two columns when they satisfy certain conditions using R

I have a data set with the columns "X" (values from 0 to 80) and "Y" (values from 0 to 80). I would like to create a new column "Table". I have 36 tables in mind, in groups of 6. The groups are defined by Y:
Tables 1-6: Y 11-20, Tables 7-12: Y 21-30, Tables 13-18: Y 31-40, Tables 19-24: Y 41-50, Tables 25-30: Y 51-60, Tables 31-36: Y 61-70
Within each group, the table is determined by X:
Table 1: X 21-30 (likewise Tables 7, 13, 19, 25, 31)
Table 2: X 31-40 (likewise Tables 8, 14, 20, 26, 32)
Table 3: X 41-50 (likewise Tables 9, 15, 21, 27, 33)
Table 4: X 51-60 (likewise Tables 10, 16, 22, 28, 34)
Table 5: X 61-70 (likewise Tables 11, 17, 23, 29, 35)
Table 6: X 71-80 (likewise Tables 12, 18, 24, 30, 36)
End Result:
X Y Table
45 13 3
66 59 29
21 70 31
17 66 NA (there is no table for X lower than 21)
Should I be using the If Else function to group the data from the "X" and "Y" into my new "Table", ranging from 1 to 36 or something else? Any help will be appreciated! Thank you!
head(data)
value avg.temp X Y
1 0 6.69 45 13
2 0 6.01 48 14
3 0 7.35 39 15
4 0 5.86 45 15
5 0 6.43 42 16
6 0 5.68 48 16
I think you could use something like this, repeating the pattern for each of the 36 tables. If your data frame is called df:
df$Table <- NA
df$Table[df$X>=21 & df$X<=30 & df$Y>=11 & df$Y<=20] <- 1
df$Table[df$X>=31 & df$X<=40 & df$Y>=11 & df$Y<=20] <- 2
...
Use math and indexes:
# demo data
x <- data.frame(X = c(45,66,21,17,0,1,21,80,45),Y = c(13,59,70,66,80,11,0,1,27))
# if each GROUP of Y tables was numbered 1-6, aka indexing
x$ytableindex <- ((x$Y-1) - (x$Y-1) %% 10) / 10
# NA if too low
x$ytableindex[x$ytableindex < 1] <- NA
# find lowest table based on Y index
x$ytable <- (0:5*6+1)[x$ytableindex]
# find difference from lowest Y table to arrive at correct table using X
x$xdiff <- floor((x$X - 1) / 10 - 2)
# NA if too low
x$xdiff[x$xdiff < 0] <- NA
# use difference to calculate the correct table, NA's stay NA
x$Table <- x$ytable + x$xdiff
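The arithmetic above can also be condensed into a single function. This is only a sketch under the assumption that X and Y are whole numbers; table_id is a hypothetical name introduced for this illustration.

```r
# hypothetical helper: map integer X and Y to a table number 1..36, else NA
table_id <- function(X, Y) {
  xi <- ceiling(X / 10) - 2   # X 21-30 -> 1, ..., X 71-80 -> 6
  yi <- ceiling(Y / 10) - 1   # Y 11-20 -> 1, ..., Y 61-70 -> 6
  ok <- xi >= 1 & xi <= 6 & yi >= 1 & yi <= 6
  # each Y group spans 6 tables; X selects the table within the group
  ifelse(ok, (yi - 1) * 6 + xi, NA)
}

# the four rows from the "End Result" in the question
tab <- table_id(c(45, 66, 21, 17), c(13, 59, 70, 66))
# 3 29 31 NA
```

Any X or Y outside the defined ranges yields NA, which matches the "no table" case in the question.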