Algorithm to ascribe value in RStudio - r

I am trying to create an algorithm that is essentially a function of this data frame.
This is the code I was using, but it doesn't seem to be working.
I need image_id to be the independent variable so that when I input 7 into the function, I get back 10 and 15. If I were to input 8, I would get back 11 and 13.
num = function(image_id, category_id, data = categories) {x->y}
This is the data frame that I am using.
category_id image_id cat_to_img_last_update
1 15 15 NULL
2 11 11 NULL
3 13 13 NULL
4 10 10 NULL
5 35 35 NULL
6 78 78 NULL
7 112 112 NULL
8 61 61 NULL
9 86 86 NULL
10 101 101 NULL
11 61 61 NULL
12 86 86 NULL

You probably don't need a function for this, but if you really want, here is what it would look like:
# Read in data
categories <- data.frame(
  category_id = c(15, 11, 13, 10, 35, 78, 112, 61, 86, 101, 61, 86),
  image_id = c(7, 8, 8, 7, 9, 9, 10, 10, 11, 11, 12, 12),
  stringsAsFactors = FALSE
)
num <- function(image_id, data = categories) {
  data$category_id[data$image_id == image_id]
}
num(7) # 15 10
num(8) # 11 13
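As a quick aside (not part of either answer), split() gives the category_ids for every image_id in one call, using the categories frame built above:
split(categories$category_id, categories$image_id)
# $`7` contains 15 10, $`8` contains 11 13, and so on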

df <- data.frame(
  category_id = c(15, 11, 13, 10, 35, 78, 112, 61, 86, 101, 61, 86),
  image_id = c(7, 8, 8, 7, 9, 9, 10, 10, 11, 11, 12, 12)
)
myfun <- function(num) { sort(df[df$image_id == num, "category_id"]) }
myfun(7) # 10 15
myfun(8) # 11 13


How to define the mapping parameter iteratively to contract vertices chains?

I have a simple graph g. I need to smooth the graph by deleting the vertices whose degree is 2 while preserving the layout of the original graph. The same task was solved in Mathematica.
library(igraph)
set.seed(1)
# preprocessing
g <- sample_gnp(40, 1/20)
V(g)$name <- seq(1:vcount(g))
components <- clusters(g, mode="weak")
biggest_cluster_id <- which.max(components$csize)
vert_ids <- V(g)[components$membership == biggest_cluster_id]
vert_ids
# input random graph
g <- induced_subgraph(g, vert_ids)
LO = layout.fruchterman.reingold(g)
plot(g, vertex.color = ifelse(degree(g)==2, "red", "green"), main ="g", layout = LO)
I have selected the chains of vertices with degree 2.
subg <- induced_subgraph(g, degree(g)==2)
subg_ids <- V(subg); subg_ids
I have read the Q&A and manually defined the mapping parameter of the contract() function.
# join nodes 3 -> 14, 15 -> 40, 13 -> 31, 29 -> 6
mapping = c(2, 3, 4, 5, 6, 7, 8, 10, 13, 3, 15, 16, 17, 18, 19, 20, 21, 22, 23, 25, 26, 27, 6, 30, 13, 32, 33, 34, 35, 36, 38, 39, 15)
g2 <- simplify(contract(g, mapping=mapping, vertex.attr.comb=toString))
# L2 <- LO[-as.numeric(c(14, 40, 31, 6)),] # not working
plot(g2, vertex.color = ifelse(degree(g2)==2, "red", "green"), main ="g2")
Question. What is a possible way to define the mapping parameter iteratively?
Here is an option that does not use mapping in contract() (so you don't need to configure the mapping manually):
g2 <- graph_from_data_frame(
  rbind(
    get.data.frame(delete.vertices(g, names(subg_ids))),
    do.call(
      rbind,
      lapply(
        decompose(subg),
        function(x) {
          nbs <- names(unlist(neighborhood(g, nodes = names(V(x))[degree(x) < 2])))
          setNames(data.frame(t(subset(nbs, !nbs %in% names(subg_ids)))), c("from", "to"))
        }
      )
    )
  ),
  directed = FALSE
)
and you will see the graph below after running
plot(g2, main = "g2", layout = LO[match(names(V(g2)), names(V(g))), ])
This is only a partial answer, since it does not give a way to compute the contraction automatically. However, I can give some insights on the manual mapping:
Your vertices have names, so those are used for reference instead of the internal vertex number from 1 to n.
In the mapping we need to give the new IDs of the vertices after the contraction.
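As a minimal, hedged illustration of how contract() reads the mapping vector (a toy graph, not the graph from the question):
library(igraph)
toy <- make_ring(4)                                       # vertices 1-2-3-4-1
toy2 <- simplify(contract(toy, mapping = c(1, 2, 3, 3)))  # vertex 4 is merged into vertex 3
vcount(toy2)  # 3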
The original IDs are
> V(g)
+ 33/33 vertices, named, from 0af52c3:
[1] 2 3 4 5 6 7 8 10 13 14 15 16 17 18 19 20 21 22 23 25 26 27 29 30 31 32 33 34 35 36 38 39 40
The new IDs can be given as (multiple possibilities exist):
mapping <- c(6, 14, 6, 5, 6, 7, 7, 10, 31, 14, 15, 16, 17, 14, 6, 7, 31, 22, 6, 25, 26, 27, 14, 30, 31, 6, 6, 34, 35, 36, 38, 39, 15)
For better overview:
old ID: 2 3 4 5 6 7 8 10 13 14 15 16 17 18 19 20 21 22 23 25 26 27 29 30 31 32 33 34 35 36 38 39 40
new ID: 6 14 6 5 6 7 7 10 31 14 15 16 17 14 6 7 31 22 6 25 26 27 14 30 31 6 6 34 35 36 38 39 15
This results in:
g2 <- simplify(contract(g, mapping=mapping, vertex.attr.comb=toString))
plot(g2, vertex.color = ifelse(degree(g2)==2, "red", "green"), main ="g2")
To get rid of the degree-0 nodes that now exist, you can do:
g3 <- delete.vertices(g2, which(degree(g2) == 0))
Alternatively, and maybe even cleaner, you could delete the nameless nodes:
g3 <- delete.vertices(g2, which(names(V(g2)) == ""))
To keep the original layout you can do:
L3 <- LO[-which(mapping != as.numeric(names(V(g)))),]
plot(g3, layout = L3)
But it is not very good looking in this case...

assign value to a variable rather than using if statement

Right now, I have a dataset consisting of the variables Gbcode and ncnty.
> str(dt)
'data.frame': 840 obs. of 8 variables:
$ Gbcode : Factor w/ 28 levels "11","12","13",..: 21 22 23 24 25 26 27 28 16 17 ...
$ ncounty : num 0 0 0 0 0 0 0 0 0 0 ...
I want to do the following thing:
if a data record has Gbcode equal to 11, then assign 20 to its ncnty
Gbcode : 11, 12, 13, 14, 15, 21, 22, 23, 31, 32, 33
Corresponding ncnty: 20, 19, 198, 131, 112, 102, 60, 145, 22, 115, 95
I am wondering whether there is any better solution than writing an if statement, which would take many lines in this case, maybe just under 20 lines of code.
This is a merge operation as far as I can tell. Make a little lookup table with your Gbcode/ncnty data, and then merge it in.
# lookup table
lkup <- data.frame(Gbcode=c(11,12,13),ncnty=c(20,19,198))
#example data
dt <- data.frame(Gbcode=c(11,13,12,11,13,12,12))
dt
# Gbcode
#1 11
#2 13
#3 12
#4 11
#5 13
#6 12
#7 12
Merge:
merge(dt, lkup, by="Gbcode", all.x=TRUE)
# Gbcode ncnty
#1 11 20
#2 11 20
#3 12 19
#4 12 19
#5 12 19
#6 13 198
#7 13 198
It is sometimes preferable to use match for this sort of thing too (unlike merge, it keeps the original row order):
dt$ncnty <- lkup$ncnty[match(dt$Gbcode,lkup$Gbcode)]
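For completeness, here is a hedged sketch of the match() approach with the full lookup from the question (assuming dt$Gbcode is the factor shown by str(dt)):
lkup <- data.frame(Gbcode = c(11, 12, 13, 14, 15, 21, 22, 23, 31, 32, 33),
                   ncnty  = c(20, 19, 198, 131, 112, 102, 60, 145, 22, 115, 95))
# match on the character codes so the factor labels of dt$Gbcode line up with the lookup
dt$ncnty <- lkup$ncnty[match(as.character(dt$Gbcode), lkup$Gbcode)]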
This could be more elegant, but should do the trick.
Gbcodes <- as.character(c(11, 12, 13, 14, 15, 21, 22, 23, 31, 32, 33))
ncounties <- c(20, 19, 198, 131, 112, 102, 60, 145, 22, 115, 95)
for (i in seq_along(Gbcodes)) dt$ncounty[dt$Gbcode == Gbcodes[i]] <- ncounties[i]

Regroup lines of a data frame for which a column value is less than x

I have this data frame :
> df
Z freq proba
1 17 1 0.0033289263
2 18 4 0.0055569026
3 19 2 0.0087878028
4 20 3 0.0132023556
5 21 16 0.0188900561
6 22 12 0.0257995234
7 23 30 0.0337042731
8 24 41 0.0421963455
9 25 56 0.0507149437
10 26 65 0.0586089198
11 27 65 0.0652230449
12 28 93 0.0699913154
13 29 82 0.0725182432
14 30 94 0.0726318551
15 31 72 0.0703990113
16 32 74 0.0661024717
17 33 58 0.0601873020
18 34 66 0.0531896431
19 35 38 0.0456625487
20 36 45 0.0381117389
21 37 27 0.0309498221
22 38 17 0.0244723502
23 39 15 0.0188543771
24 40 13 0.0141629367
25 41 4 0.0103793600
26 42 1 0.0074254435
27 43 2 0.0051886582
28 45 1 0.0023658767
29 46 1 0.0015453804
30 49 2 0.0003792308
# Here are my datas :
> dput(df)
structure(list(Z = c(17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 45, 46, 49), freq = c(1, 4, 2, 3, 16, 12, 30, 41, 56, 65,
65, 93, 82, 94, 72, 74, 58, 66, 38, 45, 27, 17, 15, 13, 4, 1,
2, 1, 1, 2), proba = c(0.0033289262662263, 0.00555690264007235,
0.00878780282243439, 0.0132023555702843, 0.0188900560866825,
0.0257995234198431, 0.0337042730520012, 0.0421963455163949, 0.0507149437492447,
0.0586089198012906, 0.0652230449359029, 0.0699913153996099, 0.0725182432348992,
0.0726318551493006, 0.0703990113442269, 0.0661024716831246, 0.0601873020200862,
0.0531896430528685, 0.045662548708844, 0.0381117389181843, 0.030949822142559,
0.0244723501557229, 0.01885437705459, 0.0141629366839816, 0.0103793599644779,
0.00742544354411115, 0.00518865818999788, 0.00236587669133322,
0.00154538036835848, 0.000379230768851682)), .Names = c("Z",
"freq", "proba"), row.names = c(NA, -30L), class = "data.frame")
And I want to regroup the lines for which the value of "freq" is < 5 with the next line, and keep doing so while the next line is also < 5.
I don't know if I'm clear enough, so this is the output I expect:
> df2
labels effectifs pi
1 17;20 10 0.03087599
2 21 16 0.01889006
3 22 12 0.02579952
4 23 30 0.03370427
5 24 41 0.04219635
6 25 56 0.05071494
7 26 65 0.05860892
8 27 65 0.06522304
9 28 93 0.06999132
10 29 82 0.07251824
11 30 94 0.07263186
12 31 72 0.07039901
13 32 74 0.06610247
14 33 58 0.06018730
15 34 66 0.05318964
16 35 38 0.04566255
17 36 45 0.03811174
18 37 27 0.03094982
19 38 17 0.02447235
20 39 15 0.01885438
21 40 13 0.01416294
22 41;49 11 0.02728395
I did it with nested while loops, but I find this solution very painful and unoptimized.
i <- 1
freqs <- c()
labels <- c()
pi <- c()
while (i < nrow(df)) {
  if (df$freq[i] >= 5) {
    freqs <- c(freqs, df$freq[i])
    labels <- c(labels, df$Z[i])
    pi <- c(pi, df$proba[i])
    i <- i + 1
  } else {
    count <- df$freq[i]
    countPi <- df$proba[i]
    k <- i
    j <- i
    while (df$freq[i] < 5 & i < nrow(df)) {
      if (df$freq[i + 1] < 5) {
        count <- count + df$freq[i + 1]
        countPi <- countPi + df$proba[i + 1]
        j <- i + 1
      }
      i <- i + 1
    }
    labels <- c(labels, paste0(df$Z[k], ";", df$Z[j]))
    freqs <- c(freqs, count)
    pi <- c(pi, countPi)
  }
}
df2 <- data.frame(labels, freqs, pi)
I'm sure there is something far better, maybe with dplyr. If you have a better solution... Thanks!
We could use the "devel" version of "data.table", as new functions such as rleid are introduced there. Here, we convert the "data.frame" to "data.table" (setDT(df)) and create a grouping variable ("gr") based on the logical index (freq < 5) using rleid. The "Z" column is of numeric/integer class, so we create a character column ("Z1") from "Z". Grouped by "gr", if "freq" is less than 5 for all the elements of that group, we summarise the rows to a single row by taking the first observation of the columns (.SD[1L]), remove the unwanted columns (as .SD includes "Z1", which would result in duplicate columns), and append the "Z1" that we get from pasting (toString) the min and max value of "Z" for that group. Otherwise, we leave the group unchanged (else .SD). Finally, we remove the columns that we don't need by assigning them to NULL.
library(data.table) #data.table_1.9.5
res <- setDT(df)[, gr := rleid(freq < 5)][, Z1 := as.character(Z)][,
    if (all(freq < 5)) c(.SD[1L][, -4, with = FALSE],
                         list(Z1 = toString(c(min(Z), max(Z)))))
    else .SD, gr][, 1:2 := NULL][]
head(res,3)
# freq proba Z1
#1: 1 0.003328926 17, 20
#2: 16 0.018890056 21
#3: 12 0.025799523 22
Since this is a dplyr question, here is a dplyr solution. First I used a grouping function to define the groups (similar to the rleid function in data.table). Then the summary is fairly simple.
# grouping function
grouping <- function(condition) {
  # calculate runs for grouping
  run <- rle((!condition) * 1:length(condition))
  # revalue
  run$values <- seq_along(run$values)
  # invert to get grouping
  inverse.rle(run)
}
# load dplyr
require(dplyr)
df %>%
  mutate(group = grouping(freq < 5)) %>%        # add groups
  group_by(group) %>%                           # group data
  summarize(freq = sum(freq),                   # sum freq
            proba = sum(proba),                 # sum proba
            Z = toString(unique(range(Z)))) %>% # rename Z
  mutate(group = NULL)                          # remove groups
## Source: local data table [22 x 3]
##
## freq proba Z
## 1 10 0.03087599 17, 20
## 2 16 0.01889006 21
## 3 12 0.02579952 22
## 4 30 0.03370427 23
## 5 41 0.04219635 24
## 6 56 0.05071494 25
## 7 65 0.05860892 26
## 8 65 0.06522304 27
## 9 93 0.06999132 28
## 10 82 0.07251824 29
## .. ... ... ...
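For reference, the same run-grouping can also be sketched in base R (assuming df is still the data frame from the dput() above, before any conversion with setDT()):
small <- df$freq < 5
# a new group starts at every row with freq >= 5 and at the first row of each run of freq < 5
grp <- cumsum(!small | c(TRUE, !small[-length(small)]))
df2 <- data.frame(
  labels    = tapply(df$Z, grp, function(z) if (length(z) > 1) paste(min(z), max(z), sep = ";") else as.character(z)),
  effectifs = tapply(df$freq, grp, sum),
  pi        = tapply(df$proba, grp, sum)
)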

Making sets from numbers in a dataframe

I have this data.frame:
structure(list(X0 = c(9, 13, 13, 13, 35, 36, 37, 38, 39, 40,
40, 42, 43, 44), X0.1 = c(10, 40, 45, 46, 36, 37, 38, 40, 46,
45, 46, 43, 44, 46)), .Names = c("A", "B"), row.names = c(NA,
14L), class = "data.frame")
A B
1 9 10
2 13 40
3 13 45
4 13 46
5 35 36
6 36 37
7 37 38
8 38 40
9 39 46
10 40 45
11 40 46
12 42 43
13 43 44
14 44 46
I want to create sets like this: rows 2, 3 and 4 have 13, so they will be grouped into a set (13,40,45,46).
If any further row has even one member in common with this set, both members of that row will be included in the set.
Since row 8 has 40 in common with the above set, its members will be included as well: (13,40,45,46,38)
Now row 7 has one member (38) in common with this set, so the other member (37) will also be included. The set becomes (13,40,45,46,38,37)
If neither of the two members of a row is common to any existing set, they form their own set. For example, row 1 has 9 and 10, neither of which appears in any other row, so they form the set (9,10).
At end I want to print out all sets.
Can I accomplish this in R? Thanks for your help.
Is this what you want?
# f recursively collects every value connected (directly or indirectly) to v
f <- function(s, v) {
  m <- which(s$A %in% v | s$B %in% v)
  if (!any(m)) v
  else Recall(s[-m, ], sort(unique(c(v, c(unlist(s[m, ]))))))
}
# d is the data frame from the question
done <- c()
for (n in unique(unlist(d))) {
  if (n %in% done) next
  r <- f(d, n)
  done <- c(done, r)
  cat("(", r, ") ")
}
it outputs
( 9 10 ) ( 13 35 36 37 38 39 40 42 43 44 45 46 )
Updated
done <- c()
ret <- list()
for (n in unique(unlist(d))) {
  if (n %in% done) next
  r <- f(d, n)
  done <- c(done, r)
  cat("(", r, ") ")
  ret <- c(ret, list(r))
}
then,
> ret
[[1]]
[1] 9 10
[[2]]
[1] 13 35 36 37 38 39 40 42 43 44 45 46
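As a side note, the rows can be read as the edges of an undirected graph, and each set is then a connected component. A hedged sketch with igraph (assuming the question's data frame is called d):
library(igraph)
g <- graph_from_data_frame(d, directed = FALSE)
# split the vertex names (the original numbers) by component membership
split(as.numeric(names(V(g))), components(g)$membership)
which should recover the same two sets as above.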

create a new column with multiple categories using two columns when they satisfy certain conditions using R

I have a data set of "X" (values from 0 to 80) and "Y" (values from 0 to 80). I would like to create a new column "Table". I have 36 tables in mind, in groups of 6. They should be grouped according to:
Tables 1-6: all Y 11-20; Tables 7-12: Y 21-30; Tables 13-18: Y 31-40; Tables 19-24: Y 41-50; Tables 25-30: Y 51-60; Tables 31-36: Y 61-70
Table 1: X 21-30 and Tables 7, 13, 19, 25, 31
Table 2: X 31-40 and Tables 8, 14, 20, 26, 32
Table 3: X 41-50 and Tables 9, 15, 21, 27, 33
Table 4: X 51-60 and Tables 10, 16, 22, 28, 34
Table 5: X 61-70 and Tables 11, 17, 23, 29, 35
Table 6: X 71-80 and Tables 12, 18, 24, 30, 36
End Result:
X Y Table
45 13 3
66 59 29
21 70 31
17 66 NA (there is no table for X lower than 21)
Should I be using if/else to group the data from "X" and "Y" into my new "Table" column (ranging from 1 to 36), or something else? Any help will be appreciated! Thank you!
head(data)
value avg.temp X Y
1 0 6.69 45 13
2 0 6.01 48 14
3 0 7.35 39 15
4 0 5.86 45 15
5 0 6.43 42 16
6 0 5.68 48 16
I think you could use something like this, if your data frame is called df:
df$Table <- NA
df$Table[df$X>=21 & df$X<=30 & df$Y>=11 & df$Y<=20] <- 1
df$Table[df$X>=31 & df$X<=40 & df$Y>=11 & df$Y<=20] <- 2
...
Use math and indexes:
# demo data
x <- data.frame(X = c(45,66,21,17,0,1,21,80,45),Y = c(13,59,70,66,80,11,0,1,27))
# if each GROUP of Y tables was numbered 1-6, aka indexing
x$ytableindex <- ((x$Y-1) - (x$Y-1) %% 10) / 10
# NA if too low
x$ytableindex[x$ytableindex < 1] <- NA
# find lowest table based on Y index
x$ytable <- (0:5*6+1)[x$ytableindex]
# find difference from lowest Y table to arrive at correct table using X
x$xdiff <- floor((x$X - 1) / 10 - 2)
# NA if too low
x$xdiff[x$xdiff < 0] <- NA
# use difference to calculate the correct table, NA's stay NA
x$Table <- x$ytable + x$xdiff
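For comparison, a hedged sketch of the same binning with cut() on the demo data x from above (Table2 is just an illustrative column name):
xbin <- cut(x$X, breaks = c(20, 30, 40, 50, 60, 70, 80), labels = FALSE)  # 1..6, NA when X <= 20 or X > 80
ybin <- cut(x$Y, breaks = c(10, 20, 30, 40, 50, 60, 70), labels = FALSE)  # 1..6, NA when Y <= 10 or Y > 70
x$Table2 <- (ybin - 1) * 6 + xbin  # e.g. X=45, Y=13 -> 3; NA propagates when either value has no table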
