search for next closest element not in a list - r

I am trying to replace 2 repeated letters in a vector drawn from the 26 letters of the alphabet.
I already have 13 of the 26 letters in my table ('keys'), so the replacement letters must not be among those 13 keys.
I am trying to write code that replaces C and S with the next letter in the alphabet that is not part of 'keys'.
The following code replaces the repeat C with D and S with T, but both of those letters are already in my 'keys'. Does anyone know how I can add a condition so that the loop keeps searching when the candidate replacement letter is already present in 'keys'?
alphabets <- toupper(letters)
keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
repeats <- c("C", "S")
index_of_repeat_in_26 <- which(alphabets %in% repeats)
# index_of_repeat_in_26 is 3, 19
available_keys <- setdiff(alphabets, keys)
# available_keys is "B" "F" "G" "K" "M" "N" "O" "Q" "U" "V" "W" "Y" "Z"
index_available_keys <- which(alphabets %in% available_keys)
# 2 6 7 11 13 14 15 17 21 22 23 25 26
char_to_replace_in_key <- character(length(repeats))
for (i in 1:length(repeats)) {
  for (j in 1:(26 - sort(index_of_repeat_in_26)[1])) {
    if (index_of_repeat_in_26[i] + j %in% index_available_keys) {
      char_to_replace_in_key[i] <- alphabets[index_of_repeat_in_26[i] + 1]
    }
    else {
      cat("\n keys not available to replace \n")
    }
  }
}

keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
repeats <- c("C", "S")
y = sort(setdiff(LETTERS, keys)) # get the letters not present in 'keys'
y = factor(y, levels = LETTERS) # make them factor so that we can do numeric comparisons with the levels
y1 = as.numeric(y) # keep them numeric to compare
z = factor(repeats, levels = LETTERS)
z1 = as.numeric(z)
func <- function(x) { # x is the index into 'repeats' (here 1 or 2) passed on each iteration
  xx = y1 - z1[x] # difference between every 'non-key' position and the current 'repeat' position
  xx = which(xx > 0)[1] # 'y1' is already sorted, so the first positive difference is the nearest later non-key
  r = y[xx] # extract the corresponding 'non-key' element
  y <<- y[-xx] # remove the chosen letter from the global list so that it is not picked again
  y1 <<- y1[-xx] # likewise remove it from the equivalent numeric vector
  r # return the extracted 'closest non-key' character
}
# sapply is effectively a loop that passes a single element to func at a time.
# Here 'seq_along' is used to pass the index, i.e. 1 for 'C', 2 for 'S', and so on.
ans = sapply(seq_along(repeats), func)
if (any(is.na(ans))){
cat("\n",paste0("keys not available to replace for ",
paste0(repeats[which(is.na(ans))], collapse = ",")) ,
"\n")
ans <- ans[!is.na(ans)]
}
# example 2 with :
repeats <- c("Y", "Z")
# output :
# keys not available to replace for Z
# ans
# [1] Z
Note: to understand how each iteration of sapply() works, run debug(func) and then run the sapply() call. You can then check in the console how the variables xx and r are evaluated at each step. Hope this helps!
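If the global assignment (<<-) in the answer above feels fragile, the same idea can be written as a plain loop. This is only a sketch, not the answerer's code: the function name next_free_letter is made up for illustration, and it assumes all inputs are single upper-case letters.

```r
# Hypothetical helper (name is an assumption, not from the original answer).
# For each repeated letter, pick the nearest later letter that is not a key
# and has not already been handed out.
next_free_letter <- function(repeats, keys) {
  available <- setdiff(LETTERS, keys)   # stays in alphabetical order
  out <- character(length(repeats))
  for (i in seq_along(repeats)) {
    later <- available[match(available, LETTERS) > match(repeats[i], LETTERS)]
    if (length(later) == 0) {
      out[i] <- NA_character_           # nothing left after this letter
    } else {
      out[i] <- later[1]
      available <- setdiff(available, later[1])  # do not reuse it
    }
  }
  out
}

keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
res1 <- next_free_letter(c("C", "S"), keys)   # "F" "U"
res2 <- next_free_letter(c("Y", "Z"), keys)   # "Z" NA
```

Returning NA for a letter with no later free letter mirrors the is.na() check used in the answer.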

Related

Looping in R with dynamic variables as dataframe names

I am trying to loop through dataframes where my search variable is in the name of the dataframe. Here I have multiple dataframes beginning with "person", "place", or "thing" and ending with either "5" or "8." I would like to loop through the many combinations of beginning and ending to create a temporary dataframe. The temporary dataframe will be used to create a plot and save the plot.
When I try my current code, I'm able to get the variable name to loop correctly (in other words, I can get "person_odds5" or "place_odds5"), but I cannot use those variables to access the corresponding column in the dataframe.
My current code is:
person_odds5 <- data.frame(odds=c("a", "b", "c", "d"), or_lci95=1:4, or_uci95=11:14, id.exposure=c("f", "g", "h", "i"), id.outcome=c("w", "x", "y", "z"))
place_odds5 <- data.frame(odds=c("a", "b", "c", "d"), or_lci95=5:8, or_uci95=15:18, id.exposure=c("f", "g", "h", "i"), id.outcome=c("w", "x", "y", "z"))
thing_odds5 <- data.frame(odds=c("a", "b", "c", "d"), or_lci95=9:12, or_uci95=19:22, id.exposure=c("f", "g", "h", "i"), id.outcome=c("w", "x", "y", "z"))
nouns <- list("person", "place", "thing")
for (x in nouns) {
pval <- c(5)
for (p in pval) {
name <- paste(x,"_odds",p, sep="")
odds <- paste(name,"$odds", sep="")
temp_dat <- data.frame(odds=odds, index=1:nrow(name))
}
}
When I run this code, "name" is the character string "person_odds5", "odds" is the character string "person_odds5$odds", and I get "Error in 1:nrow(name) : argument of length 0". In other words, the constructed name is just a string; it does not refer to the original dataframe.
Input:
> person_odds5
  odds or_lci95 or_uci95 id.exposure id.outcome
1    a        1       11           f          w
2    b        2       12           g          x
3    c        3       13           h          y
4    d        4       14           i          z
>
Desired output:
> temp_dat
  odds index
1    a     1
2    b     2
3    c     3
4    d     4
>
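One possible way to turn the constructed string into the dataframe itself is base R's get(), which looks up an object by name. The following is a minimal sketch under that assumption; the results list and the cut-down two-column dataframes are only for illustration.

```r
# Cut-down versions of the question's dataframes (illustration only)
person_odds5 <- data.frame(odds = c("a", "b", "c", "d"), or_lci95 = 1:4)
place_odds5  <- data.frame(odds = c("a", "b", "c", "d"), or_lci95 = 5:8)

nouns <- c("person", "place")
pvals <- 5
results <- list()   # hypothetical container for the temporary dataframes

for (x in nouns) {
  for (p in pvals) {
    name <- paste0(x, "_odds", p)
    dat <- get(name)   # fetch the object whose name is stored in 'name'
    results[[name]] <- data.frame(odds = dat$odds, index = seq_len(nrow(dat)))
  }
}

results$person_odds5
```

Keeping the dataframes in a named list from the start (and indexing it with [[name]]) would avoid get() altogether, which is often easier to debug.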

R: Call a Specific Column

I am trying to pick up a specific value by row name and column name.
I have a df that looks like this, and I am not allowed to change the column names.
   OA OB OC OD
OA  -  E  C  G
OB  C  -  J  L
OC  A  A  -  A
OD  A  B  B  A
For example, row=OA and col=OB should return E; row=OD and col=OB should return B.
I have tried df["OA", "OB"], but it didn't return anything.
Are you allowed to subset using coordinates?
Such as...
OAOB<-df[1,2]
##this will give you the value "E"
Use the name of your df, and then [row,col] coordinates
Are you using the data.table library?
#Opens data.table if not already open
require(data.table)
#Just creates the data as above
OA <- c("-", "C", "A", "A")
OB <- c("E", "-", "A", "B")
OC <- c("C", "J", "-", "B")
OD <- c("G", "L", "A", "A")
#Puts it all together
DT <- as.data.table(cbind(OA, OB, OC, OD))
#Makes the proper row names
row.names(DT) <- c("OA", "OB", "OC", "OD")
View(DT)
#Row OA, Col OB
DT[1,2]
#Row OD, Col OB
DT[4,2]
See if that works for you! Just copy and paste that right into your console.
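For comparison, here is a small sketch in base R only, assuming the row labels can be stored as row names of a plain data.frame. With row names in place, indexing by name works directly:

```r
# Same values as in the question, built column by column
df <- data.frame(
  OA = c("-", "C", "A", "A"),
  OB = c("E", "-", "A", "B"),
  OC = c("C", "J", "-", "B"),
  OD = c("G", "L", "A", "A"),
  row.names = c("OA", "OB", "OC", "OD"),
  stringsAsFactors = FALSE
)

v1 <- df["OA", "OB"]   # row OA, column OB
v2 <- df["OD", "OB"]   # row OD, column OB
```

If df["OA", "OB"] returns nothing on the real data, it may be worth checking rownames(df) first; the labels might be stored in an ordinary column rather than as row names.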

R , Replicating the rownames in data.frame

I have a data.frame with dimensions [6587, 37], and the row names must repeat after every 18 rows. How can I do this in RStudio?
If your 18 row names are:
mynames <- c("a", "b", "c", "d", "e", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s")
You can get what you want with:
paste0(rep(mynames,length.out=6587),rep(1:366,each=18,length.out=6587))
Or you can modify the names pasting different things.
Row names in data.frames have to be unique.
> df <- data.frame(x = 1:2)
> rownames(df) <- c("a", "a")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘a’
You could use make.names to make the names unique, but still carry some repeating information.
> make.names(c("a","a"), unique = TRUE)
[1] "a" "a.1"
These could be identified with help from grep.
Or you could make a column in df, or a second data.frame, that holds the information.
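Both suggestions can be sketched on a toy example. This assumes a 7-row data.frame and 3 repeating labels instead of the 6587 rows and 18 names in the question; the column name label is made up for illustration.

```r
df <- data.frame(x = 1:7)
labels <- c("a", "b", "c")

# Option 1: keep the repeating label in an ordinary column
df$label <- rep(labels, length.out = nrow(df))

# Option 2: unique row names that still carry the repeating part
rownames(df) <- make.unique(rep(labels, length.out = nrow(df)))

# Rows belonging to label "a" can then be found with grep on the row names
a_rows <- grep("^a", rownames(df))
```

The column approach is usually easier to filter on, since it does not rely on parsing suffixes out of the row names.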

Filtering only unique value from multiple column in R

I have data like this:
X <- data.frame(fac_1 = c("A", "B", "C", "X", "Y"), fac_2 = c("B", "X", "P", "Q", "C"), fac_3 = c("C", "P", "Q", "T", "U"))
fac_1 fac_2 fac_3
A B C
B X P
C P Q
X Q T
Y C U
I want only those letters which are common
(1) between fac_1 and fac_2 (B, C and X) and
(2) among all of fac_1, fac_2 and fac_3 (only C).
You can use intersect:
intersect(intersect(X$fac_1, X$fac_2), X$fac_3)
#[1] "C"
intersect(X$fac_1, X$fac_2)
#[1] "B" "C" "X"
Alternatively, the function Reduce can be used, as described by @docendo discimus in the comments.
Reduce(intersect, X)
#[1] "C"
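If the common values are needed for every pair of columns, not just fac_1 and fac_2, one possible sketch is to combine intersect with combn. The names pairs and common are made up for illustration, and the columns are assumed to be character vectors.

```r
X <- data.frame(fac_1 = c("A", "B", "C", "X", "Y"),
                fac_2 = c("B", "X", "P", "Q", "C"),
                fac_3 = c("C", "P", "Q", "T", "U"),
                stringsAsFactors = FALSE)

# All pairs of column names: (fac_1, fac_2), (fac_1, fac_3), (fac_2, fac_3)
pairs <- combn(names(X), 2, simplify = FALSE)

# Intersect each pair of columns and label the result with the pair
common <- lapply(pairs, function(p) intersect(X[[p[1]]], X[[p[2]]]))
names(common) <- sapply(pairs, paste, collapse = " & ")

common
```

intersect keeps the order of its first argument, so the values come back in the order they appear in the first column of each pair.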

Create new variable condition on multiple variables R code

I have a data set named "dat".
TEAM1 TEAM2 WINNER
A P A
I S I
P S S
S I I
S P P
W P W
A E A
A S S
E A E
I want to create a variable "LOSER" using R code. I have tried this:
Loser <- NULL
for (i in 1: nrow(dat)){
if(match(dat$Team1[i],dat$Winner)==TRUE){
Loser[i] <- cricket$Team2[i]
}else if(match(dat$Team1[i],dat$Winner)==FALSE ){
Loser[i] <- dat$Team1[i]
}
}
But this does not give the expected result. What is wrong with this code?
Desired output:
TEAM1 TEAM2 WINNER LOSER
A P A P
I S I S
P S S P
S I I S
S P P S
W P W P
A E A E
A S S A
E A E A
We can get the desired output by comparing the 'TEAM1' column with the 'WINNER' column. Adding 1 to the comparison coerces 'FALSE/TRUE' to '1/2', which can be used as a column index: 2 ('TEAM2') when 'TEAM1' won, 1 ('TEAM1') otherwise. We can then cbind it with the row numbers and use the resulting matrix to extract the corresponding elements and create the 'LOSER' column.
dat$LOSER <- dat[cbind(1:nrow(dat), with(dat, TEAM1 == WINNER) + 1)]
dat$LOSER
#[1] "P" "S" "P" "S" "S" "P" "E" "A" "A"
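To see what the matrix index does, it may help to build it step by step on just the first and third rows of the data. This is only an illustration; the names small, col_idx and idx are made up.

```r
# Two rows taken from the question's data
small <- data.frame(TEAM1 = c("A", "P"),
                    TEAM2 = c("P", "S"),
                    WINNER = c("A", "S"),
                    stringsAsFactors = FALSE)

# TRUE + 1 = 2 (take TEAM2), FALSE + 1 = 1 (take TEAM1)
col_idx <- (small$TEAM1 == small$WINNER) + 1

# Each row of idx is a (row, column) pair
idx <- cbind(seq_len(nrow(small)), col_idx)

losers <- small[idx]   # "P" "P"
```

Indexing a data.frame with a two-column numeric matrix picks one cell per matrix row, which is why no loop is needed.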
NOTE: Modified based on @David Arenburg's comments. Also, in this dataset the 1st and 2nd columns are 'TEAM1' and 'TEAM2'. If we have a dataset with many columns and these are not in the 1st and 2nd positions, we can subset the dataset, as I showed in the comments, to keep only those two columns:
dat$LOSER <- dat[paste0('TEAM', 1:2)][cbind(1:nrow(dat),
with(dat, TEAM1==WINNER)+1L)]
Another option using data.table. For TRUE values in TEAM1==WINNER, we assign (:=) 'LOSER' as 'TEAM2'. Then, we replace the NA values in 'LOSER' with 'TEAM1'
library(data.table)
setDT(dat)[TEAM1==WINNER, LOSER:= TEAM2][is.na(LOSER), LOSER:= TEAM1]
dat
data
dat <- structure(list(TEAM1 = c("A", "I", "P", "S", "S", "W", "A", "A",
"E"), TEAM2 = c("P", "S", "S", "I", "P", "P", "E", "S", "A"),
WINNER = c("A", "I", "S", "I", "P", "W", "A", "S", "E")),
.Names = c("TEAM1",
"TEAM2", "WINNER"), class = "data.frame", row.names = c(NA, -9L))
I couldn't resist writing a dplyr version.
library(dplyr)
dat %>%
mutate(LOSER = ifelse(TEAM1 == WINNER, TEAM2, TEAM1))
TEAM1 TEAM2 WINNER LOSER
1 A P A P
2 I S I S
3 P S S P
4 S I I S
5 S P P S
6 W P W P
7 A E A E
8 A S S A
9 E A E A
