R: Call a Specific Column - r

I am trying to pickup specific value by row name and column name.
I have a df look like this and I am not allowed to change the column name.
OA OB OC OD
OA - E C G
OB C - J L
OC A A - A
OD A B B A
For example, row=OA and col=OB will return E, row=OD and col=OB will return B.
I have tried df["OA", "OB"] which is in below, but it didn't return anything.

Are you allowed to subset using coordinates?
Such as...
OAOB<-df[1,2]
##this will give you the value "E"
Use the name of your df, and then [row,col] coordinates

Are you using the data.table library?
#Opens data.table if not already open
require(data.table)
#Just creates the data as above
OA <- c("-", "C", "A", "A")
OB <- c("E", "-", "A", "B")
OC <- c("C", "J", "-", "B")
OD <- c("G", "L", "A", "A")
#Puts it all together
DT <- as.data.table(cbind(OA, OB, OC, OD))
#Makes the proper row names
row.names(DT) <- c("OA", "OB", "OC", "OD")
View(DT)
#Row OA, Col OB
DT[1,2]
#Row OD, Col OB
DT[4,2]
See if that works for you! Just copy and paste that right into your console.

Related

Looping in R with dynamic variables as dataframe names

I am trying to loop through dataframes where my search variable is in the name of the dataframe. Here I have multiple dataframes beginning with "person", "place", or "thing" and ending with either "5" or "8." I would like to loop through the many combinations of beginning and ending to create a temporary dataframe. The temporary dataframe will be used to create a plot and save the plot.
When I try my current code, I'm able to get the variable name to loop correctly (in other words, I can get "person_odds5" or "place_odds5"), but I cannot use those variables to access the corresponding column in the dataframe.
My current code is:
person_odds5 <- data.frame(odds=c("a", "b", "c", "d"), or_lci95=1:4, or_uci95=11:14, id.exposure=c("f", "g", "h", "i"), id.outcome=c("w", "x", "y", "z"))
place_odds5 <- data.frame(odds=c("a", "b", "c", "d"), or_lci95=5:8, or_uci95=15:18, id.exposure=c("f", "g", "h", "i"), id.outcome=c("w", "x", "y", "z"))
thing_odds5 <- data.frame(odds=c("a", "b", "c", "d"), or_lci95=9:12, or_uci95=19:22, id.exposure=c("f", "g", "h", "i"), id.outcome=c("w", "x", "y", "z"))
nouns <- list("person", "place", "thing")
for (x in nouns) {
pval <- c(5)
for (p in pval) {
name <- paste(x,"_odds",p, sep="")
odds <- paste(name,"$odds", sep="")
temp_dat <- data.frame(odds=odds, index=1:nrow(name))
}
}
When I run this code, my output for "name" is "person_odds5" as character type; my output for "odds" is "person_odds5$odds" as character type, and I encounter "Error in 1:nrow(name) : argument of length 0." Basically, it appears that I can't parse my name assignment through the original dataframe.
Input:
>person_odds5
odds or_lci95 or_uci95 id.exposure id.outcome
1 a 1 11 f w
2 b 2 12 g x
3 c 3 13 h y
4 d 4 14 i z
>
Desired output:
>temp_dat
odds index
1 a 1
2 b 2
3 c 3
4 d 4
>

How do I Identify by row id the values in a data frame column not in another data frame column?

How do I identify by row id the values in data frame d2 column c3 that are not in data frame d1 column c1? My which function returns all records when sub-setting as shown. My requirement is to follow this sub set structure and not value$field design which works:
c1 <- c("A", "B", "C", "D", "E")
c2 <- c("a", "b", "c", "d", "e")
c3 <- c("A", "z", "C", "z", "E", "F")
c4 <- c("a", "x", "x", "d", "e", "f")
d1 <- data.frame(c1, c2, stringsAsFactors = F)
d2 <- data.frame(c3, c4, stringsAsFactors = F)
x <- unique(d1["c1"])
y <- d2[,"c3"]
id <- which(!(y %in% x) ) # incorrect, all row ids returned
I am trying to find the id's of rows in y where the specified column does not include values of x
I believe setdiff would work here. I see z and F are what you want, right? They are not in d1[,"c1"] but are in d2[,"c3"]
includes <- setdiff(d2[,"c3"], d1[,"c1"])
d2_new <- d2[d2[,"c3"] %in% includes,]
d2_new$id <- rownames(d2_new)
d2_new
# or
ids <- rownames(d2[d2[,"c3"] %in% includes,])
output
d2_new
# c3 c4 id
#2 z x 2
#4 z d 4
#6 F f 6
ids
#[1] "2" "4" "6"
I had the same problem, and this code worked for me. However, indexing did not work for me. With a slight change it worked perfect.
includes <- setdiff(d2$c3, d1$c3)
d2_new <- d2[d2$c3 %in% includes,]
d2_new$id <- rownames(d2_new)
d2_new
thank you #jpsmith

How can I make a data.frame using chr vertor [duplicate]

This question already has answers here:
aggregating unique values in columns to single dataframe "cell" [duplicate]
(2 answers)
Closed 2 years ago.
I have the data.frame below.
> Chr Chr
> A E
> A F
> A E
> B G
> B G
> C H
> C I
> D E
and... I want to convert the dataset as belows as you may be noticed.
I want to coerce all chr vectors into an row.
chr chr
A E,F
B G
C H,I
D E
they are all characters, so I tried to do several things so that I want to make.
Firstly, I used unique function for FILTER <- unique(chr[,15])1st column and try to subset them using
FILTER data that I created using rbind or bind rows function.
Secondly, I tested to check whether my idea works or not
FILTER <- unique(Top[,15])
NN <- data.frame()
for(i in 1 :nrow(FILTER)){
result = unique(Top10Data[TGT == FILTER[i]]$`NM`))
print(result)
}
to this stage, it seems to be working well.
The problem for me is that when I used both functions, the data frame only creates 1 column and ignored the others vector (2nd variables from above data.frame) all.
Only For the chr [1,1], those functions do work well, but I have chr vectors such as chr[1,n], which is unable to be coerced.
here's my code for your reference.
FILTER <- unique(Top[,15])
NN <- data.frame()
for(i in 1 :nrow(FILTER)){
CGONM <- rbind(NN,unique(Top10Data[TGT == FILTER[i]]$`NM`))
}
Base R solutions:
# Solution 1:
df_str_agg1 <- aggregate(var2~var1, df, FUN = function(x){
paste0(unique(x), collapse = ",")})
# Solution 2:
df_str_agg2 <- data.frame(do.call("rbind",lapply(split(df, df$var1), function(x){
data.frame(var1 = unique(x$var1),
var2 = paste0(unique(x$var2), collapse = ","))
}
)
),
row.names = NULL
)
Tidyverse solution:
library(tidyverse)
df_str_agg3 <-
df %>%
group_by(var1) %>%
summarise(var2 = str_c(unique(var2), collapse = ",")) %>%
ungroup()
Data:
df <- data.frame(var1 = c("A", "A", "A", "B", "B", "C", "C", "D"),
var2 = c("E", "F", "E", "G", "G", "H", "I", "E"), stringsAsFactors = FALSE)

search for next closest element not in a list

I am trying to replace 2 alphabets (repeats ) from vector of 26 alphabets.
I already have 13 of 26 alphabets in my table (keys), so replacement alphabets should not be among those 13 'keys'.
I am trying to write code to replace C & S by next present alphabet which should not be part of 'keys'.
The following code is replacing repeat C by D and S by T, but those both letters are in my 'keys'. Could someone know how I can implement condition so that code will re-run loop if letter to be replace is already present in 'key'?
# alphabets <- toupper(letters)
keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
repeats <- c("C", "S")
index_of_repeat_in_26 <- which(repeats %in% alphabets)
# index_of_repeat_in_26 is 3 , 19
# available_keys <- setdiff(alphabets,keys)
available <- alphabets[available_keys]
# available <- c("B", "F", "G", "K", "O", "Q", "U", "V", "W", "Y", "Z")
index_available_keys <- which(alphabets %in% available_keys)
# 2 6 7 11 15 17 21 22 23 25 26
for (i in 1:length(repeat)){
for(j in 1:(26-sort(index_of_repeat_in_26)[1])){
if(index_of_repeat_in_26[i]+j %in% index_available_keys){
char_to_replace_in_key[i] <- alphabets[index_of_capital_repeat_in_26[i]+1]
}
else{
cat("\n keys not available to replace \n")
}
}
}
keys <- c("I", "C", "P", "X", "H", "J", "S", "E", "T", "D", "A", "R", "L")
repeats <- c("C", "S")
y = sort(setdiff(LETTERS, keys)) # get the letters not present in 'keys'
y = factor(y, levels = LETTERS) # make them factor so that we can do numeric comparisons with the levels
y1 = as.numeric(y) # keep them numeric to compare
z = factor(repeats, levels = LETTERS)
z1 = as.numeric(z)
func <- function(x) { # so here, in each iteration, the index(in this case 1:4 gets passed)
xx = y1 - z1[x] # taking the difference between each 'repeat' element from all 'non-keys'
xx = which(xx>0)[1]# choose the one with smallest difference(because 'y1' is already sorted. So the first nearest non-key gets selected
r = y[xx] # extract the corresponding 'non-key' element
y <<- y[-xx] # after i get the closest letter, I remove that from global list so that it doesn't get captured the next time
y1 <<- y1[-xx] # similarily removed from the equivalent numeric list
r # return the extracted 'closest non-key' chracter
}
# sapply is also a for-loop by itself, in which a single element get passed ro func at a time.
# Here 'seq_along' is used to pass the index. i.e. for 'C' - 1, for 'S' - 2 , etc gets passed.
ans = sapply(seq_along(repeats), func)
if (any(is.na(ans))){
cat("\n",paste0("keys not available to replace for ",
paste0(repeats[which(is.na(ans))], collapse = ",")) ,
"\n")
ans <- ans[!is.na(ans)]
}
# example 2 with :
repeats <- c("Y", "Z")
# output :
# keys not available to replace for Z
# ans
# [1] Z
Note : to understand how each ieration of sapply() works : you should run debug(func) and then run the sapply() call. You can then check on console how each variable xx, r is getting evaluated. Hope this helps!

Create new variable condition on multiple variables R code

I have a data set named "dat".
TEAM1 TEAM2 WINNER
A P A
I S I
P S S
S I I
S P P
W P W
A E A
A S S
E A E
I want to create variable "LOSER" using R code. I have tried like this
Loser <- NULL
for (i in 1: nrow(dat)){
if(match(dat$Team1[i],dat$Winner)==TRUE){
Loser[i] <- cricket$Team2[i]
}else if(match(dat$Team1[i],dat$Winner)==FALSE ){
Loser[i] <- dat$Team1[i]
}
}
But this does not give exact result. What is wrong with this code?
Desired out put:
TEAM1 TEAM2 WINNER LOSER
A P A P
I S I S
P S S P
S I I S
S P P S
W P W P
A E A E
A S S A
E A E A
We can get the desired output by comparing the 'TEAM1' with the 'WINNER' column. Add 1 to it to coerce 'FALSE/TRUE' to '1/2'. This can be used as a column index. We can then cbind with row number and get the corresponding elements to create the 'LOSER' column
dat$LOSER <- dat[cbind(1:nrow(dat), with(dat, TEAM1 == WINNER) + 1)]
dat$LOSER
#[1] "P" "S" "P" "S" "S" "P" "E" "A" "A"
NOTE: Modified based on #David Arenburg's comments. Also, in the dataset, 1st and 2nd columns were the 'TEAM1' and 'TEAM2'. If we have a dataset with many columns and these are not in the 1st and 2nd positions, we can subset the dataset as I showed in the comments to have only two columns
dat$LOSER <- dat[paste0('TEAM', 1:2)][cbind(1:nrow(dat),
with(dat, TEAM1==WINNER)+1L)]
Another option using data.table. For TRUE values in TEAM1==WINNER, we assign (:=) 'LOSER' as 'TEAM2'. Then, we replace the NA values in 'LOSER' with 'TEAM1'
library(data.table)
setDT(dat)[TEAM1==WINNER, LOSER:= TEAM2][is.na(LOSER), LOSER:= TEAM1]
dat
data
dat <- structure(list(TEAM1 = c("A", "I", "P", "S", "S", "W", "A", "A",
"E"), TEAM2 = c("P", "S", "S", "I", "P", "P", "E", "S", "A"),
WINNER = c("A", "I", "S", "I", "P", "W", "A", "S", "E")),
.Names = c("TEAM1",
"TEAM2", "WINNER"), class = "data.frame", row.names = c(NA, -9L))
I was unable to resist to write a dplyr way.
library(dplyr)
dat %>%
mutate(LOSER = ifelse(TEAM1 == WINNER, TEAM2, TEAM1))
TEAM1 TEAM2 WINNER LOSER
1 A P A P
2 I S I S
3 P S S P
4 S I I S
5 S P P S
6 W P W P
7 A E A E
8 A S S A
9 E A E A

Resources