R Add column to Table from two tables based on column value - r

I have three tables that I'm attempting to merge into one.
The main table is similar to:
Table1 <- data.frame("Data" = c(1, 2, 3, 4, 5), "Desc" = c("A", "A", "A", "B", "B"))
TableA <- data.frame("Values" = c(6, 2, 3))
TableB <- data.frame("Values" = c(2, 7))
I want to add another column to Table1 with the values from TableA and TableB. Values from TableA must go in the rows that have "A" in the "Desc" column, and values from TableB in the rows that have "B". The number of rows in TableA equals the number of rows in Table1 with "A", and the same holds for TableB.
The resulting Table should look like:
Table1 <- data.frame("Data" = c(1, 2, 3, 4, 5), "Desc" = c("A", "A", "A", "B", "B"), "Values" = c(6, 2, 3, 2, 7))
> Table1
Data Desc Values
1 1 A 6
2 2 A 2
3 3 A 3
4 4 B 2
5 5 B 7

First, note that these are data.frames, not tables. A "table" is a different class in R. This strategy should work:
Table1$Values <- NA
Table1$Values[Table1$Desc=="A"] <- TableA$Values
Table1$Values[Table1$Desc=="B"] <- TableB$Values
Table1
# Data Desc Values
# 1 1 A 6
# 2 2 A 2
# 3 3 A 3
# 4 4 B 2
# 5 5 B 7
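The same idea can be written as a loop over a named list, which avoids repeating one assignment line per table. This is only a sketch: the list name value_tables is made up for illustration, and it assumes each list name matches a value in Desc and that the row counts agree, as stated in the question.

```r
Table1 <- data.frame(Data = c(1, 2, 3, 4, 5),
                     Desc = c("A", "A", "A", "B", "B"))
TableA <- data.frame(Values = c(6, 2, 3))
TableB <- data.frame(Values = c(2, 7))

# Hypothetical helper list: names must match the values in Table1$Desc
value_tables <- list(A = TableA, B = TableB)

Table1$Values <- NA_real_
for (key in names(value_tables)) {
  rows <- Table1$Desc == key
  # Guard against silent recycling if the row counts do not line up
  stopifnot(sum(rows) == nrow(value_tables[[key]]))
  Table1$Values[rows] <- value_tables[[key]]$Values
}
Table1
```

The stopifnot() check is optional, but it turns a length mismatch into an error instead of a recycled or partially filled column.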

If you have multiple tables (TableA, TableB, TableC, etc.) and need to match the suffix of each table name to the Desc column of Table1:
ls1 <- ls(pattern="Table")
ls1
#[1] "Table1" "TableA" "TableB"
library(stringr)
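# Current stringr uses ICU regular expressions, which support lookbehind directly,
# so the old perl() wrapper is not needed
indx <- str_extract(ls1[-1], '(?<=Table)[A-Z]')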
lst1 <- mget(ls1[-1])
do.call(rbind,
lapply(seq_along(lst1),function(i) {
x1 <- lst1[[i]]
x2 <- Table1[!is.na(match(Table1$Desc, indx[i])),]
x2$Values <- x1$Values
x2}
))
# Data Desc Values
#1 1 A 6
#2 2 A 2
#3 3 A 3
#4 4 B 2
#5 5 B 7
In the first step, after creating the Table objects, I looked up their names with ls(pattern="Table").
Then I extracted the suffix letters (A, B) that need to be matched. The regex lookbehind (?<=Table)[A-Z] matches an uppercase letter preceded by the string Table and extracts only that letter.
mget returns the values of the named objects as a list.
Finally, lapply loops over that list: for each table, the rows of Table1 whose Desc matches the extracted suffix are selected and the new Values column is added. do.call(rbind, ...) stacks the pieces back together.

Related

How to return 2 specific rows from a dataframe?

firstVector <- c("A", "B", "C", "D", "E")
secondVector <- c(1, 2, 3, 4, 5)
thirdVector <- c("a", "b", "c", "d", "e")
myDataFrame <- data.frame(firstVector, secondVector, thirdVector)
How do I extract rows 3 and 4 from my data frame? I want to print rows 3 and 4 so that the output looks like this:
firstVector secondVector thirdVector
3 C 3 c
4 D 4 d
You can subset your dataframe like this [rows,columns]:
myDataFrame[c(3,4),]
In your case you want rows 3 and 4, hence the vector c(3,4). You can add more row indices to the vector to select more rows, for example c(1,2,3,12).
If you leave an argument empty, the whole dimension is returned. In your example you subset the rows and return all the columns.
It's the same for columns:
myDataFrame[c(3,4),c(1,2)]
Here you subset rows 3 and 4 and columns 1 and 2.
Another way to build the index is the : operator:
1:4 means from 1 to 4, so c(1:4) is the same as c(1,2,3,4).
Hope this helps
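As a small illustration of the : operator mentioned above, here is a sketch using the data frame from the question. Selecting columns by name is an addition not covered in the answer, and the variable names rows_34 and rows_34_two_cols are made up for this example.

```r
myDataFrame <- data.frame(
  firstVector  = c("A", "B", "C", "D", "E"),
  secondVector = c(1, 2, 3, 4, 5),
  thirdVector  = c("a", "b", "c", "d", "e")
)

# 3:4 expands to c(3, 4), so this selects rows 3 and 4 and all columns
rows_34 <- myDataFrame[3:4, ]

# Columns can also be selected by name instead of by position
rows_34_two_cols <- myDataFrame[3:4, c("firstVector", "secondVector")]

rows_34
rows_34_two_cols
```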

Input value in an empty string in a column

I need to replace the empty parts of a column with a letter ("S").
titanic3$embarked <- as.character(titanic3$embarked)
> str(titanic3)
titanic3$embarked[is.na(titanic3$embarked)] <- "S"
> t1 <- select(titanic3, embarked)
> View(t1)
You could use the which function. My example:
# Create data frame with two columns: value and value2. The last has 2 NA values
df <- data.frame("value"= c(1, 3, 5, 7), "value2" = c("A", "B", NA, NA))
# Convert column to character
df$value2 <- as.character(df$value2)
# Find which value in column is NA and replace it with "S"
df$value2[which(is.na(df$value2))] <- "S"
And the output:
> df
value value2
1 1 A
2 3 B
3 5 S
4 7 S
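The question title mentions empty strings, while the example above handles NA. If the column might contain both, a sketch like the following covers the two cases in one step. The example data and the name is_missing are made up for illustration, and whether the real column holds "" or NA is an assumption that should be checked first.

```r
# Hypothetical data: one empty string and one NA in the same column
df <- data.frame(value  = c(1, 3, 5, 7),
                 value2 = c("A", "", NA, "B"),
                 stringsAsFactors = FALSE)

# TRUE where the entry is NA or an empty string
# (TRUE | NA evaluates to TRUE, so the NA row is handled correctly)
is_missing <- is.na(df$value2) | df$value2 == ""

df$value2[is_missing] <- "S"
df
```

A logical index without NA values can be used directly, so wrapping it in which() is optional here.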

Find the index of the row in data frame that contain one element in a string vector

If I have a data.frame like this
df <- data.frame(col1 = c(letters[1:4],"a"),col2 = 1:5,col3 = letters[10:14])
df
col1 col2 col3
1 a 1 j
2 b 2 k
3 c 3 l
4 d 4 m
5 a 5 n
I want to get the indices of the rows that contain one of the elements in c("a", "k", "n"); in this example, the result should be 1, 2, 5.
If you have a large data frame and you wish to check all columns, try this:
x <- c("a", "k", "n")
Reduce(union, lapply(x, function(a) which(rowSums(df == a) > 0)))
# [1] 1 5 2
and of course you can sort the end result.
s <- c('a','k','n');
which(df$col1%in%s|df$col3%in%s);
## [1] 1 2 5
Here's another solution. This one works on the entire data.frame and happens to capture the search strings as element names (you can get rid of those via unname()). Note that the [1] keeps only the first matching row for each search string:
sapply(s,function(s) which(apply(df==s,1,any))[1]);
## a k n
## 1 2 5
Original second solution:
sort(unique(rep(1:nrow(df),ncol(df))[as.matrix(df)%in%s]));
## [1] 1 2 5
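For comparison, here is a sketch of a column-wise variant that checks every column with %in% and combines the results with a logical OR. It is an alternative to the approaches above, not a replacement, and the name hit_rows is made up for this example.

```r
df <- data.frame(col1 = c(letters[1:4], "a"),
                 col2 = 1:5,
                 col3 = letters[10:14])
s <- c("a", "k", "n")

# One logical vector per column: TRUE where the cell is one of the search strings
hits_per_column <- lapply(df, function(col) col %in% s)

# Element-wise OR across columns, then the positions of the TRUE rows
hit_rows <- which(Reduce(`|`, hits_per_column))
hit_rows
```

Because the test is done per column, the data frame is never converted to a character matrix, and the result comes back already sorted.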

Select rows conditional on value of a column in specific order

I have a dataframe, for example like this:
df <- data.frame(ID=c(8, 2, 5, 1, 4), value=c("a", "b", "c", "d", "e"))
ID value
1 8 a
2 2 b
3 5 c
4 1 d
5 4 e
I know how to select rows with a given value in the "ID" column. But how to get rows conditional on their "ID"-values in a specified order?
Example: How to extract "value" for rows with ID 4, 2 and 5 in the given order? The result I want to get is "e", "b", "c".
Using %in% gives me the results in wrong order:
df[df$ID %in% c(4, 2, 5), "value"]
[1] b c e
Levels: a b c d e
I found a workaround using rownames, but I feel like there must be a better solution to this.
# workaround
rownames(df) <- df$ID
df[as.character(c(4, 2, 5)), "value"]
[1] e b c
Levels: a b c d e
Any suggestions?
Thank you!
You can use merge and order by a newly introduced rank column:
dat = merge(df,data.frame(ID=c(4,2,5),v=1:3))
dat[order(dat$v),"value"]
[1] e b c
Or a one-liner option:
with(merge(df,data.frame(ID=c(4,2,5),v=1:3)),value[order(v)])
sapply(c(4,2,5), function(x) df[df$ID==x,"value"])
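Another option, sketched here under the assumption that the ID values are unique, is base R's match(), which returns the position of each wanted ID in df$ID in the order requested. The names wanted and result are made up for this example.

```r
df <- data.frame(ID = c(8, 2, 5, 1, 4),
                 value = c("a", "b", "c", "d", "e"))

wanted <- c(4, 2, 5)

# match() gives, for each wanted ID, the row where it first occurs in df$ID
row_positions <- match(wanted, df$ID)

result <- df$value[row_positions]
result
```

If a requested ID is missing from df, match() returns NA for it, so the result will contain NA at that position rather than silently dropping the entry.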

Select row numbers of a data frame conditioning on another data frame

I have a data frame, and I want to find the numbers of its rows that also appear in another data frame.
To make the question clear, say I have data frame A and data frame B:
dfA <- data.frame(NAME = rep(c("a", "b"), each = 3),
TRIAL = rep(1:3, 2),
DATA = runif(6))
dfB <- data.frame(NAME = c("a", "b"),
TRIAL = c(2, 3))
dfA
# NAME TRIAL DATA
# 1 a 1 0.62948592
# 2 a 2 0.88041819
# 3 a 3 0.02479411
# 4 b 1 0.48031827
# 5 b 2 0.86591315
# 6 b 3 0.93448264
dfB
# NAME TRIAL
# 1 a 2
# 2 b 3
I want to get dfA's row number where dfA and dfB have the same NAME and TRIAL, in this case, row numbers are 2 and 6.
I tried the following code, which gives me rows 2, 3, 5, 6. It matches NAME and TRIAL separately, so it doesn't work.
which(dfA$NAME %in% dfB$NAME & dfA$TRIAL %in% dfB$TRIAL)
# 2 3 5 6
Then I tried creating a dummy column and matching on it. This works, but the code would be verbose if dfB has many columns...
dfA$dummy <- paste0(dfA$NAME, dfA$TRIAL)
dfB$dummy <- paste0(dfB$NAME, dfB$TRIAL)
which(dfA$dummy %in% dfB$dummy)
# 2 6
I'm wondering if there are better ways to solve the problem, thanks for your help!
You can do:
merge(transform(dfA, row.num = 1:nrow(dfA)), dfB)$row.num
# [1] 2 6
And if the whole goal of finding the indices is so that you can subset dfA, then you can just do merge(dfA, dfB).
Or use duplicated:
apply(dfB, 1, function(x)
which(duplicated(rbind(x, dfA[1:2])))-1)
# [1] 2 6
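The dummy-column idea from the question can also be generalized so that it does not grow with the number of columns in dfB. The sketch below builds one key per row from whatever columns dfB has. The helper make_key and the "\r" separator are choices made for this example, and the approach assumes the key columns print identically in both data frames.

```r
dfA <- data.frame(NAME  = rep(c("a", "b"), each = 3),
                  TRIAL = rep(1:3, 2),
                  DATA  = runif(6))
dfB <- data.frame(NAME  = c("a", "b"),
                  TRIAL = c(2, 3))

# Use every column of dfB as part of the key
key_cols <- names(dfB)

# Hypothetical helper: paste the key columns of each row into one string.
# A separator avoids ambiguous keys such as "a1" + "2" versus "a" + "12".
make_key <- function(d) {
  do.call(paste, c(unname(as.list(d[key_cols])), sep = "\r"))
}

matching_rows <- which(make_key(dfA) %in% make_key(dfB))
matching_rows
```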
