partial match a dataframe column to pick rows of interest

partial match a dataframe column to pick rows of interest - r

Say,
I create a dataframe as:
dataframe <- data.frame("x" = c("aaa/bbb", "ccc", "ddd/eee/fff"),
"y" = c(9,2,1),
"z" = c(7,5,8))
and another dataframe as
list <- data.frame("m" = c("ccc"))
then I can select the matches rows from first dataframe as:
result<-merge(list,dataframe,by.x= "m",by.y="x")
but how can I match when my list dataframe is:
list <- data.frame("m" = c("fff","bbb"))
I am looking for a results like:
x y z
aaa/bbb 9 7
ddd/eee/fff 1 8
Thanks.

I think it's not a merge issue but a filter one. You can try this:
df1[grep(paste(df2$m, collapse = "|"), df1$x), ]
# x y z
# 1 aaa/bbb 9 7
# 3 ddd/eee/fff 1 8
It's not a good habit to assign variables with existing object or function names. So I change your dataframe and list to df1 and df2.

Related

Assign value in dataframe from list by list's element name = dataframe row number

I have a name list, such as the following:
> myNamedList
(...)
$`1870`
[1] 84.24639
$`1871`
[1] 84.59707
(...)
I would like to assign these values in a dataframe's column where the list element's name corresponds to the dataframe's row number. For now I am proceeding like this:
for (element in names(myNamedList)) {
targetDataFrame[as.numeric(element),][[columnName]] = myNamedList[[element]]
}
This is quite slow if the list is somewhat large, and also not very R-esque. I believe I could do something with apply, but am not sure where to look. Appreciate your help.

Add a row number to original data, then stack the list, then merge. See example:
# example
#data
set.seed(1); d <- data.frame(x = sample(LETTERS, 5))
#named list
x <- list("2" = 11, "4" = 22)
#add a row number
d$rowID = seq(nrow(d))
# stack the list, and merge
merge(d, stack(x), by.x = "rowID", by.y = "ind", all.x = TRUE)
# rowID x values
# 1 1 Y NA
# 2 2 D 11
# 3 3 G NA
# 4 4 A 22
# 5 5 B NA

apply function by name of list

Imagine that I have a list
l <- list("a" = 1, "b" = 2)
and a data frame
id value
a 3
b 4
I want to match id with list names, and apply a function on that list with the value in data frame. For example, I want the sum of value in the data frame and corresponding value in the list, I get
id value
a 4
b 6
Anyone has a clue?
Edit:
A.
I just want to expand the question a little bit with. Now, I have more than one value in every elements of list.
l <- list("a" = c(1, 2), "b" =c(1, 2))
I still want the sum
id value
a 6
b 7

We can match the names of the list with id of dataframe, unlist the list accordingly and add it to value
df$value <- unlist(l[match(df$id, names(l))]) + df$value
df
# id value
#1 a 4
#2 b 6
EDIT
If we have multiple entries in list we need to sum every list after matching. We can do
df$value <- df$value + sapply(l[match(df$id, names(l))], sum)
df
# id value
#1 a 6
#2 b 7

You just need
df$value=df$value+unlist(l)[df$id]# vector have names can just order by names
df
id value
1 a 4
2 b 6
Try answer with Ronak
l <- list("b" = 2, "a" = 1)
unlist(l)[as.character(df$id)]# if you id in df is factor
a b
1 2
Update
df$value=df$value+unlist(lapply(l,sum))[df$id]

How to look for uniques in other column relatively assign ids

I have a toy example to explain what I am trying to work on :
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
I managed to do assigning unique ids to column y and now output looks like:
aski2 = data.frame(x=c("a","b","c","a","d","d"),y=c("1","2","3","2","1","4"))
as you see "b" is present in both col x and y and we assigned an id=1 in col y
and "a" with id=2 in col y and so on..
As you see these values are also present in col x.....
col x has "a" as its first element ."a" was also in col y and assigned an id=2
so I'll assign an id=2 for a in col x also
Now what i m trying to do next is look for these values in col x and if it occurs in col y I assign that id to it
FINAL DATAFRAME LIKE
aski3 = data.frame(x=c("2","1","4","2","3","3"),y=c("1","2","3","2","1","4"))

Without the need to create aski2 as an intermediate, a possible solution is to use match with lapply to get the numeric representations of the letters:
# create a vector of the unique values in the order
# in which you want them assigned to '1' till '4'
v <- unique(aski$y)
# convert both columns to integer values with 'match' and 'lapply'
aski[] <- lapply(aski, match, v)
which gives:
> aski
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
If you want the number as characters, you can additionally do:
aski[] <- lapply(aski, as.character)

First, convert both columns to character vectors.
Then, collect all unique values from the two columns to use as levels of a factor.
Convert both columns to factors, then numeric.
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
aski$x <- as.character(aski$x)
aski$y <- as.character(aski$y)
lev <- unique(c(aski$y, aski$x))
aski$x <- factor(aski$x, levels=lev)
aski$y <- factor(aski$y, levels=lev)
aski$x <- as.numeric(aski$x)
aski$y <- as.numeric(aski$y)
aski

A solution from dplyr. We can first create a vector showing the relationship between index and letter as vec by unique(aski$y). After this step, you can use Jaap's lapply solution, or you can use mutata_all from dplyr as follows.
# Create the vector showing the relationship of index and letter
vec <- unique(aski$y)
# View vec
vec
[1] "b" "a" "d" "c"
library(dplyr)
# Modify all columns
aski2 <- aski %>% mutate_all(funs(match(., vec)))
# View the results
aski2
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
Data
aski <- data.frame(x = c("a","b","c","a","d","d"),
y = c("b","a","d","a","b","c"),
stringsAsFactors = FALSE)

How to convert two columns of dataframe into named vector?

I'm trying to convert a dataframe consisting of two columns into a named vector (nested list). The information in each row is essentially key:value pairs, so the lists in the final vector should each be named by the keys and contain their respective values.
Example input:
Var1 Var2
A 1
A 2
B 1
B 3
C 3
C 4
C 5
Example Output:
namedArray = list(A = c(1,2), B = c(1,3), C = c(3,4,5))
I managed to do this using dcast() in the reshape2 package, however this required additional post-processing to remove row names and NA's introduced by casting the data frame.
Is there a more efficient way to accomplish this?

If you have 2 columns: X and Y in dataframe df1, and you want Y's values to be the names of items with values from X:
myList <- as.list(df1$X)
names(myList) <- df1$Y
For the modified question, the answer is that there is already a functions that does exactly that ( and might have been a better answer that what I gave:
> split(dat$Var2, dat$Var1)
$A
[1] 1 2
$B
[1] 1 3
$C
[1] 3 4 5

Thank you #42- and #MMerry for getting me to think about split(). I found a nice solution splitting one variable by the other and wrapping the output into a list.
y <- as.list(split(df$Var2, df$Var1))

If you want key value pairs in a list from a data frame a technique could look like this:
x = data.frame(x=letters[1:5],y=1:5)
y = split(x,seq(1:nrow(x)))
names(y) = x$x
y$a

Comparing two columns

I am new to R and I am trouble with a command that I did all the time in Python.
I have two data-frames (database and creditIDs), and what I want to do is compare one column in database and one column in creditIDs. More specifically in a value exists in creditIDs[,1] but doesn't in database[,5], I want to delete that entire row in database.
Here is the code:
for (i in 1:lengthColumns){
if (!(database$credit_id[i] %in% creditosVencidos)){
database[i,]<-database[-i,]
}
}
But I keep on getting this error:
50: In `[<-.data.frame`(`*tmp*`, i, , value = structure(list( ... :
replacement element 50 has 9696 rows to replace 1 rows
Could someone explain why this is happening? Thanks!

the which() command will return the row indices that satisfy a boolean statement, much like numpy.where() in python. Using the $ after a dataframe with a column name gives you a vector of that column... alternatively you could do d[,column_number].
In this example I'm creating an x and y column which share the first five values, and use which() to slice the dataframe on their by-row equality:
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = rep(1:5, 2), y = 1:10, fac = fac))
d = d[which(d$x == d$y),]
d
x y fac
1 1 A
2 2 B
3 3 C
4 4 B
5 5 B

You will need to adjust this for your column names/numbers.
# Create two example data.frames
creditID <- data.frame(ID = c("896-19", "895-8", "899-1", "899-5"))
database <- data.frame(ID = c("896-19", "camel", "899-1", "goat", "899-1"))
# Method 1
database[database$ID %in% creditID$ID, ]
# Method 2 (subset() function)
database <- subset(database, ID %in% creditID$ID)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

partial match a dataframe column to pick rows of interest - r

I think it's not a merge issue but a filter one. You can try this: df1[grep(paste(df2$m, collapse = "|"), df1$x), ] # x y z # 1 aaa/bbb 9 7 # 3 ddd/eee/fff 1 8 It's not a good habit to assign variables with existing object or function names. So I change your dataframe and list to df1 and df2.

Related

Assign value in dataframe from list by list's element name = dataframe row number

apply function by name of list

How to look for uniques in other column relatively assign ids

How to convert two columns of dataframe into named vector?

Comparing two columns

Categories

Resources