Say,
I create a dataframe as:
dataframe <- data.frame("x" = c("aaa/bbb", "ccc", "ddd/eee/fff"),
"y" = c(9,2,1),
"z" = c(7,5,8))
and another dataframe as
list <- data.frame("m" = c("ccc"))
then I can select the matching rows from the first dataframe as:
result<-merge(list,dataframe,by.x= "m",by.y="x")
but how can I match when my list dataframe is:
list <- data.frame("m" = c("fff","bbb"))
I am looking for a result like:
x y z
aaa/bbb 9 7
ddd/eee/fff 1 8
Thanks.
I think it's not a merge issue but a filter one. You can try this:
df1[grep(paste(df2$m, collapse = "|"), df1$x), ]
# x y z
# 1 aaa/bbb 9 7
# 3 ddd/eee/fff 1 8
It's not good practice to name variables after existing objects or functions, so I renamed your dataframe and list to df1 and df2.
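One caveat with the grep approach: each entry of df2$m is treated as a regular expression and matched as a substring, so "bbb" would also match a value like "abbbc". If only whole "/"-separated pieces should count, a minimal sketch (assuming "/" is always the separator, and using the df1/df2 names from above) could look like this:

```r
df1 <- data.frame(x = c("aaa/bbb", "ccc", "ddd/eee/fff"),
                  y = c(9, 2, 1),
                  z = c(7, 5, 8),
                  stringsAsFactors = FALSE)
df2 <- data.frame(m = c("fff", "bbb"), stringsAsFactors = FALSE)

# Split every x value on the literal "/" into its pieces
parts <- strsplit(df1$x, "/", fixed = TRUE)

# Keep a row if at least one of its pieces appears in df2$m
keep <- vapply(parts, function(p) any(p %in% df2$m), logical(1))

result <- df1[keep, ]
result
```

This trades the one-liner for exact matching; for the example data both approaches select the same two rows.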
I have a named list, such as the following:
> myNamedList
(...)
$`1870`
[1] 84.24639
$`1871`
[1] 84.59707
(...)
I would like to assign these values in a dataframe's column where the list element's name corresponds to the dataframe's row number. For now I am proceeding like this:
for (element in names(myNamedList)) {
targetDataFrame[as.numeric(element),][[columnName]] = myNamedList[[element]]
}
This is quite slow if the list is somewhat large, and also not very R-esque. I believe I could do something with apply, but am not sure where to look. Appreciate your help.
Add a row number to original data, then stack the list, then merge. See example:
# example
#data
set.seed(1); d <- data.frame(x = sample(LETTERS, 5))
#named list
x <- list("2" = 11, "4" = 22)
#add a row number
d$rowID = seq(nrow(d))
# stack the list, and merge
merge(d, stack(x), by.x = "rowID", by.y = "ind", all.x = TRUE)
# rowID x values
# 1 1 Y NA
# 2 2 D 11
# 3 3 G NA
# 4 4 A 22
# 5 5 B NA
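If the list names really are row numbers, the merge can probably be skipped altogether. A minimal sketch, assuming every name converts cleanly to a valid row index and every element holds a single number (the column name values is just reused from the output above):

```r
set.seed(1)
d <- data.frame(x = sample(LETTERS, 5))
x <- list("2" = 11, "4" = 22)

# Start with an all-NA column, then fill only the rows named in the list
d$values <- NA_real_
d$values[as.integer(names(x))] <- unlist(x, use.names = FALSE)
d
```

This is a single vectorised assignment, so it avoids both the loop and the extra rowID column.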
Imagine that I have a list
l <- list("a" = 1, "b" = 2)
and a data frame
id value
a 3
b 4
I want to match id with the list names and apply a function to the matching list element and the value in the data frame. For example, if I want the sum of value in the data frame and the corresponding value in the list, I get
id value
a 4
b 6
Does anyone have a clue?
Edit:
I want to expand the question a little. Now I have more than one value in every element of the list.
l <- list("a" = c(1, 2), "b" =c(1, 2))
I still want the sum
id value
a 6
b 7
We can match the names of the list with id of dataframe, unlist the list accordingly and add it to value
df$value <- unlist(l[match(df$id, names(l))]) + df$value
df
# id value
#1 a 4
#2 b 6
EDIT
If we have multiple entries in list we need to sum every list after matching. We can do
df$value <- df$value + sapply(l[match(df$id, names(l))], sum)
df
# id value
#1 a 6
#2 b 7
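For completeness, here is a self-contained version of the multi-value case that can be pasted and run. The question does not show how df was built, so the data.frame() call below is an assumption; vapply is swapped in for sapply only because it checks that each sum is a single number.

```r
l <- list(a = c(1, 2), b = c(1, 2))
# Assumed construction of the data frame from the question
df <- data.frame(id = c("a", "b"), value = c(3, 4),
                 stringsAsFactors = FALSE)

# Reorder the list to follow df$id, then sum each element
add <- vapply(l[match(df$id, names(l))], sum, numeric(1))

df$value <- df$value + unname(add)
df
```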
You just need
df$value=df$value+unlist(l)[df$id] # a named vector can be indexed directly by name
df
id value
1 a 4
2 b 6
The same indexing works when the list is in a different order (compare with Ronak's answer):
l <- list("b" = 2, "a" = 1)
unlist(l)[as.character(df$id)] # use as.character() if id in df is a factor
a b
1 2
Update
df$value=df$value+unlist(lapply(l,sum))[df$id]
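The factor remark matters more than it may look: indexing with a factor uses its integer codes, not its labels, so the lookup can silently return the wrong elements. A small sketch (the id vector here is made up for illustration):

```r
l <- list(b = 2, a = 1)
id <- factor(c("a", "b"))   # levels "a", "b" have codes 1, 2

# Indexing by the factor picks positions 1 and 2, i.e. b then a
by_code <- unlist(l)[id]

# Converting to character first looks the values up by name
by_name <- unlist(l)[as.character(id)]

by_code
by_name
```

So wrapping the id column in as.character() is a cheap safeguard whenever you are not sure how the data frame was created.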
I have a toy example to explain what I am trying to work on :
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
I managed to assign unique ids to column y, and the output now looks like:
aski2 = data.frame(x=c("a","b","c","a","d","d"),y=c("1","2","3","2","1","4"))
As you can see, "b" is present in both col x and col y, and it was assigned id=1 in col y; "a" was assigned id=2 in col y, and so on.
These values are also present in col x. For example, col x has "a" as its first element; "a" was also in col y with id=2, so I will assign id=2 to "a" in col x as well.
What I am trying to do next is look up each value of col x and, if it occurs in col y, assign it the same id.
The final dataframe should look like:
aski3 = data.frame(x=c("2","1","4","2","3","3"),y=c("1","2","3","2","1","4"))
Without the need to create aski2 as an intermediate, a possible solution is to use match with lapply to get the numeric representations of the letters:
# create a vector of the unique values in the order
# in which you want them assigned to '1' till '4'
v <- unique(aski$y)
# convert both columns to integer values with 'match' and 'lapply'
aski[] <- lapply(aski, match, v)
which gives:
> aski
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
If you want the number as characters, you can additionally do:
aski[] <- lapply(aski, as.character)
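If you would rather keep the original aski untouched, the same mapping can be written as a named lookup vector. This is only a sketch of an equivalent idea; the names ids and aski3 are mine, and it assumes every value in x also occurs somewhere in y:

```r
aski <- data.frame(x = c("a", "b", "c", "a", "d", "d"),
                   y = c("b", "a", "d", "a", "b", "c"),
                   stringsAsFactors = FALSE)

# Letters in order of first appearance in y, named so they can be looked up
v <- unique(aski$y)
ids <- setNames(seq_along(v), v)

# Index the lookup vector by letter to get the id for each cell
aski3 <- data.frame(x = unname(ids[aski$x]),
                    y = unname(ids[aski$y]))
aski3
```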
First, convert both columns to character vectors.
Then, collect all unique values from the two columns to use as levels of a factor.
Convert both columns to factors, then numeric.
aski = data.frame(x=c("a","b","c","a","d","d"),y=c("b","a","d","a","b","c"))
aski$x <- as.character(aski$x)
aski$y <- as.character(aski$y)
lev <- unique(c(aski$y, aski$x))
aski$x <- factor(aski$x, levels=lev)
aski$y <- factor(aski$y, levels=lev)
aski$x <- as.numeric(aski$x)
aski$y <- as.numeric(aski$y)
aski
A solution using dplyr. We first create a vector vec with unique(aski$y) that maps each letter to its index. After this step, you can use Jaap's lapply solution, or you can use mutate_all from dplyr as follows.
# Create the vector showing the relationship of index and letter
vec <- unique(aski$y)
# View vec
vec
[1] "b" "a" "d" "c"
library(dplyr)
# Modify all columns (a formula is used because funs() is deprecated in dplyr >= 0.8)
aski2 <- aski %>% mutate_all(~ match(., vec))
# View the results
aski2
x y
1 2 1
2 1 2
3 4 3
4 2 2
5 3 1
6 3 4
Data
aski <- data.frame(x = c("a","b","c","a","d","d"),
y = c("b","a","d","a","b","c"),
stringsAsFactors = FALSE)
I'm trying to convert a dataframe consisting of two columns into a named vector (nested list). The information in each row is essentially key:value pairs, so the lists in the final vector should each be named by the keys and contain their respective values.
Example input:
Var1 Var2
A 1
A 2
B 1
B 3
C 3
C 4
C 5
Example Output:
namedArray = list(A = c(1,2), B = c(1,3), C = c(3,4,5))
I managed to do this using dcast() in the reshape2 package, however this required additional post-processing to remove row names and NA's introduced by casting the data frame.
Is there a more efficient way to accomplish this?
If you have 2 columns: X and Y in dataframe df1, and you want Y's values to be the names of items with values from X:
myList <- as.list(df1$X)
names(myList) <- df1$Y
For the modified question, the answer is that there is already a function that does exactly that (and it might have been a better answer than what I gave):
> split(dat$Var2, dat$Var1)
$A
[1] 1 2
$B
[1] 1 3
$C
[1] 3 4 5
Thank you @42- and @MMerry for getting me to think about split(). I found a nice solution: split one variable by the other and wrap the output in a list.
y <- as.list(split(df$Var2, df$Var1))
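To make this easy to try, here is the example input from the question rebuilt as a data frame (the name dat and the data.frame() call are assumptions; the question only shows the printed table). Note that split() already returns a list, so the as.list() wrapper is optional.

```r
dat <- data.frame(Var1 = c("A", "A", "B", "B", "C", "C", "C"),
                  Var2 = c(1, 2, 1, 3, 3, 4, 5),
                  stringsAsFactors = FALSE)

# One list element per key in Var1, holding that key's Var2 values
namedArray <- split(dat$Var2, dat$Var1)

is.list(namedArray)
namedArray$C
```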
If you want key-value pairs in a list from a data frame, a technique could look like this:
x = data.frame(x=letters[1:5],y=1:5)
y = split(x,seq_len(nrow(x)))
names(y) = x$x
y$a
I am new to R and I am having trouble with an operation that I did all the time in Python.
I have two data frames (database and creditIDs), and what I want to do is compare one column in database with one column in creditIDs. More specifically, if a value in database[,5] does not exist in creditIDs[,1], I want to delete that entire row in database.
Here is the code:
for (i in 1:lengthColumns){
if (!(database$credit_id[i] %in% creditosVencidos)){
database[i,]<-database[-i,]
}
}
But I keep on getting this error:
50: In `[<-.data.frame`(`*tmp*`, i, , value = structure(list( ... :
replacement element 50 has 9696 rows to replace 1 rows
Could someone explain why this is happening? Thanks!
The which() function returns the row indices that satisfy a logical condition, much like numpy.where() in Python. Using $ after a dataframe with a column name gives you that column as a vector; alternatively you could use d[, column_number].
In this example I'm creating an x and a y column which share the first five values, and I use which() to keep only the rows where the two are equal:
L3 <- LETTERS[1:3]
fac <- sample(L3, 10, replace = TRUE)
(d <- data.frame(x = rep(1:5, 2), y = 1:10, fac = fac))
d = d[which(d$x == d$y),]
d
x y fac
1 1 A
2 2 B
3 3 C
4 4 B
5 5 B
You will need to adjust this for your column names/numbers.
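One thing to watch for if you adapt this to delete rows with a negative index: when nothing matches, which() returns an empty vector, and -integer(0) selects no rows at all rather than all of them. A small sketch with made-up data:

```r
d <- data.frame(x = 1:3)

# No row satisfies the condition, so idx is integer(0)
idx <- which(d$x > 5)

# Negative indexing with an empty vector drops every row
dropped_all <- d[-idx, , drop = FALSE]

# Negating the logical condition keeps all rows, as intended
kept_all <- d[!(d$x > 5), , drop = FALSE]

nrow(dropped_all)
nrow(kept_all)
```

Keeping rows with which() as in the example above is fine; the surprise only shows up when removing rows with -which().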
# Create two example data.frames
creditID <- data.frame(ID = c("896-19", "895-8", "899-1", "899-5"))
database <- data.frame(ID = c("896-19", "camel", "899-1", "goat", "899-1"))
# Method 1
database[database$ID %in% creditID$ID, ]
# Method 2 (subset() function)
database <- subset(database, ID %in% creditID$ID)
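As for why the loop in the question fails: database[-i,] is the whole data frame minus one row, and the loop tries to squeeze all of those rows into the single row database[i,], which is what the "has 9696 rows to replace 1 rows" message is complaining about. Building one logical vector and subsetting once avoids the loop entirely. A sketch using the question's own names (the amount column and the sample values are made up):

```r
database <- data.frame(credit_id = c("896-19", "camel", "899-1", "goat", "899-1"),
                       amount = 1:5,
                       stringsAsFactors = FALSE)
creditosVencidos <- c("896-19", "895-8", "899-1", "899-5")

# TRUE for rows whose credit_id appears in creditosVencidos
keep <- database$credit_id %in% creditosVencidos

# Subset once instead of deleting row by row inside a loop
database <- database[keep, , drop = FALSE]
database
```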