I have two data frames. The first one is Jobs:
Jobs <- data.frame(Company = c("A","B"), Name = c("Peter","Peter"), Job = c("CEO","Member of the Board"))
The second one is Media_Appearence:
Media_Appearence <- data.frame(Company = c("A","A","B","B","A","A","A","A"),Name = c("Peter","Peter","Peter","Peter","Peter","Peter","Peter","Peter"))
Is there any way, using basic R commands, to insert a new column into Media_Appearence and fill it by matching on the first two columns (Company and Name) and returning the corresponding Job? The desired output would be:
Media_Appearence <- data.frame(Company = c("A","A","B","B","A","A","A","A"),Name = c("Peter","Peter","Peter","Peter","Peter","Peter","Peter","Peter"),Job= c("CEO","CEO","Member of the Board","Member of the Board","CEO","CEO","CEO","CEO"))
Already tried to merge but results were messy.
Thanks
What you want to do is commonly called a "join", "merge" or "appending" data. R wouldn't be much of a language if you couldn't do that!
The fields that are matched on are called "keys". By default, merge() uses every column name the two data frames have in common, so here the keys are Company and Name.
Jobs <- data.frame(Company = c("A","B"), Name = c("Peter","Peter"), Job = c("CEO","Member of the Board"))
Media_Appearence <- data.frame(Company = c("A","A","B","B","A","A","A","A"),Name = c("Peter","Peter","Peter","Peter","Peter","Peter","Peter","Peter"))
merge(Media_Appearence, Jobs)
  Company  Name                 Job
1       A Peter                 CEO
2       A Peter                 CEO
3       A Peter                 CEO
4       A Peter                 CEO
5       A Peter                 CEO
6       A Peter                 CEO
7       B Peter Member of the Board
8       B Peter Member of the Board
You may have noticed that your data was sorted for you. merge() sorts the result on the key columns by default (sort = TRUE), which here means Company and then Name.
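Spelling out the key columns removes any doubt about what merge() matches on. A minimal sketch using the data from the question; all.x = TRUE is an addition here so that rows of Media_Appearence without a matching job would be kept with NA, and the name merged is only illustrative.

```r
Jobs <- data.frame(Company = c("A", "B"),
                   Name = c("Peter", "Peter"),
                   Job = c("CEO", "Member of the Board"),
                   stringsAsFactors = FALSE)
Media_Appearence <- data.frame(Company = c("A", "A", "B", "B", "A", "A", "A", "A"),
                               Name = rep("Peter", 8),
                               stringsAsFactors = FALSE)

# Left join on both key columns, named explicitly
merged <- merge(Media_Appearence, Jobs,
                by = c("Company", "Name"),
                all.x = TRUE)
merged
```

Because both columns are keys, the result has a single Name column rather than Name.x and Name.y.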
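Using match() instead does not change the original row order of the data.frame. Note that this version matches on Company only, which is enough here because every Name is the same: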
Media_Appearence$Job=Jobs$Job[match(Media_Appearence$Company, Jobs$Company)]
Media_Appearence
  Company  Name                 Job
1       A Peter                 CEO
2       A Peter                 CEO
3       B Peter Member of the Board
4       B Peter Member of the Board
5       A Peter                 CEO
6       A Peter                 CEO
7       A Peter                 CEO
8       A Peter                 CEO
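If the lookup has to respect both Company and Name, as the question asks, one possible approach is to build a combined key and pass that to match(). This is only a sketch: the variable names key_media and key_jobs are made up for illustration, and it assumes the separator "|" never occurs in either column.

```r
Jobs <- data.frame(Company = c("A", "B"),
                   Name = c("Peter", "Peter"),
                   Job = c("CEO", "Member of the Board"),
                   stringsAsFactors = FALSE)
Media_Appearence <- data.frame(Company = c("A", "A", "B", "B", "A", "A", "A", "A"),
                               Name = rep("Peter", 8),
                               stringsAsFactors = FALSE)

# Combine the two key columns into one string per row
# (assumes "|" does not appear in Company or Name)
key_media <- paste(Media_Appearence$Company, Media_Appearence$Name, sep = "|")
key_jobs  <- paste(Jobs$Company, Jobs$Name, sep = "|")

# match() returns, for each row of Media_Appearence, the row of Jobs with the same key
Media_Appearence$Job <- Jobs$Job[match(key_media, key_jobs)]
Media_Appearence
```

Rows with no matching Company/Name pair would get NA, and the original row order is untouched.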
I have just recently started using R for my master's thesis. I need to match the ID number (uuid) of dataframe 1 to the investee names in dataframe 2.
Dataframe 1
     investee_name     uuid
1         Wetpaint e1393508
2             Zoho bf4d7b0e
3             Digg 5f2b40b8
4  Omidyar Network f4d5ab44
5         Facebook df662812
6 Trinity Ventures   7ca12f
Dataframe 2:
  investee_name  investor_name investor_type
1      Facebook            cel  organization
2      Facebook Grock Partners  organization
3      Facebook       Partners  organization
4   Photobucket       Ventures  organization
5          Geni           Fund  organization
6        Gizmoz        Capital  organization
As you can see, in Dataframe 2 the investee names appear multiple times. With VLOOKUP in Excel I could have easily matched the respective IDs from dataframe 1, but for some reason the merging does not work in R.
I have tried the following:
investments_complete <- merge(v2_investments, ID_organizations, by.x= names(v2_investments)[1], by.y= names(ID_organizations)[1])
v2_investments_complete <- (merge(ID_organizations,v2_investments, by = "investee_name"))
For both options it merges the ID columns, but I get 0 observations.
At last, I tried this:
v2_investments_merged <- merge(v2_investments, ID_organizations, by.x = "investee_name", by.y = "investee_name", all.x= TRUE)
Here the merge works and all needed observations are there, but all IDs have the value NA.
Is there any kind of merge function that mirrors the Vlookup that I intend to do? I've spent hours trying to solve this but couldn't, so I would be very grateful for support!
Cheers,
Philipp
It is possible that there are some leading/trailing spaces in the by columns. One option is trimws from base R, which removes the whitespace from both ends (if any):
v2_investments$investee_name <- trimws(v2_investments$investee_name)
ID_organizations$investee_name <- trimws(ID_organizations$investee_name)
Now the merge should work.
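To see why stray whitespace produces exactly the symptoms described (0 rows with an inner merge, all NA with all.x = TRUE), here is a small sketch. The data frames reuse a few values from the question, but the extra spaces in investee_name are an assumption made up for illustration, as are the names before and after.

```r
# Hypothetical data: the names in ID_organizations carry stray spaces
ID_organizations <- data.frame(investee_name = c("Facebook ", " Zoho"),
                               uuid = c("df662812", "bf4d7b0e"),
                               stringsAsFactors = FALSE)
v2_investments <- data.frame(investee_name = c("Facebook", "Facebook", "Geni"),
                             investor_name = c("Grock Partners", "Partners", "Fund"),
                             stringsAsFactors = FALSE)

# "Facebook" and "Facebook " are different strings, so nothing matches
before <- merge(v2_investments, ID_organizations,
                by = "investee_name", all.x = TRUE)

# Strip whitespace from both ends of the key column in both data frames
v2_investments$investee_name <- trimws(v2_investments$investee_name)
ID_organizations$investee_name <- trimws(ID_organizations$investee_name)

after <- merge(v2_investments, ID_organizations,
               by = "investee_name", all.x = TRUE)
after
```

In before every uuid is NA; in after the Facebook rows carry their uuid, and Geni stays NA because it has no entry in ID_organizations.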
I am analyzing a data set of feedback from teachers. Each row in the data frame is a teacher and each of their answers is a variable. However, I've run into a problem entering the year level for each teacher, as a lot of the teachers teach multiple grades.
e.g.:
Teacher Year
a 1
b 3
c 1/2
d 7
e 3/4
How can I enter this data into an Excel sheet and then into R and analyse it usefully? I've never dealt with a variable before which contains multiple options on the same row.
Suppose you already have this data in R in an object called teacher_data. I will show you the way to deal with such responses that I have seen most commonly employed: you create additional columns so that each answer gets its own cell via the convenient tidyr function separate().
library(tidyr)
separate(teacher_data, col = "Year", into = paste0("Year", 1:2), sep = "/")
Here's the result:
  Teacher Year1 Year2
1       a     1  <NA>
2       b     3  <NA>
3       c     1     2
4       d     7  <NA>
5       e     3     4
How you then use those columns depends on what sort of question you're trying to answer with the data. This part of your question is probably best asked at the sister site Cross Validated (Stack Exchange for statistics).
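If the later analysis needs one row per teacher and year level instead of one column per answer, the same split can also be done in long form. A minimal base R sketch under that assumption; years and teacher_long are hypothetical names, and teacher_data is rebuilt here only so the example runs on its own.

```r
teacher_data <- data.frame(Teacher = c("a", "b", "c", "d", "e"),
                           Year = c("1", "3", "1/2", "7", "3/4"),
                           stringsAsFactors = FALSE)

# Split each Year entry on "/" into a list with one vector per teacher
years <- strsplit(teacher_data$Year, "/", fixed = TRUE)

# Repeat each teacher once per year level they teach
teacher_long <- data.frame(Teacher = rep(teacher_data$Teacher, lengths(years)),
                           Year = as.integer(unlist(years)),
                           stringsAsFactors = FALSE)
teacher_long
```

In this layout, counting teachers per year level is a plain table(teacher_long$Year).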
As far as Excel goes, I would not even deal with Excel as an intermediate step; it's just unnecessary. If you write the data out when you're done into a CSV, Excel can read CSVs just fine:
write.csv(teacher_data, file = "teacher_data.csv", row.names = FALSE)
Also, just so you know, I put your data into R via the following:
teacher_data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Teacher Year
a 1
b 3
c 1/2
d 7
e 3/4")
This is my data frame/data.table. It shows what people have already eaten, plus my target field NextItem, which is one random item (not yet eaten) that they may eat next:
library(data.table)
dt <- read.table(text='
Name ItemEaten NextItem
John rice banana
John butter banana
Sarah bread apple
Vinny apple coke
',header=T)
setDT(dt)
And this vector x is my universe of food items:
x<- c("apple","pepsi","rice","coke","banana","butter","bread")
The NextItem field should only include food items from the x vector that have not already been eaten by the given individual (i.e. not in the ItemEaten field). For example, John has already eaten rice and butter, so John should have one of the five remaining food items in the NextItem field. I have tried dt[,NextItem:= sample(x- ItemEaten,1),by=Name]
Thanks to the contributions from all the above commenters, I was able to find a solution to my problem. The following code gets the job done perfectly.
dt[, NextItem := sample(setdiff(x, ItemEaten), 1), by = Name]
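The original attempt failed because - is arithmetic subtraction, which is not defined for character vectors; setdiff() is the set-difference operation. A small base R sketch of what happens inside one by = Name group, using John's items from the question (eaten_by_john, remaining and next_item are illustrative names):

```r
x <- c("apple", "pepsi", "rice", "coke", "banana", "butter", "bread")
eaten_by_john <- c("rice", "butter")

# Items of x that John has not eaten, in the order they appear in x
remaining <- setdiff(x, eaten_by_john)

# Draw one of the remaining items at random
next_item <- sample(remaining, 1)
```

With by = Name the draw happens once per person, so every row for the same person receives the same NextItem.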
I have a data.frame with names of football players, for example:
names <- data.frame(id=c(1,2,3,4,5,6,7),
year=c('Maradona', 'Cruyff', 'Messi', 'Ronaldo', 'Pele', 'Van Basten', 'Diego'))
> names
  id       year
1  1   Maradona
2  2     Cruyff
3  3      Messi
4  4    Ronaldo
5  5       Pele
6  6 Van Basten
7  7      Diego
I also have 6,000 scraped text files containing stories about these football players. These stories are stored as 6,000 elements in a large vector called stories.
Is there a way to write a loop (or an apply function) that searches for the names of each of the football players? If one or more matches occur, I would like to record the element number and the name(s) of the football player(s).
For example, consider the following text in stories[1]:
Diego Armando Maradona (born 30 October 1960) is a retired Argentine
professional footballer. He has served as a manager and coach at other
clubs as well as the national team of Argentina. Many in the sport,
including football writers, former players, current players and
football fans, regard Maradona as the greatest football player of all
time. He was joint FIFA Player of the 20th Century
with Pele.
The ideal data.frame would have the following structure:
> outcome
  element    name1 name2
1       1 Maradona  Pele
Does somebody know a way to write code that produces one data.frame with information on all football players?
I just did it with a loop, but maybe you can do it with an apply function
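# stringsAsFactors = FALSE keeps the names as character strings; the code relies on that
football_names <- data.frame(id = 1:7,
                             year = c('Maradona', 'Cruyff', 'Messi', 'Ronaldo', 'Pele', 'Van Basten', 'Diego'),
                             stringsAsFactors = FALSE)
# One row per story (not per player)
outcome <- data.frame(element = seq_along(stories))
for (i in seq_along(stories)) {
  # Split the story into space-separated words and keep the names found among them.
  # Limitation: words with attached punctuation ("Pele.") and two-word names
  # ("Van Basten") are not matched by this approach.
  names_in_story <- football_names$year[football_names$year %in% unlist(strsplit(stories[i], split = " "))]
  # seq_along() skips this loop when no name was found
  for (j in seq_along(names_in_story)) {
    outcome[i, j + 1] <- names_in_story[j]
  }
}
names(outcome) <- c("element", paste0("name", 1:(ncol(outcome) - 1)))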
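Splitting on spaces misses names followed by punctuation and names made of two words. One possible alternative is a fixed-string search with grepl(), sketched below. The three stories are shortened, made-up stand-ins for the real vector, players and found are illustrative names, and the result is kept in long form (one row per story/name pair). Note that a substring search also has a cost: a name that is contained in a longer word would be reported too.

```r
players <- c("Maradona", "Cruyff", "Messi", "Ronaldo", "Pele", "Van Basten", "Diego")

# Hypothetical stand-ins for the real stories vector
stories <- c("Diego Armando Maradona was joint FIFA Player of the 20th Century with Pele.",
             "Messi and Ronaldo scored.",
             "A story without any of the players.")

# For each story, keep the players whose name occurs literally in the text
found <- lapply(stories, function(story) {
  players[vapply(players, grepl, logical(1), x = story, fixed = TRUE)]
})

# One row per (story, player) pair; stories without a match contribute no rows
outcome <- data.frame(element = rep(seq_along(stories), lengths(found)),
                      name = unlist(found),
                      stringsAsFactors = FALSE)
outcome
```

The first story yields Maradona, Pele and Diego, because "Diego" is itself one of the search terms.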
I don't understand your question exactly, but you can try string matching with a stringr function and lapply.
I assumed that your data stories is a list.
The function finds all the names you pass in as a vector and counts their occurrences. The output is again a list.
library(stringr)
foo <- function(x,y) table(unlist(str_match_all(x,paste0(y,collapse = "|"))))
The result
res <- lapply(stories, foo, names$year)
Then you can merge and sum up the data (rowSums()) for example like this:
Reduce(function(...) merge(..., all=T, by="Var1"), res)
I have two data.table objects in my R workspace that share multiple variable names. I want to join them on one of the variables that they have in common but display both values of the other.
people<-data.table(name=c('Joe','Bob','Adam'),
zip=c(98112,98101,61604)
)
setkey(people,name)
address<-data.table(zip=c(98112,61604,94521),
state=c('WA','IL','CA'),
name=c('Puget Sound','Central IL','SF Bay Area')
)
setkey(address,zip)
address[J(people$zip),.(state,name),people[,.(zip,name)]]
   name   zip state name
1: Adam 61604    IL Adam
2:  Bob 98101    NA  Bob
3:  Joe 98112    WA  Joe
The join is returning the people$name twice. How do I get it to return people$name once and address$name once?