How do you find the row numbers associated with a specific text string?
I need the function to return [1] 1 2 3 4 from the dataset below.
RowNumber Name
1 Diet Pepsi
2 Pepsi Max
3 Diet Pepsi 20 oz
4 Pepsi
5 Coca-Cola
So you're looking for "Pepsi" ?
which(grepl("Pepsi", dataset$Name))
Related
So I have this first dataframe (fish18) which consists of data on fish specimens, and a column "grade" that is to be filled with conditions in an ifelse function.
species BIN collectors country grade species_frequency
1 Poecilothrissa congica BOLD:AAF7519 mljs et al, Democratic Republic of the Congo NA 2
2 Acanthurus triostegus BOLD:AAA9362 Vinothkumar S, Kaleshkumar K and Rajaram R. India NA 54
3 Pseudogramma polyacantha BOLD:AAC5137 Allan D. Connell South Africa NA 15
4 Pomadasys commersonnii BOLD:AAD1338 Allan D. Connell South Africa NA 12
5 Secutor insidiator BOLD:AAB2487 Allan D. Connell South Africa NA 18
6 Sebastes macdonaldi BOLD:AAJ7419 Merit McCrea United States NA 3
BIN_per_species collector_per_species countries_per_species species_per_bin
1 2 1 1 1
2 1 21 15 1
3 3 6 6 1
4 1 2 1 1
5 4 5 4 2
6 1 1 1 1
And after filling the grade column I have something like this (fish19)
species BIN collectors country grade species_frequency
1 Poecilothrissa congica BOLD:AAF7519 mljs et al, Democratic Republic of the Congo D 2
2 Acanthurus triostegus BOLD:AAA9362 Vinothkumar S, Kaleshkumar K and Rajaram R. India A 54
3 Pseudogramma polyacantha BOLD:AAC5137 Allan D. Connell South Africa C 15
4 Pomadasys commersonnii BOLD:AAD1338 Allan D. Connell South Africa A 12
5 Secutor insidiator BOLD:AAB2487 Allan D. Connell South Africa E 18
6 Sebastes macdonaldi BOLD:AAJ7419 Merit McCrea United States B 3
BIN_per_species collector_per_species countries_per_species species_per_bin
1 2 1 1 1
2 1 21 15 1
3 3 6 6 1
4 1 2 1 1
5 4 5 4 2
6 1 1 1 1
Both dataframes have many specimens belonging to the same species of fish, and the thing is that the grades are suposed to be assigned to each species for every specimen of that species. The problem I'm having is that some rows belonging to the same species are having different grades, specially in the case of the grades "C" and "E". What I want to incorporate into my ifelse function is: Change from grade "C" to "E" every occurrence of the dataframe where two or more specimens belonging to the same species are assigned "C" in one row and "E" in another row. Because if one species has grade "E", every other row with that species name should also have grade "E".
So far I've tried the %in% function and just using "=="
Trying with %in%
assign_grades=function(fish18){
fish19<-fish18 %>%
mutate(grade = ifelse(species_frequency<3,"D",ifelse(BIN_per_species==1 & (collector_per_species>1 | countries_per_species>1),"A",ifelse(BIN_per_species==1 & collector_per_species==1 | countries_per_species==1,"B",ifelse(BIN_per_species>1 & species_per_bin==1,"C",ifelse(species_per_bin>1,"E",ifelse(fish19$species[fish19$grade=="E"]%in%fish19$species[fish19$grade=="C"]==TRUE,"E",NA))) ))))
assign('fish19',fish19,envir=.GlobalEnv)
}
assign_grades(fish18)
Trying with "=="
assign_grades=function(fish18){
fish19<-fish18 %>%
mutate(grade = ifelse(species_frequency<3,"D",ifelse(BIN_per_species==1 & (collector_per_species>1 | countries_per_species>1),"A",ifelse(BIN_per_species==1 & collector_per_species==1 | countries_per_species==1,"B",ifelse(BIN_per_species>1 & species_per_bin==1,"C",ifelse(species_per_bin>1,"E",ifelse(fish19$species[fish19$grade=="E"]==fish19$species[fish19$grade=="C"],"E",NA))) ))))
assign('fish19',fish19,envir=.GlobalEnv)
}
assign_grades(fish18)
Both these two options did not work and the output of this alteration should be that if one occurrence of a specific species name has the grade "E" assigned to it, so should all other occurences with that same species name.
I'm sorry if this was confusion but I tried to be as clear as I could, thank you in advance for any responses.
Kind of a long winded answer, but:
dat = data.frame('species'=c('a','b','c','a','a','b'),'grade'=c('E','E','C','C','C','D'))
dat %>% left_join(dat %>%
group_by(species) %>%
summarize(sum_e = sum(grade=='E')),by='species')
Then you could do an ifelse for sum_e>0
We have a daily meeting when participants nominate each other to speak. The first person is chosen randomly.
I have a dataframe that consists of names and the order of speech every day.
I have a day1, a day2 ,a day3 , etc. in the columns.
The data in the rows are numbers, meaning the order of speech on that particular day.
NA means that the person did not participate on that day.
Name day1 day2 day3 day4 ...
Albert 1 3 1 ...
Josh 2 2 NA
Veronica 3 5 3
Tim 4 1 2
Stew 5 4 4
...
I want to create two analysis, first, I want to create a dataframe who has chosen who the most times. (I know that the result depends on if a participant was nominated before and therefore on that day that participant cannot be nominated again, I will handle it later, but for now this is enough)
It should look like this:
Name Favorite
Albert Stew
Josh Veronica
Veronica Tim
Tim Stew
...
My questions (feel free to answer only one if you can):
1. What code shall I use for it without having to manunally put the names in a different dataframe?
2. How shall I handle a tie, for example Josh chose Veronica and Tim first the same number of times? Later I want to visualise it and I have no idea how to handle ties.
I also would like to analyse the results to visualise strong connections.
Like to show that there are people who usually chose each other, etc.
Is there a good package that is specialised for these? Or how should I get to it?
I do not need DNA sequences, only this simple ones, but I have not found a suitable one yet.
Thanks for your help!
If I am not misunderstanding your problem, here is some code to get the number of occurences of who choose who as next speaker. I added a fourth day to have some count that is not 1. There are ties in the result, choosing the first couple of each group by speaker ('who') may be a solution :
df <- read.table(textConnection(
"Name,day1,day2,day3,day4
Albert,1,3,1,3
Josh,2,2,,2
Veronica,3,5,3,1
Tim,4,1,2,4
Stew,5,4,4,5"),header=TRUE,sep=",",stringsAsFactors=FALSE)
purrr::map(colnames(df)[-1],
function (x) {
who <- df$Name[order(df[x],na.last=NA)]
data.frame(who,lead(who),stringsAsFactors=FALSE)
}
) %>%
replyr::replyr_bind_rows() %>%
filter(!is.na(lead.who.)) %>%
group_by(who,lead.who.) %>% summarise(n=n()) %>%
arrange(who,desc(n))
Input:
Name day1 day2 day3 day4
1 Albert 1 3 1 3
2 Josh 2 2 NA 2
3 Veronica 3 5 3 1
4 Tim 4 1 2 4
5 Stew 5 4 4 5
Result:
# A tibble: 12 x 3
# Groups: who [5]
who lead.who. n
<chr> <chr> <int>
1 Albert Tim 2
2 Albert Josh 1
3 Albert Stew 1
4 Josh Albert 2
5 Josh Veronica 1
6 Stew Veronica 1
7 Tim Stew 2
8 Tim Josh 1
9 Tim Veronica 1
10 Veronica Josh 1
11 Veronica Stew 1
12 Veronica Tim 1
I have a data frame named as Records having 2 vectors Rank and Name
Rank Name
1 Ashish
1 Ashish
2 Ashish
3 Mark
4 Mark
1 Mark
3 Spencer
2 Spencer
1 Spencer
2 Mary
4 Joseph
I want that every name should be placed in either 1, 2 ,3 or 4 tag depending on their occurrence and uniqueness:
I want to create a new vector which will be named as Tagging
So The output should be:
Rank 1 has three unique elements Mark Spencer and Ashish so the tag is 1 for all three.
Rank 2 has one unique records which is Mary as Ashish has already been assigned tag 1 so Mary is tagged as 2.
Rank 3 has no unique records as Spencer and Mark has already been assigned 1 so I cannot tag 3 to anybody.
Rank 4 has one unique record Joseph so he gets tagged as 4.
Let me know which function can help me do this.
I do not want to use looping as this is 1000000 row database
The below solution follows the principle that the highest Rank of a person is going to be that person's tag too.
tbl <- read.table(header=TRUE, text='
Rank Name
1 Ashish
1 Ashish
2 Ashish
3 Mark
4 Mark
1 Mark
3 Spencer
2 Spencer
1 Spencer
2 Mary
4 Joseph
')
Ordering the 'tbl' dataframe by Rank
tbl_ord <- tbl[with(tbl,order(Rank)),]
Removing multiple occurrence of name within same Rank
> name_ord<- tbl_ord[duplicated(tbl_ord$Rank),]
> name_ord
Rank Name
2 1 Ashish
6 1 Mark
9 1 Spencer
8 2 Spencer
10 2 Mary
7 3 Spencer
11 4 Joseph
Displaying unique Names
#name_ord[unique(name_ord$Name),] #this will work too
> name_ord[!duplicated(name_ord$Name),]
Rank Name
2 1 Ashish
6 1 Mark
9 1 Spencer
10 2 Mary
11 4 Joseph
Using the setkey function of data.table package and unique:
library(data.table)
dt<-data.table(Rank=c(1,1,2,3,4,1,3,2,1,2,4), Name=c(rep("Ashish", 3), rep("Mark", 3), rep("Spencer", 3), "Mary", "Joseph"))
setkey(dt, Rank, Name)
dt<-unique(dt)
setkey(dt, Name)
dt<-unique(dt) # works because of the above setkey call which sorted it
setkey(dt, Rank) # if you want to order them by Rank again
Let say below is a table describing the time it takes 5 people to finish 5 jobs.
Job1 Job2 Job3 Job4 Job5
Bob 1 2 3 4 5
Joe 2 1 3 4 5
Tom 2 3 1 4 5
May 2 3 1 2 5
Sue 2 3 4 5 1
If the company has enough money to hire all 5 people, then I can see that the minimum time to complete everything is
Bob do Job1
Joe do Job2
Tom do Job3
May do Job4
Sue do Job5
1 + 1 + 1 + 2 + 1 = 6 units
I found the fastest way to solve this is to use Hungary Algorthm
(Referenced find the minimum sum of matrix (n x n) that select only one in each row and column)
Also, if company can only hire 1 person, May will be hired because everyone took 15 units to finish, where May took only 13 units.
However, if company has enough budget to hire 3 people. What's the fastest algorithm to find out which 3 people I should hire?
In the above Example, should be Bob (or Joe) + May + Sue, their total will be
Either Bob do Job 1
Either Bob do Job 2
May do Job 3
May do Job 4
Sue do Job 5
1 + 2 + 1 + 2 + 1 = 7 units
Let's say that I have the following data frame with three columns.
data = data.frame(id=c(1:10), interest_1=c("food","","","drugs","beer","soda","","","drugs","sports"),
interest_2=c("fruits","car","jeans","","","","soda","shoes","","drugs"),
interest_3=c("","","","","soda","sports","","","",""))
data
I want to get a count of each row.
The following incident, where food is interest_1, fruits is interest_2, and nothing is interest_3 occurs only once.
id interest_1 interest_2 interest_3
1 1 food fruits
The following incident, where drugs are interest_1 and nothing is interest_2 or interest_3 occurs twice.
id interest_1 interest_2 interest_3
4 drugs
9 drugs
I want to get a count the number of times that each incidence occurs. How would I go about doing this?
Output should look like:
interest_1 interest_2 interest_3 count
food fruits 1
car 1
jeans 1
drugs 2
> aggregate(id~.,data,length)
interest_1 interest_2 interest_3 id
1 drugs 2
2 car 1
3 sports drugs 1
4 food fruits 1
5 jeans 1
6 shoes 1
7 soda 1
8 beer soda 1
9 soda sports 1
Basically, this means: apply function length to to the vector made up of id values for each combination of the other columns.
require(plyr)
ddply(data, .(interest_1, interest_2, interest_3), c("nrow"))