This question already has answers here:
unique / sort in data.frame
(3 answers)
Closed 6 years ago.
I have this table (input):
user_id is_Leaver age
1 Helen yes 25
2 Helen yes 25
3 Helen yes 25
4 Rob no 31
5 Rob no 31
I need a table containing only the unique rows (output):
user_id is_Leaver age
1 Helen yes 25
2 Rob no 31
Thanks!
Assuming your data frame is called df:
df <- unique(df)
unique() drops duplicated rows and keeps the first occurrence of each, so this should do the trick.
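To make the answer above concrete, here is a minimal sketch that rebuilds the table from the question. The object names df_unique and df_unique2 are only illustrative. One detail worth knowing: unique() keeps the original row names, so they are reset here to get the 1, 2 numbering shown in the desired output.

```r
# Rebuild the example table from the question
df <- data.frame(
  user_id   = c("Helen", "Helen", "Helen", "Rob", "Rob"),
  is_Leaver = c("yes", "yes", "yes", "no", "no"),
  age       = c(25, 25, 25, 31, 31),
  stringsAsFactors = FALSE
)

# Keep only the first occurrence of each distinct row
df_unique <- unique(df)

# unique() keeps the original row names (1 and 4 here); reset them to 1, 2
rownames(df_unique) <- NULL
print(df_unique)

# Equivalent formulation: drop the rows that duplicated() flags as repeats
df_unique2 <- df[!duplicated(df), ]
```

If only some columns should define a duplicate, duplicated() can be applied to that subset of columns instead of the whole data frame.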
This question already has answers here:
Find value corresponding to maximum in other column [duplicate]
(2 answers)
Closed 2 years ago.
This is my data frame in RStudio. I'm trying to find code that will return the name of the student with the highest age.
students.df #Name of dataframe
name DAD BDA gender nationality age
1 Amy 80 70 F IRL 20
2 Bill 65 50 M UK 21
3 Carl 50 80 M IRL 22
as.character(subset(students.df,students.df$age==max(students.df$age))$name)
library(dplyr)
students.df %>% filter(age==max(age)) %>% select(name)
You can try this:
students.df[which.max(students.df$age), ]
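The approaches above differ when several students share the highest age: which.max() returns only the first maximum, while comparing against max() returns every tied row. A small sketch using the data from the question (the result names oldest_first and oldest_all are illustrative):

```r
# Rebuild the data frame from the question
students.df <- data.frame(
  name        = c("Amy", "Bill", "Carl"),
  DAD         = c(80, 65, 50),
  BDA         = c(70, 50, 80),
  gender      = c("F", "M", "M"),
  nationality = c("IRL", "UK", "IRL"),
  age         = c(20, 21, 22),
  stringsAsFactors = FALSE
)

# which.max() gives the index of the FIRST maximum only
oldest_first <- students.df$name[which.max(students.df$age)]

# Comparing with max() keeps every student who shares the highest age
oldest_all <- students.df$name[students.df$age == max(students.df$age)]

print(oldest_first)
print(oldest_all)
```

With this data both give the same single name; they would differ only if two students were equally old.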
This question already has answers here:
Calculate difference between values in consecutive rows by group
(4 answers)
Closed 2 years ago.
I have a dataframe that looks like this:
Name Date
David 2019-12-23
David 2020-1-10
David 2020-2-13
Kevin 2019-2-12
Kevin 2019-3-19
Kevin 2019-5-1
Kevin 2019-7-23
Basically, I'm trying to calculate the date difference between each instance, specific to each person. I am currently using the following for-loop:
df$daysbetween <- with(df, ave(as.numeric(date) , name,
FUN=function(x) { z=c(NA,NA);
for( i in seq_along(x)[-(1:2)] ){
z <- c(z, (x[i]-x[i-1]))}
return(z) }) )
Currently, it calculates the difference between the second and third, and any following instance, perfectly fine. However, it doesn't calculate the difference between the first and second date and I need it to. Where is the error in my code coming from? Would appreciate any help.
transform(df, diff = ave(Date, Name, FUN = function(x)c(NA,diff(as.Date(x)))))
Name Date diff
1 David 2019-12-23 <NA>
2 David 2020-1-10 18
3 David 2020-2-13 34
4 Kevin 2019-2-12 <NA>
5 Kevin 2019-3-19 35
6 Kevin 2019-5-1 43
7 Kevin 2019-7-23 83
Just use lag from the dplyr package:
Description:
Find the "previous" (lag()) or "next" (lead()) values in a vector. Useful for comparing values behind of or ahead of the current values.
library(dplyr)
# assumes the columns are named name and date, and that date is of class Date
df %>%
  group_by(name) %>%
  mutate(diff = date - lag(date))
Output:
name date diff
<chr> <date> <drtn>
1 David 2019-12-23 NA days
2 David 2020-01-10 18 days
3 David 2020-02-13 34 days
4 Kevin 2019-02-12 NA days
5 Kevin 2019-03-19 35 days
6 Kevin 2019-05-01 43 days
7 Kevin 2019-07-23 83 days
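As for where the error in the question's loop comes from: z starts as c(NA, NA) and the index sequence drops both 1 and 2, so the first difference ever computed is x[3] - x[2]. A minimal corrected sketch follows, assuming the columns are named Name and Date as in the printed table (the question's code uses lowercase names) and that Date is stored as class Date:

```r
# Rebuild the example data with Date stored as a real Date column
df <- data.frame(
  Name = c(rep("David", 3), rep("Kevin", 4)),
  Date = as.Date(c("2019-12-23", "2020-01-10", "2020-02-13",
                   "2019-02-12", "2019-03-19", "2019-05-01", "2019-07-23")),
  stringsAsFactors = FALSE
)

df$daysbetween <- ave(as.numeric(df$Date), df$Name, FUN = function(x) {
  z <- NA                          # only the first row of a group has no predecessor
  for (i in seq_along(x)[-1]) {    # start at the second element, not the third
    z <- c(z, x[i] - x[i - 1])
  }
  z
})

print(df)
```

The loop body is the same as c(NA, diff(x)), which is what the vectorised answers rely on.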
This question already has answers here:
How can I match fuzzy match strings from two datasets?
(7 answers)
Closed 3 years ago.
I have 2 dataframes and I want to join by name, but names are not written exactly the same:
Df1:
ID Name Age
1 Jose 13
2 M. Jose 12
3 Laura 8
4 Karol P 32
Df2:
Name Surname
José Hall
María José Perez
Laura Alza
Karol Smith
I need to join and get this:
ID Name Age Surname
1 Jose 13 Hall
2 M. Jose 12 Perez
3 Laura 8 Alza
4 Karol P 32 Smith
How can I account for the names not being exactly the same before joining?
You can get close to your result using stringdist_left_join from fuzzyjoin
library(fuzzyjoin)
stringdist_left_join(df1, df2, by = "Name")
# ID Name.x Age Name.y Surname
#1 1 Jose 13 José Hall
#2 2 M. Jose 12 <NA> <NA>
#3 3 Laura 8 Laura Alza
#4 4 Karol P 32 Karol Smith
For the example shared, one entry is not matched because "M. Jose" is too far from "María José". You can match it by raising the max_dist argument (the default is 2), but that also loosens every other comparison and produces unwanted matches. If only a few entries are left as NA after the join (as in this example), you could simply match them by hand.
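To see why raising the threshold is risky, the edit distances can be inspected directly. This is a rough sketch using base R's adist() (Levenshtein distance), which is not necessarily the same metric fuzzyjoin uses by default; the vectors df1_names and df2_names are just the Name columns from the question, with accents written as Unicode escapes.

```r
# Name columns from the question; \u00e9 is e-acute, \u00ed is i-acute
df1_names <- c("Jose", "M. Jose", "Laura", "Karol P")
df2_names <- c("Jos\u00e9", "Mar\u00eda Jos\u00e9", "Laura", "Karol")

# Matrix of Levenshtein distances: rows are df1 names, columns are df2 names
d <- adist(df1_names, df2_names)
dimnames(d) <- list(df1_names, df2_names)
print(d)
```

Under this metric "M. Jose" is 5 edits away from "María José" but only 4 away from "José", so a threshold high enough to accept the intended pair would also accept the wrong one.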
I would clean the data first (for example, removing accents such as ´; a find-and-replace in Excel makes this easy) and then use
new_df <- merge(df1, df2, by="Name")
Alternatively, if possible, assign df2 an ID that coincides with the ID in df1.
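The cleaning step can also be done in R rather than Excel. Below is a minimal sketch under stated assumptions: strip_accents is a hypothetical helper that only handles the two accented letters appearing in this example, and key is an illustrative column name for the cleaned names.

```r
df1 <- data.frame(
  ID   = 1:4,
  Name = c("Jose", "M. Jose", "Laura", "Karol P"),
  Age  = c(13, 12, 8, 32),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Name    = c("Jos\u00e9", "Mar\u00eda Jos\u00e9", "Laura", "Karol"),
  Surname = c("Hall", "Perez", "Alza", "Smith"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replaces only the accented letters seen in this example
strip_accents <- function(x) chartr("\u00e9\u00ed", "ei", x)

df1$key <- strip_accents(df1$Name)
df2$key <- strip_accents(df2$Name)

# Left join on the cleaned key; df1 rows without a match get NA in Surname
new_df <- merge(df1, df2[, c("key", "Surname")], by = "key", all.x = TRUE)
print(new_df)
```

Removing accents is enough for "Jose" and "Laura", but "M. Jose" and "Karol P" still have no exact counterpart, so those rows would need the fuzzy join or a shared ID.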
This question already has answers here:
What is the difference between `%in%` and `==`?
(3 answers)
Subset dataframe by multiple logical conditions of rows to remove
(8 answers)
Closed 4 years ago.
I have a df as below, with 2 columns: student names and marks.
Stud_name Marks
Jon 25
john 20
ajay 50
ram 27
jay 61
jess 46
troy 23
mike 42
steve 45
glenn 43
I want only a few names and their marks.
Expected output:
Stud_name Marks
john 20
ajay 50
jess 46
troy 23
ram 27
glenn 43
Please help.
I tried:
pd <- filter(df,Stud_name == "john" , "ajay" , "jess")
Error in filter_impl(.df, quo) :
Evaluation error: operations are possible only for numeric, logical or
complex types.
You can try this if a base R solution is acceptable:
# your data
dats <- read.table(text='Stud_name Marks
Jon 25
john 20
ajay 50
ram 27
jay 61
jess 46
troy 23
mike 42
steve 45
glenn 43',sep='', header=TRUE)
# vector with chosen names
names <- c("john","ajay","jess")
dats[which(dats$Stud_name %in% names),]
or, without which() (thanks @markus):
dats[(dats$Stud_name %in% names),]
Stud_name Marks
2 john 20
3 ajay 50
6 jess 46
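The attempt in the question fails because the extra strings are passed to filter() as separate conditions instead of being compared with Stud_name. Using == with a vector of names would not work either, since == compares element by element and recycles the shorter vector. A small base R sketch of the difference (the names wanted, picked and recycled are illustrative):

```r
dats <- data.frame(
  Stud_name = c("Jon", "john", "ajay", "ram", "jay",
                "jess", "troy", "mike", "steve", "glenn"),
  Marks     = c(25, 20, 50, 27, 61, 46, 23, 42, 45, 43),
  stringsAsFactors = FALSE
)
wanted <- c("john", "ajay", "jess")

# %in% asks, for every row, whether the name is one of the wanted names
picked <- dats[dats$Stud_name %in% wanted, ]
print(picked)

# == lines the two vectors up position by position and recycles `wanted`,
# so a row matches only if it happens to sit opposite the right name
# (R also warns here because 10 is not a multiple of 3)
recycled <- suppressWarnings(dats$Stud_name == wanted)
print(sum(recycled))
```

Here %in% finds all three students, while the recycled == comparison finds only one of them.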
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have a simple question. I have a big df like this:
Name AGE Order
Anna 25 1
Anna 28 2
Peter 10 1
Paul 15 1
Mary 14 1
John 8 1
Charlie 24 2
Robert 20 2
Only for rows with Order = 1, I need to filter on AGE >= 10 & AGE <= 15; rows with any other Order should be kept as they are. So the output must be:
Name AGE Order
Anna 28 2
Peter 10 1
Paul 15 1
Mary 14 1
Charlie 24 2
Robert 20 2
Could you help me, please?
We can use the vectorized ifelse: for Order = 1, check whether AGE lies in the range 10-15; keep all other rows as they are.
df[ifelse(df$Order==1, df$AGE >= 10 & df$AGE <= 15, TRUE), ]
# Name AGE Order
#2 Anna 28 2
#3 Peter 10 1
#4 Paul 15 1
#5 Mary 14 1
#7 Charlie 24 2
#8 Robert 20 2
We can also consolidate to:
subset(df, AGE >= 10 & AGE <= 15 | Order != 1)
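To check that the two forms above select the same rows, here is a short sketch on the data from the question. The intermediate names in_range, keep_ifelse and keep_logic are only illustrative.

```r
df <- data.frame(
  Name  = c("Anna", "Anna", "Peter", "Paul", "Mary", "John", "Charlie", "Robert"),
  AGE   = c(25, 28, 10, 15, 14, 8, 24, 20),
  Order = c(1, 2, 1, 1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# TRUE where AGE falls inside the 10-15 range
in_range <- df$AGE >= 10 & df$AGE <= 15

# Form 1: apply the range test only where Order is 1, keep everything else
keep_ifelse <- ifelse(df$Order == 1, in_range, TRUE)

# Form 2: the same condition written as plain logic
keep_logic <- in_range | df$Order != 1

res <- df[keep_logic, ]
print(res)
print(identical(keep_ifelse, keep_logic))
```

The logical form avoids the ifelse call entirely, which is usually easier to read once the condition is understood as "in range, or not subject to the rule".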