This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 5 months ago.
I've got two datasets with different numbers of observations that I need to merge. The first one is like this:
state1 state2
1 2
2 3
3 4
4 1
The second one is like this:
state A state B distance
1 1 0
1 2 1
1 3 1
1 4 2
2 1 1
2 2 0
2 3 2
2 4 2
3 1 1
3 2 2
3 3 0
3 4 3
4 1 2
4 2 2
4 3 3
4 4 0
... ... ...
I would like to combine the distance column of the second dataset to the first one like this:
state1 state2 distance
1 2 1
2 3 2
3 4 3
4 1 2
Is there a way to add the distance info into the first dataset based on the info from the second dataset in R? Thanks.
We can use dplyr::left_join, matching state1/state2 in the first dataset to state A/state B in the second:
library(dplyr)
left_join(df1, df2, by = c('state1' = 'state A', 'state2' = 'state B'))
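For a fully self-contained illustration, base R's merge does the same left join. The data frames below are reconstructed from the tables in the question; the second dataset's columns are renamed to stateA/stateB (an assumption, to avoid quoting names that contain spaces):

```r
# First dataset: the pairs we want distances for
df1 <- data.frame(state1 = c(1, 2, 3, 4),
                  state2 = c(2, 3, 4, 1))

# Second dataset: the full distance lookup table
df2 <- data.frame(stateA    = rep(1:4, each = 4),
                  stateB    = rep(1:4, times = 4),
                  distance  = c(0, 1, 1, 2,
                                1, 0, 2, 2,
                                1, 2, 0, 3,
                                2, 2, 3, 0))

# Left join: keep every row of df1, pull in the matching distance
res <- merge(df1, df2,
             by.x = c("state1", "state2"),
             by.y = c("stateA", "stateB"),
             all.x = TRUE)
```

Note that merge sorts the result by the join columns by default; pass sort = FALSE if the original row order of df1 must be preserved.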
This question already has answers here:
Transpose / reshape dataframe without "timevar" from long to wide format
(9 answers)
Reshape data frame by row [duplicate]
(4 answers)
Closed 4 years ago.
I have a dataframe like this one
familyid Year memberid count var
1 2000 1 2 5
1 2000 1 2 6
1 2000 2 1 8
2 2000 1 1 5
2 2000 2 1 4
3 2000 1 1 5
3 2000 2 2 7
3 2000 2 2 5
where the column count indicates how many times each observation appears in the dataframe. I want to transpose the dataframe only for those appearing more than once; in other words, I want to have
familyid Year memberid count var_1 var_2
1 2000 1 2 5 6
1 2000 2 1 8 NA
2 2000 1 1 5 NA
2 2000 2 1 4 NA
3 2000 1 1 5 NA
3 2000 2 2 7 5
What would you suggest using for this purpose?
Thank you so much.
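A base-R sketch (assuming the data frame is named df, with the columns shown above): number each row within its (familyid, Year, memberid) group, then reshape to wide format so the repeats become var_1, var_2, and so on:

```r
# Example data from the question
df <- data.frame(
  familyid = c(1, 1, 1, 2, 2, 3, 3, 3),
  Year     = 2000,
  memberid = c(1, 1, 2, 1, 2, 1, 2, 2),
  count    = c(2, 2, 1, 1, 1, 1, 2, 2),
  var      = c(5, 6, 8, 5, 4, 5, 7, 5))

# Number the repeats within each (familyid, Year, memberid) group
df$idx <- ave(df$var, df$familyid, df$Year, df$memberid, FUN = seq_along)

# Spread var over the within-group index; groups with a single row get NA
wide <- reshape(df,
                idvar   = c("familyid", "Year", "memberid", "count"),
                timevar = "idx", v.names = "var", direction = "wide")

# reshape names the new columns var.1, var.2; rename to var_1, var_2
names(wide) <- sub("^var\\.", "var_", names(wide))
```

The same within-group index also works as the id column for tidyr::pivot_wider, if you prefer the tidyverse.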
This question already has answers here:
Extract row corresponding to minimum value of a variable by group
(9 answers)
Closed 5 years ago.
I have a table that contains multiple rows of data for each key, where the key spans multiple columns.
Table looks like this:
A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2
I also discovered how to remove all of the duplicate elements using the unique command over multiple columns, so data duplication is not a problem.
I would like to know how, for every key (columns A and B in the example), to find only the minimum value in the third column (column C).
At the end, the table should look like this:
A B C
1 1 1 2
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
Thanks for any help. It is really appreciated
If you have any questions, feel free to ask.
con <- textConnection(" A B C
1 1 1 2
2 1 1 3
3 2 1 4
4 1 2 4
5 2 2 3
6 2 3 1
7 2 3 2
8 2 3 2")
df <- read.table(con, header = TRUE)
df <- df[with(df, order(A, B, C)), ]  # sort so the minimum C comes first within each key
df[!duplicated(df[1:2]), ]            # keep the first (minimum) row per (A, B) key
# A B C
# 1 1 1 2
# 3 2 1 4
# 4 1 2 4
# 5 2 2 3
# 6 2 3 1
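A base-R alternative (a sketch, assuming the same df as above) that computes the per-key minimum directly, without sorting and de-duplicating:

```r
# Example data from the question
df <- data.frame(A = c(1, 1, 2, 1, 2, 2, 2, 2),
                 B = c(1, 1, 1, 2, 2, 3, 3, 3),
                 C = c(2, 3, 4, 4, 3, 1, 2, 2))

# Minimum of C for every (A, B) key
res <- aggregate(C ~ A + B, data = df, FUN = min)
```

aggregate returns one row per key; since the table has no columns besides A, B, and C, this is exactly the desired result (possibly in a different row order, and without the original row names).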
This question already has answers here:
Delete duplicate rows in two columns simultaneously [duplicate]
(2 answers)
Closed 6 years ago.
I have got the following data.frame:
df = read.table(text = 'a b c d
1 12 2 1
1 13 2 1
1 3 3 1
2 12 6 2
2 11 2 2
2 14 2 2
1 12 1 2
1 13 2 2
2 11 4 3', header = TRUE)
I need to remove the rows which have the same observations based on columns a and b, so that the results would be:
a b c d
1 12 2 1
1 13 2 1
1 3 3 1
2 12 6 2
2 11 2 2
2 14 2 2
Thank you for any help
We can use duplicated to keep the first occurrence of each (a, b) pair:
df[!duplicated(df[1:2]),]
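Putting the question's data and the one-liner together as a runnable check (a sketch; note the last row kept is 2 14 2 2, since (2, 14) occurs only once in the input):

```r
# Data from the question
df <- read.table(text = 'a b c d
1 12 2 1
1 13 2 1
1 3 3 1
2 12 6 2
2 11 2 2
2 14 2 2
1 12 1 2
1 13 2 2
2 11 4 3', header = TRUE)

# Keep only the first occurrence of each (a, b) combination
res <- df[!duplicated(df[1:2]), ]
```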
This question already has answers here:
How to create a consecutive group number
(13 answers)
Create group number for contiguous runs of equal values
(4 answers)
Closed 1 year ago.
I have a vector that looks like this:
a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")
I would like to create another vector, based on a, that should look like:
b <- c(1,1,1,2,2,3,4,4,4,4,4,4,5)
In other words, b should assign a value (starting from 1) to each distinct element of a.
First of all, (I assume) this is your vector
a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")
As per possible solutions, here are a few (can't find a good dupe right now)
as.integer(factor(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or
cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or
match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
rle will also work similarly in your specific scenario
with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or (which is practically the same)
data.table::rleid(a)
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Though be advised that all 4 solutions behave differently in certain scenarios; consider the following vector
a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")
And the results of the 4 different solutions:
1.
as.integer(factor(a))
# [1] 2 2 2 1 1 3 4 4 2 2 5
The factor solution begins with 2 because a is unsorted, so the integer codes follow the alphabetical order of the levels rather than the order of appearance. Hence, this solution is only valid if your vector is sorted; don't use it otherwise.
2.
cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 5
This cumsum/duplicated solution goes wrong because "B110" was already present at the beginning, so "D440","D440","B110","B110" all end up in the same group.
3.
match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 1 1 5
This match/unique solution assigns 1s near the end because unique maps every occurrence of "B110" to the same group, regardless of where it appears.
4.
with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 5 5 6
This solution only cares about runs, so different runs of "B110" are assigned to different groups.