Fastest way to populate a column from another table in R?

I have two tables, and I need to fill a new column "Number" in the second table with the corresponding Col2 value from the first. My data has a couple of hundred rows. I used a for loop to populate the column, but it takes a long time. Is there a faster way?
Col1 Col2
A 55
B 77
C 80
D 9

Letter Number
A
B
C
D

Using match.
transform(df2, Number=df1[match(Letter, df1$Col1), ]$Col2)
# Letter Number
# 1 A 55
# 2 B 77
# 3 C 80
# 4 D 9
Data:
df1 <- structure(list(Col1 = c("A", "B", "C", "D"), Col2 = c(55L, 77L,
80L, 9L)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Letter = c("A", "B", "C", "D"), Number = c(NA,
NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L))
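As a hedged variation on the match approach above, the lookup can also be written without transform, either by indexing Col2 directly or through a named vector. This is only a sketch: the shuffled df2 with an unmatched "E" is hypothetical data added to show that letters missing from df1 come back as NA.

```r
# Lookup table from the question.
df1 <- data.frame(Col1 = c("A", "B", "C", "D"),
                  Col2 = c(55L, 77L, 80L, 9L),
                  stringsAsFactors = FALSE)

# Hypothetical target table: shuffled order, plus "E", which df1 lacks.
df2 <- data.frame(Letter = c("D", "A", "E", "B"),
                  stringsAsFactors = FALSE)

# Vectorized lookup: match() returns the row position in df1 (or NA).
df2$Number <- df1$Col2[match(df2$Letter, df1$Col1)]

# Same result through a named vector indexed by name.
lookup <- setNames(df1$Col2, df1$Col1)
df2$Number2 <- unname(lookup[df2$Letter])

print(df2)
```

Both forms are a single vectorized operation, so they avoid the row-by-row cost of a for loop.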

Related

Filter rows that contain values from a list [duplicate]

I'm using RStudio. I have one data frame containing a list of candidates, and another containing the votes at each voting location, by candidate.
I want to extract only the rows with the votes for the candidates on this list.
Example:
List:
Candidate
A
B
D
G
Votes:
Candidate Number of Votes
A 124
B 52
C 13
D 62
E 33
F 7
G 67
I want then to create a new dataframe containing only the candidates and votes of the "List":
Votes of listed candidates:
Candidate Number of Votes
A 124
B 52
D 62
G 67
The example is a simplification. My database contains over 30,000 "candidates".
Thanks in advance

We can use subset in base R
subset(Votes, Candidate %in% List$Candidate)

You can merge both data frames with merge():
merge(df1, df2, by = "Candidate", all.x = TRUE)
or equivalently
dplyr::left_join(df1, df2, by = "Candidate")
# Candidate Number_of_Votes
# 1 A 124
# 2 B 52
# 3 D 62
# 4 G 67
Data
df1 <- structure(list(Candidate = c("A", "B", "D", "G")), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(Candidate = c("A", "B", "C", "D", "E", "F", "G"),
Number_of_Votes = c(124L, 52L, 13L, 62L, 33L, 7L, 67L)), class = "data.frame", row.names = c(NA, -7L))
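The two answers above differ when a listed candidate has no row in the votes table. The sketch below illustrates this with a hypothetical extra candidate "Z" (not in the original data): filtering with %in% silently drops it, while merge with all.x = TRUE keeps it with NA votes.

```r
# Candidate list with a hypothetical entry "Z" that has no votes recorded.
List <- data.frame(Candidate = c("A", "B", "D", "G", "Z"),
                   stringsAsFactors = FALSE)
Votes <- data.frame(Candidate = c("A", "B", "C", "D", "E", "F", "G"),
                    Number_of_Votes = c(124L, 52L, 13L, 62L, 33L, 7L, 67L),
                    stringsAsFactors = FALSE)

# Filter: keeps only rows of Votes whose candidate is on the list.
filtered <- subset(Votes, Candidate %in% List$Candidate)

# Left join: keeps every listed candidate, with NA where no votes exist.
joined <- merge(List, Votes, by = "Candidate", all.x = TRUE)

print(filtered)
print(joined)
```

Which behavior is preferable depends on whether missing candidates should be visible in the result.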

Verifying if at least two columns have the same value in a specific row

I have a data frame and I want to check whether all my variables have unique values in a specific row.
Let's say I want to analyze row D.
my data
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 6
> TRUE (because all three variables have unique values in row D)
Second example
Name F S T
A 1 2 3
B 2 3 4
C 3 4 5
D 4 5 4
> FALSE (because F and T have the same value in row D)

In base R do
f1 <- function(dat, ind) {
tmp <- unlist(dat[ind, -1])
length(unique(tmp)) == length(tmp)
}
-testing
> f1(df, 4)
[1] TRUE
> f1(df1, 4)
[1] FALSE
data
df <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = 3:6), class = "data.frame", row.names = c(NA, -4L))
df1 <- structure(list(Name = c("A", "B", "C", "D"), F = 1:4, S = 2:5,
T = c(3L, 4L, 5L, 4L)), class = "data.frame", row.names = c(NA,
-4L))
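A possible variation on f1, sketched here under the assumption that the row is selected by its Name value rather than by position: anyDuplicated returns 0 when no value repeats, so the check reduces to a single comparison. The function name all_unique_in_row is made up for this example.

```r
# Data from the answer above.
df <- data.frame(Name = c("A", "B", "C", "D"),
                 F = 1:4, S = 2:5, T = 3:6,
                 stringsAsFactors = FALSE)
df1 <- data.frame(Name = c("A", "B", "C", "D"),
                  F = 1:4, S = 2:5, T = c(3L, 4L, 5L, 4L),
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE if all non-Name values in the named row differ.
all_unique_in_row <- function(dat, name) {
  vals <- unlist(dat[dat$Name == name, names(dat) != "Name"])
  anyDuplicated(vals) == 0L
}

res_unique <- all_unique_in_row(df, "D")   # values 4, 5, 6
res_dup    <- all_unique_in_row(df1, "D")  # values 4, 5, 4
print(c(res_unique, res_dup))
```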

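You can use dplyr for this. Note that this version checks that every column holds distinct values across all rows, rather than inspecting a single row:
library(dplyr)
df %>%
summarize_at(c(2:ncol(.)), n_distinct) %>%
summarize(if_all(.fns = ~ .x == nrow(df)))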

How to collapse rows by identical values in a column

Good evening,
I have a two-column, tab-separated .txt file, like the following:
number letter
1 a
1 b
2 a
2 b
3 b
I would like to collapse rows where the column "number" has identical value, by creating a comma separated value in the corresponding column "letter".
In other words, this should be the output:
number letter
1 a,b
2 a,b
3 b
I have searched the web but did not find a working solution.
Thank you in advance,
Giuseppe

We can use aggregate in base R
aggregate(letter ~ number, df1, FUN = paste, collapse=",")
-output
# number letter
#1 1 a,b
#2 2 a,b
#3 3 b
Or with tidyverse
library(dplyr)
library(stringr)
df1 %>%
group_by(number) %>%
summarise(letter = str_c(letter, collapse=","))
data
df1 <- structure(list(number = c(1L, 1L, 2L, 2L, 3L), letter = c("a",
"b", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-5L))

We can also combine aggregate() with toString:
#Code
newdf <- aggregate(letter~.,df,toString)
Output:
number letter
1 1 a, b
2 2 a, b
3 3 b
Some data:
#Data
df <- structure(list(number = c(1L, 1L, 2L, 2L, 3L), letter = c("a",
"b", "a", "b", "b")), class = "data.frame", row.names = c(NA,
-5L))
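Since the question starts from a tab-separated .txt file, the following sketch shows the whole round trip in base R. The file contents are supplied inline through the text argument so the example is self-contained; with a real file, the path would be passed to read.delim instead.

```r
# Inline stand-in for the tab-separated file described in the question.
txt <- "number\tletter\n1\ta\n1\tb\n2\ta\n2\tb\n3\tb\n"
df1 <- read.delim(text = txt, stringsAsFactors = FALSE)

# Collapse letters per number into one comma-separated string.
out <- aggregate(letter ~ number, data = df1, FUN = paste, collapse = ",")

# Write the result back out as tab-separated text (here: to the console).
write.table(out, sep = "\t", quote = FALSE, row.names = FALSE)
```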

Distinct in dplyr does not work (sometimes)

I have the following data frame which I have obtained from a count. I have used dput to make the data frame available and then edited the data frame so there is a duplicate of A.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
print(df)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
Now I would like to take distinct on Procedure and only keep the first A.
df %>%
distinct(Procedure, .keep_all=TRUE)
# A tibble: 4 x 2
Procedure n
<fct> <int>
1 D 10717
2 A 4412
3 A 2058
4 C 1480
It does not work. Strange...

If we print the Procedure column, we can see that there are duplicated levels for A, which is problematic for the distinct function.
df$Procedure
[1] D A A C
Levels: A A C D -1
Warning message:
In print.factor(x) : duplicated level [2] in factor
One way to fix this is to rebuild the factor with the factor function, which collapses the duplicated levels. Another way is to convert the Procedure column to character.
df <- structure(list(Procedure = structure(c(4L, 1L, 2L, 3L), .Label = c("A", "A", "C", "D", "-1"),
class = "factor"), n = c(10717L, 4412L, 2058L, 1480L)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("Procedure", "n"))
library(tidyverse)
df %>%
mutate(Procedure = factor(Procedure)) %>%
distinct(Procedure, .keep_all=TRUE)
# # A tibble: 3 x 2
# Procedure n
# <fct> <int>
# 1 D 10717
# 2 A 4412
# 3 C 1480
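The second option mentioned above, converting Procedure to character, might look like the following. This is a base R sketch that uses duplicated in place of dplyr's distinct, and a plain data frame in place of the tibble, so it runs without extra packages.

```r
# Malformed factor copied from the question: the level "A" appears twice.
proc <- structure(c(4L, 1L, 2L, 3L),
                  .Label = c("A", "A", "C", "D", "-1"),
                  class = "factor")
df <- data.frame(n = c(10717L, 4412L, 2058L, 1480L))
df$Procedure <- proc

# Converting to character compares the labels, not the integer codes.
df$Procedure <- as.character(df$Procedure)

# Keep the first row for each Procedure value.
dedup <- df[!duplicated(df$Procedure), ]
print(dedup)
```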

The .Label argument contains a duplicated value: .Label = c("A", "A", "C", "D", "-1"). That is the issue. Also, this way of initializing a tibble is rather unusual (I don't know exactly what your goal is).
Why not use this instead?
df <- tibble(
Procedure = c("D", "A", "A", "C"),
n = c(10717L, 4412L, 2058L, 1480L)
)

Count matching instances between two data frames [closed]

I'm new to R and can't find an answer or anything that works.
I've got two data frames that look like this:
Teams
A
B
C
...
and
TCF
A
B
C
C
B
A
...
I need to count how many times each value in the first data frame's column occurs in the second data frame, and store that count in the first data frame. Thanks in advance!

You could use base R to do this:
sapply(unique(df1$Teams), function(x) sum(df2$TCF %in% x))
#A B C
#2 2 2
Or
setNames(table(match(df2$TCF, unique(df1$Teams))), unique(df1$Teams))
#A B C
#2 2 2
Or using data.table
library(data.table)
setkey(setDT(df1), Teams)
setkey(setDT(df2), TCF)
df2[J(unique(df1$Teams)),.N, by=.EACHI]
# TCF N
#1: A 2
#2: B 2
#3: C 2
data
df1 <- structure(list(Teams = c("A", "B", "C")), .Names = "Teams",
class = "data.frame", row.names = c(NA,-3L))
df2 <- structure(list(TCF = c("A", "B", "C", "C", "B", "A")), .Names = "TCF",
class = "data.frame", row.names = c(NA, -6L))

Would this option be easier on the eyes?
library(dplyr)
df2 %>% count(TCF) %>% filter(TCF %in% unique(df1$Teams))
# Source: local data frame [3 x 2]
# TCF n
# 1 A 2
# 2 B 2
# 3 C 2
Data
df1 <- structure(list(Teams = c("A", "B", "C")), .Names = "Teams", class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(TCF = structure(c(1L, 2L, 3L, 3L, 2L, 1L, 4L,
5L, 5L), .Label = c("A", "B", "C", "X", "Y"), class = "factor")), .Names = "TCF", row.names = c(NA,
-9L), class = "data.frame")
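The question also asks for the counts to be stored in the first data frame, which the answers above stop short of. One way this might be done in base R is sketched below: turning TCF into a factor whose levels are the team names makes table return the counts in the same order as df1, with zero for teams that never occur. The Count column name and the extra team "D" are assumptions added for illustration.

```r
# Teams, plus a hypothetical team "D" that never appears in df2.
df1 <- data.frame(Teams = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
# Occurrences, including "X", which is not a listed team.
df2 <- data.frame(TCF = c("A", "B", "C", "C", "B", "A", "X"),
                  stringsAsFactors = FALSE)

# Restricting the levels to df1$Teams drops "X" and keeps df1's order.
counts <- table(factor(df2$TCF, levels = df1$Teams))
df1$Count <- as.integer(counts)

print(df1)
```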
