How do I join repeatedly the tables in r?

How do I join repeatedly the tables in r? - r

I would like to join repeatedly between two tables. Here is the table.
structure(list(key = structure(1:4, .Label = c("A", "B", "C", "D"),
class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
structure(list(key = structure(c(1L, 2L, 2L, 3L), .Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L, 2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
<joined>
key
A
B
C
D
<joining>
key source value
A a 1
B a 1
B b 2
C b 2
If I use left_join function like left_join(joined, joining, by = "key"), the results is here.
key source value
1 A a 1
2 B a 1
3 B b 2
4 C b 2
5 D <NA> NA
However, I want to join grouping by "source". My expected results are here.
joining_a <- joining %>%
filter(source == "a")
joining_b <- joining %>%
filter(source == "b")
left_join(joined, joining_a, by = "key")
left_join(joined, joining_b, by = "key")
bind_rows(left_join(joined, joining_a, by = "key"), left_join(joined, joining_b, by = "key"))
key source value
1 A a 1
2 B a 1
3 C <NA> NA
4 D <NA> NA
5 A <NA> NA
6 B b 2
7 C b 2
8 D <NA> NA
How do I join the tables not dividing these tables?

We can group_split(or split from base R) the 'joining' into a list and then do the left_join with 'joined' using map
library(tidyverse)
joining %>%
group_split(source) %>%
map_dfr(~ left_join(joined, .x, by = 'key'))
# key source value
#1 A a 1
#2 B a 1
#3 C <NA> NA
#4 D <NA> NA
#5 A <NA> NA
#6 B b 2
#7 C b 2
#8 D <NA> NA
Or without a lambda function
joining %>%
group_split(source) %>%
map_dfr(left_join, x = joined, by = 'key')
data
joined <- structure(list(key = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
joining <- structure(list(key = structure(c(1L, 2L, 2L, 3L),
.Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L,
2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L,
2L, 2L)), class = "data.frame", row.names = c(NA, -4L))

Related

Create a dataframe with list elements with dplyr in R

This is my dataframe:
df<-list(structure(list(Col1 = structure(1:6, .Label = c("A", "B",
"C", "D", "E", "F"), class = "factor"), Col2 = structure(c(1L,
2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Col1 = structure(c(1L, 4L, 5L, 6L, 2L,
3L), .Label = c("A", "E", "H", "M", "N", "P"), class = "factor"),
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B",
"C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Col1 = structure(c(1L, 4L, 6L, 5L, 2L,
3L), .Label = c("A", "W", "H", "M", "T", "U"), class = "factor"),
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B",
"C", "D", "S", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)))
I want to extract col1=df[[1]][1] as a dataframe. Then col1 of the second position of this list I want to merge to the df[[1]][1], then I will have a dataframe with 2 columns.
After this I want to merge the column 1 of the third position of the list to the dataframe with two columns, then I will have a dataframe with 3 columns.
In other words my dataframe should have 3 columns, all the first columns of each entry of my list.
The dplyr package can helpme to do this?
Any help?

You can use lapply to extract the three columns named "Col1 in one go. Then set the names of the result.
col1 <- as.data.frame(lapply(df, '[[', "Col1"))
names(col1) <- letters[seq_along(col1)]
col1
# a b c
#1 A A A
#2 B M M
#3 C N U
#4 D P T
#5 E E W
#6 F H H
Choose any other column names that you might find better.
A dplyr way could be
df %>%
unlist(recursive = FALSE) %>%
as.data.frame %>%
select(., starts_with("Col1"))
# Col1 Col1.1 Col1.2
#1 A A A
#2 B M M
#3 C N U
#4 D P T
#5 E E W
#6 F H H

With map_dfc from purrr:
library(purrr)
map_dfc(df, `[`, 1)
Output:
Col1 Col11 Col12
1 A A A
2 B M M
3 C N U
4 D P T
5 E E W
6 F H H

Alternative use of map_dfc making use of purrr's concise element extraction syntax that allows specifying elements of elements by name or position. The first is, for example, equivalent to
map_dfc(df, `[[`, 1)
which differs from the use of [ in that the columns will not be named variations of Col1 and just get V names instead, which may be desirable since names like Col11 and Col12 may be confusing.
df <- list(structure(list(Col1 = structure(1:6, .Label = c("A", "B", "C", "D", "E", "F"), class = "factor"), Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA, -6L)), structure(list(Col1 = structure(c(1L, 4L, 5L, 6L, 2L, 3L), .Label = c("A", "E", "H", "M", "N", "P"), class = "factor"), Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA, -6L)), structure(list(Col1 = structure(c(1L, 4L, 6L, 5L, 2L, 3L), .Label = c("A", "W", "H", "M", "T", "U"), class = "factor"), Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "S", "G"), class = "factor")), class = "data.frame", row.names = c(NA, -6L)))
library(purrr)
map_dfc(df, 1)
#> # A tibble: 6 x 3
#> V1 V2 V3
#> <fct> <fct> <fct>
#> 1 A A A
#> 2 B M M
#> 3 C N U
#> 4 D P T
#> 5 E E W
#> 6 F H H
map_dfc(df, "Col1")
#> # A tibble: 6 x 3
#> V1 V2 V3
#> <fct> <fct> <fct>
#> 1 A A A
#> 2 B M M
#> 3 C N U
#> 4 D P T
#> 5 E E W
#> 6 F H H
Created on 2018-09-19 by the reprex package (v0.2.0).

res<-1:nrow(df[[1]][1])
for(i in 1:length(df)){
print ( as.vector(df[[i]][1]))
res<-cbind(res,as.data.frame(df[[i]][1]))
}
res$res<-NULL
So, the output is:
Col1 Col1 Col1
1 A A A
2 B M M
3 C N U
4 D P T
5 E E W
6 F H H

Using dplyr
library(dplyr)
df %>%
sapply('[[',1) %>%
as.data.frame
#returns
V1 V2 V3
1 A A A
2 B M M
3 C N U
4 D P T
5 E E W
6 F H H

Spread every other row then unite to append row names in dplyr

I am in the process of trying to make untidy data data. I have data in the following format:
name x
a NA
value 1
b NA
value 2
c NA
value 3
I would like it to be in the following format
name x
a_value 1
b_value 2
c_value 3
How can I do this in dplyr?
My first thought is to come up with a way to spread so that
name name2 x x2
a value NA 1
b value NA 2
c value NA 3
From there I know I can use unite for name and name2 and delete column x, but I am not sure if spread can produce the above.

You can group on NA and summarise, i.e.
library(dplyr)
df %>%
group_by(grp = cumsum(is.na(x))) %>%
summarise(name = paste(name, collapse = '_'))
which gives,
# A tibble: 3 x 2
grp name
<int> <chr>
1 1 a_value
2 2 b_value
3 3 c_value
DATA
dput(df)
structure(list(name = c("a", "value", "b", "value", "c", "value"
), x = c(NA, 1L, NA, 2L, NA, 3L)), .Names = c("name", "x"), row.names = c(NA,
-6L), class = "data.frame")

Use na.locf and then remove the unwanted rows:
library(dplyr)
library(zoo)
DF %>%
mutate(x = na.locf(x, fromLast = TRUE)) %>%
filter(name != "value")
giving:
name x
1 a 1
2 b 2
3 c 3
Note
DF <-
structure(list(name = structure(c(1L, 4L, 2L, 4L, 3L, 4L), .Label = c("a",
"b", "c", "value"), class = "factor"), x = c(NA, 1L, NA, 2L,
NA, 3L)), .Names = c("name", "x"), class = "data.frame", row.names = c(NA,
-6L))

I need to convert the levels of multiple categorical variable into 0,1

I have five columns with 2 levels and their column names are like c(a,b,x,y,z). The command below works for 1 column at time. But I need to it for all five columns at the same time.
levels(car_data[,"x"]) <- c(0,1)
car_data[,"x"] <- as.numeric(levels(car_data[,"x"]))[car_data[,"x"]]

If there are two levels, then we can do
library(dplyr)
car_data %>%
mutate_all(funs(as.integer(.)-1))
# a b c
#1 0 0 0
#2 1 1 1
#3 0 0 0
#4 1 1 1
data
car_data <- structure(list(a = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), b = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), c = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor")), .Names = c("a", "b", "c"), row.names = c(NA,
-4L), class = "data.frame")

Multiply a table(file1) with individual cells of a column(file2) using R

File 1:Ele A B C DEs 1 2 3 4Ep 2 4 3 4Ek 1 9 3 8File2:A 1 B 2 C 3 D 5
Need is to ensure that each element under Column A (file 1) gets multiplied by the value assigned to A in file 2 (and so on). I know matrix multiplication in R but this is not the case of matrix multiplication I suppose. Help would be greatly appreciated. Thanks

You could try
indx <- df2$Col1
df1[indx]*df2$Col2[col(df1[indx])]
# A B C D
#1 1 4 9 20
#2 2 8 9 20
#3 1 18 9 40
Or you could use sweep
sweep(df1[indx], 2, df2$Col2, '*')
# A B C D
#1 1 4 9 20
#2 2 8 9 20
#3 1 18 9 40
data
df1 <- structure(list(Ele = c("Es", "Ep", "Ek"), A = c(1L, 2L, 1L),
B = c(2L, 4L, 9L), C = c(3L, 3L, 3L), D = c(4L, 4L, 8L)),
.Names = c("Ele", "A", "B", "C", "D"), class = "data.frame",
row.names = c(NA, -3L))
df2 <- structure(list(Col1 = c("A", "B", "C", "D"), Col2 = c(1L, 2L,
3L, 5L)), .Names = c("Col1", "Col2"), class = "data.frame",
row.names = c(NA, -4L))

R language: Replace variables if the value can be matched in another table

I have a sales report table (DF1)and I need to replace only a few product codes by their associated group codes
Model SOLD
A 5
B 4
C 4
D 3
F 11
I have another table (DF2) where I have the Model# and the associated group codes
Model Group
A 1
B 1
C 2
D 2
I would like to replace the model# in DF1 by the group number if the model exist in DF2.
The wanted end result:
Model SOLD
1 5
1 4
2 4
2 3
F 11
Thank you!

You can do this with qdapTools's lookup family, specifically, the binary operator %lc+% (a wrapper for the data.table package). The l stands for lookup, the c forces te terms to character and the + only replaces those elements that are found in the lookup table:
library(qdap)
df1$Model <- df1$Model %lc+% df2
Here it is more explicitly:
df1 <- structure(list(Model = structure(1:5, .Label = c("A", "B", "C",
"D", "F"), class = "factor"), SOLD = c(5L, 4L, 4L, 3L, 11L)), .Names = c("Model",
"SOLD"), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Model = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor"), Group = c(1L, 1L, 2L, 2L)), .Names = c("Model",
"Group"), class = "data.frame", row.names = c(NA, -4L))
library(qdap)
df1$Model <- df1$Model %lc+% df2
df1
## Model SOLD
## 1 1 5
## 2 1 4
## 3 2 4
## 4 2 3
## 5 F 11

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

How do I join repeatedly the tables in r? - r

Related

Create a dataframe with list elements with dplyr in R

Spread every other row then unite to append row names in dplyr

I need to convert the levels of multiple categorical variable into 0,1

Multiply a table(file1) with individual cells of a column(file2) using R

R language: Replace variables if the value can be matched in another table

Categories

Resources