Inserting R dataframe column to dataframe based on presence absence values - r

My input file is:
input_file <- structure(list(species = structure(1:3, .Label = c("x", "y",
"z"), class = "factor"), header1 = c(0L, 1L, 0L), header2 = c(0L,
1L, 1L), header3 = c(1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-3L))
Here 1 and 0 indicate presence and absence.
Now, I need to convert this file (based on presence - absence values) to:
output_file <- structure(list(header1 = structure(c(2L, 1L, 1L), .Label = c("",
"y"), class = "factor"), header2 = structure(c(2L, 3L, 1L), .Label = c("",
"y", "z"), class = "factor"), header3 = structure(1:3, .Label = c("x",
"y", "z"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
For this, I first tried to melt my input file using reshape2:
library(reshape2)
df2 <- melt(input_file, id.var = "species")
Now I am not sure how to create a dataframe to get my desired output.
Thanks!

Since you are using reshape2, you could do:
library(reshape2)
# df2 is the melted data frame created in the question
dcast(subset(df2, value > 0), ave(value, variable, FUN = seq_along) ~ variable, value.var = "species")[-1]
header1 header2 header3
1 y y x
2 <NA> z y
3 <NA> <NA> z
You can then replace the NA values with empty strings.
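The NA replacement mentioned above could look roughly like this. This is a minimal sketch: `res` is a hypothetical stand-in for the dcast result, and it assumes the columns are character vectors (with factor columns the assignment would fail because "" is not an existing level).

```r
# Hypothetical stand-in for the reshaped result, with character columns
res <- data.frame(
  header1 = c("y", NA, NA),
  header2 = c("y", "z", NA),
  header3 = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Replace every NA cell with an empty string
res[is.na(res)] <- ""
res
```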
In base R, you could do:
df1 <- subset(reshape(input_file,-1,sep="",dir="long",idvar = "species"),header>0)
reshape(transform(df1,header = ave(time,time,FUN = seq_along)),dir="wide",idvar = "header",sep="")[-1]
species1 species2 species3
y.1 y y x
z.2 <NA> z y
z.3 <NA> <NA> z

Here's a base R solution. It first does an ifelse on each row. If it finds a 1 it replaces it with the species name. If it finds a zero it writes a blank. The species column is then removed. The second line just ensures that any empty cells are moved to the bottom of the columns.
m <- t(apply(input_file, 1, function(x) ifelse(x == "1", x[1], ""))[-1,])
df <- as.data.frame(apply(m, 2, function(x) x[order(-nchar(x))]))
So we can see this matches your output file:
df
#> header1 header2 header3
#> 1 y y x
#> 2 z y
#> 3 z
identical(df, output_file)
#> [1] TRUE
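The same idea can also be written column by column, which avoids coercing the whole data frame to a character matrix. This is only a sketch under assumptions: the input is rebuilt here with character (not factor) columns, and the names `n`, `cols` and `out` are introduced for illustration. Because the result holds character columns, it is not expected to be `identical()` to the factor-based `output_file`.

```r
# Input rebuilt with character columns (an assumption; the question uses factors)
input_file <- data.frame(
  species = c("x", "y", "z"),
  header1 = c(0L, 1L, 0L),
  header2 = c(0L, 1L, 1L),
  header3 = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

n <- nrow(input_file)

# For each presence/absence column, keep the species that are present
# and pad the bottom of the column with empty strings
cols <- lapply(input_file[-1], function(flag) {
  present <- input_file$species[flag == 1]
  c(present, rep("", n - length(present)))
})

out <- as.data.frame(cols, stringsAsFactors = FALSE)
out
```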

Related

How do I repeatedly join tables in R?

I would like to join two tables repeatedly. Here are the tables.
structure(list(key = structure(1:4, .Label = c("A", "B", "C", "D"),
class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
structure(list(key = structure(c(1L, 2L, 2L, 3L), .Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L, 2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
<joined>
key
A
B
C
D
<joining>
key source value
A a 1
B a 1
B b 2
C b 2
If I use the left_join function, as in left_join(joined, joining, by = "key"), the result is:
key source value
1 A a 1
2 B a 1
3 B b 2
4 C b 2
5 D <NA> NA
However, I want to join separately for each value of "source". My expected result is:
joining_a <- joining %>%
filter(source == "a")
joining_b <- joining %>%
filter(source == "b")
left_join(joined, joining_a, by = "key")
left_join(joined, joining_b, by = "key")
bind_rows(left_join(joined, joining_a, by = "key"), left_join(joined, joining_b, by = "key"))
key source value
1 A a 1
2 B a 1
3 C <NA> NA
4 D <NA> NA
5 A <NA> NA
6 B b 2
7 C b 2
8 D <NA> NA
How do I join the tables without splitting them manually?
We can use group_split (or split from base R) to split 'joining' into a list, and then left_join each piece with 'joined' using map:
library(tidyverse)
joining %>%
group_split(source) %>%
map_dfr(~ left_join(joined, .x, by = 'key'))
# key source value
#1 A a 1
#2 B a 1
#3 C <NA> NA
#4 D <NA> NA
#5 A <NA> NA
#6 B b 2
#7 C b 2
#8 D <NA> NA
Or without a lambda function:
joining %>%
group_split(source) %>%
map_dfr(left_join, x = joined, by = 'key')
data
joined <- structure(list(key = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
joining <- structure(list(key = structure(c(1L, 2L, 2L, 3L),
.Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L,
2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L,
2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
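For comparison, the base R route hinted at above (split instead of group_split) might be sketched as follows. It swaps in merge(..., all.x = TRUE) for left_join, and the data is rebuilt with character columns for brevity; `parts` and `res` are illustrative names. Note that merge sorts by the key by default, which happens to match the expected row order here.

```r
joined <- data.frame(key = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
joining <- data.frame(
  key    = c("A", "B", "B", "C"),
  source = c("a", "a", "b", "b"),
  value  = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# One piece of 'joining' per source value
parts <- split(joining, joining$source)

# Left-join 'joined' against each piece, then stack the results
res <- do.call(
  rbind,
  lapply(parts, function(p) merge(joined, p, by = "key", all.x = TRUE))
)
rownames(res) <- NULL
res
```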

Rename a category of a variable in R

I have a text variable x1 that takes the values a, b, c, d. I need to rename category d to f, so the expected output is a, b, c, f.
How can I do it?
Here is my dataset:
mydat=structure(list(x1 = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), x2 = c(1L, 1L, 1L, 1L), x3 = c(2L, 2L,
2L, 2L)), .Names = c("x1", "x2", "x3"), class = "data.frame", row.names = c(NA,
-4L))
Convert it to characters, use simple subsetting and convert it back to a factor (optional):
mydat$x1 <- as.character(mydat$x1)
mydat$x1[mydat$x1 == 'd'] <- 'f'
# optional
mydat$x1 <- as.factor(mydat$x1)
Or - as you were looking for a dplyr solution:
library(dplyr)
mydat %>%
  mutate(x1 = as.character(x1),
         x1 = if_else(x1 == 'd', 'f', x1),
         x1 = as.factor(x1))
Both will yield
x1 x2 x3
1 a 1 2
2 b 1 2
3 c 1 2
4 f 1 2
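If x1 should stay a factor throughout, another option is to rename the level directly instead of converting to character and back. A minimal sketch, with the data rebuilt via factor() for brevity:

```r
mydat <- data.frame(
  x1 = factor(c("a", "b", "c", "d")),
  x2 = 1L,
  x3 = 2L
)

# Rename the level "d" to "f" in place; all other levels are untouched
levels(mydat$x1)[levels(mydat$x1) == "d"] <- "f"
mydat
```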

Spread every other row then unite to append row names in dplyr

I am in the process of trying to make untidy data tidy. I have data in the following format:
name x
a NA
value 1
b NA
value 2
c NA
value 3
I would like it to be in the following format
name x
a_value 1
b_value 2
c_value 3
How can I do this in dplyr?
My first thought is to come up with a way to spread so that
name name2 x x2
a value NA 1
b value NA 2
c value NA 3
From there I know I can use unite for name and name2 and delete column x, but I am not sure if spread can produce the above.
You can group on NA and summarise, i.e.
library(dplyr)
df %>%
group_by(grp = cumsum(is.na(x))) %>%
summarise(name = paste(name, collapse = '_'))
which gives,
# A tibble: 3 x 2
grp name
<int> <chr>
1 1 a_value
2 2 b_value
3 3 c_value
DATA
dput(df)
structure(list(name = c("a", "value", "b", "value", "c", "value"
), x = c(NA, 1L, NA, 2L, NA, 3L)), .Names = c("name", "x"), row.names = c(NA,
-6L), class = "data.frame")
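The summarise call above returns only the combined names. A base R sketch of the same grouping idea that also keeps the x value per group could look like this; `grp` and `res` are names introduced for illustration, and it assumes each group contains exactly one non-NA x.

```r
df <- data.frame(
  name = c("a", "value", "b", "value", "c", "value"),
  x    = c(NA, 1L, NA, 2L, NA, 3L),
  stringsAsFactors = FALSE
)

# A new group starts at every NA in x
grp <- cumsum(is.na(df$x))

res <- data.frame(
  # Paste the names within each group together with "_"
  name = unname(vapply(split(df$name, grp), paste, character(1), collapse = "_")),
  # Keep the first non-NA x of each group
  x = unname(vapply(split(df$x, grp), function(v) v[!is.na(v)][1], integer(1))),
  stringsAsFactors = FALSE
)
res
```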
Use na.locf and then remove the unwanted rows:
library(dplyr)
library(zoo)
DF %>%
mutate(x = na.locf(x, fromLast = TRUE)) %>%
filter(name != "value")
giving:
name x
1 a 1
2 b 2
3 c 3
Note
DF <-
structure(list(name = structure(c(1L, 4L, 2L, 4L, 3L, 4L), .Label = c("a",
"b", "c", "value"), class = "factor"), x = c(NA, 1L, NA, 2L,
NA, 3L)), .Names = c("name", "x"), class = "data.frame", row.names = c(NA,
-6L))

Merging by condition OR

I need to merge two tables in R.
The table X looks this way:
company_name country_code country cost1 cost2
1 Test1 FR <NA> NA 9.945000e-02
2 Test1 BR Brazil NA NA
3 Test2 <NA> USA 1 1.053000e-01
The table Y looks this way:
country country_code tier
France FR 1
Brazil BR 2
USA US 1
I need to merge X and Y to get Z:
name country_code tier
Test1 FR 1
Test2 BR 2
....
How can I merge on an OR condition, i.e. match on country_code or, failing that, on country?
The following will do it. Note that I use a function from package zoo, so you will need to have it installed.
m <- merge(df1, df2, all = TRUE)
m$country <- zoo::na.locf(m$country)
m <- lapply(split(m, m$country), function(.m) zoo::na.locf(.m, fromLast = TRUE))
m <- lapply(m, function(.m) zoo::na.locf(.m))
m <- do.call(rbind, m)
m <- m[!duplicated(m), c(3, 2, 4)]
row.names(m) <- NULL
m
# name country_code tier
#1 First FR 1
#2 Third US 1
#3 Second BR 2
DATA.
df1 <-
structure(list(name = structure(1:3, .Label = c("First", "Second",
"Third"), class = "factor"), country = structure(c(1L, NA, 2L
), .Label = c("France", "USA"), class = "factor"), country_code = structure(c(NA,
1L, 2L), .Label = c("BR", "US"), class = "factor")), .Names = c("name",
"country", "country_code"), class = "data.frame", row.names = c(NA,
-3L))
df2 <-
structure(list(country = structure(c(2L, 1L, 3L), .Label = c("Brazil",
"France", "USA"), class = "factor"), country_code = structure(c(2L,
1L, 3L), .Label = c("BR", "FR", "US"), class = "factor"), tier = c(1L,
2L, 1L)), .Names = c("country", "country_code", "tier"), class = "data.frame", row.names = c(NA,
-3L))
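An alternative that avoids zoo is to look up each row by country_code first and fall back to country when the code is missing. This is a sketch only: X and Y are rebuilt from the tables in the question (cost columns omitted, character columns assumed), and `idx`, `by_name` and `Z` are illustrative names.

```r
X <- data.frame(
  company_name = c("Test1", "Test1", "Test2"),
  country_code = c("FR", "BR", NA),
  country      = c(NA, "Brazil", "USA"),
  stringsAsFactors = FALSE
)
Y <- data.frame(
  country      = c("France", "Brazil", "USA"),
  country_code = c("FR", "BR", "US"),
  tier         = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Row of Y matched by code; NA where the code is missing or unknown
idx <- match(X$country_code, Y$country_code)
# Row of Y matched by country name, used as a fallback
by_name <- match(X$country, Y$country)
idx[is.na(idx)] <- by_name[is.na(idx)]

Z <- data.frame(
  name         = X$company_name,
  country_code = Y$country_code[idx],
  tier         = Y$tier[idx],
  stringsAsFactors = FALSE
)
Z
```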
EDIT.
After the comments and the question edit by the OP, the input data has changed and the following code and new df1 reflect that change.
fun <- function(DF, col){
  sp <- split(DF, DF[[col]])
  m <- lapply(sp, function(.m) zoo::na.locf(.m, fromLast = TRUE))
  m <- lapply(m, function(.m) zoo::na.locf(.m))
  m <- do.call(rbind, m)
  row.names(m) <- NULL
  m
}
m <- merge(df1, df2, all = TRUE)
m$country <- zoo::na.locf(m$country)
m$country_code <- zoo::na.locf(m$country_code)
m <- fun(m, "country_code")
m <- m[!duplicated(m), ]
m
# country_code country company_name cost1 cost2 tier
#1 BR Brazil Test <NA> 0.0819 2
#2 FR France Test <NA> 0.09945 1
#4 US USA Test <NA> 0.1053 1
df1 <-
structure(list(company_name = structure(c(1L, 1L, 1L), .Label = "Test", class = "factor"),
country_code = structure(c(2L, 1L, NA), .Label = c("BR",
"FR"), class = "factor"), country = structure(c(NA, 1L, 2L
), .Label = c("Brazil", "USA"), class = "factor"), cost1 = c(NA,
NA, NA), cost2 = c(0.09945, 0.0819, 0.1053)), .Names = c("company_name",
"country_code", "country", "cost1", "cost2"), class = "data.frame", row.names = c("1",
"2", "3"))

R language: Replace variables if the value can be matched in another table

I have a sales report table (DF1) and I need to replace only a few product codes with their associated group codes:
Model SOLD
A 5
B 4
C 4
D 3
F 11
I have another table (DF2) that holds the Model# and the associated group codes:
Model Group
A 1
B 1
C 2
D 2
I would like to replace the model# in DF1 with the group number if the model exists in DF2.
The wanted end result:
Model SOLD
1 5
1 4
2 4
2 3
F 11
Thank you!
You can do this with qdapTools's lookup family, specifically the binary operator %lc+% (a wrapper around the data.table package). The l stands for lookup, the c forces the terms to character, and the + replaces only those elements that are found in the lookup table:
library(qdap)
df1$Model <- df1$Model %lc+% df2
Here it is more explicitly:
df1 <- structure(list(Model = structure(1:5, .Label = c("A", "B", "C",
"D", "F"), class = "factor"), SOLD = c(5L, 4L, 4L, 3L, 11L)), .Names = c("Model",
"SOLD"), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Model = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor"), Group = c(1L, 1L, 2L, 2L)), .Names = c("Model",
"Group"), class = "data.frame", row.names = c(NA, -4L))
library(qdap)
df1$Model <- df1$Model %lc+% df2
df1
## Model SOLD
## 1 1 5
## 2 1 4
## 3 2 4
## 4 2 3
## 5 F 11
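Without extra packages, the same "replace only what is found" behaviour can be sketched with match from base R. The data is rebuilt with character columns for brevity, and `hit` is an illustrative name; the resulting Model column is character, since it mixes group numbers and unmatched model codes.

```r
df1 <- data.frame(
  Model = c("A", "B", "C", "D", "F"),
  SOLD  = c(5L, 4L, 4L, 3L, 11L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Model = c("A", "B", "C", "D"),
  Group = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Position of each df1 model in the lookup table; NA if it is not listed
hit <- match(df1$Model, df2$Model)

# Keep the original model where there is no match, otherwise use the group code
df1$Model <- ifelse(is.na(hit), df1$Model, as.character(df2$Group[hit]))
df1
```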
