Rename a category of a variable in R

I have a text variable x1 that takes the values a, b, c and d. I need to rename category d to f, so in the output I expect a, b, c, f.
How can I do it?
Here is my dataset:
mydat=structure(list(x1 = structure(1:4, .Label = c("a", "b", "c",
"d"), class = "factor"), x2 = c(1L, 1L, 1L, 1L), x3 = c(2L, 2L,
2L, 2L)), .Names = c("x1", "x2", "x3"), class = "data.frame", row.names = c(NA,
-4L))

Convert it to character, replace the value with simple subsetting, and (optionally) convert it back to a factor:
mydat$x1 <- as.character(mydat$x1)
mydat$x1[mydat$x1 == 'd'] <- 'f'
# optional
mydat$x1 <- as.factor(mydat$x1)
Or, if you prefer a dplyr solution:
library(dplyr)
mydat %>%
mutate(x1 = as.character(x1),
x1 = if_else(x1 == 'd', 'f', x1),
x1 = as.factor(x1))
Both yield:
x1 x2 x3
1 a 1 2
2 b 1 2
3 c 1 2
4 f 1 2
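In base R you can also rename the factor level directly, without the round trip through character. A minimal sketch on the question's data, assuming x1 is a factor as in the dput() output above:

```r
# Data from the question: x1 is a factor with levels a, b, c, d
mydat <- data.frame(x1 = factor(c("a", "b", "c", "d")),
                    x2 = c(1L, 1L, 1L, 1L),
                    x3 = c(2L, 2L, 2L, 2L))

# Relabel the level "d" as "f"; every value using that level changes with it
levels(mydat$x1)[levels(mydat$x1) == "d"] <- "f"

print(mydat)
```

Because only the level label changes, x1 stays a factor and its internal codes are untouched. If "f" were already a level, assigning a duplicate label would merge the two levels.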


Inserting R dataframe column to dataframe based on presence absence values

My input file is:
input_file <- structure(list(species = structure(1:3, .Label = c("x", "y",
"z"), class = "factor"), header1 = c(0L, 1L, 0L), header2 = c(0L,
1L, 1L), header3 = c(1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-3L))
Here, 1 and 0 indicate presence and absence, respectively.
Based on these presence/absence values, I need to convert this file to:
output_file <- structure(list(header1 = structure(c(2L, 1L, 1L), .Label = c("",
"y"), class = "factor"), header2 = structure(c(2L, 3L, 1L), .Label = c("",
"y", "z"), class = "factor"), header3 = structure(1:3, .Label = c("x",
"y", "z"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
As a first step, I tried to melt my input file using reshape2:
library(reshape2)
df2 <- melt(input_file, id.var = "species")
Now I am not sure how to get from df2 to my desired output.
Thanks!
Since you are using reshape2, you could do:
library(reshape2)
dcast(subset(df2, value > 0), ave(value, variable, FUN = seq_along) ~ variable, value.var = "species")[-1]
header1 header2 header3
1 y y x
2 <NA> z y
3 <NA> <NA> z
You can then replace the NAs with empty strings.
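A minimal sketch of that last step. The data frame below is a hypothetical stand-in typed in by hand to mirror the dcast() output shown above; the columns are converted to character first, because assigning "" into a factor that has no "" level produces NA (with a warning) instead of a blank.

```r
# Hypothetical stand-in for the dcast() result above
res <- data.frame(header1 = c("y", NA, NA),
                  header2 = c("y", "z", NA),
                  header3 = c("x", "y", "z"),
                  stringsAsFactors = FALSE)

# Ensure every column is character (dcast may return factors)
res[] <- lapply(res, as.character)

# Replace every NA cell with an empty string
res[is.na(res)] <- ""
print(res)
```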
In base R, you could do:
df1 <- subset(reshape(input_file,-1,sep="",dir="long",idvar = "species"),header>0)
reshape(transform(df1,header = ave(time,time,FUN = seq_along)),dir="wide",idvar = "header",sep="")[-1]
species1 species2 species3
y.1 y y x
z.2 <NA> z y
z.3 <NA> <NA> z
Here's a base R solution. It first runs ifelse() on each row: where it finds a 1, it substitutes the species name, and where it finds a 0, it writes a blank. The species column is then removed. The second line moves any empty cells to the bottom of each column.
m <- t(apply(input_file, 1, function(x) ifelse(x == "1", x[1], ""))[-1,])
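# stringsAsFactors = TRUE keeps factor columns (like output_file) in R >= 4.0.0
df <- as.data.frame(apply(m, 2, function(x) x[order(-nchar(x))]), stringsAsFactors = TRUE)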
So we can see this matches your output file:
df
#> header1 header2 header3
#> 1 y y x
#> 2 z y
#> 3 z
identical(df, output_file)
#> [1] TRUE
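Another base R sketch works column by column instead of row by row: for each header column, keep the species marked present and pad with "" to the full number of rows. It assumes the presence markers are exactly 1, as in the question.

```r
input_file <- data.frame(species = factor(c("x", "y", "z")),
                         header1 = c(0L, 1L, 0L),
                         header2 = c(0L, 1L, 1L),
                         header3 = c(1L, 1L, 1L))

n <- nrow(input_file)
species <- as.character(input_file$species)

# For each header column: species present (1) first, then "" padding
out <- as.data.frame(
  lapply(input_file[-1], function(col) {
    present <- species[col == 1]
    c(present, rep("", n - length(present)))
  }),
  stringsAsFactors = FALSE
)
print(out)
```

This returns character columns; pass stringsAsFactors = TRUE instead if you need factor columns like output_file.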

How do I repeatedly join tables in R?

I would like to repeatedly join two tables. Here are the tables:
structure(list(key = structure(1:4, .Label = c("A", "B", "C", "D"),
class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
structure(list(key = structure(c(1L, 2L, 2L, 3L), .Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L, 2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L, 2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
<joined>
key
A
B
C
D
<joining>
key source value
A a 1
B a 1
B b 2
C b 2
If I use left_join, as in left_join(joined, joining, by = "key"), the result is:
key source value
1 A a 1
2 B a 1
3 B b 2
4 C b 2
5 D <NA> NA
However, I want to join separately for each value of "source". I get my expected result by splitting the table manually:
joining_a <- joining %>%
filter(source == "a")
joining_b <- joining %>%
filter(source == "b")
left_join(joined, joining_a, by = "key")
left_join(joined, joining_b, by = "key")
bind_rows(left_join(joined, joining_a, by = "key"), left_join(joined, joining_b, by = "key"))
key source value
1 A a 1
2 B a 1
3 C <NA> NA
4 D <NA> NA
5 A <NA> NA
6 B b 2
7 C b 2
8 D <NA> NA
How do I get this result without splitting the tables by hand?
We can group_split (or split in base R) 'joining' into a list and then left_join each element to 'joined' using map_dfr:
library(tidyverse)
joining %>%
group_split(source) %>%
map_dfr(~ left_join(joined, .x, by = 'key'))
# key source value
#1 A a 1
#2 B a 1
#3 C <NA> NA
#4 D <NA> NA
#5 A <NA> NA
#6 B b 2
#7 C b 2
#8 D <NA> NA
Or, without a lambda function:
joining %>%
group_split(source) %>%
map_dfr(left_join, x = joined, by = 'key')
data
joined <- structure(list(key = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
joining <- structure(list(key = structure(c(1L, 2L, 2L, 3L),
.Label = c("A",
"B", "C"), class = "factor"), source = structure(c(1L, 1L, 2L,
2L), .Label = c("a", "b"), class = "factor"), value = c(1L, 1L,
2L, 2L)), class = "data.frame", row.names = c(NA, -4L))
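If you would rather avoid the tidyverse, the same split-then-join idea can be sketched in base R with split() and merge(all.x = TRUE), which is base R's left join. This is an alternative, not a translation of the answer above; note that merge() sorts the rows by key, which happens to give the desired order here.

```r
joined  <- data.frame(key = factor(c("A", "B", "C", "D")))
joining <- data.frame(key    = factor(c("A", "B", "B", "C")),
                      source = factor(c("a", "a", "b", "b")),
                      value  = c(1L, 1L, 2L, 2L))

# Split 'joining' by source and left-join each piece onto 'joined'
pieces <- lapply(split(joining, joining$source), function(part) {
  merge(joined, part, by = "key", all.x = TRUE)
})

# Stack the per-source results and reset the row names
res <- do.call(rbind, pieces)
rownames(res) <- NULL
print(res)
```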

Spread every other row then unite to append row names in dplyr

I am trying to make untidy data tidy. I have data in the following format:
name x
a NA
value 1
b NA
value 2
c NA
value 3
I would like it to be in the following format:
name x
a_value 1
b_value 2
c_value 3
How can I do this in dplyr?
My first thought is to find a way to spread the data so that I get:
name name2 x x2
a value NA 1
b value NA 2
c value NA 3
From there I know I can unite name and name2 and delete column x, but I am not sure whether spread can produce the layout above.
You can create a group that starts at each NA in x and summarise within it:
library(dplyr)
df %>%
group_by(grp = cumsum(is.na(x))) %>%
summarise(name = paste(name, collapse = '_'))
which gives,
# A tibble: 3 x 2
grp name
<int> <chr>
1 1 a_value
2 2 b_value
3 3 c_value
DATA
dput(df)
structure(list(name = c("a", "value", "b", "value", "c", "value"
), x = c(NA, 1L, NA, 2L, NA, 3L)), .Names = c("name", "x"), row.names = c(NA,
-6L), class = "data.frame")
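The summarise() above returns only the joined names. A base R sketch of the same grouping idea that also keeps x is shown below; it assumes each group contains exactly one non-NA x value (vapply() stops with an error otherwise).

```r
df <- data.frame(name = c("a", "value", "b", "value", "c", "value"),
                 x    = c(NA, 1L, NA, 2L, NA, 3L),
                 stringsAsFactors = FALSE)

# A new group starts at every NA in x, as with cumsum(is.na(x)) above
grp <- cumsum(is.na(df$x))

out <- data.frame(
  # Join the names within each group, e.g. "a" and "value" -> "a_value"
  name = unname(vapply(split(df$name, grp), paste, character(1), collapse = "_")),
  # Keep the single non-NA x value of each group
  x    = unname(vapply(split(df$x, grp), function(v) v[!is.na(v)], integer(1))),
  stringsAsFactors = FALSE
)
print(out)
```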
Use na.locf from zoo to fill each NA in x with the next non-NA value, then remove the unwanted rows:
library(dplyr)
library(zoo)
DF %>%
mutate(x = na.locf(x, fromLast = TRUE)) %>%
filter(name != "value")
giving:
name x
1 a 1
2 b 2
3 c 3
Note
DF <-
structure(list(name = structure(c(1L, 4L, 2L, 4L, 3L, 4L), .Label = c("a",
"b", "c", "value"), class = "factor"), x = c(NA, 1L, NA, 2L,
NA, 3L)), .Names = c("name", "x"), class = "data.frame", row.names = c(NA,
-6L))

I need to convert the levels of multiple categorical variables into 0 and 1

I have five columns with 2 levels each, named a, b, x, y and z. The commands below work for one column at a time, but I need to apply them to all five columns at once.
levels(car_data[,"x"]) <- c(0,1)
car_data[,"x"] <- as.numeric(levels(car_data[,"x"]))[car_data[,"x"]]
If every column has exactly two levels, we can do:
library(dplyr)
car_data %>%
mutate(across(everything(), ~ as.integer(.x) - 1))  # funs() is deprecated; across() needs dplyr >= 1.0.0
# a b c
#1 0 0 0
#2 1 1 1
#3 0 0 0
#4 1 1 1
data
car_data <- structure(list(a = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), b = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor"), c = structure(c(1L, 2L, 1L, 2L), .Label = c("a",
"b"), class = "factor")), .Names = c("a", "b", "c"), row.names = c(NA,
-4L), class = "data.frame")
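A base R sketch that recodes only a chosen set of columns. as.integer() on a factor returns its level codes (1 and 2 here), so subtracting 1 maps the first level to 0 and the second to 1; which value becomes 0 therefore depends on the level order, which you can set with factor(..., levels = ) or relevel() beforehand.

```r
car_data <- data.frame(a = factor(c("a", "b", "a", "b")),
                       b = factor(c("a", "b", "a", "b")),
                       c = factor(c("a", "b", "a", "b")))

# Columns to recode; in the question these would be c("a", "b", "x", "y", "z")
cols <- c("a", "b", "c")

# Level codes 1/2 become integers 0/1
car_data[cols] <- lapply(car_data[cols], function(f) as.integer(f) - 1L)
print(car_data)
```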

R language: Replace variables if the value can be matched in another table

I have a sales report table (DF1), and I need to replace a few product codes with their associated group codes:
Model SOLD
A 5
B 4
C 4
D 3
F 11
I have another table (DF2) that maps each Model# to its group code:
Model Group
A 1
B 1
C 2
D 2
I would like to replace the Model# in DF1 with the group number if the model exists in DF2.
The desired end result:
Model SOLD
1 5
1 4
2 4
2 3
F 11
Thank you!
You can do this with the lookup family from qdapTools, specifically the binary operator %lc+% (a wrapper around the data.table package). The l stands for lookup, the c forces the terms to character, and the + replaces only those elements that are found in the lookup table:
library(qdap)
df1$Model <- df1$Model %lc+% df2
Here it is as a complete, reproducible example:
df1 <- structure(list(Model = structure(1:5, .Label = c("A", "B", "C",
"D", "F"), class = "factor"), SOLD = c(5L, 4L, 4L, 3L, 11L)), .Names = c("Model",
"SOLD"), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Model = structure(1:4, .Label = c("A", "B", "C",
"D"), class = "factor"), Group = c(1L, 1L, 2L, 2L)), .Names = c("Model",
"Group"), class = "data.frame", row.names = c(NA, -4L))
library(qdap)
df1$Model <- df1$Model %lc+% df2
df1
## Model SOLD
## 1 1 5
## 2 1 4
## 3 2 4
## 4 2 3
## 5 F 11
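If you prefer not to depend on qdap, the same lookup can be sketched in base R with match(). This assumes Model is stored as character; with the factor columns from the dput() above, convert them first with as.character().

```r
# The question's tables, with Model stored as character
df1 <- data.frame(Model = c("A", "B", "C", "D", "F"),
                  SOLD  = c(5L, 4L, 4L, 3L, 11L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Model = c("A", "B", "C", "D"),
                  Group = c(1L, 1L, 2L, 2L),
                  stringsAsFactors = FALSE)

# Position of each df1 model in df2 (NA when the model is not listed)
idx <- match(df1$Model, df2$Model)

# Use the group code where a match exists, otherwise keep the original model
df1$Model <- ifelse(is.na(idx), df1$Model, as.character(df2$Group[idx]))
print(df1)
```

Because the column mixes group numbers and unmatched model codes, the result is necessarily character.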
