How to change the row names in R data.frame?

I would like to rename the Station values in DF, for example DA056 to Happy and AB786 to Sad.
library(tidyverse)
DF1 <- data.frame(Station = rep("DA056",3), Level = 100:102)
DF2 <- data.frame(Station = rep("AB786",3), Level = 201:203)
DF <- bind_rows(DF1,DF2)

We can use factor with labels specified for corresponding levels
library(dplyr)
DF <- DF %>%
mutate(Station = factor(Station, levels = c("DA056", "AB786"),
labels = c("Happy", "Sad")))
DF$Station
#[1] Happy Happy Happy Sad Sad Sad
#Levels: Happy Sad
Or with recode
DF %>%
mutate(Station = recode(Station, DA056 = 'Happy', AB786 = 'Sad'))
# Station Level
#1 Happy 100
#2 Happy 101
#3 Happy 102
#4 Sad 201
#5 Sad 202
#6 Sad 203
If there are many values to change, a better option is to join against a key/value dataset:
keyval <- data.frame(Station = c("DA056", "AB786"),
                     val = c("Happy", "Sad"), stringsAsFactors = FALSE)
DF %>%
  left_join(keyval, by = "Station") %>%
  mutate(Station = coalesce(val, Station))
Or with base R
DF$Station <- with(DF, factor(Station, levels = c("DA056", "AB786"),
                              labels = c("Happy", "Sad")))
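The key/value idea can also be sketched in base R with a named lookup vector. This is a minimal illustration rather than part of the answers above: the `lookup` vector and the extra station `ZZ999` are hypothetical, and the sketch assumes `Station` is stored as character, not factor.

```r
# Hypothetical lookup: names are the old codes, values are the new labels
lookup <- c(DA056 = "Happy", AB786 = "Sad")

# Small example data; ZZ999 is an invented code with no entry in the lookup
DF <- data.frame(Station = rep(c("DA056", "AB786", "ZZ999"), each = 2),
                 Level = 1:6, stringsAsFactors = FALSE)

# Index the lookup by the old codes; codes without an entry come back as NA
new_names <- unname(lookup[DF$Station])

# Keep the original code wherever the lookup has no replacement
DF$Station <- ifelse(is.na(new_names), DF$Station, new_names)
DF$Station
```

With many stations, only the `lookup` vector needs to grow; the renaming step stays the same.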

An option is to use dplyr::case_when:
library(dplyr)
DF1 <- data.frame(Station = rep("DA056",3), Level = 100:102, stringsAsFactors = FALSE)
DF2 <- data.frame(Station = rep("AB786",3), Level = 201:203, stringsAsFactors = FALSE)
DF <- bind_rows(DF1,DF2)
DF <- DF %>% mutate(Station = case_when( Station == "DA056" ~ "Happy",
Station == "AB786" ~ "Sad",
TRUE ~ Station))
Output
> DF
Station Level
1 Happy 100
2 Happy 101
3 Happy 102
4 Sad 201
5 Sad 202
6 Sad 203

You can do it using case_when:
DF %>%
mutate(Station = case_when(Station == "DA056" ~ "Happy", Station =="AB786" ~ "Sad"))

Another simple solution, which works when these are the only two stations:
DF$Station <- ifelse(DF$Station == "DA056", "Happy", "Sad")
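One caveat with a two-way ifelse() is that every value other than DA056 becomes "Sad". The sketch below shows this on a made-up vector; the third code `XY123` is hypothetical and only there to show the difference from a nested ifelse() that leaves unknown codes untouched.

```r
# Invented example vector with one station that should not be renamed
Station <- c("DA056", "AB786", "XY123")

# Two-way ifelse(): anything that is not DA056 is labelled "Sad"
two_way <- ifelse(Station == "DA056", "Happy", "Sad")

# Nested ifelse(): unknown codes keep their original value
nested <- ifelse(Station == "DA056", "Happy",
                 ifelse(Station == "AB786", "Sad", Station))

two_way  # "Happy" "Sad" "Sad"
nested   # "Happy" "Sad" "XY123"
```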

Related

r: add day to a date-variable under a certain condition

I was trying to add one day to the to-variable - but only if from is not missing:
df <- data.frame(from = c("2020-01-01", "2020-02-01", ""),
to = c("2020-01-05", "2020-02-20", "2020-03-04"))
df <- df %>% mutate(
from = as.Date(from),
to = as.Date(to),
to = ifelse(!is.na(from), to + 1, to)
)
df
Obviously, this doesn't work :(. Can anyone tell me how to do it?
Try this. The function ifelse() strips attributes, so it returns dates as plain numbers. Instead, you can use if_else() from dplyr, which keeps the Date class. Here is the code:
#Data
df <- data.frame(from = c("2020-01-01", "2020-02-01", ""),
to = c("2020-01-05", "2020-02-20", "2020-03-04"))
#Variables
df <- df %>% mutate(
from = as.Date(from),
to = as.Date(to),
to = if_else(!is.na(from), to + 1, to)
)
#Output
df
Output:
df
from to
1 2020-01-01 2020-01-06
2 2020-02-01 2020-02-21
3 <NA> 2020-03-04
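To see the problem in isolation, here is a small base R sketch. It shows that ifelse() returns a plain numeric vector, and then one possible alternative that uses logical indexing instead of if_else(). The vectors `from` and `to` mirror the question's columns; `idx` and `as_numbers` are illustrative names.

```r
from <- as.Date(c("2020-01-01", "2020-02-01", NA))
to   <- as.Date(c("2020-01-05", "2020-02-20", "2020-03-04"))

# ifelse() drops the Date class and returns the underlying day counts
as_numbers <- ifelse(!is.na(from), to + 1, to)
class(as_numbers)  # "numeric"

# Base R alternative: add one day only where `from` is not missing
idx <- !is.na(from)
to[idx] <- to[idx] + 1
to  # "2020-01-06" "2020-02-21" "2020-03-04"
```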
We can also do
library(lubridate)
library(dplyr)
df %>%
mutate(across(everything(), ymd),
to = case_when(!is.na(from)~ to+1, TRUE ~ to))
Output:
# from to
#1 2020-01-01 2020-01-06
#2 2020-02-01 2020-02-21
#3 <NA> 2020-03-04

Summarizing and spreading data

I have data similar to below :
df=data.frame(
company=c("McD","McD","McD","KFC","KFC"),
Title=c("Crew Member","Manager","Trainer","Crew Member","Manager"),
Manhours=c(12,NA,5,13,10)
)
df
I would like to transform it to obtain the data frame below:
df=data.frame(
company=c("KFC", "McD"),
Manager=c(1,1),
Surbodinate=c(1,2),
TotalEmp=c(2,3),
TotalHours=c(23,17)
)
I have managed to manipulate and categorise the employees as well as their count as below:
df<- df %>%
mutate(Role = if_else((Title=="Manager" ),
"Manager","Surbodinate"))%>%
count(company, Role) %>%
spread(Role, n, fill=0)%>%
as.data.frame() %>%
mutate(TotalEmp= select(., Manager:Surbodinate) %>%
apply(1, sum, na.rm=TRUE))
Also, I have summarised the man hours as below:
df <- df %>%group_by(company) %>%
summarize(TotalHours = sum(Manhours, na.rm = TRUE))
How can I combine these two steps into one, or is there a cleaner/simpler way of getting the desired output?
dplyr solution:
df %>%
mutate(Title = if_else((Title=="Manager" ),
"Manager","Surbodinate")) %>%
group_by(company) %>%
  summarise(Manager = sum(Title == "Manager"),
            Subordinate = sum(Title == "Surbodinate"),
            TotalEmp = n(),
            Manhours = sum(Manhours, na.rm = TRUE))
company Manager Subordinate TotalEmp Manhours
<fct> <int> <int> <int> <dbl>
1 KFC 1 1 2 23
2 McD 1 2 3 17
How about something like this:
df %>%
mutate(Role = ifelse(Title=="Manager" ,
"Manager", "Surbodinate"))%>%
group_by(company) %>%
mutate(TotalEmp = n(),
TotalHours = sum(Manhours, na.rm=TRUE)) %>%
reshape2::dcast(company + TotalEmp + TotalHours ~ Role)
This is neither tidyverse nor a one-step process, but with data.table you could do:
library(data.table)
# Create the DT object used below and key it by company
DT <- as.data.table(df)
setkey(DT, company)
totals <- DT[, .(TotalEmp = .N, TotalHours = sum(Manhours, na.rm = TRUE)), by = company]
dcast(DT, company ~ ifelse(Title == "Manager", "Manager", "Surbodinate"))[totals]
# company Manager Surbodinate TotalEmp TotalHours
# 1 KFC 1 1 2 23
# 2 McD 1 2 3 17
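For comparison, the same summary can be sketched in base R with tapply(). This is only an illustration under the assumption that the input is the question's original `df`; `is_manager` and `out` are names chosen here, and the column `Surbodinate` keeps the spelling used in the question's desired output.

```r
df <- data.frame(
  company  = c("McD", "McD", "McD", "KFC", "KFC"),
  Title    = c("Crew Member", "Manager", "Trainer", "Crew Member", "Manager"),
  Manhours = c(12, NA, 5, 13, 10)
)

# TRUE for managers, FALSE for everyone else
is_manager <- df$Title == "Manager"

# tapply() returns one value per company, ordered by company name
out <- data.frame(
  company     = sort(unique(df$company)),
  Manager     = as.vector(tapply(is_manager, df$company, sum)),
  Surbodinate = as.vector(tapply(!is_manager, df$company, sum)),
  TotalEmp    = as.vector(tapply(df$Title, df$company, length)),
  TotalHours  = as.vector(tapply(df$Manhours, df$company, sum, na.rm = TRUE))
)
out
```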

How can I convert data frame of survey responses to a frequency table?

I have an R data frame of survey results. Each column is a response to a question on the survey. It can take values 1 to 10 and NA. I would like to turn this into a frequency table.
This is an example of the data I have. I'm pretending the values go from 1 to 3, instead of 1 to 10.
data.frame(
"Person" = c(1,2,3),
"Question1" = c(NA, "1", "1"),
"Question2" = c("1", "2", "3")
)
What I want:
data.frame(
"Question" = c("Question1", "Question2"),
"Frequency of 1" = c(2, 1),
"Frequency of 2" = c(0 , 1),
"Frequency of 3" = c(0, 1)
)
I have tried using likert() from the likert package, but I'm getting fractional results which cannot be correct. Is there a simple solution to this problem?
Here is a solution using the dplyr and purrr packages
library(dplyr)
library(purrr)
df <- data.frame(
"Person" = c(1,2,3),
"Question1" = c(NA, "1", "1"),
"Question2" = c("1", "2", "3")
)
df %>%
select(-Person) %>%
mutate_all(~ factor(.x, levels = as.character(1:10) ) %>% addNA() ) %>%
map(table) %>%
transpose() %>%
map(as.integer) %>%
set_names( ~ paste0("Frequency of ",ifelse(is.na(.), "NA", .))) %>%
as_tibble() %>%
mutate(Question = setdiff(names(df),"Person")) %>%
select(Question,everything(), "Frequency of NA" = `Frequency of ` )
A data.table solution:
require(data.table)
setDT(df)
# Melt data:
df <- melt(df, id.vars = "Person", value.name = "Question")
# Cast data to required structure:
df <- data.frame(dcast(df, variable ~ Question))
# Rename variables and remove NA count (as per the OP's question):
names(df)[1] <- "Question"
names(df)[-1] <- gsub("X", "Frequency of ", names(df)[-1])
df$NA. <- NULL
df
# Question Frequency of 1 Frequency of 2 Frequency of 3
#1 Question1 2 0 0
#2 Question2 1 1 1
Or a one line answer:
dcast(melt(setDT(df), id.vars="Person", value.name="Question")[!Question %in% NA][, Question := paste0("Frequency of ", Question)], variable ~ Question)
A different tidyverse possibility could be:
df %>%
gather(Question, val, -Person, na.rm = TRUE) %>%
group_by(Question, val) %>%
summarise(res = length(val)) %>%
ungroup() %>%
mutate(val = paste0("Frequency.of.", val)) %>%
spread(val, res, fill = NA)
Question Frequency.of.1 Frequency.of.2 Frequency.of.3
<chr> <int> <int> <int>
1 Question1 2 NA NA
2 Question2 1 1 1
Here, the code first transforms the data from wide to long format. Second, it calculates the frequencies for each question. Finally, it creates the "Frequency.of." variables and returns the data to the desired shape.
Or, if you also want to count the NA values per question:
df %>%
gather(Question, val, -Person) %>%
group_by(Question, val) %>%
summarise(res = length(val)) %>%
ungroup() %>%
mutate(val = paste0("Frequency.of.", val)) %>%
spread(val, res, fill = NA)
Question Frequency.of.1 Frequency.of.2 Frequency.of.3 Frequency.of.NA
<chr> <int> <int> <int> <int>
1 Question1 2 NA NA 1
2 Question2 1 1 1 NA
This is not the most elegant solution, but it might help. Here df2 is your data set.
Data:
df2<-data.frame(
"Person" = c(1,2,3),
"Question1" = c(NA, "1", "1"),
"Question2" = c("1", "2", "3"),stringsAsFactors = FALSE
)
You could automate the counting as follows:
df2[is.na(df2)]<-0 # Replace NA so that x == val never returns NA
values<-c("1","2","3")
Final_df<-sapply(values,function(val) apply(df2[,-1],2,function(x) sum(x==val)))
Final_df<-as.data.frame(Final_df)
names(Final_df)<-paste0("Frequency of_",1:ncol(Final_df))
This yields:
Frequency of_1 Frequency of_2 Frequency of_3
Question1 2 0 0
Question2 1 1 1
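A shorter base R sketch of the same counting idea uses table() on a factor with fixed levels, so that values which never occur are still counted as 0 and NA is dropped. It assumes the possible answers are known in advance; `values`, `counts` and `freq` are illustrative names.

```r
df <- data.frame(
  Person = c(1, 2, 3),
  Question1 = c(NA, "1", "1"),
  Question2 = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# All answers that may occur (use as.character(1:10) for the real survey)
values <- as.character(1:3)

# One row per question, one column per possible answer
counts <- t(sapply(df[-1], function(x) table(factor(x, levels = values))))

freq <- data.frame(Question = rownames(counts), counts,
                   check.names = FALSE, row.names = NULL)
names(freq)[-1] <- paste("Frequency of", values)
freq
```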

Match grouping variable with striping/shading using kableExtra

I have a table with multiple records for each individual (ID1) and would like the row shading (i.e. kable_styling(c("striped"))) to alternate by group (ID1) rather than by every other row. I was hoping I could add group_by(ID1) to the code below, but I am still in search of a solution. While lots of helpful tips are shown here, I have not been able to find one that does this.
I am also wondering how to make a single outside border to the table rather than border every cell.
Below is a reproducible data set.
Many thanks in advance.
```{r echo=F, warning=F, message = FALSE}
library(tidyverse)
library(kableExtra)
set.seed(121)
Dat <- data.frame(
ID1 = sample(c("AAA", "BBB", "CCC","DDD"), 100, replace = T),
ID2 = sample(c("Cat", "Dog", "Bird"), 100, replace = T),
First = rnorm(100),
Two = sample.int(100))
ExTbl <- Dat %>%
group_by(ID1, ID2) %>%
summarize(One = mean(First),
Max = max(Two)) %>%
arrange(ID1)
kable(ExTbl) %>%
kable_styling(c("striped", "bordered"), full_width = F)
```
> head(as.data.frame(ExTbl) )
ID1 ID2 One Max
1 AAA Bird 0.15324169 86
2 AAA Cat -0.02726006 83
3 AAA Dog -0.19618126 78
4 BBB Bird 0.62176633 100
5 BBB Cat -0.35502912 77
6 BBB Dog -0.29977145 87
>
Right now there is no direct approach in kableExtra but this is the method I used last time. Maybe I should pack this into this package.
library(tidyverse)
library(kableExtra)
set.seed(121)
Dat <- data.frame(
ID1 = sample(c("AAA", "BBB", "CCC","DDD"), 100, replace = T),
ID2 = sample(c("Cat", "Dog", "Bird"), 100, replace = T),
First = rnorm(100),
Two = sample.int(100))
ExTbl <- Dat %>%
group_by(ID1, ID2) %>%
summarize(One = mean(First),
Max = max(Two)) %>%
arrange(ID1)
ind_end <- cumsum(rle(as.character(ExTbl$ID1))$lengths)
ind_start <- c(1, ind_end[-length(ind_end)] + 1)
pos <- purrr::map2(ind_start, ind_end, seq)
pos <- unlist(pos[1:length(pos) %% 2 != 0])
kable(ExTbl) %>%
kable_styling(c("bordered"), full_width = F) %>%
row_spec(pos, background = "#EEEEEE")
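The index arithmetic above can be checked on its own without kableExtra. The sketch below uses a made-up `ids` vector and an equivalent formulation: it numbers each run of identical IDs with rle() and keeps the rows of every odd-numbered group. The names `ids`, `runs` and `group_no` are illustrative.

```r
# Invented, already sorted group column
ids <- c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "DDD", "DDD")

# Run-length encode the IDs and give each run a consecutive group number
runs <- rle(ids)
group_no <- rep(seq_along(runs$lengths), runs$lengths)

# Rows that belong to the 1st, 3rd, 5th, ... group get the shading
pos <- which(group_no %% 2 == 1)
pos  # 1 2 3 6
```

The resulting `pos` can be passed to row_spec() in the same way as in the answer.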

Rename columns of dataframe by days in R

I need to rename the columns of a data frame by day of analysis.
names(dados) <- c("name", "day_1","Freq_1","Percent_1","day_2","Freq_2","Percent_2",
"day_3","Freq_3","Percent_3","day_4","Freq_4","Percent_4",
"day_5","Freq_5","Percent_5","day_6","Freq_6","Percent_6",
"day_7","Freq_7","Percent_7","day_8","Freq_8","Percent_8",
"day_9","Freq_9","Percent_9")
My data come as a list of data frames, where each data frame represents one day of analysis. After combining them, I have a single 'name' column plus 'day_X', 'Freq_X' and 'Percent_X' columns for each day.
The resulting columns need to have the following names:
"name", "day_1","Freq_1","Percent_1","day_2","Freq_2","Percent_2","day_3","Freq_3","Percent_3"
How do I go about this when analyzing 50 days?
reproducible example:
day1 <- data.frame(name = c("jose", "mary", "julia"), freq = c(1,5,3), percent = c(40,30,20))
day2 <- data.frame(name = c("abner", "jose", "mary"), freq = c(3,5,4), percent = c(20,30,20))
day3 <- data.frame(name = c("abner", "jose", "mike"), freq = c(6,2,3), percent = c(40,30,70))
day4 <- data.frame(name = c("andre", "joseph", "ana"), freq = c(1,5,8), percent = c(40,30,20))
day5 <- data.frame(name = c("abner", "poli", "joseph"), freq = c(4,3,3), percent = c(10,30,10))
dates <- list(day1,day2,day4,day5)
data <- Reduce(function(x, y) merge(x, y, by = "name", all = TRUE), dates)
Here's a way to get what you want using the tidyverse suite of packages. We start by putting the data in the "long" format, adding a column with the day number:
library(tidyverse)
long_form <- dates %>%
  imap_dfr(function(x, y) dplyr::mutate(x, day_num = y))
Now, to get the wide format you are after, we need to reshape things a bit, as done in the following code. I'm not sure what is supposed to go in the day_# variables, as @useR mentioned in the comments, so they are left out. If you have a variable called day, the code should automatically do the right thing as written.
wide_form <- long_form %>%
gather(key, value, -name,-day_num) %>%
dplyr::mutate(
key = paste(key, day_num, sep = "_")
) %>%
select(-day_num) %>%
spread(key, value)
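The same renaming idea can be sketched in base R by suffixing each day's columns before merging, which also avoids the duplicate-name problem in the Reduce/merge approach from the question. This is a minimal illustration on two of the question's data frames; the helper passed to Map() is written here for the example and is not part of the answer above.

```r
day1 <- data.frame(name = c("jose", "mary", "julia"), freq = c(1, 5, 3), percent = c(40, 30, 20))
day2 <- data.frame(name = c("abner", "jose", "mary"), freq = c(3, 5, 4), percent = c(20, 30, 20))
dates <- list(day1, day2)

# Append the day number to every column except the merge key
dates <- Map(function(d, i) {
  keep <- names(d) != "name"
  names(d)[keep] <- paste(names(d)[keep], i, sep = "_")
  d
}, dates, seq_along(dates))

# Full outer join of all days on the shared 'name' column
wide <- Reduce(function(x, y) merge(x, y, by = "name", all = TRUE), dates)
names(wide)  # "name" "freq_1" "percent_1" "freq_2" "percent_2"
```

Because the suffix comes from seq_along(dates), the same code works unchanged for a list of 50 days.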
You can use dplyr::bind_rows to combine all data frames from the list into one data frame. Name the list elements first (day1, day2, etc.) so that bind_rows can store those names in the .id column. Finally, gather and spread are used to reshape the data.
names(dates) <- paste("day", seq_along(dates), sep = "")
library(tidyverse)
bind_rows(dates,.id = "Name") %>%
group_by(Name) %>%
mutate(rn = row_number()) %>%
ungroup() %>%
gather(Key, value, -Name,-rn) %>%
unite("Key", c("Key", "Name")) %>%
spread(Key, value) %>%
select(-rn)
Result:
# # A tibble: 3 x 12
# freq_day1 freq_day2 freq_day3 freq_day4 name_day1 name_day2 name_day3 name_day4 percent_day1 percent_day2 percent~ percent~
# * <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 1 3 1 4 jose abner andre abner 40 20 40 10
# 2 5 5 5 3 mary jose joseph poli 30 30 30 30
# 3 3 4 8 3 julia mary ana joseph 20 20 20 10
#
Data:
Data is slightly modified from OP. I have included stringsAsFactors = FALSE argument as part of data.frame to avoid a mutate_at call to convert factor to character.
day1 <- data.frame(name = c("jose", "mary", "julia"), freq = c(1,5,3), percent = c(40,30,20), stringsAsFactors = FALSE)
day2 <- data.frame(name = c("abner", "jose", "mary"), freq = c(3,5,4), percent = c(20,30,20), stringsAsFactors = FALSE)
day3 <- data.frame(name = c("abner", "jose", "mike"), freq = c(6,2,3), percent = c(40,30,70), stringsAsFactors = FALSE)
day4 <- data.frame(name = c("andre", "joseph", "ana"), freq = c(1,5,8), percent = c(40,30,20), stringsAsFactors = FALSE)
day5 <- data.frame(name = c("abner", "poli", "joseph"), freq = c(4,3,3), percent = c(10,30,10), stringsAsFactors = FALSE)
dates <- list(day1,day2,day4,day5)
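If only the vector of column names from the question is needed, it can be generated rather than typed. The sketch below builds the names for an assumed 50 days with rep() and paste0(); `n_days`, `prefixes` and `new_names` are illustrative names, and the result would be assigned with names(dados) <- new_names only if the data frame really has 1 + 3 * n_days columns in that order.

```r
n_days <- 50
prefixes <- c("day_", "Freq_", "Percent_")

# Repeat the three prefixes once per day and pair them with the day number
new_names <- c("name",
               paste0(rep(prefixes, times = n_days),
                      rep(seq_len(n_days), each = length(prefixes))))

head(new_names, 7)
# "name" "day_1" "Freq_1" "Percent_1" "day_2" "Freq_2" "Percent_2"
length(new_names)  # 151
```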

Resources