I have data like this:
Name Rating
Tom 3
Tom 4
Tom 2
Johnson 5
Johnson 7
But I'd like it so each unique name is instead a column, with the ratings below, in each row. How can I approach this?
Here is a good way of doing it:
x <- data.frame(c("Tom", "Tom", "Tom", "Johnson", "Johnson"), c(3,4,2,5,7))
colnames(x) <- c("Name", "Rating")
n <- unique(x[,1])
m <- max(table(x[,1]))
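# Avoid naming the result `c`, which masks the name of base::c()
out <- data.frame(matrix(NA, ncol = length(n), nrow = m))
for (i in seq_along(n)) {
  l <- x[x[, 1] == n[i], 2]
  # Pad short columns with empty strings (this makes the columns character)
  l2 <- rep("", m - length(l))
  out[, i] <- c(l, l2)
}
colnames(out) <- n
out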
Results:
Tom Johnson
1 3 5
2 4 7
3 2
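The same padding idea can be sketched more compactly in base R by splitting the ratings by name and extending each group to a common length. This is only an illustration: the names `cols` and `wide` are introduced here, and short columns are padded with NA instead of empty strings so that they stay numeric.

```r
x <- data.frame(Name = c("Tom", "Tom", "Tom", "Johnson", "Johnson"),
                Rating = c(3, 4, 2, 5, 7))

# Split ratings by name; the factor levels keep names in order of first appearance
cols <- split(x$Rating, factor(x$Name, levels = unique(x$Name)))

# Pad every group with NA up to the length of the longest group,
# then bind the groups together as columns
m <- max(lengths(cols))
wide <- as.data.frame(lapply(cols, `length<-`, m))
wide
```

Keeping NA as the filler means numeric summaries such as `colMeans(wide, na.rm = TRUE)` still work on the result.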
Here is a way using the CRAN package reshape2.
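library(reshape2)
# mydata is the question's data: one row per (Name, Rating) pair
mydata <- data.frame(Name = c("Tom", "Tom", "Tom", "Johnson", "Johnson"), Rating = c(3, 4, 2, 5, 7))
d <- dcast(mydata, Rating ~ Name, value.var = "Rating")[-1]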
d
# Johnson Tom
#1 NA 2
#2 NA 3
#3 NA 4
#4 5 NA
#5 7 NA
As you can see, there are too many NA values in this result. One way of getting rid of them could be:
d <- lapply(d, function(x) x[!is.na(x)])
n <- max(sapply(d, length))
d <- do.call(cbind.data.frame, lapply(d, function(x) c(x, rep(NA, n - length(x)))))
d
# Johnson Tom
#1 5 2
#2 7 3
#3 NA 4
Well, this does the job but introduces some NAs.
Edit: To replace the NAs with an explicit label such as "No Rating":
mydata <- data.frame(Name = c("Tom", "Tom", "Tom", "Johnson", "Johnson"),
                     Rating = c(3, 4, 2, 5, 7))
library(reshape2)
library(tidyverse)
mydata1 <- mydata %>%
  mutate(Name = as.factor(Name)) %>%
  melt(id.var = "Name") %>%
  dcast(variable + value ~ Name, value.var = "value") %>%
  select(-value) %>%
  rename(Name = variable) %>%
  select_if(is.numeric)
# fct_explicit_na() is deprecated as of forcats 1.0.0 in favour of fct_na_value_to_level()
mydata1 %>%
  mutate(Johnson = as.factor(Johnson), Tom = as.factor(Tom)) %>%
  mutate(Johnson = fct_explicit_na(Johnson, na_level = "No Rating"),
         Tom = fct_explicit_na(Tom, na_level = "No Rating"))
Johnson Tom
1 No Rating 2
2 No Rating 3
3 No Rating 4
4 5 No Rating
5 7 No Rating
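If pulling in forcats only for the label feels heavy, the replacement step can also be sketched in base R. This assumes a wide data frame shaped like the result above (built by hand here for illustration); the name `labelled` is introduced only for this example, and the columns become character.

```r
# Hand-built stand-in for the wide result shown above
d <- data.frame(Johnson = c(NA, NA, NA, 5, 7),
                Tom     = c(2, 3, 4, NA, NA))

# Replace each NA with a label; numbers are converted to character
labelled <- d
labelled[] <- lapply(d, function(col) {
  ifelse(is.na(col), "No Rating", as.character(col))
})
labelled
```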
Related
I need to get the common columns of a data frame list separated in different data frames. Please look at the following example:
df1 <- data.frame(Dates = c('01-01-2020','02-01-2020','03-01-2020'), col1 = c(1,2,3), col2 = c(3,2,1))
df2 <- data.frame(Dates = c('01-01-2020','02-01-2020','03-01-2020'), col1 = c(4,5,6), col2 = c(6,5,4))
df3 <- data.frame(Dates = c('01-01-2020','02-01-2020'), col1 = c(7,8), col2 = c(8,7))
ldf <- list(df1, df2, df3)
The desired output would be the following two data frames:
df_col1:
Date df1 df2 df3
01-01-2020 1 4 7
02-01-2020 2 5 8
03-01-2020 3 6 NA
df_col2:
Date df1 df2 df3
01-01-2020 3 6 8
02-01-2020 2 5 7
03-01-2020 1 4 NA
Of course, ldf is actually way longer, but the number of columns is fixed to 5, so the number of outputs is also fixed (4). This means I wouldn't mind if I use a block of code for each output.
I've tried several things but none seems to work. I'm using base R and hope to find a solution without additional packages.
Thanks a lot for your time!
We bind the list elements with bind_rows from dplyr, then loop over the 'col' columns. For each one we keep it together with the common 'Dates' column, reshape to wide format with pivot_wider, and rename the new columns if needed.
library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
newdf <- bind_rows(ldf)
out <- map(names(newdf)[-1], ~
  newdf %>%
    select(Dates, all_of(.x)) %>%
    # rowid() comes from data.table, not dplyr, so call it with its namespace
    mutate(rn = data.table::rowid(Dates)) %>%
    pivot_wider(names_from = rn, values_from = all_of(.x)) %>%
    rename_at(-1, ~ str_c('df', seq_along(.))))
Output:
out
#[[1]]
# A tibble: 3 x 4
# Dates df1 df2 df3
# <chr> <dbl> <dbl> <dbl>
#1 01-01-2020 1 4 7
#2 02-01-2020 2 5 8
#3 03-01-2020 3 6 NA
#[[2]]
# A tibble: 3 x 4
# Dates df1 df2 df3
# <chr> <dbl> <dbl> <dbl>
#1 01-01-2020 3 6 8
#2 02-01-2020 2 5 7
#3 03-01-2020 1 4 NA
Or using base R
newdf <- do.call(rbind, ldf)
f1 <- function(dat, colName) {
  # One vector per date, each padded with NA to the longest length
  lst1 <- split(dat[[colName]], dat$Dates)
  do.call(rbind, lapply(lst1, `length<-`, max(lengths(lst1))))
}
f1(newdf, 'col1')
f1(newdf, 'col2')
Another Base R option is to do:
m <- Reduce(function(x,y)merge(x, y, by='Dates', all=TRUE), ldf)
lapply(split.default(m[-1], sub("\\..*", "", names(m[-1]))), cbind, m[1])
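When the list grows beyond a few data frames, repeated merges can end up reusing the default `.x`/`.y` suffixes. One way around that, sketched here under the assumption that the list elements are named (`df1`, `df2`, ...), is to pick a single value column and rename it to the data frame's name before merging. The helper name `wide_by_col` is made up for this example.

```r
df1 <- data.frame(Dates = c('01-01-2020', '02-01-2020', '03-01-2020'),
                  col1 = c(1, 2, 3), col2 = c(3, 2, 1))
df2 <- data.frame(Dates = c('01-01-2020', '02-01-2020', '03-01-2020'),
                  col1 = c(4, 5, 6), col2 = c(6, 5, 4))
df3 <- data.frame(Dates = c('01-01-2020', '02-01-2020'),
                  col1 = c(7, 8), col2 = c(8, 7))
ldf <- list(df1 = df1, df2 = df2, df3 = df3)

# Hypothetical helper: keep Dates plus one value column from each data frame,
# rename that column to the list element's name, then full-join on Dates
wide_by_col <- function(ldf, col) {
  pieces <- Map(function(d, nm) setNames(d[c("Dates", col)], c("Dates", nm)),
                ldf, names(ldf))
  Reduce(function(x, y) merge(x, y, by = "Dates", all = TRUE), pieces)
}

df_col1 <- wide_by_col(ldf, "col1")
df_col2 <- wide_by_col(ldf, "col2")
df_col1
```

Because each piece carries a unique column name, no suffixes are needed and the output columns are already labelled by source.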
Another wordy approach using base R:
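# Name the list elements so the source can be recovered from the row names later
names(ldf) <- paste0('df', seq_along(ldf))
# Reshape one data frame to long format (col1 and col2 stacked into 'col')
myfun <- function(x) {
  reshape(x, direction = 'long',
          v.names = 'col',
          idvar = 'Dates', varying = list(2:3))
}
z <- do.call(rbind, lapply(ldf, myfun))
# Row names start with the list name (e.g. "df1."); keep the part before the first dot
z$Data <- gsub("\\..*", "", rownames(z))
rownames(z) <- NULL
# Reshape back to wide, one column per source data frame
z2 <- reshape(z, idvar = c('Dates', 'time'), timevar = 'Data')
# Split into one data frame per original column
List <- split(z2, z2$time)
List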
Output:
List
$`1`
Dates time col.df1 col.df2 col.df3
1 01-01-2020 1 1 4 7
2 02-01-2020 1 2 5 8
3 03-01-2020 1 3 6 NA
$`2`
Dates time col.df1 col.df2 col.df3
4 01-01-2020 2 3 6 8
5 02-01-2020 2 2 5 7
6 03-01-2020 2 1 4 NA
I want to check a word (in a column in a data-frame) against 4 lists (a, b, c, d):
if df$word is in a then df$code <- 1
if df$word is in b then df$code <- 2
if df$word is in c then df$code <- 3
if df$word is in d then df$code <- 4
if df$word is in a & b then df$code <- 1 2
if df$word is in a & c then df$code <- 1 3
if df$word is in a & d then df$code <- 1 4
if df$word is in b & c then df$code <- 2 3
if df$word is in b & d then df$code <- 2 4
if df$word is in c & d then df$code <- 3 4
etc.
What is the most efficient way to do so?
Example
df <- data.frame(word = c("book", "worm", "digital", "context"))
a <- c("book", "context")
b <- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")
Expected output:
book 1 2
worm 2 3
digital 3
context 1 2 3 4
We can use a double sapply loop where for every element in the data frame we check in which list element it is present and get the corresponding list number.
lst <- list(a, b, c, d)
# %in% tests exact membership; grepl() would also match substrings such as "book" in "bookworm"
df$output <- sapply(df$V1, function(x) paste0(which(sapply(lst,
                    function(y) x %in% y)), collapse = ","))
df
# V1 output
#1 book 1,2
#2 worm 2,3
#3 digital 3
#4 context 1,2,3,4
data
df <- read.table(text = "book
worm
digital
context")
Try this:
df <- data.frame(x =c("book", "worm","digital", "context"))
a <- c("book", "context")
b<- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")
anno <- function(x) {
  rslt <- ""
  if (x %in% a) rslt <- paste0(rslt, " 1")
  if (x %in% b) rslt <- paste0(rslt, " 2")
  if (x %in% c) rslt <- paste0(rslt, " 3")
  if (x %in% d) rslt <- paste0(rslt, " 4")
  return(stringr::str_trim(rslt))
}
df$code <- sapply(df$x, anno)
df
#> x code
#> 1 book 1 2
#> 2 worm 2 3
#> 3 digital 3
#> 4 context 1 2 3 4
Created on 2018-08-17 by the reprex package (v0.2.0.9000).
This can also be accomplished in two steps:
1. Combine the four lists and reshape into long format
2. Aggregate while joining with df
Using data.table:
library(data.table)
long <-setDT(melt(list(a, b, c, d), value.name = "word"))
long[setDT(df), on = "word", by = .EACHI, .(code = toString(L1))][]
word code
1: book 1, 2
2: worm 2, 3
3: digital 3
4: context 1, 2, 3, 4
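The same two steps can be sketched without any package. The long table below is built by hand from the four vectors, and the names `lst`, `long` and `codes` are introduced only for this illustration. Words that appear in none of the lists would get NA.

```r
df <- data.frame(word = c("book", "worm", "digital", "context"))
a <- c("book", "context")
b <- c("book", "worm", "context")
c <- c("digital", "worm", "context")
d <- c("context")

# Step 1: long format, one row per (word, list index) pair
lst <- list(a, b, c, d)
long <- data.frame(word = unlist(lst),
                   L1 = rep(seq_along(lst), lengths(lst)))

# Step 2: collapse the list indices per word, then look each word up by name
codes <- vapply(split(long$L1, long$word), toString, character(1))
df$code <- unname(codes[as.character(df$word)])
df
```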
I want to select all columns that start in one of the four following ways: CB, LB, LW, CW, but not any columns that contain the string "con".
My current approach is:
tester <- df_ans[,names(df_ans) %in% colnames(df_ans)[grepl("^(LW|LB|CW|CB)[A-Z_0-9]*",colnames(df_ans))]]
tester <- tester[,names(tester) %in% colnames(tester)[!grepl("con",colnames(tester))]]
Is there a better / more efficient way to do this in a library like dplyr?
We can use matches
library(dplyr)
df %>%
select(matches("^(CB|LB|LW|CW)"), -matches("con"))
# CB1 LB2 CW3 LW20
#1 3 9 6 1
#2 3 3 4 5
#3 7 7 7 7
#4 5 8 7 2
#5 6 3 3 3
data
set.seed(24)
df <- as.data.frame(matrix(sample(1:9, 10 * 5, replace = TRUE),
ncol = 10, dimnames = list(NULL, c("CB1", "LB2", "CW3", "WC1",
"LW20", "conifer", "hercon", "other", "other2", "other3"))))
Try this:
nms <- names(df_ans)
df_ans[ grepl("^(LW|LB|CW|CB)", nms) & !grepl("con", nms) ]
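A self-contained sketch of the same base R idea, using a small made-up `df_ans` (the column names, including `CBcon`, are invented for illustration). It shows that a column with a matching prefix is still dropped when it contains "con".

```r
# Made-up data: CBcon starts with an allowed prefix but contains "con"
df_ans <- data.frame(CB1 = 1:2, LB2 = 3:4, CW3 = 5:6, WC1 = 7:8,
                     LW20 = 9:10, CBcon = 11:12, conifer = 13:14)

nms <- names(df_ans)
# Keep names with an allowed prefix that do not contain the literal "con"
keep <- grepl("^(CB|LB|LW|CW)", nms) & !grepl("con", nms, fixed = TRUE)

# drop = FALSE keeps a data frame even if only one column survives
res <- df_ans[, keep, drop = FALSE]
names(res)
```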
I'm working on a data.table with a column like this:
A <- c("a;b;c","a;a;b","d;a;b","f;f;f")
df <- data.frame(A)
I would like to separate this column into 3 columns like this:
seg1 seg2 seg3
1 a b c
2 a b <NA>
3 d a b
4 f <NA> <NA>
The catch is that when I split each row by ";", I need to keep only the unique values within that row.
Here's a tidyverse approach. We split the character in A, keep only the unique values, paste the result back together and separate into three columns:
library(tidyverse)
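df %>%
  # map_chr() returns a character vector, which separate() expects
  mutate(A = map_chr(strsplit(as.character(A), ";"),
                     .f = ~ paste(unique(.x), collapse = ";"))) %>%
  # Rows with fewer than three unique values are filled with NA on the right
  separate(A, into = c("seg1", "seg2", "seg3"), sep = ";", fill = "right")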
Which gives:
# seg1 seg2 seg3
#1 a b c
#2 a b <NA>
#3 d a b
#4 f <NA> <NA>
library(stringr)
A <- c("a;b;c","a;a;b","d;a;b","f;f;f")
df <- data.frame(A)
df <- str_split_fixed(df$A, ";", 3)
df <- apply(X = df,
            FUN = function(x) {
              return(x[!duplicated(x)][1:ncol(df)])
            },
            MARGIN = 1)
df <- t(df)
df <- as.data.frame(df)
names(df) <- c("seg1", "seg2", "seg3")
df
# seg1 seg2 seg3
# 1 a b c
# 2 a b <NA>
# 3 d a b
# 4 f <NA> <NA>
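For comparison, a base R sketch of the same steps (split, de-duplicate within each row, pad with NA). The names `parts` and `segs` are introduced only for this example, and the number of columns is taken from the longest row rather than fixed at three.

```r
A <- c("a;b;c", "a;a;b", "d;a;b", "f;f;f")

# Split each string on ";" and keep the unique pieces in their original order
parts <- lapply(strsplit(A, ";", fixed = TRUE), unique)

# Pad every row with NA up to the longest row, then stack the rows
n <- max(lengths(parts))
segs <- as.data.frame(do.call(rbind, lapply(parts, `length<-`, n)),
                      stringsAsFactors = FALSE)
names(segs) <- paste0("seg", seq_len(n))
segs
```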
I have a question related to separate() in the tidyr package. When there is no NA in a data frame, separate() works. I have been using this function a lot. But, today I had a case in which there were NAs in a data frame. separate() returned an error message. I could be very silly. But, I wonder if tidyr may not be designed for this kind of data cleaning. Or is there any way separate() can work with NAs? Thank you very much for taking your time.
Here is an updated sample based on the comments. Say I want to separate characters in y and create new columns. If I remove the row with NA, separate() will work. But, I do not want to delete the row, what could I do?
x <- c("a-1","b-2","c-3")
y <- c("d-4","e-5", NA)
z <- c("f-6", "g-7", "h-8")
foo <- data.frame(x, y, z, stringsAsFactors = FALSE)
library(dplyr)   # for %>%
library(tidyr)   # for separate()
ana <- foo %>%
  separate(y, c("part1", "part2"))
# > foo
# x y z
# 1 a-1 d-4 f-6
# 2 b-2 e-5 g-7
# 3 c-3 <NA> h-8
# > ana <- foo %>%
# + separate(y, c("part1", "part2"))
# Error: Values not split into 2 pieces at 3
One way would be:
res <- foo %>%
mutate(y=ifelse(is.na(y), paste0(NA,"-", NA), y)) %>%
separate(y, c('part1', 'part2'))
res[res=='NA'] <- NA
res
# x part1 part2 z
#1 a-1 d 4 f-6
#2 b-2 e 5 g-7
#3 c-3 <NA> <NA> h-8
You can use the extra option in separate().
Here's an example from Hadley's GitHub issue page:
> df <- data.frame(x = c("a", "a b", "a b c", NA))
> df
x
1 a
2 a b
3 a b c
4 <NA>
> df %>% separate(x, c("a", "b"), extra = "merge")
a b
1 a <NA>
2 a b
3 a b c
4 <NA> <NA>
> df %>% separate(x, c("a", "b"), extra = "drop")
a b
1 a <NA>
2 a b
3 a b
4 <NA> <NA>
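If a package-free fallback is useful, the split can also be sketched in base R, where `strsplit()` passes NA through unchanged. This assumes the question's `foo` data and a literal "-" separator; the names `parts` and `res` are introduced here for illustration.

```r
x <- c("a-1", "b-2", "c-3")
y <- c("d-4", "e-5", NA)
z <- c("f-6", "g-7", "h-8")
foo <- data.frame(x, y, z, stringsAsFactors = FALSE)

# strsplit() returns NA for an NA input, so indexing piece 1 or 2 yields NA there
parts <- strsplit(foo$y, "-", fixed = TRUE)
res <- data.frame(x = foo$x,
                  part1 = vapply(parts, `[`, character(1), 1),
                  part2 = vapply(parts, `[`, character(1), 2),
                  z = foo$z,
                  stringsAsFactors = FALSE)
res
```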