Given a dataframe df as follows:
df <- structure(list(class = c("A", "", "", "B", ""), name = c("Jack",
"Rose", "Steve", "James", "Rick"), score = c(7L, 18L, 9L, 15L,
12L)), class = "data.frame", row.names = c(NA, -5L))
Is it possible to get the following output with formattable?
library(formattable)
formattable(df)
Out:
Updated code:
library(flextable)
data <- flextable(
df,
col_keys = c("class", "name", "score"))
data <- merge_v(data, j = c("class"))
formattable(data)
Out:
Error in create_obj(x, "formattable", list(formatter = formatter, format = list(...), :
argument "formatter" is missing, with no default
The following raises the same error as above:
library(gt)
library(dplyr)
library(stringr)
# dummy data
dat <- tibble(
a=1:3,
b=c("a","b c","d e f")
)
d <- dat %>%
mutate(b = str_replace_all(b, " ", "<br>")) %>%
gt() %>%
fmt_markdown(columns = TRUE)
formattable(d)
Code which may help:
a1 <- c(1, 2, 3)
data <- c(100, 155, -4)
a2 <- c(0, paste(data, collapse = "<br> "), 1000000)
b <- data.frame(cbind(a1, a2))
width <- 10
formattable(b)
Out:
Reference:
In R when using formattable() place line breaks between entries inside a single cell
I have a dataset called data1 whose first column I need to split into two columns. The issue is that there is no delimiter between the two parts I need to split, and the character lengths differ in many rows.
I would like to split it by the date and sex.
E.g
12/1/09male
1/9/20female
13/1/19female
4/12/12male
I've been trying this but because the values have a different amount of characters I'm stuck.
separate(data1, col = 1, into = c("date","sex"), sep = "")
Any help would be hugely appreciated!
An option is a positive look-behind and look-ahead to split on a digit followed by an "m" or "f".
df %>% separate(1, c("date", "sex"), sep = "(?<=\\d)(?=[mf])")
# date sex
#1 12/1/09 male
#2 1/9/20 female
#3 13/1/19 female
#4 4/12/12 male
For what it's worth, the same regexp pattern works in base R's strsplit
setNames(do.call(
rbind.data.frame,
strsplit(as.character(df[, 1]), "(?<=\\d)(?=[mf])", perl = TRUE)),
c("date", "sex"))
Sample data
df <- read.table(text =
'12/1/09male
1/9/20female
13/1/19female
4/12/12male')
I am fairly new to R, so I am sure this is not the most elegant solution. I first add a comma between the date and sex, then separate on the comma:
library(stringr)
library(tidyr)
a <- data.frame(row_1 = c("12/1/09male", "1/9/20female", "13/1/19female", "4/12/12male"))
a[, "row_1"] <- str_replace(a$row_1, "(male|female)", ",\\1")
separate(a, row_1, into = c("date", "sex"), sep = ",")
Using tidyr::extract, we can capture data into two parts. First capture the date (in the format d/m/y) and second capture all the remaining part of the string.
tidyr::extract(df, V1, c("date", "sex"), "(\\d+/\\d+/\\d+)(.*)")
# date sex
#1 12/1/09 male
#2 1/9/20 female
#3 13/1/19 female
#4 4/12/12 male
data
df <- structure(list(V1 = structure(c(2L, 1L, 3L, 4L), .Label = c("1/9/20female",
"12/1/09male", "13/1/19female", "4/12/12male"), class = "factor")),
class = "data.frame", row.names = c(NA,-4L))
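For readers without tidyr, the same capture-group idea can be sketched in base R with sub(). This assumes every string starts with a d/m/y date; the names x, pattern and res are illustrative and not part of the original answer.

```r
# Illustrative input: the four strings from the question
x <- c("12/1/09male", "1/9/20female", "13/1/19female", "4/12/12male")

# Group 1 captures the date, group 2 captures whatever follows it
pattern <- "^(\\d+/\\d+/\\d+)(.*)$"

res <- data.frame(
  date = sub(pattern, "\\1", x),  # keep only the first capture group
  sex  = sub(pattern, "\\2", x),  # keep only the second capture group
  stringsAsFactors = FALSE
)
res
```

Strings that do not match the pattern are returned unchanged by sub(), so it may be worth checking the result if the input is less regular than the sample.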
Base R solution using gsub and some regex:
df_clean <- within(df, {
date <- as.Date(gsub("[A-Za-z]+", "", V1), format = "%d/%m/%y")
sex <- as.factor(gsub("\\d+|\\/", "", V1))
rm(V1)
}
)
Data:
df <- structure(list(V1 = structure(c(2L, 1L, 3L, 4L), .Label = c("1/9/20female",
"12/1/09male", "13/1/19female", "4/12/12male"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
I have just started my journey with R. I want to test values across multiple columns for the same condition and return 5 if any of the values is "hello" within a row:
result = ifelse((myData[1] == "hello") | (myData[2] == "hello") | (myData[3] == "hello"), 5, 0)
This works fine, but the code seems redundant. When I do:
resultSec = ifelse(myData[1:3] == "hello", 5, 0)
Then all 3 columns are checked against the condition, but the result is 3 columns rather than a single column. I would then have to perform an additional comparison across those columns, which takes more lines of code than the first, redundant method.
How can I get a single column of values in an efficient way?
You can use the function apply() to iterate over a data.frame or matrix, by either rows or columns. The MARGIN argument determines which: 1 for rows, 2 for columns.
Here we want to check the rows, so we use margin = 1:
dat <- data.frame(col1 = c("happy", "sad", "mad"),
col2 = c("tired", "sleepy", "happy"),
col3 = c("relaxed", "focused", "fine"))
dat$res <- apply(X = dat, MARGIN = 1,
FUN = function(x) ifelse("happy" %in% x, 5, 0))
dat
col1 col2 col3 res
1 happy tired relaxed 5
2 sad sleepy focused 0
3 mad happy fine 5
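We can use rowSums here. Testing the count against > 0 keeps the result at 5 even when the value appears more than once in a row:
df1$res <- (rowSums(df1 == "happy") > 0) * 5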
df1$res
#[1] 5 0 5
data
df1 <- structure(list(col1 = structure(c(1L, 3L, 2L), .Label = c("happy",
"mad", "sad"), class = "factor"), col2 = structure(c(3L, 2L,
1L), .Label = c("happy", "sleepy", "tired"), class = "factor"),
col3 = structure(c(3L, 2L, 1L), .Label = c("fine", "focused",
"relaxed"), class = "factor")), .Names = c("col1", "col2",
"col3"), row.names = c(NA, -3L), class = "data.frame")
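Applied to the question's own "hello" case, the rowSums idea might look like the sketch below. The contents of myData are made up for illustration (the question does not show them); only the column range 1:3 and the 5/0 coding come from the question.

```r
# Hypothetical stand-in for the asker's myData
myData <- data.frame(
  a = c("hello", "x", "hello"),
  b = c("y", "z", "hello"),
  c = c("w", "v", "u"),
  stringsAsFactors = FALSE
)

# Count the matches per row, then map "at least one match" to 5, otherwise 0
result <- ifelse(rowSums(myData[1:3] == "hello") > 0, 5, 0)
result  # values: 5 0 5 (row 3 has two matches but still yields 5)
```

This keeps the check to one line regardless of how many columns are included in the range.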
I will put the dput of my list at the bottom so that the question is reproducible. The dput is of a, not x.
I have a big nested list called x that I'm trying to build a data frame from but cannot figure it out.
I have done the first part:
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
indx <- lengths(a)
zz <- as.data.frame(do.call(rbind,lapply(a, `length<-`, max(indx))))}
For this I used the following answer:
Converting nested list (unequal length) to data frame
This leaves me a data.frame with n columns for results where n is the max results for any i:
v1 v2 v3
1 NULL NULL NULL
2 * * *
3 NULL NULL NULL
Each * is another nested list in the format list(experience = list(duration = ...
Take, for example, the first * in row 2, column v1. I don't want the whole list; I only want:
a[[2]][[1]]$experience$start
or in terms of the original list x:
x$results[[2]]$experiences[[1]]$experience$start
I feel like I'm nearly there with some tweaks. I tried:
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
indx <- lengths(a)
for(y in 1:length(a[[i]])) aa <- rbind(aa,tryCatch(x$results[[i]]$experiences[[y]]$experience$start, error=function(e) print(NA)))
zz <- as.data.frame(do.call(rbind,lapply(aa, `length<-`, max(indx))))}
Resulting in:
v1 v2 v3
1 NA NA NA
2 NA NA NA
3 2014 NA NA
4 2012 NA NA
5 2006 NA NA
6 NA NA NA
7 NA NA NA
I tried cbind instead of rbind on the final line, but that put all the dates in the first row.
I also tried the following:
for(i in 1:3){a[[i]]<-lengths(x$results[[i]]$experiences)
indx <- lengths(a)
for(y in 1:length(indx)){tt[i] <- tryCatch(x$results[[i]]$experiences[[y]]$experience$start, error=function(e) print(""))}
zz <- as.data.frame(do.call(rbind,lapply(tt, `length<-`, max(indx))))}
This came close: it builds the right format but only returns the first result:
v1 v2 v3
1 NA NA NA
2 2014 NA NA
3 NA NA NA
The format I want is:
V1 V2 V3
1 NA NA NA
2 2014 2012 2006
3 NA NA NA
((Sample data now at bottom))
Newest attempt:
The following returns only the first start date from each a[[i]]; in the second loop I need to index the list aa[i][y] differently.
for(i in 1:3){a[[i]]<-x$results[[i]]$experiences
for(y in 1:length(a[[i]])){aa[i][y] = if(is.null(a[[i]][[y]]$experience$start)){"NULL"}else{a[[i]][[y]]$experience$start}}}
So for dput2 I'd like the form:
v1 v2 v3 v4 v5 v6 v7 v8
1 2015
2 2011 2007 null null null null null null
3 2016 2015 2015 2015 2013 2010
I don't mind whether the blanks are NULL or NA.
UPDATE
The answer below almost works. However, in my data the structure changes: the order of the names (roleName, duration, etc.) varies, which breaks the answer because cumsum is used to detect when a new list starts. If duration comes before start, the keys are 9 and 1, and the cumsum step labels them as two different lists.
I wrote the following:
my.list2 <- list(structure(
list(
experience = structure(
list(
start = "1",
end = "1",
roleName = "a",
summary = "a",
duration = "a",
current = "a",
org = structure(list(name = "a", url = "a"), .Names = c("name","url")),
location = structure(
list(
displayLocation = NULL,
lat = NULL,
lng = NULL
),
.Names = c("displayLocation",
"lat", "lng")
) ),.Names = c("start", "end", "roleName", "summary", "duration", "current", "org", "location")),
`_meta` = structure(
list(weight = 1L, `_sources` = list(structure(
list(`_origin` = "a"), .Names = "_origin"
))),.Names = c("weight", "_sources"))),.Names = c("experience", "_meta")))
Then:
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){a[[y]][[i]]$experience[names(my.list2[[1]]$experience)]}), error=function(e) print(list()))})
This changes the structure so that key2 will always be in the right order.
However, after this loop I found another issue.
Sometimes the experience list contains nothing but a roleName, for example. If that occurs twice in a row, the keys are repeated, and cumsum treats them as the same experience instead of separate ones.
This means I cannot create df3 because of duplicate row identifiers. Even if I could, by removing the troublesome rows, the names would no longer match: i in the solution below matches the names by sequence, and removing any rows changes the lengths.
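To make the repeated-key problem concrete, here is a minimal sketch of the cumsum grouping step on a made-up key vector. The vector key is hypothetical: it stands for two full records with keys 1, 2, 3, followed by two records that each hold only key 3.

```r
# Hypothetical keys: records (1,2,3), (1,2,3), then two records holding only key 3
key <- c(1, 2, 3, 1, 2, 3, 3, 3)

# Same expression as used for key2: a new group starts only when the key decreases
grp <- cumsum(key < c(0, key[-length(key)])) + 1
grp  # 1 1 1 2 2 2 2 2
```

The last two entries stay in group 2 because a repeated key is not a decrease, so the two roleName-only records are merged into the preceding one rather than forming groups 3 and 4.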
Here is my total code for more insight:
for(i in 1:x$count){a[[i]]<-x$results[[i]]$experiences}
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){a[[y]][[i]]$experience[names(my.list2[[1]]$experience)]}), error=function(e) print(list()))})
aaa <- unlist(aa)
dummydf <- data.frame(b=c("start", "end", "roleName", "summary",
"duration", "current", "org.name", "org.url"), key=1:8)
df <- data.frame(a=aaa, b=names(aaa))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) +1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x){
x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill=TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(aa), function(y) rep(y, sapply(aa, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
df4 <- data.frame(t(df3))
df4$dates <- as.Date(NA)
df4$dates <- as.Date(df4$start)
df4 <- data.frame(dates = df4$dates)
df4 <- t(df4)
df4 <- data.frame(df4)
names(df4) <- paste0(names(df4), "_", i)
df4[] <- lapply(df4[], as.character)
l1 <- lapply(split(stack(df4), sub('.*_', '', stack(df4)[,2])), '[', 1)
df5 <- t(do.call(cbindPad, l1))
df5 <- data.frame(df5)
cbindPad is taken from this question
New sample code including the issues:
dput3 =
list(list(), list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(end = "3000",
start = "2012", duration = "2",
roleName = "a", summary = "aaa",
org = structure(list(name = "None"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(duration = "3",
start = "2006", end = "3000",
roleName = "a", summary = "aaa", org = structure(list(name = " "), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("duration", "start", "end", "roleName",
"summary", "org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta")),
structure(list(
experience = structure(list(roleName = "a",
location = structure(list(displayLocation = NULL, lat = NULL, lng = NULL), .Names = c("displayLocation", "lat", "lng"))), .Names = c("roleName",
"location")), `_meta` = structure(list(
weight = 1L, `_sources` = list(structure(list(`_origin` = " "), .Names = "_origin"))), .Names = c("weight", "_sources"))), .Names = c("experience", "_meta"))
),
list(
structure(list(experience = structure(list(
duration = "1", start = "2014",
end = "3000", roleName = "a",
summary = "aaa",
org = structure(list(name = "a"), .Names = "name"),
location = structure(list(displayLocation = NULL, lat = NULL,
lng = NULL), .Names = c("displayLocation", "lat", "lng"
))), .Names = c("duration", "start", "end", "roleName", "summary",
"org", "location")), `_meta` = structure(list(weight = 1L, `_sources` = list(
structure(list(`_origin` = ""), .Names = "_origin"))), .Names = c("weight",
"_sources"))), .Names = c("experience", "_meta"))))
Maybe this can help
library(dplyr)
library(tidyr)
a <- unlist(a)
df <- data.frame(a=a, b=names(a)) %>% mutate(key=cumsum(b=="experience.duration")) %>%
split(.$key) %>% lapply(function(x) x %>% select(-key) %>% spread(b, a)) %>%
do.call(rbind, .) %>% t %>% data.frame
df$key <- rownames(df)
Then you can filter on the rows of interest.
The above would be equivalent to
rbind(unlist(a)[1:8], unlist(a)[9:16],unlist(a)[17:24]) %>% t
Update
Try this for dput2:
a <- unlist(dput2)
library(dplyr)
library(tidyr)
dummydf <- data.frame(b=c("experience.start", "experience.end", "experience.roleName", "experience.summary",
"experience.org", "experience.org.name", "experience.org.url",
"_meta.weight", "_meta._sources._origin", "experience.duration"), key=1:10)
df <- data.frame(a=a, b=names(a))
df2 <- left_join(df, dummydf)
df2$key2 <- as.factor(cumsum(df2$key < c(0, df2$key[-length(df2$key)])) +1)
df_split <- split(df2, df2$key2)
df3 <- lapply(df_split, function(x){
x %>% select(-c(key, key2)) %>% spread(b, a)
}) %>% data.table::rbindlist(fill=TRUE) %>% t
df3 <- data.frame(df3)
i <- sapply(seq_along(dput2), function(y) rep(y, sapply(dput2, function(x) length(x))[y])) %>% unlist
names(df3) <- paste0(names(df3), "_", i)
View(df3)
Managed to figure something out, using dput3 above:
a <- dput3
aa <- lapply(1:length(a), function(y){tryCatch(lapply(1:length(a[[y]]),
function(i){if(is.null(a[[y]][[i]]$experience$start)){"Null"}else{a[[y]][[i]]$experience$start}}),error=function(e) print(list()))})
for(i in 1:length(aa)){for(y in 1:length(aa[[i]])){tryCatch(for(z in length(aa[[i]][[y]]))
{test <- rbind(test, data.frame(key = i, key2= y))},error=function(e) print(0))}}
aaa <- unlist(aa)
df <- data.frame(a=aaa)
df2 <- cbind(df, test)
i <- sapply(seq_along(aa), function(y) rep(y, sapply(aa, function(x) length(x))[y])) %>% unlist
df5 <- data.frame(dates = df2$a)
df5 <- t(df5)
df5 <- data.frame(df5)
names(df5) <- paste0(names(df5), "_", i)
df5[] <- lapply(df5[], as.character)
l1 <- lapply(split(stack(df5), as.numeric(sub('.*_', '', stack(df5)[,2]))), '[', 1)
df6 <- t(do.call(cbindPad, l1))
df6 <- data.frame(df6)
I will try to expand it so that it works with more than one field (currently aa isolates only start).
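A shorter route to the same table, sketched under the assumption that each element of a is a list of records with an optional experience$start. The small list a below is a hypothetical stand-in for dput3, and starts, n and zz are illustrative names. It reuses the `length<-` padding trick from the question and does not rely on key order or on cumsum.

```r
# Hypothetical stand-in for dput3: one empty entry, one with a missing start, one single record
a <- list(
  list(),
  list(list(experience = list(start = "2014")),
       list(experience = list(start = "2012")),
       list(experience = list(roleName = "a"))),
  list(list(experience = list(start = "2014")))
)

# For each entry, pull out start from every record; a missing start becomes NA
starts <- lapply(a, function(exps) {
  vapply(exps, function(e) {
    s <- e$experience$start
    if (is.null(s)) NA_character_ else as.character(s)
  }, character(1))
})

# Pad every vector to the longest length, then stack them as rows
n <- max(lengths(starts))
zz <- as.data.frame(do.call(rbind, lapply(starts, `length<-`, n)),
                    stringsAsFactors = FALSE)
zz
```

Replacing start with another field name in the inner function would extract that field instead.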
I have this data.frame:
df <- structure(list(att_number = structure(1:3, .Label = c("0", "1",
"2"), class = "factor"), `1` = structure(c(2L, 3L, 1L), .Label = c("1026891",
"412419", "424869"), class = "factor"), `10` = structure(c(2L,
1L, 3L), .Label = c("235067", "546686", "92324"), class = "factor"),
`2` = structure(c(3L, 1L, 2L), .Label = c("12729", "7569",
"9149"), class = "factor")), .Names = c("att_number", "1",
"10", "2"), row.names = c(NA, -3L), class = "data.frame")
It looks like this, with numbers as the column names:
att_number 1 10 2
0 412419 546686 9149
1 424869 235067 12729
2 1026891 92324 7569
Within a dplyr chain, I would like to order the columns in ascending order, like this:
att_number 1 2 10
0 412419 9149 546686
1 424869 12729 235067
2 1026891 7569 92324
I've tried using select_, but it doesn't want to work according to plan. Any idea on how I can do this? Here's my feeble attempt:
names_order <- names(df)[-1] %>%
as.numeric %>%
.[order(.)] %>%
as.character %>%
c('att_number', .)
df %>%
select_(.dots = names_order)
Error: Position must be between 0 and n
Update:
For newer versions of dplyr (>= 0.7.0):
library(tidyverse)
sort_names <- function(data) {
name <- names(data)
chars <- keep(name, grepl, pattern = "[^0-9]") %>% sort()
nums <- discard(name, grepl, pattern = "[^0-9]") %>%
as.numeric() %>%
sort() %>%
sprintf("%s", .)
select(data, !!!c(chars, nums))
}
sort_names(df)
Original:
You need backticks around the numeric column names to stop select_ from interpreting them as column positions:
library(tidyverse)
sort_names <- function(data) {
name <- names(data)
chars <- keep(name, grepl, pattern = "[^0-9]") %>% sort()
nums <- discard(name, grepl, pattern = "[^0-9]") %>%
as.numeric() %>%
sort() %>%
sprintf("`%s`", .)
select_(data, .dots = c(chars, nums))
}
sort_names(df)
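If the only non-numeric column is att_number, a more compact variant is possible. This is a sketch that assumes dplyr >= 1.0.0, where all_of() selects columns by name from a character vector; the numeric data frame below is a simplified stand-in for the question's factor columns, and num_cols, sorted and out are illustrative names.

```r
library(dplyr)

# Simplified stand-in for the question's data (numeric instead of factor columns)
df <- data.frame(
  att_number = 0:2,
  `1`  = c(412419, 424869, 1026891),
  `10` = c(546686, 235067, 92324),
  `2`  = c(9149, 12729, 7569),
  check.names = FALSE
)

# Order the numeric-looking names by their numeric value, not alphabetically
num_cols <- setdiff(names(df), "att_number")
sorted <- num_cols[order(as.numeric(num_cols))]

# all_of() treats the strings as column names, never as positions
out <- df %>% select(att_number, all_of(sorted))
names(out)  # "att_number" "1" "2" "10"
```

Because all_of() works on names, no backtick quoting is needed here.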