I'm trying to count how often each date in one data frame appears in another.
df1 <- data.frame(d = c('1991-01-09', '1991-01-11', '1991-02-17'))
df2 <- data.frame(d = c('1991-01-09', '1991-01-09', '1991-02-17'))
The expected result is:
Date Freq
1991-01-09 2
1991-01-11 0
1991-02-17 1
df1$count <- rowSums(outer(df1$d, df2$d, `==`))
df1
# d count
# 1 1991-01-09 2
# 2 1991-01-11 0
# 3 1991-02-17 1
Data
df1 <- structure(list(d = c("1991-01-09", "1991-01-11", "1991-02-17")), row.names = c(NA, -3L), class = "data.frame")
df2 <- structure(list(d = c("1991-01-09", "1991-01-09", "1991-02-17")), class = "data.frame", row.names = c(NA, -3L))
Using sapply():
stack(sapply(df1$d, function(x) sum(df2$d == x)))
# values ind
#1 2 1991-01-09
#2 0 1991-01-11
#3 1 1991-02-17
Or you could use purrr::map_dbl():
data.frame("Date" = df1[, 1],
"Freq" = purrr::map_dbl(df1[, 1], ~sum(.x == df2[, 1]))
)
Date Freq
1 1991-01-09 2
2 1991-01-11 0
3 1991-02-17 1
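The answers above compare each date in df1 against every date in df2 (outer() builds the full length(df1$d) x length(df2$d) comparison matrix explicitly). For larger data, a minimal base R sketch that tabulates df2 only once with table(), using df1's dates as the factor levels so that dates missing from df2 get a count of 0:

```r
# Same data as in the answers above
df1 <- data.frame(d = c("1991-01-09", "1991-01-11", "1991-02-17"), stringsAsFactors = FALSE)
df2 <- data.frame(d = c("1991-01-09", "1991-01-09", "1991-02-17"), stringsAsFactors = FALSE)

# Tabulate df2 once: dates of df1 that never occur in df2 become empty levels (count 0),
# and dates of df2 that are not in df1 become NA, which table() does not count
counts <- table(factor(df2$d, levels = unique(df1$d)))

# Look each date of df1 up by name (this also works if df1 contains duplicated dates)
df1$count <- as.integer(counts[df1$d])
df1$count
# 2 0 1
```

The design choice is to trade the quadratic comparison matrix for a single pass over df2 plus a name lookup.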
I have a problem similar to the one in the following question, but the solution presented there does not work for me:
tidyr spread does not aggregate data
I have a data frame with the following structure:
UndesiredIndex DesiredIndex DesiredRows Result
1 x1A x1 A 50,32
2 x1B x2 B 7,34
3 x2A x1 A 50,33
4 x2B x2 B 7,35
Using the code below:
dftest <- bd_teste %>%
select(-UndesiredIndex) %>%
spread(DesiredIndex, Result)
I expected the following result:
DesiredIndex A B
A 50,32 50,33
B 7,34 7,35
However, I keep getting the following result:
DesiredIndex x1 x2
1 A 50.32 NA
2 B 7.34 NA
3 A NA 50.33
4 B NA 7.35
PS: Even when I explicitly drop the column UndesiredIndex with select(-UndesiredIndex), I keep getting the following message:
Adding missing grouping variables: UndesiredIndex
It might be something simple to stack those rows, but I'm new to R and have been trying hard to solve this without success.
Thanks in advance!
We group by DesiredIndex, create a sequence column and then do the spread:
library(tidyverse)
df1 %>%
select(-UndesiredIndex) %>%
group_by(DesiredIndex) %>%
mutate(new = LETTERS[row_number()]) %>%
ungroup %>%
select(-DesiredIndex) %>%
spread(new, Result)
# A tibble: 2 x 3
# DesiredRows A B
# <chr> <chr> <chr>
#1 A 50,32 50,33
#2 B 7,34 7,35
Data
df1 <- structure(
list(
UndesiredIndex = c("x1A", "x1B", "x2A", "x2B"),
DesiredIndex = c("x1", "x2", "x1", "x2"),
DesiredRows = c("A", "B", "A", "B"),
Result = c("50,32", "7,34", "50,33", "7,35")
),
class = "data.frame",
row.names = c("1", "2", "3", "4")
)
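Since tidyr 1.0.0, spread() is superseded by pivot_wider(). A sketch of the same steps with pivot_wider(), assuming tidyr 1.0.0 or later and the same df1 as in the Data above:

```r
library(dplyr)
library(tidyr)

# Same data as above
df1 <- data.frame(
  UndesiredIndex = c("x1A", "x1B", "x2A", "x2B"),
  DesiredIndex = c("x1", "x2", "x1", "x2"),
  DesiredRows = c("A", "B", "A", "B"),
  Result = c("50,32", "7,34", "50,33", "7,35"),
  stringsAsFactors = FALSE
)

wide <- df1 %>%
  select(-UndesiredIndex) %>%
  group_by(DesiredIndex) %>%
  mutate(new = LETTERS[row_number()]) %>%  # A, B, ... within each DesiredIndex
  ungroup() %>%
  select(-DesiredIndex) %>%
  # DesiredRows is the only remaining column, so it becomes the row identifier
  pivot_wider(names_from = new, values_from = Result)

wide
```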
Shorter, but theoretically more roundabout.
Data
(Thanks to @akrun!)
df1 <- structure(
list(
UndesiredIndex = c("x1A", "x1B", "x2A", "x2B"),
DesiredIndex = c("x1", "x2", "x1", "x2"),
DesiredRows = c("A", "B", "A", "B"),
Result = c("50,32", "7,34", "50,33", "7,35")
),
class = "data.frame",
row.names = c("1", "2", "3", "4")
)
This is a great technique for concatenating rows.
df1 %>%
group_by(DesiredRows) %>%
summarise(Result = paste(Result, collapse = "|")) %>% #<Concatenate rows
separate(Result, into = c("A", "B"), sep = "\\|") #<Separate by '|'
#> # A tibble: 2 x 3
#> DesiredRows A B
#> <chr> <chr> <chr>
#> 1 A 50,32 50,33
#> 2 B 7,34 7,35
Created on 2018-08-06 by the reprex package (v0.2.0).
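About the "Adding missing grouping variables: UndesiredIndex" message in the question: dplyr prints it when select() is called on a grouped data frame and the selection leaves out a grouping column, which it then adds back. The question does not show how bd_teste was created, so the group_by(UndesiredIndex) below is an assumption that reproduces the message; calling ungroup() before select() lets the column be dropped:

```r
library(dplyr)

# Hypothetical reconstruction: assume bd_teste was grouped by UndesiredIndex earlier on
bd_teste <- data.frame(
  UndesiredIndex = c("x1A", "x1B", "x2A", "x2B"),
  DesiredIndex = c("x1", "x2", "x1", "x2"),
  DesiredRows = c("A", "B", "A", "B"),
  Result = c("50,32", "7,34", "50,33", "7,35"),
  stringsAsFactors = FALSE
) %>%
  group_by(UndesiredIndex)

# On a grouped data frame, select() keeps the grouping column (and prints the message)
still_there <- bd_teste %>% select(-UndesiredIndex)

# Removing the grouping first lets select() drop the column
dropped <- bd_teste %>% ungroup() %>% select(-UndesiredIndex)
names(dropped)
```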
I have some poorly formatted data that I must work with. It contains two identifiers in the first two rows, followed by the data. The data looks like:
V1 V2 V3
1 Date 12/16/18 12/17/18
2 Equip a b
3 x1 1 2
4 x2 3 4
5 x3 5 6
I want to gather the data to make it tidy, but gather() only works with a single row of column names, and here the identifiers span two rows. I've also looked at spread(). The only solutions I've come up with are very hacky and don't feel right. Is there an elegant way to deal with this?
Here's what I want:
Date Equip metric value
1 12/16/18 a x1 1
2 12/16/18 a x2 3
3 12/16/18 a x3 5
4 12/17/18 b x1 2
5 12/17/18 b x2 4
6 12/17/18 b x3 6
This approach gets me close, but I don't know how to deal with the poor formatting (no header, no row names). It would be easy to gather if the data were properly formatted.
> as.data.frame(t(df))
V1 V2 V3 V4 V5
V1 Date Equip x1 x2 x3
V2 12/16/18 a 1 3 5
V3 12/17/18 b 2 4 6
And here's the dput() output:
df <- structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c("12/16/18",
"a", "1", "3", "5"), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c("V1",
"V2", "V3"), row.names = c(NA, -5L))
Thanks for posting a nicely reproducible question. Here's some gentle tidyr/dplyr massaging.
library(tidyr)
df %>%
gather(key = measure, value = value, -V1) %>%
spread(key = V1, value = value) %>%
dplyr::select(-measure) %>%
gather(key = metric, value = value, x1:x3) %>%
dplyr::arrange(Date, Equip, metric)
#> Date Equip metric value
#> 1 12/16/18 a x1 1
#> 2 12/16/18 a x2 3
#> 3 12/16/18 a x3 5
#> 4 12/17/18 b x1 2
#> 5 12/17/18 b x2 4
#> 6 12/17/18 b x3 6
Update for tidyr v1.0.0:
The pivot functions give a slightly cleaner syntax.
df %>%
pivot_longer(cols = -V1) %>%
pivot_wider(names_from = V1) %>%
pivot_longer(cols = matches("x\\d"), names_to = "metric") %>%
dplyr::select(-name)
You can use the reshape package:
library(reshape)
row.names(df) = df$V1
df$V1 = NULL
df = melt(data.frame(t(df)),id.var = c('Date','Equip'))
df[order(df$Date),]
Date Equip variable value
1 12/16/18 a x1 1
3 12/16/18 a x2 3
5 12/16/18 a x3 5
2 12/17/18 b x1 2
4 12/17/18 b x2 4
6 12/17/18 b x3 6
Here's another way, starting from your t() approach. We can take the column names from the first row and then drop that row, which leaves a single gather() that may be more intuitive.
library(tidyverse)
df <- structure(list(V1 = c("Date", "Equip", "x1", "x2", "x3"), V2 = c(
"12/16/18",
"a", "1", "3", "5"
), V3 = c("12/17/18", "b", "2", "4", "6")), class = "data.frame", .Names = c(
"V1",
"V2", "V3"
), row.names = c(NA, -5L))
df %>%
t() %>%
`colnames<-`(.[1, ]) %>%
`[`(-1, ) %>%
as_tibble() %>%
gather("metric", "value", x1:x3) %>%
arrange(Date, Equip, metric)
#> # A tibble: 6 x 4
#> Date Equip metric value
#> <chr> <chr> <chr> <chr>
#> 1 12/16/18 a x1 1
#> 2 12/16/18 a x2 3
#> 3 12/16/18 a x3 5
#> 4 12/17/18 b x1 2
#> 5 12/17/18 b x2 4
#> 6 12/17/18 b x3 6
Created on 2018-04-20 by the reprex package (v0.2.0).
I have a list of data frames (tibbles). I'd like to first rename some columns and then row-bind the objects, keeping only some of the columns.
In the example below, I'd like to rename column A to a in the second object and w to x in the third, then row-bind all three objects, keeping only columns a and x.
Data:
df <- list(structure(list(a = 1:3,
x = c(-1.99, -1.11, -0.34),
y = c("C", "B", "A")), .Names = c("a", "x", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)), structure(list(a = 1:3, x = c(-0.44, -1.07, -0.23)), .Names = c("A", "x"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)),
structure(list(a = 1:3, x = c(-0.62, -0.60, -0.06),
y = c(3L, 2L, 1L)), .Names = c("a", "w", "y"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L)))
List structure:
> lapply(df, names)
[[1]]
[1] "a" "x" "y"
[[2]]
[1] "A" "x"
[[3]]
[1] "a" "w" "y"
Then I row-bind them with:
library(plyr)
df2 <- ldply(df, data.frame)
Using purrr (map_df), dplyr (rename, select, %>%) and magrittr (%<>%, %>%):
library(purrr)
library(dplyr)
library(magrittr)
df[[2]] %<>% rename(.,a = A)
df[[3]] %<>% rename(.,x = w)
df %>% map_df(. %>% select("a","x"))
# # A tibble: 9 x 2
# a x
# <int> <dbl>
# 1 1 -1.99
# 2 2 -1.11
# 3 3 -0.34
# 4 1 -0.44
# 5 2 -1.07
# 6 3 -0.23
# 7 1 -0.62
# 8 2 -0.60
# 9 3 -0.06
Or in base R:
names(df[[2]])[names(df[[2]]) == "A"] <- "a"
names(df[[3]])[names(df[[3]]) == "w"] <- "x"
do.call(rbind,lapply(df,"[",c("a","x")))
You could achieve that with:
library(plyr)
df = lapply(df, function(x) {plyr::rename(x, c("A" = "a", "w" = "x"), warn_missing = FALSE)})
df2 <- ldply(lapply(df, function(x) {x[,c("a","x")]}), data.frame)
Output:
a x
1 1 -1.99
2 2 -1.11
3 3 -0.34
4 1 -0.44
5 2 -1.07
6 3 -0.23
7 1 -0.62
8 2 -0.60
9 3 -0.06
Hope this helps.
Another idea could be to create a named vector v with the replacement values, loop over your list, rename if there is a match and select the desired columns.
library(purrr)
library(dplyr)
v <- c("a" = "A", "x" = "w")
map_df(df, .f = ~ rename_if(
.x,
.predicate = names(.x) %in% v,
.funs = ~ stringi::stri_replace_all_fixed(.x, v, names(v), vectorize_all = FALSE)) %>%
select(all_of(names(v)))
)
Which gives:
## A tibble: 9 x 2
# a x
# <int> <dbl>
#1 1 -1.99
#2 2 -1.11
#3 3 -0.34
#4 1 -0.44
#5 2 -1.07
#6 3 -0.23
#7 1 -0.62
#8 2 -0.60
#9 3 -0.06
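With a recent version of dplyr, the same named-vector idea can be written without rename_if() and stringi: rename() accepts a lookup vector of the form c(new = "old") inside any_of(), which silently skips old names that a given data frame does not have. A sketch, assuming a recent dplyr and rebuilding the question's list with plain data frames (values taken from the dput above):

```r
library(dplyr)

# Same shape as the question's data
df <- list(
  data.frame(a = 1:3, x = c(-1.99, -1.11, -0.34), y = c("C", "B", "A")),
  data.frame(A = 1:3, x = c(-0.44, -1.07, -0.23)),
  data.frame(a = 1:3, w = c(-0.62, -0.60, -0.06), y = 3:1)
)

# Lookup vector: names are the new column names, values the old ones
lookup <- c(a = "A", x = "w")

out <- df %>%
  lapply(function(d) {
    d %>%
      rename(any_of(lookup)) %>%  # renames only the old names present in d
      select(a, x)                # keep just the two wanted columns
  }) %>%
  bind_rows()

out
```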
I have a list of some length (say 1000). Each element of the list is itself a list of length 2, and each element of that inner list is a data.table. The second element of each inner list might be an empty data.table.
I need to rbind() all the data.tables in the first position of the inner lists (and, likewise, those in the second position). I am currently doing the following:
library(data.table)
DT1 = data.table()
DT2 = data.table()
for (i in seq_along(myList)) {
DT1 = rbind(DT1, myList[[i]][[1]])
DT2 = rbind(DT2, myList[[i]][[2]])
}
This works, but it is too slow. Is there a way to avoid the for loop?
Thank you in advance!
data.table has a dedicated fast function for this: rbindlist().
See: http://www.inside-r.org/packages/cran/data.table/docs/rbindlist
Edited:
Here is an example:
library(data.table)
srcList=list(list(DT1=data.table(X=0),DT2=NULL),list(DT1=data.table(X=2),data.table(Y=3)))
# first have a list for all DT1s
DT1.list= lapply(srcList, FUN=function(el){el$DT1})
rbindlist(DT1.list)
X
1: 0
2: 2
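Applied to the question's structure, a sketch that extracts the first (and then the second) data.table of every inner list with lapply() and `[[`, and binds each set in a single rbindlist() call instead of growing DT1 and DT2 inside a loop. The myList below is a small hypothetical stand-in for the real list; the sketch relies on rbindlist() skipping NULL and empty items:

```r
library(data.table)

# Hypothetical stand-in for the question's myList: each element holds two data.tables,
# and the second one is sometimes empty
myList <- list(
  list(data.table(x = 1:2), data.table(y = 10)),
  list(data.table(x = 3L),  data.table()),
  list(data.table(x = 4:5), data.table(y = 20))
)

# Pull out the first (or second) data.table of every inner list, then bind them in one call
DT1 <- rbindlist(lapply(myList, `[[`, 1))
DT2 <- rbindlist(lapply(myList, `[[`, 2))  # the empty data.table is skipped
```

Binding once avoids the repeated copying that makes rbind() inside a loop slow for long lists.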
Do this:
do.call("rbind", lapply(df.list, "[[", 1)) # for first list element
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# 4 4 40
# 5 5 50
# 6 6 60
do.call("rbind", lapply(df.list, "[[", 2)) # for second list element
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# 4 4 70
# 5 5 80
# 6 6 90
DATA
df.list=list(list(structure(list(x = 1:3, y = c(10, 20, 30)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame"), structure(list(
x = 1:3, y = c(30, 40, 50)), .Names = c("x", "y"), row.names = c(NA,
-3L), class = "data.frame")), list(structure(list(x = 4:6, y = c(40,
50, 60)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame"),
structure(list(x = 4:6, y = c(70, 80, 90)), .Names = c("x",
"y"), row.names = c(NA, -3L), class = "data.frame")))
# df.list
# [[1]]
# [[1]][[1]]
# x y
# 1 1 10
# 2 2 20
# 3 3 30
# [[1]][[2]]
# x y
# 1 1 30
# 2 2 40
# 3 3 50
# [[2]]
# [[2]][[1]]
# x y
# 1 4 40
# 2 5 50
# 3 6 60
# [[2]][[2]]
# x y
# 1 4 70
# 2 5 80
# 3 6 90