How to do calculations with different units in R [duplicate]

I have a data frame df whose results are in different units. I would like to convert them to a common unit using the conversion factors in cov. When df$Test == cov$Type and df$Unit == cov$Raw, the conversion is applied. If df$Unit cannot be found in cov, the value is kept as is and flagged with a new variable Check = "Y".
What is the best way to do this conversion? I saw someone build an empty data frame and then read in each record one by one, doing the calculation along the way. Is that a good approach, or does it just count as the more careful one? I would like to know how you would handle such a task, with as many approaches as possible. Many thanks.
df<-structure(list(Test = c("Length", "Weight", "Weight", "Weight",
"Weight", "Length", "Length", "Length", "Length", "Length", "Length",
"Length", "Length", "Length"), Result = c(4.5, 36, 147, 55, 175,
2, 125, 222, 1.6, 3, 56, 512, 28, 78), Unit = c("m", "lb", "g",
"kg", "oz", "cm", "in", "mm", "ft", "m", "in", "cm", "cm", NA
)), row.names = c(NA, -14L), class = c("tbl_df", "tbl", "data.frame"))
cov<- structure(list(Type = c("Length", "Length", "Length", "Length",
"Length", "Weight", "Weight", "Weight", "Weight"), Raw = c("m",
"cm", "mm", "in", "ft", "lb", "g", "kg", "oz"), Standard = c("cm",
"cm", "cm", "cm", "cm", "g", "g", "g", "g"), Factor = c(100,
1, 0.1, 2.54, 30.48, 453, 1, 1000, 28)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))

A sensible first step is to merge the two data frames so that each row of df gets the matching cov$Factor, as in:
merge(df, cov, by.x = "Unit", by.y = "Raw", all.x = TRUE, all.y = FALSE)
which gives:
Unit Test Result Type Standard Factor
1 cm Length 512.0 Length cm 1.00
2 cm Length 28.0 Length cm 1.00
3 cm Length 2.0 Length cm 1.00
4 ft Length 1.6 Length cm 30.48
5 g Weight 147.0 Weight g 1.00
6 in Length 125.0 Length cm 2.54
7 in Length 56.0 Length cm 2.54
8 kg Weight 55.0 Weight g 1000.00
9 lb Weight 36.0 Weight g 453.00
10 m Length 4.5 Length cm 100.00
11 m Length 3.0 Length cm 100.00
12 mm Length 222.0 Length cm 0.10
13 oz Weight 175.0 Weight g 28.00
14 <NA> Length 78.0 <NA> <NA> NA
It is then easy to multiply Result by Factor to get results in unified units, and to run an ifelse to add a Check variable, should that still be necessary once every row has been checked.
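The multiply-and-flag step can be sketched in base R as follows. This is only an illustration on a cut-down copy of the question's data: the column name Converted and the "N" value for unflagged rows are assumptions, and the merge here also matches on Test/Type, which the question asked for but the merge call above leaves out.

```r
# Cut-down versions of the question's df and cov (illustrative values only)
df <- data.frame(
  Test   = c("Length", "Weight", "Length", "Length"),
  Result = c(4.5, 36, 2, 78),
  Unit   = c("m", "lb", "cm", NA),
  stringsAsFactors = FALSE
)
cov <- data.frame(
  Type     = c("Length", "Length", "Weight"),
  Raw      = c("m", "cm", "lb"),
  Standard = c("cm", "cm", "g"),
  Factor   = c(100, 1, 453),
  stringsAsFactors = FALSE
)

# Left join on both keys so a unit is only converted within its own Test type
merged <- merge(df, cov,
                by.x = c("Test", "Unit"), by.y = c("Type", "Raw"),
                all.x = TRUE)

# Rows without a conversion factor are flagged and keep their original value
merged$Check     <- ifelse(is.na(merged$Factor), "Y", "N")
merged$Converted <- ifelse(is.na(merged$Factor),
                           merged$Result,
                           merged$Result * merged$Factor)
merged
```

Because the join and the multiplication are vectorized, there is no need to build an empty data frame and fill it record by record.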

Using dplyr you can do -
library(dplyr)
left_join(df, cov, by = c('Test' = 'Type', 'Unit' = 'Raw')) %>%
mutate(final = Result * Factor,
Check = ifelse(is.na(final), 'Y', 'F'))
# Test Result Unit Standard Factor final Check
# <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr>
# 1 Length 4.5 m cm 100 450 F
# 2 Weight 36 lb g 453 16308 F
# 3 Weight 147 g g 1 147 F
# 4 Weight 55 kg g 1000 55000 F
# 5 Weight 175 oz g 28 4900 F
# 6 Length 2 cm cm 1 2 F
# 7 Length 125 in cm 2.54 318. F
# 8 Length 222 mm cm 0.1 22.2 F
# 9 Length 1.6 ft cm 30.5 48.8 F
#10 Length 3 m cm 100 300 F
#11 Length 56 in cm 2.54 142. F
#12 Length 512 cm cm 1 512 F
#13 Length 28 cm cm 1 28 F
#14 Length 78 NA NA NA NA Y


replacing rowwise() operations in grouped data

Anonymised example subset of a much larger dataset (now edited to show an option with multiple competing types):
structure(list(`Sample File` = c("A", "A", "A", "A", "A", "A",
"A", "A", "A", "B", "B", "B", "B", "B", "C", "C", "C", "C"),
Marker = c("X", "X", "X", "X", "Y", "Y", "Y", "Y", "Y", "Z",
"Z", "Z", "Z", "Z", "q", "q", "q", "q"), Allele = c(19, 20,
22, 23, 18, 18.2, 19, 19.2, 20, 12, 13, 14, 15, 16, 10, 10.2,
11, 12), Size = c(249.15, 253.13, 260.64, 264.68, 366, 367.81,
369.97, 372.02, 373.95, 91.65, 95.86, 100, 104.24, 108.38,
177.51, 179.4, 181.42, 185.49), Height = c(173L, 1976L, 145L,
1078L, 137L, 62L, 1381L, 45L, 1005L, 38L, 482L, 5766L, 4893L,
19L, 287L, 36L, 5001L, 50L), Type = c("minusone", "allele",
"minusone", "allele", "ambiguous", "minushalf", "allele",
"minushalf", "allele", "minustwo", "ambiguous", "allele",
"allele", "plusone", "minusone", "minushalf", "allele", "plusone"
), LUS = c(11.75, 11.286, 13.375, 13.5, 18, 9, 19, 10, 20,
12, 11, 14, 15, 16, 9.5, NA, 11, 11.5)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -18L), groups = structure(list(
`Sample File` = c("A", "A", "B", "C"), Marker = c("X", "Y",
"Z", "q"), .rows = structure(list(1:4, 5:9, 10:14, 15:18), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))
I want to look up values based on the classification in $Type.
"minustwo" means I want to look up the "Allele", "Height" and "LUS" values for the row whose "Allele" equals the current row's plus two, with the same Sample File and Marker.
"minusone" means the same, but for "Allele" equal to the current row's plus one.
"minushalf" means the same, but for "Allele" equal to the current row's plus 0.2. The decimal steps here are a quarter each, so the sequence runs 12.1, 12.2, 12.3, 13, 13.1, etc. I have a helper function plusTwoBP() for this.
"plusone" means the same, but for "Allele" equal to the current row's minus one.
"allele" and "ambiguous" rows don't need anything done.
Ideal output:
# A tibble: 18 × 10
# Rowwise: Sample File, Marker
`Sample File` Marker Allele Size Height Type LUS ParentHeight ParentAllele ParentLUS
<chr> <chr> <dbl> <dbl> <int> <chr> <dbl> <int> <dbl> <dbl>
1 A X 19 249. 173 minusone 11.8 1976 20 11.3
2 A X 20 253. 1976 allele 11.3 NA NA NA
3 A X 22 261. 145 minusone 13.4 1078 23 13.5
4 A X 23 265. 1078 allele 13.5 NA NA NA
5 A Y 18 366 137 ambiguous 18 NA NA NA
6 A Y 18.2 368. 62 minushalf 9 1381 19 19
7 A Y 19 370. 1381 allele 19 NA NA NA
8 A Y 19.2 372. 45 minushalf 10 1005 20 20
9 A Y 20 374. 1005 allele 20 NA NA NA
10 B Z 12 91.6 38 minustwo 12 5766 14 14
11 B Z 13 95.9 482 ambiguous 11 NA NA NA
12 B Z 14 100 5766 allele 14 NA NA NA
13 B Z 15 104. 4893 allele 15 NA NA NA
14 B Z 16 108. 19 plusone 16 4893 15 15
15 C q 10 178. 287 minusone 9.5 5001 11 11
16 C q 10.2 179. 36 minushalf NA 5001 11 11
17 C q 11 181. 5001 allele 11 NA NA NA
18 C q 12 185. 50 plusone 11.5 5001 11 11
I have a rather belaboured way of doing it:
# eg for minustwo
sampleData %>%
filter(Type == "minustwo") %>%
rowwise() %>%
mutate(ParentHeight = sampleData$Height[sampleData$`Sample File` == `Sample File` & sampleData$Marker == Marker & sampleData$Allele == (Allele + 2)],
ParentAllele = sampleData$Allele[sampleData$`Sample File` == `Sample File` & sampleData$Marker == Marker & sampleData$Allele == (Allele + 2)],
ParentLUS = sampleData$LUS[sampleData$`Sample File` == `Sample File` & sampleData$Marker == Marker & sampleData$Allele == (Allele + 2)]) %>%
right_join(sampleData)
I then have to redo that for each of my Types.
My real dataset has thousands of rows, so this ends up a little slow but manageable. More to the point, I want to learn a better way to do it. In particular, the sampleData$`Sample File` == `Sample File` & sampleData$Marker == Marker condition seems like it should be doable with grouping, so I must be missing a trick there.
I have tried using group_map() but I've clearly not understood it correctly:
sampleData$ParentHeight <- sampleData %>%
group_by(`Sample File`, `Marker`) %>%
group_map(.f = \(.x, .y) {
pmap_dbl(.l = .x, .f = \(Allele, Height, Type, ...){
if(Type == "allele" | Type == "ambiguous") { return(0)
} else if (Type == "plusone") {
return(.x$Height[.x$Allele == round(Allele - 1, 1)])
} else if (Type == "minushalf") {
return(.x$Height[.x$Allele == round(plustwoBP(Allele), 1)])
} else if (Type == "minusone") {
return(.x$Height[.x$Allele == round(Allele + 1, 1)])
} else if (Type == "minustwo") {
return(.x$Height[.x$Allele == round(Allele + 2, 1)])
} else { stop("unexpected peak type") }
})}) %>% unlist()
Initially this seems to work, but on investigation it does not respect both layers of grouping, so it brings in matches from the wrong Marker. Additionally, here I'm assigning the output to a new column in the data frame, but if I instead wrap a mutate() around this so that I can create all three new columns in one go, the group_map() no longer works at all.
I also considered using complete() to hugely extend the data frame with all possible values of Allele (including the x.0, x.1, x.2, x.3 variants), then using lag() to select the corresponding rows, then dropping the spare rows. This seems like it would make the data frame enormous in the interim.
To summarise
This works, but it feels ugly and like I'm missing a more elegant and obvious solution. How would you approach this?
You can create two versions of Allele: one identical to the original Allele, and one adjusted according to Type (minusone, minustwo, etc.).
Then do a self left join on that adjusted version of Allele (together with Sample File and Marker):
sampleData = sampleData %>% group_by(`Sample File`,Marker) %>% mutate(id = Allele) %>% ungroup()
left_join(
sampleData %>%
mutate(id = case_when(
Type=="minusone"~id+1,
Type=="minustwo"~id+2,
Type=="plusone"~id-1,
Type=="minushalf"~ceiling(id))),
sampleData %>% select(-c(Size,Type)),
by=c("Sample File", "Marker", "id"),
suffix = c("", ".parent")
) %>% select(-id)
Output:
# A tibble: 18 × 10
`Sample File` Marker Allele Size Height Type LUS Allele.parent Height.parent LUS.parent
<chr> <chr> <dbl> <dbl> <int> <chr> <dbl> <dbl> <int> <dbl>
1 A X 19 249. 173 minusone 11.8 20 1976 11.3
2 A X 20 253. 1976 allele 11.3 NA NA NA
3 A X 22 261. 145 minusone 13.4 23 1078 13.5
4 A X 23 265. 1078 allele 13.5 NA NA NA
5 A Y 18 366 137 ambiguous 18 NA NA NA
6 A Y 18.2 368. 62 minushalf 9 19 1381 19
7 A Y 19 370. 1381 allele 19 NA NA NA
8 A Y 19.2 372. 45 minushalf 10 20 1005 20
9 A Y 20 374. 1005 allele 20 NA NA NA
10 B Z 12 91.6 38 minustwo 12 14 5766 14
11 B Z 13 95.9 482 ambiguous 11 NA NA NA
12 B Z 14 100 5766 allele 14 NA NA NA
13 B Z 15 104. 4893 allele 15 NA NA NA
14 B Z 16 108. 19 plusone 16 15 4893 15
15 C q 10 178. 287 minusone 9.5 11 5001 11
16 C q 10.2 179. 36 minushalf NA 11 5001 11
17 C q 11 181. 5001 allele 11 NA NA NA
18 C q 12 185. 50 plusone 11.5 11 5001 11
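Note that ceiling(id) gives the right parent only for x.2 alleles, which is all the sample contains. The question's plusTwoBP() helper is not shown, so the function below is a hypothetical stand-in, assuming alleles are written as whole.frac with frac running 0 to 3 in quarter steps. It could replace ceiling(id) in the case_when() above.

```r
# Hypothetical stand-in for the question's plusTwoBP() helper (not shown there).
# Assumes the decimal digit counts base pairs: x.0, x.1, x.2, x.3, then (x+1).0.
plus_two_bp <- function(allele) {
  whole <- floor(allele)
  frac  <- round((allele - whole) * 10)  # 0, 1, 2 or 3 extra base pairs
  total <- frac + 2                      # move two base pairs up
  # Carry into the whole number every four base pairs; round so the result
  # compares equal to alleles typed with one decimal place
  round(whole + total %/% 4 + (total %% 4) / 10, 1)
}

plus_two_bp(c(18, 18.1, 18.2, 18.3))
# expected: 18.2, 18.3, 19, 19.1
```

Rounding to one decimal matters because the join compares doubles exactly.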

How to find min and max in dplyr?

I know the sum of points for each person.
I need to know the minimum and the maximum number of points that a person could have.
What I have tried:
min_and_max <- dataset %>%
group_by(person) %>%
dplyr::filter(min(sum(points, na.rm = T))) %>%
distinct(person) %>%
pull()
min_and_max
My dataset:
id person points
201 rt99 NA
201 rt99 3
201 rt99 2
202 kt 4
202 kt NA
202 kt NA
203 rr 4
203 rr NA
203 rr NA
204 jk 2
204 jk 2
204 jk NA
322 knm3 5
322 knm3 NA
322 knm3 3
343 kll2 2
343 kll2 1
343 kll2 5
344 kll NA
344 kll 7
344 kll 1
I would suggest this dplyr approach. You have to summarize data like this:
library(tidyverse)
#Code
df %>% group_by(id,person) %>%
summarise(Total=sum(points,na.rm = T),
min=min(points,na.rm = T),
max=max(points,na.rm=T))
Output:
# A tibble: 7 x 5
# Groups: id [7]
id person Total min max
<int> <chr> <int> <int> <int>
1 201 rt99 5 2 3
2 202 kt 4 4 4
3 203 rr 4 4 4
4 204 jk 4 2 2
5 322 knm3 8 3 5
6 343 kll2 8 1 5
7 344 kll 8 1 7
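Here is a data.table solution. The dataset must first be converted to a data.table, for example with setDT():
library(data.table)
setDT(dataset) # convert the data frame to a data.table by reference
dataset[, min_points := min(points, na.rm = TRUE), by = person]
dataset[, max_points := max(points, na.rm = TRUE), by = person]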
Since I don't have your data, I cannot test this code, but it should work fine.
The summarize() verb is what you want for this. You don't even need to filter out the NA values first since both min() and max() can have na.rm = TRUE.
library(dplyr)
min_and_max <- dataset %>%
group_by(person) %>%
summarize(min = min(points, na.rm = TRUE),
max = max(points, na.rm = TRUE))
min_and_max
# A tibble: 7 x 3
person min max
<chr> <dbl> <dbl>
1 jk 2 2
2 kll 1 7
3 kll2 1 5
4 knm3 3 5
5 kt 4 4
6 rr 4 4
7 rt99 2 3
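One caveat: if a person has only NA points, min() and max() with na.rm = TRUE return Inf and -Inf with a warning. The sample data has no such person, so the sketch below uses made-up rows (including a hypothetical person "zz") and a hypothetical helper safe_range() that returns NA instead.

```r
# Made-up rows; "zz" has no non-missing points
scores <- data.frame(
  person = c("kt", "kt", "rr", "rr", "zz", "zz"),
  points = c(4, NA, 1, 7, NA, NA)
)

# Hypothetical helper: min and max of the non-missing values, NA if none exist
safe_range <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    c(min = NA_real_, max = NA_real_)
  } else {
    c(min = min(x), max = max(x))
  }
}

# One row per person, with columns min and max
ranges <- t(sapply(split(scores$points, scores$person), safe_range))
ranges
```

The same helper logic could be used inside summarize() if the dplyr pipeline is preferred.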
dput(dataset)
structure(list(id = c(201, 201, 201, 202, 202, 202, 203, 203,
203, 204, 204, 204, 322, 322, 322, 343, 343, 343, 344, 344, 344
), person = c("rt99", "rt99", "rt99", "kt", "kt", "kt", "rr",
"rr", "rr", "jk", "jk", "jk", "knm3", "knm3", "knm3", "kll2",
"kll2", "kll2", "kll", "kll", "kll"), points = c(NA, 3, 2, 4,
NA, NA, 4, NA, NA, 2, 2, NA, 5, NA, 3, 2, 1, 5, NA, 7, 1)), class = "data.frame", row.names = c(NA,
-21L), spec = structure(list(cols = list(id = structure(list(), class = c("collector_double",
"collector")), person = structure(list(), class = c("collector_character",
"collector")), points = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))

How to remove rows with 0 in numeric columns in R [duplicate]

I have the following dataset:
structure(list(Species = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Bream", "Parkki", "Perch", "Pike", "Roach", "Smelt", "Whitefish"),
class = "factor"),
WeightGRAM = c(242, 290, 340, 363, 430, 450), VertLengthCM = c(23.2,
24, 23.9, 26.3, 26.5, 26.8), DiagLengthCM = c(25.4, 26.3,
26.5, 29, 29, 29.7), CrossLengthCM = c(30, 31.2, 31.1, 33.5,
34, 34.7), HeightCM = c(11.52, 12.48, 12.3778, 12.73, 12.444,
13.6024), WidthCM = c(4.02, 4.3056, 4.6961, 4.4555, 5.134,
4.9274)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
I am trying to check for 0 or negative values in the numeric columns and remove the rows that contain them.
I have the following code:
fish_data <- fish_data[which(rowSums(fish_data) > 0), ]
But I get an error message:
Error in rowSums(fish_data) : 'x' must be numeric
My guess is that this message comes up because my Species column is a factor.
How can I skip the first column and ask R to check only the numeric columns for 0 or negative values?
Here is a way that keeps only the columns with no values less than or equal to zero.
keep <- sapply(fish_data, function(x) {
if(is.numeric(x)) all(x > 0) else TRUE
})
fish_data[keep]
## A tibble: 6 x 7
# Species WeightGRAM VertLengthCM DiagLengthCM CrossLengthCM HeightCM WidthCM
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Bream 242 23.2 25.4 30 11.5 4.02
#2 Bream 290 24 26.3 31.2 12.5 4.31
#3 Bream 340 23.9 26.5 31.1 12.4 4.70
#4 Bream 363 26.3 29 33.5 12.7 4.46
#5 Bream 430 26.5 29 34 12.4 5.13
#6 Bream 450 26.8 29.7 34.7 13.6 4.93
Using dplyr we can use select to select columns where all values are greater than 0 or are not numeric.
library(dplyr)
df %>% select(where(~(is.numeric(.) && all(. > 0)) || !is.numeric(.)))
# A tibble: 6 x 7
# Species WeightGRAM VertLengthCM DiagLengthCM CrossLengthCM HeightCM WidthCM
# <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Bream 242 23.2 25.4 30 11.5 4.02
#2 Bream 290 24 26.3 31.2 12.5 4.31
#3 Bream 340 23.9 26.5 31.1 12.4 4.70
#4 Bream 363 26.3 29 33.5 12.7 4.46
#5 Bream 430 26.5 29 34 12.4 5.13
#6 Bream 450 26.8 29.7 34.7 13.6 4.93
In older versions of dplyr (before 1.0.0, which introduced where()), we can use select_if:
df %>% select_if(~(is.numeric(.) && all(. > 0)) || !is.numeric(.))
You only need to specify the columns for the rowSums() function:
fish_data <- fish_data[which(rowSums(fish_data[,2:7]) > 0), ]
Note that rowSums() sums all values across each row; I'm not sure that is what you really want to achieve.
You can check the output of rowSums() with:
> rowSums(fish_data[,2:7])
[1] 336.1400 388.2856 438.5739 468.9855 537.0780 559.7298
Thanks all, I think I have figured it out.
I should be using:
fish_data[fish_data <= 0] <- NA # convert values less than or equal to 0 to NA
fish_data <- na.omit(fish_data) # delete rows with NA
But I get a warning message:
Warning message: In Ops.factor(left, right) : ‘<=’ not meaningful for factors
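The warning appears because the comparison is also applied to the factor column. A minimal base R sketch that restricts the check to the numeric columns is shown below; the three-row fish data frame and the names num_cols, bad_rows and fish_clean are made up for illustration.

```r
# Made-up rows: the second has a zero weight, the third a negative height
fish <- data.frame(
  Species    = factor(c("Bream", "Bream", "Bream")),
  WeightGRAM = c(242, 0, 340),
  HeightCM   = c(11.52, 12.48, -1)
)

# Logical vector marking which columns are numeric
num_cols <- vapply(fish, is.numeric, logical(1))

# A row is bad if any of its numeric values is <= 0
# (na.rm = TRUE means missing values are not counted as bad)
bad_rows <- rowSums(fish[num_cols] <= 0, na.rm = TRUE) > 0

fish_clean <- fish[!bad_rows, ]
fish_clean
```

Unlike summing each row, this tests every value individually, so a large value cannot mask a zero or negative value elsewhere in the same row.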
# Option 1: (Safer because will retain rows containing NAs)
# Subset data.frame to not contain any observations with 0 values:
# data.frame => stdout (console)
df[rowMeans(df != 0, na.rm = TRUE) == 1,]
# Option 2: (More dangerous because it will remove all rows containing
# NAs) subset data.frame to not contain any observations with 0 values:
# data.frame => stdout (console)
df[complete.cases(replace(df, df == 0, NA)),]
# Option 3 (Variant of Option 1):
# Subset data.frame to not contain any observations with 0 values:
# data.frame => stdout (console)
df[rowMeans(Vectorize(function(x){x != 0})(df[,sapply(df, is.numeric)]),
na.rm = TRUE) == 1,]
# Option 4: Using Higher-order functions:
# Subset data.frame to not contain any observations with 0 values:
# data.frame => stdout (console)
df[Reduce(function(y, z){intersect(y, z)},
Map(function(x){which(x > 0)}, df[,sapply(df, is.numeric)])), ]
# Option 5 tidyverse:
# Subset data.frame to not contain any observations with 0 values:
# data.frame => stdout (console)
library(dplyr)
df %>%
filter_if(is.numeric, all_vars(. > 0))
Data:
df <- structure(list(Species = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Bream", "Parkki", "Perch", "Pike", "Roach", "Smelt", "Whitefish"),
class = "factor"),
WeightGRAM = c(242, 290, 340, 363, 0, 450), VertLengthCM = c(23.2,
24, 23.9, 26.3, 26.5, 26.8), DiagLengthCM = c(25.4, 26.3,
26.5, 29, 29, 29.7), CrossLengthCM = c(30, 31.2, 31.1, 33.5,
34, 34.7), HeightCM = c(11.52, 0, 12.3778, 12.73, 12.444,
13.6024), WidthCM = c(4.02, 4.3056, 4.6961, 4.4555, 5.134,
4.9274)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Is there a way in R function to iterate over each row with a cell value of the row as an argument?

I would like to calculate the percentage of row elements using a value of the same row
My data frame looks like this
Group.1 a b c d e total
1 test1 470.0 5696.0 393.5 0.5 8.0 6568.0
2 test2 646.0 5376.0 279.0 0.5 9.5 6311.0
3 test3 855.0 5279.5 297.0 0.5 11.0 6443.0
4 test4 660.5 7472.0 201.0 11.5 481.5 8826.5
5 test5 87.0 3900.0 119.0 11.5 491.5 4609.0
Now I would like to calculate the percentage of a, b, c, d and e relative to total:
percentage <- t(apply(mydata[-c(1,7)], 1, FUN = function(x) x / ???)) # here I'm not sure how to access the right cell
Is this possible using apply, or is there a better way to achieve this?
If we want to do this in a vectorized way, we can divide the value columns by the total column:
percentage <- mydata[-c(1, 7)]/mydata$total
percentage
# a b c d e
#1 0.07155907 0.8672351 0.05991169 7.612667e-05 0.001218027
#2 0.10236096 0.8518460 0.04420852 7.922675e-05 0.001505308
#3 0.13270216 0.8194164 0.04609654 7.760360e-05 0.001707279
#4 0.07483147 0.8465417 0.02277233 1.302895e-03 0.054551634
#5 0.01887611 0.8461705 0.02581905 2.495118e-03 0.106639184
If we instead want the sum of the elements as a share of the total:
rowSums(mydata[-c(1, 7)])/mydata$total
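For comparison, here is a sketch of how the apply() attempt from the question could reach the total: keep the total column in the data passed to apply() and refer to it by name inside the function. The two-row data frame and the names value_cols, pct_apply and pct_vec are illustrative only.

```r
# First two rows of the question's data
mydata <- data.frame(
  Group.1 = c("test1", "test2"),
  a = c(470, 646), b = c(5696, 5376), c = c(393.5, 279),
  d = c(0.5, 0.5), e = c(8, 9.5),
  total = c(6568, 6311)
)
value_cols <- c("a", "b", "c", "d", "e")

# Row-wise: each row arrives as a named numeric vector, so the total is
# reachable by name inside the function
pct_apply <- t(apply(mydata[c(value_cols, "total")], 1,
                     function(row) row[value_cols] / row["total"]))

# Vectorized: every column is divided element-wise by the total column
pct_vec <- as.matrix(mydata[value_cols] / mydata$total)

# Both approaches give the same numbers
all.equal(pct_apply, pct_vec, check.attributes = FALSE)
```

The vectorized form is shorter and avoids the matrix conversion that apply() performs, so it is usually the better choice here.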
data
mydata <- structure(list(Group.1 = c("test1", "test2", "test3", "test4",
"test5"), a = c(470, 646, 855, 660.5, 87), b = c(5696, 5376,
5279.5, 7472, 3900), c = c(393.5, 279, 297, 201, 119), d = c(0.5,
0.5, 0.5, 11.5, 11.5), e = c(8, 9.5, 11, 481.5, 491.5), total = c(6568,
6311, 6443, 8826.5, 4609)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))

Split a nested list of a dataframe column into different columns

I have tried related solutions but they do not work for my case. I have a data frame that has a nested list in one column, and I want to split this list into separate columns. Each list contains further lists, each holding the timestamp for a month (ts) and the consumption for that month (v). The data frame is:
id monthly_consum
1 112 list1
2 34 list2
3 54 list3
where
list1<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 466.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
list(ts = "2016-03-01T00:00:00+01:00", v = 765.6),list(ts = "2016-04-01T00:00:00+01:00", v = 888.6),
list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016- 12-01T00:00:00+01:00", v = 555))
list2<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 333.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
list(ts = "2016-03-01T00:00:00+01:00", v = 765.6),list(ts = "2016-04-01T00:00:00+01:00", v = 333.6),
list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016-12-01T00:00:00+01:00", v = 555))
list3<-list(list(ts = "2016-01-01T00:00:00+01:00", v = 323.6),list(ts = "2016-02-01T00:00:00+01:00", v = 565.6),
list(ts = "2016-03-01T00:00:00+01:00", v = 333.6),list(ts = "2016-04-01T00:00:00+01:00", v = 888.6),
list(ts = "2016-05-01T00:00:00+01:00", v = 465),list(ts = "2016-06-01T00:00:00+01:00", v = 465.6),
list(ts = "2016-07-01T00:00:00+01:00", v = 786),list(ts = "2016-08-01T00:00:00+01:00", v = 435),
list(ts = "2016-09-01T00:00:00+01:00", v = 568),list(ts = "2016-10-01T00:00:00+01:00", v = 678),
list(ts = "2016-11-01T00:00:00+01:00", v = 522),list(ts = "2016-12-01T00:00:00+01:00", v = 555))
I would like to split the list and create a dataframe which will have one of the 2 following formats:
id ts.1 cons.1 ts.2 cons.2 ts.3 etc..
1 112 2016-01-01T00:00:00+01:00 466.6 2016-02.. ... ...
2 34 2016-01-01T00:00:00+01:00 333.6 2016-02.. ... ...
3 54 2016-01-01T00:00:00+01:00 323.6 2016-02.. ... ...
OR
id ts consumption
112 2016-01-01T00:00:00+01:00 466.6
112 2016-02-01T00:00:00+01:00 565.6
112 2016-03-01T00:00:00+01:00 765.6
112 2016-04-01T00:00:00+01:00 888.6
112 2016-05-01T00:00:00+01:00 465
112 2016-06-01T00:00:00+01:00 465.6
112 2016-07-01T00:00:00+01:00 786
112 2016-08-01T00:00:00+01:00 435
112 2016-09-01T00:00:00+01:00 568
112 2016-10-01T00:00:00+01:00 678
112 2016-11-01T00:00:00+01:00 522
112 2016-12-01T00:00:00+01:00 555
34 2016-01-01T00:00:00+01:00 466.6
34 2016-02-01T00:00:00+01:00 333.6
34 2016-03-01T00:00:00+01:00 323.6
etc............
Could you help me? I am using data.frame(matrix(unlist..)) but it does not give the format that I want. When I use rbindlist I get:
"Error in rbindlist(....) :
Item 1 of list input is not a data.frame, data.table or list"
Thank you in advance!
UPDATE
Using dput i would get (in the real problem):
>dput(locs_total[9:12,1:5])
structure(list(X.dep_id. = c("34", "34", "34", "34"), X.loc_id. = c("17761",
"17406", "23591", "27838"), X.surface. = c("200", "1250", "54",
"150"), X.sector. = c("HOUSING", "SMALL-STORE-FOOD", "LIBRARY",
"OFFICE-BUILDING"),
X.avg_cons_main. = list(list(structure(list(
ts = "2016-01-01T00:00:00+01:00", v = 466.65), .Names = c("ts",
"v")), structure(list(ts = "2016-02-01T00:00:00+01:00", v = 406.45),
.Names = c("ts",
"v")), structure(list(ts = "2016-03-01T00:00:00+01:00", v = 483.35),
.Names = c("ts",
"v")), structure(list(ts = "2016-04-01T00:00:00+02:00", v = 79.45),
.Names = c("ts",
"v"))), NULL, NULL, NULL)), .Names = c("X.dep_id.", "X.loc_id.",
"X.surface.", "X.sector.", "X.avg_cons_main."
), row.names = c("9", "10", "11", "12"), class = "data.frame")
If the ids are also in the lists, you can use dplyr::bind_rows
dplyr::bind_rows(list1, list2, list3)
# A tibble: 36 × 2
ts v
<chr> <dbl>
1 2016-01-01T00:00:00+01:00 466.6
2 2016-02-01T00:00:00+01:00 565.6
3 2016-03-01T00:00:00+01:00 765.6
4 2016-04-01T00:00:00+01:00 888.6
5 2016-05-01T00:00:00+01:00 465.0
6 2016-06-01T00:00:00+01:00 465.6
7 2016-07-01T00:00:00+01:00 786.0
8 2016-08-01T00:00:00+01:00 435.0
9 2016-09-01T00:00:00+01:00 568.0
10 2016-10-01T00:00:00+01:00 678.0
# ... with 26 more rows
To add IDs from another data frame:
library(dplyr)
ids <- tibble(list_id = c(112, 34, 54),
monthly_consum = c("list1", "list2", "list3"))
For nested lists, you can use purrr::map as follows:
- combine the three lists into one list
k <- list(list1, list2, list3)
- use map to apply bind_rows to each list independently
k1 <- purrr::map(k, bind_rows)
- use the ids as names for the lists
names(k1) <- ids$list_id
- bind_rows using .id
bind_rows(k1, .id = "id")
# A tibble: 36 × 3
id ts v
<chr> <chr> <dbl>
1 112 2016-01-01T00:00:00+01:00 466.6
2 112 2016-02-01T00:00:00+01:00 565.6
3 112 2016-03-01T00:00:00+01:00 765.6
4 112 2016-04-01T00:00:00+01:00 888.6
5 112 2016-05-01T00:00:00+01:00 465.0
6 112 2016-06-01T00:00:00+01:00 465.6
7 112 2016-07-01T00:00:00+01:00 786.0
8 112 2016-08-01T00:00:00+01:00 435.0
9 112 2016-09-01T00:00:00+01:00 568.0
10 112 2016-10-01T00:00:00+01:00 678.0
We can loop through the list
res <- do.call(rbind, Map(cbind, id = df1$id, lapply(mget(df1$monthly_consum),
function(x) do.call(rbind.data.frame, x))))
names(res)[3] <- "consumption"
row.names(res) <- NULL
head(res, 14)
# id ts consumption
#1 112 2016-01-01T00:00:00+01:00 466.6
#2 112 2016-02-01T00:00:00+01:00 565.6
#3 112 2016-03-01T00:00:00+01:00 765.6
#4 112 2016-04-01T00:00:00+01:00 888.6
#5 112 2016-05-01T00:00:00+01:00 465.0
#6 112 2016-06-01T00:00:00+01:00 465.6
#7 112 2016-07-01T00:00:00+01:00 786.0
#8 112 2016-08-01T00:00:00+01:00 435.0
#9 112 2016-09-01T00:00:00+01:00 568.0
#10 112 2016-10-01T00:00:00+01:00 678.0
#11 112 2016-11-01T00:00:00+01:00 522.0
#12 112 2016- 12-01T00:00:00+01:00 555.0
#13 34 2016-01-01T00:00:00+01:00 333.6
#14 34 2016-02-01T00:00:00+01:00 565.6
data
df1 <- structure(list(id = c(112L, 34L, 54L), monthly_consum = c("list1",
"list2", "list3")), .Names = c("id", "monthly_consum"),
class = "data.frame", row.names = c("1", "2", "3"))
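The dput output in the question's update shows that some rows hold NULL instead of a list. A base R sketch of the long format that tolerates such rows is given below; flatten_one(), the shortened lists and the id vector are hypothetical names and values chosen for illustration.

```r
# Shortened stand-ins for the question's lists; the third entry is NULL,
# as in the real data shown in the update
list_a <- list(list(ts = "2016-01-01T00:00:00+01:00", v = 466.6),
               list(ts = "2016-02-01T00:00:00+01:00", v = 565.6))
list_b <- list(list(ts = "2016-01-01T00:00:00+01:00", v = 333.6))
nested <- list(list_a, list_b, NULL)
ids    <- c(112L, 34L, 54L)

# Hypothetical helper: one data frame per id, with zero rows for a NULL entry
flatten_one <- function(id, entries) {
  data.frame(
    id          = rep(id, length(entries)),
    ts          = vapply(entries, function(e) e$ts, character(1)),
    consumption = vapply(entries, function(e) e$v, numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Stack the per-id data frames into one long table
long <- do.call(rbind, Map(flatten_one, ids, nested))
rownames(long) <- NULL
long
```

Ids whose list is NULL simply contribute no rows, so they drop out of the long table rather than causing an error.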
