r - merge rows in group while replacing NAs [duplicate]

This question already has answers here:
Collapsing rows where some are all NA, others are disjoint with some NAs
(5 answers)
Closed 4 years ago.
I was trying to find the answer to this but couldn't. If there is an answer I apologize and will immediately delete my question.
I'm trying to merge several rows into one within each group (the variable id can be used for grouping), so that no NA values are left.
# initial dataframe
df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))
# desired output
df_end <- data.frame(id = c("as", "bs"),
b = c("A", 6),
c = c(2, 7),
d = c(4, 8),
e = c(3,"B"),
f = c(5, 10))

No need to delete the question; it may be helpful to other users. This summarises each group to the first non-NA value in each column.
library(dplyr)
df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))
df_start %>%
group_by(id) %>%
summarise_all(list(~first(na.omit(.))))
Output:
# A tibble: 2 x 6
id b c d e f
<fct> <fct> <dbl> <dbl> <fct> <dbl>
1 as A 2. 4. 3 5.
2 bs 6 7. 8. B 10.
You will, of course, lose data if a column has more than one non-NA value within a group, because only the first one is kept.
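A quick way to check whether that caveat applies to your data is to count the non-NA values per group and column before summarising. This is only a minimal base R sketch on a made-up toy data frame; df, value_cols and counts are illustrative names, not part of the question's data:

```r
# Toy data (made up): column c has two non-NA values in group "as"
df <- data.frame(
  id = c("as", "as", "bs", "bs"),
  b  = c(NA, "A", "6", NA),
  c  = c(2, 3, NA, 7),
  stringsAsFactors = FALSE
)

# Count the non-NA values for every group (rows) and column (columns)
value_cols <- setdiff(names(df), "id")
counts <- sapply(value_cols, function(col) {
  tapply(!is.na(df[[col]]), df$id, sum)
})
counts

# Any cell greater than 1 marks a group/column pair where keeping only
# the first non-NA value would silently drop data
conflicts <- which(counts > 1, arr.ind = TRUE)
conflicts
```

If conflicts is empty, taking the first non-NA value per group is lossless for that data.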

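Hope this helps. Using dplyr:
library(dplyr)
# Convert every column to character and replace NAs with a single space
df_start <- sapply(df_start, as.character)
df_start[is.na(df_start)] <- " "
df_start <- as.data.frame(df_start)
# Paste each group's values together, then trim the leftover padding
df <- df_start %>%
  group_by(id) %>%
  summarise_all(list(~trimws(paste(., collapse = ''))))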

Related

reshaping multiple columns in R, based on name values

Df <- data.frame(prop1 = c(NA, NA, NA, "French", NA, NA,NA, "-29 to -20", NA, NA, NA, "Pop", NA, NA, NA, "French", "-29 to -20", "Pop"),
prop1_rank = c(NA, NA, NA, 0, NA, NA,NA, 11, NA, NA, NA, 1, NA, NA, NA, 40, 0, 2),
prop2 = c(NA, NA, NA, "Spanish", NA, NA,NA, "-19 to -10", NA, NA, NA, "Rock", NA, NA, NA, "Spanish", "-19 to -10", "Rock"),
prop2_rank = c(NA, NA, NA, 10, NA, NA,NA, 4, NA, NA, NA, 1, NA, NA, NA, 1, 0, 2),
initOSF1 = c(NA, NA, NA, NA, NA, "French", NA,NA,NA, "-29 to -20", NA, NA, NA, "Pop", NA, NA, NA, NA),
initOSF1_freq = c(NA, NA, NA, NA, NA, 66, NA,NA,NA, 0, NA, NA, NA, 14, NA, NA, NA, NA),
initOSF2 = c(NA, NA, NA, NA, NA, "Spanish", NA,NA,NA, "-19 to -10", NA, NA, NA, "Rock", NA, NA, NA, NA),
initOSF2_freq = c(NA, NA, NA, NA, NA, 0, NA,NA,NA, 6, NA, NA, NA, 14, NA, NA, NA, NA))
Df
I would like to organize this into 3 columns: c("propositions", "ranks", "freqs"),
where the propositions column has the values "French", "Spanish", "-29 to -20", "-19 to -10", "Pop", "Rock", with separate columns for the rank values (e.g., 0 for French, 10 for Spanish) and the frequency values (e.g., 66 for French, 0 for Spanish).

This is not an easy one. Probably a better solution exists:
library(tidyverse)
library(data.table)
setDT(Df) %>%
select(contains(c('prop', 'rank', 'freq'))) %>%
filter(!if_all(everything(), is.na)) %>%
melt(measure.vars = patterns(c('prop.$', 'rank$', 'freq'))) %>%
group_by(gr=cumsum(!is.na(value1)))%>%
summarise(across(-variable, ~if(length(.x)>1) na.omit(.x) else .x))
# A tibble: 12 x 4
gr value1 value2 value3
<int> <chr> <dbl> <dbl>
1 1 French 0 66
2 2 -29 to -20 11 0
3 3 Pop 1 14
4 4 French 40 NA
5 5 -29 to -20 0 NA
6 6 Pop 2 NA
7 7 Spanish 10 0
8 8 -19 to -10 4 6
9 9 Rock 1 14
10 10 Spanish 1 NA
11 11 -19 to -10 0 NA
12 12 Rock 2 NA

Collapsing Dataframe Rows along several variables

I have a dataframe that looks something like this, in which I have several rows for each user, and many NAs in the columns.
user   Effect T1  Effect T2  Effect T3  Benchmark T1  Benchmark T2  Benchmark T3
Tom    01         NA         NA         02            NA            NA
Tom    NA         07         NA         NA            08            NA
Tom    NA         NA         13         NA            NA            14
Larry  03         NA         NA         04            NA            NA
Larry  NA         09         NA         NA            10            NA
Larry  NA         NA         15         NA            NA            16
Dave   05         NA         NA         06            NA            NA
Dave   NA         11         NA         NA            12            NA
Dave   NA         NA         17         NA            NA            18
I want to collapse the rows by user name, filling in the values from each row, like this.
user   Effect T1  Effect T2  Effect T3  Benchmark T1  Benchmark T2  Benchmark T3
Tom    01         07         13         02            08            14
Larry  03         09         15         04            10            16
Dave   05         11         17         06            12            18
How might I accomplish this?
Thank you in advance for your help. Update: I've added the dput of a subset of the actual data below.
structure(list(name = c("Abraham_Ralph", "Abraham_Ralph", "Abraham_Ralph",
"Ackerman_Gary", "Adams_Alma", "Adams_Alma", "Adams_Alma", "Adams_Alma",
"Adams_Sandy", "Aderholt_Robert", "Aderholt_Robert", "Aderholt_Robert",
"Aderholt_Robert", "Aderholt_Robert", "Aguilar_Pete", "Aguilar_Pete",
"Aguilar_Pete"), state = c("LA", "LA", "LA", "NY", "NC", "NC",
"NC", "NC", "FL", "AL", "AL", "AL", "AL", "AL", "CA", "CA", "CA"
), seniority = c(1, 2, 3, 15, 1, 2, 3, 4, 1, 8, 9, 10, 11, 12,
1, 2, 3), legeffect_112 = c(NA, NA, NA, 0.202061712741852, NA,
NA, NA, NA, 1.30758035182953, 3.73544979095459, NA, NA, NA, NA,
NA, NA, NA), legeffect_113 = c(NA, NA, NA, NA, 0, NA, NA, NA,
NA, NA, 0.908495426177979, NA, NA, NA, NA, NA, NA), legeffect_114 = c(2.07501077651978,
NA, NA, NA, NA, 0.84164834022522, NA, NA, NA, NA, NA, 0.340001106262207,
NA, NA, 0.10985741019249, NA, NA), legeffect_115 = c(NA, 0.493490308523178,
NA, NA, NA, NA, 0.587624311447144, NA, NA, NA, NA, NA, 0.159877583384514,
NA, NA, 0.730929613113403, NA), legeffect_116 = c(NA, NA, 0.0397605448961258,
NA, NA, NA, NA, 1.78378939628601, NA, NA, NA, NA, NA, 0.0198802724480629,
NA, NA, 0.0497006773948669), benchmark_112 = c(NA, NA, NA, 0.738679468631744,
NA, NA, NA, NA, 0.82908970117569, 1.39835929870605, NA, NA, NA,
NA, NA, NA, NA), benchmark_113 = c(NA, NA, NA, NA, 0.391001850366592,
NA, NA, NA, NA, NA, 1.58223271369934, NA, NA, NA, NA, NA, NA),
benchmark_114 = c(1.40446054935455, NA, NA, NA, NA, 0.576326191425323,
NA, NA, NA, NA, NA, 1.42212760448456, NA, NA, 0.574363172054291,
NA, NA), benchmark_115 = c(NA, 1.3291300535202, NA, NA, NA,
NA, 0.537361204624176, NA, NA, NA, NA, NA, 1.45703768730164,
NA, NA, 0.523149251937866, NA), benchmark_116 = c(NA, NA,
0.483340591192245, NA, NA, NA, NA, 1.31058621406555, NA,
NA, NA, NA, NA, 0.751261711120605, NA, NA, 1.05683290958405
)), row.names = c(NA, -17L), class = c("tbl_df", "tbl", "data.frame"
))

A data.table solution:
# Melt to long format, drop NA values and the non-measure columns, then recast to wide
dt <- dcast(melt(data.table(d), "name")[!is.na(value) & !variable %in% c("seniority", "state")], name ~ variable)
dt
name legeffect_112 legeffect_113 legeffect_114 legeffect_115 legeffect_116 benchmark_112 benchmark_113 benchmark_114 benchmark_115 benchmark_116
1: Abraham_Ralph <NA> <NA> 2.07501077651978 0.493490308523178 0.0397605448961258 <NA> <NA> 1.40446054935455 1.3291300535202 0.483340591192245
2: Ackerman_Gary 0.202061712741852 <NA> <NA> <NA> <NA> 0.738679468631744 <NA> <NA> <NA> <NA>
3: Adams_Alma <NA> 0 0.84164834022522 0.587624311447144 1.78378939628601 <NA> 0.391001850366592 0.576326191425323 0.537361204624176 1.31058621406555
4: Adams_Sandy 1.30758035182953 <NA> <NA> <NA> <NA> 0.82908970117569 <NA> <NA> <NA> <NA>
5: Aderholt_Robert 3.73544979095459 0.908495426177979 0.340001106262207 0.159877583384514 0.0198802724480629 1.39835929870605 1.58223271369934 1.42212760448456 1.45703768730164 0.751261711120605
6: Aguilar_Pete <NA> <NA> 0.10985741019249 0.730929613113403 0.0497006773948669 <NA> <NA> 0.574363172054291 0.523149251937866 1.05683290958405
Data/Setup
# Load data.table
# install.packages("data.table")
library(data.table)
# Read example data
d <- structure(list(name = c("Abraham_Ralph", "Abraham_Ralph", "Abraham_Ralph",
"Ackerman_Gary", "Adams_Alma", "Adams_Alma", "Adams_Alma", "Adams_Alma",
"Adams_Sandy", "Aderholt_Robert", "Aderholt_Robert", "Aderholt_Robert",
"Aderholt_Robert", "Aderholt_Robert", "Aguilar_Pete", "Aguilar_Pete",
"Aguilar_Pete"), state = c("LA", "LA", "LA", "NY", "NC", "NC",
"NC", "NC", "FL", "AL", "AL", "AL", "AL", "AL", "CA", "CA", "CA"
), seniority = c(1, 2, 3, 15, 1, 2, 3, 4, 1, 8, 9, 10, 11, 12,
1, 2, 3), legeffect_112 = c(NA, NA, NA, 0.202061712741852, NA,
NA, NA, NA, 1.30758035182953, 3.73544979095459, NA, NA, NA, NA,
NA, NA, NA), legeffect_113 = c(NA, NA, NA, NA, 0, NA, NA, NA,
NA, NA, 0.908495426177979, NA, NA, NA, NA, NA, NA), legeffect_114 = c(2.07501077651978,
NA, NA, NA, NA, 0.84164834022522, NA, NA, NA, NA, NA, 0.340001106262207,
NA, NA, 0.10985741019249, NA, NA), legeffect_115 = c(NA, 0.493490308523178,
NA, NA, NA, NA, 0.587624311447144, NA, NA, NA, NA, NA, 0.159877583384514,
NA, NA, 0.730929613113403, NA), legeffect_116 = c(NA, NA, 0.0397605448961258,
NA, NA, NA, NA, 1.78378939628601, NA, NA, NA, NA, NA, 0.0198802724480629,
NA, NA, 0.0497006773948669), benchmark_112 = c(NA, NA, NA, 0.738679468631744,
NA, NA, NA, NA, 0.82908970117569, 1.39835929870605, NA, NA, NA,
NA, NA, NA, NA), benchmark_113 = c(NA, NA, NA, NA, 0.391001850366592,
NA, NA, NA, NA, NA, 1.58223271369934, NA, NA, NA, NA, NA, NA),
benchmark_114 = c(1.40446054935455, NA, NA, NA, NA, 0.576326191425323,
NA, NA, NA, NA, NA, 1.42212760448456, NA, NA, 0.574363172054291,
NA, NA), benchmark_115 = c(NA, 1.3291300535202, NA, NA, NA,
NA, 0.537361204624176, NA, NA, NA, NA, NA, 1.45703768730164,
NA, NA, 0.523149251937866, NA), benchmark_116 = c(NA, NA,
0.483340591192245, NA, NA, NA, NA, 1.31058621406555, NA,
NA, NA, NA, NA, 0.751261711120605, NA, NA, 1.05683290958405
)), row.names = c(NA, -17L), class = c("tbl_df", "tbl", "data.frame"
))

This solution uses only base functions (no extra packages), but the one-liner may cause eyes to cross, so I'll split it into several functions.
The plan is the following:
1. Split the original data.frame by the values in the name column, using the function by;
2. For each partition of the data.frame, collapse the columns;
3. A collapsed column returns the max value of the column, or NA if all its values are NA;
4. The collapsed data.frame partitions are stacked together.
So, this is a function that does that:
dfr_collapse <- function(dfr, col0)
{
    # Collapse the columns of the data.frame "dfr" grouped by the values of
    # the column "col0"
    # Max/NA function
    namax <- function(x)
    {
        if(all(is.na(x)))
            NA # !!!
        else
            max(x, na.rm=TRUE)
    }
    # Column collapse function
    byfun <- function(x)
    {
        lapply(x, namax)
    }
    # Stack the partitioning results
    return(do.call(
        what = rbind,
        args = by(dfr, dfr[[col0]], byfun)
    ))
}
It may not look as slick as a one-liner, but it does the job. It can be turned into a one-liner, but you don't want that.
Assuming that df0 is the data.frame from your dput, you can test this function with
dfr_collapse(df0, "name")
Nota bene: for the sake of simplicity, I return an NA of type logical (see the comment # !!! above). The correct code should convert that NA to the mode of the x vector. Also, the function should check the type of its inputs, etc.
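To illustrate the nota bene, here is one possible sketch of a type-preserving variant with basic input checks. The names namax_typed and collapse_by and the toy data frame are hypothetical, introduced only for this example. It uses split and as.data.frame instead of by, so the result is an ordinary data.frame:

```r
# Max of x, or an NA of the same type as x if every value is NA.
# Indexing with NA_integer_ returns a single NA that keeps x's type.
namax_typed <- function(x) {
  if (all(is.na(x))) x[NA_integer_] else max(x, na.rm = TRUE)
}

# Collapse the rows of dfr within each value of the column col0
collapse_by <- function(dfr, col0) {
  stopifnot(is.data.frame(dfr), is.character(col0), length(col0) == 1,
            col0 %in% names(dfr))
  parts <- lapply(split(dfr, dfr[[col0]]), function(part) {
    as.data.frame(lapply(part, namax_typed), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Toy data (made up for illustration)
toy <- data.frame(
  user = c("Tom", "Tom", "Larry", "Larry"),
  t1   = c(1, NA, 3, NA),
  t2   = c(NA, 7, NA, NA),
  stringsAsFactors = FALSE
)
res <- collapse_by(toy, "user")
res
```

Note that split orders the groups by sorted value of col0, so the rows come back as Larry, then Tom.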

Filled values are not shown

I am new to R.
I had some columns with NAs and I filled them like this:
katsastus_3_20211227_115940 %>% fill(c("registration_year","reg"), .direction = "down")
When I run the code, the console shows what I wanted, like this: https://i.stack.imgur.com/2EkjL.png
But when I call view(katsastus_3_20211227_115940)
I get this: https://i.stack.imgur.com/zcBfK.png, which is how the data looked when I first got it.

fill() returns a modified copy and leaves the original object unchanged, so you need to assign the result back to your data.frame (as @Peace Wang suggested in the first comment), e.g.:
f <- structure(list(reg = c("2017", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), modela = c("Alfa Romeo - Models in total", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), registration_year = c("Years in total", NA, NA, NA, NA, NA, NA, NA, NA, "2002", NA, NA), object_of_inspection = c("A", "B","C", "D", "E","F", "G","H", "I","A", "B","C")), row.names = c(NA,-12L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyr) # provides fill() and re-exports %>%
f <- f %>% fill(c("registration_year", "reg"), .direction = "down")
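The same behaviour can be seen without any packages. The sketch below uses a hypothetical base R helper, fill_down, as a stand-in for tidyr's fill on a made-up column; the point is only that the data frame stays unchanged until the result is assigned back:

```r
# Hypothetical helper: carry the last non-NA value downwards
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))          # index of the last non-NA value seen so far
  c(NA, x[!is.na(x)])[idx + 1]      # position 1 is NA for leading gaps
}

# Toy data (made up for illustration)
f <- data.frame(reg = c("2017", NA, NA, "2002", NA), stringsAsFactors = FALSE)

# Printing the result shows filled values, but f itself is not modified
transform(f, reg = fill_down(reg))
still_na <- anyNA(f$reg)   # TRUE: the NAs are still in f

# Assigning the result back is what actually updates f
f <- transform(f, reg = fill_down(reg))
anyNA(f$reg)               # FALSE
```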

Column linking problem on Parent_ID and Extension in R

I have a file which contains some Order_IDs and their Extension_ID, if one exists. A new order can be a fresh order, an extension of an existing Order_ID, or an extension of an existing extension.
My problem is to add a new column named Parent_ID which marks the root of each Order_ID.
Please find the expected output as below :
A reproducible input is attached below.
df1 = structure(list(Order_ID = c("SL158", "SL159", "SL160", "SL162",
"SL164", "SL165", "SL168", "SL169", "SL170", "SL171", "SL172",
"SL176", "SL177", "SL178", "SL179", "SL180", "SL183", "SL184",
"SL189", "SL190", "SL191", "SL192", "SL193", "SL195", "SL196",
"SL199", "SL200", "SL201", "SL207", "SL208", "SL209", "SL218",
"SL219", "SL223", "SL224", "SL225", "SL226", "SL227", "SL229",
"SL232", "SL233", "SL234", "SL235", "SL239", "SL240", "SL241",
"SL242", "SL243", "SL251", "SL252", "SL257", "SL258", "SL260",
"SL261", "SL262", "SL266", "SL267", "SL268", "SL269", "SL277",
"SL278", "SL279", "SL280", "SL281", "SL287", "SL288", "SL289",
"SL300", "SL301", "SL302", "SL303", "SL304", "SL305", "SL315",
"SL316", "SL322", "SL323", "SL327", "SL328", "SL333", "SL334",
"SL335", "SL336", "SL337", "SL340", "SL341", "SL342", "SL343",
"SL344", "SL345", "SL350", "SL351", "SL352", "SL353", "SL354",
"SL355", "SL363", "SL364", "SL365", "SL366", "SL367", "SL368",
"SL369", "SL370", "SL376", "SL377", "SL378", "SL379", "SL380",
"SL381", "SL382", "SL383", "SL384", "SL385", "SL1217", "SL1452",
"SL4316", "SL4317", "SL4348", "SL4381", "SL4681", "SL4738", "SL5319",
"SL5520", "SL5703", "SL6132", "SL6244", "SL6855", "SL6997", "SLB1253161",
"SLB2970530", "SLB27287329", "SLB36502009", "SLB81913180", "SLB82838226",
"SLB90244936", "SLB99701642", "SL11995", "SLH5317239", "SLH22149557",
"SLH44727392", "SLH45803004", "SLH57801072", "SLH74470000", "SLH79063451",
"SL1134", "SL1011", "SL3686", "SL3691", "SL3695", "SL3716", "SL3718",
"SL3720", "SL3721", "SL3727", "SL5242", "SL5245", "SL5246", "SL5254",
"SL5255", "SL10126", "SL10134", "SL10143", "SL11333", "SL11338",
"SL11365", "SL11377", "SL11384", "SL10004", "SL10046", "SL10058",
"SL10070", "SL10092", "SL11335", "SL11364", "SL11366"),
Extension_Of = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, "SL1134", "SL1011", "SL3691", "SL3718", "SL3727", "SL3695",
"SL3720", "SL3716", "SL3721", "SL5242", "SL5246", "SL5245", "SL5254",
"SL5255", "SL3686", "SL11365", "SL11384", "SL11377", "SL10134",
"SL11333", "SL10143", "SL11338", "SL10126", "SL10046", "SL10070",
"SL11364", "SL11335", "SL10004", "SL10058", "SL11366", "SL10092",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, "SL384", NA, NA, "SL171", NA,
NA, NA)),
row.names = c(NA, -176L),
class = c("tbl_df", "tbl", "data.frame"))
head(df1)
# Order_ID Extension_Of
#1 SL158 <NA>
#2 SL159 <NA>
#3 SL160 <NA>
#4 SL162 <NA>
#5 SL164 <NA>
#6 SL165 <NA>

Here is a solution based on igraph:
library(igraph) # 1.2.1
v <- data.frame(name = unique(unlist(df1)), stringsAsFactors = FALSE)
v <- v[!is.na(v$name), ]
g <- graph_from_data_frame(df1[!is.na(df1$Extension_Of), 2:1], vertices = v)
df1$Parent_ID <- sapply(df1$Order_ID, function(oid){
n <- ego(g, order = nrow(df1), oid, mode = 'in')[[1]]
nin <- lapply(n, function(x){ego(g, order = nrow(df1), x, mode = 'in')[[1]]})
root <- n[lengths(nin) == 1]$name
})
df1[df1$Parent_ID == 'SL384', ]
# Order_ID Extension_Of Parent_ID
# 113 SL384 <NA> SL384
# 138 SL11995 SL10046 SL384
# 170 SL10046 SL384 SL384
This answer is inspired by this answer and this function.
The rationale: each row of df1 with a non-NA Extension_Of can be treated as an edge in a graph. If B is an extension of A, we have an edge A -> B. If C is an extension of B, we get B -> C. The problem can then be rephrased as: for each node (Order_ID), find its root node. For C, the root node is A, since A -> B -> C.
In the code above, for each Order_ID, ego finds all the nodes that are directly or indirectly upstream of it (including itself). Among those upstream nodes, the root is the one that has no upstream nodes other than itself.
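The same root-finding idea can also be sketched without igraph by following the Extension_Of links upwards until an order has no parent. This is a simplified illustration on made-up IDs, not the igraph approach above; find_root, parent_of and orders are hypothetical names, and the sketch assumes each order extends at most one other order:

```r
# Toy data (made up): B extends A, C extends B, D is a fresh order
orders <- data.frame(
  Order_ID     = c("A", "B", "C", "D"),
  Extension_Of = c(NA, "A", "B", NA),
  stringsAsFactors = FALSE
)

# Lookup table: names are order IDs, values are the order each one extends
parent_of <- setNames(orders$Extension_Of, orders$Order_ID)

# Walk up the chain until an order without a parent is reached.
# The loop is bounded so that a cycle raises an error instead of hanging.
find_root <- function(id, parent_of) {
  for (step in seq_len(length(parent_of) + 1L)) {
    p <- unname(parent_of[id])
    if (is.na(p)) return(id)
    id <- p
  }
  stop("cycle detected in Extension_Of links")
}

orders$Parent_ID <- unname(
  vapply(orders$Order_ID, find_root, character(1), parent_of = parent_of)
)
orders
```

For long chains in large tables the repeated lookups may be slower than a graph library, so this is mainly useful for understanding the logic.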

Calculating the mean of 3 columns in data frame

I have 3 data frames and they are just replicates. So I want to bind them and calculate the mean of each fraction.
Three data frames:
Nr.1
> dput(head(tbl_gel1))
structure(list(Name = c("yal003w", "yal005c", "yal012w", "yal016w",
"yal035w", "yal038w"), `1_1` = c(1.08346521189121, NA, NA, NA,
NA, NA), `1_10` = c(0.267721905361376, 1.43303883148383, 1.61684304894131,
NA, NA, NA), `1_11` = c(0.189487668138674, 0.75522363065885,
1, NA, NA, NA), `1_12` = c(NA, 1.01340492119247, NA, NA, NA,
NA), `1_13` = c(0.374782308020683, 0.945489433731933, NA, NA,
NA, 0.0317297633029047), `1_14` = c(0.437488212634424, 1.18763709680314,
NA, NA, NA, 0.0278039649538794), `1_15` = c(1, 0.963283876302253,
NA, NA, NA, 0.101985769564935), `1_16` = c(0.933864874212228,
0.534233379286527, NA, NA, NA, 0.216767470594226), `1_17` = c(1,
0.665519263271478, NA, NA, 1, 1), `1_18` = c(0.666036574750145,
0.570465125348879, NA, NA, NA, 1.42894349812116), `1_19` = c(0.514337131747938,
0.23204076838128, NA, NA, 1, 1.2521214021452), `1_2` = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), `1_20` = c(NA,
NA, NA, NA, NA, 1.40803677399372), `1_21` = c(1.09990599806138,
NA, NA, NA, NA, 1.04631699593704), `1_22` = c(1.26442418472118,
NA, NA, NA, NA, 0.928872017485782), `1_23` = c(1.11596921281805,
NA, NA, NA, 1, 0.34698227364696), `1_24` = c(0.754496014447251,
NA, NA, NA, 1, 0.222234793614252), `1_3` = c(6.29254185223621,
NA, NA, 0.693642968439352, NA, NA), `1_4` = c(1.36347593974479,
NA, NA, 1, NA, NA), `1_5` = c(0.765885344543765, NA, NA, 1, NA,
NA), `1_6` = c(0.238118001668604, 0.679584207611477, NA, NA,
NA, NA), `1_7` = c(0.847897771442355, 0.277348019879946, NA,
NA, NA, NA), `1_8` = c(0.356154192700505, 1, 0.409523853881517,
NA, NA, NA), `1_9` = c(0.180109142324181, 1, 0.578310191227172,
NA, NA, 0.093113736249161)), .Names = c("Name", "1_1", "1_10",
"1_11", "1_12", "1_13", "1_14", "1_15", "1_16", "1_17", "1_18",
"1_19", "1_2", "1_20", "1_21", "1_22", "1_23", "1_24", "1_3",
"1_4", "1_5", "1_6", "1_7", "1_8", "1_9"), row.names = c(NA,
6L), class = "data.frame")
Nr. 2
> dput(head(tbl_gel2))
structure(list(Name = c("yal003w", "yal005c", "yal012w", "yal016w",
"yal035w", "yal038w"), `2_1` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `2_2` = c(1.0548947840373, NA,
NA, NA, NA, NA), `2_3` = c(1.61794716486303, 0.346821796129205,
NA, NA, NA, NA), `2_4` = c(1, NA, NA, 0.378254379051086, NA,
NA), `2_5` = c(0.670710809411423, NA, NA, 1, NA, NA), `2_6` = c(0.313872585645673,
NA, NA, NA, NA, NA), `2_7` = c(0.299293639466945, 0.13920907824675,
NA, NA, NA, NA), `2_8` = c(0.311431376422469, 0.511742245543671,
0.342807141055383, NA, NA, NA), `2_9` = c(0.243672215177189,
1, 0.689138745271004, NA, NA, 0.0540861571772987), `2_10` = c(0.154732102234279,
1.08973258347909, 1, NA, NA, NA), `2_11` = c(0.149365726324845,
1.1210733533474, 1.0427649268992, NA, NA, 0.0955468461925663),
`2_12` = c(0.153741630869067, 2.96276072446013, 1, NA, NA,
NA), `2_13` = c(0.629371115599316, 0.952868912207058, 0.0771105403237483,
NA, NA, 0.0885212695236819), `2_14` = c(0.907644486740723,
1.43000783337778, NA, NA, NA, 0.138102409899801), `2_15` = c(1.09683345304359,
0.423641943213571, NA, NA, NA, 0.255699738225622), `2_16` = c(0.913095779338154,
0.510977400533081, NA, NA, 0.520556617688936, 0.284898552722227
), `2_17` = c(0.935941553863477, 0.388225948821767, NA, NA,
1.14984991998928, 1), `2_18` = c(2.21746156904543, 0.642743615867438,
NA, NA, NA, 2.22716071647178), `2_19` = c(0.500618035526774,
0.282924681750454, NA, NA, NA, 1), `2_20` = c(0.701627311828743,
0.254001731153973, NA, NA, 1, 1.15996914621286), `2_21` = c(1.97359874904275,
NA, NA, NA, 1.67526802494991, 1.38709456754353), `2_22` = c(2.09198896289293,
NA, NA, NA, NA, 0.921672834103247), `2_23` = c(1.18791465369551,
NA, NA, NA, NA, 0.576309066193914), `2_24` = c(0.473199477125101,
0.176144702328764, NA, NA, 1, 0.130236848112641)), .Names = c("Name",
"2_1", "2_2", "2_3", "2_4", "2_5", "2_6", "2_7", "2_8", "2_9",
"2_10", "2_11", "2_12", "2_13", "2_14", "2_15", "2_16", "2_17",
"2_18", "2_19", "2_20", "2_21", "2_22", "2_23", "2_24"), row.names = c(NA,
6L), class = "data.frame")
Nr.3
> dput(head(tbl_gel3))
structure(list(Name = c("yal003w", "yal005c", "yal012w", "yal016w",
"yal035w", "yal038w"), `3_1` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `3_2` = c(1, 1.4605309655311,
NA, NA, NA, NA), `3_3` = c(1.74480713727388, 0.42825619952525,
NA, NA, NA, NA), `3_4` = c(1, 0.431712121875013, NA, 0.395182020245312,
NA, NA), `3_5` = c(2.26247329056518, 0.644462177666441, NA, 1,
NA, NA), `3_6` = c(0.619783374266709, 0.472094874244026, NA,
NA, NA, NA), `3_7` = c(0.45731912574756, 0.176354321796083, NA,
NA, NA, NA), `3_8` = c(0.271829278733367, 0.517232771669986,
0.153774052052871, NA, NA, NA), `3_9` = c(0.141017619508583,
1.41279969394534, 0.651948154271122, NA, NA, NA), `3_10` = c(NA,
1.64435171100405, 0.998807430240956, NA, NA, NA), `3_11` = c(0.110046035477971,
1.33684444261939, 1.25595310581771, NA, NA, 0.0236163735479745
), `3_12` = c(NA, 0.982250906830292, 0.39283619985401, NA, NA,
0.0688303458902568), `3_13` = c(0.136798076436642, 0.55729642483448,
0.176525038283566, NA, NA, 0.0251189412372225), `3_14` = c(0.316623893146817,
1, NA, NA, NA, 0.0727823461722849), `3_15` = c(NA, 0.607991038574375,
NA, NA, NA, 0.133968257432001), `3_16` = c(0.362994392402489,
0.547183167896534, NA, NA, NA, 0.0777347708647245), `3_17` = c(1,
0.116561118715651, NA, NA, 0.710972173471528, 1), `3_18` = c(NA,
3.63330458071475, NA, NA, NA, 3.24019081192985), `3_19` = c(NA,
NA, NA, NA, NA, 2.46635222132474), `3_20` = c(0.452303676849426,
0.0896715384025126, NA, NA, 1, 1), `3_21` = c(1.50169299468485,
0.513442106966708, NA, NA, 1.45124841710635, 1.02529618467026
), `3_22` = c(0.565232592993276, 0.748536315065533, NA, NA, 2.9089322117881,
0.782555457293307), `3_23` = c(1.62622280168665, 0.704926586534075,
NA, NA, NA, 0.584486806995139), `3_24` = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_)), .Names = c("Name",
"3_1", "3_2", "3_3", "3_4", "3_5", "3_6", "3_7", "3_8", "3_9",
"3_10", "3_11", "3_12", "3_13", "3_14", "3_15", "3_16", "3_17",
"3_18", "3_19", "3_20", "3_21", "3_22", "3_23", "3_24"), row.names = c(NA,
6L), class = "data.frame")
I used the function below to bind them. Each data frame has a different number of rows, and in some cases different names, so the final table should have more rows than any one of them.
mylist <- list(tbl_gel1,tbl_gel2,tbl_gel3)
tbl_all <- Reduce(function(x, y) merge(x, y, all=T,by="Name",sort=F),
mylist, accumulate=F)
Everything works fine up to this point.
Now I want to calculate the mean of each fraction (there are 24 fractions in total).
## Calculating the mean
tbl_all1 <- tbl_all[-1]
ind <- c(1, 25, 49)
tbl_mean <- cbind(tbl_all[1], sapply(0:23, function(i) rowMeans(tbl_all1[ind+i])))
There is something wrong with that function, because the sum of many rows is 0. That is definitely wrong, because tbl_gel1 and the others contain only rows with at least one number in some fraction.
If I take a look at tbl_mean, I see that the rows with a sum of 0 are at the bottom.
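One thing visible in the dput output above is that the columns of tbl_gel1 are in alphabetical order (1_1, 1_10, 1_11, ...), while those of tbl_gel2 and tbl_gel3 are in numeric order (2_1, 2_2, 2_3, ...), so positional indexing with ind + i would not line up the same fraction across replicates. Whether that explains the zeros is an assumption. A minimal sketch that groups columns by the fraction number in their names and ignores NAs when averaging, on a made-up stand-in for tbl_all:

```r
# Toy stand-in for the merged table (made up): replicate_fraction columns
tbl_all <- data.frame(
  Name   = c("g1", "g2"),
  `1_1`  = c(1, NA),
  `1_10` = c(2, NA),
  `2_1`  = c(3, NA),
  `2_10` = c(NA, 4),
  check.names = FALSE
)

values <- tbl_all[-1]

# Fraction number = the part of the column name after "replicate_"
fraction  <- sub("^[0-9]+_", "", names(values))
fractions <- as.character(sort(unique(as.integer(fraction))))

# Mean per fraction across replicates, ignoring missing values.
# A row with no data at all for a fraction gives NaN.
means <- sapply(fractions, function(f) {
  rowMeans(values[, fraction == f, drop = FALSE], na.rm = TRUE)
})

tbl_mean <- cbind(tbl_all[1], means)
tbl_mean
```

Matching on names rather than positions keeps the result correct regardless of the column order that merge produces.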
