Aggregate some columns while keeping other columns unchanged

Aggregate some columns while keeping other columns unchanged - r

I have data frame like this dummy sample, my real dataset had 56 variables.
I would like to drop the date and aggregate by id and sum last 4 total variables while keep the other unchanged.
df <- data.frame(stringsAsFactors=FALSE,
date = c("2019-02-10", "2019-02-10", "2019-02-11", "2019-02-11",
"2019-02-12", "2019-02-12", "2019-02-13", "2019-02-13",
"2019-02-14", "2019-02-14"),
id = c("18100410-aa", "18101080-ae", "18100410-aa", "18101080-ae",
"18100410-aa", "18101080-ae", "18100410-aa", "18101080-ae",
"18100410-aa", "18101080-ae"),
f_type = c(4L, 2L, 4L, 2L, 4L, 2L, 4L, 2L, 4L, 2L),
reg = c(6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L),
hh_p10 = c(2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L),
internet = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
youngest = c(5L, 7L, 5L, 7L, 5L, 7L, 5L, 7L, 5L, 7L),
a_group = c(3L, 6L, 3L, 6L, 3L, 6L, 3L, 6L, 3L, 6L),
total_prd = c(130L, 337L, 374L, 261L, 106L, 230L, 150L, 36L, 15L, 123L),
B_totalprod = c(20L, 0L, 256L, 0L, 32L, 0L, 0L, 36L, 0L, 45L),
p_totalprod = c(0L, 81L, 11L, 260L, 26L, 230L, 0L, 0L, 15L, 0L),
n_totalprod = c(110L, 256L, 107L, 1L, 48L, 0L, 150L, 0L, 0L, 78L)
)
I found this solution from plyr package here it is working but I need to specify all my 52 unaffected variables. I am just wondering is there any other way to do this task?
library(plyr)
ddply(df,.(id,f_type, reg, internet,hh_p10 ,youngest, a_group ),summarise,total_prd = sum(total_prd) ,
B_totalprod = sum(B_totalprod) , p_totalprod = sum(p_totalprod) ,
n_totalprod = sum(n_totalprod))

If your real dataset also has columns that contain "total" this should work:
library(tidyverse)
df %>%
select(-date) %>%
group_by(.dots = str_subset(names(.), "total", negate = TRUE)) %>%
summarise_all(list(sum = sum))
# A tibble: 2 x 11
# Groups: id, f_type, reg, hh_p10, internet, youngest [2]
id f_type reg hh_p10 internet youngest a_group total_prd_sum B_totalprod_sum p_totalprod_sum n_totalprod_sum
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 18100410-aa 4 6 2 1 5 3 775 308 52 415
2 18101080-ae 2 7 1 2 7 6 987 81 571 335
The line group_by(.dots = str_subset(names(.), "total", negate = TRUE)) means we are going to group by all the column names in our this dataset that do not contain the word "total".

Related

Loop over specific columns data and add the result as a new column in R

I have a dataframe df with following information:
df <- structure(list(Samples = structure(c(1L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 2L, 1L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 1L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 2L, 1L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L, 2L), .Label = c("Sample1", "Sample10", "Sample2",
"Sample3", "Sample4", "Sample5", "Sample6", "Sample7", "Sample8",
"Sample9"), class = "factor"), patient.vital_status = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 1L), years = c(3.909589041, 1.457534247,
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822,
4.687671233, 2.167123288, 1.95890411, 3.909589041, 1.457534247,
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822,
4.687671233, 2.167123288, 1.95890411, 3.909589041, 1.457534247,
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822,
4.687671233, 2.167123288, 1.95890411, 3.909589041, 1.457534247,
2.336986301, 5.010958904, 1.665753425, 1.81369863, 1.191780822,
4.687671233, 2.167123288, 1.95890411), Genes = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("A1BG", "A1CF", "A2M",
"A2ML1"), class = "factor"), value = c(0.034459012, 0.017698878,
0.023313851, 0.010456762, 0.032674019, 0.037561831, 0.03380681,
0, 0.019954956, 0.012392427, 0.835801613, 2.265192447, 2.431409095,
5.012117956, 2.139962802, 2.371946704, 4.555234385, 0.550293401,
0.924012327, 2.274642129, 92.85639578, 79.50897642, 23.72187602,
26.86025304, 32.80504253, 222.6449054, 71.78812505, 45.76371588,
29.93976676, 22.97515484, 0.03780441, 0.005825143, 0, 0.002867985,
0.011948708, 0.02060423, 0.004636111, 0.015903347, 0.005473063,
0.033988816)), class = "data.frame", row.names = c(NA, -40L))
I want to loop over the information based on the columns Genes and value and get a result. And again I want the result to be added to the dataframe df. The result will be with low or high.
I'm trying to do this with the following code, but it doesn't work:
genes <- as.character(unique(df$Genes))
library(survival)
library(survminer)
for(i in genes){
surv_rnaseq.cut <- surv_cutpoint(
df,
time = "years",
event = "patient.vital_status",
variables = c("Genes","value"))
df$cat <- surv_categorize(surv_rnaseq.cut)
}
Along with the above result I also wanted the summary for surv_rnaseq.cut for all the four genes with mentioning its name.
Any help please. thanq

An option would be to split by 'genes' (group_split), loop over the list, apply the functions and bind the list elements after creating the column
library(survminer)
library(survival)
library(dplyr)
library(purrr)
df %>%
group_split(Genes) %>%
map_dfr(~ surv_cutpoint(.x,
time = "years",
event = "patient.vital_status",
variables = c("Genes", "value")) %>%
surv_categorize %>%
pull(value) %>%
mutate(.x, cat = .))
# A tibble: 40 x 6
# Samples patient.vital_status years Genes value cat
# <fct> <int> <dbl> <fct> <dbl> <chr>
# 1 Sample1 0 3.91 A1BG 0.0345 high
# 2 Sample2 0 1.46 A1BG 0.0177 high
# 3 Sample3 0 2.34 A1BG 0.0233 high
# 4 Sample4 0 5.01 A1BG 0.0105 high
# 5 Sample5 0 1.67 A1BG 0.0327 high
# 6 Sample6 0 1.81 A1BG 0.0376 high
# 7 Sample7 0 1.19 A1BG 0.0338 high
# 8 Sample8 1 4.69 A1BG 0 low
# 9 Sample9 0 2.17 A1BG 0.0200 high
#10 Sample10 1 1.96 A1BG 0.0124 high
# … with 30 more rows

How can I identify specific string row number

My data looks like this
df<- structure(list(Main = structure(c(5L, 3L, 1L, 2L, 4L, 4L, 2L,
1L, 5L, 2L, 5L, 4L, 5L, 2L), .Label = c("IsMainbody", "IsMainbodyCandidate",
"IsMainbodyRejected", "Main", "None"), class = "factor"), Group.IDs = c(52L,
NA, 2L, 12L, 38L, 38L, 6L, 3L, NA, 49L, 20L, 38L, 54L, 85L),
X..Number1 = c(12L, 6L, 1L, 5L, 1L, 1L, 1L, 1L, 17L, 1L,
4L, 1L, 1L, 4L), X..No = c(20L, 62L, 2L, 16L, 3L, 3L, 1L,
3L, 32L, 3L, 36L, 3L, 1L, 20L), X..Unique.N = c(0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-14L))
I am trying to find the row number of for specific strings.
Based on main column, I want to find this how many of my sample has "Main" , how many have "IsmainbodyCandidate" and how many are "IsMainbodyRejected"
Then I want to make a new dataset that only consists of Main and Ismainbody and Ismainbodycandidates like below .
Main Group IDs # Number1 # No # Unique N
IsMainbody. 2 1 2 1
IsMainbodyCandidate 12 5 16 0
Main 38 1 3 0
Main 38 1 3 0
IsMainbodyCandidate 6 1 1 0
IsMainbody 3 1 3 0
IsMainbodyCandidate 49 1 3 0
IsMainbodyCandidate 85 4 20 0

# count by main
table(df$Main)
# new dataframe without "None"
df[df$Main != "None", ]
# or more explicitly
df[df$Main %in% c("Main", "IsMainbody", "IsMainbodyCandidate"), ]

replacement missing values by groups in R

How can i replace the missing values for each group separately?
The reproducible example:
mydata=structure(list(group1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), group.2 = c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), x1 = c(20L, 4L, 91L, NA, 94L, 69L, 38L,
NA, 29L, 69L, 55L, 86L, 81L, 11L, NA, 12L, 65L, 90L, 74L, NA,
49L, 90L), x2 = c(44L, 94L, NA, 1L, 67L, NA, 73L, 22L, 44L, 24L,
NA, 54L, 70L, 65L, 97L, 10L, 97L, NA, 74L, 97L, 34L, 29L)), class = "data.frame", row.names = c(NA,
-22L))
Now i found how to replace the missing values without groups.
library(dplyr)
mydata %>% mutate_at(vars(starts_with("x1")), funs(ifelse(is.na(.) & is.numeric(.) ,mean(., na.rm = TRUE),.)))
But i need to replace for each groups (group1,group2) separately.
edit to small dataset
structure(list(group1 = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), group.2 = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L), x1 = c(63L, 67L, 57L, NA, 65L, 75L, 57L, 80L, 42L,
NA, 35L, 80L), x2 = c(46L, 1L, NA, 41L, 80L, NA, 74L, 73L, NA,
13L, 83L, NA)), class = "data.frame", row.names = c(NA, -12L))

mydata=structure(list(group1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), group2 = c(1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L), x1 = c(20L, 4L, 91L, NA, 94L, 69L, 38L,
NA, 29L, 69L, 55L, 86L, 81L, 11L, NA, 12L, 65L, 90L, 74L, NA,
49L, 90L), x2 = c(44L, 94L, NA, 1L, 67L, NA, 73L, 22L, 44L, 24L,
NA, 54L, 70L, 65L, 97L, 10L, 97L, NA, 74L, 97L, 34L, 29L)), class = "data.frame", row.names = c(NA,
-22L))
library(tidyverse)
mydata %>%
unite(group, group1, group2) %>% # combine groups
mutate(id = row_number()) %>% # add the row number as an id (useful when reshaping)
gather(var, value, -group, -id) %>% # reshape data
group_by(group, var) %>% # for each group combination and variable
mutate(value = ifelse(is.na(value), mean(value, na.rm = T), value)) %>% # replace NAs with mean
spread(var, value) %>% # reshape again
arrange(id) %>% # keep order of original dataset
select(-id) %>% # remove id
ungroup() %>% # forget the grouping
separate(group, c("group1","group2")) # split the groups again
# # A tibble: 22 x 4
# group1 group2 x1 x2
# <chr> <chr> <dbl> <dbl>
# 1 1 1 20 44
# 2 1 2 4 94
# 3 1 1 91 61.3
# 4 1 2 36.5 1
# 5 1 1 94 67
# 6 1 2 69 39
# 7 1 1 38 73
# 8 1 2 36.5 22
# 9 2 1 29 44
# 10 2 2 69 24
# # ... with 12 more rows

In R, how do I order within a single column so one category is ascending and one is descending?

I am generating multiple experimental designs of different sizes and shapes. This is done using a function dependent on the agricolae package (I’ve included it below). To generate practical data sheets for field operations I need to order the data frame by Row, then for odd Rows sort the Range ascending and for even Rows sort it descending.
Using sort, order, rep and seq I have been able to find a simple solution to this. Any suggestions are greatly appreciated!
So the data frame will go from something like this:
df1 <- structure(list(Block = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), Range = c(1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), Row = c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L,
9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L
), Plot = c(101L, 201L, 301L, 401L, 102L, 202L, 302L, 402L, 103L,
203L, 303L, 403L, 104L, 204L, 304L, 404L, 105L, 205L, 305L, 405L,
106L, 206L, 306L, 406L, 107L, 207L, 307L, 407L, 108L, 208L, 308L,
408L, 109L, 209L, 309L, 409L, 110L, 210L, 310L, 410L, 111L, 211L,
311L, 411L, 112L, 212L, 312L, 412L), Entry.Num = c(14L, 26L,
18L, 4L, 52L, 17L, 41L, 47L, 40L, 30L, 21L, 12L, 9L, 2L, 8L,
36L, 25L, 43L, 15L, 6L, 33L, 48L, 54L, 37L, 9L, 18L, 8L, 41L,
48L, 28L, 7L, 47L, 54L, 38L, 46L, 23L, 19L, 1L, 3L, 27L, 36L,
14L, 12L, 33L, 16L, 24L, 31L, 2L)), .Names = c("Block", "Range",
"Row", "Plot", "Entry.Num"), class = "data.frame", row.names = c(NA,
-48L))
To something like this:
df2 <- structure(list(Block = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), Range = c(1L, 2L, 3L, 4L, 4L, 3L,
2L, 1L, 1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L, 4L, 3L,
2L, 1L, 1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L, 1L, 2L, 3L, 4L, 4L, 3L,
2L, 1L, 1L, 2L, 3L, 4L, 4L, 3L, 2L, 1L), Row = c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L,
9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 12L, 12L, 12L, 12L
), Plot = c(101L, 201L, 301L, 401L, 402L, 302L, 202L, 102L, 103L,
203L, 303L, 403L, 404L, 304L, 204L, 104L, 105L, 205L, 305L, 405L,
406L, 306L, 206L, 106L, 107L, 207L, 307L, 407L, 408L, 308L, 208L,
108L, 109L, 209L, 309L, 409L, 410L, 310L, 210L, 110L, 111L, 211L,
311L, 411L, 412L, 312L, 212L, 112L), Entry.Num = c(14L, 26L,
18L, 4L, 47L, 41L, 17L, 52L, 40L, 30L, 21L, 12L, 36L, 8L, 2L,
9L, 25L, 43L, 15L, 6L, 37L, 54L, 48L, 33L, 9L, 18L, 8L, 41L,
47L, 7L, 28L, 48L, 54L, 38L, 46L, 23L, 27L, 3L, 1L, 19L, 36L,
14L, 12L, 33L, 2L, 31L, 24L, 16L)), .Names = c("Block", "Range",
"Row", "Plot", "Entry.Num"), class = "data.frame", row.names = c(NA,
-48L))
In case you're interested, this is the trial design function. There is undoubtedly a more elegant way to do this but I am not particularly good at R:
Trial.Design <- function(Total.Entries, Rows.per.Block, Ranges.per.Block, Trial.Name){
library(agricolae)
library(reshape2)
#########################################################################################
# Generate a trial design #
#########################################################################################
total.trt <- Total.Entries
if(total.trt%%2) # If the variety number is uneven it will return the following error message
stop("WARNING: Variety number is uneven! Subsequent script will not work correctly!")
blocks <- 4 # This is fixed, we are unlikely to use a different block number in any trial.
trt<-c(1:total.trt) # You could in theory have the variety names here.
# This function from agricolae generates a statistically sound trial design.
outdesign <-design.rcbd(trt, blocks, serie=0,continue=TRUE,986,"Wichmann-Hill") # seed for ranomization = 986
# This uses an agricolae function to print the "field book" of the trial.
book <-outdesign$book # field book
#########################################################################################
# Generate blocking in two directions #
#########################################################################################
# The following generates an appropriately blocked map. The idea is block in two directions.
# We use this design so that the blocking structure captures field trends both down and across the field.
Block.Rows <- Rows.per.Block
Block.Ranges <- Ranges.per.Block
ifelse(total.trt==Block.Rows*Block.Ranges, "Entry number is okay",
stop("WARNING: Block is uneven and/or does not equal entry number! Subsequent script will not work correctly!"))
Block <- matrix(rep(1, times=total.trt))
Range <- matrix(rep(1:Block.Rows, times=Block.Ranges))
Row <- matrix(rep(1:Block.Ranges, each=Block.Rows))
Block.1 <- cbind(Block, Range)
Block.1 <- cbind(Block.1, Row)
Block <- matrix(rep(3, times=total.trt))
Range <- matrix(rep((Block.Rows+1):(Block.Rows*2), times=Block.Ranges))
Row <- matrix(rep(1:Block.Ranges, each=Block.Rows))
Block.3 <- cbind(Block, Range)
Block.3 <- cbind(Block.3, Row)
Block <- matrix(rep(2, times=total.trt))
Range <- matrix(rep(1:Block.Rows, times=Block.Ranges))
Row <- matrix(rep((Block.Ranges+1):(Block.Ranges*2), each=Block.Rows))
Block.2 <- cbind(Block, Range)
Block.2 <- cbind(Block.2, Row)
Block <- matrix(rep(4, times=total.trt))
Range <- matrix(rep((Block.Rows+1):(Block.Rows*2), times=Block.Ranges))
Row <- matrix(rep((Block.Ranges+1):(Block.Ranges*2), each=Block.Rows))
Block.4 <- cbind(Block, Range)
Block.4 <- cbind(Block.4, Row)
# The following adds the coordinates generated above to our field book.
Field.book <- rbind(Block.1, Block.2)
Field.book <- rbind(Field.book, Block.3)
Field.book <- rbind(Field.book, Block.4)
Plots <- as.matrix(rep(1:(total.trt*4)))
Field.book <- cbind(Plots, Field.book)
# Generate temporary Range names.
colnames(Field.book) <- c("plots", "block", "range", "row")
Field.book <- as.data.frame(Field.book)
Field.book$range <- as.numeric(Field.book$range)
Field.book$row <- as.numeric(Field.book$row)
# This joins the experimental design generated by agricolae to the plot layout generated above.
Field.book <- join(Field.book, book, by= c("plots","block"))
# Generate better Range names.
colnames(Field.book) <- c("Plot.Num", "Block", "Range", "Row", "Entry.Num")
# Create Plot coordinates.
Field.book$Plot <- (Field.book$Range * 100) + Field.book$Row
# Reorders the Ranges to something more intuitive.
# I drop the 'plot number' Range generated by agricolae because I don't think it is useful or necessary in our case.
Field.book <- Field.book[c("Block", "Range", "Row", "Plot", "Entry.Num")]
# Sort the plots by Range and Row.
Field.book <- Field.book[order(Field.book$Range, Field.book$Row),]
Field.book <<- Field.book
# Convert the Ranges to factors to allow for conversion to a 'wide' format.
Field.book$Block <- as.factor(Field.book$Block)
Field.book$Range <- as.factor(Field.book$Range)
Field.book$Row <- as.factor(Field.book$Row)
Field.book$Plot <- as.factor(Field.book$Plot)
#########################################################################################
# Generate plot maps #
#########################################################################################
# This function rotates the design if it's deemed necessary.
# rotate <- function(x) t(apply(x, 2, rev))
Field.design.num <- dcast(Field.book, Row ~ Range, value.var = "Entry.Num")
Field.design.num$Row <- as.numeric(Field.design.num$Row)
Field.design.num <- Field.design.num[order(-Field.design.num$Row),]
Field.book$Plot <- as.factor(Field.book$Plot)
colnames(Field.design.num)[2:ncol(Field.design.num)] <- paste("Row", colnames(Field.design.num[,c(2:ncol(Field.design.num))]), sep = "-")
Field.design.num$Row <- sub("^", "Range-", Field.design.num$Row)
#rotate(Field.design.num)
Field.design.num <<- Field.design.num
Field.design.plot <- dcast(Field.book, Row ~ Range, value.var = "Plot")
Field.design.plot$Row <- as.numeric(Field.design.plot$Row)
Field.design.plot <- Field.design.plot[order(-Field.design.plot$Row),]
Field.book$Plot <- as.factor(Field.book$Plot)
colnames(Field.design.plot)[2:ncol(Field.design.plot)] <- paste("Row", colnames(Field.design.plot[,c(2:ncol(Field.design.plot))]), sep = "-")
Field.design.plot$Row <- sub("^", "Range-", Field.design.plot$Row)
#rotate(Field.design.plot)
Field.design.plot <<- Field.design.plot
Field.design.Block <- dcast(Field.book, Row ~ Range, value.var = "Block")
Field.design.Block$Row <- as.numeric(Field.design.Block$Row)
Field.design.Block <- Field.design.Block[order(-Field.design.Block$Row),]
Field.book$Block <- as.factor(Field.book$Block)
colnames(Field.design.Block)[2:ncol(Field.design.Block)] <- paste("Row", colnames(Field.design.Block[,c(2:ncol(Field.design.Block))]), sep = "-")
Field.design.Block$Row <- sub("^", "Range-", Field.design.Block$Row)
#rotate(Field.design.Block)
Field.design.Block <<- Field.design.Block
#########################################################################################
# Write the files #
#########################################################################################
write.csv(Field.book, paste("Field Book",Trial.Name,".csv"), row.names=FALSE)
write.csv(Field.design.num, paste("Field map Entry",Trial.Name,".csv"), row.names=FALSE)
write.csv(Field.design.plot, paste("Field map Plots",Trial.Name,".csv"), row.names=FALSE)
write.csv(Field.design.Block, paste("Field map Blocks",Trial.Name,".csv"), row.names=FALSE)
#########################################################################################
}
# The parameters are:
# The total number of entires/varieties in a replicate (NOTE: The number of entries must be an even number).
# The number of rows in an individual block/replicate.
# The number of ranges in an individual block/replicate.
# (NOTE: The number of rows and ranges must multiply to give the number of entries.)
# The trial name is what will be written to your working directory.
Total.Entries = 54
Rows.per.Block = 9
Ranges.per.Block = 6
Trial.Name = "Example"
Trial.Design (Total.Entries, Rows.per.Block, Ranges.per.Block, Trial.Name)

The magic of order awaits you:
df1[order(df1$Row, c(-1,1)[df1$Row %% 2 + 1] * df1$Range ),]
Essentially what this does is order by Row, then by Range, multiplied by -1 if it is even. x %% 2 can be used to check for odd/even status.
all.equal(
df1[order(df1$Row, c(-1,1)[df1$Row %% 2 + 1] * df1$Range ),],
df2,
check.attributes=FALSE
)
#[1] TRUE

Sampling distribution and sum of tables

I've made a few experiments and each experiment led to the apparition of color.
As I can't do more experiments, I want to sample by size=30 and see what frequency table (of colors) I could obtain for 1000 sampling. The resulting frequency table should be the sum of the 1000 frequency table.
I think about concatenating table as follows and try to agregate, but it did not work:
mydata=structure(list(Date = structure(c(11L, 1L, 9L, 9L, 10L, 1L, 2L,
3L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 6L, 7L, 4L, 4L, 4L, 6L, 6L, 11L,
5L, 4L, 7L, 10L, 6L, 6L, 2L, 5L, 7L, 11L, 1L, 9L, 11L, 11L, 11L,
1L, 1L), .Label = c("01/02/2016", "02/02/2016", "03/02/2016",
"08/02/2016", "10/02/2016", "11/02/2016", "16/02/2016", "22/02/2016",
"26/01/2016", "27/01/2016", "28/01/2016"), class = "factor"),
Color = structure(c(30L, 33L, 11L, 1L, 18L, 18L, 11L,
16L, 19L, 19L, 22L, 1L, 18L, 18L, 13L, 14L, 13L, 18L, 24L,
24L, 11L, 24L, 2L, 33L, 25L, 1L, 30L, 5L, 24L, 18L, 13L,
35L, 19L, 19L, 18L, 23L, 19L, 8L, 19L, 14L), .Label = c("ARD",
"ARP", "BBB", "BIE", "CFX", "CHR", "DDD", "DOO", "EAU", "ELY",
"EPI", "ETR", "GEN", "GER", "GGG", "GIS", "ISE", "JUV", "LER",
"LES", "LON", "LYR", "MON", "NER", "NGY", "NOJ", "NYO", "ORI",
"PEO", "RAY", "RRR", "RSI", "SEI", "SEP", "VIL", "XQU", "YYY",
"ZYZ"), class = "factor"), Categorie = structure(c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "1,2", "1,2,3",
"1,3", "2", "2,3", "3", "4", "5"), class = "factor"), Portion_Longueur = c(3L,
4L, 1L, 1L, 2L, 4L, 5L, 6L, 7L, 7L, 8L, 8L, 9L, 8L, 8L, 9L,
11L, 7L, 7L, 7L, 9L, 8L, 3L, 8L, 7L, 11L, 2L, 9L, 8L, 5L,
8L, 12L, 3L, 4L, 1L, 3L, 3L, 3L, 4L, 5L)), .Names = c("Date",
"Color", "Categorie", "Portion_Longueur"), row.names = c(NA,
40L), class = "data.frame")
for (i in 1:1000) {
mysamp= sample(mydata$Color,size=30)
x=data.frame(table(mysamp))
if (i==1) w=x
else w <- c(w, x)
}
aggregate(w$Freq, by=list(Color=w$mysamp), FUN=sum)
Example, for 3 sampling, for (i in 1:3) I expect have sum as follow :
But I do not have Sum, instead I have:
Color x
1 ARD 2
2 ARP 1
3 BBB 0
4 BIE 0
5 CFX 0
6 CHR 0
7 DDD 0
8 DOO 1
9 EAU 0
10 ELY 0
11 EPI 3
12 ETR 0
13 GEN 2
14 GER 2
15 GGG 0
16 GIS 1
17 ISE 0
18 JUV 4
19 LER 5
20 LES 0
21 LON 0
22 LYR 1
23 MON 1
24 NER 2
25 NGY 1
26 NOJ 0
27 NYO 0
28 ORI 0
29 PEO 0
30 RAY 1
31 RRR 0
32 RSI 0
33 SEI 2
34 SEP 0
35 VIL 1
36 XQU 0
37 YYY 0
38 ZYZ 0
How to do this ?
Thanks a lot

Your for loop is what's causing your issues. You end up creating a big list that is somewhat difficult to perform calculations on (check out names(w) to see what I mean). A better data structure would allow for easier calculations:
x = NULL #initialize
for (i in 1:1000) {
mysamp = sample(mydata$Color,size=30) #sample
mysamp = data.frame(table(mysamp)) #frequency
x = rbind(x, mysamp) #bind to x
}
aggregate(Freq~mysamp, data = x, FUN = sum) #perform calculation
Note that this loop runs a bit slower than your loop. This is because of the rbind() function. See this post. Maybe someone will come along with a more efficient solution.

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Aggregate some columns while keeping other columns unchanged - r

Related

Loop over specific columns data and add the result as a new column in R

How can I identify specific string row number

replacement missing values by groups in R

In R, how do I order within a single column so one category is ascending and one is descending?

Sampling distribution and sum of tables

Categories

Resources