Extract R2 values from multiple adonis results

Is there a way to extract results from the adonis function in the vegan package and save them, for example with write.table?
I mean some way other than printing the results to the console and copy-pasting the R2 value into Excel.
This would be especially useful when running adonis iteratively for multiple combinations and saving the result objects in one list, as suggested in this SO answer.

Here is an example on how you can extract the needed parameters from the model. I will use the linked example:
library(vegan)
data(dune)
data(dune.env)
lapply is used here instead of a loop:
results <- lapply(colnames(dune.env), function(x){
form <- as.formula(paste("dune", x, sep="~"))
z <- adonis(form, data = dune.env, permutations=99)
return(as.data.frame(z$aov.tab)) #convert anova table to a data frame
}
)
This produces a list of data frames, each of the form:
> results[[1]]
#output
Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
A1 1 0.7229518 0.7229518 3.638948 0.1681666 0.01
Residuals 18 3.5760701 0.1986706 NA 0.8318334 NA
Total 19 4.2990219 NA NA 1.0000000 NA
Now you can name the list elements after the corresponding variables:
names(results) <- colnames(dune.env)
Convert the list to a single data frame:
results <- do.call(rbind, results)
#output
Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
A1.A1 1 0.7229518 0.7229518 3.638948 0.1681666 0.01
A1.Residuals 18 3.5760701 0.1986706 NA 0.8318334 NA
A1.Total 19 4.2990219 NA NA 1.0000000 NA
Moisture.Moisture 3 1.7281651 0.5760550 3.585140 0.4019903 0.01
Moisture.Residuals 16 2.5708567 0.1606785 NA 0.5980097 NA
Moisture.Total 19 4.2990219 NA NA 1.0000000 NA
Management.Management 3 1.4685918 0.4895306 2.767243 0.3416107 0.01
Management.Residuals 16 2.8304301 0.1769019 NA 0.6583893 NA
Management.Total 19 4.2990219 NA NA 1.0000000 NA
Use.Use 2 0.5531507 0.2765754 1.255190 0.1286690 0.30
Use.Residuals 17 3.7458712 0.2203454 NA 0.8713310 NA
Use.Total 19 4.2990219 NA NA 1.0000000 NA
Manure.Manure 4 1.5238805 0.3809701 2.059193 0.3544714 0.03
Manure.Residuals 15 2.7751414 0.1850094 NA 0.6455286 NA
Manure.Total 19 4.2990219 NA NA 1.0000000 NA
Now you can save it to a CSV file or any other format you like:
write.csv(results, "res.csv")
If only R squared is needed, change the lapply call to:
results <- lapply(colnames(dune.env), function(x){
form <- as.formula(paste("dune", x, sep="~"))
z <- adonis(form, data = dune.env, permutations=99)
return(data.frame(name = rownames(z$aov.tab), R2 = z$aov.tab$R2))
}
)
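If one R2 value per variable is enough, the list can be reduced further. The following is a minimal sketch, not part of the original answer: it assumes single-term models, so the first row of each table is the term of interest, and it uses two small mock tables (values copied from the output above) in place of real adonis results so that it runs without vegan. The names r2 and out_file are made up for illustration.

```r
# Mock ANOVA tables standing in for as.data.frame(z$aov.tab);
# the numbers are copied from the output shown above.
results <- list(
  A1 = data.frame(Df = c(1, 18, 19),
                  R2 = c(0.1681666, 0.8318334, 1),
                  row.names = c("A1", "Residuals", "Total")),
  Use = data.frame(Df = c(2, 17, 19),
                   R2 = c(0.1286690, 0.8713310, 1),
                   row.names = c("Use", "Residuals", "Total"))
)

# Assumption: one term per model, so row 1 holds that term's R2.
r2 <- data.frame(
  term = names(results),
  R2 = vapply(results, function(tab) tab$R2[1], numeric(1)),
  row.names = NULL
)

# Write one row per variable to a temporary file.
out_file <- file.path(tempdir(), "r2.csv")
write.csv(r2, out_file, row.names = FALSE)
```

With models that contain several terms, the first-row assumption no longer holds and the residual and total rows would have to be filtered out by name instead.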

Related

Averaging control group conditions into new output column. Output column contains the associated control average values

Not new but still a beginner to R.
This is a snippet of my data (numbers randomised so the Average columns are not correct in this example).
> head(data)
1 2 3 Average HKAverage dC
Neg CNTRL NA NA NA NA NA NA
NEG CNTRL NA NA NA NA 0.80393767 NA
POS CNTRL 0.1836139 0.11392904 0.02925255 0.1089318 0.72559250 0.6165367
WT 1 0.5091585 0.15929057 0.51686195 0.3951037 0.26582395 0.5877941
WT 2 0.1924527 0.05267426 0.77929719 0.3414747 0.48798007 0.2600975
WT AA 1 0.2525962 0.97503047 0.62913683 0.6189212 0.03930599 0.9048247
> tail(data)
1 2 3 Average HKAverage dC
T AB 4 0.3425330 0.1698632 0.3100509 0.2741490 0.2312321 0.39589730
T C 1 0.8170886 0.8202081 0.1487331 0.5953433 0.1268834 0.99938496
T C 2 0.4374555 0.1926919 0.2847973 0.3049816 0.8647057 0.00970199
T C 3 0.3194017 0.2683773 0.8150882 0.4676224 0.8750478 0.73646663
T C 4 0.1091098 0.1547485 0.9696392 0.4111658 0.9897441 0.18335950
Pos CNTRL NA NA NA NA NA NA
I'm doing some calculations with these values and the outputs are generated as a new column.
I'm running this before running any calculations:
data <- as.data.frame(input.data)
data[data == "Undetermined"] <- NA
data[] <- sapply(data, as.numeric)
Ignoring the 4 CNTRL rows (I should probably just remove them then!) there are WT... and T... for the same conditions. These conditions are repeated 2 or 4 times (hence WT 1, WT 2, T 1, T 2, etc.).
I want to make a new column that contains the average of a WT condition. In the rows for the T conditions I want the same WT averages to show up there.
This would be an example of my output: (Av meaning average)
> head(newdata)
X X1 X2 X3 Average HKAverage dC ControlAv
1 Neg CNTRL NA NA NA NA NA NA NA
2 NEG CNTRL NA NA NA NA 0.80393767 NA NA
3 POS CNTRL 0.1836139 0.11392904 0.02925255 0.1089318 0.72559250 0.6165367 NA
4 WT 1 0.5091585 0.15929057 0.51686195 0.3951037 0.26582395 0.5877941 WT1:2Av
5 WT 2 0.1924527 0.05267426 0.77929719 0.3414747 0.48798007 0.2600975 WT1:2Av
6 WT AA 1 0.2525962 0.97503047 0.62913683 0.6189212 0.03930599 0.9048247 WTAA1:4Av
> tail(newdata)
X X1 X2 X3 Average HKAverage dC ControlAv
10 T V1 0.4568928 0.5566606 0.610042142 0.5411985 0.8372219 0.9200497 WT1:2Av
11 T V2 0.8633715 0.3191596 0.483468638 0.5553332 0.8860817 0.9486309 WT1:2Av
12 T AA 1 0.1587924 0.2986826 0.005692643 0.1543892 0.1064064 0.7750263 WTAA1:4Av
13 T AA 2 0.3665066 0.9289861 0.143083833 0.4795255 0.4543861 0.9992564 WTAA1:4Av
14 T AA 3 0.5580805 0.4041877 0.411612593 0.4579603 0.8457465 0.9380688 WTAA1:4Av
15 T AA 4 0.8149501 0.1642240 0.229479382 0.4028845 0.7638992 0.6026836 WTAA1:4Av
I'm currently trying to use the within() function but not finding success:
> data$wt.av <- within(data, mean(dC["WT 1" & "WT 2"]))
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 'mean': operations are possible only for numeric, logical or complex types
My dataframe is numeric but the rownames are obviously not.
aggregate() doesn't work in this case because the rownames do not match

I solved this with a long-winded method that is not very elegant, but it works.
I split the data frame into one subset per control group, for example:
dat.V <- data[4:5, ]
I then averaged each control group, used cbind to attach these averages to the matching non-control subsets, and used bind_rows to combine everything back into one data frame.
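A shorter route might be to derive a condition key from the sample labels and look up the WT mean for each row. The sketch below is only an illustration under stated assumptions: the labels are held in a column (here called sample) rather than in row names, they follow the pattern "WT <condition> <replicate>" or "T <condition> <replicate>", the CNTRL rows have already been removed, and the toy values are invented.

```r
# Toy data (invented values); labels mimic the question's naming scheme.
df <- data.frame(
  sample = c("WT 1", "WT 2", "WT AA 1", "WT AA 2",
             "T 1", "T 2", "T AA 1", "T AA 2"),
  dC = c(0.5, 0.3, 0.9, 0.7, 0.2, 0.4, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# Group is the first word: "WT" (control) or "T" (treatment).
df$group <- sub(" .*$", "", df$sample)

# Condition: drop the group prefix and the trailing replicate number.
cond <- trimws(sub("[0-9]+$", "", sub("^(WT|T) ", "", df$sample)))
cond[cond == ""] <- "base"   # samples without an extra condition label
df$condition <- cond

# Mean dC of the WT rows, one value per condition.
is_wt <- df$group == "WT"
wt_means <- tapply(df$dC[is_wt], df$condition[is_wt], mean, na.rm = TRUE)

# Every row (WT and T) receives the WT mean of its own condition.
df$ControlAv <- as.numeric(wt_means[df$condition])
df
```

The lookup by condition name is what lets the T rows pick up the average of the matching WT rows without splitting the data frame apart.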

Create column from data on dynamic number of columns depending on availability in R

Given an uncertain number of columns containing source values for the same variable, I would like to create a column that holds the final value, selected according to source priority and availability.
Reproducible data:
set.seed(123)
actuals = runif(10, 500, 1000)
get_rand_vector <- function(){return (runif(10, 0.95, 1.05))}
get_na_rand_ixs <- function(){return (round(runif(5,0,10),0))}
df = data.frame("source_1" = actuals*get_rand_vector(),
"source_2" = actuals*get_rand_vector(),
"source_n" = actuals*get_rand_vector())
df[["source_1"]][get_na_rand_ixs()] <- NA
df[["source_2"]][get_na_rand_ixs()] <- NA
df[["source_n"]][get_na_rand_ixs()] <- NA
My manual solution is as follows:
df$available <- ifelse(
!is.na(df$source_1),
df$source_1,
ifelse(
!is.na(df$source_2),
df$source_2,
df$source_n
)
)
Given the desired result of:
source_1 source_2 source_n available
1 NA NA NA NA
2 NA NA 930.1242 930.1242
3 716.9981 NA 717.9234 716.9981
4 NA 988.0446 NA 988.0446
5 931.7081 NA 924.1101 931.7081
6 543.6802 533.6798 NA 543.6802
7 744.6525 767.4196 783.8004 744.6525
8 902.8788 955.1173 NA 902.8788
9 762.3690 NA 761.6135 762.3690
10 761.4092 702.6064 708.7615 761.4092
How could I automatically iterate over the available sources to pick the value to use? The number of sources could be anywhere from 1 to 7, and priority follows the natural order (1 > 2 > ...).

Once you have all of the candidate vectors in order and in an appropriate data structure (e.g., data.frame or matrix), you can use apply to apply a function over the rows. In this case, we just look for the first non-NA value. Thus, after the first block of code above, you only need the following line:
df$available <- apply(df, 1, FUN = function(x) x[which(!is.na(x))[1]])
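The same "first non-NA value" idea can also be sketched column-wise in base R with Reduce, which avoids converting each row to a vector. This is an illustrative variant rather than the answer's method; the helper name first_available and the small data frame src are made up here, and the sources are assumed to be stored in priority order in columns whose names start with "source_".

```r
# Small example with three sources in priority order (invented values).
src <- data.frame(
  source_1 = c(NA, NA, 3, NA),
  source_2 = c(NA, 20, 30, NA),
  source_n = c(NA, 200, NA, 400)
)

# Fold over the columns from left to right: keep the value found so far,
# and fall back to the next column only where it is still NA.
first_available <- function(d) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), d)
}

# Select only the source columns so the result column is never an input.
src$available <- first_available(src[grep("^source_", names(src))])
src
```

Selecting the source columns by name keeps the call safe to rerun after the available column has been added.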

coalesce() from dplyr is designed for this:
library(dplyr)
df %>%
mutate(available = coalesce(!!!.))
source_1 source_2 source_n available
1 NA NA NA NA
2 NA NA 930.1242 930.1242
3 716.9981 NA 717.9234 716.9981
4 NA 988.0446 NA 988.0446
5 931.7081 NA 924.1101 931.7081
6 543.6802 533.6798 NA 543.6802
7 744.6525 767.4196 783.8004 744.6525
8 902.8788 955.1173 NA 902.8788
9 762.3690 NA 761.6135 762.3690
10 761.4092 702.6064 708.7615 761.4092

Fill data frame by column with for loop

I created an empty data frame with 11 columns and 15 rows and subsequently named the columns.
L_df <- data.frame(matrix(ncol = 11, nrow = 15))
names(L_df) <- paste0("L_por", 0:10)
w <- c(0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6, 2.8, 3)
wu <- 0
L <- 333.7
pm <- c(2600, 2574, 2548, 2522, 2496, 2470, 2444, 2418, 2392, 2366, 2340)
The data frame looks like this:
head(L_df)
L_por0 L_por1 L_por2 L_por3 L_por4 L_por5 L_por6 L_por7 L_por8 L_por9 L_por10
1 NA NA NA NA NA NA NA NA NA NA NA
2 NA NA NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA
4 NA NA NA NA NA NA NA NA NA NA NA
5 NA NA NA NA NA NA NA NA NA NA NA
6 NA NA NA NA NA NA NA NA NA NA NA
Now, I would like to fill the data frame by column, based on a formula. I tried to express this with a nested for loop:
for (i in 1:ncol(L_df)) {
pm_tmp <- pm[i]
col_tmp <- colnames(L_df)[i]
for (j in 1:nrow(L_df)) {
w_tmp <- w[j]
L_por_tmp <- pm_tmp*L*((w_tmp-wu)/100)
col_tmp[j] <- L_por_tmp
}
}
For each column, I iterate over a predefined vector pm of length 11. For each row, I iterate over a predefined vector w of length 15 (the same vector is reused for every column).
Example: first, select pm[1] for the first column. Second, select w[j] for each row j in the first column. Compute the formula, store the value in L_por_tmp, and use it to fill the first column from row 1 to row 15. The whole procedure should then start over for the second column (with pm[2]), again using w[j] for each row, and so on. wu and L are fixed in the formula.
R executes the code without an error, and when I check the tmp values they are correct. However, L_df does not get filled and the data frame remains empty. I would like to solve this with a loop, but I am happy to hear other solutions; I get the impression there might be a smoother way of doing this. Cheers!

Solution
L_df <- data.frame(sapply(pm, function(x) x * L * ((w - wu) / 100)))
names(L_df) <- c("L_por0", "L_por1", "L_por2", "L_por3", "L_por4", "L_por5",
"L_por6", "L_por7", "L_por8", "L_por9", "L_por10")
L_df
L_por0 L_por1 L_por2 L_por3 L_por4 L_por5 L_por6 L_por7
1 1735.24 1717.888 1700.535 1683.183 1665.830 1648.478 1631.126 1613.773
2 3470.48 3435.775 3401.070 3366.366 3331.661 3296.956 3262.251 3227.546
3 5205.72 5153.663 5101.606 5049.548 4997.491 4945.434 4893.377 4841.320
4 6940.96 6871.550 6802.141 6732.731 6663.322 6593.912 6524.502 6455.093
5 8676.20 8589.438 8502.676 8415.914 8329.152 8242.390 8155.628 8068.866
6 10411.44 10307.326 10203.211 10099.097 9994.982 9890.868 9786.754 9682.639
7 12146.68 12025.213 11903.746 11782.280 11660.813 11539.346 11417.879 11296.412
8 13881.92 13743.101 13604.282 13465.462 13326.643 13187.824 13049.005 12910.186
9 15617.16 15460.988 15304.817 15148.645 14992.474 14836.302 14680.130 14523.959
10 17352.40 17178.876 17005.352 16831.828 16658.304 16484.780 16311.256 16137.732
11 19087.64 18896.764 18705.887 18515.011 18324.134 18133.258 17942.382 17751.505
12 20822.88 20614.651 20406.422 20198.194 19989.965 19781.736 19573.507 19365.278
13 22558.12 22332.539 22106.958 21881.376 21655.795 21430.214 21204.633 20979.052
14 24293.36 24050.426 23807.493 23564.559 23321.626 23078.692 22835.758 22592.825
15 26028.60 25768.314 25508.028 25247.742 24987.456 24727.170 24466.884 24206.598
L_por8 L_por9 L_por10
1 1596.421 1579.068 1561.716
2 3192.842 3158.137 3123.432
3 4789.262 4737.205 4685.148
4 6385.683 6316.274 6246.864
5 7982.104 7895.342 7808.580
6 9578.525 9474.410 9370.296
7 11174.946 11053.479 10932.012
8 12771.366 12632.547 12493.728
9 14367.787 14211.616 14055.444
10 15964.208 15790.684 15617.160
11 17560.629 17369.752 17178.876
12 19157.050 18948.821 18740.592
13 20753.470 20527.889 20302.308
14 22349.891 22106.958 21864.024
15 23946.312 23686.026 23425.740
Explanation
The sapply() function iterates over a vector in a way that is more idiomatic for R. We iterate over pm and apply your formula once per element; because R is vectorised over w, each call returns a vector of length 15 (so 11 vectors of length 15 in total). sapply() binds these into a 15 x 11 matrix, data.frame() converts it into the data frame you want, and we then add the column names.
NOTE: Applying a function to every element of a vector with an apply() family function has somewhat different implications than iterating with a for loop. In your case, I think sapply() is easier and more understandable. For more on when a loop is needed and when something like apply is better, see for example this discussion from Hadley Wickham's Advanced R book.
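As a possible alternative, the same table can be sketched as an outer product, since every cell is pm[i] * L * (w[j] - wu) / 100. This is only an illustration using the vectors from the question; the intermediate name L_mat is made up here.

```r
# Inputs copied from the question.
w  <- c(0.2, 0.4, 0.6, 0.8, 1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6, 2.8, 3)
wu <- 0
L  <- 333.7
pm <- c(2600, 2574, 2548, 2522, 2496, 2470, 2444, 2418, 2392, 2366, 2340)

# outer() builds the 15 x 11 matrix of (w[j] - wu) * pm[i];
# the constant factor L / 100 is applied to every cell afterwards.
L_mat <- outer(w - wu, pm) * L / 100

# Convert to a data frame and name the columns as in the question.
L_df <- as.data.frame(L_mat)
names(L_df) <- paste0("L_por", 0:10)
dim(L_df)
```

Rows follow w and columns follow pm, which matches the layout of the sapply() result above.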

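You made one small mistake and were almost there: col_tmp holds only the column name, so col_tmp[j] <- L_por_tmp overwrites that character vector and never touches L_df. I edited your loop to assign into the data frame instead: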
for (i in 1:ncol(L_df)) {
pm_tmp <- pm[i]
col_tmp <- colnames(L_df)[i]
for (j in 1:nrow(L_df)) {
w_tmp <- w[j]
L_por_tmp <- pm_tmp*L*((w_tmp-wu)/100)
L_df[j, col_tmp] <- L_por_tmp  ## assign into the data frame by row index and column name
}
}
Output (first few rows only):
L_df
L_por0 L_por1 L_por2 L_por3 L_por4 L_por5 L_por6 L_por7 L_por8 L_por9 L_por10
1 1735.24 1717.888 1700.535 1683.183 1665.830 1648.478 1631.126 1613.773 1596.421 1579.068 1561.716
2 3470.48 3435.775 3401.070 3366.366 3331.661 3296.956 3262.251 3227.546 3192.842 3158.137 3123.432
3 5205.72 5153.663 5101.606 5049.548 4997.491 4945.434 4893.377 4841.320 4789.262 4737.205 4685.148

combine two zoo time series

I have two successive zoo time series (the dates of one begin after the other finishes). They have the following form (but much longer, and not only NA values):
a:
1979-01-01 1979-01-02 1979-01-03 1979-01-04 1979-01-05 1979-01-06 1979-01-07 1979-01-08 1979-01-09
NA NA NA NA NA NA NA NA NA
b:
1988-08-15 1988-08-16 1988-08-17 1988-08-18 1988-08-19 1988-08-20 1988-08-21 1988-08-22 1988-08-23 1988-08-24 1988-08-25
NA NA NA NA NA NA NA NA NA NA NA
All I want to do is combine them into one time series as a zoo object. It seems like a basic task, but I am doing something wrong. I use the function merge:
combined <- merge(a, b)
but the result has the form:
a b
1980-03-10 NA NA
1980-03-11 NA NA
1980-03-12 NA NA
1980-03-13 NA NA
1980-03-14 NA NA
1980-03-15 NA NA
1980-03-16 NA NA
.
.
which is not a single time series, and the lengths don't fit:
> length(a)
[1] 10957
> length(b)
[1] 2557
> length(combined)
[1] 27028
How can I combine them into one time series with the same form as the originals?

Assuming the series shown reproducibly in the Note at the end, the result of merging the two series has 20 times and 2 columns (one for each series). The individual series have lengths 9 and 11 elements and the merged series is a zoo object with 9 + 11 = 20 rows (since there are no intersecting times) and 2 columns (one for each input) and length 40 (= 20 * 2). Note that the length of a multivariate series is the number of elements in it, not the number of time points.
length(z1)
## [1] 9
length(z2)
## [1] 11
m <- merge(z1, z2)
class(m)
## [1] "zoo"
dim(m)
## [1] 20 2
nrow(m)
## [1] 20
length(index(m))
## [1] 20
length(m)
## [1] 40
If what you wanted is to string them out one after another then use c:
length(c(z1, z2))
## [1] 20
The above are consistent with how merge, c and length work in base R.
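The base R behaviour referred to here can be illustrated without zoo. The sketch below uses a plain matrix and plain vectors as stand-ins (the names side_by_side and end_to_end are invented for the illustration): length() of a two-column object counts all elements, while c() strings the values out end to end.

```r
# Two columns side by side, as a merge-like result would look:
# 4 rows and 2 columns, with NA where a series has no value.
side_by_side <- cbind(a = c(1, 2, NA, NA), b = c(NA, NA, 3, 4))
nrow(side_by_side)     # number of rows (time points in the zoo case)
length(side_by_side)   # number of elements = rows * columns

# Concatenation keeps a single sequence of values.
end_to_end <- c(c(1, 2), c(3, 4))
length(end_to_end)
```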
Note:
library(zoo)
z1 <- zoo(rep(NA, 9), as.Date(c("1979-01-01", "1979-01-02", "1979-01-03",
"1979-01-04", "1979-01-05", "1979-01-06", "1979-01-07", "1979-01-08",
"1979-01-09")))
z2 <- zoo(rep(NA, 11), as.Date(c("1988-08-15", "1988-08-16", "1988-08-17",
"1988-08-18", "1988-08-19", "1988-08-20", "1988-08-21", "1988-08-22",
"1988-08-23", "1988-08-24", "1988-08-25")))

Column looping through a user function and storing output in a newly created column (R)

I have some data that consists of an oscillatory-like pattern and would like to take some measurements of the peaks. I have several chunks of code and most of them work to do exactly what I want. The main issue I'm having is that I have no idea how to integrate them to work functionally together.
Essentially, I would like to use the freq function I've written on a dataframe so that it will loop through each column (a, b, and c) and give me the results of the function. Then I would like to store the output for each column in a new dataframe with the column names matching the source names.
I have read a lot of answers about looping through columns and creating new columns in a dataframe, which is how I've gotten to this point. Some of the individual pieces need a little tweaking but what I can't find anywhere is a good explanation of how I can put it all together. I have tried to no avail; I just can't seem to get the order right.
(For reproducible data)
library(zoo)
count = 1:20
a = c(-0.802776, -0.748272, 0.187434, 1.23577, 1.00677, 0.874122, 0.232802, -0.279368, -1.57815, -1.76652, -0.958916, -0.316385, 0.831575, 1.19312, 1.45508, 0.848923, 0.257728, -0.318474, -1.14129, -1.42576)
b = c(-2.23512, -1.36572, -0.0357366, 0.925563, 1.53282, 0.171045, -0.438714, -1.38769, -0.696898, 1.37184, 2.01038, 2.6302, 2.53296, 1.8788, 0.100366, -1.34726, -1.4309, -1.37271, -0.750669, 0.100656)
c = c(0.749062, 0.0690315, -0.750494, -1.04069, -0.654432, 0.0186072, 0.710011, 0.920915, 1.13075, 0.227108, -0.195086, -0.68333, -0.607532, -0.485424, 0.495913, 0.655385, 0.468796, 0.274053, -0.906834 , 0.321526)
test = data.frame(count, a, b, c)
d = 20:40
This is the chunk of code I've written to go through any data I specify and identify local peaks, then calculate a series of things from the identified peaks. It works really well and there's no issue with the functionality of this (however, suggestions to make it better are welcome), just with putting it together with the rest.
I would like to loop through columns of a dataframe (using a for loop in the next section to accomplish that) and get the result of the freq function for each column
freq = function(x, y, data, w=1, span = 0.05, ...) {
require(zoo)
n = length(y)
y.smooth = loess(y ~ x, span = span)$fitted
y.max = rollapply(zoo(y.smooth), 2*w+1, max, align = "center")
delta = y.max - y.smooth[-c(1:w, n+1-1:w)]
i.max = which(delta <= 0) + w #identifies peaks
list(x = x[i.max], i = i.max, y.hat = y.smooth)
dist = diff(i.max) #calculates distance between peaks
instfreq = (25/dist) #calculates the rate of each peak occurence
print(instfreq) #output I ultimately want
}
#example
freq(count, a, span = 0.5)
This is how I'm looping through columns in a specified dataframe. Also, I'm not sure what I've done but this ends up printing my output twice...(which I'd like to avoid).
for(i in test){
output <- freq(test$count, y = i, span = 0.5)
print(output)
}
This is probably the part giving me the biggest headache. This should add new columns to an existing dataframe. It works so far but I have yet to figure out how to integrate it into the stuff above. Also, I'd really like for it to store the output in a new dataframe, rather than the source dataframe.
For reference, here df = data, to.add = data to add to df, new.name = name of new col
Another thing I'd like is for the new.name to come from the source (to.add). For example if I tried to add d (from above) to the end of test, I'd like for the column name (new.name) to read d without having to specify it. This will be helpful when I'm looping through multiple columns and want to keep the source name from which the output was calculated.
add.col = function(df, to.add, new.name){
if (nrow(df) < length(to.add)){
df = # pads rows if needed
rbind(df, matrix(NA, length(to.add)-nrow(df), ncol(df),
dimnames = list(NULL, names(df))))
}
length(to.add) = nrow(df) # pads with NA's
df[, new.name] = to.add; # names new col whatever was placed in new.name arg
return(head(df)) #shortened output so I can verify it worked
#when I was testing it for myself, this would
#need to be changed so that it adds the column
#to a dataframe and stores the results, which
#I believe would require I use print() and a store
#like Results = print(df)
}
#example
add.col(test, d, "d") #would like the code to grab the name d just from the to.add
#argument, without having to specify "d" as the new.name
Any help, suggestions, or refinements (to make it less clunky, more efficient, etc) would be greatly appreciated.
I can get by with the for loop (if the duplications get fixed) as long as I can figure out how to store all the output together in one place. My actual data is in a similar format to the reproducible set above, it just has far more rows and columns (and will already be in a .csv dataframe rather than creating it from individual vectors).
I've been beating my head over this for a few days now and have gotten so far but just can't get it all the way.
Also, feel free to edit the title to help it get to the right people!

Ok first of all, the reason your function is printing the output twice is because essentially what happens is:
instfreq gets calculated and returned
instfreq gets printed out
instfreq is getting assigned to output
output gets printed out again
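The double printing can be reproduced in isolation. In this small sketch (f_print and f_return are made-up names), print() returns its argument invisibly, so a function that ends in print() both prints the value and returns it; printing the stored result then shows it a second time.

```r
# Ends in print(): the value is printed AND returned (invisibly).
f_print <- function(x) print(x * 2)

# Ends in a plain value: nothing is printed inside the function.
f_return <- function(x) x * 2

out1 <- f_print(2)    # prints once here
print(out1)           # prints a second time

out2 <- f_return(2)   # prints nothing
print(out2)           # prints once
```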
Furthermore, I suppose you don't want your function to run on the count column (which returns numeric(0)), so it is best to run it only on the other columns.
Lastly, simple for loops like this can easily be replaced by the apply function in R, which turns the first part of your question into:
freq = function(x, y, data, w=1, span = 0.05, ...) {
require(zoo)
n = length(y)
y.smooth = loess(y ~ x, span = span)$fitted
y.max = rollapply(zoo(y.smooth), 2*w+1, max, align = "center")
delta = y.max - y.smooth[-c(1:w, n+1-1:w)]
i.max = which(delta <= 0) + w #identifies peaks
list(x = x[i.max], i = i.max, y.hat = y.smooth)
dist = diff(i.max) #calculates distance between peaks
instfreq = (25/dist) #calculates the rate of each peak occurence
return(instfreq) #output I ultimately want
}
output <- apply(test[,2:length(test[1,])],2, function(v) freq(test$count, y=v, span=0.5))
output
# a b c
#2.500000 3.571429 2.777778
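When the per-column results can differ in length, a list is a safer container than the matrix that apply may return. The following sketch shows one way to collect such results under the source column names and pad them into a new data frame. It deliberately uses a simplified stand-in, peak_gaps, rather than the freq function above, and toy data invented for the illustration.

```r
# Toy data: a counter column plus two signal columns.
toy <- data.frame(
  count = 1:8,
  a = c(0, 1, 0, 1, 0, 1, 0, 0),
  b = c(0, 2, 1, 0, 0, 3, 0, 0)
)

# Hypothetical stand-in for freq(): distances between strict local maxima.
peak_gaps <- function(y) {
  n <- length(y)
  inner <- y[2:(n - 1)]
  is_peak <- c(FALSE, inner > y[1:(n - 2)] & inner > y[3:n], FALSE)
  diff(which(is_peak))
}

# lapply over every column except count; list names follow the columns.
gaps <- lapply(toy[-1], peak_gaps)

# Pad each result with NA to a common length, then build a new data frame.
max_len <- max(lengths(gaps))
padded <- lapply(gaps, function(g) { length(g) <- max_len; g })
gap_df <- as.data.frame(padded)
gap_df
```

Because the list is named after the source columns, the new data frame keeps those names without any extra bookkeeping.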
The second part of your question asks for the name of a variable to be used as the name of the new column. For this we can use deparse(substitute(variable)), so your function becomes:
add.col = function(df, to.add){
new.name <- deparse(substitute(to.add))
if (nrow(df) < length(to.add)){
df = # pads rows if needed
rbind(df, matrix(NA, length(to.add)-nrow(df), ncol(df),
dimnames = list(NULL, names(df))))
}
length(to.add) = nrow(df) # pads with NA's
df[, new.name] = to.add # names the new col after the variable passed as to.add
return(df)
}
#example
dnametest = 20:40
add.col(test, dnametest)
# count a b c dnametest
#1 1 -0.802776 -2.2351200 0.7490620 20
#2 2 -0.748272 -1.3657200 0.0690315 21
#etc.
This function does not modify your original data frame, so you simply assign its result to a new one:
newframe <- add.col(test, dnametest)
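The name-capturing step can be tried on its own. In this minimal sketch (var_name and dots_names are illustrative helper names, not part of the answer), substitute() returns the unevaluated expression that was passed as the argument and deparse() turns it into a string; the second helper does the same for every argument passed through `...`.

```r
# Name of a single argument as the caller wrote it.
var_name <- function(x) deparse(substitute(x))

# Names of all arguments passed through the dots.
dots_names <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # drop the leading `list`
  vapply(exprs, deparse, character(1))
}

my_vec <- 1:3
other_vec <- 4:6
var_name(my_vec)
dots_names(my_vec, other_vec)
```

This only yields a useful name when the argument is a plain variable; passing an expression such as 1:3 directly would give the text of that expression instead.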
EDIT: adding a way to loop over any number of vectors:
The first problem you'll run into when looping is that you're working with vectors of different lengths. This makes it hard to work with data frames, so you'll have to work with lists. In this case it is easier to write a new function that takes any number of vectors and loops over them for you automatically. Because it's easier to catch and add the names in this new function, I've changed your add.col function back so that it takes new.name again:
add.col = function(df, to.add, new.name){
if (nrow(df) < length(to.add)){
df = # pads rows if needed
rbind(df, matrix(NA, length(to.add)-nrow(df), ncol(df),
dimnames = list(NULL, names(df))))
}
length(to.add) = nrow(df) # pads with NA's
df[, new.name] = to.add
return(df)
}
Then I can write a second function, add.multicol, like this:
#this function takes in an unspecfied number of arguments
add.multicol <- function(df, ...){
#convert this number of arguments to a list
to.add.cols <- list(...)
#add the variable names to this list
names(to.add.cols) <- as.list(substitute(list(...)))[-1]
#find number of columns to add
number.cols.to.add <- length(to.add.cols)
#loop add.col
newframe <- df
for(i in 1:number.cols.to.add){
to.add.col <- array(unlist(to.add.cols[i]))
to.add.col.name <- names(to.add.cols[i])
newframe <- add.col(newframe,to.add.col,to.add.col.name)
}
return(newframe)
}
This will allow you to do whichever you want. Example:
dnametest <- 20:40
test1 <- 1:15
test2 <- 25:56
argumentsake <- seq(0,1,length=21)
#run function
newframe <- add.multicol(test,dnametest,test1,test2,argumentsake)
newframe
# count a b c dnametest test1 test2 argumentsake
#1 1 -0.802776 -2.2351200 0.7490620 20 1 25 0.00
#2 2 -0.748272 -1.3657200 0.0690315 21 2 26 0.05
#3 3 0.187434 -0.0357366 -0.7504940 22 3 27 0.10
#4 4 1.235770 0.9255630 -1.0406900 23 4 28 0.15
#5 5 1.006770 1.5328200 -0.6544320 24 5 29 0.20
#6 6 0.874122 0.1710450 0.0186072 25 6 30 0.25
#7 7 0.232802 -0.4387140 0.7100110 26 7 31 0.30
#8 8 -0.279368 -1.3876900 0.9209150 27 8 32 0.35
#9 9 -1.578150 -0.6968980 1.1307500 28 9 33 0.40
#10 10 -1.766520 1.3718400 0.2271080 29 10 34 0.45
#11 11 -0.958916 2.0103800 -0.1950860 30 11 35 0.50
#12 12 -0.316385 2.6302000 -0.6833300 31 12 36 0.55
#13 13 0.831575 2.5329600 -0.6075320 32 13 37 0.60
#14 14 1.193120 1.8788000 -0.4854240 33 14 38 0.65
#15 15 1.455080 0.1003660 0.4959130 34 15 39 0.70
#16 16 0.848923 -1.3472600 0.6553850 35 NA 40 0.75
#17 17 0.257728 -1.4309000 0.4687960 36 NA 41 0.80
#18 18 -0.318474 -1.3727100 0.2740530 37 NA 42 0.85
#19 19 -1.141290 -0.7506690 -0.9068340 38 NA 43 0.90
#20 20 -1.425760 0.1006560 0.3215260 39 NA 44 0.95
#21 NA NA NA NA 40 NA 45 1.00
#22 NA NA NA NA NA NA 46 NA
#23 NA NA NA NA NA NA 47 NA
#24 NA NA NA NA NA NA 48 NA
#25 NA NA NA NA NA NA 49 NA
#26 NA NA NA NA NA NA 50 NA
#27 NA NA NA NA NA NA 51 NA
#28 NA NA NA NA NA NA 52 NA
#29 NA NA NA NA NA NA 53 NA
#30 NA NA NA NA NA NA 54 NA
#31 NA NA NA NA NA NA 55 NA
#32 NA NA NA NA NA NA 56 NA
EDIT 2: extending the loop to take in data frames of any form as well
This gets quite messy, and you will also need to rename your output elements so that they don't match any column names already present.
add.multicol <- function(df, ...){
#convert this number of arguments to a list
to.add.cols <- list(...)
#find number of columns to add
number.args <- length(to.add.cols)
#number of elements per list entry
hierarch.cols.to.add <- numeric(number.args)
for(i in 1:number.args){
#if this list element has no names, treat it as a plain vector, else treat it as a data frame
if(is.null(names(to.add.cols[[i]]))){
#get variable names from input of normal arrays
names(to.add.cols[[i]]) <- as.list(substitute(list(...)))[i+1]
hierarch.cols.to.add[i] <- 1
} else {
#find the number of columns in the data frame
number <- length(names(to.add.cols[[i]]))
hierarch.cols.to.add[i] <- number
}
}
#loop add.col
newframe <- df
for(i in 1:number.args){
#if array
if(hierarch.cols.to.add[i]==1){
to.add.col <- array(unlist(to.add.cols[[i]]))
to.add.col.name <- names(to.add.cols[[i]][1])
newframe <- add.col(newframe,to.add.col,to.add.col.name)
} else { #if data.frame
#foreach column in the data frame
for(j in 1:hierarch.cols.to.add[i]){
#if only one element per column
if(is.null(dim(to.add.cols[[i]]))){
to.add.col <- to.add.cols[[i]][j]
} else { #if multiple elements per column
to.add.col <- to.add.cols[[i]][,j]
}
to.add.col.name <- names(to.add.cols[[i]])[j]
newframe <- add.col(newframe,to.add.col,to.add.col.name)
}
}
}
return(newframe)
}
testdf <- data.frame(cbind(test1,test2))
dnametest <- 20:40
output <- apply(test[,2:length(test[1,])],2, function(v) freq(test$count, y=v, span=0.5))
#edit output names because we can't have a dataframe with the same name for multiple columns
names(output) <- c("output_a","output_b","output_c")
newframe <- test
#function now takes dataframes of single elements, normal data frames and single arrays
newframe <- add.multicol(newframe,output,dnametest,testdf)
# count a b c output_a output_b output_c dnametest test1 test2
#1 1 -0.802776 -2.2351200 0.7490620 2.5 3.571429 2.777778 20 0 25
#2 2 -0.748272 -1.3657200 0.0690315 NA NA NA 21 1 26
#3 3 0.187434 -0.0357366 -0.7504940 NA NA NA 22 2 27
#4 4 1.235770 0.9255630 -1.0406900 NA NA NA 23 3 28
#...
This is probably not the most efficient way, but it gets the job done.
