How can I compute the mean R, R1, R2, R3 values from the rows sharing the same lon, lat fields? I'm sure this question exists multiple times, but I could not easily find it.
lon lat length depth R R1 R2 R3
1 147.5348 -35.32395 13709 1 0.67 0.80 0.84 0.83
2 147.5348 -35.32395 13709 2 0.47 0.48 0.56 0.54
3 147.5348 -35.32395 13709 3 0.43 0.29 0.36 0.34
4 147.4290 -35.27202 12652 1 0.46 0.61 0.60 0.58
5 147.4290 -35.27202 12652 2 0.73 0.96 0.95 0.95
6 147.4290 -35.27202 12652 3 0.77 0.92 0.92 0.91
I'd recommend using the split-apply-combine strategy, where you're splitting by BOTH lon and lat, applying mean to each group, then recombining into a single data frame.
I'd recommend using dplyr:
library(dplyr)
mydata %>%
  group_by(lon, lat) %>%
  summarize(
    mean_r  = mean(R),
    mean_r1 = mean(R1),
    mean_r2 = mean(R2),
    mean_r3 = mean(R3)
  )
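The same split-apply-combine can also be done in base R with aggregate(); a minimal sketch on a cut-down version of the question's data (only two value columns shown for brevity):

```r
# Cut-down version of the question's data
mydata <- data.frame(
  lon = c(147.5348, 147.5348, 147.4290, 147.4290),
  lat = c(-35.32395, -35.32395, -35.27202, -35.27202),
  R   = c(0.67, 0.47, 0.61, 0.73),
  R1  = c(0.80, 0.48, 0.60, 0.96)
)
# One row per (lon, lat) pair, each value column averaged within its group
res <- aggregate(cbind(R, R1) ~ lon + lat, data = mydata, FUN = mean)
res
```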
My R df looks like this:
ID CO.RT CO.ER SC.RT SC.ER
1 0.19 0.06 1.24 0.09
2 0.61 0.01 0.63 0.03
3 0.43 0.02 1.31 0.09
I've been trying to find a way to use tidyr::pivot_longer to achieve the following:
ID Type RT ER
1 CO 0.19 0.06
1 SC 1.24 0.09
2 CO 0.61 0.01
2 SC 0.63 0.03
3 CO 0.43 0.02
3 SC 1.31 0.09
My issue: I can only pivot RT and ER into a single "score" column; I can't find the right regex pattern for names_pattern to pivot my df in the way shown above.
Here is what I tried:
df %>% pivot_longer(cols = c(CO.RT:SC.ER),
                    names_pattern = "(.+).(.+)",
                    names_to = c("Type", ".value"))
There are many pivot_longer/names_to questions out there, however, I couldn't find the right one for my problem. Can anyone help?
My dataset:
df <- tibble(
ID = c(1,2,3),
CO.RT = c(0.19,0.61,0.43),
CO.ER = c(0.06,0.01,0.02),
SC.RT = c(1.24,0.63,1.31),
SC.ER = c(0.09,0.03,0.09)
)
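For what it's worth, the attempted pattern is nearly right: an unescaped `.` in a regex matches any character, so the greedy first group swallows the literal dot. Escaping it should give the desired shape; a sketch against the dataset above:

```r
library(tidyr)

df <- tibble::tibble(
  ID = c(1, 2, 3),
  CO.RT = c(0.19, 0.61, 0.43),
  CO.ER = c(0.06, 0.01, 0.02),
  SC.RT = c(1.24, 0.63, 1.31),
  SC.ER = c(0.09, 0.03, 0.09)
)

# "(.+)\\.(.+)": first capture becomes Type, the escaped dot is the separator,
# and the second capture (RT/ER) becomes a column via the ".value" sentinel
out <- df %>%
  pivot_longer(cols = CO.RT:SC.ER,
               names_pattern = "(.+)\\.(.+)",
               names_to = c("Type", ".value"))
out
```

`names_sep = "\\."` would work equally well here, since the column names split on a single literal dot.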
New to R, so sorry if this is a bit broad, but I'm not really sure where to start with an approach to this problem.
I have two dataframes. df1 contains demographic data from certain Census tracts:
AfricanAmerican AsianAmerican Hispanic White
Tract1 0.25 0.25 0.25 0.25
Tract2 0.50 0.10 0.20 0.10
Tract3 0.05 0.10 0.35 0.50
And df2 contains observation polygons, with the percentage of each census tract that makes up each polygon's area.
Poly1 Poly2 Poly3
Tract1 0.33 0.25 0.00
Tract2 0.33 0.25 0.10
Tract3 0.34 0.50 0.90
What I want to do is get the weighted averages of the demographic data in each observation polygon
AfricanAmerican AsianAmerican Hispanic White
Poly1 0.26 0.15 0.27 0.29
Poly2 0.21 0.14 0.29 0.34
Poly3 0.10 0.10 0.34 0.46
So far I'm thinking I could do something like
sum(df1$AfricanAmerican * df2$Poly1)
Then I could use a for loop to iterate over all demographic variables for one polygon, and nest that in another for loop to iterate over all polygons. But given that I have hundreds of Census tracts and polygons in my working dataset, is there a better approach?
Use colSums of the products in mapply.
t(mapply(function(...) colSums(`*`(...)), list(df1), df2))
# AfricanAmerican AsianAmerican Hispanic White
# [1,] 0.2645 0.1495 0.2675 0.2855
# [2,] 0.2125 0.1375 0.2875 0.3375
# [3,] 0.0950 0.1000 0.3350 0.4600
If you want to round to two digits, just wrap round(..., 2) around it.
Data:
df1 <- read.table(header=T, text='
AfricanAmerican AsianAmerican Hispanic White
Tract1 0.25 0.25 0.25 0.25
Tract2 0.50 0.10 0.20 0.10
Tract3 0.05 0.10 0.35 0.50
')
df2 <- read.table(header=T, text='
Poly1 Poly2 Poly3
Tract1 0.33 0.25 0.00
Tract2 0.33 0.25 0.10
Tract3 0.34 0.50 0.90
')
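Since each polygon value is just a weighted sum over tracts, the whole operation is also a single matrix product; a base R sketch on the same data (crossprod(x, y) is equivalent to t(x) %*% y):

```r
df1 <- read.table(header = TRUE, text = '
AfricanAmerican AsianAmerican Hispanic White
Tract1 0.25 0.25 0.25 0.25
Tract2 0.50 0.10 0.20 0.10
Tract3 0.05 0.10 0.35 0.50
')
df2 <- read.table(header = TRUE, text = '
Poly1 Poly2 Poly3
Tract1 0.33 0.25 0.00
Tract2 0.33 0.25 0.10
Tract3 0.34 0.50 0.90
')
# Rows of the result are polygons, columns are demographic groups
res <- crossprod(as.matrix(df2), as.matrix(df1))
res
```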
Libraries
library(tidyverse)
Sample Data
df1 <-
  tibble(
    Tract = paste0("Tract", 1:3),
    AfricanAmerican = c(.25, .5, .05),
    AsianAmerican = c(.25, .1, .1),
    Hispanic = c(.25, .2, .35)
  )
df2 <-
  tibble(
    Tract = paste0("Tract", 1:3),
    Poly1 = c(.33, .33, .34),
    Poly2 = c(.25, .25, .5),
    Poly3 = c(0, .1, .9)
  ) %>%
  # Pivot df2, making a single column for all Poly values
  pivot_longer(cols = -Tract, names_to = "Poly")
Code
df1 %>%
  # Join df1 and df2 by Tract
  left_join(df2) %>%
  # Group by Poly
  group_by(Poly) %>%
  # Weighted sum across the variables AfricanAmerican to Hispanic
  summarise(across(AfricanAmerican:Hispanic, function(x) sum(x * value)))
Output
Joining, by = "Tract"
# A tibble: 3 x 4
Poly AfricanAmerican AsianAmerican Hispanic
<chr> <dbl> <dbl> <dbl>
1 Poly1 0.264 0.150 0.268
2 Poly2 0.212 0.138 0.288
3 Poly3 0.095 0.1 0.335
I'm new to R and I have tried a lot to solve this problem; if anyone could help me I'd be very grateful! This is my problem:
I have two data frames (df1 and df2), and what I need is to multiply each value of df1 by the matching row of df2, looked up by ID. This is an example of what I'm looking for:
df1<-data.frame(ID=c(1,2,3), x1=c(6,3,2), x2=c(2,3,1), x3=c(4,10,7))
df1
df2<-data.frame(ID=c(1,2,3), y1=c(0.01,0.02,0.05), y2=c(0.2,0.03,0.11), y3=c(0.3,0.09,0.07))
df2
#Example of what I need
df1xdf2 <- data.frame(ID = c(1,2,3),
                      r1 = c(0.06,0.06,0.1),  r2 = c(1.2,0.09,0.22), r3 = c(1.8,0.27,0.14),
                      r4 = c(0.02,0.06,0.05), r5 = c(0.4,0.09,0.11), r6 = c(0.6,0.27,0.07),
                      r7 = c(0.04,0.2,0.35),  r8 = c(0.8,0.3,0.77),  r9 = c(1.2,0.9,0.49))
df1xdf2
I've tried with loops by row and column but I only get a 1x1 multiplication.
My dataframes have same number of rows, columns and factor names. My real life dataframes are much larger, both rows and columns.
Does anyone know how to solve it?
You could use lapply to multiply every column of df1 (minus ID) by the complete df2, then cbind the results back together and rename the columns.
output <- do.call(cbind, lapply(df1[-1], `*`, df2[-1]))
cbind(df1[1], setNames(output, paste0("r", seq_along(output))))
# ID r1 r2 r3 r4 r5 r6 r7 r8 r9
#1 1 0.06 1.20 1.80 0.02 0.40 0.60 0.04 0.80 1.20
#2 2 0.06 0.09 0.27 0.06 0.09 0.27 0.20 0.30 0.90
#3 3 0.10 0.22 0.14 0.05 0.11 0.07 0.35 0.77 0.49
You could use the dplyr package
#Example with dplyr
require(dplyr)
# First we use merge() to join both DF
result <- merge(df1, df2, by = "ID") %>%
  mutate(r1 = x1 * y1,
         r2 = x1 * y2,
         r3 = etc.)
Within mutate() you can specify your new columns' formulas and names.
An option with map
library(tidyverse)
bind_cols(df1[1], map_dfc(df1[-1], `*`, df2[-1]))
Or in base R by replicating the columns and multiplying
out <- cbind(df1[1], df1[-1][rep(seq_along(df1[-1]), each = 3)] *
df2[-1][rep(seq_along(df2[-1]), 3)])
names(out)[-1] <- paste0("r", seq_along(out[-1]))
out
# ID r1 r2 r3 r4 r5 r6 r7 r8 r9
#1 1 0.06 1.20 1.80 0.02 0.40 0.60 0.04 0.80 1.20
#2 2 0.06 0.09 0.27 0.06 0.09 0.27 0.20 0.30 0.90
#3 3 0.10 0.22 0.14 0.05 0.11 0.07 0.35 0.77 0.49
I'm trying to get the maximum value BY ROW across several columns (climatic water deficit -- def_59_z_#) depending on how much time has passed (time since fire -- YEAR.DIFF). Here are the conditions:
If 1 year has passed, select the deficit value for first year.
(def_59_z_1).
If 2 years: max deficit of first 2 years.
If 3 years: max of deficit of first 3 years.
If 4 years: max of deficit of first 4 years.
If 5 or more years: max of first 5 years.
However, I am unable to extract a row-wise max when I include a condition. There are several existing posts that address row-wise min and max (examples 1 and 2) and sd (example 3) -- but these don't use conditions. I've tried using apply but I haven't been able to find a solution when I have multiple columns involved as well as a conditional requirement.
The following code simply returns 3.5 in the new column def59_z_max15, which is the maximum value in the entire dataframe -- except when YEAR.DIFF is 1, in which case def59_z_1 is correctly returned. But for all the other conditions, I want 0.98, 0.67, 0.7, 1.55, 1.28 -- values that reflect the row maximum of the specified columns. Link to sample data here. How can I achieve this?
I appreciate any/all suggestions!
data <- data %>%
mutate(def59_z_max15 = ifelse(YEAR.DIFF == 1,
(def59_z_1),
ifelse(YEAR.DIFF == 2,
max(def59_z_1, def59_z_2),
ifelse(YEAR.DIFF == 3,
max(def59_z_1, def59_z_2, def59_z_3),
ifelse(YEAR.DIFF == 4,
max(def59_z_1, def59_z_2, def59_z_3, def59_z_4),
max(def59_z_1, def59_z_2, def59_z_3, def59_z_4, def59_z_5))))))
Throw this function into an apply-family function:
func <- function(x) {
  first.val <- x[1]
  if (first.val < 5) {
    return(max(x[2:(first.val + 1)]))
  } else {
    return(max(x[2:6]))
  }
}
Your desired output should be obtained by:
apply(data, 1, func)  # MARGIN = 1 applies func row-wise; assumes YEAR.DIFF is column 1 and def59_z_1..def59_z_5 are columns 2:6
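As a quick self-contained sanity check, here is the helper (with the year-index arithmetic spelled out) run on two toy rows shaped like (YEAR.DIFF, def59_z_1, ..., def59_z_5), with values taken from the sample data:

```r
# x[1] is YEAR.DIFF; x[2:6] are def59_z_1 .. def59_z_5
func <- function(x) {
  first.val <- x[1]
  if (first.val < 5) {
    return(max(x[2:(first.val + 1)]))  # max over the first YEAR.DIFF deficits
  } else {
    return(max(x[2:6]))                # 5+ years: max over all five
  }
}

toy <- rbind(c(2, -0.34, 1.55, -1.11, -0.40, 0.94),  # 2 years: max of first two
             c(5,  0.25, -2.11, 0.98, -0.07, 0.31))  # 5 years: max of all five
res <- apply(toy, 1, func)
res
```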
An option would be to get the pmax (vectorized row-wise max) for each set of conditions separately in a loop (map): if the value of 'YEAR.DIFF' is 1, select only 'def59_z_1'; for 2, take the max of 'def59_z_1' and 'def59_z_2'; and so on up to 5, the max of 'def59_z_1' through 'def59_z_5'. Then coalesce the columns together and replace the remaining NAs with the pmax of all the 'def59_z' columns.
library(tidyverse)
out <- map_dfc(1:5, ~
df1 %>%
select(seq_len(.x) + 1) %>%
transmute(val = na_if((df1[["YEAR.DIFF"]] == .x)*
pmax(!!! rlang::syms(names(.))), 0))) %>%
transmute(def59_z_max15 = coalesce(!!! rlang::syms(names(.)))) %>%
bind_cols(df1, .)%>%
mutate(def59_z_max15 = case_when(is.na(def59_z_max15) ~
pmax(!!! rlang::syms(names(.)[2:6])), TRUE ~ def59_z_max15))
head(out, 10)
# YEAR.DIFF def59_z_1 def59_z_2 def59_z_3 def59_z_4 def59_z_5 def59_z_max15
#1 5 0.25 -2.11 0.98 -0.07 0.31 0.98
#2 9 0.67 0.65 -0.27 0.52 0.26 0.67
#3 10 0.56 0.33 0.03 0.70 -0.09 0.70
#4 2 -0.34 1.55 -1.11 -0.40 0.94 1.55
#5 4 0.98 0.71 0.41 1.28 -0.14 1.28
#6 3 0.71 -0.17 1.70 -0.57 0.43 1.70
#7 4 -1.39 -1.71 -0.89 0.78 1.22 0.78
#8 4 -1.14 -1.46 -0.72 0.74 1.32 0.74
#9 2 0.71 1.39 1.07 0.65 0.29 1.39
#10 1 0.28 0.82 -0.64 0.45 0.64 0.28
data
df1 <- read.csv("https://raw.githubusercontent.com/CaitLittlef/random/master/data.csv")
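For what it's worth, with dplyr >= 1.0 the same logic can be written more directly with rowwise() and c_across(); a sketch on a few toy rows taken from the output above (rather than the linked CSV, to keep it self-contained):

```r
library(dplyr)

df1 <- tibble(
  YEAR.DIFF = c(5, 2, 1),
  def59_z_1 = c(0.25, -0.34, 0.28),
  def59_z_2 = c(-2.11, 1.55, 0.82),
  def59_z_3 = c(0.98, -1.11, -0.64),
  def59_z_4 = c(-0.07, -0.40, 0.45),
  def59_z_5 = c(0.31, 0.94, 0.64)
)

# For each row, keep the first min(YEAR.DIFF, 5) deficit values and take their max
out <- df1 %>%
  rowwise() %>%
  mutate(def59_z_max15 =
           max(c_across(def59_z_1:def59_z_5)[seq_len(min(YEAR.DIFF, 5))])) %>%
  ungroup()
out
```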
I am new to R.
I have hundreds of data frames like this:
ID NAME Ratio_A Ratio_B Ratio_C Ratio_D
AA ABCD 0.09 0.67 0.10 0.14
AB ABCE 0.04 0.85 0.04 0.06
AC ABCG 0.43 0.21 0.54 0.14
AD ABCF 0.16 0.62 0.25 0.97
AF ABCJ 0.59 0.37 0.66 0.07
This is just an example. The number and names of the Ratio_ columns differ between data frames, but all of them start with Ratio_. I want to apply a function (for example, log(x)) to the Ratio_ columns without specifying the column numbers or their full names.
I know how to do it df by df, for the one in the example:
A <- function(x) log(x)
df_log <- data.frame(df[1:2], lapply(df[3:6], A))
but I have a lot of them, and as I said, the number of columns differs in each.
Any suggestion?
Thanks
Place the datasets in a list and then loop over the list elements
lapply(lst, function(x) {
  i1 <- grep("^Ratio_", names(x))
  x[i1] <- lapply(x[i1], A)
  x
})
NOTE: No external packages are used.
data
lst <- mget(paste0("df", 1:100))
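A quick self-contained demo of the same loop on two toy data frames (the names d1/d2 and the Other column are illustrative; A is the log wrapper from the question):

```r
A <- function(x) log(x)

# Toy stand-ins for the question's data frames; column sets differ on purpose
d1 <- data.frame(ID = "AA", Ratio_A = 0.5, Ratio_B = 2)
d2 <- data.frame(ID = "AB", Ratio_A = 1, Other = 9)
lst <- list(d1, d2)

out <- lapply(lst, function(x) {
  i1 <- grep("^Ratio_", names(x))  # columns whose names start with "Ratio_"
  x[i1] <- lapply(x[i1], A)
  x
})
# Only the Ratio_ columns are transformed; ID and Other pass through untouched
```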
This type of problem is very easily dealt with using the dplyr package. For example,
df <- read.table(text = 'ID NAME Ratio_A Ratio_B Ratio_C Ratio_D
AA ABCD 0.09 0.67 0.10 0.14
AB ABCE 0.04 0.85 0.04 0.06
AC ABCG 0.43 0.21 0.54 0.14
AD ABCF 0.16 0.62 0.25 0.97
AF ABCJ 0.59 0.37 0.66 0.07',
header = TRUE)
library(dplyr)
# mutate_each()/funs() are deprecated in newer dplyr; the modern equivalent is
# df %>% mutate(across(starts_with("Ratio_"), log))
df_transformed <- mutate_each(df, funs(log(.)), starts_with("Ratio_"))
df_transformed
# ID NAME Ratio_A Ratio_B Ratio_C Ratio_D
# 1 AA ABCD -2.4079456 -0.4004776 -2.3025851 -1.96611286
# 2 AB ABCE -3.2188758 -0.1625189 -3.2188758 -2.81341072
# 3 AC ABCG -0.8439701 -1.5606477 -0.6161861 -1.96611286
# 4 AD ABCF -1.8325815 -0.4780358 -1.3862944 -0.03045921
# 5 AF ABCJ -0.5276327 -0.9942523 -0.4155154 -2.65926004