R - extract minimum and maximum values - r

I have a list something like this:
my_data<- list(c(dummy= 300), structure(123.7, .Names = ""),
structure(143, .Names = ""), structure(113.675, .Names = ""),
structure(163.75, .Names = ""), structure(656, .Names = ""),
structure(5642, .Names = ""), structure(1232, .Names = ""))
I want the minimum and maximum values from this list.
I have tried using
min(my_data)
max(my_data)
But I get an error: Error in min(weighted_mae) : invalid 'type' (list) of argument
typeof(my_data) #[1] "list"
class(my_data) #[1] "list"
What is the right way for getting the minimum and maximum from my_data?

You could do:
my_data |>
  unlist(use.names = FALSE) |>
  range()
The following is the same, without piping (the native pipe |> requires R 4.1.0 or later):
range(unlist(my_data, use.names = FALSE))
If you want to get minimum and maximum values separately, then you could do:
min(unlist(my_data, use.names = FALSE))
max(unlist(my_data, use.names = FALSE))
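The error arises because min and max expect atomic vectors, not a list. Here is a minimal sketch of the flattening step on a simplified copy of my_data (the empty names are dropped for brevity; lo, hi and lo2 are illustrative names only):

```r
# Same values as my_data in the question, without the empty names
my_data <- list(c(dummy = 300), 123.7, 143, 113.675, 163.75, 656, 5642, 1232)

# Flatten the list into a plain numeric vector
vals <- unlist(my_data, use.names = FALSE)

# range() returns c(minimum, maximum) in one call
rng <- range(vals)
lo <- rng[1]
hi <- rng[2]

# Alternative: pass every list element to min() as a separate argument
lo2 <- do.call(min, my_data)
```

The do.call form avoids the intermediate vector, but unlist is usually easier to read.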

Related

execute different functions considering output in r

Let's say I have two different functions to apply, for example max and min. After applying a series of functions I get the outputs below, and I want to assign a function to each output.
Here is my data and its structure.
data<-structure(list(Apr = structure(list(`a1` = structure(list(
date = c("04-01-2036", "04-02-2036", "04-03-2036"), value = c(0,
3.13, 20.64)), .Names = c("date", "value"), row.names = 92:94, class = "data.frame"),
`a2` = structure(list(date = c("04-01-2037", "04-02-2037",
"04-03-2037"), value = c(5.32, 82.47, 15.56)), .Names = c("date",
"value"), row.names = 457:459, class = "data.frame")), .Names = c("a1",
"a2")), Dec = structure(list(`d1` = structure(list(
date = c("12-01-2039", "12-02-2039", "12-03-2039"), value = c(3,
0, 11)), .Names = c("date", "value"), row.names = 1431:1433, class = "data.frame"),
`d2` = structure(list(date = c("12-01-2064", "12-02-2064",
"12-03-2064"), value = c(0, 5, 0)), .Names = c("date", "value"
), row.names = 10563:10565, class = "data.frame")), .Names = c("d1",
"d2"))), .Names = c("Apr", "Dec"))
I applied these functions:
drop<-function(y){
lapply(y, function(x)(x[!(names(x) %in% c("date"))]))
}
q1<-lapply(data, drop)
q2<-lapply(q1, function(x) unlist(x,recursive = FALSE))
daily_max<-lapply(q2, function(x) lapply(x, max))
dailymax <- data.frame(matrix(unlist(daily_max), nrow=length(daily_max), byrow=TRUE))
row.names(dailymax)<-names(daily_max)
max_value <- apply(dailymax, 1, which.max)
And I'm getting
Apr Dec
2 1
And I am applying any random function to both Apr[2] and Dec[1] like:
Map(function(x, y) sum(x[[y]]), q2, max_value)
So the function is executed according to the outputs: on Apr's second element (a2) and on Dec's first element (d1). As you can see, the outputs are the numbers 1 and 2.
What I want
I want to assign specific functions to 1 and 2. If the output is 1, the max function is executed; if it is 2, the min function is executed. So min will be applied to Apr[2] and max will be applied to Dec[1].
I will get this:
min(q2$Apr$a2.value)
[1] 5.32
max(q2$Dec$d1.value)
[1] 11
How can I achieve this automatically for all my functions?
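You can use switch to pick a function based on the number in max_value. Because max_value is numeric, switch selects by position (the first alternative for 1, the second for 2); the names `1` and `2` serve only as labels.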
apply_function <- function(x, num) switch(num, `1` = max, `2` = min)(x)
Map(function(x, y) apply_function(x[[y]], y), q2, max_value)
#$Apr
#[1] 5.32
#$Dec
#[1] 11
Map returns a list; if you want a vector output, use mapply.
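As a variation on the switch approach, the functions can be kept in a list and indexed by the number. This is only a sketch: vals is a simplified stand-in for q2, and idx and funs are hypothetical names, not objects from the question.

```r
# Simplified stand-in for q2: one numeric vector per element
vals <- list(
  Apr = list(a1 = c(0, 3.13, 20.64), a2 = c(5.32, 82.47, 15.56)),
  Dec = list(d1 = c(3, 0, 11),       d2 = c(0, 5, 0))
)

# Stand-in for max_value: which element to use per month
idx <- c(Apr = 2L, Dec = 1L)

# Position 1 maps to max, position 2 maps to min
funs <- list(max, min)

# mapply simplifies the result to a named numeric vector
res <- mapply(function(x, i) funs[[i]](x[[i]]), vals, idx)
```

Adding a third function then only means appending it to funs.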

Count Backwards in String until pattern R

I'm trying to extract UPCs from item descriptions. The number of /'s at the front of the description varies, but the UPC is always right before the last /. I was using a fixed character count from the end, but the number of characters after the last / varies with the pack size. In the reproducible example below, the first row shows what the result should look like, while the second row has dropped the first digit of the UPC and picked up the trailing /. I'm looking for a way to do this inline with dplyr. My original code follows the example.
test <- structure(list(Month = structure(c(17987, 17987), class = "Date"),store_id = c("7005", "7005"), UPC = c("000004150860081","00001200050404/"), `Item Description` = c("ACQUA PANNA SPRING WATER/EACH/000004150860081/1","AQUAFINA 24PK/24PK/000001200050404/24"), `Cals Item Description` = c(NA_character_,NA_character_), `Sub-Category` = c(NA_character_, NA_character_), Category = c(NA_character_, NA_character_), Department = c(NA_character_,NA_character_), `Sales Dollars` = c(17.43, 131.78), Units = c(7,528), Cost = c(8.4, 112.2), `Gross Margin` = c(9.03, 19.58), `Gross Margin %` = c(0.5181, 0.1486)), row.names = c(NA,-2L), class = c("tbl_df", "tbl", "data.frame"))
foo <- list.files(pattern = "*.csv", full.names = T) %>%
  map_df(~read_csv(.)) %>%
  mutate(date = lubridate::mdy(str_sub(textbox43, start = -10))) %>%
  mutate(store_id = str_sub(textbox6, start = 1, end = 4)) %>%
  mutate(item_desc = textbox57) %>%
  filter(!is.na(item_desc), item_desc != "") %>%
  mutate(dollars = textbox58,
         units = textbox59,
         cost = textbox61,
         gm = textbox66,
         gm_pct = textbox67) %>%
  mutate(UPC = str_sub(item_desc, start = -17, end = -3))
Is this what you want?
sub("^.*/([^/]+)/[^/]*$",
"\\1",
test$`Item Description`)
Returns:
[1] "000004150860081" "000001200050404"
Edit: You asked for dplyr style. Inside mutate, refer to the column directly rather than through test$:
test %>%
  mutate(item_id = sub("^.*/([^/]+)/[^/]*$",
                       "\\1",
                       `Item Description`))
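To make the pattern easier to follow, here is a commented breakdown on the two descriptions from the question, next to a split-based alternative that should give the same result. The names desc, upc, parts and upc2 are illustrative only.

```r
desc <- c("ACQUA PANNA SPRING WATER/EACH/000004150860081/1",
          "AQUAFINA 24PK/24PK/000001200050404/24")

# ^.*/      greedy match of everything up to the second-to-last slash
# ([^/]+)   capture group 1: the slash-free field before the last slash (the UPC)
# /[^/]*$   the last slash and whatever follows it (the pack size)
upc <- sub("^.*/([^/]+)/[^/]*$", "\\1", desc)

# Alternative: split on "/" and take the second-to-last piece
parts <- strsplit(desc, "/", fixed = TRUE)
upc2 <- vapply(parts, function(p) p[length(p) - 1L], character(1))
```

Both versions assume every description contains at least two slashes; sub returns the input unchanged when the pattern does not match.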

Automatically split function output (list) into component data.frames

I have a function which yields 2 dataframes. As functions can only return one object, I combined these dataframes into a list. However, I need to work with both dataframes separately. Is there a way to automatically split the list into the component dataframes, or to write the function so that both objects are returned separately?
The function:
install.packages("plyr")
require(plyr)
fun.docmerge <- function(x, y, z, crit, typ, doc = checkmerge) {
  mergedat <- paste(deparse(substitute(x)), "+",
                    deparse(substitute(y)), "=", z)
  countdat <- nrow(x)
  check_t1 <- data.frame(mergedat, countdat)
  z1 <- join(x, y, by = crit, type = typ)
  countdat <- nrow(z1)
  check_t2 <- data.frame(mergedat, countdat)
  doc <- rbind(doc, check_t1, check_t2)
  t1 <- list()
  t1[["checkmerge"]] <- doc
  t1[[z]] <- z1
  return(t1)
}
This is the call to the function, saving the result list to the new object results.
results <- fun.docmerge(x = df1, y = df2, z = "df3", crit = c("id"), typ = "left")
The following sample data replicates the problem:
df1 <- structure(list(id = c("XXX1", "XXX2", "XXX3",
"XXX4"), tr.isincode = c("ISIN1", "ISIN2",
"ISIN3", "ISIN4")), .Names = c("id", "isin"
), row.names = c(NA, 4L), class = "data.frame")
df2 <- structure(list(id= c("XXX1", "XXX5"), wrong= c(1L,
1L)), .Names = c("id", "wrong"), row.names = 1:2, class = "data.frame")
checkmerge <- structure(list(mergedat = structure(integer(0), .Label = character(0), class = "factor"),
countdat = numeric(0)), .Names = c("mergedat", "countdat"
), row.names = integer(0), class = "data.frame")
In the example, a list containing the dataframes df3 and checkmerge is returned. I need both dataframes separately. I know that I could do it via manual assignment (e.g., checkmerge <- results$checkmerge), but I want to eliminate manual changes as much as possible and am therefore looking for an automated way.
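No answer is recorded here for this question. One option that may fit, offered as a sketch and not as the asker's eventual solution, is base R's list2env(), which copies each named element of a list into an environment as its own object. The results list below is a small hypothetical stand-in for the output of fun.docmerge.

```r
# Hypothetical stand-in for the list returned by fun.docmerge
results <- list(
  checkmerge = data.frame(mergedat = "df1 + df2 = df3", countdat = 4),
  df3        = data.frame(id = c("XXX1", "XXX2"))
)

# Create one object per list element in the current environment
# (the global environment when run at top level).
# Existing objects with the same names are overwritten.
list2env(results, envir = environment())

# checkmerge and df3 are now available as separate data frames
n_rows <- nrow(df3)
```

Because the object names come from the list names, the value passed as z decides what the merged dataframe is called.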

creating a new variable based on string matching

I have the following dataframe:
df <- data.frame(Sample_name = c("01_00H_NA_DNA", "01_00H_NA_RNA", "01_00H_NA_S", "01_00H_NW_DNA", "01_00H_NW_RNA", "01_00H_NW_S", "01_00H_OM_DNA", "01_00H_OM_RNA", "01_00H_OM_S", "01_00H_RL_DNA", "01_00H_RL_RNA", "01_00H_RL_S"),
Pair = c("","", "S1","","","S2","","","S3","", "","S5"))
I am trying to create a new variable treatment based on Sample_name. I used the following code:
df$treatment <- ifelse(grep("_NA_", df$sample_name, ignore.case = T), "nat",
ifelse(grep("_NW_", df$sample_name, ignore.case = T), "natH2",
ifelse(grep("_RL_", df$sample_name, ignore.case = T), "RNALat",
ifelse(grep("_OM_", df$sample_name, ignore.case = T ), "Om"))))
I don't understand what I am doing wrong here, I got an error saying
Error in $<-.data.frame(*tmp*, "treatment", value = logical(0)) :
replacement has 0 rows, data has 12
Any suggestions?
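Got the answer. Two things needed fixing. First, grep returns the indices of the matching elements, whereas ifelse needs a logical vector with one value per row, which is what grepl returns. Second, the column in the sample data is Sample_name (capital S), so df$sample_name is NULL and yields the zero-length result reported in the error. Note that the final "NA" is a character string, not a missing value:
df$treatment <- ifelse(grepl("_NA_", df$Sample_name, ignore.case = TRUE), "nat",
                ifelse(grepl("_NW_", df$Sample_name, ignore.case = TRUE), "natH2",
                ifelse(grepl("_RL_", df$Sample_name, ignore.case = TRUE), "RNALat",
                ifelse(grepl("_OM_", df$Sample_name, ignore.case = TRUE), "Om", "NA"))))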
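For more than a few patterns, the nested ifelse can get hard to read. One possible alternative, sketched here on a shortened sample vector, keeps the pattern-to-label pairs in a named vector and fills the result in a loop. The names sample_name, patterns and treatment are illustrative, and unmatched rows stay as a real NA instead of the string "NA".

```r
sample_name <- c("01_00H_NA_DNA", "01_00H_NW_RNA", "01_00H_OM_S", "01_00H_RL_S")

# Names are the patterns to search for, values are the labels to assign
patterns <- c("_NA_" = "nat", "_NW_" = "natH2", "_RL_" = "RNALat", "_OM_" = "Om")

# Start with missing values, then fill in each label where its pattern matches
treatment <- rep(NA_character_, length(sample_name))
for (p in names(patterns)) {
  hit <- grepl(p, sample_name, ignore.case = TRUE)  # logical, one value per element
  treatment[hit] <- patterns[[p]]
}
```

If a name matched more than one pattern, the last matching pattern in the vector would win.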

Arithmetic on summarized dataframe from dplyr in R

I have a large dataset and use dplyr's summarize to generate some means.
Occasionally, I would like to perform arithmetic on that output.
For example, I would like to get the mean of means from the output below, say "m.biomass".
I've tried mean(data.sum[,7]) and mean(as.list(data.sum[,7])). Is there a quick and easy way to achieve this?
data.sum <-structure(list(scenario = c("future", "future", "future", "future"
), state = c("fl", "ga", "ok", "va"), m.soc = c(4090.31654013689,
3654.45350562628, 2564.33199749487, 4193.83388887064), m.npp = c(1032.244475,
821.319385, 753.401315, 636.885535), sd.soc = c(56.0344229400332,
97.8553643582118, 68.2248389927858, 79.0739969429246), sd.npp = c(34.9421782033153,
27.6443555578531, 26.0728757486901, 24.0375040705595), m.biomass = c(5322.76631158111,
3936.79457763176, 3591.0902359206, 2888.25308402464), sd.m.biomass = c(3026.59250918009,
2799.40317348016, 2515.10516340438, 2273.45510178843), max.biomass = c(9592.9303,
8105.109, 7272.4896, 6439.2259), time = c("1980-1999", "1980-1999",
"1980-1999", "1980-1999")), .Names = c("scenario", "state", "m.soc",
"m.npp", "sd.soc", "sd.npp", "m.biomass", "sd.m.biomass", "max.biomass",
"time"), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4), vars = list(quote(scenario)), labels = structure(list(
scenario = "future"), class = "data.frame", row.names = c(NA,
-1), vars = list(quote(scenario)), drop = TRUE, .Names = "scenario"), indices = list(0:3))
Use [[ to extract the column as a vector: mean works on a vector or a matrix, not on a data.frame. For a single column:
mean(data.sum[[7]])
#[1] 3934.726
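If data.sum had only the data.frame class, data.sum[,7] would extract the column as a vector, but the tbl_df class prevents it from collapsing to a vector.
For multiple columns, dplyr also has specialised functions. Note that summarise_each() and funs() are deprecated in current dplyr, where across() inside summarise() is the replacement:
data.sum %>%
  summarise_each(funs(mean), 3:7)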
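The difference between the two extraction styles can be reproduced in base R alone. In this sketch, d is a hypothetical cut-down version of data.sum with rounded values, and drop = FALSE is used to imitate how a tbl_df keeps a single column as a data frame.

```r
d <- data.frame(
  state     = c("fl", "ga", "ok", "va"),
  m.biomass = c(5322.766, 3936.795, 3591.090, 2888.253)
)

# Imitates tbl_df behaviour: the single column stays a data.frame,
# which mean() cannot average
col_df <- d[, "m.biomass", drop = FALSE]

# [[ always returns the underlying vector
col_vec <- d[["m.biomass"]]

mean_of_means <- mean(col_vec)
```

The same [[ call works unchanged on a tibble or grouped_df, which is why it is the safer habit.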
