print maximum values of each pair of columns - r

I apologize if this has already been asked, but I could not find it. I have the following data:
AGE<-c(25, 37, 57, 72, 48, 28, 31, 57, 43, 38)
LLS<-c(24, 1, 24, 24, 14, 21, 4, 12, 8, 1)
RLS<-c(11, 1, 14, 21, 7, 21, 22, 8, 27, 12)
dat <- data.frame(AGE, LLS, RLS)
I want to get the maximum of columns LLS and RLS for each row.
Please can you tell me how to do it?
Thanks.

You can try pmax
do.call(pmax, dat[-1])
#[1] 24 1 24 24 14 21 22 12 27 12
If this is for each pair of columns, you can use combn
res <- combn(names(dat),2, FUN=function(x) do.call(pmax,dat[x]))
colnames(res) <- apply(combn(names(dat),2),2, paste, collapse="_")
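To make the shape of the combn result concrete, here is a small sketch on the question's data. The name `pair_names` is only illustrative; everything else follows the answer above.

```r
AGE <- c(25, 37, 57, 72, 48, 28, 31, 57, 43, 38)
LLS <- c(24, 1, 24, 24, 14, 21, 4, 12, 8, 1)
RLS <- c(11, 1, 14, 21, 7, 21, 22, 8, 27, 12)
dat <- data.frame(AGE, LLS, RLS)

# All pairs of column names: (AGE, LLS), (AGE, RLS), (LLS, RLS)
pair_names <- combn(names(dat), 2)

# One column of row-wise maxima per pair
res <- combn(names(dat), 2, FUN = function(x) do.call(pmax, dat[x]))
colnames(res) <- apply(pair_names, 2, paste, collapse = "_")

colnames(res)      # "AGE_LLS" "AGE_RLS" "LLS_RLS"
res[, "LLS_RLS"]   # same values as do.call(pmax, dat[-1])
```

The result is a matrix with one row per observation and one column per pair, so a single pair can be picked out by its combined name.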

I believe that for each row, you want to return a single value, whichever is higher of RLS or LLS. Right?
If so, Akrun's answer is good. Alternatively you can use the handy rowMaxs() function in the matrixStats package. In my opinion it's a little more straightforward, but that's the only real advantage.
Here is the code. You could combine it into fewer steps, but I wrote it out to make clear what is going on.
AGE<-c(25, 37, 57, 72, 48, 28, 31, 57, 43, 38)
LLS<-c(24, 1, 24, 24, 14, 21, 4, 12, 8, 1)
RLS<-c(11, 1, 14, 21, 7, 21, 22, 8, 27, 12)
dat <- data.frame(AGE, LLS, RLS)
Create a subset of your dataframe, including only the columns you want
dat2 <- dat[,2:3]
Turn the new dataframe into a matrix so rowMaxs() doesn't complain
dat3 <- as.matrix(dat2)
Load the matrixStats package and call rowMaxs()
library(matrixStats)
rowMaxs(dat3)
[1] 24 1 24 24 14 21 22 12 27 12
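If installing matrixStats is not an option, the same row-wise maximum can be sketched in base R with apply(). This is an alternative to the answer above, not part of it, and it will likely be slower on large data; the name `row_max` is illustrative.

```r
dat <- data.frame(
  AGE = c(25, 37, 57, 72, 48, 28, 31, 57, 43, 38),
  LLS = c(24, 1, 24, 24, 14, 21, 4, 12, 8, 1),
  RLS = c(11, 1, 14, 21, 7, 21, 22, 8, 27, 12)
)

# MARGIN = 1 applies max() to each row of the two selected columns
row_max <- apply(dat[, c("LLS", "RLS")], 1, max)
row_max
```

Selecting the columns by name rather than by position keeps the call correct if the column order changes.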

Related

Understanding krige.bayes() output

I am struggling with the krige.bayes() function in the geoR package. I was hoping to create a map from the function's output, but I can't find a way to do this. The online PDF (https://cran.r-project.org/web/packages/geoR/geoR.pdf) of the geoR package indicates that you can make an image using geoR::image.kriging; however, I get the error 'image.kriging' is not an exported object from 'namespace:geoR' when I try this. The function also does not appear in ls("package:geoR"), which suggests that it has been deprecated but not removed from the package documentation. This leaves me with just the output from krige.bayes() and some variogram models that I have created as well. I can see that I can modify the output through output.control, but I'm not sure what to change there to make the output more comprehensible to me. The output that I am getting from krige.bayes() is as follows.
Only samples of the posterior for the parameters will be returned.
krige.bayes: computing the discrete posterior of phi/tausq.rel
krige.bayes: argument `phi.discrete` not provided, using default values
krige.bayes: computing the posterior probabilities.
Number of parameter sets: 50
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50
krige.bayes: sampling from posterior distribution
krige.bayes: sample from the (joint) posterior of phi and tausq.rel
[,1] [,2] [,3]
phi 1.869439e-03 0.003738877 0.007477755
tausq.rel 0.000000e+00 0.000000000 0.000000000
frequency 9.940000e+02 5.000000000 1.000000000
Am I misunderstanding this output, the next step, or something else? Thanks in advance for the help.

Choose a subsample of random numbers

I will play in the Brazilian Lottery with my friends. I asked each of them to choose seven numbers, and I created a variable for each person.
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
Then, I created a list with all of the numbers, and a list with unique numbers (in order to avoid repeated numbers)
list = c(carol, davi, pestana, valerio, victor, diuli, cynara)
list2 = unique(list)
Finally, I called sample() on list2
sample(list2, 7)
After that, I was wondering: is it possible to skip unique() and still not draw repeated numbers? With unique(), every number has the same probability of being drawn, when in fact numbers that were chosen more often should be more likely (for instance, seven was chosen three times).
How about this:
pestana = c(04, 15, 29, 36, 54, 25, 07)
carol = c(7, 22, 30, 35, 44, 51, 57)
davi = c(8, 13, 21, 29, 37, 42, 55)
valerio = c(30, 20, 33, 14, 7, 41, 54)
victor = c(09, 11, 26, 33, 38, 52, 57)
list = c(carol, davi, pestana, valerio, victor)
l <- c(unlist(list))
nums <- table(l)
probs <- nums/sum(nums)
sample(names(probs), 7, prob = probs, replace=FALSE)
#> [1] "4" "33" "44" "11" "29" "52" "8"
Created on 2022-12-14 by the reprex package (v2.0.1)
Using the prob argument, you can make some values more likely to show up than others.
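As a smaller sketch of the same idea, the draw can also return numbers instead of character strings. The vector `picks` and the seed are made up for illustration; only the table() and prob approach comes from the answer above.

```r
# Hypothetical pooled picks; 7 was chosen three times, 22 twice
picks <- c(7, 22, 30, 7, 41, 7, 22)

counts <- table(picks)                 # how often each number was chosen
values <- as.numeric(names(counts))    # the distinct numbers, as numeric
weights <- as.vector(counts) / sum(counts)

set.seed(1)                            # only to make the draw repeatable
draw <- sample(values, size = 3, prob = weights, replace = FALSE)
draw
```

Because sampling is without replacement, no number can appear twice in the draw, while frequently chosen numbers are still more likely to be picked.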

re-coding data into categories R

I have a column with some numbers, each corresponding to a primary condition of a patient. I want to recode this data into types of conditions, e.g. neurological, psychiatric, etc.
Can you please help as the code below is horrendous (and also not working)
October_data_UK$Primary_cat <- ifelse(October_data_UK$PRIMARY==2|October_data_UK$PRIMARY==5|October_data_UK$PRIMARY==7|October_data_UK$PRIMARY==10|October_data_UK$PRIMARY==13|October_data_UK$PRIMARY==14|October_data_UK$PRIMARY==16|October_data_UK$PRIMARY==23|October_data_UK$PRIMARY==24|October_data_UK$PRIMARY==27,"Chronic_pain",
ifelse(October_data_UK$PRIMARY==4|October_data_UK$PRIMARY==9|October_data_UK$PRIMARY==15|October_data_UK$PRIMARY==21|October_data_UK$PRIMARY==22|October_data_UK$PRIMARY==31|October_data_UK$PRIMARY==35|October_data_UK$PRIMARY==37|October_data_UK$PRIMARY==38, "Neurological",
ifelse(October_data_UK$PRIMARY==1|October_data_UK$PRIMARY==3|October_data_UK$PRIMARY==6|October_data_UK$PRIMARY==12|October_data_UK$PRIMARY==17|October_data_UK$PRIMARY==18|October_data_UK$PRIMARY==20|October_data_UK$PRIMARY==25|October_data_UK$PRIMARY==26,October_data_UK$PRIMARY==30|October_data_UK$PRIMARY==32|October_data_UK$PRIMARY==34|October_data_UK$PRIMARY==36,"Psychiatric",
ifelse(October_data_UK$PRIMARY==8|October_data_UK$PRIMARY==11|October_data_UK$PRIMARY==19|October_data_UK$PRIMARY==33|October_data_UK$PRIMARY==28|October_data_UK$PRIMARY==29|October_data_UK$PRIMARY==39,"Other",NA))))
I just want to write all the numbers together without having to repeat "October_data_UK$PRIMARY"
Try assigning the values with logical indices.
Chronic_pain <- c(2, 5, 7, 10, 13, 14, 16, 23, 24, 27)
Neurological <- c(4, 9, 15, 21, 22, 31, 35, 37, 38)
Psychiatric <- c(1, 3, 6, 12, 17, 18, 20, 25, 26, 30, 32, 34, 36)
Other <- c(8, 11, 19, 33, 28, 29, 39)
# Build the indices from the original PRIMARY codes (Primary_cat does not exist yet)
i1 <- October_data_UK$PRIMARY %in% Chronic_pain
i2 <- October_data_UK$PRIMARY %in% Neurological
i3 <- October_data_UK$PRIMARY %in% Psychiatric
i4 <- October_data_UK$PRIMARY %in% Other
October_data_UK$Primary_cat <- NA_character_ # this creates the column
October_data_UK$Primary_cat[i1] <- "Chronic_pain"
October_data_UK$Primary_cat[i2] <- "Neurological"
October_data_UK$Primary_cat[i3] <- "Psychiatric"
October_data_UK$Primary_cat[i4] <- "Other"
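Another way to avoid repeating the column name is a lookup vector that maps each code to its category. This is a sketch of an alternative technique, not the answerer's method; `groups`, `lookup`, and the small `primary` vector are hypothetical stand-ins for the real data.

```r
groups <- list(
  Chronic_pain = c(2, 5, 7, 10, 13, 14, 16, 23, 24, 27),
  Neurological = c(4, 9, 15, 21, 22, 31, 35, 37, 38),
  Psychiatric  = c(1, 3, 6, 12, 17, 18, 20, 25, 26, 30, 32, 34, 36),
  Other        = c(8, 11, 19, 33, 28, 29, 39)
)

# Named character vector: names are the codes, values are the categories
lookup <- setNames(
  rep(names(groups), lengths(groups)),
  as.character(unlist(groups, use.names = FALSE))
)

primary <- c(2, 4, 1, 8, 99)           # 99 is not in any group
primary_cat <- unname(lookup[as.character(primary)])
primary_cat
# "Chronic_pain" "Neurological" "Psychiatric" "Other" NA
```

Codes that belong to no group come back as NA, which matches the final NA branch of the nested ifelse() in the question.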

R rewriting a for loop

I've got a loop in my code that I would like to rewrite so that running the code takes a little less time to complete. I know you should generally avoid loops, but I can't think of another way to accomplish my goal.
So I've got a dataset "df_1531" containing a lot of data that I need to cut into pieces using subset() (if anyone knows a better way, let me know ;) ). I've got a vector with 21 variable names, and I'd like to assign a subset of df_1531 to each of them. Furthermore, the script contains 22 variables with constraints (shift_XY_time).
So, this is my code now...
# list containing different slots
shift_time_list<- c(startdate, shift_1m_time, shift_1a_time, shift_1n_time,
shift_2m_time, shift_2a_time, shift_2n_time,
shift_3m_time, shift_3a_time, shift_3n_time,
shift_4m_time, shift_4a_time, shift_4n_time,
shift_5m_time, shift_5a_time, shift_5n_time,
shift_6m_time, shift_6a_time, shift_6n_time,
shift_7m_time, shift_7a_time, shift_7n_time)
# List with subset names
shift_sub_list <- c("shift_1m_sub", "shift_1a_sub", "shift_1n_sub",
"shift_2m_sub", "shift_2a_sub", "shift_2n_sub",
"shift_3m_sub", "shift_3a_sub", "shift_3n_sub",
"shift_4m_sub", "shift_4a_sub", "shift_4n_sub",
"shift_5m_sub", "shift_5a_sub", "shift_5n_sub",
"shift_6m_sub", "shift_6a_sub", "shift_6n_sub",
"shift_7m_sub", "shift_7a_sub", "shift_7n_sub")
# The actual loop that I'd like to rewrite
for (i in 1:21) {
assign(shift_sub_list[i], subset(df_1531, df_1531$'PLS FFM' >= shift_time_list[i] & df_1531$'PLS FFM' < shift_time_list[i+1]))
}
Running the loop takes approximately 6 or 7 seconds. So, if anyone knows a better, cleaner, or quicker way to write my code, I would very much like to hear your suggestions.
Reproducible example:
mydata <- cars
dput(cars)
structure(list(speed = c(4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11,
12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16,
16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20,
22, 23, 24, 24, 24, 24, 25), dist = c(2, 10, 4, 22, 16, 10, 18,
26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80,
20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32,
48, 52, 56, 64, 66, 54, 70, 92, 93, 120, 85)), class = "data.frame", row.names = c(NA,
-50L))
dist_interval_list <- c( 0, 5, 10, 15,
20, 25, 30, 35,
40, 45, 50, 55,
60, 65, 70, 75,
80, 85, 90, 95,
100, 105, 110, 115, 120)
var_name_list <- c("var_name_1a", "var_name_1b", "var_name_1c", "var_name_1d",
"var_name_2a", "var_name_2b", "var_name_2c", "var_name_2d",
"var_name_3a", "var_name_3b", "var_name_3c", "var_name_3d",
"var_name_4a", "var_name_4b", "var_name_4c", "var_name_4d",
"var_name_5a", "var_name_5b", "var_name_5c", "var_name_5d",
"var_name_6a", "var_name_6b", "var_name_6c", "var_name_6d")
for (i in 1:24){
assign(var_name_list[i], subset(mydata,
mydata$dist >= dist_interval_list[i] &
mydata$dist < dist_interval_list[i+1]))
}
Starting with the 'reproducible' part and the information that the final aim is to summarize another column, it is possible to exploit the fact that the intervals are non-overlapping and simply use the cut function.
library(tidyverse)
mydata %>%
mutate(interval = cut(dist, breaks = dist_interval_list)) %>%
group_by(interval) %>%
summarise(sum = sum(speed))
This should be much faster and will also help you not to get lost in a messy environment full of variables (which are actually part of your data). You want to keep all your data in a single data frame as long as possible;) You probably want to follow with something like purrrlyr::invoke_rows at the final modeling step, if your function does not work with data frames.
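If the separate pieces are still needed, a base R sketch with cut() and split() produces them as one named list instead of 24 assigned variables. This is an assumption about what the asker wants, not part of the answer above. Note that right = FALSE is used here so the intervals are closed on the left, like the >= and < tests in the original loop; cut()'s default is closed on the right.

```r
mydata <- cars
dist_interval_list <- seq(0, 120, by = 5)

# Left-closed intervals [a, b), matching `>= lower & < upper` in the loop
interval <- cut(mydata$dist, breaks = dist_interval_list, right = FALSE)

# One data frame per interval, named by the interval label
pieces <- split(mydata, interval)

length(pieces)          # 24, including empty intervals
pieces[["[10,15)"]]     # rows with 10 <= dist < 15
```

As in the loop, the row with dist equal to 120 falls outside the last interval and is dropped. A single piece is then reached with pieces[[label]] rather than through a separately named variable.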

R finding relative maximum from outliers

Suppose I have a vector of numbers that I want to find a general cutoff for. For example:
x <- c(35, 2, 3, 30, 1, 4, 33, 6, 36)
In this case, I would want to extract only the subset that contains 35, 30, 33, 36, so the cutoff would be at 30. Without hardcoding a definite cutoff, I would like my code to adapt to different vectors of numbers in order to find that cutoff.
Another example would be:
x <- c(1, 20, 42, 13, 118, 149, 130, 30, 11, 32, 120, 0.5, 0.03)
In this case, a reasonable cutoff would be around 118.
Currently I am hard coding the cutoffs because I am dealing with simple cases, however I would like to make this process more modular for more variable vectors.
You could use the quantile function
cutoff <- function(y, prob=0.7) y[y > quantile(y, prob)]
x <- c(35, 2, 3, 30, 1, 4, 33, 6, 36)
cutoff(x)
[1] 35 33 36
x <- c(1, 20, 42, 13, 118, 149, 130, 30, 11, 32, 120, 0.5, 0.03)
cutoff(x)
[1] 118 149 130 120
And you can define a different probability as desired
cutoff(x, 0.8)
[1] 149 130 120
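The quantile approach still needs a probability to be chosen. One alternative that needs no tuning parameter, sketched here as an assumption about what "adaptive" could mean and not as part of the answer above, is to cut at the largest gap between consecutive sorted values. The function name `gap_cutoff` is hypothetical.

```r
gap_cutoff <- function(y) {
  s <- sort(y)
  i <- which.max(diff(s))   # position of the largest jump between neighbours
  y[y > s[i]]               # keep everything above the lower edge of that jump
}

gap_cutoff(c(35, 2, 3, 30, 1, 4, 33, 6, 36))
# [1] 35 30 33 36
gap_cutoff(c(1, 20, 42, 13, 118, 149, 130, 30, 11, 32, 120, 0.5, 0.03))
# [1] 118 149 130 120
```

This works when the high values are separated from the rest by one clearly dominant gap, as in both examples; for vectors with several comparable gaps it may split in an unintended place.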

Resources