R: sum values in a column but exclude the lesser of specific values

I have the following data:
> str(Maximum)
 num [1:6] 1.4 1.07 1.89 0.342 0.00 1.998
I want to sum these values, but for the pairs (second, third) and (fourth, fifth), include only the greater of the two. So in this case I'm looking for 1.4 + 1.89 + 0.342 + 1.998. How would I go about doing this in R?

First element, plus the maximum of (2, 3), plus the maximum of (4, 5), plus the 6th element:
Maximum[1] + max(Maximum[2:3]) + max(Maximum[4:5]) + Maximum[6]

If your vector Maximum always has 6 elements, Florian's answer is the simplest way to do it. But if your vector is longer, you could do:
z1 = Maximum[seq(from = 2, to = length(Maximum)-1, by = 2)]
z2 = Maximum[seq(from = 3, to = length(Maximum)-1, by = 2)]
z3 = ifelse(z1>z2, z1, z2)
result = Maximum[1] + sum(z3) + Maximum[length(Maximum)]
For example:
Maximum = floor(runif(22, 1, 100))
> Maximum
[1] 96 6 1 10 90 15 58 48 94 97 78 95 42 79 61 25 61 74 93 37 44 22
z1 would be the elements at even indexes (excluding ends):
> z1
[1] 6 10 15 48 97 95 79 25 74 37
z2 the elements at odd indexes (excluding ends):
> z2
[1] 1 90 58 94 78 42 61 61 93 44
and z3 the maximum value between z1 and z2 for each index:
> z3
[1] 6 90 58 94 97 95 79 61 93 44
Then calculate the result by adding the first and last elements of Maximum to the sum of z3.
Note: the Maximum vector should have an even number of elements.
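As an aside, base R's pmax() computes the element-wise maximum of two vectors directly, so the ifelse(z1 > z2, z1, z2) step can be written as pmax(z1, z2). A minimal sketch wrapping the whole approach in a function (pairwise_sum is an illustrative name of my own; it assumes an even-length vector, per the note above):
pairwise_sum <- function(v) {
  inner <- v[-c(1, length(v))]                # drop the first and last elements
  z1 <- inner[seq(1, length(inner), by = 2)]  # first element of each pair
  z2 <- inner[seq(2, length(inner), by = 2)]  # second element of each pair
  v[1] + sum(pmax(z1, z2)) + v[length(v)]
}
pairwise_sum(c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998))
# [1] 5.63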

You can specify the positions in a vector using <vector.name>[<positions>]. Moreover, you can specify positions to skip using -. Thus,
Maximum <- c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998)
Maximum[-c(2,5)]
# [1] 1.400 1.890 0.342 1.998
sum( Maximum[-c(2,5)] )
# [1] 5.63
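If you would rather compute the positions to drop than hard-code c(2,5), here is a small sketch (pair_starts and drop_idx are illustrative names of my own) that finds the lesser element of each inner pair with which.min:
pair_starts <- seq(2, length(Maximum) - 2, by = 2)  # first index of each inner pair
drop_idx <- sapply(pair_starts,
                   function(i) i + which.min(Maximum[i:(i + 1)]) - 1)
sum(Maximum[-drop_idx])
# [1] 5.63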

A common method here is to use tapply to perform "group operations" and then aggregate the intermediate values.
vec <- c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998)
group <- c(1, 2, 2, 3, 3, 4)
Here, calculate the max of each group
tapply(vec, group, max)
1 2 3 4
1.400 1.890 0.342 1.998
Then you can sum the resulting values
sum(tapply(vec, group, max))
[1] 5.63
One way to construct the group variable dynamically would be to use rep a couple of times, like this:
reps <- c(1, rep(2, (length(vec) / 2) - 1), 1)
rep(seq_along(reps), reps)
[1] 1 2 2 3 3 4
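Putting the two steps together, a minimal sketch (sum_pair_max is an illustrative name of my own) that builds the group vector and sums the group maxima in one call:
sum_pair_max <- function(vec) {
  reps <- c(1, rep(2, (length(vec) / 2) - 1), 1)  # group sizes: 1, 2, 2, ..., 2, 1
  group <- rep(seq_along(reps), reps)
  sum(tapply(vec, group, max))
}
sum_pair_max(c(1.4, 1.07, 1.89, 0.342, 0.00, 1.998))
# [1] 5.63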

Related

How do I use a for loop to find unique values in all columns of a dataframe

I want to find out the unique values in every column in the dataframe using a for loop. Using names(df) gives the column names as a character vector, which didn't work in this case.
This may be what you're looking for:
set.seed(123)
df <- data.frame(a = sample(1:100, 20),
                 b = sample(LETTERS, 20),
                 c = round(runif(20), 2))
for (i in colnames(df)) {
  cat("Unique values in", i, ":", unique(df[, i]), "\n")
}
Output:
#Unique values in a : 31 79 51 14 67 42 50 43 97 25 90 69 57 9 72 26 7 95 87 36
#Unique values in b : N Q K G U L O J M W I P S Y X E R V C F
#Unique values in c : 0.75 0.9 0.37 0.67 0.09 0.38 0.27 0.81 0.45 0.79 0.44 0.63 0.71 0 0.48 0.22
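For what it's worth, the same information is available without an explicit loop, since lapply iterates over the columns of a data frame directly:
lapply(df, unique)  # returns a named list of unique values per column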

How to calculate standard deviation of values in 10 intervals?

I want to calculate a standard deviation in steps of 10 in R. For a large number of values, I want the SD of the values within each of the intervals 0-10, 10-20, 20-30, and so on.
For example, I have a vector:
exemple <- seq(0, 100, 10)
If I do sd(exemple), I get the standard deviation of all the values in exemple. But instead of calculating the standard deviation across all of these values, I want to calculate it between 0 and 10, between 10 and 20, between 20 and 30, etc. To be clear, each interval contains its own values; for instance, in the interval 0 to 10 the values might be 0.2, 0.3, 0.5, 0.7, 0.6, 0.7, 0.03, 0.09, 0.1, 0.05.
Can someone help me please?
You may use cut/findInterval to divide the data into groups and take sd of each group.
set.seed(123)
vec <- runif(100, max = 100)
tapply(vec, cut(vec, seq(0,100,10)), sd)
# (0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] (70,80] (80,90] (90,100]
#3.438162 2.653866 2.876299 2.593230 2.353325 2.755474 2.454519 3.282779 3.658064 3.021508
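Since the answer mentions findInterval as an alternative to cut, here is an essentially equivalent sketch (the two differ in which endpoint of each bin is closed, and findInterval labels the groups with integer codes rather than interval strings):
tapply(vec, findInterval(vec, seq(0, 100, 10)), sd)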
Here is a solution using dplyr:
library(dplyr)
## Create a data frame with a random variable of 1000 values between 1 and 100
df <- data.frame(x = runif(1000, 1, 100))
## Create a grouping variable, binning by 10
df$group <- findInterval(df$x, seq(10, 100, by = 10))
## Calculate SD by group
df %>%
  group_by(group) %>%
  summarise(Std.dev = sd(x))
# A tibble: 10 x 2
   group Std.dev
   <int>   <dbl>
 1     0    2.58
 2     1    2.88
 3     2    2.90
 4     3    2.71
 5     4    2.84
 6     5    2.90
 7     6    2.88
 8     7    2.68
 9     8    2.98
10     9    2.89

Two columns from a single column of values in R

I have data in a single column and want to convert it into two columns:
beta
2
.002
52
.06
61
0.09
70
0.12
85
0.92
I want it in two columns, like this:
col1 col2
2 0.002
52 0.06
61 0.09
70 0.12
85 0.92
Can anyone please help me sort this out?
We can unlist the dataframe and convert it into a matrix with nrow(df)/2 rows:
data.frame(matrix(unlist(df), nrow = nrow(df)/2, byrow = TRUE))
# X1 X2
#1 2 0.002
#2 52 0.060
#3 61 0.090
#4 70 0.120
#5 85 0.920
We can use a logical index to create the two columns:
i1 <- c(TRUE, FALSE)
df2 <- data.frame(col1 = df1$beta[i1], col2 = df1$beta[!i1])
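The c(TRUE, FALSE) index works because logical indices are recycled along the vector. An equivalent minimal sketch using matrix, which pairs consecutive values into rows when byrow = TRUE:
m <- matrix(df1$beta, ncol = 2, byrow = TRUE)
df2 <- data.frame(col1 = m[, 1], col2 = m[, 2])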

How can I specify probability values when invoking quantiles within apply in R?

I have an R script that I am using to create a new data matrix consisting of bin values from a matrix of continuous data. Right now it works fine using this command:
quant.mat <- apply(input.dat,2,quantile)
But this gives me the default quantiles (0, 0.25, 0.5, 0.75, 1). What I want is to be able to specify arbitrary values (e.g. 0.2, 0.4, 0.6, 0.8). I can't seem to make it work.
You can pass arbitrary probability values to the probs parameter. For example, with a random data frame:
input.dat
A B C D
1 78 12 43 12
2 23 12 42 13
3 14 42 11 99
4 49 94 27 72
apply(input.dat, 2, quantile, probs = c(0.2, 0.4, 0.6, 0.8))
A B C D
20% 19.4 12.0 20.6 12.6
40% 28.2 18.0 30.0 24.8
60% 43.8 36.0 39.0 60.2
80% 60.6 62.8 42.4 82.8
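If the desired probabilities are evenly spaced, seq() can generate them rather than listing each one; a small usage sketch:
apply(input.dat, 2, quantile, probs = seq(0.2, 0.8, by = 0.2))  # same as c(0.2, 0.4, 0.6, 0.8)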
You can pass the probabilities you like to the quantile function. For example:
values <- runif(100, 0, 1)
quantile(values, c(0.2,0.4,0.6,0.8))
Does it answer your question?

R Programming issue intervals

I'm trying to figure out a formula to average the max and min numbers of each interval.
x <- sample(10:40, 100, rep = TRUE)
factorx <- factor(cut(x, breaks = nclass.Sturges(x)))
xout <- as.data.frame(table(factorx))
xout <- transform(xout, cumFreq = cumsum(Freq), relative = prop.table(Freq))
Running the above code in R, I get the following:
xout
factorx Freq cumFreq relative
1 (9.97,13.8] 14 14 0.14
2 (13.8,17.5] 13 27 0.13
3 (17.5,21.2] 16 43 0.16
4 (21.2,25] 5 48 0.05
5 (25,28.8] 11 59 0.11
6 (28.8,32.5] 8 67 0.08
7 (32.5,36.2] 16 83 0.16
8 (36.2,40] 17 100 0.17
What I want to know is if there is a way to calculate the midpoint of each interval. For example, it would be:
(13.8 + 9.97)/2
It's called the class midpoint in statistics I believe.
Here's a one-liner that is probably close to what you want:
> sapply(strsplit(levels(xout$factorx), ","), function(x) sum(as.numeric(gsub("[[:space:]]", "", chartr(old = "(]", new = " ", x))))/2)
[1] 11.885 15.650 19.350 23.100 26.900 30.650 34.350 38.100
One possible solution is to split on "(", "," and "]" (xout is your dataframe):
x1 <- strsplit(as.character(xout$factorx), ",|\\(|]")
x2 <- do.call(rbind, x1)
xout$lower <- as.numeric(x2[, 2])
xout$higher <- as.numeric(x2[, 3])
xout$ave <- rowMeans(xout[, c("lower", "higher")])
> head(xout, 3)
      factorx Freq cumFreq relative lower higher    ave
1 (9.97,13.7]   15      15     0.15  9.97   13.7 11.835
2 (13.7,17.5]   14      29     0.14 13.70   17.5 15.600
3 (17.5,21.2]   12      41     0.12 17.50   21.2 19.350
