How to sum absolute values of multiple columns in R

I'd like to have the sum of absolute values of multiple columns with certain characteristics, say their names end in _s.
set.seed(154)
d <- data.frame(a_s = sample(-10:10,6,replace=F),b_s = sample(-5:10,6,replace=F), c = sample(-10:5,6,replace=F))
d$s <- abs(d$a_s)+abs(d$b_s)
where the output is column s below:
a_s b_s c s
4 8 -2 12
10 6 -8 16
-10 -1 1 11
0 2 4 2
5 1 -3 6
8 -5 5 13
I can use d$ss <- rowSums(d[,grepl('_s',colnames(d))]) to sum the values but not the absolute values.
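One way to get the sum of absolute values is to wrap the selected columns in abs() before calling rowSums(); a minimal base-R sketch, anchoring the pattern with '_s$' so it only matches names that actually end in _s:
d$s <- rowSums(abs(d[, grepl('_s$', colnames(d)), drop = FALSE]))  # abs() first, then row-wise sum
The drop = FALSE keeps the selection a data frame even if only one column matches the pattern.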

Related

Randomly select number (without repetition) for each group in R

I have the following dataframe containing a variable "group" and a variable "number of elements per group"
group elements
1 3
2 1
3 14
4 10
.. ..
.. ..
30 5
Then I have a bunch of numbers going from 1 to (let's say) 30.
Summing "elements" gives 900. What I want is to randomly select numbers from 1 to 30 and assign them to each group until the number of elements for that group is filled; each number should appear 30 times in total.
Thus, for group 1, I want to randomly select 3 numbers from 1 to 30,
for group 2, 1 number from 1 to 30, and so on until all of the groups are filled.
the final table should look like this:
group number(randomly selected)
1 7
1 20
1 7
2 4
3 21
3 20
...
any suggestions on how I can achieve this?
In base R, if you have df like this...
df
group elements
1 3
2 1
3 14
Then you can do this...
data.frame(group = rep(df$group,                     # repeat each group no. ...
                       df$elements),                 # ...elements times
           number = unlist(sapply(df$elements,       # for each group's element count...
                                  sample.int,        # ...sample that many numbers
                                  n = 30,            # from 1 to 30
                                  replace = FALSE))) # without duplicates
group number
1 1 19
2 1 15
3 1 28
4 2 15
5 3 20
6 3 18
7 3 27
8 3 10
9 3 23
10 3 12
11 3 25
12 3 11
13 3 14
14 3 13
15 3 16
16 3 26
17 3 22
18 3 7
Give this a try:
df <- read.table(text = "group elements
1 3
2 1
3 14
4 10
30 5", header = TRUE)
# reproducibility
set.seed(1)
df_split2 <- do.call("rbind",
                     lapply(split(df, df$group),
                            function(m) cbind(m,
                                              `number(randomly selected)` =
                                                sample(1:30, replace = TRUE,
                                                       size = m$elements),
                                              row.names = NULL)))
# drop the elements column
df_split2$elements <- NULL
head(df_split2)
#> group number(randomly selected)
#> 1.1 1 25
#> 1.2 1 4
#> 1.3 1 7
#> 2 2 1
#> 3.1 3 2
#> 3.2 3 29
The split function splits df into chunks based on the group column. We then take those smaller data frames and add a column to each by sampling 1:30 a total of elements times. Finally we do.call rbind on this list to bind everything back together.
You have to generate a new data frame repeating each group elements times, and then with sample you can generate the exact number of random numbers:
data<-data.frame(group=c(1,2,3,4,5),
elements=c(2,5,2,1,3))
data.elements<-data.frame(group=rep(data$group,data$elements),
number=sample(1:30,sum(data$elements)))
The result:
group number
1 1 9
2 1 4
3 2 29
4 2 28
5 2 18
6 2 7
7 2 25
8 3 17
9 3 22
10 4 5
11 5 3
12 5 8
13 5 26
I solved it as follows:
random_sample <- rep(1:30, each=30)
random_sample <- sample(random_sample)
Then I create a data frame combining this variable with a group variable in which each group is repeated by its number of elements, roughly as sketched below.
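Assembling the pieces, a sketch of that last step (using the df of groups and elements from the answers above, and assuming sum(df$elements) is exactly 900 so the shuffled pool and the group column have the same length):
random_sample <- sample(rep(1:30, each = 30))               # shuffled pool, each number 30 times
result <- data.frame(group  = rep(df$group, df$elements),   # one row per element of each group
                     number = random_sample)                 # assign the shuffled pool
This guarantees each number appears exactly 30 times overall, but it does not by itself prevent the same number from appearing twice within one group.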

summation for multiple columns dynamically

Hi, I have a dataframe with multiple columns, i.e. the first 5 columns are my metadata and the remaining columns (their count will always be even) are the actual columns that need to be calculated with the
formula: (col6*col9) + (col7*col10) + (col8*col11)
country<-c("US","US","US","US")
name <-c("A","B","c","d")
dob<-c(2017,2018,2018,2010)
day<-c(1,4,7,9)
hour<-c(10,11,2,4)
a <-c(1,3,4,5)
d<-c(1,9,4,0)
e<-c(8,1,0,7)
f<-c(10,2,5,6)
j<-c(1,4,2,7)
m<-c(1,5,7,1)
df=data.frame(country,name,dob,day,hour,a,d,e,f,j,m)
How do I get the final summation if I have more columns? I have tried the code below:
df$final <-(df$a*df$f)+(df$d*df$j)+(df$e*df$m)
Here is one way to generalize the computation:
x <- ncol(df) - 5   # number of value columns (always even)
df$final <- rowSums(df[6:(5 + x/2)] * df[(ncol(df) - x/2 + 1):ncol(df)])   # first half times second half
# country name dob day hour a d e f j m final
# 1 US A 2017 1 10 1 1 8 10 1 1 19
# 2 US B 2018 4 11 3 9 1 2 4 5 47
# 3 US c 2018 7 2 4 4 0 5 2 7 28
# 4 US d 2010 9 4 5 0 7 6 7 1 37
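An equivalent way to express the same idea is to split the value columns into their first and second halves explicitly; a sketch, assuming the first 5 columns are metadata and final has not been added to df yet:
vals <- df[, -(1:5)]                      # keep only the value columns
half <- ncol(vals) / 2                    # their count is always even
df$final <- rowSums(vals[, 1:half] * vals[, half + (1:half)])
On the example data this reproduces the final column shown above.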

transposing dataframe by column, finding column minimum and returning index

I have a dataframe p1. I would like to transpose it by column a, find the minimum of each row, and return the column name that holds that minimum value.
a=c(0,1,2,3,4,0,1,2,3,4)
b=c(10,20,30,40,50,9,8,7,6,5)
p1=data.frame(a,b)
p1
> p1
a b
1 0 10
2 1 20
3 2 30
4 3 40
5 4 50
6 0 9
7 1 8
8 2 7
9 3 6
10 4 5
The final required answer
0 1 2 3 4 row_minimum column_index_of_minimum
10 20 30 40 50 10 0
9 8 7 6 5 5 4
I tried many things, but the key was ave(p1$a, p1$a, FUN = seq_along), which allowed me to separate b into groups based on the number of times each value of a had already appeared.
myans <- setNames(data.frame(do.call(rbind,
                                     lapply(split(p1, ave(p1$a, p1$a, FUN = seq_along)),
                                            function(x) x[, 2]))),
                  nm = rbind(p1$a[ave(p1$a, p1$a, FUN = seq_along) == 1]))
minimum <- apply(myans, 1, min)
index <- colnames(myans)[apply(myans, 1, which.min)]
myans$min <- minimum
myans$index <- index
myans
# 0 1 2 3 4 min index
#1 10 20 30 40 50 10 0
#2 9 8 7 6 5 5 4
Consider using a running group count followed by an aggregate and reshape:
# RUNNING GROUP COUNT
p1$grpcnt <- sapply(seq(nrow(p1)), function(i) sum(p1[1:i, c("a")]==p1$a[[i]]))
# MINIMUM OF B BY GROUP COUNT MERGING TO RETRIEVE A VALUE
aggdf <- setNames(merge(aggregate(b ~ grpcnt, p1, FUN = min), p1, by = "b")[c("grpcnt.x", "b", "a")],
                  c("grpcnt", "row_minimum", "column_index_of_minimum"))
# RESHAPE/TRANSPOSE LONG TO WIDE
reshapedf <- setNames(reshape(p1, timevar = c("a"), idvar = c("grpcnt"), direction = "wide"),
                      c("grpcnt", paste(unique(p1$a))))
# FINAL MERGE
finaldf <- merge(reshapedf, aggdf, by="grpcnt")[-1]
finaldf
# 0 1 2 3 4 row_minimum column_index_of_minimum
# 1 10 20 30 40 50 10 0
# 2 9 8 7 6 5 5 4
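If a always cycles through the same values in the same order (as it does in p1), a shorter base-R sketch is to fill a matrix by row; note that this regular layout is an assumption about the data, not something the question guarantees:
vals <- matrix(p1$b, ncol = length(unique(p1$a)), byrow = TRUE)   # one output row per cycle of a
wide <- as.data.frame(vals)
names(wide) <- unique(p1$a)
wide$row_minimum <- apply(vals, 1, min)
wide$column_index_of_minimum <- unique(p1$a)[apply(vals, 1, which.min)]
The resulting wide data frame matches the required answer, with row minima 10 and 5 and minimum columns 0 and 4.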

Compute difference between rows in R and set the first difference to zero

Hi everybody, I am trying to solve a little problem in R. I want to compute the difference between consecutive rows in a dataframe. My dataframe looks like this:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
I want to create a new column named diff.var. This column stores the row-to-row differences of a variable. One possible solution is the diff() function. When I used this function I got this:
diff(df$x4)
[1] 2 6 -10 1 0 8 3
That works fine, but when I try to add it to my dataframe using df$diff.var = diff(df$x4) I get this:
Error in `$<-.data.frame`(`*tmp*`, "diff.var", value = c(2, 6, -10, 1, :
replacement has 7 rows, data has 8
Because the first row doesn't have a previous row to compute a difference from, I want to set it to zero. I would like to get something like this:
ID x2 x3 x4 diff.var
1 8 11 2 0
2 7 12 4 2
3 6 13 10 6
4 5 14 0 -10
5 4 15 1 1
6 3 16 1 0
7 2 17 9 8
8 1 18 12 3
Here the first element of diff.var is zero because it doesn't have a previous element. I would like to build a function that sets the first element of diff.var to zero and computes the differences for the following rows. I want to create a new dataframe with all variables plus diff.var, because ID is used for later analysis together with diff.var. diff() alone doesn't let me create this new variable. Thanks for your help.
This question has been asked before on this forum and answers can be found elsewhere. Anyway, do what Frank suggests:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
df$vardiff <- c(0, diff(df$x4))
df
ID x2 x3 x4 vardiff
1 1 8 11 2 0
2 2 7 12 4 2
3 3 6 13 10 6
4 4 5 14 0 -10
5 5 4 15 1 1
6 6 3 16 1 0
7 7 2 17 9 8
8 8 1 18 12 3
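Since the question asks for a reusable function, the same idea can be wrapped in a small helper (the name diff_from_previous is just illustrative):
diff_from_previous <- function(x) c(0, diff(x))  # first differences, padded with 0 at the start
df$diff.var <- diff_from_previous(df$x4)
This keeps all original columns and adds diff.var, so ID remains available for later analysis.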

In R, how do I set a value for a variable based on the change from the prior (or following) row?

Given a data frame as follows:
id<-c(1,1,1,1,1,1,2,2,2,2,2,2)
t<-c(6,8,9,11,12,14,55,57,58,60,62,63)
p<-c("a","a","a","b","b","b","a","a","b","b","b","b")
df<-data.frame(id,t,p)
row id t p
1 1 6 a
2 1 8 a
3 1 9 a
4 1 11 b
5 1 12 b
6 1 14 b
7 2 55 a
8 2 57 a
9 2 58 b
10 2 60 b
11 2 62 b
12 2 63 b
I want to create a new variable 'ta' such that the value of ta is:
Zero for the row in which 'p' changes from a to b for a given ID (rows 4 and 9) (this I can do)
Within each unique id, when p is 'a', the value of ta should count down from zero by the change in t between the row in question and the row above it. For example, for row 3, the value of ta should be 0 - (11-9) = -2.
Within each unique id, when p is 'b', the value of ta should count up from zero by the change in t between the row in question and the row below it. For example, for row 5, the value of ta should be 0 + (12-11) = 1.
Thus, when complete, the data frame should look as follows:
row id t p ta
1 1 6 a -5
2 1 8 a -3
3 1 9 a -2
4 1 11 b 0
5 1 12 b 1
6 1 14 b 3
7 2 55 a -3
8 2 57 a -1
9 2 58 b 0
10 2 60 b 2
11 2 62 b 4
12 2 63 b 5
I've been playing around with loops and cumsum() and head() and tail() and can't quite make this kind of within id/within condition summing work. There are a number of other questions about working with values from previous or following rows, but I can't quite reshape any of those techniques to work here. Your thoughts are greatly appreciated.
Here you go. This is a split-apply-combine strategy of breaking everything up by id, establishing the transition point between p=='a' and p=='b' and then subtracting values above and below that. It only works if your data are actually ordered in the way you show them here.
do.call('rbind',
        lapply(split(df, df$id), function(x) {
          # save values of `0` at transition points in `p`
          x <- cbind.data.frame(x, ta = ifelse(c(0, diff(as.numeric(as.factor(x$p)))) == 1, 0, NA))
          # identify indices for those points
          w <- which(x$ta == 0)
          # handle `ta` values for `p=='b'`
          x$ta[(w+1):nrow(x)] <- x$ta[w] + (x$t[(w+1):nrow(x)] - x$t[w])
          # handle `ta` values for `p=='a'`
          x$ta[1:(w-1)] <- x$ta[w] - (x$t[w] - x$t[1:(w-1)])
          return(x)
        })
)
Result:
id t p ta
1.1 1 6 a -5
1.2 1 8 a -3
1.3 1 9 a -2
1.4 1 11 b 0
1.5 1 12 b 1
1.6 1 14 b 3
2.7 2 55 a -3
2.8 2 57 a -1
2.9 2 58 b 0
2.10 2 60 b 2
2.11 2 62 b 4
2.12 2 63 b 5
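For reference, if the rows are sorted by t within each id and every id has at least one 'b' row (both true in the example), ta is simply t minus the t of the first 'b' row of that id, so a compact base-R sketch is:
# t at the transition row (the first 'b') within each id
transition_t <- ave(ifelse(df$p == "b", df$t, NA), df$id,
                    FUN = function(v) min(v, na.rm = TRUE))
df$ta <- df$t - transition_t
This reproduces the ta column above without an explicit loop over groups.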
