Creating new column names using dplyr across and .names - r

I have the following data frame:
df <- data.frame(A_TR1=sample(10:20, 8, replace = TRUE),A_TR2=seq(2, 16, by=2), A_TR3=seq(1, 16, by=2),
B_TR1=seq(1, 16, by=2),B_TR2=seq(2, 16, by=2), B_TR3=seq(1, 16, by=2))
> df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3
1 11 2 1 1 2 1
2 12 4 3 3 4 3
3 18 6 5 5 6 5
4 11 8 7 7 8 7
5 17 10 9 9 10 9
6 17 12 11 11 12 11
7 14 14 13 13 14 13
8 11 16 15 15 16 15
What I would like to do, is subtract B_TR1 from A_TR1, B_TR2 from A_TR2, and so on and create new columns from these, similar to below:
df$x_TR1 <- (df$A_TR1 - df$B_TR1)
df$x_TR2 <- (df$A_TR2 - df$B_TR2)
df$x_TR3 <- (df$A_TR3 - df$B_TR3)
> df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1 12 2 1 1 2 1 11 0 0
2 11 4 3 3 4 3 8 0 0
3 19 6 5 5 6 5 14 0 0
4 13 8 7 7 8 7 6 0 0
5 12 10 9 9 10 9 3 0 0
6 16 12 11 11 12 11 5 0 0
7 16 14 13 13 14 13 3 0 0
8 18 16 15 15 16 15 3 0 0
I would like to name these columns "x TR1", "x TR2", etc. I tried to do the following:
xdf <- df%>%mutate(across(starts_with("A_TR"), -across(starts_with("B_TR")), .names="x TR{.col}"))
However, I get an error in mutate():
attempt to select less than one element in integerOneIndex
I also don't know how to create the proper column names, in terms of getting the numbers right -- I am not even sure the glue() syntax allows for it. Any help appreciated here.

We could use .names in the first across to replace the substring 'A' with 'x' in the column names (.col), and then subtract the second set of columns from the result
library(dplyr)
library(stringr)
df <- df %>%
mutate(across(starts_with("A_TR"),
.names = "{str_replace(.col, 'A', 'x')}") -
across(starts_with("B_TR")))
Output:
df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1 10 2 1 1 2 1 9 0 0
2 10 4 3 3 4 3 7 0 0
3 16 6 5 5 6 5 11 0 0
4 12 8 7 7 8 7 5 0 0
5 20 10 9 9 10 9 11 0 0
6 19 12 11 11 12 11 8 0 0
7 17 14 13 13 14 13 4 0 0
8 14 16 15 15 16 15 -1 0 0
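For comparison, here is a minimal base R sketch of the same pairwise subtraction. It assumes every A_ column has a B_ counterpart with the same suffix; the helper names a_cols, b_cols and x_cols are made up for illustration, and set.seed() is added only to make the sample reproducible.

```r
# Rebuild the example data (set.seed added for reproducibility)
set.seed(1)
df <- data.frame(A_TR1 = sample(10:20, 8, replace = TRUE),
                 A_TR2 = seq(2, 16, by = 2), A_TR3 = seq(1, 16, by = 2),
                 B_TR1 = seq(1, 16, by = 2), B_TR2 = seq(2, 16, by = 2),
                 B_TR3 = seq(1, 16, by = 2))

# Derive the matching B_ and x_ names from the A_ names
a_cols <- grep("^A_TR", names(df), value = TRUE)
b_cols <- sub("^A_", "B_", a_cols)
x_cols <- sub("^A_", "x_", a_cols)

# Element-wise subtraction of the two column blocks, stored under the new names
df[x_cols] <- df[a_cols] - df[b_cols]
names(df)
```

Building b_cols from a_cols, rather than selecting both sets independently, keeps the pairs aligned even if the columns are not stored in matching order.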

Related

Fill zeros for missing values in R

I am trying to deal with this problem.
I have a df with a date column and I want to count the occurrences per hour. Here is what I've done:
x <- df %>%
mutate(hora = hour(date)) %>%
select(hora) %>%
count(hora)
that gives as a result:
> x
# A tibble: 19 x 2
hora n
<int> <int>
1 0 1
2 1 1
3 3 1
4 8 4
5 9 7
6 10 10
7 11 14
8 12 10
9 13 8
10 14 4
11 15 5
12 16 12
13 17 4
14 18 12
15 19 9
16 20 5
17 21 2
18 22 4
19 23 4
As you can see, some hours that would have n=0 don't show up, like 2 or 4:7. What I want is to add the hours that are missing from x with n=0, so the table is complete.
The expected output should be something like this:
hora n
1 0 12
2 1 3
3 2 5
4 3 7
5 4 8
6 5 1
7 6 0
8 7 11
9 8 6
10 9 10
11 10 9
12 11 0
13 12 0
14 13 3
15 14 0
16 15 7
17 16 8
18 17 1
19 18 2
20 19 11
21 20 6
22 21 10
23 22 9
24 23 4
I tried creating a table with hours 0:23 and all n=0 and summing the two tables, but obviously that didn't work. I also tried x$hour <- 0:23, thinking that the missing values would be added, but that didn't work either.
You could convert hora to factor and use .drop = FALSE in count
library(dplyr)
library(lubridate)
df %>%
mutate(hora = factor(hour(date), levels = 0:23)) %>%
count(hora, .drop = FALSE)
Another option is to use complete:
df %>%
mutate(hora = hour(date)) %>%
count(hora) %>%
tidyr::complete(hora = 0:23, fill = list(n = 0))
A solution in Base R merges a vector of hours with the summarized data, and sets the missing counts to 0.
textFile <- "row hour count
1 0 1
2 1 1
3 3 1
4 8 4
5 9 7
6 10 10
7 11 14
8 12 10
9 13 8
10 14 4
11 15 5
12 16 12
13 17 4
14 18 12
15 19 9
16 20 5
17 21 2
18 22 4
19 23 4"
data <- read.table(text = textFile,header = TRUE)[-1]
hours <- data.frame(hour = 0:23)
merged <- merge(data,hours,all.y = TRUE)
merged[is.na(merged$count),"count"] <- 0
...and the output:
> head(merged)
hour count
1 0 1
2 1 1
3 2 0
4 3 1
5 4 0
6 5 0
>
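The factor idea also works without any packages. The sketch below uses a short, made-up vector of hours (hora) in place of hour(date); table() on a factor with levels 0:23 keeps the empty levels, so missing hours get a count of 0.

```r
# Hypothetical hours extracted from a date column (illustrative values only)
hora <- c(0, 1, 3, 8, 8, 9, 23)

# Fixing the levels to 0:23 makes table() report zero counts for absent hours
counts <- table(factor(hora, levels = 0:23))

# Convert to a plain data frame with one row per hour
x <- data.frame(hora = as.integer(names(counts)),
                n = as.integer(counts))
head(x)
```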

How to merge two data frames by ranges in R?

Suppose I have two data frames such as:
set.seed(123)
df0<-data.frame(pos=3:12,
count0=rbinom(10, 50, 0.5),
count2=rbinom(10, 20, 0.5))
df0
pos count0 count2
1 3 23 14
2 4 28 10
3 5 24 11
4 6 29 10
5 7 30 7
6 8 19 13
7 9 25 8
8 10 29 6
9 11 25 9
10 12 25 14
df1<-data.frame(start=c(4, 7, 11, 14),
end=c(6, 9, 12, 15),
cnv=c(1, 2, 3, 4))
df1
start end cnv
1 4 6 1
2 7 9 2
3 11 12 3
4 14 15 4
What I want is to merge df0 and df1 by matching df0$pos against the ranges given by df1$start and df1$end. If pos falls within the range start:end, fill in cnv from df1; otherwise set cnv to zero. The output for the above example would be:
pos count0 count2 cnv
1 3 23 14 0
2 4 28 10 1
3 5 24 11 1
4 6 29 10 1
5 7 30 7 2
6 8 19 13 2
7 9 25 8 2
8 10 29 6 0
9 11 25 9 3
10 12 25 14 3
We can use sapply to check, for each pos, whether it falls within any of the ranges; if it does, return the matching cnv, otherwise return 0.
df0$cnv <- sapply(df0$pos, function(x) {
inds <- x >= df1$start & x <= df1$end
if (any(inds))
df1$cnv[inds]
else 0
})
df0
# pos count0 count2 cnv
#1 3 23 14 0
#2 4 28 10 1
#3 5 24 11 1
#4 6 29 10 1
#5 7 30 7 2
#6 8 19 13 2
#7 9 25 8 2
#8 10 29 6 0
#9 11 25 9 3
#10 12 25 14 3
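An alternative sketch loops over the ranges instead of the positions. It always yields a plain vector, and if ranges were to overlap (an assumption not covered by the example) the later range simply wins. Only pos is kept in df0 here, because the count columns do not affect the result.

```r
df0 <- data.frame(pos = 3:12)
df1 <- data.frame(start = c(4, 7, 11, 14),
                  end   = c(6, 9, 12, 15),
                  cnv   = c(1, 2, 3, 4))

# Default to 0, then overwrite the positions covered by each range in turn
df0$cnv <- 0
for (i in seq_len(nrow(df1))) {
  hit <- df0$pos >= df1$start[i] & df0$pos <= df1$end[i]
  df0$cnv[hit] <- df1$cnv[i]
}
df0
```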

Need to count items from a table

I have this DF (partially shown) with 15 categories in the first column, and each cell holds a number between 1 and 15. This is just a small example; the 15 categories are repeated, each time with different numbers in the other columns.
What I need is a 16x15 matrix with the count of appearances of the values, as follows.
I could program this the old-fashioned way with IFs etc., but I am kind of lost using R.
I hope this is clear.
Any advice is welcome.
EDITED AS REQUESTED (I apologize for not being clear)
RESULTADOS DF
PREOCUPACIÓN 13 15 4 4 1 8 3 1
TRISTEZA 15 13 2 5 4 14 6 6
PERDIDA 4 11 3 2 14 12 7 10
ANGUSTIA 14 10 11 3 2 13 1 2
IMPOTENCIA 1 8 9 6 5 5 5 4
MUERTE 2 1 14 14 15 6 13 15
ENOJO 12 7 10 8 6 7 12 5
INJUSTICIA 3 9 12 7 12 2 14 13
AUSENCIA 11 14 6 1 8 11 11 11
DOLOR 5 12 5 9 7 15 8 8
CORRUPCIÓN 8 6 15 13 11 3 15 12
MIEDO 9 3 13 10 3 10 9 3
SECUESTRO 10 2 1 11 9 4 4 14
INSEGURIDAD 7 4 7 15 10 1 10 9
DESESPERACIÓN 6 5 8 12 13 9 2 7
PREOCUPACIÓN 14 2 5 4 3 8 8 7
TRISTEZA 5 7 1 8 7 9 13 9
PERDIDA 2 6 6 12 2 10 6 10
ANGUSTIA 13 3 15 9 8 11 7 4
IMPOTENCIA 12 11 7 5 10 12 12 1
MUERTE 3 10 14 2 13 13 9 2
ENOJO 11 5 10 10 11 7 11 5
INJUSTICIA 7 13 2 6 15 14 10 6
AUSENCIA 8 1 9 11 1 6 4 12
DOLOR 6 8 8 13 9 3 3 3
CORRUPCIÓN 10 15 3 14 14 15 5 11
MIEDO 9 4 13 15 4 4 14 8
SECUESTRO 4 9 11 1 12 5 15 13
INSEGURIDAD 1 12 4 7 6 1 1 14
DESESPERACIÓN 15 14 12 3 5 2 2 15
PREOCUPACIÓN 13 10 4 1 7 4 11 2
TRISTEZA 15 11 11 2 9 3 12 8
PERDIDA 2 15 7 4 15 7 3 13
ANGUSTIA 8 13 5 3 6 1 7 1
IMPOTENCIA 10 4 8 5 12 10 13 3
MUERTE 7 8 15 15 3 6 6 9
ENOJO 14 12 12 10 10 8 15 10
INJUSTICIA 4 1 13 6 1 9 2 6
AUSENCIA 12 9 1 7 8 11 1 14
DOLOR 9 14 2 12 5 2 14 12
CORRUPCIÓN 3 6 14 14 14 14 5 15
MIEDO 6 2 3 9 2 5 10 7
SECUESTRO 1 3 6 8 13 15 4 5
INSEGURIDAD 5 5 9 11 4 13 8 4
DESESPERACIÓN 11 7 10 13 11 12 9 11
...
The result I need is like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
PREOCUPACION 3 2 2 5 1 0 2 3 0 1 1 0 2 0 1
TRISTEZA 1 2 1 1 2 2 2 2 3 0 2 1 1 1 2
Using apply on every row, convert to factor and get table:
res <-
cbind.data.frame(name = df1[, 1],
t(apply(df1[, -1], 1, function(i){
table(factor(i, levels = 1:15))
})))
res
# name 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 PREOCUPACIÓN 2 1 2 2 0 2 0 1 0 0 0 0 1 0 1
# 2 TRISTEZA 0 2 0 1 2 3 0 0 1 0 0 0 1 1 1
# 3 PERDIDA 0 1 1 1 0 0 1 0 0 1 2 2 1 2 0
# 4 ANGUSTIA 2 2 1 1 0 0 0 0 1 1 1 0 1 1 1
# ...
Edit: If names are repeated across multiple rows, try the code below. Split the data frame on the first column, then loop through each piece and count the occurrences per factor level.
res <- t(data.frame(
lapply(split(df1, df1$V1), function(i){
as.numeric(table(factor(unlist(i[, -1]), levels = 1:15)))
})))
res
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# ANGUSTIA 4 0 2 1 1 1 2 2 1 0 1 0 2 0 1
# AUSENCIA 4 2 0 1 0 1 1 2 2 0 2 2 0 1 0
# CORRUPCIÓN 0 0 4 0 2 1 0 0 0 1 1 0 0 6 3
# DESESPERACIÓN 0 2 1 2 1 0 1 0 1 1 3 2 1 1 2
# ...
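To see the split-and-count idea on data small enough to check by hand, here is a sketch with a made-up three-level toy table (the names toy, name, v1, v2 and levels_wanted are illustrative and not from the question).

```r
# Toy data: two categories, each repeated on two rows, values in 1:3
toy <- data.frame(name = c("A", "B", "A", "B"),
                  v1 = c(1, 2, 3, 3),
                  v2 = c(3, 2, 1, 1))
levels_wanted <- 1:3

# Split the value columns by category, then count each level per category
counts <- t(vapply(
  split(toy[, -1], toy$name),
  function(block) as.integer(table(factor(unlist(block), levels = levels_wanted))),
  integer(length(levels_wanted))
))
colnames(counts) <- levels_wanted
counts
```

Passing levels to factor() is what guarantees a column for every value, including values a category never takes.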

Replacing values in one column with another based on a 3rd column matching a 4th

I'm working with the following example:
Original Modified New_Orig New
1 2 1 0
2 4 1 0
3 6 4 0
4 8 5 0
5 10 5 0
6 12 5 0
7 14 5 0
8 16 5 0
9 18 9 0
10 20 10 0
I want to replace values in New with values from Modified if New_Orig matches with any value in Original.
Ideally New will look like this:
New
2
2
8
10
10
10
10
10
18
20
Any help much appreciated.
Kind regards,
Here, a new column New is created:
within(dat, New <- Modified*(New_Orig == Original))
Original Modified New_Orig New
1 1 2 1 2
2 2 4 1 0
3 3 6 4 0
4 4 8 5 0
5 5 10 5 10
6 6 12 5 0
7 7 14 5 0
8 8 16 5 0
9 9 18 9 18
10 10 20 10 20
Update
Match values and choose appropriate value from Modified:
within(dat, New <- Modified[match(New_Orig, Original)])
Original Modified New_Orig New
1 1 2 1 2
2 2 4 1 2
3 3 6 4 8
4 4 8 5 10
5 5 10 5 10
6 6 12 5 10
7 7 14 5 10
8 8 16 5 10
9 9 18 9 18
10 10 20 10 20
Since @rcs gave exactly the answer I would give, I thought I would show you an alternative approach to creating this "New" column rather than initializing it as all zeroes.
data <- data.frame(Original = 1:10,
Modified = seq(2, 20, 2),
New_Orig = c(1, 1, 4, 5, 5,
5, 5, 5, 9, 10))
within(data, {
New <- ifelse(Original == New_Orig, Modified, 0)
})
# Original Modified New_Orig New
# 1 1 2 1 2
# 2 2 4 1 0
# 3 3 6 4 0
# 4 4 8 5 0
# 5 5 10 5 10
# 6 6 12 5 0
# 7 7 14 5 0
# 8 8 16 5 0
# 9 9 18 9 18
# 10 10 20 10 20
Try the following:
v <- dat$New_Orig==dat$Original # this gives a logical vector,
# you could also use which(dat$New_Orig==dat$Original)
# to obtain the indices
dat[v, "New"] <- dat[v, "Modified"]
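One caveat with the match() lookup is that it returns NA when a New_Orig value has no counterpart in Original. A minimal sketch that keeps the existing New value in that case follows; the data frame is rebuilt from the question, and idx is a made-up helper name.

```r
dat <- data.frame(Original = 1:10,
                  Modified = seq(2, 20, 2),
                  New_Orig = c(1, 1, 4, 5, 5, 5, 5, 5, 9, 10),
                  New = 0)

# Position of each New_Orig value within Original (NA when there is no match)
idx <- match(dat$New_Orig, dat$Original)

# Look up Modified where a match exists; otherwise leave New unchanged
dat$New <- ifelse(is.na(idx), dat$New, dat$Modified[idx])
dat
```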

making sort order in merge() numeric

I have two easy matrices (or df's) to merge:
a <- cbind(one=0:15, two=0:15, three=0:15)
b <- cbind(one=0:15, two=0:15, three=0:15)
#a <- data.frame(one=0:15, two=0:15, three=0:15)
#b <- data.frame(one=0:15, two=0:15, three=0:15)
No problem: after sorting on column one, column one is output ascending nicely from 0 to 15:
merge(a,b,by=c("one"), sort=T)
one two.x three.x two.y three.y
1 0 0 0 0 0
2 1 1 1 1 1
3 2 2 2 2 2
4 3 3 3 3 3
5 4 4 4 4 4
6 5 5 5 5 5
7 6 6 6 6 6
8 7 7 7 7 7
9 8 8 8 8 8
10 9 9 9 9 9
11 10 10 10 10 10
12 11 11 11 11 11
13 12 12 12 12 12
14 13 13 13 13 13
15 14 14 14 14 14
16 15 15 15 15 15
But wait: when merging on two columns --- both numeric --- the sort order suddenly seems alphabetic.
merge(a,b,by=c("one", "two"), sort=T)
one two three.x three.y
1 0 0 0 0
2 1 1 1 1
3 10 10 10 10
4 11 11 11 11
5 12 12 12 12
6 13 13 13 13
7 14 14 14 14
8 15 15 15 15
9 2 2 2 2
10 3 3 3 3
11 4 4 4 4
12 5 5 5 5
13 6 6 6 6
14 7 7 7 7
15 8 8 8 8
16 9 9 9 9
Eww, gross. What's going on? And what do I do?
Based on @joran's comments, it looks like if you want the rows sorted in a particular order, you should set that order explicitly yourself.
If the order you'd like is one in which the rows have increasing values of one or more columns, you can use the function order(), like this:
X <- merge(a, b, by = c("one", "two"))
X[with(X, order(one, two)),]
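Put together as a self-contained sketch, using the data frame versions of a and b from the question: merging with sort = FALSE and ordering afterwards makes the result independent of how merge() sorts multi-column keys. Resetting the row names is an optional extra.

```r
a <- data.frame(one = 0:15, two = 0:15, three = 0:15)
b <- data.frame(one = 0:15, two = 0:15, three = 0:15)

# Skip merge()'s own sorting and impose a numeric order explicitly
X <- merge(a, b, by = c("one", "two"), sort = FALSE)
X <- X[order(X$one, X$two), ]
rownames(X) <- NULL
X
```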
