Label columns with an ascending number [duplicate] - r

This question already has answers here:
Make sequential numeric column names prefixed with a letter
(3 answers)
Closed 2 years ago.
I want to prefix the column names with an ascending number, so that in a bigger dataset I can sort the columns back into the right order.
How do I code this? Thanks!
set.seed(8)
id <- 1:6
diet <- rep(c("A","B"),3)
period <- rep(c(1,2),3)
score1 <- sample(1:100,6)
score2 <- sample(1:100,6)
score3 <- sample(1:100,6)
df <- data.frame(id, diet, period, score1, score2,score3)
df
id diet period score1 score2 score3
1 1 A 1 47 30 44
2 2 B 2 21 93 54
3 3 A 1 79 76 14
4 4 B 2 64 63 90
5 5 A 1 31 44 1
6 6 B 2 69 9 26
It should look like:
x1id x2diet x3period x4score1 x5score2 x6score3
1 1 A 1 47 30 44
2 2 B 2 21 93 54
3 3 A 1 79 76 14
4 4 B 2 64 63 90
5 5 A 1 31 44 1
6 6 B 2 69 9 26
I was thinking of something like this, but something is missing:
colnames(wellbeing) <- paste(1:ncol, colnames(wellbeing))

Other options:
colnames(df) <- paste0('x', 1:dim(df)[2], colnames(df))
or
library(dplyr)  # provides the %>% pipe
df %>%
dplyr::rename_all(~ paste0('x', 1:ncol(df), .))
Both methods would yield the same output:
# x1id x2diet x3period x4score1 x5score2 x6score3
#1 1 A 1 96 1 52
#2 2 B 2 52 93 75
#3 3 A 1 55 50 68
#4 4 B 2 79 3 9
#5 5 A 1 12 6 76
#6 6 B 2 42 86 62

You can use:
names(df) <- paste0('x', seq_along(df), names(df))
df
# x1id x2diet x3period x4score1 x5score2 x6score3
#1 1 A 1 96 1 52
#2 2 B 2 52 93 75
#3 3 A 1 55 50 68
#4 4 B 2 79 3 9
#5 5 A 1 12 6 76
#6 6 B 2 42 86 62
Maybe add an underscore? Starting again from the original df:
names(df) <- paste0('x', seq_along(df), "_", names(df))
names(df)
#[1] "x1_id" "x2_diet" "x3_period" "x4_score1" "x5_score2" "x6_score3"
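Since the goal is to sort the columns later, note that once there are ten or more columns, plain numbers sort alphabetically (x10 comes before x2). A minimal base R sketch, using a hypothetical 12-column data frame, zero-pads the index with formatC() so that sorting by name restores the original order:

```r
# Hypothetical 12-column data frame standing in for the bigger dataset
df <- as.data.frame(matrix(1:24, nrow = 2))
names(df) <- paste0("v", seq_along(df))  # "v1" ... "v12"

# Without padding, alphabetical order differs from numeric order:
# "x10" sorts right after "x1" and before "x2"
unpadded <- paste0("x", seq_along(df))
sort(unpadded)

# Zero-pad the index to the width of the largest column number
width <- nchar(ncol(df))
names(df) <- paste0("x", formatC(seq_along(df), width = width, flag = "0"),
                    "_", names(df))

# Shuffle the columns, then restore the original order by sorting on the names
shuffled <- df[, c(5, 12, 1, 9, 2, 3, 4, 6, 7, 8, 10, 11)]
restored <- shuffled[, sort(names(shuffled))]
names(restored)
```

The padding width is derived from ncol(df), so a data frame with 100 or more columns would get three-digit prefixes automatically.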

Here is an mapply approach:
names(df) <- mapply(paste0, paste0("x", seq_along(df)), names(df), USE.NAMES = FALSE)

Related

How do I regroup data?

I am looking to change the structure of my data frame, but I am not really sure how to do it. I am not even sure how to word the question.
ID <- c(1,8,6,2,4)
a <- c(111,94,85,76,72)
b <- c(75,37,86,55,62)
dataframe <- data.frame(ID,a,b)
ID a b
1 1 111 75
2 8 94 37
3 6 85 86
4 2 76 55
5 4 72 62
Above is the code with its output. I want the output to look like the following, but the only way I know to do this is to type it in manually. Is there another way? I have quite a large data set that I would like to change, and doing it manually would take forever.
ID letter value
1 1 a 111
2 1 b 75
3 8 a 94
4 8 b 37
5 6 a 85
6 6 b 86
7 2 a 76
8 2 b 55
9 4 a 72
10 4 b 62
We can use pivot_longer from tidyr:
library(dplyr)
library(tidyr)
dataframe %>%
pivot_longer(cols = a:b, names_to = 'letter')
Output:
# A tibble: 10 × 3
ID letter value
<dbl> <chr> <dbl>
1 1 a 111
2 1 b 75
3 8 a 94
4 8 b 37
5 6 a 85
6 6 b 86
7 2 a 76
8 2 b 55
9 4 a 72
10 4 b 62
A base R option using reshape:
df <- reshape(dataframe, direction = "long",
v.names = "value",
varying = 2:3,
times = names(dataframe)[2:3],
timevar = "letter",
idvar = "ID")
df <- df[ order(match(df$ID, dataframe$ID)), ]
row.names(df) <- NULL
Output
ID letter value
1 1 a 111
2 1 b 75
3 8 a 94
4 8 b 37
5 6 a 85
6 6 b 86
7 2 a 76
8 2 b 55
9 4 a 72
10 4 b 62
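For a simple case like this, where every ID has exactly one a and one b value, the long table can also be built directly in base R with rep() and t(). This is a sketch under that assumption (no missing cells, all value columns numeric); value_cols is a name introduced here for illustration:

```r
dataframe <- data.frame(ID = c(1, 8, 6, 2, 4),
                        a  = c(111, 94, 85, 76, 72),
                        b  = c(75, 37, 86, 55, 62))

value_cols <- c("a", "b")  # columns to stack

long <- data.frame(
  # each ID repeated once per value column
  ID     = rep(dataframe$ID, each = length(value_cols)),
  # column names cycle a, b, a, b, ...
  letter = rep(value_cols, times = nrow(dataframe)),
  # t() turns each original row into a column, so as.vector() reads row by row
  value  = as.vector(t(as.matrix(dataframe[value_cols]))),
  stringsAsFactors = FALSE
)
long
```

Unlike reshape(), this keeps the original ID order without a separate order() step, but it relies on the wide table being complete.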

R: How to merge a new data frame into several other data frames in a list

I have several separate data frames that I would like to keep separate, because merging them together would create a very large object.
However, there are variables from another data frame that I would like to merge with all of them now.
Here is an example of what I would like to do:
df1 <- data.frame(ID1 = c(1:10), Var1 = rep(c(1,0),5))
df2 <- data.frame(ID1 = c(1:10), Var2 = c(21:30))
dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))
mergewith <- data.frame(ID1 = c(1:10), ID2 = c(41:50))
My goal is that df1 and df2 will look like this:
df1
ID1 Var1 ID2
1 1 1 41
2 2 0 42
3 3 1 43
4 4 0 44
5 5 1 45
6 6 0 46
7 7 1 47
8 8 0 48
9 9 1 49
10 10 0 50
df2
ID1 Var2 ID2
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 10 30 50
What I have tried so far is:
dat = lapply(dfs,function(x){
merge(names(x), mergewith, by = "ID1");x})
list2env(dat,.GlobalEnv)
However, I then get the following error:
"'by' must specify a uniquely valid column"
Is it possible to do this without using a loop?
You can try Map:
Map(function(x, y) merge(x, y, by = "ID1"), dfs, list(mergewith))
[[1]]
ID1 Var1 ID2
1 1 1 41
2 2 0 42
3 3 1 43
4 4 0 44
5 5 1 45
6 6 0 46
7 7 1 47
8 8 0 48
9 9 1 49
10 10 0 50
[[2]]
ID1 Var2 ID2
1 1 21 41
2 2 22 42
3 3 23 43
4 4 24 44
5 5 25 45
6 6 26 46
7 7 27 47
8 8 28 48
9 9 29 49
10 10 30 50
You can use lapply to merge each data frame in dfs with mergewith, then use list2env to write the changed data frames back to the global environment.
list2env(lapply(dfs, function(x) merge(x, mergewith, by = 'ID1')), .GlobalEnv)
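One thing to watch with both answers: merge() performs an inner join by default, so any row whose ID1 has no match in mergewith is silently dropped. If every row should be kept, all.x = TRUE turns it into a left join. A sketch, assuming a hypothetical mergewith that only covers IDs 1 to 8:

```r
df1 <- data.frame(ID1 = 1:10, Var1 = rep(c(1, 0), 5))
df2 <- data.frame(ID1 = 1:10, Var2 = 21:30)
dfs <- list(df1 = df1, df2 = df2)

# Hypothetical lookup table that is missing IDs 9 and 10
mergewith <- data.frame(ID1 = 1:8, ID2 = 41:48)

# all.x = TRUE keeps every row of x; unmatched rows get NA in ID2
merged <- lapply(dfs, function(x) merge(x, mergewith, by = "ID1", all.x = TRUE))

# Overwrite df1 and df2 in the global environment, as in the answer above
invisible(list2env(merged, envir = .GlobalEnv))
```

With the complete mergewith from the question, both versions give the same result.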

Creating a new dataset for each combination of rows in groups

I'm trying to create a dataset for each combination of rows from separate groups. Ideally, one row from each group would be selected, and there would be one dataset for every combination. My dataset looks similar in structure to the sample below:
Name Group Stat1 Stat2
1 1 a 63 38
2 2 a 33 62
3 3 b 3 66
4 4 b 57 67
5 5 c 42 69
6 6 c 47 14
7 7 c 16 10
8 8 d 21 46
9 9 d 72 1
I'm trying to get the first resulting dataset to look like this:
Name Group Stat1 Stat2
1 1 a 63 38
2 3 b 3 66
3 5 c 42 69
4 8 d 21 46
The second dataset should look like this:
Name Group Stat1 Stat2
1 1 a 63 38
2 3 b 3 66
3 5 c 42 69
4 9 d 72 1
This continues until every combination has been exhausted. I've tried strategies using apply functions and combn, but cannot seem to get the result I want. This does not seem too challenging conceptually, so I'm not sure what I'm missing.
Any help would be greatly appreciated! Thanks in advance!
Lots of ways to approach this. A simple solution is to generate all 4-row combinations, then subset to those whose Group values are all distinct. I named your data df and assumed Name is a unique row id equal to the row number, since the code uses it to index rows. If that's not true, you could replace df$Name with 1:nrow(df).
# All 4 row combos of row ids
combs <- combn(df$Name, 4)
# Match group labels to row ids
g <- matrix(df$Group[combs], nrow = 4)
# 4 row combs filtered to all distinct group vals
# (drop = FALSE keeps a matrix even if only one combo survives)
combs <- combs[, apply(g, 2, function(i) all(!duplicated(i))), drop = FALSE]
# For each 4 row combo, extract rows from the dataframe
final_list <- apply(combs, 2, function(i) df[i,])
final_list[1:3]
[[1]]
Name Group Stat1 Stat2
1 1 a 63 38
3 3 b 3 66
5 5 c 42 69
8 8 d 21 46
[[2]]
Name Group Stat1 Stat2
1 1 a 63 38
3 3 b 3 66
5 5 c 42 69
9 9 d 72 1
[[3]]
Name Group Stat1 Stat2
1 1 a 63 38
3 3 b 3 66
6 6 c 47 14
8 8 d 21 46
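Because exactly one row is wanted from each group, an alternative to filtering all 4-row combinations is to build the Cartesian product of each group's rows directly with expand.grid(). This skips the combinations that would be thrown away anyway, which matters as the data grows, since combn() over n rows yields choose(n, 4) candidates. A sketch, assuming the sample data as df; the two rev() calls only make the last group vary fastest, so the first results come out in the same order as above:

```r
df <- data.frame(Name  = 1:9,
                 Group = c("a", "a", "b", "b", "c", "c", "c", "d", "d"),
                 Stat1 = c(63, 33, 3, 57, 42, 47, 16, 21, 72),
                 Stat2 = c(38, 62, 66, 67, 69, 14, 10, 46, 1))

# Row positions in each group: list(a = 1:2, b = 3:4, c = 5:7, d = 8:9)
rows_by_group <- split(seq_len(nrow(df)), df$Group)

# One row per combination, one column per group; rev() twice keeps the
# columns in a, b, c, d order while making d vary fastest
combos <- rev(expand.grid(rev(rows_by_group)))

# Pull the chosen rows out of df for each combination
final_list <- lapply(seq_len(nrow(combos)),
                     function(k) df[unlist(combos[k, ]), ])
length(final_list)  # 2 * 2 * 3 * 2 = 24
```

Because it indexes by row position rather than by Name, this version does not need Name to equal the row number.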

How to find the normalized values within each level of a variable in R

I have a categorical variable B with three levels (1, 2, 3) and another variable A with some values. Sample data is as follows:
A B
22 1
23 1
12 1
34 1
43 2
47 2
49 2
65 2
68 3
70 3
75 3
82 3
120 3
. .
. .
. .
. .
For every level of B (say, level 1), I need to calculate (A - min) / (max - min), where min and max are taken within that level. I then need to do the same for the other levels (2 and 3).
Solution using dplyr:
set.seed(1)
df=data.frame(A=round(rnorm(21,50,10)),B=rep(1:3,each=7))
library(dplyr)
df %>% group_by(B) %>% mutate(C= (A-min(A))/(max(A)-min(A)))
The output looks like this:
# A tibble: 21 x 3
# Groups: B [3]
A B C
<dbl> <int> <dbl>
1 44 1 0.0833
2 52 1 0.417
3 42 1 0
4 66 1 1
5 53 1 0.458
6 42 1 0
7 55 1 0.542
8 57 2 0.784
9 56 2 0.757
10 47 2 0.514
# ... with 11 more rows
You could use the tapply function:
x = read.table(text="A B
22 1
23 1
12 1
34 1
43 2
47 2
49 2
65 2
68 3
70 3
75 3
82 3
120 3", header = TRUE)
y = tapply(x$A, x$B, function(z) (z - min(z)) / (max(z) - min(z)))
# Or using the scale() function
#y = tapply(x$A, x$B, function(z) scale(z, min(z), max(z) - min(z)))
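# unlist(y) is ordered by level of B, so it only lines up with x because x is already sorted by B
cbind(x, C = unlist(y))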
Not exactly sure how you want the output, but this should be a decent starting point.
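The tapply() result has to be unlisted and bound back on, which only lines up when the rows are already sorted by B. A base R alternative is ave(), which applies the function within each level of B and returns the values in the original row order. A sketch using the sample values from the question, with minmax as a helper name introduced here:

```r
x <- data.frame(A = c(22, 23, 12, 34, 43, 47, 49, 65, 68, 70, 75, 82, 120),
                B = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3))

# Min-max normalisation: (A - min(A)) / (max(A) - min(A))
minmax <- function(z) (z - min(z)) / (max(z) - min(z))

# ave() computes minmax separately for each level of B and puts each
# result back in its original row, so x does not need to be sorted by B
x$C <- ave(x$A, x$B, FUN = minmax)
x
```

As with the other answers, a level in which all A values are equal gives 0/0, so C is NaN for those rows.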

How to make a table of the frequency of a number multiplied by the number raised to 3 in R

Given the following vector, I would like the frequency of each value to be multiplied by that value raised to the power 3. How could I do it?
data<-c(1,1,2,2,3,34,65,78,65,3)
table(data)
data
1 2 3 34 65 78
2 2 2 1 2 1
Expected:
1 2 3 34 65 78
2 8*2 27*2 39304*1 274625*2 474552*1
Thanks
with(rle(sort(data)), lengths*values^3)
#[1] 2 16 54 39304 549250 474552
OR
x = table(data)
x*as.numeric(names(x))^3
#data
#1 2 3 34 65 78
#2 16 54 39304 549250 474552
You could consider tapply:
res <- tapply(data, data, function(x) (x[1]^3) * length(x))
res
#1 2 3 34 65 78
#2 16 54 39304 549250 474552
Not as elegant as the other answers, but here is a multi-step solution.
Data:
data<-c(1,1,2,2,3,34,65,78,65,3)
data <- as.data.frame(table(data))
Solve the problem:
data$data <- as.numeric(as.character(data$data))
data$powers <- data$data^3
data$final <- data$Freq * data$powers
Result:
data Freq powers final
1 1 2 1 2
2 2 2 8 16
3 3 2 27 54
4 34 1 39304 39304
5 65 2 274625 549250
6 78 1 474552 474552
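A table-free variant in base R, sketched here as one more option: match() maps each element to its position among the sorted unique values, tabulate() counts those positions, and setNames() keeps the values as labels, as in the expected output. u is a name introduced for this sketch:

```r
data <- c(1, 1, 2, 2, 3, 34, 65, 78, 65, 3)

u <- sort(unique(data))              # distinct values: 1 2 3 34 65 78
counts <- tabulate(match(data, u))   # frequency of each distinct value
result <- setNames(counts * u^3, u)  # frequency * value^3, labelled by value
result
```

This avoids converting table names back from character to numeric, since the values are kept as numbers throughout.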
