This question already has answers here:
For each row return the column name of the largest value
(10 answers)
Closed 4 years ago.
I have a table of locations and monthly precipitation values.
I need to add a new column with the name of the month that has the maximum precipitation for each location.
I tried this:
cbind(rainfall, max_month = apply(rainfall[,3:11],1,which.max))
but I only get the column number, and I need the column name.
I got this:
[1] 5 5 5 5 5 5 5 5 4 4 5 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[59] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5
[117] 5 5 5 5 5 5 5 5 5 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4
I tried adding the names and colnames functions, but neither of them helped:
names(apply(rainfall[,3:11],1,(which.max)))
Thanks
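The best way to do this is with max.col. You should avoid apply on data.frames, because apply first converts them to a matrix. Note that max.col returns positions within the selected columns, so index the names of those same columns, and set ties.method = "first" to match which.max (the default breaks ties at random):
names(rainfall)[3:11][max.col(rainfall[3:11], ties.method = "first")]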
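A self-contained version of the max.col idea can be checked on toy data. The real rainfall table was not posted, so the frame below is hypothetical and only mimics its layout (identifier columns followed by month columns); the names station, region, Jan, Feb and Mar are made up for illustration.

```r
# Hypothetical stand-in for `rainfall`: 2 id columns, then one column per month
rainfall <- data.frame(
  station = c("A", "B", "C"),
  region  = c("north", "south", "east"),
  Jan = c(10, 40, 5),
  Feb = c(30, 20, 5),
  Mar = c(20, 35, 50)
)

month_cols <- names(rainfall)[3:5]   # the columns holding precipitation

# max.col() returns, for each row, the position of the largest value *within*
# month_cols; ties.method = "first" mirrors which.max() instead of random ties
idx <- max.col(rainfall[month_cols], ties.method = "first")

# Translate positions into names using the same subset of column names
rainfall$max_month <- month_cols[idx]
rainfall
```

With this toy data, max_month is Feb, Jan and Mar for stations A, B and C.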
You probably need something along the lines of:
names(rainfall[,3:11])[apply(rainfall[,3:11],1,which.max)]
Here the column positions returned by which.max are turned into names by subsetting the names(rainfall[,3:11]) vector. Note that repeating an index, e.g. c(5, 5, 5, 5), repeats the extracted value.
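An alternative approach using dplyr together with tidyr (gather() comes from tidyr, not dplyr):
library(dplyr)
library(tidyr)
# mtcars is a built-in dataset, not a package, so it needs no library() call
mtcars %>%
gather(month, precip_value, disp, hp, drat, wt) %>%
group_by(gear) %>%
summarise(max_month = month[which.max(precip_value)])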
Note that this approach uses the mtcars dataset because your example was not reproducible. Here, gear plays the role of your station id. The trick is to restructure the data from wide to long format using gather, split the data per station using group_by, and then determine the max month using summarise. Just food for thought: the answer by @Sotos is quite elegant.
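The same wide-to-long idea can be sketched in base R without dplyr or tidyr, using stack() to reshape and split() to group per station. This is a swapped-in base R technique rather than the answer's code, and it reuses a hypothetical toy table because the real rainfall data was not posted.

```r
# Hypothetical toy data in the question's wide layout
rainfall <- data.frame(
  station = c("A", "B", "C"),
  Jan = c(10, 40, 5),
  Feb = c(30, 20, 5),
  Mar = c(20, 35, 50)
)
month_cols <- c("Jan", "Feb", "Mar")

# Wide -> long: stack() returns `values` and `ind` (the source column name),
# column by column, so each station id is repeated once per month column
long <- data.frame(
  station = rep(rainfall$station, times = length(month_cols)),
  stack(rainfall[month_cols])
)

# Split per station, then pick the month holding the largest value
max_month <- sapply(split(long, long$station), function(d) {
  as.character(d$ind[which.max(d$values)])
})
max_month
```

The result is a character vector named by station, playing the role of the summarise() output above.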
This question already has answers here:
Using seq and rep to create a sequence of 5 integers that go up by 1 on each repetition
(4 answers)
Closed 1 year ago.
I want to create the sequence 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9, if possible using only rep and seq. On each repetition I want the repeating sequence to increase by one. This could be achieved by creating rep(seq(1,5),5) and then adding the vector rep(0:4, each = 5).
But is there any way to do this without creating a new vector and adding it to the first one?
You can use outer and seq in one line:
> c(outer(seq(5), seq(5) - 1, `+`))
[1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
or, more concisely, with embed:
> c(embed(1:9, 5)[, 5:1])
[1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
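Base R also has sequence(), which concatenates several seq()-style runs in one call. The `from` argument used below was added in R 4.0.0, so this sketch assumes R 4.0.0 or later; the lapply() line is an alternative for older versions.

```r
# Five runs of length 5 that start at 1, 2, 3, 4 and 5 respectively
# (the `from` argument of sequence() requires R >= 4.0.0)
res <- sequence(rep(5L, 5), from = 1:5)
res

# Without sequence(): build each run with seq() and concatenate the runs
res2 <- unlist(lapply(1:5, function(i) seq(i, length.out = 5)))
```

Both produce the same 25 numbers as the outer and embed versions.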
This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 1 year ago.
I have a set of values in the column Num. I want to create another column that ranks them by size, similar to rankt below, but I don't like the result it gives:
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- rank(x$Num)
Num rankt
1 2 2
2 5 5
3 2 2
4 7 8
5 7 8
6 7 8
7 2 2
8 5 5
9 5 5
Desired outcome for rankt:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
Well, a crude approach is to turn the ranks into a factor, whose underlying codes are just increasing integers with labels, and then fetch those codes:
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- as.numeric(as.factor( rank(x$Num) ))
x
It produces:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
A solution with dplyr, where dense_rank() gives tied values the same rank without leaving gaps:
library(dplyr)
x1 <- x %>%
mutate(rankt = dense_rank(Num))
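A dense rank can also be computed in base R with match(): each value is looked up in the sorted vector of distinct values, so ties share a rank and no gaps appear. This is a swapped-in base R technique rather than part of either answer.

```r
x <- data.frame(Num = c(2, 5, 2, 7, 7, 7, 2, 5, 5))

# Sorted distinct values: 2, 5, 7
levels_sorted <- sort(unique(x$Num))

# The position of each Num in that vector is its dense rank
x$rankt <- match(x$Num, levels_sorted)
x
```

This reproduces the desired rankt column and avoids the detour through rank() and factors.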
This question already has answers here:
R: define distinct pattern from values of multiple variables [duplicate]
(3 answers)
Closed 5 years ago.
I have a dataset like this:
case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3
I would like to create a grouping variable.
This variable should have the same value whenever both x and y are the same.
I do not care what the value is; it is only there to group the rows. In my dataset, if x and y are the same for two cases, they are probably part of the same organization, and I want to see which organizations there are.
So my preferred dataset would look like this:
case x y org
1 4 5 1
2 4 5 1
3 8 9 2
4 7 9 3
5 6 3 4
6 6 3 4
How would I have to program this in R?
Since you don't care what the value is, you can just do the following:
dt$new=as.numeric(as.factor(paste(dt$x,dt$y)))
dt
case x y new
1 1 4 5 1
2 2 4 5 1
3 3 8 9 4
4 4 7 9 3
5 5 6 3 2
6 6 6 3 2
A dplyr solution using group_indices():
library(dplyr)
dt2 <- dt %>%
mutate(org = group_indices(., x, y))
dt2
case x y org
1 1 4 5 1
2 2 4 5 1
3 3 8 9 4
4 4 7 9 3
5 5 6 3 2
6 6 6 3 2
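If the group numbers need to follow the row order, we can apply rleid from the data.table package after creating the org column, as follows. Note that rleid starts a new number every time the value changes, so this only works when rows of the same group are adjacent.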
library(dplyr)
library(data.table)
dt2 <- dt %>%
mutate(org = group_indices(., x, y)) %>%
mutate(org = rleid(org))
dt2
case x y org
1 1 4 5 1
2 2 4 5 1
3 3 8 9 2
4 4 7 9 3
5 5 6 3 4
6 6 6 3 4
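If the groups should be numbered in order of first appearance even when rows of the same group are not adjacent, a base R sketch is to build a key from x and y and match it against its own unique values. This swaps in match() instead of rleid(), and the underscore separator is an arbitrary choice.

```r
dt <- read.table(text = "case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3", header = TRUE)

# One string per (x, y) combination; the separator keeps e.g. (1, 23) and
# (12, 3) from collapsing into the same key
key <- paste(dt$x, dt$y, sep = "_")

# unique() keeps first-appearance order, so match() numbers groups 1, 2, 3, ...
dt$org <- match(key, unique(key))
dt
```

On this data the result matches the rleid output, but it would still give a repeated (x, y) pair its original number if that pair reappeared further down.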
Update
Here is how to sort the rows by a column with arrange() in dplyr.
library(dplyr)
dt %>%
arrange(x)
case x y
1 1 4 5
2 2 4 5
3 5 6 3
4 6 6 3
5 4 7 9
6 3 8 9
We can also sort by more than one column, such as arrange(x, y), or use desc to reverse the order, like arrange(desc(x)).
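For comparison, a base R sketch of the same sorts uses order(); negating a numeric column plays the role of desc(). This is a swapped-in base R technique on the question's data, not part of the original answer.

```r
dt <- read.table(text = "case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3", header = TRUE)

# Ascending by x, like arrange(x); ties keep their original relative order
by_x <- dt[order(dt$x), ]

# Ascending by x, then by y, like arrange(x, y)
by_xy <- dt[order(dt$x, dt$y), ]

# Descending by x, like arrange(desc(x)); -x works because x is numeric
by_x_desc <- dt[order(-dt$x), ]
by_x_desc
```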
DATA
dt <- read.table(text = " case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3",
header = TRUE)
This question already has an answer here:
Data.table meta-programming
(1 answer)
Closed 6 years ago.
Ordering of the data.frame by column index:
> df <- data.frame(5:9, 8:4)
> df
X5.9 X8.4
1 5 8
2 6 7
3 7 6
4 8 5
5 9 4
> df[order(df[,2]),]
X5.9 X8.4
5 9 4
4 8 5
3 7 6
2 6 7
1 5 8
or by column name:
> df[order(df[,"X5.9"]),]
X5.9 X8.4
1 5 8
2 6 7
3 7 6
4 8 5
5 9 4
Is it possible to achieve the same with data.table and order by custom column name or index?
We can use setkey, which converts df to a data.table and sorts it by reference on the key column:
setkey(setDT(df), X5.9)
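When the column is only known at run time, as a string or as a position, data.table's setorderv() accepts column names as a character vector, and a position can be turned into a name with names(). A sketch, assuming the data.table package is installed; the variables sort_col and sort_idx are illustrative.

```r
library(data.table)

df <- data.frame(5:9, 8:4)        # columns X5.9 and X8.4, as in the question

# Sort by a column name held in a variable (reorders dt1 by reference)
dt1 <- as.data.table(df)
sort_col <- "X8.4"
setorderv(dt1, sort_col)

# Sort by a column position: translate it to a name first;
# order = -1L sorts in descending order
dt2 <- as.data.table(df)
sort_idx <- 1
setorderv(dt2, names(dt2)[sort_idx], order = -1L)

dt1
dt2
```

Unlike setkey, setorderv only reorders the rows and does not set a key.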
I would like to randomly subset a data frame with the condition that if an observation with alpha=1 is included in a subset, then all observations with alpha=1 must be included in that subset. My simplified data look like this:
df
alpha beta gamma
1 5 2
1 6 3
1 5 3
2 3 2
2 5 9
2 2 6
3 3 4
3 4 7
3 3 8
4 3 4
4 8 3
4 4 9
5 9 8
5 5 5
5 3 5
What command should I use to get subsets like the following?
df1
alpha beta gamma
1 5 2
1 6 3
1 5 3
3 3 4
3 4 7
3 3 8
5 9 8
5 5 5
5 3 5
df2
alpha beta gamma
2 3 2
2 5 9
2 2 6
4 3 4
4 8 3
4 4 9
5 9 8
5 5 5
5 3 5
df3
alpha beta gamma
1 5 2
1 6 3
1 5 3
2 3 2
2 5 9
2 2 6
5 9 8
5 5 5
5 3 5
Specifically, the first observation in df, (1,5,2), happened to fall into subsets df1 and df3. If so, the 2nd and 3rd observations in df, (1,6,3) and (1,5,3), must also be included in df1 and df3.
I hope that my question is clear. Please help.
Try this
str <- "alpha,beta,gamma
1,5,2
1,6,3
1,5,3
2,3,2
2,5,9
2,2,6
3,3,4
3,4,7
3,3,8
4,3,4
4,8,3
4,4,9
5,9,8
5,5,5
5,3,5"
df <- read.csv(textConnection(str))
# keep every row whose alpha is among 3 randomly chosen distinct alpha values
df[df$alpha %in% sample(unique(df$alpha), 3), ]
Output
alpha beta gamma
4 2 3 2
5 2 5 9
6 2 2 6
10 4 3 4
11 4 8 3
12 4 4 9
13 5 9 8
14 5 5 5
15 5 3 5
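To draw several such subsets (df1, df2, df3 in the question), the one-liner can be wrapped in a small helper and repeated. The helper name sample_groups, the seed and the choice of 3 alpha groups per subset are illustrative assumptions.

```r
df <- data.frame(
  alpha = rep(1:5, each = 3),
  beta  = c(5, 6, 5, 3, 5, 2, 3, 4, 3, 3, 8, 4, 9, 5, 3),
  gamma = c(2, 3, 3, 2, 9, 6, 4, 7, 8, 4, 3, 9, 8, 5, 5)
)

# Hypothetical helper: pick k distinct alpha values, keep all of their rows
sample_groups <- function(data, k) {
  chosen <- sample(unique(data$alpha), k)
  data[data$alpha %in% chosen, ]
}

set.seed(42)                      # only to make the draws reproducible
subsets <- lapply(1:3, function(i) sample_groups(df, 3))
names(subsets) <- paste0("df", 1:3)
subsets$df1
```

Because whole alpha groups are sampled, a row can never appear in a subset without the other rows that share its alpha value.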