Ordering of the data.frame by column index:
> df <- data.frame(5:9, 8:4)
> df
X5.9 X8.4
1 5 8
2 6 7
3 7 6
4 8 5
5 9 4
> df[order(df[,2]),]
X5.9 X8.4
5 9 4
4 8 5
3 7 6
2 6 7
1 5 8
or by column name:
> df[order(df[,"X5.9"]),]
X5.9 X8.4
1 5 8
2 6 7
3 7 6
4 8 5
5 9 4
Is it possible to achieve the same with data.table and order by custom column name or index?
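We can use setkey, after converting the data.frame to a data.table by reference with setDT. This sorts the table by X5.9 in place: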
setkey(setDT(df), X5.9)
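setkey() takes the column as an unquoted name. When the column name (or its position) is held in a variable, data.table's "v" variants setkeyv() and setorderv() accept a character vector instead. A minimal sketch, assuming the data.table package is installed; col_name and col_index are illustrative variable names, not part of the question:

```r
library(data.table)

df <- data.frame(5:9, 8:4)  # columns are auto-named X5.9 and X8.4
dt <- as.data.table(df)

# Hypothetical variables holding the sort column, chosen at run time
col_name  <- "X8.4"
col_index <- 2L

# setorderv() sorts by reference and returns the table invisibly,
# so sorting a copy() leaves dt untouched
by_name  <- setorderv(copy(dt), col_name)
by_index <- setorderv(copy(dt), names(dt)[col_index])  # position -> name

# setkeyv() is the string-taking counterpart of setkey(): it sorts dt
# in place and also marks the column as the table's key
setkeyv(dt, col_name)
dt
```

setorderv() only reorders, while setkeyv() additionally sets a key, which matters if you later join or subset on that column.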
I want to create the sequence 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9, if possible using only rep and seq. On each repetition I want the repeated run to increase by one. This could be achieved by creating rep(seq(1,5),5) and then adding the vector rep(0:4, each = 5).
But is there any way to do this without creating a new vector and adding it to the first one?
You can use outer + seq in one line
> c(outer(seq(5), seq(5) - 1, `+`))
[1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
or shorter code with embed
> c(embed(1:9, 5)[, 5:1])
[1] 1 2 3 4 5 2 3 4 5 6 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9
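If sapply() is allowed alongside seq, the shift can come from seq's starting value instead of from a second vector that gets added on. A sketch; the sequence() variant assumes R 4.0.0 or later, where sequence() gained its from argument:

```r
# seq(i, length.out = 5) yields i, i + 1, ..., i + 4; sapply() places the
# five runs side by side as matrix columns, and c() flattens them column by column
s1 <- c(sapply(1:5, seq, length.out = 5))

# R >= 4.0.0: sequence() builds the runs directly from their lengths and start values
s2 <- sequence(rep(5, 5), from = 1:5)

s1
s2
```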
I have this set of values in the column Num. I want to create another column that ranks them by size, similar to rankt below, but I don't like how rank handles ties: equal values get averaged ranks (2, 5, 8) instead of consecutive ones.
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- rank(x$Num)
Num rankt
1 2 2
2 5 5
3 2 2
4 7 8
5 7 8
6 7 8
7 2 2
8 5 5
9 5 5
Desired outcome for rankt:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
Well, a crude approach is to turn the ranks into a factor, which is just increasing integer codes with labels, and then fetch those codes:
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- as.numeric(as.factor( rank(x$Num) ))
x
It produces:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
A solution with dplyr, whose dense_rank gives tied values the same rank without gaps:
library(dplyr)
x1 <- x %>%
  mutate(rankt = dense_rank(Num))
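Without dplyr, the same dense ranking can be done in base R. A minimal sketch on the question's data; the extra column rankt2 is only there to compare the two variants:

```r
x <- data.frame(Num = c(2, 5, 2, 7, 7, 7, 2, 5, 5))

# Dense rank: position of each value among the sorted distinct values
x$rankt <- match(x$Num, sort(unique(x$Num)))

# Equivalent: factor() uses the sorted distinct values as levels,
# so the integer codes are already dense ranks (no rank() step needed)
x$rankt2 <- as.integer(factor(x$Num))
x
```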
I don't have the slightest idea of programming, but I need to solve the following problem in R.
Let's suppose I have this data:
x y
5 8
6 5
2 NA
9 8
4 NA
0 NA
6 6
7 3
3 2
I need to create a third column called "z" containing the data of "y", except for the missing values, where it should have the values of "x". It would be something like this:
x y z
5 8 8
6 5 5
2 NA 2
9 8 8
4 NA 4
0 NA 0
6 6 6
7 3 3
3 2 2
dat <- data.frame(x=c(5,6,2,9,4,0,6,7,3), y = c(8,5,NA,8,NA,NA,6,3,2))
library(tidyverse)
dat %>% mutate(z = ifelse(is.na(y), x, y))
# x y z
# 1 5 8 8
# 2 6 5 5
# 3 2 NA 2
# 4 9 8 8
# 5 4 NA 4
# 6 0 NA 0
# 7 6 6 6
# 8 7 3 3
# 9 3 2 2
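The same fill-in works in base R without loading the tidyverse. A minimal sketch using the dat frame from the answer; z2 is an illustrative second column built with plain indexing instead of ifelse():

```r
dat <- data.frame(x = c(5, 6, 2, 9, 4, 0, 6, 7, 3),
                  y = c(8, 5, NA, 8, NA, NA, 6, 3, 2))

# Take y, and fall back to x wherever y is NA
dat$z <- ifelse(is.na(dat$y), dat$x, dat$y)

# Same idea via indexing: copy y, then overwrite only the missing positions
z2 <- dat$y
z2[is.na(z2)] <- dat$x[is.na(z2)]
dat
```

dplyr also offers coalesce(y, x), which returns the first non-missing value at each position and reads a little more directly inside mutate().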
I have a table of locations and values for Precipitation of each month.
I need to add a new column with name of the month that has the maximum Precipitation for each location.
I tried this:
cbind(rainfall, max_month = apply(rainfall[,3:11],1,which.max))
but I'm getting only the number of the column and I need the name of the column.
I got this :
[1] 5 5 5 5 5 5 5 5 4 4 5 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
[59] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4 5 5 4 5 5 5 5 5 5 5 5 5 5 5 5
[117] 5 5 5 5 5 5 5 5 5 6 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4 4 4
I tried adding the names function and the colnames function, but neither of them helped:
names(apply(rainfall[,3:11],1,(which.max)))
Thanks
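The best way to do this is via max.col. Avoid apply on data.frames where you can, since it first coerces them to a matrix. Note that max.col returns positions within the selected columns, so index the names of those same columns:
names(rainfall)[3:11][max.col(rainfall[3:11], ties.method = "first")]  # "first" matches which.max on ties; the default breaks ties at random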
You probably need something along the lines of:
names(rainfall[,3:11])[apply(rainfall[,3:11],1,which.max)]
Here the column index is turned into a name by subsetting the names(rainfall[,3:11]) vector. Note that repeating an index, e.g. c(5, 5, 5, 5), repeats the extracted value.
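An alternative approach using dplyr and tidyr:
library(dplyr)
library(tidyr)  # gather() comes from tidyr; mtcars ships with base R, so it needs no library() call
mtcars %>%
  gather(month, precip_value, disp, hp, drat, wt) %>%     # wide to long
  group_by(gear) %>%                                      # one group per "station"
  summarise(max_month = month[which.max(precip_value)])   # column holding the row maximum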
Note that this approach uses the mtcars dataset, as your example was not reproducible. Here, gear would be your station id. The trick is to restructure the data from wide to long format using gather, then split the data per station using group_by, and then determine the max month using summarise. Just food for thought; the answer by @sotos is quite elegant.
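Since the rainfall data isn't shown, here is a sketch on a small hypothetical frame (the station and elev columns, the three month names, and all values are invented; the real data has nine month columns in 3:11). It compares the which.max lookup with max.col, including a tied row:

```r
# Hypothetical toy data: two id columns followed by monthly precipitation
rainfall <- data.frame(
  station = c("A", "B", "C"),
  elev    = c(10, 20, 30),
  Jan     = c(5, 1, 2),
  Feb     = c(9, 3, 8),
  Mar     = c(1, 7, 8)   # station C ties between Feb and Mar
)
months <- names(rainfall)[3:5]  # the precipitation columns

# Row-wise which.max, then translate the column position into a name
rainfall$max_month <- months[apply(rainfall[months], 1, which.max)]

# Vectorised version; ties.method = "first" agrees with which.max on ties
rainfall$max_month2 <- months[max.col(rainfall[months], ties.method = "first")]
rainfall
```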
I have a dataframe with 9000 rows and 6 columns. I want to make the order of rows random i.e. some kind of shuffling to produce another dataframe with the same data but the rows in random order. Could anyone tell me how to do this in R?
Thanks
If you want to shuffle the rows while keeping each row's values together, you can just sample the row indices:
df <- data.frame(x=1:8, y=1:8, z=1:8)
df[sample(1:nrow(df)),]
which will produce something like this (the order differs from run to run):
x y z
2 2 2 2
3 3 3 3
4 4 4 4
6 6 6 6
5 5 5 5
8 8 8 8
7 7 7 7
1 1 1 1
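To make the shuffle reproducible, fix the random seed first. A minimal sketch; the seed value 42 is arbitrary, and resetting the row names is optional:

```r
df <- data.frame(x = 1:8, y = 1:8, z = 1:8)

set.seed(42)                         # fix the RNG so the same shuffle comes out each time
shuffled <- df[sample(nrow(df)), ]   # sample(n) is a random permutation of 1:n

rownames(shuffled) <- NULL           # optional: renumber rows 1..n instead of keeping the old names
shuffled
```

Because whole rows are moved, every row still holds matching x, y and z values.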
If instead the values should be shuffled independently within each column (so the rows are broken up), you can do something like
lapply(df, function(x) { sample(x)})
which returns a list such as
$x
[1] 3 1 4 6 5 2 8 7
$y
[1] 2 5 6 3 4 8 7 1
$z
[1] 6 1 8 3 2 7 4 5
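lapply() on its own hands back a plain list. Assigning the result into df[] keeps the data.frame structure while still permuting each column independently. A minimal sketch:

```r
df <- data.frame(x = 1:8, y = 1:8, z = 1:8)

# df[] <- ... replaces the column contents but keeps the data.frame
# class, names and row names; each column is permuted on its own
df[] <- lapply(df, sample)
df
```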