df is a test data frame, and I need to sort its last three columns by name in ascending order (without hardcoding the order).
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
Desired output:
> df
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
I'm aware of the order() function but I can't seem to find the right way to implement it to get the desired output.
Update:
Base R:
cbind(df[1:3],df[4:6][,order(colnames(df[4:6]))])
First answer:
We could use relocate() from dplyr, which changes column positions:
https://dplyr.tidyverse.org/reference/relocate.html
Here we relocate by index.
We take the last column (index 6, which is B) and put it before position 5 (which is C):
library(dplyr)
df %>%
relocate(6, .before = 5)
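An alternative that sorts all column names and then moves the last three of the sorted columns (X, Y, Z) to the front. Note that this also sorts the first three columns, so it returns X Y Z rather than X Z Y:
library(dplyr)
df %>%
select(order(colnames(df))) %>%
relocate(4:6, .before = 1)
X Y Z A B C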
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
In base R, keep the leading columns as they are and sort the names of the last three:
df[, c(names(df)[1:(ncol(df)-3)], sort(names(df)[ncol(df)-2:0]))]
We want to reorder the columns based on their names, so one option is to pass names(df) to order() and use the result to index the data frame.
The complicating factor is that order() returns a vector of positions, so if we want to reorder only a subset of the columns, we need an approach that retains the original order of the first three columns.
We accomplish this by combining the first three column names with the remaining column names sorted by sort(), which returns the values rather than their positions in the vector, and then using the result with the [ form of the extract operator.
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
df[,c(names(df[1:3]),sort(names(df[4:6])))]
...and the output:
> df[,c(names(df[1:3]),sort(names(df[4:6])))]
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
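The same idea can be wrapped in a small helper so that neither the column positions nor the number of sorted columns is hardcoded. This is only a sketch: the name sort_last_cols and its argument n are made up for illustration, and it assumes the column names are unique.

```r
# Hypothetical helper: keep the leading columns as they are and
# sort the last n columns by name (assumes unique column names).
sort_last_cols <- function(df, n = 3) {
  stopifnot(n >= 0, n <= ncol(df))
  nms <- names(df)
  fixed <- head(nms, length(nms) - n)   # names left in their original order
  sorted <- sort(tail(nms, n))          # last n names, sorted as values
  df[, c(fixed, sorted), drop = FALSE]
}

df <- data.frame(X = 1:5, Z = 1:5, Y = 1:5,
                 A = 1:5, C = 1:5, B = 1:5)
names(sort_last_cols(df))
# [1] "X" "Z" "Y" "A" "B" "C"
```

Using head() and tail() rather than negative indexing keeps the helper well behaved when n equals 0 or ncol(df).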
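# Flag the last three columns; the others get key 0 and keep their order
to_order <- seq(ncol(df)) > ncol(df) - 3
# rank() rather than order(): order() returns positions, not sort keys
df[order(to_order * rank(names(df)))]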
#> X Z Y A B C
#> 1 1 1 1 1 1 1
#> 2 2 2 2 2 2 2
#> 3 3 3 3 3 3 3
#> 4 4 4 4 4 4 4
#> 5 5 5 5 5 5 5
Created on 2021-12-24 by the reprex package (v2.0.1)
Here is an example of the panel dataset I'm working with:
library(data.table)
data <- data.table(ID = c(1,1,1,1,1,2,2,2,2),
crop = c(1,2,3,4,5,1,2,3,4))
ID, crop
1, 1
1, 2
1, 3
1, 4
1, 5
2, 1
2, 2
2, 3
2, 4
There are several IDs, each with a varying number of observations (rows) according to the number of crops it has.
I want to create an additional variable that shows the total number of observations an ID has. The desired output would look like:
ID, crop, total
1, 1, 5
1, 2, 5
1, 3, 5
1, 4, 5
1, 5, 5
2, 1, 4
2, 2, 4
2, 3, 4
2, 4, 4
Is this possible to do using data.table in R?
You could use
library(data.table)
data[, total := .N, by = ID]
This returns
ID crop total
1: 1 1 5
2: 1 2 5
3: 1 3 5
4: 1 4 5
5: 1 5 5
6: 2 1 4
7: 2 2 4
8: 2 3 4
9: 2 4 4
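For comparison, the same per-group count can be cross-checked without data.table. The sketch below swaps in base R's ave() on a plain data frame; it is not the data.table approach above, only a way to confirm the expected totals.

```r
# Base R cross-check (not data.table): count rows per ID with ave()
data <- data.frame(ID   = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
                   crop = c(1, 2, 3, 4, 5, 1, 2, 3, 4))

# ave() applies FUN within each ID group and returns a vector
# aligned with the original rows
data$total <- ave(data$crop, data$ID, FUN = length)
data$total
# [1] 5 5 5 5 5 4 4 4 4
```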
Extending this former question, how can I shuffle (randomize) the following vector
a1 = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
in order to get something like this:
a2 = c(5, 5, 3, 3, 3, 3, 1, 1, 2, 4, 4, 4)
or even better like this:
a3 = c(4, 4, 4, 2, 3, 3, 3, 3, 1, 1, 5, 5)?
such that each element can randomly change to another while keeping the number of each element constant?
You can try something like this: create a factor from a1 with randomly shuffled levels and then convert it to integers:
as.integer(factor(a1, levels = sample(unique(a1), length(unique(a1)))))
# [1] 5 5 4 4 4 4 3 3 2 1 1 1
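Note that as.integer() on a factor returns level codes 1, 2, ..., which happen to coincide with the values here because a1 only contains 1 to 5. A sketch of the same relabelling for arbitrary values, using match() instead of a factor (the seed is arbitrary and the names vals, shuffled and a2 are illustrative):

```r
a1 <- c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)

set.seed(42)                      # arbitrary seed, only for reproducibility
vals <- unique(a1)                # distinct values in order of appearance
shuffled <- sample(vals)          # random one-to-one relabelling of the values
a2 <- shuffled[match(a1, vals)]   # replace each value by its new label

# The run structure is unchanged: same run lengths in the same positions
identical(rle(a2)$lengths, rle(a1)$lengths)
```

Because the relabelling is one-to-one, every run keeps its position and length; only the value attached to each run changes.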
The data:
a1 <- c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
First steps:
# extract values and their frequencies
val <- unique(a1)
tab <- table(a1)
freq <- tab[as.character(val)]
Keep original order of frequencies but sample values
rep(sample(val), freq)
# [1] 4 4 1 1 1 1 3 3 5 2 2 2
Keep original frequencies but sample order of values
rep(sa <- sample(val), freq[as.character(sa)])
# [1] 4 2 2 2 2 3 3 1 1 5 5 5
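The difference between the two variants can be checked directly. The sketch below repeats the steps above with an arbitrary seed and tests which property each variant preserves; the names v1 and v2 are illustrative.

```r
a1 <- c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5)
val <- unique(a1)
freq <- table(a1)[as.character(val)]

set.seed(1)  # arbitrary seed, only for reproducibility

# Variant 1: the run lengths stay in place, the values are shuffled
v1 <- rep(sample(val), freq)
identical(rle(v1)$lengths, rle(a1)$lengths)

# Variant 2: each value keeps its own count, the order of runs is shuffled
sa <- sample(val)
v2 <- rep(sa, freq[as.character(sa)])
identical(as.vector(table(v2)), as.vector(table(a1)))
```

Variant 1 generally changes how often each value occurs, while variant 2 generally changes where the long and short runs sit.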
Seems like a perfect application for rle and its inverse rep:
rand_inverse_rle <- function(x) {
  r <- rle(sort(x))                # one run per distinct value
  ord <- sample(length(r$values))  # random order for the runs
  rep(r$values[ord], r$lengths[ord])
}
rand_inverse_rle(a1)
#----------
[1] 3 3 4 5 5 5 2 2 2 2 1 1
This was my reading of a function needed to satisfy the natural language requirements:
> a1 = sample( c(1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 5) )
> a1
[1] 5 2 5 2 5 1 3 4 2 2 3 1
> rand_inverse_rle(a1)
[1] 5 5 5 4 2 2 2 2 3 3 1 1
> rand_inverse_rle(a1)
[1] 1 1 3 3 5 5 5 2 2 2 2 4
> rand_inverse_rle(a1)
[1] 1 1 3 3 4 5 5 5 2 2 2 2
I have a sequence of numbers (this is just a piece of it; the full sequence has over 100k numbers):
1 2 3 3 2 3 2 3 2 1 2 3 2 3 2 3 3 2 3 2 3 2 1 3 3 2 3 3 2 3 3 3 2 3 2 3 2 1 3 2 3 3 3 2 3 3 2 3 2 3
I need to calculate the average number of steps between consecutive 1s in this sequence.
For example:
The first number in this sequence is 1. Counting the steps to the next 1 gives 9. The next 1 comes after 13 more steps, the one after that after 15 steps, and so on.
Then I have to calculate the average number of steps.
So here we have (9 + 13 + 15) / 3 = 12.33...
How can I do this in R?
You can try:
mean(diff(which(x == 1)))
## [1] 12.33333
Given:
x <- c(1, 2, 3, 3, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2,
3, 2, 1, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 2, 3, 2, 1, 3, 2, 3,
3, 3, 2, 3, 3, 2, 3, 2, 3)
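Broken into steps, the one-liner above works as follows (the intermediate names ones and gaps are only for illustration):

```r
x <- c(1, 2, 3, 3, 2, 3, 2, 3, 2, 1, 2, 3, 2, 3, 2, 3, 3, 2, 3, 2,
       3, 2, 1, 3, 3, 2, 3, 3, 2, 3, 3, 3, 2, 3, 2, 3, 2, 1, 3, 2, 3,
       3, 3, 2, 3, 3, 2, 3, 2, 3)

ones <- which(x == 1)  # positions of the 1s: 1 10 23 38
gaps <- diff(ones)     # steps between consecutive 1s: 9 13 15
mean(gaps)             # 12.33333
```

If the sequence contains fewer than two 1s, gaps is empty and mean() returns NaN, so that case may need separate handling.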
I want to capture data values from a post on SE in RStudio. I managed to do so by copying the values and pasting them into the following command in the console:
> a = as.numeric(read.table(text = "8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2", sep = " "))
> a
[1] 8 8 4 1 2 2 0 2 5 2 3 3 3 1 5 4 4 1 4 2
Now a is in the global environment. The problem is that I would like to save it into an R file containing a number of other things, let's call it file.R, where vector a would appear as:
a <- c(8, 8, 4, 1, 2, 2, 0, 2, 5, 2, 3, 3, 3, 1, 5, 4, 4, 1, 4, 2)
Unfortunately for me, the only way I know is to type the commas manually. How can I do this otherwise?