How to put variables in the right order using a vector with ordering cues? - r

I am generating data frames with varying numbers of variables, and the variables are in the wrong order most of the time.
E.g. this data frame:
df <- structure(list(A = c(1, 2, 3, 4, 5), F = c(5, 4, 3, 2, 1), D = c(5,
5, 4, 4, 1)), .Names = c("A", "F", "D"), row.names = c(NA, 5L
), class = "data.frame")
A F D
1 1 5 5
2 2 4 5
3 3 3 4
4 4 2 4
5 5 1 1
I have a vector that tells me what the right order should be, e.g.:
c("A","B","C","D","E","F")
How can I use this vector to programmatically put the columns of my generated data frames in the right order?
According to the vector, this should be the result:
A D F
1 1 5 5
2 2 5 4
3 3 4 3
4 4 4 2
5 5 1 1
Any ideas? most welcome!

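intersect should work for this, with colorder holding the ordering vector from the question:
colorder <- c("A","B","C","D","E","F")
df[intersect(colorder, names(df))]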
# A D F
# 1 1 5 5
# 2 2 5 4
# 3 3 4 3
# 4 4 4 2
# 5 5 1 1
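One thing to keep in mind with the intersect approach is that any column not listed in the ordering vector is dropped from the result. If that is not wanted, a small wrapper can append the unlisted columns at the end. The sketch below is only an illustration: reorder_cols is a hypothetical helper name, and the extra column Z is made up for the example.

```r
# Hypothetical helper: order columns by a reference vector, then append
# any columns that the reference vector does not mention.
reorder_cols <- function(df, colorder) {
  known   <- intersect(colorder, names(df))  # listed columns, in colorder's order
  unknown <- setdiff(names(df), colorder)    # unlisted columns, original order
  df[c(known, unknown)]
}

# Example data: Z is not part of the ordering vector
df <- data.frame(A = 1:3, F = 3:1, Z = 7:9, D = c(5, 5, 4))
colorder <- c("A", "B", "C", "D", "E", "F")

names(reorder_cols(df, colorder))
# "A" "D" "F" "Z"
```

With plain df[intersect(colorder, names(df))] the Z column would be missing from the result.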

Related

Is there a way to automatically append data frame columns below each other into one column within large list of data frames?

I have a large list containing thousands of data frames, each with multiple columns. Within each data frame, I want to automatically stack the columns into a single column, i.e. append them below each other as shown below. Afterwards, I would like to convert the list to one data frame, whose columns would have different lengths because the elements of the original list have different numbers of columns.
From this:
y1 y2
1 4
2 5
3 6
To this:
y1
1
2
3
4
5
6
This should be done for each element in the list, whereby the solution needs to take into account that there are thousands of different data frames, which cannot be mentioned individually (example):
df1 = data.frame(
X1 = c(1, 2, 3),
X1.2 = c(4, 5, 6)
)
df2 = data.frame(
X2 = c(7, 8, 9),
X2.2 = c(1, 4, 6)
)
df3 = data.frame(
X3 = c(3, 4, 1),
X3.2 = c(8, 3, 5),
X3.3 = c(3, 1, 9)
)
listOfDataframe = list(df1, df2, df3)
Final output:
df_final = data.frame(
X1 = c(1, 2, 3, 4, 5, 6),
X2 = c(7, 8, 9, 1, 4, 6),
X3 = c(3, 4, 1, 8, 3, 5, 3, 1, 9)
)
Another problem underlying this question is that there will be a differing number of rows, which I do not know how to account for in the data frame, as the columns need to have the same length.
Thank you in advance for your help, it is highly appreciated.
We can unlist after looping over the list with lapply
lst1 <- lapply(listOfDataframe, \(x)
setNames(data.frame(unlist(x, use.names = FALSE)), names(x)[1]))
-output
lst1
[[1]]
X1
1 1
2 2
3 3
4 4
5 5
6 6
[[2]]
X2
1 7
2 8
3 9
4 1
5 4
6 6
[[3]]
X3
1 3
2 4
3 1
4 8
5 3
6 5
7 3
8 1
9 9
If we need to convert the list to a single data.frame, use cbind.na from the qpcR package
do.call(qpcR:::cbind.na, lst1)
X1 X2 X3
1 1 7 3
2 2 8 4
3 3 9 1
4 4 1 8
5 5 4 3
6 6 6 5
7 NA NA 3
8 NA NA 1
9 NA NA 9
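If installing qpcR only for this step is not desirable, the same NA padding can be sketched in base R. This is an alternative under stated assumptions, not the answer's method: it assumes every list element is a one-column data frame, as produced in the step above, and the names n_max and padded are made up for the illustration.

```r
# One-column data frames of different lengths (same values as lst1 above)
lst1 <- list(data.frame(X1 = c(1, 2, 3, 4, 5, 6)),
             data.frame(X2 = c(7, 8, 9, 1, 4, 6)),
             data.frame(X3 = c(3, 4, 1, 8, 3, 5, 3, 1, 9)))

n_max <- max(sapply(lst1, nrow))   # length of the longest column

# Pull each column out as a vector and index it up to n_max;
# indexing past the end of a vector yields NA, which pads the short ones
padded <- lapply(lst1, function(d) d[[1]][seq_len(n_max)])
names(padded) <- sapply(lst1, names)   # keep the original column names

df_final <- as.data.frame(padded)
df_final
```

The result has 9 rows, with NA in rows 7 to 9 of X1 and X2.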
Here is a tidyverse solution:
library(dplyr)
library(purrr)
listOfDataframe %>%
map(~.x %>% stack(.)) %>%
map(~.x %>% select(-ind))
[[1]]
values
1 1
2 2
3 3
4 4
5 5
6 6
[[2]]
values
1 7
2 8
3 9
4 1
5 4
6 6
[[3]]
values
1 3
2 4
3 1
4 8
5 3
6 5
7 3
8 1
9 9

Sorting specific columns of a dataframe by their names in R

df is a test dataframe and I need to sort the last three columns in ascending order (without hardcoding the order).
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
Desired output:
> df
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
I'm aware of the order() function but I can't seem to find the right way to implement it to get the desired output.
Update:
Base R:
cbind(df[1:3],df[4:6][,order(colnames(df[4:6]))])
First answer:
We could use relocate from dplyr:
https://dplyr.tidyverse.org/reference/relocate.html
It is designed to change column positions.
Here we relocate by index:
we take the last column (index 6, which is B) and put it before column 5 (which is C).
library(dplyr)
df %>%
relocate(6, .before = 5)
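An alternative, which sorts all columns by name and then moves the last three to the front. Note that this also sorts the first three columns, so it returns X Y Z rather than the original X Z Y:
library(dplyr)
df %>%
select(order(colnames(df))) %>%
relocate(4:6, .before = 1)
X Y Z A B C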
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
In base R, select the first columns as they are, then sort the names of the last 3:
df[, c(names(df)[1:(ncol(df)-3)], sort(names(df)[ncol(df)-2:0]))]
We want to reorder the columns based on the column names, so if we use names(df) as the argument to order, we can reorder the data frame as follows.
The complicating factor is that order() returns a vector of numbers, so if we want to reorder only a subset of the column names, we'll need an approach that retains the original sort order for the first three columns.
We accomplish this by creating a vector of the first 3 column names, the sorted remaining column names using a function that returns the values rather than locations in the vector, and then use this with the [ form of the extract operator.
df <- data.frame(X = c(1, 2, 3, 4, 5),
Z = c(1, 2, 3, 4, 5),
Y = c(1, 2, 3, 4, 5),
A = c(1, 2, 3, 4, 5),
C = c(1, 2, 3, 4, 5),
B = c(1, 2, 3, 4, 5))
df[,c(names(df[1:3]),sort(names(df[4:6])))]
...and the output:
> df[,c(names(df[1:3]),sort(names(df[4:6])))]
X Z Y A B C
1 1 1 1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3 3
4 4 4 4 4 4 4
5 5 5 5 5 5 5
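The same idea can be wrapped in a function so that neither the number of columns nor the indices 1:3 and 4:6 are hard-coded. This is a minimal sketch; sort_last_cols and its argument n are hypothetical names introduced for illustration.

```r
# Hypothetical helper: sort the last n columns by name and leave all
# earlier columns in their original positions.
sort_last_cols <- function(df, n = 3) {
  stopifnot(n >= 0, n <= ncol(df))
  nms <- names(df)
  leading  <- head(nms, ncol(df) - n)  # untouched leading column names
  trailing <- sort(tail(nms, n))       # last n column names, sorted
  df[c(leading, trailing)]
}

df <- data.frame(X = 1:5, Z = 1:5, Y = 1:5, A = 1:5, C = 1:5, B = 1:5)
names(sort_last_cols(df))
# "X" "Z" "Y" "A" "B" "C"
```

Because the selection is done by name, the columns keep their data even if several of them hold identical values, as in the test data frame.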
to_order <- seq(ncol(df)) > ncol(df) - 3
df[order(to_order*order(names(df)))]
#> X Z Y A B C
#> 1 1 1 1 1 1 1
#> 2 2 2 2 2 2 2
#> 3 3 3 3 3 3 3
#> 4 4 4 4 4 4 4
#> 5 5 5 5 5 5 5
Created on 2021-12-24 by the reprex package (v2.0.1)

Merging 2 datasets by calling on the row numbers (without using merge() or lookup functions)

Hi. This is a problem that I run into often in R programming, and I am in need of a simple solution from this community. In short, the problem requires a lookup value to be returned to a data frame. I would like to call on the row number of the lookup table.
> x1 <- c(2, 3, 1, 5, 4)
> x2 <- c("a", "b", "c", "d", "e")
>
> set.seed(5)
> x3 <- round(runif (10, 1, 5))
>
> lookup.df <- data.frame(x1, x2)
> Data.df <- data.frame(x3)
> lookup.df
x1 x2
1 2 a
2 3 b
3 1 c
4 5 d
5 4 e
> Data.df
x3
1 2
2 4
3 5
4 2
5 1
6 4
7 3
8 4
9 5
10 1
Data.df$x2 <- lookup.df [ (matching row numbers from Data.df with lookup.df$x1) , 2 ]
In theory, the code should be able to generate a vector of row numbers that would look like
rows <- c(1, 5, 4, 1, 3, 5, 2, 5, 4, 3)
so that the following would result
> Data.df$x2 <- lookup.df [ rows , 2 ]
> Data.df
x3 x2
1 2 a
2 4 e
3 5 d
4 2 a
5 1 c
6 4 e
7 3 b
8 4 e
9 5 d
10 1 c
I appreciate any ideas. Thanks.
We can use a named vector to match
Data.df$x2 <- setNames(lookup.df$x2, lookup.df$x1)[as.character(Data.df$x3)]
-output
> Data.df
x3 x2
1 2 a
2 4 e
3 5 d
4 2 a
5 1 c
6 4 e
7 3 b
8 4 e
9 5 d
10 1 c
You may use the match function:
Data.df$x2 <- lookup.df$x2[match(Data.df$x3, lookup.df$x1)]
# x3 x2
#1 2 a
#2 4 e
#3 5 d
#4 2 a
#5 1 c
#6 4 e
#7 3 b
#8 4 e
#9 5 d
#10 1 c
From the title of the post I understand that you don't want to use the merge function, but that would be the most straightforward solution.
merge(lookup.df, Data.df, by.x = 'x1', by.y = 'x3')
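To make the difference between the two approaches concrete, the sketch below shows the row numbers that match() produces and contrasts them with merge(). With its default arguments, merge() sorts the result by the key and drops rows without a match, whereas match() keeps the original row order and returns NA for unmatched values. The value 6 in x3 is made up here to show the unmatched case.

```r
lookup.df <- data.frame(x1 = c(2, 3, 1, 5, 4),
                        x2 = c("a", "b", "c", "d", "e"))
Data.df <- data.frame(x3 = c(2, 4, 5, 2, 1, 6))  # 6 has no entry in lookup.df

# For each x3, match() gives the row number in lookup.df where x1 equals it
rows <- match(Data.df$x3, lookup.df$x1)
rows
# 1 5 4 1 3 NA

# Indexing with those row numbers keeps the order of Data.df
Data.df$x2 <- lookup.df$x2[rows]

# merge() with defaults: sorted by key, unmatched row (6) dropped
merged <- merge(lookup.df, Data.df["x3"], by.x = "x1", by.y = "x3")
merged$x1
# 1 2 2 4 5
```

If the original row order matters, the match() form is usually the safer choice.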

Pair-wise manipulating rows in data.frame

I have data on several thousand US basketball players over multiple years.
Each basketball player has a unique ID. It is known for what team and on which position they play in a given year, much like the mock data df below:
df <- data.frame(id = c(rep(1:4, times=2), 1),
year = c(1, 1, 2, 2, 3, 4, 4, 4,5),
team = c(1,2,3,4, 2,2,4,4,2),
position = c(1,2,3,4,1,1,4,4,4))
> df
id year team position
1 1 1 1 1
2 2 1 2 2
3 3 2 3 3
4 4 2 4 4
5 1 3 2 1
6 2 4 2 1
7 3 4 4 4
8 4 4 4 4
9 1 5 2 4
What is an efficient way to manipulate df into new_df below?
> new_df
id move time position.1 position.2 year.1 year.2
1 1 0 2 1 1 1 3
2 2 1 3 2 1 1 4
3 3 0 2 3 4 2 4
4 4 1 2 4 4 2 4
5 1 0 2 1 4 3 5
In new_df the first occurrence of each basketball player is compared to the second occurrence, recording whether the player switched teams and how long it took the player to make the switch.
Note:
In the real data some basketball players occur more than twice and can play for multiple teams and on multiple positions.
In such a case a new row in new_df is added that compares each additional occurrence of a player with only the previous occurrence.
Edit: I think this is not just a simple reshape exercise, for the reasons mentioned in the previous two sentences. To clarify this, I've added an additional occurrence of player ID 1 to the mock data.
Any help is most welcome and appreciated!
s=table(df$id)
df$time=rep(1:max(s),each=length(s))
df1 = reshape(df,idvar = "id",dir="wide")
transform(df1, move=+(team.1==team.2),time=year.2-year.1)
id year.1 team.1 position.1 year.2 team.2 position.2 move time
1 1 1 1 1 3 2 1 0 2
2 2 1 2 2 4 2 1 1 3
3 3 2 3 3 4 4 4 0 2
4 4 2 4 4 4 4 4 1 2
The code below should get you to the point where the data is transposed.
You'll still have to create the move and time variables.
df <- data.frame(id = rep(1:4, times=2),
year = c(1, 1, 2, 2, 3, 4, 4, 4),
team = c(1, 2, 3, 4, 2, 2, 4, 4),
position = c(1, 2, 3, 4, 1, 1, 4, 4))
library(reshape2)
library(data.table)
setDT(df) #convert to data.table
df[,rno:=rank(year,ties="min"),by=.(id)] #gives the occurrence number within each id
#creating the transposed dataset
Dcast_DT<-dcast(df,id~rno,value.var = c("year","team","position"))
This piece of code did the trick, using data.table
library(data.table)
#transform to data.table
dtt <- as.data.table(df)
#sort on year
setorder(dtt, year, na.last=TRUE)
#indicate the names of the new columns
new_cols= c("time", "move", "prev_team", "prev_year", "prev_position")
#set up the new variables
dtt[ , (new_cols) := list(year - shift(year),team!= shift(team), shift(team), shift(year), shift(position)), by = id]
# select only repeating occurrences
dtt <- dtt[!is.na(dtt$time),]
#outcome
dtt
id year team position time move prev_team prev_year prev_position
1: 1 3 2 1 2 TRUE 1 1 1
2: 2 4 2 1 3 FALSE 2 1 2
3: 3 4 4 4 2 TRUE 3 2 3
4: 4 4 4 4 2 FALSE 4 2 4
5: 1 5 2 4 2 FALSE 2 3 1
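The core of this approach is a lag within each player: every row is compared with the previous row of the same id. For readers without data.table, the same idea can be sketched in base R with ave(). This is an illustration and not the answer's code; prev is a hypothetical helper, and move is coded as TRUE when the team changed, following the data.table output above.

```r
df <- data.frame(id = c(rep(1:4, times = 2), 1),
                 year = c(1, 1, 2, 2, 3, 4, 4, 4, 5),
                 team = c(1, 2, 3, 4, 2, 2, 4, 4, 2),
                 position = c(1, 2, 3, 4, 1, 1, 4, 4, 4))

# Sort so that each player's rows are in chronological order
df <- df[order(df$id, df$year), ]

# Hypothetical helper: shift a vector down by one, filling with NA
prev <- function(x) c(NA, head(x, -1))

# ave() applies prev() separately within each id
df$prev_year     <- ave(df$year, df$id, FUN = prev)
df$prev_team     <- ave(df$team, df$id, FUN = prev)
df$prev_position <- ave(df$position, df$id, FUN = prev)

df$time <- df$year - df$prev_year    # years since the previous occurrence
df$move <- df$team != df$prev_team   # TRUE if the team changed

# Keep only repeat occurrences (first occurrences have no previous row)
new_df <- df[!is.na(df$time), ]
new_df
```

Unlike the data.table output, the rows here are ordered by id and then year, so both comparisons for player 1 appear next to each other.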

Count the occurrence of one vector's values in another vector including non match values in R

I have 2 vectors:
v1 <- c(1, 2, 3, 4, 1, 3, 5, 6, 4)
v2 <- c(1, 2, 3, 4, 5, 6, 7)
I want to count how often each value of v2 occurs in v1. The expected result is:
1 2 3 4 5 6 7
2 1 2 2 1 1 0
I know there is a function that can do this:
table(v1[v1 %in% v2])
However, it only lists the matched values:
1 2 3 4 5 6
2 1 2 2 1 1
How can I show all the values in v2?
You can do
table(factor(v1, levels=unique(v2)))
# 1 2 3 4 5 6 7
# 2 1 2 2 1 1 0
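This works because factor() keeps every level given in levels, even those with no observations, and table() reports a count for each level. Values of v1 that are not among the levels become NA and are left out of the table by default. The sketch below illustrates both points; the extra value 9 in v1 is made up for the example, and the vapply() version is just an alternative way to get the same counts.

```r
v1 <- c(1, 2, 3, 4, 1, 3, 5, 6, 4, 9)  # 9 does not occur in v2
v2 <- c(1, 2, 3, 4, 5, 6, 7)

# 7 is kept with a count of 0; 9 becomes NA and is not counted
counts <- table(factor(v1, levels = unique(v2)))
counts
# 1 2 3 4 5 6 7
# 2 1 2 2 1 1 0

# Alternative without factor(): count matches for each value of v2
counts2 <- vapply(v2, function(x) sum(v1 == x), integer(1))
names(counts2) <- v2
counts2
```

Both versions return the counts in the order of v2.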
