Swap the first 2 columns in a data frame with 100 columns? - r

Is there a way to swap the order of the first 2 columns in a data frame with 100+ columns?
All the methods online require you to input the order yourself and with 100 columns that's a bit too tedious.
Example solution being
dfrm <- dfrm[c("2", "3", "1", "4")]
However, this solution is impractical for my large data frame. I want to keep the order of all the columns but swap the first two, so that column 2 takes column 1's position. The software I'm using requires column 1 to be sampleID, which I currently have as column 2, and that leads to an error.
Thanks

You can consider the following. dfrm is your target data frame.
dfrm <- dfrm[, c(2, 1, 3:ncol(dfrm))]
Because 3:ncol(dfrm) lists the remaining column indices in their original order, this code preserves the order of every column except the first two.
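A variant of the same idea builds the full index vector first and then swaps its first two entries. This is only a sketch on a made-up toy data frame (the names toy and idx are illustrative, not from the question). It also sidesteps a pitfall of 3:ncol(dfrm): with fewer than three columns, 3:2 counts backwards in R.

```r
# Toy data frame standing in for the real 100+ column data (hypothetical names)
toy <- data.frame(value = c(1.5, 2.5), sampleID = c("s1", "s2"), group = c("a", "b"))

idx <- seq_along(toy)   # 1, 2, 3, ..., ncol(toy)
idx[1:2] <- 2:1         # swap only the first two positions
toy <- toy[idx]         # reorder columns; all others keep their place

names(toy)
```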

data.table::setcolorder(dfrm, c(2, 1))
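setcolorder() reorders the columns by reference, and any columns not listed keep their existing order after the listed ones. dfrm doesn't even need to be a data.table.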

library(dplyr)
df <- tibble(a = c(2,3),
b = c(3,4),
c = c(9,9))
df <- df %>% relocate(b, .before = a)
df
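If the column names are not known in advance, relocate() also accepts column positions, and by default it moves the selected columns to the front. A small sketch on the same kind of toy tibble (df2 is just an illustrative name):

```r
library(dplyr)

df2 <- tibble(a = c(2, 3), b = c(3, 4), c = c(9, 9))

# Move the second column to the front; every other column keeps its relative order
df2 <- df2 %>% relocate(2)

names(df2)
```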

We could use select() from the dplyr package to name the columns that should come first, and then use everything() for the rest:
library(dplyr)
select(head(mtcars), cyl, disp, everything())
cyl disp mpg hp drat wt qsec vs am gear carb
Mazda RX4 6 160 21.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 21.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 22.8 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 21.4 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 18.7 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 18.1 105 2.76 3.460 20.22 1 0 3 1

Related

In a dataframe, replace first 'n' entries of a column with other values from another dataframe

I have a dataframe with a column called 'household'. Household has 2000 rows of entries. Now, I want to replace the first 200 rows with some values, that I have in another column.
So the final result of 'household' would be, first 200 is the replaced rows, and rest of the 1800 rows would be the original rows.
I have tried replace(), slice(), but couldn't figure out a way to do it.
oneday <- read.csv("C:/hh.csv")
top <- oneday %<>% slice(1:10) %>% select("household")
oneday["household"]=top["household"] #this is the part that does not work
It can select the top 10 data as a list, but cannot replace the data to complete the column as required.
Any help would be amazing.
Edit:
So as shown in picture, the data from 2 to 7 are changed, and the remaining are the same. So, data from 2 to 7 are in another data frame, and the remaining are original data.
You can use a simple ifelse statement: if the row number is less than or equal to 200, take the value from the other data frame; otherwise keep the existing household value.
library(dplyr)
oneday %>%
mutate(household = ifelse(row_number() <= 200, OtherDF$datachange, household))
Example with mtcars
Here, I pull data from iris to replace in mtcars.
mtcars %>%
head %>%
mutate(mpg = ifelse(row_number() <= 4, iris$Sepal.Length, mpg))
Output
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 5.1 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 4.9 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4.7 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 4.6 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Or in base R you can just specify the rows and column and directly replace them:
oneday[2:7, "household"] <- top[2:7, "household"]
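For the "first n rows" case from the question, the same base R assignment can be sketched on toy data. The data frames and the column name datachange below are stand-ins (assumptions), not the asker's real data:

```r
# Toy stand-ins for the real data (all names here are hypothetical)
oneday  <- data.frame(household = 1:10)
otherdf <- data.frame(datachange = 101:105)

n <- nrow(otherdf)   # number of leading rows to overwrite

# Overwrite only the first n entries; the remaining rows are untouched
oneday$household[seq_len(n)] <- otherdf$datachange[seq_len(n)]

oneday$household
```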

How to rename all columns between two sets of columns in a dataframe using R?

Say I am interested in renaming columns across several datasets. The columns that need to be renamed vary by name and position, so they can't be selected that way. However, the columns prior to the columns I want to rename and the columns right after are constant.
As an example, say the mpg and cyl columns in mtcars are always the first two columns and their names never change. The vs:carb columns are similar, but their positions change depending on the number of columns added before them (but after cyl). However, the variable names from hp:qsec change, and sometimes a new variable will get added between them.
Say I want to append the word '_Value' to the end of each of the columns that are located after cyl and before vs. How would I go about doing that, ideally using dplyr?
You can try -
library(dplyr)
mtcars %>%
rename_with(~paste0(., '_Value'), -c(mpg:cyl, vs:carb)) %>%
head
# mpg cyl disp_Value hp_Value drat_Value wt_Value qsec_Value vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If the data has other columns and you want to rename specifically the columns between cyl and vs, you can do -
start <- match('cyl', names(mtcars))
end <- match('vs', names(mtcars))
cols <- (start + 1):(end - 1)
names(mtcars)[cols] <- paste0(names(mtcars)[cols], '_Value')
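Since the question mentions several datasets, the positional approach above could be wrapped in a small function. This is a sketch only: rename_between and its argument names are hypothetical, not part of any package.

```r
# Hypothetical helper: add a suffix to every column strictly between two anchor columns
rename_between <- function(df, after, before, suffix = "_Value") {
  start <- match(after, names(df))
  end   <- match(before, names(df))
  stopifnot(!is.na(start), !is.na(end), start < end)
  cols <- seq_len(end - start - 1) + start   # empty when the anchors are adjacent
  if (length(cols) > 0) {
    names(df)[cols] <- paste0(names(df)[cols], suffix)
  }
  df
}

renamed <- rename_between(mtcars, after = "cyl", before = "vs")
names(renamed)
```

Using seq_len() rather than (start + 1):(end - 1) avoids a backwards sequence when there are no columns between the two anchors.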

Can you use "starts_with" as shorthand within a simple "as.numeric" function to query multiple columns?

I have a dataframe with multiple columns of a numeric type, where I want to query if a range of values exist in any of them, and bring back a true/false binary flag with as.numeric.
So I can do this the long way with:
df <- df %>%
  mutate(flag = as.numeric(days_dry %in% c(1:28) |
                           days_frozen %in% c(1:28) |
                           days_fresh %in% c(1:28)))
But I have a bunch of columns I want to query. Why can't I bring back the same result with this?:
df <- df %>%
mutate(flag = as.numeric(vars(starts_with("days_")) %in% c(1:28)))
I get no error, but it doesn't bring back any cases which match the criteria.
There might be a better way, but ...
mtcars %>%
mutate(flag = rowSums(sapply(cbind(select(., starts_with("c"))), `%in%`, 4:6)) > 0) %>%
head()
# mpg cyl disp hp drat wt qsec vs am gear carb flag
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
The premise is using cbind(select(., <>)) to form a mid-pipe inner frame. From there, we sapply over its columns, converting them to columns of logicals. The last step is using rowSums(.) > 0 to determine whether a row has at least one TRUE. An alternative to rowSums is Reduce(`|`, ...); while that is elegant in a list-processing kind of way, it is also slower (especially with multiple matching columns).
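Another option, assuming a dplyr version that provides if_any() (1.0.4 or later, as far as I know), is to let if_any() apply the test to each selected column and combine the results row-wise. The toy data below is made up for illustration, and the Reduce() line is only there as a base R cross-check:

```r
library(dplyr)

# Toy data with hypothetical days_* columns
df <- tibble(
  id          = 1:4,
  days_dry    = c(0, 30, 5, 40),
  days_frozen = c(50, 0, 0, 12),
  days_fresh  = c(0, 0, 0, 99)
)

# if_any() applies the test to each selected column and ORs the results row-wise
df <- df %>%
  mutate(flag = as.numeric(if_any(starts_with("days_"), ~ .x %in% 1:28)))

# Base R cross-check using Reduce() with `|`
day_cols  <- startsWith(names(df), "days_")
flag_base <- as.numeric(Reduce(`|`, lapply(df[day_cols], `%in%`, 1:28)))

df$flag
```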

How to recover property from table into dataframe in R? [duplicate]

I think I am missing a fundamental concept about R's data frames.
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The names of the cars here. Is this a column? I don't think so, because I am not able to access them via mtcars[,1]. And there is no column name/header for it.
How could I create a data frame like that? How could I use that special column e.g. to describe the data in a plot for example?
They are row names. To access them, use:
rownames(mtcars)
For column names, use colnames. To see both row and column names, we can use:
dimnames(mtcars)
To modify one, for example the first row name:
rownames(mtcars)[1] <- "myNewName"
When a data frame is created with data.frame, row names are assigned the numbers 1:n by default.
mydata <- data.frame(x = 1:5)
Then we can modify them:
rownames(mydata) <- paste0("MyName", 1:5)
Or we can add rownames when creating the data.frame:
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))
Note:
rownames are not very reliable, for example see this post. (This could be a subjective opinion; I avoid them by reassigning rownames to columns.)
The data.table and dplyr packages prefer not to have them. You can always copy the row names into a column:
mydata$myNames <- rownames(mydata)
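Putting those pieces together, here is a minimal sketch of using row names as an index and then moving them into an ordinary column. It reuses the mydata example from above; resetting the row names with NULL is my own addition:

```r
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))

# Row names can be used to index rows directly
mydata["MyName3", "x"]

# Copy them into a regular column, then reset the row names to the default 1:n
mydata$myNames <- rownames(mydata)
rownames(mydata) <- NULL

mydata
```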
A shorter one-liner with the data.table package will turn the row names into a column.
library(data.table)
setDT(mtcars, keep.rownames = TRUE)[]
head(mtcars)
rn mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This works too using tibble.
library(tibble)
mtcars %>%
rownames_to_column(var="carnames")
How could you create a data frame like that?
You can turn a column into row names using the textshape package. See the example below, which converts a column to row names:
library(textshape)
state_dat <- data.frame(state.name, state.area, state.center, state.division)
column_to_rownames(state_dat)
#making 'state.name' to row names in new data 'new_state_dat'
new_state_dat<-column_to_rownames(state_dat, 'state.name')
I advise you not to use row.names() to transform a column into row names.
How could I use that special column, e.g. to describe the data in a plot?
You can use the superheat package; for more information, see https://rlbarter.github.io/superheat/index.html. It is simpler and more powerful if you use the textshape package instead of row.names() to turn a column into row names.
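For a quick look without extra packages, the row names can also be passed straight to a base graphics function as labels. This is only a sketch using dotchart() on the first six cars; cars6 is an illustrative name:

```r
# Label each point with its row name (first six cars only, to keep the plot readable)
cars6 <- head(mtcars)
dotchart(cars6$mpg, labels = rownames(cars6), xlab = "mpg")
```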
