split comma-separated column entry into rows - r

I have already found other versions of the same question but I was not able to adapt the answers given there for my problem. Here is an older link:
The OP there had data consisting of only two columns, and the given answer handles this really nicely. But what about more than two columns? Is there a way to adapt the linked code snippet?
Here is an example:
ve <- rbind("4,2","3","1,2,3","5","6","7")
expl <- cbind(head(mtcars),ve)
row.names mpg cyl disp hp drat wt qsec vs am gear carb ve
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4,2
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1,2,3
4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
I would need:
row.names mpg cyl disp hp drat wt qsec vs am gear carb ve
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
2 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
3 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
4 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
5 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2
6 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
7 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
8 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
9 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
Thank you!

Try unnest from the tidyr package. My example uses dplyr, but you can also accomplish this with base functions.
library(dplyr)
library(tidyr)
expl %>%
  mutate(ve = strsplit(as.character(ve), ",")) %>%
  unnest(ve)

Here's an attempt using base R only (which also preserves the row names, in a way at least):
ve <- strsplit(ve, ",")
Res <- expl[rep(seq_len(nrow(expl)), sapply(ve, length)), ]
Res$ve <- unlist(ve)
Res
# mpg cyl disp hp drat wt qsec vs am gear carb ve
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
# Mazda RX4.1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# Datsun 710.1 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2
# Datsun 710.2 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7
Or using data.table, one option is
library(data.table)
setDT(expl)[,
  strsplit(as.character(ve), ","),
  by = c(names(expl)[-length(expl)])
]
Another option would be
setkey(expl, ve)[setDT(expl)[, strsplit(as.character(ve), ","), ve]]
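The base R steps in this answer can be wrapped into a small reusable function. This is only a sketch: the name split_rows and its arguments are hypothetical, and it assumes a single delimited column and a literal (non-regex) separator.

```r
# Hypothetical helper (not from the original answer):
# expand one delimited column into one row per value.
split_rows <- function(df, col, sep = ",") {
  # split each entry; fixed = TRUE treats sep as a literal string
  parts <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
  # repeat each row once per value found in its entry
  out <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  # overwrite the column with the individual values
  out[[col]] <- trimws(unlist(parts))
  out
}

# same data as in the question, built as a plain character column
expl <- cbind(head(mtcars), ve = c("4,2", "3", "1,2,3", "5", "6", "7"))
res <- split_rows(expl, "ve")
nrow(res)  # 9
res$ve     # "4" "2" "3" "1" "2" "3" "5" "6" "7"
```

Note that an empty string in the column yields zero values, so that row would be dropped; whether that is desirable depends on the data.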

I would recommend cSplit from my "splitstackshape" package.
Since your example has rownames, I've converted your example data to a data.table with the keep.rownames = TRUE argument.
library(splitstackshape)
cSplit(as.data.table(expl, keep.rownames = TRUE), "ve", ",", "long")
# rn mpg cyl disp hp drat wt qsec vs am gear carb ve
# 1: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 4
# 2: Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 2
# 3: Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 3
# 4: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 1
# 5: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 2
# 6: Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 3
# 7: Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5
# 8: Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 6
# 9: Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 7

Related

Select last row and place it on the top

I have a data frame and I'd like to reorder it. I'd like to make the last row the top row.
For example, if I type mtcars into the console, the last car listed is a Volvo 142E. Suppose I wanted to make this the first row, how would I do that?
dplyr/tidyverse or base R preferred.
In base R -
mtcars[c(nrow(mtcars), seq(nrow(mtcars)-1)), ]
# top 6 rows
mpg cyl disp hp drat wt qsec vs am gear carb
Volvo 142E 21.4 4 121 109 4.11 2.780 18.60 1 1 4 2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Here's a generalized function for moving any row to top -
move_to_top <- function(df, n) {
  # put row n first, then every other row in its original order
  df[c(n, setdiff(seq_len(nrow(df)), n)), ]
}
head(move_to_top(mtcars, 32))
mpg cyl disp hp drat wt qsec vs am gear carb
Volvo 142E 21.4 4 121 109 4.11 2.780 18.60 1 1 4 2
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
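The function above takes row positions. A possible variant, sketched here under the assumption that the data frame has meaningful row names, also accepts row names and several rows at once. The name move_rows_to_top is hypothetical.

```r
# Hypothetical variant: `rows` may be row positions or row names.
move_rows_to_top <- function(df, rows) {
  # translate row names into positions when needed
  idx <- if (is.character(rows)) match(rows, rownames(df)) else rows
  stopifnot(!anyNA(idx))  # fail early on unknown row names
  # selected rows first, then all remaining rows in original order
  df[c(idx, setdiff(seq_len(nrow(df)), idx)), , drop = FALSE]
}

top <- move_rows_to_top(mtcars, c("Volvo 142E", "Valiant"))
rownames(top)[1:3]  # "Volvo 142E" "Valiant" "Mazda RX4"
```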
Here's a base R method which also works for rows other than the last row
to_top <- nrow(mtcars)
mtcars[order(seq(nrow(mtcars)) != to_top),]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# ...
to_top <- which(rownames(mtcars) == 'Valiant')
mtcars[order(seq(nrow(mtcars)) != to_top),]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# ...
You can also use setdiff for the same result
mtcars[c(to_top, setdiff(seq(nrow(mtcars)), to_top)),]
Or the same ordering idea with arrange in dplyr (rownames_to_column comes from tibble):
library(dplyr)
library(tibble)
mtcars %>%
  rownames_to_column() %>%
  arrange(row_number() != n())
# rowname mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
# 2 Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 3 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 4 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# 5 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# 6 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
# ...
Another idea is to subset and bind rowwise, i.e.
rbind(tail(mtcars, 1), head(mtcars, -1))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#...
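As a quick sanity check, the base R variants shown so far can be compared against each other. The sketch below assumes an unmodified mtcars; the object names are illustrative only. It also shows one edge case: with a single-row data frame, seq(nrow(df) - 1) evaluates to c(1, 0), so seq_len is the safer choice.

```r
n <- nrow(mtcars)

by_index   <- mtcars[c(n, seq_len(n - 1)), ]
by_order   <- mtcars[order(seq_len(n) != n), ]
by_setdiff <- mtcars[c(n, setdiff(seq_len(n), n)), ]
by_rbind   <- rbind(tail(mtcars, 1), head(mtcars, -1))

# all four should describe the same data frame
same <- c(
  isTRUE(all.equal(by_index, by_order)),
  isTRUE(all.equal(by_index, by_setdiff)),
  isTRUE(all.equal(by_index, by_rbind))
)
all(same)

# edge case: a data frame with one row
one <- mtcars[1, ]
nrow(one[c(nrow(one), seq(nrow(one) - 1)), ])      # 2, the row is duplicated
nrow(one[c(nrow(one), seq_len(nrow(one) - 1)), ])  # 1, as intended
```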
We can use slice
library(tidyverse)
mtcars %>%
  rownames_to_column('rn') %>%
  slice(c(n(), 1:(n()-1))) %>%
  column_to_rownames('rn')
# mpg cyl disp hp drat wt qsec vs am gear carb
#Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
# ...

how to swap the third column with the last column, and then delete the swapped last column in R

I want to swap a specific column with the last column, and then delete the last column after swapping. After the deletion, ncol(testFrame) will decrease by 1.
Usually a reproducible example is expected but your description is clear enough to understand what you want to do.
Using mtcars as sample data
df <- mtcars
head(df)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
swap_column <- 3
cols <- seq_len(ncol(df))
df1 <- df[replace(cols, cols == swap_column, ncol(df))][-ncol(df)]
head(df1)
# mpg cyl carb hp drat wt qsec vs am gear
#Mazda RX4 21.0 6 4 110 3.90 2.620 16.46 0 1 4
#Mazda RX4 Wag 21.0 6 4 110 3.90 2.875 17.02 0 1 4
#Datsun 710 22.8 4 1 93 3.85 2.320 18.61 1 1 4
#Hornet 4 Drive 21.4 6 1 110 3.08 3.215 19.44 1 0 3
#Hornet Sportabout 18.7 8 2 175 3.15 3.440 17.02 0 0 3
#Valiant 18.1 6 1 105 2.76 3.460 20.22 1 0 3
We replace the column number swap_column with last column number (ncol(df)) and then remove the last column (-ncol(df)).
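The same replace-then-drop idea also works with column names instead of positions, which may be more robust when the column order is not fixed. A minimal sketch, assuming the column to overwrite is "disp" (swap_name and the other object names are illustrative):

```r
df <- mtcars
swap_name <- "disp"               # assumed: the column to be replaced
cols <- names(df)
last_name <- cols[length(cols)]   # "carb" for mtcars

# put the last column's name where swap_name was ...
new_order <- replace(cols, cols == swap_name, last_name)
# ... then drop the final entry so the last column is not kept twice
df1 <- df[new_order[-length(new_order)]]

names(df1)
# "mpg" "cyl" "carb" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
ncol(df1)  # 10
```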
We can do this conveniently with add_column from tibble. The .after and .before parameters take either a column index or a column name. Suppose we need to move the last column to the third position:
library(tibble)
data(mtcars)
df1 <- add_column(mtcars[-ncol(mtcars)], mtcars[ncol(mtcars)], .after = 2)
head(df1)
# mpg cyl carb disp hp drat wt qsec vs am gear
#Mazda RX4 21.0 6 4 160 110 3.90 2.620 16.46 0 1 4
#Mazda RX4 Wag 21.0 6 4 160 110 3.90 2.875 17.02 0 1 4
#Datsun 710 22.8 4 1 108 93 3.85 2.320 18.61 1 1 4
#Hornet 4 Drive 21.4 6 1 258 110 3.08 3.215 19.44 1 0 3
#Hornet Sportabout 18.7 8 2 360 175 3.15 3.440 17.02 0 0 3
#Valiant 18.1 6 1 225 105 2.76 3.460 20.22 1 0 3

R: Sort columns by object class

Can you sort a df based on object class? Say
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
sapply(mtcars,class)
and I want all numeric variables first and then all factors at the end? I want to be able to do this on a much larger dataset so I prefer solutions that do not rely on subsetting by column number. Cheers.
Maybe this one?
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
x <- mtcars[,names(sort(unlist(lapply(mtcars, class)), decreasing = TRUE))]
head(x)
# mpg disp hp drat wt qsec gear carb cyl vs am
# Mazda RX4 21.0 160 110 3.90 2.620 16.46 4 4 6 0 1
# Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02 4 4 6 0 1
# Datsun 710 22.8 108 93 3.85 2.320 18.61 4 1 4 1 1
# Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44 3 1 6 1 0
# Hornet Sportabout 18.7 360 175 3.15 3.440 17.02 3 2 8 0 0
# Valiant 18.1 225 105 2.76 3.460 20.22 3 1 6 1 0
In x, as you can see, the columns cyl, vs and am, which are of class factor, are placed at the end, and those of class numeric come first.
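The approach above relies on "numeric" sorting after "factor" alphabetically. An alternative sketch that does not depend on class names is to order on a logical vector; order() leaves ties in their original order, so columns keep their relative positions within each group. The object names here are illustrative.

```r
df <- mtcars
fac_cols <- c("cyl", "vs", "am")
df[fac_cols] <- lapply(df[fac_cols], factor)

# FALSE (not a factor) sorts before TRUE (factor)
is_fac <- vapply(df, is.factor, logical(1))
sorted <- df[order(is_fac)]

names(sorted)
# "mpg" "disp" "hp" "drat" "wt" "qsec" "gear" "carb" "cyl" "vs" "am"
```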

get rid of first column when converting dtm Matrix to DataFrame

I've converted a Document Term Matrix to a dataframe using this simple line
dtm.df <- as.data.frame(inspect(dtm))
The problem is I want to remove the first column (filenames) but the column has no name.
There might be two different issues here: rownames vs. columns.
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Here you see a column printed without a name. These are the rownames.
mpg is the first column. If we wanted to remove this column without referring to its name, we could use
mtcars <- mtcars[,-1]
head(mtcars)
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
On the other hand, if you are talking about the rownames, which are still printed, you can remove them with the function rownames:
rownames(mtcars) <- NULL
head(mtcars)
cyl disp hp drat wt qsec vs am gear carb
1 6 160 110 3.90 2.620 16.46 0 1 4 4
2 6 160 110 3.90 2.875 17.02 0 1 4 4
3 4 108 93 3.85 2.320 18.61 1 1 4 1
4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 8 360 175 3.15 3.440 17.02 0 0 3 2
6 6 225 105 2.76 3.460 20.22 1 0 3 1
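To make the distinction concrete, here is a small sketch with a plain matrix standing in for the document term matrix (an assumption; the file and term names are made up). The file names end up as row names, not as a column, so dropping the first column would remove a real term.

```r
# made-up stand-in for a document term matrix
m <- matrix(
  c(1, 0, 2,
    1, 0, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1.txt", "doc2.txt"), c("apple", "banana", "cherry"))
)
dtm.df <- as.data.frame(m)

names(dtm.df)      # "apple" "banana" "cherry": no file name column
rownames(dtm.df)   # "doc1.txt" "doc2.txt"

# dropping the row names leaves all term columns in place
rownames(dtm.df) <- NULL
rownames(dtm.df)   # "1" "2"
ncol(dtm.df)       # 3
```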

How to insert a new column to a data frame with uniform values

I have the following data frame:
> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
What I want to do is insert a new column called 'new_column' with the value 'foo',
resulting in this:
new_column mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 foo 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag foo 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 foo 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive foo 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout foo 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant foo 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I tried this but failed:
library(zoo)
zoo("foo",mtcars$new_columns)
What's the right way to do it?
You can just use cbind (if the position of the column must be first):
head(cbind("new_column" = "foo", mtcars))
# new_column mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 foo 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag foo 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 foo 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive foo 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout foo 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant foo 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If the column can be at the end, you can also do:
mtcars$new_column <- "foo"
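If the column should go somewhere other than the first or last position, one option is to add it and then reorder the column names with append. This is a sketch; the helper name insert_column and its after argument are hypothetical.

```r
# Hypothetical helper: add a constant column after position `after`
# (after = 0 puts it first).
insert_column <- function(df, name, value, after = 0) {
  df[[name]] <- value  # a length-one value is recycled to every row
  # rebuild the column order with `name` at the requested position
  ord <- append(setdiff(names(df), name), name, after = after)
  df[ord]
}

first <- insert_column(mtcars, "new_column", "foo")
names(first)[1]   # "new_column"

third <- insert_column(mtcars, "new_column", "foo", after = 2)
names(third)[1:4] # "mpg" "cyl" "new_column" "disp"
```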

Resources