get rid of first column when converting dtm Matrix to DataFrame

I've converted a Document Term Matrix to a data frame using this simple line:
dtm.df <- as.data.frame(inspect(dtm))
The problem is that I want to remove the first column (filenames), but the column has no name.

There might be two different issues here: rownames vs. columns.
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Here you see a column printed without a name. These are the row names.
mpg is the first column. If we wanted to remove this column without referring to its name, we could use:
mtcars <- mtcars[,-1]
head(mtcars)
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
On the other hand, if you are talking about the rownames, which are still printed, you can remove them with the function rownames:
rownames(mtcars) <- NULL
head(mtcars)
cyl disp hp drat wt qsec vs am gear carb
1 6 160 110 3.90 2.620 16.46 0 1 4 4
2 6 160 110 3.90 2.875 17.02 0 1 4 4
3 4 108 93 3.85 2.320 18.61 1 1 4 1
4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 8 360 175 3.15 3.440 17.02 0 0 3 2
6 6 225 105 2.76 3.460 20.22 1 0 3 1
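To see the same distinction on something shaped like a document-term matrix, here is a small sketch. It assumes the DTM has already been turned into an ordinary matrix (for example with as.matrix(dtm) from tm); the matrix m, its document names and its terms are made up for illustration. The filenames end up as row names, not as a column, so there is nothing for [,-1] to drop:

```r
# Hypothetical stand-in for as.matrix(dtm): rows are documents, columns are terms.
m <- matrix(c(1, 0, 2,
              0, 3, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("doc1.txt", "doc2.txt"),
                            c("apple", "banana", "cherry")))
dtm.df <- as.data.frame(m)

# The filenames are row names, not a column: only the terms are columns.
names(dtm.df)
ncol(dtm.df)

# Option 1: drop the row names; printing falls back to 1, 2, ...
rownames(dtm.df) <- NULL

# Option 2: keep the filenames as an ordinary first column instead.
# row.names = NULL stops data.frame() from reusing the matrix row names;
# check.names = FALSE keeps the term names exactly as they are.
with_doc <- data.frame(doc = rownames(m), m,
                       row.names = NULL, check.names = FALSE)
```

Which option fits depends on whether the filenames are still needed later in the analysis.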

Related

select_at() drop some vars, pull some to front and then everything() in one call?

For example, I want to drop the field mpg, select carb so that it's first, and then keep everything that's left over in its existing order.
mtcars |> select_at(vars(-mpg, carb, everything()))
This seems to drop mpg as desired, but carb is not in the front position / first variable.
My call to select_at() was intended to read in English as 'drop mpg, then select carb first, then everything else'.
The docs for ?vars say to use across(). I'm open to either, but would prefer a one-liner if possible, as opposed to e.g. select(-mpg) |> select_at(vars(carb, everything()))

Change the order, i.e. place the column that needs to be deleted as the last entry:
library(dplyr)
mtcars |>
select_at(vars(carb, everything(), -mpg)) |>
head()
carb cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 6 225 105 2.76 3.460 20.22 1 0 3
The scoped variants (_at, _all, etc.) are superseded. We can use everything() directly within select():
mtcars |>
select(carb, everything(), -mpg) |>
head()
carb cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 6 225 105 2.76 3.460 20.22 1 0 3
The issue is that when -mpg is the first entry, the selection starts from all columns and removes mpg. Adding carb as the second entry then does nothing, because carb is already selected and selecting a column again does not move it. Finally, everything() adds mpg back at the end:
> mtcars |> select_at(vars(-mpg)) |> head()
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars |> select_at(vars(-mpg, carb)) |> head()
cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1
> mtcars |> select_at(vars(-mpg, carb, everything())) |> head()
cyl disp hp drat wt qsec vs am gear carb mpg
Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4 21.0
Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4 21.0
Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1 22.8
Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1 21.4
Hornet Sportabout 8 360 175 3.15 3.440 17.02 0 0 3 2 18.7
Valiant 6 225 105 2.76 3.460 20.22 1 0 3 1 18.1
According to ?select, the usage is
select(.data, ...)
where ... is a variadic argument that can take any number of column names, positions, etc.
The expressions are evaluated from left to right: the first expression is evaluated, then the second, and so on.
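A quick way to check the left-to-right rule is to compare the column names each call produces. This sketch uses select() instead of the superseded select_at(); the relocate() line is an extra two-step alternative not used in the answers above and assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Leading negative selector: start from all columns minus mpg.
# carb is already selected, so it stays put; everything() then re-adds mpg last.
first_try <- mtcars |> select(-mpg, carb, everything())
names(first_try)

# Exclusion last: carb first, then the remaining columns in order, then mpg removed.
fixed <- mtcars |> select(carb, everything(), -mpg)
names(fixed)

# Two-step alternative: drop mpg, then relocate() moves carb to the front
# (by default relocate() places the selected columns before the first column).
relocated <- mtcars |> select(-mpg) |> relocate(carb)
identical(names(fixed), names(relocated))
```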

how to swap the third column with the last column, and then delete the swapped last column in R

I want to swap a specific column with the last column and then delete the last column after swapping. After the deletion, ncol(testFrame) should decrease by 1.

Usually a reproducible example is expected, but your description is clear enough to understand what you want to do.
Using mtcars as sample data:
df <- mtcars
head(df)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
swap_column <- 3
cols <- seq_len(ncol(df))
df1 <- df[replace(cols, cols == swap_column, ncol(df))][-ncol(df)]
head(df1)
# mpg cyl carb hp drat wt qsec vs am gear
#Mazda RX4 21.0 6 4 110 3.90 2.620 16.46 0 1 4
#Mazda RX4 Wag 21.0 6 4 110 3.90 2.875 17.02 0 1 4
#Datsun 710 22.8 4 1 93 3.85 2.320 18.61 1 1 4
#Hornet 4 Drive 21.4 6 1 110 3.08 3.215 19.44 1 0 3
#Hornet Sportabout 18.7 8 2 175 3.15 3.440 17.02 0 0 3
#Valiant 18.1 6 1 105 2.76 3.460 20.22 1 0 3
We replace the index swap_column with the index of the last column (ncol(df)) and then remove the last column (-ncol(df)).
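The same indexing idea can be wrapped in a small helper so the column can be given either by position or by name. swap_and_drop_last() is a hypothetical name, not a function from any package; unlike the one-liner, it selects each column at most once, so no temporary duplicate such as carb.1 is created:

```r
# Hypothetical helper: overwrite column `col` (position or name) with the
# last column, then drop the now-redundant last column.
swap_and_drop_last <- function(df, col) {
  n <- ncol(df)
  if (is.character(col)) col <- match(col, names(df))
  stopifnot(length(col) == 1, !is.na(col), col >= 1, col <= n)
  idx <- seq_len(n)
  idx[col] <- n       # the last column takes the place of `col`
  df[idx[-n]]         # keep every position except the final one
}

res_pos  <- swap_and_drop_last(mtcars, 3)
res_name <- swap_and_drop_last(mtcars, "disp")
identical(res_pos, res_name)
```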

We can do this conveniently with add_column from tibble. The .after and .before parameters can take either a column index or a column name. Suppose we need to shift the last column to the third position:
library(tibble)
data(mtcars)
df1 <- add_column(mtcars[-ncol(mtcars)], mtcars[ncol(mtcars)], .after = 2)
head(df1)
# mpg cyl carb disp hp drat wt qsec vs am gear
#Mazda RX4 21.0 6 4 160 110 3.90 2.620 16.46 0 1 4
#Mazda RX4 Wag 21.0 6 4 160 110 3.90 2.875 17.02 0 1 4
#Datsun 710 22.8 4 1 108 93 3.85 2.320 18.61 1 1 4
#Hornet 4 Drive 21.4 6 1 258 110 3.08 3.215 19.44 1 0 3
#Hornet Sportabout 18.7 8 2 360 175 3.15 3.440 17.02 0 0 3
#Valiant 18.1 6 1 225 105 2.76 3.460 20.22 1 0 3
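If you would rather avoid the tibble dependency, a base R sketch of the same move (last column to the third position) builds the new column order with append(); pos is just an illustrative variable:

```r
df <- mtcars
n <- ncol(df)
pos <- 3  # target position for the last column

# Indices 1..(n-1), with n inserted after position pos - 1.
new_order <- append(seq_len(n - 1), n, after = pos - 1)
moved <- df[new_order]
head(moved)
```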

R: Sort columns by object class

Can you sort a data frame's columns by object class? Say
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
sapply(mtcars,class)
and I want all numeric variables first and all factors at the end. I want to be able to do this on a much larger dataset, so I prefer solutions that do not rely on subsetting by column number. Cheers.

Maybe this one?
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
x <- mtcars[, names(sort(unlist(lapply(mtcars, class)), decreasing = TRUE))]
head(x)
# mpg disp hp drat wt qsec gear carb cyl vs am
# Mazda RX4 21.0 160 110 3.90 2.620 16.46 4 4 6 0 1
# Mazda RX4 Wag 21.0 160 110 3.90 2.875 17.02 4 4 6 0 1
# Datsun 710 22.8 108 93 3.85 2.320 18.61 4 1 4 1 1
# Hornet 4 Drive 21.4 258 110 3.08 3.215 19.44 3 1 6 1 0
# Hornet Sportabout 18.7 360 175 3.15 3.440 17.02 3 2 8 0 0
# Valiant 18.1 225 105 2.76 3.460 20.22 3 1 6 1 0
In x, as you can see, the columns cyl, vs and am, which are of class factor, are placed at the end and those of class numeric first. This works because, in decreasing alphabetical order, "numeric" sorts before "factor".
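The sort() trick relies on the alphabetical order of the class names and on every column having a single class: an ordered factor has class c("ordered", "factor"), so unlist() would no longer return exactly one name per column. A sketch that tests the property you care about directly, and still avoids column numbers:

```r
data("mtcars")
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)

# TRUE for numeric columns; is.numeric() is FALSE for factors.
is_num <- vapply(mtcars, is.numeric, logical(1))

# order() is stable: numeric columns (FALSE in !is_num) come first,
# and each group keeps its original relative order.
x <- mtcars[order(!is_num)]
names(x)
```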

How to insert a new column to a data frame with uniform values

I have the following data frame:
> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
What I want to do is insert a new column called 'new_column' with the value 'foo' in every row,
resulting in this:
new_column mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 foo 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag foo 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 foo 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive foo 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout foo 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant foo 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
I tried this but failed:
library(zoo)
zoo("foo",mtcars$new_columns)
What's the right way to do it?

You can just use cbind (if the column must be in the first position):
head(cbind("new_column" = "foo", mtcars))
# new_column mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 foo 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag foo 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 foo 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive foo 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout foo 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant foo 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If the column can be at the end, you can also do:
mtcars$new_column <- "foo"
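The $ assignment recycles the single value to every row. To place the new column somewhere other than the end without cbind, you can reorder by name afterwards; this sketch moves it to the front, and a different name vector would place it elsewhere:

```r
df <- mtcars

# A length-one value is recycled to every row.
df$new_column <- "foo"

# Reorder by name: new_column first, then all remaining columns as before.
df <- df[c("new_column", setdiff(names(df), "new_column"))]
head(df, 3)
```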

how to define the name of a new object with a string?

I would like to define a string
string<- "modelName"
That could be used to name an object later. Something like
paste0(string) <- mtcars
cat(string) <- mtcars
print(string) <- mtcars
get(string) <- mtcars
The needed result is the dataset called "modelName". None of the examples above work, obviously.
Question:
How can one create an object whose name is defined by the string?

As @Spacedman notes, this is not generally how things are done in R, but you can use assign:
string<- "modelName"
assign(string, mtcars)
> head(modelName)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
In general it may be preferable to use something like a list:
x <- list()
x[[string]] <- mtcars
> head(x$modelName)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
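A short sketch of the list approach, with a hypothetical second entry (iris) to show that several objects can be keyed by strings. If separate variables really are needed later, list2env() creates one per list element; here they go into a fresh environment e rather than the global environment:

```r
string <- "modelName"

# Keep objects in a named list, keyed by strings.
x <- list()
x[[string]] <- mtcars
x[["otherName"]] <- iris  # hypothetical second entry for illustration

# Look an object up by the name stored in a string.
head(x[[string]])

# If standalone variables are required, copy each element into an environment.
e <- new.env()
list2env(x, envir = e)
head(get(string, envir = e))
```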
