Counting number of entries by ID in R?

I'm trying to count the number of entries by ID in R. I'll use a modified version of mtcars to get my point across. Here's the data:
car type mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Datsun 710 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
I want to end up with a table that counts the number of entries for each ID, so that my results are:
Mazda RX4 2
Datsun 710 2
Should be a fairly simple and straightforward solution, but I'm new to R and can't quite figure it out. Should I use "Aggregate"?

You can use either table or count
as.data.frame(table(rownames(mtcars)))
Or
library(plyr)
count(rownames(mtcars))
If you need the count for one of the columns:
as.data.frame(table(yourdf$id))
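Since the question asks about aggregate, here is a minimal base R sketch that compares it with table. The data frame `df` and the column name `car` are assumptions reconstructed from the question's sample rows, not the asker's actual objects.

```r
# Hypothetical reconstruction of the question's data; the column name `car`
# is an assumption, since the original header is ambiguous.
df <- data.frame(
  car = c("Mazda RX4", "Mazda RX4", "Datsun 710", "Datsun 710"),
  mpg = c(21.0, 21.0, 22.8, 21.4)
)

# table() counts how often each distinct value occurs
counts_table <- as.data.frame(table(df$car), stringsAsFactors = FALSE)
names(counts_table) <- c("car", "n")

# aggregate() gives the same counts by applying length() within each group
counts_agg <- aggregate(mpg ~ car, data = df, FUN = length)
names(counts_agg)[2] <- "n"
```

Both results list each car once with its count; table is usually the shorter route when only counts are needed.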

Using dplyr on a data frame named df with an ID variable called id:
library(dplyr)
tally(group_by(df, id))
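As a possible shorthand, dplyr also provides count(), which combines the grouping and tallying steps. The sketch below assumes a small stand-in data frame `df` with an `id` column; it is illustrative only.

```r
library(dplyr)

# Hypothetical stand-in for the question's data
df <- data.frame(
  id = c("Mazda RX4", "Mazda RX4", "Datsun 710", "Datsun 710")
)

# count(df, id) is equivalent to tally(group_by(df, id))
counts <- count(df, id)
```

The result has one row per id and a column `n` holding the number of entries.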

Related

Getting the total of all rows but excluding certain columns Rstudio

I need to get the total of each row in my table; however, I want to exclude certain columns, as these contain numeric data such as plot numbers or treatments that I don't want counted.
I have tried using mutate and rowSums for this, but it doesn't seem to work and I get this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
mutate(total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Following a comment here is my updated code:
df<- mutate(total = rowSums(select(Flower, !c(Ranunculus.repens, Ranunculus.acris, Ranunculus.ficaria, Trifolium.repens, Geranium.molle, Cardamine.flexuosa, Bellis.perennis, Taraxacum.officinalis, Lamium.purpureum, Glechoma.hederacea, Cardamine.pratensis, Medicago.lupulina, Medicago.arabica, Cerastium.fontanum, Prunella.vulgaris, Sonchus.arvensis, Veronica.persica, Veronica.chamaedrys, Viola.riviniana)), na.rm = TRUE))
I am now getting an error message saying that 'x' must be numeric; however, I have checked, and all of the columns I entered are numeric.
The issue is that the first argument of mutate has to be a data frame, whereas your code passes total (a numeric vector) as the first argument. To make your code work, pass the data frame first:
library(dplyr)
mutate(Flower, total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
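The later "'x' must be numeric" error most likely comes from the `!c(...)` in the updated code: negating the species columns keeps everything else, including non-numeric columns such as the date or site. A minimal sketch, assuming a small hypothetical stand-in for the Flower data (the column values here are invented for illustration):

```r
library(dplyr)

# Hypothetical stand-in for the Flower data frame
flower <- data.frame(
  Site = c("A", "B"),
  Plot = c(1, 2),
  Bellis.perennis = c(3, 0),
  Trifolium.repens = c(2, 5)
)

# Name the species columns to sum, rather than negating them with !c(...)
flower_totals <- mutate(
  flower,
  total = rowSums(
    select(flower, c(Bellis.perennis, Trifolium.repens)),
    na.rm = TRUE
  )
)
```

Here `total` only adds the two species columns, so the character column `Site` never reaches rowSums.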
Using mtcars as example data:
library(dplyr)
mtcars |>
mutate(total = rowSums(select(mtcars, !c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
Another option would be to use dplyr::across like so:
mtcars |>
mutate(total = rowSums(across(!c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
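For comparison, the same totals can be computed without dplyr. This is a base R sketch using the same excluded columns as the examples above; the names `exclude`, `keep`, and `totals` are just illustrative.

```r
# Columns to leave out of the total (same ones as in the dplyr examples)
exclude <- c("hp", "mpg", "disp", "drat", "wt", "qsec")

# Every remaining column contributes to the row total
keep <- setdiff(names(mtcars), exclude)

totals <- mtcars
totals$total <- rowSums(totals[keep], na.rm = TRUE)
```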

How to make a list with a variable name?

I'm using the mtcars dataset for this example.
I have a function which creates a named list using a variable:
make_list <- function(df, variable_name) {
a <- df %>%
list(variable_name = .)
return(a)
}
When I use this function:
mylist <- make_list(mtcars, "car_info")
head(mylist)
$variable_name
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
The list name is called variable_name, rather than car_info.
How do I change the function (but still use a pipe format) so that the correct name is returned?
If you want to continue using the pipe, you can use setNames:
make_list <- function(df, variable_name) {
df %>%
list() %>%
setNames(variable_name)
}
make_list(mtcars, "car_info")
Output:
$car_info
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Or assign the name in a separate step:
make_list <- function(df, variable_name) {
a <- df %>% list
names(a) <- variable_name
return(a)
}
Try this:
make_list <- function(df, variable_name) {
a <- df %>%
list()
names(a) <- variable_name
return(a)
}
mylist <- make_list(mtcars, "car_info")
Output (Some rows):
mylist
$car_info
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
rlang has a list2 function that does that:
make_list <- function(df, variable_name) {
rlang::list2(!! variable_name := df)
}
make_list(mtcars, "car_info")
#> $car_info
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Or tibble::lst works the same:
make_list <- function(df, variable_name) {
tibble::lst(!! variable_name := df)
}
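The original function returns `$variable_name` because `list(variable_name = .)` treats the left-hand side as a literal name rather than looking up the argument's value. If the pipe is not essential, a base R sketch of the same idea is:

```r
make_list <- function(df, variable_name) {
  # setNames() applies the string stored in `variable_name`,
  # whereas list(variable_name = df) would use the literal name
  setNames(list(df), variable_name)
}

mylist <- make_list(mtcars, "car_info")
names(mylist)
```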

Using select=-c() in subset function gives error: invalid argument to unary operator

I have a matrix and I would like to eliminate two columns by their names.
My code was :
trn_data = subset(trn_data, select = -c("Rye flour","Barley products"))
but R gave me an error message like this:
Error in -c("Rye flour", "Barley products") :
invalid argument to unary operator
I tried this
trn_data = subset(trn_data, select = -c(Rye flour,Barley products))
Also returning an error:
Error: unexpected symbol in "trn_data=subset(trn_data,select =-c(Rye flour"
How can I fix this? Is there any other method that can eliminate two columns by their names?
You should not provide the names as characters to subset. This works:
trn_data_subset <- subset(trn_data, select = -c(`Rye flour`,`Barley products`))
If the column names contain spaces, wrap them in backticks (grave accents).
Here's an example using the mtcars dataset:
mtexample <- mtcars[1:4,]
names(mtexample)[1] <- "mpg with space"
mtexample
#> mpg with space cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
subset(mtexample, select = -c(`mpg with space`, cyl))
#> disp hp drat wt qsec vs am gear carb
#> Mazda RX4 160 110 3.90 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 160 110 3.90 2.875 17.02 0 1 4 4
#> Datsun 710 108 93 3.85 2.320 18.61 1 1 4 1
#> Hornet 4 Drive 258 110 3.08 3.215 19.44 1 0 3 1
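The original error arises because unary minus is not defined for character vectors; negative indexing only works on numeric positions. A minimal sketch of that distinction, assuming a small hypothetical matrix in place of the asker's trn_data:

```r
# Hypothetical stand-in for trn_data: a matrix with named columns
trn_data <- matrix(
  1:8, nrow = 2,
  dimnames = list(NULL, c("Rye flour", "Barley products", "Wheat", "Oats"))
)
drop_cols <- c("Rye flour", "Barley products")

# Negating a character vector fails; capture the message instead of stopping
err <- tryCatch(-drop_cols, error = function(e) conditionMessage(e))

# Convert the names to integer positions first, then negate those
idx <- match(drop_cols, colnames(trn_data))
trn_small <- trn_data[, -idx, drop = FALSE]
```

Note that this sketch assumes every name in `drop_cols` exists; match() returns NA for a missing name, which would need handling before negating.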
You can also do it like this:
within(trn_data, rm(`Rye flour`,`Barley products`))
or
trn_data[, !(colnames(trn_data) %in% c("Rye flour","Barley products"))]
With dplyr, we can still use - with a double-quoted column name:
library(dplyr)
mtexample %>%
select(-"mpg with space")
# cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 6 258 110 3.08 3.215 19.44 1 0 3 1
Data:
mtexample <- mtcars[1:4,]
names(mtexample)[1] <- "mpg with space"

Rename Columns with names from another data frame

I'm learning R programming and have hit a few problems along the way, which I have been able to fix with your help.
But I now have a need to rename columns of a data frame. I have a translation data frame with 2 columns that contains the column names and what the new columns should be called.
My question is how to select the two columns from the trans data frame and use them as the trans$old and trans$new variables. I have 7 columns to rename, and the list might grow, hence the translation table. Here is my code:
replace_header <- function()
{
names(industries)[names(industries)==trans$old] <- trans$new
replaced <- industries
return (replaced)
}
replaced_industries <- replace_header()
Here's an example using the built-in mtcars data frame. We'll use the match function to find the indices of the column names we want to replace and then replace them with the new names.
# Copy of built-in data frame
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Data frame with column name substitutions
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
dat
old new
1 mpg new.name1
2 am new.name2
Use match to find the indices of the "old" names in the mt data frame:
match(dat[,"old"], names(mt))
[1] 1 9
Substitute "old" names with "new" names:
names(mt)[match(dat[,"old"], names(mt))] = dat[,"new"]
head(mt,3)
new.name1 cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I'd recommend setnames from "data.table" for this. Using @eipi10's example:
mt = mtcars
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
library(data.table)
setnames(mt, dat$old, dat$new)
names(mt)
# [1] "new.name1" "cyl" "disp" "hp" "drat" "wt"
# [7] "qsec" "vs" "new.name2" "gear" "carb"
If there's a concern, as indicated by @jmbadia, that the data.frame with the old and new names contains old names that are not present in the data, you can add skip_absent=TRUE to setnames.
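A short sketch of the skip_absent option, assuming a lookup table in which one old name ("strangeName") does not exist in the data. The data are converted with as.data.table first so that the by-reference rename acts on a fresh copy rather than on mtcars itself.

```r
library(data.table)

# Work on a data.table copy so mtcars is left untouched
dt <- as.data.table(mtcars)

# Lookup table; "strangeName" is deliberately absent from the data
dat <- data.frame(
  old = c("strangeName", "am"),
  new = c("new.name1", "new.name2"),
  stringsAsFactors = FALSE
)

# skip_absent = TRUE silently ignores old names that are not found
setnames(dt, dat$old, dat$new, skip_absent = TRUE)
```

Without skip_absent = TRUE, setnames stops with an error when an old name is missing.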
To improve a bit on eipi10's answer: if we want to use a "rename data frame" whose old names are not always present in the mt data frame (e.g. because mt comes from different sources, so we don't always know its column names), we can use the following code:
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# dataframe with possible names to replace
dat = data.frame(old=c("strangeName","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
# find which old names are present in mt
namesMatched <- dat$old %in% names(mt)
#renaming
names(mt)[match(dat[namesMatched,"old"], names(mt))] = dat[namesMatched,"new"]
head(mt,3)
mpg cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
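With dplyr, a similar "rename only what exists" behaviour can be sketched with rename() and any_of(), which accepts a named character vector of the form c(new = "old"). The `lookup` name is illustrative.

```r
library(dplyr)

# Lookup table; "strangeName" is deliberately absent from mtcars
dat <- data.frame(
  old = c("strangeName", "am"),
  new = c("new.name1", "new.name2"),
  stringsAsFactors = FALSE
)

# Named vector: names are the new column names, values are the old ones
lookup <- setNames(dat$old, dat$new)

# any_of() ignores entries whose old name is not present in the data
mt <- rename(mtcars, any_of(lookup))
```

Using all_of() instead of any_of() would raise an error for the missing column, which can be useful when every name is expected to exist.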

sum with unite function tidyr

I was reading through the tidyr documentation and am trying to make use of the unite function. Is it possible to use unite to sum specified columns? Here is the example from the documentation:
mtcars %>%
unite(vs_am, vs, am)
mpg cyl disp hp drat wt qsec vs_am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0_1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0_1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1_1 4 1
I'm trying to figure out how to make vs_am the sum of the two columns rather than their values pasted together as characters. E.g. for Mazda RX4, vs_am = 1 (because 0 + 1 = 1).
@Tyler is absolutely correct: unite is not the appropriate function for this task.
Here is the code I was looking for:
mtcars %>%
mutate(vs_am = vs + am)
mpg cyl disp hp drat wt qsec vs am gear carb vs_am
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
