sum with unite function tidyr - r

I was reading through the tidyr documentation. I'm trying to make use of the unite function. Is it possible to use the unite function to sum specified columns? Using the example from the documentation.
mtcars %>%
unite(vs_am, vs, am)
mpg cyl disp hp drat wt qsec vs_am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0_1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0_1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1_1 4 1
I'm trying to figure out how to make vs_am the sum of the two columns rather than their values pasted together as characters. E.g. for Mazda RX4, vs_am = 1 (because 0 + 1 = 1).

@Tyler is absolutely correct: unite is not the appropriate function for this task.
Here is the code I was looking for:
mtcars %>%
mutate(vs_am = vs + am)
mpg cyl disp hp drat wt qsec vs am gear carb vs_am
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
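To make the difference concrete, here is a minimal sketch comparing the two calls on mtcars. It assumes dplyr and tidyr are installed; the names united and summed are only illustrative.

```r
library(dplyr)
library(tidyr)

# unite() pastes the values together, so the result is a character column
united <- mtcars %>% unite(vs_am, vs, am)

# mutate() with `+` does arithmetic, so the result stays numeric
summed <- mtcars %>% mutate(vs_am = vs + am)

class(united$vs_am)    # "character"
class(summed$vs_am)    # "numeric"
head(summed$vs_am, 3)  # 1 1 2
```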

Related

Getting the total of all rows but excluding certain columns Rstudio

I need to get the total of each row within my table however I want to exclude certain columns as these contain numeric data such as plot numbers or treatments that I don't want to be counted.
I have tried using mutate and rowSums for this, but it doesn't work and I get this error:
Error in UseMethod("mutate") :
no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
mutate(total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Following a comment here is my updated code:
df<- mutate(total = rowSums(select(Flower, !c(Ranunculus.repens, Ranunculus.acris, Ranunculus.ficaria, Trifolium.repens, Geranium.molle, Cardamine.flexuosa, Bellis.perennis, Taraxacum.officinalis, Lamium.purpureum, Glechoma.hederacea, Cardamine.pratensis, Medicago.lupulina, Medicago.arabica, Cerastium.fontanum, Prunella.vulgaris, Sonchus.arvensis, Veronica.persica, Veronica.chamaedrys, Viola.riviniana)), na.rm = TRUE))
I am now getting an error message saying that 'x' must be numeric; however, I have checked and all of the columns I entered are numeric.
The issue is that the first argument of mutate has to be a data frame. Because your call passes only total = rowSums(...), mutate is dispatched on that numeric vector instead of on Flower. To make your code work, pass the data frame first:
library(dplyr)
mutate(Flower, total=rowSums(select(Flower,-Survey, -Date, -Recorder, -Site, -Block, -Plot, -Treatment)))
Using mtcars as example data:
library(dplyr)
mtcars |>
mutate(total = rowSums(select(mtcars, !c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
Another option would be to use dplyr::across like so:
mtcars |>
mutate(total = rowSums(across(!c(hp, mpg, disp, drat, wt, qsec))))
#> mpg cyl disp hp drat wt qsec vs am gear carb total
#> Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 15
#> Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 15
#> Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 11
#> Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 11
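Regarding the follow-up error that 'x' must be numeric: one possible cause, judging only from the updated call in the question, is that !c(...) around the species columns drops those columns and keeps Survey, Date, Site and so on, some of which may not be numeric. The sketch below uses a small made-up data frame (flower and its values are hypothetical) to show how to list the non-numeric columns and how to sum only the species columns.

```r
library(dplyr)

# Hypothetical stand-in for the Flower data: two ID columns, two species columns
flower <- data.frame(
  Site = c("A", "B"),
  Plot = c(1, 2),
  Bellis.perennis = c(3, NA),
  Trifolium.repens = c(1, 4)
)

# Columns that are not numeric; passing any of them to rowSums() gives
# the error "'x' must be numeric"
non_numeric <- names(flower)[!sapply(flower, is.numeric)]
non_numeric

# Sum only the species columns (note: no "!" in front of c())
flower_total <- flower |>
  mutate(total = rowSums(across(c(Bellis.perennis, Trifolium.repens)), na.rm = TRUE))
flower_total$total
```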

common data between multiple csv in R

I have 10 csv files, each with a single column data (name) entries. For instance, file 1 has 400 entries of names, file 2 has 386 entries of names, file 3 has 700 entries of names and so on.
I want to find the common entries across all 10 csv files and write them onto a new csv file.
It would be great if someone could post a solution, preferably in R.
You can do it this way:
your_files <- c(path1,path2,...)
your_tables <- lapply(your_files,read.csv)
your_common_colnames <- Reduce(intersect,lapply(your_tables,colnames))
your_new_tables <- lapply(your_tables,`[`,your_common_colnames)
your_output <- do.call(rbind,your_new_tables)
Example :
mtcars1 <- mtcars[1:3,1:5]
# mpg cyl disp hp drat
# Mazda RX4 21.0 6 160 110 3.90
# Mazda RX4 Wag 21.0 6 160 110 3.90
# Datsun 710 22.8 4 108 93 3.85
mtcars2 <- mtcars[1:3,3:10]
# disp hp drat wt qsec vs am gear
# Mazda RX4 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 160 110 3.90 2.875 17.02 0 1 4
# Datsun 710 108 93 3.85 2.320 18.61 1 1 4
your_tables <- list(mtcars1,mtcars2)
your_common_colnames <- Reduce(intersect,lapply(your_tables,colnames))
your_new_tables <- lapply(your_tables,`[`,your_common_colnames)
your_output <- do.call(rbind,your_new_tables)
# disp hp drat
# Mazda RX4 160 110 3.90
# Mazda RX4 Wag 160 110 3.90
# Datsun 710 108 93 3.85
# Mazda RX41 160 110 3.90
# Mazda RX4 Wag1 160 110 3.90
# Datsun 7101 108 93 3.85
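The code above intersects column names. If the goal is instead the entries that appear in every file, as the question describes, the same Reduce(intersect, ...) idea can be applied to the values. This is a sketch under assumptions: the three small files, their contents and the output name common.csv are made up for illustration, and each file is assumed to have a single column with a header.

```r
# Hypothetical setup: three single-column CSV files in a temporary directory
dir <- tempfile("names_")
dir.create(dir)
write.csv(data.frame(name = c("Ann", "Bob", "Cy", "Dee")),
          file.path(dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(name = c("Bob", "Cy", "Eve")),
          file.path(dir, "f2.csv"), row.names = FALSE)
write.csv(data.frame(name = c("Cy", "Bob", "Fay")),
          file.path(dir, "f3.csv"), row.names = FALSE)

# Read the first (only) column of every file as a character vector
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
name_lists <- lapply(files, function(f) read.csv(f, stringsAsFactors = FALSE)[[1]])

# Keep only the entries present in all files
common <- Reduce(intersect, name_lists)

# Write the result to a new CSV file
write.csv(data.frame(name = common), file.path(dir, "common.csv"), row.names = FALSE)
```

intersect also drops duplicates, so each common name appears once in the output.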

Rename Columns with names from another data frame

I'm learning R programming and, as such, have hit a few problems, which with your help I have been able to fix.
I now need to rename the columns of a data frame. I have a translation data frame with 2 columns: one contains the current column names and the other what the new columns should be called.
Here is my code. My question is: how do I select the two columns from the trans data frame and use them here as the trans$old and trans$new variables?
I have 7 columns to rename, and the list might get even longer, hence the translation table.
replace_header <- function()
{
names(industries)[names(industries)==trans$old] <- trans$new
replaced <- industries
return (replaced)
}
replaced_industries <- replace_header()
Here's an example using the built-in mtcars data frame. We'll use the match function to find the indices of the columns names we want to replace and then replace them with new names.
# Copy of built-in data frame
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Data frame with column name substitutions
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
dat
old new
1 mpg new.name1
2 am new.name2
Use match to find the indices of the "old" names in the mt data frame:
match(dat[,"old"], names(mt))
[1] 1 9
Substitute "old" names with "new" names:
names(mt)[match(dat[,"old"], names(mt))] = dat[,"new"]
head(mt,3)
new.name1 cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I'd recommend setnames from "data.table" for this. Using @eipi10's example:
mt = mtcars
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
library(data.table)
setnames(mt, dat$old, dat$new)
names(mt)
# [1] "new.name1" "cyl" "disp" "hp" "drat" "wt"
# [7] "qsec" "vs" "new.name2" "gear" "carb"
If there's a concern, as indicated by @jmbadia, that the data.frame with the old and new names may list old names that are missing from mt, you can add skip_absent=TRUE to setnames.
Improving a bit on eipi10's answer: if we want to use a "rename data frame" whose old names are not always present in the mt data frame (e.g. because mt comes from different sources, so we don't always know its column names), we can consider the following code:
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# dataframe with possible names to replace
dat = data.frame(old=c("strangeName","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
# find which old names are present in mt (logical vector, one entry per row of dat)
namesMatched <- dat$old %in% names(mt)
# renaming
names(mt)[match(dat[namesMatched,"old"], names(mt))] = dat[namesMatched,"new"]
head(mt,3)
mpg cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
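Since the question wrapped the renaming in a function, the match-based approach can be sketched as a helper that takes both data frames as arguments. The name rename_from_table is hypothetical; the old and new column names follow the question's trans table.

```r
# Hypothetical helper: rename columns of df using a translation table with
# character columns `old` and `new`; old names missing from df are skipped
rename_from_table <- function(df, trans) {
  present <- trans$old %in% names(df)          # which old names exist in df
  idx <- match(trans$old[present], names(df))  # their positions in df
  names(df)[idx] <- trans$new[present]
  df
}

trans <- data.frame(old = c("strangeName", "am"),
                    new = c("new.name1", "new.name2"),
                    stringsAsFactors = FALSE)
renamed <- rename_from_table(mtcars, trans)
names(renamed)
```

Passing the data frame and the table as arguments, rather than reading globals as in the question's replace_header, lets the same function be reused for other data.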

how to align / display all columns in R using head() with the screen

In R, I ran head(dataset, x), where x is some number.
The columns wrap onto additional rows instead of staying aligned in a single row.
How can I display all the columns in a single row?
It sounds like you are looking for the display width option.
You can see what it's presently set at using getOption("width"), and you can set the options by including something like options(width = somebignumber), where you replace "somebignumber" with the width you would require to view the entire dataset without other columns wrapping below the data.
You can also explore the View function, which might be useful if you are more comfortable with a tabular spreadsheet-like layout.
Example:
> getOption("width")
[1] 80
> head(cbind(mtcars, mtcars), 2)
mpg cyl disp hp drat wt qsec vs am gear carb mpg cyl disp
Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 21 6 160
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4 21 6 160
hp drat wt qsec vs am gear carb
Mazda RX4 110 3.9 2.620 16.46 0 1 4 4
Mazda RX4 Wag 110 3.9 2.875 17.02 0 1 4 4
> options(width = 160)
> head(cbind(mtcars, mtcars), 2)
mpg cyl disp hp drat wt qsec vs am gear carb mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4 21 6 160 110 3.9 2.875 17.02 0 1 4 4
>
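If the wider setting is only needed for one printout, a small sketch like the following restores the previous width afterwards. It relies on options() returning the old values; width_before is just an illustrative name.

```r
width_before <- getOption("width")

# options() returns the previous settings, so they can be restored later
old <- options(width = 160)
print(head(cbind(mtcars, mtcars), 2))

# Put the original width back
options(old)
```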

Counting number of entries by ID in R?

So I'm trying to count the number of entries by ID in R, I'll use a modified version of mtcars to get my point across. Here's the data:
car type mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Datsun 710 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
I want to end up with a table that counts the number of entries for each ID, so that my results are:
Mazda RX4 2
Datsun 710 2
Should be a fairly simple and straightforward solution, but I'm new to R and can't quite figure it out. Should I use "Aggregate"?
You can use either table or count
as.data.frame(table(rownames(mtcars)))
Or
library(plyr)
count(rownames(mtcars))
If you need the count for one of the columns:
as.data.frame(table(yourdf$id))
Using dplyr on a data frame named df with an ID variable called id:
library(dplyr)
tally(group_by(df, id))
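As a self-contained sketch of the table approach, and of aggregate, which the question asked about, here is a made-up four-row data frame modelled on the question's data. The names df, car and mpg are assumptions for illustration.

```r
# Hypothetical data modelled on the question: one ID column with repeats
df <- data.frame(
  car = c("Mazda RX4", "Mazda RX4", "Datsun 710", "Datsun 710"),
  mpg = c(21.0, 21.0, 22.8, 21.4)
)

# table() counts occurrences of each ID; as.data.frame() gives columns car and Freq
counts_table <- as.data.frame(table(car = df$car))
counts_table

# aggregate() also works: count the rows of any column within each ID
# (the count column keeps the name of the counted column, here mpg)
counts_agg <- aggregate(mpg ~ car, data = df, FUN = length)
counts_agg
```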
