Common data between multiple CSV files in R

I have 10 CSV files, each with a single column of name entries. For instance, file 1 has 400 names, file 2 has 386 names, file 3 has 700 names, and so on.
I want to find the entries common to all 10 CSV files and write them to a new CSV file.
It would be great if someone could post a solution, preferably in R.

You can do it this way. Note that this keeps the columns whose names are common to all files and stacks their rows:
your_files <- c(path1,path2,...)
your_tables <- lapply(your_files,read.csv)
your_common_colnames <- Reduce(intersect,lapply(your_tables,colnames))
your_new_tables <- lapply(your_tables,`[`,your_common_colnames)
your_output <- do.call(rbind,your_new_tables)
Example:
mtcars1 <- mtcars[1:3,1:5]
# mpg cyl disp hp drat
# Mazda RX4 21.0 6 160 110 3.90
# Mazda RX4 Wag 21.0 6 160 110 3.90
# Datsun 710 22.8 4 108 93 3.85
mtcars2 <- mtcars[1:3,3:10]
# disp hp drat wt qsec vs am gear
# Mazda RX4 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 160 110 3.90 2.875 17.02 0 1 4
# Datsun 710 108 93 3.85 2.320 18.61 1 1 4
your_tables <- list(mtcars1,mtcars2)
your_common_colnames <- Reduce(intersect,lapply(your_tables,colnames))
your_new_tables <- lapply(your_tables,`[`,your_common_colnames)
your_output <- do.call(rbind,your_new_tables)
# disp hp drat
# Mazda RX4 160 110 3.90
# Mazda RX4 Wag 160 110 3.90
# Datsun 710 108 93 3.85
# Mazda RX41 160 110 3.90
# Mazda RX4 Wag1 160 110 3.90
# Datsun 7101 108 93 3.85
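The question asks for entries (values) shared by every file rather than shared column names, and the same Reduce(intersect, ...) idea can be applied to the values. The following is a minimal sketch under stated assumptions: the three small temporary files, their contents, and the variable names are hypothetical stand-ins for the 10 real files, and each file is assumed to have a header row and a single column.

```r
# Hypothetical stand-ins for the real files: three one-column CSVs in a temp directory
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(name = c("ann", "bob", "cat", "dan")),
          file.path(demo_dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(name = c("bob", "cat", "eve")),
          file.path(demo_dir, "f2.csv"), row.names = FALSE)
write.csv(data.frame(name = c("cat", "bob", "zed")),
          file.path(demo_dir, "f3.csv"), row.names = FALSE)

your_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read the first (only) column of each file as a character vector
name_lists <- lapply(your_files, function(f) {
  as.character(read.csv(f, stringsAsFactors = FALSE)[[1]])
})

# Keep only the entries that appear in every file
common <- Reduce(intersect, name_lists)

# Write the common entries to a new CSV file (path is an assumption)
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(name = common), out_file, row.names = FALSE)
print(common)
```

With the real data, your_files would be the vector of the 10 file paths and out_file the desired output path.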

Related

Passing outer function params to inner function

Following on from my previous exercise (promise already under evaluation with nesting from function), I have learnt how to use enquos, !!! and c() within a function for a variety of calling methods. My next challenge is more complex: I want to call a function within a function, passing it only parameters from the outer function. Essentially, I want to make a list of functions and pass a different parameter to each element of the list by using another function.
For example:
anotherTest <- function(data,...){
  cols = enquos(...)
  testFunc <- function(df, more){
    df %>% mutate(!!!c(more))
  }
  n <- length(cols)
  addMutation <- replicate(n, testFunc, simplify=FALSE)
  print(addMutation)
  addCars <- replicate(n, data)
  mapply(function(x, y, z) x %>% reduce(., y, z),addCars, addMutation, cols)
}
When I call:
anotherTest(mtcars, vs, gear, am)
I get this error:
Error in fn(out, elt, ...) : unused argument (~vs)
We could try
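library(dplyr)
anotherTest <- function(data,...){
  cols = enquos(...)
  testFunc <- function(df, more){
    df %>% mutate(!!!c(more))
  }
  n <- length(cols)
  addMutation <- replicate(n, testFunc, simplify=FALSE)
  # simplify = FALSE keeps each copy of data as a whole data frame in a list
  addCars <- replicate(n, data, simplify = FALSE)
  # call each function y on its data frame x with its quosure z
  Map(function(x, y, z) y(x, z), addCars, addMutation, cols)
}
Testing: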
out <- anotherTest(mtcars, vs, gear, am)
> lapply(out, head, 3)
[[1]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
[[2]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
[[3]]
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1

How to sort only the row names of a data frame in alphabetical order?

I don't know how to get only the row names of a data frame, sorted in alphabetical order.
For example, given:
mtcars
mpg cyl disp hp drat wt ...
Mazda RX4 21.0 6 160 110 3.90 2.62 ...
Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 ...
Datsun 710 22.8 4 108 93 3.85 2.32 ...
My end goal is for it to look like this
Datsun 710
Mazda RX4
Mazda RX4 Wag
...
and not like
Datsun 710 22.8 4 108 93 3.85 2.32 ...
Mazda RX4 21.0 6 160 110 3.90 2.62 ...
Mazda RX4 Wag 21.0 6 160 110 3.90 2.88 ...
Thanks
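One possible way to get that result, shown as a minimal sketch: extract the row names with rownames() and sort that character vector on its own. The variable names df and sorted_names are only illustrative, and the example uses the first three rows of mtcars as in the question.

```r
# First three rows of the built-in mtcars data, as in the question
df <- mtcars[1:3, ]

# rownames() returns a character vector; sort() orders it alphabetically
sorted_names <- sort(rownames(df))

# Print one name per line, without any of the data columns
writeLines(sorted_names)
```

If the whole data frame should be reordered instead, df[order(rownames(df)), ] keeps the columns and sorts the rows by name.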

Rename Columns with names from another data frame

I'm learning R programming and have hit a few problems along the way, which with your help I have been able to fix.
I now need to rename the columns of a data frame. I have a translation data frame with 2 columns: one contains the current column names and the other what the columns should be renamed to.
Here is my code. My question is: how do I select the two columns from the trans data frame and use them here as the trans$old and trans$new variables?
I have 7 columns to rename, and the list might get longer, hence the translation table.
replace_header <- function()
{
  names(industries)[names(industries)==trans$old] <- trans$new
  replaced <- industries
  return (replaced)
}
replaced_industries <- replace_header()
Here's an example using the built-in mtcars data frame. We'll use the match function to find the indices of the column names we want to replace and then replace them with the new names.
# Copy of built-in data frame
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Data frame with column name substitutions
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
dat
old new
1 mpg new.name1
2 am new.name2
Use match to find the indices of the "old" names in the mt data frame:
match(dat[,"old"], names(mt))
[1] 1 9
Substitute "old" names with "new" names:
names(mt)[match(dat[,"old"], names(mt))] = dat[,"new"]
head(mt,3)
new.name1 cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
I'd recommend setnames from "data.table" for this. Using @eipi10's example:
mt = mtcars
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
library(data.table)
setnames(mt, dat$old, dat$new)
names(mt)
# [1] "new.name1" "cyl" "disp" "hp" "drat" "wt"
# [7] "qsec" "vs" "new.name2" "gear" "carb"
If there's a concern, as indicated by @jmbadia, that the data.frame with the old and new names contains old names that are not present in the data, you can add skip_absent=TRUE to setnames.
Improving a bit on eipi10's answer: if we want to use a "rename dataframe" whose old names are not always present in the mt dataframe (e.g. because mt is provided by different sources, so we don't always know its colnames), we can consider the following code:
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# dataframe with possible names to replace
dat = data.frame(old=c("strangeName","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
# keep only the rows of dat whose old names are present in mt
namesMatched <- dat[dat$old %in% names(mt), ]
# renaming
names(mt)[match(namesMatched[,"old"], names(mt))] = namesMatched[,"new"]
head(mt,3)
mpg cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
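The match-based steps above can be wrapped in a small function that takes the data frame and the translation table as arguments, which is closer to the replace_header function in the question. This is a sketch, not a definitive implementation: the name rename_from_table is hypothetical, and the translation table is assumed to have character columns called old and new.

```r
# Hypothetical helper: rename the columns of df using a translation table
# with columns `old` and `new`; old names missing from df are skipped.
rename_from_table <- function(df, trans) {
  present <- trans[trans$old %in% names(df), , drop = FALSE]
  names(df)[match(present$old, names(df))] <- present$new
  df
}

trans <- data.frame(old = c("strangeName", "am"),
                    new = c("new.name1", "new.name2"),
                    stringsAsFactors = FALSE)

renamed <- rename_from_table(mtcars, trans)
names(renamed)
```

Passing both data frames as arguments avoids relying on global variables, so the same function can be reused for any data frame and translation table.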

RMySQL error - duplicate row names

I get the warning below when I run the following code to read a table from my MySQL server.
my_data <- dbReadTable(mydb, "ar_data")
Warning message:
row.names not set (duplicate elements in field)
Is there any way to tell R not to read the row names? My table is fine and I don't want to make any changes to my MySQL table.
Here are a few options:
library(RMySQL)
library(DBI)
drv <- dbDriver("MySQL")
con <- dbConnect(drv, dbname="mydb", user="username")
# build a test table whose row_names column contains duplicate values
data <- mtcars; rownames(data) <- NULL; data$row_names <- rownames(mtcars)[1]
dbWriteTable(con, "mtcars", data, overwrite = TRUE, row.names = FALSE)
head( dbReadTable(con, "mtcars"), 3 )
# mpg cyl disp hp drat wt qsec vs am gear carb row_names
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4
# 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Mazda RX4
# Warning message:
# row.names not set (duplicate elements in field)
# suppress warnings
head( suppressWarnings(dbReadTable(con, "mtcars")), 3 )
# mpg cyl disp hp drat wt qsec vs am gear carb row_names
# 1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
# 2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4
# 3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Mazda RX4
# rename column row_names to rn
dbSendQuery(con, "ALTER TABLE mtcars CHANGE COLUMN row_names rn TEXT")
head( dbReadTable(con, "mtcars"), 3 )
# rn mpg cyl disp hp drat wt qsec vs am gear carb
# 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# 2 Mazda RX4 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# 3 Mazda RX4 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
dbSendQuery(con, 'DROP TABLE mtcars')
dbDisconnect(con)

sum with unite function tidyr

I was reading through the tidyr documentation. I'm trying to make use of the unite function. Is it possible to use the unite function to sum specified columns? Using the example from the documentation.
mtcars %>%
unite(vs_am, vs, am)
mpg cyl disp hp drat wt qsec vs_am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0_1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0_1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1_1 4 1
I'm trying to figure out how to make vs_am the sum of the column values rather than the values combined as characters. E.g. for Mazda RX4, vs_am = 1 (because 0 + 1 = 1).
@Tyler is absolutely correct: unite is not the appropriate function for this task.
Here is the code I was looking for:
mtcars %>%
  mutate(vs_am = vs + am)
mpg cyl disp hp drat wt qsec vs am gear carb vs_am
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 1
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 1
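For comparison, the same column sum can be written without dplyr. The sketch below uses base R's transform(); the variable name out is only illustrative.

```r
# Base R equivalent: add a vs_am column holding the row-wise sum of vs and am
out <- transform(mtcars, vs_am = vs + am)
head(out[, c("vs", "am", "vs_am")], 3)
```

Unlike unite, which pastes values together as character strings, both mutate and transform keep vs_am numeric.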
