I need to merge all the CSV files in a folder into one CSV file. However, I need an empty row between each file's contents in the merged CSV file. This is to help differentiate the files and to get the correct formatting for later. Below I have attached the working code that reads the files using lapply, and I would appreciate any ideas on how I can modify this code to add an empty line before each merge. Thanks.
filenames <- list.files(full.names=TRUE)
Combined <- lapply(filenames,function(x){
read.csv(x, header=FALSE)})
You just add a row of NA values at the end of each dataframe before you rbind the dataframes together.
For example:
All <- lapply(filenames,function(i){
dat = read.csv(i, header=FALSE)
dat[nrow(dat)+1,] = NA
return(dat)
})
Try adding a blank (NA) row to each frame before writing:
list_of_frames <- list(head(mtcars, 3), head(mtcars, 2))
lapply(list_of_frames, function(x) { x[nrow(x)+1,] <- NA; x})
# [[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 4 NA NA NA NA NA NA NA NA NA NA NA
# [[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
# 3 NA NA NA NA NA NA NA NA NA NA NA
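Putting the two suggestions together, a minimal end-to-end sketch might look like the following. It assumes the files have no header row (as in the question) and all have the same number of columns; the temporary folder, the two sample files, and the output name merged_demo.csv are made up for illustration. Passing na = "" when writing makes the NA row come out as empty fields rather than the text NA.

```r
# Hypothetical demo data: two small header-less CSV files in a temp folder
in_dir <- tempfile("csv_merge_")
dir.create(in_dir)
writeLines(c("1,x", "2,y"), file.path(in_dir, "one.csv"))
writeLines(c("3,z", "4,w"), file.path(in_dir, "two.csv"))

filenames <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and append one all-NA row as a separator
padded <- lapply(filenames, function(f) {
  dat <- read.csv(f, header = FALSE)
  dat[nrow(dat) + 1, ] <- NA
  dat
})

# Stack the padded data frames (requires matching column counts)
merged <- do.call(rbind, padded)

# na = "" writes the separator rows as empty fields instead of "NA"
out_file <- file.path(tempdir(), "merged_demo.csv")
write.table(merged, out_file, sep = ",", na = "", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

merged_lines <- readLines(out_file)
```

In the written file each separator row appears as a line of bare commas (one fewer than the number of columns), which spreadsheet programs display as an empty row.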
I am removing rows with certain values with
dat14a <- dat14[dat14$MYVARIABLE < 80, ]
dat14 looks like this
However, after running the above code, other columns of dat14a are affected as well, and it then shows the following
How can I avoid that?
Thanks!!
That is most likely because you have NAs in the MYVARIABLE column.
Using a reproducible example from mtcars.
df <- mtcars[1:5, 1:5]
df$mpg[1:2] <- NA
df
# mpg cyl disp hp drat
#Mazda RX4 NA 6 160 110 3.90
#Mazda RX4 Wag NA 6 160 110 3.90
#Datsun 710 22.8 4 108 93 3.85
#Hornet 4 Drive 21.4 6 258 110 3.08
#Hornet Sportabout 18.7 8 360 175 3.15
df[df$mpg > 22, ]
# mpg cyl disp hp drat
#NA NA NA NA NA NA
#NA.1 NA NA NA NA NA
#Datsun 710 22.8 4 108 93 3.85
To fix the issue use an additional !is.na(..)
df[df$mpg > 22 & !is.na(df$mpg), ]
# mpg cyl disp hp drat
#Datsun 710 22.8 4 108 93 3.85
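Two base R alternatives avoid the extra !is.na() call. This is only a sketch on the same toy data: which() returns just the positions where the condition is TRUE, so NA entries are left out of the index, and subset() treats NA in the condition as FALSE.

```r
df <- mtcars[1:5, 1:5]
df$mpg[1:2] <- NA

# which() keeps only positions where the condition is TRUE, so NA is ignored
with_which <- df[which(df$mpg > 22), ]

# subset() treats NA in the condition as FALSE
with_subset <- subset(df, mpg > 22)

rownames(with_which)
rownames(with_subset)
```

Both calls return only the Datsun 710 row, with no all-NA rows mixed in.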
I'm still learning some very basic concepts in R. I have an excel file that I import into R, but it has atrocious variable names. I have another file with 2 columns: the first is the original column names in my data file, the second is what I want the variable names to be.
What's the most efficient way to update all column names using this auxiliary file that I have?
Since the names (or colnames) of a data frame are just a character vector, you can simply re-assign the original names with the new ones.
str(names_df)
# EXPECTED TWO COLUMNS OF CHR TYPE
# RE-ORDER COLUMNS BY PASSING CHARACTER VECTOR
excel_df <- excel_df[names_df$original_names]
# RE-ASSIGN NAMES: TWO METHODS
names(excel_df) <- names_df$new_names
excel_df <- setNames(excel_df, names_df$new_names)
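One caveat, shown here as a sketch with made-up data (excel_df and names_df below are stand-ins, not the asker's files): indexing by names_df$original_names fails with an "undefined columns selected" error if any listed name is missing from the data, and it drops columns that the lookup does not mention. A quick check beforehand makes both cases visible.

```r
# Hypothetical stand-ins for the imported Excel data and the lookup file
excel_df <- data.frame(`Col 1` = 1:2, `X.2` = c("a", "b"),
                       extra = c(TRUE, FALSE), check.names = FALSE)
names_df <- data.frame(original_names = c("X.2", "Col 1"),
                       new_names = c("label", "id"),
                       stringsAsFactors = FALSE)

# Names in the lookup that are absent from the data (would cause an error)
missing_cols <- setdiff(names_df$original_names, names(excel_df))
# Columns in the data that the lookup does not mention (would be dropped)
unlisted_cols <- setdiff(names(excel_df), names_df$original_names)

stopifnot(length(missing_cols) == 0)

# Re-order by the lookup, then assign the new names in the same order
renamed <- setNames(excel_df[names_df$original_names], names_df$new_names)
names(renamed)
```

Here unlisted_cols contains "extra", which is the column that disappears after the re-ordering step.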
This method is a little overkill if all columns are perfectly accounted for and in the correct order. If there are columns out of order or new columns, however, this method is robust by only changing those you intend to change and are found.
mt <- mtcars
head(mt, 3)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
namechange <- data.frame(oldname = c("mpg", "cyl", "hp"), newname = c("MPG", "CYL", "HP"))
namechange
# oldname newname
# 1 mpg MPG
# 2 cyl CYL
# 3 hp HP
ind <- match(names(mt), namechange$oldname)
ind
# [1] 1 2 NA 3 NA NA NA NA NA NA NA
ifelse(is.na(ind), names(mt), namechange$newname[ind])
# [1] "MPG" "CYL" "disp" "HP" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
names(mt) <- ifelse(is.na(ind), names(mt), namechange$newname[ind])
head(mt, 3)
# MPG CYL disp HP drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
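The same idea can also be written with a named character vector as the lookup, which some may find easier to read. A sketch, where lookup and hits are names introduced here for illustration:

```r
mt <- mtcars
namechange <- data.frame(oldname = c("mpg", "cyl", "hp"),
                         newname = c("MPG", "CYL", "HP"),
                         stringsAsFactors = FALSE)

# Named vector: names are the old column names, values are the new ones
lookup <- setNames(namechange$newname, namechange$oldname)

# Only touch columns that appear in the lookup; the rest keep their names
hits <- names(mt) %in% names(lookup)
names(mt)[hits] <- unname(lookup[names(mt)[hits]])

names(mt)
```

Columns that are not in the lookup are never indexed, so no ifelse() is needed to keep their original names.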
You can use the rename function from dplyr:
library(dplyr)
#Example with the built-in mtcars data
View(mtcars)
new_mtcars<-mtcars%>%
rename('new_MPG'='mpg', 'new_cyl'='cyl')#Change the columns mpg and cyl
View(new_mtcars)
I want to calculate the mean of the resulting values returned by abs(((column A - column B)/column A)*100)
So, for example, on the mtcars data I tried:
> mtcars
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
...
mean((abs(mtcars['cyl']-mtcars['mpg'])/mtcars['mpg'])*100)
which gives me this warning:
Warning message: In mean.default((abs(mtcars["cyl"] -
mtcars["mpg"])/mtcars["mpg"]) * : argument is not numeric or
logical: returning NA
How can I fix this?
You need to use the $ operator to extract the values as vectors, or use double brackets, i.e.
mean((abs(mtcars[['cyl']]-mtcars[['mpg']])/mtcars[['mpg']])*100)
#[1] 64.13455
#or
mean((abs(mtcars$cyl-mtcars$mpg)/mtcars$mpg)*100)
#[1] 64.13455
You can see the difference in structure,
str(mtcars['cyl'])
'data.frame': 32 obs. of 1 variable:
$ cyl: num 6 6 4 6 8 6 8 4 4 6 ...
str(mtcars[['cyl']])
num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
str(mtcars$cyl)
num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
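The same distinction can be checked directly, and with() offers another way to refer to the columns as plain vectors. A small sketch (pct_diff is a name introduced here):

```r
# Single brackets keep the data frame wrapper; double brackets do not
single_is_df <- is.data.frame(mtcars["cyl"])
double_is_num <- is.numeric(mtcars[["cyl"]])

# with() evaluates the expression using the columns as vectors
pct_diff <- with(mtcars, mean(abs(cyl - mpg) / mpg * 100))
pct_diff
```

mean() needs a numeric or logical vector, which is why the single-bracket version, a one-column data frame, triggers the warning and returns NA.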
A tidyverse alternative:
library(dplyr)
mtcars %>%
summarise(Mean = mean(abs(cyl - mpg)/mpg) * 100)
Mean
1 64.13455
This is easy to do with base R, with setnames in data.table, or with rename_ in dplyr 0.5. However, since rename_ is deprecated, I couldn't find an easy way to do this in dplyr 0.6.0.
Below is an example. I want to replace the column names in col.from with the corresponding values in col.to:
col.from <- c("wt", "hp", "vs")
col.to <- c("foo", "bar", "baz")
df <- mtcars
head(df, 2)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Expected output:
names(df)[match(col.from, names(df))] <- col.to
head(df, 2)
#> mpg cyl disp bar drat foo qsec baz am gear carb
#> Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
#> Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
How can I do this with rename or rename_at in dplyr 0.6.0?
I don't know if this is the right way to approach it, but
library(dplyr)
df %>% rename_at(vars(col.from), function(x) col.to) %>% head(2)
# mpg cyl disp bar drat foo qsec baz am gear carb
# Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
Also note that I live in the future:
# packageVersion("dplyr")
# # [1] ‘0.7.0’
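In later dplyr releases (1.0.0 and up) rename_at() is superseded. A sketch of the same rename, assuming such a version is installed, passes a named vector (new name = old name) through all_of(); lookup is a name introduced here, and the guard only exists so the snippet does nothing when a suitable dplyr is not available.

```r
col.from <- c("wt", "hp", "vs")
col.to <- c("foo", "bar", "baz")

# rename() expects new_name = old_name, so the new names go in names()
lookup <- setNames(col.from, col.to)

# Guarded so the snippet is a no-op without dplyr >= 1.0.0
has_dplyr <- requireNamespace("dplyr", quietly = TRUE) &&
  utils::packageVersion("dplyr") >= "1.0.0"

if (has_dplyr) {
  renamed <- dplyr::rename(mtcars, dplyr::all_of(lookup))
  print(names(renamed))
}
```

Because the pairing is carried by the named vector itself, the result does not depend on the order in which the columns happen to be selected.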
I'm learning R programming and, as such, have hit a few problems, which with your help I have been able to fix.
But I now have a need to rename columns of a data frame. I have a translation data frame with 2 columns that contains the column names and what the new columns should be called.
Here is my code: my question is how do I select the two columns from the trans dataframe and use them here as trans$old and trans$new variables?
I have 7 columns I'm renaming, and this might be even longer hence the translation table.
replace_header <- function()
{
names(industries)[names(industries)==trans$old] <- trans$new
replaced <- industries
return (replaced)
}
replaced_industries <- replace_header()
Here's an example using the built-in mtcars data frame. We'll use the match function to find the indices of the column names we want to replace and then replace them with new names.
# Copy of built-in data frame
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Data frame with column name substitutions
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
dat
old new
1 mpg new.name1
2 am new.name2
Use match to find the indices of the "old" names in the mt data frame:
match(dat[,"old"], names(mt))
[1] 1 9
Substitute "old" names with "new" names:
names(mt)[match(dat[,"old"], names(mt))] = dat[,"new"]
head(mt,3)
new.name1 cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
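To tie this back to the question's replace_header(), here is a sketch that takes the data frame and the translation table as arguments instead of relying on global variables. The industries and trans objects below are invented for illustration; names in trans$old that are not found in the data are skipped.

```r
replace_header <- function(df, trans) {
  # Positions of the old names within names(df); NA where not found
  idx <- match(trans$old, names(df))
  found <- !is.na(idx)
  # Rename only the columns that were actually found
  names(df)[idx[found]] <- trans$new[found]
  df
}

# Hypothetical data and translation table
industries <- data.frame(ind_cd = 1:2, nm = c("Mining", "Retail"),
                         stringsAsFactors = FALSE)
trans <- data.frame(old = c("ind_cd", "nm", "not_there"),
                    new = c("industry_code", "industry_name", "unused"),
                    stringsAsFactors = FALSE)

replaced_industries <- replace_header(industries, trans)
names(replaced_industries)
```

The original industries data frame is left unchanged; the function returns a renamed copy.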
I'd recommend setnames from "data.table" for this. Using @eipi10's example:
mt = mtcars
dat = data.frame(old=c("mpg","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
library(data.table)
setnames(mt, dat$old, dat$new)
names(mt)
# [1] "new.name1" "cyl" "disp" "hp" "drat" "wt"
# [7] "qsec" "vs" "new.name2" "gear" "carb"
If there's a concern, as indicated by @jmbadia, that the data.frame with the old and new names might contain names that are not present in the data, you can add skip_absent=TRUE to setnames.
Improving a bit on eipi10's answer: if we want to use a "rename dataframe" whose old names are not always present in the mt dataframe (e.g. because mt is provided by different sources, so we don't always know its colnames), we can consider the following code
mt = mtcars
head(mt,3)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# dataframe with possible names to replace
dat = data.frame(old=c("strangeName","am"), new=c("new.name1","new.name2"), stringsAsFactors=FALSE)
# find which old names are present in mt (logical vector, one entry per row of dat)
namesMatched <- dat$old %in% names(mt)
# rename only the columns that were found
names(mt)[match(dat[namesMatched,"old"], names(mt))] = dat[namesMatched,"new"]
head(mt,3)
mpg cyl disp hp drat wt qsec vs new.name2 gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1