Deleting rows in R creates NAs in other rows [duplicate]

I am removing rows with certain values with
dat14a <- dat14[dat14$MYVARIABLE < 80, ]
dat14 looks like this
However, after running the code above, other columns of dat14a are affected as well and show the following
How can I avoid that?
Thanks!!

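That is most likely because you have NAs in the MYVARIABLE column. Comparing NA with a number yields NA rather than TRUE or FALSE, and indexing a data frame with an NA logical returns a row filled with NA.
Here is a reproducible example using mtcars.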
df <- mtcars[1:5, 1:5]
df$mpg[1:2] <- NA
df
# mpg cyl disp hp drat
#Mazda RX4 NA 6 160 110 3.90
#Mazda RX4 Wag NA 6 160 110 3.90
#Datsun 710 22.8 4 108 93 3.85
#Hornet 4 Drive 21.4 6 258 110 3.08
#Hornet Sportabout 18.7 8 360 175 3.15
df[df$mpg > 22, ]
# mpg cyl disp hp drat
#NA NA NA NA NA NA
#NA.1 NA NA NA NA NA
#Datsun 710 22.8 4 108 93 3.85
To fix the issue, add a !is.na(..) check to the condition
df[df$mpg > 22 & !is.na(df$mpg), ]
# mpg cyl disp hp drat
#Datsun 710 22.8 4 108 93 3.85
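Two other base R idioms avoid the NA rows without an explicit is.na() check. The sketch below rebuilds the same toy df; the names by_which and by_subset are only illustrative. which() returns just the positions where the condition is TRUE, and subset() treats NA in the condition as FALSE.

```r
# Same toy data as above: the first two mpg values are set to NA
df <- mtcars[1:5, 1:5]
df$mpg[1:2] <- NA

# which() keeps only the positions where the comparison is TRUE,
# so rows whose comparison is NA are dropped
by_which <- df[which(df$mpg > 22), ]

# subset() treats NA in the condition as FALSE
by_subset <- subset(df, mpg > 22)

by_which
by_subset
```

Both calls should leave only the Datsun 710 row, the same result as the !is.na() version.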

Related

How can I convert a column into `NA`s with R [duplicate]

I need to convert multiple columns (all of the values in each column) in a data frame to have NA as their value, is this possible in R?
You may assign NA to multiple columns. Example:
mtcars[c("mpg", "cyl", "disp")] <- NA
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 NA NA NA 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag NA NA NA 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 NA NA NA 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive NA NA NA 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout NA NA NA 175 3.15 3.440 17.02 0 0 3 2
# Valiant NA NA NA 105 2.76 3.460 20.22 1 0 3 1
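One detail worth knowing: a plain NA is a logical constant, so columns overwritten this way become logical. If the columns should stay numeric, assigning NA_real_ is one option. A small sketch on a copy of the data (df_logical, df_numeric and cols are illustrative names):

```r
df <- head(mtcars)
cols <- c("mpg", "cyl", "disp")

# Plain NA is logical, so the replaced columns become logical
df_logical <- df
df_logical[cols] <- NA
sapply(df_logical[cols], class)

# NA_real_ is the numeric missing value, so the columns stay numeric
df_numeric <- df
df_numeric[cols] <- NA_real_
sapply(df_numeric[cols], class)
```

The other typed constants (NA_integer_, NA_character_) work the same way for integer and character columns.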

How to select columns of a data frame based on the columns of another data frame

I have a data frame, df_1_2017, with 38 columns. I have another data frame, df_2_2018, with 43 columns. I want both to have the same columns/header names so I can easily cbind the two data frames.
I tried the code below without any luck
col_names_2017 <- colnames(df_1_2017)
selected_cols_df_2_2018 <- df_2_2018 %>%
select(col_names_2017)
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `Canopy_cover_mean` doesn't exist.
How can I write a select that keeps only the columns of df_2_2018 whose names also appear in df_1_2017, i.e. all the columns the two data frames share?
You can use
common_colsnms <- intersect(colnames(df_1_2017), colnames(df_2_2018))
# apply; all_of() tells select() that common_colsnms is an external character vector
selected_cols_df_2_2018 <- df_2_2018 %>%
select(all_of(common_colsnms))
Please see https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html for future reference. Let me know if this works.
col_names_2017 <- colnames(df_1_2017)
# any_of() skips names that are missing from df_2_2018; all_of() would raise the same error
selected_cols_df_2_2018 <- df_2_2018 %>%
select(any_of(col_names_2017))
If you really mean cbind, then it has nothing to do with the numbers or names of columns (well, duplicate names are discouraged but possible). In this case, you should be looking at the number of rows in each, and whether they align row-wise; normally either they match perfectly (same number of rows, each row means the same thing) or they have shared ID fields that require a join/merge operation.
However, in case you mean rbind instead, where you feel you need the columns to match, by-name, then you have a couple of options.
base R
mt2 <- mtcars[1:3,]
mt3 <- mtcars[4:6,]
# paste0() appends the suffix without a separating space
names(mt2)[3:5] <- paste0(names(mt2)[3:5], "_2")
names(mt3)[6:8] <- paste0(names(mt3)[6:8], "_3")
mt2
# mpg cyl disp_2 hp_2 drat_2 wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
mt3
# mpg cyl disp hp drat wt_3 qsec_3 vs_3 am gear carb
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
common <- intersect(names(mt2), names(mt3))
mt2[,common]
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
mt3[,common]
# mpg cyl am gear carb
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
rbind(mt2[,common], mt3[,common])
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
dplyr, limiting names
library(dplyr)
rbind(select(mt2, any_of(names(mt3))), select(mt3, any_of(names(mt2))))
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
Or using the more-flexible bind_rows:
select(mt2, any_of(names(mt3))) %>%
bind_rows(select(mt3, any_of(names(mt2))))
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
dplyr, accept all columns
If you are less concerned about extra columns, then you can use bind_rows and its innate ability to align columns by name and fill a column with NA in the rows of any frame that lacks it.
bind_rows(mt2, mt3)
# mpg cyl disp_2 hp_2 drat_2 wt qsec vs am gear carb disp hp drat wt_3 qsec_3 vs_3
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NA NA NA NA NA NA
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NA NA NA NA NA NA
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 NA NA NA NA NA NA
# Hornet 4 Drive 21.4 6 NA NA NA NA NA NA 0 3 1 258 110 3.08 3.215 19.44 1
# Hornet Sportabout 18.7 8 NA NA NA NA NA NA 0 3 2 360 175 3.15 3.440 17.02 0
# Valiant 18.1 6 NA NA NA NA NA NA 0 3 1 225 105 2.76 3.460 20.22 1
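For comparison, a similar NA-filling row bind can be sketched in base R. This is only an illustration, not part of the answer above; fill_missing, all_cols and combined are made-up names, and the column order differs slightly from the bind_rows output.

```r
# Rebuild the two example frames with partly different column names
mt2 <- mtcars[1:3, ]
mt3 <- mtcars[4:6, ]
names(mt2)[3:5] <- paste0(names(mt2)[3:5], "_2")
names(mt3)[6:8] <- paste0(names(mt3)[6:8], "_3")

# Every column name that occurs in either frame
all_cols <- union(names(mt2), names(mt3))

# Hypothetical helper: add each missing column as NA, then order the columns
fill_missing <- function(d, cols) {
  for (nm in setdiff(cols, names(d))) d[[nm]] <- NA
  d[cols]
}

combined <- rbind(fill_missing(mt2, all_cols), fill_missing(mt3, all_cols))
dim(combined)
```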

Adding empty line between each file when using lapply

I need to merge all the CSV files in a folder into one CSV file. However, I need an empty row between each file's contents in the merged CSV file. This is to help differentiate the files and to get the correct formatting for later. Below is the working code that reads the files using lapply; I would appreciate any ideas on how to modify it to add an empty line before each merge. Thanks.
filenames <- list.files(full.names=TRUE)
Combined <- lapply(filenames,function(x){
read.csv(x, header=FALSE)})
You just add a row of NA values at the end of each dataframe before you rbind the dataframes together.
For example:
All <- lapply(filenames,function(i){
dat = read.csv(i, header=FALSE)
dat[nrow(dat)+1,] = NA
return(dat)
})
Try adding a blank (NA) row to each frame before writing:
list_of_frames <- list(head(mtcars, 3), head(mtcars, 2))
lapply(list_of_frames, function(x) { x[nrow(x)+1,] <- NA; x})
# [[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# 4 NA NA NA NA NA NA NA NA NA NA NA
# [[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
# 3 NA NA NA NA NA NA NA NA NA NA NA
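Putting the pieces together, the whole round trip might look like the sketch below. The temporary directory, the two demo files a.csv and b.csv, and the output path are assumptions made so the example is self-contained; in practice filenames would come from the real folder. Writing with na = "" turns each NA row into an empty line of separators.

```r
# Hypothetical demo input: two small header-less CSV files in a fresh temp directory
tmp <- tempfile("csv_demo_")
dir.create(tmp)
write.table(head(mtcars, 3), file.path(tmp, "a.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)
write.table(head(mtcars, 2), file.path(tmp, "b.csv"), sep = ",",
            row.names = FALSE, col.names = FALSE)

filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file and append one all-NA separator row
frames <- lapply(filenames, function(f) {
  dat <- read.csv(f, header = FALSE)
  dat[nrow(dat) + 1, ] <- NA
  dat
})

# Stack everything and write NA as empty fields
combined <- do.call(rbind, frames)
out <- tempfile(fileext = ".csv")
write.table(combined, out, sep = ",", na = "",
            row.names = FALSE, col.names = FALSE)
```

The output file is written outside the input folder so that a second run does not pick it up as another input.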

transform variables and add to dataframe

I would like to log transform several variables in a dataframe and to then add the transformed variables to the dataframe as new variables named using 'logoldname'. What are the best ways of doing these in R efficiently? Thank you!
data("mtcars")
head(mtcars)
# Log transform - manually
mtcars$logdisp <- log(mtcars$disp)
mtcars$loghp <- log(mtcars$hp)
mtcars$logwt <- log(mtcars$wt)
mtcars$logqsec <- log(mtcars$qsec)
Not sure why the downvotes; I think the question is perfectly fine, and a comment with an explanation how OP could've improved his question would have helped.
That aside, here is a tidyverse solution:
# These are the columns with entries you'd like to log-transform
ss <- c("disp", "hp", "wt", "qsec")
mtcars %>%
mutate_at(vars(one_of(ss)), funs(log = log(.))) %>%
rename_at(vars(contains("_log")), funs(paste0("log_", gsub("_log", "", .)))) %>%
select(contains("log_"))
# log_disp log_hp log_wt log_qsec
#1 5.075174 4.700480 0.9631743 2.800933
#2 5.075174 4.700480 1.0560527 2.834389
#3 4.682131 4.532599 0.8415672 2.923699
#4 5.552960 4.700480 1.1678274 2.967333
#5 5.886104 5.164786 1.2354715 2.834389
#6 5.416100 4.653960 1.2412686 3.006672
Explanation: mutate_at selects columns that match ss and applies a log transformation. This generates new columns, named e.g. "disp_log", "hp_log" and so on. We then rename those columns into log_disp, log_hp, etc., and select only the log-transformed columns in the final step.
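In current dplyr releases funs() is deprecated and the scoped verbs such as mutate_at are superseded by across(). A rough equivalent, assuming dplyr 1.0.0 or later, might look like this (result is an illustrative name):

```r
library(dplyr)

ss <- c("disp", "hp", "wt", "qsec")

# across() applies log to the selected columns; .names builds the new
# column names directly, so no separate rename step is needed
result <- mtcars %>%
  mutate(across(all_of(ss), log, .names = "log_{.col}"))

head(select(result, starts_with("log_")))
```

Because .names controls the naming, the original columns stay in place and the log_ columns are appended after them.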
This solution uses base R only, and I believe it is simpler than the tidyverse solution. I reuse the vector ss from that solution, by @Maurits Evers.
data("mtcars")
ss <- c("disp", "hp", "wt", "qsec")
logs <- sapply(mtcars[ss], log)
colnames(logs) <- paste("log", ss, sep = "_")
result <- cbind(mtcars, logs)
head(result)
# mpg cyl disp hp drat wt qsec vs am gear carb log_disp
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 5.075174
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 5.075174
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 4.682131
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 5.552960
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 5.886104
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 5.416100
# log_hp log_wt log_qsec
#Mazda RX4 4.700480 0.9631743 2.800933
#Mazda RX4 Wag 4.700480 1.0560527 2.834389
#Datsun 710 4.532599 0.8415672 2.923699
#Hornet 4 Drive 4.700480 1.1678274 2.967333
#Hornet Sportabout 5.164786 1.2354715 2.834389
#Valiant 4.653960 1.2412686 3.006672
If you don't want to cbind the logs with the original dataframe, you can coerce the matrix produced by sapply to data.frame:
result <- as.data.frame(logs)
And maybe a final clean up, rm(logs).
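If the new columns should follow the 'logoldname' pattern from the question (logdisp, loghp, ...), a compact base R variant is to assign the list returned by lapply straight into new columns. A minimal sketch on a copy of mtcars, where dat is an illustrative name:

```r
dat <- mtcars  # work on a copy so the built-in data set stays untouched
ss <- c("disp", "hp", "wt", "qsec")

# lapply() returns one log-transformed vector per column; assigning the
# list to new names appends them as columns in a single step
dat[paste0("log", ss)] <- lapply(dat[ss], log)

head(dat[paste0("log", ss)])
```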

A missing data type

I am working on the predefined dataset in R called mtcars. The head of this dataset looks like:
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
On the left side there is the name of each car. How come it doesn't have a data type such as num or factor? How can I apply that to a similar dataset?
The structure is:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
where the names of the cars don't appear.
From what you shared, the car names were read in and assigned as the row names of the dataset. str() only prints a summary of the columns of a data frame, so it does not print anything related to the row names.
So if you want to have the car name as a column, just add a column:
mtcars_trial <- mtcars
mtcars_trial$carname <- rownames(mtcars)
Regarding the comment:
# this will assign the column as rowname for dataset
rownames(mtcars_trial) <- mtcars_trial$carname
# this will remove the carname column
mtcars_trial$carname <- NULL
The reason behind this is that here, the names of cars are stored as rownames and NOT as a column in the data.frame.
Building on Kunal Puri's answer:
## values
mpg <- c(21.0, 21.0, 22.8)
cyl <- c(6, 6, 4)
disp <- c(160, 160, 108)
hp <- c(110, 110, 93)
drat <- c(3.90, 3.90, 3.85)
wt <- c(2.620, 2.875, 2.320)
qsec <- c(16.46, 17.02, 18.61)
vs <- c(0, 0, 1)
am <- c(1, 1, 1)
gear <- c(4, 4, 4)
carb <- c(4, 4, 1)
## data.frame
df <- data.frame(mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"))
## give a column name, take rownames values
df$cars <- rownames(df)
## row names removed
rownames(df) <- NULL
## rearranged df: move the new cars column (the 12th) to the front
df <- df[c(12, 1:11)]
print(df)
output:
cars mpg cyl disp hp drat wt qsec vs am gear carb
1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Does that help? I think it is essentially the same as the previous solution.
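A shorter way to get the same layout from an existing data frame is sketched below, with cars_df as an illustrative name. cbind() puts the row names in front as a regular column, and resetting the row names replaces them with the default 1, 2, 3, ... index.

```r
# Row names are an attribute of the data frame, not a column,
# which is why str() does not list them
head(rownames(mtcars), 3)

# Put the row names in front as an ordinary column
cars_df <- cbind(cars = rownames(mtcars), mtcars)

# Drop the original row names; R falls back to 1, 2, 3, ...
rownames(cars_df) <- NULL

str(cars_df[1:3])
```

Now str() reports cars as a regular column with its own type alongside mpg and cyl.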

Resources