How can I convert a column into `NA`s with R [duplicate] - r

I need to set every value in several columns of a data frame to NA. Is this possible in R?

You may assign NA to multiple columns. Example:
mtcars[c("mpg", "cyl", "disp")] <- NA
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 NA NA NA 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag NA NA NA 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 NA NA NA 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive NA NA NA 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout NA NA NA 175 3.15 3.440 17.02 0 0 3 2
# Valiant NA NA NA 105 2.76 3.460 20.22 1 0 3 1
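One detail worth knowing: a plain NA is logical, so the assignment above turns the affected columns into logical vectors. A minimal sketch, assuming you would rather keep the numeric type, uses NA_real_ instead. The names df, cols, df_logical and df_numeric are only illustrative, and the work is done on copies so the built-in mtcars is left untouched.

```r
df <- head(mtcars)                # small copy of the built-in data
cols <- c("mpg", "cyl", "disp")   # columns to blank out

df_logical <- df
df_logical[cols] <- NA            # plain NA: columns become logical

df_numeric <- df
df_numeric[cols] <- NA_real_      # typed NA: columns stay numeric

class(df_logical$mpg)  # "logical"
class(df_numeric$mpg)  # "numeric"
```

The typed variants NA_integer_ and NA_character_ work the same way for integer and character columns.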

Related

How to select columns of a data frame based on the columns of another data frame

I have a data frame, df_1_2017, with 38 columns. I have another data frame, df_2_2018, with 43 columns. I want the same number of columns/header names so I can easily cbind the two data frames.
I have tried the code below without any luck:
col_names_2017 <- colnames(df_1_2017)
selected_cols_df_2_2018 <- df_2_2018 %>%
select(col_names_2017)
Error in `select()`:
! Can't subset columns that don't exist.
✖ Column `Canopy_cover_mean` doesn't exist.
How can I write a select() that keeps only the columns of df_2_2018 whose names are also present in df_1_2017, i.e. all the columns the two data frames share?
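You can use intersect() to find the shared names, and wrap the character vector in all_of() so that select() treats it as a set of column names:
library(dplyr)
common_colsnms <- intersect(colnames(df_1_2017), colnames(df_2_2018))
# apply
selected_cols_df_2_2018 <- df_2_2018 %>%
select(all_of(common_colsnms))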
Please see https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html for future reference. Let me know if this works.
col_names_2017 <- colnames(df_1_2017)
# any_of() skips names that are missing from df_2_2018; all_of() would raise the same error
selected_cols_df_2_2018 <- df_2_2018 %>%
select(any_of(col_names_2017))
If you really mean cbind, then it has nothing to do with the numbers or names of columns (well, duplicate names are discouraged but possible). In this case, you should be looking at the number of rows in each, and if they align row-wise; normally either they are perfectly a match (same number of rows, each row means the same thing) or they have shared ID fields that require a join/merge operation.
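The difference between a plain cbind and a join can be sketched in base R. The two data frames and the id column below are made up for illustration and are not from the question.

```r
left  <- data.frame(id = c(1, 2, 3), height = c(10, 20, 30))
right <- data.frame(id = c(3, 1, 2), cover  = c(0.3, 0.1, 0.2))

# cbind() pastes columns side by side and ignores that the rows
# of `right` are in a different order
pasted <- cbind(left, right["cover"])

# merge() aligns the rows on the shared key instead
joined <- merge(left, right, by = "id")
joined
```

In pasted, the cover values sit next to the wrong ids; in joined, each id keeps its own cover value.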
However, in case you mean rbind instead, where you feel you need the columns to match, by-name, then you have a couple of options.
base R
mt2 <- mtcars[1:3,]
mt3 <- mtcars[4:6,]
names(mt2)[3:5] <- paste0(names(mt2)[3:5], "_2")
names(mt3)[6:8] <- paste0(names(mt3)[6:8], "_3")
mt2
# mpg cyl disp_2 hp_2 drat_2 wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
mt3
# mpg cyl disp hp drat wt_3 qsec_3 vs_3 am gear carb
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
common <- intersect(names(mt2), names(mt3))
mt2[,common]
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
mt3[,common]
# mpg cyl am gear carb
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
rbind(mt2[,common], mt3[,common])
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
dplyr, limiting names
library(dplyr)
rbind(select(mt2, any_of(names(mt3))), select(mt3, any_of(names(mt2))))
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
Or using the more-flexible bind_rows:
select(mt2, any_of(names(mt3))) %>%
bind_rows(select(mt3, any_of(names(mt2))))
# mpg cyl am gear carb
# Mazda RX4 21.0 6 1 4 4
# Mazda RX4 Wag 21.0 6 1 4 4
# Datsun 710 22.8 4 1 4 1
# Hornet 4 Drive 21.4 6 0 3 1
# Hornet Sportabout 18.7 8 0 3 2
# Valiant 18.1 6 0 3 1
dplyr, accept all columns
If you are less concerned about extra columns, then you can use bind_rows and its innate ability to align columns by name, filling in NA wherever a column is missing from one of the frames.
bind_rows(mt2, mt3)
# mpg cyl disp_2 hp_2 drat_2 wt qsec vs am gear carb disp hp drat wt_3 qsec_3 vs_3
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NA NA NA NA NA NA
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NA NA NA NA NA NA
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 NA NA NA NA NA NA
# Hornet 4 Drive 21.4 6 NA NA NA NA NA NA 0 3 1 258 110 3.08 3.215 19.44 1
# Hornet Sportabout 18.7 8 NA NA NA NA NA NA 0 3 2 360 175 3.15 3.440 17.02 0
# Valiant 18.1 6 NA NA NA NA NA NA 0 3 1 225 105 2.76 3.460 20.22 1

Extracting string from Named chr in R [duplicate]

I am looking for just the value of the B1 (newx) linear model coefficient, not the name. I just want the 1.0 value. I do not want the name "newx".
newx <- c(0.5,1.5,2.5)
newy <- c(2,3,4)
out <- lm(newy ~ newx)
out looks like:
Call:
lm(formula = newy ~ newx)
Coefficients:
(Intercept) newx
1.5 1.0
I arrived here. But now I am stuck.
out$coefficients["newx"]
newx
1.0
For a single element like this, use [[ rather than [. Compare:
coefficients(out)["newx"]
# newx
# 1
coefficients(out)[["newx"]]
# [1] 1
More generally, use unname():
unname(coefficients(out)[c("newx", "(Intercept)")])
# [1] 1.0 1.5
head(unname(mtcars))
# NA NA NA NA NA NA NA NA NA NA NA
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
## etc.
If the question is about removing names, another way is to set them to NULL:
my_vec <- quantile(1:3, c(0, 0.5, 1)) # any function that returns a named vector
names(my_vec) <- NULL
my_vec
## [1] 1 2 3
An easy and rather direct way to do it is
as.numeric(out$coefficients["newx"])
Another way would be to use the broom package (the second row of the tidy() output is newx; the first is the intercept):
broom::tidy(out)$estimate[2]
#1
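As a quick cross-check, the base R approaches from the answers can be run side by side. This is only a sketch; slope_a, slope_b and slope_c are illustrative names.

```r
newx <- c(0.5, 1.5, 2.5)
newy <- c(2, 3, 4)
out <- lm(newy ~ newx)

slope_a <- coef(out)[["newx"]]            # [[ extracts the element without its name
slope_b <- unname(coef(out)["newx"])      # unname() strips the names attribute
slope_c <- as.numeric(coef(out)["newx"])  # as.numeric() drops attributes, names included

is.null(names(slope_a))       # TRUE
all.equal(slope_a, slope_b)   # TRUE
all.equal(slope_a, slope_c)   # TRUE
```

All three give the bare slope; [[ is the shortest when only one coefficient is needed, while unname() also works on several at once.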

problems with NA in data.table

I have problems with missing values (NA) in data.table. When computing mean(x) with by=z, I get NA for any group in which some observation has x=NA. How can I handle that?
As you have not provided any example data, it's hard to guess what you are trying to do. However, here is a small example that excludes the NA values from the calculation. Consider a data table dt:
library(data.table)
dt = data.table(mtcars)[1:6][2, mpg := NA][]
mpg cyl disp hp drat wt qsec vs am gear carb
1: 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2: NA 6 160 110 3.90 2.875 17.02 0 1 4 4
3: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
6: 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Here there is an NA in the second row of the first column. To calculate the mean of that column, you can use na.rm:
mean(dt$mpg, na.rm = TRUE)
#[1] 20.4
Or, when doing by-group calculations:
dt[, mean(mpg, na.rm = TRUE), by=cyl]
# cyl V1
# 1: 6 20.16667
# 2: 4 22.80000
# 3: 8 18.70000
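If several columns need the same treatment, a possible extension is to apply mean() over .SD. This is a sketch rather than part of the original answer; the choice of mpg and hp is arbitrary.

```r
library(data.table)

dt <- data.table(mtcars)[1:6]
dt[2, mpg := NA]   # introduce one missing value, as in the example above

# NA-aware mean of each selected column, computed per group
res <- dt[, lapply(.SD, mean, na.rm = TRUE), by = cyl, .SDcols = c("mpg", "hp")]
res
```

Note that a group whose values are all NA yields NaN rather than NA, because mean() is then taken over an empty vector.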

Move a column to first position in a data frame

I would like to have the last column of the data frame moved to the start (as first column). How can I do it in R?
My data.frame has about a thousand columns, so reordering them by hand won't do. I just want to pick one column and "move it to the start".
Dplyr's select() approach
Moving the last column to the start:
new_df <- df %>%
select(last_column_name, everything())
This is also valid for any column and any quantity:
new_df <- df %>%
select(col_5, col_8, everything())
Example using mtcars data frame:
head(mtcars, n = 2)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Last column is 'carb'
new_df <- mtcars %>% select(carb, everything())
head(new_df, n = 2)
# carb mpg cyl disp hp drat wt qsec vs am gear
# Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
dplyr 1.0.0 now includes the relocate() function to reorder columns. The default behaviour is to move the named column(s) to the first position.
library(dplyr) # from version 1.0.0
mtcars %>%
relocate(carb) %>%
head()
carb mpg cyl disp hp drat wt qsec vs am gear
Mazda RX4 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
Mazda RX4 Wag 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
Datsun 710 1 22.8 4 108 93 3.85 2.320 18.61 1 1 4
Hornet 4 Drive 1 21.4 6 258 110 3.08 3.215 19.44 1 0 3
Hornet Sportabout 2 18.7 8 360 175 3.15 3.440 17.02 0 0 3
Valiant 1 18.1 6 225 105 2.76 3.460 20.22 1 0 3
But other locations can be specified with the .before or .after arguments:
mtcars %>%
relocate(gear, carb, .before = cyl) %>%
head()
mpg gear carb cyl disp hp drat wt qsec vs am
Mazda RX4 21.0 4 4 6 160 110 3.90 2.620 16.46 0 1
Mazda RX4 Wag 21.0 4 4 6 160 110 3.90 2.875 17.02 0 1
Datsun 710 22.8 4 1 4 108 93 3.85 2.320 18.61 1 1
Hornet 4 Drive 21.4 3 1 6 258 110 3.08 3.215 19.44 1 0
Hornet Sportabout 18.7 3 2 8 360 175 3.15 3.440 17.02 0 0
Valiant 18.1 3 1 6 225 105 2.76 3.460 20.22 1 0
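Since the question asks for the last column without naming it, a minimal sketch can combine relocate() with the tidyselect helper last_col(), which dplyr re-exports. It assumes dplyr 1.0.0 or later; moved is an illustrative name.

```r
library(dplyr)

# Move whichever column is currently last to the front, without naming it
moved <- mtcars %>% relocate(last_col())

names(moved)[1]   # "carb"
```

This is convenient with very wide data frames, where looking up the name of the last column is a chore.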
You can change the order of columns by addressing them explicitly in the new order with data[,c(ORDER YOU WANT THEM TO BE IN)]
If you just want the last column to be first use: data[,c(ncol(data),1:(ncol(data)-1))]
> head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
> head(cars[,c(2,1)])
dist speed
1 2 4
2 10 4
3 4 7
4 22 7
5 16 8
6 10 9
dataframe<-dataframe[,c(1000, 1:999)]
This moves your last column, i.e. the 1000th column, to the first position.
I don't know if it's worth adding this as an answer or if a comment would be fine, but I wrote a function called moveme that lets you do what you want to do with the language you describe. You can find the function at this answer: https://stackoverflow.com/a/18540144/1270695
It works on the names of your data.frame and produces a character vector that you can use to reorder your columns:
mydf <- data.frame(matrix(1:12, ncol = 4))
mydf
moveme(names(mydf), "X4 first")
# [1] "X4" "X1" "X2" "X3"
moveme(names(mydf), "X4 first; X1 last")
# [1] "X4" "X2" "X3" "X1"
mydf[moveme(names(mydf), "X4 first")]
# X4 X1 X2 X3
# 1 10 1 4 7
# 2 11 2 5 8
# 3 12 3 6 9
If you're shuffling things around like this, I suggest converting your data.frame to a data.table and using setcolorder (with my moveme function, if you wish) to make the change by reference.
In your question, you also mentioned "I just want to pick one column and move it to the start". If it's an arbitrary column, and not specifically the last one, you could also look at using setdiff.
Imagine you're working with the "mtcars" dataset and want to move the "am" column to the start.
x <- "am"
mtcars[c(x, setdiff(names(mtcars), x))]
If you want to move any named column to the first position, simply use:
df[,c(which(colnames(df)=="desired_colname"),which(colnames(df)!="desired_colname"))]
A native R approach that works with any number of rows or columns to move the last column of a data frame to the first position (note the parentheses: 1:ncol(df)-1 is parsed as (1:ncol(df))-1):
df <- df[,c(ncol(df),1:(ncol(df)-1))]
To move any other column to the first position, put its number first and drop it from the rest:
df <- df[,c(your_column_number_here,setdiff(1:ncol(df),your_column_number_here))]
If you don't know the column number, but know the column label name, do the following replacing "your_column_name_here":
columnNumber <- which(colnames(df)=="your_column_name_here")
df <- df[,c(columnNumber,setdiff(1:ncol(df),columnNumber))]
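The name-based variants above can be wrapped in a small helper. This is only a sketch; move_first() is a hypothetical name, not a base R function.

```r
# move_first() is a hypothetical helper, not part of base R
move_first <- function(df, col) {
  stopifnot(all(col %in% names(df)))    # fail early on a misspelled name
  df[c(col, setdiff(names(df), col))]   # chosen column(s) first, the rest in original order
}

front      <- move_first(mtcars, "am")
last_first <- move_first(mtcars, names(mtcars)[ncol(mtcars)])

names(front)[1:3]     # "am" "mpg" "cyl"
names(last_first)[1]  # "carb"
```

Working with names rather than positions sidesteps index arithmetic entirely, and col may also be a vector of several names.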
There is also the data.table option with setcolorder():
library(data.table)
mtcars_copy <- copy(mtcars)
setDT(mtcars_copy)
# Move column "gear" in the first position
setcolorder(mtcars_copy, neworder = "gear")
head(mtcars_copy)
# gear mpg cyl disp hp drat wt qsec vs am carb
# 1: 4 21.0 6 160 110 3.90 2.620 16.46 0 1 4
# 2: 4 21.0 6 160 110 3.90 2.875 17.02 0 1 4
# 3: 4 22.8 4 108 93 3.85 2.320 18.61 1 1 1
# 4: 3 21.4 6 258 110 3.08 3.215 19.44 1 0 1
# 5: 3 18.7 8 360 175 3.15 3.440 17.02 0 0 2
# 6: 3 18.1 6 225 105 2.76 3.460 20.22 1 0 1
If multiple columns, then mention the order in a vector:
setcolorder(mtcars_copy, neworder = c("vs", "carb"))
head(mtcars_copy)
# vs carb gear mpg cyl disp hp drat wt qsec am
# 1: 0 4 4 21.0 6 160 110 3.90 2.620 16.46 1
# 2: 0 4 4 21.0 6 160 110 3.90 2.875 17.02 1
# 3: 1 1 4 22.8 4 108 93 3.85 2.320 18.61 1
# 4: 1 1 3 21.4 6 258 110 3.08 3.215 19.44 0
# 5: 0 2 3 18.7 8 360 175 3.15 3.440 17.02 0
# 6: 1 1 3 18.1 6 225 105 2.76 3.460 20.22 0
Move any column from any position to the first position in your data:
n <- which(colnames(df)=="column_need_move")
column_need_move <- df$column_need_move
df <- cbind(column_need_move, df[,-n,drop=FALSE])
If you want to create a new column and have it be the first column, use the .before=1 argument:
my_data <- my_data %>% mutate(newcol = a*b, .before=1)

How to parse the blank columns in read.table? [duplicate]

dat="mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
2"
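With read.table, the lone 2 on the last line of the text lands in the first column (first output below). I need it under carb instead, as in the second output (see the last row).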
> read.table(text=dat,fill=T,header=TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 2.0 NA NA NA NA NA NA NA NA NA NA
> read.table(text=dat,fill=T,header=TRUE)
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
4 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
5 NA NA NA NA NA NA NA NA NA NA 2
I solved it myself.
read.fwf(file=textConnection(dat),fill=TRUE,skip=1,widths=c(12,4,6,4,5,6,6,3,3,5,5)) -> r
# y is not defined in the original post; assumed here to be the header line of dat
y <- readLines(textConnection(dat),n=1)
unlist(strsplit(trimws(y),split="\\s+")) -> colnames(r)
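A self-contained sketch of the same idea, on a made-up three-column text whose field widths (4, 4, 5) are known exactly; txt and the widths are assumptions for illustration, not the original data:

```r
# Each line is 13 characters: 4 for mpg, 4 for cyl, 5 for carb
txt <- c(
  "mpg  cyl carb",
  "21.0   6    4",
  "22.8   4    1",
  "            2"    # only the last field is filled in
)

# Split every line by position instead of by whitespace
r <- read.fwf(textConnection(txt), widths = c(4, 4, 5), skip = 1,
              col.names = c("mpg", "cyl", "carb"))
r
```

Because the fields are cut by position, the blank mpg and cyl fields in the last row should be read as NA, while the 2 stays under carb.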
