How to rename multiple columns in R?

My goal is to get a concise way to rename multiple columns in a data frame. Let's consider a small data frame df as below:
df <- data.frame(a=1, b=2, c=3)
df
Let's say we want to change the names from a, b, and c to Y, Z, and E respectively.
Define a named character vector that maps the new names to the old names:
df_names <- c(Y = "a", Z = "b", E = "c")
I would use this to rename the columns,
rename(df, !!!names)
df
Any suggestions?

One more !:
df <- data.frame(a=1, b=2, c=3)
df_names <- c(Y = "a", Z ="b", E = "c")
library(dplyr)
df %>% rename(!!!df_names)
## Y Z E
##1 1 2 3
A non-tidy way might be through match:
names(df) <- names(df_names)[match(names(df), df_names)]
df
## Y Z E
##1 1 2 3
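One caveat with the match approach above: any column of df that is not listed in df_names ends up with NA as its name. A minimal base R sketch that leaves unlisted columns alone is shown below; rename_cols is a hypothetical helper name, and the extra column d is added only for illustration.

```r
# Hypothetical helper: rename only the columns listed in `lookup`
# (a named vector of the form new_name = "old_name").
rename_cols <- function(df, lookup) {
  idx <- match(lookup, names(df))   # position of each old name in df
  found <- !is.na(idx)              # skip old names that are absent
  names(df)[idx[found]] <- names(lookup)[found]
  df
}

df <- data.frame(a = 1, b = 2, c = 3, d = 4)
df_names <- c(Y = "a", Z = "b", E = "c")
renamed <- rename_cols(df, df_names)
names(renamed)
```

Here column d keeps its name because it does not appear in df_names.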

You could try:
sample(LETTERS[which(LETTERS %in% names(df) == FALSE)], size= length(names(df)), replace = FALSE)
[1] "S" "D" "N"
Here, you don't really care what the new names are, since you're using sample. Otherwise, a straightforward names(df) <- c('name1', 'name2'...

Related

Printing a data.frame with zero columns but still has row.names?

Is there a built-in function to display a data.frame with zero columns but still show row.names?
> df
DataFrame with 5 rows and 0 columns
> row.names(df)
[1] "ID1" "ID2" "ID3" "ID4" "ID5"
It would be useful if instead:
> df
DataFrame with 5 rows and 0 columns
ID1
ID2
ID3
ID4
ID5
I wrote a custom function to do it via cat, but would be nice to know if there's a built-in way of doing it.
library(tidyverse)
df <- df %>%
select(-everything())
cat(print(df), cat(rownames(df), sep = "\n"))
Or could also be simplified to:
df %>%
select(-everything()) %>%
cat(print(.), cat(rownames(.), sep = "\n"))
Output
data frame with 0 columns and 2 rows
A
B
Or using base R, if you don't care about the information being displayed about the dataframe.
df <- df[1]
df[1] <- rep("", nrow(df))
colnames(df) <- ""
Output
A
B
Data
df <- data.frame(a = c(1, 2),
b = c(1, 2),
c = c(4, 5))
rownames(df) <- c("A", "B")
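As a variation on the custom printer above, here is a small base R sketch that avoids nesting cat and print. The function name print_rownames_only is made up for this example, and the header line only imitates what base R prints for a zero-column data frame.

```r
# Hypothetical helper: print a one-line summary followed by the row names.
print_rownames_only <- function(df) {
  cat("data frame with", ncol(df), "columns and", nrow(df), "rows\n")
  writeLines(rownames(df))
  invisible(df)  # return the input invisibly, like print() does
}

df <- data.frame(a = c(1, 2),
                 b = c(1, 2),
                 c = c(4, 5))
rownames(df) <- c("A", "B")

empty <- df[0]  # drop all columns, keep the row names
print_rownames_only(empty)
```

This is still a custom function, not a built-in, so it does not change how df prints by default.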

Select columns based on another column in a different data frame in R

I have a df:
AA <- c("GA","GA", "GA","GA","GA")
A <- c(1,2,3,4,5)
B <- c(5,4,3,2,1)
C <- c(2,3,4,5,1)
D <- c(4,3,2,1,5)
df <- data.frame(AA, A, B, C, D)
The other df is:
E <- c("B", "D")
F <- c("GA","GA")
df2 <- data.frame(E, F)
I would like to only select the columns from df based on the values from df2$E.
And that data frame would look like this:
AA <- c("GA","GA", "GA","GA","GA")
B <- c(5,4,3,2,1)
D <- c(4,3,2,1,5)
df3 <- data.frame(AA, B, D)
My current code below gives me an empty data frame with 0 obs. and 5 variables:
df3 <- df %>% filter(df %in% df2$E)
Any assistance in generating a code that works would be greatly appreciated.
Thank you!
Here we can index via column names.
df[,c("AA",df2$E)]
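A slightly more defensive sketch of the same indexing idea, assuming that df2$E might be a factor (the default for data.frame before R 4.0) or might name columns that do not exist in df. The variable name wanted is introduced here only for illustration.

```r
df <- data.frame(AA = c("GA", "GA", "GA", "GA", "GA"),
                 A = c(1, 2, 3, 4, 5),
                 B = c(5, 4, 3, 2, 1),
                 C = c(2, 3, 4, 5, 1),
                 D = c(4, 3, 2, 1, 5))
df2 <- data.frame(E = c("B", "D"), F = c("GA", "GA"))

# as.character() guards against E being stored as a factor
wanted <- c("AA", as.character(df2$E))

# intersect() keeps only names that exist in df, in the order of `wanted`;
# drop = FALSE keeps the result a data frame even for a single column
df3 <- df[, intersect(wanted, names(df)), drop = FALSE]
names(df3)
```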

How to find if a value exists in a range and print "FOUND" or "MISSING" in a new column

I am trying to perform a function similar to the Excel formula found below:
IF(COUNTIF(RANGE, CRITERIA), "FOUND", "MISSING")
I want to print a new column in my dataframe with found or missing. I understand in R that I can use %in% for example:
A$C %in% C$B
This finds whether the values in column C of the A data frame exist in the values in column B of the C data frame. However, I do not know how to combine those results with a conditional function to print found or missing to a new column in the correct row.
Here is an example of the dataframes:
A <- data.frame("C" = c(3,5,9,21,25), "D" = 1:5)
C <- data.frame("B" = c(3,6,21,22,8) , "F" = 10:14)
A$C %in% C$B
A[A$C %in% C$B,]
Based on the limited information:
lookup_list <- c(1:3)
x <- c('a','b','c')
y <- c(10, 3, 5)
df <- data.frame(x,y)
x y
1 a 10
2 b 3
3 c 5
library(dplyr)
df <- df %>%
  mutate(status = case_when(
    y %in% lookup_list ~ 'FOUND',
    !y %in% lookup_list ~ 'MISSING'
  ))
x y status
1 a 10 MISSING
2 b 3 FOUND
3 c 5 MISSING
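The same idea can be sketched in base R on the data frames from the question, using ifelse on the logical vector that %in% returns. The column name status is an arbitrary choice for this example.

```r
A <- data.frame(C = c(3, 5, 9, 21, 25), D = 1:5)
C <- data.frame(B = c(3, 6, 21, 22, 8), F = 10:14)

# %in% returns one TRUE/FALSE per row of A; ifelse maps that to labels
A$status <- ifelse(A$C %in% C$B, "FOUND", "MISSING")
A
```

Because %in% never returns NA, every row gets one of the two labels.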

Conditional value replacement in linked column

In a data frame, I want to replace a value based on a condition in another column.
Example: when the value in column A is above x, then both values in column A and B are replaced by NA.
I can't find the proper way to do this with the different functions: na_if, ifelse, if_else,case_when...
Subscript the data frame by a logical vector having the condition:
DF[DF$A > x, c("A", "B")] <- NA
Here's a working answer:
d <- data.frame("A" = 1:10, "B" = 11:20)
x <- 5
d[d$A > x, c("A", "B")] <- NA
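If column A can itself contain NA, the comparison d$A > x yields NA for those rows, and as far as I know a data frame assignment does not accept NA in a logical row subscript. A minimal sketch that sidesteps this by wrapping the condition in which(); the example data here is made up for illustration.

```r
d <- data.frame(A = c(1, NA, 7, 3, 10), B = 11:15)
x <- 5

# which() returns only the positions where the condition is TRUE,
# silently dropping rows where the comparison is NA
rows <- which(d$A > x)
d[rows, c("A", "B")] <- NA
d
```

Rows where A was already NA are left untouched, so their B values survive.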

Add missing value in column with value from row above

Every week I receive an incomplete dataset for an analysis. It looks like this:
df1 <- data.frame(var1 = c("a","","","b",""),
var2 = c("x","y","z","x","z"))
Some var1 values are missing. The dataset should end up looking like this:
df2 <- data.frame(var1 = c("a","a","a","b","b"),
var2 = c("x","y","z","x","z"))
Currently I use an Excel macro to do this. But this makes it harder to automate the analysis. From now on I would like to do this in R. But I have no idea how to do this.
Thanks for your help.
QUESTION UPDATE AFTER COMMENT
var2 is not relevant to my question. The only thing I am trying to do is get from df1 to df2.
df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))
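Here is one way of doing it, using run-length encoding (rle) and its inverse, inverse.rle:
fillTheBlanks <- function(x, missing = "") {
  # Collapse x into runs of identical values
  runs <- rle(as.character(x))
  # Runs made up of the missing marker
  empty <- which(runs$values == missing)
  # Overwrite each empty run with the value of the run before it
  runs$values[empty] <- runs$values[empty - 1]
  # Expand the runs back to a full-length vector
  inverse.rle(runs)
}
df1$var1 <- fillTheBlanks(df1$var1)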
The results:
df1
var1 var2
1 a x
2 a y
3 a z
4 b x
5 b z
Here is a simpler way:
library(zoo)
df1$var1[df1$var1 == ""] <- NA
df1$var1 <- na.locf(df1$var1)
The tidyr package has the fill() function, which does the trick once the blanks are NA:
library(tidyr)
df1 <- data.frame(var1 = c("a",NA,NA,"b",NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
Here is another way which is slightly shorter and doesn't coerce to character:
Fill <- function(x,missing="")
{
Log <- x != missing
y <- x[Log]
y[cumsum(Log)]
}
Results:
# For factor:
Fill(df1$var1)
[1] a a a b b
Levels: a b
# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
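The cumsum trick above drops elements when the vector starts with a blank, because the index is 0 there. Below is a sketch of a variant that keeps the length and treats both "" and NA as missing; fill_down is a hypothetical name, and unlike Fill it always returns a character vector.

```r
# Hypothetical helper: carry the last non-missing value forward.
# Leading blanks have nothing to copy from, so they become NA.
fill_down <- function(x, missing = "") {
  x <- as.character(x)
  is_value <- !is.na(x) & x != missing
  idx <- cumsum(is_value)              # index of the latest real value
  out <- rep(NA_character_, length(x))
  out[idx > 0] <- x[is_value][idx[idx > 0]]
  out
}

filled <- fill_down(c("", "a", "", NA, "b", ""))
filled
```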
Below is my unfill function, which does the reverse: it replaces a value that repeats the one above it with NA. I encountered the same problem and hope this helps.
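library(dplyr)
library(purrr)
unfill <- function(df, cols) {
  col_names <- names(df)
  unchanged <- df[!(names(df) %in% cols)]
  changed <- df[names(df) %in% cols] %>%
    map_df(function(col) {
      # Blank out a value when it equals the one in the previous row
      col[col == col %>% lag()] <- NA
      col
    })
  # Restore the original column order
  unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}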

Resources