Printing a data.frame with zero columns but non-empty row.names? - r

Is there a built-in function to display a data.frame with zero columns but still show row.names?
> df
DataFrame with 5 rows and 0 columns
> row.names(df)
[1] "ID1" "ID2" "ID3" "ID4" "ID5"
It would be more useful if it printed this instead:
> df
DataFrame with 5 rows and 0 columns
ID1
ID2
ID3
ID4
ID5
I wrote a custom function to do it via cat(), but it would be nice to know if there's a built-in way of doing it.

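library(tidyverse)
# drop every column; the row names are kept
df <- df %>%
  select(-everything())
# print() shows the "data frame with 0 columns" line, cat() lists the row names
print(df)
cat(rownames(df), sep = "\n")
Or, as a single pipeline:
df %>%
  select(-everything()) %>%
  { print(.); cat(rownames(.), sep = "\n") }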
Output
data frame with 0 columns and 2 rows
A
B
Or, using base R, if you don't need the summary line about the data frame's size:
df <- df[1]                  # keep a single column (row names are preserved)
df[1] <- rep("", nrow(df))   # blank out its values
colnames(df) <- ""           # and its header
df
Output
A
B
Data
df <- data.frame(a = c(1, 2),
b = c(1, 2),
c = c(4, 5))
rownames(df) <- c("A", "B")
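If you want this without modifying df, a small wrapper can print the usual summary line and then the row names. This is a minimal base-R sketch; print_with_rownames() is a hypothetical helper, not a built-in function, and it assumes the data frame has zero columns (otherwise print() already shows the row names).

```r
# Hypothetical helper (not part of base R): print the standard
# "data frame with ..." summary line, then each row name on its own line.
print_with_rownames <- function(x) {
  print(x)                      # zero columns: "data frame with 0 columns and N rows"
  cat(rownames(x), sep = "\n")  # row names, one per line
  cat("\n")                     # finish the last line
  invisible(x)                  # return x invisibly, as print methods do
}

df <- data.frame(a = c(1, 2), b = c(1, 2), c = c(4, 5))
rownames(df) <- c("A", "B")

# df[0] drops every column but keeps the row names
print_with_rownames(df[0])
```

Because x is returned invisibly, the helper can be dropped into a pipeline without printing the data frame a second time.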

Related

How to rename multiple Columns in R?

My goal is to get a concise way to rename multiple columns in a data frame. Let's consider a small data frame df as below:
df <- data.frame(a=1, b=2, c=3)
df
Let's say we want to change the names from a, b, and c to Y, Z, and E respectively.
I define a named character vector mapping the new names to the old ones:
df_names <- c(Y = "a", Z = "b", E = "c")
and would use it to rename the columns:
rename(df, !!!df_names)
df
Any suggestions?
One more !:
df <- data.frame(a=1, b=2, c=3)
df_names <- c(Y = "a", Z ="b", E = "c")
library(dplyr)
df %>% rename(!!!df_names)
## Y Z E
##1 1 2 3
A non-tidyverse way is to use match():
names(df) <- names(df_names)[match(names(df), df_names)]
df
## Y Z E
##1 1 2 3
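One caveat with the match() version: any column whose name is not in df_names is renamed to NA. A minimal base-R sketch that renames only the matched columns; rename_with_lookup() is a hypothetical helper, and the lookup is assumed to be in the same new = "old" form as df_names.

```r
# Hypothetical helper: rename columns via a named vector c(new = "old"),
# leaving columns that are not in the lookup untouched.
rename_with_lookup <- function(df, lookup) {
  idx <- match(names(df), lookup)  # position of each column in the lookup, or NA
  hit <- !is.na(idx)
  names(df)[hit] <- names(lookup)[idx[hit]]
  df
}

df <- data.frame(a = 1, b = 2, c = 3, d = 4)
rename_with_lookup(df, c(Y = "a", Z = "b", E = "c"))
```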
You could try:
sample(setdiff(LETTERS, names(df)), size = ncol(df))
[1] "S" "D" "N"
Here you don't really care what the new names are, since they come from sample(). Otherwise, a straightforward names(df) <- c('name1', 'name2', ...) works.

Formatting individual variables in a dataframe

I have a simple question. I have a large dataset with text and numerical variables. I would like to format the numerical variables, but without saving them in a separate dataset and re-merging them (that would take way too much time).
How do I do this?
Here is a minimal example of what I mean:
a <- c("name1", "name2", "name3")
b <- rnorm(3)
df <- data.frame(a=a, b=b)
df<- format(round(df, 3), nsmall=3)
This gives me an error as "a" is a non-numeric variable. So how do I format just "b"?
Format one column:
df$b <- format(round(df$b, 3), nsmall = 3)
If we need to format many numeric columns:
ix <- sapply(df, is.numeric)
df[ ix ] <- format(round(df[ ix ], 3), nsmall = 3)
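Wrapped as a reusable function, the same idea might look like the sketch below. format_numeric_cols() is a hypothetical helper; lapply() rounds and formats each numeric column on its own, and digits = 3 mirrors the example above. Note that the formatted columns become character vectors.

```r
# Hypothetical helper: round and format every numeric column to `digits`
# decimal places; non-numeric columns are returned unchanged.
format_numeric_cols <- function(df, digits = 3) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) format(round(x, digits), nsmall = digits))
  df
}

df <- data.frame(a = c("name1", "name2"), b = c(1.23456, -0.5))
format_numeric_cols(df)
```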
And a tidyverse-based solution for an arbitrary number (and location) of numeric columns:
library(tidyverse)
df<- df %>%
mutate_if(is.numeric, . %>% round(3) %>% format(nsmall=3))
df
a b
1 name1 -0,105
2 name2 0,186
3 name3 0,161

How to append two data frames by overwriting the existing rows

I have a data frame, df. I extracted a 5% sample of its rows into a new data frame, df1, to make a few changes to the data. Now I need to put df1 back into df, overwriting the corresponding rows of df, since df1 is a subset of df.
I tried to extract the rows of df that are not present in df1 using
df2 <- subset(df, !(rownames(df) %in% rownames(df1[])))
But this didn't work.
Can anyone help, please?
Save the filter condition and reuse it, like so:
set.seed(357)
xy <- data.frame(col1 = letters[1:5], col2 = runif(5))
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 0.27987766
4 d 0.22486212
5 e 0.65348521
your.condition <- xy$col1 %in% c("c", "d")
newxy1 <- xy[your.condition, ]
newxy1$col2 <- 1:2
xy[your.condition, "col2"] <- newxy1$col2
xy
col1 col2
1 a 0.10728121
2 b 0.05504568
3 c 1.00000000
4 d 2.00000000
5 e 0.65348521
You should always try to make a reproducible example so that it is easy for others to help you.
I have tried to do that with the help of the mtcars dataset:
#Copied mtcars data into df
df = mtcars
# sample 5 rows from df
df1 = df[sample(1:nrow(df), 5), ]
# did few manipulations in the dataset
df1 = df1 * 2
# overwrite the matching rows of df (matched by row name) with the modified rows of df1
df[rownames(df1), ] <- df1
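If the rows are identified by a key column rather than by row names, and df1 may also contain rows that are not yet in df (the "append" part of the question), a minimal base-R sketch could overwrite the matching rows and append the rest. upsert_rows() and its key argument are hypothetical, and both data frames are assumed to have the same columns.

```r
# Hypothetical helper: overwrite rows of `df` whose `key` value appears in
# `upd`, then append the rows of `upd` whose key is not found in `df`.
# Assumes `df` and `upd` have the same column names.
upsert_rows <- function(df, upd, key) {
  idx <- match(upd[[key]], df[[key]])  # row of df for each row of upd (NA = new)
  hit <- !is.na(idx)
  if (any(hit)) {
    df[idx[hit], names(upd)] <- upd[hit, , drop = FALSE]  # overwrite existing rows
  }
  rbind(df, upd[!hit, names(df), drop = FALSE])           # append the new rows
}

df  <- data.frame(id = c("a", "b", "c"), v = c(1, 2, 3), stringsAsFactors = FALSE)
upd <- data.frame(id = c("b", "d"), v = c(20, 40), stringsAsFactors = FALSE)
upsert_rows(df, upd, "id")
```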

R - co-locate columns with the same name after merge

Situation
I have two data frames, df1 and df2, with the same column headings
x <- c(1,2,3)
y <- c(3,2,1)
z <- c(3,2,1)
names <- c("id","val1","val2")
df1 <- data.frame(x, y, z)
names(df1) <- names
a <- c(1, 2, 3)
b <- c(1, 2, 3)
c <- c(3, 2, 1)
df2 <- data.frame(a, b, c)
names(df2) <- names
And am performing a merge
#library(dplyr) # not needed for merge
joined_df <- merge(x = df1, y = df2, by = "id", all = TRUE)
This gives me the columns in the joined_df as id, val1.x, val2.x, val1.y, val2.y
Question
Is there a way to co-locate the columns that had the same heading in the original data frames, to give the column order in the joined data frame as id, val1.x, val1.y, val2.x, val2.y?
Note that my actual data frame has 115 columns, so I'd like to avoid using joined_df <- joined_df[, c(1, 2, 4, 3, 5)] if possible.
Update/Edit: I would also like to maintain the original order of the column headings, so sorting alphabetically is not an option on my actual data (I realise it would work with the example I have given).
My desired output is
id val1.x val1.y val2.x val2.y
1 1 3 1 3 3
2 2 2 2 2 2
3 3 1 3 1 1
Update with solution for general case
The accepted answer solves my issue nicely.
I've adapted the code slightly here to use the original column names, without having to hard-code them in the rep function.
#specify columns used in merge
merge_cols <- c("id")
# identify duplicate columns and remove those used in the 'merge'
dup_cols <- names(df1)
dup_cols <- dup_cols[!dup_cols %in% merge_cols]
# replicate each duplicate column name and append an 'x' and 'y'
dup_cols <- rep(dup_cols, each=2)
var <- c("x", "y")
newnames <- paste(dup_cols, ".", var, sep = "")
#create new column names and sort the joined df by those names
newnames <- c(merge_cols, newnames)
joined_df <- joined_df[newnames]
How about something like this?
numrep <- rep(1:2, each = 2)
numrep
var <- c("x", "y")
var
newnames <- paste("val", numrep, ".", var, sep = "")
newdf <- cbind(joined_df$id, joined_df[newnames])
names(newdf)[1] <- "id"
Which should give you a data frame like this:
id val1.x val1.y val2.x val2.y
1 1 3 1 3 3
2 2 2 2 2 2
3 3 1 3 1 1
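Another general sketch that avoids building the names by hand: strip merge()'s default .x/.y suffixes, then order the joined columns by where their base name appears in the original data frame. colocate_cols() is a hypothetical helper; it assumes the default suffixes and that no original column name itself ends in .x or .y.

```r
# Hypothetical helper: reorder the columns of a merge() result so that
# name.x and name.y sit next to each other, following `col_order`.
colocate_cols <- function(joined, col_order) {
  base <- sub("\\.[xy]$", "", names(joined))  # "val1.x" -> "val1"
  # order() is stable, so each .x column stays ahead of its .y partner;
  # columns whose base name is not in col_order (NA) go last
  joined[order(match(base, col_order))]
}

df1 <- data.frame(id = c(1, 2, 3), val1 = c(3, 2, 1), val2 = c(3, 2, 1))
df2 <- data.frame(id = c(1, 2, 3), val1 = c(1, 2, 3), val2 = c(3, 2, 1))
joined_df <- merge(df1, df2, by = "id", all = TRUE)
colocate_cols(joined_df, names(df1))
```

Since the order comes from names(df1), the original column order is kept rather than sorted alphabetically.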

Add missing value in column with value from row above

Every week I get an incomplete dataset for an analysis. It looks like this:
df1 <- data.frame(var1 = c("a","","","b",""),
var2 = c("x","y","z","x","z"))
Some var1 values are missing. The dataset should end up looking like this:
df2 <- data.frame(var1 = c("a","a","a","b","b"),
var2 = c("x","y","z","x","z"))
Currently I use an Excel macro to do this. But this makes it harder to automate the analysis. From now on I would like to do this in R. But I have no idea how to do this.
Thanks for your help.
QUESTION UPDATE AFTER COMMENT
var2 is not relevant to my question. All I am trying to do is get from df1 to df2:
df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))
Here is one way of doing it by making use of run-length encoding (rle) and its inverse, inverse.rle:
fillTheBlanks <- function(x, missing = "") {
  r <- rle(as.character(x))
  # each run whose value is blank takes the value of the preceding run
  empty <- which(r$values == missing)
  r$values[empty] <- r$values[empty - 1]
  inverse.rle(r)
}
df1$var1 <- fillTheBlanks(df1$var1)
The results:
df1
var1 var2
1 a x
2 a y
3 a z
4 b x
5 b z
Here is a simpler way:
library(zoo)
df1$var1[df1$var1 == ""] <- NA
df1$var1 <- na.locf(df1$var1)
The tidyr package has a fill() function that does the trick (it fills NA values, so blanks need to be NA):
library(tidyr)
df1 <- data.frame(var1 = c("a", NA, NA, "b", NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
Here is another way which is slightly shorter and doesn't coerce to character:
Fill <- function(x,missing="")
{
Log <- x != missing
y <- x[Log]
y[cumsum(Log)]
}
Results:
# For factor:
Fill(df1$var1)
[1] a a a b b
Levels: a b
# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
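Neither fillTheBlanks() nor Fill() handles a blank in the very first position: fillTheBlanks() fails because there is no earlier run to copy, and Fill() silently returns a shorter vector. A minimal base-R sketch that maps leading blanks to NA instead; fill_down() is a hypothetical name, and unlike Fill() it converts its input to character.

```r
# Hypothetical helper: carry the last non-blank value forward.
# Leading blanks (no earlier value to copy) become NA instead of being dropped.
fill_down <- function(x, missing = "") {
  x <- as.character(x)
  keep <- !is.na(x) & x != missing  # positions holding a real value
  idx <- cumsum(keep)               # index of the most recent real value
  idx[idx == 0] <- NA               # nothing to carry forward yet
  x[keep][idx]
}

fill_down(c("", "a", "", "", "b", ""))
```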
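Below is my unfill() function, which does the reverse (it blanks out values that repeat the one above). I encountered the same problem; I hope it helps.
library(dplyr)
library(purrr)
unfill <- function(df, cols){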
col_names <- names(df)
unchanged <- df[!(names(df) %in% cols)]
changed <- df[names(df) %in% cols] %>%
map_df(function(col){
col[col == col %>% lag()] <- NA
col
})
unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}
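For comparison, a base-R sketch of the same unfill idea without purrr or dplyr: replace any value equal to the one directly above it with NA. unfill_vec() is a hypothetical name, and NA is assumed as the blank marker, as in the function above.

```r
# Hypothetical helper: replace each value that repeats the previous one with NA
# (the reverse of filling down).
unfill_vec <- function(x) {
  n <- length(x)
  if (n < 2) return(x)
  same_as_prev <- c(FALSE, x[-1] == x[-n])
  x[which(same_as_prev)] <- NA  # which() skips NA comparisons
  x
}

df <- data.frame(var1 = c("a", "a", "a", "b", "b"),
                 var2 = c("x", "y", "z", "x", "z"),
                 stringsAsFactors = FALSE)
df["var1"] <- lapply(df["var1"], unfill_vec)
df
```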
