I have a df1
X Y
CA23-11 002 0033
CA67-55 011 0245
I would like to create df2
Z
CA2311-2-33
CA6755-11-245
My code to do this is
df2 <- df %>% unite(Z, X:Y, remove = "0", sep="-")
and my error is: Error in if (remove) { : argument is not interpretable as logical
Any assistance in this would be appreciated. Thanks!
We could use base R to return the expected output. Read the 'Y' column with read.table, which splits at the whitespace and parses each piece as a numeric column (dropping the leading zeros), cbind the result with the 'X' column after removing its '-', and build the final string with sprintf
data.frame(Z = do.call(sprintf, c(fmt = '%s-%d-%d',
cbind(sub("-", "", df1$X),
read.table(text = df1$Y, header = FALSE)))))
-output
Z
1 CA2311-2-33
2 CA6755-11-245
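The read.table step above can also be spelled out with strsplit and as.integer, which makes the zero-stripping explicit. This is only a sketch of the same idea; the names parts and y_clean are illustrative and not part of the original answer.

```r
df1 <- data.frame(X = c("CA23-11", "CA67-55"),
                  Y = c("002 0033", "011 0245"),
                  stringsAsFactors = FALSE)

# split each Y value at the space: list("002" "0033"), list("011" "0245")
parts <- strsplit(df1$Y, " ", fixed = TRUE)

# as.integer() drops the leading zeros; paste the pieces back with "-"
y_clean <- vapply(parts,
                  function(p) paste(as.integer(p), collapse = "-"),
                  character(1))

# remove the "-" inside X, then join X and the cleaned Y
df2 <- data.frame(Z = paste(sub("-", "", df1$X, fixed = TRUE), y_clean, sep = "-"),
                  stringsAsFactors = FALSE)
df2
```

Converting to integer assumes every piece of Y is purely numeric; non-numeric pieces would become NA with a warning.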
Or using tidyverse
We separate the 'Y' by splitting at space, and convert the type to numeric
Remove the - in 'X' - str_remove
Then unite the 3 columns together
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
separate(Y, into = c("Y1", "Y2"), convert = TRUE) %>%
mutate(X = str_remove(X, '-')) %>%
unite(Z, X, Y1, Y2, sep= '-')
Z
1 CA2311-2-33
2 CA6755-11-245
data
df1 <- structure(list(X = c("CA23-11", "CA67-55"), Y = c("002 0033",
"011 0245")), class = "data.frame", row.names = c(NA, -2L))
Here is another solution using built-in functions:
df2 <- data.frame(Z = paste0(sub("-", "", df1$X), gsub("^0*| 0*", "-", df1$Y)))
df2
# Z
# 1 CA2311-2-33
# 2 CA6755-11-245
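As for the error in the question: the remove argument of tidyr::unite() is a logical flag (TRUE or FALSE) that controls whether the input columns are dropped from the result, so passing "0" fails. It does not remove characters from the values. A minimal sketch of what unite() does on its own with this data:

```r
library(tidyr)

df1 <- data.frame(X = c("CA23-11", "CA67-55"),
                  Y = c("002 0033", "011 0245"),
                  stringsAsFactors = FALSE)

# remove = TRUE (the default) drops X and Y after pasting them together
united <- unite(df1, Z, X, Y, sep = "-", remove = TRUE)
united$Z
```

The values are joined as they are, which is why the zeros and the extra "-" still have to be cleaned up separately, as the answers in this thread do.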
I have a dataset df1 like so:
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01 , 6.315047e-01 )
df1 <- data.frame(snp, p.value)
I want to remove the _ underscore and the letters after it (representing allele) in df1 and make this into a new dataframe df2
I tried this using the code
df2 <- df1[,c("snp", "allele"):=tstrsplit(`snp`, "_", fixed = TRUE)]
However, this changes the df1 data frame. Is there another way to do this?
This is my best guess as to what you want:
library(tidyr)
separate(df1, snp, into = c("snp", "allele"), sep = "_")
# snp allele p.value
# 1 rs7513574 T 0.2635489
# 2 rs1627238 A 0.9836280
# 3 rs1171278 C 0.6315047
df2 = df1 %>%
dplyr::mutate(across(c(V1, V2, V3), ~stringr::str_remove_all(., "_[:alpha:]")))
> df2
V1 V2 V3
snp rs7513574 rs1627238 rs1171278
p.value 0.2635489 0.983628 0.6315047
Try:
df2 <- df1 %>% mutate(snp=gsub("_.","",snp))
Consider creating a copy of the dataset and running tstrsplit on the copy, so that the original data is left unchanged:
library(data.table)
df2 <- copy(df1)
setDT(df2)[,c("snp", "allele") := tstrsplit(snp, "_", fixed = TRUE)]
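If data.table is not otherwise needed, the same result can be sketched in base R, where ordinary assignment never modifies df1. This assumes each snp value contains a single underscore, as in the example data.

```r
snp <- c("rs7513574_T", "rs1627238_A", "rs1171278_C")
p.value <- c(2.635489e-01, 9.836280e-01, 6.315047e-01)
df1 <- data.frame(snp, p.value, stringsAsFactors = FALSE)

df2 <- df1                                # df1 itself is never modified below
df2$snp    <- sub("_.*$", "", df1$snp)    # keep the part before the underscore
df2$allele <- sub("^.*_", "", df1$snp)    # keep the part after the underscore

df2$snp
df1$snp   # still carries the allele suffix
```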
data=data.frame(cat=runif(100), dog = runif(100), fox = runif(100), bunny = runif(100))
I just wish to rename such that cat = var01, dog = var04, fox = var07, bunny = var09.
We can use rename on a named vector and evaluate with (!!!)
library(dplyr)
nm1 <- c('cat', 'dog', 'fox', 'bunny')
nm2 <- c('var01', 'var04', 'var07', 'var09')
Or create it with seq
nm2 <- sprintf('var%02d', seq(1, length.out = length(nm1), by = 3))
data <- rename(data, !!! setNames(nm1, nm2))
Or with setnames from data.table to change the column names in place by providing a vector of 'old', 'new' names
library(data.table)
setDT(data)
setnames(data, nm1, nm2)
names(data)
#[1] "var01" "var04" "var07" "var09"
If you want to rename only specific columns from the data you could use
library(dplyr)
data %>%
rename(var01 = cat, var04 = dog, var07 = fox, var09 = bunny) %>%
head
# var01 var04 var07 var09
#1 0.3817939 0.82917877 0.29435146 0.07547698
#2 0.7235733 0.89619003 0.11643227 0.07026431
#3 0.2500442 0.01800189 0.02804676 0.29175499
#4 0.1229257 0.87631870 0.86204151 0.83269660
#5 0.2191805 0.90387735 0.75390315 0.59554349
#6 0.5019568 0.87161199 0.05806871 0.31988761
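The same old-to-new mapping can also be applied in base R with match(), without loading any package. A small sketch, using a shorter data frame than the question for brevity; pos is an illustrative name:

```r
set.seed(1)
data <- data.frame(cat = runif(5), dog = runif(5), fox = runif(5), bunny = runif(5))

nm1 <- c("cat", "dog", "fox", "bunny")          # old names
nm2 <- c("var01", "var04", "var07", "var09")    # new names, in the same order

pos <- match(nm1, names(data))   # position of each old name in the data frame
stopifnot(!anyNA(pos))           # fail early if an old name is missing
names(data)[pos] <- nm2
names(data)
```

Because the positions are looked up by name, the order of the columns in the data frame does not have to match the order of nm1.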
Every week I get an incomplete dataset for an analysis. It looks like this:
df1 <- data.frame(var1 = c("a","","","b",""),
var2 = c("x","y","z","x","z"))
Some var1 values are missing. The dataset should end up looking like this:
df2 <- data.frame(var1 = c("a","a","a","b","b"),
var2 = c("x","y","z","x","z"))
Currently I use an Excel macro to do this. But this makes it harder to automate the analysis. From now on I would like to do this in R. But I have no idea how to do this.
Thanks for your help.
QUESTION UPDATE AFTER COMMENT
var2 is not relevant for my question. The only thing I am trying to do is get from df1 to df2.
df1 <- data.frame(var1 = c("a","","","b",""))
df2 <- data.frame(var1 = c("a","a","a","b","b"))
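Here is one way of doing it by making use of run-length encoding (rle) and its inverse, inverse.rle:
fillTheBlanks <- function(x, missing=""){
# encode consecutive runs of identical values
rle <- rle(as.character(x))
# runs that consist of the missing marker
empty <- which(rle$values==missing)
# overwrite each such run with the value of the run before it
rle$values[empty] <- rle$values[empty-1]
inverse.rle(rle)
}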
df1$var1 <- fillTheBlanks(df1$var1)
The results:
df1
var1 var2
1 a x
2 a y
3 a z
4 b x
5 b z
Here is a simpler way:
library(zoo)
df1$var1[df1$var1 == ""] <- NA
df1$var1 <- na.locf(df1$var1)
The tidyr package has the fill() function, which does the trick.
library(tidyr)
df1 <- data.frame(var1 = c("a",NA,NA,"b",NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
Here is another way which is slightly shorter and doesn't coerce to character:
Fill <- function(x,missing="")
{
Log <- x != missing
y <- x[Log]
y[cumsum(Log)]
}
Results:
# For factor:
Fill(df1$var1)
[1] a a a b b
Levels: a b
# For character:
Fill(as.character(df1$var1))
[1] "a" "a" "a" "b" "b"
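The cumsum() trick works because the running count of non-missing values is exactly the index of the last value seen so far. One caveat is a blank in the first position: the count is then 0, and indexing with 0 silently drops that element. The sketch below is a hypothetical variant for character vectors (fill_down is not from the answers above) that returns NA for leading blanks instead.

```r
fill_down <- function(x, missing = "") {
  present <- !is.na(x) & x != missing   # TRUE where a real value is stored
  idx <- cumsum(present)                # index of the last real value so far (0 = none yet)
  c(NA, x[present])[idx + 1]            # shift by one so that index 0 maps to NA
}

fill_down(c("a", "", "", "b", ""))
fill_down(c("", "a", "", "b"))
```

The first call fills the blanks as in the question; the second keeps the vector length and leaves NA where there is nothing to carry forward.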
Below is my unfill function, which does the reverse (it replaces values that repeat the previous row with NA). I encountered the same problem and hope it helps.
unfill <- function(df,cols){
col_names <- names(df)
unchanged <- df[!(names(df) %in% cols)]
changed <- df[names(df) %in% cols] %>%
map_df(function(col){
col[col == col %>% lag()] <- NA
col
})
unchanged %>% bind_cols(changed) %>% select(one_of(col_names))
}
I know if I have a data frame with more than 1 column, then I can use
colnames(x) <- c("col1","col2")
to rename the columns. How to do this if it's just one column?
Meaning a vector or data frame with only one column.
Example:
trSamp <- data.frame(sample(trainer$index, 10000))
head(trSamp )
# sample.trainer.index..10000.
# 1 5907862
# 2 2181266
# 3 7368504
# 4 1949790
# 5 3475174
# 6 6062879
ncol(trSamp)
# [1] 1
class(trSamp)
# [1] "data.frame"
class(trSamp[1])
# [1] "data.frame"
class(trSamp[,1])
# [1] "numeric"
colnames(trSamp)[2] <- "newname2"
# Error in names(x) <- value :
# 'names' attribute [2] must be the same length as the vector [1]
This is a generalized way in which you do not have to remember the exact location of the variable:
# df = dataframe
# old.var.name = The name you don't like anymore
# new.var.name = The name you want to get
names(df)[names(df) == 'old.var.name'] <- 'new.var.name'
This code pretty much does the following:
names(df) looks into all the names in the df
[names(df) == old.var.name] extracts the variable name you want to check
<- 'new.var.name' assigns the new variable name.
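The three steps above can be traced on a toy data frame. This is just an illustration; the data and the name is_target are made up.

```r
df <- data.frame(old.var.name = 1:3, other = c("a", "b", "c"))

# logical mask: TRUE only at the position of the name to replace
is_target <- names(df) == "old.var.name"
is_target

# assign the new name at that position only
names(df)[is_target] <- "new.var.name"
names(df)
```

If the old name is not present, the mask is all FALSE and nothing is renamed, with no error or warning.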
colnames(trSamp)[2] <- "newname2"
attempts to set the second column's name. Your object only has one column, so the command throws an error. This should be sufficient:
colnames(trSamp) <- "newname2"
colnames(df)[colnames(df) == 'oldName'] <- 'newName'
This is an old question, but it is worth noting that you can now use setnames from the data.table package.
library(data.table)
setnames(DF, "oldName", "newName")
# or since the data.frame in question is just one column:
setnames(DF, "newName")
# And for reference's sake, in general (more than one column)
nms <- c("col1.name", "col2.name") # etc...
setnames(DF, nms)
This can also be done using Hadley's plyr package, and the rename function.
library(plyr)
df <- data.frame(foo=rnorm(1000))
df <- rename(df,c('foo'='samples'))
You can rename by the name (without knowing the position) and perform multiple renames at once. After doing a merge, for example, you might end up with:
letterid id.x id.y
1 70 2 1
2 116 6 5
3 116 6 4
4 116 6 3
5 766 14 9
6 766 14 13
Which you can then rename in one step using:
letters <- rename(letters,c("id.x" = "source", "id.y" = "target"))
letterid source target
1 70 2 1
2 116 6 5
3 116 6 4
4 116 6 3
5 766 14 9
6 766 14 13
I think the best way of renaming columns is by using the dplyr package like this:
require(dplyr)
df = rename(df, new_col01 = old_col01, new_col02 = old_col02, ...)
It works the same for renaming one or many columns in any dataset.
I find that the most convenient way to rename a single column is using dplyr::rename_at :
library(dplyr)
cars %>% rename_at("speed",~"new") %>% head
cars %>% rename_at(vars(speed),~"new") %>% head
cars %>% rename_at(1,~"new") %>% head
# new dist
# 1 4 2
# 2 4 10
# 3 7 4
# 4 7 22
# 5 8 16
# 6 9 10
works well in pipe chains
convenient when names are stored in variables
works with a name or a column index
clear and compact
I like the following style for renaming data frame columns one by one.
colnames(df)[which(colnames(df) == 'old_colname')] <- 'new_colname'
where
which(colnames(df) == 'old_colname')
returns the index of the specific column.
Let df be the dataframe you have with col names myDays and temp.
If you want to rename "myDays" to "Date",
library(plyr)
rename(df,c("myDays" = "Date"))
or with pipe, you can
dfNew <- df %>%
plyr::rename(c("myDays" = "Date"))
Try:
colnames(x)[2] <- 'newname2'
This is likely already out there, but I was playing with renaming fields while searching out a solution and tried this on a whim. Worked for my purposes.
Table1$FieldNewName <- Table1$FieldOldName
Table1$FieldOldName <- NULL
Edit begins here....
This works as well.
df <- rename(df, c("oldColName" = "newColName"))
You can use the rename.vars in the gdata package.
library(gdata)
df <- rename.vars(df, from = "oldname", to = "newname")
This is particularly useful when you have more than one variable name to change, or when you want to append or prepend some text to the variable names. Then you can do something like:
df <- rename.vars(df, from = c("old1", "old2", "old3"),
to = c("new1", "new2", "new3"))
For an example of appending text to a subset of variables names see:
https://stackoverflow.com/a/28870000/180892
You could also try 'upData' from 'Hmisc' package.
library(Hmisc)
trSamp = upData(trSamp, rename=c(sample.trainer.index..10000. = 'newname2'))
If you know that your dataframe has only one column, you can use:
names(trSamp) <- "newname2"
The OP's question has been well and truly answered. However, here's a trick that may be useful in some situations: partial matching of the column name, irrespective of its position in a dataframe:
Partial matching on the name:
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)
## name1 Reported.Cases..WHO..2011. name3
## 1 NA NA NA
names(d)[grepl("Reported", names(d))] <- "name2"
## name1 name2 name3
## 1 NA NA NA
Another example: partial matching on the presence of "punctuation":
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)
## name1 Reported.Cases..WHO..2011. name3
## 1 NA NA NA
names(d)[grepl("[[:punct:]]", names(d))] <- "name2"
## name1 name2 name3
## 1 NA NA NA
These were examples I had to deal with today, I thought might be worth sharing.
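One thing to watch with pattern matching is that every matching column receives the same new name. A cautious sketch, assuming exactly one match is expected, checks the number of hits before renaming (hits is an illustrative name):

```r
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)

hits <- grepl("Reported", names(d), fixed = TRUE)

# rename only when the pattern identifies exactly one column
if (sum(hits) == 1) {
  names(d)[hits] <- "name2"
} else {
  stop("pattern matched ", sum(hits), " columns, expected exactly 1")
}
names(d)
```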
I would simply overwrite the column name at its position in the dataset with the new name I want, using the following code:
names(dataset)[index_value] <- "new_col_name"
I found the colnames() function easier:
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/row%2Bcolnames
Select some columns from the data frame:
df <- data.frame(df[, c( "hhid","b1005", "b1012_imp", "b3004a")])
and rename the selected columns in order:
colnames(df) <- c("hhid", "income", "cost", "credit")
Check the names and the values to be sure:
names(df);head(df)
I would simply add a new column to the data frame with the name I want and get the data for it from the existing column. like this:
dataf$value=dataf$Article1Order
then I remove the old column! like this:
dataf$Article1Order<-NULL
This code might seem silly! But it works perfectly...
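One side effect of the copy-and-delete approach is that the renamed column moves to the end of the data frame, whereas assigning to names() keeps it in place. A small sketch with made-up data shows the difference:

```r
dataf <- data.frame(Article1Order = 1:3, other = c("a", "b", "c"))

# copy-and-delete: the new column is appended, then the old one is removed
moved <- dataf
moved$value <- moved$Article1Order
moved$Article1Order <- NULL
names(moved)

# renaming in place keeps the original column position
kept <- dataf
names(kept)[names(kept) == "Article1Order"] <- "value"
names(kept)
```

Both versions hold the same values; only the column order differs, which matters if later code refers to columns by position.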
We can use rename_with to rename columns with a function (stringr functions, for example).
Consider the following data df_1:
df_1 <- data.frame(
x = replicate(n = 3, expr = rnorm(n = 3, mean = 10, sd = 1)),
y = sample(x = 1:2, size = 3, replace = TRUE)
)
names(df_1)
#[1] "x.1" "x.2" "x.3" "y"
Rename all variables with dplyr::everything():
library(tidyverse)
df_1 %>%
rename_with(.data = ., .cols = everything(.),
.fn = str_replace, pattern = '.*',
replacement = str_c('var', seq_along(.), sep = '_')) %>%
names()
#[1] "var_1" "var_2" "var_3" "var_4"
Rename by part of the name with the selection helpers (starts_with, ends_with, contains, matches, ...).
Example with . (x variables):
df_1 %>%
rename_with(.data = ., .cols = contains('.'),
.fn = str_replace, pattern = '.*',
replacement = str_c('var', seq_along(.), sep = '_')) %>%
names()
#[1] "var_1" "var_2" "var_3" "y"
Rename by class with many functions of class test, like is.integer, is.numeric, is.factor...
Example with is.integer (y):
df_1 %>%
rename_with(.data = ., .cols = is.integer,
.fn = str_replace, pattern = '.*',
replacement = str_c('var', seq_along(.), sep = '_')) %>%
names()
#[1] "x.1" "x.2" "x.3" "var_1"
The warning:
Warning messages:
1: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
longer object length is not a multiple of shorter object length
2: In names[cols] <- .fn(names[cols], ...) :
number of items to replace is not a multiple of replacement length
It can be ignored here: it arises because seq_along(.) runs over all columns of the data frame, so the replacement vector is longer than the set of selected names.
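The warning can be avoided by building the new names from the selected names only. In the sketch below, rename_with() receives a formula in which .x stands for the names of the selected columns, so the replacement always has the right length. The data mirrors df_1 from this answer; the seed is arbitrary.

```r
library(dplyr)

set.seed(42)
df_1 <- data.frame(
  x = replicate(n = 3, expr = rnorm(n = 3, mean = 10, sd = 1)),
  y = sample(x = 1:2, size = 3, replace = TRUE)
)

# .x holds only the selected names ("x.1", "x.2", "x.3"),
# so seq_along(.x) is 1:3 and nothing has to be recycled
renamed <- rename_with(df_1, ~ paste0("var_", seq_along(.x)), .cols = contains("."))
names(renamed)
```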
library(dplyr)
rename(data, de=de.y)