I'm trying to extract the name of column i inside a loop:
for (i in df){
print(name(i))
}
The equivalent solution in Python:
for i in df:
    print(i)
PS: R gives me the column values if I use the same code as in Python (but Python gives just the name).
EDIT: It has to be in a loop, as I will do more elaborate things with this.
for (i in names(df)){
print(i)
}
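Inside such a loop, i holds the column name as a character string, so the column's values can still be reached with df[[i]]. A minimal sketch, assuming a small illustrative data frame and a hypothetical results list for the "more elaborate things":

```r
# Illustrative data frame (not from the question)
df <- data.frame(a = 1:3, b = c(10, 20, 30))

results <- list()
for (i in names(df)) {
  values <- df[[i]]              # [[ with a name returns the column as a vector
  results[[i]] <- mean(values)   # store one result per column, keyed by its name
  print(paste("Column", i, "has mean", mean(values)))
}
```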
Just do
names(df)
to print all the column names in df. There's no need for a loop, unless you want to do something more elaborate with each column.
If you want the i'th column name:
names(df)[i]
Instead of looping, you can use the imap() function from the purrr package. Inside the formula, .x is the column (its values) and .y is its name.
df <- data.frame(a = 1:10, b = 21:30, c = 31:40)
library(purrr)
imap(df, ~paste0("The name is ", .y, " and the sum is ", sum(.x)))
# $a
# [1] "The name is a and the sum is 55"
#
# $b
# [1] "The name is b and the sum is 255"
#
# $c
# [1] "The name is c and the sum is 355"
This is just a more convenient way of writing the following Base R code, which gives the same output:
Map(function(x, y) paste0("The name is ", y, " and the sum is ", sum(x)),
    df, names(df))
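If a plain named character vector is more convenient than a list, vapply() over the column names is another base R option. A sketch using the same df; because the input is a character vector, vapply() names the result after the columns:

```r
df <- data.frame(a = 1:10, b = 21:30, c = 31:40)

# vapply() checks that every call returns exactly one string;
# iterating over names(df) gives a result named after the columns
msgs <- vapply(names(df),
               function(nm) paste0("The name is ", nm, " and the sum is ", sum(df[[nm]])),
               character(1))
msgs
```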
You can try the following code:
# Simulating your data
a <- c(1,2,3)
b <- c(4,5,6)
df <- data.frame(a, b)
# Answer 1
for (i in 1:ncol(df)){
print(names(df)[i]) # accessing the name of column
print(df[,i]) # accessing column content
print('----')
}
Or this alternative:
# Answer 2
columns <- names(df)
for(i in columns) {
print(i) # accessing the name of column
print(df[, i]) # accessing column content
print('----')
}
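One caveat with 1:ncol(df) in Answer 1: for a data frame with no columns, 1:0 is c(1, 0), so the loop body still runs (and fails). seq_along(df) yields an empty sequence in that case. A small sketch of the difference, using an empty data frame as an assumed edge case:

```r
empty <- data.frame()   # a data frame with zero columns

1:ncol(empty)           # c(1, 0): a loop over this would run twice
seq_along(empty)        # integer(0): the loop body never runs

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
for (i in seq_along(df)) {
  print(names(df)[i])   # name of column i
  print(df[[i]])        # contents of column i
}
```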
Hope it helps!
Related
I have a data frame df. It has a column named b. I know this column name, although I do not know its position in the data frame. I know that colnames(df) will give me a vector of character strings that are the names of all the columns, but I do not know how to get a string for this particular column. In other words, I want to obtain the string "b". How can I do that? I imagine this may involve the rlang package, which I have difficulty understanding.
Here's an example:
library(rlang)
library(tidyverse)
a <- c(1:8)
b <- c(23,34,45,43,32,45,68,78)
c <- c(0.34,0.56,0.97,0.33,-0.23,-0.36,-0.11,0.17)
df <- data.frame(a,b,c)
tf <- function(df,MYcol) {
print(paste0("The name of the input column is ",MYcol)) # does not work
print(paste0("The name of the input column is ",{{MYcol}})) # does not work
y <- {{MYcol}} # This gives the values in column b, as it should
}
z <- tf(df,b) # Gives undesired values - I want the string "b"
If you cannot pass the column name as a string (tf(df,"b")) directly, you can use deparse() + substitute().
tf <- function(df,MYcol) {
col <- deparse(substitute(MYcol))
print(paste0("The name of the input column is ",col))
return(col)
}
z <- tf(df,b)
#[1] "The name of the input column is b"
z
#[1] "b"
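Note that this version does not handle a quoted argument: tf(df, "b") would return "\"b\"", with the quotes as part of the string. If the function should accept both forms, one possible sketch checks whether the captured expression is already a string (tf2 is a hypothetical name, and the column check is an added assumption):

```r
tf2 <- function(df, MYcol) {
  expr <- substitute(MYcol)          # unevaluated argument: a symbol or a string
  if (is.character(expr)) {
    col <- expr                      # called as tf2(df, "b")
  } else {
    col <- deparse(expr)             # called as tf2(df, b)
  }
  stopifnot(col %in% names(df))      # fail early on unknown columns
  col
}

df <- data.frame(a = 1:8, b = c(23, 34, 45, 43, 32, 45, 68, 78))
tf2(df, b)     # "b"
tf2(df, "b")   # "b"
```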
We can use as_string() with ensym():
tf <- function(df, MYcol) {
mycol <- rlang::as_string(rlang::ensym(MYcol))
print(glue::glue("The name of the input column is {mycol}"))
return(mycol)
}
z <- tf(df,b)
# The name of the input column is b
z
#[1] "b"
I'm working on a program and now I'm looking for a way to check the column names when uploading a file. If the names are not unique, an error should be raised. Is there any way to do this?
For example, if I have this data frame:
> a <- c(10, 20, 30)
> b <- c(1, 2, 3)
> c <- c("Peter", "Ann", "Mike")
> test <- data.frame(a, b, c)
with:
library(dplyr)
test <- rename(test, Number = a)
test <- rename(test, Number = b)
> test
Number Number c
1 10 1 Peter
2 20 2 Ann
3 30 3 Mike
If this were a file, how could I check whether the column names are unique? Ideally the result would be just TRUE or FALSE.
Thanks!
We can use:
any(duplicated(names(df))) #tested with df as iris
[1] FALSE
On OP's data:
any(duplicated(names(test)))
[1] TRUE
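The above can be simplified to the following, as suggested by @sindri_baldur and @akrun (note that anyDuplicated() returns the index of the first duplicated name, or 0 if there is none, rather than TRUE/FALSE):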
anyDuplicated(names(test))
If you wish to know how many are duplicated:
length(which(duplicated(names(test))==TRUE))
[1] 1
This can also be simplified to (as suggested by @sindri_baldur):
sum(duplicated(names(test)))
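Since the question is about files, one caveat: read.csv() defaults to check.names = TRUE, which passes the header through make.names(unique = TRUE), so a header such as Number,Number arrives as Number and Number.1 and the duplicate check above would pass. Reading with check.names = FALSE keeps the original names. A sketch that raises an error, using a temporary file in place of the uploaded one (check_unique_names is a hypothetical helper):

```r
check_unique_names <- function(df) {
  dups <- unique(names(df)[duplicated(names(df))])
  if (length(dups) > 0) {
    stop("Duplicated column names: ", paste(dups, collapse = ", "))
  }
  invisible(TRUE)
}

# Write a small CSV with a duplicated header to stand in for the uploaded file
path <- tempfile(fileext = ".csv")
writeLines(c("Number,Number,c", "10,1,Peter", "20,2,Ann", "30,3,Mike"), path)

names(read.csv(path))                          # "Number" "Number.1" "c"
uploaded <- read.csv(path, check.names = FALSE)
names(uploaded)                                # "Number" "Number" "c"

# Convert the error into FALSE, matching the TRUE/FALSE result asked for
ok <- tryCatch(check_unique_names(uploaded), error = function(e) FALSE)
ok                                             # FALSE
```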
## Build data.frame with duplicate column
test.frame <- data.frame(a = c(1:5), b = c(6:10))
a <- c(5:1)
test.frame <- cbind(test.frame, a)
test.unique <- function(df) { ## function to test unique columns
length1 <- length(colnames(df))
length2 <- length(unique(colnames(df)))
if (length1 - length2 > 0 ) {
print(paste("There are", length1 - length2, "duplicates"))
}
}
This results in ...
test.unique(test.frame)
[1] "There are 1 duplicates"
Look at the functions unique() and colnames(). For example:
are.unique.colnames <- function(array){
return(length(unique(colnames(array))) == dim(array)[2])
}
This function compares the number of distinct column names with the number of columns (an easy and useful piece of metadata for any array-like structure).
I have multiple .csv files that are in the same format, same column names etc.
I want to do some operations on the columns and then return the result after each iteration of the for loop. Here is some reproducible code:
df1 <- data.frame(x= (0:9), y= (10:19))
df2 <- data.frame(x= (20:29), y=(30:39))
listy <- list(df1, df2)
avg <- 0
filenames<- c("df1", "df2")
filenumbers<-seq(listy)
b <- 0
for(filenumber in filenumbers){ b <- b+1
allDM <- as.data.frame(filenames[filenumber],
header=TRUE)
allDM <- data.frame(
pred= filenames[filenumber]$x,
actual= filenames[filenumber]$y
)
allDM$pa <- allDM$pred-allDM$actual
avg <- mean(allDM$pa)
return(avg)
}
It is not happy using the $ operator here.
Error is: Error in filenames[filenumber]$x :
$ operator is invalid for atomic vectors
Cheers,
filenames[filenumber]
is simply an (atomic) character object, i.e.
[1] "df1"
or
[1] "df2"
thus it wouldn't make sense to use $ on it.
You can fix this by using get():
for (filenumber in filenumbers) {
  b <- b + 1
  tmp <- get(filenames[filenumber])  # fetch the data frame whose name is stored in filenames
  allDM <- data.frame(
    pred = tmp$x,
    actual = tmp$y
  )
  allDM$pa <- allDM$pred - allDM$actual
  avg[filenumber] <- mean(allDM$pa)  # keep one average per file instead of overwriting
}
Note that I also took out return(avg): this is not a function, so you can't use return() here, and you don't need it anyway, because avg is still created.
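Because the data frames are already stored in listy, an alternative that avoids get() is to loop over the list itself and keep one result per file. A sketch using the question's example data; setting names(listy) is an added step that only labels the results:

```r
df1 <- data.frame(x = 0:9,   y = 10:19)
df2 <- data.frame(x = 20:29, y = 30:39)
listy <- list(df1, df2)
names(listy) <- c("df1", "df2")   # label each result with its source

# One mean of (pred - actual) per data frame, returned as a named numeric vector
avg <- sapply(listy, function(d) mean(d$x - d$y))
avg   # both means are -10 for this example data
```

With real files, the list could be built with something like lapply(list.files(pattern = "\\.csv$"), read.csv) instead of list(df1, df2).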
I have a data frame (df), a vector of column names (foo), and a function (spaces) that calculates a value for all rows in a specified column of a df. I am trying to accomplish the following:
Provide foo as input to spaces
Spaces operates on each element of foo matching a column name in df
For each column spaces operates on, store the output of spaces in a new column of df with a column name produced by concatenating the name of the original column and ".counts".
I keep receiving this error:
> Error: unexpected '=' in:
>" new[i] <- paste0(foo[i],".count") # New variable name
> data <- transform(data, new[i] ="
> }
> Error: unexpected '}' in " }"
Below is my code. Note: spaces does what I want when given a single variable of the form df$x, but using transform() should allow me to forgo the df$ prefix for each variable.
# Create data for example
a <- c <- seq(1:5)
b <- c("1","1 2", "1 2 3","1 2 3 4","1 2 3 4 5")
d <- 10
df <- data.frame(a,b,c,d) # data frame df
foo <- c("a","b") # these are the names of the columns I want to provide to spaces
# Define function: spaces
spaces <- function(s) { sapply(gregexpr(" ", s), function(p) { sum(p>=0) } ) }
# Initialize vector with new variable names
new <- vector(length = length(foo))
# Create loop with following steps:
# (1) New variable name
# (2) Each element (e.g. "x") of foo is fed to spaces
# a new variable (e.g. "x.count") is created in df,
# this new df overwrites the old df
for (i in 1:length(foo)) {
new[i] <- paste0(foo[i],".count") # New variable name
df <- transform(df, new[i] = spaces(foo[i])) # Function and new df
}
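transform(df, new[i] = spaces(foo[i])) is not valid syntax: an argument name must be written literally, so it cannot be computed with an index. A temporary variable does not help either, because transform(df, tmp = ...) creates a column literally called tmp. Instead, assign the new column with [[, which accepts a character string, and pass the column values (not the column name) to spaces:
for (i in 1:length(foo)) {
  new[i] <- paste0(foo[i], ".count")     # New variable name, e.g. "a.count"
  df[[new[i]]] <- spaces(df[[foo[i]]])   # apply spaces to the column named foo[i]
}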
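The loop can also be dropped entirely: assigning a list of columns to several new names at once with [<- works on data frames too. A sketch with the question's data, rebuilt without the a <- c <- ... shortcut so that c() is not masked by a variable:

```r
a <- 1:5
b <- c("1", "1 2", "1 2 3", "1 2 3 4", "1 2 3 4 5")
df <- data.frame(a, b, c = 1:5, d = 10)
foo <- c("a", "b")

# Count the spaces in each element of a vector (the question's function)
spaces <- function(s) sapply(gregexpr(" ", s), function(p) sum(p >= 0))

# Apply spaces() to every column named in foo and store the results
# in new columns "a.count" and "b.count"
df[paste0(foo, ".count")] <- lapply(df[foo], spaces)
df
```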
I have a single column data frame - example data:
1 >PROKKA_00002 Alpha-ketoglutarate permease
2 MTESSITERGAPELADTRRRIWAIVGASSGNLVEWFDFYVYSFCSLYFAHIFFPSGNTTT
3 QLLQTAGVFAAGFLMRPIGGWLFGRIADRRGRKTSMLISVCMMCFGSLVIACLPGYAVIG
4 >PROKKA_00003 lipoprotein
5 MRTIIVIASLLLTGCSHMANDAWSGQDKAQHFLASAMLSAAGNEYAQHQGYSRDRSAAIG
Each sequence of letters is associated with the ">" line above it. I need a two-column data frame with the lines starting with ">" in the first column and the corresponding lines of letters, concatenated into one sequence, in the second column. This is what I've tried so far:
y <- matrix(0,5836,2) #empty matrix with 5836 rows and two columns
z <- 0
for(i in 1:nrow(df)){
if((grepl(pattern = "^>", x = df)) == TRUE){ #tried to set the conditional "if a line starts with ">", execute code"
z <- z + 1
y[z,1] <- paste(df[i])
} else{
y[z,2] <- paste(df[i], collapse = "")
}
}
I would eventually convert the matrix y back to a data.frame using as.data.frame, but my loop keeps throwing Error: unexpected '}' in "}". I'm also not sure if my conditional is right. Can anyone help? It would be greatly appreciated!
Although I would normally use packages for this, here is a base R solution.
Initialize data:
mydf <- data.frame(x=c(">PROKKA_00002 Alpha-ketoglutarate","MTESSITERGAPEL", "MTESSITERGAPEL",">PROKKA_00003 lipoprotein", "MTESSITERGAPEL" ,"MRTIIVIASLLLT"), stringsAsFactors = FALSE)
Process:
ind <- grep(">", mydf$x)
temp<-data.frame(ind=ind, from=ind+1, to=c((ind-1)[-1], nrow(mydf)))
seqs<-rep(NA, length(ind))
for(i in 1:length(ind)) {
seqs[i]<-paste(mydf$x[temp$from[i]:temp$to[i]], collapse="")
}
fastatable<-data.frame(name=gsub(">", "", mydf[ind,1]), sequence=seqs)
> fastatable
name sequence
1 PROKKA_00002 Alpha-ketoglutarate MTESSITERGAPELMTESSITERGAPEL
2 PROKKA_00003 lipoprotein MTESSITERGAPELMRTIIVIASLLLT
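If you prefer to keep the loop from the question, the main problem is the condition: grepl(pattern = "^>", x = df) tests the whole data frame instead of the current line. A sketch of a corrected version, assuming the single column is called V1, is character, and that the first line is a header (the question does not show the column name):

```r
df <- data.frame(V1 = c(">PROKKA_00002 Alpha-ketoglutarate permease",
                        "MTESSITERGAPELADTRRRIWAIVGASSGNLVEWFDFYVYSFCSLYFAHIFFPSGNTTT",
                        "QLLQTAGVFAAGFLMRPIGGWLFGRIADRRGRKTSMLISVCMMCFGSLVIACLPGYAVIG",
                        ">PROKKA_00003 lipoprotein",
                        "MRTIIVIASLLLTGCSHMANDAWSGQDKAQHFLASAMLSAAGNEYAQHQGYSRDRSAAIG"),
                 stringsAsFactors = FALSE)

is_header <- grepl("^>", df$V1)
# One row per header instead of a fixed 5836 rows; start with empty strings
y <- matrix("", nrow = sum(is_header), ncol = 2)
z <- 0
for (i in seq_len(nrow(df))) {
  line <- df$V1[i]
  if (is_header[i]) {                  # test only the current line
    z <- z + 1
    y[z, 1] <- line
  } else {
    y[z, 2] <- paste0(y[z, 2], line)   # append to the current sequence
  }
}
fasta <- as.data.frame(y, stringsAsFactors = FALSE)
names(fasta) <- c("name", "sequence")
```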
Try creating a logical index of the rows that contain the target symbol (the header rows), and then split the data on that index. In mydf$x[ind1][cumsum(ind1)], cumsum() coerces the logical vector to numeric and gives every row the id of its header; the final [!ind1, ] then drops the header rows themselves.
ind1 <- grepl(">", mydf$x)
#split data on the index created
newdf <- data.frame(mydf$x[ind1][cumsum(ind1)], mydf$x)[!ind1,]
#Add names
names(newdf) <- c("Name", "Value")
newdf
# Name Value
# 2 >PROKKA_00002 Alpha-ketoglutarate
# 3 >PROKKA_00002 MTESSITERGAPEL
# 5 >PROKKA_00003 lipoprotein
# 6 >PROKKA_00003 MRTIIVIASLLLT
Data
mydf <- data.frame(x=c(">PROKKA_00002","Alpha-ketoglutarate","MTESSITERGAPEL", ">PROKKA_00003", "lipoprotein" ,"MRTIIVIASLLLT"))
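The same cumsum() index can also collapse each record into a single sequence, which is the two-column shape the question asks for. A base R sketch, assuming each record is one header line (starting with ">") followed by its sequence lines, as in the question's data:

```r
x <- c(">PROKKA_00002 Alpha-ketoglutarate permease",
       "MTESSITERGAPELADTRRRIWAIVGASSGNLVEWFDFYVYSFCSLYFAHIFFPSGNTTT",
       "QLLQTAGVFAAGFLMRPIGGWLFGRIADRRGRKTSMLISVCMMCFGSLVIACLPGYAVIG",
       ">PROKKA_00003 lipoprotein",
       "MRTIIVIASLLLTGCSHMANDAWSGQDKAQHFLASAMLSAAGNEYAQHQGYSRDRSAAIG")

is_header <- grepl("^>", x)
record <- cumsum(is_header)     # record id for every line

# Group the sequence lines by record; the explicit levels keep a record
# that has no sequence lines (it gets an empty string)
pieces <- split(x[!is_header],
                factor(record[!is_header], levels = seq_len(sum(is_header))))
sequences <- vapply(pieces, paste, character(1), collapse = "")

fastatable <- data.frame(name = sub("^>", "", x[is_header]),
                         sequence = unname(sequences),
                         stringsAsFactors = FALSE)
```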
You can use plyr to accomplish this if you are able to assign a section number to your rows appropriately:
library(plyr)
df <- data.frame(v1=c(">PROKKA_00002 Alpha-ketoglutarate permease",
"MTESSITERGAPELADTRRRIWAIVGASSGNLVEWFDFYVYSFCSLYFAHIFFPSGNTTT",
"QLLQTAGVFAAGFLMRPIGGWLFGRIADRRGRKTSMLISVCMMCFGSLVIACLPGYAVIG",
">PROKKA_00003 lipoprotein",
"MRTIIVIASLLLTGCSHMANDAWSGQDKAQHFLASAMLSAAGNEYAQHQGYSRDRSAAIG"))
df$hasMark <- ifelse(grepl(">",df$v1,fixed=TRUE),1, 0)
df$section <- cumsum(df$hasMark)
t <- ddply(df, "section", function(x){
data.frame(v2=head(x,1),v3=paste(x$v1[2:nrow(x)], collapse=''))
})
t <- subset(t, select=-c(section,v2.hasMark,v2.section)) #drop the extra columns
If you then view t, I believe this is what you were looking for in your original post.