This is a very basic question. How can you set the column names of a data frame to the column index? So if you have 4 columns, the column names will be 1 2 3 4. The data frame I am using can have up to 100 columns.
It is not a good idea to give columns names that start with a number. Suppose we set the names to seq_along(D). Extracting a column then becomes unnecessarily complicated. For example,
names(D) <- seq_along(D)
D$1
#Error: unexpected numeric constant in "D$1"
In that case, we may need backticks or ""
D$"1"
#[1] 1 2 3
D$`1`
#[1] 1 2 3
However, [[ should work
D[["1"]]
#[1] 1 2 3
I would use
names(D) <- paste0("Col", seq_along(D))
D$Col1
#[1] 1 2 3
Or
D[["Col1"]]
#[1] 1 2 3
data
D <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
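If syntactically valid names are wanted without choosing a prefix by hand, base R's make.names can build them. A minimal sketch, using the same D as above; make.names prepends "X" to names that start with a digit, so the result is usable with $ without quoting:

```r
D <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = c(10, 11, 12))

# Turn the column positions into syntactically valid names
names(D) <- make.names(seq_along(D))

names(D)  # "X1" "X2" "X3" "X4"
D$X1      # no backticks or quotes needed
```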
Just use names:
D <- data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
names(D) <- 1:ncol(D) # sequence from 1 through the number of columns
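One caveat with 1:ncol(D), shown here as a small sketch: for a data frame with no columns it counts down from 1 to 0 instead of producing an empty sequence, whereas seq_along(D) stays empty. The empty data frame D0 is a hypothetical edge case, not part of the question.

```r
D <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = c(10, 11, 12))

# For a normal data frame both forms give the same indices
names(D) <- seq_along(D)
names(D)  # stored as character: "1" "2" "3" "4"

# Hypothetical edge case: a data frame with zero columns
D0 <- data.frame()
bad_idx  <- 1:ncol(D0)    # counts down: 1 0
good_idx <- seq_along(D0) # empty integer vector
```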
I have 2 datasets with columns having the same names.
a:
A B C
1 2 3
5 6 7
b:
B E A
2 3 4
9 1 2
How can I find the column indices with the matched names?
I tried converting each of them from wide to long format with gather() and then matching the two datasets with match(a,b). It didn't work.
#Find common column names in the two dataframes
intersect(names(a), names(b))
#[1] "A" "B"
#Find the column number in a which is present in b
which(names(a) %in% names(b))
#[1] 1 2
#find the column number in b which is present in a
which(names(b) %in% names(a))
#[1] 1 3
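If the positions need to line up pairwise (the i-th index in a and the i-th index in b referring to the same name), match() on the common names is one option. A minimal sketch, rebuilding a and b from the tables in the question:

```r
a <- data.frame(A = c(1, 5), B = c(2, 6), C = c(3, 7))
b <- data.frame(B = c(2, 9), E = c(3, 1), A = c(4, 2))

# Names present in both data frames, in the order they appear in a
common <- intersect(names(a), names(b))

# Position of each common name in a and in b, aligned element by element
idx_a <- match(common, names(a))
idx_b <- match(common, names(b))
```

Here idx_a[i] and idx_b[i] always point at the same column name, which the two which() calls do not guarantee when the column order differs.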
I personally like to use grep for this, anchoring the pattern so that each name has to match in full:
grep(pattern = paste0("^(", paste(names(a), collapse = "|"), ")$"), x = names(b))
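A small sketch of why anchoring matters when a pattern is built from column names. The names below are hypothetical, chosen so that one is a substring of another; without ^ and $ the name "A" also matches a column called "AB".

```r
# Hypothetical names, chosen so that one is a substring of another
names_a <- c("A", "B")
names_b <- c("AB", "E", "A")

# Unanchored: any name containing "A" or "B" matches
loose <- grep(paste(names_a, collapse = "|"), names_b)

# Anchored: the whole name must equal "A" or "B"
strict <- grep(paste0("^(", paste(names_a, collapse = "|"), ")$"), names_b)

# Exact matching without regular expressions gives the same positions
exact <- which(names_b %in% names_a)
```

Names that contain regex metacharacters such as "." would still need escaping, so %in% is the simpler choice when only exact matches are wanted.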
I have a data that looks as follows:
Patent_number<-c(2323,4449,4939,4939,12245)
IPC_class_1<-c("C12N",4,"C29N00185",2,"C12F")
IPC_class_2<-c(3,"K12N","C12F","A01N",8)
IPC_class_3<-c("S12F",1,"CQ010029393049",5,"CQ1N")
df<-data.frame(Patent_number, IPC_class_1, IPC_class_2, IPC_class_3)
View(df)
I want to count only the number of string values such as C12N, A01N etc. per row, by adding another column "counts" at the end of the data frame. In other words, I want to exclude the numeric values from the row count.
Any suggestions?
You can't have mixed types in a dataframe column, so all of the numeric values will also be stored as type character. One approach would be to convert everything using as.numeric, and then use is.na to count those that are not coercible to numeric...
df$counts <- apply(sapply(df, as.numeric), 1, function(x) sum(is.na(x)))
df
Patent_number IPC_class_1 IPC_class_2 IPC_class_3 counts
1 2323 C12N 3 S12F 2
2 4449 4 K12N 1 1
3 4939 C29N00185 C12F CQ010029393049 3
4 4939 2 A01N 5 1
5 12245 C12F 8 CQ1N 2
We may also count by checking if all the characters are digits
df$counts <- ncol(df) - Reduce(`+`, lapply(df, grepl, pattern = '^[0-9.]+$'))
df$counts
[1] 2 1 3 1 2
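A further variant, sketched under the assumption that a "string" value is any cell containing at least one letter: test each IPC column with grepl and sum the hits per row. This avoids the coercion warnings that as.numeric raises on non-numeric text.

```r
Patent_number <- c(2323, 4449, 4939, 4939, 12245)
IPC_class_1 <- c("C12N", 4, "C29N00185", 2, "C12F")
IPC_class_2 <- c(3, "K12N", "C12F", "A01N", 8)
IPC_class_3 <- c("S12F", 1, "CQ010029393049", 5, "CQ1N")
df <- data.frame(Patent_number, IPC_class_1, IPC_class_2, IPC_class_3)

# Logical matrix: TRUE where a cell contains at least one letter
# (assumption: that is what counts as a string value)
has_letter <- sapply(df[-1], grepl, pattern = "[A-Za-z]")

# Number of string values per row
df$counts <- rowSums(has_letter)
```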
Suppose I have the data frame below.
df.open<-c(1,4,5)
df.close<-c(2,8,3)
df<-data.frame(df.open, df.close)
> df
df.open df.close
1 1 2
2 4 8
3 5 3
I want to change column names that include "open" to "a" and column names that include "close" to "b":
Namely I want to obtain the below data frame:
a b
1 1 2
2 4 8
3 5 3
I have a lot of such data frames. The prefixes (here "df.") change, but "open" and "close" are fixed.
Thanks a lot.
We can create a function for reuse
f1 <- function(dat) {
names(dat)[grep('open$', names(dat))] <- 'a'
names(dat)[grep('close$', names(dat))] <- 'b'
dat
}
and apply on the data
df <- f1(df)
-output
df
a b
1 1 2
2 4 8
3 5 3
If these datasets are in a list
lst1 <- list(df, df)
lst1 <- lapply(lst1, f1)
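If more suffixes need handling later, the same idea can be driven by a lookup vector. This is only a sketch: the function name rename_by_suffix and its lookup argument are made up for illustration, and it assumes each suffix appears at the end of the column name.

```r
# Hypothetical helper: rename columns whose names end in a known suffix
rename_by_suffix <- function(dat, lookup = c(open = "a", close = "b")) {
  for (suffix in names(lookup)) {
    # Columns whose name ends with this suffix
    hit <- grepl(paste0(suffix, "$"), names(dat))
    names(dat)[hit] <- lookup[[suffix]]
  }
  dat
}

# Prefixes may differ between columns and between data frames
df1 <- data.frame(df.open = c(1, 4, 5), df.close = c(2, 8, 3))
df2 <- data.frame(xyz.close = c(2, 8, 3), xyz.open = c(1, 4, 5))

res <- lapply(list(df1, df2), rename_by_suffix)
```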
Thanks to @akrun's insightful suggestion, we can do it in one go. We pass character vectors as the pattern and replacement arguments of str_replace so that both operations are carried out at once. Each argument can be a character vector of length one or longer; if both are longer than one, their lengths should match. More to the point, as the documentation says:
References of the form \1, \2, etc will be replaced with the contents
of the respective matched group (created by ())
library(dplyr)
library(stringr)
df %>%
rename_with(~ str_replace(., c(".*\\.open", ".*\\.close"), c("a", "b")))
a b
1 1 2
2 4 8
3 5 3
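Note that str_replace pairs strings, patterns and replacements element by element, so the call above relies on the "open" column coming before the "close" column. A sketch of an order-independent variant, assuming stringr is installed, uses str_replace_all with a named vector (names are patterns, values are replacements); the column names in old are hypothetical.

```r
library(stringr)

# Hypothetical column names with "close" before "open"
old <- c("x.close", "x.open")

# Element-wise pairing: "x.close" is tested against the open pattern and
# "x.open" against the close pattern, so nothing is replaced
paired <- str_replace(old, c(".*\\.open", ".*\\.close"), c("a", "b"))

# Named vector: every pattern is tried on every name
new <- str_replace_all(old, c(".*\\.open$" = "a", ".*\\.close$" = "b"))
```

The same named vector could be passed inside rename_with, for example rename_with(df, ~ str_replace_all(.x, c(".*\\.open$" = "a", ".*\\.close$" = "b"))).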
Another base R option using gsub + match + setNames
setNames(
df,
c("a", "b")[match(
gsub(".*(open|close)$", "\\1", names(df)),
c("open", "close")
)]
)
gives
a b
1 1 2
2 4 8
3 5 3
I have the following R dataframe: df = data.frame(value=c(5,4,3,2,1), a=c(2,0,1,6,9), b=c(7,0,0,3,4)). I would like to duplicate the values of a and b by the number of times of the corresponding position values in value. For example, Expanding b would look like b_ex = c(7,7,7,7,7,2,2,2,4). No values of three or four would be in b_ex because values of zero are in b[2] and b[3]. The expanded vectors would be assigned names and be stand-alone.
Thanks!
Maybe you are looking for :
result <- lapply(df[-1], function(x) rep(x[x != 0], df$value[x != 0]))
#$a
#[1] 2 2 2 2 2 1 1 1 6 6 9
#$b
#[1] 7 7 7 7 7 3 3 4
To have them as separate vectors in global environment use list2env :
list2env(result, .GlobalEnv)
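An equivalent sketch that leans on rep() accepting a per-element times vector: setting the count to 0 wherever the value is 0 drops those positions without subsetting. The helper name expand_col is made up for illustration.

```r
df <- data.frame(value = c(5, 4, 3, 2, 1),
                 a = c(2, 0, 1, 6, 9),
                 b = c(7, 0, 0, 3, 4))

# Hypothetical helper: repeat each element of x `times` times,
# using a count of 0 wherever x itself is 0
expand_col <- function(x, times) {
  rep(x, times = ifelse(x == 0, 0, times))
}

a_ex <- expand_col(df$a, df$value)
b_ex <- expand_col(df$b, df$value)
```

Assigning the results to named objects directly, as here, avoids writing into the global environment from inside a helper.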
I've got a problem that I can't resolve.
Example dataset:
company <- c("compA","compB","compC")
compA <- c(1,2,3)
compB <- c(2,3,1)
compC <- c(3,1,2)
df <- data.frame(company,compA,compB,compC)
I want to create a new column with the value from the column whose name is in the column "company" of the same row. The resulting extraction would be:
df$new <- c(1,3,2)
df
The way you have it set up, there's one row and one column for every company, and the rows and columns are in the same order. If that's your real dataset, then as others have said diag(...) is the solution (and you should select that answer).
If your real dataset has more than one instance of company (e.g., more than one row per company), then this is more general:
# using your df
sapply(1:nrow(df),function(i)df[i,as.character(df$company[i])])
# [1] 1 3 2
# more complex case
set.seed(1) # for reproducible example
newdf <- data.frame(company=LETTERS[sample(1:3,10,replace=T)],
A = sample(1:3,10,replace=T),
B=sample(1:5,10,replace=T),
C=1:10)
head(newdf)
# company A B C
# 1 A 1 5 1
# 2 B 1 2 2
# 3 B 3 4 3
# 4 C 2 1 4
# 5 A 3 2 5
# 6 C 2 2 6
sapply(1:nrow(newdf),function(i)newdf[i,as.character(newdf$company[i])])
# [1] 1 2 4 4 3 6 7 2 5 3
EDIT: eddi's answer is probably better. It is more likely that you would have the dataframe to work with rather than the individual row vectors.
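For larger data, a vectorised alternative to the sapply loop is to index a matrix with a two-column matrix of (row, column) pairs. This is a sketch under the assumption that every column other than company is numeric; it is not necessarily the approach the other answers took.

```r
company <- c("compA", "compB", "compC")
compA <- c(1, 2, 3)
compB <- c(2, 3, 1)
compC <- c(3, 1, 2)
df <- data.frame(company, compA, compB, compC)

# Numeric part only, so the matrix stays numeric
m <- as.matrix(df[-1])

# One (row, column) pair per row: the column is the one named in company
pairs <- cbind(seq_len(nrow(df)), match(df$company, colnames(m)))

df$new <- m[pairs]
```

Unlike diag(), this does not depend on the rows and columns being in the same order or on there being one row per company.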
I am not sure I understand your question; it is unclear from your description. But it seems you are asking for the diagonal of the data values, since this is where the "name is in the column "company" of the same line". The following will do this:
df$new <- diag(matrix(c(compA,compB,compC), nrow = 3, ncol = 3))
The diag function will return the diagonal of the matrix for you. So I first concatenated the three original vectors into one vector and then specified it to be wrapped into a matrix of three rows and three columns. Then I took the diagonal. The whole thing is then added to the dataframe.
Did that answer your question?