Select column dataframe index R - r

I have a data frame df like this
1 2 3 4
A B C A
where the colnames are {1,2,3,4}. I would like to select one of the columns of the data frame according to an index that I set externally:
colf <- as.numeric(mo)
fmo <- df[[colf]]
Many thanks,

First of all, I don't recommend using numbers as column names. That said, this should help you out.
> df <- data.frame("1"="A","2"="B","3"="C")
> df
X1 X2 X3
1 A B C
> df$X1 #Get column by name
[1] A
Levels: A
> df[,1] #Get first column
[1] A
Levels: A

Treat the data frame as a matrix and index it using [row,column] notation, i.e.
fmo = df[,colf]
This will always get column number colf.
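Because the column names here are numbers, selecting by position and selecting by name can be confused. The sketch below illustrates the difference; it assumes the external value mo arrives as a string, and the reordered data frame df2 is a hypothetical added only for illustration. check.names = FALSE is used so that the names stay "1", "2", ... instead of becoming X1, X2, ...

```r
# Data frame whose column names are the strings "1".."4"
df <- data.frame("1" = "A", "2" = "B", "3" = "C", "4" = "A",
                 check.names = FALSE, stringsAsFactors = FALSE)

mo <- "3"                 # assumed: index supplied from outside as a string
colf <- as.numeric(mo)

by_pos  <- df[[colf]]                 # numeric index: third column by position
by_name <- df[[as.character(colf)]]   # character index: column named "3"

# After reordering the columns the two lookups no longer agree
df2 <- df[c("4", "1", "2", "3")]
pos_after  <- df2[[colf]]             # third column is now the one named "2"
name_after <- df2[["3"]]              # still the column named "3"
```

A numeric index always means position, while a character index always means name, so converting mo with as.numeric only makes sense if positions are what you want.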

Related

If a vector "x" is a subset of a vector df$y, df a dataframe which consists of: "y" and frequencies "fy", can I get a df of "x" and "fx" ? [ R ]

If list_a is a subset of df$list_b, where df$frequency_b holds the frequencies of the values in df$list_b, I would like to create a data frame containing list_a with the corresponding frequencies, i.e. a data frame [df1$list_a, df1$frequency_a]. (NOTE: all the elements of list_a are in df$list_b.)
Example:
list_a <- c("John","George","Jack","Kathrine")
df$list_b <- c("Mario","Jack","Ana","George","Loizos",
"Kathrine","John","Jack","Yannis")
where,
df$frequency_b <- c("10","3","15","23","13","50","553","334","332")
I want a data frame such as:
df1$list_a <- c("John","George","Jack","Kathrine")
and the corresponding frequencies:
df1$frequencies <- c(553,3,15,23)
Is there any way to implement this in R?
One can use the %in% operator to subset the names in the original data frame.
> list_a <- c("John","George","Jack","Kathrine")
> list_b <- c("Mario","Jack","Ana","George","Loizos",
+ "Kathrine","John","Jack","Yannis")
> frequency_b <- c("10","3","15","23","13","50","553","334","332")
> df <- data.frame(name=list_b, count=frequency_b)
> df1 <- df[df$name %in% list_a,]
> df1
name count
2 Jack 3
4 George 23
6 Kathrine 50
7 John 553
8 Jack 334
Note that there were two people named Jack in your data, so the output data frame has 5 rows, not 4 as in your original post.
regards,
Len
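If the result should follow the order of list_a and contain exactly one row per name, match() is a possible alternative to %in%. This is only a sketch built on the example data above: match() keeps the first occurrence of a duplicated name, and summing duplicates with tapply() is an assumption about what might be wanted for the two Jack rows, not something stated in the question.

```r
list_a <- c("John", "George", "Jack", "Kathrine")
list_b <- c("Mario", "Jack", "Ana", "George", "Loizos",
            "Kathrine", "John", "Jack", "Yannis")
frequency_b <- c("10", "3", "15", "23", "13", "50", "553", "334", "332")

# Store the counts as numbers rather than strings
df <- data.frame(name = list_b, count = as.numeric(frequency_b),
                 stringsAsFactors = FALSE)

# match() returns, for each element of list_a, the position of its
# first occurrence in df$name, so the result follows the order of list_a
idx <- match(list_a, df$name)
df1 <- data.frame(name = list_a, count = df$count[idx])

# Assumed alternative: add up the counts of duplicated names
totals <- tapply(df$count, df$name, sum)[list_a]
```

Here df1 has four rows in the order John, George, Jack, Kathrine, while totals combines both Jack entries into a single value.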

What's the best way to add a specific string to all column names in a dataframe in R?

I am trying to train a model on data that was converted from a document-term matrix to a data frame. There are separate fields for the positive and negative comments, so I want to add a string to the column names to serve as a "tag" that differentiates the same word coming from the different fields. For example, the word hello can appear in both the positive and the negative comment fields (and is thus represented as a column in my data frame), so in my model I want to differentiate these by making the column names positive_hello and negative_hello.
I am looking for a way to rename columns in such a way that a specific string will be appended to all columns in the dataframe. Say, for mtcars, I want to rename all of the columns to have "_sample" at the end, so that the column names would become mpg_sample, cyl_sample, disp_sample and so on, which were originally mpg, cyl, and disp.
I'm considering using sapply or lapply, but I haven't made any progress with it. Any help would be greatly appreciated.
Use colnames and paste0 functions:
df = data.frame(x = 1:2, y = 2:1)
colnames(df)
[1] "x" "y"
colnames(df) <- paste0('tag_', colnames(df))
colnames(df)
[1] "tag_x" "tag_y"
If you want to prefix each item in a column with a string, you can use paste():
# Generate sample data
df <- data.frame(good=letters, bad=LETTERS)
# Use the paste() function to prepend the same word to each item in a column
df$good2 <- paste('positive', df$good, sep='_')
df$bad2 <- paste('negative', df$bad, sep='_')
# Look at the results
head(df)
good bad good2 bad2
1 a A positive_a negative_A
2 b B positive_b negative_B
3 c C positive_c negative_C
4 d D positive_d negative_D
5 e E positive_e negative_E
6 f F positive_f negative_F
Edit:
Looks like I misunderstood the question. But you can rename columns in a similar way:
colnames(df) <- paste(colnames(df), 'sample', sep='_')
colnames(df)
[1] "good_sample" "bad_sample" "good2_sample" "bad2_sample"
Or to rename one specific column (column one, in this case):
colnames(df)[1] <- paste('prefix', colnames(df)[1], sep='_')
colnames(df)
[1] "prefix_good_sample" "bad_sample" "good2_sample" "bad2_sample"
You can use setnames from the data.table package; it renames the columns by reference, so it doesn't create a copy of your data.
library(data.table)
df <- data.frame(a=c(1,2),b=c(3,4))
# a b
# 1 1 3
# 2 2 4
setnames(df,paste0(names(df),"_tag"))
print(df)
# a_tag b_tag
# 1 1 3
# 2 2 4
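Applied to the use case in the question, the same paste0 approach might look like the sketch below. The two small data frames pos and neg are hypothetical stand-ins for the converted document-term matrices of the positive and negative comment fields.

```r
# Hypothetical term counts for the two comment fields
pos <- data.frame(hello = c(1, 0), great = c(2, 1))
neg <- data.frame(hello = c(0, 3), bad = c(1, 1))

# Tag every column with the field it came from
names(pos) <- paste0("positive_", names(pos))
names(neg) <- paste0("negative_", names(neg))

# The shared word "hello" now yields two distinct columns
combined <- cbind(pos, neg)
names(combined)
```

Tagging before combining avoids duplicated column names in the merged data frame.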

Splitting a dataframe if rows are numeric or not in R

I have a data frame (let's call it 'df') that consists of two columns:
Name Contact
A 34552325
B 423424
C 4324234242
D hello1@company.com
I want to split the data frame into two data frames based on whether a row in column "Contact" is numeric or not.
Expected Output:
Name Contact
A 34552325
B 423424
C 4324234242
and
Name Contact
D hello1@company.com
I tried using:
df$IsNum <- !(is.na(as.numeric(df$Contact)))
But this classified "hello1@company.com" as numeric too.
Basically, if a value in column "Contact" contains even a single non-numeric character, the code must classify it as non-numeric.
You can use grepl:
x <- " Name Contact
A 34552325
B 423424
C 4324234242
D hello1@company.com"
df <- read.table(text=x, header = TRUE)
# Rows whose Contact consists only of digits
x <- df[grepl("^\\d+$",df$Contact),]
# All remaining rows
y <- df[!grepl("^\\d+$",df$Contact),]
x
# Name Contact
# 1 A 34552325
# 2 B 423424
# 3 C 4324234242
y
# Name Contact
# 4 D hello1@company.com
We can create a grouping variable with grepl (the same way @Avinash Raj did) and split the data frame on it to get a list of data.frames.
split(df, grepl('^\\d+$', df$Contact))
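One likely reason the as.numeric attempt in the question failed is that Contact was stored as a factor, in which case as.numeric returns the internal level codes and never produces NA. The sketch below assumes that situation (stringsAsFactors = TRUE is set explicitly to reproduce it) and then uses the split approach on the underlying strings.

```r
df <- data.frame(Name = c("A", "B", "C", "D"),
                 Contact = c("34552325", "423424", "4324234242",
                             "hello1@company.com"),
                 stringsAsFactors = TRUE)   # assumed: Contact is a factor

# On a factor, as.numeric() yields level codes, so nothing becomes NA
codes <- as.numeric(df$Contact)
any_na <- anyNA(codes)

# Test the actual strings instead: digits only, from start to end
is_num <- grepl("^[0-9]+$", as.character(df$Contact))
parts <- split(df, is_num)

numeric_rows <- parts[["TRUE"]]    # contacts made up of digits only
other_rows   <- parts[["FALSE"]]   # everything else
```

split names the list elements after the values of the grouping variable, so the two pieces are available as "TRUE" and "FALSE".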

Set up column names in a new data frame based on variable

My goal is to be able to allocate column names to a data frame that I create based on a passed variable. For instance:
i='column1'
data.frame(i=1)
i
1 1
Above, the column name is 'i' when I want it to be 'column1'. I know the following works, but it isn't as efficient as I'd like:
i='column1'
df<-data.frame(x=1)
setnames(df,i)
column1
1 1
It's good to learn how base R works this way:
i <- 'column1'
df <- `names<-`(data.frame(1), i)
df
# column1
#1 1
Aside from the answers posted by other users, I think you may be stuck with the solution you've already presented. If you already have a data frame with the intended number of rows, you can add a new column using brackets:
df <- data.frame('column1'=1)
i <- 'column2'
df[[i]] <- 2
df
column1 column2
1 1 2
If the idea is to get rid of the setNames call, you could do the following, although you would probably never want to:
i <- 'column1'
data.frame(`attr<-`(list(1), "names", i))
# column1
# 1 1
You can see in data.frame, it has the code
x <- list(...)
vnames <- names(x)
so, you can mess with the name attribute.
I'm not exactly sure in what way you want it to be more efficient, but you could set all the column names at once with colnames after your data frame has been assembled. Here's an example based on yours.
# Td is assumed to be an existing data frame like this one
Td <- data.frame(a = c(1, 1), b = c(4, 5))
data.frame(Td)
a b
1 1 4
2 1 5
nam<-c("Test1","Test2")
colnames(Td)<-nam
data.frame(Td)
Test1 Test2
1 1 4
2 1 5
You could simply pass the name of your column variable and its values as arguments to a dataframe, without adding more lines:
df <- data.frame(column1=1)
df
# column1
#1 1
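For completeness, base R's setNames() function (from the stats package, which is attached by default) gives a one-line way to name a column from a variable. This is a minimal sketch using the question's variable i; the object names df_a and df_b are only illustrative.

```r
i <- "column1"

# Build the data frame first, then rename its single column
df_a <- setNames(data.frame(1), i)

# Or name the list element before data.frame() sees it
df_b <- data.frame(setNames(list(1), i))

names(df_a)
names(df_b)
```

Note that in the second form data.frame() still applies check.names, so a value of i that is not a syntactically valid name would be altered unless check.names = FALSE is passed.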

Create a list from a dataframe in R

Consider the following dataframe:
test.df <- data.frame(a = c("1991-01-01","1991-01-01","1991-02-01","1991-02-01"), b = rnorm(4), c = rnorm(4))
I would like to create a list from test.df. Each element of the list would be a subset data frame of test.df corresponding to a specific value of column a, i.e. each date. In this case, column a takes the unique values 1991-01-01 and 1991-02-01. Therefore, the resulting list would consist of two elements: the subset of test.df where a = 1991-01-01 (excluding column a), and the subset of test.df where a = 1991-02-01 (excluding column a). Here is the output I am looking for:
lst <- list(test.df[1:2,2:3], test.df[3:4,2:3])
Note that the subset dataframes may not have the same number of rows.
In my real practical example, column a is a date column with many more values.
I would appreciate any help! Thanks a lot!
You can use split
lst <- split(test.df, test.df$a)
If you want to get rid of column a, use split(test.df[-1], test.df$a) (thanks to @akrun for the comment).
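The question notes that the subsets may have different numbers of rows. The sketch below, using a hypothetical variant of test.df with three rows for the first date and one for the second, suggests that split handles this without any extra work.

```r
set.seed(42)  # only so that the random columns are reproducible

# Hypothetical data: unequal group sizes (3 rows vs. 1 row)
test.df <- data.frame(a = c("1991-01-01", "1991-01-01",
                            "1991-01-01", "1991-02-01"),
                      b = rnorm(4), c = rnorm(4))

# Drop column a from the pieces, but still group by it
lst <- split(test.df[-1], test.df$a)

names(lst)          # one list element per date
sapply(lst, nrow)   # number of rows in each piece
```

Each list element keeps only columns b and c, and the element names are the dates taken from column a.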
You can use the following code:
sapply(union(test.df$a,NULL), function(y,x) x[x$a==y,], x=test.df, simplify=FALSE)
You could also use the dlply function in the plyr package:
> library(plyr)
> dlply(test.df, .(a))
$`1991-01-01`
a b c
1 1991-01-01 1.3658775 0.9805356
2 1991-01-01 -0.2292211 2.2812914
$`1991-02-01`
a b c
1 1991-02-01 -0.2678131 0.5323250
2 1991-02-01 0.3736910 0.4988308
Or the data.table package:
> library(data.table)
> setDT(test.df)
> dt <- test.df[, list(list(.SD)), by = a]$V1
> names(dt) <- unique(test.df$a)
> dt
$`1991-01-01`
b c
1: 1.3658775 0.9805356
2: -0.2292211 2.2812914
$`1991-02-01`
b c
1: -0.2678131 0.5323250
2: 0.3736910 0.4988308
