I am trying to subset/filter a data frame.
Col1 Col2
A 23454,34543
B 23456
C 34543,34532
I want to subset the data frame and get only row B (i.e. B 23456), based on the length of Col2, using either grep or some other function.
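One possible approach, sketched under the assumption that Col2 is a character column whose multiple values are separated by commas: count the comma-separated values in each row and keep the rows that hold exactly one. The name df is a hypothetical stand-in for the real data frame.

```r
# Hypothetical reconstruction of the example data; Col2 is assumed to be character.
df <- data.frame(
  Col1 = c("A", "B", "C"),
  Col2 = c("23454,34543", "23456", "34543,34532"),
  stringsAsFactors = FALSE
)

# Number of comma-separated values in each row of Col2.
n_values <- lengths(strsplit(df$Col2, ",", fixed = TRUE))

# Keep only the rows that hold a single value.
single <- df[n_values == 1, ]

# Equivalent filter with grepl(): keep rows whose Col2 contains no comma.
single_grepl <- df[!grepl(",", df$Col2, fixed = TRUE), ]
```

The strsplit() version generalises to other lengths (for example n_values == 2), while the grepl() version only separates single values from everything else.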
Relatively new to R...
I am trying to create a column in a data frame (df1, which currently has a single column) in which each new value is determined by the value in the existing column, by referring to another (reference) data frame (df2) that has two columns and is effectively like a hash. I was trying to avoid making an actual hash because that doesn't seem to be the done thing in R.
So, the reference dataframe df2 looks like this:
col1 col2
A 71
R 156
N 114
D 115
...
The values in col1 occur only once each in that column.
The data frame df1 that I'm working on might look like this (for example):
D
D
R
A
N
A
D
...
So, for each row in df1, I'd like to create a new column where the script takes the col1 value from df1, looks up df2, finds the matching value in col1, and then takes the corresponding value from col2 and places it in the new column in df1. So, if it worked, I'd end up with df1 looking like this:
D 115
D 115
R 156
A 71
N 114
A 71
D 115
...
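A minimal sketch of this lookup with base R's match(), assuming the single column of df1 is also called col1 (the question does not name it) and using only the reference rows shown above:

```r
# Reference table (only the rows shown in the question).
df2 <- data.frame(col1 = c("A", "R", "N", "D"),
                  col2 = c(71, 156, 114, 115),
                  stringsAsFactors = FALSE)

# Data frame to fill in; the column name col1 is an assumption.
df1 <- data.frame(col1 = c("D", "D", "R", "A", "N", "A", "D"),
                  stringsAsFactors = FALSE)

# match() gives, for each value of df1$col1, the row position of that key in df2$col1.
# Indexing df2$col2 with those positions pulls out the corresponding values.
df1$col2 <- df2$col2[match(df1$col1, df2$col1)]
```

Keys that are missing from df2 produce NA in the new column. merge(df1, df2, by = "col1") performs the same lookup but does not preserve the original row order of df1.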
I have a question about R. If I have data like this:
# name : Matrix
col1. col2
row1
row2
How can I read the name of the subset of data from the header and make a data frame of the same name in R, so that I can use it by the name Matrix?
And what should I do if I need to read the number of rows specified in the header? For example, if this were the header, I would like to read 2 columns and 2 rows and name this frame Matrix:
# name : Matrix
# row: 2
# col: 2
col1. col2
row1
row2
Thank you all.
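A rough sketch of one way to do this, assuming the metadata lines always look like "# key: value" and the table is whitespace-separated. The numeric values in txt are invented for illustration, since the question shows none; for a real file, txt would come from readLines() on that file.

```r
# Hypothetical file contents; the numeric values are made up for this example.
txt <- c("# name : Matrix",
         "# row: 2",
         "# col: 2",
         "col1 col2",
         "row1 1 2",
         "row2 3 4")

# Separate the "# key: value" header lines from the table body.
is_meta <- grepl("^#", txt)
meta <- sub("^#[[:space:]]*", "", txt[is_meta])
keys <- trimws(sub(":.*$", "", meta))
vals <- trimws(sub("^[^:]*:", "", meta))
header <- setNames(as.list(vals), keys)

# Read only as many data rows as the header declares.
n_row <- as.integer(header[["row"]])
body <- read.table(text = txt[!is_meta], header = TRUE, nrows = n_row)

# Store the frame in a named list under the name given in the header ...
frames <- list()
frames[[header[["name"]]]] <- body

# ... or create a variable with that name, if that is really needed.
assign(header[["name"]], body)
```

Keeping the frames in a named list (frames[["Matrix"]]) is usually easier to manage than assign() when a file contains several such blocks.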
I have a list of data frames, with each data frame named after patient ID.
df.list <- c(1297, 2468, 3323, 4453, 4785, 6489, 7338, 8244, 9345)  # etc.
Each data frame has data like this (this is very simplified, but it gets the point across):
A B C D
1 8 4 2
3 4 6 8
I want to merge all of the data frames in the list so that all A values are in one column, all B values in another, etc.
However, I also want to add a new column that tells me which patient the data came from. So I would like to extract the name of the data frame (which is the patient ID) that each row came from and add this value to a new column in the merged data frame. I plan on merging with rbind, but I do not know how to add another column with the patient ID information.
The goal is to have the following information in the final data frame:
A B C D Patient ID
Any help is appreciated!
Thanks!
Using the input data shown in reproducible form in the Note below, rbind the data frames together. The row names will contain the ID followed by a suffix indicating the row number so we can get the desired data frame, df2, like this:
df2 <- do.call("rbind", mget(df.list))
df2$id <- sub("[.].*", "", rownames(df2))
rownames(df2) <- NULL
Note: We assume this input data:
df.list <- c(1297, 2468, 3323, 4453, 4785, 6489, 7338, 8244, 9345)
df.list <- as.character(df.list)
Lines <- "A B C D
1 8 4 2
3 4 6 8"
df <- read.table(text = Lines, header = TRUE)
for(nm in df.list) assign(nm, df)
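If the data frames are instead held in a named list (an assumption; the names patients, with_id and PatientID below are hypothetical), a variant is to attach the ID to each data frame before binding, which avoids parsing the row names afterwards:

```r
# Assumed input: a named list of data frames, with patient IDs as the names.
df <- data.frame(A = c(1, 3), B = c(8, 4), C = c(4, 6), D = c(2, 8))
patients <- list("1297" = df, "2468" = df, "3323" = df)

# Add the ID as a column to each data frame, pairing each one with its name.
with_id <- Map(function(d, id) cbind(d, PatientID = id),
               patients, names(patients))

# Stack the pieces and drop the automatically generated row names.
merged <- do.call(rbind, with_id)
rownames(merged) <- NULL
```

This also works when an ID itself contains a dot, a case in which stripping the suffix from the row names would cut the ID short.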
I have a list of data frames of the following structure:
cust_num V2 V3 ...
Each data frame represents a group of customers, where cust_num can appear more than once in a single data frame.
I want to extract the unique customers of each data frame and insert them into a new data frame, together with the index of the data frame (i.e., the group) they came from.
Here is an example:
# df1
cust_num V2 V3 ...
1
1
2
# df2
cust_num V2 V3 ...
4
4
5
and I want my result to be:
cust_num group
1 1
2 1
4 2
5 2
I tried to use a for loop, but I had trouble inserting the data into a new data frame and creating the group index:
for (i in 1:length(df_list)) {
x <- unique(df_list[[i]][1])
new_df <- rbind(x)
}
Thank you in advance
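The loop above assigns new_df afresh on every pass and never records the group, so only the unique customers of the last data frame survive. A minimal corrected sketch of the same loop, assuming cust_num is the first column (the example list and the name pieces are made up for illustration):

```r
# Example list mirroring the question; only the cust_num column is shown.
df_list <- list(data.frame(cust_num = c(1, 1, 2)),
                data.frame(cust_num = c(4, 4, 5)))

# Collect one small data frame per group, then bind them once at the end.
pieces <- vector("list", length(df_list))
for (i in seq_along(df_list)) {
  # Unique customers of data frame i, tagged with its position in the list.
  pieces[[i]] <- data.frame(cust_num = unique(df_list[[i]][[1]]), group = i)
}
new_df <- do.call(rbind, pieces)
```

Binding once after the loop also avoids growing a data frame row by row, which gets slow for long lists.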
If dat is your list of data frames:
do.call(rbind, lapply(seq_along(dat), function(x)
  data.frame(cust_num = unique(dat[[x]][, 1]), group = x)))
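If I understand your problem correctly, I think you can avoid a for loop by using the dplyr package to bind the data frames together while adding that index column.
library(dplyr)
# Bind the list of data frames into a single data frame.
# .id adds a column holding the list name (or position, if unnamed) of each piece.
d1 <- bind_rows(df_list, .id = "index")
# Keep one row per customer within each group.
# (!duplicated(d1) would compare whole rows, so a customer whose V2 or V3 differs would be kept twice.)
distinct(d1, cust_num, index)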
I would like to pass a variable (that holds the column name as a string) as argument to data.table. How do I do it?
Consider a data.table below:
myvariable <- "a"
myvariable_2 <- "b"
DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
DT
# ID a b c
# 1: b 1 7 13
# 2: b 2 8 14
# 3: b 3 9 15
# 4: a 4 10 16
# 5: a 5 11 17
# 6: c 6 12 18
I can use subset to extract columns, i.e. subset(DT, TRUE, myvariable), but this just outputs the column(s).
How do I use subset to extract column based on some criteria? e.g: extract myvariable column when myvariable_2 < 10
How do I extract summary statistics over groups by passing column names as variables?
How do I plot descriptive plots using data.table by passing column names as variables?
I know that passing variables as column names could be easier with a data.frame. But I read everywhere that data.table is faster and more memory efficient, so I would like to stick with it.
Does switching between data.table and data.frame have huge memory/performance implications?
I do not want to explicitly code the column names as I want this piece of code to be re-usable.
The comment from @thelatemail is a very good start. Do read that first! Another quick way is below:
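library(data.table)
df = data.table(a = 1:10, b = letters[1:2], c = 11:20)
var1 = "a"
var2 = "b"
# Select the columns whose names are stored in var1 and var2.
dt1 = df[, c(var1, var2), with = FALSE]
Think of with = FALSE as making the j part of a data.table behave like that of a data.frame: j is then treated as a vector of column names rather than as an expression.
Edit 1: to subset on a condition within a data.table:
# get() looks up the column named by var1, so the filter is a > 5.
df[get(var1) > 5, c(var1, var2), with = FALSE]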
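For the grouped summary statistics asked about in the question, one option is to pass the column names through .SDcols and by, both of which accept character vectors. This is only a sketch: the variable names value_cols and group_col are hypothetical, and the table is the DT from the question.

```r
library(data.table)

DT <- data.table(ID = c("b", "b", "b", "a", "a", "c"),
                 a = 1:6, b = 7:12, c = 13:18)

# Column names held in variables (hypothetical names for this example).
value_cols <- c("a", "b")
group_col <- "ID"

# .SD is restricted to the columns named in .SDcols;
# lapply() then computes the mean of each of them within every group.
stats <- DT[, lapply(.SD, mean), by = group_col, .SDcols = value_cols]
```

Swapping mean for another function such as median or sd, or changing the contents of value_cols and group_col, reuses the same line without hard-coding any column name.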