This question already has answers here:
refer to range of columns by name in R
> df
a b c d e
1 1 4 7 10 13
2 2 5 8 11 14
3 3 6 9 12 15
To subset the columns b, c, d we can use df[,2:4] or df[,c("b", "c", "d")]. However, I am looking for a solution that fetches columns b, c, d using something like df[,b:d]. In other words, I want to use only the first and last column names of interest to subset the data. I have been looking for a solution to this without success. All the examples I have seen to date refer to each and every specific column name when subsetting.
It's also simple in base R, e.g.:
subset(df, select=b:d)
Or roll your own:
df[do.call(seq, as.list(match(c("b", "d"), names(df))))]
If you are open to using dplyr:
dplyr::select(df, b:d)
b c d
1 4 7 10
2 5 8 11
3 6 9 12
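If you want the df[, b:d] feel without dplyr, a small helper can wrap the match() idea from the base-R answer above. The name col_range is hypothetical (not from any package), and the sketch assumes both names exist in the data frame and that the first name comes before the second.

```r
# Sample data matching the df shown in the question
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)

# Hypothetical helper: select every column from `from` through `to` by name
col_range <- function(data, from, to) {
  idx <- match(c(from, to), names(data))  # positions of the two boundary columns
  if (anyNA(idx)) stop("column name not found")
  data[, idx[1]:idx[2], drop = FALSE]     # drop = FALSE keeps a data frame even for one column
}

col_range(df, "b", "d")
```

Because it only uses match() and a colon sequence, this behaves like df[, 2:4] but stays correct if columns are later inserted before b.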
This question already has answers here:
Remove duplicated rows
I have tried various functions, including compare and all.equal, but I am having difficulty finding a test to see whether variables are the same.
For context, I have a data.frame that in some cases contains a duplicate result. I have tried copying the data.frame so I can compare it with itself. I would like to remove the duplicates.
One approach I considered was to take row A from data frame 1 and subtract it from row B of data frame 2. If the difference was zero, I planned to remove one of them.
Is there an approach I can use to do this without copying my data?
Any help would be great, I'm new to R coding.
Suppose I had a data.frame named data:
data
Col1 Col2
A 1 3
B 2 7
C 2 7
D 2 8
E 4 9
F 5 12
I can use the duplicated function to identify duplicated rows and exclude them:
data[!duplicated(data),]
Col1 Col2
A 1 3
B 2 7
D 2 8
E 4 9
F 5 12
I can also perform the same action on a single column:
data[!duplicated(data$Col1),]
Col1 Col2
A 1 3
B 2 7
E 4 9
F 5 12
Sample Data
data <- data.frame(Col1 = c(1,2,2,2,4,5), Col2 = c(3,7,7,8,9,12))
rownames(data) <- LETTERS[1:6]
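To see what duplicated() is doing, and two related options, here is a short sketch on the same sample data. unique() on a data frame drops the same rows as data[!duplicated(data), ], and fromLast = TRUE keeps the last copy of each duplicate instead of the first. No copy of the data frame is needed in either case.

```r
data <- data.frame(Col1 = c(1, 2, 2, 2, 4, 5), Col2 = c(3, 7, 7, 8, 9, 12))
rownames(data) <- LETTERS[1:6]

# duplicated() flags each row that repeats an earlier row; the first occurrence is FALSE
dup <- duplicated(data)  # FALSE FALSE TRUE FALSE FALSE FALSE (row C repeats row B)

# unique() on a data frame keeps the first occurrence of each distinct row
dedup <- unique(data)

# fromLast = TRUE scans from the bottom, so row B is flagged instead of row C
dedup_last <- data[!duplicated(data, fromLast = TRUE), ]
```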
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
I want to create a data frame by repeating rows according to the contents of a column. Below is the source data frame.
data.frame(c("a","b","c"), c(4,5,6), c(2,2,3)) -> df
colnames(df) <- c("sample", "measurement", "repeat")
df
sample measurement repeat
1 a 4 2
2 b 5 2
3 c 6 3
I want to repeat the rows using the values in the "repeat" column to get a data frame like the one below. Ideally, I would like to have a function to do this.
sample measurement repeat
1 a 4 2
2 a 4 2
3 b 5 2
4 b 5 2
5 c 6 3
6 c 6 3
7 c 6 3
Thanks in advance!
Solved. df[rep(rownames(df), df$`repeat`), ] did the job.
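Since the question asks for a function, the one-liner above can be wrapped as follows. This is a minimal sketch; the name repeat_rows is hypothetical. It indexes by row position rather than row name and resets the row names afterwards, so the result is numbered 1 to 7 as in the desired output instead of "1", "1.1", "2", and so on. Note that repeat is a reserved word in R, so the column is accessed with [[ ]] (or backticks) rather than a bare $repeat.

```r
df <- data.frame(sample = c("a", "b", "c"), measurement = c(4, 5, 6), n = c(2, 2, 3))
colnames(df)[3] <- "repeat"

# Hypothetical helper: repeat each row of `data` as many times as given in column `col`
repeat_rows <- function(data, col) {
  idx <- rep(seq_len(nrow(data)), times = data[[col]])  # each row position, repeated
  out <- data[idx, , drop = FALSE]
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

repeat_rows(df, "repeat")
```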
This question already has answers here:
Split data.frame into groups by column name
I have a data frame with 3 columns, for example:
my.data <- data.frame(A=c(1:5), B=c(6:10), C=c(11:15))
I would like to split each column into its own data frame (so I'd end up with a list containing three data frames). I tried to use the "split" function but I don't know what I would set as the factor argument. I tried this:
data.split <- split(my.data, my.data[,1:3])
but that's definitely wrong and just gives me a bunch of empty data frames. It sounds fairly simple but after searching through previous questions I haven't come across a way to do this.
Not sure why you'd want to do that, since lapply already lets you operate on the columns directly, but you could do:
lst <- split(t(my.data), 1:3)
names(lst) <- names(my.data)
lst
#$A
#[1] 1 2 3 4 5
#
#$B
#[1] 6 7 8 9 10
#
#$C
#[1] 11 12 13 14 15
To turn the list entries into data frames:
lapply(lst, as.data.frame)
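As the answer points out, a data frame is already a list of columns, so lapply() (or sapply()) visits one column at a time without any splitting. A minimal sketch:

```r
my.data <- data.frame(A = 1:5, B = 6:10, C = 11:15)

# Each element handed to FUN is one column of my.data
col_means  <- sapply(my.data, mean)   # named numeric vector with names A, B, C
col_ranges <- lapply(my.data, range)  # list with one c(min, max) pair per column
```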
You can use split.default, e.g.:
split.default(my.data, seq_along(my.data))
$`1`
A
1 1
2 2
3 3
4 4
5 5
$`2`
B
1 6
2 7
3 8
4 9
5 10
$`3`
C
1 11
2 12
3 13
4 14
5 15
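split.default() names the pieces 1, 2, 3 because the grouping vector is seq_along(my.data). Passing the column names instead labels each piece by its column. One detail worth hedging: split.default() converts a character grouping vector with as.factor(), which sorts it alphabetically, so the sketch below sets the factor levels explicitly to keep the original column order.

```r
my.data <- data.frame(A = 1:5, B = 6:10, C = 11:15)

# Group columns by their own names; explicit levels preserve the column order
f <- factor(names(my.data), levels = names(my.data))
lst <- split.default(my.data, f)  # list of one-column data frames named A, B, C
```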
This question already has answers here:
Joining aggregated values back to the original data frame
I am trying to compute an additional column in my dataframe that contains some summary data (mean, min, max). Starting from this dataframe
Group Value
A 15
A 5
B 4
B 2
C 25
C 15
I would like to calculate means for every group:
Group Mean
A 10
B 3
C 20
But I would like to add a column to the original data frame that repeats the value for every row of the same group, like this:
Group Value Mean
A 15 10
A 5 10
B 4 3
B 2 3
C 25 20
C 15 20
I managed to obtain this result by using aggregate first (to create a temporary data frame) and then merging the original data frame with the temporary one, using "Group" as the merge variable.
I am sure there is an easier and faster way to do this. Of note, I would like to do this with base functions (i.e. without dplyr, reshape, etc.) if possible. Thank you!
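For comparison, the aggregate-then-merge route described in the question might look like the sketch below. The asker's exact code isn't shown, so the column renaming step is an assumption. Note that merge() sorts the result by the key column by default, which happens to match the original order here.

```r
df <- data.frame(Group = c("A", "A", "B", "B", "C", "C"),
                 Value = c(15, 5, 4, 2, 25, 15))

# Step 1: temporary data frame with one row per group and its mean
means <- aggregate(Value ~ Group, data = df, FUN = mean)
names(means)[names(means) == "Value"] <- "Mean"

# Step 2: join the group means back onto every original row by Group
out <- merge(df, means, by = "Group")
```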
In base R, this can easily be done with ave:
df$Mean <- with(df, ave(Value, Group))
df
# Group Value Mean
#1 A 15 10
#2 A 5 10
#3 B 4 3
#4 B 2 3
#5 C 25 20
#6 C 15 20
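The question also mentions min and max. ave() takes a FUN argument (mean by default) and always returns a vector the same length as its input, so any per-group summary lines up row by row with the original data frame. A minimal sketch:

```r
df <- data.frame(Group = c("A", "A", "B", "B", "C", "C"),
                 Value = c(15, 5, 4, 2, 25, 15))

# Each call computes FUN within each Group and repeats it on every row of that group
df$Mean <- ave(df$Value, df$Group)             # FUN defaults to mean
df$Min  <- ave(df$Value, df$Group, FUN = min)
df$Max  <- ave(df$Value, df$Group, FUN = max)
```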
This question already has an answer here:
Subset of table in R using row numbers?
I have a large data frame which I would like to break down into smaller data frames. I know which rows I would like to split up (i.e. I want to separate rows 1 - 33, 34 - 60, ....). I know I have to use subset(), but I can't seem to find the specific parameters.
If you mean rows 1 through 33, just do this:
df[1:33,]
As an example:
> df<-data.frame(A=LETTERS[1:10], B=c(1:10))
> df
A B
1 A 1
2 B 2
3 C 3
4 D 4
5 E 5
6 F 6
7 G 7
8 H 8
9 I 9
10 J 10
> df[1:3,]
A B
1 A 1
2 B 2
3 C 3
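To get all the blocks at once rather than one df[a:b, ] at a time, one option (a swapped-in technique, not from the answer above) is to label each row with its block using findInterval() and then split(). The sketch uses the 10-row example with blocks 1-3, 4-6 and 7-10; for the question's case, starts would begin with 1 and 34.

```r
df <- data.frame(A = LETTERS[1:10], B = 1:10)

# First row of each block: rows 1-3, 4-6 and 7-10
starts <- c(1, 4, 7)

# findInterval() maps each row number to its block: 1 1 1 2 2 2 3 3 3 3
block <- findInterval(seq_len(nrow(df)), starts)

# split() on a data frame returns a list of data frames, one per block
pieces <- split(df, block)
```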