Sorting based on columns in R [duplicate] - r

This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 7 years ago.
How can I sort a matrix in R by the values in its columns?
For example, I have a matrix like this:
ID Name Number
1 Bat 43
2 Apple 42
4 Dog 41
5 Ball 41
6 Cat 40
I want to sort the matrix by the values in the Number column. If two values are the same, those rows should then be sorted by the Name column. The expected output is:
ID Name Number
6 Cat 40
5 Ball 41
4 Dog 41
2 Apple 42
1 Bat 43
Since Ball and Dog have the same value in the Number column, they are sorted by the Name column (that is, alphabetically). Can someone help me do this?

Use order (assuming the data is in a data frame named df):
df[with(df, order(Number, Name)), ]
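As a minimal sketch of the same idea on the question's sample data: the data frame `df` and the character matrix `m` below are rebuilt by hand for illustration, and the matrix variant assumes the Number column can be converted with as.numeric.

```r
# Sample data from the question, rebuilt as a data frame (illustrative)
df <- data.frame(
  ID     = c(1, 2, 4, 5, 6),
  Name   = c("Bat", "Apple", "Dog", "Ball", "Cat"),
  Number = c(43, 42, 41, 41, 40)
)

# Sort by Number first, then break ties by Name
sorted_df <- df[order(df$Number, df$Name), ]
print(sorted_df)

# If the data really is a character matrix, convert Number before ordering,
# otherwise the values would be compared as strings
m <- cbind(
  ID     = c("1", "2", "4", "5", "6"),
  Name   = c("Bat", "Apple", "Dog", "Ball", "Cat"),
  Number = c("43", "42", "41", "41", "40")
)
sorted_m <- m[order(as.numeric(m[, "Number"]), m[, "Name"]), ]
print(sorted_m)
```

order returns the row permutation, so indexing the rows with it reorders the whole table at once.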

Related

creating multi rows depend on special conditions [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame as follows:
duration classlabel
100 W
120 1
390 2
30 3
30 2
150 3
30 4
60 3
60 4
30 3
120 4
30 3
120 4
I need to create a number of rows for each class label according to its duration in R. For example, I need 100 rows with the class label 'W', then 120 rows with the class label '1', and so on.
Can anyone help me solve this problem?
One option is uncount from tidyr:
library(tidyr)
uncount(df1, duration, .remove = FALSE)
Or use rep from base R to repeat each row index as many times as the 'duration' column says, then expand the rows with that numeric index:
df1[rep(seq_len(nrow(df1)), df1$duration),]
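A small base R sketch of the rep approach, using a shortened, made-up version of the data (three rows with small durations) so the result is easy to inspect:

```r
# Shortened illustrative data: durations kept small on purpose
df1 <- data.frame(
  duration   = c(3, 2, 1),
  classlabel = c("W", "1", "2")
)

# Row index 1 is repeated 3 times, index 2 twice, index 3 once
idx <- rep(seq_len(nrow(df1)), df1$duration)

expanded <- df1[idx, ]
rownames(expanded) <- NULL  # drop the 1, 1.1, 1.2, ... row names
print(expanded)
```

The result has sum(df1$duration) rows, with each class label repeated according to its duration.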

R Concatenate column in data frame with one value/string [duplicate]

This question already has answers here:
How to add leading zeros?
(8 answers)
Closed 4 years ago.
I am trying to prepend "0000" to the values in a column of a data frame.
I tried using paste() in a loop, but it is very slow because I have more than 2,000,000 rows, so it takes forever.
Is there a smarter, faster way to do it?
#DF:
CUSTID VALUE
103 12
104 10
105 15
106 12
... ...
#Desired result:
#DF:
CUSTID VALUE
0000103 12
0000104 10
0000105 15
0000106 12
... ...
How can this be achieved?
paste is vectorized, so it works on a whole vector of values (i.e. a column in a data frame) without a loop. The following should work:
DF <- data.frame(
CUSTID = 103:107,
VALUE = 13:17
)
DF$CUSTID <- paste0('0000', DF$CUSTID)
This should give you:
CUSTID VALUE
1 0000103 13
2 0000104 14
3 0000105 15
4 0000106 16
5 0000107 17
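If the goal is a fixed-width ID with leading zeros (as in the linked duplicate) rather than a constant "0000" prefix, a possible sketch uses sprintf. The width of 7 and the column name CUSTID_padded are assumptions chosen to match the desired output in the question.

```r
DF <- data.frame(
  CUSTID = 103:107,
  VALUE  = 13:17
)

# Pad to 7 characters with leading zeros; vectorized, so no loop is needed
DF$CUSTID_padded <- sprintf("%07d", DF$CUSTID)
print(DF)

# Unlike a fixed prefix, padding adapts to longer IDs
longer_id <- sprintf("%07d", 12345L)
print(longer_id)
```

With paste0 every ID gets four extra characters regardless of its length, while sprintf keeps all IDs the same width.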

Apply numbering 1-n for every variable in a long form data frame containing NaN values [duplicate]

This question already has answers here:
Order of occurance of the same value in a vector
(1 answer)
Adding an repeated index for factors in data frame
(4 answers)
R create ID within a group [duplicate]
(2 answers)
Closed 5 years ago.
Say I have a long-form data frame of time series data that looks like the one below. Somewhere during my conversion of the raw data the frame numbering got lost, so I'd like to recreate a column of frame numbers (starting from 1 within each name).
The $frame column is my desired output.
Edit: Newly added NaN values in my example, see comments below. Also changed title of question to reflect this specifically.
name value frame
A 41 1
A NaN 2
A 72 3
B 24 1
B 51 2
C 28 1
C NaN 2
C 57 3
C NaN 4
C 34 5
D 24 1
D 75 2
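One possible base R sketch for rebuilding such a column, assuming the rows are already in the right order within each name: ave applies seq_along per group, and since it only looks at row positions, the NaN entries in value do not affect the numbering.

```r
# Data from the question, without the frame column
df <- data.frame(
  name  = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "D", "D"),
  value = c(41, NaN, 72, 24, 51, 28, NaN, 57, NaN, 34, 24, 75)
)

# Number the rows 1..n within each name, based on position only
df$frame <- ave(seq_along(df$name), df$name, FUN = seq_along)
print(df)
```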

comparing the value of the first column of two dataframe to find the index of the same values in R? [duplicate]

This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 8 years ago.
I want to compare the first column of two data frames, find the rows where the values match, and copy the corresponding element of the second column of the first data frame into the second data frame.
Please see the example:
  dataframeA     dataframeB
  id  number     id
1  1      45      1
2  3      78      4
3  5      67     12
4 12      18      5
5  4      44      8
6  8      32
7 13      41
output: dataframeB
  id  number
1  1      45
2  4      44
3 12      18
4  5      67
5  8      32
I use two nested for loops with an if to compare the values, but this is really slow because my own data is very large. How can I speed it up?
for (i in 1:length(A[,1])) {
  for (j in 1:length(B[,1])) {
    if (A[i,1] == B[j,1]) {
      B[j,2] = A[i,2]
    }
  }
}
Thank you in advance.
Try
library(dplyr)
left_join(dataframeB, dataframeA)
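A base R alternative, sketched here on the question's sample data, is a vectorized lookup with match. This swaps the join for a different technique; it assumes the ids in dataframeA are unique, since match only returns the first hit.

```r
dataframeA <- data.frame(
  id     = c(1, 3, 5, 12, 4, 8, 13),
  number = c(45, 78, 67, 18, 44, 32, 41)
)
dataframeB <- data.frame(id = c(1, 4, 12, 5, 8))

# For each id in B, find its row position in A, then pull number from there;
# ids missing from A would give NA
pos <- match(dataframeB$id, dataframeA$id)
dataframeB$number <- dataframeA$number[pos]
print(dataframeB)
```

This replaces the two nested loops with a single lookup and keeps the row order of dataframeB.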

sum columns of a data frame depending on the category the observations belong to [duplicate]

This question already has answers here:
Sum of rows based on column value
(4 answers)
Closed 8 years ago.
I have a dataframe as such:
Response Spent Saved
1 Yes 100 25
2 Yes 200 50
3 No 20 2
4 No 13 3
I would like to sum the Spent and Saved amounts for each Response, i.e.:
Response Spent Saved
1 Yes 300 75
2 No 33 5
Right now, I am using a hackneyed approach: I subset the dataframe into 2 new dataframes, convert the 2nd and 3rd columns to numeric, run colSums on each column individually, save the outputs into a vector, and then create a new dataframe. Suffice it to say, it is a terrible approach.
How could I do this in a more effective manner?
Thanks for reading
Check ?aggregate
If your data.frame is DF, the following should do what you want:
aggregate(. ~ Response, data = DF, FUN = sum)
## Response Spent Saved
## 1 No 33 5
## 2 Yes 300 75
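For a self-contained run, the sketch below rebuilds the question's data as DF (assuming Spent and Saved are already numeric) and applies the same aggregate call. The groups come back sorted by Response, so No precedes Yes.

```r
DF <- data.frame(
  Response = c("Yes", "Yes", "No", "No"),
  Spent    = c(100, 200, 20, 13),
  Saved    = c(25, 50, 2, 3)
)

# "." means all columns other than Response; each is summed per group
totals <- aggregate(. ~ Response, data = DF, FUN = sum)
print(totals)
```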
