This question already has answers here:
Refer to the last column in R
(8 answers)
Closed 5 years ago.
I want to split a data frame, with an arbitrary number of columns, by the last column, without providing a column name or number. Something like [imaginary code land]:
d <- split(MY_DATA, ncol(MY_DATA))
A sample data set might be something like:
pepsi 1
dr_pep 2
coke 1
Since the data set has no headers, splitting by the last column should produce a grouping like the following:
dr_pep 2 --> group 2
pepsi 1 --> group 1
coke 1
df <- read.table(text = 'pepsi 1
dr_pep 2
coke 1', header = FALSE)
# ncol(df) is the index of the last column, however many columns there are
split(df, df[, ncol(df)])
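If this comes up often, the same idea can be wrapped in a small helper. `split_by_last` is a hypothetical name, not a base R function. It uses `data[[ncol(data)]]`, which extracts the last column as a plain vector no matter how wide the data frame is.

```r
# split_by_last() is a hypothetical helper, not part of base R
split_by_last <- function(data) {
  # data[[ncol(data)]] pulls out the last column as a vector
  split(data, data[[ncol(data)]])
}

df <- read.table(text = "pepsi 1
dr_pep 2
coke 1", header = FALSE)

groups <- split_by_last(df)
names(groups)                   # "1" "2"
as.character(groups[["1"]]$V1)  # "pepsi" "coke"
```

The group labels come from the values in the last column, so the list can be indexed by them directly.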
This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 2 years ago.
In my data frame, I want to count the rows for each subject ID and add a trial column that numbers them within each subject. For example, if subject 1 does something 10 times, the trials should be 1, 2, 3, ..., 10, and if subject 2 does something 15 times, they should be 1, 2, 3, ..., 15. How can I do this?
Here is another alternative using the data.table package. The code and output are as follows:
library(data.table)
df <- data.frame(subject = c("maths","maths","maths","science","science"))
df <- data.table(df)
df[, trail := seq_len(.N), by = subject]
df
#subject trail
#1: maths 1
#2: maths 2
#3: maths 3
#4: science 1
#5: science 2
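For comparison, a base R sketch without data.table, using `ave()`. It applies `seq_along` within each subject group and writes the results back in the original row order, so the rows do not need to be sorted by subject first. (With dplyr, `group_by(subject)` followed by `mutate(trail = row_number())` should give the same column.)

```r
df <- data.frame(subject = c("maths", "maths", "maths", "science", "science"))

# ave() splits the first argument by subject, applies FUN to each piece,
# and puts the results back in the original row positions
df$trail <- ave(seq_along(df$subject), df$subject, FUN = seq_along)
df
```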
This question already has answers here:
Filtering a data frame by values in a column [duplicate]
(3 answers)
Closed 4 years ago.
Let's say I have the following data frame in R:
> patientData
patientID age diabetes status
1 1 25 Type 1 Poor
2 2 34 Type 2 Improved
3 3 28 Type 1 Excellent
4 4 52 Type 1 Poor
How can I reference a specific row or group of rows by using the specific value/level of a particular column rather than the row index? For instance, if I wanted to set a variable x to equal all of the rows which contain a patient with Type 1 diabetes or all of the rows that contain a patient in "Improved" status, how would I do that?
Try this one:
library(dplyr)
x <- patientData %>%
  filter(diabetes == "Type 1")
Next time, please provide a minimal reproducible example.
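The same selections can be made in base R without dplyr. The sketch below rebuilds `patientData` from the printout in the question, so the column types are an assumption (`diabetes` and `status` as character), and covers both cases the question mentions.

```r
patientData <- data.frame(
  patientID = 1:4,
  age = c(25, 34, 28, 52),
  diabetes = c("Type 1", "Type 2", "Type 1", "Type 1"),
  status = c("Poor", "Improved", "Excellent", "Poor")
)

# Logical indexing: keep the rows where the condition is TRUE, all columns
x <- patientData[patientData$diabetes == "Type 1", ]

# subset() evaluates the condition inside the data frame
y <- subset(patientData, status == "Improved")

# Conditions combine with & (and) and | (or)
z <- subset(patientData, diabetes == "Type 1" & status == "Poor")
```

One difference worth knowing: if the column contains `NA`, `[` returns all-`NA` rows for those entries, while `subset()` drops them.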
This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
I've got a dataframe like this:
The first column is numeric, and the second column is a comma-separated list stored as character:
id numbers
1 2,4,5
2 1,4,6
3 NA
4 NA
5 5,1,2
I want to essentially "melt" the data frame, as the reshape package does, so that the output looks like this:
id numbers
1 2
1 4
1 5
2 1
2 4
2 6
3 NA
4 NA
5 5
5 1
5 2
However, with reshape2 each number would have to be in its own column, which takes up too much storage space when there are many numbers. That is why I chose to store the numbers as a comma-separated list, but melt no longer works with this setup.
Can you recommend the most efficient way to transform the input data frame into the output data frame?
Here is how I would do it: for each row, create a data.frame and store it in a list, where df is your initial data.frame.
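l <- list()
for (j in seq_len(nrow(df))) {
  # strsplit() splits a string on a delimiter; split() would group by a factor instead
  parts <- strsplit(as.character(df$numbers[[j]]), ",")[[1]]
  l[[j]] <- data.frame(id = df$id[[j]],
                       numbers = as.numeric(parts))
}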
Afterwards, you can stack all list elements into a single data.frame with plyr::ldply(l, data.frame), or with do.call(rbind, l) in base R.
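As an alternative to growing a list in a loop, here is a vectorized base R sketch: `strsplit()` splits every row at once, `lengths()` counts the pieces per row, and `rep()` repeats each id that many times. The data frame is rebuilt from the question's printout, assuming `numbers` is a character column with real `NA` values.

```r
# Data rebuilt from the question (assumed: numbers is character, NA is a real NA)
df <- data.frame(id = 1:5,
                 numbers = c("2,4,5", "1,4,6", NA, NA, "5,1,2"),
                 stringsAsFactors = FALSE)

# One list element per row; an NA string comes back as a single NA piece
parts <- strsplit(df$numbers, ",")

# Repeat each id once per piece, then flatten the pieces into one column
out <- data.frame(id = rep(df$id, lengths(parts)),
                  numbers = as.numeric(unlist(parts)))
out
```

Because an `NA` row yields exactly one `NA` piece, ids 3 and 4 each keep a single `NA` row, matching the desired output. If tidyr is available, `tidyr::separate_rows(df, numbers, sep = ",", convert = TRUE)` is a packaged alternative.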
This question already has answers here:
Collapsing data frame by selecting one row per group
(4 answers)
Remove duplicated rows using dplyr
(6 answers)
Closed 6 years ago.
My apologies for the title; I couldn't come up with a more explicit one.
Here is reproducible code for what my data looks like:
subject = gl(3,4,12)
item = factor(c("A","B","B","A","A","A","B","B","A","B","A","B"))
set.seed(123)
rt = runif(12, 1000, 2000)
df = data.frame(subject, item, rt)
> df
subject item rt
1 A 1287.578
1 B 1788.305
1 B 1408.977
1 A 1883.017
2 A 1940.467
2 A 1045.556
2 B 1528.105
2 B 1892.419
3 A 1551.435
3 B 1456.615
3 A 1956.833
3 B 1453.334
I would like to subset my data.frame to keep only the first occurrence of each item for each subject.
For each subject, the item order is random and each item was seen twice, but I would like to keep only the first occurrence.
Is there a simple way to do this?
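One simple base R approach (a sketch, not taken from the linked answers): `duplicated()` on a data frame flags every row whose combination of values has already appeared above it, so applying it to the `subject` and `item` columns and negating the result keeps only the first occurrence of each pair.

```r
subject <- gl(3, 4, 12)
item <- factor(c("A","B","B","A","A","A","B","B","A","B","A","B"))
set.seed(123)
rt <- runif(12, 1000, 2000)
df <- data.frame(subject, item, rt)

# TRUE for any subject/item pair already seen in an earlier row
seen_before <- duplicated(df[, c("subject", "item")])
first_seen <- df[!seen_before, ]
first_seen
```

This keeps rows 1, 2, 5, 7, 9 and 10 in their original order. With dplyr, `distinct(df, subject, item, .keep_all = TRUE)` should select the same rows.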
This question already has answers here:
How to do vlookup and fill down (like in Excel) in R?
(9 answers)
Closed 7 years ago.
I have a table of pending bills in the Scottish Parliament. One of the columns (BillTypeID) is populated with numbers that indicate what type of bill each one is (there are seven different types of bills).
I have another table that describes which number corresponds to which bill type (1 = "Executive", 2 = "Member's", etc.).
I want to replace the number in my main table with the corresponding string that describes the type for each bill.
Data:
bills <- jsonlite::fromJSON(url("https://data.parliament.scot/api/bills"))
bill_stages <- jsonlite::fromJSON(url("https://data.parliament.scot/api/billstages"))
This is probably a duplicate, but I can't find the corresponding answer.
The easiest way to do this is with merge().
d1 <- data.frame(billtype=c(1,1,3,3),
bill=c("first","second","third","fourth"))
d2 <- data.frame(billtype=c(1,2,3),
billtypename=c("foo","bar","bletch"))
d3 <- merge(d1,d2)
##
## billtype bill billtypename
## 1 1 first foo
## 2 1 second foo
## 3 3 third bletch
## 4 3 fourth bletch
... then drop the billtype column if you don't want it any more. You can probably do it slightly more efficiently with match() (see my answer to the linked question).
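A sketch of the `match()` approach, reusing the toy `d1`/`d2` tables from the `merge()` example rather than the real parliament data (whose lookup-table column names aren't shown here). Unlike `merge()`, it keeps the rows of `d1` in their original order and overwrites the code in place, which is what the question asks for.

```r
d1 <- data.frame(billtype = c(1, 1, 3, 3),
                 bill = c("first", "second", "third", "fourth"))
d2 <- data.frame(billtype = c(1, 2, 3),
                 billtypename = c("foo", "bar", "bletch"))

# match() gives, for each code in d1, the row of d2 holding that code;
# indexing d2$billtypename with it looks up the names in d1's row order
d1$billtype <- d2$billtypename[match(d1$billtype, d2$billtype)]
d1
```

Any code in `d1` that has no entry in the lookup table becomes `NA` rather than silently dropping the row, which `merge()` would do by default.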