Replacing values in data frame column using another data frame [duplicate] - r

This question already has answers here:
How to do vlookup and fill down (like in Excel) in R?
(9 answers)
Closed 7 years ago.
I have a table of pending bills in the Scottish Parliament. One of the columns (BillTypeID) is populated with numbers that indicate what type of bill each one is (there are seven different types of bills).
I have another table that describes which number corresponds to which bill type (1 = "Executive", 2 = "Member's", etc.).
I want to replace the number in my main table with the corresponding string that describes the type for each bill.
Data:
bills <- jsonlite::fromJSON(url("https://data.parliament.scot/api/bills"))
bill_stages <- jsonlite::fromJSON(url("https://data.parliament.scot/api/billstages"))

This is probably a duplicate but I can't find the corresponding answer ...
The easiest way to do this is with merge().
d1 <- data.frame(billtype = c(1, 1, 3, 3),
                 bill = c("first", "second", "third", "fourth"))
d2 <- data.frame(billtype = c(1, 2, 3),
                 billtypename = c("foo", "bar", "bletch"))
d3 <- merge(d1, d2)
##
##   billtype   bill billtypename
## 1        1  first          foo
## 2        1 second          foo
## 3        3  third       bletch
## 4        3 fourth       bletch
... then drop the billtype column if you don't want it any more. You can probably do it slightly more efficiently with match() (see my answer to the linked question).
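For comparison, here is a minimal sketch of the match() approach, reusing the toy d1 and d2 from above (the toy data is illustrative, not the real Scottish Parliament tables). match() returns, for each billtype in d1, the row position of that value in d2, which is then used to index the name column.

```r
# Toy data, same as in the merge() example above
d1 <- data.frame(billtype = c(1, 1, 3, 3),
                 bill = c("first", "second", "third", "fourth"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(billtype = c(1, 2, 3),
                 billtypename = c("foo", "bar", "bletch"),
                 stringsAsFactors = FALSE)

# For each row of d1, find the position of its billtype in d2,
# then pull the corresponding name
d1$billtypename <- d2$billtypename[match(d1$billtype, d2$billtype)]

# Drop the numeric code if it is no longer needed
d1$billtype <- NULL
d1
```

Unlike merge(), this keeps the rows of d1 in their original order, and any code missing from d2 simply becomes NA.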

Related

Creating a loop to count the number of instances of each number, record the unique number and count of each instances in a new data frame [duplicate]

This question already has answers here:
Count number of occurences for each unique value
(14 answers)
Closed 2 years ago.
Apologies in advance, I am a beginner in R. I have loaded a CSV file into a new data frame. One of the columns provides a category number (from 1 to 6). I want to create a loop to count the number of times each category number appears, and then store this in a new data frame. (The new data frame would contain the category number and how many times it appears.)
I have created the script below so far, but I am unsure how to store the results in the new data frame and include the category number.
Summarydf<-NULL
unique<-c(unique(Data$Type))
for (i in unique) {
Summarydf<-c(sum(Data$Type==i))
print(Summarydf)
}
You can just convert Data$Type to a factor and get a summary as a vector of the number of occurrences of each type. e.g.:
L <- LETTERS
Type <- sample(1:6, 26, replace = TRUE)
Data <- data.frame(L, Type)
Data$Type = as.factor(Data$Type)
summaryType <- summary(Data$Type)
summaryType
1 2 3 4 5 6
4 4 5 7 3 3
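Since the question asks for the counts in a new data frame, one possible sketch is to go through table() and as.data.frame(). The small Type vector and the column names Type and Count below are made up for illustration.

```r
# Small fixed example instead of random data, so the result is predictable
Data <- data.frame(Type = c(1, 2, 2, 3, 3, 3))

# table() counts each distinct value; as.data.frame() turns the counts
# into one row per category
Summarydf <- as.data.frame(table(Data$Type), stringsAsFactors = FALSE)
names(Summarydf) <- c("Type", "Count")
Summarydf
```

No loop is needed. Note that the Type column comes back as character here, so convert it with as.integer() if you need numbers.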

How to count the number of occurrences of the first character of each string in a column in R [duplicate]

This question already has answers here:
Counting unique / distinct values by group in a data frame
(12 answers)
Closed 4 years ago.
I have a data set which has a single column containing multiple names.
For example:
Alex
Brad
Chrisitne
Alexa
Brandone
There are almost 100 records like this. I want to display the result as:
A 2
B 2
C 1
That is, I need to show the frequencies from highest to lowest, and if there is a tie, the values should be shown in alphabetical order.
I have been trying to solve this but have not been able to.
Any pointers?
df <- data.frame(name = c("Alex", "Brad", "Brad"))
first_characters <- substr(df$name, 1, 1)
result <- sort(table(first_characters), decreasing = TRUE)
# from wide to long
data.frame(result)
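If you want the tie-breaking rule spelled out rather than relying on how sort() treats ties, a sketch along these lines makes both sort keys explicit. It uses the five names from the question; the column name first is an arbitrary choice.

```r
names_vec <- c("Alex", "Brad", "Chrisitne", "Alexa", "Brandone")

# Count first letters; the result has columns "first" and "Freq"
counts <- as.data.frame(table(first = substr(names_vec, 1, 1)),
                        stringsAsFactors = FALSE)

# Highest frequency first, ties broken alphabetically
counts <- counts[order(-counts$Freq, counts$first), ]
counts
```

Here A and B both appear twice, so they are listed alphabetically ahead of C.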

How can I reference a specific row(s) in a data frame using an instance of a column variable in r? [duplicate]

This question already has answers here:
Filtering a data frame by values in a column [duplicate]
(3 answers)
Closed 4 years ago.
Let's say I have the following data frame in R:
> patientData
  patientID age diabetes    status
1         1  25   Type 1      Poor
2         2  34   Type 2  Improved
3         3  28   Type 1 Excellent
4         4  52   Type 1      Poor
How can I reference a specific row or group of rows by using the specific value/level of a particular column rather than the row index? For instance, if I wanted to set a variable x to equal all of the rows which contain a patient with Type 1 diabetes or all of the rows that contain a patient in "Improved" status, how would I do that?
Try this one:
library(dplyr)
patientData %>%
filter(diabetes == "Type 1")
Next time, please provide a minimal reproducible example.
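The same thing can be done in base R with logical indexing, which also covers the "Improved" status case from the question. This is a sketch that rebuilds patientData by hand from the printout above; the variable names x, y and either are arbitrary.

```r
# Rebuild the example data from the question
patientData <- data.frame(
  patientID = 1:4,
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type 1", "Type 2", "Type 1", "Type 1"),
  status    = c("Poor", "Improved", "Excellent", "Poor"),
  stringsAsFactors = FALSE
)

# Rows where a condition on a column holds (note the trailing comma:
# we are selecting rows and keeping all columns)
x <- patientData[patientData$diabetes == "Type 1", ]
y <- patientData[patientData$status == "Improved", ]

# Either condition, combined with the vectorised OR operator
either <- patientData[patientData$diabetes == "Type 1" |
                      patientData$status == "Improved", ]
```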

Viewing single column of data frame in R [duplicate]

This question already has answers here:
How to subset matrix to one column, maintain matrix data type, maintain row/column names?
(1 answer)
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 5 years ago.
I am running a simulation model that creates a large data frame as its output, with each column corresponding to the time-series of a particular variable:
data5<-as.data.frame(simulation3$baseline)
Occasionally I want to look at subsets, especially particular columns, of this data frame in order to get an idea of the output. For this I am using the View-function like so
View(data5[1:100,1])
for instance, if I wish to see the first 100 rows of column 1. Alternatively, I also sometimes do something like this, using the names of the time series:
timeframe=1:100
toAnalyse=c("u","u_n","u_e","u_nw")
View(data5[timeframe,toAnalyse])
In either case, there is an annoying display problem when I am trying to view a single column on its own (as for instance with View(data5[1:100,1])), whereby what I get looks like this:
Example 1
As you can see, the top of the table which would usually contain the name of the variable in the dataset instead contains a string of all values that the variable takes. This problem does not appear if I select 2 or more columns:
Example 2
Does anyone know how to get rid of this issue? Is there some argument that I can feed to View to make sure that it behaves nicely when I ask it to just show a single column?
View(data5[1:100,1, drop=FALSE])
When you access a single column of a data frame with [, it is converted to a vector by default; drop=FALSE prevents that and retains the column name.
For instance:
> df
  n  s    b
1 2 aa TRUE
2 3 bb TRUE
3 5 cc TRUE
> df[, 1]
[1] 2 3 5
> df[, 1, drop=FALSE]
  n
1 2
2 3
3 5
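A quick way to check what each form returns is sketched below, using the same small df. It also shows list-style indexing (a single index, no comma), which keeps the data frame structure as well.

```r
df <- data.frame(n = c(2, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(TRUE, TRUE, TRUE),
                 stringsAsFactors = FALSE)

v <- df[, 1]                # plain numeric vector, column name is lost
d <- df[, 1, drop = FALSE]  # one-column data frame, name "n" kept
l <- df["n"]                # list-style indexing, also a data frame

is.data.frame(v)  # FALSE
is.data.frame(d)  # TRUE
is.data.frame(l)  # TRUE
```

For the View() call in the question, either View(data5[1:100, 1, drop=FALSE]) or selecting the column by name with drop=FALSE should show the proper header.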

Split a data frame by the last column programmatically [duplicate]

This question already has answers here:
Refer to the last column in R
(8 answers)
Closed 5 years ago.
I want to split a data frame, with an arbitrary number of columns, by the last column, without providing a column name or number. Something like [imaginary code land]:
d <- split(MY_DATA, ncol(MY_DATA))
A sample data set might be something like:
pepsi 1
dr_pep 2
coke 1
Our data set has no headers, and splitting by the last column would give the desired grouping like the following:
dr_pep 2 --> group 2
pepsi 1 --> group 1
coke 1
df <- read.table(text = 'pepsi 1
dr_pep 2
coke 1', header = FALSE)
# split on the last column, whatever its position
split(df, df[, ncol(df)])
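To see what comes back, here is a small sketch on the same sample data. split() returns a named list with one data frame per distinct value of the last column; the variable name groups is arbitrary, and V1 is the default column name read.table() assigns when there is no header.

```r
df <- read.table(text = "pepsi 1\ndr_pep 2\ncoke 1",
                 header = FALSE, stringsAsFactors = FALSE)

# ncol(df) is the index of the last column, so this works for any width
groups <- split(df, df[[ncol(df)]])

names(groups)       # one list element per group value
groups[["1"]]$V1    # the drinks that fall in group 1
```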
