How to specify a column in a function in R [duplicate]

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 5 years ago.
I'm trying to create a function in R that takes two inputs: the dataset and a column from that dataset. When the dataset and the column are passed as parameters, I want to display all the continents and the total population of each continent. I'm currently using the code below, but it displays every continent with a population of 0. How can I display the total population next to each continent instead of 0? Any help will be greatly appreciated, thanks.
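library(dplyr)

continentLifeExp <- function(data, column){
  continents <- group_by(data, Continent)
  # data$`column` looks for a column literally named "column", not the one whose
  # name is passed in; it returns NULL, and sum(NULL) is 0
  summarise(continents, population = sum(data$`column`))
}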

Assuming you want the population per continent, you should be able to just use:
summarise_(continents, population = paste0("sum(", column, ")"))
summarise_() evaluates expressions given as strings. Note that the underscore verbs such as summarise_() are deprecated in current dplyr.

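Try data[[column]] instead of data$`column`. As long as column is a string, this should work fine:
sum(data[[column]])
Note that inside summarise() this sums the whole, ungrouped data frame, so every continent would get the same grand total. To sum within each group, use dplyr's .data pronoun:
summarise(continents, population = sum(.data[[column]]))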
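For readers without dplyr, a base-R sketch of the same idea. It assumes a Continent column as in the question; the function name continentTotals and the example data (including the Pop column) are hypothetical. tapply() splits the looked-up column by continent and sums each group.

```r
# Hypothetical helper: total of a string-named column per continent (base R only).
continentTotals <- function(data, column) {
  # data[[column]] uses the value stored in `column` as the column name,
  # whereas data$column would look for a column literally called "column".
  totals <- tapply(data[[column]], data$Continent, sum)
  data.frame(Continent = names(totals), population = as.vector(totals))
}

# Hypothetical example data
countries <- data.frame(
  Continent = c("Africa", "Asia", "Africa", "Europe", "Asia"),
  Pop = c(10, 20, 5, 7, 3)
)
result <- continentTotals(countries, "Pop")
print(result)
```

For this toy data the result has one row per continent: Africa 15, Asia 23, Europe 7.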

Related

How do I sort my R data by a specific named condition inside a column? [duplicate]

This question already has answers here:
rearrange a data frame by sorting a column within groups
(3 answers)
Subsetting from a Data Frame
(2 answers)
Closed 2 months ago.
I have two columns in my dataset in RStudio right now. One is "experience level", which contains four different two-letter abbreviations ("SE", "MI", "EX", "EN") for the experience level of an employee. The second column is "salary", which is the employee's salary in USD. How can I create a data frame, or sort the data, by a specific experience level, for example showing only the salaries of "EN" employees?
I'm not even sure where to start. I have tried using group_by(), but to no avail.
Showing only the salaries that belong to one group can be done with filter(), and sorting can be done with arrange():
library(tidyverse)
df %>%
  filter(experience == "EN") %>%  # keep only the EN rows
  arrange(desc(salary))           # sort salaries in descending order (high to low)
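For comparison, a base-R sketch that needs no tidyverse: logical indexing does the filtering and order() does the sorting. The example data and the column names experience and salary are assumptions.

```r
# Hypothetical example data; column names are assumptions
df <- data.frame(
  experience = c("SE", "EN", "MI", "EN", "EX", "EN"),
  salary = c(120000, 45000, 80000, 60000, 200000, 52000)
)
# Keep only rows for entry-level ("EN") employees
en <- df[df$experience == "EN", ]
# Sort by salary, highest first
en <- en[order(en$salary, decreasing = TRUE), ]
print(en)
```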

How to collapse (sum up) a data frame over unneeded subpopulation variables in R? [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Group by multiple columns and sum other multiple columns
(7 answers)
Closed 12 months ago.
Screenshot: raw data frame of COVID cases in Germany
I downloaded the notified COVID cases in Germany from an official website. The raw data frame has the following columns (see also the screenshot): "IdCounty", "NameCounty", "DateNotification", "AgeGroup", "Gender", "FreqCases".
What is a clever way in R to collapse (sum up) this raw data frame over all categories of "AgeGroup" and "Gender", so that these two subpopulation breakdown variables disappear? Reason: I want to analyse the COVID cases by county and time point, but I don't want to differentiate further by age or gender, i.e. I just want the sum over all ages and all genders.
I have struggled with various functions to achieve this, but I am pretty sure there is a smart and easy way to do it.
library(tidyverse)
data <- read_csv("https://example.de/covid.csv")
data %>%
  # group by county and notification date; AgeGroup and Gender are summed over
  group_by(IdCounty, NameCounty, DateNotification) %>%
  summarise(FreqCases = sum(FreqCases))
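A base-R sketch of the same collapse using aggregate() with a formula: any column left out of the formula (here AgeGroup and Gender) is summed over. The miniature data frame is made up for illustration; only the column names come from the question.

```r
# Hypothetical miniature of the raw data (values are invented)
raw <- data.frame(
  IdCounty = c(1, 1, 1, 2, 2),
  NameCounty = c("A", "A", "A", "B", "B"),
  DateNotification = c("2020-03-01", "2020-03-01", "2020-03-02",
                       "2020-03-01", "2020-03-01"),
  AgeGroup = c("15-34", "35-59", "15-34", "15-34", "60-79"),
  Gender = c("M", "W", "M", "W", "M"),
  FreqCases = c(2, 3, 1, 4, 5)
)
# Sum FreqCases for every combination of county and date;
# AgeGroup and Gender are not in the formula, so they are collapsed.
collapsed <- aggregate(FreqCases ~ IdCounty + NameCounty + DateNotification,
                       data = raw, FUN = sum)
print(collapsed)
```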

Count occurrences of value in a set of variables in R (per column) [duplicate]

This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 1 year ago.
I have this data and I want to find out how many ones and how many zeros are in each column (e.g. Arts and Crafts). I have been trying different things, but nothing has worked. Does anyone have any suggestions?
You can use the table() function in R, which counts how often each distinct value occurs. I have also used unlist() to convert a list to a vector (for a single data frame column, which is already a vector, it is not strictly needed).
df1 <- read.csv("Your_CSV_file_name_here.csv")
table(unlist(df1$ArtsAndCrafts))
If you want to count the zeros and ones row-wise instead, you can refer to this question on Stack Overflow.
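To count zeros and ones in every column at once rather than one column at a time, one option is to apply table() to each column with sapply(). The data frame and its Sports column are hypothetical.

```r
# Hypothetical 0/1 data; column names are assumptions
df1 <- data.frame(
  ArtsAndCrafts = c(1, 0, 1, 1, 0),
  Sports = c(0, 0, 1, 0, 0)
)
# Count 0s and 1s in every column. Turning each column into a factor with
# fixed levels keeps both counts even when a value never occurs.
counts <- sapply(df1, function(col) table(factor(col, levels = c(0, 1))))
print(counts)
```

The result is a matrix with one row per value ("0", "1") and one column per variable.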

Need help creating a new data frame (from data on countries over several years) with all values from the year 2016 [duplicate]

This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Closed 1 year ago.
My R knowledge is pretty limited, but I have to do an analysis for a project that is due in a few days, and I was hoping I could get some quick help here!
I created this dataset https://1drv.ms/x/s!ArVyXA5cSMj2h7Mf07SZaVUSK3421Q?e=GQBfeU but only want to use the data for the year 2016.
I would like either to create a new data frame containing only the rows with year = 2016, or to run my linear regression on the original data frame using only the 2016 data; either way will work fine!
I tried googling this, but I wasn't sure what to search for.
To create a new data frame containing only the 2016 rows, we could use dplyr's filter() on your data frame df:
library(dplyr)
new_dataframe <- filter(df, year == 2016)
If the data is still in the Excel file, read it with readxl first and then filter:
library(readxl)
library(dplyr)
co2_open_cvs <- read_excel("path_to_file/co2_open_cvs.xlsx")
co2_open_cvs_only_2016 <- co2_open_cvs %>% filter(year == 2016)
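A base-R sketch covering both options from the question: subsetting the rows, or fitting the regression on the original data frame while restricting it to 2016 through lm()'s subset argument. The data and the columns gdp and co2 are hypothetical stand-ins for the linked dataset.

```r
# Hypothetical country-year data; column names are assumptions
df <- data.frame(
  country = rep(c("A", "B", "C", "D"), times = 2),
  year = rep(c(2015, 2016), each = 4),
  gdp = c(1, 2, 3, 4, 1, 2, 3, 4),
  co2 = c(5, 6, 7, 8, 3, 5, 7, 9)
)
# Option 1: keep only the 2016 rows (base R, no packages needed)
df_2016 <- df[df$year == 2016, ]
# Option 2: fit on the original data frame, restricted to 2016 rows
fit <- lm(co2 ~ gdp, data = df, subset = year == 2016)
print(coef(fit))
```

For this toy data the 2016 rows satisfy co2 = 1 + 2 * gdp exactly, so the printed coefficients are approximately 1 and 2.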

Categorising data within a range in R [duplicate]

This question already has answers here:
Convert continuous numeric values to discrete categories defined by intervals
(2 answers)
Closed 6 years ago.
I have a data frame in R that has a personal ID, an income and some other variables. I would like to add a new column that assigns each person to an income group (0-24,999, 25,000-49,999, 50,000-74,999, 75,000-99,999, etc.).
I then want to be able to make frequency tables of this data against some of the other variables (e.g. weekly hours worked, age).
I should be able to figure out the latter problem myself, but I am having trouble categorising my data. Any help would be greatly appreciated.
Thank you.
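We can use cut() or findInterval() to group the income column (called Variable here). With right = FALSE the intervals are [0, 25000), [25000, 50000), ..., so an income of exactly 0 is not dropped as NA:
gr <- cut(df1$Variable, breaks = c(0, 25000, 50000, 75000, 100000, Inf), right = FALSE)
Then use table() to get the frequency counts, e.g. against age: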
table(gr, df1$age)
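A sketch that attaches the question's income bands as readable labels and compares cut() with findInterval(). The data frame, its age column and the label text are assumptions.

```r
# Hypothetical incomes and ages; column names are assumptions
df1 <- data.frame(
  Variable = c(0, 12000, 25000, 49999, 60000, 99999, 150000),
  age = c(22, 35, 35, 41, 22, 50, 41)
)
breaks <- c(0, 25000, 50000, 75000, 100000, Inf)
labels <- c("0-24,999", "25,000-49,999", "50,000-74,999",
            "75,000-99,999", "100,000+")
# right = FALSE closes each interval on the left: [0, 25000), [25000, 50000), ...
df1$income_group <- cut(df1$Variable, breaks = breaks, labels = labels, right = FALSE)
# findInterval() returns the interval's index instead of a factor label
df1$income_index <- findInterval(df1$Variable, breaks)
# Frequency table of income group against age
print(table(df1$income_group, df1$age))
```

cut() is convenient when you want labelled factor levels for tables; findInterval() is handy when an integer group index is enough.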
