Categorising data within a range in R [duplicate] - r

This question already has answers here:
Convert continuous numeric values to discrete categories defined by intervals
(2 answers)
Closed 6 years ago.
I have a data frame in R that has a personal ID, an income and some other variables. I would like to add a new column to this data that categorises people into the income group they fall into (0-24,999, 25,000-49,999, 50,000-74,999, 75,000-99,999, etc.).
I then want to make frequency tables of these groups against some of the other variables (e.g. weekly hours worked, age).
I should be able to figure out the second part myself, but I am having trouble categorising the data. Any help would be greatly appreciated.
Thank you.

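We can use cut() or findInterval() to group the income column (called "Variable" here). With right = FALSE each interval includes its lower bound, so an income of exactly 0 is not dropped as NA:
gr <- cut(df1$Variable, breaks = c(0, 25000, 50000, 75000, 100000, Inf),
          right = FALSE)  # [0,25000), [25000,50000), ...
Then use table() to get the frequency counts against another variable: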
table(gr, df1$age)
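A self-contained sketch of both functions, for illustration. The data frame df1, its values and the group labels are hypothetical; the column name Variable is the one used in the answer. It assumes incomes are whole dollar amounts and uses right = FALSE so each group includes its lower bound.

```r
# Hypothetical example data: income in "Variable", plus an age column
df1 <- data.frame(
  id = 1:6,
  Variable = c(0, 24999, 25000, 62000, 99999, 150000),
  age = c(23, 35, 35, 41, 52, 60)
)

breaks <- c(0, 25000, 50000, 75000, 100000, Inf)
labels <- c("0-24,999", "25,000-49,999", "50,000-74,999",
            "75,000-99,999", "100,000+")

# cut() returns a factor; right = FALSE gives [0, 25000), [25000, 50000), ...
df1$income_group <- cut(df1$Variable, breaks = breaks, labels = labels,
                        right = FALSE)

# findInterval() returns the interval number (1, 2, ...) instead of a factor
df1$income_idx <- findInterval(df1$Variable, breaks)

# Cross-tabulate income group against age
table(df1$income_group, df1$age)
```

Both approaches use the same left-closed intervals here, so the integer from findInterval() matches the factor level position from cut(); cut() is usually handier when you want readable labels in the frequency table.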

Related

plot 3 continuous variables against each other in one plot [duplicate]

This question already has answers here:
Plot each column against each column
(2 answers)
Plot all pairs of variables in R data frame based on column type [duplicate]
(1 answer)
Closed 17 days ago.
I have the output of 3 different algorithms as continuous vectors. Instead of comparing their correlations one by one, I would like to plot them all simultaneously in the same plot, but in different panels. The data frame looks like this (but contains >10k ids):
df <- data.frame(id=1:5,
feature1=runif(5),
feature2=runif(5,min = 3,max=5),
feature3=runif(5, min = 5,max=8))
Ideally, the resulting plot should look something like this (the example image is not available):
I am fairly sure there is some simple tidyr function that reshapes my data frame so that I can simply use ggplot2 together with facet_grid, but I searched and couldn't find anything.
Any help is much appreciated!
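One way to get a panel per pair of features, sketched here with base R reshaping plus ggplot2: convert to long format, then join the long table to itself on id so that every (feature, feature) combination becomes one panel. The seed and the names long and pairs_df are arbitrary choices for illustration, and the plotting step only runs if ggplot2 is installed.

```r
set.seed(42)
df <- data.frame(id = 1:5,
                 feature1 = runif(5),
                 feature2 = runif(5, min = 3, max = 5),
                 feature3 = runif(5, min = 5, max = 8))

# Long format: one row per (id, feature). stack() stacks the columns in order,
# so the id vector is recycled once per feature.
long <- data.frame(id = df$id, stack(df[, -1]))
names(long) <- c("id", "value", "feature")

# Self-join on id: every pair (feature.x, feature.y) for each id
pairs_df <- merge(long, long, by = "id")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pairs_df, aes(x = value.x, y = value.y)) +
    geom_point() +
    facet_grid(feature.y ~ feature.x, scales = "free")
  print(p)
}
```

tidyr::pivot_longer() could replace the stack() step if you prefer the tidyverse. For a quick look without ggplot2, base R's pairs(df[, -1]) draws a similar scatter-plot matrix.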

How to collapse/sum-up a data-frame by not needed subpopulation variables in R? [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Group by multiple columns and sum other multiple columns
(7 answers)
Closed 12 months ago.
Screenshot: raw data-frame organization of COVID-Cases in Germany
I downloaded the reported COVID cases in Germany from an official website. The raw data frame has the following columns (see also the screenshot): "IdCounty", "NameCounty", "DateNotification", "AgeGroup", "Gender", "FreqCases".
What is a clean way in R to collapse this raw data frame over all categories of "AgeGroup" and "Gender", so that these two breakdown variables disappear and their cases are summed? Reason: I want to analyse the COVID cases by county and date, but I don't want to differentiate by age or gender, i.e. I just want the totals across all ages and genders.
I struggled with various functions to achieve this, but I am pretty sure there is a simple way to do it.
library(tidyverse)
data <- read_csv("https://example.de/covid.csv")
data %>%
  # group by county and date; AgeGroup and Gender are summed over
  group_by(IdCounty, NameCounty, DateNotification) %>%
  summarise(FreqCases = sum(FreqCases), .groups = "drop")
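For comparison, a base R sketch with aggregate(), run on a few made-up rows that use the column names from the question (the values, county names and age-group labels are invented). Any column left out of the formula, here AgeGroup and Gender, is summed over.

```r
# Hypothetical rows in the layout described in the question
data <- data.frame(
  IdCounty = c(1001, 1001, 1001, 1002),
  NameCounty = c("County A", "County A", "County A", "County B"),
  DateNotification = c("2021-03-01", "2021-03-01", "2021-03-02", "2021-03-01"),
  AgeGroup = c("15-34", "35-59", "15-34", "60-79"),
  Gender = c("M", "F", "M", "F"),
  FreqCases = c(3, 2, 5, 1)
)

# Sum FreqCases within each county/date combination
totals <- aggregate(FreqCases ~ IdCounty + NameCounty + DateNotification,
                    data = data, FUN = sum)
totals
```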

Count occurrences of value in a set of variables in R (per column) [duplicate]

This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 1 year ago.
I have this data and I want to find out how many ones and how many zeros are in each column (e.g. Arts and Crafts). I have tried different things, but nothing has worked. Does anyone have any suggestions?
You can use the table() function in R, which counts how often each value occurs. I have also used unlist() to convert a list to a plain vector; for an ordinary data frame column it is not strictly needed.
df1 <- read.csv("Your_CSV_file_name_here.csv")
table(unlist(df1$ArtsAndCrafts))
If you want to count the zeros and ones row-wise instead, you can refer to this question on Stack Overflow.
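To count zeros and ones in every column at once, one option is to apply table() column by column with sapply(). The data below are hypothetical; converting each column to a factor with fixed levels makes sure both 0 and 1 appear in every column's count, even when one of them is absent.

```r
# Hypothetical 0/1 data; column names are illustrative only
df1 <- data.frame(
  ArtsAndCrafts = c(1, 0, 1, 1, 0),
  Music = c(0, 0, 1, 0, 0)
)

# One column of counts per variable; rows are the values "0" and "1"
counts <- sapply(df1, function(x) table(factor(x, levels = c(0, 1))))
counts
```

If the columns contain only 0 and 1, colSums(df1 == 1) gives the number of ones per column directly.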

How can I separate categorical data from continuous data in a dataset in R? [duplicate]

This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 1 year ago.
I have a dataset and I need to plot histograms for all the continuous columns, which I know how to do. However, I can't use a loop because there are categorical columns too, and hist() fails on them with an error. Is there a way to separate the continuous data from the categorical data? If worst comes to worst, I can remove the categorical columns manually, but I would like to know if there's a way to do this automatically for future reference.
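You can use the dplyr package. The example below keeps only the numeric (continuous) columns; replace is.numeric with is.factor to keep the categorical ones instead.
library(dplyr)
num_data <- data %>%
  select(where(is.numeric))  # drop non-numeric (categorical) columns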
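Without dplyr, the same selection can be sketched in base R with vapply() and is.numeric(), followed by the histogram loop the question asks for. The built-in iris data set stands in for the asker's data. Note that integer-coded categorical columns also pass is.numeric(), so those would need converting to factors first.

```r
data <- iris  # built-in data: four numeric columns plus the factor Species

# TRUE for numeric (continuous) columns, FALSE for categorical ones
is_num <- vapply(data, is.numeric, logical(1))
num_data <- data[, is_num, drop = FALSE]

# One histogram per numeric column; categorical columns are never reached
for (col in names(num_data)) {
  hist(num_data[[col]], main = col, xlab = col)
}
```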

how to specify a column in a function in R [duplicate]

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 5 years ago.
I'm trying to create a function in R that takes two inputs: the dataset and the name of a column in it. When these are passed as arguments, I want to display every continent with its total population. With the code below, every continent is shown with a population of 0. How can I show the total population for each continent instead? Any help will be greatly appreciated, thanks.
continentLifeExp <- function(data, column){
continents <- group_by(data, Continent)
summarise(continents, population = sum(data$`column`))
}
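Assuming you want the population per continent, you should be able to just use
summarise_(continents, population = paste0("sum(", column, ")"))
summarise_() evaluates expressions given as strings. Note that the underscore verbs such as summarise_() are deprecated in current versions of dplyr.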
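Alternatively, use data[[column]]; as long as column is a string, this selects the right column. Inside summarise(), though, refer to the grouped data through the .data pronoun, because sum(data[[column]]) would sum the whole ungrouped column and give every continent the same total:
summarise(continents, population = sum(.data[[column]]))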
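A base R version of the whole function, as a sketch: continent_population and the countries data frame are hypothetical names for illustration, and the Continent column name is taken from the question. tapply() splits the chosen column by continent and sums each group.

```r
# Hypothetical function: total of `column` (a string) per continent
continent_population <- function(data, column) {
  # data[[column]] works because column is a character string
  totals <- tapply(data[[column]], data$Continent, sum)
  data.frame(Continent = names(totals), population = as.vector(totals),
             stringsAsFactors = FALSE)
}

# Hypothetical example data
countries <- data.frame(
  Country = c("X", "Y", "Z", "W"),
  Continent = c("Africa", "Africa", "Europe", "Asia"),
  Population = c(10, 20, 5, 7)
)

continent_population(countries, "Population")
```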
