Working on the Titanic dataset and using sm.binning to categorize continuous variables into categorical ones - r

Is there a way to automate the categorizing, the way the function cut does, when using sm.binning? https://imgur.com/a/jy637
Basically, I want a new variable that automatically puts each value into the right category without writing a bunch of nested ifelse statements.
Thanks!!
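A minimal sketch of what this could look like with base R's cut(), assuming the bin boundaries are already available as a numeric vector. The ages, break points, and labels below are made up for illustration and are not taken from the Titanic data or from sm.binning output.

```r
# Hypothetical ages and cut points; replace with your own column and breaks.
age <- c(2, 15, 30, 45, 70, NA)
breaks <- c(0, 12, 18, 40, 60, Inf)
labels <- c("child", "teen", "adult", "middle-aged", "senior")

# right = FALSE makes each interval closed on the left: [0, 12), [12, 18), ...
age_group <- cut(age, breaks = breaks, labels = labels, right = FALSE)

# Missing ages stay NA instead of being forced into a category.
table(age_group, useNA = "ifany")
```

The result is a factor, so it can be stored as a new column (for example `df$age_group <- age_group`) without any nested ifelse calls.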

Related

Is there a way to plot a variable that can take multiple levels at once with ggplot

I have a data set of events I went to where one variable/column is the person that went with me. Sometimes it is one person, sometimes a group of multiple people.
I'm having trouble plotting this with ggplot. I've tried formatting the data, but either the list of people is displayed as one cohesive entry without the individual people in it, or only the first person in the group is shown.
Is it possible to format the data so that one variable can take multiple levels at once? Or is there some package for ggplot that allows that?
Thanks a lot :)
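One possible approach, sketched under the assumption that the companions are stored as a single comma-separated string per event: split the string and give every person their own row, so that each row carries exactly one level. The data frame and its column names (`events`, `companions`) are hypothetical.

```r
# Hypothetical data: one row per event, companions as a comma-separated string.
events <- data.frame(
  event = c("concert", "dinner", "hike"),
  companions = c("Anna", "Anna, Ben", "Ben, Cara, Anna"),
  stringsAsFactors = FALSE
)

# Split each string into individual names, then repeat the event once per name.
parts <- strsplit(events$companions, ",\\s*")
long <- data.frame(
  event = rep(events$event, lengths(parts)),
  person = unlist(parts),
  stringsAsFactors = FALSE
)

# Each person now counts once per event they attended.
person_counts <- table(long$person)

# With ggplot2 installed, the long format plots directly; print(p) draws it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = person)) + ggplot2::geom_bar()
}
```

The trade-off is that an event with three companions appears three times in `long`, so counts are per person, not per event.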

How can I create a matrix of scatterplots that summarize interactions between two sets of variables?

I have two sets of variables, one of environmental data, and one of response data. I want to make a matrix of scatterplots that shows how each of my response variables interacts with each of my environmental variables.
I have tried adapting the GGally package's ggpairs() function, but it seems to only work within one set of variables, showing the correlation between all combinations of variables. I am not interested in how my response variables correlate with each other or how my environmental variables correlate with each other. Showing this extra information would not be an issue, except that I have many more environmental variables than response variables, so if I create a symmetrical correlogram, my environmental variables dominate the figure and make it difficult to read. I could possibly remove columns and rows of the matrix of plots to create a figure with only one set of variables on each of the larger axes, but I cannot find any way to 'crop' the correlogram this function creates.
Is there some way to adapt ggpairs() to my needs? Or is there another function that would better suit my purposes?
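A rough sketch of one way to get a rectangular (not symmetric) grid: reshape the data so that every environment/response pair becomes its own block of rows, then facet on the two variable names. It uses mtcars as stand-in data, and the split into "environmental" and "response" columns is purely illustrative. GGally also has a ggduo() function that takes separate columnsX and columnsY arguments and may fit this use case directly.

```r
# Stand-in data: treat three mtcars columns as "environmental" and two as "response".
env_vars <- c("wt", "hp", "disp")
resp_vars <- c("mpg", "qsec")

# One row per (environmental, response) pair.
pairs <- expand.grid(env = env_vars, resp = resp_vars, stringsAsFactors = FALSE)

# Stack the x/y values of every pair into one long data frame.
long <- do.call(rbind, lapply(seq_len(nrow(pairs)), function(i) {
  data.frame(
    env = pairs$env[i],
    resp = pairs$resp[i],
    x = mtcars[[pairs$env[i]]],
    y = mtcars[[pairs$resp[i]]],
    stringsAsFactors = FALSE
  )
}))

# Rows of the grid are response variables, columns are environmental variables.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(long, ggplot2::aes(x = x, y = y)) +
    ggplot2::geom_point() +
    ggplot2::facet_grid(resp ~ env, scales = "free")
}
```

Because only cross-set pairs are built, the response-by-response and environment-by-environment panels never appear.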

Interaction between a long list of variables and one specific variable

I have a long list of variables that I want to throw into a logistic model using the caret package. However, I want to interact all the variables in the list with one specific variable in the list. For example, I have variables v1, v2, ..., v100 and I want to interact all of these variables with v5. I'm not sure if there's a handy way of doing this efficiently and quickly without having to type out each pair. Any help would be great. Thanks!
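One way this might be done is to build the formula programmatically instead of typing each pair. The sketch below uses six made-up variable names and a hypothetical response `y`; the same pattern scales to v1 through v100.

```r
# Hypothetical predictor names and response; adjust to your data.
vars <- paste0("v", 1:6)
focal <- "v5"
others <- setdiff(vars, focal)

# Main effects for every variable plus an interaction of each one with v5.
rhs <- c(vars, paste0(others, ":", focal))
f <- reformulate(rhs, response = "y")

# The expanded model terms: 6 main effects and 5 interactions.
term_labels <- attr(terms(f), "term.labels")

# The formula can then go to any modelling function with a formula interface,
# for example (assuming a data frame `dat` that holds these columns):
# fit <- caret::train(f, data = dat, method = "glm", family = binomial)
```

When the data frame contains only the response and the predictors, the shorthand `y ~ . * v5` should expand to the same set of terms.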

Apply same subset to multiple datasets in R

I am pretty new to R, but I am working with several datasets containing the same kind of data, only from different days.
For my analysis I only need some specific columns from each dataset, so I created a new dataset with only those columns (I do not want to overwrite or delete the old dataset). I am using the following code to do this:
subset01012018 <- dataset01012018[, c(1, 2, 3, 4, 10, 11, 14, 15, 16)]
Now I want to apply the same to all the datasets. How could I do something like this? Could I do this with a for loop? Or do I need an apply function?
Hope someone can help me!
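A sketch of the usual pattern: keep the datasets in a named list and apply the same column selection to each element with lapply(). The two data frames here are dummy stand-ins with 16 columns, and the second name (`dataset02012018`) is made up.

```r
# Dummy stand-ins for the real daily datasets (3 rows, 16 columns each).
make_dummy <- function() as.data.frame(matrix(seq_len(3 * 16), nrow = 3))
datasets <- list(
  dataset01012018 = make_dummy(),
  dataset02012018 = make_dummy()
)
# If the data frames already exist in the workspace, something like
# datasets <- mget(ls(pattern = "^dataset")) can collect them into a list.

# The column positions from the question.
keep_cols <- c(1, 2, 3, 4, 10, 11, 14, 15, 16)

# Apply the same subset to every dataset; the originals stay untouched.
subsets <- lapply(datasets, function(d) d[, keep_cols])
names(subsets) <- sub("^dataset", "subset", names(datasets))
```

Each result is then available as, for example, `subsets$subset01012018`. A for loop would work too, but the list keeps related objects together and avoids creating many separately named variables.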

Stacked or Grouped Barcharts from Weighted Survey Data (Class="survey.design2" "survey.design") in R

I am working with weighted survey data of the class survey.design2 and survey.design. With the package survey, and the function call svytable, I can create contingency tables for survey data. With these contingency tables, I can then create normal bar-charts using lattice. The standard way for doing this (e.g. barchart(cars ~ mpg | factor(cyl), data=mtcars,...)) doesn't work for this data type.
I am used to working with ggplot2, and would like to create either stacked or grouped bar-charts, if possible even with facet-wraps. Unfortunately, ggplot2 does not know how to deal with data of the type survey.design2 either. As far as I am aware, there is also no add-on that would allow ggplot2 to deal with this kind of data.
So far I have:
subset my data set,
converted it into class survey.design2 with the function call svydesign(),
plotted multiple bar-charts in one window using grid.arrange(). This sort of provides a workaround for faceting, but still doesn't allow me to create stacked or grouped bar-charts.
I'd be grateful for any suggestions.
Thank you
Good morning MatthewR
I have a data set with 62732 observations and 691 variables.
Original Data Set
So any example based on a random number generator should work as well, I guess. I am really just interested in a workaround for this issue, not necessarily the final code.
I then convert the data frame into survey.design format using:
df_Survey <- svydesign(id=~1, weights=~IXPXHJ, data=df)
IXPXHJ is the variable by which the original sample data set is weighted to represent the entire population. head(df$IXPXHJ) looks something like this:
87.70876
78.51809
91.95209
94.38899
105.32005
56.30210
str(df_Survey) looks something like this.
Survey Data Structure
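One possible workaround, sketched with stand-in data: turn the weighted contingency table into an ordinary data frame, which ggplot2 can plot like any other. The example uses mtcars with an invented weight column `w` in place of the real data and IXPXHJ. If the survey package is not installed, it falls back to xtabs(), which sums the weights per cell for this simple design.

```r
# Stand-in data: mtcars with a made-up weight column `w`.
df <- mtcars
df$w <- rep(c(0.5, 1, 1.5, 2), length.out = nrow(df))

if (requireNamespace("survey", quietly = TRUE)) {
  design <- survey::svydesign(id = ~1, weights = ~w, data = df)
  tab <- survey::svytable(~cyl + gear, design)
} else {
  # Fallback without the survey package: sum of weights in each cell.
  tab <- xtabs(w ~ cyl + gear, data = df)
}

# A plain data frame with columns cyl, gear and Freq (the weighted count).
counts <- as.data.frame(tab)

# Stacked bars; position = "dodge" gives grouped bars instead,
# and adding ggplot2::facet_wrap(~ gear) would split the chart into panels.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(counts, ggplot2::aes(x = cyl, y = Freq, fill = gear)) +
    ggplot2::geom_col(position = "stack")
}
```

Since `counts` already holds the weighted totals, geom_col() is used here because it plots the values as given, whereas geom_bar() would count rows.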
