I have one continuous response variable and six output variables, four of which are categorical.
I need to plot a line diagram for every unique combination of the categorical variables in the data frame, with respect to x and y.
Please help me.
I'm working with a dataset where I have one continuous variable (V1) and want to see how that variable differs depending on demographics such as sex, age group etc.
I would like to make one graph that contains multiple boxplots, so that V1 is on the y-axis and all my demographic variables (sex, age group etc.) are on the x-axis with their corresponding p-values. Does anyone know how to do this in R?
I've added two photos to illustrate my dataset and the output I want.
Thanks!
Output example
Data example
It would be nice to have the actual data and the code you already have, so we can replicate what you have and work out what you want. That being said, this link might be what you are looking for:
https://statisticsglobe.com/draw-multiple-boxplots-in-one-graph-in-r#example-2-drawing-multiple-boxplots-using-ggplot2-package
Scroll down about halfway to Example 4: Drawing Multiple Boxplots for Each Group Side-by-Side.
I have several data files and I showed their intersections with an upset plot. I now want to know what the unique values in each dataset are. For example, as in this picture, how can I extract the names/values of the 232 sets in the Thriller category?
I first used union to combine all my data into a single data frame and then ran setdiff(data1,all) to identify the unique values, but it returned nothing, even though my real upset plot shows 10 values unique to data1.
Thanks.
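One possible reason the setdiff call comes back empty: if all is the union of every dataset, it already contains everything in data1, so nothing can be left over. A minimal sketch, assuming the datasets can be treated as plain vectors of names (data2, data3 and the values below are made up for illustration), compares data1 against the union of the other sets only:

```r
# Hypothetical sets standing in for the real data files
data1 <- c("a", "b", "c", "x")
data2 <- c("b", "c", "d")
data3 <- c("c", "d", "e")

# The union of ALL sets contains every element of data1,
# so this difference is always empty
all_sets <- union(data1, union(data2, data3))
empty_result <- setdiff(data1, all_sets)

# Compare data1 against the union of the OTHER sets only
others <- union(data2, data3)
unique_to_data1 <- setdiff(data1, others)
print(unique_to_data1)
```

The same pattern can be repeated for each dataset in turn, each time leaving that dataset out of the union.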
I apologize if this has been asked before, but I could not find the answer I needed when there are three grouping variables.
I need to fill a dataframe with all possible combinations of the grouping variables, inserting NAs for the non-grouping values when a combination does not appear. Say there is a dataframe with three grouping variables: Year, Geography, and Grouping:
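Year <- rep(2008:2019,each=50)
Geography <- rep(1:60,each=10)
Grouping <- rep(1:4,each=150)
# draw 600 random values (seq() around rnorm() would only return 1:600)
value <- rnorm(600,mean=0,sd=1)
# include all three grouping variables alongside the value
df=data.frame(Year,Geography,Grouping,value)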
But the dataframe is missing some random observations like so:
df2=df[-c(15,60,150,510),]
How would one go about changing the dataframe back into a length of 600 (which is the length it would be if all possible combinations of three grouping variables were present), but inserting NAs where the value would be if the combinations were in the dataframe? Note that all unique observations for each grouping variable are present in the dataset at some point.
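One way to approach this, sketched here in base R on a deliberately small made-up example with two grouping variables, is to build every combination with expand.grid and left-join the observed rows onto it with merge, so that missing combinations get NA. Note that fully crossing the unique values of Year, Geography and Grouping in the example above would give 12 x 60 x 4 = 2880 rows rather than 600, because those variables are nested; the sketch below assumes a fully crossed design.

```r
# Hypothetical small example: 2 years x 3 regions, fully crossed
full <- expand.grid(Year = 2008:2009, Geography = 1:3)
full$value <- seq_len(nrow(full))

# Drop one row to mimic a missing combination (Year 2009, Geography 2)
observed <- full[-4, ]

# Every combination of the grouping values that occur in the data
all_combos <- expand.grid(
  Year = unique(observed$Year),
  Geography = unique(observed$Geography)
)

# Left join: combinations without a match get NA in value
filled <- merge(all_combos, observed,
                by = c("Year", "Geography"), all.x = TRUE)
print(filled)
```

With three grouping variables the same idea applies: add the third variable to expand.grid and to the by argument.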
I have two data sets, one of which shows seasonality while the other shows a trend.
I have removed seasonality from the first data set but I am not able to remove trend from the other data set.
Also, if I remove trend from the other data set and then try to make a data frame of both the altered data sets, then the number of rows will be different for both the data sets (because I have removed seasonality from the first data set using lag, so there is a difference of 52 values in the two data sets).
How do I go about it?
For de-trending a time series you have several options; a commonly used one is the Hodrick-Prescott (HP) filter from the "mFilter" package:
a <- hpfilter(x,freq=270400,type="lambda",drift=FALSE)
With type="lambda", freq is the smoothing parameter lambda; 270400 is chosen here for the weekly nature of the data. drift=FALSE tells the filter that the series has no drift to remove. The function calculates the cyclical and trend components and returns them separately, as a$cycle and a$trend.
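If the time indices for both your series are the same (i.e. weekly), you could merge them on the column that holds the time index, where x and y are your data frames and "week" is a placeholder for that column's name:
final <- merge(x,y,by="week",all=FALSE)
With all=FALSE only rows whose index appears in both x and y are kept, so both series end up with the same number of rows. You can always set all.x=TRUE (all.y=TRUE) to see which rows of x (y) have no matching output in y (x). See the documentation for merge (?merge).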
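To illustrate the row-count issue from the question, here is a minimal sketch with made-up weekly data, where the second series starts 52 weeks later than the first. The column names week, x_adj and y_adj are assumptions for the example only.

```r
set.seed(42)

# Hypothetical weekly series: y is missing its first 52 weeks
x <- data.frame(week = 1:156, x_adj = rnorm(156))
y <- data.frame(week = 53:156, y_adj = rnorm(104))

# Inner join: keep only the weeks present in both series
final <- merge(x, y, by = "week", all = FALSE)

# Keep all rows of x; weeks with no match in y get NA in y_adj
unmatched <- merge(x, y, by = "week", all.x = TRUE)

print(nrow(final))
print(sum(is.na(unmatched$y_adj)))
```

The inner join drops the 52 weeks that exist in only one series, so both adjusted columns line up row by row.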
Hope this helps.
I have a data frame with 12 continuous variables and one grouping categorical response factor, containing two classes (G8 and V4).
I want to shuffle the rows in the data frame 10 times, so I acquire 10 different variations of the data frame to test. I want to use each version of the data frame to test a classifier algorithm. The code I am using is:-
Data(LDA.scores)
shuffle.cross.validation<-LDA.scores[sample(nrow(LDA.scores[2:13])),]
However, when I use this code, the categorical response factor strings transform into zero values when the data frame is shuffled. This defeats the object because the response variable is the grouping factor to classify the continuous variables. Thank you if anyone has a solution.
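A minimal sketch of one way to get 10 shuffled copies, using a small made-up data frame in place of LDA.scores (the column names group, v1 and v2 are assumptions). Indexing whole rows with a permuted row order moves every column together, so the factor column should stay a factor:

```r
set.seed(1)

# Hypothetical stand-in for LDA.scores: a factor response plus numeric columns
scores <- data.frame(
  group = factor(rep(c("G8", "V4"), each = 5)),
  v1 = rnorm(10),
  v2 = rnorm(10)
)

# Each element is the full data frame with its rows in a random order
shuffles <- lapply(1:10, function(i) scores[sample(nrow(scores)), ])

# The response column is still a factor with both classes
str(shuffles[[1]]$group)
```

If the factor still turns into zeros with the real data, it may be worth checking str(LDA.scores) before shuffling to confirm how the response column is stored.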