I have a data frame called R5 with a single column of race times called RaceTimes.
I don't know which ggplot geom to use or how to get the percentile on the x-axis.
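One possible approach, as a rough sketch: compute each time's percentile rank with ecdf() and plot race time against it with geom_line(). The R5 data below is made up for illustration, and the column name Percentile is an assumption, not something from the original data.

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the asker's data frame R5.
R5 <- data.frame(RaceTimes = round(rnorm(200, mean = 1500, sd = 180)))

# Percentile rank of each time: the share of times at or below it, in percent.
R5$Percentile <- 100 * ecdf(R5$RaceTimes)(R5$RaceTimes)

# Percentile on the x-axis, race time on the y-axis.
p <- ggplot(R5, aes(x = Percentile, y = RaceTimes)) +
  geom_line() +
  labs(x = "Percentile", y = "Race time")
print(p)
```

If a cumulative-distribution view with time on the x-axis is acceptable instead, stat_ecdf() from ggplot2 draws that directly.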
Related
I have data in the given format:
(https://i.stack.imgur.com/Y6gFM.png)
That is, the categories are on the far left (row names) and the months are along the top (column names).
I need to create a single plot with one line for each Group.1 category, with month on the x-axis and the corresponding value on the y-axis. Initially I was thinking of creating a separate data frame for each category, with month and value as columns, but it seems there should be a better way. Any ideas?
I tried transposing the data frame, which would allow me to get the month and values correctly, but I would not know what category they would be for plotting the lines.
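A common way around this is to reshape the data to "long" format, so that category, month and value each get their own column. The sketch below assumes Group.1 is an ordinary column (not just row names) and uses made-up numbers in place of the screenshot; wide and long are hypothetical names.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the data in the screenshot.
wide <- data.frame(
  Group.1 = c("A", "B", "C"),
  Jan = c(10, 4, 7),
  Feb = c(12, 5, 6),
  Mar = c(9, 8, 11)
)

# One row per (category, month) pair.
long <- pivot_longer(wide, cols = -Group.1,
                     names_to = "month", values_to = "value")

# Keep the months in their original column order rather than alphabetical.
long$month <- factor(long$month, levels = names(wide)[-1])

# One line per Group.1 category on the same plot.
p <- ggplot(long, aes(x = month, y = value,
                      colour = Group.1, group = Group.1)) +
  geom_line()
print(p)
```

If the categories really are stored as row names, something like wide$Group.1 <- rownames(wide) would be needed first.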
I am plotting data that consists of intervals that are more or less constant, plus spikes that arise because the data are a quotient of two parameters. The very high and very low quotients are not relevant for my purpose, so I have been looking for a way to filter them out. The dataset contains 40k+ values, so I cannot remove the high/low quotients manually.
Is there any function that can trim/filter out the very large/small quotients?
You can use the filter() function from dplyr. This can create a new dataframe without outliers that you can then plot. For example:
library(dplyr)
no_spikes <- filter(original_df, x > -100 & x < 100)
This would create a new dataframe, no_spikes, that only contains observations where the variable x is between the values -100 and 100.
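If fixed cut-offs are hard to pick by eye, one variation (not part of the suggestion above) is to derive them from the data with quantile(). This is only a sketch: the toy data and the 1%/99% limits are assumptions to adjust for the real dataset.

```r
library(dplyr)

set.seed(1)
# Hypothetical data: a mostly flat series plus a few extreme quotients.
original_df <- data.frame(
  x = c(rnorm(1000, mean = 5, sd = 1), 1e6, -1e6, 5e4)
)

# Use the 1st and 99th percentiles as cut-offs (an assumed choice).
limits <- quantile(original_df$x, probs = c(0.01, 0.99), na.rm = TRUE)

# Keep only the observations strictly inside the cut-offs.
no_spikes <- filter(original_df, x > limits[[1]] & x < limits[[2]])
```

Note that quantile-based trimming always removes a fixed share of the points, even when there are no real spikes, so the probabilities should be chosen with that in mind.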
I have a set of true/false data I need to prepare for a chi-squared analysis in R. Currently it's organized by time of day in several lists. What would be the best way to add a variable to each of these lists for time of day, fill in each list's points with the time they were collected, then combine them into one table?
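As a rough sketch of one way to do this in base R, assuming each list is really a logical vector and the time of day can be used as the list name (obs, combined and counts are hypothetical names, and the values are made up):

```r
# Hypothetical observations, one logical vector per time of day.
obs <- list(
  morning = c(TRUE, FALSE, TRUE, TRUE),
  noon    = c(FALSE, FALSE, TRUE),
  evening = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Stack everything into one table, repeating each time-of-day label
# once per observation collected at that time.
combined <- data.frame(
  time_of_day = rep(names(obs), times = lengths(obs)),
  result      = unlist(obs, use.names = FALSE)
)

# Contingency table of time of day against TRUE/FALSE.
counts <- table(combined$time_of_day, combined$result)
```

The resulting counts table can be passed to chisq.test(); with toy data this small, R will warn that the approximation may be inaccurate.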
I have two data sets, one of which shows seasonality while the other shows a trend.
I have removed seasonality from the first data set but I am not able to remove trend from the other data set.
Also, if I remove trend from the other data set and then try to make a data frame of both the altered data sets, then the number of rows will be different for both the data sets (because I have removed seasonality from the first data set using lag, so there is a difference of 52 values in the two data sets).
How do I go about it?
For de-trending a time series you have several options; a commonly used one is the Hodrick-Prescott (HP) filter from the "mFilter" package:
library(mFilter)
a <- hpfilter(x, freq = 270400, type = "lambda", drift = FALSE)
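With type="lambda", freq is the smoothing parameter; the value 270400 is chosen for the weekly nature of the data, and drift=FALSE tells the filter that the series has no drift. The function calculates the cyclical and trend components and gives them to you separately (as a$cycle and a$trend).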
If the time indices for both your series are the same (e.g. weekly), you could use the following, where x and y are your dataframes and "week" is a placeholder for the name of the time column they share:
final <- merge(x, y, by = "week", all = FALSE)
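With all=FALSE, only rows present in both data sets are kept, which takes care of the differing row counts. You can always set all.x=TRUE (all.y=TRUE) to see which rows of x (y) have no matching output in y (x). See the documentation for merge (?merge).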
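To illustrate how merge behaves when the two series cover different weeks, here is a small sketch with made-up data. The column names week, deseasonalised and detrended are assumptions for the example only.

```r
# Hypothetical series: x lost its first rows to differencing, y runs longer.
x <- data.frame(week = 3:8, deseasonalised = c(0.4, -0.1, 0.3, 0.0, -0.2, 0.5))
y <- data.frame(week = 1:6, detrended = c(1.2, 0.8, -0.5, 0.1, 0.9, -0.3))

# Inner join: keeps only the weeks present in both data frames.
final <- merge(x, y, by = "week", all = FALSE)

# Left join: keeps every row of x and fills unmatched y values with NA.
check_x <- merge(x, y, by = "week", all.x = TRUE)
```

Here final holds weeks 3 to 6, while check_x shows NA in detrended for weeks 7 and 8, the rows of x with no match in y.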
Hope this helps.
I basically have 14 years of data. Within each year there are anywhere from 100 to 300 observations of age.
I am trying to create a data frame of all of the ages in one column.
If I try
test=data.frame(vals[[1]]$age)
I get a data frame of all of the ages of year 1.
If I try
for (i in 1:length(survey$years)) {test=data.frame(vals[[i]]$age)}
I get a data frame of the correct number of observations for all of the years, but all "NA" values.
There are "NA" values for some of the observations, and I'm assuming this is the problem: when I try it with a variable with no NA values (length), it works correctly. How can I get around the blank values?
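One thing worth noting is that the loop above overwrites test on every pass, so only the last year survives. A possible alternative, sketched here under the assumption that vals is a list of data frames that each have an age column, is to extract the column from every year and stack the results (the toy vals below is made up):

```r
# Hypothetical stand-in for `vals`: one data frame per year, with some NAs.
vals <- list(
  data.frame(age = c(23, NA, 41)),
  data.frame(age = c(35, 52)),
  data.frame(age = c(NA, 19, 60, 44))
)

# Pull the age column out of every year and stack it into one column.
ages <- unlist(lapply(vals, function(d) d$age))
test <- data.frame(age = ages)

# Optionally record which year (by position in the list) each row came from.
test$year <- rep(seq_along(vals), times = sapply(vals, nrow))

# Drop missing ages only if the analysis needs complete cases.
test_complete <- test[!is.na(test$age), , drop = FALSE]
```

The NA values are carried through as-is in test, so they do not need special handling unless they have to be removed.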