Switching the place of x and y axis data in R - r

I have a vector of data consisting of 20,000 numbers between 0 and 1. I want to plot this data with the number values on the x axis and their frequencies on the y axis.
Freq |
     |
     |
     |______________
          values
But when I use plot(vector) in R, it shows frequency on the x axis (labelled Index) and the number values on the y axis.
I couldn't find anything helpful among the arguments of the plot() function.
Does anybody know how I could do this?

If you want a plot of frequencies, the best type of plot to make would be a barplot, and the easiest way to make a barplot is to pass a table to barplot(). For example,
barplot(table(vector))
or if you just want a needle-style plot
plot(table(vector))
would also work.
If you want to trim outliers from the table, you could try
barplot( table( vector[vector<quantile(vector, .98)] ) )
Here we drop samples that are at or above the 98% quantile.

Related

Counts, bars, bins for each pandas DataFrame histogram subplot

I am making separate histograms of travel distance per departure hour. However, for making further calculations I'd like to have the value of each bin in a histogram, for all histograms.
Up until now, I have the following:
df['Distance'].hist(by=df['Departuretime'], color = 'red',
edgecolor = 'black',figsize=(15,15),sharex=True,density=True)
This creates in my case a figure with 21 small histograms.
With single histograms, I'd paste counts, bins, bars = in front of the entire line and the variable counts would contain the data I was looking for, however, in this case it does not work.
Ideally I'd like a dataframe or list of some sort for each histogram, containing the density values of the bins. I hope someone can help me out! Thanks in advance!
Edit:
Data I'm using, about 2500 columns of this, Distance is float64, the Departuretime is str
Histogram output I'm receiving
Of all these histograms I want to know the y-axis value of each bar, preferably in a dataframe with the distance binning as rows and the hours as columns
By using the 'cut' function you can extract the requested data directly from your dataframe instead of from the graph. This is less error-prone.
df['DistanceBin'] = pd.cut(df['Distance'], bins=10)
Then you can use pivot_table to obtain a table of counts for each combination of DistanceBin and Departuretime, as rows and columns respectively, as you asked. Passing values='Distance' restricts the count to that one column.
df.pivot_table(index='DistanceBin', columns='Departuretime', values='Distance', aggfunc='count')
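Since the question asks for density values rather than raw counts, here is a minimal sketch of one way to get them. It swaps in numpy.histogram with density=True instead of pd.cut, and it assumes one set of bin edges shared by all hours, which may differ from the bins each subplot picks on its own. The sample dataframe is made up for illustration; only the column names Distance and Departuretime come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Distance": rng.uniform(0, 100, size=500),
    "Departuretime": rng.choice(["07", "08", "09"], size=500),
})

# Assumption: 10 equal-width bins shared by every departure hour
edges = np.linspace(df["Distance"].min(), df["Distance"].max(), 11)

# One column of bin densities per hour; density=True normalises each
# histogram so that its area (density * bin width) sums to 1
densities = pd.DataFrame(
    {
        hour: np.histogram(group["Distance"], bins=edges, density=True)[0]
        for hour, group in df.groupby("Departuretime")
    },
    index=pd.Index(edges[:-1], name="bin_left_edge"),
)

print(densities)
```

The result has the distance bins as rows and the hours as columns, which is the layout asked for in the question.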

Using same X and Y axis for all par() plots [duplicate]

Let's say I have 10 observations sampled from 200 integers between one and ten:
mysample = sample(rep(seq(1,10),20),10);
and I want to barplot it
barplot(table(mysample));
In this example, there are no observations of 7. Is there a quick way of telling barplot to set the x-axis range to all integers between 1 and 10, or do I have to manually edit the table?
Try
barplot(table(factor(mysample, levels=1:10)));
By using a factor with explicit levels, R knows which levels are "missing" and table() reports them with a count of zero.

How to plot for repeating values in R

I am trying to plot an array in R with the same y-value for every x value. If a value is NA, it shouldn't be plotted.
I tried the following plot which shows the histogram for all 10 values.
plot(c(1,2,NA,3,4,5,3,NA,2,4),type='h', ylim=c(0,4))
However, for the case below, when I try to control the y-values, the repeated values are not considered in the plot.
plot(c(1,2,NA,3,4,5,3,NA,2,4), rep(1,10),type='h', ylim=c(0,4))
Is this possible with plot function? Please suggest if the same can be done with an alternative.
Please look again at the help page of ?plot.
In your second line you plot the y value 1 at the x values 1 to 5. The plot you get is exactly the plot you asked for, which is not the plot you cared for. In the first plot, your values are interpreted as the y values, not the x values. The x values in the plot are just the indices in the first example.
If you want to get the lines not plotted at the NA values, just do:
x <- c(1,2,NA,3,4,5,3,NA,2,4)
plot(!is.na(x), type = 'h')
Now you plot a TRUE (which is a value of 1) whenever there is a value, and FALSE (which translates to 0) whenever there is none.
This is exactly the same as:
xx <- ifelse(is.na(x),0,1)
plot(xx, type = 'h')
On a side note: please do not call this a histogram. A histogram represents counts for bins; this doesn't even come close to that.
plot(!is.na(c(1,2,NA,3,4,5,3,NA,2,4)),type='h', ylim=c(0,4))

How to plot a barchart or histogram in R, indicating the probability (discrete data)?

I am new to R. I have discrete data. I want to plot a chart (barchart or histogram) indicating for each existing value (in my data) the normalized number of occurrences (actual count for that value divided by total records). For the moment I have figured out to use:
hist(mydata$x,5,probability = TRUE)
where the number 5 corresponds to the number of rectangles. This example works if the base of the rectangle is length=1, therefore I would always need to know the range of results and I could not have data like {0, 0.5, 1, 1.5, ...}. How to make a more general solution? I really think that there is a single line solution, for something so basic.
Thanks
I assume you are looking for the combination
table()
barplot()
e.g.
counts <- table(mtcars$gear)
barplot(counts / sum(counts), main="Car Distribution", xlab="Number of Gears")
Yes. There is a line for this.
barplot(prop.table(table(data$x)))
data$x is a discrete variable.
table(data$x) will give you a table whose first row holds the different values of data$x and whose second row holds the frequency of each of those values.
prop.table(table(data$x)) will also give you a table. It is the same table, but each count is divided by the total count (the number of non-NA values in data$x), so you get the relative frequency of each value.
barplot will plot a bar chart. On the x-axis you get the first row of prop.table(table(data$x)), and on the y-axis you get the second row.

Plotting values over time in R

I have a vector of values: b=read.csv('https://dl.dropbox.com/u/22681355/b.csv')
I would like to plot them with the values 1:2000 on the x-axis representing time and the values of the vector on the y-axis.
When I plot them using hist(b) I get the opposite thing with values from 1:2000 on the y axis and the actual values on the x.
How can I reverse this?
Try barplot(b) or plot(b,type="b") instead.
(Your link doesn't work for me.)
