How to use the text() function in R?

I am learning graphical analysis with R. Here is some code that I cannot understand:
barplotVS <- barplot(table(mtcarsData$vs), xlab="Type of engine")
text(barplotVS,table(mtcarsData$vs)/2,table(mtcarsData$vs),cex=1.25)
The code draws a bar plot with a number written inside each bar. I cannot understand what text() does here. I searched for the text() function and found that its arguments x and y are numeric vectors of coordinates where the text labels should be written. Can anyone tell me what barplotVS, table(mtcarsData$vs)/2, table(mtcarsData$vs) and cex=1.25 mean in my code?

barplotVS <- barplot(table(mtcarsData$vs), xlab="Type of engine")
print(barplotVS)
Output:
     [,1]
[1,]  0.7
[2,]  1.9
These are the x-axis positions of the centers of the bars.
print(table(mtcarsData$vs))
Output:
 0  1
18 14
The top row shows the distinct values present in mtcarsData$vs, and the bottom row shows how many times each value occurs.
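As a rough analogy in Python (not R), table() does what collections.Counter does: it maps each distinct value to its number of occurrences. The vs list below is a hypothetical stand-in built to match the 18/14 counts above, not data read from mtcars.

```python
from collections import Counter

# Hypothetical stand-in for mtcarsData$vs: 18 zeros (V-shaped) and 14 ones (straight)
vs = [0] * 18 + [1] * 14

# Like R's table(): map each distinct value to its number of occurrences
counts = Counter(vs)

for value in sorted(counts):
    print(value, counts[value])
```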
When you run the function:
text(barplotVS,table(mtcarsData$vs)/2,table(mtcarsData$vs),cex=1.25)
the first argument gives the x positions of the labels (0.7 and 1.9); the second gives the y positions, in this case each count divided by two (9 and 7), which places each label halfway up its bar; the third gives the labels themselves (18 and 14); and finally cex scales the font size (1.25 draws the text 25% larger than the default).
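Where do 0.7 and 1.9 come from? Assuming barplot()'s defaults (each bar 1 unit wide, with a gap of 0.2 bar widths left before each bar), the center of bar i (counting from 0) is 0.2 + 1.2 * i + 0.5. The Python sketch below is an illustration, not R code. It recomputes the three coordinate arguments passed to text() from the counts.

```python
# Assumed default barplot() geometry: bar width 1, gap of 0.2 bar widths before each bar
WIDTH = 1.0
SPACE = 0.2

def bar_centers(n_bars, width=WIDTH, space=SPACE):
    """x position of the center of each bar, as barplot() returns them."""
    step = width * (1 + space)            # one bar plus the gap before the next
    return [space * width + i * step + width / 2 for i in range(n_bars)]

counts = [18, 14]                   # table(mtcarsData$vs) from above
x = bar_centers(len(counts))        # first argument of text(): 0.7 and 1.9
y = [c / 2 for c in counts]         # second argument: halfway up each bar
labels = [str(c) for c in counts]   # third argument: the text to draw

for xi, yi, lab in zip(x, y, labels):
    print(f"draw {lab!r} at ({xi:.1f}, {yi})")
```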
In general, R has good documentation, which you can open with the ? operator (e.g. ?text, as suggested in the comments). To understand code like this, run it step by step and check what each variable contains with the print() or str() functions. If you use an IDE such as RStudio, the Environment panel shows the contents of your variables, so you don't even need to print them.

Related

R Plot Multiple Lines According to Choice of User in function()

I want to plot my data inside a function(). My data consists of 4 vectors, say a, b, c and d.
I have to plot whichever vectors are chosen at call time.
For example,
if I want to plot vectors a and c, the graph must have 2 lines;
if I want to plot all 4 vectors, the graph must have 4 lines.
So far I have tried switch(), but I don't think it suits my problem.
Is it even possible to write such code in an anonymous function?
If yes, what is the right way, and if not, is there a workaround?

Plot along different dimensions

I have the following basic code. The first line sums p along dimension 1, creating a 1 × n array (a single row). The next line plots A. Unfortunately, Julia seems to assume it must plot many series (in this case single points) along dimension 2.
A = sum(p,dims = 1)
plot(A)
So my question is: how can I plot a simple line when the data is in a 1 × n array?
I assume you are using Plots.jl. The following is from the Plots.jl documentation:
If the argument [to plot] is a "matrix-type", then each column will map to a series, cycling through columns if there are fewer columns than series. In this sense, a vector is treated just like an "nx1 matrix".
The number of series plot(a) tries to plot is the number of columns in a.
To get a single series, you can do one of the following:
plot(vec(a)) # `vec` will give you a vector view of `a` without an allocation
plot(a') # or `plot(transpose(a))`. `transpose` does not allocate a new array
plot(a[:]) # this allocates a new array so you should probably avoid it
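To make the quoted rule concrete, here is a small Python model of it. It is not Plots.jl code. Lists of rows stand in for Julia matrices, and A is a hypothetical 1×4 result. Each column becomes one series, so a 1 × n row yields n one-point series, while its n × 1 transpose yields a single series.

```python
def series_from_matrix(rows):
    """Split a matrix (given as a list of rows) into one series per column,
    mimicking the rule 'each column will map to a series'."""
    n_cols = len(rows[0])
    return [[row[j] for row in rows] for j in range(n_cols)]

def transpose(rows):
    """Swap rows and columns (what a' or transpose(a) does for real numbers)."""
    return [list(col) for col in zip(*rows)]

A = [[4, 7, 1, 9]]   # hypothetical 1x4 result of sum(p, dims = 1)

print(len(series_from_matrix(A)))             # 4: one single-point series per column
print(len(series_from_matrix(transpose(A))))  # 1: the 4x1 matrix gives one series
```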

R: Finding duplicates in a data frame and recording them in vectors

I am trying to draw lines on a graph based on a third coordinate (x, y, temp). I would like to get a vector of indices for each repeated temperature, so that I can split the data into x and y vectors per temperature. To make this clearer, here is my actual data set:
[Image: data frame with x, y and temp columns]
I am trying to draw one line through all points that share the same temp value. For example, I would like the coordinates [0,14], [0,22], [0,26] and [0,28] to be on the same line, because they all have a temp value of 5.8. Once I find the duplicates, I will record their indices in a vector, which will let me retrieve the x and y coordinates. Also, I will not always know in advance how many rows the data.frame will have.
My question is: how can I find the duplicates and store their indices in a vector? Once I have the indices for the duplicate temps, I can grab their x and y coordinates and use them to create lines.
If you can answer my question or have any advice on how to do this better, all help is appreciated.
Consider the following:
df <- data.frame(temp = sample.int(n = 3, size = 5, replace = TRUE))
df
temp
1 3
2 3
3 1
4 3
5 1
duplicated(df$temp)
[1] FALSE TRUE FALSE TRUE TRUE
which(duplicated(df$temp))
[1] 2 4 5
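Note that duplicated() flags only the second and later occurrences of each value. The first row of each temperature (rows 1 and 3 here) is therefore not in that result. To draw one line per temperature you usually want every row of each group. In R, split(seq_along(df$temp), df$temp) returns such a list. The Python sketch below is an illustration, not R code. It mirrors both steps on the same five example values and uses 1-based indices to match R.

```python
from collections import defaultdict

temp = [3, 3, 1, 3, 1]   # same values as df$temp in the example above

# Equivalent of which(duplicated(temp)): 1-based positions of repeat occurrences
seen = set()
dup_idx = []
for i, t in enumerate(temp, start=1):
    if t in seen:
        dup_idx.append(i)
    seen.add(t)

# Equivalent of split(seq_along(temp), temp): every 1-based row index per value
groups = defaultdict(list)
for i, t in enumerate(temp, start=1):
    groups[t].append(i)

print(dup_idx)                       # [2, 4, 5]
print(dict(sorted(groups.items())))  # {1: [3, 5], 3: [1, 2, 4]}
```

Each index list can then pick out the matching x and y coordinates for one temperature.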
You've stated in the comments that you're looking to make an isopleth graph. The procedure you have described will not generate anything resembling an isopleth graph. Since your data appears to be arranged on a regular grid, you should do something like the solutions presented in this question and answer, which use functions specifically designed for extracting contours from a grid of values. Another option is the contourLines function in the grDevices package. If you want higher-resolution, less jagged contours, you could use the interp.surface or Krig functions from the fields package to interpolate your data to the resolution you require.

gnuplot computing stats over multiple columns

I have a simple 9-column file. I want to compute certain statistics for each column and then plot them (using gnuplot).
1) This is how I compute statistics for every column excluding the first one.
stats 'data' every ::2 name "stats"
2) In the output I can see that the operation is successful. Note that the number of columns/records is 8:
* FILE:
Records: 8
Out of range: 0
Invalid: 0
Blank: 0
Data Blocks: 1
* COLUMNS:
Mean: 6.5000 491742.6625
Std Dev: 2.2913 703.4865
Sum: 52.0000 3.93394e+06
Sum Sq.: 380.0000 1.93449e+12
Minimum: 3.0000 [0] 490312.0000 [2]
Maximum: 10.0000 [7] 492643.5000 [7]
Quartile: 4.5000 491329.5000
Median: 6.5000 491911.1500
Quartile: 8.5000 492252.2500
Linear Model: y = 121.8 x + 4.91e+05
Correlation: r = 0.3966
Sum xy: 2.558e+07
3) Now I can access statistics on the first 2 columns by appending _x and _y, like this:
print stats_median_x
print stats_median_y
My questions are:
How can I access statistics (let's say medians) for the remaining 6 columns?
How could I plot, let's say, a line through all the medians against some x axis?
I know that I can simply add a python script to pre-compute all this, but I would prefer to avoid it if there is an easy way to do it using gnuplot itself.
Thanks!
Short answer(s)
"How can I access statistics of the other column?"
With stats 'data' using n you access the nth column.
"How can I plot for example all medians?"
For example, set print combined with a do for loop can write a data file that you can then use for the plot.
A working solution
set print "StatDat.dat"
do for [i=2:9] { # Here you will use i for the column.
stats 'data.dat' u i nooutput ;
print i, STATS_median, STATS_mean , STATS_stddev # ...
}
set print
plot "StatDat.dat" us 1:2 # or whatever column you want...
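For comparison, here is a minimal Python sketch of what that loop computes. It assumes a whitespace-separated, all-numeric file. The helper names (column_stats, read_rows, write_stats) are hypothetical. The sketch uses the population standard deviation, which is what gnuplot's STATS_stddev reports. The file it writes can then be plotted as in the last line of the gnuplot script.

```python
import statistics

def column_stats(rows, first_col=2, last_col=9):
    """Return (column, median, mean, std dev) for 1-based columns
    first_col..last_col, like the gnuplot loop over 'stats ... u i'."""
    table = []
    for i in range(first_col, last_col + 1):
        col = [row[i - 1] for row in rows]      # gnuplot columns are 1-based
        table.append((i,
                      statistics.median(col),
                      statistics.mean(col),
                      statistics.pstdev(col)))  # population std dev, like STATS_stddev
    return table

def read_rows(path):
    """Read a whitespace-separated numeric file, skipping blank and '#' lines."""
    with open(path) as fh:
        return [[float(v) for v in line.split()]
                for line in fh
                if line.strip() and not line.lstrip().startswith("#")]

def write_stats(table, path):
    """Write one 'column median mean stddev' line per column,
    playing the role of set print "StatDat.dat" plus print."""
    with open(path, "w") as out:
        for i, med, mean, sd in table:
            out.write(f"{i} {med} {mean} {sd}\n")

if __name__ == "__main__":
    # Hypothetical 3-column data; with a real file you would call
    # write_stats(column_stats(read_rows("data.dat")), "StatDat.dat")
    demo = [[1, 2, 10], [2, 4, 20], [3, 6, 30]]
    for i, med, mean, sd in column_stats(demo, 2, 3):
        print(i, med, mean, round(sd, 4))
```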
A few more words about it
By asking gnuplot itself for help with help stats, you can read a lot of interesting things :-).
Syntax:
stats 'filename' [using N[:M]] [name 'prefix'] [[no]output]
This command prepares a statistical summary of the data in one or two columns of a file. The using specifier is interpreted in the same way as for plot commands. See plot for details on the index, every, and using directives.
From the first sentence we learn that stats summarizes one or at most two columns at a time (a pity; perhaps future versions will change this).
From the second sentence we learn that it follows the same using syntax as the plot command:
so stats 'data' using 3 gives you the statistics of the 3rd column as x,
and stats 'data' using 4:5 gives those of the 4th and 5th columns as x and y. With a single column the result variables carry no suffix (e.g. STATS_median, as in the loop above); with two columns they get the _x and _y suffixes.
Notes about your interpretations
You said
This is how I compute statistics for every column excluding the first one.
stats 'data' every ::2 name "stats"
Not really: this computes statistics for the first two columns, skipping the first two lines, because the point counter used by every starts from 0, not from 1.
As a consequence, when we read
Records: 8
it means that 8 lines were used: your file has 10 usable lines, every ::2 skips the first two, and 8 records remain for the statistics.
This also clarifies what help stats means by
STATS_records # total number of in-range data records
namely, the records actually used to compute these statistics.
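A quick Python check of this reading, under one assumption suggested by the Mean, Sum and Minimum values in the output: that the first column simply holds 1 to 10. Skipping 0-based points 0 and 1 leaves the values 3 to 10. The summary values of those 8 numbers reproduce the first column of the stats output.

```python
import statistics

first_column = list(range(1, 11))   # assumed contents of column 1: 1, 2, ..., 10
used = first_column[2:]             # 'every ::2' starts at point 2, counting from 0

print(len(used))                                          # Records: 8
print(sum(used), sum(v * v for v in used))                # Sum: 52, Sum Sq.: 380
print(statistics.mean(used), statistics.median(used))     # Mean and Median: 6.5
print(round(statistics.pstdev(used), 4))                  # Std Dev: 2.2913 (population)
print(min(used), max(used))                               # Minimum: 3, Maximum: 10
```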
Tested on gnuplot 4.6 patchlevel 4.
Also works on gnuplot version 5.0 patchlevel 1.

R spline function given a fixed space

I need to generate a spline to feed into another program, which only accepts a fixed spacing between consecutive points. I used the spline function in R with a given number of points to generate the spline; however, the floating-point cutoff makes the spacing between the points vary, for example:
spline(d$V1, d$V2, n=(max(d$V1)-min(d$V1))/0.0200)
> head(t.spl, 7)
x y
1 2.3000 -3.0204
2 2.3202 -3.0204
3 2.3404 -3.0204
4 2.3606 -3.0204
5 2.3807 -3.0204
6 2.4009 -3.0204
7 2.4211 -3.0204
So the spacing between the 1st and 2nd rows is 0.0202, while between the 4th and 5th rows it is 0.0201. Because of this, the other program that I am feeding this spline into doesn't accept it. Is there any way to make this work?
As an aside: please provide a reproducible example next time (I can't copy/paste your code because I don't have d or t.spl).
I think you'll find that the different intervals (0.0202 vs 0.0201) are an artifact of the number of digits R prints on the screen, not of the spline function.
It seems R is printing the values rounded to 4 decimal places for neatness; the rounding happens only when the results are displayed to you, not in the stored values.
You can see how many significant digits are displayed with options('digits')$digits, and change it with options(digits=new_number_of_digits) (see ?options for details).
For example:
options(digits=4)
pi
# 3.142
options(digits=10)
pi
# 3.141592654
In summary, when you feed the values into your other program, make sure you print them with enough decimal places that the other program accepts the intervals as being "equal".
If you are writing to a file, for example, make sure you write out enough digits. If you are copy-pasting from the R console, make sure you adjust R to print enough digits.
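The same effect is easy to reproduce outside R. This Python sketch uses a hypothetical, perfectly uniform step of 0.020183 starting at 2.3, chosen only to resemble the values in the question. The stored spacing is uniform up to floating-point error. The differences between the values printed to 4 decimal places, however, come out as a mix of 0.0201 and 0.0202.

```python
x0, h = 2.3, 0.020183               # hypothetical start and uniform step
xs = [x0 + i * h for i in range(7)]

# Spacing of the stored values: equal up to floating-point error
true_gaps = [b - a for a, b in zip(xs, xs[1:])]

# Spacing as seen on screen after rounding to 4 decimal places
shown = [float(f"{v:.4f}") for v in xs]
shown_gaps = [round(b - a, 4) for a, b in zip(shown, shown[1:])]

print(shown)        # what a 4-decimal printout would display
print(shown_gaps)   # mixes 0.0201 and 0.0202
print(max(abs(g - h) for g in true_gaps) < 1e-12)   # True

# Writing with more digits keeps the uniform spacing visible to other programs
print(" ".join(f"{v:.10f}" for v in xs))
```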
MathematicalCoffee is probably right. I'm just adding an alternative for the sake of wordiness.
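myspline <- splinefun(d$V1, d$V2)
desired_x_values <- seq(min(d$V1), max(d$V1), by = 0.02)  # uniform grid with step 0.02
mydata.y <- myspline(desired_x_values, deriv = 0)
Evaluating the spline function at x values you choose yourself guarantees the uniform x spacing you desire.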
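This also suggests why the step in the question is about 0.0202 rather than 0.02. When xout is not given, spline() evaluates at n equally spaced points spanning [xmin, xmax]. The step is therefore (max - min)/(n - 1), not range/n. The Python sketch below compares that step with a grid built from a fixed step, which is what evaluating the spline function at explicit x values achieves. It uses a hypothetical range of 2.3 to 4.3, because the question does not give max(d$V1).

```python
xmin, xmax = 2.3, 4.3      # hypothetical range of d$V1
n = 100                    # roughly (max - min) / 0.02, as in the question

# spline(..., n = n): n equally spaced points including both ends
step_n = (xmax - xmin) / (n - 1)
grid_n = [xmin + i * step_n for i in range(n)]

# Fixed-step grid, like seq(xmin, xmax, by = 0.02): x_i = xmin + i * 0.02
step = 0.02
n_fixed = round((xmax - xmin) / step) + 1
grid_fixed = [xmin + i * step for i in range(n_fixed)]

print(round(step_n, 6))                      # 0.020202, not 0.02
print(n_fixed, round(grid_fixed[-1], 10))    # 101 points ending at 4.3
```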
