I'm trying to learn R by graphing 3 series from some data collected in a Physics lab. I have 3 csv files:
r.csv:
"Voltage","Current"
2,133
4,266
6,399
8,532
10,666
And there are two more. I load them like this:
r <- read.csv("~/Desktop/r.csv")
rs <- read.csv("~/Desktop/rs.csv")
rp <- read.csv("~/Desktop/rp.csv")
Then, I combine them into a matrix:
data <- cbind(r, rs, rp)
When I display data, I get this extra column on the left, which I thought was just there because the matrix was being displayed:
data
Voltage Current Voltage Current Voltage Current
1 2 133 2 270 2 67.4
2 4 266 4 535 4 134.3
3 6 399 6 803 6 200.0
4 8 532 8 1070 8 267.0
5 10 666 10 1338 10 334.0
But the graph shows an extra series:
matplot(data, type = c("b"), pch=5, col = 1:5, xlab="Voltage (V)", ylab="Current (A)")
Also, why is the x axis all wrong? The data points are at x = 2, 4, 6, 8, and 10, but the graph shows them at 1, 2, 3, 4, and 5. What am I missing here?
help(matplot) for starters:
matplot(x, y, type = "p", lty = 1:5, lwd = 1, lend = par("lend"),
pch = NULL,....
says:
x,y: vectors or matrices of data for plotting. The number of rows
should match. If one of them are missing, the other is taken
as ‘y’ and an ‘x’ vector of ‘1:n’ is used. Missing values
(‘NA’s) are allowed.
You've only given one of x and y, so matplot has treated your whole matrix as y and used a vector of 1:5 on the x axis. matplot isn't psychic enough to realise your matrix is a mix of x and y columns.
The solution is something like:
matplot(data[,1], data[,-c(1,3,5)], ...)
which takes the x coordinates from the first Voltage column, then builds the y matrix by dropping all the Voltage columns with negative indexing, leaving just the Current columns. data[,c(2,4,6)] would work just as well.
Note this only works because your three series have all the same X-values. If this isn't the case and you want to plot multiple lines then I reckon you should look into the ggplot2 package...
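Putting it together, a self-contained sketch (the three data frames are rebuilt inline from the values in the question instead of being read from CSV files):

```r
# Rebuild the three series from the question (normally read.csv from disk)
r  <- data.frame(Voltage = c(2, 4, 6, 8, 10), Current = c(133, 266, 399, 532, 666))
rs <- data.frame(Voltage = c(2, 4, 6, 8, 10), Current = c(270, 535, 803, 1070, 1338))
rp <- data.frame(Voltage = c(2, 4, 6, 8, 10), Current = c(67.4, 134.3, 200, 267, 334))
data <- cbind(r, rs, rp)

# x from the first Voltage column, y from the three Current columns
matplot(data[, 1], data[, c(2, 4, 6)], type = "b", pch = 5, col = 1:3,
        xlab = "Voltage (V)", ylab = "Current (A)")
```

This plots three series (not six) and puts the points at x = 2, 4, 6, 8, 10 as intended.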
Related
I have a data frame of 48 samples of zinc concentration readings. I am trying to group the data as Normal, High and Low (0-30 low, 31-80 normal, above 80 high) and then plot a scatter plot with a different pch for each group.
Here are the first 5 entries:
sample concentration
1 1 71.1
2 2 131.7
3 3 13.9
4 4 31.7
5 5 6.4
THANKS
In general, please try to include sample data by using dput(head(data)) and pasting the output into your question.
What you want is called binning (grouping is a very different operation in R terms).
The standard function here is cut:
numericVector <- runif(100, min = 1, max = 100) # sample vector
cut(numericVector, breaks = c(0, 30, 80, Inf), right = TRUE, labels = c("0-30", "31-80", "above 80"))
Please check the function documentation to adapt the parameters to your specific case.
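To then get the scatter plot with a different pch per group, something along these lines should work (a sketch with simulated data, since the full 48-sample data set isn't shown in the question):

```r
# Simulated stand-in for the 48 zinc readings from the question
set.seed(1)
zinc <- data.frame(sample = 1:48, concentration = runif(48, min = 1, max = 150))

# Bin into the three groups with cut()
zinc$group <- cut(zinc$concentration, breaks = c(0, 30, 80, Inf),
                  labels = c("Low", "Normal", "High"))

# One plotting symbol per group, indexed by the factor level
plot(zinc$sample, zinc$concentration,
     pch = c(1, 16, 17)[as.integer(zinc$group)],
     xlab = "Sample", ylab = "Zinc concentration")
legend("topright", legend = levels(zinc$group), pch = c(1, 16, 17))
```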
I'm trying to cut() my data D into 3 pieces: [0-4], [5-12], [13-40] (see pic below). But I wonder how exactly to define my breaks in cut to achieve that?
Here is my data and R code:
D <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/t.csv", h = T)
table(cut(D$time, breaks = c(0, 5, 9, 12))) ## what should breaks be?
# (0,5] (5,9] (9,12]   # cuts are not the 3 pieces I want
# 228 37 10
The notation (a,b] means ">a and <=b".
So, to get your desired result, just define the cuts so you get the grouping that you want, including a lower and upper bound:
table(cut(D$time, breaks=c(-1, 4, 12, 40)))
## (-1,4] (4,12] (12,40]
## 319 47 20
You may also find it helpful to look at two more arguments: right=FALSE, which changes the endpoints of the intervals from (a,b] to [a,b), and include.lowest, which includes the lowest breaks value (in the OP's example, [0,5] closed on the lower bound). You can also use infinity as a bound. Here's an example with a couple of those options put to use:
table(cut(D$time, breaks = c(-Inf, 4, 12, Inf), include.lowest=TRUE))
## [-Inf,4] (4,12] (12,Inf]
## 319 47 20
This produces the right buckets, but the interval notation needs tweaking (assuming all times are integers): each label with right-open notation [m,n) would have to be replaced manually with the closed notation [m,n-1]. Use your best string 'magic'.
Personally, I like to make sure all possibilities are covered. Perhaps future data from this process might exceed 40? I like to put an upper bound of +Inf in all my cuts. This prevents NA from creeping into the data.
What cut really needs is a 'whole numbers only' option.
F <- cut(D$time, c(0, 5, 13, 40), include.lowest = TRUE, right = FALSE)
# the levels below are hard-coded, but you could write a loop to turn all
# labels of the form [m,n) into [m,n-1]
levels(F)[1:2] <- c('[0,4]', '[5,12]')
Typically there would be more analysis before final results are obtained, so I wouldn't sweat the labels too much until the work is closer to complete.
Here are my results
> table(F)
F
[0,4] [5,12] [13,40]
319 47 20
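The relabeling loop mentioned in the comments above could be sketched like this (assuming non-negative integer breaks; the small input vector here is made up for illustration):

```r
# Cut some integer data with left-closed, right-open intervals
f <- cut(c(0, 3, 5, 12, 13, 40), c(0, 5, 13, 40),
         include.lowest = TRUE, right = FALSE)

# Turn each "[m,n)" label into "[m,n-1]"; closed labels are left alone
relabel <- function(lab) {
  nums <- as.integer(regmatches(lab, gregexpr("[0-9]+", lab))[[1]])
  if (grepl("\\)$", lab)) sprintf("[%d,%d]", nums[1], nums[2] - 1) else lab
}
levels(f) <- unname(vapply(levels(f), relabel, character(1)))
table(f)
```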
R can compare integers to floats, like in
> 6L >= 8.5
[1] FALSE
Thus you can use floats as breaks in cut, as in
table(cut(D$time, breaks = c(-.5, 4.5, 12.5, 40.5)))
For integers this fulfills your bucket definition of [0-4], [5-12], [13-40] without you having to think too much about square versus round brackets.
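For example, with some made-up integer times, the half-integer breaks place every value unambiguously:

```r
# Made-up integer times for illustration
time <- c(0, 2, 4, 5, 12, 13, 40)

# Half-integer breaks: no value can ever fall on a boundary
table(cut(time, breaks = c(-.5, 4.5, 12.5, 40.5)))
```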
A fancy alternative would be clustering around the means of your buckets, as in
D <- read.csv("https://raw.githubusercontent.com/rnorouzian/m/master/t.csv", h = T)
D$cluster <- kmeans(D$time, centers = c(4/2, (5+12)/2, (13+40)/2))$cluster
plot(D$time, rnorm(nrow(D)), col = D$cluster)
You should add two additional arguments, right and include.lowest, to your code!
table(cut(D$time, breaks = c(0, 5, 13, 40), right=FALSE, include.lowest = TRUE))
With right=FALSE the intervals are closed on the left and open on the right, which gives your desired result. include.lowest=TRUE causes your highest break value (here 40) to be included in the last interval.
Result:
[0,5) [5,13) [13,40]
319 47 20
Vice versa you can write:
table(cut(D$time, breaks = c(0, 4, 12, 40), right=TRUE, include.lowest = TRUE))
with the result:
[0,4] (4,12] (12,40]
319 47 20
Both give exactly what you are looking for:
[0,4] [5,12] [13,40]
319 47 20
This question already has answers here:
get x-value given y-value: general root finding for linear / non-linear interpolation function
(2 answers)
Closed 3 years ago.
I am new to R, but I am trying to figure out an automated way to determine where a given line between two points crosses the baseline (in this case 75; see the dotted line in the image linked below) in terms of the x-coordinate. Once that x value is found, I would like it added to the vector of all the x values, with the corresponding y value (which would always be the baseline value) added to the y-value vector. Basically: have a function look between all consecutive pairs of the input coordinates, check whether the line segment between the two points crosses the baseline, and if so add the coordinates of the crossing to the output x and y vectors. Any help would be most appreciated, especially with automating this across all x,y coordinates.
https://i.stack.imgur.com/UPehz.jpg
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(75,53,37,25,95,35,50,75,75,75)
Edit: added creation of combined data frame with original data + crossing points.
Adapted from another answer related to two intersecting series with uniform X spacing.
baseline = 75
X <- c(1,2,3,4,5,6,7,8,9,10)
Y1 <- rep(baseline, 10)
Y2 <- c(75,53,37,25,95,35,50,75,75,75)
# Find points where Y1 is above Y2.
above <- Y1>Y2
# The series intersect wherever 'above' flips from TRUE to FALSE or vice versa
intersect.points <- which(diff(above) != 0)
# Find the slopes for each line segment.
Y2.slopes <- (Y2[intersect.points+1]-Y2[intersect.points]) /
(X[intersect.points+1]-X[intersect.points])
Y1.slopes <- rep(0,length(Y2.slopes))
# Find the intersection for each segment
X.points <- intersect.points + ((Y2[intersect.points] - Y1[intersect.points]) / (Y1.slopes-Y2.slopes))
Y.points <- Y1[intersect.points] + (Y1.slopes*(X.points-intersect.points))
# Plot.
plot(Y1,type='l')
lines(Y2,type='l',col='red')
points(X.points,Y.points,col='blue')
library(dplyr)
combined <- bind_rows( # combine rows from...
tibble(X, Y2), # table of original, plus
tibble(X = X.points,
Y2 = Y.points)) %>% # table of interpolations
distinct() %>% # and drop any repeated rows
arrange(X) # and sort by X
> combined
# A tibble: 12 x 2
X Y2
<dbl> <dbl>
1 1 75
2 2 53
3 3 37
4 4 25
5 4.71 75
6 5 95
7 5.33 75
8 6 35
9 7 50
10 8 75
11 9 75
12 10 75
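A more compact base-R sketch of the same idea, working from the sign of Y2 - baseline directly (same data as above; only strict sign changes count, since segments that merely touch the baseline at a data point are already covered by the original data):

```r
baseline <- 75
X <- 1:10
Y2 <- c(75, 53, 37, 25, 95, 35, 50, 75, 75, 75)

d <- Y2 - baseline
# Strict sign changes between consecutive points mark crossing segments
i <- which(d[-length(d)] * d[-1] < 0)
# Linear interpolation within each crossing segment
x.cross <- X[i] - d[i] * (X[i + 1] - X[i]) / (d[i + 1] - d[i])

# Combine original points with the interpolated crossings, sorted by X
combined <- rbind(data.frame(X = X, Y = Y2),
                  data.frame(X = x.cross, Y = baseline))
combined <- combined[order(combined$X), ]
```

This recovers the same two crossings as above, at X ≈ 4.71 and X ≈ 5.33.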
I am trying to display my data using radarchart {fmsb}. The values of my records are highly variable, so low values are not visible on the final plot.
Is there a way to "free" the axis for each record, to visualize the data independently of scale?
Dummy example:
df<-data.frame(n = c(100, 0,0.3,60,0.3),
j = c(100,0, 0.001, 70,7),
v = c(100,0, 0.001, 79, 3),
z = c(100,0, 0.001, 80, 99))
n j v z
1 100.0 100.0 100.000 100.000 # max
2 0.0 0.0 0.000 0.000 # min
3 0.3 0.001 0.001 0.001 # small values -> not visible on the final chart!!
4 60.0 0.001 79.000 80.000
5 0.3 0.0 3.000 99.000
Create radarchart
require(fmsb)
radarchart(df, axistype=0, pty=32, axislabcol="grey",# na.itp=FALSE,
seg = 5, centerzero = T)
Result: (only rows #2 and #3 are visible; row #1, with its low values, is not!)
How to make visible all records (rows), i.e. how to "free" axis for any of my records? Thank you a lot,
If you want to be sure to see all 4 dimensions whatever the differences, you'll need a logarithmic scale.
As, by design, a radar chart cannot show negative values, our choice of base is restricted by the range of the values and by our number of segments (axis ticks).
If we want an integer base the minimum we can choose is:
seg0 <- 5 # your initial choice, could be changed
base <- ceiling(
max(apply(df[-c(1,2),],MARGIN = 1,max) / apply(df[-c(1,2),],MARGIN = 1,min))
^(1/(seg0-1))
)
Here we have a base 5.
Let's normalize and transform our data.
First we normalize the data by setting the maximum to 1 for all series, then we apply our logarithmic transformation, which sets the maximum of each series to seg0 (n for black, z for the others) and the minimum among all series to between 1 and 2 (here the v value of the black series).
df_normalized <- as.data.frame(df[-c(1,2),]/apply(df[-c(1,2),],MARGIN = 1,max))
df_transformed <- rbind(rep(seg0,4),rep(0,4),log(df_normalized,base) + seg0)
radarchart(df_transformed, axistype=0, pty=32, axislabcol="grey",# na.itp=FALSE,
seg = seg0, centerzero = T,maxmin=T)
If we look at the green series we see:
j and v have the same order of magnitude
n is about 5^2 = 25 times smaller than j (5 is the value of the base, ^2 because 2 segments separate them)
v is about 5^2 = 25 times (again) smaller than z
If we look at the black series we see that n is about 3.5^5 times bigger than the other dimensions.
If we look at the red series we see that the order of magnitude is the same among all dimensions.
Maybe a workaround for your problem: if you transform your data before running radarchart (e.g. logarithm, square root, ...), then you can also visualise small values.
Here an example using a cubic root transformation:
library(specmine)
df.c<-data.frame(cubic_root_transform(df)) # transform dataset
radarchart(df.c, axistype=0, pty=32, axislabcol="grey",# na.itp=FALSE,
seg = 5, centerzero = T)
and the result will look like this:
EDIT:
If you want to zoom the small values even more you can do that with a higher order of the root.
e.g.
t <- 5 # for fifth-order root
df.t <- data.frame(apply(df, 2, function(x) x^(1/t))) # transform dataset
radarchart(df.t, axistype=0, pty=32, axislabcol="grey",# na.itp=FALSE,
seg = 5, centerzero = T)
You can adjust the "zoom" as you want by changing the value of t
So you should find a visualization that is suitable for you.
Here is an example using 10-th root transformation:
library(specmine)
df.c<-data.frame((df)^(1/10)) # transform dataset
radarchart(df.c, axistype=0, pty=32, axislabcol="grey",# na.itp=FALSE,
seg = 5, centerzero = T)
and the result will look like this:
You can try the n-th root to find the one that works best for you. As n grows, the root of a number near zero grows faster, so small values are magnified more.
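A quick way to see how the order of the root affects the "zoom" on small values (plain arithmetic; the values are picked from the dummy data in the question):

```r
# One small, one smallish, one medium, one large value from the dummy data
x <- c(0.001, 0.3, 7, 100)

# Compare square, cubic and tenth roots side by side
round(rbind(sqrt = x^(1/2), cubic = x^(1/3), tenth = x^(1/10)), 3)
```

Under the tenth root, 0.001 and 100 end up within roughly a factor of three of each other, so both are visible on the same axis.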
I'm importing coordinate data from a model (NetLogo) and am trying to plot it in R. In the NetLogo model the size of the area is given as a 221 x 221 torus with maximum x and y values of 110 and minimum x and y values of -110.
My code to plot the data, grid the area and extract the number of points per patch is as follows:
x<-c(1:50)
y<-c(1:50)
plot(x, y, pch = 16, xlim=c(-110,110), ylim=c(-110,110))
grid(110,110,lty=1)
xt<-cut(x,seq(-110,110,1))
yt<-cut(y,seq(-110,110,1))
count<-as.vector(table(xt,yt))
table(count)
But when I do this it obviously gives me 48400 patches. How do I properly set the breaks so I get 48841 (i.e. 221 x 221)?
-2, -1, 0, 1, 2
5 break values define only 4 intervals. Likewise, seq(-110, 110, 1) has 221 values, which define 221 - 1 = 220 intervals, so your plot has (221 - 1)^2 = 48400 patches. To get 221 intervals per axis you need 222 breaks.
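One way to get 222 breaks is to use half-integer boundaries, so that each integer coordinate gets its own patch (a sketch, assuming patches are centered on integer coordinates as in NetLogo):

```r
x <- 1:50
y <- 1:50

# 222 break points -> 221 intervals per axis, one centered on each integer
breaks <- seq(-110.5, 110.5, by = 1)
xt <- cut(x, breaks)
yt <- cut(y, breaks)

counts <- table(xt, yt)
length(counts)  # 221 * 221 = 48841 patches
```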