Filtering data within a correlation matrix in R

I have a data.frame. I correlate its columns against each other (the X and Y axes) and get N results. I need to plot them, with either ggcorrplot or another correlation plot, but I want to filter the chart so that only values above 0 are included in the graph.
I have already tried:
dataCorrelation[dataCorrelation > 0] <- ""
but the graph doesn't accept an empty value, and I can't put in a fake value.
I would like to generate a graph without the values that are less than 0.
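One possible approach, sketched under assumptions: assigning "" to a numeric matrix coerces the whole matrix to character, which is likely why the plot rejects it, and the condition `> 0` blanks the positive values rather than the non-positive ones. Masking the unwanted cells with NA keeps the matrix numeric. The data frame `dat` below is made up for illustration, and whether a given plotting function draws NA cells as blank depends on that function.

```r
# Hypothetical data: column d is built to correlate negatively with a
set.seed(42)
dat <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
dat$d <- -dat$a + rnorm(30, sd = 0.1)

corr_mat <- cor(dat)          # numeric correlation matrix

# Mask every value that is not above 0; NA keeps the matrix numeric
corr_pos <- corr_mat
corr_pos[corr_pos <= 0] <- NA

print(round(corr_pos, 2))
```

The masked matrix `corr_pos` can then be passed to the plotting function in place of the full matrix.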

Related

Alternate way to remove outliers in R

I want to remove the outlier data points from the clusters after k-means clustering, and I currently do it this way in R:
1.) Plot the graph:
plot(sort(df[[1]]$var))
plot(sort(df[[2]]$var))
2.) From the graph, spot the outlier (in my case, extreme) data points.
rownames(df[[1]]) <- 1:nrow(df[[1]])
rownames(df[[2]]) <- 1:nrow(df[[2]])
3.) Open View(df[[1]]) and View(df[[2]]), sort var in descending order, note the row indices of the outlier data points, and remove those rows from df[[1]] and df[[2]]:
df[[1]] <- df[[1]][-c(200, 320, 216), ]
df[[2]] <- df[[2]][-c(7000, 1200, 2320), ]
df is a list with 3 elements; df[[1]] accesses the first element (cluster).
Is there an easier and more efficient way to achieve the same result?
You need to include a short, reproducible example showing what you want and what you have tried. That said, the following may give you some hints, if I'm guessing correctly what you want. Note that you can get the min/max cut values from confidence intervals or by other means.
a <- 1:40
b <- a[a %in% 4:35] # Keep 4 through 35; values < 4 or > 35 are treated as outliers
b
length(b) # Note there are no NAs using this approach
Basically, cut off the outliers at the relevant cut values and graph the remaining elements.
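As a sketch of how the cut values could be computed rather than read off a plot, the following applies the common 1.5 x IQR fence to each cluster with lapply. The IQR rule is an assumption swapped in for illustration (the answer above only says the cut values can come from CIs or other means), and `clusters` and `drop_outliers` are hypothetical names.

```r
# Hypothetical stand-in for the list of cluster data frames, each with a column `var`
set.seed(1)
clusters <- list(
  data.frame(var = c(rnorm(50), 25)),             # 25 is an injected extreme value
  data.frame(var = c(rnorm(50, mean = 10), -40))  # -40 is an injected extreme value
)

# Keep only rows whose `var` lies within k * IQR of the quartiles
drop_outliers <- function(d, k = 1.5) {
  q <- unname(quantile(d$var, c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])
  keep <- !is.na(d$var) & d$var >= q[1] - fence & d$var <= q[2] + fence
  d[keep, , drop = FALSE]
}

cleaned <- lapply(clusters, drop_outliers)
sapply(cleaned, nrow)   # rows remaining per cluster
```

This avoids noting row indices by hand and applies the same rule to every cluster in the list.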

Plot along different dimensions

I have the following basic code. The first line sums p along dimension 1 to create a 1 x column array. The next line plots A. Unfortunately, it seems that Julia assumes it must plot many lines (in this case just points) along dimension 2.
A = sum(p,dims = 1)
plot(A)
So, my question is, how can I plot a simple line when the data is in a 1 x column array?
I assume you use Plots.jl. The following is from Plots.jl's documentation.
If the argument [to plot] is a "matrix-type", then each column will map to a series, cycling through columns if there are fewer columns than series. In this sense, a vector is treated just like an "nx1 matrix".
The number of series plot(a) tries to plot is the number of columns in a.
To get a single series, you can do one of the following:
plot(vec(a)) # `vec` will give you a vector view of `a` without an allocation
plot(a') # or `plot(transpose(a))`. `transpose` does not allocate a new array
plot(a[:]) # this allocates a new array so you should probably avoid it

Averaging different length vectors with same domain range in R

I have a dataset that looks like the one shown in the code.
What I am guaranteed is that the "(var)x" (domain) of the variable is always between 0 and 1. The "(var)y" (co-domain) can vary but is also bounded, but within a larger range.
I am trying to get an average over the "(var)x" but over the different variables.
I would like some kind of selective averaging, not sure how to do this in R.
ax=c(0.11,0.22,0.33,0.44,0.55,0.68,0.89)
ay=c(0.2,0.4,0.5,0.42,0.5,0.43,0.6)
bx=c(0.14,0.23,0.46,0.51,0.78,0.91)
by=c(0.1,0.2,0.52,0.46,0.4,0.41)
qx=c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
qy=c(0.03,0.2,0.52,0.4,0.45,0.48,0.61,0.9)
a<-list(ax,ay)
b<-list(bx,by)
q<-list(qx,qy)
What I would like to have is something like
avgd_x = c(0.12,0.27,0.36,0.48,0.51,0.76,0.79,0.97)
and
avgd_y would have contents that would
find the value of ay and by at 0.12 and find the mean with ay, by and qy.
Similarly and so forth for all the values in the vector with the largest number of elements.
How can I do this in R ?
P.S.: This is a toy dataset. My real dataset is spread over files that I read with a custom function, but the raw data has the form shown in the code above.
Edit:
Some clarification:
avgd_y would have the length of the largest vector. For example, in the case above, avgd_y would be (ay' + by' + qy)/3, where ay' and by' are the vectors c(ay(qx(i))) and c(by(qx(i))) for i from 1 to length(qx). In other words, ay' and by' hold the values of ay and by interpolated at the data points of qx.
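A minimal sketch of the interpolation described in the clarification, assuming linear interpolation via base R's approx() is acceptable. The choice rule = 2 (hold the nearest endpoint value for points of qx outside the range of ax or bx) is an assumption; the question does not say how such points should be handled.

```r
ax <- c(0.11, 0.22, 0.33, 0.44, 0.55, 0.68, 0.89)
ay <- c(0.2, 0.4, 0.5, 0.42, 0.5, 0.43, 0.6)
bx <- c(0.14, 0.23, 0.46, 0.51, 0.78, 0.91)
by <- c(0.1, 0.2, 0.52, 0.46, 0.4, 0.41)
qx <- c(0.12, 0.27, 0.36, 0.48, 0.51, 0.76, 0.79, 0.97)
qy <- c(0.03, 0.2, 0.52, 0.4, 0.45, 0.48, 0.61, 0.9)

# Interpolate ay and by at the x positions of the longest series (qx).
# rule = 2 is an assumption: outside the observed range, reuse the endpoint value.
ay_i <- approx(ax, ay, xout = qx, rule = 2)$y
by_i <- approx(bx, by, xout = qx, rule = 2)$y

avgd_x <- qx
avgd_y <- (ay_i + by_i + qy) / 3   # pointwise mean of the three series
```

With more than three series, the same idea extends by interpolating each one onto a common grid and taking rowMeans over the resulting columns.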

R: Finding duplicates in a data frame and recording them in vectors

I am trying to create some lines on a graph based on a third coordinate (x,y, temp). I would like to get a vector of indexes so I can split them into x and y vectors for each duplicate temperature. To make this more clear, I will include my actual data set:
DataFrame
I am trying to make multiple lines that have the same temp value. For example, I would like to have the following coordinates on the same line [0,14] [0,22] [0,26] [0,28]. They all have the temp value of 5.8. Once I find the duplicates, I will record the indexes in a vector which will allow me to retrieve the x and y coordinates. One other aspect is that I will not always know how many entries are going to be in the data.frame.
My question is how can I find the duplicates and store their indices in a vector? Once I have the indices for the duplicate temps, I can be sure to grab their x y coordinates and use that to create lines.
If you can answer my question or have any advice on how I can do this better, all help is appreciated
Consider the following:
df <- data.frame(temp = sample.int(n = 3, size = 5, replace = TRUE))
df
  temp
1    3
2    3
3    1
4    3
5    1
duplicated(df$temp)
[1] FALSE TRUE FALSE TRUE TRUE
which(duplicated(df$temp))
[1] 2 4 5
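Since duplicated() flags only the second and later occurrences, it does not by itself return all rows sharing a temperature. A sketch of one alternative is to group the row indices by temp with split(). The data frame below is hypothetical; it reuses the four 5.8 coordinates mentioned in the question and invents two 6.1 rows for contrast.

```r
# Hypothetical data: the 5.8 points come from the question, the 6.1 points are made up
df <- data.frame(
  x    = c(0, 0, 1, 0, 1, 0),
  y    = c(14, 22, 5, 26, 9, 28),
  temp = c(5.8, 5.8, 6.1, 5.8, 6.1, 5.8)
)

# One vector of row indices per distinct temperature
idx_by_temp <- split(seq_len(nrow(df)), df$temp)

# Keep only temperatures that occur more than once
idx_by_temp <- idx_by_temp[lengths(idx_by_temp) > 1]

# Coordinates for the line at temp == 5.8
line_58 <- df[idx_by_temp[["5.8"]], c("x", "y")]
```

Because split() works on whatever rows are present, this does not require knowing the number of entries in advance.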
You've stated in the comments that you're looking to make an isopleth graph. The procedure you have described will not generate anything resembling an isopleth graph. Since it looks like your data is arranged in a regular grid, you should do something like the solutions presented in this question and answer, which use functions specifically designed for extracting contours from a grid of values. Another option is the contourLines function in the grDevices package. If you want higher-resolution, less jagged contours, you might look into using either the interp.surface or Krig functions from the fields package to interpolate your data to the resolution you require.
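A minimal sketch of the contourLines route, using R's built-in volcano matrix as a stand-in for a regular grid of temperatures (an assumption, since the actual data is not reproduced here). The contour levels 120 and 150 are arbitrary illustrative values.

```r
# volcano is a built-in numeric matrix on a regular grid; it stands in for the real data
z <- volcano
x <- seq_len(nrow(z))
y <- seq_len(ncol(z))

# Each list element is one contour segment with components level, x and y
iso <- contourLines(x, y, z, levels = c(120, 150))

levels_found <- vapply(iso, function(cl) cl$level, numeric(1))

# To draw them on an existing plot:
# for (cl in iso) lines(cl$x, cl$y)
```

Each element of `iso` already holds the x and y coordinates of one line at a given level, so no manual index bookkeeping is needed.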

explicit the x-value for plotting in gnuplot

In GNUPLOT, I would like to plot 5 values on a single bar chart, separated with some spacing in between. If I have data formatted as such:
3342336, 3375103, 7110653, 32770, 0
where those 5 values are the y-values, how can I specify the x-values myself for where they should belong?
For example, I would like my bar chart to have each entry be of length 1,
so I plot y-value 3342336 at x-value 1,
y-value 3375103 at x-value 3,
y-value 7110653 at x-value 5,
y-value 32770 at x-value 7,
and y-value 0 at x-value 9.
I would appreciate any example code that can achieve this. Thanks.
If your data is in one row as shown, you can achieve this by using the plot for syntax looping over the column index, and calculating the x value from that index. We can grab the column by using the column function which retrieves the specified column number.
set boxwidth 1
set datafile separator comma # only if data is comma separated
plot for [i=1:5] "datafile.txt" using (2*i-1):(column(i)) with boxes
If we need to ensure the same line type is used each time, we can explicitly state it in the plot command.
plot for [i=1:5] "datafile.txt" using (2*i-1):(column(i)) with boxes lt 1
Additionally, if a key is to be generated, and we don't want each iteration to add its own entry, we can give a nonempty title only on the first iteration (an empty title is treated the same as no title).
plot for [i=1:5] "datafile.txt" using (2*i-1):(column(i)) with boxes lt 1 title (i==1)?"Title":""
If your data is separated into rows as is the normal format, this can be obtained a different way.
Gnuplot has several pseudocolumns (see help pseudocolumns for details). In your case, column 0 is of interest. Column 0 gives the line number of the data, starting at 0. Thus, to get sequential odd numbers, you can use 2*$0+1.
For example, if your data (stored in datafile.txt) looks like
3342336
3375103
7110653
32770
0
and you wish to plot boxes of width 1 at those values, you can do
set boxwidth 1
plot "datafile.txt" u (2*$0+1):1 with boxes
