r - how to find the first empty bin of a histogram


I have a large dataset of response times. I need to find the first empty bin of the histogram (with x in milliseconds) and exclude all data that comes after it.
Can anybody help?

If you capture the return value of hist(), it contains all of the information that you need.
set.seed(3)
x = rnorm(20)
H = hist(x)
min(which(H$counts == 0))
[1] 5
To exclude the data in that bin and above:
MIN = min(which(H$counts == 0))
x = x[x < H$breaks[MIN]]
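One caveat with the snippet above: if no bin is empty, which() returns an empty vector and min() yields Inf with a warning, so the subsetting step produces NAs. A minimal sketch of a guarded version follows. The function name drop_after_first_empty_bin is made up for illustration, and it uses <= rather than < because hist() bins are right-closed by default, so a value sitting exactly on the cut-off break still belongs to the last non-empty bin.

```r
# Hypothetical helper (not part of base R): keep only the values that fall
# before the first empty histogram bin.
drop_after_first_empty_bin <- function(x, ...) {
  h <- hist(x, plot = FALSE, ...)   # compute the bins without drawing
  empty <- which(h$counts == 0)
  if (length(empty) == 0) {
    return(x)                       # no empty bin: nothing to exclude
  }
  # Left edge of the first empty bin; bins are (a, b] by default
  cutoff <- h$breaks[empty[1]]
  x[x <= cutoff]
}

rt <- c(1, 2, 3, 10)                # toy response times in ms
drop_after_first_empty_bin(rt, breaks = 0:10)
```

Extra arguments such as breaks are passed straight through to hist(), so the cut-off depends on the binning you choose.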

Related

How can I apply a piece of R code to every column of my data frame

I have to analyze EMG data, but I'm not very good at using R.
I have a data.frame with 9 columns: one column specifies the time and the other 8 specify my channels.
I want to filter my EMG data. I can only do it one channel at a time, but I would like to do it for all channels of the data frame at once, so I don't have to repeat the code for every single channel.
# This example computes the LE-envelope using the lowpass routine
# Coerce a data.frame into an 'emg' object
x <- as.emg(extensor_raw$channel1, samplingrate = 1000, units = "mV") ##do this for every channel
# Compute the rectified signal
x_rect <- rectification(x)
# Filter the rectified signal
y <- lowpass(x_rect, cutoff = 100)
# change graphical parameters to show multiple plots
op <- par(mfrow = c(3, 1))
# plot the original channel, the filtered channel and the
# LE-envelope
plot(x, channel = 1, main = "Original channel")
plot(x_rect, main = "Rectified channel")
plot(y, main = "LE-envelope")
# reset graphical parameters
par(op)
So instead of using extensor_raw$channel1 here, can I put in something like extensor_raw$i and loop over it? Or is there a way to apply this bit of code to every channel (i.e. 8 columns of the 9-column data frame, excluding the first column, which specifies the time)?

If it is column-wise, use lapply and store the results as a list. Since the first column holds the time, it is dropped here with extensor_raw[-1]. (Note that this is not tested. The par settings for the plots may have to be changed.)
lst1 <- lapply(extensor_raw[-1], \(vec) {
  x <- as.emg(vec, samplingrate = 1000, units = "mV")
  # Compute the rectified signal
  x_rect <- rectification(x)
  # Filter the rectified signal
  y <- lowpass(x_rect, cutoff = 100)
  # change graphical parameters to show multiple plots
  op <- par(mfrow = c(3, 1))
  # plot the original channel, the filtered channel and the
  # LE-envelope
  plot(x, channel = 1, main = "Original channel")
  plot(x_rect, main = "Rectified channel")
  plot(y, main = "LE-envelope")
  # reset graphical parameters
  par(op)
  # return the envelope so that it is what gets stored in the list
  y
})
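The column-wise pattern itself can be tried without the EMG package. The sketch below runs on made-up data and uses base R stand-ins: abs() for the rectification step and a 5-point moving average via stats::filter() for the low-pass step. These stand-ins and the column names are assumptions for illustration, not what biosignalEMG computes.

```r
set.seed(42)
# Made-up data: a time column plus 8 channels (names are hypothetical)
emg_demo <- data.frame(
  time = seq_len(200) / 1000,
  matrix(rnorm(200 * 8), ncol = 8,
         dimnames = list(NULL, paste0("channel", 1:8)))
)

# Drop the time column with [-1], then process every remaining column
envelopes <- lapply(emg_demo[-1], function(vec) {
  rect <- abs(vec)                                            # stand-in for rectification()
  as.numeric(stats::filter(rect, rep(1 / 5, 5), sides = 2))   # 5-point moving average
})

names(envelopes)   # one list element per channel
```

Because lapply() keeps the column names, each result can be looked up by channel, for example envelopes$channel3.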

Here is my solution. Since there is no data with your question, I used the 'EMG data for gestures Data Set' from the UCI Machine Learning Repository.
Link: https://archive.ics.uci.edu/ml/datasets/EMG+data+for+gestures
It is fairly similar to the dataset you have been using: the first variable is the time, the next 8 variables are the channels, and the last one is the class.
To create a graph for every channel, you can use a for loop that iterates over the names of the columns of interest. The middle of the code is the same as yours; at the end, while plotting, I changed the plot titles so that they show the respective column name.
library(biosignalEMG)
extensor_raw <- read.delim("01/1_raw_data_13-12_22.03.16.txt")
head(extensor_raw)
for(i in names(extensor_raw[2:9])){
  print(paste("Drawing for ", i))
  # Coerce a data.frame into an 'emg' object
  x <- as.emg(extensor_raw[i], samplingrate = 1000, units = "mV") ##do this for every channel
  # Compute the rectified signal
  x_rect <- rectification(x)
  # Filter the rectified signal
  y <- lowpass(x_rect, cutoff = 100)
  # change graphical parameters to show multiple plots
  op <- par(mfrow = c(3, 1))
  # plot the original channel, the filtered channel and the
  # LE-envelope
  plot(x, channel = 1, main = paste("Original ", i))
  plot(x_rect, main = paste("Rectified", i))
  plot(y, main = paste("LE-envelope", i))
  # reset graphical parameters
  par(op)
}
After running this code you will see multiple pages in the plot pane of RStudio, one per channel from 1 to 8.
I hope this helps you resolve your problem.
On the second part you asked about in the comments: if you have the files separate, let's keep them separate. We will read them one by one and plot each. To achieve this we will use a nested for loop.
First set your working directory to the folder that holds all your gesture files. In my case I have two files with the same structure in that directory.
The changes in the code are as follows:
setwd('~/Downloads/EMG_data_for_gestures-master/01')
library(biosignalEMG)
for(j in list.files()){
  print(paste("reading file ", j))
  extensor_raw <- read.delim(j)
  # inside a loop, head() must be printed explicitly to show up
  print(head(extensor_raw))
  for(i in names(extensor_raw[2:9])){
    print(paste("Drawing for ", i))
    # Coerce a data.frame into an 'emg' object
    x <- as.emg(extensor_raw[i], samplingrate = 1000, units = "mV") ##do this for every channel
    # Compute the rectified signal
    x_rect <- rectification(x)
    # Filter the rectified signal
    y <- lowpass(x_rect, cutoff = 100)
    # change graphical parameters to show multiple plots
    op <- par(mfrow = c(3, 1))
    # plot the original channel, the filtered channel and the LE-envelope
    plot(x, channel = 1, main = paste("Original ", i, " from ", j))
    plot(x_rect, main = paste("Rectified", i, " from ", j))
    plot(y, main = paste("LE-envelope", i, " from ", j))
    # reset graphical parameters
    par(op)
  }
}
I hope this will be helpful.

R - Histogram Doesn't show density due to magnitude of the Data

I have a vector called data with a length of approximately 444000, and almost all of the numeric values are between 1 and 100. I want to draw the histogram and draw the appropriate density on it. However, when I draw the histogram I get this:
hist(data,freq=FALSE)
What can I do to actually see a more detailed histogram? I tried using the breaks argument; it helped, but the histogram is so small that it is really hard to see. For example, I used breaks = 2000 and got this:
Is there something that I can do? Thanks!

Since you don't show data, I'll generate some random data:
d <- c(rexp(1e4, 100), runif(100, max=5e4))
hist(d)
When dealing with outliers like this, you can display the histogram of the logs, but that may be difficult to interpret.
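A minimal sketch of the log-scale histogram mentioned here, on the same kind of generated data (the seed and the choice of log10 are assumptions added for illustration):

```r
set.seed(1)
d <- c(rexp(1e4, 100), runif(100, max = 5e4))   # same kind of data as above

# Histogram of the log10 values: the bulk and the outliers both become visible
h <- hist(log10(d), main = "Histogram of log10(d)", xlab = "log10(d)")
```

The x axis is then in orders of magnitude, which is the part readers may find harder to interpret.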
If you are okay with showing a subset of the data, then you can filter the outliers out either dynamically (perhaps using quantile) or manually. The important thing when showing this visualization in your analysis is that if you must remove data for the plot, then be up-front about the removal. (This is terse ... it would also be informative to include the range and/or other properties of the omitted data, but that's subjective and will differ based on the actual data.)
quantile(d, seq(0, 1, len=11))
d2 <- d[ d < quantile(d, 0.90) ]
hist(d2)
txt <- sprintf("(%d points shown, %d excluded)", length(d2), length(d) - length(d2))
mtext(txt, side = 1, line = 3, adj = 1)
d3 <- d[ d < 10 ]
hist(d3)
txt <- sprintf("(%d points shown, %d excluded)", length(d3), length(d) - length(d3))
mtext(txt, side = 1, line = 3, adj = 1)
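The question also asks for a density curve on top of the histogram. A minimal sketch on the filtered subset, using generated data as above (with a seed and a breaks value added here as assumptions): hist() with freq = FALSE puts the bars on the density scale, so the curve from density() can be overlaid with lines().

```r
set.seed(1)
d  <- c(rexp(1e4, 100), runif(100, max = 5e4))
d2 <- d[d < quantile(d, 0.90)]          # drop the top 10% for display

hist(d2, freq = FALSE, breaks = 50)     # bars on the density scale
dens <- density(d2)                     # kernel density estimate
lines(dens, lwd = 2)                    # overlay it on the histogram
```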

R: use bin counts and bin breaks to get a histogram

I generated a random vector from a normal distribution and plotted a histogram.
I modified the counts of each bin, and I want to plot another histogram with the same breaks (break_vector) and the new bin count vector (new_counts).
How can I do that?
I tried barplot(), but the way it displays the bin labels is different.
x = rnorm(500,1,6)
delta = 1
break_vector = seq(min(x)-delta,max(x)+delta,by=delta)
hist_info = hist(x,breaks=break_vector)
new_counts = hist_info$counts+5

Try
new_hist <- hist_info
new_hist$counts <- hist_info$counts + 5
plot(new_hist)
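plot() on a histogram object draws the counts here because the breaks are equidistant. The density component is left unchanged by the assignment above, so if the modified object may later be drawn with freq = FALSE, it seems safer to recompute it as well. A sketch, with a seed added for reproducibility:

```r
set.seed(1)
x <- rnorm(500, 1, 6)
delta <- 1
break_vector <- seq(min(x) - delta, max(x) + delta, by = delta)
hist_info <- hist(x, breaks = break_vector, plot = FALSE)

new_hist <- hist_info
new_hist$counts <- hist_info$counts + 5
# Keep density in step with the new counts: count / (total * bin width)
new_hist$density <- new_hist$counts /
  (sum(new_hist$counts) * diff(new_hist$breaks))

plot(new_hist)                 # frequencies
plot(new_hist, freq = FALSE)   # densities, now matching the new counts
```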

Identify spikes/peaks in density plot by group

I created a density plot with ggplot2 package for R. I would like to identify the spikes/peaks in the plot which occur between 0.01 and 0.02. There are too many legends to pick it out so I deleted all legends. I tried to filter my data out to find most number of rows that a group has between 0.01 and 0.02. Then I filtered out the selected group to see whether the spike/peak is gone but no, it is there plotted still. Can you suggest a way to identify these spikes/peaks in these plots?
Here is some code :
ggplot(NumofHitsnormalized, aes(NumofHits_norm, fill = name)) + geom_density(alpha=0.2) + theme(legend.position="none") + xlim(0.0 , 0.15)
## Filter out the data that is in the range of the first spike
test <- NumofHitsnormalized[which(NumofHitsnormalized$NumofHits_norm > 0.01 & NumofHitsnormalized$NumofHits_norm < 0.02),]
## Figure out which group (name column) has the most rows,
## since I thought that might be the data that leads to the spike
groups <- unique(test$name)
testMatrix <- matrix(ncol=2, nrow=length(groups))
for (i in seq_along(groups)){
  testMatrix[i,1] <- as.character(groups[i])
  # count the rows of this group (nrow() of a single value is NULL)
  testMatrix[i,2] <- sum(test$name == groups[i])
}
Konrad,
This is the new plot made after I filtered my data with the extremevalues package. There are new peaks, located at different intervals, and it also says 96% of the initial groups have data in the new plot (though the number of rows in the filtered data dropped to 0.023% of the initial dataset), so I can't identify which peaks belong to which groups.

I had a similar problem.
What I did was:
Create a rolling mean and sd of the y values with a window of 3.
Calculate the average sd of your baseline data (the data you know won't have peaks).
Set a threshold value.
If above the threshold, 1, else 0.
library(TTR) # runMean() and runSD() are assumed to come from the TTR package
d5$roll_mean = runMean(d5$`Current (pA)`, n = 3)
d5$roll_sd = runSD(x = d5$`Current (pA)`, n = 3)
d5$delta = ifelse(d5$roll_sd > 1, 1, 0)
currents = subset(d5, delta == 1) # Finds all peaks; subset() already drops rows where the condition is NA
My threshold was an sd > 1. Depending on your data you may want to use the mean or the sd. For slow-rising peaks the mean would be a better idea than the sd.
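The same thresholding idea can be sketched with base R only, on made-up data with a single injected spike. Here embed() builds the rolling windows in place of the rolling functions used above; the window size of 3 and the threshold of 1 are taken from the answer, everything else is an assumption for illustration.

```r
set.seed(1)
y <- rnorm(100, sd = 0.1)   # made-up baseline signal
y[50] <- 10                 # injected spike

n <- 3
win <- embed(y, n)          # each row holds n consecutive values
# Rolling sd, padded with NA so that position k covers y[(k - n + 1):k]
roll_sd <- c(rep(NA, n - 1), apply(win, 1, sd))

# 1 where the rolling sd exceeds the threshold, else 0
flag <- ifelse(!is.na(roll_sd) & roll_sd > 1, 1, 0)
which(flag == 1)            # positions whose window contains the spike
```

Every window that contains the spike is flagged, so one sharp peak shows up as a short run of consecutive positions rather than a single index.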

Without looking at the code, I drafted this simple function to add TRUE/FALSE flags to variables indicating outliers:
GenerateOutlierFlag <- function(x) {
  # Load required packages
  Vectorize(require)(package = c("extremevalues"), char = TRUE)
  # Run check for outliers
  out_flg <- ifelse(1:length(x) %in% getOutliers(x, method = "I")$iLeft,
                    TRUE, FALSE)
  out_flg <- ifelse(1:length(x) %in% getOutliers(x, method = "I")$iRight,
                    TRUE, out_flg)
  return(out_flg)
}
If you care to read about the extremevalues package you will see that it provides some flexibility in terms of identifying outliers but broadly speaking it's a good tool for finding various peaks or spikes in the data.
Side point
You could actually optimise it significantly by creating one object corresponding to getOutliers(x, method = "I") instead of calling the method twice.
More sensible syntax
GenerateOutlierFlag <- function(x) {
  # Load required packages
  require("extremevalues")
  # Outliers object
  outObj <- getOutliers(x, method = "I")
  # Run check for outliers
  out_flg <- ifelse(1:length(x) %in% outObj$iLeft,
                    TRUE, FALSE)
  out_flg <- ifelse(1:length(x) %in% outObj$iRight,
                    TRUE, out_flg)
  return(out_flg)
}
Results
x <- c(1:10, 1000000, -99099999)
table(GenerateOutlierFlag(x))
FALSE TRUE
10 2

How to plot specific data points in a column in R script

Imagine there are two columns, one for p-value and the other representing slope. I want to find a way to plot only the slope data points that have a significant p-value. Here is my code:
print("State the file name (include .csv)")
filename <- readline()
file <- read.csv(filename)
print ("Only include trials with p-value < .05? (enter yes or no)")
pval_filter <- readline()
if (pval_filter == "yes"){
  i <- 0
  count <- 0
  filtered <- NULL
  while (i > length(file$pval)){
    if (file$pval[i] < .05){
      filtered[count] <- i
      count <- count + 1
    }
    i <- i + 1
  }
  x <- 0
  while (x != -1){
    print("State the variable to be plotted")
    temp_var <- readline()
    counter <- 0
    var <- NULL
    while (counter > length(filtered)){
      var[counter] = file [, temp_var][filtered[counter]]
      counter <- counter + 1
    }
    print ("State the title of the histogram")
    title <- readline()
    hist(var, main = title, xlab = var)
    print("Enter -1 to exit or any other number to plot another variable")
    x <- readline()
  }
}

Isn't this much shorter, and doesn't it produce roughly the same result?
df = read.csv('file.csv')
df = df[df$pval < 0.05,]
hist(df$value)
This should at least get you started.
Some remarks regarding the code:
You use the names of existing R functions (var, file) as object names; that is a bad idea.
If you want the program to work with user input, you need to check it before doing anything with it.
There is no need to explicitly loop over rows in a data.frame, R is vectorized (e.g. see how I subsetted df above). This style looks like Fortran, there is no need for it in R.
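As a rough sketch of the remarks above (checking input before using it, and subsetting without an explicit loop), the helper below validates the column names and then filters in one vectorized step. The function name filter_significant and the column names pval and slope are assumptions for illustration.

```r
# Hypothetical helper: return the values of `value_col` whose p-value is below `alpha`
filter_significant <- function(dat, value_col, p_col = "pval", alpha = 0.05) {
  # Check the input before doing anything with it
  missing_cols <- setdiff(c(value_col, p_col), names(dat))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found: ", paste(missing_cols, collapse = ", "))
  }
  # Vectorized comparison, no loop; rows with a missing p-value are dropped
  keep <- !is.na(dat[[p_col]]) & dat[[p_col]] < alpha
  dat[[value_col]][keep]
}

trials <- data.frame(pval  = c(0.01, 0.20, 0.04, NA),
                     slope = c(0.5, -1.2, 2.0, 0.3))
filter_significant(trials, "slope")
```

The result can be passed straight to hist(); a mistyped column name stops with a readable message instead of failing later inside the plot call.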

It is hard to tell exactly what you want. It is best if an example is reproducible (we can copy/paste and run, we don't have your data so that does not work) and is minimal (there is a lot in your code that I don't think deals with your question).
But some pointers that may help.
First, the readline function has a prompt argument that will give you better looking interaction than the print statements.
If all your data is in a data frame with columns p and b for p-value and slope then you can include only the b values for which p<=0.05 with simple subsetting like:
hist( mydataframe$b[ mydataframe$p <= 0.05 ] )
or
with( mydataframe, hist(b[p<=0.05]) )
Is that enough to answer your question?

Given that data = cbind(slopes, pvalues) (so ncol(data) == 2)
Like this:
plot(data[data[ ,2] < 0.05 , ])
Explanation:
data[ ,2] < 0.05 will return a vector of TRUE/FALSE values with one entry per row.
So then you will get:
data[c(TRUE, FALSE....), ]
From there on, only the rows where it says TRUE will be selected.
You will thus plot only those x's and y's where the p-value is lower than 0.05.
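The row selection described above can be checked on a tiny made-up matrix (the values are invented for illustration). drop = FALSE is added so that the result stays a matrix even when only one row is selected.

```r
slopes  <- c(0.5, -1.2, 2.0, 0.1)
pvalues <- c(0.01, 0.20, 0.04, 0.70)
data <- cbind(slopes, pvalues)

keep <- data[, 2] < 0.05           # one TRUE/FALSE per row
sig  <- data[keep, , drop = FALSE] # keep only the rows flagged TRUE

plot(sig)                          # slopes on x, p-values on y
```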

Here is the code to plot only the slope data points with a significant p-value, assuming the column names in the file are pval and slope.
# Prompt on the terminal and read the file name from the user
filename <- readline("Enter the name of the file that has the p-values and slopes (include .csv): ")
# Read the file
file <- read.csv(filename, header = TRUE)
# Prompt again on the terminal and read the user's choice
pval_filter <- readline("Only include trials with p-value < .05? (enter yes or no) ")
if (tolower(pval_filter) == "yes"){
  # Create a filtered data frame that contains only the rows with a p-value below the significance level 0.05
  file.filtered <- file[file$pval < 0.05, ]
  # Get the title of the histogram to be drawn for the (filtered) slopes
  hist.title <- readline("State the title of the histogram: ")
  # Draw the histogram of the slopes with that title
  # las = 2 draws the axis labels perpendicular to the axis,
  # so that the labels do not overlap and stay readable.
  hist(file.filtered$slope, main = hist.title, xlab = "Slope", ylab = "Frequency", las = 2)
}
Hope this helps.
