I'm using R and I have a raster stack of surface soil moisture measurements from a radiometer fixed on an observation tower. These data are daily values going back 10 years.
I also have another raster stack of satellite microwave measurements of soil moisture over a larger area going back 25 years. Both sensors have similar frequencies.
On a per-pixel basis, I would like to use linear cumulative distribution function (CDF) matching to rescale the satellite data against the tower data, which would give a longer time series of rescaled satellite data.
The point is to correct for systematic differences between the soil moisture values and to extend the time series. This is similar to what was done in the figure below, where AMSR-E (blue plot) and ASCAT (red plot) data were matched to Noah data (black plot).
Does anyone know how to implement this in R? Or at the very least help me get started? I've scoured the Internet and this website without success.
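As a starting point, here is a minimal sketch of piecewise-linear CDF matching for a single pixel, using only base R. It assumes the matching is done on percentiles computed over the period both sensors cover; the function name cdf_match, its arguments, and the choice of deciles are made up for illustration and are not taken from any package. Applying it to every pixel of the two raster stacks is not shown.

```r
# Piecewise-linear CDF matching for one pixel (illustrative sketch).
# sat_overlap, ref_overlap: values from the period both sensors cover.
# sat_full: the full satellite series to rescale.
cdf_match <- function(sat_full, sat_overlap, ref_overlap,
                      probs = seq(0, 1, by = 0.1)) {
  # Percentiles of each sensor over the common period
  q_sat <- quantile(sat_overlap, probs, na.rm = TRUE, names = FALSE)
  q_ref <- quantile(ref_overlap, probs, na.rm = TRUE, names = FALSE)
  # Linear interpolation between matching percentiles;
  # rule = 2 clamps values outside the overlap range to the end points
  approx(x = q_sat, y = q_ref, xout = sat_full, rule = 2)$y
}

# Synthetic check: the "satellite" is a biased, stretched copy of the "tower"
set.seed(1)
ref     <- runif(500, 0.1, 0.4)
sat     <- 2 * ref + 0.05
matched <- cdf_match(sat, sat, ref)
```

In this synthetic case the bias is exactly linear, so matched reproduces ref. With real data the two sensors will not agree this cleanly, and the number of percentiles is a tuning choice.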
Related
Given the plot of the time-series data, I was wondering if there is a robust function or mathematical formula I can use in R to determine which plots are oscillating. For example, each individual graph corresponds to a single cell's intensity value over a certain time period. I want a method that gives a score or some other value that differentiates the plots that are not oscillating (#513 and #559) from the plots that are oscillating (#508, #512, #557, #558). All the plots have the same scaling.
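One possible score, offered as a rough sketch rather than a robust method, is the share of spectral power held by the single strongest frequency. The function name oscillation_score and the synthetic series below are assumptions for illustration; the cell intensity data from the question are not available here.

```r
# Share of the (non-DC) spectral power held by the strongest frequency.
# Values near 1 suggest one dominant oscillation; values near 0 suggest none.
oscillation_score <- function(y) {
  y <- y - mean(y)                      # remove the constant offset
  n <- length(y)
  power <- Mod(fft(y))^2                # power at each frequency bin
  half <- power[2:(floor(n / 2) + 1)]   # keep positive frequencies only
  max(half) / sum(half)
}

set.seed(42)
tt <- seq(0, 10, length.out = 201)[-201]   # 200 samples over 10 time units
oscillating <- sin(2 * pi * 0.5 * tt) + rnorm(200, sd = 0.1)
flat        <- rnorm(200, sd = 0.1)

score_osc  <- oscillation_score(oscillating)
score_flat <- oscillation_score(flat)
```

A threshold between the two groups would have to be chosen by looking at the scores of known oscillating and non-oscillating cells. A constant series gives NaN, and a strong trend will inflate the score, so detrending first may be needed.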
I have several cumulative incidence curves showing the incidence of an outcome after an exposure, plotted on the same graph, stratified by age at exposure.
The longest follow-up period is 25 years, but in the oldest group, the follow-up does not last any longer than 15 years (as they are all dead by then).
When I plot the cumulative incidence curves, the curve for the oldest group 'flatlines' after 15 years, but does not disappear until the 25-year point, when all the other curves also disappear (i.e. the plot ends).
Is there a way to stop a cumulative incidence curve being plotted when there is nobody at risk, even if other curves for different groups on the same graph go on for longer than the point you wish to stop plotting at for that one problematic curve?
I did this with the surv_fit function in R.
I'm trying to build a histogram in which the X-axis shows each case I'm working with, that is, each police station (my matrix holds the murder resolution rate for different police stations in one city for a year), and the Y-axis shows the resolution rate (from 0 to 1). So there would be 51 bars, one per police station, each reaching that station's rate between 0 and 1.
But when I run hist with my matrix, the X-axis displays resolution rates and the Y-axis displays the frequency, the number of police stations that reach each resolution rate.
How can I get the result I described above? This is the code I'm using:
anobase<-matrix(CResolucion[seleccion_ano==2018], length(seleccion_estado), 1)
rownames(anobase) <- seleccion_estado
colnames(anobase) <- 2018
hist(anobase)
(and, yeah, I'm new at using R)
So, that's the plot. As you can see, the X-axis displays values from 0 to 1. These values represent the resolution rate mentioned before (solved murders divided by the total number of murders registered). The Y-axis, on the other hand, displays a frequency from 0 to 15, so each bar shows how many stations have each resolution rate. What I want is for the X-axis to show each police station, so that each bar is a police station whose height is its resolution rate from 0 to 1 (Y-axis). I hope I'm being clear.
You don't want a histogram; you want a column or bar chart. Histograms summarize the distribution of a single continuous variable; column charts compare values of a continuous variable across categories (here, police stations).
You haven't posted a reproducible example, so I can't tell exactly what's going on with your data. Let's assume, though, that you have a vector of resolution rates called rates and a vector of station names associated with those rates called stations. In base R, you could then create a column chart with barplot(rates, names.arg = stations).
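A small self-contained version of that call, with made-up data standing in for your matrix (the station names and rates below are invented for illustration):

```r
# Made-up data standing in for the asker's matrix (names are hypothetical)
stations <- paste("Station", 1:5)
rates    <- c(0.35, 0.80, 0.55, 0.10, 0.95)

# One bar per station, bar height = resolution rate.
# barplot() returns the x positions of the bar midpoints.
mids <- barplot(rates,
                names.arg = stations,
                ylim = c(0, 1),
                las = 2,              # rotate labels so they stay legible
                ylab = "Resolution rate")
```

If anobase really is a one-column matrix with the stations as row names, something like barplot(anobase[, 1], names.arg = rownames(anobase)) should give one bar per station, though I can't check that without your data.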
I'm trying to find the peak frequencies hidden in my data using the fft() function in R. While I was preparing the data, a more experienced user recommended creating a "mask" (more on this after the details), which does give me the exact diagram I'm looking for. The problem is, I don't understand what it does or why it's needed.
To give some context, I'm working with .txt files with around 12000 entries each. It's voltage vs. time information, and the expected result is just a sinusoidal wave with a clear peak frequency that should be close to 1-2 Hz. This is an example of what one of those files looks like:
I've been trying to use the Fast Fourier Transform method fft() implemented in R to find the peak frequencies and get a diagram that reflected them clearly. At first, I calculate some things that I understand are going to be useful, like the Nyquist frequency and the range of frequencies I'll show in the final graph:
n = length(variable)
dt = time[5]-time[4]
df = 1/(max(time)) #Find out the "unit" frequency
fnyquist = 1/(2*dt) #The Nyquist frequency
f = seq(-fnyquist, fnyquist-df, by=df) #These are the frequencies I'll plot
But when I plot the absolute value of what fft(data) calculates vs. the range of frequencies, I get this:
The peak frequency seems to be close to 50 Hz, but I know that's not the case. It should be close to 1 Hz. I'm a complete newbie in R and in Fourier analysis, so after researching a little, I found on a Swiss page that this can be solved by creating a "mask", which is actually just a vector with a repeating pattern (1, -1, 1, -1...) with the same length as my data vector itself:
mask=rep(c(1, -1),length.out=n)
Then if I multiply my data vector by this mask and plot the results:
results = mask*data
plot(f,abs(fft(results)),type="h")
I get what I was looking for. (This is the graph after limiting the x-axis to a reasonable scale).
So, what's the mask actually doing? I understand it's flipping the sign of every other data point, but I don't get why it would take the inferred peak frequencies from ~50 Hz to the correct result of ~1 Hz.
Thanks in advance!
Your "mask" is one of two methods of performing an fftshift, which is commonly done to center the 0 Hz output of an FFT in the middle of a graph or plot (instead of at the left edge, with the negative frequencies wrapping around to the right edge).
To perform an fftshift, you can heterodyne or modulate your data (by Fs/2) before the FFT, or simply do a circular shift by 50% after the FFT. Both produce the same result. They are the same due to the shift property of the DFT.
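This also explains the apparent 50 Hz peak: fft() returns its output starting at 0 Hz, while the frequency axis f in the question starts at -fnyquist, so the low-frequency bins end up drawn at the edges of the plot. The sketch below checks the equivalence of the two methods on a synthetic signal; the signal and the variable names are made up for illustration, and an even number of samples is assumed.

```r
n  <- 64                                   # even length, as the mask trick assumes
tt <- (0:(n - 1)) / n
x  <- sin(2 * pi * 3 * tt) + 0.5 * cos(2 * pi * 10 * tt)

# Method 1: modulate by Fs/2 before the FFT (the "mask")
mask     <- rep(c(1, -1), length.out = n)
shifted1 <- fft(mask * x)

# Method 2: FFT first, then rotate the output by half its length
X        <- fft(x)
shifted2 <- c(X[(n / 2 + 1):n], X[1:(n / 2)])

# Largest difference between the two results (rounding error only)
max_diff <- max(Mod(shifted1 - shifted2))
```

Either shifted result puts the 0 Hz bin in the middle of the vector, which is what a frequency axis running from -fnyquist to fnyquist - df expects.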
I have a time series dataset with spatial data (x,y coordinates). Each point is static in location, but its value varies over time, ie. each point has its own unique function. I want to assign these functions as a mark, so I can plot the point pattern with each individual time series as a plotting symbol.
This is an exploratory step to eventually perform some spatial functional data analysis.
As an example, I want something like Figure 2 published in this article:
*Delicado, P., R. Giraldo, C. Comas, and J. Mateu. 2010. Spatial Functional Data: Some Recent Contributions. Environmetrics 21:224-239
I'm having trouble posting an image of the figure
1) Working in R with ggplot2, I can plot a line of change in quant of each id over time:
(Fake example dataset, where x and y are Cartesian coordinates, id is an individual observation, and quant holds the values of id in each year):
library(ggplot2)
x<-c(1,1,1,2,2,2,3,3,3)
y<-c(1,1,1,2,2,2,3,3,3)
year<-c(1,2,3,1,2,3,1,2,3)
id<-c("a","a","a","b","b","b","c","c","c")
quant<-c(5,2,4,2,4,2,4,4,6)
allData<-data.frame(x,y,year,id,quant)
ggplot(allData,aes(x=year,y=quant, group=id))+geom_line()
2) Or I can plot the geographic point pattern of id:
ggplot(allData,aes(x=x,y=y,color=id))+geom_point()
I want to plot the graph from (2), but use the line plots from (1) as the point symbols (marks). Any suggestions?
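One way this could be approached, as a rough sketch rather than a definitive answer, is to shift each observation away from its site location by a scaled version of its year and quant, so that each site's series is drawn as a small line centred on the site. The helper centre_unit, the columns gx and gy, and the glyph_size value are all made up for illustration.

```r
# Example data from the question
allData <- data.frame(
  x     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  y     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  year  = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  id    = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  quant = c(5, 2, 4, 2, 4, 2, 4, 4, 6)
)

# Rescale a vector to [-0.5, 0.5]
centre_unit <- function(v) (v - min(v)) / (max(v) - min(v)) - 0.5

glyph_size <- 0.6   # glyph width and height in map units (arbitrary choice)

# Offset each observation from its site location: year sets the horizontal
# offset, quant the vertical one, both on scales shared by all sites
allData$gx <- allData$x + glyph_size * centre_unit(allData$year)
allData$gy <- allData$y + glyph_size * centre_unit(allData$quant)

# Draw the glyphs only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(allData, aes(x = gx, y = gy, group = id)) +
    geom_path() +                                   # the small time series
    geom_point(aes(x = x, y = y, color = id)) +     # the site locations
    coord_equal() +
    labs(x = "x", y = "y")
  print(p)
}
```

Because the scaling is shared across sites, glyph heights are comparable between locations. The rows need to be in year order within each id, since geom_path() connects points in the order they appear, and glyph_size has to be small enough that neighbouring glyphs do not overlap.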