Monthly slope coefficient: regressing the first column on the rest of the columns - R

I would like to regress the first column, the market return (as y), on the rest of the columns (as X) and create a data frame with the list of monthly slope coefficients. My data frame looks like this:
Date Market return AFARAK GROUP PLC AFFECTO OYJ
1/3/2007 -0.45 0.00 0.85
1/4/2007 -0.92 2.47 -0.85
1/5/2007 -1.98 3.98 -1.14
The expected output, a data frame of slope coefficients, looks like this:
Date AFARAK GROUP PLC AFFECTO OYJ
Jan-07 1 0.5
Feb-07 2 1.5
Mar-07 2 1
Apr-07 3 2
Could someone help me in this regard?
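One way to sketch this in base R, assuming one simple regression per stock within each month (a single multiple regression per month would also produce one coefficient per stock); the data frame below is a hypothetical stand-in for the one in the question:

```r
# Hypothetical stand-in for the data frame in the question
df <- data.frame(
  Date = as.Date(c("2007-01-03", "2007-01-04", "2007-01-05",
                   "2007-02-01", "2007-02-02", "2007-02-03")),
  Market.return = c(-0.45, -0.92, -1.98, 0.50, 1.20, -0.30),
  AFARAK.GROUP.PLC = c(0.00, 2.47, 3.98, 1.10, 2.00, -0.40),
  AFFECTO.OYJ = c(0.85, -0.85, -1.14, 0.30, 0.90, -0.20)
)
stocks <- c("AFARAK.GROUP.PLC", "AFFECTO.OYJ")
df$month <- format(df$Date, "%b-%y")
# One simple regression per stock within each month; keep the slope (2nd coefficient)
slopes <- do.call(rbind, lapply(split(df, df$month), function(d) {
  data.frame(Date = d$month[1],
             t(sapply(stocks, function(s) coef(lm(d$Market.return ~ d[[s]]))[2])))
}))
```

The result has one row per month and one slope column per stock, matching the expected output's shape.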


Converting dataframe with multiple dimensions to raster layer

I am creating a raster layer for an area with multiple environmental variables. All data formats have usually been netCDF files (arrays) containing lat, long, date and the variable in question - in this case sea_ice_fraction.
The data for sea surface temperature (sst), came in an understandable format, at least from the point of view of trying to make a prediction grid:
, , Date = 2019-11-25
Long
Lat 294.875 295.125 295.375 295.625 295.875 296.125 296.375 296.625 296.875 297.125
-60.125 2.23000002 2.04 1.83 1.53 1.18 1.00 0.9800000 1.06 1.25 1.40999997
-60.375 2.06999993 1.79 1.60 1.31 1.09 0.97 1.0000000 1.15 1.30 1.42999995
-60.625 1.93999994 1.64 1.45 1.28 1.14 1.02 0.9899999 1.03 1.10 1.13000000
Each row is one single latitude coordinate (of the resolution of the data), and each column is a longitude coordinate paired with the date.
My goal is to calculate the mean of all the date values for each coordinate cell, which is easy in the array case:
sst.c1 <- apply(sst.c1, c(1,2), mean)
Then project to a Raster layer
However, the format of the sea ice data is in a dataframe, with 4 columns: lat, long, date, and sea_ice_fraction:
time lat lon sea_ice_fraction
<chr> <dbl> <dbl> <dbl>
1 2019-11-25T12:00:00Z -66.1 -65.1 0.580
2 2019-11-25T12:00:00Z -66.1 -65.1 NA
3 2019-11-25T12:00:00Z -66.1 -65.0 NA
4 2019-11-25T12:00:00Z -66.1 -65.0 NA
5 2019-11-25T12:00:00Z -66.1 -64.9 NA
How can I turn this dataframe into an array similar to the sst data? Or directly into a raster finding the mean of the values for the dates per cell in the dataframe?
Can you not just do this using dplyr? The following should work fine (with na.rm = TRUE so that the missing values do not turn every cell mean into NA):
library(dplyr)
df %>%
  group_by(lat, lon) %>%
  summarize(sea_ice_fraction = mean(sea_ice_fraction, na.rm = TRUE)) %>%
  ungroup()
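For the second half of the question (going straight to a raster), a minimal base-R sketch of the same per-cell averaging on a made-up miniature of the sea-ice data frame; raster::rasterFromXYZ() (commented out here so the sketch runs without the raster package) can then build the layer from lon/lat/value columns:

```r
# Hypothetical miniature of the sea-ice data frame
df <- data.frame(
  time = rep(c("2019-11-25", "2019-11-26"), each = 4),
  lat  = rep(c(-66.1, -66.1, -66.3, -66.3), 2),
  lon  = rep(c(-65.1, -65.0, -65.1, -65.0), 2),
  sea_ice_fraction = c(0.58, NA, 0.40, 0.35, 0.60, 0.50, NA, 0.30)
)
# One mean value per lat/lon cell, averaged over dates (NA rows are dropped)
cell_means <- aggregate(sea_ice_fraction ~ lat + lon, data = df, FUN = mean)
# rasterFromXYZ() expects the columns ordered x (lon), y (lat), value:
# r <- raster::rasterFromXYZ(cell_means[, c("lon", "lat", "sea_ice_fraction")])
```

This assumes the cells fall on a regular grid, which rasterFromXYZ() requires.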

Beta (market model regression) value company-wise with moving window

I have a dataframe which looks something like this:
company_name co_stkdate dailyreturns marketreturn
A 01-01-2000 5.67 4.54
A 02-01-2000 3.43 1.23
A 03-01-2000 -1.01 -0.53
.
.
.
A 30-12-2018 5.65 3.45
A 31-12-2018 2.32 1.32
B 01-01-2000 -2.34 -1.12
B 02-01-2000 1.32 0.34
.
.
.
There are a hundred such companies. I want to perform an OLS regression company-wise with a moving window of 1 year.
regression model is
dailyreturn=alpha+beta*marketreturn
After performing the regression, I want to get the beta value for each year.
The output should look something like this:
company_name year beta
A 2000 0.87
A 2001 0.99
A 2002 0.76
A 2003 0.65
.
.
.
This is what I have done so far:
betas <- dbdf %>%
  group_by(co_code, company_name) %>%
  do(model = lm(formula = dailylogrtn ~ niftyreturns, data = .))
This helped me to get one beta value company-wise for 2000-2018. I am not sure how to perform regression analysis with moving windows of 1 year.
Regression analysis should be from 01-01-2000 to 31-12-2000 then new window 01-01-2001 to 31-12-2001 then 01-01-2002 to 31-12-2002 and so on.
I have solved this on my own. First you need to convert the date into a year; then you just need to replace some terms in group_by():
betas <- dbdf %>%
  group_by(company_name, year) %>%
  do(model = lm(formula = dailylogrtn ~ niftyreturns, data = .))
The model column can be converted into a data frame with the tidy() function from library(broom).
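A minimal base-R sketch of that per-company, per-year grouping, on a made-up stand-in for dbdf (column names follow the question; beta is the slope on marketreturn):

```r
set.seed(1)
# Hypothetical stand-in for dbdf: two companies, two years, three days each
dbdf <- data.frame(
  company_name = rep(c("A", "B"), each = 6),
  co_stkdate   = rep(c("01-01-2000", "02-01-2000", "03-01-2000",
                       "01-01-2001", "02-01-2001", "03-01-2001"), 2),
  dailyreturns = rnorm(12),
  marketreturn = rnorm(12)
)
# Dates are dd-mm-yyyy; keep only the year for grouping
dbdf$year <- format(as.Date(dbdf$co_stkdate, "%d-%m-%Y"), "%Y")
# One OLS fit per company-year; beta is the slope coefficient
betas <- do.call(rbind, lapply(split(dbdf, list(dbdf$company_name, dbdf$year)),
  function(d) data.frame(company_name = d$company_name[1],
                         year = d$year[1],
                         beta = coef(lm(dailyreturns ~ marketreturn, data = d))[2])))
```

This yields one company_name/year/beta row per group, matching the expected output's shape.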

Find where species accumulation curve reaches asymptote

I have used the specaccum() command to develop species accumulation curves for my samples.
Here is some example data:
site1<-c(0,8,9,7,0,0,0,8,0,7,8,0)
site2<-c(5,0,9,0,5,0,0,0,0,0,0,0)
site3<-c(5,0,9,0,0,0,0,0,0,6,0,0)
site4<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site5<-c(5,0,9,0,0,6,6,0,0,0,0,0)
site6<-c(5,0,9,0,0,0,6,6,0,0,0,0)
site7<-c(5,0,9,0,0,0,0,0,7,0,0,3)
site8<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site9<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site10<-c(5,0,9,0,0,0,0,0,0,0,1,6)
site11<-c(5,0,9,0,0,0,5,0,0,0,0,0)
site12<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site13<-c(5,1,9,0,0,0,0,0,0,0,0,0)
species_counts<-rbind(site1,site2,site3,site4,site5,site6,site7,site8,site9,site10,site11,site12,site13)
accum <- specaccum(species_counts, method="random", permutations=100)
plot(accum)
In order to ensure I have sampled sufficiently, I need to make sure the curve of the species accumulation plot reaches an asymptote, defined as a slope of <0.3 between the last two points (i.e. between sites 12 and 13).
results <- with(accum, data.frame(sites, richness, sd))
Produces this:
sites richness sd
1 1 3.46 0.9991916
2 2 4.94 1.6625403
3 3 5.94 1.7513054
4 4 7.05 1.6779918
5 5 8.03 1.6542263
6 6 8.74 1.6794660
7 7 9.32 1.5497149
8 8 9.92 1.3534841
9 9 10.51 1.0492422
10 10 11.00 0.8408750
11 11 11.35 0.7017295
12 12 11.67 0.4725816
13 13 12.00 0.0000000
I feel like I'm getting there. I could generate an lm with site vs richness and extract the exact slope (tangent?) between sites 12 and 13. Going to search a bit longer here.
Streamlining your data generation process a little bit:
species_counts <- matrix(c(0,8,9,7,0,0,0,8,0,7,8,0,
5,0,9,0,5,0,0,0,0,0,0,0, 5,0,9,0,0,0,0,0,0,6,0,0,
5,0,9,0,0,0,0,0,0,0,0,0, 5,0,9,0,0,6,6,0,0,0,0,0,
5,0,9,0,0,0,6,6,0,0,0,0, 5,0,9,0,0,0,0,0,7,0,0,3,
5,0,9,0,0,0,0,0,0,0,1,0, 5,0,9,0,0,0,0,0,0,0,1,0,
5,0,9,0,0,0,0,0,0,0,1,6, 5,0,9,0,0,0,5,0,0,0,0,0,
5,0,9,0,0,0,0,0,0,0,0,0, 5,1,9,0,0,0,0,0,0,0,0,0),
byrow=TRUE,nrow=13)
It is always a good idea to set.seed() before running randomization tests (and to let readers know that specaccum() is in the vegan package):
set.seed(101)
library(vegan)
accum <- specaccum(species_counts, method="random", permutations=100)
Extract the richness and sites components from the returned object and compute d(richness)/d(sites). (Note that the slope vector is one element shorter than the original site/richness vectors: be careful if you're trying to match up slopes with particular numbers of sites.)
(slopes <- with(accum,diff(richness)/diff(sites)))
## [1] 1.45 1.07 0.93 0.91 0.86 0.66 0.65 0.45 0.54 0.39 0.32 0.31
In this case the slope never actually goes below 0.3, so this code for finding the first time the slope falls below 0.3:
which(slopes<0.3)[1]
returns NA.
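The same check can be reproduced without vegan, straight from the richness column of the table above:

```r
# Mean richness per number of sites, copied from the table above
richness <- c(3.46, 4.94, 5.94, 7.05, 8.03, 8.74, 9.32, 9.92,
              10.51, 11.00, 11.35, 11.67, 12.00)
sites <- 1:13
slopes <- diff(richness) / diff(sites)   # 12 slopes for 13 points
which(slopes < 0.3)[1]                   # NA: the slope never drops below 0.3
```

(The numeric slope values differ slightly from the answer's because they come from a different random permutation, but the conclusion is the same.)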

Mahalanobis distance between profiles in R

A sample of 100 subjects responded to two personality tests. These tests have slightly different wordings but are generally the same, i.e. they both measure the same 4 attitudes. Therefore, I have 2 matrices like this, with 4 scores per subject:
>test1
subj A1 A2 A3 A4
1 -2.14 1.21 0.93 -1.72
2 0.25 1.17 0.67 0.67
>test2
subj A1 A2 A3 A4
1 -1.99 1.11 1.00 -1.52
2 0.24 1.20 0.71 0.65
I'd like to evaluate the similarity of profiles in the two tests, i.e. the similarity of the two sets of 4 scores for each individual. I feel like the Mahalanobis distance is the measure I need, and I checked some packages (HDMD, StatMatch) but couldn't find the right function.
One approach to this is to create a difference score matrix and then calculate the Mahalanobis distances on the difference scores.
testDiff <- test1 - test2
testDiffMahalanobis <- mahalanobis(testDiff,
                                   center = colMeans(testDiff),
                                   cov = cov(testDiff))
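A self-contained illustration of that approach on simulated scores (100 subjects, 4 attitudes; all numbers made up):

```r
set.seed(1)
# Simulated profiles: test2 is a noisy copy of test1
test1 <- matrix(rnorm(400), ncol = 4,
                dimnames = list(NULL, c("A1", "A2", "A3", "A4")))
test2 <- test1 + matrix(rnorm(400, sd = 0.2), ncol = 4)
testDiff <- test1 - test2
# Squared Mahalanobis distance of each subject's difference profile
# from the mean difference profile
d2 <- mahalanobis(testDiff, center = colMeans(testDiff), cov = cov(testDiff))
```

Note that mahalanobis() returns squared distances; take sqrt() if you want distances on the original scale.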

Histogram frequency count for column

I have a set of data (in the region of 800000 lines), in three columns (longitude, latitude and earthquake magnitude) that are not sorted in any way. A small example below...
-118.074 36.930 2.97
-118.005 36.898 2.61
-116.526 36.621 2.72
-116.488 36.650 2.68
-117.675 36.820 2.00
-117.963 36.514 1.30
-118.090 36.757 1.94
-117.651 36.518 1.40
-116.434 36.506 1.90
-117.914 36.531 2.10
-118.235 36.882 2.00
I am required to create a histogram of the earthquake magnitudes (in the range of 1.0 to 7.0), but I am not sure how to go about creating the frequency of magnitudes.
I understand that in order to create a histogram, I will need to discern the unique values, and set them in ascending order in a column. I believe I can then run a for command with a count function for each value... but I need a bit of help in doing so!
Thank you for any help you can offer!
awk '{counts[$3]++} END {for (c in counts) print c, counts[c]}' inputs.txt | sort -nk2
will print the unique magnitudes and their counts, sorted numerically by count (use sort -nk1 instead to sort by magnitude):
1.30 1
1.40 1
1.90 1
1.94 1
2.10 1
2.61 1
2.68 1
2.72 1
2.97 1
2.00 2
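If R is an option, hist() does the binning and counting in one step; a sketch on a hypothetical slice of the data, using 0.5-wide bins over the stated 1.0-7.0 range:

```r
# Hypothetical slice of the three-column data: lon, lat, magnitude
quakes_df <- read.table(text = "
-118.074 36.930 2.97
-118.005 36.898 2.61
-117.963 36.514 1.30
-117.651 36.518 1.40
-116.434 36.506 1.90
")
mag <- quakes_df$V3
# plot = FALSE returns the bin counts without drawing anything
h <- hist(mag, breaks = seq(1.0, 7.0, by = 0.5), plot = FALSE)
# Frequency table: lower bin edge and count
data.frame(lower = head(h$breaks, -1), count = h$counts)
```

Calling hist(mag, breaks = seq(1.0, 7.0, by = 0.5)) without plot = FALSE draws the histogram directly.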
