Monthly slope coefficients from regressing the first column against the rest of the columns - r
I would like to regress the first column, the market return (as y), against each of the remaining columns (as X), and create a data frame of the monthly slope coefficients. My data frame looks like this:
Date Market return AFARAK GROUP PLC AFFECTO OYJ
1/3/2007 -0.45 0.00 0.85
1/4/2007 -0.92 2.47 -0.85
1/5/2007 -1.98 3.98 -1.14
The expected output, a data frame of slope coefficients, looks like this:
Date AFARAK GROUP PLC AFFECTO OYJ
Jan-07 1 0.5
Feb-07 2 1.5
Mar-07 2 1
Apr-07 3 2
Could someone help me in this regard?
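A minimal base-R sketch of one way to do this: split the data by month and, within each month, run one simple regression of the market return on each stock column, collecting the slopes. The data frame, its column names, and the values here are made up for illustration.

```r
# Hypothetical data mirroring the question's layout (Date, market return, stocks)
df <- data.frame(
  Date = c("1/3/2007", "1/4/2007", "1/5/2007", "2/1/2007", "2/2/2007", "2/5/2007"),
  Market.return = c(-0.45, -0.92, -1.98, 0.50, 1.10, -0.30),
  AFARAK = c(0.00, 2.47, 3.98, 1.20, 2.10, -0.50),
  AFFECTO = c(0.85, -0.85, -1.14, 0.40, 1.80, -0.20)
)
df$month <- format(as.Date(df$Date, format = "%m/%d/%Y"), "%b-%y")

stocks <- setdiff(names(df), c("Date", "Market.return", "month"))
slopes <- do.call(rbind, lapply(split(df, df$month), function(d) {
  # one simple regression of the market return on each stock within the month
  coefs <- sapply(stocks, function(s)
    unname(coef(lm(d$Market.return ~ d[[s]]))[2]))
  data.frame(Date = d$month[1], t(coefs))
}))
slopes
```

Each row of `slopes` is one month, each stock column holds that month's slope coefficient. With many stocks this stays the same; only the `stocks` vector grows.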
Converting dataframe with multiple dimensions to raster layer
I am creating a raster layer for an area with multiple environmental variables. The data usually come as netCDF files (arrays) containing lat, long, date and the variable in question - in this case sea_ice_fraction. The sea surface temperature (sst) data came in an understandable format, at least from the point of view of building a prediction grid:

, , Date = 2019-11-25
         Long
Lat       294.875    295.125 295.375 295.625 295.875 296.125 296.375   296.625 296.875 297.125
 -60.125  2.23000002 2.04    1.83    1.53    1.18    1.00    0.9800000 1.06    1.25    1.40999997
 -60.375  2.06999993 1.79    1.60    1.31    1.09    0.97    1.0000000 1.15    1.30    1.42999995
 -60.625  1.93999994 1.64    1.45    1.28    1.14    1.02    0.9899999 1.03    1.10    1.13000000

Each row is one latitude coordinate (at the resolution of the data), and each column is a longitude coordinate paired with the date. My goal is to calculate the mean of all the date values for each coordinate cell, which is easy in the array case:

sst.c1 <- apply(sst.c1, c(1,2), mean)

and then project to a raster layer. However, the sea ice data come as a data frame with 4 columns: lat, long, date and sea_ice_fraction:

  time                   lat   lon sea_ice_fraction
  <chr>                <dbl> <dbl>            <dbl>
1 2019-11-25T12:00:00Z -66.1 -65.1            0.580
2 2019-11-25T12:00:00Z -66.1 -65.1               NA
3 2019-11-25T12:00:00Z -66.1 -65.0               NA
4 2019-11-25T12:00:00Z -66.1 -65.0               NA
5 2019-11-25T12:00:00Z -66.1 -64.9               NA

How can I turn this data frame into an array similar to the sst data? Or directly into a raster, taking the mean of the values across dates for each cell?
Can you not just do this using dplyr? The following should work fine:

library(dplyr)
df %>%
  group_by(lat, lon) %>%
  summarize(sea_ice_fraction = mean(sea_ice_fraction, na.rm = TRUE)) %>%
  ungroup()

(na.rm = TRUE skips the NA fractions, which would otherwise make the mean of any cell containing an NA come out as NA.)
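For the "array similar to the sst data" part of the question, a base-R sketch with tapply() averages over dates per (lat, lon) cell and returns a lat-by-lon matrix directly, which is the layout the sst array had. The `ice` data frame and its values below are invented for illustration.

```r
# Made-up sea ice observations over two dates on a 2x2 lat/lon grid
ice <- data.frame(
  time = rep(c("2019-11-25", "2019-11-26"), each = 4),
  lat  = rep(c(-66.1, -66.1, -66.3, -66.3), 2),
  lon  = rep(c(-65.1, -65.0, -65.1, -65.0), 2),
  sea_ice_fraction = c(0.58, NA, 0.40, 0.44, 0.62, 0.50, NA, 0.46)
)
# mean over all dates for each (lat, lon) cell; na.rm = TRUE skips missing values
ice_mean <- tapply(ice$sea_ice_fraction, list(ice$lat, ice$lon), mean, na.rm = TRUE)
ice_mean   # rows = lat, cols = lon
```

A matrix like `ice_mean` (or the lat/lon/mean data frame from the dplyr answer) can then be handed to a raster constructor; which function depends on the raster package you use, so I leave that step out.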
Beta (market model regression) value company-wise with moving window
I have a dataframe which looks something like this:

company_name co_stkdate dailyreturns marketreturn
A 01-01-2000  5.67  4.54
A 02-01-2000  3.43  1.23
A 03-01-2000 -1.01 -0.53
...
A 30-12-2018  5.65  3.45
A 31-12-2018  2.32  1.32
B 01-01-2000 -2.34 -1.12
B 02-01-2000  1.32  0.34
...

There are a hundred such companies. I want to perform OLS regression company-wise with a moving window of 1 year. The regression model is

dailyreturn = alpha + beta * marketreturn

After performing the regression I want the beta value for each year. The output should look something like:

company_name year beta
A 2000 0.87
A 2001 0.99
A 2002 0.76
A 2003 0.65
...

This is what I have done so far:

betas <- dbdf %>%
  group_by(co_code, company_name) %>%
  do(model = lm(formula = dailylogrtn ~ niftyreturns, data = .))

This gives me one beta value per company for the whole 2000-2018 period. I am not sure how to perform the regression analysis with moving windows of 1 year: the regression should run from 01-01-2000 to 31-12-2000, then a new window 01-01-2001 to 31-12-2001, then 01-01-2002 to 31-12-2002, and so on.
I have solved this on my own. First you need to convert the date into a year; then you just need to replace the grouping variables in group_by():

betas <- dbdf %>%
  group_by(company_name, year) %>%
  do(model = lm(formula = dailylogrtn ~ niftyreturns, data = .))

The model column can be converted into a data frame with the tidy() function from library(broom).
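The same per-company, per-year fit can be sketched in base R without dplyr: split on (company, year), fit the market model in each group, and keep the slope. The data below are simulated with a true beta of 0.9; column names mirror the question but the values are fake.

```r
# Simulated daily returns for two companies with a known beta of 0.9
set.seed(1)
dbdf <- data.frame(
  company_name = rep(c("A", "B"), each = 500),
  co_stkdate   = rep(seq(as.Date("2000-01-01"), by = "day", length.out = 500), 2),
  marketreturn = rnorm(1000)
)
dbdf$dailyreturns <- 0.9 * dbdf$marketreturn + rnorm(1000, sd = 0.1)
dbdf$year <- format(dbdf$co_stkdate, "%Y")   # the "convert date into year" step

betas <- do.call(rbind, lapply(
  split(dbdf, list(dbdf$company_name, dbdf$year), drop = TRUE),
  function(d) data.frame(
    company_name = d$company_name[1],
    year = d$year[1],
    beta = coef(lm(dailyreturns ~ marketreturn, data = d))[2]  # the slope
  )
))
betas
```

Because the simulated slope is 0.9, every estimated beta should land close to 0.9, one row per company-year.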
Find where species accumulation curve reaches asymptote
I have used the specaccum() command to develop species accumulation curves for my samples. Here is some example data:

site1<-c(0,8,9,7,0,0,0,8,0,7,8,0)
site2<-c(5,0,9,0,5,0,0,0,0,0,0,0)
site3<-c(5,0,9,0,0,0,0,0,0,6,0,0)
site4<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site5<-c(5,0,9,0,0,6,6,0,0,0,0,0)
site6<-c(5,0,9,0,0,0,6,6,0,0,0,0)
site7<-c(5,0,9,0,0,0,0,0,7,0,0,3)
site8<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site9<-c(5,0,9,0,0,0,0,0,0,0,1,0)
site10<-c(5,0,9,0,0,0,0,0,0,0,1,6)
site11<-c(5,0,9,0,0,0,5,0,0,0,0,0)
site12<-c(5,0,9,0,0,0,0,0,0,0,0,0)
site13<-c(5,1,9,0,0,0,0,0,0,0,0,0)
species_counts<-rbind(site1,site2,site3,site4,site5,site6,site7,site8,site9,site10,site11,site12,site13)

accum <- specaccum(species_counts, method="random", permutations=100)
plot(accum)

In order to ensure I have sampled sufficiently, I need to make sure the curve of the species accumulation plot reaches an asymptote, defined as a slope of <0.3 between the last two points (i.e. between sites 12 and 13).

results <- with(accum, data.frame(sites, richness, sd))

produces this:

   sites richness        sd
1      1     3.46 0.9991916
2      2     4.94 1.6625403
3      3     5.94 1.7513054
4      4     7.05 1.6779918
5      5     8.03 1.6542263
6      6     8.74 1.6794660
7      7     9.32 1.5497149
8      8     9.92 1.3534841
9      9    10.51 1.0492422
10    10    11.00 0.8408750
11    11    11.35 0.7017295
12    12    11.67 0.4725816
13    13    12.00 0.0000000

I feel like I'm getting there. I could fit an lm of site vs richness and extract the exact slope (tangent?) between sites 12 and 13. Going to search a bit longer here.
Streamlining your data generation process a little bit:

species_counts <- matrix(c(0,8,9,7,0,0,0,8,0,7,8,0,
                           5,0,9,0,5,0,0,0,0,0,0,0,
                           5,0,9,0,0,0,0,0,0,6,0,0,
                           5,0,9,0,0,0,0,0,0,0,0,0,
                           5,0,9,0,0,6,6,0,0,0,0,0,
                           5,0,9,0,0,0,6,6,0,0,0,0,
                           5,0,9,0,0,0,0,0,7,0,0,3,
                           5,0,9,0,0,0,0,0,0,0,1,0,
                           5,0,9,0,0,0,0,0,0,0,1,0,
                           5,0,9,0,0,0,0,0,0,0,1,6,
                           5,0,9,0,0,0,5,0,0,0,0,0,
                           5,0,9,0,0,0,0,0,0,0,0,0,
                           5,1,9,0,0,0,0,0,0,0,0,0),
                         byrow=TRUE, nrow=13)

It's always a good idea to set.seed() before running randomization tests (and to let us know that specaccum is in the vegan package):

set.seed(101)
library(vegan)
accum <- specaccum(species_counts, method="random", permutations=100)

Extract the richness and sites components from within the returned object and compute d(richness)/d(sites). Note that the slope vector is one element shorter than the original sites/richness vectors, so be careful if you're trying to match up slopes with particular numbers of sites:

(slopes <- with(accum, diff(richness)/diff(sites)))
## [1] 1.45 1.07 0.93 0.91 0.86 0.66 0.65 0.45 0.54 0.39 0.32 0.31

In this case the slope never actually goes below 0.3, so this code for finding the first time the slope falls below 0.3:

which(slopes < 0.3)[1]

returns NA.
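The threshold check can be verified without rerunning the permutations, using just the richness column printed in the question (these rounded values give slightly different slopes than the permutation output above, but the same conclusion):

```r
# Richness values copied from the question's results table
richness <- c(3.46, 4.94, 5.94, 7.05, 8.03, 8.74, 9.32, 9.92,
              10.51, 11.00, 11.35, 11.67, 12.00)
sites <- 1:13

# slope between consecutive sites: one element shorter than richness
slopes <- diff(richness) / diff(sites)
round(slopes, 2)

# first site-interval whose slope drops below the 0.3 cutoff
which(slopes < 0.3)[1]   # NA here: the curve never flattens past 0.3
```

Since even the final interval (sites 12 to 13) has slope 0.33, the asymptote criterion of <0.3 is not met with this sample.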
Mahalanobis distance between profiles in R
A sample of 100 subjects responded to two personality tests. These tests have slightly different wordings but are generally the same, i.e. they both measure the same 4 attitudes. Therefore, I have 2 matrices like this, with 4 scores per subject:

>test1
subj    A1   A2   A3    A4
1    -2.14 1.21 0.93 -1.72
2     0.25 1.17 0.67  0.67

>test2
subj    A1   A2   A3    A4
1    -1.99 1.11 1.00 -1.52
2     0.24 1.20 0.71  0.65

I'd like to evaluate the similarity of profiles in the two tests, i.e. the similarity of the two sets of 4 scores for each individual. I feel the Mahalanobis distance is the measure I need; I checked some packages (HDMD, StatMatch) but couldn't find the right function.
One approach to this is to create a difference-score matrix and then calculate the Mahalanobis distances on the difference scores:

testDiff <- test1 - test2
testDiffMahalanobis <- mahalanobis(testDiff,
                                   center = colMeans(testDiff),
                                   cov = cov(testDiff))
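A self-contained sketch of the same approach with simulated scores (the real test1/test2 matrices come from the question; everything below is fabricated to make the example run):

```r
# 100 subjects x 4 attitudes; test2 is test1 plus small wording noise
set.seed(42)
test1 <- matrix(rnorm(100 * 4), nrow = 100,
                dimnames = list(NULL, c("A1", "A2", "A3", "A4")))
test2 <- test1 + matrix(rnorm(100 * 4, sd = 0.2), nrow = 100)

testDiff <- test1 - test2
# one value per subject: Mahalanobis distance of that subject's
# difference profile from the average difference profile
d2 <- mahalanobis(testDiff, center = colMeans(testDiff), cov = cov(testDiff))
head(d2)
```

Note that base R's mahalanobis() returns *squared* distances; take sqrt(d2) if you want distances on the original scale.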
Histogram frequency count for column
I have a set of data (in the region of 800000 lines) in three columns (longitude, latitude and earthquake magnitude) that are not sorted in any way. A small example below:

-118.074 36.930 2.97
-118.005 36.898 2.61
-116.526 36.621 2.72
-116.488 36.650 2.68
-117.675 36.820 2.00
-117.963 36.514 1.30
-118.090 36.757 1.94
-117.651 36.518 1.40
-116.434 36.506 1.90
-117.914 36.531 2.10
-118.235 36.882 2.00

I am required to create a histogram of the earthquake magnitudes (in the range 1.0 to 7.0), but I am not sure how to count the magnitude frequencies. I understand that to create a histogram I need to find the unique values and set them in ascending order in a column; I believe I could then loop over them with a count for each value... but I need a bit of help in doing so! Thank you for any help you can offer!
awk '{counts[$3]++} END {for (c in counts) print c, counts[c]}' inputs.txt | sort -nk2

prints each unique magnitude with its count, sorted numerically by count (equal counts fall back to whole-line order, so they appear in ascending magnitude):

1.30 1
1.40 1
1.90 1
1.94 1
2.10 1
2.61 1
2.68 1
2.72 1
2.97 1
2.00 2
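The same frequency count takes one line of base R with table(), and hist() will bin the raw magnitudes directly without any manual counting. The `quakes_df` data frame below just re-enters the sample rows from the question.

```r
# Sample rows from the question (longitude, latitude, magnitude)
quakes_df <- data.frame(
  lon = c(-118.074, -118.005, -116.526, -116.488, -117.675, -117.963,
          -118.090, -117.651, -116.434, -117.914, -118.235),
  lat = c(36.930, 36.898, 36.621, 36.650, 36.820, 36.514,
          36.757, 36.518, 36.506, 36.531, 36.882),
  mag = c(2.97, 2.61, 2.72, 2.68, 2.00, 1.30, 1.94, 1.40, 1.90, 2.10, 2.00)
)

# unique magnitudes (ascending) with their frequencies
counts <- table(quakes_df$mag)
counts

# for the actual histogram, skip the counting and bin the raw values:
# hist(quakes_df$mag, breaks = seq(1, 7, by = 0.5))
```

For the full 800000-line file, `read.table("inputs.txt")` would load the three columns, and the same table()/hist() calls apply unchanged.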