Change units of time dimension in NetCDF file from months to months since - r

I currently have multiple NetCDF files with 4 dimensions (latitude, longitude, time, and depth). Each represents a single year of monthly data. The unit of time is "month", with values 1-12, which is of little use if I want to merge these files across years into a single NetCDF file with a time dimension of size months*years.
The time dimension attributes for a single file:
time Size:12 *** is unlimited ***
long_name: time
units: month
I used ncrcat from NCO to merge the files:
ncrcat soda3.3.1*sst.nc -O soda3.3.1_1980_2015_sst.nc
This works except that when merged, time values read
#in R
soda.info$var$temp$dim[[3]]$vals
[1] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
[26] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
[51] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
[76] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
[101] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
[126] 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6
[151] 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7
[176] 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8
[201] 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9
[226] 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
[251] 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11
[276] 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
[301] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1
[326] 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2
[351] 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3
[376] 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4
[401] 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5
[426] 6 7 8 9 10 11 12
...which obviously isn't much help if I want to keep track of time.
In the past I've only used NetCDF files with a "months since..." unit. Is there a way to change these rather groundless 'month' units to 'months since...'?

Would it suffice to enumerate the months sequentially?
ncap2 -s 'time=array(0,1,$time)' soda3.3.1_1980_2015_sst.nc out.nc
You can also add a "months since ..." unit to time as described in the comment by Chelmy and/or in the NCO manual. I leave that as an exercise for you, gentle reader.
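As a rough illustration in R of what the sequential index means once a reference date is attached: the sketch below assumes the merged series starts in January 1980 (inferred from the output filename, not confirmed by the file metadata) and maps the 432 index values 0..431 to calendar months. The names origin, month_index and dates are illustrative only.

```r
# Assumption: 36 years (1980-2015) of monthly data, i.e. 432 time steps
n_months <- 432
month_index <- 0:(n_months - 1)      # what 'array(0,1,$time)' would store

# Assumed reference date for a "months since 1980-01-01" style unit
origin <- as.Date("1980-01-01")

# Calendar month for each index value (first day of each month)
dates <- seq(origin, by = "month", length.out = n_months)

format(range(dates), "%Y-%m")
```

If the first file does not start in January 1980, only origin needs to change.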


How to order numeric values in a designed order in R?

My question is: given the target table (the second one below), how can I order the rows of the original table (the first one below) to get exactly the target table with R? Thank you in advance.
Original table:
A B
1 1
1 2
5 12
2 6
5 14
3 6
3 7
5 13
6 2
3 10
5 11
2 5
6 14
2 7
5 15
6 1
3 8
6 3
2 4
1 3
2 10
4 11
2 8
1 4
1 5
2 9
4 12
4 13
3 9
6 15
Target table:
A B
1 1
1 2
1 3
1 4
1 5
3 6
3 7
3 8
3 9
3 10
5 11
5 12
5 13
5 14
5 15
6 1
6 2
6 3
2 4
2 5
2 6
2 7
2 8
2 9
2 10
4 11
4 12
4 13
6 14
6 15
This can be accomplished by ordering by an odd/even flag, and dat$B:
dat[order(-(dat$A %% 2), dat$B),]
## A B
##1 1 1
##2 1 2
##20 1 3
##24 1 4
##25 1 5
##6 3 6
##7 3 7
##17 3 8
##29 3 9
##10 3 10
##11 5 11
##3 5 12
##8 5 13
##5 5 14
##15 5 15
##16 6 1
##9 6 2
##18 6 3
##19 2 4
##12 2 5
##4 2 6
##14 2 7
##23 2 8
##26 2 9
##21 2 10
##22 4 11
##27 4 12
##28 4 13
##13 6 14
##30 6 15
If it's not an odd/even split then you can manually set the 1/3/5, and 2/4/6 groups:
dat[order(`levels<-`(factor(dat$A), list('1'=c(1,3,5), '2'=c(6,2,4))), dat$B),]
This collapsed version of the code with levels<- called directly as a function is a bit hard to read, but it is equivalent to:
grpord <- factor(dat$A)
levels(grpord) <- list('1'=c(1,3,5), '2'=c(6,2,4))
dat[order(grpord, dat$B),]
...where "1" is assigned to the groups 1, 3 and 5, and "2" to the groups 6, 2 and 4.
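For a self-contained check, the sketch below rebuilds dat from the original table in the question and compares the odd/even ordering with a manual grouping written as a logical flag. The %in% flag is a simpler stand-in for the factor-level version above, not the answerer's code, and the names first_group, by_parity and by_flag are illustrative.

```r
# The original table from the question, row by row
dat <- data.frame(
  A = c(1, 1, 5, 2, 5, 3, 3, 5, 6, 3, 5, 2, 6, 2, 5,
        6, 3, 6, 2, 1, 2, 4, 2, 1, 1, 2, 4, 4, 3, 6),
  B = c(1, 2, 12, 6, 14, 6, 7, 13, 2, 10, 11, 5, 14, 7, 15,
        1, 8, 3, 4, 3, 10, 11, 8, 4, 5, 9, 12, 13, 9, 15)
)

# Odd values of A first, then by B (the first approach above)
by_parity <- dat[order(-(dat$A %% 2), dat$B), ]

# Manual grouping: rows with A in 1/3/5 first, everything else second
first_group <- dat$A %in% c(1, 3, 5)
by_flag <- dat[order(!first_group, dat$B), ]

identical(by_parity, by_flag)
```

Changing the vector passed to %in% is enough to move a value of A from one group to the other.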

Limit Number of Items Displayed in Legend - GGplot R

I have a large taxonomic dataset that I need to plot as a stacked bar chart. Sample Data:
ID X A B C D E F G
1 5 9 6 7 4 8 10 6
2 6 3 9 10 3 10 4 8
3 6 6 5 8 8 8 8 1
4 9 3 2 8 4 1 5 8
5 6 6 2 8 3 7 4 10
6 0 7 8 9 1 4 9 10
7 3 2 6 8 8 1 8 7
8 4 7 10 2 9 7 9 8
9 5 7 9 10 8 2 2 1
10 0 4 6 8 9 10 7 1
11 8 9 2 2 6 5 1 7
12 8 6 0 9 7 9 8 1
13 2 8 4 4 4 2 6 7
14 4 6 6 4 9 9 3 5
15 8 1 0 6 5 8 1 1
16 6 6 9 3 9 2 1 1
17 2 4 0 2 4 8 10 9
18 5 9 8 9 4 9 3 9
19 0 2 1 6 6 9 6 2
20 3 3 7 10 4 5 6 8
21 2 6 6 9 8 10 9 4
22 7 7 1 6 8 3 7 1
23 1 9 4 5 8 9 7 7
24 0 8 5 9 1 8 9 1
25 2 1 0 1 1 2 10 7
26 10 4 1 8 2 5 9 0
27 2 7 10 10 2 3 8 6
28 6 4 2 6 7 3 1 0
29 8 1 3 4 1 10 3 6
30 1 6 5 4 7 9 7 10
31 4 4 3 2 2 9 0 4
32 9 6 6 1 6 1 5 2
The plotting part is no problem, using ggplot as below:
l5 <- read.xlsx(paste(taxawmeta,taxawmeta_files[2], sep = ""), sheetIndex = 1)
l5_long <- l5 %>% gather(taxa,value,-c(X.FinalSampleID,TimePoint_Luna))
ggplot(l5_long, aes(fill=taxa, y = value, x = X.FinalSampleID)) +
geom_bar(position='stack', stat='identity') +
theme_minimal() +
labs(x='Sample', y='Relative Abundance', title='Family Level Relative Abundance') +
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank(),
legend.position="none")
Where I'm running into an issue is that the actual dataset has almost 200 variables, which means the legend is completely out of control. I know I can just hide the legend with:
theme(legend.position="none")
... but what I'd like to do is keep say the top 10 entries as those are the ones of most interest. Is there any simple method to limit the number of items that are displayed in the legend? Anything I've found so far seems very convoluted and not directly applicable to this problem.
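One possible approach, sketched here on a small made-up data frame rather than the real dataset, is to rank the taxa by total value and pass only the top ones to the breaks argument of the fill scale. All taxa are still drawn; only the legend is shortened. The data, the column names and the cutoff of 3 are assumptions for illustration, and the plotting step only runs if ggplot2 is installed.

```r
# Hypothetical long-format data: 4 samples x 6 taxa
l5_long <- data.frame(
  sample = rep(paste0("S", 1:4), each = 6),
  taxa   = rep(LETTERS[1:6], times = 4),
  value  = rep(c(6, 1, 5, 2, 4, 3), times = 4)
)

# Rank taxa by total value and keep the names of the top 3
totals   <- tapply(l5_long$value, l5_long$taxa, sum)
top_taxa <- names(sort(totals, decreasing = TRUE))[1:3]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # 'breaks' limits which fill levels get a legend entry
  p <- ggplot(l5_long, aes(x = sample, y = value, fill = taxa)) +
    geom_col(position = "stack") +
    scale_fill_discrete(breaks = top_taxa)
}
```

For the real data the cutoff would be 10 instead of 3, and the ranking could use the mean instead of the sum if samples differ in depth.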

Create numbers based on different probability in R

I am trying to simulate an i*j matrix of data, with i = 2 and j = 200 representing subjects and trials respectively, filled with random integers between 1 and 10 whose probabilities depend on the trial. For the first subject (i = 1), in the first 100 trials (j = 1-100) there is a 70% probability of drawing a number from 1-5 and a 30% probability of drawing a number from 6-10, and the probabilities reverse in trials 101 to 200. For the second subject (i = 2), in the first 100 trials (j = 1-100) there is a 60% probability of drawing a number from 1-5 and a 40% probability of drawing a number from 6-10, and the probabilities reverse in trials 101 to 200.
I gave an example with 2 subjects because I need to do this for multiple values of i, not just one.
Can I work this out with sample?
I guess what you are after is Stratified Sampling.
With base R, you can implement stratified sampling via sample, but you may need to define a user function like f as below
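f <- function(N, p) {
  # First N draws use p, the next N use the reversed probabilities
  c(
    sapply(
      list(p, rev(p)),
      function(v) {
        # TRUE means "draw from 1:5", FALSE means "draw from 6:10"
        sapply(
          sample(c(TRUE, FALSE), N, replace = TRUE, prob = v),
          function(x) ifelse(x, sample(1:5, 1), sample(6:10, 1))
        )
      }
    )
  )
}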
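When you use it, you first define a list probs holding the probability pair for each subject, e.g.,
probs <- list(c(0.7, 0.3), c(0.6, 0.4))
and then run it with N set to the length of each half (100 trials here, so each subject gets 200 values in total)
> lapply(probs, f, N = 100)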
[[1]]
[1] 2 1 2 5 3 6 9 2 2 2 3 2 3 7 4 5 3 7 1 4 10 2 3 6 8
[26] 7 8 3 1 2 5 1 4 4 4 2 1 5 5 4 1 6 4 2 9 10 5 1 1 5
[51] 4 4 3 4 8 4 10 3 2 1 3 4 7 4 2 10 1 4 3 3 5 2 7 6 5
[76] 3 10 4 2 2 5 1 2 3 2 3 3 2 9 10 10 10 10 3 1 4 3 1 1 5
[101] 8 6 5 9 1 6 1 9 10 4 5 4 6 5 8 2 4 10 6 3 8 5 10 8 8
[126] 8 9 3 8 6 5 7 10 9 6 8 9 5 6 8 4 6 6 7 4 4 8 10 10 6
[151] 9 10 9 7 8 7 3 7 4 6 10 8 10 8 5 6 10 8 9 6 6 1 9 4 8
[176] 1 5 10 7 10 8 7 6 6 5 4 7 7 8 8 1 10 8 5 8 9 4 5 6 7
[[2]]
[1] 7 9 4 9 5 3 3 9 4 5 6 10 4 5 2 3 2 5 4 5 3 8 5 2 1
[26] 6 5 3 9 3 9 9 9 8 7 3 4 5 7 3 5 3 5 7 5 3 4 2 6 4
[51] 7 6 2 7 4 4 10 4 10 2 8 10 3 2 8 1 8 10 8 4 3 2 9 8 4
[76] 4 10 1 3 10 6 8 6 3 5 2 3 3 9 4 7 5 1 1 1 3 10 5 2 7
[101] 2 10 2 6 8 10 10 7 3 7 3 3 7 1 10 3 4 1 1 8 2 5 2 4 7
[126] 2 7 7 4 9 10 7 1 4 4 9 7 9 9 9 8 4 1 10 6 10 4 4 8 9
[151] 7 8 3 2 9 1 9 7 6 9 1 6 3 9 7 8 5 9 3 8 9 6 5 1 2
[176] 5 10 2 7 8 7 8 8 8 8 8 5 1 1 7 6 3 3 4 2 3 2 3 1 3
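As a possible alternative that answers the "can I do this with sample?" part more directly, the sketch below passes per-number weights to a single sample call for each half. It is not the answerer's code; the function name simulate_subject and its arguments n_half and p_low are made up for illustration, and it assumes the five numbers inside each range are equally likely.

```r
# n_half: trials per half; p_low: probability of a number in 1:5 in the first half
simulate_subject <- function(n_half, p_low) {
  # Spread p_low evenly over 1:5 and (1 - p_low) evenly over 6:10
  w_first <- rep(c(p_low, 1 - p_low) / 5, each = 5)
  c(
    sample(1:10, n_half, replace = TRUE, prob = w_first),
    # Reversing the weights swaps the probabilities of the two ranges
    sample(1:10, n_half, replace = TRUE, prob = rev(w_first))
  )
}

set.seed(42)
# One row per subject, one column per trial
sim <- t(sapply(c(0.7, 0.6), simulate_subject, n_half = 100))
dim(sim)
```

Adding more subjects only requires extending the vector of p_low values.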

Extracting numbers from very long string into vector

I have the fairly long string shown below (~50k characters)
https://gist.github.com/anonymous/9de31de2e6fc9888f3debeda4698b739
I want to extract numbers (always 1 or 2 digit), that are always between "'>" and "<" and add them to a vector (must be in the correct order).
for example:
><td class='td-val ball-8'>13</td><td class='td-val ball-8'>9</td>
Would output a vector, [13,9]
I couldn't even enter my string into R when I tried to do it in the form:
mystring <- "text here"
When I would try to press enter then, it would just have a + next to the command line. So I think some of the symbols in the text were messing it up.
Since it's HTML that you're trying to parse, it's best to use an HTML parsing package like rvest:
library(rvest)
url <- 'https://gist.githubusercontent.com/anonymous/9de31de2e6fc9888f3debeda4698b739/raw/c07c2d6c6f00060806b15ec57ed06d4a4e0d9d74/gistfile1.txt'
url %>% read_html() %>% html_nodes('td.td-val') %>% html_text() %>% as.integer()
which returns
[1] 13 9 8 8 1 2 0 8 11 2 13 5 13 4 4 5 4 7 3 8 10 13 1 7 14 13 10 2 0 8
[31] 13 0 10 5 11 9 3 1 4 3 5 12 4 14 1 9 13 5 9 7 12 10 2 10 14 4 11 11 13 8
[61] 8 10 10 12 12 6 8 13 7 2 2 9 10 9 13 3 14 14 0 14 4 11 14 6 10 2 0 0 10 14
[91] 2 8 3 6 14 6 1 9 11 12 1 12 4 0 7 9 2 10 1 12 0 8 0 9 3 11 11 0 8 5
[121] 0 6 1 9 8 10 7 4 7 0 3 12 10 11 11 8 4 11 1 5 12 2 14 9 12 8 1 9 14 13
[151] 8 2 1 5 7 9 14 14 12 3 6 3 9 0 6 9 3 3 10 3 8 6 9 2 4 12 2 2 14 7
[181] 12 8 0 8 12 2 12 9 6 8 9 9 3 7 9 0 6 13 0 12 3 14 12 4 8 9 14 4 5 9
[211] 6 3 2 5 1 2 0 5 0 5 9 0 12 14 11 11 7 4 12 1 14 2 13 3 13 2 0 12 13 6
[241] 5 3 13 9 12 2 11 6 8 12 9 6 13 9 0 0 4 2 1 0 0 3 0 3 7 9 11 1 8 10
[271] 11 13 12 9 10 8 10 3 7 12 4 9 0 4 14 1 7 0 7 1 2 6 0 6 6 1 0 9 4 8
[301] 0 7 13 8 11 4 1 12 1 14 11 13 9 12 8 2 8 7 12 13 12 5 8 5 10 2 7 5 9 12
[331] 12 13 8 7 6 4 12 13 4 9 12 2 0 11 8 9 1 10 5 10 9 11 10 1 8 1 12 10 9 5
[361] 7 10 5 2 7 12 4 10 6 9 0 6 0 4 13 7 0 8 3 3 11 8 4 12 10 5 7 1 11 3
[391] 1 11 7 14 13 13 14 4 2 11 2 12 3 6 14 10 6 13 9 12 4 13 10 3 9 11 8 4 8 10
[421] 9 6 3 6 7 5 11 0 2 7 6 11 11 13 13 12 7 9 6 9 5 12 14 3 13 10 1 2 7 1
[451] 14 1 0 7 8 13 6 3 9 12 2 2 2 7 11 1 2 14 6 13 11 3 6 11 5 9 0 9 13 10
[481] 11 13 3 12 12 3 7 6 5 14 3 9 10 6 13 5 7 4 5 12 8 14 5 6 8 7 0 0 2 1
[511] 1 9 13 13 5 6 10 8 0 2 3 4 4 5 14 13 5 2 2 4 6 5 9 6 14 8 4 12 4 6
[541] 9 1 4 2 4 9 1 7 1 10 0 1 1 8 6 5 8 4 9 11 14 2 3 8 2 11 3 7 11 2
[571] 4 9 5 3 4 1 4 8 13 4 8 8 1 7 2 7 3 11 13 1 13 7 9 3 7 7 4 12 9 14
[601] 11 9 2 12 12 14 10 4 12 11 12 10 14 3 11 6 12 3 6 3 11 8 10 2 6 3 1 11 2 6
[631] 0 8 12 5 5 3 6 2 14 11 7 14 14 8 11 2 7 0 10 2 0 4 8 9 8 3 2 13 4 10
[661] 2 5 13 2 2 12 12 0 10 4 1 5 13 3 10 3 11 2 5 3 9 6 11 0 8 12 0 11 2 11
[691] 7 8 1 3 4 14 4 4 9 5 12 7 6 9 12 13 2 11 1 11 12 0 4 6 10 8 5 14 7 6
[721] 4 7 2 5 2 14 3 8 10 6 14 7 14 3 2 6 5 0 3 0 12 0 12 3 5 5 8 5 14 6
[751] 10 14 5 2 3 11 3 4 3 11 4 2 0 11 11 13 4 0 6 14 2 6 9 10 4 9 5 7 1 13
[781] 8 3 13 3 10 4 8 1 3 11 2 8 5 10 7 6 10 14 14 2 2 12 8 4 13 7 11 13 4 5
[811] 7 2 3 8 14 3 9 12 6 2 6 0 3 5 8 8 0 14 13 13 7 10 9 6 1 0 4 8 6 8
[841] 14 1 9 0 9 2 7 10 8 5 10 7 1 8 2 13 3 1 8 12 12 2 5 6 3 9 4 5 4 13
[871] 6 3 10 7 9 2 1 12 1 11 0 10 0 11 8 8 0 7 0 11 10 3 14 6 9 11 11 0 12 1
[901] 10 13 1 7 7 2 0 3 13 9 2 4 12 3 0 11 1 8 8 13 12 6 8 13 8 1 13 11 2 9
[931] 11 8 10 8 3 14 6 14 7 6 7 10 3 11 3 13 11 3 9 13 8 10 8 7 12 4 11 12 12 9
[961] 6 10 2 8 13 7 11 5 7 12 10 14 1 6 7 6 7 2 3 5 13 6 10 9 5 2 0 1 11 8
[991] 9 5 1 3 3 1 12 1 13 2 14 5 7 1 10 9 0 9 11 10 6 2 7 12 10 6 2 10 13 4
[1021] 9 9 14 4 4 5 7 13 13 13 6 7 12 1 6 11 12 14 4 11 6 4 10 0 9 12 10 10 13 8
[1051] 3 3 0 8 5 14 10 3 7 5 0 14 5 6 10 14 7 4 8 9 1 6 14 1 14 5 5 14 4 11
[1081] 12 14 9 13 14 13 2 13 11 9 14 2 1 9 8 11 13 11 14 13 3 4 9 6 9 6 10 13 1 12
[1111] 10 14 11 5 8 9 3 5 6 14 1 11 10 12 7 7 2 13 13 12 12 4 3 14 6 4 2 5 9 4
[1141] 14 11 6 4 11 6 4 4 8 2 2 5 14 1 7 11 8 9 11 11 10 6 14 3 0 3 8 8 14 13
[1171] 10 6 10 4 9 12 0 9 2 9 13 12 1 12 3 5 5 3 12 2 1 5 1 0 10 7 3 10 14 13
[1201] 11 8 0 10 12 9 4 5 4 8 5 6 2 11 7 5 5 8 4 9 9 10 14 3 7 9 1 9 9 8
[1231] 1 8 11 5 2 4 9 14 14 6 10 7 4 14 6 5 1 4 3 8 13 10 5 1 8 8 6 8 7 1
[1261] 14 4 4 7 2 12 10 8 10 5 6 7 2 3 5 13 1 2 9 8 5 14 1 11 9 5 8 12 13 0
[1291] 4 2 0 8 8 2 5 3 13 11 5 11 14 14 9 12 4 5 9 3 13 14 1 5 10 4 9 6 5 8
[1321] 7 5 7 3 14 8 4 8 4 6 5 8 11 0 14 13 2 13 12 13 3 4 7 8 11 4 14 12 3 6
[1351] 11 8 8 9 6 7 4 3 10 9 2 9 12 12 0 1 10 9 8 0 12 9 3 14 13 7 8 12 10 9
[1381] 10 10 2 11
You can use readLines to import the string from the raw URL, which you can get by clicking the Raw button.
mystring <- readLines("https://gist.githubusercontent.com/anonymous/9de31de2e6fc9888f3debeda4698b739/raw/c07c2d6c6f00060806b15ec57ed06d4a4e0d9d74/gistfile1.txt")
Using a regular expression as follows should give you all the numbers you want:
library(stringr)
num <- gsub(">|<", "", str_extract_all(mystring, ">\\d+<", simplify = TRUE))
head(as.vector(num))
[1] "13" "9" "8" "8" "1" "2"
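For comparison, here is a minimal base R sketch on the two-cell example from the question, with no packages needed. It assumes the numbers always sit between '> and <, as the question states. The string is wrapped in double quotes because it contains single quotes. The names html, hits and nums are illustrative.

```r
# The short example from the question (single quotes inside double quotes)
html <- "<td class='td-val ball-8'>13</td><td class='td-val ball-8'>9</td>"

# Look-behind for '> and look-ahead for <, so only the digits are matched
pattern <- "(?<='>)\\d{1,2}(?=<)"
hits <- regmatches(html, gregexpr(pattern, html, perl = TRUE))[[1]]

# Matches come back in document order, as character strings
nums <- as.integer(hits)
nums
```

The as.integer step matters if the result is to be used as numbers, since the extracted matches are character strings.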

Sequentially reorganize a vector in R

I have a numeric vector z as below:
> sort(z)
[1] 1 5 5 5 6 6 7 7 7 7 7 9 9
I would like to renumber its values sequentially so as to have
> z
[1] 1 2 2 2 3 3 4 4 4 4 4 5 5
I guess converting z to a factor and using it as an index should be the way.
You answered it yourself really:
as.integer(factor(sort(z)))
I know this has been accepted already but I decided to look inside factor() to see how it's done there. It more or less comes down to this:
x <- sort(z)
match(x, unique(x))
Which is an extra line I suppose but it should be faster if that matters.
This should do the trick
z = sort(sample(1:10, 100, replace = TRUE))
cumsum(diff(z) != 0) + 1
[1] 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
[26] 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6
[51] 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8
[76] 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10
Here diff(z) != 0 flags every position where the value changes, so the result stays sequential even when z has gaps (such as 1 5 5 5 6 ...). Note that diff omits the first element of the series. So to compensate:
c(1, cumsum(diff(z) != 0) + 1)
Alternative using rle, repeating each run's position (rather than its value) by the run length:
z = sort(sample(1:10, 100, replace = TRUE))
rle_result = rle(z)
> rep(seq_along(rle_result$lengths), rle_result$lengths)
[1] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3
[26] 3 3 3 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 6 6 6
[51] 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8
[76] 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10
Or as a one-liner, where x is the sorted vector:
rep(seq_along(rle(x)$lengths), rle(x)$lengths)
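To compare the approaches on the vector from the question (which has gaps between its values), the following sketch runs each one and checks that they agree. The variable names are illustrative, and the cumsum variant is written with diff(x) != 0 so that gaps larger than 1 do not inflate the result.

```r
# The sorted vector from the question
x <- c(1, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 9, 9)

by_factor <- as.integer(factor(x))            # factor codes
by_match  <- match(x, unique(x))              # position among unique values
by_cumsum <- c(1, cumsum(diff(x) != 0) + 1)   # count value changes
runs      <- rle(x)
by_rle    <- rep(seq_along(runs$lengths), runs$lengths)

rbind(by_factor, by_match, by_cumsum, by_rle)
```

All four rows should read 1 2 2 2 3 3 4 4 4 4 4 5 5. The cumsum and rle variants rely on x being sorted, while factor and match number the values by sorted order and by first appearance respectively.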
