Extracting layers from a large NetCDF file using R

I am trying to extract layers from a large NetCDF file using R. My nc file's variables are:
5 variables:
[1] " double time_bnds[bnds,time] "
[1] " double plev_bnds[bnds,plev] "
[1] " double lat_bnds[bnds,lat] "
[1] " double lon_bnds[bnds,lon] "
[1] " float hur[lon,lat,plev,time] "
[1] " standard_name: relative_humidity"
[1] " long_name: Relative Humidity"
So I want to get the relative humidity (hur) data for all lon and lat at a specific plev and a specific time:
plev<-get.var.ncdf(ncin,"plev")
time<-get.var.ncdf(ncin,"time")
plev2<-plev[2]
time2<-time[2]
hur<-get.var.ncdf(ncin,"hur",start=c(1,1,plev2,time2), count=c(1,1,2,2))
I am getting this error:
Error in R_nc_get_vara_double: NetCDF: Index exceeds dimension bound
Var: hur  Ndims: 4  Start: 711901,92499,0,0  Count: 2,2,144,192
Error in get.var.ncdf(ncin, "hur", start = c(1, 1, plev2, time2), count = c(-1, :
  C function R_nc_get_vara_double returned error
How do I solve this?
EDIT: solution
lat and lon are vectors.
hur<-get.var.ncdf(ncin,"hur", start=c(1,1,2,2), count=c(dim(lon),dim(lat),1,1))
Previously I passed the value of plev2 directly, but we should pass the index, which is 2.
Thank you @Pascal
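To generalise the fix: start= takes array indices, not coordinate values, so look the index up first. A minimal sketch along those lines, reusing the ncin handle (the 500 hPa target is only an illustration):
plev  <- get.var.ncdf(ncin, "plev")
time  <- get.var.ncdf(ncin, "time")
iplev <- which.min(abs(plev - 50000))   # index of the level nearest 500 hPa (50000 Pa is an assumed target)
itime <- 2                              # second time step, as in the solution above
hur   <- get.var.ncdf(ncin, "hur",
                      start = c(1, 1, iplev, itime),
                      count = c(-1, -1, 1, 1))    # -1 reads the whole lon/lat dimension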

Related

Extracting information from PDFs that have line spills using R

I am trying to extract information from PDF files using R. The data I want are in tables, although they aren't recognised as tables by R.
I am using pdftools to read in the PDF file, export it to a text file, and then re-read it line by line.
The files look like the readLines() output shown below.
I want to extract the "Net cash from / (used in) operating activities" figures, but because the line spills over onto neighbouring lines, this is hard.
pdf_text <- pdf_text("test.pdf")
write.table(pdf_text,"out.txt")
just <- readLines("out.txt")
> just[30:40]
[1] " (g) insurance costs - (137)"
[2] " 1.3 Dividends received (see note 3) - -"
[3] " 1.4 Interest received 9 21"
[4] " 1.5 Interest and other costs of finance paid - -"
[5] " 1.6 Income taxes paid - -"
[6] " 1.7 Government grants and tax incentives - -"
[7] " 1.8 Other (provide details if material) - -"
[8] " 1.9 Net cash from / (used in) operating"
[9] " (1,258) (3,785)"
[10] " activities"
I want to grab the numbers (1,258) and (3,785), still with the parentheses around them.
A common complication is that the numbers can end up on line 8, 9 or 10 (using my example above as reference), so I can't simply grab the data that sits 'next' to "Net cash from / (used in) operating activities".
This code almost arrives at the desired result:
> text_file <- readLines("out.txt")
> operating_line <- grep("Net cash from / \\(used in\\) operat", text_file)
> operating_line <- operating_line[1]
> number_line1 <- text_file[operating_line]
> number_line2 <- text_file[operating_line + 1]
> number_line3 <- text_file[operating_line - 1]
> if (gsub("[^()[:digit:],]+", "", number_line1) != "") {
+ numbers <- gsub("[^()[:digit:],]+", "", number_line1)
+ } else if (gsub("[^()[:digit:],]+", "", number_line2) != "") {
+ numbers <- gsub("[^()[:digit:],]+", "", number_line2)
+ } else {
+ numbers <- gsub("[^()[:digit:],]+", "", number_line3)
+ }
> numbers <- gsub("\\d+\\(\\)", "", numbers)
> numbers
[1] "(1,258)(3,785)"
However, there is no gap between the (1,258) and (3,785), i.e. they are not being identified as separate elements.
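A minimal sketch of one way around this, reusing text_file and operating_line from the snippet above: instead of deleting everything that is not a digit, pull each amount out as a separate match with gregexpr()/regmatches(). The pattern assumes the amounts are either parenthesised or at least three digits/commas long, so item numbers such as 1.9 are skipped.
amount_pattern <- "\\([0-9][0-9,]*\\)|\\b[0-9][0-9,]{2,}\\b"
candidates <- text_file[c(operating_line, operating_line + 1, operating_line - 1)]
hits <- regmatches(candidates, gregexpr(amount_pattern, candidates, perl = TRUE))
numbers <- unlist(hits[lengths(hits) > 0][1])   # first neighbouring line that contains amounts
numbers
#> [1] "(1,258)" "(3,785)"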

R Column calculated into exponential/e numbers rather than decimal [duplicate]

I have a dataframe with a column of p-values, and I want to make a selection on these p-values.
> pvalues_anova
[1] 9.693919e-01 9.781728e-01 9.918415e-01 9.716883e-01 1.667183e-02
[6] 9.952762e-02 5.386854e-01 9.997699e-01 8.714044e-01 7.211856e-01
[11] 9.536330e-01 9.239667e-01 9.645590e-01 9.478572e-01 6.243775e-01
[16] 5.608563e-01 1.371190e-04 9.601970e-01 9.988648e-01 9.698365e-01
[21] 2.795891e-06 1.290176e-01 7.125751e-01 5.193604e-01 4.835312e-04
The selection is done like this:
anovatest <- results[-which(results$pvalues_anova < 0.8), ]
This works fine when I run it in R, but when I run it in another application (Galaxy), the numbers that don't have e-01, e.g. 4.835312e-04, are not thrown out.
Is there another way to notate p-values, like 0.0004835312 instead of 4.835312e-04?
You can effectively remove scientific notation in printing with this code:
options(scipen=999)
format(99999999,scientific = FALSE)
gives
99999999
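A small illustrative sketch with values from the question: the scientific notation only affects how the numbers print, not how they compare, so the p-value selection itself behaves the same either way.
p <- c(9.693919e-01, 4.835312e-04)
p < 0.8
#> [1] FALSE  TRUE
format(4.835312e-04, scientific = FALSE)   # fixed-point representation of the same number
#> [1] "0.0004835312"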
Summarising all existing answers (and adding a few points of my own).
Note: in the explanations below, value is the number to be represented in some (integer/float) format.
Solution 1 :
options(scipen=999)
Solution 2 :
format(value, scientific=FALSE);
Solution 3 :
as.integer(value);
Solution 4 :
You can use integers, which don't get printed in scientific notation. You can mark a number as an integer by putting an "L" after it:
paste(100000L)
will print 100000
Solution 5 :
Control formatting tightly using sprintf() (see also the fixed-point sketch for non-integer values further below)
sprintf("%6d", 100000)
will print 100000
Solution 6 :
prettyNum(value, scientific = FALSE, digits = 16)
I also find the prettyNum(..., scientific = FALSE) function useful for printing when I don't want trailing zeros. Note that these functions are useful for printing purposes, i.e., the output of these functions are strings, not numbers.
p_value <- c(2.45496e-5, 3e-17, 5.002e-5, 0.3, 123456789.123456789)
format(p_value, scientific = FALSE)
#> [1] " 0.00002454960000000" " 0.00000000000000003"
#> [3] " 0.00005002000000000" " 0.29999999999999999"
#> [5] "123456789.12345679104328156"
format(p_value, scientific = FALSE, drop0trailing = TRUE)
#> [1] " 0.0000245496" " 0.00000000000000003"
#> [3] " 0.00005002" " 0.29999999999999999"
#> [5] "123456789.12345679104328156"
# Please note that the last number's last two digits are rounded:
prettyNum(p_value, scientific = FALSE, digits = 16)
#> [1] "0.0000245496" "0.00000000000000003" "0.00005002"
#> [4] "0.3" "123456789.1234568"
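Returning to Solution 5: a fixed-point sprintf() format is a minimal way to write non-integer values such as the question's p-values without scientific notation.
sprintf("%.10f", 4.835312e-04)   # "%.10f" keeps 10 decimal places
#> [1] "0.0004835312"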

How to read a matrix in R with a set size

I have a matrix, saved as a file (no extension) looking like this:
Peter Westons NH 54 RTcoef level B matrix from L70 Covstats.
2.61949322E+00 2.27966995E+00 1.68120147E+00 9.88238464E-01 8.38279026E-01
7.41276375E-01
2.27966995E+00 2.31885465E+00 1.53558372E+00 4.87789344E-01 2.90254400E-01
2.56963125E-01
1.68120147E+00 1.53558372E+00 1.26129096E+00 8.18048022E-01 5.66120186E-01
3.23866166E-01
9.88238464E-01 4.87789344E-01 8.18048022E-01 1.38558423E+00 1.21272607E+00
7.20283781E-01
8.38279026E-01 2.90254400E-01 5.66120186E-01 1.21272607E+00 1.65314082E+00
1.35926028E+00
7.41276375E-01 2.56963125E-01 3.23866166E-01 7.20283781E-01 1.35926028E+00
1.74777330E+00
How do I go about reading this in as a fixed 6*6 matrix, skipping the header line? I don't see an option for the number of columns in read.matrix, and I tried the scan() -> matrix() route, but I couldn't read the file in because the skip parameter in scan() didn't seem to work for me. I feel there must be a simple option for this.
My original file is larger: each matrix row is spread over 17 full lines of 5 elements plus 1 line with a single element, like this example of one row:
[1] " 2.61949322E+00 2.27966995E+00 1.68120147E+00 9.88238464E-01 8.38279026E-01"
[2] " 7.41276375E-01 5.23588785E-01 1.09559244E-01 -9.58430529E-02 -3.24544839E-02"
[3] " 1.96694874E-02 3.39249911E-02 1.54438478E-02 2.38380549E-03 9.59475077E-03"
[4] " 8.02748175E-03 1.63922615E-02 4.51778592E-04 -1.32080759E-02 -2.06313988E-02"
[5] " -1.56037533E-02 -3.35496588E-03 -4.22450803E-03 -3.17468525E-03 3.23012615E-03"
[6] " -8.68914773E-03 -5.94151619E-03 2.34059840E-04 -2.76737270E-03 -4.90334584E-03"
[7] " 1.53812087E-04 5.69891977E-03 5.33816835E-03 3.32982333E-03 -2.62856968E-03"
[8] " -5.15188677E-03 -4.47782553E-03 -5.49510247E-03 -3.71780229E-03 9.80192203E-04"
[9] " 4.18101180E-03 5.47513662E-03 4.14679058E-03 -2.81461574E-03 -4.67580613E-03"
[10] " 3.41841523E-04 4.07771227E-03 7.06154094E-03 6.61650765E-03 5.97925136E-03"
[11] " 3.92987162E-03 1.72895946E-03 -3.47249017E-03 9.90977857E-03 -2.36066909E-31"
[12] " -8.62803933E-32 -1.32472387E-31 -1.02360189E-32 -5.11800943E-33 -4.16409844E-33"
[13] " -5.11800943E-33 -2.52126889E-32 -2.52126889E-32 -4.16409844E-33 -4.16409844E-33"
[14] " -5.11800943E-33 -5.11800943E-33 -4.16409844E-33 -2.52126889E-32 -2.52126889E-32"
[15] " -2.52126889E-32 -1.58614773E-33 -1.58614773E-33 -2.55900472E-33 -1.26063444E-32"
[16] " -7.93073863E-34 -1.04102461E-33 -3.19875590E-34 -3.19875590E-34 -3.19875590E-34"
[17] " -2.60256152E-34 -1.30128076E-34 0.00000000E+00 1.78501287E-02 -1.14423068E-11"
[18] " 3.00625863E-02"
So the full matrix should be 86*86.
Thanks a bunch
Try this option:
Read the file with readLines, removing the first line ([-1]).
Split the values on whitespace and build one 1 x 6 row from every pair of lines.
Combine the rows into a single matrix with do.call(rbind, ..).
rows <- readLines('filename')[-1]
result <- do.call(rbind,
  tapply(rows, ceiling(seq_along(rows) / 2), function(x)
    strsplit(paste0(trimws(x), collapse = ' '), '\\s+')[[1]]))
result <- matrix(as.numeric(result), nrow = nrow(result))  # strsplit() returns characters; make the matrix numeric
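An alternative sketch with scan(), since the question mentions it (the 6 below fits the small example; the full file would use 86). scan() does not care how many numbers sit on each physical line, so the 17-full-lines-plus-one layout per matrix row is handled automatically:
vals <- scan('filename', skip = 1)                      # skip = 1 drops the header line
mat  <- matrix(vals, nrow = 6, ncol = 6, byrow = TRUE)  # fill the matrix row by row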

How to convert a rotated NetCDF back to a normal lat/lon grid?

I have a NetCDF file with rotated coordinates. I need to convert it to normal lat/lon coordinates (-180 to 180 for lon and -90 to 90 for lat).
library(ncdf4)
nc_open('dat.nf')
Printing the file object shows:
[1] " 5 variables (excluding dimension variables):"
[1] " double time_bnds[bnds,time] "
[1] " double lon[rlon,rlat] "
[1] " long_name: longitude"
[1] " units: degrees_east"
[1] " double lat[rlon,rlat] "
[1] " long_name: latitude"
[1] " units: degrees_north"
[1] " char rotated_pole[] "
[1] " grid_mapping_name: rotated_latitude_longitude"
[1] " grid_north_pole_longitude: 83"
[1] " grid_north_pole_latitude: 42.5"
[1] " float tasmax[rlon,rlat,time] "
[1] " long_name: Daily Maximum Near-Surface Air Temperature"
[1] " standard_name: air_temperature"
[1] " units: K"
[1] " cell_methods: time:maximum within days time:mean over days"
[1] " coordinates: lon lat"
[1] " grid_mapping: rotated_pole"
[1] " _FillValue: 1.00000002004088e+20"
[1] " 4 dimensions:"
[1] " rlon Size:310"
[1] " long_name: longitude in rotated pole grid"
[1] " units: degrees"
[1] " axis: X"
[1] " standard_name: grid_longitude"
[1] " rlat Size:260"
[1] " long_name: latitude in rotated pole grid"
[1] " units: degrees"
[1] " axis: Y"
[1] " standard_name: grid_latitude"
[1] " bnds Size:2"
Could anyone show me how to convert the rotated coordinates back to normal lat/lon? Thanks.
NCO's ncks can probably do this in two commands using MSA
ncks -O -H --msa -d Lon,0.,180. -d Lon,-180.,-1.0 in.nc out.nc
ncap2 -O -s 'where(Lon < 0) Lon=Lon+360' out.nc out.nc
I would use cdo for this purpose: https://code.zmaw.de/boards/2/topics/102
Another option is to just create a mapping between rotated and geographic coordinates and use the original data without interpolation. I can find the equations if necessary.
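A minimal R sketch of that mapping idea, assuming ncdf4 and the file from the question: the 2-D lon[rlon,rlat] and lat[rlon,rlat] variables listed in the header already store the geographic coordinates of every rotated grid cell, so they can be read and used directly, without any rotation formula or interpolation.
library(ncdf4)
nc  <- nc_open("dat.nf")                     # file name as given in the question
lon <- ncvar_get(nc, "lon")                  # geographic longitude, dims rlon x rlat
lat <- ncvar_get(nc, "lat")                  # geographic latitude,  dims rlon x rlat
tmx <- ncvar_get(nc, "tasmax", start = c(1, 1, 1), count = c(-1, -1, 1))  # first time step
nc_close(nc)
# Each cell tmx[i, j] is located at (lon[i, j], lat[i, j]); shift the longitudes
# to the -180..180 range if needed:
lon <- ifelse(lon > 180, lon - 360, lon)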
I went through the CDO link suggested by @kakk11, but somehow it did not work for me. After much research, I found a way.
First, convert the rotated grid to a curvilinear grid:
cdo setgridtype,curvilinear Sin.nc out.nc
Next, transform to your desired grid, e.g. for a global 1x1 degree grid:
cdo remapbil,global_1 out.nc out2.nc
or, for a grid like the one below:
gridtype = lonlat
xsize = 320 # replace by your value
ysize = 180 # replace by your value
xfirst = 1 # replace by your value
xinc = 0.0625 # replace by your value
yfirst = 43 # replace by your value
yinc = 0.0625 # replace by your value
Save this info as target_grid.txt and then run:
cdo remapbil,target_grid.txt out.nc out2.nc
In my case there was an additional issue: my variables did not carry the grid information, so CDO assumed a regular lat-lon grid. Before all the above-mentioned steps, I therefore had to add the grid information attributes to all the variables (in my case all variable names ended with _ave) using NCO:
ncatted -a coordinates,'_ave$',c,c,'lon lat' in.nc
ncatted -a grid_mapping,'_ave$',c,c,'rotated_pole' in.nc
Please note that you should have a variable called rotated_pole in your nc file with the lat/lon information of the rotated pole.
It is also possible to do this in R (as the question specifically asks about R). Of course, NCO and CDO are more efficient (much faster).
Please also look at this answer.
library(ncdf4)
library(raster)
nsat <- stack("air_temperature.nc")
## check the extent; it will be in the 0-360 degree form
extent(nsat)
## shift the coordinates
nsat1 <- rotate(nsat)
## check the result; it should now be in the -180/180 format you are looking for
extent(nsat1)
Hope this helps.

R - Plotting netcdf climate data

I have been trying to plot the following gridded NetCDF file: "air.1999.nc", found at the following website:
http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.html
I have tried the code below, based on answers I found here and elsewhere, but no luck.
library(ncdf);
temp.nc <- open.ncdf("air.1999.nc");
temp <- get.var.ncdf(temp.nc,"air");
temp.nc$dim$lon$vals -> lon
temp.nc$dim$lat$vals -> lat
lat <- rev(lat)
temp <- temp[nrow(temp):1,]
temp[temp==-32767] <- NA
temp <- t(temp)
image(lon,lat,temp)
library(maptools)
data(wrld_simpl)
plot(wrld_simpl, add = TRUE)
This code was modified from the one found here: The variable from a netcdf file comes out flipped
Does anyone have any ideas or experience with these types of NetCDF files? Thanks
In the question you linked, the whole part from lat <- rev(lat) to temp <- t(temp) was specific to that OP's dataset and has no universal value.
temp.nc <- open.ncdf("~/Downloads/air.1999.nc")
temp.nc
[1] "file ~/Downloads/air.1999.nc has 4 dimensions:"
[1] "lon Size: 144"
[1] "lat Size: 73"
[1] "level Size: 12"
[1] "time Size: 365"
[1] "------------------------"
[1] "file ~/Downloads/air.1999.nc has 2 variables:"
[1] "short air[lon,lat,level,time] Longname:Air temperature Missval:32767"
[1] "short head[level,time] Longname:Missing Missval:NA"
As you can see from this information, in your case the missing values are represented by the value 32767, so the following should be your first step:
temp <- get.var.ncdf(temp.nc,"air")
temp[temp == 32767] <- NA
Additionally, in your case your data has 4 dimensions, not just 2: longitude, latitude, level (which I assume represents the height) and time.
temp.nc$dim$lon$vals -> lon
temp.nc$dim$lat$vals -> lat
temp.nc$dim$time$vals -> time
temp.nc$dim$level$vals -> lev
If you have a look at lat, you see that the values are in reverse order (which image() will frown upon), so let's reverse them:
lat <- rev(lat)
temp <- temp[, ncol(temp):1, , ] #lat being our dimension number 2
Then the longitude is expressed from 0 to 360, which is not standard; it should be from -180 to 180, so let's change that:
lon <- lon - 180
So now let's plot the data for a level of 1000 (i.e. the first one) and the first date:
temp11 <- temp[ , , 1, 1] #Level is the third dimension and time the fourth.
image(lon,lat,temp11)
And then let's superimpose a world map:
library(maptools)
data(wrld_simpl)
plot(wrld_simpl,add=TRUE)
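For completeness, a minimal sketch of the same steps using the ncdf4 package instead of ncdf, mirroring the choices made above:
library(ncdf4)
library(maptools)
nc   <- nc_open("air.1999.nc")
lon  <- ncvar_get(nc, "lon") - 180           # 0..360 -> -180..180, same shift as above
lat  <- rev(ncvar_get(nc, "lat"))            # reverse latitudes for image()
temp <- ncvar_get(nc, "air")                 # lon x lat x level x time; missing values come back as NA
nc_close(nc)
temp11 <- temp[, dim(temp)[2]:1, 1, 1]       # flip the lat dimension, take first level and first day
image(lon, lat, temp11)
data(wrld_simpl)
plot(wrld_simpl, add = TRUE)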
