Error in number of decimal places with read.xlsx - r
I am trying to read in a dataset of coordinates in the British National Grid system, using the read.xlsx2 command from the xlsx package.
This is the data:
NORTHING EASTING TOC ELEVATION WELL ID
1194228.31 2254272.83 117.30 AA-1
1194227.81 2254193.90 114.91 AA-2
1194228.41 2254116.26 114.76 AA-3
1194229.37 2254039.57 112.81 AA-4
1194227.09 2253960.17 112.10 AA-5
and this is my code:
coordinates <- read.xlsx2("Coordinates.xlsx", sheetName = "Sheet1",
                          startRow = 1, endRow = 111, colIndex = c(1:4),
                          colClasses = c("character", "character", "numeric", "character"))
The problem is, my output looks like this:
NORTHING EASTING TOC.ELEVATION WELL.ID
1 1194228 2254273 117.30 AA-1
2 1194228 2254194 114.91 AA-2
3 1194228 2254116 114.76 AA-3
4 1194229 2254040 112.81 AA-4
5 1194227 2253960 112.10 AA-5
6 1194227 2253880 110.98 AA-6
The command appears to be rounding the horizontal and vertical coordinates, and while this is not a big issue, I'd like to be as exact as possible. Is there a workaround for this? I could not find anything relevant among the options to colClasses either.
This is an issue of how R is printing out the data (it is generally convenient not to give the full representation of floating-point data); you didn't actually lose any precision.
Illustrating with read.table rather than read.xlsx (we're going to end up in the same place): if I read the data with colClasses specifying "character", I do get all of the digits displayed, but I also end up with a rather useless data frame if I want to do anything sensible with the northings and eastings variables ...
dat <- read.table(header=TRUE,
text="
NORTHING EASTING TOC.ELEVATION WELL.ID
1194228.31 2254272.83 117.30 AA-1
1194227.81 2254193.90 114.91 AA-2
1194228.41 2254116.26 114.76 AA-3
1194229.37 2254039.57 112.81 AA-4
1194227.09 2253960.17 112.10 AA-5")
This is how R prints the data frame:
# NORTHING EASTING TOC.ELEVATION WELL.ID
# 1 1194228 2254273 117.30 AA-1
# 2 1194228 2254194 114.91 AA-2
# 3 1194228 2254116 114.76 AA-3
# 4 1194229 2254040 112.81 AA-4
# 5 1194227 2253960 112.10 AA-5
But it's easy to check that all of the precision is still there ...
print(dat$NORTHING,digits=12)
## [1] 1194228.31 1194227.81 1194228.41 1194229.37 1194227.09
You could also print(dat,digits=12) or set options(digits=12) globally ...
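For instance, a minimal sketch of both approaches, using the dat object built above:

print(dat, digits = 12)        # widen the precision for a single print
old <- options(digits = 12)    # or raise the session default ...
dat
options(old)                   # ... and restore it afterwards

options() returns the previous values invisibly, so capturing them makes the global change easy to undo.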
Related
Rewriting a for loop into an *apply call in R for georoute
I have a massive data.frame with starting and ending coordinates (latitude & longitude) and am using the georoute function from the taRifx.geo package to find out how far it is and how long it takes to drive from A to B. The data looks something like this (both latlon and latlon_end are of class character):

> LL[1:10,14:15]
                latlon           latlon_end
1  52.481466 13.317647  52.518811 13.413034
2  52.518811 13.413034  52.504182 13.318051
3  52.504182 13.318051  52.502236 13.305396
4  52.502236 13.305396  52.548096 13.355104
5  52.548096 13.355104  52.569865 13.410967
6  52.569865 13.410967   52.54505 13.419071
7   52.54505 13.419071  52.527736 13.378182
8  52.527736 13.378182  52.495678 13.343019
9  52.495678 13.343019  52.496712 13.341767
10 52.496712 13.341767   52.458631 13.32529

and here is the for loop that I have written for the purpose:

for(i in 38753:100000){
  DT[i,] = tryCatch(
    t(as.matrix(unlist(georoute(
        c(as.character(LL$latlon[i]), as.character(LL$latlon_end[i])),
        verbose = TRUE, returntype = c("time", "distance"))),
      nrow = 1, ncol = 2)),
    error = function(a) {"."})
}

The base function here, georoute, gives out a list of two elements, time and distance, which is why I have to unlist them before binding them all into a data frame. The tryCatch is there to deal with occasional errors from georoute; I have no idea how to handle those otherwise. I have tried a lot of methods, and only this one works for me, since georoute seems to take only one pair of latlon & latlon_end at a time, so I have to go row by row. However, with a few hundred thousand entries this is taking me days or even weeks to process all the data.

I know I should go into the package and understand the code behind it (link inserted), just so I know what would be a better fit for this purpose, yet the script is too advanced for my level and I don't even know what exactly in it I should be looking for. I guess I could use the lapply function for this, but I just can't make it work. Any help, tips, or ideas would be greatly appreciated!

PS. An update on what georoute returns:

> georoute(c(as.character(LL$latlon[1]), as.character(LL$latlon_end[1])),
           verbose = FALSE, returntype = c("time","distance"))
  distance time
1     9.03 1338
> georoute(c(as.character(LL$latlon[1:3]), as.character(LL$latlon_end[1:3])),
           verbose = FALSE, returntype = c("time","distance"))
  distance time
1   35.599 5275
> class(georoute(c(as.character(LL$latlon[1]), as.character(LL$latlon_end[1])),
                 verbose = FALSE, returntype = c("time","distance")))
[1] "data.frame"

and I think the distance and time returned are numeric, because summary() shows the quartiles, mean, median, etc.
Consider bypassing the package and using its data source directly, namely Bing's Calculate a Route API, which interfaces to http://dev.virtualearth.net for JSON feeds according to URL parameters. On closer read, the package's GitHub source code looks heavy with vector and matrix manipulation, which costs processing time; all that really needs to happen is that a JSON feed gets parsed for the distance and time data points.

The code below uses the jsonlite library to send the same parameters as the package, building a URL for each pair of lat/lon waypoints. As each JSON feed is imported, the needed data frame is extracted into a list. Do note: a Bing Maps API key is required, which you should already have per the package's requirements.

library(jsonlite)

BingMapsAPIkey <- "*****"

# iterate over the actual row indices so they line up with LL below
dfList <- lapply(38753:100000, function(i) {
  url <- paste0("http://dev.virtualearth.net/REST/v1/Routes?wayPoint.1=",
                gsub(" ", ",", LL$latlon[i]),
                "&wayPoint.2=", gsub(" ", ",", LL$latlon_end[i]),
                "&maxSolutions=1&optimize=time&routePathOutput=Points",
                "&distanceUnit=km&travelMode=Driving",
                "&key=", BingMapsAPIkey)
  tryCatch({
    jsondata <- fromJSON(url)
    return(jsondata$resourceSets$resources[[1]]$routeLegs[[1]]$routeSubLegs[[1]][
      c("travelDistance", "travelDuration")])
  }, error = function(e) return(data.frame(travelDistance = NA, travelDuration = NA)))
})

# ROW BIND DATAFRAME ELEMENTS IN LIST
geodf <- do.call(rbind, dfList)

# COLUMN BIND TO ORIGINAL DATAFRAME
df <- cbind(LL[38753:100000,], geodf)

Output (using the lat/lon data posted above):

#                 latlon           latlon_end travelDistance travelDuration
# 1  52.481466 13.317647  52.518811 13.413034          9.030           1338
# 2  52.518811 13.413034  52.504182 13.318051          8.148           1269
# 3  52.504182 13.318051  52.502236 13.305396          1.694            254
# 4  52.502236 13.305396  52.548096 13.355104         11.700            820
# 5  52.548096 13.355104  52.569865 13.410967          5.966            919
# 6  52.569865 13.410967   52.54505 13.419071          3.110            576
# 7   52.54505 13.419071  52.527736 13.378182          3.851            728
# 8  52.527736 13.378182  52.495678 13.343019          6.196           1051
# 9  52.495678 13.343019  52.496712 13.341767          0.986            277
# 10 52.496712 13.341767   52.458631 13.32529          6.129            947
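Since each iteration is a network round-trip, one further option (not part of the original answer) is to issue the requests in parallel with base R's parallel package. Below is a sketch under the same assumptions as above (the LL data frame and a valid API key); note that Bing's rate limits may throttle aggressive parallelism, and the fork-based mclapply is unavailable on Windows, where parLapply would be the substitute.

library(jsonlite)
library(parallel)   # ships with base R

BingMapsAPIkey <- "*****"

# the same request logic as above, wrapped in a function
fetch_route <- function(i) {
  url <- paste0("http://dev.virtualearth.net/REST/v1/Routes?wayPoint.1=",
                gsub(" ", ",", LL$latlon[i]),
                "&wayPoint.2=", gsub(" ", ",", LL$latlon_end[i]),
                "&maxSolutions=1&optimize=time&routePathOutput=Points",
                "&distanceUnit=km&travelMode=Driving",
                "&key=", BingMapsAPIkey)
  tryCatch(
    fromJSON(url)$resourceSets$resources[[1]]$routeLegs[[1]]$routeSubLegs[[1]][
      c("travelDistance", "travelDuration")],
    error = function(e) data.frame(travelDistance = NA, travelDuration = NA))
}

# a handful of workers, each mostly waiting on the network
dfList <- mclapply(38753:100000, fetch_route, mc.cores = 4)
geodf  <- do.call(rbind, dfList)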
Why does R add an "X" when renaming raster stack layers
I have a raster stack/brick in R containing 84 layers, and I am trying to name them according to year and month, from 199911 to 200610 (November 1999 to October 2006). However, for some reason R keeps adding an "X" onto the beginning of any names I give my layers. Does anyone know why this is happening and how to fix it? Here are some of the ways I've tried:

# Import raster brick
rast <- brick("rast.tif")
names(rast)[1:3]
[1] "MonthlyRainfall.1" "MonthlyRainfall.2" "MonthlyRainfall.3"

## Method 1
names(rast) <- paste0(rep(1999:2006, each=12), 1:12)[11:94]
names(rast)[1:3]
[1] "X199911" "X199912" "X20001"

## Method 2
# Create a vector of dates
dates <- format(seq(as.Date('1999/11/1'), as.Date('2006/10/1'), by='month'), '%Y%m')
dates[1:3]
[1] "199911" "199912" "200001"
# Set names
rast <- setNames(rast, dates)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"

## Method 3
names(rast) <- paste0("", dates)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"

## Method 4
substr(names(rast), 2, 7)[1:3]
[1] "199911" "199912" "200001"
names(rast) <- substr(names(rast), 2, 7)
names(rast)[1:3]
[1] "X199911" "X199912" "X200001"

To some extent I have been able to work around the problem by adding "X" to the beginning of some of my other data, but now it's reached the point where I can't do that any more. Any help would be greatly appreciated!
R won't allow a syntactic name to begin with a numeral, so it prepends a character ("X") to satisfy that restriction; layer names are validated the same way data.frame column names are.
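The mechanism is easy to see in base R: make.names(), which implements the syntactic-name rule (and which raster appears to apply to layer names), is what turns a leading digit into an "X" prefix. A minimal illustration, with no raster objects involved:

# a syntactically valid R name cannot begin with a digit,
# so make.names() prepends "X"
make.names(c("199911", "199912", "200001"))
[1] "X199911" "X199912" "X200001"

If the unprefixed labels are only needed for display or export, one workaround is to keep them in a separate character vector (for example the dates vector from Method 2) rather than fighting the name validation.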
Converting a timestamp (tweetCreatedAt) into a date object in R
I have data (df) in this format. I need to convert the timestamp (tweetCreatedAt) into a date object so that I can manipulate the data further.

            tweetCreatedAt                              comment_text
1 2014-05-17T00:00:49.000Z #truthout: India Elects Hard-Right Hindu
2 2014-05-17T00:00:49.000Z   Narendra Modi is welcome to visit US !

Any help? I have tried the following:

df[,1] <- lapply(df[,1], function(x) as.POSIXct(x, '%Y-%m-%dT%H:%M:%S'))

But now I'm getting the dates only and not the actual time.
Not sure if this is the problem, but it's a possible one. As I've mentioned in my comment, the elements of a column could be values or lists, due to the process that generated this dataset. Check this example:

# simplified example; comment.char="" keeps "#truthout"
# from being dropped as a comment
dt = read.table(text = "tweetCreatedAt comment_text
1 2014-05-17T00:00:49.000Z #truthout
2 2014-05-19T00:00:49.000Z Narendra", header = TRUE, comment.char = "")

dt$tweetCreatedAt = as.character(dt$tweetCreatedAt)

# data set looks like
dt
#             tweetCreatedAt comment_text
# 1 2014-05-17T00:00:49.000Z    #truthout
# 2 2014-05-19T00:00:49.000Z     Narendra

as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')
# [1] "2014-05-17 00:00:49 BST" "2014-05-19 00:00:49 BST"

# let's manually change the second element to a list
dt$tweetCreatedAt[2] = list(c("2014-05-19T00:00:49.000Z", "2014-05-20T00:00:49.000Z"))

# data set now looks like this
dt
#                                       tweetCreatedAt comment_text
# 1                           2014-05-17T00:00:49.000Z    #truthout
# 2 2014-05-19T00:00:49.000Z, 2014-05-20T00:00:49.000Z     Narendra

as.POSIXct(dt$tweetCreatedAt, format='%Y-%m-%dT%H:%M:%S')
# Error in as.POSIXct.default(dt$tweetCreatedAt, format = "%Y-%m-%dT%H:%M:%S") :
#   do not know how to convert 'dt$tweetCreatedAt' to class "POSIXct"
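Worth checking as well: in as.POSIXct() the second positional argument is tz, not format, so the call in the question passes the format string as a time zone and R falls back to its default formats, which parse only the "%Y-%m-%d" prefix of these strings and therefore yield a date with no time. If the column is a plain character vector, naming the argument should be enough. A minimal sketch (%OS also captures the fractional seconds):

x <- c("2014-05-17T00:00:49.000Z", "2014-05-19T00:00:49.000Z")

# name the format argument explicitly; the second positional slot is tz
as.POSIXct(x, format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC")
# [1] "2014-05-17 00:00:49 UTC" "2014-05-19 00:00:49 UTC"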
R - Plotting netcdf climate data
I have been trying to plot the following gridded netCDF file: "air.1999.nc", found at the following website: http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.html

I have tried the code below, based on answers I have found here and elsewhere, but no luck.

library(ncdf)
temp.nc <- open.ncdf("air.1999.nc")
temp <- get.var.ncdf(temp.nc, "air")
temp.nc$dim$lon$vals -> lon
temp.nc$dim$lat$vals -> lat
lat <- rev(lat)
temp <- temp[nrow(temp):1,]
temp[temp == -32767] <- NA
temp <- t(temp)
image(lon, lat, temp)
library(maptools)
data(wrld_simpl)
plot(wrld_simpl, add = TRUE)

This code was modified from the one found here: The variable from a netcdf file comes out flipped. Does anyone have any ideas or experience with this type of netCDF file? Thanks
In the question you linked, the whole part from lat <- rev(lat) to temp <- t(temp) was very specific to that particular OP's dataset and has no universal value.

temp.nc <- open.ncdf("~/Downloads/air.1999.nc")
temp.nc
[1] "file ~/Downloads/air.1999.nc has 4 dimensions:"
[1] "lon   Size: 144"
[1] "lat   Size: 73"
[1] "level   Size: 12"
[1] "time   Size: 365"
[1] "------------------------"
[1] "file ~/Downloads/air.1999.nc has 2 variables:"
[1] "short air[lon,lat,level,time]  Longname:Air temperature Missval:32767"
[1] "short head[level,time]  Longname:Missing Missval:NA"

As you can see from this information, in your case missing values are represented by the value 32767 (positive, not -32767 as in your code), so the following should be your first step:

temp <- get.var.ncdf(temp.nc, "air")
temp[temp == 32767] <- NA

Additionally, your data has 4 dimensions, not just 2: longitude, latitude, level (which I'm assuming represents height) and time.

temp.nc$dim$lon$vals -> lon
temp.nc$dim$lat$vals -> lat
temp.nc$dim$time$vals -> time
temp.nc$dim$level$vals -> lev

If you have a look at lat, you'll see that the values are in reverse order (which image will frown upon), so let's reverse them:

lat <- rev(lat)
temp <- temp[, ncol(temp):1, , ]   # lat is dimension number 2

Then the longitude is expressed from 0 to 360, which is not standard; it should go from -180 to 180, so let's change that:

lon <- lon - 180

So now let's plot the data for a level of 1000 (i.e. the first one) and the first date:

temp11 <- temp[, , 1, 1]   # level is the third dimension and time the fourth
image(lon, lat, temp11)

And then let's superimpose a world map:

library(maptools)
data(wrld_simpl)
plot(wrld_simpl, add = TRUE)
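For anyone reading this later: the ncdf package has since been removed from CRAN, and ncdf4 is its successor. A sketch of the same steps with ncdf4, assuming the same file; ncvar_get() applies the declared missing value and any scale/offset attributes automatically, so the manual NA step disappears:

library(ncdf4)

nc  <- nc_open("~/Downloads/air.1999.nc")
air <- ncvar_get(nc, "air")        # missing values already come back as NA
lon <- nc$dim$lon$vals
lat <- nc$dim$lat$vals

lat <- rev(lat)
air <- air[, ncol(air):1, , ]      # flip latitude (dimension 2), as above

image(lon - 180, lat, air[, , 1, 1])  # same 0..360 -> -180..180 shift; first level, first date
nc_close(nc)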
read.table and files with excess commas
I am trying to import a CSV file into R using the read.table command. I keep getting the error message "more columns than column names", even though I have set strip.white to TRUE. The program that makes the CSV files adds a large number of comma characters to the end of each line, which I think is the source of the extra columns.

read.table("filename.csv", sep=",", fill=T, header=TRUE, strip.white=T,
           as.is=T, row.names=NULL, quote="")

How can I get R to strip away the extraneous columns of commas from the header line and from the rest of the CSV file as it reads it into the R console? Also, numerous cells in the CSV file do not contain any data. Is it possible to get R to fill in these empty cells with "NA"?

The first two lines of the CSV file:

Document_Name,Sequence_Name,Track_Name,Type,Name,Sequence,Minimum,Min_(with_gaps),Maximum,Max_(with_gaps),Length,Length_(with_gaps),#_Intervals,Direction,Average_Quality,Coverage,modified_by,Polymorphism_Type,Strand-Bias,Strand-Bias_>50%_P-value,Strand-Bias_>65%_P-value,Variant_Frequency,Variant_Nucleotide(s),Variant_P-Value_(approximate),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Chr2_FT,Chr2,Chr2.bed,CDS,10000_ARHGAP15,GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCAATAACAAGTGGGCACTGAGAGAAAG,55916421,56019336,55916483,56019399,63,64,1,forward,,,User,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
You can use a combination of colClasses with "NULL" entries to "blank out" the comma-only columns (still also needing fill=TRUE):

read.table(text="1,2,3,4,5,6,7,8,,,,,,,,,,,,,,,,,,
9,9,9,9,9,9,9,9,,,,,,,,,,,,,,,,,",
           sep=",", fill=TRUE,
           colClasses=c(rep("numeric", 8), rep("NULL", 30)))
#------------------
  V1 V2 V3 V4 V5 V6 V7 V8
1  1  2  3  4  5  6  7  8
2  9  9  9  9  9  9  9  9
Warning message:
In read.table(text = "1,2,3,4,5,6,7,8,,,,,,,,,,,,,,,,,,\n9,9,9,9,9,9,9,9,,,,,,,,,,,,,,,,,",  :
  cols = 26 != length(data) = 38

I needed to add back in the missing linefeed at the end of the first line. (Yet another reason why you should edit questions rather than putting data examples in the comments.) There was an octothorpe in the header, which required that comment.char be set to "":

read.table(text="Document_Name,Sequence_Name,Track_Name,Type,Name,Sequence,Minimum,Min_(with_gaps),Maximum,Max_(with_gaps),Length,Length_(with_gaps),#_Intervals,Direction,Average_Quality,Coverage,modified_by,Polymorphism_Type,Strand-Bias,Strand-Bias_>50%_P-value,Strand-Bias_>65%_P-value,Variant_Frequency,Variant_Nucleotide(s),Variant_P-Value_(approximate),,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Chr2_FT,Chr2,Chr2.bed,CDS,10000_ARHGAP15,GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCAATAACAAGTGGGCACTGAGAGAAAG,55916421,56019336,55916483,56019399,63,64,1,forward,,,User,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,",
           header=TRUE, sep=",", comment.char="",
           colClasses=c(rep("character", 24), rep("NULL", 41)))
  Document_Name Sequence_Name Track_Name Type           Name
1       Chr2_FT          Chr2   Chr2.bed  CDS 10000_ARHGAP15
                                                          Sequence  Minimum Min_.with_gaps.  Maximum
1 GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCAATAACAAGTGGGCACTGAGAGAAAG 55916421        56019336 55916483
  Max_.with_gaps. Length Length_.with_gaps. X._Intervals Direction Average_Quality Coverage modified_by
1        56019399     63                 64            1   forward                                 User
  Polymorphism_Type Strand.Bias Strand.Bias_.50._P.value Strand.Bias_.65._P.value Variant_Frequency
1
  Variant_Nucleotide.s. Variant_P.Value_.approximate.
1

If you know what your colClasses will be, then you can get missing values to be NA in the numeric columns automatically. You could also use the na.strings setting to accomplish this, and you could also do some editing on the header to take out the illegal characters in the column names (I didn't think I needed to be the one to do that, though). The same call with na.strings="" added turns every empty cell into <NA>:

#------------------------------------------------------
  Document_Name Sequence_Name Track_Name Type           Name
1       Chr2_FT          Chr2   Chr2.bed  CDS 10000_ARHGAP15
                                                          Sequence  Minimum Min_.with_gaps.  Maximum
1 GAAAGAATCATTAACAGTTAGAAGTTGATG-AAGTTTCAATAACAAGTGGGCACTGAGAGAAAG 55916421        56019336 55916483
  Max_.with_gaps. Length Length_.with_gaps. X._Intervals Direction Average_Quality Coverage modified_by
1        56019399     63                 64            1   forward            <NA>     <NA>        User
  Polymorphism_Type Strand.Bias Strand.Bias_.50._P.value Strand.Bias_.65._P.value Variant_Frequency
1              <NA>        <NA>                     <NA>                     <NA>              <NA>
  Variant_Nucleotide.s. Variant_P.Value_.approximate.
1                  <NA>                          <NA>
I have been fiddling with the first two lines of your file, and the problem appears to be the # in one of your column names. read.table treats # as a comment character by default, so it reads in your header, ignores everything after # and returns 13 columns. You will be able to read in your file with read.table using the argument comment.char="". Incidentally, this is yet another reason why those who ask questions should include examples of the files/datasets they are working with.
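A minimal, self-contained reproduction of the symptom and the fix, using toy data rather than the OP's file:

# the default comment.char="#" drops everything from "#" on in the header,
# leaving fewer column names than data columns
read.table(text="a,b,#_Intervals,d,e
1,2,3,4,5", sep=",", header=TRUE)
# Error in read.table(...) : more columns than column names

# disabling comment processing reads all five columns;
# "#_Intervals" is sanitized to "X._Intervals" by make.names()
read.table(text="a,b,#_Intervals,d,e
1,2,3,4,5", sep=",", header=TRUE, comment.char="")
#   a b X._Intervals d e
# 1 1 2            3 4 5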