Rayshader: Rendered polygons don't align with the surface height

This is my first post, and I will try to describe my problem as precisely as I can without writing a novel. Since English is not my native language, please forgive any ambiguities or spelling errors.
I am currently trying out the rayshader package for R to visualise several layers and create a representation of georeferenced data from Berlin. My data consist of a DEM (5 m resolution) and a GeoJSON file containing a building layer with building heights, a water layer, and a tree layer with tree heights.
For now, only the DEM and the building layer are used.
I can render the DEM without any problems. The building polygons are also extruded and rendered, but their base height does not match the ground height that should be read from the elevation matrix created from the DEM.
I expected the polygons to be placed correctly and "stand" on the rendered surface, but most of them clip through the surface or are stuck inside the ground layer. My first assumption is that I am using the wrong function for my purpose: the creator of the package uses render_multipolygonz() for buildings, as can be seen here (timecode 12:49). I tried that, but it just renders an unextruded, continuous polygon on my base layer, underneath the ground.
Alternatively, I may be missing an argument of the render_polygons() function.
It is also quite possible that I am making a trivial calling or assignment error, since I am far from an expert in R. I am just starting my coding journey.
Here is my code:
#set wd to save location
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
#load libs
library(geojsonR)
library(rayshader)
library(raster)
library(sf)
library(rgdal)
library(dplyr)
library(rgl)
#load DEM
tempel_DOM <- raster("Daten/Tempelhof_Gelaende_5m_25833.tif")
#load buildings layer from GEOJSON
buildings_temp <-
st_read(dsn = "Daten/Tempelhof_GeoJSON_25833.geojson", layer = "polygon") %>%
st_transform(crs = st_crs(tempel_DOM)) %>%
filter(!is.na(bh))
#create elevation matrix from DEM
tempel_elmat <- raster_to_matrix(tempel_DOM)
#Tempelhof Render
tempel_elmat %>%
sphere_shade(texture = "imhof1") %>%
add_shadow(ray_shade(tempel_elmat), 0.5) %>%
plot_3d(
tempel_elmat,
zscale = 5,
fov = 0,
theta = 135,
zoom = 0.75,
phi = 45,
windowsize = c(1000, 800)
)
render_polygons(
buildings_temp,
extent = extent(tempel_DOM),
color = 'hotpink4',
parallel = TRUE,
data_column_top = 'bh',
clear_previous = TRUE
)
The structure of my buildings_temp using str() is:
> str(buildings_temp)
Classes ‘sf’ and 'data.frame': 625 obs. of 11 variables:
$ t : int 1 1 1 1 1 1 1 1 1 1 ...
$ t2 : int NA NA NA NA NA NA NA NA NA NA ...
$ t3 : int NA NA NA NA NA NA NA NA NA NA ...
$ t4 : int NA NA NA NA NA NA NA NA NA NA ...
$ t1 : int 1 4 1 1 1 1 1 1 1 1 ...
$ bh : num 20.9 2.7 20.5 20.1 19.3 20.9 19.7 19.8 19.6 17.8 ...
$ t5 : int NA NA NA NA NA NA NA NA NA NA ...
$ t6 : int NA NA NA NA NA NA NA NA NA NA ...
$ th : num NA NA NA NA NA NA NA NA NA NA ...
$ id : int 261 262 263 264 265 266 267 268 269 270 ...
$ geometry:sfc_MULTIPOLYGON of length 625; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:12, 1:2] 393189 393191 393188 393182 393177 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
..- attr(*, "names")= chr [1:10] "t" "t2" "t3" "t4" ...
Thanks in advance for any help.
Cheers WiTell
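A minimal base-R sketch of the idea described above: reading the ground height under each building from the elevation matrix, so that each building's base can sit on the terrain. Everything here is an assumption for illustration. This includes the hypothetical helper ground_height_at(), the use of building centroids (cx, cy), the nearest-cell lookup, and the matrix orientation (rows running west to east, columns running north to south). That orientation is how raster_to_matrix() appears to lay out the data, but verify it with your own DEM. The resulting base/top columns could then be tried with render_polygons()' data_column_bottom and data_column_top arguments. Whether they also need rescaling by the zscale used in plot_3d() is worth checking in the rayshader documentation.

```r
# Hypothetical helper: nearest-cell lookup of ground height in an elevation matrix.
# Assumes elmat[i, j] has i = DEM column (west -> east), j = DEM row (north -> south),
# and ext = c(xmin, xmax, ymin, ymax) in the same CRS as the coordinates.
ground_height_at <- function(elmat, ext, x, y) {
  nx <- nrow(elmat)
  ny <- ncol(elmat)
  # Convert map coordinates to 1-based cell indices
  i <- floor((x - ext[1]) / (ext[2] - ext[1]) * nx) + 1
  j <- floor((ext[4] - y) / (ext[4] - ext[3]) * ny) + 1
  # Points exactly on the east/south edge would fall one cell outside; clamp them
  i <- pmin(pmax(i, 1), nx)
  j <- pmin(pmax(j, 1), ny)
  elmat[cbind(i, j)]
}

# Toy example: a 3 x 2 matrix covering a 30 m x 20 m area (10 m cells)
elmat <- matrix(c(1, 2, 3,   # northern DEM row
                  4, 5, 6),  # southern DEM row
                nrow = 3)
ext <- c(0, 30, 0, 20)

# Hypothetical building centroids with heights like the bh column in the question
buildings <- data.frame(cx = c(5, 25), cy = c(15, 5), bh = c(20.9, 2.7))
buildings$base <- ground_height_at(elmat, ext, buildings$cx, buildings$cy)
buildings$top  <- buildings$base + buildings$bh
print(buildings)
```

Nearest-cell lookup is the simplest choice. For large footprints on sloping ground, taking the minimum height over the footprint would keep walls from floating above the terrain.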

Related

Reading and Writing CSV file in R

I want to update specific rows in a CSV file that has dates with a data frame that I created in R.
01/04/20, Asset, Position, Price, Mark-to-Market
0, PORTFOLIO, NA, NA, 1000000
1, CASH, NA, NA, 1000000
02/04/20, Asset, Position, Price, Mark-to-Market, Position prior, Transaction, TC spent
0, PORTFOLIO, NA, NA, 999231, NA, NA, NA
1, CASH, NA, NA, 509866, NA, NA, NA
2, FUTURES, 500, 2516, 1258250, 0, 500, 629
3, VXc1, -5931, 47, -279795, 0, -5931, 140
, Total, Buys:, 1, Sells:, 1, TC spent:, 769
There are approximately 1000+ rows.
However, I am unable to read this CSV file using the following code.
Can anyone help me with this?
df4 <- read.csv("filename.csv")
Further, I need to add two columns (2 and 3) from df3, defined below, to the rows of df4 that have dates (except the first row). Can anyone help me with this as well?
The code to get df3 is as follows. However, I don't know how to add the rows to df4 selectively in R.
df1 <- read.csv("filename1.csv")
df2 <- read.csv("filename2.csv")
df3 <- cbind(df2[,c(1)], df1[,c(3)], df2[,c(3)])
I'm not sure what you need for your second question, but to address the first:
txt <- readLines("filename.csv")
# Warning in readLines("filename.csv") :
# incomplete final line found on 'filename.csv'
multidf <- by(txt, cumsum(!grepl("\\S", txt)),
FUN = function(x) read.csv(text = x, strip.white = TRUE))
multidf
# cumsum(!grepl("\\S", txt)): 0
# X01.04.20 Asset Position Price Mark.to.Market
# 1 0 PORTFOLIO NA NA 1000000
# 2 1 CASH NA NA 1000000
# ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# cumsum(!grepl("\\S", txt)): 1
# X02.04.20 Asset Position Price Mark.to.Market Position.prior Transaction TC.spent
# 1 0 PORTFOLIO <NA> NA 999231 NA <NA> NA
# 2 1 CASH <NA> NA 509866 NA <NA> NA
# 3 2 FUTURES 500 2516 1258250 0 500 629
# 4 3 VXc1 -5931 47 -279795 0 -5931 140
# 5 NA Total Buys: 1 Sells: 1 TC spent: 769
The multidf object is technically a "by"-class object, but that's really just a glorified list:
str(multidf)
# List of 2
# $ 0:'data.frame': 2 obs. of 5 variables:
# ..$ X01.04.20 : int [1:2] 0 1
# ..$ Asset : chr [1:2] "PORTFOLIO" "CASH"
# ..$ Position : logi [1:2] NA NA
# ..$ Price : logi [1:2] NA NA
# ..$ Mark.to.Market: int [1:2] 1000000 1000000
# $ 1:'data.frame': 5 obs. of 8 variables:
# ..$ X02.04.20 : int [1:5] 0 1 2 3 NA
# ..$ Asset : chr [1:5] "PORTFOLIO" "CASH" "FUTURES" "VXc1" ...
# ..$ Position : chr [1:5] NA NA "500" "-5931" ...
# ..$ Price : int [1:5] NA NA 2516 47 1
# ..$ Mark.to.Market: chr [1:5] "999231" "509866" "1258250" "-279795" ...
# ..$ Position.prior: int [1:5] NA NA 0 0 1
# ..$ Transaction : chr [1:5] NA NA "500" "-5931" ...
# ..$ TC.spent : int [1:5] NA NA 629 140 769
From here, you can keep it as a list (can be good, see https://stackoverflow.com/a/24376207/3358227) or try to combine into a single frame (the same link has info for that, too).
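The same idea can also be written with split() and lapply(), which returns a plain named list directly instead of a "by" object. This is a sketch: txt below stands in for readLines("filename.csv") and reuses the first rows of the sample in the question. It assumes that blocks are separated by blank lines, as the cumsum(!grepl("\\S", txt)) grouping implies. check.names = FALSE keeps headers such as "01/04/20" and "Mark-to-Market" unmangled, and each block is then named after its date.

```r
# Stand-in for txt <- readLines("filename.csv"); blocks separated by blank lines
txt <- c(
  "01/04/20, Asset, Position, Price, Mark-to-Market",
  "0, PORTFOLIO, NA, NA, 1000000",
  "1, CASH, NA, NA, 1000000",
  "",
  "02/04/20, Asset, Position, Price, Mark-to-Market, Position prior, Transaction, TC spent",
  "0, PORTFOLIO, NA, NA, 999231, NA, NA, NA",
  "1, CASH, NA, NA, 509866, NA, NA, NA",
  "2, FUTURES, 500, 2516, 1258250, 0, 500, 629"
)

# Each blank line starts a new block
block_id <- cumsum(!grepl("\\S", txt))
blocks <- split(txt, block_id)

# Parse each block separately; leading blank lines are skipped by read.csv
frames <- lapply(blocks, function(x)
  read.csv(text = x, strip.white = TRUE, check.names = FALSE,
           stringsAsFactors = FALSE))

# Name each data frame after the date in its first header cell
names(frames) <- vapply(frames, function(d) names(d)[1], character(1),
                        USE.NAMES = FALSE)
str(frames)
```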

How to convert in Date format the columns of a particular excel file?

I have an Excel file with 77 columns of different lengths (43 of them entirely NA), 12 of which are dates. Ideally, I want to import the dataset into R with the date columns in Date format and the other columns in numeric format. There is a lot of material on Stack Overflow and I tried all the options, but nothing works.
The first option would be to do it directly when reading the Excel file:
dataset <- read_xlsx("Data.xlsx", col_types = "numeric") # everything is numeric, but the date columns always come out as serial numbers like "36164"
#I also tried something like this:
dataset <- read_xlsx("Data.xlsx", col_types = c("date", rep("numeric", n))) #where "n" stands for all the columns with numbers I have but it did not work
I can import the data with the incorrect date columns. After some cleaning (removing the NA columns) I get a tbl with columns of different lengths. I tried the following code to transform the incorrect date columns into Date format:
dataset <- janitor::remove_empty(dataset, which = "cols") #remove NA columns
dataset <- dataset[-c(1),] #remove the first row of all columns
# Now using this command I could transform each incorrect date column into a date format:
date <- as.Date(as.numeric(dataset$column1), origin = "1899-12-30")
# I would like to do it for all the date columns in one shot but when I try to do it in this way
as.Date(as.numeric(dataset[,c(1,3,5,7,14,16,18,20,21,23,25,32)]), origin = "1899-12-30")
# I get an error, probably because the columns have different length
# the error is: Error in as.Date(as.numeric(var_dataset[, c(1, 3, 5, 7, 14, 16, 18, 20, : 'list' object cannot be coerced to type 'double'
# unlisting the object doesn't solve the problem
I am aware that data are missing to reproduce my problem, but in the first scenario I don't know how to approximate my rather big Excel file, and in the second case I don't know how to create a tbl with many columns of different lengths without wasting a lot of time. Sorry.
Do you have any solution, either for importing directly from Excel or for working with the data frame?
Thanks so much
I attach here the structure of my dataset:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5500 obs. of 77 variables:
$ Name...1 : chr "Code" "36164" "36165" "36166" ...
$ VSTOXX VOLATILITY INDEX - PRICE INDEX : chr "VSTOXXI(PI)" "18.2" "29.69" "25.17" ...
$ ...3 : logi NA NA NA NA NA NA ...
$ ...4 : logi NA NA NA NA NA NA ...
$ ...5 : logi NA NA NA NA NA NA ...
$ ...6 : logi NA NA NA NA NA NA ...
$ Name...7 : chr "Code" "36799" "36830" "36860" ...
$ EM COMPOSITE INDICATOR OF SOVEREIGN STRESS: GDP WEIGHTS NADJ : chr "EMEBSCGWR" "7.8255999999999992E-2" "8.9886999999999995E-2" "8.0714999999999995E-2" ...
$ ...9 : logi NA NA NA NA NA NA ...
$ Name...10 : chr "Code" "36168" "36175" "36182" ...
$ CISS BOND MKT: GOV & NFC VOLATILITY - ECONOMIC SERIES : chr "EMCIBMG" "4.4651999999999997E-2" "6.6535999999999998E-2" "4.9789E-2" ...
$ ...12 : logi NA NA NA NA NA NA ...
$ Name...13 : chr "Code" "36168" "36175" "36182" ...
$ CISS MONEY MKT: 3M RATE+ VOLATILITY - ECONOMIC SERIES : chr "EMECM3E" "5.7435999999999994E-2" "7.463199999999999E-2" "7.2263999999999995E-2" ...
$ CISS FX MKT: EUR VOLATILITY - ECONOMIC SERIES : chr "EMECFEM" "7.2139999999999996E-2" "8.6049E-2" "4.5948999999999997E-2" ...
$ CISS FIN INTERM: BANK+ VOLATILITY - ECONOMIC SERIES : chr "EMCIFIN" "4.5384999999999995E-2" "0.11820399999999999" "0.11516499999999999" ...
$ CISS NF EQUITY: VOLATILITY - ECONOMIC SERIES : chr "EMCIEMN" "7.7453999999999995E-2" "0.12733" "0.11918899999999999" ...
$ CISS: CROSS SUBINDEXCORRELATION - ECONOMIC SERIES : chr "EMCICRO" "-0.21210999999999999" "-0.29791000000000001" "-0.2369" ...
$ SYSTEMIC STRESS COMPINDICATOR - ECONOMIC SERIES : chr "EMCISSI" "8.4954000000000002E-2" "0.174844" "0.16546" ...
$ ...20 : logi NA NA NA NA NA NA ...
$ ...21 : logi NA NA NA NA NA NA ...
$ ...22 : logi NA NA NA NA NA NA ...
$ ...23 : logi NA NA NA NA NA NA ...
$ ...24 : logi NA NA NA NA NA NA ...
$ ...25 : logi NA NA NA NA NA NA ...
$ Name...26 : chr "Code" "33253" "33284" "33312" ...
$ Z8 IPI: MFG., VOLUME INDEX OF PRODUCTION, 2015=100 (WDA) VOLA: chr "Z8ES493KG" "81" "79.7" "79.400000000000006" ...
$ ...28 : logi NA NA NA NA NA NA ...
$ ...29 : logi NA NA NA NA NA NA ...
$ ...30 : logi NA NA NA NA NA NA ...
$ ...31 : logi NA NA NA NA NA NA ...
$ ...32 : logi NA NA NA NA NA NA ...
$ ...33 : logi NA NA NA NA NA NA ...
$ ...34 : logi NA NA NA NA NA NA ...
$ Name...35 : chr "Code" "35779" "35810" "35841" ...
$ EH HICP: ALL-ITEMS NADJ : chr "EHES795WR" "1.7" "1.6" "1.6" ...
$ ...37 : logi NA NA NA NA NA NA ...
$ ...38 : logi NA NA NA NA NA NA ...
$ Name...39 : chr "Code" "35110" "35139" "35170" ...
$ EH HICP: ALL-ITEMS (%MOM) NADJ : chr "EHESPQ93R" "0.4" "0.4" "0.3" ...
$ ...41 : logi NA NA NA NA NA NA ...
$ ...42 : logi NA NA NA NA NA NA ...
$ ...43 : logi NA NA NA NA NA NA ...
$ Name...44 : chr "Code" "35445" "35476" "35504" ...
$ EH HICP: ALL-ITEMS HICP (%YOY) NADJ : chr "EHESAKZER" "2.2000000000000002" "2" "1.7" ...
$ ...46 : logi NA NA NA NA NA NA ...
$ ...47 : logi NA NA NA NA NA NA ...
$ ...48 : logi NA NA NA NA NA NA ...
$ ...49 : logi NA NA NA NA NA NA ...
$ Name...50 : chr "Code" "36206" "36234" "36265" ...
$ EM EUROSYSTEM: BASE MONEY CURN : chr "EMEBSMYBA" "426.64374199999997" "430.51499999999999" "432.34064499999999" ...
$ ...52 : logi NA NA NA NA NA NA ...
$ ...53 : logi NA NA NA NA NA NA ...
$ ...54 : logi NA NA NA NA NA NA ...
$ ...55 : logi NA NA NA NA NA NA ...
$ Name...56 : chr "Code" "35703" "35734" "35762" ...
$ EM EUROSYSTEM: TOTAL ASSETS/LIABILITIES (EP) CURN : chr "EMECBSALA" "710257.53500000003" "711193.47100000002" "714957.58900000004" ...
$ ...58 : logi NA NA NA NA NA NA ...
$ ...59 : logi NA NA NA NA NA NA ...
$ ...60 : logi NA NA NA NA NA NA ...
$ ...61 : logi NA NA NA NA NA NA ...
$ ...62 : logi NA NA NA NA NA NA ...
$ ...63 : logi NA NA NA NA NA NA ...
$ Name...64 : chr "Code" "41548" "41579" "41609" ...
$ TR EU FWD INFL-LKD SWAP 10YF20Y - MIDDLE RATE : chr "TREFSTT" NA NA NA ...
$ TR EU FWD INFL-LKD SWAP 10YF10Y - MIDDLE RATE : chr "TREFS1T" NA NA NA ...
$ TR EU FWD INFL-LKD SWAP 2YF2Y - MIDDLE RATE : chr "TREFS22" "1.5158" "1.4669000000000001" "1.4715" ...
$ TR EU FWD INFL-LKD SWAP 1YF1Y - MIDDLE RATE : chr "TREFS11" "1.4509000000000001" "1.2338" "1.1225000000000001" ...
$ TR EU FWD INFL-LKD SWAP 2YF3Y - MIDDLE RATE : chr "TREFS23" "1.5906000000000002" "1.5453000000000001" "1.5283000000000002" ...
$ TR EU FWD INFL-LKD SWAP 5YF10Y - MIDDLE RATE : chr "TREFS5T" "2.3516000000000004" "2.3323" "2.3070000000000004" ...
$ ...71 : logi NA NA NA NA NA NA ...
$ ...72 : logi NA NA NA NA NA NA ...
$ ...73 : logi NA NA NA NA NA NA ...
$ ...74 : logi NA NA NA NA NA NA ...
$ ...75 : logi NA NA NA NA NA NA ...
$ Name...76 : chr "Code" "41255" "41286" "41317" ...
$ TR EU FWD INFL-LKD SWAP 5YF5Y - MIDDLE RATE : chr "TREFS55" "2.2027000000000001" "2.2637" "2.383" ...
You have to specify the col_types correctly in the read_excel (or read_xlsx) command. For example:
dataset <- read_xlsx("Data.xlsx",
col_types=c("numeric","date","numeric","date","numeric", "date", ...))
Edit: Finally after much interrogation, the problem is that your data starts in row 3, not 2. So skip the first row (skip=1) and try again.
dataset <- read_xlsx("Data.xlsx", skip=1)
edit: While this will most likely solve the error you're getting, I agree with Edward's advice to use readxl::read_excel which should preserve the dates.
The problem with
as.Date(as.numeric(dataset[,c(1,3,5,7,14,16,18,20,21,23,25,32)]), origin = "1899-12-30")
is that you apply as.numeric to a tibble, which internally is a list. Instead, convert each selected column separately:
dplyr::mutate_at(
dataset,
c(1,3,5,7,14,16,18,20,21,23,25,32),
~ as.Date(as.numeric(.), origin = "1899-12-30") # serial number -> numeric -> Date
)
You say the columns have a different length but that's not possible in R's table-like structures (tibble, data.frame, data.table).
Lesson: Always be aware of which data type you're working with, e.g. by running str(dataset). as.numeric does not work on tables; it needs to be applied to specific columns, e.g. using mutate.
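If you would rather avoid dplyr, the same column-wise conversion can be sketched in base R with lapply(). The toy data frame below reuses a few values from the str() output above. The helper excel_to_date() and the column positions are illustrative. The origin "1899-12-30" assumes the workbook uses Excel's default 1900 date system.

```r
# Convert Excel serial day numbers (stored as text or numbers) to Date
excel_to_date <- function(x) as.Date(as.numeric(x), origin = "1899-12-30")

# Toy stand-in for the imported sheet (values taken from the str() output)
dataset <- data.frame(
  Name...1 = c("36164", "36165"),
  VSTOXX   = c("18.2", "29.69"),
  Name...7 = c("36799", NA),
  stringsAsFactors = FALSE
)

date_cols <- c(1, 3)  # positions of the date columns
# Assigning a list into dataset[cols] converts each column individually,
# instead of calling as.numeric() on the whole (list-like) data frame
dataset[date_cols] <- lapply(dataset[date_cols], excel_to_date)
dataset[-date_cols] <- lapply(dataset[-date_cols], as.numeric)
str(dataset)
```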

dplyr Mutate Creating Matrix Instead of Vector

I am creating a new column that looks at conditions in my data frame and alerts me whether an issue needs to be investigated or monitored. The code to add the column looks like this:
library(dplyr)
df %>%
mutate("Status" =
ifelse(apply(.[2:7], 1, sum) > 0 & .[8] > 0, "Investigate",
"Monitor"
)
)
If I run class(df$Status) on this newly generated column, the class is listed as 'matrix'. What? Why isn't it 'character'?
If I look at the structure of my data frame, there's some oddity that may be the key, but I don't understand it. Notice that the first columns listed simply look like integers, but the third column listed, which is the same data, has all this 'attr' information. What is going on?
$ 2017-08 : int NA 1 NA 1 1 2 NA NA NA NA ...
$ 2017-09 : int NA NA 1 NA NA NA NA NA NA NA ...
$ 2017-10 : int NA NA NA NA NA NA 1 NA NA NA ...
- attr(*, "vars")= chr "Material"
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 34
..$ : int 0
..$ : int 1
..$ : int 2
..$ : int 3
..$ : int 4
...continued...
- attr(*, "group_sizes")= int 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "biggest_group_size")= int 1
- attr(*, "labels")='data.frame': 34 obs. of 1 variable:
I grouped variables earlier and sometimes ungrouping magically helps. In addition I often have to convert tibbles back to data frames to get other routines to work in my code. This may or may not be related.
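A likely cause, sketched in base R with a small made-up data frame, is that .[8] (single brackets) returns a one-column data frame. Comparing a data frame with > returns a logical matrix, and combining that matrix with & keeps its dimensions. ifelse() then returns a result shaped like its test, so Status ends up as a one-column character matrix. Extracting the column as a vector with [[ ]] (or referring to it by name) avoids this. rowSums() is used here as a vectorised stand-in for apply(..., 1, sum).

```r
df <- data.frame(a = c(1, 0, 0), b = c(0, 0, 2), flag = c(1, 1, 0))

# df[3] is a one-column data frame, so df[3] > 0 is a 3 x 1 logical matrix
status_matrix <- ifelse(rowSums(df[1:2]) > 0 & df[3] > 0, "Investigate", "Monitor")

# df[[3]] is a plain numeric vector, so the result is a plain character vector
status_vector <- ifelse(rowSums(df[1:2]) > 0 & df[[3]] > 0, "Investigate", "Monitor")

is.matrix(status_matrix)  # TRUE
class(status_vector)      # "character"
```

In the dplyr pipeline, the same change means writing .[[8]] (or the column name) instead of .[8].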

Linear regression of rectangular table against one set of values

I have a rectangular table with three variables: country, year and inflation. I already have all the descriptive statistics I need; now I want to do some analytics and figured I should run a linear regression against a target country. The best idea I had was to create a new variable called inflation.in.country.x and loop through the inflation of x in this new column, but that seems like an unclean solution.
How do I get a linear regression from a rectangular data table? The structure is like this:
> dat %>% str
'data.frame': 1196 obs. of 3 variables:
$ Country.Name: Factor w/ 31 levels "Albania","Armenia",..: 9 8 10 11 12 14 15 16 17 19 ...
$ year : chr "1967" "1967" "1967" "1967" ...
$ inflation : num 1.238 8.328 3.818 0.702 1.467 ...
I want to take Armenia's inflation as the dependent variable and Albania's as the independent variable to get a linear regression. Is this possible without transforming the data, while keeping the years aligned?
One way is to spread your data table using Country.Name as key:
library(tidyr) # provides spread()
dat.spread <- dat %>% spread(key = "Country.Name", value = "inflation")
dat.spread %>% str
'data.frame': 50 obs. of 31 variables:
$ year : chr "1967" "1968" "1969" "1970" ...
$ Albania : num NA NA NA NA NA NA NA NA NA NA ...
$ Armenia : num NA NA NA NA NA NA NA NA NA NA ...
$ Brazil : num NA NA NA NA NA NA NA NA NA NA ...
[...]
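But that forces you to transform the data, which may seem undesirable. Afterwards, the regression you asked for is a plain lm() call. Putting cbind() on the left-hand side fits one regression per listed country, each with Albania as the predictor:
lm(Armenia ~ Albania, data = dat.spread)
lm(cbind(Armenia, Brazil, Colombia) ~ Albania, data = dat.spread) # extend with further countries as needed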
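If you would rather not reshape the whole table, a minimal base-R alternative is to pull out the two countries and merge() them by year. That way, each Armenia value is paired with the Albania value of the same year before calling lm(). The small data frame here is made up purely to show the mechanics. The rows are deliberately listed in different year orders to show that merge() realigns them.

```r
dat <- data.frame(
  Country.Name = factor(c("Albania", "Armenia", "Albania", "Armenia", "Albania", "Armenia")),
  year = c("1995", "1997", "1996", "1996", "1997", "1995"),
  inflation = c(1, 7, 2, 5, 3, 3)
)

arm <- subset(dat, Country.Name == "Armenia", select = c(year, inflation))
alb <- subset(dat, Country.Name == "Albania", select = c(year, inflation))

# merge() matches rows on year, so years stay aligned regardless of row order
paired <- merge(arm, alb, by = "year", suffixes = c(".Armenia", ".Albania"))

fit <- lm(inflation.Armenia ~ inflation.Albania, data = paired)
coef(fit)
```

Years present for only one of the two countries are dropped by merge(), which is usually what you want for a paired regression.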

Why is my output all NA when I remove specific rows?

I have uploaded my data here:
https://gist.github.com/anonymous/0bc36ec5f46757de7c2c
I load it in R using the following command:
df <- read.delim("path to the data", header=TRUE, sep="\t", fill=TRUE, row.names=1, stringsAsFactors=FALSE, na.strings='')
Then I check a specific column to see how many + it contains, like this:
length(which(df$Potential.contaminant == "+"))
which returns 9 in this case. Then I try to remove all rows that have a + in that column using the following command:
Newdf <- df[df$Potential.contaminant != "+", ]
The output is all NA. What am I doing wrong here?
As @akrun suggested, I have tried many different ways to do it, but without success:
df[!grepl("[+]", df$Potential.contaminant),]
df[ is.na(df$Potential.contaminant),]
subset(df, Potential.contaminant != "+")
df[-(which(df$Potential.contaminant == "+")),]
None of the above commands solved it. One idea was that Potential.contaminant contains NA and that this is the reason, so I replaced all NA with zero using
df[c("Potential.contaminant")][is.na(df[c("Potential.contaminant")])] <- 0
but the result is still the same.
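One general pitfall worth ruling out, sketched with a tiny made-up data frame: when the column used in a logical index contains NA, df[df$col != "+", ] returns a row of NAs for every NA in that column. This happens because an NA index selects an "unknown" row. Using %in%, which never returns NA, keeps those rows with their actual contents, while subset() silently drops them. In this particular file the answers below suggest the rows are genuinely empty, so this may not be the whole story.

```r
df <- data.frame(id = 1:4,
                 Potential.contaminant = c("+", NA, "", "+"),
                 stringsAsFactors = FALSE)

# NA != "+" is NA, and an NA row index yields a row filled with NA
bad <- df[df$Potential.contaminant != "+", ]

# %in% returns FALSE (not NA) for missing values, so the NA row is kept intact
good <- df[!(df$Potential.contaminant %in% "+"), ]

# subset() treats NA conditions as FALSE and drops those rows entirely
dropped <- subset(df, Potential.contaminant != "+")

bad$id      # NA 3
good$id     # 2 3
dropped$id  # 3
```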
I copy-pasted your gist into a file c:/input.txt and then used your code:
df <- read.delim("c:/input.txt", header=TRUE, sep="\t", fill=TRUE, row.names=1, stringsAsFactors=FALSE, na.strings='')
Now:
> str(df)
'data.frame': 21 obs. of 11 variables:
$ Intensityhenya : int 0 NA NA NA NA 0 0 0 0 0 ...
$ Only.identified.by.site: chr "+" NA NA NA ...
$ Reverse : logi NA NA NA NA NA NA ...
$ Potential.contaminant : chr "+" NA NA NA ...
$ id : int 0 NA NA NA NA 1 2 3 4 5 ...
$ IDs.1 : chr "16182;22925;28117;28534;28538;29309;36387;36889;42536;49151;49833;52792;54591;54592" NA NA NA ...
$ razor : chr "True;True;False;False;False;False;False;True;False;False;False;False;False;False" NA NA NA ...
$ Mod.IDs : chr "16828;23798;29178;29603;29607;30404;38270;38271;38793;44633;51496;52211;55280;57146;57147;57148;57149" NA NA NA ...
$ Evidence.IDs : chr "694702;694703;694704;1017531;1017532;1017533;1017534;1017535;1017536;1017537;1017538;1017539;1017540;1017541;1017542;1017543;10"| __truncated__ NA NA NA ...
$ GHSIDs : chr NA NA NA NA ...
$ BestGSFD : chr NA NA NA NA ...
If I try to subset:
> df2 <- df[is.na(df$Potential.contaminant),]
> str(df2)
'data.frame': 12 obs. of 11 variables:
$ Intensityhenya : int NA NA NA NA NA NA NA NA NA NA ...
$ Only.identified.by.site: chr NA NA NA NA ...
$ Reverse : logi NA NA NA NA NA NA ...
$ Potential.contaminant : chr NA NA NA NA ...
$ id : int NA NA NA NA NA NA NA NA NA NA ...
$ IDs.1 : chr NA NA NA NA ...
$ razor : chr NA NA NA NA ...
$ Mod.IDs : chr NA NA NA NA ...
$ Evidence.IDs : chr NA NA NA NA ...
$ GHSIDs : chr NA NA NA NA ...
$ BestGSFD : chr NA NA NA NA ...
But your data are so messy that they're nearly impossible to inspect, so let's try something else to get a glance at them.
> colnames(df)
[1] "Intensityhenya" "Only.identified.by.site" "Reverse" "Potential.contaminant" "id" "IDs.1" "razor" "Mod.IDs"
[9] "Evidence.IDs" "GHSIDs" "BestGSFD"
Your header is hard to follow, so let's have a look at it:
IDs Intensityhenya Only identified by site Reverse Potential contaminant id IDs razor Mod.IDs Evidence IDs GHSIDs BestGSFD
Along with a line of data, where long values are cut to give a glance:
CON__A2A4G1 0 + + 0 16182;[...];4592 True;[..];False 16828;[...];57149 694702;[...];2208697;
208698;[...];2441826
3;2433194;[...];4682766
I've stripped extraneous numbers where possible, keeping the tabs and newlines.
I hope you can see how and why this gets in the way of a proper analysis of your data. Check and sanitize your input data before trying to load it into R again.
For illustration purposes, here is your gist with ellipses, and %T% in place of tabs:
IDs%T%Intensityhenya%T%Only identified by site%T%Reverse%T%Potential contaminant%T%id%T%IDs%T%razor%T%Mod.IDs%T%Evidence IDs%T%GHSIDs%T%BestGSFD
CON__A2A4G1%T%0%T%+%T%%T%+%T%0%T%1618[...]4592%T%Tru[...]alse%T%1682[...]7149%T%69470[...]208697;%T%%T%
20869[...]441826%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
[...]20%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
00[...]%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
1271[...]682766%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
CON__A2A5Y0%T%0%T%%T%%T%+%T%1%T%443[...]5777%T%Fals[...]rue%T%464[...]8377%T%21071[...]489947%T%40503[...]780178%T%40505[...]780175
CON__A2AB72%T%0%T%%T%%T%+%T%2%T%443[...]0447%T%Tru[...]alse%T%464[...]2842%T%21070[...]232341%T%40502[...]250729%T%40502[...]250728
CON__ENSEMBL:ENSBTAP00000014147%T%0%T%%T%%T%+%T%3%T%53270%T%TRUE%T%55779%T%238286[...]382871%T%457377[...]573778%T%4573776
CON__ENSEMBL:ENSBTAP00000024146%T%0%T%%T%%T%+%T%4%T%186[...]5835%T%Tru[...]rue%T%194[...]8438%T%8382[...]492132%T%15455[...]783465%T%15455[...]783465
CON__ENSEMBL:ENSBTAP00000024466;CON__ENSEMBL:ENSBTAP00000024462%T%0%T%%T%%T%+%T%5%T%939[...]5179%T%Tru[...]rue%T%978[...]7757%T%41149[...]468480%T%78212[...]739209%T%78217[...]739209
CON__ENSEMBL:ENSBTAP00000025008%T%0%T%+%T%%T%+%T%6%T%1564[...]8580%T%Fals[...]alse%T%1627[...]9651%T%66672[...]269215%T%125151[...]439696%T%125151[...]439691
CON__ENSEMBL:ENSBTAP00000038253%T%0%T%%T%%T%+%T%7%T%120[...]5703%T%Fals[...]alse%T%125[...]8300%T%5326[...]25602%T%%T%
;125602[...]178%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
1[...]483384%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
22838[...]23247%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
;123247[...]411%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
4[...]7%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
603[...]790126;%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
79012[...]13848%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
;413848[...]765024%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
sp|O43790|KRT86_HUMAN;CON__O43790%T%0%T%%T%%T%+%T%8%T%121[...]5716%T%Tru[...]rue%T%126[...]8315%T%5455[...]484318%T%10404[...]426334%T%
It seems that the rows which are not marked as contaminants have no values. The NAs come from the na.strings='' argument used in the read.delim call. So, for example, if you do:
df <- read.delim("https://gist.githubusercontent.com/anonymous/0bc36ec5f46757de7c2c/raw/517ef70ab6a68e600f57308e045c2b4669a7abfc/example.txt", header=TRUE, row.names=1, sep="\t")
df<-df[df$Potential.contaminant!='+',]
summary(df)
you should see empty cells.
