Why would a script call round_any despite it not being explicitly called? - r

I've been struggling with this script for the past month and still haven't been able to answer this question. I know round_any is used in the plyr package, but I don't even load plyr. I checked all my other packages using ls("package: ") and none of them has this function. Nothing else I have found online has pointed me in the right direction. In browser() I can see that the column's type is double [4] (S3: integer64). Am I better off changing the class from integer64, or finding out how to remove round_any?
Error in `mutate()`:
! Problem while computing `..2 = across(...)`.
Caused by error in `across()`:
! Problem while computing column `Property Count`.
Caused by error in `UseMethod()`:
! no applicable method for 'round_any' applied to an object of class "integer64"
Edit:
This .r file contains all the functions and I have another .r file that calls them.
Argument/Call
market_stats_table = key_market_stats(historical_stats, historical_stats_by_class, report_quarter)
Function
total_stats = stats_combined %>%
rename(`Net Absorption` = `Net Absorption QTD - Total`,
`Net Absorption YTD` = `Net Absorption YTD - Total`,
`Construction Deliveries` = `Construction Deliveries QTD`) %>%
filter(Submarket %in% submarket_order) %>%
mutate(Submarket = factor(Submarket, levels = submarket_order, ordered = T)) %>%
arrange(Submarket) %>%
# glimpse()
mutate(across(c(`Direct Vacancy Rate`, `Overall Vacancy Rate`, `Overall Availability Rate`), scales::percent, accuracy = .1),
across(any_of(sum_vars),
scales::dollar, accuracy = 1, style_negative="parens", prefix=""),
across(any_of(c("Full Service Gross Asking Rate", "Lease Rate")),
scales::dollar)) %>%
select(Submarket, all_of(stat_order))
market_stats_table(total_stats,
cell_width = if_else(property_type == "Office", 1.05, .9),
cell_height = if_else(property_type == "Office", .33, .27),
submarket_order,
totals)
}
structure
structure(list(Market = c("Los Angeles", "0.4", NA, "0.5", "0.3",
"New York"), `Property Count` = c("New York", "0.3", NA, "0.2",
"0.9", "New York"), C = c("Chicago", "0.1", NA, "0.4", "0.3",
"DC"), D = c("DC", "0.7", NA, "0", "0.2", "DC"), e = c("Miami",
"0.8", NA, "0.2", "0.1", "Los Angeles")), row.names = c(NA, 6L
), class = "data.frame")
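A possible explanation, offered as a sketch rather than a confirmed diagnosis: formatters such as scales::dollar and scales::percent appear to round their input to `accuracy` through an internal S3 generic, and S3 dispatch on an object that carries a class attribute does not fall back to the method for its underlying double type. The toy generic `round_to` and the class name `fake_int64` below are hypothetical stand-ins (neither scales nor bit64 is loaded). They reproduce the same kind of failure in base R and show that converting to plain numeric first avoids it.

```r
# Toy generic with a numeric method only (a stand-in, not the scales code).
round_to <- function(x, accuracy) UseMethod("round_to")
round_to.numeric <- function(x, accuracy) round(x / accuracy) * accuracy

# A double vector carrying its own class attribute, as integer64 does.
x <- structure(c(1.2, 3.7), class = "fake_int64")

# Dispatch looks for round_to.fake_int64, then round_to.default, and fails.
msg <- tryCatch(round_to(x, 1), error = function(e) conditionMessage(e))
print(msg)

# Dropping the class first lets the numeric method run.
ok <- round_to(as.numeric(x), 1)
print(ok)
```

With a real integer64 column, the analogous step would be to convert the column with as.numeric() (bit64 supplies the conversion method) before the formatters run, keeping in mind that very large integers lose precision as doubles.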

Related

Problem when creating a weights column in the table

I am running a weighted regression with panel data on different geographical levels in the US and the Euro area. It essentially looks like this:
lm(log(POP25) ~ log(EMPLOY25), weights = weights, data = data)
The weights are the 2007 observations of POP25 for every grouping. This is the code and data for Europe. (For this dataset I don't experience any trouble.)
require(dplyr)
Data <- Data |>
group_by(NUTS_ID) |>
mutate(weights = POP25[TIME==2007])
Data 1:
structure(list(...1 = 1:6, TIME = 2007:2012, NUTS_ID = c("AT",
"AT", "AT", "AT", "AT", "AT"), NUMBER = c(1L, 1L, 1L, 1L, 1L,
1L), POP15 = c(5529.1, 5549.3, 5558.5, 5572.1, 5601.1, 5620.8
), POP20 = c(5047.1, 5063.2, 5072.6, 5090, 5127.1, 5151.9), POP25 = c(4544,
4560.7, 4571.3, 4587.8, 4621.5, 4639), EMPLOY15 = c(3863.6, 3928.7,
3909.3, 3943.9, 3982.3, 4013.4), EMPLOY20 = c(3676.2, 3737, 3723.8,
3761.9, 3802.3, 3835), EMPLOY25 = c(3333.5, 3390.4, 3384.7, 3424.6,
3454.4, 3486.4), weights = c(4544, 4544, 4544, 4544, 4544, 4544
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L), groups = structure(list(NUTS_ID = "AT", .rows = structure(list(
1:6), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr",
"list"))), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-1L), .drop = TRUE))
However, I am not able to run the same code on the data for US counties.
US_County <- US_County |>
group_by(NAME) |>
mutate(weights = POP25[year==2007])
Data 2:
structure(list(NAME = c("Ada County, Idaho", "Ada County, Idaho",
"Ada County, Idaho", "Ada County, Idaho", "Ada County, Idaho",
"Ada County, Idaho"), GEOID = c(16001, 16001, 16001, 16001, 16001,
16001), year = c(2007, 2008, 2009, 2010, 2011, 2012), POP25 = c(205888,
208506, 212770, 212272, 216058, 220856), EMPLOY25 = c(161385,
160303, 152131, 155292, 155574, 164830), State = c("Idaho", "Idaho",
"Idaho", "Idaho", "Idaho", "Idaho"), StateID = c(16, 16, 16,
16, 16, 16)), row.names = c(NA, 6L), class = "data.frame")
When I run it on the last dataset I get this error message, and I can't figure out what it means.
Error in `mutate()`:
! Problem while computing
`weights = POP25[year == 2007]`.
x `weights` must be size 5 or 1,
I don't know if there is anything wrong with the data. I have tried specifying the column classes so that they match across the datasets. I have also tried removing all NA observations, but with no luck.
Am I doing something wrong in my code?
Are there any other ways to do the same?
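One likely cause, stated as an assumption since the full data is not shown: at least one county has no row with year == 2007 (or has more than one), so POP25[year == 2007] has a length other than 1 or the group size. A minimal base R sketch of an alternative is to look up each county's 2007 value with match(), which yields NA for counties lacking a 2007 row. The second county, "Hypothetical County", is invented for illustration.

```r
US_County <- data.frame(
  NAME  = c(rep("Ada County, Idaho", 3), rep("Hypothetical County", 2)),
  year  = c(2007, 2008, 2009, 2008, 2009),
  POP25 = c(205888, 208506, 212770, 1000, 1100),
  stringsAsFactors = FALSE
)

# One row per county holding its 2007 population.
base_2007 <- US_County[US_County$year == 2007, c("NAME", "POP25")]

# match() gives the first matching position, or NA when a county has no 2007 row.
US_County$weights <- base_2007$POP25[match(US_County$NAME, base_2007$NAME)]
print(US_County)
```

Counties that end up with NA weights can then be inspected or dropped before fitting the model.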

Error in FUN(X[[i]], ...) : invalid 'type' (character) of argument with xtabs

I'm having issues with loading my data. If anyone can take a look, that'd be much appreciated!
Code
data <- structure(list(Date = c("2-Nov-20", "2-Nov-20", "2-Nov-20", "2-Nov-20",
"2-Nov-20", "2-Nov-20"), Cycle = c(1L, 1L, 1L, 1L, 1L, 1L), Route = c("T1",
"T1", "T1", "T1", "T1", "T1"), Waypoint = c("FQ1120", "FQ1121",
"FQ1122", "FQ1123", "FQ1127", "FQ1125"), Latitude = c("1.326983012",
"1.327218041", "1.327946009", "1.328284973", "1.329542007", "1.329018977"
), Longitude = c("103.659741", "103.659496", "103.659467", "103.65963",
"103.660734", "103.659631"), Sampling.point = c("T01_01", "T01_01",
"T01_01", "T01_02", "T01_20", "T01_02"), Latitude.1 = c(NA, NA,
NA, NA, NA, NA), Longitude.1 = c(NA, NA, NA, NA, NA, NA), Time..24h. = c("1947",
"1948", "1950", "1952", "2003", "1957"), Common.name = c("Wild pig",
"Red junglefowl", "Changeable lizard", "Savanna nightjar", "Changeable lizard",
"Yellow-vented bulbul"), Taxon = c("Mammal", "Bird", "Reptile",
"Bird", "Reptile", "Bird"), Scientific.name = c("Sus scrofa",
"Gallus gallus", "Calotes versicolor", "Caprimulgus affinis",
"Calotes versicolor", "Pycnonotus goiavier"), Global.status..IUCN.CITES. = c("Least Concern",
"Least Concern", "Not Assessed", "Least Concern", "Not Assessed",
"Least Concern"), Local.status..Davison.et.al...2008..Jain.et.al...2018.for.butterflies..Soh.et.al...2019.for.odonates. = c("Not Assessed",
"Endangered", "Not Assessed", "Not Assessed", "Not Assessed",
"Not Assessed"), Quantity = c("1", "1", "1", "1", "3", "7"),
Observation.type..seen.heard.caught.scat.other.signs. = c("Seen",
"Seen", "Seen", "Heard", "Seen", "Seen"), Photo.no. = c("",
"", "", "", "", ""), Survey.method..targeted.incidental.point.count.trapping. = c("Incidental",
"Incidental", "Targeted", "Targeted", "Targeted", "Targeted"
), Remarks = c("", "", "", "", "", ""), abundance = c(1,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
data$abundance <- 1 #add a column of 1
data.matrix <- xtabs(Quantity~Scientific.name+Sampling.point, data=data)
Error code:
Error in FUN(X[[i]], ...) : invalid 'type' (character) of argument
Not sure why this is happening. Any help is appreciated!
'Quantity' needs to be numeric, but it is stored as character. Convert it before calling xtabs:
data$Quantity <- as.numeric(data$Quantity)
xtabs(Quantity~Scientific.name+Sampling.point, data=data)
#                      Sampling.point
#Scientific.name        T01_01 T01_02 T01_20
#  Calotes versicolor        1      0      3
#  Caprimulgus affinis       0      1      0
#  Gallus gallus             1      0      0
#  Pycnonotus goiavier       0      7      0
#  Sus scrofa                1      0      0
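As a small self-contained illustration of the same fix, the sketch below uses a made-up three-row data frame (the column names species, site and qty are hypothetical) and checks that the conversion introduces no NAs before tabulating.

```r
obs <- data.frame(
  species = c("a", "b", "a"),
  site    = c("s1", "s1", "s2"),
  qty     = c("1", "2", "3"),   # counts read in as text
  stringsAsFactors = FALSE
)

# Convert, and stop early if any entry is not a valid number.
qty_num <- suppressWarnings(as.numeric(obs$qty))
stopifnot(!anyNA(qty_num))
obs$qty <- qty_num

# Sum qty for every species/site combination.
tab <- xtabs(qty ~ species + site, data = obs)
print(tab)
```

The NA check matters because as.numeric() silently turns entries such as "N/A" or "1 (est.)" into NA with only a warning.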

How to subset variables which have NA in the values?

I have an IMDb dataset in which I would like to replace the missing values for budget and box_office_gross. I think multiple imputation would be a good way to do that.
To separate the numeric columns from the rest of the dataset before imputing, I tried to subset the variables:
> NBCU_Limited <- subset(NBCU_dataLaurel_Modified, select = c(NBCU_dataLaurel_Modified$imdb_votes, NBCU_dataLaurel_Modified$runtime_min, NBCU_dataLaurel_Modified$Budget, NBCU_dataLaurel_Modified$Box_Office_Gross))
Error: NA column indexes not supported
But I get an error because there are NA values in the variables. I cannot exclude the character columns by negation either, because they also contain NAs and I get the same error.
How do I get only these four variables into a new data frame so that I can perform multiple imputation on them?
Sample Dataset
Update: The error occurs because I am prefixing each variable with the data frame name inside subset(). If I pass just the variable names, I do not get this error. I am not sure why, so maybe my code is simply improper.
Below is the data,
> dput(Sample)
structure(list(imdbid = c("tt6256056", "tt0085450", "tt5050772",
"tt5069876", "tt0083791", "tt0083929"), title = c("Una Famiglia",
"Doctor Detroit", "Honeytrap", "Maniac 8.2.8", "The Dark Crystal",
"Fast Times at Ridgemont High"), plot = c("N/A", "A timid college professor, conned into posing as a flamboyant pimp, finds himself enjoying his new occupation on the streets.",
"Simeon's evening goes horribly wrong when a young woman tries to pick him up.",
"Maniac: a person afflicted with mania. Mania: A manifestation of bipolar disorder, characterized by profuse and rapidly changing ideas, exaggerated sexuality, gaiety, or irritability, decreased sleep and violent abnormal behavior.",
"On another planet in the distant past, a Gelfling embarks on a quest to find the missing shard of a magical crystal, and so restore order to his world.",
"A group of Southern California high school students are enjoying their most important subjects: sex, drugs and rock n' roll."
), rating = c("N/A", "R", "N/A", "N/A", "PG", "R"), imdb_rating = c(NA,
5.1, NA, NA, 7.2, 7.2), metacritic = c(NA, NA, NA, NA, NA, 67
), dvd_release = structure(c(NA, 1126569600, NA, NA, 939081600,
1099353600), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
production = c("N/A", "Universal", "Array Releasing", "N/A",
"Sony Pictures Home Entertainment", "Universal Pictures"),
actors = c("Patrick Bruel, Fortunato Cerlino, Matilda De Angelis, Ennio Fantastichini",
"Dan Aykroyd, Howard Hesseman, Donna Dixon, Lydia Lei", "Jennifer Nelson, Daemian Greaves, Polina Vasileva, Becki Lloyd",
"Dimitra Aggelou, Giorgos Efthimiou, Stavroula Kontopoulou, Maria-Antouanetta Tatsi",
"Jim Henson, Kathryn Mullen, Frank Oz, Dave Goelz", "Sean Penn, Jennifer Jason Leigh, Judge Reinhold, Robert Romanus"
), imdb_votes = c(NA, 4492, NA, NA, 44862, 76980), poster = c("N/A",
"https://images-na.ssl-images-amazon.com/images/M/MV5BMjhjY2Q4NWEtYTUzZC00YjE2LTk0ZjktNzUyZjIwNmQ0YTkyXkEyXkFqcGdeQXVyMTQxNzMzNDI#._V1_SX300.jpg",
"N/A", "https://images-na.ssl-images-amazon.com/images/M/MV5BZjdmZTRhYzgtOGY4MS00OGM5LWJlNmItYzJiYjZiNmVmYjhkXkEyXkFqcGdeQXVyNDA2NjM2ODk#._V1_SX300.jpg",
"https://images-na.ssl-images-amazon.com/images/M/MV5BMWZlZjk1MGEtYWMzOC00N2EyLWFkOTUtZDM4NGNlY2M0YjVmXkEyXkFqcGdeQXVyNTAyODkwOQ##._V1_SX300.jpg",
"https://images-na.ssl-images-amazon.com/images/M/MV5BYzBlZjE1MDctYjZmZC00ZTJmLWFkOWEtYjdmZDZkODBkZmI2XkEyXkFqcGdeQXVyNjQ2MjQ5NzM#._V1_SX300.jpg"
), director = c("Sebastiano Riso", "Michael Pressman", "Nick Archer",
"Giorgos Efthimiou", "Jim Henson, Frank Oz", "Amy Heckerling"
), release_date = structure(c(1493596800, 421027200, 1448928000,
1431734400, 408931200, 398044800), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Year = c(2017, 1983, 2015, 2015, 1982,
1982), Year_Groups = c("2010-2020", "1980-1989", "2010-2020",
"2010-2020", "1980-1989", "1980-1989"), Month = c("May",
"May", "December", "May", "December", "August"), runtime_min = c(97,
89, NA, 15, 93, 90), genre = c("Drama", "Comedy", "Short, Thriller",
"Short, Horror", "Adventure, Family, Fantasy", "Comedy, Drama"
), awards = c("N/A", "N/A", "N/A", "1 win.", "Nominated for 1 BAFTA Film Award. Another 2 wins & 4 nominations.",
"1 win & 1 nomination."), keywords = c(NA, "pimp|college-professor|voyeurism|voyeur|blue-panties|panties|red-dress|blonde|female-frontal-nudity|female-nudity|nude-girl|nude|bare-breasts|breasts|topless-female-nudity|scantily-clad-female|cleavage|two-word-title|reference-to-joe-frazier|reference-to-yul-brynner|mother-son-relationship|f-word|place-name-in-title|city-name-in-title|dual-identity|prostitution|independent-film|title-spoken-by-character|character-name-in-title",
NA, NA, "mystic|magical-crystal|crystal-shard|sword-and-sorcery|puppetry|crystal|shard|quest|evil|monster|feeding-on-energy|hidden-entrance|giant-crystal|actor-voicing-multiple-characters|planetary-alignment|reunification|three-word-title|dark-fantasy|slow-motion-scene|vampire|surrealism|christ-allegory|cult-film|sorceress|relic|race-against-time|muppet|mission|magic|kingdom|creature|good-versus-evil|directed-by-star|epic|multiple-monsters|invented-language|slavery|orrery|puppet|mutation|darkness|destiny",
"high-school|title-directed-by-female|females-talking-about-sex|unwanted-pregnancy|fired-from-the-job|teacher-student-relationship|irreverence|sexual-awakening|innocence-lost|ensemble-film|coming-of-age|teen-movie|high-school-teacher|advice|ticket-scalping|shopping-mall|loss-of-virginity|female-nudity|brother-sister-relationship|caught-masturbating|california|surfer|teacher|break-up|rock-'n'-roll|virgin|teenager|friendship|drugs|date|surfer-dude|blond-boy|redheaded-boy|generation-x|f-rated|vomiting|sex-scene|cult-film|breasts|jeans|hawaiian-shirt|payphone|teenage-girl|teen-sex-comedy|scantily-clad-female|reference-to-led-zeppelin|dream-girl|underage-girl|jailbait|trophy-wife|voyeur|sexual-promiscuity|sexual-desire|sexual-attraction|lust|sex-on-couch|female-rear-nudity|female-frontal-nudity|panties|cheerleader-uniform|female-removes-her-clothes|cleavage|marijuana|drug-use|teen-angst|surfing|school-life|pregnancy|masturbation|football-player|first-love|employment|bikini|stoner|rock-m... <truncated>
), Budget = c(NA, 10375893, NA, NA, 1.5e+07, 4500000), Box_Office_Gross = c(2.48,
70, 70, 124, 140, 140)), .Names = c("imdbid", "title", "plot",
"rating", "imdb_rating", "metacritic", "dvd_release", "production",
"actors", "imdb_votes", "poster", "director", "release_date",
"Year", "Year_Groups", "Month", "runtime_min", "genre", "awards",
"keywords", "Budget", "Box_Office_Gross"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
The error occurs because I am prefixing each variable with the data frame name inside subset(). If I pass just the variable names, I do not get this error. I am not sure why, so maybe my code is simply improper. Thanks @Tung for pointing this out.
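A minimal sketch of why this happens, using a made-up data frame `movies` (hypothetical): the select argument of subset() expects column names or positions, whereas c(df$a, df$b) concatenates the column values, which are then treated as column indices, so any NA among them becomes an NA column index. Passing bare names, or a character vector of names with [, selects the columns regardless of the NAs inside them.

```r
movies <- data.frame(
  title      = c("x", "y", "z"),
  imdb_votes = c(NA, 4492, 44862),
  Budget     = c(NA, 10375893, 1.5e7),
  stringsAsFactors = FALSE
)

# Bare column names inside select:
by_name <- subset(movies, select = c(imdb_votes, Budget))

# Equivalent selection with a character vector of names:
by_chr <- movies[, c("imdb_votes", "Budget")]

print(by_name)
```

Either form keeps the NA values in place, so the result can be handed to an imputation routine as is.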

Using mutate and a lookup/calc function

I wrote a function that takes a company name, looks up a set of records in a second table, calculates a complicated result, and returns it.
I want to process all companies and add that result as a new value on each record.
I am using the following code:
`aa <- mutate(companies,newcol=sum_rounds(companies$company_name))`
But I get the following warning:
Warning message:
In c("Bwom", "Symple", "TravelTriangle", "Ark Biosciences", "Artizan Biosciences", :
longer object length is not a multiple of shorter object length
(each of these is a company name)
The companies data frame gets a new column, but every value is FALSE, when there should be a mix of TRUE and FALSE.
Any advice for a newbie would be welcome.
Function follows:
sum_rounds<-function(co_name) {
#get records from rounds for the company name passed to the function
#remove NAs from column roundtype too
outval<- rounds %>%
filter(company_name.x==co_name & !is.na(roundtype)) %>%
#sort by date round is announced
arrange(announced_on) %>%
select(roundtype) %>%
#create a string of all round types in order
apply(2,paste,collapse="")
#the values from mixed to "M", venture to "V" and pureangel to "A"
# now see if it is of the form aaaaa (and #) followed by m or v
# in grep: ^ is start of a line and + is for at least one copy
# [mv] is either m or v
# nice summary is here: http://www.endmemo.com/program/R/gsub.php
#is angel2vc?
angel2vc<-grepl("^a+[mv]+",outval)
#return(list("roundcodes"=outval,"angel2vc"=angel2vc))
return(angel2vc)
}
DPUT from Companies table Follows:
structure(list(company_name = c("Bwom", "Symple", "TravelTriangle",
"Ark Biosciences", "Artizan Biosciences", "Audiense"), domain = c("b-wom.com",
"getsymple.com", "traveltriangle.com", "arkbiosciences.com",
NA, "audiense.com"), country_code = c("ESP", "USA", "USA", "CHN",
"USA", "GBR"), state_code = c(NA, "CA", "VA", NA, "NC", NA),
region = c("Barcelona", "SF Bay Area", "Washington, D.C.",
"Shanghai", "Raleigh", "London"), city = c("Barcelona", "San Francisco",
"Charlottesville", "Shanghai", "Durham", "London"), status = c("operating",
"operating", "operating", "operating", "operating", "operating"
), short_description = c("Bwom is a tool that offers a test and personalized exercises for women's intimate health.",
"Symple is the cloud platform for all your business payments. Pay, get paid, connect.",
"TravelTriangle enables travel enthusiasts to reserve a personalized holiday plan with a local travel agent.",
"Ark Biosciences is a biopharmaceutical company that is dedicated to the discovery and development",
"Artizan Biosciences", "SaaS developer delivering unique consumer insight and engagement capabilities to many of the world’s biggest brands and agencies."
), category_list = c("health care", "cloud computing|machine learning|mobile apps|mobile payments|retail technology",
"e-commerce|personalization|tourism|travel", "health care",
"biopharma", "analytics|apps|marketing|market research|social crm|social media|social media marketing"
), category_group_list = c("health care", "apps|commerce and shopping|data and analytics|financial services|hardware|internet services|mobile|payments|software",
"commerce and shopping|travel and tourism", "health care",
"biotechnology|health care|science and engineering", "apps|data and analytics|design|information technology|internet services|media and entertainment|sales and marketing|software"
), employee_count = c("1 to 10", "11 to 50", "101 to 250",
NA, "1 to 10", "51 to 100"), funding_rounds = c(2L, 1L, 4L,
2L, 2L, 5L), funding_total_usd = c(1075791, 120000, 19900000,
NA, 3e+06, 8013391), founded_on = structure(c(16555, 16770,
15156, 16071, NA, 14975), class = "Date"), first_funding_on = structure(c(16526,
17204, 15492, 16532, 17091, 15294), class = "Date"), last_funding_on = structure(c(17204,
17204, 17204, 17203, 17203, 17203), class = "Date"), closed_on = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), email = c("hello#b-wom.com", "info#getsymple.com",
"admin#traveltriangle.com", "info#arkbiosciences.com", NA,
"moreinfo#audiense.com"), phone = c(NA, NA, "'+91 98 99 120408",
"###############################################################################################################################################################################################################################################################",
NA, "###############################################################################################################################################################################################################################################################"
), cb_url = c("https://www.crunchbase.com/organization/bwom",
"https://www.crunchbase.com/organization/symple-2", "https://www.crunchbase.com/organization/traveltriangle-com",
"https://www.crunchbase.com/organization/ark-biosciences",
"https://www.crunchbase.com/organization/artizan-biosciences",
"https://www.crunchbase.com/organization/socialbro"), twitter_url = c("https://www.twitter.com/hellobwom",
NA, "https://www.twitter.com/traveltriangle", NA, NA, "https://www.twitter.com/socialbro"
), facebook_url = c("https://www.facebook.com/hellobwom/?fref=ts",
NA, "http://www.facebook.com/traveltriangle", NA, NA, "http://www.facebook.com/socialbro"
), uuid = c("e6096d58-3454-d982-0dbe-7de9b06cd493", "fd0ab78f-0dc4-1f18-21d1-7ce9ff7a173b",
"742043c1-c17a-4526-4ed0-e911e6e9555b", "8e27eb22-ce03-a2af-58ba-53f0f458f49c",
"ed07ac9e-1071-fca0-46d9-42035c2da505", "fed333e5-2754-7413-1e3d-5939d70541d2"
), isbio = c("other", "other", "other", "other", "bio", "other"
), co_type = c("m", "m", "m", "v", "v", "m")), .Names = c("company_name",
"domain", "country_code", "state_code", "region", "city", "status",
"short_description", "category_list", "category_group_list",
"employee_count", "funding_rounds", "funding_total_usd", "founded_on",
"first_funding_on", "last_funding_on", "closed_on", "email",
"phone", "cb_url", "twitter_url", "facebook_url", "uuid", "isbio",
"co_type"), row.names = c(NA, -6L), class = c("tbl_df", "tbl",
"data.frame"))
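A likely cause, offered as a sketch: sum_rounds is written for a single company name, but mutate passes the whole company_name column at once, so the == comparison recycles that vector (hence the warning) and the single result is recycled to every row. Calling the function once per name avoids this. The base R version below uses a made-up `rounds` table with one-letter round codes, and a hypothetical helper `angel_then_vc` stands in for sum_rounds.

```r
rounds <- data.frame(
  company_name = c("A", "A", "B", "B"),
  roundtype    = c("a", "v", "v", "a"),
  announced_on = as.Date(c("2015-01-01", "2016-01-01",
                           "2015-01-01", "2016-01-01")),
  stringsAsFactors = FALSE
)

# Works on ONE company name: TRUE if angel rounds are followed by m or v rounds.
angel_then_vc <- function(co_name) {
  r <- rounds[rounds$company_name == co_name & !is.na(rounds$roundtype), ]
  r <- r[order(r$announced_on), ]
  grepl("^a+[mv]+", paste(r$roundtype, collapse = ""))
}

companies <- data.frame(company_name = c("A", "B", "C"),
                        stringsAsFactors = FALSE)

# Call the function once per company and collect one logical per row.
companies$angel2vc <- vapply(companies$company_name, angel_then_vc,
                             logical(1), USE.NAMES = FALSE)
print(companies)
```

Inside a dplyr pipeline, the same per-row call could be expressed with rowwise() or purrr::map_lgl(); vapply is used here only to keep the sketch dependency-free.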

Extract specific columns from dataset, create column of NAs if it doesn't exist

Data frame df has 57 columns. I later read in other csv files, each of which may have the same 57 columns but more likely has more or fewer. I take the column names of the original file as:
df = read.csv(...)
str = colnames(df)
I know I can take subsets of a data frame as:
file = read.csv(...)
file = file[, str]
If file contains all of the original 57 columns (with or without extras), this works fine; any extra columns are simply dropped. However, if file is missing any of the original 57, the following error arises:
Error in `[.data.frame`(file, , str) : undefined columns selected
Is there a way to take this same approach, but create columns of NA if the column does not exist in file?
EDIT: Including dput output for @akrun. I'm not familiar with dput, so I hope this is what you were asking for:
File 1 example:
`structure(list(ObservationURI = c("http://resources.usgin.org/uri-gin/wygs/bhtemp/49-037-20341_182_12296/",
"http://resources.usgin.org/uri-gin/wygs/bhtemp/49-037-20341_215_14316/",
"http://resources.usgin.org/uri-gin/wygs/bhtemp/49-037-20341_236_16496/"
), WellName = c("1 BRADY UNIT ANADARKO E&P COMPANY LP", "1 BRADY UNIT ANADARKO E&P COMPANY LP",
"1 BRADY UNIT ANADARKO E&P COMPANY LP"), APINo = c("49-037-20341",
"49-037-20341", "49-037-20341"), HeaderURI = c("http://resources.usgin.org/uri-gin/wygs/well/3720341/",
"http://resources.usgin.org/uri-gin/wygs/well/3720341/", "http://resources.usgin.org/uri-gin/wygs/well/3720341/"
), OtherID = c(3720341, 3720341, 3720341), OtherName = c(NA,
NA, NA), BoreholeName = c(NA, NA, NA), Label = c("Temperature observation for well 3720341",
"Temperature observation for well 3720341", "Temperature observation for well 3720341"
), Operator = c("", "", ""), LeaseName = c("", "", ""), LeaseOwner = c("",
"", ""), LeaseNo = c("", "", ""), SpudDate = c("1900-01-01T00:00",
"1900-01-01T00:00", "1900-01-01T00:00"), EndedDrillingDate = c("",
"", ""), WellType = c("Oil", "Oil", "Oil"), Status = c("Producing Oil Well",
"Producing Oil Well", "Producing Oil Well"), CommodityOfInterest = c("",
"", ""), StatusDate = c("1973-05-03T00:00:00", "1973-05-03T00:00:00",
"1973-05-03T00:00:00"), Function = c(NA, NA, NA), Production = c(NA,
NA, NA), ProducingInterval = c(NA, NA, NA), ReleaseDate = c(NA,
NA, NA), Field = c("", "", ""), OtherLocationName = c("Great Divide Basin",
"Great Divide Basin", "Great Divide Basin"), County = c("Sweetwater",
"Sweetwater", "Sweetwater"), State = c("WY", "WY", "WY"), PLSS_Meridians = c(NA,
NA, NA), TWP = c("16N", "16N", "16N"), RGE = c("101W", "101W",
"101W"), Section_ = c(11, 11, 11), SectionPart = c("NENW", "NENW",
"NENW"), Parcel = c(NA, NA, NA), UTM_E = c(NA, NA, NA), UTM_N = c(NA,
NA, NA), UTMDatumZone = c(NA, NA, NA), LatDegree = c(41.38696,
41.38696, 41.38696), LongDegree = c(-108.75009, -108.75009, -108.75009
), SRS = c("EPSG:4326", "EPSG:4326", "EPSG:4326"), LocationUncertaintyStatement = c("nil:missing",
"nil:missing", "nil:missing"), LocationUncertaintyCode = c(NA,
NA, NA), LocationUncertaintyRadius = c(NA, NA, NA), DrillerTotalDepth = c(NA_real_,
NA_real_, NA_real_), DepthReferencePoint = c(NA, NA, NA), LengthUnits = c("ft",
"ft", "ft"), WellBoreShape = c(NA, NA, NA), TrueVerticalDepth = c(NA,
NA, NA), ElevationKB = c(7135, 7135, 7135), ElevationDF = c(7106,
7106, 7106), ElevationGL = c(0, 0, 0), FormationTD = c("", "",
""), BitDiameterCollar = c(NA, NA, NA), BitDiameterTD = c(NA_real_,
NA_real_, NA_real_), DiameterUnits = c("", "", ""), Notes = c("Depth of measurement assumed to be equal to driller total depth (CRC-AZGS, 2013).",
"Depth of measurement assumed to be equal to driller total depth (CRC-AZGS, 2013).",
"Depth of measurement assumed to be equal to driller total depth (CRC-AZGS, 2013)."
), MaximumRecordedTemperature = c(NA_real_, NA_real_, NA_real_
), MeasuredTemperature = c(182, 215, 236), CorrectedTemperature = c(NA_real_,
NA_real_, NA_real_), TemperatureUnits = c(FALSE, FALSE, FALSE
), TimeSinceCirculation = c(NA_real_, NA_real_, NA_real_), CirculationDuration = c(11,
12, 12), MeasurementProcedure = c("Well log", "Well log", "Well log"
), CorrectionType = c(NA, NA, NA), DepthOfMeasurement = c(-99999,
-99999, -99999), MeasurementDateTime = c("", "", ""), MeasurementFormation = c("",
"", ""), MeasurementSource = c("Richard W. Davis: Deriving geothermal parameters from bottom-hole temperatures in Wyoming\" AAPG bulletin, V. 96, No. 8 (August 2012), pp. 1579-1592",
"Richard W. Davis: Deriving geothermal parameters from bottom-hole temperatures in Wyoming\" AAPG bulletin, V. 96, No. 8 (August 2012), pp. 1579-1592",
"Richard W. Davis: Deriving geothermal parameters from bottom-hole temperatures in Wyoming\" AAPG bulletin, V. 96, No. 8 (August 2012), pp. 1579-1592"
), RelatedResource = c(NA, NA, NA), CasingLogger = c(NA, NA,
NA), CasingBottomDepthDriller = c(NA, NA, NA), CasingTopDepth = c(NA_real_,
NA_real_, NA_real_), CasingPipeDiameter = c(NA, NA, NA), CasingWeight = c(NA,
NA, NA), CasingWeightUnits = c(NA, NA, NA), CasingThickness = c(NA,
NA, NA), DrillingFluid = c("", "", ""), Salinity = c(NA_real_,
NA_real_, NA_real_), MudResistivity = c(NA_real_, NA_real_, NA_real_
), Density = c(NA_real_, NA_real_, NA_real_), FluidLevel = c(NA_real_,
NA_real_, NA_real_), pH = c(NA_real_, NA_real_, NA_real_), Viscosity = c(NA_real_,
NA_real_, NA_real_), FluidLoss = c(NA_real_, NA_real_, NA_real_
), MeasurementNotes = c(NA, NA, NA), InformationSource = c("Wyoming State Geological Survey",
"Wyoming State Geological Survey", "Wyoming State Geological Survey"
)), .Names = c("ObservationURI", "WellName", "APINo", "HeaderURI",
"OtherID", "OtherName", "BoreholeName", "Label", "Operator",
"LeaseName", "LeaseOwner", "LeaseNo", "SpudDate", "EndedDrillingDate",
"WellType", "Status", "CommodityOfInterest", "StatusDate", "Function",
"Production", "ProducingInterval", "ReleaseDate", "Field", "OtherLocationName",
"County", "State", "PLSS_Meridians", "TWP", "RGE", "Section_",
"SectionPart", "Parcel", "UTM_E", "UTM_N", "UTMDatumZone", "LatDegree",
"LongDegree", "SRS", "LocationUncertaintyStatement", "LocationUncertaintyCode",
"LocationUncertaintyRadius", "DrillerTotalDepth", "DepthReferencePoint",
"LengthUnits", "WellBoreShape", "TrueVerticalDepth", "ElevationKB",
"ElevationDF", "ElevationGL", "FormationTD", "BitDiameterCollar",
"BitDiameterTD", "DiameterUnits", "Notes", "MaximumRecordedTemperature",
"MeasuredTemperature", "CorrectedTemperature", "TemperatureUnits",
"TimeSinceCirculation", "CirculationDuration", "MeasurementProcedure",
"CorrectionType", "DepthOfMeasurement", "MeasurementDateTime",
"MeasurementFormation", "MeasurementSource", "RelatedResource",
"CasingLogger", "CasingBottomDepthDriller", "CasingTopDepth",
"CasingPipeDiameter", "CasingWeight", "CasingWeightUnits", "CasingThickness",
"DrillingFluid", "Salinity", "MudResistivity", "Density", "FluidLevel",
"pH", "Viscosity", "FluidLoss", "MeasurementNotes", "InformationSource"
), row.names = c(NA, 3L), class = "data.frame")`
File 2 example:
`structure(list(ObservationURI = c("http://resources.usgin.org/uri-gin/mags/bhtemp/UM:MA-Weston47-422036N0711640.1/",
"http://resources.usgin.org/uri-gin/mags/bhtemp/UM:MA-Dover20-421431N0711752.1/",
"http://resources.usgin.org/uri-gin/mags/bhtemp/UM:MA-Lincoln13-422440N0711815.1/"
), WellName = c("Weston47-USGS HDR19", "Dover20-USGS HDR19",
"Lincoln13-USGS HDR19"), APINo = c(NA, NA, NA), HeaderURI = c("http://resources.usgin.org/uri-gin/mags/well/Weston47-USGS_HDR19/",
"http://resources.usgin.org/uri-gin/mags/well/Dover20-USGS_HDR19/",
"http://resources.usgin.org/uri-gin/mags/well/Lincoln13-USGS_HDR19/"
), OtherID = c("", "", ""), OtherName = c("", "", ""), BoreholeName = c(NA,
NA, NA), Operator = c(NA, NA, NA), LeaseOwner = c(NA, NA, NA),
LeaseNo = c(NA, NA, NA), SpudDate = c(NA, NA, NA), EndedDrillingDate = c("",
"", ""), WellType = c("temporarily abandoned", "observation",
"observation"), Status = c("Idle", "Idle", "Idle"), CommodityOfInterest = c("Water",
"Water", "Water"), StatusDate = c("", "", ""), Function = c("production",
"monitoring", "monitoring"), Production = c(NA, NA, NA),
Field = c(NA, NA, NA), County = c("Middlesex", "Norfolk",
"Middlesex"), State = c("MA", "MA", "MA"), PLSS_Meridians = c(NA,
NA, NA), TWP = c(NA, NA, NA), RGE = c(NA, NA, NA), Section_ = c(NA,
NA, NA), SectionPart = c(NA, NA, NA), Parcel = c(NA, NA,
NA), UTM_E = c(NA, NA, NA), UTM_N = c(NA, NA, NA), LatDegree = c(42.3147771183,
42.2417748607, 42.4110851252), LongDegree = c(-71.3257301787,
-71.2975422044, -71.3034583949), SRS = c("EPSG:4326", "EPSG:4326",
"EPSG:4326"), LocationUncertaintyStatement = c("Field located on topographic map",
"Field located on topographic map", "Field located on topographic map"
), DrillerTotalDepth = c(29, 22, 20), LengthUnits = c("ft",
"ft", "ft"), WellBoreShape = c("Vertical", "Vertical", "Vertical"
), TrueVerticalDepth = c(NA, NA, NA), ElevationGL = c(140,
150, 180), BitDiameterTD = c(72, 48, 42), DiameterUnits = c("in",
"in", "in"), Notes = c("", "", ""), MeasuredTemperature = c(8,
9, 8.5), CorrectedTemperature = c(NA, NA, NA), TemperatureUnits = c("C",
"C", "C"), TimeSinceCirculation = c(NA, NA, NA), CirculationDuration = c(NA,
NA, NA), MeasurementProcedure = c("Samples collected from spigot or faucet nearest to well. Water run until temperature, pH or specific conductance stablized. Temperature measured with a mercury thermometer to nearest half degree in degrees F. Converted to degrees C for table.",
"Samples collected from spigot or faucet nearest to well. Water run until temperature, pH or specific conductance stablized. Temperature measured with a mercury thermometer to nearest half degree in degrees F. Converted to degrees C for table.",
"Samples collected from spigot or faucet nearest to well. Water run until temperature, pH or specific conductance stablized. Temperature measured with a mercury thermometer to nearest half degree in degrees F. Converted to degrees C for table."
), CorrectionType = c(NA, NA, NA), DepthOfMeasurement = c(NA,
NA, NA), MeasurementDateTime = c(NA, NA, NA), MeasurementFormation = c(NA,
NA, NA), MeasurementSource = c("Walker, Eugene H., William W. Caswell, and S. William Wandle, Jr. Hydrologic Data of the Charles River Basin",
"Walker, Eugene H., William W. Caswell, and S. William Wandle, Jr. Hydrologic Data of the Charles River Basin",
"Walker, Eugene H., William W. Caswell, and S. William Wandle, Jr. Hydrologic Data of the Charles River Basin"
), CasingLogger = c(" Massachusetts\". USGS Massachusetts Hydrologic-Data Report No. 19 (1977): 1-57. Print. ftp://eclogite.geo.umass.edu/pub/stategeologist/Products/Geothermal/BoreholeTemperatureData/DataReport19.pdf\"",
" Massachusetts\". USGS Massachusetts Hydrologic-Data Report No. 19 (1977): 1-57. Print. ftp://eclogite.geo.umass.edu/pub/stategeologist/Products/Geothermal/BoreholeTemperatureData/DataReport19.pdf\"",
" Massachusetts\". USGS Massachusetts Hydrologic-Data Report No. 19 (1977): 1-57. Print. ftp://eclogite.geo.umass.edu/pub/stategeologist/Products/Geothermal/BoreholeTemperatureData/DataReport19.pdf\""
), CasingDepthDriller = c("", "", ""), CasingPipeDiameter = c("",
"", ""), CasingWeight = c(NA, NA, NA), CasingWeightUnits = c(NA,
NA, NA), CasingThickness = c(NA, NA, NA), DrillingFluid = c(NA,
NA, NA), Salinity = c(NA, NA, NA), MudResisitivity = c(NA,
NA, NA), Density = c(NA, NA, NA), FluidLevel = c(NA, NA,
NA), pH = c(NA, NA, NA), Viscosity = c(NA, NA, NA), FluidLoss = c(NA,
NA, NA), Unnamed..66 = c(NA, NA, NA), BitDiameterCollar = c(72,
48, 42), Unnamed..68 = c(NA, NA, NA), InformationSource = c("Stephen Mabee, MA State Geologist, University of Massachusetts, 611 North Pleasant Street, Amherst MA 01003 413-545-2285",
"Stephen Mabee, MA State Geologist, University of Massachusetts, 611 North Pleasant Street, Amherst MA 01003 413-545-2285",
"Stephen Mabee, MA State Geologist, University of Massachusetts, 611 North Pleasant Street, Amherst MA 01003 413-545-2285"
)), .Names = c("ObservationURI", "WellName", "APINo", "HeaderURI",
"OtherID", "OtherName", "BoreholeName", "Operator", "LeaseOwner",
"LeaseNo", "SpudDate", "EndedDrillingDate", "WellType", "Status",
"CommodityOfInterest", "StatusDate", "Function", "Production",
"Field", "County", "State", "PLSS_Meridians", "TWP", "RGE", "Section_",
"SectionPart", "Parcel", "UTM_E", "UTM_N", "LatDegree", "LongDegree",
"SRS", "LocationUncertaintyStatement", "DrillerTotalDepth", "LengthUnits",
"WellBoreShape", "TrueVerticalDepth", "ElevationGL", "BitDiameterTD",
"DiameterUnits", "Notes", "MeasuredTemperature", "CorrectedTemperature",
"TemperatureUnits", "TimeSinceCirculation", "CirculationDuration",
"MeasurementProcedure", "CorrectionType", "DepthOfMeasurement",
"MeasurementDateTime", "MeasurementFormation", "MeasurementSource",
"CasingLogger", "CasingDepthDriller", "CasingPipeDiameter", "CasingWeight",
"CasingWeightUnits", "CasingThickness", "DrillingFluid", "Salinity",
"MudResisitivity", "Density", "FluidLevel", "pH", "Viscosity",
"FluidLoss", "Unnamed..66", "BitDiameterCollar", "Unnamed..68",
"InformationSource"), row.names = c(NA, 3L), class = "data.frame")`
We can read the datasets into a list with fread and use rbindlist from data.table with fill = TRUE and the idcol argument to create a single data.table object. fill = TRUE ensures that NA values are filled in for the datasets that have fewer columns.
library(data.table)
# get the csv files from the working directory (pattern is a regular expression)
files <- list.files(pattern = "\\.csv$")
# read each file with fread, then row-bind the data.tables, filling missing columns with NA
rbindlist(lapply(files, fread), fill = TRUE, idcol = "grp")
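For the approach asked about in the question (keeping the [, str] subsetting), a minimal base R sketch is to add each missing column as NA before subsetting. The tiny data frame and the three-name `wanted` vector are made up for illustration; `wanted` plays the role of str.

```r
wanted <- c("WellName", "County", "Operator")   # stands in for colnames(df)

file <- data.frame(
  WellName = c("w1", "w2"),
  County   = c("Norfolk", "Middlesex"),
  Extra    = c(1, 2),
  stringsAsFactors = FALSE
)

# Add every column that is wanted but absent, filled with NA.
for (nm in setdiff(wanted, names(file))) {
  file[[nm]] <- NA
}

# The original subsetting now works: extras are dropped and the order is restored.
file <- file[, wanted]
print(file)
```

Columns added this way are logical NA; they take on the appropriate type once the data frames are combined with others that hold real values.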
