I am extremely new to building functions and loops. I have looked at previous questions similar to my issue, but I can't seem to find a solution to my problem. My goal is to extract climate data from a webpage like this:
https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations=NE3065&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019
where I will use this data to calculate growing degree days for a crop growth model. I have had success pulling data using a for loop.
uticaNE <- "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations=NE8745&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019"
friendNE <- "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations=NE3065&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019"
location.urls <- c(uticaNE, friendNE)
location.meso.files <- c("uticaNe.txt", "friendNE.txt")
for(i in seq_along(location.urls)){
download.file(location.urls[i], location.meso.files[i], method="libcurl")
}
I will have around 20 locations from which I will be pulling data daily. What I want to do is apply a task (calculating fahrenheit, GDD, etc.) to each file and save the output of each file separately.
This is the code I currently have.
files <- list.files(pattern="*.txt", full.names=TRUE, recursive=FALSE)
func <- for (i in 1:length(files)){
  df <- read.table(files[i], skip=10, stringsAsFactors=FALSE)
  colnames(df) <- c("year", "day", "solrad", "maxC", "minC", "precipmm")
  df$year <- as.f(df$year)
  df$day <- as.factor(df$day)
  df$maxF <- (df$maxC * (9/5) + 32)
  df$minF <- (df$minC * (9/5) + 32)
  df$GDD <- (((df$maxF + df$minF)/2)-50)
  df$GDD[df$GDD <= 0] <- 0
  df$GDD.cumulateive <- cumsum(df$GDD)
  df$precipmm.cumulative <- cumsum(df$precipmm)
  return(df)
  write.table(df, path="./output", quote=FALSE, row.names=FALSE, col.names=TRUE)
}
data <- apply(files, func)
Any help would be greatly appreciated.
-ML
Here is an approach using base R and lapply() with an anonymous function to download the data, read it into a data frame, add the conversions to fahrenheit and cumulative precipitation, and write the output files.
First, we create the list of weather stations for which we will download data.
# list of 10 stations
stationList <- c("NE3065","NE8745","NE0030","NE0050","NE0130",
"NE0245","NE0320","NE0355","NE0375","NE0420")
Here we create two URL fragments: one for the URL content prior to the station identifier, and another for the URL content after the station identifier.
urlFragment1 <- "https://mesonet.agron.iastate.edu/cgi-bin/request/coop.py?network=NECLIMATE&stations="
urlFragment2 <- "&year1=2020&month1=1&day1=1&year2=2020&month2=12&day2=31&vars%5B%5D=gdd_50_86&model=apsim&what=view&delim=comma&gis=no&scenario_year=2019"
Next, we create input and output directories, one to store the downloaded climate input files, and another for the output files.
# create input and output file directories if they do not already exist
if(!dir.exists("./data")) dir.create("./data")
if(!dir.exists("./data/output")) dir.create("./data/output")
The lapply() function uses paste0() to add the station names to the URL fragments we created above, enabling us to automate the download and subsequent operations against each input file.
stationData <- lapply(stationList, function(x){
  theURL <- paste0(urlFragment1, x, urlFragment2)
  download.file(theURL, paste0("./data/", x, ".txt"), method="libcurl")
  # skip the 11 rows of header data in each downloaded file
  df <- read.table(paste0("./data/", x, ".txt"), skip=11, stringsAsFactors=FALSE)
  colnames(df) <- c("year", "day", "solrad", "maxC", "minC", "precipmm")
  df$year <- as.factor(df$year)
  df$day <- as.factor(df$day)
  # convert celsius to fahrenheit
  df$maxF <- (df$maxC * (9/5) + 32)
  df$minF <- (df$minC * (9/5) + 32)
  # growing degree days, floored at zero
  df$GDD <- (((df$maxF + df$minF)/2) - 50)
  df$GDD[df$GDD <= 0] <- 0
  df$GDD.cumulative <- cumsum(df$GDD)
  df$precipmm.cumulative <- cumsum(df$precipmm)
  # tag each data frame with its station id
  df$station <- x
  write.table(df, file=paste0("./data/output/", x, ".txt"), quote=FALSE,
              row.names=FALSE, col.names=TRUE)
  df
})
# add names to the data frames returned by lapply()
names(stationData) <- stationList
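Since the list elements are now named, an individual station's data frame can be pulled out directly, for example:
# inspect the first few rows for one station
head(stationData[["NE3065"]])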
...and here is the output: a directory containing one file for each station listed in the stationList object.
Finally, here is the data that has been written to the ./data/output/NE3065.txt file.
year day solrad maxC minC precipmm maxF minF GDD GDD.cumulative precipmm.cumulative station
2020 1 8.992 2.2 -5 0 35.96 23 0 0 0 NE3065
2020 2 9.604 5.6 -3.9 0 42.08 24.98 0 0 0 NE3065
2020 3 4.933 5.6 -3.9 0 42.08 24.98 0 0 0 NE3065
2020 4 8.699 3.9 -7.2 0 39.02 19.04 0 0 0 NE3065
2020 5 9.859 6.1 -7.8 0 42.98 17.96 0 0 0 NE3065
2020 6 10.137 7.2 -5 0 44.96 23 0 0 0 NE3065
2020 7 8.754 6.1 -4.4 0 42.98 24.08 0 0 0 NE3065
2020 8 10.121 7.8 -5 0 46.04 23 0 0 0 NE3065
2020 9 9.953 7.2 -5 0 44.96 23 0 0 0 NE3065
2020 10 8.905 7.2 -5 0 44.96 23 0 0 0 NE3065
2020 11 0.416 -3.9 -15.6 2.29 24.98 3.92 0 0 2.29 NE3065
2020 12 10.694 -4.4 -16.1 0 24.08 3.02 0 0 2.29 NE3065
2020 13 1.896 -4.4 -11.1 0.51 24.08 12.02 0 0 2.8 NE3065
2020 14 0.851 0 -7.8 0 32 17.96 0 0 2.8 NE3065
2020 15 11.043 -1.1 -8.9 0 30.02 15.98 0 0 2.8 NE3065
2020 16 10.144 -2.8 -17.2 0 26.96 1.04 0 0 2.8 NE3065
2020 17 10.75 -5.6 -17.2 3.05 21.92 1.04 0 0 5.85 NE3065
Note that there are 11 rows of header data in the input files, so one must set the skip= argument in read.table() to 11, not 10 as was used in the OP.
Enhancing the code
The last line in the anonymous function returns the data frame to the parent environment, resulting in a list of 10 data frames stored in the stationData object. Since we assigned the station name to a column in each data frame, we can combine the data frames into a single data frame for subsequent analysis, using do.call() with rbind() as follows.
combinedData <- do.call(rbind,stationData)
Since this code was run on January 17th, the resulting data frame contains 170 observations, or 17 observations for each of the 10 stations whose data we downloaded.
At this point the data can be analyzed by station, such as finding the average year to date precipitation by station.
> aggregate(precipmm ~ station,combinedData,mean)
station precipmm
1 NE0030 0.01470588
2 NE0050 0.56764706
3 NE0130 0.32882353
4 NE0245 0.25411765
5 NE0320 0.28411765
6 NE0355 1.49411765
7 NE0375 0.55235294
8 NE0420 0.13411765
9 NE3065 0.34411765
10 NE8745 0.47823529
Instead of using base R, you can install the tidyverse library.
https://www.tidyverse.org/
With it you can load the link into a data frame as
TSV (tab-separated values) using the read_tsv() function.
dataframe <- read_tsv(url("http://some.where.net/"))
Then create a loop in R and do the calculations:
something <- c('link1','link2') # vector of links in R
for(i in something){
  # read link i and do your calculations here
}
At the end, you save the data frame to a file using
write_csv(dataframe, file = "c:\\myname\\yourfile.csv")
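Putting those pieces together, a minimal sketch might look like this (the links and output file names are placeholders, and it assumes each link returns tab-separated data):
library(readr) # part of the tidyverse

links    <- c('link1', 'link2')         # placeholder URLs
outfiles <- c('file1.csv', 'file2.csv') # placeholder output names

for (i in seq_along(links)) {
  dataframe <- read_tsv(links[i])          # read each link into a data frame
  # ... add fahrenheit, GDD, and other calculations here ...
  write_csv(dataframe, file = outfiles[i]) # save each result separately
}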
Related
Hello everyone, I was hoping I could get some help with this issue:
I have a shapefile with 2347 features that correspond to 3172 units. Perhaps when the original file was created there were some duplicated geometries, and they decided to arrange them like this:
Feature gis_id
1 "1"
2 "2"
3 "3,4,5"
4 "6,8"
5 "7"
6 "9,10,13"
... and so on, up to the 3172 units and 2347 features.
On the other hand, my data table has 72956 observations (about 16 columns) with data corresponding to the gis_id from the shapefile. However, this table has a unique gis_id per observation:
head(hru_ls)
jday mon day yr unit gis_id name sedyld tha sedorgn kgha sedorgp kgha surqno3 kgha lat3no3 kgha
1 365 12 31 1993 1 1 hru0001 0.065 0.861 0.171 0.095 0
2 365 12 31 1993 2 2 hru0002 0.111 1.423 0.122 0.233 0
3 365 12 31 1993 3 3 hru0003 0.024 0.186 0.016 0.071 0
4 365 12 31 1993 4 4 hru0004 6.686 16.298 1.040 0.012 0
5 365 12 31 1993 5 5 hru0005 37.220 114.683 6.740 0.191 0
6 365 12 31 1993 6 6 hru0006 6.597 30.949 1.856 0.021 0
surqsolp kgha usle tons sedmin ---- tileno3 ----
1 0.137 0 0.010 0
2 0.041 0 0.009 0
3 0.014 0 0.001 0
4 0.000 0 0.175 0
5 0.000 0 0.700 0
6 0.000 0 0.227 0
There are multiple records for each unit (20 years of data).
I would like to merge the geometry data of my shapefile into my data table. I've done this before with sp::merge, I think, but with a shapefile that did not have multiple ids per geometry/feature.
Is there a way to condition the merging so that each observation in the data table gets the corresponding geometry according to whether its gis_id appears among the values in the gis_id field of the shapefile?
This is a very intriguing question, so I gave it a shot. My answer is probably not the quickest or most concise way of going about this, but it works (at least for your sample data). Notice that this approach is fairly sensitive to the formatting of the data in shapefile$gis_id (see regex).
# your spatial data
shapefile <- data.frame(feature = 1:6, gis_id = c("1", "2", "3,4,5", "6,8", "7", "9,10,13"))
# your tabular data
hru_ls <- data.frame(unit = 1:6, gis_id = paste(1:6))
# loop over all gis_ids in your tabular data
# perhaps this could be vectorized?
gis_ids <- unique(hru_ls$gis_id)
for(id in gis_ids){
  # Define regex to match id as a whole item in the comma-separated list
  id_regex <- paste0("(,|^)", id, "(,|$)")
  # Get the row in shapefile whose gis_id list contains id
  # (grepl() is vectorized over its x argument, so no lapply() is needed)
  rowmatch <- which(grepl(pattern = id_regex, x = shapefile$gis_id))
  # Return the shapefile feature id that matches the tabular gis_id
  hru_ls[hru_ls$gis_id == id, "gis_feature_id"] <- shapefile[rowmatch, "feature"]
}
Since you didn't provide the geometry fields in your question, I just matched on Feature in your spatial data. You could either add an additional step that merges based on Feature, or replace "feature" in shapefile[rowmatch, "feature"] with your geometry fields.
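For instance, the additional merge step could look something like this (a sketch, assuming shapefile also carries the geometry columns and behaves like a data frame):
# join the shapefile columns (including geometry) onto the table
# via the feature id recovered above
merged <- merge(hru_ls, shapefile, by.x = "gis_feature_id", by.y = "feature")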
I have around 300-500 CSV files with some character information at the beginning and two columns of numeric data. I want to make one data.frame with all the numeric values, in such a way that I have column X once with multiple Y columns, without the character rows.
**File1** has two columns and more than a thousand rows; an example looks like
info info
info info
info info
X Y
1 50.3
2 56.2
3 96.5
4 56.4
5 65.2
info 0
**File2**
info info
info info
info info
X Y
1 46.3
2 65.2
3 21.6
4 98.2
5 25.3
info 0
Only the Y values change from file to file. I want to combine all the files into one file, keeping only the numeric rows, and make a data frame. This is what I want as a data frame:
X Y1 Y2
1 46.3 50.3
2 65.2 56.2
3 21.6 96.5
4 98.2 56.4
5 25.3 65.2
I tried
files <- list.files(pattern="*.csv")
l <- list()
for (i in 1:length(files)){
  l[[i]] <- read.csv(files[i], skip = 3)
}
data.frame(l)
This gives me
X1 Y1 X2 Y2
1 46.3 1 50.3
2 65.2 2 56.2
3 21.6 3 96.5
4 98.2 4 56.4
5 25.3 5 65.2
info 0 info 0
How can I skip the last row and keep column X only once as the first column (since the X values do not change)?
Define a function Read that reads one file removing all lines that do not start with a digit. We use sep="," to specify that the fields in each file are comma separated. Then use Read with read.zoo to read and merge the files giving zoo object z. Finally either use it as a zoo object or convert it to a data frame as shown.
library(zoo)
Read <- function(f) {
  read.table(text = grep("^\\d", readLines(f), value = TRUE), sep = ",")
}
z <- read.zoo(Sys.glob("*.csv"), read = Read)
DF <- fortify.zoo(z)
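To see what Read() keeps, the line filter can be tested on its own (a quick check; substitute one of your actual file names):
# keeps only the lines that start with a digit, dropping the info
# headers and the trailing "info 0" row
grep("^\\d", readLines("File1.csv"), value = TRUE)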
I am trying to open and clean a massive oceanographic dataset in R, where station information is interspersed as headers in between the chunks of observations:
$
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999.0 -9 -9 -9 -9 4868.8 2017 0 7114
2.0 6.0297 35.0199 34.4101 2.0 11111
3.0 6.0279 35.0201 34.4091 3.0 11111
4.0 6.0272 35.0203 34.4091 4.0 11111
5.0 6.0273 35.0204 34.4097 4.9 11111
6.0 6.0274 35.0205 34.4104 5.9 11111
$
2008 1 777 8 17 12 7 25 78.4738 8.3510 27 6 4.1 -999.0 3 7 2 0 4903.8 1570 0 7114
3.0 6.4129 34.5637 34.3541 3.0 11111
4.0 6.4349 34.5748 34.3844 4.0 11111
5.0 6.4803 34.5932 34.4426 4.9 11111
6.0 6.4139 34.5624 34.3552 5.9 11111
7.0 6.5079 34.6097 34.4834 6.9 11111
Each $ is followed by a row containing station data (e.g. year, ..., lat, lon, date, time), then follow several rows containing the observations sampled at that station (e.g. depth, temperature, salinity, etc.).
I would like to add the station data to the observations, so that each variable is a column and each observation is a row, like this:
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 2 6.0297 35.0199 34.4101 2 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 3 6.0279 35.0201 34.4091 3 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 4 6.0272 35.0203 34.4091 4 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 5 6.0273 35.0204 34.4097 4.9 11111
2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999 6 6.0274 35.0205 34.4104 5.9 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 3 6.4129 34.5637 34.3541 3 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 4 6.4349 34.5748 34.3844 4 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 5 6.4803 34.5932 34.4426 4.9 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 6 6.4139 34.5624 34.3552 5.9 11111
2008 1 777 8 17 12 7 25 78.4738 8.351 27 6 4.1 -999 7 6.5079 34.6097 34.4834 6.9 11111
This is simpler and only depends on base R. I assume that you have read the text file with x <- readLines(....) first:
start <- which(x == "$") + 1 # Find header indices
rows <- diff(c(start, length(x)+2)) - 2 # Find number of lines per group
# Function to read header and rows and cbind
getdata <- function(begin, end) {
cbind(read.table(text=x[begin]), read.table(text=x[(begin+1):(begin+end)]))
}
dta.list <- lapply(1:(length(start)), function(i) getdata(start[i], rows[i]))
dta.df <- do.call(rbind, dta.list)
This works with the two groups you included in your post. You will need to fix the column names since V1 - V6 are repeated at the beginning and end.
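One quick way to de-duplicate the repeated names before assigning real ones (a sketch using base R's make.unique()):
# turns the second occurrence of V1-V6 into V1.1-V6.1 so that
# every column name is unique
names(dta.df) <- make.unique(names(dta.df))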
This solution is pretty involved, and rests on knowledge of several Tidyverse libraries and features. I'm not sure how robust it is for your needs, but it does do okay with the sample you posted. But the approach of folding blocks, creating functions to parse the smaller blocks, and then unfolding the results I think will serve you well.
The first piece involves finding the '$' markers, grouping following lines together, and then "nesting" the block of data together. Then we have a data frame that has only a few rows - one for each section.
library(tidyverse)
txt_lns <- readLines("ocean-sample.txt")
txt <- tibble(txt = txt_lns)
# Start by finding new sections, and nesting the data
nested_txt <- txt %>%
  mutate(row_number = row_number()) %>%
  mutate(new_section = str_detect(txt, "\\$")) %>%           # Mark new sections
  mutate(starting = ifelse(new_section, row_number, NA)) %>% # Index with row num
  tidyr::fill(starting) %>%                 # Fill index down where missing
  select(-new_section) %>%                  # Clean up
  filter(!str_detect(txt, "\\$")) %>%
  nest(data = c(txt, row_number))           # "Nest" the data
# Take a quick look
nested_txt
Then, we need to be able to deal with those nested blocks. The routines here parse those blocks by identifying header rows, and then separating the fields into dataframes of their own. Here, we have different logic for header rows vs. the shorter lesser rows.
# Deal with the records within a section
parse_inner_block <- function(x, header_ind) {
if (header_ind) {
df <- x %>%
mutate(txt = str_trim(txt)) %>%
# Separate the header row into 22 variables
separate(txt, into = LETTERS[1:22], sep = "\\s+")
} else {
df <- x %>%
mutate(txt = str_trim(txt)) %>%
# Separate the lesser rows into 6 variables
separate(txt, into = letters[1:6], sep = "\\s+")
}
return(df)
}
parse_outer_block <- function(x) {
df <- x %>%
# Determine if it's a header row with 22 variables or lesser row with 6
mutate(leading_row = (row_number == min(row_number))) %>%
# Fold by header row vs. not
nest(data = c(txt, row_number)) %>%
# Create data frames for both header and lesser rows
mutate(processed = purrr::map2(data, leading_row, parse_inner_block)) %>%
unnest(processed) %>%
# Copy header row values to lesser rows
tidyr::fill(A:V) %>%
# Drop header row
filter(!leading_row)
return(df)
}
And then we can put it all together -- starting with our nested data, processing each block, unnesting the fields that came back, and prepping the full output.
# Actually put all this together and generate an output dataframe
output <- nested_txt %>%
mutate(proc_out = purrr::map(data, parse_outer_block)) %>%
select(-data) %>%
unnest(proc_out) %>%
select(-starting, -leading_row, -data, -row_number)
output
Hope it helps. I'd recommend looking at some purrr tutorials as well for some similar problems.
My dataset looks like this:
Year Risk Resource Utilization Band Percent
2014 0 .25
2014 1 .19
2014 2 .17
2014 3 .31
2014 4 .06
2014 5 .01
2015 0 .23
2015 1 .21
2015 2 .19
2015 3 .31
2015 4 .06
2015 5 .31
I am attempting to compare the percentage change from year to year in my dataset. For example, band 0 decreased 2% from 2014 to 2015. So far, I have created a loop that puts each year into bins and runs the calculation. The issue I am having is that the loop indexes each result as 1, so I have a bunch of repeating 1s next to my calculations. Here is the code I have been using; any help is much appreciated.
Results.data <- data.frame()
head(data)
percent <- 0
baseyear <- 0
nextyear <- 0
bin <- 0
yearPlus1 <-0
bin2 <-0
percent1 <-0
percent2 <-0
percentDif <-0
for(i in 1:nrow(data))
{
percent[i] <- data$PERCENT[i]
baseyear[i] <- as.numeric(data$YEAR_RISK[i])
bin[i] <- as.numeric(data$RESOURCE_UTILIZATION_BAND[i])
#print(percent[i])
#print(baseyear[i])
#print(bin[i])
}
for (k in 1:nrow(data))
{
for (j in 1:nrow(data))
{
yearPlus1 <- as.numeric(baseyear[j])-1
firstYear <- as.numeric(baseyear[k])
bin2 <-bin[j]
bin1 <- bin[k]
percent1 <- as.numeric(percent[k])
percent2 <- as.numeric(percent[j])
if(firstYear==yearPlus1 && bin1==bin2)
{
percentDif <- percent2 - percent1
print(percentDif)
Results.data <- rbind(Results.data, c(percentDif))
}
}
}
If I understand your question, you can use grouping and vectorization to avoid loops. Here's an example using the dplyr package.
The code below first sorts by Year_Risk so that the data are ordered properly by time. Then we group by Resource_Utilization_Band so that we can get results separately for each level of Resource_Utilization_Band. Finally, we calculate the difference in Percent from year to year. The lag function returns the previous value in a sequence. (Instead of lag, we could have done Change = c(NA, diff(Percent)) as well.) All of these operations are chained one after the other using the dplyr chaining operator (%>%).
(Note that when I imported your data, I also changed your column names by adding underscores to make them legal R column names.)
library(dplyr)
# Year-over-year change within each Resource_Utilization_Band
# (Assuming your starting data frame is called "dat")
dat %>% arrange(Year_Risk) %>%
group_by(Resource_Utilization_Band) %>%
mutate(Change = Percent - lag(Percent))
Year_Risk Resource_Utilization_Band Percent Change
1 2014 0 0.25 NA
2 2014 1 0.19 NA
3 2014 2 0.17 NA
4 2014 3 0.31 NA
5 2014 4 0.06 NA
6 2014 5 0.01 NA
7 2015 0 0.23 -0.02
8 2015 1 0.21 0.02
9 2015 2 0.19 0.02
10 2015 3 0.31 0.00
11 2015 4 0.06 0.00
12 2015 5 0.31 0.30
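For reference, the diff() variant mentioned above produces the same Change column:
dat %>% arrange(Year_Risk) %>%
  group_by(Resource_Utilization_Band) %>%
  mutate(Change = c(NA, diff(Percent)))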
I'm relatively new to R, and I'm trying to build a function that will loop through the columns of an imported table and produce an output consisting of the means and 95% confidence intervals. Ideally it should be possible to bootstrap columns with different sample sizes, but first I would like to get the iteration working. I have something that sort of works, but I can't get it all the way there. This is what the code looks like, with the sample data and output included:
#cdata<-read.csv(file.choose(),header=T)#read data from selected file, works, commented out because data is provided below
#cdata #check imported data
#Sample Data
# WALL NRPK CISC WHSC LKWH YLPR
#1 21 8 1 2 2 5
#2 57 9 3 1 0 1
#3 45 6 9 1 2 0
#4 17 10 2 0 3 0
#5 33 2 4 0 0 0
#6 41 4 13 1 0 0
#7 21 4 7 1 0 0
#8 32 7 1 7 6 0
#9 9 7 0 5 1 0
#10 9 4 1 0 0 0
x<-cdata[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant species
i<-nrow(x) #count number of rows for bootstrapping
g<-ncol(x) #count number of columns for iteration
#build bootstrapping function, this works for the first column but doesn't iterate
bootfun <- function(bootdata, reps) {
boot <- function(bootdata){
s1=sample(bootdata, size=i, replace=TRUE)
ms1=mean(s1)
return(ms1)
} # a single bootstrap
bootrep <- replicate(n=reps, boot(bootdata))
return(bootrep)
} #replicates bootstrap of "bootdata" "reps" number of times and outputs vector of results
cvr1 <- bootfun(x$YLPR,50000) #have unsuccessfully tried iterating the location various ways (i.e. x[i])
cvrquantile<-quantile(cvr1,c(0.025,0.975))
cvrmean<-mean(cvr1)
vec<-c(cvrmean,cvrquantile) #puts results into a suitable form for output
vecr<-sapply(vec,round,1) #rounds results
vecr
2.5% 97.5%
28.5 19.4 38.1
#apply(x[1:g],2,bootfun) ##doesn't work in this case
#desired output:
#Species Mean LowerCI UpperCI
#WALL 28.5 19.4 38.1
#NRPK 6.1 4.6 7.6
#YLPR 0.6 0.0 1.6
I've also tried this using the boot package, and it works beautifully to iterate through the means but I can't get it to do the same with the confidence intervals. The "ordinary" code above also has the advantage that you can easily retrieve the bootstrapping results, which might be used for other calculations. For the sake of completeness here is the boot code:
#Bootstrapping using boot package
library(boot)
#data<-read.csv(file.choose(),header=TRUE) #read data from selected file
#x<-data[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant columns
#x #check data
#Sample Data
# WALL NRPK LKWH YLPR
#1 21 8 2 5
#2 57 9 0 1
#3 45 6 2 0
#4 17 10 3 0
#5 33 2 0 0
#6 41 4 0 0
#7 21 4 0 0
#8 32 7 6 0
#9 9 7 1 0
#10 9 4 0 0
i<-nrow(x) #count number of rows for resampling
g<-ncol(x) #count number of columns to step through with bootstrapping
boot.mean<-function(x,i){boot.mean<-mean(x[i])} #bootstrapping function to get the mean
z<-boot(x, boot.mean,R=50000) #bootstrapping function, uses mean and number of reps
boot.ci(z,type="perc") #derive 95% confidence intervals
apply(x[1:g],2, boot.mean) #bootstrap all columns
#output:
#WALL NRPK LKWH YLPR
#28.5 6.1 1.4 0.6
I've gone through all of the resources I can find and can't seem to get things working. What I would like for output would be the bootstrapped means with the associated confidence intervals for each column. Thanks!
Note: apply(x[1:g],2, boot.mean) #bootstrap all columns doesn't do any bootstrap. You are simply calculating the mean for each column.
For bootstrap mean and confidence interval, try this:
apply(x, 2, function(y){
  b <- boot(y, boot.mean, R = 50000)
  c(mean(b$t), boot.ci(b, type = "perc", conf = 0.95)$percent[4:5])
})
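The result is a matrix with one row each for the mean, lower bound, and upper bound, and one column per species. If you want the tabular layout from your question, one way (a sketch, with res holding the result of the apply() call above) is:
res <- apply(x, 2, function(y){
  b <- boot(y, boot.mean, R = 50000)
  c(mean(b$t), boot.ci(b, type = "perc", conf = 0.95)$percent[4:5])
})
# reshape into the desired Species/Mean/LowerCI/UpperCI table
data.frame(Species = colnames(res),
           Mean    = round(res[1, ], 1),
           LowerCI = round(res[2, ], 1),
           UpperCI = round(res[3, ], 1),
           row.names = NULL)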