R programming (beginner): Combining two lists -> dataframe -> csv

I tried to combine two lists into one dataframe:
all_stas <- list()
for(i in vid_id){
stas <- get_stats(video_id = i)
all_stas <- rbind(all_stas,stas)
}
View(all_stas)
all_detail <- list()
for(i in vid_id){
detail1 <- get_video_details(video_id = i)
all_detail <- rbind(all_detail,detail1)
}
View(all_detail)
df <- data.frame(all_stas,all_detail)
write.csv(df, file = "new_file.csv")
Afterwards I would like to write it to a CSV file.
When I run the code, I get the following warning message:
Warning message:
In rbind(all_stas, stas) :
number of columns of result is not a multiple of vector length (arg 2)
Does anyone of you know how I can make the code work?

The block below is what triggers the warning:
all_stas <- list()
for(i in vid_id){
stas <- get_stats(video_id = i)
all_stas <- rbind(all_stas,stas)}
If I understand your code correctly, you can avoid it by storing each result as a list element instead of calling rbind on a list:
all_stas <- list()
for(i in vid_id){
all_stas[[i]] <- get_stats(video_id = i)}
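Once every result sits in a list, the lists still have to be turned into one data frame before writing the CSV. The following is a minimal sketch of that step, not the asker's actual pipeline: fake_stats() and fake_details() are hypothetical stand-ins for get_stats() and get_video_details(), and it assumes each call returns something that can be coerced to a one-row data frame with a shared id column.

```r
# Hypothetical stand-ins for get_stats() / get_video_details();
# each returns a one-row data frame keyed by the video id.
fake_stats <- function(video_id) {
  data.frame(id = video_id, views = nchar(video_id) * 10,
             stringsAsFactors = FALSE)
}
fake_details <- function(video_id) {
  data.frame(id = video_id, title = paste("Video", video_id),
             stringsAsFactors = FALSE)
}

vid_id <- c("abc", "defg", "hi")

# Collect one result per id in a list, then bind the rows once at the end
all_stas   <- lapply(vid_id, fake_stats)
all_detail <- lapply(vid_id, fake_details)
stats_df   <- do.call(rbind, all_stas)
detail_df  <- do.call(rbind, all_detail)

# Join the two tables on the shared id column and write the CSV
df <- merge(stats_df, detail_df, by = "id")
out_file <- tempfile(fileext = ".csv")
write.csv(df, file = out_file, row.names = FALSE)
```

If the real functions return plain lists rather than data frames, each element may need as.data.frame() before the rbind step.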

Related

Variable length differ in R

I'm running ANOVA tests on a dataset with multiple columns, and I'm trying to loop over the columns to make things easier, but I always get the same error: "variable lengths differ".
Here is my code for the loop:
for(i in 5:125){
WL<- colnames(NB[i])
model <- lm(WL ~ Treatment , data = NB)
if(!exists("aovNB")){
aovNB<-anova(model)
}
if(exists("aovNB")){
aovNB <- rbind(aovNB,anova(model))
}
}
I'm wondering whether it is possible to store the column names in the WL variable this way, so that I can use it to read the multiple columns I have.
Thanks to anyone who can solve this. I'm using base R.
Use reformulate/as.formula to create a formula from strings. Also, instead of rbinding the results in a loop, store them in a list.
cols <- colnames(NB)[5:125]
result <- vector('list', length(cols))
for(i in seq_along(cols)){
model <- lm(reformulate('Treatment', cols[i]) , data = NB)
result[[i]] <- anova(model)
}
If needed, you can combine them using result <- do.call(rbind, result).
We may also do this with paste and as.formula:
cols <- colnames(NB)[5:125]
result <- vector('list', length(cols))
for(i in seq_along(cols)) {
result[[i]] <- anova(lm(as.formula(paste(cols[i], '~ Treatment')), data = NB))
}
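The original loop fails because WL holds a character string of length one, so lm() treats it as a one-element variable and compares its length with Treatment. A small self-contained sketch of the formula-building approach follows. The toy NB data frame and its column names (w400, w410) are made up for illustration and are not the asker's data.

```r
set.seed(1)

# Toy data: one treatment factor and two hypothetical response columns
NB <- data.frame(
  Treatment = factor(rep(c("A", "B", "C"), each = 5)),
  w400 = rnorm(15),
  w410 = rnorm(15)
)

cols <- setdiff(colnames(NB), "Treatment")

# Build "w400 ~ Treatment", "w410 ~ Treatment", ... from the column names
result <- lapply(cols, function(col) {
  anova(lm(reformulate("Treatment", response = col), data = NB))
})
names(result) <- cols

# Optional: stack all ANOVA tables into one
combined <- do.call(rbind, result)
```

Naming the list elements after the columns makes it easy to look up a single table later, for example result[["w400"]].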

How to use a for loop with multiple results

I have to automate this sequence of functions:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
dist_ao_i <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
As a result, I want a "dist_ao" for each i. The indexed values are to be found in the isic columns of the WBES_sf_angola and the FDI_angola datasets.
How can I embed the index in the various items' names?
EDIT:
I tried with following modification:
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
WBES_sf_angola_i <- subset(WBES_sf_angola, isic == i)
WBES_angola_i <- as_Spatial(WBES_sf_angola_i)
FDI_angola_i <- FDI_angola[FDI_angola$isic==i,]
result_list <- list()
result_list[[paste0("dist_ao_", i)]] <- distm(WBES_angola_i,FDI_angola_i, fun = distGeo)/1000
rm(WBES_sf_angola_i,WBES_angola_i,FDI_angola_i)
}
and the output is just a list of 1 that contains dist_ao_62. How do I avoid overwriting?
Untested (due to missing MRE) but should work:
result_list <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
result_list[[paste0("dist_ao_", i)]] <- distm(as_Spatial(subset(WBES_sf_angola, isic == i)) , FDI_angola[FDI_angola$isic==i,], fun = distGeo)/1000
}
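The attempt in the question ends up with a single element because result_list <- list() sits inside the loop and resets the list on every iteration. The sketch below shows the same pattern on a toy data frame so it can be run without the spatial packages. The toy data and the sum() stand-in for distm() are assumptions made purely for illustration.

```r
# Toy stand-in for the real datasets; only the isic column matters here
toy <- data.frame(isic = c(15, 15, 17, 20, 20, 20), value = 1:6)
codes <- c(15, 17, 20)

result_list <- list()  # created once, before the loop
for (i in codes) {
  # sum() stands in for the distm() call in the real code
  result_list[[paste0("dist_ao_", i)]] <- sum(toy$value[toy$isic == i])
}

names(result_list)
```

Each result is then available by name, for example result_list[["dist_ao_20"]].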
You could approach it this way. All intermediate results are stored in lists, and the last line of the code converts the list of subsets into a single data frame. NOTE: since the example is not reproducible, I have mostly taken the loop body from your question.
WBES_sf_angola_result <- list() # renamed, since WBES_sf_angola is the name of your input dataset
WBES_angola_result <- list()
FDI_angola_result <- list() # renamed for the same reason: FDI_angola is an input dataset
dist_ao <- list()
for (i in c(15,17,20,24,25,26,27,28,29,45,50,52,55,60,62)) {
key <- paste0("i_", i) # list name for this isic code
WBES_sf_angola_result[[key]] <- subset(WBES_sf_angola, isic == i)
WBES_angola_result[[key]] <- as_Spatial(WBES_sf_angola_result[[key]])
FDI_angola_result[[key]] <- FDI_angola[FDI_angola$isic==i,]
dist_ao[[key]] <- distm(WBES_angola_result[[key]], FDI_angola_result[[key]], fun = distGeo)/1000
}
Each subset can be accessed by name in its list, e.g.
WBES_sf_angola_result[["i_15"]] # for the first item
Once the individual subsets are no longer needed, they can be combined:
WBES_sf_angola_result <- do.call(rbind, WBES_sf_angola_result) # to get a dataframe

Read many files in parallel and extract data

I have 1000 JSON files that I would like to read in parallel. I have 4 CPU cores.
I have a character vector that holds the names of all the files:
cik_files <- list.files("./data/", pattern = ".json")
Using this vector, I load each file, extract the data, and add it to the following list:
data <- list()
Below is the code for extracting the data:
for(i in 1:1000){
data1 <- fromJSON(paste0("./data/", cik_files[i]), flatten = TRUE)
if(("NetIncomeLoss" %in% names(data1$facts$`us-gaap`))){
data1 <- data1$facts$`us-gaap`$NetIncomeLoss$units$USD
data1 <- data1[grep("CY20[0-9]{2}$", data1$frame), c(3, 9)]
try({if(nrow(data1) > 0){
data1$cik <- strtrim(cik_files[i], 13)
data[[length(data) + 1]] <- data1
}}, silent = TRUE)
}
}
This, however, takes quite a lot of time, so I was wondering how I can run the body of the for loop in parallel.
Thanks in advance.
Here is an attempt to solve the problem in the question. Untested, since there is no data.
Step 1
First of all, rewrite the loop in the question as a function.
f <- function(i, path = "./data", cik_files){
filename <- file.path(path, cik_files[i])
data1 <- fromJSON(filename, flatten = TRUE)
if(("NetIncomeLoss" %in% names(data1$facts$`us-gaap`))){
data1 <- data1$facts$`us-gaap`$NetIncomeLoss$units$USD
found <- grep("CY20[0-9]{2}$", data1$frame)
if(length(found) > 0){
tryCatch({
out <- data1[found, c(3, 9)]
out$cik <- strtrim(cik_files[i], 13)
out
},
error = function(e) e,
warning = function(w) w)
} else NULL
} else NULL
}
Step 2
Now load the package parallel and run one of the following, depending on OS.
library(parallel)
library(jsonlite)
# Not on Windows
json_list <- mclapply(seq_along(cik_files), f, cik_files = cik_files)
# Windows
ncores <- detectCores()
cl <- makeCluster(ncores - 1L)
clusterExport(cl, "cik_files")
clusterEvalQ(cl, library(jsonlite)) # load jsonlite on every worker
json_list <- parLapply(cl, seq_along(cik_files), f, cik_files = cik_files)
stopCluster(cl)
Step 3
Extract the data from the returned list json_list.
err <- sapply(json_list, inherits, "error")
warn <- sapply(json_list, inherits, "warning")
ok <- !(err | warn)
json_list[ok] # correctly read in
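The filtering in Step 3 can be tried without any JSON files. The sketch below builds a small hand-made json_list that mimics what f might return (data frames, NULL for files without matching data, and condition objects from tryCatch), then combines the good entries into one data frame. The values and cik strings are invented for illustration.

```r
# Hand-made stand-in for the list returned by mclapply()/parLapply()
json_list <- list(
  data.frame(val = 1, frame = "CY2019", cik = "CIK0000000001"),
  NULL,                                   # file without NetIncomeLoss
  simpleError("could not parse file"),    # returned by the error handler
  data.frame(val = 2, frame = "CY2020", cik = "CIK0000000002"),
  simpleWarning("incomplete final line")  # returned by the warning handler
)

err  <- vapply(json_list, inherits, logical(1), what = "error")
warn <- vapply(json_list, inherits, logical(1), what = "warning")
ok   <- !(err | warn)

# Drop the NULL entries as well, then stack the remaining data frames
keep <- ok & !vapply(json_list, is.null, logical(1))
combined <- do.call(rbind, json_list[keep])
```

Keeping err and warn around also makes it possible to inspect which files failed, for example with which(err).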

Replace a Polygon in a Spatial Polygon Data Frame

I have a spatial polygon dataframe with several shapefiles. I would like to trim these shapefiles by elevation and replace the original shapefile in the dataframe. However, I get an error when I try to replace the polygon after the trim. Currently, my plan is to run through the following loop for each shapefile in the dataset. However, when I try dist[i,] <- temp3, I get the following error:
Error in match(value, lx) : 'match' requires vector arguments
In addition: Warning message:
In checkNames(value) :
attempt to set invalid names: this may lead to problems later on. See ?make.names
Any suggestions?
# Load spdf
dist <- rgdal::readOGR('critterDistributions.shp');
# Load elevational ranges
rangeElevation <- read.csv(file = 'elevationRanges.csv');
# Load altitude data
elevation <- raster('ETOPO1_Bed_g_geotiff.tif');
# Tidy up CRSes
crs(elevation) <- crs(dist);
# Run loop
for (i in 1:length(dist)){
subjName <- as.character(dist@data$Species[i]);
if (!(subjName %in% rangeElevation$ï..Species_name)){
message(subjName, ' does not exist in the elevational range database.');
}
else{
erNameMatch <- match(subjName, rangeElevation$ï..Species_name);
temp <- raster::reclassify(elevation, rcl = c(-Inf,rangeElevation[erNameMatch,2],NA,
rangeElevation[erNameMatch,2],rangeElevation[erNameMatch,3],1,
rangeElevation[erNameMatch,3],Inf,NA));
temp2 <- dist[i,];
temp <- mask(temp, temp2);
temp <- crop(temp, temp2);
temp3 <- rasterToPolygons(temp, na.rm = T, dissolve = T);
names(temp3) <- make.names(names(temp2), unique = T);
temp3@data <- temp2@data;
dist[i,] <- temp3; # <<<< This is the line of code that doesn't work.
}
}
Upon further thought, I came up with a workaround: initiating a list, then using rbind after the loop to unite everything back together into a single object. I'm still interested in finding out why dist[i,] <- temp3 doesn't work, but at least I was able to get this job done.
oneSPDFtoRuleThemAll <- vector(mode = "list", length = length(dist));
for (i in 1:length(dist)){
subjName <- as.character(dist@data$Species[i]);
if (!(subjName %in% rangeElevation$ï..Species_name)){
message(subjName, ' does not exist in the elevational range database.');
}
else{
erNameMatch <- match(subjName, rangeElevation$ï..Species_name);
temp <- raster::reclassify(elevation, rcl = c(-Inf,rangeElevation[erNameMatch,2],NA,
rangeElevation[erNameMatch,2],rangeElevation[erNameMatch,3],1,
rangeElevation[erNameMatch,3],Inf,NA));
temp2 <- dist[i,];
temp <- mask(temp, temp2);
temp <- crop(temp, temp2);
temp3 <- rasterToPolygons(temp, na.rm = T, dissolve = T);
names(temp3) <- make.names(names(temp2), unique = T);
temp3@data <- temp2@data;
oneSPDFtoRuleThemAll[[i]] <- temp3; # store the result in the list instead of assigning into dist
}
}
finalSPDF <- rbind(unlist(oneSPDFtoRuleThemAll));

How to avoid writing the same line several times in R?

I'm writing a program in R and I need to select rows based on a particular value of one of the variables. The program is as follows:
a1961 <- base[base[,5]==1961,]
a1962 <- base[base[,5]==1962,]
a1963 <- base[base[,5]==1963,]
a1964 <- base[base[,5]==1964,]
a1965 <- base[base[,5]==1965,]
a1966 <- base[base[,5]==1966,]
a1967 <- base[base[,5]==1967,]
a1968 <- base[base[,5]==1968,]
a1969 <- base[base[,5]==1969,]
a1970 <- base[base[,5]==1970,]
a1971 <- base[base[,5]==1971,]
a1972 <- base[base[,5]==1972,]
a1973 <- base[base[,5]==1973,]
a1974 <- base[base[,5]==1974,]
a1975 <- base[base[,5]==1975,]
a1976 <- base[base[,5]==1976,]
a1977 <- base[base[,5]==1977,]
a1978 <- base[base[,5]==1978,]
a1979 <- base[base[,5]==1979,]
a1980 <- base[base[,5]==1980,]
a1981 <- base[base[,5]==1981,]
a1982 <- base[base[,5]==1982,]
a1983 <- base[base[,5]==1983,]
a1984 <- base[base[,5]==1984,]
a1985 <- base[base[,5]==1985,]
a1986 <- base[base[,5]==1986,]
a1987 <- base[base[,5]==1987,]
a1988 <- base[base[,5]==1988,]
a1989 <- base[base[,5]==1989,]
...
a2012 <- base[base[,5]==2012,]
Is there a way (like modules in SAS) in which I can avoid writing the same thing over and over again?
In general, coding/implementation questions really belong on StackOverflow. That said, my recommendation is instead of naming individual variables for each result, just throw them all into a list:
a = lapply(1961:1989, function(x) base[base[,5]==x,])
You can also use the assign command.
years <- 1961:2012
for(i in 1:length(years)) {
assign(x = paste0("a", years[i]), value = base[base[,5]==years[i],])
}
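A further option, not mentioned in the answers above, is base R's split(), which creates one data frame per distinct value in a single call. The sketch below uses a made-up base_df with a named year column. In the question the year is in column 5, so base[, 5] would take the place of base_df$year.

```r
# Toy data frame; the column names are assumptions for illustration
base_df <- data.frame(
  id   = 1:6,
  year = c(1961, 1961, 1962, 1963, 1963, 1963)
)

# One data frame per year, returned as a named list
by_year <- split(base_df, base_df$year)

# Optional: prefix the names so they match the a1961, a1962, ... scheme
names(by_year) <- paste0("a", names(by_year))

by_year[["a1963"]]
```

Keeping the pieces in one list, rather than creating dozens of variables with assign(), makes it straightforward to loop over them later with lapply().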

Resources