Using API in R and converting to dataframe format - r

I'm basically trying to call an API to retrieve weather information from a government website.
library(data.table)
library(jsonlite)
library(httr)
base<-"https://api.data.gov.sg/v1/environment/rainfall"
date1<-"2020-01-25"
call1<-paste(base,"?","date","=",date1,sep="")
get_rainfall<-GET(call1)
get_rainfall_text<-content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.frame(get_rainfall_json)
I'm getting an error
"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 52, 287, 1"
I'm not sure how to resolve this. I'm trying to convert the retrieved data into a data frame so I can make sense of the readings.

Your "get_rainfall_json" object comes back as a list whose elements have different lengths (52, 287 and 1 in the error message). Trying to turn the whole list into a data frame is what triggers the error. If you convert only the "items" element of the list, the error goes away. (The result still has nested data inside its columns, so you'll have to parse that into the format you're interested in.)
get_rainfall_df <- as.data.frame(get_rainfall_json$items)
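To see why the whole list cannot be converted, here is a small offline sketch. The list below is made up for illustration (its element names and lengths are assumptions, not the real API response); it only mimics a parsed response whose top-level elements have different lengths.

```r
# Mock list whose top-level elements have different lengths
# (hypothetical stand-in for the parsed API response).
mock_json <- list(
  metadata = data.frame(station = c("S1", "S2", "S3")),
  items    = data.frame(timestamp = c("t1", "t2")),
  status   = "healthy"
)

# Converting the whole list fails because the lengths disagree.
msg <- tryCatch(
  {
    as.data.frame(mock_json)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Converting a single element works.
items_df <- as.data.frame(mock_json$items)
print(items_df)
```

Running str() on the parsed object is a quick way to find out which element holds the data you actually want.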
Update
To flatten the nested data frames, here is one way you could do it. The loop goes through each row, extracts the list stored in that row, turns it into a data frame, and appends it to "df". You are then left with one final df with all the data in one place.
library(data.table)
library(jsonlite)
library(httr)
library(dplyr)
base <- "https://api.data.gov.sg/v1/environment/rainfall"
date1 <- "2020-01-25"
call1 <- paste(base, "?", "date", "=", date1, sep = "")
get_rainfall <- GET(call1)
get_rainfall_text <- content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.table(get_rainfall_json$items)
df <- data.frame()
for (row in 1:nrow(get_rainfall_df)) {
  new_date <- get_rainfall_df[row, ]$readings[[1]]
  colnames(new_date) <- c("stationid", "value")
  date <- get_rainfall_df[row, ]$timestamp
  new_date$date <- date
  df <- bind_rows(df, new_date)
}
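As a variation on the loop above, the pieces can be collected in a list and bound once at the end, which avoids growing the data frame on every iteration. This is only a sketch on mock data: the `items` object below is a hypothetical stand-in whose shape (a timestamp column plus a list column of readings) is assumed from the code above, and it uses base R only.

```r
# Mock of the nested structure: one row per timestamp, with a
# list column holding a small data frame of readings (assumed shape).
items <- data.frame(
  timestamp = c("2020-01-25T00:05:00", "2020-01-25T00:10:00"),
  stringsAsFactors = FALSE
)
items$readings <- list(
  data.frame(station_id = c("S1", "S2"), value = c(0, 0.2)),
  data.frame(station_id = c("S1", "S2"), value = c(0.4, 0))
)

# Build one data frame per row, then bind them all in a single step.
pieces <- lapply(seq_len(nrow(items)), function(row) {
  readings <- items$readings[[row]]
  names(readings) <- c("stationid", "value")
  readings$date <- items$timestamp[row]
  readings
})
df <- do.call(rbind, pieces)
print(df)
```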

Related

Ordering columns of data in R

I have a CSV file with 141 rows and several columns. I wanted my data to be ordered in ascending order by the first two columns i.e. 'label' and 'index'. Following is my code:
final_data <- read.csv("./features.csv",
                       header = FALSE,
                       col.names = c('label','index', 'nr_pix', 'rows_with_1', 'cols_with_1',
                                     'rows_with_3p', 'cols_with_3p', 'aspect_ratio',
                                     'neigh_1', 'no_neigh_above', 'no_neigh_below',
                                     'no_neigh_left', 'no_neigh_right', 'no_neigh_horiz',
                                     'no_neigh_vert', 'connected_areas', 'eyes', 'custom'))
sorted_data_by_label <- final_data[order(label),]
sorted_data_by_index <- sorted_data_by_label[order(index),]
write.table(sorted_data_by_index, file = "./features.csv",
            append = FALSE, sep = ',',
            row.names = FALSE)
I chose to read from a CSV and use write.table because I need to overwrite the CSV with a version that includes the column names.
Since I added a comma after order(label) and order(index), the sorted data should still include all the other rows and columns, right?
After running this code, I only get the first row out of 141 rows. Is there a way to fix this problem?
As @akrun mentioned briefly, what you need to do is change
sorted_data_by_label <- final_data[order(label),]
to
sorted_data_by_label <- final_data[order(final_data$label),]
and to change
sorted_data_by_index <- sorted_data_by_label[order(index),]
to
sorted_data_by_index <- sorted_data_by_label[order(sorted_data_by_label$index),]
This is because when you write label, R looks for an object named label in the calling environment (usually the global environment), not within the final_data data frame. The same applies to index.
If you intend to use a column of final_data, you need to refer to it explicitly, for example final_data$label or final_data$index.
Other options
You can use with:
sorted_data_by_label <- with(final_data, final_data[order(label),])
sorted_data_by_index <- with(sorted_data_by_label, sorted_data_by_label[order(index),])
In dplyr you can use
sorted_data_by_label <- final_data %>% arrange(label)
sorted_data_by_index <- sorted_data_by_label %>% arrange(index)
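One thing to keep in mind with the two-step versions above: the second sort reorders the rows by index, so label only ends up breaking ties. If the goal is to sort by label first and then by index within each label (an assumption about what the question intends), a single order() call with both keys does that. A small sketch on made-up data, keeping only three of the original columns:

```r
# Small made-up sample with the same first two column names.
final_data <- data.frame(
  label  = c("b", "a", "b", "a"),
  index  = c(2, 2, 1, 1),
  nr_pix = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Sort by label, breaking ties by index, in one call.
sorted_data <- final_data[order(final_data$label, final_data$index), ]
print(sorted_data)
```

The dplyr equivalent would be arrange(label, index) in a single call.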

Error in stream-delim when exporting to CSV

I'm trying to write this StatsBomb Data into a CSV but I keep on getting the following error message:
Error in stream_delim_(df, path, ..., bom = bom, quote_escape = quote_escape) :
Don't know how to handle vector of type list.
I'm lost (tried multiple things) and not sure what I did wrong here. Is there anyone out here who knows how to solve this? I've included my code below.
library(StatsBombR)
library(tidyverse)
### Read in all free events and matches from the FAWSL
data <- StatsBombFreeEvents()
matches <- FreeMatches(Competitions = 72)
### Clean and separate all data loaded above
dataclean <- allclean(data)
### Filter event data to include only FAWSL data.
data1 <- dataclean %>%
  filter(dataclean$competition_id == 72)
### Join event and match data by "match_id"
data1 <- left_join(data1, matches, by = "match_id")
FullData <- data1 %>%
  select(-c(related_events, tactics.lineup, shot.freeze_frame, location, pass.end_location, shot.end_location, goalkeeper.end_location))
setwd()
write_csv(FullData, "StatsBomb_FullData.csv")
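I had the same problem. The error means the data frame still contains at least one list column, which write_csv cannot write. Collapsing the list column into a character column fixed mine.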
df$listcolumn <- sapply(df$listcolumn, function(x) paste0(unlist(x), collapse = "\n"))
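When it is not obvious which columns are lists, the same idea can be applied to every list column at once. The sketch below uses base R on a made-up data frame; the column names and the ";" separator are illustrative choices, not part of the StatsBomb data.

```r
# Made-up data frame with two list columns.
df <- data.frame(id = 1:3)
df$related_events <- list(c("a", "b"), character(0), "c")
df$location <- list(c(1.5, 2), NULL, c(3, 4))

# Find the list columns, then collapse each element to one string.
is_list_col <- vapply(df, is.list, logical(1))
df[is_list_col] <- lapply(df[is_list_col], function(col) {
  vapply(col, function(x) paste(unlist(x), collapse = ";"), character(1))
})

# Every column is now atomic, so it can be written to CSV.
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)
print(df)
```

Running vapply(df, is.list, logical(1)) on its own is also a quick way to check which columns still need handling before the export.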

Key-value pairs with character data to a dataframe in R?

I had been following this guide for how to convert a key-value dictionary to a dataframe in R using rjson, but I can't seem to get it to work with my data:
{"tagid":493,"name":"Early Access","count":75}
{"tagid":599,"name":"Simulation","count":68,"browseable":true}
{"tagid":1755,"name":"Space","count":64,"browseable":true}
Doesn't seem to want to parse into a dataframe because of the character values for the name key:
Error in FUN(X[[i]], ...) : unexpected character '''
I'm using the same code as the example I linked to:
library(rjson)
Lines <- readLines("clipboard")
json_df <- as.data.frame(t(sapply(Lines, fromJSON)))
Is there a way to do a similar conversion of the dictionary data to a data frame here if there's character data?
Edit: the following script is what I'm using to generate the data:
webpage <- read_html("https://store.steampowered.com/app/387290")
data <- html_nodes(webpage, css = "script") %>% html_text()
tag_data <- data[lapply(data,function(x) length(grep("InitAppTagModal",x,value=FALSE))) == 1]
tag_data <- regmatches(tag_data, gregexpr("[?<=\\[].*?[?=\\]]", tag_data, perl=T))[[1]][1]
tag_data <- gsub('[', "", tag_data, fixed = TRUE)
tag_data <- gsub(']', "", tag_data, fixed = TRUE)
tag_data <- gsub("},{", "}\n{", tag_data, fixed = TRUE)
writeLines(tag_data, con = "temp.json", sep = "\n")
tag_df <- stream_in(file("temp.json"))
I put your example data into a JSON file named test.json and read it with stream_in from jsonlite:
library(jsonlite)
myoutput <- stream_in(file("test.json"))
myoutput
  tagid         name count browseable
1   493 Early Access    75         NA
2   599   Simulation    68       TRUE
3  1755        Space    64       TRUE
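For a self-contained version that does not depend on an existing test.json, the three sample records can be written to a temporary file first. This is a sketch of the same stream_in approach; the temporary file and the variable names are illustrative.

```r
library(jsonlite)

# The three sample records from the question, one JSON object per line.
json_lines <- c(
  '{"tagid":493,"name":"Early Access","count":75}',
  '{"tagid":599,"name":"Simulation","count":68,"browseable":true}',
  '{"tagid":1755,"name":"Space","count":64,"browseable":true}'
)

# Write them to a temporary file and read it back as a data frame.
tmp <- tempfile(fileext = ".json")
writeLines(json_lines, tmp)
tag_df <- stream_in(file(tmp), verbose = FALSE)
print(tag_df)
```

Records that lack a key, such as browseable in the first line, get NA in that column.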

What is wrong with my pattern matching and replacement function

I have a dataframe with temperatures in the format XX,X instead of XX.X.
I can use the following code to successfully change them...
df$tempMedian <- sub(",",".",df$tempMedian)
df$tempMedian <- as.numeric(df$tempMedian)
I've tried writing the following function to do the same thing:
comma_to_point <- function(data, colname){
  data$colname <- sub(",", ".", data$colname)
  data$colname <- as.numeric(data$colname)
}
When I call the function:
comma_to_point(df, tempMedian)
I get the following error:
"Error in `$<-.data.frame`(`tmp`, colname, value = character(0)) :
replacement has 0 rows, data has 365"
My dataframe is 365 obs long.
Give this a shot
comma_to_point <- function(data, colname){
  data[[colname]] <- sub(",", ".", data[[colname]])
  data[[colname]] <- as.numeric(data[[colname]])
  return(data)
}
df <- comma_to_point(df, "tempMedian")
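When a variable such as var = 'my_column' holds the name of a column in a data.frame, you can't write df$var, because R treats var as the literal column name. Instead, get the column with df[[var]]. Note also that the function has to return the modified data frame and the result has to be assigned back, since R does not modify df in place.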
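The difference between the two forms can be checked on a tiny made-up data frame. The sketch below also shows where the "replacement has 0 rows" error comes from; the sample values are illustrative.

```r
df <- data.frame(tempMedian = c("12,5", "7,0"), stringsAsFactors = FALSE)
colname <- "tempMedian"

# `$` takes the name literally: there is no column called "colname",
# so the result is NULL.
print(is.null(df$colname))

# sub() on NULL returns a zero-length character vector, which is why
# assigning it back into the data frame fails.
print(length(sub(",", ".", df$colname, fixed = TRUE)))

# `[[` evaluates the variable and finds the tempMedian column.
converted <- as.numeric(sub(",", ".", df[[colname]], fixed = TRUE))
print(converted)
```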

R rbind - numbers of columns of arguments do not match

How can I ignore a data set if some column names don't exist in it?
I have a list of weather data from a stream, but I think certain key weather conditions are missing from some records, which gives me this error with rbind:
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
My code:
weatherDf <- data.frame()
for(i in weatherData) {
  # Get the airport code.
  airport <- i$airport
  # Get the date.
  date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
  # Get the data in dailysummary only.
  dailySummary <- i$dailysummary
  weatherDf <- rbind(weatherDf, ldply(
    list(dailySummary),
    function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
  ))
}
So how can I make sure these key conditions below exist in the data:
meanwindspdi
meanwdird
meantempm
humidity
If any of them does not exist, then skip that whole record. Is it possible?
EDIT:
The content of weatherData is in jsfiddle (I can't post it here as it is too long, and I don't know where the best place is to share data publicly for R...)
EDIT 2:
I get some error when I try to export the data into a txt:
> write.table(weatherData,"/home/teelou/Desktop/data/data.txt",sep="\t",row.names=FALSE)
Error in data.frame(date = list(pretty = "January 1, 1970", year = "1970", :
arguments imply differing number of rows: 1, 0
What does it mean? It seems that there are some errors in the data...
EDIT 3:
I have exported my entire data in .RData to my google drive:
https://drive.google.com/file/d/0B_w5RSQMxtRSbjdQYWJMX3pfWXM/view?usp=sharing
If you use RStudio, then you can just import the data.
EDIT 4:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
# If it has data then loop it.
if (!is.null(weatherData)) {
  # Initialize a data frame.
  weatherDf <- data.frame()
  for(i in weatherData) {
    if (!all(target_names %in% names(i)))
      next
    # Get the airport code.
    airport <- i$airport
    # Get the date.
    date <- as.POSIXct(as.numeric(as.character(i$timestamp))/1000, origin="1970-01-01", tz="UTC-1")
    # Get the data in dailysummary only.
    dailySummary <- i$dailysummary
    weatherDf <- rbind(weatherDf, ldply(
      list(dailySummary),
      function(x) c(airport, format(as.Date(date), "%Y-%m-%d"), x[["meanwindspdi"]], x[["meanwdird"]], x[["meantempm"]], x[["humidity"]])
    ))
  }
  # Rename column names.
  colnames(weatherDf) <- c("airport", "key_date", "ws", "wd", "tempi", 'humidity')
  # Convert certain weatherDf columns to numeric.
  columns <- c("ws", "wd", "tempi", "humidity")
  weatherDf[, columns] <- lapply(columns, function(x) as.numeric(weatherDf[[x]]))
}
Inspect the weatherDf:
> View(weatherDf)
Error in .subset2(x, i, exact = exact) : subscript out of bounds
You can use next to skip the current iteration of the loop and go to the next iteration:
target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")
for(i in weatherData) {
  if (!all(target_names %in% names(i)))
    next
  # continue with loop...
}
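A record with a missing key breaks rbind because x[["humidity"]] returns NULL for a missing key and c() silently drops NULL, so that row comes out shorter than the others. Below is a minimal base R sketch of the skip logic on made-up data. It assumes the four keys live inside each record's dailysummary element, as the x[["..."]] lookups in the question suggest, so the check is made against names(i$dailysummary); the sample records, airport codes and the "UTC" time zone are hypothetical.

```r
# Two made-up records; the second one lacks two of the required keys.
weatherData <- list(
  list(airport = "AAA", timestamp = "86400000",
       dailysummary = list(meanwindspdi = "5", meanwdird = "180",
                           meantempm = "12", humidity = "80")),
  list(airport = "BBB", timestamp = "172800000",
       dailysummary = list(meanwindspdi = "7", meantempm = "9"))
)

target_names <- c("meanwindspdi", "meanwdird", "meantempm", "humidity")

rows <- list()
for (i in weatherData) {
  s <- i$dailysummary
  # Skip the whole record if any required key is missing.
  if (!all(target_names %in% names(s))) next
  date <- as.POSIXct(as.numeric(i$timestamp) / 1000,
                     origin = "1970-01-01", tz = "UTC")
  rows[[length(rows) + 1]] <- data.frame(
    airport  = i$airport,
    key_date = format(date, "%Y-%m-%d"),
    ws       = as.numeric(s[["meanwindspdi"]]),
    wd       = as.numeric(s[["meanwdird"]]),
    tempi    = as.numeric(s[["meantempm"]]),
    humidity = as.numeric(s[["humidity"]]),
    stringsAsFactors = FALSE
  )
}

# Bind the kept records in one step.
weatherDf <- do.call(rbind, rows)
print(weatherDf)
```

If every record is skipped, rows stays empty and do.call(rbind, rows) returns NULL, so it is worth checking length(rows) before renaming or converting columns.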
