Error in stream_delim_ when exporting to CSV in R

I'm trying to write this StatsBomb Data into a CSV but I keep on getting the following error message:
Error in stream_delim_(df, path, ..., bom = bom, quote_escape = quote_escape) :
Don't know how to handle vector of type list.
I'm lost (tried multiple things) and not sure what I did wrong here. Is there anyone out here who knows how to solve this? I've included my code below.
library(StatsBombR)
library(tidyverse)
### Read in all free events and matches from the FAWSL
data <- StatsBombFreeEvents()
matches <- FreeMatches(Competitions = 72)
### Clean and separate all data loaded above
dataclean <- allclean(data)
### Filter event data to include only FAWSL data.
data1 <- dataclean %>%
filter(dataclean$competition_id == 72)
### Join event and match data by "match_id"
data1 <- left_join(data1, matches, by = "match_id")
FullData <- data1 %>%
select(-c(related_events, tactics.lineup, shot.freeze_frame, location, pass.end_location, shot.end_location, goalkeeper.end_location))
setwd()
write_csv(FullData, "StatsBomb_FullData.csv")

I had the same problem. Unlisting the column fixed mine.
df$listcolumn <- sapply(df$listcolumn, function(x) paste0(unlist(x), collapse = "\n"))
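The same idea can be applied to every list column at once. The following is a minimal sketch on a small made-up data frame (df, id, and tags are hypothetical names, not columns from the StatsBomb data). It finds the list columns with is.list and collapses each element into a single string so that a CSV writer can handle them. The ";" separator is an arbitrary choice.

```r
# Hypothetical data frame with one list column (not the StatsBomb data)
df <- data.frame(id = 1:3)
df$tags <- list(c("a", "b"), character(0), "c")

# Find which columns are lists; these are the ones a CSV writer cannot handle
is_list_col <- vapply(df, is.list, logical(1))
list_col_names <- names(df)[is_list_col]

# Collapse every element of every list column into one string
df[is_list_col] <- lapply(df[is_list_col], function(col) {
  vapply(col, function(x) paste(unlist(x), collapse = ";"), character(1))
})

# The data frame now contains only atomic columns and can be written out
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)
```

Running names(df)[vapply(df, is.list, logical(1))] on your own data before writing shows which columns still need to be dropped or flattened.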


Ordering columns of data in R

I have a CSV file with 141 rows and several columns. I wanted my data to be ordered in ascending order by the first two columns i.e. 'label' and 'index'. Following is my code:
final_data <- read.csv("./features.csv",
header = FALSE,
col.names = c('label','index', 'nr_pix', 'rows_with_1', 'cols_with_1',
'rows_with_3p', 'cols_with_3p', 'aspect_ratio',
'neigh_1', 'no_neigh_above', 'no_neigh_below',
'no_neigh_left', 'no_neigh_right', 'no_neigh_horiz',
'no_neigh_vert', 'connected_areas', 'eyes', 'custom'))
sorted_data_by_label <- final_data[order(label),]
sorted_data_by_index <- sorted_data_by_label[order(index),]
write.table(sorted_data_by_index, file = "./features.csv",
append = FALSE, sep = ',',
row.names = FALSE)
I chose to read from a CSV and use write.table because that was necessary for my code requirement to override the CSV with column names.
Since I added a comma after order(label) and order(index), shouldn't the sorted data still include all the other rows and columns?
After running this code, I only get the first row out of 141 rows. Is there a way to fix this problem?
As @akrun has mentioned briefly, what you need to do is to change
sorted_data_by_label <- final_data[order(label),]
to
sorted_data_by_label <- final_data[order(final_data$label),]
and to change
sorted_data_by_index <- sorted_data_by_label[order(index),]
to
sorted_data_by_index <- sorted_data_by_label[order(sorted_data_by_label$index),]
This is because when you write label, R looks for an object named label in the calling environment (here the global environment), not within the final_data data frame. The same applies to index.
If you intend to use a column of a data frame, you need to refer to it explicitly, e.g. final_data$label or sorted_data_by_label$index.
Other options
You can use with:
sorted_data_by_label <- with(final_data, final_data[order(label),])
sorted_data_by_index <- with(sorted_data_by_label, sorted_data_by_label[order(index),])
In dplyr you can use
sorted_data_by_label <- final_data %>% arrange(label)
sorted_data_by_index <- sorted_data_by_label %>% arrange(index)
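Note that sorting by label and then sorting the result by index leaves the rows ordered primarily by index. If the goal is "by label, then by index within each label", order() accepts several keys in one call. A minimal sketch on made-up data (the values below are hypothetical, only the column names come from the question):

```r
# Hypothetical stand-in for the data read from features.csv
final_data <- data.frame(
  label  = c("b", "a", "b", "a"),
  index  = c(2, 1, 1, 2),
  nr_pix = c(10, 20, 30, 40)
)

# Sort by label first, and break ties within each label by index
sorted_data <- final_data[order(final_data$label, final_data$index), ]
sorted_data
```

With dplyr loaded, the equivalent would be final_data %>% arrange(label, index).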

Using API in R and converting to dataframe format

I'm basically trying to call an API to retrieve weather information from a government website.
library(data.table)
library(jsonlite)
library(httr)
base<-"https://api.data.gov.sg/v1/environment/rainfall"
date1<-"2020-01-25"
call1<-paste(base,"?","date","=",date1,sep="")
get_rainfall<-GET(call1)
get_rainfall_text<-content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.frame(get_rainfall_json)
I'm getting an error
"Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 52, 287, 1"
I'm not sure how to resolve this. I'm trying to convert the retrieved data into a data frame so I can make sense of the readings.
Your "get_rainfall_json" object comes back as a "list". Trying to turn this into a data frame is where you are getting the error. If you specify the "items" object within the list, your error is resolved! (The outcome of this looks like it has some more embedded data within objects... So you'll have to parse through that into a format you're interested in.)
get_rainfall_df <- as.data.frame(get_rainfall_json$items)
Update
Here is one way to loop through the nested data frames. The loop goes through each row, extracts the data frame stored in that row's readings column, adds the timestamp, and appends the result to df. You are then left with one final df with all the data in one place.
library(data.table)
library(jsonlite)
library(httr)
library(dplyr)
base <- "https://api.data.gov.sg/v1/environment/rainfall"
date1 <- "2020-01-25"
call1 <- paste(base, "?", "date", "=", date1, sep = "")
get_rainfall <- GET(call1)
get_rainfall_text <- content(get_rainfall,"text")
get_rainfall_json <- fromJSON(get_rainfall_text, flatten = TRUE)
get_rainfall_df <- as.data.table(get_rainfall_json$items)
df <- data.frame()
for (row in seq_len(nrow(get_rainfall_df))) {
new_date <- get_rainfall_df[row, ]$readings[[1]]
colnames(new_date) <- c("stationid", "value")
date <- get_rainfall_df[row, ]$timestamp
new_date$date <- date
df <- bind_rows(df, new_date)
}
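Growing df inside the loop works, but an alternative is to build one small data frame per timestamp and bind them all at the end. The sketch below uses only base R and a made-up items object that mimics the assumed shape of get_rainfall_json$items (a timestamp column plus a list column of readings). The station ids and values are invented for illustration.

```r
# Hypothetical stand-in for get_rainfall_json$items (no API call is made here)
items <- data.frame(
  timestamp = c("2020-01-25T00:05:00+08:00", "2020-01-25T00:10:00+08:00"),
  stringsAsFactors = FALSE
)
items$readings <- list(
  data.frame(station_id = c("S77", "S109"), value = c(0, 0.2)),
  data.frame(station_id = "S77", value = 0.4)
)

# Build one data frame per timestamp, tagging each reading with its timestamp
per_timestamp <- lapply(seq_len(nrow(items)), function(i) {
  readings <- items$readings[[i]]
  readings$date <- items$timestamp[i]
  readings
})

# Stack all of them into a single data frame
df <- do.call(rbind, per_timestamp)
df
```

dplyr::bind_rows(per_timestamp) could replace the do.call(rbind, ...) step if dplyr is already loaded.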

Getting "table of extent 0" in Shiny Web app output

I have a data file that I read in in my Shiny server function. I would like to display a frequency table of the two columns the user selects using drop-downs. I get the error "table of extent 0". I have looked at R error - Table of extent 0 and Can't solve table issue but I have imported my data correctly and the column names match as well. The same line of code works when I run it in the console.
Here is my code:
shinyServer(function(input, output) {
output$courseData = renderPrint( {
data = read.csv(file = 'FourCourseTableLetterGrades_POLISHED.tsv', sep = '\t', header = TRUE)
c1 = input$course1
c2 = input$course2
tbl = table(data$c1, data$c2)
tbl
}
)
}
)
Update: this is what the table looks like right now:
I would like the output to be in matrix format, just as what you get when running the table command in console. I also don't know why the columns are named Var1 and Var2 and where to change them.
The first problem is that c1 and c2 are character variables holding column names, so you need to use [[ ]] instead of $: data$c1 looks for a column literally named c1, whereas data[[c1]] uses the value stored in c1. The second problem is that what you see is the table format of the result from table(). If you would rather have a matrix-style layout, you can compute it quite easily with the dplyr package, for example:
library(dplyr)
data = read.csv(file = 'FourCourseTableLetterGrades_POLISHED.tsv', sep = '\t', header = TRUE)
c1 = input$course1
c2 = input$course2
tbl = tibble(data[[c1]], data[[c2]]) %>%
group_by_all() %>%
tally() %>%
tidyr::spread(2,n)
tbl
hope this helps!!
Using data[[c1]] instead of data$c1 as suggested in the comments removed the error and showed a basic (although malformed) output. I did not understand why.
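The reason is that $ takes the name after it literally, while [[ ]] evaluates its argument first. A small sketch outside Shiny, using a made-up data frame (the column names math and physics are hypothetical, and c1/c2 play the role of input$course1 and input$course2):

```r
# Hypothetical grades data; c1 and c2 stand in for the drop-down selections
data <- data.frame(
  math    = c("A", "B", "A"),
  physics = c("B", "B", "A"),
  stringsAsFactors = FALSE
)
c1 <- "math"
c2 <- "physics"

# `$` looks for a column literally called "c1", which does not exist
missing_col <- data$c1

# `[[` uses the value stored in c1, i.e. the column "math"
tbl <- table(data[[c1]], data[[c2]], dnn = c(c1, c2))
tbl
```

Because data$c1 is NULL, table(data$c1, data$c2) has nothing to count, which is where "table of extent 0" comes from. The dnn argument of table() sets the names shown for the two dimensions.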

Iterating through values in R

I'm new-ish to R and am having some trouble iterating through values.
For context: I have data on 60 people over time, and each person has his/her own dataset in a folder (I received the data with id #s 00:59). For each person, there are 2 values I need - time of response and picture response given (a number 1 - 16). I need to convert this data from wide to long format for each person, and then eventually append all of the datasets together.
My problem is that I'm having trouble writing a loop that will do this for each person (i.e. each dataset). Here's the code I have so far:
pam[x] <- fromJSON(file = "PAM_u[x].json")
pam[x]df <- as.data.frame(pam[x])
#Creating long dataframe for times
pam[x]_long_times <- gather(
select(pam[x]df, starts_with("resp")),
key = "time",
value = "resp_times"
)
#Creating long dataframe for pic_nums (affect response)
pam[x]_long_pics <- gather(
select(pam[x]df, starts_with("pic")),
key = "picture",
value = "pic_num"
)
#Combining the two long dataframes so that I have one df per person
pam[x]_long_fin <- bind_cols(pam[x]_long_times, pam[x]_long_pics) %>%
select(resp_times, pic_num) %>%
add_column(id = [x], .before = 1)
If you replace [x] in the above code with a person's id# (e.g. 00), the code will run and will give me the dataframe I want for that person. Any advice on how to do this so I can get all 60 people done?
Thanks!
EDIT
So, using library(jsonlite) rather than library(rjson) set up the files in the format I needed without having to do all of the manipulation. Thanks all for the responses, but the solution was apparently much easier than I'd thought.
I don't know the structure of your JSON files. If your working directory is not the folder that contains the JSON files, try this:
library(jsonlite)
# setup - list the JSON files with their full paths
json_folder <- "U:/test/" # adjust your folder here
files <- list.files(path = json_folder, pattern = "\\.json$", full.names = TRUE)
# import data
pam <- NULL
pam_df <- NULL
for (i in seq_along(files)) {
# jsonlite::fromJSON() takes the path as its first argument (rjson uses file = instead)
pam[[i]] <- fromJSON(files[i])
pam_df[[i]] <- as.data.frame(pam[[i]])
}
This lists all JSON files in the folder, which gives a character vector of length 60 in your case.
The loop then steps along that vector and reads each file.
At the end you can combine the results with bind_rows, or add your own code inside the for loop. If you do, remember to initialise the result objects to NULL before the loop starts, e.g. pam_long_pics <- NULL
Hope that helped? Let me know.
Something along these lines could work:
library("tidyverse")
library("jsonlite")
file_list <- list.files(pattern = "\\.json$", full.names = TRUE)
Data_raw <- tibble(File_name = file_list) %>%
mutate(File_contents = map(File_name, fromJSON)) %>% # This should result in a nested tibble
mutate(File_contents = map(File_contents, as_tibble))
Data_raw %>%
# .x is the data frame for one file; select the columns first, then gather them
mutate(Long_times = map(File_contents, ~ gather(select(.x, starts_with("resp")), key = "time", value = "resp_times")),
Long_pics = map(File_contents, ~ gather(select(.x, starts_with("pic")), key = "picture", value = "pic_num"))) %>%
unnest(c(Long_times, Long_pics)) %>%
select(File_name, resp_times, pic_num)
EDIT: depending on what your data looks like, you may not need to call as_tibble() after reading in the JSON files.
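Another way to think about the problem is to wrap the per-person steps from the question in a function and apply it to each person. The sketch below is base R only and works on made-up in-memory data instead of JSON files. The function name to_long, the list people, and the assumption that each person's data frame has matching resp* and pic* columns are all hypothetical.

```r
# Turn one person's wide data frame into long format.
# Assumes (hypothetically) equally many resp* and pic* columns per person.
to_long <- function(wide, id) {
  resp_cols <- grep("^resp", names(wide), value = TRUE)
  pic_cols  <- grep("^pic", names(wide), value = TRUE)
  data.frame(
    id         = id,
    resp_times = unlist(wide[resp_cols], use.names = FALSE),
    pic_num    = unlist(wide[pic_cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical stand-ins for two people's data, named by id
people <- list(
  "00" = data.frame(resp1 = 1.2, resp2 = 3.4, pic1 = 5, pic2 = 16),
  "01" = data.frame(resp1 = 0.9, resp2 = 2.1, pic1 = 7, pic2 = 2)
)

# Apply the function to every person and stack the results
all_long <- do.call(rbind, Map(to_long, people, names(people)))
all_long
```

With real files, people would instead be built by reading each JSON file into a data frame, as in the answers above.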

How to tackle 'attempt to set 'colnames' on an object with less than two dimensions' error in R

I have a renderTable output in the server of my Shiny app and I am trying to rename the columns of the final table with the following code:
output$tubeArrival <- renderTable({
#GET request and convert JSON to a dataframe
data <- GET(url)
text_data <- content(data,as = 'text')
json_data <- fromJSON(text_data)
json_data$timeToArrive = minSec(json_data$timeToStation)
json_data$bound <- substr(as.character(json_data$platformName),1,1)
json_data$platform <- substrRight(as.character(json_data$platformName),1)
cleaned_data <- subset(json_data,boundDirect(json_data$bound) == input$direction)
final_data <- cleaned_data[c('platform','towards','timeToArrive','currentLocation')]
colnames(final_data) <- c('Plat.','To','ETA','Current Loc.')
final_data <- final_data})
But the following error appears:
Warning: Error in colnames<-: attempt to set 'colnames' on an object with less than two dimensions
I'd really appreciate any help!
Thanks in advance,
Tommy
What line is throwing the error? Try traceback() to get the callstack, and the offending line.
Just eyeballing it, it looks like you're missing a comma between the (empty) row position and the column position. That third-to-last line might need to change to
final_data <- cleaned_data[, c('platform','towards','timeToArrive','currentLocation')]
BTW, renaming columns is safer if you do something like this. If a column is missing, the error message should be clearer.
cleaned_data <- json_data %>%
# compare against the Shiny input, as in the question's subset() call
dplyr::filter(boundDirect(bound) == input$direction) %>%
# rename() takes new_name = old_name
dplyr::rename(
`Plat.` = platform,
`To` = towards,
`ETA` = timeToArrive,
`Current Loc.` = currentLocation
)
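As background on the error message itself: colnames<- only works on objects with at least two dimensions (a matrix or a data frame), so the error means that the object being renamed is a plain vector or list at that point. A minimal sketch in base R with made-up data (not the tube arrivals JSON) that reproduces the message and shows a check you could put before the rename:

```r
# A plain vector has no columns, so setting colnames fails
x <- c(1, 2, 3)
msg <- tryCatch({
  colnames(x) <- "a"
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Hypothetical data frame standing in for cleaned_data
cleaned <- data.frame(platform = c("1", "2"), towards = c("North", "South"))

# Selecting a single column with [, ] drops to a vector unless drop = FALSE
as_vector <- cleaned[, "platform"]
as_frame  <- cleaned[, "platform", drop = FALSE]

# Checking the class before renaming makes the failure easier to diagnose
stopifnot(is.data.frame(as_frame))
colnames(as_frame) <- "Plat."
```

In the Shiny code, running str(json_data) right after fromJSON() would show whether the parsed result is a data frame or a nested list.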
