How can I convert this SQL query into R?
Select Location, MAX(cast(Total_deaths as int)) as TotalDeathCount
From PortfolioProject..CovidDeaths
Where continent is not null
Group by Location
order by TotalDeathCount desc
I tried this, but I keep getting an error:
library(dplyr)
library(tidyr)
covid_death %>%
drop_na() %>%
select(total_deaths) %>%
summarise(TotalDeathCount = max(total_deaths)) %>%
filter(continent != null)
Error in filter(., continent != null) :
Caused by error in `mask$eval_all_filter()`:
! object 'null' not found
The following should do:
covid_death %>%
drop_na() %>%
group_by(location) %>%
summarise(TotalDeathCount = max(total_deaths)) %>%
arrange(desc(TotalDeathCount))
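The error in the question arises because R has no `null` object: missing values are `NA`, and they are tested with `is.na()`. Note also that `drop_na()` without arguments drops rows with a missing value in any column, which is broader than `WHERE continent IS NOT NULL`. As a hedged alternative, the sketch below mirrors the SQL clause by clause. The small `covid_death` data frame is invented for illustration, and the lower-case column names are assumed from the attempt in the question.

```r
library(dplyr)

# Toy stand-in for the real table (values are invented for illustration).
covid_death <- data.frame(
  location     = c("Italy", "Italy", "Peru", "Peru", "Asia"),
  continent    = c("Europe", "Europe", "South America", "South America", NA),
  total_deaths = c("10", "25", "7", NA, "100"),
  stringsAsFactors = FALSE
)

result <- covid_death %>%
  filter(!is.na(continent)) %>%      # WHERE continent IS NOT NULL
  group_by(location) %>%             # GROUP BY Location
  summarise(
    # MAX(CAST(Total_deaths AS int)); SQL's MAX skips NULLs, hence na.rm = TRUE
    TotalDeathCount = max(as.integer(total_deaths), na.rm = TRUE)
  ) %>%
  arrange(desc(TotalDeathCount))     # ORDER BY TotalDeathCount DESC

print(result)
```

If a location has no non-missing `total_deaths` at all, `max(..., na.rm = TRUE)` returns `-Inf` with a warning, so such groups may need separate handling.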
I am having a problem when trying to scrape some data. I have created a function that works properly; the problem occurs when I run this function for many different codes.
require ("rvest")
library("dplyr")
getFin = function(ticker)
{
url= paste0("https://it.finance.yahoo.com/quote/",ticker,
"/key-statistics?p=",ticker)
a <- read_html(url)
tbl= a %>% html_nodes("section") %>% html_nodes("div")%>% html_nodes("table")
misureval = tbl %>% .[1] %>% html_table() %>% as.data.frame()
prezzistorici = tbl %>% .[2] %>% html_table() %>% as.data.frame()
titolistat = tbl %>% .[3] %>% html_table() %>% as.data.frame()
dividendi = tbl %>% .[4] %>% html_table() %>% as.data.frame()
annofiscale = tbl %>% .[5] %>% html_table() %>% as.data.frame()
redditivita = tbl %>% .[6] %>% html_table() %>% as.data.frame()
gestione = tbl %>% .[7] %>% html_table() %>% as.data.frame()
contoeco = tbl %>% .[8] %>% html_table() %>% as.data.frame()
bilancio = tbl %>% .[9] %>% html_table() %>% as.data.frame()
flussi = tbl %>% .[10] %>% html_table() %>% as.data.frame()
info1 = rbind(ticker, misureval, prezzistorici, titolistat, dividendi, annofiscale, redditivita, gestione, contoeco, bilancio, flussi)
}
What I am trying to do is to use
finale <- lapply(codici, getFin)
where codici holds many different tickers, each of which is used in the function to generate one URL at a time and scrape the data.
I have tried with 50 tickers and the function works properly. However, when I increase the number, I get this error:
Error in xml_nodeset(NextMethod()) : Expecting an external pointer:
[type=NULL].
I don't know if this is related to the number of requests or to something else. I have also tested a non-existent ticker and the function still works; the problem only arises when the number of tickers is large.
Problem solved: I just needed to add Sys.sleep to reduce the frequency of requests.
The best value in this case is 3, so Sys.sleep(3) at the end of the for loop.
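The throttling described above could be wrapped in a small helper, sketched below. `throttled_lapply` is a hypothetical name, not part of rvest or base R, and the `tryCatch()` that turns a failed request into `NULL` is an added assumption rather than something from the original post.

```r
# Hypothetical helper (not from the original post): apply `f` to each element
# of `x`, pausing `delay` seconds after every call to limit the request rate.
throttled_lapply <- function(x, f, delay = 3) {
  lapply(x, function(el) {
    # Catch failures so that one bad request does not abort the whole run.
    res <- tryCatch(f(el), error = function(e) NULL)
    Sys.sleep(delay)
    res
  })
}

# Usage with the function from the question:
# finale <- throttled_lapply(codici, getFin, delay = 3)
```

Elements that come back as `NULL` mark the tickers whose request failed, so they can be retried later.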
I am getting the above error message when I input the following syntax:
y <- mydata_final_distinct %>%
select(Case_ID,starts_with (Effect_)) %>%
pivot_longer(-Case_ID, names_to='symptoms', values_to=adr) %>%
distinct(Case_ID,'symptoms',adr,.keep_all = TRUE) %>%
count(adr) %>%
arrange(desc(n))
Although it is difficult to troubleshoot without having some of your data, I would first make the library explicit in the function calls, as it may be trying to use select from another library.
library(dplyr)
library(tidyr)
y <- mydata_final_distinct %>%
dplyr::select(Case_ID, starts_with (Effect_)) %>%
tidyr::pivot_longer(-Case_ID, names_to = 'symptoms', values_to = adr) %>%
dplyr::distinct(Case_ID, 'symptoms', adr, .keep_all = TRUE) %>%
dplyr::count(adr) %>%
dplyr::arrange(desc(n))
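Without the exact error message this is only a guess, but another likely cause is quoting. `starts_with()` expects the prefix as a character string, and `values_to` expects the new column name as a string, so the unquoted `Effect_` and `adr` are looked up as R objects. Conversely, the quoted `'symptoms'` inside `distinct()` is treated as a constant, not as the column. A minimal sketch with invented data (the column names only imitate those in the question):

```r
library(dplyr)
library(tidyr)

# Invented example data for illustration.
mydata_final_distinct <- data.frame(
  Case_ID  = c(1, 2, 3),
  Effect_1 = c("nausea", "rash", "nausea"),
  Effect_2 = c("rash", NA, "headache"),
  Other    = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

y <- mydata_final_distinct %>%
  select(Case_ID, starts_with("Effect_")) %>%   # prefix passed as a string
  pivot_longer(-Case_ID, names_to = "symptoms", values_to = "adr") %>%
  filter(!is.na(adr)) %>%                       # optional: drop empty effects
  distinct(Case_ID, adr) %>%                    # one row per case and reaction
  count(adr) %>%
  arrange(desc(n))

print(y)
```

The `filter(!is.na(adr))` step is an addition for this example; drop it if missing reactions should be counted as well.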
I use dplyr (version 0.8.3) to query data. The query includes parameters that are defined at the beginning of the query, as shown in the first reprex below. On top of that, I want to collect the parameters in a list, as shown in the second reprex. The first and second examples, which query data from a data frame, work fine. However, when I query data from a database using parameters saved in a list, the SQL translation produces SQL that does not work against the database.
Is there a way to incorporate list entries into a dplyr pipeline to query data from a database?
library(dplyr)
library(dbplyr)
library(RSQLite)
# 1. Data frame: value
a <- 1
mtcars %>% filter(vs == a) # Works
# 2. Data frame: list value
input <- list()
input$a <- 1
mtcars %>% filter(vs == input$a) # Works
# 3. Database: value and list value
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
db_mtcars <- tbl(con, "mtcars")
db_mtcars %>% filter(vs == a) %>% show_query() %>% collect() # Works
db_mtcars %>% filter(vs == input$a) %>% show_query() %>% collect() # Does not work
db_mtcars %>% filter(vs == input[1]) %>% show_query() %>% collect() # Does not work
db_mtcars %>% filter(vs == input[[1]]) %>% show_query() %>% collect() # Does not work
The background of my question is that I want to process and analyze data in a shiny app. I find it easier to develop the data-processing code outside the app and then include it in the app afterwards. However, this becomes increasingly difficult as the number of inputs grows. For development, my idea was to define a list named "input" so that I can copy and paste the code into the app. However, I stumbled over the problem described above. Suggestions for an alternative development workflow are very welcome, too.
For dplyr >= 1.0.0 you need to {{embrace}} values; see Programming with dplyr:
db_mtcars %>% filter(vs == a) %>% show_query() %>% collect() # Works
db_mtcars %>% filter(vs == {{input$a}}) %>% show_query() %>% collect() # should work
db_mtcars %>% filter(vs == {{input[1]}}) %>% show_query() %>% collect() # should work
db_mtcars %>% filter(vs == {{input[[1]]}}) %>% show_query() %>% collect() # should work
For dplyr < 1.0.0 you can use the bang-bang operator !!, as in !!input$a.
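As a minimal sketch of the `!!` approach, assuming RSQLite is installed: `!!` evaluates `input$a` in R before dbplyr translates the expression, so the generated SQL receives the literal value and not a reference to `input`. The in-memory database and the `input` list follow the reprex in the question.

```r
library(dplyr)
library(dbplyr)

# In-memory SQLite database holding a copy of mtcars, as in the question.
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, mtcars, "mtcars", temporary = FALSE)
db_mtcars <- tbl(con, "mtcars")

# Stand-in for shiny's `input` object during development.
input <- list(a = 1)

# !! forces evaluation of input$a in R, so only its value reaches the SQL.
query <- db_mtcars %>% filter(vs == !!input$a)
show_query(query)
result <- collect(query)

DBI::dbDisconnect(con)
```

Because the same `!!input$a` expression also works on a local data frame, the pipeline can be developed outside the app and pasted in afterwards.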
I am using the following code to get a set of heights sampled from GaltonFamilies
set.seed(1989, sample.kind="Rounding")
library(HistData)
data("GaltonFamilies")
female_heights <- GaltonFamilies%>%
filter(gender == "female") %>%
group_by(family) %>%
sample_n(1) %>%
ungroup() %>%
select(mother, childHeight) %>%
rename(daughter = childHeight)
But then I get the following error:
"Error in eval(lhs, parent, parent) : object 'GaltonFamilies' not found"
Any ideas on what could be going on?
EDIT: I found my error in the example below. I made a typo in stored_group in filter. It works as expected.
I want to use a character value to filter a database table. I use dplyr functions directly on the connection object. See my steps below.
I connected to my MariaDB database:
con <- dbConnect(RMariaDB::MariaDB(),
dbname = mariadb.database,
user = mariadb.username,
password = mariadb.password,
host = mariadb.host,
port = mariadb.port)
Then I want to use a filter on a table in the database, by using dplyr code directly on the connection above:
stored_group <- "some_group"
con %>%
tbl("Table") %>%
select(id, group) %>%
filter(group == stored_group) %>%
collect()
I got an error saying Unknown column 'stored_group' in 'where clause'. So I used show_query() like this:
stored_group <- "some_group"
con %>%
tbl("Table") %>%
select(id, group) %>%
filter(group == stored_group) %>%
show_query()
And I got:
<SQL>
SELECT `id`, `group`
FROM `Table`
WHERE (`group` = `stored_group`)
In the translation, stored_group is treated as a column name instead of a value in R. How do I prevent this?
On normal data.frames in R this works, for example:
stored_group <- "some_group"
data %>%
select(id, group) %>%
filter(group == stored_group)
I just tested the solution below, and it works. But my database table will grow. I want to filter directly on the database before collecting.
stored_group <- "some_group"
con %>%
tbl("Table") %>%
select(id, group) %>%
collect() %>%
filter(group == stored_group)
Any suggestions?