This is my first attempted reprex. I am working through R for Data Science and trying to narrow down a data frame so I can then mutate it, but I think I'm having trouble with the endsWith() function. When I run this section of the code I get the following error message. When I then change the call to endsWith(x, "delay") I get a different message. I'm not sure how to deal with either and would love some help. Also, I'm not sure why, but dplyr::select() works for me while select() does not, which is why my code differs from the book. Thanks!
flights_sml <- dplyr::select(flights,
  year:day,
  endsWith("delay"),
  distance,
  air_time
)
Error: argument "suffix" is missing, with no default
Run rlang::last_error() to see where the error occurred.
flights_sml <- dplyr::select(flights,
  year:day,
  endsWith(x, "delay"),
  distance,
  air_time
)
Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type logical.
ℹ It must be numeric or character.
Good start! The tidy-select helper is ends_with(), while endsWith() is the base R function.
library(nycflights13)
library(dplyr)
flights_sml <- flights %>%
  select(year:day, ends_with("delay"), distance, air_time)
flights_sml
#> # A tibble: 336,776 × 7
#> year month day dep_delay arr_delay distance air_time
#> <int> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2013 1 1 2 11 1400 227
#> 2 2013 1 1 4 20 1416 227
#> 3 2013 1 1 2 33 1089 160
#> 4 2013 1 1 -1 -18 1576 183
#> 5 2013 1 1 -6 -25 762 116
#> 6 2013 1 1 -4 12 719 150
#> 7 2013 1 1 -5 19 1065 158
#> 8 2013 1 1 -3 -14 229 53
#> 9 2013 1 1 -3 -8 944 140
#> 10 2013 1 1 -2 8 733 138
#> # … with 336,766 more rows
Created on 2022-01-16 by the reprex package (v2.0.1)
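For the record, here is why the base R function fails inside select(): endsWith() takes a character vector as its first argument and returns a logical vector, which select() cannot use as a column subscript (hence the two errors above). A minimal base R illustration:

```r
# endsWith() is vectorised string matching, not a tidy-select helper
cols <- c("dep_delay", "arr_delay", "distance", "air_time")
endsWith(cols, "delay")
#> [1]  TRUE  TRUE FALSE FALSE
```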
I would like to write a function that extracts some contributor data from a GitHub project's contributor page. For example: https://github.com/easystats/report/graphs/contributors
How can I extract, using R, for example the username, number of commits, number of additions, and number of removals?
Here is my attempt at web scraping using rvest (https://github.com/tidyverse/rvest):
library(rvest)
contribs <- read_html("https://github.com/easystats/report/graphs/contributors")
section <- contribs %>% html_elements("section")
section
#> {xml_nodeset (0)}
contribs$node
#> <pointer: 0x0000027d9b9e9f10>
contribs$doc
#> <pointer: 0x0000027d9e03d140>
Created on 2023-01-29 with reprex v2.0.2
But I think I am not getting the expected result.
However, I would much prefer a solution that uses an existing R package for this, or the GitHub API (e.g. via https://github.com/r-lib/gh). Is that possible at all?
I found their internal API in the Network tab of the browser's developer tools:
library(tidyverse)
library(httr2)
"https://github.com/easystats/report/graphs/contributors-data" %>%
  request() %>%
  req_headers("x-requested-with" = "XMLHttpRequest",
              accept = "application/json") %>%
  req_perform() %>%
  resp_body_json(simplifyVector = TRUE) %>%
  unnest(everything()) %>%
  group_by(username = str_remove(path, "/")) %>%
  summarise(across(a:c, sum))
# A tibble: 21 x 4
username a d c
<chr> <int> <int> <int>
1 DominiqueMakowski 203778 148154 325
2 IndrajeetPatil 15082 10513 159
3 LukasWallrich 1 1 1
4 bwiernik 1371 156 11
5 cgeger 1 1 1
6 drfeinberg 127 23 1
7 dtoher 26 26 1
8 etiennebacher 127 162 7
9 fkohrt 1 1 1
10 grimmjulian 2 2 1
11 humanfactors 22 23 4
12 jdtrat 1 1 1
13 m-macaskill 33 31 2
14 mattansb 1009 603 14
15 mutlusun 265 4 4
16 pkoaz 3 2 1
17 rempsyc 3427 2938 14
18 strengejacke 5129 38164 223
19 vincentarelbundock 5 0 1
20 webbedfeet 85 85 2
21 wjschne 2 2 1
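As for the official GitHub API: the REST endpoint GET /repos/{owner}/{repo}/stats/contributors exposes the same numbers (weekly additions a, deletions d, and commits c per author). A sketch using the gh package; note that GitHub may answer 202 on the first call while it computes the statistics, in which case the result is empty and you need to retry after a moment:

```r
library(gh)
library(purrr)

# contributor commit activity for the repository
stats <- gh("GET /repos/{owner}/{repo}/stats/contributors",
            owner = "easystats", repo = "report")

# one row per contributor: total commits plus summed additions/deletions
contribs <- map_dfr(stats, function(s) {
  data.frame(
    username  = s$author$login,
    commits   = s$total,
    additions = sum(map_int(s$weeks, "a")),
    deletions = sum(map_int(s$weeks, "d"))
  )
})
contribs[order(-contribs$commits), ]
```

This avoids the undocumented internal endpoint, at the cost of the 202 warm-up behaviour.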
I'm working on the following dataset and trying to fill in the missing entries of the VISUAL52 variable, imputing data with the LOCF (Last Observation Carried Forward) method.
library(readr)
library(mice)
library(finalfit)
library(Hmisc)
library(lattice)
library(VIM)
library(rms)
library(zoo)
> hw3
# A tibble: 240 x 11
treat LINE0 LOST4 LOST12 LOST24 LOST52 VISUAL0 VISUAL4 VISUAL12 VISUAL24 VISUAL52
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2 12 1 3 NA NA 59 55 45 NA NA
2 2 13 -1 0 0 2 65 70 65 65 55
3 1 8 0 1 6 NA 40 40 37 17 NA
4 1 13 0 0 0 0 67 64 64 64 68
5 2 14 NA NA NA NA 70 NA NA NA NA
6 2 12 2 2 2 4 59 53 52 53 42
7 1 13 0 -2 -1 0 64 68 74 72 65
8 1 8 1 0 1 1 39 37 43 37 37
9 2 12 1 2 1 1 59 58 49 54 58
10 1 10 0 -4 -4 NA 49 51 71 71 NA
# ... with 230 more rows
I don't know whether I've done it well or not, but I've tried to describe the sample size, mean, and standard error of the VISUAL52 variable per treatment in this way (just let me know whether a different function would have been better).
numSummary(hw3[,"VISUAL52", drop=FALSE], groups=hw3$treat,
statistics=c("mean", "se(mean)", "quantiles"),
quantiles=c(0,.25,.5,.75,1))
binnedCounts(hw3[hw3$treat == '1', "VISUAL52", drop=FALSE])
# treat = 1
binnedCounts(hw3[hw3$treat == '2', "VISUAL52", drop=FALSE])
# treat = 2
However, as to the imputation part, I've run the nafill() function from the data.table package, but I get back the error you can see after running the complete() function.
library(data.table)
imp_locf <- nafill(hw3$VISUAL52, "locf", nan=NA)
data_imputed <- complete(imp_locf)
Error in UseMethod("complete_") :
  no applicable method for 'complete_' applied to an object of class "c('double', 'numeric')"
I'm wondering why the function returns this error, and whether someone knows an alternative way to impute the missing data in the dataset with the LOCF method.
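For what it's worth, the error comes from complete(), not from nafill(): with mice loaded, complete() expects a mids imputation object, not a plain numeric vector. data.table::nafill() already returns the filled vector directly, so no complete() call is needed, only an assignment back into the data frame. A small sketch on made-up numbers:

```r
library(data.table)

x <- c(59, NA, 45, NA, NA, 55)
nafill(x, type = "locf")
#> [1] 59 59 45 45 45 55

# for the question's data, assign the result straight back:
# hw3$VISUAL52 <- nafill(hw3$VISUAL52, type = "locf")
```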
If you want to apply locf on your dataset, you can use the imputeTS package.
library(imputeTS)
hw3 <- na_locf(hw3)
hw3
or if you just want to use LOCF for the VISUAL52 variable:
library(imputeTS)
hw3$VISUAL52 <- na_locf(hw3$VISUAL52)
hw3
Also keep in mind other algorithms might be even better suited for your data. imputeTS offers multiple functions especially for time series imputation (more algorithms in imputeTS). The mice package you already seem to use has additional algorithms for cross-sectional data.
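If you want to avoid extra dependencies altogether, LOCF is short enough to write in base R. A sketch; the helper name locf is made up here, and leading NAs (with nothing to carry forward) stay NA:

```r
# carry the last non-NA value forward
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # index of the most recent observed value
  c(NA, x[!is.na(x)])[idx + 1]    # idx == 0 (leading NAs) maps to the prepended NA
}

locf(c(NA, 59, NA, 45, NA, NA))
#> [1] NA 59 59 45 45 45
```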
I know this question has been asked multiple times, but I haven't been able to find exactly what I'm looking for in the previous topics. Please feel free to close this as a duplicate if it is one.
I have a dataframe as follows:
> data %>% arrange(customer_id)
region market unit_key
1 2 98 320
2 2 98 321
3 4 184 287
4 4 4 7
5 4 4 287
6 66 521 899
7 66 521 900
8 66 3012 899
9 66 521 916
10 66 3011 900
I would like to add a 4th column, a unique identifier called combination_id, formed as follows: for each unique pair of region and market I should get a unique identifier that lets me retrieve the unit_keys linked with the combination of markets for a specific region.
I tried a cross-join and tidyr::crossing(), but I didn't get the expected results. Any hints on this topic?
Unfortunately the proposed solution,
df %>% group_by(region, market) %>% mutate(id = cur_group_id())
does not work for me, as I get the following result:
combination_id %>% arrange(region)
# A tibble: 373 x 4
# Groups: region, market [182]
region market unit_key id
<dbl> <dbl> <dbl> <int>
1 2 98 320 1
2 2 98 321 1
3 4 184 287 3
4 4 4 7 2
5 4 4 287 2
6 66 521 899 4
In this case, for region 4 we should have the following combinations:
id=2 where market is 184
id=3 where market is 4
id=4 where market is 4 and 184
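Reading the expected ids above, the identifier seems to belong to the *set* of markets a unit_key is linked to within a region, not to a single (region, market) pair, which is why cur_group_id() cannot produce id=4 for the combination {4, 184}. A base R sketch of that reading, on a made-up excerpt of the data (the exact numbering will differ from the question's, but each distinct market set gets its own id):

```r
df <- data.frame(
  region   = c(4, 4, 4),
  market   = c(184, 4, 4),
  unit_key = c(287, 7, 287)
)

# per (region, unit_key): the sorted set of markets, collapsed to a key
sets <- aggregate(market ~ region + unit_key, data = df,
                  FUN = function(m) paste(sort(unique(m)), collapse = "+"))

# one id per distinct market set
sets$combination_id <- match(sets$market, unique(sets$market))
sets
#>   region unit_key market combination_id
#> 1      4        7      4              1
#> 2      4      287  4+184              2
```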
I am working on a customer segmentation problem using the rfm package in R. When I call rfm_table_order(), I store its output in the variable rfm_result (created as a tibble by default). After printing rfm_result, the console shows the correct number of variables and observations. But in the global environment rfm_result shows only 5 variables and 0 observations, so I am unable to View the entire rfm_result dataset. I also cannot export it: rfm_result gets written to the CSV file as an empty dataset. Please help.
I tried converting the tibble to a data.frame, which did not work; rfm_result still has 0 observations.
Code:
rfm_result <- rfm_table_order(data, CustomerID, InvoiceDate, Amount, analysis_date)
Output in Console (It is showing a correct number of rows and columns):
A tibble: 3,891 x 9
customer_id date_most_recent recency_days transaction_count amount recency_score frequency_score monetary_score rfm_score
<int> <date> <dbl> <dbl> <dbl> <int> <int> <int> <dbl>
1 12346 2011-01-18 326 2 0 1 1 1 111
2 12747 2011-12-07 3 96 3837. 5 4 5 545
3 12748 2011-12-09 1 4279 27215. 5 5 5 555
4 12749 2011-12-06 4 231 3868. 5 5 5 555
5 12820 2011-12-06 4 59 942. 5 4 4 544
6 12821 2011-05-09 215 6 92.7 1 1 1 111
7 12822 2011-09-30 71 47 919. 2 3 4 234
8 12823 2011-09-26 75 5 1760. 2 1 4 214
9 12824 2011-10-11 60 25 397. 3 2 2 322
10 12826 2011-12-07 3 94 1468. 5 4 4 544
# ... with 3,881 more rows
But the global environment shows 0 observations and 5 variables for the tibble rfm_result. Likewise, checking the dimensions with dim(rfm_result) gives 0 observations and 5 variables. I am not even able to export this tibble via the write.csv() function; it exports a blank CSV file. How can I export this tibble to CSV, or View it in R?
class(rfm_result)
[1] "rfm_table_order" "tibble" "data.frame"
rfm_result$rfm

The code above will show you the whole result.

# converting to csv
write.table(rfm_result$rfm, file = "rfm_results.csv", sep = ",", append = FALSE, row.names = FALSE)
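The environment pane behaviour follows from the object's structure: rfm_table_order() returns a classed object whose print method displays the embedded tibble, but the object itself is a container with a handful of top-level elements and no rows of its own, which is what dim() and the environment pane report. A minimal base R analogy (the class and element names here are made up, not the rfm package's real internals):

```r
tbl <- data.frame(customer_id = 1:3, rfm_score = c(111, 545, 555))

# a classed list wrapping the real table, like rfm_result wraps $rfm
obj <- structure(
  list(rfm = tbl, analysis_date = Sys.Date(), recency_bins = 5,
       frequency_bins = 5, monetary_bins = 5),
  class = "demo_table"
)

length(obj)   # 5 top-level elements: the "5 variables" in the pane
nrow(obj)     # NULL: the container has no rows, hence "0 observations"
dim(obj$rfm)  # the actual table: 3 rows, 2 columns
```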
I have a data.frame as follows:
timestamp index negative positive sentiment
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-10-29 15:00:10 0 11 10 -1
2 2015-10-29 17:26:48 0 1 5 4
3 2015-10-29 17:30:07 0 10 22 12
4 2015-10-29 20:13:22 0 5 6 1
5 2015-10-30 14:25:26 0 3 2 -1
6 2015-10-30 18:22:30 0 14 15 1
7 2015-10-31 14:16:00 0 10 23 13
8 2015-11-02 20:30:18 0 14 7 -7
9 2015-11-03 14:15:00 0 8 26 18
10 2015-11-03 16:52:30 0 12 34 22
I would like to know if there is a way to merge rows with equal days so that I have one score per day. I have no clue how to approach this, since I don't know how to extract each date and write a function that merges only equal dates, given that the times differ within each day. I would like to obtain a data.frame of the following form:
timestamp index negative positive sentiment
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2015-10-29 0 27 43 16
2 2015-10-30 0 3 2 -1
3 2015-10-31 0 17 17 0
4 2015-11-02 0 14 7 -7
5 2015-11-03 0 20 60 40
Is there any way to arrive at this result? I would be thankful for any hint.
You can use aggregate() for this. You just need to tell it to group by the day, ignoring the exact time of day.
I will assume you have your data stored as df:
aggregate(df[, 2:5], FUN = "sum", by = list(timestamp = as.Date(df$timestamp)))
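A self-contained sketch of that approach on a cut-down version of the question's data (column names assumed to match; as.Date() on a date-time truncates it to the day, so equal days collapse into one group):

```r
df <- data.frame(
  timestamp = as.POSIXct(c("2015-10-29 15:00:10", "2015-10-29 17:26:48",
                           "2015-10-30 14:25:26"), tz = "UTC"),
  index     = c(0, 0, 0),
  negative  = c(11, 1, 3),
  positive  = c(10, 5, 2),
  sentiment = c(-1, 4, -1)
)

# sum the numeric columns within each calendar day
aggregate(df[, 2:5], FUN = "sum",
          by = list(timestamp = as.Date(df$timestamp, tz = "UTC")))
#>    timestamp index negative positive sentiment
#> 1 2015-10-29     0       12       15         3
#> 2 2015-10-30     0        3        2        -1
```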