My query returns a bunch of columns that I can't rename manually (using project-rename) one by one. The input query is also fixed: I can't change it, and it may return a different number of columns each time I run it, so I can't use a fixed project-rename statement. For example, let's say one run of the input query returns the following columns:
fixedstring_region
fixedstring_state
fixedstring_level
fixedstring_reach
fixedstring_mode
fixedstring_something
fixedstring_otherthing
... etc
These can be hundreds. I want to remove 'fixedstring_' from all of these. Is there some wild card technique for this?
There's no built-in way to do it.
The best I can think of is a very inefficient approach that also changes the order of the columns:
datatable(fixedstring_region:string, fixedstring_state:string, fixedstring_level:string, fixedstring_reach:string)
[
"a1", "b1", "c1", "d1",
"a2", "b2", "c2", "d2"
]
| project PackedRecord = todynamic(replace('"fixedstring_([a-zA-Z0-9_]*)":"', @'"\1":"', tostring(pack_all())))
| evaluate bag_unpack(PackedRecord)
Output:
level  reach  region  state
c1     d1     a1      b1
c2     d2     a2      b2
I want to import the dataset from the following URL directly into R to work with the data:
http://www.football-data.co.uk/mmz4281/2223/B1.csv
Previously I have used read_csv() or fread() like this:
data <- data.table::fread("http://www.football-data.co.uk/mmz4281/2223/B1.csv")
data <- readr::read_csv("http://www.football-data.co.uk/mmz4281/2223/B1.csv")
This used to work with the data being in a data.frame, and looking like the original data. However, now the output appears to be HTML:
e.g. if using read_csv()
head(data)
# A tibble: 6 x 1
`<HTML>`
<chr>
1 "<HEAD>"
2 "<TITLE>Football Betting | Football Results | Free Bets | Betting Odds</TITLE>"
3 "<meta name=\"twitter:card\" content=\"summary\" />"
4 "<meta name=\"twitter:site\" content=\"#12Xpert\" />"
5 "<meta name=\"twitter:title\" content=\"Football-Data.co.uk\" />"
Is there a way to directly import the csv from such a downloadable link, without having to download the excel file onto your computer and then loading it?
Good morning Jalapic, I think you are almost there.
Short answer: check your download link. You will have success with https:// (and not http://).
bets2223 <- readr::read_csv("https://www.football-data.co.uk/mmz4281/2223/B1.csv")
Longer answer. You can always work out your links by using tools that the R ecosystem offers.
library(rvest) # package to download web-content
library(dplyr) # tidyverse data wrangling
library(readr) # tidyverse read package
# check the page given by Jalapic
# and extract all `href` links
page <- read_html("https://www.football-data.co.uk/belgiumm.php")
links <- page %>% html_nodes("a") %>% html_attr("href")
# the list can be reduced to our "payload" mmz... files
links <- links[grepl(pattern = "^mmz4281", x = links)]
base_url <- "https://www.football-data.co.uk/"
# construct a vector of all links that fit our search pattern
download_urls <- paste0(base_url, links)
download_urls[1:4] # to shorten the output - show only first 4 results
This yields
[1] "https://www.football-data.co.uk/mmz4281/2223/B1.csv" "https://www.football-data.co.uk/mmz4281/2122/B1.csv"
[3] "https://www.football-data.co.uk/mmz4281/2021/B1.csv" "https://www.football-data.co.uk/mmz4281/1920/B1.csv"
Ok. We now spot the right file name (or write a loop to download all files of interest).
For your case we pick the first one, i.e. download_urls[1].
bets2223 <- read_csv(download_urls[1])
glimpse(bets2223)
This gets us what we are looking for (note: truncated for presentation purposes):
Rows: 134
Columns: 105
$ Div <chr> "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B1", "B…
$ Date <chr> "22/07/2022", "23/07/2022", "23/07/2022", "23/07/2022", "23/07/2022", "24/07/2022", "24/07/2022", "24/07/2022", "24/07/202…
$ Time <time> 19:45:00, 15:00:00, 17:15:00, 17:15:00, 19:45:00, 12:30:00, 15:00:00, 17:30:00, 20:00:00, 19:45:00, 15:00:00, 17:15:00, 1…
$ HomeTeam <chr> "Standard", "Charleroi", "Kortrijk", ...
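The "loop to download all files of interest" mentioned above could be sketched roughly as follows. Both helpers, to_https() and read_all(), are hypothetical names introduced here for illustration and are not part of any package. The sketch assumes the only problem with an old link is the http:// scheme.

```r
# Hypothetical helper: upgrade a plain-http link to https.
to_https <- function(url) sub("^http://", "https://", url)

# Hypothetical helper: read every file in `urls` with `reader`
# and return the results as a list named by URL.
read_all <- function(urls, reader = utils::read.csv) {
  stats::setNames(lapply(urls, reader), urls)
}

# Usage (needs network access, so it is left commented out here):
# seasons <- read_all(to_https(download_urls[1:4]), reader = readr::read_csv)
```

Passing the reader as an argument keeps the helper independent of readr, so data.table::fread could be swapped in the same way.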
I have several files in a folder that look like "blabla_A1_bla.txt", "blabla_A1_bla.phd","blabla_B1_bla.txt", "blablabla_B1_bla.phd"...and all the way to H12.
Then I have a df that indicates which sample is each one.
well  sample
A1    F32-1
B1    F13-3
C1    B11-4
...   ...
I want to rename the files in the folder according to the table, so that A1 gets replaced by F32-1, B1 by F13-3 and so on.
I have created a list of all the files in the directory with files<-list.files(directory). I know how to use the str_replace function of the stringr package to change them one by one, but I don't know how to make it automatic. I guess I need a loop that reads cell 1,1 of the dataframe, searches that string in "files" and replaces it with the value in cell 1,2. And then moves to cell 2,1 and so on. But I don't know how to code this. (Or if there is a better way to do it).
I'll appreciate your help with this.
You can create a named vector (the names are the patterns, the values are the replacements) and pass it to str_replace_all:
files <- list.files(directory)
files <- stringr::str_replace_all(files, setNames(df$sample, df$well))
Using a reproducible example -
df <- structure(list(well = c("A1", "B1", "C1"), sample = c("F32-1",
"F13-3", "B11-4")), class = "data.frame", row.names = c(NA, -3L))
files <- c("blabla_A1_bla.txt", "blabla_A1_bla.phd","blabla_B1_bla.txt", "blablabla_B1_bla.phd")
stringr::str_replace_all(files, setNames(df$sample, df$well))
#[1] "blabla_F32-1_bla.txt" "blabla_F32-1_bla.phd" "blabla_F13-3_bla.txt"
#[4] "blablabla_F13-3_bla.phd"
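One caveat, since the wells run up to H12: a bare pattern such as "A1" also matches inside "A10", "A11" and "A12". A minimal base R sketch that avoids this is given below. It assumes the well code is always surrounded by underscores, as in the example file names. rename_wells() is a hypothetical helper name, and the final file.rename() call is left commented out because it changes files on disk.

```r
# Hypothetical helper: replace "_<well>_" by "_<sample>_" in each file name.
# Including the underscores means "A1" cannot match inside "A10".
rename_wells <- function(files, well, sample) {
  for (k in seq_along(well)) {
    files <- sub(paste0("_", well[k], "_"),
                 paste0("_", sample[k], "_"),
                 files, fixed = TRUE)
  }
  files
}

df <- data.frame(well = c("A1", "A10"), sample = c("F32-1", "F13-3"))
files <- c("blabla_A1_bla.txt", "blabla_A10_bla.txt")
new_files <- rename_wells(files, df$well, df$sample)
new_files

# To rename on disk (assumes `directory` holds the folder path):
# file.rename(file.path(directory, files), file.path(directory, new_files))
```

Checking new_files before calling file.rename() is a cheap way to catch a wrong mapping.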
I would first create a vector of new names and then use the function file.rename:
files = c("blabla_A1_bla.phd","blabla_B1_bla.txt", "blablabla_B1_bla.phd")
patterns = c('A1', 'B1')
replace = c('F22', 'G22')
new.name = c()
for (f in files){
# first identify which pattern corresponds to file f (is it A1, B1, ...)
which.pattern = which(sapply(patterns, grepl, x = f))
# and then replace it by the correct string
new.name = c(new.name, gsub(patterns[which.pattern], replace[which.pattern], f))
}
file.rename(files, new.name)
Replacing patterns and replace with df$well and df$sample should work for your case.
I am new to R so thank you in advance for your patience.
I would like to create a multiple choice quiz in R using the learnr package (the quiz content is not about r code). I have all of the questions, response options, and correct answers in a spreadsheet. Since my item bank has over 100 items, I will give a simpler example
Stem<-c("stem1", "stem2", "stem3")
OptionA <- c("a1", "a2", "a3")
OptionB<- c("b1", "b2", "b3")
OptionC<- c("c1", "c2", "c3")
Correct<- c("c1", "b2", "a3")
items<-cbind(Stem, OptionA, OptionB, OptionC, Correct)
Currently, the only way I know how to pull in the data from the spreadsheet is like this:
learnr::question(items$Stem[1],
answer(items$OptionA[1]),
answer(items$OptionB[1]),
answer(items$OptionC[1], correct = TRUE),
answer(items$OptionD[1])
)
however this still requires me to write that chunk of code for each item and manually assign the correct answers. Does anyone know an easier way of doing this, either with learnr or another package?
You can simply loop over the rows of your data or spreadsheet and use a function to set up the questions and save them in a list. My approach uses purrr::map, but a simple for-loop will also do the trick. Try this:
---
title: "Tutorial"
output: learnr::tutorial
runtime: shiny_prerendered
---
```{r setup, include=FALSE}
library(learnr)
library(dplyr)
library(purrr)
knitr::opts_chunk$set(echo = FALSE)
```
```{r}
Stem<-c("stem1", "stem2", "stem3")
OptionA <- c("a1", "a2", "a3")
OptionB<- c("b1", "b2", "b3")
OptionC<- c("c1", "c2", "c3")
Correct<- c("c1", "b2", "a3")
items<-data.frame(Stem, OptionA, OptionB, OptionC, Correct)
```
## Topic 1
### Quiz
```{r quiz}
make_q <- function(x) {
question(x$Stem,
answer(x$OptionA, correct = x$Correct == x$OptionA),
answer(x$OptionB, correct = x$Correct == x$OptionB),
answer(x$OptionC, correct = x$Correct == x$OptionC))
}
questions <- items %>%
split(.$Stem) %>%
purrr::map(make_q)
```
```{r}
quiz(
questions[[1]],
questions[[2]],
questions[[3]])
```
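Two things may matter for a bank of 100+ items: split(.$Stem) orders the items by stem text (and groups rows that share a stem), and listing questions[[1]], questions[[2]], ... by hand does not scale. A minimal sketch of a row-wise split that keeps spreadsheet order is shown below. The commented lines assume learnr is loaded and make_q() is defined as above, and that quiz() accepts the questions through its `...` argument.

```r
items <- data.frame(
  Stem    = c("stem2", "stem1", "stem3"),
  OptionA = c("a1", "a2", "a3"),
  Correct = c("a1", "a2", "a3"),
  stringsAsFactors = FALSE
)

# One single-row data frame per item, kept in spreadsheet order
rows <- split(items, seq_len(nrow(items)))
stems_in_order <- vapply(rows, function(x) x$Stem, character(1))
stems_in_order

# In the tutorial itself:
# questions <- lapply(rows, make_q)
# do.call(quiz, unname(questions))
```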
I'm trying to build a table using either pandoc.table or kable and can't get them to print all 10 rows of my table; at the moment both print only the first six. I've switched to writing the table manually, which works, but it would be nice to know what's wrong with my code. I haven't seen anything to suggest that 6 rows is the limit, so my code should be working. Does anyone know why it doesn't? If I subset the dt I can print the last 4 as well, so maybe 6 rows is a limit. Code below:
library("data.table")
library("knitr")
library("pander")
count.mark <- 35
dt.tbl1 <- data.table(Var = c("Geo", "A", "A",
"Cust", "A",
"Ins", "A",
"Vei", "A",
"Brand"),
RangeR = c("A1", "S1", "T1",
"Com", "Pri",
"T", "B",
"Pa", "Pe",
paste("A1 - A99 (",
count.mark, ")", sep="")
)
)
pandoc.table(head(dt.tbl1), justify = c("left", "centre"))
kable(head(dt.tbl1), justify = c("left", "centre"))
That's because you're using head(dt.tbl1), which by default shows the first six rows. You should just do, e.g.
pandoc.table(dt.tbl1, justify = c("left", "centre"))
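A quick way to see the effect of head() in isolation, using a throwaway data frame (the `df` below is made up for illustration):

```r
df <- data.frame(id = 1:10)

n_default <- nrow(head(df))          # head() keeps n = 6 rows by default
n_all     <- nrow(head(df, n = 10))  # ask for more rows explicitly
c(n_default, n_all, nrow(df))
```

As far as I know, kable() controls column alignment with align (e.g. align = c("l", "c")) rather than justify, so that call may need adjusting as well.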
I'm rather new to R and I guess more than one thing in my code is bad practice (like using a for loop). I think this example could be solved better with something from the apply family, but I would have no idea how to do that in my original problem, so, if possible, please let the for-loop be a for-loop. If something else is bad, I'm happy to hear your opinion.
But my real problem is this. I have:
name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, value)
df
And I want to print it out. I use sink as well as hwriter (to output it as HTML) later on. I get the problem with both, so I hope it has the same cause and it's enough to solve it for sink. This is the code:
sink("stuff.txt")
for (i in 1:nrow(df)) {
cat(" Name:")
cat(df$name[i-1])
cat("\n")
cat(" Class:")
cat(df$class[i-1])
cat("\n")
}
sink()
file.show("stuff.txt")
Part of the output I get is something like:
Name:1
Class:1
Name:1
Class:2
Name:1
Class:2
On the other hand, the output I want should be like:
Name:a
Class:c1
Name:a
Class:c2
Name:a
Class:c2
The reason cat was printing numbers is that your character variables were converted to factors when you put them in the data.frame. This was the default behavior of data.frame() before R 4.0.0 (newer versions default to stringsAsFactors = FALSE). A factor stores each distinct string as an integer code, which is often more efficient, and cat prints those underlying codes. That's why you see numbers when you cat the value.
If you don't want to use factors in your data.frame, you can use
df <- data.frame(name, class, value, stringsAsFactors = FALSE)
and this will keep the values as characters. Alternatively, you can convert to character when you print
cat(as.character(df$name[i-1]))
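Putting this together, here is a minimal sketch of the loop on a shortened version of the data. print_rows() is a hypothetical wrapper added so the output can be captured. Note that it indexes with i rather than i - 1: at i = 1, df$name[0] selects nothing, so the first pass would print an empty name and the last row would never be printed.

```r
df <- data.frame(
  name  = c("a", "a", "b"),
  class = c("c1", "c2", "c1"),
  stringsAsFactors = FALSE  # keep strings as character, not factor
)

# cat() on a factor prints the integer codes, not the labels
codes <- capture.output(cat(factor(c("a", "b"))))

# Hypothetical wrapper around the original for-loop
print_rows <- function(df) {
  for (i in seq_len(nrow(df))) {
    cat(" Name:", df$name[i], "\n", sep = "")
    cat(" Class:", df$class[i], "\n", sep = "")
  }
}

print_rows(df)
```

seq_len(nrow(df)) is used instead of 1:nrow(df) so that an empty data frame yields zero iterations.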