I am trying to import this table:
Source: https://www.tradeskillmaster.com/black-market?realm=EU-draenor
However, upon using =IMPORTHTML("https://www.tradeskillmaster.com/black-market?realm=EU-draenor","table"), the first column remains blank.
Test: https://docs.google.com/spreadsheets/d/1MVgD5MUgOik89MMZweLKMZQLSkMeBNoCN_FrPe-eZ5U/edit?usp=sharing
If the table contains JavaScript-rendered elements (which it does in your case), those elements cannot be imported into Google Sheets with any formula. Scraping JavaScript-generated content is simply not supported in Google Sheets.
How about this answer? Please think of this as just one of several possible answers.
In this answer, IMPORTXML is used instead of IMPORTHTML.
Sample formula:
={QUERY(IMPORTXML(A1,"//tr | //td/a/@title"),"SELECT Col1 WHERE Col1 IS NOT NULL"),QUERY(IMPORTXML(A1,"//tr"),"SELECT Col2,Col3,Col4,Col5,Col6,Col7,Col8")}
The URL https://www.tradeskillmaster.com/black-market?realm=EU-draenor is put in cell "A1".
Column "A" is retrieved from QUERY(IMPORTXML(A1,"//tr | //td/a/@title"),"SELECT Col1 WHERE Col1 IS NOT NULL"). In this case, the XPath expressions //tr and //td/a/@title return the header value and the item titles of column "A", respectively.
Columns "B" to "H" are retrieved from QUERY(IMPORTXML(A1,"//tr"),"SELECT Col2,Col3,Col4,Col5,Col6,Col7,Col8"). In this case, the 1st column is removed.
Result:
Reference:
IMPORTXML
If I misunderstood your question and this was not the direction you want, I apologize.
I want to import the table from https://www.pro-football-reference.com/years/2020/draft.htm into a Google Sheet. However, I'm trying to avoid pulling in null cells as well as information I already have in other sheets. Here are my questions:
The only columns I want are Round (Col1), Pick (Col2), and Player (Col4). I've tried using IMPORTHTML and so far, all I can do is grab the whole table.
I want to create a new column called 'Rd.Pick' which would convert the Pick column into the pick's position within its round. For example, Pick 33 would display as 2.1.
Finally, I would like to remove the rows that appear between the last pick of a round and the first pick of the following round. I'm not sure how to do that, given that the text in those rows matches the header row.
This is just to answer the question from your comment above - how to convert the sequential draft pick number to a number like 3.12, 12th pick in the 3rd round.
This formula is a bit brute force, but it works:
={"Round-Pick";
ArrayFormula(ifna(ifs(
D2:D=1,"1."& text(E2:E,"00"),
D2:D=2,"2."& text(E2:E-max(filter(D$2:E,D$2:D=1)),"00"),
D2:D=3,"3."& text(E2:E-max(filter(D$2:E,D$2:D=2)),"00"),
D2:D=4,"4."& text(E2:E-max(filter(D$2:E,D$2:D=3)),"00"),
D2:D=5,"5."& text(E2:E-max(filter(D$2:E,D$2:D=4)),"00"),
D2:D=6,"6."& text(E2:E-max(filter(D$2:E,D$2:D=5)),"00"),
D2:D=7,"7."& text(E2:E-max(filter(D$2:E,D$2:D=6)),"00")
),""))}
If you put that in NFLDraft!F1, it should do what you want. You could then hide Column E if you like.
UPDATED: To provide the format you've requested, with leading zero.
try:
=ARRAYFORMULA(QUERY({
QUERY(IMPORTHTML("https://www.pro-football-reference.com/years/2020/draft.htm",
"table", 1), "select Col4"),
QUERY(IMPORTHTML("https://www.pro-football-reference.com/years/2020/draft.htm",
"table", 1), "select Col1")&"."&
QUERY(IMPORTHTML("https://www.pro-football-reference.com/years/2020/draft.htm",
"table", 1), "select Col2")}, "where not Col2 matches '\.'", 1))
Update 2020-5-14
Working with a different but similar dataset from here, I found read_csv seems to work fine. I haven't tried it with the original data yet though.
Although the replies didn't help solve the problem because my question was not correct, Shan's reply fits the original question I posted the most, so I accepted his answer.
Update 2020-5-12
I think my original question was not correct. As mentioned in the comments, the data was quoted. Although changing the separator made row 11582 in R look the same as row 11583 in Excel, that doesn't mean it's "right". Maybe some line breaks are misplaced because of an inappropriate encoding or something similar, causing some of the columns to be displaced. If I open the data with Notepad++, the instance at row 11583 in Excel is at row 11596.
Original question
I am trying to read the listings.csv from this dataset on Kaggle into R. I downloaded the file and wrote the code read.csv('listing.csv'). The first column, the column id, is supposed to be numeric. However, it shows:
listing$id[1:10]
[1] 2015 2695 3176 3309 7071 9991 14325 16401 16644 17409
13129 Levels: Ole Berl穩n!,16736423,Nerea,Mitte,Parkviertel,52.55554132116211,13.340658248460871,Entire home/apt,36,6,3,2018-01-26,0.16,1,279\n17312576,Great 2 floor apartment near Friederich Str MITTE,116829651,Selin,Mitte,Alexanderplatz,52.52349354926847,13.391003496971203,Entire home/apt,170,3,31,2018-10-13,1.63,1,92\n17316675,80簡 m of charm in 3 rooms with office space,116862833,Jon,Neuk繹lln,Schillerpromenade,52.47499080234379,13.427509313575928...
I think it is because there are values with commas in the second column. For example, opening the file with Microsoft Excel, I can see that one of the values in the second column is Ole,Ole...:
How can I read a csv file into R correctly when some values contain commas?
Since you have access to the data in Excel, you can 'Save As' in Excel with a separator other than the comma (,). First go to Control Panel -> Region and Language -> Additional settings, where you can change the "List separator". The most common one other than the comma is the pipe symbol (|). In R, specify that separator when reading the file, for example read.csv(..., sep = "|") or readr's read_delim(..., delim = "|").
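A minimal sketch of reading a pipe-separated file, assuming the export step above succeeded. The inline sample is made up for illustration and stands in for the re-saved listings file:

```r
# Made-up pipe-separated sample (an assumption, not the Kaggle data).
pipe_text <- "id|name\n2015|Ole,Ole apartment\n2695|Plain name"

# With sep = "|", the comma inside the name is treated as ordinary text.
listings <- read.csv(text = pipe_text, sep = "|", stringsAsFactors = FALSE)

str(listings)
```

For a file on disk, the text argument would be replaced by the file path.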
You could try this?
listings <- read.csv("listings.csv", stringsAsFactors = FALSE)
listings$name <- gsub(",", "", listings$name)  # removes the commas in the name column
If you don't need the information in the second column, then you can always delete it (in Excel) before importing into R. The read.csv function, which calls scan, can also omit unwanted columns using the colClasses argument. However, the fread function from the data.table package does this much more simply with the drop argument:
library(data.table)
listings <- fread("listings.csv", drop=2)
If you do need the information in that column, then other methods are needed (see other solutions).
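For the read.csv route mentioned above, here is a minimal sketch: a colClasses entry of "NULL" skips that column. The three-column inline sample is made up for illustration and stands in for listings.csv.

```r
# Made-up sample standing in for listings.csv (an assumption, not the Kaggle data).
csv_text <- 'id,name,host_id
2015,"Ole,Ole apartment",2217
2695,Plain name,2986'

# "NULL" drops a column; NA lets read.csv guess the type.
listings <- read.csv(text = csv_text,
                     colClasses = c(NA, "NULL", NA),
                     stringsAsFactors = FALSE)

str(listings)  # two columns remain: id and host_id
```

Note that the quoted comma in the name field does not split the row, because read.csv treats double quotes as field quotes by default.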
I have a csv file that contains the iPhone device roadmap: version number, name of model, release of model, price, etc. I have done the following:
I imported the data set in RStudio into a variable named iphonedetail with the following command: iphonedetail <- read.csv("iphodedata.csv")
Then I changed the attribute "name of model" to character with the following: iphonedetail$nameofmodel <- as.character(iphonedetail$nameofmodel)
Now I need to access the first 5 model names and store them in a vector.
I tried this to achieve it: iphonesubset <- data.frame(iphonedetail$nameofmodel)
Then I typed iphonesubset on the console, but it gave 0 columns and 0 rows.
Could someone tell me whether the first 2 steps are correct, and also suggest how to fix the 3rd step?
If you want to extract the first five (not necessarily unique):
iphonedf1to5 <- df[1:5,]
That means that you get the first 5 rows and all columns. Then if you want to get the unique first five elements it should be like:
iphonedf1to5 <- unique(df[1:5,])
Edit:
Here df is the data frame returned by read.csv, i.e. iphonedetail in your case.
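Since the question asks for a vector of model names and not whole rows, a minimal sketch of that variant could look like this. The small data frame is a hypothetical stand-in for the result of read.csv, and first_five is a name introduced only for illustration:

```r
# Hypothetical stand-in for the data frame returned by read.csv.
iphonedetail <- data.frame(
  nameofmodel = c("iPhone", "iPhone 3G", "iPhone 3GS",
                  "iPhone 4", "iPhone 4S", "iPhone 5"),
  version = 1:6,
  stringsAsFactors = TRUE
)

# Convert the factor column to character, then take the first five values.
iphonedetail$nameofmodel <- as.character(iphonedetail$nameofmodel)
first_five <- head(iphonedetail$nameofmodel, 5)

first_five  # a character vector of length 5
```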
I need to change a specific column name in R. I used the following command but it did not change anything.
colnames(mydata[3])<-"newname"
"3" is the column number
Use colnames(mydata)[3] <- "newname". colnames(mydata) returns the vector of names, so just move your [3] outside the call.
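A minimal sketch of the corrected form, using a made-up three-column data frame in place of mydata:

```r
# Made-up example data standing in for mydata.
mydata <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Index the vector of names, not the data frame.
colnames(mydata)[3] <- "newname"

colnames(mydata)
```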
I'm trying to add a header called "TYPE" to the first column.
this is the result I get:
this is the result I want:
thanks
I am not sure whether my answer will be useful, but I faced the same problem and solved it in the following way.
library(tidyverse)
rownames_to_column(mtcars,var = "Type") -> df
df
Thanks!!...
If I understand correctly, you want to give the row names a header. There is no way to do this unless you extract the row names and make a column out of them.
You seem to think the row names are a column. Once they are an actual column, you can name it:
colnames(yourtable)[1] <- "Type"
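A minimal base R sketch of turning the row names into a named first column, without extra packages. It uses the built-in mtcars data set as a stand-in for the asker's table:

```r
# mtcars ships with R; its car names are stored as row names.
# row.names = NULL replaces them with plain row numbers in the result.
df <- data.frame(Type = rownames(mtcars), mtcars, row.names = NULL)

head(df[, 1:3])
```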