strsplit function in R

I'm currently working through some coursework where data has been provided on supermarket chip sales. Part of the task is to remove any entries where the product is not chips, and we have been given this code to help:
productWords <- data.table(unlist(strsplit(unique(transaction_data[, "PROD_NAME"]), "")))
The data file provided is transaction_data, and PROD_NAME is the column we're interested in.
This, however, returns the error:
Error in strsplit(unique(transaction_data[, "PROD_NAME"]), "") : non-character argument
Can someone please explain what I'm doing wrong, or what I'm missing? I'm not sure how this code could tell one product from another. Am I meant to add something to the code based on product names I've seen while looking through the data?
Here are some lines of the data as an example:
  DATE       STORE_NBR LYLTY_CARD_NBR TXN_ID PROD_NBR PROD_NAME                             PROD_QTY TOT_SALES
  <date>         <dbl>          <dbl>  <dbl>    <dbl> <chr>                                    <dbl>     <dbl>
1 2018-10-17         1           1000      1        5 Natural Chip Compny SeaSalt175g              2       6
2 2019-05-14         1           1307    348       66 CCs Nacho Cheese 175g                        3       6.3
3 2019-05-20         1           1343    383       61 Smiths Crinkle Cut Chips Chicken 170g        2       2.9
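A possible explanation, offered as a sketch rather than a confirmed answer: the <date>/<dbl>/<chr> header row suggests transaction_data is a tibble, and single-bracket indexing of a tibble returns another tibble rather than a character vector, which strsplit rejects. The example below uses a small base R data frame as a hypothetical stand-in (the three product names come from the sample rows) and extracts the column with [[ ]]. It also splits on a space, on the assumption that words rather than single characters are wanted; splitting on "" yields individual characters.

```r
# Hypothetical stand-in for transaction_data, built from the sample rows
transaction_data <- data.frame(
  PROD_NAME = c("Natural Chip Compny SeaSalt175g",
                "CCs Nacho Cheese 175g",
                "Smiths Crinkle Cut Chips Chicken 170g"),
  stringsAsFactors = FALSE
)

# [[ ]] returns the column as a plain character vector,
# whether the object is a data.frame, a tibble or a data.table
prod_names <- unique(transaction_data[["PROD_NAME"]])

# Split on a space to get words (assumption: words are what the task needs)
product_words <- unlist(strsplit(prod_names, " "))
```

The resulting character vector can then be wrapped in data.table() as in the coursework code, if the data.table package is loaded.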

Related

Why does dplyr put quotes around my variables

I'm using the "pivot_wider" function (also tried "spread") from tidyr to rearrange some data. But when I use the function, the output variables get weird backticks (` `) around them (shown below). I don't see any mention of this in the documentation and I haven't found anything on it on StackExchange. I've tried changing the variables to num and int before running the function, but it doesn't seem to have any effect on the backticks.
The problem is that if I want to do any operations on the new variables, I now need to write it as `2014`, which gets old fast. Am I doing something wrong, or is there something I need to do to my data before/after I run this?
Original data
     ID group   var
  <dbl> <chr> <dbl>
1  4548 2014     18
2  4549 2014     19
3  4550 2015     20
pivot_wider(names_from=group, values_from=var)
Output
     ID `2014` `2015`
  <dbl>  <dbl>  <dbl>
1  4548     18     NA
2  4549     19     NA
3  4550     NA     20
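One likely reading, given as a hedged sketch: the backticks are not part of the column names. A name such as 2014 starts with a digit, so it is not a syntactic R name, and the tibble printout wraps it in backticks to show how it must be referenced. Giving the columns a prefix avoids the need for backticks. The base R example below rebuilds the wide table by hand (values copied from the printed result above) and renames the columns; the "y" prefix is an arbitrary choice.

```r
# Wide table rebuilt by hand; check.names = FALSE keeps the numeric-looking names
wide <- data.frame(
  ID = c(4548, 4549, 4550),
  `2014` = c(18, 19, NA),
  `2015` = c(NA, NA, 20),
  check.names = FALSE
)

# Non-syntactic names must be quoted with backticks when used in code
col_2014 <- wide$`2014`

# Adding a prefix makes the names syntactic, so no backticks are needed
renamed <- wide
names(renamed)[-1] <- paste0("y", names(renamed)[-1])
col_y2014 <- renamed$y2014
```

In tidyr, the names_prefix argument of pivot_wider can add such a prefix during the pivot itself.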

How to view all rows of an output that's not in table form

Problem
I started with an ungrouped data set, which I then grouped. The output of the grouping, however, does not show all 427 rows. I need the full output in order to put the data into a table.
Initially the data was ungrouped and looked as follows:
  Occupation Education Age Died
1 household  Secondary  39 no
2 farming    primary    83 yes
3 farming    primary    60 yes
4 farming    primary    73 yes
5 farming    Secondary  51 no
6 farming    iliterate  62 yes
The data is then grouped as follows:
occu %>% group_by(Occupation, Died, Age) %>% count()  # use this to group on the occupation of the suicide victims
which gives the following output:
   Occupation       Died    Age     n
   <fct>            <fct> <int> <int>
 1 business/service no       20     2
 2 business/service no       30     1
 3 business/service no       31     2
 4 business/service no       34     1
 5 business/service no       36     2
 6 business/service no       41     1
 7 business/service no       44     1
 8 business/service no       46     1
 9 business/service no       84     1
10 business/service yes      21     1
# ... with 417 more rows
The problem is that I need all the rows in order to put the grouped data into a table using:
dt <- read.table(text="full output from above")
If any more code would be useful for solving this, let me know.
It is not really clear what you want, but try the following code:
counts <- occu %>% group_by(Occupation, Died, Age) %>% count()
dt <- as.data.frame(counts)
It seems you simply want to convert the tibble to a data frame. There is no need to print all the output and then copy-paste it into read.table().
If you need to, you can also save your output with write.table(dt, "filename.txt"), which will create a .txt file with your data.
If what you really want is to print the whole tibble in the console, you can use the following code, as suggested by Akrun's link:
options(dplyr.print_max = 1e9)
This allows R to print the full tibble to the console, although I don't think that is an efficient way to do what you are asking.
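As a small illustration of the write.table() route mentioned above, here is a sketch in base R. The three-row dt below is a hypothetical stand-in for the real grouped counts, and a temporary file is used in place of "filename.txt". Writing the data frame and reading it back keeps every row, with no printing or copy-pasting involved.

```r
# Hypothetical stand-in for the grouped counts (the real dt has 427 rows)
dt <- data.frame(
  Occupation = c("business/service", "business/service", "farming"),
  Died = c("no", "yes", "yes"),
  Age = c(20L, 21L, 60L),
  n = c(2L, 1L, 3L),
  stringsAsFactors = FALSE
)

# Write to a temporary text file, then read it back with the header row
out_file <- tempfile(fileext = ".txt")
write.table(dt, out_file, row.names = FALSE)
dt_back <- read.table(out_file, header = TRUE)

# All rows survive the round trip
rows_written <- nrow(dt)
rows_read <- nrow(dt_back)
```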

Obtaining data from Spotify Top Charts using spotifyr

I'm trying to obtain the audio features for the top 200 charts for all of 2017 using the spotifyr package in R. I tried:
days <- spotifycharts::chartdaily()
for (i in days) {
  spotifycharts::chart_top200_daily(region = "global", days = "days[i]")
}
to obtain the top 200 daily for all of 2017, but I was unable to do it.
Can someone help me? :(
It works if you turn days from a tibble into a vector:
days <- unlist(chart_daily())
lapply(days[1:3], function(i) chart_top200_daily("global", days = i))
But it parses the data badly, so there will be problems with variable names, etc.:
# A tibble: 6 x 5
x1 x2 x3 note.that.these.figures.are.generated.… x5
<int> <chr> <chr> <int> <chr>
1 NA Track Name Artist NA URL
2 1 thank u, next Ariana… 8293841 https://open.spoti…
3 2 Taki Taki (with S… DJ Sna… 5467625 https://open.spoti…
4 3 MIA (feat. Drake) Bad Bu… 3955367 https://open.spoti…
5 4 Happier Marshm… 3357435 https://open.spoti…
6 5 BAD XXXTEN… 3131745 https://open.spoti…
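To illustrate why converting to a vector helps, here is a hedged base R sketch. It assumes chart_daily() returns a one-column table of dates; the data frame and its column name below are hypothetical stand-ins, not output from spotifycharts. A for loop over a data frame (or tibble) visits its columns, so the body runs once with the whole column, whereas a loop over the unlisted vector visits each date. Note also that "days[i]" in quotes is a literal string, not the value of days[i].

```r
# Hypothetical one-column table standing in for the result of chart_daily()
days_tbl <- data.frame(
  days = c("2017-01-01", "2017-01-02", "2017-01-03"),
  stringsAsFactors = FALSE
)

# Looping over a data frame iterates over its columns: one iteration here
n_iter_table <- 0
for (i in days_tbl) {
  n_iter_table <- n_iter_table + 1
}

# After unlist(), the loop visits each date in turn: three iterations
days <- unlist(days_tbl, use.names = FALSE)
n_iter_vector <- 0
for (d in days) {
  n_iter_vector <- n_iter_vector + 1
}
```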

How to read in Data from Messy Excel Books

I've been dealing with patient and financial data from a hospital. The data is stored in .xlsx Excel workbooks. There are multiple pages within each sheet, stretching horizontally and vertically. Some of the columns have neatly defined names, as you would want for R, but others do not, or have text in between, seemingly at random. At times a section has a title that is the result of multiple rows being formatted into a single row.
Unfortunately, I cannot show the data due to confidentiality. Is there any way around this when the data is far from being in a tidy format?
So far I have been copying and pasting the data into a new CSV.
While this was effective, I felt that it was largely inefficient. Is this the best approach to take?
Help would be much appreciated.
Thanks
EDIT
As I cannot show the data, this is the best I can do.
Hi @Paul
Let me give a rough example:
          Jan Feb March April
Income X    1   2     3     4
Income Y    2   4     4     6
Expenditure
          Jan Feb March April    Another table here also
Expense     1   3     5     7
Expense     5   6     7     8
(Excel Bar chart)
Look at the readxl package; the range option might be what you're looking for:
library(readxl)
df1 <- read_xlsx("C:\\Users\\...\\Desktop\\Book1.xlsx", range = "A1:D3")
# # A tibble: 2 x 4
# Jan Feb March April
# <dbl> <dbl> <dbl> <dbl>
# 1 1 3 5 7
# 2 5 6 7 8
df2 <- read_xlsx("C:\\Users\\...\\Desktop\\Book1.xlsx", range = "B6:E8")
# # A tibble: 2 x 4
# Jan Feb March April
# <dbl> <dbl> <dbl> <dbl>
# 1 1 3 5 7
# 2 5 6 7 8

dplyr object not found error [closed]

I am not quite sure why this piece of code isn't working.
Here's what my data looks like:
head(test)
  Fiscal.Year Fiscal.Quarter Seller   Product.Revenue Product.Quantity Product.Family Sales.Level.1 Group          Fiscal.Week
1        2015 2015Q3         ABCD1234            4000                4 Paper cup      Americas      Paper Division          32
2        2014 2014Q1         DDH1234              300                5 Paper tissue   Asia Pacific  Paper Division          33
3        2015 2015Q1         PNS1234              298                6 Spoons         EMEA          Cutlery                 34
4        2016 2016Q4         CCC1234              289                7 Knives         Africa        Cutlery                 33
Now, my objective is to summarize revenue by year.
Here's the dplyr code I wrote:
test %>%
  group_by(Fiscal.Year) %>%
  select(Seller, Product.Family, Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)
This doesn't work. I get the error:
Error: object 'Product.Revenue' not found
However, when I get rid of the select statement, it works, but then I don't get to see Seller and Product.Family in the output.
test %>%
  group_by(Fiscal.Year) %>%
  # select(Seller, Product.Family, Fiscal.Year) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)
The output is :
# A tibble: 3 x 2
Fiscal.Year Rev1
<dbl> <dbl>
1 2014 300
2 2015 4298
3 2016 289
This works well.
Any idea what's going on? It's been about 3 weeks since I started programming in R. So, I'd appreciate your thoughts. I am following this guide: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
Also, I looked at similar threads on SO, but I believe they related to issues caused by the "+" sign: Error in dplyr group_by function, object not found
I am looking for the following output:
Fiscal.Year Rev1 Product Family Seller
<dbl> <dbl> ... ...
1 2014 ...
2 2015 ...
3 2016 ...
Many thanks
Ok. This did the trick:
test %>%
  group_by(Fiscal.Year, Seller, Product.Family) %>%
  summarise(Rev1 = sum(Product.Revenue)) %>%
  arrange(Fiscal.Year)
Output:
Source: local data frame [4 x 4]
Groups: Fiscal.Year, Seller [4]
Fiscal.Year Seller Product.Family Rev1
<dbl> <chr> <chr> <dbl>
1 2014 DDH1234 Paper tissue 300
2 2015 ABCD1234 Paper cup 4000
3 2015 PNS1234 Spoons 298
4 2016 CCC1234 Knives 289
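The reason the first attempt failed, as far as the code shows: select() kept only Seller, Product.Family and Fiscal.Year, so Product.Revenue was no longer present when summarise() ran. The sketch below reproduces that with base R equivalents (column subsetting and aggregate() standing in for select() and summarise(), so it runs without dplyr) on the four sample rows from the question.

```r
# The four sample rows from the question, reduced to the relevant columns
test <- data.frame(
  Fiscal.Year = c(2015, 2014, 2015, 2016),
  Seller = c("ABCD1234", "DDH1234", "PNS1234", "CCC1234"),
  Product.Revenue = c(4000, 300, 298, 289),
  Product.Family = c("Paper cup", "Paper tissue", "Spoons", "Knives"),
  stringsAsFactors = FALSE
)

# Base R stand-in for select(Seller, Product.Family, Fiscal.Year):
# the revenue column is dropped, so nothing downstream can sum it
selected <- test[, c("Seller", "Product.Family", "Fiscal.Year")]
has_revenue <- "Product.Revenue" %in% names(selected)

# Summing on the full data works, either by year alone ...
by_year <- aggregate(Product.Revenue ~ Fiscal.Year, data = test, FUN = sum)

# ... or by year, seller and product family, as in the fix above
by_group <- aggregate(Product.Revenue ~ Fiscal.Year + Seller + Product.Family,
                      data = test, FUN = sum)
```

Grouping by all three columns keeps Seller and Product.Family in the result, at the cost of no longer having a single total per year.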
