Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 5 years ago.
Good evening, I'd just like to start off by saying I'm the biggest newbie to coding. I've gone through so many tutorials just to try and make a simple football/soccer data frame.
I currently have something like this:
Home Team | Away Team | Home Goals | Away Goals
M.United  | Liverpool | 0          | 0
I have that for all results of the season so far.
What I'd like to do is get the mean of the Home Goals column and the Away Goals column.
Also, if it's doable, I'd like to filter a specific team and see their average goals scored at home, their average goals conceded at home, and so on.
Thanks in advance, and apologies for my total utter noobism.
Jay.
You can use the dplyr package with something like:
library(dplyr)

data %>%
  group_by(team) %>%
  summarise(mean_home = mean(home_goals), mean_away = mean(away_goals))
(I am pretty sure that will work, but it is hard to be certain without a reproducible example I can run: I do not know exactly what your variable and data set names are, so I cannot run your code as it is. A great resource for building one is the reprex package.)
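For instance, assuming your columns are named Home.Team, Away.Team, Home.Goals and Away.Goals (pure guesses based on your table), a sketch would be:

library(dplyr)

# Overall means across every match of the season (assumed column names)
results %>%
  summarise(mean_home_goals = mean(Home.Goals),
            mean_away_goals = mean(Away.Goals))

# Average goals scored and conceded at home, per home team
results %>%
  group_by(Home.Team) %>%
  summarise(avg_scored_home   = mean(Home.Goals),
            avg_conceded_home = mean(Away.Goals))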
To obtain the mean, try:
summary(your_data_frame_name); it will give you basic statistics for each column, including Home.Goals.
Or, to just get the mean of one column:
mean(your_data_frame_name$Home.Goals)
To filter a specific team, look into the subset function. You can do:
M_united_home <- subset(your_data_frame_name, Home.Team == "M.United")
Then you can use this data frame to answer any further queries about Man United. If you want to do more, also look into the dplyr package.
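For instance, keeping to base R and the assumed column names above:

# Base R sketch; Home.Team, Home.Goals and Away.Goals are assumed column names
M_united_home <- subset(your_data_frame_name, Home.Team == "M.United")

mean(M_united_home$Home.Goals)  # average goals scored at home
mean(M_united_home$Away.Goals)  # average goals conceded at home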
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I am learning to use R for data cleaning work. I have just encountered a problem that I can deal with in Python but not in R.
The dataset looks like this: [screenshot of dataset]
I want to concatenate the first two columns and assign the result as an identifier/index. The first thing I need to do is forward-fill the first column (what pandas' fillna(method='ffill') does). Then I need to concatenate the two columns.
Could you tell me how to do this in R (tidyverse preferred)?
The result should look like this: [screenshot of result]
Thanks in advance!
Try these. Be sure to read the help pages since many of them have arguments which may need to be set depending on what you want.
zoo::na.locf (last observation carried forward)
zoo::na.locf0
tidyr::fill
data.table::nafill
zoo also has na.aggregate, na.approx, na.contiguous, na.fill, na.spline, na.StructTS and na.trim for other forms of NA filling, and tidyr also has replace_na.
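For example, a tidyverse sketch of the whole operation (forward-fill the first column, then paste the first two columns together as an identifier; col1 and col2 are placeholder names, since the real column names are only visible in the screenshot):

library(dplyr)
library(tidyr)

df %>%
  fill(col1, .direction = "down") %>%                  # carry the last non-NA value forward
  unite("id", col1, col2, sep = "_", remove = FALSE)   # concatenate the two columns into one

If you really need the combined column as row names rather than an ordinary column, tibble::column_to_rownames("id") can be chained on afterwards.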
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
So I have my first job as a data analyst; however, my boss wants me to use the data.table package and I'm having some issues with it.
My data set is about e-commerce shops, with total purchases and returns (client returns). I want to visualize in a barplot how many items were returned per product, denoted by the Product name column (I know having spaces in column names is a bit odd; I will change it later), so my code is as follows:
library(shiny)
library(ggplot2)
library(data.table)
library(tidyverse)

mainTable <- fread('returnStats.csv')
essentialReturnData <- mainTable[, c(7, 9)]
returnsByProductName <- essentialReturnData[,
  .(totalReturns = sum(essentialReturnData$`Return quantity`)),
  by = 'Product name']
barplot(table(returnsByProductName$`Product name`))
However, I'm only getting a data.table with the same sum value for all the Product names, shown in the image below:
Then, of course, I'm getting a barplot that looks like complete garbage:
There are two things wrong here:
Since you're asking for sum(essentialReturnData$`Return quantity`), which is a call to a different instance of the table, the sum is ignoring the by grouping. Use sum(`Return quantity`) instead, since this refers to the column within the same instance of the table.
table(returnsByProductName$`Product name`) is a frequency table for the product names, but returnsByProductName only has one row per name. You're not using the totalReturns at all! Use barplot(returnsByProductName$totalReturns, names.arg = returnsByProductName$`Product name`) instead.
Given how many products you have, you'll have problems fitting all the names on the axis in a nice way. You can do things like adding a las = 2 argument, which is passed to par() and makes the x-axis labels vertical. It's still going to look messy with that many products, however, and if the names are long then it doesn't leave much space for the plot itself, unless you make the plot size enormous.
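Putting both fixes together, the corrected code would look something like this (the column positions and backticked names are copied from your snippet, so treat them as assumptions):

library(data.table)

mainTable <- fread('returnStats.csv')
essentialReturnData <- mainTable[, c(7, 9)]

# Sum the column inside the same data.table expression so that `by` is respected
returnsByProductName <- essentialReturnData[,
  .(totalReturns = sum(`Return quantity`)),
  by = 'Product name']

# Plot the totals, labelled by product name; las = 2 rotates the labels
barplot(returnsByProductName$totalReturns,
        names.arg = returnsByProductName$`Product name`,
        las = 2)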
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have this data frame made in R:
Technical Junior Entry-Level Product Owner Technical Leader Senior
Professional 2,236 3,581 3,781 3,654 4,454
Communication 2,619 3,333 4,285 4,190 4,952
Innovation 3,625 4,208 4,500 4,000 4,791
And I want to plot something like this in R (I have made it in Excel):
[Radar plot made in Excel]
Please help me make this graph in R, because I failed to make it with ggplot, ggtern, plotrix, and other libraries I cannot manage properly.
I did it very easily using fmsb::radarchart(). Note I had to remove the commas first from your data, which I then read in as df. I also had to add rows for the min and max to reproduce your Excel plot.
library(fmsb)

df1 <- data.frame(t(df))
dfminmax <- rbind(rep(max(df1), 3), rep(min(df1), 3), df1)
radarchart(dfminmax)
You're going to want to tweak the settings to make it look better. Use ?radarchart to find out all the options.
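For example, a few of the options that tend to matter, with purely illustrative values (dfminmax is the data frame built above):

radarchart(dfminmax,
           axistype = 1,       # draw axis labels
           seg      = 4,       # number of grid segments
           pcol     = c("red", "blue", "darkgreen", "orange", "purple"),  # one colour per series
           plwd     = 2,       # line width
           cglcol   = "grey",  # grid line colour
           vlcex    = 0.8)     # size of the variable labels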
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
My question is rather general and not specific to Wikipedia only: I would like to know whether there is a way to automate the generation and selection of search results. To give an example of what I intend:
Let's say I'd like to write articles about American food, and I'd like to read information such as ingredients, texture, cuisine (country-wise), preparation methods, etc. about approximately 500 different American foods. Let's say these are all available on Wikipedia too, and I have an Excel sheet with the names of these dishes and columns specifying their properties. But I don't want to look up these dishes/food items manually; can I automate this process? I am looking for some general guidance, some open-source links, some pseudo-code or an algorithmic approach to this problem. Any help is appreciated.
Thanks.
P.S.: It'd be great if the answer had some links to help carry this out using R, since the other aspects of my project have already been built in R. Also, I'd like to broaden my searches to include other major information-gathering sites/search engines.
You can do it relatively quickly using the WikipediR package:
require(WikipediR)

phrs <- c("car", "house")
pgs <- vector("list", length(phrs))  # pre-allocate a list to hold the page contents
j <- 1
for (i in phrs) {
  pgs[[j]] <- page_content("en", "wikipedia", page_name = i, as_wikitext = TRUE)
  j <- j + 1
}
The solution rather fortuitously assumes that your food names correspond to the page names on Wikipedia. Most probably this won't be the case for all the items. You may consider using pages_in_category in order to source more pages at once. I presume I would first match my list against pages_in_category for a given category (foods) and, if the number of mismatches is insignificant, proceed to matching the data.
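A rough sketch of that matching step might look like this (the category name is an assumption, and the exact shape of the returned list depends on the arguments, so check ?pages_in_category):

library(WikipediR)

# Pull page titles from a Wikipedia category (category name is an assumption)
cat_pages <- pages_in_category("en", "wikipedia",
                               categories = "American cuisine",
                               properties = "title",
                               limit      = 500)

# Extract the titles and compare them with the dish names from the spreadsheet
titles    <- sapply(cat_pages$query$categorymembers, `[[`, "title")
my_dishes <- c("Hamburger", "Apple pie")   # placeholder for the names from the Excel sheet
matched   <- my_dishes[my_dishes %in% titles]
unmatched <- setdiff(my_dishes, titles)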
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am using the edgeR and limma packages to analyse an RNA-seq count data table.
I only need a subset of the data file, so my question is: do I need to normalize my data across all the samples, or is it better to subset my data first and then normalize?
Thank you.
Regards Lisanne
I think it depends on what you want to prove/show. If you also want to take into account your "dark counts", then you should normalize first, so that you also account for the fraction of cases in which your experiment fails. Here your total number of experiments (good and bad results) sums to one.
If you want to find out the distribution of your "good events", then you should first produce your subset of good samples and normalize afterwards. In this case your number of good events sums to 1.
So once again, it depends on what you want to prove. As a physicist, I would prefer the first method, since we do not remove bad data points.
Cheers TL