Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
This is probably a very basic question, but I'm just starting out with R and hope someone can help.
I've imported some data into R and created an object containing just the data I'm working on first:
Each of the values is from a scale of 1 to 10.
What I want to produce is a chart showing the mean of each column, something like this (which I did in Excel):
I'm sure this is possible, but I'm going round in circles figuring it out! Ignoring the vertical line (at maximum value) and standard deviations for now, though ultimately I'd like to have them included. Thank you!
set.seed(42)
dat <- setNames(data.frame(replicate(4, sample(10, 50, replace=TRUE))), c("2000", "2400", "2800", "3200"))
head(dat)
# 2000 2400 2800 3200
# 1 1 6 5 1
# 2 5 6 9 1
# 3 1 2 10 5
# 4 9 4 8 3
# 5 10 3 7 10
# 6 4 6 6 1
library(dplyr)
library(tidyr) # pivot_longer
library(ggplot2)
dat %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarize(value = mean(value), .groups = "drop") %>%
  mutate(name = as.integer(name)) %>%
  ggplot(aes(name, value)) + geom_line()
It seems that you have encoded a numerical value in the column names, which is not a good idea because it violates first normal form. I would therefore suggest transposing the data so that this numerical value is stored in the first column.
With your peculiar data structure, you must first extract the numbers from the column names with
x <- as.numeric(names(dat))
Then you can compute all column means with
y <- colMeans(dat)
And then you can plot it:
plot(x, y, type="l")
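The question mentions eventually wanting standard deviations on the chart. A minimal base R sketch of one way to add them to the plot above, assuming the same simulated dat; the variable s, the axis labels, and the choice of arrows() for the error bars are illustrative assumptions, not part of the original answer:

```r
# Recreate the simulated data used in the answers above
set.seed(42)
dat <- setNames(
  data.frame(replicate(4, sample(10, 50, replace = TRUE))),
  c("2000", "2400", "2800", "3200")
)

x <- as.numeric(names(dat))  # numeric x positions taken from the column names
y <- colMeans(dat)           # mean of each column
s <- sapply(dat, sd)         # standard deviation of each column

# Widen the y-axis so the error bars are not clipped
plot(x, y, type = "b", ylim = range(y - s, y + s),
     xlab = "Column", ylab = "Mean (+/- 1 SD)")

# angle = 90 with code = 3 draws a flat cap at both ends of each segment
arrows(x, y - s, x, y + s, angle = 90, code = 3, length = 0.05)
```

Whether one standard deviation is the right spread to display depends on what the Excel chart showed; a standard error (s / sqrt(nrow(dat))) could be substituted in the same place.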
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 months ago.
I want to reproduce the attached table but have not been able to. Which package should I use? Can anyone point me in the right direction?
The data look like the following:
ID a b c d
x 1 0 0 1
y 0 0 1 1
z 0 1 1 0
w 1 1 0 0
If your question is how to convert the data you have into a more aesthetically pleasing table, the flextable package may be an easy option. You can also get row and column totals for your data with adorn_totals from the janitor package. I have tried to recreate your data below and build a table around it:
#### Load Libraries ####
library(tidyverse) # for piping
library(flextable) # for table
library(janitor) # for row and column totals
#### Use Same Data ####
ID <- c("x","y","z","w")
a <- c(1,0,0,1)
b <- c(0,0,1,1)
c <- c(0,1,1,0)
d <- c(1,1,0,0)
#### Just Use Adorn Totals ####
df <- data.frame(ID, a, b, c, d) %>%
  adorn_totals("col") %>%
  adorn_totals("row")
#### Flextable ####
df %>%
  flextable() %>%
  add_header_lines("Total by Total Version")
Which gives you this:
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I am currently trying to decrease the values in a column randomly according to a given sum.
For example, if the main data look like this:
ID Value
1 4
2 10
3 16
After running the code, the sum of Value should be 10, and this needs to be done randomly (the decrease for each member should be chosen randomly):
ID Value
1 1
2 8
3 1
I tried several commands and libraries but could not manage it. I'm still a novice, so any help would be appreciated!
Thanks
Edit: Sorry, I was not clear enough. I would like to assign each observation a new, randomly chosen value smaller than the original, such that the new sum of Value equals 10.
Using the sample data
dd <- read.table(text="ID Value
1 4
2 10
3 16", header=TRUE)
and the dplyr and tidyr libraries, you can do
library(dplyr)
library(tidyr)
dd %>%
  mutate(ID = factor(ID)) %>%
  uncount(Value) %>%
  sample_n(10) %>%
  count(ID, name = "Value", .drop = FALSE)
Here we repeat the row once for each Value, then we randomly sample 10 rows, then we count them back up. We turn ID to a factor to make sure IDs with 0 observations are preserved.
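The same expand, sample, and count idea can be sketched in base R without any packages. This is only an illustration under assumptions: the names target, units, kept, and result are made up here, and set.seed is used purely to make the example repeatable.

```r
# Same sample data as above, built directly
dd <- data.frame(ID = 1:3, Value = c(4, 10, 16))
target <- 10  # desired new total; must not exceed sum(dd$Value)

set.seed(1)  # only so the illustration is repeatable

# One entry per unit of Value: ID 1 appears 4 times, ID 2 ten times, ...
units <- rep(dd$ID, times = dd$Value)

# Draw `target` units without replacement, so no ID can exceed its original Value
kept <- sample(units, size = target)

# Count the kept units per ID; factor levels keep IDs that received zero
result <- data.frame(
  ID = dd$ID,
  Value = as.integer(table(factor(kept, levels = dd$ID)))
)
```

Because the draw is without replacement, each new Value is at most the original one, but it can equal the original or drop to 0. If strictly smaller, nonzero values are required, the draw would need to be rejected and repeated until that holds.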
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
Here's my problem: dplyr's group_by doesn't work on one data.frame, but it works well on another. The problematic data frame is imported from an SPSS file with the foreign package. When I execute this:
d_summarised <- d %>%
  group_by(group) %>%
  summarise(Sex = (sum(d$GENRE == "F", na.rm = TRUE))/sum(!is.na(d$GENRE))) %>%
  select(Sex, group)
The result is calculated on the whole sample rather than by group, so every group gets the same value, which is not what I expected.
# A tibble: 6 x 2
group Sex
* <fct> <dbl>
1 group1 0.626
2 group2 0.626
3 group3 0.626
4 group4 0.626
5 group5 0.626
6 NA 0.626
But at the same time, in the same session, with the same packages loaded, this works:
dat <- data.frame(x=c(1,2,3,3,2,1), y=c(15,24,54,65,82,65))
dat %>%
  group_by(x) %>%
  summarise(mean(y))
Here's the result:
# A tibble: 3 x 2
x `mean(y)`
* <dbl> <dbl>
1 1 40
2 2 53
3 3 59.5
plyr is not loaded, only dplyr. How is that possible?
The issue is that d$ breaks the grouping. Use the bare column names instead and it should work:
library(dplyr)
d %>%
  group_by(group) %>%
  summarise(Sex = (sum(GENRE == "F", na.rm = TRUE))/sum(!is.na(GENRE))) %>%
  select(Sex, group)
NOTE: d$GENRE selects the whole column of the dataset rather than only the elements within the current group.
In the second case, the OP applied mean directly to y rather than to dat$y. In other words, the cause is not the data structure (data.frame vs. tibble) but the extraction of the whole column.
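The difference can be seen on a small made-up data frame. The toy d below is a hypothetical stand-in for the SPSS import (its values are assumptions for illustration only), and mean(GENRE == "F") is used as a shorthand for the proportion because the toy data has no missing values:

```r
library(dplyr)

# Hypothetical toy data standing in for the SPSS import
d <- data.frame(
  group = c("g1", "g1", "g2", "g2"),
  GENRE = c("F", "M", "F", "F")
)

# d$GENRE ignores the grouping: every group gets the overall share (3/4)
wrong <- d %>%
  group_by(group) %>%
  summarise(Sex = mean(d$GENRE == "F"))

# Bare GENRE is evaluated per group: 1/2 for g1 and 2/2 for g2
right <- d %>%
  group_by(group) %>%
  summarise(Sex = mean(GENRE == "F"))
```

Here wrong$Sex is 0.75 in both rows, mirroring the repeated 0.626 in the question, while right$Sex differs by group.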
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
In a data frame, I have a column (price) with values from 1 to 500. I need to create a pie chart with 3 buckets: 1-10, 10-50, greater than 100.
It should show the percentage each bucket contributes.
How can I do this in R?
Does this help?
library(tidyverse)
df <- tibble(price = seq(1, 500))
So the data look like this (it's a silly example; use your own data):
# A tibble: 500 x 1
price
<int>
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
# ... with 490 more rows
Then we do:
df %>%
  mutate(bucket = ifelse(price <= 10, "1-10",
                  ifelse(price > 10 & price <= 50, "11-50", "50<"))) %>%
  count(bucket) %>%
  mutate(percent = n / nrow(df) * 100) %>%
  ggplot(aes(x = "", y = percent, fill = bucket)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)
With mutate we define the buckets and the percentage. With ifelse we simply say: if price meets condition x, then mark it as y, else do ...
That's the resulting chart:
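As an alternative to nested ifelse calls, the bucketing can be sketched in base R with cut(). This is an illustration under assumptions: the break points follow the answer's buckets (1-10, 11-50, above 50) rather than the question's wording, and the names price, bucket, tab, and percent are chosen here for the example.

```r
# Same example data: prices 1 to 500
price <- 1:500

# cut() uses right-closed intervals by default: (0,10], (10,50], (50,Inf]
bucket <- cut(price,
              breaks = c(0, 10, 50, Inf),
              labels = c("1-10", "11-50", ">50"))

# Share of rows in each bucket, as a plain named numeric vector
tab <- table(bucket)
percent <- setNames(as.numeric(tab) / length(price) * 100, names(tab))

# Base graphics pie chart with the percentage in each label
pie(percent, labels = sprintf("%s (%.0f%%)", names(percent), percent))
```

With more than two or three buckets, cut() tends to be easier to read and extend than a chain of ifelse calls, since only the breaks and labels vectors change.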
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame like this
Name Value
A. -5
B. 100
F. 0
G. -5
I want to sort the data in ascending order and add a rank column. So I want something like this:
Name Value Rank
A. -5 1
G. -5 1
F. 0 2
B. 100 3
A base R solution could be:
v1 <- order(df$Value)
data.frame(df[v1, ], rank = as.numeric(factor(df$Value[v1])))
# Name Value rank
#1 A. -5 1
#4 G. -5 1
#3 F. 0 2
#2 B. 100 3
This sorts the data frame with order, then converts the sorted Value to a factor and then to numeric, so that rows with the same Value get the same rank.
This can be achieved easily with the dplyr package.
#Recreate the data
df <- read.table(text = "Name Value
A. -5
B. 100
F. 0
G. -5", header = TRUE)
library(dplyr)
df %>% arrange(Value) %>% mutate(Rank = dense_rank(Value))
The dplyr pipeline reads as: take the data frame df, then arrange it by Value, then add a new column Rank equal to the dense rank of Value.
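For comparison, a base R sketch of the two common ways to rank ties, assuming the same four-row data. The column name MinRank and the object sorted are illustrative only. Dense ranking leaves no gaps after ties, while rank() with ties.method = "min" skips ranks.

```r
# Recreate the data from the question
df <- data.frame(
  Name = c("A.", "B.", "F.", "G."),
  Value = c(-5, 100, 0, -5)
)

# Sort ascending by Value
sorted <- df[order(df$Value), ]

# Dense rank: position of each value among the sorted unique values (1, 1, 2, 3)
sorted$Rank <- match(sorted$Value, sort(unique(sorted$Value)))

# "Competition" rank for contrast: ties share the lowest rank, then skip (1, 1, 3, 4)
sorted$MinRank <- rank(sorted$Value, ties.method = "min")
```

The Rank column matches the output requested in the question; MinRank is shown only to make clear which tie-handling rule dense ranking is.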