Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
I am currently trying to decrease the values in a column randomly according to a given sum.
For example, if the main data looks like this:
ID Value
1 4
2 10
3 16
After running the code, the sum of Value should be 10, and this needs to be done randomly (the decrease for each ID should be chosen at random):
ID Value
1 1
2 8
3 1
I tried several commands and libraries but could not manage it. I am still a novice, so any help would be appreciated!
Thanks
Edit: Sorry, I was not clear enough. I would like to assign each observation a new, randomly chosen value that is smaller than the original, so that the new values sum to 10.
Using the sample data
dd <- read.table(text="ID Value
1 4
2 10
3 16", header=TRUE)
and the dplyr and tidyr libraries, you can do
library(dplyr)
library(tidyr)
dd %>%
mutate(ID=factor(ID)) %>%
uncount(Value) %>%
sample_n(10) %>%
count(ID, name = "Value", .drop=FALSE)
Here we repeat each row once per unit of Value, then randomly sample 10 rows, then count them back up. We turn ID into a factor to make sure IDs with 0 sampled rows are preserved in the count.
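The same idea can be sketched in base R, as a rough illustration rather than a replacement for the answer above: expand each ID into one entry per unit of Value, sample 10 entries without replacement, and count them back up. The names units, kept and result are made up for this sketch.

```r
set.seed(1)  # only to make the sketch reproducible

dd <- data.frame(ID = 1:3, Value = c(4, 10, 16))

# One entry per unit of Value: ID 1 four times, ID 2 ten times, ID 3 sixteen times
units <- rep(dd$ID, times = dd$Value)

# Draw 10 units without replacement, so no ID can exceed its original Value
kept <- sample(units, 10)

# Count the sampled units per ID; IDs that were never drawn get 0
result <- data.frame(
  ID    = dd$ID,
  Value = tabulate(match(kept, dd$ID), nbins = nrow(dd))
)
result
```

Note that this guarantees each new Value is at most the original, not strictly smaller, and an ID can drop to 0.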
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
This is probably a very basic question, but I'm just starting out with R and hope someone can help.
I've imported some data into R and created an object containing just the data I'm working on first:
Each of the values is from a scale of 1 to 10.
What I want to produce is a chart showing the mean of each column, something like this (which I did in Excel):
I'm sure this is possible, but I'm going round in circles figuring it out! Ignoring the vertical line (at maximum value) and standard deviations for now, though ultimately I'd like to have them included. Thank you!
set.seed(42)
dat <- setNames(data.frame(replicate(4, sample(10, 50, replace=TRUE))), c("2000", "2400", "2800", "3200"))
head(dat)
# 2000 2400 2800 3200
# 1 1 6 5 1
# 2 5 6 9 1
# 3 1 2 10 5
# 4 9 4 8 3
# 5 10 3 7 10
# 6 4 6 6 1
library(dplyr)
library(tidyr) # pivot_longer
library(ggplot2)
dat %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarize(value = mean(value), .groups = "drop") %>%
mutate(name = as.integer(name)) %>%
ggplot(aes(name, value)) + geom_line()
It seems that you have encoded a numerical value in the column names, which is not a good idea because it violates first normal form. I would thus suggest transposing the data and storing that value in the first column.
With your peculiar data structure, you must first extract the numbers from the column names with
x <- as.numeric(names(dat))
Then you can compute all column means with
y <- colMeans(dat)
And then you can plot it
plot(x, y, type="l")
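The question also mentions eventually wanting standard deviations on the chart. A minimal base R sketch of one way to do that, assuming the sample data from the other answer and drawing mean ± 1 SD as error bars with arrows() (the choice of ± 1 SD and the axis labels are assumptions):

```r
set.seed(42)
dat <- setNames(
  data.frame(replicate(4, sample(10, 50, replace = TRUE))),
  c("2000", "2400", "2800", "3200")
)

x <- as.numeric(names(dat))  # numbers encoded in the column names
y <- colMeans(dat)           # mean of each column
s <- sapply(dat, sd)         # standard deviation of each column

# Leave enough vertical room for the error bars
plot(x, y, type = "b", ylim = range(y - s, y + s),
     xlab = "Column label", ylab = "Mean")

# Flat-headed arrows in both directions act as error bars
arrows(x, y - s, x, y + s, angle = 90, code = 3, length = 0.05)
```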
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a data frame as below:
df <- data.frame(
id= c(1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),
name= c("john","bob","bob","bob","bob","bob","leo","bob","bob","max","mike","mike","mike","mike","mike","mike","mike","Ronaldo","mike")
)
I want to count how many times a particular value appears back to back (consecutively) in the name column, grouped by id.
What I expect is as below:
expected_output<-data.frame(
id=c(2,3),
column_name="name",
value=c("bob","mike"),
count=c(5,7))
Thanks in advance for your help.
If you want the name with the longest consecutive run for each id, you can first count consecutive names using data.table::rleid and then keep only the row with the maximum count in each id.
library(dplyr)
df %>%
count(id, name, cons = data.table::rleid(name), name = 'count') %>%
group_by(id) %>%
slice(which.max(count)) %>%
select(-cons)
# id name count
# <dbl> <chr> <int>
#1 1 john 1
#2 2 bob 5
#3 3 mike 7
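For comparison, here is a rough base R sketch of the same run-length idea using rle(), without dplyr or data.table. The helper longest_run and the object res are hypothetical names introduced only for this sketch.

```r
df <- data.frame(
  id   = c(1, rep(2, 9), rep(3, 9)),
  name = c("john",
           "bob", "bob", "bob", "bob", "bob", "leo", "bob", "bob", "max",
           "mike", "mike", "mike", "mike", "mike", "mike", "mike", "Ronaldo", "mike")
)

# Hypothetical helper: the value and length of the longest consecutive run in x
longest_run <- function(x) {
  r <- rle(as.character(x))
  i <- which.max(r$lengths)  # first run with the maximum length
  data.frame(name = r$values[i], count = r$lengths[i])
}

# Apply the helper within each id and stack the results
res <- do.call(rbind, lapply(split(df, df$id), function(d) {
  cbind(id = d$id[1], longest_run(d$name))
}))
res
```

Like the dplyr version, this keeps id 1 (john, count 1); filter on count > 1 if single occurrences should be dropped.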
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a data frame like this
Name Value
A. -5
B. 100
F. 0
G. -5
I want to sort the data in ascending order and add a rank column. So I want something like this:
Name Value Rank
A. -5 1
G. -5 1
F. 0 2
B. 100 3
A base R solution could be:
v1 <- order(df$Value)
data.frame(df[v1, ], rank = as.numeric(factor(df$Value[v1])))
# Name Value rank
#1 A. -5 1
#4 G. -5 1
#3 F. 0 2
#2 B. 100 3
This sorts the data frame with order, then converts the sorted Value to a factor and then to numeric, so that rows with the same Value get the same rank.
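To see why the factor conversion gives a "dense" rank (ties share a rank and no ranks are skipped), here is a small sketch on the question's data. The comparison with rank(ties.method = "min") is added only for contrast and is not part of the answer above.

```r
df <- data.frame(Name  = c("A.", "B.", "F.", "G."),
                 Value = c(-5, 100, 0, -5))

# Factor levels are the sorted unique values (-5, 0, 100),
# so the integer codes are dense ranks
dense <- as.integer(factor(df$Value))
dense
# 1 3 2 1

# "min" ranking also ties A. and G. at 1, but then skips rank 2
gapped <- rank(df$Value, ties.method = "min")
gapped
# 1 4 3 1

# Sort by Value and attach the dense rank
ranked <- df[order(df$Value), ]
ranked$Rank <- as.integer(factor(ranked$Value))
ranked
```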
This can be achieved easily with the dplyr package.
#Recreate the data
df <- read.table(text = "Name Value
A. -5
B. 100
F. 0
G. -5", header = TRUE)
library(dplyr)
df %>% arrange(Value) %>% mutate(Rank = dense_rank(Value))
The dplyr pipeline reads: take the data frame df, then arrange it by Value, then add a new column Rank that equals the dense rank of Value.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've just started learning R. I want to know how to find the lowest value in one column for each unique value in another column. For example, in this case I want to know the lowest average price per year.
I have a data frame with about 7 columns, 2 of them being average price and year. The year obviously recurs and ranges from 2000 to 2009. The data also has various NAs in different columns.
I have very little idea how to write a loop or anything similar for this.
Thank you :)
my data set looks something like this:
avgprice year
332 2002
NA 2009
5353 2004
1234 NA
and so on.
To break my problem down: I want to find the five lowest values for each year from 2000 to 2004.
s<-subset(tx.house.sales,na.rm=TRUE,select=c(avgprice,year)
s2<-subset(s,year==2000)
s3<-arrange(s2)
tail(s2,5)
I know the code fails miserably. I wanted to first subset my data frame to avgprice and year, then sort it for each year from 2000 to 2004, and then use tail() to print the lowest five. However, I also wanted to ignore the NAs.
You could try
aggregate(averageprice~year, df1, FUN=min)
Update
If you need to get 5 lowest "averageprice" per "year"
library(dplyr)
df1 %>%
group_by(year) %>%
arrange(averageprice) %>%
slice(1:5)
Or you could use rank in place of arrange
df1 %>%
group_by(year) %>%
filter(rank(averageprice, ties.method='min') %in% 1:5)
This could also be done with aggregate, but the 2nd column will be a list
aggregate(averageprice~year, df1, FUN=function(x)
head(sort(x),5), na.action=na.pass)
data
set.seed(24)
df1 <- data.frame(year=sample(2002:2008, 50, replace=TRUE),
averageprice=sample(c(NA, 80:160), 50, replace=TRUE))
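As a hedged base R sketch of the "five lowest per year, ignoring NAs" step, using the same sample data: drop the NA prices, sort by year and price, and keep the first five rows within each year. The names ok, pos and lowest5 are illustrative only.

```r
set.seed(24)
df1 <- data.frame(year = sample(2002:2008, 50, replace = TRUE),
                  averageprice = sample(c(NA, 80:160), 50, replace = TRUE))

# Drop rows with a missing price, then sort by year and ascending price
ok <- df1[!is.na(df1$averageprice), ]
ok <- ok[order(ok$year, ok$averageprice), ]

# Position of each row within its year (1 = lowest price of that year)
pos <- ave(seq_len(nrow(ok)), ok$year, FUN = seq_along)

# Keep at most the five lowest prices per year
lowest5 <- ok[pos <= 5, ]
lowest5
```

Years with fewer than five non-missing prices simply contribute fewer rows.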
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have a dataset of around 1.5 lakh (150,000) observations and 2 variables: name and amount. The same name can appear again and again; for example, the name ABC can appear 50 times in the dataset.
I want a new data frame with two variables: name and total amount, where each name appears only once and total amount is the sum of all amounts for that name in the previous dataset. For example, if ABC appears three times with amount == 1, 2 and 3 respectively in the previous dataset, then in the new dataset ABC will appear only once with total amount == 6.
You can use data.table for big datasets:
library(data.table)
res<- setDT(df)[, list(Total_Amount=sum(amount)), by=name]
Or use dplyr
library(dplyr)
df %>%
group_by(name) %>%
summarise(Total_Amount=sum(amount))
Or as suggested by @hrbrmstr,
count(df, name, wt=amount)
data
set.seed(24)
df <- data.frame(name=sample(LETTERS[1:5], 25, replace=TRUE),
amount=sample(150,25, replace=TRUE))
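If neither package is available, a base R sketch with aggregate() gives the same kind of result. It uses the ABC example from the question; the second name XYZ is a made-up value added only so there is more than one group.

```r
# ABC is the example from the question; XYZ is a hypothetical extra name
df <- data.frame(name   = c("ABC", "XYZ", "ABC", "ABC"),
                 amount = c(1, 10, 2, 3))

# Sum amount within each name; one output row per unique name
totals <- aggregate(amount ~ name, data = df, FUN = sum)
names(totals)[2] <- "Total_Amount"
totals
#   name Total_Amount
# 1  ABC            6
# 2  XYZ           10
```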