How can I put the observations of one variable into a custom order? I need this for a plot. At the moment the observations are sorted from 1 to 5, and I need them in the order 5, 3, 1, 2, 4.
For context: this variable is the x-axis of my plot. I am drawing a discrete geom_bar and need this ordering to visualize the data better (the y-axis is just the count).
Thanks for any help!
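{ggplot2} orders numeric and character data on its own (numerically or alphabetically). In order to impose an order on your data, you need to
convert it to a factor, and
impose your desired order on its levels.
Luckily this is very easy in a single step using the reorder function, which sorts the levels by a score (here, each value's position in the desired order):
observations = reorder(observations, match(observations, c(5, 3, 1, 2, 4)))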
As I understand it, you have a vector of observations. Indexing should do the trick, where "observations" is your vector of values:
observations <- 1:5 # example data
new_order <- observations[c(5,3,1,2,4)]
new_order
# [1] 5 3 1 2 4
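To confirm the order that a bar chart would use, one option is to set the factor levels explicitly and inspect them. This is only a minimal sketch: obs, desired and obs_f are made-up names, and the data are invented.

```r
# Invented example data: values 1 to 5 with repeats
obs <- c(1, 2, 2, 3, 4, 4, 4, 5)
desired <- c(5, 3, 1, 2, 4)

# Setting the levels explicitly fixes the order of the discrete axis
obs_f <- factor(obs, levels = desired)

levels(obs_f)
table(obs_f)  # counts are listed in the order 5, 3, 1, 2, 4
```

With ggplot2 loaded, something like ggplot(data.frame(obs_f), aes(obs_f)) + geom_bar() should then draw the bars in that level order.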
I am struggling to make a barplot with two variables in R. One variable has values ranging from 0 to 90, and I need to split it into 3 groups: values <5, 5-10, and >10, so that the plot has only 3 bars instead of 90. Here is the code I have tried, but I can't figure out how to get it to work. The problem is in the use of the <, >, and - signs.
First I created a new variable
SVLivedPlot <- SDreal2$SVLived
And then I am trying to group all the numbers that are under 5 to be the value of 1, 5-10 to be the value of 2, and greater than 10 to be 3.
SVLivedPlot[SDreal2$SVLived == c(<5)] <- 1
SVLivedPlot[SDreal2$SVLived == c(5-10)] <- 2
SVLivedPlot[SDreal2$SVLived == c(>90)] <-3
Once I get those values changed I will use the following code to save that new variable with the correct groupings as the variable I will use in my barplot
DataFrameName$OldVariableName <- NewVariableName
Once I can get this new variable created I know how to put it in the barplot() formula to get the plot. I just need to know how to group those data! Any help would be great! Thank you!:)
We can use cut
SDreal2$NewVar <- with(SDreal2, as.integer(cut(SVLived,
breaks = c(-Inf, 5, 10, 90))))
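Note that cut() closes its intervals on the right by default, so with the breaks above a value of exactly 5 lands in group 1 and a value of exactly 10 in group 2. If 5 should belong to the middle group, and assuming SVLived holds whole numbers, a sketch along these lines might be closer to the grouping described in the question (the data frame here is an invented stand-in):

```r
# Invented stand-in for the real data
SDreal2 <- data.frame(SVLived = c(0, 4, 5, 7, 10, 11, 90))

# Intervals are (-Inf, 4], (4, 10], (10, Inf]; for whole numbers this
# corresponds to <5, 5-10 and >10
SDreal2$NewVar <- as.integer(cut(SDreal2$SVLived,
                                 breaks = c(-Inf, 4, 10, Inf)))
SDreal2$NewVar

# Counts per group, which can be passed to barplot()
table(SDreal2$NewVar)
```

Using Inf as the last break also avoids NA values if the data ever exceed 90.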
I want to calculate a ratio for each row in a data frame using the values from two columns. The data are anatomical measurements from paired muscles, and I need the ratio of the measurement of one muscle to the measurement of the other. Each row is an individual specimen, and each of the 2 columns in question holds the measurements for one of the 2 muscles. Which of the two muscles is larger varies among individuals (rows), so I need a script that always picks the smaller value, which may be in either column, for the numerator, and always picks the larger value, which can also be in either column, for the denominator, rather than simply dividing all values of one column by the values of the other. This might be simple, but I'm not so good with coding yet.
This doesn't work:
ratio <- DF$1/DF$2
I assume that what I need would loop through each row doing something like this:
ratio <- which.min(c(DF$1, DF$2))/which.max(c(DF$1, DF$2))
Any help would be greatly appreciated!
Assuming that you are only dealing with positive values, you could consider something like this:
# example data:
df <- data.frame(x = abs(rnorm(100)), y = abs(rnorm(100)))
# sorting the two columns so that the smaller always appears in the first
# column:
df_sorted <- t(apply(df,1, sort))
# dividing the first col. by the second col.
ratio <- df_sorted[,1]/df_sorted[,2]
Or, alternatively:
ifelse(df[,1] > df[,2], df[,2]/df[,1], df[,1]/df[,2])
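Another way to express the same idea, sketched here with invented measurements, is to use the element-wise pmin() and pmax() functions. The column names m1 and m2 are made up for illustration, and positive values are assumed as above.

```r
# Invented measurements for two paired muscles (m1, m2 are made-up names)
DF <- data.frame(m1 = c(2.0, 5.0, 3.0), m2 = c(4.0, 2.5, 3.0))

# pmin()/pmax() return the smaller/larger of the two values in each row
ratio <- pmin(DF$m1, DF$m2) / pmax(DF$m1, DF$m2)
ratio
```

Because both functions are vectorized, no explicit loop over rows is needed.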
I have data points ranging from 432 to 1789. I want to divide this data into 3 equal parts. For example, I would first arrange the values in ascending order, then divide them into 3 equal parts and categorize these as 'Low', 'Medium' and 'High'.
How can I do this in R?
I used this, but I don't think it divided the data correctly,
x$level <- cut(x$newlevels, 3, include.lowest=TRUE, labels=c("Low", "Med", "High"))
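For what it is worth, cut(x, 3) splits the range of the values into three equal-width intervals, so the groups can end up with very different numbers of observations. If "equal parts" means roughly equal counts, one possible sketch is to take the break points from quantile() instead. The data below are invented, and the approach assumes the resulting break values are distinct.

```r
set.seed(1)  # invented data within the stated range
x <- data.frame(newlevels = sample(432:1789, 30))

# Break points at the 0%, 33.3%, 66.7% and 100% quantiles
brks <- quantile(x$newlevels, probs = c(0, 1/3, 2/3, 1))

x$level <- cut(x$newlevels, breaks = brks, include.lowest = TRUE,
               labels = c("Low", "Med", "High"))
table(x$level)
```

With heavily tied data the quantiles may coincide, in which case cut() will complain that the breaks are not unique.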
I have a simple question. I have a vector of years, spanning 1945:2000, with many repeated years. I want to make this an ordinal vector, so that 1945 is changed to 1, 1946 to 2, etc...
Obviously in this case the easiest way is just to subtract 1944 from the vector. But I have to do this with other numeric vectors that are not evenly spaced.
Is there an R function that does this?
You can do:
as.numeric(factor(x))
For example:
x <- sample(1945:2010, 40)
ordinal_x <- as.numeric(as.factor(x))
plot(x, ordinal_x)
Notice that ordinal_x skips the gaps in x.
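To make the mapping more explicit, here is a sketch of an equivalent approach on a small invented vector: each value is replaced by its position among the sorted distinct values.

```r
# Invented, unevenly spaced values with repeats
x <- c(1945, 1950, 1950, 1947, 2000, 1945)

# Position of each value among the sorted distinct values
ordinal_x <- match(x, sort(unique(x)))
ordinal_x
```

For numeric input this should give the same codes as as.numeric(factor(x)), since factor() also sorts the distinct values to build its levels.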
Following some excellent replies to an earlier question I posed - selecting n random rows across all levels of a factor within a dataframe - I have been considering an extension to this problem.
The previous question sought to randomly sample n rows/observations from each level of a particular factor, and to combine all information in a new dataframe.
However, this sort of random sampling may not be optimal for some types of data. Here, I want to again select n rows/observations per every level of a particular factor. The major difference here is that the rows/observations selected from each level of the particular factor should be consecutive.
This is an example dataset:
id<-sample(1:20, 100, replace = TRUE)
dat<-as.data.frame(id)
color <- c("blue", "red", "yellow", "pink", "green", "orange", "white", "brown")
dat$colors<- sample(color, 100, replace = TRUE)
To add to this example dataset are timestamps for each observation. These will form the order along which I wish to sample. I am using a function suggested in this thread - efficiently generate a random sample of times and dates between two dates - for this purpose:
randomts <- function(N, st="2013/12/09", et="2013/12/14") {
st <- as.POSIXct(as.Date(st))
et <- as.POSIXct(as.Date(et))
dt <- as.numeric(difftime(et,st,unit="sec"))
ev <- sort(runif(N, 0, dt))
rt <- st + ev
}
dat$ts<-randomts(100)
I am not sure if this is necessary, but it is also possible to add a variable that gives the 'day'. This is the factor which I wish to sample from every level.
temp<-strsplit(as.character(dat$ts), " ")
mat<-matrix(unlist(temp), ncol=2, byrow=TRUE)
df<-as.data.frame(mat)
colnames(df)<-c("date", "time")
dat<-cbind(df, dat)
mindate<-as.Date(min(dat$date))
dates<-as.Date(dat$date)
x<-as.numeric(dates-mindate)
x<-x+1
dat$day<-x
as.factor(dat$day) #in this example data there are 6 levels to 'day'.
#EDIT there may be 5 levels to day - depends on how data randomly generated by function
Edit: the original post did not calculate day accurately. This version is better, though not perfect. It seems OK, but the first day is day=0 when I would like it to be day=1.
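A shorter way to derive the day index, sketched here on three invented timestamps rather than on the full data set, would be to take the date part of each timestamp and count days from the earliest date, shifted so that the first day is 1. The UTC time zone is an assumption made only to keep the example reproducible.

```r
# Three invented timestamps (UTC is assumed here for reproducibility)
ts <- as.POSIXct("2013-12-09 00:00:00", tz = "UTC") + c(3600, 90000, 200000)

# Date part of each timestamp, taken in the timestamp's own time zone
date <- as.Date(format(ts, "%Y-%m-%d"))

# Days since the earliest date, shifted so that the first day is 1
day <- as.integer(date - min(date)) + 1L
day
```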
To summarize, the problem is this. I want to create a new dataframe that contains e.g. 5 consecutive observations randomly sampled from every level of the factor day of the dataframe "dat" (ie 5 random consecutive observations taken from every day). Therefore, the new dataframe would have 30 observations. An additional caveat would be that if I wanted to sample e.g. 20 consecutive observations, and a particular level only had 15 observations, then all 15 are returned and there is no replacement.
I have tried to play around with seq_along to solve this. I seem to be able to get this to work for one variable at a time - e.g. if sampling from colors:
x <- sample(seq_along(dat$colors),1)
dat$colors[x:(x+4)]
This produces a randomly sampled list of 5 consecutive colors from the variable colors.
I am having trouble applying this to the problem at hand. I have tried modifying some of the answers to my previous question selecting n random rows across all levels of a factor within a dataframe - but can't seem to work out the correct placement of seq_along in any.
This should sample a run of consecutive rows within each color, assuming your data.frame is sorted by date. Here N is how many consecutive observations of each color you want. The return value keep will be TRUE for the rows in the selected run of each color group.
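N <- 5
keep <- with(dat, ave(rep(TRUE, nrow(dat)), colors, FUN = function(x) {
  # random start; length(x) - N + 1 is the last start that fits a full run
  start <- sample.int(max(length(x) - N + 1, 1), 1)
  end <- min(length(x), start + N - 1)
  # FALSE before the run, TRUE inside it, FALSE after it
  rep(c(FALSE, TRUE, FALSE), c(start - 1, end - start + 1, length(x) - end))
}))
dat[keep, ]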
This method does not look at any day value. It simply finds a random run of N observations within each color. It will only return fewer per category if there are fewer than N observations for a particular group.
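To group by day instead, as the question asks, one possible sketch is to split the data by day and take one random run from each piece. The data frame below is invented and assumed to be in time order already, and sample_run is a hypothetical helper written for this example, not an existing function.

```r
set.seed(42)
# Invented data: 'day' is the grouping factor, rows already in time order
dat <- data.frame(day = rep(1:3, times = c(12, 3, 8)),
                  obs = seq_len(23))
N <- 5

# Hypothetical helper: pick one random run of up to n consecutive rows
sample_run <- function(d, n) {
  len <- nrow(d)
  if (len <= n) return(d)                # too few rows: return them all
  start <- sample.int(len - n + 1L, 1L)  # every valid start is equally likely
  d[start:(start + n - 1L), , drop = FALSE]
}

res <- do.call(rbind, lapply(split(dat, dat$day), sample_run, n = N))
res
```

Days with fewer than N rows are returned whole, without replacement, which matches the caveat in the question.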