Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 9 years ago.
I have CSV data as follows:
code, label, value
ABC, len, 10
ABC, count, 20
ABC, data, 102
ABC, data, 212
ABC, data, 443
...
XYZ, len, 11
XYZ, count, 25
XYZ, data, 782
...
The number of data entries is different for each code. (This doesn't matter for my question; I'm just pointing it out.)
I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. This means I should separate out the data for each code and make it numeric?
Is there a better way of doing this than this kind of thing:
x = read.csv('dataFile.csv', header=T)
...
median(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value))
boxplot(median(as.numeric(subset(x, x$code=='ABC' & x$label=='data')$value)))
split and list2env allow you to separate your data.frame x by code, generating one data.frame for each level of code:
list2env(split(x, x$code), envir=.GlobalEnv)
or just
my.list <- split(x, x$code)
if you prefer to work with lists.
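As a rough sketch of how the list approach could feed the per-code analysis, the snippet below splits only the 'data' values by code and takes one median per group. The toy CSV text is illustrative (it mirrors the rows shown in the question, without the elided ones), and the names d, by_code and medians are made up for this example.

```r
# Toy data in the same shape as the question's CSV (illustrative values only).
x <- read.csv(text = "code,label,value
ABC,len,10
ABC,count,20
ABC,data,102
ABC,data,212
ABC,data,443
XYZ,len,11
XYZ,count,25
XYZ,data,782", stringsAsFactors = FALSE)

# Keep only the 'data' rows, then split the numeric values by code.
d <- x[x$label == "data", ]
by_code <- split(d$value, d$code)

# One median per code, returned as a named numeric vector.
medians <- sapply(by_code, median)
print(medians)
```

Each element of by_code is a plain numeric vector, so it can also be passed directly to plotting functions such as boxplot(by_code).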
I'm not sure I totally understand the final objective of your question. Do you just want some pointers on what you could do? There are a lot of possible solutions.
You ask: I need to analyze the data entries for each code. This would include calculating the median, plotting graphs, etc. This means I should separate out the data for each code and make it numeric?
The answer is no, you don't strictly have to. You could use R functions that do this for you, for example:
x = read.csv('dataFile.csv', header=T)
#is it numeric?
class(x$value)
# if it is already numeric you don't need to convert it.
# If the column is strictly numeric I don't know of any reason why it
# would be read as strings, but it happens.
# median() needs numeric input, so aggregate only the value column by code
aggregate(value ~ code, data=x, FUN=median)
boxplot(value~code,data=x)
# and you can do ?boxplot to look into its options.
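One possible reason a column is read as text is stray whitespace: the CSV in the question has a space after each comma. A minimal sketch, assuming that is the only problem, is to read with strip.white = TRUE and restrict to the 'data' rows before aggregating. The inline csv_text is illustrative and the names data_rows and med are made up for this example.

```r
# Illustrative CSV text with a space after each comma, as in the question.
csv_text <- "code, label, value
ABC, len, 10
ABC, count, 20
ABC, data, 102
ABC, data, 212
ABC, data, 443
XYZ, len, 11
XYZ, count, 25
XYZ, data, 782"

# strip.white = TRUE trims the padding so labels compare equal to "data".
x <- read.csv(text = csv_text, strip.white = TRUE, stringsAsFactors = FALSE)

# Only the 'data' rows are measurements; 'len' and 'count' are metadata.
data_rows <- subset(x, label == "data")

# Median of value within each code.
med <- aggregate(value ~ code, data = data_rows, FUN = median)
print(med)
```

The same data_rows frame can then be used for plotting, e.g. boxplot(value ~ code, data = data_rows).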
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
I am trying to create a plot in R that shows post-surgical outcomes over time. I want to plot a certain data point at pre-op, 1 month post-op, 6 months post-op, etc. Here is an example dataframe:
dat <- data.frame(Preop=c(-2,0.5,-0.25,1.5), PO_1M=c(-1.5,0.2,-0.1,1.0), PO_6M=c(-1.2,0.1,-0.05,0.5), PO_1Y=c(-1.0,0.05,0,0.25))
dat
Ideally, the x axis will have markings for the time (preop, 1 month post-op, etc.), and the y axis will have the value at that time. The data should converge around y=0 coming from either the positive or negative direction, and I imagine a plot looking something like this:
My actual dataframe also has many missing values, so this would need to be accounted for somehow. I would appreciate if anyone could help approach this problem using either ggplot or base R plotting functions. Thanks so much!
Your data should be restructured. Use the tidyr package to turn your columns into rows. Then use ifelse logic to convert the column names into the number of months. I assigned pre-op to zero months.
library(tidyverse)
dat2<-dat %>% tidyr::pivot_longer(cols=Preop:PO_1Y)
dat2$nummonths<-ifelse(dat2$name=='Preop',0,
                 ifelse(dat2$name=='PO_1M',1,
                 ifelse(dat2$name=='PO_6M',6,
                 ifelse(dat2$name=='PO_1Y',12,NA))))
ggplot(dat2, aes(nummonths,value))+geom_point()+theme_dark()
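If you would rather avoid the nested ifelse, or need to deal with the missing values mentioned in the question, a base R sketch along these lines may help. It uses a named lookup vector for the months and drops NA rows after reshaping. The NA entries in the toy data, and the names month_lookup, long and patient, are assumptions added for illustration.

```r
# Toy data based on the question's example, with a few NAs added
# purely to illustrate missing values.
dat <- data.frame(Preop = c(-2, 0.5, -0.25, NA),
                  PO_1M = c(-1.5, 0.2, NA, 1.0),
                  PO_6M = c(-1.2, 0.1, -0.05, 0.5),
                  PO_1Y = c(-1.0, NA, 0, 0.25))

# Named lookup vector: column name -> months after surgery.
month_lookup <- c(Preop = 0, PO_1M = 1, PO_6M = 6, PO_1Y = 12)

# stack() turns the columns into rows: 'values' and 'ind' (column name).
long <- stack(dat)

# Columns are stacked one after another, so row ids repeat per column.
long$patient <- rep(seq_len(nrow(dat)), times = ncol(dat))
long$nummonths <- unname(month_lookup[as.character(long$ind)])

# Drop the missing measurements so they are not plotted.
long <- long[!is.na(long$values), ]
print(head(long))
```

With a patient column available, one line per patient could be drawn, for example with ggplot(long, aes(nummonths, values, group = patient)) + geom_line() + geom_point().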
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I have data from a total of 100 basketball games played by six teams. I wrote the following R code to see which team won each game.
win = ifelse(dat$away_score > dat$home_score, dat$away, dat$home)
However, the output shows numbers (1, 2, 3, ...) instead of the team names. The numbers seem to have been assigned according to the alphabetical order of the team names. How do I print the results as the original team names rather than numbers?
It seems the columns are factors. If we convert them to character first, it works:
ifelse(dat$away_score > dat$home_score, as.character(dat$away), as.character(dat$home))
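To see why this happens, here is a small sketch with made-up teams and scores stored as factors. ifelse() drops the factor attributes and returns the underlying integer codes, while wrapping the columns in as.character() keeps the names. The names win_codes and win_names are hypothetical.

```r
# Toy data with team names stored as factors (illustrative values).
dat <- data.frame(home = factor(c("a", "b", "c")),
                  away = factor(c("d", "e", "f")),
                  away_score = c(90, 80, 70),
                  home_score = c(89, 81, 69))

# ifelse() on factors returns the internal integer codes, not the labels.
win_codes <- ifelse(dat$away_score > dat$home_score, dat$away, dat$home)

# Converting to character first preserves the team names.
win_names <- ifelse(dat$away_score > dat$home_score,
                    as.character(dat$away), as.character(dat$home))

print(win_codes)
print(win_names)
```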
Not sure what dat looks like, but if I do this:
dat <- data.frame(
  home = c("a","b","c"), # home team names
  away = c("d","e","f"), # away team names
  away_score = c(90,80,70),
  home_score = c(89,81,69),
  stringsAsFactors = FALSE # keep team names as character, not factor
)
win = ifelse(dat$away_score > dat$home_score, dat$away, dat$home)
win # print results
I get the following showing the "name" of which team won:
[1] "d" "b" "f"
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
I'm quite new to R, and if I imported a .csv file and if rows represent
time and columns represent n variables of interest, how could I construct a
function that returns any given 1xn vector from the table?
P.S. I'm not just interested in constructing a vector, but I will perform
matrix algebra with iterative calculations to estimate parameters, which means
I will need to use a for-loop.
If the data structure contains e.g. m rows and n columns i.e. n variables, you can easily construct the n vectors without much effort.
data<-read.csv(".../file.csv")
class(data)
[1] "data.frame"
class(as.numeric(data[1,]))
[1] "numeric"
So it is not a big deal to convert a 1×n row of the data frame into a numeric vector of length ncol(data).
In a loop just use
data[i, ]
to access row i, where i is the required row number. Each time it gives a 1×n data frame, which as.numeric() converts to a vector of length n.
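A minimal sketch of such a function and of a loop that uses the rows in matrix algebra is shown below. It assumes all columns are numeric. The toy data frame and the names get_row, X and acc are made up for illustration, and unlist() is used here as one alternative to as.numeric().

```r
# Toy data: rows are time points, columns are variables (illustrative).
df <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))

# Return row i as a plain numeric vector of length ncol(df).
get_row <- function(df, i) unlist(df[i, ], use.names = FALSE)

print(get_row(df, 2))

# For repeated matrix algebra it is usually cheaper to convert once.
X <- as.matrix(df)

# Example loop: accumulate the sum of outer products of the rows,
# which equals t(X) %*% X.
acc <- matrix(0, ncol(X), ncol(X))
for (i in seq_len(nrow(X))) {
  v <- X[i, ]               # numeric vector for row i
  acc <- acc + tcrossprod(v) # v %*% t(v)
}
print(acc)
```

Indexing a matrix with X[i, ] already returns a vector, so no conversion is needed inside the loop.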
You can use the function melt() from the package reshape2
Or if you want to use the for loop, try something like:
one_col <- data[,1]
for (i in 2:ncol(data)){
one_col <- rbind(one_col, data[,i])
}
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
I have a dataframe with a column named Stage. The dataframe is generated from a regularly updated excel file.
This column should only have a certain few values in it, such as 'Planning', or 'Analysis', but people occasionally put custom values in and it is impractical to stop.
I want the dataframe sorted by this column, with a custom sort order that makes sense chronologically (e.g. for us, planning comes before analysis). I would be able to implement this using factors (e.g. as in "Reorder rows using custom order"), but if I use a predefined list of factor levels, I lose any unexpected values that people enter into that column. I am happy for the unexpected values not to be sorted properly, but I don't want to lose them entirely.
EDIT: Answer by floo0 is amazing, but I neglected to mention that I was planning on barplotting the results, something like
barplot(table(MESH_assurance_involved()[MESH_assurance_involved_sort_order(), 'Stage']), main="Stage became involved")
(parentheses because these are shiny reactive objects, shouldn't make a difference).
The results are unsorted, although testing in the console reveals the underlying data is sorted.
table is also breaking the sorting but using ggplot and no table I get the identical result.
To display a barplot maintaining the source order seems to require something like Ordering bars in barplot() but all solutions I have found require factors, and mixing them with the solution here is not working for me somehow.
Toy data-set:
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis', 'random2'), id=1:4,
stringsAsFactors = FALSE)
So dat looks as follows:
> dat
     Stage id
1  random1  1
2 Planning  2
3 Analysis  3
4  random2  4
Now you can do something like this:
known_levels <- c('Planning', 'Analysis')
my_order <- order(factor(dat$Stage, levels = known_levels, ordered=TRUE))
dat[my_order, ]
Which gives you
     Stage id
2 Planning  2
3 Analysis  3
1  random1  1
4  random2  4
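For the barplot part of the question, one possible approach (a sketch, not tested against the shiny setup) is to build a factor whose levels are the known stages followed by whatever unexpected values occur. table() then counts in level order and no value is lost. The extra 'Planning' row and the names extra_levels, stage_f and counts are assumptions for illustration.

```r
# Toy data as above, with one repeated stage so the counts differ.
dat <- data.frame(Stage = c('random1', 'Planning', 'Analysis',
                            'random2', 'Planning'),
                  id = 1:5, stringsAsFactors = FALSE)

known_levels <- c('Planning', 'Analysis')

# Unexpected values, in order of first appearance, go after the known ones.
extra_levels <- setdiff(unique(dat$Stage), known_levels)
stage_f <- factor(dat$Stage, levels = c(known_levels, extra_levels))

# table() follows the level order, so bars come out in that order too.
counts <- table(stage_f)
print(counts)

# Uncomment to draw the plot:
# barplot(counts, main = "Stage became involved")
```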
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
I tried to pick 2 random subjects, but I don't know how to do it in R:
random.subj <- sample(1:max(Data$Id), 2)
rd <- subset(Data$Id, Data$Id==random.subj)
I have a dataset "Data" like
Id
1
1
2
2
3
3
4
4
4
...
Well, in this case random.subj will be a vector of two elements. Doing an equality comparison with == probably isn't what you want, because it recycles the shorter vector along the longer one to perform the comparison rather than checking each row against either value, as you probably intend.
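A small sketch of the difference, using the Id values from the question and two picked IDs chosen by hand for illustration (ids, picks, recycled and matched are hypothetical names):

```r
ids <- c(1, 1, 2, 2, 3, 3, 4, 4, 4)
picks <- c(2, 4)

# '==' recycles picks as 2,4,2,4,... and compares position by position.
# R also warns here because 9 is not a multiple of 2.
recycled <- suppressWarnings(ids == picks)

# '%in%' checks every id against the whole set of picks.
matched <- ids %in% picks

print(sum(recycled))  # rows that happen to line up with the recycled value
print(sum(matched))   # all rows whose id is 2 or 4
```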
Also, I'm not sure if all your IDs are numerical and sequential. It's better to take a random sample from the IDs themselves rather than from the index of the IDs.
Fixing the second problem first
random.subj <- sample(unique(Data$Id), 2)  # unique() so the same ID cannot be drawn twice
Actually, if you just want two IDs then that's all you need, but if you want the data for those IDs then
rd <- subset(Data, Data$Id %in% random.subj)
is the correct way to extract it.
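Putting the pieces together, here is a self-contained sketch. It assumes you want two distinct subjects, so it samples from unique(Data$Id). The obs column and the seed are made up for this example.

```r
set.seed(1)  # only to make the example reproducible

# Toy data: repeated Ids as in the question, plus a hypothetical obs column.
Data <- data.frame(Id = c(1, 1, 2, 2, 3, 3, 4, 4, 4), obs = 1:9)

# Sample from the distinct IDs so the same subject cannot be drawn twice.
random.subj <- sample(unique(Data$Id), 2)

# Keep every row belonging to either sampled subject.
rd <- subset(Data, Id %in% random.subj)
print(rd)
```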