R - plotting specific columns as x with two rows as the lines - r

Here is a small sample of my data:
gene_name ctrl_lsm1_ratio_t0 ctrl_lsm1_ratio_t1 ctrl_lsm1_ratio_t2
22 ABP140 -0.262682 -0.303352 -0.223626
246 ARI1 -0.163952 -0.374765 -0.321876
454 BPH1 -0.517519 -0.524553 -0.747609
513 BUR6 0.645573 0.217433 0.390403
588 CDC20 -0.264072 -0.665268 -0.594191
ctrl_lsm1_ratio_t3 ctrl_lsm1_stat_t0 ctrl_lsm1_stat_t1 ctrl_lsm1_stat_t2
22 -0.421704 no no no
246 -0.692391 no no no
454 -0.793595 no no yes
513 0.200799 yes no no
588 -0.523884 no yes yes
ctrl_lsm1_stat_t3 systematic_name
22 yes YOR239W
246 yes YGL157W
454 yes YCR032W
513 no YER159C
588 yes YGL116W
I would like to plot columns [,2:5] on the x axis (as in time point 0, 1, 2, and 3) with the y axis fitting the ratio columns.
If there's a way to color the points to be one color for "yes" or "no" at the specific time points, I would also like to be able to do that. (for instance, points in the ctrl_lsm1_ratio_t0 column would be colored based on values in the ctrl_lsm1_stat_t0 column).
I also only want to plot two rows at a time, both as lines (for instance row 22 with row 513). Hope this makes sense! I'm new to R and not sure what to do. I'm willing to download whatever package necessary.

data.csv:
gene_name,ctrl_lsm1_ratio_t0,ctrl_lsm1_ratio_t1,ctrl_lsm1_ratio_t2,ctrl_lsm1_ratio_t3,ctrl_lsm1_stat_t0,ctrl_lsm1_stat_t1,ctrl_lsm1_stat_t2,ctrl_lsm1_stat_t3,systematic_name
ABP140,-0.262682,-0.303352,-0.223626,-0.421704,no,no,no,yes,YOR239W
ARI1,-0.163952,-0.374765,-0.321876,-0.692391,no,no,no,yes,YGL157W
BPH1,-0.517519,-0.524553,-0.747609,-0.793595,no,no,yes,yes,YCR032W
BUR6,0.645573,0.217433,0.390403,0.200799,yes,no,no,no,YER159C
CDC20,-0.264072,-0.665268,-0.594191,-0.523884,no,yes,yes,yes,YGL116W
Code:
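# Read the data, keeping text columns as character vectors
d <- read.csv("data.csv", header = TRUE, stringsAsFactors = FALSE)
# matplot() draws one line per column, so transpose to get one line per gene
matplot(t(d[, 2:5]), type = "l", lty = 1, xlab = "time", ylab = "ctrl_lsm1_ratio")
# Reshape the stat columns to long format so they line up with unlist(d[, 2:5])
d2 <- reshape(d[, 6:9], varying = list(names(d[, 6:9])), direction = "long", v.names = "ctrl_lsm1_stat", ids = d$gene_name)
# Colour each point by its yes/no flag (1 = black for "yes", 2 = red for "no")
points(d2$time, unlist(d[, 2:5]), col = ifelse(d2$ctrl_lsm1_stat == "yes", 1, 2), cex = 2.0)
legend("topright", legend = c("yes", "no"), col = c(1, 2), pch = 21)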
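The question also asked for only two rows at a time. A minimal base R sketch of that, assuming the two genes are picked by name (ABP140 and BUR6 stand in for rows 22 and 513) and with the sample data typed inline; the object names sel, ratio, stat and pt_col and the colour choices are my own:

```r
# Sample data copied from the question
d <- data.frame(
  gene_name = c("ABP140", "ARI1", "BPH1", "BUR6", "CDC20"),
  ctrl_lsm1_ratio_t0 = c(-0.262682, -0.163952, -0.517519, 0.645573, -0.264072),
  ctrl_lsm1_ratio_t1 = c(-0.303352, -0.374765, -0.524553, 0.217433, -0.665268),
  ctrl_lsm1_ratio_t2 = c(-0.223626, -0.321876, -0.747609, 0.390403, -0.594191),
  ctrl_lsm1_ratio_t3 = c(-0.421704, -0.692391, -0.793595, 0.200799, -0.523884),
  ctrl_lsm1_stat_t0 = c("no", "no", "no", "yes", "no"),
  ctrl_lsm1_stat_t1 = c("no", "no", "no", "no", "yes"),
  ctrl_lsm1_stat_t2 = c("no", "no", "yes", "no", "yes"),
  ctrl_lsm1_stat_t3 = c("yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Keep only the two genes of interest (assumed choice)
sel <- d[d$gene_name %in% c("ABP140", "BUR6"), ]

# Transpose so that rows are time points and columns are genes
ratio <- t(as.matrix(sel[, 2:5]))
stat <- t(as.matrix(sel[, 6:9]))
time <- 0:3

# One line per gene
line_col <- c("grey40", "steelblue")
matplot(time, ratio, type = "l", lty = 1, col = line_col,
        xlab = "time", ylab = "ctrl_lsm1_ratio")

# One point per gene and time point, coloured by the matching stat column
pt_col <- ifelse(stat == "yes", "black", "red")
points(rep(time, times = ncol(ratio)), as.vector(ratio),
       col = as.vector(pt_col), pch = 19)

legend("topright", legend = c(sel$gene_name, "yes", "no"),
       col = c(line_col, "black", "red"),
       lty = c(1, 1, NA, NA), pch = c(NA, NA, 19, 19))
```

Changing the gene names in the %in% filter should be enough to plot a different pair.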

Related

Adding new Data rows in R

I am trying to build a data frame so I can generate a Plot with a specific set of data, but I am having trouble getting the data into a table correctly.
So, here is what I have available from a data query:
> head(c, n=10)
EVTYPE FATALITIES INJURIES
834 TORNADO 5633 91346
856 TSTM WIND 504 6957
170 FLOOD 470 6789
130 EXCESSIVE HEAT 1903 6525
464 LIGHTNING 816 5230
275 HEAT 937 2100
427 ICE STORM 89 1975
153 FLASH FLOOD 978 1777
760 THUNDERSTORM WIND 133 1488
244 HAIL 15 1361
I then tried to generate a set of data variables to build a finished data.frame like this:
a <- c(c[1,1], c[1,2], c[1,3])
b <- c(c[6,1], c[4,2] + c[6,2], c[4,3] + c[6,3])
d <- c(c[2,1], c[2,2], c[2,3])
e <- c(c[3,1], c[3,2], c[3,3])
f <- c(c[5,1], c[5,2], c[5,3])
g <- c(c[7,1], c[7,2], c[7,3])
h <- c(c[8,1], c[8,2], c[8,3])
i <- c(c[9,1], c[9,2], c[9,3])
j <- c(c[10,1], c[10,2], c[10,3])
k <- c(c[11,1], c[11,2], c[11,3])
df <- data.frame(a,b,d,e,f,g,h,i,j)
names(df) <- c("Event", "Fatalities","Injuries")
But that is failing miserably. What I am getting is a long string of all the data variables, repeated 10 times. Nice trick, but that is not what I am looking for.
I would like to get a finished data.frame with ten (10) rows of data, like it was originally, but with my combined data in place. Is that possible?
I am using R version 3.5.3, and the tidyverse library is not available to install on that version.
Any ideas as to how I can generate that data.frame?
If a barplot is what you're after, here's a piece of code to get you that:
First, you need to get the data in the right format (that's probably what you tried to do in df), by column-binding the two numerical variables using cbind and transposing the resulting matrix using t (i.e., turning rows into columns and vice versa):
plotdata <- t(cbind(c$FATALITIES, c$INJURIES))
Then set the layout to your plot, with a wide margin for the x-axis to accommodate your long factor names:
par(mfrow=c(1,1), mar = c(8,3,3,3))
Now you're ready to plot the data; you grab the labels from c$EVTYPE, reduce the label size in cex.names and rotate them with las to avoid overplotting:
barplot(plotdata, beside=TRUE, names.arg=c$EVTYPE, col=c("red","blue"), cex.names=0.7, las=3)
(You can add main= to define the heading of your plot.)
That gives you a grouped barplot with a red bar (fatalities) and a blue bar (injuries) for each event type.
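As for the data.frame itself: a rough base R sketch (no tidyverse needed) of merging the EXCESSIVE HEAT row into HEAT, assuming that is the combination intended by the b <- ... line in the question. The data frame is called ev here (my name, not the asker's) so it does not mask base::c(), and the values are typed in from the head() output:

```r
# Data copied from the head() output in the question
ev <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
             "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL"),
  FATALITIES = c(5633, 504, 470, 1903, 816, 937, 89, 978, 133, 15),
  INJURIES = c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777, 1488, 1361),
  stringsAsFactors = FALSE
)

# Relabel the row to be merged, then sum both numeric columns per event type
ev$EVTYPE[ev$EVTYPE == "EXCESSIVE HEAT"] <- "HEAT"
combined <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = ev, FUN = sum)

# Restore the ordering by injuries and apply the column names from the question
combined <- combined[order(-combined$INJURIES), ]
names(combined) <- c("Event", "Fatalities", "Injuries")
rownames(combined) <- NULL
```

With only the ten rows shown this leaves nine rows; run on the full query result, the next event type would fill the tenth row. Building each row with c() mixes text and numbers in one vector, so everything gets coerced to a single type, which is probably part of why the original attempt misbehaved.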

How to create a heat map in R?

I am doing a multiple part project. To begin with I had a data set which provided the deposits per district over the years. After scrubbing the data set, I was able to create a data frame, which provides the growth of deposits by district. I have growth of deposits by 3 different kinds of institutions - foreign banks, public banks and private banks in 3 different data frames as the # of rows differs in each frame. I have been asked to create 3 maps (heat maps) with deposit growth against each of the kind of banks.
My data frame looks like the attached picture.
I want to make a heat map for the growth column.
Thanks.
Maybe this answer is just noise; if so, delete it without hesitation.
I'll show you how I make some heatmaps in R:
Fake data:
Gene Patient_A Patient_B Patient_C Patient_D
BRCA1 52 46 124 148
TP53 512 487 112 121
FOX3D 841 658 321 364
MAPK1 895 541 198 254
RASA1 785 554 125 69
ADAM18 12 65 85 121
library(gplots)  # provides redgreen() and heatmap.2()
hmcols <- rev(redgreen(2750))
heatmap.2(hm_mx, scale="row", key=TRUE, lhei=c(2,5), symkey=FALSE, density.info="none", trace="none", cexRow=1.1, cexCol=1.1, col=hmcols, dendrogram="none")
Here hm_mx is a numeric matrix. If you read the data with read.table, you will probably have to convert the data frame to a matrix and use the first column as row names to avoid errors from R:
hm <- read.table("hm1.txt", sep = '\t', header=TRUE, stringsAsFactors=FALSE)
row.names(hm) <- hm$Gene
hm_mx <- data.matrix(hm)
hm_mx <- hm_mx[,-c(1)]
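If gplots is not installed, roughly the same picture can be sketched with the heatmap() function that ships with base R. This is a swapped-in alternative to heatmap.2, not the method above; the fake data is typed inline and the heat.colors palette is an arbitrary choice:

```r
# Fake data from the table above
hm <- data.frame(
  Gene = c("BRCA1", "TP53", "FOX3D", "MAPK1", "RASA1", "ADAM18"),
  Patient_A = c(52, 512, 841, 895, 785, 12),
  Patient_B = c(46, 487, 658, 541, 554, 65),
  Patient_C = c(124, 112, 321, 198, 125, 85),
  Patient_D = c(148, 121, 364, 254, 69, 121),
  stringsAsFactors = FALSE
)

# Numeric matrix with the genes as row names
hm_mx <- as.matrix(hm[, -1])
rownames(hm_mx) <- hm$Gene

# Rowv = NA and Colv = NA switch off clustering and the dendrograms,
# scale = "row" standardises each gene across patients
heatmap(hm_mx, scale = "row", Rowv = NA, Colv = NA, col = heat.colors(50))
```

For the deposit data, the growth values would take the place of the patient columns, with districts as row names.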

R: How to make a "factorial therefrom numeric v." demand from dataframe

I need some help! :)
What I would like to do in R is to make a command where I can just assign the numerical variables to the factorial variable and from there keep on working, something like:
AgeAlfalfaBand <- c(Band therefrom Age) #see table below
so I can do things like: Correlate "Band therefrom Age" with "Band therefrom Larvae"
Or is there an even easier way?
Table:
Farmer Age [years] Larvae [per m2]
Band 2 1315
Band 4 725
Band 6 90
Fechney 1 520
Fechney 3 285
Fechney 9 30
Mulholland 2 725
Mulholland 6 20
Thank you for helping me! Regards
You will need to subset your data by Farmer first. Then correlate Age and Larvae for each subset.
Example:
band <- x[x$Farmer == "Band", ]
with(band, plot(Larvae~Age, main="Band"))
Edited:
You can compare all Age and Larvae values as follows:
pairs(~ Age + Larvae, data = x, col = as.factor(x$Farmer), pch = 16)
The col argument to pairs will plot the dots with a different color for every farmer. The pch argument will make the dots filled.
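To get the correlation itself for every farmer without subsetting by hand, something along these lines should work. It assumes the table is stored in a data frame x with the columns simplified to Farmer, Age and Larvae (the names used in the code above); cors is a name I made up:

```r
# Table from the question, with simplified column names (assumption)
x <- data.frame(
  Farmer = c("Band", "Band", "Band", "Fechney", "Fechney", "Fechney",
             "Mulholland", "Mulholland"),
  Age = c(2, 4, 6, 1, 3, 9, 2, 6),
  Larvae = c(1315, 725, 90, 520, 285, 30, 725, 20),
  stringsAsFactors = FALSE
)

# Split the rows by farmer and compute the Pearson correlation in each group
cors <- sapply(split(x, x$Farmer), function(g) cor(g$Age, g$Larvae))
cors
```

Keep in mind that Mulholland has only two observations, so its correlation is -1 by construction and says very little.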

Scilab Data stretching

I have a data file with 2 columns. First column runs from 0 to 1390 second column has different values. (1st column is X pixel coordinates 2nd is intensity values).
I would like to "stretch" the data so that the first column runs from 0 to 1516 and the second column gets linearly interpolated for these new datapoints.
Any simple way to do this in scilab?
Data looks like this:
0 300.333
1 289.667
2 273
...
1388 427
1389 393.667
1390 252
Interpolation
You can linearly interpolate using interpln. Following the demo implementation on the docs, this results in the below code.
Example code
x=[0 1 2 1388 1389 1390];
y=[300.333 289.667 273 427 393.667 252];
plot2d(x',y',[-3],"011"," ",[-10,0,1400, 500]);
yi=interpln([x;y],0:1390);
plot2d((0:1390)',yi',[3],"000");
Extrapolation
I think you are thinking of extrapolation, since it is outside the known measurements and not in between.
You should decide whether you would like to fit the data with datafit. For a tutorial see here or here.
The question was how to "stretch" the y vector from 1391 values to 1517 values. This can be done with interpln, as suggested by @user1149326, but the x vector has to be stretched before the interpolation:
x=[0 1 2 1388 1389 1390];
y=[300.333 289.667 273 427 393.667 252];
x2=linspace(0,1390,1517); // 1517 evenly spaced sample positions from 0 to 1390
yi=interpln([x;y],x2);
x3=0:1516;
plot2d(x3',yi',[3],"000");

R: Plots of subset still include excluded attributes, how do I get draw a plot without them?

I am trying to draw a boxplot in R:
I have a dataset with 70 attributes:
The format is
patient number medical_speciality number_of_procedures
111 Ortho 21
232 Emergency 16
878 Pediatrics 20
981 OBGYN 31
232 Care of Elderly 15
211 Ortho 32
238 Care of Elderly 11
219 Care of Elderly 6
189 Emergency 67
323 Emergency 23
189 Pediatrics 1
289 Ortho 34
I have been trying to get a subset to only include emergency, pediatrics in a boxplot (there are 10000+ datapoints in reality)
I thought that I could just do this:
newdata<-subset(olddata[ms$medical_specialty=='emergency'|olddata$medical_specialty=='pediatrics',])
plot(newdata)
Since if I do a summary of newdata, all it has is the pediatrics and emergency results. But when it comes to plotting it still includes the ortho, OBGYN, care of elderly in the x axis with no boxplot.
I presume that there is a way to do this in ggplot by doing
ggplot(newdata, aes(x=medical_speciality, y=num_of_procedures, fill=cond)) + geom_boxplot()
but this gives me the error:
Don't know how to automatically pick scale for object of type data.frame.
Defaulting to continuous
Error: Aesthetics must either be length one, or the same length as the dataProblems:cond
Can someone help me out?
I believe your problem comes from the fact that the column medical_speciality is a factor.
So, even though you subset your data the right way, you still get all the levels (including "Ortho", "OBGYN", etc...).
You can get rid of them by using the function droplevels:
newdata <- olddata[olddata$medical_specialty=='emergency' | olddata$medical_specialty=='pediatrics', ]
newdata <- droplevels(newdata) ## THIS IS THE NEW ADDITION
plot(newdata)
Does this help?
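A small self-contained sketch of the effect, using the twelve sample rows from the question. The column names follow the table header, and keep and levels_before are names introduced here for illustration:

```r
# Sample rows from the question; medical_speciality is stored as a factor
olddata <- data.frame(
  patient_number = c(111, 232, 878, 981, 232, 211, 238, 219, 189, 323, 189, 289),
  medical_speciality = factor(c("Ortho", "Emergency", "Pediatrics", "OBGYN",
                                "Care of Elderly", "Ortho", "Care of Elderly",
                                "Care of Elderly", "Emergency", "Emergency",
                                "Pediatrics", "Ortho")),
  number_of_procedures = c(21, 16, 20, 31, 15, 32, 11, 6, 67, 23, 1, 34)
)

# Subsetting keeps the rows you want but not the factor's level set
keep <- c("Emergency", "Pediatrics")
newdata <- olddata[olddata$medical_speciality %in% keep, ]
levels_before <- nlevels(newdata$medical_speciality)  # all original levels remain

# droplevels() removes the levels that no longer occur in the data
newdata <- droplevels(newdata)

# The boxplot now has one box per remaining speciality
boxplot(number_of_procedures ~ medical_speciality, data = newdata)
```

As for the ggplot attempt, fill=cond refers to a variable that does not appear in the data shown, which would explain that error; dropping it (or mapping fill to medical_speciality) is probably what was intended.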
