How does one ignore the "moral" column in R

moral verw ho dog
4 1049 1 2
4 2799 1 3
2 8412 4 4
4 2122 1 3
4 2171 1 3
4 2241 1 2
4 3398 1 4
I was normalizing a dataset using
noid = data.Normalization(newx,type="n4") but I want to ignore JUST the "moral" column but normalize everything else.
Any help will be greatly appreciated.

As David Arenburg suggested in his comment, if you don't pass the moral column to the normalisation routine it will obviously be ignored. If you want it to be included in the normalized dataset you can do something like this:
noid <- data.Normalization(newx[-1], type="n4")
noid <- cbind(moral=newx[1], noid)
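The same idea can be sketched without the normalisation package, which also keeps the columns in their original order. This assumes that type="n4" means unitization with zero minimum, (x - min) / range, as described in the clusterSim documentation; the helper name unitize is made up for illustration.

```r
# Example data from the question
newx <- data.frame(moral = c(4, 4, 2, 4, 4, 4, 4),
                   verw  = c(1049, 2799, 8412, 2122, 2171, 2241, 3398),
                   ho    = c(1, 1, 4, 1, 1, 1, 1),
                   dog   = c(2, 3, 4, 3, 3, 2, 4))

# Hypothetical helper: rescale a numeric vector to the range [0, 1]
# (assumed to be what type = "n4" does)
unitize <- function(x) (x - min(x)) / (max(x) - min(x))

# Normalise every column except "moral", leaving "moral" untouched
cols <- setdiff(names(newx), "moral")
noid <- newx
noid[cols] <- lapply(newx[cols], unitize)
```

Selecting the columns by name with setdiff() avoids relying on "moral" being the first column.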

Related

Is there an R function to redefine a variable so I can use the spread function?

I'm new with R and I have the following problem. Maybe it's a really easy question, but I don't know the terms to search for an answer.
My problem:
I have several persons, each person is assigned a studynumber (SN). And each SN has one or more tests being performed, the test can have multiple results.
My data is long at the moment, but I need it to be wide (one row for each SN).
For example:
What I have:
SN testnumbers result
1 1 1234 6
2 1 1234 9
3 2 4567 6
4 3 5678 9
5 3 8790 9
What I want:
SN test1result1 test1result2 test2result1
1 1 6 6 NA
2 2 6 NA NA
3 3 9 NA 9
So I need to renumber the testnumbers into test 1 etc for each SN, in order to use the spread function, I think. But I don't know how.
I did manage to renumber testnumber into a list of 1 till the last unique testnumber, but still the wide dataframe looks awful.
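One possible way to do the renumbering, sketched in base R rather than with spread: number the distinct tests within each SN, number the results within each test, and build the column key from both. The column names test, res and key are assumptions introduced for the example.

```r
# Example data from the question
df <- data.frame(SN          = c(1, 1, 2, 3, 3),
                 testnumbers = c(1234, 1234, 4567, 5678, 8790),
                 result      = c(6, 9, 6, 9, 9))

# Number the distinct tests within each SN: 1, 2, ...
df$test <- ave(df$testnumbers, df$SN,
               FUN = function(x) match(x, unique(x)))

# Number the results within each SN/test combination
df$res <- ave(df$result, df$SN, df$test, FUN = seq_along)

# Build the wide column key, e.g. "test1result2"
df$key <- paste0("test", df$test, "result", df$res)

# Long to wide: one row per SN, one column per key
wide <- reshape(df[c("SN", "key", "result")],
                idvar = "SN", timevar = "key", direction = "wide")
```

The resulting columns are prefixed with "result." (for example result.test1result1); with tidyr, the same key column could be passed to spread instead of reshape.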

making a table with multiple columns in r

I'm obviously a novice in writing R code.
I have tried multiple solutions to my problem from Stack Overflow but I'm still stuck.
My dataset is carcinoid, patients with a small bowel cancer, with multiple variables.
I would like to know how different variables are distributed:
carcinoid$met_any - with metastatic disease 1=yes, 2=no (computed variable)
carcinoid$liver_mets_y_n - liver metastases 1=yes, 2=no
carcinoid$regional_lymph_nodes_y_n - regional lymph nodes 1=yes, 2=no
carcinoid$peritoneal_carcinosis_y_n - peritoneal carcinosis 1=yes, 2=no
I have tried this solution, which is close to my wanted result:
ddply(carcinoid, .(carcinoid$met_any), summarize,
livermetastases=sum(carcinoid$liver_mets_y_n=="1"),
regionalmets=sum(carcinoid$regional_lymph_nodes_y_n=="1"),
pc=sum(carcinoid$peritoneal_carcinosis_y_n=="1"))
with the result being:
carcinoid$met_any livermetastases regionalmets pc
1 1 21 46 7
2 2 21 46 7
Now, I expected the row with 2 (= no metastases) to be empty. I would also like the rows in the column carcinoid$met_any to give the number of patients.
If someone could help me, it would be very much appreciated!
John
Edit
My dataset (in the real data the column numbers are 1, 43, 28, 31, 33):
1=yes, 2=no
case_nr met_any liver_mets_y_n regional_lymph_nodes_y_n pc
1 1 1 1 2
2 1 2 1 2
3 2 2 2 2
4 1 2 1 1
5 1 2 1 1
Desired output: I want to count the number of 1s and 2s. If it works, all 1s should end up in the met_any=1 row.
nr liver_mets regional_lymph_nodes pc
met_any=1 4 1 4 2
met_any=2 1 4 1 3
EDIT
Although I was probably very unclear in my question, with your help I could make the table I needed!
setDT(carcinoid)[,lapply(.SD,table),.SDcols=c(43,28,31,33,17)]
gives
met_any lymph_nod liver_met paraortal extrahep
1: 50 46 21 6 15
2: 111 115 140 151 146
I am very grateful! @mtoto provided the solution.
John
Based on your example data, this data.table approach works:
library(data.table)
setDT(df)[,lapply(.SD,table),.SDcols=c(2:5)]
# met_any liver_mets_y_n regional_lymph_nodes_y_n pc
# 1: 4 1 4 2
# 2: 1 4 1 3
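For comparison, a base R sketch of the same counts, plus the per-group counts the question first attempted. In the ddply call from the question, carcinoid$liver_mets_y_n refers to the whole column rather than the rows of the current group, which is why both rows showed identical totals. The object names counts and yes_by_group are made up for the example.

```r
# Example data from the question's edit
df <- data.frame(case_nr = 1:5,
                 met_any = c(1, 1, 2, 1, 1),
                 liver_mets_y_n = c(1, 2, 2, 2, 2),
                 regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
                 pc = c(2, 2, 2, 1, 1))

# Count the 1s and 2s in each column. Fixing the factor levels keeps
# the result a 2-row matrix even if a column contains only one code.
counts <- sapply(df[2:5], function(x) table(factor(x, levels = 1:2)))

# Number of "yes" answers (code 1) per variable, split by met_any.
# Adding 0 turns the logical matrix into a numeric one for rowsum().
yes_by_group <- rowsum((df[3:5] == 1) + 0, group = df$met_any)
```

Here counts has one row per code (1 and 2), and yes_by_group has one row per value of met_any, so the met_any=2 row contains only zeros for this example data.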

Creating a column in R which is based in information in other columns

My data set consists of the columns CODE, FIRE, CARBON, and DEPTH.
Examples of the values in the column CODE are chic, chiq, lopc, lopq, etc.
Besides this, I have information stating that each value in the column CODE (chic, for example) belongs to either A, B or C.
So I want to create a new column in R which is based on the CODE column, so that when CODE is, for example, chic, the value in the new column is A on the same row.
I hope someone can help me with this.
Thanks in advance!
(I wanted to insert a table here with an example of the data, but it didn't work. I hope it is still clear what I want to create.)
You could also use (assuming your data is in a data frame df)...
df$NEW[df$CODE == "chic"] <- "A"
df$NEW[df$CODE == "chiq"] <- "B"
df
# CODE DEPTH CARBON FIRE NEW
# 1 chic 0 24.2 U A
# 2 chic 10 25.6 U A
# 3 chiq 0 20.3 B B
# 4 chiq 0 50.3 B B
# 5 chiq 10 40.5 B B
# 6 chiq 20 24.2 B B
# 7 polc 0 60.5 U <NA>
# 8 polc 10 10.2 U <NA>
# 9 polq 0 20.5 U <NA>
etc...
To recode multiple values at once, use %in% and a vector, for example:
df$NEW[df$CODE %in% c("chic", "chiq")] <- "A"
You can try :
yourdata$newcolumn<-sapply(as.character(yourdata$CODE),
switch,
"chic"="A","code2"="B","code3"="C")
Of course, you'll need to adapt the code to your data, but you didn't give a lot of details...
Example:
With yourdata being the data.frame below:
CODE FIRE CARBON DEPTH
1 chic 1 3 -1.3107045
2 code2 1 3 1.0781249
3 code3 1 3 -0.6762211
4 chic 1 3 0.9196376
5 code2 2 4 -0.7161380
6 code2 2 4 -0.5506980
7 chic 2 4 0.3323771
8 chic 2 4 -0.4131832
you get the new data.frame yourdata:
CODE FIRE CARBON DEPTH newcolumn
1 chic 1 3 -1.3107045 A
2 code2 1 3 1.0781249 B
3 code3 1 3 -0.6762211 C
4 chic 1 3 0.9196376 A
5 code2 2 4 -0.7161380 B
6 code2 2 4 -0.5506980 B
7 chic 2 4 0.3323771 A
8 chic 2 4 -0.4131832 A
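Another option, sketched here under assumed data, is a named lookup vector: the names are the codes and the values are the groups, so a single indexing step recodes the whole column. The mapping code_to_group below is hypothetical, since the question does not say which code belongs to which group.

```r
# Small made-up data frame using codes mentioned in the question
df <- data.frame(CODE  = c("chic", "chiq", "lopc", "lopq", "chic"),
                 DEPTH = c(0, 10, 0, 10, 20))

# Hypothetical mapping from code to group
code_to_group <- c(chic = "A", chiq = "B", lopc = "C", lopq = "C")

# Look up each row's code; codes missing from the mapping become NA
df$NEW <- unname(code_to_group[as.character(df$CODE)])
```

Keeping the mapping in one vector means a new code only needs one new entry, rather than another assignment line or another switch branch.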

How do you create a new column containing percentage data calculated from other columns?

Please excuse the very novice question, but I'm trying to create a new column in a data frame that contains percentages based on other columns. For example, the data I'm working with is similar to the following, where the That column is a binary factor (i.e. presence or absence of "that"), the Verb column is the individual verb (i.e. verbs that may or may not be followed by "that"), and the Freq column indicates the frequency of each individual verb.
That Verb Freq
1 That believe 3
2 NoThat think 4
3 That say 3
4 That believe 3
5 That think 4
6 NoThat say 3
7 NoThat believe 3
8 NoThat think 4
9 That say 3
10 NoThat think 4
What I want is to add another column that provides the overall rate of "that" expression (coded as "That") for each of the different verbs. Something like the following:
That Verb Freq Perc.That
1 That believe 3 33.3
2 NoThat think 4 25.0
3 That say 3 33.3
4 That believe 3 33.3
5 That think 4 25.0
6 NoThat say 3 33.3
7 NoThat believe 3 33.3
8 NoThat think 4 25.0
9 That say 3 33.3
10 NoThat think 4 25.0
It may be that I've missed a similar question elsewhere. If so, my apologies. Nevertheless, thanks in advance for any help.
You want to use the ddply function in the plyr library:
#install.packages('plyr')
library(plyr)
dat # your data frame
ddply(dat, .(verb), transform, perc.that = freq/sum(freq))
# that verb freq perc.that
#1 That believe 3 0.3333333
#2 That believe 3 0.3333333
#3 NoThat believe 3 0.3333333
#4 That say 3 0.3333333
#...
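A base R sketch of the same grouped calculation with ave(), assuming the capitalised column names from the question (That, Verb, Freq). It shows two readings: Freq / sum(Freq) per verb, as in the ddply answer, and the share of rows coded "That" per verb. The second is an assumption about what "rate of that expression" means and does not reproduce the question's sample output. The column names Perc.Freq and Perc.That.rows are made up for the example.

```r
# Example data from the question
dat <- data.frame(
  That = c("That", "NoThat", "That", "That", "That",
           "NoThat", "NoThat", "NoThat", "That", "NoThat"),
  Verb = c("believe", "think", "say", "believe", "think",
           "say", "believe", "think", "say", "think"),
  Freq = c(3, 4, 3, 3, 4, 3, 3, 4, 3, 4)
)

# Same calculation as the ddply answer, as a percentage:
# each row's Freq divided by the total Freq of its verb
dat$Perc.Freq <- 100 * ave(dat$Freq, dat$Verb,
                           FUN = function(x) x / sum(x))

# Alternative reading: percentage of rows coded "That" within each verb
dat$Perc.That.rows <- 100 * ave(as.numeric(dat$That == "That"),
                                dat$Verb, FUN = mean)
```

Unlike ddply, ave() returns values in the original row order, so the data frame is not re-sorted by verb.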

Short(er) notation of selecting a part of a data.frame or other objects in R

I always get angry at my R code when I have to process data frames, e.g. filtering out certain rows. The code gets very illegible as I tend to choose meaningful, but long, names for my objects. An example:
all.mutations.extra.large.name <- read.delim(filename)
head(all.mutations.extra.large.name)
id gene pos aa consequence V
ENSG00000105732 ZN574_HUMAN 81 x/N missense_variant 3
ENSG00000125879 OTOR_HUMAN 7 V/3 missense_variant 2
ENSG00000129194 SOX15_HUMAN 20 N/T missense_variant 3
ENSG00000099204 ABLM1_HUMAN 33 H/R missense_variant 2
ENSG00000103335 PIEZ1_HUMAN 11 Q/R missense_variant 3
ENSG00000171533 MAP6_HUMAN 39 A/G missense_variant 3
all.mutations.extra.large.name <- all.mutations.extra.large.name[which(all.mutations.extra.large.name$gene == "ZN574_HUMAN"), ]
So in order to kick out all other lines in which I am not interested, I need to reference the object all.mutations.extra.large.name three times. And repeating this kind of step for different columns makes the code really difficult to understand.
Therefore my question: is there a way to filter out rows by a criterion without referencing the object three times? Something like this would be beautiful: myobj[,gene=="ZN574_HUMAN"]
You can use subset for that:
subset(all.mutations.extra.large.name, gene == "ZN574_HUMAN")
Several options:
all.mutations.extra.large.name <- data.frame(a=1:5, b=2:6)
within(all.mutations.extra.large.name, a[a < 3] <- 0)
a b
1 0 2
2 0 3
3 3 4
4 4 5
5 5 6
transform(all.mutations.extra.large.name, b = b^2)
a b
1 1 4
2 2 9
3 3 16
4 4 25
5 5 36
Also check ?attach if you would like to avoid repetitive typing like all.mutations.extra.large.name$foo.
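As a further sketch, with() evaluates an expression inside the data frame, so the column names can be used directly in a bracket filter without attaching anything. The small data frame muts below is made up for illustration, with column names taken from the question.

```r
# Made-up data with two of the question's columns
muts <- data.frame(gene = c("ZN574_HUMAN", "OTOR_HUMAN", "ZN574_HUMAN"),
                   pos  = c(81, 7, 20))

# Build the row filter inside with(); the object is named twice, not three times
hits <- muts[with(muts, gene == "ZN574_HUMAN" & pos > 50), ]

# The same filter written with subset(), as in the first answer
hits2 <- subset(muts, gene == "ZN574_HUMAN" & pos > 50)
```

Both forms return the same rows here. They differ when the condition evaluates to NA: subset() drops such rows, while logical bracket indexing returns a row of NAs for them.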
