Updating a data.frame based on a lookup value - r

I'm working in a very restrictive front-end system that only lets me add a short R script (about 1000 characters) to process some data. I want to replace the values in a data.frame (filedata_model) with values from a translation list.
Here's what I have so far:
vGrades <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 0, 4, 4, 3.7, 3.3, 3, 2.7, 2.3, 2, 1.7, 1.3, 1, 0, 0);
vGradeMx <- matrix(vGrades, nrow = 14, ncol = 2);
colnames(vGradeMx) <- c("CB_GRADE", "RNL_GPA");
vGradeTb <- as.data.frame(vGradeMx);
I realize this is probably wildly inefficient. I'm used to working with VBA, C-based programming languages, and a LOT of SQL. If I could write an UPDATE statement this would take me 2 seconds. But I don't have any back-end access, or write access to the actual data, outside of this small box I can put R scripting into.
So here's why I've written what I have:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13 - These are the values we're getting back from a research vendor in a file
0, 4, 4, 3.7, 3.3, 3, 2.7, 2.3, 2, 1.7, 1.3, 1, 0, 0 - These are the values that we'd like to change them to.
I set up an additional data frame to hold the original and translation values, but now what? Everything I have tried generally fails. All the R I've learned has come from a weekend of trying to cram as many books in my brain as I can.
I appreciate the help!

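Use match() to find each value's position in vGradeTb$CB_GRADE, then use those positions to index vGradeTb$RNL_GPA:
filedata_model$column_name_here <- vGradeTb$RNL_GPA[match(filedata_model$column_name_here, vGradeTb$CB_GRADE)]
where column_name_here is the column containing the values you want to change. Any value that does not appear in CB_GRADE becomes NA.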
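To see what match() does step by step, here is a small sketch. The filedata_model below and its grade column are made up for illustration (the real column name isn't given). match() returns each value's row position in CB_GRADE, or NA when there is no match; the ifelse() step is one optional way to keep untranslated values instead of overwriting them with NA.

```r
# Lookup table from the question, built directly as a data.frame
vGradeTb <- data.frame(
  CB_GRADE = 0:13,
  RNL_GPA  = c(0, 4, 4, 3.7, 3.3, 3, 2.7, 2.3, 2, 1.7, 1.3, 1, 0, 0)
)

# Hypothetical stand-in for filedata_model; "grade" is an assumed column name
filedata_model <- data.frame(id = 1:5, grade = c(3, 0, 13, 7, 99))

# Row position of each grade in CB_GRADE (NA when the grade is not in the table)
idx <- match(filedata_model$grade, vGradeTb$CB_GRADE)

# Translate; where idx is NA, keep the original value instead of writing NA
filedata_model$grade <- ifelse(is.na(idx), filedata_model$grade, vGradeTb$RNL_GPA[idx])
print(filedata_model)
```

Because CB_GRADE here is exactly 0:13, vGradeTb$RNL_GPA[x + 1] would also work and is shorter, which may help with the 1000-character limit, but only if every value in x is a whole number from 0 to 13.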

Related

Fisher's exact test in R: why does it depend on row and column order?

I have a contingency table that looks like this:
matrix(c(0, 7, 2, 13), 2, 2)
So I assumed that the following three contingency tables are equivalent to it:
matrix(c(2, 13, 0, 7), 2, 2)
matrix(c(7, 0, 13, 2), 2, 2)
matrix(c(13, 2, 7, 0), 2, 2)
They differ only by row and/or column permutations, so according to Fisher's exact test I expected the order not to matter. See the example paragraph, which gives an equation.
Can you explain why I get different results, and why I have to compensate by changing the alternative argument? My implementation of the equation and my strange-looking calls to the built-in function are below:
fisher.test(matrix(c(0, 7, 2, 13), 2, 2), alternative = "less")
fisher.test(matrix(c(2, 13, 0, 7), 2, 2), alternative = "greater")
fisher.test(matrix(c(7, 0, 13, 2), 2, 2), alternative = "greater")
fisher.test(matrix(c(13, 2, 7, 0), 2, 2), alternative = "less")
(factorial(2)*factorial(20)*factorial(7)*factorial(15))/(factorial(2)*factorial(0)*factorial(13)*factorial(7)*factorial(22))
What have I missed from the literature? :)
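For a 2×2 table, R's fisher.test() documentation states that the one-sided alternatives refer to the odds ratio (a*d)/(b*c). Swapping only the rows or only the columns replaces the odds ratio by its reciprocal, so a "less" hypothesis for one table corresponds to "greater" for the other, while swapping both restores it. The two-sided p-value does not depend on the order at all. The sketch below checks this on the four tables from the question and compares the factorial expression with dhyper(); the names tabs, odds, p_two, p_less, p_greater and by_hand are only illustrative.

```r
# The four tables from the question: original, columns swapped, rows swapped, both swapped
tabs <- list(
  orig = matrix(c(0, 7, 2, 13), 2, 2),
  cols = matrix(c(2, 13, 0, 7), 2, 2),
  rows = matrix(c(7, 0, 13, 2), 2, 2),
  both = matrix(c(13, 2, 7, 0), 2, 2)
)

# Sample odds ratio a*d / (b*c); swapping a single margin inverts it (0 <-> Inf here)
odds <- sapply(tabs, function(m) (m[1, 1] * m[2, 2]) / (m[1, 2] * m[2, 1]))

p_two     <- sapply(tabs, function(m) fisher.test(m)$p.value)
p_less    <- sapply(tabs, function(m) fisher.test(m, alternative = "less")$p.value)
p_greater <- sapply(tabs, function(m) fisher.test(m, alternative = "greater")$p.value)
print(rbind(odds, p_two, p_less, p_greater))

# The factorial formula is the hypergeometric probability of the observed table.
# dhyper(x, m, n, k): x = cell [1,1], m = first column total,
# n = second column total, k = first row total
by_hand <- (factorial(2) * factorial(20) * factorial(7) * factorial(15)) /
  (factorial(2) * factorial(0) * factorial(13) * factorial(7) * factorial(22))
print(all.equal(by_hand, dhyper(0, 7, 15, 2)))
```

In the printed matrix, the "less" p-value of the original table matches the "greater" p-value of the row-swapped and column-swapped tables and the "less" p-value of the doubly swapped one, which is exactly the pattern in the question's calls.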

Subsample a matrix by selecting locations with specific values in R

I have to use R instead of MATLAB, and I'm new to it.
I have a large array of data with a repeating pattern like 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10...
I need to find the positions where the values 1, 4, 7, or 10 occur, so I can create a sample from those positions.
In this case that would be positions 1, 4, 7, 10, 11, 14, 17, 20, and so on (holding the values 1, 4, 7, 10, 1, 4, 7, 10).
In MATLAB it would be y = find(ismember(x, [1, 4, 7, 10])).
Please help! Thanks, Pavel
Something like this?
foo <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
bar <- c(1, 4, 7, 10)
which(foo %in% bar)
#> [1] 1 4 7 10 11 14 17 20
@nicola, feel free to post this as your own answer and get the credit for it; I'm simply trying to close answered questions.
The %in% operator is what you want. For example,
x <- rep(1:10, 2)  # your data goes in x; this repeating 1..10 vector is a stand-in
targets <- c(1, 4, 7, 10)
locations <- x %in% targets
# locations is a logical vector you can then use to subset x:
y <- x[locations]
There'll be an extra step or two if you want the row and column indices of the locations, but it's not clear whether you do. (Note that if x is a matrix, the logicals will be in column-major order.)
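If x is actually a matrix, %in% still returns a plain logical vector in column-major order, and base R's arrayInd() converts the resulting linear positions back into row/column pairs. A minimal sketch, using a made-up 4 × 5 matrix m built from the repeating 1..10 pattern:

```r
# Hypothetical example matrix: the repeating 1..10 pattern, filled column by column
m <- matrix(rep(1:10, 2), nrow = 4)
targets <- c(1, 4, 7, 10)

# Linear (column-major) positions of the target values, as with which(foo %in% bar)
pos <- which(m %in% targets)

# Convert linear positions into (row, column) pairs
rc <- arrayInd(pos, dim(m))
colnames(rc) <- c("row", "col")
print(cbind(rc, value = m[rc]))
```

Indexing m with the two-column matrix rc (m[rc]) pulls out the matching values, so the same rc can drive the subsample.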

Highcharts: display the date for every point on the x-axis

I have a spline chart with datetime values on the x-axis. I want the dates to be generated automatically, but also displayed for every point on the x-axis. Take the demo as an example: I want every point to be shown with its date on the x-axis. How do I do this? The demo is found here. I haven't tried anything because I don't know how, which is why I'm asking here. I also still don't know why the site kept refusing to let me post this.
You can set tickPositions explicitly, as in this example: http://jsfiddle.net/z5P8d/
tickPositions: [Date.UTC(1970,9, 27),Date.UTC(1970, 9, 26),Date.UTC(1970, 11, 1),Date.UTC(1970, 11, 11),Date.UTC(1970, 11, 25), Date.UTC(1971, 0, 8),Date.UTC(1971, 0, 15), Date.UTC(1971, 1, 1),Date.UTC(1971, 1, 8), Date.UTC(1971, 1, 21),Date.UTC(1971, 2, 12), Date.UTC(1971, 2, 25),Date.UTC(1971, 3, 4), Date.UTC(1971, 3, 9),Date.UTC(1971, 3, 13), Date.UTC(1971, 3, 19), Date.UTC(1971, 4, 25),Date.UTC(1971, 4, 31), Date.UTC(1971, 5, 7) ],

Visual Basic 2D Array Crashes Visual Studio

I am using Visual Studio 2012 to build a web application with VB.NET and ASP.NET. I previously had 116 separate arrays, each with 116 values. I've realized that the calculations I want would be easier with a single 2D array instead. My 2D array has 116 rows of 116 integers. As soon as I close the array initializer with "}", Visual Studio crashes and restarts.
Is there a size limit to 2D arrays? Is there a step that I'm missing? Thanks!
My code looks like this:
Dim data(,) As Integer = {{0, 9, 5, 7, 5, 7, 6, 6, 7, 2, 4, 2, 5, 7, 4, 6, 5, 3, 5...etc}, _
{9, 0, 7, 6, 6, 2, 5, 8, 8, 1, 5, 1, 7, 7, 6, 6, 7, 3, 7...etc}, _
{5, 7, 0, 7, 6, 5, 4, 4, 5, 2, 4, 3, 8, 5, 7, 8, 3, 5, 4...etc}, _
................[112 more of this].......................
{5, 8, 5, 8, 7, 7, 9, 0, 7, 2, 4, 2, 5, 7, 4, 6, 5, 3, 7...etc}}
^
'This is where crash happens
EDIT: I've done some testing and it seems to break around 40 lines.
Sounds like you should consider a matrix, or use a List(Of T).
Creating 2-dimensional array with unlimited size?
Another possibility is a dictionary, though it's hard to say without knowing what you're doing. Could you tell us what you're using the data for?
There's a nice explanation of generics, lists, collections, and arrays at the link below. As noted in the comment above, I think you should dimension your array explicitly, e.g. Dim data(115, 115) As Integer, or use ReDim Preserve as you discover the size.
http://www.dreamincode.net/forums/topic/333038-arrays-and-collections-overview/

R hist right/left clump binning

I have a data set of 15,000 real values ranging from 0 to 100. The data are HEAVILY skewed, with most values at the low end. I'm trying to get the following bins: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, >10. So far I have created the following:
breakvector = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100)
and run:
hist(datavector, breaks=breakvector, xlim=c(0, 13))
However, the resulting histogram seems to leave out the data greater than 13. Does anyone know how to get R to put all the remaining data into the last bin? Thanks in advance.
How about this:
datavector <- c(sample(1:9, 40, replace = TRUE), sample(10:100, 20, replace = TRUE))
breakvector <- 0:11
hist(ifelse(datavector > 10, 11, datavector), breaks = breakvector, xlim = c(0, 13), xaxt = "n")
axis(1, at = 1:11 - 0.5, labels = c(1:10, ">10"))
Rather than adjusting the breaks, I just move all the values >10 into a bin for 11, then relabel the axis accordingly.
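Part of why the original attempt looks wrong: hist() plots densities instead of counts when the breaks are not equally spaced (freq defaults to TRUE only for equidistant breaks), so the wide 10 to 100 bin is drawn as a very short bar, and xlim = c(0, 13) also clips it. Another sketch, assuming made-up data in place of the real 15,000 values, bins with cut() and draws the counts with barplot(), so the ">10" category gets a bar of the same width as the others:

```r
set.seed(42)
# Made-up skewed data standing in for the real 15,000 values (assumption)
datavector <- c(runif(900, 0, 10), runif(100, 10, 100))

# Eleven bins: [0,1], (1,2], ..., (9,10], then everything above 10
binned <- cut(datavector, breaks = c(0:10, Inf), include.lowest = TRUE,
              labels = c(paste0(0:9, "-", 1:10), ">10"))

# Count values per bin and draw equal-width bars, so ">10" is not squashed
counts <- table(binned)
barplot(counts, las = 2, ylab = "Count")
```

Using Inf as the last break guarantees that nothing above 10 is dropped, whatever the maximum of the data turns out to be.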
