How to create a prompt/answer system in R? - r

I saw in the thread 'Creating a Prompt/Answer system to input data into R' that it is possible to Let R respond to questions. I would like to do the same but based on my dataframe.
My dataframe looks like the following:
PP
Trait
1
1
2
1
3
2
4
1
5
3
6
2
I basically would like to let R answer the following questions:
Did participants score at least 1 on Trait? Y/N
Did participants score at least 2 on Trait? Y/N
Did participants score at least 3 on Trait? Y/N
Is this possible?
Thank you in advance!

The svDialogs - package might provide what you need. You can create Dialog- and Input-Boxes with it that return the entered values wich you can later process within your script.
Link: svDialogs on Github

Related

Conditional sentence for specific rows

Disclaimer: I am not that advanced with R Studio and hence my question might be quite self explanatory.
Lets assume the following data set
**ID value1a value2a value1b value2b ...
1 2 3 ...
8 4 4
2 5 5
I want to create a forth variable that is part of the expression of an if sentence, that logically should go as follows:
If ID = 1 is over 5 in "value1x" and below 3 in "value2x", then add the value 1 to this forth variable. Hence the forth variable should function as a counter, that the number in the forth variable indiciates the frequency of value1x being over 5 and value2x being below 3.
I hope my question makes sense and Id appreciate answers!

Assign new ID taking into account previous changes

Sorry I do not know how to properly title my question. It is easier to understand with an example.
Sample data
Consider the following example.
> l_ids=as.data.frame(cbind(a=c("strong","intense","intensity"),
id=c("1","2","3"),new_id=c("","1","2")),stringsAsFactors = FALSE)
a id new_id
1 strong 1
2 intense 2 1
3 intensity 3 2
I would like to update the id of each word in a with a new_id, if it applies. Consider this as a synonym dictionary. As I iterate over new_id;
> for (i in 1:nrow(l_ids)){
+ if (nchar(l_ids$new_id[i])>0){
+ l_ids$id[i]=l_ids$new_id[i]
+ }
+ }
> l_ids
a id new_id
1 strong 1
2 intense 1 1
3 intensity 2 2
The problem is that I would like for intensity to also be given a 1. Is there a way to do this without having to iterate multiple times?
Update on background
I have a document where I have a list of synonyms. These are synonyms only relevant to the field of application of the problem. Example:
> dictionary
good bad
1 strong intense
2 intense intensity
3 light soft
I am then given a list of words, each with a given id. My task is to check if any of those words is in the bad column of dictionary and, if so, update it with the id of the word to its left. As can be seen, intensity would need two steps to become strong (a good word in the dictionary). Is there a way to do so without having to do multiple iterations? (say, a for loop)

Count no. of categories within categories in my df in R

Each row of data in my df is from a study (or "Article"). Each Article has a sponsor ("Sponsor") who may have sponsored a number of the articles in my dataset.
I want to produce a summary table to show how many articles each Sponsor has sponsored in my dataset.
I hope you can help!
many thanks!!!!
Annabel
What I presume your dataframe looks like:
df=data.frame(article=1:10,sponsor=letters[round(runif(10,1,5))])
head(df)
article sponsor
1 1 a
2 2 b
3 3 e
4 4 d
5 5 e
6 6 b
How you could quickly check the number of articles per sponsor:
table(df$sponsor)
I'm not sure what you want to do exactly, but
I guess what you are looking for is:
table(df$Sponsor)
If you want the value that occurs the most frequent, you can use:
names(sort(table(df$Sponsor), decreasing=TRUE))[1]
next time, please try to provide a MWE so it is easier for us to help.

%Rpush >> lists of complex objects (e.g. pandas DataFrames in IPython Notebook)

Once again, I am having a great time with Notebook and the emerging rmagic infrastructure, but I have another question about the bridge between the two. Currently I am attempting to pass several subsets of a pandas DataFrame to R for visualization with ggplot2. Just to be clear upfront, I know that I could pass the entire DataFrame and perform additional subsetting in R. My preference, however, is to leverage the data management capability of Python and the subset-wise operations I am performing are just easier and faster using pandas than the equivalent operations in R. So for the sake of efficiency and morbid curiosity...
I have been trying to figure out if there is a way to push several objects at once. The wrinkle is that sometimes I don't know in advance how many items will need to be pushed. To retain flexibility, I have been populating dictionaries with DataFrames throughout the front end of the script. The following code provides a reasonable facsimile of what I am working through (I have not converted via com.convert_to_r_dataframe for simplicity, but my real code does take this step):
import pandas as pd
from pandas import DataFrame
%load_ext rmagic
d1=DataFrame(np.arange(16).reshape(4,4))
d2=DataFrame(np.arange(20).reshape(5,4))
d_list=[d1,d2]
names=['n1','n2']
d_dict=dict(zip(names,d_list))
for name in d_dict.keys():
exec '%s=d_dict[name]' % name
%Rpush n1
As can be seen, I can assign a static name and push the DataFrame into the R namespace individually (as well as in a 'list' >> %Rpush n1 n2). What I cannot do is something like the following:
for name in d_dict.keys():
%Rpush d_dict[name]
That snippet raises an exception >> KeyError: u'd_dict[name]'. I also tried to deposit the dynamically named DataFrames in a list, the list references end up pointing to the data rather than the object reference:
df_list=[]
for name in d_dict.keys():
exec '%s=d_dict[name]' % name
exec 'df_list.append(%s)' % name
print df_list
for df in df_list:
%Rpush df
[ 0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15,
0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19]
%Rpush did not throw an exception when I looped through the lists contents, but the DataFrames could not be found in the R namespace. I have not been able to find much discussion of this topic beyond talk about the conversion of lists to R vectors. Any help would be greatly appreciated!
Rmagic's push uses the name that you give it both to look up the Python variable, and to name the R variable it creates. So it needs a valid name, not just any expression, on both sides.
There's a trick you can do to get the name from a Python variable:
d1=DataFrame(np.arange(16).reshape(4,4))
name = 'd1'
%Rpush {name}
# equivalent to %Rpush d1
But if you want to do more advanced things, it's best to get hold of the r object and use that to put your objects in. Rmagic is just a convenience wrapper over rpy2, which is a full API. So you can do:
from rpy2.robjects import r
r.assign('a', 1)
You can mix and match which interface you use - rmagic and rpy2 are talking to the same instance of R.

simple rank formula

I'm looking for a mathmatical ranking formula.
Sample is
2008 2009 2010
A 5 6 4
B 6 7 5
C 7 8 2
I want to add a rank column for each period code field
rank
2008 2009 2010 2008 2009 2010
B 6 7 5 2 1 1
A 5 6 4 3 2 2
C 7 2 2 1 3 3
please do not reply with methods that loop thru the rows and columns, incrementing the rank value as it goes, that's easy. I'm looking for a formula much like finding the percent total (item / total). I know i've seen this before but an havning a tough time locating it.
Thanks in advance!
sort ((letters_col, number_col) descending by number_col)
As efficient as your sort alg.
Then number the rows, of course
Edit
I really got upset by your comment "please don't up vote this answer, sorting and loop is not what I'm asking for. i specifically stated this in my original question. " , and the negative votes, because, as you may have noted by the various answers received, it's basically correct.
However, I remained pondering where and how you may "have seen this before".
Well, I think I got the answer: You saw this in Excel.
Look at this:
This is the result after entering the formulas and sorting by column H.
It's exactly what you want ...
What are you using? If you're using Excel, you're looking for RANK(num, ref).
=RANK(B2,B$2:B$9)
I don't know of any programming language that has that built in, it would always require a loop of some form.
If you want the rank of a single element, you can do it in O(n) by looping through the elements, counting how many have value above the given element, and adding 1.
If you want the rank of all the elements, the best (and really only) way is to sort the elements. Anything else you do will be equivalent to sorting (there is no "formula")
Are you using T-SQL? T-SQL RANK() may pull what you want.

Resources