Change the length of ContextPre and ContextPost in Quanteda KWIC

Change the length of ContextPre and ContextPost in Quanteda KWIC - r

Is there a way to increase the number of words appearing before and after the keyword in Quanteda kwic function?
I've tried by changing the numeric value in:
options(width = 200)
but it didn't work.
#KenBenoit

options(width) affects the number of text columns displayed by the R interpreter. You want the window argument to kwic():
> kwic(data_corpus_inaugural, "war against")
contextPre keyword contextPost
[1857-Buchanan, 2933:2934] advantage of the fortune of [ war against ] a sister republic, we
[1901-McKinley, 2284:2285] . We are not waging [ war against ] the inhabitants of the Philippine
[1901-McKinley, 2299:2300] portion of them are making [ war against ] the United States. By
[1901-McKinley, 2413:2414] used when those who make [ war against ] us shall make it no
[1933-Roosevelt, 1851:1852] Executive power to wage a [ war against ] the emergency, as great
> kwic(data_corpus_inaugural, "war against", window=7)
contextPre keyword contextPost
[1857-Buchanan, 2933:2934] to take advantage of the fortune of [ war against ] a sister republic, we purchased these
[1901-McKinley, 2284:2285] be deceived. We are not waging [ war against ] the inhabitants of the Philippine Islands.
[1901-McKinley, 2299:2300] . A portion of them are making [ war against ] the United States. By far the
[1901-McKinley, 2413:2414] needed or used when those who make [ war against ] us shall make it no more.
[1933-Roosevelt, 1851:1852] - broad Executive power to wage a [ war against ] the emergency, as great as the

Related

jq flatten list of objects into one object

I want to go from
[
{"key_skjdghkbs": "deep house"},
{"key_kjsskjbgs": "deadmau5"},
{"key_jhw98w4hl": "progressive house"},
{"key_sjkh348vg": "swedish house mafia"},
{"key_js3485jwh": "dubstep"},
{"key_jsg587jhs": "escape"}
]
to
{
"key_skjdghkbs": "deep house",
"key_kjsskjbgs": "deadmau5"
"key_jhw98w4hl": "progressive house",
"key_sjkh348vg": "swedish house mafia",
"key_js3485jwh": "dubstep",
"key_jsg587jhs": "escape"
}
Each object in the original list has exactly one key but the keys are unique.
I could do something like jq .[] .genre if the keys were the same but they're not.

jq's add function does exactly this
jq 'add'

Try this (assuming your file is named so72297039.json):
jq '[.[] | to_entries] | flatten | from_entries' < so72297039.json
(Edit: OP edited question, so here's relevant answer)

Since duplicate keys are not possible (see other answer) you can use to format like this:
{
"artist": [
"deadmau5",
"swedish house mafia"
],
"genre": [
"deep house",
"progressive house",
"dubstep"
],
"song": [
"escape"
]
}
with a jq call like this:
jq '
map(to_entries)
| flatten
| group_by(.key)
| map({key: first.key, value: map(.value)})
| from_entries
' input.json
If the keys artist, genre, song are known in advance an easier to understand expression can be used.

R Parsing error when trying to import JSON to R

I've got a JSON file that looks like this
I am trying to import it into R using the jsonlite package.
#Load package for import
library(jsonlite)
df <- fromJSON("test.json")
But it throws an error
Error in parse_con(txt, bigint_as_char) : parse error: trailing
garbage
ome in at a later time." } { "id": "e5fa37f44557c62ee
(right here) ------^
I've tried looking at all solutions on stackoverflow, but haven't been able to figure this out.
Any inputs would be very helpful.

The JSON file you linked contains two JSON objects. Perhaps you want an array:
[
{
"id": "71bb8883780bb152e4bb4db976bedc62",
"metadata": {
"abc_bad_date": "true",
"abc_client": "Hydra Corp",
"abc_doc_id": 1,
"abc_file": "Hydra Corp 2016.txt",
"abc_interview_type": "Post Analysis",
"abc_interviewee_role": "Director Corporate Engineering; Greater Chicago Area; Global Procurement Director Facilities and MRO",
"abc_interviewer": "Piper Thomas",
"abc_services_provided": "Food",
"section": "on_expectations"
},
"text": "Gerrit: There were a number ...."
},
{
"id": "e5fa37f44557c62eef44baafb13128f0",
"metadata": {
"abc_bad_date": "true",
"abc_client": "Hydra Corp",
"abc_doc_id": 1,
"abc_file": "Hydra Corp 2016.txt",
"abc_interview_type": "Post Analysis",
"abc_interviewee_role": "Director Corporate Engineering; Greater Chicago Area; Global Procurement Director Facilities and MRO",
"abc_interviewer": "Piper Thomas",
"abc_services_provided": "Painting",
"section": "on_relationships"
},
"text": "Gerrit: I thought the ABC ..."
}
]

fish shell --- how to simulate or implement a hash table, associative array, or key-value store

I am migrating from ksh to fish. I am finding that I miss the ability to define an associative array, hash table, dictionary, or whatever you wish to call it. Some cases can be simulated as in
set dictionary$key $value
eval echo '$'dictionary$key
But this approach is heavily limited; for example, $key may contain only letters, numbers, and underscores.
I understand that the fish approach is to find an external command when one is available, but I am a little reluctant to store key-value information in the filesystem, even in /run/user/<uid>, because that limits me to "universal" scope.
How do fish programmers work around the lack of a key-value store? Is there some simple approach that I am just missing?
Here's an example of the sort of problem I would like to solve: I would like to modify the fish_prompt function so that certain directories print not using prompt_pwd but using special abbreviations. I could certainly do this with a switch command, but I would much rather have a universal dictionary so I can just look up a directory and see if it has an abbreviation. Then I could change the abbreviations using set instead of having to edit a function.

You can store the keys in one variable and values in the other, and then use something like
if set -l index (contains -i -- foo $keys) # `set` won't modify $status, so this succeeds if `contains` succeeds
echo $values[$index]
end
to retrieve the corresponding value.
Other possibilities include alternating between key and value in one variable, though iterating through this is a pain, especially when you try to do it only with builtins. Or you could use a separator character and store a key-value pair as one element, though this won't work for directories because variables cannot contain \0 (which is the only possible separator for paths).

Here is how I implemented the alternative solution mentioned by #faho
I'm using '__' as seperator.
function set_field --argument-names dict key value
set -g $dict'__'$key $value
end
function get_field --argument-names dict key
eval echo \$$dict'__'$key
end

If you wanted to use a single variable with paired key/values, it's possible but as #faho mentioned, it is more complicated. Here's how you could do it:
function dict_keys -d "Print keys from a key/value paired list"
for idx in (seq 1 2 (count $argv))
echo $argv[$idx]
end
end
function dict_values -d "Print values from a key/value paired list"
for idx in (seq 2 2 (count $argv))
echo $argv[$idx]
end
end
function dict_get -a key -d "Get the value associated with a key in a k/v paired list"
test (count $argv) -gt 2 || return 1
set -l keyseq (seq 2 2 (count $argv))
# we can't simply use `contains` because it won't distinguish keys from values
for idx in $keyseq
if test $key = $argv[$idx]
echo $argv[(math $idx + 1)]
return
end
end
return 1
end
Then you could use these functions like this:
$ set -l mydict \
yellow banana \
red cherry \
green grape \
blue berry
$ dict_keys $mydict
yellow
red
green
blue
$ dict_values $mydict
banana
cherry
grape
berry
$ dict_get blue $mydict
berry
$ dict_get purple $mydict || echo "not found"
not found

#faho's answer got me thinking about this and there are a few this I wanted to add.
At first I wrote a small set of fish functions (A sort of library, if you will) that dealt with serialization, you would call a dict function with a key name, an operation (get, set, add or del) and it would use global variables to keep track of keys and their values. Works fine for flat hashes/dicts/objects, but felt somewhat unsatisfactory.
Then I realized I could use something like jq to (de-)serialize JSON. That would also make it a lot easier to deal with nesting, plus that allows having different dicts which use the same name for a key without any issues. It also separates "dealing-with-environment-variables" and "dealing-with-dicts(/hashes/etc)", which seems like a good idea. I'll focus on jq here, but the same applies to yq or pretty much anything, the core point is: Serialize data before storing, de-serialize when reading, and use some tool to work with such data.
I then proceeded to rewrite my functions using jq. however I soon realized it was easier to just use jq without any functions. To summarize the workfolow, let's consider OP's scenario and imagine we want to use abbreviations for User folders, or even better, we wanna use icons for such folders. To do that, let's assume we use Nerdfonts and have their icons availabe. A quick search for folders on Nerdfont's cheat sheet show we only have folder icons for the home folder (f74b), downloads(f74c) and images(f74e), so I'll use Material Design Icon's "File document box" (f719) for documents, and Material Design Icon's "Video" (fa66) for Videos.
So our Codepoints are:
User folder: \uf74b
Downloads \uf74c
Images: \uf74e
Documents: \uf719
Videos: \ufa66
So our JSON is:
{"~":"\uf74b","downloads":"\uf74c","images":"\uf74e","documents":"\uf719","videos":"\ufa66"}
I kept it in a single line for a reason which will become obvious now. Let's visualize this using jq:
echo '{"~":"\uf74b","downloads":"\uf74c","images":"\uf74e","documents":"\uf719","videos":"\ufa66"}' | jq 
For completeness sake, here's how it looks with Nerdfonts installed:
Now let's store this as a variable:
set -g FOLDER_ICONS (echo '{"~":"\uf74b","downloads":"\uf74c","images":"\uf74e","documents":"\uf719","videos":"\ufa66"}' | jq -c)
jq -c interprets JSON and outputs JSON in a compact structure, i.e., a single line. Ideal for storing variables.
If you need to edit something you can use jq, lat's say you want to change the abbreviation for documents to "doc" instead of an icon. Just do:
set -g FOLDER_ICONS (echo $FOLDER_ICONS | jq -c '.["documents"]="doc"')
The echo part is for reading a variable, and the set -g is for updating the variable. So those can be ignored if you're not working with variables.
As for retrieving values, jq also does that, obviously. Let's say you want to get the abbreviation for the documents folder, you can simply do:
echo $FOLDER_ICONS | jq -r '.["documents"]'
It will return doc. If you leave out the -r it will return "doc", with quotes, since strings are quoted in JSON.
You can also remove keys pretty easily, i.e.:
set -g FOLDER_ICONS (echo $FOLDER_ICONS | jq -c 'del(."documents")')
will set the variable FOLDER_ICONS to the result of reading it and passing its contents to jq -c 'del(."documents")', which tels jq to delete the key "documents" and output a compact representation of the JSON, i.e. a single line.
Everything I tried worked perfectly fine with nested JSON objects, so it seems like a pretty good solution. It's just a matter of keeping the operations in mind:
reading .["key"]
writing .["key"]="value"
deleting del(."key")
jq also has many other nice features, I wanted to showcase a bit of them so I tried looking for stuff that might be nice to include here. One of the things I use jq for is dealing with wayland stuff, especially swaymsg -t get_tree, which I've just ran and, with a mere 4 workspaces with a single window in each, outputs a 706-line JSON from hell (Was 929 when I wrote this, 6 windows across 5 workspaces, later I closed 2 windows I was done with so I came back here and re-ran the command to share the lowest possible value).
To give a more complex example of how jq might be used, here's parsing the swaymsg -t get_tree:
swaymsg -t get_tree | jq -C '{"id": .id, "type": .type, "name": .name, "nodes": (.nodes | map(.nodes) | flatten | map({"id": .id, "type": .type, "name": .name, "nodes": (.nodes | map(.nodes) | flatten | map({"id": .id, "type": .type, "name": .name}))}))}'
This will give you a tree with only id, type, name and nodes, where nodes is an array of objects, each consisting of the id, type, name and nodes of the children, with the children nodes also being an array of objects, now consisting of only id, type and name. In my case, it returned:
{
"id": 1,
"type": "root",
"name": "root",
"nodes": [
{
"id": 2147483646,
"type": "workspace",
"name": "__i3_scratch",
"nodes": []
},
{
"id": 184,
"type": "workspace",
"name": "1",
"nodes": []
},
{
"id": 145,
"type": "workspace",
"name": "2",
"nodes": []
},
{
"id": 172,
"type": "workspace",
"name": "3",
"nodes": [
{
"id": 173,
"type": "con",
"name": "Untitled-4 - Code - OSS"
}
]
},
{
"id": 5,
"type": "workspace",
"name": "4",
"nodes": []
}
]
}
You can also easily make a flattened version of that with jq by slightly changing the command:
swaymsg -t get_tree | jq -C '[{"id": .id, "type": .type, "name": .name}, (.nodes | map(.nodes) | flatten | map([{"id": .id, "type": .type, "name": .name}, (.nodes | map(.nodes) | flatten | map({"id": .id, "type": .type, "name": .name}))]))] | flatten'
Now instead of having a key nodes, the child nodes are also in the parent's array, flattened, in my case:
[
{
"id": 1,
"type": "root",
"name": "root"
},
{
"id": 2147483646,
"type": "workspace",
"name": "__i3_scratch"
},
{
"id": 184,
"type": "workspace",
"name": "1"
},
{
"id": 145,
"type": "workspace",
"name": "2"
},
{
"id": 172,
"type": "workspace",
"name": "3"
},
{
"id": 173,
"type": "con",
"name": "Untitled-4 - Code - OSS"
},
{
"id": 5,
"type": "workspace",
"name": "4"
}
]
It's pretty nifty, not limited to environment variables, and solves pretty much every problem I can think of. The only con is verbosity, so it may be a good idea to write a few fish functions for dealing with that, but that's beyond the scope here, as I'm focusing on a general approach to (de-)serialization of key-value mappings (i.e., dicts, hashes, objects etc), which can be (also) used with environment variables. For reference, a good starting point if dealing with variables might be:
function dict
switch $argv[2]
case write
read data
set -xg $argv[1] "$data"
case read, '*'
echo $$argv[1]
end
end
This simply deals with reading and writing to a variable, the only reason it's worth sharing is, first, that it allows piping something to a variable, and second, that it sets a starting point to make something more complex, i.e. automatically piping the echoed value to jq, or adding an add operation or whatever.
There's also the option of writing a script to deal with that, instead of using jq. Ruby's Marshal and to_yaml seems like interesting options, since I like ruby, but each person has their own preferences. For Python, pickle, pyyaml and json seem worth mentioning.
It's worth mentioning I'm not affiliated to jq in any way, never contributed nor even posted anything on issues or whatever, I just use it, and as someone who used to write scripts whenever I had to deal with JSON or YAML, it was quite surprising when I realized how powerful it was.

I finally needed this for an application, and I'm not super comfortable with fish builtins, so here is an implementation in Lua: https://gist.github.com/nrnrnr/b302db5c59c600dd75c38d460423cc3d. This code uses the alternating key/value representation:
key1 value1 key2 value2 ...

NetLogo: Model gets stuck w no error message

I try to make a bunch of turtles (Movers) to go trough a gate and avoid the wall which is white. Somehow the model freezes after a few runs. Go button stays black and blue circle turns for ever. No error MSG given. It must get stuck in some calculation within the "move-movers" function but I can't determine why.
I added a simplified version of my code which still produces the crash. Copy & paste to run. Disable world wrap. Include a slider for "num-movers" Variable.
breed [ movers mover ]
movers-own [ steps ] ; Steps will be used to determine if an agent has moved.
to setup
clear-all
reset-ticks
ask patches [ set pcolor green ]
basic-pattern
end
to basic-pattern ; sets up gate and wall
let wallXCor 16 ; sets a white line to determine the inside & outside of the gate
repeat 33 [
ask patch wallXCor 0 [ set pcolor white ]
set wallXCor wallXCor - 1
]
ask patches with [ pycor > 0 ] [ set pcolor lime ] ; sets the outside of the gate to another color (lime)
; changes color of the center to lime to create a passable opening
ask patch 0 0 [ set pcolor lime ]
ask patch 1 0 [ set pcolor lime ]
ask patch -1 0 [ set pcolor lime ]
end
to distribute-agents ; Distributes the Movers outside the gate based on the patch color lime. The number needs to be set via slider "num-movers"
repeat num-movers [
ask one-of patches with [ pcolor = lime and pycor > 2 and any? turtles-here = false ] [
sprout-movers 1 [ set color red set shape "circle" facexy 0 -12 ] set num-movers num-movers- 1 ]
] end
to go
move-movers
tick
end
to move-movers ; reset the steps variable and facing
ask movers [ set steps steps + 1 ]
ask movers [ facexy 0 -3 ]
; following lines checks if next patch to be steped upon is "legal".
while [ any? movers with [ steps > 0 ] ] [
ask movers with [ steps > 0 ] [
ifelse is-patch? patch-ahead 1
and not any? turtles-on patch-ahead 1
and [ not member? pcolor [ white brown ] ] of patch-ahead 1
[
fd 1
set steps steps - 1
] [ dirchange ]
]
]
end
to dirchange ;If not able to move to next patch change direction to allow a step backwards.
if ( pxcor <= 0 and ycor >= 0 ) [ facexy 1 3 ] ;fd 1 set steps steps - 1]
if ( pxcor >= 0 and ycor >= 0 ) [ facexy -1 3 ] ;fd 1 set steps steps - 1]
end

You're not getting an error message, because there is no actual error. The code just gets stuck in your while loop.
Did you mean to comment out the fd 1 set steps steps - 1 in your dirchange? My guess is that you have a bunch of turtles that face the same patch (either 1,3 or -1, 3) and get stuck because none of them can move because another turtle is in front of them. And because you only subtract from their steps if they actually move, some of them never get to 0 steps.
While is in general a bad primitive to use for this reason, especially when you have this many conditionals in your move code, making it hard to know what is causing your while loop to not end. is it because your turtles are facing a wall, or because they are at the boundary of the world, or because someone else is blocking their path? You just don't know, and because the code is stuck in a loop, your model view doesn't update so you can't see what is going on.
If you insist on keeping the while, I would at least put in a safeguard: write a turtle reporter that checks if your turtles are able to move and break your while if they can't, or give them a finite number of attempts at moving, rather requiring them to have actually moved.

How to translate a first-order logic sentence into a restriction in Protègè with string matching?

I'm tring to build an ontology to infer some informations about a domain classification and a terminology, but I'm experiencing some conceptual difficulties.
Let me explain the problem. In Protègè 4.1 i created 6 subclasses of Thing: Concept, conceptTitle, ConceptSynonym (for the classification) and Term, TermTitle, TermSynonym (for the terminology). I also have created hasConceptTitle, hasConceptSynonym, hasTermTitle and hasTermSynonym object relationships (with some constrint) to say that every Concept has one (and only one) title, and may have some synonyms, and every Term has one (and only one) title and some synonyms. Both Concept and Term have another relationship isA, giving to the classification a DAG/tree structure, while the terminology has a lattice structure (in other words, a term may be a subclass of more than one term).
Here comes the problem: I would like to create a subclass of Concept, let's say "MappedConcept"), which should be the set of mapped concepts, that is the set of concepts which have the title equals to a term's title, or it has a synonym equals to a term's title or has a synonym that is equal to a synonym of a term.
In the first-order logic, this set may be expressed as:
∀x∃y( ∃z((hasConceptTitle(x,z) ∧ hasTermTitle(y,z)) ∨
∃z((hasConceptTitle(x,z) ∧ hasTermSynonym(y,z)) ∨
∃z((hasConceptSynonym(x,z) ∧ hasTermTitle(y,z)) ∨
∃z((hasConceptSynonym(x,z) ∧ hasTermSynonym(y,z)) )
How can I obtain this? Defining data properties for "ConceptTitle", "ConceptSynonym", "TermTitle" and "TermSynonym"? And how to describe the string matches?
Maybe those 4 classes should be just data properties of Concept and Term classes?
I read the practical guide of Matthew Horridge several times, but I can't the practical ideas I have on my mind into an ongology in Protègè.
Thanks in advance.

I'm afraid you cannot do this in OWL 2 DL nor in Protégé, which is an editor for OWL 2 DL, because, as far as I can tell, it seems necessary to introduce the inverse of a datatype property, which is forbidden in OWL 2 DL. However, it's possible in OWL Full, and some DL reasoners may even be able to deal with it. Here, in Turtle:
<MappedConcept> a owl:Class;
owl:equivalentTo [
a owl:Class;
owl:unionOf (
[
a owl:Restriction;
owl:onProperty <hasConceptTitle>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermTitle> ];
owl:someValuesFrom <Term>
]
] [
a owl:Restriction;
owl:onProperty <hasConceptTitle>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermSynonym> ];
owl:someValuesFrom <Term>
]
] [
a owl:Restriction;
owl:onProperty <hasConceptSynonym>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermSynonym> ];
owl:someValuesFrom <Term>
]
] [
a owl:Restriction;
owl:onProperty <hasConceptSynonym>;
owl:someValuesFrom [
a owl:Restriction;
owl:onProperty [ owl:inverseOf <hasTermTitle> ];
owl:someValuesFrom <Term>
]
]
)
] .
You can also do it without OWL, with a rule language for instance. The rules would look closer to how you would do it in programming languages. In SWRL:
hasConceptTitle(?x,?z), hasTermTitle(?y,?z) -> MappedConcept(?x)
hasConceptTitle(?x,?z), hasTermSynonym(?y,?z) -> MappedConcept(?x)
hasConceptSynonym(?x,?z), hasTermTitle(?y,?z) -> MappedConcept(?x)
hasConceptSynonym(?x,?z), hasTermSynonym(?y,?z) -> MappedConcept(?x)

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Change the length of ContextPre and ContextPost in Quanteda KWIC - r

Is there a way to increase the number of words appearing before and after the keyword in Quanteda kwic function? I've tried by changing the numeric value in: options(width = 200) but it didn't work. #KenBenoit

Related

jq flatten list of objects into one object

R Parsing error when trying to import JSON to R

fish shell --- how to simulate or implement a hash table, associative array, or key-value store

NetLogo: Model gets stuck w no error message

How to translate a first-order logic sentence into a restriction in Protègè with string matching?

Categories

Resources