<?xml version="1.0" encoding="utf-8"?>
<!-- generator="FeedCreator 1.7.2-ppt DokuWiki" -->
<?xml-stylesheet href="http://www.berck.se/dokuwiki/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://www.berck.se/dokuwiki/feed.php">
        <title>Peter uvt</title>
        <description></description>
        <link>http://www.berck.se/dokuwiki/</link>
        <image rdf:resource="http://www.berck.se/dokuwiki/lib/images/favicon.ico" />
        <dc:date>2010-09-04T20:22:57+02:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:arpa_format?rev=1208337579&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:clin20?rev=1265106482&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:cvs?rev=1203694299&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:experiments?rev=1201424152&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:gezond_verstand?rev=1214465863&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:giga_word?rev=1244034043&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:giza?rev=1206622243&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:levenshtein?rev=1230553689&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:lm?rev=1204897926&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:machines?rev=1212572372&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:mbmt_demo?rev=1232095297&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:mbmt_howto?rev=1234089343&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:mbmt_sv-en?rev=1235381544&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:moses?rev=1205744881&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:omcs_sander?rev=1214465574&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:papers?rev=1223044878&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:perl?rev=1221038277&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:pharaoh?rev=1218188576&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:srilm?rev=1224227683&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:srilm_wopr?rev=1256028344&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:tadpole?rev=1208944490&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:timbl-arpa?rev=1203694800&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:tmp_srilm_output?rev=1223629712&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:todo?rev=1253786840&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr?rev=1273664549&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_commands?rev=1264328604&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_correct?rev=1263294616&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_etc?rev=1264193416&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_focus?rev=1276611711&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_generate?rev=1245767879&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_howto?rev=1273569796&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_lcontext?rev=1256896688&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_multi?rev=1231488581&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_multi_classifiers?rev=1276599832&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_multi_dist?rev=1261566242&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_ngrams?rev=1281017462&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_recipes?rev=1281712096&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_scripts?rev=1256826614&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_server?rev=1273664524&amp;do=diff"/>
                <rdf:li rdf:resource="http://www.berck.se/dokuwiki/uvt:wopr_three?rev=1214376099&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="http://www.berck.se/dokuwiki/lib/images/favicon.ico">
        <title>Peter</title>
        <link>http://www.berck.se/dokuwiki/</link>
        <url>http://www.berck.se/dokuwiki/lib/images/favicon.ico</url>
    </image>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:arpa_format?rev=1208337579&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-04-16T11:19:39+02:00</dc:date>
        <title>uvt:arpa_format</title>
        <link>http://www.berck.se/dokuwiki/uvt:arpa_format?rev=1208337579&amp;do=diff</link>
        <description>ARPA format


Attempt to output the arpa format (ngram-format) for Louis ten Bosch in Nijmegen.

Python version: prog/UvT/wopr/test/tree2graph

/Users/pberck/prog/UvT/Timbl6/src/Timbl -f num -I num.a0 -a0 +D

/Users/pberck/prog/UvT/Timbl6/src/Timbl -f num -I num.a1 -a1 +D</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:clin20?rev=1265106482&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-02-02T11:28:02+02:00</dc:date>
        <title>uvt:clin20</title>
        <link>http://www.berck.se/dokuwiki/uvt:clin20?rev=1265106482&amp;do=diff</link>
        <description>On charybdis.

We used the following data sets:

             Lines        Words
nyt.1e5      100,000      2,269,935
nyt.1e6      1,000,000    22,855,429
nyt.1e7      10,000,000   228,077,503
nyt.2e7      20,000,000   455,567,567
nyt.3e7      30,000,000   683,955,037

And test sets:</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:cvs?rev=1203694299&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-02-22T16:31:39+02:00</dc:date>
        <title>uvt:cvs</title>
        <link>http://www.berck.se/dokuwiki/uvt:cvs?rev=1203694299&amp;do=diff</link>
        <description>export CVS_RSH=&quot;ssh&quot;
export CVS_ROOT=&quot;:ext:pberck@ilkcvs.uvt.nl:/corpus/cvsroot/&quot;
cvs -d $CVS_ROOT co wopr</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:experiments?rev=1201424152&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-01-27T09:55:52+02:00</dc:date>
        <title>uvt:experiments</title>
        <link>http://www.berck.se/dokuwiki/uvt:experiments?rev=1201424152&amp;do=diff</link>
        <description>Experiment 1


pberck@chaos:/exp/pberck/a3s_data$ 

Read (convert) the “a3” type dataset.

~/wopr/wopr -r read_a3 -p filename:giza++-nl-to-en.A3.final.train

Make data.

~/wopr/wopr -r window_s -p filename:giza++-nl-to-en.A3.final.train.xa3,ws:7

Start server.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:gezond_verstand?rev=1214465863&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-06-26T09:37:43+02:00</dc:date>
        <title>uvt:gezond_verstand</title>
        <link>http://www.berck.se/dokuwiki/uvt:gezond_verstand?rev=1214465863&amp;do=diff</link>
        <description>Putting things on ilk.uvt.nl (“zeus”):


pberck@zeus:/var/www/gezondverstand$

&lt;del&gt;mysql --user=pberck --password=ma8 pberck&lt;/del&gt;

That wasn't it, but THIS:

pberck@zeus:~$ mysql -u gezondverstand -p &lt; gv.sql

pberck@zeus:/var/www/gezondverstand$ tar -xf ~/gv_20080626a.tar</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:giga_word?rev=1244034043&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-06-03T15:00:43+02:00</dc:date>
        <title>uvt:giga_word</title>
        <link>http://www.berck.se/dokuwiki/uvt:giga_word?rev=1244034043&amp;do=diff</link>
        <description>2009/05/26

	*  Copied all nyt_eng files from DVD2 to scylla:/exp2/pberck/gigaword_eng_v2_2/nyt_eng/.
	*  Wrote a small perl script to take the TEXT from type=“story”.
	*  Concatenated all in one big file.


pberck@scylla:/exp2/pberck/gigaword_eng_v2_2/nyt_eng$ perl gigaword_eng.pl &gt;nyt_eng_199407_200412_story.txt

pberck@scylla:/exp2/pberck/gigaword_eng_v2_2/nyt_eng$ wc nyt_eng_199407_200412_story.txt
  98948717  925225222 5562192492 nyt_eng_199407_200412_story.txt</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:giza?rev=1206622243&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-03-27T13:50:43+02:00</dc:date>
        <title>uvt:giza</title>
        <link>http://www.berck.se/dokuwiki/uvt:giza?rev=1206622243&amp;do=diff</link>
        <description>Giza++ tutorial: &lt;http://bulba.sdsu.edu/deptranswiki/GIZA%2B%2B_Tutorial&gt;</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:levenshtein?rev=1230553689&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-12-29T13:28:09+02:00</dc:date>
        <title>uvt:levenshtein</title>
        <link>http://www.berck.se/dokuwiki/uvt:levenshtein?rev=1230553689&amp;do=diff</link>
        <description>Mon 2008/12/29


WOPR spelling corrector. Implemented Levenshtein distance in the wopr source code (levenshtein.cc/.h).</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:lm?rev=1204897926&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-03-07T14:52:06+02:00</dc:date>
        <title>uvt:lm</title>
        <link>http://www.berck.se/dokuwiki/uvt:lm?rev=1204897926&amp;do=diff</link>
        <description>Language Models


Predict next word.

Allow discontinuous sequences? If we have w1 w2, then w3 follows after (4,...,8) other words, such as verbs at the end. w3 can be a category as well, or a combination of word and categories and other meta info (“abstract model”).</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:machines?rev=1212572372&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-06-04T11:39:32+02:00</dc:date>
        <title>uvt:machines</title>
        <link>http://www.berck.se/dokuwiki/uvt:machines?rev=1212572372&amp;do=diff</link>
        <description>ilk.uvt.nl (zeus (was: theia))

	*  Webserver
	*  Conceptnet PG DB (API also here)

ls0168.uvt.nl (chaos)

	*  Wopr server on port 1981

ls0138.uvt.nl (erebus)

	*  Django
	*  lm in /exp/pberck/lm

ls0171.uvt.nl (phoenix)


Instructions: &lt;http://www.statmt.org/moses/?n=Development.GetStarted&gt;</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:mbmt_demo?rev=1232095297&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-01-16T09:41:37+02:00</dc:date>
        <title>uvt:mbmt_demo</title>
        <link>http://www.berck.se/dokuwiki/uvt:mbmt_demo?rev=1232095297&amp;do=diff</link>
        <description>Fri 2009/01/16


durian:~ pberck$ mkdir mbmt_demo
cd mbmt_demo
cp ~/prog/trunk/sources/Mbmt/mbmt-0.1.tar.gz .
tar -zxf mbmt-0.1.tar.gz
cd mbmt-0.1
./configure
make</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:mbmt_howto?rev=1234089343&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-02-08T11:35:43+02:00</dc:date>
        <title>uvt:mbmt_howto</title>
        <link>http://www.berck.se/dokuwiki/uvt:mbmt_howto?rev=1234089343&amp;do=diff</link>
        <description>MBMT howto


This HOWTO will explain how to train an MBMT system from scratch.

1. Requirements


Two data sets and a test set:


	*  A GIZA++ aligned file (“A3” format) to train the translation system.
	*  A corpus in the target language for the language model (plain text, one sentence per line).
	*  Test set in the source language (one sentence per line).</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:mbmt_sv-en?rev=1235381544&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-02-23T10:32:24+02:00</dc:date>
        <title>uvt:mbmt_sv-en</title>
        <link>http://www.berck.se/dokuwiki/uvt:mbmt_sv-en?rev=1235381544&amp;do=diff</link>
        <description>2009/02


On phoenix: pberck@phoenix:/exp/pberck/europarl/</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:moses?rev=1205744881&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-03-17T10:08:01+02:00</dc:date>
        <title>uvt:moses</title>
        <link>http://www.berck.se/dokuwiki/uvt:moses?rev=1205744881&amp;do=diff</link>
        <description>Installing moses on Ubuntu

	*  bash regenerate-script (otherwise wrong bash)
	*  hardcoded aclocal/automake in regenerate-script.
	*  sudo apt-get install zlib1g-dev


(see also &lt;http://statmt.org/wmt07/baseline.html&gt;)

Scripts


pberck@ls0138:/exp/pberck/lm/moses/scripts$ make release</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:omcs_sander?rev=1214465574&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-06-26T09:32:54+02:00</dc:date>
        <title>uvt:omcs_sander</title>
        <link>http://www.berck.se/dokuwiki/uvt:omcs_sander?rev=1214465574&amp;do=diff</link>
        <description>On ilk.uvt.nl:

psql -U omcs -d omcs -W -h localhost

psql -U omcs -d omcs -h localhost -W &lt; conceptnet-2007-09-25.psql</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:papers?rev=1223044878&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-10-03T16:41:18+02:00</dc:date>
        <title>uvt:papers</title>
        <link>http://www.berck.se/dokuwiki/uvt:papers?rev=1223044878&amp;do=diff</link>
        <description>*  [turian-summit03eval.pdf] &lt;http://nlp.cs.nyu.edu/eval/&gt; Evaluation of Machine Translation and its Evaluation, Joseph P. Turian, Luke Shen, and I. Dan Melamed (2003). ROUGE for MT evaluation. 

	*  [ALTA200519.pdf] RTT what is it good for, Somers, Harold. Round Trip Translation.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:perl?rev=1221038277&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-09-10T11:17:57+02:00</dc:date>
        <title>uvt:perl</title>
        <link>http://www.berck.se/dokuwiki/uvt:perl?rev=1221038277&amp;do=diff</link>
        <description>Wed 2008/09/10


Local perl on ls0138 (as per &lt;http://servers.digitaldaze.com/extensions/perl/modules.html&gt;).



Your choice:  [INSTALLDIRS=site] PREFIX=~/perl

install Math::CDF


Added to paths: export PERL5LIB=/home/pberck/perl:/home/pberck/perl/lib/perl/5.8.8:$PERL5LIB</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:pharaoh?rev=1218188576&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-08-08T11:42:56+02:00</dc:date>
        <title>uvt:pharaoh</title>
        <link>http://www.berck.se/dokuwiki/uvt:pharaoh?rev=1218188576&amp;do=diff</link>
        <description>Thu 2008/08/07


Pharaoh on lw0164. Copied (with sftp because of scp version differences) from ls0171.


pberck@pi2628:~/pharaoh-v1.2.3$ ldd pharaoh
        libstdc++.so.5 =&gt; /usr/lib/libstdc++.so.5 (0x40027000)
        libm.so.6 =&gt; /lib/libm.so.6 (0x400d9000)
        libgcc_s.so.1 =&gt; /usr/lib/libgcc_s.so.1 (0x400fc000)
        libc.so.6 =&gt; /lib/libc.so.6 (0x40104000)
        /lib/ld-linux.so.2 =&gt; /lib/ld-linux.so.2 (0x40000000)
pberck@pi2628:~/pharaoh-v1.2.3$ pharaoh
Pharaoh v1.2.3, written by …</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:srilm?rev=1224227683&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-10-17T09:14:43+02:00</dc:date>
        <title>uvt:srilm</title>
        <link>http://www.berck.se/dokuwiki/uvt:srilm?rev=1224227683&amp;do=diff</link>
        <description>Installing srilm on Ubuntu

	*  machinetype i686, changed cxx/gcc variables, removed --pentium optimisation.
	*  Installed tcl8.4-dev before.

On ls0138


Debian has no libtcl? Changed Makefile.i686 and set NO_TCL=X, removed the other TCL settings.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:srilm_wopr?rev=1256028344&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-10-20T10:45:44+02:00</dc:date>
        <title>uvt:srilm_wopr</title>
        <link>http://www.berck.se/dokuwiki/uvt:srilm_wopr?rev=1256028344&amp;do=diff</link>
        <description>Thu 2008/09/11


In MBP:/Users/pberck/work/srilm_wopr.

Different n-gram ibases; with a smaller context (or when there are unknown words), take the smaller ibase (backoff)?

How to 'perplexify' a sentence :) ? Window it, predict per instance, compare.
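
The windowing idea in the note above can be sketched in Python. This is a minimal illustration, not Wopr's implementation: the padding symbol "_" and the prob callback (standing in for a classifier that returns the probability of a target word given its left context) are assumptions made for this example.

```python
import math


def window(tokens, ws=3, pad="_"):
    """Turn a token list into (context, target) instances with ws words of left context."""
    # The pad symbol is a placeholder for positions before the sentence start.
    padded = [pad] * ws + list(tokens)
    return [(tuple(padded[i:i + ws]), padded[i + ws]) for i in range(len(tokens))]


def perplexity(tokens, prob, ws=3):
    """Perplexity of one sentence: 2 ** (-average log2 probability of each target)."""
    instances = window(tokens, ws)
    if not instances:
        raise ValueError("cannot compute perplexity of an empty sentence")
    # prob(context, target) is a hypothetical callback; it must return a non-zero probability.
    log_sum = sum(math.log2(prob(context, target)) for context, target in instances)
    return 2 ** (-log_sum / len(instances))
```

With a model that assigns every target a probability of 0.25, the perplexity of any sentence comes out as 4, which is a quick sanity check for the formula.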


wopr


wopr -r lexicon -p filename:reuters.martin.tok.1000
wopr -r window_s -p filename:reuters.martin.tok.1000,ws:3,pre_s:&quot;&lt;s&gt;&quot;,suf_s:&quot;&lt;/s&gt;&quot;

(and ws:2 and ws:1)

wopr -r make_ibase -p filename:reuters.martin.tok.1000.ws3,timbl:&quot;-a1 +vdb +D +G&quot;
wop…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:tadpole?rev=1208944490&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-04-23T11:54:50+02:00</dc:date>
        <title>uvt:tadpole</title>
        <link>http://www.berck.se/dokuwiki/uvt:tadpole?rev=1208944490&amp;do=diff</link>
        <description>03/02/2008


CVS update, and compiled.


peter-bercks-macbook-pro:Tadpole pberck$ ./src/Tadpole test.txt
./src/Tadpole v.0.1-beta

Initiating lemmatizer...
Initiating morphological analyzer...
Initiating tagger...
  Reading the lexicon from: /Users/pberck/install/etc/tadpole/cgn-wotan-dcoi.mbt.lex.ambi.05...ready, (200316 words).
  Reading frequent words list from: /Users/pberck/install/etc/tadpole/cgn-wotan-dcoi.mbt.top500...ready, (500 words).
  Reading case-base for known words from: /Users/p…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:timbl-arpa?rev=1203694800&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-02-22T16:40:00+02:00</dc:date>
        <title>uvt:timbl-arpa</title>
        <link>http://www.berck.se/dokuwiki/uvt:timbl-arpa?rev=1203694800&amp;do=diff</link>
        <description>22-Feb-2008


pberck@ls0138:~/timbl-arpa$ 

SRILM LM:

/exp/pberck/lm/srilm/bin/i686/ngram-count -text zin01.txt -lm foo

N-grams for wopr:

../prog/wopr/wopr -r ngram -p filename:zin01.txt

Timbl tree (wgt contains 1 1, 2 2):

Timbl -f zin01.txt.ng3 -I zin01.txt.ng3.a1 -a1 +D -w wgt</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:tmp_srilm_output?rev=1223629712&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-10-10T11:08:32+02:00</dc:date>
        <title>uvt:tmp_srilm_output</title>
        <link>http://www.berck.se/dokuwiki/uvt:tmp_srilm_output?rev=1223629712&amp;do=diff</link>
        <description>lw0196:srilm_wopr pberck$ cat corpus1.p 
&lt;s&gt; a b c d e &lt;/s&gt;
&lt;s&gt; a b a f d &lt;/s&gt;



lw0196:srilm_wopr pberck$ /Users/pberck/wrk/mbmt/srilm/bin/macosx/ngram-count -text corpus1.p
&lt;s&gt;	2
&lt;s&gt; a	2
&lt;s&gt; a b	2
a	3
a b	2
a b c	1
a b a	1
a f	1
a f d	1
b	2
b c	1
b c d	1
b a	1
b a f	1
c	1
c d	1
c d e	1
d	2
d e	1
d e &lt;/s&gt;	1
d &lt;/s&gt;	1
e	1
e &lt;/s&gt;	1
&lt;/s&gt;	2
f	1
f d	1
f d &lt;/s&gt;	1
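
The counts above can be reproduced with a few lines of Python. This is a sketch of plain n-gram counting up to order 3 (the highest order seen in the listing), not SRILM's code; the function names are made up for this example, and the order of the printed lines will differ from that of ngram-count.

```python
from collections import Counter


def count_ngrams(lines, max_order=3):
    """Count all n-grams of order 1..max_order in whitespace-tokenised lines."""
    counts = Counter()
    for line in lines:
        tokens = line.split()
        for n in range(1, max_order + 1):
            # Slide a window of n tokens over the sentence.
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def print_counts(counts):
    """Print one n-gram per line, separated from its count by a tab."""
    for ngram, count in counts.items():
        print(f"{ngram}\t{count}")
```

Feeding it the two lines of corpus1.p (sentence markers included) gives the counts listed above, for example 3 for a and 2 for a b.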



lw0196:srilm_wopr pberck$ cat corpus1.srilm 
\data\
ngram 1=8
ngram 2=10
ngram 3=9

\1-grams:
-0.7829502	&lt;/s&gt;
-99	&lt;s&gt;	-0.3569289
-0.6166…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:todo?rev=1253786840&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-09-24T12:07:20+02:00</dc:date>
        <title>uvt:todo</title>
        <link>http://www.berck.se/dokuwiki/uvt:todo?rev=1253786840&amp;do=diff</link>
        <description>*  Resend the mail with Wopr output results to Joost (CC: Antal).

	*  Wopr extension: shrinking of the tree for a selected number of words.

	*  Paragraph for the VICI progress report.

	*  Apply for pass.

	*  Dimbl</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr?rev=1273664549&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-05-12T13:42:29+02:00</dc:date>
        <title>uvt:wopr</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr?rev=1273664549&amp;do=diff</link>
        <description>WOPR


Memory Based Word Prediction toolbox. It grew out of the need for a (memory based) language model in MBMT. Over the last two years it has grown into a toolkit that can create data, train memory based models, calculate perplexity, and run scripts, &amp;c.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_commands?rev=1264328604&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-01-24T11:23:24+02:00</dc:date>
        <title>uvt:wopr_commands</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_commands?rev=1264328604&amp;do=diff</link>
        <description>Description


WOPR is used to create data and interface with Timbl. It can run with a number of commands and parameters from the command line, or read a script.

Basic syntax: ./wopr -r COMMAND1,... -p PARAMETER1:VALUE1,...

Script syntax: ./wopr -s SCRIPT -p PARAMETER1:VALUE1,...</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_correct?rev=1263294616&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-01-12T12:10:16+02:00</dc:date>
        <title>uvt:wopr_correct</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_correct?rev=1263294616&amp;do=diff</link>
        <description>wopr -l -r window_lr -p filename:correct.0,lc:3,rc:0
wopr -l -r window_lr -p filename:correct.1,lc:3,rc:0
wopr -l -r make_ibase -p filename:correct.0.l3r0,timbl:&quot;-a1 +D&quot;
wopr -l -r lexicon -p filename:correct.0

wopr -l -r correct -p filename:correct.1.l3r0,ibasefile:correct.0.l3r0_-a1+D.ibase,
                      timbl:'-a1 +D',mwl:1,lexicon:correct.0.lex,max_ent:100,mld:3,max_distr:100</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_etc?rev=1264193416&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-01-22T21:50:16+02:00</dc:date>
        <title>uvt:wopr_etc</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_etc?rev=1264193416&amp;do=diff</link>
        <description>The etc/ directory contains a number of useful perl scripts.

examine_px.pl


Takes a .px file and prints info for each classification. It shows the target, the classification, a user-variable and the log2prob of the classification. At the end, a summary is printed (lp, wlp, dp, lp, sz, md).</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_focus?rev=1276611711&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-06-15T16:21:51+02:00</dc:date>
        <title>uvt:wopr_focus</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_focus?rev=1276611711&amp;do=diff</link>
        <description>Focus


Let wopr focus on a number of targets from the full training set, for example to do confusibles, &amp;c. It also trains the created instance bases and creates a kvs file for the multi classifier.


	*  Parameters:   
		*  focus: a file with words which can be in the target. One word or more words per line, but only the first word is taken. This allows Wopr to use lexicon files and lists generated with rfl files as input.
		*  fco: target offset, default 0. Set to 1 to focus on the word at …</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_generate?rev=1245767879&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-06-23T16:37:59+02:00</dc:date>
        <title>uvt:wopr_generate</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_generate?rev=1245767879&amp;do=diff</link>
        <description>Introduction


This page describes how the memory based language modeller wopr can be used to generate language. 

The web demos can be found here:

Generator 1 (Dutch)

Generator 2 (Swedish)

Generator 3 (English)

Startup like:



pberck@chaos:/exp/pberck/wopr-1.7.1$ ./wopr -r generate_server -p ibasefile:clean.sv.10000.ws3.ibase,port:1983,timbl:&quot;-a1 +D&quot;,end:&quot;.?\!&quot; &gt; /dev/null

pberck@chaos:/exp/pberck/wopr-1.7.1$ ./wopr -r generate_server -p ibasefile:austen.txt.ws3.ibase,timbl:&quot;-a4 +D&quot;,
ws…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_howto?rev=1273569796&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-05-11T11:23:16+02:00</dc:date>
        <title>uvt:wopr_howto</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_howto?rev=1273569796&amp;do=diff</link>
        <description>Wopr HOWTO


This HOWTO explains how to compile Wopr and train a language model.

Installation


Timbl is required to run Wopr.

Assuming Timbl is configured and installed locally with --prefix=/home/pberck/local, we also install Wopr locally:


sh bootstrap
./configure --prefix=/home/pberck/local --with-timbl=/home/pberck/local
make
make install</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_lcontext?rev=1256896688&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-10-30T10:58:08+02:00</dc:date>
        <title>uvt:wopr_lcontext</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_lcontext?rev=1256896688&amp;do=diff</link>
        <description>Long Context


a.k.a global context.

This page explains the global context features, called elastigrams, in Wopr.

Elastigrams


We mark words from a list with words that are deemed important. When
going through a text to make instances, we put the words from the list
we encounter in the text in the global context. The global context has a certain size, so that older words are pushed out by newer words. We also put a timer on the words, so that they will disappear even if they are not pushed o…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_multi?rev=1231488581&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-01-09T09:09:41+02:00</dc:date>
        <title>uvt:wopr_multi</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_multi?rev=1231488581&amp;do=diff</link>
        <description>pberck@chaos:/exp/pberck/multi_wopr$ ~/wopr/wopr -l -r multi -p filename:mbmt_test,kvs:wopr.ibases
08:58:47.40: Timbl support built in. 
08:58:47.40: /home/pberck/local
08:58:47.40: Starting wopr 1.4.6
08:58:47.40: PID:  19887 PPID:  21333
08:58:47.41: Starting.
08:58:47.41: Running: multi
08:58:47.41: multi
08:58:47.41:  filename:   mbmt_test  
08:58:47.41:  lexicon:
08:58:47.41:  counts:
08:58:47.41:  kvs:        wopr.ibases
08:58:47.41:  timbl:
08:58:47.41:  ws:         3
08:58:47.41:  lower…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_multi_classifiers?rev=1276599832&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-06-15T13:03:52+02:00</dc:date>
        <title>uvt:wopr_multi_classifiers</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_multi_classifiers?rev=1276599832&amp;do=diff</link>
        <description>multi_classifiers


From Wopr version 1.16.0

Apply multiple classifiers to a test file. Distributions can optionally be merged, and the training file format can differ between classifiers as long as the target is the same. Strictly speaking, the targets may differ as well, but in that case you have to write your own script to process the resulting .mc file.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_multi_dist?rev=1261566242&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-12-23T12:04:02+02:00</dc:date>
        <title>uvt:wopr_multi_dist</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_multi_dist?rev=1261566242&amp;do=diff</link>
        <description>multi_dist


From Wopr version 1.14.15

Apply multiple classifiers to a test set, and merge their distributions.

Example call (the id and topn parameters are optional):



wopr -r md -p filename:austen.test.l2r0,kvs:austen.kvs{,id:ID,topn:N}


austen.kvs contains a description of which classifiers to use:</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_ngrams?rev=1281017462&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-08-05T16:11:02+02:00</dc:date>
        <title>uvt:wopr_ngrams</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_ngrams?rev=1281017462&amp;do=diff</link>
        <description>n-Gram Language Model


“Not as cool as a memory based model”

Training an n-gram model


There is one function to create an n-gram language model, ngl. In its full form:


wopr -l -r ngl -p filename:IN,n:N,fco:FCO


This creates a file with uni- to n-grams for the given text file. For each n-gram, the absolute frequency count and a conditional probability are calculated. The fco parameter is a frequency cut-off value; only n-grams which occur more than fco times are included in the model.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_recipes?rev=1281712096&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-08-13T17:08:16+02:00</dc:date>
        <title>uvt:wopr_recipes</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_recipes?rev=1281712096&amp;do=diff</link>
        <description>An overview of Wopr commands and common tasks.

Wopr is used to create data and interface with Timbl. It can run with a number of commands and parameters from the command line, or read commands from a script.

Basic syntax: ./wopr -r COMMAND1,… -p PARAMETER1:VALUE1,…</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_scripts?rev=1256826614&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2009-10-29T15:30:14+02:00</dc:date>
        <title>uvt:wopr_scripts</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_scripts?rev=1256826614&amp;do=diff</link>
        <description>Scripting


A Wopr script is a text file with Wopr-commands. It contains the same commands and parameters as on the command line.

A script is called as follows:



wopr -s SCRIPT -p ...


Only one script can be specified at a time. Optionally, extra parameters can still be specified on the command line.</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_server?rev=1273664524&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2010-05-12T13:42:04+02:00</dc:date>
        <title>uvt:wopr_server</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_server?rev=1273664524&amp;do=diff</link>
        <description>WOPR server


Both the IGTree and ngram language models can be run in server mode. Wopr will listen on a port and process instances or sentences it receives. 

IGTree server


The command is server4. It returns a log10(probability) of a sentence or instance. An example invocation:</description>
    </item>
    <item rdf:about="http://www.berck.se/dokuwiki/uvt:wopr_three?rev=1214376099&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2008-06-25T08:41:39+02:00</dc:date>
        <title>uvt:wopr_three</title>
        <link>http://www.berck.se/dokuwiki/uvt:wopr_three?rev=1214376099&amp;do=diff</link>
        <description>Fre 2008/06/19


Three data sets from ls0168 (pberck@chaos:/exp/antalb/mbmt/GIZA++-v2/):


pberck@chaos:/exp/pberck/three$ wc -l *
   871180 EMEA-english.train.txt
  1312111 EuroParl-english.train.txt
   298514 OpenSub-english.train.txt
  2481805 total
pberck@chaos:/exp/pberck/three$ cat *.txt &gt; three.txt
pberck@chaos:/exp/pberck/three$ wc -l *
   871180 EMEA-english.train.txt
  1312111 EuroParl-english.train.txt
   298514 OpenSub-english.train.txt
  2481805 three.txt
  4963610 total
pberck@chao…</description>
    </item>
</rdf:RDF>
