Sunday, April 15, 2007

Thesis Code

Most computer science or mathematics theses are written in a language called LaTeX. This isn't a regular "markup" language like HTML. It is actually a horrible programming language based on TeX that, when executed, is normally used to produce DVI, PostScript, or PDF files. It can also write out auxiliary information, which it then reuses on subsequent runs to generate bibliographies, tables of contents, etc.
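That rerun-until-it-settles behaviour can be scripted. Here is a minimal sketch, assuming the engine is pdflatex (overridable through a LATEX variable) and assuming that an unchanged .aux file means the reused information has settled. The name run_latex_until_stable and the five-pass cap are made up for illustration, and the sketch does not run BibTeX.

```shell
# Hypothetical helper: rerun the engine until the .aux file stops changing.
# Prints the number of passes it made. LATEX defaults to pdflatex (assumption).
run_latex_until_stable() {
  base=$1
  engine=${LATEX:-pdflatex}
  passes=0
  while [ "$passes" -lt 5 ]; do   # arbitrary safety cap
    # Snapshot the auxiliary file before this pass (empty if it does not exist yet).
    before=$(cat "$base.aux" 2>/dev/null || true)
    "$engine" "$base" >/dev/null </dev/null || return 1
    passes=$((passes + 1))
    after=$(cat "$base.aux" 2>/dev/null || true)
    # Nothing new was written for the next run to reuse, so stop.
    if [ "$before" = "$after" ]; then
      break
    fi
  done
  echo "$passes"
}
```

Something like run_latex_until_stable thesis would then typically take two or three passes for a document with a table of contents.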

Most Linux users are acquainted with the shell. This is normally seen as a text environment to execute programs. It is actually a horrible programming language called "sh" that, when executed, can do pretty much anything, including generate files, run LaTeX, or prompt the user.

Most people are not acquainted with sed. This is normally seen as a utility program to manipulate strings. It pretty much is just that. Technically, it too is a programming language. No one ever uses it like that.
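For the sceptical, sed does have labels and conditional branches, which is enough to write loops. A small sketch (add_commas is a made-up name): it keeps inserting a comma before the last unseparated group of three digits, and the t command jumps back to the label :a for as long as a substitution succeeded.

```shell
# Hypothetical helper: add thousands separators with a sed loop.
#   :a  defines a label
#   s   inserts one comma before the last run of three digits not yet separated
#   ta  branches back to :a if the substitution changed anything
add_commas() {
  sed -e ':a' -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/' -e 'ta'
}

printf '%s\n' 1234567 | add_commas   # prints 1,234,567
```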

All the experiment results for my thesis are stored in SQLite databases, which give me simple but fast relational storage. SQLite has a command line interface and can produce simple text tables. I can then take those tables and run them through sed, which produces LaTeX tables. I can then copy-paste those generated tables into my thesis. The shell code to automate most of this looks like:

# Emit the table header. printf '%s\n' prints each argument verbatim, so the
# backslashes do not depend on how the local echo treats escapes.
printf '%s\n' '\begin{tabular}{|c|ccc|ccc|c|}'
printf '%s\n' '\hline'
printf '%s\n' '& \multicolumn{3}{c|}{\textbf{SBDS}} & \multicolumn{3}{c|}{\textbf{AWCS}} & \\'
printf '%s\n' '& Nogoods & Isgoods & Values & Nogoods & Nogoods\footnotemark[1] & Values & $f$ \\'
printf '%s\n' '\hline'
# One row per filter, pairing the 'coop' and 'awcs' runs of each instance.
for w in "instance>0" "constraints=420" "constraints=460" "constraints=500"
do
sqlite3 -separator '&' results.sqlite "select '$w',round(avg(a.nogoodssent)), round(avg(a.isgoodssent)), round(avg(a.valuessent)), round(avg(b.nogoodssent)), round(avg(b.nogoodsmade)), round(avg(b.valuessent)), round(avg(a.violations=0)*100)||'\\\\' from results as a,results as b where a.algorithm='coop' and b.algorithm='awcs' and a.instance=b.instance and a.$w"
done | sed -e 's/\.0//g' -e 's/violations=0/Feasible/' -e 's/violations>0/Infeasible/' \
    -e 's/instance>0/Average/' -e 's/constraints=\([^\&]*\)/$|\\constraints|=\1$/' \
    -e 's/tightness=\([^\&]*\)/$|t|=\1$/' \
    -e "s/'values'=\([^\&]*\)/\$|\\\\domain|=\1\$/"
printf '%s\n' '\hline'
printf '%s\n' '\end{tabular}'
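The sed stage is easier to trust when run on its own. Here is a minimal sketch that feeds it made-up sample rows shaped like the `&`-separated sqlite3 output. The name rows_to_latex is hypothetical, the numbers are invented placeholders rather than real results, and only three of the substitutions are shown.

```shell
# Hypothetical helper: turn '&'-separated result rows on stdin into LaTeX rows.
rows_to_latex() {
  # 1. drop the ".0" that round() leaves on every number
  # 2. relabel the catch-all filter
  # 3. typeset "constraints=N" as math (\constraints is assumed to be a thesis macro)
  sed -e 's/\.0//g' \
      -e 's/instance>0/Average/' \
      -e 's/constraints=\([^&]*\)/$|\\constraints|=\1$/'
}

# Invented sample input, not real experiment data.
printf '%s\n' 'instance>0&12.0&7.0\\' 'constraints=420&3.0&95.0\\' | rows_to_latex
# prints:
#   Average&12&7\\
#   $|\constraints|=420$&3&95\\
```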

So: I've written a program (in shell) that dynamically executes a query (in SQLite) that is piped through a dynamically created program (in sed) that generates another program (in TeX) to be executed. The execution of this TeX can produce another program (in PostScript) that can be executed to display my thesis.

Be afraid.

3 comments:

Anonymous said...

Oh dear.....

Keith said...

Hate to point it out but there's a bug at line 11. There should be a -f switch at the end of that line

Anonymous said...

Have you considered CSH or, better, TCSH?