Due: TBD (11:59 PM Friday, Dec 14, but note that HW11 is due before
this)
Let's use the following late policy on this one: 5% off if submitted
before noon on Dec 15, 10% off if before 11:59 PM on Dec 16, and 20% off
after that (up to 11:59 PM on Dec 21).
WARNING: This is by far the longest assignment of the semester,
and not one you want to procrastinate on!
Since this homework is a project, it is weighted significantly more
heavily than other homeworks.
Objectives:
This program serves to put together everything you have learned about
Prolog and programming languages. In particular, it has the
following objectives:
- Learn more about some common data structures used in compilers
and software engineering. In particular, this includes abstract
syntax trees (ASTs) and control flow graphs (CFGs).
- Get familiar with a few common tools and techniques in compilers
and software engineering: the applications here are widely used in
these areas, and of course that is the topic of this course.
- Learn a little about XML: As you might know, Extensible Markup
Language (XML) is emerging as a standard for interchanging information
in various diverse domains. It is useful for you to learn the basics
of XML structure, as you might want to consider using XML for future
projects. There isn't much to learn though, as its structure is simple.
- Learn how to process XML data using Prolog: Prolog has found a
niche in processing XML (and HTML) data, including numerous web
applications such as the semantic web.
- Learn how to use APIs: You may already know this well through
your prior programming experience, but you will learn it now if not.
Learning how to read documentation to reuse existing code is of immense
value towards your computer science career.
- Learn how to understand and attack a medium-sized problem: This
is probably the single most important thing to learn during all your
first year courses, in preparation for research. The task probably
looks daunting at first, but will become easy if you think clearly. The
hard part is understanding the problem, not coding it up; indeed, any
attempt to start coding first will probably be wasted.
To do these, you will be doing some analyses of programs written
in Louden's language. More precisely, you will be inputting
AST/CFGs and analyzing them. Please note that the length of this
assignment is not a reflection of the program's difficulty - the
program is fairly simple to implement if you take a methodical
step-by-step approach, though it can be overwhelming if you try to do
it without understanding the project first. Of course, all real
applications are that way.
For this program, you are allowed to use any predefined predicates that
you want; however, I caution you that you can probably write any desired
predicate in much less time than needed to find it in the manual.
You should also minimize the use of extralogical
predicates, as overuse leads to distress.
Problem:
Given a program written in Louden's language, we will be generating an
AST/CFG from it, and then analyzing the AST/CFG to do some typical
software engineering tasks. The specific analysis tasks that you
will do are listed below.
Definitions:
We will use the following definitions throughout this assignment: A
variable x is defined at a node N if x is assigned to by N (i.e., N is
an assignment and x is on the left hand side of N). A variable x is
used at a node N if N is an assignment and x is in the right hand side
of N, or N is a selection or iteration and x appears in the condition
of N.
A control-flow graph (CFG) consists of vertices for each statement in
the program and edges for each possible flow of execution in the
program.
Since there are several programs we are talking about in the
assignment, we will refer to the program (in Louden's language) being
analyzed as the target program, and the [Prolog] program you are
writing as the analyzer program.
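To make the defined/used definitions concrete, here is a minimal sketch that applies them to a single assignment given as a token list. It is only an illustration of the definitions, not the required approach (your analyzer must work from the XML AST/CFG). The predicate names assignment_def_use/3 and is_var_token/1 are hypothetical, and the operator list is an assumption about Louden's language rather than something taken from the parser.

```prolog
:- use_module(library(apply)).   % include/3
:- use_module(library(lists)).

% assignment_def_use(+Tokens, -Def, -Uses)
% Tokens is ONE assignment statement as a token list, e.g.
% [x, ':=', '3', '*', x, '+', y].  Def is the variable on the left
% hand side; Uses is the set of variables on the right hand side.
% (Hypothetical helper, not part of the supplied parser.)
assignment_def_use([Def, ':=' | Rhs], Def, Uses) :-
    include(is_var_token, Rhs, Vars),
    sort(Vars, Uses).            % sort/2 also removes duplicates

% A token counts as a variable if it is an atom that is neither an
% operator nor a numeric atom such as '3'.
% ASSUMPTION: this operator list is illustrative and may be incomplete.
is_var_token(Token) :-
    atom(Token),
    \+ memberchk(Token, ['+', '-', '*', '/', '(', ')']),
    \+ atom_number(Token, _).
```

For example, the query assignment_def_use([x, ':=', '3', '*', x, '+', y], D, U) binds D to x and U to [x,y]: x is defined at this node, and both x and y are used at it.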
Attacking the Problem:
I've outlined how to attack the problem here. You are required to
use the XML AST produced by the parser discussed below, as one of the
major points of this program is to learn how to handle such data
structures.
0. Tokenize the input program:
As you know, the first stage of
compilation is lexical analysis (i.e., tokenization). For the
purposes of this project, your brain will be the tokenizer, and you
will input a target program as a list of tokens.
1. Parse the Input Program (into an AST):
Naturally, we will use Louden's language for target programs.
I've implemented a parser for his language here, so there is very
little for you to do here. To use this, store the parser in a file
named parse.pl (you may need to use a different file extension on your
system), and put ":- [parse]." (without the quotes) at the beginning of
your analyzer program file. This tells it to load parse.pl when it is
loaded. You may then use parseLouden/3 defined there to write an
AST/CFG to an XML file.
You should read the parser documentation in the file first, and then
run the parser on some small target program to get familiar with how
its output looks.
Note that there are examples at the end of the file, but you will need
different examples for your test cases. You DO NOT
need to understand the parser code - it uses definite clause grammars,
which can be thought of as notational conveniences for Prolog rules.
If you get an error message referring to a missing library or package
predicate, you need to compile the appropriate packages/libraries on
your installation. I am told that the Windows and Mac versions
include all libraries by default, but that SWI's Linux versions do not.
2. Generate a CFG
The next step after parsing is CFG generation. This step is also
trivial, since my parser also generates CFG
edges, resulting in an integrated AST/CFG. You should write a
small target program and figure out what the
CFG looks like (you will want to draw the graphs on paper). Make sure
your program has nested loops and/or
selection statements.
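If it helps to see a CFG written down before you draw your own, here is a small hand-made sketch. The node names n0..n3 and the edge/2 facts are hypothetical and are NOT the format the parser produces; they only illustrate how a loop shows up as a cycle, and why any traversal of a CFG needs to remember where it has already been.

```prolog
% Hypothetical CFG for a target program shaped like:
%   n0: x := 3      n1: while x do      n2: x := x - 1      n3: (after loop)
edge(n0, n1).
edge(n1, n2).   % condition holds: enter the loop body
edge(n2, n1).   % back edge: re-test the condition
edge(n1, n3).   % condition fails: leave the loop

% reachable(+From, ?To): To can be reached from From along one or more
% edges.  The Seen list stops the search from going around the loop forever.
reachable(From, To) :-
    reach(From, To, [From]).

reach(From, To, _Seen) :-
    edge(From, To).
reach(From, To, Seen) :-
    edge(From, Mid),
    \+ memberchk(Mid, Seen),
    reach(Mid, To, [Mid|Seen]).
```

With these facts, reachable(n0, n3) succeeds, reachable(n2, n2) succeeds (via the back edge), and reachable(n3, X) fails because nothing follows n3.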
3. Input the AST+CFG
This step is trivial, as SWI-Prolog includes predicates for
reading/writing XML files - see the SGML/XML package documentation
(from the SWI homepage, click on "Manual" and follow the links). I
suggest looking at load_xml_file/2. At this point, you will have
the XML file contents as some huge Prolog structure.
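As a rough guide to what that structure looks like: the SGML/XML package represents each XML element as a term element(Name, Attributes, Children), where Attributes is a list of Name=Value pairs and Children is a list of nested elements and text. The sketch below walks such a structure; dom_element/2 is a hypothetical helper, and the element names in the usage comment are placeholders, so check the parser's real output before relying on any of them.

```prolog
:- use_module(library(lists)).   % member/2

% dom_element(+DOM, -Element)
% Enumerates, on backtracking, every element(Name, Attrs, Children)
% term found in DOM at any nesting depth.  Text children are skipped
% because they do not unify with element/3.
% (Hypothetical helper, not part of the SGML/XML package.)
dom_element(DOM, Element) :-
    member(Node, DOM),
    Node = element(_Name, _Attrs, Children),
    (   Element = Node
    ;   dom_element(Children, Element)
    ).

% Possible usage, assuming the parser wrote ast4.xml:
% ?- load_xml_file('ast4.xml', DOM),
%    dom_element(DOM, element(Name, Attrs, _)).
```

Enumerating elements this way is a quick method for exploring an unfamiliar XML file at the top level before you write any analysis code.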
4. Perform analysis tasks
Write predicates for each of the analysis tasks listed below. You are
required to use the AST+CFG generated by the parser.
5. Test your program
You should test your analyzer program completely. Make sure you
supply test cases that are complete enough to convince the grader that
your analyzer works. If you haven't tested major cases, we assume that
your analyzer doesn't work.
Write a runtest/0 predicate that runs all your test cases. If
your program structure isn't compatible with such an approach, you may
also CLEARLY indicate how to run your program through all your test
cases. Of course, we will try additional test cases.
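One possible shape for runtest/0 is sketched below, assuming your tests can be written as goals that should succeed. The test_case/2 table and run_case/2 are hypothetical names, and the two goals shown are stand-ins that only use built-ins; in your submission they would be calls to your own predicates (varStats/4, selClose/3, and so on).

```prolog
% test_case(?Name, ?Goal): one fact per test; Goal is expected to succeed.
% The goals here are placeholders, not real analyzer tests.
test_case(numeric_atom,     atom_number('4', 4)).
test_case(non_numeric_atom, \+ atom_number(x, _)).

% run_case(+Name, +Goal): run one test and report the outcome.
% An exception raised by Goal is counted as a failure.
run_case(Name, Goal) :-
    (   catch(Goal, _, fail)
    ->  format("PASS ~w~n", [Name])
    ;   format("FAIL ~w~n", [Name])
    ).

% runtest: run every test case, in the order the facts are listed.
runtest :-
    forall(test_case(Name, Goal), run_case(Name, Goal)).
```

Keeping the tests in a table of facts means adding a test is one line, and a failing test does not stop the remaining ones from running.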
Analysis Tasks:
For all of the following, the arguments are as follows:
- Prog: a program, represented as a list of tokens (Ex: [x, ':=', '3',
'*', x, '+', x, ';', x, ':=', y]).
- Var: the name of a variable in the target program (Ex: x).
- Node*: the
index of a token in Prog (0 is the first node). Depending on which
library predicates/options you use, it might be an integer or atom
(e.g., 4 or '4'). You can write your code either way, but note
that '4' would be printed to screen as 4 (so you want to first make
sure which way your choices work). Also note that this is not the same
as the node<n> atoms generated by the parser (read the parser
documentation).
- Nodes: a set of nodes
- *Vars: a set of variable names (Ex: [x,y])
- varStats(+Prog,+Var,?NumDef,?NumUsed) succeeds if NumDef/NumUsed
are the number of times Var is defined/used
in the program. However, we wish to distinguish defs/uses that are
inside an iteration from those outside, so NumDef is the structure
def(NumInIter,NumOutIter), and similarly for
NumUsed and used(NumInIter,NumOutIter).
- selClose(+Prog,+NodeIf,?NodeFi) succeeds if NodeIf is an 'if'
node and NodeFi is the index of the matching 'fi'.
- badLoop(+Prog,+Node) succeeds if Node is an iteration node and
the loop body does not define any of the variables that appear in the
loop condition.
- uninitVar(+Prog,+Var,-Node,-Impossible) succeeds if Node uses Var
and it is possible that Var has not been defined before Node.
Impossible is bound to true if it is impossible that Var has been
defined before Node, and false otherwise.
- Determine a useful task of your own choice, and code it. Make
sure you document exactly what the task is (in comments). Your grade is
based on 1) how creative and useful your task is, 2) how difficult it
is to implement, and 3) correctness. Thus, you may wish to 1) argue for
why your task is useful, and 2) design a task that requires both AST
and CFG information.
Many of the above tasks are useful and often done by compilers and/or
software engineering tools, and there are thus various compiler
techniques for doing similar tasks efficiently (e.g., dataflow
analysis). Of course, that's not the point here and you don't need to
learn those techniques for this project (indeed, you will probably not
finish if you spend time on that, though I'd be glad to talk to you
later about related research in the software engineering world).
Submission
The programs should be submitted electronically to the grader, and cc'd
to me. Your main program
should be named <firstname>_<surname>_hw10; other
filenames should start with your two initials, and your main program
should automatically load them when it is loaded.
Submit a program including runtest/0.
Hints/Clarifications/Corrections
- To repeat what the assignment says, you will not be writing any
significant code until step 4. If you are, then stop and read the
assignment again.
- As long as you can process small (<200 tokens) programs in a
few seconds, don't worry about efficiency.
- Don't forget that Prolog's only data structure is the structure,
with lists being a special case. Similarly, 'id=node17' is also
just shorthand for the structure '=(id,node17)'. Thus, you can
unify against this: e.g., if you unified 'id=X' against this, X would
be bound to node17.
- This assignment is about static
analysis, and you don't need to worry about data values, which are in
general dynamic. For example, in "x:=0-x; while x do ...", the
while loop is never entered but that relies on data values (so, you
don't need to worry about that). Dynamic analysis is a much harder (and
undecidable) problem of course. Most compilers eliminate constant
expressions using constant folding/propagation, but the assignment does
not ask you to do that.
- The "test cases" at the bottom of the parser are test cases I
used for my parser. They have nothing to do with the test cases that
you should come up with for your program.
- IMPORTANT: The parser produces a file (e.g., ast4.xml).
However, the line in the code that actually prints it out was commented
out in my original post, which makes the directions in step 3
mysterious. This line is not commented out in the currently posted
version of the parser, so that is correct. Sorry for any confusion this
may have caused.
- It's OK to use any built-in predicates you want (though you're
likely to waste more time finding something than just coding it).
However, it's not OK to 'cheat' by using other languages/libraries.
Examples of this are using XPath, or the external language interface to
an imperative language.
- I can't vouch for this, but a student told me that different
versions of SWI treat the atom vs. integer issue (see my definition of
Node* above) differently. Nobody has ever told me of such a problem in
the past though, so it's probably because of different default options
in the XML predicates.
- If you need to convert between [numeric] atoms and integers,
atom_number/2 will do the job. This might be relevant due to the issues
I mentioned in defining Node* above. Note that Section 4.21 of the SWI
reference manual has various other similar conversion predicates.
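Two of the hints above (unifying against 'id=node17', and converting numeric atoms with atom_number/2) can be sketched as follows. The helper names attr_value/3 and node_index/2 are hypothetical, and the attribute names in the example queries are placeholders rather than what the parser necessarily emits.

```prolog
% attr_value(+Attrs, +Name, -Value)
% Attrs is an attribute list such as [id=node17, ...].  Because
% id=node17 is just the structure =(id,node17), it unifies with
% Name=Value directly.  (Hypothetical helper.)
attr_value(Attrs, Name, Value) :-
    memberchk(Name=Value, Attrs).

% node_index(+Raw, -Index)
% Normalises a node index to an integer whether it arrives as the
% integer 4 or as the atom '4'.  Fails for anything else.
node_index(Raw, Index) :-
    integer(Raw), !,
    Index = Raw.
node_index(Raw, Index) :-
    atom(Raw),
    atom_number(Raw, Index),
    integer(Index).
```

For example, attr_value([id=node17, kind=assign], id, X) binds X to node17, and both node_index(4, I) and node_index('4', I) bind I to the integer 4, so the rest of your code does not have to care which form the XML predicates returned.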