Assignment 2
308-424A
Due: October 14th, 1999
Work is to be done independently. Please see the policy on due dates, cheating and inappropriate collaboration accessible via the class web page.
Bugs/Fixes/FAQ for this assignment can be found here.The complete source code (for those who are interested) can be found in ass2.tar.gz
Pedagogical objectives of this assignment:
In this assignment you are asked to find the shortest distance between two urls (Uniform Resource Locators) on the world wide web. Here we define the shortest distance as the number of links that must be followed from one page to get to the other page. You should implement both informed (e.g. algorithm A or better) and uninformed searches although the informed (heuristic-based) search should obviously do much better.
You will be given some utility functions which will function on the linux machines at cs. There functions will be:
char *get_page(char *url)
: This function will return a string
containing the page pointed to by url
. It will return
NULL
if the page can not be loaded or found.
char **get_links(char *url)
: This function will return
an array of links (char*) on the page pointed to by url
. The array
of links will have a null terminator. The return value will be NULL if there
was an error.
long count_links(char **links)
: If you pass the
pointer returned by get_links()
to this function, it will return
the number of urls in the array.
void free_links(char **links)
: This function will free any
memory associated with urls
.
void free_page(char *page)
: This function will free the
string containing the page returned by get_page()
.
These functions will be defined in ass2.h, which you may access by putting this in your code:
#include <ass2.h>
and compile and link to the library as shown below.
From a linux box, you can grab a sample main.c file and compile it with the library (the library only works under linux). Here is a sample session which links to the C file, and compiles it with the library:
[beatrice] [/u7/grad/ericb/tmp] cp /course/cs424/src/main.c ./ [beatrice] [/u7/grad/ericb/tmp] gcc -Wall -I/course/cs424/include -L/course/cs424/lib -o ass2 main.c -lass2If you wish to see the internals of the library, the source can be found in the course directory in :
/course/cs424/src/
. The files you will need
are spin_utils.c
and spin_utils.h
. Note that these
rely on two perl scripts in /course/cs424/scripts
.
If you experience problems with the library send mail to cs424@cs.mcgill.ca and explain what problems you are encountering. Please link against the library in the course directory (as opposed to copying it into your home directory) as we will be updating it if we find bugs.
You should first test your program by choosing two urls on the Computer Science web-site (these should be close together), and then test with the two urls which will be provided on the course web site.
Your program should take the urls on the command line, i.e.:
./ass2 http://www.cs.mcgill.ca/ http://www.mcgill.ca/
These urls will be available to your program in the
parameters argv[1]
and argv[2]
respectively:
int main (int argc, char *argv[]){ char *start_url=argv[1]; char *goal_url=argv[2]; /* the rest ... */ }
Note: other required test cases will be provided later.
Submit a written hardcopy document describing your heuristics, a discussion of their relative benefits, and experimental data showing how well they worked. Also submit your source code and a working executable compiled for linux, using the handin program. The executable MUST be called spinner and it should work using exactly the command:
spinner URL1 URL2
In your writeup, you should present results for various URL pairs, including the recommendd test cases. It is crucial to select good, informative examples and have an insightful discussion.
Questions to answer:
This discussion should probably 2 or 3 typed pages long, including the numerical data.