DocFather is a search utility developed by SITEFORUM Inc. (formerly SFS Software). Its origins trace back to early 1995, when Java Platform v1.0 was released. The company was founded by Frank Schruefer and Dirk Schlenzig in Erfurt, Germany. They originally created the company because there was no integrated development environment (IDE) for the Java platform, and decided to write one themselves.
DocFather - the Applet Version
As they got deeper into the world of Java technology, they faced a challenge searching the JDK documentation. It took too long to find the desired content by conventional browsing. No search utility was available, so they started developing one themselves. They called it DocFather and released its first version in the summer of 1996. The current version is available for viewing the JDK documentation as the applet version.
The company had started development of the index creator component using JavaCC in early 1996. The first version was ready in just under six weeks. The DocFather Viewer, a Java applet, took four additional weeks. The index creator retrieved keywords from HTML and text-based files fully automatically and stored them in index files. The applet provides the user interface: users enter keywords, and the viewer presents the pages where those keywords occur. This enabled them to search the JDK documentation for keywords, saving time and improving their development efforts.
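The keyword-to-page mapping described above can be illustrated with a minimal sketch. It assumes a simple in-memory inverted index and a crude tag-stripping step; the class and method names are hypothetical, and this is not DocFather's JavaCC-based parser or its index file format.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a keyword index; not DocFather's actual code.
final class KeywordIndexSketch {
    // keyword -> names of the pages that contain it (sorted for stable output)
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Strip markup crudely, split the text into words, and record each word for the page.
    void addPage(String pageName, String html) {
        String text = html.replaceAll("<[^>]*>", " ");
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(pageName);
            }
        }
    }

    // Return the pages that contain the keyword, or an empty set if there are none.
    Set<String> lookup(String keyword) {
        Set<String> pages = postings.get(keyword.toLowerCase(Locale.ROOT));
        return pages == null ? new TreeSet<>() : pages;
    }

    public static void main(String[] args) {
        KeywordIndexSketch index = new KeywordIndexSketch();
        index.addPage("Applet.html", "<h1>Class Applet</h1><p>An applet is a small program.</p>");
        index.addPage("Object.html", "<h1>Class Object</h1>");
        System.out.println(index.lookup("class"));
    }
}
```

In this sketch the index creator corresponds to `addPage` and the viewer's search box to `lookup`. A production index would be written to files rather than held in memory.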
However, once they released DocFather, they found that their index creator was well suited only to the JDK's HTML markup. Other documents used different HTML syntax, and small syntax errors in their HTML pages would crash the parser. Frank adjusted the parser and, after nearly a year and with the help of nearly 2,000 users, was able to fix these problems and make it robust.
"We are very lucky to have an installed client base that tested the product for us, bombed us with reports and helped us through the early stages -- they made it what it is today," says Dirk.
The company faced many challenges in the subsequent four years. One challenge was to improve the parser component to handle larger numbers of pages. In 1996, most users had an average of 200 pages to index. By 1998, the company had many inquiries for DocFather to handle 30,000 pages or more. These requests forced them to improve the index creator's memory management.
"Another challenge was the browser war between Microsoft and Netscape in 1997 and 1998. Both vendors implemented different versions of the Java VM, and our DocFather applet showed completely different results in the two browsers. This drove our customers and us crazy. We were forced to find a solution that worked in both browsers or even build dedicated code for each browser, which was a disaster because it increased the file size of our applet," sighs Frank.
As mentioned previously, the company originally decided to write an IDE. They called it javaDraw and renamed it CoffeeShop in 1997. They had much success with it and still use it for the development of their flagship SITEFORUM product. With CoffeeShop they created some other useful "pure Java" tools, including "javaZIP", a compression utility, and, of course, DocFather.
Their product philosophy is to build software that is server-independent and database-independent. The software requires at most only a standard web server and web browser, and does not rely on an external database. This design decision forced them to save the index information in so-called index files. The more HTML pages that were indexed, the bigger the index files. Once the index files reached 2MB, it was senseless to use them over the Internet; only intranet or CD-ROM use was practical. So they searched for ways to keep these index files small -- for this purpose they created their own compression algorithm. ZIP compression was available in JDK 1.1, but one of their main targets was to have the applet JDK 1.0.2 compliant, to support popular browsers like Netscape 2 and 3. So the index creator compressed the files and the applet decompressed them. They were able to decrease the size of the index files by more than 75%.
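The article does not describe the company's own compression algorithm, so the following is only an illustration of one common way to shrink a sorted keyword list: front coding, where each word stores how many leading characters it shares with the previous word. The class and method names are hypothetical, and the sketch uses modern Java collections rather than the JDK 1.0.2 subset the applet targeted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Front coding of a sorted keyword list; an illustrative technique only,
// not the compression scheme DocFather actually used.
final class FrontCodingSketch {
    // Encode each word as "<shared prefix length>|<remaining suffix>".
    static List<String> encode(List<String> sortedWords) {
        List<String> out = new ArrayList<>();
        String previous = "";
        for (String word : sortedWords) {
            int shared = 0;
            int max = Math.min(previous.length(), word.length());
            while (shared < max && previous.charAt(shared) == word.charAt(shared)) {
                shared++;
            }
            out.add(shared + "|" + word.substring(shared));
            previous = word;
        }
        return out;
    }

    // Rebuild the original words from the encoded entries.
    static List<String> decode(List<String> encoded) {
        List<String> out = new ArrayList<>();
        String previous = "";
        for (String entry : encoded) {
            int bar = entry.indexOf('|');
            int shared = Integer.parseInt(entry.substring(0, bar));
            String word = previous.substring(0, shared) + entry.substring(bar + 1);
            out.add(word);
            previous = word;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("applet", "application", "apply");
        List<String> packed = encode(words);
        System.out.println(packed);
        System.out.println(decode(packed));
    }
}
```

As in the article's design, the work is split in two: the index creator would run the encoding step once, and the applet would run the decoding step after download.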
When they started marketing DocFather, they made good use of its server-independence, which made it a good choice for search functionality in any environment, either online (Internet, intranet) or offline (CD-ROM, local network).
They soon recognized that DocFather showed its full strength on CD-ROM. Clients such as Lucent Technologies, Intel, and Motorola used it for their CD-ROM-based documentation or on their intranets. For intranets, they integrated update features into the index creator so that when pages are added or updated, only those pages need to be re-indexed. In other words, the entire volume does not have to be re-indexed each time a page is added or updated.
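The update behavior described above can be sketched as follows. The sketch assumes that changes are detected by comparing each page's last-modified timestamp with the value recorded at the previous indexing run; the article does not say how DocFather detects changes, and all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of incremental re-indexing based on timestamps.
final class IncrementalIndexSketch {
    // page name -> last-modified timestamp seen when the page was last indexed
    private final Map<String, Long> indexedAt = new HashMap<>();

    // Return the pages that are new or changed (in name order) and remember their timestamps.
    List<String> pagesToReindex(Map<String, Long> currentTimestamps) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(currentTimestamps).entrySet()) {
            Long seen = indexedAt.get(e.getKey());
            if (seen == null || !seen.equals(e.getValue())) {
                stale.add(e.getKey());
                indexedAt.put(e.getKey(), e.getValue());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        IncrementalIndexSketch tracker = new IncrementalIndexSketch();
        Map<String, Long> site = new HashMap<>();
        site.put("index.html", 100L);
        site.put("faq.html", 100L);
        System.out.println(tracker.pagesToReindex(site)); // first run: every page is new
        site.put("faq.html", 200L);                        // one page is edited
        System.out.println(tracker.pagesToReindex(site)); // second run: only the edited page
    }
}
```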
SITEFORUM DocFather - the Servlet Version
In early 1999, the company decided to spend more time developing the Internet search facilities of DocFather. Because it didn't make much sense to transfer large files (indexes) over the Internet, they decided to develop a new server-based DocFather application using their Java-based SITEFORUM technology. They extended their server-independent philosophy to not rely on any server. Toward this end, the SITEFORUM product contains its own database and web server. The SITEFORUM platform has a built-in servlet-supporting WebServer, an internal Java-based relational database, and a SQL engine to manage communication with the external JDBC/ODBC databases or the internal database.
The DocFather index creator became an integral part of the SITEFORUM platform. Instead of storing index information in files, it stores this information in any JDBC/ODBC-supported database. This was the birth of SITEFORUM DocFather, which is available for viewing the JDK documentation as the servlet version. "With SITEFORUM, it was really easy to create a web-based index creator component. Instead of the well-known DocFather applet, I created a sophisticated HTML front end, where people can insert keywords and get weighted results immediately," said Uwe Mengs, SITEFORUM DocFather project manager. SITEFORUM DocFather focuses on the Internet and intranets. It is a highly scalable product that can easily index documentation and provide full-text search functionality immediately. "The challenge here was hidden in the SQL queries. Because we create a full-text search index, we needed to find a good table design to store this information as well as an extremely optimized query procedure to get the information in a short time. Anything over 3 seconds was unacceptable," said Uwe.
As a test environment for SITEFORUM DocFather, they used the Java 2 SE v1.3 Beta documentation with 5500 pages, 20,000 keywords and nearly 900,000 entries in the relations table. It took nearly one week to decrease the time for a keyword search for the three most-often used words -- "Java", "class", and "overview" -- in this repository to the 3-second limit.
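The article mentions keywords, pages, and a relations table but gives no schema, so the following is only a sketch of the general idea. It models each relations row as a keyword, a page, and an occurrence count, and it assumes that results are weighted by that count; the map keyed by keyword plays the role a database index on the keyword column would play. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of a keyword/page relations table.
final class RelationsTableSketch {
    // One row: a keyword occurs on a page a given number of times.
    static final class Relation {
        final String keyword;
        final String page;
        final int occurrences;

        Relation(String keyword, String page, int occurrences) {
            this.keyword = keyword;
            this.page = page;
            this.occurrences = occurrences;
        }
    }

    // Rows grouped by keyword so a search does not scan the whole table.
    private final Map<String, List<Relation>> byKeyword = new HashMap<>();

    void add(String keyword, String page, int occurrences) {
        byKeyword.computeIfAbsent(keyword, k -> new ArrayList<>())
                 .add(new Relation(keyword, page, occurrences));
    }

    // Pages containing the keyword, highest occurrence count first (ties by page name).
    List<String> search(String keyword) {
        List<Relation> rows = new ArrayList<>();
        List<Relation> found = byKeyword.get(keyword);
        if (found != null) {
            rows.addAll(found);
        }
        rows.sort((a, b) -> a.occurrences != b.occurrences
                ? Integer.compare(b.occurrences, a.occurrences)
                : a.page.compareTo(b.page));
        List<String> pages = new ArrayList<>();
        for (Relation r : rows) {
            pages.add(r.page);
        }
        return pages;
    }

    public static void main(String[] args) {
        RelationsTableSketch table = new RelationsTableSketch();
        table.add("java", "overview.html", 3);
        table.add("java", "applet.html", 7);
        System.out.println(table.search("java"));
    }
}
```

Frequent words such as "Java" produce the longest row lists, which is consistent with the article's choice of the most-used words as the worst case for tuning.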
"SITEFORUM is an ideal environment to build such applications. It offers a vast array of functions that can be combined to build effective features very rapidly", said Frank. A good example is the highlighting of keyword occurrences in the document. "SITEFORUM offers a function to import URLs and a replace function, that can replace text in a document. So we simply imported the document, replaced all keyword occurrences in it with a red font before we showed it to the user. The best thing is that we do not modify the document, we change it virtually," says Frank.
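The "virtual" highlighting Frank describes can be approximated with a short sketch. It uses the standard `java.util.regex` classes rather than the SITEFORUM import and replace functions, which are not documented in this article, and it treats the document as plain text; a real implementation would also have to skip matches inside HTML tags and attributes. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of keyword highlighting on a copy of the document text.
final class HighlightSketch {
    // Return a new string with each whole-word, case-insensitive match wrapped in a red font tag.
    // The input string is left unchanged, so the stored document is never modified.
    static String highlight(String text, String keyword) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(keyword) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // $0 refers to the matched text, which preserves its original capitalization.
        return p.matcher(text).replaceAll("<font color=\"red\">$0</font>");
    }

    public static void main(String[] args) {
        String page = "Java applets run in a browser. JavaBeans are components.";
        System.out.println(highlight(page, "java"));
    }
}
```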
An advantage of the DocFather applet is its server-independence. All that is required is to create an index, which happens fully automatically. The same index can then be used with the DocFather applet in any environment and on any operating system, without modification. DocFather can index HTML, PDF, and text-based file formats. An earlier version was also able to index Microsoft Word and Microsoft Excel files, but that version required the Microsoft Java SDK, which is no longer supported by Microsoft and therefore no longer supported by SITEFORUM Inc.
This advantage is also the source of DocFather's limitations. Because it is server-independent, DocFather does not use a true database and is therefore limited to an estimated 15,000 pages with an average of 5KB per page. Users can raise this limit with a good stop-word list or by indexing META keywords only, instead of performing a full-text search.
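A stop-word list keeps very common words out of the index, which is why it reduces index size. The sketch below shows the filtering step under simple assumptions: a tiny hard-coded English list and whitespace-and-punctuation tokenization. The names and the word list are hypothetical and are not taken from DocFather.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of stop-word filtering before indexing.
final class StopWordSketch {
    // A tiny illustrative stop-word list; a real list would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));

    // Return the lowercase words of the text that are worth indexing, in order.
    static List<String> indexableWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(indexableWords("The applet is a subclass of Panel"));
    }
}
```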
SITEFORUM DocFather is more scalable and offers more features than the original DocFather product. It is fully web-based; users are able to create indexes from anywhere using only a web browser. The SITEFORUM server can be installed on any Java 2 compliant operating system such as Windows NT, Linux or Solaris. The index creator and search front-end require only a standard web browser.
"Most users are impressed to see how easy it is to create the index and publish it. Later they recognize how customizable the product is and begin to use the broader range of features," said Dirk Schlenzig. Bigger projects from clients brought many good improvements to the entire DocFather customer base. A good milestone for DocFather was its adoption by Intel Corporation. "Intel wanted to use DocFather for internal-use documentation. They loved the main functionality but required a good amount of new features in both the applet and the index creator." At the end of the project, they received a wonderful mail from Intel project lead Jeff Orlando: "Everything is working perfectly now as I hoped it would. This is exactly the tool we hoped to find when we started out. Congratulations on an excellent job."
"Unlike most developers, we leave the DocFather research and comparison to our customers. We had many customers in the past that contacted us with an inquiry and then went out to search for a similar solution. We were not surprised when they returned to us 3 weeks later, because DocFather is a customer-driven product. We take requests and suggestions very seriously and it happened quite often that a customer requested a feature and three days later it was part of a new release," said Dirk. There are surely similar solutions available, for example Verity's search solutions or Astaware's SearchKey Pro. Their customers enjoy DocFather because it makes a complex process simple. Insert a web site or document location and press the button. The rest -- index creation, installation of the viewer -- will happen fully automatically.
Developers using the Java language are highly encouraged to use the DocFather technology in their applications. For example one of their clients, Lucent Technologies, customized the application so that it ran in conjunction with the Sun HTML JavaBeans component. The "100% Pure Java" certificate earned in 1998 was very useful here. Of course, DocFather can be used as a platform-independent help system for developers also. Its server-independence is a big advantage here, as it makes no difference if it is placed on CD-ROM or distributed via ESD over the Internet. "It is really a pleasure to have a product that can be so easily adapted to such a diverse range of client requirements, like a 'Swiss Army Knife' for search and index solutions," says Brendan O'Gorman, Director of Business Development.
SITEFORUM DocFather offers a vast number of integration options for Java developers. The product is based on the highly acclaimed "SITEFORUM Interaction Platform" and offers maximum customization and development options through integrated development tools such as SITEFORUM Script API and the SITEFORUM SDK.
The development environment "SITEFORUM Studio" offers full customization options for the index creator and search front-end. These options enable an "integrate everything" scenario that allows developers to build their application around the SITEFORUM solution or make the SITEFORUM product part of their solution. Because SITEFORUM comes with a complete package -- server, business logic, and database -- developers can easily adopt the SITEFORUM system to create new functionality or customize it to their needs.
The Company
SITEFORUM Inc. began as SFS Software, making use of the first Java Development Kit release 1.0 in early 1995. It later merged with Schlenzig IT AG, also of Germany.
SITEFORUM Inc. is dedicated to providing leading edge collaborative e-Business solutions based on its own interaction platform, available as an application suite, as an ASP hosting solution, and as a complete integration and development platform.
Headquartered in Austin, in the Silicon Hills region of Texas, SITEFORUM Inc. has offices in Erfurt, Germany, and Asian representation based in Singapore. Top 1000 companies including BMW, CompuServe, Intel, Hewlett Packard, Motorola and Nortel are among their diverse customer base from more than 70 countries worldwide.
The People on the Front Lines
Juergen Schlenzig, Co-founder of SITEFORUM Inc. and CEO
Frank Schruefer, Co-founder of SFS SOFTWARE and SITEFORUM Inc., Chief Technology Officer, and Lead Programmer of CoffeeShop, DocFather and SITEFORUM
Dirk Schlenzig, Co-founder of SFS SOFTWARE and SITEFORUM Inc. and VP Research and Development
Brendan O'Gorman, Director, Business Development
Uwe Mengs, Senior Programmer and Product Manager and Lead Programmer of SITEFORUM Contact, SITEFORUM Merchant and SITEFORUM DocFather
Mark Schlenzig, Chief Software Designer and Lead Designer of SITEFORUM Merchant, SITEFORUM Contact and SITEFORUM DocFather