[Erlug] Document Conversion Systems

Daniele Medri <daniele.medri@xxxxxxxxx> · Mon, 3 Jun 2002 17:35:58 +0200

Passing this along just as it reached me...

Document Conversion Systems 
---------------------------

A call for interest to the Open Source Software community 

H.A. Proper and E.D. Schabell 
June 3, 2002 

Introduction 
Imagine yourself sitting at your computer in the near future,
working on your latest document in your favorite editor. You decide to
save the document in a format other than the editor's standard format.
You try 'save as...' with another format, but the new format does not
appear in your editor's conversion list. You try 'export' from the
options menu and enter the format you wish to convert to. Your editor
reports that this format is not available locally, but that it could
search the Internet for an algorithm that can make the conversion for
you. Since you have a connection to the Internet and feel that it might
be time for a coffee, you answer in the affirmative and the search
begins. By the time you return with that cup of coffee, the editor has
popped up a message that a conversion algorithm has been found and
applied. It also asks whether you would like to add this new algorithm
to your local library of conversion tools. Of course, you think, and
after you give the go-ahead your editor reports that the document has
been converted and the new algorithm has been added to your local
library. That coffee tastes even better now that you can keep working
on your document without conversion troubles!

Call for Interest 
The above scenario might seem a bit far-fetched. However, we are currently
starting up a research project and an associated prototype, a core part of
which will provide the functionality sketched above. In the research
project, the seamless conversion functionality will be used to research and
develop information retrieval systems for heterogeneous data sources.
As the scenario suggests, however, such functionality can be used for many
other purposes.
As a first step we are therefore interested in implementing an open & 
distributed system for conversions between data objects. We aim to set this
up as an Open Source Software project environment, since we feel that

1: this kind of functionality will be useful to applications other than
   information retrieval.
2: other research groups in information retrieval are likely to be faced
   with similar challenges.

We are, therefore, looking for interested parties that would like to
participate in developing, using and/or testing such a system as described
above.

Vision
At the moment we envision an open distributed system that uses a
peer-to-peer (p2p) communication strategy such as the one used in Gnutella.

The system should distinguish between:

- the definition of conversions from one data type to another data type.
- the actual implementations of these conversions.
- a suitable execution environment for these conversions.

This separation would make the conversion system fairly independent of the
platform (OS, CPU, memory). Searches for appropriate services can be
conducted using a p2p approach.
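
The three-way separation listed above could be sketched roughly as follows.
This is only an illustration in Python, not a proposed design: the class
names (ConversionSpec, Implementation, LocalExecutor) and the toy
"text/x-upper" type are hypothetical and do not come from the call itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ConversionSpec:
    """Definition only: which data type goes in and which comes out."""
    source: str
    target: str


@dataclass
class Implementation:
    """One concrete way to carry out a spec; several may exist per spec."""
    spec: ConversionSpec
    name: str
    run: Callable[[bytes], bytes]


class LocalExecutor:
    """Execution environment: looks up an implementation and runs it."""

    def __init__(self) -> None:
        self._impls: Dict[ConversionSpec, List[Implementation]] = {}

    def register(self, impl: Implementation) -> None:
        self._impls.setdefault(impl.spec, []).append(impl)

    def convert(self, spec: ConversionSpec, data: bytes) -> bytes:
        candidates = self._impls.get(spec)
        if not candidates:
            # A p2p search for a remote implementation would start here.
            raise LookupError(f"no implementation for {spec.source} -> {spec.target}")
        return candidates[0].run(data)


# Toy conversion (hypothetical type names) to exercise the three layers.
TEXT_TO_UPPER = ConversionSpec("text/plain", "text/x-upper")
executor = LocalExecutor()
executor.register(Implementation(TEXT_TO_UPPER, "py-upper", lambda b: b.upper()))
result = executor.convert(TEXT_TO_UPPER, b"hello")
```

Keeping the spec free of any code means that peers would only need to
agree on type names, while implementations and executors can differ per
platform.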

The conversions we aim to include in the system are virtually endless. They
may include:

- `simple' conversions such as: text to postscript, postscript to text,
  word to text, xml to word, latex to postscript, gif to tiff, bmp to gif,
  wav to mp3, etc.
- whole-part selection, for instance, splitting a mailbox into its constituent
  mails, or splitting a mail into subject, header, body or attachment-set,
  etc.
- aspect conversions, such as: a document's full-content to an abstract,
  a document's full-content to a set of keywords, etc. 
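
As an illustration of the whole-part case, the sketch below splits a single
mail into subject, headers, body and attachment names using Python's
standard email package. The sample message and the split_mail helper are
made up for this example; a real conversion would also have to deal with
encodings and nested parts.

```python
from email import message_from_string

# Made-up sample message with one text body and one attachment.
RAW = """\
From: alice@example.org
To: bob@example.org
Subject: Conversion test
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

Hello Bob.
--XYZ
Content-Type: text/plain
Content-Disposition: attachment; filename="notes.txt"

some notes
--XYZ--
"""


def split_mail(raw: str) -> dict:
    """Split one mail into subject, headers, body text and attachment names."""
    msg = message_from_string(raw)
    body_parts = []
    attachments = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_disposition() == "attachment":
            attachments.append(part.get_filename())
        else:
            body_parts.append(part.get_payload())
    return {
        "subject": msg["Subject"],
        "headers": dict(msg.items()),
        "body": "".join(body_parts),
        "attachments": attachments,
    }


parts = split_mail(RAW)
```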

Conversions may be composed as well. For example, a Word->Text conversion may
be combined with a Full-Text->Abstract-Text conversion to derive an abstract
from a word document. The system should be able to figure out such
combinations automatically.
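
One way such combinations might be found is to treat data types as nodes
and conversions as directed edges, then search for a path. The sketch below
uses a plain breadth-first search, which returns a chain with the fewest
steps; the catalogue and the find_chain name are assumptions made for the
example, and a real system would probably also weigh quality and cost.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical catalogue of known conversions as (source, target) pairs.
CONVERSIONS: List[Tuple[str, str]] = [
    ("word", "text"),
    ("text", "abstract"),
    ("text", "postscript"),
    ("latex", "postscript"),
]


def find_chain(
    source: str, target: str, conversions: List[Tuple[str, str]]
) -> Optional[List[Tuple[str, str]]]:
    """Return a shortest list of conversions from source to target, or None."""
    graph: Dict[str, List[str]] = {}
    for src, dst in conversions:
        graph.setdefault(src, []).append(dst)

    came_from: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            break
        for nxt in graph.get(current, []):
            if nxt not in came_from:
                came_from[nxt] = current
                queue.append(nxt)

    if target not in came_from:
        return None

    # Walk back from the target to rebuild the chain in forward order.
    chain: List[Tuple[str, str]] = []
    node = target
    while came_from[node] is not None:
        chain.append((came_from[node], node))
        node = came_from[node]
    chain.reverse()
    return chain


chain = find_chain("word", "abstract", CONVERSIONS)
```

With the catalogue above, a Word document reaches an abstract by going
through plain text, which matches the example in the text.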

As you may expect, a powerful typing mechanism is needed. We are considering
using the Typed Object Model (TOM) as a starting point:

        http://tom.library.upenn.edu/sw/index.html

On top of the conversion infrastructure, a host of plug-ins for editors may
be developed that would allow for seamless import/export in different formats.

Possible Components 
Some existing Open Source Software projects may be integrated into the planned
system.

The p2p infrastructure may be provided by:
        * JXTA project, which allows for the concept of providing "services".

Pre-existing conversion routines which may be entered into the system as
conversions:
        * a2ps, ghostscript, wv, xpdf, gimp, psutils, etc...    
        * openjade, kea, etc...

Interested Parties
Please feel free to contact the authors at:

    pronir-conversion@xxxxxxxxx 

for further information and collaboration possibilities.

-- 
Daniele Medri - http://www.linux.it/~madrid/
fingerprint: 5E13 8A5F 9FE3 6857 BF6B  9EE7 1693 81ED 211F C4C1
"Per molti il giudizio e' un dente da estrarre"
