LINE INPUT from a WORD document

Using TrueBasic Bronze, how can I get LINE INPUT from a text file (a WORD .doc, .rtf, or .txt file) that is already stored on my Windows laptop? Is there a standard procedure? I've searched the Bronze Edition Guide for help, but need MORE. Help!

Here's a sample statement I've been using, without success:

LINE INPUT #2 : LN$

(I suspect the execution isn't recognizing the line length of the input document.)

Comments

Re: LINE INPUT statement ...

m2w ... There's nothing wrong with your program line syntax, but you need to add more program lines to show what you did to open the target DOC file and get ready to assign a line of text to string variable LN$. You need an "OPEN #2: NAME filename.doc$, etc. ... " program line before you can extract a line of text from filename.doc$. Please show us more program lines.

If the target DOC file isn't located in the program folder you need to add the path to the DOC file in the OPEN program line.
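As a rough illustration of the OPEN-then-LINE INPUT sequence described above, here is a minimal sketch that assumes the input is a plain-text file. The path "C:\My Folder\sample.txt" is a placeholder, not a path from this thread, and the options after NAME only spell out what is assumed here (read-only access to an existing text file).

```basic
! Minimal sketch: open an existing plain-text file and print it line by line.
! The path below is a placeholder; substitute the full path to your own file.
OPEN #2: NAME "C:\My Folder\sample.txt", ACCESS INPUT, CREATE OLD, ORGANIZATION TEXT

! MORE #2 stays true while unread lines remain in the file.
DO WHILE MORE #2
   LINE INPUT #2: LN$      ! read one whole line, commas and quotes included
   PRINT LN$
LOOP

CLOSE #2
END
```

Looping on MORE #2 rather than a fixed count avoids running past the end of the file when the number of lines is not known in advance.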

Regards ... Tom M

LINE INPUT

Tom M: Many thanks for the prompt reply. Here's a copy of my last OPEN statement:

OPEN #2 : NAME "C:\Documents and Settings\Mel\My Documents\CONCTXTIN.txt"

I gather that's not enough, right?

Mel

LINE INPUT from WORD doc files

Hi,

Be careful when importing text from .DOC files and RTF files because the text in these files is formatted in a particular way - it is not straightforward plain text. The text is mixed with special characters and other data that tells WORD how to display the text, what font to use and what colors to use. What LINE INPUT reads will be almost incomprehensible.

However, somewhere in the DOC file there is a copy of the text in plain ASCII characters. The secret is to look for this part of the file and to import just that part with LINE INPUT.

Big John

LINE INPUT from WORD

Yes, Big John, that is why my latest attempts have all called for input from .txt files, but my Bronze TrueBasic doesn’t seem to be recognizing any end-of-line character that has been generated by some OTHER word processing program. I’ve also tried statements like “LINE INPUT, using ...” and “INPUT, using ...” to get around that problem, but all they do is generate error msgs during compilation. This is all a “puzzlement,” but I appreciate your interest.

Word Documents

The reason why LINE INPUT doesn't recognize the line endings is because Word doesn't use line endings! The location of each line in the file is stored in a table called a linked list so no line endings are needed. Here's a simplified example:

0
7
22
This isa test documentin word format.

The offsets show the first line begins at relative location 0, the second line at offset 7 and the third line at offset 22. When printed out, the text will be:

This is
a test document
in word format.
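
The offset idea in this simplified example can be sketched in a few lines of True BASIC. This only illustrates the toy table above (offsets 0, 7 and 22), not the real Word file layout; the names doc$, offset, start and finish are made up for the illustration.

```basic
! Sketch of the toy offset table above, NOT the real Word format.
! doc$ holds the unbroken text; offset() holds where each line begins (0-based).
DIM offset(3)
DATA 0, 7, 22
FOR i = 1 TO 3
   READ offset(i)
NEXT i

LET doc$ = "This isa test documentin word format."

FOR i = 1 TO 3
   LET start = offset(i) + 1          ! True BASIC substrings are 1-based
   IF i < 3 THEN
      LET finish = offset(i + 1)      ! a line ends where the next one begins
   ELSE
      LET finish = LEN(doc$)          ! the last line runs to the end of the text
   END IF
   PRINT doc$[start:finish]
NEXT i
END
```

Run as written, this prints the same three lines shown above.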

This is a vastly over-simplified example. To see the entire format of a Word document (it's about 25 HTML pages long!) go here:

http://www.wotsit.org/list.asp?fc=10

Good luck!
Tom Lake

Word Documents

Many thanks, Tom. Your kind note will save me further effort trying to find what isn’t there. What you tell me also reinforces one of my personal pet peeves: the penchant of latter-day programmers for “transparency,” by which they mean hiding their codes and routines and practices from all users who want or need them. Even so, I’m hoping you and other readers of this post can now help me find answers to the following:

1) Is there a Bronze TrueBASIC statement that will let me specify the number of characters to be taken from the one long string of text in a WORD input file? Something like the “PRINT, using...” statement? If so, I can easily edit my input into a standard length line, as I did in my punch card youth.

2) Do WordPerfect files include end-of-line characters that can be recognized by Bronze TrueBASIC? If so, I will arrange to use WordPerfect for my current project’s word processing.

Help!

Importing text from Word Perfect

Hi,

The only sure fire way to get text from WORD and WORD PERFECT is to import the text character by character using a BYTE file.

The reason for doing it character by character is that you can weed out all the special characters related to formatting. As I recall the end of line character in WORD is either chr$(7) or chr$(31).
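
A minimal sketch of the character-by-character approach described above, under stated assumptions: "sample.doc" is a placeholder file name, the end-of-line code in eol is a guess to be adjusted (13 is used here; the codes recalled above are 7 and 31), and everything outside printable ASCII is simply dropped. It is not the code referred to in this post, and it assumes your edition supports byte files.

```basic
! Sketch: read a file one byte at a time and keep only printable ASCII.
! "sample.doc" is a placeholder name; eol is an assumed end-of-line code.
LET eol = 13
OPEN #1: NAME "sample.doc", ACCESS INPUT, CREATE OLD, ORGANIZATION BYTE

LET LN$ = ""
DO WHILE MORE #1
   READ #1, BYTES 1: c$               ! fetch exactly one byte
   LET code = ORD(c$)
   IF code = eol THEN
      PRINT LN$                       ! end of a line: show it and start over
      LET LN$ = ""
   ELSE
      ! keep ordinary characters, weed out formatting bytes
      IF code >= 32 AND code <= 126 THEN LET LN$ = LN$ & c$
   END IF
LOOP
IF LN$ <> "" THEN PRINT LN$           ! flush any text after the last line break

CLOSE #1
END
```

Changing BYTES 1 to a larger count, such as BYTES 80, would read fixed-length chunks instead of single characters, which is one way to cut a long run of text into standard-length lines.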

Send me an e-mail and I will dig out the code for reading and writing to WORD.DOC files. As I recall, I think I looked for the first occurrence of the first line of text, then I ignored that and looked for the second occurrence. This told me where the plain text equivalent of the formatted text begins. (WORD stores two versions of the text - one is formatted and the second is plain text).

Regards
Big John

Re: OPEN #2: ...

Mel ... Right. I'm going to write a short program that completes what you have started. Then I'll post the code. I'll first create a .txt file to use. Regards ... Tom M

LINE INPUT from .DOC Files

Many thanks Tom M and Big John and Tom Lake for your interest and help. Learning from what you’ve told me -- and, I must confess, with a little bit o’ luck -- I found a way this past weekend to convert a .doc file to the sort of “plain text” file that a LINE INPUT statement will work with! How about THEM apples? Here’s what I did:

1) Started a new NOTEPAD file on my Windows laptop. (Start menu -> All Programs -> Accessories -> Notepad.)

2) Opened an old WORD.doc file.

3) Copied a part of it into the new NOTEPAD file. (Edit -> Copy / Edit -> Paste)

4) Used the NOTEPAD file for my TrueBASIC program’s input file, as in the following example:

OPEN #2 : NAME "C:\Documents and Settings\Mel\My Documents\CONCTXTIN3em.txt"
. . . . . .
FOR REP = 1 to 6
LINE INPUT #2 : LN$
DO WHILE ...
. . . . . .
LOOP
NEXT REP
END

Again, many thanks. Hope my discovery proves helpful to you guys, too.

P.S. If any TrueBASIC staff have been following these postings, howzabout describing the above procedure in your next BRONZE Edition Guidebook?