Ganasa

March 20, 2009

When does software become obsolete?

Filed under: Uncategorized — ganasa @ 11:38 pm

I received the following e-mail: 

We have a VERY old, VERY beloved computer that uses the OASIS operating system from 1983.  It will not boot up.  Would it be possible to get a boot disk?

I told them I turned 6 in 1983 and I didn’t have a boot disk, but if I could get my hands on the computer, I would see what I could do. They dropped off the computer, which was branded Integrated Business Computers (IBC). It had a 20MB hard drive and 256K of RAM.  An amber WYSE 50 terminal was plugged into the first of 12 serial ports on the back of the computer.  I turned on the terminal, booted the computer and saw nothing but the amber blinking cursor.

I opened up the cabinet and found a boot disk taped to the cover.  I put it in the floppy drive and restarted the computer.  Nothing…

I went looking for another terminal, thinking that the original might have given up after 25 years.  I plugged in an Esprit 5000 HS.  After setting the serial port rate and emulation mode, I restarted the computer again.  Still nothing…

My next thought was that the hard drive had finally given out.  I decided to look in my father’s basement to see if he had a compatible hard drive.  His basement is like a museum with a lot of very old computer hardware.  Not only did I find a few compatible hard drives, I found 3 identical IBC systems!  I spent an hour or two swapping components until I was greeted with the following:

Oasis 8 version 6.1-S #8-16-05026, 256K bytes

Copyright (c) 1984 Timothy S. Williams
Portions Copyright (c) 1984 IBC
All rights reserved.

Time(HH:MM:SS)

After entering in the time and date, I saw a familiar prompt:

Logon please: SYSTEM
Logon at 20:30:03, on 03/19/89
>

I poked around a little bit and found that all of the data was intact.  The 25-year-old system had been resurrected!  I contacted the owners and let them know of my success.  They were ecstatic about being able to continue using their computer system.

July 8, 2008

Oops

Filed under: Uncategorized — ganasa @ 8:14 pm

I inadvertently wiped out some very important data on one of the MySQL servers that I manage. To make matters worse, this data could not be restored from a backup, because it wasn’t the values at a point in time that were useful, it was the combination of all values over a two week period that was important. This post describes the steps I took to recover the information.

Fortunately, the server is part of a replication chain, so the binary logs were enabled. My thought was to run mysqlbinlog to get a dump of every statement processed by this server and then pass it through grep to extract the queries that modified the relevant data. I would then write a program to convert the queries into CSV format, load the CSV into an OpenOffice spreadsheet and upload the spreadsheet data into the external package that needed it.

Unfortunately, the MySQL server was running on the Windows 2003 platform. Windows doesn’t have a grep utility and I’m more comfortable with a shell prompt than I am with a GUI, so I decided to connect to one of my Linux machines and perform the recovery from there.

# mysql -u root --password -h 10.0.100.4

mysql> show master logs;
+------------------+
| Log_name         |
+------------------+
| mysql-bin.000328 |
| mysql-bin.000329 |
| mysql-bin.000330 |
| mysql-bin.000331 |
| mysql-bin.000332 |
| mysql-bin.000333 |
| mysql-bin.000334 |
| mysql-bin.000335 |
| mysql-bin.000336 |
| mysql-bin.000337 |
| mysql-bin.000338 |
+------------------+
11 rows in set (0.00 sec)

mysql> quit
Bye

The data that I needed to recover was only valid within a certain timeframe, between Sunday 6/22 and Saturday 7/5. I had no idea which of the binary logs contained the correct data. However, I found that mysqlbinlog has two options (--start-datetime and --stop-datetime) that restrict the output to events from the time period I needed.

In addition to the actual queries that were performed, I needed the time that each query was executed. mysqlbinlog decodes the timestamp and prints it on a line before each query as SET TIMESTAMP=Unix_Time/*!*/;. The -B n option to grep prints the n lines before each line that matches the pattern.

Here is the long command string I used to extract the relevant data into a file.

# mysqlbinlog -u root --password -h 10.0.100.4 -R -t \
--start-datetime='2008-06-22 00:00:00' --stop-datetime='2008-07-05 23:59:59'  \
mysql-bin.000328 | grep -B 1 column_name > /tmp/my_data.sql
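
The conversion program itself is not shown in this post. As a rough illustration only, here is a minimal Python sketch of how the output of the command above could be turned into CSV rows. It assumes each match group consists of one SET TIMESTAMP line followed by one single-line query; the table name and values in SAMPLE are invented for the example.

```python
import csv
import io
import re

# Matches the line mysqlbinlog prints before each query, e.g.
# "SET TIMESTAMP=1214114400/*!*/;"
TIMESTAMP_RE = re.compile(r"SET TIMESTAMP=(\d+)")


def pair_timestamps(grep_output):
    """Yield (unix_time, query) pairs from `grep -B 1` output."""
    pending = None
    for raw in grep_output.splitlines():
        line = raw.strip()
        if not line or line == "--":  # grep prints "--" between match groups
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            pending = int(match.group(1))
        elif pending is not None:
            yield pending, line
            pending = None


def to_csv(grep_output):
    """Return the pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["unix_time", "query"])
    writer.writerows(pair_timestamps(grep_output))
    return buffer.getvalue()


# Invented sample input; the real table and column names are not in the post.
SAMPLE = """\
SET TIMESTAMP=1214114400/*!*/;
UPDATE some_table SET column_name = 5 WHERE id = 1
--
SET TIMESTAMP=1214118000/*!*/;
UPDATE some_table SET column_name = 7 WHERE id = 2
"""

if __name__ == "__main__":
    print(to_csv(SAMPLE), end="")
```

Queries that span several lines would need more careful handling than this sketch provides.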

I wrote a simple program to convert the my_data.sql file into a CSV file, which I then opened with OpenOffice. Now I had to convert the Unix_Time column to a human-readable timestamp. I found that there was not a built-in function to handle this conversion, so I began experimenting with a formula of my own. The formula was actually pretty simple. I inserted a new column in the spreadsheet and added =(A1 - 6*3600)/86400 + DATE(1970;1;1). 86400 is the number of seconds in a day, so dividing by it converts seconds into days, which is the unit OpenOffice uses for dates. DATE(1970;1;1) is the Unix epoch. Subtracting 6*3600 seconds adjusts for the timezone. I am in the Mountain timezone and am in a state that observes daylight saving time. Therefore, I am currently 6 hours behind UTC. I then had to format the column as a date and it correctly showed me the date/time that the query was initially executed. I performed some additional sorts and subtotals on the data until it was ready to upload.
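
The spreadsheet conversion can be cross-checked outside OpenOffice. The following is a minimal Python sketch, assuming a fixed UTC-6 offset as described above (timestamps from outside the daylight saving period would need UTC-7 instead). The function names are my own and are not part of the original program.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86400
# Fixed offset for Mountain Daylight Time, matching the 6 hours in the post.
MDT = timezone(timedelta(hours=-6), "MDT")


def unix_to_local(unix_time, tz=MDT):
    """Convert a Unix timestamp to a timezone-aware local datetime."""
    return datetime.fromtimestamp(unix_time, tz)


def local_days_since_epoch(unix_time, offset_hours=-6):
    """Spreadsheet-style value: days (with fraction) since 1970-01-01, local time."""
    return (unix_time + offset_hours * 3600) / SECONDS_PER_DAY


if __name__ == "__main__":
    stamp = 1214114400
    print(unix_to_local(stamp).isoformat())
    print(local_days_since_epoch(stamp))
```

The timestamp 1214114400 corresponds to midnight on 2008-06-22 at UTC-6, which is the start of the recovery window.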

Luckily, I was able to recover the data using mysqlbinlog, grep and OpenOffice. Without these tools, I would have been in a lot of trouble because the data could not have been recovered and I probably would have been fired.

June 28, 2008

Creating, transferring and restoring a TBACKUP file

Filed under: Uncategorized — ganasa @ 8:57 pm

As part of the project I am working on, I needed to back up all files from a private account on one computer and move them to a different private account on another computer.  I thought this would be fairly simple. Little did I know that it would be tedious and take much longer than it should have.

On the source machine, I typed the following to create the backup file.

Source-ACCT_ID3>TBACKUP A A_Drive.tbk (BACKUP

This created a 2.5GB TBK file. I made this file accessible via FTP on the source machine and connected to the destination machine to transfer the file.

Destination-ACCT_ID4>NET RECEIVE FTP://username@Source/A_Drive.tbk

The file transfer box informed me that it would take about an hour to complete. I checked back in an hour and found the following on my screen:

┌────────────────────────────────────────┐
│NET FTP Download                        │
├────────────────────────────────────────┤
│ File:                                  │
│ Transferred: 919.36 MB of 2589.71 MB   │
│ Remaining: 52 min, 57 sec              │
│ Rate: 540 KBps                         │
│ ┌────────────────────────────────────┐ │
│ │████████████    35%                 │ │
│ └────────────────────────────────────┘ │
│              ┌──────────┐              │
│              │  Cancel  │              │
│              └──────────┘              │
└────────────────────────────────────────┘

I watched it for about a minute to see if there was any progress, but there wasn’t. I opened another session on the destination computer to see if it was still functioning, and it was. I typed NET SOCK to check the status of any network sockets.

Nbr│OpCode│Local         │    Remote     │ Port│IState   │OState │Pid│   IN│  OUT│Retry
───┼──────┼──────────────┼───────────────┼─────┼─────────┼───────┼───┼─────┼─────┼──────
  6│      │62148         │10.0.100.2     │   21│ESTABLISH│IDLE   │203│   28│    0│    0
  7│RECVFR│62149         │10.0.100.2     │ 5021│ESTABLISH│IDLE   │203│    0│    0│    0

This shows the FTP command socket (number 6) and also the FTP data socket (number 7). It seemed strange that the command socket had 28 bytes in the input buffer.

I connected to the source machine to see if it had locked up. It hadn’t. I typed NET SOCK to check the status of any network sockets.

Nbr│OpCode│Local         │    Remote     │ Port│IState   │OState │Pid│   IN│  OUT│Retry
───┼──────┼──────────────┼───────────────┼─────┼─────────┼───────┼───┼─────┼─────┼──────
  6│RECVFR│   21=ftp     │10.10.3.115    │62148│ESTABLISH│IDLE   │200│    0│    0│    0

I immediately saw the problem. The data socket did not exist! Performing a SHOW USER showed that the PID serving the request was waiting for input.

 Id │Account │Program │Status / Info   │User    │Terminal│IP
────┼────────┼────────┼────────────────┼────────┼────────┼───────────────
200*│FTP10623│FTP     │I  TCPIP:13ABC  │        │        │10.10.3.115

I have no idea as to why the source machine decided to close the data socket. My only choice was to abandon the transfer and try again. I restarted it by pressing Break+Q (to abort the job), Page Up (to recall the prior command from the command line history) and Enter to execute the FTP command. Luckily, it continued from where it left off instead of starting over.

This time I watched the transmission. It failed again.

┌────────────────────────────────────────┐
│NET FTP Download                        │
├────────────────────────────────────────┤
│ File:                                  │
│ Transferred: 1801.94 MB of 2589.71 MB  │
│ Remaining: 6 min, 6 sec                │
│ Rate: 2.16 MBps                        │
│ ┌────────────────────────────────────┐ │
│ │████████████████69%██████           │ │
│ └────────────────────────────────────┘ │
│              ┌──────────┐              │
│              │  Cancel  │              │
│              └──────────┘              │
└────────────────────────────────────────┘

I typed NET SOCK again on both machines. The destination machine showed results similar to before; the only difference was that the command socket did not have the 28 bytes in the input buffer. The difference on the source machine was that it now contained a data socket!

Nbr│OpCode│Local         │    Remote     │ Port│IState   │OState │Pid│   IN│  OUT│Retry
───┼──────┼──────────────┼───────────────┼─────┼─────────┼───────┼───┼─────┼─────┼──────
  6│      │   21=ftp     │10.10.3.115    │62150│ESTABLISH│IDLE   │200│    0│    0│    0
 10│SENDTO│ 5021=ftp-dta │10.10.3.115    │62151│ESTABLISH│RETRANS│200│    0│17520│   14

Apparently it was attempting to send 17520 bytes and was on the 14th retry. Shortly after I copied the above to my clipboard, the data socket disappeared and the 28 bytes appeared in the input buffer on the destination machine.

My guess is that something happened on the destination machine which caused packets from the data connection to not be received. The source machine attempted to resend the unacknowledged packet. After about 20 attempts, the source machine closed the data connection and sent a notification on the command socket that the connection had been dropped (the 28 bytes). The destination machine did not consume the 28 bytes which is why they were still waiting in the input buffer.

I restarted the transmission again and this time it finished successfully.
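
The manual restart-and-resume cycle described above can be automated on systems where a scripting language is available. This is only a sketch in Python using the standard ftplib module, not something that runs on THEOS. The function name download_with_resume is hypothetical, and the sketch assumes the FTP command connection stays usable after the data connection drops, as it did here.

```python
import ftplib
import os


def download_with_resume(ftp, remote_name, local_path, max_attempts=5):
    """Download remote_name, resuming from the bytes already on disk.

    Returns the number of attempts that were needed.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        # Whatever is already on disk is the resume offset.
        offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
        try:
            with open(local_path, "ab") as out:
                # A non-empty rest value makes ftplib send REST before RETR.
                ftp.retrbinary("RETR " + remote_name, out.write,
                               rest=offset or None)
            return attempt
        except ftplib.all_errors as exc:
            last_error = exc  # partial data stays on disk for the next try
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_error
```

A stalled transfer like the one above raises no error on its own, so a real client would probably also set a socket timeout (the timeout argument of ftplib.FTP) and reconnect before retrying.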

When the transfer finally finished, I attempted to restore the archive with the following

TBACKUP A_Drive.tbk (RESTORE REPLACE NOQ

Which gave me the following error.

┌───────────────────────────┐
│TBACKUP                    │
├───────────────────────────┤
│                           │
│ Source Account not found. │
│                           │
│       ┌──────────┐        │
│       │    OK    │        │
│       └──────────┘        │
└───────────────────────────┘

I realized that I failed to include the option to specify which account to restore. I tried again

TBACKUP A_Drive.tbk (RESTORE REPLACE NOQ FROM ACCT_ID3

but received the same error. Perhaps the FROM option only worked in conjunction with the VOLUME option and I needed to actually be logged on to the account ACCT_ID3 in order to restore the data. The problem is that ACCT_ID3 already had a different data set on it that could not be replaced. I decided to change ACCT_ID3 to be a synonym to the account I was trying to restore to, ACCT_ID4. I typed ACCOUNT, selected ACCT_ID3 in the list and clicked the Modify button, only to find that the Alias to: and Id: boxes were grayed out and didn’t allow modifications.

My next idea was to manually edit the Account.bin file. WW /Theos/Config/Account.bin opened the file as read only. I had to remove the write protections on the file before I could manually edit it.

CHANGE /Theos/Config/Account.bin (nrnwnxnenmnh shared nrnw noq

WW /Theos/Config/Account.bin

I found ACCT_ID3 and changed ID=S3 to ID=S4. I then logged on as ACCT_ID3, and was able to successfully restore the TBACKUP file.

June 24, 2008

C2BasicL() vs _c2b_str() & BasicL2C() vs _b2c_str()

Filed under: Uncategorized — ganasa @ 6:49 pm

These functions are a way of converting strings between BASIC and C in the THEOS Operating System. These functions are required when calling external C functions from within BASIC.

I use these functions so infrequently that I usually forget what the differences between them are.  I inadvertently end up using the wrong function in the wrong place, which leads to a lot of debugging time.  In order to save time and frustration while debugging I thought I would write about the differences, so the next time I am forgetful I can just search my blog and see which function I need to use.

These two functions are part of the Standard THEOS C library and can be used by including <string.h>.

extern char * BasicL2C(char *cstr, const void *bstr);
extern char * C2BasicL(void *bstr, const char *cstr);

These two function prototypes must be declared by the programmer:

extern char * _b2c_str(char **cstr, void *bstr);
extern void * _c2b_str(void **bstr, char *cstr);

void can be replaced with BASIC_STRING after the following typedef struct is defined.

typedef struct basic_string {
  unsigned long len;
  char str[];
} BASIC_STRING;

The primary difference between these functions is that the two included in the C library (C2BasicL() and BasicL2C()) do not allocate memory, whereas _b2c_str() and _c2b_str() automatically allocate and free the necessary memory.

When using BasicL2C(), cstr must point to a buffer large enough to contain the contents of bstr + 1 for the NUL terminator. When using C2BasicL(), bstr must point to a buffer large enough to contain the contents of cstr + 4 for the len code. If allocating memory for the new bstr, remember to free() anything that is already being pointed to. Also, if the string is being returned to BASIC via ADDROF(), assign the result of malloc() through the dereferenced pointer (*bstr) so that the reference BASIC holds is updated rather than lost.

free(*bstr);
(*bstr) = malloc(sizeof(long) + strlen(cstr));
C2BasicL(*bstr, cstr);
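
To make the two buffer-size rules concrete, here is a small Python sketch of the memory layout implied by the BASIC_STRING struct: a 4-byte length followed by the characters, with no terminator. It only models the bytes and does not call the THEOS functions. The little-endian byte order is an assumption on my part, and the function names are made up for the illustration.

```python
import struct

LEN_FORMAT = "<L"                       # assumed: 4-byte little-endian length
LEN_SIZE = struct.calcsize(LEN_FORMAT)  # the "+ 4 for the len code"


def c_to_basic(cstr):
    """NUL-terminated bytes -> length-prefixed bytes (the shape C2BasicL() fills in)."""
    text = cstr.split(b"\0", 1)[0]       # drop the terminator and anything after it
    return struct.pack(LEN_FORMAT, len(text)) + text


def basic_to_c(bstr):
    """Length-prefixed bytes -> NUL-terminated bytes (the shape BasicL2C() fills in)."""
    (length,) = struct.unpack_from(LEN_FORMAT, bstr)
    return bstr[LEN_SIZE:LEN_SIZE + length] + b"\0"


if __name__ == "__main__":
    packed = c_to_basic(b"HELLO\0")
    print(len(packed))              # characters + 4
    print(len(basic_to_c(packed)))  # characters + 1
```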

Here is an example of assigning data directly from a C array to a BASIC array.

for (ushort i=0; i<bstr_array_length; ++i) {
  free(bstr_array[i]);  /* release whatever the BASIC element pointed to */
  bstr_array[i] = malloc(sizeof(long) + strlen(cstr_array[i]));  /* len code + characters */
  C2BasicL(bstr_array[i], cstr_array[i]);
}

Hopefully this post will help me remember the differences between these functions so the next time I use them I will not have to spend any additional time debugging.

June 17, 2008

Changing Break+Q to a Custom Handler

Filed under: Uncategorized — ganasa @ 4:07 pm

This post will describe a problem I resolved while using the THEOS MultiUser Basic FTP Toolkit to transmit a local file to a remote FTP server. 

Occasionally, while using ftp.send(), the session will stop responding.  The only way I have found to recover from the task is to press Break+Q.  This is not a good solution, because every time the user presses Break+Q, they are completely disconnected from the system due to a LOGON.EXEC similar to the following:

  &CONTROL OFF
  MENU
  EXIT

My thought was that before the call to ftp.send(), I would change the Break+Q handler. If Break+Q was pressed, the handler would abort the current transmission and the program would try sending the file again. When ftp.send() returned, I would set the Break+Q handler back to the default.

Before this idea could be implemented, I needed to create a BASIC wrapper for the C function signal(). By convention, whenever I create a wrapper for a C function, I add C_ as a prefix to the function name that is accessible from within BASIC.

#include <signal.h>

/*
 * C_ON_BREAK_Q_SET() -- accepts a BASIC function pointer for the Break+Q interrupt handler
 * DECLARE CALL c.on.break.q.set(ADDROF( SUB BASIC.SUB.PROGRAM))
 */
short C_ON_BREAK_Q_SET(void (*func)()) {
        signal(SIGQUIT, func);
        return 0;
}

/*
 * C_ON_BREAK_Q_RESET() -- resets the Break+Q handler to the default
 * DECLARE CALL c.on.break.q.reset()
 */
short C_ON_BREAK_Q_RESET() {
        signal(SIGQUIT, SIG_DFL);
        return 0;
}

With these C functions, I can now change the Break+Q handler from within BASIC by calling C.ON.BREAK.Q.SET(ADDROF(SUB handle.break.q)) and reset the handler with C.ON.BREAK.Q.RESET. The changes to the code were fairly simple.

CALL ftp.send(local.file.name$, remote.file.name$)

became

CALL ftp.send.with.handler(local.file.name$, remote.file.name$)

I would have liked to pass the handler SUB program as a function pointer to ftp.send.with.handler(), but there is no way to receive a function pointer as a parameter in BASIC, so I had to hard code the handler handle.break.q into the CALL below.

SUB ftp.send.with.handler(local.file.name$, remote.file.name$)
     CALL c.on.break.q.set(ADDROF( SUB handle.break.q))
     CALL ftp.send(local.file.name$, remote.file.name$)
     CALL c.on.break.q.reset
END SUB

Another problem I encountered was caused by the fact that the new Break+Q handler is a SUB program, which means data is placed on the stack. Aborting at this point is easy: you just end the application. Trying the transmission again is difficult because I am in a SUB program and the stack is not cleaned up if I jump to another section of the program. I decided to make the handler SUB program very simple. All it does is throw custom exception #1001.

SUB handle.break.q
     LET ERR = 1001
END SUB

The execution is then passed to the sub-routine defined by the most recent ON ERROR GOTO statement

ON ERROR GOTO exception.handler
...
start.transmission:
     CALL ftp.open(...)
...
exception.handler:
     IF ERR = 1001
          CALL bwmsgbox(...)
          IF response% = wmsgbox.cancel% then END
          IF response% = wmsgbox.retry%
               CALL ftp.close
               RESUME start.transmission
          IFEND
     IFEND
...

The ability to change the behavior of Break+Q from within MultiUser BASIC applications will be very useful, especially when the session appears to be locked, as is the case with the ftp.send() problem that I experienced.

June 16, 2008

THEOS IPP Client

Filed under: Uncategorized — ganasa @ 3:30 pm

We are having a very difficult time with spooler lockups in THEOS Corona.  I had an idea earlier this week (described here) which would allow print jobs generated from within THEOS to be sent to a Linux-based CUPS server using an IPP client that I would write. This post will describe version 0.1 of the IPP client.

IPP is transported over the Hypertext Transfer Protocol (HTTP), so I wrote a few SUB programs to prepare the request and handle the response. The THEOS MultiUser BASIC source code for the prepare.request() SUB program is listed below.

SUB prepare.request(ADDROF(request$),ADDROF(headers$),content$,host$,path$)
     crlf$ = CHR$(13)&CHR$(10)
     content.length = LEN(content$)

     headers$ = "POST "&path$&" HTTP/1.0"&crlf$
     headers$ = headers$&"Host: "&host$&crlf$
     headers$ = headers$&"User-agent: THEOS IPP Client v0.1"&crlf$
     headers$ = headers$&"Content-Type: application/ipp"&crlf$
     headers$ = headers$&"Content-Length: "&STR$(content.length)&crlf$
     headers$ = headers$&crlf$

     request$ = headers$&content$
END SUB
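
For comparison, here is roughly the same request builder as a Python sketch. It is not part of the THEOS client; it simply mirrors the headers above and shows that Content-Length is the byte length of the IPP body. The function name prepare_request is borrowed from the SUB program for readability.

```python
def prepare_request(content, host, path):
    """Build an HTTP/1.0 POST carrying an IPP message as raw bytes."""
    headers = (
        "POST {} HTTP/1.0\r\n"
        "Host: {}\r\n"
        "User-Agent: THEOS IPP Client v0.1\r\n"
        "Content-Type: application/ipp\r\n"
        "Content-Length: {}\r\n"
        "\r\n"  # blank line separates the headers from the body
    ).format(path, host, len(content))
    return headers.encode("ascii") + content


if __name__ == "__main__":
    print(prepare_request(b"\x01\x01", "10.10.114.71", "/printers/prt1"))
```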

Now that the transport was taken care of, the next step was to create the content. RFC 2910 contains the following encoding table:

   -----------------------------------------------
   |                  version-number             |   2 bytes  - required
   -----------------------------------------------
   |               operation-id (request)        |
   |                      or                     |   2 bytes  - required
   |               status-code (response)        |
   -----------------------------------------------
   |                   request-id                |   4 bytes  - required
   -----------------------------------------------
   |                 attribute-group             |   n bytes - 0 or more
   -----------------------------------------------
   |              end-of-attributes-tag          |   1 byte   - required
   -----------------------------------------------
   |                     data                    |   q bytes  - optional
   -----------------------------------------------

As this program is pre-alpha, I just hard coded in the values for sending a print job.

Octets     Symbolic Value             Protocol field
0x0101     1.1                        version-number
0x0002     Print-Job                  operation-id (request)
0x00000001 1                          request-id
0x03       end-of-attributes          end-of-attributes-tag
0x1B28.... <PCL encoded print job>    data from THEOS Spooler

Here it is in MultiUser BASIC syntax

DEF fn.generate.content$(id)
     LOCAL s$, content$

     content$ = CHR$(1)&CHR$(1)                             ! version (1.1)
     content$ = content$&CHR$(0)&CHR$(2)                    ! operation ID (Print-Job)
     content$ = content$&CHR$(0)&CHR$(0)&CHR$(0)&CHR$(1)    ! request ID (1)
     content$ = content$&CHR$(3)                            ! end of attributes-tag

     CALL file.load("/Theos/Spooler/System.Spooler.F"&FORMAT$(id,"99999"),ADDROF(s$))
     content$ = content$&s$                                 ! data from THEOS spooler

     fn.generate.content$ = content$
FNEND
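
The same byte layout can be expressed compactly with Python's struct module. This is a sketch for checking the encoding and is not the THEOS code. It mirrors the minimal message above, with no attribute groups, although RFC 2911 lists operation attributes such as attributes-charset as required, so a stricter IPP server may reject it.

```python
import struct

PRINT_JOB = 0x0002          # operation-id for Print-Job
END_OF_ATTRIBUTES = 0x03    # end-of-attributes-tag


def generate_content(document, request_id=1):
    """Return an IPP 1.1 Print-Job message with no attribute groups."""
    # ">BBHI": big-endian major version, minor version, operation-id, request-id
    header = struct.pack(">BBHI", 1, 1, PRINT_JOB, request_id)
    return header + bytes([END_OF_ATTRIBUTES]) + document


if __name__ == "__main__":
    print(generate_content(b"\x1b(").hex())
```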

I now had enough to send a job to the CUPS server and see if I received a printout.

     host$ = "10.10.114.71"
     path$ = "/printers/prt1"
     port% = 631
     content$ = fn.generate.content$(1) ! THEOS spooler ID

     CALL open.connection(ADDROF(socket%), host$, port%)
     CALL prepare.request(ADDROF(request$), ADDROF(headers$), content$, host$, path$)
     CALL send.request(socket%, request$)
     CALL receive.response(socket%, ADDROF(response$))
     CALL handle.response(response$)
     CALL close.connection(socket%)

Sure enough, this sent a print job from THEOS to the Linux-based CUPS server, which in turn sent the print job to the printer. This was very encouraging. I will work on completing this project next weekend.

June 14, 2008

A Virus?

Filed under: Uncategorized — ganasa @ 9:40 pm

I recently installed pfsense as a new firewall/router at work.  I arrived at work early the morning after installation so I could ensure that everything was functioning properly.  Around 8:30 I noticed that the Internet became very slow and unresponsive.

I connected to the pfsense machine via SSH and ran pftop, which shows statistics on the current states.  I pressed B to sort by descending bytes.  The VPN connections were listed at the top of the list. This made sense to me, as they would negotiate an ESP connection and all subsequent connections would be sent over this link.

I also saw many connections from the same source IP going to different IPs on non-standard ports.  I copied the destination IPs and pasted them into another SSH window in order to perform an nslookup and traceroute.  Most of these IPs didn’t have reverse DNS configured, so nslookup was of little value.  Traceroute also was of little value because the destinations seemed to be all over the country with different ISPs.  My thought was that the machine initiating all of these connections had a virus and was sending / receiving data at maximum speed, which consumed all of our bandwidth.  I used the port number from pftop to kill these states.  Unfortunately, I could not kill them fast enough.  As soon as I would kill one state, another one would start up.

The image below shows the link utilization for one of our T1s during the time this was occurring.  You can see that starting at about 8:20, the link was fully utilized until around 9:20, which is when I started killing the states.  About 9:50 I brought down the Internet in order to set up a filter in pf which would block the affected IP from consuming the bandwidth.  I added the line block quick from 10.10.1.19 to the file /etc/pf.conf, which blocked all traffic directed to the Internet from the infected machine.

[Image: WAN2 traffic graph, 16 hours]

As you can see from the graph above, the remainder of the day did not experience any problems.  Diagnosing this problem without the tools that pfsense provides would have been extremely difficult.  So far I have not found anything that I dislike about pfsense.

A New Firewall

Filed under: Uncategorized — ganasa @ 8:53 pm

Currently we are using a Linksys RV016 10/100 16-port VPN Router as our corporate firewall. One advantage of this router is that it allows multiple WAN ports and can be configured to load balance across all outgoing links.  One of the disadvantages of this device is that you can’t examine traffic or see the link utilization.

Overall, this has been a good router / firewall for us, but our network utilization now exceeds the capacity of this device.  Over the past few weeks, we have been experiencing network latency and packet loss.  It is now getting to the point where it is difficult for people to get their jobs done, so I am going to switch out this router with a new device.

I have used OpenBSD since version 2.4.  I really like the flexibility of pf, the OpenBSD packet filter, which I use at home.  I have tried using it at work a few times, but nobody else I work with likes the TUI because it is easy to inadvertently mess up the packet filtering rules.  I did some searching on the web and found pfsense, which runs on FreeBSD and provides a web-based GUI for administrative tasks.

I installed pfsense on a machine with 5 NICs (4 WAN ports, 1 LAN port).  After installing it to a hard drive, it would not boot the kernel.  To correct this problem I went into the BIOS and changed the HD setting from Auto to LBA.

I set up IPSec for the 14 offsite VPNs that connect into the main network.  This setup was pretty straightforward.  I also set up a load balancer, following the documentation found here.  Hopefully everything will go well tomorrow and the network latency and packet loss problems will disappear as a result of having a more robust Internet router.

June 10, 2008

THEOS Print Server

Filed under: Uncategorized — Tags: — ganasa @ 9:29 pm

One of the requirements of a project that I’m working on is to have two THEOS Corona servers share a common set of printers.  Currently, there are 42 printers attached to the Main server.  The printers range from high-speed laser printers to thermal label printers and even one dot-matrix printer.  The interfaces of these printers also vary.  Some are serial printers, others are network printers, and a few are Windows-only printers.

About half of the system reboots that occur on the main server are initiated by a sysadmin in order to resolve a stuck printer or stuck spooler queue.  This leads to a lot of unnecessary downtime in order to resolve printer problems.  My hope is that I will be able to set up a dedicated THEOS Print Server that will handle all of the communication between THEOS and the printers.  This would mean the user-loaded systems would no longer have to be restarted in order to resolve stuck print jobs.

My initial plan was to use the LPR/LPD protocols to communicate between the THEOS servers.  I would enable LPD on the Print Server and use LPR on the user-loaded systems to send the print jobs over the network.  I found that this worked, with one problem.  An extra page would be printed at the end of every print job.  I sent an e-mail to THEOS Support, thinking that this was a bug.  I received a response telling me that this was the way it was supposed to work.  They suggested changing the 1000+ application programs to use the option LOCK on all of the OPEN statements in order to resolve the extra page problem.  This is not a feasible option, so I tried another approach.

My next approach was to use the proprietary PrtNet protocol to communicate between the THEOS servers.  On the user-loaded system I found it easier to edit the config file directly (/Theos/Config/PrtNet.cfg) instead of using the SETUP PRTNET CLIENT application.  The entries looked like this:

[PRTNET01]
SERVER=10.0.100.2
PORT=3333
PRT=PRT1
OPTIONS=1
SECOND_COLOR=7   

[PRTNET02]
SERVER=10.0.100.2
PORT=3333
PRT=PRT2
OPTIONS=1
SECOND_COLOR=7 

... 

[PRTNET42]
SERVER=10.0.100.2
PORT=3333
PRT=PRT42
OPTIONS=1
SECOND_COLOR=7
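
With 42 nearly identical sections, the file can also be generated instead of typed. The following is a minimal Python sketch, run on any machine with Python rather than on THEOS. The OPTIONS and SECOND_COLOR values are copied from the entries above without any claim about what they mean, and the function name prtnet_entries is hypothetical.

```python
def prtnet_entries(server, port, count):
    """Return PrtNet.cfg text with one [PRTNETnn] section per printer."""
    sections = []
    for n in range(1, count + 1):
        sections.append(
            "[PRTNET{:02d}]\n"
            "SERVER={}\n"
            "PORT={}\n"
            "PRT=PRT{}\n"
            "OPTIONS=1\n"
            "SECOND_COLOR=7\n".format(n, server, port, n)
        )
    # A blank line between sections, as in the hand-written file.
    return "\n".join(sections)


if __name__ == "__main__":
    print(prtnet_entries("10.0.100.2", 3333, 42), end="")
```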

After manually modifying the config file, I had to go back into SETUP PRTNET CLIENT, which added the UCB entries to /Theos/Config/Devnames.txt.  I rebooted in order for the changes to take effect.  When the system came up, I ran some tests and I could now print across the network from the user-loaded systems to the print server.

Mission Accomplished!

 

–edit–

Actually, let me retract the “Mission Accomplished” statement.  The setup above worked great while testing, but when the users started sending print jobs to the print server machine, the spoolers on the user systems began to lock up.  The only way to resolve the issue was to reboot the user-loaded servers, which is what I was trying to avoid by moving to a dedicated print server!  To make matters worse, I also had to reboot the print server.  While the print server was restarting, print jobs from the other user server would get stuck and I would then have to restart that server in order to get jobs flowing again.

I decided to try another approach.  My thought was to start the THEOS spooler, but disable all of the printers.  This would allow it to accept print jobs without doing anything with them.  They would just wait in the queue.

I would then write an Internet Printing Protocol (IPP) client that would send the job (over HTTP) to a Linux server running the Common Unix Printing System (CUPS).  Once the CUPS server had successfully accepted the print job, the IPP client would remove the entry from the THEOS print queue.

Unfortunately, I will not have the uninterrupted time needed to work on the IPP client while at work this week, so I will go in to work on the weekend when I am the only one there and whip out the IPP client for THEOS.  I hope it works, as the users are getting very frustrated with not being able to print and I am getting frustrated at having to continually reboot all 3 servers in order to get print jobs to successfully flow.

In the meantime, I’ll be busy reading RFC 2910 and RFC 2911.

Theo+Fax

Filed under: Uncategorized — Tags: — ganasa @ 8:42 pm

I have been working on a new project to take one large THEOS system and split it in to two smaller systems. 

The large system has THEO+Fax installed with three MultiTech modems connected to the server via Digi AccelePort C/X Concentrators.

I removed one of the modems off of the large server and installed it on the smaller server using the first serial port.  I configured the fax server software and everything appeared to be working correctly.

I decided to send a few faxes to myself in order to test.  One was a 5 page invoice with text overlaid on top of a background image.  The fax server reported that it was successfully delivered.  However, when I walked over to my fax machine, I only found 1.5 pages of the invoice.  The other 3.5 pages were missing.  I went to the fax server and resent the fax, paying closer attention to the progression of the fax.  I noticed that after 3 pages, the status went from Sending/Page3 to Sent.  Sure enough, when I walked over to my fax machine, only 3 pages were waiting for me.

I thought that perhaps the fax was too complex, so I ran another test with a three-page purchase order without any image overlays.  This time I received a status of Error/Could not send.  I resent the fax and was told by the software that it went through.  After walking over to my fax machine, I found that I had only received 1 page, yet the fax server now reported it had been Sent.

I checked the fax modem configuration between the two servers and they were identical. Why would one server work fine, and moving the same modem to the other server cause incomplete faxes and invalid response codes?

One idea I had for a workaround was to enable PrtNet (a proprietary THEOS product that allows printers to be shared over the network) on the Main server and use the Virtual Fax Printer interface to share the fax server between two machines. On the main server, PRT9 was attached in SYSGEN to the device FAXPRT. On the second server, I attached PRT9 to NETPRT9, which pointed to the main server fax printer.

I found that this mostly worked, but it always faxed a blank page at the beginning of the transmission. Another disadvantage is that I would have to change all fax related applications to use the virtual fax printer interface with embedded +FAX+ commands in the document. The current applications were mixed between the MultiUser BASIC FAX API (/Programs/Basic/Include/FAXINC.BASIC) and using the FAX command directly from a SYSTEM call.

Before spending the time to change all of the application programs to use the new interface, I inspected the configuration again, just to see if I was missing something. On the Modem configuration screen, there are two text boxes labeled Init and Reset. They were blank on the working main server. I decided to enter values in these fields on the new server to see if it would start working.

I remember reading somewhere that the maximum speed of any fax transmission was 14.4Kbps, so I added $MB14400 to the Init field. The lights on the modem indicated that it was set to a baud rate of 19.2Kbps, so I also added $SB19200 to the init string. The exact text of the init string entered was AT$MB14400$SB19200. I also decided to enter ATZ as the reset string.

After saving these changes and restarting the fax server, I sent some additional tests. They went through without any problems. I noticed that the SETUP FAX application changed the init string and reset string to both contain $MB14400$SB19200 which seemed kind of strange, but it was now faxing consistently without any transmission errors.

An additional side effect was an increase in CPU utilization on the new server with the single fax modem plugged directly in to the COM port. Each time a fax would send, the process associated with that fax would take nearly all available CPU cycles. CPU utilization was minimal on the existing server when sending faxes. My assumption as to the cause of the increase in CPU utilization is that the existing system offloads most of the processing to the intelligent Digi I/O controller, whereas the system using the COM port directly had to manage the flow of the data from the CPU. I’ll keep my eye on the CPU utilization and if it becomes a problem, I’ll recommend that additional Digi products be purchased for the new server.
