myinterfacesuck-0.5
--------------------
---
SHORT:
This proggy is mainly meant for buggy network (Ethernet) drivers and/or
broken hardware: it transparently reinitializes the network interface after
the interface has locked up.
---
HOW:
When my 3c589D NIC locks up (usually during long uploads at full speed), the
TCP/IP kernel does not notice it at all. It reports no errors; it simply
looks as if there are no hosts around, so connections time out. I still
don't know what causes this weird behaviour, but I have read somewhere that
the NIC I have is somewhat problematic and that even Linux people have
similar problems with it. My proggy keeps such "events" from lasting long.
At every user-defined interval it sends an ICMP echo request of about 4 KB
to the non-forwardable broadcast address 255.255.255.255 (small packets,
like 64 Bytes, would not use up free requests fast enough). This makes the
device requests run out: a locked-up device cannot free them, so once all
free requests are allocated the kernel sets the error to 55 (ENOBUFS, No
buffer space available), and this is when reinitialization kicks in. First
it brings down the symbolic interface, then takes the device driver offline
and puts it back online (this acts like a reset and cures the hardware),
and as the last step brings the interface up and running again. All active
transfers should resume shortly without any reconnections, provided the
requests are freed quickly enough, i.e. before the connections time out.
If you know how to detect interface lockups without sending anything over
the network then please contact me asap!
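The probe described above can be sketched in C. This snippet only builds the
packet: an ICMP echo request of 4000 bytes (the size used since 0.5) with a
standard RFC 1071 checksum. The names `build_echo_request()`,
`inet_checksum()` and `MIFS_PACKET_LEN` are hypothetical; this is not the code
from 'srcsrcsrc/'. Sending is left out because raw ICMP sockets usually need
special privileges; presumably the packet goes out with `sendto()` to
255.255.255.255 and the program inspects the error code when that call fails.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Probe size in bytes (ICMP header + payload); 4000 as in release 0.5.
 * The name MIFS_PACKET_LEN is hypothetical. */
#define MIFS_PACKET_LEN 4000

/* RFC 1071 Internet checksum: ones' complement of the ones' complement
 * sum of all 16-bit big-endian words in data[0..len). */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)                       /* odd length: pad last byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill buf[0..len) with an ICMP echo request (type 8, code 0) carrying
 * a zero payload; returns len, or 0 if buf cannot hold the 8-byte header. */
static size_t build_echo_request(uint8_t *buf, size_t len,
                                 uint16_t id, uint16_t seq)
{
    uint16_t csum;

    if (len < 8)
        return 0;
    memset(buf, 0, len);               /* payload content is irrelevant, only size matters */
    buf[0] = 8;                        /* type: echo request */
    buf[1] = 0;                        /* code */
    buf[4] = (uint8_t)(id >> 8);       /* identifier, network byte order */
    buf[5] = (uint8_t)id;
    buf[6] = (uint8_t)(seq >> 8);      /* sequence number, network byte order */
    buf[7] = (uint8_t)seq;
    csum = inet_checksum(buf, len);    /* computed while checksum field is 0 */
    buf[2] = (uint8_t)(csum >> 8);
    buf[3] = (uint8_t)csum;
    return len;
}
```

Running `inet_checksum()` over the finished packet yields 0, which is the
usual receiver-side check that the checksum was filled in correctly.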
---
NEWS:
[27-Feb-2008] 0.5
-------------------
Reduced the packet to 4000 Bytes, because some Windows firewalls
treat larger packets as abnormal and try to log that traffic.
Changes in the Ctrl-C handler.
[14-Jan-2008] 0.4
-------------------
Did more tests and it turned out that the best results are achieved
with 4096 Bytes (I tried going lower, but this is the absolute
minimum). Some other small changes. Removed 0.1 completely.
[14-Jan-2008] 0.3
-------------------
Lowered packet length to 1480 Bytes, removed bloated code from the ICMP
part and fixed some minor issues. Removed the executable of 0.1.
[12-Jan-2008] 0.2
-------------------
This version adds some nice controls through ARexx; 0.1 is still in
'srcsrcsrc/'. I had a choice between 'getenv()' and ARexx messaging,
and picked the latter. If you open a shell and type:
> rx "address MIFS1;pause"
or
> rx "address MIFS1;'PAUSE'"
then it will stop sending anything over the socket, while
> rx "address MIFS1;type_here_whatever_you_want"
will resume sending the ICMP echoes; the command can be anything except
'PAUSE'. If you switch between gateways on different interfaces from
time to time, this will surely be very helpful.
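The PAUSE rule above is simple enough to show as a C sketch. The struct and
function names are hypothetical, not taken from the real source; on AmigaOS
the command string would presumably arrive as ARG0 of a RexxMsg received on
the 'MIFS1' port, which is not shown here. ARexx turns an unquoted,
unassigned symbol such as `pause` into the upper-case string "PAUSE", which
is most likely why both forms shown above work.

```c
#include <string.h>

/* Hypothetical monitor state; the real program's variables are not known. */
struct mifs_state {
    int paused;   /* nonzero: do not send ICMP echoes */
};

/* Apply one command string received on the ARexx port.
 * Exactly "PAUSE" pauses; any other string resumes sending. */
static void mifs_handle_command(struct mifs_state *st, const char *cmd)
{
    st->paused = (cmd != NULL && strcmp(cmd, "PAUSE") == 0);
}

/* Called once per check interval: should a probe be sent now? */
static int mifs_should_probe(const struct mifs_state *st)
{
    return !st->paused;
}
```

Keeping the state in one flag means a resume command needs no special
keyword, matching the "anything except 'PAUSE'" behaviour described above.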
---
NOTES:
Requires a 68000+ (no FPU needed), OS 2.04+, ~24 KiB of free memory, ARexx
and AmiTCP 30b2+. Not tested with Miami (might not work)!
Uses 'syslog()' (the TCP/IP log) to report that it has detected the error.
Use a <loglev> of 0 to 5 to see the message in the AmiTCP log file/console,
6 or 7 if you want to treat it as debug output, or 8 for no syslog at all.
Interestingly, info and debug do not work in AmiTCP 4.6...
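The <loglev> values line up with the standard syslog priorities (LOG_EMERG is
0, LOG_DEBUG is 7), so the mapping could look like the sketch below. It uses
POSIX <syslog.h>; whether AmiTCP's netinclude provides the same constants,
the function names and the log message text are all assumptions.

```c
#include <syslog.h>

/* Map the <loglev> argument to a syslog() priority.
 * 0..7 follow the standard LOG_EMERG..LOG_DEBUG order; 8 (or anything
 * out of range) means "don't log" and yields -1. */
static int mifs_log_priority(int loglev)
{
    static const int prio[] = {
        LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR,
        LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG
    };

    if (loglev < 0 || loglev > 7)
        return -1;
    return prio[loglev];
}

/* Report a detected error unless logging is disabled (hypothetical text). */
static void mifs_log_detected(int loglev, const char *iface, int err)
{
    int p = mifs_log_priority(loglev);

    if (p >= 0)
        syslog(p, "%s: error %d detected, reinitializing", iface, err);
}
```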
If it turns out that this program can't help with your current settings,
increase WRITEREQ (amitcp:db/interfaces) of your interface by 2 to 4
additional requests. I found that when all requests are already occupied
and a lockup occurs, this program will simply wait until someone frees one,
although you would be right in thinking that this should give an error
(it's a TCP/IP kernel design decision, or maybe a fault...).
If you want to fix something and recompile, you need SAS/C 6.5x, plus
'netinclude' and 'netlib' from AmiTCP 30b2 extracted into the directories in
'srcsrcsrc/'. Then simply 'cd srcsrcsrc' and type 'compile'.
Installation is pretty easy: put this program after your 'startnet' (or
inside 'startnet') with the arguments that match your interface and device.
You may also like to 'changetaskpri -2' so it won't get the CPU too often,
but keep in mind that all the hard work is done by the kernel, so you won't
gain much. Anyway, this program is not very CPU intensive: I made some tests
without any interval, measured the MIPS of my CPU with 'bogomips', and the
result was still acceptable. Also remember non-network CPU time eaters (like
PC-Task); they should be pushed to -2 or even lower for this program to work.
Yes, I have considered checking only while there are active transfers, but
this seems to be too much hassle, and what could possibly be gained by it?
There are more programs in this package, see 'srcsrcsrc/source/'. Note,
however, that these programs probably won't be updated and may contain some
nasty bugs (especially 'mifsallocreq': treat it as one big bug ;).
---
ERROR CODES:
* 0 - ENOERROR Undefined error: 0
1 - EPERM Operation not permitted
2 - ENOENT No such file or directory
3 - ESRCH No such process
4 - EINTR Interrupted system call
5 - EIO Input/output error
6 - ENXIO Device not configured
7 - E2BIG Argument list too long
8 - ENOEXEC Exec format error
9 - EBADF Bad file descriptor
10 - ECHILD No child processes
11 - EDEADLK Resource deadlock avoided
12 - ENOMEM Cannot allocate memory
13 - EACCES Permission denied
14 - EFAULT Bad address
15 - ENOTBLK Block device required
16 - EBUSY Device busy
17 - EEXIST File exists
18 - EXDEV Cross-device link
19 - ENODEV Operation not supported by device
20 - ENOTDIR Not a directory
21 - EISDIR Is a directory
22 - EINVAL Invalid argument
23 - ENFILE Too many open files in system
24 - EMFILE Too many open files
25 - ENOTTY Inappropriate ioctl for device
26 - ETXTBSY Text file busy
27 - EFBIG File too large
28 - ENOSPC No space left on device
29 - ESPIPE Illegal seek
30 - EROFS Read-only file system
31 - EMLINK Too many links
32 - EPIPE Broken pipe
33 - EDOM Numerical argument out of domain
34 - ERANGE Result too large
35 - EAGAIN,EWOULDBLOCK Resource temporarily unavailable
36 - EINPROGRESS Operation now in progress
37 - EALREADY Operation already in progress
38 - ENOTSOCK Socket operation on non-socket
39 - EDESTADDRREQ Destination address required
40 - EMSGSIZE Message too long
41 - EPROTOTYPE Protocol wrong type for socket
42 - ENOPROTOOPT Protocol not available
43 - EPROTONOSUPPORT Protocol not supported
44 - ESOCKTNOSUPPORT Socket type not supported
45 - EOPNOTSUPP Operation not supported
46 - EPFNOSUPPORT Protocol family not supported
47 - EAFNOSUPPORT Address family not supported by protocol family
48 - EADDRINUSE Address already in use
49 - EADDRNOTAVAIL Can't assign requested address
*50 - ENETDOWN Network is down
51 - ENETUNREACH Network is unreachable
52 - ENETRESET Network dropped connection on reset
53 - ECONNABORTED Software caused connection abort
54 - ECONNRESET Connection reset by peer
*55 - ENOBUFS No buffer space available
56 - EISCONN Socket is already connected
57 - ENOTCONN Socket is not connected
58 - ESHUTDOWN Can't send after socket shutdown
59 - ETOOMANYREFS Too many references: can't splice
60 - ETIMEDOUT Connection timed out
61 - ECONNREFUSED Connection refused
62 - ELOOP Too many levels of symbolic links
63 - ENAMETOOLONG File name too long
64 - EHOSTDOWN Host is down
65 - EHOSTUNREACH No route to host
66 - ENOTEMPTY Directory not empty
67 - EPROCLIM Too many processes
68 - EUSERS Too many users
69 - EDQUOT Disc quota exceeded
70 - ESTALE Stale NFS file handle
71 - EREMOTE Too many levels of remote in path
72 - EBADRPC RPC struct is bad
73 - ERPCMISMATCH RPC version wrong
74 - EPROGUNAVAIL RPC prog. not avail
75 - EPROGMISMATCH Program version wrong
76 - EPROCUNAVAIL Bad procedure for program
77 - ENOLCK No locks available
78 - ENOSYS Function not implemented
79 - EFTYPE Inappropriate file type or format
* - perhaps the only useful codes
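How the <errcode> check and the four reinitialization steps from HOW could fit
together is sketched below. The numbers 50 and 55 are AmiTCP's values from
this table, not the host's <errno.h> values (on Linux, for example, ENOBUFS
is 105). The step names, the dry-run step function and every other
identifier are hypothetical; on real hardware the device steps would
presumably be SANA-II offline/online requests and the interface steps
ifconfig-style down/up operations.

```c
#include <stddef.h>

/* AmiTCP error numbers from the table above (NOT host <errno.h> values). */
#define AMITCP_ENETDOWN 50
#define AMITCP_ENOBUFS  55

/* Should a failed probe trigger reinitialization?  err is the error of the
 * send (0 = success); errcode is <errcode>, where 0 means "any error". */
static int mifs_should_reinit(int err, int errcode)
{
    if (err == 0)
        return 0;
    return errcode == 0 || err == errcode;
}

/* The four steps, in the order described in HOW. */
enum mifs_step {
    MIFS_IFACE_DOWN, MIFS_DEVICE_OFFLINE, MIFS_DEVICE_ONLINE, MIFS_IFACE_UP
};

typedef int (*mifs_step_fn)(enum mifs_step step, void *ctx);

/* Run all steps, stopping at the first failure.
 * Returns 0 on success or the 1-based number of the failing step. */
static int mifs_reinit(mifs_step_fn do_step, void *ctx)
{
    static const enum mifs_step order[] = {
        MIFS_IFACE_DOWN, MIFS_DEVICE_OFFLINE, MIFS_DEVICE_ONLINE, MIFS_IFACE_UP
    };
    size_t i;

    for (i = 0; i < sizeof order / sizeof order[0]; i++)
        if (do_step(order[i], ctx) != 0)
            return (int)i + 1;
    return 0;
}

/* Dry-run step function: records steps instead of touching hardware. */
struct mifs_trace {
    enum mifs_step steps[4];
    int n;
};

static int mifs_dry_run_step(enum mifs_step step, void *ctx)
{
    struct mifs_trace *t = ctx;

    if (t->n >= 4)
        return -1;
    t->steps[t->n++] = step;
    return 0;
}
```

Passing the step function in as a parameter keeps the ordering logic
testable without a real NIC; a real build would supply a function that talks
to the interface and the device.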
---
USAGE:
myinterfacesuck <iface> <s2device> <s2unit> <errcode> <loglev> [ival]
<iface> - your main network interface, eth0, eth1, ...
<s2device> - device used by the <iface>
<s2unit> - device unit, usually 0
<errcode> - error code at which reinitialization should occur,
0 = reinit at any error
<loglev> - 0 - emerg, 1 - alert, 2 - crit, 3 - err, 4 - warn,
5 - notice, 6 - info, 7 - debug, 8 - no log
[ival] - time to wait between checks, 1-60 seconds, default: 3,
which is a reasonable value and gives good results
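A sketch of how these arguments could be parsed and range-checked. The struct,
the function names and the limits for <s2unit> (any non-negative number) and
<errcode> (0..79, the range of the ERROR CODES table) are assumptions; the
0..8 range for <loglev>, and the 1..60 range and default of 3 for [ival],
come from this section.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical parsed form of the command line described above. */
struct mifs_args {
    const char *iface;     /* e.g. "eth0" */
    const char *s2device;  /* e.g. "3c589.device" */
    long s2unit;           /* usually 0 */
    long errcode;          /* 0 = reinit at any error */
    long loglev;           /* 0..7 syslog level, 8 = no log */
    long ival;             /* seconds between checks, 1..60 */
};

/* Parse a whole decimal string and require lo <= value <= hi. */
static int parse_range(const char *s, long lo, long hi, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < lo || v > hi)
        return 0;
    *out = v;
    return 1;
}

/* Returns 1 on success, 0 on bad usage; argv[0] is the program name. */
static int mifs_parse_args(int argc, char **argv, struct mifs_args *a)
{
    if (argc < 6 || argc > 7)
        return 0;
    a->iface = argv[1];
    a->s2device = argv[2];
    a->ival = 3;                                   /* documented default */
    if (!parse_range(argv[3], 0, LONG_MAX, &a->s2unit) ||
        !parse_range(argv[4], 0, 79, &a->errcode) ||
        !parse_range(argv[5], 0, 8, &a->loglev))
        return 0;
    if (argc == 7 && !parse_range(argv[6], 1, 60, &a->ival))
        return 0;
    return 1;
}
```

With the recommended line `myinterfacesuck eth0 3c589.device 0 55 5`, this
sketch yields errcode 55, loglev 5 and the default ival of 3.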
---
EXAMPLE/PROBE:
Shell-A
"
27.[local]RamDisk:> ifconfig eth0 down
27.[local]RamDisk:>
"
Shell-B
"
3.[local]STUFF:myinterfacesuck-0.1> myinterfacesuck eth0 3c589.device 0 50 8 1
/// ARexx port 'MIFS1' has been attached, 'PAUSE' or 'WHATEVER' controls.
/// monitoring interface 'eth0' for error 50, check every 1 sec(s) ...
/// error 50 detected, transparent reinitialization in progress ...
/// interface should be up and running again.
"
---
RECOMMENDED USAGE FOR 55:
(we want to check as often as possible, but without getting in the way
when something is transferring at full speed and loading the CPU;
that's why the priority is -2)
changetaskpri -2
run >nil: myinterfacesuck eth0 3c589.device 0 55 5
changetaskpri 0
---
megacz@usa.com