File Transfer Protocol

From wiki.gis.com

File Transfer Protocol (FTP) is a standard network protocol used to exchange and manipulate files over an Internet Protocol computer network, such as the Internet. FTP is built on a client-server architecture and utilizes separate control and data connections between the client and server applications. Client applications were originally interactive command-line tools with a standardized command syntax, but graphical user interfaces have been developed for all desktop operating systems in use today. FTP is also often used as an application component to automatically transfer files for program internal functions. FTP can be used with user-based password authentication or with anonymous user access.

Connection methods

FTP runs over the Transmission Control Protocol (TCP).[1] Usually FTP servers listen on the well-known port number 21 (IANA-reserved) for incoming connections from clients. A connection to this port from the FTP client forms the control stream on which commands are passed to the FTP server and responses are collected. FTP uses out-of-band control; it opens dedicated data connections on other port numbers. The parameters for the data streams depend on the specifically requested transport mode. In active mode, the server's end of the data connection usually uses port number 20.

In active mode, the FTP client opens a dynamic port, sends the FTP server the dynamic port number on which it is listening over the control stream and waits for a connection from the FTP server. When the FTP server initiates the data connection to the FTP client it binds the source port to port 20 on the FTP server.

In order to use active mode, the client sends a PORT command, with the IP and port as argument. The format for the IP and port is "h1,h2,h3,h4,p1,p2". Each field is a decimal representation of 8 bits of the host IP, followed by the chosen data port. For example, a client with an IP of 192.168.0.1, listening on port 49154 for the data connection will send the command "PORT 192,168,0,1,192,2". The port fields should be interpreted as p1×256 + p2 = port, or, in this example, 192×256 + 2 = 49154.
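The arithmetic above can be sketched in Python. This is a minimal illustration rather than part of any FTP library; the function names encode_port_argument and decode_port_argument are hypothetical.

```python
import ipaddress


def encode_port_argument(ip: str, port: int) -> str:
    """Build the "h1,h2,h3,h4,p1,p2" argument for an IPv4 address and port."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    # .packed yields the four address bytes h1..h4 (raises on bad input).
    host_bytes = list(ipaddress.IPv4Address(ip).packed)
    p1, p2 = divmod(port, 256)  # high byte, low byte
    return ",".join(str(b) for b in host_bytes + [p1, p2])


def decode_port_argument(argument: str) -> tuple[str, int]:
    """Recover (ip, port) from an "h1,h2,h3,h4,p1,p2" argument."""
    fields = [int(f) for f in argument.split(",")]
    if len(fields) != 6 or not all(0 <= f <= 255 for f in fields):
        raise ValueError(f"malformed PORT argument: {argument!r}")
    ip = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]
    return ip, port


# Reproduces the example from the text.
print("PORT " + encode_port_argument("192.168.0.1", 49154))
```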

In passive mode, the FTP server opens a dynamic port, sends the FTP client the server's IP address to connect to and the port on which it is listening (a 16-bit value broken into a high and low byte, as explained above) over the control stream and waits for a connection from the FTP client. In this case, the FTP client binds the source port of the connection to a dynamic port.

To use passive mode, the client sends the PASV command, to which the server replies with something similar to "227 Entering Passive Mode (127,0,0,1,192,52)". The syntax of the IP address and port is the same as for the argument to the PORT command.
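A client has to extract the address and port from the 227 reply before it can open the data connection. The following sketch shows one way to do that; the function name parse_pasv_reply is hypothetical, and the parser assumes the six numbers appear inside parentheses as in the example reply.

```python
import re

# Six comma-separated decimal fields inside parentheses: h1,h2,h3,h4,p1,p2
_PASV_FIELDS = re.compile(r"\((\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})\)")


def parse_pasv_reply(reply: str) -> tuple[str, int]:
    """Return (ip, port) from a '227 Entering Passive Mode (...)' reply."""
    if not reply.startswith("227"):
        raise ValueError(f"not a 227 reply: {reply!r}")
    match = _PASV_FIELDS.search(reply)
    if match is None:
        raise ValueError(f"no address found in reply: {reply!r}")
    fields = [int(g) for g in match.groups()]
    if any(f > 255 for f in fields):
        raise ValueError(f"field out of range in reply: {reply!r}")
    ip = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]  # p1 is the high byte, p2 the low byte
    return ip, port
```

For the reply quoted above, the port fields 192 and 52 decode to 192×256 + 52 = 49204.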

In extended passive mode, the FTP server operates exactly as in passive mode, except that it transmits only the port number (not broken into high and low bytes), and the client assumes that it should connect to the same IP address it originally connected to. Extended passive mode was added by RFC 2428 in September 1998.
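RFC 2428 specifies the reply to the EPSV command in the form "229 Entering Extended Passive Mode (|||6446|)", where the delimiter character (here "|") surrounds three fields, of which only the last, the port, is filled in. A minimal parsing sketch follows; the function name parse_epsv_reply is hypothetical.

```python
def parse_epsv_reply(reply: str) -> int:
    """Return the data port from a '229 ... (<d><d><d>port<d>)' reply."""
    if not reply.startswith("229"):
        raise ValueError(f"not a 229 reply: {reply!r}")
    left = reply.find("(")
    right = reply.find(")", left + 1)
    if left < 0 or right < 0 or right - left < 2:
        raise ValueError(f"no parenthesised field in reply: {reply!r}")
    inner = reply[left + 1:right]
    delimiter = inner[0]
    # Splitting "|||6446|" on "|" gives ['', '', '', '6446', ''].
    parts = inner.split(delimiter)
    if len(parts) != 5 or not parts[3].isdigit():
        raise ValueError(f"malformed EPSV field: {inner!r}")
    port = int(parts[3])
    if not 0 < port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    return port
```

The client then connects to this port on the same address it used for the control connection.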

While data is being transferred via the data stream, the control stream sits idle. This can cause problems with large data transfers through firewalls which time out sessions after lengthy periods of idleness. While the file may well be successfully transferred, the control session can be disconnected by the firewall, causing an error to be generated.

FTP supports resuming interrupted downloads using the REST command. The client passes the number of bytes it has already received as the argument to the REST command and restarts the transfer. Some command-line clients, for example, offer an often-overlooked but valuable command, "reget" (meaning "get again"), that continues an interrupted "get" command after a communications interruption.
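In Python, the standard library's ftplib exposes this through the rest parameter of FTP.retrbinary, which issues a REST command before the RETR. The sketch below assumes that ftp is an already connected and logged-in ftplib.FTP object; the helper name resume_download is hypothetical.

```python
import os


def resume_download(ftp, remote_name: str, local_path: str) -> int:
    """Resume (or start) a binary download and return the offset used.

    `ftp` is assumed to behave like a logged-in ftplib.FTP instance.
    """
    # The number of bytes already on disk is the restart offset.
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    # Append mode keeps the bytes that were received before the interruption.
    with open(local_path, "ab") as out:
        # rest=None means "no REST command"; a positive value sends REST <offset>.
        ftp.retrbinary(f"RETR {remote_name}", out.write, rest=offset or None)
    return offset
```

This only makes sense for image (binary) transfers, where the local file size equals the number of bytes received.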

Resuming uploads is not as easy. Although FTP supports the APPE command to append data to a file on the server, the client does not know the exact position at which a transfer was interrupted. It has to obtain the size of the file some other way, for example from a directory listing or by using the SIZE command.

In ASCII mode (see below), resuming transfers can be troublesome if client and server use different end of line characters.

Security problems

The original FTP specification is an inherently insecure method of transferring files because it specifies no method for transferring data in an encrypted fashion. This means that under most network configurations, user names, passwords, FTP commands and transferred files can be captured by anyone on the same network using a packet sniffer. This is a problem common to many Internet protocol specifications written prior to the creation of Secure Sockets Layer (SSL), such as HTTP, SMTP and Telnet. The common solution to this problem is to use either SFTP (SSH File Transfer Protocol) or FTPS (FTP over SSL), which adds SSL or TLS encryption to FTP as specified in RFC 4217.

FTP return codes

FTP server return codes indicate their status by the digits within them. A brief explanation of the meaning of the first and second digits is given below:

  • 1xx: Positive Preliminary reply. The action requested is being initiated; the client should expect another reply before proceeding with a new command.
  • 2xx: Positive Completion reply. The action requested has been completed. The client may now issue a new command.
  • 3xx: Positive Intermediate reply. The command was successful, but a further command is required before the server can act upon the request.
  • 4xx: Transient Negative Completion reply. The command was not successful, but the client is free to try the command again as the failure is only temporary.
  • 5xx: Permanent Negative Completion reply. The command was not successful and the client should not attempt to repeat it again.
  • x0x: The failure was due to a syntax error.
  • x1x: This response is a reply to a request for information.
  • x2x: This response is a reply relating to connection information.
  • x3x: This response is a reply relating to accounting and authorization.
  • x4x: Unspecified as yet
  • x5x: These responses indicate the status of the Server file system vis-a-vis the requested transfer or other file system action.
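The digit scheme above lends itself to a simple lookup. The following is a minimal sketch; the names classify_reply and should_retry and the short category labels are chosen for illustration only.

```python
FIRST_DIGIT = {
    1: "Positive Preliminary",
    2: "Positive Completion",
    3: "Positive Intermediate",
    4: "Transient Negative Completion",
    5: "Permanent Negative Completion",
}

SECOND_DIGIT = {
    0: "syntax",
    1: "information",
    2: "connections",
    3: "accounting and authorization",
    4: "unspecified",
    5: "file system",
}


def classify_reply(code: int) -> tuple[str, str]:
    """Map a three-digit reply code to (first-digit class, second-digit group)."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an FTP reply code: {code}")
    first = code // 100
    second = (code // 10) % 10
    return FIRST_DIGIT[first], SECOND_DIGIT.get(second, "unassigned")


def should_retry(code: int) -> bool:
    """Only 4xx replies describe a temporary failure worth retrying."""
    return code // 100 == 4


print(classify_reply(227))
```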

Anonymous FTP

A host that provides an FTP service may additionally provide anonymous FTP access. Users typically log in to the service with an 'anonymous' account when prompted for a user name. Although users are commonly asked to send their email address in lieu of a password, little to no verification is actually performed on the supplied data.

As modern FTP clients typically hide the anonymous login process from the user, the FTP client will supply dummy data as the password (since the user's email address may not be known to the application). For example, the following FTP user agents specify the listed passwords for anonymous logins:

  • Mozilla Firefox (3.5.2) — mozilla@example.com
  • KDE Konqueror (3.5) — anonymous@
  • wget (1.10.2) — -wget@
  • lftp (3.4.4) — lftp@
  • Opera (9.6.4) — opera@

The Gopher protocol has been suggested as an alternative to anonymous FTP, as well as File Service Protocol.[citation needed]

Transfer parameters

According to the FTP standard RFC959, the transfer of data is determined by four main parameters:

  • the data structure: stream-oriented, record-oriented or page-oriented
  • the data type: the textual types of ASCII, EBCDIC, with subtypes for different carriage control disciplines; the binary types of byte-oriented, or arbitrary length word-oriented
  • the vertical format control: for the textual types of ASCII and EBCDIC, whether vertical format control is absent (non-print) or is specified using TELNET format control characters or ASA carriage control
  • the transfer mode: stream-oriented transfer, uncompressed block-oriented transfer or compressed block-oriented transfer

By the 1990s, the usage of FTP centered on stream-oriented file structure and stream-oriented transfer mode; most FTP servers and clients from the 1990s onwards do not support other file structures or transfer modes.

Data structure

Data structure is specified using the STRU command. The following file structures are defined in section 3.1.2 of RFC959:

  • F or FILE structure (stream-oriented). Files are viewed as an arbitrary sequence of bytes, characters or words. This is the usual file structure on Unix systems and other systems such as CP/M, MSDOS and Microsoft Windows. [Section 3.1.2.1]
  • R or RECORD structure (record-oriented). Files are viewed as divided into records, which may be fixed or variable length. This file organization is common on mainframe and midrange systems, such as MVS, VM/CMS, OS/400 and VMS.
  • P or PAGE structure (page-oriented). Files are divided into pages, which may either contain data or metadata; each page may also have a header giving various attributes. This file structure was specifically designed for TENEX systems, and is generally not supported on other platforms. RFC1123 section 4.1.2.3 recommends that this structure not be implemented.

Data type

Data type is specified using the TYPE command. The following data types are defined:

  • A (ASCII). Textual data transferred over the network in the NVT ASCII character set.
  • E (EBCDIC). Textual data transferred over the network in the EBCDIC character set.
  • I or IMAGE (byte-oriented). Binary data transferred as a stream of 8-bit bytes.
  • L or LOCAL (word-oriented). Binary data transferred as a stream of words. The number of bits in the word is specified as an argument, e.g. L32 for 32-bit words, L36 for 36-bit words. The words are packed for transfer.

A common problem historically has been FTP clients and servers which default to ASCII type, but do not provide any protection against transferring binary files. As a result, the binary files are corrupted, through e.g. translation of newline characters. In most contemporary clients, this is avoided by automatically defaulting to image type. Another approach would be to choose the FTP TYPE based on the type of the file as recorded in the filesystem (for those filesystems which do this) or heuristically.
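The corruption is easy to reproduce. The sketch below imitates, in a simplified way, what a Unix receiver in ASCII type might do to incoming data, namely convert each CRLF pair to LF; the function name is hypothetical and real clients differ in detail. The eight-byte PNG file signature is used as sample binary data because it deliberately contains a CRLF pair.

```python
def receive_as_ascii_on_unix(wire_bytes: bytes) -> bytes:
    """Simplified newline translation: network CRLF becomes local LF."""
    return wire_bytes.replace(b"\r\n", b"\n")


# The PNG signature contains CR LF precisely so that this kind of
# translation is detectable.
png_signature = b"\x89PNG\r\n\x1a\n"
damaged = receive_as_ascii_on_unix(png_signature)

print(len(png_signature), len(damaged))   # the translated copy is one byte shorter
print(damaged == png_signature)
```

Transferring the same bytes with image type leaves them untouched, which is why clients that default to image type avoid the problem.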

L8 is effectively equivalent to I, and most FTP servers or clients do not accept other word sizes, save for 36-bit platforms. The data is transferred in packed binary format.

Note that the data type indicates the type for transfer, not the type in which the data is stored on the source or destination systems. The client and server are free to convert the data to a form which is most convenient on their platform. For example, the textual data types of A and E may be subjected to translation of character set (e.g. ASCII vs. EBCDIC), translation of newline convention (e.g. CRLF vs. LF), or translation of textual data between stream-oriented and record-oriented formats (i.e. one record per line, possibly padded with spaces to the maximum line length, vs. stream-oriented with newline characters to separate the lines). Similarly, a 36-bit platform may choose to store an L32 format file sent or received as 36-bit words each padded with four zero bits. The I data type is the least likely to be converted, but even it may be subject to conversion on non-byte-oriented platforms.

Frequently FTP clients use the word "MODE" to refer to the data type, although that is a misnomer, since the word "MODE" is already taken to refer to the transfer mode.

Vertical format control

Vertical format control applies only to the textual data types (A and E) and is indicated as the second parameter to the TYPE command (section 3.1.1.5):

  • N for non-print, meaning no vertical format control is specified. This is the default if none is specified.
  • T to indicate that vertical format control is specified using the ASCII/EBCDIC TELNET format control characters, i.e. CR, LF, NL, VT, FF
  • A to indicate that ASA vertical format control is to be applied

Transfer mode

The transfer mode is specified by the MODE command (section 3.4). The following modes are defined:

  • S or STREAM MODE: data is represented as a stream of 8-bit bytes. An escape mechanism is defined for record-oriented files, to explicitly indicate record boundaries and explicit end of file. For stream-oriented files, no escape mechanism is defined and end of file is represented by closing the connection.
  • B or BLOCK MODE: data is represented as a stream of blocks. Each block has a header to indicate its length, and also flags to mark end-of-record and end-of-file. The flags can also be used to indicate a suspect data block, e.g. a block of data read from a magnetic tape which failed its checksum, but is being transferred anyway even though it may contain errors. Also supports restart markers, which enable restarting the data transmission from that point.
  • C or COMPRESSED MODE: similar to stream mode, but adds support for run-length encoding and also the flags defined in block mode.

As of the 1990s, most FTP clients and servers only support STREAM mode.
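To make the block-mode description concrete: RFC 959 defines the block header as three bytes, a one-byte descriptor followed by a 16-bit byte count, with descriptor bits 128 (end of record), 64 (end of file), 32 (suspected errors) and 16 (restart marker). The sketch below packs and unpacks such blocks; the constant and function names are chosen for illustration.

```python
import struct

# Descriptor bits from RFC 959, section 3.4.2.
EOR = 0x80      # end of data block is end of record
EOF = 0x40      # end of data block is end of file
SUSPECT = 0x20  # suspected errors in data block
RESTART = 0x10  # data block is a restart marker

_HEADER = struct.Struct(">BH")  # descriptor byte, big-endian 16-bit count


def pack_block(data: bytes, flags: int = 0) -> bytes:
    """Prefix `data` with a block-mode header."""
    if len(data) > 0xFFFF:
        raise ValueError("a block carries at most 65535 bytes")
    return _HEADER.pack(flags, len(data)) + data


def unpack_block(buf: bytes) -> tuple[int, bytes, bytes]:
    """Split one block off the front of `buf`: (flags, data, remainder)."""
    flags, count = _HEADER.unpack_from(buf)
    end = _HEADER.size + count
    if len(buf) < end:
        raise ValueError("buffer ends before the block does")
    return flags, buf[_HEADER.size:end], buf[end:]
```

Because every block announces its own length, the end of file can be signalled with the EOF flag instead of by closing the data connection as in stream mode.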

FTP commands

Commands which begin with the letter X are generally reserved for experimental extensions, although one should use SITE subcommands instead for this purpose.

RFC959 defines the following FTP commands, which were also present in RFC765:

  • USER: supplies the username for login
  • PASS: supplies the password for login
  • ACCT: supplies accounting information. For example, a user may work on multiple projects; the account can be used to ensure that the charges for the data storage are billed to the correct project. (Not commonly implemented).
  • CWD: changes the working directory to that specified
  • REIN: removes all authentication information and parameter settings; must be followed by relogin via USER
  • QUIT: terminates the connection
  • PORT: host/port specification for data transfer
  • PASV: enter passive mode
  • TYPE: specify data type and vertical format control (see above)
  • STRU: specify data structure (see above)
  • MODE: specify transmission mode (see above)
  • RETR: initiates a data transfer from server to client, specifying name of file to retrieve
  • STOR: initiates a data transfer from client to server, specifying the name under which the file is to be stored on the server
  • APPE: similar to STOR, except that if the file already exists, the received data is appended to the end of it rather than replacing it
  • ALLO: allocates space for a file. Optionally, specifies the maximum size of each record.
  • REST: specifies the restart marker from which the transfer is to resume. Originally intended for use with restart markers sent by the server in B or C mode, but later extended in RFC3659 to byte offsets specified in S mode.
  • RNFR: to rename a file, specify the file to be renamed
  • RNTO: to rename a file, specifies the new name for the file, and performs the rename. Often also used to implement moves.
  • DELE: deletes a file
  • LIST: opens a data connection with A or E data type, to transfer a listing of files in the current directory. The format of data is system-specific, but intended to be human readable.
  • NLST: similar to LIST, but transfers unadorned names of files separated by CRLF or NL.
  • SITE: provides subcommands to perform system specific services. The nature of these services is undefined.
  • STAT: without arguments, current status of connection. With argument, equivalent to LIST, but the listing is transferred over the control connection encapsulated in messages.
  • HELP: provides HELP, optionally with an argument to specify the specific command on which help is requested.
  • NOOP: does nothing

RFC959 adds the following new commands which were not present in RFC765:

  • CDUP: changes the working directory to the parent. Present since the notation for parent directory varies from platform to platform (although most commonly .. on systems descended from Unix or MS DOS).
  • SMNT: mount a different file system or volume. Intended for systems such as DOS or VMS where there is a distinction between volume and directory in pathnames; but commonly unimplemented even on such systems.
  • STOU: store unique - initiates a data transfer from client to server; the server chooses a unique name for the file to be received
  • RMD: removes a directory
  • MKD: creates a directory
  • PWD: prints the current directory
  • SYST: identifies the operating system of the server

RFC765 described a number of commands which were removed in RFC959. These have not been part of FTP implementations since the early 1980s, since their functionality was later replaced (in part) by SMTP:

  • MLFL: used to send email over the data connection
  • MAIL: used to send email over the control connection
  • MSND: like MAIL, but sends data directly to user's terminal rather than their mailbox
  • MSOM: behaves as either MAIL or MSND—send to terminal if allowed, otherwise to mailbox
  • MSAM: similar to MSOM—except that MSOM only sends to mailbox if delivery to terminal not possible; but MSAM sends to mailbox irrespective of whether terminal delivery is successfully attempted
  • MRSQ: enables transmission of a single email to multiple users at the same host
  • MRCP: subsequent to MRSQ, identifies one such recipient; repeated for each recipient

RFC2228 adds a number of commands related to encryption and message authentication:

  • AUTH: identifies the authentication/security mechanism to be used
  • ADAT: specifies security data specific to the chosen AUTH mechanism
  • PBSZ: used to negotiate maximum buffer size for encrypted data
  • PROT: specifies protection level for data channel. The following levels are defined:
    • C (Clear) - data channel is subject neither to encryption nor integrity protection
    • S (Safe) - integrity protection applied to data channel
    • E (Confidential) - encryption applied to data channel
    • P (Private) - both encryption and integrity protection applied to data channel
  • CCC: disables integrity protection for subsequent commands on control channel
  • MIC: sends a command with integrity protection
  • CONF: sends a command with confidentiality protection
  • ENC: sends a command with both integrity and confidentiality protection

RFC1639 ("FOOBAR"; succeeded RFC1545) adds support for FTP over arbitrary transport protocols, such as IPX/SPX or OSI. For this, it defines two new commands:

  • LPRT: similar to PORT, but supports arbitrary address and port formats.
  • LPSV: similar extension to PASV

RFC2389 defines two new commands used as a generic extension mechanism for FTP:

  • FEAT: retrieves a listing of optional features supported by FTP server
  • OPTS: a generic mechanism for the client to specify options to arbitrary FTP commands

RFC2428 adds two new commands, similar in principle to RFC1639 but differing in details:

  • EPRT: similar to PORT, but supports arbitrary address families rather than only IPv4; specifically intended for IPv6.
  • EPSV: similar extension to PASV

LPRT sends addresses as an arbitrary octet string (albeit decimal encoded), whereas EPRT sends them as formatted strings, the format of the string being dependent upon the address format. EPRT assumes the use of TCP-style 16-bit port numbers, whereas LPRT is more flexible and supports transport protocols with greater than 16-bit port numbers.
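In RFC 2428 the EPRT argument has the form |family|address|port|, where family 1 denotes IPv4 and 2 denotes IPv6, and the address is written in its usual textual form. A minimal sketch of building such a command follows; the function name format_eprt is hypothetical, and "|" is used as the delimiter.

```python
import ipaddress


def format_eprt(host: str, port: int) -> str:
    """Build an EPRT command for an IPv4 or IPv6 address."""
    if not 0 < port <= 0xFFFF:
        raise ValueError(f"port out of range: {port}")
    addr = ipaddress.ip_address(host)       # raises ValueError on bad input
    family = 1 if addr.version == 4 else 2  # address family numbers per RFC 2428
    return f"EPRT |{family}|{addr}|{port}|"


print(format_eprt("132.235.1.2", 6275))
print(format_eprt("1080::8:800:200C:417A", 5282))
```

Note that the ipaddress module normalises IPv6 text to lower case, so the second command is printed with "200c:417a".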

RFC2640 adds one new command:

  • LANG: used to choose the language for FTP messages

RFC3659 defines several new commands:

  • MDTM: retrieve file modification time
  • SIZE: retrieve file size
  • MLSD: retrieve listing of files in a directory. Unlike NLST, this returns not only file names but also attributes; but unlike LIST, it returns the attributes in an extensible standardised format rather than an arbitrary platform-specific one.
  • MLST: same as MLSD, but retrieves listing for an individual file rather than a directory. For directories, retrieves their own attributes rather than a listing of their members. MLST does not require a data connection, but returns a single line containing the listing for the requested path.
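In the RFC 3659 listing format, each entry consists of a series of "name=value;" facts, followed by a single space and the pathname. The sketch below parses one such entry; the function name parse_mlsx_entry and the sample line are made up for illustration, and fact names are lower-cased because they are case-insensitive.

```python
def parse_mlsx_entry(line: str) -> tuple[str, dict[str, str]]:
    """Split one MLSD/MLST entry into (pathname, facts)."""
    # Facts contain no spaces, so the first space separates them from the
    # pathname; the pathname itself may contain further spaces.
    facts_part, separator, name = line.rstrip("\r\n").partition(" ")
    if not separator:
        raise ValueError(f"no pathname in entry: {line!r}")
    facts = {}
    for fact in facts_part.split(";"):
        if not fact:
            continue  # skip the empty piece after the final ';'
        key, _, value = fact.partition("=")
        facts[key.lower()] = value
    return name, facts


name, facts = parse_mlsx_entry("Type=file;Size=1024;Modify=20070301120000; notes v2.txt")
print(name, facts)
```

Python's ftplib offers an FTP.mlsd method that performs similar parsing on a live connection.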

FTP and web browsers

Most recent web browsers and file managers can connect to FTP servers, although they may lack support for protocol extensions such as FTPS. This allows manipulation of remote files over FTP through an interface similar to that used for local files. This is done via an FTP URL, which takes the form ftp(s)://<ftpserveraddress> (e.g., ftp://ftp.gimp.org/). A login and password can optionally be given in the URL, e.g. ftp(s)://<login>:<password>@<ftpserveraddress>:<port>. Most web browsers require the use of passive mode FTP, which not all FTP servers are capable of handling. Some browsers allow only the downloading of files and offer no way to upload files to the server.
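The components of such a URL can be separated with Python's standard urllib.parse module. The following is a small sketch; the function name parse_ftp_url and the example host ftp.example.org are hypothetical.

```python
from urllib.parse import unquote, urlsplit


def parse_ftp_url(url: str) -> dict:
    """Split an ftp:// or ftps:// URL into its components."""
    parts = urlsplit(url)
    if parts.scheme not in ("ftp", "ftps"):
        raise ValueError(f"not an FTP URL: {url!r}")
    return {
        "host": parts.hostname,
        "port": parts.port,  # None when the URL gives no port
        # Login and password may be percent-encoded in the URL.
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "path": unquote(parts.path),
    }


print(parse_ftp_url("ftp://alice:s3cret@ftp.example.org:2121/pub/readme.txt"))
```

A URL without a login, such as ftp://ftp.gimp.org/, yields None for the user and password, in which case a client would typically fall back to anonymous access.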

FTP and NAT devices

The representation of the IP addresses and port numbers in the PORT command and PASV reply poses another challenge for network address translation (NAT) devices in handling FTP. The NAT device must alter these values, so that they contain the IP address of the NAT-ed client, and a port chosen by the NAT device for the data connection. The new address and port will probably differ in length in their decimal representation from the original address and port. This means that altering the values on the control connection by the NAT device must be done carefully, changing the TCP Sequence and Acknowledgment fields for all subsequent packets. Most NAT devices do not perform such translation, but special application layer gateways exist for this purpose.
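The length problem can be illustrated by rewriting a single PORT command and measuring how much the line grows or shrinks. This is a sketch of the text manipulation only, not of a real NAT implementation; the function name rewrite_port_command and the public address 203.0.113.45 are hypothetical.

```python
import re

_PORT_LINE = re.compile(rb"^PORT \d+,\d+,\d+,\d+,\d+,\d+\r\n$")


def rewrite_port_command(line: bytes, public_ip: str, public_port: int) -> tuple[bytes, int]:
    """Replace the address in a PORT line; return (new_line, length_delta)."""
    if not _PORT_LINE.match(line):
        raise ValueError(f"not a PORT command: {line!r}")
    p1, p2 = divmod(public_port, 256)  # high byte, low byte
    fields = public_ip.split(".") + [str(p1), str(p2)]
    new_line = ("PORT " + ",".join(fields) + "\r\n").encode("ascii")
    # A non-zero delta is what forces the gateway to adjust the TCP
    # sequence and acknowledgment numbers of all later segments.
    return new_line, len(new_line) - len(line)


new_line, delta = rewrite_port_command(b"PORT 192,168,0,1,192,2\r\n", "203.0.113.45", 61000)
print(new_line, delta)
```

Here the rewritten command is two bytes longer than the original, so every later byte on the control connection is shifted by two in the sequence space.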

See also Application-level gateway

FTP over SSH (not SFTP)

FTP over SSH (not SFTP) refers to the practice of tunneling a normal FTP session over an SSH connection.

Because FTP uses multiple TCP connections (unusual for a TCP/IP protocol that is still in use), it is particularly difficult to tunnel over SSH. With many SSH clients, attempting to set up a tunnel for the control channel (the initial client-to-server connection on port 21) will protect only that channel; when data is transferred, the FTP software at either end will set up new TCP connections (data channels) which will bypass the SSH connection, and thus have no confidentiality, integrity protection, etc.

Otherwise, it is necessary for the SSH client software to have specific knowledge of the FTP protocol, and monitor and rewrite FTP control channel messages and autonomously open new forwardings for FTP data channels. Version 3 of SSH Communications Security's software suite, and the GPL licensed FONC are two software packages that support this mode.

FTP over SSH is sometimes referred to as secure FTP; this should not be confused with other methods of securing FTP, such as with SSL/TLS (FTPS). Other methods of transferring files using SSH that are not related to FTP include SFTP and SCP; in each of these, the entire conversation (credentials and data) is always protected by the SSH protocol.

Variants

The Trivial File Transfer Protocol (TFTP) is a similar, but simplified, not interoperable, and unauthenticated version of FTP.

See also

  • File eXchange Protocol (FXP)
  • FTAM
  • FTPFS
  • List of FTP server return codes
  • List of FTP commands
  • List of file transfer protocols
  • Managed File Transfer
  • OBEX
  • Shared file access
  • TCP Wrapper
  • Comparison of FTP client software
  • List of FTP server software
  • Comparison of FTP server software

References

Further reading

  • RFC 959 – File Transfer Protocol (FTP). J. Postel, J. Reynolds. Oct-1985. This obsoleted the preceding RFC 765 and earlier FTP RFCs back to the original RFC 114.
  • RFC 1579 – Firewall-Friendly FTP. Feb-1994.
  • RFC 2228 – FTP Security Extensions. Oct-1997.
  • RFC 2389 – Feature negotiation mechanism for the File Transfer Protocol. Aug-1998.
  • RFC 2428 – Extensions for IPv6, NAT, and Extended passive mode. Sep-1998.
  • RFC 2640 – Internationalization of the File Transfer Protocol. Jul-1999.
  • RFC 3659 – Extensions to FTP. P. Hethmon. Mar-2007.
