File

(2014-11-09)

Readings

  1. “Files: A Brief Introduction.” The Linux Information Project 2004. Web.
  2. “Inode Definition.” The Linux Information Project 2004. Web.
  3. “Filesystems: A Brief Introduction.” The Linux Information Project 2004. Web.
  4. “Directory Definition.” The Linux Information Project 2004. Web.
  5. “Mystery of Binary Files.” Unix & Linux Stack Exchange 2012. Web.
  6. “Why Don’t You See Binary Code When You Open a Binary File with Text Editor?” Super User 2011. Web.
  7. “Transclusion: Fixing Electronic Literature.” Jan. 2007. Web.
  8. “Doug Engelbart Video Archives.” Internet Archive. Web.
  9. “Highlights of the 1968 Demo.” Doug Engelbart Institute. Web.
  10. “Doug Engelbart 1968 Demo.” Mouse Site. Print.
  11. Barnard III, G. A., and L. Fein. “Organization and Retrieval of Records Generated in a Large-Scale Engineering Project.” Proceedings of the Eastern Joint Computer Conference. Philadelphia, PA: N.p., 1958. 59–63. Web.
  12. Bush, Vannevar. “As We May Think.” Atlantic Monthly (1945): n. pag. Web.
  13. Engelbart, D. C. “Background.” Augmenting Human Intellect: A Conceptual Framework. Menlo Park, CA: Stanford Research Institute, 1962. 47–72. Web.
  14. Lin, Charles. “Ascii vs. Binary Files.” CMSC 311: Computer Organization 2003. Web.
  15. Nelson, T. H. “Complex Information Processing: A File Structure for the Complex, the Changing and the Indeterminate.” ACM Press, 1965. 84–100. Web.

File as abstraction

  • A file is another layer of abstraction on top of the byte and the word
  • In memory addressing, “word” and “word size” are adapted from the domain of natural (human) language
  • “File” is adapted from the domain of paper documents
  • A file is a linear array (indexed sequence) of bytes stored on a durable medium
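
The idea of a file as an indexed sequence of bytes can be sketched in Python. This is only an illustration: the temporary directory and the file name foo.txt are assumptions chosen to match the shell examples in these notes.

```python
import os
import tempfile

# Hypothetical example file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "wb") as f:
    f.write(b"Foo, bar, foobarbarous\n")

with open(path, "rb") as f:
    data = f.read()            # the whole file as one sequence of bytes

length = len(data)             # number of bytes in the file
first_byte = data[0]           # indexing yields an integer from 0 to 255
bytes_5_to_7 = data[5:8]       # slicing yields a shorter byte sequence

print(length, first_byte, bytes_5_to_7)
```

Each position in the sequence holds one byte, and the position of a byte is its offset from the start of the file.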

File contents: data

  • Illustration: viewing a file with a hex viewer
  • Center columns: sixteen two-digit hex numbers arranged in four groups of four columns each
  • Rightmost column: ASCII characters
  • Leftmost column: hexadecimal offset (position within the file, not a memory address) of the first byte in the row
  • DO: Javascript PC Emulator

      # mkdir temp
      # cd temp
      # echo "Foo, bar, foobarbarous" > foo.txt
      # hexdump foo.txt
      # hexdump -C foo.txt
      # xxd -b foo.txt
    

(If Javascript PC Emulator provided the xxd utility, the final command would display the output in bits, that is, in binary notation)
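
The three-column layout of a hex viewer can be approximated in a few lines of Python. This is a rough sketch, not a reimplementation of hexdump or xxd: the function name hex_dump and its exact spacing are assumptions made for illustration.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as: offset, two-digit hex values, printable ASCII."""
    hex_width = width * 3 - 1  # room for `width` hex pairs plus separators
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in row)
        # Non-printable bytes are shown as "." in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        lines.append(f"{offset:08x}  {hex_part:<{hex_width}}  |{text_part}|")
    return "\n".join(lines)


dump = hex_dump(b"Foo, bar, foobarbarous\n")
print(dump)

# The same bytes in binary notation, eight bits per byte.
bits = " ".join(f"{b:08b}" for b in b"Foo")
print(bits)
```

The leftmost field is the offset of the first byte in each row, which is why the second row begins at 00000010 (sixteen in hexadecimal).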

File as container of data

  • The OS divides storage on durable media into fixed-size blocks (much as it divides main memory into pages)
  • The OS keeps track of which blocks belong to which file in filesystem data structures (much as a page table keeps track of memory pages)
  • A file larger than one block is distributed over more than one block
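
As a small worked example, the number of blocks a file occupies is its size divided by the block size, rounded up. The sketch below assumes a block size of 4096 bytes, which is common but not universal; the function name blocks_needed is hypothetical.

```python
def blocks_needed(file_size: int, block_size: int = 4096) -> int:
    """Number of whole blocks required to hold file_size bytes."""
    # Ceiling division: a partly filled block still occupies a whole block.
    return -(-file_size // block_size)


print(blocks_needed(23))      # a tiny file still takes one block
print(blocks_needed(4096))    # exactly one full block
print(blocks_needed(4097))    # one byte over spills into a second block
print(blocks_needed(10000))
```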

File elements

  1. An identifier data structure, usually with a numeric identifier
    • Unix-like OSs track files by their index nodes (inodes)
      • An inode contains the addresses of the storage blocks that hold the file, plus file attribute metadata (modification date, owner, access permissions, etc.)
    • An inode table associates inode numbers with their inodes, for reference
  2. A filename
    • A name given to the file either by the OS or by the user
    • Must be unique within a directory
    • The filesystem records a relationship between the inode and the filename
    • This relationship is recorded as an entry in a directory (folder), which pairs the filename with the inode number
  3. The file contents
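
The three elements above (identifier, filename, contents) can be inspected from Python on a Unix-like system. This is a minimal sketch: the temporary directory and the name foo.txt are assumptions, and the inode number printed will differ from one system to another.

```python
import os
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, "foo.txt")
with open(path, "w") as f:
    f.write("Foo, bar, foobarbarous\n")

# 1. The identifier: the inode number reported by the filesystem.
inode_number = os.stat(path).st_ino

# 2. The filename: the directory pairs each name with an inode number.
entries = {
    name: os.stat(os.path.join(directory, name)).st_ino
    for name in os.listdir(directory)
}

# 3. The contents.
with open(path) as f:
    contents = f.read()

print(inode_number)
print(entries)
print(contents, end="")
```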

Viewing file elements

  • DO: Javascript PC Emulator

      # mkdir temp
      # cd temp
      # echo "Foo, bar, foobarbarous" > foo.txt
      # ls -li foo.txt
      # cat foo.txt
    

Binary data

  • All data stored or transmitted by a computer
  • Not “human-readable” unless interpreted as text
  • Image files; audio files
  • Compiled executable program files (which may also contain ASCII or Unicode text)

Text data

  • Binary data that happens to be interpreted as text
  • ASCII or Unicode codes assigned to represent a certain character
  • “Human-readable”
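
The difference between binary data and text data lies in interpretation, not in the bytes themselves. A brief sketch, using three arbitrary byte values and the first four bytes of the PNG image signature as examples:

```python
raw = bytes([70, 111, 111])

as_numbers = list(raw)            # the bytes read as plain integers
as_text = raw.decode("ascii")     # the same bytes read as ASCII characters

# Unicode text is also stored as bytes; UTF-8 uses two bytes for "é".
encoded = "é".encode("utf-8")

# The start of a PNG image file: byte 0x89 has no ASCII meaning.
blob = bytes([0x89, 0x50, 0x4E, 0x47])
try:
    blob.decode("ascii")
    readable_as_ascii = True
except UnicodeDecodeError:
    readable_as_ascii = False

print(as_numbers, as_text, encoded, readable_as_ascii)
```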

Text file

  • A text file is a container for text data
  • POSIX definition
    1. “A file that contains [text] characters organized into zero or more lines” (terminated by a newline character)
    2. “The lines do not contain NUL characters” (used in data formats and programming languages to mark the end of a string)
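
The two conditions quoted above can be turned into a rough check. This is a simplified sketch under stated assumptions: the function name looks_like_posix_text is hypothetical, it does not verify that the bytes are valid characters in any encoding, and it ignores any limit on line length.

```python
def looks_like_posix_text(data: bytes) -> bool:
    """Rough check: no NUL bytes, and every line ends with a newline."""
    if b"\x00" in data:
        return False
    # Zero lines (an empty file) is allowed; otherwise the last line
    # must be terminated by a newline character.
    return data == b"" or data.endswith(b"\n")


print(looks_like_posix_text(b"Foo, bar, foobarbarous\n"))
print(looks_like_posix_text(b"no trailing newline"))
print(looks_like_posix_text(b"embedded\x00NUL\n"))
```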

Program as text file

#include <stdio.h>

int main(void)
{
    /* Print a greeting followed by a newline. */
    printf("hello, world!\n");
    return 0;
}

Compiling a program text file

(Javascript PC Emulator provides the file hello.c in its root directory; you don’t need to create it. tcc is a substitute for the gcc compiler, which Javascript PC Emulator also provides but which runs slowly)

Filesystem elements

  1. File
  2. Directory (or folder)
  3. Metadata

Filesystem elements: directory (folder)

Filesystem elements: metadata

  • File attribute metadata for individual files
  • Directory tables that store the names of the files in a directory
  • Memory and storage media attributes
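
File attribute metadata can be read without opening the file itself. The sketch below assumes a Unix-like system and uses a hypothetical foo.txt in a temporary directory; the owner ID and modification time it prints depend on the machine.

```python
import os
import stat
import tempfile
import time

directory = tempfile.mkdtemp()
path = os.path.join(directory, "foo.txt")
with open(path, "w") as f:
    f.write("Foo, bar, foobarbarous\n")

info = os.stat(path)

size = info.st_size                          # length in bytes
permissions = stat.filemode(info.st_mode)    # file type and access permissions
owner_id = info.st_uid                       # numeric ID of the owner
modified = time.ctime(info.st_mtime)         # modification date

# The file type is part of the metadata too.
is_regular_file = stat.S_ISREG(info.st_mode)
is_directory = stat.S_ISDIR(os.stat(directory).st_mode)

print(size, permissions, owner_id, modified)
print(is_regular_file, is_directory)
```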

Filesystem history: Bush’s memex

  • Social benefits of science
    1. Material: food, clothing, shelter, physical health
    2. Immaterial: knowledge
  • Conditions of modern progress
    1. Knowledge requires specialization
    2. Specialization is necessary for progress
  • Problems with specialization
    1. Fragments knowledge, and
    2. Produces knowledge in excess
  • “A transformation in scientific records”
    1. “Mechanical aids”
    2. “Mature,” “creative” thought

Indexing systems: arbitrary linear sequences

  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 10…
  • A, B, C, D, E, F, G, H, I, J, K, L…
  • “The human mind does not work that way. [Rather,] It operates by association” (Bush)

Bush: Associative indexing

Douglas Engelbart, NLS (“oN Line System”) demo (1968)

Full video and annotated segments, MouseSite, Stanford University

  • Clip #2, introduction
  • Clip #3, Word processing and file creation

Engelbart: File as “symbol structure”

Any file is a symbol structure whose purpose is to represent a variety of concepts and concept structures in a way that makes them maximally available and useful to the needs of the human’s mental-structure development […] The Memex adds a factor of speed and convenience to ordinary filing-system (symbol-structuring) processes that would encourage new methods of work by the user, and it also adds speed and convenience for processes not generally used before. (Engelbart)

Ted Nelson: A “dream file”

  • “The kinds of file structures required if we are to use the computer for personal files and as an adjunct to creativity are wholly different in character from those customary in business and scientific data processing” (Nelson)
  • “…the dream file: the file system that would have every feature a novelist or absent-minded professor could want, holding everything he wanted in just the complicated way he wanted it held, and handling notes and manuscripts in as subtle and complex ways as he wanted them handled” (Nelson)

Nelson: a file structure for change

  • “The physical universe is not all that decays. So do abstractions and categories. Human ideas, scholarship, and language are constantly collapsing and unfolding. … While [this process] may be erratic, it never stops; and the meaning of all this for information retrieval should be clear” (Nelson)
  • “To the extent that information retrieval is concerned with seeking true or ideal or permanent codes and categories… [it] seems to me to be fundamentally mistaken. The categories are chimerical (or temporal) and our categorization systems must evolve as they do. … [T]he only ultimate structure is change itself” (Nelson)