Data Directory

From GovTrack.us Wiki


GovTrack's data directory (described in general terms on a separate page of this wiki) has the following layout:

General Notes

When I say "session," I mean the two-year period usually called "a Congress," such as the 109th Congress for 2005-2006. For me, that's session 109.
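Because sessions are numbered consecutively from the 1st Congress, which convened in 1789, a session number maps to its two calendar years with simple arithmetic: session n starts in 1789 + 2(n - 1). The Python sketch below is illustrative only; the function names are hypothetical, and the year-to-session direction is a simplification (modern Congresses begin on January 3, and early ones ran from March to March).

```python
from typing import Tuple

FIRST_CONGRESS_YEAR = 1789  # the 1st Congress convened in 1789


def session_years(session: int) -> Tuple[int, int]:
    """Return the two calendar years spanned by a session (hypothetical helper)."""
    if session < 1:
        raise ValueError("session numbers start at 1")
    start = FIRST_CONGRESS_YEAR + 2 * (session - 1)
    return start, start + 1


def session_for_year(year: int) -> int:
    """Return the session whose two-year span contains `year`.

    Simplification: ignores the exact start day of each Congress, so
    January 1-2 of an odd year are counted toward the new session.
    """
    if year < FIRST_CONGRESS_YEAR:
        raise ValueError("there was no Congress before 1789")
    return (year - FIRST_CONGRESS_YEAR) // 2 + 1
```

For example, `session_years(109)` gives `(2005, 2006)`, and `session_for_year(2006)` gives `109`.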

Directory Layout and Links to (Informal) Schemas

  • misc/: Miscellaneous data files.
    • bills.technorati.xml: A list of bills ranked by popularity, based on how often they are mentioned in the blogosphere. Each bill "id" consists of the bill type, the Congress (session) number, a dash, and the bill number (see Bill XML for details).
  • us/: This is the main root for all Congressional data.
    • people.xml: Members of Congress (all time)
    • bioguide[1,2,3].csv: A dump of the BioGuide database. Not regularly updated.
    • last_update: The date (YYYY-MM-DD) of the most recent Daily Digest update from THOMAS that GovTrack has retrieved.
    • liv.xml: An XML-ized version of the Legislative Indexing Vocabulary.
    • bills.text/: A directory containing PDF, TXT, XML, and HTML texts of bills.
    • 106...111 (session number): Primary area for legislative data for a session of Congress
      • committees.xml: All current committees and committee membership.
      • committeeschedule.xml: Upcoming committee meetings, taken from the Senate's committee schedule page for the Senate and from the Daily Digest on THOMAS for the House.
      • votes.all.index.xml: A summary of all votes this session.
      • bills/: Full status information for every bill.
      • bills.amdt/: Full status information for every amendment.
      • bills.summary/: XML-ized CRS summaries for bills.
      • bills.cbo/: Congressional Budget Office bill reports, with extracted summaries.
      • cr/: The Congressional Record.
      • gen.rolls-[cart,geo,pca]/: Generated information for votes: regular projection maps (geo), cartograms (cart), and analysis (pca; no PCA statistics are currently computed, though). The analysis .txt files contain a simple summary of how each party voted.
      • rolls/: Roll call votes.
      • people.xml: Members of Congress in this session
  • photos/
    • This directory contains JPEG images of Members of Congress, past and present. Not all Members of Congress have photos. Each file is named with the person's GovTrack numeric identifier, followed by either nothing (for the largest original image available) or one of 200px, 100px, or 50px (for three copies resized by width), and then .jpeg.
  • rdf/
    • This directory contains an RDF dump of the other data. The RDF dump is not regularly updated right now.
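The bill "id" format described for bills.technorati.xml (bill type, Congress number, a dash, bill number) can be split apart with a regular expression. This Python sketch assumes the bill type is a run of lowercase letters and that the two numbers are plain decimal digits; that shape, the example id `h111-3962`, and the `BillId`/`parse_bill_id` names are hypothetical, so check them against the Bill XML page before relying on them.

```python
import re
from typing import NamedTuple

# Assumed shape: lowercase letters (type), digits (Congress), "-", digits (number).
BILL_ID_RE = re.compile(r"([a-z]+)(\d+)-(\d+)")


class BillId(NamedTuple):
    bill_type: str  # e.g. "h" or "s" (assumed codes)
    session: int    # the Congress number, e.g. 111
    number: int     # the bill number within that session


def parse_bill_id(bill_id: str) -> BillId:
    """Split a bill id such as "h111-3962" into its three parts."""
    m = BILL_ID_RE.fullmatch(bill_id.strip())
    if m is None:
        raise ValueError(f"unrecognized bill id: {bill_id!r}")
    bill_type, session, number = m.groups()
    return BillId(bill_type, int(session), int(number))


def format_bill_id(bill: BillId) -> str:
    """Inverse of parse_bill_id: rebuild the id string."""
    return f"{bill.bill_type}{bill.session}-{bill.number}"
```

The regex leans on greedy matching: `[a-z]+` stops at the first digit, so multi-letter types such as `hr` parse the same way as single-letter ones.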
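Since us/last_update holds a single YYYY-MM-DD date, a mirror script could read it to decide whether the data has gone stale. This is a minimal Python sketch; the local path passed to `read_last_update` and the two-day `max_age_days` threshold are hypothetical choices, not GovTrack conventions.

```python
from datetime import date
from pathlib import Path


def parse_last_update(text: str) -> date:
    """Parse the YYYY-MM-DD contents of us/last_update."""
    return date.fromisoformat(text.strip())


def read_last_update(path: Path) -> date:
    """Read and parse a local copy of us/last_update (path is up to the caller)."""
    return parse_last_update(path.read_text(encoding="ascii"))


def is_stale(last: date, today: date, max_age_days: int = 2) -> bool:
    """True if the Daily Digest data is older than max_age_days (hypothetical threshold)."""
    return (today - last).days > max_age_days
```

Keeping the parsing separate from the file read makes the date handling easy to check without touching the filesystem.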
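The photo naming rule under photos/ can be expressed as a small path builder. In this Python sketch the hyphen between the identifier and the width (as in `400001-200px.jpeg`) is an assumption, exposed as the `sep` parameter so it can be changed; the person id 400001 and the `photo_path` name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

PHOTO_WIDTHS = (200, 100, 50)  # the three resized widths described above


def photo_path(person_id: int, width: Optional[int] = None,
               sep: str = "-", root: str = "photos") -> PurePosixPath:
    """Build the relative path of a Member's photo.

    width=None selects the largest original image; otherwise width must be
    one of PHOTO_WIDTHS. The separator `sep` is an assumption.
    """
    if width is None:
        name = f"{person_id}.jpeg"
    elif width in PHOTO_WIDTHS:
        name = f"{person_id}{sep}{width}px.jpeg"
    else:
        raise ValueError(f"no {width}px photos; expected one of {PHOTO_WIDTHS}")
    return PurePosixPath(root) / name
```

Because not every Member of Congress has a photo, callers should still check that the resulting file exists before using it.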