data center chronicles
Volume I, Issue 3

Welcome to the third issue of the Southern California Earthquake Data Center's electronic newsletter. We produce this quarterly newsletter as part of our continuing efforts to make SCEDC data more accessible to our users, to improve our communication and outreach and to promote the tools and services we provide.

This newsletter will be archived at: www.data.scec.org/about/chronicle/. Please send your questions, comments and suggestions to: webmgr@quakedc.gps.caltech.edu.

Contents:
A. The Archive
B. What's new with STP (Seismic Transfer Program)?
C. Dataless SEED volumes now Available at the SCEDC
D. DHI at the SCEDC
E. Highlight: The SCEDC/SCSN Database System
F. Strong-Motion Naming and Aliases
G. USArray - BigFoot


A. The Archive

The Archive: By the Numbers

Total size of the waveform archive: 3,449 GB
Size of SCEDC parametric and waveform database: 235,647,421 rows

For the period of April 1 - June 30, 2004 :

Data transferred via STP:

  • 1,695,677 waveforms = average of 18,768 waveforms daily.
  • 204 gigabytes of waveform data = average of 2,250 megabytes daily = 26 kilobytes per second.
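The per-second figure follows from the daily average by dividing by the number of seconds in a day. A minimal sketch of that conversion, assuming decimal units (1 megabyte = 1,000 kilobytes); the function name is illustrative only:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def kilobytes_per_second(megabytes_per_day: float) -> float:
    """Convert a daily transfer volume in MB to an average rate in KB/s."""
    return megabytes_per_day * 1000 / SECONDS_PER_DAY

# Daily average quoted above: 2,250 megabytes.
rate = kilobytes_per_second(2250)
print(f"{rate:.0f} kilobytes per second")
```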

The SCEDC archived:

  • 4,045 events
  • 935,082 waveforms
  • 70,356 arrivals
  • 195,284 amplitudes

  MAGNITUDE   NUMBER OF LOCAL EVENTS (le)
  0-1   1236
  1-2   1801
  2-3   241
  3-4   23
  4-5   1
  5-6   1



  # OF EVENTS   EVENT TYPE
  3304   le (local event)
  191   qb (quarry blast)
  323   re (regional event)
  26   sn (sonic blast)
  1   st (subnet trigger)
  200   ts (teleseism)
  4045   TOTAL
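The per-type counts can be cross-checked against the stated total, and each type's share of the archive computed, with a few lines of Python. This is only a sketch; the counts are copied from the event-type figures above and the dictionary name is illustrative:

```python
# Event counts by type, copied from the event-type figures above.
event_counts = {
    "le": 3304,  # local event
    "qb": 191,   # quarry blast
    "re": 323,   # regional event
    "sn": 26,    # sonic blast
    "st": 1,     # subnet trigger
    "ts": 200,   # teleseism
}

total = sum(event_counts.values())
print("TOTAL", total)

# Share of each event type in the quarter's archived events.
for code, count in event_counts.items():
    print(f"{code}: {count / total:.1%}")
```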


Six month summary of requests for catalog information:

  MONTH   REQUESTS
  Jan   102,347
  Feb   84,199
  March   154,047
  April   96,764
  May   45,025
  June   103,796
  Total   586,178



Continuous Archiving of High-Sample Rate Data

The SCEDC continuously archived 7 hours of HH_, HL_ (80 sps) and EH_, EL_ (100 sps) data from the entire CI and AZ array for the June 15th, 2004 magnitude 5.3 offshore event (42 miles SE of San Clemente Island; EVID: 14065544).

More information on this topic is available at http://www.data.scec.org/about/sigeventsshot.html


B. What's new with STP (Seismic Transfer Program)?

New STP Client - Version 1.4 for Windows

In response to requests from the user community, we have recently released an STP console client for Windows. This client is virtually identical to the UNIX and Linux versions, but it operates in the Windows environment and allows users to download SCEDC data directly onto their PC.

To get STP 1.4 for Windows:

  1. go to http://www.data.scec.org/ftp/programs/stp/
  2. left click on stp.exe
  3. save file to disk
  4. double-click on stp.exe

Any data downloaded will be saved into the same directory that you run stp.exe from. As an example, try:
STP> PHASE -f northridge.txt -e 3144585

This command saves a file called northridge.txt (the -f option) to your working directory. The file contains phase information for event 3144585 (the -e option), the event ID of the magnitude 6.7 Northridge earthquake.
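When phase data is needed for several events, the command lines can be prepared ahead of time. The sketch below only formats text to type at the STP> prompt, using the PHASE syntax shown above; it does not talk to STP itself. The helper name and the second output file name are made up for illustration, and the second event ID is the June 15, 2004 event mentioned earlier in this issue.

```python
def phase_command(event_id: int, filename: str) -> str:
    """Build an STP PHASE command that saves phase data for one event."""
    return f"PHASE -f {filename} -e {event_id}"

# Event IDs taken from this newsletter; san_clemente.txt is a made-up file name.
requests = {
    3144585: "northridge.txt",
    14065544: "san_clemente.txt",
}

for evid, fname in requests.items():
    print(phase_command(evid, fname))
```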


The SCEDC has also developed a GUI version of STP that runs on Windows. To get the GUI version of STP 1.4 for Windows:

  1. go to http://www.data.scec.org/ftp/programs/stp/
  2. left click on stp_gui.exe
  3. save file to disk
  4. double-click on stp_gui.exe

The Windows console client version functions similarly to the UNIX and Linux client version, while the GUI version looks similar to the Java version of STP that runs on the SCEDC website. If you experience any difficulties using either client program, please let us know by emailing: mullaney@gps.caltech.edu.

Differences between the Windows and UNIX/Linux Client Versions.

Although the Windows client version of STP looks and works almost exactly like the UNIX and Linux client versions, and most of the code was carried over unchanged, there is a significant difference in how the programs communicate. In the UNIX and Linux versions of STP, the client communicates with the server by sending files to, and receiving files from, the server. Because this method would not work on Windows, it was necessary to use raw socket functions to communicate with the server instead. For STP to work on a Windows platform, it was therefore necessary to set up a Windows Socket (Winsock), the network programming interface for Windows.
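To illustrate the general idea of a client exchanging messages with a server over a plain TCP socket, here is a minimal Python sketch. It is not the STP wire protocol: the one-line request, the "OK" reply format and the stand-in server are all invented for the example. Python's socket module is used because it runs on top of Winsock on Windows and BSD sockets on UNIX and Linux.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read a line, and send back a reply."""
    conn, _ = listener.accept()
    with conn:
        request = conn.makefile("r", encoding="ascii").readline().strip()
        conn.sendall(f"OK {request}\n".encode("ascii"))

def send_request(host: str, port: int, line: str) -> str:
    """Open a TCP socket, send one line, and return the one-line reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((line + "\n").encode("ascii"))
        return sock.makefile("r", encoding="ascii").readline().strip()

# Stand-in server on the loopback interface; port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(listener,))
thread.start()

reply = send_request("127.0.0.1", port, "PHASE -e 3144585")
thread.join()
listener.close()
print(reply)
```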


C. Dataless SEED volumes now Available at the SCEDC

The SCEDC and SCSN have cooperated to complete the production of station metadata in the form of dataless SEED volumes for the present configuration of all currently-active SCSN broadband stations. A listing of the stations available and links to the volumes are available from the SCEDC website at http://www.data.scec.org/stations/seed/dl_seed.php. This effort is being expanded to provide a complete station history for all SCSN stations.

Users can download individual dataless SEED volumes (format: dataless.STANAME) from the Data Center's anonymous FTP site at: scec.gps.caltech.edu from /pub/stations/seed/ or via the web at: http://www.data.scec.org/ftp/stations/seed/. A compressed file containing all volumes (CI.dataless.gz) is available from the same location. ASCII RESP files are also available for individual stations and channels at the anonymous FTP site from /pub/stations/response/ or via the web at: http://www.data.scec.org/ftp/stations/response/.


D. DHI at the SCEDC

Work is currently underway to install a Data Handling Interface at the SCEDC. The Data Handling Interface (DHI) provides well-defined standardized methods to remotely access information from the SCEDC and other data centers worldwide. The DHI can be thought of as an Application Programming Interface (API) that can be used as a well-specified, standardized interface to any seismic data center. There are three different DHI servers being installed at the SCEDC: a Network Information Server (Station/Channel/Response information), a Seismogram Server, and an Event Server. The Network server is installed and running and the Seismogram Server is in the final testing stages. Once the Seismogram Server is installed, work will begin on the Event Server installation.

The DHI Servers are an offshoot of the FISSURES project supported by the IRIS DMS. FISSURES uses the distributed computing technology CORBA (Common Object Request Broker Architecture) to allow software systems to work across the Internet in a platform-independent and computer-language neutral manner. In the DHI, CORBA manages the socket connections, creating robust, reliable connections between clients and servers. By writing clients that can access information from a DHI server, one may easily access similar information from any data center that has DHI servers installed. Currently, DHI servers are running at the IRIS DMC, the NCEDC and the University of South Carolina.

For more information about the SCEDC DHI servers, please refer to: http://www.data.scec.org/research/DHI.html. This page will be updated as the status of the DHI servers progresses. General information about the DHI project is available directly from IRIS at: http://www.iris.edu/DHI/.

This work is supported by the IRIS DMS as part of its role in the NSF-funded SCEC-ITR project and has been facilitated by the prior efforts of the IRIS DMC, the NCEDC and the University of South Carolina.


E. Highlight: The SCEDC/SCSN Database System

The SCEDC Oracle 9i database is part of a database system that is used by the Data Center and the Southern California Seismic Network’s Real-Time System (RTS). SCSN data is processed by the RTS and events and supporting parametric information are immediately copied to the SCEDC database. Therefore, in addition to providing long-term storage and catalog information for the SCSN, the SCEDC database is also the source of information for network alarming, post-processing analysis and applications such as ShakeMap immediately following an earthquake.

The database system was designed as part of the TriNet project in 1999 with the following fundamental requirements:

  • Data from the RTS would be available to the archive in near-real time.
  • The system must operate 24/7, with unavailability due to maintenance or failures in software, hardware, and network connectivity minimized.
  • Rapid query access from a very large data set that includes events, locations, arrivals, amplitudes, codas and waveforms for southern California from 1932-present.

To achieve these design goals, our system is set up as follows:

  • The RTS has two servers, each with its own local database: one is primary, the other operates as a shadow. Event information from the RTS databases is replicated to the SCEDC databases within 4 seconds, and applications accessing the SCEDC database use data generated by the primary system.
  • The SCEDC has two independent databases on two separate servers that are continually synchronized with one another.
  • The two sets of RTS and SCEDC servers are housed in separate buildings: the USGS building in Pasadena and in the Seismo Lab at Caltech. The systems can operate independently from either site.
  • The most common queries done on the database are for parameters of the most recent earthquakes (magnitude, location, time) and associated waveforms. The most frequent queries are done by SCEDC/SCSN internal applications, which poll the database at regular intervals to get the most up-to-date information about new events. Other common queries are catalog searches made by the public and researchers, either through the web catalog on the SCEDC website or STP. These queries also request parametric and waveform data, but may span a long period of time. As a result, the database schema has been specifically designed to optimize for both types of searches, and a number of database indexes have been created to increase performance. In fact, indexes account for 43% of the space used by objects in the SCEDC database.
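The two query patterns described in the last item above, frequent polling for new events and catalog searches over long time spans, can both be served by an index on event time. The following is a rough sketch only: SQLite (via Python's sqlite3 module) stands in for Oracle, and the table, column names and rows are invented for illustration rather than taken from the SCEDC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (evid INTEGER PRIMARY KEY, origin_time TEXT, magnitude REAL)"
)
# Index on origin time so "most recent events" lookups avoid a full table scan.
conn.execute("CREATE INDEX event_time_idx ON event (origin_time)")

# Invented rows, for illustration only.
conn.executemany(
    "INSERT INTO event VALUES (?, ?, ?)",
    [
        (1, "2004-06-14T08:00:00", 1.2),
        (2, "2004-06-15T22:00:00", 5.3),
        (3, "2004-06-16T03:10:00", 2.1),
    ],
)

def events_since(last_seen: str):
    """Polling query: events newer than the last origin time already seen."""
    return conn.execute(
        "SELECT evid, origin_time, magnitude FROM event "
        "WHERE origin_time > ? ORDER BY origin_time",
        (last_seen,),
    ).fetchall()

def events_between(start: str, end: str, min_mag: float):
    """Catalog search: events in a time window at or above a magnitude."""
    return conn.execute(
        "SELECT evid FROM event "
        "WHERE origin_time BETWEEN ? AND ? AND magnitude >= ? ORDER BY origin_time",
        (start, end, min_mag),
    ).fetchall()

print(events_since("2004-06-15T00:00:00"))
print(events_between("2004-01-01T00:00:00", "2004-12-31T23:59:59", 2.0))
```

A polling application would remember the latest origin time it has seen and pass it to events_since on the next poll.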

Why use Oracle?

  • Caltech owns an Oracle Enterprise site license that provides the database server software and Advanced Replication feature. The SCEDC pays Oracle directly for licensing the partitioning feature.
  • Oracle allows objects such as tables and indexes to be partitioned, i.e., objects are divided into smaller, more manageable portions. The SCEDC database is currently 49 gigabytes and the largest table (waveform) is 9 gigabytes. The tables that contain waveform, amplitude and arrival data are partitioned by year, so users can query the entire table, or they can reference a smaller piece of the table, which significantly improves performance. Partitioning also reduces maintenance effort because administration can be focused on particular portions of tables, dividing the maintenance process into more manageable segments.
  • Oracle database software with Advanced Replication allows our system to have multiple, continually-synchronized databases. Oracle also provides stored procedures and integration with Java, which is used by post-processing applications. Further information on Advanced Replication is included at the end of this article.
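The performance benefit of partitioning by year comes from scanning only the partitions a query needs. The toy Python sketch below illustrates the idea with an in-memory structure; it is not how Oracle implements partitioning, and the class, field names and data are invented for the example.

```python
from collections import defaultdict

class YearPartitionedTable:
    """Toy table whose rows are stored in one bucket (partition) per year."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, year: int, row: dict) -> None:
        self.partitions[year].append(row)

    def query(self, predicate, years=None):
        """Scan only the named partitions, or every partition if none are given."""
        selected = list(self.partitions) if years is None else years
        scanned = 0
        matches = []
        for year in selected:
            for row in self.partitions.get(year, []):
                scanned += 1
                if predicate(row):
                    matches.append(row)
        return matches, scanned

# Invented data: 1,000 amplitude rows in each of three years.
table = YearPartitionedTable()
for year in (1994, 2003, 2004):
    for i in range(1000):
        table.insert(year, {"year": year, "amp": i})

def is_large(row):
    return row["amp"] >= 990

_, scanned_all = table.query(is_large)                   # whole table
hits, scanned_2004 = table.query(is_large, years=[2004])  # one partition
print(scanned_all, scanned_2004, len(hits))
```

Restricting the query to the 2004 partition examines a third of the rows in this example while returning every matching row for that year.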

Looking to the future, we are exploring a switch to an open-source database system such as MySQL for our main systems. Clearly, migrating to a system with the same functionality and performance at a fraction of the cost is desirable and we have been impressed by the speed of MySQL in our performance tests. However, the current production release of MySQL (4.0) lacks a number of features that are used heavily by our system:

  • Multi-master replication
  • Stored procedures
  • Views
  • Triggers
  • Sequences

Many of these features are slated to be included in the 5.1 release. In the meantime, we continue to monitor new developments, including those in PostgreSQL (pgSQL).

Oracle Advanced Replication

The SCEDC/SCSN database system uses Oracle's Advanced Replication feature to replicate data among four databases. Each of our databases has a separate copy of the data. When any transactional statement (such as an insert, update or delete) is executed on one database, that database sends the same instructions to the other databases for them to perform on their own data.

The system employs both one-way and two-way (also known as "multi-master") replication. The RTS-to-SCEDC replication is one-way: the source database (RTS) pushes the data to the target database, but does not receive updates from the SCEDC databases. Data on the RTS are kept for one week before they are purged. The SCEDC archive databases use two-way, multi-master replication to push updates from either database to the other; i.e., the target database is also a source, so the two databases are synchronized.

Advanced Replication can be thought of as a collection of tables, stored procedures, and triggers in the database. When a transactional statement, an insert for example, is executed on a replicated table, it sets off a trigger (a program stored inside the database) that instructs the database to store all the information needed to execute the original insert statement in a queue, which is also stored inside the database. At regular intervals (every 4 seconds), an Oracle job is executed to look for any outstanding transactions in the queue. If any are found, they are pushed to the remote database site. If this push is successful, the database marks the transaction as sent. Another Oracle job (executed every 10 minutes) then removes all sent transactions.
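The queue, push and purge steps described above can be sketched as a small Python model. This is an illustration of the mechanism only, not Oracle's implementation: the class and method names are invented, the replication is one-way, and the push and purge jobs are called by hand rather than on 4-second and 10-minute schedules.

```python
from dataclasses import dataclass, field

@dataclass
class QueuedTransaction:
    statement: str
    sent: bool = False

@dataclass
class Database:
    name: str
    online: bool = True
    rows: list = field(default_factory=list)
    queue: list = field(default_factory=list)  # deferred transaction queue

    def insert(self, row: str) -> None:
        """Apply a local insert; the 'trigger' also records it in the queue."""
        self.rows.append(row)
        self.queue.append(QueuedTransaction(statement=row))

    def push(self, target: "Database") -> int:
        """Push job: send unsent transactions, marking each one sent on success."""
        pushed = 0
        for txn in self.queue:
            if txn.sent or not target.online:
                continue
            target.rows.append(txn.statement)  # replay on the remote copy
            txn.sent = True
            pushed += 1
        return pushed

    def purge(self) -> None:
        """Purge job: drop transactions that have already been sent."""
        self.queue = [txn for txn in self.queue if not txn.sent]

rts = Database("rts")
archive = Database("archive", online=False)
rts.insert("event 1")
rts.insert("event 2")

first_push = rts.push(archive)   # target offline: transactions stay queued
archive.online = True
second_push = rts.push(archive)  # queued transactions are delivered later
rts.purge()
print(first_push, second_push, archive.rows, len(rts.queue))
```

Because unsent transactions remain in the queue, a target that was unavailable simply receives them on a later push.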

Benefits:

  • The ability to have two database archives that are continually synchronized allows the Data Center to load-balance applications, which provides better performance.
  • Having two independent databases on separate servers allows for the possibility of failover if one database should become unavailable. Having replicated data means that each database has its own copy of the data, so if the database becomes disconnected from the system (e.g., in a network outage) the local database objects are still accessible.
  • By storing transactions in a queue, the system also has the ability to send these transactions to the target database at a later time, allowing the target database to resynchronize gracefully. Because the process of manually synchronizing databases can be time consuming, this functionality has proven to be very useful when a database becomes unavailable due to maintenance or unforeseen failure.
  • The Advanced Replication feature allows for interoperability, which means that the databases and servers do not have to be at the same version level or operating system (within limits). This allows the DBA flexibility in upgrading database versions and flexibility in choosing operating systems. For example, the SCEDC is currently testing an Oracle 10g development database on a Linux platform within our Oracle 9i Solaris system. It is also fairly easy to add additional databases and/or replicated objects in this system. For example, as part of our efforts to integrate with the NCEDC in Berkeley, the SCEDC is using this method of replication to share station data.

Costs:

  • Synchronizing databases every 4 seconds requires a substantial amount of database resource overhead. Especially costly are large batch operations in which several million rows are affected. Although normal transactions involving the seismic network never exceed 10,000 rows per transaction, activities such as legacy data migration or data quality control can severely impact system performance, so they are usually done with replication temporarily suspended.
  • There is also added administrative cost needed to maintain replication triggers and stored procedures. Simple maintenance operations on a single database, such as altering table structure, become significantly more complicated within a replicated environment. Failure to run these procedures properly can result in the object being unavailable for update on all databases, not just the original target database.



F. Strong-Motion Naming and Aliases

The SCEDC archives strong-motion data from the National Strong-Motion Program (NSMP; network code NP) and the California Strong-Motion Instrumentation Program (CSMIP; network code CE). These organizations identify their stations with a numerical code which previously could not be processed by the SCSN real-time and post-processing systems. To work around this problem, the SCSN assigned an alias to each of these stations until a method of processing numerical station-names was developed.

A solution was recently implemented by the SCSN and most new strong-motion data is now available under the numerical name assigned by its originating network. In the short term, users will need to be aware of both the numerical name and the alias applied by the SCSN. In the future, we aim to serve the data only under its numerical station identifier. The list of aliases is available at: http://www.data.scec.org/stations/stamapping.html

  NET   ALIAS   NUMBER   LOCATION DESCRIPTION
  CE   400K   24400   East Los Angeles, Obregon Park
  CE   G405   14405   Rolling Hills Estates, Vista School
  CE   J732   23732   San Bernardino, Devils Canyon Rd.
  CE   K851   24851   Los Angeles, W. 3rd & Cloverdale
  CE   K853   24853   Los Angeles, W. Temple & N. Virgil
  NP   BBA   5398   Burbank, Burbank Airport
  NP   BBB   5271   Bombay Beach, Hwy 111
  NP   BVH   5402   Beverly Hills, Civic Ctr and Foothill
  NP   CAB   5404   Calabasas, Pk Sorrento and Pk Granada
  NP   FLL   5401   Fillmore, Santa Clara & Chamberburg Rd
  NP   GRF   141   Los Angeles, Griffith Observatory
  NP   JAB   655   Sylmar, Balboa Blvd.
  NP   JFP   655   Sylmar, Balboa Blvd.
  NP   JGB   655   Sylmar, Balboa Blvd.
  NP   LAX   5399   Los Angeles International Airport
  NP   LT2   5030   Little Rock, Off Pearblossom Hwy (138)
  NP   OKV   5403   Oak View, Hwy. 33
  NP   SSW   5062   Calipatria, Salton Sea Wild Life Refuge
  NP   TCF   5081   Fernwood, Topanga Canyon Blvd.
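While both naming schemes are in use, a small lookup table can help find every name under which a station's data may appear. The sketch below uses a subset of the rows from the table above; the function and variable names are illustrative only. Note that the reverse lookup must allow several aliases per number, since JAB, JFP and JGB all map to 655.

```python
from collections import defaultdict

# (network, alias) -> numerical station code; a subset of the table above.
ALIAS_TO_NUMBER = {
    ("CE", "400K"): "24400",
    ("CE", "G405"): "14405",
    ("NP", "BBA"): "5398",
    ("NP", "GRF"): "141",
    ("NP", "JAB"): "655",
    ("NP", "JFP"): "655",
    ("NP", "JGB"): "655",
    ("NP", "LAX"): "5399",
}

# Reverse lookup; a list is needed because several aliases can share one number.
NUMBER_TO_ALIASES = defaultdict(list)
for (net, alias), number in ALIAS_TO_NUMBER.items():
    NUMBER_TO_ALIASES[(net, number)].append(alias)

def names_for(net: str, name: str) -> set:
    """Return every name (alias or number) under which a station may appear."""
    if (net, name) in ALIAS_TO_NUMBER:
        number = ALIAS_TO_NUMBER[(net, name)]
    else:
        number = name  # assume the caller passed the numerical code
    return {number, *NUMBER_TO_ALIASES.get((net, number), [])}

print(sorted(names_for("NP", "JAB")))
print(sorted(names_for("CE", "24400")))
```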



G. USArray - BigFoot

The transportable array component of USArray ("Bigfoot") formally began operation in California in January, 2004 and will stay until 2007. The southern California contribution to USArray includes the 40 currently-operating SCSN broadband stations listed below. The 40 sps BH_ data from these stations will be transmitted from the SCSN facility in Pasadena to both the Array Network Facility (ANF) and the IRIS Data Management Center (DMC) for archiving.

SCSN stations contributing to USArray:

  STA   Station Name   Latitude   Longitude   Datalogger
 ARV  Arvin  35.1269  -118.83009  Q330
 BBR  Big Bear Solar Observatory  34.2623  -116.92075  Q730
 BC3  Big Chuckawalla Mountains  33.65515  -115.45366  Q4120
 BCC  Bear Creek Country Club  33.57508  -117.26119  Q730
 BEL  Belle Mountain  34.0006  -115.9982  Q730
 BFS  Mt. Baldy Ranger Station  34.237  -117.6582  Q730
 CIA  Catalina Island Airport  33.40186  -118.41372  Q4120
 CWC  Cottonwood Creek  36.43988  -118.08016  Q680
 DAN  Danby  34.63745  -115.38115  Q4120
 DEC  Green Verdugo Microwave  34.25353  -118.33383  Q730
 DVT  Desert View Tower  32.65915  -116.10061  Q730
 EDW2  Edwards Air Force Base 2  34.8811  -117.99388  Q330
 FMP  Fort Macarthur Park  33.71264  -118.29381  Q730
 FUR  Furnace Creek  36.46703  -116.86322  Q4120
 GLA  Glamis  33.05149  -114.82706  Q980
 GRA  Grapevine Ranger Station  36.99608  -117.36621  Q730
 GSC  Goldstone  35.30177  -116.80574  Q4120
 HEC  Hector  34.8294  -116.335  Q4120
 IRM  Iron Mountain Pumping Station  34.15738  -115.14513  Q4120
 ISA  Isabella  35.66278  -118.47403  Q4120
 LGU  Laguna Peak  34.10819  -119.06587  Q4120
 LRL  Laurel Mountain  35.47954  -117.68212  Q4120
 MPM  Manual Prospect Mine  36.05799  -117.48901  Q330
 MPP  McPherson Peak  34.88848  -119.81362  Q730
 NEE  NEEDLES  34.82482  -114.59942  Q980
 OSI  Osito Audit  34.6145  -118.7235  Q980
 PDM  Parker Dam  34.30336  -114.14152  Q4120
 RCT  Rector  36.30523  -119.243842  Q730
 RRX  Barstow Service Center  34.87533  -116.99684  Q4120
 SBC  Santa Barbara  34.44076  -119.71492  Q680
 SCI2  San Clemente Island 2  32.9799  -118.54697  Q330
 SCZ2  Santa Cruz Island 2  33.99543  -119.6351  Q330
 SDP  Sudden Peak  34.56547  -120.50137  Q730
 SHO  Shoshone  35.89953  -116.2753  Q4120
 SMM  Simmler  35.3142  -119.99581  Q730
 SNCC  San Nicolas Island  33.248  -119.524  Q980
 SWS  SAM W. STEWART  32.9408  -115.7958  Q4120
 TIN  Tinemaha  37.05422  -118.23009  Q4120
 TUQ  Turquoise Mountain  35.43584  -115.92389  Q4120
 VES  Vestal  35.84089  -119.08469  Q4120



