DataLoader
Introduction
The purpose of this application is to extract data from properly formatted ASCII text, human readable, tab delimited files and upload the data to a database. For the LEO project we have chosen to use Oracle as our database.
This application is written entirely in Java (jre1.8.0_25) and requires Oracle Database JDBC Driver 12c (ojdbc7).
The application and dependencies are packed into Loader.jar which can be run from any computer with jre 1.8 or higher installed using the command:
java -jar Loader.jar
The main method does not take any parameters and should be run with an account with sufficient write privileges to all folders accessed by the application.
Configuration File
This application requires the configuration file “Dataloader.config” to exist in the application directory and be properly formatted. The format is a java.util.Properties “Configuration File”. Each configuration item is first defined by name (e.g. “configoption1”) and is equal to some value:
Configoption1 = “the value of config option 1”
Sections, used solely for reference, are displayed in brackets [section1]. A sample configuration file is shown below:
[General]
schemas = leo_mini,leo_general,leo_east
sources = c:\\temp\\in1,c:\\temp\\in2,c:\\temp\\in3
output = c:\\temp\\out
[Database]
user = mydatabaseusername
password = mydatabasepassword
tns = jdbc:oracle:thin:@mydatabase.arizona.edu:1521:dbsid
[Logging]
log = e:\\web
web = e:\\web
schemas | Comma separated list of schemas to use (each must exist within the database and must be accessible from the log-on account) |
sources | Comma separated list of folders containing measurement files (windows paths require the double ' ' since the string interprets the first backslash as an escape character |
output | The path for output/processed files |
user | Database user account for accessing the database (and schemas) |
password | Database password for the user account |
tns | TNS path for accessing the database |
log | Path to write the log file |
web | Path to write the files heartbeat.xml and sensorlist.xml |
Application Dependencies
This application requires the Java runtime environment (JRE1.8 or greater) be installed on the host machine.
The following SQL queries are executed:
Select SENSORID(INT), SENSORCODE(INT), SENSORTYPE(STR), METHODDESCRIPTION(STR), STATUS(CHR), LOGGERNAME(STR), MODULE(STR), CHANNEL(STR), ADDRESS(STR) from VIEW_SENSORS
Select CALIBRATIONPARAMETERID(INT), PARAMETERCODE(STR), PARAMETERVALUE(DBL) from CALIBRATIONPARAMETERS where sensorid = XXXX
This query is executed for each sensor
Select SENSORTYPE(STR), VARIABLEINDEX(INT), INVARIABLEID(INT), VARIABLENAME(STR), VARIABLEUNIT(STR), NODATAVALUE(DBL), VALIDRANGEFROM(DBL), VALIDRANGETO(DBL) from VIEW_INPUT_VARIABLES_METADATA
Insert into DATAVALUES (datavalue, localdatetime, sensorid, variableid) values (AAAA, to_date(‘BBBB/BB/BB BB:BB ', 'YYYY/MM/DD HH24:MI'), CC, DD)
** Where A, B, C, D are the actual values required **
Measurement file format
This application will only process properly formatted JSON files.
The "Measurement File" or "m-file" is an ASCII text, human readable, JSON format file which holds information pertaining to a measurement or group of measurements. In general, a measurement file will hold all data from sensors sampled at a particular time (i.e. - measure a group of five thermocouples at 11:30am on Dec. 1, 2012). However, it is not limited to this, as one could group a single sensor, or even a group of sensors, over an extended period of time. The format of this file is a JSON array of JSON objects containing information about the sensor and it's measurement.
Below is an example of a sensor named "MySensor" that measures speed and direction with raw measurements of voltage and counts.
,{"sensorcode":"MySensor","sensorid":102,"variablecode":"Direction","variableid":3,"value":224,"units":"Degrees","DateTime":"2016-03-18 19:00:00"}
,{"sensorcode":"MySensor","sensorid":102,"variablecode":"Counts","variableid":1,"value":6,"units":"#","DateTime":"2016-03-18 19:00:00"}
,{"sensorcode":"MySensor","sensorid":102,"variablecode":"Velocity","variableid":4,"value":0.112,"units":"m/s","DateTime":"2016-03-18 19:00:00"}
]
Useful fields:
- SensorCode - Unique name of a sensor (string)
- VariableCode - Unique name of a variable (string)
- Value - Numeric value of the measurement (float)
- Units - Abbreviation of the units of measure (string)
- DateTime - Time stamp of the measurement in format YYYY-MM-DD HH:MI:SS (string)
- SensorID - Database specific id for the sensor (unsigned integer)
- VariableID - Database specific id for the variable (unsigned integer)
- DBID - The user account that manages the sensor information (string)
- DataTable - The table within the schema to which data is stored (string)
The "Measurement File" or "m-file" is an ASCII text, human readable, tab-delimited file which holds information pertaining to a measurement or group of measurements. In general, a measurement file will hold all data from sensors sampled at a particular time (i.e. - measure a group of five thermocouples at 11:30am on Dec. 1, 2012). However, it is not limited to this, as one could group a single sensor, or even a group of sensors, over an extended period of time.
The measurement file has strict conventions. Measurement files uploaded to the LEO Data Server must follow these guidelines
- File must be properly named using a descriptor and a timestamp with file extension of .txt in the format DESCRIPTION_TIMESTAMP.txt
- The description can contain any valid file name characters EXCLUDING an underscore character “_” such as “My-Description_TIMESTAMP.txt” or “ThisIsADescription_TIMESTAMP.txt”
- The TIMESTAMP must follow an underscore and must follow the format YYYYMMDD. The timestamp can also contain additional timestamp data such as hour, minute, and second information, however, this must trail the initial required section (Description_YYYYMMDDhhmmss.txt)
- The timestamp is used to organize processed files into a date based hierarchy file structure.
- All data within the file must be tab delimited ASCII text. The format for a measurement is:
- “TIMESTAMP <tab> SENSORCODE <tab> datavalue0 <tab> datavalue1 <tab> … datavalueN”
- The TIMESTAMP must be formatted “YYYY-MM-DD hh:mm:ss”
- NOTE: the data loader removes the seconds from the timestamp
- SENSORCODE must be defined in the database and available in the dataloader application before processing. If a SENSORCODE is not recognized the file will fail to process and get moved to the “Error” output folder.
- Data values must match the predefined variable output order and should contain data for missing values (the software will insert the “missing data value” as defined in the database for any trailing missing values).
The current convention for file naming is “DeviceName_TimeStamp.txt”, and example filename is “cRIO-LEO-Center-P01_20131121173002.txt” for a device named “cRIO-LEO-Center-P01” and a timestamp 2013/11/21 at 17:30:02. An example of the file contents for a measurement file containing data for 8 sensors with 14 data points is shown below:
2013-11-21 17:30:01 LEO-C_10_-4_4_GMM222 0.121902 341.325300 |