DataLoader

Last modified by Michael Sibayan on 2022/05/03 16:48

Table of contents

Introduction
Configuration File
Application Dependencies
Measurement file format

Introduction

The purpose of this application is to extract data from properly formatted ASCII text, human readable, tab delimited files and upload the data to a database. For the LEO project we have chosen to use Oracle as our database.

This application is written entirely in Java (jre1.8.0_25) and requires Oracle Database JDBC Driver 12c (ojdbc7).

The application and dependencies are packed into Loader.jar which can be run from any computer with jre 1.8 or higher installed using the command:
java -jar Loader.jar
The main method does not take any parameters and should be run with an account with sufficient write privileges to all folders accessed by the application.

Configuration File

This application requires the configuration file “Dataloader.config” to exist in the application directory and be properly formatted. The format is a java.util.Properties “Configuration File”. Each configuration item is first defined by name (e.g. “configoption1”) and is equal to some value:

Configoption1 = “the value of config option 1”

Sections, used solely for reference, are displayed in brackets [section1]. A sample configuration file is shown below:

[General]
schemas = leo_mini,leo_general,leo_east
sources = c:\\temp\\in1,c:\\temp\\in2,c:\\temp\\in3
output = c:\\temp\\out

[Database]
user = mydatabaseusername
password = mydatabasepassword
tns = jdbc:oracle:thin:@mydatabase.arizona.edu:1521:dbsid

[Logging]
log = e:\\web
web = e:\\web

schemas	Comma separated list of schemas to use (each must exist within the database and must be accessible from the log-on account)
sources	Comma separated list of folders containing measurement files (windows paths require the double ' ' since the string interprets the first backslash as an escape character
output	The path for output/processed files
user	Database user account for accessing the database (and schemas)
password	Database password for the user account
tns	TNS path for accessing the database
log	Path to write the log file
web	Path to write the files heartbeat.xml and sensorlist.xml

Application Dependencies

This application requires the Java runtime environment (JRE1.8 or greater) be installed on the host machine.

The following SQL queries are executed:

Select SENSORID(INT), SENSORCODE(INT), SENSORTYPE(STR), METHODDESCRIPTION(STR), STATUS(CHR), LOGGERNAME(STR), MODULE(STR), CHANNEL(STR), ADDRESS(STR) from VIEW_SENSORS

Select CALIBRATIONPARAMETERID(INT), PARAMETERCODE(STR), PARAMETERVALUE(DBL) from CALIBRATIONPARAMETERS where sensorid = XXXX
This query is executed for each sensor

Select SENSORTYPE(STR), VARIABLEINDEX(INT), INVARIABLEID(INT), VARIABLENAME(STR), VARIABLEUNIT(STR), NODATAVALUE(DBL), VALIDRANGEFROM(DBL), VALIDRANGETO(DBL) from VIEW_INPUT_VARIABLES_METADATA

Insert into DATAVALUES (datavalue, localdatetime, sensorid, variableid) values (AAAA, to_date(‘BBBB/BB/BB BB:BB ', 'YYYY/MM/DD HH24:MI'), CC, DD)
** Where A, B, C, D are the actual values required **

Measurement file format

error This application will only process properly formatted JSON files.

As of January 2016 the Measurement File format has been changed from the old tab delimited variables, to the new JSON format.

The "Measurement File" or "m-file" is an ASCII text, human readable, JSON format file which holds information pertaining to a measurement or group of measurements. In general, a measurement file will hold all data from sensors sampled at a particular time (i.e. - measure a group of five thermocouples at 11:30am on Dec. 1, 2012). However, it is not limited to this, as one could group a single sensor, or even a group of sensors, over an extended period of time. The format of this file is a JSON array of JSON objects containing information about the sensor and it's measurement.

Below is an example of a sensor named "MySensor" that measures speed and direction with raw measurements of voltage and counts.

[{"sensorcode":"MySensor","sensorid":102,"variablecode":"Volts","variableid":1,"value":1.54,"units":"V","DateTime":"2016-03-18 19:00:00"}
,{"sensorcode":"MySensor","sensorid":102,"variablecode":"Direction","variableid":3,"value":224,"units":"Degrees","DateTime":"2016-03-18 19:00:00"}
,{"sensorcode":"MySensor","sensorid":102,"variablecode":"Counts","variableid":1,"value":6,"units":"#","DateTime":"2016-03-18 19:00:00"}
,{"sensorcode":"MySensor","sensorid":102,"variablecode":"Velocity","variableid":4,"value":0.112,"units":"m/s","DateTime":"2016-03-18 19:00:00"}
]

Useful fields:

SensorCode - Unique name of a sensor (string)
VariableCode - Unique name of a variable (string)
Value - Numeric value of the measurement (float)
Units - Abbreviation of the units of measure (string)
DateTime - Time stamp of the measurement in format YYYY-MM-DD HH:MI:SS (string)

SensorID - Database specific id for the sensor (unsigned integer)
VariableID - Database specific id for the variable (unsigned integer)
DBID - The user account that manages the sensor information (string)
DataTable - The table within the schema to which data is stored (string)

The information below is provided for reference and is no longer the standard format.

The "Measurement File" or "m-file" is an ASCII text, human readable, tab-delimited file which holds information pertaining to a measurement or group of measurements. In general, a measurement file will hold all data from sensors sampled at a particular time (i.e. - measure a group of five thermocouples at 11:30am on Dec. 1, 2012). However, it is not limited to this, as one could group a single sensor, or even a group of sensors, over an extended period of time.

The measurement file has strict conventions. Measurement files uploaded to the LEO Data Server must follow these guidelines

File must be properly named using a descriptor and a timestamp with file extension of .txt in the format DESCRIPTION_TIMESTAMP.txt
- The description can contain any valid file name characters EXCLUDING an underscore character “_” such as “My-Description_TIMESTAMP.txt” or “ThisIsADescription_TIMESTAMP.txt”
- The TIMESTAMP must follow an underscore and must follow the format YYYYMMDD. The timestamp can also contain additional timestamp data such as hour, minute, and second information, however, this must trail the initial required section (Description_YYYYMMDDhhmmss.txt)
- The timestamp is used to organize processed files into a date based hierarchy file structure.
All data within the file must be tab delimited ASCII text. The format for a measurement is:
- “TIMESTAMP <tab> SENSORCODE <tab> datavalue0 <tab> datavalue1 <tab> … datavalueN”
- The TIMESTAMP must be formatted “YYYY-MM-DD hh:mm:ss”
  - NOTE: the data loader removes the seconds from the timestamp
- SENSORCODE must be defined in the database and available in the dataloader application before processing. If a SENSORCODE is not recognized the file will fail to process and get moved to the “Error” output folder.
- Data values must match the predefined variable output order and should contain data for missing values (the software will insert the “missing data value” as defined in the database for any trailing missing values).

The current convention for file naming is “DeviceName_TimeStamp.txt”, and example filename is “cRIO-LEO-Center-P01_20131121173002.txt” for a device named “cRIO-LEO-Center-P01” and a timestamp 2013/11/21 at 17:30:02. An example of the file contents for a measurement file containing data for 8 sensors with 14 data points is shown below:

2013-11-21 17:30:01        LEO-C_10_-4_4_GMM222           0.121902              341.325300
2013-11-21 17:30:01        LEO-C_10_4_4_GMM222             0.142809              399.866131
2013-11-21 17:30:01        LEO-C_6_-4_1_HFP-1                     -0.001208             -18.838927
2013-11-21 17:30:01        LEO-C_6_0_1_HFP-1                      0.001314              20.860207
2013-11-21 17:30:01        LEO-C_2_-3_0_Model3130          0.000751              55597.257283
2013-11-21 17:30:01        LEO-C_0_4_0_PE102                      0.000000              0.000000
2013-11-21 17:30:01        LEO-C_6_-4_1_TCAV                      23.783472
2013-11-21 17:30:01        LEO-C_6_0_1_TCAV                       23.336414