file

Log network data.

The file module provides methods and objects designed to simplify writing and reading network traffic logs.

The main objects responsible for logging network data are:

  • WriteFile for formatting and writing network data from a single connection to log file(s).
  • ReadFile for reading data from log file(s) representing a single network connection.
  • LogConnection logs data from a single connection to log file(s).

These objects can write/read arbitrary data to/from log file(s). The network connection can be specified by either a Connection or an MCL Message object. The following objects can only write/read Message objects to/from log file(s):

  • LogNetwork logs data from multiple connections to a directory of log files.
  • ReadDirectory for reading data from a directory of log files representing multiple network connections.

Example: Log raw data

The following code illustrates writing (LogConnection) and reading (ReadFile) raw data to/from log files:

import os
import time
from mcl import ReadFile
from mcl import LogConnection
from mcl import RawBroadcaster
from mcl.network.udp import Connection

# Path (prefix) to log file. EXAMPLE_PATH must be set to an existing directory.
prefix = os.path.join(EXAMPLE_PATH, 'example')

# Create UDP connection.
connection = Connection('ff15::c73d:ce41:ea8b:c0a0')

# Log raw data transmissions.
logger = LogConnection(prefix, connection)

# Create raw broadcaster from IPv6 connection and broadcast data.
broadcaster = RawBroadcaster(connection)
broadcaster.publish('hello world')
time.sleep(0.1)

# Close broadcaster and stop logger.
broadcaster.close()
logger.close()

# Ensure that the log file exists.
log_file = os.path.join(EXAMPLE_PATH, 'example.log')
print os.path.exists(log_file)

# Read contents of log file.
rf = ReadFile(log_file)
print rf.read()['payload']

Example: Log message data

The following code illustrates writing (LogConnection) and reading (ReadFile) a single message type to/from log files. Note that the example is largely the same as the previous example:

import os
import time
from mcl import Message
from mcl import ReadFile
from mcl import LogConnection
from mcl import MessageBroadcaster
from mcl.network.udp import Connection

# Path (prefix) to log file.
prefix = os.path.join(EXAMPLE_PATH, 'example')

# Create MCL message.
class ExampleMessage(Message):
    mandatory = ('data',)
    connection = Connection('ff15::c43d:ce41:ea7b:c1b0')

# Log message transmissions.
logger = LogConnection(prefix, ExampleMessage)

# Create message broadcaster and broadcast data.
broadcaster = MessageBroadcaster(ExampleMessage)
broadcaster.publish(ExampleMessage(data='hello world'))
time.sleep(0.1)

# Close broadcaster and stop logger.
broadcaster.close()
logger.close()

# Ensure that the log file exists.
log_file = os.path.join(EXAMPLE_PATH, 'example.log')
print os.path.exists(log_file)

# Read contents of log file as an unformatted dictionary.
rf = ReadFile(log_file)
msg = rf.read()['payload']
print type(msg)
print msg

# Read contents of log file as an ExampleMessage.
rf = ReadFile(log_file, message=True)
print type(rf.read()['payload'])

Example: Log network data

The following code illustrates writing (LogNetwork) and reading (ReadDirectory) multiple network message types to/from log files.

import os
import time
from mcl import Message
from mcl import LogNetwork
from mcl import ReadDirectory
from mcl import MessageBroadcaster
from mcl.network.udp import Connection

# Create MCL messages.
class ExampleMessageA(Message):
    mandatory = ('string',)
    connection = Connection('ff15::c43d:ce41:ae5b:d1b0')

class ExampleMessageB(Message):
    mandatory = ('number',)
    connection = Connection('ff15::c43d:ce41:ae5b:d1b1')

# Log network traffic.
messages = [ExampleMessageA, ExampleMessageB]
logger = LogNetwork(EXAMPLE_PATH, messages)
logger.open()
log_path = logger.directory

# Create message broadcasters and broadcast data.
broadcaster_A = MessageBroadcaster(ExampleMessageA)
broadcaster_B = MessageBroadcaster(ExampleMessageB)
broadcaster_A.publish(ExampleMessageA(string='one')); time.sleep(0.1)
broadcaster_A.publish(ExampleMessageA(string='two')); time.sleep(0.1)
broadcaster_B.publish(ExampleMessageB(number=1));     time.sleep(0.1)
broadcaster_B.publish(ExampleMessageB(number=2));     time.sleep(0.1)

# Close broadcasters and stop logger.
broadcaster_A.close()
broadcaster_B.close()
logger.close()

# Ensure that the log directory exists.
print os.path.exists(log_path)

# Read contents of log directory as unformatted dictionaries. Note that each
# message type has been recorded in a separate .log file.
rf = ReadDirectory(log_path)
for i in range(4):
    msg = rf.read()['payload']
    print type(msg), msg

# Like ReadFile(), ReadDirectory() can return the logged data as MCL
# messages.
rf = ReadDirectory(log_path, message=True)
for i in range(4):
    msg = rf.read()['payload']
    print type(msg), msg

Functions


retrieve_git_hash(repository_path)[source]

Retrieve git hash from repository.

Parameters:repository_path (str) – Path to git repository (.git)
Returns:Current hash of git repository. If the git hash could not be retrieved, None is returned.
Return type:str
Raises:IOError – If the repository path does not exist.
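
A minimal sketch of how such a lookup could work, assuming the git command-line tool is installed and that the path points at the .git directory. The helper name git_hash_sketch is hypothetical and this is not necessarily how retrieve_git_hash() is implemented; it only mirrors the documented behaviour (IOError for a missing path, None when no hash can be read).

```python
import os
import subprocess


def git_hash_sketch(repository_path):
    """Hypothetical stand-in mirroring the documented behaviour."""

    # Documented behaviour: a missing path raises IOError.
    if not os.path.exists(repository_path):
        raise IOError("'%s' does not exist." % repository_path)

    # Ask git for the hash of the current commit. Any failure (path is
    # not a git directory, git is not installed) is reported as None.
    try:
        output = subprocess.check_output(
            ['git', '--git-dir=' + repository_path, 'rev-parse', 'HEAD'],
            stderr=subprocess.STDOUT)
    except (subprocess.CalledProcessError, OSError):
        return None

    return output.decode('ascii').strip()
```

The returned string could then be passed as the revision argument of the logging objects described below.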

Classes


class LogConnection(prefix, connection, revision=None, time_origin=None, max_entries=None, max_time=None, open_init=True)[source]

Open a connection and record data to file.

Parameters:
  • prefix (str) – Prefix used for log file(s). The extension is excluded and is handled by WriteFile (to facilitate split logs). For example the prefix ‘./data/TestMessage’ will log data to the file ‘./data/TestMessage.log’ and will log data to the files ‘./data/TestMessage_<NNN>.log’ for split log files (where NNN is incremented for each new split log).
  • connection (Connection or Message) – MCL Connection object or Message type to record to log file(s).
  • revision (str) – Revision of code used to generate logs. For instance, the hash identifying a commit in a Git repository can be used to record which version of the code was used during logging. The function retrieve_git_hash() can be used for this purpose. If revision is set to None (default), no revision will be recorded in the log header.
  • time_origin (datetime.datetime) – Time origin used to calculate elapsed time during logging (time data was received - time origin). This option allows the time origin to be synchronised across multiple log files. If set to None, the time origin will be set to the time the first logged message was received. This results in the first logged item having an elapsed time of zero.
  • max_entries (int) – Maximum number of entries to record per log file. If set, a new log file will be created once the maximum number of entries has been recorded. Files follow the naming scheme ‘<prefix>_<NNN>.log’ where NNN is incremented for each new log file. If set to None all data will be logged to a single file called ‘<prefix>.log’. This option can be used in combination with max_time.
  • max_time (int) – Maximum length of time, in seconds, to log data. If set, a new log file will be created after the maximum length of time has elapsed. Files follow the naming scheme ‘<prefix>_<NNN>.log’ where NNN is incremented for each new log file. If set to None all data will be logged to a single file called ‘<prefix>.log’. This option can be used in combination with max_entries.
  • open_init (bool) – If set to True, open connection immediately after initialisation (default). If set to False only open connection and log data when open() is called.
max_entries

int

Maximum number of entries to record per log file before splitting.

max_time

int

Maximum length of time, in seconds, to log data before splitting.

close()[source]

Stop logging connection data.

Returns:Returns True if the connection logger was closed. If the connection logger was already closed, the request is ignored and the method returns False.
Return type:bool
is_alive()[source]

Return whether the object is listening for broadcasts.

Returns:Returns True if the object is recording connection data. Returns False if the object is NOT recording connection data.
Return type:bool
open()[source]

Start logging connection data.

Returns:Returns True if the connection logger was started. If the connection logger was already started, the request is ignored and the method returns False.
Return type:bool
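
The naming scheme described for prefix, max_entries and max_time can be sketched as follows. The helper log_file_name is hypothetical (it is not part of MCL), and the three-digit zero padding is an assumption taken from file names such as 'MessageA_000.log' shown elsewhere in this documentation.

```python
def log_file_name(prefix, split=None):
    """Return the log file name for a prefix (hypothetical helper).

    With no split number the single-file name '<prefix>.log' is
    returned, otherwise the split name '<prefix>_<NNN>.log'.
    """
    if split is None:
        return '%s.log' % prefix

    # Assumption: NNN is a zero-padded, three-digit counter.
    return '%s_%03d.log' % (prefix, split)
```

For example, log_file_name('./data/TestMessage') gives './data/TestMessage.log', while log_file_name('./data/TestMessage', 2) gives './data/TestMessage_002.log'.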

class LogNetwork(directory, messages, revision=None, max_entries=None, max_time=None, open_init=True)[source]

Dump network traffic to files.

The LogNetwork object records network traffic to multiple log files. The input directory specifies the location to create a directory, using the following format:

<year><month><day>T<hours><minutes><seconds>_<hostname>

for logging network traffic. The input messages specifies a list of MCL Message objects to record. A log file is created for each message specified in the input messages. For instance, if messages specifies a configuration for receiving MessageA and MessageB objects, the following directory tree will be created (almost midnight on December 31st 1999):

directory/19991231T235959_host/
                              |-MessageA.log
                              |-MessageB.log

If split logging has been enabled (by the number of entries, elapsed time or both) the log files will be appended with an incrementing counter:

directory/19991231T235959_host/
                              |-MessageA_000.log
                              |-MessageA_001.log
                              |-MessageB_000.log
                              |-MessageB_001.log
                              |-MessageB_002.log
                              |-MessageB_003.log
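
The directory name format above can be reproduced with the standard library. This is only a sketch: log_directory_sketch is a hypothetical helper, and whether LogNetwork uses local or UTC time for the timestamp is not stated here, so local time is assumed.

```python
import datetime
import os
import socket


def log_directory_sketch(root, now=None, hostname=None):
    """Build '<root>/<year><month><day>T<hours><minutes><seconds>_<hostname>'."""

    # Assumption: local time and the local host name are used by default.
    if now is None:
        now = datetime.datetime.now()
    if hostname is None:
        hostname = socket.gethostname()

    name = '%s_%s' % (now.strftime('%Y%m%dT%H%M%S'), hostname)
    return os.path.join(root, name)
```

Calling log_directory_sketch('directory', datetime.datetime(1999, 12, 31, 23, 59, 59), 'host') yields the 'directory/19991231T235959_host' path used in the trees above (with the platform's path separator).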
Parameters:
  • directory (str) – Path to record a directory of network traffic.
  • messages (list) – List of Message objects specifying the network traffic to be logged.
  • revision (str) – Revision of code used to generate logs. For instance, the hash identifying a commit in a Git repository can be used to record which version of the code was used during logging. The function retrieve_git_hash() can be used for this purpose. If revision is set to None (default), no revision will be recorded in the log header.
  • max_entries (int) – Maximum number of entries to record per log file. If set, a new log file will be created once the maximum number of entries has been recorded. If set to None all data will be logged to a single file. This option can be used in combination with max_time.
  • max_time (int) – Maximum length of time, in seconds, to log data. If set, a new log file will be created after the maximum length of time has elapsed. If set to None all data will be logged to a single file. This option can be used in combination with max_entries.
  • open_init (bool) – If set to True, open connection immediately after initialisation (default). If set to False only open connection and log data when open() is called.
messages

list

List of Message objects specifying which network traffic is being logged.

root_directory

str

Location where new log directories are created. This attribute returns the path specified by the directory argument.

directory

str

String specifying the directory where data is being recorded. This attribute is set to None if the data is NOT being logged to file (stopped state). If the logger is recording data, this attribute is returned as a full path to a newly created directory in the specified directory input using the following format:

<year><month><day>T<hours><minutes><seconds>_<hostname>
max_entries

int

Maximum number of entries to record per log file. If set to None all data will be logged to a single file.

max_time

int

Maximum length of time, in seconds, to log data. If set to None all data will be logged to a single file.

Raises:
  • IOError – If the log directory does not exist.
  • TypeError – If any of the inputs are of an incorrect type.
close()[source]

Close connections and stop logging network data.

Returns:Returns True if logging was stopped. If network data is currently NOT being logged, the request is ignored and the method returns False.
Return type:bool
open()[source]

Open connections and start logging network data.

Returns:Returns True if logging was started. If network data is currently being logged, the request is ignored and the method returns False.
Return type:bool

class ReadDirectory(source, min_time=None, max_time=None, message=False, ignore_raw=True)[source]

Read data from multiple log files in time order.

The ReadDirectory object reads data from multiple network dump log files in a common directory. The directory may contain single or split log files (see WriteFile and ReadFile).
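
Reading several logs in time order amounts to merging streams that are each already sorted by elapsed time. The sketch below illustrates the idea on in-memory records; merge_by_time is a hypothetical helper and not necessarily how ReadDirectory is implemented.

```python
import heapq


def _tag(records, source_index):
    # Attach a unique, sortable key so that ties in elapsed time are
    # resolved by source and position, never by comparing payloads.
    for sequence, record in enumerate(records):
        yield (record['elapsed_time'], source_index, sequence, record)


def merge_by_time(*sources):
    """Yield records from several time-sorted sources in time order."""
    tagged = [_tag(source, i) for i, source in enumerate(sources)]
    for item in heapq.merge(*tagged):
        yield item[-1]
```

Each source stands in for the records of one log file; the merged stream interleaves them by their elapsed_time field.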

Note

ReadDirectory assumes the log files have been created by WriteFile and searches for files with the .log extension in the specified directory. ReadDirectory can operate on directories which contain non .log files. Renaming .log files or including .log files which were not formatted by WriteFile is likely to cause an error in ReadDirectory.

Parameters:
  • source (str) – Path to directory containing log files.
  • min_time (float) – Minimum time to extract from log file in seconds.
  • max_time (float) – Maximum time to extract from log file in seconds.
  • message (bool) – If set to False (default), the logged data is returned ‘raw’. If set to True logged data will automatically be decoded into the MCL message type stored in the log file header. Note: to read data as MCL messages, the messages must be loaded into the namespace.
  • ignore_raw (bool) – If set to True (default), any raw log files in the path source will be ignored. If set to False an exception will be raised if any raw logs are encountered.
messages

list

List of Message objects stored in the directory of log files.

min_time

float

Minimum time to extract from log file in seconds.

max_time

float

Maximum time to extract from log file in seconds.

Raises:
  • TypeError – If any of the inputs are of an incorrect type.
  • IOError – If the log file/directory does not exist.
  • ValueError – If the minimum time is greater than the maximum time.
is_data_pending()[source]

Return whether data is available for reading.

Returns:Returns True if more data is available. If all data has been read from the log file(s), False is returned.
Return type:bool
read()[source]

Read data from the log files.

Read a line of data from the log files. The data is parsed into a dictionary containing the following fields:

{'elapsed_time': <float>,
 'topic': <string>,
 'payload': <dict or Message>}

where:

  • elapsed_time is the time elapsed between creating the log file and recording the network data.
  • topic is the topic associated with the network data during the broadcast.
  • payload is the network data, delivered as a dictionary or MCL Message object.

If all network data has been read from the log files (directory), None is returned.

Returns:A dictionary containing the time elapsed when the line of text was recorded, the topic associated with the message broadcast and the payload (a dictionary or a populated MCL Message object).
Return type:dict
Raises:IOError – If an error was encountered during reading.
reset()[source]

Reset object and read data from the beginning of the log file(s).


class ReadFile(filename, min_time=None, max_time=None, message=False)[source]

Read data from a log file.

The ReadFile object reads data from network dump log files (see WriteFile). If the data has been logged to a single file, ReadFile can read the data directly from the file:

rf = ReadFile('logs/TestMessage.log')

If the log files have been split, ReadFile can read from the first split to the last split (in the directory) by specifying the prefix of the logs:

rf = ReadFile('logs/TestMessage')

A portion of a split log file can be read by specifying the path to the specific portion:

rf = ReadFile('logs/TestMessage_002.log')

Note that if a portion of a split log file is read using ReadFile, header information will not be available. Header information is only recorded in the first portion.

Parameters:
  • filename (str) – Prefix/Path to log file. If a prefix is given, ReadFile will assume the log files have been split into numbered chunks. For example, if ‘data/TestMessage’ is specified, ReadFile will read all ‘data/TestMessage_*.log’ files in sequence. If the path to a log file is fully specified, ReadFile will only read the contents of that file (e.g. ‘data/TestMessage_000.log’).
  • min_time (float) – Minimum time to extract from log file.
  • max_time (float) – Maximum time to extract from log file.
  • message (bool or str or Message) – If set to False (default), the logged data is returned ‘raw’. If set to True logged data will automatically be decoded into the MCL message type stored in the log file header. To force the reader to unpack logged data as a specific MCL message type, set this argument to the required Message type or to the string name of the required message type. This option can be useful for reading unnamed messages or debugging log files. Use with caution. Note: to read data as MCL messages, the messages must be loaded into the namespace.
header

dict

Contents of the log file header. If the log file header is not available None is returned, otherwise the following dictionary is returned:

dct = {'text': string,
       'end': int,
       'version': string,
       'revision': string,
       'created': string,
       'type': None or Message}
where:
  • <text> is the header text
  • <end> is a pointer to the end of the header
  • <version> is the version used to record the log files
  • <revision> is the Git hash of the version used to log data
  • <created> is the time when the log file was created
  • <type> is the type, recorded in the header, used to represent the logged data (either None or Message)
min_time

float

Minimum time to extract from log file.

max_time

float

Maximum time to extract from log file.

Raises:
  • TypeError – If any of the inputs are of an incorrect type.
  • IOError – If the log file/directory does not exist.
  • ValueError – If the minimum time is greater than the maximum time.
is_data_pending()[source]

Return whether data is available for reading.

Returns:Returns True if more data is available. If all data has been read from the log file(s), False is returned.
Return type:bool
read()[source]

Read data from the log file(s).

Read one line of data from the log file(s). The data is parsed into a dictionary containing the following fields:

dct = {'elapsed_time': <float>,
       'topic': <string>,
       'payload': <dict or Message>}

where:

  • elapsed_time is the time elapsed between creating the log file and recording the network data.
  • topic is the topic associated with the network data during the broadcast.
  • payload is the network data, delivered as a dictionary or MCL Message object.

If all data has been read from the log file, None is returned.

Returns:A dictionary containing the time elapsed when the line of text was recorded, the topic associated with the message broadcast and the payload (a dictionary or a populated MCL Message object).
Return type:dict
Raises:IOError – If an error was encountered during reading.
reset()[source]

Reset object and read data from the beginning of the log file(s).
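
A typical way to consume a log is to loop until no data is pending. The sketch below relies only on the is_data_pending() and read() methods documented above; the helper name read_all is hypothetical.

```python
def read_all(reader):
    """Collect every remaining record from a ReadFile-like object."""
    records = []
    while reader.is_data_pending():
        record = reader.read()

        # read() is documented to return None once all data is read.
        if record is None:
            break
        records.append(record)
    return records
```

With MCL available, read_all(ReadFile('logs/TestMessage.log')) would be expected to return a list of the dictionaries described under read(). Calling reset() on the reader beforehand restarts reading from the beginning of the log.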


class WriteFile(prefix, connection, revision=None, time_origin=None, max_entries=None, max_time=None)[source]

Write network messages to log file(s).

The WriteFile object is used for writing network messages to log file(s). To log data to a single file, use:

wf = WriteFile(fname, Message)

WriteFile can be configured to split the log files by number of entries or time. To configure WriteFile to split log files according to the number of entries, instantiate the object using:

wf = WriteFile(fname, Message, max_entries=10)

In the above example, each log file will accumulate 10 entries before closing and starting a new log file. To configure WriteFile to split log files according to time, instantiate the object using:

wf = WriteFile(fname, Message, max_time=60)

In the above example, each log file will accumulate data for 60 seconds before closing and starting a new log file. The two options can also be combined. For example:

wf = WriteFile(fname, Message, max_entries=10, max_time=60)

will accumulate a maximum of 10 entries for a maximum of 60 seconds before closing and starting a new log file. The first condition to be breached will cause a new log file to be created.
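
The combined splitting rule can be sketched as a small predicate. The helper should_split is hypothetical, and the exact boundary behaviour (whether the limit itself triggers a split) is an assumption rather than something confirmed by WriteFile.

```python
def should_split(entries, elapsed, max_entries=None, max_time=None):
    """Return True if either configured limit has been reached.

    entries is the number of entries in the current log file and
    elapsed is the time, in seconds, the current log file has been open.
    """
    # Assumption: reaching a limit exactly is enough to start a new file.
    if max_entries is not None and entries >= max_entries:
        return True
    if max_time is not None and elapsed >= max_time:
        return True

    # No limit set, or neither limit reached: keep the current file.
    return False
```

With max_entries=10 and max_time=60, a file holding 10 entries after 5 seconds is split, as is a file holding 3 entries after 60 seconds. With both limits left as None, no split ever occurs.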

Parameters:
  • prefix (str) – Prefix used for log file(s). The extension is excluded and is handled by WriteFile (to facilitate split logs). For example the prefix ‘./data/TestMessage’ will log data to the file ‘./data/TestMessage.log’ and will log data to the files ‘./data/TestMessage_<NNN>.log’ for split log files (where NNN is incremented for each new split log).
  • connection (Connection or Message) – An instance of an MCL connection object or a reference to an MCL message type to record to log file(s).
  • revision (str) – Revision of code used to generate logs. For instance, the hash identifying a commit in a Git repository can be used to record which version of the code was used during logging. The function retrieve_git_hash() can be used for this purpose. If revision is set to None (default), no revision will be recorded in the log header.
  • time_origin (datetime.datetime) – UTC time origin used to calculate elapsed time during logging (time data was received - time origin). This option allows the time origin to be synchronised across multiple log files. If set to None, the time origin will be set to the time the first logged message was received. This results in the first logged item having an elapsed time of zero.
  • max_entries (int) – Maximum number of entries to record per log file. If set, a new log file will be created once the maximum number of entries has been recorded. Files follow the naming scheme ‘<prefix>_<NNN>.log’ where NNN is incremented for each new log file. If set to None all data will be logged to a single file called ‘<prefix>.log’. This option can be used in combination with max_time.
  • max_time (int) – Maximum length of time, in seconds, to log data. If set, a new log file will be created after the maximum length of time has elapsed. Files follow the naming scheme ‘<prefix>_<NNN>.log’ where NNN is incremented for each new log file. If set to None all data will be logged to a single file called ‘<prefix>.log’. This option can be used in combination with max_entries.
max_entries

int

Maximum number of entries to record per log file before splitting.

max_time

int

Maximum length of time, in seconds, to log data before splitting.

Raises:
  • IOError – If the write directory does not exist.
  • ValueError – If any of the inputs are improperly specified.
close()[source]

Close log files.

The WriteFile.close() method finalises the logging process by changing the extension of the log file from ‘.tmp’ to ‘.log’. If WriteFile.close() is NOT called, no data will be lost, however the log file will not be given the ‘.log’ extension.
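
The finalisation step can be pictured as a simple rename. This is a sketch under the assumption that the temporary file shares its base name with the final log; finalise_log is a hypothetical helper and not the WriteFile implementation.

```python
import os


def finalise_log(tmp_path):
    """Rename '<name>.tmp' to '<name>.log' and return the new path."""
    root, extension = os.path.splitext(tmp_path)
    if extension != '.tmp':
        raise ValueError("Expected a '.tmp' file, got '%s'." % tmp_path)

    # The data is already on disk; only the extension changes.
    log_path = root + '.log'
    os.rename(tmp_path, log_path)
    return log_path
```

A file left with the '.tmp' extension (for instance after a crash) could be recovered in the same way, since no data is lost when close() is not called.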

write(message)[source]

Write network data to a file.

The WriteFile.write() method writes network data to a log file. WriteFile.write() expects network data to be input as a dictionary with the following fields:

message = {'topic': str(),
           'payload': object(),
           'time_received': datetime}

where:

  • topic is the topic associated with the network data during the broadcast.
  • payload is the network data to be recorded to file.
  • time_received is a datetime.datetime object used to record the time the network data was received.
Parameters:message (dict) – Network data to be recorded. The network data must be stored as a dictionary with the time the data was received, the topic associated with the broadcast and the message payload.
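
The input dictionary and the elapsed time derived from it can be sketched as follows. Both helpers are hypothetical; elapsed_seconds simply applies the 'time data was received - time origin' rule given for the time_origin parameter.

```python
import datetime


def make_entry(topic, payload, time_received):
    """Build a dictionary with the fields expected by WriteFile.write()."""
    return {'topic': topic,
            'payload': payload,
            'time_received': time_received}


def elapsed_seconds(time_received, time_origin):
    """Elapsed time, in seconds, between the origin and the reception."""
    return (time_received - time_origin).total_seconds()
```

For instance, an entry received 1.5 seconds after the time origin has an elapsed time of 1.5, and an entry received exactly at the origin has an elapsed time of zero.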