Searching and Sorting Algorithms - Network Analyzer
When working with arrays, two common tasks are searching for data and sorting the data using a certain order, ascending or descending, alphabetically or numerically. To efficiently carry out these tasks, searching and sorting algorithms are used. One simple searching algorithm is linear search. Two well known sorting algorithms are Selection Sort and Bubble Sort. In this laboratory experience, you will complete an application to monitor network flow to practice the implementation of algorithms for searching and sorting.
Objectives:
Implement a modification of the linear search algorithm.
Practice sorting arrays with the Selection Sort and the Bubble Sort.
Practice the use of objects, decision and repetition structures.
Learn some methods of the C++
vector
class.
Pre-Lab:
Before coming to the laboratory session you should have:
Reviewed the algorithms for linear search, Selection Sort and Bubble Sort.
Studied the method
size()
of the C++vector
class.Familiarize yourself with the methods of the
Packet
class included in thepacket.h
file of theNetworkAnalyzer
project.Studied the concepts and instructions for the laboratory session.
Taken the Pre-Lab quiz, available in Moodle.
Communication between Computers
Computers communicate with each other through the Internet Protocol (IP). When a computer sends information to another computer it is sent via Internet Packets that contain the Internet address of the sender computer (source address), and the Internet address of the receiving computer (destination address). Internet addresses are used to guide information from one computer to another, but, once a packet arrives to its destination, who is supposed to receive the information? Which application should receive the information?
The internet packets should also specify the application that sends the information and the application that should receive it. We can think that the Internet address is the mailing address of a house, and the applications that send and receive the information are the people that send and receive the correspondence. To send a letter by mail, one must specify to whom the letter is being sent. This corresponds to specifying the application that receives the information. To identify the source application and the destination application, the Internet protocol uses what is known as port numbers. Looking at the information on the packet, the addresses and ports of the source and destination can be identified.
For instance, when your computer is in the laboratory contacting the Moodle server, the packets that carry the information from your computer to the Moodle server contain the source address of the laboratory computer and the destination address of the Moodle server. The source port is web browser and the destination port is the web server that serves Moodle.
Internet addresses are stored on 4 bytes (32 bits), normally presented to users as strings of 4 decimal values. Each decimal value between 0 and 255 is the decimal representation of one of the 4 bytes: "(0-255).(0-255).(0-255).(0-255)". Examples of IP addresses are: 10.0.1.10
, 192.168.10.11
, 136.145.54.10
.
Port numbers are stored on 2 bytes (16 bits). Therefore, port numbers range from 0 to 65535. Some port numbers assigned to known service applications are: 22 for ssh
, 23 for telnet
, 25 smtp
, 80 for http
.
The application that you will see today can be used to monitor what is known as NetFlows. One NetFlow is composed of the aggregation of the packets of an unidirectional communication between the applications of two computers. For instance, a NetFlow can be composed of the packets used to send the information from your browser to the http
application of the web server running Moodle.
Figure 1 shows the interface for the Network Analyzer application.
Figure 1. Interface to work with the Network Analyzer application.
Each row in the table in Figure 1 is a NetFlow composed of the source and destination address, the source and destination port, the number of packets, and the number of octets (8 bits) in one unidirectional communication between the source and destination computer, from the source port to the destination port.
The application that you will complete gives the user the ability to analyze the status of the network. Among others it allows to:
- identify which communications transmit the largests amount of data
- identify which applications are running in certain computers
- identify which computers transmit a large amount of packets compared to the amount of data
Libraries
For this laboratory experience, you will use objects of class vector
, which are similar to arrays, and you will need to know how to use the size()
method of the vector
class objects. You should also familiarize yourself with the Packet
class defined in this project. The Packet.h
library contains the setters and getters necessary to fill the information of a NetFlow packet.
Packet
objects, and then invoke the method getSrcPort()
on the vector's fourth element (assume that the vector is already full of information when declared). Which of the following instructions will work?
Packet
objects correctly, but invokes the method getSrcPort()
incorrectly. We want to find information in the fourth element. The getSrcPort()
method can be invoked on Packet
objects. The netFlow
object is a vector, not a Packet
.
The second option does not declare the vector correctly. Additionally, the instruction netFlow[i.getSrcPort()];
is not invoking the method getSrcPort()
on one of the netFlow
objects, but instead an i
variable (that has not been defined). The fourth option declares the vector correctly, but once again uses netFlow[i.getSrcPort()]
.
Packet
objects, filtering the elements that have the same Destination Address and Source Address and disabling the rest. Which of the following is correct?netdata.length()
, which corresponds to a string
, but vectors use the function size()
. The fourth options tries to assign a 0
to the corresponding position in the vector netdata
, which is incorrect.
Laboratory Session:
The application you will complete today allows the user to open a file that contains the NetFlow records using the "Open NetFlow File" button, stores the records in a vector of packets, and displays them in the table of the interface as shown in Figure 2.
Figure 2. Interface or the Network Analyzer application with the network flow data packets.
The file you will use for the exercises, network_sample.dat
, contains the NetFlow packet records with the following format:
Source_Address Destination_Address Source_Port Destination_Port Octects Packets
136.145.181.130 136.145.181.227 5 33 764 16
136.145.181.101 136.145.181.213 37 40 48 4
136.145.181.151 136.145.181.60 45 21 316 9
136.145.181.165 136.145.181.19 8 39 795 24
136.145.181.53 136.145.181.174 34 21 79 22
136.145.181.40 136.145.181.140 58 22 186 5
136.145.181.33 136.145.181.209 76 25 614 13
136.145.181.175 136.145.181.38 30 39 100 8
136.145.181.126 136.145.181.99 57 33 965 14
Exercise 1 - Familiarize Yourself with the Packet
Class.
Instructions
Load the project
NetworkAnalyzer
intoQtCreator
. There are two ways to do this:- Using the virtual machine: Double click the file
NetworkAnalyzer.pro
located in the folder/home/eip/labs/sorting-networkanalyzer
of your virtual machine. - Downloading the project’s folder from
Bitbucket
: Use a terminal and write the commandgit clone http:/bitbucket.org/eip-uprrp/sorting-networkanalyzer
to download the foldersorting-networkanalyzer
fromBitbucket
. Double click the fileNetworkAnalyzer.pro
located in the folder that you downloaded to your computer.
- Using the virtual machine: Double click the file
Open the
packet.cpp
file. Study the attributes and methods of thePacket
class.The data that the NetworkAnalyzer application manages are stored in a vector of
Packet
objects. The vector is a class provided in the C++ Standard Template Library that stores data and objects of the same type. As with arrays, vectors assign an index (starting with the index 0) to each element they store. The i-th element of a vectorV
can be accessed usingV[i]
. The main difference between vectors and arrays is that the size of vectors can change, a size doesn't have to be defined beforehand, contrary to arrays.A method
met
of the object in positioni
in the vector can be accessed writingV[i].met()
. The contents of all of the attributes of an object can be assigned to another object of the same class “at once”. For example, you can assign the contents of all of the attributes of the object in the positionk
of the vectorV
the corresponding attributes of the object in the positioni
of the vectorV
by writingV[i]=V[k]
.
Exercise 2 - Filter Communications
Instructions
Open the file
Filter.cpp
. In this exercise you will complete the following functions that can be found in the file:FilterBySrcAddr
FilterByDstAddr
FilterBySrcPort
FilterByDstPort
Each one of the functions receives a vector of objects of class
Packet
and a search key. Each function (notice their names) is related to an attribute of thePacket
class and should filter the packets in the vector that correspond to the key. To filter these packets you will use a modified version of the linear search algorithm that consists of a sequential search to find each occurrence of a particular record of data. In each of the functions, the algorithm must search through all the packets in the vector and disable the packets that are not equal to the search key. To deactivate the packet use thedisable()
method of thePacket
class. The filter consists of keeping only the packets that correspond to the key.For instance, if you are filtering by
Source Address
and the search key is 136.145.181.130, theFilterBySrcAddr
function will keep only the packets in the vector whoseSource Address
is 136.145.181.130 and deactivate the others.The following figure is an screenshot of the application interface after filtering the data by
Source Address
with search key 136.145.181.130.
Figure 3. Interface of the Network Analyzer application with the network flow packets filtered by Source Address
with the key 136.145.181.130.
Exercise 3 - Sorting Data
Instructions
Open the
Sort.cpp
file. In this exercise you will complete the following functions that can be found in the file:SortBySrcAddr
SortByDstAddr
SortBySrcPort
SortByDstPort
Each one of these functions receives a vector of
Packet
objects. Each function (notice their names) is related to an attribute of thePacket
class and should order the packets in the vector according to the attribute of interest.The following figure is a screenshot of the application interface after sorting the data by
Source Address
.
Figure 4. Interface of the Network Analyzer application with the network flow packets ordered by Source Address
.
Complete the
SortBySrcAddr
function implementing the Bubble Sort algorithm, sorting the packets bysource address
.Complete the
SortByDstAddr
function implementing the Selection Sort algorithm, sorting the packets bydestination addres
.Complete the
SortBySrcPort
function using the Bubble Sort algorithm, sorting the packets bysource port
.Complete the
SortByDstPort
function using the Selection Sort algorithm, sorting the packets bydestination port
.
Deliverables
Use "Deliverable 1" in Moodle to upload the
Filter.cpp
file that you modified in Exercise 1. Remember to use good programming techniques, include the names of the programmers involved, and to document your program.Use "Deliverable 2" in Moodle to upload the
Sort.cpp
file that you modified in Exercise 2. Remember to use good programming techniques, include the names of the programmers involved, and to document your program.
References
[1] http://www.nextgigsystems.com/aggregation_switches/gigamon_filter_packets.html
[2] http://metaanalytics.org/web-analytics/social-network-analysis/
[3] http://www.java2novice.com/java-sorting-algorithms/quick-sort/