Projects

1. Stock Market Analyzer

Formerly known as the “Stock Market Historical Analyzer”; the code has since been refactored and extended with additional features to allow for intraday data analysis.

(Late September 2022)

 

This project interacts with the marketstack API (https://marketstack.com), which, with my key, provides up to 10 years of historical data as well as same-day intraday data at intervals as short as 15 minutes. It uses the “matplotlibcpp” library, which makes it simple to visualize data from C++ by running Python's matplotlib library behind the scenes.

The primary purpose of this program is to request data, handle the response (expected in JSON format), and graph the final result.

 

The following demo shows the simple moving average (SMA) of a given stock over a chosen date range, as well as its year-to-date SMA.

Demo Time Stamps:

0:00-0:14 – Compile Time & Startup

0:14-0:28 – Entering Ticker Symbol & Dates

0:29-0:45 – NYSEARCA: SPY 15-minute interval compared with the Google search result (9/28/22).

0:46-1:20 – Graphing multiple stocks from a predefined list. 3-hour interval.

1:21-1:57 – Scrolling through code.
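
The simple moving average shown in the demo can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name simpleMovingAverage and the window parameter are assumptions introduced here, and the prices are assumed to have already been extracted from the API response into a vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the project): simple moving average over a
// fixed window. Returns one value per full window, so the result holds
// prices.size() - window + 1 entries, or is empty if there are too few prices.
std::vector<double> simpleMovingAverage(const std::vector<double>& prices,
                                        std::size_t window) {
    std::vector<double> sma;
    if (window == 0 || prices.size() < window) {
        return sma;
    }
    sma.reserve(prices.size() - window + 1);

    double runningSum = 0.0;
    for (std::size_t i = 0; i < prices.size(); ++i) {
        runningSum += prices[i];
        if (i >= window) {
            runningSum -= prices[i - window];  // drop the price that left the window
        }
        if (i + 1 >= window) {
            sma.push_back(runningSum / static_cast<double>(window));
        }
    }
    return sma;
}
```

Keeping a running sum avoids re-adding the whole window for every data point, so the pass over the prices stays linear in their count.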

2. Examining the usefulness of estimated values from data of an nxn Matrix processed in O(n) time given known parameters.

This topic stems from real-world scenarios in which certain parameters of a data set (such as an nxn matrix) are already known. The analysis focuses on estimating the average value of an nxn matrix by calculating its trace (the sum of the diagonal entries) and dividing by the matrix dimension ‘n’. Working with an arbitrary nxn matrix filled with random and uncorrelated numbers will generally lead to unreliable performance and unusable data. However, constraining the data to known parameters may yield different results.

Testing Parameters to take as Guarantees

  • All values within the matrix fall within a given specific inclusive range [A-B].
  • Values within the Matrix follow a uniform distribution.
  • Each row is pre-sorted in ascending order.
  • Values within the matrix are of type double and non-negative.
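
One way to see why these guarantees might help, under the added assumption that the n values in each row are independent draws from the uniform distribution on [A, B]: once row k is sorted, its diagonal entry is the k-th smallest value of that row. The symbols D_k (diagonal entry of row k) and \hat{\mu} (diagonal average) are introduced here for illustration only. Using the standard expectation of a uniform order statistic, the diagonal average is centered on the midpoint of the range:

```latex
\begin{align*}
\mathbb{E}[D_k] &= A + (B - A)\,\frac{k}{n+1}, \qquad k = 1, \dots, n \\
\hat{\mu} &= \frac{1}{n} \sum_{k=1}^{n} D_k \\
\mathbb{E}[\hat{\mu}] &= A + \frac{B - A}{n(n+1)} \sum_{k=1}^{n} k
  = A + \frac{B - A}{n(n+1)} \cdot \frac{n(n+1)}{2}
  = \frac{A + B}{2}
\end{align*}
```

This only shows that the estimate is centered on the midpoint on average. It says nothing about how often a single estimate lands on the same side of the midpoint as the true average, which is what the tests measure.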

Example situation

We are given the fair price of a commodity and a range [A-B] where the fair price is also the midpoint of the range.

We then receive an nxn matrix that follows the parameters above.

Each row represents an exchange at a specific location.

Each column represents an asset that follows the price movement of the commodity. The values within the matrix are the asset’s last price at different locations.

Given an nxn matrix that follows the parameters listed above, attempt to calculate the average of all the values in the matrix in O(n) time. If the estimated average is greater than the midpoint of the range, return “1”; otherwise return “0”.
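
The task above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the names Matrix, estimatedAverage, and aboveMidpoint are hypothetical, and the matrix is assumed to be non-empty and square.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// O(n) estimate: the average of the n diagonal entries (trace / n).
// Assumes a non-empty square matrix.
double estimatedAverage(const Matrix& m) {
    double trace = 0.0;
    for (std::size_t i = 0; i < m.size(); ++i) {
        trace += m[i][i];
    }
    return trace / static_cast<double>(m.size());
}

// Returns 1 if the estimate is above the midpoint of [rangeStart, rangeEnd],
// otherwise 0 (an estimate exactly at the midpoint returns 0).
int aboveMidpoint(const Matrix& m, double rangeStart, double rangeEnd) {
    const double midpoint = (rangeStart + rangeEnd) / 2.0;
    return estimatedAverage(m) > midpoint ? 1 : 0;
}
```

Only n entries are read, so the cost grows linearly with the dimension rather than with the n^2 entries of the full matrix.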

Compared with taking the time to calculate the true average of the matrix, the returned result is expected to be wrong some of the time. The tests keep track of how often the returned result is incorrect and by how much. Given a range and knowing that the numbers follow a uniform distribution, it is easy to say that the average will be close to the midpoint of the range. However, most of the time the true average will fall slightly above or below the midpoint and only rarely exactly on it.

To count an estimated average as correct, the true average and the estimated average must return the same value. That is, they are both above the fair price or both equal to or below the fair price. If the true average is above the fair price and the estimate is not, or vice versa, the output is counted as incorrect.
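
The correctness rule described above might be expressed like this. The names trueAverage and estimateIsCorrect are hypothetical and chosen for illustration; the matrix is assumed to be non-empty.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Exact average over all n*n entries: the O(n^2) reference value.
// Assumes the matrix holds at least one value.
double trueAverage(const Matrix& m) {
    double sum = 0.0;
    std::size_t count = 0;
    for (const auto& row : m) {
        for (double value : row) {
            sum += value;
            ++count;
        }
    }
    return sum / static_cast<double>(count);
}

// An estimate counts as correct when both averages fall on the same side of
// the fair price. A value equal to the fair price is grouped with "below".
bool estimateIsCorrect(double trueAvg, double estimatedAvg, double fairPrice) {
    return (trueAvg > fairPrice) == (estimatedAvg > fairPrice);
}
```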

The goal is to determine whether the estimate is usable.

How testing will be done

  • The matrix will be filled with random numbers and each row will be sorted to emulate the given matrix.
  • A dimension of 5 will be used for the matrix.
  • Spreads of 100, 50, and 25 will each be tested 3 times, for a total of 9 tests.
  • Because the generated numbers are pseudo-random, the spread will vary for every test (101, 100, 99, 51, 50, 49, 26, 25, 24).
  • To try to ensure that new random numbers are generated and values are not repeated, the start and end of the range converge towards the midpoint.
  • Results will be copied from the terminal into an Excel file for better legibility.
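
A test matrix matching the first bullet could be generated along these lines. This is a sketch only: the function name makeSortedRandomMatrix is hypothetical, and the choice of std::mt19937 with std::uniform_real_distribution is an assumption, since the write-up does not say which generator the project uses. Note that std::uniform_real_distribution draws from the half-open interval [rangeStart, rangeEnd).

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Fills an n x n matrix with uniformly distributed doubles drawn from
// [rangeStart, rangeEnd) and sorts every row in ascending order.
// Assumes rangeStart < rangeEnd.
Matrix makeSortedRandomMatrix(std::size_t n, double rangeStart, double rangeEnd,
                              std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(rangeStart, rangeEnd);
    Matrix m(n, std::vector<double>(n));
    for (auto& row : m) {
        for (double& value : row) {
            value = dist(rng);
        }
        std::sort(row.begin(), row.end());
    }
    return m;
}
```

Passing the generator by reference lets one seeded engine drive all nine tests, so consecutive matrices continue the same pseudo-random sequence instead of restarting it.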

Code

  • Main driver class
  • Square matrix class
    • std::vector<std::vector<double>>
  • A testing class will be used for creating methods to print results.

What is being looked for

1. A level of correctness significantly above 50%; otherwise there is no meaningful advantage over a random guess. A minimum of 60% correctness would be acceptable output from the estimated average.
2. The standard deviation of the differences between the estimated average value and the actual average value.
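
The second criterion could be computed as sketched below. The function name stdDevOfDifferences is hypothetical, and the use of the population form (dividing by the number of tests rather than by one less) is an assumption, since the write-up does not state which form is used.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Population standard deviation of (estimated - actual) across a set of tests.
// Assumes both vectors have the same, non-zero length.
double stdDevOfDifferences(const std::vector<double>& estimated,
                           const std::vector<double>& actual) {
    const std::size_t count = estimated.size();

    // Mean of the differences.
    double mean = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        mean += estimated[i] - actual[i];
    }
    mean /= static_cast<double>(count);

    // Average squared deviation from that mean.
    double squaredSum = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        const double deviation = (estimated[i] - actual[i]) - mean;
        squaredSum += deviation * deviation;
    }
    return std::sqrt(squaredSum / static_cast<double>(count));
}
```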

Results

The chart below displays the accuracy of the estimates. These estimates are the result of using data extracted in O(n) time, as opposed to the O(n^2) solution that would yield 100% correctness. This method is intended for situations in which a margin of error is allowed.

  1. Range Test 1: (0,101), (250,301), (1706,1732)
  2. Range Test 2: (0,100), (200,250), (1706,1732)
  3. Range Test 3: (0,99), (100,149), (1706,1731)

An output is counted as correct if the exact average and the estimated average are both above the midpoint of the range, or both at or below it.

Demo of program:

0:00 – 0:25 – Running program and showing results.

0:26 – 1:25 – Changing parameters and running program.

1:26 – 1:55 – Code from all files.

1:56 – 2:34 – Results from multiple tests put onto an excel file for ease of legibility and analysis. 

3. Inverted Index

Made to replicate the process of a search engine, this program uses a local folder of manually entered “search results” (documents) as its corpus for answering user queries. It returns the documents that contain significant words from the user query, along with each word's frequency within the document. The primary focus of this project was the construction of the inverted index, hence the reuse of existing code for the stemming algorithm.
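
The core data structure can be sketched as a map from each term to the documents that contain it, with a per-document count. The sketch below is written in C++ for consistency with the other examples on this page and is not the project's code; the names InvertedIndex, normalize, and buildIndex are hypothetical, documents are passed in as strings rather than read from a folder, and stemming and the filtering of insignificant words are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// term -> (document id -> number of occurrences of the term in that document)
using InvertedIndex = std::map<std::string, std::map<int, int>>;

// Lowercases a token and drops every non-alphanumeric character.
std::string normalize(const std::string& token) {
    std::string out;
    for (unsigned char c : token) {
        if (std::isalnum(c)) {
            out.push_back(static_cast<char>(std::tolower(c)));
        }
    }
    return out;
}

// Builds the index; a document's id is its position in the input vector.
InvertedIndex buildIndex(const std::vector<std::string>& documents) {
    InvertedIndex index;
    for (std::size_t docId = 0; docId < documents.size(); ++docId) {
        std::istringstream words(documents[docId]);
        std::string token;
        while (words >> token) {
            const std::string term = normalize(token);
            if (!term.empty()) {
                ++index[term][static_cast<int>(docId)];
            }
        }
    }
    return index;
}
```

Answering a query then amounts to normalizing each query word the same way and looking it up in the map, which yields the matching documents together with the term's frequency in each.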

 

External Dependency: Stemming algorithm. The code was taken and modified so it could be incorporated into this project.


Stemmer Source Code: https://tartarus.org/martin/PorterStemmer/java.txt

Demo Time Stamps:

0:00-0:10 Starting Program

0:11-0:23 User input to search

0:24-0:29 Search Query Results.

0:30-0:36 Output of the help command. Displays actions of all commands. Creation of Res1 File.

0:37-0:47 Setting preferences for stemming.

0:48-1:00 Secondary Search

1:01-1:19 GUI Display

1:20-1:37 Print Command

1:38-2:13 Display of the Inverted Index

2:14-2:22 Exiting Program