Course Projects

Focusing on data Structures and algorithms

Autocomplete Me

Responsive image

Autocomplete Me

Course: Data Structures And Algorithms

This project was focused on utilizing Binary Search to perform searches. The purpose of this project was to create a program that can be used to perform the autocompletion of a term based on a certain number of strings and non negative weights. When a prefix is provided, a search is performed on the dataset and all the words that start with the given prefix is returned along with their weights. To complete this project, I implemented a program called Term that compares strings based on their lexicographic order, descending order by weight, and lexicographic order by query string using only the first N characters. I also implemented a program, Binary Search Deluxe, that does binary search and returns either the first or last key that was found, that is equal to the search term. Lastly, I implemented the Autocomplete program. First, the terms were sorted in lexicographic order, then I used the Binary Search Deluxe to find all the terms that start with the given prefix, and then sorted the matching terms in descending order by weight. This program was written in Java. The picture on the left is a search of Praia, the capital of Cape Verde, my home country.

K-D Trees

Responsive image

K-D Trees

Course: Data Structures And Algorithms

In this project I implemented a program called BrutePointST and KdTreePointST in Java. The BrutePointST was implemented using a red-black BST. This program supported put(), get(), and contains() in O(log N ) time. It returned all the points contained within a rectangular boundary (range search), and it returned the nearest neighbor in time proportional to the number of points in the symbol table. A two-dimensional tree was implemented in KdTreePointSt. This data structure is a Binary Search Tree in which the keys are two-dimensional. This BST contained points in the nodes in which the x and y coordinates of the point were used as keys in an alternating sequence starting with the x coordinate. The 2-D tree allotted for a much more efficient implementation of range search, nearest neighbor, and k-nearest neighbor search. The image on the left is of the program identifying 128 points that are the closest to the location of the mouse using k-nearest search. It was visualized using the provided NearestNeighborVisualizer program.

Wordnet

Responsive image

Wordnet

Course: Data Structures And Algorithms

In this project, I created a Wordnet Digraph, with each vertex v as an integer that represents a synset, and each directed edge v -> w representing that w is a hypernym of v. A Wordnet is a semantic lexicon in which relationships can found between words. Therefore, it could also tell how closely related certain words are. Synsets are sets of synonyms such as {AND circuit, AND gate}. A hypernym is a type of relationship between synsets. One type of relationship is the "is a" relationship. If there was another synset {gate, logic gate}, the relationship would be, AND gate is a kind of logic gate. This program was written in Java. The Picture on the left is an example of a Wordnet Digraph.

Deques and Randomized Queues

Responsive image

Deques and Randomized Queues

Course: Data Structures And Algorithms

The purpose of this project was to create elementary data structures using arrays and linked lists, and to work with generics and iterators. The first problem was to create a double-ended queue or deque, in which items can be added or removed from the front or back. The next problem was to create a Random Queue. This data structure supported removing items uniformly at random. The last part of the project was to create a program that utilized the LinkedDeque or Random Queue to read strings from the command line and print exactly K of them. This program was written in java. The picture on the left is of a deque data structure from https://datastructurec.wordpress.com/2015/09/13/doubly-ended-queuedeque-

Percolation

Responsive image

Percolation

Course: Data Structures And Algorithms

The data structures used in this program were the WeightedQuickUnionUF and 2 Dimensional Lists. The purpose of this project was to model a percolation system and estimate the percolation threshold. Percolation is the process of a liquid passing through a porous material. This is modeled using a two dimensional boolean list with dimensions N x N. Each location is in grid format and is called a site. The percolation threshold is the probability of the system percolating if each site is set to open with probability p and not open with probability (1 - p). To model this, first a percolation object is created with all sites initialized to blocked("false"). Then, a random site is chosen uniformly and opened until the system percolates. The threshold will be the number of open sites divided by the total number of sites. This whole process is done T times and the average is taken. The picture on the left was created using a percolation visualizer that was provided by the professor. This program was written in Java.

8 Puzzle

Responsive image

8 Puzzle

Course: Data Structures And Algorithms

In this project, I implemented programs to solve the 8 puzzle game and generalizations of it. A normal 8 puzzle game consists of a 3 x 3 grid with 8 blocks numbered 1 through 8. If the blocks are in random order, the program will order them 1 through 8 by shifting the blocks to different locations, vertically or horizontally. This project was implemented using a priority queue. An initial search node, (the initial board, 0 moves, and a null previous search node) is inserted into the queue. Then, we delete from the priority queue the search node with the minimum priority, and then insert all the neighboring search nodes (ones that can be reached in one move from the dequeued search node). These steps are then repeated until the search node dequeued is the goal board. This program was written in Java. The picture on the left is of the visualizer showing how the board is being solved. The number 8 board is being moved left to its appropriate position.

Markov Model

Responsive image

Markov Model

Course: Introduction To Programming In Python

The purpose of this project was to use a Markov chain to create a statistical model of a piece of English text and use the model to generate stylized pseudo-random text and fix corrupted texts. In this project the Markov Model that was created allowed for the probability of choosing each successive letter to depend on the preceding letter or letters. A Markov model is created with an order K, that determines the size of string that texts are split into (kgrams). A Markov model is then modeled using a dictionary whose keys are all the different kgrams that a piece of text is split into. The values are then another dictionary whose keys are the letters that follow each kgram and the values of this dictionary are their frequencies. By using this program, a text file that is corrupted--letter or letters are missing in words--can be fixed. This is done by looking at the letter with the highest probability of being in the corrupted position in context to kgrams before and after the corrupted location.

Global Sequence Alignment

Responsive image

Global Sequence Alignment

Course: Introduction To Programming In Python

The purpose of this project was to write computer programs to calculate the optimal sequence alignment of two DNA strings. If a sequence of gene has been discovered, one would like to know its function or what the genes encode. By comparing this sequence with existing ones that have already been studied, and whose functions have been classified , functions of the discovered gene could also be classified based no how similar the two sequences are. This similarity is quantified by an edit distance algorithm. The edit distance algorithm uses dynamic programming to calculate the score of the best possible alignment between two genetic sequences over all possible alignments. The next step of the project was to recover the alignment with the best edit distance score, in this case the smaller of the scores. To do this, the steps of the dynamic programming algorithm was retraced backwards by looking at the path of choices previously made. The programs for this project were written in Python.

Atomic Nature Of Matter

Responsive image

Atomic Nature Of Matter

Course: Introduction To Programming In Python

The purpose of this project was to to re-affirm the atomic nature of matter by tracking the motion of particles undergoing Brownian motion, fitting this data to Einstein’s model, and estimating Avogadro’s number. To achieve the correct results, I wrote a computer programs that analyzed video microscopy data of polystyrene spheres (“beads”) suspended in water, undergoing Brownian motion. The first task was to identify the beads (white blobs) amongst the black background which was the water. By doing so, I was able to track how far each bead moved between each photo that was provided. This program was written in Python. The photo on the left is an example of the data provided.