Name - Shubham Kaushik
Email - kaushiks@bu.edu
-
Describe any design decisions you made. These may be minimal for pa1
- Use of
ConcurrentHashMapfor defining private variable inclass Catalogto make it thread safe. - Defined variable
pagesinclass BufferPoolusingConcurrentHashMapfor thread safety. - For
Join, a simple nested loop join is used, in which it will iterate over one child only once and compare with each tuple of another child. So, if relation R has N records and S has M records it will run do IO for (N + (N*M)). - For
HashEquiJoin, partitions are only done for one of the child and stored in-memory. The hasNext will pull one tuple from another child and do hashing over the join value and pull that particular partition for first child. This way it will save some comparison assuming the data is uniformly distributed
- Use of
-
Discuss and justify any changes you made to the API
-
class Tuple : Added few bunch of private variable like tupleDesc, recordId, fields and implemented an Iterator for fields.
-
class TupleDesc : Added
TDItem[] tItemsarray for holding TDItem objects. The typeAr should have some item otherwise constructor will throw Exception. -
class Catalog : Defined bunch of new private
ConcurrentHashMapvariables, which are thread safe as per the java documentation. -
class BufferPool : Defined variable
pagesusingConcurrentHashMapfor thread safe. ImplementedgetPageto get the page from buffer if exists or if not read it from disk(DbFile/HeapFile). -
class HeapPageId : Defined few private final variables and also updated
hashCode&equalsmethods. -
class RecordId : Defined few private final variabled and also updated
hashCode&equalsmethods. -
class HeapPage : Updated
getNumTuples&getHeaderSizewith the help of given doc on Lab webpage. Also, implemented theiteratormethod for returning an iterator over theTupleobjects stored on this page. -
class HeapFile : Updated the
readPagemethod to read a page fromHeapFile. I am usingRandomAccessFileto seek over the file randomly and read theBufferPool.getPageSizebytes data each time. ThepageNumberis used to calculate the address of the page in the disk. I have also implementedDbFileIteratorto iterate over all of the tuples we have in thisHeapFile. -
SeqScan : This was easy as we just have to link this with the
HeapFileandDbFileIterator. -
class BTreeFile :
- findLeafPage() : This method is used to pull leaf page as per the given Field key value. This is implemented using recursion. If the Field value is passed as 'null', it will pull the left most page at leaf level.
- splitLeafPage() : This method help in splitting the leaf page when we insert more values and the leaf page is full. The middle node is copied to the internal nodes after splitting and parent & sibling pointer are updated.
- splitInternalPage() : The method do the splitting part for the internal node. This also increases the height of the BTree and push the middle node towards the parent node.
- stealFromLeafPage() : The method is used for the Deletion purpose, if the tuples are less than half full in any leaf node. It will try to steal nodes from its siblings.
- stealFromLeftInternalPage() : This method is for stealing nodes from the sibling when the internal node is out of its half full capacity.
- stealFromRightInternalPage() : This is similar to the upper method just be used when stealing nodes from right internal node, instead of left.
- mergeLeafPages() : While deletion if the stealing doesn't work out than merging of two nodes happens in BTree. The mergeLeafPages merges the leaf pages whenever required.
- mergeInternalPages() : This is same as above just to merge the internal nodes.
-
class IndexPredicate : Updated to compare the field values using the given operators. The operators are defined as enums The Comparison is done based in the given '>=', '<=', '==' operators.
-
class Predicate : Defined the required class variables to support IndexPredicate class operations.
-
Filter, Join & HashEquiJoin : Added required class variables and implemented the hasNext() method call to return the results after filter, join and hashJoin. The strategy used for join is simple nested loop join and for hashEquiJoin I have used hashing and making partition for one child.
-
IntegerAggregator, StringAggregator, Aggregate : Implemented integer & string aggregate classes for aggregating on the required column based on groupby if given.
-
HeapPage, HeapFile : Implemented insert and delete tuples method to add mutability of tuples in pages or file. Also, eviction is checked for both operations.
-
Insert, Delete : Insert and delete operators are implemented for inserting the new record or deleting the existing one.
-
-
Describe any missing or incomplete elements of your code
- As per my knowledge, all the elements required for lab1 are complete in my assignment. The only part I could guess is the use of transactionId and Permissions are not in use for anything as of now.
-
Describe how long you spent on the assignment, and whether there was anything you found particularly difficult or confusing
- For this assignment, I've spent around 2-3 days to understand and start myself writing code. There were two things which were bit tricky for me, first the use of bit manipulation for the
HeapPage.isSlotUsedmethod to compute if the slot is in use with the help of header array of bytes, second was the linking of Iterators fromSeqScantoDbFileIteratorofHeapFileclass. Things are quite clear now.
- For this assignment, I've spent around 2-3 days to understand and start myself writing code. There were two things which were bit tricky for me, first the use of bit manipulation for the
-
If you collaborate with someone else, you have to discribe how you split the workload
- I worked alone on this assignment but yes the discussion in labs, javadoc and slide for simpledb helped a lot to speed up.