In this article, the author explores the optimization of Kotlin sequence functions to enhance code performance. The distinct function was improved by avoiding unnecessary operations, resulting in a speed gain of about 10-15%. The flatten function was optimized by creating an EmptyIterator singleton and reducing unnecessary function calls, resulting in a speed gain of 35-37%. These optimizations have been included in Kotlin 2.0 and contribute to improving the overall performance of Kotlin.
What are the optimizations proposed for Kotlin sequence functions to enhance code performance?
The optimizations proposed for Kotlin sequence functions include improving the distinct and flatten functions. The distinct function was improved by avoiding unnecessary operations and loading of the array of all enum values onto the local stack at each comparison, enhancing its speed by about 10-15%. The flatten function was optimized by creating an EmptyIterator singleton, and reducing unnecessary calls to the ensureItemIterator() function, resulting in a speed gain of 35-37%.
In a recent investigation, I explored the performance of sequences, focusing on the speed of transforming collections using sequences as opposed to conventional collection transformation methods. This research highlighted that certain sequence functions were not as fast as they could be, prompting me to propose optimizations. These suggestions were accepted by JetBrains and have been included in Kotlin 2.0, an upgrade which I will detail in this article.
Optimizing the Distinct Function
The distinct function implementation is algorithmically very alike to the filter function. Nevertheless, the distinct function exhibited a 15% performance loss compared to regular collections, whereas the filter function surpassed collections by about 3%-5%. This discrepancy led me to examine the code of both functions to understand why such losses occur in the distinct function.
Observations on the Implementation of the Distinct Function
The distinct function internally forms a HashSet observed and accumulates all the returned elements in it. If the element is already observed, it means it has already been returned and will be skipped. This is quite similar to the method used in the collections code.
However, when examining the class declaration in sequences, we find AbstractIterator, an abstract class that appears to be the source of performance loss.
Examining the AbstractIterator
The AbstractIterator class stores the current state in the state variable, determining whether the next element of the iterator was calculated or not. If it hasn’t been calculated, then it calls the tryToComputeNext() function, which calculates the next element.
Upon comparison with the sequence decorator for the filter function, I noticed that the state was stored as a regular Int number for the filter function, but a typed Enum was used for the distinct function. This led me to question if using the Enum class could cause a 15% performance loss.
To investigate this further, I analyzed the byte-code of both functions. The byte-code for the distinct function revealed an unnecessary loading of the array of all enum values onto the local stack at each comparison.
However, if we rewrite the comparison through ordinal values, we could avoid this unnecessary and costly operation. By avoiding this, we could improve the speed of the distinct function by about 10%-15%.
Optimizing the Flatten Function
During my research, I found that the flatten function was nearly twice as slow as collections. This function is commonly used in various data transformations, making this performance loss particularly detrimental.
The basic decorator in sequences is the flatten operator, and other functions are based on it, such as the plus function. Therefore, optimizing the flatten function was a critical part of my research.
Observations on the Implementation of the Flatten Function
The decorator for the flatten function is intricate. For each call to the hasNext() method, the ensureItemIterator() function is called, which calculates the nested list iterator for the current element and writes its value in the itemIterator field.
Upon closer inspection, I noticed that the itemIterator field is declared as a nullable type. This means that each time this field is accessed, Kotlin adds a check to ensure that the field value is not Null, leading to an additional read operation.
Implementing Optimizations for the Flatten Function
To overcome this, I created an EmptyIterator singleton that maintains the logic that ItemIterator should have in case of a Null value.
I also noticed that the ensureItemIterator() function is always called every time next() and hasNext() are called. As we always work with iterators using the hasNext() + next() method pair, calling the ensureItemIterator() function twice for each element seemed wasteful.
To address this, I added a state field and set it every time ensureItemIterator() is evaluated, and reset it when the next() method is called. This means the ensureItemIterator() function is now called only once per pair of hasNext() + next() calls, significantly optimizing the execution time of the flatten function.
Through these optimizations, the distinct function became about 10%-15% faster, while the flatten function gained 35%-37% in speed.
These improvements are significant not only in terms of enhancing the performance of Kotlin but also for contributing to the Kotlin community. The process of submitting these optimizations to JetBrains via their open youtrack was straightforward and rewarding.
Sharing your solutions and improvements in global products such as Kotlin can have a significant impact. If you come across a problem or envision a potentially brilliant idea, don’t hesitate to create an issue or contribute – this can help to make the product better.
- Optimized Kotlin sequence functions to enhance code performance
- Improved distinct function by avoiding unnecessary operations, resulting in 10-15% speed gain
- Optimized flatten function by creating EmptyIterator singleton and reducing unnecessary function calls, resulting in 35-37% speed gain
- These optimizations have been included in Kotlin 2.0, contributing to overall performance improvement of Kotlin
- Distinct function implementation analyzed and compared to filter function to understand performance losses
- AbstractIterator class examined as a potential source of performance loss in distinct function
- Byte-code analysis revealed unnecessary loading of array of enum values in distinct function, which was optimized to improve speed by 10-15%
- Flatten function identified as being nearly twice as slow as collections, critical to optimize due to common use in data transformations
- Implementation of flatten function observed and optimized by creating EmptyIterator singleton and reducing unnecessary function calls
- Distinct function became 10-15% faster, while flatten function gained 35-37% in speed through optimizations
- Process of submitting optimizations to JetBrains via their open youtrack was straightforward and rewarding
- Sharing improvements and solutions in global products like Kotlin can have significant impact on product enhancement