I was going back through my notes on various things I have encountered, and one thing I have been seeing a lot of lately is ulimit settings for various installations that do not follow the recommended defaults. In just about every case, when I asked the team that made or requested the setting, the standard answer was “I just didn’t want to have to worry about it, so we set it to unlimited.” As with anything, there are reasons why certain settings are recommended, and this post seeks to show you why as it relates to the “nofile” limit (the maximum number of open file descriptors per process, which is what ulimit -n reports).
For example, in the WebLogic documentation (as of 12.2.1.4), the ulimit requirements specifically state that the “/etc/security/limits.conf” should contain:
* soft nofile 4096
* hard nofile 65536
Interestingly enough, the Oracle database guide (as of 19c) states that the soft nofile limit should be at least 1024 and the hard nofile limit should be “at least” 65536 for both the grid and oracle users, which differs from WebLogic. So as you can see, one size doesn’t fit all.
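If you want to see how a process compares against these numbers, a minimal sketch using Python’s standard resource module could look like the following. The RECOMMENDED table and the check_nofile and describe functions are hypothetical names that only encode the two sets of values quoted above. Treating “unlimited” as a finding is my assumption based on this post, not a rule stated in either vendor document. Keep in mind that getrlimit reports the limits of the Python process itself, which inherits them from the shell or service that launched it.

```python
import resource

# Hypothetical table of the minimums quoted above (WebLogic 12.2.1.4,
# Oracle Database 19c). Always re-check the current product documentation.
RECOMMENDED = {
    "weblogic-12.2.1.4": (4096, 65536),  # (soft, hard)
    "oracle-db-19c": (1024, 65536),
}


def describe(limit: int) -> str:
    """Render a limit the way `ulimit -n` would show it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def check_nofile(soft: int, hard: int, product: str) -> list[str]:
    """Return a list of problems; an empty list means the limits look fine.

    Assumption: "unlimited" is reported as a problem even though it is
    numerically larger than any minimum.
    """
    min_soft, min_hard = RECOMMENDED[product]
    problems = []
    for name, value, minimum in (("soft", soft, min_soft),
                                 ("hard", hard, min_hard)):
        if value == resource.RLIM_INFINITY:
            problems.append(f"{name} nofile is unlimited")
        elif value < minimum:
            problems.append(f"{name} nofile {value} is below {minimum}")
    return problems


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={describe(soft)} hard={describe(hard)}")
    for product in RECOMMENDED:
        print(product, check_nofile(soft, hard, product) or "OK")
```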
One area where I saw this become important was during the boot sequence of WebLogic Tuxedo. We had an issue where sometimes a server would boot and sometimes it wouldn’t. At the time, the best we could tell was that it depended on how busy the CPU was during the boot sequence, and that led us to truss the “tmboot” process. With the help of @AlexFatkulin, we found something very interesting: this message recurring over and over.
…..
25029: close(5870682) Err#9 EBADF
25029: close(5870683) Err#9 EBADF
25029: close(5870684) Err#9 EBADF
25029: close(5870685) Err#9 EBADF
25029: close(5870686) Err#9 EBADF
25029: close(5870687) Err#9 EBADF
25029: close(5870688) Err#9 EBADF
…..
This message relates directly to closing “open files”: Err#9 EBADF (bad file descriptor) means close() was called on a descriptor number that was not open. But wait a minute, why would there be open file descriptors if the application isn’t up yet? As part of start-up, “tmboot” was trying to close every possible file descriptor, regardless of whether it was open. With ulimit -n set to “unlimited”, that meant 2147483647 possible file descriptors. The boot code was therefore looping, calling close on every descriptor number from 1 to 2147483647, which took so long that it was practically an infinite loop. As a corrective action, we set the limit to the recommended defaults and, guess what? The server booted every single time.
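To make the cost concrete, here is a hedged Python sketch of the pattern as it appears from the truss output; it is an illustration, not Tuxedo’s actual code. close_all_bruteforce and estimated_seconds are hypothetical names. The loop starts at 3 by default to leave stdin, stdout and stderr alone, which is my assumption, and the close function is injectable so the pattern can be tried without closing real descriptors.

```python
import errno
import os
from typing import Callable


def close_all_bruteforce(max_fd: int, start: int = 3,
                         close: Callable[[int], None] = os.close) -> tuple[int, int]:
    """Anti-pattern: call close() on every descriptor number in [start, max_fd).

    The loop runs (max_fd - start) times no matter how many descriptors are
    actually open. Returns (closed, ebadf) counts.
    """
    closed = ebadf = 0
    for fd in range(start, max_fd):
        try:
            close(fd)
            closed += 1
        except OSError as exc:
            if exc.errno != errno.EBADF:
                raise
            # Descriptor was not open: the "Err#9 EBADF" lines seen in truss.
            ebadf += 1
    return closed, ebadf


def estimated_seconds(max_fd: int, seconds_per_call: float, start: int = 3) -> float:
    """Rough wall-clock cost of the loop above for an assumed per-call cost."""
    return (max_fd - start) * seconds_per_call
```

For example, estimated_seconds(2147483647, 1e-6) assumes a hypothetical one microsecond per close() call and comes out to roughly 2147 seconds, about 36 minutes, for a single boot; under truss every system call is slower still. With a limit of 65536, the same loop makes about 65,000 calls and, at the same assumed rate, finishes in well under a second.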
It looks like setting the hard “nofile” limit to unlimited for WebLogic / Tuxedo exposed a bad coding practice: the code does not track which file descriptors it opened and instead tries to close every possible descriptor up to the limit.
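A sketch of the alternative, again in Python and under stated assumptions: either keep track of the descriptors you open and close only those (FdRegistry is a hypothetical helper), or ask the operating system which descriptors are actually open. The open_fds function assumes Linux’s /proc/self/fd; other platforms offer their own mechanisms, such as closefrom(3C) on Solaris or the close_range(2) system call on Linux 5.9 and later.

```python
import os


def open_fds(fd_dir: str = "/proc/self/fd") -> list[int]:
    """List this process's open descriptors (assumes Linux /proc).

    os.listdir opens its own descriptor for the directory, so the result can
    include one entry that is already closed by the time it is returned;
    callers that close these must tolerate EBADF.
    """
    return sorted(int(name) for name in os.listdir(fd_dir))


class FdRegistry:
    """Hypothetical helper: remember every descriptor we open, close only those."""

    def __init__(self) -> None:
        self._fds: set[int] = set()

    def open(self, path: str, flags: int = os.O_RDONLY) -> int:
        fd = os.open(path, flags)
        self._fds.add(fd)
        return fd

    def close_all(self) -> int:
        """Close tracked descriptors; cost is proportional to what we opened."""
        count = 0
        while self._fds:
            os.close(self._fds.pop())
            count += 1
        return count
```

Either way, the work done scales with the number of descriptors actually open, not with the value of ulimit -n.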
Bottom line? Always start with the System Recommendations and go from there. Don’t set things to UNLIMITED and think it’s a way to not have to worry about your system. It could expose the bad coding practices of others.
