Today I was looking over the patches I have submitted so far and found that I broke the number 20 today, so I guess it's a good opportunity for a roundup and for putting things into perspective.
Most of the patches had to do with memory leaks, and when it comes to coding, "memory leak" is a pretty bad word, especially in conjunction with the Second Life browser, because the program (or more precisely its users) seems to have suffered quite badly from it eating more and more memory with every hour.
I just did a count. 15 of my patches were memory related in one way or another. I did not count exactly, but my guess is that 10 of them were not leaks in the sense that they did not make the program grow over time. They were just sloppy cleanup of one-time objects, meaning they did not give up their memory at the end of the program. That is no big issue from a user's perspective, because the memory was required during the runtime of the program anyway and because the operating system will reclaim the memory a millisecond after the point when a proper cleanup would have given it up. This stuff happens with most programs, and those bugs are more leftovers than leaks.
Then there were a handful of leaks which were negligible because they just lost a few kilobytes every time you opened or closed a dialog. For example, this was the case with the groups dialog: opening that a hundred times would have lost about half a megabyte. Technically this is a real leak ... it does make the program grow every time you do that, but from a user's practical standpoint these minor leaks are irrelevant as well, meaning nobody will be forced to buy more memory for their computer just to be able to run the application longer than an hour or two (unless you are opening and closing that dialog every ten seconds).
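The difference between a leftover and a leak can be sketched in a few lines of C++. This is only an illustration, not viewer code: `Block`, `settings`, `open_dialog_leaky`, `open_dialog_fixed` and the little `live_blocks` registry (a stand-in for a leak tracer) are all made-up names, and the 5 KB block size is an assumption chosen to match "about half a megabyte per hundred opens".

```cpp
#include <cstddef>
#include <set>

// Stand-in for a leak tracer: every Block that is still alive is listed here.
// The registry itself is deliberately never freed (it is a "leftover" too).
std::set<const void*>& live_blocks() {
    static std::set<const void*>* blocks = new std::set<const void*>;
    return *blocks;
}

// Hypothetical object of "a few kilobytes" (assumed size: 5 KB).
struct Block {
    char payload[5 * 1024];
    Block()  { live_blocks().insert(this); }
    ~Block() { live_blocks().erase(this); }
    Block(const Block&) = delete;
    Block& operator=(const Block&) = delete;
};

// Leftover: created once, needed for the whole run, never deleted.
// It shows up in the leak log at exit, but the program does not grow.
Block* settings() {
    static Block* instance = new Block;
    return instance;
}

// Leak: every call loses one more Block, so the footprint grows with use.
void open_dialog_leaky() {
    Block* scratch = new Block;
    (void)scratch;  // bug: the pointer is dropped without delete
}

// Fixed variant: the scratch object is released before returning.
void open_dialog_fixed() {
    Block* scratch = new Block;
    delete scratch;
}

// Bytes lost after a given number of leaky dialog opens.
std::size_t bytes_lost(std::size_t opens) {
    return opens * sizeof(Block);
}
```

Calling `settings()` a hundred times leaves one block in the registry, while calling `open_dialog_leaky()` a hundred times leaves a hundred. The registry lists both kinds side by side and cannot tell which entry is a leftover and which is a leak.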
Which brings me to THE LEAK. The growth of memory over time had a single prime source, which was a leak in a 3rd party library (an open source project called libcurl) which the Lindens were using to fetch data from their servers. As far as I can tell, they were (and are) using the library version from December 2006 and the libcurl folks found and fixed their leak in mid March.
Interestingly the big leak was the hardest to find. It was the second bug I was looking for when I started with the browser source (the first one was the AFK bug) and it took me three full days to nail it.
The reason was partly that it was clouded by leftovers, mini leaks and normal growth of the memory footprint. I first tried to reproduce it more consistently which failed because the program grows to a certain extent over the first half hour as part of the regular design (the memory cache for textures filling up). Took me a day or so to figure it out and eventually resulted in filing an issue to fix regular memory consumption (VWR-733).
Then, when I eventually understood what part of the footprint was "normal" (by design), I still did not know if the stuff I was looking at in the leaks log was mainly there because the program left over parts of its normal memory consumption or if it was genuinely leaky stuff.
I know now that most of that stuff in the log *was* actually leaky. The leaks and leftovers I'm finding today are increasingly easy to locate because there is less and less left in the leaks log. It just sticks out. But back then, what I was looking at were logs the size of 50-100MB with nothing but dumped memory blocks.
The reason why I'm writing this is to put things a bit into perspective. It's not that the SL browser is leaking all over. In fact the other real leaks were really negligible. What made things hard were the leftovers, because they cluttered the logs, and those logs don't tell you which entry is a leftover and which is a leak.
Which is why I am currently so religious about fixing all that seemingly irrelevant stuff. Those mini leaks and leftovers may be irrelevant to the end user, and with all those shiny new features in the pipeline it's easy to write them off as secondary or unimportant. Quite frankly, it takes a bit of a deviant coder personality to find pleasure in tracking that kind of stuff down. But sooner or later they are going to come back and bite you in the butt. And interestingly enough, end users seem to have an acute sense about stuff like that, even without knowing exactly why ... just remember the Open Letter project a month ago.
I've been asked a couple of times why I'm investing the efforts here, and quite honestly I'm not entirely sure what's driving me.
Maybe it's because I'm thrilled by this project (Second Life), maybe because I love the intellectual challenge. But I guess a big part of it is that I've talked to one person too many who told me they kept crashing every two hours right in the middle of something and that they didn't have or did not want to spend the money for more memory or a faster computer.
I've been following the forum a bit over the last week, and when it came to memory the standard answer was that you'll need at least 1.5GB or better 2GB on the machine to run SL decently. Most people I know and used to hang out with on the grid, however, just have 512MB, and only a few upgraded to 1GB for the program. I myself bought a new computer because the damn thing almost swapped the hard disk to death on my 512MB laptop and because I thought the low FPS rates were due to the video board.
Go figure ...
Wednesday, May 30, 2007
Monday, May 28, 2007
Release Day: Nicholaz Edition 16d
I've been sorting through the patches available on JIRA and patched the 1.16.0.5 viewer with all my stuff and some of the others found there, so this edition is about 25 patches ahead of the Linden viewer.
A few days ago LL released an update which, from what I've heard, just fixes some Korean language stuff, so I did not bother to apply all my stuff to that one, so you'll still get the upgrade notice with my edition, but I changed the default button to "continue" so you can just hit enter to get across that.
I also (hopefully) fixed the crash reporting. If you crash, it will write two crash files to your Second Life program folder. You can mail me the smaller one and save a copy of the larger (see the AboutCrashes.txt in my archive) and I'll check if I can locate the reason.
If something goes badly wrong with the release, the older editions (like 16c) are still on the download server and should continue to work fine.
Disclaimer: This viewer is unofficial and although I'm doing my best to make it better than the release viewer, it may contain extra bugs. Please understand that it runs against the main grid with your real Second Life account. In this regard, it is similar to a First Look viewer. Please read and understand that link and also read the disclaimer in the Install.txt in my archive.
So, if you'd like to test and use this release, download it here.
Labels:
nicholaz edition,
open source,
patches,
release
Sunday, May 27, 2007
Bug-Sunday
I had planned to not go for bug hunting today, because I had to give up on something particularly complex and nasty yesterday night (VWR-828).
But fate decided otherwise, because I had a little scripting project that I wanted to do for a friend (in fact it was long over due).
From habit, I start SL through the debugger ... you never know what will happen, especially with this program, and I hate it when something crashes and I could have looked at it but can't because I ran it from the desktop instead of investing two or three extra clicks for the debugger.
As far as the script project goes, I lasted about 10 minutes before the viewer froze. 20 minutes later I filed VWR-869 on the JIRA (which usually takes another 10 minutes alone because the site is so slow and I have to explain things etc.).
Good deed done for the day, I continued. My wooden object was still standing around, and I was just panning around it when the viewer crashed (I swear it was one minute into the session, when I had fired up the browser after getting all done with the issue above and wanted to continue with the script project).
This one I will remember as one of my favorite bugs. It was pretty clear that it was something hard to reproduce, so I had to get it right (analyzed correctly at least) on the first time. The thing with debugging is, once you close the debugger after a crash, all evidence which you have not secured will be gone ... quite like letting a horde of press people onto a crime scene.
I'll spare you the details (if you're a programmer, look at VWR-870, it makes a nice puzzle), but it's a fine combination of two almost inconsequential things which, based on timing and other circumstances, lead to a crash. Involved was a counter which was uninitialized, or "bad food" (some funny guy at Microsoft decided to set new memory to a value of hex BA AD F0 0D, which makes its way into variables, giving them a value of BAADF00D if the program forgets to initialize them correctly).
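To show what "bad food" in a counter looks like, here is a small sketch. It is not the code from VWR-870, whose details are not given here. `Record`, `make_record` and `fill_like_fresh_memory` are hypothetical names, and instead of reading truly uninitialized memory (which is undefined behavior in C++) the sketch fills a buffer with the pattern by hand and copies the fields out of it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The fill value described in the post.
const std::uint32_t kFreshFill = 0xBAADF00Du;

// Simulates freshly allocated memory by stamping the pattern over a buffer.
void fill_like_fresh_memory(unsigned char* mem, std::size_t bytes) {
    for (std::size_t off = 0; off + sizeof kFreshFill <= bytes; off += sizeof kFreshFill) {
        std::memcpy(mem + off, &kFreshFill, sizeof kFreshFill);
    }
}

// Hypothetical record with a counter that is easy to forget.
struct Record {
    std::uint32_t id;
    std::uint32_t pending;  // the counter
};

// Builds a Record on top of "fresh" memory. With init_counter == false the
// counter keeps whatever the fill pattern left behind.
Record make_record(std::uint32_t id, bool init_counter) {
    unsigned char raw[sizeof(Record)];
    fill_like_fresh_memory(raw, sizeof raw);

    std::memcpy(raw + offsetof(Record, id), &id, sizeof id);
    if (init_counter) {
        const std::uint32_t zero = 0;
        std::memcpy(raw + offsetof(Record, pending), &zero, sizeof zero);
    }

    Record result;
    std::memcpy(&result, raw, sizeof result);
    return result;
}
```

With `init_counter` set to `false`, `pending` comes back as `0xBAADF00D`, which is roughly 3.1 billion when read as an unsigned 32-bit number. Any loop or comparison that trusts that counter is in for a surprise.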
Well, and to round up the day, while looking at the cause for VWR-870 I found more bad food which brings us to VWR-871 and VWR-873. It will be interesting to watch this issue and see how long it takes for it to arrive in a release build. It is most definitely inconsequential, but it is a matter of style and I have seen minor errors like that sit on JIRA for months.
Leave enough inconsequential stuff inside a program and it will combine and mix and mingle and eventually something will blow up and nobody knows why.
But heck, I haven't even done my script yet and have also not posted my memory leak of the (sun)day.
Labels:
bugs,
debugging,
second life
Saturday, May 26, 2007
Memory Leak of the Day (Saturday, 26th)
I have been reporting memory leaks for a couple of days now (you can see the whole series through a link down on my page on the wiki which lists all my JIRA issues). They were even interesting enough that a Linden has asked me to send them to him directly, and so I decided to make a series of daily findings until I can run my viewer in the debugger with leak tracing on and zero memory left over at the end of the program.
Today's leak of the day is an easy one, in fact one of the first I found. I reported it as part of a larger patch at VWR-364 but I doubt anybody noticed it there (in fact I was pretty pissed when I found that nothing of 364 had appeared in the 1.16.0.5 release).
It is a bug where a new particle script is created, and when the program finds it cannot correctly decode the data that's supposed to configure the object, it just returns and forgets about the newly created object. This happened in a similar way in llviewerpartsim.cpp
Details are on JIRA as VWR-865
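The general shape of this kind of bug (create an object, fail to decode, return without cleaning up) can be sketched like this. It is a simplified illustration and not the code from the viewer or from the patch. `ParticleSource`, `decode`, `create_leaky`, `create_fixed` and the `live_sources` registry are made-up names, and the "decoder" just accepts short strings of digits.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <string>

struct ParticleSource;

// Stand-in for a leak tracer: lists every ParticleSource still alive.
// The registry itself is intentionally never freed.
std::set<ParticleSource*>& live_sources() {
    static std::set<ParticleSource*>* sources = new std::set<ParticleSource*>;
    return *sources;
}

// Hypothetical particle object configured from incoming data.
struct ParticleSource {
    int rate;
    ParticleSource() : rate(0) { live_sources().insert(this); }
    ~ParticleSource() { live_sources().erase(this); }
    ParticleSource(const ParticleSource&) = delete;
    ParticleSource& operator=(const ParticleSource&) = delete;
};

// Toy decoder (an assumption for this sketch): accepts 1 to 6 digits only.
bool decode(const std::string& data, ParticleSource& out) {
    if (data.empty() || data.size() > 6) return false;
    int value = 0;
    for (char c : data) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');
    }
    out.rate = value;
    return true;
}

// Leaky shape: on a decode failure the function returns and the freshly
// created object is never deleted.
ParticleSource* create_leaky(const std::string& data) {
    ParticleSource* source = new ParticleSource;
    if (!decode(data, *source)) {
        return nullptr;  // bug: source is lost here
    }
    return source;
}

// One possible fix: let a smart pointer own the object, so every early
// return releases it automatically.
std::unique_ptr<ParticleSource> create_fixed(const std::string& data) {
    std::unique_ptr<ParticleSource> source(new ParticleSource);
    if (!decode(data, *source)) {
        return nullptr;  // source is destroyed on the way out
    }
    return source;
}
```

An explicit `delete` before the early return would work just as well. The smart pointer is used here only because it also covers any early return added later.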
Labels:
debugging,
Memory Leak of the Day,
second life
Welcome
This is the unavoidable "Hello World" style first post that appears on basically every blog. In a few moments it will probably be edited beyond recognition, but WTF :-)
Labels:
blog,
open source,
second life