Wednesday, March 26, 2014

Quicktip: Run multiple mongods on Ubuntu with upstart

If you're using Ubuntu with the 10gen-supplied packages for MongoDB that use upstart, you may have noticed that running two mongods on the same host doesn't work out of the box.  There are several threads where people suggest things like making copies of mongod and doing other wacky actions that may or may not involve a chicken.

Here's the easier way (trivial if your configs are templates generated by configuration management):
  • Add pidfilepath to all of your mongodb.conf files, ensuring the pidfilepath is different for each instance you want to run.  (IMO: This should be a default in the config file.)
  • You should already have different config files since you'll need to change the value of the listen port and db directory.  If you specify all of this on the command line you're a bad person and should feel bad about yourself.
  • Edit /etc/init/mongodb.conf and add "-p /pidfilepath/from/config/file.pid" to the start-stop-daemon arguments, before the --exec.
  • Copy the init file once for each additional mongod instance you want to run, adjusting the pidfile path in each copy to match that instance's config (see the sketch below).
  • Run stuff
This forces the start-stop-daemon call in the upstart script to check the pidfile to see if your process is running, instead of its default behavior of traversing the process list looking for identical commands already running.  Note that you don't need to add "-m" to make start-stop-daemon create the pidfile; mongod's pidfilepath setting will do that for you.
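
Here's a rough sketch of the moving parts for two instances (the ports, paths, and the "arbiter" instance name are made up for illustration, and the exact start-stop-daemon stanza in your packaged upstart script may differ in the details):

# /etc/mongodb.conf (first instance)
port=27017
dbpath=/var/lib/mongodb
pidfilepath=/var/run/mongod-27017.pid

# /etc/mongodb-arbiter.conf (hypothetical second instance)
port=27018
dbpath=/var/lib/mongodb-arbiter
pidfilepath=/var/run/mongod-27018.pid

# In /etc/init/mongodb.conf, add -p with the matching pidfilepath before --exec.
# /etc/init/mongodb-arbiter.conf is a copy of this file with the paths swapped.
exec start-stop-daemon --start --quiet --chuid mongodb \
    -p /var/run/mongod-27017.pid \
    --exec /usr/bin/mongod -- --config /etc/mongodb.conf

Once both init files are in place, "start mongodb" and "start mongodb-arbiter" will happily coexist.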

Why would you want to do this?  In my case I want to run a replica set arbiter on a node that's also running a database, so I can test the setup and teardown of replicas in Vagrant without having to create a third machine.  In other cases, I want to use a separate mongo database on the same machine my third arbiter is running on.

Friday, March 2, 2012

Stupid Puppet Tricks: Appending to a variable

Want to maybe once in a while append things to a variable and not have to resort to 15 snippet files and an exec to cat them together?  Here's one way, assuming your classes are under your control and you can declare and use top-scope variables.

This is a kludge, and likely shouldn't work in puppet.  It's certainly against the core design and probably just a bad idea.

node 'foo' {
  include appender
  include speaker
  $bob = ''
  appender::append { 'one':
    text   => 'one',
    before => Appender::Append['two'],
  }
  appender::append { 'two':
    text => 'two',
  }
  speaker::speak { 'truth':
    lies    => $bob,
    require => Appender::Append['two'],
  }
}

class appender {
  define append($text) {
    # relies on the old += append syntax picking $bob up from an outer
    # (here, top) scope; += was removed in later Puppet releases
    $bob += "$text "
  }
}

class speaker {
  define speak($lies) {
    notice("the speaker $lies")
  }
}
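
A hedged usage sketch (the manifest name is made up): drop all of the above into one manifest and apply it directly.  Since the += append syntax was removed in later Puppet releases, this stunt only stands a chance on the 2.x-era versions that were current when this was written.

# assumes the host actually matches the 'foo' node definition; on newer
# releases expect errors about variable re-assignment instead of output
puppet apply site.pp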

Wednesday, January 18, 2012

ruby: gem install dies with buffer overflow

In hopes the Google machine will help some other poor hapless soul who stumbles across the same problem...

At work, I'm trying to use vlad-push to do pushes of current code to remote hosts.  I found a few bugs, fixed them on my Mac, minted a new gem, and then tried to make an RPM at work using the newly built gem file.  This fails spectacularly any time you try to parse the gem's metadata, like so:

[nhruby@rpmbuilder1 rubygem-vlad-push]$ gem specification vlad-push-1.1.0.gem
*** buffer overflow detected ***: /usr/bin/ruby terminated
======= Backtrace: =========
/lib64/libc.so.6(__chk_fail+0x2f)[0x36cf6e807f]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(rb_syck_mktime+0x48e)[0x2aaaaade298e]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(yaml_org_handler+0x860)[0x2aaaaade32a0]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(syck_defaultresolver_node_import+0x39)[0x2aaaaade34a9]
/usr/lib64/libruby.so.1.8[0x322503492e] 
/usr/lib64/libruby.so.1.8[0x3225034e48] 
/usr/lib64/libruby.so.1.8[0x32250353f2] 
/usr/lib64/libruby.so.1.8(rb_funcall+0x85)[0x32250356c5]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(rb_syck_load_handler+0x47)[0x2aaaaade2437]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(syck_hdlr_add_node+0x39)[0x2aaaaaddd839]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(syckparse+0xb45)[0x2aaaaadde605]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(syck_parse+0x19)[0x2aaaaade6d29]
/usr/lib64/ruby/1.8/x86_64-linux/syck.so(syck_parser_load+0xed)[0x2aaaaade22ad]
/usr/lib64/libruby.so.1.8[0x322503492e] 
 
On my Mac I use rbenv and had 1.9.3-p0 installed, which spits out a "date:" value in the metadata YAML that causes older versions of ruby's YAML parser to crash on Linux systems with hardened glibc builds (more details found here).
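
If you want to look at the offending field without tripping the crash, you can bypass ruby's YAML parser entirely, since a .gem is just a tar archive with the gzipped metadata inside (the gem file name here is from the example above):

# pull the raw YAML metadata out of the gem and inspect the date field
tar -xOf vlad-push-1.1.0.gem metadata.gz | zcat | grep '^date:'
# a gem built under 1.9.3 writes a sub-second timestamp (something like
# "2012-01-18 00:00:00.000000000 Z") that syck's rb_syck_mktime chokes on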

Switching rbenv to use ruby 1.8.7 fixes the issue by producing a date field with a saner value that older ruby versions (such as those on CentOS/RHEL 5) can cope with.

 

Saturday, June 5, 2010

Get out of the sysadmin firefighting business

A while back there was a post on the lopsa-discuss mailing list about time management.  If you read it and the ensuing thread, you'll find a number of really good suggestions for handling your work time more effectively so that you're more productive and less harried, and start to gain real situational awareness of your environment.  It's all good stuff, and I have used many of the suggestions in that thread with great success.  If you're a system administrator and feel that you need 36 hours in a day, go read that thread and then do at least one of the recommendations.  You'll never look back.

However, there was one particular bit from the original post that really has been hanging out in the back of my mind, bugging me:
I frequently find myself dealing with so many little things throughout the day that by the end of the day I feel like I've been busy but can't really point at what I've done during the day.
So the entire day is spent running around "fighting fires"?  Time management can't fix that problem; trust me, I've tried.  It can help and it's a great first step, so you should do it.  But at some point you need to stop looking for better firefighting techniques and start fireproofing things so they don't catch fire in the first place.  You might think that's a really hard (or even impossible) thing to do and that asbestos underwear is itchy.  Luckily, you'd be wrong on the first part of that thought, and I'd like to talk about some high-level, introductory concepts that can help you get started fireproofing quickly.

And, no, I don't really want to talk about your underwear.

The first step for me is always to fix the flare-ups, the small recurring fires.  If you're constantly fighting the same fire, over and over again, then it's time you showed up with something more than a garden hose.  You'll be happy, your users will be happy, your bosses will be happy.  And as a wonderful side effect, you'll have more time to manage because you won't be in reactive mode all of the time fixing things!

In a perfect world you'd see the problem at its very core, tackle it with precision, and resolve it once and for all.  Pesky print server?  Replace it!  Unhappy database server?  Upgrade it!

We don't live in a perfect world though, so sometimes the only real fix is to manage the problem so that the pain it causes stays at a bearable level until you can handle it correctly.  One method is to isolate the problem so that when it explodes it can't take anything else out.  For example, move that troublesome application that causes whatever hardware it's on to lock up and require a reboot to a dedicated host.  That way the reboot only affects the application instead of everything else on the server.  Another method is to install some sprinklers that will automatically put out the fire for you.  Got a service that likes to leak memory?  Automate a restart during the lowest usage period so that the leak doesn't cause problems during peak usage times.

That's all fine for technical issues.  If you're constantly putting out fires from end-user questions and tickets, there are some other strategies that can help.  Documentation is one method, but self-service documentation portals are only so useful: often we forget to update the docs so they're a little bit wrong, users don't follow directions carefully, some just don't want to, etc.  I additionally take a three-pronged approach to handling fires from users:
  • Educate: Try to educate your users when you can so they understand the problem they're having.  If you explain it well enough, they can synthesize the information and use it to help themselves later.  Better yet, if you have desktop support or helpdesk staff, educate them so they can fix the problem on first contact with the user so everyone walks away happy.
  • Automate: Accountants are not sysadmins.  They do not want to follow a 12-step process to reset their passwords.  Automate the things people do frequently that cause problems so they're easy and less error-prone.
  • Facilitate: Some people just are not reasonable.  Facilitate their needs by getting it done without argument or hassle so everyone can get on with their life.  Often just doing whatever it is will take less time than arguing about it anyway, so skip the drama and suck it up.
A similar strategy can work for management-initiated fires too, though with a heavier dose of facilitation.

The takeaway here is that if you're fixing the same thing over and over again, you're not really fixing it.  Step back, look at the problem from all sides, examine the pain points, and find a way to get the fire under control enough that you get some time and sanity back and your users don't feel like they need those pitchforks and torches.  If you can put the fire out once and for all, even better; if not, you're probably dealing with a big fire, which takes a separate kind of attention.


A side effect of fixing the flare-ups is that the air is a lot clearer, making it easier to see the smoke from the real fires.  So my second step in fireproofing is to start looking for that smoke and, if possible, the flames at the source.  In order to see the fire before your users do, start monitoring the performance, capacity, and availability of your environment.

If you don't have monitoring in place already, put some in and start by monitoring something about everything.  Don't spend huge amounts of time or money on monitoring at this point because you'll have no idea what you really need.  Stand up some cheap and easy monitoring solution, start tossing stuff into it, and see what's useful.  If something breaks, put in a monitor for it.  Eventually you'll have enough monitoring in place (and experience from it) to make an educated and well-formed decision about what you need to do in order to get to comprehensive and useful monitoring.  And be sure to do that evaluation, otherwise....

If you do have monitoring, fix it.  Seriously.  If you just had to fix a series of flare-ups, and you suffer interruptions every minute of the day because something is broken or needs attention that you weren't proactive enough to resolve before the users took notice, then something is fundamentally broken with your monitoring.  Evaluate what you monitor, how you monitor it, and what you monitor it with, and look for where the breakdown is.  Too many fine-grained monitors make even a server reboot look like a calamity?  Add in some dependencies.  Monitoring package doesn't monitor services well?  Add something else that does.  Is it really hard to set up proper monitoring because each machine needs a finicky client installed?  Find something new.


People laugh and think I'm joking when I say "Monitoring is a journey, not a destination" but I'm not.  Things change and your monitoring will need to change along with those things.  As a system administrator, it is the single most useful thing you can have in your arsenal.


So that's my simple two-step, two-minute introduction to getting out of the sysadmin firefighting business.  I don't maintain that these suggestions will put out every fire you may have or come across, but I do think they offer a good place to start.  In future posts I'd like to examine how to deal with the larger fires that arise, tire fires, better fireproofing through design, and what kinds of tools are out there to help you fight the fires.

Saturday, May 1, 2010

Backup Applications for the Mac

I love backups.  They let me sleep well at night, they make me feel good in the morning, and that little pit of despair deep in my soul gets a little smaller every time I see my Time Machine icon spin.  So imagine my reaction when my wife forwarded me an email from her campus IT folks that had this to say about Time Machine after she inquired about why it didn't seem to be installed on her Mac:
"Time Machine is not an enterprise product so is basically banned on campus. It works well and is great for home use but the security issues on campus comes from the fact that it backs up without you thinking about it.  If someone sends you a file with SSNs in it, Time Machine backs it up.  If you delete that file and empty your trash, Time Machine still retains a copy of it.  Time Machine retains a lot of things that you intended to be deleted and never want back even."
The multiple layers of wrongness in that statement have confounded me for days.  Even better, the small pit of despair is now growing again.  So I'm spending my Saturday afternoon evaluating other backup applications for my wife's Time Machine-less laptop.

So far the leading contender is ChronoSync for regular data protection tasks (backups, archives) of her home directory, plus Carbon Copy Cloner on a semi-regular basis for disaster recovery images.  CCC is awesome and has been forever, but this is the first time I've ever tried ChronoSync.  It looks pretty nice.  I'm interested to see how it does after a week or two on my wife's laptop, going back and forth to work every day while the storage device it wants to use isn't available.  The one thing I have found is that it can't automatically mount a remote share and use a disk image that's on the share.  It's not a big deal, but it would be nice to have.  Synk doesn't seem to support this either, so maybe it's not an oft-requested feature.

Procedural note: At my wife's campus backups aren't banned, the genre of software known as "backup software" or "data protection software" isn't banned as far as I can find, and the IT folks helpfully suggested to my wife that manually copying important data to a USB drive was an appropriate data protection method, so clearly copying data to some other location isn't banned either.  As a result, I don't think that setting this up at her request causes my wife to violate any work policies surrounding data storage, protection, or retention.  For those of you following along at home: always check the fine print before futzing with a machine that isn't yours.

System Administration isn't doomed, but it's going to be hella different real soon

Let's compare and contrast these two articles:
My personal opinion is that cloud computing isn't going to take over the enterprise any time real soon. Most companies don't want their data floating on someone else's machines. And in terms of context and locality, it doesn't make sense to have a "print server in the cloud" much less a file share (and yes, there will always be file shares).

I do think that the concept and architecture of "cloud computing" are going to change the I.T. industry and the profession of system administration a whole lot, though. My particular vision goes something like:
  • Various {system,application,database,network} administration roles will blend more. In a cloud, it's much harder to build walls between hardware, network, applications, and "other stuff" because they all depend on each other. To a degree, this has already begun: witness SOA.
  • Scale will start to bite these fine folks. The tools will get better as a direct result.
  • Day to day "I.T. Guy" tasks will be automated/documented/made easy enough for them to be pushed down to the users, or at least desktop/helpdesk staff. "I.T. Guy" either moves laterally to another career, or refocuses as desktop/helpdesk or upper level administration.

Sunday, April 11, 2010

Selling out, or selling up? Yes, I'm Enabling Ads

When I first started trying to write and post regularly I found it difficult to do so.  Part of my difficulty was trying to publish content that wasn't just a whiny rundown of my regular activities -- since my life is pretty boring, that kind of rundown really isn't very interesting either.  The other difficulty I had was finding the will to regularly post.  Two years later I think I'm starting to get the hang of the writing part and since I need to keep doing it in order to get better I'm looking for some extra motivation.

Love, happiness, and a sense of safety and well-being are all great motivators; but the Blogger "Monetize" tab looked much easier and quicker.  So, I'm going to try serving some ads along with content in order to see if having a positive financial impact will spur me into posting more regularly.  To keep my karma levels even, I will apply the first $75 of revenue to a Kiva loan.  Judging by the (lack of) traffic I get, that will take a while to happen.  Also, since my traffic levels are so low, I'm not even going to worry about planning past that first $75.  My goal is to post and write, so I'm really less concerned about money and more concerned about just producing something worthwhile.  As things progress, I will post updates on the effectiveness of this scheme and link to whatever Kiva loans are made as a result.  Hey, look!  It's already working, I have more stuff to write about!

Since I just enabled ads my request hasn't been fully processed yet.  So in a wonderful twist of irony, this post to my blog about enabling ads on my blog will go live with, you got it, no ads.