Bug #2029
node yaml file can get corrupted
| Status: | Closed | Start date: | 02/26/2009 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | | % Done: | 0% |
| Category: | plumbing | | |
| Target version: | 0.25.0 | | |
| Affected Puppet version: | 0.24.7 | Branch: | |
| Keywords: | | | |
| Votes: | 0 | | |
Description
A few times a week, some of my nodes’ YAML files become corrupted:
Thu Feb 26 01:51:39 -0800 2009 //Node[default]/network/File[/etc/inet/netmasks] (err): Failed to retrieve current state of resource: Could not parse YAML data for node xxx.sun.com: syntax error on line 45, col 8: ` zones: global' Could not describe /files/etc/inet/netmasks: Could not parse YAML data for node xxx.sun.com: syntax error on line 45, col 8: ` zones: global' at /puppet/config/manifests/classes/network.pp:22
And when I take a look in the yaml file, it is easy to spot the corruption:
ipaddress_e1000g0:1: 10.6.48.226
sshrsakey: AAAAB3NzaC1yc2ExxxxxIwAAAIEA1dvidJlovk3aqsMmMmgn7d30BLne9I0wwTVlBNcM0vjISWqWQG7LVRp2cEEkfH/s0PNIj+/Mut14FWqSMxxe3sYKZNnvCJwxOsaHAqpOPCZnugsPvfVKRcFatxxxxxIIOx4aIHGcZSxetVEobErzMTfSUc0B9paXgZ+qm4YaWTU=
operatingsystem: Solaris
sshdsakey: AAAAB3NzaC1kc3MAAACBAP+jc9R9G2TCT4+m6LMBq/ZzKXNuMkg4Mv3KiU/Ob/SJf0Pd1OzNHi4xxWSow2sELmpicI/ywt0sCsEEdIXYNcKDe1YSkpF4H/h1qdiZfbMPIzalzqPZkHpt40rg93fpgAMY9ummM7OWRHeYdeyLxUEwwTIIza+/C6JOoT+afLmbAAAAFQDglf4ErQZF6lKed4bpOJh+OAlgFwAAAIBHRrxxxxxLh0LbSGwDSchfLF1GBLx30usAqW8PMkF3VH14V+nbqwnD/Knf3qs/Bf/xneUnIL2rgb6bryZw+FRaG8SwKlCw/Iy7AkcMshb/mW2zu+4K5/B2Z5ZAFX3WFDvHeJanxug76UwxQZzzDYx1sopoEV4rsXFJpoSd5mWF7AAAAIEA/S7BebFGrRMy8zzIISGcF8wBiSJI/5Xln3lDPgClVDtPwawb1UUNi15NM975/u5BdGFhKrVtpnZQDgaqQxmULQ+m7mufP2p+3XHqcZH3uDXbz92cMWi/Udww/SJJc/cqyGSJsEjNeVUTVzARqwxxxxxSfVIDw7p2U5LF/GWkOF0=
rubyversion: 1.8.7
time: 2009-02-25 16:08:50.020999 -08:00
MmMmgn7d30BLne9I0wwTVlBNcM0vjISWqWQG7LVRp2cEEkfH/s0PNIj+/Mut14FWqSMxxe3sYKZNnvCJwxOsaHAqpOPCZnugsPvfVKRcFat0Xi4LIIOx4aIHGcZSxetVEobErzMTfSUc0B9paXgZ+qm4YaWTU=
operatingsystem: Solaris
sshdsakey: AAAAB3NzaC1kc3MAAACBAP+jc9R9G2TCT4+m6LMBq/ZzKXNuMkg4Mv3KiU/Ob/SJf0Pd1OzNHi4xxWSow2sELmpicI/ywt0sCsEEdIXYNcKDe1YSkpF4H/h1qdiZfbMPIzalzqPZkHpt40rg93fpgAMY9ummM7OWRHeYdeyLxUEwwTIIza+/C6JOoT+afLmbAAAAFQDglf4ErQZF6lKed4bpOJh+OAlgFwAAAIBHRrD0lI3Lh0LbxxxxxchfLF1GBLx30usAqW8PMkF3VH14V+nbqwnD/Knf3qs/Bf/xneUnIL2rgb6bryZw+FRaG8SwKlCw/Iy7AkcMshb/mW2zu+4K5/B2Z5ZAFX3WFDvHeJanxug76UwxQZzzDYx1sopoEV4rsXFJpoSd5mWF7AAAAIEA/S7BebFGrRMy8zzIISGcF8wBiSJI/5Xln3lDPgClVDtPwawb1UUNi15NM975/u5BdGFhKrVtpnZQDgaqQxmULQ+m7mufP2p+3XHqcZH3uDXbz92cMWi/Udww/SJJc/cqyGSJsEjNeVUTVzARqwJdljNSfVIDw7p2U5LF/GWkOF0=
rubyversion: 1.8.7
time: 2009-02-25 16:08:50.020999 -08:00
History
Updated by James Turnbull almost 3 years ago
- Category set to plumbing
- Status changed from Unreviewed to Accepted
- Assignee set to Luke Kanies
- Target version set to 0.24.8
Updated by Luke Kanies almost 3 years ago
- Status changed from Accepted to Needs More Information
You’re sure this is in 0.24.7? We added some specific protections, including using a lock file, for writing and reading these files, so they should be safe.
Unless you think it’s not a threading issue and something else is causing the problem?
We don’t use a temp file for writing because I couldn’t find a method that would give us atomic renames while retaining the locks.
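For readers weighing the temp-file trade-off mentioned above, here is a minimal Ruby sketch of one common workaround: take the lock on a separate sidecar file rather than on the data file, so that renaming the temp file over the target does not discard the lock. This is an assumption for illustration only, not what Puppet 0.24.7 does; the helper names `atomic_write` and `locked_read` and the `.lock` / `.tmp.<pid>` suffixes are hypothetical.

```ruby
# Hypothetical helpers; not Puppet's implementation.

# Write data to path so readers see either the old file or the new one,
# never a partially written file.
def atomic_write(path, data)
  # The lock lives on a sidecar file, so replacing `path` keeps the lock valid.
  File.open("#{path}.lock", File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_EX)            # exclusive: one writer at a time
    tmp = "#{path}.tmp.#{Process.pid}"   # temp file in the same directory
    File.open(tmp, "w") do |f|
      f.write(data)
      f.flush
      f.fsync                            # push the contents to disk before renaming
    end
    File.rename(tmp, path)               # atomic on POSIX within one filesystem
  end                                    # closing the lock file releases the lock
end

# Read path while holding a shared lock on the same sidecar file.
def locked_read(path)
  File.open("#{path}.lock", File::RDWR | File::CREAT, 0644) do |lock|
    lock.flock(File::LOCK_SH)            # shared: many readers, no writer
    File.read(path)
  end
end
```

Because the lock is advisory, this only helps if every reader and writer of the file goes through the same two helpers, including any additional server processes.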
Updated by Josh Anderson almost 3 years ago
I’m also experiencing this with 0.24.7. I am, however, running multiple puppetmasterd processes on my master. Could that be the cause of the corruption?
Updated by Martin Englund almost 3 years ago
I’m very sure I’m running 0.24.7 :)
I’m only running one puppetmasterd as well. Let me know if you want me to add debugging code to it…
Updated by Luke Kanies almost 3 years ago
Any chance it could be happening when the server is shutting down, or something similar?
What’s the frequency of the corruption?
I’m nearly positive it’s not a concurrency issue, which doesn’t leave a lot of other options.
Updated by Martin Englund almost 3 years ago
It happens during normal operations. The server process has been running for weeks.
I get about 2 failures per week. I’ll start to document when they happen (and any suspicious circumstances)
Updated by Luke Kanies almost 3 years ago
- Subject changed from node yaml file can get corrputed to node yaml file can get corrupted
Updated by Martin Englund almost 3 years ago
Got another node yaml file corrupted today.
Updated by Luke Kanies almost 3 years ago
I’m quite stumped on this one. The only way I can see for the files to get corrupt is, maybe, if the server is getting stopped while writing to a file. It doesn’t look like that’s what’s happening, though.
Anyone else have any ideas?
Updated by Luke Kanies almost 3 years ago
I’m bumping this unless we can actually figure out what the problem is.
Updated by Luke Kanies almost 3 years ago
- Target version changed from 0.24.8 to 2.6.0
Updated by Martin Englund almost 3 years ago
I’m now getting about 3 corrupted files per day!
Updated by David Escala almost 3 years ago
This node/yaml corruption has hit us too.
I don’t know why this happens, why the yaml file gets messy, but a corrupted cache should be expired and fetched from its origin again. An invalid cache entry should not stop a node from getting the catalog.
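The behaviour described above (treat an unparseable cache entry as a miss and fetch from the origin again) could look roughly like the following Ruby sketch. It is not the linked patch and does not use Puppet’s API; the helper name `load_cached_facts` is hypothetical, and the block passed to it stands in for whatever fetches the data from its origin.

```ruby
require "yaml"

# Hypothetical helper; not Puppet's API.
# Returns the cached mapping if it parses; otherwise calls the block to
# fetch fresh data, rewrites the cache, and returns the fresh data.
def load_cached_facts(cache_path)
  data = YAML.load_file(cache_path)
  # An empty or truncated file can parse to a non-mapping value; treat that as corrupt.
  raise ArgumentError, "cache is not a mapping" unless data.is_a?(Hash)
  data
rescue StandardError => e
  # Covers YAML syntax errors as well as a missing or unreadable file.
  warn "Discarding unusable cache #{cache_path}: #{e.message}"
  fresh = yield
  File.write(cache_path, YAML.dump(fresh))
  fresh
end
```

A caller would use it as `load_cached_facts(path) { fetch_from_origin }`, where `fetch_from_origin` is again a placeholder. This hides the symptom so a node can still get its catalog; it does nothing about whatever corrupts the file in the first place.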
Updated by David Escala almost 3 years ago
Patch here http://github.com/descala/puppet/commit/cf6febe82221a99317a186dc34aa84996ffb381f
Updated by Luke Kanies almost 3 years ago
- Target version changed from 2.6.0 to 0.25.0
We’ll at least get the fix (or some kind of fix) into 0.25.
Updated by Martin Englund almost 3 years ago
That did wonders for me :)
With this fix in place I’m fine with closing this bug…
Updated by Luke Kanies almost 3 years ago
- Status changed from Needs More Information to Ready For Checkin
I applied a form of the patch in the tickets/master/2029 branch in my repo.
Note that this doesn’t fix the corruption itself; it only stops the cache failure from propagating.
Updated by James Turnbull almost 3 years ago
- Status changed from Ready For Checkin to Closed
Pushed in commit:7398fa171fdd6dcaeb2d8fd1c07a23bbd78891d0 in branch master.