Bug #2615

YAML sometimes modifies the contents of string data

Added by R.I. Pienaar over 2 years ago. Updated about 2 years ago.

Status:Closed Start date:09/09/2009
Priority:High Due date:
Assignee:Markus Roberts % Done:

0%

Category:serialization
Target version:0.25.1
Affected Puppet version:0.25.0 Branch:http://github.com/MarkusQ/puppet/tree/feature/master/pson
Keywords:yaml json
Votes: 0

Description

While upgrading from 0.24 to 0.25 I get the following small issue, it only happens with one of my file resources which is rather odd since I have many that looks pretty much identical to this.

        file{"/usr/local/bin/machdb-send":
                owner   => root,
                group   => root,
                mode    => 750,
                content => template("machdb/machdb-send.erb");
        }
09:15:17 xen7> puppetd[19516]: (//machdb::client/File[/usr/local/bin/machdb-send]/content) content changed '{md5}0c606f2e7be81a862a44ae40e469a393' to 'unknown checksum'

The content that gets created is fine, the file isnt corrupt or anything and on the 2nd run of 0.25 nothing weird happens.

If I modify /usr/local/bin/machdb-send and run puppetd —test again it fixes the file to the expected content but again with the ‘unknown checksum’ log message.

All the template in question has are lines like:

my $user        = '<%= machdb_user %>';

not even any loops or weird ruby bits.


Related issues

related to Puppet - Bug #2551: File 'content changed to' message is often missing inform... Closed 08/18/2009
duplicated by Puppet - Bug #2717: unknown checksum error Duplicate 10/12/2009
duplicated by Puppet - Bug #2928: normal template content can cause YAML serialization to fail Rejected 12/14/2009

History

Updated by Markus Roberts over 2 years ago

  • Status changed from Unreviewed to Accepted
  • Target version set to 0.25.1

Updated by Markus Roberts over 2 years ago

  • Priority changed from Normal to Low

The patch for #2551 only produces checksums for file resources with a source which has a checksum; otherwise the “unknown checksum” message is produced.

Updated by R.I. Pienaar over 2 years ago

I’d say the priority on this should probably not be ‘low’ as when people do puppet updates a lot of resources get fired when they shouldn’t.

In my case for example when updating 60 machines to 0.25 i had quite a few bad things happen. From unexpected webservers restarting to in the worst case a mysql database restarting unexpectedly without any actual config changes.

The unknown checksum message triggers notifies, so this is pretty bad.

Updated by Luke Kanies over 2 years ago

volcane volcane wrote:

I’d say the priority on this should probably not be ‘low’ as when people do puppet updates a lot of resources get fired when they shouldn’t.

In my case for example when updating 60 machines to 0.25 i had quite a few bad things happen. From unexpected webservers restarting to in the worst case a mysql database restarting unexpectedly without any actual config changes.

The unknown checksum message triggers notifies, so this is pretty bad.

But you actually have a change, right? Your example above says you edited a file, which resulted in the (incorrectly logged) change, which should result in notifies, right?

Updated by R.I. Pienaar over 2 years ago

Sorry for not being clear:

1) Dont change any files, just upgrade puppet – get these messages and resources triggered. 2) After the first run that logged the changes, look at the files and confirm nothing has changed so no checksum messages should have been logged and nothing should have been triggered. 3) As a test, change the file and run puppet again, again the same unknown checksum error and this time – correctly – resources trigger.

So simply updating puppet and not actually changing anything else causes relationships to trigger.

So I guess I chose a stupid subject for this ticket, its not simply a logging issue.

Updated by Robin Bowes over 2 years ago

I’m seeing this behaviour too.

Here’s the log output after an upgrade from 0.24.8 –> 0.25

2009-09-12_00:46:36.16311 notice: Starting Puppet client version 0.25.0
2009-09-12_00:46:49.75567 notice: //site::statcounter::firewall::allowed_ips/Iptables::Allowed[9020-statcounter-internal]/File[/etc/iptables.d/filter-9020-statcounter-inte
rnal]/content: content changed '{md5}ed4243359dcfcd0f7034ebd2d7796895' to 'unknown checksum'
2009-09-12_00:46:49.76293 notice: //site::statcounter::firewall::allowed_ips/Iptables::Allowed[9010-statcounter-private]/File[/etc/iptables.d/filter-9010-statcounter-priva
te]/content: content changed '{md5}44ee250512616d95837de27ceb07081b' to 'unknown checksum'
2009-09-12_00:46:50.09726 notice: //iptables/Exec[rebuild_iptables]: Triggering 'refresh' from 2 dependencies
2009-09-12_00:46:50.90891 notice: //sshd::config/File[/etc/issue.net]/content: content changed '{md5}09f78bdada2d9d59d6a9389bae12411b' to 'unknown checksum'
2009-09-12_00:46:51.66260 notice: //site::statcounter::service::sshd/Runit::Service[sshd01]/Service[s_runit_sshd01]: Triggering 'refresh' from 1 dependencies
2009-09-12_00:46:51.70468 err: //site::statcounter::service::sshd/Runit::Service[sshd01]/Service[s_runit_sshd01]: Failed to call refresh on Service[s_runit_sshd01]: No command svc defined for provider runit
2009-09-12_00:46:52.08249 notice: Finished catalog run in 4.91 seconds

Nothing has changed in the manifests, I just upgraded the client. The server is already on 0.25.

I’ll raise a separate bug for the “No command svc defined for provider runit” issue.

Updated by James Turnbull over 2 years ago

  • Priority changed from Low to High

Updated by James Turnbull over 2 years ago

  • Subject changed from Incorrect checksum logging to Checksum incorrectly triggers resources and "unknown checksum" message generated

Updated by Luke Kanies over 2 years ago

Can someone try the fresh run with the following patch?

It should fix the logging so you can tell what the new checksum really is and whether there was a change or what.

diff --git a/lib/puppet/type/file/content.rb b/lib/puppet/type/file/content.rb
index ff71a55..6e13ebb 100755
--- a/lib/puppet/type/file/content.rb
+++ b/lib/puppet/type/file/content.rb
@@ -39,9 +39,10 @@ module Puppet
         # Checksums need to invert how changes are printed.
         def change_to_s(currentvalue, newvalue)
             # Our "new" checksum value is provided by the source.
-            unless source = resource.parameter(:source) and newvalue = source.checksum
-                newvalue = "unknown checksum"
+            if source = resource.parameter(:source) 
+                newvalue = "unknown checksum" unless newvalue = source.checksum
             end
+
             if currentvalue == :absent
                 return "defined content as '%s'" % [newvalue]
             elsif newvalue == :absent

Updated by R.I. Pienaar over 2 years ago

Luke Kanies wrote:

Can someone try the fresh run with the following patch?

It should fix the logging so you can tell what the new checksum really is and whether there was a change or what.

[…]

wish I could, but I’ve already refactored all my manifests for 0.25 deprecation noise so can’t first run a 0.24 client to attempt to reproduce.

Updated by Markus Roberts over 2 years ago

Can you provide any additional information that might help us reproduce this?

Updated by Markus Roberts over 2 years ago

  • Status changed from Accepted to Needs More Information

Updated by R.I. Pienaar over 2 years ago

Ok, I made a new module in my development environment that complies to 0.24.x ways and can see that the file contents does change, this seems to be a change in the contents of files once served up under 0.25.0 since the files do not actually change on the file system.

I am seeing an extra character appended to these files:

info: Filebucket[/var/lib/puppet/clientbucket]: Adding /usr/local/bin/machdb-send(1b6380149890b7ae59ebef53682abe93)
info: //machdb::client/File[/usr/local/bin/machdb-send]: Filebucketed /usr/local/bin/machdb-send to puppet with sum 1b6380149890b7ae59ebef53682abe93
notice: //machdb::client/File[/usr/local/bin/machdb-send]/content: content changed '{md5}1b6380149890b7ae59ebef53682abe93' to '{md5}722152b122e0607d501282ef6f72ec6b'
-rwxr-x--- 1 root root 2176 Sep 15 23:03 /usr/local/bin/machdb-send
-r--r----- 1 root root 2175 Sep 15 23:03 /var/lib/puppet/clientbucket/1/b/6/3/8/0/1/4/1b6380149890b7ae59ebef53682abe93/contents
# hexdump /var/lib/puppet/clientbucket/1/b/6/3/8/0/1/4/1b6380149890b7ae59ebef53682abe93/contents|tail -n 2
0000870 6972 746e 4020 6164 6174 0a3b 0a7d 000a
000087f

# hexdump /usr/local/bin/machdb-send|tail -n 2  
0000870 6972 746e 4020 6164 6174 0a3b 0a7d 0a0a
0000880

So it seems when 0.25.0 serves up a file it changes the contents of the file somehow.

Updated by R.I. Pienaar over 2 years ago

And for reference here’s the template that is used to build this file:

% hexdump machdb-send.erb|tail -n 2
00008b0 6972 746e 4020 6164 6174 0a3b 0a7d 000a
00008bf

So I would say in this case 0.24.8 interpreted this correctly, 0.25.0 is not.

Updated by Luke Kanies over 2 years ago

  • Subject changed from Checksum incorrectly triggers resources and "unknown checksum" message generated to Templates produce incorrect content
  • Status changed from Needs More Information to Accepted

Updated by Luke Kanies over 2 years ago

In getting more detail here, it is not happening consistently on all templates, nor on all hosts, nor on all templates for the hosts it is happening on.

So if anyone can help us sort out what’s actually causing this problem, we’d really appreciate it.

Updated by R.I. Pienaar over 2 years ago

For me this reproduces it every time:

1) Set up below module as is on a master – 0.24 or 0.25 doesnt matter, tested on both 0.24.8 and 0.25.0 and create a node block that just includes this class 2) Install 0.24.8 on a fresh node – delete the entire vardir before doing this. 3) Run the node, it creates /tmp/foo 4) Update the node to 0.25.0 5) Run the node, it updates /tmp/foo with a newline appended.

This is on CentOS 5.3 with RPMs from EPEL – except I removed the augeas dependency and rebuilt them.

The module in question: http://nephilim.ml.org/~rip/puppet/2615.tgz

Updated by Markus Roberts over 2 years ago

  • Status changed from Accepted to Needs More Information

volcane² —

Could you diff the two files and confirm that there is a trailing newline and that that is the only difference? I’m not seeing a line-ending change but I am seeing a content change. Thus I’d be interested in seeing exactly what changed for you.

Also, is it the case that all of the services that restarted had associated templates which exhibited this problem? In other words, is it reasonable to assume that the template change is the core cause here?

Thanks, — Markus

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

volcane² —

Could you diff the two files and confirm that there is a trailing newline and that that is the only difference? I’m not seeing a line-ending change but I am seeing a content change. Thus I’d be interested in seeing exactly what changed for you.

See the hexdump above, the final bit is the only change.

diff just shows “> ” no content change other than the last bit, and since that diff output isn’t particularly clear I showed the hexdump. 0a being \n

Also, is it the case that all of the services that restarted had associated templates which exhibited this problem? In other words, is it reasonable to assume that the template change is the core cause here?

It seems to only affect templates.

Updated by Markus Roberts over 2 years ago

  • Status changed from Needs More Information to Accepted

So what I produced was actually a separate, known issue (related to #1765 and the change in how the hostname is derived). I am not seeing the \n issue.

Updated by Markus Roberts over 2 years ago

volcane volcane —

Could you attach or send me the $vardir/state/localconfig.yaml resulting from a run which appends an additional terminal \n? I believe we may actually be looking at a serialization problem.

— Markus

Updated by Markus Roberts over 2 years ago

I can now reproduce this, at least in vitro. The what appears to be happening is that the content produced by the template is being sent to the client (as part of the catalog) as a yaml-serialized text block using the @|+@ indicator which includes all following newlines in the text block (see http://web.archive.org/web/20071011081219/http://yaml4r.sourceforge.net/cookbook/#the_%27+%27_indicator).

When this happens to fall before a terminating /n, the “extra” /n is sucked up as part of the text. I suspect (but cannot prove) that this behavior existed in some form in 24.8 as well.

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

I can now reproduce this, at least in vitro. The what appears to be happening is that the content produced by the template is being sent to the client (as part of the catalog) as a yaml-serialized text block using the @|+@ indicator which includes all following newlines in the text block (see http://web.archive.org/web/20071011081219/http://yaml4r.sourceforge.net/cookbook/#the_%27+%27_indicator).

When this happens to fall before a terminating /n, the “extra” /n is sucked up as part of the text. I suspect (but cannot prove) that this behavior existed in some form in 24.8 as well.

Nice one, glad you’re able to reproduce :)

Do you still need my localconfig.yaml?

Updated by James Turnbull over 2 years ago

  • Category set to serialization
  • Assignee set to Markus Roberts

Markus – status of this?

Updated by Markus Roberts over 2 years ago

Markus – status of this?

Probably not a new bug, probably not one that occurs very frequently in practice, no clear idea at the moment how to fix it.

If the content generated by a template ends in multiple newlines the string gets encoded in the catalog using the “|+” flag which means “suck up all the trailing newlines as part of this string.” If the content winds up as the last field of the last resource the newline at the end of the catalog gets included as part of the contents (converting, say, “/n/n” to “/n/n/n”).

Luke and I discussed trying to force the serialization order but that does not look like a clean or general solution. I’m open to other suggestions.

Updated by James Turnbull over 2 years ago

Volcane – Is this a show-stopper for you? Can I do a 0.25.1 release and bump this ticket to the next release?

Updated by R.I. Pienaar over 2 years ago

James Turnbull wrote:

Volcane – Is this a show-stopper for you? Can I do a 0.25.1 release and bump this ticket to the next release?

I think it’s a big deal but if Markus is correct then sure it seems low impact

however I had more cases of this than I have nodes which make me doubt at least some of the conclusions made – it only affecting the last resource.

If this is a yanl issue, would it go away when using Marshall for catelogs?

Updated by Markus Roberts over 2 years ago

If this is a yanl issue, would it go away when using Marshall for catelogs?

If our model is correct, yes. If it does not go away than my understanding of the root cause is completely wrong and we’re back to diagnosing.

  • Markus

Updated by Markus Roberts over 2 years ago

however I had more cases of this than I have nodes which make me doubt at least some of the conclusions made – it only affecting the last resource.

Do you mean by this that it affected more than one resource on one run on a single node?

— Markus

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

If this is a yanl issue, would it go away when using Marshall for catelogs?

If our model is correct, yes. If it does not go away than my understanding of the root cause is completely wrong and we’re back to diagnosing.

I should have made this obvious connection before.

All my machines use marhall but I also tested with yaml, both are affected.

  • Markus

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

however I had more cases of this than I have nodes which make me doubt at least some of the conclusions made – it only affecting the last resource.

Do you mean by this that it affected more than one resource on one run on a single node?

Yes

— Markus

Updated by Markus Roberts over 2 years ago

So our current status is that we don’t know what’s causing this and haven’t been able to reproduce it.

sigh

Back to square one.

Updated by Markus Roberts over 2 years ago

So is this presently reproducible? Specifically, if I were to give you a specially patched version of the 0.25.0 client instrumented to report select details of the template data as it passed through the system would you be able to run it in a known-to-fail scenario so we can probe the problem?

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

So is this presently reproducible? Specifically, if I were to give you a specially patched version of the 0.25.0 client instrumented to report select details of the template data as it passed through the system would you be able to run it in a known-to-fail scenario so we can probe the problem?

I can do that, but did the module I provided not help you reproduce this?

Updated by R.I. Pienaar over 2 years ago

here’s some other – perhaps random – observations:

  • I cannot reproduce this using just ‘puppet’ – the master has to be involved
  • Below is a bit from my fqdn.yaml when run in yaml mode:
>        if ($verbose) {
>                print "Connecting to $url\n";
>                print @data;
>        }
>        
>        
>
>      ? !ruby/sym mode
>      : "750"

And here is the same from 0.24.8:

>        if ($verbose) {
>                print "Connecting to $url\n";
>                print @data;
>        }
>        
>      owner: root
>      path: /tmp/foo
>      group: root
>    tags: 

(I added the > prevent redmine suppressing multiple new lines)

  • In neither of these cases were this file the last in the cat.
  • Both of these are against a 0.25.0 master
  • I can give you access to this machine to play

Updated by Markus Roberts over 2 years ago

  • Access to the machine would probably be the most useful. markus@reductivelabs.com
  • The snippet gave me an idea why we would be seeing this more with 0.25.0; the 1.8.1 compatibility monkey patch may be creating more internal line breaks. But this doesn’t hold up if you see the same thing using marshaling.
  • When I try it with the module you provided, it always works as it should.

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

  • Access to the machine would probably be the most useful. markus@reductivelabs.com
  • The snippet gave me an idea why we would be seeing this more with 0.25.0; the 1.8.1 compatibility monkey patch may be creating more internal line breaks. But this doesn’t hold up if you see the same thing using marshaling.
  • When I try it with the module you provided, it always works as it should.

I’d need a static ip and public keys, please send to me directly.

Updated by Markus Roberts over 2 years ago

So testing on the provided system I’m seeing slightly different results; specifically:

puppetd --onetime --test --preferred_serialization_format marshal

yields the correct file and

puppetd --onetime --test --preferred_serialization_format yaml

yields the file with the additional terminal “\n”

This seems to indicate that the original theory (yaml’s known “|+” flag behavior) coupled with another mechanism in addition to the end of stream \n may explain the data. I’ve got some theories but I’ll save them until a weeding pass narrows the field a bit (e.g. I’m going to run more tests after lunch).

Updated by R.I. Pienaar over 2 years ago

Markus Roberts wrote:

So testing on the provided system I’m seeing slightly different results; specifically:

[…]

yields the correct file and

[…]

yields the file with the additional terminal “ ”

pretty odd. All of my other machines have catelog_format=marshall and I tested 0.25 with you command lines on another machine and the resulting /tmp/foo is a md5 match for both formats

Updated by R.I. Pienaar over 2 years ago

R.I. Pienaar aka Volcane wrote:

pretty odd. All of my other machines have catelog_format=marshall and I tested 0.25 with you command lines on another machine and the resulting /tmp/foo is a md5 match for both formats

OK, my initial testing was flawed.

I used catalog_format=marshal / yaml to test the different formats and not preferred_serialization_format, puppet seems to silently ignore catelog_format now which is surely a bug. Additionally puppet also doesn’t complain if you spell marshall wrong (extral l) in either catelog_format or preferred_serialization_format – I think there’s some follow up bugs to file for all of this.

So, I confirm this affect only YAML format catelogs the paragraph above details how I missed that before.

Updated by Markus Roberts over 2 years ago

This is a bug in yaml. To demonstrate the error you can go into irb and type:

YAML.load(YAML.dump(/a/ => "\nlong\nstring\n\n" ))

which will return:

{/a/=>"\nlong\nstring\n\n\n"}

Note that the string now has three line feeds at the end instead of two. It is not, however, restricted to the last item in the list (my original assumption on that point was was flawed). Typing:

YAML.load(YAML.dump(/a/ => "\nlong\nstring\n\n", /b/ => "\nanother\nlong\nstring\n\n" ))

gives you two corrupted strings for the price of one.

{/a/=>"\nlong\nstring\n\n\n", /b/=>"\nanother\nlong\nstring\n\n\n"}

Note that I’m using regular expressions as keys here because that’s the easiest way to force the buggy code path; they aren’t essential, just convenient.

Now to figure out a way to fix it.

Updated by Matthias Saou over 2 years ago

I’m seeing some of those too after updating from 0.24.6 to 0.25.1rc1 :

File[/etc/php.ini]/content (notice): content changed '{md5}79a1ebf60a19366a46903ced44f589ea' to 'unknown checksum'

When doing a diff with the previous file, I see only one difference : My original php.ini.erb template has one line which looked empty but actually had two spaces. Once installed by puppet 0.24, the php.ini file didn’t have those two spaces, it was a real empty line, but now with 0.25.1, the two spaces are there :

# diff -u /var/lib/puppet/clientbucket/7/9/a/1/e/b/f/6/79a1ebf60a19366a46903ced44f589ea/contents /etc/php.ini
--- /var/lib/puppet/clientbucket/7/9/a/1/e/b/f/6/79a1ebf60a19366a46903ced44f589ea/contents  2009-10-06 16:35:10.000000000 +0200
+++ /etc/php.ini    2009-10-06 16:35:10.000000000 +0200
@@ -617,7 +617,7 @@
 ; needs to go here.  Specify the location of the extension with the
 ; extension_dir directive above.
 
-
+  
 ;;;;
 ; Note: packaged extension modules are now loaded via the .ini files
 ; found in the directory /etc/php.d; these are loaded by default.

So it seems like 0.25.1 is actually fixing something here…

Updated by Markus Roberts over 2 years ago

  • Subject changed from Templates produce incorrect content to YAML sometimes modifies the contents of string data
  • Keywords set to yaml json

This is one of a number of closely related issues related to data modification in the serialization process.

They are all being rolled into one and will be addressed by providing an internal serialization library that 1) does not depend on other installed software, 2) produces and consumes data in the same format regardless of platform or ruby version, and 3) is entirely under our control, so that any subsequently discovered issues can be resolved.

Updated by Luke Kanies over 2 years ago

Can you add the relationships to the other tickets?

Updated by Markus Roberts over 2 years ago

  • Status changed from Accepted to In Topic Branch Pending Review
  • Branch set to http://github.com/MarkusQ/puppet/tree/feature/master/bundled_json

With the bundled json patch (http://github.com/MarkusQ/puppet/tree/feature/master/bundled_json)

  • When using yaml this should revert to the exceedingly rare / never seen in practice level of 0.24.8;
  • When using pson it should not happen at all.

Updated by Markus Roberts over 2 years ago

  • Status changed from In Topic Branch Pending Review to Ready For Checkin

Note that this is the same branch as #2686

Updated by James Turnbull over 2 years ago

  • Status changed from Ready For Checkin to Code Insufficient

Ditto as per #2686

Updated by Markus Roberts over 2 years ago

  • Branch changed from http://github.com/MarkusQ/puppet/tree/feature/master/bundled_json to http://github.com/MarkusQ/puppet/tree/feature/master/pson

Rebased branch up at http://github.com/MarkusQ/puppet/tree/feature/master/pson

Updated by Markus Roberts over 2 years ago

  • Status changed from Code Insufficient to Ready For Checkin

Updated by James Turnbull over 2 years ago

  • Status changed from Ready For Checkin to Closed

Pushed in commit:bca3b70437666a8b840af032cab20fc1ea4f18a2 in branch 0.25.x

Also available in: Atom PDF