Bug #1469

File recursion with a remote source should not recurse locally

Added by Brice Figureau about 2 years ago. Updated 4 months ago.

Status:Closed Start:07/30/2008
Priority:Normal Due date:
Assignee:James Turnbull % Done:

100%

Category:file
Target version:0.25.0
Affected version:0.24.8 Branch:
Keywords:file recursive slow
Votes: 0

Description

Hi, I have the following manifests:

file {
    "${homeroot}/$name":
    source => [ "puppet:///files/users/$name/skel/", "puppet:///files/users/skel/"],
    sourceselect => all,
    checksum => mtime, 
    recurse => true, owner => $name, group => $name, replace => no,
    require => [ User["$name"], Group["$name"] ];
}

that is used to copy various files to the home directory of a new user.

If the home directory of this user contains a deep hierarchy full of files, puppet takes ages and consume a lot of memory to finally copy over a handful of files.

My understanding of the issue is the following: * Puppet builds localrecurse first (see puppet/type/file.rb). This parts can take a long time, each local present file has to be stat'ed and a new (implicit) resource is created in the catalog. * then Puppet builds the sourcerecurse

If Puppet would build the sourcerecurse first, and prune the localrecurse in function of the availability of the sourced file/dir it would be way much faster. Moreover I don’t think people are using file{} to distribute more than a few files, but I bet they like me have more local files than distributed.

Also note, that there is another issue if filebucket and checksum are enabled in the aforementioned manifest: it will copy over all files, and filebucket the previous one, as seen in this log extract:

debug: Calling fileserver.list
debug: Calling fileserver.describe
debug: Calling fileserver.retrieve
debug: //all_users/Account[brice]/File[/home/brice/.zshrc]/source: Executing 'diff -u /home/brice/.zshrc /tmp/user/0/puppet-diffing20080704-15708-glfegc-0'
debug: //all_users/Account[brice]/File[/home/brice/.zshrc]: Changing checksum,source
debug: //all_users/Account[brice]/File[/home/brice/.zshrc]: 2 change(s)
debug: //all_users/Account[brice]/File[/home/brice/.zshrc]/checksum: Replacing /home/brice/.zshrc checksum {md5}1395a1f669e2c86443b142cc5ad10f55 with {mtime}Fri Jul 04 15:17:39 +0200 2008
notice: //all_users/Account[brice]/File[/home/brice/.zshrc]/checksum: checksum changed '{md5}1395a1f669e2c86443b142cc5ad10f55' to '{mtime}Fri Jul 04 15:17:39 +0200 2008'
debug: Calling puppetbucket.addfile
info: //all_users/Account[brice]/File[/home/brice/.zshrc]: Filebucketed to server with sum 1395a1f669e2c86443b142cc5ad10f55
debug: //all_users/Account[brice]/File[/home/brice/.zshrc]/checksum: Replacing /home/brice/.zshrc checksum {mtime}Fri Jul 04 15:17:39 +0200 2008 with {md5}1395a1f669e2c86443b142cc5ad10f55
notice: //all_users/Account[brice]/File[/home/brice/.zshrc]/source: replacing from source puppet:///files/users/brice/skel/.zshrc with contents {md5}1395a1f669e2c86443b142cc5ad10f55

This is because default file checksum is md5, but my file resource above said “mtime”. It then compares apples to oranges.


Related issues

duplicated by Puppet - Feature #1700: Recursive file copies touching unmanaged files should be ... Duplicate 10/28/2008

History

Updated by James Turnbull about 2 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Luke Kanies
  • Affected version changed from 0.24.4 to 0.25.0

Updated by Luke Kanies almost 2 years ago

  • Subject changed from Recursive file resource with lots of local files is horribly slow. to File recursion with a remote source should not recurse locally
  • Priority changed from Normal to High

I guess we need to change the behaviour here, but we need to provide backward compatibility, probably through some kind of recursion selection attribute:

file { "/etc/ssh": 
  source => "...",
  recurse => true,
  recurseselect => 
}

I don’t like that attribute name, but we need something, and at least for now, it should default to the current behaviour (probably with a deprecation warning).

Updated by Brice Figureau almost 2 years ago

luke wrote:

I guess we need to change the behaviour here, but we need to provide backward compatibility, probably through some kind of recursion selection attribute: […]

I don’t like that attribute name, but we need something, and at least for now, it should default to the current behaviour (probably with a deprecation warning).

Why not simply:

file { "/etc/ssh": 
  source => "...",
  recurse => ,
}

With true = both With false = no recurse

Updated by Peter Meier almost 2 years ago

Why not simply: […] recurse => <remote|both|true|false>,

With true = both With false = no recurse

+1

Updated by Luke Kanies almost 2 years ago

masterzen wrote:

Why not simply: … recurse => <remote|both|true|false>

With true = both With false = no recurse

This is a change in behaviour, but I’m not sure anyone’s actually using that behaviour. That is, you can say ‘recurse => 2’.

This means you’d either need to support multiple arguments to recurse (e.g., recurse => [remote, 2]) or change this behaviour. I expect it’s safe to change the behaviour, but….

Updated by Brice Figureau almost 2 years ago

luke wrote:

masterzen wrote:

Why not simply: … recurse => <remote|both|true|false>

With true = both With false = no recurse

This is a change in behaviour, but I’m not sure anyone’s actually using that behaviour. That is, you can say ‘recurse => 2’.

Yes, I forgot this feature (which I used to workaround my original issue).

This means you’d either need to support multiple arguments to recurse (e.g., recurse => [remote, 2]) or change this behaviour. I expect it’s safe to change the behaviour, but….

I don’t really like the syntax you’re proposing. It is not really “typed”, and error prone (what happens if I write [2,remote] or [1,2]).

If we want to go this way, the following would IMHO be preferable:

... recurse => ,
    recurselimit => [0-9]+

Where:

recurse parameter:
 both => old behavior
 remote => new behavior (recurse on master, prune on local)
 true => same as both
 false => default (ie no-recurse)
 [0-9]+ => both, but limit recursion, give a warning to user that this syntax is deprecated and that they should move to recurselimit.

recurselimit parameter:
 [0-9]+ => recursion limit, usable with both (or true) and remote. If recurse=false it is ignored. If recurse is a number, then an error should be raised.

This way, we preserve the old behavior by default and we warn people that the syntax is deprecated.

Does it make sense?

Updated by Luke Kanies almost 2 years ago

Heh, I didn’t like the syntax I was proposing. :)

Your proposal makes perfect sense, although I would use ‘depth’ instead of ‘recurselimit’.

Updated by Brice Figureau almost 2 years ago

luke wrote:

Heh, I didn’t like the syntax I was proposing. :)

Your proposal makes perfect sense, although I would use ‘depth’ instead of ‘recurselimit’.

I’m all for it.

Updated by Brice Figureau over 1 year ago

  • Assignee changed from Luke Kanies to Brice Figureau

Since I’m working on it, it should be assigned to me :–)

Updated by Brice Figureau over 1 year ago

  • Status changed from Accepted to Ready for Testing
  • Assignee changed from Brice Figureau to James Turnbull
  • Priority changed from High to Normal
  • Target version set to 0.25.0
  • % Done changed from 0 to 100
  • 3 changed from Unknown to Easy
  • Affected version changed from 0.25.0 to 0.24.8

Hi,

It took 5 months :–) but the patch is now available for master in my github repository (see tickets/1469): http://github.com/masterzen/puppet/tree/tickets/1469

Thanks,

Updated by James Turnbull over 1 year ago

  • Status changed from Ready for Testing to Closed

Pushed in commit:“33d3624c91b236e237fe194d9df3d9fa324111c3” in branch master

Also available in: Atom PDF