[PUP-7579] Tags do not support UTF-8 characters Created: 2017/05/19  Updated: 2017/06/28  Resolved: 2017/06/05

Status: Closed
Project: Puppet
Component/s: None
Affects Version/s: None
Fix Version/s: PUP 5.0.0

Type: Bug Priority: Normal
Reporter: Ethan Brown Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: resolved-issue-added
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Blocks
blocks MODULES-4864 vcsrepo ignores "force" for git repos Resolved
Relates
relates to PUP-7536 Improve Tag Support Open
Template:
Acceptance Criteria:

Solves issue with concat module described in MODULES-4684.

Epic Link: Puppet Unicode Adoption Blockers
Team: Agent
Story Points: 2
Sprint: Agent 2017-06-14
Release Notes: Bug Fix
Release Notes Summary: Puppet will now accept unicode tags, useful if you want to run a subset of resources in a catalog using a tag name that isn't US-ASCII. This also allows the concat::fragment defines from the concat module to write to file paths that are not US-ASCII. Marking as a bug fix since puppet accepts unicode for other inputs like resource name/titles, but not tags. We'll also need to update the public docs that specify the valid tag regex.
QA Contact: Eric Delaney
QA Risk Assessment: No Action
QA Risk Assessment Reason: Check in includes tests for the change

 Description   

The Regex for tag validation (at https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/tagging.rb#L4) is strictly ASCII:

ValidTagRegex = /\A[0-9A-Za-z_][0-9A-Za-z_:.-]*\Z/

This prevents the usage of tags in the following case:

puppet apply -e 'notify { "hi": tag => "_メインディレクトリ_醸造所" }'

With a stack like:

Error: Invalid tag '_メインディレクトリ_醸造所' at /root/foo.pp:1 on node h3a12zjzc6608rh.delivery.puppetlabs.net
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/errors.rb:106:in `fail'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/tagging.rb:27:in `block in tag'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/tagging.rb:13:in `each'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/tagging.rb:13:in `tag'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:185:in `set_parameter'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:385:in `block in extract_parameters'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:381:in `each'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:381:in `extract_parameters'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/resource.rb:368:in `initialize'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:130:in `initialize'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_resource_support.rb:38:in `new'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_resource_support.rb:38:in `block in create_resources'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_resource_support.rb:37:in `map'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_resource_support.rb:37:in `create_resources'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/runtime3_support.rb:321:in `create_resources'
 
require 'puppet/util/tag_set'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/evaluator_impl.rb:842:in `block in eval_ResourceExpression'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/evaluator_impl.rb:839:in `map'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/evaluator_impl.rb:839:in `eval_ResourceExpression'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:48:in `block in visit_this'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:42:in `each'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:42:in `visit_this'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:71:in `visit_this_1'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/evaluator_impl.rb:82:in `evaluate'
/root/foo.pp:in `stack'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/puppet_stack.rb:30:in `eval'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/puppet_stack.rb:30:in `stack'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/evaluator_impl.rb:715:in `eval_Program'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:48:in `block in visit_this'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:42:in `each'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:42:in `visit_this'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/visitor.rb:71:in `visit_this_1'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/evaluator/evaluator_impl.rb:82:in `evaluate'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/pops/parser/evaluating_parser.rb:63:in `evaluate'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/ast/pops_bridge.rb:132:in `evaluate'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/ast.rb:31:in `safeevaluate'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/resource/type.rb:184:in `evaluate_code'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:81:in `block in evaluate'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/resource.rb:73:in `evaluate'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/compiler.rb:635:in `evaluate_main'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/compiler.rb:174:in `block (2 levels) in compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/compiler.rb:174:in `block in compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/context.rb:65:in `override'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet.rb:294:in `override'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/compiler.rb:162:in `compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/parser/compiler.rb:33:in `compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/catalog/compiler.rb:268:in `block (2 levels) in compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler/around_profiler.rb:58:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/profiler.rb:51:in `profile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/catalog/compiler.rb:266:in `block in compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util.rb:224:in `block in benchmark'
/opt/puppetlabs/puppet/lib/ruby/2.1.0/benchmark.rb:294:in `realtime'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util.rb:223:in `benchmark'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/catalog/compiler.rb:264:in `compile'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/catalog/compiler.rb:55:in `find'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/indirector/indirection.rb:194:in `find'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application/apply.rb:256:in `block in main'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/context.rb:65:in `override'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet.rb:294:in `override'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application/apply.rb:225:in `main'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application/apply.rb:170:in `run_command'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application.rb:358:in `block in run'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util.rb:542:in `exit_on_fail'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/application.rb:358:in `run'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/command_line.rb:132:in `run'
/opt/puppetlabs/puppet/lib/ruby/vendor_ruby/puppet/util/command_line.rb:72:in `execute'
/opt/puppetlabs/puppet/bin/puppet:5:in `<main>'

A more lenient and Unicode friendly approach is to use the Unicode friendly word regex like:

ValidTagRegex = /\A[[:word:]][[[:word:]]:.-]*\Z/

This appears necessary to fix the concat module issue reported at MODULES-4684

The open question around this is whether or not tags are restricted in this fashion due to some important underlying reason.

Thomas Hallgren spent some time optimizing regular expressions for tags in https://github.com/puppetlabs/puppet/commit/0badfce72bafc767bf3099718c63324f51108996 and the original check (based on words) goes back 10 years to https://github.com/puppetlabs/puppet/commit/f98be4a7198326b26f1072c401b3e337f340db40#diff-ac9d5450f67274b038705194387c3907R31



 Comments   
Comment by Ethan Brown [ 2017/05/24 ]

Moses also discovered that PDB has some validation code around tags at https://github.com/puppetlabs/puppetdb/blob/master/src/puppetlabs/puppetdb/catalogs.clj#L304-L320 that might need to change as well.

To the point Thomas Hallgren made on the PR we should probably not use :word: in Ruby as its not POSIX compliant. Java has a \p{Alnum} character class given its POSIX compliant, but no equivalent of the Ruby :word:

Comment by Josh Cooper [ 2017/05/26 ]

Merged to master in https://github.com/puppetlabs/puppet/commit/2ac4dc0395e08a603e97698de82833d388095c70

Comment by Josh Cooper [ 2017/06/02 ]

Ethan Brown We also need to update the API docs in https://github.com/puppetlabs/puppet/blob/master/api/schemas/catalog.json#L7-L14 and https://github.com/puppetlabs/puppet/blob/master/api/schemas/catalog.json#L61-L68

Comment by Henrik Lindberg [ 2017/06/07 ]

We also need to make sure that collection results are sane when querying on a tag with lower/uppercase international characters and that the same result is obtained from virtual and exported collection.

In PUP-6772 I voted against unicode in tags because of the many corner cases caused by different support for Unicode in MRI < 2.4.1 and that we are not yet on JRuby 9000.

While the fix here makes it possible to stick unicode chars into a tag - it really needs to work throughout the entire ecosystem.

Comment by Henrik Lindberg [ 2017/06/15 ]

Given that there are no acceptance tests that assert queries work on UTF-8 in tags (exported resources) or that it works with collection of virtual resources (matching tags when running on JRuby < 9k, up/down case issues, byte or codepoint compare etc). Do we want to release-note this feature at all? Or simply say that there is basic support for this (YMMV)? Ping Ethan Brown/Josh Cooper.

Comment by Josh Cooper [ 2017/06/23 ]

I think we should release note this as the latter, adds basic support for unicode tags. For example, running a subset of a catalog works with a unicode tag:

$ cat manifest.pp
notify { 'hello Aۿᚠ𠜎':
  tag => 'Aۿᚠ𠜎'
}
$ bundle exec puppet apply --tags empty manifest.pp
Notice: Compiled catalog for XXX.corp.puppetlabs.net in environment production in 0.02 seconds
Notice: Applied catalog in 0.01 seconds
$ bundle exec puppet apply --tags Aۿᚠ𠜎 manifest.pp
Notice: Compiled catalog for XXX.corp.puppetlabs.net in environment production in 0.02 seconds
Notice: hello Aۿᚠ𠜎
Notice: /Stage[main]/Main/Notify[hello Aۿᚠ𠜎]/message: defined 'message' as 'hello Aۿᚠ𠜎'
Notice: Applied catalog in 0.01 seconds

I'm sure there are downstream implications that we can address as future unicode adoption blockers.

Comment by Ethan Brown [ 2017/06/23 ]

Henrik Lindberg I'm sure you're right that there will be some areas where tags won't work for all their intended use cases when they are specified as non ASCII characters. However, we did have the low-hanging fruit need in this case to address the concat module problem and the problem Josh Cooper outlined.

So I'm in agreement with Josh Cooper about advertising this as a bug fix that introduces initial support, noting we may later have other fixes to add.

Comment by Ethan Brown [ 2017/06/23 ]

New PR to update schema at https://github.com/puppetlabs/puppet/pull/6016
PR also raised to update Puppet 5 docs at https://github.com/puppetlabs/puppet-docs/pull/757

Henrik Lindberg it looks like some of the other regular expressions in puppet-docs are out of date - would you mind updating them?

Generated at Tue Aug 11 00:38:37 PDT 2020 using Jira 8.5.2#805002-sha1:a66f9354b9e12ac788984e5d84669c903a370049.