Confusion about array initialization in C
In C, if I initialize an array like this:
int a[5] = {1,2};
then all the elements of the array that are not initialized explicitly will be initialized implicitly with zeroes.
But, if I initialize an array like this:
int a[5]={a[2]=1};
printf("%d %d %d %d %d\n", a[0], a[1],a[2], a[3], a[4]);
output:
1 0 1 0 0
I don't understand, why does a[0] print 1 instead of 0? Is it undefined behaviour?
Note: This question was asked in an interview.
TL;DR: I don't think the behavior of int a[5]={a[2]=1}; is well defined, at least in C99.
The funny part is that the only bit that makes sense to me is the part you're asking about: a[0] is set to 1 because the assignment operator returns the value that was assigned. It's everything else that's unclear.
If the code had been int a[5] = { [2] = 1 }, everything would've been easy: That's a designated initializer setting a[2] to 1 and everything else to 0. But with { a[2] = 1 } we have a non-designated initializer containing an assignment expression, and we fall down a rabbit hole.
Here's what I've found so far:
a must be a local variable.
6.7.8 Initialization
All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.
a[2] = 1 is not a constant expression, so a must have automatic storage.
a is in scope in its own initialization.
6.2.1 Scopes of identifiers
Structure, union, and enumeration tags have scope that begins just after the appearance of
the tag in a type specifier that declares the tag. Each enumeration constant has scope that
begins just after the appearance of its defining enumerator in an enumerator list. Any
other identifier has scope that begins just after the completion of its declarator.
The declarator is a[5], so variables are in scope in their own initialization.
a is alive in its own initialization.
6.2.4 Storage durations of objects
An object whose identifier is declared with no linkage and without the storage-class
specifier static has automatic storage duration.
For such an object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way. (Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block.) If the block is entered recursively, a new instance of the
object is created each time. The initial value of the object is indeterminate. If an
initialization is specified for the object, it is performed each time the declaration is
reached in the execution of the block; otherwise, the value becomes indeterminate each
time the declaration is reached.
There is a sequence point after a[2]=1.
6.8 Statements and blocks
A full expression is an expression that is not part of another expression or of a declarator.
Each of the following is a full expression: an initializer; the expression in an expression
statement; the controlling expression of a selection statement (if or switch); the
controlling expression of a while or do statement; each of the (optional) expressions of
a for statement; the (optional) expression in a return statement. The end of a full
expression is a sequence point.
Note that e.g. in int foo[] = { 1, 2, 3 } the { 1, 2, 3 } part is a brace-enclosed list of initializers, each of which has a sequence point after it.
Initialization is performed in initializer list order.
6.7.8 Initialization
Each brace-enclosed initializer list has an associated current object. When no
designations are present, subobjects of the current object are initialized in order according
to the type of the current object: array elements in increasing subscript order, structure members in declaration order, and the first named member of a union. [...]
The initialization shall occur in initializer list order, each initializer provided for a
particular subobject overriding any previously listed initializer for the same subobject; all
subobjects that are not initialized explicitly shall be initialized implicitly the same as
objects that have static storage duration.
However, initializer expressions are not necessarily evaluated in order.
6.7.8 Initialization
The order in which any side effects occur among the initialization list expressions is
unspecified.
However, that still leaves some questions unanswered:
Are sequence points even relevant? The basic rule is:
6.5 Expressions
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
a[2] = 1 is an expression, but initialization is not.
This is slightly contradicted by Annex J:
J.2 Undefined behavior
Between two sequence points, an object is modified more than once, or is modified
and the prior value is read other than to determine the value to be stored (6.5).
Annex J says any modification counts, not just modifications by expressions. But given that annexes are non-normative, we can probably ignore that.
How are the subobject initializations sequenced with respect to initializer expressions? Are all initializers evaluated first (in some order), then the subobjects are initialized with the results (in initializer list order)? Or can they be interleaved?
I think int a[5] = { a[2] = 1 } is executed as follows:
Storage for a is allocated when its containing block is entered. The contents are indeterminate at this point.
The (only) initializer is executed (a[2] = 1), followed by a sequence point. This stores 1 in a[2] and returns 1.
That 1 is used to initialize a[0] (the first initializer initializes the first subobject).
But here things get fuzzy because the remaining elements (a[1], a[2], a[3], a[4]) are supposed to be initialized to 0, but it's not clear when: Does it happen before a[2] = 1 is evaluated? If so, a[2] = 1 would "win" and overwrite a[2], but would that assignment have undefined behavior because there is no sequence point between the zero initialization and the assignment expression? Are sequence points even relevant (see above)? Or does zero initialization happen after all initializers are evaluated? If so, a[2] should end up being 0.
Because the C standard does not clearly define what happens here, I believe the behavior is undefined (by omission).
I don't understand, why does a[0] print 1 instead of 0?
Presumably a[2]=1 initializes a[2] first, and the result of the expression is used to initialize a[0].
From N2176 (C17 draft):
6.7.9 Initialization
The evaluations of the initialization list expressions are indeterminately sequenced with respect to
one another and thus the order in which any side effects occur is unspecified. 154)
So it would seem that output 1 0 0 0 0 would also have been possible.
Conclusion: Don't write initializers that modify the initialized variable on the fly.
My understanding is:
a[2]=1 returns the value 1, so the code becomes
int a[5]={a[2]=1} --> int a[5]={1}
int a[5]={1} assigns the value 1 to a[0].
Hence it prints 1 for a[0].
For example,
char str[10]={'H','a','i'};
sets
str[0] = 'H';
str[1] = 'a';
str[2] = 'i';
I think the C11 standard covers this behaviour and says that the result
is unspecified, and I don't think C18 made any relevant changes in
this area.
The standard language is not easy to parse.
The relevant section of the standard is
§6.7.9 Initialization.
The syntax documented is (where the occurrences of 'opt' should be subscripts, but MarkDown makes that hard):
initializer:
assignment-expression
{ initializer-list }
{ initializer-list , }
initializer-list:
designationopt initializer
initializer-list , designationopt initializer
designation:
designator-list =
designator-list:
designator
designator-list designator
designator:
[ constant-expression ]
. identifier
Note that one of the terms is assignment-expression, and since a[2] = 1 is indubitably an assignment expression, it is allowed inside
initializers for arrays with non-static duration:
§4 All the expressions in an initializer for an object that has
static or thread storage duration shall be constant expressions or
string literals.
One of the key paragraphs is:
§19 The initialization shall occur in initializer list order, each
initializer provided for a particular subobject overriding any
previously listed initializer for the same subobject;151)
all subobjects that are not initialized explicitly shall be
initialized implicitly the same as objects that have static storage
duration.
151) Any initializer for the subobject which is overridden
and so not used to initialize that subobject might not be evaluated at
all.
And another key paragraph is:
§23 The evaluations of the initialization list expressions are
indeterminately sequenced with respect to one another and thus the
order in which any side effects occur is unspecified.152)
152) In particular, the evaluation order need not be the
same as the order of subobject initialization.
I'm fairly sure that paragraph §23 indicates that the notation in the
question:
int a[5] = { a[2] = 1 };
leads to unspecified behaviour.
The assignment to a[2] is a side-effect, and the evaluations of the
initializer expressions are indeterminately sequenced with respect to one another.
Consequently, I don't think there is a way to appeal to the standard and
claim that a particular compiler is handling this correctly or incorrectly.
I'll try to give a short and simple answer to the puzzle: int a[5] = { a[2] = 1 };
First a[2] = 1 is set. That means the array holds: 0 0 1 0 0
But since you did it inside the { } braces, which are used to initialize the array in order, it takes the first value (which is 1) and stores that in a[0]. It is as if int a[5] = { a[2] }; remained, where a[2] is already 1. The resulting array is now: 1 0 1 0 0
Another example: int a[6] = { a[3] = 1, a[4] = 2, a[5] = 3 }; - Even though the order is somewhat arbitrary, assuming it goes from left to right, it would go in these 6 steps:
0 0 0 1 0 0
1 0 0 1 0 0
1 0 0 1 2 0
1 2 0 1 2 0
1 2 0 1 2 3
1 2 3 1 2 3
The assignment a[2]= 1 is an expression that has the value 1, and you essentially wrote int a[5]= { 1 }; (with the side effect that a[2] is assigned 1 as well).
Order of operations.
First, the assignment occurs and the assignment is evaluated as this:
int a[5] = {1}, which yields the following:
1, 0, 0, 0, 0. We get {1} because a[2]=1 evaluates to true and is implicitly cast to 1 in the assignment series.
Second, the expression within the curly brackets is executed, which results in actually assigning 1 to the array at index 2.
You can experiment by compiling int a[5]={true}; and also int a[5]={a[3]=3}; and observing the results.
edit: I was wrong about the result of the assignment within the initialization list; it yields an integer equal to the value that was assigned, not a boolean.