Merged
5 changes: 5 additions & 0 deletions Changelog
@@ -1,3 +1,8 @@
+version 1.7.4.1 (tag v1.7.4.1rel)
+=================================
+* Change default encoding for stringtochar/chartostring functions from 'utf-8' to 'utf-8'/'ascii' for dtype.kind='U'/'S'
+(issue #1464).
+
version 1.7.4 (tag v1.7.4rel)
================================
* Make sure automatic conversion of character arrays <--> string arrays works for Unicode strings (issue #1440).
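The changelog entry above boils down to picking a default encoding from the array's `dtype.kind` when the caller passes none. A minimal sketch of that selection rule follows, mirroring the branch added to `_netCDF4.pyx` in this PR. The helper name `resolve_default_encoding` is hypothetical and is not part of netCDF4; only NumPy is assumed.

```python
import numpy as np


def resolve_default_encoding(arr, encoding=None):
    """Return the encoding to use for a string array (hypothetical helper).

    Mirrors the rule described in the changelog: byte strings (kind 'S')
    default to 'ascii', unicode strings (kind 'U') default to 'utf-8'.
    An explicitly passed encoding is returned unchanged.
    """
    kind = arr.dtype.kind
    if kind not in ("S", "U"):
        raise ValueError("type must be string or unicode ('S' or 'U')")
    if encoding is None:
        encoding = "ascii" if kind == "S" else "utf-8"
    return encoding


byte_strings = np.array([b"foo", b"bar"], dtype="S3")
unicode_strings = np.array(["foo", "bar"], dtype="U3")

print(resolve_default_encoding(byte_strings))             # kind 'S'
print(resolve_default_encoding(unicode_strings))          # kind 'U'
print(resolve_default_encoding(byte_strings, "latin-1"))  # explicit value wins
```

Callers that relied on the old `'utf-8'` default for `'S'` arrays can presumably keep that behaviour by passing `encoding='utf-8'` explicitly.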
18 changes: 9 additions & 9 deletions docs/index.html
@@ -1226,7 +1226,7 @@ <h2 id="support-for-complex-numbers">Support for complex numbers</h2>
<h2 class="section-title" id="header-functions">Functions</h2>
<dl>
<dt id="netCDF4.chartostring"><code class="name flex">
-<span>def <span class="ident">chartostring</span></span>(<span>b, encoding='utf-8')</span>
+<span>def <span class="ident">chartostring</span></span>(<span>b, encoding=None)</span>
</code></dt>
<dd>
<div class="desc"><p><strong><code>chartostring(b,encoding='utf-8')</code></strong></p>
@@ -1236,8 +1236,8 @@ <h2 class="section-title" id="header-functions">Functions</h2>
Will be converted to a array of strings, where each string has a fixed
length of <code>b.shape[-1]</code> characters.</p>
<p>optional kwarg <code>encoding</code> can be used to specify character encoding (default
-<code>utf-8</code>). If <code>encoding</code> is 'none' or 'bytes', a <code>numpy.string_</code> byte array is
-returned.</p>
+<code>utf-8</code> for dtype=<code>'UN'</code> or <code>ascii</code> for dtype=<code>'SN'</code>). If <code>encoding</code> is 'none' or 'bytes',
+a <code>numpy.string_</code> byte array is returned.</p>
<p>returns a numpy string array with datatype <code>'UN'</code> (or <code>'SN'</code>) and shape
<code>b.shape[:-1]</code> where where <code>N=b.shape[-1]</code>.</p></div>
</dd>
@@ -1254,7 +1254,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
<p><strong>calendar</strong>: describes the calendar to be used in the time calculations.
All the values currently defined in the
<code>CF metadata convention &lt;http://cfconventions.org/cf-conventions/cf-conventions#calendar&gt;</code>__ are supported.
-Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian'
+Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian', 'tai',
'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'</strong>.
Default is <code>None</code> which means the calendar associated with the first
input datetime instance will be used.</p>
@@ -1305,7 +1305,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
<p><strong>calendar</strong>: describes the calendar to be used in the time calculations.
All the values currently defined in the
<code>CF metadata convention &lt;http://cfconventions.org/cf-conventions/cf-conventions#calendar&gt;</code>__ are supported.
-Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian'
+Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian', 'tai',
'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'</strong>.
Default is <code>None</code> which means the calendar associated with the first
input datetime instance will be used.</p>
@@ -1381,7 +1381,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
<p><strong>calendar</strong>: describes the calendar used in the time calculations.
All the values currently defined in the
<code>CF metadata convention &lt;http://cfconventions.org/cf-conventions/cf-conventions#calendar&gt;</code>__ are supported.
-Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian'
+Valid calendars <strong>'standard', 'gregorian', 'proleptic_gregorian', 'tai',
'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day'</strong>.
Default is <strong>'standard'</strong>, which is a mixed Julian/Gregorian calendar.</p>
<p><strong>only_use_cftime_datetimes</strong>: if False, python datetime.datetime
@@ -1476,7 +1476,7 @@ <h2 class="section-title" id="header-functions">Functions</h2>
(default) or <code>'U1'</code> (if dtype=<code>'U'</code>)</p></div>
</dd>
<dt id="netCDF4.stringtochar"><code class="name flex">
-<span>def <span class="ident">stringtochar</span></span>(<span>a, encoding='utf-8', n_strlen=None)</span>
+<span>def <span class="ident">stringtochar</span></span>(<span>a, encoding=None, n_strlen=None)</span>
</code></dt>
<dd>
<div class="desc"><p><strong><code>stringtochar(a,encoding='utf-8',n_strlen=None)</code></strong></p>
@@ -1487,8 +1487,8 @@ <h2 class="section-title" id="header-functions">Functions</h2>
Will be converted to
an array of characters (datatype <code>'S1'</code> or <code>'U1'</code>) of shape <code>a.shape + (N,)</code>.</p>
<p>optional kwarg <code>encoding</code> can be used to specify character encoding (default
-<code>utf-8</code>). If <code>encoding</code> is 'none' or 'bytes', a <code>numpy.string_</code> the input array
-is treated a raw byte strings (<code>numpy.string_</code>).</p>
+<code>utf-8</code> for dtype=<code>'UN'</code> or <code>ascii</code> for dtype=<code>'SN'</code>). If <code>encoding</code> is 'none' or 'bytes',
+a <code>numpy.string_</code> the input array is treated a raw byte strings (<code>numpy.string_</code>).</p>
<p>optional kwarg <code>n_strlen</code> is the number of characters in each string.
Default
is None, which means <code>n_strlen</code> will be set to a.itemsize (the number of bytes
1 change: 1 addition & 0 deletions examples/tutorial.py
@@ -163,6 +163,7 @@ def walktree(top):
datac2.imag = datain['imag']
print(datac.dtype,datac)
print(datac2.dtype,datac2)
+nc.close()

# more complex compound type example.
nc = Dataset('compound_example.nc','w') # create a new dataset.
5 changes: 3 additions & 2 deletions src/netCDF4/__init__.pyi
@@ -704,7 +704,8 @@ def stringtochar(
@overload
def stringtochar(
a: npt.NDArray[np.character],
-encoding: str = ...,
+encoding: str | None = None,
+n_strlen: int | None = None,
) -> npt.NDArray[np.str_] | npt.NDArray[np.bytes_]: ...
@overload
def chartostring(
@@ -714,7 +715,7 @@ def chartostring(
@overload
def chartostring(
b: npt.NDArray[np.character],
-encoding: str = ...,
+encoding: str | None = None,
) -> npt.NDArray[np.str_] | npt.NDArray[np.bytes_]: ...
def getlibversion() -> str: ...
def rc_get(key: str) -> str | None: ...
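For readers unfamiliar with what the two stubbed functions do to array shapes, here is a NumPy-only sketch of the string-array to character-array round trip for byte strings (dtype `'SN'` to `'S1'` and back). It is an approximation for illustration under the assumption that only `'S'` arrays are passed, not the library's implementation, and it ignores the `encoding` argument entirely. The names `strings_to_chars` and `chars_to_strings` are hypothetical.

```python
import numpy as np


def strings_to_chars(a):
    """Split fixed-width byte strings ('SN') into single bytes ('S1').

    The result has shape a.shape + (N,), where N is a.dtype.itemsize.
    """
    a = np.ascontiguousarray(a)
    if a.dtype.kind != "S":
        raise ValueError("this sketch only handles byte-string ('S') arrays")
    n = a.dtype.itemsize
    # Reinterpret the same buffer one byte at a time, then add the new axis.
    return a.view("S1").reshape(a.shape + (n,))


def chars_to_strings(b):
    """Join the last axis of an 'S1' array back into 'SN' strings."""
    b = np.ascontiguousarray(b)
    if b.dtype != np.dtype("S1"):
        raise ValueError("this sketch only handles 'S1' character arrays")
    n = b.shape[-1]
    # Reinterpret each run of n bytes as one fixed-width string.
    return b.view(f"S{n}").reshape(b.shape[:-1])


names = np.array([b"foo", b"hi"], dtype="S3")
chars = strings_to_chars(names)
print(chars.shape)              # a.shape + (N,)
print(chars_to_strings(chars))  # round trip back to 'S3'
```

The real functions additionally decode or encode according to `encoding`, which is where the new `ascii`/`utf-8` default comes into play.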
30 changes: 20 additions & 10 deletions src/netCDF4/_netCDF4.pyx
@@ -1,4 +1,4 @@
-"""Version 1.7.4
+"""Version 1.7.4.1
-------------

# Introduction
@@ -1283,7 +1283,7 @@ import sys
import functools
from typing import Union

-__version__ = "1.7.4"
+__version__ = "1.7.4.1"

# Initialize numpy
import posixpath
@@ -6788,7 +6788,7 @@ returns a rank 1 numpy character array of length NUMCHARS with datatype `'S1'`
arr[0:len(string)] = tuple(string)
return arr

-def stringtochar(a,encoding='utf-8',n_strlen=None):
+def stringtochar(a,encoding=None,n_strlen=None):
"""
**`stringtochar(a,encoding='utf-8',n_strlen=None)`**

@@ -6799,8 +6799,8 @@ is the number of characters in each string. Will be converted to
an array of characters (datatype `'S1'` or `'U1'`) of shape `a.shape + (N,)`.

optional kwarg `encoding` can be used to specify character encoding (default
-`utf-8`). If `encoding` is 'none' or 'bytes', a `numpy.string_` the input array
-is treated a raw byte strings (`numpy.string_`).
+`utf-8` for dtype=`'UN'` or `ascii` for dtype=`'SN'`). If `encoding` is 'none' or 'bytes',
+a `numpy.string_` the input array is treated a raw byte strings (`numpy.string_`).

optional kwarg `n_strlen` is the number of characters in each string. Default
is None, which means `n_strlen` will be set to a.itemsize (the number of bytes
@@ -6809,10 +6809,15 @@ used to represent each string in the input array).
returns a numpy character array with datatype `'S1'` or `'U1'`
and shape `a.shape + (N,)`, where N is the length of each string in a."""
     dtype = a.dtype.kind
-    if n_strlen is None:
-        n_strlen = a.dtype.itemsize
     if dtype not in ["S","U"]:
         raise ValueError("type must string or unicode ('S' or 'U')")
+    if encoding is None:
+        if dtype == 'S':
+            encoding = 'ascii'
+        else:
+            encoding = 'utf-8'
+    if n_strlen is None:
+        n_strlen = a.dtype.itemsize
     if encoding in ['none','None','bytes']:
         b = numpy.array(tuple(a.tobytes()),'S1')
     elif encoding == 'ascii':
@@ -6827,7 +6832,7 @@ and shape `a.shape + (N,)`, where N is the length of each string in a."""
b = numpy.array([[bb[i:i+1] for i in range(n_strlen)] for bb in bbytes])
return b

-def chartostring(b,encoding='utf-8'):
+def chartostring(b,encoding=None):
"""
**`chartostring(b,encoding='utf-8')`**

@@ -6838,14 +6843,19 @@ Will be converted to a array of strings, where each string has a fixed
length of `b.shape[-1]` characters.

optional kwarg `encoding` can be used to specify character encoding (default
-`utf-8`). If `encoding` is 'none' or 'bytes', a `numpy.string_` byte array is
-returned.
+`utf-8` for dtype=`'UN'` or `ascii` for dtype=`'SN'`). If `encoding` is 'none' or 'bytes',
+a `numpy.string_` byte array is returned.

returns a numpy string array with datatype `'UN'` (or `'SN'`) and shape
`b.shape[:-1]` where where `N=b.shape[-1]`."""
     dtype = b.dtype.kind
     if dtype not in ["S","U"]:
         raise ValueError("type must be string or unicode ('S' or 'U')")
+    if encoding is None:
+        if dtype == 'S':
+            encoding = 'ascii'
+        else:
+            encoding = 'utf-8'
     bs = b.tobytes()
     slen = int(b.shape[-1])
     if encoding in ['none','None','bytes']: