Script number 5

22 replies

Olivier Miakinen

19/10/2019 at 18:39

This time it should be right.

The next step will be to write the decode_qp and decode_b64 functions
directly in shell instead of calling external programs. ;-)
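As a sketch of that next step, here is one way decode_qp could be written in pure bash, with no call to tr or qprint. It assumes that the encoded text arrives on stdin as a single line, possibly without a trailing newline, which is how decode_word feeds it. It only implements the RFC 2047 "Q" rules the script needs: `_` stands for a space and `=XX` stands for the byte 0xXX. This is a hypothetical illustration, not the version that was eventually posted.

```shell
#!/usr/bin/env bash
# Hypothetical pure-bash decode_qp (no tr, no qprint).
# Reads one line of RFC 2047 "Q"-encoded text on stdin and writes the
# decoded raw bytes on stdout; charset conversion is left to iconv.
decode_qp()
{
    local input="" out="" c i=0
    IFS= read -r input || :           # the input may lack a trailing newline

    while (( i < ${#input} )); do
        c=${input:i:1}
        if [[ $c == "_" ]]; then
            out+=" "                                   # "_" encodes a space
        elif [[ $c == "=" && ${input:i+1:2} == [0-9A-Fa-f][0-9A-Fa-f] ]]; then
            out+="\\x${input:i+1:2}"                   # "=XX" -> byte 0xXX
            i=$(( i + 2 ))
        elif [[ $c == "\\" ]]; then
            out+="\\\\"                                # keep a literal backslash
        else
            out+=$c
        fi
        i=$(( i + 1 ))
    done
    printf '%b' "$out"                                 # expand the \xXX escapes
}
```

For instance, `printf '%s' 'Fr=E9d=E9ric_Chopin' | decode_qp | iconv -f ISO-8859-1 -t UTF-8` should print Frédéric Chopin. decode_qp only restores the raw ISO-8859-1 bytes, so the charset conversion is still done by iconv.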

#!/bin/bash
############################################################################
# Name:
#   decode_headers
#
# Description:
#   Script used for decoding RFC 2047 MIME encoded headers.
#
# Example:
#   decode_headers <<POTIRON
#   Message-ID: <c5a9-1@part.org>
#   From: arvo@part.org (=?ASCII?Q?Arvo?= =?L1?Q?=20?= =?UTF-8?Q?P=C3=A4rt?=)
#   To: =?Latin1?Q?Fr=E9d=E9ric_Chopin?= <fred@chopin.org>,
#     =?Latin2?Q?Anton=EDn_Dvo=F8=E1k?= <anton@dvorak.org>
#   Cc: Arvo =?UTF-8?Q?P=C3=A4rt?= <arvo@part.org>
#   References: <A65R-4d@chopin.org> <c5a7-3@part.org>
#     <A72Q-5a@chopin.org>
#   In-Reply-To: <A72Q-5a@chopin.org>
#   Subject: Re: Going to =?Shift-JIS?B?k4yLngo=?= (Tokyo) =?UTF-8?B?zpEK?=
#     =?UTF-8?B?zrjOrs69zrEK?= (Athens) and =?ISO-8859-5?Q?=BC=DE=E1=DA?=
#     =?ISO-8859-5?Q?=D2=D0?= (Moscow)
#   POTIRON
#   ->
#   Message-ID: <c5a9-1@part.org>
#   From: arvo@part.org (Arvo Pärt)
#   To: Frédéric Chopin <fred@chopin.org>, Antonín Dvořák <anton@dvorak.org>
#   Cc: Arvo Pärt <arvo@part.org>
#   References: <A65R-4d@chopin.org> <c5a7-3@part.org> <A72Q-5a@chopin.org>
#   In-Reply-To: <A72Q-5a@chopin.org>
#   Subject: Re: Going to 東京 (Tokyo) Αθήνα (Athens) and Москва (Moscow)
############################################################################

#
# The following functions should be adapted to your system.
#
if [ "$(uname)" = "Linux" ]; then

    # GNU/Linux
    decode_qp()
    {
        tr "_" " " | /usr/bin/qprint -d
    }
    decode_b64()
    {
        /usr/bin/base64 -d
    }

else

    # MacOS X
    decode_qp()
    {
        tr "_" " " | /usr/local/bin/qprint -d
    }
    decode_b64()
    {
        /usr/bin/base64 -D
    }

fi

#
# This script uses the following global variables:
#   VARY
#     Used when a function has to return a string and not only a number
#   DECODED_LINE
#     The header currently being decoded, as decoded so far
#   STATUS
#     Has three possible values:
#     - "none" at the beginning of a new header
#     - "decoded-word" after a correctly decoded MIME part
#     - "normal" after any other string
#
VARY=""
DECODED_LINE=""
STATUS="none"

#
# Function: usage
#
usage()
{
    printf "Usage: %s [OPTION...] [FILE...]\n" "$0"
    printf "Decode RFC 2047 MIME-encoded headers read from the given files.\n"
    printf "\n"
    printf "With no FILE, or when FILE is -, read standard input.\n"
    printf "\n"
    printf "  -h, --help     Give this help list\n"
    exit 1
}

#
# Function: decode_word
#
# Description:
#   Check that the parameter is an encoded word, then decode it and
#   convert it to UTF-8.
#
# Parameter:
#   A single (possibly MIME-encoded) word.
#
# Return value:
#   If decoding is OK, set the result into VARY and return 0.
#   Otherwise return 1.
#
decode_word()
{
    local word="$*" part1 part5 charset encoding encoded decoded
    ###################################################################
    # An encoded word contains only ASCII characters in the range from
    # '!' (ASCII value 0x21) to '~' (ASCII value 0x7e). This excludes
    # in particular SPACE (ASCII 0x20) and TAB (ASCII 0x09).
    #
    # More specifically, it consists of five parts separated by
    # question marks '?':
    #   1. A character "="
    #   2. The charset, e.g. "UTF-8" or "ISO-8859-1"
    #   3. The encoding, B or Q in upper or lower case
    #   4. The encoded text
    #   5. A character "="
    ###################################################################

    # Check that:
    # - there is no character outside the range from '!' to '~'
    # - the 1st part is a "="
    # - the 5th part is a "=" and it is the end of the string
    if [ "$(LC_ALL=C expr "_$word" : '_[!-~]*$')" = 0 ]; then return 1; fi
    part1=$(printf '%s' "$word" | cut -f 1 -d '?')
    part5=$(printf '%s' "$word" | cut -f 5- -d '?')
    if [ "$part1" != "=" ] || [ "$part5" != "=" ]; then return 1; fi

    # Extract charset, encoding, and encoded text
    charset=$(printf '%s' "$word" | cut -f 2 -d '?')
    encoding=$(printf '%s' "$word" | cut -f 3 -d '?')
    encoded=$(printf '%s' "$word" | cut -f 4 -d '?')

    # printf '%s' is used instead of printf "$var" so that a '%' or a
    # backslash in the data is never taken as a format directive.
    case $encoding in
        B | b)
            decoded=$(printf '%s' "$encoded" | decode_b64 2>/dev/null)
            if [ $? != 0 ]; then return 1; fi
            ;;
        Q | q)
            decoded=$(printf '%s' "$encoded" | decode_qp 2>/dev/null)
            if [ $? != 0 ]; then return 1; fi
            ;;
        *)
            return 1
            ;;
    esac

    VARY=$(printf '%s' "$decoded" | iconv -f "$charset" -t UTF-8 2>/dev/null)
    return $?
}

#
# Function: add_word
#
# Description:
#   Try to decode a new word, and update DECODED_LINE and STATUS
#   depending on the result and the previous STATUS.
#
# Parameter:
#   A single (possibly MIME-encoded) word.
#
# Return value:
#   None
#
# Side effects:
#   Change DECODED_LINE and STATUS
#
add_word()
{
    local p123 p1 p23 p2 p3 word

    # Manage possible initial and final parentheses
    # $p1 = prefix, $p2 = word, $p3 = suffix
    p123="$*"
    p1=$(printf "%s" "${p123}" | sed -e 's/^\([()]*\).*/\1/')
    p23="${p123:${#p1}}"
    p2=$(printf "%s" "${p23}" | sed -e 's/[()]*$//')
    p3="${p23:${#p2}}"

    if decode_word "$p2"; then
        word="${p1}${VARY}${p3}"
        if [ "$STATUS" = "normal" ]; then
            DECODED_LINE="${DECODED_LINE} "
        elif [ "$STATUS" != "none" ] && [ -n "${p1}" ]; then
            DECODED_LINE="${DECODED_LINE} "
        fi
        DECODED_LINE="${DECODED_LINE}${word}"
        if [ -n "${p3}" ]; then
            STATUS="normal"
        else
            STATUS="decoded-word"
        fi
    else
        word="${p123}"
        if [ "$STATUS" != "none" ]; then
            DECODED_LINE="${DECODED_LINE} "
        fi
        DECODED_LINE="${DECODED_LINE}${word}"
        STATUS="normal"
    fi
}

#
# Function: flush_line
#
# Description:
#   Before beginning to manage a new header, or just before ending
#   the script, print the pending DECODED_LINE if any.
#
# Parameter:
#   None
#
# Return value:
#   None
#
# Side effects:
#   Print things to stdout
#   Change DECODED_LINE and STATUS
#
flush_line()
{
    if [ -n "${DECODED_LINE}" ]; then
        printf "%s\n" "${DECODED_LINE}"
        DECODED_LINE=""
        STATUS="none"
    fi
}

#
# Function: manage_line
#
# Description:
#   Manage a new line, which can be either the beginning or the
#   continuation of a mail/news header.
#   This function prints the previous line if this is a new one,
#   then it adds successive parts to the new DECODED_LINE, while
#   updating STATUS as needed.
#
# Parameter:
#   An input line.
#
# Return value:
#   None
#
# Side effects:
#   Print things to stdout
#   Change DECODED_LINE and STATUS
#
manage_line()
{
    local line="$*" word

    # Is it a continuation line (starting with a space or a real tab)?
    # $'...' is needed: inside a bracket expression, "\t" is not a tab.
    if [ "$(LC_ALL=C expr "_$line" : $'_[ \t]')" = 0 ]; then
        # No: new header
        flush_line
    fi

    # Split on whitespace with globbing disabled, so that words such as
    # "*" or "[...]" are not expanded to file names
    set -f
    for word in $line; do
        add_word "$word"
    done
    set +f
}

#
# Function: manage_file
#
# Description:
#   Call manage_line for each line in a given file
#
# Parameter:
#   A file name, or "-" for stdin
#
# Return value:
#   None
#
# Side effects:
#   Same as manage_line
#
manage_file()
{
    local file=${1--}  # POSIX-compliant; ${1:--} would work as well.
    local line
    # The "|| [ -n "$line" ]" part also handles a last line with no final newline
    while IFS= read -r line || [ -n "$line" ]; do
        manage_line "$line"
    done < <(cat -- "$file")
}

#
# Parse arguments for -h or --help
#
for i in "$@"; do
    case $i in
        -h | --help)
            usage
            ;;
        -)
            ;;
        -*)
            printf "Unknown argument '%s'\n" "$i"
            usage
            ;;
    esac
done

#
# Main loop.
# Call manage_file for each filename in parameters,
# then print the last pending DECODED_LINE if any.
#
if [ $# -eq 0 ]; then
    manage_file "-"
else
    for file in "$@"; do
        manage_file "$file"
    done
fi
flush_line

exit 0
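add_word starts two sed processes for each word, only to split off the parentheses. In the same spirit of avoiding external programs, the split can be done with bash pattern-removal expansions. This is a sketch, not a drop-in patch, and split_parens and its P1/P2/P3 result variables are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical replacement for the two sed calls at the top of add_word:
# split a word into leading parentheses (P1), core (P2) and trailing
# parentheses (P3), using only bash parameter expansions.
split_parens()
{
    local p123="$1" p23
    P1=${p123%%[!()]*}     # cut from the first non-parenthesis to the end
    p23=${p123:${#P1}}
    P3=${p23##*[!()]}      # cut up to and including the last non-parenthesis
    P2=${p23%"$P3"}        # what remains in the middle
}

split_parens "(=?UTF-8?Q?P=C3=A4rt?=)"
printf '[%s] [%s] [%s]\n' "$P1" "$P2" "$P3"
```

`%%[!()]*` keeps only the leading parentheses and `##*[!()]` keeps only the trailing ones, which is what the two sed expressions compute. A word made only of parentheses ends up entirely in P1, exactly as with the sed version.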

--
Olivier Miakinen

Olivier Miakinen

20/10/2019 at 11:11

On 20/10/2019 02:54, Joseph-B wrote:

> Correct me if I'm wrong: thanks to these built-in functions, there is
> no longer any need to distinguish between Linux and Darwin?

Exactly! On top of that, it will let me run the script on Cygwin,
where I have neither qprint nor base64.

[...]
> In short, I see that QP was decoded correctly everywhere (whatever
> the charset) but Base64 was not

That's already very good news: it means that only the decode_b64
function needs debugging, and the culprit is probably the $(( ... ))
arithmetic in bash. Could you put "set -x" in the decode_b64 function
so that we can see what's going on?
--
Olivier Miakinen

M.V.

20/10/2019 at 13:27

On 20 October 2019 at 02:54, Joseph-B took the time to write:

> =?UTF-8?B?zpEK?= =?UTF-8?B?zrjOrs69zrEK?=

A question I'm wondering about: when you have something like the above,
is it mandatory for the space between the 2 "encoded words" to
disappear after decoding?
Another question: can it happen that you sometimes need to use Base64
*then* iconv?
If so, could I have an example of an "encoded word" that requires it?
Thanks.
Have a good day.
--
Michel VAUQUOIS - http://michelvauquois.fr

josephb

20/10/2019 at 14:23

Hello,
Olivier Miakinen <om+ asked:

> Could you put "set -x"
> in the decode_b64 function so we can see what's going on?

Yes, I had posted the test here:
Message-ID: <1ofq61z.1hkad6f1eg28gqN%
--
J. B.

josephb

20/10/2019 at 14:23

M.V., overestimating my knowledge:

> =?UTF-8?B?zpEK?= =?UTF-8?B?zrjOrs69zrEK?=
> A question I'm wondering about: when you have something like the above,
> is it mandatory for the space between the 2 "encoded words" to
> disappear after decoding?

Unless I'm mistaken, the space character here will be seen by the shell
as an operator in a command string, so it will necessarily disappear
from the result.

> Another question: can it happen that you sometimes need to use Base64
> *then* iconv?
> If so, could I have an example of an "encoded word" that requires it?

To tell you the truth, the arcana of the RFCs involved are more
Olivier's field ;-)
--
J. B.

Olivier Miakinen

20/10/2019 at 17:33

On 20/10/2019 13:27, M.V. wrote:

> On 20 October 2019 at 02:54, Joseph-B took the time to write:
>> =?UTF-8?B?zpEK?= =?UTF-8?B?zrjOrs69zrEK?=
>
> A question I'm wondering about: when you have something like the above,
> is it mandatory for the space between the 2 "encoded words" to
> disappear after decoding?

Yes. When two encoded-words follow each other, all the whitespace and
line breaks between them are removed. In all other cases (at least in
a field such as Subject), each run of whitespace and line breaks is
replaced by a single space.
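The rule fits in a single test on the kind of the previous word and the kind of the current one. Below is a simplified sketch. It ignores the parenthesis handling that add_word also performs, and join_tokens and its KIND:TEXT argument format are made up for the illustration:

```shell
#!/usr/bin/env bash
# Hypothetical illustration of the RFC 2047 spacing rule. Each argument is
# KIND:TEXT, where KIND is "enc" (an already-decoded encoded-word) or "txt"
# (any other word). Whitespace between two "enc" tokens is dropped; every
# other gap becomes a single space.
join_tokens()
{
    local out="" prev="none" tok kind text
    for tok in "$@"; do
        kind=${tok%%:*}
        text=${tok#*:}
        # Add a space unless this is the first token or both sides are "enc"
        if [ "$prev" != "none" ] && { [ "$prev" != "enc" ] || [ "$kind" != "enc" ]; }; then
            out+=" "
        fi
        out+=$text
        prev=$kind
    done
    printf '%s\n' "$out"
}

join_tokens "txt:Going" "txt:to" "enc:Α" "enc:θήνα" "txt:(Athens)"
```

With the two adjacent encoded-words Α and θήνα, the call above produces "Going to Αθήνα (Athens)". This is the same effect as in the decoded Subject line of the script's example.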

> Another question: can it happen that you sometimes need to use Base64
> *then* iconv?

Yes, that's the case whenever the charset used in the header is not
the one your terminal uses (normally UTF-8).

> If so, could I have an example of an "encoded word" that requires it?

There's one in the examples I gave: =?Shift-JIS?B?k4yLngo=?=, which
encodes the Japanese name of Tokyo in Shift-JIS (much more compact than
UTF-8 for Japanese text: two bytes per kanji instead of three).
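On the command line, decoding that word takes two steps. This sketch assumes GNU coreutils base64 (`-d`) and an iconv that knows the IANA name SHIFT_JIS. Whether a given iconv also accepts the header's spelling Shift-JIS depends on the implementation.

```shell
#!/usr/bin/env bash
# Step 1: Base64 gives the raw Shift_JIS bytes (trailing 0a is a newline).
printf '%s' 'k4yLngo=' | base64 -d | od -An -tx1
# Step 2: iconv converts those bytes to UTF-8 for the terminal.
printf '%s' 'k4yLngo=' | base64 -d | iconv -f SHIFT_JIS -t UTF-8

# Size comparison: the same two kanji take 6 bytes in UTF-8
printf '%s' '東京' | wc -c
```

The first command shows the five bytes 93 8c 8b 9e 0a, meaning two 2-byte kanji plus a newline. The last command counts 6 bytes for the UTF-8 form, which is where the size advantage of Shift_JIS comes from.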
--
Olivier Miakinen

Olivier Miakinen

20/10/2019 at 17:39

On 20/10/2019 14:23, Joseph-B wrote:

> Hello,
> Olivier Miakinen <om+ asked:
>> Could you put "set -x"
>> in the decode_b64 function so we can see what's going on?
>
> Yes, I had posted the test here:
> Message-ID: <1ofq61z.1hkad6f1eg28gqN%

Oops! Sorry, I hadn't noticed that you had done it and that it
printed nothing. That's because of my redirection of stderr to
/dev/null.
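Because set -x writes its trace to stderr, any `2>/dev/null` wrapped around the traced code swallows the trace. Bash 4.1 and later can send the trace to another file descriptor through the BASH_XTRACEFD variable. In this minimal sketch, trace_log and quiet_step are made-up names that stand in for the script's own functions:

```shell
#!/usr/bin/env bash
# Route the xtrace output to file descriptor 9 (bash >= 4.1), so that
# redirecting stderr to /dev/null no longer hides it.
trace_log=$(mktemp)            # hypothetical location for the trace
exec 9>"$trace_log"
BASH_XTRACEFD=9

# Stand-in for a function whose stderr is discarded, like decode_b64
# when it is called from decode_word.
quiet_step()
{
    set -x
    printf '%s\n' "decoding" >/dev/null
    set +x
}

quiet_step 2>/dev/null
cat "$trace_log"               # the "+ printf ..." trace line is still there
```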
Could you add the following line after the declaration
of the decode_b64 function?
  echo U2FsdXQK | decode_b64
With "set -x", on my machine it prints:
+ IFS=
+ read -n 4 chunk
+ c1=U
+ c2=2
+ c3=F
+ c4=s
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _U
+ i1=21
+ n32=5242880
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _2
+ i2=55
+ n32=5464064
++ printf '%03o' 83
+ printf '\123'
S+ '[' F = = ']'
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _F
+ i3=6
+ n32=5464384
++ printf '%03o' 97
+ printf '\141'
a+ '[' s = = ']'
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _s
+ i4=45
+ n32=5464428
++ printf '%03o' 108
+ printf '\154'
l+ IFS=
+ read -n 4 chunk
+ c1=d
+ c2=X
+ c3=Q
+ c4=K
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _d
+ i1=30
+ n32=7602176
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _X
+ i2=24
+ n32=7696384
++ printf '%03o' 117
+ printf '\165'
u+ '[' Q = = ']'
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _Q
+ i3=17
+ n32=7697408
++ printf '%03o' 116
+ printf '\164'
t+ '[' K = = ']'
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _K
+ i4=11
+ n32=7697418
++ printf '%03o' 10
+ printf '\012'
+ IFS=
+ read -n 4 chunk
+ c1=
+ c2=
+ c3=
+ c4=
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _
+ i1=0
+ return 1
--
Olivier Miakinen
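The n32 values in a trace like this one come from packing four 6-bit Base64 indices into one 24-bit number and then cutting it back into three bytes. Note that `expr index` returns 1-based positions. Here is an arithmetic check on the first chunk, U2Fs, whose 0-based positions in the Base64 alphabet are 20, 54, 5 and 44:

```shell
#!/usr/bin/env bash
# 0-based positions of U, 2, F and s in
# ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
n32=$(( (20 << 18) | (54 << 12) | (5 << 6) | 44 ))
echo "$n32"                                   # 5464428

# Split the 24 bits into three bytes, most significant first
b1=$(( n32 >> 16 ))
b2=$(( (n32 >> 8) & 255 ))
b3=$(( n32 & 255 ))
echo "$b1 $b2 $b3"                            # 83 97 108

# Turn each byte into an octal escape, as the traced code does
printf '%b\n' "\\0$(printf '%03o' "$b1")\\0$(printf '%03o' "$b2")\\0$(printf '%03o' "$b3")"   # Sal
```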

Olivier Miakinen

20/10/2019 at 17:48

On 20/10/2019 14:23, Joseph-B wrote:

> M.V., overestimating my knowledge:
>> =?UTF-8?B?zpEK?= =?UTF-8?B?zrjOrs69zrEK?=
>> A question I'm wondering about: when you have something like the above,
>> is it mandatory for the space between the 2 "encoded words" to
>> disappear after decoding?
>
> Unless I'm mistaken, the space character here will be seen by the shell
> as an operator in a command string, so it will necessarily disappear
> from the result.

Er... the shell has nothing to do with it here.
In my script, it is the add_word function that decides whether or
not to add a space between two words. When a space is added, it is
done with the following statement:
  DECODED_LINE="${DECODED_LINE} "
If that code is not executed, the words are stuck together.

[...]
> To tell you the truth, the arcana of the RFCs involved are more
> Olivier's field ;-)

... and in this case, it is RFC 2047 that describes the removal of
whitespace between two encoded-words:
<https://tools.ietf.org/html/rfc2047#section-6.2>
When displaying a particular header field that contains multiple
'encoded-word's, any 'linear-white-space' that separates a pair of
adjacent 'encoded-word's is ignored. (This is to allow the use of
multiple 'encoded-word's to represent long strings of unencoded text,
without having to separate 'encoded-word's where spaces occur in the
unencoded text.)
</>
--
Olivier Miakinen

M.V.

20/10/2019 at 18:15

On 20 October 2019 at 17:33, Olivier Miakinen replied:

> Yes.

Thanks for all your answers.
Have a good evening.
--
Michel VAUQUOIS - http://michelvauquois.fr

josephb

20/10/2019 at 18:18

Olivier Miakinen <om+ wrote:

> Could you add the following line after the declaration
> of the decode_b64 function?
>   echo U2FsdXQK | decode_b64

Done, and here is what it now returns for
echo "Subject: Re: Going to =?Shift-JIS?B?k4yLngo=?= (Tokyo)" | olivier6.sh
+ IFS=
+ read -n 4 chunk
+ c1=U
+ c2=2
+ c3=F
+ c4=s
++ expr index ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ _U
expr: syntax error
+ i1=
+ return 1
Subject: Re: Going to =?Shift-JIS?B?k4yLngo=?= (Tokyo)
and it stops there.
--
J. B.
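The `expr: syntax error` points at `expr index`, which is a GNU extension. POSIX expr has no index operator, and as far as I know the BSD expr shipped with macOS does not implement one. A bash-only lookup avoids expr altogether. The sketch below is hypothetical, and B64_ALPHABET, b64_index and IDX are illustrative names. It also swaps in a bit-accumulator loop instead of the 4-character chunks used by the traced version:

```shell
#!/usr/bin/env bash
B64_ALPHABET='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

# Set IDX to the 0-based position of character $1 in B64_ALPHABET,
# using only parameter expansion (no expr). Return 1 if it is not found.
b64_index()
{
    local prefix
    [ "${#1}" -eq 1 ] || return 1
    prefix=${B64_ALPHABET%%"$1"*}          # everything before the character
    [ "${#prefix}" -lt 64 ] || return 1    # 64 means "not in the alphabet"
    IDX=${#prefix}
}

# Hypothetical pure-bash decode_b64: reads Base64 text on stdin and writes
# the decoded bytes on stdout. Returns 1 on an invalid character.
decode_b64()
{
    local input="" c i acc=0 nbits=0
    IFS= read -r input || :                # the input may lack a final newline
    for (( i = 0; i < ${#input}; i++ )); do
        c=${input:i:1}
        if [ "$c" = "=" ]; then break; fi  # padding: no more data
        b64_index "$c" || return 1
        acc=$(( (acc << 6) | IDX ))        # append 6 bits
        nbits=$(( nbits + 6 ))
        if [ "$nbits" -ge 8 ]; then        # a full byte is available
            nbits=$(( nbits - 8 ))
            printf '%b' "\\0$(printf '%03o' $(( (acc >> nbits) & 255 )))"
            acc=$(( acc & ((1 << nbits) - 1) ))  # keep only the leftover bits
        fi
    done
}

echo U2FsdXQK | decode_b64
```

With the bit accumulator, a final group with one or two padding characters needs no special case: decoding simply stops at the first `=` and the unused leftover bits are dropped.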

josephb

20/10/2019 at 18:18

Olivier Miakinen <om+ wrote:

> Er... the shell has nothing to do with it here.
> In my script, it is the add_word function that decides whether or
> not to add a space between two words. When a space is added, it is
> done with the following statement:
>   DECODED_LINE="${DECODED_LINE} "
> If that code is not executed, the words are stuck together.

Thanks for the clarification ;-)
--
J. B.
