Character set / encoding confusion

General talk about translations & I18n (Internationalization)

Post by pappabravo » 3:46pm, Mon 22 Nov, 2010

This may not be a PHPlist problem. However, it is a problem that appears when I use PHPlist, so perhaps somebody can help?

I am getting totally confused as to what character set / encoding to use. I like to compose my messages in Word, as I need the spelling and grammar checking. As Word includes too much garbage, I normally cut and paste into Notepad to strip all the unnecessary code, and then cut and paste into the PHPList message composition box.

When I send a test mail to myself, the HTML version is fine, but the text version garbles the special Danish characters, for instance in blærekræft. I have seen this before and it has always been a problem to get rid of it. It gets even stranger: it looks fine in Google Mail and is only bad in Outlook.
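
As an illustration of how this kind of garbling typically arises, here is a minimal Python sketch. It assumes the message bytes are UTF-8 while the mail client reads them as ISO-8859-1; whether that is what happens in this particular setup is not confirmed by the post. The function names are made up for the example.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Return what a reader sees when UTF-8 bytes are interpreted as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the misreading above; only valid if that is really what happened."""
    return garbled.encode("iso-8859-1").decode("utf-8")


word = "blærekræft"
garbled = misread_utf8_as_latin1(word)
print(garbled)          # blÃ¦rekrÃ¦ft
print(repair(garbled))  # blærekræft
```

Each Danish letter becomes two odd-looking characters because UTF-8 stores it in two bytes, while ISO-8859-1 treats every byte as one character.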

I have set the character set for HTML and text messages on the configuration page to ISO-8859-1.

I use SMTP mailing, and it seems from the config file that you can specify the encoding when you do not use phpmailer (which I guess I don't if I use SMTP?). I have tried all three suggested settings, with the same results.

Any idea what I can do to just write my Danish characters and get them into my mails without all this fuss?

Thanks
PappaBravo
Copenhagen, Denmark
pappabravo
phpLister
 
Posts: 5
Joined: 10:40am, Thu 21 Oct, 2010

Re: Character set / encoding confusion

Post by H2B2 » 6:10pm, Mon 22 Nov, 2010

In order to know whether or not your system is correctly configured, please check the following charset settings:

1. Charset for messages
On the 'configuration page', check these settings:
- Charset for HTML messages:
- Charset for Text messages:

2. Charset of your admin interface pages
The HTML source of your 'send a message' page should include a charset header line, e.g. <meta http-equiv="content-type" content="text/html; charset=UTF-8" />

3. Charset of public pages
The HTML source of your subscribe page should include a charset header line, e.g. <meta http-equiv="content-type" content="text/html; charset=UTF-8" />

4. Database encoding
If you have access to phpMyAdmin use this command:
    SHOW VARIABLES LIKE 'c%'
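
For checks 2 and 3, the declared charset can also be pulled out of a saved page source with a few lines of code. This is a minimal Python sketch, assuming the page declares its charset in a meta tag like the examples above; the function name is hypothetical and a regular expression is only a rough stand-in for a real HTML parser.

```python
import re
from typing import Optional

# Matches both <meta http-equiv=... content="...; charset=X"> and <meta charset="X">.
_META_CHARSET = re.compile(r"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.IGNORECASE)


def declared_charset(html_source: str) -> Optional[str]:
    """Return the charset named in the first matching meta tag, lowercased, or None."""
    match = _META_CHARSET.search(html_source)
    return match.group(1).lower() if match else None


page = '<meta http-equiv="content-type" content="text/html; charset=UTF-8" />'
print(declared_charset(page))  # utf-8
```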

When posting in the forum, please always mention which phpList version you are running.
H2B2
Moderator
 
Posts: 7188
Joined: 1:51am, Wed 15 Mar, 2006

Re: Character set / encoding confusion

Post by pappabravo » 9:54pm, Mon 22 Nov, 2010

Hi,
Answers:
1) Both HTML and Text: ISO-8859-1
2) <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
3) <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
4)
Variable_name              Value
character_set_client       utf8
character_set_connection   utf8
character_set_database     latin1
character_set_filesystem   binary
character_set_results      utf8
character_set_server       latin1
character_set_system       utf8
character_sets_dir         /usr/share/mysql/charsets/
collation_connection       utf8_general_ci
collation_database         latin1_swedish_ci
collation_server           latin1_swedish_ci
completion_type            0
concurrent_insert          1
connect_timeout            10

Version (sorry ;-): version 2.10.12

The Danish translation file danish.inc states that the encoding is: iso-8859-1. If I change the encoding in the configuration page (your question #1) to UTF-8 the Danish translation loses the Danish characters which otherwise have displayed correctly.

Rgds
PappaBravo

Re: Character set / encoding confusion

Post by H2B2 » 10:16pm, Mon 22 Nov, 2010

Character set configuration is still a bit of a struggle at the moment. As you probably know, all charset encoding settings must point in the same direction, preferably UTF-8, which has been hardcoded since version 2.10.8 or thereabouts.
See also viewtopic.php?p=81753#p81753

Also, since version 2.10.11 you need to configure the charset of admin pages separately in a language-specific way, i.e. by changing the charset value of the language_info file in your admin dir (e.g. lists/admin/lan/da/ ).

Your database seems largely set for UTF-8 (except character_set_database and collations), so configuring ALL settings to UTF-8 should have worked. It is possible you weren't aware of the settings in the language_info file and the database. This could explain why UTF-8 didn't work, couldn't it?
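
To see at a glance which database settings still disagree with UTF-8, the reported values can be run through a small filter. This is a minimal Python sketch using the values pappabravo posted above; the function name is hypothetical, and character_set_filesystem is skipped on the assumption that its "binary" value is normal and unrelated to the problem.

```python
def charset_mismatches(variables: dict, wanted: str = "utf8") -> dict:
    """Return the character_set_* and collation_* entries that do not use `wanted`."""
    ignored = {"character_set_filesystem"}
    mismatches = {}
    for name, value in variables.items():
        if name in ignored:
            continue
        if name.startswith("character_set_") and value != wanted:
            mismatches[name] = value
        elif name.startswith("collation_") and not value.startswith(wanted + "_"):
            mismatches[name] = value
    return mismatches


# Values copied from the SHOW VARIABLES output earlier in the thread.
reported = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "latin1",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
    "collation_connection": "utf8_general_ci",
    "collation_database": "latin1_swedish_ci",
    "collation_server": "latin1_swedish_ci",
}

for name, value in charset_mismatches(reported).items():
    print(name, value)
```

With these values the sketch lists character_set_database, character_set_server, collation_database and collation_server as the entries still on latin1.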

I think you have 2 options to solve this:
1. Configure all settings to UTF-8 (see How to configure UTF-8 in phpList)
2. Configure all settings to iso-8859-1 and neutralize UTF-8 hardcoding in 6 php files (see How to configure phpList with non-UTF-8 charsets)

Re: Character set / encoding confusion

Post by pappabravo » 9:01pm, Tue 23 Nov, 2010

Hi again,

Of the two options you have suggested towards the end of the last post I will go for UTF-8 everywhere.

Doing that, I know my language file will create problems with the Danish language, for instance on the subscribe page, so that is the first thing to work on. I have opened the danish.inc file in Notepad++, set it to UTF-8 encoding and manually overwritten the garbled Danish characters with correct ones. I have set $strCharSet to UTF-8, saved the file as danish-utf8.inc and changed the language in the config.php file to the new file name.

I have changed all instances of character set information in the setup/configuration to UTF-8. I have even set ../list/admin/lan/da/language_info to Danish and to UTF-8.

The result now is that the content of my mails displays correctly when received, but the footer info regarding unsubscribe/preferences/forward does not. Also, any text I enter in the admin from now on, such as the texts on the config page, names of lists etc., shows the Danish characters as a ? (question mark) after I save and view it again. What more can I change?

Another strange thing is that when I look at the source HTML behind the "subscribe page" it still says charset=iso-8859-1. Where does it get that from when I have changed everything?

Still puzzled...

Rgds
PappaBravo

Re: Character set / encoding confusion

Post by H2B2 » 11:36pm, Tue 23 Nov, 2010

pappabravo wrote: What more can I change?

You need to make changes in four different settings:
1. Charset for messages, on the 'configuration page' : done

2. Charset of your admin interface pages: You changed the charset setting for the Danish language. This takes effect only if you are actually using Danish as your admin interface language, i.e. it won't have any effect if you are using English as your admin language. If you are using English as your admin language, please see viewtopic.php?p=76277#p76277

3. Charset of public pages: done.

4. Database encoding: Is as important as all other settings. You don't mention having changed this. If you haven't done so yet, enter the following command in phpMyAdmin:
    ALTER DATABASE <db_name> DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
Afterwards, check with: SHOW VARIABLES LIKE 'c%'

IMPORTANT: Before altering your database, it is best to make a full database backup.


pappabravo wrote: Another strange thing is that when I look at the source HTML behind the "subscribe page" it still says charset=iso-8859-1. Where does it get that from when I have changed everything?

In order to load the new configuration: did you log out, flush the browser cache (Ctrl-F5), and log in again?

NOTE
Keep in mind that, when testing your modified setup, it is best to create a new message, as the old message will contain characters that have been saved to the database with an incorrect encoding.

Also remember that when changing the database encoding, all special characters will need to be re-entered in the database in order to display correctly on the subscribe pages and other public pages, as well as in system messages, and footer text. This should always be the last step to take, i.e. after you made sure all charset settings are correctly configured.
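
One reason existing text is not fixed automatically: in MySQL, ALTER DATABASE ... DEFAULT CHARACTER SET only sets the default for tables created afterwards, and existing tables keep their own character set. MySQL also offers ALTER TABLE ... CONVERT TO CHARACTER SET for existing tables. The following is a minimal Python sketch that only builds such statements for review; the table names are examples and must be replaced with the ones in your own database, the function name is hypothetical, and converting will not repair text that was already stored with the wrong encoding. Take a backup first.

```python
import re


def convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    statements = []
    for table in tables:
        # Refuse anything that is not a plain identifier, to avoid building broken SQL.
        if not re.fullmatch(r"[A-Za-z0-9_]+", table):
            raise ValueError(f"unexpected table name: {table!r}")
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements


# Example table names only; check the real names in phpMyAdmin.
for sql in convert_statements(["phplist_message", "phplist_list"]):
    print(sql)
```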

Re: Character set / encoding confusion

Post by pappabravo » 11:47am, Wed 24 Nov, 2010

First - thank you for your quick replies and help. I can imagine you have been dealing with character set problems a few times by now.

I have got quite a lot further. The big difference was creating the "missing" language_info in the EN folder and putting the UTF-8 reference in there. I did have to retype the texts in the configuration page etc., but now they show correctly on the admin screens, and the "unsubscribe", "preferences" and "forward" texts now show the Danish characters.

I have also, after having backed up the database, changed the database encoding.

I thought I was home and dry when the test e-mails to myself all showed correctly. To be on the safe side I also sent one to my wife. To my frustration, the text-format version totally omitted the Danish characters, while the HTML was fine.

The only difference is that my wife uses Outlook 2007 and I use Outlook 2010.

So we are down to PHPlist working, but I am still frustrated that the character set / encoding standards are so "unstandardized". Have you got any suggestions as to what to do to get the text-only version into a format that can be read by all mail programs?

Thanks
PappaBravo

PS. I have tried entering a new message to make sure that what I see isn't the remains of old problems. I have also tried the
define("HTMLEMAIL_ENCODING","quoted-printable");
define("TEXTEMAIL_ENCODING","7bit");
in the config.php file with both 7bit and quoted-printable. I don't know what base64 is.
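
For reference, these three values are MIME content transfer encodings: ways of packaging the message bytes for transport, separate from the character set itself. The following is a minimal Python sketch of what each does to the UTF-8 bytes of a Danish word; it illustrates the general mechanism only, not how phpList applies the setting, and the helper name is made up. The 7bit check is simplified to "no byte above 127".

```python
import base64
import quopri

raw = "blærekræft".encode("utf-8")


def is_7bit_safe(data: bytes) -> bool:
    """True if every byte is plain ASCII, which is what 7bit transport expects."""
    return all(byte < 128 for byte in data)


print(is_7bit_safe(raw))         # False
qp = quopri.encodestring(raw)    # quoted-printable: non-ASCII bytes become =XX
print(qp)                        # b'bl=C3=A6rekr=C3=A6ft'
b64 = base64.b64encode(raw)      # base64: the whole text becomes ASCII letters and digits
print(b64)

# Both encodings are reversible, so a conforming client recovers the original bytes.
print(quopri.decodestring(qp) == raw, base64.b64decode(b64) == raw)  # True True
```

Since the Danish letters are not 7-bit safe in either UTF-8 or ISO-8859-1, quoted-printable or base64 is the safer choice for text containing them.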

Re: Character set / encoding confusion

Post by H2B2 » 12:11pm, Wed 24 Nov, 2010

pappabravo wrote: The only difference is that my wife uses Outlook 2007 and I use Outlook 2010.

This might well be a configuration issue of the local mail client, i.e. your wife's Outlook might not be fully configured for UTF-8 (Unicode). I'm not using Outlook myself, but this may (possibly) help:
Allen Song - Microsoft, Moderator wrote: Please check whether the outlook’s profile is running in Unicode mode. Open Outlook, click Account Settings, double click the account, click More Settings button, in the Advanced tab, please view the description under Mailbox Mode.
Source: social.technet.microsoft.com/Forums

Obviously, in most cases you cannot control the configuration of local clients. The best approach would be to include a link to an online version (HTML page).
See for instance
viewtopic.php?p=45664#p45664
viewtopic.php?p=58496#p58496
viewtopic.php?t=17284
viewtopic.php?t=1501

Re: Character set / encoding confusion

Post by oberheimer » 7:59am, Wed 06 Apr, 2011

Why would you want UTF-8 when you are in Denmark? Strange. We have iso-8859-5 here in Sweden; I guess it should be the same. It still doesn't work here, and I have tried to change everything. It's really strange that you have to do all these settings. It should have been in the setup and not afterwards. Can't this be fixed in the next version...
I don't have time to spend on this, and I'm not going to hire someone to fix it.
oberheimer
phpLister
 
Posts: 6
Joined: 9:52am, Tue 01 Mar, 2011

Re: Character set / encoding confusion

Post by mkowip » 1:05pm, Thu 21 Apr, 2011

hey man

this problem is caused by a hardcoded utf-8 conversion when sending text messages.
no need to change .inc file charset or anything else. I'm running phplist's latest version (as of april 10th) with all settings configured to iso-8859-1.

it was just a matter of modifying admin/sendemaillib.php file at replaceChars(...) function as pointed out below:

FROM:
# eze
# $text = html_entity_decode ( $text , ENT_QUOTES , $GLOBALS['strCharSet'] );
$text = html_entity_decode ( $text , ENT_QUOTES , 'UTF-8' );

TO:
# eze
$text = html_entity_decode ( $text , ENT_QUOTES , $GLOBALS['strCharSet'] );
# $text = html_entity_decode ( $text , ENT_QUOTES , 'UTF-8' );


your base charset will be used to convert HTML to text, thus rendering your message correctly.
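
To illustrate why the charset used at this step matters, here is a minimal Python sketch. It is a Python analogue of the idea, not the phpList code: it decodes HTML entities and then encodes the result in a chosen charset, assuming the mail itself is labelled iso-8859-1. The function name is made up for the example.

```python
import html


def entities_to_bytes(markup_text: str, charset: str) -> bytes:
    """Decode HTML entities, then encode the plain text in the given charset."""
    return html.unescape(markup_text).encode(charset)


source = "bl&aelig;rekr&aelig;ft"
as_utf8 = entities_to_bytes(source, "utf-8")
as_latin1 = entities_to_bytes(source, "iso-8859-1")

print(len(as_utf8), len(as_latin1))     # 12 10
# What a client sees when the mail is labelled iso-8859-1:
print(as_utf8.decode("iso-8859-1"))     # garbled: bytes do not match the label
print(as_latin1.decode("iso-8859-1"))   # blærekræft
```

The text only reads correctly when the bytes produced by the entity conversion match the charset the message declares.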

that took me weeks to figure out, but it's finally running great now.

Phplist rocks!
mkowip
phpList newbie
 
Posts: 1
Joined: 1:00pm, Thu 21 Apr, 2011

Re: Character set / encoding confusion

Post by H2B2 » 3:19am, Mon 25 Apr, 2011

mkowip wrote: it was just a matter of modifying admin/sendemaillib.php file at replaceChars(...) function as pointed out below:

There are a few more places where UTF-8 is hardcoded. For more details, see the last link in this post in the same thread: viewtopic.php?p=81682#p81682

Re: Character set / encoding confusion

Post by Joppedi » 10:25am, Tue 10 May, 2011

I have this problem in Swedish and have followed ALL the steps mentioned here (I chose UTF-8).
It solved the e-mails, but the various responses based on the subscribe template still come out wrong. As I understand it, it fetches strings from swedish.inc, uses them in JavaScript and then writes them into the HTML with the template.
If I change a Swedish letter in swedish.inc to ISO-8859-1, it comes out right in the output, although swedish.inc is declared as UTF-8.

I chose UTF-8 because it seemed to be the default solution. Perhaps I should try the other way, as mkowip suggests?
Joppedi
phpLister
 
Posts: 7
Joined: 10:17am, Wed 11 Oct, 2006
Location: Sweden

