XSLT - Extract leading integers from text string

I am receiving a series of Microsoft Word documents from web clients
that they upload to my server. I need to convert them to XML to pass
through to another system. I have done this through Microsoft Word
2003 with the "Save as XML" option. Now the tricky part is that a
portion of these documents contains a numbered outline. Unfortunately,
I need to remove the numbering before passing the data to the next
system. Here is an example of what I am talking about:

1.{Tab}Check on the status of feedback from the newsgroup.
2.{Tab}If response received, read the response
3.{Tab}If the response applies, use it

(Replace {Tab} with a tab character.) Unfortunately, there is no
guarantee that it will be a tab character. Some users have been
uploading a true outline-formatted document. As such, I need to build
in a "slop factor" if possible.

After the document is saved as XML using the Microsoft Word 2003 "Save
as XML" option, the text string for the numbers looks like:

1.Check on the status...

See the lack of space between the numeral and the text? Now, I could
use the XSLT substring() function except that the number of items
sometimes is less than 10 (single digit), but has been seen to reach
over 100 (triple digit). Does anyone know of a creative way to strip
away the leading integer plus the period from the text string? Or,
does anyone know of a way to determine the location of the decimal if
the characters before it are integers?

Any help would be greatly appreciated.

Greg Howard

Jul 20 '05 #1
At 22:29:52 on Wednesday, 02 March 2005, in the forum {comp.text.xml}, gr*******@no.spam.email.com.no.spam <gr*******@email.com> wrote:
Does anyone know of a creative way to strip
away the leading integer plus the period from the text string? Or,
does anyone know of a way to determine the location of the decimal if
the characters before it are integers?

Any help would be greatly appreciated.


Maybe I didn't read your mail very carefully, but wouldn't "substring-after(p, '.')" work?
regards,
--
Joris Gillis (http://www.ticalc.org/cgi-bin/acct-v...i?userid=38041)
Spread the wiki (http://www.wikipedia.org)
Jul 20 '05 #2
It would if I knew for a fact that the sentence in question was indeed
part of the outline. But if the user includes a sentence in between
two outline items, then I would process a sentence unnecessarily. For
example

1.{Tab}Check on the status of replies from the newsgroup.
Note to self: This is important to do.
2.{Tab}Read responses.

I would prefer to build in as much prevention as possible without
having to switch over to an XML parser interface (like DOM)

Jul 20 '05 #3
This may not be the best or final answer, but I thought I would share
what I found to work for me so that I can move on to other issues:

<xsl:if test="string-length(substring-before(., '.'))&lt;=3">
<xsl:value-of select="substring-after(., '.')" />
</xsl:if>

At least this way I can be sure I am not stripping off the leading
portion of a sentence that has a dollar amount buried in it. If anyone
has any further thoughts or ideas, please share them.
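
One caveat with the length test: when a line contains no period at all, substring-before() returns an empty string, so the test passes (length 0) while substring-after() also returns an empty string, and the line disappears. A short non-numeric prefix such as "No." passes the test as well. A minimal sketch of a stricter variant follows, assuming XSLT 1.0; the template name `strip-outline-number` and the parameter `text` are made up for illustration and are not part of the original stylesheet:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical named template: strips a leading "123." prefix only
       when everything before the first period is one to three digits. -->
  <xsl:template name="strip-outline-number">
    <xsl:param name="text" select="string(.)"/>
    <xsl:variable name="prefix" select="substring-before($text, '.')"/>
    <xsl:choose>
      <!-- translate() deletes every digit from the prefix; if nothing
           is left, the prefix consisted of digits only. -->
      <xsl:when test="string-length($prefix) &gt; 0
                      and string-length($prefix) &lt;= 3
                      and translate($prefix, '0123456789', '') = ''">
        <xsl:value-of select="substring-after($text, '.')"/>
      </xsl:when>
      <!-- Anything else (no period, "Note to self: ...", "$5.00 ...")
           is passed through unchanged. -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

It could be invoked with `<xsl:call-template name="strip-outline-number"/>` from whatever template matches the paragraph text. A sentence that genuinely starts with a short number followed by a period (for instance "3.50 was the price") would still be stripped, so this narrows the problem rather than eliminating it.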

Oh, and thanks Joris Gillis for your earlier input. I failed to say it
when I responded.

-Greg

Jul 20 '05 #4
gr*******@no.spam.email.com.no.spam wrote:
It would if I knew for a fact that the sentence in question was indeed
part of the outline. But if the user includes a sentence in between
two outline items, then I would process a sentence unnecessarily. For
example

1.{Tab}Check on the status of replies from the newsgroup.
Note to self: This is important to do.
2.{Tab}Read responses.

I would prefer to build in as much prevention as possible without
having to switch over to an XML parser interface (like DOM)


Something along this line:

1. get substring before '.'
2. use number() on the result
3. test result of number() against NaN
4. if !NaN, get the substring after '.'

Clumsy, but doable.
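
The four steps above might look roughly like this in XSLT 1.0. This is only a sketch: the template name `strip-if-numeric` and the parameter `text` are hypothetical, and lines that fail the test are simply copied through, which the steps do not spell out.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical named template following the four steps. -->
  <xsl:template name="strip-if-numeric">
    <xsl:param name="text" select="string(.)"/>
    <!-- Steps 1 and 2: text before the first period, converted with
         number(). A non-numeric or empty string converts to NaN. -->
    <xsl:variable name="n" select="number(substring-before($text, '.'))"/>
    <xsl:choose>
      <!-- Step 3: XPath 1.0 has no isNaN(), so compare the string form. -->
      <xsl:when test="string($n) != 'NaN'">
        <!-- Step 4: keep only what follows the first period. -->
        <xsl:value-of select="substring-after($text, '.')"/>
      </xsl:when>
      <!-- Not a number before the period: leave the text alone. -->
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Because number('') is NaN, a line with no period at all falls into the otherwise branch and is left intact. Note that number() also accepts a leading minus sign and surrounding whitespace, so a prefix like "-3" would count as numeric.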
Jul 20 '05 #5
